Jorge Bernhardt

How to Implement Data Deduplication using PowerShell

In this post, I want to show you how to install and configure the Data Deduplication role service in Windows Server 2016 by using Windows PowerShell. Data Deduplication is a role service that conserves storage space on an NTFS volume by locating redundant data and storing only one copy of that data instead of multiple copies.

Requirements

  • System or boot volumes are not supported
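  • Volumes must be formatted with NTFS or ReFS (ReFS is not supported for deduplication on Windows Server 2016; support arrived in later releases such as Windows Server 2019)
  • Volumes must be attached to the server and must appear as non-removable drives
  • Volumes can reside on shared storage
  • Certain files are not processed: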
    • Files with extended attributes
    • Encrypted files
    • Files smaller than 32 KB
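
Before enabling the feature, it can help to preview which volumes and files meet the requirements above. The following is a minimal sketch, not an official check: Test-DedupVolumeCandidate and Test-DedupFileCandidate are hypothetical helper names, the volume check simply treats the drive in $env:SystemDrive as the system/boot volume, and files with extended attributes are not detected at all.

```powershell
# Sketch only: hypothetical helpers that roughly mirror the requirements listed above.
# They inspect objects; they do not call the deduplication engine.

function Test-DedupVolumeCandidate {
    param(
        # An object shaped like Get-Volume output (DriveLetter, FileSystem).
        [Parameter(Mandatory)] $Volume,
        # Simplification (assumption): the drive that holds Windows is treated as the system/boot volume.
        [string] $SystemDrive = $env:SystemDrive,
        # ReFS support depends on the Windows Server version; adjust for your release.
        [string[]] $AllowedFileSystem = @('NTFS', 'ReFS')
    )
    $isSystemVolume = "$($Volume.DriveLetter):" -eq $SystemDrive
    (-not $isSystemVolume) -and ($AllowedFileSystem -contains $Volume.FileSystem)
}

function Test-DedupFileCandidate {
    param(
        # An object shaped like Get-ChildItem file output (Length, Attributes).
        [Parameter(Mandatory)] $File,
        [long] $MinimumSize = 32KB
    )
    # Files smaller than the threshold are not processed.
    if ($File.Length -lt $MinimumSize) { return $false }
    # Encrypted files are not processed. Extended attributes are NOT checked here.
    if ($File.Attributes.HasFlag([System.IO.FileAttributes]::Encrypted)) { return $false }
    $true
}
```

For example, `Get-Volume | Where-Object { Test-DedupVolumeCandidate -Volume $_ }` lists candidate volumes, and `Get-ChildItem -Path 'E:\Shares' -File -Recurse | Where-Object { Test-DedupFileCandidate -File $_ }` previews eligible files under a hypothetical E:\Shares folder.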

Usage Types

  • General purpose file servers.
    • General file shares.
  • Virtualized Desktop Infrastructure.
    • Virtual hard disks
  • Virtualized Backup Applications.
    • Backup volumes

Install Data Deduplication by using Windows PowerShell

You can do this using the Install-WindowsFeature cmdlet with the following syntax:

Install-WindowsFeature `
  -Name FS-Data-Deduplication `
  -IncludeAllSubFeature
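
If you run the installation from a script, you may want it to be safe to re-run. The sketch below checks the feature state with Get-WindowsFeature first and calls Install-WindowsFeature only when the role service is missing. Install-DedupRoleService is a hypothetical name, and the sketch assumes an elevated session on Windows Server with the ServerManager module available.

```powershell
# Sketch only: hypothetical wrapper, not part of the ServerManager module.
function Install-DedupRoleService {
    [CmdletBinding()]
    param()
    # Skip the installation when the role service is already present.
    $feature = Get-WindowsFeature -Name 'FS-Data-Deduplication'
    if ($feature.Installed) {
        Write-Verbose 'FS-Data-Deduplication is already installed.'
        return $false
    }
    $result = Install-WindowsFeature -Name 'FS-Data-Deduplication' -IncludeAllSubFeature
    if (-not $result.Success) {
        throw 'Installing FS-Data-Deduplication failed.'
    }
    # RestartNeeded reports whether a restart is required; warn on anything other than No.
    if ("$($result.RestartNeeded)" -ne 'No') {
        Write-Warning "Restart needed: $($result.RestartNeeded)"
    }
    # $true means the role service was installed by this call.
    return $true
}
```

Running `Install-DedupRoleService -Verbose` a second time should then report that the role service is already installed instead of reinstalling it.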

When you install the role service, three tasks are created in Task Scheduler. To list them, use the Get-ScheduledTask cmdlet with the following syntax:

Get-ScheduledTask `
  -TaskPath \Microsoft\Windows\Deduplication\

  • BackgroundOptimization: The Optimization job deduplicates data and compresses file chunks on a volume according to the policy settings.
  • WeeklyGarbageCollection: The Garbage Collection job reclaims disk space by removing unnecessary chunks that are no longer being referenced.
  • WeeklyScrubbing: The Integrity Scrubbing job identifies corruption in the chunk store due to disk failures or bad sectors.
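
To confirm that all three tasks exist and see their state in one view, you could wrap Get-ScheduledTask in a small report. This is a sketch: Get-DedupTaskReport is a hypothetical name, and the expected task names are simply the three listed above.

```powershell
# Sketch only: hypothetical report built on Get-ScheduledTask.
function Get-DedupTaskReport {
    [CmdletBinding()]
    param(
        # The three task names listed above.
        [string[]] $ExpectedTask = @('BackgroundOptimization', 'WeeklyGarbageCollection', 'WeeklyScrubbing')
    )
    # Collect whatever tasks exist under the Deduplication folder (none if the role is missing).
    $tasks = @(Get-ScheduledTask -TaskPath '\Microsoft\Windows\Deduplication\' -ErrorAction SilentlyContinue)
    foreach ($name in $ExpectedTask) {
        $task = $tasks | Where-Object TaskName -eq $name
        # One row per expected task, flagging any that are missing.
        [pscustomobject]@{
            TaskName = $name
            Present  = [bool]$task
            State    = if ($task) { "$($task.State)" } else { $null }
        }
    }
}
```

A row with Present set to False points to a task that was removed or never created.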

Enable Data Deduplication on a volume

To enable deduplication on a volume, run the Enable-DedupVolume cmdlet with the following syntax:

Enable-DedupVolume `
  -Volume <String[]> `
  -UsageType <UsageType>

  • -UsageType: Specifies the expected workload for the volume: HyperV, Backup, or Default.
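
The workload categories from the Usage Types section map onto these values: general purpose file servers use Default, Virtualized Desktop Infrastructure uses HyperV, and virtualized backup applications use Backup. The sketch below encodes that mapping; Get-DedupUsageType, Enable-DedupForWorkload, and the workload labels FileServer, VDI, and Backup are hypothetical names chosen for illustration.

```powershell
# Sketch only: hypothetical helpers; the workload labels are illustrative, not module values.
function Get-DedupUsageType {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('FileServer', 'VDI', 'Backup')]
        [string] $Workload
    )
    # Translate the workload label into an Enable-DedupVolume -UsageType value.
    switch ($Workload) {
        'FileServer' { 'Default' }
        'VDI'        { 'HyperV' }
        'Backup'     { 'Backup' }
    }
}

function Enable-DedupForWorkload {
    param(
        [Parameter(Mandatory)] [string[]] $Volume,
        [Parameter(Mandatory)] [string] $Workload
    )
    $usageType = Get-DedupUsageType -Workload $Workload
    Enable-DedupVolume -Volume $Volume -UsageType $usageType
}
```

For example, `Enable-DedupForWorkload -Volume 'E:' -Workload FileServer` is equivalent to `Enable-DedupVolume -Volume 'E:' -UsageType Default`, where E: stands in for one of your data volumes.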

Set the data deduplication settings on the volume

To configure additional settings on a volume, use the Set-DedupVolume cmdlet with the following syntax:

Set-DedupVolume `
  -Volume <String[]> `
  -OptimizeInUseFiles `
  -NoCompress <Boolean> `
  -NoCompressionFileType <String[]> `
  -MinimumFileAgeDays <UInt32> `
  -MinimumFileSize <UInt32> `
  -ExcludeFolder <String[]> `
  -ExcludeFileType <String[]>

  • -OptimizeInUseFiles: Indicates that the server attempts to optimize files that are currently in use.
  • -NoCompress: Indicates whether the server skips compressing data after deduplication; $true disables compression.
  • -NoCompressionFileType: Specifies an array of file extensions that the deduplication engine does not compress, such as formats that are already compressed.
  • -MinimumFileAgeDays: Specifies a number of days; the deduplication engine optimizes only files that users have not accessed within that number of days.
  • -MinimumFileSize: Specifies the minimum size, in bytes, that a file must have to be optimized.
  • -ExcludeFileType: Specifies an array of file extensions that the deduplication engine excludes from optimization.
  • -ExcludeFolder: Specifies an array of folders in which all files are ignored during data deduplication.
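
Because -MinimumFileSize is expressed in bytes and you often change only a few settings at a time, it can be convenient to build the parameter set as a hashtable (splatting) and pass along only what you specify. A minimal sketch; Set-DedupPolicy and its -MinimumFileSizeKB parameter are hypothetical:

```powershell
# Sketch only: hypothetical wrapper around Set-DedupVolume that forwards
# just the settings the caller actually supplied.
function Set-DedupPolicy {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string[]] $Volume,
        [uint32] $MinimumFileAgeDays,
        [uint32] $MinimumFileSizeKB,
        [string[]] $ExcludeFolder,
        [string[]] $ExcludeFileType
    )
    # Start with the target volume and add only the settings that were passed in.
    $settings = @{ Volume = $Volume }
    if ($PSBoundParameters.ContainsKey('MinimumFileAgeDays')) {
        $settings.MinimumFileAgeDays = $MinimumFileAgeDays
    }
    if ($PSBoundParameters.ContainsKey('MinimumFileSizeKB')) {
        # -MinimumFileSize expects bytes, so convert from KB here.
        $settings.MinimumFileSize = [uint32]($MinimumFileSizeKB * 1KB)
    }
    if ($PSBoundParameters.ContainsKey('ExcludeFolder')) {
        $settings.ExcludeFolder = $ExcludeFolder
    }
    if ($PSBoundParameters.ContainsKey('ExcludeFileType')) {
        $settings.ExcludeFileType = $ExcludeFileType
    }
    Set-DedupVolume @settings
}
```

For example, `Set-DedupPolicy -Volume 'E:' -MinimumFileAgeDays 5 -ExcludeFolder 'E:\Temp'` would change only those two settings on a hypothetical E: volume; afterwards, `Get-DedupVolume -Volume 'E:' | Format-List` shows the resulting configuration.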

Run data deduplication jobs on demand

By default, deduplication runs in the background as a low-priority process when the system is not busy. If you want to run these jobs manually, use the Start-DedupJob cmdlet with the following syntax:

Start-DedupJob `
  -Volume <String[]> `
  -Type <Type> `
  -Memory <UInt32> `
  -Cores <UInt32> `
  -Priority <Priority> `
  -StopWhenSystemBusy `
  -Preempt `
  -Full `
  -ReadOnly

  • -Type: Specifies the type of data deduplication job: Optimization, GarbageCollection, Scrubbing, or Unoptimization.
  • -Memory: Specifies the maximum percentage of physical computer memory that a job can use.
  • -Cores: Specifies the maximum percentage of physical cores that a job can use.
  • -Preempt: Indicates that the deduplication engine moves the job to the top of the job queue and cancels the current job.
  • -ReadOnly: Indicates that the scrubbing job only reports the corruption it finds and does not perform any repair action.
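
Start-DedupJob typically returns once the job is queued. If a script needs to wait for the job to finish, one option is to poll Get-DedupJob, which lists queued and running jobs, until the job no longer appears. This is a sketch under that assumption; Start-DedupJobAndWait is a hypothetical name and the 30-second polling interval is arbitrary.

```powershell
# Sketch only: hypothetical helper that queues a job and polls until it is gone.
function Start-DedupJobAndWait {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Volume,
        [ValidateSet('Optimization', 'GarbageCollection', 'Scrubbing', 'Unoptimization')]
        [string] $Type = 'Optimization',
        [int] $PollSeconds = 30
    )
    # Queue the job; the returned job object is not needed here.
    Start-DedupJob -Volume $Volume -Type $Type | Out-Null
    do {
        Start-Sleep -Seconds $PollSeconds
        # Assumption: an empty result means the job is no longer queued or running.
        $pending = @(Get-DedupJob -Volume $Volume -ErrorAction SilentlyContinue |
            Where-Object { "$($_.Type)" -eq $Type })
        foreach ($job in $pending) {
            Write-Verbose "$Type job on ${Volume}: $($job.Progress)% complete, state $($job.State)"
        }
    } while ($pending.Count -gt 0)
}
```

For example, `Start-DedupJobAndWait -Volume 'E:' -Type GarbageCollection -Verbose` reports progress until the garbage collection job on a hypothetical E: volume leaves the queue. Checking Get-DedupStatus afterwards tells you whether it succeeded.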

Monitor Deduplication

Once you have installed Data Deduplication and enabled it on your volumes, you can monitor the deduplication process by using the Get-DedupStatus cmdlet with the following syntax:

Get-DedupStatus | Format-List

Important: A LastOptimizationResult value of zero indicates that the operation was successful.
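
Building on that rule, you could filter the status objects to show only volumes whose last optimization did not return zero. A sketch, assuming the status objects expose Volume, LastOptimizationResult, and LastOptimizationTime as shown by Get-DedupStatus | Format-List; Get-DedupProblemVolume is a hypothetical name. The result code is also shown in hexadecimal, the form in which Windows error codes are usually written.

```powershell
# Sketch only: hypothetical filter for Get-DedupStatus output.
function Get-DedupProblemVolume {
    [CmdletBinding()]
    param(
        # Objects shaped like Get-DedupStatus output.
        [Parameter(Mandatory, ValueFromPipeline)] $Status
    )
    process {
        $code = $Status.LastOptimizationResult
        # Zero means success; volumes without a recorded result are skipped.
        if ($null -ne $code -and $code -ne 0) {
            [pscustomobject]@{
                Volume                 = $Status.Volume
                LastOptimizationTime   = $Status.LastOptimizationTime
                LastOptimizationResult = $code
                # Hexadecimal form (for example 0x80070005) for searching documentation.
                ResultHex              = '0x{0:X8}' -f $code
            }
        }
    }
}
```

Usage: `Get-DedupStatus | Get-DedupProblemVolume` returns nothing when every volume's last optimization succeeded.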

You can also review the history of a server’s deduplication jobs in the Windows event logs. Data Deduplication events are located in Event Viewer under Applications and Services Logs\Microsoft\Windows\Deduplication\Operational.
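
The same log can also be queried from PowerShell with Get-WinEvent. The sketch below assumes the channel name Microsoft-Windows-Deduplication/Operational for that Event Viewer path and returns only errors and warnings from the last few days; Get-DedupEventSummary is a hypothetical name.

```powershell
# Sketch only: hypothetical helper around Get-WinEvent.
function Get-DedupEventSummary {
    [CmdletBinding()]
    param(
        [int] $Days = 7,
        # Event log levels: 2 = Error, 3 = Warning.
        [int[]] $Level = @(2, 3)
    )
    $filter = @{
        # Assumed channel name behind Applications and Services Logs\Microsoft\Windows\Deduplication\Operational.
        LogName   = 'Microsoft-Windows-Deduplication/Operational'
        Level     = $Level
        StartTime = (Get-Date).AddDays(-$Days)
    }
    # Get-WinEvent raises an error when nothing matches; suppress it and return nothing.
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, LevelDisplayName, Message
}
```

For example, `Get-DedupEventSummary -Days 1 | Format-Table -AutoSize` gives a quick view of the last day's problems.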

Thanks for reading my post. I hope you find it useful.

If you want to know more about Data Deduplication, check out this link.