How to Implement Data Deduplication using PowerShell

In this post, I want to show you how to install and configure the Data Deduplication role service in Windows Server 2016 using Windows PowerShell. Data Deduplication is a role service that conserves storage space on an NTFS volume by locating redundant data and storing only one copy of that data instead of multiple copies.

Requirements

System or boot volumes are not supported
Volumes must be formatted with NTFS (ReFS is supported only from Windows Server 2019 onward)
Volumes must be attached to the server and must appear as non-removable drives
Volume can be shared storage
Certain files will not be processed
- Files with extended attributes
- Encrypted files
- Files smaller than 32 KB
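
To make the volume requirements concrete, here is a minimal sketch of a pre-check. The function name Test-DedupCandidate and its parameters are my own invention, not part of the Deduplication module, and it assumes the objects it receives expose DriveLetter, FileSystem and DriveType the way Get-Volume output does. It only covers the file system, drive type and system volume rules, not the per-file exclusions.

```powershell
# Hypothetical helper: returns $true when a volume object passes the basic
# Windows Server 2016 deduplication requirements listed above.
function Test-DedupCandidate {
    param(
        [Parameter(Mandatory)] $Volume,
        # Defaults to the system drive of the current machine, for example 'C:'.
        [string] $SystemDrive = $env:SystemDrive
    )

    # System or boot volumes are not supported.
    $isSystem = "$($Volume.DriveLetter):" -eq $SystemDrive

    # The volume must be NTFS and must appear as a fixed (non-removable) drive.
    return (-not $isSystem) -and
           ($Volume.FileSystem -eq 'NTFS') -and
           ($Volume.DriveType -eq 'Fixed')
}
```

On a server you could feed it real volumes, for example: Get-Volume | Where-Object { $_.DriveLetter } | Where-Object { Test-DedupCandidate $_ }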

Usage Types

General purpose file servers.
- General file shares.
Virtual Desktop Infrastructure (VDI).
- Virtual hard disks
Virtualized Backup Applications.
- Backup volumes

Install Data Deduplication by using Windows PowerShell

You can do this using the Install-WindowsFeature cmdlet with the following syntax:

Install-WindowsFeature `
  -Name FS-Data-Deduplication `
  -IncludeAllSubFeature
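
If you run the installation from a script, it can help to check first whether the role service is already there. The following is a small sketch of that idea using Get-WindowsFeature. It assumes an elevated PowerShell session on the server itself.

```powershell
# Install the role service only if it is not already present.
$feature = Get-WindowsFeature -Name FS-Data-Deduplication
if (-not $feature.Installed) {
    Install-WindowsFeature -Name FS-Data-Deduplication
}

# Confirm the result.
Get-WindowsFeature -Name FS-Data-Deduplication |
    Select-Object Name, InstallState
```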

When you install the role, three jobs are created in the task scheduler. To list them, use the Get-ScheduledTask cmdlet with the following syntax:

Get-ScheduledTask `
  -TaskPath \Microsoft\Windows\Deduplication\

BackgroundOptimization: The Optimization jobs deduplicate data and compress file chunks on a volume per the policy settings.
WeeklyGarbageCollection: The Garbage Collection job reclaims disk space by removing unnecessary chunks that are no longer being referenced.
WeeklyScrubbing: The Integrity Scrubbing job identifies corruption in the chunk store due to disk failures or bad sectors.
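
To see when each of these tasks last ran and when it will run next, you can pipe them to Get-ScheduledTaskInfo. This is just a sketch. The task names on your server should match the three listed above, but I have not verified that on every build.

```powershell
# Show last and next run times for the deduplication tasks.
Get-ScheduledTask -TaskPath '\Microsoft\Windows\Deduplication\' |
    Get-ScheduledTaskInfo |
    Select-Object TaskName, LastRunTime, LastTaskResult, NextRunTime
```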

Enable Data Deduplication on a volume

To enable deduplication on a volume, run the Enable-DedupVolume cmdlet with the following syntax:

Enable-DedupVolume `
  -Volume <String[]> `
  -UsageType <UsageType>

-UsageType: Specifies the type of workload for the volume. Accepted values are HyperV, Backup, and Default.
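
As a concrete example, this is how enabling deduplication could look for a general purpose file server. The drive letter E: is an assumption for illustration, so replace it with your own data volume.

```powershell
# Enable deduplication on E: (hypothetical data volume) for general file shares.
Enable-DedupVolume -Volume 'E:' -UsageType Default

# Verify that the volume is now enabled and which usage type it has.
Get-DedupVolume -Volume 'E:' |
    Select-Object Volume, Enabled, UsageType
```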

Set the data deduplication settings on the volume

If you want to set additional settings on a volume, use the Set-DedupVolume cmdlet with the following syntax:

Set-DedupVolume `
  -Volume <String[]> `
  -OptimizeInUseFiles `
  -NoCompress <Boolean> `
  -NoCompressionFileType <String[]> `
  -MinimumFileAgeDays <UInt32> `
  -MinimumFileSize <UInt32> `
  -ExcludeFolder <String[]> `
  -ExcludeFileType <String[]>

-OptimizeInUseFiles: Indicates the behavior of the server when optimizing the files in use.
-NoCompress: Indicates whether or not the server compresses data after deduplication.
-MinimumFileAgeDays: The deduplication process optimizes the files that users have not accessed in the number of days that you specify in this parameter.
-MinimumFileSize: Specifies the minimum size, in bytes, that a file must have to be optimized.
-ExcludeFileType: Specifies comma-separated values of the extension types that are excluded by the deduplication engine.
-ExcludeFolder: Specifies an array of folders in which all files are ignored during data deduplication.
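
Here is a sketch of how these parameters could be combined. The volume, folder, extensions and thresholds are hypothetical values that I picked for illustration, not recommendations.

```powershell
# Hypothetical values: adjust the volume, folder, extensions and thresholds.
# Extensions are given without a leading period.
Set-DedupVolume `
  -Volume 'E:' `
  -MinimumFileAgeDays 5 `
  -MinimumFileSize 65536 `
  -ExcludeFolder 'E:\Temp' `
  -ExcludeFileType 'mp4', 'zip'

# Read the settings back to confirm they were applied.
Get-DedupVolume -Volume 'E:' |
  Select-Object Volume, MinimumFileAgeDays, MinimumFileSize, ExcludeFolder, ExcludeFileType
```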

Run data deduplication jobs on demand

By default, deduplication occurs in the background, as a low-priority process, when the system is not busy. If you want to run these jobs manually, use the Start-DedupJob cmdlet with the following syntax:

Start-DedupJob `
  -Volume <String[]> `
  -Type <Type> `
  -Memory <UInt32> `
  -Cores <UInt32> `
  -Priority <Priority> `
  -StopWhenSystemBusy `
  -Preempt `
  -Full `
  -ReadOnly

-Type: Specifies the type of data deduplication job: Optimization, GarbageCollection, Scrubbing, or Unoptimization.
-Memory: Specifies the maximum percentage of physical computer memory that a job can use.
-Cores: Specifies the maximum percentage of physical cores that a job uses.
-Preempt: Indicates that the deduplication engine moves the job to the top of the job queue and cancels the current job.
-ReadOnly: Indicates that the scrubbing job only reports the damage it finds but does not perform any repair action.
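
For example, a manual run could look like the sketch below. Again, E: is an assumed volume and the 50 percent memory cap is only an illustrative value. Get-DedupJob is used here to watch the queue.

```powershell
# Queue an optimization job on E: (hypothetical volume), capped at 50% of memory.
Start-DedupJob -Volume 'E:' -Type Optimization -Memory 50 -Priority High

# List queued and running jobs with their progress.
Get-DedupJob | Select-Object Type, Volume, State, Progress

# Check the chunk store for corruption without attempting repairs.
Start-DedupJob -Volume 'E:' -Type Scrubbing -ReadOnly
```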

Monitor Deduplication

Once you have installed data deduplication and enabled it on volumes, you can monitor the deduplication process using the Get-DedupStatus cmdlet with the following syntax:

Get-DedupStatus |
  Format-List

Important: A LastOptimizationResult value of zero indicates that the operation was successful.
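
If you have several volumes, a short summary is easier to read than the full list. The helper below is a sketch of my own (ConvertTo-DedupSummary is not a built-in cmdlet). It assumes the input objects carry the Volume, LastOptimizationResult and SavedSpace properties that Get-DedupStatus returns, with SavedSpace in bytes.

```powershell
# Hypothetical helper: condenses deduplication status objects into one row
# per volume with a success flag and the saved space in GB.
function ConvertTo-DedupSummary {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Status
    )
    process {
        [pscustomobject]@{
            Volume    = $Status.Volume
            # Zero means the last optimization succeeded; anything else,
            # including no value at all, is reported as not succeeded.
            Succeeded = ($Status.LastOptimizationResult -eq 0)
            SavedGB   = [math]::Round($Status.SavedSpace / 1GB, 2)
        }
    }
}
```

On the server you would call it as: Get-DedupStatus | ConvertTo-DedupSummary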

You can also review the history of a server's deduplication jobs in the Windows event logs. Data Deduplication events are located in the Applications and Services Logs\Microsoft\Windows\Deduplication\Operational container.
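
The same events can be read from PowerShell with Get-WinEvent. This sketch assumes the operational log is exposed under the name Microsoft-Windows-Deduplication/Operational, and the limit of 20 events is arbitrary.

```powershell
# Show the 20 most recent deduplication events.
Get-WinEvent -LogName 'Microsoft-Windows-Deduplication/Operational' -MaxEvents 20 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message
```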

Thanks for reading my post. I hope you find it useful.

If you want to know more about Data Deduplication, check out this link.