In this post, I want to show you how to install and configure the Data Deduplication role service in Windows Server 2016 using Windows PowerShell. Data Deduplication is a role service that conserves storage space on an NTFS volume by locating redundant data and storing only one copy of that data instead of multiple copies.

Requirements

  • System or boot volumes are not supported
  • Volumes must be formatted with NTFS (ReFS is supported only in later releases, starting with Windows Server 2019)
  • Volumes must be attached to the server and must appear as non-removable drives
  • Volume can be shared storage
  • Certain files will not be processed
    • Files with extended attributes
    • Encrypted files
    • Files smaller than 32 KB

Usage Types

  • General purpose file servers.
    • General file shares.
  • Virtual Desktop Infrastructure (VDI) servers.
    • Virtual hard disks
  • Virtualized Backup Applications.
    • Backup volumes

Install Data Deduplication by using Windows PowerShell

You can do this using the Install-WindowsFeature cmdlet, passing FS-Data-Deduplication as the feature name.
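As a minimal sketch, assuming an elevated PowerShell session running on the server itself, the installation could look like this. The Get-WindowsFeature call is not required; it is included only to confirm the install state afterwards.

```powershell
# Install the Data Deduplication role service.
Install-WindowsFeature -Name FS-Data-Deduplication

# Optional check: the Install State column should read "Installed".
Get-WindowsFeature -Name FS-Data-Deduplication
```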

When you install the role, three jobs are created in Task Scheduler. You can list them with the Get-ScheduledTask cmdlet, filtering on the \Microsoft\Windows\Deduplication\ task path. The three jobs are:

  • BackgroundOptimization: The Optimization job deduplicates data and compresses file chunks on a volume per the policy settings.
  • WeeklyGarbageCollection: The Garbage Collection job reclaims disk space by removing unnecessary chunks that are no longer being referenced.
  • WeeklyScrubbing: The Integrity Scrubbing job identifies corruption in the chunk store due to disk failures or bad sectors.
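The jobs listed above can be inspected in two ways. This sketch assumes the tasks live under their default Task Scheduler folder; Get-DedupSchedule comes from the Deduplication module and shows the same schedules from the deduplication side.

```powershell
# List the scheduled tasks created for Data Deduplication
# (the task path below is the default location; adjust it if yours differs).
Get-ScheduledTask -TaskPath '\Microsoft\Windows\Deduplication\'

# Show the deduplication schedules as the Deduplication module sees them.
Get-DedupSchedule
```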

Enable Data Deduplication on a volume

To enable deduplication on a volume, run the Enable-DedupVolume cmdlet with the -Volume parameter and, optionally, the -UsageType parameter.

-UsageType: Specifies the type of workload for the volume: HyperV, Backup, or Default.
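For illustration, enabling deduplication on a general purpose file server volume might look like the following. The drive letter E: is a hypothetical example, not something taken from a specific environment; replace it with your own data volume.

```powershell
# Enable deduplication on a data volume (E: is a placeholder)
# using the general purpose file server profile.
Enable-DedupVolume -Volume 'E:' -UsageType Default

# Review the resulting settings for that volume.
Get-DedupVolume -Volume 'E:' | Format-List
```

For a Hyper-V VDI host or a virtualized backup target, you would pass HyperV or Backup to -UsageType instead.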

Set the data deduplication settings on the volume

If you want to set additional settings on a volume, use the Set-DedupVolume cmdlet with any of the following parameters:

-OptimizeInUseFiles: Indicates the behavior of the server when optimizing the files in use.

-NoCompress: Indicates whether or not the server compresses data after deduplication.

-MinimumFileAgeDays: The deduplication process optimizes the files that users have not accessed in the number of days that you specify in this parameter.

-MinimumFileSize: Specifies the minimum size, in bytes, of the files to optimize. Files smaller than this value are not processed.

-ExcludeFileType: Specifies comma-separated values of the extension types that are excluded by the deduplication engine.

-ExcludeFolder: Specifies an array of folders in which all files are ignored during data deduplication.
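A short sketch of how several of these parameters can be combined in one call. The volume letter, the thresholds, the extensions, and the folder path are all hypothetical values chosen for the example.

```powershell
# Tune deduplication on a volume (all values below are placeholders):
#   - only optimize files not accessed for 5 days
#   - skip files smaller than 64 KB
#   - skip .iso and .tmp files (extensions are given without the leading dot)
#   - skip everything under E:\Temp
Set-DedupVolume -Volume 'E:' `
    -MinimumFileAgeDays 5 `
    -MinimumFileSize 65536 `
    -ExcludeFileType 'iso', 'tmp' `
    -ExcludeFolder 'E:\Temp'

# Confirm the new settings.
Get-DedupVolume -Volume 'E:' | Format-List
```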

Run data deduplication jobs on demand

By default, deduplication occurs in the background, as a low-priority process, when the system is not busy. If you want to run these jobs manually, use the Start-DedupJob cmdlet with the following parameters:

-Type: Specifies the type of data deduplication job: Optimization, GarbageCollection, Scrubbing, or Unoptimization.

-Memory: Specifies the maximum percentage of physical computer memory that a job can use.

-Cores: Specifies the maximum percentage of physical cores that a job uses.

-Preempt: Indicates that the deduplication engine moves the job to the top of the job queue and cancels the current job.

-ReadOnly: Indicates that the scrubbing job only reports the damage it finds but does not perform any repair action.
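As an example of running jobs on demand, the sketch below starts an optimization job with resource limits and then a report-only scrubbing job. The volume letter and the 50 percent limits are illustrative assumptions, not recommended values.

```powershell
# Start an optimization job on E: (placeholder volume), allowing it to use
# at most 50% of physical memory and 50% of the physical cores.
Start-DedupJob -Volume 'E:' -Type Optimization -Memory 50 -Cores 50

# Start a scrubbing job that only reports corruption without repairing it.
Start-DedupJob -Volume 'E:' -Type Scrubbing -ReadOnly

# List the jobs that are currently queued or running.
Get-DedupJob
```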

Monitor Deduplication

Once you have installed Data Deduplication and enabled it on your volumes, you can monitor the deduplication process using the Get-DedupStatus cmdlet.

Important: A LastOptimizationResult value of zero indicates that the operation was successful.
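To make the status check concrete, here is a small sketch that pulls a few of the more useful properties for one volume. E: is again a placeholder, and the selection of properties is just one reasonable choice.

```powershell
# Show space savings and the outcome of the last optimization job
# for a single volume (E: is a placeholder).
Get-DedupStatus -Volume 'E:' |
    Select-Object Volume, SavedSpace, OptimizedFilesCount,
                  LastOptimizationTime, LastOptimizationResult |
    Format-List
```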

You can also review the history of a server’s deduplication jobs in the Windows event logs. Data Deduplication events are located in the Applications and Services Logs\Microsoft\Windows\Deduplication\Operational container.
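The same log can be read from PowerShell rather than Event Viewer. This sketch assumes the operational channel is named Microsoft-Windows-Deduplication/Operational, which corresponds to the container described above; the limit of 20 events is arbitrary.

```powershell
# Read the 20 most recent Data Deduplication events from the operational log.
Get-WinEvent -LogName 'Microsoft-Windows-Deduplication/Operational' -MaxEvents 20 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message |
    Format-List
```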

If you want to know more about Data Deduplication, check out this link: https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/overview