I read with great interest today some specs on the new Microsoft Windows Storage Server 2003 R2. Anyone who administers an Exchange messaging system knows the value of something called "Single Instance Storage." Simply put, if I send an email to 50 people, there are not 50 copies of the email taking up space in the mail store. There is one copy, with pointers to it from all the recipients' mailboxes. Now imagine applying this principle to file servers. Those 50 people who got that email save the attachment to their home drives, and 50 x 300 KB instantly disappears from your free drive space.
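To put a number on that example, here is a rough back-of-the-envelope sketch in Python. The function name is my own, and it assumes the ideal case where all copies are byte-for-byte identical and the links themselves take up no space, which a real volume will not quite achieve.

```python
def space_saved_kb(copies: int, size_kb: int) -> int:
    """Estimate the space reclaimed when identical files share one stored copy.

    Assumption: link overhead is ignored, so this is an upper bound.
    """
    without_sis = copies * size_kb   # every user keeps a full copy
    with_sis = size_kb               # one master copy in the common store
    return without_sis - with_sis


if __name__ == "__main__":
    saved = space_saved_kb(copies=50, size_kb=300)
    print(f"{saved} KB reclaimed out of {50 * 300} KB")
```

In other words, 49 of the 50 copies are pure waste, and the waste grows with every extra person who saves the same attachment.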

With Windows Storage Server 2003 R2 there is a service called the SIS Groveler. When it finds duplicate files on an NTFS volume, it reports them to the SIS filter driver. SIS links are then created that point to one master copy, which has been moved to the "common store." When a user wants to access the file, they are transparently redirected to the copy in the common store. If the user then modifies the file, the altered copy is left where the user saves it, and that file's link to the copy in the common store is deleted.
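The flow described above can be modelled with a small in-memory sketch. This is not how the real Groveler or filter driver is implemented. The class and method names are invented for illustration, and matching duplicates by SHA-256 digest is my own assumption. It only shows the idea: find duplicates, keep one master copy, link to it, and break the link on write.

```python
import hashlib


class SisVolume:
    """Toy model of a volume with single instance storage (hypothetical)."""

    def __init__(self):
        self.common_store = {}  # digest -> file contents (the master copies)
        self.files = {}         # path -> ("data", bytes) or ("link", digest)

    def write(self, path, data):
        # Saving a file always stores real data at the user's path,
        # which drops any link that path had to the common store.
        self.files[path] = ("data", data)

    def read(self, path):
        # Reads through a link are transparently redirected to the common store.
        kind, value = self.files[path]
        return self.common_store[value] if kind == "link" else value

    def grovel(self):
        # Group ordinary files by content digest (an assumed matching method).
        by_digest = {}
        for path, (kind, value) in self.files.items():
            if kind == "data":
                digest = hashlib.sha256(value).hexdigest()
                by_digest.setdefault(digest, []).append(path)
        for digest, paths in by_digest.items():
            if len(paths) < 2 and digest not in self.common_store:
                continue  # unique file, leave it where it is
            # Keep one master copy and replace each duplicate with a link.
            self.common_store.setdefault(digest, self.files[paths[0]][1])
            for path in paths:
                self.files[path] = ("link", digest)

    def stored_bytes(self):
        # Bytes actually held: unlinked files plus the common store.
        own = sum(len(v) for kind, v in self.files.values() if kind == "data")
        return own + sum(len(v) for v in self.common_store.values())


if __name__ == "__main__":
    vol = SisVolume()
    for i in range(50):
        vol.write(f"home/user{i}/report.doc", b"x" * 300)
    vol.grovel()
    print(vol.stored_bytes())  # one master copy instead of fifty
    vol.write("home/user0/report.doc", b"edited")  # user0's link is broken
    print(vol.read("home/user1/report.doc") == b"x" * 300)
```

The sketch never cleans up a master copy once nothing links to it, and it ignores permissions, file names, and everything else a real file system has to care about.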

The Groveler service normally runs in background mode during free cycles, but it can be run at maximum capacity in foreground mode by using the sisadmin.exe tool.

Of course, if the common store is lost, the shared data is lost for every user whose files link to it, so some care will have to be taken to protect it.

There is a great paper here which goes into more detail.

Link to Windows Storage Server 2003 R2 Home

Posted on Thursday, August 17, 2006

