SharePoint had, what I have always considered, until version 14, a limitation in its document / content management design – the inability to scale the storage of its documents and content. This led to various kinds of problems and was a headache to manage if the number of documents and the size of the documents are very large.
1) The smallest unit for splitting a database into Filegroups is a Table. This obviously doesn’t do a lot to help us, since, as you will see below, all document content is stored within a single table.
2) We may partition the table into more than one filegroup in SQL Server 2005 (and 2008), but without a bit of normalization and indexing this is of no use to us. This will probably not help us with the SharePoint database though since modifying the database structure of any SharePoint database is not supported by Microsoft.
So, atleast until version 14, we were stuck with using the default configuration that comes out of the box with SharePoint for our content database needs. This is not bad per se, if you have a small or medium sized content database. For large content databases which have a lot of active document libraries, you will see that this will become a limitation very quickly. The recommended limit of 100 GB per content database doesn’t quite help either – since that means more content databases need to be created and attached to a web application. But then you cannot split a site collection into more than one content database. So, basically, we are stuck.
Thankfully, in SharePoint 14 (with SQL Server 2008), we have another option: Remote BLOB storage providers. But we need to know where we are coming from, to know where we are going.
To give you a quick overview of how SharePoint stores its content in its database, we will need to review the structure of its content databases. If you open up a SharePoint content database in SQL Server Management Studio, you will see many tables related to features, lists, content types, documents. However, we are more interested in the tables related to documents – which is the AllDocs table and AllDocStreams. While AllDocs table stores the metadata and the general details of the document, the AllDocStreams stores the actual content of the document. To test my theory, I uploaded a brand new document into a document library and checked the AllDocs and AllDocStreams table for changes.
The above highlighted bits tells us how the newly uploaded document is stored within the AllDocs table. The main bits we need to look out for are the Id and the LeafName columns. As all of our intuitions will say, the Id column should be our primary key for lookup in other tables. The LeafName stores the name of the document as we uploaded.
Lets look at how it relates to the AllDocStreams table. Pick up the Id and do a query on the AllDocStreams table. You will see the Id column in the AllDocStreams corresponds to the Id in the AllDocs table.
Here is the query for the highlighted document record we found above:
SELECT TOP 1000 [Id]
WHERE [Id] = ‘42626DEB-A54C-463D-9935-B70DAA4B2C5E’
This gives me one record corresponding to one version of the document existing in the library from the AllDocStreams table.
Note that depending on the number of versions of the document in the document library, you might get multiple records for the same Id value. Another thing to note here is contrary to the popular thinking, the Id column is not a primary key and nor is it unique. It simply stores the Id of the document from the AllDocs table. Not a normalized design, I know.
I have highlighted the interesting bits in the above screenshot. The Content field seems to contain the actual content of the document. In SSMS this is represented as a long stream of hexadecimal values corresponding to the byte values in the file that was uploaded. I have also highlighted one other field called RbsId which seems to correspond to the internal Id from the remote blob storage provider. For now it is NULL because this document is stored within the BLOB of the content database itself. We will get back to that a little later. But first let us look at how to configure our SharePoint application to use RBS. I had to deviate from the official MSDN article a bit for my system. Below is what I did.
Enable FILESTREAM in SQL Server 2008
The first step in this process is to enable FILESTREAM support in your SQL Server Instance. If you are like me, then you would have enabled it by default when you installed the SQL Server. However, I am using a development box, so I didn’t have to think twice to do it. In a production environment, things may be different and it is quite possible that this was not enabled by default. If that is the case, enable it on your server by following the steps in the below article:
Don’t worry if you see the word “R2” on that page. It applies to the regular SQL Server 2008 version too.
Download the Remote BLOB store component
To quote what this component does:
The SQL Server Remote Blob Store is a method for storing blobs of unstructured data in an external Content Addressable data store. The component consists of a client-side DLL that is linked into a user application, as well as a set of stored procedures to be installed on SQL Server.
The download is available from the below page under the section “Remote BLOB store component”
When downloading ensure that you select the correct version of the file that matches your processor architecture. Don’t install it or run it yet.
Configure FILESTREAM for the content database
I am assuming you have already created a web application, if not this would be the right time. Note the content database name you used when create the web application. It would be a good test to not have a Site Collection created in that application. If you have, don’t fret. This will still work.
Fire up the SQL Server Management Studio and connect to our SharePoint database server.
Open up the Databases node and right click on the content database that we created when creating the web application. Select “New Query” to start the query editor window. At this stage, there are a couple of things that you will need to decide which I have listed below:
1) Decide what is the password you will use for the database master key for this database, if you haven’t already got one. The database master key will be used to encrypt the FILESTREAM content in the disk. The stronger you make the encryption key password the better.
2) Decide where will you be storing the FILESTREAM content in the disk. Remember that these are VERY crucial files for the smooth functioning of your SharePoint web application. You may not be able to recover the documents you upload if this location becomes corrupted or unavailable. If you are using a SAN for storing your SQL Server databases then a location within the SAN would be the best place.
In the below examples I am assuming the content database name is WSS_Content_Database. You may replace it with the name you have provided to your content database. I am also assuming that the location where the filestream content will be stored will be in the folder C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\WSSDocuments. You may replace this with any other folder that you have decided on.
Execute the below command to setup the master key encryption password:
if not exists (select * from sys.symmetric_keys where name = N’##MS_DatabaseMasterKey##’)
create master key encryption by password = N’MasterKeyPassword!2#4′
Add a new FILEGROUP to the database by executing the below command:
if not exists (select groupname from sysfilegroups where groupname=N’WSSDocuments’)
alter database [WSS_Content_Database] add filegroup WSSDocuments contains filestream
Configure the newly created FILEGROUPs FILESTREAM settings by executing the below command;
alter database [WSS_Content_Database] add file (name = WSSDocuments, filename = ‘C:Program FilesMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSQLDATAWSSDocuments’) to filegroup WSSDocuments
Please DO NOT create the folder yourself. When the above command is completed SQL Server will automatically create the folder for you and apply the necessary permissions to it.
Install the RBS component
It is usual to double click on an MSI file to start installing it. However, in this case we cannot do that. We will need to pass some parameters to the MSI so that it not only installs the provider but configures the FILESTREAM settings for our content database to take advantage of it.
So, fire up an elevated command prompt and execute the below command:
msiexec /qn /lvx* rbsinstall.log /i RBS.msi TRUSTSERVERCERTIFICATE=true FILEGROUP=PRIMARY DBNAME=”WSS_Content_Database” DBINSTANCE=”MSSQLSERVER” FILESTREAMFILEGROUP=WSSDocuments FILESTREAMSTORENAME=WSSDocuments_1
Some things to note in the above command:
- DBINSTANCE: The instance name of the database installation. If your instance configuration is default then you may skip providing this parameter. Otherwise specify the name of the instance.
- DBNAME: The name of the content database that we created above.
- FILESTREAMGROUP: The name of the file stream group we configured in the above step.
- FILESTREAMSTORENAME: This need not match the group name provided above as listed in the above command. It can be a unique store name for your database server installation.
You will see that once you press the enter key the command will return back to the prompt almost immediately. However, you will see the MSIEXEC processes churning away. The best way to know when the installation is finished is to check the rbsinstall.log for “Installation completed successfully” towards the end of the file.
To verify that RBS has been installed and configured for your content database, switch to the SSMS and expand the Tables node. You should see some tables in the list with names starting with mssqlrbs_
The next step is to configure the SharePoint configuration database for enabling remote blob storage.
Lets fire up the “SharePoint 2010 Management Shell” in elevated mode. PowerShell and SharePoint 2010 are brilliant together. Follow this link to learn more about it.
We will first get the content database that we are interested in into a variable. Replace the database name with what you have provided as a part of setting up your web application:
$cdb = Get-SPContentDatabase ‘WSS_Content_Database’
Next, get the remote blob settings for the database:
$rbss = $cdb.RemoteBlobStorageSettings
To check if we actually got the settings object, just type in $rbss and press enter. You should get some information back on the screen.
Now, check if the RBS provider is installed correctly:
This should be print a single True or False on the screen. To continue we will need a True on the above command.
Enable the RBS on the content database now:
Once the command prompt comes back, type in $rbss again and check if it is enabled. If enable it is time to specify which provider should be used. Use the following command to do that:
Type in $rbss again and you should see WSSDocuments_1 listed as the active provider. The output should look something similar to the below screen shot:
Testing our configuration
After all that hard work it is time to test our configuration. If you have not already created a site collection in the web application, it is time to create one now. If you have already created it, skip that bit.
Navigate to the site collection that we newly created, or what has already been created and to a document library. Upload a document of any size into it. You should immediately see files and folders being created in the WSSDocuments folder that we configured.
Congratulations! Now your documents are stored outside the SQL Server!
A sample is provided in the below screenshot:
We will do some digging around in the content database to see how the AllDocStreams records change once RBS is configured.
First let us requery the content database for the document we uploaded in the AllDocs table. In my case I got the ID of the document as C967AC32-014C-4F9F-8D06-525DD47D35CA.
I used that ID to lookup the AllDocStreams table for the document content and here is what I got:
As you can see above, the Content field value is now NULL and the RbsId field has some value.
Setting constraints for RBS storage
It may not be prudent to store every single document in the RBS folders outside the database, since the performance may actually decrease for files are small size (i.e. < 1 MB). It would be better if we set a minimum size of 1 MB as a cut off point for documents to be pulled into the RBS folder. To do that we will use the PowerShell console again and use the following set of commands:
$cdb = Get-SPContentDatabase “WSS_Content_Database”
You probably figured it out yourself, but the size is specified in bytes and any higher value should be converted to bytes and specified here. For e.g. to specify 2MB, you will have to specify 2 * 1024 = 2097152. If you are like me, and are too lazy to open up the calculator, Google might be your best friend. Just search for 2MB =? Byte and it will tell you the answer!
Enabling RBS on additional content databases
To enable additional content databases use the same methods described in “Configure FILESTREAM for the content database” to enable and configure the filestream provider for the content database and run the RBS installer as below:
msiexec /qn /i rbs.msi REMOTEBLOBENABLE=1 FILESTREAMPROVIDERENABLE=1 DBNAME=WSS_Content_Database FILESTREAMSTORENAME=WSSDocuments_1 ADDLOCAL=EnableRBS,FilestreamRunScript DBINSTANCE=MSSQLSERVER