I was recently turning one of my clients database to help improve performance in their high transactional environment. One of the major caveats that we had was that the client was dealing with ever increasing size of their databases (which all organizations do) that were surging to over TB in size and had limited window of maintenance due to high availability and constant change of data. With this in mind we knew that regular database indexing was not an option adding the sizes of indexes that were over 70GB in size.
Re-indexing such a database can be time and resource consuming and would only be viable if you had time and patient clients on your side. After the analysis we recommended to the client to consider doing updating the statistics on regular basis rather than re-indexing the database. And what followed is the same question that I get so often “what’s the difference?”
Indexes: - Physical object within a table that allows referencing data within a database faster.
Statistics: - Set of values that help to optimize a query during the execution.
We demonstrated to the client the importance of using statistics sampling to help optimize the performance without having to deal with indexes on a regular basis. With that in mind I wrote a dynamic SQL script to allow the client to specify which database and the level of sampling they wanted to do.
Here is the query for anyone else that would be interested in a similar setup.
SET NOCOUNT ON
DECLARE @MaxDaysOld int
DECLARE @SamplePercent int
DECLARE @SampleType nvarchar(50)
DECLARE @DBNAME nvarchar(50)
SET @MaxDaysOld = 0
SET @SamplePercent = 10 -- Sampling percentage
SET @SampleType = 'PERCENT'
SET @DBName = '<database name>' -- your database name
EXEC('USE '+@DBNAME+'
DECLARE @current_stats TABLE (RowNum int
, TableName varchar(100)
, StatName varchar (max)
, StatDate smalldatetime
, flag int default 0)
INSERT INTO @current_stats (RowNum, TableName , StatName, StatDate, flag)
SELECT TOP 10 RowNum = ROW_NUMBER() OVER (ORDER BY ISNULL(STATS_DATE(object_id, st.stats_id),1))
,TableName = QUOTENAME(OBJECT_SCHEMA_NAME(st.object_id), '''')+''.''+ OBJECT_NAME(st.object_id)
,StatName = st.name
,StatDate = ISNULL(STATS_DATE(object_id, st.stats_id),1)
,Flag = 0
FROM sys.stats st WITH (nolock)
WHERE DATEDIFF(day, ISNULL(STATS_DATE(object_id, st.stats_id),1), GETDATE()) > '+@MaxDaysOld+'
and OBJECT_SCHEMA_NAME(st.object_id) <> ''sys''
--and OBJECT_NAME(st.object_id) like ''%breakout_test_2%''
ORDER BY ROW_NUMBER() OVER (ORDER BY ISNULL(STATS_DATE(object_id, st.stats_id),1))
DECLARE @MaxRecord int
DECLARE @CurrentRecord int
DECLARE @TableName nvarchar(max)
DECLARE @StatName nvarchar(max)
DECLARE @SQL nvarchar(max)
DECLARE @SampleSize nvarchar(100)
DECLARE @Stat int, @totalstats int
DECLARE @stats_table TABLE (t_name varchar(100),st_name varchar(100), flag int default 0)
DECLARE @instring varchar (50), @outstring varchar(50)
SET @MaxRecord = (SELECT MAX(RowNum) FROM @current_stats)
SET @CurrentRecord = 1
SET @SQL = ''''
SET @SampleSize = ISNULL(CAST('+@SamplePercent+' AS nvarchar(20)),'''+@SampleType+''' )
WHILE @CurrentRecord <= @MaxRecord
BEGIN
SELECT @TableName = (SELECT TOP 1 TableName FROM @current_stats WHERE FLAG = 0)
SET @instring = @tableName
SET @outstring = (SELECT SUBSTRING(@instring,7,30))
INSERT INTO @stats_table (t_name, st_name, flag )
SELECT so.name, st.name, flag=0 FROM sys.stats st
JOIN sys.sysobjects so
ON st.object_id =so.id
WHERE so.name = @outstring
SET @Stat = 1
SET @totalstats = (SELECT COUNT(*) FROM @stats_table WHERE flag = 0)
WHILE @Stat <=@totalstats
BEGIN
SET @StatName = (SELECT TOP 1 st_name FROM @stats_table WHERE FLAG = 0)
SET @SQL = (''UPDATE STATISTICS '+@dbname+'.''+ @TableName+'' ''+ quotename(@StatName, '''') +'' WITH SAMPLE''+ '' ''+@SampleSize +'' PERCENT'' )
-- Prints what will be executed to update the statistics
PRINT @SQL
-- Execute the statistics
EXEC (@SQL)
UPDATE TOP (1) @stats_table
SET FLAG = 1
WHERE FLAG = 0
SET @Stat = @Stat + 1
END
UPDATE TOP (1) @current_stats
SET FLAG = 1
WHERE FLAG = 0
SET @CurrentRecord = @CurrentRecord + 1
END
'
)