Guide to reduce TFS database growth using the Test Attachment Cleaner

UPDATED March 24th 2016:   Test Data Retention is now added to TFS 2015 Update 1,  see this post for more information.

UPDATED Apr 15th 2013: Corrected to VS2012 Update 2

UPDATED Mar 23rd 2012: Added information about VS 11

UPDATED Mar 21st 2012:  Added info and link to VS/TFS 11 tool for getting the Test Attachment Cleaner, now included with the TFS Power Tools

UPDATED Mar 17th 2012: Added sql queries for TFS 11 Beta.  Changed structure after Adam’s advices. Updated information.

UPDATED Jan 9th 2012: Added link to SQL 2008 R2 SP1 CU4 for ghost file fix

Background

Recently there has been several reports on TFS databases growing too fast and growing too big.  Notable this has been observed when one has started to use more features of the Testing system.  Also, the TFS 2010 handles test results differently from TFS 2008, and this leads to more data stored in the TFS databases. As a consequence of this there has been released some tools to remove unneeded data in the database, and also some fixes to correct for bugs which has been found and corrected during this process.  Further some preventive practices and maintenance rules should be adopted.

Scenarios that trigger this is:

  • You have started to run manual tests, and includes video’s, images and collects a lot of data during the runs.
  • You have automated your test runs, and includes code coverage information. This adds all the binaries to the TFS too.
  • You are running a version of Visual studio 2010 that haven’t been updated (as detailed below), and thus manual testing and automated testing during build pushes up binary files.

VS 2012 Improvements

In VS 2012 the following changes have been done to improve the situation:

  • Deployment items are no longer being uploaded, even for code coverage there is no upload, as the way it works have changed.
  • System information files are now uploaded once per test run.  If your testrun consists of multiple test cases, like when you use a test suite as basis for a test run, you will now only get one system information file, whereas in 2010 you got one per test case regardless of how it was run. As they are identical, this was just waste.
  • All test attachments are now compressed and decompressed on the client side, reducing both space on server and download time.
  • A trx file consists of multiple small tr_ files.  In VS 2012 a single trx file is uploaded, which consist of all test case results for that test run.

See this blogpost for detailed information about this.

Links

A lot of people have blogged about this, among these are:

Anu’s very important blog post here describes both the problem and solutions to handle it.  She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective.

Brian Harry’s blog post here describes the problem too

This forum thread describes the problem with some solution hints.

Ravi Shanker’s blog post here describes best practices on solving this (TBP)

Grant Holidays blogpost here describes strategies to use the Test Attachment Cleaner both to detect space problems and how to rectify them.

What features can cause the problem:

  • Publishing of test results from builds
  • Publishing of manual test results and their attachments in particular
  • Publishing of deployment binaries for use during a test run
  • Bugs in SQL server preventing total cleanup of data

All the published data above is published into the TFS database as attachments.

The test results will include all data being collected during the run.  Some of this data can grow rather large, like IntelliTrace logs and video recordings.

Particularly annoying is the pushing of binaries which happen for automated test runs.  This is in earlier versions of TFS the default setting, whereas in later versions it has been reversed. The binaries includes all the programs dll’s and exe’s, and is pushed up to prepare for a second run. Normally that never happens, so there is really no point in polluting the database with these binaries.  These binaries are often seen to be the highest size contributor.

The goal of this post is to give you a step-by-step process to get a smaller database.

  1. Set up your system to minimize potential database issues
  2. Find out if you have a database space issue
  3. If you have the “problem”, clean up the database and otherwise keep it clean

Step 1: Ensure your Visual Studio (MTM) and SQL Server for TFS is properly set up

Even if you have got the problem or if have yet not got the problem, you should ensure the Visual Studio MTM and SQL server is set up so that the risk of getting into this problem is minimized. To ensure this you should install the following set of updates and components.  Note that the TFS Server itself does not affect this, but you should also keep that up to the latest upgrade.

See my blog post here for the latest extensions, updates and patches or Grant Holidays excellent post here for further details.

Visual Studio updates:

  1. Visual Studio 2010 SP1 CU2 or Visual Studio 2012 (any version from RTM to Update 2)
    1. These have the fix included
  2. Visual Studio 2010 SP1 pre-CU2
    1. You should preferable update to CU2, otherwise you can install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here.

The default settings (for both 1 and 2 above) will be to not upload deployed binaries, which are used in automated test runs, given the following conditions:

  • However, note that binaries will still be uploaded if:
    • Code coverage is enabled in the test settings.
    • You change the UploadDeploymentItem to true in the testsettings file.
      • This has to be done by editing the testsettings file as an XML file, this is not in the UI.  Change it in the element named Deployment.
         <Deployment enabled="false" uploadDeploymentItems="false"  />
    • Be aware that this might be reset back to true by another user which haven’t installed this QFE, or is running a non-CU2 VS/MTM.
  • The CU2 or hotfix should be installed to
    • The build servers (the build agents)
    • The machine hosting the Test Controller
    • Local development computers (Visual Studio)
    • Local test computers (MTM)
      • It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs. The hotfix is named as a TFS Server hotfix, but further down in small letters it is stated it doesn’t apply to the server itself.

SQL Server updates:

  • If you use the SQL Server 2008 R2 SP1 you should also install the CU 4 (or later) (for pre-SP1 it is the the CU 10 ). This CU fixes a potential problem of hanging “ghost” files. This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed.
    • There is no such CU for SQL Server 2008 pre-R2
      • Work around: If you suspect hanging ghost files, they can – with some mental effort, be deduced from the ghost counters using the following SQL query:
        use master
        SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id) as 'objectname',
        index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes 
        FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED')
      • The problem is a stalled ghost cleanup process. Restarting the SQL server after having stopped all components that depends on it, like the TFS Server and SPS services – that is all applications that connect to the SQL server. Then restart the SQL server, and finally start up all dependent processes again. (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again.
    • The “hanging ghost file” issue came up after one have run the TAC, and deleted enormous amount of data. The SQL Server can get into this hanging state (without the QFE) in certain cases due to this.

TFS Attachment cleaner:

  • For Visual Studio 2010:  Install and set up the Test Attachment Cleaner (TAC) on your own computer. It works through the client API, so you can run it from any client computer.
  • For Visual Studio/TFS 2010 download Test Attachment Cleaner command line power tool from here or download the TFS Power Tools December 2011 for TFS 2010. The TAC is included with this.
  • For Visual Studio/TFS 2012 download  the TFS Power tools.  The TAC is included with this. See this post for information for the different versions.

Step 2:  Analyze the data

Are your database(s) growing ?  Are unused test results growing out of proportion?

To find out about this you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain some  more detailed information.

If you don’t have too many databases you can use the SQL Server reports from within the Management Studio to analyze the database and table sizes. Or, you can use a set of queries. I find queries often faster to use because I can tweak them the way I want them.  But be aware that these queries are non-documented and non-supported and may change when the product team wants to change them.

Step 1.1: If you have multiple Project Collections, find out which might have problems:

Open a SQL Management Studio session onto the SQL Server where you have your TFS Databases.

Use the query below to find the Project Collection databases and their sizes, in descending size order.

use master
select DB_NAME(database_id) AS DBName, (size/128) SizeInMB
 FROM sys.master_files 
 where type=0  and substring(db_name(database_id),1,4)='Tfs_' and DB_NAME(database_id)<>'Tfs_Configuration' order by size desc

Doing this on one of our SQL servers gives the following results:

image

It is pretty easy to see on which collection to start the work Smile

Step 1.2: Find out which tables are possibly too large

Keep a special watch out for the Tfs_Attachment table.

Use the script at the bottom of Grant’s blog to find the table sizes in descending size order.  Add a “use Tfs_DefaultCollection” or whatever name you have for your collection database( s) at the top of the script.

In our case we got this result:

image

From Grant’s blog we learnt that the tbl_Content is in the Version Control category, so the major only big issue we have here is the tbl_AttachmentContent.

Step 1.3: Find out which team projects have possibly too large attachments

In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC.

Use the following query to find this, replace the collection database name with whatever applies in your case:

For TFS 2010
use Tfs_DefaultCollection
select  p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a
inner join tbl_testrun as tr on a.testrunid=tr.testrunid
inner join tbl_project as p on p.projectid=tr.projectid
group by p.projectname
order by sum(a.compressedlength) desc
For TFS 2012
use Tfs_DefaultCollection
select  p.projectname, sum(a.uncompressedlength)/1024/1024 as sizeInMBUncompressed, SUM(f.compressedlength)/1024/1024 as sizeInMBCompressed from dbo.tbl_Attachment as a
inner join tbl_File as f on a.TfsFileId=f.FileId
inner join tbl_testrun as tr on a.testrunid=tr.testrunid
inner join tbl_project as p on p.projectid=tr.projectid
group by p.projectname
order by sum(f.compressedlength) desc

In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years:

image

As can be seen here it is pretty obvious the “Byggtjeneste – Projects” are the main team project to take care of, with the ones on lines 2-4 as the next ones.

 

Step 1.4: Check which attachment types take up the most space

It can be nice to know which attachment types takes up the space, so run the following query:

For TFS 2010
use Tfs_DefaultCollection
select  a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a
inner join tbl_testrun as tr on a.testrunid=tr.testrunid
inner join tbl_project as p on p.projectid=tr.projectid
group by a.attachmenttype
order by sum(a.compressedlength) desc
For TFS 2012
use Tfs_DefaultCollection
select  a.attachmenttype, sum(f.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a
inner join tbl_File as f on a.TfsFileId=f.FileId
inner join tbl_testrun as tr on a.testrunid=tr.testrunid
inner join tbl_project as p on p.projectid=tr.projectid
group by a.attachmenttype
order by sum(f.compressedlength) desc

We then got this result for a TFS 2010 database:

image

From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog.

Step 1.5: Check which file types, by their extension, takes up the most space

Run the following query

For TFS 2010
use Tfs_DefaultCollection
select  SUBSTRING(filename,len(filename)-CHARINDEX(‘.’,REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment
group by SUBSTRING(filename,len(filename)-CHARINDEX(‘.’,REVERSE(filename))+2,999)
order by sum(compressedlength) desc
For TFS 2012
use Tfs_DefaultCollection
select  SUBSTRING(a.filename,len(a.filename)-CHARINDEX(‘.’,REVERSE(a.filename))+2,999)as Extension, sum(f.compressedlength)/1024 as SizeInKB from tbl_Attachment as a
inner join tbl_File as f on a.TfsFileId=f.fileid
group by SUBSTRING(a.filename,len(a.filename)-CHARINDEX(‘.’,REVERSE(a.filename))+2,999)
order by sum(f.compressedlength) desc

This gives a result like this:

image

Now you should have collected enough information to tell you what to do – if you got to do something, and some of the information you need in order to set up your TAC settings file, both for a cleanup and for scheduled maintenance later.

Step 3: Cleaning out the attachments

The TAC is run from the command line using a set of parameters and controlled by a settingsfile.  The parameters point out a server uri including the team project collection and also point at a specific team project. So in order to run this for multiple team projects regularly one has to set up a script to run the TAC multiple times, once for each team project.  When you install the TAC there is a very useful readme file in the same directory.

There  are some guidelines about running the TAC from Ravi Shanker:

  • When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed. “
    • This rule minimizes the risk of the ghosted hang problem to occur, and further makes it easier for the SQL server ghosting process to work smoothly.
  • “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system”
    • This is the last step in a 3 step process of removing SQL server data. First they are logically deleted. Then they are cleaned out by the ghosting process, and finally removed using the shrinkdb command.

When the deployment binaries are published to the TFS server, ALL items are published up from the deployment folder. That often means much more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up.

Grant has shown how their settings file looks in his blog post, removing all attachments older than 180 days , as long as there are no active workitems connected to them. This setting can be useful to clean out all items, both in a clean-up once operation, and in a general cleanup.

There are two scenarios we need to consider:

  1. Cleaning up an existing overgrown database
  2. Maintaining a server to avoid an overgrown database using scheduled TAC

3.1: Cleaning up a database which has grown too big due to these attachments.

This job is a “Once” job.  We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below.  In this scenario you should only consider the large files. Your goal should be to simply reduce the size, and don’t bother about  the smaller stuff. That can be left a scheduled TAC cleanup ( 2 below).

Here you can use a very general settings file, and just remove the large attachments, or you can choose to remove any old items.  Grant’s settings file is an example of the last one.  A settings file to remove only large attachments could look like this:

<!-- Scenario : Remove large files -->
<DeletionCriteria>
    <TestRun />
    <Attachment>
        <SizeInMB GreaterThan="10" />
    </Attachment>
</DeletionCriteria>

Or like this:

If you want only to remove dll’s and pdb’s about that size, add an Extensions-section.  Without that section, all extensions will be deleted.

<!-- Scenario : Remove large files of type dll's and pdb's -->
<DeletionCriteria>
    <TestRun />
    <Attachment>
        <SizeInMB GreaterThan="10" />
        <Extensions>
            <Include value="dll" />
            <Include value="pdb" />
        </Extensions>
    </Attachment>
</DeletionCriteria>

Before you start up your scheduled maintenance, you should clear out all older items.

3.2: Scheduled maintenance using the TAC

If you run a schedule every night, and remove old items, and also remove them in small batches.  It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost process works better.

One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with restricting it to keep attachments with active or resolved bugs.  Doing this every night ensures that only small amounts of data are deleted.

<!-- Scenario : Remove old items except if they have active or resolved bugs -->
<DeletionCriteria> 
  <TestRun> 
    <AgeInDays OlderThan="180" /> 
  </TestRun> 
  <Attachment /> 
  <LinkedBugs>     
     <Exclude state="Active" /> 
     <Exclude state="Resolved"/>
  </LinkedBugs> 
</DeletionCriteria>

In my experience there are projects which are left with active or resolved workitems, although no further work is done.  It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section.

A approach which could work better here is to do a two-step approach, use the schedule above to with no LinkedBugs as a sweeper cleaning task taking away all data older than you could care about.  Then have another scheduled TAC task to take out more specifically attachments that you are not likely to use. This task could be much more specific, and based on your analysis clean out what you know is troublesome data.

<!-- Scenario : Remove specific files early -->
<DeletionCriteria>
    <TestRun >
        <AgeInDays OlderThan="30" />
    </TestRun>
    <Attachment>
        <SizeInMB GreaterThan="10" />
        <Extensions>
            <Include value="iTrace"/>
            <Include value="dll"/>
            <Include value="pdb"/>
            <Include value="wmv"/>
        </Extensions>
    </Attachment>
    <LinkedBugs>
        <Exclude state="Active" />
        <Exclude state="Resolved" />
    </LinkedBugs>
</DeletionCriteria>

The readme document for the TAC says that it recognizes “internal” extensions, but it does recognize any extension.

To run the tool, use the following command:

tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete

3.3: Shrinking the database

You could run a shrink database command after the TAC has run in cases where there are a lot of data being deleted.  In this case you SHOULD do it, to free up all that space.  But, after the shrink operation you should do a “rebuild indexes”, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild indexes, reorganizing is not enough.

For smaller amounts of data you should NOT shrink the database, since the data will be reused by the SQL server when it needs to add more records.  In fact, it is regarded as a bad practice to shrink the database regularly.  On a daily maintenance schedule you should NOT shrink the database.

To shrink the database you do a DBCC SHRINKDATABASE command, and then follow up with a DBCC INDEXDEFRAG afterwards.  I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.

About terje

See http://about.me/terjes