Feb 03

Question

How can the space needed for VM backup metadata be estimated, so that the storage pool defined as the destination by the VMCTLMC option can be sized correctly?

Answer

The VMCTLMC management class and its storage pool destination are required when backing up virtual machines to tape or virtual tape with TSM for Virtual Environments. These “control” files (or “metadata”) provide a fast index for locating the backup data of the virtual disk blocks in TSM. The storage pool destination for these control files, associated with the VMCTLMC management class, should be on a random access disk storage device type.
The easiest method to estimate the required amount of storage for the VMCTLMC storage pool is to multiply the total retained VM backup data by 0.2% (0.002 decimal). The 0.2% value is an approximation and may not be appropriate for all environments, but it has been shown to be valid in a sample of actual customer installations. To estimate the size requirement for the VMCTLMC storage pool, you must first estimate the total amount of retained backup storage for VMware backups. The total retained storage is the total amount of data that is backed up for virtual machines, including all retained versions.

As an example, let’s consider a virtual machine with a single 100GB vmdk, an average daily change rate of 10%, and a 15-day retention policy.

The total retained data is the original source data (100GB) plus the amount of data changed daily (10% * 100GB = 10GB) times the number of days retained (15). The total is (100GB) + (10GB * 15 days) = 100GB + 150GB = 250GB. To estimate the amount of VMCTLMC data required, multiply the retained data by 0.2%: 250GB * 0.2% = 0.5GB, or 512MB of disk storage.

In summary:

  • Original source data = 100 GB
  • Total retained data = 100 GB + (100 GB * .10 change rate * 15 days) = 100 GB + 150 GB = 250 GB
  • VMCTLMC data required = 250 GB * 0.002 VMCTLMC estimate = 0.5 GB
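The simplified rule can be written as a small calculation. This is a minimal Python sketch, not part of TSM: the function names are hypothetical, sizes are in GB (with 1GB = 1024MB for the MB conversion), and the retained-data model is the one used in the example (one full backup plus one incremental per retained day at a constant change rate).

```python
# Hypothetical helpers illustrating the 0.2% VMCTLMC sizing rule of thumb.
# Sizes are in GB; 1 GB = 1024 MB is assumed for the MB conversion.

VMCTLMC_RATIO = 0.002  # 0.2% approximation from sampled installations


def total_retained_gb(source_gb: float, daily_change_rate: float,
                      retention_days: int) -> float:
    """Original full backup plus one incremental per retained day."""
    return source_gb + source_gb * daily_change_rate * retention_days


def vmctlmc_estimate_gb(retained_gb: float,
                        ratio: float = VMCTLMC_RATIO) -> float:
    """Estimated VMCTLMC storage pool size for the retained backup data."""
    return retained_gb * ratio


if __name__ == "__main__":
    retained = total_retained_gb(100, 0.10, 15)
    estimate = vmctlmc_estimate_gb(retained)
    print(f"Total retained data: {retained:.0f} GB")
    print(f"VMCTLMC estimate: {estimate:.2f} GB ({estimate * 1024:.0f} MB)")
```

For the example VM this reproduces the 250GB of retained data and the 0.5GB (512MB) VMCTLMC estimate.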

Remember that this is an approximation, so you may want to round this value up to ensure that you allocate sufficient storage pool space, for example to 600MB. As with any TSM storage pool, you should monitor the percent utilization and set alerts on it so that you can increase the size of the storage pool if utilization reaches a critical level, such as over 90%. Backups will fail if the VMCTLMC storage pool runs out of space. Control files associated with expired data are removed from the VMCTLMC storage pool destination. However, the timing of the actual expiration cycle may cause the VMCTLMC storage pool to retain some expired data and, in the short term, to exceed the 0.2% estimate.
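The rounding and alerting advice can be sketched as follows. This assumes you already obtain the pool's used space and capacity from your TSM server through whatever monitoring you use (that collection step is not shown). The 100MB rounding step and the function names are illustrative assumptions; the 90% threshold comes from the text.

```python
import math

# Hypothetical helpers; the 90% alert level is the example given in the text.
ALERT_THRESHOLD_PCT = 90.0


def round_up_allocation_mb(estimate_mb: float, step_mb: int = 100) -> int:
    """Round an estimate up to the next multiple of step_mb (e.g. 512 -> 600)."""
    return int(math.ceil(estimate_mb / step_mb) * step_mb)


def utilization_pct(used_mb: float, capacity_mb: float) -> float:
    """Percent of the pool capacity currently in use."""
    return 100.0 * used_mb / capacity_mb


def needs_more_space(used_mb: float, capacity_mb: float,
                     threshold_pct: float = ALERT_THRESHOLD_PCT) -> bool:
    """True when utilization is over the alert threshold."""
    return utilization_pct(used_mb, capacity_mb) > threshold_pct


if __name__ == "__main__":
    capacity = round_up_allocation_mb(512)
    print(capacity)                           # allocation for the example VM
    print(needs_more_space(550, capacity))    # about 91.7% used
```

Rounding to a fixed step is only one way to add headroom; a percentage margin would serve the same purpose.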

The following sections describe the analysis of VMCTLMC capacity estimation in greater detail. However, this information is generally not required if you use the simplified method of estimation using 0.2% as the guideline.

Detailed analysis of the VMCTLMC storage pool estimation method:

The VMCTLMC control files contain index information for each block of a virtual disk (vmdk) in VMware. These blocks are 16KB each. TSM stores the 16KB blocks in larger objects called “Megablocks” that are 128MB each. One control file is used for each TSM Megablock. For a 100GB vmdk, there are 800 megablocks [(100GB * 1024)/128MB]. An initial full backup therefore stores 800 megablocks and 800 control files. Each control file has a fixed size of 73KB, so the total space required for the control files of the initial full backup is 800 * 73KB = 58400KB. As a percentage of retained data, the control files in this case (a single full backup only) amount to (58400KB)/(100GB * 1024 * 1024) = 0.000557, or 0.06% rounded. This represents the best case, or minimum, amount of VMCTLMC data as a percentage of retained data.
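The megablock arithmetic for the initial full backup can be checked with a few lines of Python. This is a hypothetical helper, not TSM code; it uses the 128MB megablock and 73KB control file sizes stated above, and rounding a partially filled last megablock up to a whole megablock is an assumption of this sketch.

```python
import math

MEGABLOCK_MB = 128        # TSM megablock size
CONTROL_FILE_KB = 73      # fixed size of one control file
KB_PER_GB = 1024 * 1024


def megablocks_for(vmdk_gb: float) -> int:
    """Number of 128 MB megablocks needed to hold a vmdk (partial block rounds up)."""
    return math.ceil(vmdk_gb * 1024 / MEGABLOCK_MB)


def full_backup_control_kb(vmdk_gb: float) -> int:
    """Control file space for an initial full backup: one file per megablock."""
    return megablocks_for(vmdk_gb) * CONTROL_FILE_KB


if __name__ == "__main__":
    vmdk_gb = 100
    control_kb = full_backup_control_kb(vmdk_gb)
    ratio = control_kb / (vmdk_gb * KB_PER_GB)
    print(megablocks_for(vmdk_gb), control_kb, f"{ratio:.6f}", f"{ratio:.2%}")
```

For the 100GB vmdk this gives 800 megablocks, 58400KB of control files, and a ratio of about 0.000557 (0.06%).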

Best case calculation (minimum distribution of changed vmdk blocks):

When incremental backups occur that affect individual 16KB vmdk blocks rather than an entire 128MB TSM Megablock, only the changed data is backed up to TSM. However, a new control file is generated for a megablock when as little as a single 16KB vmdk block in it has changed. The best case scenario, in which the fewest megablocks are affected and the fewest new control files are created, occurs when all of the changed data is consolidated in as few megablocks as possible. For example, for an incremental backup with 10% changed data on a 100GB vmdk, there would be a minimum of (10GB * 1024)/128MB = 80 changed megablocks. The control file data for this one incremental backup is 80 * 73KB = 5840KB. As a percentage of the incremental backup data, the control files require (5840KB)/(10GB * 1024 * 1024) = 0.000557, or 0.06% rounded. Since the best case percentage is the same as for the initial full backup, subsequent incremental backups do not change the percentage when control file space is calculated as a percentage of total retained data.
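The best case for an incremental backup applies the same arithmetic only to the changed data, packed into as few megablocks as possible. Again, this is a hypothetical sketch using the sizes from the text, not TSM code.

```python
import math

MEGABLOCK_MB = 128
CONTROL_FILE_KB = 73
KB_PER_GB = 1024 * 1024


def best_case_incremental_control_kb(changed_gb: float) -> int:
    """Changed blocks consolidated into the minimum number of megablocks."""
    changed_megablocks = math.ceil(changed_gb * 1024 / MEGABLOCK_MB)
    return changed_megablocks * CONTROL_FILE_KB


if __name__ == "__main__":
    changed_gb = 100 * 0.10            # 10% of a 100 GB vmdk
    control_kb = best_case_incremental_control_kb(changed_gb)
    ratio = control_kb / (changed_gb * KB_PER_GB)
    print(control_kb, f"{ratio:.6f}", f"{ratio:.2%}")
```

The ratio matches the full-backup ratio, which is why best-case incrementals leave the overall percentage unchanged.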

The 0.06% value calculated above represents the theoretical absolute minimum storage required for the VMCTLMC, and therefore is not a good value to use for estimation for an actual deployment.

Worst case calculation (maximum distribution of changed vmdk blocks):

It is most likely that the changed vmdk blocks are distributed to some degree throughout the virtual disk. This fragmentation spreads the changed data over a greater number of megablocks and results in more control files being created. The worst case is that the changed vmdk blocks are distributed over the maximum number of megablocks possible, which in this example is 800 (there are only 800 megablocks for the 100GB vmdk, so this is the maximum). In this worst case scenario, the maximum amount of control file data generated is 800 * 73KB = 58400KB. The associated amount of backup data for these control files is only 10GB (the 10% changed data), rather than 100GB as in the case of the full backup. Therefore, the percentage of control file (VMCTLMC) data required is (58400KB)/(10GB * 1024 * 1024) = 0.00557, or 0.6% rounded. Note that this is 10 times the percentage calculated for the best case. As the number of retained incremental backups increases, the control file space as a percentage of retained backup data approaches 0.6%.
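The worst case, and the way the overall ratio climbs toward 0.6% as more worst-case incrementals are retained, can be sketched as follows. The model is an assumption for illustration: one full backup plus N retained incrementals, each changing 10% of the data and touching every megablock; the function names are hypothetical.

```python
import math

MEGABLOCK_MB = 128
CONTROL_FILE_KB = 73
KB_PER_GB = 1024 * 1024


def megablocks_for(vmdk_gb: float) -> int:
    return math.ceil(vmdk_gb * 1024 / MEGABLOCK_MB)


def worst_case_incremental_ratio(vmdk_gb: float, change_rate: float) -> float:
    """One incremental whose changed blocks touch every megablock of the vmdk."""
    control_kb = megablocks_for(vmdk_gb) * CONTROL_FILE_KB
    changed_kb = vmdk_gb * change_rate * KB_PER_GB
    return control_kb / changed_kb


def worst_case_retained_ratio(vmdk_gb: float, change_rate: float,
                              incrementals: int) -> float:
    """Full backup plus N worst-case incrementals, as a share of retained data."""
    control_kb = megablocks_for(vmdk_gb) * CONTROL_FILE_KB * (1 + incrementals)
    retained_kb = (vmdk_gb + vmdk_gb * change_rate * incrementals) * KB_PER_GB
    return control_kb / retained_kb


if __name__ == "__main__":
    print(f"single incremental: {worst_case_incremental_ratio(100, 0.10):.2%}")
    for n in (0, 1, 15, 100, 1000):
        print(n, f"{worst_case_retained_ratio(100, 0.10, n):.3%}")
```

With zero incrementals the ratio is the 0.06% of the full backup; it rises with each retained worst-case incremental and stays just below the 0.6% limit.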

Estimation for the most likely scenario:

In an actual TSM for Virtual Environments deployment, neither the theoretical best case nor the worst case is likely; the most likely scenario lies somewhere in between. Based on a sampling of actual deployments, the steady-state ratio of control file space to total retained data is approximately 0.2%. At any particular point in time the percentage may be higher, for example because of delays in expiring old backup data, smaller change rates (which can produce the same amount of control file data for a smaller amount of backup data), or a wider distribution of changed data across vmdk blocks. Best practice is to monitor the percent utilization of the VMCTLMC storage pool to determine whether additional space needs to be allocated.
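To see where a deployment might fall, the three ratios from this analysis can be applied to the 250GB of retained data from the earlier example. This is an illustrative sketch with hypothetical names. The worst-case bound is taken as the best-case ratio divided by the daily change rate, which reproduces the 0.6% figure for a 10% change rate and shows why smaller change rates push the ratio up; it assumes enough changed 16KB blocks to touch every megablock.

```python
# Ratios derived in this article, as fractions (not percent).
BEST_CASE_RATIO = 73 / (128 * 1024)   # one 73 KB control file per 128 MB megablock
TYPICAL_RATIO = 0.002                 # 0.2% observed in sampled deployments


def vmctlmc_range_mb(retained_gb: float, daily_change_rate: float) -> dict:
    """Best, typical and worst-case VMCTLMC sizes in MB for the retained data."""
    # Worst case: every incremental touches all megablocks, so control data per
    # changed GB is the best-case ratio scaled by 1 / change rate (an upper bound
    # that assumes the changed 16 KB blocks can reach every megablock).
    worst_ratio = BEST_CASE_RATIO / daily_change_rate
    retained_mb = retained_gb * 1024
    return {
        "best": retained_mb * BEST_CASE_RATIO,
        "typical": retained_mb * TYPICAL_RATIO,
        "worst": retained_mb * worst_ratio,
    }


if __name__ == "__main__":
    for name, size_mb in vmctlmc_range_mb(250, 0.10).items():
        print(f"{name:>8}: {size_mb:8.1f} MB")
```

For the example VM this brackets the 512MB typical estimate between roughly 143MB and 1426MB.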

written by Bosse

