HDF5 Chunking

HDF5’s dataset chunking feature is a way to optimize the layout of data on disk so that downstream consumers can read just a portion of a dataset. This is all the more important when compression filters are applied to datasets, because chunking frees a consumer from having to decompress an entire dataset only to read a portion of it.

ZFP Chunklets

When using HDF5 chunking with ZFP compression, it is important to account for the fact that ZFP does its work in tiny chunklets of its own, each 4 samples on a side (4^d samples, where d is the dataset dimension, or rank in HDF5 parlance). This means that, whenever possible, the chunk dimensions you select in HDF5 should be multiples of 4. When a chunk dimension is not a multiple of 4, ZFP winds up with partial chunklets, which it pads with useless data, reducing overall time and space efficiency.

The degree to which this may degrade performance depends on the fraction of each chunk that is padded. Suppose we have a 2D chunk of dimensions 27 x 101. ZFP must treat it as 28 x 104 by padding each dimension out to the next multiple of 4. The fraction of space wasted due to ZFP chunklet padding is (28x104 - 27x101) / (28x104), or about 6.4%. On the other hand, consider a 3D chunk that is 1024 x 1024 x 2. ZFP must treat it as 1024 x 1024 x 4, resulting in 50% waste.

The latter example is particularly relevant when applying ZFP to compress data along the time dimension of a large 3D simulation. Ordinarily, a simulation advances one time step at a time and so needs to keep only the current timestep in memory. However, in order to give ZFP enough width in the time dimension to satisfy its minimum chunklet dimension of 4, the simulation must keep 4 timesteps in memory. This is demonstrated in the example below.

Partial I/O Requests

In any given H5Dwrite (or H5Dread) call, the caller has the option of writing (or reading) only a portion of the data in the dataset. This is a partial I/O request, and it is controlled by the mem_space_id and file_space_id arguments to the call.

An HDF5 producer or consumer can issue partial I/O requests on any HDF5 dataset, regardless of whether the dataset is compressed and regardless of its layout (e.g. H5D_CONTIGUOUS or H5D_CHUNKED). When combining partial I/O with compression, however, the size and shape of chunks relative to the size and shape of partial I/O requests will have an impact on performance.
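As a reminder of the mechanics, a partial write selects a hyperslab in the file dataspace and passes a matching memory dataspace. A minimal sketch, in which the dataset handle dsid and the 64x64 buffer buf are assumed to already exist:

```c
/* Sketch: partial write of a 64 x 64 region into an existing 2D dataset
   at offset (128, 256). dsid and buf are assumed to already exist. */
hsize_t start[] = {128, 256};  /* offset of the region in the file dataset */
hsize_t count[] = {64, 64};    /* size/shape of the region */
hid_t fsid = H5Dget_space(dsid);             /* file dataspace */
hid_t msid = H5Screate_simple(2, count, 0);  /* memory dataspace */
H5Sselect_hyperslab(fsid, H5S_SELECT_SET, start, 0, count, 0);
H5Dwrite(dsid, H5T_NATIVE_DOUBLE, msid, fsid, H5P_DEFAULT, buf);
H5Sclose(msid);
H5Sclose(fsid);
```

The full 6D example later in this section uses exactly this pattern inside its timestep loop.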

This is particularly important in writer scenarios when an I/O request overlaps some chunks only partially. If the partially overlapped chunks already exist in the file (from a previous write, for example), the HDF5 library may have to engage in read-modify-write operations for those chunks.

If the partially overlapped chunks do not exist in the file, the HDF5 library will fill-value pad the chunks before they are written. HDF5’s default fill value is zero (as defined by the associated datatype). Data producers can choose a different fill value for a dataset (see H5Pset_fill_value), but the fill value chosen can affect the space performance of the compression filter. On the other hand, if the partial chunks of one I/O request wind up getting fully filled by another, any fill-value impact on compressor performance disappears.
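A producer concerned about this can set an explicit fill value on the dataset creation property list. A minimal sketch, assuming a dataset creation property list cpid already exists (as in the examples below):

```c
/* Sketch: set an explicit fill value on a dataset creation property
   list. cpid is assumed to already exist with chunking configured. */
double fill = 0.0; /* HDF5's default; any value the filter codes cheaply works */
H5Pset_fill_value(cpid, H5T_NATIVE_DOUBLE, &fill);
/* Optionally, write fill values only when a fill value was explicitly set */
H5Pset_fill_time(cpid, H5D_FILL_TIME_IFSET);
```

For ZFP, runs of a constant fill value such as zero typically compress very well, so the default is usually a reasonable choice.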

Finally, HDF5 manages a chunk cache and data sieving buffer to help alleviate some of the I/O performance issues that can be encountered in these situations.
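Both of these are adjustable through property lists. A sketch, assuming a file access property list fapl and a dataset access property list dapl already exist; the sizes shown are illustrative, not recommendations:

```c
/* Sketch: tune HDF5's data sieving buffer (file access property list)
   and per-dataset chunk cache (dataset access property list).
   fapl and dapl are assumed to already exist. */
H5Pset_sieve_buf_size(fapl, 1 << 20);    /* 1 MiB sieve buffer */
H5Pset_chunk_cache(dapl,
    521,                                 /* number of hash slots (a prime) */
    64 * 1024 * 1024,                    /* 64 MiB of cached chunks */
    H5D_CHUNK_CACHE_W0_DEFAULT);         /* chunk preemption policy weight */
```

Sizing the chunk cache to hold at least one full row of chunks along the access direction helps avoid repeated decompression of the same chunks.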

More Than 3 (or 4) Dimensions

Versions of ZFP 0.5.3 and older support compression in only 1, 2, or 3 dimensions. Versions 0.5.4 and newer also support 4 dimensions.

What if you have a dataset with more dimensions than ZFP can compress? You can still use the H5Z-ZFP filter, but in order to do so you are required to chunk the dataset [1]. Furthermore, you must select a chunk shape such that no more than 3 (or 4 for ZFP 0.5.4 and newer) of its dimensions are non-unity (i.e. of size greater than one).

For example, suppose you are using ZFP 0.5.3 and have a 4D HDF5 dataset you want to compress. You will need to chunk the dataset, and when you define the chunk’s size and shape, you will need to choose which of its 4 dimensions you do not intend to have ZFP compress along by setting the chunk’s size in those dimensions to unity (1). When you do this, as HDF5 processes writes and reads, it organizes the data so that all the H5Z-ZFP filter ever sees are chunks that have extent only in the non-unity dimensions of the chunk.

In the example below, we have a 4D array of shape int dims[] = {256,128,32,16}; that we have intentionally constructed to be smooth in only 2 of its 4 dimensions (i.e. correlation is high in those dimensions). Because of that, we expect ZFP compression to do well along those dimensions, and we do not want ZFP to compress along the other 2. The uncorrelated dimensions here are those with indices 1 (128 in dims[]) and 3 (16 in dims[]). Thus, the chunk size and shape are chosen to set the size in those dimensions to 1: hsize_t hchunk[] = {256,1,32,1};

    if (highd)
    {
     /* dimension indices 0   1   2  3 */
        int dims[] = {256,128,32,16};
        int ucdims[]={1,3}; /* indices of the uncorrelated dimensions */
        hsize_t hdims[] = {256,128,32,16};
        hsize_t hchunk[] = {256,1,32,1};

        buf = gen_random_correlated_array(TYPDBL, 4, dims, 2, ucdims);

        cpid = setup_filter(4, hchunk, zfpmode, rate, acc, prec, minbits, maxbits, maxprec, minexp);

        if (0 > (sid = H5Screate_simple(4, hdims, 0))) SET_ERROR(H5Screate_simple);

        /* write the data WITHOUT compression */
        if (0 > (dsid = H5Dcreate(fid, "highD_original", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT))) SET_ERROR(H5Dcreate);
        if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) SET_ERROR(H5Dwrite);
        if (0 > H5Dclose(dsid)) SET_ERROR(H5Dclose);

        /* write the data with compression */
        if (0 > (dsid = H5Dcreate(fid, "highD_compressed", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) SET_ERROR(H5Dcreate);
        if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) SET_ERROR(H5Dwrite);
        if (0 > H5Dclose(dsid)) SET_ERROR(H5Dclose);

        /* clean up from high dimensional test */
        if (0 > H5Sclose(sid)) SET_ERROR(H5Sclose);
        if (0 > H5Pclose(cpid)) SET_ERROR(H5Pclose);
        free(buf);
    }

What analysis process should you use to select the chunk shape? Depending on what you expect in the way of access patterns in downstream consumers, this can be a challenging question to answer. There are potentially two competing interests. One is optimizing the chunk size and shape for access patterns anticipated by downstream consumers. The other is optimizing the chunk size and shape for compression. These two interests may not be compatible and you may have to compromise between them. We illustrate the issues and trade-offs using an example.

Compression Along the State Iteration Dimension

By state iteration dimension, we refer to the data producer’s main iteration loop(s). For example, the main iteration dimension for many PDE-based simulations is time. But, for some outer loop methods, the main iteration dimension(s) might be some kind of parameter study including multiple parameters.

The challenge here is to manage the data to meet ZFP’s minimum chunklet size requirements. Any H5Dwrite needs at least 4 samples along each dimension ZFP compresses, or there will be wasted space due to padding. This means that data must be buffered along those dimensions before H5Dwrite calls can be issued.

For example, suppose you have a tensor-valued field (e.g. a 3x3 matrix at every point) over a regularly sampled 4D domain (3 spatial dimensions and 1 time dimension). Conceptually, this is a 6-dimensional dataset in HDF5 with one of the dimensions (the time dimension) extendable. So, you are free to define it as a 6-dimensional dataset in HDF5, but you will also have to chunk it. You can select any chunk shape you want, except that no more than 3 (or 4 for ZFP versions 0.5.4 and newer) dimensions of the chunk can be non-unity.

In the code snippet below, we demonstrate this case. A key issue is that using ZFP to compress along the time dimension forces us to keep enough timesteps in memory to match ZFP’s chunklet size of 4.

The code below iterates over 9 timesteps. Each of the first two groups of 4 timesteps is buffered in memory in tbuf. Once 4 timesteps have been buffered, we can issue an H5Dwrite call doing hyperslab partial I/O on the 6D, extendable dataset. But notice that the chunk dimensions (hchunk) are such that only 4 of the 6 dimensions are non-unity. This means ZFP only ever sees something to compress that is essentially 4D.

On the last iteration, we have only one new timestep. So, when we write it, 75% of that write will be wasted due to ZFP chunklet padding. However, if the application were to restart from this time and continue forward, this waste would ultimately be overwritten with new timesteps.

    /* Test six dimensional, time varying array...
           ...a 3x3 tensor valued variable
           ...over a 3D+time domain.
           Dimension sizes are chosen to miss perfect ZFP block alignment.
    */
    if (sixd)
    {
        void *tbuf;
        int t, dims[] = {31,31,31,3,3}; /* a single time instance */
        int ucdims[]={3,4}; /* indices of the uncorrelated dimensions in dims (tensor components) */
        hsize_t hdims[] = {31,31,31,3,3,H5S_UNLIMITED};
        hsize_t hchunk[] = {31,31,31,1,1,4}; /* 4 non-unity, requires >= ZFP 0.5.4 */
        hsize_t hwrite[] = {31,31,31,3,3,4}; /* size/shape of any given H5Dwrite */

        /* Setup the filter properties and create the dataset */
        cpid = setup_filter(6, hchunk, zfpmode, rate, acc, prec, minbits, maxbits, maxprec, minexp);

        /* Create the time-varying, 6D dataset */
        if (0 > (sid = H5Screate_simple(6, hwrite, hdims))) SET_ERROR(H5Screate_simple);
        if (0 > (dsid = H5Dcreate(fid, "6D_extendible", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) SET_ERROR(H5Dcreate);
        if (0 > H5Sclose(sid)) SET_ERROR(H5Sclose);
        if (0 > H5Pclose(cpid)) SET_ERROR(H5Pclose);

        /* Generate a single buffer which we'll modulate by a time-varying function
           to represent each timestep */
        buf = gen_random_correlated_array(TYPDBL, 5, dims, 2, ucdims);

        /* Allocate the "time" buffer where we will buffer up each time step
           until we have enough to span a width of 4 */
        tbuf = malloc(31*31*31*3*3*4*sizeof(double));

        /* Iterate, writing 9 timesteps by buffering in time 4x. The last
           write will contain just one timestep causing ZFP to wind up
           padding all those blocks by 3x along the time dimension.  */
        for (t = 1; t < 10; t++)
        {
            hid_t msid, fsid;
            hsize_t hstart[] = {0,0,0,0,0,t-4}; /* starting offset of this write in the file */
            hsize_t hcount[] = {31,31,31,3,3,4}; /* size/shape of this write */
            hsize_t hextend[] = {31,31,31,3,3,t}; /* new (extended) size of the dataset */

            /* Update (e.g. modulate) the buf data for the current time step */
            modulate_by_time(buf, TYPDBL, 5, dims, t);

            /* Buffer this timestep in memory. Since chunk size in time dimension is 4,
               we need to buffer up 4 time steps before we can issue any writes */
            buffer_time_step(tbuf, buf, TYPDBL, 5, dims, t);

            /* If the buffer isn't full, just continue updating it */
            if (t%4 && t!=9) continue;

            /* For last step, adjust time dim of this write down from 4 to just 1 */
            if (t == 9)
            {
                /* last timestep, write a partial buffer of just 1 timestep */
                hwrite[5] = 1;
                hcount[5] = 1;
                hstart[5] = 8; /* timesteps 0-7 were written by the first two writes */
            }

            /* extend the dataset in time */
            if (t > 4)
                if (0 > H5Dextend(dsid, hextend)) SET_ERROR(H5Dextend);

            /* Create the memory dataspace */
            if (0 > (msid = H5Screate_simple(6, hwrite, 0))) SET_ERROR(H5Screate_simple);

            /* Get the file dataspace to use for this H5Dwrite call */
            if (0 > (fsid = H5Dget_space(dsid))) SET_ERROR(H5Dget_space);

            /* Do a hyperslab selection on the file dataspace for this write*/
            if (0 > H5Sselect_hyperslab(fsid, H5S_SELECT_SET, hstart, 0, hcount, 0)) SET_ERROR(H5Sselect_hyperslab);

            /* Write this iteration to the dataset */
            if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, msid, fsid, H5P_DEFAULT, tbuf)) SET_ERROR(H5Dwrite);
            if (0 > H5Sclose(msid)) SET_ERROR(H5Sclose);
            if (0 > H5Sclose(fsid)) SET_ERROR(H5Sclose);
        }
        if (0 > H5Dclose(dsid)) SET_ERROR(H5Dclose);
        free(buf);
        free(tbuf);
    }
[1] The HDF5 library currently requires chunked layout anyway for any dataset that has any kind of filter applied.