HDF5 Chunking

HDF5’s dataset chunking feature optimizes the data layout on disk so that downstream consumers can read portions of a dataset without reading all of it. This is all the more important when compression filters are applied to datasets, since chunking frees a consumer from having to decompress an entire dataset only to read a portion of it.
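
For reference, chunking is enabled through a dataset creation property list. Below is a minimal sketch; the file name, dataset name, extents, and chunk shape are illustrative only, and error checking is omitted:

    #include "hdf5.h"

    /* Illustrative only: create a chunked (but not yet compressed) 2D dataset */
    hid_t fid, sid, cpid, dsid;
    hsize_t hdims[] = {1024, 1024};   /* dataset extent */
    hsize_t hchunk[] = {128, 128};    /* chunk size/shape */

    fid  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    sid  = H5Screate_simple(2, hdims, 0);
    cpid = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(cpid, 2, hchunk);    /* switches the dataset layout to H5D_CHUNKED */
    /* a compression filter such as H5Z-ZFP would also be added to cpid here */
    dsid = H5Dcreate(fid, "chunked_data", H5T_NATIVE_DOUBLE, sid,
                     H5P_DEFAULT, cpid, H5P_DEFAULT);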

ZFP Chunklets

When using HDF5 chunking with ZFP compression, it is important to account for the fact that ZFP does its work in tiny chunklets of its own, 4 samples on a side in each of the d dimensions of the dataset (rank in HDF5 parlance). This means that, whenever possible, the chunk dimensions you select in HDF5 should be multiples of 4. When a chunk dimension is not a multiple of 4, ZFP winds up with partial chunklets which it pads with useless data, reducing the overall time and space efficiency of the results.

The degree to which this may degrade performance depends on the percentage of a chunk that is padded. Suppose we have a 2D chunk of dimensions 27 x 101. ZFP will have to treat it as 28 x 104, padding each dimension out to the next multiple of 4. The fraction of space wasted due to ZFP chunklet padding is (28x104 - 27x101) / (28x104), which is about 6.4%. On the other hand, consider a 3D chunk that is 1024 x 1024 x 2. ZFP will have to treat it as 1024 x 1024 x 4, resulting in 50% waste.
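
If it helps to quantify this when choosing a chunk shape, the padding overhead is easy to compute. The helper below is a hypothetical utility (not part of H5Z-ZFP or ZFP) that simply reproduces the arithmetic from the two examples above:

    #include <stdio.h>

    /* Hypothetical helper (not part of H5Z-ZFP or ZFP): fraction of a chunk
       that becomes ZFP chunklet padding when each chunk dimension is rounded
       up to the next multiple of 4 */
    static double zfp_padding_waste(int rank, size_t const *chunk)
    {
        double actual = 1, padded = 1;
        int i;
        for (i = 0; i < rank; i++)
        {
            actual *= (double) chunk[i];
            padded *= (double) ((chunk[i] + 3) / 4 * 4); /* round up to multiple of 4 */
        }
        return (padded - actual) / padded;
    }

    int main(void)
    {
        size_t c2[] = {27, 101};       /* the 2D example above */
        size_t c3[] = {1024, 1024, 2}; /* the 3D example above */
        printf("%.1f%%\n", 100 * zfp_padding_waste(2, c2)); /* prints 6.4% */
        printf("%.1f%%\n", 100 * zfp_padding_waste(3, c3)); /* prints 50.0% */
        return 0;
    }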

The latter example is potentially very relevant when attempting to apply ZFP to compress data along the time dimension of a large, 3D simulation. Ordinarily, a simulation advances one time step at a time and so needs to keep only the current timestep in memory. To give ZFP enough width in the time dimension to satisfy its minimum chunklet dimension of 4, the simulation needs to keep 4 timesteps in memory. This is demonstrated in the example below.

More Than 3 (or 4) Dimensions

Versions of ZFP 0.5.3 and older support compression in only 1, 2, or 3 dimensions. Versions of ZFP 0.5.4 and newer also support 4 dimensions.

What if you have a dataset with more dimensions than ZFP can compress? You can still use the H5Z-ZFP filter. But, in order to do so, you are required to chunk the dataset [1]. Furthermore, you must select a chunk size and shape such that no more than 3 (or 4 for ZFP 0.5.4 and newer) of its dimensions are non-unity (that is, of size greater than one).

For example, what if you are using ZFP 0.5.3 and have a 4D HDF5 dataset you want to compress? To do this, you will need to chunk the dataset, and when you define the chunk size and shape, you will need to select which of the 4 dimensions you do not intend ZFP to compress along by setting the chunk size in those dimensions to unity (1). When you do this, as HDF5 processes writes and reads, it organizes the data so that all the H5Z-ZFP filter sees are chunks which have extent only in the non-unity dimensions of the chunk.

In the example below, we have a 4D array of shape int dims[] = {256,128,32,16}; that we have intentionally constructed to be smooth in only 2 of its 4 dimensions (e.g. correlation is high in those dimensions). Because of that, we expect ZFP compression to do well along those dimensions, and we do not want ZFP to compress along the other 2. The uncorrelated dimensions here are those with indices 1 (128 in dims[]) and 3 (16 in dims[]). Thus, our chunk size and shape is chosen to set the size of those dimensions to 1: hsize_t hchunk[] = {256,1,32,1};

    if (highd)
    {
     /* dimension indices 0   1   2  3 */
        int fd, dims[] = {256,128,32,16};
        int ucdims[]={1,3}; /* UNcorrelated dimension indices */
        hsize_t hdims[] = {256,128,32,16};
        hsize_t hchunk[] = {256,1,32,1};

        buf = gen_random_correlated_array(TYPDBL, 4, dims, 2, ucdims);

        cpid = setup_filter(4, hchunk, zfpmode, rate, acc, prec, minbits, maxbits, maxprec, minexp);

        if (0 > (sid = H5Screate_simple(4, hdims, 0))) ERROR(H5Screate_simple);

        /* write the data WITHOUT compression */
        if (0 > (dsid = H5Dcreate(fid, "highD_original", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT))) ERROR(H5Dcreate);
        if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) ERROR(H5Dwrite);
        if (0 > H5Dclose(dsid)) ERROR(H5Dclose);

        /* write the data with compression */
        if (0 > (dsid = H5Dcreate(fid, "highD_compressed", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) ERROR(H5Dcreate);
        if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) ERROR(H5Dwrite);
        if (0 > H5Dclose(dsid)) ERROR(H5Dclose);

        /* clean up from high dimensional test */
        if (0 > H5Sclose(sid)) ERROR(H5Sclose);
        if (0 > H5Pclose(cpid)) ERROR(H5Pclose);
        free(buf);
    }

What analysis process should you use to select the chunk shape? Depending on what you expect in the way of access patterns in downstream consumers, this can be a challenging question to answer. There are potentially two competing interests: optimizing the chunk size and shape for the access patterns anticipated by downstream consumers, and optimizing the chunk size and shape for compression. These two interests may not be compatible, and you may have to compromise between them. We illustrate the issues and tradeoffs using an example.
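
For instance, consider a hypothetical 3D dataset of extent 512 x 512 x 512 whose consumers mostly read individual 2D slices along the last dimension. Two plausible chunk shapes (both purely illustrative) pull in opposite directions:

    /* Illustrative only: two competing chunk shapes for a 512^3 dataset
       whose consumers mostly read 2D slices along the last dimension */
    hsize_t chunk_for_reads[]       = {512, 512, 1}; /* a slice read touches exactly one chunk,
                                                        but ZFP can exploit correlation in only
                                                        2 of the 3 dimensions */
    hsize_t chunk_for_compression[] = {64, 64, 64};  /* all multiples of 4 and compressed in all
                                                        3 dimensions, but a single slice read must
                                                        decompress 64 chunks, each 64 slices deep */

Neither choice is wrong; which one wins depends on how often the data is read versus how much the extra correlated dimension improves compression.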

Compression Along the State Iteration Dimension

By state iteration dimension, we are referring to the main iteration loop(s) of the data producer. For many PDE-based simulations, the main iteration dimension is time. But for some outer-loop methods, the main iteration dimension(s) might be some kind of parameter study involving multiple parameters.

The challenge here is to manage the data in a way that meets ZFP’s minimum chunklet size and shape requirements. Any given H5Dwrite needs at least 4 samples along each dimension ZFP compresses along, or there will be wasted space due to padding. This means data must be buffered along those dimensions before H5Dwrite calls can be issued.

For example, suppose you have a tensor-valued field (e.g. a 3x3 matrix at every point) over a 4D (3 spatial dimensions plus time), regularly sampled domain. Conceptually, this is a 6 dimensional dataset in HDF5 with one of the dimensions (the time dimension) extendible. You are free to define this as a 6 dimensional dataset in HDF5, but you will also have to chunk the dataset. You can select any chunk shape you want, except that no more than 3 (or 4 for ZFP versions 0.5.4 and newer) dimensions of the chunk can be non-unity.

In the code snippet below, we demonstrate this case. The key issue is that because we use ZFP to compress along the time dimension, we must keep enough timesteps in memory to satisfy ZFP’s chunklet size of 4.

The code below iterates over 9 timesteps. The first two groups of 4 timesteps are each buffered in memory in tbuf. Once 4 timesteps have been buffered, we can issue an H5Dwrite call doing hyperslab partial I/O on the 6D, extendible dataset. Notice that the chunk dimensions (hchunk) are such that only 4 of the 6 dimensions are non-unity. This means ZFP only ever sees something to compress that is essentially 4D.

On the last iteration, we have only one new timestep. When we write it, 75% of that write is wasted due to ZFP chunklet padding. However, if the application were to restart from this time and continue forward, that waste would ultimately be overwritten with new timesteps.

    /* Test six dimensional, time varying array...
           ...a 3x3 tensor valued variable
           ...over a 3D+time domain.
           Dimension sizes are chosen to miss perfect ZFP block alignment.
    */
    if (sixd)
    {
        void *tbuf;
        int t, fd, dims[] = {31,31,31,3,3}; /* a single time instance */
        int ucdims[]={3,4}; /* indices of UNcorrelated dimensions in dims (tensor components) */
        hsize_t  hdims[] = {31,31,31,3,3,H5S_UNLIMITED};
        hsize_t hchunk[] = {31,31,31,1,1,4}; /* 4 non-unity, requires >= ZFP 0.5.4 */
        hsize_t hwrite[] = {31,31,31,3,3,4}; /* size/shape of any given H5Dwrite */

        /* Setup the filter properties and create the dataset */
        cpid = setup_filter(6, hchunk, zfpmode, rate, acc, prec, minbits, maxbits, maxprec, minexp);

        /* Create the time-varying, 6D dataset */
        if (0 > (sid = H5Screate_simple(6, hwrite, hdims))) ERROR(H5Screate_simple);
        if (0 > (dsid = H5Dcreate(fid, "6D_extendible", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) ERROR(H5Dcreate);
        if (0 > H5Sclose(sid)) ERROR(H5Sclose);
        if (0 > H5Pclose(cpid)) ERROR(H5Pclose);

        /* Generate a single buffer which we'll modulate by a time-varying function
           to represent each timestep */
        buf = gen_random_correlated_array(TYPDBL, 5, dims, 2, ucdims);

        /* Allocate the "time" buffer where we will buffer up each time step
           until we have enough to span a width of 4 */
        tbuf = malloc(31*31*31*3*3*4*sizeof(double));

        /* Iterate, writing 9 timesteps by buffering in time 4x. The last
           write will contain just one timestep causing ZFP to wind up 
           padding all those blocks by 3x along the time dimension.  */
        for (t = 1; t < 10; t++)
        {
            hid_t msid, fsid;
            hsize_t hstart[] = {0,0,0,0,0,t-4}; /* starting offset ('corner') of this hyperslab write */
            hsize_t hcount[] = {31,31,31,3,3,4}; /* size/shape of any given H5Dwrite */
            hsize_t hextend[] = {31,31,31,3,3,t}; /* new (total) dataset size after extending in time */

            /* Update (e.g. modulate) the buf data for the current time step */
            modulate_by_time(buf, TYPDBL, 5, dims, t);

            /* Buffer this timestep in memory. Since chunk size in time dimension is 4,
               we need to buffer up 4 time steps before we can issue any writes */ 
            buffer_time_step(tbuf, buf, TYPDBL, 5, dims, t);

            /* If the buffer isn't full, just continue updating it */
            if (t%4 && t!=9) continue;

            /* For last step, adjust time dim of this write down from 4 to just 1 */
            if (t == 9)
            {
                /* last timestep, write a partial buffer */
                hwrite[5] = 1;
                hcount[5] = 1;
                hstart[5] = t-1; /* the lone new timestep lands just past the 8 already written */
            }

            /* extend the dataset in time */
            if (t > 4)
                H5Dextend(dsid, hextend);

            /* Create the memory dataspace */
            if (0 > (msid = H5Screate_simple(6, hwrite, 0))) ERROR(H5Screate_simple);

            /* Get the file dataspace to use for this H5Dwrite call */
            if (0 > (fsid = H5Dget_space(dsid))) ERROR(H5Dget_space);

            /* Do a hyperslab selection on the file dataspace for this write*/
            if (0 > H5Sselect_hyperslab(fsid, H5S_SELECT_SET, hstart, 0, hcount, 0)) ERROR(H5Sselect_hyperslab);

            /* Write this iteration to the dataset */
            if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, msid, fsid, H5P_DEFAULT, tbuf)) ERROR(H5Dwrite);
            if (0 > H5Sclose(msid)) ERROR(H5Sclose);
            if (0 > H5Sclose(fsid)) ERROR(H5Sclose);
        }
        if (0 > H5Dclose(dsid)) ERROR(H5Dclose);
        free(buf);
        free(tbuf);
    }
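
Once written this way, a downstream consumer benefits from the chunked layout by reading back only the timesteps it needs. The sketch below (assuming the same 6D_extendible dataset and sizes as above, with error checking omitted) reads a single timestep via a hyperslab selection. HDF5 decompresses only the chunks that intersect the selection, though each such chunk still spans 4 timesteps internally.

    /* Sketch, assuming the dataset written above: read back only timestep 'ts'.
       fid is an already open handle to the file containing that dataset. */
    hid_t dsid, fsid, msid;
    hsize_t ts = 2;                           /* illustrative timestep index */
    hsize_t hstart[] = {0,0,0,0,0,ts};        /* offset of the desired timestep */
    hsize_t hcount[] = {31,31,31,3,3,1};      /* one timestep's worth of samples */
    double *rbuf = malloc(31*31*31*3*3*sizeof(double));

    dsid = H5Dopen(fid, "6D_extendible", H5P_DEFAULT);
    fsid = H5Dget_space(dsid);
    H5Sselect_hyperslab(fsid, H5S_SELECT_SET, hstart, 0, hcount, 0);
    msid = H5Screate_simple(6, hcount, 0);
    H5Dread(dsid, H5T_NATIVE_DOUBLE, msid, fsid, H5P_DEFAULT, rbuf);
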
[1] The HDF5 library currently requires chunking anyway for any dataset that has any kind of filter applied.