Direct Writes (and Reads)

The purpose of direct writes is to enable an application to write data that is already compressed in memory directly to an HDF5 file, without first uncompressing it only to have the filter turn around and re-compress it during the write. However, once data is written to the file with a direct write, consumers must still be able to read it without any knowledge of how the producer wrote it.

Doing this requires the use of an advanced HDF5 function, H5Dwrite_chunk, for direct writes.

At present, we demonstrate only minimal functionality here using single chunking, where the chunk size is chosen to match the size of the entire dataset. To see an example of code that does this, have a look at…

    if (zfparr>0 && zfpmode==1 && rate>0)
    {
        int            dims[] = {38, 128};
       /*int      chunk_dims[] = {19, 34};*/
        int      chunk_dims[] = {38, 128};
        hsize_t       hdims[] = {38, 128};
       /*hsize_t hchunk_dims[] = {19, 34};*/
        hsize_t hchunk_dims[] = {38, 128};
        hsize_t hchunk_off[] = {0, 0};
#if defined(ZFP_LIB_VERSION) && ZFP_LIB_VERSION<=0x055
        cfp_array2d *origarr;
#else
        cfp_array2d origarr;
#endif

        /* Create the array data */
        buf = gen_random_correlated_array(TYPDBL, 2, dims, 0, 0);

        /* Instantiate a cfp array */
        origarr = cfp.array2d.ctor(dims[1], dims[0], rate, buf, 0);
        cfp.array2d.flush_cache(origarr);

        cpid = setup_filter(2, hchunk_dims, 1, rate, acc, prec, minbits, maxbits, maxprec, minexp);

        if (0 > (sid = H5Screate_simple(2, hdims, 0))) SET_ERROR(H5Screate_simple);

        /* write the data WITHOUT compression */
        if (0 > (dsid = H5Dcreate(fid, "zfparr_original", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT))) SET_ERROR(H5Dcreate);
        if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) SET_ERROR(H5Dwrite);
        if (0 > H5Dclose(dsid)) SET_ERROR(H5Dclose);

        /* write the data with compression via the filter */
        if (0 > (dsid = H5Dcreate(fid, "zfparr_compressed", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) SET_ERROR(H5Dcreate);
        if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) SET_ERROR(H5Dwrite);
        if (0 > H5Dclose(dsid)) SET_ERROR(H5Dclose);

        /* write the data direct from compressed array using H5Dwrite_chunk calls */
        if (0 > (dsid = H5Dcreate(fid, "zfparr_direct", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) SET_ERROR(H5Dcreate);
        if (0 > H5Dwrite_chunk(dsid, H5P_DEFAULT, 0, hchunk_off, cfp.array2d.compressed_size(origarr), cfp.array2d.compressed_data(origarr))) SET_ERROR(H5Dwrite_chunk);

        if (0 > H5Dclose(dsid)) SET_ERROR(H5Dclose);

        free(buf);
        cfp.array2d.dtor(origarr);
    }

In particular, look for the line using H5Dwrite_chunk in place of H5Dwrite. In all other respects, the code looks the same.

The test case for this code writes the uncompressed data as a dataset named zfparr_original, the compressed data as a dataset named zfparr_compressed using the filter, and then the compressed data a second time as a dataset named zfparr_direct using a direct write. The h5diff tool is then used to compare the data in the original and direct-write datasets.

Note that in order for consumers to work as normal, the producer must set dataset creation properties as it ordinarily would when using the H5Z-ZFP filter. The H5Dwrite_chunk call itself bypasses the filter pipeline entirely; its filters mask argument records which filters, if any, were skipped for the chunk being written. Passing a mask of zero, as in the code above, records that all filters (here, the ZFP filter) have already been applied to the data, so consumers will invoke the filter normally to decompress on read.
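To illustrate the consumer side, here is a minimal sketch of reading the direct-written dataset back with an ordinary H5Dread and comparing it against the uncompressed original, much as the h5diff check in the test case does. The filename direct.h5 is a hypothetical assumption (the producer code above does not show the file name), and the H5Z-ZFP filter plugin is assumed to be discoverable by the HDF5 library (e.g. via HDF5_PLUGIN_PATH).

```c
#include <math.h>
#include <stdio.h>
#include <hdf5.h>

int main(void)
{
    double orig[38][128];    /* dims match the producer code above */
    double direct[38][128];
    hid_t fid, dsid;
    int i, j;

    /* "direct.h5" is a hypothetical name for the file the producer created */
    if ((fid = H5Fopen("direct.h5", H5F_ACC_RDONLY, H5P_DEFAULT)) < 0)
        return 1;

    /* Read the uncompressed original dataset */
    if ((dsid = H5Dopen(fid, "zfparr_original", H5P_DEFAULT)) < 0)
        return 1;
    if (H5Dread(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, orig) < 0)
        return 1;
    H5Dclose(dsid);

    /* Read the direct-written dataset with an *ordinary* H5Dread. HDF5 sees
       the ZFP filter in the dataset creation properties and a chunk filter
       mask of zero, so it invokes H5Z-ZFP to decompress transparently. */
    if ((dsid = H5Dopen(fid, "zfparr_direct", H5P_DEFAULT)) < 0)
        return 1;
    if (H5Dread(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, direct) < 0)
        return 1;
    H5Dclose(dsid);
    H5Fclose(fid);

    /* Both datasets were produced from the same rate-mode cfp array, so the
       decompressed values should agree to within the rate's precision. */
    for (i = 0; i < 38; i++)
        for (j = 0; j < 128; j++)
            if (fabs(orig[i][j] - direct[i][j]) > 1e-6)
                printf("mismatch at [%d][%d]\n", i, j);

    return 0;
}
```

Note that the consumer needs no special code path for the direct-written dataset; the fact that the chunk was written with H5Dwrite_chunk is invisible to it.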