📊 Understanding HDF5

A Practical Guide for InSAR Data Storage - Version 2.0 with multi-track and multi-product support

1. What is HDF5?

📚 HDF5 = Hierarchical Data Format version 5

HDF5 is a file format designed to store and organize large amounts of scientific data. Think of it as a sophisticated container that can hold:

  • Large arrays of numerical data (like your interferogram images)
  • Geographic coordinates (longitude/latitude for every data point)
  • Metadata (information about the data)
  • Multiple datasets in a single file
  • Hierarchical organization (like folders and files)

🏢 File System Analogy

HDF5 is like a miniature file system inside a single file:

  • Groups = Folders/Directories
  • Datasets = Files (containing actual data arrays)
  • Attributes = File properties/metadata (like file tags or EXIF data in photos)

Just like you organize files on your computer into folders, HDF5 lets you organize data arrays into groups!

Why Use HDF5 for InSAR?

Advantage | What It Means for InSAR
Self-Describing | All metadata travels with the data - you know what satellite, what dates, what processing software and methods were used
Georeferenced | Built-in geographic coordinates (lon/lat) for every pixel - no separate geolocation files needed
Efficient Storage | Built-in compression can reduce file sizes by 50-90%
Multiple Datasets | Store unwrapped phase, wrapped phase, correlation, time series, velocity, AND coordinates all in one file
Multi-Product Support | NEW in v2.0: Store interferograms, time series, AND velocity in the same file
Partial Reading | Read just the part of the image you need without loading the whole file
Cross-Platform | Works on Linux, Mac, Windows - same file everywhere
Language Support | Can read with Python, MATLAB, R, C++, Java, etc.

2. HDF5 File Structure

The Three Building Blocks

🗂️ 1. Groups (The Folders)

Groups organize your data hierarchically, just like folders on your computer.

  • Can contain other groups (subfolders)
  • Can contain datasets (files)
  • Can have attributes attached to them
  • Named with paths like: /ALOS2_073_A/INTERFEROGRAM/20240101_20240113/
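
A minimal sketch of how groups behave in Python, assuming the h5py library is installed. The file name is arbitrary, and the file is kept in memory (driver="core") so the example leaves nothing on disk.

```python
import h5py

# In-memory file: nothing is written to disk (the file name is arbitrary).
with h5py.File("groups_demo.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups in the path are created automatically.
    pair = f.create_group("ALOS2_073_A/INTERFEROGRAM/20240101_20240113")

    # A group behaves like a dictionary keyed by child name.
    track = f["ALOS2_073_A"]
    track.create_group("VELOCITY")

    pair_path = pair.name                    # absolute path of the group
    track_children = sorted(track.keys())    # names of the direct children
    has_pair = "INTERFEROGRAM/20240101_20240113" in track

print(pair_path)
print(track_children)
```

Relative paths such as "INTERFEROGRAM/20240101_20240113" are resolved from the group they are applied to, much like relative paths in a folder.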

📊 2. Datasets (The Actual Data)

Datasets are multi-dimensional arrays that hold your actual numerical data.

  • Can be 1D (like a list), 2D (like an image), 3D (like a video), or higher
  • Have a specific data type (float32, int16, etc.)
  • Can be compressed to save space
  • Can be read partially (just a slice)
  • Example: A 1000×1200 array of phase values or coordinates
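
The dataset properties listed above can be sketched with h5py and NumPy as follows. The phase values are random stand-ins, and the chunk shape (100, 120) is a hypothetical choice rather than a recommendation from this format.

```python
import numpy as np
import h5py

rng = np.random.default_rng(0)
phase = rng.normal(size=(1000, 1200)).astype(np.float32)  # synthetic stand-in data

with h5py.File("datasets_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "unwrapped_interferogram",
        data=phase,
        chunks=(100, 120),       # hypothetical chunk shape
        compression="gzip",
        compression_opts=6,      # gzip level
    )
    # Slicing reads only the chunks that overlap the requested window.
    window = dset[200:300, 500:600]
    shape, dtype = dset.shape, dset.dtype

print(shape, dtype, window.shape)
```

The slice comes back as an ordinary NumPy array, so the rest of the image never has to be loaded into memory.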

🏷️ 3. Attributes (The Metadata)

Attributes are small pieces of metadata attached to groups or datasets.

  • Store descriptive information (not large data)
  • Usually simple values: strings, numbers, small arrays
  • Examples: platform name, date, units, description, coordinate system
  • Can be attached to root, groups, or datasets

In this documentation, we use @ to indicate attributes: @platform means an attribute named "platform"
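
As an illustration, attributes can be written and read through the .attrs mapping that h5py exposes on the file, on groups, and on datasets. This is a sketch with placeholder array contents, not a complete v2.0 file.

```python
import numpy as np
import h5py

with h5py.File("attrs_demo.h5", "w", driver="core", backing_store=False) as f:
    f.attrs["processing_software"] = "ISCE2 + MintPy"   # attribute of root

    track = f.create_group("ALOS2_073_A")
    track.attrs["platform"] = "ALOS-2"                   # attribute of a group
    track.attrs["relative_orbit"] = 73

    lon = track.create_dataset("longitude", data=np.zeros((4, 5), dtype=np.float32))
    lon.attrs["units"] = "degrees_east"                  # attribute of a dataset
    lon.attrs["valid_range"] = [-180.0, 180.0]           # small numeric array

    platform = track.attrs["platform"]
    orbit = int(track.attrs["relative_orbit"])
    valid_range = lon.attrs["valid_range"].tolist()
    track_attr_names = sorted(track.attrs.keys())

print(platform, orbit, valid_range)
```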

Visual Structure Example (Version 2.0 Multi-Product Format with Coordinates)

📁 myfile.h5 (the HDF5 file itself)
├── @processing_software = "ISCE2 + MintPy" (attribute of root)
├── @history = "2024-01-15T10:30:00"
├── @sign_convention = "Positive toward sensor"
├── 📁 ALOS2_073_A/ (track group)
│   │
│   ├── @product_types = ["INTERFEROGRAM", "TIMESERIES", "VELOCITY"] ← NEW!
│   ├── @coordinate_reference_system = "EPSG:4326" ← REQUIRED!
│   ├── @platform = "ALOS-2"
│   ├── @relative_orbit = 73
│   ├── @flight_direction = "A"
│   ├── @wavelength = 0.236
│   ├── @first_date = "2024-01-01"
│   ├── @last_date = "2024-02-08"
│   │
│   ├── 📄 longitude (1000×1200 array) ← Geographic coordinates (WGS84)
│   │   ├── @units = "degrees_east"
│   │   └── @valid_range = [-180.0, 180.0]
│   │
│   ├── 📄 latitude (1000×1200 array) ← Geographic coordinates (WGS84)
│   │   ├── @units = "degrees_north"
│   │   └── @valid_range = [-90.0, 90.0]
│   │
│   ├── 📄 line_of_sight_e (1000×1200 array) ← Shared LOS vectors
│   │   ├── @description = "LOS East component"
│   │   └── @units = "dimensionless"
│   │
│   ├── 📄 line_of_sight_n (dataset)
│   ├── 📄 line_of_sight_u (dataset)
│   │
│   ├── 📁 INTERFEROGRAM/ ← Product group 1
│   │   └── 📁 20240101_20240113/ (date pair)
│   │       ├── @reference_date = "20240101"
│   │       ├── @secondary_date = "20240113"
│   │       ├── 📄 unwrapped_interferogram (radians)
│   │       ├── 📄 wrapped_interferogram
│   │       └── 📄 correlation
│   │
│   ├── 📁 TIMESERIES/ ← Product group 2
│   │   ├── @reference_date = "20240101"
│   │   ├── 📄 dLOS_20240101 (meters)
│   │   ├── 📄 dLOS_20240113
│   │   └── 📄 dLOS_20240125
│   │
│   └── 📁 VELOCITY/ ← Product group 3
│       ├── 📄 velocity (m/year)
│       └── 📄 velocity_std
└── 📁 S1_064_D/ (another track)
    ├── 📄 longitude ← Each track has its own coordinates
    ├── 📄 latitude
    └── ...

💡 Understanding the Structure

Computer File System → HDF5 Equivalent:

  • /Users/username/Documents/Project/ → /ALOS2_073_A/INTERFEROGRAM/
  • photo.jpg → unwrapped_interferogram (dataset)
  • coordinates.txt → longitude and latitude (datasets)
  • File properties (size, date, camera model) → Attributes (@units, @description)
  • Multiple albums in one folder → Multiple product groups in one track
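
To make the file-system comparison concrete, the sketch below walks an HDF5 file the way a recursive directory listing walks folders. It assumes h5py; list_tree is a hypothetical helper written for this example, and the small file it walks is a placeholder.

```python
import numpy as np
import h5py

def list_tree(h5file):
    """Return (path, kind) pairs for every group and dataset below the root."""
    entries = []

    def visit(name, obj):
        kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
        entries.append(("/" + name, kind))   # visititems passes relative names

    h5file.visititems(visit)
    return entries

with h5py.File("tree_demo.h5", "w", driver="core", backing_store=False) as f:
    grid = np.zeros((2, 3), dtype=np.float32)
    track = f.create_group("ALOS2_073_A")
    track.create_dataset("longitude", data=grid)
    track.create_dataset("VELOCITY/velocity", data=grid)  # creates VELOCITY/ too
    tree = sorted(list_tree(f))

for path, kind in tree:
    print(f"{kind:8s} {path}")
```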

3. Key Components Explained

Groups in Detail

What Groups Do:

  • Organize data logically: All data for one track in one group
  • Create hierarchy: /ALOS2_073_A/INTERFEROGRAM/20240101_20240113/unwrapped_interferogram
  • Separate product types: INTERFEROGRAM/, TIMESERIES/, VELOCITY/ subgroups

In Your InSAR File (Version 2.0):

  • /ALOS2_073_A/ - Contains all data for ALOS-2 track 73 ascending
  • /ALOS2_073_A/INTERFEROGRAM/ - Contains all interferogram date pairs for this track
  • /ALOS2_073_A/TIMESERIES/ - Contains displacement time series for this track
  • /ALOS2_073_A/VELOCITY/ - Contains velocity products for this track
  • /S1_064_D/ - Contains all data for Sentinel-1 track 64 descending
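
The two track names above look like they follow a platform_orbit_direction pattern. The sketch below splits a name on that assumption; the pattern is inferred from these two examples only, and parse_track_name is a hypothetical helper, not part of any library.

```python
from typing import NamedTuple

class TrackName(NamedTuple):
    platform_code: str
    relative_orbit: int
    flight_direction: str

def parse_track_name(name: str) -> TrackName:
    """Split a track group name, assuming '<platform>_<orbit>_<A|D>'."""
    platform_code, orbit, direction = name.strip("/").rsplit("_", 2)
    if direction not in ("A", "D"):
        raise ValueError(f"unexpected flight direction in {name!r}")
    return TrackName(platform_code, int(orbit), direction)

alos = parse_track_name("/ALOS2_073_A/")
s1 = parse_track_name("S1_064_D")
print(alos)
print(s1)
```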

Datasets in Detail

Dataset Properties:

  • Shape: Dimensions of the array (e.g., 1000 rows × 1200 columns)
  • Data Type: float32 (32-bit floating point), int16 (16-bit integer), etc.
  • Chunks: How the data is divided for efficient storage/reading
  • Compression: How the data is compressed (gzip, lzf, etc.)

Example Dataset:

Dataset: unwrapped_interferogram
  Shape: (1000, 1200)
  Type: float32
  Size: 4.58 MB (uncompressed)
  Compression: gzip level 6
  Actual Size: 1.2 MB (74% reduction)
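
The sizes in this example can be checked with a few lines of arithmetic. The 1.2 MB compressed size is simply the figure quoted above, and "MB" is taken to mean 2**20 bytes; real compression ratios depend on the data.

```python
BYTES_PER_FLOAT32 = 4

rows, cols = 1000, 1200
raw_bytes = rows * cols * BYTES_PER_FLOAT32      # uncompressed size in bytes
raw_mb = raw_bytes / 1024**2                     # "MB" here means 2**20 bytes

compressed_mb = 1.2                              # figure quoted in the example
reduction_pct = 100 * (1 - compressed_mb / raw_mb)

print(f"{raw_mb:.2f} MB uncompressed, {reduction_pct:.0f}% reduction")
```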

Coordinate Datasets:

Dataset: longitude
  Shape: (1000, 1200)
  Type: float32
  Units: degrees_east
  Range: [-118.5, -118.0]

Dataset: latitude
  Shape: (1000, 1200)
  Type: float32
  Units: degrees_north
  Range: [34.0, 34.5]
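
For a regularly gridded product, coordinate arrays of this shape could be generated with NumPy as sketched below. The regular grid and the north-at-row-0 ordering are assumptions made for the example; the format only requires a longitude and latitude value for every pixel.

```python
import numpy as np

rows, cols = 1000, 1200
lon_axis = np.linspace(-118.5, -118.0, cols, dtype=np.float32)
lat_axis = np.linspace(34.5, 34.0, rows, dtype=np.float32)  # north at row 0 (assumed)

# With the default 'xy' indexing, both outputs have shape (rows, cols).
longitude, latitude = np.meshgrid(lon_axis, lat_axis)

print(longitude.shape, longitude.dtype)
print(float(latitude.min()), float(latitude.max()))
```

Both arrays match the (1000, 1200) shape of the data arrays, which is what lets each pixel be looked up by the same row and column index.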

Attributes in Detail

⚠️ Attributes vs Datasets - When to Use Which?

Use Attributes for:

  • Small metadata (strings, numbers, small lists)
  • Descriptive information (platform name, dates, units)
  • Configuration values (processing parameters)
  • Product type declarations (@product_types)
  • Coordinate system (@coordinate_reference_system)

Use Datasets for:

  • Large arrays (images, time series, coordinates)
  • Data you want to compress
  • Data you want to read partially
  • Multi-dimensional data

Rule of Thumb: If it's bigger than a few KB, make it a dataset. If it's a description or label, make it an attribute. Coordinates are always datasets because they match the size of your data arrays.
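
The rule of thumb can be written down as a small helper. This is only a sketch: store and the 4 KB cut-off are hypothetical choices for illustration, and h5py and NumPy are assumed.

```python
import numpy as np
import h5py

SIZE_LIMIT_BYTES = 4 * 1024   # hypothetical cut-off for "a few KB"

def store(group, name, value):
    """Store small values as attributes and large arrays as compressed datasets."""
    if isinstance(value, str) or np.asarray(value).nbytes <= SIZE_LIMIT_BYTES:
        group.attrs[name] = value
        return "attribute"
    group.create_dataset(name, data=np.asarray(value), compression="gzip")
    return "dataset"

with h5py.File("store_demo.h5", "w", driver="core", backing_store=False) as f:
    track = f.create_group("ALOS2_073_A")
    kinds = [
        store(track, "platform", "ALOS-2"),
        store(track, "wavelength", 0.236),
        store(track, "longitude", np.zeros((1000, 1200), dtype=np.float32)),
    ]
    stored_as_attrs = sorted(track.attrs.keys())
    stored_as_datasets = sorted(track.keys())

print(kinds)
```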

4. Quick Reference Summary

Concept | What It Is | Example (V2.0) | Python
Group | Container (like a folder) | /ALOS2_073_A/INTERFEROGRAM/ | f.create_group('INTERFEROGRAM')
Dataset | Data array (like a file) | unwrapped_interferogram, longitude | f.create_dataset('name', data=array)
Attribute | Metadata (like file properties) | @platform = "ALOS-2" | f.attrs['platform'] = 'ALOS-2'
Coordinates | Geographic location of data | longitude, latitude | track.create_dataset('longitude', data=lon_array)
CRS | Coordinate reference system | @coordinate_reference_system = "EPSG:4326" | track.attrs['coordinate_reference_system'] = 'EPSG:4326'
Product Types | Declares available products | @product_types = ["INTERFEROGRAM"] | track.attrs['product_types'] = '["INTERFEROGRAM"]'
Track | One satellite/orbit combination | ALOS2_073_A, S1_064_D | track = f.create_group('ALOS2_073_A')
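
The Product Types row writes the list as a quoted string. Assuming that string is meant to be JSON (an interpretation of the quoting, not something this guide states), the standard json module can produce it and turn it back into a Python list:

```python
import json

products = ["INTERFEROGRAM", "TIMESERIES", "VELOCITY"]

encoded = json.dumps(products)    # string form that would go into @product_types
decoded = json.loads(encoded)     # back to a Python list when reading
has_velocity = "VELOCITY" in decoded

print(encoded)
```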

🎯 Remember:

  • HDF5 = miniature file system in a single file
  • Groups = folders, Datasets = files, Attributes = properties
  • @ symbol indicates attributes in documentation
  • NEW v2.0: Geographic coordinates (lon/lat) REQUIRED for all tracks
  • NEW v2.0: Must use EPSG:4326 (WGS84) - no other CRS allowed
  • NEW v2.0: Multiple products in one file via product groups
  • Coordinates stored once at track level (shared by all products)
  • Each track declares: @product_types and @coordinate_reference_system
  • Three product groups: INTERFEROGRAM/, TIMESERIES/, VELOCITY/
  • LOS vectors stored once at track level (shared)
  • The TIMESERIES group requires a @reference_date attribute
  • Different units: radians (IFG), meters (TS), m/year (VEL), degrees (coords)
  • All data arrays must match coordinate array dimensions
  • You can use any combination of product types!
  • Always compress, always document with attributes
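
Several of these rules can be checked mechanically. The sketch below covers only a subset (CRS, @product_types, coordinate datasets, @reference_date, matching shapes) and is not an official validator; check_track is a hypothetical helper, and h5py and NumPy are assumed.

```python
import numpy as np
import h5py

PRODUCT_GROUPS = ("INTERFEROGRAM", "TIMESERIES", "VELOCITY")

def check_track(track):
    """Return a list of problems found in one track group (empty if none)."""
    problems = []
    if track.attrs.get("coordinate_reference_system") != "EPSG:4326":
        problems.append("CRS must be EPSG:4326")
    if "product_types" not in track.attrs:
        problems.append("missing @product_types")
    if "longitude" not in track or "latitude" not in track:
        problems.append("missing longitude/latitude")
        return problems   # shape checks below need the coordinate grid

    grid_shape = track["longitude"].shape
    if track["latitude"].shape != grid_shape:
        problems.append("latitude shape differs from longitude")
    if "TIMESERIES" in track and "reference_date" not in track["TIMESERIES"].attrs:
        problems.append("TIMESERIES lacks @reference_date")

    def visit(name, obj):
        # Only datasets inside the three product groups are compared to the grid.
        in_product = name.split("/")[0] in PRODUCT_GROUPS
        if in_product and isinstance(obj, h5py.Dataset) and obj.shape != grid_shape:
            problems.append(f"{name}: shape {obj.shape} != {grid_shape}")

    track.visititems(visit)
    return problems

with h5py.File("check_demo.h5", "w", driver="core", backing_store=False) as f:
    grid = np.zeros((10, 12), dtype=np.float32)
    track = f.create_group("ALOS2_073_A")
    track.attrs["coordinate_reference_system"] = "EPSG:4326"
    track.attrs["product_types"] = '["VELOCITY"]'
    track.create_dataset("longitude", data=grid)
    track.create_dataset("latitude", data=grid)
    track.create_dataset("VELOCITY/velocity", data=grid)
    ok_report = check_track(track)

    # A dataset whose shape does not match the coordinate grid is reported.
    track.create_dataset("VELOCITY/velocity_std", data=grid[:5])
    bad_report = check_track(track)

print(ok_report)
print(bad_report)
```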