📊 Understanding HDF5

A Practical Guide for InSAR Data Storage

1. What is HDF5?

📚 HDF5 = Hierarchical Data Format version 5

HDF5 is a file format designed to store and organize large amounts of scientific data. Think of it as a sophisticated container that can hold:

  • Large arrays of numerical data (like your interferogram images)
  • Metadata (information about the data)
  • Multiple datasets in a single file
  • Hierarchical organization (like folders and files)

🏢 File System Analogy

HDF5 is like a miniature file system inside a single file:

  • Groups = Folders/Directories
  • Datasets = Files (containing actual data arrays)
  • Attributes = File properties/metadata (like file tags or EXIF data in photos)

Just like you organize files on your computer into folders, HDF5 lets you organize data arrays into groups!

Why Use HDF5 for InSAR?

Advantage | What It Means for InSAR
Self-Describing | All metadata travels with the data - you know what satellite, what dates, what processing software and methods were used
Efficient Storage | Built-in compression can reduce file sizes by 50-90%
Multiple Datasets | Store unwrapped phase, wrapped phase, correlation, time series, and geometry all in one file
Partial Reading | Read just the part of the image you need without loading the whole file
Cross-Platform | Works on Linux, Mac, Windows - same file everywhere
Language Support | Can read with Python, MATLAB, R, C++, Java, etc.

2. HDF5 File Structure

The Three Building Blocks

🗂️ 1. Groups (The Folders)

Groups organize your data hierarchically, just like folders on your computer.

  • Can contain other groups (subfolders)
  • Can contain datasets (files)
  • Can have attributes attached to them
  • Named with paths like: /ALOS2_073_A/20240101_20240113/

📊 2. Datasets (The Actual Data)

Datasets are multi-dimensional arrays that hold your actual numerical data.

  • Can be 1D (like a list), 2D (like an image), 3D (like a video), or higher
  • Have a specific data type (float32, int16, etc.)
  • Can be compressed to save space
  • Can be read partially (just a slice)
  • Example: A 1000×1200 array of phase values

🏷️ 3. Attributes (The Metadata)

Attributes are small pieces of metadata attached to groups or datasets.

  • Store descriptive information (not large data)
  • Usually simple values: strings, numbers, small arrays
  • Examples: platform name, date, units, description
  • Can be attached to root, groups, or datasets

In documentation, we use @ to indicate attributes: @platform means an attribute named "platform"

Visual Structure Example (Version 2.0 Format)

📁 myfile.h5 (the HDF5 file itself)
├── @processing_type = "INTERFEROGRAM" (attribute of root)
├── @processing_software = "ISCE2 v2.6.3"
├── @sign_convention = "Positive LOS displacement corresponds to surface motion toward the sensor"
├── 📁 ALOS2_073_A/ (track group)
│   │
│   ├── @platform = "ALOS-2"
│   ├── @relative_orbit = 73
│   ├── @flight_direction = "A"
│   ├── @wavelength = 0.236
│   ├── @first_date = "2024-01-01"
│   ├── @last_date = "2024-02-08"
│   │
│   ├── 📄 line_of_sight_e (1000×1200 array)
│   │   ├── @description = "LOS East component"
│   │   └── @units = "dimensionless"
│   │
│   ├── 📄 line_of_sight_n (dataset)
│   ├── 📄 line_of_sight_u (dataset)
│   │
│   └── 📁 20240101_20240113/ (interferogram group)
│       │
│       ├── @reference_date = "20240101"
│       ├── @secondary_date = "20240113"
│       ├── @baseline_perp = 45.2
│       │
│       ├── 📄 unwrapped_interferogram (1000×1200 array)
│       │   ├── @description = "Unwrapped phase"
│       │   └── @units = "radians"
│       │
│       ├── 📄 wrapped_interferogram (1000×1200 array)
│       └── 📄 correlation (1000×1200 array)
└── 📁 S1_064_D/ (another track)
    └── ...

💡 Understanding the Structure

Computer File System → HDF5 Equivalent:

  • /Users/username/Documents/ → /ALOS2_073_A/20240101_20240113/
  • photo.jpg → unwrapped_interferogram (dataset)
  • File properties (size, date, camera model) → Attributes (@units, @description)
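
To see this mapping on a real file, it can help to print the hierarchy yourself. The sketch below is one way to do that with h5py's visititems; the helper names (describe, tree_lines) and the tiny demo file are made up for illustration and are not part of the Version 2.0 format.

```python
import os
import tempfile

import h5py
import numpy as np


def describe(name, obj, lines):
    """Append one line for the object, then one line per attribute (@name = value)."""
    depth = name.count("/") + 1 if name else 0
    indent = "  " * depth
    short = name.split("/")[-1]
    if isinstance(obj, h5py.Dataset):
        lines.append(f"{indent}{short} (dataset, shape={obj.shape}, dtype={obj.dtype})")
    elif name:  # a group below the root
        lines.append(f"{indent}{short}/ (group)")
    for key, value in obj.attrs.items():
        lines.append(f"{indent}  @{key} = {value}")


def tree_lines(path):
    """Walk the whole file and return the tree as a list of text lines."""
    lines = []
    with h5py.File(path, "r") as f:
        describe("", f, lines)  # root attributes first
        f.visititems(lambda name, obj: describe(name, obj, lines))
    return lines


# Demo on a throwaway file; names mirror the layout above, the data is fake.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_tree.h5")
with h5py.File(demo_path, "w") as f:
    f.attrs["processing_type"] = "INTERFEROGRAM"
    track = f.create_group("ALOS2_073_A")
    track.attrs["platform"] = "ALOS-2"
    pair = track.create_group("20240101_20240113")
    dset = pair.create_dataset("correlation", data=np.ones((4, 5), dtype="float32"))
    dset.attrs["units"] = "dimensionless"

tree = tree_lines(demo_path)
print("\n".join(tree))
```

Pointing tree_lines at one of your own files is a quick way to check that groups, datasets, and attributes ended up where you expected.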

3. Key Components Explained

Groups in Detail

What Groups Do:

  • Organize data logically: All data for one track in one group
  • Create hierarchy: /ALOS2_073_A/20240101_20240113/unwrapped_interferogram
  • Separate different types: Track groups for spatial data, METADATA_SUMMARY for summaries

In Your InSAR File (Version 2.0):

  • /ALOS2_073_A/ - Contains all data for ALOS-2 track 73 ascending
  • /ALOS2_073_A/20240101_20240113/ - Contains one interferogram pair (for INTERFEROGRAM products)
  • /S1_064_D/ - Contains all data for Sentinel-1 track 64 descending
  • /METADATA_SUMMARY/ - Contains summary tables across all tracks

Datasets in Detail

Dataset Properties:

  • Shape: Dimensions of the array (e.g., 1000 rows × 1200 columns)
  • Data Type: float32 (32-bit floating point), int16 (16-bit integer), etc.
  • Chunks: How the data is divided for efficient storage/reading
  • Compression: How the data is compressed (gzip, lzf, etc.)

Example Dataset:

Dataset: unwrapped_interferogram
Shape: (1000, 1200)
Type: float32
Size: 4.58 MB (uncompressed)
Compression: gzip level 6
Actual Size: 1.2 MB (74% reduction)
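
Here is a minimal sketch of how chunking, compression, and partial reading fit together in h5py. The chunk shape (100, 120) and the synthetic ramp data are assumptions chosen for illustration; the compression ratio you get on real interferograms will differ from anything shown here.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic smooth "phase" with a zeroed block standing in for a masked area.
rows, cols = 1000, 1200
y, x = np.mgrid[0:rows, 0:cols]
phase = (0.01 * x + 0.02 * y).astype("float32")
phase[:300, :] = 0.0

path = os.path.join(tempfile.mkdtemp(), "chunk_demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset(
        "unwrapped_interferogram",
        data=phase,
        chunks=(100, 120),       # assumed chunk shape, for illustration only
        compression="gzip",
        compression_opts=6,      # gzip level 6, as in the example above
    )

with h5py.File(path, "r") as f:
    dset = f["unwrapped_interferogram"]
    raw_bytes = dset.size * dset.dtype.itemsize
    stored_bytes = dset.id.get_storage_size()  # bytes actually used on disk
    # Slicing reads and decompresses only the chunks that overlap the window.
    window = dset[400:500, 480:600]

print(f"uncompressed: {raw_bytes / 2**20:.2f} MiB")
print(f"stored:       {stored_bytes / 2**20:.2f} MiB")
print(f"window shape: {window.shape}")
```

The window here lines up with exactly one chunk, so only that chunk is touched. A window that straddles chunk boundaries still works; it just reads every chunk it overlaps.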

Attributes in Detail

⚠️ Attributes vs Datasets - When to Use Which?

Use Attributes for:

  • Small metadata (strings, numbers, small lists)
  • Descriptive information (platform name, dates, units)
  • Configuration values (processing parameters)

Use Datasets for:

  • Large arrays (images, time series)
  • Data you want to compress
  • Data you want to read partially
  • Multi-dimensional data

Rule of Thumb: If it's bigger than a few KB, make it a dataset. If it's a description or label, make it an attribute.

4. InSAR Product Types in Version 2.0

Three Main Product Types:

The Version 2.0 format supports three types of InSAR products, each with a different data organization:

4.1 INTERFEROGRAM Products

📐 What are Interferograms?

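Interferograms show the phase difference between two SAR acquisitions. The phase records line-of-sight surface displacement between the two dates, along with other contributions such as atmospheric delay. In the wrapped interferogram the phase is known only modulo 2π, and one full 2π cycle corresponds to half a radar wavelength of line-of-sight motion; the unwrapped interferogram has these cycle ambiguities resolved.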

File Structure:

📁 interferogram_file.h5
├── @processing_type = "INTERFEROGRAM"
├── 📁 ALOS2_073_A/
│   ├── @platform = "ALOS-2"
│   ├── @relative_orbit = 73
│   ├── @first_date = "2024-01-01"
│   ├── 📄 line_of_sight_e/n/u
│   └── 📁 20240101_20240113/ ← Date pair groups
│       ├── 📄 unwrapped_interferogram (radians)
│       ├── 📄 wrapped_interferogram (radians)
│       └── 📄 correlation (coherence)

Key Characteristics:

  • Data organized in date-pair groups: YYYYMMDD_YYYYMMDD/
  • Contains unwrapped and wrapped phase in radians
  • Includes correlation (coherence) maps
  • Each pair has metadata: baselines, dates, platforms
  • Used for analyzing individual interferometric pairs
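
Because interferograms are stored in radians while the other product types use meters, you will often need to convert unwrapped phase to line-of-sight displacement. The sketch below uses the standard two-way relation (one 2π cycle = half a wavelength). The function name is hypothetical, @wavelength is assumed to be in meters, and the sign is deliberately left to the caller because it depends on your processor's phase convention and on the file's @sign_convention.

```python
import numpy as np


def phase_to_los_displacement(phase_rad, wavelength_m, *, sign):
    """Convert unwrapped phase (radians) to LOS displacement (meters).

    displacement = sign * wavelength / (4 * pi) * phase

    `sign` (+1.0 or -1.0) is an ASSUMPTION the caller must supply so that the
    result matches the file's @sign_convention; it is not fixed by this guide.
    """
    return sign * wavelength_m / (4.0 * np.pi) * np.asarray(phase_rad)


wavelength = 0.236  # the @wavelength value used for ALOS-2 in this guide, assumed meters

# One full fringe (2*pi radians) should map to half a wavelength.
one_fringe = phase_to_los_displacement(2.0 * np.pi, wavelength, sign=1.0)
print(one_fringe)

# Works on whole arrays too, e.g. an unwrapped_interferogram read from the file.
phase = np.array([[0.0, np.pi], [-np.pi, 4.0 * np.pi]], dtype="float32")
disp = phase_to_los_displacement(phase, wavelength, sign=1.0)
```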

4.2 DISP. TIME SERIES Products

📈 What are Displacement Time Series?

Time series show cumulative surface displacement over time relative to a reference date. These are derived from multiple interferograms using time series analysis methods (SBAS, PS, etc.).

File Structure:

📁 timeseries_file.h5
├── @processing_type = "DISP. TIME SERIES"
├── 📁 ALOS2_073_A/
│   ├── @platform = "ALOS-2"
│   ├── @reference_date = "20240101" ← Track-specific (REQUIRED)
│   ├── 📄 line_of_sight_e/n/u
│   ├── 📄 dLOS_20240101 (meters) ← All zeros (reference)
│   ├── 📄 dLOS_20240113 (meters) ← Cumulative displacement
│   ├── 📄 dLOS_20240125 (meters)
│   └── 📄 dLOS_20240208 (meters)
└── 📁 S1_064_D/
    ├── @reference_date = "20240105" ← Different reference!
    └── 📄 dLOS_...

Key Characteristics:

  • Individual displacement datasets: dLOS_YYYYMMDD
  • Units are meters (not radians)
  • All displacements relative to track-specific reference date
  • Each track MUST have its own @reference_date attribute
  • Reference date dataset contains zeros
  • No date-pair groups - dates stored directly under track
  • Used for monitoring deformation over time
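
Since each date is its own dataset, a common first step is to stack the dLOS_YYYYMMDD datasets into one 3D array so you can pull out the history of a single pixel. The following is a rough sketch of that idea; load_displacement_cube is a hypothetical helper and the demo file holds made-up values.

```python
import os
import tempfile

import h5py
import numpy as np


def load_displacement_cube(track):
    """Return (dates, cube) where cube[i] is the dLOS array for dates[i]."""
    # YYYYMMDD strings sort chronologically, so a plain sort is enough.
    names = sorted(key for key in track.keys() if key.startswith("dLOS_"))
    dates = [name[len("dLOS_"):] for name in names]
    cube = np.stack([track[name][...] for name in names], axis=0)
    return dates, cube


# Demo file with three tiny dates (fake data).
path = os.path.join(tempfile.mkdtemp(), "cube_demo.h5")
with h5py.File(path, "w") as f:
    track = f.create_group("ALOS2_073_A")
    track.attrs["reference_date"] = "20240101"
    for i, date in enumerate(["20240101", "20240113", "20240125"]):
        data = np.full((3, 4), 0.005 * i, dtype="float32")
        track.create_dataset(f"dLOS_{date}", data=data)

with h5py.File(path, "r") as f:
    track = f["ALOS2_073_A"]
    dates, cube = load_displacement_cube(track)
    # The dataset for the reference date should contain only zeros.
    ref_index = dates.index(track.attrs["reference_date"])
    reference_is_zero = bool(np.all(cube[ref_index] == 0))

pixel_series = cube[:, 1, 2]  # displacement history at row 1, column 2
print(dates, cube.shape, reference_is_zero, pixel_series)
```

Stacking loads every date into memory at once. For large scenes you may prefer to read a spatial window from each dataset instead of the full array.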

4.3 LOS_VELOCITY Products

🚀 What are Velocity Maps?

Velocity maps show the average rate of surface motion over a time period. They are derived from linear regression or other methods applied to displacement time series.

File Structure:

📁 velocity_file.h5
├── @processing_type = "LOS_VELOCITY"
├── 📁 ALOS2_073_A/
│   ├── 📄 line_of_sight_e/n/u
│   ├── 📄 velocity (m/year)
│   │   ├── @time_span_start = "2024-01-01"
│   │   └── @time_span_end = "2024-04-01"
│   └── 📄 velocity_std (optional uncertainty)

Key Characteristics:

  • Single velocity dataset per track
  • Units are m/year or mm/year
  • Time span specified in attributes
  • Optional uncertainty/standard deviation field
  • Used for long-term deformation rates
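
To make the link between time series and velocity concrete, here is a minimal sketch of the linear-regression route: a least-squares line fitted to each pixel's displacement history, with the slope taken as the velocity. The function name, the 365.25 days/year conversion, and the synthetic data are assumptions for illustration. Real pipelines usually also handle NaNs, weighting, and uncertainty, which this sketch does not.

```python
from datetime import datetime

import numpy as np


def fit_velocity(dates, cube):
    """Per-pixel least-squares slope in m/year.

    dates: list of 'YYYYMMDD' strings, one per layer of cube
    cube:  array of shape (n_dates, rows, cols), displacement in meters
    """
    t0 = datetime.strptime(dates[0], "%Y%m%d")
    years = np.array(
        [(datetime.strptime(d, "%Y%m%d") - t0).days / 365.25 for d in dates]
    )
    n, rows, cols = cube.shape
    # np.polyfit fits every column of a 2-D y in one call;
    # row 0 of the result is the slope, row 1 the intercept.
    slope, _intercept = np.polyfit(years, cube.reshape(n, -1), deg=1)
    return slope.reshape(rows, cols)


# Synthetic check: a perfectly linear 0.02 m/year signal on a 2x3 grid.
dates = ["20240101", "20240113", "20240125", "20240208"]
days = np.array([0, 12, 24, 38])
true_rate = 0.02
cube = (true_rate * days / 365.25)[:, None, None] * np.ones((1, 2, 3))

velocity = fit_velocity(dates, cube)
print(velocity)
```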

Comparison Table

Feature | INTERFEROGRAM | DISP. TIME SERIES | LOS_VELOCITY
processing_type | "INTERFEROGRAM" | "DISP. TIME SERIES" | "LOS_VELOCITY"
Data Organization | Date-pair groups | Individual date datasets | Single velocity dataset
Dataset Names | YYYYMMDD_YYYYMMDD/ | dLOS_YYYYMMDD | velocity
Units | Radians | Meters (m) | m/year or mm/year
Reference Date | Not applicable | Required (track-level) | Time span in attributes
Typical Use | Raw phase measurements | Time-dependent deformation | Long-term rates
Common Methods | GMTSAR, ISCE, GAMMA, SNAP | MintPy, StaMPS, (X)SBAS | Stacking, StaMPS, (X)SBAS

5. Working with HDF5 in Python

5.1 Creating an INTERFEROGRAM File

creating_interferogram.py
import h5py
import numpy as np

# Create INTERFEROGRAM product
with h5py.File('my_interferogram.h5', 'w') as f:
    
    # Root attributes
    f.attrs['processing_type'] = 'INTERFEROGRAM'
    f.attrs['processing_software'] = 'ISCE2 v2.6.3 + SNAPHU v2.0.5'
    f.attrs['sign_convention'] = 'Positive LOS displacement corresponds to surface motion toward the sensor'
    
    # Create track
    track = f.create_group('ALOS2_073_A')
    track.attrs['platform'] = 'ALOS-2'
    track.attrs['relative_orbit'] = 73
    track.attrs['flight_direction'] = 'A'
    track.attrs['look_direction'] = 'R'
    track.attrs['beam_mode'] = 'WD1'
    track.attrs['beam_swath'] = 'W1'
    track.attrs['wavelength'] = 0.236
    track.attrs['first_date'] = '2024-01-01'
    track.attrs['last_date'] = '2024-02-08'
    track.attrs['time_acquisition'] = '10:23'
    track.attrs['scene_footprint'] = 'POLYGON((-118.5 34.0, -118.0 34.0, -118.0 34.5, -118.5 34.5, -118.5 34.0))'
    
    # LOS unit vectors (track-specific); random placeholders stand in for real geometry
    for comp, label in [('e', 'East'), ('n', 'North'), ('u', 'Up')]:
        los_data = np.random.uniform(0.35, 0.45, (1000, 1200)).astype('float32')
        los = track.create_dataset(f'line_of_sight_{comp}', data=los_data, compression='gzip')
        los.attrs['description'] = f'LOS unit vector - {label} component'
        los.attrs['units'] = 'dimensionless'
    
    # Create interferogram group (date pair)
    ifg_group = track.create_group('20240101_20240113')
    ifg_group.attrs['reference_date'] = '20240101'
    ifg_group.attrs['secondary_date'] = '20240113'
    ifg_group.attrs['temporal_baseline_days'] = 12
    ifg_group.attrs['baseline_perp'] = 45.2
    
    # Add unwrapped phase (radians)
    unwrapped_data = np.random.randn(1000, 1200).astype('float32')
    unwrapped = ifg_group.create_dataset('unwrapped_interferogram', 
                                         data=unwrapped_data, compression='gzip')
    unwrapped.attrs['description'] = 'Unwrapped interferometric phase'
    unwrapped.attrs['units'] = 'radians'

print("✅ Interferogram file created!")

5.2 Creating a DISP. TIME SERIES File

creating_timeseries.py
import h5py
import numpy as np

# Create DISP. TIME SERIES product
with h5py.File('my_timeseries.h5', 'w') as f:
    
    # Root attributes
    f.attrs['processing_type'] = 'DISP. TIME SERIES'
    f.attrs['processing_software'] = 'MintPy v1.5.1'
    f.attrs['sign_convention'] = 'Positive LOS displacement corresponds to surface motion toward the sensor'
    
    # Create track
    track = f.create_group('ALOS2_073_A')
    track.attrs['platform'] = 'ALOS-2'
    track.attrs['relative_orbit'] = 73
    track.attrs['flight_direction'] = 'A'
    track.attrs['look_direction'] = 'R'
    track.attrs['beam_mode'] = 'WD1'
    track.attrs['beam_swath'] = 'W1'
    track.attrs['wavelength'] = 0.236
    track.attrs['first_date'] = '2024-01-01'
    track.attrs['last_date'] = '2024-02-08'  # matches the last date written below
    track.attrs['time_acquisition'] = '10:23'
    track.attrs['scene_footprint'] = 'POLYGON((-118.5 34.0, -118.0 34.0, -118.0 34.5, -118.5 34.5, -118.5 34.0))'
    
    # REQUIRED: Track-specific reference date for time series
    track.attrs['reference_date'] = '20240101'
    
    # LOS unit vectors (random placeholders stand in for real geometry)
    for comp in ('e', 'n', 'u'):
        los_data = np.random.uniform(0.35, 0.45, (1000, 1200)).astype('float32')
        los = track.create_dataset(f'line_of_sight_{comp}', data=los_data, compression='gzip')
        los.attrs['units'] = 'dimensionless'
    
    # Create displacement time series (note: no date-pair groups!)
    dates = ['20240101', '20240113', '20240125', '20240208']
    
    for i, date in enumerate(dates):
        if i == 0:
            # Reference date: all zeros
            disp_data = np.zeros((1000, 1200), dtype='float32')
        else:
            # Cumulative displacement in meters
            disp_data = np.random.randn(1000, 1200).astype('float32') * 0.01 * i
        
        # Create dataset: dLOS_YYYYMMDD
        dset = track.create_dataset(f'dLOS_{date}', data=disp_data, compression='gzip')
        dset.attrs['description'] = 'Cumulative LOS displacement relative to reference date'
        dset.attrs['units'] = 'meters'  # Note: meters, not radians!
        dset.attrs['acquisition_date'] = date
        dset.attrs['reference_date'] = track.attrs['reference_date']  # keep in sync with the track

print("✅ Time series file created!")

5.3 Reading Different Product Types

reading_products.py
import h5py

# Open any InSAR HDF5 file
with h5py.File('my_insar_file.h5', 'r') as f:
    
    # Check processing type
    proc_type = f.attrs['processing_type']
    print(f"Processing type: {proc_type}")
    
    # Get track
    track = f['ALOS2_073_A']
    print(f"Platform: {track.attrs['platform']}")
    
    # Read LOS vectors (same for all product types)
    los_e = track['line_of_sight_e'][:]
    los_n = track['line_of_sight_n'][:]
    los_u = track['line_of_sight_u'][:]
    
    # Read data based on product type
    if proc_type == 'INTERFEROGRAM':
        print("\nReading interferogram...")
        # List interferogram date pairs (keys shaped like YYYYMMDD_YYYYMMDD)
        ifgs = [key for key in track.keys()
                if len(key) == 17 and key[8] == '_' and key.replace('_', '').isdigit()]
        print(f"Found {len(ifgs)} interferogram(s): {ifgs}")
        
        # Read specific interferogram
        ifg = track['20240101_20240113']
        unwrapped = ifg['unwrapped_interferogram'][:]
        print(f"Unwrapped phase shape: {unwrapped.shape}")
        print(f"Units: {ifg['unwrapped_interferogram'].attrs['units']}")  # radians
    
    elif proc_type == 'DISP. TIME SERIES':
        print("\nReading time series...")
        # Check for track-specific reference date (REQUIRED)
        ref_date = track.attrs['reference_date']
        print(f"Track reference date: {ref_date}")
        
        # List all displacement dates
        dates = [key.replace('dLOS_', '') for key in track.keys() 
                 if key.startswith('dLOS_')]
        dates.sort()
        print(f"Found {len(dates)} date(s): {dates}")
        
        # Read specific date
        disp = track['dLOS_20240113'][:]
        print(f"Displacement shape: {disp.shape}")
        print(f"Units: {track['dLOS_20240113'].attrs['units']}")  # meters
    
    elif proc_type == 'LOS_VELOCITY':
        print("\nReading velocity...")
        velocity = track['velocity'][:]
        print(f"Velocity shape: {velocity.shape}")
        print(f"Units: {track['velocity'].attrs['units']}")  # m/year
        print(f"Time span: {track['velocity'].attrs['time_span_start']} to "
              f"{track['velocity'].attrs['time_span_end']}")

print("✅ File read successfully!")

6. Practical Tips

✅ Best Practices

  • Always use compression - Often saves 50-90% space, usually with only a small speed penalty (actual savings depend on your data)
  • Always add attributes - Future you will thank present you
  • Use meaningful names - line_of_sight_e better than los1
  • Be consistent - Same date format everywhere (YYYYMMDD for datasets, YYYY-MM-DD for attributes)
  • Follow the standard - Use Version 2.0 format for new files
  • Choose the right product type - INTERFEROGRAM for raw phase, DISP. TIME SERIES for deformation evolution, LOS_VELOCITY for long-term rates
  • Include track-level metadata - Each track needs platform, orbit, dates, wavelength, etc.
  • Set reference_date for time series - Each track MUST have its own reference_date attribute

⚠️ Common Mistakes to Avoid

  • Don't mix product types - One file = one processing_type
  • Don't forget units - Radians for interferograms, meters for time series, m/year or mm/year for velocity
  • Don't mix up date formats - Use YYYYMMDD for dataset names, YYYY-MM-DD for date attributes
  • Don't forget track-specific LOS - Each track has different viewing geometry
  • Don't use date-pair groups for time series - Use dLOS_YYYYMMDD directly under track
  • Don't forget reference_date for time series - Required at track level (each track can have different reference)
  • Don't forget required track metadata - platform, relative_orbit, flight_direction, wavelength, first_date, last_date, etc.
  • Don't put processing methods at root - atmos_correct_method and post_processing_method are track-level
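
Several of these mistakes can be caught automatically before you share a file. Below is a rough sketch of such a check, not an official validator. The function name is hypothetical, and the required-attribute list covers only the names spelled out above (the "etc." is not filled in), so treat it as a starting point.

```python
import os
import tempfile

import h5py
import numpy as np

# Partial list: only the attribute names this guide names explicitly.
REQUIRED_TRACK_ATTRS = ("platform", "relative_orbit", "flight_direction",
                        "wavelength", "first_date", "last_date")
KNOWN_TYPES = ("INTERFEROGRAM", "DISP. TIME SERIES", "LOS_VELOCITY")


def check_file(path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    with h5py.File(path, "r") as f:
        proc_type = f.attrs.get("processing_type")
        if proc_type not in KNOWN_TYPES:
            problems.append(f"root: unknown or missing processing_type {proc_type!r}")
        for name, track in f.items():
            # Skip non-groups and the summary group; everything else is a track.
            if not isinstance(track, h5py.Group) or name == "METADATA_SUMMARY":
                continue
            for attr in REQUIRED_TRACK_ATTRS:
                if attr not in track.attrs:
                    problems.append(f"{name}: missing @{attr}")
            for comp in "enu":
                if f"line_of_sight_{comp}" not in track:
                    problems.append(f"{name}: missing line_of_sight_{comp}")
            if proc_type == "DISP. TIME SERIES" and "reference_date" not in track.attrs:
                problems.append(f"{name}: missing @reference_date")
    return problems


# Demo: a deliberately incomplete time series file.
path = os.path.join(tempfile.mkdtemp(), "check_demo.h5")
with h5py.File(path, "w") as f:
    f.attrs["processing_type"] = "DISP. TIME SERIES"
    track = f.create_group("ALOS2_073_A")
    track.attrs["platform"] = "ALOS-2"
    track.create_dataset("line_of_sight_e", data=np.zeros((2, 2), dtype="float32"))

problems = check_file(path)
for problem in problems:
    print(problem)
```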

7. Quick Reference Summary

Concept | What It Is | Example (V2.0) | Python
Group | Container (like a folder) | /ALOS2_073_A/ | f.create_group('ALOS2_073_A')
Dataset | Data array (like a file) | unwrapped_interferogram | f.create_dataset('name', data=array)
Attribute | Metadata (like file properties) | @platform = "ALOS-2" | f.attrs['platform'] = 'ALOS-2'
Path | Location in hierarchy | /ALOS2_073_A/20240101_20240113/ | f['ALOS2_073_A']['20240101_20240113']
Track | One satellite/orbit combination | ALOS2_073_A, S1_064_D | track = f.create_group('ALOS2_073_A')

🎯 Remember:

  • HDF5 = miniature file system in a single file
  • Groups = folders, Datasets = files, Attributes = properties
  • @ symbol indicates attributes in documentation
  • Three product types: INTERFEROGRAM, DISP. TIME SERIES, LOS_VELOCITY
  • INTERFEROGRAM: date-pair groups with phase in radians
  • DISP. TIME SERIES: dLOS_YYYYMMDD datasets with displacement in meters + track-level reference_date
  • LOS_VELOCITY: single velocity dataset with rates in m/year
  • Each track has its own LOS geometry - essential for multi-track analysis
  • Each track needs complete metadata: platform, orbit, dates, wavelength, beam mode, etc.
  • Time series: each track MUST have its own @reference_date attribute
  • Always compress, always document with attributes