A Practical Guide for InSAR Data Storage
HDF5 is a file format designed to store and organize large amounts of scientific data. Think of it as a sophisticated container that can hold data arrays, the metadata that describes them, and the hierarchy that organizes them.
HDF5 is like a miniature file system inside a single file.
Just like you organize files on your computer into folders, HDF5 lets you organize data arrays into groups!
| Advantage | What It Means for InSAR |
|---|---|
| Self-Describing | All metadata travels with the data - you know what satellite, what dates, what processing software and methods were used |
| Efficient Storage | Built-in compression (e.g. gzip) can reduce file sizes by 50-90%, depending on the data |
| Multiple Datasets | Store unwrapped phase, wrapped phase, correlation, time series, and geometry all in one file |
| Partial Reading | Read just the part of the image you need without loading the whole file |
| Cross-Platform | Works on Linux, Mac, Windows - same file everywhere |
| Language Support | Can read with Python, MATLAB, R, C++, Java, etc. |
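The "Efficient Storage" and "Partial Reading" rows can be illustrated with a small sketch. It assumes a synthetic, smooth phase ramp in place of real InSAR data, and the file names, dataset name, and chunk shape are hypothetical. Real scenes, especially noisy ones, will compress differently.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic smooth ramp standing in for real data (assumption for illustration).
rows, cols = 1000, 1200
ramp = np.tile(np.linspace(0.0, 10.0, cols, dtype="float32"), (rows, 1))

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "raw.h5")
    gz_path = os.path.join(tmp_dir, "gzip.h5")

    # Write the same array twice: uncompressed, then gzip-compressed in chunks.
    with h5py.File(raw_path, "w") as f:
        f.create_dataset("phase", data=ramp)
    with h5py.File(gz_path, "w") as f:
        f.create_dataset("phase", data=ramp, compression="gzip", chunks=(100, 120))

    raw_size = os.path.getsize(raw_path)
    gz_size = os.path.getsize(gz_path)

    # Partial read: slicing the dataset object reads only the requested
    # window from disk, not the full 1000 x 1200 array.
    with h5py.File(gz_path, "r") as f:
        window = f["phase"][200:300, 400:520]

print(f"uncompressed: {raw_size} bytes, gzip: {gz_size} bytes")
print(f"window shape: {window.shape}")
```

Slicing the h5py dataset object, rather than calling `[:]` first, is what keeps the read partial.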
Groups organize your data hierarchically, just like folders on your computer.
Example: /ALOS2_073_A/20240101_20240113/
Datasets are multi-dimensional arrays that hold your actual numerical data.
Attributes are small pieces of metadata attached to groups or datasets.
In this documentation, @ marks an attribute: @platform means an attribute named "platform".
Computer File System → HDF5 Equivalent:
- Folder /Users/username/Documents/ → group /ALOS2_073_A/20240101_20240113/
- File photo.jpg → unwrapped_interferogram (dataset)
- File properties → attributes (@units, @description)
- Full path → /ALOS2_073_A/20240101_20240113/unwrapped_interferogram

In Your InSAR File (Version 2.0):
- /ALOS2_073_A/ - Contains all data for ALOS-2 track 73 ascending
- /ALOS2_073_A/20240101_20240113/ - Contains one interferogram pair (for INTERFEROGRAM products)
- /S1_064_D/ - Contains all data for Sentinel-1 track 64 descending
- /METADATA_SUMMARY/ - Contains summary tables across all tracks

Example Dataset: /ALOS2_073_A/20240101_20240113/unwrapped_interferogram, with attributes such as @units and @description
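The folder-like layout described above can be inspected programmatically. The following is a minimal sketch using h5py's `visititems`. The helper name `list_contents` and the in-memory demo file are assumptions for illustration, not part of the Version 2.0 format.

```python
import h5py
import numpy as np


def list_contents(h5_file):
    """Return one line per group or dataset, with its attribute names."""
    lines = []

    def visitor(name, obj):
        kind = "group" if isinstance(obj, h5py.Group) else "dataset"
        attr_names = ", ".join(f"@{key}" for key in obj.attrs)
        lines.append(f"/{name} ({kind}) {attr_names}".rstrip())

    h5_file.visititems(visitor)
    return lines


# In-memory demo file (nothing is written to disk).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    pair = f.create_group("ALOS2_073_A/20240101_20240113")
    dset = pair.create_dataset(
        "unwrapped_interferogram", data=np.zeros((4, 5), dtype="float32")
    )
    dset.attrs["units"] = "radians"
    f["ALOS2_073_A"].attrs["platform"] = "ALOS-2"
    contents = list_contents(f)

for line in contents:
    print(line)
```

The same helper can be pointed at a real file opened in read mode to get a quick overview of its tracks and datasets.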
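Use Attributes for: short descriptions and labels, such as units, dates, and platform names.
Use Datasets for: array data, such as phase, displacement, and line-of-sight vectors.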
Rule of Thumb: If it's bigger than a few KB, make it a dataset. If it's a description or label, make it an attribute.
The Version 2.0 format supports three types of InSAR products, each with a different data organization:
Interferograms show the phase difference between two SAR acquisitions. The phase encodes surface displacement along the line of sight; in wrapped form it is known only modulo 2π, and each full cycle corresponds to half a radar wavelength of line-of-sight change.
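The link between phase and displacement can be sketched as a small conversion helper. The function name `phase_to_los_displacement` and the default sign are assumptions: the factor wavelength / (4π) is standard, but the sign depends on the processor's phase convention and should be checked against the file's sign_convention attribute.

```python
import numpy as np


def phase_to_los_displacement(phase_rad, wavelength_m, toward_sensor_sign=-1.0):
    """Convert unwrapped phase (radians) to LOS displacement (meters).

    One 2*pi phase cycle equals half a wavelength of LOS path change, hence
    the factor wavelength / (4*pi). The sign is an assumption and depends on
    the processor's phase convention.
    """
    return toward_sensor_sign * wavelength_m / (4.0 * np.pi) * np.asarray(phase_rad)


# L-band example using the wavelength value from the track attributes in this guide.
one_cycle = phase_to_los_displacement(2.0 * np.pi, wavelength_m=0.236)
print(one_cycle)
```

With these inputs, one full cycle has a magnitude of half the 0.236 m wavelength.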
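File Structure: each track group holds one date-pair group per interferogram, for example /ALOS2_073_A/20240101_20240113/unwrapped_interferogram
Key Characteristics:
- Data are organized in date-pair groups named YYYYMMDD_YYYYMMDD/

Time series show cumulative surface displacement over time relative to a reference date. These are derived from multiple interferograms using time series analysis methods (SBAS, PS, etc.).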
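File Structure: each track group holds one dataset per acquisition date, for example /ALOS2_073_A/dLOS_20240113
Key Characteristics:
- Datasets are named dLOS_YYYYMMDD, one per date

Velocity maps show the average rate of surface motion over a time period. They are derived from linear regression or other methods applied to displacement time series.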
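File Structure: each track group holds a single velocity dataset, for example /ALOS2_073_A/velocity
Key Characteristics:
- One dataset named velocity per track, with the time span stored in its attributes
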
| Feature | INTERFEROGRAM | DISP. TIME SERIES | LOS_VELOCITY |
|---|---|---|---|
| processing_type | "INTERFEROGRAM" | "DISP. TIME SERIES" | "LOS_VELOCITY" |
| Data Organization | Date-pair groups | Individual date datasets | Single velocity dataset |
| Dataset Names | YYYYMMDD_YYYYMMDD/ | dLOS_YYYYMMDD | velocity |
| Units | Radians | Meters (m) | m/year or mm/year |
| Reference Date | Not applicable | Required (track-level) | Time span in attributes |
| Typical Use | Raw phase measurements | Time-dependent deformation | Long-term rates |
| Common Methods | GMTSAR, ISCE, GAMMA, SNAP | MintPy, StaMPS, (X)SBAS | Stacking, StaMPS, (X)SBAS |
Creating an INTERFEROGRAM file:

    import h5py
    import numpy as np

    # Create INTERFEROGRAM product
    with h5py.File('my_interferogram.h5', 'w') as f:
        # Root attributes
        f.attrs['processing_type'] = 'INTERFEROGRAM'
        f.attrs['processing_software'] = 'ISCE2 v2.6.3 + SNAPHU v2.0.5'
        f.attrs['sign_convention'] = 'Positive LOS displacement corresponds to surface motion toward the sensor'

        # Create track
        track = f.create_group('ALOS2_073_A')
        track.attrs['platform'] = 'ALOS-2'
        track.attrs['relative_orbit'] = 73
        track.attrs['flight_direction'] = 'A'
        track.attrs['look_direction'] = 'R'
        track.attrs['beam_mode'] = 'WD1'
        track.attrs['beam_swath'] = 'W1'
        track.attrs['wavelength'] = 0.236
        track.attrs['first_date'] = '2024-01-01'
        track.attrs['last_date'] = '2024-02-08'
        track.attrs['time_acquisition'] = '10:23'
        track.attrs['scene_footprint'] = 'POLYGON((-118.5 34.0, -118.0 34.0, -118.0 34.5, -118.5 34.5, -118.5 34.0))'

        # LOS vectors (track-specific)
        los_e_data = np.random.uniform(0.35, 0.45, (1000, 1200)).astype('float32')
        los_e = track.create_dataset('line_of_sight_e', data=los_e_data, compression='gzip')
        los_e.attrs['description'] = 'LOS unit vector - East component'
        los_e.attrs['units'] = 'dimensionless'

        # Create interferogram group (date pair)
        ifg_group = track.create_group('20240101_20240113')
        ifg_group.attrs['reference_date'] = '20240101'
        ifg_group.attrs['secondary_date'] = '20240113'
        ifg_group.attrs['temporal_baseline_days'] = 12
        ifg_group.attrs['baseline_perp'] = 45.2

        # Add unwrapped phase (radians)
        unwrapped_data = np.random.randn(1000, 1200).astype('float32')
        unwrapped = ifg_group.create_dataset(
            'unwrapped_interferogram', data=unwrapped_data, compression='gzip')
        unwrapped.attrs['description'] = 'Unwrapped interferometric phase'
        unwrapped.attrs['units'] = 'radians'

    print("✅ Interferogram file created!")

Creating a DISP. TIME SERIES file:

    import h5py
    import numpy as np

    # Create DISP. TIME SERIES product
    with h5py.File('my_timeseries.h5', 'w') as f:
        # Root attributes
        f.attrs['processing_type'] = 'DISP. TIME SERIES'
        f.attrs['processing_software'] = 'MintPy v1.5.1'
        f.attrs['sign_convention'] = 'Positive LOS displacement corresponds to surface motion toward the sensor'

        # Create track
        track = f.create_group('ALOS2_073_A')
        track.attrs['platform'] = 'ALOS-2'
        track.attrs['relative_orbit'] = 73
        track.attrs['flight_direction'] = 'A'
        track.attrs['look_direction'] = 'R'
        track.attrs['beam_mode'] = 'WD1'
        track.attrs['beam_swath'] = 'W1'
        track.attrs['wavelength'] = 0.236
        track.attrs['first_date'] = '2024-01-01'
        track.attrs['last_date'] = '2024-04-01'
        track.attrs['time_acquisition'] = '10:23'
        track.attrs['scene_footprint'] = 'POLYGON((-118.5 34.0, -118.0 34.0, -118.0 34.5, -118.5 34.5, -118.5 34.0))'

        # REQUIRED: Track-specific reference date for time series
        track.attrs['reference_date'] = '20240101'

        # LOS vectors
        los_e_data = np.random.uniform(0.35, 0.45, (1000, 1200)).astype('float32')
        los_e = track.create_dataset('line_of_sight_e', data=los_e_data, compression='gzip')
        los_e.attrs['units'] = 'dimensionless'

        # Create displacement time series (note: no date-pair groups!)
        dates = ['20240101', '20240113', '20240125', '20240208']
        for i, date in enumerate(dates):
            if i == 0:
                # Reference date: all zeros
                disp_data = np.zeros((1000, 1200), dtype='float32')
            else:
                # Cumulative displacement in meters
                disp_data = np.random.randn(1000, 1200).astype('float32') * 0.01 * i

            # Create dataset: dLOS_YYYYMMDD
            dset = track.create_dataset(f'dLOS_{date}', data=disp_data, compression='gzip')
            dset.attrs['description'] = 'Cumulative LOS displacement relative to reference date'
            dset.attrs['units'] = 'meters'  # Note: meters, not radians!
            dset.attrs['acquisition_date'] = date
            dset.attrs['reference_date'] = '20240101'

    print("✅ Time series file created!")

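A LOS_VELOCITY file can be sketched in the same way. This is a minimal illustration, not the reference implementation. It fits a least-squares line per pixel with `np.polyfit`, which is one possible reading of the "linear regression" mentioned earlier. The displacement stack is synthetic, the file name `my_velocity.h5` is hypothetical, and the attribute names `time_span_start` and `time_span_end` are taken from the reading example in this guide.

```python
from datetime import datetime

import h5py
import numpy as np

# Synthetic cumulative displacement stack in meters: (n_dates, rows, cols).
dates = ["20240101", "20240113", "20240125", "20240208"]
true_rate = 0.02  # m/year, used only to build the synthetic data
t0 = datetime.strptime(dates[0], "%Y%m%d")
years = np.array(
    [(datetime.strptime(d, "%Y%m%d") - t0).days / 365.25 for d in dates]
)
stack = (true_rate * years)[:, None, None] * np.ones((1, 50, 60))

# Least-squares line per pixel: polyfit accepts y with shape (n_dates, n_pixels)
# and returns one row of coefficients per polynomial term.
slope, intercept = np.polyfit(years, stack.reshape(len(dates), -1), deg=1)
velocity = slope.reshape(stack.shape[1:]).astype("float32")

with h5py.File("my_velocity.h5", "w") as f:
    f.attrs["processing_type"] = "LOS_VELOCITY"
    track = f.create_group("ALOS2_073_A")
    dset = track.create_dataset("velocity", data=velocity, compression="gzip")
    dset.attrs["units"] = "m/year"
    dset.attrs["time_span_start"] = dates[0]
    dset.attrs["time_span_end"] = dates[-1]

print(f"Velocity shape: {velocity.shape}")
```

A real product would also carry the track attributes and LOS vectors shown in the other creation examples.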
Reading any InSAR HDF5 file:

    import h5py

    # Open any InSAR HDF5 file
    with h5py.File('my_insar_file.h5', 'r') as f:
        # Check processing type
        proc_type = f.attrs['processing_type']
        print(f"Processing type: {proc_type}")

        # Get track
        track = f['ALOS2_073_A']
        print(f"Platform: {track.attrs['platform']}")

        # Read LOS vectors (same for all product types)
        los_e = track['line_of_sight_e'][:]
        los_n = track['line_of_sight_n'][:]
        los_u = track['line_of_sight_u'][:]

        # Read data based on product type
        if proc_type == 'INTERFEROGRAM':
            print("\nReading interferogram...")
            # List interferogram date pairs
            ifgs = [key for key in track.keys() if '_' in key and len(key) == 17]
            print(f"Found {len(ifgs)} interferogram(s): {ifgs}")

            # Read specific interferogram
            ifg = track['20240101_20240113']
            unwrapped = ifg['unwrapped_interferogram'][:]
            print(f"Unwrapped phase shape: {unwrapped.shape}")
            print(f"Units: {ifg['unwrapped_interferogram'].attrs['units']}")  # radians

        elif proc_type == 'DISP. TIME SERIES':
            print("\nReading time series...")
            # Check for track-specific reference date (REQUIRED)
            ref_date = track.attrs['reference_date']
            print(f"Track reference date: {ref_date}")

            # List all displacement dates
            dates = [key.replace('dLOS_', '') for key in track.keys()
                     if key.startswith('dLOS_')]
            dates.sort()
            print(f"Found {len(dates)} date(s): {dates}")

            # Read specific date
            disp = track['dLOS_20240113'][:]
            print(f"Displacement shape: {disp.shape}")
            print(f"Units: {track['dLOS_20240113'].attrs['units']}")  # meters

        elif proc_type == 'LOS_VELOCITY':
            print("\nReading velocity...")
            velocity = track['velocity'][:]
            print(f"Velocity shape: {velocity.shape}")
            print(f"Units: {track['velocity'].attrs['units']}")  # m/year
            print(f"Time span: {track['velocity'].attrs['time_span_start']} to "
                  f"{track['velocity'].attrs['time_span_end']}")

    print("✅ File read successfully!")

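For time series products it is often convenient to gather the per-date datasets into one array. The sketch below assumes the dLOS_YYYYMMDD naming used in this guide. The helper `load_displacement_stack` and the small in-memory demo file are hypothetical.

```python
import h5py
import numpy as np


def load_displacement_stack(track):
    """Return (sorted dates, array of shape (n_dates, rows, cols)) for a track group."""
    # YYYYMMDD strings sort chronologically when sorted alphabetically.
    names = sorted(key for key in track.keys() if key.startswith("dLOS_"))
    dates = [name.replace("dLOS_", "") for name in names]
    stack = np.stack([track[name][:] for name in names])
    return dates, stack


# Small in-memory file mimicking the time series layout (written out of order on purpose).
with h5py.File("demo_ts.h5", "w", driver="core", backing_store=False) as f:
    track = f.create_group("ALOS2_073_A")
    for date in ["20240125", "20240101", "20240113"]:
        # Fill each grid with the day of month so the ordering is easy to verify.
        data = np.full((3, 4), float(date[-2:]), dtype="float32")
        track.create_dataset(f"dLOS_{date}", data=data)
    dates, stack = load_displacement_stack(track)

pixel_series = stack[:, 1, 2]  # displacement history at row 1, column 2
print(dates, stack.shape, pixel_series)
```

For large scenes, reading a window from each dataset instead of `[:]` keeps memory use bounded.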
Use descriptive names: line_of_sight_e is better than los1.

| Concept | What It Is | Example (V2.0) | Python |
|---|---|---|---|
| Group | Container (like a folder) | /ALOS2_073_A/ | f.create_group('ALOS2_073_A') |
| Dataset | Data array (like a file) | unwrapped_interferogram | f.create_dataset('name', data=array) |
| Attribute | Metadata (like file properties) | @platform = "ALOS-2" | f.attrs['platform'] = 'ALOS-2' |
| Path | Location in hierarchy | /ALOS2_073_A/20240101_20240113/ | f['ALOS2_073_A']['20240101_20240113'] |
| Track | One satellite/orbit combination | ALOS2_073_A, S1_064_D | track = f.create_group('ALOS2_073_A') |