
The following setup is based on our methodology described in the paper “Towards Standardization of the Earth Observation Data Product Supply Chain – Are OCI Artifacts the Key to Ubiquitous and Scalable EO Data Handling?”

Prerequisites

Before getting started, ensure the following tools are installed:

  • ORAS CLI
  • tar, tree, and jq (available on most Unix-like systems)
  • Docker (if you plan to run a local OCI registry)

To demonstrate the ubiquity and usability of OCI registries in real-world scenarios, we evaluated the following five registries:

  • Docker Hub
  • Quay
  • Harbor
  • AWS ECR
  • Zot

The repositories on Docker Hub and Quay are publicly accessible in read-only mode, so you can directly inspect the evaluation artifacts we used. Harbor and AWS ECR are private instances, so you will need your own cloud subscription and authentication credentials if you want to use them. The Zot registry is the most straightforward option for following along. You can run it locally via:

docker run -p 5000:5000 ghcr.io/project-zot/zot-minimal-linux-amd64:v2.1.2

You can also experiment by running other OCI-compliant registries locally, such as Harbor or Quay.

Note: The OCI registry reference implementation provided by the distribution project supports OCI Artifacts 1.0, but does not implement the OCI 1.1 referrers API. As a result, features such as external signatures or linked metadata are not discoverable using the standard referrers mechanism when using this registry.
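Whether a given registry exposes the referrers API can be probed directly over HTTP. The sketch below builds the referrers endpoint path defined by the OCI distribution specification 1.1 (`/v2/<name>/referrers/<digest>`) and treats a 200 response as "supported". It assumes an unauthenticated registry reachable over plain HTTP, as with the local Zot instance above; the helper names and the all-zero digest are illustrative only.

```python
import urllib.error
import urllib.request


def referrers_url(registry: str, repository: str, digest: str, scheme: str = "http") -> str:
    """Build the referrers endpoint URL for a manifest digest."""
    return f"{scheme}://{registry}/v2/{repository}/referrers/{digest}"


def supports_referrers(registry: str, repository: str, digest: str, scheme: str = "http") -> bool:
    """Return True if the endpoint answers 200; an HTTP error such as 404 counts as unsupported."""
    url = referrers_url(registry, repository, digest, scheme)
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


# Hypothetical digest, used only to show the shape of the URL.
example_digest = "sha256:" + "0" * 64
example_url = referrers_url("localhost:5000", "pastis-2433", example_digest)
print(example_url)
```

Calling `supports_referrers("localhost:5000", "pastis-2433", digest)` with a real manifest digest requires a running registry, so it is not executed here.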

For Python dependencies, a requirements file is provided for convenience:

pip install -r requirements.txt

Reference Dataset and Partitioning Strategy

We used the Panoptic Agricultural Satellite Time Series (PASTIS) dataset as our reference, as it integrates diverse Earth Observation (EO) modalities, including:

  • Optical time-series data from Sentinel-2
  • Radar time-series data from Sentinel-1
  • Very High Resolution (VHR) imagery from SPOT satellites
  • Curated annotations, including label masks and semantic classifications

Rather than preserving the original organization—which grouped data by source and required consumers to search and filter for relevant information—we restructured the dataset into an analysis-ready format using two distinct partitioning schemes:

  • PASTIS-2433: The entire PASTIS dataset is split into 2,433 individual per-patch subsets. Each patch is packaged into a separate TAR archive and added as a layer within a single OCI artifact. The final artifact includes 2,433 layers of approximately 30–35 MB each, as well as a config object that describes the metadata for each patch. This layout enables fine-grained access and maximizes deduplication across patches.

  • PASTIS-t4: The dataset is instead divided into four larger spatial tiles. Each tile represents a distinct region and is packaged into a TAR archive, then added as a layer in the OCI artifact. This results in 4 layers of approximately 15–20 GB each, as well as a config object that captures tile-level metadata. This approach enables high-throughput data access optimized for regional analysis.
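As a rough illustration of the per-patch packaging step, the sketch below writes one uncompressed TAR archive per patch directory. It is not the partitioning script referenced in this section. The staging layout (one directory per patch ID) and the function name `package_patch` are assumptions made for the example, and the file content is a placeholder.

```python
import tarfile
import tempfile
from pathlib import Path


def package_patch(patch_dir: Path, out_dir: Path) -> Path:
    """Write <out_dir>/<patch_id>.tar holding the contents of patch_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tar_path = out_dir / f"{patch_dir.name}.tar"
    # Mode "w" writes an uncompressed archive, matching the plain .tar layers.
    with tarfile.open(tar_path, "w") as tar:
        for item in sorted(patch_dir.iterdir()):
            # arcname drops the patch directory itself, so paths inside the
            # archive start at the modality folder (e.g. DATA_S2/...).
            tar.add(item, arcname=item.name)
    return tar_path


# Hypothetical staging area containing a single placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    patch = Path(tmp) / "staging" / "10001"
    (patch / "DATA_S2").mkdir(parents=True)
    (patch / "DATA_S2" / "S2_10001.npy").write_bytes(b"placeholder")
    archive = package_patch(patch, Path(tmp) / "layers")
    with tarfile.open(archive) as tar:
        members = sorted(tar.getnames())
    print(archive.name, members)
```

The same function would apply to the four-tile scheme by pointing it at tile directories instead of patch directories.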

For our own convenience, the TAR archives (both the 2,433 patches and the 4 tiles) were uploaded to an object storage bucket on OVHCloud. These were used as the source to package the evaluation OCI artifacts before pushing them to various registries.

The partitioning scripts we used are available here (2433 patches) and here (4 tiles).

The scripts used to generate the config files—required for building the OCI artifacts and pushing them to a registry—are available here (2433 patches) and here (4 tiles).

The original spatial metadata for the PASTIS dataset is provided as a GeoJSON file in the data/ folder.

Note: We’ve created a small sample of PASTIS-2433 with just 3 layers and a matching config file. It’s located in the sample/ folder to help you quickly explore the data and get started.

EO Data Packaging and Publishing to OCI registries

The following example illustrates how a sample of the PASTIS-2433 dataset is organized, packaged into OCI-compatible layers, and pushed as a unified artifact to a local registry.

Each layer (i.e., each .tar file) represents a self-contained EO data partition. The package is structured as follows:

sample/
├── 10000.tar
├── 10001.tar
├── 10002.tar
└── config.json

Each layer contains several subdirectories, one per input modality or annotation type. Patch 10001, for example, unpacks to:

├── ANNOTATIONS
│   ├── ParcelIDs_10001.npy
│   └── TARGET_10001.npy
├── DATA_S1A
│   └── S1A_10001.npy
├── DATA_S1D
│   └── S1D_10001.npy
├── DATA_S2
│   └── S2_10001.npy
├── DATA_SPOT
│   └── PASTIS_SPOT6_RVB_1M00_2019
│       └── SPOT6_RVB_1M00_2019_10001.tif
└── INSTANCE_ANNOTATIONS
    ├── HEATMAP_10001.npy
    └── INSTANCES_10001.npy

We use the ORAS CLI to push the data to a locally running OCI registry at localhost:5000. Each .tar file is attached as a layer, and a separate config.json file provides summary metadata.

oras push localhost:5000/pastis-2433:sample \
  --artifact-type application/vnd.whatever.v1+tar \
  --config config.json:application/vnd.oci.image.config.v1+json \
  10000.tar:application/vnd.oci.image.layer.v1.tar \
  10001.tar:application/vnd.oci.image.layer.v1.tar \
  10002.tar:application/vnd.oci.image.layer.v1.tar

This publishes a single OCI artifact composed of:

  • A config.json blob describing the dataset (e.g., patch index, date range, modalities).
  • Three .tar layer blobs, each corresponding to one data partition.
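Each of these blobs is addressed by the SHA-256 digest of its bytes, which is what the manifest records alongside the media type and size. The sketch below shows how such a descriptor can be derived; the function name `describe_blob` is illustrative and not part of ORAS. The two-byte `{}` payload is used because its digest also appears as the empty config in the referrer manifest printed later in this document.

```python
import hashlib


def describe_blob(data: bytes, media_type: str) -> dict:
    """Return an OCI-style descriptor (mediaType, digest, size) for a blob."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    return {"mediaType": media_type, "digest": digest, "size": len(data)}


# The OCI "empty" config is the two-byte JSON document {}.
empty_config = describe_blob(b"{}", "application/vnd.oci.empty.v1+json")
print(empty_config)
```

For the multi-megabyte .tar layers, the same digest would normally be computed by streaming the file in chunks rather than reading it into memory at once.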

The following Python script achieves the same result in a generalized way. Equivalent Bash scripts for publishing to various OCI-compliant registries, with the appropriate registry endpoints and credentials, are available here (2433 patches) and here (4 tiles).

import os
import re
import subprocess

data_dir = "sample"
registry = "localhost:5000"
repo = f"{registry}/pastis-2433:sample"

print(f"Preparing to push to {repo}")

# Collect every per-patch archive (five-digit patch ID, e.g. 10001.tar).
# Sorting makes the layer order deterministic across runs.
layers = []
tar_pattern = re.compile(r"^\d{5}\.tar$")
for filename in sorted(os.listdir(data_dir)):
    if tar_pattern.match(filename):
        full_path = os.path.join(data_dir, filename)
        layers.append(f"{full_path}:application/vnd.oci.image.layer.v1.tar")

if not layers:
    print("No valid layers found")
    raise SystemExit(1)

print(f"Found {len(layers)} layer(s):")
for layer in layers:
    print("  -", layer.split(":")[0])

# Build a single `oras push` call: the config blob plus one
# <file>:<media type> argument per layer.
config_path = os.path.join(data_dir, "config.json")
cmd = [
    "oras", "push", "--verbose", repo,
    "--artifact-type", "application/vnd.whatever.v1+tar",
    "--config", f"{config_path}:application/vnd.oci.image.config.v1+json"
] + layers

subprocess.run(cmd, check=True)

Preparing to push to localhost:5000/pastis-2433:sample
Found 3 layer(s):
  - sample/10000.tar
  - sample/10001.tar
  - sample/10002.tar
Preparing sample/10000.tar
Preparing sample/10001.tar
Preparing sample/10002.tar
Exists    6f57fa9c759f sample/10002.tar
Exists    7cff937ff47c sample/10000.tar
Exists    e4c1009e385d application/vnd.oci.image.config.v1+json
Exists    b9af2a69dee3 sample/10001.tar
Uploading 8e05221fa48d application/vnd.oci.image.manifest.v1+json
Uploaded  8e05221fa48d application/vnd.oci.image.manifest.v1+json
Pushed [registry] localhost:5000/pastis-2433:sample
ArtifactType: application/vnd.whatever.v1+tar
Digest: sha256:8e05221fa48d22d6426877032b20e3dd5f5f06913c5435031c0eabf463265059
CompletedProcess(args=['oras', 'push', '--verbose', 'localhost:5000/pastis-2433:sample', '--artifact-type', 'application/vnd.whatever.v1+tar', '--config', 'sample/config.json:application/vnd.oci.image.config.v1+json', 'sample/10000.tar:application/vnd.oci.image.layer.v1.tar', 'sample/10001.tar:application/vnd.oci.image.layer.v1.tar', 'sample/10002.tar:application/vnd.oci.image.layer.v1.tar'], returncode=0)

Inspecting and Retrieving EO Data Packages from OCI registries

The following steps demonstrate how to inspect and retrieve Earth Observation (EO) data packages published as OCI artifacts. This process helps users identify which partitions are included in a package and enables selective download of relevant components.

We use the ORAS CLI to fetch the manifest associated with the artifact. This manifest provides a structured list of all attached blobs (config and layers), along with their digests and media types. These identifiers are essential for both full and selective retrieval.

oras manifest fetch localhost:5000/pastis-2433:sample --format json

The output includes a JSON structure describing the config and all layers:

{
  "reference": "localhost:5000/pastis-2433@sha256:72bf0b123756669a8b9b34dfd4beb898dc9ab1eb7171bcb804de9d38e1371c9c",
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "digest": "sha256:72bf0b123756669a8b9b34dfd4beb898dc9ab1eb7171bcb804de9d38e1371c9c",
  "size": 1037,
  "content": {
    "artifactType": "application/vnd.whatever.v1+tar",
    "config": {
      "digest": "sha256:e4c1009e385dbbc1159703daf9a5a960bc9bc61e0364c41c673c981e2a0874c0",
      "mediaType": "application/vnd.oci.image.config.v1+json",
      "size": 3209
    },
    "layers": [
      {
        "annotations": {
          "org.opencontainers.image.title": "sample/10000.tar"
        },
        "digest": "sha256:7cff937ff47cf327ab3fe27310da670719f87e86cd1f72c24459c3b542e505e0",
        "mediaType": "application/vnd.oci.image.layer.v1.tar",
        "size": 32819200
      },
      {
        "annotations": {
          "org.opencontainers.image.title": "sample/10001.tar"
        },
        "digest": "sha256:b9af2a69dee30874cb6687f0e36188292442da236beaf6ccd699c4f2e231fe3c",
        "mediaType": "application/vnd.oci.image.layer.v1.tar",
        "size": 33402880
      },
      {
        "annotations": {
          "org.opencontainers.image.title": "sample/10002.tar"
        },
        "digest": "sha256:6f57fa9c759fc3c70f1717eccfd8381f89188bb72b66bd95f0215f1b84f773b5",
        "mediaType": "application/vnd.oci.image.layer.v1.tar",
        "size": 33382400
      }
    ],
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "schemaVersion": 2
  }
}
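Because every layer descriptor carries its size, the download volume can be estimated from the manifest alone, before any blob is transferred. The sketch below uses a trimmed copy of the output above (annotations omitted). The helper name `summarize_layers` is illustrative; it accepts both the wrapped `--format json` output and a raw manifest.

```python
import json

# Trimmed copy of the manifest output shown above.
manifest_output = json.loads("""
{
  "digest": "sha256:72bf0b123756669a8b9b34dfd4beb898dc9ab1eb7171bcb804de9d38e1371c9c",
  "content": {
    "layers": [
      {"digest": "sha256:7cff937ff47cf327ab3fe27310da670719f87e86cd1f72c24459c3b542e505e0", "size": 32819200},
      {"digest": "sha256:b9af2a69dee30874cb6687f0e36188292442da236beaf6ccd699c4f2e231fe3c", "size": 33402880},
      {"digest": "sha256:6f57fa9c759fc3c70f1717eccfd8381f89188bb72b66bd95f0215f1b84f773b5", "size": 33382400}
    ]
  }
}
""")


def summarize_layers(manifest: dict) -> tuple[int, int]:
    """Return (layer count, total layer bytes) for a manifest."""
    # `oras manifest fetch --format json` nests the manifest under "content";
    # a raw manifest has "layers" at the top level.
    body = manifest.get("content", manifest)
    layers = body.get("layers", [])
    return len(layers), sum(layer["size"] for layer in layers)


count, total_bytes = summarize_layers(manifest_output)
print(f"{count} layers, {total_bytes / 1e6:.1f} MB in total")
```

For the sample artifact this reports 3 layers and 99.6 MB.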

To retrieve the full dataset, we use the ORAS CLI to pull the entire artifact and then extract a specific patch for inspection:

oras pull localhost:5000/pastis-2433:sample -o /tmp/pastis-2433
cd /tmp/pastis-2433/sample
mkdir 10001
tar -xvf 10001.tar -C 10001
ANNOTATIONS/ParcelIDs_10001.npy
ANNOTATIONS/TARGET_10001.npy
DATA_S1A/S1A_10001.npy
DATA_S1D/S1D_10001.npy
DATA_S2/S2_10001.npy
DATA_SPOT/PASTIS_SPOT6_RVB_1M00_2019/SPOT6_RVB_1M00_2019_10001.tif
INSTANCE_ANNOTATIONS/HEATMAP_10001.npy
INSTANCE_ANNOTATIONS/INSTANCES_10001.npy

Alternatively, we can fetch only selected components of the artifact—starting with the config.json blob, which typically includes metadata about the partitions:

CONFIG_DIGEST=$(oras manifest fetch localhost:5000/pastis-2433:sample | jq -r '.config.digest')
oras blob fetch localhost:5000/pastis-2433:sample@$CONFIG_DIGEST -o /tmp/config.json

After inspecting the config file to locate the relevant partition, we can fetch only the desired .tar blob using its digest:

oras blob fetch localhost:5000/pastis-2433:sample@sha256:7cff93... -o /tmp/10000.tar
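Looking up the digest by hand works for a few layers but becomes tedious with 2,433 of them. A minimal sketch of the lookup follows, assuming the layer titles follow the `<patch id>.tar` naming seen in the sample manifest; the function name `digest_for_patch` is hypothetical.

```python
# Two layer descriptors copied from the sample manifest shown earlier.
layers = [
    {
        "annotations": {"org.opencontainers.image.title": "sample/10000.tar"},
        "digest": "sha256:7cff937ff47cf327ab3fe27310da670719f87e86cd1f72c24459c3b542e505e0",
    },
    {
        "annotations": {"org.opencontainers.image.title": "sample/10001.tar"},
        "digest": "sha256:b9af2a69dee30874cb6687f0e36188292442da236beaf6ccd699c4f2e231fe3c",
    },
]


def digest_for_patch(layers: list, patch_id: str) -> str:
    """Return the digest of the layer whose title ends in <patch_id>.tar."""
    wanted = f"{patch_id}.tar"
    for layer in layers:
        title = layer.get("annotations", {}).get("org.opencontainers.image.title", "")
        # Compare only the file name, ignoring any directory prefix in the title.
        if title.rsplit("/", 1)[-1] == wanted:
            return layer["digest"]
    raise KeyError(f"no layer titled {wanted}")


digest = digest_for_patch(layers, "10000")
blob_ref = f"localhost:5000/pastis-2433@{digest}"
print(blob_ref)
```

The resulting `blob_ref` can be passed to `oras blob fetch` in place of a manually copied digest.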

The following Python script codifies the same functionality. Equivalent Bash scripts were used to benchmark the retrieval process—both full and selective—across various OCI-compliant registries, using the appropriate registry endpoints and authentication settings.

import subprocess
import json
import tempfile
import os
import tarfile

registry = "localhost:5000"
repo = f"{registry}/pastis-2433:sample"

# With --format json, ORAS wraps the manifest body in a "content" field.
print(f"Fetching manifest for {repo}...")
manifest_json = subprocess.check_output([
    "oras", "manifest", "fetch", repo, "--format", "json"
], text=True)
manifest = json.loads(manifest_json)

if "content" not in manifest or "layers" not in manifest["content"]:
    raise ValueError("Manifest structure does not contain expected 'content.layers' path.")

layers = manifest["content"]["layers"]

print(f"Found {len(layers)} layer(s):")
for i, layer in enumerate(layers):
    # Annotations are optional in a descriptor, so fall back to an empty dict.
    title = layer.get("annotations", {}).get("org.opencontainers.image.title", "no title")
    print(f"  [{i}] digest: {layer['digest']} - {title}")

# Select one layer and download only that blob, addressed by its digest.
layer_index = 1
digest = layers[layer_index]["digest"]
layer_filename = f"layer_{layer_index}.tar"
layer_tar_path = os.path.join(tempfile.gettempdir(), layer_filename)

print(f"\nDownloading layer {layer_index} to: {layer_tar_path}")
subprocess.run([
    "oras", "blob", "fetch", f"{repo}@{digest}", "-o", layer_tar_path
], check=True)

extract_dir = os.path.join(tempfile.gettempdir(), f"layer_{layer_index}_extracted")
os.makedirs(extract_dir, exist_ok=True)

# Only extract archives from a source you trust.
print(f"Extracting layer to: {extract_dir}")
with tarfile.open(layer_tar_path, "r") as tar:
    tar.extractall(path=extract_dir)

# List the extracted files relative to the extraction directory.
print("\nContents of extracted layer:")
for root, dirs, files in os.walk(extract_dir):
    for name in files:
        rel_path = os.path.relpath(os.path.join(root, name), extract_dir)
        print(f" - {rel_path}")

Fetching manifest for localhost:5000/pastis-2433:sample...
Found 3 layer(s):
  [0] digest: sha256:7cff937ff47cf327ab3fe27310da670719f87e86cd1f72c24459c3b542e505e0 - sample/10000.tar
  [1] digest: sha256:b9af2a69dee30874cb6687f0e36188292442da236beaf6ccd699c4f2e231fe3c - sample/10001.tar
  [2] digest: sha256:6f57fa9c759fc3c70f1717eccfd8381f89188bb72b66bd95f0215f1b84f773b5 - sample/10002.tar

Downloading layer 1 to: /tmp/layer_1.tar
Extracting layer to: /tmp/layer_1_extracted

Contents of extracted layer:
 - DATA_S1D/S1D_10001.npy
 - DATA_S2/S2_10001.npy
 - INSTANCE_ANNOTATIONS/INSTANCES_10001.npy
 - INSTANCE_ANNOTATIONS/HEATMAP_10001.npy
 - ANNOTATIONS/TARGET_10001.npy
 - ANNOTATIONS/ParcelIDs_10001.npy
 - DATA_S1A/S1A_10001.npy
 - DATA_SPOT/PASTIS_SPOT6_RVB_1M00_2019/SPOT6_RVB_1M00_2019_10001.tif

Attaching Attestations to EO Data Artifacts Using OCI Referrers

OCI referrers make it possible to link auxiliary artifacts—such as signatures, validation reports, or provenance metadata—to a primary artifact by its digest, without modifying the original content. This supports traceability, trust, and structured discovery in EO data pipelines while preserving immutability.

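The mechanism works by pushing a new artifact whose manifest names the primary artifact's digest in its subject field. The oras attach command does this, building on the subject field and referrers API introduced with version 1.1 of the OCI image and distribution specifications. Registries that support the referrers API can then expose the associated artifacts in a standard way.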

The following Python script demonstrates how to attach such a referrer (e.g., a signature) to an EO data artifact and subsequently discover and inspect referrers using the ORAS CLI. This approach was also used to evaluate support for referrers across different OCI-compliant registries.


import subprocess
import tempfile
import os
import json

tag = "sample"
registry = "localhost:5000"
repo = f"{registry}/pastis-2433"
tag_ref = f"{repo}:{tag}"

# Resolve the tag to a digest so the referrer points at immutable content.
print(f"Fetching manifest digest for {tag_ref}...")
manifest_raw = subprocess.check_output([
    "oras", "manifest", "fetch", tag_ref, "--format", "json"
], text=True)
manifest = json.loads(manifest_raw)
subject_digest = manifest["digest"]
subject_ref = f"{repo}@{subject_digest}"
print(f"Resolved digest: {subject_digest}")

# Write a placeholder signature file (plain text, not a cryptographic signature).
sig_content = f"Signed by EO pipeline v1.2.0\nDigest: {subject_digest}"
sig_path = os.path.join(tempfile.gettempdir(), "signature.txt")
with open(sig_path, "w") as f:
    f.write(sig_content)

# The file is passed as an absolute path, hence --disable-path-validation.
print(f"Attaching signature to {subject_ref} using `oras attach`...")
subprocess.run([
    "oras", "attach", subject_ref,
    "--artifact-type", "application/vnd.oci.artifact.signature.v1+text",
    "--disable-path-validation",
    f"{sig_path}:text/plain"
], check=True)
print("Signature attached as referrer.")

# List all artifacts whose subject is the EO data artifact.
referrers_json = subprocess.check_output([
    "oras", "discover", subject_ref, "--format", "json"
], text=True)
referrers = json.loads(referrers_json).get("manifests", [])

if not referrers:
    print("No referrers found.")
    raise SystemExit(1)

print(f"{len(referrers)} referrer(s) discovered:")
for i, ref in enumerate(referrers):
    print(f"  [{i}] Digest: {ref['digest']} | Type: {ref.get('artifactType')}")

# Fetch the first referrer by digest and print what the registry returns.
referrer_digest = referrers[0]["digest"]
print(f"\nFetching referrer artifact with digest: {referrer_digest}")

referrer_blob_path = os.path.join(tempfile.gettempdir(), "referrer_signature.txt")
subprocess.run([
    "oras", "blob", "fetch", f"{repo}@{referrer_digest}",
    "-o", referrer_blob_path
], check=True)

print(f"\nContents of the referrer artifact ({referrer_blob_path}):\n")
with open(referrer_blob_path, "r") as f:
    print(f.read())

Fetching manifest digest for localhost:5000/pastis-2433:sample...
Resolved digest: sha256:8e05221fa48d22d6426877032b20e3dd5f5f06913c5435031c0eabf463265059
Attaching signature to localhost:5000/pastis-2433@sha256:8e05221fa48d22d6426877032b20e3dd5f5f06913c5435031c0eabf463265059 using `oras attach`...
Uploading 9a1609bec129 /tmp/signature.txt
Uploaded  9a1609bec129 /tmp/signature.txt
Attached to [registry] localhost:5000/pastis-2433@sha256:8e05221fa48d22d6426877032b20e3dd5f5f06913c5435031c0eabf463265059
Digest: sha256:b40af5bd6983a050a2438d5b0df79b478f92c72c5c034b5421bc6504fe3100e2
Signature attached as referrer.
1 referrer(s) discovered:
  [0] Digest: sha256:b40af5bd6983a050a2438d5b0df79b478f92c72c5c034b5421bc6504fe3100e2 | Type: application/vnd.oci.artifact.signature.v1+text

Fetching referrer artifact with digest: sha256:b40af5bd6983a050a2438d5b0df79b478f92c72c5c034b5421bc6504fe3100e2

Contents of the referrer artifact (/tmp/referrer_signature.txt):

{"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json","artifactType":"application/vnd.oci.artifact.signature.v1+text","config":{"mediaType":"application/vnd.oci.empty.v1+json","digest":"sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a","size":2,"data":"e30="},"layers":[{"mediaType":"text/plain","digest":"sha256:9a1609bec129e04afd360f0b7f7da0d10a0136280bd3d79f6389e66eb86156f7","size":108,"annotations":{"org.opencontainers.image.title":"/tmp/signature.txt"}}],"subject":{"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:8e05221fa48d22d6426877032b20e3dd5f5f06913c5435031c0eabf463265059","size":1037},"annotations":{"org.opencontainers.image.created":"2025-05-18T18:33:34Z"}}
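The content printed above is the referrer's manifest, not the signature text itself: the signature is the text/plain layer that the manifest lists. The sketch below parses a trimmed copy of that manifest to recover the subject it points to and the digest of the signature blob. The helper name `signature_layers` is illustrative.

```python
import json

# Trimmed copy of the referrer manifest printed above.
referrer_manifest = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.oci.artifact.signature.v1+text",
  "layers": [
    {
      "mediaType": "text/plain",
      "digest": "sha256:9a1609bec129e04afd360f0b7f7da0d10a0136280bd3d79f6389e66eb86156f7",
      "size": 108
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:8e05221fa48d22d6426877032b20e3dd5f5f06913c5435031c0eabf463265059",
    "size": 1037
  }
}
""")


def signature_layers(manifest: dict, media_type: str = "text/plain") -> list:
    """Return the digests of all layers with the given media type."""
    return [
        layer["digest"]
        for layer in manifest.get("layers", [])
        if layer.get("mediaType") == media_type
    ]


subject_digest = referrer_manifest["subject"]["digest"]
signature_digests = signature_layers(referrer_manifest)
print("Subject:", subject_digest)
print("Signature blob(s):", signature_digests)
```

Passing the resulting layer digest to `oras blob fetch` should return the 108-byte signature file rather than the manifest.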

Remarks and Discussion on the Benchmark

  • All benchmarks were conducted either locally or on OVHCloud compute instances located in the Frankfurt (DE1) region.

  • We used the Apache HTTP server benchmarking tool (ab) to explore different upload and download scenarios. These tests were exploratory rather than systematic, primarily due to cost constraints. Managed services like Docker Hub and Quay were only evaluated under their free-tier plans, which impose rate limits and size restrictions. For private registries, we remained within the lowest pricing tier. To avoid local caching side effects, we sandboxed some test runs in Docker containers. The following example shows a containerized pull with the ORAS image:

    docker run --rm -it \
      -v "$(pwd)/output:/workspace" \
      -w /workspace \
      ghcr.io/oras-project/oras:v1.2.2 \
      pull --allow-path-traversal docker.io/versioneer/pastis-2433:sample

  • We intentionally did not test extreme scenarios, such as multi-100 GB artifacts, very large individual layers, or artifacts with thousands of layers. These cases were outside the scope of our study, which focused on evaluating data packaging workflows using plausible and representative package sizes. This choice is discussed further in the paper.

  • Deduplication was tested by uploading a small sample package first, followed by a full package. In all cases, the three shared layers from the sample were correctly reported as already existing and were not re-uploaded.

  • Upload and download resumability was also evaluated. We found that only fully completed layers were eligible for resumption. Partially transferred layers were treated as failed and had to be restarted from the beginning.

  • The sample package includes both custom annotations and non-standard media types. We confirmed that these were retained by the tested OCI-compliant registries.
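The deduplication and resumability observations both follow from content addressing: a layer is identified by its digest, so a client can skip blobs the registry already holds and can treat a local file as complete only when its size and digest match the descriptor. The sketch below illustrates that logic. It is not the benchmark code; the function names and the in-memory payloads are assumptions for the example.

```python
import hashlib
import os
import tempfile


def layers_to_upload(layers: list, existing_digests: set) -> list:
    """Keep only the layers whose digest is not yet present in the registry."""
    return [layer for layer in layers if layer["digest"] not in existing_digests]


def is_complete(path: str, descriptor: dict) -> bool:
    """True if the local file matches the descriptor's size and sha256 digest."""
    if not os.path.exists(path) or os.path.getsize(path) != descriptor["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so large layers do not need to fit in memory.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha.update(chunk)
    return "sha256:" + sha.hexdigest() == descriptor["digest"]


# Deduplication: layer "a" is already present, so only "b" remains.
all_layers = [{"digest": "sha256:a"}, {"digest": "sha256:b"}]
pending = layers_to_upload(all_layers, {"sha256:a"})

# Resumption: a truncated file fails the check, the full file passes.
payload = b"example layer bytes"
descriptor = {
    "digest": "sha256:" + hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "layer.tar")
    with open(path, "wb") as f:
        f.write(payload[:5])  # simulate an interrupted transfer
    partial_ok = is_complete(path, descriptor)
    with open(path, "wb") as f:
        f.write(payload)
    full_ok = is_complete(path, descriptor)

print(pending, partial_ok, full_ok)
```

Under this scheme a partially transferred layer fails the check and is transferred again in full, consistent with the behavior reported above.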