

Introducing cold data in the CERN Open Data portal

2025-10-07 by CERN Open Data team

News


Over the past decade, as the CERN Open Data Portal has grown, so too have the demands on storage: more datasets, more users, and ever more ambitious releases. Today, the portal hosts more than five petabytes of data, and that volume is growing rapidly. While a substantial fraction is accessed frequently (“hot data”), a portion is rarely used but must still be preserved for reproducibility, future analyses, and educational purposes.

Maintaining all data in high-performance “hot” storage is expensive and unsustainable. This has led us to explore, and now introduce, cold storage: a tiered approach that balances cost, durability, and accessibility.

Cold storage refers to storage media and infrastructure optimized for data that is infrequently accessed. Key features include:

  • High durability: Ensuring integrity over long periods.
  • Cost efficiency: Lower cost per terabyte compared to hot disk or SSD storage.
  • Long-term preservation: Suitability for archival retention policies.
  • Acceptable latency: Users may need to “stage” data (retrieve from cold to warm/hot storage) before access; this introduces delays.

We have implemented cold storage using the CERN Tape Archive (CTA).

A file can therefore be in one of two states: online, if it is stored on hot media, or offline, if it is stored on cold media.

Since a record can contain multiple files, a record has more possible statuses:

  • Online: If all the files of the record are online.
  • Offline: If all the files of the record are offline.
  • Partial: If some, but not all, of the files of the record are online.
  • Requested: If the record contains at least one file that has been requested for staging and is not yet online.
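
The record statuses above can be sketched as a small function that derives a record's status from the states of its files. This is only an illustration, not the portal's actual implementation: the names FileStatus, RecordStatus, and record_status are hypothetical, and the sketch assumes that a pending staging request takes precedence over the Partial and Offline statuses, which the list above does not spell out.

```python
from enum import Enum


class FileStatus(Enum):
    """Per-file state (hypothetical names, for illustration only)."""
    ONLINE = "online"        # file is on hot media
    OFFLINE = "offline"      # file is on cold media (tape)
    REQUESTED = "requested"  # staging requested, file not yet online


class RecordStatus(Enum):
    """Per-record status, mirroring the four statuses listed above."""
    ONLINE = "online"
    OFFLINE = "offline"
    PARTIAL = "partial"
    REQUESTED = "requested"


def record_status(file_statuses):
    """Derive a record's status from the statuses of its files.

    Assumption: a pending staging request wins over Partial/Offline.
    """
    statuses = list(file_statuses)
    if not statuses:
        raise ValueError("a record needs at least one file")
    # At least one file is being staged and is not online yet.
    if any(s is FileStatus.REQUESTED for s in statuses):
        return RecordStatus.REQUESTED
    if all(s is FileStatus.ONLINE for s in statuses):
        return RecordStatus.ONLINE
    if all(s is FileStatus.OFFLINE for s in statuses):
        return RecordStatus.OFFLINE
    # Remaining case: a mix of online and offline files.
    return RecordStatus.PARTIAL
```

For example, under these assumptions a record with one online file and one offline file is Partial, and it becomes Requested once staging has been requested for the offline file.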

Records that are not online include a button to request staging. Any user can issue such a request and can optionally provide an email address to be notified when the staging request has finished. The status of staging requests can also be found on a dedicated page.

In conclusion, introducing cold storage is a strategic move to ensure that the CERN Open Data Portal can keep growing in data volume, longevity, and impact without compromising cost, preservation, or user expectations.
