2025-10-07 by CERN Open Data team
Over the past decade, as the CERN Open Data Portal has grown, so too have the demands on storage: more datasets, more users, and ever more ambitious releases. Today, the portal hosts more than five petabytes of data, and that volume is growing rapidly. While a substantial fraction is accessed frequently (“hot data”), a portion is rarely used but must still be preserved for reproducibility, future analyses, and educational purposes.
Maintaining all data in high‐performance “hot” storage is expensive and unsustainable. This has led us to explore and now introduce cold storage, a tiered approach that balances cost, durability, and accessibility.
Cold storage refers to storage media and infrastructure optimized for data that is infrequently accessed. Key features include:
We have implemented cold storage using the CERN Tape Archive (CTA).
A file can therefore be either online, if the file is in the hot media, or offline, if the file is in cold media.
Since a record can contain multiple files, there are more possible statuses for a record:
Records that are not online include a button to request staging. Any user can issue such a request. Users may also provide an email address to be notified when the staging request has finished. The status of requests can also be found on a dedicated page.
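As a rough illustration of the file and record states described above, the following Python sketch derives a record-level status from the statuses of its files and decides whether a staging button would apply. It is not the portal's actual code: the names `FileStatus`, `record_status`, and `can_request_staging`, as well as the "partially online" label for mixed records, are assumptions made for this example.

```python
from enum import Enum


class FileStatus(Enum):
    """Per-file availability, as described in the text."""
    ONLINE = "online"    # file is on hot media
    OFFLINE = "offline"  # file is on cold media (tape)


def record_status(file_statuses):
    """Derive a record-level status from its files (hypothetical labels)."""
    statuses = set(file_statuses)
    if not statuses:
        raise ValueError("a record must contain at least one file")
    if statuses == {FileStatus.ONLINE}:
        return "online"
    if statuses == {FileStatus.OFFLINE}:
        return "offline"
    # Mixed case: some files on hot media, some on cold media.
    return "partially online"  # assumed label, not taken from the portal


def can_request_staging(status):
    """Records that are not online would offer a staging request."""
    return status != "online"


if __name__ == "__main__":
    files = [FileStatus.ONLINE, FileStatus.OFFLINE]
    status = record_status(files)
    print(status, can_request_staging(status))
```

Modelling the record status as a function of the file statuses keeps a single source of truth: once a staging request completes and the files come online, the record status follows without being updated separately.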
In conclusion, introducing cold storage is a strategic move to ensure that the CERN Open Data Portal can keep growing in data volume, longevity, and impact without compromising on cost, preservation, or user expectations.