
CMS Guide to research use of CMS Open Data


If you are interested in step-by-step instructions to start working with CMS Open Data, please consult these pages:

  • Install a virtual machine or use a container.
  • Getting started with CMS AOD Data, for data collected during Run 1 of the LHC.
  • Getting started with CMS MiniAOD Data or NanoAOD Data, for data collected during Run 2 of the LHC.
  • Getting started with CMS Heavy Ion Data.

This page offers hints, tips and guidance for conducting a research-oriented analysis using CMS Open Data. More detailed information can be found in the CMS Open Data Guide.


Quick introduction

I want a general introduction to HEP and CMS software and terminology, using a simplified event format.

  • Read the instructions related to our educational content and follow the corresponding exercises.

I want to learn about the terms under which I can access and use the CMS Open Data, and publish results obtained from them.

  • Go to the "Data preservation and open access policy" (if you are a CMS member, also see the internal document "Rules for use of open access CMS data by individual members of CMS").

I want to get inspiration for some potential physics topics.

  • See what others are doing with CMS Open Data! Papers citing DOI 10.7483/OPENDATA.CMS show a broad scope of usage for Open Data, including physics analyses, data science, and research tool development.

I want to learn about the nature of the CMS physics objects and the corresponding variables and terminology.

  • Check out the "CMS Open Data Guide" as well as the pages describing CMS Physics Objects for 2011-2012 data and for 2015 data.

I want to follow a set of detailed tutorials to learn how to analyze CMS Open Data.

  • Beginning in 2020, CMS has offered workshops targeting research use of Open Data. You can follow the lessons of previous workshops by visiting this page.

Deciding which datasets to explore

CMS has released proton-proton collision data from Run 1 and Run 2, as well as heavy-ion collision data from Run 1.

High-energy proton collisions:

Collisions           Energy (TeV)   Simulation        Getting started   CMSSW version
proton-proton 2010   7              2010 simulation   AOD data          CMSSW_4_2_8
proton-proton 2011   7              2011 simulation   AOD data          CMSSW_5_3_32
proton-proton 2012   8              2012 simulation   AOD data          CMSSW_5_3_32
proton-proton 2015   13             2015 simulation   MiniAOD data      CMSSW_7_6_7
proton-proton 2016   13             2016 simulation   MiniAOD data      CMSSW_10_6_30
proton-proton 2016   13             2016 simulation   NanoAOD data      Not required

For Run 1, the 2010 datasets are smaller and offer a better environment for low-momentum, low-pileup studies. The 2011-2012 datasets are suitable for replicating CMS Run 1 physics results or for performing new searches and measurements at 7-8 TeV collision energy.

For Run 2, the 2015 dataset is smaller than the 2011-2012 Run 1 datasets, but it offers the first look at 13 TeV collisions and a much broader array of simulation. The 2016 13 TeV dataset (released in 2024) has an integrated luminosity similar to that of the Run 1 datasets, and offers a more advanced computing environment and the new identification algorithms developed for Run 2. The luminosity and pile-up conditions of each year, as a function of time, are given in the public CMS luminosity information.

Heavy-ion program:

Collisions           Energy (TeV)   Simulation                    Getting started   CMSSW version
lead-lead 2010       2.76           2010-2011 Pb-Pb simulation    Pb-Pb 2010        CMSSW_3_9_2_patch5*
lead-lead 2011       2.76           2010-2011 Pb-Pb simulation    Pb-Pb 2011        CMSSW_4_4_7*
proton-proton 2011   2.76           N/A                           Pb-Pb 2011        CMSSW_4_4_7
proton-proton 2013   2.76           2013 p-p simulation           p-Pb data         CMSSW_5_3_20
proton-lead 2013     5.02           2013 p-Pb simulation          p-Pb data         CMSSW_5_3_20
proton-proton 2015   5.02           N/A                           p-Pb data         CMSSW_7_5_8_patch3

* The Pb-Pb simulation linked in these rows was produced later and should be analyzed using CMSSW_5_3_20.

The Pb-Pb collisions from 2010 and 2011 are accompanied by "reference" proton-proton collisions at the same energy, collected during 2011 and 2013. The p-Pb collisions from 2013 are accompanied by reference proton-proton collisions collected during 2015. Simulated samples corresponding to the heavy-ion collisions, and to some of the reference collision data, have also been released.


Exploring event displays

Visualizing CMS events is a very helpful way to get acquainted with the CMS detector and the features of different datasets. Software is provided to produce event display files from the Run 1 datasets, but many events are already available in this format for viewing on the web:

  • Open the CMS event display.
  • From the menu bar at the top, choose Open File → Open Files from Web, then choose a year and a dataset.
  • Select a particular event from the list; other events can then be explored using the arrow buttons in the menu bar.
  • Various detector and/or physics features can be toggled on or off in the left-hand-side menu.

Exploring example analyses

Example analyses demonstrate how analysts can process CMS data files to accomplish a real physics goal. Examples range from data validation exercises to full searches. The "Getting Started" pages linked in the tables above all offer links to example analyses.

  • 2010 proton-proton examples
  • 2011 proton-proton examples
  • 2012 proton-proton examples
  • Run 2 proton-proton examples
  • Heavy Ion examples

Trigger information, condition data, luminosity

I want to find out how to use the trigger and trigger prescale information in the dataset I am interested in.

  • Check the guide to the CMS trigger system.

I want to find out how to access the luminosity information for the dataset I am interested in and how to select "good data" only.

  • Check the CMS luminosity information for each year, and
  • check the list of validated runs.
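As an illustration of how the list of validated runs is typically applied, the sketch below keeps only events whose run and luminosity section appear in a validated-runs JSON file. It assumes the file maps run numbers (as strings) to lists of inclusive [first, last] luminosity-section ranges; the helper names and the run numbers in the example are hypothetical, and the example analyses may apply this selection differently, for instance inside their CMSSW configuration.

```python
import json


def load_validated_runs(text):
    """Parse validated-runs JSON into {run: [(first_ls, last_ls), ...]}."""
    raw = json.loads(text)
    return {int(run): [(int(first), int(last)) for first, last in ranges]
            for run, ranges in raw.items()}


def is_validated(run, lumi_section, validated):
    """True if the luminosity section lies inside an inclusive validated range of the run."""
    return any(first <= lumi_section <= last for first, last in validated.get(run, []))


# Hypothetical file content (in practice: json.load on the downloaded file).
example = '{"100001": [[19, 218]], "100002": [[254, 306], [310, 342]]}'
validated = load_validated_runs(example)

# Hypothetical (run, luminosity section) pairs read from events.
events = [(100001, 19), (100001, 219), (100002, 308), (100002, 320), (100003, 1)]
good = [ev for ev in events if is_validated(*ev, validated)]
print(good)  # [(100001, 19), (100002, 320)]
```

Runs that are absent from the file are rejected entirely, which is the intended behaviour for a "good data" selection.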

I want to find the luminosity of my dataset, possibly constrained by using specific triggers.

  • Check the guide to calculating luminosity.
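When a prescaled trigger is used, each luminosity section contributes only a fraction of its luminosity. A simplified sketch, assuming you already have the recorded luminosity and a single combined trigger prescale for every luminosity section you keep, is to divide each section's luminosity by its prescale and sum the results. The function name, the input format and the numbers below are hypothetical; for actual results, follow the procedure and tools described in the guide.

```python
def prescaled_luminosity(lumi_sections):
    """Sum luminosity over (lumi, prescale) pairs, weighting each by 1/prescale.

    A prescale of 0 is treated as "trigger disabled" and contributes nothing.
    """
    total = 0.0
    for lumi, prescale in lumi_sections:
        if prescale > 0:
            total += lumi / prescale
    return total


# Hypothetical per-luminosity-section values in pb^-1, paired with the trigger prescale.
sections = [(1.2, 1), (1.0, 2), (0.8, 0)]
print(prescaled_luminosity(sections))  # 1.7
```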

I want to find out whether I need conditions database information, and if so, how to access it.

  • Condition data are needed for examples that use, for instance, jet energy corrections or trigger configuration information. Many of the simpler analysis examples do not need any additional corrections from the conditions database.
  • The most recent CMSSW containers contain the conditions database information needed for the relevant year's data, and example analysis frameworks such as the Physics Object Extractor Tool demonstrate how to access this information.
  • More information is available in the "Guide to the CMS condition database".

Using simulation

How do I interpret the simulated dataset names?

  • Check CMS Simulated Dataset Names.
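CMS dataset names have three parts, /PrimaryDataset/ProcessedDataset/DataTier. For simulation, the primary dataset name usually describes the physics process and generator settings, and the processed dataset name describes the production campaign and conditions. A minimal sketch that splits a name into these parts is shown here; the class and function names are illustrative, and the example dataset name is hypothetical.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    primary: str    # e.g. physics process and generator settings
    processed: str  # e.g. production campaign, conditions and version
    tier: str       # data format, e.g. AODSIM or MINIAODSIM


def parse_dataset_name(name: str) -> DatasetName:
    """Split a '/Primary/Processed/TIER' dataset name into its three parts."""
    parts = name.split("/")
    # A valid name starts with '/' and has exactly three non-empty parts after it.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not a /Primary/Processed/TIER dataset name: {name!r}")
    return DatasetName(*parts[1:])


# Hypothetical name, for illustration only.
ds = parse_dataset_name("/ExampleProcess_13TeV-generator/ExampleCampaign-ExampleConditions-v1/MINIAODSIM")
print(ds.primary, ds.tier)  # ExampleProcess_13TeV-generator MINIAODSIM
```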

I want to find the generator cross section of a particular simulation.

  • Check CMS Simulation cross sections.

I want to find the effective luminosity of my simulated dataset.

  • Effective luminosity = (Number of positive-weight events - Number of negative-weight events) / [(cross section) × (generator matching efficiency, if applicable) × (generator filter efficiency, if applicable)]. With the cross section in pb, the effective luminosity is in pb^-1.
  • Find more information about using simulation in the CMS Open Data Guide.
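The effective luminosity is typically used to normalize simulation to data: each simulated event is scaled by (data luminosity) / (effective luminosity), which equals (cross section) × (efficiencies) × (data luminosity) / (net number of events). The sketch below assumes the cross section is given in pb and the data luminosity in pb^-1; the function names and all numbers are hypothetical. For samples with negative-weight events, the sign of each event's generator weight would also be applied when filling histograms.

```python
def effective_luminosity(xsec_pb, n_pos, n_neg, match_eff=1.0, filter_eff=1.0):
    """Effective luminosity (pb^-1) of a simulated sample."""
    n_net = n_pos - n_neg  # negative-weight events cancel positive-weight ones
    if n_net <= 0:
        raise ValueError("net number of events must be positive")
    return n_net / (xsec_pb * match_eff * filter_eff)


def scale_factor(data_lumi_pb, xsec_pb, n_pos, n_neg, match_eff=1.0, filter_eff=1.0):
    """Per-event scale factor that normalizes the sample to the data luminosity."""
    return data_lumi_pb / effective_luminosity(xsec_pb, n_pos, n_neg, match_eff, filter_eff)


# Hypothetical sample: 100 pb cross section, 1,100,000 positive- and 100,000 negative-weight events.
lumi_eff = effective_luminosity(100.0, 1_100_000, 100_000)
w = scale_factor(5000.0, 100.0, 1_100_000, 100_000)
print(lumi_eff, w)  # 10000.0 0.5
```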

Contact us

I want information that is not documented here or elsewhere on the CERN Open Data portal.

  • Please check the CMS Open Data Guide.
  • Kindly reach out on the CERN Open Data Forum and tag "CMS" in your message.

I ran into a problem and need help!

  • Please check our page related to known errors.
  • Kindly reach out on the CERN Open Data Forum and tag "CMS" in your message.
© CERN, 2014–2025