However, if you are interested in finding hints, tips and guidance for conducting a research-oriented analysis using CMS Open Data, please see our notes on this page. Note that possible solutions to frequently encountered issues can be found on our page of known errors.
I want to get a general introduction into HEP and CMS software and terminology, with a simplified event format.
I want to find out whether I should go for data from 2010 or 2011 (both are pp data at 7 TeV) or from 2012 (pp data at 8 TeV).
The 2010 data were released first. They comprise fewer, smaller datasets with better low-pT tracking, low trigger thresholds, low pile-up, and more (and simpler) analysis/validation examples, but they have no MC. If you do not need MC or maximal statistics, you might want to try the 2010 data first.
The 2011/2012 data have more statistics, more diverse datasets, many associated MC sets, and a slightly more advanced VM environment. If you are immediately interested in maximal statistics and/or MC acceptance corrections you should go for 2011/2012 data.
Note: The 2010 (SL5) virtual machine will only work on 2010 data with CMSSW 4-2-8 (and other SLC5-based CMSSW releases). The 2011 (SL6) virtual machine will only work on 2011/2012 data and MC with CMSSW 5-3-32 (and other SLC6-based CMSSW releases).
I want to produce some example physics distributions.
Install the CMS software as in the previous item (for 2010 or 2011).
choose Open File → Open Files from Web → 2010, and
choose your dataset.
I want to find out which 2010 dataset and/or analysis/validation example is most useful for my purpose.
To learn how to do a muon analysis, follow "I want to produce some example physics distributions" (above) with either option A (recommended) or B, or try one of the relevant examples under "I want to run the examples used for validation of the 2010 datasets" (further below).
To learn how to do an electron analysis, follow "I want to produce some example physics distributions" (above) with option B, or try one of the relevant examples under "I want to run the examples used for validation of the 2010 datasets" (below).
To learn how to do a minimum-bias track analysis, try the MinimumBias example under "I want to run the examples used for validation of the 2010 datasets" (below).
[More to come…]
Exploring the 2011-2012 datasets
I want to find out which 2011-2012 data and MC sets exist, and how to get a feel for their content.
choose Open File → Open Files from Web → 2011 or 2012, and
choose your dataset.
I want to find out which 2011 or 2012 dataset and/or analysis/validation example is most useful for my purpose.
Dedicated examples beyond those available in "Getting Started" can be found in software, including a Higgs-to-four-lepton analysis, jet tuple production and top cross-sections. Alternatively, start from a 2010 example and adjust it to run on 2011 or 2012 data (see the CMS troubleshooting guide for instructions).
For more information on Monte Carlo, see below.
Trigger information, condition data, luminosity
I want to find out how to use the trigger and trigger prescale information in the dataset I am interested in.
I want to find out whether I need condition database information, and if so, how to access it.
Condition data are needed for examples that use, for instance, jet energy corrections or trigger configuration information. Many of the simpler analysis/validation examples do not need any additional corrections from the condition database.
Using condition data slows down data access, so use them only if really needed. If so:
I want to find the effective luminosity of my MC set.
Information will be added to the portal.
Generically: divide the number of events by the product of the MC cross-section (next item), the matching efficiency and the filter efficiency.
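As a rough illustration of the generic recipe, the effective luminosity is the number of events in the MC set divided by the effective cross-section (generator cross-section times matching efficiency times filter efficiency). The sketch below is not an official CMS tool; the function names, the choice of picobarns as the unit, and all numbers in the usage example are assumptions made only for illustration.

```python
def effective_luminosity(n_events, cross_section_pb,
                         matching_eff=1.0, filter_eff=1.0):
    """Return the effective integrated luminosity of an MC set in pb^-1.

    n_events         -- number of events in the MC set
    cross_section_pb -- generator cross-section in pb (assumed unit)
    matching_eff     -- matching efficiency (1.0 if not applicable)
    filter_eff       -- generator filter efficiency (1.0 if not applicable)
    """
    effective_xsec = cross_section_pb * matching_eff * filter_eff
    if effective_xsec <= 0:
        raise ValueError("effective cross-section must be positive")
    return n_events / effective_xsec


def mc_scale_factor(data_lumi_pb, mc_lumi_pb):
    """Return the per-event weight that scales MC to the data luminosity."""
    return data_lumi_pb / mc_lumi_pb


# Hypothetical numbers, chosen only to show the arithmetic.
mc_lumi = effective_luminosity(n_events=1000000,
                               cross_section_pb=500.0,
                               filter_eff=0.5)
weight = mc_scale_factor(data_lumi_pb=2000.0, mc_lumi_pb=mc_lumi)
print(mc_lumi, weight)
```

The second helper shows the usual reason for wanting this number: each MC event is weighted by the ratio of the data luminosity to the effective MC luminosity when comparing simulated and measured distributions.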
I want to find the generator cross section of a particular MC set.
To be documented.
On some MC sets, the following might work (reliability of information not guaranteed): open the ROOT file, create TBrowser and navigate to Runs → GenRunInfoProduct_generator__SIM. → GenRunInfoProduct_generator__SIM.obj → InternalXSec → value_.