

Sample with jet properties for jet-flavor and other jet-related ML studies JetNTuple_QCD_RunII_13TeV_MC

Kallonen, Kimmo

Cite as: Kallonen, Kimmo (2019). Sample with jet properties for jet-flavor and other jet-related ML studies JetNTuple_QCD_RunII_13TeV_MC. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.RY2V.T797

Dataset, Derived, Datascience, CMS, CERN-LHC
Parent Dataset: /QCD_Pt-15to7000_TuneCUETP8M1_Flat_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_magnetOn_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM


Description

The dataset consists of particle jets extracted from simulated proton-proton collision events at a center-of-mass energy of 13 TeV, generated with Pythia 8. The particles emerging from the collisions pass through a simulation of the CMS detector. The particles were then reconstructed from the simulated detector signals using the particle-flow (PF) algorithm; the reconstructed particles are also called PF candidates. The jets in this dataset were clustered from the PF candidates of each collision event using the anti-$k_t$ algorithm with distance parameter $R = 0.4$. The standard L1+L2+L3+residual jet energy corrections are applied to the jets, and pileup contamination is mitigated using the charged hadron subtraction (CHS) algorithm.

From each collision event, only those jets with transverse momentum exceeding 30 GeV were saved to file. The jets were also required to have pseudorapidity of less than 2.5 (this indicates the jet's position in the detector). For each jet, there are variables describing the jet at the high level, the particle level and the generator level. There are also some variables describing the collision event and the conditions of its simulation. All of the variables are saved on a jet-by-jet basis, which means that one row of data corresponds to one jet.
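The selection described above can be illustrated with a minimal sketch. The jet records below are made up for illustration, and reading the pseudorapidity requirement as a cut on the absolute value |η| is an assumption, not something this page states explicitly.

```python
# Hypothetical jet records; the keys mirror the jetPt and jetEta variables.
jets = [
    {"jetPt": 45.2, "jetEta": 1.1},
    {"jetPt": 28.0, "jetEta": 0.3},    # fails the pT requirement
    {"jetPt": 120.5, "jetEta": -2.7},  # fails the pseudorapidity requirement
    {"jetPt": 33.3, "jetEta": -2.4},
]

PT_MIN_GEV = 30.0   # threshold quoted in the description
ABS_ETA_MAX = 2.5   # assumption: the cut is applied to |eta|


def passes_selection(jet, pt_min=PT_MIN_GEV, abs_eta_max=ABS_ETA_MAX):
    """Return True if the jet satisfies both kinematic requirements."""
    return jet["jetPt"] > pt_min and abs(jet["jetEta"]) < abs_eta_max


# One row of data corresponds to one jet, so the filter works row by row.
selected = [jet for jet in jets if passes_selection(jet)]
print(len(selected), "of", len(jets), "jets pass the selection")
```

Because the saved jets already satisfy these requirements, a filter like this is mainly useful for tightening the thresholds in a later analysis.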

The origin of a jet is particularly interesting. This so-called flavor of the jet is obtained from the generator-level particles by a jet flavor algorithm, which attempts to match a reconstructed jet to a single initiating particle. As a consequence, the jet flavor definition depends on the chosen algorithm. Here, three different flavor definitions are available. The ‘hadron’ definition identifies b- and c-hadrons from the jet’s constituents, so it is only useful for b-tagging studies. The ‘parton’ definition extends this to include the light jet flavors (u, d, s and gluon). Finally, there is the ‘physics’ definition, which looks at the quarks and gluons of the initial collision. The ‘parton’ and ‘physics’ definitions both identify all jet flavors, but the former is more biased towards b- and c-quarks. If in doubt, it is recommended to use the ‘physics’ definition.
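As a rough illustration of how a flavor value maps onto the light-quark, gluon and other categories used by variables such as isPhysUDS, isPhysG and isPhysOther, the sketch below classifies a PDG-style flavor code. The function names are hypothetical, and taking the absolute value so that antiquarks are grouped with quarks is an assumption; the dataset's own flags may be defined differently.

```python
LIGHT_QUARK_IDS = {1, 2, 3}  # down, up, strange in the PDG numbering scheme
GLUON_ID = 21


def flavor_category(flav):
    """Map a flavor code (e.g. a physFlav value) to 'UDS', 'G' or 'Other'."""
    pdg = abs(flav)  # assumption: antiquarks carry negative IDs
    if pdg in LIGHT_QUARK_IDS:
        return "UDS"
    if pdg == GLUON_ID:
        return "G"
    return "Other"


def flavor_flags(flav):
    """Return one-hot flags in the spirit of isPhysUDS / isPhysG / isPhysOther."""
    category = flavor_category(flav)
    return {
        "isUDS": int(category == "UDS"),
        "isG": int(category == "G"),
        "isOther": int(category == "Other"),
    }


for flav in (1, -3, 21, 5, 0):
    print(flav, flavor_category(flav), flavor_flags(flav))
```

Exactly one flag is set for every input, which makes the three flags convenient as targets for a quark/gluon classifier after the 'Other' jets are dropped.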

Related datasets

This dataset was derived from:

/QCD_Pt-15to7000_TuneCUETP8M1_Flat_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_magnetOn_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

Dataset characteristics

22554294 entries. 244 files. 190.6 GiB in total.

Dataset semantics

Variable Type Description
jetPt Float_t Transverse momentum of the jet.
jetEta Float_t Pseudorapidity (η) of the jet.
jetPhi Float_t Azimuthal angle (ϕ) of the jet.
jetMass Float_t Mass of the jet.
jetGirth Float_t Girth of the jet (as defined in arXiv:1106.3076 [hep-ph]).
jetArea Float_t Catchment area of the jet; used for jet energy corrections.
jetRawPt Float_t Transverse momentum of the jet before the energy corrections.
jetRawMass Float_t Mass of the jet before the energy corrections.
jetLooseID UInt_t Binary variable indicating whether the jet passes 'loose' criteria for being a real jet.
jetTightID UInt_t Binary variable indicating whether the jet passes 'tight' criteria for being a real jet.
jetGenMatch UInt_t 1: if a matched generator level jet exists; 0: if no match was found.
jetQGl Float_t Quark-Gluon jet likelihood discriminant variable built out of the three following variables (see the report CMS-PAS-JME-13-002 for more information).
QG_ptD Float_t Jet energy variable (see CMS-PAS-JME-13-002).
QG_axis2 Float_t Minor axis of the jet (see CMS-PAS-JME-13-002).
QG_mult UInt_t Jet constituent multiplicity with additional cuts (see CMS-PAS-JME-13-002).
partonFlav Int_t Flavour of the jet, as defined by the CMS parton-based definition.
hadronFlav Int_t Flavour of the jet, as defined by the CMS hadron-based definition.
physFlav Int_t Flavour of the jet, as defined by the CMS 'physics' definition (if in doubt, use this).
isPartonUDS UInt_t Indicates light quark (Up, Down, Strange) jets: partonFlav = 1, 2, 3.
isPartonG UInt_t Indicates gluon jets: partonFlav = 21.
isPartonOther UInt_t Indicates any other kind of jet: partonFlav != 1, 2, 3, 21.
isPhysUDS UInt_t Indicates light quark (Up, Down, Strange) jets: physFlav = 1, 2, 3.
isPhysG UInt_t Indicates gluon jets: physFlav = 21.
isPhysOther UInt_t Indicates any other kind of jet: physFlav != 1, 2, 3, 21.
jetChargedHadronMult UInt_t Multiplicity of charged hadron jet constituents.
jetNeutralHadronMult UInt_t Multiplicity of neutral hadron jet constituents.
jetChargedMult UInt_t Multiplicity of charged jet constituents.
jetNeutralMult UInt_t Multiplicity of neutral jet constituents.
jetMult UInt_t Multiplicity of jet constituents.
nPF UInt_t Number of particle flow (PF) candidates (particles reconstructed by the particle flow algorithm); contains all particles within |Δϕ| < 1 and |Δη| < 1 from the center of the jet.
PF_pT[nPF] Float_t Transverse momentum of a PF candidate.
PF_dR[nPF] Float_t Distance of a PF candidate to the center of the jet.
PF_dTheta[nPF] Float_t Polar angle (θ) of a PF candidate.
PF_dPhi[nPF] Float_t Azimuthal angle (ϕ) of a PF candidate.
PF_dEta[nPF] Float_t Pseudorapidity (η) of a PF candidate.
PF_mass[nPF] Float_t Mass of a PF candidate.
PF_id[nPF] Int_t Generator level particle identifier for the particle flow candidates, as defined in the PDG particle numbering scheme.
PF_fromPV[nPF] UInt_t A number indicating how tightly a particle is associated with the primary vertex (from 3, the tightest association, down to 0).
PF_fromAK4Jet[nPF] UInt_t 1: if the particle flow candidate is a constituent of the reconstructed AK4 jet; 0: if it is not a constituent of the jet.
genJetPt Float_t Transverse momentum of the matched generator level jet.
genJetEta Float_t Pseudorapidity (η) of the matched generator level jet.
genJetPhi Float_t Azimuthal angle (ϕ) of the matched generator level jet.
genJetMass Float_t Mass of the matched generator level jet.
nGenJetPF UInt_t Number of particles in the matched generator level jet.
genPF_pT[nGenJetPF] Float_t Transverse momentum of a particle in the matched generator level jet.
genPF_dR[nGenJetPF] Float_t Distance of a particle to the center of the matched generator level jet.
genPF_dTheta[nGenJetPF] Float_t Polar angle (θ) of a particle in the matched generator level jet.
genPF_mass[nGenJetPF] Float_t Mass of a particle in the matched generator level jet.
genPF_id[nGenJetPF] Int_t Generator level particle identifier for the particles in the matched generator level jet, as defined in the PDG particle numbering scheme.
eventJetMult UInt_t Multiplicity of jets in the event.
jetPtOrder UInt_t Indicates the ranking number of the jet, as the jets are ordered by their transverse momenta within a single event.
dPhiJetsLO Float_t Difference in azimuthal angle (ϕ) between the two leading jets.
dEtaJetsLO Float_t Difference in pseudorapidity (η) between the two leading jets.
alpha Float_t If there are at least 3 jets in the event, alpha is the third jet's transverse momentum divided by the average transverse momentum of the two leading jets.
event ULong64_t Event number.
run UInt_t Run number.
lumi UInt_t Luminosity block.
pthat Float_t Transverse momentum of the generated hard process.
eventWeight Float_t Weight assigned to the generated event.
rhoAll Float_t The median density (in GeV/A) of pile-up contamination per event; computed from all PF candidates of the event.
rhoCentral Float_t Same as above, computed from all PF candidates with |η| < 2.5.
rhoCentralNeutral Float_t Same as above, computed from all neutral PF candidates with |η| < 2.5.
rhoCentralChargedPileUp Float_t Same as above, computed from all PF charged hadrons associated to pileup vertices and with |η| < 2.5.
PV_npvsGood UInt_t The number of good reconstructed primary vertices.
Pileup_nPU UInt_t The number of pileup interactions that have been added to the event in the current bunch crossing.
Pileup_nTrueInt Float_t The true mean of the Poisson distribution from which the number of interactions in each bunch crossing was sampled for this event.
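The definition of alpha given in the table can be written out as a short sketch. The function name is hypothetical, and returning None for events with fewer than three jets is an assumption; this page does not say what value the dataset stores in that case.

```python
def compute_alpha(jet_pts):
    """Third-jet pT divided by the average pT of the two leading jets.

    jet_pts: transverse momenta (GeV) of all jets in one event, in any order.
    Returns None when the event has fewer than three jets (an assumption).
    """
    ordered = sorted(jet_pts, reverse=True)  # leading jet first
    if len(ordered) < 3:
        return None
    average_leading = 0.5 * (ordered[0] + ordered[1])
    return ordered[2] / average_leading


# Hypothetical events, purely for illustration.
print(compute_alpha([60.0, 100.0, 40.0]))  # third jet is the 40 GeV one
print(compute_alpha([50.0, 45.0]))         # fewer than three jets
```

A small alpha indicates that the event is dominated by the two leading jets, so a cut on alpha is one way to select dijet-like events.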

How were these data selected?

This dataset was produced with the software available in:

JetNtupleProducerTool - Jet tuple producer from CMS Run2 MiniAOD

How can you use these data?

The use of these files does not require any software specific to the CMS experiment. There are two sets of equivalent files in two different formats: ROOT and H5. An example notebook is provided.


      


Disclaimer

These open data are released under the Creative Commons Zero v1.0 Universal license.


Neither the experiment(s) (CMS) nor CERN endorse any works, scientific or otherwise, produced using these data.

This release has a unique DOI that you are requested to cite in any applications or publications.
