Full combination of output from split runs and re-entrant histogramming
|Reported by:||Frank Siegert||Owned by:||Andy Buckley|
|Priority:||blocker||Milestone:||2.Y.0 -- re-entrant histogramming|
One of the most requested features recently has been the ability to combine the output from separate Rivet runs over split event samples into what a single run over all events would have produced. Because analyses in Rivet need flexibility in how they fill and finalise their output, this is not trivial. It becomes particularly tricky as soon as the final output histograms are generated from intermediate histograms or numbers (e.g. "sum of weights that passed cuts" counters) in the finalise method. To solve this problem properly, while keeping it automatic and hidden from the analysis author, I suggest the following:
- All (intermediate) histograms/numbers relevant for finalise are registered by some name (instead of as normal member variables)
- Those intermediate histograms/numbers are stored by their name in output files at the end of a Rivet run
  - maybe with a Rivet option to disable this, to get smaller files for cases where the outputs are not to be combined
  - with some kind of separate type or flag to hide them from plotting tools
- The Analysis class gains an "Analysis::fill(input (from) file)" method which fills the intermediate objects from the written files
- The Analysis class gains a central "Analysis::add(Analysis)" method which combines all the elementary objects that are registered with an analysis
  - e.g. add bin heights for histograms, add sumOfPassedWeights counters
  - If necessary this could be made virtual for very weird analyses to re-implement(?) -- but I don't see the need right now
- An external "afterburner" tool can use these fill and combine methods for each analysis it finds in its combinable input files, and then run the finalize() method which will use the previously combined elementary objects to build its fancy and very specific final output
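
To make the proposed workflow concrete, here is a minimal toy sketch in Python. It is not the Rivet API: the class `ToyAnalysis`, the object names `_pt_sumw` and `_sumw_passed`, the JSON file format and the helper `run` are all hypothetical, chosen only to illustrate the register / write / fill / add / finalize sequence described above.

```python
import json


class ToyAnalysis:
    """Toy stand-in for an analysis; hypothetical, not the Rivet API."""

    def __init__(self, nbins=3):
        # Intermediate objects are registered by name instead of being
        # kept as ordinary member variables.
        self.objects = {}
        self.register("_pt_sumw", [0.0] * nbins)  # per-bin sum of weights
        self.register("_sumw_passed", 0.0)        # "passed cuts" counter

    def register(self, name, initial):
        self.objects[name] = initial

    def analyze(self, bin_index, weight):
        # Per-event filling only touches the registered objects.
        self.objects["_pt_sumw"][bin_index] += weight
        self.objects["_sumw_passed"] += weight

    def dump(self):
        # Write the registered objects, by name, at the end of a run.
        return json.dumps(self.objects)

    def fill(self, text):
        # Restore the intermediate objects from a previously written run.
        self.objects = json.loads(text)

    def add(self, other):
        # Combine every registered object: bin-wise sums for histograms,
        # plain sums for counters.
        for name, value in other.objects.items():
            mine = self.objects[name]
            if isinstance(value, list):
                self.objects[name] = [a + b for a, b in zip(mine, value)]
            else:
                self.objects[name] = mine + value

    def finalize(self):
        # Analysis-specific logic stays here: normalise by the counter.
        total = self.objects["_sumw_passed"]
        return [height / total for height in self.objects["_pt_sumw"]]


def run(events):
    """Run the toy analysis over a list of (bin_index, weight) events."""
    analysis = ToyAnalysis()
    for bin_index, weight in events:
        analysis.analyze(bin_index, weight)
    return analysis


events = [(0, 1.0), (1, 2.0), (1, 1.0), (2, 4.0)]
full = run(events)                # one run over all events
part_a = run(events[:2]).dump()   # two split runs, written out separately
part_b = run(events[2:]).dump()

# "Afterburner": fill from the written files, add, then finalize once.
merged = ToyAnalysis()
merged.fill(part_a)
other = ToyAnalysis()
other.fill(part_b)
merged.add(other)
```

In this toy case `merged.finalize()` gives the same normalised histogram as `full.finalize()`, because only additive quantities are combined and the division happens once, after the combination.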
This assumes that we will be able to define a general combine method for each elementary object. Right now I can only imagine histograms (profile histos are basically two normal histos which are divided in the end, cf. Profile1D->binHeight(), right?) and sumOfWeights counters as intermediate objects, and they are trivial to combine. All other "complicated" and analysis-specific logic will remain in the finalize() method just like it is now.
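
The remark about profile histograms can be illustrated with a small worked example, again as a hedged Python sketch rather than actual Rivet or YODA code. It assumes that one profile bin is stored as two additive sums (sum of weights and sum of weight times value); the helper names `profile_sums`, `combine` and `mean` are hypothetical.

```python
def profile_sums(points):
    """Return (sum of w, sum of w*y) for one profile bin.

    A real profile bin would also carry a sum of w*y^2 for the error;
    that is omitted here to keep the sketch short.
    """
    sumw = sum(w for w, y in points)
    sumwy = sum(w * y for w, y in points)
    return sumw, sumwy


def combine(a, b):
    # Both stored quantities are plain sums, so combining is addition.
    return a[0] + b[0], a[1] + b[1]


def mean(sums):
    # The division is only done at the very end.
    sumw, sumwy = sums
    return sumwy / sumw


run_a = [(1.0, 2.0), (1.0, 4.0), (2.0, 3.0)]  # (weight, y) pairs
run_b = [(1.0, 8.0)]

combined = combine(profile_sums(run_a), profile_sums(run_b))
single_run = profile_sums(run_a + run_b)

# Averaging the already-divided bin heights of the two runs instead:
naive = (mean(profile_sums(run_a)) + mean(profile_sums(run_b))) / 2
```

Here the combined sums are identical to those of a single run over all points and give a mean of 4.0, while averaging the two finalised bin heights gives 5.5. This is why the intermediate sums, not the divided result, need to be written out.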
Change History (4)
comment:3 Changed 6 years ago by
|Milestone:||2.2.0 -- jets, tagging, cuts → 2.Y.0 -- re-entrant histogramming|
|Status:||new → assigned|
|Summary:||Combination of output from split runs → Full combination of output from split runs and re-entrant histogramming|