rivet is hosted by Hepforge, IPPP Durham

Opened 10 years ago

Last modified 7 years ago

#421 assigned defect

Fix projection cleanup mess

Reported by: Andy Buckley
Owned by: Andy Buckley
Priority: blocker
Milestone: Perfection
Component: Projections
Version: HEAD
Keywords:
Cc: hoeth, Frank Siegert

Description

A last-minute tidying of singleton implementations with the Rivet 1.2.2 release candidate has unearthed a nasty set of bugs in projection deletion, encompassing all the perverse joys of boost::shared_ptr's operator<, why you shouldn't call dynamic_cast on a partially destructed object, and many more. I'm trying to fix this at the moment, and suspect it has a lot to do with our memory leak. Definitely a release blocker, unfortunately!
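
For readers unfamiliar with the two pitfalls named above, here is a minimal sketch of each in isolation. It is an illustration only: `Proj`, `DerivedProj`, `g_cast_succeeded_in_dtor` and `count_equal_valued_entries` are hypothetical names rather than Rivet code, and nothing in this ticket establishes that these are the exact failure paths in ProjectionHandler. The sketch uses `std::shared_ptr` to stay self-contained; its `operator<` compares the stored pointers, whereas Boost's is, as far as I recall, ownership-based. In neither case are the pointed-to objects compared.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a projection; not Rivet's actual class.
struct Proj {
    std::string name;
    explicit Proj(std::string n) : name(std::move(n)) {}
    virtual ~Proj();
};

struct DerivedProj : Proj {
    DerivedProj() : Proj("derived") {}
};

// Records what dynamic_cast saw while the base destructor was running.
bool g_cast_succeeded_in_dtor = true;

Proj::~Proj() {
    // While ~Proj runs, any derived part has already been destroyed, so the
    // dynamic type of *this is Proj and the downcast yields a null pointer.
    g_cast_succeeded_in_dtor = (dynamic_cast<DerivedProj*>(this) != nullptr);
}

// Counts how many entries a set of shared_ptrs keeps for two objects that
// hold the same value. The set orders by pointer, not by pointee, so the
// second insert is not treated as a duplicate.
std::size_t count_equal_valued_entries() {
    std::set<std::shared_ptr<Proj>> projs;
    projs.insert(std::make_shared<Proj>("fs"));
    projs.insert(std::make_shared<Proj>("fs"));  // same value, different object
    return projs.size();
}
```

Destroying a `DerivedProj` leaves `g_cast_succeeded_in_dtor` false, and `count_equal_valued_entries()` returns 2. Code that relies on a set of smart pointers to deduplicate equivalent objects, or that dereferences the result of a downcast made during destruction, would therefore misbehave.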

Attachments (1)

file.hepmc (123.8 KB) - added by Frank Siegert 10 years ago.


Change History (9)

comment:1 Changed 10 years ago by Andy Buckley

Milestone: changed from Release 1.2.2 to 1.2.3 release
Status: changed from new to assigned

Pushing back to 1.2.3 since this is still safe for release at the MCnet school (the 1.2.2 target) and we need some serious code eyeballing to fix it.

comment:2 Changed 10 years ago by Andy Buckley

Milestone: changed from 1.2.3 release to Release 1.2.2

Arse, Frank K discovered a guaranteed way to enrage some tentacle of this error in normal use, via registering multiple simultaneous analyses. Better fix it for 1.2.2 *sigh*.

comment:3 Changed 10 years ago by Frank Siegert

I have come across a problem which seems to be related to this one (if you don't think so, I'll file a separate ticket) and which might be useful for debugging. When running the JADE_OPAL analysis on the attached 10 normal LEP events, I get the following segfault right away:

$ gdb --args /usr/bin/python $(which rivet) -a JADE_OPAL_2000_S4300807 file.hepmc
[...]
(gdb) run
Starting program: /usr/bin/python /home/frank/rivet/install/bin/rivet -a JADE_OPAL_2000_S4300807 file.hepmc
[Thread debugging using libthread_db enabled]
Rivet running on machine l40 (i686)

Program received signal SIGSEGV, Segmentation fault.
0x026c4dca in std::type_info::name (this=0x0) at /usr/include/c++/4.4/typeinfo:100
100         { return __name; }
(gdb) bt
#0  0x026c4dca in std::type_info::name (this=0x0) at /usr/include/c++/4.4/typeinfo:100
#1  0x026c3872 in Rivet::ProjectionHandler::_getEquiv (this=0x880c610, proj=...) at ProjectionHandler.cc:160
#2  0x026c311c in Rivet::ProjectionHandler::registerProjection (this=0x880c610, parent=..., proj=..., name=...) at ProjectionHandler.cc:71
#3  0x0269ad97 in Rivet::ProjectionApplier::_addProjection (this=0xbfffed00, proj=..., name=...) at ProjectionApplier.cc:40
#4  0x026fad41 in Rivet::ProjectionApplier::addProjection<Rivet::VisibleFinalState> (this=0xbfffed00, proj=..., name=...) at ../../include/Rivet/ProjectionApplier.hh:131
#5  0x026fa9f6 in JetAlg (this=0xbfffed00, fs=...) at JetAlg.cc:13
#6  0x026dcc57 in FastJets (this=0xbfffed00, fsp=..., alg=Rivet::FastJets::DURHAM, rparameter=0.69999999999999996, seed_threshold=1) at FastJets.cc:18
#7  0x02da0ea8 in Rivet::JADE_OPAL_2000_S4300807::init (this=0x881a3d8) at JADE_OPAL_2000_S4300807.cc:34
#8  0x026ba7ff in Rivet::AnalysisHandler::init (this=0x8818728, ge=...) at AnalysisHandler.cc:88
#9  0x026c0a82 in Rivet::Run::init (this=0x881e7a0, evtfile=...) at Run.cc:73
#10 0x024f8579 in _wrap_Run_init (args=0x87d214c) at ./rivet/rivetwrap_wrap.cc:14162
#11 0x00555d4d in ?? ()

There are a few interesting things about this crash:

  • I can reproduce it on both SL5 and Ubuntu, so it doesn't seem to be completely random.
  • The crash does *not* appear when I use the simple rivet-nopy, so maybe the Python wrapping is to blame?
  • Some analyses work fine (ALEPH_2004_S5765862, ALEPH_1996_S3486095, DELPHI_2002_069_CONF_603, OPAL_1998_S3780481, ALEPH_1991_S2435284, DELPHI_1995_S3137023) and some segfault (DELPHI_1996_S3430090, JADE_OPAL_2000_S4300807)

Changed 10 years ago by Frank Siegert

Attachment: file.hepmc added

comment:4 Changed 10 years ago by Andy Buckley

Hi Frank,

Yes, this looks similar. I don't *think* the problem is coming from the Python wrapper per se, but maybe its presence is changing the memory layout and triggering the problem. I'll believe anything with this bug! I've not had the time or mental energy to dig into it for a while, but since it's now reproducible I will find the time this week.

Thanks for the extra info...

Andy

comment:5 Changed 10 years ago by Frank Siegert

One more finding: my crash goes away when I revert r2622 (either by going back to r2621 or by reverting that one commit in an up-to-date trunk). Does that make any sense?

comment:6 Changed 10 years ago by Frank Siegert

Oh, and it definitely is not the Python wrapper, as I get the same crash from the Sherpa interface.

comment:7 Changed 9 years ago by hoeth

Milestone: changed from 1.5.0 to 2.0.0

comment:8 Changed 7 years ago by Andy Buckley

Cc: changed from hoeth,fsiegert to hoeth, fsiegert
Milestone: changed from 2.0.0 to Perfection