So here we are getting ready to push the SoapBox Server 2007 out the door, and I somehow have a bunch of task items assigned to me (seeing as I put together the master task list, I'm not quite sure how that happened, but that's another story).
Like many developers, I find that simple tasks always seem easy, even while they're spiraling out of control. Take, for example, the one that has taken me the better part of two days (and counting): "Add CrashDump support into the SoapBox Server".
At first blush, this seems pretty easy: after all, we've been doing crash dumps with ADPlus for a while now, and the .NET Framework has this new "FailFast" method that is supposed to make things really easy. No problem, right?
The MSDN article that describes FailFast makes some great promises:
System.Environment.FailFast is a simple method that does three things. First, it writes an event to the Windows Application log noting that a fatal error occurred. This message includes custom information supplied to FailFast as its sole string parameter. Second, FailFast causes a Watson error report and mini-dump to be generated and uploaded to the Microsoft Windows Error Reporting (WER) service. You can then use Windows Quality Online Services (winqual.microsoft.com) to access WER data for the application and analyze the data to locate the source of the problem. Finally, FailFast kills the process.
This sure sounds ideal.
As always, things aren't quite what they seem. FailFast, the nice shiny new .NET method that is going to do exactly what I need, doesn't have any useful documentation at all. Its reference page certainly makes no mention of WER or WinQual, as the MSDN article quoted above does. In fact, all it says is:
The FailFast method writes a log entry to the Windows Application event log using the message parameter, creates a dump of your application, and then terminates the current process.
I can live with that, but I need to know:
- What kind of MiniDump will it produce? Stack-Trace only? Data Segments? Unloaded Modules? Everything?
- Where will it put the files? I would really like to specify a file name.
- Does it work on x86? x64? Itanium? Mono? (Well, Mono doesn't really do crash dumps, so that one's a little misleading.)
- What is the best way to use it? Do I use the new unhandled exception infrastructure? The AppDomain.UnhandledException event?
...and to complicate things further, there are some new problems:
- Can we debug an Itanium MiniDump on an x86 computer?
- How best can our clients push these huge dumps up to us?
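Whatever ends up writing the dump, the "what kind of MiniDump" question comes down to the MINIDUMP_TYPE flags defined in dbghelp.h. As a rough sketch, here is one way the options in the first list could map onto flag combinations. The numeric values are copied from dbghelp.h (with a `k` prefix so they don't clash with the real header on Windows) so the sketch compiles anywhere; the `DumpLevel` enum and `FlagsFor` helper are hypothetical names for illustration. Which combination FailFast itself uses is exactly what its documentation doesn't say.

```cpp
#include <cstdint>

// Values copied from the MINIDUMP_TYPE enumeration in dbghelp.h.
// On Windows, include <dbghelp.h> and use the real MiniDump* names instead.
enum MiniDumpFlags : std::uint32_t {
    kMiniDumpNormal              = 0x00000000, // stack traces for all threads
    kMiniDumpWithDataSegs        = 0x00000001, // data sections of loaded modules (globals)
    kMiniDumpWithFullMemory      = 0x00000002, // all accessible process memory
    kMiniDumpWithHandleData      = 0x00000004, // handle table information
    kMiniDumpWithUnloadedModules = 0x00000020, // recently unloaded modules
    kMiniDumpWithFullMemoryInfo  = 0x00000800, // complete memory region info
    kMiniDumpWithThreadInfo      = 0x00001000, // extra per-thread state
};

// Hypothetical "dump levels" matching the questions in the post.
enum class DumpLevel { StackOnly, DataSegments, UnloadedModules, Everything };

std::uint32_t FlagsFor(DumpLevel level) {
    switch (level) {
    case DumpLevel::StackOnly:
        return kMiniDumpNormal;
    case DumpLevel::DataSegments:
        return kMiniDumpWithDataSegs;
    case DumpLevel::UnloadedModules:
        return kMiniDumpWithDataSegs | kMiniDumpWithUnloadedModules;
    case DumpLevel::Everything:
        // Full memory is generally the safest choice for managed (SOS)
        // debugging, and it is also what makes these dumps so large.
        return kMiniDumpWithFullMemory | kMiniDumpWithFullMemoryInfo |
               kMiniDumpWithHandleData | kMiniDumpWithUnloadedModules |
               kMiniDumpWithThreadInfo;
    }
    return kMiniDumpNormal;
}
```

The step from "stack trace only" to "everything" is also the step from a small file to one roughly the size of the process's memory, which is where the question about pushing huge dumps back to us comes from.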
Due to the lack of documentation, I was forced to rule out using FailFast. Reflector shows no meaningful managed code for the implementation, so what it's really doing is a mystery. Ah well, I didn't want to use that method anyway. Hmpf.
Fortunately, I stumbled across CLRDump which, judging from the documentation, will do exactly what I want... but again, things are never that easy. CLRDump is a C DLL for x86 only; there's no x64 or Itanium build, which is hardly ideal. The x64 and Itanium installations of our server are the biggest and most complex, and therefore the ones most in need of crash dump collection.
It looks like the only answer that covers everything is to call into dbghelp.dll directly (specifically its MiniDumpWriteDump function) from Managed C++. dbghelp.dll comes from Microsoft, is fully redistributable, and has x86, x64 and Itanium flavors. This code then needs to be tied into the new unhandled exception infrastructure in a way that doesn't confuse managed debugging, and finally tested across all the platforms.
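For the native half of that plan, here is a rough sketch. It assumes a plain C++ build rather than Managed C++. It installs an unhandled-exception filter that calls MiniDumpWriteDump from dbghelp.dll and writes to a file name of our choosing. `MakeDumpFileName`, `WriteDump`, `CrashFilter`, `InstallCrashDumpHandler` and the "SoapBoxServer" prefix are hypothetical names. The Windows-only parts sit behind `_WIN32` so the file-naming helper can be built and checked on any platform. The sketch does not cover hooking the managed AppDomain.UnhandledException event, which is the part that still needs working out.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")  // MSVC: link against dbghelp
#endif

// Hypothetical naming scheme: <app>_<pid>_<yyyymmdd-hhmmss>.dmp
std::string MakeDumpFileName(const std::string& app, std::uint32_t pid,
                             int year, int month, int day,
                             int hour, int minute, int second) {
    char stamp[32];
    std::snprintf(stamp, sizeof stamp, "%04d%02d%02d-%02d%02d%02d",
                  year, month, day, hour, minute, second);
    return app + "_" + std::to_string(pid) + "_" + stamp + ".dmp";
}

#ifdef _WIN32
// Writes a dump of the current process to `path`. `info` is the exception
// context handed to the filter; it may be null for an on-demand dump.
bool WriteDump(const std::string& path, EXCEPTION_POINTERS* info,
               MINIDUMP_TYPE type) {
    HANDLE file = CreateFileA(path.c_str(), GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    MINIDUMP_EXCEPTION_INFORMATION mei = {};
    mei.ThreadId = GetCurrentThreadId();
    mei.ExceptionPointers = info;
    mei.ClientPointers = FALSE;  // the pointers live in this (calling) process

    BOOL ok = MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                                file, type, info ? &mei : nullptr,
                                nullptr, nullptr);
    CloseHandle(file);
    return ok != FALSE;
}

// Native last-chance filter: write a full-memory dump (relative to the
// current directory), then return EXCEPTION_EXECUTE_HANDLER, which
// normally terminates the process.
LONG WINAPI CrashFilter(EXCEPTION_POINTERS* info) {
    SYSTEMTIME t;
    GetLocalTime(&t);
    const std::string name = MakeDumpFileName(
        "SoapBoxServer", GetCurrentProcessId(),  // hypothetical prefix
        t.wYear, t.wMonth, t.wDay, t.wHour, t.wMinute, t.wSecond);
    WriteDump(name, info,
              static_cast<MINIDUMP_TYPE>(MiniDumpWithFullMemory |
                                         MiniDumpWithHandleData |
                                         MiniDumpWithUnloadedModules));
    return EXCEPTION_EXECUTE_HANDLER;
}

// Call once at startup.
void InstallCrashDumpHandler() { SetUnhandledExceptionFilter(CrashFilter); }
#endif
```

Writing the dump from inside the crashing process is the simplest arrangement. However, the MiniDumpWriteDump documentation recommends calling it from a separate process where possible, because the crashing process may already be in a bad state.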
Once I figure out how to do this, I'll let ya all know.
... more info from the trenches later.
--
Chris Mullins
cmullins@coversant.net