New nightly build and initial tutorial on how to use crash reporting!
Hello everyone, a new nightly build is out with the first "feature complete" crash reporting concept. Please try it out and let me know how it goes.
I have started a tutorial covering both how to use the crash reports and how to use the new [check_nscp] command, which checks the health of NSClient++.
Michael Medin
Fixed crash: Thank you breakpad!
Got some 300 dumps or so submitted on xmas, so I went through them and all were crashing on the same line! This has been fixed in the latest nightly build. It seems there was an issue with the new PDH counters where I forgot to initialize a variable to NULL.
The crash dump analysis was pretty straightforward. The dump looks like this:
...
6|0|CheckSystem.dll|memmove|F:\dd\vctools\crt_bld\SELF_64_AMD64\crt\src\AMD64\memcpy.asm|224|0x0
6|1|CheckSystem.dll|memcpy_s|f:\dd\vctools\crt_bld\self_64_amd64\crt\src\memcpy_s.c|67|0xa
6|2|CheckSystem.dll|std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::assign(std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > const &,unsigned __int64,unsigned __int64)|c:\program files (x86)\microsoft visual studio 8\vc\include\xstring|1049|0x2c
6|3|CheckSystem.dll|PDH::PDHCounter::getName()|c:\source\nscp\branches\stable\include\pdh\counters.hpp|75|0x26
6|4|CheckSystem.dll|PDHCollectors::RoundINTPDHBufferListenerImpl<__int64,PDHCollectors::PDHCounterNormalMutex>::get_name()|c:\source\nscp\branches\stable\include\pdh\collectors.hpp|321|0x4
6|5|CheckSystem.dll|PDHCollectors::RoundINTPDHBufferListenerImpl<__int64,PDHCollectors::PDHCounterNormalMutex>::getAvrage(unsigned int)|c:\source\nscp\branches\stable\include\pdh\collectors.hpp|298|0xf
6|6|ntdll.dll||||0x117287
6|7|CheckSystem.dll|wcstoxl|f:\dd\vctools\crt_bld\self_64_amd64\crt\src\wcstol.c|141|0x7
6|8|KERNELBASE.dll||||0x10ab
6|9|KERNELBASE.dll||||0x10ab
6|10|CheckSystem.dll|PDHCollector::getCPUAvrage(std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >)|c:\source\nscp\branches\stable\modules\checksystem\pdhcollector.cpp|269|0x1d
...
With the offending line being:
6|3|CheckSystem.dll|PDH::PDHCounter::getName()|c:\source\nscp\branches\stable\include\pdh\counters.hpp|75|0x26
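If you want to pick such a listing apart yourself, each frame appears to be seven pipe-separated fields: thread, frame number, module, function, source file, line and offset. Here is a minimal sketch of splitting one frame into those fields. The Frame struct and the function names are made up for this example and are not part of NSClient++ or breakpad, and it assumes no field contains a '|' character.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One frame from the pipe-delimited listing (names are hypothetical).
struct Frame {
    std::string thread;    // e.g. "6"
    std::string index;     // e.g. "3"
    std::string module;    // e.g. "CheckSystem.dll"
    std::string function;  // empty when no symbols were available
    std::string file;      // empty when no symbols were available
    std::string line;      // empty when no symbols were available
    std::string offset;    // e.g. "0x26"
};

// Split on '|' and keep empty fields, since frames without symbols
// look like "6|6|ntdll.dll||||0x117287".
std::vector<std::string> split_fields(const std::string &text) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = text.find('|', start);
        if (pos == std::string::npos) {
            fields.push_back(text.substr(start));
            return fields;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}

// Returns true and fills `out` only when the line has exactly seven fields.
bool parse_frame(const std::string &text, Frame &out) {
    const std::vector<std::string> f = split_fields(text);
    if (f.size() != 7)
        return false;
    out.thread = f[0];
    out.index = f[1];
    out.module = f[2];
    out.function = f[3];
    out.file = f[4];
    out.line = f[5];
    out.offset = f[6];
    return true;
}

// Usage sketch with the frame pointed out above.
void example_parse() {
    Frame frame;
    const bool ok = parse_frame(
        "6|3|CheckSystem.dll|PDH::PDHCounter::getName()|"
        "c:\\source\\nscp\\branches\\stable\\include\\pdh\\counters.hpp|75|0x26",
        frame);
    assert(ok && frame.line == "75");
    (void)ok;
}
```

The function, file and line fields are what you need to look up the crashing statement in the source tree.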
The problem (if we look at the code) was a bit perplexing actually:
std::wstring get_name() const {
    if (parent_ != NULL)
        return parent_->getName();
    return _T("<UN ATTACHED>");
}
...
const std::wstring getName() const {
    return name_;
}
This looks solid enough, right? What's even worse is that it seems to work fine on my box. After digging around a bit I noticed this was NULL "after the second call", meaning something is fishy. But with a check for NULL right before the call I was a bit puzzled, until I noticed there was no default assignment for the parent_ pointer. This means that in some rare cases, when performance counters were not working properly, we would not get a valid value, which in conjunction with a problem in the counter would yield this error.
Anyways, to make a long story short: Thank you Google breakpad and whoever sent in the crash report!
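To illustrate the kind of fix this calls for, here is a minimal sketch in which the pointer gets a default value in the constructor. The Counter and Listener classes are simplified stand-ins invented for this example (the real code lives in counters.hpp and collectors.hpp and looks different), and a plain wide string literal is used instead of _T.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the counter class.
class Counter {
public:
    explicit Counter(const std::wstring &name) : name_(name) {}
    std::wstring getName() const { return name_; }
private:
    std::wstring name_;
};

// Hypothetical stand-in for the listener holding the parent_ pointer.
class Listener {
public:
    // The important part: parent_ gets a defined value. Without this
    // initializer the pointer is indeterminate, so the NULL check in
    // get_name() may pass and then dereference a garbage address.
    Listener() : parent_(NULL) {}

    void attach(const Counter *parent) { parent_ = parent; }
    void detach() { parent_ = NULL; }

    std::wstring get_name() const {
        if (parent_ != NULL)
            return parent_->getName();
        return L"<UN ATTACHED>";
    }
private:
    const Counter *parent_;
};

// Usage sketch: an unattached listener now reports the placeholder name.
void example_listener() {
    Listener listener;
    assert(listener.get_name() == L"<UN ATTACHED>");
}
```

An uninitialized pointer often happens to be zero, which would explain why it "works fine" on one machine and crashes on another.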
Tomorrow I will write up a quick tutorial/info page on how to enable crash report submissions and how you can help out the development by submitting (manually if you prefer) crash reports whenever you have a problem!
Michael Medin
New nightly build: Eventlog and Configurable crash handling
Hello everyone.
Also, I have some DNS issues, so use www.nsclient.org (not nsclient.org). xname is broken and won't reload (as usual), so I am not sure when the shorthand will come back.
Another little update.
2010-12-14 MickeM
* CheckEventLog: Fixed so type can be compared to various string keys: error, warning, info, auditSuccess, auditFailure
* CheckEventLog: Fixed so invalid parses are reported better (check the "rest" buffer)
CheckEventLog file=Application "filter=generated gt -600m AND message LIKE 'Click2Run'" ...
WARNING:Parsing failed: AND message LIKE 'Click2Run'
* CheckEventLog: Added support for "not like" operator.
CheckEventLog file=Application "filter=generated gt -600m AND message not like 'Click2Run'" ...
* CrashHandler: Added several options to the crash handler (so it can be configurable)
Everything resides under the [crash] section and the available keys are:
* restart=1 # if we shall restart the service when a crash is detected.
* service_name=<name of service to restart>
* submit=0 # if we shall submit crash reports to crash.nsclient.org
* url=http://crash.nsclient.org/submit
* archive=1 # Archive crashdumps
* folder=<appfolder>/dumps
2010-12-13 MickeM
+ Added not responding detection to CheckProcState
All "hung" processes will be considered "hung" (and not started/stopped).
When the process (badapp.exe) is "not hung":
CheckProcState quake.exe=stopped badapp.exe=started notepad++.exe=started
OK:OK: All processes are running.
CheckProcState quake.exe=stopped badapp.exe=hung notepad++.exe=started
CRITICAL:CRITICAL: BadApp.exe: started (critical)
Whereas when it is hung:
CheckProcState quake.exe=stopped badapp.exe=started notepad++.exe=started
CRITICAL:CRITICAL: BadApp.exe: hung (critical)
CheckProcState quake.exe=stopped badapp.exe=hung notepad++.exe=started
OK:OK: All processes are running.
Michael Medin
New nightly build with "not responding" (hung) detection
Hello,
Just a quick shout since I added experimental support for detecting "hung" applications. It is built into CheckProcState and works much like "started" and "stopped" but it now also has a "hung" state.
You can use it like so:
- When process (badapp.exe) is "responding" (not hung)
CheckProcState quake.exe=stopped badapp.exe=started notepad++.exe=started
OK:OK: All processes are running.
CheckProcState quake.exe=stopped badapp.exe=hung notepad++.exe=started
CRITICAL:CRITICAL: BadApp.exe: started (critical)
- When process (badapp.exe) is "not responding" (hung)
CheckProcState quake.exe=stopped badapp.exe=started notepad++.exe=started
CRITICAL:CRITICAL: BadApp.exe: hung (critical)
CheckProcState quake.exe=stopped badapp.exe=hung notepad++.exe=started
OK:OK: All processes are running.
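The rule behind those results can be sketched roughly as follows: each process has an expected state given on the command line, and anything that does not match is reported as critical. This is only an illustration of the comparison. The enum and function names are invented here, and it does not show how NSClient++ actually detects a hung process.

```cpp
#include <cassert>
#include <string>

// The three states named in this post (identifiers are hypothetical).
enum ProcState { PROC_STOPPED, PROC_STARTED, PROC_HUNG };

std::string state_name(ProcState state) {
    switch (state) {
    case PROC_STOPPED: return "stopped";
    case PROC_STARTED: return "started";
    case PROC_HUNG:    return "hung";
    }
    return "unknown";
}

// Compare the expected state with the observed one. Returns an empty
// string on a match, otherwise a message styled like the output above.
std::string check_process(const std::string &name,
                          ProcState expected, ProcState actual) {
    if (expected == actual)
        return std::string();
    return name + ": " + state_name(actual) + " (critical)";
}

// Usage sketch: badapp.exe=started while the process is actually hung.
void example_check() {
    const std::string message =
        check_process("BadApp.exe", PROC_STARTED, PROC_HUNG);
    assert(message == "BadApp.exe: hung (critical)");
    (void)message;
}
```

So asking for badapp.exe=hung only makes sense if you expect the process to be hung, which is mostly useful for testing.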
*NOTICE* this also has the automatic crash report submission built in (so beware!).
Michael Medin
New built-in crash detection with dump submission using Google breakpad!
Hello everyone,
I just released a new version which features Google breakpad support. This is an initial version so don't use this in production!
In essence what this means is that whenever NSClient++ crashes it will do two things.
- Send a crash dump to crash.nsclient.org
- Restart the nsclient++ service.
This is, as I said before, an initial version so there are a lot of things you can't really use yet. The idea is to have three modes of operation.
- Automatic: meaning crashes will be submitted and service restarted
- Manual: Dumps will be collected and service restarted. There will be a check inside nsclient++ where you can check for dumps and submit them.
- Off: Dumps will be collected.
The concept is based around Google breakpad, which is what (amongst others) Mozilla uses for Firefox and Thunderbird and Google uses for Chrome.
You can try this out in /test mode using the not so friendly "assert" keyword:
nsclient++ /test ... ... assert
MickeM







