2007-04-16

Part 2: Background - What's using my CPU?

Previously (Part 1: Introduction - What's using my CPU?), I kicked off what I expect to be a multi-part series on determining what is causing excessive CPU consumption, outside of the normal "which process has the highest value in the CPU column in Task Manager".

Before I get into things, a little bit of background may prove useful or mildly entertaining. Over on "Sysinternals Forums", there were recently two similar problems that both involved excessive CPU utilization that was not attributable to a specific process. I became involved in both problems and attempted to use similar techniques to get additional information, with the hope of ultimately being able to pinpoint the problem. What may make this mildly entertaining is that in both cases, there was limited or no success in determining the cause of or solution to the problem. In the end, one problem was resolved by disabling the floppy disk controller, and the other problem appears to be as yet unresolved. (In the latter case, the poster did admit that the system was experiencing hardware problems - the chipset fan was dying and there were diagnostic beep codes during / after POST. These hardware problems could be related to the problem.) Despite the lack of success in determining the cause of the problems, I do feel that I learned a bit about this type of problem and gained some insight into the use of some tools that can come in handy in this situation.

In the two cases, the problem consisted of the CPU spending a lot of time servicing interrupts and deferred procedure calls (DPCs). What are interrupts and DPCs? "Windows Internals, Chapter 3 - System Mechanisms" says:

Interrupts ... are operating system conditions that divert the processor to code outside the normal flow of control. An interrupt is an asynchronous event (one that can occur at any time) that is unrelated to what the processor is executing. Interrupts are generated primarily by I/O devices, processor clocks, or timers.
A deferred procedure call (DPC) is a function that performs a system task—a task that is less time-critical than the current one. The functions are called deferred because they might not execute immediately.

It is interesting to note that one may have a problem with excessive CPU use but may not be able to determine it by using Windows' Task Manager. This is because, for whatever reason, Task Manager adds the time the CPU spends servicing interrupts and DPCs to the "System Idle Process". Microsoft's / Sysinternals' Process Explorer includes separate "artificial" processes for interrupts and DPCs so that one can see how much time the CPU spends dealing with each. Per Process Explorer's help file, "high CPU consumption by these activities can indicate a hardware problem or device driver bug".
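
To make the accounting a little more concrete, here is a minimal Python sketch of how two snapshots of cumulative CPU time counters can be turned into a per-category percentage. The function name `cpu_breakdown`, the category names, and the sample numbers are all made up for illustration; they are not taken from either forum thread or from any particular tool. I believe the third-party psutil package exposes similar "interrupt" and "dpc" fields on Windows, but treat that as an assumption and check its documentation before relying on it.

```python
def cpu_breakdown(before, after):
    """Return the percentage of elapsed CPU time spent in each category.

    Both arguments are dicts of cumulative CPU seconds keyed by category
    name, taken at two points in time.
    """
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must span a positive amount of CPU time")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical snapshots (cumulative seconds), invented for this example.
before = {"user": 100.0, "system": 50.0, "idle": 800.0, "interrupt": 30.0, "dpc": 20.0}
after = {"user": 101.0, "system": 50.5, "idle": 805.0, "interrupt": 32.5, "dpc": 21.0}

breakdown = cpu_breakdown(before, after)

# Time spent servicing interrupts and DPCs, which a per-process view
# could fold into the idle figure instead of showing separately.
hidden = breakdown["interrupt"] + breakdown["dpc"]

for name, pct in breakdown.items():
    print(f"{name:10s} {pct:5.1f}%")
print(f"interrupt + dpc: {hidden:.1f}%")
```

With these invented numbers, a view that lumped interrupt and DPC time in with idle time would show the machine as mostly idle, even though a sizeable share of the interval went to servicing hardware.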

Another thing that could be consuming CPU is the SYSTEM process. The process of determining what system thread is consuming the CPU is similar to determining what thread in a user-mode process is utilizing the CPU. However, excessive CPU utilization by the SYSTEM process might be a little more serious as it is an indication that some driver is possibly running rampant.

Next time, I plan to introduce some tools that can be useful in exploring DPC and interrupt activity on a system, as well as discussing how to determine what driver might be involved with excessive CPU utilization in the SYSTEM process.



3 comments:

Julian said...

Any update on this? I have the same problem, Interrupts taking 25% CPU time (pegging one core in a quad-core system).

«/\/\Ø|ö±ò\/»®© said...

Hi Julian,

Unfortunately, I'm still trying to find the time to do some justice to Part 3.

Interrupts consuming excessive CPU can be indicative of a hardware problem or a bug in a device driver. One technique that can be used in troubleshooting this is to first remove unnecessary devices from the system. If that doesn't help, selectively disable components one at a time until the problem goes away. The last item that was disabled may be related to the problem.

Other options include ensuring any disk drives haven't reverted to PIO mode from DMA (if applicable), upgrading / downgrading drivers for various hardware components, and re-socketing components to ensure the connections are good.

I hope that some of these suggestions are of assistance! Please post back if you figure out what the problem is, or if you have further questions.

Mark said...

I'm very interested in Part 3 of this as well. I've seen lots of people asking about this problem in various places on the web but not many solutions. Sounds like a tricky problem to solve with no single solution that works for everyone. I'll try some of your tips from your previous comment.