Context Switching: The Hidden Server Killer
Imagine this:
You have a server running various applications, and all of a sudden the machine just locks up. Nothing happens anymore, and the only way out is a long push on the power button. After this has happened a couple of times you start running perfmons on the basics: CPU, memory, pagefile usage, disk I/O. They all check out okay but, to your fear, the problem remains. Then you turn to your Citrix Resource Manager. It also has a variety of monitors, and one of the defaults is a rather strange one called “Context Switches”. And this one is going through the roof. But what the heck is this? And what does it mean?
The first question is, of course: what is a Context Switch?
Google or any other search engine is often your friend, but not really in this case. You’ll find some info in online dictionaries, a Linux-oriented IBM document, and even on the site of a Unix/Linux consultancy firm. Wikipedia also gives a pretty accurate but not so easy to understand explanation.
This pretty much gets us nowhere, so let’s start from scratch.
OK, so what the hell is it?
In general, a Context Switch is something at the core of a multitasking operating system: it is the switching from one running task (an application, or more precisely one of its threads) to another. A single CPU core can actually do only one task at a time. It can do a lot of them in a second, so it looks like it’s doing various things at the same time, but down at the hardware level there is only one set of registers to work with. Intel’s Hyper-Threading tries to cheat around this by giving one core two sets of registers, so it shows up as two logical processors, but those still share the same execution hardware and each of them still runs a single task at a time.
Uh?
Let’s say your CPU is currently executing code that is part of MS Word. On your Citrix server this will not be the only application available, so MS Excel is also running. The operating system wants to give Excel its slice of CPU cycles, so it switches between the two. What happens is that the CPU registers in use by MS Word are written to memory, and afterwards the register values that MS Excel was last using are copied from memory back into the CPU. Once these are in place, the CPU carries on with Excel’s work where it left off.
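To make the Word/Excel story a bit more concrete, here is a toy simulation in Python. It is only a sketch: the `Cpu` and `Task` classes, the two "registers" `pc` and `acc`, and the process names are all made up for illustration, and a real operating system does this in kernel code with far more state to save.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A runnable program plus the register values saved for it in memory."""
    name: str
    saved_registers: dict = field(default_factory=lambda: {"pc": 0, "acc": 0})


class Cpu:
    """A toy single-core CPU: one register set, so one task at a time."""

    def __init__(self):
        self.registers = {"pc": 0, "acc": 0}
        self.current = None
        self.switch_count = 0

    def context_switch(self, next_task):
        # Step 1: write the outgoing task's registers to memory.
        if self.current is not None:
            self.current.saved_registers = dict(self.registers)
        # Step 2: copy the incoming task's saved registers onto the CPU.
        self.registers = dict(next_task.saved_registers)
        self.current = next_task
        self.switch_count += 1

    def run(self, steps):
        # Only now does the current task get any useful work done.
        for _ in range(steps):
            self.registers["acc"] += 1
            self.registers["pc"] += 1


# Illustrative task names; nothing here touches real processes.
word = Task("winword.exe")
excel = Task("excel.exe")
cpu = Cpu()

cpu.context_switch(word)
cpu.run(3)
cpu.context_switch(excel)   # Word's registers are saved, Excel's are loaded
cpu.run(5)
cpu.context_switch(word)    # Word resumes exactly where it left off
print(cpu.registers, cpu.switch_count)
```

The thing to notice is that `context_switch` itself does no work for either application. It only shuffles register values around, which is exactly the overhead this article is about.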
So how does this cause the server to hang?
Well, during the time that the register information belonging to MS Word is read from the CPU registers and written to memory, and the information belonging to MS Excel is then read from memory and written to the CPU registers, your server can’t do anything else (remember, one task at a time). So for the user looking at the screen, the server is dead in the water while it’s switching.
Aha, now what?
Now remember that today’s CPUs run at gigahertz speeds, so normally you will not notice this on a regular server. A regular (non-Citrix) server quite often runs very few applications, for example MS SQL 2000 or IIS, that don’t have to switch that often. A Citrix server, however, can have a multitude of applications published for a large number of users, causing a lot of application switching. And (often badly written) applications that tend to cause a lot of Context Switches can go into ‘switching’ so hard that the CPU and/or OS lose track and lock up completely, causing the server to hang and requiring a long push of the power button.
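A bit of back-of-the-envelope arithmetic shows why the number of switches matters so much. The sketch below assumes that every switch costs a fixed amount of CPU time, and the 5 microseconds used here is a made-up figure for illustration, not a measurement. Real costs vary with the hardware and the OS, and this simple model ignores side effects such as cache misses after a switch.

```python
def switching_overhead(switches_per_second, seconds_per_switch):
    """Fraction of each second of CPU time spent switching, capped at 1.0."""
    return min(1.0, switches_per_second * seconds_per_switch)


# Hypothetical cost of one context switch: 5 microseconds.
ASSUMED_SWITCH_COST = 5e-6

# Three illustrative switch rates: quiet, busy, and out of control.
for rate in (1_000, 20_000, 150_000):
    lost = switching_overhead(rate, ASSUMED_SWITCH_COST)
    print(f"{rate:>7} switches/s -> {lost:.1%} of CPU time lost")
```

Under these assumptions a quiet server loses next to nothing, while a server pushed to 150,000 switches per second spends most of its time switching instead of working.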
So how do I prevent this?
Well, you should get applications that don’t cause that many Context Switches. If you are stuck with one that does, you can do some proactive monitoring. From experience, NetIQ AppManager (http://www.netiq.com/products/am/default.asp) does a pretty good job at this. Keeping an eye on the Resource Manager is also a good way to see that something is wrong. What to actually look for is an application that is abnormally high in user CPU cycles (not system). Your normal Task Manager will not display this, but a third-party application such as Hyena will.
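If you would rather roll your own check, the idea behind this kind of monitoring can be sketched in a few lines of Python. This is not how AppManager or Resource Manager work internally. It simply assumes you already have a list of context-switches-per-second readings (for example, logged from the “Context Switches/sec” counter of the System object in Performance Monitor) and flags the ones far above the usual level. The `find_spikes` function, the factor of 3 and the sample readings are all made up for illustration.

```python
from statistics import median


def find_spikes(samples, factor=3.0):
    """Return (index, value) pairs for samples above factor x the median."""
    baseline = median(samples)
    return [(i, v) for i, v in enumerate(samples) if v > factor * baseline]


# Hypothetical readings of context switches per second, one per interval.
readings = [4_200, 3_900, 4_500, 4_100, 61_000, 58_500, 4_300]
print(find_spikes(readings))
```

The median is used as the baseline so that a couple of extreme readings do not drag the "normal" level up with them. What counts as a sensible factor depends entirely on your own servers, so treat the 3 as a starting point.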
Aha, you’re the expert!
No, I’m not. This is all from personal experience and some knowledge of how an operating system actually works (digging through the insides of the C64 and Amiga has been useful, it seems). It could actually be that most of the stuff on this page is wrong, but I don’t think so. If you have some additional info or comments, feel free to let me know. The e-mail address is normally somewhere on this website.
