Saturday, August 2, 2014

A QEMU BUG: No Context Switch on a Stale Snapshot

Background

Depending on the age of a snapshot taken with QEMU, the restored guest can behave anywhere from perfectly normally to appearing completely frozen. This is the result of a bug in QEMU. My conjecture is that the bug is related to the way QEMU handles timers for the guest OS.

Purpose

To show that a bug in QEMU prevents older snapshots from running properly, we inserted a function, helper_trace2, into QEMU that executes before every instruction run on the emulated CPU. Inserting this function into ops_sse.h lets us verify that QEMU is in fact not performing a context switch. We can tell when a context switch to a new process occurs because the function prints "new xth CR3 value". This message is generated only for new processes. CR3 (Control Register 3) is a special register in Intel's x86 CPUs. It holds the physical base address of the top-level page table (the page directory) of the current process. Because each process normally has its own page tables, the CR3 value can also act as a unique identifier for a process.
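To make the idea concrete, here is a minimal standalone sketch of how a trace helper could recognize a CR3 value it has not seen before. It is not the actual helper_trace2 code: the names note_cr3, seen_cr3 and MAX_TRACKED_CR3, the 32-bit CR3 width, and the exact message format are all assumptions made for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical capacity for this sketch; not taken from QEMU. */
#define MAX_TRACKED_CR3 256

/* CR3 values observed so far (assumes a 32-bit guest). */
static uint32_t seen_cr3[MAX_TRACKED_CR3];
static unsigned seen_count = 0;

/*
 * Record a CR3 value. If it has not been seen before, print a message
 * and return its 1-based ordinal. Return 0 if the value was already
 * recorded or if the table is full.
 */
static unsigned note_cr3(uint32_t cr3)
{
    for (unsigned i = 0; i < seen_count; i++) {
        if (seen_cr3[i] == cr3) {
            return 0; /* known process: nothing to report */
        }
    }
    if (seen_count == MAX_TRACKED_CR3) {
        return 0; /* table full: silently ignore in this sketch */
    }
    seen_cr3[seen_count++] = cr3;
    printf("new %uth CR3 value: 0x%08" PRIx32 "\n", seen_count, cr3);
    return seen_count;
}
```

A helper along these lines would call note_cr3 with the emulated CPU's current CR3 before each instruction. As long as the guest keeps switching to processes it has not run before, new messages keep appearing; if they stop for good, the guest is no longer switching to any new address space.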

Procedure

1. First, I will demonstrate the output of helper_trace2 by running QEMU under GDB.

Once QEMU begins to run, the message is printed out immediately.


The part of helper_trace2 that prints out that message looks like this:


2. Now, I will let QEMU boot the guest Windows XP system normally to demonstrate that the message is printed for each new process started by the guest OS.


For each new process, helper_trace2 prints out the page table address for that particular process. If we run an older snapshot instead, no further "new xth CR3 value" messages are printed, indicating that a context switch is not being performed.

3. Next, I will demonstrate that when an old snapshot is loaded into QEMU, helper_trace2 prints only one statement.

First, I load the old snapshot and then send key press events to the guest OS.


As shown in the above picture, a second CR3 value was obtained, but no matter how long I wait or what events I send, a new process is never switched to. This can also be observed in the QEMU window displaying the guest OS: the "d" key does not appear in the command window, and the cursor is not blinking.


Conclusion

Since no new CR3 values appear, we can conclude that the guest never switches to a new process and, therefore, that the emulated CPU is not performing a context switch. This is a bug in QEMU that we will fix in the future.

