Kernel core dumps from qemu
Chris Smith
2009-03-24 16:54:01 UTC
Hello.

I have been looking at adding a monitor command

(qemu) dump <file>

to produce a snapshot kernel core dump similar to /proc/vmcore.

There is nothing wrong with kdump in the guest, but having
a host dumper (in addition) would have some advantages:

- saves 64 meg in each Linux guest (reserved for dump-capture kernel)

- can write a snapshot dump and continue, kdump cannot
(can automatically dump on OOPS as well as panic)

- produces dump in the host filesystem, easier for management tools

- Xen has this, virsh supports this

- not limited to Linux guests

I'll outline some specific ideas and issues.

My question is: might qemu/KVM be interested in adopting something like this?
In due course.

Ok... kdump vmcores are 64-bit ELF core files, even on 32-bit i386.
So too it should be with host dumps.

Linux-x64 maps physical memory twice, at 0xffff880000000000
and at 0xffffffff80000000. An ELF dump that gives those addresses
both pointing to a copy of phys_ram[] is satisfactory to crash and gdb.

It would be unwholesome for qemu to know those constants (__PAGE_OFFSET
and __START_KERNEL_map). Worse than unwholesome, __PAGE_OFFSET recently
changed.

So: qemu writes an ELF dump, but without providing virtual addresses.
Crash will be able to read this as is. For others, a tool can examine
System.map or vmlinux and fix up the vmcore. For ease of use, piping
the dump through the tool can have some easy syntax.

This also handles Windows dumps, Solaris dumps, DMX dumps, and so on.
qemu provides the info it has, and need not worry about the formatting.
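As a rough illustration of the "physical addresses only" dump described above, here is a minimal Python sketch. It is not qemu code: build_phys_core and its region list are hypothetical, it assumes a little-endian x86-64 guest, and it leaves out the register notes. Each PT_LOAD segment carries the guest physical address in p_paddr and leaves p_vaddr at zero.

```python
import struct

# ELF constants from the ELF-64 specification.
ELFCLASS64, ELFDATA2LSB, EV_CURRENT = 2, 1, 1
ET_CORE, EM_X86_64, PT_LOAD = 4, 62, 1

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


def build_phys_core(regions):
    """Build an ELF64 core image from (guest_physical_address, bytes) pairs.

    Hypothetical helper for illustration; p_vaddr is deliberately left at 0.
    """
    ident = b"\x7fELF" + bytes([ELFCLASS64, ELFDATA2LSB, EV_CURRENT]) + bytes(9)
    data_off = EHDR.size + PHDR.size * len(regions)
    ehdr = EHDR.pack(ident, ET_CORE, EM_X86_64, EV_CURRENT,
                     0,            # e_entry
                     EHDR.size,    # e_phoff: program headers follow the ELF header
                     0, 0,         # e_shoff, e_flags
                     EHDR.size, PHDR.size, len(regions),
                     0, 0, 0)      # no section headers
    phdrs, payload = [], []
    for paddr, data in regions:
        # p_flags=7 (R|W|X), p_vaddr=0, p_paddr=guest physical address.
        phdrs.append(PHDR.pack(PT_LOAD, 7, data_off, 0, paddr,
                               len(data), len(data), 0))
        payload.append(data)
        data_off += len(data)
    return ehdr + b"".join(phdrs) + b"".join(payload)
```

A real dumper would stream the memory rather than hold it in a list, but the header layout would be the same.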

Registers are needed too.

vmcores have elf NT_PRSTATUS notes with the registers filled in and
the rest zero. (Except current->pid which I hope isn't needed because
qemu can't do it.)

So, NT_PRSTATUS is reasonable. Also, since it's inevitably
machine dependent, CPUState plus a version number would allow access
to pretty much everything. Or the standard thing would be pt_regs.
(It is easy to add new note types.)

Last, there should be a note giving the version number of the spec.

Comments appreciated.
Avi Kivity
2009-03-24 18:08:38 UTC
This looks useful. I'd suggest a 'format' argument, so we can extend
this later to dump in non-ELF formats (the Windows native memory dump
format would be useful).

I suppose the core format handles smp?
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Chris Smith
2009-03-24 20:19:25 UTC
Post by Avi Kivity
I'd suggest a 'format' argument, so we can extend this later to dump
in non-ELF formats (the Windows native memory dump format would be
useful).
No problem. I was thinking of something similar --

(qemu) dump | winfmt > win.dmp

or

(qemu) dump | /usr/bin/gdbfmt > just.like.vmcore

or

(qemu) dump > raw.elf [normal case for crash utility]

with qemu providing a dump primitive (just the data) to be dressed up
by tools distributed with qemu, or contributed, or home grown.

The | and > are just to be explicit, it could be prettier.

(gdbfmt would just look up PAGE_OFFSET and alter the elf header.)
(gdbfmt needs a path to System.map, if so, better it than qemu.)
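To make the gdbfmt idea concrete, here is a minimal Python sketch of the header fix-up. It is only an illustration: add_virtual_addresses is a hypothetical name, the input is assumed to be a little-endian ELF64 file, and page_offset is assumed to have been looked up already (from System.map or vmlinux) by the caller. Only the program headers change; the memory contents are copied untouched.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr
PT_LOAD = 1


def add_virtual_addresses(core, page_offset):
    """Return a copy of core with p_vaddr = page_offset + p_paddr
    in every PT_LOAD program header (hypothetical helper)."""
    out = bytearray(core)
    fields = EHDR.unpack_from(out, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p = list(PHDR.unpack_from(out, off))
        if p[0] == PT_LOAD:
            # p[3] is p_vaddr, p[4] is p_paddr; keep the sum within 64 bits.
            p[3] = (page_offset + p[4]) & 0xFFFFFFFFFFFFFFFF
            PHDR.pack_into(out, off, *p)
    return bytes(out)
```

Mapping the same memory at a second virtual address (the 0xffffffff80000000 case) would need extra program headers, which this sketch does not attempt.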
Post by Avi Kivity
I suppose the core format handles smp?
I think so. kdump dumps out the memory and the registers (for each cpu).

memory:

kdump dumps out the physical memory and its virtual addresses.
It (vmcore.c) is given a list of physical memory addresses and sizes.

qemu has cpu_physical_memory_rw() -- I think that's the right one --
and page tables. qemu can dump out the physical memory but without
virtual addresses.

registers:

kdump dumps out the registers, in NT_PRSTATUS notes.
crash_save_cpu() does this, it's in /usr/src/linux/kernel/kexec.c.
struct prstatus is in /usr/include/linux/elfcore.h.

For smp, it's one NT_PRSTATUS note per cpu, in the order 0,1,2,...

For qemu's registers in the raw dump, struct prstatus is sort of one
endpoint, it matches kdump's vmcore. Free tools.
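For reference, the note framing itself is simple. The Python sketch below shows one NT_PRSTATUS note per cpu, emitted in cpu order. It assumes little-endian output and treats the prstatus descriptor as opaque bytes supplied by the caller; make_note and make_prstatus_notes are hypothetical names, and the descriptor layout would have to come from elfcore.h for the target.

```python
import struct

NT_PRSTATUS = 1  # note type used for per-thread (here: per-cpu) status


def _pad4(data):
    """Pad data with zero bytes to a multiple of 4, as ELF notes require."""
    return data + bytes(-len(data) % 4)


def make_note(name, note_type, desc):
    """Frame one ELF note: namesz, descsz, type, then padded name and desc."""
    name_z = name + b"\0"  # namesz counts the terminating NUL
    header = struct.pack("<III", len(name_z), len(desc), note_type)
    return header + _pad4(name_z) + _pad4(desc)


def make_prstatus_notes(per_cpu_desc):
    """Concatenate one NT_PRSTATUS note per cpu, in the order 0, 1, 2, ...

    per_cpu_desc is a list of already-packed prstatus descriptors.
    """
    return b"".join(make_note(b"CORE", NT_PRSTATUS, d) for d in per_cpu_desc)
```

The result would go into a single PT_NOTE segment of the dump.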

The other endpoint is CPUState. If some dump format wants something,
it's there, or else qemu doesn't have it. The dump formatters would
be kind of grisly, knowing way too much about qemu internals and
targets. Maybe it should use XML.

Well, no. Anyway, the dump should provide everything kdump does,
except current->pid for each cpu. There is no indication what
that is for.
Avi Kivity
2009-03-25 09:41:17 UTC
Post by Chris Smith
I'd suggest a 'format' argument, so we can extend this later to dump
in non-ELF formats (the Windows native memory dump format would be
useful).
No problem. I was thinking of something similar --
(qemu) dump | winfmt > win.dmp
or
(qemu) dump | /usr/bin/gdbfmt > just.like.vmcore
or
(qemu) dump > raw.elf [normal case for crash utility]
with qemu providing a dump primitive (just the data) to be dressed up
by tools distributed with qemu, or contributed, or home grown.
The | and > are just to be explicit, it could be prettier.
I suggest 'dump -filter ... blah'. I guess for larger cores this is
better than postprocessing, if it can be done without buffering all of RAM.
--
error compiling committee.c: too many arguments to function
Paul Brook
2009-03-25 00:32:33 UTC
Post by Avi Kivity
This looks useful. I'd suggest a 'format' argument, so we can extend
this later to dump in non-ELF formats (the Windows native memory dump
format would be useful).
I'm not keen on having a plethora of different formats in qemu, especially if
they are proprietary or poorly documented. As long as it's done properly it
should be straightforward to reconstruct everything else (virtual
addresses, other formats) with offline debug tools.

What you actually want to do is use the existing snapshot/savevm
mechanism, and postprocess that into whatever format you want.

Paul
Avi Kivity
2009-03-25 09:28:28 UTC
Post by Paul Brook
Post by Avi Kivity
This looks useful. I'd suggest a 'format' argument, so we can extend
this later to dump in non-ELF formats (the Windows native memory dump
format would be useful).
I'm not keen on having a plethora of different formats in qemu, especially if
they are proprietary or poorly documented. As long as it's done properly it
should be straightforward to reconstruct everything else (virtual
addresses, other formats) with offline debug tools.
Well, the physical elf format together with postprocessing tools to
convert to virtual elf or Windows dumps seem like a good solution.
Post by Paul Brook
What you actually want to do is use the existing snapshot/savevm
mechanism, and postprocess that into whatever format you want.
savevm falls into the poorly documented category, I'm afraid. But it
does have the advantage of carrying device state, not just cpu and
memory state, which might be useful in extreme situations.
--
error compiling committee.c: too many arguments to function
Chris Smith
2009-03-25 18:02:16 UTC
Post by Paul Brook
What you actually want to do is use the existing snapshot/savevm
mechanism, and postprocess that into whatever format you want.
No argument. It's not so hot having two places doing the same thing.

But savevm is slow to write and slow to read, and changes in the layout
would be bad, and automatic tools would have a harder time finding
a particular dump.
Post by Avi Kivity
I suggest 'dump -filter ... blah'. I guess for larger cores this is
better than postprocessing, if it can be done without buffering all of RAM.
Standard vmcore only alters the header. crash doesn't even need that.
Others can use a temp disk file if necessary, essentially the same
as having the dump command do it.
Post by Chris Smith
Maybe it should use XML.
XML, not. But I think an ASCII representation of CPUState makes sense.

regs[0]=0xffffffffffffffff
regs[1]=0xffffffffffffffff
...
eip=0xffffffffffffffff
eflags=0xffffffffffffffff
...

Parseable with scanf, no endian issues, robust against additions
and deletions and permutations. strings(1) prints it out.
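A reader for this kind of text could be as small as the following Python sketch. It is an illustration only: parse_cpu_state is a hypothetical name, and the accepted line shape (name=0xhex, with an optional [index]) is inferred from the sample lines above. Lines it does not recognise are skipped, which is what makes the format tolerant of additions, deletions and reordering.

```python
import re

# One "name=0x..." assignment per line; name may carry an index like regs[3].
LINE = re.compile(
    r"^\s*([A-Za-z_]\w*(?:\[\d+\])?)\s*=\s*(0[xX][0-9a-fA-F]+)\s*$"
)


def parse_cpu_state(text):
    """Parse 'name=0xvalue' lines into a dict of name -> int.

    Unrecognised lines are ignored rather than treated as errors.
    """
    state = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            state[m.group(1)] = int(m.group(2), 16)
    return state
```

A formatter that only needs eip, say, would look up that one key and ignore everything else in the dict.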
