* Serial console hangs with Linux 2.6.20 HVM guest
@ 2009-02-04 1:50 Anders Kaseorg
0 siblings, 0 replies; 9+ messages in thread
From: Anders Kaseorg @ 2009-02-04 1:50 UTC (permalink / raw)
To: xen-devel
I am seeing a problem with the Xen emulated serial console. When
running a Linux 2.6.20 HVM guest that has CONFIG_HOTPLUG_CPU=n, the
guest blocks on output to the console until it receives input keypresses
from `xm console`. This prevents the guest from booting up without
banging on some keys, and makes interactive use of the console
difficult.
By bisecting Linux kernel commits, I found that the bug goes away in
commit 40b36daad0ac704e6d5c1b75789f371ef5b053c1 (v2.6.21-rc1~261), which
is a workaround for buggy UARTs on certain HP machines.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
“The patch below works around a minor bug found in the UART of the
remote management card used in many HP ia64 and parisc servers (aka the
Diva UARTs). The problem is that the UART does not reassert the THRE
interrupt if it has been previously cleared and the IIR THRI bit is
re-enabled. This can produce a very annoying failure mode when used as
a serial console, allowing a boot/reboot to hang indefinitely until an
RX interrupt kicks it into working again (ie. an unattended reboot could
stall).”
That matches my symptoms exactly, which suggests that the Xen UART
probably has a similar bug.
I’ve seen this in Xen 3.2.1 and 3.3.1. (My host is running Debian Lenny
amd64, with the Xen dom0 kernel 2.6.24-23-xen from Ubuntu Hardy, on a
server with two quad-core Xeons.)
Anders
^ permalink raw reply [flat|nested] 9+ messages in thread
* Serial console hangs with Linux 2.6.20 HVM guest
@ 2009-02-05 2:23 Anders Kaseorg
2009-02-05 17:04 ` Ian Jackson
0 siblings, 1 reply; 9+ messages in thread
From: Anders Kaseorg @ 2009-02-05 2:23 UTC (permalink / raw)
To: xen-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 2:23 Anders Kaseorg
@ 2009-02-05 17:04 ` Ian Jackson
2009-02-05 19:34 ` Anders Kaseorg
0 siblings, 1 reply; 9+ messages in thread
From: Ian Jackson @ 2009-02-05 17:04 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: xen-devel
Anders Kaseorg writes ("[Xen-devel] Serial console hangs with Linux 2.6.20 HVM guest"):
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
Thanks for this. However looking at the code I'm not sure that's
what's actually going on. Very old versions of qemu did have the bug
that patch describes, but it was fixed in qemu upstream in 2004 in
this commit:
commit 60e336dbb837ef4d5053433f9ee391feb102be36
Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
Date: Tue Aug 24 21:55:28 2004 +0000
serial interrupt fix (Hampa Hug)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@1049 c046a42c-6fe2-441c-8c8c-71466251a162
As far as I can see this fix is included in all relevant versions of
qemu. The upstream Linux changeset seems like it would fix any
general IRQ loss problem with the UART.
If you can reproduce this problem, I would like to try to debug it.
Probably the best way is to attach to the running qemu with gdb:
* Add
CFLAGS += -g -O0
to xen-unstable.hg/.config (creating it if necessary) and rebuild
and install the resulting qemu-dm binary where it will be used.
* Start the VM in the usual way.
* cd to the qemu-xen-unstable tree (probably tools/ioemu-remote);
use ps to find the pid of qemu-dm, and say
gdb i386-dm/qemu-dm <pid>
and then at the gdb prompt say
handle SIGUSR2 nostop noprint
break serial_ioport_write if (addr&7)==1
cont
* do whatever it is that makes the VM stuck
* when it next stops it will be in serial_ioport_write setting
the IER. So
print val
print *s
and email the output to this list.
NB I haven't tested this recipe. If you get syntax errors or
something doesn't work I'll actually do so and let you know what to
type.
Ian.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 17:04 ` Ian Jackson
@ 2009-02-05 19:34 ` Anders Kaseorg
2009-02-05 21:52 ` Anders Kaseorg
2009-02-09 17:57 ` Ian Jackson
0 siblings, 2 replies; 9+ messages in thread
From: Anders Kaseorg @ 2009-02-05 19:34 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
On Thu, 2009-02-05 at 17:04 +0000, Ian Jackson wrote:
> handle SIGUSR2 nostop noprint
> break serial_ioport_write if (addr&7)==1
> cont
> * do whatever it is that makes the VM stuck
> * when it next stops it will be in serial_ioport_write setting
> the IER. So
> print val
> print *s
This breakpoint is triggered for all messages printed by the kernel,
which always showed up with no delay; but it is only occasionally
triggered for strings printed by userspace, even after forcing those
strings to show up by sending keystrokes.
Here is one of the latter cases. (I am sitting at a
“root@andersk-intrepid:~# ” prompt, repeatedly pressing Enter. Each
keypress causes the previous prompt to show up, followed by a newline,
and the current prompt is stalled.)
Breakpoint 1, serial_ioport_write (opaque=0xb342e0, addr=1, val=5)
at /home/andersk/xen-3-3.3.1/debian/build/build-utils_amd64/tools/ioemu-dir/hw/serial.c:413
413 {
(gdb) print val
$5 = 5
(gdb) print *s
$6 = {divider = 1, rbr = 0 '\0', thr = 32 ' ', tsr = 32 ' ', ier = 5 '\005', iir = 193 '\301',
lcr = 19 '\023', mcr = 11 '\v', lsr = 96 '`', msr = 176 '\260', scr = 0 '\0', fcr = 129 '\201',
thr_ipending = 1, irq = 0xb1d610, chr = 0xb122a0, last_break_enable = 0, base = 0,
it_shift = 0, baudbase = 115200, tsr_retry = 0, last_xmit_ts = 380482341502, recv_fifo = {
data = '\r' <repeats 16 times>, count = 0 '\0', itl = 8 '\b', tail = 0 '\0',
head = 0 '\0'}, xmit_fifo = {data = "repid:~# rsk-int", count = 0 '\0', itl = 0 '\0',
tail = 9 '\t', head = 9 '\t'}, fifo_timeout_timer = 0xb31ad0, timeout_ipending = 0,
transmit_timer = 0xb31b00, char_transmit_time = 78120, poll_msl = -1,
modem_status_poll = 0xb327e0}
Anders
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 19:34 ` Anders Kaseorg
@ 2009-02-05 21:52 ` Anders Kaseorg
2009-02-10 15:34 ` Ian Jackson
2009-02-09 17:57 ` Ian Jackson
1 sibling, 1 reply; 9+ messages in thread
From: Anders Kaseorg @ 2009-02-05 21:52 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
In case this is more interesting, here is a log of all the breakpoints
hit during a Linux boot, up to the point where it hangs:
http://web.mit.edu/andersk/Public/xen-serial-log
Anders
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 19:34 ` Anders Kaseorg
2009-02-05 21:52 ` Anders Kaseorg
@ 2009-02-09 17:57 ` Ian Jackson
2009-02-09 18:13 ` Anders Kaseorg
1 sibling, 1 reply; 9+ messages in thread
From: Ian Jackson @ 2009-02-09 17:57 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: xen-devel
Anders Kaseorg writes ("Re: [Xen-devel] Serial console hangs with Linux 2.6.20 HVM guest"):
> This breakpoint is triggered for all messages printed by the kernel,
> which always showed up with no delay; but it is only occasionally
> triggered for strings printed by userspace, even after forcing those
> strings to show up by sending keystrokes.
That's interesting ....
Anders Kaseorg writes ("Re: [Xen-devel] Serial console hangs with Linux 2.6.20 HVM guest"):
> In case this is more interesting, here is a log of all the breakpoints
> hit during a Linux boot, up to the point where it hangs:
>
> http://web.mit.edu/andersk/Public/xen-serial-log
.... looking at this, it seems like we definitely don't have the bug
that the Linux kernel is trying to work around. In each of the cases
where the IER was written by the kernel, an interrupt was
already pending.
But since the Linux change is just to have a failsafe timer for lost
interrupts, any other kind of lost interrupt situation might cause it.
Perhaps the kernel is expecting us to deassert and reassert the
interrupt in some situation when we aren't; since the interrupts are
converted from level to edge triggered it's possible that we are
failing to _deassert_ the interrupt when we should. I'm just
speculating here at the moment ...
Can you recompile your qemu-dm with
#define DEBUG_SERIAL
near the top of serial.c uncommented, and try booting up and get it to
the point where some output is supposed to have come out but has got
stuck, and then send me your /var/log/xen/qemu-dm-<whatever>.log ?
If it's short post it to the list; otherwise put it up on a webpage or
email it to me privately.
Thanks,
Ian.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-09 17:57 ` Ian Jackson
@ 2009-02-09 18:13 ` Anders Kaseorg
0 siblings, 0 replies; 9+ messages in thread
From: Anders Kaseorg @ 2009-02-09 18:13 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
On Mon, 2009-02-09 at 17:57 +0000, Ian Jackson wrote:
> Can you recompile your qemu-dm with
> #define DEBUG_SERIAL
> near the top of serial.c uncommented, and try booting up and get it to
> the point where some output is supposed to have come out but has got
> stuck, and then send me your /var/log/xen/qemu-dm-<whatever>.log ?
Sure: http://web.mit.edu/andersk/Public/qemu-dm-andersk-intrepid.log (3
MB).
Anders
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 21:52 ` Anders Kaseorg
@ 2009-02-10 15:34 ` Ian Jackson
2009-02-10 18:20 ` Anders Kaseorg
0 siblings, 1 reply; 9+ messages in thread
From: Ian Jackson @ 2009-02-10 15:34 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: xen-devel
Anders Kaseorg writes ("Re: [Xen-devel] Serial console hangs with Linux 2.6.20 HVM guest"):
> In case this is more interesting, here is a log of all the breakpoints
> hit during a Linux boot, up to the point where it hangs:
>
> http://web.mit.edu/andersk/Public/xen-serial-log
Thanks, that's great. But I think it's a kernel bug. The last thing
the kernel does is read the IIR (Interrupt Identification Register)
twice at a time when the transmit FIFO is empty. Reading the IIR is
(sadly) not a side-effect-free operation; specifically, it cancels any
outstanding transmit fifo/buffer empty interrupt[1]. So the first
time it gets told `Transmit Holding Register Empty interrupt', but
that has the effect of clearing the interrupt so the second time it
reads the IIR it gets `no interrupt pending'.
[1] I'm getting this out of the National Semiconductor datasheet for
the PC16550D, document number TL/C/8652, June 1995. See for example
section 8.11 `FIFO Interrupt Mode Operation' item A:
The transmit holding register interrupt (02) occurs when
the XMIT FIFO is empty; it is cleared as soon as the transmitter
holding register is written to ([...]) or the IIR is read.
As far as I can see qemu-dm is emulating this accurately.
Can you point me at the exact kernel source code you're using ?
Thanks,
Ian.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-10 15:34 ` Ian Jackson
@ 2009-02-10 18:20 ` Anders Kaseorg
0 siblings, 0 replies; 9+ messages in thread
From: Anders Kaseorg @ 2009-02-10 18:20 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
On Tue, 10 Feb 2009, Ian Jackson wrote:
> Can you point me at the exact kernel source code you're using ?
Yes, I took Linux v2.6.20 on amd64, ran `make defconfig`, then ran `make
menuconfig` and turned off CONFIG_HOTPLUG_CPU (Processor type and features
→ Support for hot-pluggable CPUs).
Anders
^ permalink raw reply [flat|nested] 9+ messages in thread