From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:46212) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T4mdr-0006ph-ID for qemu-devel@nongnu.org; Fri, 24 Aug 2012 01:41:14 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1T4mdp-0003u9-JU for qemu-devel@nongnu.org; Fri, 24 Aug 2012 01:41:11 -0400
Received: from mout.web.de ([212.227.17.11]:52945) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T4mdS-0003oE-K4 for qemu-devel@nongnu.org; Fri, 24 Aug 2012 01:41:09 -0400
Message-ID: <503713D4.6000002@web.de>
Date: Fri, 24 Aug 2012 07:40:36 +0200
From: Jan Kiszka
MIME-Version: 1.0
References: <1345703083-25322-1-git-send-email-mmogilvi_qemu@miniinfo.net> <1345703083-25322-7-git-send-email-mmogilvi_qemu@miniinfo.net>
In-Reply-To: <1345703083-25322-7-git-send-email-mmogilvi_qemu@miniinfo.net>
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig2495B0A528286768F7169FFC"
Subject: Re: [Qemu-devel] [PATCH v2 6/6] i8259: add -no-spurious-interrupt-hack option
To: Matthew Ogilvie
Cc: qemu-devel@nongnu.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig2495B0A528286768F7169FFC
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: quoted-printable

On 2012-08-23 08:24, Matthew Ogilvie wrote:
> This patch provides a way to optionally suppress spurious interrupts,
> as a workaround for systems described below:
>
> Some old operating systems do not handle spurious interrupts well,
> and qemu tends to generate them significantly more often than
> real hardware.
>
> Examples:
>   - Microport UNIX System V/386 v 2.1 (ca 1987)
>     (The main problem I'm fixing: Without this patch, it panics
>     sporadically when accessing the hard disk.)
>   - AT&T UNIX System V/386 Release 4.0 Version 2.1a (ca 1991)
>     See screenshot in "QEMU Official OS Support List":
>     http://www.claunia.com/qemu/objectManager.php?sClass=application&iId=9
>     (I don't have this system to test.)
>   - A report about OS/2 boot lockup from 2004 by Hampa Hug:
>     http://lists.nongnu.org/archive/html/qemu-devel/2004-09/msg00367.html
>     (My patch was partially inspired by his.)
>     Also: http://lists.nongnu.org/archive/html/qemu-devel/2005-06/msg00243.html
>     (I don't have this system to test.)
>
> Signed-off-by: Matthew Ogilvie
> ---
>
> Note: checkpatch.pl gives an error about initializing the global
> "int no_spurious_interrupt_hack = 0;", even though existing lines
> near it are doing the same thing.  Should I give precedence to
> checkpatch.pl, or nearby code?
>
> There was no version 1 of this patch; this was the last thing I had to
> work around to get UNIX running.
>
> High level symptoms:
>    1. Despite using this UNIX system for nearly 10 years (ca 1987-1996)
>       on an early 80386, I don't remember ever seeing any crash like
>       this.  I vaguely remember I may have had one or two crashes for
>       which I don't have other explanations that perhaps could have
>       been this, but I don't remember the error messages to confirm it.
>    2. It is somewhat random when UNIX crashes when running in qemu.
>       - Sometimes it crashes the first time the floppy-based installer
>         tries to access the hard disk (partition table?).
>       - Other times (though fairly rarely), it actually finishes
>         formatting and copying the first disk's files to the
>         hard disk without crashing.
>       - On the other hand, I've never seen it successfully boot from
>         the hard disk without this patch.  An attempt to boot from
>         the hard drive always panics quite early.
>    3. I tried -win2k-hack instead, thinking maybe the hard disk is just
>       responding faster than UNIX expected.  But it doesn't seem
>       to have any effect.
>       UNIX still panics sporadically the same way.
>       - TANGENT: I was going to see if my patch provides an
>         alternative fix for installing Windows 2000, but
>         I was unable to reproduce the original -win2k-hack problem at
>         all (with neither -win2k-hack NOR this patch).  Maybe
>         some other change has fixed it some other way?  Or maybe
>         it is only an issue in configurations I didn't test?
>         (KVM instead of TCG?  Less RAM?  Something else?)
>         It might be worth doing a little more investigation,
>         and eliminating the -win2k-hack option if appropriate.
>    4. If I enable KVM, I get a different error very early in
>       bootup (in splx function instead of splint), and this patch
>       doesn't help.
>
> ============
> My low level analysis of what is going on:
>
> It is hard to track down all the details, but based on logging a
> lot of qemu IRQ stuff, and setting a breakpoint in the earliest
> panic-related UNIX function using gdb, it looks like:
>
>    1. It is near the end of servicing a previous IRQ14 from the
>       hard disk.
>    2. The processor has interrupts disabled (I think), while UNIX
>       clears the slave 8259's IMR (mask) register (sets it to 0), allowing
>       all interrupts to be passed on to the master.
>    3. While in that state, IRQ14 is raised (on the slave), which
>       gets propagated to the master (IRQ2), but the CPU
>       is not interrupted yet.
>    4. UNIX then masks the slave 8259's IMR register
>       completely (sets to 0xff).
>    5. Because the master elcr register is set (by BIOS; UNIX never
>       touches it) to edge trigger for IRQ2, the master latched on
>       to IRQ2 earlier, and continues to assert the processor's INT line
>       (the env->interrupt_request&CPU_INTERRUPT_HARD bit) even
>       after all slave IRQs have been masked off (clearing the input
>       IRQ2).
>    6. Finally, UNIX enables CPU interrupts and the interrupt is delivered
>       to the CPU, which ends up as a spurious IRQ15 due to the
>       slave's imr register.
>       UNIX doesn't know what to do with
>       that, and panics/halts.
>
> I'm not sure why it only sporadically hits this sequence of events.
> There doesn't seem to be other IRQs asserted or serviced anywhere
> in the near past; the last several were all IRQ14's.  But I can't
> help feeling I'm not reading the log output correctly or something,
> because that doesn't make sense.  Maybe there is some kind
> of a-few-instructions delay before a CPU interrupt is actually
> delivered after interrupts are enabled, or some delay in raising
> IRQ14 after a hard drive operation is requested, and such delays
> need to fall into a narrow window of opportunity left by UNIX?
>
> I can get a disassembly of the UNIX kernel using a "coff"-enabled
> build of GNU objdump, giving function names but not much else.
> But I haven't studied it in enough detail to actually find the
> relevant code path that is manipulating imr as described above.
> However, this old post outlines some of the high level theory
> of UNIX spl*() functions:
> http://www.linuxmisc.com/29-unix-internals/4e6c1f6fa2e41670.htm
>
> If anyone wants to look into this further, I can provide access to the
> initial boot install floppy, at least.  Email me.  (Without the rest
> of the install disks, it isn't much use for anything except testing
> virtual machines like qemu against rare corner cases...)
>
> ============
> Alternative Approaches:
>
> An alternative to this patch that might work (I haven't tried) would
> be to have BIOS set the master's elcr register 0x04 bit, making IRQ2
> level triggered instead of edge triggered.  I'm not sure what other
> effects this might have.  Maybe it would actually be a more accurate
> model (I haven't checked documentation; maybe "slave mode" of an
> IRQ line into the master is supposed to be level triggered?)
>
> Or perhaps find a way to model the minimum timescale that an interrupt
> request needs to be active to be recognized?
>
> Or maybe my analysis isn't correct; I wasn't able to find the
> relevant code path in the UNIX kernel.
>
> ============
>
>  cpu-exec.c      |   12 +++++++-----
>  hw/i8259.c      |   18 ++++++++++++++++++
>  qemu-options.hx |   12 ++++++++++++
>  sysemu.h        |    1 +
>  vl.c            |    4 ++++
>  5 files changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 134b3c4..c309847 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -329,11 +329,15 @@ int cpu_exec(CPUArchState *env)
>                                                            0);
>                              env->interrupt_request &= ~(CPU_INTERRUPT_HARD | CPU_INTERRUPT_VIRQ);
>                              intno = cpu_get_pic_interrupt(env);
> -                            qemu_log_mask(CPU_LOG_TB_IN_ASM, "Servicing hardware INT=0x%02x\n", intno);
> -                            do_interrupt_x86_hardirq(env, intno, 1);
> -                            /* ensure that no TB jump will be modified as
> -                               the program flow was changed */
> -                            next_tb = 0;
> +                            if (intno >= 0) {
> +                                qemu_log_mask(CPU_LOG_TB_IN_ASM,
> +                                              "Servicing hardware INT=0x%02x\n",
> +                                              intno);
> +                                do_interrupt_x86_hardirq(env, intno, 1);
> +                                /* ensure that no TB jump will be modified as
> +                                   the program flow was changed */
> +                                next_tb = 0;
> +                            }
>  #if !defined(CONFIG_USER_ONLY)
>                          } else if ((interrupt_request & CPU_INTERRUPT_VIRQ) &&
>                                     (env->eflags & IF_MASK) &&
> diff --git a/hw/i8259.c b/hw/i8259.c
> index 6587666..7ecb7e1 100644
> --- a/hw/i8259.c
> +++ b/hw/i8259.c
> @@ -26,6 +26,7 @@
>  #include "isa.h"
>  #include "monitor.h"
>  #include "qemu-timer.h"
> +#include "sysemu.h"
>  #include "i8259_internal.h"
>
>  /* debug PIC */
> @@ -193,6 +194,20 @@ int pic_read_irq(DeviceState *d)
>                  pic_intack(slave_pic, irq2);
>              } else {
>                  /* spurious IRQ on slave controller */
> +                if (no_spurious_interrupt_hack) {
> +                    /* Pretend it was delivered and acknowledged.  If
> +                     * it was spurious due to slave_pic->imr, then
> +                     * as soon as the mask is cleared, the slave will
> +                     * re-trigger IRQ2 on the master.
> +                     * If it is spurious for
> +                     * some other reason, make sure we don't keep trying
> +                     * to half-process the same spurious interrupt over
> +                     * and over again.
> +                     */
> +                    s->irr &= ~(1<
> +                    s->last_irr &= ~(1<
> +                    s->isr &= ~(1<
> +                    return -1;
> +                }
>                  irq2 = 7;
>              }
>              intno = slave_pic->irq_base + irq2;
> @@ -202,6 +217,9 @@ int pic_read_irq(DeviceState *d)
>          pic_intack(s, irq);
>      } else {
>          /* spurious IRQ on host controller */
> +        if (no_spurious_interrupt_hack) {
> +            return -1;
> +        }
>          irq = 7;
>          intno = s->irq_base + irq;
>      }
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 03e13ec..57bb0b4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1188,6 +1188,18 @@ Windows 2000 is installed, you no longer need this option (this option
>  slows down the IDE transfers).
>  ETEXI
>
> +DEF("no-spurious-interrupt-hack", 0, QEMU_OPTION_no_spurious_interrupt_hack,
> +    "-no-spurious-interrupt-hack     disable delivery of spurious interrupts\n",
> +    QEMU_ARCH_I386)
> +STEXI
> +@item -no-spurious-interrupt-hack
> +@findex -no-spurious-interrupt-hack
> +Use it as a workaround for operating systems that drive PICs in a way that
> +can generate spurious interrupts, but the OS doesn't handle spurious
> +interrupts gracefully.  (e.g. late 80s/early 90s versions of ATT UNIX
> +and derivatives)

This has to mention, or even actively warn, that it doesn't work with
KVM and its in-kernel irqchip (as that PIC model lacks your hack).

However, I strongly suspect you are nastily papering over an issue in
some device model.
So I would prefer to dig deeper before merging this upstream (also due
to its dependency on the userspace PIC model).

Jan


--------------enig2495B0A528286768F7169FFC
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlA3E9cACgkQitSsb3rl5xTIiQCePvrHtZBDK3rR/fHousrIK6gf
LI4AoLF31w6fYQcKhWhP5SsOoAk1cquH
=h3U2
-----END PGP SIGNATURE-----

--------------enig2495B0A528286768F7169FFC--
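
For anyone who wants to dig deeper: the six-step sequence in the quoted low-level analysis can be replayed with a small stand-alone toy model. This is only a sketch under the assumptions of that analysis (an edge-mode request stays latched in the master after the cascade input drops). `ToyPic` and the `toy_*` functions are hypothetical names made up for illustration; this is not QEMU's hw/i8259.c, and it makes no claim about real 8259 hardware. The `irq2_level_triggered` parameter corresponds to the "set the master's elcr 0x04 bit" alternative mentioned in the patch notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 8259 state; names are illustrative and are not QEMU's. */
typedef struct {
    uint8_t irr;   /* requests currently latched */
    uint8_t imr;   /* interrupt mask, 1 = masked */
    uint8_t last;  /* previous input levels, for edge detection */
    uint8_t elcr;  /* 1 = level triggered, 0 = edge triggered */
} ToyPic;

/* Drive input pin n; an edge-mode request stays latched after the pin drops. */
static void toy_set_input(ToyPic *p, int n, bool level)
{
    uint8_t bit = (uint8_t)(1u << n);

    if (p->elcr & bit) {                      /* level mode: follow the pin */
        if (level) p->irr |= bit; else p->irr &= (uint8_t)~bit;
    } else if (level && !(p->last & bit)) {
        p->irr |= bit;                        /* rising edge: latch request */
    }
    if (level) p->last |= bit; else p->last &= (uint8_t)~bit;
}

/* Lowest-numbered unmasked pending pin, or -1 if none. */
static int toy_pending(const ToyPic *p)
{
    uint8_t req = (uint8_t)(p->irr & (uint8_t)~p->imr);

    for (int n = 0; n < 8; n++) {
        if (req & (1u << n)) return n;
    }
    return -1;
}

/* Propagate the slave's output to the master's cascade input (IRQ2). */
static void toy_cascade(ToyPic *master, const ToyPic *slave)
{
    toy_set_input(master, 2, toy_pending(slave) >= 0);
}

/* Interrupt acknowledge: returns the IRQ number the CPU would be handed. */
static int toy_read_irq(ToyPic *master, ToyPic *slave)
{
    int irq = toy_pending(master);

    if (irq < 0) return 7;                    /* spurious on the master */
    if (!(master->elcr & (1u << irq))) {
        master->irr &= (uint8_t)~(1u << irq); /* ack an edge-mode request */
    }
    if (irq != 2) return irq;

    int irq2 = toy_pending(slave);
    if (irq2 < 0) return 15;                  /* spurious on the slave */
    slave->irr &= (uint8_t)~(1u << irq2);
    return 8 + irq2;
}

/* Replays steps 2-6; returns the delivered IRQ, or -1 if INT is not asserted. */
int toy_replay(bool irq2_level_triggered)
{
    ToyPic master = { .elcr = irq2_level_triggered ? 0x04 : 0x00 };
    ToyPic slave = { .imr = 0xff };

    slave.imr = 0x00;                         /* step 2: guest unmasks slave */
    toy_cascade(&master, &slave);
    toy_set_input(&slave, 6, true);           /* step 3: IRQ14 is raised */
    toy_cascade(&master, &slave);
    slave.imr = 0xff;                         /* step 4: guest masks slave */
    toy_cascade(&master, &slave);
    if (toy_pending(&master) < 0) return -1;  /* step 5: is INT still asserted? */
    return toy_read_irq(&master, &slave);     /* step 6: CPU enables interrupts */
}
```

In this toy model `toy_replay(false)` hands the CPU IRQ15 (the spurious case described in step 6), while `toy_replay(true)` leaves INT deasserted because the level-triggered cascade input follows the slave's masked output. Whether either behaviour matches real hardware is exactly the open question in this thread.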