From: Anthony Liguori <anthony@codemonkey.ws>
To: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 6/6] i8259: add -no-spurious-interrupt-hack option
Date: Mon, 27 Aug 2012 08:55:06 -0500
Message-ID: <87ehmsa18l.fsf@codemonkey.ws>
In-Reply-To: <1345703083-25322-7-git-send-email-mmogilvi_qemu@miniinfo.net>

Matthew Ogilvie <mmogilvi_qemu@miniinfo.net> writes:

> This patch provides a way to optionally suppress spurious interrupts,
> as a workaround for systems described below:
>
> Some old operating systems do not handle spurious interrupts well,
> and qemu tends to generate them significantly more often than
> real hardware.

This is the wrong approach.  Instead, add a LostTickPolicy property to
the i8259 device.

Regards,

Anthony Liguori

>
> Examples:
>   - Microport UNIX System V/386 v 2.1 (ca 1987)
>     (The main problem I'm fixing: Without this patch, it panics
>     sporadically when accessing the hard disk.)
>   - AT&T UNIX System V/386 Release 4.0 Version 2.1a (ca 1991)
>     See screenshot in "QEMU Official OS Support List":
>     http://www.claunia.com/qemu/objectManager.php?sClass=application&iId=9
>     (I don't have this system to test.)
>   - A report about OS/2 boot lockup from 2004 by Hampa Hug:
>     http://lists.nongnu.org/archive/html/qemu-devel/2004-09/msg00367.html
>     (My patch was partially inspired by his.)
>     Also: http://lists.nongnu.org/archive/html/qemu-devel/2005-06/msg00243.html
>     (I don't have this system to test.)
>
> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> ---
>
> Note: checkpatch.pl gives an error about initializing the global
> "int no_spurious_interrupt_hack = 0;", even though existing lines
> near it are doing the same thing.  Should I give precedence to
> checkpatch.pl, or nearby code?
>
> There was no version 1 of this patch; this was the last thing I had to
> work around to get UNIX running.
>
> High level symptoms:
>    1. Despite using this UNIX system for nearly 10 years (ca 1987-1996)
>       on an early 80386, I don't remember ever seeing any crash like
>       this.  I vaguely remember I may have had one or two crashes for
>       which I don't have other explanations that perhaps could have
>       been this, but I don't remember the error messages to confirm it.
>    2. It is somewhat random when UNIX crashes when running in qemu.
>        - Sometimes it crashes the first time the floppy-based installer
>          tries to access the hard disk (partition table?).
>        - Other times (though fairly rarely), it actually finishes
>          formatting and copying the first disk's files to the
>          hard disk without crashing.
>        - On the other hand, I've never seen it successfully boot from
>          the hard disk without this patch.  An attempt to boot from
>          the hard drive always panics quite early.
>    3. I tried -win2k-hack instead, thinking maybe the hard disk is just
>       responding faster than UNIX expected.  But it doesn't seem
>       to have any effect.  UNIX still panics sporadically the same way.
>        - TANGENT: I was going to see if my patch provides an
>          alternative fix for installing Windows 2000, but
>          I was unable to reproduce the original -win2k-hack problem at
>          all (with neither -win2k-hack NOR this patch).  Maybe
>          some other change has fixed it some other way?  Or maybe
>          it is only an issue in configurations I didn't test?
>          (KVM instead of TCG?  Less RAM?  Something else?)
>             It might be worth doing a little more investigation,
>          and eliminating the -win2k-hack option if appropriate.
>    4. If I enable KVM, I get a different error very early in
>       bootup (in splx function instead of splint), and this patch
>       doesn't help.
>
> ============
> My low level analysis of what is going on:
>
> It is hard to track down all the details, but based on logging a
> lot of qemu IRQ stuff, and setting a breakpoint in the earliest
> panic-related UNIX function using gdb, it looks like:
>
>    1. It is near the end of servicing a previous IRQ14 from the
>       hard disk.
>    2. The processor has interrupts disabled (I think), while UNIX
>       clears the slave 8259's IMR (mask) register (sets it to 0), allowing
>       all interrupts to be passed on to the master.
>    3. While in that state, IRQ14 is raised (on the slave), which
>       gets propagated to the master (IRQ2), but the CPU
>       is not interrupted yet.
>    4. UNIX then masks the slave 8259's IMR register
>       completely (sets to 0xff).
>    5. Because the master elcr register is set (by BIOS; UNIX never
>       touches it) to edge trigger for IRQ2, the master latched on
>       to IRQ2 earlier, and continues to assert the processor's INT line
>       (the env->interrupt_request&CPU_INTERRUPT_HARD bit) even
>       after all slave IRQs have been masked off (clearing the input
>       IRQ2).
>    6. Finally, UNIX enables CPU interrupts and the interrupt is delivered
>       to the CPU, which ends up as a spurious IRQ15 due to the
>       slave's imr register.  UNIX doesn't know what to do with
>       that, and panics/halts.
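
The six steps above can be replayed against a toy model of the cascaded
PICs.  This is only a sketch under stated assumptions, not QEMU's
hw/i8259.c: the Pic struct, the helper names and replay_sequence() are
hypothetical, the vector bases 0x08/0x70 are the conventional PC values
(assumed, not taken from the report), and the ISR and priority rotation
are left out.  It models edge-triggered inputs only, where a falling
edge does not clear an already latched request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified 8259 state (no ISR, no priorities). */
typedef struct Pic {
    uint8_t irr;       /* latched interrupt requests */
    uint8_t imr;       /* interrupt mask register */
    uint8_t last_irr;  /* previous input levels, used for edge detection */
    uint8_t irq_base;  /* vector base programmed by the guest (assumed) */
} Pic;

/* Edge-triggered input: latch on a rising edge, keep the latch on a
 * falling edge. */
static void pic_set_irq(Pic *p, int irq, int level)
{
    assert(irq >= 0 && irq < 8);
    uint8_t mask = (uint8_t)(1u << irq);
    if (level) {
        if (!(p->last_irr & mask)) {
            p->irr |= mask;
        }
        p->last_irr |= mask;
    } else {
        p->last_irr &= (uint8_t)~mask;
    }
}

/* Lowest-numbered pending, unmasked request, or -1 if there is none. */
static int pic_get_irq(const Pic *p)
{
    uint8_t pending = (uint8_t)(p->irr & ~p->imr);
    for (int i = 0; i < 8; i++) {
        if (pending & (1u << i)) {
            return i;
        }
    }
    return -1;
}

/* The slave's INT output drives the master's IRQ2 input. */
static void cascade_update(Pic *master, const Pic *slave)
{
    pic_set_irq(master, 2, pic_get_irq(slave) >= 0);
}

/* Interrupt acknowledge: returns the vector the CPU would receive. */
static int read_irq(Pic *master, Pic *slave)
{
    int irq = pic_get_irq(master);
    if (irq == 2) {
        master->irr &= (uint8_t)~(1u << 2);
        int irq2 = pic_get_irq(slave);
        if (irq2 >= 0) {
            slave->irr &= (uint8_t)~(1u << irq2);
        } else {
            irq2 = 7;  /* spurious IRQ on the slave, i.e. IRQ15 */
        }
        return slave->irq_base + irq2;
    }
    if (irq >= 0) {
        master->irr &= (uint8_t)~(1u << irq);
        return master->irq_base + irq;
    }
    return master->irq_base + 7;  /* spurious IRQ on the master */
}

/* Replays steps 2-6 and returns the delivered vector, or -1 if the
 * master has nothing pending when the CPU re-enables interrupts. */
int replay_sequence(void)
{
    Pic master = { .imr = 0x00, .irq_base = 0x08 };
    Pic slave  = { .imr = 0xff, .irq_base = 0x70 };

    slave.imr = 0x00;                 /* step 2: guest unmasks the slave */
    cascade_update(&master, &slave);
    pic_set_irq(&slave, 6, 1);        /* step 3: IRQ14 is slave input 6 */
    cascade_update(&master, &slave);  /*         master latches IRQ2 */
    slave.imr = 0xff;                 /* step 4: guest masks the slave */
    cascade_update(&master, &slave);  /* step 5: IRQ2 input drops, but
                                       *         the latch stays set */
    if (pic_get_irq(&master) < 0) {
        return -1;
    }
    return read_irq(&master, &slave); /* step 6: CPU acknowledges */
}
```

In this model replay_sequence() returns slave irq_base + 7, the vector
for IRQ15, even though the only real request was IRQ14 and it is still
latched in the slave's IRR behind the mask.
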
>
> I'm not sure why it only sporadically hits this sequence of events.
> There doesn't seem to be other IRQs asserted or serviced anywhere
> in the near past; the last several were all IRQ14's.  But I can't
> help feeling I'm not reading the log output correctly or something,
> because that doesn't make sense.  Maybe there is some kind
> of a-few-instructions delay before a CPU interrupt is actually
> delivered after interrupts are enabled, or some delay in raising
> IRQ14 after a hard drive operation is requested, and such delays
> need to fall into a narrow window of opportunity left by UNIX?
>
> I can get a disassembly of the UNIX kernel using a "coff"-enabled
> build of GNU objdump, giving function names but not much else.
> But I haven't studied it in enough detail to actually find the
> relevant code path that is manipulating imr as described above.
> However, this old post outlines some of the high level theory
> of UNIX spl*() functions:
> http://www.linuxmisc.com/29-unix-internals/4e6c1f6fa2e41670.htm
>
> If anyone wants to look into this further, I can provide access to the
> initial boot install floppy, at least.  Email me.  (Without the rest
> of the install disks, it isn't much use for anything except testing
> virtual machines like qemu against rare corner cases...)
>
> ============
> Alternative Approaches:
>
> An alternative to this patch that might work (I haven't tried) would
> be to have BIOS set the master's elcr register 0x04 bit, making IRQ2
> level triggered instead of edge triggered.  I'm not sure what other
> effects this might have.  Maybe it would actually be a more accurate
> model (I haven't checked documentation; maybe "slave mode" of an
> IRQ line into the master is supposed to be level triggered?)
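
To illustrate why the ELCR bit matters for this idea, here is a sketch
of how a latched IRQ2 behaves in each trigger mode.  It is a toy model
under stated assumptions, not QEMU's implementation and not a tested
fix for the guest: the Pic struct and master_pending_after_mask() are
hypothetical names, and only the IRR, IMR and ELCR are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified PIC state with a per-line trigger mode. */
typedef struct Pic {
    uint8_t irr;       /* latched interrupt requests */
    uint8_t imr;       /* interrupt mask register */
    uint8_t last_irr;  /* previous input levels, used for edge detection */
    uint8_t elcr;      /* bit set = level triggered, clear = edge */
} Pic;

static void pic_set_irq(Pic *p, int irq, int level)
{
    assert(irq >= 0 && irq < 8);
    uint8_t mask = (uint8_t)(1u << irq);
    if (p->elcr & mask) {
        /* Level triggered: the request simply follows the input. */
        if (level) {
            p->irr |= mask;
            p->last_irr |= mask;
        } else {
            p->irr &= (uint8_t)~mask;
            p->last_irr &= (uint8_t)~mask;
        }
    } else {
        /* Edge triggered: latch on a rising edge and keep the latch
         * when the input falls again. */
        if (level) {
            if (!(p->last_irr & mask)) {
                p->irr |= mask;
            }
            p->last_irr |= mask;
        } else {
            p->last_irr &= (uint8_t)~mask;
        }
    }
}

/* Nonzero when the PIC has a pending, unmasked request. */
static int pic_output(const Pic *p)
{
    return (p->irr & ~p->imr) != 0;
}

/* Raise IRQ14 on an unmasked slave, then mask the slave completely.
 * Returns nonzero if the master still asserts INT afterwards. */
int master_pending_after_mask(uint8_t master_elcr)
{
    Pic master = { .elcr = master_elcr };
    Pic slave  = { 0 };

    pic_set_irq(&slave, 6, 1);                    /* IRQ14 = slave input 6 */
    pic_set_irq(&master, 2, pic_output(&slave));  /* cascade goes high */
    slave.imr = 0xff;                             /* guest masks the slave */
    pic_set_irq(&master, 2, pic_output(&slave));  /* cascade goes low */
    return pic_output(&master);
}
```

With master_elcr = 0x00 (edge) the master keeps asserting INT after the
slave is masked; with master_elcr = 0x04 (IRQ2 level triggered) the
request is withdrawn together with the input, so in this model no
spurious acknowledge cycle would follow.
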
>
> Or perhaps find a way to model the minimum timescale that an interrupt
> request needs to be active to be recognized?
>
> Or maybe my analysis isn't correct; I wasn't able to find the
> relevant code path in the UNIX kernel.
>
> ============
>
>  cpu-exec.c      | 12 +++++++-----
>  hw/i8259.c      | 18 ++++++++++++++++++
>  qemu-options.hx | 12 ++++++++++++
>  sysemu.h        |  1 +
>  vl.c            |  4 ++++
>  5 files changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 134b3c4..c309847 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -329,11 +329,15 @@ int cpu_exec(CPUArchState *env)
>                                                            0);
>                              env->interrupt_request &= ~(CPU_INTERRUPT_HARD | CPU_INTERRUPT_VIRQ);
>                              intno = cpu_get_pic_interrupt(env);
> -                            qemu_log_mask(CPU_LOG_TB_IN_ASM, "Servicing hardware INT=0x%02x\n", intno);
> -                            do_interrupt_x86_hardirq(env, intno, 1);
> -                            /* ensure that no TB jump will be modified as
> -                               the program flow was changed */
> -                            next_tb = 0;
> +                            if (intno >= 0) {
> +                                qemu_log_mask(CPU_LOG_TB_IN_ASM,
> +                                              "Servicing hardware INT=0x%02x\n",
> +                                              intno);
> +                                do_interrupt_x86_hardirq(env, intno, 1);
> +                                /* ensure that no TB jump will be modified as
> +                                   the program flow was changed */
> +                                next_tb = 0;
> +                            }
>  #if !defined(CONFIG_USER_ONLY)
>                          } else if ((interrupt_request & CPU_INTERRUPT_VIRQ) &&
>                                     (env->eflags & IF_MASK) && 
> diff --git a/hw/i8259.c b/hw/i8259.c
> index 6587666..7ecb7e1 100644
> --- a/hw/i8259.c
> +++ b/hw/i8259.c
> @@ -26,6 +26,7 @@
>  #include "isa.h"
>  #include "monitor.h"
>  #include "qemu-timer.h"
> +#include "sysemu.h"
>  #include "i8259_internal.h"
>  
>  /* debug PIC */
> @@ -193,6 +194,20 @@ int pic_read_irq(DeviceState *d)
>                  pic_intack(slave_pic, irq2);
>              } else {
>                  /* spurious IRQ on slave controller */
> +                if (no_spurious_interrupt_hack) {
> +                    /* Pretend it was delivered and acknowledged.  If
> +                     * it was spurious due to slave_pic->imr, then
> +                     * as soon as the mask is cleared, the slave will
> +                     * re-trigger IRQ2 on the master.  If it is spurious for
> +                     * some other reason, make sure we don't keep trying
> +                     * to half-process the same spurious interrupt over
> +                     * and over again.
> +                     */
> +                    s->irr &= ~(1<<irq);
> +                    s->last_irr &= ~(1<<irq);
> +                    s->isr &= ~(1<<irq);
> +                    return -1;
> +                }
>                  irq2 = 7;
>              }
>              intno = slave_pic->irq_base + irq2;
> @@ -202,6 +217,9 @@ int pic_read_irq(DeviceState *d)
>          pic_intack(s, irq);
>      } else {
>          /* spurious IRQ on host controller */
> +        if (no_spurious_interrupt_hack) {
> +            return -1;
> +        }
>          irq = 7;
>          intno = s->irq_base + irq;
>      }
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 03e13ec..57bb0b4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1188,6 +1188,18 @@ Windows 2000 is installed, you no longer need this option (this option
>  slows down the IDE transfers).
>  ETEXI
>  
> +DEF("no-spurious-interrupt-hack", 0, QEMU_OPTION_no_spurious_interrupt_hack,
> +    "-no-spurious-interrupt-hack     disable delivery of spurious interrupts\n",
> +    QEMU_ARCH_I386)
> +STEXI
> +@item -no-spurious-interrupt-hack
> +@findex -no-spurious-interrupt-hack
> +Use it as a workaround for operating systems that drive PICs in a way that
> +can generate spurious interrupts, but the OS doesn't handle spurious
> +interrupts gracefully.  (e.g. late 80s/early 90s versions of ATT UNIX
> +and derivatives)
> +ETEXI
> +
>  HXCOMM Deprecated by -rtc
>  DEF("rtc-td-hack", 0, QEMU_OPTION_rtc_td_hack, "", QEMU_ARCH_I386)
>  
> diff --git a/sysemu.h b/sysemu.h
> index 65552ac..0170109 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -117,6 +117,7 @@ extern int graphic_depth;
>  extern DisplayType display_type;
>  extern const char *keyboard_layout;
>  extern int win2k_install_hack;
> +extern int no_spurious_interrupt_hack;
>  extern int alt_grab;
>  extern int ctrl_grab;
>  extern int usb_enabled;
> diff --git a/vl.c b/vl.c
> index 16d04a2..6de41c1 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -204,6 +204,7 @@ CharDriverState *serial_hds[MAX_SERIAL_PORTS];
>  CharDriverState *parallel_hds[MAX_PARALLEL_PORTS];
>  CharDriverState *virtcon_hds[MAX_VIRTIO_CONSOLES];
>  int win2k_install_hack = 0;
> +int no_spurious_interrupt_hack = 0;
>  int usb_enabled = 0;
>  int singlestep = 0;
>  int smp_cpus = 1;
> @@ -3046,6 +3047,9 @@ int main(int argc, char **argv, char **envp)
>              case QEMU_OPTION_win2k_hack:
>                  win2k_install_hack = 1;
>                  break;
> +            case QEMU_OPTION_no_spurious_interrupt_hack:
> +                no_spurious_interrupt_hack = 1;
> +                break;
>              case QEMU_OPTION_rtc_td_hack: {
>                  static GlobalProperty slew_lost_ticks[] = {
>                      {
> -- 
> 1.7.10.2.484.gcd07cc5
