qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] debugging apic
@ 2010-09-28 15:15 Sam King
  2010-09-28 19:57 ` [Qemu-devel] PATCH: " Sam King
  2010-09-29  0:53 ` [Qemu-devel] " TeLeMan
  0 siblings, 2 replies; 5+ messages in thread
From: Sam King @ 2010-09-28 15:15 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2103 bytes --]

  Hello,

I am seeing a weird crash in my system and I am trying to figure out if 
it is a software bug or a qemu emulation bug.  From the software 
perspective I am getting a GP fault at a time where it looks like 
everything should be running normally.  After digging into the Qemu 
source code I found out where the GPF was coming from.  It looks like 
intno = -1 when it was being passed into do_interrupt64, which was 
triggering one of the GPF checks.  From what I can tell, intno was being 
set to -1 by an interrupt_request in cpu-exec.c, which was going down 
the following if statement around line 409 of that file:

else if ((interrupt_request & CPU_INTERRUPT_HARD) &&
                                    (((env->hflags2 & HF2_VINTR_MASK) &&
                                      (env->hflags2 & HF2_HIF_MASK)) ||
                                     (!(env->hflags2 & HF2_VINTR_MASK) &&
                                      (env->eflags & IF_MASK &&
                                       !(env->hflags & 
HF_INHIBIT_IRQ_MASK)))))

and from within that else if statement, env has the following state:

hflags2 = 0x00000001
eflags = 0x00003202
hflags = 0x0040c0b7
interrupt request = 0x00000002

But intno is being set equal to -1 by the call to cpu_get_pic_interrupt, 
from the call to apic_accept_pic_intr returning 0.  If I change the 
cpu_get_pic_interrupt code to this:

int cpu_get_pic_interrupt(CPUState *env)
{
     int intno;

     intno = apic_get_interrupt(env);
     if (intno >= 0) {
         /* set irq request if a PIC irq is still pending */
         /* XXX: improve that */
         pic_update_irq(isa_pic);
         return intno;
     }
     /* read the irq from the PIC */
     if (!apic_accept_pic_intr(env)) {
         //return -1;
     }

     intno = pic_read_irq(isa_pic);

     return intno;
}

Then the issue manifests as a spurious interrupt and the software 
ignores it, avoiding the GPF.  Does anyone have any ideas as to what is 
going wrong here?  Should I look more closely at the Qemu emulation code 
or my software? Any help is appreciated.

Thanks!

--Sam

[-- Attachment #2: Type: text/html, Size: 4237 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] PATCH: debugging apic
  2010-09-28 15:15 [Qemu-devel] debugging apic Sam King
@ 2010-09-28 19:57 ` Sam King
  2010-09-29  6:44   ` [Qemu-devel] " Jan Kiszka
  2010-09-29  0:53 ` [Qemu-devel] " TeLeMan
  1 sibling, 1 reply; 5+ messages in thread
From: Sam King @ 2010-09-28 19:57 UTC (permalink / raw)
  To: qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 2588 bytes --]

  Thanks to Bernhard Kauer for pointing out the problem.  Apparently if 
software disables LVT_LINT0 when there is a pending CPU_HARD_INTERRUPT 
you can get into trouble.  I attached a patch that fixes the problem by 
resetting the interrupt_request.  I am not sure if we need to do the 
same for LINT1, but this fixed the incorrect GPF I was getting.

--Sam

On 9/28/10 10:15 AM, Sam King wrote:
> Hello,
>
> I am seeing a weird crash in my system and I am trying to figure out 
> if it is a software bug or a qemu emulation bug.  From the software 
> perspective I am getting a GP fault at a time where it looks like 
> everything should be running normally.  After digging into the Qemu 
> source code I found out where the GPF was coming from.  It looks like 
> intno = -1 when it was being passed into do_interrupt64, which was 
> triggering one of the GPF checks.  From what I can tell, intno was 
> being set to -1 by an interrupt_request in cpu-exec.c, which was going 
> down the following if statement around line 409 of that file:
>
> else if ((interrupt_request & CPU_INTERRUPT_HARD) &&
>                                    (((env->hflags2 & HF2_VINTR_MASK) &&
>                                      (env->hflags2 & HF2_HIF_MASK)) ||
>                                     (!(env->hflags2 & HF2_VINTR_MASK) &&
>                                      (env->eflags & IF_MASK &&
>                                       !(env->hflags & 
> HF_INHIBIT_IRQ_MASK)))))
>
> and from within that else if statement, env has the following state:
>
> hflags2 = 0x00000001
> eflags = 0x00003202
> hflags = 0x0040c0b7
> interrupt request = 0x00000002
>
> But intno is being set equal to -1 by the call to 
> cpu_get_pic_interrupt, from the call to apic_accept_pic_intr returning 
> 0.  If I change the cpu_get_pic_interrupt code to this:
>
> int cpu_get_pic_interrupt(CPUState *env)
> {
>     int intno;
>
>     intno = apic_get_interrupt(env);
>     if (intno >= 0) {
>         /* set irq request if a PIC irq is still pending */
>         /* XXX: improve that */
>         pic_update_irq(isa_pic);
>         return intno;
>     }
>     /* read the irq from the PIC */
>     if (!apic_accept_pic_intr(env)) {
>         //return -1;
>     }
>
>     intno = pic_read_irq(isa_pic);
>
>     return intno;
> }
>
> Then the issue manifests as a spurious interrupt and the software 
> ignores it, avoiding the GPF.  Does anyone have any ideas as to what 
> is going wrong here?  Should I look more closely at the Qemu emulation 
> code or my software? Any help is appreciated.
>
> Thanks!
>
> --Sam


[-- Attachment #1.2: Type: text/html, Size: 5030 bytes --]

[-- Attachment #2: apic-race.patch --]
[-- Type: text/plain, Size: 575 bytes --]

*** hw/apic.c	2010-07-22 07:39:04.000000000 -0500
--- ../qemu-0.12.5-fixed/hw/apic.c	2010-09-28 14:45:55.476945540 -0500
***************
*** 841,846 ****
--- 841,851 ----
              s->lvt[n] = val;
              if (n == APIC_LVT_TIMER)
                  apic_timer_update(s, qemu_get_clock(vm_clock));
+ 
+             if(n == APIC_LVT_LINT0) {
+                 if((val & APIC_LVT_MASKED) && (env->interrupt_request & CPU_INTERRUPT_HARD))
+                     cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
+             }
          }
          break;
      case 0x38:

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] debugging apic
  2010-09-28 15:15 [Qemu-devel] debugging apic Sam King
  2010-09-28 19:57 ` [Qemu-devel] PATCH: " Sam King
@ 2010-09-29  0:53 ` TeLeMan
  1 sibling, 0 replies; 5+ messages in thread
From: TeLeMan @ 2010-09-29  0:53 UTC (permalink / raw)
  To: Sam King; +Cc: qemu-devel

I reported this bug two year ago, but nobody cared.
http://www.mail-archive.com/qemu-devel@nongnu.org/msg15330.html
--
SUN OF A BEACH



On Tue, Sep 28, 2010 at 23:15, Sam King <kingst@uiuc.edu> wrote:
> Hello,
>
> I am seeing a weird crash in my system and I am trying to figure out if it
> is a software bug or a qemu emulation bug.  From the software perspective I
> am getting a GP fault at a time where it looks like everything should be
> running normally.  After digging into the Qemu source code I found out where
> the GPF was coming from.  It looks like intno = -1 when it was being passed
> into do_interrupt64, which was triggering one of the GPF checks.  From what
> I can tell, intno was being set to -1 by an interrupt_request in cpu-exec.c,
> which was going down the following if statement around line 409 of that
> file:
>
> else if ((interrupt_request & CPU_INTERRUPT_HARD) &&
>                                    (((env->hflags2 & HF2_VINTR_MASK) &&
>                                      (env->hflags2 & HF2_HIF_MASK)) ||
>                                     (!(env->hflags2 & HF2_VINTR_MASK) &&
>                                      (env->eflags & IF_MASK &&
>                                       !(env->hflags &
> HF_INHIBIT_IRQ_MASK)))))
>
> and from within that else if statement, env has the following state:
>
> hflags2 = 0x00000001
> eflags = 0x00003202
> hflags = 0x0040c0b7
> interrupt request = 0x00000002
>
> But intno is being set equal to -1 by the call to cpu_get_pic_interrupt,
> from the call to apic_accept_pic_intr returning 0.  If I change the
> cpu_get_pic_interrupt code to this:
>
> int cpu_get_pic_interrupt(CPUState *env)
> {
>     int intno;
>
>     intno = apic_get_interrupt(env);
>     if (intno >= 0) {
>         /* set irq request if a PIC irq is still pending */
>         /* XXX: improve that */
>         pic_update_irq(isa_pic);
>         return intno;
>     }
>     /* read the irq from the PIC */
>     if (!apic_accept_pic_intr(env)) {
>         //return -1;
>     }
>
>     intno = pic_read_irq(isa_pic);
>
>     return intno;
> }
>
> Then the issue manifests as a spurious interrupt and the software ignores
> it, avoiding the GPF.  Does anyone have any ideas as to what is going wrong
> here?  Should I look more closely at the Qemu emulation code or my software?
> Any help is appreciated.
>
> Thanks!
>
> --Sam
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] Re: PATCH: debugging apic
  2010-09-28 19:57 ` [Qemu-devel] PATCH: " Sam King
@ 2010-09-29  6:44   ` Jan Kiszka
  2010-09-29 16:47     ` Sam King
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2010-09-29  6:44 UTC (permalink / raw)
  To: Sam King; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1507 bytes --]

Am 28.09.2010 21:57, Sam King wrote:
>  Thanks to Bernhard Kauer for pointing out the problem.  Apparently if
> software disables LVT_LINT0 when there is a pending CPU_HARD_INTERRUPT
> you can get into trouble.  I attached a patch that fixes the problem by
> resetting the interrupt_request.  I am not sure if we need to do the
> same for LINT1, but this fixed the incorrect GPF I was getting.
> 

...
[ please inline patches ]

> *** hw/apic.c	2010-07-22 07:39:04.000000000 -0500
> --- ../qemu-0.12.5-fixed/hw/apic.c	2010-09-28 14:45:55.476945540 -0500
> ***************
> *** 841,846 ****
> --- 841,851 ----
>               s->lvt[n] = val;
>               if (n == APIC_LVT_TIMER)
>                   apic_timer_update(s, qemu_get_clock(vm_clock));
> + 
> +             if(n == APIC_LVT_LINT0) {
> +                 if((val & APIC_LVT_MASKED) && (env->interrupt_request & CPU_INTERRUPT_HARD))
> +                     cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
> +             }
>           }
>           break;
>       case 0x38:

This actually points out open issues, but more work is required:

You need to consider other potentially pending interrupts as well, thus
you must not blindly reset here. And the same is true for invocations of
apic_deliver_pic_intr(..., 0). The APIC has to save the PIC line state
and forward it according to its current LVT mask state, which includes
raising the interrupt if the mask is removed while the PIC line is high.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] Re: PATCH: debugging apic
  2010-09-29  6:44   ` [Qemu-devel] " Jan Kiszka
@ 2010-09-29 16:47     ` Sam King
  0 siblings, 0 replies; 5+ messages in thread
From: Sam King @ 2010-09-29 16:47 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel

  Thanks for the info.  I started looking at the APIC logic and fixing 
the behavior is going to take me a little bit of time to make sure that 
I keep all of the APIC states consistent.  In the meantime, TeLeMan 
suggested a patch a few years ago that would fix this (and potentially 
other) problems.  I have updated this patch and included it in this email:

*** cpu-exec.c  2010-07-22 07:39:04.000000000 -0500
--- ../qemu-0.12.5-fixed/cpu-exec.c     2010-09-29 11:41:25.991566727 -0500
***************
*** 388,396 ****
                                        (env->eflags & IF_MASK &&
                                         !(env->hflags & 
HF_INHIBIT_IRQ_MASK))))) {
                               int intno;
-                             svm_check_intercept(SVM_EXIT_INTR);
                               env->interrupt_request &= 
~(CPU_INTERRUPT_HARD | CPU_INTERRUPT_VIRQ);
                               intno = cpu_get_pic_interrupt(env);
                               qemu_log_mask(CPU_LOG_TB_IN_ASM, 
"Servicing hardware INT=0x%02x\n", intno);
   #if defined(__sparc__) && !defined(CONFIG_SOLARIS)
   #undef env
--- 388,397 ----
                                        (env->eflags & IF_MASK &&
                                         !(env->hflags & 
HF_INHIBIT_IRQ_MASK))))) {
                               int intno;
                               env->interrupt_request &= 
~(CPU_INTERRUPT_HARD | CPU_INTERRUPT_VIRQ);
                               intno = cpu_get_pic_interrupt(env);
+                             if(intno >= 0) {
+                                 svm_check_intercept(SVM_EXIT_INTR);
                                   qemu_log_mask(CPU_LOG_TB_IN_ASM, 
"Servicing hardware INT=0x%02x\n", intno);
   #if defined(__sparc__) && !defined(CONFIG_SOLARIS)
   #undef env
***************
*** 401,406 ****
--- 402,409 ----
                                   /* ensure that no TB jump will be 
modified as
                                      the program flow was changed */
                                   next_tb = 0;
+                             }
+
   #if !defined(CONFIG_USER_ONLY)
                           } else if ((interrupt_request & 
CPU_INTERRUPT_VIRQ) &&
                                      (env->eflags & IF_MASK) &&


On 9/29/10 1:44 AM, Jan Kiszka wrote:
> Am 28.09.2010 21:57, Sam King wrote:
>>   Thanks to Bernhard Kauer for pointing out the problem.  Apparently if
>> software disables LVT_LINT0 when there is a pending CPU_HARD_INTERRUPT
>> you can get into trouble.  I attached a patch that fixes the problem by
>> resetting the interrupt_request.  I am not sure if we need to do the
>> same for LINT1, but this fixed the incorrect GPF I was getting.
>>
> ...
> [ please inline patches ]
>
>> *** hw/apic.c	2010-07-22 07:39:04.000000000 -0500
>> --- ../qemu-0.12.5-fixed/hw/apic.c	2010-09-28 14:45:55.476945540 -0500
>> ***************
>> *** 841,846 ****
>> --- 841,851 ----
>>                s->lvt[n] = val;
>>                if (n == APIC_LVT_TIMER)
>>                    apic_timer_update(s, qemu_get_clock(vm_clock));
>> +
>> +             if(n == APIC_LVT_LINT0) {
>> +                 if((val&  APIC_LVT_MASKED)&&  (env->interrupt_request&  CPU_INTERRUPT_HARD))
>> +                     cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
>> +             }
>>            }
>>            break;
>>        case 0x38:
> This actually points out open issues, but more work is required:
>
> You need to consider other potentially pending interrupts as well, thus
> you must not blindly reset here. And the same is true for invocations of
> apic_deliver_pic_intr(..., 0). The APIC has to save the PIC line state
> and forward it according to its current LVT mask state, which includes
> raising the interrupt if the mask is removed while the PIC line is high.
>
> Jan
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-09-29 16:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-28 15:15 [Qemu-devel] debugging apic Sam King
2010-09-28 19:57 ` [Qemu-devel] PATCH: " Sam King
2010-09-29  6:44   ` [Qemu-devel] " Jan Kiszka
2010-09-29 16:47     ` Sam King
2010-09-29  0:53 ` [Qemu-devel] " TeLeMan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).