public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC] kvm irq assignment
@ 2008-06-12 16:39 Xu, Anthony
  2008-06-12 19:16 ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread
From: Xu, Anthony @ 2008-06-12 16:39 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti, Alexander Graf; +Cc: Jes Sorensen, kvm, kvm-ia64

Hi all,
Thanks for your comments.

I made this new patch based on your comments

1. use bimodal _PRT, to take advantage of IOAPIC pin 16~23
	the mapping is simple,  slot  ->  (slot&7)+16 IOAPIC pin,
someone may provide good mapping ?
2. use ISA-bridge configure space 0x64 byte as a communication
mechansim.
	When guest BIOS invokes _PIC, the value is passed to qemu
through byte 0x64.
             qemu know whether it is PIC mode and APIC mode by checking
byte 0x64.
3. pci_slot_get_pirq and piix3_set_irq adopt different operation based
on PIC mode/APIC mode
4. All pci devices are still using active high level-triggered
interrupt, while IOAPIC pin 16~23 use active low level-triggered
interrupt.
   piix3_set_irq is responsible to reserve the level if it is in APIC
mode.
5. interrupt sharing for PCI devices is still supported


Test by runing guest linux, NIC is using IOAPIC pin > 15
  0:     980211    IO-APIC-edge  timer
  1:        179    IO-APIC-edge  i8042
  8:          0    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:        289    IO-APIC-edge  i8042
 14:       3266    IO-APIC-edge  ide0
 15:      13489    IO-APIC-edge  ide1
185:        485   IO-APIC-level  eth0

Didn't try guest linux only with PIC, I think it works, because the path
for PIC is not changed at all.


Please comment!

Thanks,
Anthony






diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index 21fc76a..64219f2 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -201,14 +201,24 @@ DefinitionBlock (
         }
     }
 
+    Name (PICD, 0)
 
-    /* PCI Bus definition */
+
+    /*PCI Bus definition */
     Scope(\_SB) {
         Device(PCI0) {
             Name (_HID, EisaId ("PNP0A03"))
             Name (_ADR, 0x00)
             Name (_UID, 1)
-            Name(_PRT, Package() {
+
+            Method(_PRT,0){
+                If(PICD){
+                    Return(PRTA)
+                }
+                Return(PRTP)
+            }
+
+            Name(PRTP, Package() {
                 /* PCI IRQ routing table, example from ACPI 2.0a
specification,
                    section 6.2.8.1 */
                 /* Note: we provide the same info as the PCI routing
@@ -407,6 +417,202 @@ DefinitionBlock (
                 Package() {0x001fffff, 3, LNKB, 0},
             })
 
+            Name(PRTA, Package() {
+                /* IOAPIC use fixed connection */
+
+                // PCI Slot 0
+                Package() {0x0000ffff, 0, 0, 16},
+                Package() {0x0000ffff, 1, 0, 16},
+                Package() {0x0000ffff, 2, 0, 16},
+                Package() {0x0000ffff, 3, 0, 16},
+
+                // PCI Slot 1
+                Package() {0x0001ffff, 0, 0, 17},
+                Package() {0x0001ffff, 1, 0, 17},
+                Package() {0x0001ffff, 2, 0, 17},
+                Package() {0x0001ffff, 3, 0, 17},
+
+                // PCI Slot 2
+                Package() {0x0002ffff, 0, 0, 18},
+                Package() {0x0002ffff, 1, 0, 18},
+                Package() {0x0002ffff, 2, 0, 18},
+                Package() {0x0002ffff, 3, 0, 18},
+
+                // PCI Slot 3
+                Package() {0x0003ffff, 0, 0, 19},
+                Package() {0x0003ffff, 1, 0, 19},
+                Package() {0x0003ffff, 2, 0, 19},
+                Package() {0x0003ffff, 3, 0, 19},
+
+                // PCI Slot 4
+                Package() {0x0004ffff, 0, 0, 20},
+                Package() {0x0004ffff, 1, 0, 20},
+                Package() {0x0004ffff, 2, 0, 20},
+                Package() {0x0004ffff, 3, 0, 20},
+
+                // PCI Slot 5
+                Package() {0x0005ffff, 0, 0, 21},
+                Package() {0x0005ffff, 1, 0, 21},
+                Package() {0x0005ffff, 2, 0, 21},
+                Package() {0x0005ffff, 3, 0, 21},
+
+                // PCI Slot 6
+                Package() {0x0006ffff, 0, 0, 22},
+                Package() {0x0006ffff, 1, 0, 22},
+                Package() {0x0006ffff, 2, 0, 22},
+                Package() {0x0006ffff, 3, 0, 22},
+
+                // PCI Slot 7
+                Package() {0x0007ffff, 0, 0, 23},
+                Package() {0x0007ffff, 1, 0, 23},
+                Package() {0x0007ffff, 2, 0, 23},
+                Package() {0x0007ffff, 3, 0, 23},
+
+                // PCI Slot 8
+                Package() {0x0008ffff, 0, 0, 16},
+                Package() {0x0008ffff, 1, 0, 16},
+                Package() {0x0008ffff, 2, 0, 16},
+                Package() {0x0008ffff, 3, 0, 16},
+
+                // PCI Slot 9
+                Package() {0x0009ffff, 0, 0, 17},
+                Package() {0x0009ffff, 1, 0, 17},
+                Package() {0x0009ffff, 2, 0, 17},
+                Package() {0x0009ffff, 3, 0, 17},
+
+                // PCI Slot 10
+                Package() {0x000affff, 0, 0, 18},
+                Package() {0x000affff, 1, 0, 18},
+                Package() {0x000affff, 2, 0, 18},
+                Package() {0x000affff, 3, 0, 18},
+
+                // PCI Slot 11
+                Package() {0x000bffff, 0, 0, 19},
+                Package() {0x000bffff, 1, 0, 19},
+                Package() {0x000bffff, 2, 0, 19},
+                Package() {0x000bffff, 3, 0, 19},
+
+                // PCI Slot 12
+                Package() {0x000cffff, 0, 0, 20},
+                Package() {0x000cffff, 1, 0, 20},
+                Package() {0x000cffff, 2, 0, 20},
+                Package() {0x000cffff, 3, 0, 20},
+
+                // PCI Slot 13
+                Package() {0x000dffff, 0, 0, 21},
+                Package() {0x000dffff, 1, 0, 21},
+                Package() {0x000dffff, 2, 0, 21},
+                Package() {0x000dffff, 3, 0, 21},
+
+                // PCI Slot 14
+                Package() {0x000effff, 0, 0, 22},
+                Package() {0x000effff, 1, 0, 22},
+                Package() {0x000effff, 2, 0, 22},
+                Package() {0x000effff, 3, 0, 22},
+
+                // PCI Slot 15
+                Package() {0x000fffff, 0, 0, 23},
+                Package() {0x000fffff, 1, 0, 23},
+                Package() {0x000fffff, 2, 0, 23},
+                Package() {0x000fffff, 3, 0, 23},
+
+                // PCI Slot 16
+                Package() {0x0010ffff, 0, 0, 16},
+                Package() {0x0010ffff, 1, 0, 16},
+                Package() {0x0010ffff, 2, 0, 16},
+                Package() {0x0010ffff, 3, 0, 16},
+
+                // PCI Slot 17
+                Package() {0x0011ffff, 0, 0, 17},
+                Package() {0x0011ffff, 1, 0, 17},
+                Package() {0x0011ffff, 2, 0, 17},
+                Package() {0x0011ffff, 3, 0, 17},
+
+                // PCI Slot 18
+                Package() {0x0012ffff, 0, 0, 18},
+                Package() {0x0012ffff, 1, 0, 18},
+                Package() {0x0012ffff, 2, 0, 18},
+                Package() {0x0012ffff, 3, 0, 18},
+
+                // PCI Slot 19
+                Package() {0x0013ffff, 0, 0, 19},
+                Package() {0x0013ffff, 1, 0, 19},
+                Package() {0x0013ffff, 2, 0, 19},
+                Package() {0x0013ffff, 3, 0, 19},
+
+                // PCI Slot 20
+                Package() {0x0014ffff, 0, 0, 20},
+                Package() {0x0014ffff, 1, 0, 20},
+                Package() {0x0014ffff, 2, 0, 20},
+                Package() {0x0014ffff, 3, 0, 20},
+
+                // PCI Slot 21
+                Package() {0x0015ffff, 0, 0, 21},
+                Package() {0x0015ffff, 1, 0, 21},
+                Package() {0x0015ffff, 2, 0, 21},
+                Package() {0x0015ffff, 3, 0, 21},
+
+                // PCI Slot 22
+                Package() {0x0016ffff, 0, 0, 22},
+                Package() {0x0016ffff, 1, 0, 22},
+                Package() {0x0016ffff, 2, 0, 22},
+                Package() {0x0016ffff, 3, 0, 22},
+
+                // PCI Slot 23
+                Package() {0x0017ffff, 0, 0, 23},
+                Package() {0x0017ffff, 1, 0, 23},
+                Package() {0x0017ffff, 2, 0, 23},
+                Package() {0x0017ffff, 3, 0, 23},
+
+                // PCI Slot 24
+                Package() {0x0018ffff, 0, 0, 16},
+                Package() {0x0018ffff, 1, 0, 16},
+                Package() {0x0018ffff, 2, 0, 16},
+                Package() {0x0018ffff, 3, 0, 16},
+
+                // PCI Slot 25
+                Package() {0x0019ffff, 0, 0, 17},
+                Package() {0x0019ffff, 1, 0, 17},
+                Package() {0x0019ffff, 2, 0, 17},
+                Package() {0x0019ffff, 3, 0, 17},
+
+                // PCI Slot 26
+                Package() {0x001affff, 0, 0, 18},
+                Package() {0x001affff, 1, 0, 18},
+                Package() {0x001affff, 2, 0, 18},
+                Package() {0x001affff, 3, 0, 18},
+
+                // PCI Slot 27
+                Package() {0x001bffff, 0, 0, 19},
+                Package() {0x001bffff, 1, 0, 19},
+                Package() {0x001bffff, 2, 0, 19},
+                Package() {0x001bffff, 3, 0, 19},
+
+                // PCI Slot 28
+                Package() {0x001cffff, 0, 0, 20},
+                Package() {0x001cffff, 1, 0, 20},
+                Package() {0x001cffff, 2, 0, 20},
+                Package() {0x001cffff, 3, 0, 20},
+
+                // PCI Slot 29
+                Package() {0x001dffff, 0, 0, 21},
+                Package() {0x001dffff, 1, 0, 21},
+                Package() {0x001dffff, 2, 0, 21},
+                Package() {0x001dffff, 3, 0, 21},
+
+                // PCI Slot 30
+                Package() {0x001effff, 0, 0, 22},
+                Package() {0x001effff, 1, 0, 22},
+                Package() {0x001effff, 2, 0, 22},
+                Package() {0x001effff, 3, 0, 22},
+
+                // PCI Slot 31
+                Package() {0x001fffff, 0, 0, 23},
+                Package() {0x001fffff, 1, 0, 23},
+                Package() {0x001fffff, 2, 0, 23},
+                Package() {0x001fffff, 3, 0, 23},
+            })
+
             OperationRegion(PCST, SystemIO, 0xae00, 0x08)
             Field (PCST, DWordAcc, NoLock, WriteAsZeros)
 	    {
@@ -770,6 +976,9 @@ DefinitionBlock (
             /* PIIX PCI to ISA irq remapping */
             OperationRegion (P40C, PCI_Config, 0x60, 0x04)
 
+            /* APIC and PIC flag */
+            OperationRegion (P40D, PCI_Config, 0x64, 0x01)
+
             /* Real-time clock */
             Device (RTC)
             {
@@ -960,6 +1169,19 @@ DefinitionBlock (
 	}
     }
 
+
+    Field (\_SB.PCI0.ISA.P40D, ByteAcc, NoLock, Preserve)
+    {
+       FLAG,  8,
+    }
+
+    Method(_PIC, 1)
+    {
+        Store(Arg0, PICD)
+        Store(Arg0, FLAG)
+    }
+
+
     /* PCI IRQs */
     Scope(\_SB) {
          Field (\_SB.PCI0.ISA.P40C, ByteAcc, NoLock, Preserve)
diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
index 90cb3a6..da6bcab 100644
--- a/qemu/hw/piix_pci.c
+++ b/qemu/hw/piix_pci.c
@@ -28,6 +28,7 @@
 
 typedef uint32_t pci_addr_t;
 #include "pci_host.h"
+#include "qemu-kvm.h"
 
 typedef PCIHostState I440FXState;
 
@@ -45,6 +46,25 @@ static uint32_t i440fx_addr_readl(void* opaque,
uint32_t addr)
 
 static void piix3_set_irq(qemu_irq *pic, int irq_num, int level);
 
+/* PIIX3 PCI to ISA bridge */
+
+PCIDevice *piix3_dev;
+PCIDevice *piix4_dev;
+
+
+#ifdef KVM_CAP_IRQCHIP
+static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
+{
+    int slot_addend;
+    if( piix3_dev->config[0x64])  // APIC mode
+        return ((pci_dev->devfn >> 3) & 7)+16;
+    else {		// PIC mode
+        slot_addend = (pci_dev->devfn >> 3) - 1;
+        return (irq_num + slot_addend) & 3;
+    }
+}
+
+#else
 /* return the global irq number corresponding to a given device irq
    pin. We could also use the bus number to have a more precise
    mapping. */
@@ -54,6 +74,7 @@ static int pci_slot_get_pirq(PCIDevice *pci_dev, int
irq_num)
     slot_addend = (pci_dev->devfn >> 3) - 1;
     return (irq_num + slot_addend) & 3;
 }
+#endif
 
 static target_phys_addr_t isa_page_descs[384 / 4];
 static uint8_t smm_enabled;
@@ -171,12 +192,17 @@ static int i440fx_load(QEMUFile* f, void *opaque,
int version_id)
 
 PCIBus *i440fx_init(PCIDevice **pi440fx_state, qemu_irq *pic)
 {
+    int irq_num;
     PCIBus *b;
     PCIDevice *d;
     I440FXState *s;
-
+#ifdef KVM_CAP_IRQCHIP
+    irq_num = 24;
+#else
+    irq_num = 4;
+#endif
     s = qemu_mallocz(sizeof(I440FXState));
-    b = pci_register_bus(piix3_set_irq, pci_slot_get_pirq, pic, 0, 4);
+    b = pci_register_bus(piix3_set_irq, pci_slot_get_pirq, pic, 0,
irq_num);
     s->bus = b;
 
     register_ioport_write(0xcf8, 4, 4, i440fx_addr_writel, s);
@@ -208,10 +234,6 @@ PCIBus *i440fx_init(PCIDevice **pi440fx_state,
qemu_irq *pic)
     return b;
 }
 
-/* PIIX3 PCI to ISA bridge */
-
-PCIDevice *piix3_dev;
-PCIDevice *piix4_dev;
 
 /* just used for simpler irq handling. */
 #define PCI_IRQ_WORDS   ((PCI_DEVICES_MAX + 31) / 32)
@@ -220,6 +242,12 @@ static void piix3_set_irq(qemu_irq *pic, int
irq_num, int level)
 {
     int i, pic_irq, pic_level;
 
+#ifdef KVM_CAP_IRQCHIP
+    if (piix3_dev->config[0x64]){  // APIC mode
+        kvm_set_irq(irq_num, !level);
+        return;
+    }
+#endif
     pci_irq_levels[irq_num] = level;
 
     /* now we change the pic irq level according to the piix irq
mappings */

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] kvm irq assignment
  2008-06-12 16:39 [RFC] kvm irq assignment Xu, Anthony
@ 2008-06-12 19:16 ` Avi Kivity
  2008-06-13  2:45   ` Xu, Anthony
  2008-06-13  6:38   ` Xu, Anthony
  0 siblings, 2 replies; 12+ messages in thread
From: Avi Kivity @ 2008-06-12 19:16 UTC (permalink / raw)
  To: Xu, Anthony; +Cc: Marcelo Tosatti, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

Xu, Anthony wrote:
> Hi all,
> Thanks for your comments.
>
> I made this new patch based on your comments
>
> 1. use bimodal _PRT, to take advantage of IOAPIC pin 16~23
> 	the mapping is simple,  slot  ->  (slot&7)+16 IOAPIC pin,
> someone may provide good mapping ?
>   

I think it's fine. If we find a better one later, or if we add another 
ioapic, we can easily change it since the bios and qemu are shipped as a 
unit.

> 2. use ISA-bridge configure space 0x64 byte as a communication
> mechansim.
> 	When guest BIOS invokes _PIC, the value is passed to qemu
> through byte 0x64.
>              qemu know whether it is PIC mode and APIC mode by checking
> byte 0x64.
> 3. pci_slot_get_pirq and piix3_set_irq adopt different operation based
> on PIC mode/APIC mode
>   

I'm not sure how real hardware works, but I _think_ that it routes irqs 
unconditionally to both the legacy path and directly to the ioapic. So 
for example if slot 5 asserts an interrupt, we map it through the pci 
link mapping and generate an active high interrupt to one of {5, 10, 11} 
(both pic and ioapic), and simultaneously an active low interrupt to 
ioapic pin 21.

The _PIC method should disable the link interrupts if ioapic mode is 
disabled.

This removes the need for communication between the bios and qemu.

>  
> +            /* APIC and PIC flag */
> +            OperationRegion (P40D, PCI_Config, 0x64, 0x01)
> +
>   

This is actually SERIRQC, serial irq control.

> +
> +#ifdef KVM_CAP_IRQCHIP
>   

This should be unconditional.

> +static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
> +{
> +    int slot_addend;
> +    if( piix3_dev->config[0x64])  // APIC mode
> +        return ((pci_dev->devfn >> 3) & 7)+16;
> +    else {		// PIC mode
> +        slot_addend = (pci_dev->devfn >> 3) - 1;
> +        return (irq_num + slot_addend) & 3;
> +    }
> +}
>   

What I'm suggesting is to "fork" the interrupt into two lines, one 
legacy path and the ioapic path.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC] kvm irq assignment
  2008-06-12 19:16 ` Avi Kivity
@ 2008-06-13  2:45   ` Xu, Anthony
  2008-06-13  6:38   ` Xu, Anthony
  1 sibling, 0 replies; 12+ messages in thread
From: Xu, Anthony @ 2008-06-13  2:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

Avi Kivity wrote:
> Xu, Anthony wrote:
>> Hi all,
>> Thanks for your comments.
>> 
>> I made this new patch based on your comments
>> 
>> 1. use bimodal _PRT, to take advantage of IOAPIC pin 16~23
>> 	the mapping is simple,  slot  ->  (slot&7)+16 IOAPIC pin,
>> someone may provide good mapping ?
>> 
> 
> I think it's fine. If we find a better one later, or if we add another
> ioapic, we can easily change it since the bios and qemu are shipped
> as a unit.
> 
>> 2. use ISA-bridge configure space 0x64 byte as a communication
>> 	mechansim. When guest BIOS invokes _PIC, the value is passed to
qemu
>> through byte 0x64.
>>              qemu know whether it is PIC mode and APIC mode by
>> checking byte 0x64. 
>> 3. pci_slot_get_pirq and piix3_set_irq adopt different operation
>> based on PIC mode/APIC mode 
>> 
> 
> I'm not sure how real hardware works, but I _think_ that it routes
> irqs unconditionally to both the legacy path and directly to the
> ioapic. So for example if slot 5 asserts an interrupt, we map it
> through the pci link mapping and generate an active high interrupt to
> one of {5, 10, 11} (both pic and ioapic), and simultaneously an
> active low interrupt to ioapic pin 21.
I think what you described is correct.


> 
> The _PIC method should disable the link interrupts if ioapic mode is
> disabled.
Typo!  If ioapic mode is enabled.

>From x86 BIOS, OS disable link interrupt through link device _DIS
mothod.


> 
> This removes the need for communication between the bios and qemu.
Agree

> 
>> 
>> +            /* APIC and PIC flag */
>> +            OperationRegion (P40D, PCI_Config, 0x64, 0x01) +
>> 
> 
> This is actually SERIRQC, serial irq control.
> 
>> +
>> +#ifdef KVM_CAP_IRQCHIP
>> 
> 
> This should be unconditional.
> 
>> +static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num) +{
>> +    int slot_addend;
>> +    if( piix3_dev->config[0x64])  // APIC mode
>> +        return ((pci_dev->devfn >> 3) & 7)+16;
>> +    else {		// PIC mode
>> +        slot_addend = (pci_dev->devfn >> 3) - 1;
>> +        return (irq_num + slot_addend) & 3;
>> +    }
>> +}
>> 
> 
> What I'm suggesting is to "fork" the interrupt into two lines, one
> legacy path and the ioapic path.

I'll try this way.

Anthony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC] kvm irq assignment
  2008-06-12 19:16 ` Avi Kivity
  2008-06-13  2:45   ` Xu, Anthony
@ 2008-06-13  6:38   ` Xu, Anthony
  2008-06-13 14:22     ` Avi Kivity
                       ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Xu, Anthony @ 2008-06-13  6:38 UTC (permalink / raw)
  To: Xu, Anthony, Avi Kivity
  Cc: Marcelo Tosatti, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

Hi Avi and all

This is the revised one,

All PCI devices send interrupt to both PIC and IOAPIC,  
a). When PIC is enabled and IOAPIC is disabled,  all redirect entries in
IOAPIC are masked.
B) When PIC is disabled and IPAPIC is enabled, link entry bit7 is set,
means this link entry is disable.
Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
time. Otherwise cause many suspicious interrupt to guest.

Test by running guest linux in kvm/ia32 and kvm/ia64.


Thanks,
Anthony



diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index 21fc76a..e12fd66 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -201,14 +201,28 @@ DefinitionBlock (
         }
     }
 
+    Name (PICD, 0)
 
-    /* PCI Bus definition */
+    Method(_PIC, 1)
+    {
+        Store(Arg0, PICD)
+    }
+
+    /*PCI Bus definition */
     Scope(\_SB) {
         Device(PCI0) {
             Name (_HID, EisaId ("PNP0A03"))
             Name (_ADR, 0x00)
             Name (_UID, 1)
-            Name(_PRT, Package() {
+
+            Method(_PRT,0){
+                If(PICD){
+                    Return(PRTA)
+                }
+                Return(PRTP)
+            }
+
+            Name(PRTP, Package() {
                 /* PCI IRQ routing table, example from ACPI 2.0a
specification,
                    section 6.2.8.1 */
                 /* Note: we provide the same info as the PCI routing
@@ -407,6 +421,202 @@ DefinitionBlock (
                 Package() {0x001fffff, 3, LNKB, 0},
             })
 
+            Name(PRTA, Package() {
+                /* IOAPIC use fixed connection */
+
+                // PCI Slot 0
+                Package() {0x0000ffff, 0, 0, 16},
+                Package() {0x0000ffff, 1, 0, 16},
+                Package() {0x0000ffff, 2, 0, 16},
+                Package() {0x0000ffff, 3, 0, 16},
+
+                // PCI Slot 1
+                Package() {0x0001ffff, 0, 0, 17},
+                Package() {0x0001ffff, 1, 0, 17},
+                Package() {0x0001ffff, 2, 0, 17},
+                Package() {0x0001ffff, 3, 0, 17},
+
+                // PCI Slot 2
+                Package() {0x0002ffff, 0, 0, 18},
+                Package() {0x0002ffff, 1, 0, 18},
+                Package() {0x0002ffff, 2, 0, 18},
+                Package() {0x0002ffff, 3, 0, 18},
+
+                // PCI Slot 3
+                Package() {0x0003ffff, 0, 0, 19},
+                Package() {0x0003ffff, 1, 0, 19},
+                Package() {0x0003ffff, 2, 0, 19},
+                Package() {0x0003ffff, 3, 0, 19},
+
+                // PCI Slot 4
+                Package() {0x0004ffff, 0, 0, 20},
+                Package() {0x0004ffff, 1, 0, 20},
+                Package() {0x0004ffff, 2, 0, 20},
+                Package() {0x0004ffff, 3, 0, 20},
+
+                // PCI Slot 5
+                Package() {0x0005ffff, 0, 0, 21},
+                Package() {0x0005ffff, 1, 0, 21},
+                Package() {0x0005ffff, 2, 0, 21},
+                Package() {0x0005ffff, 3, 0, 21},
+
+                // PCI Slot 6
+                Package() {0x0006ffff, 0, 0, 22},
+                Package() {0x0006ffff, 1, 0, 22},
+                Package() {0x0006ffff, 2, 0, 22},
+                Package() {0x0006ffff, 3, 0, 22},
+
+                // PCI Slot 7
+                Package() {0x0007ffff, 0, 0, 23},
+                Package() {0x0007ffff, 1, 0, 23},
+                Package() {0x0007ffff, 2, 0, 23},
+                Package() {0x0007ffff, 3, 0, 23},
+
+                // PCI Slot 8
+                Package() {0x0008ffff, 0, 0, 16},
+                Package() {0x0008ffff, 1, 0, 16},
+                Package() {0x0008ffff, 2, 0, 16},
+                Package() {0x0008ffff, 3, 0, 16},
+
+                // PCI Slot 9
+                Package() {0x0009ffff, 0, 0, 17},
+                Package() {0x0009ffff, 1, 0, 17},
+                Package() {0x0009ffff, 2, 0, 17},
+                Package() {0x0009ffff, 3, 0, 17},
+
+                // PCI Slot 10
+                Package() {0x000affff, 0, 0, 18},
+                Package() {0x000affff, 1, 0, 18},
+                Package() {0x000affff, 2, 0, 18},
+                Package() {0x000affff, 3, 0, 18},
+
+                // PCI Slot 11
+                Package() {0x000bffff, 0, 0, 19},
+                Package() {0x000bffff, 1, 0, 19},
+                Package() {0x000bffff, 2, 0, 19},
+                Package() {0x000bffff, 3, 0, 19},
+
+                // PCI Slot 12
+                Package() {0x000cffff, 0, 0, 20},
+                Package() {0x000cffff, 1, 0, 20},
+                Package() {0x000cffff, 2, 0, 20},
+                Package() {0x000cffff, 3, 0, 20},
+
+                // PCI Slot 13
+                Package() {0x000dffff, 0, 0, 21},
+                Package() {0x000dffff, 1, 0, 21},
+                Package() {0x000dffff, 2, 0, 21},
+                Package() {0x000dffff, 3, 0, 21},
+
+                // PCI Slot 14
+                Package() {0x000effff, 0, 0, 22},
+                Package() {0x000effff, 1, 0, 22},
+                Package() {0x000effff, 2, 0, 22},
+                Package() {0x000effff, 3, 0, 22},
+
+                // PCI Slot 15
+                Package() {0x000fffff, 0, 0, 23},
+                Package() {0x000fffff, 1, 0, 23},
+                Package() {0x000fffff, 2, 0, 23},
+                Package() {0x000fffff, 3, 0, 23},
+
+                // PCI Slot 16
+                Package() {0x0010ffff, 0, 0, 16},
+                Package() {0x0010ffff, 1, 0, 16},
+                Package() {0x0010ffff, 2, 0, 16},
+                Package() {0x0010ffff, 3, 0, 16},
+
+                // PCI Slot 17
+                Package() {0x0011ffff, 0, 0, 17},
+                Package() {0x0011ffff, 1, 0, 17},
+                Package() {0x0011ffff, 2, 0, 17},
+                Package() {0x0011ffff, 3, 0, 17},
+
+                // PCI Slot 18
+                Package() {0x0012ffff, 0, 0, 18},
+                Package() {0x0012ffff, 1, 0, 18},
+                Package() {0x0012ffff, 2, 0, 18},
+                Package() {0x0012ffff, 3, 0, 18},
+
+                // PCI Slot 19
+                Package() {0x0013ffff, 0, 0, 19},
+                Package() {0x0013ffff, 1, 0, 19},
+                Package() {0x0013ffff, 2, 0, 19},
+                Package() {0x0013ffff, 3, 0, 19},
+
+                // PCI Slot 20
+                Package() {0x0014ffff, 0, 0, 20},
+                Package() {0x0014ffff, 1, 0, 20},
+                Package() {0x0014ffff, 2, 0, 20},
+                Package() {0x0014ffff, 3, 0, 20},
+
+                // PCI Slot 21
+                Package() {0x0015ffff, 0, 0, 21},
+                Package() {0x0015ffff, 1, 0, 21},
+                Package() {0x0015ffff, 2, 0, 21},
+                Package() {0x0015ffff, 3, 0, 21},
+
+                // PCI Slot 22
+                Package() {0x0016ffff, 0, 0, 22},
+                Package() {0x0016ffff, 1, 0, 22},
+                Package() {0x0016ffff, 2, 0, 22},
+                Package() {0x0016ffff, 3, 0, 22},
+
+                // PCI Slot 23
+                Package() {0x0017ffff, 0, 0, 23},
+                Package() {0x0017ffff, 1, 0, 23},
+                Package() {0x0017ffff, 2, 0, 23},
+                Package() {0x0017ffff, 3, 0, 23},
+
+                // PCI Slot 24
+                Package() {0x0018ffff, 0, 0, 16},
+                Package() {0x0018ffff, 1, 0, 16},
+                Package() {0x0018ffff, 2, 0, 16},
+                Package() {0x0018ffff, 3, 0, 16},
+
+                // PCI Slot 25
+                Package() {0x0019ffff, 0, 0, 17},
+                Package() {0x0019ffff, 1, 0, 17},
+                Package() {0x0019ffff, 2, 0, 17},
+                Package() {0x0019ffff, 3, 0, 17},
+
+                // PCI Slot 26
+                Package() {0x001affff, 0, 0, 18},
+                Package() {0x001affff, 1, 0, 18},
+                Package() {0x001affff, 2, 0, 18},
+                Package() {0x001affff, 3, 0, 18},
+
+                // PCI Slot 27
+                Package() {0x001bffff, 0, 0, 19},
+                Package() {0x001bffff, 1, 0, 19},
+                Package() {0x001bffff, 2, 0, 19},
+                Package() {0x001bffff, 3, 0, 19},
+
+                // PCI Slot 28
+                Package() {0x001cffff, 0, 0, 20},
+                Package() {0x001cffff, 1, 0, 20},
+                Package() {0x001cffff, 2, 0, 20},
+                Package() {0x001cffff, 3, 0, 20},
+
+                // PCI Slot 29
+                Package() {0x001dffff, 0, 0, 21},
+                Package() {0x001dffff, 1, 0, 21},
+                Package() {0x001dffff, 2, 0, 21},
+                Package() {0x001dffff, 3, 0, 21},
+
+                // PCI Slot 30
+                Package() {0x001effff, 0, 0, 22},
+                Package() {0x001effff, 1, 0, 22},
+                Package() {0x001effff, 2, 0, 22},
+                Package() {0x001effff, 3, 0, 22},
+
+                // PCI Slot 31
+                Package() {0x001fffff, 0, 0, 23},
+                Package() {0x001fffff, 1, 0, 23},
+                Package() {0x001fffff, 2, 0, 23},
+                Package() {0x001fffff, 3, 0, 23},
+            })
+
             OperationRegion(PCST, SystemIO, 0xae00, 0x08)
             Field (PCST, DWordAcc, NoLock, WriteAsZeros)
 	    {
diff --git a/qemu/hw/apic.c b/qemu/hw/apic.c
index a14cab2..c3014fa 100644
--- a/qemu/hw/apic.c
+++ b/qemu/hw/apic.c
@@ -1053,9 +1053,25 @@ static void ioapic_service(IOAPICState *s)
     }
 }
 
+int ioapic_map_irq(int devfn, int irq_num)
+{
+    int irq;
+    irq = ((devfn >> 3) & 7) + 16;
+    return irq;
+}
+#ifdef KVM_CAP_IRQCHIP
+static int ioapic_irq_count[IOAPIC_NUM_PINS];
+#endif
+
 void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
+#ifdef KVM_CAP_IRQCHIP
+    ioapic_irq_count[vector] += level;
+    if (kvm_enabled())
+	if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
+	    return;
+#endif
 
     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
index b11e328..4761463 100644
--- a/qemu/hw/ipf.c
+++ b/qemu/hw/ipf.c
@@ -672,3 +672,23 @@ QEMUMachine ipf_machine = {
     ipf_init_pci,
     VGA_RAM_SIZE + VGA_RAM_SIZE,
 };
+
+#define IOAPIC_NUM_PINS 48
+static int ioapic_irq_count[IOAPIC_NUM_PINS];
+
+int ioapic_map_irq(int devfn, int irq_num)
+{
+    int irq, dev;
+    dev = devfn >> 3;
+    irq = ((((dev << 2) + (dev >> 3) + irq_num) & 31) + 16);
+    return irq;
+}
+
+void ioapic_set_irq(void *opaque, int vector, int level)
+{
+    ioapic_irq_count[vector] += level;
+    if (kvm_enabled())
+	if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
+	    return;
+}
+
diff --git a/qemu/hw/pc.h b/qemu/hw/pc.h
index c284bf1..ef09a78 100644
--- a/qemu/hw/pc.h
+++ b/qemu/hw/pc.h
@@ -47,6 +47,7 @@ int apic_accept_pic_intr(CPUState *env);
 void apic_local_deliver(CPUState *env, int vector);
 int apic_get_interrupt(CPUState *env);
 IOAPICState *ioapic_init(void);
+int ioapic_map_irq(int devfn, int irq_num);
 void ioapic_set_irq(void *opaque, int vector, int level);
 
 /* i8254.c */
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index a23a466..f96fbb5 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -27,6 +27,8 @@
 #include "net.h"
 #include "pc.h"
 
+#include "qemu-kvm.h"
+
 //#define DEBUG_PCI
 
 struct PCIBus {
@@ -534,12 +536,18 @@ static void pci_set_irq(void *opaque, int irq_num,
int level)
     PCIDevice *pci_dev = (PCIDevice *)opaque;
     PCIBus *bus;
     int change;
-
+#ifdef KVM_CAP_IRQCHIP
+    int irq;
+#endif 
     change = level - pci_dev->irq_state[irq_num];
     if (!change)
         return;
 
     pci_dev->irq_state[irq_num] = level;
+#ifdef KVM_CAP_IRQCHIP
+    irq = ioapic_map_irq(pci_dev->devfn, irq_num);
+    ioapic_set_irq(opaque, irq, change);
+#endif
     for (;;) {
         bus = pci_dev->bus;
         irq_num = bus->map_irq(pci_dev, irq_num);
diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
index 90cb3a6..96316ca 100644
--- a/qemu/hw/piix_pci.c
+++ b/qemu/hw/piix_pci.c
@@ -225,6 +226,9 @@ static void piix3_set_irq(qemu_irq *pic, int
irq_num, int level)
     /* now we change the pic irq level according to the piix irq
mappings */
     /* XXX: optimize */
     pic_irq = piix3_dev->config[0x60 + irq_num];
+    /* if bit7 set 1, this link is disabled */
+    if (pic_irq & 0x80)  
+        return;
     if (pic_irq < 16) {
         /* The pic level is the logical OR of all the PCI irqs mapped
            to it */





Xu, Anthony wrote:
> Avi Kivity wrote:
>> Xu, Anthony wrote:
>>> Hi all,
>>> Thanks for your comments.
>>> 
>>> I made this new patch based on your comments
>>> 
>>> 1. use bimodal _PRT, to take advantage of IOAPIC pin 16~23
>>> 	the mapping is simple,  slot  ->  (slot&7)+16 IOAPIC pin,
>>> someone may provide good mapping ?
>>> 
>> 
>> I think it's fine. If we find a better one later, or if we add
>> another ioapic, we can easily change it since the bios and qemu are
>> shipped as a unit. 
>> 
>>> 2. use ISA-bridge configure space 0x64 byte as a communication
>>> 	mechansim. When guest BIOS invokes _PIC, the value is passed to
>>>              qemu through byte 0x64. qemu know whether it is PIC
>>> mode and APIC mode by checking byte 0x64. 
>>> 3. pci_slot_get_pirq and piix3_set_irq adopt different operation
>>> based on PIC mode/APIC mode 
>>> 
>> 
>> I'm not sure how real hardware works, but I _think_ that it routes
>> irqs unconditionally to both the legacy path and directly to the
>> ioapic. So for example if slot 5 asserts an interrupt, we map it
>> through the pci link mapping and generate an active high interrupt to
>> one of {5, 10, 11} (both pic and ioapic), and simultaneously an
>> active low interrupt to ioapic pin 21.
> I think what you described is correct.
> 
> 
>> 
>> The _PIC method should disable the link interrupts if ioapic mode is
>> disabled.
> Typo!  If ioapic mode is enabled.
> 
> From x86 BIOS, OS disable link interrupt through link device _DIS
> mothod. 
> 
> 
>> 
>> This removes the need for communication between the bios and qemu.
>> Agree 
> 
>> 
>>> 
>>> +            /* APIC and PIC flag */
>>> +            OperationRegion (P40D, PCI_Config, 0x64, 0x01) +
>>> 
>> 
>> This is actually SERIRQC, serial irq control.
>> 
>>> +
>>> +#ifdef KVM_CAP_IRQCHIP
>>> 
>> 
>> This should be unconditional.
>> 
>>> +static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num) +{ +
>>> int slot_addend; +    if( piix3_dev->config[0x64])  // APIC mode
>>> +        return ((pci_dev->devfn >> 3) & 7)+16;
>>> +    else {		// PIC mode
>>> +        slot_addend = (pci_dev->devfn >> 3) - 1;
>>> +        return (irq_num + slot_addend) & 3;
>>> +    }
>>> +}
>>> 
>> 
>> What I'm suggesting is to "fork" the interrupt into two lines, one
>> legacy path and the ioapic path.
> 
> I'll try this way.
> 
> Anthony

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] kvm irq assignment
  2008-06-13  6:38   ` Xu, Anthony
@ 2008-06-13 14:22     ` Avi Kivity
  2008-06-16  1:36       ` Xu, Anthony
  2008-06-14 23:32     ` Marcelo Tosatti
       [not found]     ` <3073362C-AF0F-4DBC-989C-AAA5E2875BDF@suse.de>
  2 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2008-06-13 14:22 UTC (permalink / raw)
  To: Xu, Anthony; +Cc: Marcelo Tosatti, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

Xu, Anthony wrote:
> Hi Avi and all
>
> This is the revised one,
>
> All PCI devices send interrupt to both PIC and IOAPIC,  
> a). When PIC is enabled and IOAPIC is disabled,  all redirect entries in
> IOAPIC are masked.
> B) When PIC is disabled and IPAPIC is enabled, link entry bit7 is set,
> means this link entry is disable.
> Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
> time. Otherwise cause many suspicious interrupt to guest.
>
> Test by running guest linux in kvm/ia32 and kvm/ia64.
>
>
>   

Looks good.

> diff --git a/qemu/hw/apic.c b/qemu/hw/apic.c
> index a14cab2..c3014fa 100644
> --- a/qemu/hw/apic.c
> +++ b/qemu/hw/apic.c
> @@ -1053,9 +1053,25 @@ static void ioapic_service(IOAPICState *s)
>      }
>  }
>  
> +int ioapic_map_irq(int devfn, int irq_num)
> +{
> +    int irq;
> +    irq = ((devfn >> 3) & 7) + 16;
> +    return irq;
> +}
> +#ifdef KVM_CAP_IRQCHIP
> +static int ioapic_irq_count[IOAPIC_NUM_PINS];
> +#endif
> +
>  void ioapic_set_irq(void *opaque, int vector, int level)
>  {
>      IOAPICState *s = opaque;
> +#ifdef KVM_CAP_IRQCHIP
> +    ioapic_irq_count[vector] += level;
> +    if (kvm_enabled())
> +	if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
> +	    return;
> +#endif
>   

I think this can be done more cleanly using the qemu_irq infrastructure. 
qem_irq_invert can do the inversion, and to get the "irq fork", you can 
have a qemu_irq instance that forwards its argument to two other 
qemu_irq instances.

There shouldn't be any #ifdef KVM_CAP_IRQCHIPs in there - it should work 
for plain qemu as well (well, if we fix qemu ioapic polarity code).

> diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
> index 90cb3a6..96316ca 100644
> --- a/qemu/hw/piix_pci.c
> +++ b/qemu/hw/piix_pci.c
> @@ -225,6 +226,9 @@ static void piix3_set_irq(qemu_irq *pic, int
> irq_num, int level)
>      /* now we change the pic irq level according to the piix irq
> mappings */
>      /* XXX: optimize */
>      pic_irq = piix3_dev->config[0x60 + irq_num];
> +    /* if bit7 set 1, this link is disabled */
> +    if (pic_irq & 0x80)  
> +        return;
>   

Already caught by the test below... hacky.

>      if (pic_irq < 16) {
>          /* The pic level is the logical OR of all the PCI irqs mapped
>             to it */
>
>
>   

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] kvm irq assignment
  2008-06-13  6:38   ` Xu, Anthony
  2008-06-13 14:22     ` Avi Kivity
@ 2008-06-14 23:32     ` Marcelo Tosatti
  2008-06-16  1:34       ` Xu, Anthony
  2008-06-16  5:31       ` Xu, Anthony
       [not found]     ` <3073362C-AF0F-4DBC-989C-AAA5E2875BDF@suse.de>
  2 siblings, 2 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2008-06-14 23:32 UTC (permalink / raw)
  To: Xu, Anthony; +Cc: Avi Kivity, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

Hi Anthony,

On Fri, Jun 13, 2008 at 02:38:08PM +0800, Xu, Anthony wrote:
> Hi Avi and all
> 
> This is the revised one,
> 
> All PCI devices send interrupt to both PIC and IOAPIC,  
> a). When PIC is enabled and IOAPIC is disabled,  all redirect entries in
> IOAPIC are masked.
> B) When PIC is disabled and IPAPIC is enabled, link entry bit7 is set,
> means this link entry is disable.
> Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
> time. Otherwise cause many suspicious interrupt to guest.
> 
> Test by running guest linux in kvm/ia32 and kvm/ia64.

Interrupt sharing is stable under Linux, PCI hotplug is happy, and
Windows is happy. Ship it!

I had to apply your patch by hand, your mailer eats newlines and other
nasty things, please fix that (or send attached patches).

>  
> +    Name (PICD, 0)
>  
> -    /* PCI Bus definition */
> +    Method(_PIC, 1)
> +    {
> +        Store(Arg0, PICD)
> +    }
> +
> +    /*PCI Bus definition */

Why did you take off the space before the "P" of PCI? Before you ask me,
no, I don't have anything better to do :)

>      Scope(\_SB) {
>          Device(PCI0) {
>              Name (_HID, EisaId ("PNP0A03"))
>              Name (_ADR, 0x00)
>              Name (_UID, 1)
> -            Name(_PRT, Package() {
> +
> +            Method(_PRT,0){
> +                If(PICD){

Put some spaces there too.

> diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
> index a23a466..f96fbb5 100644
> --- a/qemu/hw/pci.c
> +++ b/qemu/hw/pci.c
> @@ -27,6 +27,8 @@
>  #include "net.h"
>  #include "pc.h"
>  
> +#include "qemu-kvm.h"
> +
>  //#define DEBUG_PCI
>  
>  struct PCIBus {
> @@ -534,12 +536,18 @@ static void pci_set_irq(void *opaque, int irq_num,
> int level)
>      PCIDevice *pci_dev = (PCIDevice *)opaque;
>      PCIBus *bus;
>      int change;
> -
> +#ifdef KVM_CAP_IRQCHIP
> +    int irq;
> +#endif 
>      change = level - pci_dev->irq_state[irq_num];
>      if (!change)
>          return;
>  
>      pci_dev->irq_state[irq_num] = level;
> +#ifdef KVM_CAP_IRQCHIP
> +    irq = ioapic_map_irq(pci_dev->devfn, irq_num);
> +    ioapic_set_irq(opaque, irq, change);
> +#endif

I think you should avoid any changes to pci.c. Perhaps create a new
ioapic_and_pic_map / ioapic_and_pic_set pair of functions and change
pc.c to use that instead of piix_set_irq.

Other than that (and KVM_CAP_IRQCHIP mentioned by Avi, along with making
sure this works with "-no-kvm") looks great.

Regarding the non-PIIX link devices I mentioned, that can be done later
if necessary.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC] kvm irq assignment
       [not found]     ` <3073362C-AF0F-4DBC-989C-AAA5E2875BDF@suse.de>
@ 2008-06-16  1:26       ` Xu, Anthony
  2008-06-16  5:40       ` Xu, Anthony
  1 sibling, 0 replies; 12+ messages in thread
From: Xu, Anthony @ 2008-06-16  1:26 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Avi Kivity, Marcelo Tosatti, Jes Sorensen, kvm, kvm-ia64

 

________________________________

From: Alexander Graf [mailto:agraf@suse.de] 
Sent: 2008年6月13日 22:31
To: Xu, Anthony
Cc: Avi Kivity; Marcelo Tosatti; Jes Sorensen; kvm@vger.kernel.org; kvm-ia64@vger.kernel.org
Subject: Re: [RFC] kvm irq assignment



On Jun 12, 2008, at 11:38 PM, Xu, Anthony wrote:


	Hi Avi and all
	
	This is the revised one,
	
	All PCI devices send interrupt to both PIC and IOAPIC,  
	a). When PIC is enabled and IOAPIC is disabled,  all redirect entries in
	IOAPIC are masked.
	B) When PIC is disabled and IPAPIC is enabled, link entry bit7 is set,
	means this link entry is disable.
	Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
	time. Otherwise cause many suspicious interrupt to guest.


At boottime the IOAPIC is completely masked, while the PIC is completely unmasked. Somewhere during bootup ACPI compliant OSs can call _PIC to switch to IOAPIC mode and somehow deactivate the PIC lines and activate IOAPIC lines for the devices they are interested in.

I sent a tool called "apicdump" to this list some time ago that allows you to read the PIC and IOAPIC from within Linux.


[Anthony] nice tool, I'll try that.



	Test by running guest linux in kvm/ia32 and kvm/ia64.
	
	
	Thanks,
	Anthony
	
	
	
	diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
	index 21fc76a..e12fd66 100755
	--- a/bios/acpi-dsdt.dsl
	+++ b/bios/acpi-dsdt.dsl


ACPI changes look fine to me.


	
	



[snip]


	diff --git a/qemu/hw/apic.c b/qemu/hw/apic.c
	index a14cab2..c3014fa 100644
	--- a/qemu/hw/apic.c
	+++ b/qemu/hw/apic.c
	@@ -1053,9 +1053,25 @@ static void ioapic_service(IOAPICState *s)
	    }
	}
	
	+int ioapic_map_irq(int devfn, int irq_num)
	+{
	+    int irq;
	+    irq = ((devfn >> 3) & 7) + 16;
	+    return irq;
	+}
	+#ifdef KVM_CAP_IRQCHIP
	+static int ioapic_irq_count[IOAPIC_NUM_PINS];
	+#endif
	+
	void ioapic_set_irq(void *opaque, int vector, int level)
	{
	    IOAPICState *s = opaque;
	+#ifdef KVM_CAP_IRQCHIP
	+    ioapic_irq_count[vector] += level;
	+    if (kvm_enabled())
	+ if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
	+    return;
	+#endif
	
	    if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
	        uint32_t mask = 1 << vector;
	diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
	index b11e328..4761463 100644
	--- a/qemu/hw/ipf.c
	+++ b/qemu/hw/ipf.c
	@@ -672,3 +672,23 @@ QEMUMachine ipf_machine = {
	    ipf_init_pci,
	    VGA_RAM_SIZE + VGA_RAM_SIZE,
	};
	+
	+#define IOAPIC_NUM_PINS 48
	+static int ioapic_irq_count[IOAPIC_NUM_PINS];
	+
	+int ioapic_map_irq(int devfn, int irq_num)
	+{
	+    int irq, dev;
	+    dev = devfn >> 3;
	+    irq = ((((dev << 2) + (dev >> 3) + irq_num) & 31) + 16);
	+    return irq;
	+}


Why does this look different from the one in apic.c?
[Anthony] it is used for ia64,  I would like to use same guest Firmware of XEN/ia64 in KVM/IA64,
I "copy" the mapping from XEN/IA64.




	+
	+void ioapic_set_irq(void *opaque, int vector, int level)
	+{
	+    ioapic_irq_count[vector] += level;
	+    if (kvm_enabled())
	+ if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
	+    return;
	+}
	+


This does look like common code to me, so it might be a good idea to share it with ia32. AFAIK apics are the same on ia32 and ia64.
[Anthony] I wanted to do the same thing as what you said, while currently KVM/ia64 doesn't include apic.c file, I tried to add apic.c, but failed to complie
I'll try to make it work.




	diff --git a/qemu/hw/pc.h b/qemu/hw/pc.h
	index c284bf1..ef09a78 100644
	--- a/qemu/hw/pc.h
	+++ b/qemu/hw/pc.h
	@@ -47,6 +47,7 @@ int apic_accept_pic_intr(CPUState *env);
	void apic_local_deliver(CPUState *env, int vector);
	int apic_get_interrupt(CPUState *env);
	IOAPICState *ioapic_init(void);
	+int ioapic_map_irq(int devfn, int irq_num);
	void ioapic_set_irq(void *opaque, int vector, int level);
	
	/* i8254.c */
	diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
	index a23a466..f96fbb5 100644
	--- a/qemu/hw/pci.c
	+++ b/qemu/hw/pci.c
	@@ -27,6 +27,8 @@
	#include "net.h"
	#include "pc.h"
	
	+#include "qemu-kvm.h"
	+
	//#define DEBUG_PCI
	
	struct PCIBus {
	@@ -534,12 +536,18 @@ static void pci_set_irq(void *opaque, int irq_num,
	int level)
	    PCIDevice *pci_dev = (PCIDevice *)opaque;
	    PCIBus *bus;
	    int change;
	-
	+#ifdef KVM_CAP_IRQCHIP
	+    int irq;
	+#endif 
	    change = level - pci_dev->irq_state[irq_num];
	    if (!change)
	        return;
	
	    pci_dev->irq_state[irq_num] = level;
	+#ifdef KVM_CAP_IRQCHIP
	+    irq = ioapic_map_irq(pci_dev->devfn, irq_num);
	+    ioapic_set_irq(opaque, irq, change);
	+#endif


So if we're not running KVM we don't trigger the APIC? I don't see how that helps anyone. Just modify the Qemu APIC so that it works fine and then hook the in-kernel APIC to that.
[Anthony] agree, it will support native Qemu.



	    for (;;) {
	        bus = pci_dev->bus;
	        irq_num = bus->map_irq(pci_dev, irq_num);
	diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
	index 90cb3a6..96316ca 100644
	--- a/qemu/hw/piix_pci.c
	+++ b/qemu/hw/piix_pci.c
	@@ -225,6 +226,9 @@ static void piix3_set_irq(qemu_irq *pic, int
	irq_num, int level)
	    /* now we change the pic irq level according to the piix irq
	mappings */
	    /* XXX: optimize */
	    pic_irq = piix3_dev->config[0x60 + irq_num];
	+    /* if bit7 set 1, this link is disabled */
	+    if (pic_irq & 0x80)  
	+        return;


Uhm ... I do remember having seen that somewhere else. Why do you need to add it now?
[Anthony] As Avi pointed out, below check ( pic_irq < 16) already covers this check.

Thanks
Anthony



	    if (pic_irq < 16) {
	        /* The pic level is the logical OR of all the PCI irqs mapped
	           to it */
	
	
	
	
	
	Xu, Anthony wrote:
	

		Avi Kivity wrote:
		

			Xu, Anthony wrote:
			

				Hi all,
				

				Thanks for your comments.
				


				I made this new patch based on your comments
				


				1. use bimodal _PRT, to take advantage of IOAPIC pin 16~23
				

				the mapping is simple,  slot  ->  (slot&7)+16 IOAPIC pin,
				

				someone may provide good mapping ?
				



			I think it's fine. If we find a better one later, or if we add
			

			another ioapic, we can easily change it since the bios and qemu are
			

			shipped as a unit. 
			


				2. use ISA-bridge configure space 0x64 byte as a communication
				

				mechansim. When guest BIOS invokes _PIC, the value is passed to
				

				            qemu through byte 0x64. qemu know whether it is PIC
				

				mode and APIC mode by checking byte 0x64. 
				

				3. pci_slot_get_pirq and piix3_set_irq adopt different operation
				

				based on PIC mode/APIC mode 
				



			I'm not sure how real hardware works, but I _think_ that it routes
			

			irqs unconditionally to both the legacy path and directly to the
			

			ioapic. So for example if slot 5 asserts an interrupt, we map it
			

			through the pci link mapping and generate an active high interrupt to
			

			one of {5, 10, 11} (both pic and ioapic), and simultaneously an
			

			active low interrupt to ioapic pin 21.
			

		I think what you described is correct.
		




			The _PIC method should disable the link interrupts if ioapic mode is
			

			disabled.
			

		Typo!  If ioapic mode is enabled.
		


		From x86 BIOS, OS disable link interrupt through link device _DIS
		

		mothod. 
		




			This removes the need for communication between the bios and qemu.
			

			Agree 
			




				+            /* APIC and PIC flag */
				

				+            OperationRegion (P40D, PCI_Config, 0x64, 0x01) +
				



			This is actually SERIRQC, serial irq control.
			


				+
				

				+#ifdef KVM_CAP_IRQCHIP
				



			This should be unconditional.
			


				+static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num) +{ +
				

				int slot_addend; +    if( piix3_dev->config[0x64])  // APIC mode
				

				+        return ((pci_dev->devfn >> 3) & 7)+16;
				

				+    else { // PIC mode
				

				+        slot_addend = (pci_dev->devfn >> 3) - 1;
				

				+        return (irq_num + slot_addend) & 3;
				

				+    }
				

				+}
				



			What I'm suggesting is to "fork" the interrupt into two lines, one
			

			legacy path and the ioapic path.
			


		I'll try this way.
		


		Anthony
		



^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC] kvm irq assignment
  2008-06-14 23:32     ` Marcelo Tosatti
@ 2008-06-16  1:34       ` Xu, Anthony
  2008-06-16  5:31       ` Xu, Anthony
  1 sibling, 0 replies; 12+ messages in thread
From: Xu, Anthony @ 2008-06-16  1:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

Marcelo Tosatti wrote:
> Hi Anthony,
> 
> On Fri, Jun 13, 2008 at 02:38:08PM +0800, Xu, Anthony wrote:
>> Hi Avi and all
>> 
>> This is the revised one,
>> 
>> All PCI devices send interrupt to both PIC and IOAPIC,
>> a). When PIC is enabled and IOAPIC is disabled,  all redirect
>> entries in IOAPIC are masked. B) When PIC is disabled and IPAPIC is
>> enabled, link entry bit7 is set, means this link entry is disable.
>> Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
>> time. Otherwise cause many suspicious interrupt to guest.
>> 
>> Test by running guest linux in kvm/ia32 and kvm/ia64.
> 
> Interrupt sharing is stable under Linux, PCI hotplug is happy, and
> Windows is happy. Ship it!
Thanks, I will

> 
> I had to apply your patch by hand, your mailer eats newlines and other
> nasty things, please fix that (or send attached patches).
Sorry, I'll attach patch.

> 
>> 
>> +    Name (PICD, 0)
>> 
>> -    /* PCI Bus definition */
>> +    Method(_PIC, 1)
>> +    {
>> +        Store(Arg0, PICD)
>> +    }
>> +
>> +    /*PCI Bus definition */
> 
> Why did you take off the space before the "P" of PCI? Before you ask
> me, no, I don't have anything better to do :)
typo

> 
>>      Scope(\_SB) {
>>          Device(PCI0) {
>>              Name (_HID, EisaId ("PNP0A03"))
>>              Name (_ADR, 0x00)
>>              Name (_UID, 1)
>> -            Name(_PRT, Package() {
>> +
>> +            Method(_PRT,0){
>> +                If(PICD){
> 
> Put some spaces there too.
Okay

> 
>> diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
>> index a23a466..f96fbb5 100644
>> --- a/qemu/hw/pci.c
>> +++ b/qemu/hw/pci.c
>> @@ -27,6 +27,8 @@
>>  #include "net.h"
>>  #include "pc.h"
>> 
>> +#include "qemu-kvm.h"
>> +
>>  //#define DEBUG_PCI
>> 
>>  struct PCIBus {
>> @@ -534,12 +536,18 @@ static void pci_set_irq(void *opaque, int
>>      irq_num, int level) PCIDevice *pci_dev = (PCIDevice *)opaque;
>>      PCIBus *bus;
>>      int change;
>> -
>> +#ifdef KVM_CAP_IRQCHIP
>> +    int irq;
>> +#endif
>>      change = level - pci_dev->irq_state[irq_num];
>>      if (!change)
>>          return;
>> 
>>      pci_dev->irq_state[irq_num] = level;
>> +#ifdef KVM_CAP_IRQCHIP
>> +    irq = ioapic_map_irq(pci_dev->devfn, irq_num);
>> +    ioapic_set_irq(opaque, irq, change);
>> +#endif
> 
> I think you should avoid any changes to pci.c. Perhaps create a new
> ioapic_and_pic_map / ioapic_and_pic_set pair of functions and change
> pc.c to use that instead of piix_set_irq.
I'll consider how to do this

> 
> Other than that (and KVM_CAP_IRQCHIP mentioned by Avi, along with
> making sure this works with "-no-kvm") looks great.
Agree

> 
> Regarding the non-PIIX link devices I mentioned, that can be done
> later if necessary.
Agree, if we need to support dynamic irq, or support multiple IOAPIC.




Anthony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC] kvm irq assignment
  2008-06-13 14:22     ` Avi Kivity
@ 2008-06-16  1:36       ` Xu, Anthony
  0 siblings, 0 replies; 12+ messages in thread
From: Xu, Anthony @ 2008-06-16  1:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

Avi Kivity wrote:
> Xu, Anthony wrote:
>> Hi Avi and all
>> 
>> This is the revised one,
>> 
>> All PCI devices send interrupt to both PIC and IOAPIC,
>> a). When PIC is enabled and IOAPIC is disabled,  all redirect
>> entries in IOAPIC are masked. B) When PIC is disabled and IPAPIC is
>> enabled, link entry bit7 is set, means this link entry is disable.
>> Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
>> time. Otherwise cause many suspicious interrupt to guest.
>> 
>> Test by running guest linux in kvm/ia32 and kvm/ia64.
>> 
>> 
>> 
> 
> Looks good.
> 
>> diff --git a/qemu/hw/apic.c b/qemu/hw/apic.c
>> index a14cab2..c3014fa 100644
>> --- a/qemu/hw/apic.c
>> +++ b/qemu/hw/apic.c
>> @@ -1053,9 +1053,25 @@ static void ioapic_service(IOAPICState *s)   
>>  } }
>> 
>> +int ioapic_map_irq(int devfn, int irq_num)
>> +{
>> +    int irq;
>> +    irq = ((devfn >> 3) & 7) + 16;
>> +    return irq;
>> +}
>> +#ifdef KVM_CAP_IRQCHIP
>> +static int ioapic_irq_count[IOAPIC_NUM_PINS];
>> +#endif
>> +
>>  void ioapic_set_irq(void *opaque, int vector, int level)  {
>>      IOAPICState *s = opaque;
>> +#ifdef KVM_CAP_IRQCHIP
>> +    ioapic_irq_count[vector] += level;
>> +    if (kvm_enabled())
>> +	if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0)) +

>> return; +#endif
>> 
> 
> I think this can be done more cleanly using the qemu_irq
> infrastructure. qem_irq_invert can do the inversion, and to get the
> "irq fork", you can have a qemu_irq instance that forwards its
> argument to two other qemu_irq instances.
I'm thinking how to make it clean and generic

> 
> There shouldn't be any #ifdef KVM_CAP_IRQCHIPs in there - it should
> work for plain qemu as well (well, if we fix qemu ioapic polarity
> code). 
Agree

> 
>> diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
>> index 90cb3a6..96316ca 100644
>> --- a/qemu/hw/piix_pci.c
>> +++ b/qemu/hw/piix_pci.c
>> @@ -225,6 +226,9 @@ static void piix3_set_irq(qemu_irq *pic, int
>>      irq_num, int level) /* now we change the pic irq level
>>      according to the piix irq mappings */ /* XXX: optimize */
>>      pic_irq = piix3_dev->config[0x60 + irq_num];
>> +    /* if bit7 set 1, this link is disabled */
>> +    if (pic_irq & 0x80)
>> +        return;
>> 
> 
> Already caught by the test below... hacky.
Didn't see that, thanks

> 
>>      if (pic_irq < 16) {
>>          /* The pic level is the logical OR of all the PCI irqs
>> mapped             to it */ 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC] kvm irq assignment
  2008-06-14 23:32     ` Marcelo Tosatti
  2008-06-16  1:34       ` Xu, Anthony
@ 2008-06-16  5:31       ` Xu, Anthony
  2008-06-16 15:52         ` Marcelo Tosatti
  1 sibling, 1 reply; 12+ messages in thread
From: Xu, Anthony @ 2008-06-16  5:31 UTC (permalink / raw)
  To: Xu, Anthony, Marcelo Tosatti
  Cc: Avi Kivity, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

>> I think you should avoid any changes to pci.c. Perhaps create a new
>> ioapic_and_pic_map / ioapic_and_pic_set pair of functions and change
>> pc.c to use that instead of piix_set_irq.
> I'll consider how to do this
> 

But pci.c includes below code section, which implements irq_num mapping
through interrupt link, While ioapic_set_irq doesn't
need this modification.
Seems, it's unavoidable to modify pci.c

Thanks,
Anthony



    pci_dev->irq_state[irq_num] = level;
    for (;;) {
        bus = pci_dev->bus;
        irq_num = bus->map_irq(pci_dev, irq_num);
        if (bus->set_irq)
            break;
        pci_dev = bus->parent_dev;


Xu, Anthony wrote:
> Marcelo Tosatti wrote:
>> Hi Anthony,
>> 
>> On Fri, Jun 13, 2008 at 02:38:08PM +0800, Xu, Anthony wrote:
>>> Hi Avi and all
>>> 
>>> This is the revised one,
>>> 
>>> All PCI devices send interrupt to both PIC and IOAPIC,
>>> a). When PIC is enabled and IOAPIC is disabled,  all redirect
>>> entries in IOAPIC are masked. B) When PIC is disabled and IPAPIC is
>>> enabled, link entry bit7 is set, means this link entry is disable.
>>> Guest OS need to guarantee PIC and IOAPIC are not enabled in the
>>> same time. Otherwise cause many suspicious interrupt to guest.
>>> 
>>> Test by running guest linux in kvm/ia32 and kvm/ia64.
>> 
>> Interrupt sharing is stable under Linux, PCI hotplug is happy, and
>> Windows is happy. Ship it!
> Thanks, I will
> 
>> 
>> I had to apply your patch by hand, your mailer eats newlines and
>> other nasty things, please fix that (or send attached patches).
>> Sorry, I'll attach patch. 
> 
>> 
>>> 
>>> +    Name (PICD, 0)
>>> 
>>> -    /* PCI Bus definition */
>>> +    Method(_PIC, 1)
>>> +    {
>>> +        Store(Arg0, PICD)
>>> +    }
>>> +
>>> +    /*PCI Bus definition */
>> 
>> Why did you take off the space before the "P" of PCI? Before you ask
>> me, no, I don't have anything better to do :) typo
> 
>> 
>>>      Scope(\_SB) {
>>>          Device(PCI0) {
>>>              Name (_HID, EisaId ("PNP0A03"))
>>>              Name (_ADR, 0x00)
>>>              Name (_UID, 1)
>>> -            Name(_PRT, Package() {
>>> +
>>> +            Method(_PRT,0){
>>> +                If(PICD){
>> 
>> Put some spaces there too.
> Okay
> 
>> 
>>> diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
>>> index a23a466..f96fbb5 100644
>>> --- a/qemu/hw/pci.c
>>> +++ b/qemu/hw/pci.c
>>> @@ -27,6 +27,8 @@
>>>  #include "net.h"
>>>  #include "pc.h"
>>> 
>>> +#include "qemu-kvm.h"
>>> +
>>>  //#define DEBUG_PCI
>>> 
>>>  struct PCIBus {
>>> @@ -534,12 +536,18 @@ static void pci_set_irq(void *opaque, int
>>>      irq_num, int level) PCIDevice *pci_dev = (PCIDevice *)opaque; 
>>>      PCIBus *bus; int change;
>>> -
>>> +#ifdef KVM_CAP_IRQCHIP
>>> +    int irq;
>>> +#endif
>>>      change = level - pci_dev->irq_state[irq_num];
>>>      if (!change)
>>>          return;
>>> 
>>>      pci_dev->irq_state[irq_num] = level;
>>> +#ifdef KVM_CAP_IRQCHIP
>>> +    irq = ioapic_map_irq(pci_dev->devfn, irq_num);
>>> +    ioapic_set_irq(opaque, irq, change);
>>> +#endif
>> 
>> I think you should avoid any changes to pci.c. Perhaps create a new
>> ioapic_and_pic_map / ioapic_and_pic_set pair of functions and change
>> pc.c to use that instead of piix_set_irq.
> I'll consider how to do this
> 
>> 
>> Other than that (and KVM_CAP_IRQCHIP mentioned by Avi, along with
>> making sure this works with "-no-kvm") looks great. Agree
> 
>> 
>> Regarding the non-PIIX link devices I mentioned, that can be done
>> later if necessary.
> Agree, if we need to support dynamic irq, or support multiple IOAPIC.
> 
> 
> 
> 
> Anthony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC] kvm irq assignment
       [not found]     ` <3073362C-AF0F-4DBC-989C-AAA5E2875BDF@suse.de>
  2008-06-16  1:26       ` Xu, Anthony
@ 2008-06-16  5:40       ` Xu, Anthony
  1 sibling, 0 replies; 12+ messages in thread
From: Xu, Anthony @ 2008-06-16  5:40 UTC (permalink / raw)
  To: Xu, Anthony, Alexander Graf
  Cc: Avi Kivity, Marcelo Tosatti, Jes Sorensen, kvm, kvm-ia64

> This does look like common code to me, so it might be a good idea to
> share it with ia32. AFAIK apics are the same on ia32 and ia64.
> [Anthony] I wanted to do the same thing as what you said, while
> currently KVM/ia64 doesn't include apic.c file, I tried to add
> apic.c, but failed to complie   
> I'll try to make it work.

Further investigation show there are many ia32 specific codes in apic.c file.
It may incur a lot of modifications if we want to share apic.c with IA64.

For example,
        s->apicbase = (val & 0xfffff000) |
            (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
    /* if disabled, cannot be enabled again */
    if (!(val & MSR_IA32_APICBASE_ENABLE)) {
        s->apicbase &= ~MSR_IA32_APICBASE_ENABLE;
        env->cpuid_features &= ~CPUID_APIC;
        s->spurious_vec &= ~APIC_SV_ENABLE;

BTW IA64 is using SAPIC not APIC, there are some difference.
For KVM/IA64, only two functions ioapic_map_irq and ioapic_set_irq are needed, and
Native qemu doesn't support guest IA64 platform yet.

So, the simple way is to put these two functions in ia64 specific file, ipf.c


Thanks,
Anthony


Xu, Anthony wrote:
> ________________________________
> 
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: 2008年6月13日 22:31
> To: Xu, Anthony
> Cc: Avi Kivity; Marcelo Tosatti; Jes Sorensen; kvm@vger.kernel.org;
> kvm-ia64@vger.kernel.org 
> Subject: Re: [RFC] kvm irq assignment
> 
> 
> 
> On Jun 12, 2008, at 11:38 PM, Xu, Anthony wrote:
> 
> 
> 	Hi Avi and all
> 
> 	This is the revised one,
> 
> 	All PCI devices send interrupt to both PIC and IOAPIC,
> 	a). When PIC is enabled and IOAPIC is disabled,  all redirect
> 	entries in IOAPIC are masked.
> 	B) When PIC is disabled and IPAPIC is enabled, link entry bit7 is
> 	set, means this link entry is disable.
> 	Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
> 	time. Otherwise cause many suspicious interrupt to guest.
> 
> 
> At boottime the IOAPIC is completely masked, while the PIC is
> completely unmasked. Somewhere during bootup ACPI compliant OSs can
> call _PIC to switch to IOAPIC mode and somehow deactivate the PIC
> lines and activate IOAPIC lines for the devices they are interested
> in.    
> 
> I sent a tool called "apicdump" to this list some time ago that
> allows you to read the PIC and IOAPIC from within Linux. 
> 
> 
> [Anthony] nice tool, I'll try that.
> 
> 
> 
> 	Test by running guest linux in kvm/ia32 and kvm/ia64.
> 
> 
> 	Thanks,
> 	Anthony
> 
> 
> 
> 	diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
> 	index 21fc76a..e12fd66 100755
> 	--- a/bios/acpi-dsdt.dsl
> 	+++ b/bios/acpi-dsdt.dsl
> 
> 
> ACPI changes look fine to me.
> 
> 
> 
> 
> 
> 
> 
> [snip]
> 
> 
> 	diff --git a/qemu/hw/apic.c b/qemu/hw/apic.c
> 	index a14cab2..c3014fa 100644
> 	--- a/qemu/hw/apic.c
> 	+++ b/qemu/hw/apic.c
> 	@@ -1053,9 +1053,25 @@ static void ioapic_service(IOAPICState *s)
> 	    }
> 	}
> 
> 	+int ioapic_map_irq(int devfn, int irq_num)
> 	+{
> 	+    int irq;
> 	+    irq = ((devfn >> 3) & 7) + 16;
> 	+    return irq;
> 	+}
> 	+#ifdef KVM_CAP_IRQCHIP
> 	+static int ioapic_irq_count[IOAPIC_NUM_PINS];
> 	+#endif
> 	+
> 	void ioapic_set_irq(void *opaque, int vector, int level)
> 	{
> 	    IOAPICState *s = opaque;
> 	+#ifdef KVM_CAP_IRQCHIP
> 	+    ioapic_irq_count[vector] += level;
> 	+    if (kvm_enabled())
> 	+ if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
> 	+    return;
> 	+#endif
> 
> 	    if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
> 	        uint32_t mask = 1 << vector;
> 	diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
> 	index b11e328..4761463 100644
> 	--- a/qemu/hw/ipf.c
> 	+++ b/qemu/hw/ipf.c
> 	@@ -672,3 +672,23 @@ QEMUMachine ipf_machine = {
> 	    ipf_init_pci,
> 	    VGA_RAM_SIZE + VGA_RAM_SIZE,
> 	};
> 	+
> 	+#define IOAPIC_NUM_PINS 48
> 	+static int ioapic_irq_count[IOAPIC_NUM_PINS];
> 	+
> 	+int ioapic_map_irq(int devfn, int irq_num)
> 	+{
> 	+    int irq, dev;
> 	+    dev = devfn >> 3;
> 	+    irq = ((((dev << 2) + (dev >> 3) + irq_num) & 31) + 16);
> 	+    return irq;
> 	+}
> 
> 
> Why does this look different from the one in apic.c?
> [Anthony] it is used for ia64,  I would like to use same guest
> Firmware of XEN/ia64 in KVM/IA64, 
> I "copy" the mapping from XEN/IA64.
> 
> 
> 
> 
> 	+
> 	+void ioapic_set_irq(void *opaque, int vector, int level)
> 	+{
> 	+    ioapic_irq_count[vector] += level;
> 	+    if (kvm_enabled())
> 	+ if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
> 	+    return;
> 	+}
> 	+
> 
> 
> This does look like common code to me, so it might be a good idea to
> share it with ia32. AFAIK apics are the same on ia32 and ia64.
> [Anthony] I wanted to do the same thing as what you said, while
> currently KVM/ia64 doesn't include apic.c file, I tried to add
> apic.c, but failed to complie   
> I'll try to make it work.
> 
> 
> 
> 
> 	diff --git a/qemu/hw/pc.h b/qemu/hw/pc.h
> 	index c284bf1..ef09a78 100644
> 	--- a/qemu/hw/pc.h
> 	+++ b/qemu/hw/pc.h
> 	@@ -47,6 +47,7 @@ int apic_accept_pic_intr(CPUState *env);
> 	void apic_local_deliver(CPUState *env, int vector);
> 	int apic_get_interrupt(CPUState *env);
> 	IOAPICState *ioapic_init(void);
> 	+int ioapic_map_irq(int devfn, int irq_num);
> 	void ioapic_set_irq(void *opaque, int vector, int level);
> 
> 	/* i8254.c */
> 	diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
> 	index a23a466..f96fbb5 100644
> 	--- a/qemu/hw/pci.c
> 	+++ b/qemu/hw/pci.c
> 	@@ -27,6 +27,8 @@
> 	#include "net.h"
> 	#include "pc.h"
> 
> 	+#include "qemu-kvm.h"
> 	+
> 	//#define DEBUG_PCI
> 
> 	struct PCIBus {
> 	@@ -534,12 +536,18 @@ static void pci_set_irq(void *opaque, int
> 	irq_num, int level)
> 	    PCIDevice *pci_dev = (PCIDevice *)opaque;
> 	    PCIBus *bus;
> 	    int change;
> 	-
> 	+#ifdef KVM_CAP_IRQCHIP
> 	+    int irq;
> 	+#endif
> 	    change = level - pci_dev->irq_state[irq_num];
> 	    if (!change)
> 	        return;
> 
> 	    pci_dev->irq_state[irq_num] = level;
> 	+#ifdef KVM_CAP_IRQCHIP
> 	+    irq = ioapic_map_irq(pci_dev->devfn, irq_num);
> 	+    ioapic_set_irq(opaque, irq, change);
> 	+#endif
> 
> 
> So if we're not running KVM we don't trigger the APIC? I don't see
> how that helps anyone. Just modify the Qemu APIC so that it works
> fine and then hook the in-kernel APIC to that. [Anthony] agree, it
> will support native Qemu.  
> 
> 
> 
> 	    for (;;) {
> 	        bus = pci_dev->bus;
> 	        irq_num = bus->map_irq(pci_dev, irq_num);
> 	diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
> 	index 90cb3a6..96316ca 100644
> 	--- a/qemu/hw/piix_pci.c
> 	+++ b/qemu/hw/piix_pci.c
> 	@@ -225,6 +226,9 @@ static void piix3_set_irq(qemu_irq *pic, int
> 	irq_num, int level)
> 	    /* now we change the pic irq level according to the piix irq
> 	mappings */
> 	    /* XXX: optimize */
> 	    pic_irq = piix3_dev->config[0x60 + irq_num];
> 	+    /* if bit7 set 1, this link is disabled */
> 	+    if (pic_irq & 0x80)
> 	+        return;
> 
> 
> Uhm ... I do remember having seen that somewhere else. Why do you
> need to add it now? [Anthony] As Avi pointed out, below check (
> pic_irq < 16) already covers this check. 
> 
> Thanks
> Anthony
> 
> 
> 
> 	    if (pic_irq < 16) {
> 	        /* The pic level is the logical OR of all the PCI irqs mapped
> 	           to it */
> 
> 
> 
> 
> 
> 	Xu, Anthony wrote:
> 
> 
> 		Avi Kivity wrote:
> 
> 
> 			Xu, Anthony wrote:
> 
> 
> 				Hi all,
> 
> 
> 				Thanks for your comments.
> 
> 
> 
> 				I made this new patch based on your comments
> 
> 
> 
> 				1. use bimodal _PRT, to take advantage of IOAPIC pin 16~23
> 
> 
> 				the mapping is simple,  slot  ->  (slot&7)+16 IOAPIC pin,
> 
> 
> 				someone may provide good mapping ?
> 
> 
> 
> 
> 			I think it's fine. If we find a better one later, or if we add
> 
> 
> 			another ioapic, we can easily change it since the bios and qemu are
> 
> 
> 			shipped as a unit.
> 
> 
> 
> 				2. use ISA-bridge configure space 0x64 byte as a communication
> 
> 
> 				mechansim. When guest BIOS invokes _PIC, the value is passed to
> 
> 
> 				            qemu through byte 0x64. qemu know whether it is PIC
> 
> 
> 				mode and APIC mode by checking byte 0x64.
> 
> 
> 				3. pci_slot_get_pirq and piix3_set_irq adopt different operation
> 
> 
> 				based on PIC mode/APIC mode
> 
> 
> 
> 
> 			I'm not sure how real hardware works, but I _think_ that it routes
> 
> 
> 			irqs unconditionally to both the legacy path and directly to the
> 
> 
> 			ioapic. So for example if slot 5 asserts an interrupt, we map it
> 
> 
> 			through the pci link mapping and generate an active high interrupt
> to 
> 
> 
> 			one of {5, 10, 11} (both pic and ioapic), and simultaneously an
> 
> 
> 			active low interrupt to ioapic pin 21.
> 
> 
> 		I think what you described is correct.
> 
> 
> 
> 
> 
> 			The _PIC method should disable the link interrupts if ioapic mode
> is 
> 
> 
> 			disabled.
> 
> 
> 		Typo!  If ioapic mode is enabled.
> 
> 
> 
> 		From x86 BIOS, OS disable link interrupt through link device _DIS
> 
> 
> 		mothod.
> 
> 
> 
> 
> 
> 			This removes the need for communication between the bios and qemu.
> 
> 
> 			Agree
> 
> 
> 
> 
> 
> 				+            /* APIC and PIC flag */
> 
> 
> 				+            OperationRegion (P40D, PCI_Config, 0x64, 0x01) +
> 
> 
> 
> 
> 			This is actually SERIRQC, serial irq control.
> 
> 
> 
> 				+
> 
> 
> 				+#ifdef KVM_CAP_IRQCHIP
> 
> 
> 
> 
> 			This should be unconditional.
> 
> 
> 
> 				+static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num) +{
> + 
> 
> 
> 				int slot_addend; +    if( piix3_dev->config[0x64])  // APIC mode
> 
> 
> 				+        return ((pci_dev->devfn >> 3) & 7)+16;
> 
> 
> 				+    else { // PIC mode
> 
> 
> 				+        slot_addend = (pci_dev->devfn >> 3) - 1;
> 
> 
> 				+        return (irq_num + slot_addend) & 3;
> 
> 
> 				+    }
> 
> 
> 				+}
> 
> 
> 
> 
> 			What I'm suggesting is to "fork" the interrupt into two lines, one
> 
> 
> 			legacy path and the ioapic path.
> 
> 
> 
> 		I'll try this way.
> 
> 
> 
> 		Anthony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] kvm irq assignment
  2008-06-16  5:31       ` Xu, Anthony
@ 2008-06-16 15:52         ` Marcelo Tosatti
  0 siblings, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2008-06-16 15:52 UTC (permalink / raw)
  To: Xu, Anthony; +Cc: Avi Kivity, Alexander Graf, Jes Sorensen, kvm, kvm-ia64

On Mon, Jun 16, 2008 at 01:31:21PM +0800, Xu, Anthony wrote:
> >> I think you should avoid any changes to pci.c. Perhaps create a new
> >> ioapic_and_pic_map / ioapic_and_pic_set pair of functions and change
> >> pc.c to use that instead of piix_set_irq.
> > I'll consider how to do this
> > 
> 
> But pci.c includes below code section, which implements irq_num mapping
> through interrupt link, While ioapic_set_irq doesn't
> need this modification.
> Seems, it's unavoidable to modify pci.c

Just do a dummy mapping function? 

int ioapic_pic_map_irq(PCIDevice *dev, int irq_num)
{
    return irq_num;
}

> Thanks,
> Anthony
> 
> 
> 
>     pci_dev->irq_state[irq_num] = level;
>     for (;;) {
>         bus = pci_dev->bus;
>         irq_num = bus->map_irq(pci_dev, irq_num);
>         if (bus->set_irq)
>             break;
>         pci_dev = bus->parent_dev;



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-06-16 15:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-12 16:39 [RFC] kvm irq assignment Xu, Anthony
2008-06-12 19:16 ` Avi Kivity
2008-06-13  2:45   ` Xu, Anthony
2008-06-13  6:38   ` Xu, Anthony
2008-06-13 14:22     ` Avi Kivity
2008-06-16  1:36       ` Xu, Anthony
2008-06-14 23:32     ` Marcelo Tosatti
2008-06-16  1:34       ` Xu, Anthony
2008-06-16  5:31       ` Xu, Anthony
2008-06-16 15:52         ` Marcelo Tosatti
     [not found]     ` <3073362C-AF0F-4DBC-989C-AAA5E2875BDF@suse.de>
2008-06-16  1:26       ` Xu, Anthony
2008-06-16  5:40       ` Xu, Anthony

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox