From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcial Rion
Subject: Re: Issue with pv_ops Kernel 2.6.31.6 and Xen
Date: Thu, 28 Jan 2010 06:59:54 +0100
Message-ID: <4B6127DA.1040408@swissonline.ch>
References: <4B5A28CC.1090404@swissonline.ch>
Reply-To: marcial.rion@swissonline.ch
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <4B5A28CC.1090404@swissonline.ch>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Sorry, this is a duplicate of
http://lists.xensource.com/archives/html/xen-devel/2010-01/msg00855.html

I thought the original mail had not reached the mailing list, so I
reposted it...

Marcial Rion wrote:
> Hi
>
> First of all I have to state that I am neither a Kernel nor a Xen
> developer. Nevertheless, while trying to use Kernel 2.6.31.6 from
> git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git as a Dom0
> Kernel, I discovered an issue and, after searching the Internet for a
> long time, I probably also found the cause. However, I won't be able to
> fix it by myself :-(, so I am trying to share my knowledge with this
> list, in the hope that the issue might get fixed sometime :-)...
> I will try to give you all information that seems relevant to me;
> however, if it turns out I failed to give enough details about my system
> (configuration), log files or anything else, I will be glad to provide
> this information. Furthermore, I would also be happy to support
> "testing" of potential patches if this is required. I post to this list
> as this has been suggested at
> http://wiki.xensource.com/xenwiki/XenParavirtOps (bottom of page). If I
> am wrong, please give me a short hint so I won't bother you any longer...
>
> Now, let's get into it...
>
> About my system:
> I am running Gentoo (10.0, server profile) on an Asus P2B-D motherboard
> (PIIX4 chipset) with two PIII 500 MHz CPUs and 1G of RAM. The system
> also has 3 PCI network interfaces with the Realtek RTL 8139 chip
> (8139too Kernel driver). The network interface to be used is eth0 (I
> already tried whether using another interface as eth0 would change
> anything - without success :-( ).
>
> The issue I have:
> While Xen pv_ops Kernel 2.6.31.6 runs perfectly on bare metal, it fails
> to get network connectivity when run on top of Xen 3.4.1 (Gentoo default
> installation). Though the system seems to come up correctly at first
> sight and the network interface is available (I can ping it locally),
> access to the network fails (I cannot ping other systems in the network,
> nor vice versa).
>
> What I discovered so far:
> Consulting the boot messages within "dmesg", I discovered that the ACPI
> SCI handler fails to install when run on top of Xen, while this error
> does not happen on bare metal.
>
> With XEN:
> *********
> bio: create slab at 0
> ACPI: SCI (IRQ20) allocation failed
> ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control
> Interrupt handler 20090521 evevent-161
> ACPI: Unable to start the ACPI Interpreter
> ------------[ cut here ]------------
> WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
> Hardware name: System Name
> kobject: '' (cf805ea0): is not initialized, yet kobject_put() is
> being called.
> Modules linked in:
> Pid: 1, comm: swapper Tainted: G W 2.6.31.6 #14
> Call Trace:
> [] warn_slowpath_common+0x60/0x90
> [] warn_slowpath_fmt+0x24/0x27
> [] kobject_put+0x27/0x3c
> [] kmem_cache_destroy+0x105/0x11b
> [] acpi_os_delete_cache+0x8/0xc
> [] acpi_ut_delete_caches+0xd/0x6b
> [] acpi_ut_subsystem_shutdown+0x87/0x90
> [] ? acpi_init+0x0/0x263
> [] acpi_terminate+0x8/0x14
> [] acpi_init+0x194/0x263
> [] ? __class_create+0x44/0x5e
> [] ? fbmem_init+0x0/0x78
> [] ? acpi_init+0x0/0x263
> [] do_one_initcall+0x4c/0x13a
> [] kernel_init+0x12c/0x17d
> [] ? kernel_init+0x0/0x17d
> [] kernel_thread_helper+0x7/0x10
> ---[ end trace 4eaa2a86a8e2da23 ]---
> ------------[ cut here ]------------
> WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
> Hardware name: System Name
> kobject: '' (cf805f60): is not initialized, yet kobject_put() is
> being called.
> Modules linked in:
> Pid: 1, comm: swapper Tainted: G W 2.6.31.6 #14
> Call Trace:
> [] warn_slowpath_common+0x60/0x90
> [] warn_slowpath_fmt+0x24/0x27
> [] kobject_put+0x27/0x3c
> [] kmem_cache_destroy+0x105/0x11b
> [] acpi_os_delete_cache+0x8/0xc
> [] acpi_ut_delete_caches+0x35/0x6b
> [] acpi_ut_subsystem_shutdown+0x87/0x90
> [] ? acpi_init+0x0/0x263
> [] acpi_terminate+0x8/0x14
> [] acpi_init+0x194/0x263
> [] ? __class_create+0x44/0x5e
> [] ? fbmem_init+0x0/0x78
> [] ? acpi_init+0x0/0x263
> [] do_one_initcall+0x4c/0x13a
> [] kernel_init+0x12c/0x17d
> [] ? kernel_init+0x0/0x17d
> [] kernel_thread_helper+0x7/0x10
> ---[ end trace 4eaa2a86a8e2da24 ]---
> sync cpu 0 get result ffffffff max_id 0
> Failed to sync pcpu 0
> xenbus_probe_backend_init bus registered ok
>
>
> Without Xen:
> ************
> bio: create slab at 0
> ACPI: EC: Look up EC in DSDT
> ACPI: Interpreter enabled
> ACPI: (supports S0 S5)
> ACPI: Using IOAPIC for interrupt routing
> ACPI: No dock devices found.
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> pci 0000:00:00.0: reg 10 32bit mmio: [0xf8000000-0xfbffffff]
> pci 0000:00:04.1: reg 20 io port: [0xb800-0xb80f]
> pci 0000:00:04.2: reg 20 io port: [0xb400-0xb41f]
> * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
> * this clock source is slow. Consider trying other clock sources
> pci 0000:00:04.3: quirk: region e400-e43f claimed by PIIX4 ACPI
> pci 0000:00:04.3: quirk: region e800-e80f claimed by PIIX4 SMB
> pci 0000:00:04.3: PIIX4 devres B PIO at 0290-0297
> pci 0000:00:09.0: reg 10 io port: [0xb000-0xb0ff]
> pci 0000:00:09.0: reg 14 32bit mmio: [0xde800000-0xde8000ff]
> pci 0000:00:09.0: reg 30 32bit mmio: [0x000000-0x00ffff]
> pci 0000:00:0a.0: reg 10 io port: [0xa800-0xa8ff]
> pci 0000:00:0a.0: reg 14 32bit mmio: [0xde000000-0xde0000ff]
> pci 0000:00:0a.0: supports D1 D2
> pci 0000:00:0a.0: PME# supported from D1 D2 D3hot
> pci 0000:00:0a.0: PME# disabled
> pci 0000:00:0b.0: reg 10 io port: [0xa400-0xa4ff]
> pci 0000:00:0b.0: reg 14 32bit mmio: [0xdd800000-0xdd8000ff]
> pci 0000:00:0b.0: supports D1 D2
> pci 0000:00:0b.0: PME# supported from D1 D2 D3hot
> pci 0000:00:0b.0: PME# disabled
> pci 0000:01:00.0: reg 10 32bit mmio: [0xe0000000-0xe3ffffff]
> pci 0000:01:00.0: reg 14 32bit mmio: [0xdf800000-0xdf87ffff]
> pci 0000:01:00.0: reg 18 io port: [0xd800-0xd8ff]
> pci 0000:01:00.0: reg 30 32bit mmio: [0xdf7e0000-0xdf7fffff]
> pci 0000:01:00.0: supports D1 D2
> pci 0000:00:01.0: bridge io port: [0xd000-0xdfff]
> pci 0000:00:01.0: bridge 32bit mmio: [0xf4000000-0xf40fffff]
> pci 0000:00:01.0: bridge 32bit mmio pref: [0xdf700000-0xe3ffffff]
> pci_bus 0000:00: on NUMA node 0
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 11 *12 14 15)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 3 *4 5 6 7 9 10 11 12 14 15)
> xenbus_probe_backend_init bus registered ok
>
>
> Corresponding to the error, the /proc/interrupts tables were also
> different:
>
> With XEN:
> *********
>          CPU0     CPU1
>   1:      426        0  xen-pirq-ioapic-edge  i8042
>   3:        0        0  xen-pirq-ioapic-edge  uhci_hcd:usb1
>   4:        2        0  xen-pirq-ioapic-edge  serial
>   8:        2        0  xen-pirq-ioapic-edge  rtc0
>  12:        0        0  xen-pirq-ioapic-edge  eth0
>  14:     4319        0  xen-pirq-ioapic-edge  ide0
>  15:       42        0  xen-pirq-ioapic-edge  ide1
> 411:        0        0  xen-dyn-event         xenbus
> 412:        0      703  xen-dyn-ipi           callfuncsingle1
> 413:        0        0  xen-dyn-virq          debug1
> 414:        0        0  xen-dyn-ipi           callfunc1
> 415:        0    45622  xen-dyn-ipi           resched1
> 416:        0      311  xen-dyn-ipi           spinlock1
> 417:        0   153289  xen-dyn-virq          timer1
> 418:      550        0  xen-dyn-ipi           callfuncsingle0
> 419:        0        0  xen-dyn-virq          debug0
> 420:        0        0  xen-dyn-ipi           callfunc0
> 421:    18071        0  xen-dyn-ipi           resched0
> 422:      661        0  xen-dyn-ipi           spinlock0
> 423:   277476        0  xen-dyn-virq          timer0
> NMI:        0        0  Non-maskable interrupts
> LOC:        0        0  Local timer interrupts
> SPU:        0        0  Spurious interrupts
> CNT:        0        0  Performance counter interrupts
> PND:        0        0  Performance pending work
> RES:    18071    45622  Rescheduling interrupts
> CAL:      550      703  Function call interrupts
> TLB:        0        0  TLB shootdowns
> TRM:        0        0  Thermal event interrupts
> THR:        0        0  Threshold APIC interrupts
> MCE:        0        0  Machine check exceptions
> MCP:      132      132  Machine check polls
> ERR:        0
> MIS:        0
>
>
> Without XEN:
> ************
>          CPU0     CPU1
>   0:       46        0  IO-APIC-edge     timer
>   1:     2567     4239  IO-APIC-edge     i8042
>   6:        3        0  IO-APIC-edge     floppy
>   8:        1        1  IO-APIC-edge     rtc0
>  14:    28604    27089  IO-APIC-edge     ide0
>  15:        0        0  IO-APIC-edge     ide1
>  18:     1942     1978  IO-APIC-fasteoi  eth0
>  20:        0        0  IO-APIC-fasteoi  acpi
> NMI:        0        0  Non-maskable interrupts
> LOC:  1097380  1052641  Local timer interrupts
> SPU:        0        0  Spurious interrupts
> CNT:        0        0  Performance counter interrupts
> PND:        0        0  Performance pending work
> RES:   105211   107135  Rescheduling interrupts
> CAL:       16       20  Function call interrupts
> TLB:     4542     4509  TLB shootdowns
> TRM:        0        0  Thermal event interrupts
> THR:        0        0  Threshold APIC interrupts
> MCE:        0        0  Machine check exceptions
> MCP:      289      289  Machine check polls
> ERR:        0
> MIS:        0
>
>
> Searching the Internet, I ran across different messages (e.g.
> http://www.mail-archive.com/kvm@vger.kernel.org/msg26601.html)
> mentioning that on motherboards with the PIIX4 chipset the SCI interrupt
> is hardwired to IRQ 9. However, on my system it is assigned IRQ 20 on
> bare metal, and fails to be set to IRQ 20 on top of Xen (see the dmesg
> extract above when run on top of Xen -> ACPI: SCI (IRQ20) allocation
> failed).
>
> As I started wondering whether it would work with IRQ 9, and having no
> knowledge of ACPI and interrupt handling in the Kernel, I crudely patched
> the code of /drivers/acpi/osl.c in the following manner:
>
> osl.c:391
> *********
> acpi_status
> acpi_os_install_interrupt_handler(u32 gsi, acpi_osd_handler handler,
>                                   void *context)
> {
>         unsigned int irq;
>
>         acpi_irq_stats_init();
>
>         /*
>          * Ignore the GSI from the core, and use the value in our copy of
>          * the FADT. It may not be the same if an interrupt source
>          * override exists for the SCI.
>          */
>         gsi = acpi_gbl_FADT.sci_interrupt;
>         if (acpi_gsi_to_irq(gsi, &irq) < 0) {
>                 printk(KERN_ERR PREFIX "SCI (ACPI GSI %d) not registered\n",
>                        gsi);
>                 return AE_OK;
>         }
> +       irq = 9;
>         acpi_irq_handler = handler;
>         acpi_irq_context = context;
>         if (request_irq(irq, acpi_irq, IRQF_SHARED, "acpi", acpi_irq)) {
>                 printk(KERN_ERR PREFIX "SCI (IRQ%d) allocation failed\n", irq);
>                 return AE_NOT_ACQUIRED;
>         }
>         acpi_irq_irq = irq;
>
>         return AE_OK;
> }
>
>
> As you can see, I just "overwrote" the IRQ number that the system had
> somehow evaluated with IRQ 9, recompiled the Kernel and discovered(!)
> that networking was now working, even within Xen (btw: it was still
> working on bare metal).
>
> Now I don't know why it is working with SCI mapped to IRQ 20 on bare
> metal while SCI is supposed to be hardwired to IRQ 9, but the fact that
> it works in both cases with IRQ 9 suggests to me there is something
> "wrong" or at least different when pv_ops Kernel 2.6.31.6 is run on top
> of Xen. So someone might have a look at it at some point, because that's
> where my knowledge stops...
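
For anyone who wants to compare the two /proc/interrupts listings above
without reading them by eye, here is a small user-space sketch in C. It
is only an illustration under stated assumptions: find_irq_row and
struct irq_row are names made up for this example (they are not kernel
or Xen APIs), the input is assumed to be raw /proc/interrupts text
without the "> " quote markers, and lines longer than 255 characters
are truncated. It does not explain the failure; it only extracts which
IRQ a handler ended up on and how many interrupts it has seen.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed row of /proc/interrupts-style text (hypothetical type). */
struct irq_row {
    int irq;              /* numeric IRQ, or -1 if the handler was not found */
    unsigned long total;  /* interrupt count summed over all CPU columns */
    char type[64];        /* controller column, e.g. "IO-APIC-fasteoi" */
};

/*
 * Scan `table` for the first numbered row that lists `device` as a handler.
 * Rows that do not start with "<digits>:" (header, NMI:, LOC:, ...) are
 * skipped. Assumes the layout: IRQ, per-CPU counts, type, handler name(s).
 */
struct irq_row find_irq_row(const char *table, const char *device)
{
    struct irq_row row = { -1, 0, "" };
    const char *line = table;

    assert(table != NULL && device != NULL);

    while (*line != '\0') {
        /* Copy one line into a scratch buffer so strtok may modify it. */
        size_t len = strcspn(line, "\n");
        char buf[256];
        size_t n = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
        memcpy(buf, line, n);
        buf[n] = '\0';
        line += len;
        if (*line == '\n')
            line++;

        char *tok = strtok(buf, " \t");
        if (tok == NULL)
            continue;

        /* The first field must look like "<digits>:". */
        char *end;
        long irq = strtol(tok, &end, 10);
        if (end == tok || strcmp(end, ":") != 0)
            continue;

        unsigned long total = 0;
        char type[64] = "";
        int matched = 0;
        while ((tok = strtok(NULL, " \t")) != NULL) {
            if (type[0] == '\0') {
                unsigned long count = strtoul(tok, &end, 10);
                if (end != tok && *end == '\0') {
                    total += count;   /* still in the per-CPU counters */
                    continue;
                }
                snprintf(type, sizeof(type), "%s", tok);
                continue;
            }
            /* Remaining fields are handler names; shared IRQs use commas. */
            size_t tl = strlen(tok);
            if (tl > 0 && tok[tl - 1] == ',')
                tok[tl - 1] = '\0';
            if (strcmp(tok, device) == 0)
                matched = 1;
        }
        if (matched) {
            row.irq = (int)irq;
            row.total = total;
            snprintf(row.type, sizeof(row.type), "%s", type);
            return row;
        }
    }
    return row;
}
```

Fed the eth0 rows quoted above, find_irq_row reports IRQ 12
(xen-pirq-ioapic-edge, 0 interrupts in total) under Xen and IRQ 18
(IO-APIC-fasteoi, 3920 interrupts in total) on bare metal. Asking for
"acpi" finds IRQ 20 on bare metal and nothing (irq == -1) under Xen,
which matches the "SCI (IRQ20) allocation failed" message in dmesg.
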
>
> Thanks & regards,
> Marcial