On 12/23/2009 10:10 AM, Pallipadi, Venkatesh wrote:
>
>
>> -----Original Message-----
>> From: Mark Hounschell [mailto:markh@compro.net]
>> Sent: Wednesday, December 23, 2009 5:03 AM
>> To: Pallipadi, Venkatesh
>> Cc: dmarkh@cfl.rr.com; Linus Torvalds; Alain Knaff; Linux Kernel
>>     Mailing List; fdutils@fdutils.linux.lu; Li, Shaohua; Ingo Molnar
>> Subject: Re: [Fdutils] DMA cache consistency bug introduced in
>>     2.6.28 (Was: Re: Cannot format floppies under kernel 2.6.*?)
>>
>> On 12/22/2009 07:22 PM, Mark Hounschell wrote:
>>> On 12/22/2009 06:37 PM, Pallipadi, Venkatesh wrote:
>>>> On Tue, 2009-12-22 at 09:57 -0800, Mark Hounschell wrote:
>>>>> On 12/22/2009 12:38 PM, Linus Torvalds wrote:
>>>>>>
>>>>>> [ Ingo, Venki and Shaohua added to cc: see the whole thread on lkml for
>>>>>>   details, but Mark is basically chasing down a situation where the floppy
>>>>>>   driver seems to have trouble formatting floppies, and it happened
>>>>>>   between 2.6.27 and .28. The trouble seems to be that a DMA transfer of a
>>>>>>   memory block transfers the wrong value for the first byte of the block.
>>>>>>
>>>>>>   Which should be impossible, but whatever. Some part of the system has a
>>>>>>   cached buffer that isn't flushed.
>>>>>>
>>>>>>   What gets _you_ guys involved is that Mark cannot reproduce the bug if
>>>>>>   HPET is disabled in the BIOS or by using 'nohpet'. He found that out by
>>>>>>   pure luck while bisecting, because some time during his bisect, his
>>>>>>   machine wouldn't even boot with HPET.
>>>>>>
>>>>>>   So the problem is: with HPET enabled, 2.6.27.4 _used_ to work. But
>>>>>>   2.6.28 (and current -git) does not. Any ideas? ]
>>>>>>
>>>>>> On Tue, 22 Dec 2009, Mark Hounschell wrote:
>>>>>>>
>>>>>>> Ok, I may have something that might help.
>>>>>>>
>>>>>>> # git bisect bad
>>>>>>> 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is the first bad commit
>>>>>>> commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0
>>>>>>> Author: venkatesh.pallipadi@intel.com
>>>>>>> Date:   Fri Sep 5 18:02:18 2008 -0700
>>>>>>>
>>>>>>>     x86: HPET_MSI Initialise per-cpu HPET timers
>>>>>>>
>>>>>>>     Initialize a per CPU HPET MSI timer when possible. We retain the HPET
>>>>>>>     timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
>>>>>>>     setup the remaining HPET timers as per CPU MSI based timers. This per CPU
>>>>>>>     timer will eliminate the need for timer broadcasting with IRQ 0 when there
>>>>>>>     is non-functional LAPIC timer across CPU deep C-states.
>>>>>>>
>>>>>>>     If there are more CPUs than number of available timers, CPUs that do not
>>>>>>>     find any timer to use will continue using LAPIC and IRQ 0 broadcast.
>>>>>>>
>>>>>>>     Signed-off-by: Venkatesh Pallipadi
>>>>>>>     Signed-off-by: Shaohua Li
>>>>>>>     Signed-off-by: Ingo Molnar
>>>>>>>
>>>>>>> And of course this was the first commit that I could not boot if I had hpet
>>>>>>> enabled. To get this one to boot (single user mode only) I had to add the
>>>>>>> quiet cmdline option and the following patch to arch/x86/kernel/hpet.c, from
>>>>>>>
>>>>>>> commit 5ceb1a04187553e08c6ab60d30cee7c454ee139a
>>>>>>>
>>>>>>> @@ -445,7 +445,7 @@ static int hpet_setup_irq(struct hpet_dev *dev)
>>>>>>>  {
>>>>>>>
>>>>>>>         if (request_irq(dev->irq, hpet_interrupt_handler,
>>>>>>> -                       IRQF_SHARED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>> +                       IRQF_DISABLED|IRQF_NOBALANCING, dev->name, dev))
>>>>>>>                 return -1;
>>>>>>>
>>>>>>>         disable_irq(dev->irq);
>>>>>>>
>>>>>>> AND add the quiet cmdline option.
>>>>>>
>>>>>> Ok, so we know why HPET didn't boot for you, and that was fixed later (by
>>>>>> that 5ceb1a04). But is this also when the floppy started mis-behaving?
>>>>>>
>>>>>
>>>>> Commit 26afe5f2fbf06ea0765aaa316640c4dd472310c0 is when the floppy stops
>>>>> working and also when I could no longer boot with hpet enabled.
>>>>
>>>>
>>>> I am missing something here. Commit 26afe5f2 is where system does not
>>>> boot with HPET or is it where the floppy stops working when you boot
>>>> with HPET enabled.
>>>>
>>>
>>> As it happens, both happen there. Commit 5ceb1a04 is where it starts
>>> booting _again_ with hpet enabled. So I took that patch (5ceb1a04) and
>>> applied it to (26afe5f2f) to be able to boot with hpet enabled. I had to
>>> use the quiet option to get to a login prompt, but there is where the
>>> floppy format first fails, just as it does in 2.6.28 and up.
>>>
>>>> Can you try "idle=halt" with both .27 and .28 with /proc/interrupts
>>>> output in each case. With that option, we should be using local APIC
>>>> timer and PIT, HPET or HPET with MSI should not really matter. Does it
>>>> still fail with .28 with that option?
>>>>
>>
>> 2.6.28 still fails with that option.
>>
>> 2.6.27.41 /proc/interrupts with idle=halt
>>
>>             CPU0       CPU1       CPU2       CPU3
>>    0:        126          0          0          1   IO-APIC-edge    timer
>>    1:          0          0          1        157   IO-APIC-edge    i8042
>>    3:          0          0          0          6   IO-APIC-edge
>>    4:          0          0          0          6   IO-APIC-edge
>>    6:          0          0          0          4   IO-APIC-edge    floppy
>>    8:          0          0          0          1   IO-APIC-edge    rtc0
>>    9:          0          0          0          0   IO-APIC-fasteoi acpi
>>   12:          0          0          1        128   IO-APIC-edge    i8042
>>   14:          0          0         34       4457   IO-APIC-edge    pata_atiixp
>>   15:          0          0          4        480   IO-APIC-edge    pata_atiixp
>>   16:          0          0          0        397   IO-APIC-fasteoi aic79xx, ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
>>   17:          0          0          0          2   IO-APIC-fasteoi ehci_hcd:usb1
>>   18:          0          0          0          0   IO-APIC-fasteoi ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>>   19:          0          0          0        142   IO-APIC-fasteoi aic7xxx, ehci_hcd:usb2, ttySLG0, eth1
>>   22:          0          0          4       1154   IO-APIC-fasteoi ahci
>>  219:          0          0          3         63   PCI-MSI-edge    eth0
>>  NMI:          0          0          0          0   Non-maskable interrupts
>>  LOC:      91539      91964      92525      91181   Local timer interrupts
>>  RES:       2888       3873       2434       2721   Rescheduling interrupts
>>  CAL:        240        245        247         84   function call interrupts
>>  TLB:        768        628        526        512   TLB shootdowns
>>  SPU:          0          0          0          0   Spurious interrupts
>>  ERR:          0
>>  MIS:          0
>>
>> 2.6.28 /proc/interrupts with idle=halt
>>
>>             CPU0       CPU1       CPU2       CPU3
>>    0:        126          0          2          0   IO-APIC-edge    timer
>>    1:          0          0        192          0   IO-APIC-edge    i8042
>>    3:          0          0          6          0   IO-APIC-edge
>>    4:          0          0          6          0   IO-APIC-edge
>>    6:          0          0          4          0   IO-APIC-edge    floppy
>>    8:          0          0          1          0   IO-APIC-edge    rtc0
>>    9:          0          0          0          0   IO-APIC-fasteoi acpi
>>   12:          0          0        128          1   IO-APIC-edge    i8042
>>   14:          0          1     147114        396   IO-APIC-edge    pata_atiixp
>>   15:          0          0        646          2   IO-APIC-edge    pata_atiixp
>>   16:          0          0        396          0   IO-APIC-fasteoi aic79xx, ohci_hcd:usb2, ohci_hcd:usb4, HDA Intel
>>   17:          0          0          0          0   IO-APIC-fasteoi ehci_hcd:usb1
>>   18:          0          0          0          0   IO-APIC-fasteoi ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
>>   19:          0          0        362          1   IO-APIC-fasteoi aic7xxx, ehci_hcd:usb3, ttySLG0, eth1
>>   22:          0          0        874          1   IO-APIC-fasteoi ahci
>> 1274:          0          0        193          4   PCI-MSI-edge    eth0
>> 1279:     513207          0          0          0   HPET_MSI-edge   hpet2
>>  NMI:          0          0          0          0   Non-maskable interrupts
>>  LOC:        268     513395     513138     522088   Local timer interrupts
>>  RES:       3262       3679       2573       3746   Rescheduling interrupts
>>  CAL:        131        166         57        147   Function call interrupts
>>  TLB:        680        438        450        639   TLB shootdowns
>>  SPU:          0          0          0          0   Spurious interrupts
>>  ERR:          0
>>  MIS:          0
>>
>
> Hmm. Looks like hpet2 is still getting used instead of local APIC timer
> in .28 case.
>
> I was expecting some low number in hpet2 and local timer on all CPU to
> be around the same value. Above shows CPU 0 is depending on hpet2 for
> some reason even with idle=halt. Can you send the output of below two in
> case of .28
> /proc/timer_list

Attached.

> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

I have no /sys/devices/system/cpu/cpu0/cpuidle on this machine. Maybe because of

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set

Would it be OK if when you ask for 2.6.28 info, I use a 2.6.32.2 kernel?
That kernel also fails fdformat with hpet enabled on these machines.

Thanks
Mark
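
For anyone comparing /proc/interrupts dumps like the two quoted above, the
per-CPU hpet2 versus LOC comparison can be pulled out with a small script.
This is only a rough sketch, assuming Python 3; the names parse_interrupts
and SAMPLE are made up for illustration, and the sample rows are copied from
the 2.6.28 output rather than read from a live system.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: [count per CPU]}.

    Hypothetical helper: it assumes the first non-empty line is the
    CPU0 CPU1 ... header and each later line starts with "label:".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # number of CPU columns in the header
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the interrupt type / description text
            counts.append(int(field))
        table[label.strip()] = counts
    return table


# Rows copied from the 2.6.28 idle=halt output in this thread.
SAMPLE = """\
            CPU0       CPU1       CPU2       CPU3
 1279:     513207          0          0          0   HPET_MSI-edge   hpet2
  LOC:        268     513395     513138     522088   Local timer interrupts
  ERR:          0
"""

table = parse_interrupts(SAMPLE)
for cpu, (hpet, loc) in enumerate(zip(table["1279"], table["LOC"])):
    print(f"CPU{cpu}: hpet2={hpet} LOC={loc}")
```

On a live machine the same function could be fed the contents of
/proc/interrupts; the IRQ number for hpet2 (1279 here) will differ between
kernels and boots, so it would need to be looked up by name first.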