netdev.vger.kernel.org archive mirror
* Re: HPET regression in 2.6.26 versus 2.6.25 -- connection between HPET and lockups found
@ 2008-08-19 12:49 David Witbrodt
  2008-08-19 13:08 ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: David Witbrodt @ 2008-08-19 12:49 UTC
  To: Ingo Molnar
  Cc: Yinghai Lu, linux-kernel, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, H. Peter Anvin, netdev



> Just to make sure: on a working kernel, do you get the HPET messages? 
> I.e. does the hpet truly work in that case?

On the "fileserver", where 2.6.25 works but 2.6.26 locks up, the HPET
_does_ work on a working kernel:

$ uname -r
2.6.26.revert1

$ dmesg | grep -i hpet
ACPI: HPET 77FE80C0, 0038 (r1 RS690  AWRDACPI 42302E31 AWRD       98)
ACPI: HPET id: 0x10b9a201 base: 0xfed00000
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
hpet0: 4 32-bit timers, 14318180 Hz
hpet_resources: 0xfed00000 is busy


What I didn't realize is that the "desktop" machine, where 2.6.26 has
always "worked", does NOT have a working HPET after all, even though 
I have enabled all HPET options in .config:

$ uname -r
2.6.26.080801.desktop.uvesafb

$ dmesg | grep -i hpet
$ 


This means I misunderstood my situation on "desktop".  I believed HPET
was working on all of my machines, but now I am not certain that it 
ever worked on "desktop" since I built it (May 2007).  The question 
never arose before, and because I enabled the HPET option in .config,
I just assumed that HPET was working.  (Duh...)  I failed to look into
this until now.

At any rate, my subject line is still accurate:  there _is_ an HPET
regression on "fileserver" (and "webserver"), since HPET worked with
2.6.25 kernels but causes lockups with 2.6.2[67] kernels.

(I don't know what is going on with "desktop":  does the motherboard
lack HPET, or does the Linux kernel not support the HPET hardware on 
the motherboard?)

BTW:  the 'dmesg' output above is the same on "desktop" with 2.6.25
and 2.6.26 -- I just checked to be sure.  For "fileserver", I checked
an old 2.6.25 kernel just now, and the output is identical.


Another experiment:  I just tried this...

static __init int hpet_insert_resource(void)
{
-     if (!hpet_res)
+     /* if (!hpet_res) */
         return 1;

-    return insert_resource(&iomem_resource, hpet_res);
+    /* return insert_resource(&iomem_resource, hpet_res); */
}

... and the lockup still occurs.  So the memory is allocated, but
the resource info is never inserted into the tree.  It would seem
that whether or not the dynamically allocated hpet_res is being
damaged has no bearing on the lockups.  Looks like I was barking up
the wrong tree....


Dave W.

* Re: HPET regression in 2.6.26 versus 2.6.25 -- connection between HPET and lockups found
@ 2008-08-19  3:51 David Witbrodt
  2008-08-19  9:23 ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: David Witbrodt @ 2008-08-19  3:51 UTC
  To: Ingo Molnar
  Cc: Yinghai Lu, linux-kernel, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, H. Peter Anvin, netdev



> > Does this connection between HPET and insert_resource() look 
> > meaningful, or is this a coincidence?
> 
> it is definitely the angle i'd suspect the most.
> 
> perhaps we stomp over some piece of memory that is "available RAM" 
> according to your BIOS, but in reality is used by something. With 
> previous kernels we got lucky and have put a data structure there which 
> kept your hpet still working. (a bit far-fetched i think, but the best 
> theory i could come up with)

Working... or NOT working.  Tonight I noticed something strange about 
my desktop machine, which _works_ with 2.6.2[67]:  even though it
shares the same HPET .config settings as the 2 problem machines,
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_HPET=y
CONFIG_HPET_RTC_IRQ=y
CONFIG_HPET_MMAP=y

apparently no HPET device gets configured by the kernel:

$ dmesg | grep -i hpet
$


In contrast, I get this on the 2 "bad" machines if using the 2.6.26
kernel with the 2 problem commits reverted:

$ dmesg | grep -i hpet
ACPI: HPET 77FE80C0, 0038 (r1 RS690  AWRDACPI 42302E31 AWRD       98)
ACPI: HPET id: 0x10b9a201 base: 0xfed00000
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
hpet0: 4 32-bit timers, 14318180 Hz
hpet_resources: 0xfed00000 is busy


That makes it look like my third machine might have locked up with 
2.6.2[67] as well, except that some problem configuring HPET actually prevents
it from locking up.  I wonder how widespread this badness really is 
after all?!  Are we not seeing more reports of lockups simply because 
people are getting lucky on AMD dual core machines, and having their
HPET _fail_ instead of their kernel locking up?


> the address you printed out (0xffff88000100f000), does look _somewhat_ 
> suspicious. It corresponds to the physical address of 0x100f000. That is 
> _just_ above the 16MB boundary. It should not be relevant normally - but 
> it's still somewhat suspicious.

I guess I was confused by the upper 32 bits -- I take it that
these pointers are virtualized, and the upper half is some sort of
descriptor?  If that pointer were in a flat memory model, then it would be
pointing _way_ past the end of my 2 GB of RAM, which would end around
0x0000000080000000.
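Ingo's reading of 0xffff88000100f000 as physical 0x100f000 amounts to subtracting a fixed base: on x86_64 the kernel maps all of physical RAM linearly into its own virtual address space (the direct mapping), so the high bits are an offset rather than a descriptor. A minimal sketch, assuming a direct-mapping base of 0xffff880000000000 (the value implied by that reading) and using hypothetical helper names; inside the kernel, __pa() performs this translation for direct-mapped addresses:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the x86_64 direct mapping of physical RAM, inferred
 * from reading 0xffff88000100f000 as physical 0x100f000. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL
#define SIXTEEN_MB      0x1000000ULL

/* Hypothetical helper: translate a direct-mapped kernel virtual address
 * into the physical address it refers to. */
uint64_t direct_map_to_phys(uint64_t virt)
{
    assert(virt >= DIRECT_MAP_BASE);  /* only valid inside the direct map */
    return virt - DIRECT_MAP_BASE;
}

/* Hypothetical helper: how far above the 16 MB boundary a physical
 * address lies (0 if it is at or below the boundary). */
uint64_t bytes_above_16mb(uint64_t phys)
{
    return phys > SIXTEEN_MB ? phys - SIXTEEN_MB : 0;
}
```

With these, direct_map_to_phys(0xffff88000100f000) gives 0x100f000, which is 0xf000 (60 KB) above the 16 MB mark and far below the end of System RAM at 0x77fdffff, so the pointer is not past the end of RAM at all.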

I am not used to looking at raw pointer addresses, just pointer variable 
names.  I think I was recalling the /proc/iomem data that Yinghai asked 
for, but this stuff is just offsets stripped of descriptors, huh?:

$ cat /proc/iomem
00000000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000f0000-000fffff : reserved
00100000-77fdffff : System RAM
  00200000-0056ca21 : Kernel code
  0056ca22-006ce3d7 : Kernel data
  00753000-0079a3c7 : Kernel bss
77fe0000-77fe2fff : ACPI Non-volatile Storage
77fe3000-77feffff : ACPI Tables
77ff0000-77ffffff : reserved
78000000-7fffffff : pnp 00:0d
d8000000-dfffffff : PCI Bus #01
  d8000000-dfffffff : 0000:01:05.0
    d8000000-d8ffffff : uvesafb
e0000000-efffffff : PCI MMCONFIG 0
  e0000000-efffffff : reserved
fdc00000-fdcfffff : PCI Bus #02
  fdcff000-fdcff0ff : 0000:02:05.0
    fdcff000-fdcff0ff : r8169
fdd00000-fdefffff : PCI Bus #01
  fdd00000-fddfffff : 0000:01:05.0
  fdee0000-fdeeffff : 0000:01:05.0
  fdefc000-fdefffff : 0000:01:05.2
    fdefc000-fdefffff : ICH HD audio
fdf00000-fdffffff : PCI Bus #02
fe020000-fe023fff : 0000:00:14.2
  fe020000-fe023fff : ICH HD audio
fe029000-fe0290ff : 0000:00:13.5
  fe029000-fe0290ff : ehci_hcd
fe02a000-fe02afff : 0000:00:13.4
  fe02a000-fe02afff : ohci_hcd
fe02b000-fe02bfff : 0000:00:13.3
  fe02b000-fe02bfff : ohci_hcd
fe02c000-fe02cfff : 0000:00:13.2
  fe02c000-fe02cfff : ohci_hcd
fe02d000-fe02dfff : 0000:00:13.1
  fe02d000-fe02dfff : ohci_hcd
fe02e000-fe02efff : 0000:00:13.0
  fe02e000-fe02efff : ohci_hcd
fe02f000-fe02f3ff : 0000:00:12.0
  fe02f000-fe02f3ff : ahci
fec00000-fec00fff : IOAPIC 0
  fec00000-fec00fff : pnp 00:0d
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : 0000:00:14.0
fee00000-fee00fff : Local APIC
fff80000-fffeffff : pnp 00:0d
ffff0000-ffffffff : pnp 00:0d
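A physical address can be matched against a listing like this one with a simple range search. A minimal sketch with several of the top-level entries above hard-coded (the table and the iomem_region_name() helper are illustrative, not kernel code); it places physical 0x100f000 in ordinary System RAM, past the end of Kernel bss at 0x79a3c7, and 0xfed00000 in the HPET 0 window:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One top-level line of /proc/iomem: an inclusive [start, end] range. */
struct iomem_region {
    uint64_t start, end;
    const char *name;
};

/* A few top-level entries copied from the /proc/iomem listing above. */
static const struct iomem_region regions[] = {
    { 0x00000000, 0x0009f3ff, "System RAM" },
    { 0x00100000, 0x77fdffff, "System RAM" },
    { 0x77fe0000, 0x77fe2fff, "ACPI Non-volatile Storage" },
    { 0x77fe3000, 0x77feffff, "ACPI Tables" },
    { 0xfec00000, 0xfec00fff, "IOAPIC 0" },
    { 0xfed00000, 0xfed003ff, "HPET 0" },
    { 0xfee00000, 0xfee00fff, "Local APIC" },
};

/* Return the name of the listed region containing phys, or NULL if
 * none of the hard-coded entries covers it. */
const char *iomem_region_name(uint64_t phys)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        assert(regions[i].start <= regions[i].end);
        if (phys >= regions[i].start && phys <= regions[i].end)
            return regions[i].name;
    }
    return NULL;
}
```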


> To test this theory, could you tweak this:
> 
>   alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
> 
> to be:
> 
>   alloc_bootmem_low(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
> 
> this will allocate the hpet resource descriptor in lower RAM.

Results:  strange... still locked up, and more or less the same output,
especially the same address!:

Data from arch/x86/kernel/acpi/boot.c:
  hpet_res = ffff88000100f000    requested size: 65
  sequence = 0    insert_resource() returned:  0
  broken_bios: 0


Here is a section of 'git diff arch/x86/kernel/acpi/boot.c' to
verify that I _did_ make the change:

===== BEGIN DIFF =============
@@ -701,13 +711,16 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
      * the resource tree during the lateinit timeframe.
      */
 #define HPET_RESOURCE_NAME_SIZE 9
-    hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
+    hpet_res = alloc_bootmem_low (sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
+    dw_hpet_res = hpet_res;
+    dw_req_size = sizeof (*hpet_res) + HPET_RESOURCE_NAME_SIZE;
 
     hpet_res->name = (void *)&hpet_res[1];
     hpet_res->flags = IORESOURCE_MEM;
     snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE, "HPET %u",
          hpet_tbl->sequence);
===== END DIFF =============

It's like the change to alloc_bootmem_low made no difference at all!

The Aug. 12 messages I saw about alloc_bootmem() had to do with alignment
issues on 1 GB boundaries on x86_64 NUMA machines.  I certainly do have
x86_64 NUMA machines, but the behavior above seems to have nothing to do
with alignment issues.


> Another idea: could you increase HPET_RESOURCE_NAME_SIZE from 9 to 
> something larger (via the patch below)? Maybe the bug is that this 
> overflows:
> 
>         snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE, "HPET %u",
>                  hpet_tbl->sequence);
> 
> and corrupts the memory next to the hpet resource descriptor.

I noticed the potential for sequence to overflow the 9-byte buffer
right away.  I got my hopes up... until I looked in include/acpi/actbl1.h:

struct acpi_table_hpet {
        struct acpi_table_header  header;
        u32  id;
        struct acpi_generic_address  address;
        u8  sequence;
        u16  minimum_tick;
        u8  flags;
};

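The original programmer set HPET_RESOURCE_NAME_SIZE to 9 because the
combined length of "HPET " (5 characters) and a u8 printed in decimal
(at most 3 digits, "255") is guaranteed to be <= 8 characters, leaving
room for the terminating NUL.  I have applied the change, nevertheless: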
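That bound can be checked mechanically. A minimal sketch (the hpet_name_len() helper is hypothetical) that formats the name as acpi_parse_hpet() does, using the largest possible u8 sequence number; snprintf() returns the length it wanted to write and never stores more than the given size, terminator included:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define HPET_RESOURCE_NAME_SIZE 9

/* Format the HPET resource name the way acpi_parse_hpet() does and
 * return the length snprintf() wanted to write (excluding the NUL). */
int hpet_name_len(uint8_t sequence, char *buf, size_t size)
{
    int n = snprintf(buf, size, "HPET %u", (unsigned)sequence);
    assert(n >= 0);  /* a negative return would mean an encoding error */
    return n;
}
```

For sequence 255 the result is 8, so the 9-byte buffer fits exactly; even if a longer name were possible, snprintf() would truncate it rather than write past the buffer.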


> @@ -700,7 +700,7 @@ static int __init acpi_parse_hpet(struct acpi_table_header 
> *table)
>      * Allocate and initialize the HPET firmware resource for adding into
>      * the resource tree during the lateinit timeframe.
>      */
> -#define HPET_RESOURCE_NAME_SIZE 9
> +#define HPET_RESOURCE_NAME_SIZE 14
>     hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

Results:  locked up

Data from arch/x86/kernel/acpi/boot.c:
  hpet_res = ffff88000100f000    requested size: 70
  sequence = 0    insert_resource() returned:  0
  broken_bios: 0


> Also, you could try to increase the bootmem allocation drastically, by 
> say 16*1024 bytes, via:
> 
>     hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE + 
> 16*1024);
>         hpet_res = (void *)hpet_res + 15*1024;
> 
> this will pad the memory at ~16MB and not use it for any resource. 
> Arguably a really weird hack, but i'm running out of ideas ...

I tried this:

-    hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
+    hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE + 16*1024);
+    hpet_res = (void*) hpet_res + 1024;

Results:  locked up

Data from arch/x86/kernel/acpi/boot.c:
  hpet_res = ffff88000100f400    requested size: 70
  sequence = 0    insert_resource() returned:  0
  broken_bios: 0

It looks like this resource does not get mangled, but maybe others are.
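Ingo's padding hack relies on GCC's extension that treats arithmetic on void * as byte arithmetic; portable C spells the same offset through a char *. A minimal userspace sketch, with malloc() standing in for alloc_bootmem() and a hypothetical alloc_padded() helper, showing why the +1024 skip used above moved the printed address from ...f000 to ...f400 (1024 = 0x400):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes plus `pad` bytes of padding and return a pointer
 * `skip` bytes into the block; *base receives the start for free(). */
void *alloc_padded(size_t size, size_t pad, size_t skip, void **base)
{
    assert(skip <= pad);          /* the object must still fit in the block */
    char *p = malloc(size + pad);
    *base = p;
    return p ? p + skip : NULL;   /* char * arithmetic: offsets are in bytes */
}
```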

In a weekend experiment (for which I didn't post results), I walked the
iomem_resource tree recursively -- struggling to get all of the output to
fit on one 80x25 screen.  Everything there seemed to be intact, with the
addresses matching the output of 'cat /proc/iomem' on a working kernel...
except (naturally) for some missing resources, because the kernel locks up
before getting to them.
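A recursive walk like that is a depth-first traversal: print a node, recurse into its children one level deeper, then move on to its siblings, which matches the indented layout of /proc/iomem. A minimal userspace sketch with a hypothetical struct res (only the start, end, name, sibling and child fields of the kernel's struct resource are mirrored) and a hand-built fragment of the tree:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down resource node. */
struct res {
    unsigned long long start, end;   /* inclusive range */
    const char *name;
    struct res *sibling, *child;
};

/* Depth-first walk: print each node (if out is non-NULL), then its
 * children indented one level deeper, then the following siblings.
 * Returns the number of nodes visited. */
int walk(const struct res *r, int depth, FILE *out)
{
    int count = 0;
    for (; r; r = r->sibling) {
        assert(r->start <= r->end);
        if (out)
            fprintf(out, "%*s%08llx-%08llx : %s\n", 2 * depth, "",
                    r->start, r->end, r->name);
        count += 1 + walk(r->child, depth + 1, out);
    }
    return count;
}
```

Building System RAM with Kernel code, data and bss as children and HPET 0 as a sibling, walk(&ram, 0, stdout) prints five lines with the kernel segments indented by two spaces and returns 5.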

But what does any of this have to do with the fact that the lockup occurs
in synchronize_rcu()?????  Madness... MADNESS!!!!!


[Old issue]  No one responded when I asked for some help with 'git' to
move my reverts up from "v2.6.26" to the HEAD of origin/master (or
tip/master).  Did you see that question, and do you have any advice?


Thanks Ingo,
Dave W.

* Re: HPET regression in 2.6.26 versus 2.6.25 -- connection between HPET and lockups found
@ 2008-08-19  0:34 David Witbrodt
  2008-08-19  1:14 ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: David Witbrodt @ 2008-08-19  0:34 UTC
  To: Yinghai Lu
  Cc: linux-kernel, Ingo Molnar, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, H. Peter Anvin, netdev


As part of my experiments to determine the root cause of my lockups,
I was searching through the kernel sources trying to discover any
connection between the changes in the commits introducing the lockups
(3def3d6d... and 1e934dda...) and the fact that "hpet=disable" 
alleviates the lockups.

I finally discovered something that looks promising!


Both of those commits introduce changes involving insert_resource(),
and I found the function hpet_insert_resource() in
arch/x86/kernel/acpi/boot.c that also uses insert_resource():

static __init int hpet_insert_resource(void)
{
        if (!hpet_res)
                return 1;

        return insert_resource(&iomem_resource, hpet_res);
}


The effect of "hpet=disable" is to prevent the hpet_res pointer,

    static struct __initdata resource *hpet_res;

from ever being pointed at allocated memory, so it stays NULL and
hpet_insert_resource() returns 1 without inserting an HPET resource.
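To make insert_resource() a little more concrete: it places a new range into the resource tree, descending into any existing entry that fully contains it and returning -EBUSY on a conflicting overlap. The sketch below is a simplified userspace model under those assumptions, with hypothetical names (struct res, model_insert()); unlike the kernel function it takes no lock and does not re-parent existing entries that fall inside the new range:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down resource node. */
struct res {
    unsigned long long start, end;   /* inclusive range */
    const char *name;
    struct res *parent, *sibling, *child;
};

/* Simplified model of inserting `new` below `parent`: descend into a child
 * that fully contains it, otherwise link it into the address-sorted child
 * list. Returns 0 on success, -EBUSY on a conflicting overlap. */
int model_insert(struct res *parent, struct res *new)
{
    assert(new->start <= new->end);
    if (new->start < parent->start || new->end > parent->end)
        return -EBUSY;                     /* outside the parent's range */

    struct res **link = &parent->child;
    for (struct res *c = *link; c; link = &c->sibling, c = *link) {
        if (c->end < new->start)
            continue;                      /* c lies entirely below new */
        if (c->start > new->end)
            break;                         /* c lies entirely above: insert before it */
        if (c->start <= new->start && new->end <= c->end)
            return model_insert(c, new);   /* fully contained: nest inside c */
        return -EBUSY;                     /* partial overlap, or c inside new */
    }
    new->parent = parent;
    new->sibling = *link;
    *link = new;
    return 0;
}
```

Inserting IOAPIC 0, Local APIC and then HPET 0 under a root spanning 0-0xffffffff links HPET 0 between the other two and returns 0, consistent with the "insert_resource() returned: 0" printouts.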

When not using "hpet=disable", the resource that hpet_res points to
(describing the HPET's MMIO range) is added to the iomem_resource tree.
The code that obtains the memory for hpet_res is in the same file, in
the lines immediately preceding:

static int __init acpi_parse_hpet(struct acpi_table_header *table)
{
        struct acpi_table_hpet *hpet_tbl;

...
#define HPET_RESOURCE_NAME_SIZE 9
        hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

...
        return 0;
}


Trying to discover if something was going haywire in this part of the code,
I tried to capture some data that I could save until just before the kernel
locks up, so that I could printk() it there without it scrolling off the
top of the screen:

===== BEGIN DIFF ==========
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9d3528c..c4670a6 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -644,6 +644,11 @@ static int __init acpi_parse_sbf(struct acpi_table_header *table)
 
 static struct __initdata resource *hpet_res;
 
+extern void *dw_hpet_res;
+extern int dw_broken_bios;
+extern unsigned dw_seq;
+extern unsigned dw_req_size;
+
 static int __init acpi_parse_hpet(struct acpi_table_header *table)
 {
     struct acpi_table_hpet *hpet_tbl;
@@ -672,6 +677,9 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
                hpet_tbl->id, hpet_address);
         return 0;
     }
+
+    dw_broken_bios = 0;
+
 #ifdef CONFIG_X86_64
     /*
      * Some even more broken BIOSes advertise HPET at
@@ -679,6 +687,8 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
      * some noise:
      */
     if (hpet_address == 0xfed0000000000000UL) {
+            dw_broken_bios = 1;
+
         if (!hpet_force_user) {
             printk(KERN_WARNING PREFIX "HPET id: %#x "
                    "base: 0xfed0000000000000 is bogus\n "
@@ -702,12 +712,15 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
      */
 #define HPET_RESOURCE_NAME_SIZE 9
     hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);
+    dw_hpet_res = hpet_res;
+    dw_req_size = sizeof (*hpet_res) + HPET_RESOURCE_NAME_SIZE;
 
     hpet_res->name = (void *)&hpet_res[1];
     hpet_res->flags = IORESOURCE_MEM;
     snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE, "HPET %u",
          hpet_tbl->sequence);
 
+    dw_seq = hpet_tbl->sequence;
     hpet_res->start = hpet_address;
     hpet_res->end = hpet_address + (1 * 1024) - 1;
 
@@ -718,12 +731,19 @@ static int __init acpi_parse_hpet(struct acpi_table_header *table)
  * hpet_insert_resource inserts the HPET resources used into the resource
  * tree.
  */
+extern int dw_ir_retval;
+
 static __init int hpet_insert_resource(void)
 {
+        int retval;
+
     if (!hpet_res)
         return 1;
 
-    return insert_resource(&iomem_resource, hpet_res);
+    retval = insert_resource(&iomem_resource, hpet_res);
+    dw_ir_retval = retval;
+
+    return retval;
 }
 
 late_initcall(hpet_insert_resource);
diff --git a/net/core/dev.c b/net/core/dev.c
index 600bb23..fe27b94 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4304,10 +4304,21 @@ void free_netdev(struct net_device *dev)
     put_device(&dev->dev);
 }
 
+void *dw_hpet_res;
+int dw_broken_bios;
+unsigned dw_seq;
+int dw_ir_retval;
+unsigned dw_req_size;
+
 /* Synchronize with packet receive processing. */
 void synchronize_net(void)
 {
     might_sleep();
+
+    printk ("Data from arch/x86/kernel/acpi/boot.c:\n");
+    printk ("  hpet_res = %p    requested size: %u\n", dw_hpet_res, dw_req_size);
+    printk ("  sequence = %u    insert_resource() returned:  %d\n", dw_seq, dw_ir_retval);
+        printk ("  broken_bios: %d\n", dw_broken_bios);
     synchronize_rcu();
 }
===== END DIFF ==========


The output I get when the kernel locks up looks perfectly OK, except
maybe for the address of hpet_res (which I am not knowledgeable enough
to judge):

Data from arch/x86/kernel/acpi/boot.c:
  hpet_res = ffff88000100f000    broken_bios: 0
  sequence = 0    insert_resource() returned: 0


I see some recent (Aug. 2008) discussion of alloc_bootmem() being 
broken, so maybe that is related to my problem.

Does this connection between HPET and insert_resource() look meaningful,
or is this a coincidence?


Thanks,
Dave W.

