public inbox for linux-kernel@vger.kernel.org
* HPET regression in 2.6.26 versus 2.6.25
@ 2008-08-04 23:57 David Witbrodt
  2008-08-05 13:58 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: David Witbrodt @ 2008-08-04 23:57 UTC (permalink / raw)
  To: linux-kernel

Hello,

[Please CC me if you reply, for I am not subscribed to LKML.]

This is my first time posting to LKML.

I am a Debian user.  The sources for 2.6.26 recently became available
in the Debian unstable repositories.  Trying them out by building
custom kernels (think 'make oldconfig'), I found that one machine 
worked while another froze early in boot.  No oops, no error msg of
any kind, just a hard freeze without even Magic SysRq working!

I suspected a dumb config error on my part, but found that the Debian
stock kernel exhibited the same problem.  So I filed a bug report in
the Debian BTS:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=493479

There is much info about my hardware and configs there, but I can
repost them here if that is helpful.  The machine that works with
2.6.26 has a Gigabyte GA-M59SLI-S5 mboard; the broken machine has an
ECS AMD690GM-M2 mboard.

After much experimenting with various configs and rebuilds, I was
finally able to discover that a kernel boot parameter,
"hpet=disabled", allowed me to boot on the troublesome machine.  
Both custom and Debian stock kernels of version 2.6.25 (most recently
based on 2.6.25.10) work fine on this machine, no problem with HPET.
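Note that the option was spelled "hpet=disabled", with a trailing "d", and it still took effect. One plausible explanation is that the "hpet=" handler compares only a prefix of its value against "disable". The user-space sketch below illustrates that idea. The prefix-matching behaviour is an assumption, and `parse_hpet_option` is a hypothetical name, not the kernel's actual handler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of an "hpet=" boot-option handler.
 * ASSUMPTION: the value is matched by prefix against "disable", which
 * would explain why "hpet=disabled" is honored as well as "hpet=disable". */
static int parse_hpet_option(const char *value)
{
    int disable_hpet = 0;

    if (value != NULL && strncmp(value, "disable", strlen("disable")) == 0)
        disable_hpet = 1;   /* matches "disable", "disabled", ... */

    return disable_hpet;
}
```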

A member of the Debian kernel team (Bastian Blank) tried to help, but
ended up suggesting bisecting using 'git'.  I am not (yet) a developer
so I was not really thinking of getting that deeply involved, but I
spent so much time trying to track this problem on Saturday night and
all day Sunday, that I decided to give it a try!

Starting with Linus' instructions here,
  http://lkml.org/lkml/2007/7/10/248

I ran: 
  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

and:
  git checkout v2.6.25

I built a kernel on the ECS machine and it worked (as expected), so I ran:
  git bisect good

then:
  git checkout v2.6.26-rc4

hoping maybe to save some iterations by not starting with the 2.6.26 release.
This 2.6.26-rc4 kernel froze early in boot, so I ran:
  git bisect bad

Here is a summary of my first git bisecting experiment:
======================================================

Iteration  ID                                        status
---------  ----------                                ------
1          2.6.25                                    good
2          2.6.26-rc4                                bad
3          10c993a6b5418cb1026775765ba4c70ffb70853d  bad
4          334d094504c2fe1c44211ecb49146ae6bca8c321  bad
5          eddeb0e2d863e3941d8768e70cb50c6120e61fa0  bad
6          77ad386e596c6b0930cc2e09e3cce485e3ee7f72  bad
7          ede1389f8ab4f3a1343e567133fa9720a054a3aa  bad
8          c048fdfe6178e082be918d4062c86d9764979112  bad
9          f73920cd63d316008738427a0df2caab6cc88ad7  bad
10         04aaa7ba096c707a8df337b29303f1a5a65f0462  good
11         8fa6878ffc6366f490e99a1ab31127fb599657c9  good
12         1180e01de50c0c7683c6648251f32957bc2d7850  good
13         1e934dda0c77c8ad13fdda02074f2cfcea118a56  bad
14         322850af8d93735f67b8ebf84bb1350639be3f34  good
15         3def3d6ddf43dbe20c00c3cbc38dfacc8586998f  bad
16         700efc1b9f6afe34caae231b87d129ad8ffb559f  good

First commit causing failure:

commit 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f
Author: Yinghai Lu <Yinghai.Lu@Sun.COM>
Date:   Fri Feb 22 17:07:16 2008 -0800

    x86: clean up e820_reserve_resources on 64-bit
    
    e820_resource_resources could use insert_resource instead of request_resource
    also move code_resource, data_resource, bss_resource, and crashk_res
    out of e820_reserve_resources.
    
    Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
======================================================
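Between the known-good v2.6.25 and the known-bad v2.6.26-rc4, `git bisect` essentially performs a binary search for the first bad commit. The sketch below simplifies the problem in two ways. It treats the history as a linear array of commits, whereas real git history is a DAG and git chooses midpoints with that in mind. It also assumes the test result is monotone: all good, then all bad. `first_bad_commit` and `bad_from_threshold` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Test predicate: true if the kernel built at commit index i freezes.
 * In a real bisection, each call is one build-and-boot cycle. */
typedef bool (*commit_test_fn)(size_t i, void *ctx);

/* Return the index of the first bad commit in [0, n), n >= 2, assuming
 * commit 0 is known good, commit n-1 is known bad, and results are
 * monotone. *tests_run receives the number of predicate calls. */
static size_t first_bad_commit(size_t n, commit_test_fn is_bad, void *ctx,
                               unsigned *tests_run)
{
    size_t good = 0;        /* invariant: commit 'good' tested good */
    size_t bad = n - 1;     /* invariant: commit 'bad' tested bad */

    *tests_run = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*tests_run;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example predicate for experiments: commits at or after *ctx are bad. */
static bool bad_from_threshold(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

Because each predicate call stands for one build-and-boot cycle, the number of kernels to test grows only logarithmically with the number of commits in the range.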

So, it seems that this commit made a change that works on some
(most?) systems, like my Gigabyte mboard machine, but causes
others, like my ECS mboard machine, to freeze early in boot
unless HPET is disabled.
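The commit message says e820_reserve_resources switched from request_resource() to insert_resource(). Roughly speaking, request_resource() refuses a new range that overlaps any existing sibling range. insert_resource() also accepts a range that nests inside an existing one, or that fully encloses existing ones, and re-parents them as needed; only a partial overlap is refused. The toy classifier below models just those overlap cases. It is a simplification for illustration, not the kernel's resource code, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, loosely modeled on the start/end fields of
 * the kernel's struct resource (nothing else is modeled). */
struct range {
    uint64_t start;
    uint64_t end;
};

enum overlap {
    DISJOINT,           /* no shared addresses */
    NEW_ENCLOSES_OLD,   /* existing range lies entirely inside the new one */
    NEW_INSIDE_OLD,     /* new range lies entirely inside the existing one */
    PARTIAL             /* the ranges straddle each other */
};

static enum overlap classify(struct range new_r, struct range old_r)
{
    if (new_r.end < old_r.start || old_r.end < new_r.start)
        return DISJOINT;
    if (new_r.start <= old_r.start && old_r.end <= new_r.end)
        return NEW_ENCLOSES_OLD;
    if (old_r.start <= new_r.start && new_r.end <= old_r.end)
        return NEW_INSIDE_OLD;
    return PARTIAL;
}

/* Simplified acceptance rules (an approximation, not kernel code). */
static bool request_style_accepts(enum overlap o) { return o == DISJOINT; }
static bool insert_style_accepts(enum overlap o)  { return o != PARTIAL; }
```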

I don't know how important the High Precision Event Timer really 
is to the health of my machine, but for the sake of principle I 
would really like to see it working again, like with 2.6.25 and 
before!  ;)

For me this is a "regression," but I have found a workaround.  I'm 
not sure what sort of problem is important enough to Linux kernel
developers to qualify as a true regression, so I brought my problem
here in case it's something that should be reported and/or fixed.

I work as a programming tutor at a community college, so I'm willing
to make code changes and build test kernels, if anyone can make
suggestions.  I looked at the diff between the last working commit
and the first broken (for me) commit, and found that I did not have
a clue about the hardware issues involved:

git diff 700efc1b9f6afe34caae231b87d129ad8ffb559f 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f

There are only 3 files involved,
  arch/x86/kernel/e820_64.c
  arch/x86/kernel/setup_64.c
  include/asm-x86/e820_64.h

and I could see that 'setup_64.c' is not implicated in my freeze
because the code change is in an #ifdef block depending on 
CONFIG_KEXEC, which is not enabled in my custom kernels (though it
is in the Debian stock kernels).
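The argument about setup_64.c rests on how the preprocessor handles `#ifdef`. When CONFIG_KEXEC is not defined, the guarded code is removed before compilation, so a change inside that block cannot affect a kernel built without kexec support. A minimal illustration follows. The helper name is hypothetical; in a real kernel build the macro would come from the generated configuration header.

```c
#include <assert.h>

/* CONFIG_KEXEC is intentionally NOT defined here, mirroring a custom
 * .config with kexec support disabled. */

/* Hypothetical helper: reports whether the kexec-only block was compiled. */
static int kexec_block_compiled(void)
{
#ifdef CONFIG_KEXEC
    /* Code here, like the setup_64.c change in the suspect commit,
     * only exists when the option is enabled. */
    return 1;
#else
    return 0;
#endif
}
```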

If what I am describing is considered a regression bug, as I do, then I
am willing to try code changes to get 2.6.26 working on BOTH of my 
machines.


Thx (and please CC replies to me),
Dave Witbrodt

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25
  2008-08-04 23:57 David Witbrodt
@ 2008-08-05 13:58 ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2008-08-05 13:58 UTC (permalink / raw)
  To: David Witbrodt
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Yinghai Lu

[ Let's CC people, so that they'll at least see this mail
  when they're back from holidays ]

On Mon, 2008-08-04 at 16:57 -0700, David Witbrodt wrote:
> Hello,
> 
> [Please CC me if you reply, for I am not subscribed to LKML.]
> 
> This is my first time posting to LKML.
> 
> I am a Debian user.  The sources for 2.6.26 recently became available
> in the Debian unstable repositories.  Trying them out by building
> custom kernels (think 'make oldconfig'), I found that one machine 
> worked while another froze early in boot.  No oops, no error msg of
> any kind, just a hard freeze without even Magic SysRq working!
> 
> I suspected a dumb config error on my part, but found that the Debian
> stock kernel exhibited the same problem.  So I filed a bug report in
> the Debian BTS:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=493479
> 
> There is much info about my hardware and configs there, but I can
> repost them here if that is helpful.  The machine that works with
> 2.6.26 has a Gigabyte GA-M59SLI-S5 mboard; the broken machine has an
> ECS AMD690GM-M2 mboard.
> 
> After much experimenting with various configs and rebuilds, I was
> finally able to discover that a kernel boot parameter,
> "hpet=disabled", allowed me to boot on the troublesome machine.  
> Both custom and Debian stock kernels of version 2.6.25 (most recently
> based on 2.6.25.10) work fine on this machine, no problem with HPET.
> 
> A member of the Debian kernel team (Bastian Blank) tried to help, but
> ended up suggesting bisecting using 'git'.  I am not (yet) a developer
> so I was not really thinking of getting that deeply involved, but I
> spent so much time trying to track this problem on Saturday night and
> all day Sunday, that I decided to give it a try!
> 
> Starting with Linus' instructions here,
>   http://lkml.org/lkml/2007/7/10/248
> 
> I ran: 
>   git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
> 
> and:
>   git checkout v2.6.25


Since you have that git tree, could you try to see if the latest -git
still has this problem?



* Re: HPET regression in 2.6.26 versus 2.6.25
@ 2008-08-05 14:14 David Witbrodt
  2008-08-05 19:19 ` Yinghai Lu
  0 siblings, 1 reply; 9+ messages in thread
From: David Witbrodt @ 2008-08-05 14:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Yinghai Lu



> Since you have that git tree, could you try to see if the latest -git
> still has this problem?

  I had forgotten to do that at the time I posted.

  Last night, after the post, I _did_ try building "master".  The commit
date was from Aug. 1 -- I am at work right now, so I can't provide exact
info until I get home in about 6 hours.  I checked 'Makefile', and the
version there was 2.6.27-rc1, IIRC.
  It built fine, but same freeze was seen.

  A reversion of the commit I mentioned _will_ solve the problem, but
looking at the code I saw that it was an attempt to provide better
functionality.  I'm willing to help test modifications to make the new
code work!


Thanks,
Dave W.


* Re: HPET regression in 2.6.26 versus 2.6.25
  2008-08-05 14:14 David Witbrodt
@ 2008-08-05 19:19 ` Yinghai Lu
  0 siblings, 0 replies; 9+ messages in thread
From: Yinghai Lu @ 2008-08-05 19:19 UTC (permalink / raw)
  To: David Witbrodt
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

On Tue, Aug 5, 2008 at 7:14 AM, David Witbrodt <dawitbro@sbcglobal.net> wrote:
>
>
>> Since you have that git tree, could you try to see if the latest -git
>> still has this problem?
>
>  I had forgotten to do that at the time I posted.
>
>  Last night, after the post, I _did_ try building "master".  The commit
> date was from Aug. 1 -- I am at work right now, so I can't provide exact
> info until I get home in about 6 hours.  I checked 'Makefile', and the
> version there was 2.6.27-rc1, IIRC.
>  It built fine, but same freeze was seen.
>
>  A reversion of the commit I mentioned _will_ solve the problem, but
> looking at the code I saw that it was an attempt to provide better
> functionality.  I'm willing to help test modifications to make the new
> code work!

please boot with "debug apic=verbose initcall_debug" to check exactly
where it hangs...
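With initcall_debug, the kernel logs a "calling" line before each initcall and an "initcall ... returned" line after it. The last "calling" line without a matching "returned" line therefore points at the function that never came back. Below is a user-space model of that logging pattern. The names are hypothetical, and using clock() for timing is an illustrative choice, not how the kernel measures the duration.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef int (*initcall_t)(void);

struct named_initcall {
    const char *name;
    initcall_t fn;
};

/* Run each initcall in order, logging before and after the call.
 * If fn() never returns, the log ends with a "calling" line, which is
 * how a hang can be narrowed to a single initcall.
 * Returns the number of initcalls that returned (successfully or not). */
static size_t run_initcalls(const struct named_initcall *calls, size_t n)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        printf("calling %s\n", calls[i].name);
        fflush(stdout);             /* get the line out before the call */

        clock_t t0 = clock();       /* processor time; fine for a sketch */
        int ret = calls[i].fn();
        clock_t t1 = clock();

        double msecs = 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
        printf("initcall %s returned %d after %.0f msecs\n",
               calls[i].name, ret, msecs);
        done++;
    }
    return done;
}

/* Example initcalls for experiments. */
static int ok_init(void) { return 0; }
static int failing_init(void) { return -1; }
```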

YH


* Re: HPET regression in 2.6.26 versus 2.6.25
@ 2008-08-05 21:12 David Witbrodt
  0 siblings, 0 replies; 9+ messages in thread
From: David Witbrodt @ 2008-08-05 21:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Yinghai Lu



> Since you have that git tree, could you try to see if the latest -git
> still has this problem?

  In a previous msg I mentioned that I had tried compiling the HEAD of
my git repository, but only after I had posted to LKML.  I was at work
when I wrote the prev msg, so I could not provide details except from
memory.

  OK, now I'm home:
======================================

$ git-show
commit 2b12a4c524812fb3f6ee590a02e65b95c8c32229
Merge: 4744b43... 7f30491...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Aug 1 14:59:11 2008 -0700

    Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
    
    * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
      [IA64] Move include/asm-ia64 to arch/ia64/include/asm

$ head Makefile
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 27
EXTRAVERSION = -rc1
NAME = Rotary Wombat

# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
# More info can be located in ./README
# Comments in this file are targeted only to the developer, do not
======================================
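The release string "2.6.27-rc1" comes from the first four Makefile variables shown above. In 2.6-era Makefiles, KERNELVERSION is, to my understanding, defined as $(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION). A small C sketch of the same composition (the function name is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose a kernel release string from the Makefile fields:
 * VERSION.PATCHLEVEL.SUBLEVEL, immediately followed by EXTRAVERSION.
 * Returns the snprintf() result (length of the full string). */
static int kernel_version_string(char *buf, size_t len, int version,
                                 int patchlevel, int sublevel,
                                 const char *extraversion)
{
    return snprintf(buf, len, "%d.%d.%d%s",
                    version, patchlevel, sublevel, extraversion);
}
```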

  This kernel built fine, but froze at boot just like the 2.6.26 kernels,
unless using "hpet=disabled".
  Sorry that I forgot to do this before my orig. post.

Dave W.


* Re: HPET regression in 2.6.26 versus 2.6.25
@ 2008-08-05 22:16 David Witbrodt
  0 siblings, 0 replies; 9+ messages in thread
From: David Witbrodt @ 2008-08-05 22:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yinghai Lu, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin



> please boot with "debug apic=verbose initcall_debug" to check exactly
> where it hangs...

In my OP, I mentioned that I submitted a bug report to the Debian BTS
before coming to LKML.  I hoped to keep the bug an internal Debian matter,
since the kernels I compile were always from Debianized kernel sources.

Below I comply with your request, booting the kernel built from the HEAD
of the git tree I downloaded yesterday, dated 
    Fri Aug 1 14:59:11 2008 -0700

and with commit ID
    2b12a4c524812fb3f6ee590a02e65b95c8c32229


Before continuing, I would like to mention that in my original post to
the Debian BTS, I reported the last lines on the screen for several
kernels booted with "debug earlyprintk=vga initcall_debug loglevel=7".
I originally thought I was to blame -- some error in my '.config' --
so, unfortunately, I made a lot of irrelevant noise in the Debian BTS
thread as I scrambled to determine the cause of the freeze.  So maybe
the info there is not useful at all, but here is the link again, 
Just In Case:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=493479


OK, now booting 2.6.27-rc1 with "ro debug apic=verbose initcall_debug"...
Here is the last visible output before the freeze:
=========================================

calling chr_dev_init+0x0/0xa2
initcall chr_dev_init returned 0 after 0 msecs
calling firmware_class_init+0x0/0x71
initcall firmware_class_init returned 0 after 0 msecs
calling loopback_init+0x0/0xc
initcall loopback_init returned 0 after 0 msecs
calling cpufreq_gov_performance_init+0x0/0xc
initcall cpufreq_gov_performance_init returned 0 after 0 msecs
calling init_acpi_pm_clocksource+0x0/0xb4
initcall init_acpi_pm_clocksource returned 0 after 0 msecs
calling pci_bios_assign_resources+0x0/0x8b
pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
pci 0000:00:01.0:   IO window: 0xe000-0xefff
pci 0000:00:01.0:   MEM window: 0xfdd00000-0xfdefffff
pci 0000:00:01.0:   PREFETCH window: 0x000000d8000000-0x000000dfffffff
pci 0000:00:14.4: PCI bridge, secondary bus 0000:02
pci 0000:00:14.4:   IO window: 0xd000-0xdfff
pci 0000:00:14.4:   MEM window: 0xfdc00000-0xfdcfffff
pci 0000:00:14.4:   PREFETCH window: 0x000000fdf00000-0x000000fdffffff
initcall pci_bios_assign_resources returned 0 after 285696 msecs
calling inet_init+0x0/0x250
NET: Registered protocol family 2
=========================================
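Each bridge window line gives an inclusive start and end address, so the size of a window is end - start + 1. Applied to the logged values, the bridge at 00:01.0 has a 4 KiB IO window, a 2 MiB MEM window and a 128 MiB prefetchable window. The bridge at 00:14.4 has a 4 KiB IO window and 1 MiB each for MEM and prefetchable memory. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of an inclusive [start, end] address window, as printed in the
 * "IO window", "MEM window" and "PREFETCH window" lines. */
static uint64_t window_size(uint64_t start, uint64_t end)
{
    return end - start + 1;
}
```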

I can tell you that the "285696" figure is way off if "msecs" is
supposed to mean milliseconds.  It might be accurate if microseconds
are intended, but the entire process from GRUB handing off to the
kernel until the freeze occurs is just a few moments:  3 seconds at
the most, probably less.
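For scale, here is what the reported figure means under each unit. Read as milliseconds, 285696 is about 285.7 seconds, nearly five minutes. Read as microseconds, it is about 0.29 seconds, which is consistent with a boot that freezes within a few seconds. This is only arithmetic on the logged number. It does not show which unit, if either, the kernel's clock was actually measuring.

```c
#include <assert.h>

/* Convert the logged initcall duration to seconds under two readings. */
static double as_seconds_if_msecs(long value) { return value / 1000.0; }
static double as_seconds_if_usecs(long value) { return value / 1000000.0; }
```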

This info was copied by hand.  I had no other way to transfer the info
into this post, so I apologize in advance for any errors.  I did
double check it, but some of those hex values are typos waiting to
happen....  (I'm pretty sure I got them right, though  ;)

Only 3 files were impacted by the commit that is causing the freeze
for my machine with the ECS mboard.  If you would like to give me
some code to insert in those files (or other files) that would 
print more helpful output during the boot, I would be more than
happy to give it a try.


Dave W.


* Re: HPET regression in 2.6.26 versus 2.6.25
@ 2008-08-06  4:45 David Witbrodt
  0 siblings, 0 replies; 9+ messages in thread
From: David Witbrodt @ 2008-08-06  4:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yinghai Lu, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin



Had a bit of a scare tonight, about possibly wasting the time of you
good folks.

My desktop machine (mboard = Gigabyte GA-M59SLI-S5) was built last
year, in May 2007.  It runs 2.6.26 with no HPET regression, as mentioned.

The troublesome machine (mboard = ECS AMD690GM-M2) was built this year,
in May or June.  I actually bought two identical motherboards, which
were on sale at a very nice price, so I could make 2 "servers" for my
home network.

One machine (call it "fileserver") is in working order, and is the
machine I've been using for all of the testing I've done in this bug
thread.  The other ECS machine (call it "webserver") was not really
in working condition -- actually, it is OK, just not hooked up while
I've been backing up files from an older Pentium 4 machine and a
Pentium 3 machine.

The "scare" has to do with the CPU/BIOS situation.  The webserver uses
an AMD Athlon 64 X2 3600+, fully recognized by the mboard BIOS.  The 
fileserver uses a very new model:  AMD X2 4850e.  The ECS mboard runs
this CPU fine, but the BIOS does not "recognize" it.

I asked ECS about the possibility of a BIOS update in early June.  The
response:
=========================

ECS Support(USA) Posted : GMT 2008/06/14 00:19:14
Thank for your question. It is hard to say if there will be a BIOS 
version to support your CPU. But for sure we will pass this along to 
the engineering department in Taipei. Thanks.
=========================

The last update for this mboard I know of was from Dec. 2007:
http://www.ecsusa.com/ECSWebSite/Products/ProductsDetail.aspx?detailid=789&DetailName=Bios&MenuID=46&LanID=9

Tonight, fearing that some peculiarity of the CPU might be causing the
problem instead of the motherboard hardware itself, I got the other
machine (ECS mboard + Athlon 64 X2 3600) running and tested the
2.6.27-rc1 kernel on it:  froze on boot, but ran with "hpet=disabled".

Well, at least I'm glad I didn't waste everybody's time on some weird 
exception.  Of course, this bug is not really a problem for me at all 
at the present:  I can easily run 2.6.25 kernels on these two boxes, and
even 2.6.26+ kernels with "hpet=disabled" if need be.  I just would like
to see this issue fixed on the hardware I own, thinking in terms of the
future.  The Debian Developers are trying to get 2.6.26 into the next 
stable release, but right now it looks like anyone with this ECS 
motherboard who would try to install Linux from media with 2.6.26 would 
have a seizure... their machine, that is.  ;)


Dave W.


* Re: HPET regression in 2.6.26 versus 2.6.25
@ 2008-08-08 10:32 David Witbrodt
  0 siblings, 0 replies; 9+ messages in thread
From: David Witbrodt @ 2008-08-08 10:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yinghai Lu, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

OK, even though I am not a developer, I _hate_ feeling powerless to help...
so I went looking for more details about where the freeze is occurring.

First, I updated my git tree:
===== BEGIN INFO ========================

$ head Makefile    
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 27
EXTRAVERSION = -rc2
NAME = Rotary Wombat

# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
# More info can be located in ./README
# Comments in this file are targeted only to the developer, do not


$ git show |head
commit 685d87f7ccc649ab92b55e18e507a65d0e694eb9
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Aug 6 19:24:47 2008 -0700

    Revert "pcm_native.c: remove unused label"
    
    This reverts commit 680db0136e0778a0d7e025af7572c6a8d82279e2.  The label
    is actually used, but hidden behind CONFIG_SND_DEBUG and the horrible
    snd_assert() macro.
===== END INFO ========================

Since the last initcall function listed before the freeze was inet_init(), 
I decided to try to locate the files involved using 'grep -R'.  I found
that inet_init() is defined in 'net/ipv4/af_inet.c'.  So, I thought it
wouldn't hurt to print some more info during the kernel boot process, and
I added some debugging printk() calls to af_inet.c:

===== BEGIN DIFF ========================
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8a3ac1f..8e98094 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1421,14 +1421,17 @@ static int __init inet_init(void)

    BUILD_BUG_ON(sizeof(struct inet_skb_parm) > sizeof(dummy_skb->cb));

+   printk("           Calling proto_register(&tcp_prot, 1)\n");
    rc = proto_register(&tcp_prot, 1);
    if (rc)
        goto out;

+   printk("           Calling proto_register(&udp_prot, 1)\n");
    rc = proto_register(&udp_prot, 1);
    if (rc)
        goto out_unregister_tcp_proto;

+   printk("           Calling proto_register(&raw_prot, 1)\n");
    rc = proto_register(&raw_prot, 1);
    if (rc)
        goto out_unregister_udp_proto;
@@ -1437,15 +1440,18 @@ static int __init inet_init(void)
     *    Tell SOCKET that we are alive...
     */

+   printk("           Calling sock_register()\n");
    (void)sock_register(&inet_family_ops);

 #ifdef CONFIG_SYSCTL
+   printk("           Calling ip_static_sysctl_init()\n");
    ip_static_sysctl_init();
 #endif

    /*
     *    Add all the base protocols.
     */
+   printk("           Adding base protocols\n");

    if (inet_add_protocol(&icmp_protocol, IPPROTO_ICMP) < 0)
        printk(KERN_CRIT "inet_init: Cannot add ICMP protocol\n");
@@ -1459,6 +1465,7 @@ static int __init inet_init(void)
 #endif

    /* Register the socket-side information for inet_create. */
+   printk("           Initializing lists for inet_create\n");
    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
        INIT_LIST_HEAD(r);

@@ -1469,23 +1476,31 @@ static int __init inet_init(void)
     *    Set the ARP module up
     */

+   printk("           Calling arp_init()\n");
    arp_init();

    /*
     *    Set the IP module up
     */

+   printk("           Calling\n");
+
+   printk("           Calling ip_init()\n");
    ip_init();

+   printk("           Calling tcp_v4_init()\n");
    tcp_v4_init();

    /* Setup TCP slab cache for open requests. */
+   printk("           Calling tcp_init()\n");
    tcp_init();

    /* Setup UDP memory threshold */
+   printk("           Calling udp_init()\n");
    udp_init();

    /* Add UDP-Lite (RFC 3828) */
+   printk("           Calling udplit4_register()\n");
    udplite4_register();

    /*
@@ -1509,10 +1524,13 @@ static int __init inet_init(void)
    if (init_ipv4_mibs())
        printk(KERN_CRIT "inet_init: Cannot init ipv4 mibs\n");

+   printk("           Calling ipv4_proc_init()\n");
    ipv4_proc_init();

+   printk("           Calling ipfrag_init()\n");
    ipfrag_init();

+   printk("           Calling dev_add_pack(&ip_packet_type)\n");
    dev_add_pack(&ip_packet_type);

    rc = 0;
===== END DIFF ========================

[Hopefully the text formatting is preserved in the emails.  The
archived messages via the web interface have their  whitespace 
formatting totally destroyed!]
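The printk() calls above are a manual form of "printf bisection": print a marker before each step, and the last marker that appears brackets the hang. A reusable variant, sketched here in user-space C, tags each marker with the function name and line number, so fewer strings have to be written by hand. The CHECKPOINT macro is hypothetical; in a kernel build, the fprintf() call would be replaced by printk().

```c
#include <assert.h>
#include <stdio.h>

/* Print a marker naming the current function and source line, then
 * flush so the line is emitted before any step that might hang. */
#define CHECKPOINT(msg)                                          \
    do {                                                         \
        fprintf(stderr, "checkpoint %s:%d: %s\n",                \
                __func__, __LINE__, (msg));                      \
        fflush(stderr);                                          \
    } while (0)

/* Example of instrumenting a sequence of steps. Returns the number of
 * checkpoints passed, so a caller can check how far execution got. */
static int instrumented_steps(void)
{
    int passed = 0;

    CHECKPOINT("before step 1");
    passed++;
    CHECKPOINT("before step 2");
    passed++;
    return passed;
}
```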


After building and running the kernel, the last line on the terminal was:

    Initializing lists for inet_create

So the freeze occurs in this "for" loop (or the loop immediately 
following it):

    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
        INIT_LIST_HEAD(r);
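For readers unfamiliar with kernel linked lists: inetsw is an array of list heads, one per socket type. INIT_LIST_HEAD() makes a head point to itself, which is how an empty circular doubly linked list is represented. The first loop therefore only writes two pointers per entry. Below is a user-space re-creation of the idea. The names mirror include/linux/list.h, but this is a simplified sketch, not the kernel header. The value 11 for SOCK_MAX is taken from the debugging output posted later in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal doubly linked list head, modeled on struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev point back to itself. */
static void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

static bool list_empty(const struct list_head *head)
{
    return head->next == head;
}

#define SOCK_MAX 11     /* value printed in the later debugging output */
static struct list_head inetsw[SOCK_MAX];

/* Same loop shape as in inet_init(): one empty list per socket type. */
static void init_inetsw(void)
{
    struct list_head *r;

    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
        INIT_LIST_HEAD(r);
}
```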


HTH,
Dave W.


* Re: HPET regression in 2.6.26 versus 2.6.25
@ 2008-08-08 22:48 David Witbrodt
  0 siblings, 0 replies; 9+ messages in thread
From: David Witbrodt @ 2008-08-08 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Yinghai Lu, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

OK, suffering from insomnia this morning, I added printk()'s to
net/ipv4/af_inet.c in order to find the code where the freeze
happens.  One of 2 loops was the culprit:

===== BEGIN CODE =======================
#ifdef CONFIG_IP_MULTICAST
    if (inet_add_protocol(&igmp_protocol, IPPROTO_IGMP) < 0)
        printk(KERN_CRIT "inet_init: Cannot add IGMP protocol\n");
#endif
  
    /* Register the socket-side information for inet_create. */
    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
        INIT_LIST_HEAD(r);

    for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
        inet_register_protosw(q);
  
    /*
     *    Set the ARP module up
     */
===== END CODE =======================

Feeling better, I tried to get a few hours of sleep before I had to
go to work.

Knowing where to focus more attention, I restored the original version
of af_inet.c from the git tree with

    git show HEAD:net/ipv4/af_inet.c

and then made the following changes to discover which loop was
the problem:

===== BEGIN DIFF ========================
  #ifdef CONFIG_IP_MULTICAST
      if (inet_add_protocol(&igmp_protocol, IPPROTO_IGMP) < 0)
          printk(KERN_CRIT "inet_init: Cannot add IGMP protocol\n");
  #endif
  
+     printk("           First loop:\n");
+     printk("             SOCK_MAX = %d\n", SOCK_MAX);
+     int dwindex=0;
      /* Register the socket-side information for inet_create. */
      for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
+       {
+         printk("             initializing:  &inetsw[%d] = %p\n", dwindex, r);
          INIT_LIST_HEAD(r);
+         ++dwindex;
+       }
  
+     printk("           Second loop:\n");
+     printk("             INETSW_ARRAY_LEN = %d\n", INETSW_ARRAY_LEN);
+     printk("             Initial q = %p\n", inetsw_array);
+     printk("             Final   q = %p\n", &inetsw_array[INETSW_ARRAY_LEN]);
+     dwindex=0;
      for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
+       {
+         printk("             initializing:  &q[%d]\n", dwindex);
          inet_register_protosw(q);
+         ++dwindex;
+       }
  
      /*
       *    Set the ARP module up
       */
===== END DIFF ========================

I then built the kernel, installed it, and rebooted.  The following
output was observed:

===== BEGIN OUTPUT ========================
...
NET: Registered protocol family 2
           First loop:
             SOCK_MAX = 11
             initializing:  &inetsw[0] = ffffffff809c8460
             initializing:  &inetsw[1] = ffffffff809c8470
             initializing:  &inetsw[2] = ffffffff809c8480
             initializing:  &inetsw[3] = ffffffff809c8490
             initializing:  &inetsw[4] = ffffffff809c84a0
             initializing:  &inetsw[5] = ffffffff809c84b0
             initializing:  &inetsw[6] = ffffffff809c84c0
             initializing:  &inetsw[7] = ffffffff809c84d0
             initializing:  &inetsw[8] = ffffffff809c84e0
             initializing:  &inetsw[9] = ffffffff809c84f0
             initializing:  &inetsw[10] = ffffffff809c8500
           Second loop:
             INETSW_ARRAY_LEN = 3
             Initial q = ffffffff806f8a20
             Final   q = ffffffff806f8a60
             initializing:  &q[0]

===== END OUTPUT ========================
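In the first loop, the addresses advance by 0x10 per entry. That matches sizeof(struct list_head) on x86-64: two 8-byte pointers. The check below reproduces the arithmetic from the first and last logged addresses. It assumes an LP64 target with 8-byte pointers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Addresses copied from the log: &inetsw[0] and &inetsw[10]. */
#define INETSW0_ADDR  0xffffffff809c8460ULL
#define INETSW10_ADDR 0xffffffff809c8500ULL

/* Byte stride between consecutive elements implied by the log. */
static uint64_t logged_stride(void)
{
    return (INETSW10_ADDR - INETSW0_ADDR) / 10;
}

/* Byte stride between consecutive list heads on the build machine. */
static size_t array_stride(void)
{
    static struct list_head arr[2];
    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}
```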


This is where my kernels (2.6.26* and 2.6.27*) are freezing, in
the call of inet_register_protosw().
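As a rough picture of what the second loop is for: each entry of inetsw_array describes one protocol (three here, matching INETSW_ARRAY_LEN = 3), and registering it links it into the inetsw list for its socket type. The sketch below is a simplified, single-threaded user-space model with hypothetical names. In kernels of this era, the real inet_register_protosw() also takes a lock and checks for protocols marked as permanent. This model therefore shows only what the call is for, not where the hang occurs.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* Insert 'item' right after 'head', like the kernel's list_add(). */
static void list_add(struct list_head *item, struct list_head *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

#define SOCK_MAX 11

/* Simplified protocol-switch entry; the real struct has more fields. */
struct protosw {
    struct list_head list;
    unsigned short type;      /* socket type, used as the inetsw index */
    unsigned short protocol;  /* protocol number */
};

static struct list_head inetsw[SOCK_MAX];

static void init_inetsw(void)
{
    for (int i = 0; i < SOCK_MAX; i++)
        INIT_LIST_HEAD(&inetsw[i]);
}

/* Hypothetical simplified registration: reject out-of-range types,
 * otherwise link the entry into the list for its socket type. */
static int register_protosw(struct protosw *p)
{
    if (p->type >= SOCK_MAX)
        return -1;
    list_add(&p->list, &inetsw[p->type]);
    return 0;
}

static size_t list_length(const struct list_head *head)
{
    size_t n = 0;
    for (const struct list_head *pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}
```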

As I find time, I will keep trying to dig deeper.  Hopefully one
of you on the LKML has an idea of what's wrong, because even
though I am familiar with C and C++ I have no background at all
with Linux kernel code itself.


Dave W.

