netdev.vger.kernel.org archive mirror
* Re: HPET regression in 2.6.26 versus 2.6.25 -- why Yinghai's revert may have failed
@ 2008-08-14 12:20 David Witbrodt
  2008-08-15  8:10 ` Bill Fink
  0 siblings, 1 reply; 3+ messages in thread
From: David Witbrodt @ 2008-08-14 12:20 UTC
  To: Yinghai Lu
  Cc: Ingo Molnar, linux-kernel, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, H. Peter Anvin, netdev



> > I used 'git apply --check ' first, and got no errors, so
> > I applied it, built, installed, and rebooted.
> 
> that patch revert to use request_resource, so there is some other problem
> 
> YH

I finished experimenting last night with trying to find the last commit
in the git tree that would let me revert the problem successfully...
and I got completely stumped.

The bisecting took me all the way back to the first commit introducing
the problem on these motherboards:  3def3d6d...

Considering these 3 consecutive commits (according to 'git log') from late
Feb. 2008, between kernel versions 2.6.25 and 2.6.26-rc1:
---------------------------------------------------------

700efc1b...:  the last kernel I can build and run just fine.

3def3d6d...:  this one builds, but locks up in inet_init() once the sequence
of function calls reaches synchronize_rcu().  Reverting here works, but is
trivial and silly, just reproducing 700efc1b...

1e934dda...:  attempting to revert the changes from 3def3d6d... (just one
commit before!) already fails.
---------------------------------------------------------

This last commit has an effect on my machine that prevents attempts to
revert 3def3d6d... from working as intended.  This may explain why
Yinghai's patch providing the revert for 2.6.27-rc3 did not work.
(Hopefully none of the other changes between Feb. and Aug. would also keep
the revert from working, but I wouldn't bet my life on it....)

The 3def3d6d... and 1e934dda... commits are quite small, touching only 4
files total, and both commits involve calls to insert_resource().  Something
on my 2 problem machines is behaving badly in this area.

Reminder:  disabling HPET with "hpet=disable" allows any kernel with the
lockup problem to boot just fine.

Further note: Before my first LKML post about this problem, I had also 
tried turning off all CONFIG_HPET* features that I could reach via 
'make menuconfig', but that did not work and I still had to use 
"hpet=disable" to get the kernel to boot.


SUGGESTION

When my kernels lock up, it is always a chain of calls beginning with
inet_init() and ending up here (in net/core/dev.c):

void synchronize_net(void)
{
    might_sleep();
    synchronize_rcu();
}

If anyone wants to print diagnostic info before my kernel locks up, this 
would be a really good place to do it (so that it doesn't scroll away
before I can write it down):

void synchronize_net(void)
{
    might_sleep();
    /* Insert printk's or diagnostic function here */
    synchronize_rcu();
}
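
As a rough illustration of the kind of diagnostic meant here, the following
is a userspace sketch only, not kernel code.  The might_sleep(),
synchronize_rcu() and printk() definitions are stand-in stubs, and the two
counters are hypothetical names added so the behavior can be checked; none
of them come from the kernel tree.

```c
/*
 * Userspace sketch only: might_sleep(), synchronize_rcu() and printk()
 * below are stand-in stubs, NOT the kernel implementations.
 */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int diag_calls;	/* hypothetical counter: diagnostics emitted */
static int rcu_calls;	/* hypothetical counter: stubbed RCU waits */

/* Stub: in the kernel, might_sleep() is a debugging annotation. */
static void might_sleep(void)
{
}

/* Stub: the real synchronize_rcu() blocks until a grace period ends. */
static void synchronize_rcu(void)
{
	rcu_calls++;
}

/* Stub printk(): forwards to stderr so the message appears immediately. */
static int printk(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	diag_calls++;
	return n;
}

void synchronize_net(void)
{
	might_sleep();
	/* Print right before the call that never returns on the bad boards. */
	printk("synchronize_net: entering synchronize_rcu() (call %d)\n",
	       rcu_calls + 1);
	synchronize_rcu();
}
```

In a real kernel build only the printk() line would be added to the existing
function; because it is the last thing printed before the lockup, it should
stay on screen instead of scrolling away.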


Thanks,
Dave W.


* Re: HPET regression in 2.6.26 versus 2.6.25 -- why Yinghai's revert may have failed
  2008-08-14 12:20 HPET regression in 2.6.26 versus 2.6.25 -- why Yinghai's revert may have failed David Witbrodt
@ 2008-08-15  8:10 ` Bill Fink
  0 siblings, 0 replies; 3+ messages in thread
From: Bill Fink @ 2008-08-15  8:10 UTC
  To: David Witbrodt
  Cc: Yinghai Lu, Ingo Molnar, linux-kernel, Paul E. McKenney,
	Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, netdev

Hi David,

On Thu, 14 Aug 2008, David Witbrodt wrote:

> > > I used 'git apply --check ' first, and got no errors, so
> > > I applied it, built, installed, and rebooted.
> > 
> > that patch revert to use request_resource, so there is some other problem
> > 
> > YH
> 
> I finished experimenting last night with trying to find the last commit
> in the gittree that would let me revert the problem successfully...
> and I got completely stumped.
> 
> The bisecting took me all the way back to the first commit introducing
> the problem on these motherboards:  3def3d6d...
> 
> Considering these 3 consecutive commits (according to 'git log') from late
> Feb. 2008, between kernel versions 2.6.25 and 2.6.26-rc1:
> ---------------------------------------------------------
> 
> 700efc1b...:  the last kernel I can build and run just fine.
> 
> 3def3d6d...:  this one builds, but locks up in inet_init() once the sequence
> of function calls reaches synchronize_rcu().  Reverting here works, but is
> trivial and silly, just reproducing 700efc1b...
> 
> 1e934dda...:  attempting to revert the changes from 3def3d6d... (just one
> commit before!) already fails.
> ---------------------------------------------------------
> 
> This last commit has an effect on my machine that prevents attempts to
> revert 3def3d6d... from working as intended.  This may explain why
> Yinghai's patch providing the revert for 2.6.27-rc3 did not work.
> (Hopefully none of the other changes between Feb. and Aug. would also keep
> the revert from working, but I wouldn't bet my life on it....)
> 
> The 3def3d6d... and 1e934dda... commits are quite small, touching only 4
> files total, and both commits involve calls to insert_resource().  Something
> on my 2 problem machines is behaving badly in this area.

I wonder if it would help to revert both the 3def3d6d... and 1e934dda...
commits.  If there are 2 (or more) problematic commits, then of course
it wouldn't help to revert just one of the two commits.  This is one of
the nastiest types of debugging scenario, when there is more than one
cause of the observed problem, although in such cases the multiple
causes are often related in some way.

						-Bill


* Re: HPET regression in 2.6.26 versus 2.6.25 -- why Yinghai's revert may have failed
@ 2008-08-15 12:33 David Witbrodt
  0 siblings, 0 replies; 3+ messages in thread
From: David Witbrodt @ 2008-08-15 12:33 UTC
  To: Bill Fink
  Cc: Yinghai Lu, Ingo Molnar, linux-kernel, Paul E. McKenney,
	Peter Zijlstra, Thomas Gleixner, H. Peter Anvin, netdev



> > Considering these 3 consecutive commits (according to 'git log') from late
> > Feb. 2008, between kernel versions 2.6.25 and 2.6.26-rc1:
> > ---------------------------------------------------------
> > 
> > 700efc1b...:  the last kernel I can build and run just fine.
> > 
> > 3def3d6d...:  this one builds, but locks up in inet_init() once the sequence
> > of function calls reaches synchronize_rcu().  Reverting here works, but is
> > trivial and silly, just reproducing 700efc1b...
> > 
> > 1e934dda...:  attempting to revert the changes from 3def3d6d... (just one
> > commit before!) already fails.
> > ---------------------------------------------------------
> > 
> > This last commit has an effect on my machine that prevents attempts to
> > revert 3def3d6d... from working as intended.  This may explain why
> > Yinghai's patch providing the revert for 2.6.27-rc3 did not work.
> > (Hopefully none of the other changes between Feb. and Aug. would also keep
> > the revert from working, but I wouldn't bet my life on it....)
> > 
> > The 3def3d6d... and 1e934dda... commits are quite small, touching only 4
> > files total, and both commits involve calls to insert_resource().  Something
> > on my 2 problem machines is behaving badly in this area.
> 
> I wonder if it would help to revert both the 3def3d6d... and 1e934dda...
> commits.  If there are 2 (or more) problematic commits, then of course
> it wouldn't help to revert just one of the two commits.  This is one of
> the nastiest types of debugging scenario, when there is more than one
> cause of the observed problem, although in such cases the multiple
> causes are often related in some way.

Thanks for this Bill.  I got home pretty late last night, so I only tried a
few things before hitting the sack.

Your suggestion is something I was planning, but didn't get to yet.  It
seems like any change after 3def3d6d that touches insert_resource() causes
kernels to lock up on 2 of my 3 machines.

Mike Galbraith sent an offlist reply with a very good idea for finding out
whether a commit _before_ 3def3d6d is the actual cause of my troubles.  I
am more intrigued by this possibility than the idea you and I had about
reverting both 3def3d6d and 1e934dda and moving forward from there.

If Mike's idea doesn't seem to go anywhere (that is, if I cannot find a
kernel that works by applying the 3def3d6d changes to _previous_ kernel
revisions), then I plan to create a branch at 700efc1b and try moving
forward toward 2.6.26 (skipping the 3def3d6d and 1e934dda commits, of
course) until the kernel freezes again.

Now I have plenty of things to try!


Thx,
Dave W.

