2.6.26-rc9-git4: Reported regressions from 2.6.25

public inbox for linux-kernel@vger.kernel.org

* 2.6.26-rc9-git4: Reported regressions from 2.6.25
@ 2008-07-08 21:37 Rafael J. Wysocki
  2008-07-09  4:49 ` Randy Dunlap
       [not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
  0 siblings, 2 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2008-07-08 21:37 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
	Kernel Testers List

[Here's an updated report as promised.  We have closed some bug entries since
the previous one, dropped one from the list and there are two new items.

I won't be sending the follow-up messages this time; it's sufficient to send
them once a week IMO.]

This message contains a list of some regressions from 2.6.25, for which there
are no fixes in the mainline I know of.  If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.25, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regression statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-07-09      167       30          21
  2008-07-06      166       38          26
  2008-06-29      158       43          31
  2008-06-22      148       39          28
  2008-06-14      130       37          28
  2008-06-07      125       48          33
  2008-05-31      115       52          31
  2008-05-24       94       47          28
  2008-05-18       80       51          37
  2008-05-11       53       46          34


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11055
Subject		: iwl4965 - connection doesn't work more than 10 seconds
Submitter	: François Valenduc <francois.valenduc@tvcablenet.be>
Date		: 2008-07-08 13:29 (1 day old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11045
Subject		: Bug in MPT Fusion 2.6.26-rc7 unbootable
Submitter	: Kurk <kurk@shiftmail.org>
Date		: 2008-07-06 11:22 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11040
Subject		: 2.6.26-rc: host can not shutdown: ata problem
Submitter	: Alexander Beregalov <a.beregalov@gmail.com>
Date		: 2008-07-03 21:43 (6 days old)
References	: http://marc.info/?l=linux-kernel&m=121512197225068&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11039
Subject		: 2.6.26-rc8-git3 forcedeth WARNING (kills the interface)
Submitter	: Brad Campbell <brad@wasp.net.au>
Date		: 2008-07-03 10:07 (6 days old)
References	: http://marc.info/?l=linux-netdev&m=121508714430752&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11035
Subject		: System hangs on 2.6.26-rc8
Submitter	: Roman Mindalev <lists@r000n.net>
Date		: 2008-07-02 14:25 (7 days old)
References	: http://marc.info/?l=linux-kernel&m=121500871414995&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11023
Subject		: 2.6.26-rc8-git2 - kernel BUG at mm/page_alloc.c:585
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date		: 2008-07-02 11:55 (7 days old)
References	: http://lkml.org/lkml/2008/7/2/32
Handled-By	: Andrew Morton <akpm@linux-foundation.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11009
Subject		: No console on Riva TNT since 2.6.26-0.rc4
Submitter	: Quel Qun <kelk1@comcast.net>
Date		: 2008-06-26 20:04 (13 days old)
References	: http://marc.info/?l=linux-kernel&m=121451344229718&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10989
Subject		: kernel oopses when wiggling the mouse to make it known to hidd
Submitter	: Daniel Vetter <daniel@ffwll.ch>
Date		: 2008-06-26 10:32 (13 days old)
Handled-By	: Marcel Holtmann <marcel@holtmann.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10985
Subject		: backlight doesn't come on after resume with i915 video
Submitter	: Jon Dowland <jon+bugzilla.kernel.org@alcopop.org>
Date		: 2008-06-26 02:09 (13 days old)
Handled-By	: Jesse Barnes <jbarnes@virtuousgeek.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10960
Subject		: 2.6.26-rc: SPARC: Sun Ultra 10 can not boot
Submitter	: Alexander Beregalov <a.beregalov@gmail.com>
Date		: 2008-06-19 14:07 (20 days old)
References	: http://marc.info/?l=linux-kernel&m=121388456519637&w=4
Handled-By	: David Miller <davem@davemloft.net>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10955
Subject		: v2.6.26-rc7: BUG task_struct: Poison overwritten
Submitter	: Vegard Nossum <vegard.nossum@gmail.com>
Date		: 2008-06-21 19:24 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=121407641925121&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra@chello.nl>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10954
Subject		: hda_intel: azx_get_response timeout, switching to polling mode: last cmd=0x011f000c
Submitter	: Justin Mattock <justinmattock@gmail.com>
Date		: 2008-06-21 2:05 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=121401399622190&w=4
		  http://marc.info/?t=121416231700010&r=1&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10919
Subject		: [regression] display dimming is slow and laggy - Acer Travelmate 661lci
Submitter	: Maximilian Engelhardt <maxi@daemonizer.de>
Date		: 2008-06-14 22:31 (25 days old)
References	: http://marc.info/?l=linux-kernel&m=121348428828320&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10906
Subject		: repeatable slab corruption with LTP msgctl08
Submitter	: Andrew Morton <akpm@linux-foundation.org>
Date		: 2008-06-12 5:13 (27 days old)
References	: http://marc.info/?l=linux-kernel&m=121324775927704&w=4
Handled-By	: Pekka J Enberg <penberg@cs.helsinki.fi>
		  Christoph Lameter <clameter@sgi.com>
		  Manfred Spraul <manfred@colorfullife.com>
		  Andi Kleen <andi@firstfloor.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10872
Subject		: x86_64 boot hang when CONFIG_NUMA=n
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2008-06-05 21:50 (34 days old)
References	: http://marc.info/?l=linux-kernel&m=121270308607116&w=4
		  http://lkml.org/lkml/2008/6/11/355
		  http://lkml.org/lkml/2008/6/15/117
		  http://marc.info/?l=linux-kernel&m=121287638527452&w=2
Handled-By	: Yinghai Lu <yhlu.kernel@gmail.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10865
Subject		: Oops trying to mount an ntfs partition on thinkpad
Submitter	: Alex Romosan <romosan@sycorax.lbl.gov>
Date		: 2008-06-05 14:47 (34 days old)
References	: http://marc.info/?l=linux-kernel&m=121267834421414&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10843
Subject		: Display artifacts on XOrg logout with PAT kernel and VESA framebuffer
Submitter	: Frans Pop <elendil@planet.nl>
Date		: 2008-05-31 14:04 (39 days old)
References	: http://lkml.org/lkml/2008/6/7/206
		  http://lkml.org/lkml/2008/6/15/119
		  http://lkml.org/lkml/2008/6/23/160


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10821
Subject		: rt25xx: lock dependency warning, association failure, and kmalloc corruption
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2008-05-29 14:30 (41 days old)
Handled-By	: Ivo van Doorn <IvDoorn@gmail.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10741
Subject		: bug in `tty: BKL pushdown'?
Submitter	: Johannes Weiner <hannes@saeurebad.de>
Date		: 2008-05-18 2:16 (52 days old)
References	: http://marc.info/?l=linux-kernel&m=121107706506181&w=4
		  http://lkml.org/lkml/2008/6/16/104
Handled-By	: Alan Cox <alan@lxorguk.ukuu.org.uk>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10714
Subject		: powerpc: Badness seen on 2.6.26-rc2 with lockdep enabled
Submitter	: Balbir Singh <balbir@linux.vnet.ibm.com>
Date		: 2008-05-14 12:57 (56 days old)
References	: http://marc.info/?l=linux-kernel&m=121076917429133&w=4
Handled-By	: Benjamin Herrenschmidt <benh@kernel.crashing.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10629
Subject		: 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-05-05 09:59 (65 days old)
References	: http://lkml.org/lkml/2008/5/5/28
Handled-By	: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11042
Subject		: build issue #477 for v2.6.26-rc8-290-gb8a0b6c : input_event" [drivers/media/dvb/ttpci/dvb-ttpci.ko] undefined!
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-07-05 15:25 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=121527158632563&w=4
Handled-By	: Oliver Endriss <o.endriss@gmx.de>
Patch		: http://marc.info/?l=linux-kernel&m=121529790229531&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11024
Subject		: 2.6.25 to 2.6.26-rc8 regression (related to ahci and acpi _GTF)
Submitter	: Mathieu Bérard <Mathieu.Berard@crans.org>
Date		: 2008-07-01 9:39 (8 days old)
References	: http://marc.info/?t=121490593600001&r=1&w=4
Handled-By	: Tejun Heo <htejun@gmail.com>
Patch		: http://marc.info/?l=linux-kernel&m=121514631317343&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10957
Subject		: pata_pcmcia with Sandisk Extreme III 8GB
Submitter	: Komuro <komurojun-mbn@nifty.com>
Date		: 2008-06-07 13:37 (32 days old)
References	: http://marc.info/?l=linux-kernel&m=121284627119861&w=4
Handled-By	: Tejun Heo <htejun@gmail.com>
		  Dominik Brodowski <linux@dominikbrodowski.net>
		  Komuro <komurojun-mbn@nifty.com>
Patch		: http://marc.info/?l=linux-kernel&m=121530861605673&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10862
Subject		: forcedeth: lockdep warning on ethtool -s
Submitter	: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Date		: 2008-06-01 8:37 (38 days old)
References	: http://marc.info/?l=linux-kernel&m=121230964032247&w=4
Handled-By	: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Patch		: http://lkml.org/lkml/2008/6/15/120


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10815
Subject		: 2.6.26-rc4: RIP find_pid_ns+0x6b/0xa0
Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
Date		: 2008-05-27 09:23 (43 days old)
References	: http://lkml.org/lkml/2008/5/27/9
		  http://lkml.org/lkml/2008/6/14/87
Handled-By	: Oleg Nesterov <oleg@tv-sign.ru>
		  Linus Torvalds <torvalds@linux-foundation.org>
		  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Patch		: http://lkml.org/lkml/2008/5/28/16


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10725
Subject		: USB Mass storage mount fails: Write protect on
Submitter	: Maciej Rutecki <maciej.rutecki@gmail.com>
Date		: 2008-05-16 14:55 (54 days old)
References	: http://marc.info/?l=linux-kernel&m=121095168003572&w=4
Handled-By	: Alan Stern <stern@rowland.harvard.edu>
Patch		: http://marc.info/?l=linux-scsi&m=121433068314568&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10724
Subject		: ACPI: EC: GPE storm detected, disabling EC GPE
Submitter	: Justin Mattock <justinmattock@gmail.com>
Date		: 2008-05-16 6:17 (54 days old)
References	: http://marc.info/?l=linux-kernel&m=121091875711824&w=4
		  http://lkml.org/lkml/2008/5/18/168
		  http://lkml.org/lkml/2008/5/25/195
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=16364&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=16365&action=view


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10493
Subject		: mips BCM47XX compile error
Submitter	: Adrian Bunk <adrian.bunk@movial.fi>
Date		: 2008-04-20 17:07 (80 days old)
References	: http://lkml.org/lkml/2008/4/20/34
		  http://lkml.org/lkml/2008/5/12/30
		  http://lkml.org/lkml/2008/5/18/131
		  http://lkml.org/lkml/2008/5/31/202
		  http://lkml.org/lkml/2008/6/7/154
Patch		: http://marc.info/?l=linux-kernel&m=120876451216558&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9791
Subject		: Clock is running too fast^Wslow using acpi_pm clocksource
Submitter	: tosn00j02@sneakemail.com
Date		: 2008-05-03 05:09 (67 days old)
Handled-By	: Maciej W. Rozycki <macro@linux-mips.org>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=16180


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There is also a Bugzilla entry used for tracking the regressions from 2.6.25,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=10492

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael



* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-08 21:37 2.6.26-rc9-git4: Reported regressions from 2.6.25 Rafael J. Wysocki
@ 2008-07-09  4:49 ` Randy Dunlap
  2008-07-09 14:35   ` Rafael J. Wysocki
       [not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
  1 sibling, 1 reply; 13+ messages in thread
From: Randy Dunlap @ 2008-07-09  4:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

On Tue,  8 Jul 2008 23:37:43 +0200 (CEST) Rafael J. Wysocki wrote:

> Unresolved regressions
> ----------------------
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10872
> Subject		: x86_64 boot hang when CONFIG_NUMA=n
> Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
> Date		: 2008-06-05 21:50 (34 days old)
> References	: http://marc.info/?l=linux-kernel&m=121270308607116&w=4
> 		  http://lkml.org/lkml/2008/6/11/355
> 		  http://lkml.org/lkml/2008/6/15/117
> 		  http://marc.info/?l=linux-kernel&m=121287638527452&w=2
> Handled-By	: Yinghai Lu <yhlu.kernel@gmail.com>

I've spent quite some time trying to bisect this problem, but to no avail.
I don't expect to spend more time on it, so I suppose you can just mark it as dead.

Thanks for your efforts, YH.

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/


* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-09  4:49 ` Randy Dunlap
@ 2008-07-09 14:35   ` Rafael J. Wysocki
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2008-07-09 14:35 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

On Wednesday, 9 of July 2008, Randy Dunlap wrote:
> On Tue,  8 Jul 2008 23:37:43 +0200 (CEST) Rafael J. Wysocki wrote:
> 
> > Unresolved regressions
> > ----------------------
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10872
> > Subject		: x86_64 boot hang when CONFIG_NUMA=n
> > Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
> > Date		: 2008-06-05 21:50 (34 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121270308607116&w=4
> > 		  http://lkml.org/lkml/2008/6/11/355
> > 		  http://lkml.org/lkml/2008/6/15/117
> > 		  http://marc.info/?l=linux-kernel&m=121287638527452&w=2
> > Handled-By	: Yinghai Lu <yhlu.kernel@gmail.com>
> 
> I've spent quite some time trying to bisect this problem, but to no avail.
> I don't expect to spend more time on it, so I suppose you can just mark it as dead.

OK, I'll close it, then.

Thanks,
Rafael


[parent not found: <200807101725.36175.nickpiggin@yahoo.com.au>]

* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
       [not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
@ 2008-07-10  9:03   ` Kamalesh Babulal
  2008-07-10 11:02   ` Alexey Dobriyan
  2008-08-01 21:09   ` Paul E. McKenney
  2 siblings, 0 replies; 13+ messages in thread
From: Kamalesh Babulal @ 2008-07-10  9:03 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rafael J. Wysocki, Paul E. McKenney, Alexey Dobriyan,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

Nick Piggin wrote:
> On Wednesday 09 July 2008 07:37, Rafael J. Wysocki wrote:
> 
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11023
>> Subject		: 2.6.26-rc8-git2 - kernel BUG at mm/page_alloc.c:585
>> Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
>> Date		: 2008-07-02 11:55 (7 days old)
>> References	: http://lkml.org/lkml/2008/7/2/32
>> Handled-By	: Andrew Morton <akpm@linux-foundation.org>
> 
> I expect Andrew probably doesn't have time to delve into this. 
> Usual questions apply: is it reproducible, is it bisectable?
> Someone at IBM is probably best to handle it. Maybe try Mel or
> powerpc list?
> 

This is reproducible. I have CC'd the powerpc list on the bug report and
sent it to the list. I will try to bisect the bug.
> 
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10906
>> Subject		: repeatable slab corruption with LTP msgctl08
>> Submitter	: Andrew Morton <akpm@linux-foundation.org>
>> Date		: 2008-06-12 5:13 (27 days old)
>> References	: http://marc.info/?l=linux-kernel&m=121324775927704&w=4
>> Handled-By	: Pekka J Enberg <penberg@cs.helsinki.fi>
>> 		  Christoph Lameter <clameter@sgi.com>
>> 		  Manfred Spraul <manfred@colorfullife.com>
>> 		  Andi Kleen <andi@firstfloor.org>
> 
> I couldn't reproduce this one either. Maybe hardware failure?
> 
> 
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10629
>> Subject		: 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
>> Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
>> Date		: 2008-05-05 09:59 (65 days old)
>> References	: http://lkml.org/lkml/2008/5/5/28
>> Handled-By	: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Attached is my fix for this problem. I don't think it is a regression
> as such, but it can't hurt to go into 2.6.26 IMO.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> PREEMPT_RCU without HOTPLUG_CPU is broken. rcu_online_cpu() is called to
> initially populate rcu_cpu_online_map with all online CPUs when the hotplug
> event handler is installed, and also to populate the map with CPUs as they
> come online. The former case is meant to happen with and without HOTPLUG_CPU,
> but without HOTPLUG_CPU, the rcu_online_cpu() function is no-oped: while it
> still gets called, it does not set the rcu CPU map.
> 
> With a blank RCU CPU map, grace periods get to tick by completely oblivious
> to active RCU read side critical sections. This results in free-before-grace
> bugs.
> 
> Fix is obvious once the problem is known. (Also, change __devinit to
> __cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels).
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> 
> Annoyed this wasn't a crazy obscure error in the algorithm I could fix :)
> I spent all day debugging it and had to make a special test case (rcutorture
> didn't seem to trigger it), and a big RCU state logging infrastructure to log
> millions of RCU state transitions and events. Oh well.
> 
> Index: linux-2.6/kernel/rcupreempt.c
> ===================================================================
> --- linux-2.6.orig/kernel/rcupreempt.c	2008-07-10 17:08:56.000000000 +1000
> +++ linux-2.6/kernel/rcupreempt.c	2008-07-10 17:09:10.000000000 +1000
> @@ -925,26 +925,22 @@ void rcu_offline_cpu(int cpu)
>  	spin_unlock_irqrestore(&rdp->lock, flags);
>  }
> 
> -void __devinit rcu_online_cpu(int cpu)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> -	cpu_set(cpu, rcu_cpu_online_map);
> -	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> -}
> -
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
> 
>  void rcu_offline_cpu(int cpu)
>  {
>  }
> 
> -void __devinit rcu_online_cpu(int cpu)
> +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +
> +void __cpuinit rcu_online_cpu(int cpu)
>  {
> -}
> +	unsigned long flags;
> 
> -#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> +	cpu_set(cpu, rcu_cpu_online_map);
> +	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> +}
> 
>  static void rcu_process_callbacks(struct softirq_action *unused)
>  {


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
       [not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
  2008-07-10  9:03   ` Kamalesh Babulal
@ 2008-07-10 11:02   ` Alexey Dobriyan
  2008-07-10 17:21     ` Linus Torvalds
  2008-08-01 21:09   ` Paul E. McKenney
  2 siblings, 1 reply; 13+ messages in thread
From: Alexey Dobriyan @ 2008-07-10 11:02 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rafael J. Wysocki, Kamalesh Babulal, Paul E. McKenney,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10629
> > Subject		: 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
> > Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
> > Date		: 2008-05-05 09:59 (65 days old)
> > References	: http://lkml.org/lkml/2008/5/5/28
> > Handled-By	: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Attached is my fix for this problem. I don't think it is a regression
> as such, but it can't hurt to go into 2.6.26 IMO.

> PREEMPT_RCU without HOTPLUG_CPU is broken.

Bastard!

rcutorture fixed here, starting cross-compile stuff (without much interest).

> Annoyed this wasn't a crazy obscure error in the algorithm I could fix :)
> I spent all day debugging it and had to make a special test case (rcutorture
> didn't seem to trigger it), and a big RCU state logging infrastructure to log
> millions of RCU state transitions and events. Oh well.

> --- linux-2.6.orig/kernel/rcupreempt.c
> +++ linux-2.6/kernel/rcupreempt.c
> @@ -925,26 +925,22 @@ void rcu_offline_cpu(int cpu)
>  	spin_unlock_irqrestore(&rdp->lock, flags);
>  }
>  
> -void __devinit rcu_online_cpu(int cpu)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> -	cpu_set(cpu, rcu_cpu_online_map);
> -	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> -}
> -
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
>  
>  void rcu_offline_cpu(int cpu)
>  {
>  }
>  
> -void __devinit rcu_online_cpu(int cpu)
> +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +
> +void __cpuinit rcu_online_cpu(int cpu)
>  {
> -}
> +	unsigned long flags;
>  
> -#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> +	cpu_set(cpu, rcu_cpu_online_map);
> +	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> +}
>  
>  static void rcu_process_callbacks(struct softirq_action *unused)
>  {




* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-10 11:02   ` Alexey Dobriyan
@ 2008-07-10 17:21     ` Linus Torvalds
  2008-07-10 17:34       ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2008-07-10 17:21 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Nick Piggin, Rafael J. Wysocki, Kamalesh Babulal,
	Paul E. McKenney, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List



On Thu, 10 Jul 2008, Alexey Dobriyan wrote:

> On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> > 
> > Attached is my fix for this problem. I don't think it is a regression
> > as such, but it can't hurt to go into 2.6.26 IMO.

Nick, you're a hero.

> > PREEMPT_RCU without HOTPLUG_CPU is broken.
> 
> Bastard!
> 
> rcutorture fixed here, starting cross-compile stuff (without much interest).

I'm marking this "tested-by" by you too, on the strength of that 
rcutorture thing. I think Nick nailed this one.

Good jorb,

		Linus


* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-10 17:21     ` Linus Torvalds
@ 2008-07-10 17:34       ` Ingo Molnar
  2008-07-10 18:06         ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2008-07-10 17:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki, Kamalesh Babulal,
	Paul E. McKenney, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, 10 Jul 2008, Alexey Dobriyan wrote:
> 
> > On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> > > 
> > > Attached is my fix for this problem. I don't think it is a 
> > > regression as such, but it can't hurt to go into 2.6.26 IMO.
> 
> Nick, you're a hero.

cool! :)

(hm, could anyone please resend Nick's original mail? The original one 
is not in my lkml folder nor on lkml.org - only the quoted one.)

	Ingo


* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-10 17:34       ` Ingo Molnar
@ 2008-07-10 18:06         ` Ingo Molnar
  2008-07-11  4:11           ` Nick Piggin
                             ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Ingo Molnar @ 2008-07-10 18:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki, Kamalesh Babulal,
	Paul E. McKenney, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List,
	Paul E. McKenney


* Ingo Molnar <mingo@elte.hu> wrote:

> cool! :)
> 
> (hm, could anyone please resend Nick's original mail? The original one 
> is not in my lkml folder nor on lkml.org - only the quoted one.)

ok, got the mail now:

| | Annoyed this wasn't a crazy obscure error in the algorithm I could 
| | fix :) [...]

Paul recently ran a formal proof against all sorts of RCU details (and 
found and fixed a few obscure races that way that no-one ever 
triggered), so i'd be quite surprised if we found anything in the core 
algorithm :-)

| | [...] I spent all day debugging it and had to make a special test 
| | case (rcutorture didn't seem to trigger it), and a big RCU state 
| | logging infrastructure to log millions of RCU state transitions and 
| | events. Oh well.

nice debugging!

Acked-by: Ingo Molnar <mingo@elte.hu>

i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG + 
PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is 
weird.

	Ingo


* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-10 18:06         ` Ingo Molnar
@ 2008-07-11  4:11           ` Nick Piggin
  2008-08-01 21:09             ` Paul E. McKenney
  2008-08-01 21:09           ` Paul E. McKenney
       [not found]           ` <20080710204157.GG6877@linux.vnet.ibm.com>
  2 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2008-07-11  4:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Alexey Dobriyan, Rafael J. Wysocki,
	Kamalesh Babulal, Paul E. McKenney, Linux Kernel Mailing List,
	Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Kernel Testers List, Paul E. McKenney

On Friday 11 July 2008 04:06, Ingo Molnar wrote:

> i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG +
> PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is
> weird.

It basically requires an active rcu reader to be preempted (preferably
by something doing a lot of call_rcu or other activity, i.e. the writer,
so it can tick along the different states quickly).

I found that just 2 threads (reader and writer) bound to the same CPU would
trigger it fastest; my reader has quite a long rcu read section.

I'm not sure why rcutorture doesn't trigger it for everyone. I'm surprised
it does not have much longer maximum read delays: I would have thought
several ms would be useful to keep a critical section open while the
rcu engine runs through a number of states...


* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-11  4:11           ` Nick Piggin
@ 2008-08-01 21:09             ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Linus Torvalds, Alexey Dobriyan, Rafael J. Wysocki,
	Kamalesh Babulal, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List

On Fri, Jul 11, 2008 at 02:11:59PM +1000, Nick Piggin wrote:
> On Friday 11 July 2008 04:06, Ingo Molnar wrote:
> 
> > i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG +
> > PREEMPT_RCU kernels and never saw this. Nor did Paul. That aspect is
> > weird.
> 
> It basically requires an active rcu reader to be preempted (preferably
> by something doing a lot of call_rcu or other activity, i.e. the writer,
> so it can tick along the different states quickly).
> 
> I found that just 2 threads (reader and writer) bound to the same CPU would
> trigger it fastest; my reader has quite a long rcu read section.
> 
> I'm not sure why rcutorture doesn't trigger it for everyone. I'm surprised
> it does not have much longer maximum read delays: I would have thought
> several ms would be useful to keep a critical section open while the
> rcu engine runs through a number of states...

Hit it in 10 seconds once I actually got HOTPLUG_CPU disabled.

The theory behind the default settings for rcutorture is as follows:

o	Having two reader threads for each CPU helps ensure interactions
	between those threads.

o	The writer is normally going to have to share a CPU with a
	reader or two, maybe three.  This should force reader-writer
	interactions.

o	The read-hold time needs to be long enough to ensure interactions
	with the writer, but if it is too long, there are too few
	rcu_read_lock() and rcu_read_unlock() events to really stress
	the read-side processing.

o	The four fakewriters ensure interaction between multiple
	writers.

To Nick's point, I did use a hacked-up rcutorture with millisecond
read-side delays when debugging preemptable RCU, but I also used stock
rcutorture.

I will give this some thought and see if the defaults should change or
if more knobs are needed.

							Thanx, Paul


* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
  2008-07-10 18:06         ` Ingo Molnar
  2008-07-11  4:11           ` Nick Piggin
@ 2008-08-01 21:09           ` Paul E. McKenney
       [not found]           ` <20080710204157.GG6877@linux.vnet.ibm.com>
  2 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki,
	Kamalesh Babulal, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List

On Thu, Jul 10, 2008 at 08:06:20PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > cool! :)
> > 
> > (hm, could anyone please resend Nick's original mail? The original one 
> > is not in my lkml folder nor on lkml.org - only the quoted one.)
> 
> ok, got the mail now:
> 
> | | Annoyed this wasn't a crazy obscure error in the algorithm I could 
> | | fix :) [...]
> 
> Paul recently ran a formal proof against all sorts of RCU details (and 
> found and fixed a few obscure races that way that no-one ever 
> triggered), so i'd be quite surprised if we found anything in the core 
> algorithm :-)
> 
> | | [...] I spent all day debugging it and had to make a special test 
> | | case (rcutorture didn't seem to trigger it), and a big RCU state 
> | | logging infrastructure to log millions of RCU state transitions and 
> | | events. Oh well.
> 
> nice debugging!

Indeed!!!

> Acked-by: Ingo Molnar <mingo@elte.hu>
> 
> i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG + 
> RCU_PREEMPT kernels and never saw this. Nor did Paul. That aspect is 
> weird.

Turns out that my environment was silently re-enabling HOTPLUG_CPU, so I
only -thought- I was testing !HOTPLUG_CPU.  Once I forced it to really
disable HOTPLUG_CPU (by manually also specifying CONFIG_SUSPEND=n and
CONFIG_HIBERNATION=n), then rcutorture complained within 10 seconds.

Sigh!!!

						Thanx, Paul

* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
       [not found]           ` <20080710204157.GG6877@linux.vnet.ibm.com>
@ 2008-08-01 21:09             ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC
  To: Ingo Molnar
  Cc: Linus Torvalds, Alexey Dobriyan, Nick Piggin, Rafael J. Wysocki,
	Kamalesh Babulal, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List

On Thu, Jul 10, 2008 at 01:41:57PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 10, 2008 at 08:06:20PM +0200, Ingo Molnar wrote:
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > cool! :)
> > > 
> > > (hm, could anyone please resend Nick's original mail? The original one 
> > > is not in my lkml folder nor on lkml.org - only the quoted one.)
> > 
> > ok, got the mail now:
> > 
> > | | Annoyed this wasn't a crazy obscure error in the algorithm I could 
> > | | fix :) [...]
> > 
> > Paul recently ran a formal proof against all sorts of RCU details (and 
> > found and fixed a few obscure races that way that no-one ever 
> > triggered), so i'd be quite surprised if we found anything in the core 
> > algorithm :-)

Yeah, it was instead the simple stuff that I messed up...  :-/

> > | | [...] I spent all day debugging it and had to make a special test 
> > | | case (rcutorture didn't seem to trigger it), and a big RCU state 
> > | | logging infrastructure to log millions of RCU state transitions and 
> > | | events. Oh well.
> > 
> > nice debugging!
> 
> Indeed!!!
> 
> > Acked-by: Ingo Molnar <mingo@elte.hu>
> > 
> > i'm wondering why rcutorture didn't trigger it. I do run !HOTPLUG + 
> > RCU_PREEMPT kernels and never saw this. Nor did Paul. That aspect is 
> > weird.
> 
> Turns out that my environment was silently re-enabling HOTPLUG_CPU, so I
> only -thought- I was testing !HOTPLUG_CPU.  Once I forced it to really
> disable HOTPLUG_CPU (by manually also specifying CONFIG_SUSPEND=n and
> CONFIG_HIBERNATION=n), then rcutorture complained within 10 seconds.
> 
> Sigh!!!

And Nick's patch gets rid of the rcutorture failures for me as well,
now that I can reproduce them.  ;-)

							Thanx, Paul

* Re: 2.6.26-rc9-git4: Reported regressions from 2.6.25
       [not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
  2008-07-10  9:03   ` Kamalesh Babulal
  2008-07-10 11:02   ` Alexey Dobriyan
@ 2008-08-01 21:09   ` Paul E. McKenney
  2 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:09 UTC
  To: Nick Piggin
  Cc: Rafael J. Wysocki, Kamalesh Babulal, Alexey Dobriyan,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

On Thu, Jul 10, 2008 at 05:25:35PM +1000, Nick Piggin wrote:
> On Wednesday 09 July 2008 07:37, Rafael J. Wysocki wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10629
> > Subject		: 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160
> > Submitter	: Alexey Dobriyan <adobriyan@gmail.com>
> > Date		: 2008-05-05 09:59 (65 days old)
> > References	: http://lkml.org/lkml/2008/5/5/28
> > Handled-By	: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Attached is my fix for this problem. I don't think it is a regression
> as such, but it can't hurt to go into 2.6.26 IMO.
> 
> PREEMPT_RCU without HOTPLUG_CPU is broken. rcu_online_cpu() is called to
> initially populate rcu_cpu_online_map with all online CPUs when the hotplug
> event handler is installed, and also to populate the map with CPUs as they
> come online. The former case is meant to happen with and without HOTPLUG_CPU,
> but without HOTPLUG_CPU, rcu_online_cpu() is no-oped: while it still gets
> called, it does not set the RCU CPU map.
> 
> With a blank RCU CPU map, grace periods get to tick by completely oblivious
> to active RCU read-side critical sections. This results in bugs where memory
> is freed before its grace period has really ended.
> 
> Fix is obvious once the problem is known. (Also, change __devinit to
> __cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels).

I officially feel extremely stupid.  Thank you -very- much for tracking
this down, Nick!!!  And especially for the fix!

I will give this a good testing.  In the meantime:

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> 
> Annoyed this wasn't a crazy obscure error in the algorithm I could fix :)
> I spent all day debugging it and had to make a special test case (rcutorture
> didn't seem to trigger it), and a big RCU state logging infrastructure to log
> millions of RCU state transitions and events. Oh well.
> 
> Index: linux-2.6/kernel/rcupreempt.c
> ===================================================================
> --- linux-2.6.orig/kernel/rcupreempt.c	2008-07-10 17:08:56.000000000 +1000
> +++ linux-2.6/kernel/rcupreempt.c	2008-07-10 17:09:10.000000000 +1000
> @@ -925,26 +925,22 @@ void rcu_offline_cpu(int cpu)
>  	spin_unlock_irqrestore(&rdp->lock, flags);
>  }
> 
> -void __devinit rcu_online_cpu(int cpu)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> -	cpu_set(cpu, rcu_cpu_online_map);
> -	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> -}
> -
>  #else /* #ifdef CONFIG_HOTPLUG_CPU */
> 
>  void rcu_offline_cpu(int cpu)
>  {
>  }
> 
> -void __devinit rcu_online_cpu(int cpu)
> +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +
> +void __cpuinit rcu_online_cpu(int cpu)
>  {
> -}
> +	unsigned long flags;
> 
> -#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */
> +	spin_lock_irqsave(&rcu_ctrlblk.fliplock, flags);
> +	cpu_set(cpu, rcu_cpu_online_map);
> +	spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, flags);
> +}
> 
>  static void rcu_process_callbacks(struct softirq_action *unused)
>  {


Thread overview: 13+ messages
2008-07-08 21:37 2.6.26-rc9-git4: Reported regressions from 2.6.25 Rafael J. Wysocki
2008-07-09  4:49 ` Randy Dunlap
2008-07-09 14:35   ` Rafael J. Wysocki
     [not found] ` <200807101725.36175.nickpiggin@yahoo.com.au>
2008-07-10  9:03   ` Kamalesh Babulal
2008-07-10 11:02   ` Alexey Dobriyan
2008-07-10 17:21     ` Linus Torvalds
2008-07-10 17:34       ` Ingo Molnar
2008-07-10 18:06         ` Ingo Molnar
2008-07-11  4:11           ` Nick Piggin
2008-08-01 21:09             ` Paul E. McKenney
2008-08-01 21:09           ` Paul E. McKenney
     [not found]           ` <20080710204157.GG6877@linux.vnet.ibm.com>
2008-08-01 21:09             ` Paul E. McKenney
2008-08-01 21:09   ` Paul E. McKenney