2.6.25-rc7-git2: Reported regressions from 2.6.24

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* 2.6.25-rc7-git2: Reported regressions from 2.6.24
@ 2008-03-27 22:53 Rafael J. Wysocki
  2008-03-28  0:18 ` Carlos R. Mafra
                   ` (3 more replies)
  0 siblings, 4 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-27 22:53 UTC (permalink / raw)
  To: LKML; +Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich

This message contains a list of some regressions from 2.6.24 reported since
2.6.25-rc1 was released, for which there are no fixes in the mainline I know
of.  If any of them have been fixed already, please let me know.

If you know of any other unresolved regressions from 2.6.24, please let me know
either and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-03-27      171       38          30
  2008-03-22      159       35          31
  2008-03-17      148       38          30
  2008-03-16      146       42          35
  2008-03-14      145       45          39
  2008-03-12      143       51          41
  2008-03-11      141       58          43
  2008-03-10      138       66          47
  2008-03-03      115       65          49
  2008-02-25       90       51          39
  2008-02-17       61       45          37


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9412
Subject		: commit a878539ef994787c447a98c2e3ba0fe3dad984ec breaks boot on SB600 AHCI
Submitter	: Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au>
Date		: 2008-03-12 17:15 (16 days old)
Handled-By	: Jeff Garzik <jgarzik@pobox.com>
		  Richard Zhao <richard.zhao@amd.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9962
Subject		: mount: could not find filesystem
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date		: 2008-02-12 14:34 (45 days old)
References	: http://lkml.org/lkml/2008/2/12/91
Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
		  Yinghai Lu <yhlu.kernel@gmail.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9976
Subject		: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
Submitter	: Ben Nizette <bn@niasdigital.com>
Date		: 2008-02-12 12:46 (45 days old)
References	: http://lkml.org/lkml/2008/2/12/148
Handled-By	: Haavard Skinnemoen <hskinnemoen@atmel.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9978
Subject		: 2.6.25-rc1: volanoMark regression
Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date		: 2008-02-13 10:30 (44 days old)
References	: http://lkml.org/lkml/2008/2/13/128
		  http://lkml.org/lkml/2008/3/12/52
		  http://lkml.org/lkml/2008/3/18/81
Handled-By	: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
		  Balbir Singh <balbir@linux.vnet.ibm.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9980
Subject		: 2.6.25-rc1 on Sun Ultra 40- HPET clocksource which causes it to hang
Submitter	: Jasper Bryant-Greene <jasper@unix.geek.nz>
Date		: 2008-02-13 12:25 (44 days old)
References	: http://lkml.org/lkml/2008/2/13/181
Handled-By	: Yinghai Lu <Yinghai.Lu@sun.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10011
Subject		: The computer is blocked when X is started - unless max_cstate=2 - Acer Travelmate 4001 Lmi
Submitter	: François Valenduc <francois.valenduc@tvcablenet.be>
Date		: 2008-02-17 06:28 (40 days old)
Handled-By	: Thomas Gleixner <tglx@linutronix.de>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10027
Subject		: 2.6.25-rc[12] Video4Linux Bttv Regression
Submitter	: Bongani Hlope <bonganilinux@mweb.co.za>
Date		: 2008-02-17 09:36 (40 days old)
References	: http://lkml.org/lkml/2008/2/17/55
Handled-By	: Mauro Carvalho Chehab <mchehab@infradead.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10067
Subject		: TUNER_TDA8290=y, VIDEO_DEV=n build error
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-02-22 10:36 (35 days old)
References	: http://lkml.org/lkml/2008/2/19/262


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10082
Subject		: 2.6.25-rc2-git4 - Kernel oops while running kernbench and tbench on powerpc
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date		: 2008-02-20 16:01 (37 days old)
References	: http://lkml.org/lkml/2008/2/20/218
		  http://lkml.org/lkml/2008/1/18/71


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10093
Subject		: 2.6.25-current-git hangs on boot unless CONFIG_CPU_IDLE=n - Apple
Submitter	: Soeren Sonnenburg <kernel@nn7.de>
Date		: 2008-02-23 18:55 (34 days old)
References	: http://lkml.org/lkml/2008/2/23/263
		  http://marc.info/?l=linux-acpi&amp;m=120387537018467&amp;w=4
Handled-By	: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10117
Subject		: 2.6.25-current-git hangs on boot (pci=nommconf helps)
Submitter	: Soeren Sonnenburg <kernel@nn7.de>
Date		: 2008-02-23 18:55 (34 days old)
References	: http://lkml.org/lkml/2008/2/23/263


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10133
Subject		: INFO: possible circular locking in the resume
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-02-27 (30 days old)
References	: http://lkml.org/lkml/2008/2/26/479
Handled-By	: Gautham R Shenoy <ego@in.ibm.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10156
Subject		: KVM &amp; Qemu crashed with infinite recursive kernel loop in the guest
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-02-28 11:25 (29 days old)
References	: http://lkml.org/lkml/2008/2/28/106


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10172
Subject		: kvm: INFO: inconsistent lock state
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-03-05 03:26 (23 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10235
Subject		: 2.6.25-rc5: Blank Screen with Intel 945
Submitter	: Justin Madru <jdm64@gawab.com>
Date		: 2008-03-12 12:02 (16 days old)
References	: http://lkml.org/lkml/2008/3/12/290
Handled-By	: Jesse Barnes <jbarnes@virtuousgeek.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10290
Subject		: [BUG] Linux 2.6.25-rc6 - kernel BUG at fs/mpage.c:476! on powerpc
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date		: 2008-03-20 13:13 (8 days old)
References	: http://lkml.org/lkml/2008/3/20/39


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10291
Subject		: 2.6.25-rc6 hangs at resume after suspend to RAM on Mac mini Core Duo
Submitter	: Tino Keitel <tino.keitel@gmx.de>
Date		: 2008-03-20 07:05 (8 days old)
References	: http://lkml.org/lkml/2008/3/20/23


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10295
Subject		: rfkill continous console output
Submitter	: Bruce Duncan <bwduncan@gmail.com>
Date		: 2008-03-21 02:01 (7 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10300
Subject		: volume wheel does not work in 2.6.25-rc6
Submitter	: Romano Giannetti <romano.giannetti@gmail.com>
Date		: 2008-03-21 11:42 (7 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10302
Subject		: 2.6.25-git regression with snd-hda-intel on Dell XPS M1330, no analog sound
Submitter	: Andre Tomt <andre@tomt.net>
Date		: 2008-03-21 20:03 (7 days old)
References	: http://lkml.org/lkml/2008/3/21/295
Handled-By	: Matthew Ranostay <mranostay@embeddedalley.com>
		  Takashi Iwai <tiwai@suse.de>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10305
Subject		: 2.6.25-rc6: kernel BUG at fs/sysfs/file.c:89
Submitter	: Christian Kujau <lists@nerdbynature.de>
Date		: 2008-03-21 23:35 (7 days old)
References	: http://lkml.org/lkml/2008/3/21/408


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10318
Subject		: WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Submitter	: Pawel Staszewski <pstaszewski@artcom.pl>
Date		: 2008-03-25 02:50 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10319
Subject		: 2.6.25-rc6 regression - hang on resume
Submitter	: Soeren Sonnenburg <kernel@nn7.de>
Date		: 2008-03-25 04:44 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10320
Subject		: rt2x00 does not associate or give scan results
Submitter	: Marcus Better <marcus@better.se>
Date		: 2008-03-25 06:04 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10323
Subject		: panic using bridging on linus kernel 2.6.25-rc6
Submitter	: Andy Gospodarek <andy@greyhouse.net>
Date		: 2008-03-25 11:40 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10324
Subject		: kernel panic ip_route_input
Submitter	: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Date		: 2008-03-25 12:48 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10343
Subject		: 103 sec. latency: sync_page() with TASK_UNINTERRUPTIBLE (?) (bisected)
Submitter	: Carlos R. Mafra <crmafra2@gmail.com>
Date		: 2008-03-23 00:47 (5 days old)
References	: http://lkml.org/lkml/2008/3/22/211
Handled-By	: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10344
Subject		: [2.6.25-rc6] possible regression: X server dying
Submitter	: Tilman Schmidt <tilman@imap.cc>
Date		: 2008-03-24 23:38 (4 days old)
References	: http://lkml.org/lkml/2008/3/24/260
Handled-By	: Dave Airlie <airlied@gmail.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10345
Subject		: 2.6.25-rc7: Ugh. (USB HID problems after resume)
Submitter	: Mark Lord <lkml@rtr.ca>
Date		: 2008-03-27 11:29 (1 days old)
References	: http://lkml.org/lkml/2008/3/27/228
Handled-By	: Greg KH <gregkh@suse.de>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10346
Subject		: Linux 2.6.25-rc6: WARNING: at net/ipv4/tcp_input.c:2510
Submitter	: Georgi Chorbadzhiyski <gf@unixsol.org>
Date		: 2008-03-27 17:29 (1 days old)
References	: http://lkml.org/lkml/2008/3/27/246


Regressionn with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
Subject		: 2.6.24-git15 Keyboard Issue?
Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
Date		: 2008-02-06 14:02 (51 days old)
References	: http://lkml.org/lkml/2008/2/6/100
		  http://lkml.org/lkml/2008/2/13/82
Handled-By	: Thomas Gleixner <tglx@linutronix.de>
Patch		: http://lkml.org/lkml/2008/2/15/343


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10153
Subject		: (regression) kernel/timeconst.h bugs with HZ=128
Submitter	: David Brownell <david-b@pacbell.net>
Date		: 2008-02-26 19:32 (31 days old)
References	: http://lkml.org/lkml/2008/2/26/294
Handled-By	: H. Peter Anvin <hpa@zytor.com>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15114&amp;action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=15115&amp;action=view


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10186
Subject		: SCSI_AIC94XX must depend on SCSI
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-03-06 19:09 (22 days old)
References	: http://marc.info/?l=linux-kernel&amp;m=120483073617232&amp;w=2
Handled-By	: Adrian Bunk <bunk@kernel.org>
Patch		: http://marc.info/?l=linux-kernel&amp;m=120483499725928&amp;w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10242
Subject		: rm command hangs
Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Date		: 2008-03-14 05:47 (14 days old)
Handled-By	: Alasdair G Kergon <agk@redhat.com>
Patch		: http://lkml.org/lkml/2008/3/27/293


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10271
Subject		: dm raid1: bitops bug
Submitter	: Heiko Carstens <heiko.carstens@de.ibm.com>
Date		: 2008-03-17 16:45 (11 days old)
References	: http://lkml.org/lkml/2008/3/17/174
Handled-By	: Alasdair G Kergon <agk@redhat.com>
Patch		: http://lkml.org/lkml/2008/3/27/296


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10304
Subject		: 2.6.25-rc5-git5 KVM memory not freed
Submitter	: Ioan Ionita <opslynx@gmail.com>
Date		: 2008-03-20 18:41 (8 days old)
References	: http://lkml.org/lkml/2008/3/20/136
Handled-By	: Avi Kivity <avi@qumranet.com>
Patch		: http://lkml.org/lkml/2008/3/23/216


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10326
Subject		: inconsistent lock state in net_rx_action
Submitter	: Marcus Better <marcus@better.se>
Date		: 2008-03-25 13:21 (3 days old)
Patch		: http://bugzilla.kernel.org/show_bug.cgi?id=10326#c20


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10340
Subject		: suspend order - again
Submitter	: Shaohua Li <shaohua.li@intel.com>
Date		: 2008-03-14 09:22 (14 days old)
References	: http://marc.info/?l=linux-acpi&amp;m=120549574009914&amp;w=4
Handled-By	: Shaohua Li <shaohua.li@intel.com>
Patch		: http://marc.info/?l=linux-acpi&amp;m=120649617317396&amp;w=4


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.24,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9832

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-27 22:53 2.6.25-rc7-git2: Reported regressions from 2.6.24 Rafael J. Wysocki
@ 2008-03-28  0:18 ` Carlos R. Mafra
  2008-03-28  0:23   ` Rafael J. Wysocki
  2008-03-28  2:30 ` Linus Torvalds
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 55+ messages in thread
From: Carlos R. Mafra @ 2008-03-28  0:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich

On Thu 27.Mar'08 at 23:53:50 +0100, Rafael J. Wysocki wrote:
> This message contains a list of some regressions from 2.6.24 reported since
> 2.6.25-rc1 was released, for which there are no fixes in the mainline I know
> of.  If any of them have been fixed already, please let me know.
> [...]
 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10343
> Subject		: 103 sec. latency: sync_page() with TASK_UNINTERRUPTIBLE (?) (bisected)
> Submitter	: Carlos R. Mafra <crmafra2@gmail.com>
> Date		: 2008-03-23 00:47 (5 days old)
> References	: http://lkml.org/lkml/2008/3/22/211
> Handled-By	: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

This regression is already solved. The patch which caused this problem 
was reverted by Andrew Morton in 4dd4b920218326231156c7991ce5b94afad841c3.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  0:18 ` Carlos R. Mafra
@ 2008-03-28  0:23   ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28  0:23 UTC (permalink / raw)
  To: Carlos R. Mafra
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich

On Friday, 28 of March 2008, Carlos R. Mafra wrote:
> On Thu 27.Mar'08 at 23:53:50 +0100, Rafael J. Wysocki wrote:
> > This message contains a list of some regressions from 2.6.24 reported since
> > 2.6.25-rc1 was released, for which there are no fixes in the mainline I know
> > of.  If any of them have been fixed already, please let me know.
> > [...]
>  
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10343
> > Subject		: 103 sec. latency: sync_page() with TASK_UNINTERRUPTIBLE (?) (bisected)
> > Submitter	: Carlos R. Mafra <crmafra2@gmail.com>
> > Date		: 2008-03-23 00:47 (5 days old)
> > References	: http://lkml.org/lkml/2008/3/22/211
> > Handled-By	: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> This regression is already solved. The patch which caused this problem 
> was reverted by Andrew Morton in 4dd4b920218326231156c7991ce5b94afad841c3.

Thanks, closed.

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-27 22:53 2.6.25-rc7-git2: Reported regressions from 2.6.24 Rafael J. Wysocki
  2008-03-28  0:18 ` Carlos R. Mafra
@ 2008-03-28  2:30 ` Linus Torvalds
  2008-03-28  3:24   ` Christoph Lameter
                     ` (3 more replies)
  2008-03-28 10:24 ` Thomas Gleixner
  2008-03-28 18:57 ` Mark Lord
  3 siblings, 4 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28  2:30 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pawel Staszewski, Christoph Lameter
  Cc: LKML, Adrian Bunk, Andrew Morton, Natalie Protasevich

On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9412
> Subject		: commit a878539ef994787c447a98c2e3ba0fe3dad984ec breaks boot on SB600 AHCI
> Submitter	: Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au>
> Date		: 2008-03-12 17:15 (16 days old)
> Handled-By	: Jeff Garzik <jgarzik@pobox.com>
> 		  Richard Zhao <richard.zhao@amd.com>

Fixed by 4cde32fc4b32e96a99063af3183acdfd54c563f0, methinks.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9962
> Subject		: mount: could not find filesystem
> Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> Date		: 2008-02-12 14:34 (45 days old)
> References	: http://lkml.org/lkml/2008/2/12/91
> Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> 		  Yinghai Lu <yhlu.kernel@gmail.com>

Needs more info. The original oops that opened it is fixed, but..

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9976
> Subject		: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> Submitter	: Ben Nizette <bn@niasdigital.com>
> Date		: 2008-02-12 12:46 (45 days old)
> References	: http://lkml.org/lkml/2008/2/12/148
> Handled-By	: Haavard Skinnemoen <hskinnemoen@atmel.com>

This one seems gone (and was apparently AVR-only):

http://lkml.org/lkml/2008/2/13/607:
  "What ever the problem is it isn't immediately apparent in latest git so
   I guess we'll just have to keep our eyes peeled."

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9978
> Subject		: 2.6.25-rc1: volanoMark regression
> Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> Date		: 2008-02-13 10:30 (44 days old)
> References	: http://lkml.org/lkml/2008/2/13/128
> 		  http://lkml.org/lkml/2008/3/12/52
> 		  http://lkml.org/lkml/2008/3/18/81
> Handled-By	: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> 		  Balbir Singh <balbir@linux.vnet.ibm.com>

Hmm. It is a regression on one machine (2x quad-core stoakley), but not 
another (4x quad-core tigerton).

Interestingly, the stoakley box numbers have apparently been all over the 
map.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10318
> Subject		: WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
> Submitter	: Pawel Staszewski <pstaszewski@artcom.pl>
> Date		: 2008-03-25 02:50 (3 days old)

Andrew and seems to have debugged this down to a kzalloc(GFP_ATOMIC) or 
similar.

I wonder if the bug is in that commit 
3811dbf67162bd08412f1b0e02e554f353e93bdb ("SLUB: remove useless masking 
of GFP_ZERO"), because I don't think that masking was at all useless and I 
think my original 7fd272550bd43cc1d7289ef0ab2fa50de137e767 was correct.

That apparently bogus commit says "GFP_ZERO is already masked out in 
new_slab()", but gfpflags is not just used for new_slab(), but for 
kmalloc_large() too.  Which does *not* clear GFP_ZERO.

Pawel, does reverting 3811dbf67162bd08412f1b0e02e554f353e93bdb fix it for 
you?

Also, Rafael - do these reminder emails also go to the people who are 
mentioned in the regressions (especially people who are set up as being 
"hanled-by" or having patches for the problem)? 

		Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  2:30 ` Linus Torvalds
@ 2008-03-28  3:24   ` Christoph Lameter
  2008-03-28  4:00     ` Linus Torvalds
  2008-03-28  3:31   ` Yinghai Lu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28  3:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Thu, 27 Mar 2008, Linus Torvalds wrote:

> 
> 
> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> >
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9412
> > Subject		: commit a878539ef994787c447a98c2e3ba0fe3dad984ec breaks boot on SB600 AHCI
> > Submitter	: Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au>
> > Date		: 2008-03-12 17:15 (16 days old)
> > Handled-By	: Jeff Garzik <jgarzik@pobox.com>
> > 		  Richard Zhao <richard.zhao@amd.com>
> 
> Fixed by 4cde32fc4b32e96a99063af3183acdfd54c563f0, methinks.
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9962
> > Subject		: mount: could not find filesystem
> > Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Date		: 2008-02-12 14:34 (45 days old)
> > References	: http://lkml.org/lkml/2008/2/12/91
> > Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > 		  Yinghai Lu <yhlu.kernel@gmail.com>
> 
> Needs more info. The original oops that opened it is fixed, but..
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject		: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter	: Ben Nizette <bn@niasdigital.com>
> > Date		: 2008-02-12 12:46 (45 days old)
> > References	: http://lkml.org/lkml/2008/2/12/148
> > Handled-By	: Haavard Skinnemoen <hskinnemoen@atmel.com>
> 
> This one seems gone (and was apparently AVR-only):
> 
> http://lkml.org/lkml/2008/2/13/607:
>   "What ever the problem is it isn't immediately apparent in latest git so
>    I guess we'll just have to keep our eyes peeled."
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9978
> > Subject		: 2.6.25-rc1: volanoMark regression
> > Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> > Date		: 2008-02-13 10:30 (44 days old)
> > References	: http://lkml.org/lkml/2008/2/13/128
> > 		  http://lkml.org/lkml/2008/3/12/52
> > 		  http://lkml.org/lkml/2008/3/18/81
> > Handled-By	: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> > 		  Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> Hmm. It is a regression on one machine (2x quad-core stoakley), but not 
> another (4x quad-core tigerton).
> 
> Interestingly, the stoakley box numbers have apparently been all over the 
> map.
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10318
> > Subject		: WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
> > Submitter	: Pawel Staszewski <pstaszewski@artcom.pl>
> > Date		: 2008-03-25 02:50 (3 days old)
> 
> Andrew and seems to have debugged this down to a kzalloc(GFP_ATOMIC) or 
> similar.

Slab allocations can never use GFP_HIGHMEM. Slab allocators BUG 
if either of these bits are set (checks on the slowpaths):

#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)

GFP flags are not masked/checked if either inline fallback to the page 
allocator occurs (SLUB for >4k allocs) or if an allocation is forwarded
to the page allocator (SLOB, SLUB). They are also not checked on the 
fastpaths.


AFAICT the check in kmap_atomic_prot is simply too strict.

void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t 
prot)
{
        enum fixed_addresses idx;
        unsigned long vaddr;
        /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */

        debug_kmap_atomic_prot(type);

        pagefault_disable();

        if (!PageHighMem(page))
                return page_address(page);


The check for PageHighMem(page) needs to either come before the 
debug_kmap_atomic_prot() or kmap_atomic_prot should only be called for 
HIGHMEM allocations. Otherwise any get_zeroed_page() alloc from an 
interrupt context may cause a false positive here.

Seems to be a reoccurrence of something that I discussed with Andrew a 
while back.

http://marc.info/?t=118790336700011&r=1&w=2

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  3:24   ` Christoph Lameter
@ 2008-03-28  4:00     ` Linus Torvalds
  2008-03-28 10:48       ` Paweł Staszewski
                         ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28  4:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich



On Thu, 27 Mar 2008, Christoph Lameter wrote:
> 
> Slab allocations can never use GFP_HIGHMEM.

Totally irrelevant.

The page allocation path does

        if (gfp_flags & __GFP_ZERO)
                prep_zero_page(page, order, gfp_flags);

and that will cause a warning REGARDLESS of whether the page is a HIGHMEM 
page or not.

And the fact is, passing in GFP_ZERO from the SLUB code is a bug 
regardless, because it unnecessarily does the dual memset().

So here's a damn big clue:
 - SLUB does its own GFP_ZERO handling
 - so passing GFP_ZERO down to the page allocator is a f*cking bug
 - and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even 
   whether the warning is "valid" or not - it's a bug even if the warning 
   had never happened.

So stop blathering, and just admit that this was buggy. It was also 
fundamentally fragile to leave GFP_ZERO around when it was known to not be 
valid at that point (exactly because GFP_ZERO was handled by the caller).

		Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  4:00     ` Linus Torvalds
@ 2008-03-28 10:48       ` Paweł Staszewski
  2008-03-28 17:46         ` Andrew Morton
  2008-03-28 17:15       ` Pekka Enberg
  2008-03-28 18:33       ` Christoph Lameter
  2 siblings, 1 reply; 55+ messages in thread
From: Paweł Staszewski @ 2008-03-28 10:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Rafael J. Wysocki, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

Linus Torvalds pisze:
> On Thu, 27 Mar 2008, Christoph Lameter wrote:
>   
>> Slab allocations can never use GFP_HIGHMEM.
>>     
>
> Totally irrelevant.
>
> The page allocation path does
>
>         if (gfp_flags & __GFP_ZERO)
>                 prep_zero_page(page, order, gfp_flags);
>
> and that will cause a warning REGARDLESS of whether the page is a HIGHMEM 
> page or not.
>
> And the fact is, passing in GFP_ZERO from the SLUB code is a bug 
> regardless, because it unnecessarily does the dual memset().
>
> So here's a damn big clue:
>  - SLUB does its own GFP_ZERO handling
>  - so passing GFP_ZERO down to the page allocator is a f*cking bug
>  - and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even 
>    whether the warning is "valid" or not - it's a bug even if the warning 
>    had never happened.
>
> So stop blathering, and just admit that this was buggy. It was also 
> fundamentally fragile to leave GFP_ZERO around when it was known to not be 
> valid at that point (exactly because GFP_ZERO was handled by the caller).
>
> 		Linus
>
>
>   
Sorry for offtopic but i have the same problem with kernels 2.6.25-*
like:
http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1274804
http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1270334

I search linux-netdev and found this links.
I only sugest that 

	Denys Fedoryshchenko

can have the same problem that i have with this kernels.
I must revert my all kernels to 2.6.23.11 to get stable work on high (ip 
traffic) loads.

And there is no documentation for LRO... and Stephen Hemminger write me 
that LRO is not compatible with bridgeing and routing.
see this link:
http://bugzilla.kernel.org/show_bug.cgi?id=10335


So there must be some documentation for this ... because people can have 
many problems with this.


Sorry for offtopic but this can resolve problems like my and Denys .

Pawel


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 10:48       ` Paweł Staszewski
@ 2008-03-28 17:46         ` Andrew Morton
  2008-03-28 21:57           ` Rafael J. Wysocki
  0 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2008-03-28 17:46 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Linus Torvalds, Christoph Lameter, Rafael J. Wysocki, LKML,
	Adrian Bunk, Natalie Protasevich, netdev

On Fri, 28 Mar 2008 11:48:55 +0100 Pawe__ Staszewski <pstaszewski@artcom.pl> wrote:

> Linus Torvalds pisze:
> > On Thu, 27 Mar 2008, Christoph Lameter wrote:
> >   
> >> Slab allocations can never use GFP_HIGHMEM.
> >>     
> >
> > Totally irrelevant.
> >
> > The page allocation path does
> >
> >         if (gfp_flags & __GFP_ZERO)
> >                 prep_zero_page(page, order, gfp_flags);
> >
> > and that will cause a warning REGARDLESS of whether the page is a HIGHMEM 
> > page or not.
> >
> > And the fact is, passing in GFP_ZERO from the SLUB code is a bug 
> > regardless, because it unnecessarily does the dual memset().
> >
> > So here's a damn big clue:
> >  - SLUB does its own GFP_ZERO handling
> >  - so passing GFP_ZERO down to the page allocator is a f*cking bug
> >  - and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even 
> >    whether the warning is "valid" or not - it's a bug even if the warning 
> >    had never happened.
> >
> > So stop blathering, and just admit that this was buggy. It was also 
> > fundamentally fragile to leave GFP_ZERO around when it was known to not be 
> > valid at that point (exactly because GFP_ZERO was handled by the caller).
> >
> > 		Linus
> >
> >
> >   
> Sorry for offtopic but i have the same problem with kernels 2.6.25-*
> like:
> http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1274804
> http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1270334
> 
> I search linux-netdev and found this links.
> I only sugest that 
> 
> 	Denys Fedoryshchenko
> 
> can have the same problem that i have with this kernels.
> I must revert my all kernels to 2.6.23.11 to get stable work on high (ip 
> traffic) loads.
> 
> And there is no documentation for LRO... and Stephen Hemminger write me 
> that LRO is not compatible with bridgeing and routing.
> see this link:
> http://bugzilla.kernel.org/show_bug.cgi?id=10335
> 
> 
> So there must be some documentation for this ... because people can have 
> many problems with this.
> 

These are all networking things, so let's cc that list.

> 
> Sorry for offtopic but this can resolve problems like my and Denys .

It's very on-topic - thanks for the reminder.

Rafael, are these things actually on the list?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 17:46         ` Andrew Morton
@ 2008-03-28 21:57           ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 21:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Paweł Staszewski, Linus Torvalds, Christoph Lameter, LKML,
	Adrian Bunk, Natalie Protasevich, netdev

On Friday, 28 of March 2008, Andrew Morton wrote:
> On Fri, 28 Mar 2008 11:48:55 +0100 Pawe__ Staszewski <pstaszewski@artcom.pl> wrote:
> 
> > Linus Torvalds pisze:
> > > On Thu, 27 Mar 2008, Christoph Lameter wrote:
> > >   
> > >> Slab allocations can never use GFP_HIGHMEM.
> > >>     
> > >
> > > Totally irrelevant.
> > >
> > > The page allocation path does
> > >
> > >         if (gfp_flags & __GFP_ZERO)
> > >                 prep_zero_page(page, order, gfp_flags);
> > >
> > > and that will cause a warning REGARDLESS of whether the page is a HIGHMEM 
> > > page or not.
> > >
> > > And the fact is, passing in GFP_ZERO from the SLUB code is a bug 
> > > regardless, because it unnecessarily does the dual memset().
> > >
> > > So here's a damn big clue:
> > >  - SLUB does its own GFP_ZERO handling
> > >  - so passing GFP_ZERO down to the page allocator is a f*cking bug
> > >  - and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even 
> > >    whether the warning is "valid" or not - it's a bug even if the warning 
> > >    had never happened.
> > >
> > > So stop blathering, and just admit that this was buggy. It was also 
> > > fundamentally fragile to leave GFP_ZERO around when it was known to not be 
> > > valid at that point (exactly because GFP_ZERO was handled by the caller).
> > >
> > > 		Linus
> > >
> > >
> > >   
> > Sorry for offtopic but i have the same problem with kernels 2.6.25-*
> > like:
> > http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1274804
> > http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1270334
> > 
> > I search linux-netdev and found this links.
> > I only sugest that 
> > 
> > 	Denys Fedoryshchenko
> > 
> > can have the same problem that i have with this kernels.
> > I must revert my all kernels to 2.6.23.11 to get stable work on high (ip 
> > traffic) loads.
> > 
> > And there is no documentation for LRO... and Stephen Hemminger write me 
> > that LRO is not compatible with bridgeing and routing.
> > see this link:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10335
> > 
> > 
> > So there must be some documentation for this ... because people can have 
> > many problems with this.
> > 
> 
> These are all networking things, so let's cc that list.
> 
> > 
> > Sorry for offtopic but this can resolve problems like my and Denys .
> 
> It's very on-topic - thanks for the reminder.
> 
> Rafael, are these things actually on the list?

Well, this isn't a recent regression, at least not from 2.6.24, so I don't
list it.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  4:00     ` Linus Torvalds
  2008-03-28 10:48       ` Paweł Staszewski
@ 2008-03-28 17:15       ` Pekka Enberg
  2008-03-28 17:27         ` Linus Torvalds
  2008-03-28 18:33       ` Christoph Lameter
  2 siblings, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-28 17:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

Hi Christoph,

On Fri, Mar 28, 2008 at 6:00 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>  And the fact is, passing in GFP_ZERO from the SLUB code is a bug
>  regardless, because it unnecessarily does the dual memset().

We clear GFP_ZERO in new_slab() so the normal kmalloc()/kzalloc() path
should be fine but don't do it for kmalloc_large() nor
kmalloc_large_node(). Is that the bug here?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 17:15       ` Pekka Enberg
@ 2008-03-28 17:27         ` Linus Torvalds
  2008-03-28 18:08           ` Pekka Enberg
  2008-03-28 18:37           ` Christoph Lameter
  0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 17:27 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

On Fri, 28 Mar 2008, Pekka Enberg wrote:
> 
> On Fri, Mar 28, 2008 at 6:00 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >  And the fact is, passing in GFP_ZERO from the SLUB code is a bug
> >  regardless, because it unnecessarily does the dual memset().
> 
> We clear GFP_ZERO in new_slab() so the normal kmalloc()/kzalloc() path
> should be fine but don't do it for kmalloc_large() nor
> kmalloc_large_node(). Is that the bug here?

Dammit, NO.

The bug was that the commit I made (which was correct and robust) was then 
partially reverted by Christoph for no good reason. At that point, 
kmalloc_large() didn't even exist, so at that point the change was 
"technically correct" (since the only user of gfpflags really did end up 
clearing it somewhere deep in its callchain).

So when that original 3811dbf67162bd08412f1b0e02e554f353e93bdb happened, 
it wasn't an outright bug - but that doesn't make it right. That commit 
was always just a bug waiting to happen, because it just set things up for 
later problems by retaining that bit when it really shouldn't have been 
retained, and forcing all future callers to be careful. Which they 
obviously were not!

Yes, you can clear GFP_ZERO in multiple illogical places, and it will fix 
the bug. Or you can clear it in *one* place, that is in on the direct 
callchain from the person who actually does the memset(0), and even add a 
comment that says exactly what is going on.

So the fact is, commit 3811dbf67162bd08412f1b0e02e554f353e93bdb is and was 
total and utter crap. I've reverted it in my tree. It's crap not because 
it was buggy when it was put in, but because it was *fragile* when it was 
put in. And that fragility ended up causing a bug later.

I'm getting really tired of slub. It was supposed to be simpler code than 
slab, and yes, it's simpler, but it has been buggy as hell, and part of it 
has been that people just haven't been careful enough, and haven't written 
code to be defensive and easy-to-follow.

So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle 
places based on new random code being added. We want to clear it at the 
top level, so that no other code never ever even has to _think_ about it!

			Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 17:27         ` Linus Torvalds
@ 2008-03-28 18:08           ` Pekka Enberg
  2008-03-28 18:20             ` Linus Torvalds
  2008-03-28 18:37           ` Christoph Lameter
  1 sibling, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-28 18:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

Hi Linus,

On Fri, 28 Mar 2008, Pekka Enberg wrote:
> > We clear GFP_ZERO in new_slab() so the normal kmalloc()/kzalloc() path
> > should be fine but don't do it for kmalloc_large() nor
> > kmalloc_large_node(). Is that the bug here?

On Fri, Mar 28, 2008 at 7:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>  Dammit, NO.
>
>  The bug was that the commit I made (which was correct and robust) was then
>  partially reverted by Christoph for no good reason. At that point,
>  kmalloc_large() didn't even exist, so at that point the change was
>  "technically correct" (since the only user of gfpflags really did end up
>  clearing it somewhere deep in its callchain).

I was not implying that we should clear GFP_ZERO in kmalloc_large()
but that we can hit the page allocator with GFP_ZERO via kmalloc() and
kzalloc() for size > PAGE_SIZE allocations. And asking Christoph if
that's the bug we're seeing here.

On Fri, Mar 28, 2008 at 7:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>  So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle
>  places based on new random code being added. We want to clear it at the
>  top level, so that no other code never ever even has to _think_ about it!

We are clearing it in one place, just before calling alloc_pages.
[Yes, it's hard to spot, it's in new_slab() where we call
allocate_slab().] I'm okay with moving it to top level but I don't see
how that fixes any of the bugs mentioned here.

                        Pekka

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:08           ` Pekka Enberg
@ 2008-03-28 18:20             ` Linus Torvalds
  2008-03-28 18:38               ` Christoph Lameter
                                 ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 18:20 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich



On Fri, 28 Mar 2008, Pekka Enberg wrote:
> 
> On Fri, Mar 28, 2008 at 7:27 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >  So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle
> >  places based on new random code being added. We want to clear it at the
> >  top level, so that no other code never ever even has to _think_ about it!
> 
> We are clearing it in one place, just before calling alloc_pages.

BUT THAT IS TOTALLY IRRELEVANT.

We're not clearing it in kmalloc_large()!

What's so hard to understand?

> [Yes, it's hard to spot, it's in new_slab() where we call
> allocate_slab().] I'm okay with moving it to top level but I don't see
> how that fixes any of the bugs mentioned here.

That stupid clearing in new_slab() is totally and utterly irrelevant (in 
addition to the fact that it's hard to spot).

The point was never new_slab(). So why do you even mention it?

The code in question is __slab_alloc(). It did *not* clear it correctly 
before its uses (__slab_alloc -> kmalloc_large -> __get_free_pages).

			Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:20             ` Linus Torvalds
@ 2008-03-28 18:38               ` Christoph Lameter
  2008-03-28 18:47               ` Andrew Morton
  2008-03-28 19:59               ` Pekka Enberg
  2 siblings, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pekka Enberg, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

On Fri, 28 Mar 2008, Linus Torvalds wrote:

> The code in question is __slab_alloc(). It did *not* clear it correctly 
> before its uses (__slab_alloc -> kmalloc_large -> __get_free_pages).

But why could have __GFP_ZERO there be harmful? Its a non highmem 
allocation. No kmap needed.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:20             ` Linus Torvalds
  2008-03-28 18:38               ` Christoph Lameter
@ 2008-03-28 18:47               ` Andrew Morton
  2008-03-28 18:53                 ` Christoph Lameter
  2008-03-28 19:37                 ` Linus Torvalds
  2008-03-28 19:59               ` Pekka Enberg
  2 siblings, 2 replies; 55+ messages in thread
From: Andrew Morton @ 2008-03-28 18:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: penberg, clameter, rjw, pstaszewski, linux-kernel, bunk, protasnb

On Fri, 28 Mar 2008 11:20:52 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The code in question is __slab_alloc().

The bug we're discussing is http://bugzilla.kernel.org/show_bug.cgi?id=10318

It's a driver call stright into the page allocator - slab is not involved.

I was planning on plugging it this way:


From: Andrew Morton <akpm@linux-foundation.org>

I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false
positive.  There's no way in which networking will be using highmem pages
here, so it won't be taking the KM_USER0 kmap slot, so there's no point in
performing these checks.

Cc: Pawel Staszewski <pstaszewski@artcom.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/mm/highmem_32.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff -puN arch/x86/mm/highmem_32.c~x86-kmap_atomic-debugging-only-run-debug_kmap_atomic_prot-for-highmem-pages arch/x86/mm/highmem_32.c
--- a/arch/x86/mm/highmem_32.c~x86-kmap_atomic-debugging-only-run-debug_kmap_atomic_prot-for-highmem-pages
+++ a/arch/x86/mm/highmem_32.c
@@ -73,15 +73,15 @@ void *kmap_atomic_prot(struct page *page
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
-	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
-
-	debug_kmap_atomic_prot(type);
 
+	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 
 	if (!PageHighMem(page))
 		return page_address(page);
 
+	debug_kmap_atomic_prot(type);
+
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	BUG_ON(!pte_none(*(kmap_pte-idx)));
_


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:47               ` Andrew Morton
@ 2008-03-28 18:53                 ` Christoph Lameter
  2008-03-28 19:37                 ` Linus Torvalds
  1 sibling, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, penberg, rjw, pstaszewski, linux-kernel, bunk,
	protasnb

On Fri, 28 Mar 2008, Andrew Morton wrote:

> I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false
> positive.  There's no way in which networking will be using highmem pages
> here, so it won't be taking the KM_USER0 kmap slot, so there's no point in
> performing these checks.

Reviewed-by: Christoph Lameter <clameter@sgi.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:47               ` Andrew Morton
  2008-03-28 18:53                 ` Christoph Lameter
@ 2008-03-28 19:37                 ` Linus Torvalds
  2008-03-28 19:59                   ` Linus Torvalds
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: penberg, clameter, rjw, pstaszewski, linux-kernel, bunk, protasnb

On Fri, 28 Mar 2008, Andrew Morton wrote:
>
> I was planning on plugging it this way:

.. and thereby losing the point of doing it the old way.

The old way caused more warnigns, but more importantly, it caused warnings 
on machines that didn't actually _have_ highmem pages. Which is actually 
the big majority of them.

That was the whole (and only) point of the debugging! If you only test the 
page address, you lose all the coverage!

It would be better if we actually passed in the gfp_flags, and then we 
could test the __GFP_HIGH bit rather than the page address.  But for now, 
the rule is that GFP_ATOMIC and __GFP_ZERO do not work together, because 
this sanity test currently cannot work for that case.

		Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 19:37                 ` Linus Torvalds
@ 2008-03-28 19:59                   ` Linus Torvalds
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: penberg, clameter, rjw, pstaszewski, linux-kernel, bunk, protasnb



On Fri, 28 Mar 2008, Linus Torvalds wrote:
> 
> It would be better if we actually passed in the gfp_flags, and then we 
> could test the __GFP_HIGH bit rather than the page address.  But for now, 
> the rule is that GFP_ATOMIC and __GFP_ZERO do not work together, because 
> this sanity test currently cannot work for that case.

Gaah. Sadly, we don't actually have any reasonably sane way of doing that. 

So I guess we'll have to lose the coverage of kmap_atomic() in other 
places, even though the only case where this was worth losing it for was 
that one clear_highpage() call site :/

		Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:20             ` Linus Torvalds
  2008-03-28 18:38               ` Christoph Lameter
  2008-03-28 18:47               ` Andrew Morton
@ 2008-03-28 19:59               ` Pekka Enberg
  2008-03-28 20:24                 ` Linus Torvalds
  2 siblings, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-28 19:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

Linus Torvalds wrote:
> That stupid clearing in new_slab() is totally and utterly irrelevant (in 
> addition to the fact that it's hard to spot).
> 
> The point was never new_slab(). So why do you even mention it?

Because I didn't spot that we're calling kmalloc_large() in the fallback 
path too. I was referring to the page allocator pass-through 
kmalloc_large() call that happens before we hit __slab_alloc. My bad, sorry.

			Pekka

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 19:59               ` Pekka Enberg
@ 2008-03-28 20:24                 ` Linus Torvalds
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 20:24 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

On Fri, 28 Mar 2008, Pekka Enberg wrote:
> 
> Because I didn't spot that we're calling kmalloc_large() in the fallback path
> too. I was referring to the page allocator pass-through kmalloc_large() call
> that happens before we hit __slab_alloc. My bad, sorry.

So this is kind of my point of the whole thing. It's really hard to even 
realize that there are multiple points - unless you do it at the top 
level.

And yeah, the double clearing is just a performance issue in a rare and 
odd case, and if it hadn't been for the very anal debug checks we would 
never have even noticed. But code that is hard to follow and has really 
subtle non-local implications is fundamentally more likely to have these 
kinds of problems. Which is why I'm arguing against code like that.

			Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 17:27         ` Linus Torvalds
  2008-03-28 18:08           ` Pekka Enberg
@ 2008-03-28 18:37           ` Christoph Lameter
  2008-03-28 19:32             ` Linus Torvalds
  1 sibling, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pekka Enberg, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

On Fri, 28 Mar 2008, Linus Torvalds wrote:

> The bug was that the commit I made (which was correct and robust) was then 
> partially reverted by Christoph for no good reason. At that point, 
> kmalloc_large() didn't even exist, so at that point the change was 
> "technically correct" (since the only user of gfpflags really did end up 
> clearing it somewhere deep in its callchain).

That commit cleared a flag that was cleared later anyways....

> So the fact is, commit 3811dbf67162bd08412f1b0e02e554f353e93bdb is and was 
> total and utter crap. I've reverted it in my tree. It's crap not because 
> it was buggy when it was put in, but because it was *fragile* when it was 
> put in. And that fragility ended up causing a bug later.

Do we have confirmation that the revert actually does something? It 
certainly does not address the issues with inline page forwarding. And why 
are we forbidding page allocator allocs with __GFP_ZERO in interrupt 
context? Only __GFP_HIGHMEM needs kmap and only __GFP_HIGHMEM can cause 
trouble.

> I'm getting really tired of slub. It was supposed to be simpler code than 
> slab, and yes, it's simpler, but it has been buggy as hell, and part of it 
> has been that people just haven't been careful enough, and haven't written 
> code to be defensive and easy-to-follow.

You may want to check SLAB recent bug list. There is still crap over there 
as well and AFAICT this is more severe than SLUB and its inherent in the 
complex scheme that we had to put in ther.
 
> So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle 
> places based on new random code being added. We want to clear it at the 
> top level, so that no other code never ever even has to _think_ about it!

__GFP_ZERO is cleared in one place when new_slab() calls allocate_slab(). 
The fallback code in 2.6.25 will go away in 2.6.26.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:37           ` Christoph Lameter
@ 2008-03-28 19:32             ` Linus Torvalds
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich



On Fri, 28 Mar 2008, Christoph Lameter wrote:
> 
> __GFP_ZERO is cleared in one place when new_slab() calls allocate_slab(). 

You're so full of crap that it's not even funny.

The fact that one path does it correct makes it ok to do it wrong in 
another path?

> The fallback code in 2.6.25 will go away in 2.6.26.

And the fact that 2.5.26 may fix it differently makes it ok to be buggy in 
2.6.25?

You are not making any sense what-so-ever.

		Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  4:00     ` Linus Torvalds
  2008-03-28 10:48       ` Paweł Staszewski
  2008-03-28 17:15       ` Pekka Enberg
@ 2008-03-28 18:33       ` Christoph Lameter
  2008-03-28 19:25         ` Linus Torvalds
  2 siblings, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Thu, 27 Mar 2008, Linus Torvalds wrote:

> Totally irrelevant.
> 
> The page allocation path does
> 
>         if (gfp_flags & __GFP_ZERO)
>                 prep_zero_page(page, order, gfp_flags);
> 
> and that will cause a warning REGARDLESS of whether the page is a HIGHMEM 
> page or not.


prep_zero_page does:

static inline void prep_zero_page(struct page *page, int order, gfp_t 
gfp_flags)
{
        int i;

        /*
         * clear_highpage() will use KM_USER0, so it's a bug to use __GFP_ZERO
         * and __GFP_HIGHMEM from hard or soft interrupt context.
         */
        VM_BUG_ON((gfp_flags & __GFP_HIGHMEM) && in_interrupt());
        for (i = 0; i < (1 << order); i++)
                clear_highpage(page + i);
}

So we forbit __GFP_HIGHMEM and in_interrupt which makes sense. The simple 
forwarding of large kmallocs to the page allocator as done by SLUB / SLOB 
is fine.

Then clear_highpage calls additional checking functions that have 
the effect of generally forbiding zeroing in interrupt context if 
CONFIG_HIGHMEM is set. This is wrong and needs to be fixed.

> And the fact is, passing in GFP_ZERO from the SLUB code is a bug 
> regardless, because it unnecessarily does the dual memset().

Well that is only the fallback path of __slab_alloc which is not triggered 
here and not performance sensitive. We could clear the flag there but 
that is irrevelant for this issue.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:33       ` Christoph Lameter
@ 2008-03-28 19:25         ` Linus Torvalds
  2008-03-29 20:42           ` Christoph Lameter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Fri, 28 Mar 2008, Christoph Lameter wrote:
> 
> prep_zero_page does:
> 
> static inline void prep_zero_page(struct page *page, int order, gfp_t  gfp_flags)
> {
>         int i;
> 
>         /*
>          * clear_highpage() will use KM_USER0, so it's a bug to use __GFP_ZERO
>          * and __GFP_HIGHMEM from hard or soft interrupt context.
>          */
>         VM_BUG_ON((gfp_flags & __GFP_HIGHMEM) && in_interrupt());
>         for (i = 0; i < (1 << order); i++)
>                 clear_highpage(page + i);

.. and clear_highpage() does:

	void *kaddr = kmap_atomic(page, KM_USER0);
	clear_page(kaddr);

.. where kmap_atomic() on x86 does:

	kmap_atomic() ->
	  kmap_atomic_prot() ->
	    debug_kmap_atomic_prot() ->
	      if (in_irq())
	        WARN_ON_ONCE()

none of which are at all conditional on __GFP_HIGHMEM.

But none of this is relevant. The warning possibly didn't even come from 
slub, it just made me look at it - because *something* is doing GFP_ATOMIC 
together with __GFP_ZERO, and it became obvious that SLUB is one potential 
cause of that.

And the SLUB case simply isn't valid!

> Then clear_highpage calls additional checking functions that have 
> the effect of generally forbiding zeroing in interrupt context if 
> CONFIG_HIGHMEM is set. This is wrong and needs to be fixed.

No. Dammit, the bug is in SLUB.

If SLUB *ever* calls the page allocator with __GFP_ZERO set, it's a 
bug, and that has nothing to do with GFP_ATOMIC or anything else. Because 
SLUB uses its own logic for clearing the result.

Why cannot you just admit it?

Now, _outside_ of SLUB there appear to be other users too, and those users 
need to either be fixed or we need to allow __GFP_ZERO togethe with 
GFP_ATOMIC. But the fact is, SLUB had a really stupid bug that it 
shouldn't have had.

		Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 19:25         ` Linus Torvalds
@ 2008-03-29 20:42           ` Christoph Lameter
  2008-03-29 21:29             ` Linus Torvalds
  0 siblings, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-29 20:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Fri, 28 Mar 2008, Linus Torvalds wrote:

> .. where kmap_atomic() on x86 does:
> 
> 	kmap_atomic() ->
> 	  kmap_atomic_prot() ->
> 	    debug_kmap_atomic_prot() ->
> 	      if (in_irq())
> 	        WARN_ON_ONCE()
> 
> none of which are at all conditional on __GFP_HIGHMEM.

kmap check for PageHighmem and does not do a kmap for regular pages.
So this is actually okay. If the allocation that was performed does not 
allow GFP_HIGHMEM then the kmap will never use a real mapping. The check 
should not trigger.

> > Then clear_highpage calls additional checking functions that have 
> > the effect of generally forbiding zeroing in interrupt context if 
> > CONFIG_HIGHMEM is set. This is wrong and needs to be fixed.
> 
> No. Dammit, the bug is in SLUB.
> 
> If SLUB *ever* calls the page allocator with __GFP_ZERO set, it's a 
> bug, and that has nothing to do with GFP_ATOMIC or anything else. Because 
> SLUB uses its own logic for clearing the result.

Yes it uses its own logic if the object is managed by SLUB but not if the 
object is too big and/or the allocation forwarded to the page allocator 
or for other internal allocations of buffers etc.
 
> Why cannot you just admit it?

Admitting something that is not true is rather difficult.

> Now, _outside_ of SLUB there appear to be other users too, and those users 
> need to either be fixed or we need to allow __GFP_ZERO togethe with 
> GFP_ATOMIC. But the fact is, SLUB had a really stupid bug that it 
> shouldn't have had.

So what you want is to forbid any use of

	alloc_pages(__GFP_ZERO|...)

from an interrupt context? That works fine on most platforms and used to 
work fine on x86 as well until the check was added on January 30th.

If we really want this then the check in prep_zero_page should be changed 
too:

---
 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2008-03-29 13:40:42.166669333 -0700
+++ linux-2.6/mm/page_alloc.c	2008-03-29 13:41:21.039168276 -0700
@@ -317,7 +317,7 @@ static inline void prep_zero_page(struct
 	 * clear_highpage() will use KM_USER0, so it's a bug to use __GFP_ZERO
 	 * and __GFP_HIGHMEM from hard or soft interrupt context.
 	 */
-	VM_BUG_ON((gfp_flags & __GFP_HIGHMEM) && in_interrupt());
+	VM_BUG_ON(in_interrupt());
 	for (i = 0; i < (1 << order); i++)
 		clear_highpage(page + i);
 }

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-29 20:42           ` Christoph Lameter
@ 2008-03-29 21:29             ` Linus Torvalds
  2008-03-29 23:52               ` Pekka Enberg
  2008-03-31 18:45               ` Christoph Lameter
  0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-29 21:29 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Sat, 29 Mar 2008, Christoph Lameter wrote:
>
> > If SLUB *ever* calls the page allocator with __GFP_ZERO set, it's a 
> > bug, and that has nothing to do with GFP_ATOMIC or anything else. Because 
> > SLUB uses its own logic for clearing the result.
> 
> Yes it uses its own logic if the object is managed by SLUB but not if the 
> object is too big and/or the allocation forwarded to the page allocator 
> or for other internal allocations of buffers etc.

Wrong.

It uses it's own logic for __GFP_ZERO *regardless* of size.

> > Why cannot you just admit it?
> 
> Admitting something that is not true is rather difficult.

You don't have a f*cking clue about this cocde that you're supposed to be 
maintaining, do you?

See "slab_alloc()". See the code:

	if (unlikely((gfpflags & __GFP_ZERO) && object))
		memset(object, 0, c->objsize);

and see how it does it regardless of anything else.

In short, if *any* code-path calls down to any allocator from that routine 
with GFP_ZERO set, it's a bug. No ifs, buts or maybes about it. It 
shouldn't do that, because the actual memset() is done by slab_alloc(), 
and should not be done ANYWHERE ELSE.

It has *nothing* to do with "object is too big" or anything else.

> So what you want is to forbid any use of
> 
> 	alloc_pages(__GFP_ZERO|...)

No. I want you to admit the bugs in code you maintain. I want you to admit 
that slab_alloc() does the memset(), and should NEVER EVER use __GFP_ZERO
for the page allocations.

I have told you about a million times now that THIS HAS NOTHING TO DO WITH 
interrupts or HIGHMEM or *anything* else. This is purely a SLUB issue.

But don't worry. I already fixed it by reverting your broken commit. I 
just wish you could follow code that you are supposed to be maintaining.

			Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-29 21:29             ` Linus Torvalds
@ 2008-03-29 23:52               ` Pekka Enberg
  2008-03-31 18:56                 ` Christoph Lameter
  2008-03-31 18:45               ` Christoph Lameter
  1 sibling, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-29 23:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

Hi,

On Sat, 29 Mar 2008, Christoph Lameter wrote:
>  > Yes it uses its own logic if the object is managed by SLUB but not if the
>  > object is too big and/or the allocation forwarded to the page allocator
>  > or for other internal allocations of buffers etc.

On Sat, Mar 29, 2008 at 11:29 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>  Wrong.
>
>  It uses it's own logic for __GFP_ZERO *regardless* of size.

Christoph, I think you're overlooking the same thing I was until Linus
straightened me out. We're calling kmalloc_large() from __slab_alloc()
for the fall-back case which causes a bug fixed by Linus' revert.

On Sat, Mar 29, 2008 at 11:29 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>  You don't have a f*cking clue about this cocde that you're supposed to be
>  maintaining, do you?

Yeah, me too. Fortunately we have you as our upstream maintainer :-).

                        Pekka

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-29 23:52               ` Pekka Enberg
@ 2008-03-31 18:56                 ` Christoph Lameter
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-31 18:56 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Linus Torvalds, Rafael J. Wysocki, Pawel Staszewski, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

On Sun, 30 Mar 2008, Pekka Enberg wrote:

> Christoph, I think you're overlooking the same thing I was until Linus
> straightened me out. We're calling kmalloc_large() from __slab_alloc()
> for the fall-back case which causes a bug fixed by Linus' revert.

Yes, that means that __GFP_ZERO can be passed to the page allocator there. 
Same thing as happens when kmalloc() calls kmalloc_large() (as you pointed 
out before). The harm is zeroing an allocated object twice in rare cases 
that trigger the fallback path and that is avoided by Linus' patch.

For the fix to get rid of the backtraces that started this discussion see
commit 9c312058b2e530722c7bd30c1b6f26eea35dc5fe


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-29 21:29             ` Linus Torvalds
  2008-03-29 23:52               ` Pekka Enberg
@ 2008-03-31 18:45               ` Christoph Lameter
  1 sibling, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-31 18:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Sat, 29 Mar 2008, Linus Torvalds wrote:

> 
> You don't have a f*cking clue about this cocde that you're supposed to be 
> maintaining, do you?
> 
> See "slab_alloc()". See the code:
> 
> 	if (unlikely((gfpflags & __GFP_ZERO) && object))
> 		memset(object, 0, c->objsize);
> 
> and see how it does it regardless of anything else.

Yes I am very aware of that.

> In short, if *any* code-path calls down to any allocator from that routine 
> with GFP_ZERO set, it's a bug. No ifs, buts or maybes about it. It 
> shouldn't do that, because the actual memset() is done by slab_alloc(), 
> and should not be done ANYWHERE ELSE.
> 
> It has *nothing* to do with "object is too big" or anything else.

It has to do how large objects are allocated through kmalloc_large().
kmalloc_large() is elsewhere called with unfiltered gfpflags and relies 
on zeroing being handled by the page allocator. It can take unfiltered gfp 
flags.

The filtering of __GFP_ZERO that you added avoids the double zeroing for 
the fallback path (which is only called if all the partial lists are empty 
and after the page allocator went through reclaim and did not get the 
large sized memory we wanted). So okay the patch could be a performance 
enhancement. But then it adds the filtering to the hot path instead of the 
code path that containts the kmalloc_large that is executed once in a blue 
moon. The hot path should only filter when we actually decide that we need 
to allocate a new slab from the page allocator.

It seemed to me that the reason for inserting the filtering of __GFP_ZERO 
there was the belief that the page allocator cannot take __GFP_ZERO 
through kmalloc_large() if we are in an interrupt.

The use of kmalloc_large() in __slab_alloc() is a bit strange at this 
point. The cleanup work in 2.6.26 will make this all nice again.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  2:30 ` Linus Torvalds
  2008-03-28  3:24   ` Christoph Lameter
@ 2008-03-28  3:31   ` Yinghai Lu
  2008-03-31 10:14     ` Kamalesh Babulal
  2008-03-28 11:29   ` Haavard Skinnemoen
  2008-03-28 16:10   ` Rafael J. Wysocki
  3 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2008-03-28  3:31 UTC (permalink / raw)
  To: Linus Torvalds, Kamalesh Babulal, Ingo Molnar
  Cc: Rafael J. Wysocki, Pawel Staszewski, Christoph Lameter, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

On Thu, Mar 27, 2008 at 7:30 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
>  On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
>  >
...
>
>  > Bug-Entry     : http://bugzilla.kernel.org/show_bug.cgi?id=9962
>  > Subject               : mount: could not find filesystem
>  > Submitter     : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
>  > Date          : 2008-02-12 14:34 (45 days old)
>  > References    : http://lkml.org/lkml/2008/2/12/91
>  > Handled-By    : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
>  >                 Yinghai Lu <yhlu.kernel@gmail.com>
>
>  Needs more info. The original oops that opened it is fixed, but..
>
Kamalesh,
can you check that?

YH

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  3:31   ` Yinghai Lu
@ 2008-03-31 10:14     ` Kamalesh Babulal
  2008-03-31 12:10       ` Rafael J. Wysocki
  0 siblings, 1 reply; 55+ messages in thread
From: Kamalesh Babulal @ 2008-03-31 10:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Ingo Molnar, Rafael J. Wysocki, Pawel Staszewski,
	Christoph Lameter, LKML, Adrian Bunk, Andrew Morton,
	Natalie Protasevich

Yinghai Lu wrote:
> On Thu, Mar 27, 2008 at 7:30 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>>  On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
>>  >
> ...
>>  > Bug-Entry     : http://bugzilla.kernel.org/show_bug.cgi?id=9962
>>  > Subject               : mount: could not find filesystem
>>  > Submitter     : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
>>  > Date          : 2008-02-12 14:34 (45 days old)
>>  > References    : http://lkml.org/lkml/2008/2/12/91
>>  > Handled-By    : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
>>  >                 Yinghai Lu <yhlu.kernel@gmail.com>
>>
>>  Needs more info. The original oops that opened it is fixed, but..
>>
> Kamalesh,
> can you check that?
> 
> YH
Hi,

I have tested the 2.6.25-rc7 kernel and the kernel boots up without the panic.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-31 10:14     ` Kamalesh Babulal
@ 2008-03-31 12:10       ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-31 12:10 UTC (permalink / raw)
  To: Kamalesh Babulal
  Cc: Yinghai Lu, Linus Torvalds, Ingo Molnar, Pawel Staszewski,
	Christoph Lameter, LKML, Adrian Bunk, Andrew Morton,
	Natalie Protasevich

On Monday, 31 of March 2008, Kamalesh Babulal wrote:
> Yinghai Lu wrote:
> > On Thu, Mar 27, 2008 at 7:30 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >>  On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> >>  >
> > ...
> >>  > Bug-Entry     : http://bugzilla.kernel.org/show_bug.cgi?id=9962
> >>  > Subject               : mount: could not find filesystem
> >>  > Submitter     : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> >>  > Date          : 2008-02-12 14:34 (45 days old)
> >>  > References    : http://lkml.org/lkml/2008/2/12/91
> >>  > Handled-By    : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> >>  >                 Yinghai Lu <yhlu.kernel@gmail.com>
> >>
> >>  Needs more info. The original oops that opened it is fixed, but..
> >>
> > Kamalesh,
> > can you check that?
> > 
> > YH
> Hi,
> 
> I have tested the 2.6.25-rc7 kernel and the kernel boots up without the panic.

Thanks for the update, I closed the bug.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  2:30 ` Linus Torvalds
  2008-03-28  3:24   ` Christoph Lameter
  2008-03-28  3:31   ` Yinghai Lu
@ 2008-03-28 11:29   ` Haavard Skinnemoen
  2008-03-28 16:11     ` Rafael J. Wysocki
  2008-03-28 16:10   ` Rafael J. Wysocki
  3 siblings, 1 reply; 55+ messages in thread
From: Haavard Skinnemoen @ 2008-03-28 11:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Pawel Staszewski, Christoph Lameter, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich, Ben Nizette

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject		: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter	: Ben Nizette <bn@niasdigital.com>
> > Date		: 2008-02-12 12:46 (45 days old)
> > References	: http://lkml.org/lkml/2008/2/12/148
> > Handled-By	: Haavard Skinnemoen <hskinnemoen@atmel.com>  
> 
> This one seems gone (and was apparently AVR-only):
> 
> http://lkml.org/lkml/2008/2/13/607:
>   "What ever the problem is it isn't immediately apparent in latest git so
>    I guess we'll just have to keep our eyes peeled."

Yes, I haven't been able to reproduce it, so I was going to ask for it
to be removed. I suspect it must be some sort of -rc1 weirdness that
has been fixed by a later commit.

I don't feel good about closing it without knowing what it was, but
since neither Ben nor I can reproduce it anymore, I don't have any
better suggestions.

Haavard

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 11:29   ` Haavard Skinnemoen
@ 2008-03-28 16:11     ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 16:11 UTC (permalink / raw)
  To: Haavard Skinnemoen
  Cc: Linus Torvalds, Pawel Staszewski, Christoph Lameter, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich, Ben Nizette

On Friday, 28 of March 2008, Haavard Skinnemoen wrote:
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > > Subject		: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > > Submitter	: Ben Nizette <bn@niasdigital.com>
> > > Date		: 2008-02-12 12:46 (45 days old)
> > > References	: http://lkml.org/lkml/2008/2/12/148
> > > Handled-By	: Haavard Skinnemoen <hskinnemoen@atmel.com>  
> > 
> > This one seems gone (and was apparently AVR-only):
> > 
> > http://lkml.org/lkml/2008/2/13/607:
> >   "What ever the problem is it isn't immediately apparent in latest git so
> >    I guess we'll just have to keep our eyes peeled."
> 
> Yes, I haven't been able to reproduce it, so I was going to ask for it
> to be removed. I suspect it must be some sort of -rc1 weirdness that
> has been fixed by a later commit.
> 
> I don't feel good about closing it without knowing what it was, but
> since neither Ben nor I can reproduce it anymore, I don't have any
> better suggestions.

I closed it as "unreproducible".

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28  2:30 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2008-03-28 11:29   ` Haavard Skinnemoen
@ 2008-03-28 16:10   ` Rafael J. Wysocki
  2008-03-28 16:47     ` Linus Torvalds
  3 siblings, 1 reply; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 16:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pawel Staszewski, Christoph Lameter, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Friday, 28 of March 2008, Linus Torvalds wrote:
> 
> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> >
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9412
> > Subject		: commit a878539ef994787c447a98c2e3ba0fe3dad984ec breaks boot on SB600 AHCI
> > Submitter	: Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au>
> > Date		: 2008-03-12 17:15 (16 days old)
> > Handled-By	: Jeff Garzik <jgarzik@pobox.com>
> > 		  Richard Zhao <richard.zhao@amd.com>
> 
> Fixed by 4cde32fc4b32e96a99063af3183acdfd54c563f0, methinks.

Closed.

> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9962
> > Subject		: mount: could not find filesystem
> > Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Date		: 2008-02-12 14:34 (45 days old)
> > References	: http://lkml.org/lkml/2008/2/12/91
> > Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > 		  Yinghai Lu <yhlu.kernel@gmail.com>
> 
> Needs more info. The original oops that opened it is fixed, but..
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject		: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter	: Ben Nizette <bn@niasdigital.com>
> > Date		: 2008-02-12 12:46 (45 days old)
> > References	: http://lkml.org/lkml/2008/2/12/148
> > Handled-By	: Haavard Skinnemoen <hskinnemoen@atmel.com>
> 
> This one seems gone (and was apparently AVR-only):
> 
> http://lkml.org/lkml/2008/2/13/607:
>   "What ever the problem is it isn't immediately apparent in latest git so
>    I guess we'll just have to keep our eyes peeled."

Closed, too.
    
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9978
> > Subject		: 2.6.25-rc1: volanoMark regression
> > Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> > Date		: 2008-02-13 10:30 (44 days old)
> > References	: http://lkml.org/lkml/2008/2/13/128
> > 		  http://lkml.org/lkml/2008/3/12/52
> > 		  http://lkml.org/lkml/2008/3/18/81
> > Handled-By	: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> > 		  Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> Hmm. It is a regression on one machine (2x quad-core stoakley), but not 
> another (4x quad-core tigerton).
> 
> Interestingly, the stoakley box numbers have apparently been all over the 
> map.
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10318
> > Subject		: WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
> > Submitter	: Pawel Staszewski <pstaszewski@artcom.pl>
> > Date		: 2008-03-25 02:50 (3 days old)
> 
> Andrew and seems to have debugged this down to a kzalloc(GFP_ATOMIC) or 
> similar.
> 
> I wonder if the bug is in that commit 
> 3811dbf67162bd08412f1b0e02e554f353e93bdb ("SLUB: remove useless masking 
> of GFP_ZERO"), because I don't think that masking was at all useless and I 
> think my original 7fd272550bd43cc1d7289ef0ab2fa50de137e767 was correct.
> 
> That apparently bogus commit says "GFP_ZERO is already masked out in 
> new_slab()", but gfpflags is not just used for new_slab(), but for 
> kmalloc_large() too.  Which does *not* clear GFP_ZERO.
> 
> Pawel, does reverting 3811dbf67162bd08412f1b0e02e554f353e93bdb fix it for 
> you?
> 
> Also, Rafael - do these reminder emails also go to the people who are 
> mentioned in the regressions (especially people who are set up as being 
> "hanled-by" or having patches for the problem)? 

No, they don't.  I need to do some scriptwork to make that happen.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 16:10   ` Rafael J. Wysocki
@ 2008-03-28 16:47     ` Linus Torvalds
  2008-03-28 17:36       ` Adrian Bunk
  2008-03-28 22:28       ` Rafael J. Wysocki
  0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 16:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pawel Staszewski, Christoph Lameter, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Fri, 28 Mar 2008, Rafael J. Wysocki wrote:
> > 
> > Also, Rafael - do these reminder emails also go to the people who are 
> > mentioned in the regressions (especially people who are set up as being 
> > "handled-by" or having patches for the problem)? 
> 
> No, they don't.  I need to do some scriptwork to make that happen.

It would be good. Right now I know for a fact that a lot of people read 
LKML with various filters in place (or just very spottily), so I have a 
feeling that while people try to track regressions, many people are 
probably less aware of these things than they should be. And sometimes the 
"handled-by" ends up being inaccurate (maybe somebody replied to the 
original problem, but it became obvious that it was somewhere else, and 
they remain "handled-by" even though the person doesn't actually handle 
it).

It would probably also make sense to add some of the bigger subsystem 
maintainers to the Cc (and/or with a mailing list for regressions?)

I think the regression list is _extremely_ valuable, but the problem I 
see is that it's not necessarily reaching all the involved people.

The other problem is that I think the old reports (especially the ones 
that haven't had reporter feedback in the last two weeks) end up being not 
just stale, but they sort at the top, so when the right people _do_ look 
at the list, the natural way to do so with email is to look at the first 
ones first, and they *all* tend to be stale.

So the reaction is often "need more info" or "I think this was fixed 
already two weeks ago, but there hasn't been any reply". Which is 
psychologically really bad, because after you've seen three or four of 
those, you just dismiss the rest (even if the later ones then may be much 
more relevant!)

This is why I have always been advocating so aggressive culling of 
regressions and bug-reports - stale bug-reports are worse than useless, 
they actually _hurt_.

			Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 16:47     ` Linus Torvalds
@ 2008-03-28 17:36       ` Adrian Bunk
  2008-03-28 20:33         ` Ingo Molnar
  2008-03-28 22:28       ` Rafael J. Wysocki
  1 sibling, 1 reply; 55+ messages in thread
From: Adrian Bunk @ 2008-03-28 17:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Pawel Staszewski, Christoph Lameter, LKML,
	Andrew Morton, Natalie Protasevich

On Fri, Mar 28, 2008 at 09:47:47AM -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 28 Mar 2008, Rafael J. Wysocki wrote:
> > > 
> > > Also, Rafael - do these reminder emails also go to the people who are 
> > > mentioned in the regressions (especially people who are set up as being 
> > > "handled-by" or having patches for the problem)? 
> > 
> > No, they don't.  I need to do some scriptwork to make that happen.
> 
> It would be good. Right now I know for a fact that a lot of people read 
> LKML with various filters in place (or just very spottily), so I have a 
> feeling that while people try to track regressions, many people are 
> probably less aware of these things than they should be. And sometimes the 
> "handled-by" ends up being inaccurate (maybe somebody replied to the 
> original problem, but it became obvious that it was somewhere else, and 
> they remain "handled-by" even though the person doesn't actually handle 
> it).
> 
> It would probably also make sense to add some of the bigger subsystem 
> maintainers to the Cc (and/or with a mailing list for regressions?)
>...

When I tracked regressions I also Cc'ed all submitters, maintainers of 
all affected subsystems, and the people who said they'd handle the 
reports.

The problem I ran into was vger dropping emails with >= 30 recipients, 
which meant that I had to split the regression list into half a dozen 
emails (splitted by the area of the regressions) when I tracked at about
30 regressions.

But that was one year ago when we had only half as many regressions per 
release as they do now, with 2.6.25 we had a peak of 66 pending 
regressions...

> 			Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 17:36       ` Adrian Bunk
@ 2008-03-28 20:33         ` Ingo Molnar
  0 siblings, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2008-03-28 20:33 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Linus Torvalds, Rafael J. Wysocki, Pawel Staszewski,
	Christoph Lameter, LKML, Andrew Morton, Natalie Protasevich

* Adrian Bunk <bunk@kernel.org> wrote:

> But that was one year ago when we had only half as many regressions 
> per release as they do now, with 2.6.25 we had a peak of 66 pending 
> regressions...

we have twice as many commits, and we have better test coverage. I get 
the impression that user trust is coming back as well: regressions are 
being reported sooner and more persistently - because we are handling 
them in a more structured and more dependable way. We also seem to have 
more users of latest -git.

so it _appears_ to be an increase in bugginess but i believe it's an 
increase of activity and it's all good IMO, we close 90% of the 
regressions within a week or two, and most of the regressions are for 
obscure cases.

I had 2.6.25 running on most of my boxes from -rc1 on without any 
unprovoked crash. (provoked bugs were another matter) Bisection became 
more practical and more widespread as well. And i periodically find bugs 
that came from ancient kernels so we are fixing bugs faster than we put 
them in i think. I didnt have that feeling in the .18-.19 kernels.

also, now that the kerneloops.org client is in Fedora 9 by default, 
we'll start to have really objective long-term statistics about how our 
users react to the bugs we put into the kernel.

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 16:47     ` Linus Torvalds
  2008-03-28 17:36       ` Adrian Bunk
@ 2008-03-28 22:28       ` Rafael J. Wysocki
  2008-03-31 13:34         ` Ingo Molnar
  1 sibling, 1 reply; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 22:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pawel Staszewski, Christoph Lameter, LKML, Adrian Bunk,
	Andrew Morton, Natalie Protasevich

On Friday, 28 of March 2008, Linus Torvalds wrote:
> 
> On Fri, 28 Mar 2008, Rafael J. Wysocki wrote:
> > > 
> > > Also, Rafael - do these reminder emails also go to the people who are 
> > > mentioned in the regressions (especially people who are set up as being 
> > > "handled-by" or having patches for the problem)? 
> > 
> > No, they don't.  I need to do some scriptwork to make that happen.
> 
> It would be good. Right now I know for a fact that a lot of people read 
> LKML with various filters in place (or just very spottily), so I have a 
> feeling that while people try to track regressions, many people are 
> probably less aware of these things than they should be. And sometimes the 
> "handled-by" ends up being inaccurate (maybe somebody replied to the 
> original problem, but it became obvious that it was somewhere else, and 
> they remain "handled-by" even though the person doesn't actually handle 
> it).

I generally agree, but OTOH I don't think that sending one message with
50+ addresses on the CC list will actually help a lot.  I personally don't like
to receive messages like that and tend to filter them out.

I'd prefer people to receive information regarding those regression reports in
which they are directly involved, but it will take some time to arrange that in
a reliable way.

> It would probably also make sense to add some of the bigger subsystem 
> maintainers to the Cc (and/or with a mailing list for regressions?)

Tell that to davem. ;-)

Seriously, it often is difficult to say who's the right maintainer and what
list is appropriate.
 
> I think the regression list is _extremely_ valuable, but the problem I 
> see is that it's not necessarily reaching all the involved people.
> 
> The other problem is that I think the old reports (especially the ones 
> that haven't had reporter feedback in the last two weeks) end up being not 
> just stale, but they sort at the top, so when the right people _do_ look 
> at the list, the natural way to do so with email is to look at the first 
> ones first, and they *all* tend to be stale.

Well, I can sort them in the reverse order just fine, if you prefer.

As far as closing the older ones is concerned, I'm a bit reluctant to do
that, because (1) some reporters reappear after a long period of silence (like
more than two weeks) and (2) it would create an impression that we _fixed_
more regressions than we actually did.

> So the reaction is often "need more info" or "I think this was fixed 
> already two weeks ago, but there hasn't been any reply". Which is 
> psychologically really bad, because after you've seen three or four of 
> those, you just dismiss the rest (even if the later ones then may be much 
> more relevant!)

Well, you shouldn't. :-)

> This is why I have always been advocating so aggressive culling of 
> regressions and bug-reports - stale bug-reports are worse than useless, 
> they actually _hurt_.

I don't quite agree here.  At least, they indicate that we may have an unfixed
problem and the fact that no one has taken care of it doesn't really mean we
should generally ignore it.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 22:28       ` Rafael J. Wysocki
@ 2008-03-31 13:34         ` Ingo Molnar
  0 siblings, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2008-03-31 13:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Pawel Staszewski, Christoph Lameter, LKML,
	Adrian Bunk, Andrew Morton, Natalie Protasevich

* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> > This is why I have always been advocating so aggressive culling of 
> > regressions and bug-reports - stale bug-reports are worse than 
> > useless, they actually _hurt_.
> 
> I don't quite agree here.  At least, they indicate that we may have an 
> unfixed problem and the fact that no one has taken care of it doesn't 
> really mean we should generally ignore it.

culling doesnt mean ignoring - it just means de-prioritizing. There's 
four basic bug categories:

1- bugs where there's inactivity on the reporter side. This we should
   de-prioritize - and reactivate them once their activity changes.

2- bugs where there's inactivity on the _maintainer_ side show bad bugs 
   in our process.

3- bugs that are old but have lots of activity are usually the most 
   difficult bugs where both side try their best to get it resolved.

4- bugs that are relatively new can be in any of the above 3 categories, 
   we dont know yet.

so i think we should list bugs in category #3 first: the hardest bugs, 
which need the most eyes. Then should we list #2 - the embarrasing bugs 
where our pocess failed. Then should we list #4 - new, not yet resolved 
bugs which need more eyes - especially in late -rc's. Then comes #1 - 
inactive bugs.

the problem for your scripting is to efficiently parse lkml activity for 
these bugs: which replies are from the "maintainer", and how "active" is 
a thread. So i guess a good heuristic is what you did in your latest 
mail: to reverse sort by age of the bug - but i'd also suggest to list 
too old entries where the bugzilla is not in NEEDINFO state - those 
indicate inactive (or unaware) maintainers.

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-27 22:53 2.6.25-rc7-git2: Reported regressions from 2.6.24 Rafael J. Wysocki
  2008-03-28  0:18 ` Carlos R. Mafra
  2008-03-28  2:30 ` Linus Torvalds
@ 2008-03-28 10:24 ` Thomas Gleixner
  2008-03-28 10:58   ` Thomas Gleixner
  2008-03-28 16:12   ` Rafael J. Wysocki
  2008-03-28 18:57 ` Mark Lord
  3 siblings, 2 replies; 55+ messages in thread
From: Thomas Gleixner @ 2008-03-28 10:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich, Yinghai Lu, H. Peter Anvin,
	Venkatesch Pallipadi

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1774 bytes --]

On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9980
> Subject		: 2.6.25-rc1 on Sun Ultra 40- HPET clocksource which causes it to hang
> Submitter	: Jasper Bryant-Greene <jasper@unix.geek.nz>
> Date		: 2008-02-13 12:25 (44 days old)
> References	: http://lkml.org/lkml/2008/2/13/181
> Handled-By	: Yinghai Lu <Yinghai.Lu@sun.com>

asked the bug reporter for an update.
 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10011
> Subject		: The computer is blocked when X is started - unless max_cstate=2 - Acer Travelmate 4001 Lmi
> Submitter	: François Valenduc <francois.valenduc@tvcablenet.be>
> Date		: 2008-02-17 06:28 (40 days old)
> Handled-By	: Thomas Gleixner <tglx@linutronix.de>

asked the bug reporter for an update.
 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> Subject		: 2.6.24-git15 Keyboard Issue?
> Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> Date		: 2008-02-06 14:02 (51 days old)
> References	: http://lkml.org/lkml/2008/2/6/100
> 		  http://lkml.org/lkml/2008/2/13/82
> Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> Patch		: http://lkml.org/lkml/2008/2/15/343
 
asked the bug reporter for an update.
 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10153
> Subject		: (regression) kernel/timeconst.h bugs with HZ=128
> Submitter	: David Brownell <david-b@pacbell.net>
> Date		: 2008-02-26 19:32 (31 days old)
> References	: http://lkml.org/lkml/2008/2/26/294
> Handled-By	: H. Peter Anvin <hpa@zytor.com>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15114&amp;action=view
> 		  http://bugzilla.kernel.org/attachment.cgi?id=15115&amp;action=view

Have those patches made it somewhere else than into the bugzilla ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 10:24 ` Thomas Gleixner
@ 2008-03-28 10:58   ` Thomas Gleixner
  2008-03-28 11:00     ` Peter Zijlstra
  2008-03-28 16:12   ` Rafael J. Wysocki
  1 sibling, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2008-03-28 10:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich, Peter Zijlstra

On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > Subject		: 2.6.24-git15 Keyboard Issue?
> > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > Date		: 2008-02-06 14:02 (51 days old)
> > References	: http://lkml.org/lkml/2008/2/6/100
> > 		  http://lkml.org/lkml/2008/2/13/82
> > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > Patch		: http://lkml.org/lkml/2008/2/15/343
>  
> asked the bug reporter for an update.

Now that lkml.org is working again, I checked the patch which is
referenced above and I have a hard time to connect it even remotely to
that bug.

As far as I can tell the discussion on lkml identified GROUP_SCHED=y
as the culprit, but I have no idea whether there was any resolution
other than disabling GROUP_SCHED.

Peter ??

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 10:58   ` Thomas Gleixner
@ 2008-03-28 11:00     ` Peter Zijlstra
  2008-03-28 11:13       ` Adrian Bunk
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2008-03-28 11:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Fri, 2008-03-28 at 11:58 +0100, Thomas Gleixner wrote:
> On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > > Subject		: 2.6.24-git15 Keyboard Issue?
> > > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > > Date		: 2008-02-06 14:02 (51 days old)
> > > References	: http://lkml.org/lkml/2008/2/6/100
> > > 		  http://lkml.org/lkml/2008/2/13/82
> > > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > > Patch		: http://lkml.org/lkml/2008/2/15/343
> >  
> > asked the bug reporter for an update.
> 
> Now that lkml.org is working again, I checked the patch which is
> referenced above and I have a hard time to connect it even remotely to
> that bug.
> 
> As far as I can tell the discussion on lkml identified GROUP_SCHED=y
> as the culprit, but I have no idea whether there was any resolution
> other than disabling GROUP_SCHED.
> 
> Peter ??

Correct, I am working on the issue but that's not going to be .25 stuff.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 11:00     ` Peter Zijlstra
@ 2008-03-28 11:13       ` Adrian Bunk
  2008-03-28 11:16         ` Thomas Gleixner
  2008-03-28 11:44         ` Peter Zijlstra
  0 siblings, 2 replies; 55+ messages in thread
From: Adrian Bunk @ 2008-03-28 11:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Rafael J. Wysocki, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Fri, Mar 28, 2008 at 12:00:25PM +0100, Peter Zijlstra wrote:
> On Fri, 2008-03-28 at 11:58 +0100, Thomas Gleixner wrote:
> > On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > > > Subject		: 2.6.24-git15 Keyboard Issue?
> > > > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > > > Date		: 2008-02-06 14:02 (51 days old)
> > > > References	: http://lkml.org/lkml/2008/2/6/100
> > > > 		  http://lkml.org/lkml/2008/2/13/82
> > > > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > > > Patch		: http://lkml.org/lkml/2008/2/15/343
> > >  
> > > asked the bug reporter for an update.
> > 
> > Now that lkml.org is working again, I checked the patch which is
> > referenced above and I have a hard time to connect it even remotely to
> > that bug.
> > 
> > As far as I can tell the discussion on lkml identified GROUP_SCHED=y
> > as the culprit, but I have no idea whether there was any resolution
> > other than disabling GROUP_SCHED.
> > 
> > Peter ??
> 
> Correct, I am working on the issue but that's not going to be .25 stuff.

How are we going to get 2.6.25 working no worse than 2.6.24?

Letting GROUP_SCHED depend on BROKEN would sound logical, but that would 
also kill FAIR_GROUP_SCHED.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 11:13       ` Adrian Bunk
@ 2008-03-28 11:16         ` Thomas Gleixner
  2008-03-28 11:31           ` Adrian Bunk
  2008-03-28 11:44         ` Peter Zijlstra
  1 sibling, 1 reply; 55+ messages in thread
From: Thomas Gleixner @ 2008-03-28 11:16 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Peter Zijlstra, Rafael J. Wysocki, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Fri, 28 Mar 2008, Adrian Bunk wrote:
> On Fri, Mar 28, 2008 at 12:00:25PM +0100, Peter Zijlstra wrote:
> > On Fri, 2008-03-28 at 11:58 +0100, Thomas Gleixner wrote:
> > > On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > > > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > > > > Subject		: 2.6.24-git15 Keyboard Issue?
> > > > > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > > > > Date		: 2008-02-06 14:02 (51 days old)
> > > > > References	: http://lkml.org/lkml/2008/2/6/100
> > > > > 		  http://lkml.org/lkml/2008/2/13/82
> > > > > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > > > > Patch		: http://lkml.org/lkml/2008/2/15/343
> > > >  
> > > > asked the bug reporter for an update.
> > > 
> > > Now that lkml.org is working again, I checked the patch which is
> > > referenced above and I have a hard time to connect it even remotely to
> > > that bug.
> > > 
> > > As far as I can tell the discussion on lkml identified GROUP_SCHED=y
> > > as the culprit, but I have no idea whether there was any resolution
> > > other than disabling GROUP_SCHED.
> > > 
> > > Peter ??
> > 
> > Correct, I am working on the issue but that's not going to be .25 stuff.
> 
> How are we going to get 2.6.25 working no worse than 2.6.24?
> 
> Letting GROUP_SCHED depend on BROKEN would sound logical, but that would 
> also kill FAIR_GROUP_SCHED.

I think default it to off should be sufficient.

Thanks,
	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 11:16         ` Thomas Gleixner
@ 2008-03-28 11:31           ` Adrian Bunk
  2008-03-28 16:17             ` Rafael J. Wysocki
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Bunk @ 2008-03-28 11:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Rafael J. Wysocki, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Fri, Mar 28, 2008 at 12:16:03PM +0100, Thomas Gleixner wrote:
> On Fri, 28 Mar 2008, Adrian Bunk wrote:
> > On Fri, Mar 28, 2008 at 12:00:25PM +0100, Peter Zijlstra wrote:
> > > On Fri, 2008-03-28 at 11:58 +0100, Thomas Gleixner wrote:
> > > > On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > > > > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > > > > > Subject		: 2.6.24-git15 Keyboard Issue?
> > > > > > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > > > > > Date		: 2008-02-06 14:02 (51 days old)
> > > > > > References	: http://lkml.org/lkml/2008/2/6/100
> > > > > > 		  http://lkml.org/lkml/2008/2/13/82
> > > > > > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > > > > > Patch		: http://lkml.org/lkml/2008/2/15/343
> > > > >  
> > > > > asked the bug reporter for an update.
> > > > 
> > > > Now that lkml.org is working again, I checked the patch which is
> > > > referenced above and I have a hard time to connect it even remotely to
> > > > that bug.
> > > > 
> > > > As far as I can tell the discussion on lkml identified GROUP_SCHED=y
> > > > as the culprit, but I have no idea whether there was any resolution
> > > > other than disabling GROUP_SCHED.
> > > > 
> > > > Peter ??
> > > 
> > > Correct, I am working on the issue but that's not going to be .25 stuff.
> > 
> > How are we going to get 2.6.25 working no worse than 2.6.24?
> > 
> > Letting GROUP_SCHED depend on BROKEN would sound logical, but that would 
> > also kill FAIR_GROUP_SCHED.
> 
> I think default it to off should be sufficient.

A system administrator saying yes to GROUP_SCHED when running
"make oldconfig" can hardly be blamed for doing so, and "makes your 
mouse and keyboard unusable" mustn't be the result.

Avoiding unexpected breakages when toggling an option is _much_ more 
important than e.g. whether some exotic randconfig configuration 
compiles.

> Thanks,
> 	tglx

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 11:31           ` Adrian Bunk
@ 2008-03-28 16:17             ` Rafael J. Wysocki
  2008-03-28 17:06               ` Adrian Bunk
  2008-03-28 20:42               ` Ingo Molnar
  0 siblings, 2 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 16:17 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Friday, 28 of March 2008, Adrian Bunk wrote:
> On Fri, Mar 28, 2008 at 12:16:03PM +0100, Thomas Gleixner wrote:
> > On Fri, 28 Mar 2008, Adrian Bunk wrote:
> > > On Fri, Mar 28, 2008 at 12:00:25PM +0100, Peter Zijlstra wrote:
> > > > On Fri, 2008-03-28 at 11:58 +0100, Thomas Gleixner wrote:
> > > > > On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > > > > > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > > > > > > Subject		: 2.6.24-git15 Keyboard Issue?
> > > > > > > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > > > > > > Date		: 2008-02-06 14:02 (51 days old)
> > > > > > > References	: http://lkml.org/lkml/2008/2/6/100
> > > > > > > 		  http://lkml.org/lkml/2008/2/13/82
> > > > > > > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > Patch		: http://lkml.org/lkml/2008/2/15/343
> > > > > >  
> > > > > > asked the bug reporter for an update.
> > > > > 
> > > > > Now that lkml.org is working again, I checked the patch which is
> > > > > referenced above and I have a hard time to connect it even remotely to
> > > > > that bug.
> > > > > 
> > > > > As far as I can tell the discussion on lkml identified GROUP_SCHED=y
> > > > > as the culprit, but I have no idea whether there was any resolution
> > > > > other than disabling GROUP_SCHED.
> > > > > 
> > > > > Peter ??
> > > > 
> > > > Correct, I am working on the issue but that's not going to be .25 stuff.
> > > 
> > > How are we going to get 2.6.25 working no worse than 2.6.24?
> > > 
> > > Letting GROUP_SCHED depend on BROKEN would sound logical, but that would 
> > > also kill FAIR_GROUP_SCHED.
> > 
> > I think default it to off should be sufficient.
> 
> A system administrator saying yes to GROUP_SCHED when running
> "make oldconfig" can hardly be blamed for doing so, and "makes your 
> mouse and keyboard unusable" mustn't be the result.
> 
> Avoiding unexpected breakages when toggling an option is _much_ more 
> important than e.g. whether some exotic randconfig configuration 
> compiles.

Perhaps add something like "(UNRELIABLE)" to the Kconfig option?

Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 16:17             ` Rafael J. Wysocki
@ 2008-03-28 17:06               ` Adrian Bunk
  2008-03-28 20:42               ` Ingo Molnar
  1 sibling, 0 replies; 55+ messages in thread
From: Adrian Bunk @ 2008-03-28 17:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Fri, Mar 28, 2008 at 05:17:17PM +0100, Rafael J. Wysocki wrote:
> On Friday, 28 of March 2008, Adrian Bunk wrote:
> > On Fri, Mar 28, 2008 at 12:16:03PM +0100, Thomas Gleixner wrote:
> > > On Fri, 28 Mar 2008, Adrian Bunk wrote:
> > > > On Fri, Mar 28, 2008 at 12:00:25PM +0100, Peter Zijlstra wrote:
> > > > > On Fri, 2008-03-28 at 11:58 +0100, Thomas Gleixner wrote:
> > > > > > On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > > > > > > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > > > > > > > Subject		: 2.6.24-git15 Keyboard Issue?
> > > > > > > > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > > > > > > > Date		: 2008-02-06 14:02 (51 days old)
> > > > > > > > References	: http://lkml.org/lkml/2008/2/6/100
> > > > > > > > 		  http://lkml.org/lkml/2008/2/13/82
> > > > > > > > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > > > > > > > Patch		: http://lkml.org/lkml/2008/2/15/343
> > > > > > >  
> > > > > > > asked the bug reporter for an update.
> > > > > > 
> > > > > > Now that lkml.org is working again, I checked the patch which is
> > > > > > referenced above and I have a hard time to connect it even remotely to
> > > > > > that bug.
> > > > > > 
> > > > > > As far as I can tell the discussion on lkml identified GROUP_SCHED=y
> > > > > > as the culprit, but I have no idea whether there was any resolution
> > > > > > other than disabling GROUP_SCHED.
> > > > > > 
> > > > > > Peter ??
> > > > > 
> > > > > Correct, I am working on the issue but that's not going to be .25 stuff.
> > > > 
> > > > How are we going to get 2.6.25 working no worse than 2.6.24?
> > > > 
> > > > Letting GROUP_SCHED depend on BROKEN would sound logical, but that would 
> > > > also kill FAIR_GROUP_SCHED.
> > > 
> > > I think default it to off should be sufficient.
> > 
> > A system administrator saying yes to GROUP_SCHED when running
> > "make oldconfig" can hardly be blamed for doing so, and "makes your 
> > mouse and keyboard unusable" mustn't be the result.
> > 
> > Avoiding unexpected breakages when toggling an option is _much_ more 
> > important than e.g. whether some exotic randconfig configuration 
> > compiles.
> 
> Perhaps add something like "(UNRELIABLE)" to the Kconfig option?

That would be called "depends on BROKEN".

There seem to be two choices:
- either FAIR_GROUP_SCHED=y brought in some situations real advantages
  with 2.6.24, then the solution is to revert commits until 2.6.25 is no 
  worse than 2.6.24 (no matter how many and how big they are) or
- we can simply let GROUP_SCHED depend on BROKEN in 2.6.25.

What we need are not hints for users how to avoid known problems, 
what we need are big cluebats for kernel developers who caused
regressions.

> Rafael

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 16:17             ` Rafael J. Wysocki
  2008-03-28 17:06               ` Adrian Bunk
@ 2008-03-28 20:42               ` Ingo Molnar
  2008-03-28 22:33                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 55+ messages in thread
From: Ingo Molnar @ 2008-03-28 20:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Adrian Bunk, Thomas Gleixner, Peter Zijlstra, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

> Subject		: 2.6.24-git15 Keyboard Issue?

Rafael, we can close this bug i think. The group scheduler related 
interactivity problems in early .25-rc were resolved via:

| commit 62fb185130e4d420f71a30ff59d8b16b74ef5d2b
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Mon Feb 25 17:34:02 2008 +0100
|
|     sched: revert load_balance_monitor() changes
[...]
|    Namely:
|     - very frequent wakeups on SMP, reported by PowerTop users.
|     - cacheline trashing on (large) SMP
|     - some latencies larger than 500ms

that includes Xorg's fragility to certain types of delays. It should be 
resolved by now and i havent seen it resurface.

we are now back to the 2.6.24 state of things. Which isnt as perfectly 
interactive as the non-group-scheduler 2.6.24, but which is still more 
interactive than say 2.6.22.

Peter is working on improving it even more, but that's 2.6.26 material.

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 20:42               ` Ingo Molnar
@ 2008-03-28 22:33                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 22:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Adrian Bunk, Thomas Gleixner, Peter Zijlstra, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Friday, 28 of March 2008, Ingo Molnar wrote:
> 
> > Subject		: 2.6.24-git15 Keyboard Issue?
> 
> Rafael, we can close this bug i think.

Closed.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 11:13       ` Adrian Bunk
  2008-03-28 11:16         ` Thomas Gleixner
@ 2008-03-28 11:44         ` Peter Zijlstra
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2008-03-28 11:44 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Thomas Gleixner, Rafael J. Wysocki, LKML, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Fri, 2008-03-28 at 13:13 +0200, Adrian Bunk wrote:
> On Fri, Mar 28, 2008 at 12:00:25PM +0100, Peter Zijlstra wrote:
> > On Fri, 2008-03-28 at 11:58 +0100, Thomas Gleixner wrote:
> > > On Fri, 28 Mar 2008, Thomas Gleixner wrote:
> > > > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > > > > Subject		: 2.6.24-git15 Keyboard Issue?
> > > > > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > > > > Date		: 2008-02-06 14:02 (51 days old)
> > > > > References	: http://lkml.org/lkml/2008/2/6/100
> > > > > 		  http://lkml.org/lkml/2008/2/13/82
> > > > > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > > > > Patch		: http://lkml.org/lkml/2008/2/15/343
> > > >  
> > > > asked the bug reporter for an update.
> > > 
> > > Now that lkml.org is working again, I checked the patch which is
> > > referenced above and I have a hard time to connect it even remotely to
> > > that bug.
> > > 
> > > As far as I can tell the discussion on lkml identified GROUP_SCHED=y
> > > as the culprit, but I have no idea whether there was any resolution
> > > other than disabling GROUP_SCHED.
> > > 
> > > Peter ??
> > 
> > Correct, I am working on the issue but that's not going to be .25 stuff.
> 
> How are we going to get 2.6.25 working no worse than 2.6.24?
> 
> Letting GROUP_SCHED depend on BROKEN would sound logical, but that would 
> also kill FAIR_GROUP_SCHED.

Are we actually worse off than .24?

I thought group scheduling has been performing less than expected ever
since it got merged.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 10:24 ` Thomas Gleixner
  2008-03-28 10:58   ` Thomas Gleixner
@ 2008-03-28 16:12   ` Rafael J. Wysocki
  2008-03-28 16:18     ` Thomas Gleixner
  1 sibling, 1 reply; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 16:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich, Yinghai Lu, H. Peter Anvin,
	Venkatesch Pallipadi

On Friday, 28 of March 2008, Thomas Gleixner wrote:
> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9980
> > Subject		: 2.6.25-rc1 on Sun Ultra 40- HPET clocksource which causes it to hang
> > Submitter	: Jasper Bryant-Greene <jasper@unix.geek.nz>
> > Date		: 2008-02-13 12:25 (44 days old)
> > References	: http://lkml.org/lkml/2008/2/13/181
> > Handled-By	: Yinghai Lu <Yinghai.Lu@sun.com>
> 
> asked the bug reporter for an update.
>  
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10011
> > Subject		: The computer is blocked when X is started - unless max_cstate=2 - Acer Travelmate 4001 Lmi
> > Submitter	: François Valenduc <francois.valenduc@tvcablenet.be>
> > Date		: 2008-02-17 06:28 (40 days old)
> > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> 
> asked the bug reporter for an update.
>  
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
> > Subject		: 2.6.24-git15 Keyboard Issue?
> > Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
> > Date		: 2008-02-06 14:02 (51 days old)
> > References	: http://lkml.org/lkml/2008/2/6/100
> > 		  http://lkml.org/lkml/2008/2/13/82
> > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > Patch		: http://lkml.org/lkml/2008/2/15/343
>  
> asked the bug reporter for an update.
>  
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10153
> > Subject		: (regression) kernel/timeconst.h bugs with HZ=128
> > Submitter	: David Brownell <david-b@pacbell.net>
> > Date		: 2008-02-26 19:32 (31 days old)
> > References	: http://lkml.org/lkml/2008/2/26/294
> > Handled-By	: H. Peter Anvin <hpa@zytor.com>
> > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15114&amp;action=view
> > 		  http://bugzilla.kernel.org/attachment.cgi?id=15115&amp;action=view
> 
> Have those patches made it somewhere else than into the bugzilla ?

I don't think so.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 16:12   ` Rafael J. Wysocki
@ 2008-03-28 16:18     ` Thomas Gleixner
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gleixner @ 2008-03-28 16:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich, Yinghai Lu, H. Peter Anvin,
	Venkatesch Pallipadi

On Fri, 28 Mar 2008, Rafael J. Wysocki wrote:
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10153
> > > Subject		: (regression) kernel/timeconst.h bugs with HZ=128
> > > Submitter	: David Brownell <david-b@pacbell.net>
> > > Date		: 2008-02-26 19:32 (31 days old)
> > > References	: http://lkml.org/lkml/2008/2/26/294
> > > Handled-By	: H. Peter Anvin <hpa@zytor.com>
> > > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15114&amp;action=view
> > > 		  http://bugzilla.kernel.org/attachment.cgi?id=15115&amp;action=view
> > 
> > Have those patches made it somewhere else than into the bugzilla ?
> 
> I don't think so.

They touch a bunch of files but they are pretty straight forward and
obvious. OTOH HZ=128 is not really anyway near to a default config and
I think we can safely postpone that one to .26

@hpa: can you please post them on LKML so they don't settle dust in
the bugzilla ?

Thanks,
	tglx

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-27 22:53 2.6.25-rc7-git2: Reported regressions from 2.6.24 Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2008-03-28 10:24 ` Thomas Gleixner
@ 2008-03-28 18:57 ` Mark Lord
  2008-03-28 22:37   ` Rafael J. Wysocki
  3 siblings, 1 reply; 55+ messages in thread
From: Mark Lord @ 2008-03-28 18:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich

Rafael J. Wysocki wrote:
..
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10345
> Subject		: 2.6.25-rc7: Ugh. (USB HID problems after resume)
> Submitter	: Mark Lord <lkml@rtr.ca>
> Date		: 2008-03-27 11:29 (1 days old)
> References	: http://lkml.org/lkml/2008/3/27/228
> Handled-By	: Greg KH <gregkh@suse.de>
..

Greg didn't really look at this one.
Change the "Handled-By" to:

  Handled-By	: Mark Lord <lkml@rtr.ca>
 
Also, for now it seems to be "resolved" via a seemingly-unrelated
change in the .config file on that machine.  If it dies again,
I'll re-post and investigate further.

Thanks for doing this stuff!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
  2008-03-28 18:57 ` Mark Lord
@ 2008-03-28 22:37   ` Rafael J. Wysocki
  0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 22:37 UTC (permalink / raw)
  To: Mark Lord
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich

On Friday, 28 of March 2008, Mark Lord wrote:
> Rafael J. Wysocki wrote:
> ..
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10345
> > Subject		: 2.6.25-rc7: Ugh. (USB HID problems after resume)
> > Submitter	: Mark Lord <lkml@rtr.ca>
> > Date		: 2008-03-27 11:29 (1 days old)
> > References	: http://lkml.org/lkml/2008/3/27/228
> > Handled-By	: Greg KH <gregkh@suse.de>
> ..
> 
> Greg didn't really look at this one.
> Change the "Handled-By" to:
> 
>   Handled-By	: Mark Lord <lkml@rtr.ca>
>  
> Also, for now it seems to be "resolved" via a seemingly-unrelated
> change in the .config file on that machine.  If it dies again,
> I'll re-post and investigate further.

I've closed it as "unreproducible", please reopen if necessary.

> Thanks for doing this stuff!

You're welcome. :-)

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2008-03-31 18:59 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-27 22:53 2.6.25-rc7-git2: Reported regressions from 2.6.24 Rafael J. Wysocki
2008-03-28  0:18 ` Carlos R. Mafra
2008-03-28  0:23   ` Rafael J. Wysocki
2008-03-28  2:30 ` Linus Torvalds
2008-03-28  3:24   ` Christoph Lameter
2008-03-28  4:00     ` Linus Torvalds
2008-03-28 10:48       ` Paweł Staszewski
2008-03-28 17:46         ` Andrew Morton
2008-03-28 21:57           ` Rafael J. Wysocki
2008-03-28 17:15       ` Pekka Enberg
2008-03-28 17:27         ` Linus Torvalds
2008-03-28 18:08           ` Pekka Enberg
2008-03-28 18:20             ` Linus Torvalds
2008-03-28 18:38               ` Christoph Lameter
2008-03-28 18:47               ` Andrew Morton
2008-03-28 18:53                 ` Christoph Lameter
2008-03-28 19:37                 ` Linus Torvalds
2008-03-28 19:59                   ` Linus Torvalds
2008-03-28 19:59               ` Pekka Enberg
2008-03-28 20:24                 ` Linus Torvalds
2008-03-28 18:37           ` Christoph Lameter
2008-03-28 19:32             ` Linus Torvalds
2008-03-28 18:33       ` Christoph Lameter
2008-03-28 19:25         ` Linus Torvalds
2008-03-29 20:42           ` Christoph Lameter
2008-03-29 21:29             ` Linus Torvalds
2008-03-29 23:52               ` Pekka Enberg
2008-03-31 18:56                 ` Christoph Lameter
2008-03-31 18:45               ` Christoph Lameter
2008-03-28  3:31   ` Yinghai Lu
2008-03-31 10:14     ` Kamalesh Babulal
2008-03-31 12:10       ` Rafael J. Wysocki
2008-03-28 11:29   ` Haavard Skinnemoen
2008-03-28 16:11     ` Rafael J. Wysocki
2008-03-28 16:10   ` Rafael J. Wysocki
2008-03-28 16:47     ` Linus Torvalds
2008-03-28 17:36       ` Adrian Bunk
2008-03-28 20:33         ` Ingo Molnar
2008-03-28 22:28       ` Rafael J. Wysocki
2008-03-31 13:34         ` Ingo Molnar
2008-03-28 10:24 ` Thomas Gleixner
2008-03-28 10:58   ` Thomas Gleixner
2008-03-28 11:00     ` Peter Zijlstra
2008-03-28 11:13       ` Adrian Bunk
2008-03-28 11:16         ` Thomas Gleixner
2008-03-28 11:31           ` Adrian Bunk
2008-03-28 16:17             ` Rafael J. Wysocki
2008-03-28 17:06               ` Adrian Bunk
2008-03-28 20:42               ` Ingo Molnar
2008-03-28 22:33                 ` Rafael J. Wysocki
2008-03-28 11:44         ` Peter Zijlstra
2008-03-28 16:12   ` Rafael J. Wysocki
2008-03-28 16:18     ` Thomas Gleixner
2008-03-28 18:57 ` Mark Lord
2008-03-28 22:37   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox