* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Michal Piotrowski @ 2007-05-24 14:04 UTC
To: Linus Torvalds
Cc: Andrew Morton, LKML, Cherwin R. Nooitmeer, Christoph Lameter, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

Hi all,

Here is a list of some known regressions in 2.6.22-rc2.

Feel free to add new regressions, remove fixed ones, etc.
http://kernelnewbies.org/known_regressions


Memory management

Subject    : kernel BUG at include/linux/slub_def.h:88 kmalloc_index()
References : http://bugzilla.kernel.org/show_bug.cgi?id=8476
Submitter  : Cherwin R. Nooitmeer <cherwin@gmail.com>
Status     : Unknown


PCMCIA

Subject    : libata and legacy ide pcmcia failure
References : http://lkml.org/lkml/2007/5/17/305
Submitter  : Robert de Rooy <robert.de.rooy@gmail.com>
Status     : Unknown


Sparc64

Subject    : 2.6.22-rc broke X on Ultra5
References : http://lkml.org/lkml/2007/5/22/78
Submitter  : Mikael Pettersson <mikpe@it.uu.se>
Status     : Unknown


Suspend

Subject    : STD fails with pci_device_suspend(): usb_hcd_pci_suspend+0x0/0x160 [usbcore]() returns -16
References : http://lkml.org/lkml/2007/5/19/66
Submitter  : Andrey Borzenkov <arvidjaar@mail.ru>
Status     : Unknown

Subject    : 2.6.22-rc1 suspend to RAM problem
References : http://permalink.gmane.org/gmane.linux.power-management.general/5819
Submitter  : Marcus Better <marcus@better.se>
Handled-By : Stefan Richter <stefanr@s5r6.in-berlin.de>
             Kristian Høgsberg <krh@bitplanet.net>
Status     : caused by fw-ohci module


Regards,
Michal

--
"What I missed most was your silence."
-- Andrzej Sapkowski, "Coś więcej" ("Something More")
* Re: [linux-pm] Re: [2/3] 2.6.22-rc2: known regressions v2
From: Alan Stern @ 2007-05-24 14:18 UTC
To: Michal Piotrowski
Cc: LKML, Linux-pm mailing list, Andrey Borzenkov

On Thu, 24 May 2007, Michal Piotrowski wrote:

> Suspend
>
> Subject    : STD fails with pci_device_suspend(): usb_hcd_pci_suspend+0x0/0x160 [usbcore]() returns -16
> References : http://lkml.org/lkml/2007/5/19/66
> Submitter  : Andrey Borzenkov <arvidjaar@mail.ru>
> Status     : Unknown

This has been fixed and the patches are in the current -git.

Alan Stern
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Christoph Lameter @ 2007-05-24 17:01 UTC
To: Michal Piotrowski
Cc: Linus Torvalds, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

On Thu, 24 May 2007, Michal Piotrowski wrote:

> Memory management
>
> Subject    : kernel BUG at include/linux/slub_def.h:88 kmalloc_index()
> References : http://bugzilla.kernel.org/show_bug.cgi?id=8476
> Submitter  : Cherwin R. Nooitmeer <cherwin@gmail.com>
> Status     : Unknown


Looks like this is in DRM code:

BUG: at include/linux/slub_def.h:88 kmalloc_index()
 [<c014f67f>] get_slab+0x43/0x1c6
 [<c014f875>] __kmalloc+0xc/0x57
 [<f0a7b564>] drm_rmdraw+0x0/0x27d [drm]
 [<f0a7b698>] drm_rmdraw+0x134/0x27d [drm]
 [<f0a7b564>] drm_rmdraw+0x0/0x27d [drm]
 [<f0a7c1d0>] drm_ioctl+0x144/0x18c [drm]
 [<c01265ad>] enqueue_hrtimer+0xe3/0xef
 [<c015c184>] do_ioctl+0x4c/0x64
 [<c015c3c7>] vfs_ioctl+0x22b/0x23e
 [<c015c40d>] sys_ioctl+0x33/0x4e
 [<c0103ca0>] syscall_call+0x7/0xb
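Each backtrace entry in the trace above has the form `[<address>] symbol+offset/size [module]`: the offset is how far into the function the return address lies, the size is the function's total length, and the bracketed module name is present only for code in loadable modules (here `drm`). The following is a minimal userspace sketch for splitting such a line. It assumes exactly this i386-era layout, and the struct and function names are hypothetical, not part of any kernel or ksymoops API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed backtrace entry. */
struct trace_entry {
    unsigned long addr;   /* return address shown inside [< >] */
    char symbol[64];      /* function name */
    unsigned long offset; /* bytes into the function */
    unsigned long size;   /* total length of the function */
    char module[32];      /* owning module, "" for the core kernel */
};

/* Parse a line such as "[<f0a7b698>] drm_rmdraw+0x134/0x27d [drm]".
 * Returns 1 on success, 0 if the line does not match the assumed layout.
 * %lx accepts an optional "0x" prefix, so "0x134" parses as 0x134. */
static int parse_trace_entry(const char *line, struct trace_entry *e)
{
    memset(e, 0, sizeof(*e));
    int n = sscanf(line, " [<%lx>] %63[^+]+%lx/%lx [%31[^]]]",
                   &e->addr, e->symbol, &e->offset, &e->size, e->module);
    return n >= 4;  /* the trailing [module] part is optional */
}
```

Reading `drm_rmdraw+0x134/0x27d [drm]` this way says the call came from 0x134 bytes into a 0x27d-byte function in the drm module, which is how the trace above points at DRM code.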
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Linus Torvalds @ 2007-05-24 17:12 UTC
To: Christoph Lameter
Cc: Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman, Ingo Molnar

On Thu, 24 May 2007, Christoph Lameter wrote:
>
> On Thu, 24 May 2007, Michal Piotrowski wrote:
>
> > Memory management
> >
> > Subject    : kernel BUG at include/linux/slub_def.h:88 kmalloc_index()
> > References : http://bugzilla.kernel.org/show_bug.cgi?id=8476
> > Submitter  : Cherwin R. Nooitmeer <cherwin@gmail.com>
> > Status     : Unknown
>
>
> Looks like this is in DRM code:
>
> BUG: at include/linux/slub_def.h:88 kmalloc_index()

I'm going to change that "BUG:" to "WARNING:".

I know some people disagreed with it (i.e. Ingo), but I think that's total and utter bullshit.

It's a warning. Right now that "BUG:" message makes people all scared about something that is not fatal at all, just a note that something hasn't been converted, but is expected to work absolutely fine.

Calling it a bug is idiotic.

		Linus
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Christoph Lameter @ 2007-05-24 17:18 UTC
To: Linus Torvalds
Cc: Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman, Ingo Molnar

On Thu, 24 May 2007, Linus Torvalds wrote:

> I'm going to change that "BUG:" to "WARNING:".

Good. I have long wondered why a "WARN_xxx ..." prints "BUG: xxx".
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Andrew Morton @ 2007-05-24 18:49 UTC
To: Linus Torvalds
Cc: Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman, Ingo Molnar

On Thu, 24 May 2007 10:12:14 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > BUG: at include/linux/slub_def.h:88 kmalloc_index()
>
> I'm going to change that "BUG:" to "WARNING:".

I think we should remove these kmalloc(0, ...) warnings prior to the 2.6.22 release, put them back afterwards.

Actually it would be useful to have some config variable which is "false" for production kernels (2.6.x, 2.6.x.y) and "true" for development kernels (2.6.x-rcN). I suspect we'd end up using this in quite a lot of places for general developer-nags which shouldn't be exposed to users of production kernels.

The problem with this is that on the day Linus goes from 2.6.x-rc7 to 2.6.x we suddenly get a kernel built with a different compile-time setting, which nobody has tested, so we'd need to set CONFIG_DEVELOPMENT_KERNEL to false around the -rc4/5 timeframe.

I'm not sure how we could do this, apart from patching and unpatching a config file each time.
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Linus Torvalds @ 2007-05-24 19:21 UTC
To: Andrew Morton
Cc: Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman, Ingo Molnar

On Thu, 24 May 2007, Andrew Morton wrote:

> On Thu, 24 May 2007 10:12:14 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > > BUG: at include/linux/slub_def.h:88 kmalloc_index()
> >
> > I'm going to change that "BUG:" to "WARNING:".
>
> I think we should remove these kmalloc(0, ...) warnings prior to
> the 2.6.22 release, put them back afterwards.

Orthogonal issue, but yes, I agree. I'll do that too, but let's keep it for now.

> I'm not sure how we could do this, apart from patching and unpatching a
> config file each time.

Doing it in the Makefile would make more sense, since I have to edit that file anyway to change -rc5 to -rc6.

		Linus
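Andrew's CONFIG_DEVELOPMENT_KERNEL was only a proposal, and Linus suggests deriving it from the Makefile, where the "-rcN" suffix already lives. As a rough illustration of the classification rule only (not of any real build mechanism), here is a sketch that decides from a release string alone; the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: a release whose version string carries an "-rcN"
 * suffix (e.g. "2.6.22-rc2") counts as a development kernel; plain
 * "2.6.x" and "2.6.x.y" releases count as production kernels. */
static bool is_development_release(const char *release)
{
    const char *rc = strstr(release, "-rc");

    /* Require a digit right after "-rc" so a bare "-rc" does not match.
     * rc[3] is at worst the terminating '\0', so the read is in bounds. */
    return rc != NULL && rc[3] >= '0' && rc[3] <= '9';
}
```

Note that this rule flips at exactly the moment Andrew worries about (the -rc7 to final transition); meeting his "switch it off around -rc4/5" requirement would need an explicit override rather than pure string matching.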
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Jeff Garzik @ 2007-05-24 21:02 UTC
To: Linus Torvalds
Cc: Andrew Morton, Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman, Ingo Molnar

Linus Torvalds wrote:
> Doing it in the Makefile would make more sense, since I have to edit that
> file anyway to change -rc5 to -rc6.

Tangent: you should also change NAME when you do so :)
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Ingo Molnar @ 2007-05-24 19:37 UTC
To: Linus Torvalds
Cc: Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Looks like this is in DRM code:
> >
> > BUG: at include/linux/slub_def.h:88 kmalloc_index()
>
> I'm going to change that "BUG:" to "WARNING:".
>
> I know some people disagreed with it (i.e. Ingo), but I think that's
> total and utter bullshit.
>
> It's a warning. Right now that "BUG:" message makes people all scared
> about something that is not fatal at all, just a note that something
> hasn't been converted, but is expected to work absolutely fine.
>
> Calling it a bug is idiotic.

I very much agree that this kmalloc_index() one shouldn't be called a "BUG: ", but if you look at the majority of WARN_ON() instances they are checks for clear, serious kernel bugs. Very often we use WARN_ON() not to signal that it's just a harmless warning, but because we do not want to bring the system down via a BUG_ON(). The API is misnamed for sure, but still, the purpose and current practice are clear: to signal kernel bugs.
To quantify this a bit more objectively, I just did a "grep WARN_ON kernel/*.c" and randomly picked 10 out of the 113 WARN_ON()s:

  kernel/cpu.c:          WARN_ON(1);
  kernel/exit.c:         WARN_ON(atomic_read(&tsk->fs_excl));
  kernel/fork.c:         WARN_ON(!(tsk->exit_state & (EXIT_DEAD | EXIT_ZOMBIE)));
  kernel/futex.c:        WARN_ON(!pi_state);
  kernel/hrtimer.c:      WARN_ON_ONCE(timer->cb_mode == HRTIMER_CB_IRQSAFE_NO_SOFTIRQ);
  kernel/lockdep.c:      WARN_ON(1);
  kernel/mutex-debug.c:  DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
  kernel/rtmutex.c:      WARN_ON(rt_mutex_is_locked(lock));
  kernel/sched.c:        if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
  kernel/softirq.c:      WARN_ON_ONCE(in_irq());

and reviewed each instance: each and every one of these warnings is a serious kernel bug, not something I would merely 'warn' about but something I'd like to see reported ASAP. [ In fact I added 5 of these WARN_ON()s :-/ ]

But maybe that's just me?

Now, regarding the naming of this API, I'd very much agree to do this rename:

  WARN_ON => BUG_ON
  BUG_ON  => CRASH_ON

and make WARN_ON() print a "WARNING: ". I signalled this in the original discussion a few months ago too, when I opposed the watering-down of the WARN_ON printk. OTOH I don't feel that strongly about all this :-)

	Ingo
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Linus Torvalds @ 2007-05-24 19:50 UTC
To: Ingo Molnar
Cc: Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

On Thu, 24 May 2007, Ingo Molnar wrote:
>
> I very much agree that this kmalloc_index() one shouldn't be called a
> "BUG: ", but if you look at the majority of WARN_ON() instances they are
> checks for clear, serious kernel bugs.

I _still_ disagree.

There's a huge difference between "You killed my father, prepare to die", and "Btw, I didn't like that, but I'll just continue".

And that's the difference between BUG_ON() and WARN_ON().

And dammit, the kernel message should make that CLEAR. It's totally idiotic to call both "BUG". One is a clear BUG, the other is a "uhhuh, something unexpected happened, but I know how to continue".

So stop this idiotic "we should call them both the same". If they actually were the same, they'd both be called BUG_ON().

		Linus
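The difference Linus describes ("prepare to die" versus "I'll just continue") can be sketched with two userspace macros. These are hypothetical analogues, not the kernel's definitions: they borrow the kernel's GCC/Clang statement-expression style and the "at file:line function()" message layout seen in the trace earlier in the thread, but print to stderr and use abort() in place of a kernel oops.

```c
#include <stdio.h>
#include <stdlib.h>

/* DEMO_WARN_ON (hypothetical): report with a "WARNING:" prefix and carry
 * on. It evaluates to the tested condition, so callers can write
 * if (DEMO_WARN_ON(x)) { recover(); } */
#define DEMO_WARN_ON(cond) ({                                         \
    int demo_ret_ = !!(cond);                                         \
    if (demo_ret_)                                                    \
        fprintf(stderr, "WARNING: at %s:%d %s()\n",                   \
                __FILE__, __LINE__, __func__);                        \
    demo_ret_;                                                        \
})

/* DEMO_BUG_ON (hypothetical): report with a "BUG:" prefix and stop;
 * there is no way to continue past it. */
#define DEMO_BUG_ON(cond) do {                                        \
    if (cond) {                                                       \
        fprintf(stderr, "BUG: at %s:%d %s()\n",                       \
                __FILE__, __LINE__, __func__);                        \
        abort();                                                      \
    }                                                                 \
} while (0)

/* Example caller, loosely modelled on the kmalloc(0) case in this thread:
 * a zero-size request is unexpected but recoverable, so it warns and
 * falls back to a (hypothetical) smallest size of 8 bytes. */
static size_t demo_alloc_size(size_t size)
{
    if (DEMO_WARN_ON(size == 0))
        return 8;
    return size;
}
```

Returning the condition from the warning macro is what makes "warn and recover" possible at the call site; the bug macro deliberately has no value because nothing runs after it fires.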
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: David Woodhouse @ 2007-05-24 20:02 UTC
To: Linus Torvalds
Cc: Ingo Molnar, Cherwin R. Nooitmeer, Kristian Høgsberg, Mikael Pettersson, Pavel Machek, sparclinux, linux1394-devel, linux-usb-devel, Alan Cox, linux-pm, Christoph Lameter, Robert de Rooy, Tejun Heo, Michal Piotrowski, linux-pcmcia, Marcus Better, Rafael J. Wysocki, Greg Kroah-Hartman, LKML, Stefan Richter, Andrew Morton, Andrey Borzenkov, David Miller

On Thu, 2007-05-24 at 12:50 -0700, Linus Torvalds wrote:
> There's a huge difference between "You killed my father, prepare to
> die", and "Btw, I didn't like that, but I'll just continue".

There are three cases, not two:

 1. Something slightly suboptimal happened. We didn't like it.
 2. Something very broken happened but we can recover, so there's no need to actually commit hara-kiri right now.
 3. Oh fuck.

We are currently using WARN_ON() for both of the first two; I think Ingo is merely suggesting that we start to differentiate between them. Which makes a certain amount of sense.

--
dwmw2
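One way to read David's three cases together with Ingo's proposed rename (WARN_ON => BUG_ON, BUG_ON => CRASH_ON, plus a new WARN_ON that prints "WARNING: ") is as a small policy table. The sketch below is purely illustrative: the enum, struct, and log prefixes are hypothetical, and the macro names are only those floated in the thread, not an adopted kernel API.

```c
#include <stdbool.h>

/* Hypothetical severity levels following the three cases above. */
enum demo_severity {
    DEMO_SUBOPTIMAL,  /* 1. slightly suboptimal: note it, continue      */
    DEMO_RECOVERABLE, /* 2. very broken, but recoverable: report loudly */
    DEMO_FATAL,       /* 3. no safe way to continue                     */
};

struct demo_policy {
    const char *macro;  /* name under Ingo's proposed rename */
    const char *prefix; /* illustrative log prefix */
    bool continues;     /* does execution carry on afterwards? */
};

static struct demo_policy demo_policy_for(enum demo_severity s)
{
    switch (s) {
    case DEMO_SUBOPTIMAL:
        return (struct demo_policy){ "WARN_ON", "WARNING: ", true };
    case DEMO_RECOVERABLE:
        /* today's WARN_ON() users in this category would become BUG_ON() */
        return (struct demo_policy){ "BUG_ON", "BUG: ", true };
    case DEMO_FATAL:
    default:
        /* today's BUG_ON() would become CRASH_ON() */
        return (struct demo_policy){ "CRASH_ON", "BUG: ", false };
    }
}
```

Laid out this way, the disagreement is mostly about cases 1 and 2: both continue, and the question is only whether case 2 should carry the "BUG:" label.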
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Ingo Molnar @ 2007-05-25 10:11 UTC
To: Linus Torvalds
Cc: Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > I very much agree that this kmalloc_index() one shouldn't be called a
> > "BUG: ", but if you look at the majority of WARN_ON() instances they
> > are checks for clear, serious kernel bugs.
>
> I _still_ disagree.
>
> There's a huge difference between "You killed my father, prepare to
> die", and "Btw, I didn't like that, but I'll just continue".

yeah ...

> And that's the difference between BUG_ON() and WARN_ON().

How about this solution: make WARN_ON() print a "WARNING: " like you suggested (I still agree with that in principle), but also solve the additional problem I'm trying to outline: make BUG_ON() _not_ crash the box [it would crash only if the user asks for that to happen in such circumstances; this can be a sysctl]. Then I can change the majority of the current WARN_ON()s to BUG_ON()s.

Most of the WARN_ON()s I personally add (and most of the WARN_ON()s I see others adding) are not WARN_ON()s because "I didn't like that and I'll just continue"; they are WARN_ON()s because I want _actual feedback from users_.
A BUG_ON() has a (much) lower likelihood of being reported back - for most users it is a "X just hung hard, there was nothing in the syslog, I had to switch back to the older kernel" experience, and they do not have a serial console to hook up (newer hardware often doesn't even have a serial port). With the WARN_ON()s we have a _chance_ that, despite the seriousness of the bug, the message makes it to the syslog before the system comes to a screeching halt due to side-effects of the bug.

In that sense I am part of the problem: I was adding WARN_ON()s that weren't true 'warnings' but 'bugs'. So I'd very much like to fix that problem, but I'd also like to solve the (very serious and existing) problem of BUG_ON()s making it less likely to get bugs reported back.

	Ingo
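Ingo's proposal amounts to a BUG_ON() that logs and returns unless an administrator has opted into crashing. A minimal userspace sketch of that control flow follows; `demo_crash_on_bug` is a hypothetical stand-in for the sysctl he mentions and does not correspond to any existing kernel knob.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed sysctl:
 * 0 = log the bug and keep running (the default here), 1 = halt. */
static int demo_crash_on_bug = 0;

static void demo_report_bug(const char *file, int line, const char *func)
{
    fprintf(stderr, "BUG: at %s:%d %s()\n", file, line, func);
    if (demo_crash_on_bug)
        abort(); /* the old always-fatal BUG_ON() behaviour, now opt-in */
}

/* Evaluates to true if the bug condition was hit (and we survived), so a
 * caller can abandon the current operation instead of the whole box. */
#define DEMO_SOFT_BUG_ON(cond) \
    ((cond) ? (demo_report_bug(__FILE__, __LINE__, __func__), true) : false)
```

A caller would then write something like `if (DEMO_SOFT_BUG_ON(ptr == NULL)) return -EINVAL;`. The sketch only shows why the message is more likely to reach the syslog; whether continuing after a genuine bug is safe is exactly the judgement the thread is arguing about.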
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Stefan Richter @ 2007-05-25 10:18 UTC
To: Ingo Molnar
Cc: Linus Torvalds, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

Ingo Molnar wrote:
> I was adding WARN_ON()s that weren't true 'warnings' but 'bugs'.

IME, the trace dump in the kernel log looks scary enough to be eventually reported, even if prefixed with "WARNING:".
--
Stefan Richter
-=====-=-=== -=-= ==--=
http://arcgraph.de/sr/
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Linus Torvalds @ 2007-05-25 16:21 UTC
To: Stefan Richter
Cc: Ingo Molnar, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

On Fri, 25 May 2007, Stefan Richter wrote:

> Ingo Molnar wrote:
> > I was adding WARN_ON()s that weren't true 'warnings' but 'bugs'.
>
> IME, the trace dump in the kernel log looks scary enough to be
> eventually reported, even if prefixed with "WARNING:".

Oh, absolutely. It will stand out like a sore thumb. In fact, I think WARNING: stands out more than BUG:, if only because it's longer!

		Linus
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Chris Newport @ 2007-05-25 11:53 UTC
To: Ingo Molnar
Cc: Linus Torvalds, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

Ingo Molnar wrote:

> A BUG_ON() has a (much) lower likelihood of being reported back - for
> most users it is a "X just hung hard, there was nothing in the syslog, I
> had to switch back to the older kernel" experience, and they do not have
> a serial console to hook up (newer hardware often doesn't even have a
> serial port). With the WARN_ON()s we have a _chance_ that, despite the
> seriousness of the bug, the message makes it to the syslog before the
> system comes to a screeching halt due to side-effects of the bug.
>
> In that sense I am part of the problem: I was adding WARN_ON()s that
> weren't true 'warnings' but 'bugs'. So I'd very much like to fix that
> problem, but I'd also like to solve the (very serious and existing)
> problem of BUG_ON()s making it less likely to get bugs reported back.

There is a fundamental problem in getting a decent log to debug a crashed kernel. Maybe we should take a hint from Solaris. If the kernel crashes Solaris dumps core to swap and sets a flag. At the next boot this image is copied to /var/adm/crashdump where it is preserved for future debugging. Obviously swap needs to be larger than core, but this is usually the case.
On Sun machines this is fairly easy because the dump can be performed by the OBP (OpenBoot PROM); on other architectures it may be more difficult to still have enough of a working kernel to perform the dump after a kernel panic.

Just a thought ...
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Ingo Molnar @ 2007-05-25 12:34 UTC
To: Chris Newport
Cc: Linus Torvalds, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

* Chris Newport <crn@netunix.com> wrote:

> There is a fundamental problem in getting a decent log to debug a
> crashed kernel. Maybe we should take a hint from Solaris. If the
> kernel crashes Solaris dumps core to swap and sets a flag. At the next
> boot this image is copied to /var/adm/crashdump where it is preserved
> for future debugging. Obviously swap needs to be larger than core, but
> this is usually the case.

We've got kdump, but it's not usually enabled by default by distros.

	Ingo
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Andrew Morton @ 2007-05-25 16:33 UTC
To: Ingo Molnar
Cc: Chris Newport, Linus Torvalds, Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

On Fri, 25 May 2007 14:34:56 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> * Chris Newport <crn@netunix.com> wrote:
>
> > There is a fundamental problem in getting a decent log to debug a
> > crashed kernel. Maybe we should take a hint from Solaris. If the
> > kernel crashes Solaris dumps core to swap and sets a flag. At the next
> > boot this image is copied to /var/adm/crashdump where it is preserved
> > for future debugging. Obviously swap needs to be larger than core, but
> > this is usually the case.
>
> We've got kdump, but it's not usually enabled by default by distros.

Isn't that awful?

By now we should be in the situation where if a tester is hitting a kernel crash we can say to them "please turn on crashdumps and send me the image". But we're not - kernel developers don't know how to turn the thing on in $RANDOM_DISTRO, testers have no experience with the feature and kernel developers don't have experience handling the crash images.

And I'm not sure that the (required) "don't dump user memory and pagecache" feature has been implemented yet?

It'd be in our interests to push all this along a bit.
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Christoph Lameter @ 2007-05-25 16:45 UTC
To: Andrew Morton
Cc: Ingo Molnar, Chris Newport, Linus Torvalds, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman, jlan

On Fri, 25 May 2007, Andrew Morton wrote:

> the image". But we're not - kernel developers don't know how to turn the
> thing on in $RANDOM_DISTRO, testers have no experience with the feature
> and kernel developers don't have experience handling the crash images.

Well, we, for instance, have problems with huge crash dumps. It's a challenge even if the machine has only a few gigabytes of main memory.

> And I'm not sure that the (required) "don't dump user memory and pagecache"
> feature has been implemented yet?

A colleague of mine is still working to get it to work just right on IA64. He does it full time. Sigh.

Jay, what is the status on that?
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Vivek Goyal @ 2007-05-28 3:46 UTC
To: Andrew Morton
Cc: Ingo Molnar, Chris Newport, Linus Torvalds, Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman, Ken'ichi Ohmichi

On Fri, May 25, 2007 at 09:33:54AM -0700, Andrew Morton wrote:
> On Fri, 25 May 2007 14:34:56 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> > * Chris Newport <crn@netunix.com> wrote:
> >
> > > There is a fundamental problem in getting a decent log to debug a
> > > crashed kernel. Maybe we should take a hint from Solaris. If the
> > > kernel crashes Solaris dumps core to swap and sets a flag. At the next
> > > boot this image is copied to /var/adm/crashdump where it is preserved
> > > for future debugging. Obviously swap needs to be larger than core, but
> > > this is usually the case.
> >
> > We've got kdump, but it's not usually enabled by default by distros.
>
> Isn't that awful?
>

I think kdump should be enabled by default, or at least the user should be given an option to enable/configure this service at installation time.

Things are still good, at least in RHEL5. It gives the user an option to enable/disable kdump at firstboot after installation. A downside of doing it at firstboot time is that the user has to go through an extra reboot if he chooses to enable kdump (because of the crashkernel= kernel command line option). An improvement would be to offer these options at installation time, so that the user does not have to go through an extra reboot to enable the kdump service.
It also has graphical scripts to configure the kdump service and enable it.

Other distributions are catching up, but there seems to be a reluctance to enable kdump by default, primarily because a chunk of memory is reserved for the kdump kernel and cannot be used by the regular kernel. As of today, RHEL reserves 128MB of memory on x86/x86_64 if kdump is enabled.

Some people are also reluctant to change the installer to include a screen which can help the user enable/disable/configure kdump. They think it increases installer complexity and that the user is likely to get confused.

> By now we should be in the situation where if a tester is hitting a
> kernel crash we can say to them "please turn on crashdumps and send me
> the image". But we're not - kernel developers don't know how to turn the
> thing on in $RANDOM_DISTRO, testers have no experience with the feature
> and kernel developers don't have experience handling the crash images.
>
> And I'm not sure that the (required) "don't dump user memory and pagecache"
> feature has been implemented yet?
>

Yes. It has been implemented and integrated with RHEL5. Not sure about others. NEC developers have developed a user-space filtering utility which can filter out pagecache, user memory and zero pages.

http://sourceforge.net/projects/makedumpfile

In RHEL5, one can pre-configure filtering options, and a filtered crash dump will automatically be saved to the user-configured destination.

Thanks
Vivek
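The extra reboot Vivek mentions comes from the fact that the kdump kernel's memory must be reserved at boot through the `crashkernel=` kernel command-line option, in this era written as `crashkernel=SIZE@OFFSET` (for example `crashkernel=128M@16M` for the 128MB RHEL reservation). The following is a minimal sketch of reading the reserved size back out of a command-line string such as the contents of /proc/cmdline. It assumes only that simple form with an optional K/M/G suffix; later kernels accept more elaborate syntaxes that it does not handle, and the function name is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the number of bytes requested by a "crashkernel=SIZE[@OFFSET]"
 * option in a kernel command line, or 0 if the option is absent.
 * Only the simple SIZE@OFFSET form with an optional K/M/G suffix is
 * understood. */
static unsigned long long crashkernel_size(const char *cmdline)
{
    static const char key[] = "crashkernel=";
    const char *p = cmdline;

    /* Find the key at the start of a word, so e.g. "nocrashkernel=1"
     * is not mistaken for it. */
    while ((p = strstr(p, key)) != NULL) {
        if (p == cmdline || isspace((unsigned char)p[-1]))
            break;
        p += strlen(key);
    }
    if (p == NULL)
        return 0;

    char *end;
    unsigned long long size = strtoull(p + strlen(key), &end, 0);

    switch (toupper((unsigned char)*end)) {
    case 'G': size <<= 10; /* fall through */
    case 'M': size <<= 10; /* fall through */
    case 'K': size <<= 10; break;
    default: break;         /* plain byte count */
    }
    return size;
}
```

With `crashkernel=128M@16M` this yields 134217728 bytes, the memory that, as Vivek notes, the regular kernel can no longer use, which is the main reason distributions hesitate to turn kdump on by default.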
* Re: [2/3] 2.6.22-rc2: known regressions v2
From: Stefan Richter @ 2007-05-25 12:40 UTC
To: crn
Cc: Ingo Molnar, Linus Torvalds, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman

Chris Newport wrote:
> There is a fundamental problem in getting a decent log to debug a
> crashed kernel.

If the test machine and a second machine both have FireWire ports, it's possible to get the kernel log and more via FireWire, unless the machine rebooted immediately or the PCI bus locked up. The program 'firescope' and the raw1394 driver are needed on the second machine; the test machine has to have ohci1394 loaded.
--
Stefan Richter
-=====-=-=== -=-= ==--=
http://arcgraph.de/sr/
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 11:53 ` Chris Newport 2007-05-25 12:34 ` Ingo Molnar 2007-05-25 12:40 ` Stefan Richter @ 2007-05-25 16:45 ` Linus Torvalds 2007-05-25 17:03 ` Alan Cox ` (2 more replies) 2 siblings, 3 replies; 35+ messages in thread From: Linus Torvalds @ 2007-05-25 16:45 UTC (permalink / raw) To: Chris Newport Cc: Ingo Molnar, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On Fri, 25 May 2007, Chris Newport wrote: > > Maybe we should take a hint from Solaris. No. Solaris is shit. They make their decisions based on "we control the hardware" kind of setup. > If the kernel crashes Solaris dumps core to swap and sets a flag. > At the next boot this image is copied to /var/adm/crashdump where > it is preserved for future debugging. Obviously swap needs to be > larger than core, but this is usually the case. (a) it's not necessarily the case at all on many systems (b) _most_ crashes that are real BUG()'s (rather than WARN_ON()'s) leave the system in such a fragile state that trying to write to disk is the _last_ thing you should do. Linux does the right thing: it tries to not make bugs fatal. Generally, you should see an oops, and things continue. Or a WARN_ON(), and things continue. But you should avoid the "the machine is now dead" cases. (c) have you looked at the size of drivers lately? I'd argue that *most* bugs by far happen in something driver-related, and most of our source code is likely drivers. Writing to disk when the biggest problem is a driver to begin with is INSANE. So the fact is, Solaris is crap, and to a large degree Solaris is crap exactly _because_ it assumes that it runs in a "controlled environment".
Yes, in a controlled environment, dumping the whole memory image to disk may be the right thing to do. BUT: in a controlled environment, you'll never get the kind of usage that Linux gets. Why do you think Linux (and Windows, for that matter) took away a lot of the market from traditional UNIX? Answer: the traditional UNIX hardware/control model doesn't _work_. People want more flexibility, both on a hardware side and on a usage side. And once you have the flexibility, the "dump everything to disk" is simply not an option any more. Disk dumps etc are options at things like wall street. But look at the bug reports, and ask yourself how many of them happen at Wall Street, and how many of them would even be _relevant_ to somebody there? So forget about it. The whole model is totally broken. We need to make bug-reports short and sweet, enough so that random people can copy-and-paste them into an email or take a digital photo. Anything else IS TOTALLY INSANE AND USELESS! Linus ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 16:45 ` Linus Torvalds @ 2007-05-25 17:03 ` Alan Cox 2007-05-25 17:19 ` Linus Torvalds 2007-05-25 17:07 ` Chuck Ebbert 2007-05-25 18:03 ` Chris Newport 2 siblings, 1 reply; 35+ messages in thread From: Alan Cox @ 2007-05-25 17:03 UTC (permalink / raw) To: Linus Torvalds Cc: Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman > Disk dumps etc are options at things like wall street. But look at the bug > reports, and ask yourself how many of them happen at Wall Street, and how > many of them would even be _relevant_ to somebody there? There is an additional factor - dumps contain data which variously is - copyright third parties, protected by privacy laws, just personally private, security sensitive (eg browser history) and so on. The only reason you can get dumps back into the hands of vendors is that there are strong formal agreements controlling where they go and what is done with them. Diskdump (and even more so netdump) are useful in the hands of a developer crashing their own box just like kgdb, but not in the normal and rational end user response of "it's broken, hit reset" Alan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 17:03 ` Alan Cox @ 2007-05-25 17:19 ` Linus Torvalds 2007-05-25 17:37 ` Andrew Morton 0 siblings, 1 reply; 35+ messages in thread From: Linus Torvalds @ 2007-05-25 17:19 UTC (permalink / raw) To: Alan Cox Cc: Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On Fri, 25 May 2007, Alan Cox wrote: > > There is an additional factor - dumps contain data which variously is - > copyright third parties, protected by privacy laws, just personally > private, security sensitive (eg browser history) and so on. Yes. I'm sure we've had one or two crashdumps over the years that have actually clarified a bug. But I seriously doubt it is more than a handful. > Diskdump (and even more so netdump) are useful in the hands of a > developer crashing their own box just like kgdb, but not in the > normal and rational end user response of "it's broken, hit reset" Amen, brother. Even for developers, I suspect a _lot_ of people end up doing "ok, let's bisect this" or some other method to narrow it down to a specific case, and then staring at the source code once they get to that point. At least I hope so. Even in user space, you should generally use gdb to get a traceback and perhaps variable information, and then go look at the source code. Yes, dumps can (in theory) be useful for one-off issues, but I doubt many people have ever been able to get anything much more out of them than from a kernel "oops" message. For developers, I can heartily recommend the firewire-based remote debug facilities that the PowerPC people use.
I've used it once or twice, and it is fairly simple and much better than a full dump (and it works even when the CPU is totally locked up, which is the best reason for using it). But 99% of the time, the problem doesn't happen on a developer machine, and even if it does, 90% of the time you really just want the traceback and register info that you get out of an oops. Linus ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 17:19 ` Linus Torvalds @ 2007-05-25 17:37 ` Andrew Morton 2007-05-25 17:48 ` Alan Cox 2007-05-25 17:50 ` Linus Torvalds 0 siblings, 2 replies; 35+ messages in thread From: Andrew Morton @ 2007-05-25 17:37 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Cox, Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On Fri, 25 May 2007 10:19:52 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Fri, 25 May 2007, Alan Cox wrote: > > > > There is an additional factor - dumps contain data which variously is - > > copyright third parties, protected by privacy laws, just personally > > private, security sensitive (eg browser history) and so on. > > Yes. We're uninterested in pagecache and user memory and they should be omitted from the image (making it enormously smaller too). That leaves security keys and perhaps filenames, and these could probably be addressed. > I'm sure we've had one or two crashdumps over the years that have actually > clarified a bug. > > But I seriously doubt it is more than a handful. We've had a few more than that, but all the ones I recall actually came from the kdump developers who were hitting other bugs and who just happened to know how to drive the thing. > > Diskdump (and even more so netdump) are useful in the hands of a > > developer crashing their own box just like kgdb, but not in the > > normal and rational end user response of "it's broken, hit reset" > > Amen, brother.
> > Even for developers, I suspect a _lot_ of people end up doing "ok, let's > bisect this" or some other method to narrow it down to a specific case, > and then staring at the source code once they get to that point. > > At least I hope so. Even in user space, you should generally use gdb to > get a traceback and perhaps variable information, and then go look at the > source code. > > Yes, dumps can (in theory) be useful for one-off issues, but I doubt many > people have ever been able to get anything much more out of them than from > a kernel "oops" message. > > For developers, I can heartily recommend the firewire-based remote debug > facilities that the PowerPC people use. I've used it once or twice, and it > is fairly simple and much better than a full dump (and it works even when > the CPU is totally locked up, which is the best reason for using it). > > But 99% of the time, the problem doesn't happen on a developer machine, > and even if it does, 90% of the time you really just want the traceback > and register info that you get out of an oops. > Often we don't even get that: "I was in X and it didn't hit the logs". You can learn a hell of a lot by really carefully picking through kernel memory with gdb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 17:37 ` Andrew Morton @ 2007-05-25 17:48 ` Alan Cox 2007-05-25 17:50 ` Linus Torvalds 1 sibling, 0 replies; 35+ messages in thread From: Alan Cox @ 2007-05-25 17:48 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Alan Cox, Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On Fri, May 25, 2007 at 10:37:14AM -0700, Andrew Morton wrote: > Often we don't even get that: "I was in X and it didn't hit the logs". That's mostly solved by fixing the Oops and framebuffer code to co-operate, and is a different problem. Alan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 17:37 ` Andrew Morton 2007-05-25 17:48 ` Alan Cox @ 2007-05-25 17:50 ` Linus Torvalds 2007-05-28 4:27 ` Vivek Goyal 1 sibling, 1 reply; 35+ messages in thread From: Linus Torvalds @ 2007-05-25 17:50 UTC (permalink / raw) To: Andrew Morton Cc: Alan Cox, Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On Fri, 25 May 2007, Andrew Morton wrote: > > > > There is an additional factor - dumps contain data which variously is - > > > copyright third parties, protected by privacy laws, just personally > > > private, security sensitive (eg browser history) and so on. > > > > Yes. > > We're uninterested in pagecache and user memory and they should be omitted > from the image (making it enormously smaller too). The people who would use crash-dumps (big sensitive firms) don't trust you. And they'd be right not to trust you. You end up having a _lot_ of sensitive data even if you avoid user memory and page cache. The network buffers, the dentries, and just stale data that hasn't been overwritten. So if you end up having secure data on that machine, you should *never* send a dump to somebody you don't trust. For the financial companies (which are practically the only ones that would use dumps) there can even be legal reasons why they cannot do that! > That leaves security keys and perhaps filenames, and these could probably > be addressed. It leaves almost every single kernel allocation, and no, it cannot be addressed. How are you going to clear out the network packets that you have in memory? They're just kmalloc'ed.
> > > > But I seriously doubt it is more than a handful. > > We've had a few more than that, but all the ones I recall actually came > from the kdump developers who were hitting other bugs and who just happened > to know how to drive the thing. Right, I don't dispute that some _developers_ might use dumping. I dispute that any user would practically ever use it. And even for developers, I suspect it's _so_ far down the list of things you do, that it's practically zero. > > But 99% of the time, the problem doesn't happen on a developer machine, > > and even if it does, 90% of the time you really just want the traceback > > and register info that you get out of an oops. > > Often we don't even get that: "I was in X and it didn't hit the logs". Yes. > You can learn a hell of a lot by really carefully picking through kernel > memory with gdb. .. but you can learn equally much with other methods that do *not* involve the pain and suffering that is a kernel dump. Setting up netconsole or the firewire tools is much easier. The firewire thing in particular is nice, because it doesn't actually rely on the target having to even know about it (other than enabling the "remote DMA access" thing once on bootup). If you've ever picked through a kernel dump after-the-fact, I just bet you could have done equally well with firewire, and it would have had _zero_ impact on your kernel image. Now, contrast that with kdump, and ask yourself: which one do you think is worth concentrating effort on? - kdump: lots of code and maintenance effort, doesn't work if the CPU locks up, requires a lot of learning to go through the dump. - firewire: zero code, no maintenance effort, works even if the CPU locks up. Still does require the same learning to go through the end result. Which one wins? I know which one I'll push. Linus ^ permalink raw reply [flat|nested] 35+ messages in thread
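Linus mentions netconsole as the easier alternative. netconsole sends kernel console messages as UDP datagrams, so the receiving machine only needs something that listens on the target port and prints whatever arrives (netcat is the usual choice). A minimal Python sketch of such a listener follows. It assumes netconsole's default target port of 6666, and the function names are hypothetical.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket on the port netconsole is assumed to send to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_message(sock: socket.socket, bufsize: int = 4096) -> str:
    """Block until one datagram arrives and return it as text.

    Console output is not guaranteed to be valid UTF-8, so undecodable
    bytes are replaced instead of raising an exception.
    """
    data, _sender = sock.recvfrom(bufsize)
    return data.decode("utf-8", errors="replace")


def serve_forever(port: int = 6666) -> None:
    """Print every console chunk received from the machine under test."""
    listener = open_listener(port=port)
    while True:
        print(receive_message(listener), end="", flush=True)
```

Calling serve_forever() on the second machine and leaving it running is the whole setup on the receiving side. Because UDP is connectionless, the listener keeps working even if the sender reboots halfway through a message.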
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 17:50 ` Linus Torvalds @ 2007-05-28 4:27 ` Vivek Goyal 0 siblings, 0 replies; 35+ messages in thread From: Vivek Goyal @ 2007-05-28 4:27 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Alan Cox, Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On Fri, May 25, 2007 at 10:50:38AM -0700, Linus Torvalds wrote: > > > On Fri, 25 May 2007, Andrew Morton wrote: > > > > > > There is an additional factor - dumps contain data which variously is - > > > > copyright third parties, protected by privacy laws, just personally > > > > private, security sensitive (eg browser history) and so on. > > > > > > Yes. > > > > We're uninterested in pagecache and user memory and they should be omitted > > from the image (making it enormously smaller too). > > The people who would use crash-dumps (big sensitive firms) don't trust > you. > > And they'd be right not to trust you. You end up having a _lot_ of > sensitive data even if you avoid user memory and page cache. The network > buffers, the dentries, and just stale data that hasn't been overwritten. > > So if you end up having secure data on that machine, you should *never* > send a dump to somebody you don't trust. For the financial companies > (which are practically the only ones that would use dumps) there can even > be legal reasons why they cannot do that! > > > That leaves security keys and perhaps filenames, and these could probably > > be addressed. > > It leaves almost every single kernel allocation, and no, it cannot be > addressed. > > How are you going to clear out the network packets that you have in > memory? They're just kmalloc'ed.
> > > > I'm sure we've had one or two crashdumps over the years that have actually > > > clarified a bug. > > > > > > But I seriously doubt it is more than a handful. > > > > We've had a few more than that, but all the ones I recall actually came > > from the kdump developers who were hitting other bugs and who just happened > > to know how to drive the thing. > > Right, I don't dispute that some _developers_ might use dumping. I dispute > that any user would practically ever use it. > > And even for developers, I suspect it's _so_ far down the list of things > you do, that it's practically zero. > > > > But 99% of the time, the problem doesn't happen on a developer machine, > > > and even if it does, 90% of the time you really just want the traceback > > > and register info that you get out of an oops. > > > > Often we don't even get that: "I was in X and it didn't hit the logs". > > Yes. > Isn't that reason enough for customers to use it? How will a customer capture even an oops message? Let's say some web server crashed and the oops message did not make it to the logs. I do not expect a customer to set up a console for each and every machine on the network. In this scenario the customer has no choice but to reset the machine, without any information about why it crashed. If kdump is kept configured, there is a good chance that after a crash the customer will have some debug information to look at. I agree that data security can be a concern. In that case one could extract a small report from the dump, such as the oops message and the kernel log buffer contents just before the crash, and send only that to the service folks. Without kdump, the customer will most likely have no debug data beyond a complaint that the system crashed and rebooted, or had to be reset manually. Thanks Vivek ^ permalink raw reply [flat|nested] 35+ messages in thread
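Vivek's suggestion of sending only a small report instead of the whole image could look roughly like the sketch below. It is purely illustrative: the function name and the set of start markers are hypothetical, and it assumes the kernel log buffer has already been recovered from the dump as plain text lines. Only the first line that looks like the start of an oops or BUG report is kept, along with a bounded number of lines after it, so the registers and call trace can go out while the rest of memory stays on the machine.

```python
import re

# Lines that commonly begin a kernel crash report; an assumption, not exhaustive.
CRASH_START = re.compile(r"Oops|BUG:|kernel BUG at")


def extract_crash_report(log_lines: list, max_lines: int = 40) -> list:
    """Return the first crash report found in a kernel log, truncated.

    The matching line and at most max_lines - 1 following lines are kept;
    an empty list means no crash marker was found.
    """
    for i, line in enumerate(log_lines):
        if CRASH_START.search(line):
            return log_lines[i:i + max_lines]
    return []
```

Capping the excerpt at a fixed length keeps the report small enough to paste into an email, which is the same goal Linus states for oops reports.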
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 16:45 ` Linus Torvalds 2007-05-25 17:03 ` Alan Cox @ 2007-05-25 17:07 ` Chuck Ebbert 2007-05-25 17:21 ` Linus Torvalds 2007-05-25 18:03 ` Chris Newport 2 siblings, 1 reply; 35+ messages in thread From: Chuck Ebbert @ 2007-05-25 17:07 UTC (permalink / raw) To: Linus Torvalds Cc: Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On 05/25/2007 12:45 PM, Linus Torvalds wrote: > Yes, in a controlled environment, dumping the whole memory image to disk > may be the right thing to do. BUT: in a controlled environment, you'll > never get the kind of usage that Linux gets. Why do you think Linux (and > Windows, for that matter) took away a lot of the market from traditional > UNIX? > Windows can dump memory to the swap file on crash. Default is a "minidump" IIRC but you can set it to dump all memory (or none.) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 17:07 ` Chuck Ebbert @ 2007-05-25 17:21 ` Linus Torvalds 0 siblings, 0 replies; 35+ messages in thread From: Linus Torvalds @ 2007-05-25 17:21 UTC (permalink / raw) To: Chuck Ebbert Cc: Chris Newport, Ingo Molnar, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman On Fri, 25 May 2007, Chuck Ebbert wrote: > > Windows can dump memory to the swap file on crash. Default is a "minidump" > IIRC but you can set it to dump all memory (or none.) And Linux can too. And exactly as with Windows, nobody should ever use it. It's a *developer* thing. It's not a user debug facility. And even developers are not all that likely to use it. Linus ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 16:45 ` Linus Torvalds 2007-05-25 17:03 ` Alan Cox 2007-05-25 17:07 ` Chuck Ebbert @ 2007-05-25 18:03 ` Chris Newport 2007-05-25 20:36 ` David Miller 2007-05-26 13:16 ` [linux-pm] " Matt Sealey 2 siblings, 2 replies; 35+ messages in thread From: Chris Newport @ 2007-05-25 18:03 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Christoph Lameter, Michal Piotrowski, Andrew Morton, LKML, Cherwin R. Nooitmeer, linux-pcmcia, Robert de Rooy, Alan Cox, Tejun Heo, sparclinux, David Miller, Mikael Pettersson, linux1394-devel, Stefan Richter, Kristian Høgsberg, linux-pm, Rafael J. Wysocki, Pavel Machek, Marcus Better, Andrey Borzenkov, linux-usb-devel, Greg Kroah-Hartman Sorry, I did not make myself clear. Linus Torvalds wrote: >On Fri, 25 May 2007, Chris Newport wrote: > > >>Maybe we should take a hint from Solaris. >> >> > >No. Solaris is shit. They make their decisions based on "we control the >hardware" kind of setup. > > Not really a Solaris feature. This is a feature of the Openboot PROM, which is also used by several other vendors. The Openboot PROM knows how to write to disk. The same should apply on Apple hardware and others which use the openboot convention. If dumps are enabled (they are disabled by default in a file read at boot), the crash() function need only set a couple of registers and do a PROM interrupt. At this point the kernel is no longer involved, so broken drivers etc. are not an issue. The cute bit is that the SunOS debug program can be called as debug $DUMPFILE and it takes you to the failure point just like a tracefile. Crashdumps should not be enabled by default; they can chew up rather a lot of disk space by making a crashdump.datetime file every time something breaks <B-). In most cases only developers will use this, but it does resolve the problem of error messages vanishing before they can be saved. > > >>If the kernel crashes Solaris dumps core to swap and sets a flag.
>>At the next boot this image is copied to /var/adm/crashdump where >>it is preserved for future debugging. Obviously swap needs to be >>larger than core, but this is usually the case. >> >> > >(a) it's not necessarily the case at all on many systems > >(b) _most_ crashes that are real BUG()'s (rather than WARN_ON()'s) leave > the system in such a fragile state that trying to write to disk is the > _last_ thing you should do. > > Linux does the right thing: it tries to not make bugs fatal. > Generally, you should see an oops, and things continue. Or a > WARN_ON(), and things continue. But you should avoid the "the machine > is now dead" cases. > >(c) have you looked at the size of drivers lately? I'd argue that *most* > bugs by far happen in something driver-related, and most of our source > code is likely drivers. > > Writing to disk when the biggest problem is a driver to begin with > is INSANE. > >So the fact is, Solaris is crap, and to a large degree Solaris is crap >exactly _because_ it assumes that it runs in a "controlled environment". > >Yes, in a controlled environment, dumping the whole memory image to disk >may be the right thing to do. BUT: in a controlled environment, you'll >never get the kind of usage that Linux gets. Why do you think Linux (and >Windows, for that matter) took away a lot of the market from traditional >UNIX? > >Answer: the traditional UNIX hardware/control model doesn't _work_. People >want more flexibility, both on a hardware side and on a usage side. And >once you have the flexibility, the "dump everything to disk" is simply not >an option any more. > >Disk dumps etc are options at things like wall street. But look at the bug >reports, and ask yourself how many of them happen at Wall Street, and how >many of them would even be _relevant_ to somebody there? > >So forget about it. The whole model is totally broken. 
We need to make >bug-reports short and sweet, enough so that random people can >copy-and-paste them into an email or take a digital photo. Anything else >IS TOTALLY INSANE AND USELESS! > > Linus > > ^ permalink raw reply [flat|nested] 35+ messages in thread
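The dump-to-swap scheme Chris describes can be modelled in a few lines: at crash time, write the image to swap and set a flag; at the next boot, copy the image out and clear the flag. The sketch below is a toy simulation using ordinary files. The header layout, marker value, output file name and function names are all invented for illustration and do not reflect Solaris's real dump format or its savecore tool.

```python
import os
import struct
from typing import Optional

# Toy header: a 4-byte "dump present" marker plus the image length.
# Marker value and layout are invented for this illustration.
MAGIC = b"DUMP"
HEADER = struct.Struct("<4sQ")


def write_dump(swap_path: str, memory_image: bytes) -> None:
    """Crash time: store the image at the start of the swap area and set the flag."""
    needed = HEADER.size + len(memory_image)
    if os.path.getsize(swap_path) < needed:
        # Mirrors the objection that swap is not always larger than memory.
        raise ValueError("swap area too small to hold the memory image")
    with open(swap_path, "r+b") as swap:
        swap.write(HEADER.pack(MAGIC, len(memory_image)))
        swap.write(memory_image)


def save_dump_at_boot(swap_path: str, out_dir: str) -> Optional[str]:
    """Boot time: copy a pending image out of swap, then clear the flag."""
    with open(swap_path, "r+b") as swap:
        raw = swap.read(HEADER.size)
        if len(raw) < HEADER.size:
            return None
        magic, length = HEADER.unpack(raw)
        if magic != MAGIC:
            return None  # no dump pending
        image = swap.read(length)
        swap.seek(0)
        swap.write(b"\0" * HEADER.size)  # clear the flag so the image is saved once
    out_path = os.path.join(out_dir, "crashdump.0")
    with open(out_path, "wb") as out:
        out.write(image)
    return out_path
```

A real implementation would also have to copy the image out before the swap area is activated, because normal paging would otherwise overwrite it. The size check reflects point (a) in Linus's objection above.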
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 18:03 ` Chris Newport @ 2007-05-25 20:36 ` David Miller 2007-05-26 13:16 ` [linux-pm] " Matt Sealey 1 sibling, 0 replies; 35+ messages in thread From: David Miller @ 2007-05-25 20:36 UTC (permalink / raw) To: crn Cc: torvalds, mingo, clameter, michal.k.k.piotrowski, akpm, linux-kernel, cherwin, linux-pcmcia, robert.de.rooy, alan, htejun, sparclinux, mikpe, linux1394-devel, stefanr, krh, linux-pm, rjw, pavel, marcus, arvidjaar, linux-usb-devel, gregkh From: Chris Newport <crn@netunix.com> Date: Fri, 25 May 2007 19:03:51 +0100 > Not really a Solaris feature. This is a feature of the Openboot PROM > which is also used by several other vendors. > The Openboot PROM knows how to write to disk. The same should > apply on Apple hardware and others which use the openboot > convention. This is totally unusable even if it weren't stupid, and as someone who loves OpenBoot I think it's very stupid. The reason it's unusable is that PowerPC already, and sparc64 soon (in order to support LDOMs), drop the OBP firmware entirely very soon after early kernel boot. We pull in the device tree and then say "see ya" to openboot. PowerPC does it because sharing the address space with openboot is next to impossible on that cpu; sparc64 will need to do it because dynamic reconfiguration of cpus in an LDOM is too hard to do with openboot there. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [linux-pm] Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-25 18:03 ` Chris Newport 2007-05-25 20:36 ` David Miller @ 2007-05-26 13:16 ` Matt Sealey 1 sibling, 0 replies; 35+ messages in thread From: Matt Sealey @ 2007-05-26 13:16 UTC (permalink / raw) To: crn Cc: Linus Torvalds, Cherwin R. Nooitmeer, Kristian Høgsberg, Mikael Pettersson, Pavel Machek, sparclinux, linux1394-devel, linux-usb-devel, Alan Cox, linux-pm, Christoph Lameter, Robert de Rooy, Tejun Heo, Michal Piotrowski, linux-pcmcia, Marcus Better, Ingo Molnar, Greg Kroah-Hartman, LKML, Stefan Richter, Andrew Morton, Andrey Borzenkov, David Miller Chris Newport wrote: > > Sorry, I did not make myself clear. > > Linus Torvalds wrote: > >> On Fri, 25 May 2007, Chris Newport wrote: >> >> >>> Maybe we should take a hint from Solaris. >>> >> >> No. Solaris is shit. They make their decisions based on "we control >> the hardware" kind of setup. >> >> > Not really a Solaris feature. This is a feature of the Openboot PROM > which is also used by several other vendors. > The Openboot PROM knows how to write to disk. The same should > apply on Apple hardware and others which use the openboot > convention. It doesn't, though. It is not part of any specification of Open Firmware that you MUST be able to write to any exposed disk. In fact, I know of one firmware implementation (ours) where we don't allow it. Apart from being a bitch to implement safely (i.e. for people's data), it is also quite a security problem to allow the firmware interface to write to the disk. Unfortunately, you can't distinguish between "at the firmware console" and "inside a booted OS". The 'solution' in Solaris is actually that the filesystem and disk write handling is done by the OS bootloader and not the PROM. All the PROM knows how to do is read off the bootloader (in a special partition in a special filesystem format) and execute it after probing the hardware and providing the device tree.
The first stage boot loader then knows how to read UFS filesystems so it can grab the kernel and load kernel modules, and write back a kernel dump if it needs to. That, and Linux drops OF support like a lead weight very early in boot. About all you can rely on is the device tree listing all your disks present, and even then, Linux will redetect all of these with native drivers and give them new names anyway. In fact, Solaris does the same (but it is better about associating the device tree entries than Linux) -- Matt Sealey <matt@genesi-usa.com> Genesi, Manager, Developer Relations ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2/3] 2.6.22-rc2: known regressions v2 2007-05-24 14:04 ` [2/3] 2.6.22-rc2: known regressions v2 Michal Piotrowski 2007-05-24 14:18 ` [linux-pm] " Alan Stern 2007-05-24 17:01 ` Christoph Lameter @ 2007-06-03 6:47 ` Stefan Richter 2 siblings, 0 replies; 35+ messages in thread From: Stefan Richter @ 2007-06-03 6:47 UTC (permalink / raw) To: Michal Piotrowski Cc: Linus Torvalds, Kristian Høgsberg, LKML, Marcus Better, Andrew Morton Michal Piotrowski wrote: > Subject : 2.6.22-rc1 suspend to RAM problem > References : http://permalink.gmane.org/gmane.linux.power-management.general/5819 > Submitter : Marcus Better <marcus@better.se> > Handled-By : Stefan Richter <stefanr@s5r6.in-berlin.de> > Kristian Høgsberg <krh@bitplanet.net> > Status : caused by fw-ohci module Fixed in Linus' tree. -- Stefan Richter -=====-=-=== -==- ---== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [3/3] 2.6.22-rc2: known regressions v2 [not found] <46558708.2040803@googlemail.com> 2007-05-24 14:04 ` [2/3] 2.6.22-rc2: known regressions v2 Michal Piotrowski @ 2007-05-24 14:04 ` Michal Piotrowski 2007-05-24 22:04 ` Greg KH 1 sibling, 1 reply; 35+ messages in thread From: Michal Piotrowski @ 2007-05-24 14:04 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, LKML, linux-usb-devel, art@usfltd.com, Greg Kroah-Hartman, Mauro Carvalho Chehab, Hans Verkuil, Robert Fitzsimons Hi all, Here is a list of some known regressions in 2.6.22-rc2. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions USB Subject : usb hotplug/udev cannot correctly register usb/scanners References : http://lkml.org/lkml/2007/5/15/205 Submitter : art@usfltd.com <art@usfltd.com> Status : Unknown V4L Subject : V4L ABI breakage References : http://lkml.org/lkml/2007/5/14/42 Submitter : Robert Fitzsimons <robfitz@273k.net> Caused-By : Hans Verkuil <hverkuil@xs4all.nl> Mauro Carvalho Chehab <mchehab@infradead.org> commit 206ebaf32795cf1582b1e2ff2ec6a560c9e986b8 Workaround : http://git.kernel.org/?p=linux/kernel/git/mchehab/v4l-dvb.git;a=commitdiff;h=15ac2d293b08b6975c6ca1a80eb839d4cb0dddbf Status : problem is being debugged Regards, Michal -- "Najbardziej brakowało mi twojego milczenia." -- Andrzej Sapkowski "Coś więcej" ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [3/3] 2.6.22-rc2: known regressions v2 2007-05-24 14:04 ` [3/3] " Michal Piotrowski @ 2007-05-24 22:04 ` Greg KH 0 siblings, 0 replies; 35+ messages in thread From: Greg KH @ 2007-05-24 22:04 UTC (permalink / raw) To: Michal Piotrowski Cc: Linus Torvalds, Andrew Morton, LKML, linux-usb-devel, art@usfltd.com, Mauro Carvalho Chehab, Hans Verkuil, Robert Fitzsimons On Thu, May 24, 2007 at 04:04:11PM +0200, Michal Piotrowski wrote: > Hi all, > > Here is a list of some known regressions in 2.6.22-rc2. > > Feel free to add new regressions/remove fixed etc. > http://kernelnewbies.org/known_regressions > > > > USB > > Subject : usb hotplug/udev cannot correctly register usb/scanners > References : http://lkml.org/lkml/2007/5/15/205 > Submitter : art@usfltd.com <art@usfltd.com> > Status : Unknown I'm still working to track this down. As scanners are in userspace, I really suspect a bad udev rule here... thanks, greg k-h ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread
Thread overview: 35+ messages
[not found] <46558708.2040803@googlemail.com>
2007-05-24 14:04 ` [2/3] 2.6.22-rc2: known regressions v2 Michal Piotrowski
2007-05-24 14:18 ` [linux-pm] " Alan Stern
2007-05-24 17:01 ` Christoph Lameter
2007-05-24 17:12 ` Linus Torvalds
2007-05-24 17:18 ` Christoph Lameter
2007-05-24 18:49 ` Andrew Morton
2007-05-24 19:21 ` Linus Torvalds
2007-05-24 21:02 ` Jeff Garzik
2007-05-24 19:37 ` Ingo Molnar
2007-05-24 19:50 ` Linus Torvalds
2007-05-24 20:02 ` David Woodhouse
2007-05-25 10:11 ` Ingo Molnar
2007-05-25 10:18 ` Stefan Richter
2007-05-25 16:21 ` Linus Torvalds
2007-05-25 11:53 ` Chris Newport
2007-05-25 12:34 ` Ingo Molnar
2007-05-25 16:33 ` Andrew Morton
2007-05-25 16:45 ` Christoph Lameter
2007-05-28 3:46 ` Vivek Goyal
2007-05-25 12:40 ` Stefan Richter
2007-05-25 16:45 ` Linus Torvalds
2007-05-25 17:03 ` Alan Cox
2007-05-25 17:19 ` Linus Torvalds
2007-05-25 17:37 ` Andrew Morton
2007-05-25 17:48 ` Alan Cox
2007-05-25 17:50 ` Linus Torvalds
2007-05-28 4:27 ` Vivek Goyal
2007-05-25 17:07 ` Chuck Ebbert
2007-05-25 17:21 ` Linus Torvalds
2007-05-25 18:03 ` Chris Newport
2007-05-25 20:36 ` David Miller
2007-05-26 13:16 ` [linux-pm] " Matt Sealey
2007-06-03 6:47 ` Stefan Richter
2007-05-24 14:04 ` [3/3] " Michal Piotrowski
2007-05-24 22:04 ` Greg KH