* Re: Fw: Re: oops in choose_configuration() @ 2006-03-08 3:29 Chuck Ebbert 2006-03-08 3:48 ` Dmitry Torokhov 2006-03-08 4:01 ` Linus Torvalds 0 siblings, 2 replies; 28+ messages in thread From: Chuck Ebbert @ 2006-03-08 3:29 UTC (permalink / raw) To: Greg KH Cc: linux-kernel, Ingo Molnar, Andrew Morton, Linus Torvalds, Dmitry Torokhov In-Reply-To: <20060308005455.GA23921@kroah.com> On Tue, 7 Mar 2006 16:54:55 -0800, Greg KH wrote: > On Tue, Mar 07, 2006 at 04:54:24PM -0500, Chuck Ebbert wrote: > > At least one susbsystem rolls its own method of adding env vars to the > > uevent buffer, and it's so broken it triggers the WARN_ON() in > > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. > > Start at drivers/input/input.c::input_dev_uevent() and watch the fun. > > All of the INPUT_ADD_HOTPLUG_VAR() calls do use add_uevent_var(), so we > should be safe there. The other calls also look safe, if not a bit > wierd... So I don't see how we could change this to be any safer, do > you? input.c line 747+ was recently added and caused the error message: [1]=> envp[i++] = buffer + len; [2]=> len += snprintf(buffer + len, buffer_size - len, "MODALIAS="); [2]=> len += print_modalias(buffer + len, buffer_size - len, dev) + 1; [1]=> envp[i] = NULL; return 0; [1] What is checking for enough space here? This didn't overflow AFAICT but it could. [2] The snprintf() and print_modalias() calls don't check for errors and thus don't return -ENOMEM when the buffer does fill up. Shouldn't they do that instead of returning a truncated env string? Only the final sanity test in snprintf() keeps them from overrunning the buffer. (It was print_modalias_bits() that actually caused the overflow.) > Are you still seeing problems now? Wasn't me, it was Horst Schirmeier <horst@schirmeier.com> -- Chuck "Penguins don't come from next door, they come from the Antarctic!" ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 3:29 Fw: Re: oops in choose_configuration() Chuck Ebbert @ 2006-03-08 3:48 ` Dmitry Torokhov 2006-03-08 4:01 ` Linus Torvalds 1 sibling, 0 replies; 28+ messages in thread From: Dmitry Torokhov @ 2006-03-08 3:48 UTC (permalink / raw) To: Chuck Ebbert Cc: Greg KH, linux-kernel, Ingo Molnar, Andrew Morton, Linus Torvalds On Tuesday 07 March 2006 22:29, Chuck Ebbert wrote: > In-Reply-To: <20060308005455.GA23921@kroah.com> > > On Tue, 7 Mar 2006 16:54:55 -0800, Greg KH wrote: > > > On Tue, Mar 07, 2006 at 04:54:24PM -0500, Chuck Ebbert wrote: > > > At least one susbsystem rolls its own method of adding env vars to the > > > uevent buffer, and it's so broken it triggers the WARN_ON() in > > > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. > > > Start at drivers/input/input.c::input_dev_uevent() and watch the fun. > > > > All of the INPUT_ADD_HOTPLUG_VAR() calls do use add_uevent_var(), so we > > should be safe there. The other calls also look safe, if not a bit > > wierd... So I don't see how we could change this to be any safer, do > > you? > > input.c line 747+ was recently added and caused the error message: > > [1]=> envp[i++] = buffer + len; > [2]=> len += snprintf(buffer + len, buffer_size - len, "MODALIAS="); > [2]=> len += print_modalias(buffer + len, buffer_size - len, dev) + 1; > > [1]=> envp[i] = NULL; > return 0; > > [1] What is checking for enough space here? This didn't overflow AFAICT > but it could. > > [2] The snprintf() and print_modalias() calls don't check for errors and > thus don't return -ENOMEM when the buffer does fill up. Shouldn't they > do that instead of returning a truncated env string? Only the final > sanity test in snprintf() keeps them from overrunning the buffer. > (It was print_modalias_bits() that actually caused the overflow.) > I agree with all of the above, it will be fixed. -- Dmitry ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 3:29 Fw: Re: oops in choose_configuration() Chuck Ebbert 2006-03-08 3:48 ` Dmitry Torokhov @ 2006-03-08 4:01 ` Linus Torvalds 1 sibling, 0 replies; 28+ messages in thread From: Linus Torvalds @ 2006-03-08 4:01 UTC (permalink / raw) To: Chuck Ebbert Cc: Greg KH, linux-kernel, Ingo Molnar, Andrew Morton, Dmitry Torokhov On Tue, 7 Mar 2006, Chuck Ebbert wrote: > > [2] The snprintf() and print_modalias() calls don't check for errors and > thus don't return -ENOMEM when the buffer does fill up. Shouldn't they > do that instead of returning a truncated env string? They try to act like the standard says "snprintf()" should act. And yes, the standard says to return the number of bytes you _would_ have written, had not the buffer been to small. (Of course, giving a negative buffer length is not ok, and the kernel version checking for that is a kernel extension on the standard. In the standard, the buffer size is a "size_t", which doesn't have the notion of "negative", since it's an unsigned type. The kernel version is just being safe and nice). Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration()
@ 2006-03-07 21:54 Chuck Ebbert
2006-03-08 0:54 ` Greg KH
2006-03-08 0:57 ` Linus Torvalds
0 siblings, 2 replies; 28+ messages in thread
From: Chuck Ebbert @ 2006-03-07 21:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Greg KH, Ingo Molnar, linux-kernel
In-Reply-To: <Pine.LNX.4.64.0603051840280.13139@g5.osdl.org>
On Sun, 5 Mar 2006 19:27:53 -0800, Linus Torvalds wrote:
> So I'd be more inclined to blame a buffer overflow on a kmalloc, and the
> obvious target is the "add_uevent_var()" thing, since all/many of the
> corruptions seem to come from uevent environment variable strings.
At least one susbsystem rolls its own method of adding env vars to the
uevent buffer, and it's so broken it triggers the WARN_ON() in
lib/vsprintf.c::vsnprintf() by passing a negative length to that function.
Start at drivers/input/input.c::input_dev_uevent() and watch the fun.
I reported this to linux-kernel, the input maintainer and the author
of that code on Feb. 26:
http://lkml.org/lkml/2006/2/26/39
--
Chuck
"Penguins don't come from next door, they come from the Antarctic!"
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: Fw: Re: oops in choose_configuration() 2006-03-07 21:54 Chuck Ebbert @ 2006-03-08 0:54 ` Greg KH 2006-03-08 0:57 ` Linus Torvalds 1 sibling, 0 replies; 28+ messages in thread From: Greg KH @ 2006-03-08 0:54 UTC (permalink / raw) To: Chuck Ebbert; +Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, linux-kernel On Tue, Mar 07, 2006 at 04:54:24PM -0500, Chuck Ebbert wrote: > In-Reply-To: <Pine.LNX.4.64.0603051840280.13139@g5.osdl.org> > > On Sun, 5 Mar 2006 19:27:53 -0800, Linus Torvalds wrote: > > > So I'd be more inclined to blame a buffer overflow on a kmalloc, and the > > obvious target is the "add_uevent_var()" thing, since all/many of the > > corruptions seem to come from uevent environment variable strings. > > At least one susbsystem rolls its own method of adding env vars to the > uevent buffer, and it's so broken it triggers the WARN_ON() in > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. > Start at drivers/input/input.c::input_dev_uevent() and watch the fun. All of the INPUT_ADD_HOTPLUG_VAR() calls do use add_uevent_var(), so we should be safe there. The other calls also look safe, if not a bit wierd... So I don't see how we could change this to be any safer, do you? > I reported this to linux-kernel, the input maintainer and the author > of that code on Feb. 26: > > http://lkml.org/lkml/2006/2/26/39 We should have fixed that already by increasing the size of the buffer, but yes, we should catch errors in the MODALIAS function, that would have stopped that previous overflow. Are you still seeing problems now? thanks, greg k-h ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-07 21:54 Chuck Ebbert 2006-03-08 0:54 ` Greg KH @ 2006-03-08 0:57 ` Linus Torvalds 2006-03-08 1:13 ` Dmitry Torokhov 2006-03-08 4:28 ` Joe Korty 1 sibling, 2 replies; 28+ messages in thread From: Linus Torvalds @ 2006-03-08 0:57 UTC (permalink / raw) To: Chuck Ebbert, Dmitry Torokhov Cc: Andrew Morton, Greg KH, Ingo Molnar, linux-kernel On Tue, 7 Mar 2006, Chuck Ebbert wrote: > > At least one susbsystem rolls its own method of adding env vars to the > uevent buffer, and it's so broken it triggers the WARN_ON() in > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. Well, snprintf() should be safe, though. It will warn if the caller is lazy, but these days, the thing does max(buf_size - len, 0) which should mean that the input layer passes in 0 instead of a negative number. And snprintf() will then _not_ print anything. So I think input_add_uevent_bm_var() is safe, even if it's not pretty. However, input_devices_read() doesn't do any sanity checking at all, and if that ever ends up printing more than a page, that would be bad. I didn't look very closely, but it looks worrisome. Dmitry? Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 0:57 ` Linus Torvalds @ 2006-03-08 1:13 ` Dmitry Torokhov 2006-03-08 1:27 ` Greg KH 2006-03-08 4:28 ` Joe Korty 1 sibling, 1 reply; 28+ messages in thread From: Dmitry Torokhov @ 2006-03-08 1:13 UTC (permalink / raw) To: Linus Torvalds Cc: Chuck Ebbert, Andrew Morton, Greg KH, Ingo Molnar, linux-kernel On Tuesday 07 March 2006 19:57, Linus Torvalds wrote: > > On Tue, 7 Mar 2006, Chuck Ebbert wrote: > > > > At least one susbsystem rolls its own method of adding env vars to the > > uevent buffer, and it's so broken it triggers the WARN_ON() in > > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. > > Well, snprintf() should be safe, though. It will warn if the caller is > lazy, but these days, the thing does > > max(buf_size - len, 0) > > which should mean that the input layer passes in 0 instead of a negative > number. And snprintf() will then _not_ print anything. > > So I think input_add_uevent_bm_var() is safe, even if it's not pretty. > > However, input_devices_read() doesn't do any sanity checking at all, and > if that ever ends up printing more than a page, that would be bad. I > didn't look very closely, but it looks worrisome. > > Dmitry? I had all this code converted to seq_file, but it depends on converting input handlers to class interfaces and it is not possible nowadays because with latest Greg's changes to class code we would try to register class devices while registering class devices/interfaces (psmouse creates input_dev which binds to mousedev interface which in turn tries to create mouseX which is also belongs to input class) and deadlocking. Greg promised current implementation is only a temporary solution. I suppose I could separate those changes... BTW, what is ETA for 2.6.16 - I could not affort enought time getting psmouse resync logic working in all cases so I'd like to have it disabled for 2.6.16 final. -- Dmitry ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 1:13 ` Dmitry Torokhov @ 2006-03-08 1:27 ` Greg KH 2006-03-08 3:22 ` Dmitry Torokhov 0 siblings, 1 reply; 28+ messages in thread From: Greg KH @ 2006-03-08 1:27 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Chuck Ebbert, Andrew Morton, Ingo Molnar, linux-kernel On Tue, Mar 07, 2006 at 08:13:28PM -0500, Dmitry Torokhov wrote: > On Tuesday 07 March 2006 19:57, Linus Torvalds wrote: > > > > On Tue, 7 Mar 2006, Chuck Ebbert wrote: > > > > > > At least one susbsystem rolls its own method of adding env vars to the > > > uevent buffer, and it's so broken it triggers the WARN_ON() in > > > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. > > > > Well, snprintf() should be safe, though. It will warn if the caller is > > lazy, but these days, the thing does > > > > max(buf_size - len, 0) > > > > which should mean that the input layer passes in 0 instead of a negative > > number. And snprintf() will then _not_ print anything. > > > > So I think input_add_uevent_bm_var() is safe, even if it's not pretty. > > > > However, input_devices_read() doesn't do any sanity checking at all, and > > if that ever ends up printing more than a page, that would be bad. I > > didn't look very closely, but it looks worrisome. > > > > Dmitry? > > I had all this code converted to seq_file, but it depends on converting > input handlers to class interfaces and it is not possible nowadays > because with latest Greg's changes to class code we would try to > register class devices while registering class devices/interfaces > (psmouse creates input_dev which binds to mousedev interface which in > turn tries to create mouseX which is also belongs to input class) and > deadlocking. Greg promised current implementation is only a temporary > solution. > > I suppose I could separate those changes... That would probably be a good idea :) Oh, and I have been working on makeing the correct fix, you can see the initial try of it at: http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/device-class.patch if you are really brave :) thanks, greg k-h ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 1:27 ` Greg KH @ 2006-03-08 3:22 ` Dmitry Torokhov 2006-03-08 5:23 ` Greg KH 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Torokhov @ 2006-03-08 3:22 UTC (permalink / raw) To: Greg KH Cc: Linus Torvalds, Chuck Ebbert, Andrew Morton, Ingo Molnar, linux-kernel On Tuesday 07 March 2006 20:27, Greg KH wrote: > On Tue, Mar 07, 2006 at 08:13:28PM -0500, Dmitry Torokhov wrote: > > On Tuesday 07 March 2006 19:57, Linus Torvalds wrote: > > > > > > On Tue, 7 Mar 2006, Chuck Ebbert wrote: > > > > > > > > At least one susbsystem rolls its own method of adding env vars to the > > > > uevent buffer, and it's so broken it triggers the WARN_ON() in > > > > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. > > > > > > Well, snprintf() should be safe, though. It will warn if the caller is > > > lazy, but these days, the thing does > > > > > > max(buf_size - len, 0) > > > > > > which should mean that the input layer passes in 0 instead of a negative > > > number. And snprintf() will then _not_ print anything. > > > > > > So I think input_add_uevent_bm_var() is safe, even if it's not pretty. > > > > > > However, input_devices_read() doesn't do any sanity checking at all, and > > > if that ever ends up printing more than a page, that would be bad. I > > > didn't look very closely, but it looks worrisome. > > > > > > Dmitry? > > > > I had all this code converted to seq_file, but it depends on converting > > input handlers to class interfaces and it is not possible nowadays > > because with latest Greg's changes to class code we would try to > > register class devices while registering class devices/interfaces > > (psmouse creates input_dev which binds to mousedev interface which in > > turn tries to create mouseX which is also belongs to input class) and > > deadlocking. Greg promised current implementation is only a temporary > > solution. > > > > I suppose I could separate those changes... > > That would probably be a good idea :) > Hmm, what is the policy for attr->show()? With hotplug variables we return -ENOMEM if there is not enough memory to store all data, but what about attributes? Should we also return error (and which one, -ENOMEM, -ENOBUFS?) or fill as much as we can and return up to PAGE_SIZE? With sysfs not kernel nor application can really recover if attribute needs buffer larger than a page. Or just rely on BUG_ON in fs/sysfs/file.c::fill_read_buffer()? -- Dmitry ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 3:22 ` Dmitry Torokhov @ 2006-03-08 5:23 ` Greg KH 2006-03-08 6:06 ` Dmitry Torokhov 0 siblings, 1 reply; 28+ messages in thread From: Greg KH @ 2006-03-08 5:23 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Chuck Ebbert, Andrew Morton, Ingo Molnar, linux-kernel On Tue, Mar 07, 2006 at 10:22:10PM -0500, Dmitry Torokhov wrote: > On Tuesday 07 March 2006 20:27, Greg KH wrote: > > On Tue, Mar 07, 2006 at 08:13:28PM -0500, Dmitry Torokhov wrote: > > > On Tuesday 07 March 2006 19:57, Linus Torvalds wrote: > > > > > > > > On Tue, 7 Mar 2006, Chuck Ebbert wrote: > > > > > > > > > > At least one susbsystem rolls its own method of adding env vars to the > > > > > uevent buffer, and it's so broken it triggers the WARN_ON() in > > > > > lib/vsprintf.c::vsnprintf() by passing a negative length to that function. > > > > > > > > Well, snprintf() should be safe, though. It will warn if the caller is > > > > lazy, but these days, the thing does > > > > > > > > max(buf_size - len, 0) > > > > > > > > which should mean that the input layer passes in 0 instead of a negative > > > > number. And snprintf() will then _not_ print anything. > > > > > > > > So I think input_add_uevent_bm_var() is safe, even if it's not pretty. > > > > > > > > However, input_devices_read() doesn't do any sanity checking at all, and > > > > if that ever ends up printing more than a page, that would be bad. I > > > > didn't look very closely, but it looks worrisome. > > > > > > > > Dmitry? > > > > > > I had all this code converted to seq_file, but it depends on converting > > > input handlers to class interfaces and it is not possible nowadays > > > because with latest Greg's changes to class code we would try to > > > register class devices while registering class devices/interfaces > > > (psmouse creates input_dev which binds to mousedev interface which in > > > turn tries to create mouseX which is also belongs to input class) and > > > deadlocking. Greg promised current implementation is only a temporary > > > solution. > > > > > > I suppose I could separate those changes... > > > > That would probably be a good idea :) > > > > Hmm, what is the policy for attr->show()? With hotplug variables we > return -ENOMEM if there is not enough memory to store all data, but > what about attributes? Should we also return error (and which one, > -ENOMEM, -ENOBUFS?) or fill as much as we can and return up to > PAGE_SIZE? Remember, sysfs files are supposed to be small, you are an "oddity" in that you have a much larger buffer that you can return due to the wierd aliases you have. Truncating the buffer is probably good as we want userspace to get some information, right? > With sysfs not kernel nor application can really recover > if attribute needs buffer larger than a page. Or just rely on BUG_ON > in fs/sysfs/file.c::fill_read_buffer()? How about just making this a binary attribute, then you can handle an arbitrary size buffer and don't have to worry about the PAGE_SIZE stuff (but it makes it more code that you have to write to handle it all, there are tradeoffs...) thanks, greg k-h ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 5:23 ` Greg KH @ 2006-03-08 6:06 ` Dmitry Torokhov 2006-03-08 6:15 ` Greg KH 0 siblings, 1 reply; 28+ messages in thread From: Dmitry Torokhov @ 2006-03-08 6:06 UTC (permalink / raw) To: Greg KH Cc: Linus Torvalds, Chuck Ebbert, Andrew Morton, Ingo Molnar, linux-kernel On Wednesday 08 March 2006 00:23, Greg KH wrote: > > > > Hmm, what is the policy for attr->show()? With hotplug variables we > > return -ENOMEM if there is not enough memory to store all data, but > > what about attributes? Should we also return error (and which one, > > -ENOMEM, -ENOBUFS?) or fill as much as we can and return up to > > PAGE_SIZE? > > Remember, sysfs files are supposed to be small, you are an "oddity" in > that you have a much larger buffer that you can return due to the wierd > aliases you have. > > Truncating the buffer is probably good as we want userspace to get some > information, right? > > > With sysfs not kernel nor application can really recover > > if attribute needs buffer larger than a page. Or just rely on BUG_ON > > in fs/sysfs/file.c::fill_read_buffer()? > > How about just making this a binary attribute, then you can handle an > arbitrary size buffer and don't have to worry about the PAGE_SIZE stuff > (but it makes it more code that you have to write to handle it all, > there are tradeoffs...) > I really don't believe that we ever going to cross 4096 boundary for any single input attribute, but just to code defensively we need to decide what to do if we ever encounter a crazy device. Just truncating to PAGE_SIZE is easiest so that's what I am going to do. -- Dmitry ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 6:06 ` Dmitry Torokhov @ 2006-03-08 6:15 ` Greg KH 0 siblings, 0 replies; 28+ messages in thread From: Greg KH @ 2006-03-08 6:15 UTC (permalink / raw) To: Dmitry Torokhov Cc: Linus Torvalds, Chuck Ebbert, Andrew Morton, Ingo Molnar, linux-kernel On Wed, Mar 08, 2006 at 01:06:59AM -0500, Dmitry Torokhov wrote: > On Wednesday 08 March 2006 00:23, Greg KH wrote: > > > > > > Hmm, what is the policy for attr->show()? With hotplug variables we > > > return -ENOMEM if there is not enough memory to store all data, but > > > what about attributes? Should we also return error (and which one, > > > -ENOMEM, -ENOBUFS?) or fill as much as we can and return up to > > > PAGE_SIZE? > > > > Remember, sysfs files are supposed to be small, you are an "oddity" in > > that you have a much larger buffer that you can return due to the wierd > > aliases you have. > > > > Truncating the buffer is probably good as we want userspace to get some > > information, right? > > > > > With sysfs not kernel nor application can really recover > > > if attribute needs buffer larger than a page. Or just rely on BUG_ON > > > in fs/sysfs/file.c::fill_read_buffer()? > > > > How about just making this a binary attribute, then you can handle an > > arbitrary size buffer and don't have to worry about the PAGE_SIZE stuff > > (but it makes it more code that you have to write to handle it all, > > there are tradeoffs...) > > > > I really don't believe that we ever going to cross 4096 boundary for any > single input attribute, but just to code defensively we need to decide > what to do if we ever encounter a crazy device. Just truncating to > PAGE_SIZE is easiest so that's what I am going to do. Ok, that's fine with me. thanks, greg k-h ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 0:57 ` Linus Torvalds 2006-03-08 1:13 ` Dmitry Torokhov @ 2006-03-08 4:28 ` Joe Korty 2006-03-08 4:44 ` Dmitry Torokhov 1 sibling, 1 reply; 28+ messages in thread From: Joe Korty @ 2006-03-08 4:28 UTC (permalink / raw) To: Linus Torvalds Cc: Chuck Ebbert, Dmitry Torokhov, Andrew Morton, Greg KH, Ingo Molnar, linux-kernel On Tue, Mar 07, 2006 at 04:57:39PM -0800, Linus Torvalds wrote: > Well, snprintf() should be safe, though. It will warn if the caller is > lazy, but these days, the thing does > > max(buf_size - len, 0) > > which should mean that the input layer passes in 0 instead of a negative > number. And snprintf() will then _not_ print anything. I assume this is a typo, and you meant scnprintf? AFAIK, snprintf has the same ol' bad behavior when #bytes-to-be-written > #bytes-in-buffer. Joe ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 4:28 ` Joe Korty @ 2006-03-08 4:44 ` Dmitry Torokhov 0 siblings, 0 replies; 28+ messages in thread From: Dmitry Torokhov @ 2006-03-08 4:44 UTC (permalink / raw) To: joe.korty Cc: Linus Torvalds, Chuck Ebbert, Andrew Morton, Greg KH, Ingo Molnar, linux-kernel On Tuesday 07 March 2006 23:28, Joe Korty wrote: > On Tue, Mar 07, 2006 at 04:57:39PM -0800, Linus Torvalds wrote: > > > Well, snprintf() should be safe, though. It will warn if the caller is > > lazy, but these days, the thing does > > > > max(buf_size - len, 0) > > > > which should mean that the input layer passes in 0 instead of a negative > > number. And snprintf() will then _not_ print anything. > > I assume this is a typo, and you meant scnprintf? AFAIK, snprintf has > the same ol' bad behavior when #bytes-to-be-written > #bytes-in-buffer. > No, we do want to know if output was truncated or not so snprintf is used. -- Dmitry ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20060304121723.19fe9b4b.akpm@osdl.org>]
[parent not found: <Pine.LNX.4.64.0603041235110.22647@g5.osdl.org>]
[parent not found: <20060304213447.GA4445@kroah.com>]
[parent not found: <20060304135138.613021bd.akpm@osdl.org>]
[parent not found: <20060304221810.GA20011@kroah.com>]
* Re: Fw: Re: oops in choose_configuration() [not found] ` <20060304221810.GA20011@kroah.com> @ 2006-03-05 23:48 ` Andrew Morton 2006-03-06 3:27 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Andrew Morton @ 2006-03-05 23:48 UTC (permalink / raw) To: Greg KH; +Cc: torvalds, Ingo Molnar, linux-kernel Might as well cc the list on this, so there's a record... For several days I've been getting repeatable oopses in the -mm kernel. They occur once per ~30 boots, during initscripts. Two days of bisection searching shows that when this scheduler patch ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/sched-alter_uninterruptible_sleep_interactivity.patch is included, the oopses start occurring. I don't think the scheduler patch is the bug - I guess it just alters timing. The pipe code and the exec code are repeatedly implicated in the oopses, and it's noteworthy that sched-make_task_noninteractive_use_sleep_type.patch, which affects the pipe wakeup handling does _not_ need to be present for this bug to manifest. I tested current -linus with just these patches applied: sched-fix-task-interactivity-calculation sched-implement-smpnice sched-smpnice-apply-review-suggestions sched-smpnice-fix-average-load-per-run-queue-calculations sched-cleanup_task_activated sched-alter_uninterruptible_sleep_interactivity sched-make_task_noninteractive_use_sleep_type sched-dont_decrease_idle_sleep_avg sched-include_noninteractive_sleep_in_idle_detect sched-remove-on-runqueue-requeueing and mainline oopses too. So either there's a bug in mainline, or there's a bug in one of these: sched-fix-task-interactivity-calculation sched-implement-smpnice sched-smpnice-apply-review-suggestions sched-smpnice-fix-average-load-per-run-queue-calculations sched-cleanup_task_activated sched-alter_uninterruptible_sleep_interactivity The usual pattern of the oopses is that something has been overwritten with some data which has been used as an environment string by the uevent code. Example: SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts Unable to handle kernel paging request at virtual address 44535955 printing eip: *pde = 00000000 Oops: 0000 [#1] Modules linked in: nvram ipw2200 ohci1394 ide_cd cdrom ieee1394 uhci_hcd sg ehci_hcd ieee80211 ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event i2c_i801 snd_seq i2c_core snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore piix hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c0242dd3>] Not tainted VLI EFLAGS: 00010246 (2.6.16-rc5-mm2 #298) EIP is at choose_configuration+0x58/0x13a eax: 000000d2 ebx: f7724c00 ecx: 44535950 edx: 000001f4 esi: f77e3600 edi: 00000001 ebp: 00000000 esp: c1d7def8 ds: 007b es: 007b ss: 0068 Process khubd (pid: 114, threadinfo=c1d7d000 task=c1d7ca90) Stack: <0>00000000 0000d000 f7724c00 f7724c00 00000000 00000001 c0242f68 00000000 f7724c00 c1d410c0 c02440b8 00000000 0000000a f7c71214 f7055000 00000010 c1d410c0 00000001 f7055000 00000001 c024450f 00000001 00000000 f7c71200 Call Trace: [<c0242f68>] usb_new_device+0xb3/0x129 [<c02440b8>] hub_port_connect_change+0x1ad/0x343 [<c024450f>] hub_events+0x2c1/0x3f1 [<c024463f>] hub_thread+0x0/0xe6 [<c0244649>] hub_thread+0xa/0xe6 [<c0129bb3>] autoremove_wake_function+0x0/0x2d [<c0129bb3>] autoremove_wake_function+0x0/0x2d [<c01298a8>] kthread+0x9c/0xa1 [<c012980c>] kthread+0x0/0xa1 [<c01012e1>] kernel_thread_helper+0x5/0xb Code: 00 58 39 fd c7 04 24 00 00 00 00 7d 53 0f b6 46 08 8b 8e 90 00 00 00 0f b7 93 18 02 00 00 01 c0 83 c1 08 39 d0 7f 2e 85 ed 75 0a <80> 79 05 02 0f 84 bf 00 00 00 80 bb 7c 01 00 00 ff 74 0a 80 79 <7>ieee1394: Host added: ID:BUS[0-00:1023] GUID[0800460300efc55d] Here, the faulting address is ascii "UYSD". Ignore the "D" because it was offset a bit by pointer indexing. That YSD appears in drivers/base/core.c:dev_uevent(). Previously I've seen corruption of the local linux_binprm storage in do_execve(). Often I've seen oopses like this: audit(1141477276.672:3): policy loaded auid=4294967295 SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004 printing eip: *pde = 00000000 Oops: 0002 [#1] last sysfs file: /block/hda/removable Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd uhci_hcd ieee80211 sg ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer snd i2c_core piix soundcore hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c0129011>] Not tainted VLI EFLAGS: 00010046 (2.6.16-rc5-mm2 #287) EIP is at add_wait_queue+0xd/0x1d eax: f76dc000 ebx: 00000000 ecx: f768a018 edx: f768a00c esi: 00000246 edi: f7f6b200 ebp: f7106764 esp: f7763ea0 ds: 007b es: 007b ss: 0068 Process udevd (pid: 1005, threadinfo=f7763000 task=f7729ab0) Stack: <0>f76dc000 f7763f0c c015e6a7 f7f6b200 00000000 00000008 00000145 c0163c6e 00000000 00000000 00000000 00000003 00000008 00000000 00000000 00000008 f705bf6c f705bf68 f705bf64 f705bf74 f705bf70 f705bf6c 00000003 f7763f0c Call Trace: <c015e6a7> pipe_poll+0x24/0x94 <c0163c6e> do_select+0x2d1/0x2d5 <c016385f> __pollwait+0x0/0x94 <c0163e49> core_sys_select+0x1c4/0x2f8 <c0164007> sys_select+0x8a/0x18f <c0102a9b> sysenter_past_esp+0x54/0x75 Code: 11 c0 e8 af e1 ff ff eb c2 89 d8 e8 28 dd fe ff eb af 8d 06 e8 55 72 19 00 eb a6 90 90 90 56 53 83 22 fe 9c 5e fa 8b 18 8d 4a 0c <89> 4b 04 89 5a 0c 89 41 04 89 08 56 9d 5b 5e c3 56 53 83 0a 01 <1>BUG: unable to handle kernel paging request at virtual address 464c457f printing eip: c0117418 *pde = 3e427067 Oops: 0000 [#2] last sysfs file: /class/firmware/0000:06:0b.0/loading Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd uhci_hcd ieee80211 sg ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer snd i2c_core piix soundcore hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c0117418>] Not tainted VLI EFLAGS: 00010092 (2.6.16-rc5-mm2 #287) EIP is at __wake_up_common+0x12/0x57 eax: 464c457f ebx: 00000202 ecx: 00000001 edx: 00000001 esi: 00000001 edi: f76dc000 ebp: f7eb1f1c esp: f7eb1f04 ds: 007b es: 007b ss: 0068 Process cat (pid: 1013, threadinfo=f7eb1000 task=f7ebe570) Stack: <0>00000006 c0339118 00000001 00000202 c17e9f40 00000006 f7eb1f30 c011746e 00000000 00000000 f76dc000 00000001 c015e4e8 00000000 f76dc000 f76dc010 00000000 00000001 00000006 f76dc000 f7106764 f7eb1f78 f7f6b740 00000006 Call Trace: <c011746e> __wake_up+0x11/0x1a <c015e4e8> pipe_writev+0x273/0x38a <c015e61b> pipe_write+0x1c/0x20 <c0154386> vfs_write+0xb1/0x151 <c01544c6> sys_write+0x3c/0x64 <c0102a9b> sysenter_past_esp+0x54/0x75 Code: c0 89 f0 e8 df f6 ff ff e9 f4 fd ff ff 55 89 e5 8b 40 04 5d e9 95 f8 ff ff 55 89 e5 57 89 c7 56 89 ce 53 83 ec 0c 89 55 f0 8b 00 <8b> 10 39 f8 89 55 ec 74 34 8d 58 f4 8b 55 f0 8b 40 f4 ff 75 0c <1>BUG: unable to handle kernel paging request at virtual address 6f732e6f printing eip: c0163033 *pde = 00000000 Oops: 0000 [#3] Note that in oops #2, the faulting address is ascii "\7fELF". We think that kernel_read() has read an elf file on top of *(inode->i_pipe) in pipe_writev. Note that i_pipe and linux_binprm both use the size-512 slab, and the offsets into that memory is 0x00 in both cases. But the management of do_execve():bprm is super-simple. Here's another pipe-got-overwritten-by-ELF oops: audit(1141524105.698:3): policy loaded auid=4294967295 SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts BUG: unable to handle kernel paging request at virtual address 464c457f printing eip: *pde = 3eeb0067 Oops: 0000 [#1] last sysfs file: /devices/pci0000:00/0000:00:1d.7/usb5/5-0:1.0/modalias Modules linked in: ipw2200 ohci1394 ieee1394 ehci_hcd uhci_hcd sg ieee80211 ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer piix snd i2c_core soundcore hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c011741a>] Not tainted VLI EFLAGS: 00010092 (2.6.16-rc5-mm2 #293) EIP is at __wake_up_common+0x12/0x57 eax: 464c457f ebx: 00000202 ecx: 00000001 edx: 00000001 esi: 00000001 edi: f76fde00 ebp: f7f7af1c esp: f7f7af04 ds: 007b es: 007b ss: 0068 Process sh (pid: 994, threadinfo=f7f7a000 task=f776a030) Stack: <0>0000000f c0338128 00000001 00000202 c17d5b60 0000000f f7f7af30 c0117470 00000000 00000000 f76fde00 00000001 c015e00c 00000000 f76fde00 f76fde10 00000000 00000001 0000000f f76fde00 f706b320 f7f7af78 f7702900 0000000f Call Trace: <c0117470> __wake_up+0x11/0x1a <c015e00c> pipe_writev+0x273/0x38a <c015e13f> pipe_write+0x1c/0x20 <c0153eae> vfs_write+0xb1/0x151 <c0153fee> sys_write+0x3c/0x64 <c0102a9b> sysenter_past_esp+0x54/0x75 Code: c0 89 f0 e8 d9 f6 ff ff e9 f4 fd ff ff 55 89 e5 8b 40 04 5d e9 95 f8 ff ff 55 89 e5 57 89 c7 56 89 ce 53 83 ec 0c 89 55 f0 8b 00 <8b> 10 39 f8 89 55 ec 74 34 8d 58 f4 8b 55 f0 8b 40 f4 ff 75 0c <1>BUG: unable to handle kernel paging request at virtual address 0064777f It takes maybe an hour of rebooting to reproduce this. Kernel is uniproc, voluntary preempt (http://www.zip.com.au/~akpm/linux/patches/stuff/config-sony). The next stage in brute-forcing is to start bisecting Linus's tree, applying the sched patches at each stage. At a few hours per step I don't relish that. Ingo, could you please take a real close look at the above six sched patches, see if you can spot anything in there which might cause this? I have a feeling that this bug has been there for a while - I have seen a couple of funny oopses booting this machine over the past couple of months. Those sched patches have been in -mm for a couple of months too. Enabling CONFIG_SLAB_DEBUG or CONFIG_DEBUG_PAGEALLOC seems to make the bug go away (naturally). I'll retest that now. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-05 23:48 ` Andrew Morton @ 2006-03-06 3:27 ` Linus Torvalds 2006-03-06 8:48 ` Andrew Morton 2006-03-06 5:00 ` Linus Torvalds 2006-03-06 11:04 ` Jesper Juhl 2 siblings, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2006-03-06 3:27 UTC (permalink / raw) To: Andrew Morton; +Cc: Greg KH, Ingo Molnar, linux-kernel Andrew, I'm worried about the fact that kmalloc() doesn't get redlining. The lifetime rules for the pipe_info thing and the bprm seems totally obvious, and we'd get slab errors if somebody was doing something strange there anyway. So I'd be more inclined to blame a buffer overflow on a kmalloc, and the obvious target is the "add_uevent_var()" thing, since all/many of the corruptions seem to come from uevent environment variable strings. Because kmalloc() doesn't do redlining, we'd never see an overflow as an error, and it would just overwrite the next block. Now, they are in different slab blocks (the uevent strign allocation is a 2048-byte allocation), but maybe it flows over into the next page entirely.. Now, it all uses "vnsprintf()" which _should_ be safe, but that in turn uses pointer comparisons, so maybe gcc screws that up. Who knows. Gcc has been known to use signed comparisons on pointers and other brokenness. And we could just have screwed something up (not updating "len" when we update the buffer start etc etc) Anyway, this trivial patch will check for buffer length consistency and overflow by just putting a magic value at the end of the buffer, and checking it. Maybe. I don't see anything wrong there, and booting with this patch doesn't trigger anything for me, but it's simple enough to be worth checking out. Linus --- diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c index 086a0c6..366214a 100644 --- a/lib/kobject_uevent.c +++ b/lib/kobject_uevent.c @@ -51,6 +51,9 @@ static char *action_to_string(enum kobje } } +#define UEV_MAGIC (0xc0edbabeu) +#define uev_magic(buffer, len) ((unsigned int *) ((len) + (void *)(buffer)))[-1] + /** * kobject_uevent - notify userspace by ending an uevent * @@ -107,6 +110,8 @@ void kobject_uevent(struct kobject *kobj if (!buffer) goto exit; + uev_magic(buffer, BUFFER_SIZE) = UEV_MAGIC; + /* complete object path */ devpath = kobject_get_path(kobj, GFP_KERNEL); if (!devpath) @@ -223,6 +228,10 @@ int add_uevent_var(char **envp, int num_ { va_list args; + BUG_ON(buffer_size < 4); + BUG_ON(buffer_size > 2048); + BUG_ON(uev_magic(buffer, buffer_size) != UEV_MAGIC); + /* * We check against num_envp - 1 to make sure there is at * least one slot left after we return, since kobject_uevent() ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 3:27 ` Linus Torvalds @ 2006-03-06 8:48 ` Andrew Morton 2006-03-08 1:31 ` Greg KH 0 siblings, 1 reply; 28+ messages in thread From: Andrew Morton @ 2006-03-06 8:48 UTC (permalink / raw) To: Linus Torvalds; +Cc: greg, mingo, linux-kernel Linus Torvalds <torvalds@osdl.org> wrote: > > > Andrew, > I'm worried about the fact that kmalloc() doesn't get redlining. > > The lifetime rules for the pipe_info thing and the bprm seems totally > obvious, and we'd get slab errors if somebody was doing something strange > there anyway. > > So I'd be more inclined to blame a buffer overflow on a kmalloc, and the > obvious target is the "add_uevent_var()" thing, since all/many of the > corruptions seem to come from uevent environment variable strings. > > Because kmalloc() doesn't do redlining, we'd never see an overflow as an > error, and it would just overwrite the next block. Now, they are in > different slab blocks (the uevent strign allocation is a 2048-byte > allocation), but maybe it flows over into the next page entirely.. > > Now, it all uses "vnsprintf()" which _should_ be safe, but that in turn > uses pointer comparisons, so maybe gcc screws that up. Who knows. Gcc has > been known to use signed comparisons on pointers and other brokenness. And > we could just have screwed something up (not updating "len" when we update > the buffer start etc etc) Maybe. Something which is as deterministic as that would trip CONFIG_DEBUG_PAGEALLOC though. > Anyway, this trivial patch will check for buffer length consistency and > overflow by just putting a magic value at the end of the buffer, and > checking it. Maybe. > > I don't see anything wrong there, and booting with this patch doesn't > trigger anything for me, but it's simple enough to be worth checking out. Yup. The wind changed direction and I lost the ability to reproduce it for a few hours. Going back to exactly the rc5-mm2 lineup did the trick. Your debug patch didn't trigger. SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts audit(1141604160.744:3): policy loaded auid=4294967295 SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts BUG: unable to handle kernel paging request at virtual address 44535955 printing eip: *pde = 00000000 Oops: 0000 [#1] last sysfs file: /class/firmware/0000:06:0b.0/loading Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c02437ed>] Not tainted VLI EFLAGS: 00010246 (2.6.16-rc5-mm2 #302) EIP is at choose_configuration+0x58/0x13a eax: 000000d2 ebx: f70aac00 ecx: 44535950 edx: 000001f4 esi: f76eca00 edi: 00000001 ebp: 00000000 esp: f7f3def8 ds: 007b es: 007b ss: 0068 Process khubd (pid: 114, threadinfo=f7f3d000 task=f7f3cab0) Stack: <0>00000000 0000d000 f70aac00 f70aac00 00000000 00000001 c0243a5a 00000000 f70aac00 c1caa3c0 c0244ba1 00000000 0000000a f76ec814 f7029000 00000011 c1caa3c0 00000001 f7029000 00000001 c0244ff8 00000003 00000000 f76ec800 Call Trace: <c0243a5a> usb_new_device+0x15a/0x1d0 <c0244ba1> hub_port_connect_change+0x1ad/0x343 <c0244ff8> hub_events+0x2c1/0x3f1 <c0245128> hub_thread+0x0/0xe6 <c0245132> hub_thread+0xa/0xe6 <c0128dfb> autoremove_wake_function+0x0/0x2d <c0128dfb> autoremove_wake_function+0x0/0x2d <c0128b0c> kthread+0x9c/0xa1 <c0128a70> kthread+0x0/0xa1 <c01012ed> kernel_thread_helper+0x5/0xb Code: 00 58 39 fd c7 04 24 00 00 00 00 7d 53 0f b6 46 08 8b 8e 90 00 00 00 0f b7 93 18 02 00 00 01 c0 83 c1 08 39 d0 7f 2e 85 ed 75 0a <80> 79 05 02 0f 84 bf 00 00 00 80 bb 7c 01 00 00 ff 74 0a 80 79 <6>Non-volatile memory driver v1.2 BUG: unable to handle kernel paging request at virtual address 6963702f printing eip: c01bc9b0 *pde = 00000000 Oops: 0000 [#2] last sysfs file: /class/firmware/0000:06:0b.0/loading Modules linked in: nvram ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c01bc9b0>] Not tainted VLI EFLAGS: 00010246 (2.6.16-rc5-mm2 #302) EIP is at kobject_uevent+0x28d/0x37c eax: 00000000 ebx: f76f76c0 ecx: ffffffff edx: f7176efc esi: 0000002a edi: 6963702f ebp: f77542c0 esp: f7176f04 ds: 007b es: 007b ss: 0068 Process knodemgrd_0 (pid: 1025, threadinfo=f7176000 task=f7788ab0) Stack: <0>f70f1000 c034da80 00000002 f70f1051 f8cf6e4d f7097640 c02e93d8 f76e9568 00000246 00000000 f7758a94 f76e9568 f76e9568 f76e9560 f76e9568 f8d28c40 c021e752 00000000 f70473a0 00000000 f76e9400 f76e9440 f76e9560 f763c000 Call Trace: <c021e752> class_device_add+0x14e/0x201 <f8cf1efe> nodemgr_create_node+0x104/0x17b [ieee1394] <f8cf2b1a> nodemgr_node_scan_one+0x160/0x16e [ieee1394] <f8cf2b76> nodemgr_node_scan+0x4e/0x50 [ieee1394] <f8cf3277> nodemgr_host_thread+0xdc/0x14c [ieee1394] <f8cf319b> nodemgr_host_thread+0x0/0x14c [ieee1394] <c01012ed> kernel_thread_helper+0x5/0xb Code: 24 1c 68 01 1d 2f c0 57 e8 6e 25 00 00 c7 44 24 18 02 00 00 00 83 c4 10 83 7d 08 00 74 4f 8b 44 24 08 83 c9 ff 8b 7c 85 00 31 c0 <f2> ae f7 d1 49 83 7b 64 00 8d 71 01 8b bb 9c 00 00 00 75 61 01 <6>ACPI: AC Adapter [ACAD] (on-line) ACPI: Battery Slot [BAT1] (battery present) ACPI: Lid Switch [LID0] As is usual with this particular manifestation, we died here: if (i == 0 && desc->bInterfaceClass == USB_CLASS_COMM (gdb) p sizeof(struct usb_interface_descriptor) $3 = 9 (gdb) p sizeof(struct usb_host_config) $4 = 280 (gdb) p sizeof(struct usb_interface_cache) $5 = 8 (gdb) p sizeof(struct usb_host_interface) $6 = 28 (gdb) offsetof usb_interface_descriptor bInterfaceClass 5 0x5 And that's quite different. `struct usb_interface_cache' is only 8 bytes, and the scribble is (as far as we know), not at offset zero. Unless there are a large number of `usb_host_interface's in `struct usb_interface_cache's variable-sized array, we're not using the size-512 slab here. <looks at choose_configuration()> struct usb_interface_descriptor *desc = &c->intf_cache[0]->altsetting->desc; Seems funny. struct usb_interface_cache { unsigned num_altsetting; /* number of alternate settings */ struct kref ref; /* reference counter */ /* variable-length array of alternate settings for this interface, * stored in no particular order */ struct usb_host_interface altsetting[0]; }; A clearer way of coding that assignment would have been struct usb_interface_descriptor *desc = &c->intf_cache[0]->altsetting[0].desc; a) How come we're only considering the zeroth slot in that array in here? b) How do we know that there's actually anything _there_? The length of that variable-sized array doesn't seem to have been stored anywhere obvious by usb_parse_configuration() and choose_configuration() doesn't check. What happens if the length was zero? drivers/usb/core/config.c hasn't changed since October. If it was this easy, we'd have hit it before now. Greg, who knows this code? Can we triple-check it please? Oh dear, there it goes. SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts audit(1141605301.698:3): policy loaded auid=4294967295 SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts ------------[ cut here ]------------ kernel BUG at kernel/timer.c:213! invalid opcode: 0000 [#1] last sysfs file: /class/firmware/0000:06:0b.0/loading Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ieee80211 ehci_hcd uhci_hcd ieee80211_crypt sg joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 hw_random piix snd_timer i2c_core snd soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c0120d68>] Not tainted VLI EFLAGS: 00010246 (2.6.16-rc5-mm2 #302) EIP is at __mod_timer+0x5e/0x68 eax: f70a0898 ebx: f70a0898 ecx: 00000000 edx: fffef455 esi: fffef455 edi: 00000000 ebp: f7fadc48 esp: f77e4e10 ds: 007b es: 007b ss: 0068 Process modprobe (pid: 613, threadinfo=f77e4000 task=f77d8570) Stack: <0>00000000 f70a0840 f8d616e4 00000000 c022125f f706e240 f70faa0c f8d54328 f8d616e4 f8d54328 c0221323 00000001 f8d426d8 f70faa0c 00000000 00000003 f8d42a75 f8d54313 f70faa0c f70faa10 00000000 f7fadcb0 f8d4df95 f70faa0c Call Trace: <c022125f> _request_firmware+0x8f/0x14c <c0221323> request_firmware+0x7/0x9 <f8d426d8> ipw_get_fw+0x16/0xa0 [ipw2200] <f8d42a75> ipw_load+0x232/0x799 [ipw2200] <f8d4df95> ipw_up+0x48/0x358 [ipw2200] <f8d4e550> ipw_net_init+0x1c/0x3a [ipw2200] <c0271593> register_netdevice+0x4c/0x2da <c0187433> create_files+0x23/0x44 <c0271869> register_netdev+0x48/0x68 <f8d4e8b7> ipw_pci_probe+0x349/0x401 [ipw2200] <c0117484> __wake_up_locked+0x11/0x13 <c021db21> __driver_attach+0x0/0x6b <c01c693c> pci_call_probe+0xa/0xc <c01c6972> __pci_device_probe+0x34/0x41 <c01c699d> pci_device_probe+0x1e/0x32 <c021da87> driver_probe_device+0x49/0x8c <c021db77> __driver_attach+0x56/0x6b <c021d2e2> bus_for_each_dev+0x44/0x59 <c021db9d> driver_attach+0x11/0x13 <c021db21> __driver_attach+0x0/0x6b <c021d68e> bus_add_driver+0x57/0x81 <c021df2c> driver_register+0x70/0x75 <c012f3bb> sys_init_module+0x66/0x15f <c01c6b28> __pci_register_driver+0x37/0x49 <c011a909> printk+0xe/0x11 <f8c93015> ipw_init+0x15/0x6a [ipw2200] <f8c93026> ipw_init+0x26/0x6a [ipw2200] <c012f49c> sys_init_module+0x147/0x15f <c0102a9b> sysenter_past_esp+0x54/0x75 Code: 20 00 a1 84 08 49 c0 39 c1 74 07 39 19 74 18 89 43 14 89 73 08 89 da e8 fb fe ff ff ff 34 24 9d 5a 89 f8 5b 5e 5f c3 89 c8 eb e7 <0f> 0b d5 00 c4 44 2e c0 eb a8 53 31 d2 8b 0d 84 08 49 c0 83 38 BUG: modprobe/613, lock held at task exit time! [c0357220] {rtnl_mutex} .. held by: modprobe: 613 [f77d8570, 113] ... acquired at: register_netdev+0xb/0x68 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 8:48 ` Andrew Morton @ 2006-03-08 1:31 ` Greg KH 2006-03-08 1:49 ` Andrew Morton 0 siblings, 1 reply; 28+ messages in thread From: Greg KH @ 2006-03-08 1:31 UTC (permalink / raw) To: Andrew Morton, linux-usb-devel; +Cc: Linus Torvalds, mingo, linux-kernel On Mon, Mar 06, 2006 at 12:48:36AM -0800, Andrew Morton wrote: > Linus Torvalds <torvalds@osdl.org> wrote: > > > > > > Andrew, > > I'm worried about the fact that kmalloc() doesn't get redlining. > > > > The lifetime rules for the pipe_info thing and the bprm seems totally > > obvious, and we'd get slab errors if somebody was doing something strange > > there anyway. > > > > So I'd be more inclined to blame a buffer overflow on a kmalloc, and the > > obvious target is the "add_uevent_var()" thing, since all/many of the > > corruptions seem to come from uevent environment variable strings. > > > > Because kmalloc() doesn't do redlining, we'd never see an overflow as an > > error, and it would just overwrite the next block. Now, they are in > > different slab blocks (the uevent strign allocation is a 2048-byte > > allocation), but maybe it flows over into the next page entirely.. > > > > Now, it all uses "vnsprintf()" which _should_ be safe, but that in turn > > uses pointer comparisons, so maybe gcc screws that up. Who knows. Gcc has > > been known to use signed comparisons on pointers and other brokenness. And > > we could just have screwed something up (not updating "len" when we update > > the buffer start etc etc) > > Maybe. Something which is as deterministic as that would trip > CONFIG_DEBUG_PAGEALLOC though. > > > Anyway, this trivial patch will check for buffer length consistency and > > overflow by just putting a magic value at the end of the buffer, and > > checking it. Maybe. > > > > I don't see anything wrong there, and booting with this patch doesn't > > trigger anything for me, but it's simple enough to be worth checking out. > > Yup. The wind changed direction and I lost the ability to reproduce it for > a few hours. Going back to exactly the rc5-mm2 lineup did the trick. Your > debug patch didn't trigger. > > > > SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts > SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts > audit(1141604160.744:3): policy loaded auid=4294967295 > SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts > BUG: unable to handle kernel paging request at virtual address 44535955 > printing eip: > *pde = 00000000 > Oops: 0000 [#1] > last sysfs file: /class/firmware/0000:06:0b.0/loading > Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core > CPU: 0 > EIP: 0060:[<c02437ed>] Not tainted VLI > EFLAGS: 00010246 (2.6.16-rc5-mm2 #302) > EIP is at choose_configuration+0x58/0x13a > eax: 000000d2 ebx: f70aac00 ecx: 44535950 edx: 000001f4 > esi: f76eca00 edi: 00000001 ebp: 00000000 esp: f7f3def8 > ds: 007b es: 007b ss: 0068 > Process khubd (pid: 114, threadinfo=f7f3d000 task=f7f3cab0) > Stack: <0>00000000 0000d000 f70aac00 f70aac00 00000000 00000001 c0243a5a 00000000 > f70aac00 c1caa3c0 c0244ba1 00000000 0000000a f76ec814 f7029000 00000011 > c1caa3c0 00000001 f7029000 00000001 c0244ff8 00000003 00000000 f76ec800 > Call Trace: > <c0243a5a> usb_new_device+0x15a/0x1d0 <c0244ba1> hub_port_connect_change+0x1ad/0x343 > <c0244ff8> hub_events+0x2c1/0x3f1 <c0245128> hub_thread+0x0/0xe6 > <c0245132> hub_thread+0xa/0xe6 <c0128dfb> autoremove_wake_function+0x0/0x2d > <c0128dfb> autoremove_wake_function+0x0/0x2d <c0128b0c> kthread+0x9c/0xa1 > <c0128a70> kthread+0x0/0xa1 <c01012ed> kernel_thread_helper+0x5/0xb > Code: 00 58 39 fd c7 04 24 00 00 00 00 7d 53 0f b6 46 08 8b 8e 90 00 00 00 0f b7 93 18 02 00 00 01 c0 83 c1 08 39 d0 7f 2e 85 ed 75 0a <80> 79 05 02 0f 84 bf 00 00 00 80 bb 7c 01 00 00 ff 74 0a 80 79 > <6>Non-volatile memory driver v1.2 > BUG: unable to handle kernel paging request at virtual address 6963702f > printing eip: > c01bc9b0 > *pde = 00000000 > Oops: 0000 [#2] > last sysfs file: /class/firmware/0000:06:0b.0/loading > Modules linked in: nvram ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core > CPU: 0 > EIP: 0060:[<c01bc9b0>] Not tainted VLI > EFLAGS: 00010246 (2.6.16-rc5-mm2 #302) > EIP is at kobject_uevent+0x28d/0x37c > eax: 00000000 ebx: f76f76c0 ecx: ffffffff edx: f7176efc > esi: 0000002a edi: 6963702f ebp: f77542c0 esp: f7176f04 > ds: 007b es: 007b ss: 0068 > Process knodemgrd_0 (pid: 1025, threadinfo=f7176000 task=f7788ab0) > Stack: <0>f70f1000 c034da80 00000002 f70f1051 f8cf6e4d f7097640 c02e93d8 f76e9568 > 00000246 00000000 f7758a94 f76e9568 f76e9568 f76e9560 f76e9568 f8d28c40 > c021e752 00000000 f70473a0 00000000 f76e9400 f76e9440 f76e9560 f763c000 > Call Trace: > <c021e752> class_device_add+0x14e/0x201 <f8cf1efe> nodemgr_create_node+0x104/0x17b [ieee1394] > <f8cf2b1a> nodemgr_node_scan_one+0x160/0x16e [ieee1394] <f8cf2b76> nodemgr_node_scan+0x4e/0x50 [ieee1394] > <f8cf3277> nodemgr_host_thread+0xdc/0x14c [ieee1394] <f8cf319b> nodemgr_host_thread+0x0/0x14c [ieee1394] > <c01012ed> kernel_thread_helper+0x5/0xb > Code: 24 1c 68 01 1d 2f c0 57 e8 6e 25 00 00 c7 44 24 18 02 00 00 00 83 c4 10 83 7d 08 00 74 4f 8b 44 24 08 83 c9 ff 8b 7c 85 00 31 c0 <f2> ae f7 d1 49 83 7b 64 00 8d 71 01 8b bb 9c 00 00 00 75 61 01 > <6>ACPI: AC Adapter [ACAD] (on-line) > ACPI: Battery Slot [BAT1] (battery present) > ACPI: Lid Switch [LID0] > > > As is usual with this particular manifestation, we died here: > > if (i == 0 && desc->bInterfaceClass == USB_CLASS_COMM > > > (gdb) p sizeof(struct usb_interface_descriptor) > $3 = 9 > (gdb) p sizeof(struct usb_host_config) > $4 = 280 > (gdb) p sizeof(struct usb_interface_cache) > $5 = 8 > (gdb) p sizeof(struct usb_host_interface) > $6 = 28 > (gdb) offsetof usb_interface_descriptor bInterfaceClass > 5 0x5 > > And that's quite different. `struct usb_interface_cache' is only 8 bytes, > and the scribble is (as far as we know), not at offset zero. Unless there > are a large number of `usb_host_interface's in `struct > usb_interface_cache's variable-sized array, we're not using the size-512 > slab here. > > <looks at choose_configuration()> > > struct usb_interface_descriptor *desc = > &c->intf_cache[0]->altsetting->desc; > > Seems funny. > > struct usb_interface_cache { > unsigned num_altsetting; /* number of alternate settings */ > struct kref ref; /* reference counter */ > > /* variable-length array of alternate settings for this interface, > * stored in no particular order */ > struct usb_host_interface altsetting[0]; > }; > > A clearer way of coding that assignment would have been > > struct usb_interface_descriptor *desc = > &c->intf_cache[0]->altsetting[0].desc; Agreed. > a) How come we're only considering the zeroth slot in that array in here? We start out with the first interface setting, as we always know we have one of them as per the USB spec (I think, anyone from linux-usb-devel want to verify this?) > b) How do we know that there's actually anything _there_? The length of > that variable-sized array doesn't seem to have been stored anywhere > obvious by usb_parse_configuration() and choose_configuration() doesn't > check. What happens if the length was zero? I don't think it is allowed to be, as all USB devices have to have at least 1 interface. > drivers/usb/core/config.c hasn't changed since October. If it was this > easy, we'd have hit it before now. > > Greg, who knows this code? Can we triple-check it please? I'll walk through it again, it's some nasty crap, I agree. Time to dig out the USB spec... Anyone else on linux-usb-devel want to verify it too? thanks, greg k-h ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-08 1:31 ` Greg KH @ 2006-03-08 1:49 ` Andrew Morton 0 siblings, 0 replies; 28+ messages in thread From: Andrew Morton @ 2006-03-08 1:49 UTC (permalink / raw) To: Greg KH; +Cc: linux-usb-devel, torvalds, mingo, linux-kernel Greg KH <greg@kroah.com> wrote: > > > a) How come we're only considering the zeroth slot in that array in here? > > We start out with the first interface setting, as we always know we have > one of them as per the USB spec (I think, anyone from linux-usb-devel > want to verify this?) I think that code has to be OK - it's longstanding stuff and slab debugging would have caught these sorts of things in a jiffy. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-05 23:48 ` Andrew Morton 2006-03-06 3:27 ` Linus Torvalds @ 2006-03-06 5:00 ` Linus Torvalds 2006-03-06 7:47 ` Mike Galbraith ` (2 more replies) 2006-03-06 11:04 ` Jesper Juhl 2 siblings, 3 replies; 28+ messages in thread From: Linus Torvalds @ 2006-03-06 5:00 UTC (permalink / raw) To: Andrew Morton; +Cc: Greg KH, Ingo Molnar, Linux Kernel Mailing List, Dave Jones On Sun, 5 Mar 2006, Andrew Morton wrote: > > For several days I've been getting repeatable oopses in the -mm kernel. > They occur once per ~30 boots, during initscripts. Actually, having thought about this some more, I wonder if the bug isn't a hell of a lot simpler than we've given it credit for. I think you're running with CONFIG_PREEMPT_VOLUNTARY, right? And looking more closely, that thing is BROKEN. DaveJ - do Fedora kernels also enable that thing? Ingo: as far as I can see, CONFIG_PREEMPT_VOLUNTARY is totally and utterly broken during bootup. It does: # define might_resched() cond_resched() and then we have # define might_sleep() do { might_resched(); } while (0) and but the fact is, we _know_ that "might_sleep()" is broken during early bootup. We know this, because when we ahev __might_sleep() enabled to warn about cases where we must not sleep, we've had those tests disabled during early boot for a long time, in order to avoid irritating and nasty known "sleeping function called from invalid context" messages: ... if ((in_atomic() || irqs_disabled()) && system_state == SYSTEM_RUNNING && !oops_in_progress) { if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy) ... Note in particular the "system_state == SYSTEM_RUNNING". It's there for a reason. Namely that we know that we do things that aren't valid during early bootup, and that we call functions that might sleep while we have interrupts disabled, for example. HOWEVER, the "cond_resched()" does not take that into account at all, and will happily conditionally reschedule things at early bootup before we have set system_state to SYSTEM_RUNNING. In other words, unless I've totally lost it, I think that CONFIG_PREEMPT_VOLUNTARY currently makes us re-schedule at points in the early boot that we _know_ are unsafe. We happen to not hit it very often, because (a) some of the time it doesn't matter and (b) when it matters, we seldom have "need_resched()" returning true, but I would not be at all surprised if Andrew's problems are because the scheduler heuristics make it happen when it shouldn't. And the end result? I don't know. But we've traditionally run _all_ of the early boot ignoring the "might_sleep()" warnings, up until the point where we unlock the kernel lock, long after things like kmem_cache_init(). So I would not be surprised, for example, if we had kmem_cache_init() doing bad things because it got interrupts enabled at a point where it shouldn't, because it went through the scheduler. I dunno. I can't actually see what would corrupt anything, but the point is that we definitely do scheduling in places that have gotten absolutely _zero_ coverage, because we turned off the checks on purpose during early boot because the checks gave false positives. And CONFIG_PREEMPT_VOLUNTARY turns those false positives into potential rescheduling events. Maybe I'm crazy. But it looks really really broken to me. Andrew, if I'm right, then this ugly patch should make a difference. Is there something else I've missed? Linus ---- diff --git a/kernel/sched.c b/kernel/sched.c index 12d291b..3454bb8 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -4028,6 +4028,8 @@ static inline void __cond_resched(void) */ if (unlikely(preempt_count())) return; + if (unlikely(system_state != SYSTEM_RUNNING)) + return; do { add_preempt_count(PREEMPT_ACTIVE); schedule(); ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 5:00 ` Linus Torvalds @ 2006-03-06 7:47 ` Mike Galbraith 2006-03-07 5:51 ` Mike Galbraith 2006-03-06 9:14 ` Ingo Molnar 2006-03-06 10:31 ` Andrew Morton 2 siblings, 1 reply; 28+ messages in thread From: Mike Galbraith @ 2006-03-06 7:47 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Greg KH, Ingo Molnar, Linux Kernel Mailing List, Dave Jones On Sun, 2006-03-05 at 21:00 -0800, Linus Torvalds wrote: > > Is there something else I've missed? Maybe. Does this add anything to the picture? During boot, recalc_task_prio() is called with now < p->timestamp. This causes quite a stir. If you WARN_ON(now < p->timestamp) or printk, you'll have a dead box due to hundreds of gripes as things churn. Adding... if (unlikely(now < p->timestamp)) __sleep_time = 0ULL; ...turns it into exactly one gripe. -Mike ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 7:47 ` Mike Galbraith @ 2006-03-07 5:51 ` Mike Galbraith 2006-03-07 5:56 ` Nick Piggin 0 siblings, 1 reply; 28+ messages in thread From: Mike Galbraith @ 2006-03-07 5:51 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Greg KH, Ingo Molnar, Linux Kernel Mailing List, Dave Jones On Mon, 2006-03-06 at 08:47 +0100, Mike Galbraith wrote: > On Sun, 2006-03-05 at 21:00 -0800, Linus Torvalds wrote: > > > > > Is there something else I've missed? > > Maybe. Does this add anything to the picture? During boot, > recalc_task_prio() is called with now < p->timestamp. This causes quite > a stir. If you WARN_ON(now < p->timestamp) or printk, you'll have a > dead box due to hundreds of gripes as things churn. Adding... > > if (unlikely(now < p->timestamp)) > __sleep_time = 0ULL; > > ...turns it into exactly one gripe. Nope. Further research shows that this is just a speed-step problem. Still, the scheduler needs to protect itself, because even with all of the speed-step stuff enabled, these continue to occurr even after boot-up if you switch to a low power setting, which screws up the scheduler. -Mike ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-07 5:51 ` Mike Galbraith @ 2006-03-07 5:56 ` Nick Piggin 0 siblings, 0 replies; 28+ messages in thread From: Nick Piggin @ 2006-03-07 5:56 UTC (permalink / raw) To: Mike Galbraith Cc: Linus Torvalds, Andrew Morton, Greg KH, Ingo Molnar, Linux Kernel Mailing List, Dave Jones Mike Galbraith wrote: > On Mon, 2006-03-06 at 08:47 +0100, Mike Galbraith wrote: > >>On Sun, 2006-03-05 at 21:00 -0800, Linus Torvalds wrote: >> >> >>>Is there something else I've missed? >> >>Maybe. Does this add anything to the picture? During boot, >>recalc_task_prio() is called with now < p->timestamp. This causes quite >>a stir. If you WARN_ON(now < p->timestamp) or printk, you'll have a >>dead box due to hundreds of gripes as things churn. Adding... >> >>if (unlikely(now < p->timestamp)) >> __sleep_time = 0ULL; >> >>...turns it into exactly one gripe. > > > Nope. Further research shows that this is just a speed-step problem. > Still, the scheduler needs to protect itself, because even with all of > the speed-step stuff enabled, these continue to occurr even after > boot-up if you switch to a low power setting, which screws up the > scheduler. > sched_clock needs to be fixed, rather than scheduler. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 5:00 ` Linus Torvalds 2006-03-06 7:47 ` Mike Galbraith @ 2006-03-06 9:14 ` Ingo Molnar 2006-03-06 10:31 ` Andrew Morton 2 siblings, 0 replies; 28+ messages in thread From: Ingo Molnar @ 2006-03-06 9:14 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Greg KH, Linux Kernel Mailing List, Dave Jones * Linus Torvalds <torvalds@osdl.org> wrote: > @@ -4028,6 +4028,8 @@ static inline void __cond_resched(void) > */ > if (unlikely(preempt_count())) > return; > + if (unlikely(system_state != SYSTEM_RUNNING)) > + return; we used to have this, but it was frowned upon during the initial lkml review of -VP so i fixed all the known 'early bootup scheduling assumptions' and removed this condition .. but that was a losing battle i suspect. In any case, i fully agree with your fix. We also need this fix for v2.6.16. Acked-by: Ingo Molnar <mingo@elte.hu> Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 5:00 ` Linus Torvalds 2006-03-06 7:47 ` Mike Galbraith 2006-03-06 9:14 ` Ingo Molnar @ 2006-03-06 10:31 ` Andrew Morton 2 siblings, 0 replies; 28+ messages in thread From: Andrew Morton @ 2006-03-06 10:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: greg, mingo, linux-kernel, davej Linus Torvalds <torvalds@osdl.org> wrote: > > > > On Sun, 5 Mar 2006, Andrew Morton wrote: > > > > For several days I've been getting repeatable oopses in the -mm kernel. > > They occur once per ~30 boots, during initscripts. > > Actually, having thought about this some more, I wonder if the bug isn't a > hell of a lot simpler than we've given it credit for. > > I think you're running with CONFIG_PREEMPT_VOLUNTARY, right? Yes. > ... > And CONFIG_PREEMPT_VOLUNTARY turns those false positives into potential > rescheduling events. Yes. A similar problem exists with CONFIG_PREEMPT_BKL - we're doing down() inside lock_kenrel() in situations where we shouldn't be sleeping yet. That's one of the reasons why might_sleep() is suppressed early in boot. We assume that the down() won't be contented early in boot. We were looking at fixing all that up a month or so back, but the patches started getting pretty ugly. Still. If either lock_kernel() or cond_resched() _do_ sleep, we'll hopefully get scheduling-while-atomic warnings. > index 12d291b..3454bb8 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -4028,6 +4028,8 @@ static inline void __cond_resched(void) > */ > if (unlikely(preempt_count())) > return; > + if (unlikely(system_state != SYSTEM_RUNNING)) > + return; > do { > add_preempt_count(PREEMPT_ACTIVE); > schedule(); It died after 35 reboots. SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts audit(1141611448.702:3): policy loaded auid=4294967295 SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts BUG: unable to handle kernel paging request at virtual address 464c4583 printing eip: *pde = 3f1ea067 Oops: 0002 [#1] last sysfs file: /devices/pci0000:00/0000:00:1d.2/usb3/idVendor Modules linked in: nvram snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq ide_cd cdrom snd_seq_device snd_pcm_oss ipw2200 snd_mixer_oss ieee80211 ieee80211_crypt ohci1394 snd_pcm ieee1394 uhci_hcd sg ehci_hcd joydev snd_timer snd soundcore i2c_i801 piix i2c_core hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core CPU: 0 EIP: 0060:[<c0128d66>] Not tainted VLI EFLAGS: 00010046 (2.6.16-rc5-mm2 #303) EIP is at prepare_to_wait+0x19/0x3b eax: 464c457f ebx: f776cf08 ecx: 00000246 edx: f776cf14 esi: f76fde00 edi: 00000041 ebp: f71378d0 esp: f776ced0 ds: 007b es: 007b ss: 0068 Process loadkeys (pid: 1119, threadinfo=f776c000 task=f7eaa030) Stack: <0>f7137940 f776cefc f776cf1c c015db02 f7137940 f776cee8 00000000 f7eaa030 c0128e0b f776cf14 f776cf14 00000246 c02c0f02 c015dc4b 00000000 f7eaa030 c0128e0b f776cf14 f776cf14 c01a0a63 f776cf34 00000000 f76fde00 f7cbb240 Call Trace: <c015db02> pipe_wait+0x66/0x90 <c0128e0b> autoremove_wake_function+0x0/0x2d <c02c0f02> __mutex_lock_slowpath+0x1e1/0x339 <c015dc4b> pipe_readv+0x58/0x25a <c0128e0b> autoremove_wake_function+0x0/0x2d <c01a0a63> file_has_perm+0x87/0x8a <c015dd36> pipe_readv+0x143/0x25a <c015de69> pipe_read+0x1c/0x20 <c0153d36> vfs_read+0xae/0x14e <c015405e> sys_read+0x3c/0x64 <c0102a9b> sysenter_past_esp+0x54/0x75 Code: 43 04 00 02 20 00 c7 42 0c 00 01 10 00 56 9d 5b 5e c3 57 89 cf 56 89 c6 53 89 d3 83 22 fe 9c 59 fa 8d 52 0c 39 53 0c 75 0d 8b 00 <89> 50 04 89 43 0c 89 72 04 89 16 83 7b 04 00 74 0b b8 00 f0 ff That's "\7fELF" again, plus 4 bytes. Note that this time it's the sleeper that went splat - usually it's the waker. I don't think that a bisection search of mainline (adding the sched patches each time) will be fruitful really. Just breathing on this thing makes the crashes go away. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-05 23:48 ` Andrew Morton 2006-03-06 3:27 ` Linus Torvalds 2006-03-06 5:00 ` Linus Torvalds @ 2006-03-06 11:04 ` Jesper Juhl 2006-03-06 11:15 ` Andrew Morton 2 siblings, 1 reply; 28+ messages in thread From: Jesper Juhl @ 2006-03-06 11:04 UTC (permalink / raw) To: Andrew Morton; +Cc: Greg KH, torvalds, Ingo Molnar, linux-kernel On 3/6/06, Andrew Morton <akpm@osdl.org> wrote: > > Might as well cc the list on this, so there's a record... > > For several days I've been getting repeatable oopses in the -mm kernel. > They occur once per ~30 boots, during initscripts. > [snip] Hi Andrew, I send a mail to the list yesterday (with you and a few other people on cc) about Slab corruption in 2.6.16-5c5-mm2. You have a problem with memory corruption and I'm seeing slab corruption messages in dmesg... Just wanted to point it out in this thread in case they are related. -- Jesper Juhl <jesper.juhl@gmail.com> Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html Plain text mails only, please http://www.expita.com/nomime.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 11:04 ` Jesper Juhl @ 2006-03-06 11:15 ` Andrew Morton 2006-03-06 15:59 ` Dave Jones 0 siblings, 1 reply; 28+ messages in thread From: Andrew Morton @ 2006-03-06 11:15 UTC (permalink / raw) To: Jesper Juhl; +Cc: greg, torvalds, mingo, linux-kernel "Jesper Juhl" <jesper.juhl@gmail.com> wrote: > > On 3/6/06, Andrew Morton <akpm@osdl.org> wrote: > > > > Might as well cc the list on this, so there's a record... > > > > For several days I've been getting repeatable oopses in the -mm kernel. > > They occur once per ~30 boots, during initscripts. > > > [snip] > > Hi Andrew, > > I send a mail to the list yesterday (with you and a few other people > on cc) about Slab corruption in 2.6.16-5c5-mm2. > You have a problem with memory corruption and I'm seeing slab > corruption messages in dmesg... > > Just wanted to point it out in this thread in case they are related. > Yes, there have been several memory-scribblish reports. Until this one's found I'm not sure there's much we can do with them. I'll do an overnight run with CONFIG_DEBUG_SLAB, see if that catches anything. Although AFAICT it just fixes the bug. Beyond that, I'm rather stuck. I'd probably have to drop a pile of sched patches, see if that improves things for people. But if we cannot spot a bug in those patches then it's probably in mainline and it will bite people in generally random and rare ways. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Fw: Re: oops in choose_configuration() 2006-03-06 11:15 ` Andrew Morton @ 2006-03-06 15:59 ` Dave Jones 0 siblings, 0 replies; 28+ messages in thread From: Dave Jones @ 2006-03-06 15:59 UTC (permalink / raw) To: Andrew Morton; +Cc: Jesper Juhl, greg, torvalds, mingo, linux-kernel On Mon, Mar 06, 2006 at 03:15:50AM -0800, Andrew Morton wrote: > Yes, there have been several memory-scribblish reports. Until this one's > found I'm not sure there's much we can do with them. > > I'll do an overnight run with CONFIG_DEBUG_SLAB, see if that catches > anything. Although AFAICT it just fixes the bug. > > Beyond that, I'm rather stuck. I'd probably have to drop a pile of sched > patches, see if that improves things for people. But if we cannot spot a > bug in those patches then it's probably in mainline and it will bite > people in generally random and rare ways. Not sure if it was you or Linus I private-mailed about this, but to reiterate - circa rc4-git2, the Fedora rawhide kernel stopped booting for a bunch of people, all with 686-SMP boxes. I saw it myself too, it hung just after the 'write protecting kernel rodata'. It totally puzzled me for a day. The following day, I rebased to rc4-git4, and the problem "went away". Nothing in the changesets merged could explain the hangs I saw. A few days ago, the exact same bug resurfaced, and like before, the following day, it went into hiding again. I'm terrified the release kernel I build for FC5 is going to be a stinker because of this, but every time I tried to instrument this, the bugger disappeared. I'm not sure this is even remotely the same bug you're chasing, but the unpredictability of really nasty bugs like these give me the creeps. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2006-03-08 6:15 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-08 3:29 Fw: Re: oops in choose_configuration() Chuck Ebbert
2006-03-08 3:48 ` Dmitry Torokhov
2006-03-08 4:01 ` Linus Torvalds
-- strict thread matches above, loose matches on Subject: below --
2006-03-07 21:54 Chuck Ebbert
2006-03-08 0:54 ` Greg KH
2006-03-08 0:57 ` Linus Torvalds
2006-03-08 1:13 ` Dmitry Torokhov
2006-03-08 1:27 ` Greg KH
2006-03-08 3:22 ` Dmitry Torokhov
2006-03-08 5:23 ` Greg KH
2006-03-08 6:06 ` Dmitry Torokhov
2006-03-08 6:15 ` Greg KH
2006-03-08 4:28 ` Joe Korty
2006-03-08 4:44 ` Dmitry Torokhov
[not found] <20060304121723.19fe9b4b.akpm@osdl.org>
[not found] ` <Pine.LNX.4.64.0603041235110.22647@g5.osdl.org>
[not found] ` <20060304213447.GA4445@kroah.com>
[not found] ` <20060304135138.613021bd.akpm@osdl.org>
[not found] ` <20060304221810.GA20011@kroah.com>
2006-03-05 23:48 ` Andrew Morton
2006-03-06 3:27 ` Linus Torvalds
2006-03-06 8:48 ` Andrew Morton
2006-03-08 1:31 ` Greg KH
2006-03-08 1:49 ` Andrew Morton
2006-03-06 5:00 ` Linus Torvalds
2006-03-06 7:47 ` Mike Galbraith
2006-03-07 5:51 ` Mike Galbraith
2006-03-07 5:56 ` Nick Piggin
2006-03-06 9:14 ` Ingo Molnar
2006-03-06 10:31 ` Andrew Morton
2006-03-06 11:04 ` Jesper Juhl
2006-03-06 11:15 ` Andrew Morton
2006-03-06 15:59 ` Dave Jones
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox