Top kernel oopses/warnings for the week of May 30th 2008

public inbox for linux-kernel@vger.kernel.org

* Top kernel oopses/warnings for the week of May 30th 2008
@ 2008-05-30 16:39 Arjan van de Ven
  2008-05-30 19:19 ` Hugh Dickins
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Arjan van de Ven @ 2008-05-30 16:39 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Greg KH, Hugh Dickins,
	Jeff Garzik

The http://www.kerneloops.org website collects kernel oops and
warning reports from various mailing lists and bugzillas, as well as
from a client that users can install to auto-submit oopses.
Below is a top 10 list of the traces collected in the last 7 days.
(Reports prior to 2.6.23 have been omitted in collecting the top 10)

This week, a total of 3670 oopses and warnings have been reported,
compared to 3029 reports in the previous week.



In addition to Fedora, Debian has now included the client application in its
default GUI install targets; thanks a lot for that!

This week, based on feedback, I've split the report into "untainted"
and "caused by proprietary drivers". Let me know if I should continue
doing this or if the old format was better.
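
One way to make the untainted/proprietary split mechanical is to read the taint marker in the oops header. In 2.6.25-era oopses that header prints either "Not tainted" or "Tainted:" followed by flag letters, and the first flag is P when a proprietary module is loaded (G otherwise). The sketch below assumes exactly that header format; oops_is_proprietary() is a hypothetical name and this is not the kerneloops.org code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical classifier: true if an oops header line reports a
 * proprietary-module taint. Assumes the first flag character after
 * "Tainted: " is 'P' (proprietary) or 'G' (GPL-only modules). */
bool oops_is_proprietary(const char *line)
{
	static const char marker[] = "Tainted: ";
	const char *t = strstr(line, marker);

	if (!t)
		return false;	/* "Not tainted", or no taint info at all */
	return t[sizeof(marker) - 1] == 'P';
}
```

Note that the search is case-sensitive, so "Not tainted" never matches the "Tainted: " marker.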

As an experiment (on request) I've exported the database to text files (one file
per report) and stuck it in a git repository. You can take a look with
git clone git://www.kerneloops.org/
Suggestions for improving the format of this are obviously very welcome, as are
"yes useful" and "no not useful" comments. Again, this is an experiment; if it's
not seen as useful, I may discontinue it.



Per file statistics
1427	kernel/sysctl.c
238	fs/sysfs/dir.c
206	fs/buffer.c
167	security/selinux/hooks.c
84	kernel/spinlock.c
53	net/mac80211/main.c
48	mm/highmem.c
30	net/core/sock.c
26	net/bluetooth/rfcomm/sock.c
26	drivers/media/video/saa7134/saa7134-cards.c
24	mm/rmap.c
23	kernel/softirq.c



Seen with untainted systems
---------------------------
Rank 2: sysfs_add_one (warning)
	Reported 243 times (759 total reports)
	Duplicated sysfs entries, various drivers including USB
	This warning was last seen in version 2.6.26-rc3, and first seen in 2.6.24-rc6.
	More info: http://www.kerneloops.org/searchweek.php?search=sysfs_add_one

Rank 3: mark_buffer_dirty (warning)
	Reported 222 times (759 total reports)
	EXT3 bug while hot-removing a USB device
	This warning was last seen in version 2.6.25.3, and first seen in 2.6.24-rc6.
	More info: http://www.kerneloops.org/searchweek.php?search=mark_buffer_dirty

Rank 5: _spin_unlock_irqrestore (lockup)
	Reported 85 times (293 total reports)
	Soft lockup, mostly coming out of the scsi layer or out of idle
	This lockup was last seen in version 2.6.26-rc4, and first seen in 2.6.22-rc1.
	More info: http://www.kerneloops.org/searchweek.php?search=_spin_unlock_irqrestore

Rank 6: ieee80211_stop_tx_ba_session (warning)
	Reported 65 times (164 total reports)
	iwl4965 driver bug
	This warning was last seen in version 2.6.25.3, and first seen in 2.6.25-rc7-git6.
	More info: http://www.kerneloops.org/searchweek.php?search=ieee80211_stop_tx_ba_session

Rank 7: set_page_address (oops)
	Reported 53 times (65 total reports)
	crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the other day?
	This oops was last seen in version 2.6.25.3, and first seen in 2.6.25.
	More info: http://www.kerneloops.org/searchweek.php?search=set_page_address

Rank 8: ata_hsm_move (warning)
	Reported 34 times (63 total reports)
	SATA layer bug
	This warning was last seen in version 2.6.25.3, and first seen in 2.6.25-rc9-git1.
	More info: http://www.kerneloops.org/searchweek.php?search=ata_hsm_move

Rank 10: __ioremap_caller (warning)
	Reported 26 times (32 total reports)
	[fixed] bug in the ck804xrom driver that would mark a large chunk of RAM uncacheable
	(fixed in 2.6.26-rc4)
	This warning was last seen in version 2.6.26-rc3-git6, and first seen in 2.6.25.
	More info: http://www.kerneloops.org/searchweek.php?search=__ioremap_caller

Rank 11: rfcomm_sock_destruct (oops)
	Reported 26 times (51 total reports)
	[fixed] Bug in the bluetooth protocol stack where a double spin-unlock caused an underflow of the lock variable
	(fix available in -mm)
	This oops was last seen in version 2.6.25.3, and first seen in 2.6.25.
	More info: http://www.kerneloops.org/searchweek.php?search=rfcomm_sock_destruct

Rank 12: saa7134_tuner_callback (oops)
	Reported 26 times (31 total reports)
	Seems to crash during a string copy operation
	This oops was last seen in version 2.6.25.3, and first seen in 2.6.25-rc5-git4.
	More info: http://www.kerneloops.org/searchweek.php?search=saa7134_tuner_callback


	
Caused by proprietary drivers
-----------------------------

Rank 1: __register_sysctl_paths (warning)
	Reported 1566 times (4003 total reports)
	Duplicate /proc registration. Bugs in madwifi (but also in the parport driver)
	This warning was last seen in version 2.6.26-rc4-git2, and first seen in 2.6.25-rc3.
	More info: http://www.kerneloops.org/searchweek.php?search=__register_sysctl_paths

Rank 4: task_has_capability (warning)
	Reported 181 times (202 total reports)
	[out of tree] Bug in the proprietary firegl driver
	warning only shows up in tainted kernels
	This warning was last seen in version 2.6.25.3, and first seen in 2.6.25.
	More info: http://www.kerneloops.org/searchweek.php?search=task_has_capability

Rank 9: sk_free (warning)
	Reported 30 times (135 total reports)
	VMware driver bug
	warning only shows up in tainted kernels
	This warning was last seen in version 2.6.25.4, and first seen in 2.6.23.9.
	More info: http://www.kerneloops.org/searchweek.php?search=sk_free




* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 16:39 Top kernel oopses/warnings for the week of May 30th 2008 Arjan van de Ven
@ 2008-05-30 19:19 ` Hugh Dickins
  2008-05-30 21:43   ` Linus Torvalds
  2008-06-02 23:44   ` Hugh Dickins
  2008-05-30 22:34 ` Jochen Voß
  2008-06-02  0:02 ` James Morris
  2 siblings, 2 replies; 20+ messages in thread
From: Hugh Dickins @ 2008-05-30 19:19 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linux Kernel Mailing List, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik

On Fri, 30 May 2008, Arjan van de Ven wrote:
> 
> Rank 7: set_page_address (oops)
> 	Reported 53 times (65 total reports)
> 	crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
> 	other day?

No, not at all.  But I'll have a little ponder over it.

> 	This oops was last seen in version 2.6.25.3, and first seen in 2.6.25.
> 	More info:
> 	http://www.kerneloops.org/searchweek.php?search=set_page_address

Hugh


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 19:19 ` Hugh Dickins
@ 2008-05-30 21:43   ` Linus Torvalds
  2008-05-30 21:49     ` Arjan van de Ven
  2008-05-30 22:00     ` Arjan van de Ven
  2008-06-02 23:44   ` Hugh Dickins
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-05-30 21:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Arjan van de Ven, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik

On Fri, 30 May 2008, Hugh Dickins wrote:
>
> On Fri, 30 May 2008, Arjan van de Ven wrote:
> > 
> > Rank 7: set_page_address (oops)
> > 	Reported 53 times (65 total reports)
> > 	crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
> > 	other day?
> 
> No, not at all.  But I'll have a little ponder over it.

It's a BUG_ON(), but sadly the oops gatherer doesn't seem to gather that 
part. You can see it from the code portion: the "<0f> 0b" gives it away 
(that's the ud2 opcode).
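
The "<0f> 0b" check can be automated: in an x86 oops "Code:" line the byte at the faulting instruction pointer is wrapped in angle brackets, and BUG()/BUG_ON() compiles to ud2 (0f 0b). A minimal sketch of such a check, assuming that "Code:" line layout; code_line_is_bug() is a hypothetical helper, not part of any collector.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does the marked instruction in an oops "Code:"
 * line decode as ud2 (0f 0b), i.e. did the oops come from a BUG()? */
bool code_line_is_bug(const char *code_line)
{
	const char *p = strchr(code_line, '<');
	unsigned int b0, b1;

	if (!p)
		return false;
	/* Parse the bracketed byte and the byte right after it as hex. */
	if (sscanf(p, "<%2x> %2x", &b0, &b1) != 2)
		return false;
	return b0 == 0x0f && b1 == 0x0b;
}
```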

There are two BUG_ON()s in that function, and I think it's the second one, 
based on at least the code generation that my particular compiler version 
gets. IOW, it would be the

	BUG_ON(list_empty(&page_address_pool));

thing.

Why would we run out of the page-address pool? Or perhaps the right 
question is what actually guarantees that we do _not_ run out? 

We seem to depend on the page_address_pool always being in sync with the 
pkmap_count[] array, but the fact is, they are not protected by the same 
locks. The array is protected by kmap_lock, and the page_address_pool is 
protected by the "pool_lock".

And even if they were to nest properly (I don't think they do), we 
actually do the list_empty(&page_address_pool) outside the pool lock, 
so...

I dunno. That code is really messy. Why does it have two locks for the 
data structures when it then seems to absolutely require that they are 
always coherent? And if we want to have separate locks, we cannot require 
that they are in lock-step; perhaps we should have more pages in the 
page_address_pool than strictly required, since they may not be 1:1?
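
The locking concern above (an emptiness test done outside the lock that protects the list) can be illustrated with a userspace model. In this hypothetical sketch a pthread mutex stands in for pool_lock and an array-backed stack stands in for page_address_pool; the check and the removal happen in one critical section, and exhaustion is reported to the caller instead of crashing. It is one possible design, not the fix that was adopted in mm/highmem.c.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4	/* hypothetical; the real pool is sized to the pkmap area */

struct slot { int id; };

static struct slot slots[POOL_SIZE];
static struct slot *free_list[POOL_SIZE];
static size_t nr_free;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

void pool_init(void)
{
	pthread_mutex_lock(&pool_lock);
	for (size_t i = 0; i < POOL_SIZE; i++) {
		slots[i].id = (int)i;
		free_list[i] = &slots[i];
	}
	nr_free = POOL_SIZE;
	pthread_mutex_unlock(&pool_lock);
}

/* Test for emptiness and take an entry under the same lock, so nobody
 * can drain the pool between the check and the removal. Returns NULL
 * when the pool is exhausted. */
struct slot *pool_get(void)
{
	struct slot *s = NULL;

	pthread_mutex_lock(&pool_lock);
	if (nr_free > 0)
		s = free_list[--nr_free];
	pthread_mutex_unlock(&pool_lock);
	return s;
}

/* Return an entry; callers must only pass back entries from pool_get(). */
void pool_put(struct slot *s)
{
	pthread_mutex_lock(&pool_lock);
	free_list[nr_free++] = s;
	pthread_mutex_unlock(&pool_lock);
}
```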

I do hate that mm/highmem.c mess, but I also wonder what made it start to 
trigger if it's a bug there. That code hasn't changed in ages, afaik.

I don't think this is Hugh's fault, but on the other hand I think it would 
be great if Hugh looked at it. I think most of that code predates even the 
BK repo - because I'm not finding any history for it even in the 
historical archives. Who dares look at it?

			Linus


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 21:43   ` Linus Torvalds
@ 2008-05-30 21:49     ` Arjan van de Ven
  2008-05-30 22:17       ` Arjan van de Ven
  2008-05-30 22:00     ` Arjan van de Ven
  1 sibling, 1 reply; 20+ messages in thread
From: Arjan van de Ven @ 2008-05-30 21:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik

Linus Torvalds wrote:
> 
> On Fri, 30 May 2008, Hugh Dickins wrote:
>> On Fri, 30 May 2008, Arjan van de Ven wrote:
>>> Rank 7: set_page_address (oops)
>>> 	Reported 53 times (65 total reports)
>>> 	crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
>>> 	other day?
>> No, not at all.  But I'll have a little ponder over it.
> 
> It's a BUG_ON(), but sadly the oops gatherer doesn't seem to gather that 
> part. You can see it from the code portion: the "<0f> 0b" gives it away 
> (that's the ud2 opcode).

I've seen it a few more times the last few weeks, I'll dig into how that is happening.
Maybe we changed the bug_on text to miss my regexps ;(
(it's only about 1000 lines of perl, so what can go wrong in that ;-)


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 21:43   ` Linus Torvalds
  2008-05-30 21:49     ` Arjan van de Ven
@ 2008-05-30 22:00     ` Arjan van de Ven
  2008-05-30 22:30       ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Arjan van de Ven @ 2008-05-30 22:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik

Linus Torvalds wrote:
> 
> On Fri, 30 May 2008, Hugh Dickins wrote:
>> On Fri, 30 May 2008, Arjan van de Ven wrote:
>>> Rank 7: set_page_address (oops)
>>> 	Reported 53 times (65 total reports)
>>> 	crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
>>> 	other day?
>> No, not at all.  But I'll have a little ponder over it.
> 
> It's a BUG_ON(), but sadly the oops gatherer doesn't seem to gather that 
> part. You can see it from the code portion: the "<0f> 0b" gives it away 
> (that's the ud2 opcode).

ok for some it did gather this information, and it is

kernel BUG at mm/highmem.c:319!
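
A collector that pattern-matches on this message needs little more than the following. This is a hypothetical sketch (parse_bug_line() is not the kerneloops.org Perl code); it assumes the message has the "kernel BUG at FILE:LINE!" shape shown above, possibly with a log prefix in front.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for "kernel BUG at FILE:LINE!". Returns 1 and fills
 * file/line on a match, 0 otherwise. file must hold at least 256 bytes. */
int parse_bug_line(const char *msg, char file[256], int *line)
{
	static const char marker[] = "kernel BUG at ";
	const char *p = strstr(msg, marker);	/* tolerate a timestamp prefix */

	if (!p)
		return 0;
	p += sizeof(marker) - 1;
	/* %255[^:] stops at the colon separating the file from the line. */
	return sscanf(p, "%255[^:]:%d!", file, line) == 2;
}
```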



* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 21:49     ` Arjan van de Ven
@ 2008-05-30 22:17       ` Arjan van de Ven
  0 siblings, 0 replies; 20+ messages in thread
From: Arjan van de Ven @ 2008-05-30 22:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik

Arjan van de Ven wrote:
> Linus Torvalds wrote:
>>
>> On Fri, 30 May 2008, Hugh Dickins wrote:
>>> On Fri, 30 May 2008, Arjan van de Ven wrote:
>>>> Rank 7: set_page_address (oops)
>>>>     Reported 53 times (65 total reports)
>>>>     crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
>>>>     other day?
>>> No, not at all.  But I'll have a little ponder over it.
>>
>> It's a BUG_ON(), but sadly the oops gatherer doesn't seem to gather 
>> that part. You can see it from the code portion: the "<0f> 0b" gives 
>> it away (that's the ud2 opcode).
> 
> I've seen it a few more times the last few weeks, I'll dig into how that 
> is happening.
> Maybe we changed the bug_on text to miss my regexps ;(

ok it was a bug I already fixed a few days ago; any reports from the last 2 or 3 days shouldn't have this.


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 22:00     ` Arjan van de Ven
@ 2008-05-30 22:30       ` Linus Torvalds
  2008-05-30 22:34         ` Arjan van de Ven
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-05-30 22:30 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Hugh Dickins, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik

On Fri, 30 May 2008, Arjan van de Ven wrote:
> 
> ok for some it did gather this information, and it is
> 
> kernel BUG at mm/highmem.c:319!

That's just _odd_. The call chain actually has kmap() in it, and kmap 
does:

	if (!PageHighMem(page))
		return page_address(page);
	return kmap_high(page);

so if it's the one at line 319, which says

	BUG_ON(!PageHighMem(page));

then I wonder what happened to that PageHighMem() test of the page in 
between..

Ahh.. Not the same "page". It looks like it's in the 
flush_all_zero_pkmaps() path, and it's clearing some _other_ page in the 
pkmap table in order to make room for the new one. So the page that causes 
problems is from here:

	 page = pte_page(pkmap_page_table[i]);

rather than the one we're trying to map.

Not that it explains the BUG_ON(). We should only insert page table 
entries into the pkmap_page_table[] array in map_new_virtual(), which in 
turn is only called from kmap_high(), which in turn means that *those* 
pages have also gone through the PageHighMem() test.

So it sounds like we either
 - have corruption in pkmap_page_table[]
 - or pte_page() doesn't reverse mk_pte(page) properly, and one or the 
   other is broken.
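
The second possibility, a masking slip when turning a PTE back into a page frame number, can be modelled outside the kernel. This sketch assumes a PAE-style 64-bit entry with 4 KiB pages and x86 present/rw/accessed/dirty flag bits; the function names are made up for illustration and are not the kernel's mk_pte()/pte_page(). It shows how a mask built as a 32-bit value silently drops frame bits above 4 GiB.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PTE_FLAGS  UINT64_C(0x063)	/* present|rw|accessed|dirty */

/* Model of mk_pte(): frame number in the address bits, flags below. */
uint64_t model_mk_pte(uint64_t pfn, uint64_t prot)
{
	return (pfn << MODEL_PAGE_SHIFT) | prot;
}

/* Correct reversal: 64-bit address mask (NX bit 63 ignored for brevity). */
uint64_t model_pte_pfn(uint64_t pte)
{
	const uint64_t mask = ~(uint64_t)((1u << MODEL_PAGE_SHIFT) - 1);
	return (pte & mask) >> MODEL_PAGE_SHIFT;
}

/* Buggy reversal: the mask is 32 bits wide, so widening it to 64 bits
 * zero-extends it and clears every address bit above bit 31. */
uint64_t model_pte_pfn_buggy(uint64_t pte)
{
	const uint32_t mask32 = ~(uint32_t)((1u << MODEL_PAGE_SHIFT) - 1);
	return (pte & mask32) >> MODEL_PAGE_SHIFT;
}
```

Below 4 GiB both versions agree, which is why a bug of this kind would only show up on PAE machines with memory above that line.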

Does anybody know if the fc9 x86-32 kernel is built with PAE enabled? 
Might this be another PAE bit-masking bug and thus possibly fixed by the 
PTE_MASK changes?

		Linus


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 22:30       ` Linus Torvalds
@ 2008-05-30 22:34         ` Arjan van de Ven
  2008-05-30 22:55           ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Arjan van de Ven @ 2008-05-30 22:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik

Linus Torvalds wrote:
> 
> On Fri, 30 May 2008, Arjan van de Ven wrote:
>> ok for some it did gather this information, and it is
>>
>> kernel BUG at mm/highmem.c:319!
> 
> That's just _odd_. The call chain actually has kmap() in it, and kmap 
> does:
> 
> 	if (!PageHighMem(page))
> 		return page_address(page);
> 	return kmap_high(page);
> 
> so if it's the one at line 319, which says
> 
> 	BUG_ON(!PageHighMem(page));
> 
> then I wonder what happened to that PageHighMem() test of the page in 
> between..
> 
> Ahh.. Not the same "page". It looks like it's in the 
> flush_all_zero_pkmaps() path, and it's clearing some _other_ page in the 
> pkmap table in order to make room for the new one. So the page that causes 
> problems is from here:
> 
> 	 page = pte_page(pkmap_page_table[i]);
> 
> rather than the one we're trying to map.
> 
> Not that it explains the BUG_ON(). We should only insert page table 
> entries into the pkmap_page_table[] array in map_new_virtual(), which in 
> turn is only called from kmap_high(), which in turn means that *those* 
> pages have also gone through the PageHighMem() test.
> 
> So it sounds like we either
>  - have corruption in pkmap_page_table[]
>  - or pte_page() doesn't reverse mk_pte(page) properly, and one or the 
>    other is broken.
> 
> Does anybody know if the fc9 x86-32 kernel is built with PAE enabled? 

Versions that have PAE identify themselves as "2.6.25-14.fc9.i686.PAE"; these ones don't
(they're all of the "2.6.25-14.fc9.i686" form), so this is a kernel without PAE.
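
The PAE check above boils down to a suffix test on the kernel release string. A minimal sketch, assuming Fedora's ".PAE" flavour suffix as quoted; release_is_pae() is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does a kernel release string end in ".PAE"? */
bool release_is_pae(const char *release)
{
	static const char suffix[] = ".PAE";
	size_t len = strlen(release);
	size_t slen = sizeof(suffix) - 1;

	return len >= slen && strcmp(release + len - slen, suffix) == 0;
}
```

On a running system the string would come from the release field filled in by uname(2).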


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 16:39 Top kernel oopses/warnings for the week of May 30th 2008 Arjan van de Ven
  2008-05-30 19:19 ` Hugh Dickins
@ 2008-05-30 22:34 ` Jochen Voß
  2008-05-30 22:36   ` Arjan van de Ven
  2008-06-02  0:02 ` James Morris
  2 siblings, 1 reply; 20+ messages in thread
From: Jochen Voß @ 2008-05-30 22:34 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux Kernel Mailing List

Hi Arjan,

2008/5/30 Arjan van de Ven <arjan@linux.intel.com>:
> Rank 2: sysfs_add_one (warning)
>        Reported 243 times (759 total reports)
[...]
> Rank 3: mark_buffer_dirty (warning)
>        Reported 222 times (759 total reports)

It seems like a strange coincidence that the first two entries have
the same number of total reports.  Is this really the case or is there
something mixed up?

All the best,
Jochen
-- 
http://seehuhn.de/


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 22:34 ` Jochen Voß
@ 2008-05-30 22:36   ` Arjan van de Ven
  0 siblings, 0 replies; 20+ messages in thread
From: Arjan van de Ven @ 2008-05-30 22:36 UTC (permalink / raw)
  To: Jochen Voß; +Cc: Linux Kernel Mailing List

Jochen Voß wrote:
> Hi Arjan,
> 
> 2008/5/30 Arjan van de Ven <arjan@linux.intel.com>:
>> Rank 2: sysfs_add_one (warning)
>>        Reported 243 times (759 total reports)
> [...]
>> Rank 3: mark_buffer_dirty (warning)
>>        Reported 222 times (759 total reports)
> 
> It seems like a strange coincidence that the first two entries have
> the same number of total reports.  Is this really the case or is there
> something mixed up?

coincidence; I just regenerated the report (several hours later) and they're now 1 apart.


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 22:34         ` Arjan van de Ven
@ 2008-05-30 22:55           ` Linus Torvalds
  2008-05-31  0:41             ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-05-30 22:55 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Hugh Dickins, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik



On Fri, 30 May 2008, Arjan van de Ven wrote:
> 
> Versions that have PAE identify themselves as "2.6.25-14.fc9.i686.PAE"; 
> these ones don't (they're all of the "2.6.25-14.fc9.i686" form), so 
> this is a kernel without PAE.

Hmm. Every single one is that one kernel version or 2.6.25.3-18.fc9.i686, 
and with that many reports I'd have expected it from other kernels too. 
What was the previous popular fc9 kernel (I assume it was 2.6.25-based 
too?), and what changed?

		Linus


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 22:55           ` Linus Torvalds
@ 2008-05-31  0:41             ` Dave Jones
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2008-05-31  0:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Hugh Dickins, Linux Kernel Mailing List,
	Andrew Morton, Ingo Molnar, Greg KH, Jeff Garzik

On Fri, May 30, 2008 at 03:55:25PM -0700, Linus Torvalds wrote:
 > 
 > 
 > On Fri, 30 May 2008, Arjan van de Ven wrote:
 > > 
 > > Versions that have PAE identify themselves as "2.6.25-14.fc9.i686.PAE"; 
 > > these ones don't (they're all of the "2.6.25-14.fc9.i686" form), so 
 > > this is a kernel without PAE.
 > 
 > Hmm. Every single one is that one kernel version or 2.6.25.3-18.fc9.i686, 
 > and with that many reports I'd have expected it from other kernels too. 
 > What was the previous popular fc9 kernel (I assume it was 2.6.25-based 
 > too?), and what changed?

-14 is the version that we released F9 with, which explains its popularity.
-18 was the first update we pushed out, within the first few days.

The earlier f9 builds were only beaten on by people testing our development
tree, which is nowhere near as many people as jump on a proper release.

	Dave

-- 
http://www.codemonkey.org.uk


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 16:39 Top kernel oopses/warnings for the week of May 30th 2008 Arjan van de Ven
  2008-05-30 19:19 ` Hugh Dickins
  2008-05-30 22:34 ` Jochen Voß
@ 2008-06-02  0:02 ` James Morris
  2008-06-02  2:27   ` Arjan van de Ven
  2 siblings, 1 reply; 20+ messages in thread
From: James Morris @ 2008-06-02  0:02 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linux Kernel Mailing List, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Greg KH, Hugh Dickins, Jeff Garzik

On Fri, 30 May 2008, Arjan van de Ven wrote:

> Rank 4: task_has_capability (warning)
> 	Reported 181 times (202 total reports)
> 	[out of tree] Bug in the proprietary firegl driver
> 	warning only shows up in tainted kernels
> 	This warning was last seen in version 2.6.25.3, and first seen in
> 	2.6.25.
> 	More info:
> 	http://www.kerneloops.org/searchweek.php?search=task_has_capability

This is a shining example of why people should avoid binary drivers.  I'd 
guess that the bug is related to the new 64-bit capability code.

It'd be really interesting to know what this driver is doing with 
capabilities in the first place.

If anyone is using this driver, the output of the following command as a 
non-root user from gnome-terminal or similar may be of interest:

$ cat /proc/self/status |grep ^Cap

It should generally be all zeroes.
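
The same check can be done programmatically. This sketch looks only at the CapEff line, because that is the set the driver writes; on newer kernels the bounding set (CapBnd) is also listed and is normally full even for ordinary users. The helper names are hypothetical, and self_cap_eff() assumes the relevant part of /proc/self/status fits in a 4 KiB buffer.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Find "CapEff:" in the text of /proc/<pid>/status and parse its hex mask. */
bool status_cap_eff(const char *status, uint64_t *eff)
{
	const char *p = strstr(status, "CapEff:");

	if (!p)
		return false;
	return sscanf(p, "CapEff: %" SCNx64, eff) == 1;
}

/* Convenience wrapper for the calling process; false on any error. */
bool self_cap_eff(uint64_t *eff)
{
	char buf[4096];
	FILE *f = fopen("/proc/self/status", "r");
	size_t n;

	if (!f)
		return false;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return status_cap_eff(buf, eff);
}
```

For a non-root shell, a nonzero value from self_cap_eff() would be the interesting case.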

- James
-- 
James Morris
<jmorris@namei.org>


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-06-02  0:02 ` James Morris
@ 2008-06-02  2:27   ` Arjan van de Ven
  0 siblings, 0 replies; 20+ messages in thread
From: Arjan van de Ven @ 2008-06-02  2:27 UTC (permalink / raw)
  To: James Morris
  Cc: Linux Kernel Mailing List, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Greg KH, Hugh Dickins, Jeff Garzik

James Morris wrote:
> On Fri, 30 May 2008, Arjan van de Ven wrote:
> 
>> Rank 4: task_has_capability (warning)
>> 	Reported 181 times (202 total reports)
>> 	[out of tree] Bug in the proprietary firegl driver
>> 	warning only shows up in tainted kernels
>> 	This warning was last seen in version 2.6.25.3, and first seen in
>> 	2.6.25.
>> 	More info:
>> 	http://www.kerneloops.org/searchweek.php?search=task_has_capability
> 
> This is a shining example of why people should avoid binary drivers.  I'd 
> guess that the bug is related to the new 64-bit capability code.
> 
> It'd be really interesting to know what this driver is doing with 
> capabilities in the first place.

it's easy; it's making the user root via the following function:

void ATI_API_CALL KCL_PosixSecurityCapSetEffectiveVector(KCL_TYPE_Cap cap)
{
	capt_t(current->cap_effective) = cap;
}
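
For context on why writing cap_effective directly is fragile: with the 64-bit capability support mentioned above, the capability set is an array of two 32-bit words, with capability x stored in word x >> 5 at bit x & 31. The userspace model below is loosely patterned on the kernel's CAP_TO_INDEX/CAP_TO_MASK idea, but every name in it is redefined for illustration. It only shows the layout: code that writes a single 32-bit value fills word 0 and never touches word 1. It does not reproduce the driver's code or explain the warning itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_U32S     2			/* 2 x 32 = 64 capability bits */
#define MODEL_CAP_INDEX(x) ((x) >> 5)		/* word holding capability x */
#define MODEL_CAP_MASK(x)  (UINT32_C(1) << ((x) & 31))

struct model_cap { uint32_t cap[MODEL_CAP_U32S]; };

void model_cap_raise(struct model_cap *c, int x)
{
	c->cap[MODEL_CAP_INDEX(x)] |= MODEL_CAP_MASK(x);
}

bool model_cap_raised(const struct model_cap *c, int x)
{
	return (c->cap[MODEL_CAP_INDEX(x)] & MODEL_CAP_MASK(x)) != 0;
}

/* What code written for a single-word capability set effectively does:
 * overwrite word 0 and leave word 1 as it was. */
void model_old_style_set(struct model_cap *c, uint32_t value)
{
	c->cap[0] = value;
}
```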



* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-05-30 19:19 ` Hugh Dickins
  2008-05-30 21:43   ` Linus Torvalds
@ 2008-06-02 23:44   ` Hugh Dickins
  2008-06-03  0:00     ` Andrew Morton
  2008-06-09 16:32     ` Ingo Molnar
  1 sibling, 2 replies; 20+ messages in thread
From: Hugh Dickins @ 2008-06-02 23:44 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linux Kernel Mailing List, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Greg KH, Jeff Garzik, Dave Jones

On Fri, 30 May 2008, Hugh Dickins wrote:
> On Fri, 30 May 2008, Arjan van de Ven wrote:
> > 
> > Rank 7: set_page_address (oops)
> > 	Reported 53 times (65 total reports)
> > 	crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
> > 	other day?
> 
> No, not at all.  But I'll have a little ponder over it.
> 
> > 	This oops was last seen in version 2.6.25.3, and first seen in 2.6.25.
> > 	More info:
> > 	http://www.kerneloops.org/searchweek.php?search=set_page_address

Though I've spent quite a while poring over it, I regret to say I
haven't got much beyond the obvious with this BUG_ON(!PageHighMem)
in set_page_address() called from flush_all_zero_pkmaps().

It appears to be a corruption of the start of the pkmap_page_table,
but not a random corruption: entries of the form 0x378xxxxx through
0x37Bxxxxx, where they need to be 0x38xxxxxx or more to be highmem.
(I say "appears" because the compiler is reusing %eax a lot, so there's
no trace on the stack or in registers of which pte was actually read.)
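
The arithmetic behind "0x38xxxxxx or more" can be checked with a small model. It assumes a non-PAE 32-bit x86 PTE (frame address in bits 31..12, flags in bits 11..0) and the default 896 MiB low-memory limit, which puts the highmem boundary at physical address 0x38000000. The boundary depends on configuration and RAM size, and pte_maps_highmem() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_LOWMEM_LIMIT UINT32_C(0x38000000)	/* 896 MiB, assumed */

/* Would the page mapped by this non-PAE PTE count as highmem? */
bool pte_maps_highmem(uint32_t pte)
{
	uint32_t phys = pte & ~UINT32_C(0xfff);	/* drop the flag bits */

	return phys >= MODEL_LOWMEM_LIMIT;
}
```

Entries in the reported 0x378xxxxx to 0x37Bxxxxx range fall just below the boundary, so the PageHighMem() check fails for them.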

In every case except the 17141 nfsd one, it's found at the start of
the table, when flush_all_zero_pkmaps() is called for the very first
time (I'm guessing that from the fact that they're all failing on the
second entry, which is the first one used because the index is
preincremented).  The 17141 nfsd case, by contrast, finds a 0x00000xxx
some way into the page table, quite possibly later on: it may have a
very different cause.
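
The preincrement point can be illustrated with a model of the slot cursor, loosely after map_new_virtual() in mm/highmem.c. The sketch assumes a cursor that starts at 0 and a 1024-slot table (a power of two, so a mask handles the wrap); both are assumptions. Slot 1 is the first one handed out, and the cursor only comes back to slot 0 after a full pass. In the mm/highmem.c of this era, that wrap to 0 is what triggers flush_all_zero_pkmaps().

```c
#define MODEL_LAST_PKMAP      1024	/* assumed non-PAE table size */
#define MODEL_LAST_PKMAP_MASK (MODEL_LAST_PKMAP - 1)

/* Increment first, then use the result, wrapping at the table size. */
unsigned int next_pkmap_slot(unsigned int *last)
{
	*last = (*last + 1) & MODEL_LAST_PKMAP_MASK;
	return *last;
}
```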

Do we have any idea whether all or most of these come from a single
machine?  That would of course be a very different (less interesting)
story from if they're spread out over lots of machines.

I didn't notice anything suspicious in the Fedora patches to 2.6.25,
but I haven't heard (Google hasn't shown) any such problem outside
of these kerneloops from Fedora 9.  Is it showing up on Rawhide at
all?  If so, then we could devise some debugging code to include in coming
kernels to help shed more light on it.

Veering off at a tangent away from the oops: I was rather sobered
to see all those traces of execve using kmap; I thought we were
avoiding kmap, like the BKL, in common paths these days (though it is
convenient for symlinks).  Would a patch something like the one
below, copying the filemap.c trick, be welcome?

Hugh

--- 2.6.26-rc4/fs/exec.c	2008-05-26 20:00:39.000000000 +0100
+++ linux/fs/exec.c	2008-06-02 11:18:32.000000000 +0100
@@ -33,6 +33,7 @@
 #include <linux/string.h>
 #include <linux/init.h>
 #include <linux/pagemap.h>
+#include <linux/hardirq.h>
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
 #include <linux/key.h>
@@ -396,7 +397,7 @@ static int copy_strings(int argc, char _
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
-	unsigned long kpos = 0;
+	unsigned long kpos = ~PAGE_MASK;
 	int ret;

 	while (argc-- > 0) {
@@ -436,28 +437,38 @@ static int copy_strings(int argc, char _
 			str -= bytes_to_copy;
 			len -= bytes_to_copy;

-			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
-				struct page *page;
-
-				page = get_arg_page(bprm, pos, 1);
-				if (!page) {
-					ret = -E2BIG;
-					goto out;
-				}
-
+			if (kpos != (pos & PAGE_MASK)) {
 				if (kmapped_page) {
 					flush_kernel_dcache_page(kmapped_page);
-					kunmap(kmapped_page);
+					if (in_atomic())
+						kunmap_atomic(kaddr, KM_USER0);
+					else
+						kunmap(kmapped_page);
 					put_arg_page(kmapped_page);
 				}
-				kmapped_page = page;
-				kaddr = kmap(kmapped_page);
+				kmapped_page = get_arg_page(bprm, pos, 1);
+				if (!kmapped_page) {
+					ret = -E2BIG;
+					goto out;
+				}
+				kaddr = kmap_atomic(kmapped_page, KM_USER0);
 				kpos = pos & PAGE_MASK;
 				flush_arg_page(bprm, kpos, kmapped_page);
 			}
-			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
-				ret = -EFAULT;
-				goto out;
+			if (in_atomic()) {
+				if (need_resched() ||
+				    __copy_from_user_inatomic(kaddr + offset,
+							str, bytes_to_copy)) {
+					kunmap_atomic(kaddr, KM_USER0);
+					kaddr = kmap(kmapped_page);
+				}
+			}
+			if (!in_atomic()) {
+				if (copy_from_user(kaddr + offset,
+							str, bytes_to_copy)) {
+					ret = -EFAULT;
+					goto out;
+				}
 			}
 		}
 	}
@@ -465,7 +476,10 @@ static int copy_strings(int argc, char _
 out:
 	if (kmapped_page) {
 		flush_kernel_dcache_page(kmapped_page);
-		kunmap(kmapped_page);
+		if (in_atomic())
+			kunmap_atomic(kaddr, KM_USER0);
+		else
+			kunmap(kmapped_page);
 		put_arg_page(kmapped_page);
 	}
 	return ret;
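
The control flow of the "filemap.c trick" that the patch borrows is: attempt a copy that is not allowed to fault and, only if it cannot finish, fall back to the path that may sleep. The userspace sketch below shows just that shape. try_copy_nofault() and copy_may_sleep() are hypothetical stand-ins for __copy_from_user_inatomic() under kmap_atomic() and copy_from_user() under kmap(), and a flag simulates the fault. Like the kernel helpers, both return the number of bytes not copied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

int fast_copies, slow_copies;	/* which path was taken, for inspection */

/* Atomic stand-in: it may not wait for a fault, so it just gives up. */
size_t try_copy_nofault(char *dst, const char *src, size_t n, bool faults)
{
	if (faults)
		return n;
	memcpy(dst, src, n);
	return 0;
}

/* Sleeping stand-in: allowed to wait for the fault to be handled. */
size_t copy_may_sleep(char *dst, const char *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Cheap path first; fall back only when it cannot complete. */
int copy_chunk(char *dst, const char *src, size_t n, bool faults)
{
	if (try_copy_nofault(dst, src, n, faults) == 0) {
		fast_copies++;
		return 0;
	}
	slow_copies++;
	return copy_may_sleep(dst, src, n) ? -1 : 0;
}
```

In the patch itself, once it drops to kmap() for a page it stays on the sleeping path until the next page is mapped. The model above decides afresh on every chunk.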


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-06-02 23:44   ` Hugh Dickins
@ 2008-06-03  0:00     ` Andrew Morton
  2008-06-03  0:41       ` Hugh Dickins
  2008-06-09 16:32     ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2008-06-03  0:00 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: arjan, linux-kernel, torvalds, mingo, greg, jeff, davej

On Tue, 3 Jun 2008 00:44:38 +0100 (BST)
Hugh Dickins <hugh@veritas.com> wrote:

> +					if (in_atomic())
> +						kunmap_atomic(kaddr, KM_USER0);
> +					else
> +						kunmap(kmapped_page);

eek.

/*
 * Are we running in atomic context?  WARNING: this macro cannot
 * always detect atomic context; in particular, it cannot know about
 * held spinlocks in non-preemptible kernels.  Thus it should not be
 * used in the general case to determine whether sleeping is possible.
 * Do not use in_atomic() in driver code.
 */
#define in_atomic()	((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)
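
The comment's caveat becomes concrete in a model of preempt_count. In a kernel without CONFIG_PREEMPT, taking a spinlock does not raise the count, but pagefault_disable() (used by kmap_atomic()) always does. So in_atomic() is reliable only where, as Hugh argues for copy_strings(), no spinlock is held and kmap_atomic() is the only thing that can raise the count. The constants below are hypothetical stand-ins, and PREEMPT_INATOMIC_BASE is simplified to 0.

```c
#include <stdbool.h>

#define MODEL_PREEMPT_ACTIVE        0x10000000u	/* hypothetical bit */
#define MODEL_PREEMPT_INATOMIC_BASE 0u		/* simplified */

unsigned int model_preempt_count;
bool model_config_preempt;	/* modelling a CONFIG_PREEMPT kernel? */

/* Same shape as the in_atomic() test quoted above. */
bool model_in_atomic(void)
{
	return (model_preempt_count & ~MODEL_PREEMPT_ACTIVE) !=
		MODEL_PREEMPT_INATOMIC_BASE;
}

/* Spinlocks only touch the count when preemption is configured. */
void model_spin_lock(void)   { if (model_config_preempt) model_preempt_count++; }
void model_spin_unlock(void) { if (model_config_preempt) model_preempt_count--; }

/* pagefault_disable() raises the count unconditionally. */
void model_pagefault_disable(void) { model_preempt_count++; }
void model_pagefault_enable(void)  { model_preempt_count--; }
```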


* Re: Top kernel oopses/warnings for the week of May 30th 2008
  2008-06-03  0:00     ` Andrew Morton
@ 2008-06-03  0:41       ` Hugh Dickins
  2008-06-03  1:19         ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Hugh Dickins @ 2008-06-03  0:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: arjan, linux-kernel, torvalds, mingo, greg, jeff, davej

On Mon, 2 Jun 2008, Andrew Morton wrote:
> On Tue, 3 Jun 2008 00:44:38 +0100 (BST)
> Hugh Dickins <hugh@veritas.com> wrote:
> 
> > +					if (in_atomic())
> > +						kunmap_atomic(kaddr, KM_USER0);
> > +					else
> > +						kunmap(kmapped_page);
> 
> eek.
> 
> /*
>  * Are we running in atomic context?  WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels.  Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.
>  */
> #define in_atomic()	((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)

Yes, that comment is all about how a common function cannot be expected
to guess whether it's being called in atomic context or not; but we
know that we don't have any spinlocks held here, therefore it's okay.

Or do you consider fs/exec.c a driver that shouldn't set a bad example?
It is exactly the test that do_page_fault() makes at the other end,
when deciding whether it can handle the fault.

Originally I had a bool atomic there instead.  I switched over to
testing in_atomic() itself because I had it in mind to suggest another
patch: it has long seemed wrong to me that we should have to disable
preemption and fault handling there, when often (on many architectures,
or on many pages) it's unnecessary.

So I'd like to change (the various implementations of) kmap_atomic()
to use pagefault_disable() only when the page actually is in highmem.
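
The proposal can be sketched as a model in which only highmem pages take the atomic mapping slot and disable faults, while lowmem pages are returned at their permanent address. Everything here is hypothetical: the page descriptor, the counter standing in for the preempt count, and the static buffer standing in for the per-CPU fixmap window. This is a sketch of the idea, not any architecture's kmap_atomic().

```c
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool highmem;
	void *lowmem_addr;	/* permanent kernel address if not highmem */
};

unsigned int model_fault_disable_depth;	/* stands in for preempt_count */
static char model_fixmap_window[4096];	/* stands in for the atomic slot */

/* Only highmem needs the atomic slot, so only highmem disables faults. */
void *model_kmap_atomic(struct model_page *page)
{
	if (!page->highmem)
		return page->lowmem_addr;
	model_fault_disable_depth++;
	return model_fixmap_window;
}

void model_kunmap_atomic(struct model_page *page)
{
	if (page->highmem)
		model_fault_disable_depth--;
}
```

Because the unmap side keys off the same per-page test as the map side, the two stay balanced without the caller remembering which case applied.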

Hugh


* Re: Top kernel oopses/warnings for the week of May 30th 2008
From: Andrew Morton @ 2008-06-03  1:19 UTC
  To: Hugh Dickins; +Cc: arjan, linux-kernel, torvalds, mingo, greg, jeff, davej

On Tue, 3 Jun 2008 01:41:22 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote:

> On Mon, 2 Jun 2008, Andrew Morton wrote:
> > On Tue, 3 Jun 2008 00:44:38 +0100 (BST)
> > Hugh Dickins <hugh@veritas.com> wrote:
> > 
> > > +					if (in_atomic())
> > > +						kunmap_atomic(kaddr, KM_USER0);
> > > +					else
> > > +						kunmap(kmapped_page);
> > 
> > eek.
> > 
> > /*
> >  * Are we running in atomic context?  WARNING: this macro cannot
> >  * always detect atomic context; in particular, it cannot know about
> >  * held spinlocks in non-preemptible kernels.  Thus it should not be
> >  * used in the general case to determine whether sleeping is possible.
> >  * Do not use in_atomic() in driver code.
> >  */
> > #define in_atomic()	((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)
> 
> Yes, that comment is all about how a common function cannot be expected
> to guess whether it's being called in atomic context or not; but we
> know that we don't have any spinlocks held here, therefore it's okay.
> 
> Or do you consider fs/exec.c a driver that shouldn't set a bad example?
> It is exactly the test that do_page_fault() makes at the other end,
> when deciding whether it can handle the fault.

Well, if you're sure...  I didn't look very closely (sorry), and nor did you
explain very closely.

I think doing this sort of thing is OK in fs/exec.c from the
should-we-be-doing-this-in-there POV, but it should have suitable comments
slapped all over it.

> Originally I had a bool atomic there instead.  I switched over to
> testing in_atomic() itself because I had it in mind to suggest another
> patch: it has long seemed wrong to me that we should have to disable
> preemption and fault handling there, when often (on many architectures,
> or on many pages) it's unnecessary.
> 
> So I'd like to change (the various implementations of) kmap_atomic()
> to use pagefault_disable() only when the page actually is in highmem.

So...  places like file_read_actor() would be given an open-coded
pagefault_disable() so we preserve our implicit boolean-passing down to
do_page_fault()?

One of the reasons why we (I?) left kmap_atomic() doing
pagefault_disable() for all pages was testing coverage: not many
developers test with highmem nowadays so there's a high risk (almost a
certainty) that people will start adding can-schedule code inside their
kmap_atomic() regions.  Probably it's not a terribly good reason...


* Re: Top kernel oopses/warnings for the week of May 30th 2008
From: Ingo Molnar @ 2008-06-09 16:32 UTC
  To: Hugh Dickins
  Cc: Arjan van de Ven, Linux Kernel Mailing List, Linus Torvalds,
	Andrew Morton, Greg KH, Jeff Garzik, Dave Jones


* Hugh Dickins <hugh@veritas.com> wrote:

> Veering off at a tangent away from the oops: I was rather sobered to 
> see all those traces of execve using kmap, I thought we were avoiding 
> kmap like BKL in common paths these days (though it is convenient for 
> symlinks).  Would a patch something like that below, copying the 
> filemap.c trick, be welcome?

FYI, I stuck this into -tip for testing and after some time I started 
getting:

[    8.540917] Freeing unused kernel memory: 304k freed
[   12.368096] BUG: scheduling while atomic: ifup-eth/1820/0x10000001
[   12.374144] Modules linked in:
[   12.377175] Pid: 1820, comm: ifup-eth Not tainted 2.6.26-rc5-00029-ga252672-dirty #3490
[   12.384031]  [<c0131a39>] __schedule_bug+0x59/0x60
[   12.388031]  [<c06b1375>] schedule+0x465/0x8c0
[   12.392031]  [<c013eecf>] ? update_process_times+0x4f/0x60
[   12.396031]  [<c013b50f>] ? irq_exit+0x3f/0x70
[   12.400451]  [<c012164b>] ? smp_apic_timer_interrupt+0x5b/0x90
[   12.406248]  [<c0117038>] ? apic_timer_interrupt+0x28/0x30
[   12.411702]  [<c0131a58>] __cond_resched+0x18/0x30
[   12.416466]  [<c06b1838>] _cond_resched+0x28/0x30
[   12.421141]  [<c03720bb>] strnlen_user+0x2b/0x60
[   12.425728]  [<c018dd53>] copy_strings+0x63/0x210
[   12.430403]  [<c018f986>] do_execve+0x176/0x200
[   12.434903]  [<c0372007>] ? strncpy_from_user+0x37/0x60
[   12.440031]  [<c0114ade>] sys_execve+0x2e/0x60
[   12.444447]  [<c01165ae>] sysenter_past_esp+0x6a/0x90
[   12.449469]  =======================
[   12.736676] eth1: link down
[   12.736919] ADDRCONF(NETDEV_UP): eth1: link is not ready

It would occur about once every 10 boots with the same config. Bisection 
led me to your patch.

	Ingo


* Re: Top kernel oopses/warnings for the week of May 30th 2008
From: Hugh Dickins @ 2008-06-10 12:42 UTC
  To: Ingo Molnar
  Cc: Arjan van de Ven, Linux Kernel Mailing List, Linus Torvalds,
	Andrew Morton, Greg KH, Jeff Garzik, Dave Jones

On Mon, 9 Jun 2008, Ingo Molnar wrote:
> * Hugh Dickins <hugh@veritas.com> wrote:
> 
> > Veering off at a tangent away from the oops: I was rather sobered to 
> > see all those traces of execve using kmap, I thought we were avoiding 
> > kmap like BKL in common paths these days (though it is convenient for 
> > symlinks).  Would a patch something like that below, copying the 
> > filemap.c trick, be welcome?
> 
> FYI, I stuck this into -tip for testing and after some time I started 
> getting:

Thanks for giving it a try.

> [   12.368096] BUG: scheduling while atomic: ifup-eth/1820/0x10000001
> [   12.374144] Modules linked in:
> [   12.377175] Pid: 1820, comm: ifup-eth Not tainted 2.6.26-rc5-00029-ga252672-dirty #3490
> [   12.384031]  [<c0131a39>] __schedule_bug+0x59/0x60
> [   12.388031]  [<c06b1375>] schedule+0x465/0x8c0
> [   12.392031]  [<c013eecf>] ? update_process_times+0x4f/0x60
> [   12.396031]  [<c013b50f>] ? irq_exit+0x3f/0x70
> [   12.400451]  [<c012164b>] ? smp_apic_timer_interrupt+0x5b/0x90
> [   12.406248]  [<c0117038>] ? apic_timer_interrupt+0x28/0x30
> [   12.411702]  [<c0131a58>] __cond_resched+0x18/0x30
> [   12.416466]  [<c06b1838>] _cond_resched+0x28/0x30
> [   12.421141]  [<c03720bb>] strnlen_user+0x2b/0x60
> [   12.425728]  [<c018dd53>] copy_strings+0x63/0x210
> [   12.430403]  [<c018f986>] do_execve+0x176/0x200
> [   12.434903]  [<c0372007>] ? strncpy_from_user+0x37/0x60
> [   12.440031]  [<c0114ade>] sys_execve+0x2e/0x60
> [   12.444447]  [<c01165ae>] sysenter_past_esp+0x6a/0x90
> [   12.449469]  =======================
> [   12.736676] eth1: link down
> [   12.736919] ADDRCONF(NETDEV_UP): eth1: link is not ready
> 
> It would occur about once every 10 boots with the same config. Bisection 
> led me to your patch.

Right, that would be with CONFIG_PREEMPT_VOLUNTARY.  Or in my case
with CONFIG_DEBUG_SPINLOCK_SLEEP, the might_sleep() in strnlen_user gives
"BUG: sleeping function called from invalid context...".

At first I thought it was just falling foul of our zeal for might_sleep.
But no, the warning is correct: the get_user(str) and strnlen_user(str)
can perfectly well fault, but my suggested patch lets them be called
with a kmap_atomic outstanding.

I doubt it would be cost-effective to kunmap_atomic for each little
string there.  I don't see a quick and effective way to fix it up.
I don't have the patience to go about adding get_user_inatomic and
strnlen_user_inatomic; there are more urgent things to do.

It would be nice to use a per-process kmap, or an efficient
one-page mapping in the exec'er's userspace; or maybe just having
a kunmap_and_flush would help (to slow the cycling around the pkmap
page table), though it would still involve the global spinlock.

Sorry, no quick and effective fix: please just drop the patch.

Thanks,
Hugh


end of thread

Thread overview: 20+ messages
2008-05-30 16:39 Top kernel oopses/warnings for the week of May 30th 2008 Arjan van de Ven
2008-05-30 19:19 ` Hugh Dickins
2008-05-30 21:43   ` Linus Torvalds
2008-05-30 21:49     ` Arjan van de Ven
2008-05-30 22:17       ` Arjan van de Ven
2008-05-30 22:00     ` Arjan van de Ven
2008-05-30 22:30       ` Linus Torvalds
2008-05-30 22:34         ` Arjan van de Ven
2008-05-30 22:55           ` Linus Torvalds
2008-05-31  0:41             ` Dave Jones
2008-06-02 23:44   ` Hugh Dickins
2008-06-03  0:00     ` Andrew Morton
2008-06-03  0:41       ` Hugh Dickins
2008-06-03  1:19         ` Andrew Morton
2008-06-09 16:32     ` Ingo Molnar
2008-06-10 12:42       ` Hugh Dickins
2008-05-30 22:34 ` Jochen Voß
2008-05-30 22:36   ` Arjan van de Ven
2008-06-02  0:02 ` James Morris
2008-06-02  2:27   ` Arjan van de Ven
