* Sparc32: random invalid instruction occourances on sparc32 (sun4c)
2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
@ 2007-07-03 17:29 ` Mark Fortescue
2007-07-03 17:50 ` [1/2] 2.6.22-rc7: known regressions Bartlomiej Zolnierkiewicz
` (3 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Mark Fortescue @ 2007-07-03 17:29 UTC
To: Michal Piotrowski
Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
linux-mm, sparclinux, David Miller, Mikael Pettersson,
William Lee Irwin III
Hi all,
I think I have found the cause of the problem.
Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 partially fixed alignment
issues but does not ensure that all 64-bit alignment requirements of
sparc32 are met. Tests have shown that the redzone2 word can become
misaligned.
I am currently working on a possible fix.
Regards
Mark Fortescue.
On Tue, 3 Jul 2007, Michal Piotrowski wrote:
> Hi all,
>
> Here is a list of some known regressions in 2.6.22-rc7.
>
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
>
> List of Aces
>
> Name Regressions fixed since 21-Jun-2007
> Hugh Dickins 2
> Andi Kleen 1
> Andrew Morton 1
> Benjamin Herrenschmidt 1
> Björn Steinbrink 1
> Bjorn Helgaas 1
> Jean Delvare 1
> Olaf Hering 1
> Siddha, Suresh B 1
> Trent Piepho 1
> Ville Syrjälä 1
>
>
>
> FS
>
> Subject : 2.6.22-rc4-git5 reiserfs: null ptr deref.
> References : http://lkml.org/lkml/2007/6/13/322
> Submitter : Randy Dunlap <randy.dunlap@oracle.com>
> Handled-By : Vladimir V. Saveliev <vs@namesys.com>
> Status : problem is being debugged
>
>
>
> IDE
>
> Subject : 2.6.22-rcX: hda: lost interrupt
> References : http://lkml.org/lkml/2007/6/29/121
> Submitter : David Chinner <dgc@sgi.com>
> Status : unknown
>
>
>
> Sparc64
>
> Subject : random invalid instruction occourances on sparc32 (sun4c)
> References : http://lkml.org/lkml/2007/6/17/111
> Submitter : Mark Fortescue <mark@mtfhpc.demon.co.uk>
> Status : problem is being debugged
>
> Subject : 2.6.22-rc broke X on Ultra5
> References : http://lkml.org/lkml/2007/5/22/78
> Submitter : Mikael Pettersson <mikpe@it.uu.se>
> Handled-By : David Miller <davem@davemloft.net>
> Status : problem is being debugged
>
>
>
> Regards,
> Michal
>
> --
> LOG
> http://www.stardust.webpages.pl/log/
> -
> To unsubscribe from this list: send the line "unsubscribe sparclinux" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
2007-07-03 17:29 ` Sparc32: random invalid instruction occourances on sparc32 (sun4c) Mark Fortescue
@ 2007-07-03 17:50 ` Bartlomiej Zolnierkiewicz
2007-07-03 23:09 ` David Chinner
2007-07-05 0:20 ` David Woodhouse
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-07-03 17:50 UTC
To: Michal Piotrowski
Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III
Hi,
On Tuesday 03 July 2007, Michal Piotrowski wrote:
> IDE
>
> Subject : 2.6.22-rcX: hda: lost interrupt
> References : http://lkml.org/lkml/2007/6/29/121
> Submitter : David Chinner <dgc@sgi.com>
> Status : unknown
David, any news on this one?
Have you tried libata as suggested by Jeff?
[ would exclude IRQ routing issue or broken hardware ]
Thanks,
Bart
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-03 17:50 ` [1/2] 2.6.22-rc7: known regressions Bartlomiej Zolnierkiewicz
@ 2007-07-03 23:09 ` David Chinner
0 siblings, 0 replies; 13+ messages in thread
From: David Chinner @ 2007-07-03 23:09 UTC
To: Bartlomiej Zolnierkiewicz
Cc: Michal Piotrowski, Linus Torvalds, Andrew Morton, LKML,
reiserfs-devel, Vladimir V. Saveliev, Randy Dunlap, linux-ide,
David Chinner, sparclinux, David Miller, Mikael Pettersson,
Mark Fortescue, William Lee Irwin III
On Tue, Jul 03, 2007 at 07:50:26PM +0200, Bartlomiej Zolnierkiewicz wrote:
>
> Hi,
>
> On Tuesday 03 July 2007, Michal Piotrowski wrote:
>
> > IDE
> >
> > Subject : 2.6.22-rcX: hda: lost interrupt
> > References : http://lkml.org/lkml/2007/6/29/121
> > Submitter : David Chinner <dgc@sgi.com>
> > Status : unknown
>
> David, any news on this one?
>
> Have you tried libata as suggested by Jeff?
Not yet - I've been flat out and haven't got back to it yet.
I'll try to get to it today.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
2007-07-03 17:29 ` Sparc32: random invalid instruction occourances on sparc32 (sun4c) Mark Fortescue
2007-07-03 17:50 ` [1/2] 2.6.22-rc7: known regressions Bartlomiej Zolnierkiewicz
@ 2007-07-05 0:20 ` David Woodhouse
2007-07-05 1:26 ` [PATCH 2.6.22 REGRESSION] Fix slab redzone alignment David Woodhouse
2007-07-05 1:42 ` [1/2] 2.6.22-rc7: known regressions David Woodhouse
4 siblings, 0 replies; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 0:20 UTC
To: Michal Piotrowski
Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III
On Tue, 2007-07-03 at 18:45 +0200, Michal Piotrowski wrote:
> Subject : random invalid instruction occourances on sparc32 (sun4c)
> References : http://lkml.org/lkml/2007/6/17/111
> Submitter : Mark Fortescue <mark@mtfhpc.demon.co.uk>
> Status : problem is being debugged
Hm, when testing the fix for that on ppc32, I stupidly built with Slub
instead, and got this...
radeonfb: Monitor 1 type LCD found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
radeonfb: Using Firmware dividers 0x00040080 from PPLL 0
radeonfb: Dynamic Clock Power Management enabled
*** SLUB kmalloc-32768: Poison check failed@0xc1e20000 slab 0xc04de400 [Not tainted]
offset=0 flags=0x40c3 inuse=0 freelist=0xc1e20000
Object 0xc1e20000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xc1e20010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xc1e20020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xc1e20030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xc1e20040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xc1e20050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xc1e20060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xc1e20070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Redzone 0xc1e28000: bb bb bb bb <BB><BB><BB><BB>
FreePointer 0xc1e28004 -> 0x00000000
Last alloc: malloc+0x14/0x24 jiffies_ago=1382 cpu=0 pid=1
Last free : free+0x10/0x20 jiffies_ago=837 cpu=0 pid=1
Filler 0xc1e28028: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Call Trace:
[effc7b80] [c000893c] show_stack+0x50/0x184 (unreliable)
[effc7ba0] [c009705c] object_err+0x178/0x18c
[effc7bc0] [c0097380] check_object+0x180/0x2ec
[effc7be0] [c0098320] __slab_alloc+0x5c8/0x5f4
[effc7c10] [c0098aa4] __kmalloc+0x64/0x9c
[effc7c30] [c015f5dc] fbcon_startup+0x154/0x2c0
[effc7c60] [c01bb8ec] register_con_driver+0x94/0x164
[effc7c90] [c01bedc8] take_over_console+0x24/0x58
[effc7cb0] [c015b41c] fbcon_takeover+0x8c/0xec
[effc7cc0] [c015d31c] fbcon_event_notify+0x1e0/0x6c8
[effc7d90] [c02d9490] notifier_call_chain+0x3c/0x94
[effc7db0] [c0045468] __blocking_notifier_call_chain+0x50/0x74
[effc7dd0] [c014f514] fb_notifier_call_chain+0x24/0x34
[effc7de0] [c0150590] register_framebuffer+0x190/0x1a8
[effc7e40] [c0185450] radeonfb_pci_register+0xe54/0xf50
[effc7e70] [c0145b04] pci_device_probe+0x6c/0xa0
[effc7e90] [c01d4108] driver_probe_device+0xfc/0x1a0
[effc7eb0] [c01d436c] __driver_attach+0xac/0x110
[effc7ed0] [c01d32f0] bus_for_each_dev+0x50/0x94
[effc7f00] [c01d3efc] driver_attach+0x24/0x34
[effc7f10] [c01d3710] bus_add_driver+0x78/0x1a0
[effc7f30] [c01d468c] driver_register+0x88/0x9c
[effc7f40] [c0145900] __pci_register_driver+0x6c/0xb8
[effc7f60] [c03e8e4c] radeonfb_init+0x20c/0x220
[effc7f80] [c03c82e4] kernel_init+0xc8/0x284
[effc7ff0] [c0013e28] kernel_thread+0x44/0x60
@@@ SLUB kmalloc-32768: Restoring Poison (0x6b) from 0xc1e20000-0xc1e27ffe
@@@ SLUB kmalloc-32768: Restoring Poison (0xa5) from 0xc1e27fff-0xc1e27fff
@@@ SLUB: kmalloc-32768 slab 0xc04de400. Marking all objects used.
Console: switching to colour frame buffer device 180x56
--
dwmw2
* [PATCH 2.6.22 REGRESSION] Fix slab redzone alignment
2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
` (2 preceding siblings ...)
2007-07-05 0:20 ` David Woodhouse
@ 2007-07-05 1:26 ` David Woodhouse
2007-07-05 1:42 ` [1/2] 2.6.22-rc7: known regressions David Woodhouse
4 siblings, 0 replies; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 1:26 UTC
To: Michal Piotrowski
Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III
Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 fixed a couple of bugs
by switching the redzone to 64 bits. Unfortunately, it neglected to
ensure that the _second_ redzone, after the slab object, is aligned
correctly. This caused illegal instruction faults on sparc32, which for
some reason not entirely clear to me are not trapped and fixed up.
Two things need to be done to fix this:
- increase the object size, rounding up to alignof(long long) so
that the second redzone can be aligned correctly.
- If SLAB_STORE_USER is set but alignof(long long)==8, allow a
full 64 bits of space for the user word at the end of the buffer,
even though we may not _use_ the whole 64 bits.
This patch should be a no-op on any 64-bit architecture or any 32-bit
architecture where alignof(long long) == 4. Of the others, it's tested
on ppc32 by myself and a very similar patch was tested on sparc32 by
Mark Fortescue, who reported the new problem.
Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to
the new sizes. Again noticed by Mark.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
diff --git a/mm/slab.c b/mm/slab.c
index a9c4472..b344e67 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -137,6 +137,7 @@
/* Shouldn't this be in a header file somewhere? */
#define BYTES_PER_WORD sizeof(void *)
+#define REDZONE_ALIGN max(BYTES_PER_WORD, __alignof__(unsigned long long))
#ifndef cache_line_size
#define cache_line_size() L1_CACHE_BYTES
@@ -547,7 +548,7 @@ static unsigned long long *dbg_redzone2(struct kmem_cache *cachep, void *objp)
if (cachep->flags & SLAB_STORE_USER)
return (unsigned long long *)(objp + cachep->buffer_size -
sizeof(unsigned long long) -
- BYTES_PER_WORD);
+ REDZONE_ALIGN);
return (unsigned long long *) (objp + cachep->buffer_size -
sizeof(unsigned long long));
}
@@ -2178,7 +2179,8 @@ kmem_cache_create (const char *name, size_t size, size_t align,
* above the next power of two: caches with object sizes just above a
* power of two have a significant amount of internal fragmentation.
*/
- if (size < 4096 || fls(size - 1) == fls(size-1 + 3 * BYTES_PER_WORD))
+ if (size < 4096 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
+ 2 * sizeof(unsigned long long)))
flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
if (!(flags & SLAB_DESTROY_BY_RCU))
flags |= SLAB_POISON;
@@ -2219,12 +2221,20 @@ kmem_cache_create (const char *name, size_t size, size_t align,
}
/*
- * Redzoning and user store require word alignment. Note this will be
- * overridden by architecture or caller mandated alignment if either
- * is greater than BYTES_PER_WORD.
+ * Redzoning and user store require word alignment or possibly larger.
+ * Note this will be overridden by architecture or caller mandated
+ * alignment if either is greater than BYTES_PER_WORD.
*/
- if (flags & SLAB_RED_ZONE || flags & SLAB_STORE_USER)
- ralign = __alignof__(unsigned long long);
+ if (flags & SLAB_STORE_USER)
+ ralign = BYTES_PER_WORD;
+
+ if (flags & SLAB_RED_ZONE) {
+ ralign = REDZONE_ALIGN;
+ /* If redzoning, ensure that the second redzone is suitably
+ * aligned, by adjusting the object size accordingly. */
+ size += REDZONE_ALIGN - 1;
+ size &= ~(REDZONE_ALIGN - 1);
+ }
/* 2) arch mandated alignment */
if (ralign < ARCH_SLAB_MINALIGN) {
@@ -2261,9 +2271,13 @@ kmem_cache_create (const char *name, size_t size, size_t align,
}
if (flags & SLAB_STORE_USER) {
/* user store requires one word storage behind the end of
- * the real object.
+ * the real object. But if the second red zone needs to be
+ * aligned to 64 bits, we must allow that much space.
*/
- size += BYTES_PER_WORD;
+ if (flags & SLAB_RED_ZONE)
+ size += REDZONE_ALIGN;
+ else
+ size += BYTES_PER_WORD;
}
#if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
if (size >= malloc_sizes[INDEX_L3 + 1].cs_size
--
dwmw2
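The two changes listed in the patch description above can be checked with a little arithmetic. The following is a simplified userspace model, not the kernel code: it assumes a sparc32-like target with 4-byte words and 8-byte alignment for unsigned long long, ignores caller and architecture mandated alignment, and uses hypothetical names (WORD_BYTES, RZ_BYTES, RZ_ALIGN, redzone2_offset_before, redzone2_offset_after). It computes the offset of the second redzone from the start of the buffer when both redzoning and user store are enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for a 32-bit target where long long needs 8-byte alignment. */
#define WORD_BYTES 4u   /* stands in for BYTES_PER_WORD */
#define RZ_BYTES   8u   /* stands in for sizeof(unsigned long long) */
#define RZ_ALIGN   8u   /* stands in for REDZONE_ALIGN */

/* Round n up to the next multiple of a (a must be a power of two). */
size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) & ~(a - 1);
}

/* Model of the layout before the patch: the object is only word aligned
 * and the user word takes WORD_BYTES at the end of the buffer. */
size_t redzone2_offset_before(size_t obj_size)
{
	size_t size = align_up(obj_size, WORD_BYTES);

	size += 2 * RZ_BYTES;             /* first and second redzone */
	size += WORD_BYTES;               /* user word */
	size = align_up(size, RZ_ALIGN);  /* final buffer size */
	return size - RZ_BYTES - WORD_BYTES;
}

/* Model of the layout after the patch: the object is rounded up to
 * RZ_ALIGN and the user word is given a full RZ_ALIGN bytes. */
size_t redzone2_offset_after(size_t obj_size)
{
	size_t size = align_up(obj_size, RZ_ALIGN);

	size += 2 * RZ_BYTES;             /* first and second redzone */
	size += RZ_ALIGN;                 /* user word, padded */
	size = align_up(size, RZ_ALIGN);  /* final buffer size */
	return size - RZ_BYTES - RZ_ALIGN;
}
```

Under these assumptions a 20-byte object puts the second redzone at offset 28 before the change (4 bytes off an 8-byte boundary) and at offset 32 after it. In this model the old offset is always the 8-byte-aligned buffer size minus 12, so it can never be 8-byte aligned when the user word is stored.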
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
` (3 preceding siblings ...)
2007-07-05 1:26 ` [PATCH 2.6.22 REGRESSION] Fix slab redzone alignment David Woodhouse
@ 2007-07-05 1:42 ` David Woodhouse
2007-07-05 16:28 ` Linus Torvalds
4 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 1:42 UTC
To: Michal Piotrowski, marcel
Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III
On Tue, 2007-07-03 at 18:45 +0200, Michal Piotrowski wrote:
> Hi all,
>
> Here is a list of some known regressions in 2.6.22-rc7.
Oh, and here's another one for you. My Bluetooth mouse just stopped
working and hidd is deadlocked...
hidd D 1FE27798 5940 1695 1 (NOTLB)
Call Trace:
[ef3ddb70] [00000004] 0x4 (unreliable)
[ef3ddc30] [c0008e7c] __switch_to+0x50/0x68
[ef3ddc50] [c02d5998] schedule+0x3cc/0x480
[ef3ddc80] [c0137a20] rwsem_down_failed_common+0x1c4/0x1f4
[ef3ddcb0] [c02d7454] rwsem_down_write_failed+0x28/0x40
[ef3ddce0] [c004ff60] down_write+0x50/0x64
[ef3ddd00] [f27f2068] hidp_add_connection+0x168/0x75c [hidp]
[ef3ddd40] [f27f2e44] hidp_sock_ioctl+0x140/0x414 [hidp]
[ef3ddeb0] [c024da6c] sock_ioctl+0x248/0x284
[ef3dded0] [c00ab02c] do_ioctl+0x38/0x84
[ef3ddee0] [c00ab448] vfs_ioctl+0x3d0/0x404
[ef3ddf10] [c00ab4e4] sys_ioctl+0x68/0x98
--
dwmw2
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-05 1:42 ` [1/2] 2.6.22-rc7: known regressions David Woodhouse
@ 2007-07-05 16:28 ` Linus Torvalds
2007-07-05 16:43 ` David Woodhouse
2007-07-05 18:46 ` David Woodhouse
0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2007-07-05 16:28 UTC
To: David Woodhouse
Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III
On Wed, 4 Jul 2007, David Woodhouse wrote:
>
> Oh, and here's another one for you. My Bluetooth mouse just stopped
> working and hidd is deadlocked...
Looks like it is stuck on hidp_session_sem.
Nothing after 2.6.21 seems to have even touched that semaphore usage, and
in fact there's not a whole lot of changes to the hidp code at all (and
none of them look even remotely interesting).
So I suspect it's something lower down in the bluetooth stack, or it's a
long-standing problem that you are somehow able to trigger more easily
now. Is it consistent?
Can you show the traces for the _other_ processes that are in bluetooth
functions? Because there should be other processes there, holding that
hidp_session_sem rwsem.
[ Alternatively, there is some process that doesn't release it in an error
case, but that is definitely not a regression if so: the changes to
net/bluetooth/hidp/core.c since 2.6.21 really are trivial. ]
IOW, more info needed, I think.
Linus
---
> hidd D 1FE27798 5940 1695 1 (NOTLB)
> Call Trace:
> [ef3ddb70] [00000004] 0x4 (unreliable)
> [ef3ddc30] [c0008e7c] __switch_to+0x50/0x68
> [ef3ddc50] [c02d5998] schedule+0x3cc/0x480
> [ef3ddc80] [c0137a20] rwsem_down_failed_common+0x1c4/0x1f4
> [ef3ddcb0] [c02d7454] rwsem_down_write_failed+0x28/0x40
> [ef3ddce0] [c004ff60] down_write+0x50/0x64
> [ef3ddd00] [f27f2068] hidp_add_connection+0x168/0x75c [hidp]
> [ef3ddd40] [f27f2e44] hidp_sock_ioctl+0x140/0x414 [hidp]
> [ef3ddeb0] [c024da6c] sock_ioctl+0x248/0x284
> [ef3dded0] [c00ab02c] do_ioctl+0x38/0x84
> [ef3ddee0] [c00ab448] vfs_ioctl+0x3d0/0x404
> [ef3ddf10] [c00ab4e4] sys_ioctl+0x68/0x98
>
> --
> dwmw2
>
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-05 16:28 ` Linus Torvalds
@ 2007-07-05 16:43 ` David Woodhouse
2007-07-05 18:46 ` David Woodhouse
1 sibling, 0 replies; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 16:43 UTC
To: Linus Torvalds
Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III
On Thu, 2007-07-05 at 09:28 -0700, Linus Torvalds wrote:
>
> On Wed, 4 Jul 2007, David Woodhouse wrote:
> >
> > Oh, and here's another one for you. My Bluetooth mouse just stopped
> > working and hidd is deadlocked...
>
> Looks like it is stuck on hidp_session_sem.
>
> Nothing after 2.6.21 seems to have even touched that semaphore usage, and
> in fact there's not a whole lot of changes to the hidp code at all (and
> none of them look even remotely interesting).
>
> So I suspect it's something lower down in the bluetooth stack, or it's a
> long-standing problem that you are somehow able to trigger more easily
> now. Is it consistent?
It happened twice before I gave up on my 2.6.22-rc7 test kernel and went
back to something earlier. I suppose I should double-check that it
wasn't my slab changes, but I really don't think that's it.
> Can you show the traces for the _other_ processes that are in bluetooth
> functions? Because there should be other processes there, holding that
> hidp_session_sem rwsem.
There was nothing, apart from a later 'hidd -l' which got stuck on the
same semaphore. I have an hcidump of it happening, at
http://david.woodhou.se/hidd-lockup-dump.txt -- it doesn't seem
particularly enlightening though. There's just a disconnection and
reconnect, as happens quite frequently with this mouse, and then we're
deadlocked. I'll build with hidp debugging.
--
dwmw2
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-05 16:28 ` Linus Torvalds
2007-07-05 16:43 ` David Woodhouse
@ 2007-07-05 18:46 ` David Woodhouse
2007-07-05 19:31 ` David Woodhouse
1 sibling, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 18:46 UTC
To: Linus Torvalds
Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III
On Thu, 2007-07-05 at 09:28 -0700, Linus Torvalds wrote:
>
> On Wed, 4 Jul 2007, David Woodhouse wrote:
> >
> > Oh, and here's another one for you. My Bluetooth mouse just stopped
> > working and hidd is deadlocked...
>
> Looks like it is stuck on hidp_session_sem.
Oh, I suck. I failed to notice that it had oopsed earlier, in slab
debugging. I shall look at my 'obviously correct' slab patch a little
harder, now that I'm not distracted by the fireworks.
--
dwmw2
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-05 18:46 ` David Woodhouse
@ 2007-07-05 19:31 ` David Woodhouse
2007-07-05 19:51 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 19:31 UTC
To: Linus Torvalds
Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III, greg
On Thu, 2007-07-05 at 14:46 -0400, David Woodhouse wrote:
> Oh, I suck. I failed to notice that it had oopsed earlier, in slab
> debugging. I shall look at my 'obviously correct' slab patch a little
> harder, now that I'm not distracted by the fireworks.
Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc
too, whenever we had CONFIG_SYSFS_DEPRECATED set.
Unable to handle kernel paging request for data at address 0x6b6b6b6b
Faulting instruction address: 0xc001870c
Oops: Kernel access of bad area, sig: 11 [#1]
PowerMac
Modules linked in: radeon(U) drm(U) hidp(U) hci_usb(U) rfcomm(U) l2cap(U) bluetooth(U) ipv6(U) nls_utf8(U) hfsplus(U) dm_mirror(U) dm_mod(U) therm_adt746x(U) parport_pc(U) lp(U) parport(U) loop(U) snd_aoa_i2sbus(U) snd_powermac(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) ide_cd(U) cdrom(U) snd_seq_device(U) pmac_zilog(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd(U) soundcore(U) snd_aoa_soundbus(U) firewire_ohci(U) firewire_core(U) crc_itu_t(U) sungem(U) sungem_phy(U) bcm43xx(U) ieee80211softmac(U) ieee80211(U) ieee80211_crypt(U) ext3(U) jbd(U) mbcache(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
NIP: c001870c LR: c0134fec CTR: c01d5078
REGS: eed5bdc0 TRAP: 0300 Not tainted (2.6.22-0.9.rc7.git3.fc8)
MSR: 00009032 <EE,ME,IR,DR> CR: 22000224 XER: 20000000
DAR: 6b6b6b6b, DSISR: 40000000
TASK = ef72a950[1753] 'khidpd_00000000' THREAD: eed5a000
GPR00: 6b6b6b6b eed5be70 ef72a950 6b6b6b6b 6b6b6b6a ef18a3a0 0000001a ec8701c6
GPR08: 000007aa 00000014 00000000 00000005 00000000 2002163c 100d0000 00000000
GPR16: 00000000 7fb9a006 00000003 00000000 ee12cb08 c038cf20 ec870170 c037bf38
GPR24: ef6bab08 ec8701c6 0000001a 000007aa 00000001 ef6babb0 ef6babb0 000000d0
NIP [c001870c] strlen+0x4/0x18
LR [c0134fec] kobject_get_path+0x34/0xc4
Call Trace:
[eed5be70] [c0098030] __kmalloc_track_caller+0x13c/0x164 (unreliable)
[eed5be90] [c01d5124] class_uevent+0xac/0x1bc
[eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460
[eed5bf20] [c01d485c] class_device_del+0x178/0x1a0
[eed5bf40] [c01d489c] class_device_unregister+0x18/0x30
[eed5bf60] [c021f820] input_unregister_device+0xf4/0x130
[eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60
[eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp]
[eed5bff0] [c0013e28] kernel_thread+0x44/0x60
Instruction dump:
4082fff4 4e800020 38a3ffff 3884ffff 8c650001 2c830000 8c040001 7c601851
4d860020 4182ffec 4e800020 3883ffff <8c040001> 2c000000 4082fff8 7c632050
--
dwmw2
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-05 19:31 ` David Woodhouse
@ 2007-07-05 19:51 ` Linus Torvalds
2007-07-05 21:03 ` Dmitry Torokhov
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2007-07-05 19:51 UTC
To: David Woodhouse, Dmitry Torokhov, Jiri Kosina
Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
William Lee Irwin III, Greg KH
Looks input-related..
On Thu, 5 Jul 2007, David Woodhouse wrote:
>
> Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc
> too, whenever we had CONFIG_SYSFS_DEPRECATED set.
>
> Unable to handle kernel paging request for data at address 0x6b6b6b6b
Ok, that 0x6b is obviously the kfree() poisoning, ie it looks like a
use-after-free problem with a pointer being loaded from a structure that
had been free'd.
And the trace seems to be (ignore the unreliable one):
> NIP [c001870c] strlen+0x4/0x18
> LR [c0134fec] kobject_get_path+0x34/0xc4
> Call Trace:
> [eed5be90] [c01d5124] class_uevent+0xac/0x1bc
> [eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460
> [eed5bf20] [c01d485c] class_device_del+0x178/0x1a0
> [eed5bf40] [c01d489c] class_device_unregister+0x18/0x30
> [eed5bf60] [c021f820] input_unregister_device+0xf4/0x130
> [eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60
> [eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp]
> [eed5bff0] [c0013e28] kernel_thread+0x44/0x60
Where we have a few missing functions due to inlining, ie the real
sequence seems to be:
class_device_del ->
kobject_uevent_env ->
class_uevent ->
kobject_get_path ->
get_kobj_path_length ->
parent = kobj;
do {
strlen(parent->k_name /* kobject_name(parent) */);
parent = parent->parent;
} while (parent);
so either the kobj or one of its parents had already been freed when it
was unregistered due to the disconnect.
I'm not seeing any reference counting or other protection for the device
("input") on "hid->inputs" list. But I don't know the code. Dmitry? Jiri?
Linus
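The analysis above rests on two mechanical points: the path-length loop dereferences every ancestor's name pointer, and freed memory filled with 0x6b makes any pointer loaded from it read as 0x6b6b6b6b on a 32-bit machine. A minimal userspace sketch of both follows. It is an illustration only: struct node, path_length, and name_is_poisoned are hypothetical stand-ins for the kernel's kobject code, and the "+ 1" per path component is an assumption for a separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b  /* poison byte visible in the oops address */

/* Hypothetical stand-in for a kobject: a name and a parent link. */
struct node {
	const char *name;
	struct node *parent;
};

/* Walk from n up to the root, summing name lengths plus one separator
 * each, in the same shape as the loop quoted above. Every step loads
 * n->name and n->parent, so a freed ancestor is dereferenced here. */
size_t path_length(const struct node *n)
{
	size_t len = 0;

	do {
		len += strlen(n->name) + 1;
		n = n->parent;
	} while (n);
	return len;
}

/* Report whether the bytes of n->name are all the poison byte, which is
 * what a name pointer loaded from poisoned (freed) memory looks like.
 * The bytes are inspected with memcpy so the bogus pointer is never used. */
int name_is_poisoned(const struct node *n)
{
	unsigned char bytes[sizeof n->name];
	size_t i;

	memcpy(bytes, &n->name, sizeof bytes);
	for (i = 0; i < sizeof bytes; i++)
		if (bytes[i] != POISON_FREE)
			return 0;
	return 1;
}
```

For a chain such as "input3" under "input" under "class", path_length returns 19. A struct node overwritten with memset(&n, POISON_FREE, sizeof n) makes name_is_poisoned return 1, and passing such a name to strlen is the kind of access that faults at 0x6b6b6b6b in the trace.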
* Re: [1/2] 2.6.22-rc7: known regressions
2007-07-05 19:51 ` Linus Torvalds
@ 2007-07-05 21:03 ` Dmitry Torokhov
0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Torokhov @ 2007-07-05 21:03 UTC
To: Linus Torvalds
Cc: David Woodhouse, Jiri Kosina, Michal Piotrowski, marcel,
Andrew Morton, LKML, reiserfs-devel, Vladimir V. Saveliev,
Randy Dunlap, linux-ide, David Chinner, sparclinux, David Miller,
Mikael Pettersson, Mark Fortescue, William Lee Irwin III, Greg KH
On 7/5/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Looks input-related..
>
> On Thu, 5 Jul 2007, David Woodhouse wrote:
> >
> > Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc
> > too, whenever we had CONFIG_SYSFS_DEPRECATED set.
> >
> > Unable to handle kernel paging request for data at address 0x6b6b6b6b
>
> Ok, that 0x6b is obviously the kfree() poisoning, ie it looks like a
> use-after-free problem with a pointer being loaded from a structure that
> had been free'd.
>
> And the trace seems to be (ignore the unreliable one):
>
> > NIP [c001870c] strlen+0x4/0x18
> > LR [c0134fec] kobject_get_path+0x34/0xc4
> > Call Trace:
> > [eed5be90] [c01d5124] class_uevent+0xac/0x1bc
> > [eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460
> > [eed5bf20] [c01d485c] class_device_del+0x178/0x1a0
> > [eed5bf40] [c01d489c] class_device_unregister+0x18/0x30
> > [eed5bf60] [c021f820] input_unregister_device+0xf4/0x130
> > [eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60
> > [eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp]
> > [eed5bff0] [c0013e28] kernel_thread+0x44/0x60
>
> Where we have a few missing functions due to inlining, ie the real
> sequence seems to be:
>
> class_device_del ->
> kobject_uevent_env ->
> class_uevent ->
> kobject_get_path ->
> get_kobj_path_length ->
> parent = kobj;
> do {
> strlen(parent->k_name /* kobject_name(parent) */);
> parent = parent->parent;
> } while (parent);
>
> so either the kobj or one of its parents had already been freed when it
> was unregistered due to the disconnect.
>
> I'm not seeing any reference counting or other protection for the device
> ("input") on "hid->inputs" list. But I don't know the code. Dmitry? Jiri?
>
In hidinput_connect we do:
input_dev->dev.parent = hid->dev;
This should pin the hid object until all inputs are released. However
bluetooth does not use driver model and does not have hid->dev set up
and so it looks like we are simply trying to unregister an input
device that is already gone... I still don't quite get how we
unregister the same device twice - it is done from a per-hid-device
thread in hidp...
--
Dmitry
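The pinning described above is ordinary parent reference counting: a child that records a parent also holds a reference on it, so the parent cannot go away first. A minimal sketch of the idea follows, assuming a plain integer count and hypothetical names (struct obj, obj_get, obj_put, obj_set_parent). It is not the driver model's API, and it does not attempt to reproduce the double unregister.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object with an optional parent. */
struct obj {
	int refcount;
	struct obj *parent;
	int released;     /* set when the last reference is dropped */
};

void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Drop one reference. When the count reaches zero the object is
 * released and gives up the reference it held on its parent. */
void obj_put(struct obj *o)
{
	if (--o->refcount == 0) {
		o->released = 1;
		if (o->parent)
			obj_put(o->parent);
	}
}

/* Record the parent and pin it. With a NULL parent, as in the
 * Bluetooth case described above, nothing gets pinned. */
void obj_set_parent(struct obj *child, struct obj *parent)
{
	if (parent)
		obj_get(parent);
	child->parent = parent;
}
```

With a parent set, dropping the owner's reference on the parent leaves it alive until the child is released too. With a NULL parent the child's lifetime is tied to nothing else, so any protection has to come from the code that owns the list of inputs.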