linux-ide.vger.kernel.org archive mirror
* [1/2] 2.6.22-rc7: known regressions
@ 2007-07-03 16:45 Michal Piotrowski
  2007-07-03 17:29 ` Sparc32: random invalid instruction occourances on sparc32 (sun4c) Mark Fortescue
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Michal Piotrowski @ 2007-07-03 16:45 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III

Hi all,

Here is a list of some known regressions in 2.6.22-rc7.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name                    Regressions fixed since 21-Jun-2007
Hugh Dickins                           2
Andi Kleen                             1
Andrew Morton                          1
Benjamin Herrenschmidt                 1
Björn Steinbrink                       1
Bjorn Helgaas                          1
Jean Delvare                           1
Olaf Hering                            1
Siddha, Suresh B                       1
Trent Piepho                           1
Ville Syrjälä                          1



FS

Subject    : 2.6.22-rc4-git5 reiserfs: null ptr deref.
References : http://lkml.org/lkml/2007/6/13/322
Submitter  : Randy Dunlap <randy.dunlap@oracle.com>
Handled-By : Vladimir V. Saveliev <vs@namesys.com>
Status     : problem is being debugged



IDE

Subject    : 2.6.22-rcX: hda: lost interrupt
References : http://lkml.org/lkml/2007/6/29/121
Submitter  : David Chinner <dgc@sgi.com>
Status     : unknown



Sparc64

Subject    : random invalid instruction occourances on sparc32 (sun4c)
References : http://lkml.org/lkml/2007/6/17/111
Submitter  : Mark Fortescue <mark@mtfhpc.demon.co.uk>
Status     : problem is being debugged

Subject    : 2.6.22-rc broke X on Ultra5
References : http://lkml.org/lkml/2007/5/22/78
Submitter  : Mikael Pettersson <mikpe@it.uu.se>
Handled-By : David Miller <davem@davemloft.net>
Status     : problem is being debugged



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


* Sparc32: random invalid instruction occourances on sparc32 (sun4c)
  2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
@ 2007-07-03 17:29 ` Mark Fortescue
  2007-07-03 17:50 ` [1/2] 2.6.22-rc7: known regressions Bartlomiej Zolnierkiewicz
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Mark Fortescue @ 2007-07-03 17:29 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	linux-mm, sparclinux, David Miller, Mikael Pettersson,
	William Lee Irwin III

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2404 bytes --]

Hi all,

I think I have found the cause of the problem.

Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 partially fixed alignment
issues but does not ensure that all 64-bit alignment requirements of
sparc32 are met. Tests have shown that the redzone2 word can become
misaligned.

I am currently working on a possible fix.

Regards
 	Mark Fortescue.

On Tue, 3 Jul 2007, Michal Piotrowski wrote:

> Hi all,
>
> Here is a list of some known regressions in 2.6.22-rc7.
>
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
>
> List of Aces
>
> Name                    Regressions fixed since 21-Jun-2007
> Hugh Dickins                           2
> Andi Kleen                             1
> Andrew Morton                          1
> Benjamin Herrenschmidt                 1
> Björn Steinbrink                       1
> Bjorn Helgaas                          1
> Jean Delvare                           1
> Olaf Hering                            1
> Siddha, Suresh B                       1
> Trent Piepho                           1
> Ville Syrjälä                          1
>
>
>
> FS
>
> Subject    : 2.6.22-rc4-git5 reiserfs: null ptr deref.
> References : http://lkml.org/lkml/2007/6/13/322
> Submitter  : Randy Dunlap <randy.dunlap@oracle.com>
> Handled-By : Vladimir V. Saveliev <vs@namesys.com>
> Status     : problem is being debugged
>
>
>
> IDE
>
> Subject    : 2.6.22-rcX: hda: lost interrupt
> References : http://lkml.org/lkml/2007/6/29/121
> Submitter  : David Chinner <dgc@sgi.com>
> Status     : unknown
>
>
>
> Sparc64
>
> Subject    : random invalid instruction occourances on sparc32 (sun4c)
> References : http://lkml.org/lkml/2007/6/17/111
> Submitter  : Mark Fortescue <mark@mtfhpc.demon.co.uk>
> Status     : problem is being debugged
>
> Subject    : 2.6.22-rc broke X on Ultra5
> References : http://lkml.org/lkml/2007/5/22/78
> Submitter  : Mikael Pettersson <mikpe@it.uu.se>
> Handled-By : David Miller <davem@davemloft.net>
> Status     : problem is being debugged
>
>
>
> Regards,
> Michal
>
> --
> LOG
> http://www.stardust.webpages.pl/log/


* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
  2007-07-03 17:29 ` Sparc32: random invalid instruction occourances on sparc32 (sun4c) Mark Fortescue
@ 2007-07-03 17:50 ` Bartlomiej Zolnierkiewicz
  2007-07-03 23:09   ` David Chinner
  2007-07-05  0:20 ` David Woodhouse
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-07-03 17:50 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III


Hi,

On Tuesday 03 July 2007, Michal Piotrowski wrote:

> IDE
> 
> Subject    : 2.6.22-rcX: hda: lost interrupt
> References : http://lkml.org/lkml/2007/6/29/121
> Submitter  : David Chinner <dgc@sgi.com>
> Status     : unknown

David, any news on this one?

Have you tried libata as suggested by Jeff?

[ would exclude IRQ routing issue or broken hardware ]

Thanks,
Bart


* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-03 17:50 ` [1/2] 2.6.22-rc7: known regressions Bartlomiej Zolnierkiewicz
@ 2007-07-03 23:09   ` David Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: David Chinner @ 2007-07-03 23:09 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Michal Piotrowski, Linus Torvalds, Andrew Morton, LKML,
	reiserfs-devel, Vladimir V. Saveliev, Randy Dunlap, linux-ide,
	David Chinner, sparclinux, David Miller, Mikael Pettersson,
	Mark Fortescue, William Lee Irwin III

On Tue, Jul 03, 2007 at 07:50:26PM +0200, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Tuesday 03 July 2007, Michal Piotrowski wrote:
> 
> > IDE
> > 
> > Subject    : 2.6.22-rcX: hda: lost interrupt
> > References : http://lkml.org/lkml/2007/6/29/121
> > Submitter  : David Chinner <dgc@sgi.com>
> > Status     : unknown
> 
> David, any news on this one?
> 
> Have you tried libata as suggested by Jeff?

Not yet - I've been flat out and haven't got back to it yet.
I'll try to get to it today.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
  2007-07-03 17:29 ` Sparc32: random invalid instruction occourances on sparc32 (sun4c) Mark Fortescue
  2007-07-03 17:50 ` [1/2] 2.6.22-rc7: known regressions Bartlomiej Zolnierkiewicz
@ 2007-07-05  0:20 ` David Woodhouse
  2007-07-05  1:26 ` [PATCH 2.6.22 REGRESSION] Fix slab redzone alignment David Woodhouse
  2007-07-05  1:42 ` [1/2] 2.6.22-rc7: known regressions David Woodhouse
  4 siblings, 0 replies; 13+ messages in thread
From: David Woodhouse @ 2007-07-05  0:20 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III

On Tue, 2007-07-03 at 18:45 +0200, Michal Piotrowski wrote:
> Subject    : random invalid instruction occourances on sparc32 (sun4c)
> References : http://lkml.org/lkml/2007/6/17/111
> Submitter  : Mark Fortescue <mark@mtfhpc.demon.co.uk>
> Status     : problem is being debugged 

Hm, when testing the fix for that on ppc32, I stupidly built with SLUB
instead, and got this...

radeonfb: Monitor 1 type LCD found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
radeonfb: Using Firmware dividers 0x00040080 from PPLL 0
radeonfb: Dynamic Clock Power Management enabled
*** SLUB kmalloc-32768: Poison check failed@0xc1e20000 slab 0xc04de400 [Not tainted]
    offset=0 flags=0x40c3 inuse=0 freelist=0xc1e20000
    Object 0xc1e20000:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xc1e20010:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xc1e20020:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xc1e20030:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xc1e20040:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xc1e20050:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xc1e20060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xc1e20070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
   Redzone 0xc1e28000:  bb bb bb bb                                     <BB><BB><BB><BB>            
FreePointer 0xc1e28004 -> 0x00000000
Last alloc: malloc+0x14/0x24 jiffies_ago=1382 cpu=0 pid=1
Last free : free+0x10/0x20 jiffies_ago=837 cpu=0 pid=1
    Filler 0xc1e28028:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
Call Trace:
[effc7b80] [c000893c] show_stack+0x50/0x184 (unreliable)
[effc7ba0] [c009705c] object_err+0x178/0x18c
[effc7bc0] [c0097380] check_object+0x180/0x2ec
[effc7be0] [c0098320] __slab_alloc+0x5c8/0x5f4
[effc7c10] [c0098aa4] __kmalloc+0x64/0x9c
[effc7c30] [c015f5dc] fbcon_startup+0x154/0x2c0
[effc7c60] [c01bb8ec] register_con_driver+0x94/0x164
[effc7c90] [c01bedc8] take_over_console+0x24/0x58
[effc7cb0] [c015b41c] fbcon_takeover+0x8c/0xec
[effc7cc0] [c015d31c] fbcon_event_notify+0x1e0/0x6c8
[effc7d90] [c02d9490] notifier_call_chain+0x3c/0x94
[effc7db0] [c0045468] __blocking_notifier_call_chain+0x50/0x74
[effc7dd0] [c014f514] fb_notifier_call_chain+0x24/0x34
[effc7de0] [c0150590] register_framebuffer+0x190/0x1a8
[effc7e40] [c0185450] radeonfb_pci_register+0xe54/0xf50
[effc7e70] [c0145b04] pci_device_probe+0x6c/0xa0
[effc7e90] [c01d4108] driver_probe_device+0xfc/0x1a0
[effc7eb0] [c01d436c] __driver_attach+0xac/0x110
[effc7ed0] [c01d32f0] bus_for_each_dev+0x50/0x94
[effc7f00] [c01d3efc] driver_attach+0x24/0x34
[effc7f10] [c01d3710] bus_add_driver+0x78/0x1a0
[effc7f30] [c01d468c] driver_register+0x88/0x9c
[effc7f40] [c0145900] __pci_register_driver+0x6c/0xb8
[effc7f60] [c03e8e4c] radeonfb_init+0x20c/0x220
[effc7f80] [c03c82e4] kernel_init+0xc8/0x284
[effc7ff0] [c0013e28] kernel_thread+0x44/0x60
@@@ SLUB kmalloc-32768: Restoring Poison (0x6b) from 0xc1e20000-0xc1e27ffe
@@@ SLUB kmalloc-32768: Restoring Poison (0xa5) from 0xc1e27fff-0xc1e27fff
@@@ SLUB: kmalloc-32768 slab 0xc04de400. Marking all objects used.
Console: switching to colour frame buffer device 180x56


-- 
dwmw2



* [PATCH 2.6.22 REGRESSION] Fix slab redzone alignment
  2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
                   ` (2 preceding siblings ...)
  2007-07-05  0:20 ` David Woodhouse
@ 2007-07-05  1:26 ` David Woodhouse
  2007-07-05  1:42 ` [1/2] 2.6.22-rc7: known regressions David Woodhouse
  4 siblings, 0 replies; 13+ messages in thread
From: David Woodhouse @ 2007-07-05  1:26 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III

Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 fixed a couple of bugs
by switching the redzone to 64 bits. Unfortunately, it neglected to
ensure that the _second_ redzone, after the slab object, is aligned
correctly. This caused illegal instruction faults on sparc32, which for
some reason not entirely clear to me are not trapped and fixed up.

Two things need to be done to fix this:
  - increase the object size, rounding up to alignof(long long) so
    that the second redzone can be aligned correctly.
  - If SLAB_STORE_USER is set but alignof(long long)==8, allow a
    full 64 bits of space for the user word at the end of the buffer,
    even though we may not _use_ the whole 64 bits.

This patch should be a no-op on any 64-bit architecture or any 32-bit
architecture where alignof(long long) == 4. Of the others, it's tested
on ppc32 by myself and a very similar patch was tested on sparc32 by
Mark Fortescue, who reported the new problem.

Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to
the new sizes. Again noticed by Mark.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>

diff --git a/mm/slab.c b/mm/slab.c
index a9c4472..b344e67 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -137,6 +137,7 @@
 
 /* Shouldn't this be in a header file somewhere? */
 #define	BYTES_PER_WORD		sizeof(void *)
+#define	REDZONE_ALIGN		max(BYTES_PER_WORD, __alignof__(unsigned long long))
 
 #ifndef cache_line_size
 #define cache_line_size()	L1_CACHE_BYTES
@@ -547,7 +548,7 @@ static unsigned long long *dbg_redzone2(struct kmem_cache *cachep, void *objp)
 	if (cachep->flags & SLAB_STORE_USER)
 		return (unsigned long long *)(objp + cachep->buffer_size -
 					      sizeof(unsigned long long) -
-					      BYTES_PER_WORD);
+					      REDZONE_ALIGN);
 	return (unsigned long long *) (objp + cachep->buffer_size -
 				       sizeof(unsigned long long));
 }
@@ -2178,7 +2179,8 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 	 * above the next power of two: caches with object sizes just above a
 	 * power of two have a significant amount of internal fragmentation.
 	 */
-	if (size < 4096 || fls(size - 1) == fls(size-1 + 3 * BYTES_PER_WORD))
+	if (size < 4096 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
+						2 * sizeof(unsigned long long)))
 		flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
 	if (!(flags & SLAB_DESTROY_BY_RCU))
 		flags |= SLAB_POISON;
@@ -2219,12 +2221,20 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 	}
 
 	/*
-	 * Redzoning and user store require word alignment. Note this will be
-	 * overridden by architecture or caller mandated alignment if either
-	 * is greater than BYTES_PER_WORD.
+	 * Redzoning and user store require word alignment or possibly larger.
+	 * Note this will be overridden by architecture or caller mandated
+	 * alignment if either is greater than BYTES_PER_WORD.
 	 */
-	if (flags & SLAB_RED_ZONE || flags & SLAB_STORE_USER)
-		ralign = __alignof__(unsigned long long);
+	if (flags & SLAB_STORE_USER)
+		ralign = BYTES_PER_WORD;
+
+	if (flags & SLAB_RED_ZONE) {
+		ralign = REDZONE_ALIGN;
+		/* If redzoning, ensure that the second redzone is suitably
+		 * aligned, by adjusting the object size accordingly. */
+		size += REDZONE_ALIGN - 1;
+		size &= ~(REDZONE_ALIGN - 1);
+	}
 
 	/* 2) arch mandated alignment */
 	if (ralign < ARCH_SLAB_MINALIGN) {
@@ -2261,9 +2271,13 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 	}
 	if (flags & SLAB_STORE_USER) {
 		/* user store requires one word storage behind the end of
-		 * the real object.
+		 * the real object. But if the second red zone needs to be
+		 * aligned to 64 bits, we must allow that much space.
 		 */
-		size += BYTES_PER_WORD;
+		if (flags & SLAB_RED_ZONE)
+			size += REDZONE_ALIGN;
+		else
+			size += BYTES_PER_WORD;
 	}
 #if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
 	if (size >= malloc_sizes[INDEX_L3 + 1].cs_size


-- 
dwmw2
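
The size adjustment in the patch above can be illustrated outside the kernel. The following is a minimal sketch, not the slab code itself: it assumes a simplified layout of [8-byte redzone1][object][redzone2] and ignores the user word and any architecture or caller mandated alignment. The helper names round_up_pow2() and redzone2_offset() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round size up to the next multiple of align, using the same
 * add-then-mask idiom as the patch. align must be a power of two.
 */
static size_t round_up_pow2(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/*
 * Offset of the second redzone from the start of the buffer in the
 * simplified layout: the first redzone occupies one unsigned long long,
 * the (padded) object follows, and redzone2 comes right after it.
 */
static size_t redzone2_offset(size_t obj_size, size_t redzone_align)
{
	const size_t redzone1 = sizeof(unsigned long long);

	return redzone1 + round_up_pow2(obj_size, redzone_align);
}
```

With a 20-byte object and 8-byte redzone alignment, the unpadded layout would put redzone2 at offset 28, which is not a multiple of 8; after rounding the object up to 24 bytes it lands at offset 32.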



* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-03 16:45 [1/2] 2.6.22-rc7: known regressions Michal Piotrowski
                   ` (3 preceding siblings ...)
  2007-07-05  1:26 ` [PATCH 2.6.22 REGRESSION] Fix slab redzone alignment David Woodhouse
@ 2007-07-05  1:42 ` David Woodhouse
  2007-07-05 16:28   ` Linus Torvalds
  4 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2007-07-05  1:42 UTC (permalink / raw)
  To: Michal Piotrowski, marcel
  Cc: Linus Torvalds, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III

On Tue, 2007-07-03 at 18:45 +0200, Michal Piotrowski wrote:
> Hi all,
> 
> Here is a list of some known regressions in 2.6.22-rc7.

Oh, and here's another one for you. My Bluetooth mouse just stopped
working and hidd is deadlocked...

hidd          D 1FE27798  5940  1695      1 (NOTLB)
Call Trace:
[ef3ddb70] [00000004] 0x4 (unreliable)
[ef3ddc30] [c0008e7c] __switch_to+0x50/0x68
[ef3ddc50] [c02d5998] schedule+0x3cc/0x480
[ef3ddc80] [c0137a20] rwsem_down_failed_common+0x1c4/0x1f4
[ef3ddcb0] [c02d7454] rwsem_down_write_failed+0x28/0x40
[ef3ddce0] [c004ff60] down_write+0x50/0x64
[ef3ddd00] [f27f2068] hidp_add_connection+0x168/0x75c [hidp]
[ef3ddd40] [f27f2e44] hidp_sock_ioctl+0x140/0x414 [hidp]
[ef3ddeb0] [c024da6c] sock_ioctl+0x248/0x284
[ef3dded0] [c00ab02c] do_ioctl+0x38/0x84
[ef3ddee0] [c00ab448] vfs_ioctl+0x3d0/0x404
[ef3ddf10] [c00ab4e4] sys_ioctl+0x68/0x98

-- 
dwmw2


* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-05  1:42 ` [1/2] 2.6.22-rc7: known regressions David Woodhouse
@ 2007-07-05 16:28   ` Linus Torvalds
  2007-07-05 16:43     ` David Woodhouse
  2007-07-05 18:46     ` David Woodhouse
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2007-07-05 16:28 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III



On Wed, 4 Jul 2007, David Woodhouse wrote:
> 
> Oh, and here's another one for you. My Bluetooth mouse just stopped
> working and hidd is deadlocked...

Looks like it is stuck on hidp_session_sem.

Nothing after 2.6.21 seems to have even touched that semaphore usage, and 
in fact there's not a whole lot of changes to the hidp code at all (and 
none of them look even remotely interesting). 

So I suspect it's something lower down in the bluetooth stack, or it's a 
long-standing problem that you are somehow able to trigger more easily 
now. Is it consistent?

Can you show the traces for the _other_ processes that are in bluetooth 
functions? Because there should be other processes there, holding that 
hidp_session_sem rwsem.

[ Alternatively, there is some process that doesn't release it in an error 
  case, but that is definitely not a regression if so: the changes to 
  net/bluetooth/hidp/core.c since 2.6.21 really are trivial. ]

IOW, more info needed, I think.

			Linus

---
> hidd          D 1FE27798  5940  1695      1 (NOTLB)
> Call Trace:
> [ef3ddb70] [00000004] 0x4 (unreliable)
> [ef3ddc30] [c0008e7c] __switch_to+0x50/0x68
> [ef3ddc50] [c02d5998] schedule+0x3cc/0x480
> [ef3ddc80] [c0137a20] rwsem_down_failed_common+0x1c4/0x1f4
> [ef3ddcb0] [c02d7454] rwsem_down_write_failed+0x28/0x40
> [ef3ddce0] [c004ff60] down_write+0x50/0x64
> [ef3ddd00] [f27f2068] hidp_add_connection+0x168/0x75c [hidp]
> [ef3ddd40] [f27f2e44] hidp_sock_ioctl+0x140/0x414 [hidp]
> [ef3ddeb0] [c024da6c] sock_ioctl+0x248/0x284
> [ef3dded0] [c00ab02c] do_ioctl+0x38/0x84
> [ef3ddee0] [c00ab448] vfs_ioctl+0x3d0/0x404
> [ef3ddf10] [c00ab4e4] sys_ioctl+0x68/0x98
> 
> -- 
> dwmw2
> 


* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-05 16:28   ` Linus Torvalds
@ 2007-07-05 16:43     ` David Woodhouse
  2007-07-05 18:46     ` David Woodhouse
  1 sibling, 0 replies; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 16:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III

On Thu, 2007-07-05 at 09:28 -0700, Linus Torvalds wrote:
> 
> On Wed, 4 Jul 2007, David Woodhouse wrote:
> > 
> > Oh, and here's another one for you. My Bluetooth mouse just stopped
> > working and hidd is deadlocked...
> 
> Looks like it is stuck on hidp_session_sem.
> 
> Nothing after 2.6.21 seems to have even touched that semaphore usage, and 
> in fact there's not a whole lot of changes to the hidp code at all (and 
> none of them look even remotely interesting). 
> 
> So I suspect it's something lower down in the bluetooth stack, or it's a 
> long-standing problem that you are somehow able to trigger more easily 
> now. Is it consistent?

It happened twice before I gave up on my 2.6.22-rc7 test kernel and went
back to something earlier. I suppose I should double-check that it
wasn't my slab changes, but I really don't think that's it.

> Can you show the traces for the _other_ processes that are in bluetooth 
> functions? Because there should be other processes there, holding that 
> hidp_session_sem rwsem.

There was nothing, apart from a later 'hidd -l' which got stuck on the
same semaphore. I have an hcidump of it happening, at
http://david.woodhou.se/hidd-lockup-dump.txt -- it doesn't seem
particularly enlightening though. There's just a disconnection and
reconnect, as happens quite frequently with this mouse, and then we're
deadlocked. I'll build with hidp debugging.

-- 
dwmw2



* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-05 16:28   ` Linus Torvalds
  2007-07-05 16:43     ` David Woodhouse
@ 2007-07-05 18:46     ` David Woodhouse
  2007-07-05 19:31       ` David Woodhouse
  1 sibling, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 18:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III

On Thu, 2007-07-05 at 09:28 -0700, Linus Torvalds wrote:
> 
> On Wed, 4 Jul 2007, David Woodhouse wrote:
> > 
> > Oh, and here's another one for you. My Bluetooth mouse just stopped
> > working and hidd is deadlocked...
> 
> Looks like it is stuck on hidp_session_sem.

Oh, I suck. I failed to notice that it had oopsed earlier, in slab
debugging. I shall look at my 'obviously correct' slab patch a little
harder, now that I'm not distracted by the fireworks.

-- 
dwmw2



* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-05 18:46     ` David Woodhouse
@ 2007-07-05 19:31       ` David Woodhouse
  2007-07-05 19:51         ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2007-07-05 19:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III, greg

On Thu, 2007-07-05 at 14:46 -0400, David Woodhouse wrote:
> Oh, I suck. I failed to notice that it had oopsed earlier, in slab
> debugging. I shall look at my 'obviously correct' slab patch a little
> harder, now that I'm not distracted by the fireworks.

Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc
too, whenever we had CONFIG_SYSFS_DEPRECATED set.

 Unable to handle kernel paging request for data at address 0x6b6b6b6b
 Faulting instruction address: 0xc001870c
 Oops: Kernel access of bad area, sig: 11 [#1]
 PowerMac
 Modules linked in: radeon(U) drm(U) hidp(U) hci_usb(U) rfcomm(U) l2cap(U) bluetooth(U) ipv6(U) nls_utf8(U) hfsplus(U) dm_mirror(U) dm_mod(U) therm_adt746x(U) parport_pc(U) lp(U) parport(U) loop(U) snd_aoa_i2sbus(U) snd_powermac(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) ide_cd(U) cdrom(U) snd_seq_device(U) pmac_zilog(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd(U) soundcore(U) snd_aoa_soundbus(U) firewire_ohci(U) firewire_core(U) crc_itu_t(U) sungem(U) sungem_phy(U) bcm43xx(U) ieee80211softmac(U) ieee80211(U) ieee80211_crypt(U) ext3(U) jbd(U) mbcache(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
 NIP: c001870c LR: c0134fec CTR: c01d5078
 REGS: eed5bdc0 TRAP: 0300   Not tainted  (2.6.22-0.9.rc7.git3.fc8)
 MSR: 00009032 <EE,ME,IR,DR>  CR: 22000224  XER: 20000000
 DAR: 6b6b6b6b, DSISR: 40000000
 TASK = ef72a950[1753] 'khidpd_00000000' THREAD: eed5a000
 GPR00: 6b6b6b6b eed5be70 ef72a950 6b6b6b6b 6b6b6b6a ef18a3a0 0000001a ec8701c6 
 GPR08: 000007aa 00000014 00000000 00000005 00000000 2002163c 100d0000 00000000 
 GPR16: 00000000 7fb9a006 00000003 00000000 ee12cb08 c038cf20 ec870170 c037bf38 
 GPR24: ef6bab08 ec8701c6 0000001a 000007aa 00000001 ef6babb0 ef6babb0 000000d0 
 NIP [c001870c] strlen+0x4/0x18
 LR [c0134fec] kobject_get_path+0x34/0xc4
 Call Trace:
 [eed5be70] [c0098030] __kmalloc_track_caller+0x13c/0x164 (unreliable)
 [eed5be90] [c01d5124] class_uevent+0xac/0x1bc
 [eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460
 [eed5bf20] [c01d485c] class_device_del+0x178/0x1a0
 [eed5bf40] [c01d489c] class_device_unregister+0x18/0x30
 [eed5bf60] [c021f820] input_unregister_device+0xf4/0x130
 [eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60
 [eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp]
 [eed5bff0] [c0013e28] kernel_thread+0x44/0x60
 Instruction dump:
 4082fff4 4e800020 38a3ffff 3884ffff 8c650001 2c830000 8c040001 7c601851 
 4d860020 4182ffec 4e800020 3883ffff <8c040001> 2c000000 4082fff8 7c632050 


-- 
dwmw2



* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-05 19:31       ` David Woodhouse
@ 2007-07-05 19:51         ` Linus Torvalds
  2007-07-05 21:03           ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2007-07-05 19:51 UTC (permalink / raw)
  To: David Woodhouse, Dmitry Torokhov, Jiri Kosina
  Cc: Michal Piotrowski, marcel, Andrew Morton, LKML, reiserfs-devel,
	Vladimir V. Saveliev, Randy Dunlap, linux-ide, David Chinner,
	sparclinux, David Miller, Mikael Pettersson, Mark Fortescue,
	William Lee Irwin III, Greg KH


Looks input-related..

On Thu, 5 Jul 2007, David Woodhouse wrote:
> 
> Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc
> too, whenever we had CONFIG_SYSFS_DEPRECATED set.
> 
>  Unable to handle kernel paging request for data at address 0x6b6b6b6b

Ok, that 0x6b is obviously the kfree() poisoning, ie it looks like a 
use-after-free problem with a pointer being loaded from a structure that 
had been freed.
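
To make the poisoning argument concrete, here is a minimal userspace sketch. It assumes a 32-bit pointer, modelled as a uint32_t field, and a hypothetical struct fake_obj; only the poison value 0x6b is taken from the trace. Once every byte of a freed object is overwritten with 0x6b, any pointer later loaded from it reads back as 0x6b6b6b6b, which is the faulting address in the oops.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b	/* poison byte seen in the trace */

/* Hypothetical object with two 32-bit "pointer" fields. */
struct fake_obj {
	uint32_t name_ptr;
	uint32_t parent_ptr;
};

/* What a debugging allocator does on free: overwrite the object. */
static void poison_object(void *obj, size_t size)
{
	memset(obj, POISON_FREE, size);
}

/* A stale user of the object loads a pointer from the poisoned memory. */
static uint32_t read_stale_name_ptr(const struct fake_obj *obj)
{
	return obj->name_ptr;
}
```

Passing the value returned by read_stale_name_ptr() to strlen() in the kernel would fault at address 0x6b6b6b6b, matching the DAR value reported above.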

And the trace seems to be (ignore the unreliable one):

>  NIP [c001870c] strlen+0x4/0x18
>  LR [c0134fec] kobject_get_path+0x34/0xc4
>  Call Trace:
>  [eed5be90] [c01d5124] class_uevent+0xac/0x1bc
>  [eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460
>  [eed5bf20] [c01d485c] class_device_del+0x178/0x1a0
>  [eed5bf40] [c01d489c] class_device_unregister+0x18/0x30
>  [eed5bf60] [c021f820] input_unregister_device+0xf4/0x130
>  [eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60
>  [eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp]
>  [eed5bff0] [c0013e28] kernel_thread+0x44/0x60

Where we have a few missing functions due to inlining, ie the real 
sequence seems to be:

class_device_del ->
  kobject_uevent_env ->
    class_uevent ->
      kobject_get_path ->
	get_kobj_path_length ->
	  parent = kobj;
	  do {
	    strlen(parent->k_name /* kobject_name(parent) */);
	    parent = parent->parent;
	  } while (parent);

so either the kobj or one of its parents had already been freed when it 
was unregistered due to the disconnect.

I'm not seeing any reference counting or other protection for the device 
("input") on "hid->inputs" list. But I don't know the code. Dmitry? Jiri?
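
The parent walk sketched above can be written as a small standalone function. This is a simplified illustration, not the kernel source: struct node is a hypothetical stand-in for struct kobject, and the length accounting (one byte per '/' separator plus one for the terminating NUL) is an assumption about how such a path length would be computed. The point is that every level dereferences both ->name and ->parent, so a single freed ancestor is enough to make strlen() fault.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct kobject. */
struct node {
	const char *name;
	struct node *parent;
};

/*
 * Walk from the object up to the root, summing the name lengths.
 * Each step reads name and parent from the current node, so the whole
 * chain must still be alive for this to be safe.
 */
static size_t path_length(const struct node *n)
{
	size_t len = 1;			/* terminating NUL */
	const struct node *p = n;

	do {
		len += strlen(p->name) + 1;	/* name plus '/' */
		p = p->parent;
	} while (p);

	return len;
}
```

For a node "input0" whose parent is "devices", path_length() returns 16: 7 bytes for "/input0", 8 for "/devices", and 1 for the NUL.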

		Linus


* Re: [1/2] 2.6.22-rc7: known regressions
  2007-07-05 19:51         ` Linus Torvalds
@ 2007-07-05 21:03           ` Dmitry Torokhov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Torokhov @ 2007-07-05 21:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Woodhouse, Jiri Kosina, Michal Piotrowski, marcel,
	Andrew Morton, LKML, reiserfs-devel, Vladimir V. Saveliev,
	Randy Dunlap, linux-ide, David Chinner, sparclinux, David Miller,
	Mikael Pettersson, Mark Fortescue, William Lee Irwin III, Greg KH

On 7/5/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Looks input-related..
>
> On Thu, 5 Jul 2007, David Woodhouse wrote:
> >
> > Hm, it's not something new. It's an oops I saw occasionally in 2.6.21-rc
> > too, whenever we had CONFIG_SYSFS_DEPRECATED set.
> >
> >  Unable to handle kernel paging request for data at address 0x6b6b6b6b
>
> Ok, that 0x6b is obviously the kfree() poisoning, ie it looks like a
> use-after-free problem with a pointer being loaded from a structure that
> had been freed.
>
> And the trace seems to be (ignore the unreliable one):
>
> >  NIP [c001870c] strlen+0x4/0x18
> >  LR [c0134fec] kobject_get_path+0x34/0xc4
> >  Call Trace:
> >  [eed5be90] [c01d5124] class_uevent+0xac/0x1bc
> >  [eed5bed0] [c01357e4] kobject_uevent_env+0x23c/0x460
> >  [eed5bf20] [c01d485c] class_device_del+0x178/0x1a0
> >  [eed5bf40] [c01d489c] class_device_unregister+0x18/0x30
> >  [eed5bf60] [c021f820] input_unregister_device+0xf4/0x130
> >  [eed5bf70] [c0242f4c] hidinput_disconnect+0x2c/0x60
> >  [eed5bf90] [f27f2bac] hidp_session+0x550/0x584 [hidp]
> >  [eed5bff0] [c0013e28] kernel_thread+0x44/0x60
>
> Where we have a few missing functions due to inlining, ie the real
> sequence seems to be:
>
> class_device_del ->
>  kobject_uevent_env ->
>    class_uevent ->
>      kobject_get_path ->
>        get_kobj_path_length ->
>          parent = kobj;
>          do {
>            strlen(parent->k_name /* kobject_name(parent) */);
>            parent = parent->parent;
>          } while (parent);
>
> so either the kobj or one of its parents had already been freed when it
> was unregistered due to the disconnect.
>
> I'm not seeing any reference counting or other protection for the device
> ("input") on "hid->inputs" list. But I don't know the code. Dmitry? Jiri?
>

In hidinput_connect we do:

          input_dev->dev.parent = hid->dev;

This should pin the hid object until all inputs are released. However
bluetooth does not use the driver model and does not have hid->dev set up
and so it looks like we are simply trying to unregister an input
device that is already gone... I still don't quite get how we
unregister the same device twice - it is done from a per-hid-device
thread in hidp...
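
The pinning described here can be modelled with a toy reference count. This is a sketch under stated assumptions, not the driver-model implementation: struct obj, obj_get(), obj_put() and obj_set_parent() are hypothetical names, and "freed" is just a flag so the effect can be observed. A child that records a parent takes a reference on it, so the parent outlives the child; with no parent set (as in the bluetooth case, where hid->dev is not set up) nothing is pinned.

```c
#include <stddef.h>

/* Hypothetical refcounted object; freed is a flag instead of a real free. */
struct obj {
	int refcount;
	int freed;
	struct obj *parent;
};

static void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Drop a reference; on the last put, release the object and its parent ref. */
static void obj_put(struct obj *o)
{
	if (--o->refcount == 0) {
		o->freed = 1;
		if (o->parent)
			obj_put(o->parent);
	}
}

/* Setting a parent pins it until the child itself is released. */
static void obj_set_parent(struct obj *child, struct obj *parent)
{
	if (parent)
		obj_get(parent);
	child->parent = parent;
}
```

If the owner drops its own reference to the parent first, the parent stays alive until the child's last obj_put(); without the obj_set_parent() call, the same sequence would release the parent immediately.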

-- 
Dmitry

