public inbox for linux-kernel@vger.kernel.org
* Slab corruption in 2.6.16-rc5-mm2
@ 2006-03-06  0:17 Jesper Juhl
  2006-03-06 18:25 ` Linus Torvalds
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06  0:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, markhe, Andrea Arcangeli


Just found the following in dmesg :

Slab corruption: start=f72948a0, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f7294854, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0173923>](real_lookup+0x93/0xe0)
000: 6c 69 62 6b 64 65 69 6e 69 74 5f 6b 69 6f 5f 68
010: 74 74 70 5f 63 61 63 68 65 5f 63 6c 65 61 6e 65
Next obj: start=f72948ec, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c01c7c80>](ext3_init_block_alloc_info+0x20/0x70)
000: 10 cf ce f7 00 00 00 00 00 00 00 00 00 00 00 00
010: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Slab corruption: start=f70aeab4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f70aea68, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c023d6df>](init_dev+0x5cf/0x630)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Next obj: start=f70aeb00, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0173923>](real_lookup+0x93/0xe0)
000: 6c 69 62 62 6f 6f 73 74 5f 70 72 67 5f 65 78 65
010: 63 5f 6d 6f 6e 69 74 6f 72 2d 67 63 63 2d 31 5f


Machine is running 2.6.16-rc5-mm2 :
$ uname -a
Linux dragon 2.6.16-rc5-mm2 #1 SMP PREEMPT Mon Mar 6 00:06:54 CET 2006 i686 athlon-4 i386 GNU/Linux

CPU is a dual-core Athlon X2 4400+

Kernel is 32-bit, built for i386.

The machine has 2GB of RAM.

Let me know what additional info would be useful, if any.

If patches need testing then just send them my way.


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html



* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06  0:17 Jesper Juhl
@ 2006-03-06 18:25 ` Linus Torvalds
  2006-03-06 18:43   ` Jesper Juhl
  2006-03-06 18:48   ` Mike Christie
  0 siblings, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2006-03-06 18:25 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley



On Mon, 6 Mar 2006, Jesper Juhl wrote:
> 
> Slab corruption: start=f72948a0, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Prev obj: start=f7294854, len=64
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c0173923>](real_lookup+0x93/0xe0)
> 000: 6c 69 62 6b 64 65 69 6e 69 74 5f 6b 69 6f 5f 68
> 010: 74 74 70 5f 63 61 63 68 65 5f 63 6c 65 61 6e 65
> Next obj: start=f72948ec, len=64
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c01c7c80>](ext3_init_block_alloc_info+0x20/0x70)
> 000: 10 cf ce f7 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Ok, this is interesting because the previous and the next objects look 
fine, which implies that it's not something else that overwrote it.
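
A note on reading the "Redzone:" lines in these reports: with slab debugging
enabled, each object is bracketed by a marker word whose value says whether
the object is currently allocated or free. The sketch below is illustrative
Python, not kernel code. The two constants are the RED_INACTIVE and RED_ACTIVE
values as recalled from mm/slab.c of this era, so treat them as assumptions
and check them against your own tree; redzone_state is a made-up helper name.

```python
# Redzone marker words used by the debug slab allocator.
# ASSUMPTION: values recalled from mm/slab.c around 2.6.16; verify locally.
RED_INACTIVE = 0x5A2CF071  # object is free
RED_ACTIVE = 0x170FC2A5    # object is allocated


def redzone_state(line):
    """Classify a 'Redzone: 0x.../0x...' line from a slab corruption report."""
    words = line.split(":", 1)[1].strip().split("/")
    head, tail = (int(word.rstrip("."), 16) for word in words)
    if head == tail == RED_INACTIVE:
        return "free"
    if head == tail == RED_ACTIVE:
        return "allocated"
    return "damaged"


if __name__ == "__main__":
    print(redzone_state("Redzone: 0x5a2cf071/0x5a2cf071."))  # corrupted object
    print(redzone_state("Redzone: 0x170fc2a5/0x170fc2a5."))  # its neighbours
```

In the first report the corrupted object's redzone reads as free while both
neighbours read as allocated, which fits neighbours that hold live data and
intact markers.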

Perhaps more importantly, the actual corrupted object _should_ contain the 
POISON_FREE bytes, but doesn't. In fact, what it _does_ contain is 
apparently perfectly valid SCSI request sense data (if I read it right, 
its ASC/ASCQ of 24/00 means "Invalid field in CDB").

So that "last user: sr_do_ioctl" thing actually matches what the contents 
are.
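
For readers who want to check that reading of the dump, here is a minimal
Python sketch that decodes the first row as fixed-format SCSI sense data. It
assumes the standard fixed-format layout (response code in byte 0, sense key
in the low nibble of byte 2, additional sense length in byte 7, ASC and ASCQ
in bytes 12 and 13). The lookup tables cover only the codes seen in this
thread, and decode_fixed_sense is a hypothetical helper name.

```python
# Only the values that appear in this thread; a real table is much larger.
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x24, 0x00): "Invalid field in CDB",
    (0x3A, 0x01): "Medium not present - tray closed",
}


def decode_fixed_sense(hex_row):
    """Decode one 16-byte hex row as fixed-format sense data."""
    data = bytes.fromhex(hex_row)
    if data[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F
    asc, ascq = data[12], data[13]
    return {
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:02x}/{ascq:02x}"),
        "total_length": 8 + data[7],  # header plus additional sense length
    }


if __name__ == "__main__":
    row = "70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00"
    print(decode_fixed_sense(row))
```

The additional sense length of 0x0a gives a total of 18 bytes of sense data.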

Your other small corruption:

> Slab corruption: start=f70aeab4, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Prev obj: start=f70aea68, len=64
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c023d6df>](init_dev+0x5cf/0x630)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Next obj: start=f70aeb00, len=64
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c0173923>](real_lookup+0x93/0xe0)
> 000: 6c 69 62 62 6f 6f 73 74 5f 70 72 67 5f 65 78 65
> 010: 63 5f 6d 6f 6e 69 74 6f 72 2d 67 63 63 2d 31 5f

is exactly the same thing. The objects around it look ok. The object that 
should be poisoned again looks like it contains normal request-sense data 
(this time 3a/01: "Medium not present - tray closed", if I read it right).

So it really looks like in this case, the sr_do_ioctl() function free'd 
the sense key object, but that the sense data was actually written _after_ 
the object was free'd.
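
The sequence suspected here can be modelled in a few lines of Python. This is
a toy model, not kernel code: it assumes the debug allocator fills a freed
object with 0x6b and ends it with 0xa5 (the POISON_FREE and POISON_END values
as recalled from the kernel sources), and the names poison_fill and
damaged_offsets are made up for the illustration.

```python
POISON_FREE = 0x6B  # ASSUMPTION: fill byte for freed objects
POISON_END = 0xA5   # ASSUMPTION: last byte of a poisoned object


def poison_fill(size):
    """Return what a freed object of the given size should look like."""
    return bytearray([POISON_FREE] * (size - 1) + [POISON_END])


def damaged_offsets(obj):
    """List offsets where a supposedly free object no longer holds poison."""
    expected = poison_fill(len(obj))
    return [i for i, (got, want) in enumerate(zip(obj, expected)) if got != want]


# 1. The caller allocates a 64-byte buffer and then frees it,
#    so the allocator poisons the whole object.
obj = poison_fill(64)

# 2. The command completes late and sense data lands in the freed buffer.
sense = bytes.fromhex("70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00")
obj[: len(sense)] = sense

# 3. The next poison check on this object finds the overwritten bytes.
damaged = damaged_offsets(obj)
print(f"{len(damaged)} damaged bytes starting at offset {damaged[0]}")
```

A write that arrives before the free would be wiped by the poison fill, so
finding sense data in a free object points at a write that came afterwards.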

That in turn would imply that scsi_execute() returned before the SCSI 
command had actually completed. Now, scsi_execute() just calls to the 
generic block device layer, which in turn just submits it, and then waits 
for its completion, so if it returns too early, that means that it got 
_completed_ too early by the driver.

Alternatively, it got re-tried. The reason I mention that is that we have 
commit 17e01f216b611fc46956dcd9063aec4de75991e3, which changes 
scsi_execute() to add retries.

I wonder if something does a "complete(rq->waiting)" while the thing is 
still retrying? In general, I do not believe that we should retry special 
commands that have been initiated by a user, we should return the error. 
But I haven't thought this through.

Anyway, Jesper, I see two potential reasons for this bug:

 - total and utter slab confusion (the slab layer returned the same slab 
   allocation twice to two different callers). I consider this pretty 
   unlikely, because it's such a _major_ failure of the slab code, and the 
   slab code hasn't changed that much, but I mention it just in case. 

 - SCSI layer breakage. It might well be the low-level driver completing a 
   request too early, or it might be the re-trying. If it's the re-trying, 
   you could try just reverting that commit I pointed to (ie if you're a 
   git user, just do "git revert 17e01f21", otherwise you'd need to look 
   it up from gitweb and un-apply the patch)

Regardless, Jesper, it would be great to hear _what_ strange CDROM device 
you have that would be implicated in sr_ioctl.c - is it USB, SATA or something 
else?

James, Mike, can you double-check the retries? In particular, it's _wrong_ 
to retry after you've already marked a command completed with 
"complete(rq->waiting)", so if that happens somewhere, things are really 
broken.

			Linus


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 18:25 ` Linus Torvalds
@ 2006-03-06 18:43   ` Jesper Juhl
  2006-03-06 19:32     ` Linus Torvalds
  2006-03-06 18:48   ` Mike Christie
  1 sibling, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 18:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jesper Juhl

On Monday 06 March 2006 19:25, Linus Torvalds wrote:
> 
<...snip...>
> Anyway, Jesper, I see two potential reasons for this bug:
> 
>  - total and utter slab confusion (the slab layer returned the same slab 
>    allocation twice to two different callers). I consider this pretty 
>    unlikely, because it's such a _major_ failure of the slab code, and the 
>    slab code hasn't changed that much, but I mention it just in case. 
> 
>  - SCSI layer breakage. It might well be the low-level driver completing a 
>    request too early, or it might be the re-trying. If it's the re-trying, 
>    you could try just reverting that commit I pointed to (ie if you're a 
>    git user, just do "git revert 17e01f21", otherwise you'd need to look 
>    it up from gitweb and un-apply the patch)
> 
Not a git user (I need to become one but haven't found the time to read up 
on it yet), but no problem, I'll dig out the patch and try reverting it.
Luckily this seems to be repeatable on every boot: I find it in the logs 
right after logging in, launching a shell on my KDE desktop and running 
dmesg. I'll do a few more reboots to make sure it *really* is reproducible 
before reverting the patch, so we can be sure whether it fixes the problem 
or not.

Btw, the messages turn out slightly different on each boot, here are the 
ones from this current boot of my box:

Slab corruption: start=f72b6b98, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f72b6b4c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813e6>](free_fdtable_rcu+0x66/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f72b6be4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<00000000>](_stext+0x3feffd68/0x8)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=f72b6b98, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f72b6b4c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813e6>](free_fdtable_rcu+0x66/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f72b6be4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<00000000>](_stext+0x3feffd68/0x8)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=f72b6b98, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01d3769>](ext3_clear_inode+0x29/0x40)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Prev obj: start=f72b6b4c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813e6>](free_fdtable_rcu+0x66/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f72b6be4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<00000000>](_stext+0x3feffd68/0x8)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

Would gathering more of these help you out?
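
One way to read a report like the third one above is to list which bytes of
the freed object no longer hold the free-poison value. The Python sketch below
does that for the two rows printed for that object. It assumes the poison byte
is 0x6b (as the neighbouring objects suggest) and that each row is "offset:
sixteen hex bytes"; overwritten_offsets is a hypothetical helper name.

```python
POISON_FREE = 0x6B  # ASSUMPTION: fill byte the debug slab uses for free objects


def overwritten_offsets(rows):
    """Return offsets in a slab dump whose bytes differ from the poison byte."""
    offsets = []
    for row in rows:
        base, _, payload = row.partition(":")
        start = int(base, 16)
        for i, byte in enumerate(bytes.fromhex(payload)):
            if byte != POISON_FREE:
                offsets.append(start + i)
    return offsets


rows = [
    "000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00",
    "010: 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b",
]
hit = overwritten_offsets(rows)
print(f"{len(hit)} bytes overwritten, offsets {hit[0]:#x}..{hit[-1]:#x}")
```

For these rows exactly the first 18 bytes differ from the poison value, which
matches the size of fixed-format sense data with an additional length of 0x0a
(8 + 10 bytes). Only the 32 bytes shown in the report are examined.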


> Regardless, Jesper, it would be great to hear _what_ strange CDROM device 
> you have that would be implicated in sr_ioctl.c - is it USB, SATA or something 
> else?
> 
I have no USB, SATA or similar devices in the box, only a floppy drive, a 
SCSI harddisk, a SCSI CD writer and a SCSI DVD-ROM.
Here are some details : 


$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: PIONEER  Model: DVD-ROM DVD-305  Rev: 1.03
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: PLEXTOR  Model: CD-R   PX-W1210S Rev: 1.01
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: IBM      Model: DDYS-T36950N     Rev: S96H
  Type:   Direct-Access                    ANSI SCSI revision: 03


From dmesg:

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
        <Adaptec 29160N Ultra160 SCSI adapter>
        aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

  Vendor: PIONEER   Model: DVD-ROM DVD-305   Rev: 1.03
  Type:   CD-ROM                             ANSI SCSI revision: 02
 target0:0:4: Beginning Domain Validation
 target0:0:4: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 16)
 target0:0:4: Domain Validation skipping write tests
 target0:0:4: Ending Domain Validation
  Vendor: PLEXTOR   Model: CD-R   PX-W1210S  Rev: 1.01
  Type:   CD-ROM                             ANSI SCSI revision: 02
 target0:0:5: Beginning Domain Validation
 target0:0:5: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 16)
 target0:0:5: Domain Validation skipping write tests
 target0:0:5: Ending Domain Validation
  Vendor: IBM       Model: DDYS-T36950N      Rev: S96H
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi0:A:6:0: Tagged Queuing enabled.  Depth 200
 target0:0:6: Beginning Domain Validation
 target0:0:6: wide asynchronous
 target0:0:6: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
 target0:0:6: Ending Domain Validation
SCSI device sda: 71687340 512-byte hdwr sectors (36704 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 00 08
SCSI device sda: drive cache: write back
SCSI device sda: 71687340 512-byte hdwr sectors (36704 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 00 08
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4
sd 0:0:6:0: Attached scsi disk sda
sr0: scsi3-mmc drive: 16x/40x cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 0:0:4:0: Attached scsi CD-ROM sr0
sr1: scsi3-mmc drive: 32x/32x writer cd/rw xa/form2 cdda tray
sr 0:0:5:0: Attached scsi CD-ROM sr1
sr 0:0:4:0: Attached scsi generic sg0 type 5
sr 0:0:5:0: Attached scsi generic sg1 type 5
sd 0:0:6:0: Attached scsi generic sg2 type 0


# lspci -vvx
00:00.0 Host bridge: ALi Corporation M1695 K8 Northbridge [PCI Express and HyperTransport]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Capabilities: [40] #08 [0060]
        Capabilities: [5c] #08 [a800]
        Capabilities: [68] #08 [9000]
        Capabilities: [74] #08 [8000]
        Capabilities: [7c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
                Address: 00000000fee00000  Data: 0000
00: b9 10 95 16 07 00 10 00 00 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00

00:01.0 PCI bridge: ALi Corporation: Unknown device 524b (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, cache line size 10
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Memory behind bridge: ff200000-ff2fffff
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
                Address: 00000000fee00000  Data: 0000
        Capabilities: [58] #10 [0141]
        Capabilities: [7c] #08 [a800]
        Capabilities: [88] #08 [8825]
00: b9 10 4b 52 06 01 10 00 00 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 f0 00 00 00
20: 20 ff 20 ff f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 03 00

00:02.0 PCI bridge: ALi Corporation: Unknown device 524c (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, cache line size 10
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Memory behind bridge: ff300000-ff3fffff
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
                Address: 00000000fee00000  Data: 0000
        Capabilities: [58] #10 [0141]
        Capabilities: [7c] #08 [a800]
        Capabilities: [88] #08 [8825]
00: b9 10 4c 52 06 01 10 00 00 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 02 02 00 f0 00 00 00
20: 30 ff 30 ff f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 03 00

00:04.0 Host bridge: ALi Corporation M1689 K8 Northbridge [Super K8 Single Chip]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Region 0: Memory at dc000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [40] #08 [0024]
        Capabilities: [60] #08 [8038]
        Capabilities: [80] AGP version 3.0
                Status: RQ=28 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3- Rate=x1,x2,x4
                Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>
00: b9 10 89 16 06 01 10 00 00 00 00 06 00 00 00 00
10: 08 00 00 dc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00

00:05.0 PCI bridge: ALi Corporation AGP8X Controller (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=64
        Memory behind bridge: ff400000-ff4fffff
        Prefetchable memory behind bridge: c7f00000-d7efffff
        BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
00: b9 10 46 52 07 01 20 00 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 03 03 40 f0 00 20 22
20: 40 ff 40 ff f0 c7 e0 d7 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b 00

00:06.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge (prog-if 01 [Subtractive decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: ff500000-ff5fffff
        Prefetchable memory behind bridge: 88000000-880fffff
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
00: b9 10 49 52 07 01 00 00 00 01 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 04 04 20 d0 d0 00 22
20: 50 ff 50 ff 00 88 00 88 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00

00:07.0 ISA bridge: ALi Corporation M1563 HyperTransport South Bridge (rev 70)
        Subsystem: ASRock Incorporation: Unknown device 1563
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (250ns min, 6000ns max)
00: b9 10 63 15 0f 00 00 02 70 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 63 15
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 18

00:07.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
        Subsystem: ASRock Incorporation: Unknown device 7101
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00: b9 10 01 71 00 00 00 02 00 00 80 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 01 71
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:11.0 Ethernet controller: ALi Corporation M5263 Ethernet Controller (rev 40)
        Subsystem: ASRock Incorporation: Unknown device 5263
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (5000ns min, 10000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at e800 [size=256]
        Region 1: Memory at ff6ffc00 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: b9 10 63 52 07 01 10 02 40 00 00 02 08 20 00 00
10: 01 e8 00 00 00 fc 6f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 63 52
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 14 28

00:12.0 IDE interface: ALi Corporation M5229 IDE (rev c7) (prog-if 8a [Master SecP PriP])
        Subsystem: ASRock Incorporation: Unknown device 5229
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Interrupt: pin A routed to IRQ 0
        Region 0: I/O ports at <ignored>
        Region 1: I/O ports at <ignored>
        Region 2: I/O ports at <ignored>
        Region 3: I/O ports at <ignored>
        Region 4: I/O ports at ff00 [size=16]
00: b9 10 29 52 05 00 a0 02 c7 8a 01 01 00 20 00 00
10: f1 01 00 00 f5 03 00 00 71 01 00 00 75 03 00 00
20: 01 ff 00 00 00 00 00 00 00 00 00 00 49 18 29 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00

00:13.0 USB Controller: ALi Corporation USB 1.1 Controller (rev 03) (prog-if 10 [OHCI])
        Subsystem: ASRock Incorporation: Unknown device 5237
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (20000ns max), cache line size 10
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at ff6fe000 (32-bit, non-prefetchable) [size=4K]
00: b9 10 37 52 17 01 a8 02 03 10 03 0c 10 20 80 00
10: 00 e0 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 37 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 50

00:13.1 USB Controller: ALi Corporation USB 1.1 Controller (rev 03) (prog-if 10 [OHCI])
        Subsystem: ASRock Incorporation: Unknown device 5237
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (20000ns max), cache line size 10
        Interrupt: pin B routed to IRQ 3
        Region 0: Memory at ff6fd000 (32-bit, non-prefetchable) [size=4K]
00: b9 10 37 52 17 01 a8 02 03 10 03 0c 10 20 80 00
10: 00 d0 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 37 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 03 02 00 50

00:13.2 USB Controller: ALi Corporation USB 1.1 Controller (rev 03) (prog-if 10 [OHCI])
        Subsystem: ASRock Incorporation: Unknown device 5237
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (20000ns max), cache line size 10
        Interrupt: pin C routed to IRQ 11
        Region 0: Memory at ff6fc000 (32-bit, non-prefetchable) [size=4K]
00: b9 10 37 52 17 01 a8 02 03 10 03 0c 10 20 80 00
10: 00 c0 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 37 52
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 03 00 50

00:13.3 USB Controller: ALi Corporation USB 2.0 Controller (rev 01) (prog-if 20 [EHCI])
        Subsystem: ASRock Incorporation: Unknown device 5239
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (4000ns min, 8000ns max), cache line size 10
        Interrupt: pin D routed to IRQ 5
        Region 0: Memory at ff6ff800 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] #0a [2090]
00: b9 10 39 52 16 01 b0 02 01 20 03 0c 10 20 80 00
10: 00 f8 6f ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 39 52
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 04 10 20

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Capabilities: [80] #08 [2101]
00: 22 10 00 11 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00: 22 10 02 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00: 22 10 03 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA Parhelia AGP (rev 03) (prog-if 00 [VGA])
        Subsystem: Matrox Graphics, Inc. Parhelia 128Mb
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (4000ns min, 8000ns max), cache line size 10
        Interrupt: pin A routed to IRQ 5
        Region 0: Memory at c8000000 (32-bit, prefetchable) [size=128M]
        Region 1: Memory at ff4fe000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at ff4c0000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [f0] AGP version 2.0
                Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
                Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>
00: 2b 10 27 05 07 00 b0 02 03 00 00 03 10 20 00 00
10: 08 00 00 c8 00 e0 4f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 2b 10 40 08
30: 00 00 4c ff dc 00 00 00 00 00 00 00 05 01 10 20

04:05.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
        Subsystem: Creative Labs SBLive! 5.1 eMicro 28028
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (500ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 20
        Region 0: I/O ports at d880 [size=32]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 02 11 02 00 05 01 90 02 0a 00 01 04 00 20 80 00
10: 81 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 67 80
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 02 14

04:05.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
        Subsystem: Creative Labs Gameport Joystick
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Region 0: I/O ports at dc00 [size=8]
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 02 11 02 70 05 01 90 02 0a 00 80 09 00 20 80 00
10: 01 dc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 11 20 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 00 00 00 00

04:06.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160N Ultra160 SCSI Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (10000ns min, 6250ns max), cache line size 10
        Interrupt: pin A routed to IRQ 19
        BIST result: 00
        Region 0: I/O ports at d400 [disabled] [size=256]
        Region 1: Memory at ff5ff000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at 88000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 05 90 80 00 16 01 b0 02 02 00 00 01 10 20 00 80
10: 01 d4 00 00 04 f0 5f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 a0 62
30: 00 00 5c ff dc 00 00 00 00 00 00 00 03 01 28 19

04:07.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 42)
        Subsystem: D-Link System Inc DFE-530TX rev B
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (750ns min, 2000ns max), cache line size 10
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at d000 [size=256]
        Region 1: Memory at ff5fec00 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at 88020000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 65 30 17 01 10 02 42 00 00 02 10 20 00 00
10: 01 d0 00 00 00 ec 5f ff 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 11 01 14
30: 00 00 ff ff 40 00 00 00 00 00 00 00 02 01 03 08



/Jesper



* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 18:25 ` Linus Torvalds
  2006-03-06 18:43   ` Jesper Juhl
@ 2006-03-06 18:48   ` Mike Christie
  2006-03-06 18:49     ` Mike Christie
  1 sibling, 1 reply; 53+ messages in thread
From: Mike Christie @ 2006-03-06 18:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jesper Juhl, Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, James Bottomley

Linus Torvalds wrote:
> 
> James, Mike, can you double-check the retries? In particular, it's _wrong_ 
> to retry after you've already marked a command completed with 
> "complete(rq->waiting)", so if that happens somewhere, things are really 
> broken.
> 

I am looking into it. I think it has something to do with the request 
getting completed too early or maybe something crazy like twice. This 
looks like a similar problem that was reported to linux-scsi where for 
some tape setup the request's bio gets freed twice.


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 18:48   ` Mike Christie
@ 2006-03-06 18:49     ` Mike Christie
  0 siblings, 0 replies; 53+ messages in thread
From: Mike Christie @ 2006-03-06 18:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jesper Juhl, Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, James Bottomley

Mike Christie wrote:
> Linus Torvalds wrote:
> 
>>
>> James, Mike, can you double-check the retries? In particular, it's 
>> _wrong_ to retry after you've already marked a command completed with 
>> "complete(rq->waiting)", so if that happens somewhere, things are 
>> really broken.
>>
> 
> I am looking into it. I think it has something to do with the request 
> getting completed too early or maybe something crazy like twice. This 
> looks like a similar problem that was reported to linux-scsi

Oh yeah here is that thread

http://marc.theaimsgroup.com/?l=linux-scsi&m=114127615918030&w=2


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 18:43   ` Jesper Juhl
@ 2006-03-06 19:32     ` Linus Torvalds
  2006-03-06 19:51       ` Jesper Juhl
  2006-03-06 20:06       ` Linus Torvalds
  0 siblings, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2006-03-06 19:32 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2986 bytes --]



On Mon, 6 Mar 2006, Jesper Juhl wrote:
>
> Not a git user (I need to become one but haven't found the time to read up 
> on it yet), but no problem, I'll dig out the patch and try reverting it.

It's attached here.

NOTE! I'm not at all sure it's the re-try logic. It could be something 
else. Anything that completes the request before it's actually totally 
done - or possibly re-uses the sense data for something else would be 
wrong and buggy.

> Btw, the messages turn out slightly different on each boot, here are the 
> ones from this current boot of my box:
> 
> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Ok, same deal. "Medium not present - tray closed" sense data.

> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Hmm. Totally empty sense data? Strange.

> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c01d3769>](ext3_clear_inode+0x29/0x40)
> 000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
> 010: 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

This is different. But it looks similar. It looks like the thing was 
actually re-allocated for something else (posix acl data?) but then 
overwritten. However, the overwritten data does look like SCSI sense 
information again ("Invalid field in cdb"), so I think it's the same 
thing despite the fact that it had gotten re-allocated for something else.
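
The readings above ("Medium not present - tray closed", "Invalid field in cdb") follow from the fixed-format SCSI sense layout: byte 0 is the response code, the low nibble of byte 2 is the sense key, and bytes 12 and 13 are the additional sense code and its qualifier. A minimal sketch of that decoding, not taken from the kernel; the struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of fixed-format SCSI sense data (response code 0x70 or 0x71). */
struct sense_summary {
	uint8_t response_code;	/* byte 0, low 7 bits */
	uint8_t sense_key;	/* byte 2, low 4 bits */
	uint8_t asc;		/* byte 12: additional sense code */
	uint8_t ascq;		/* byte 13: additional sense code qualifier */
};

/* Returns 0 on success, -1 if the buffer is too short or not fixed format. */
int decode_fixed_sense(const uint8_t *buf, size_t len,
		       struct sense_summary *out)
{
	if (len < 14)
		return -1;

	out->response_code = buf[0] & 0x7f;
	if (out->response_code != 0x70 && out->response_code != 0x71)
		return -1;

	out->sense_key = buf[2] & 0x0f;
	out->asc = buf[12];
	out->ascq = buf[13];
	return 0;
}
```

Fed the dumps in this thread, "70 00 02 ... 3a 01" decodes to sense key 0x2 (NOT READY) with ASC/ASCQ 3a/01, and "70 00 05 ... 24 00" to sense key 0x5 (ILLEGAL REQUEST) with ASC/ASCQ 24/00. The all-zero dump is rejected because byte 0 is not a fixed-format response code.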

> Would gathering more of these help you out?

It's always interesting when trying to find the pattern, but I think the 
pattern is already pretty clear. sr_do_ioctl() seems to be the thing, and 
sense data is written too late.

> I have no USB, SATA or similar devices in the box, only a floppy drive, a 
> SCSI harddisk, a SCSI CD writer and a SCSI DVD-ROM.

Well, the fact that you have a SCSI CD-writer and a SCSI DVD-ROM explains 
the thing, so that's all good.

> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
>         <Adaptec 29160N Ultra160 SCSI adapter>
>         aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

So it's either an aic7xxx bug, or it's generic SCSI.

Considering that we've had other slab corruption issues (the reason I was 
looking closely at yours), generic SCSI isn't out of the question.

If you were a git user, doing a bisection run would be useful since you 
seem to be able to recreate it at will. Oh, well. Testing that one patch 
would still help.

		Linus

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2046 bytes --]

diff-tree 17e01f216b611fc46956dcd9063aec4de75991e3 (from 6e68af666f5336254b5715dca591026b7324499a)
Author: Mike Christie <michaelc@cs.wisc.edu>
Date:   Fri Nov 11 05:31:37 2005 -0600

    [SCSI] add retries field to request for REQ_BLOCK_PC use
    
    For tape we need to control the retries. This patch adds a retries
    counter on the request for REQ_BLOCK_PC commands originating from
    scsi_execute* to use. REQ_BLOCK_PC commands coming from the block
    layer SG_IO path continue to use the retries set in the ULD init_command.
    (scsi_execute* does not set the gendisk so we do not execute
    the init_command in that path).
    
    Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
    Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index eb0cfbf..365843a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -259,6 +259,7 @@ int scsi_execute(struct scsi_device *sde
 	memcpy(req->cmd, cmd, req->cmd_len);
 	req->sense = sense;
 	req->sense_len = 0;
+	req->retries = retries;
 	req->timeout = timeout;
 	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
 
@@ -472,6 +473,7 @@ int scsi_execute_async(struct scsi_devic
 	req->sense = sioc->sense;
 	req->sense_len = 0;
 	req->timeout = timeout;
+	req->retries = retries;
 	req->flags |= REQ_BLOCK_PC | REQ_QUIET;
 	req->end_io_data = sioc;
 
@@ -1393,7 +1395,7 @@ static int scsi_prep_fn(struct request_q
 				cmd->sc_data_direction = DMA_NONE;
 			
 			cmd->transfersize = req->data_len;
-			cmd->allowed = 3;
+			cmd->allowed = req->retries;
 			cmd->timeout_per_command = req->timeout;
 			cmd->done = scsi_generic_done;
 		}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9a68716..509e9a0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -184,6 +184,7 @@ struct request {
 	void *sense;
 
 	unsigned int timeout;
+	int retries;
 
 	/*
 	 * For Power Management requests


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 19:32     ` Linus Torvalds
@ 2006-03-06 19:51       ` Jesper Juhl
  2006-03-06 19:58         ` Jesper Juhl
  2006-03-06 20:06       ` Linus Torvalds
  1 sibling, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 19:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On Monday 06 March 2006 20:32, Linus Torvalds wrote:
> 
> On Mon, 6 Mar 2006, Jesper Juhl wrote:
> >
> > Not a git user (I need to become one but haven't found the time to read up 
> > on it yet), but no problem, I'll dig out the patch and try reverting it.
> 
> It's attached here.
> 
Thanks.


> NOTE! I'm not at all sure it's the re-try logic. It could be something 
> else. Anything that completes the request before it's actually totally 
> done - or possibly re-uses the sense data for something else would be 
> wrong and buggy.
> 
Ohh well, let's work on the assumption that it is the re-try logic first, 
then try something else if it turns out it isn't. I have no problem testing
a bunch of patches if needed.

<...>
> This is different. But it looks similar. It looks like the thing was 
> actually re-allocated for something else (posix acl data?) but then 

I doubt it's POSIX ACL data : 

juhl@dragon:~/download/kernel/linux-2.6.16-rc5-mm2$ grep -i ACL .config
# CONFIG_FS_POSIX_ACL is not set


> overwritten. However, the overwritten data does look like SCSI sense 
> information again ("Invalid field in cdb"), so I think it's the same 
> thing despite the fact that it had gotten re-allocated for something else.
> 
> > Would gathering more of these help you out?
> 
> It's always interesting when trying to find the pattern, but I think the 
> pattern is already pretty clear. sr_do_ioctl() seems to be the thing, and 
> sense data is written too late.
> 
Ok, it's reproducible on demand (at least I've now reproduced it on 6 more
boots), so if you need any more just let me know and I'll gather a few.


> > I have no USB, SATA or similar devices in the box, only a floppy drive, a 
> > SCSI harddisk, a SCSI CD writer and a SCSI DVD-ROM.
> 
> Well, the fact that you have a SCSI CD-writer and a SCSI DVD-ROM explains 
> the thing, so that's all good.
> 
> > scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
> >         <Adaptec 29160N Ultra160 SCSI adapter>
> >         aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> 
> So it's either an aic7xxx bug, or it's generic SCSI.
> 
> Considering that we've had other slab corruption issues (the reason I was 
> looking closely at yours), generic SCSI isn't out of the question.
> 
> If you were a git user, doing a bisection run would be useful since you 

Well, now is probably as good a time as any for becoming a git user, 
tracking my own patches as individual plain-text files is getting 
unmanageable and as you say, bisection would be useful to be able to do.
I'll dig up some git docs and start reading.


> seem to be able to recreate it at will. Oh, well. Testing that one patch 
> would still help.
> 
Hmm, that patch does not apply to the 2.6.16-rc5-mm2 kernel : 

patching file drivers/scsi/scsi_lib.c
Hunk #1 succeeded at 260 (offset 1 line).
Hunk #2 FAILED at 473.
Hunk #3 FAILED at 1394.
2 out of 3 hunks FAILED -- saving rejects to file drivers/scsi/scsi_lib.c.rej
patching file include/linux/blkdev.h
Hunk #1 succeeded at 190 (offset 6 lines).

I'll go see if the problem also exists in mainline - will report on that 
shortly.


 /Jesper




* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 19:51       ` Jesper Juhl
@ 2006-03-06 19:58         ` Jesper Juhl
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 19:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On 3/6/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On Monday 06 March 2006 20:32, Linus Torvalds wrote:
> >
> > seem to be able to recreate it at will. Oh, well. Testing that one patch
> > would still help.
> >
> Hmm, that patch does not apply to the 2.6.16-rc5-mm2 kernel :
>
I fixed it up by hand.
Building a new kernel at the moment - results in a short while.

>
> I'll go see if the problem also exists in mainline - will report on that
> shortly.
>
I'll still do this. Just downloading 2.6.16-rc5-git8 as we speak.


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 19:32     ` Linus Torvalds
  2006-03-06 19:51       ` Jesper Juhl
@ 2006-03-06 20:06       ` Linus Torvalds
  2006-03-06 20:24         ` Jesper Juhl
  2006-03-06 20:36         ` Jesper Juhl
  1 sibling, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2006-03-06 20:06 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley



On Mon, 6 Mar 2006, Linus Torvalds wrote:
> 
> So it's either an aic7xxx bug, or it's generic SCSI.
> 
> Considering that we've had other slab corruption issues (the reason I was 
> looking closely at yours), generic SCSI isn't out of the question.
> 
> If you were a git user, doing a bisection run would be useful since you 
> seem to be able to recreate it at will. Oh, well. Testing that one patch 
> would still help.

Hmm.. This appended patch may or may not help.

It overwrites the SCSI command "req" pointer when the request has been 
done. The request cannot be used afterwards, so anybody accessing it would 
be a bug. I think.
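
The idea behind the poisoning is that a stale pointer with a recognizable bogus value is likely to fault loudly on the next dereference instead of silently touching freed memory. A minimal userspace sketch of the same pattern; the types and helpers here are hypothetical stand-ins, not the kernel's, and only the 0x5a5a5a5a value is taken from the patch below:

```c
#include <stdint.h>
#include <stdlib.h>

/* Same poison value as the patch; everything else is illustrative. */
#define REQ_POISON ((void *)(uintptr_t)0x5a5a5a5a)

struct fake_request { int errors; };		/* hypothetical stand-in */
struct fake_cmd { struct fake_request *request; };

/* Finish a command: release the request, then poison the stale pointer. */
void fake_end_request(struct fake_cmd *cmd)
{
	free(cmd->request);
	cmd->request = REQ_POISON;
}

/* Returns nonzero if the command still points at a live request. */
int fake_cmd_has_request(const struct fake_cmd *cmd)
{
	return cmd->request != NULL && cmd->request != REQ_POISON;
}
```

Any code path that touches cmd->request after fake_end_request() then shows up with 0x5a5a5a5a in the fault address or register dump, which makes the late user easy to identify.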

HOWEVER. I noticed something else strange. Your slab corruption report 
says

	Slab corruption: start=f72948a0, len=64
	Redzone: 0x5a2cf071/0x5a2cf071.
	Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
	...

and the scary thing is that "len=64". 

The thing is, SCSI uses "SCSI_SENSE_BUFFERSIZE" to determine the maximum 
sense size to copy, and what do we have, if not

	include/scsi/scsi_cmnd.h:#define SCSI_SENSE_BUFFERSIZE  96

ie a 64-byte buffer is simply TOO DAMN SMALL!

Now, the thing is, the 64 comes from "sizeof(struct request_sense)", which 
is what "struct packet_command *" uses. We can change that sizeof() to 
just use SCSI_SENSE_BUFFERSIZE, but that still makes me worry about 
somebody else having allocated a "packet_command->sense" using just the 
same 64-byte "struct request_sense".

Can a SCSI sense-buffer really be 96? If so, why is "struct request_sense" 
the wrong size? This looks like a really nasty bug.
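
To make the size mismatch concrete: if the producer may hand over up to 96 bytes and the destination is only 64, the copy has to be clamped to the destination's real size. A minimal sketch of such a bounded copy, assuming a hypothetical helper name; this is not the actual SCSI code path:

```c
#include <stddef.h>
#include <string.h>

#define SENSE_SRC_SIZE 96	/* SCSI_SENSE_BUFFERSIZE, per the define quoted above */

/*
 * Copy sense data into a caller-supplied buffer without writing past it.
 * dst_size is the real size of the destination (64 in the report above).
 * Returns the number of bytes copied.
 */
size_t copy_sense_bounded(void *dst, size_t dst_size,
			  const unsigned char src[SENSE_SRC_SIZE])
{
	size_t n = dst_size < SENSE_SRC_SIZE ? dst_size : SENSE_SRC_SIZE;

	memcpy(dst, src, n);
	return n;
}
```

With a 64-byte destination this copies 64 bytes and leaves the following object alone; an unconditional 96-byte copy would run 32 bytes into whatever the allocator placed next.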

		Linus

----
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 701a328..296ac8e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -788,6 +788,9 @@ static struct scsi_cmnd *scsi_end_reques
 		}
 	}
 
+	/* Poison the request pointer: it is done and no longer exists after this */
+	cmd->request = (void *) 0x5a5a5a5a;
+
 	add_disk_randomness(req->rq_disk);
 
 	spin_lock_irqsave(q->queue_lock, flags);


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:06       ` Linus Torvalds
@ 2006-03-06 20:24         ` Jesper Juhl
  2006-03-06 20:30           ` Jens Axboe
  2006-03-06 20:36         ` Jesper Juhl
  1 sibling, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 20:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On Monday 06 March 2006 21:06, Linus Torvalds wrote:
> 
> On Mon, 6 Mar 2006, Linus Torvalds wrote:
> > 
> > So it's either an aic7xxx bug, or it's generic SCSI.
> > 
> > Considering that we've had other slab corruption issues (the reason I was 
> > looking closely at yours), generic SCSI isn't out of the question.
> > 
> > If you were a git user, doing a bisection run would be useful since you 
> > seem to be able to recreate it at will. Oh, well. Testing that one patch 
> > would still help.
> 

Since the patch you sent me didn't apply cleanly to the mm kernel I made the
changes by hand. This is what I ended up with (should be the same end result
as what you intended as far as I can see) :

--- linux-2.6.16-rc5-mm2-orig/drivers/scsi/scsi_lib.c	2006-03-05 23:43:56.000000000 +0100
+++ linux-2.6.16-rc5-mm2/drivers/scsi/scsi_lib.c	2006-03-06 21:13:53.000000000 +0100
@@ -260,7 +260,6 @@ int scsi_execute(struct scsi_device *sde
 	memcpy(req->cmd, cmd, req->cmd_len);
 	req->sense = sense;
 	req->sense_len = 0;
-	req->retries = retries;
 	req->timeout = timeout;
 	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
 
@@ -478,7 +477,6 @@ int scsi_execute_async(struct scsi_devic
 	req->sense = sioc->sense;
 	req->sense_len = 0;
 	req->timeout = timeout;
-	req->retries = retries;
 	req->end_io_data = sioc;
 
 	sioc->data = privdata;
@@ -1240,7 +1238,7 @@ static void scsi_setup_blk_pc_cmnd(struc
 		cmd->sc_data_direction = DMA_FROM_DEVICE;
 	
 	cmd->transfersize = req->data_len;
-	cmd->allowed = req->retries;
+	cmd->allowed = 3;
 	cmd->timeout_per_command = req->timeout;
 	cmd->done = scsi_blk_pc_done;
 }
--- linux-2.6.16-rc5-mm2-orig/block/scsi_ioctl.c	2006-03-05 23:43:41.000000000 +0100
+++ linux-2.6.16-rc5-mm2/block/scsi_ioctl.c	2006-03-06 21:16:19.000000000 +0100
@@ -314,8 +314,6 @@ static int sg_io(struct file *file, requ
 	if (!rq->timeout)
 		rq->timeout = BLK_DEFAULT_TIMEOUT;
 
-	rq->retries = 0;
-
 	start_time = jiffies;
 
 	/* ignore return value. All information is passed back to caller
@@ -433,7 +431,6 @@ static int sg_scsi_ioctl(struct file *fi
 	rq->data = buffer;
 	rq->data_len = bytes;
 	rq->flags |= REQ_BLOCK_PC;
-	rq->retries = 0;
 
 	blk_execute_rq(q, bd_disk, rq, 0);
 	err = rq->errors & 0xff;	/* only 8 bit SCSI status */
--- linux-2.6.16-rc5-mm2-orig/include/linux/blkdev.h	2006-03-05 23:44:06.000000000 +0100
+++ linux-2.6.16-rc5-mm2/include/linux/blkdev.h	2006-03-06 21:13:02.000000000 +0100
@@ -190,7 +190,6 @@ struct request {
 	void *sense;
 
 	unsigned int timeout;
-	int retries;
 
 	/*
 	 * For Power Management requests


Unfortunately that didn't fix it. After booting the patched kernel I found
this in dmesg :

Slab corruption: start=f77d768c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f77d7640, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c023d64d>](init_dev+0x55d/0x630)
000: 00 01 00 00 05 00 00 00 bf 00 00 00 3b 8a 00 00
010: 00 03 1c 7f 15 04 00 01 00 11 13 1a 00 12 0f 17
Next obj: start=f77d76d8, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c01c7c80>](ext3_init_block_alloc_info+0x20/0x70)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Slab corruption: start=f7001640, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f70015f4, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<f8bdf124>](__snd_util_mem_alloc+0x74/0x80 [snd_util_mem])
000: 00 10 00 00 00 00 00 00 b0 75 7d f7 50 33 bd f5
010: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
Next obj: start=f700168c, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0173923>](real_lookup+0x93/0xe0)
000: 6c 69 62 62 6f 6f 73 74 5f 75 6e 69 74 5f 74 65
010: 73 74 5f 66 72 61 6d 65 77 6f 72 6b 2d 67 63 63


> Hmm.. This appended patch may or may not help.
> 
I'll give it a spin.


> It overwrites the SCSI command "req" pointer when the request has been 
> done. The request cannot be used afterwards, so anybody accessing it would 
> be a bug. I think.
> 
Let's see what happens.
I've applied it on top of the changes mentioned above - let me know if 
that's wrong.


> HOWEVER. I noticed something else strange. Your slab corruption report 
> says
> 
> 	Slab corruption: start=f72948a0, len=64
> 	Redzone: 0x5a2cf071/0x5a2cf071.
> 	Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 	...
> 
> and the scary thing is that "len=64". 
> 
> The thing is, SCSI uses "SCSI_SENSE_BUFFERSIZE" to determine the maximum 
> sense size to copy, and what do we have, if not
> 
> 	include/scsi/scsi_cmnd.h:#define SCSI_SENSE_BUFFERSIZE  96
> 
> ie a 64-byte buffer is simply TOO DAMN SMALL!
> 
> Now, the thing is, the 64 comes from "sizeof(struct request_sense)", which 
> is what "struct packet_command *" uses. We can change that sizeof() to 
> just use SCSI_SENSE_BUFFERSIZE, but that still makes me worry about 

I'll try that after booting with the patch you just supplied and let you 
know the results shortly.


 /Jesper



* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:24         ` Jesper Juhl
@ 2006-03-06 20:30           ` Jens Axboe
  2006-03-06 20:33             ` Jens Axboe
  2006-03-06 21:41             ` Jesper Juhl
  0 siblings, 2 replies; 53+ messages in thread
From: Jens Axboe @ 2006-03-06 20:30 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On Mon, Mar 06 2006, Jesper Juhl wrote:
> On Monday 06 March 2006 21:06, Linus Torvalds wrote:
> > 
> > On Mon, 6 Mar 2006, Linus Torvalds wrote:
> > > 
> > > So it's either an aic7xxx bug, or it's generic SCSI.
> > > 
> > > Considering that we've had other slab corruption issues (the reason I was 
> > > looking closely at yours), generic SCSI isn't out of the question.
> > > 
> > > If you were a git user, doing a bisection run would be useful since you 
> > > seem to be able to recreate it at will. Oh, well. Testing that one patch 
> > > would still help.
> > 
> 
> Since the patch you sent me didn't apply cleanly to the mm kernel I made the
> changes by hand. This is what I ended up with (should be the same end result
> as what you intended as far as I can see) :
> 
> --- linux-2.6.16-rc5-mm2-orig/drivers/scsi/scsi_lib.c	2006-03-05 23:43:56.000000000 +0100
> +++ linux-2.6.16-rc5-mm2/drivers/scsi/scsi_lib.c	2006-03-06 21:13:53.000000000 +0100
> @@ -260,7 +260,6 @@ int scsi_execute(struct scsi_device *sde
>  	memcpy(req->cmd, cmd, req->cmd_len);
>  	req->sense = sense;
>  	req->sense_len = 0;
> -	req->retries = retries;
>  	req->timeout = timeout;
>  	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
>  
> @@ -478,7 +477,6 @@ int scsi_execute_async(struct scsi_devic
>  	req->sense = sioc->sense;
>  	req->sense_len = 0;
>  	req->timeout = timeout;
> -	req->retries = retries;
>  	req->end_io_data = sioc;
>  
>  	sioc->data = privdata;
> @@ -1240,7 +1238,7 @@ static void scsi_setup_blk_pc_cmnd(struc
>  		cmd->sc_data_direction = DMA_FROM_DEVICE;
>  	
>  	cmd->transfersize = req->data_len;
> -	cmd->allowed = req->retries;
> +	cmd->allowed = 3;
>  	cmd->timeout_per_command = req->timeout;
>  	cmd->done = scsi_blk_pc_done;
>  }
> --- linux-2.6.16-rc5-mm2-orig/block/scsi_ioctl.c	2006-03-05 23:43:41.000000000 +0100
> +++ linux-2.6.16-rc5-mm2/block/scsi_ioctl.c	2006-03-06 21:16:19.000000000 +0100
> @@ -314,8 +314,6 @@ static int sg_io(struct file *file, requ
>  	if (!rq->timeout)
>  		rq->timeout = BLK_DEFAULT_TIMEOUT;
>  
> -	rq->retries = 0;
> -
>  	start_time = jiffies;
>  
>  	/* ignore return value. All information is passed back to caller
> @@ -433,7 +431,6 @@ static int sg_scsi_ioctl(struct file *fi
>  	rq->data = buffer;
>  	rq->data_len = bytes;
>  	rq->flags |= REQ_BLOCK_PC;
> -	rq->retries = 0;
>  
>  	blk_execute_rq(q, bd_disk, rq, 0);
>  	err = rq->errors & 0xff;	/* only 8 bit SCSI status */
> --- linux-2.6.16-rc5-mm2-orig/include/linux/blkdev.h	2006-03-05 23:44:06.000000000 +0100
> +++ linux-2.6.16-rc5-mm2/include/linux/blkdev.h	2006-03-06 21:13:02.000000000 +0100
> @@ -190,7 +190,6 @@ struct request {
>  	void *sense;
>  
>  	unsigned int timeout;
> -	int retries;
>  
>  	/*
>  	 * For Power Management requests
> 
> 
> Unfortunately that didn't fix it. After booting the patched kernel I found
> this in dmesg :

I don't see how it could be, honestly, we would gladly oops in locally
close places if that was the case. If you disable slab debug/poison, do
you get a nice NULL pointer dereference instead? There have been some
reports of a NULL queue for sr devices lately; I wonder if some
recent SCSI change broke things.

Tejun, I seem to recall you looking at this, but I can't seem to locate
the thread. Did anything come of it?

-- 
Jens Axboe



* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:30           ` Jens Axboe
@ 2006-03-06 20:33             ` Jens Axboe
  2006-03-06 21:14               ` Jesper Juhl
  2006-03-06 21:41             ` Jesper Juhl
  1 sibling, 1 reply; 53+ messages in thread
From: Jens Axboe @ 2006-03-06 20:33 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On Mon, Mar 06 2006, Jens Axboe wrote:
> I don't see how it could be, honestly, we would gladly oops in locally
> close places if that was the case. If you disable slab debug/poison, do
> you get a nice NULL pointer dereference instead? There have been some
> reports of a NULL queue for sr devices lately; I wonder if some
> recent SCSI change broke things.
> 
> Tejun, I seem to recall you looking at this, but I can't seem to locate
> the thread. Did anything come of it?

This is the one:

http://marc.theaimsgroup.com/?l=linux-kernel&m=114041855331295&w=2

Also an -mm report, btw. Does this reproduce with 2.6.16-rcX latest?

-- 
Jens Axboe



* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:06       ` Linus Torvalds
  2006-03-06 20:24         ` Jesper Juhl
@ 2006-03-06 20:36         ` Jesper Juhl
  2006-03-06 20:53           ` Jesper Juhl
  1 sibling, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 20:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe

On Monday 06 March 2006 21:06, Linus Torvalds wrote:
> 
> On Mon, 6 Mar 2006, Linus Torvalds wrote:
> > 
> > So it's either an aic7xxx bug, or it's generic SCSI.
> > 
> > Considering that we've had other slab corruption issues (the reason I was 
> > looking closely at yours), generic SCSI isn't out of the question.
> > 
> > If you were a git user, doing a bisection run would be useful since you 
> > seem to be able to recreate it at will. Oh, well. Testing that one patch 
> > would still help.
> 
> Hmm.. This appended patch may or may not help.
> 
> It overwrites the SCSI command "req" pointer when the request has been 
> done. The request cannot be used afterwards, so anybody accessing it would 
> be a bug. I think.
> 

With the retry code removed and your req poisoning patch on top I just got this :

Slab corruption: start=f727c5a8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f727c55c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f727c5f4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=f727c5a8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f727c55c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f727c5f4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

and another, probably unrelated, thing I just noticed in my dmesg output:

initcall at 0xc0428240: init_hpet_clocksource+0x0/0x90(): returned with error code -19


> HOWEVER. I noticed something else strange. Your slab corruption report 
> says
> 
> 	Slab corruption: start=f72948a0, len=64
> 	Redzone: 0x5a2cf071/0x5a2cf071.
> 	Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 	...
> 
> and the scary thing is that "len=64". 
> 
> The thing is, SCSI uses "SCSI_SENSE_BUFFERSIZE" to determine the maximum 
> sense size to copy, and what do we have, if not
> 
> 	include/scsi/scsi_cmnd.h:#define SCSI_SENSE_BUFFERSIZE  96
> 
> ie a 64-byte buffer is simply TOO DAMN SMALL!
> 
> Now, the thing is, the 64 comes from "sizeof(struct request_sense)", which 
> is what "struct packet_command *" uses. We can change that sizeof() to 
> just use SCSI_SENSE_BUFFERSIZE, but that still makes me worry about 

Building a kernel with that change on top of the other ones atm.


 / Jesper



* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:36         ` Jesper Juhl
@ 2006-03-06 20:53           ` Jesper Juhl
  2006-03-06 20:56             ` Jesper Juhl
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 20:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe

On 3/6/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On Monday 06 March 2006 21:06, Linus Torvalds wrote:
> >
<...snip...>
> > and the scary thing is that "len=64".
> >
> > The thing is, SCSI uses "SCSI_SENSE_BUFFERSIZE" to determine the maximum
> > sense size to copy, and what do we have, if not
> >
> >       include/scsi/scsi_cmnd.h:#define SCSI_SENSE_BUFFERSIZE  96
> >
> > ie a 64-byte buffer is simply TOO DAMN SMALL!
> >
> > Now, the thing is, the 64 comes from "sizeof(struct request_sense)", which
> > is what "struct packet_command *" uses. We can change that sizeof() to
> > just use SCSI_SENSE_BUFFERSIZE, but that still makes me worry about
>
> Building a kernel with that change on top of the other ones atm.
>
Changing the sizeof() to SCSI_SENSE_BUFFERSIZE doesn't fix it :

Slab corruption: start=f79da5a8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f79da55c, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0158918>](__vmalloc_node+0x68/0x80)
000: d0 1e 1e c3 18 1f 1e c3 60 1f 1e c3 a8 1f 1e c3
010: f0 1f 1e c3 38 20 1e c3 80 20 1e c3 c8 20 1e c3
Next obj: start=f79da5f4, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0173923>](real_lookup+0x93/0xe0)
000: 6c 69 62 62 6f 6f 73 74 5f 70 72 67 5f 65 78 65
010: 63 5f 6d 6f 6e 69 74 6f 72 2d 67 63 63 2d 6d 74
Slab corruption: start=f79da5a8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f79da55c, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0158918>](__vmalloc_node+0x68/0x80)
000: d0 1e 1e c3 18 1f 1e c3 60 1f 1e c3 a8 1f 1e c3
010: f0 1f 1e c3 38 20 1e c3 80 20 1e c3 c8 20 1e c3
Next obj: start=f79da5f4, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0173923>](real_lookup+0x93/0xe0)
000: 6c 69 62 62 6f 6f 73 74 5f 70 72 67 5f 65 78 65
010: 63 5f 6d 6f 6e 69 74 6f 72 2d 67 63 63 2d 6d 74
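
For what it's worth, the bytes at offset 000 in the corrupted objects do not look random. They read like fixed-format SCSI sense data written into an object that was already freed (a freed object should hold 0x6b poison). Below is a small user-space sketch for picking the fields out of such a dump. The struct and function names are made up for illustration; only the byte offsets come from the fixed sense format.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest in a fixed-format SCSI sense buffer. */
struct sense_summary {
	uint8_t response_code;	/* byte 0, low 7 bits (0x70 or 0x71) */
	uint8_t sense_key;	/* byte 2, low 4 bits */
	uint8_t asc;		/* byte 12, additional sense code */
	uint8_t ascq;		/* byte 13, additional sense code qualifier */
};

/* Returns 0 on success, -1 if the buffer is too short or not fixed format. */
static int decode_fixed_sense(const uint8_t *buf, size_t len,
			      struct sense_summary *out)
{
	if (len < 14)
		return -1;
	out->response_code = buf[0] & 0x7f;
	if (out->response_code != 0x70 && out->response_code != 0x71)
		return -1;
	out->sense_key = buf[2] & 0x0f;
	out->asc = buf[12];
	out->ascq = buf[13];
	return 0;
}
```

Fed the first row of the dumps above, this gives sense key 0x02 with ASC/ASCQ 0x3a/0x01 for one report and sense key 0x05 with ASC 0x24 for the other, i.e. "not ready, medium not present" and "illegal request, invalid field in CDB".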

I'll now go test the things Jens suggested.  Expect more feedback shortly.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:53           ` Jesper Juhl
@ 2006-03-06 20:56             ` Jesper Juhl
  2006-03-06 21:07               ` Linus Torvalds
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 20:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe

On 3/6/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 3/6/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> > On Monday 06 March 2006 21:06, Linus Torvalds wrote:
> > >
> <...snip...>
> > > and the scary thing is that "len=64".
> > >
> > > The thing is, SCSI uses "SCSI_SENSE_BUFFERSIZE" to determine the maximum
> > > sense size to copy, and what do we have, if not
> > >
> > >       include/scsi/scsi_cmnd.h:#define SCSI_SENSE_BUFFERSIZE  96
> > >
> > > ie a 64-byte buffer is simply TOO DAMN SMALL!
> > >
> > > Now, the thing is, the 64 comes from "sizeof(struct request_sense)", which
> > > is what "struct packet_command *" uses. We can change that sizeof() to
> > > just use SCSI_SENSE_BUFFERSIZE, but that still makes me worry about
> >
> > Building a kernel with that change on top of the other ones atm.
> >
> Changing the sizeof() to SCSI_SENSE_BUFFERSIZE doesn't fix it :
>
> Slab corruption: start=f79da5a8, len=64

Hmm, is it just me or should that len= have read len=96 ???

This is the change I made :

--- linux-2.6.16-rc5-mm2/block/scsi_ioctl.c~    2006-03-06 21:43:56.000000000 +0100
+++ linux-2.6.16-rc5-mm2/block/scsi_ioctl.c     2006-03-06 21:43:56.000000000 +0100
@@ -568,7 +568,7 @@ int scsi_cmd_ioctl(struct file *file, st
                        hdr.dxferp = cgc.buffer;
                        hdr.sbp = cgc.sense;
                        if (hdr.sbp)
-                               hdr.mx_sb_len = sizeof(struct request_sense);
+                               hdr.mx_sb_len = SCSI_SENSE_BUFFERSIZE;
                        hdr.timeout = cgc.timeout;
                        hdr.cmdp = ((struct cdrom_generic_command __user*) arg)->cmd;
                        hdr.cmd_len = sizeof(cgc.cmd);

did I mess up?


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:56             ` Jesper Juhl
@ 2006-03-06 21:07               ` Linus Torvalds
  2006-03-06 21:16                 ` Jesper Juhl
  2006-03-06 21:54                 ` Jesper Juhl
  0 siblings, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2006-03-06 21:07 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe



On Mon, 6 Mar 2006, Jesper Juhl wrote:
> 
> Hmm, is it just me or should that len= have read len=96 ???
> 
> This is the change I made :
> 
> --- linux-2.6.16-rc5-mm2/block/scsi_ioctl.c~    2006-03-06 21:43:56.000000000 +0100
> +++ linux-2.6.16-rc5-mm2/block/scsi_ioctl.c     2006-03-06 21:43:56.000000000 +0100
> @@ -568,7 +568,7 @@ int scsi_cmd_ioctl(struct file *file, st
>                         hdr.dxferp = cgc.buffer;
>                         hdr.sbp = cgc.sense;
>                         if (hdr.sbp)
> -                               hdr.mx_sb_len = sizeof(struct request_sense);
> +                               hdr.mx_sb_len = SCSI_SENSE_BUFFERSIZE;
>                         hdr.timeout = cgc.timeout;
>                         hdr.cmdp = ((struct cdrom_generic_command __user*) arg)->cmd;
>                         hdr.cmd_len = sizeof(cgc.cmd);
> 
> did I mess up?

That's not the one to change. It's the one in "sr_do_ioctl()", where it 
uses "sizeof(*sense)".

		Linus

----
diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c
index 5d02ff4..b65462f 100644
--- a/drivers/scsi/sr_ioctl.c
+++ b/drivers/scsi/sr_ioctl.c
@@ -192,7 +192,7 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack
 	SDev = cd->device;
 
 	if (!sense) {
-		sense = kmalloc(sizeof(*sense), GFP_KERNEL);
+		sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
 		if (!sense) {
 			err = -ENOMEM;
 			goto out;
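
The size mismatch behind this patch comes down to simple arithmetic, sketched below in user space. The 64-byte struct is a hypothetical stand-in for `struct request_sense` (only its size matters here), and `SENSE_COPY_MAX` just mirrors the `SCSI_SENSE_BUFFERSIZE` value of 96 quoted earlier in the thread.

```c
#include <stddef.h>

#define SENSE_COPY_MAX 96	/* mirrors SCSI_SENSE_BUFFERSIZE as quoted above */

/* Hypothetical stand-in: only the 64-byte size is taken from the thread. */
struct request_sense_model {
	unsigned char bytes[64];
};

/*
 * Number of bytes that could land past the end of a buffer of
 * alloc_size bytes when up to copy_max bytes are copied into it.
 */
static size_t possible_overrun(size_t alloc_size, size_t copy_max)
{
	return copy_max > alloc_size ? copy_max - alloc_size : 0;
}
```

With the sizeof()-based allocation the worst case is 96 - 64 = 32 bytes past the end of the object; with the allocation sized by the same constant the copy uses, it is 0.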

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:33             ` Jens Axboe
@ 2006-03-06 21:14               ` Jesper Juhl
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 21:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On 3/6/06, Jens Axboe <axboe@suse.de> wrote:
<snip>
>
> This is the one:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114041855331295&w=2
>
> Also an -mm report, btw. Does this reproduce with 2.6.16-rcX latest?
>

I just built and booted 2.6.16-rc5-git8 with the same config that I
used for the -mm kernel and the problem did not manifest itself there.
So it seems that mainline is fine but we need to find the bug before
it propagates from mm to mainline.

I'll test a -mm kernel without slab debug/poison now to see if it goes Oops.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 21:07               ` Linus Torvalds
@ 2006-03-06 21:16                 ` Jesper Juhl
  2006-03-06 21:54                 ` Jesper Juhl
  1 sibling, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 21:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe

On 3/6/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 6 Mar 2006, Jesper Juhl wrote:
> >
> > Hmm, is it just me or should that len= have read len=96 ???
> >
> > This is the change I made :
> >
> > --- linux-2.6.16-rc5-mm2/block/scsi_ioctl.c~    2006-03-06 21:43:56.000000000 +0100
> > +++ linux-2.6.16-rc5-mm2/block/scsi_ioctl.c     2006-03-06 21:43:56.000000000 +0100
> > @@ -568,7 +568,7 @@ int scsi_cmd_ioctl(struct file *file, st
> >                         hdr.dxferp = cgc.buffer;
> >                         hdr.sbp = cgc.sense;
> >                         if (hdr.sbp)
> > -                               hdr.mx_sb_len = sizeof(struct request_sense);
> > +                               hdr.mx_sb_len = SCSI_SENSE_BUFFERSIZE;
> >                         hdr.timeout = cgc.timeout;
> >                         hdr.cmdp = ((struct cdrom_generic_command __user*) arg)->cmd;
> >                         hdr.cmd_len = sizeof(cgc.cmd);
> >
> > did I mess up?
>
> That's not the one to change. It's the one in "sr_do_ioctl()", where it
> uses "sizeof(*sense)".
>

Ahh, so I did mess up - whoops - I just grep'ed for "sizeof(struct
request_sense)" :-(

I'll try it again (with the correct change) in a moment, after I've
tested Jens's "does no slab poison/debug make it go Oops" question...

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 20:30           ` Jens Axboe
  2006-03-06 20:33             ` Jens Axboe
@ 2006-03-06 21:41             ` Jesper Juhl
  2006-03-06 21:55               ` Dave Jones
  1 sibling, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 21:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On 3/6/06, Jens Axboe <axboe@suse.de> wrote:
[...snip...]
>
> I don't see how it could be, honestly, we would gladly oops in locally
> close places if that was the case. If you disable slab debug/poison, do
> you get a nice NULL pointer dereference instead? There have been some
> reports on a NULL queue for sr devices as of lately, I wonder if some
> SCSI change recently was broken.
>

I just built and booted a plain 2.6.16-rc5-mm2 with the same config
I've used previously, but with the following options disabled :

CONFIG_DEBUG_SLAB
CONFIG_PAGE_OWNER
CONFIG_DEBUG_VM
CONFIG_DEBUG_PAGEALLOC

The resulting kernel boots and runs just fine (no Oops) and leaves
nothing in dmesg.
So, without the debugging options it appears to the user that
everything is OK - nasty.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 21:07               ` Linus Torvalds
  2006-03-06 21:16                 ` Jesper Juhl
@ 2006-03-06 21:54                 ` Jesper Juhl
  2006-03-06 22:05                   ` Andrew Morton
  2006-03-06 22:17                   ` Linus Torvalds
  1 sibling, 2 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 21:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe

On 3/6/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
[.snip.]
>
> That's not the one to change. It's the one in "sr_do_ioctl()", where it
> uses "sizeof(*sense)".
>
>                 Linus
>
> ----
> diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c
> index 5d02ff4..b65462f 100644
> --- a/drivers/scsi/sr_ioctl.c
> +++ b/drivers/scsi/sr_ioctl.c
> @@ -192,7 +192,7 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack
>         SDev = cd->device;
>
>         if (!sense) {
> -               sense = kmalloc(sizeof(*sense), GFP_KERNEL);
> +               sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
>                 if (!sense) {
>                         err = -ENOMEM;
>                         goto out;
>

Ok, booting a plain 2.6.16-rc5-mm2 kernel with the above being the
only change made results in this :

Slab corruption: start=f4f6a11c, len=128
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f4f6a090, len=128
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c01f4a26>](alloc_as_io_context+0x16/0xd0)
000: 01 00 00 00 00 00 00 00 ad 4e ad de ff ff ff ff
010: ff ff ff ff b0 49 1f c0 c0 49 1f c0 07 00 00 00
Next obj: start=f4f6a1a8, len=128
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c01f4a26>](alloc_as_io_context+0x16/0xd0)
000: 01 00 00 00 00 00 00 00 ad 4e ad de ff ff ff ff
010: ff ff ff ff b0 49 1f c0 c0 49 1f c0 07 00 00 00
Slab corruption: start=f4f6a11c, len=128
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f4f6a090, len=128
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c01f4a26>](alloc_as_io_context+0x16/0xd0)
000: 01 00 00 00 00 00 00 00 ad 4e ad de ff ff ff ff
010: ff ff ff ff b0 49 1f c0 c0 49 1f c0 07 00 00 00
Next obj: start=f4f6a1a8, len=128
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c01f4a26>](alloc_as_io_context+0x16/0xd0)
000: 01 00 00 00 00 00 00 00 ad 4e ad de ff ff ff ff
010: ff ff ff ff b0 49 1f c0 c0 49 1f c0 07 00 00 00


Where do we go from here ?


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 21:41             ` Jesper Juhl
@ 2006-03-06 21:55               ` Dave Jones
  2006-03-06 21:57                 ` Jesper Juhl
  2006-03-09 15:50                 ` Martin J. Bligh
  0 siblings, 2 replies; 53+ messages in thread
From: Dave Jones @ 2006-03-06 21:55 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Jens Axboe, Linus Torvalds, Linux Kernel Mailing List,
	Andrew Morton, markhe, Andrea Arcangeli, Mike Christie,
	James Bottomley

On Mon, Mar 06, 2006 at 10:41:07PM +0100, Jesper Juhl wrote:

 > CONFIG_DEBUG_SLAB
 > CONFIG_PAGE_OWNER
 > CONFIG_DEBUG_VM
 > CONFIG_DEBUG_PAGEALLOC
 > 
 > The resulting kernel boots and runs just fine (no Oops) and leaves
 > nothing in dmesg.
 > So, without the debugging options it appears to the user that
 > everything is OK - nasty.

DEBUG_PAGEALLOC in particular is *fantastic* at making bugs hide.
I've lost many an hour trying to pin bugs down due to that.

		Dave


-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 21:55               ` Dave Jones
@ 2006-03-06 21:57                 ` Jesper Juhl
  2006-03-09 15:50                 ` Martin J. Bligh
  1 sibling, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 21:57 UTC (permalink / raw)
  To: Dave Jones, Jesper Juhl, Jens Axboe, Linus Torvalds,
	Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On 3/6/06, Dave Jones <davej@redhat.com> wrote:
> On Mon, Mar 06, 2006 at 10:41:07PM +0100, Jesper Juhl wrote:
>
>  > CONFIG_DEBUG_SLAB
>  > CONFIG_PAGE_OWNER
>  > CONFIG_DEBUG_VM
>  > CONFIG_DEBUG_PAGEALLOC
>  >
>  > The resulting kernel boots and runs just fine (no Oops) and leaves
>  > nothing in dmesg.
>  > So, without the debugging options it appears to the user that
>  > everything is OK - nasty.
>
> DEBUG_PAGEALLOC in particular is *fantastic* at making bugs hide.
> I've lost many an hour trying to pin bugs down due to that.
>

Well, in this case, turning the option *off* hides the bug ;)

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 21:54                 ` Jesper Juhl
@ 2006-03-06 22:05                   ` Andrew Morton
  2006-03-06 22:08                     ` Jesper Juhl
  2006-03-06 22:17                   ` Linus Torvalds
  1 sibling, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2006-03-06 22:05 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: torvalds, linux-kernel, markhe, andrea, michaelc, James.Bottomley,
	axboe

"Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>
> Where do we go from here ?
>

If you can test just

	2.6.16-rc5 + linus.patch + git-scsi-misc.patch

then we'd have a clearer idea.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:05                   ` Andrew Morton
@ 2006-03-06 22:08                     ` Jesper Juhl
  2006-03-06 22:27                       ` Jesper Juhl
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 22:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: torvalds, linux-kernel, markhe, andrea, michaelc, James.Bottomley,
	axboe

On 3/6/06, Andrew Morton <akpm@osdl.org> wrote:
> "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
> >
> > Where do we go from here ?
> >
>
> If you can test just
>
>         2.6.16-rc5 + linus.patch + git-scsi-misc.patch
>
> then we'd have a clearer idea.
>
Sure, I'll get right on it.
I'll post the results in 15min or so.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 21:54                 ` Jesper Juhl
  2006-03-06 22:05                   ` Andrew Morton
@ 2006-03-06 22:17                   ` Linus Torvalds
  2006-03-06 22:34                     ` Linus Torvalds
  2006-03-06 22:44                     ` Jesper Juhl
  1 sibling, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2006-03-06 22:17 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe



On Mon, 6 Mar 2006, Jesper Juhl wrote:
> 
> Ok, booting a plain 2.6.16-rc5-mm2 kernel with the above being the
> only change made results in this :

Yeah. I'm not surprised. A real mode-sense shouldn't be even 64 bytes, 
much less 96, so it shouldn't have overflowed, and we had no indication of 
the corruption spreading past the one allocation anyway.

It did/does seem a bug, though, so worth checking.

So onward in our tireless battle. Does this patch make any difference for 
you? It does two things:

 - it clears the "->sense" buffer in blk_end_sync_rq() (since it won't be 
   valid any more: the request is gone)
 - it adds a BUG_ON() if we appear to have already done the sense fill on 
   SCSI IO completion, and do it again.

Now, I've not tried either of these, and the BUG_ON() in particular might 
be a false positive itself, but it might be worth testing.

		Linus

---
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 03d9c82..4351d34 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -2637,6 +2637,7 @@ void blk_end_sync_rq(struct request *rq,
 	struct completion *waiting = rq->waiting;
 
 	rq->waiting = NULL;
+	rq->sense = NULL;
 	__blk_put_request(rq->q, rq);
 
 	/*
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 701a328..2b60769 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -961,6 +961,10 @@ void scsi_io_completion(struct scsi_cmnd
 		if (result) {
 			clear_errors = 0;
 			if (sense_valid && req->sense) {
+
+				/* Have we already filled the sense buffer? */
+				BUG_ON(req->sense_len);
+
 				/*
 				 * SG_IO wants current and deferred errors
 				 */

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:08                     ` Jesper Juhl
@ 2006-03-06 22:27                       ` Jesper Juhl
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 22:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: torvalds, linux-kernel, markhe, andrea, michaelc, James.Bottomley,
	axboe

On 3/6/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 3/6/06, Andrew Morton <akpm@osdl.org> wrote:
> > "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
> > >
> > > Where do we go from here ?
> > >
> >
> > If you can test just
> >
> >         2.6.16-rc5 + linus.patch + git-scsi-misc.patch
> >
> > then we'd have a clearer idea.
> >
> Sure, I'll get right on it.
> I'll post the results in 15min or so.
>

Ok, a plain 2.6.16-rc5 + linus.patch + git-scsi-misc.patch results in this :

Slab corruption: start=f4812d14, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c028f61b>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f4812cc8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<00000000>](_stext+0x3feffd68/0x8)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f4812d60, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c02367ef>](init_dev+0x5cf/0x630)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Slab corruption: start=f4812d14, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c028f61b>](sr_do_ioctl+0x11b/0x270)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f4812cc8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<00000000>](_stext+0x3feffd68/0x8)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f4812d60, len=64
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c02367ef>](init_dev+0x5cf/0x630)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
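
A note on reading these reports, as far as I understand the slab debug code: a freed object is filled with 0x6b and its last byte is set to 0xa5, and the "Slab corruption" message is printed when a free object no longer matches that pattern. A redzone of 0x5a2cf071 marks an object as free and 0x170fc2a5 as in use. The check can be modelled in user space roughly like this (the function name is made up; the two constants are the poison values mentioned above):

```c
#include <stddef.h>

#define POISON_FREE 0x6b	/* fill byte for a freed object */
#define POISON_END  0xa5	/* marker in the last byte of a freed object */

/*
 * Returns the offset of the first byte that no longer holds the
 * expected poison pattern, or -1 if the object is untouched.
 */
static long first_poison_mismatch(const unsigned char *obj, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {
		unsigned char want = (i == size - 1) ? POISON_END : POISON_FREE;

		if (obj[i] != want)
			return (long)i;
	}
	return -1;
}
```

In the reports above the mismatch starts at offset 0, since the whole object has been overwritten (mostly with zeroes) after it was freed.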


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:17                   ` Linus Torvalds
@ 2006-03-06 22:34                     ` Linus Torvalds
  2006-03-06 22:52                       ` Jesper Juhl
                                         ` (2 more replies)
  2006-03-06 22:44                     ` Jesper Juhl
  1 sibling, 3 replies; 53+ messages in thread
From: Linus Torvalds @ 2006-03-06 22:34 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe,
	Pekka Enberg



Ok, 
 I have a new favorite suspect.

It is this one: commit 4d268eba1187ef66844a6a33b9431e5d0dadd4ad:

    [PATCH] slab: extract slab order calculation to separate function
    
    This patch moves the ugly loop that determines the 'optimal' size (page order)
    of cache slabs from kmem_cache_create() to a separate function and cleans it
    up a bit.
    
    Thanks to Matthew Wilcox for the help with this patch.
    
    Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

and I think it may be broken.

In particular, as far as I can tell, that

+               /* More than offslab_limit objects will cause problems */
+               if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit)
+                       break;

has been incorrectly translated for several reasons:

 - we shouldn't check "cachep->num > offslab_limit". We should check just 
   "num > offslab_limit" (cachep->num is the _previous_ number we tested).

 - when we do "break", we've already incremented "gfporder", and we should 
   correct for that.

Now, maybe I'm just off my rocker again (I've certainly been batting 0.000 
so far, even if I think I've been finding real bugs). So who knows. But I 
get the feeling that that patch is broken.

Either revert it, or try this (TOTALLY UNTESTED!!!) patch..

And hey, maybe I'm just crazy.

		Linus

----
diff --git a/mm/slab.c b/mm/slab.c
index 2b0b151..1cca41d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1628,25 +1628,22 @@ static inline size_t calculate_slab_orde
 			size_t size, size_t align, unsigned long flags)
 {
 	size_t left_over = 0;
+	int gfporder;
 
-	for (;; cachep->gfporder++) {
+	for (gfporder = 0 ; gfporder < MAX_GFP_ORDER; gfporder++) {
 		unsigned int num;
 		size_t remainder;
 
-		if (cachep->gfporder > MAX_GFP_ORDER) {
-			cachep->num = 0;
-			break;
-		}
-
-		cache_estimate(cachep->gfporder, size, align, flags,
-			       &remainder, &num);
+		cache_estimate(gfporder, size, align, flags, &remainder, &num);
 		if (!num)
 			continue;
+
 		/* More than offslab_limit objects will cause problems */
-		if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit)
+		if ((flags & CFLGS_OFF_SLAB) && num > offslab_limit)
 			break;
 
 		cachep->num = num;
+		cachep->gfporder = gfporder;
 		left_over = remainder;
 
 		/*
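
To make the two complaints about the refactored loop concrete, here is a minimal user-space model of both loop shapes. Everything in it is a stand-in: `struct cache_model`, `estimate()`, `MODEL_MAX_ORDER` and the `want` cutoff are hypothetical, and the real function uses different acceptance tests. The patched variant uses an inclusive bound so that it covers the same orders as the removed `> MAX_GFP_ORDER` test.

```c
#include <stddef.h>

#define MODEL_MAX_ORDER 5	/* assumption: stands in for MAX_GFP_ORDER */
#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two cache fields the loop touches. */
struct cache_model {
	unsigned int num;	/* objects per slab */
	int gfporder;		/* slab spans 2^gfporder pages */
};

/* Toy replacement for cache_estimate(): objects fitting in 2^order pages. */
static unsigned int estimate(int order, size_t size)
{
	return (unsigned int)((MODEL_PAGE_SIZE << order) / size);
}

/* Loop shape from the removed lines: the limit test reads the stale c->num. */
static void order_as_refactored(struct cache_model *c, size_t size,
				unsigned int limit, unsigned int want)
{
	c->num = 0;
	for (c->gfporder = 0;; c->gfporder++) {
		unsigned int num;

		if (c->gfporder > MODEL_MAX_ORDER) {
			c->num = 0;
			break;
		}
		num = estimate(c->gfporder, size);
		if (!num)
			continue;
		if (c->num > limit)	/* count from the previous iteration */
			break;		/* gfporder is already one step ahead */
		c->num = num;
		if (num >= want)	/* stand-in for the real acceptance tests */
			break;
	}
}

/* Loop shape from the added lines: test the fresh num, commit both together. */
static void order_as_patched(struct cache_model *c, size_t size,
			     unsigned int limit, unsigned int want)
{
	int gfporder;

	c->num = 0;
	c->gfporder = 0;
	for (gfporder = 0; gfporder <= MODEL_MAX_ORDER; gfporder++) {
		unsigned int num = estimate(gfporder, size);

		if (!num)
			continue;
		if (num > limit)
			break;
		c->num = num;
		c->gfporder = gfporder;
		if (num >= want)
			break;
	}
}
```

With 512-byte objects, a limit of 10 and a cutoff of 16, the refactored shape settles on 16 objects at order 1 (over the limit), while the patched shape stops at 8 objects at order 0. With a cutoff of 100 the refactored shape breaks out holding num = 16 but gfporder = 2, which is the "already incremented" mismatch.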

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:17                   ` Linus Torvalds
  2006-03-06 22:34                     ` Linus Torvalds
@ 2006-03-06 22:44                     ` Jesper Juhl
  1 sibling, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 22:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe

On 3/6/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 6 Mar 2006, Jesper Juhl wrote:
> >
> > Ok, booting a plain 2.6.16-rc5-mm2 kernel with the above being the
> > only change made results in this :
>
> Yeah. I'm not surprised. A real mode-sense shouldn't be even 64 bytes,
> much less 96, so it shouldn't have overflowed, and we had no indication of
> the corruption spreading past the one allocation anyway.
>
> It did/does seem a bug, though, so worth checking.
>
Well, hopefully the SCSI people can take a look at that as a separate issue...


> So onward in our tireless battle. Does this patch make any difference for
> you? It does two things:
>
>  - it clears the "->sense" buffer in blk_end_sync_rq() (since it won't be
>    valid any more: the request is gone)
>  - it adds a BUG_ON() if we appear to have already done the sense fill on
>    SCSI IO completion, and do it again.
>
> Now, I've not tried either of these, and the BUG_ON() in particular might
> be a false positive itself, but it might be worth testing.
>

Unfortunately this doesn't seem to make a difference :

Slab corruption: start=f70c0770, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934fb>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f70c0724, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813e6>](free_fdtable_rcu+0x66/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f70c07bc, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=f70c0770, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934fb>](sr_do_ioctl+0x11b/0x270)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f70c0724, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813e6>](free_fdtable_rcu+0x66/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f70c07bc, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:34                     ` Linus Torvalds
@ 2006-03-06 22:52                       ` Jesper Juhl
  2006-03-06 22:54                       ` Linus Torvalds
  2006-03-07 19:28                       ` Bill Davidsen
  2 siblings, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 22:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe,
	Pekka Enberg

On 3/6/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
> Ok,
>  I have a new favorite suspect.
>
Heh, you are coming up with stuff to test faster than I can build &
boot kernels ;-)
Which is good, we'll get to the bottom of this all the faster :)


> It is this one: commit 4d268eba1187ef66844a6a33b9431e5d0dadd4ad:
>
[--snip--]
>
> Now, maybe I'm just off my rocker again (I've certainly been batting 0.000
> so far, even if I think I've been finding real bugs). So who knows. But I
> get the feeling that that patch is broken.
>
> Either revert it, or try this (TOTALLY UNTESTED!!!) patch..
>
Hmm, that patch doesn't apply at all to 2.6.16-rc5-mm2 :/

patching file mm/slab.c
Hunk #1 FAILED at 1628.
1 out of 1 hunk FAILED -- saving rejects to file mm/slab.c.rej


$ cat mm/slab.c.rej
***************
*** 1628,1649 ****
  			size_t size, size_t align, unsigned long flags)
  {
  	size_t left_over = 0;
- 	int gfporder;

- 	for (gfporder = 0 ; gfporder < MAX_GFP_ORDER; gfporder++) {
  		unsigned int num;
  		size_t remainder;

- 		cache_estimate(gfporder, size, align, flags, &remainder, &num);
  		if (!num)
  			continue;
-
  		/* More than offslab_limit objects will cause problems */
- 		if ((flags & CFLGS_OFF_SLAB) && num > offslab_limit)
  			break;

  		cachep->num = num;
- 		cachep->gfporder = gfporder;
  		left_over = remainder;

  		/*
--- 1628,1652 ----
  			size_t size, size_t align, unsigned long flags)
  {
  	size_t left_over = 0;

+ 	for (;; cachep->gfporder++) {
  		unsigned int num;
  		size_t remainder;

+ 		if (cachep->gfporder > MAX_GFP_ORDER) {
+ 			cachep->num = 0;
+ 			break;
+ 		}
+
+ 		cache_estimate(cachep->gfporder, size, align, flags,
+ 			       &remainder, &num);
  		if (!num)
  			continue;
  		/* More than offslab_limit objects will cause problems */
+ 		if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit)
  			break;

  		cachep->num = num;
  		left_over = remainder;

  		/*



> And hey, maybe I'm just crazy.
>
Somehow I don't think that's the core problem here ;)


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:34                     ` Linus Torvalds
  2006-03-06 22:52                       ` Jesper Juhl
@ 2006-03-06 22:54                       ` Linus Torvalds
  2006-03-06 23:01                         ` Jesper Juhl
  2006-03-07 19:28                       ` Bill Davidsen
  2 siblings, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2006-03-06 22:54 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe,
	Pekka Enberg



On Mon, 6 Mar 2006, Linus Torvalds wrote:
> 
> Either revert it, or try this (TOTALLY UNTESTED!!!) patch..

Don't even bother with the untested patch.

> +	for (gfporder = 0 ; gfporder < MAX_GFP_ORDER; gfporder++) {

At a minimum, this "<" needs to be "<=".

After that, it might even work. Not that I can convince me that the test 
for "offslab_limit" ever even triggers, so..
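
For reference, here is a minimal user-space sketch of the loop shape under discussion. It is not the actual mm/slab.c code: the demo_* names, the DEMO_* constants and the simplified estimate (no alignment or slab-management overhead) are assumptions made up for illustration. It iterates up to and including the maximum order, and it tests the freshly computed num against the limit rather than the value stored on the previous pass:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE     4096u  /* assumed page size for the demo */
#define DEMO_MAX_GFP_ORDER 5u     /* assumed highest order the allocator may use */

/* Hypothetical stand-in for the two cache fields the loop fills in. */
struct demo_cache {
	unsigned int num;       /* objects per slab */
	unsigned int gfporder;  /* slab spans 2^gfporder pages */
};

/*
 * Simplified estimate: objects of `size` bytes that fit in a slab of
 * 2^order pages, with the unused tail returned through *left_over.
 */
unsigned int demo_estimate(unsigned int order, size_t size, size_t *left_over)
{
	size_t slab_size = (size_t)DEMO_PAGE_SIZE << order;

	*left_over = slab_size % size;
	return (unsigned int)(slab_size / size);
}

/*
 * Pick an order for `size`-byte objects. `limit` caps objects per slab
 * (0 means no cap). Returns 0 on success, -1 if nothing fits.
 */
int demo_calculate_order(struct demo_cache *c, size_t size, unsigned int limit)
{
	unsigned int order;

	assert(c != NULL && size > 0);
	c->num = 0;
	c->gfporder = 0;

	/* "<=" so that the highest order is tried as well */
	for (order = 0; order <= DEMO_MAX_GFP_ORDER; order++) {
		size_t left_over;
		unsigned int num = demo_estimate(order, size, &left_over);

		if (!num)
			continue;
		/* Check this pass's num, not c->num from the previous pass. */
		if (limit && num > limit)
			break;

		/* Record the order only once it has been accepted. */
		c->num = num;
		c->gfporder = order;

		/* Stop once at most 1/8 of the slab is wasted. */
		if (left_over * 8 <= ((size_t)DEMO_PAGE_SIZE << order))
			break;
	}
	return c->num ? 0 : -1;
}
```

With these assumed constants, a 100000-byte object only fits at order 5, so the same loop written with "<" would find no usable order at all.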

		Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:54                       ` Linus Torvalds
@ 2006-03-06 23:01                         ` Jesper Juhl
  2006-03-06 23:06                           ` Andrew Morton
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 23:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe,
	Pekka Enberg

On 3/6/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 6 Mar 2006, Linus Torvalds wrote:
> >
> > Either revert it, or try this (TOTALLY UNTESTED!!!) patch..
>
> Don't even bother with the untested patch.
>
> > +     for (gfporder = 0 ; gfporder < MAX_GFP_ORDER; gfporder++) {
>
> At a minimum, this "<" needs to be "<=".
>
> After that, it might even work. Not that I can convince me that the test
> for "offslab_limit" ever even triggers, so..
>

Ehh, it's getting pretty clear that you are looking at
2.6.16-rc5-git<latest> and I'm using -mm here, since that code is not
present in mm/slab.c in 2.6.16-rc5-mm2 in anything near that form.

And since 2.6.16-rc5-git8 is not experiencing problems I'd suggest you
perhaps instead take a look at what's in -mm... That's where we need
to work (it seems) to find the bug...

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 23:01                         ` Jesper Juhl
@ 2006-03-06 23:06                           ` Andrew Morton
  2006-03-06 23:24                             ` Jesper Juhl
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2006-03-06 23:06 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: torvalds, linux-kernel, markhe, andrea, michaelc, James.Bottomley,
	axboe, penberg

"Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>
> And since 2.6.16-rc5-git8 is not experiencing problems I'd suggest you
>  perhaps instead take a look at what's in -mm... That's where we need
>  to work (it seems) to find the bug...

Yes, it's very probably something in git-scsi-misc.

 drivers/block/cciss.c               |    3 
 drivers/message/fusion/Kconfig      |    1 
 drivers/message/fusion/mptbase.c    |   72 -
 drivers/message/fusion/mptbase.h    |   14 
 drivers/message/fusion/mptfc.c      |    4 
 drivers/message/fusion/mptlan.c     |    5 
 drivers/message/fusion/mptsas.c     |  196 ++
 drivers/message/fusion/mptscsih.c   | 2402 +-----------------------------------
 drivers/message/fusion/mptscsih.h   |   11 
 drivers/message/fusion/mptspi.c     |  733 ++++++++++
 drivers/scsi/53c700.c               |   18 
 drivers/scsi/aacraid/aacraid.h      |    5 
 drivers/scsi/aacraid/comminit.c     |    1 
 drivers/scsi/aacraid/commsup.c      |   14 
 drivers/scsi/aacraid/linit.c        |   14 
 drivers/scsi/aha152x.c              |    7 
 drivers/scsi/aic7xxx/aic79xx_core.c |   24 
 drivers/scsi/aic7xxx/aic7xxx_core.c |   24 
 drivers/scsi/aic7xxx/aic7xxx_osm.c  |   45 
 drivers/scsi/aic7xxx/aic7xxx_osm.h  |    5 
 drivers/scsi/hosts.c                |    3 
 drivers/scsi/ipr.c                  |  109 +
 drivers/scsi/ips.c                  |    2 
 drivers/scsi/jazz_esp.c             |   19 
 drivers/scsi/lpfc/lpfc.h            |   40 
 drivers/scsi/lpfc/lpfc_attr.c       |  162 +-
 drivers/scsi/lpfc/lpfc_crtn.h       |   28 
 drivers/scsi/lpfc/lpfc_ct.c         |   74 -
 drivers/scsi/lpfc/lpfc_disc.h       |   19 
 drivers/scsi/lpfc/lpfc_els.c        |  772 +++++++----
 drivers/scsi/lpfc/lpfc_hbadisc.c    |  492 +++----
 drivers/scsi/lpfc/lpfc_hw.h         |   65 
 drivers/scsi/lpfc/lpfc_init.c       |  265 ++-
 drivers/scsi/lpfc/lpfc_mbox.c       |   33 
 drivers/scsi/lpfc/lpfc_nportdisc.c  |  374 +++--
 drivers/scsi/lpfc/lpfc_scsi.c       |   25 
 drivers/scsi/lpfc/lpfc_scsi.h       |    5 
 drivers/scsi/lpfc/lpfc_sli.c        |  385 +++--
 drivers/scsi/lpfc/lpfc_sli.h        |    5 
 drivers/scsi/lpfc/lpfc_version.h    |    6 
 drivers/scsi/ncr53c8xx.c            |  127 -
 drivers/scsi/ncr53c8xx.h            |   37 
 drivers/scsi/osst.c                 |   24 
 drivers/scsi/qla2xxx/qla_def.h      |    6 
 drivers/scsi/qla2xxx/qla_gbl.h      |    2 
 drivers/scsi/qla2xxx/qla_isr.c      |    7 
 drivers/scsi/qla2xxx/qla_mbx.c      |    4 
 drivers/scsi/qla2xxx/qla_os.c       |   89 -
 drivers/scsi/qla2xxx/qla_sup.c      |    4 
 drivers/scsi/scsi.c                 |    6 
 drivers/scsi/scsi_debug.c           |    9 
 drivers/scsi/scsi_ioctl.c           |    3 
 drivers/scsi/scsi_lib.c             |   76 -
 drivers/scsi/scsi_scan.c            |  100 -
 drivers/scsi/scsi_sysfs.c           |    4 
 drivers/scsi/scsi_transport_fc.c    |    9 
 drivers/scsi/scsi_transport_iscsi.c |    3 
 drivers/scsi/scsi_transport_sas.c   |  258 +++
 drivers/scsi/scsi_transport_spi.c   |   83 -
 drivers/scsi/sd.c                   |   11 
 drivers/scsi/sg.c                   |   16 
 drivers/scsi/sr.c                   |    5 
 drivers/scsi/sr_ioctl.c             |    6 
 drivers/scsi/st.c                   |   29 
 drivers/scsi/sym53c8xx_2/sym_hipd.c |   53 
 include/linux/workqueue.h           |    6 
 include/scsi/scsi.h                 |    2 
 include/scsi/scsi_cmnd.h            |   20 
 include/scsi/scsi_device.h          |   16 
 include/scsi/scsi_transport_sas.h   |   22 
 include/scsi/scsi_transport_spi.h   |    4 
 kernel/workqueue.c                  |   29 
 72 files changed, 3444 insertions(+), 4107 deletions(-)


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 23:06                           ` Andrew Morton
@ 2006-03-06 23:24                             ` Jesper Juhl
  2006-03-07  0:17                               ` Linus Torvalds
  2006-03-07  3:15                               ` Mike Christie
  0 siblings, 2 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-06 23:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: torvalds, linux-kernel, markhe, andrea, michaelc, James.Bottomley,
	axboe, penberg

On 3/7/06, Andrew Morton <akpm@osdl.org> wrote:
> "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
> >
> > And since 2.6.16-rc5-git8 is not experiencing problems I'd suggest you
> >  perhaps instead take a look at what's in -mm... That's where we need
> >  to work (it seems) to find the bug...
>
> Yes, it's very probably something in git-scsi-misc.
>
I would say that's correct. I just built 2.6.16-rc5-mm2 with just
git-scsi-misc.patch reverted, and that makes the problem go away.

So now the big question is; what part(s) of git-scsi-misc is broken?

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 23:24                             ` Jesper Juhl
@ 2006-03-07  0:17                               ` Linus Torvalds
  2006-03-07  0:25                                 ` Jesper Juhl
  2006-03-07  3:15                               ` Mike Christie
  1 sibling, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2006-03-07  0:17 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Andrew Morton, linux-kernel, markhe, andrea, michaelc,
	James.Bottomley, axboe, penberg



On Tue, 7 Mar 2006, Jesper Juhl wrote:
>
> On 3/7/06, Andrew Morton <akpm@osdl.org> wrote:
> > "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
> > >
> > > And since 2.6.16-rc5-git8 is not experiencing problems I'd suggest you
> > >  perhaps instead take a look at what's in -mm... That's where we need
> > >  to work (it seems) to find the bug...
> >
> > Yes, it's very probably something in git-scsi-misc.
> >
> I would say that's correct. I just built 2.6.16-rc5-mm2 with just
> git-scsi-misc.patch reverted, and that makes the problem go away.

Ok. I was kind of hoping that it was just a more reliable case of the 
corruption that Andrew had been seeing too (which seems to be hard to 
trigger in mainline too, but might exist there).

> So now the big question is; what part(s) of git-scsi-misc is broken?

Well, its origin is actually a git tree, so you could try the "git bisect" 
approach using the

	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git

tree that the patch comes from..

		Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-07  0:17                               ` Linus Torvalds
@ 2006-03-07  0:25                                 ` Jesper Juhl
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-07  0:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, linux-kernel, markhe, andrea, michaelc,
	James.Bottomley, axboe, penberg

On 3/7/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Tue, 7 Mar 2006, Jesper Juhl wrote:
> >
> > On 3/7/06, Andrew Morton <akpm@osdl.org> wrote:
> > > "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
> > > >
> > > > And since 2.6.16-rc5-git8 is not experiencing problems I'd suggest you
> > > >  perhaps instead take a look at what's in -mm... That's where we need
> > > >  to work (it seems) to find the bug...
> > >
> > > Yes, it's very probably something in git-scsi-misc.
> > >
> > I would say that's correct. I just built 2.6.16-rc5-mm2 with just
> > git-scsi-misc.patch reverted, and that makes the problem go away.
>
> Ok. I was kind of hoping that it was just a more reliable case of the
> corruption that Andrew had been seeing too (which seems to be hard to
> trigger in mainline too, but might exist there).
>
> > So now the big question is; what part(s) of git-scsi-misc is broken?
>
> Well, its origin is actually a git tree, so you could try the "git bisect"
> approach using the
>
>         git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
>
> tree that the patch comes from..
>

I'll give that a go tomorrow - right now I need to get some sleep.
If there are other things to try, then just drop me a mail and I'll
test it tomorrow.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 23:24                             ` Jesper Juhl
  2006-03-07  0:17                               ` Linus Torvalds
@ 2006-03-07  3:15                               ` Mike Christie
  2006-03-07  3:20                                 ` Linus Torvalds
  1 sibling, 1 reply; 53+ messages in thread
From: Mike Christie @ 2006-03-07  3:15 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Andrew Morton, torvalds, linux-kernel, markhe, andrea,
	James.Bottomley, axboe, penberg

Jesper Juhl wrote:
> On 3/7/06, Andrew Morton <akpm@osdl.org> wrote:
> 
>>"Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>>
>>>And since 2.6.16-rc5-git8 is not experiencing problems I'd suggest you
>>> perhaps instead take a look at what's in -mm... That's where we need
>>> to work (it seems) to find the bug...
>>
>>Yes, it's very probably something in git-scsi-misc.
>>
> 
> I would say that's correct. I just built 2.6.16-rc5-mm2 with just
> git-scsi-misc.patch reverted, and that makes the problem go away.
> 
> So now the big question is; what part(s) of git-scsi-misc is broken?
> 

Is it related to this change?

diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c
index 5d02ff4..03fbc4b 100644
--- a/drivers/scsi/sr_ioctl.c
+++ b/drivers/scsi/sr_ioctl.c
@@ -44,11 +44,10 @@ static int sr_read_tochdr(struct cdrom_d
         int result;
         unsigned char *buffer;

-       buffer = kmalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
+       buffer = kzalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
         if (!buffer)
                 return -ENOMEM;

-       memset(&cgc, 0, sizeof(struct packet_command));
         cgc.timeout = IOCTL_TIMEOUT;
         cgc.cmd[0] = GPCMD_READ_TOC_PMA_ATIP;
         cgc.cmd[8] = 12;                /* LSB of length */
@@ -74,11 +73,10 @@ static int sr_read_tocentry(struct cdrom
         int result;
         unsigned char *buffer;

-       buffer = kmalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
+       buffer = kzalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
         if (!buffer)
                 return -ENOMEM;

-       memset(&cgc, 0, sizeof(struct packet_command));




When someone converted the *buffer* allocation to kzalloc they
also removed the memset for the *packet_command* struct.

The

memset(&cgc, 0, sizeof(struct packet_command));

should be added back I think.
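
To illustrate the point, here is a small user-space sketch, not the kernel code. It assumes cgc is an on-stack struct, as the &cgc in the diff suggests; the demo_* names, the struct layout and the timeout value are made up for the example. A zeroing allocator only clears the memory it hands back, so the command struct still needs its own memset before individual fields are set:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct packet_command; not the kernel layout. */
struct demo_packet_command {
	unsigned char cmd[12];
	unsigned char *buffer;
	unsigned int buflen;
	int timeout;
};

/*
 * Prepare a READ TOC style command. The caller's struct may hold
 * arbitrary stack contents, so clear it before setting any field.
 */
void demo_prepare(struct demo_packet_command *cgc,
		  unsigned char *buffer, unsigned int buflen)
{
	assert(cgc != NULL);

	memset(cgc, 0, sizeof(*cgc));	/* the step the cleanup dropped */
	cgc->timeout = 30;		/* placeholder timeout value */
	cgc->cmd[0] = 0x43;		/* READ TOC/PMA/ATIP opcode */
	cgc->cmd[8] = 12;		/* LSB of length */
	cgc->buffer = buffer;
	cgc->buflen = buflen;
}

/* Returns 1 if every cmd[] byte that demo_prepare() leaves unset is zero. */
int demo_cmd_is_clean(const struct demo_packet_command *cgc)
{
	size_t i;

	for (i = 0; i < sizeof(cgc->cmd); i++) {
		if (i == 0 || i == 8)
			continue;
		if (cgc->cmd[i] != 0)
			return 0;
	}
	return 1;
}
```

Without the memset in demo_prepare(), the bytes of cmd[] that are never assigned would keep whatever was on the stack, regardless of how the data buffer was allocated.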

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-07  3:15                               ` Mike Christie
@ 2006-03-07  3:20                                 ` Linus Torvalds
  2006-03-07 18:01                                   ` James Bottomley
  0 siblings, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2006-03-07  3:20 UTC (permalink / raw)
  To: Mike Christie
  Cc: Jesper Juhl, Andrew Morton, linux-kernel, markhe, andrea,
	James.Bottomley, axboe, penberg



On Mon, 6 Mar 2006, Mike Christie wrote:
> -       buffer = kmalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
> +       buffer = kzalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
>         if (!buffer)
>                 return -ENOMEM;
> 
> -       memset(&cgc, 0, sizeof(struct packet_command));
...
> -       buffer = kmalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
> +       buffer = kzalloc(32, GFP_KERNEL | SR_GFP_DMA(cd));
>         if (!buffer)
>                 return -ENOMEM;
> 
> -       memset(&cgc, 0, sizeof(struct packet_command));

> When someone converted the *buffer* allocation to kzalloc they
> also removed the memset for the *packet_command* struct.
> 
> The
> 
> memset(&cgc, 0, sizeof(struct packet_command));
> 
> should be added back I think.

Good eyes. I bet that's it.

		Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-07  3:20                                 ` Linus Torvalds
@ 2006-03-07 18:01                                   ` James Bottomley
  2006-03-07 19:40                                     ` Jesper Juhl
  0 siblings, 1 reply; 53+ messages in thread
From: James Bottomley @ 2006-03-07 18:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Christie, Jesper Juhl, Andrew Morton, linux-kernel, markhe,
	andrea, axboe, penberg

On Mon, 2006-03-06 at 19:20 -0800, Linus Torvalds wrote:
> > should be added back I think.
> 
> Good eyes. I bet that's it.

Yes, well done.  Do we have confirmation yet that reversing this fixes
the bug?

I think a full reversal is in order, since buffer is a quantity being
written to, there's no point in zeroing it.

[Note to self: must do better in checking janitors patches]

James



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 22:34                     ` Linus Torvalds
  2006-03-06 22:52                       ` Jesper Juhl
  2006-03-06 22:54                       ` Linus Torvalds
@ 2006-03-07 19:28                       ` Bill Davidsen
  2 siblings, 0 replies; 53+ messages in thread
From: Bill Davidsen @ 2006-03-07 19:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley, Jens Axboe,
	Pekka Enberg

Linus Torvalds wrote:

> has been incorrectly translated for several reasons:
> 
>  - we shouldn't check "cachep->num > offslab_limit". We should check just 
>    "num > offslab_limit" (cachep->num is the _previous_ number we tested).
> 
>  - when we do "break", we've already incremented "gfporder", and we should 
>    correct for that.
> 
> Now, maybe I'm just off my rocker again (I've certainly been batting 0.000 
> so far, even if I think I've been finding real bugs). So who knows. But I 
> get the feeling that that patch is broken.

I thought stumbling over bugs while looking for other things was part of 
the new development model ;-)
> 
> Either revert it, or try this (TOTALLY UNTESTED!!!) patch..
> 
> And hey, maybe I'm just crazy.

Being crazy and being right are not mutually exclusive.
-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-07 18:01                                   ` James Bottomley
@ 2006-03-07 19:40                                     ` Jesper Juhl
  0 siblings, 0 replies; 53+ messages in thread
From: Jesper Juhl @ 2006-03-07 19:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linus Torvalds, Mike Christie, Andrew Morton, linux-kernel,
	markhe, andrea, axboe, penberg

On 3/7/06, James Bottomley <James.Bottomley@steeleye.com> wrote:
> On Mon, 2006-03-06 at 19:20 -0800, Linus Torvalds wrote:
> > > should be added back I think.
> >
> > Good eyes. I bet that's it.
>
> Yes, well done.  Do we have confirmation yet that reversing this fixes
> the bug?
>

I just tried reverting that bit only from 2.6.16-rc5-mm2, and it does
indeed fix the problem.

Thanks for spotting that Mike.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
@ 2006-03-08  6:25 Chuck Ebbert
  2006-03-08  8:32 ` Nick Piggin
  0 siblings, 1 reply; 53+ messages in thread
From: Chuck Ebbert @ 2006-03-08  6:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lee Schermerhorn, Mike Christie, Jesper Juhl, Andrew Morton,
	linux-kernel, James Bottomley

In-Reply-To: <Pine.LNX.4.64.0603061917330.3573@g5.osdl.org>

On Mon, 6 Mar 2006 19:20:13 -0800, Linus Torvalds wrote:

> > When someone converted the *buffer* allocation to kzalloc they
> > also removed the memset for the *packet_command* struct.
> > 
> > The
> > 
> > memset(&cgc, 0, sizeof(struct packet_command));
> > 
> > should be added back I think.
> 
> Good eyes. I bet that's it.

Heh.  This exact fix was posted to linux-kernel by Lee Schermerhorn
three weeks ago:

 Date: Wed, 15 Feb 2006 14:07:37 -0500
 From: Lee Schermerhorn <lee.schermerhorn@hp.com>
 Subject: [PATCH] 2.6.16-rc3-mm1 - restore zeroing of packet_command
        struct  in sr_ioctl.c
 To: linux-kernel <linux-kernel@vger.kernel.org>
 Cc: Andrew Morton <akpm@osdl.org>
 Message-ID: <1140030457.6619.3.camel@localhost.localdomain>


-- 
Chuck
"Penguins don't come from next door, they come from the Antarctic!"


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-08  6:25 Slab corruption in 2.6.16-rc5-mm2 Chuck Ebbert
@ 2006-03-08  8:32 ` Nick Piggin
  2006-03-08  8:46   ` Andrew Morton
  0 siblings, 1 reply; 53+ messages in thread
From: Nick Piggin @ 2006-03-08  8:32 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Linus Torvalds, Lee Schermerhorn, Mike Christie, Jesper Juhl,
	Andrew Morton, linux-kernel, James Bottomley

Chuck Ebbert wrote:
> In-Reply-To: <Pine.LNX.4.64.0603061917330.3573@g5.osdl.org>
> 
> On Mon, 6 Mar 2006 19:20:13 -0800, Linus Torvalds wrote:
> 
> 
>>>When someone converted the *buffer* allocation to kzalloc they
>>>also removed the memset for the *packet_command* struct.
>>>
>>>The
>>>
>>>memset(&cgc, 0, sizeof(struct packet_command));
>>>
>>>should be added back I think.
>>
>>Good eyes. I bet that's it.
> 
> 
> Heh.  This exact fix was posted to linux-kernel by Lee Schermerhorn
> three weeks ago:
> 
>  Date: Wed, 15 Feb 2006 14:07:37 -0500
>  From: Lee Schermerhorn <lee.schermerhorn@hp.com>
>  Subject: [PATCH] 2.6.16-rc3-mm1 - restore zeroing of packet_command
>         struct  in sr_ioctl.c
>  To: linux-kernel <linux-kernel@vger.kernel.org>
>  Cc: Andrew Morton <akpm@osdl.org>
>  Message-ID: <1140030457.6619.3.camel@localhost.localdomain>
> 
> 

It isn't Andrew's job to make sure a patch gets to the right place
until it is safely in -mm, and even then he's not always going to
know the severity and importance unless he's told.

If it was a patch to "restore" a regression in behaviour, CCs should
at least have gone to the author of the patch that broke it, and the
subsystem maintainers / list / etc as well.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-08  8:32 ` Nick Piggin
@ 2006-03-08  8:46   ` Andrew Morton
  2006-03-08  9:02     ` Nick Piggin
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2006-03-08  8:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: 76306.1226, torvalds, lee.schermerhorn, michaelc, jesper.juhl,
	linux-kernel, James.Bottomley

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> Chuck Ebbert wrote:
> > In-Reply-To: <Pine.LNX.4.64.0603061917330.3573@g5.osdl.org>
> > 
> > On Mon, 6 Mar 2006 19:20:13 -0800, Linus Torvalds wrote:
> > 
> > 
> >>>When someone converted the *buffer* allocation to kzalloc they
> >>>also removed the memset for the *packet_command* struct.
> >>>
> >>>The
> >>>
> >>>memset(&cgc, 0, sizeof(struct packet_command));
> >>>
> >>>should be added back I think.
> >>
> >>Good eyes. I bet that's it.
> > 
> > 
> > Heh.  This exact fix was posted to linux-kernel by Lee Schermerhorn
> > three weeks ago:
> > 
> >  Date: Wed, 15 Feb 2006 14:07:37 -0500
> >  From: Lee Schermerhorn <lee.schermerhorn@hp.com>
> >  Subject: [PATCH] 2.6.16-rc3-mm1 - restore zeroing of packet_command
> >         struct  in sr_ioctl.c
> >  To: linux-kernel <linux-kernel@vger.kernel.org>
> >  Cc: Andrew Morton <akpm@osdl.org>
> >  Message-ID: <1140030457.6619.3.camel@localhost.localdomain>
> > 
> > 
> 
> It isn't Andrew's job to make sure a patch gets to the right place
> until it is safely in -mm, and even then he's not always going to
> know the severity and importance unless he's told.

Is too!

> If it was a patch to "restore" a regression in behaviour, CCs should
> at least have gone to the author of the patch that broke it, and the
> subsystem maintainers / list / etc as well.

I actually merged Lee's patch into -mm, copied James on it and then I
dropped it when I saw that it spat rejects against an updated version of
James's tree, assuming that it had been merged.

Often I'll check that a patch reverts successfully from the upstream tree
before dropping it, but for an obvious one like that I guess I didn't
bother, and assumed that James had taken it.  Only he hadn't - instead he'd
gone and merged something else, hence the rejects.   Oh well.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-08  8:46   ` Andrew Morton
@ 2006-03-08  9:02     ` Nick Piggin
  2006-03-08  9:12       ` Andrew Morton
  0 siblings, 1 reply; 53+ messages in thread
From: Nick Piggin @ 2006-03-08  9:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: 76306.1226, torvalds, lee.schermerhorn, michaelc, jesper.juhl,
	linux-kernel, James.Bottomley

Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 

>>It isn't Andrew's job to make sure a patch gets to the right place
>>until it is safely in -mm, and even then he's not always going to
>>know the severity and importance unless he's told.
> 
> 
> Is too!
> 

OK, partially. As this case illustrates, everybody makes mistakes and
you obviously can't go back and verify you got all the patches.

The guy who hits the bug and/or writes the patch can easily see it is
still not merged and shout.

> 
>>If it was a patch to "restore" a regression in behaviour, CCs should
>>at least have gone to the author of the patch that broke it, and the
>>subsystem maintainers / list / etc as well.
> 
> 
> I actually merged Lee's patch into -mm, copied James on it and then I
> dropped it when I saw that it spat rejects against an updated version of
> James's tree, assuming that it had been merged.
> 
> Often I'll check that a patch reverts successfully from the upstream tree
> before dropping it, but for an obvious one like that I guess I didn't
> bother, and assumed that James had taken it.  Only he hadn't - instead he'd
> gone and merged something else, hence the rejects.   Oh well.
> 

You do a great job, but "push the work out to the end nodes", right?
That's how we get this network to scale. It is trivial for people to
verify their important patches have propagated as the release approaches.

(A little harder for part-timers who aren't in the loop about exactly
when the release will happen, thanks to our -ridiculous-count release
system, but still easy compared with your having to double check
everything).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-08  9:02     ` Nick Piggin
@ 2006-03-08  9:12       ` Andrew Morton
  2006-03-08  9:23         ` Nick Piggin
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2006-03-08  9:12 UTC (permalink / raw)
  To: Nick Piggin
  Cc: 76306.1226, torvalds, lee.schermerhorn, michaelc, jesper.juhl,
	linux-kernel, James.Bottomley

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > Often I'll check that a patch reverts successfully from the upstream tree
>  > before dropping it, but for an obvious one like that I guess I didn't
>  > bother, and assumed that James had taken it.  Only he hadn't - instead he'd
>  > gone and merged something else, hence the rejects.   Oh well.
>  > 
> 
>  You do a great job, but "push the work out to the end nodes", right?
>  That's how we get this network to scale. It is trivial for people to
>  verify their important patches have propagated as the release approaches.
> 
>  (A little harder for part-timers who aren't in the loop about exactly
>  when the release will happen, thanks to our -ridiculous-count release
>  system, but still easy compared with your having to double check
>  everything).

Well yes, Lee sent the fix to the guy who he got the kernel release from in
the reasonable expectation that I'd take care of getting it to where it
needed to be.

Problem is, a) I screwed up, b) James screwed up and c) someone just
happened to change those few lines of code in that place within a few-day
window.

That triple-combo doesn't happen very often.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-08  9:12       ` Andrew Morton
@ 2006-03-08  9:23         ` Nick Piggin
  2006-03-08 14:35           ` Lee Schermerhorn
  0 siblings, 1 reply; 53+ messages in thread
From: Nick Piggin @ 2006-03-08  9:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: 76306.1226, torvalds, lee.schermerhorn, michaelc, jesper.juhl,
	linux-kernel, James.Bottomley

Andrew Morton wrote:

> Well yes, Lee sent the fix to the guy who he got the kernel release from in
> the reasonable expectation that I'd take care of getting it to where it
> needed to be.
> 
> Problem is, a) I screwed up, b) James screwed up and c) someone just
> happened to change those few lines of code in that place within a few-day
> window.
> 
> That triple-combo doesn't happen very often.
> 

I guess what I'm advocating isn't foolproof either: the guy who wrote
the patch might die (knock on wood) ;)

Carry on.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-08  9:23         ` Nick Piggin
@ 2006-03-08 14:35           ` Lee Schermerhorn
  0 siblings, 0 replies; 53+ messages in thread
From: Lee Schermerhorn @ 2006-03-08 14:35 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, 76306.1226, torvalds, michaelc, jesper.juhl,
	linux-kernel, James.Bottomley

On Wed, 2006-03-08 at 20:23 +1100, Nick Piggin wrote:
> Andrew Morton wrote:
> 
> > Well yes, Lee sent the fix to the guy who he got the kernel release from in
> > the reasonable expectation that I'd take care of getting it to where it
> > needed to be.
> > 
> > Problem is, a) I screwed up, b) James screwed up and c) someone just
> > happened to change those few lines of code in that place within a few-day
> > window.
> > 
> > That triple-combo doesn't happen very often.
> > 
> 
> I guess what I'm advocating isn't foolproof either: the guy who wrote
> the patch might die (knock on wood) ;)

Thanks, Nick.  I'll remember that...  See you at OLS?

Lee

P.S. ;-)


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-06 21:55               ` Dave Jones
  2006-03-06 21:57                 ` Jesper Juhl
@ 2006-03-09 15:50                 ` Martin J. Bligh
  2006-03-09 15:54                   ` Martin J. Bligh
                                     ` (2 more replies)
  1 sibling, 3 replies; 53+ messages in thread
From: Martin J. Bligh @ 2006-03-09 15:50 UTC (permalink / raw)
  To: Dave Jones
  Cc: Jesper Juhl, Jens Axboe, Linus Torvalds,
	Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

Dave Jones wrote:
> On Mon, Mar 06, 2006 at 10:41:07PM +0100, Jesper Juhl wrote:
> 
>  > CONFIG_DEBUG_SLAB
>  > CONFIG_PAGE_OWNER
>  > CONFIG_DEBUG_VM
>  > CONFIG_DEBUG_PAGEALLOC
>  > 
>  > The resulting kernel boots and runs just fine (no Oops) and leaves
>  > nothing in dmesg.
>  > So, without the debugging options it appears to the user that
>  > everything is OK - nasty.
> 
> DEBUG_PAGEALLOC in particular is *fantastic* at making bugs hide.
> I've lost many an hour trying to pin bugs down due to that.

Is this backwards? We're saying DEBUG_PAGEALLOC is bad?

OK, what I'm going to try to do, given the recent comments re
CONFIG_DEBUG_SLAB and also PAGEALLOC is to arrange with Andy to run
a debug kernel as well as a normal kernel for every test, and then
we can publish the results on http://test.kernel.org. For now it'll be
a separate matrix, until I work out how to fold the 3d cube nicely
into 2d - I know I have to do that anyway, so no big deal.
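
(Illustration, not part of the original message.) One way to picture folding the "3d cube" into 2d is to treat each result as a point keyed by kernel, machine and config, then merge two of the axes into a single column label. The sketch below is only an assumption about how that could look; the kernel names, machine names, outcomes and the `fold` helper are all hypothetical and do not describe how test.kernel.org actually stores or renders results.

```python
from collections import defaultdict

# Hypothetical outcomes keyed by (kernel, machine, config).
# None of these values come from a real test run.
results = {
    ("2.6.16-rc5-mm2", "box1", "normal"): "PASS",
    ("2.6.16-rc5-mm2", "box1", "debug"): "FAIL",
    ("2.6.16-rc5-mm2", "box2", "normal"): "PASS",
    ("2.6.16-rc5", "box1", "normal"): "PASS",
    ("2.6.16-rc5", "box1", "debug"): "PASS",
}


def fold(results):
    """Fold a (kernel, machine, config) cube into a 2-D table.

    Rows are kernels; columns are "machine/config" pairs.
    Cells with no recorded result are shown as "-".
    """
    columns = sorted({f"{machine}/{config}" for (_, machine, config) in results})
    rows = defaultdict(dict)
    for (kernel, machine, config), outcome in results.items():
        rows[kernel][f"{machine}/{config}"] = outcome
    table = {
        kernel: [rows[kernel].get(column, "-") for column in columns]
        for kernel in sorted(rows)
    }
    return columns, table


columns, table = fold(results)
print("kernel".ljust(16), *columns)
for kernel, cells in table.items():
    print(kernel.ljust(16), *cells)
```

Merging machine and config into one axis keeps the debug and normal runs side by side for each machine, which is one plausible reading of "fold"; merging kernel and config instead would work equally well.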

Do we NOT want to have DEBUG_SLAB and DEBUG_PAGEALLOC both enabled?
Running multiple permutations is going to get really painful on the
systems involved. Any other requests for what gets enabled (I really
want to just stick to one 'debug' setup if possible).

I have no idea why I didn't do this a year ago <slaps self>.

M.


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-09 15:50                 ` Martin J. Bligh
@ 2006-03-09 15:54                   ` Martin J. Bligh
  2006-03-09 15:54                   ` Benjamin LaHaise
  2006-03-09 16:08                   ` Linus Torvalds
  2 siblings, 0 replies; 53+ messages in thread
From: Martin J. Bligh @ 2006-03-09 15:54 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Dave Jones, Jesper Juhl, Jens Axboe, Linus Torvalds,
	Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

>> DEBUG_PAGEALLOC in particular is *fantastic* at making bugs hide.
>> I've lost many an hour trying to pin bugs down due to that.
> 
> 
> Is this backwards? We're saying DEBUG_PAGEALLOC is bad?
> 
> OK, what I'm going to try to do, given the recent comments re
> CONFIG_DEBUG_SLAB and also PAGEALLOC is to arrange with Andy to run
> a debug kernel as well as a normal kernel for every test, and then
> we can publish the results on http://test.kernel.org. For now it'll be
> a separate matrix, until I work out how to fold the 3d cube nicely
> into 2d - I know I have to do that anyway, so no big deal.
> 
> Do we NOT want to have DEBUG_SLAB and DEBUG_PAGEALLOC both enabled?
> Running multiple permutations is going to get really painful on the
> systems involved. Any other requests for what gets enabled (I really
> want to just stick to one 'debug' setup if possible).
> 
> I have no idea why I didn't do this a year ago <slaps self>.

Oh, and I ported CONFIG_DEBUG_PAGEALLOC to x86_64 last week. Will send
out the patch later today.

M.


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-09 15:50                 ` Martin J. Bligh
  2006-03-09 15:54                   ` Martin J. Bligh
@ 2006-03-09 15:54                   ` Benjamin LaHaise
  2006-03-09 16:04                     ` Martin J. Bligh
  2006-03-09 16:08                   ` Linus Torvalds
  2 siblings, 1 reply; 53+ messages in thread
From: Benjamin LaHaise @ 2006-03-09 15:54 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Dave Jones, Jesper Juhl, Jens Axboe, Linus Torvalds,
	Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On Thu, Mar 09, 2006 at 07:50:15AM -0800, Martin J. Bligh wrote:
> Do we NOT want to have DEBUG_SLAB and DEBUG_PAGEALLOC both enabled?
> Running multiple permutations is going to get really painful on the
> systems involved. Any other requests for what gets enabled (I really
> want to just stick to one 'debug' setup if possible).

Debug kernels are incredibly slow, making hitting certain races next to 
impossible.  By all means non-DEBUG kernels should definitely be getting 
tested.

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-09 15:54                   ` Benjamin LaHaise
@ 2006-03-09 16:04                     ` Martin J. Bligh
  0 siblings, 0 replies; 53+ messages in thread
From: Martin J. Bligh @ 2006-03-09 16:04 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Dave Jones, Jesper Juhl, Jens Axboe, Linus Torvalds,
	Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

Benjamin LaHaise wrote:
> On Thu, Mar 09, 2006 at 07:50:15AM -0800, Martin J. Bligh wrote:
> 
>>Do we NOT want to have DEBUG_SLAB and DEBUG_PAGEALLOC both enabled?
>>Running multiple permutations is going to get really painful on the
>>systems involved. Any other requests for what gets enabled (I really
>>want to just stick to one 'debug' setup if possible).
> 
> 
> Debug kernels are incredibly slow, making hitting certain races next to 
> impossible.  By all means non-DEBUG kernels should definitely be getting 
> tested.

It'd still run a totally non-debug kernel as well - I want that for the
perf tests etc anyway. I guess the question is whether the debug kernels
should have most of the DEBUG_* turned on, or just a select few.

M.


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-09 15:50                 ` Martin J. Bligh
  2006-03-09 15:54                   ` Martin J. Bligh
  2006-03-09 15:54                   ` Benjamin LaHaise
@ 2006-03-09 16:08                   ` Linus Torvalds
  2006-03-09 16:41                     ` Dave Jones
  2 siblings, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2006-03-09 16:08 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Dave Jones, Jesper Juhl, Jens Axboe, Linux Kernel Mailing List,
	Andrew Morton, markhe, Andrea Arcangeli, Mike Christie,
	James Bottomley



On Thu, 9 Mar 2006, Martin J. Bligh wrote:
> > 
> > DEBUG_PAGEALLOC in particular is *fantastic* at making bugs hide.
> > I've lost many an hour trying to pin bugs down due to that.
> 
> Is this backwards? We're saying DEBUG_PAGEALLOC is bad?

DEBUG_PAGEALLOC is great for finding the really stupid kinds of bugs, and 
it's definitely worth doing every once in a while.

However, DEBUG_PAGEALLOC makes many things orders of magnitude slower, and 
it eats memory like mad (because it turns some slabs into whole pages - 
but it still doesn't help small allocation debugging that much). So unlike 
DEBUG_SLAB, it's not reasonable to have it on all the time.
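
(Illustration, not part of the original message.) The memory cost of turning a slab object into a whole page can be estimated with simple arithmetic. The sketch below assumes 4096-byte pages (as on i386) and a hypothetical 256-byte object; which object sizes actually get a page of their own depends on the allocator and is not stated in this thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i386


def page_overhead(object_size, page_size=PAGE_SIZE):
    """Return how many times more memory an object uses once it owns a whole page."""
    if not 0 < object_size <= page_size:
        raise ValueError("object_size must be between 1 and page_size")
    return page_size / object_size


# A hypothetical 256-byte object that is given its own page.
print(page_overhead(256))
```

Under these assumptions every such object costs a full page instead of a fraction of one, which is consistent with the "eats memory like mad" remark above.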

IOW, DEBUG_SLAB is something that a distro kernel can reasonably enable 
for users by default (I think fedora-devel does, for example). In 
contrast, DEBUG_PAGEALLOC is more of a "useful for special cases" thing, 
where you want to validate that there's nothing _obviously_ bad going on.

> Do we NOT want to have DEBUG_SLAB and DEBUG_PAGEALLOC both enabled?

I suspect that once DEBUG_PAGEALLOC is on, whether you do DEBUG_SLAB or 
not is a toss-up. The interesting cases tend to be

 - neither: usable for benchmarking
 - DEBUG_SLAB: perfectly usable for normal work
 - DEBUG_PAGEALLOC (with or without DEBUG_SLAB): debugging tool only

At least that's my opinion, maybe others have other experiences.
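
(Illustration, not part of the original message.) The three cases above can be written down as config profiles. This is a minimal sketch: the profile names and the `config_fragment` helper are hypothetical, and the "debug" profile arbitrarily picks the "with DEBUG_SLAB" variant. Only the two CONFIG_ symbols come from the thread.

```python
# Hypothetical profile names mapped to the two options discussed in the thread.
PROFILES = {
    "benchmark": {"CONFIG_DEBUG_SLAB": False, "CONFIG_DEBUG_PAGEALLOC": False},
    "everyday": {"CONFIG_DEBUG_SLAB": True, "CONFIG_DEBUG_PAGEALLOC": False},
    # "with or without DEBUG_SLAB": this sketch chooses "with".
    "debug": {"CONFIG_DEBUG_SLAB": True, "CONFIG_DEBUG_PAGEALLOC": True},
}


def config_fragment(profile):
    """Render one profile as .config-style lines, sorted by symbol name."""
    lines = []
    for symbol, enabled in sorted(PROFILES[profile].items()):
        lines.append(f"{symbol}=y" if enabled else f"# {symbol} is not set")
    return "\n".join(lines)


for name in PROFILES:
    print(f"--- {name} ---")
    print(config_fragment(name))
```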

			Linus


* Re: Slab corruption in 2.6.16-rc5-mm2
  2006-03-09 16:08                   ` Linus Torvalds
@ 2006-03-09 16:41                     ` Dave Jones
  0 siblings, 0 replies; 53+ messages in thread
From: Dave Jones @ 2006-03-09 16:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin J. Bligh, Jesper Juhl, Jens Axboe,
	Linux Kernel Mailing List, Andrew Morton, markhe,
	Andrea Arcangeli, Mike Christie, James Bottomley

On Thu, Mar 09, 2006 at 08:08:52AM -0800, Linus Torvalds wrote:

 > IOW, DEBUG_SLAB is something that a distro kernel can reasonably enable 
 > for users by default (I think fedora-devel does, for example).

Correct.

 > - neither: usable for benchmarking
 > - DEBUG_SLAB: perfectly usable for normal work
 > - DEBUG_PAGEALLOC (with or without DEBUG_SLAB): debugging tool only
 >
 > At least that's my opinion, maybe others have other experiences.

That's pretty much my experience.  I get people screaming at me when
I enable PAGEALLOC for a day or so in the Fedora-devel kernel if I'm
chasing something :)

		Dave

-- 
http://www.codemonkey.org.uk

