[PATCH as445] Fix reference to deallocated memory in sd.c

public inbox for linux-scsi@vger.kernel.org

* [PATCH as445] Fix reference to deallocated memory in sd.c
From: Alan Stern @ 2005-01-14 16:05 UTC
  To: James Bottomley
  Cc: David Brownell, USB development list, SCSI development list

James:

This patch of yours:

http://linux-scsi.bkbits.net:8080/scsi-for-linus-2.6/cset@1.2034.95.5?nav=index.html|src/|src/drivers|src/drivers/scsi|related/drivers/scsi/sd.c

 --- 1.162/drivers/scsi/sd.c	2005-01-14 07:53:59 -08:00
 +++ 1.163/drivers/scsi/sd.c	2005-01-14 07:53:59 -08:00
 @@ -198,8 +198,8 @@
  static void scsi_disk_put(struct scsi_disk *sdkp)
  {
  	down(&sd_ref_sem);
 -	scsi_device_put(sdkp->device);
  	kref_put(&sdkp->kref, scsi_disk_release);
 +	scsi_device_put(sdkp->device);
  	up(&sd_ref_sem);
  }

is causing almost as much trouble as it fixed.  If kref_put() drops the
last reference to the scsi_disk (this happens when the device file is
closed after the device has been hot-unplugged) then the call to
scsi_device_put() will take its argument from an area of memory that has
been deallocated.

The patch below should fix things.

Alan Stern



Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

===== drivers/scsi/sd.c 1.75 vs edited =====
--- 1.75/drivers/scsi/sd.c	2004-12-22 11:18:12 -05:00
+++ edited/drivers/scsi/sd.c	2005-01-14 11:01:14 -05:00
@@ -197,9 +197,11 @@
 
 static void scsi_disk_put(struct scsi_disk *sdkp)
 {
+	struct scsi_device *sdev = sdkp->device;
+
 	down(&sd_ref_sem);
 	kref_put(&sdkp->kref, scsi_disk_release);
-	scsi_device_put(sdkp->device);
+	scsi_device_put(sdev);
 	up(&sd_ref_sem);
 }
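The ordering problem can be modelled in userspace. In this sketch the types and functions (model_disk, model_sdev, model_disk_kref_put(), model_sdev_put()) are hypothetical stand-ins for struct scsi_disk, struct scsi_device, kref_put() and scsi_device_put(); they are not the kernel API. The point is only the order of operations in model_disk_put(): the device pointer is copied into a local while the disk object is still alive, as the patch does with sdev.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the kernel kref/scsi API. */
struct model_sdev {
	int refcount;
};

struct model_disk {
	int refcount;
	struct model_sdev *device;
};

static int sdev_puts;	/* how many times model_sdev_put() ran */

static void model_sdev_put(struct model_sdev *sdev)
{
	sdev_puts++;
	if (--sdev->refcount == 0)
		free(sdev);
}

/* Drops one disk reference; frees the disk if it was the last one. */
static void model_disk_kref_put(struct model_disk *disk)
{
	if (--disk->refcount == 0)
		free(disk);
}

/* Fixed ordering, as in the patch: read disk->device while 'disk' is
 * still guaranteed to be valid, then drop the disk reference. */
static void model_disk_put(struct model_disk *disk)
{
	struct model_sdev *sdev = disk->device;

	model_disk_kref_put(disk);	/* may free 'disk' */
	model_sdev_put(sdev);		/* never dereferences 'disk' */
}

/* Simulates the last close after hot-unplug: the caller holds the last
 * reference to both objects. Returns the number of model_sdev_put()
 * calls made, or -1 if allocation failed. */
int simulate_last_close(void)
{
	struct model_sdev *sdev = malloc(sizeof(*sdev));
	struct model_disk *disk = malloc(sizeof(*disk));

	if (sdev == NULL || disk == NULL) {
		free(sdev);
		free(disk);
		return -1;
	}
	sdev->refcount = 1;
	disk->refcount = 1;
	disk->device = sdev;

	sdev_puts = 0;
	model_disk_put(disk);	/* frees disk, then sdev */
	return sdev_puts;
}
```

simulate_last_close() frees both objects with a single device put and never reads the freed disk. Moving the disk->device read after model_disk_kref_put() would reintroduce the use-after-free, which a tool such as Valgrind would report.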


 
On Wed, 12 Jan 2005, David Brownell wrote:

> I didn't realize FC3 was mounting this drive, else I might have done
> things differently ... but I think everyone will agree that oopsing
> is not OK.  See the following dmesg trace.
> 
> I've seen a lot of messages about similar failures lately, as if
> maybe more distros are automounting removable drives.  But I also
> remember seeing a lot of fixes go by; does this oops have a fix?
> 
> - Dave
> 
> 
> ============================================================================
> 
> 	Connect drive to NEC EHCI

> 	Erm, FC3 must be automatically mounting this for me.
> 	I didn't ask it to, but I suppose that could be OK ...

> 	... except that when I then unplug the drive ...

> 	... then things go completely haywire ...
> 
> lost page write due to I/O error on sda1
> Unable to handle kernel paging request at virtual address 6b6b6b6b
>  printing eip:
> c027169b
> *pde = 00000000
> Oops: 0000 [#1]
> SMP 
> Modules linked in: usb_storage ohci_hcd ehci_hcd
> CPU:    1
> EIP:    0060:[<c027169b>]    Not tainted VLI
> EFLAGS: 00010286   (2.6.11-rc1-helium) 
> EIP is at scsi_device_put+0x7/0x48
> eax: 0000000f   ebx: 6b6b6b6b   ecx: 00000000   edx: c14efbdc
> esi: cd3f2360   edi: d12b3148   ebp: cd241ee4   esp: cd241ee0
> ds: 007b   es: 007b   ss: 0068
> Process umount (pid: 3497, threadinfo=cd240000 task=ccd71ac0)
> Stack: cd3f2360 cd241efc c0278852 6b6b6b6b cd3f2360 c0279e4e cd2f05b8 cd241f10 
>        c0278bf2 cd3f2360 c14df02c c14df02c cd241f34 c01519c3 c14df0a0 00000000 
>        c14df0a0 00000000 00000000 dfc695d4 d12b3148 cd241f54 c0151a69 c14df02c 
> Call Trace:
>  [<c0102f63>] show_stack+0x74/0x7c
>  [<c0103077>] show_registers+0xf4/0x15e
>  [<c010323e>] die+0xd8/0x157
>  [<c010fe9a>] do_page_fault+0x43d/0x5cc
>  [<c0102c2b>] error_code+0x2b/0x30
>  [<c0278852>] scsi_disk_put+0x38/0x4d
>  [<c0278bf2>] sd_release+0x46/0x4f
>  [<c01519c3>] blkdev_put+0x69/0x137
>  [<c0151a69>] blkdev_put+0x10f/0x137
>  [<c014fdc7>] deactivate_super+0x59/0x78
>  [<c0161ffa>] sys_umount+0x6b/0x73
>  [<c0102155>] sysenter_past_esp+0x52/0x75
> Code: 06 8d 04 02 ff 80 00 01 00 00 eb 0d 56 e8 3a 8c fd ff ba fa ff ff ff eb 02 31 d2 8d 65 f8 89 d0 5b 5e c9 c3 55 89 e5 53 8b 5d 08 <8b> 03 8b 40 74 8b 10 85 d2 74 26 b8 00 e0 ff ff 21 e0 8b 40 10 
>  <7>hub 2-0:1.0: debounce: port 5: total 100ms stable 100ms status 0x100
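In the trace, scsi_device_put() faults reading address 0x6b6b6b6b, and the Code: bytes appear to load its argument into ebx (also 6b6b6b6b) just before dereferencing it. That is consistent with slab poisoning: with slab debugging (CONFIG_DEBUG_SLAB) enabled, freed objects are filled with the byte 0x6b (POISON_FREE), so a pointer read from the already-freed scsi_disk comes back as 0x6b6b6b6b on a 32-bit machine. The userspace sketch below only models that effect, assuming slab debugging was on; the struct is hypothetical and memset() stands in for the allocator's poisoning.

```c
#include <stdint.h>
#include <string.h>

/* Byte the Linux slab debugging code writes into freed objects
 * (POISON_FREE); modelled here, not taken from a kernel header. */
#define MODEL_POISON_FREE 0x6b

/* Hypothetical 32-bit layout: the first word held a pointer, modelled
 * as uint32_t so the result does not depend on the host word size. */
struct model_obj {
	uint32_t device_ptr;
	uint32_t other;
};

/* Poison the object as if it had been freed, then read the "pointer"
 * back the way the buggy code did. */
uint32_t stale_pointer_after_free(void)
{
	struct model_obj obj = { 0x12345678u, 0 };

	memset(&obj, MODEL_POISON_FREE, sizeof(obj));
	return obj.device_ptr;	/* every byte is now 0x6b */
}
```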




* [PATCH as448] Fix reference to deallocated memory in sr.c
From: Alan Stern @ 2005-01-18 14:56 UTC
  To: James Bottomley; +Cc: SCSI development list

James:

When I posted a patch last week to fix a reference to deallocated memory 
in sd.c, I forgot to check whether the same problem exists in sr.c.  It 
does, and here's the patch to fix it.

Alan Stern



Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

===== drivers/scsi/sr.c 1.78 vs edited =====
--- 1.78/drivers/scsi/sr.c	2005-01-11 11:57:28 -05:00
+++ edited/drivers/scsi/sr.c	2005-01-18 09:53:50 -05:00
@@ -152,9 +152,11 @@
 
 static inline void scsi_cd_put(struct scsi_cd *cd)
 {
+	struct scsi_device *sdev = cd->device;
+
 	down(&sr_ref_sem);
 	kref_put(&cd->kref, sr_kref_release);
-	scsi_device_put(cd->device);
+	scsi_device_put(sdev);
 	up(&sr_ref_sem);
 }
 



* Re: [PATCH as448] Fix reference to deallocated memory in sr.c
From: James Bottomley @ 2005-01-18 15:10 UTC
  To: Alan Stern; +Cc: SCSI Mailing List

On Tue, 2005-01-18 at 09:56 -0500, Alan Stern wrote:
> When I posted a patch last week to fix a reference to deallocated memory 
> in sd.c, I forgot to check whether the same problem exists in sr.c.  It 
> does, and here's the patch to fix it.

Yes, I already caught that in the scsi-rc-fixes-2.6 tree (although I
haven't synchronised it yet).

James




* Help decoding: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
From: Guy @ 2005-01-18 20:55 UTC
  Cc: 'SCSI Mailing List'

Can anyone help decode this info?

What is 0x25e6e3?
What disk is sd08:b1?

I have disks on 3 SCSI buses (scsi0, scsi2 and scsi3).
Do you need more info?

Thanks,
Guy

kernel: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
kernel: Additional sense indicates Recovered data with error corr. & retries applied


* Re: Help decoding: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
From: Matthias Andree @ 2005-01-18 21:08 UTC
  To: Guy

"Guy" <bugzilla@watkins-home.com> writes:

> Can anyone help decode this info?
>
> What is 0x25e6e3?
> What disk is sd08:b1?

/dev/sdl1 (ess dee ell one) - that's hexadecimal notation for a device
with major 8, minor 0xb1 = 177:

$ ls -l /dev/sd* |grep " 8, 177"
brw-rw----  1 root disk   8, 177 2004-10-02 10:38 /dev/sdl1
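The arithmetic behind that answer can be sketched as follows, assuming the classic static layout for block major 8: 16 minors per disk (sda through sdp), so minor / 16 picks the disk letter and minor % 16 the partition, with 0 meaning the whole disk. The helper name sd_name_from_kdev() is hypothetical, and other sd majors are deliberately not handled.

```c
#include <stdio.h>

/* Map a "sdMM:mm" kernel name (both fields hex) to a /dev name.
 * Assumes major 8 with 16 minors per disk. Returns 0 on success,
 * -1 if the name is not understood. */
int sd_name_from_kdev(const char *kname, char *out, size_t outlen)
{
	unsigned int major, minor, part;
	char disk;

	if (sscanf(kname, "sd%2x:%2x", &major, &minor) != 2 || major != 8)
		return -1;

	disk = (char)('a' + minor / 16);	/* 0xb1 / 16 = 11 -> 'l' */
	part = minor % 16;			/* 0xb1 % 16 = 1 */

	if (part == 0)
		snprintf(out, outlen, "/dev/sd%c", disk);
	else
		snprintf(out, outlen, "/dev/sd%c%u", disk, part);
	return 0;
}
```

For "sd08:b1" this yields "/dev/sdl1", matching the ls output above; "sd08:00" yields "/dev/sda".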

> kernel: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
> kernel: Additional sense indicates Recovered data with error corr. & retries applied

Time to check and possibly replace the drive, or at least refresh the
block.

smartmontools (on SourceForge) and perhaps badblocks or Jörg Schilling's
sformat (careful!) may help you with that.

-- 
Matthias Andree
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: Help decoding: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
From: Guy @ 2005-01-18 23:32 UTC
  To: 'Matthias Andree'; +Cc: 'SCSI Mailing List'

Good info.  Thanks!
I could not find the answer with Google.  Too much noise!

Is 0x25e6e3 the block number?

If it is, is it relative to the beginning of sdl1, or sdl?

If not, what is it?

Thanks,
Guy



* Re: Help decoding: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
From: Douglas Gilbert @ 2005-01-19  1:22 UTC
  To: Guy; +Cc: 'Matthias Andree', 'SCSI Mailing List'

Guy wrote:
> Good info.  Thanks!
> I could not find the answer with google.  Too much noise!
> 
> Is 0x25e6e3 the block number?

Yes (logical block number expressed in hex)

> If it is, is it relative to the beginning of sdl1, or sdl?

/dev/sdl
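The kernel message is a decoded view of the sense data the drive returned. A sketch of that decoding, assuming fixed-format sense data (response code 0x70 for "Current", 0x71 for deferred errors) as described in SPC; decode_fixed_sense() and struct fixed_sense are hypothetical names. In the log, ASC/ASCQ 0x18/0x01 is SPC's "Recovered data with error correction & retries applied".

```c
#include <stddef.h>
#include <stdint.h>

struct fixed_sense {
	int current;		/* 1 for 0x70 ("Current"), 0 for 0x71 */
	int info_valid;		/* VALID bit: INFORMATION field meaningful */
	uint8_t sense_key;	/* 0x1 = RECOVERED ERROR */
	uint32_t info;		/* here: LBA counted from the start of the disk */
	uint8_t asc, ascq;	/* additional sense code and qualifier */
};

/* Returns 0 on success, -1 if buf is not fixed-format sense data. */
int decode_fixed_sense(const uint8_t *buf, size_t len, struct fixed_sense *s)
{
	uint8_t code;

	if (len < 14)
		return -1;
	code = buf[0] & 0x7f;
	if (code != 0x70 && code != 0x71)
		return -1;

	s->current = (code == 0x70);
	s->info_valid = (buf[0] & 0x80) != 0;
	s->sense_key = buf[2] & 0x0f;
	s->info = ((uint32_t)buf[3] << 24) | ((uint32_t)buf[4] << 16) |
		  ((uint32_t)buf[5] << 8) | buf[6];	/* big-endian */
	s->asc = buf[12];
	s->ascq = buf[13];
	return 0;
}
```

Fed an 18-byte buffer reconstructed from the logged fields (not captured from the drive): f0 00 01 00 25 e6 e3 0a, four zero bytes, 18 01, and four more zero bytes, it reports sense key 0x1, INFORMATION 0x25e6e3 = 2483939 and ASC/ASCQ 0x18/0x01. At 512 bytes per block that LBA starts 1271776768 bytes into /dev/sdl; to get a sector within /dev/sdl1, subtract the partition's start sector (for example as listed by 'fdisk -lu /dev/sdl').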

> If not, what is it?

Looking at the settings of the "Read-Write Error Recovery"
mode page on /dev/sdl may be instructive. ['sginfo -e /dev/sdl'
from sg3_utils.] The PER bit seems to be set (otherwise a
recovered error should not have been reported) but the ARRE
and AWRE bits are probably clear. Those bits control the
automatic reassignment of a block (ARRE on reads, AWRE on
writes) when a recovered error occurs, as reported in your case.
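A sketch of the bits in question, assuming the SBC layout of byte 2 of the Read-Write Error Recovery mode page (page code 0x01). The macro and function names are hypothetical. sginfo -e reads and decodes this page for you; actually changing it means writing the page back with MODE SELECT, which this sketch does not do. The example value 0x84 (AWRE and PER set, ARRE clear) is only an assumed input.

```c
#include <stdint.h>

/* Byte 2 of the Read-Write Error Recovery mode page (SBC). */
#define RWER_AWRE 0x80	/* automatic write reallocation enabled */
#define RWER_ARRE 0x40	/* automatic read reallocation enabled */
#define RWER_TB   0x20	/* transfer block */
#define RWER_RC   0x10	/* read continuous */
#define RWER_EER  0x08	/* enable early recovery */
#define RWER_PER  0x04	/* post error: report recovered errors */
#define RWER_DTE  0x02	/* disable transfer on error */
#define RWER_DCR  0x01	/* disable correction */

/* Returns byte 2 with ARRE and AWRE set, other bits unchanged. */
uint8_t rwer_enable_reallocation(uint8_t byte2)
{
	return (uint8_t)(byte2 | RWER_ARRE | RWER_AWRE);
}

/* 1 if recovered errors are reported (PER set) but a block recovered
 * during a read is not reallocated (ARRE clear), else 0. */
int rwer_reports_but_keeps_read_errors(uint8_t byte2)
{
	return (byte2 & RWER_PER) && !(byte2 & RWER_ARRE);
}
```

With 0x84, rwer_reports_but_keeps_read_errors() returns 1; after rwer_enable_reallocation() the byte becomes 0xc4 and the check returns 0.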

Assuming the problem occurred on a read and that ARRE
is clear, you may want to reassign that block. To
check its current state you might try:
  sg_dd if=/dev/sdl skip=0x25e6e3 of=. bs=512 count=1 blk_sgio=1

If that recovered error persists (or worse), then rather than
formatting the disk, reassigning that block is the more surgical
fix. sg_reassign has been added to sg3_utils recently (v1.12 beta
at www.torque.net/sg) to do this. In your case:
  sg_reassign -a 0x25e6e3 /dev/sdl
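For the curious, a sketch of what a REASSIGN BLOCKS request for this one LBA looks like, assuming the SBC short-format parameter list (LONGLBA = 0, LONGLIST = 0, 4-byte LBAs). This is not sg_reassign's source: build_reassign_one() is a hypothetical helper that only fills the buffers. Issuing the command (for example through the SG_IO ioctl) is deliberately left out, since a reassignment permanently adds the block to the drive's grown defect list.

```c
#include <stdint.h>
#include <string.h>

/* Fill a 6-byte REASSIGN BLOCKS CDB and an 8-byte parameter list for a
 * single LBA. Returns the parameter list length in bytes. */
size_t build_reassign_one(uint32_t lba, uint8_t cdb[6], uint8_t param[8])
{
	memset(cdb, 0, 6);
	cdb[0] = 0x07;		/* REASSIGN BLOCKS operation code */

	memset(param, 0, 8);
	/* Header: bytes 0-1 reserved, bytes 2-3 defect list length
	 * (big-endian), here one 4-byte entry. */
	param[3] = 4;
	/* Defect list: one big-endian 4-byte LBA. */
	param[4] = (uint8_t)(lba >> 24);
	param[5] = (uint8_t)(lba >> 16);
	param[6] = (uint8_t)(lba >> 8);
	param[7] = (uint8_t)lba;
	return 8;
}
```

For 0x25e6e3 the parameter list is 00 00 00 04 00 25 e6 e3.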

If successful the replaced sector should go into the
"grown" defect list ('sginfo -G /dev/sdl'). This utility
may be worth trying before and after the sg_reassign.

Another way to accomplish the same thing is to set
the ARRE bit (and the AWRE bit while you are at it) and do
another read of that block. The reported additional
sense message should change to something like "Recovered
data: data auto-reallocated". Reading the whole disk
might be wise (to see if that LBA was a lone case).

More generally, this is not a good sign concerning the
health of that disk. No data has been lost _yet_, but the drive
had to work hard to recover it. Any entries in the "grown"
defect list are not a good sign. Also, with smartmontools
you might like to try 'smartctl -a /dev/sdl', examine
the "Error counter log" and compare it with that of some of
your other drives that are not reporting problems. A long
self test may also be appropriate: 'smartctl -t long /dev/sdl'.

Doug Gilbert



* RE: Help decoding: Info fld=0x25e6e3, Current sd08:b1: sense key Recovered Error
From: Guy @ 2005-01-19  4:32 UTC
  To: dougg; +Cc: 'Matthias Andree', 'SCSI Mailing List'

Lots of good info!  Thanks.
I have installed sg3_utils, cool stuff.
I knew about AWRE and ARRE.  AWRE is on, ARRE is off.
I do plan to turn on ARRE for all of my disks.
I can't reproduce these errors, so I guess they were write errors that were
relocated.  I was hoping to find a reproducible error, then turn ARRE on
and "see" the error get corrected.  You had the same idea, using 'sginfo -G
/dev/sdl' to verify the error was corrected.

I run a read test on all my disks every night.  It is required, IMO, since
md kicks a disk out of the array for a single bad block.  I want to find the
bad blocks before md finds them.  But since I started my nightly disk tests,
I have had no bad blocks.  It is as if ARRE were on, but it is not.

Anyway, thanks for the good info.

Guy


