Re: 2.6.17-mm5

public inbox for linux-scsi@vger.kernel.org

* Re: 2.6.17-mm5
       [not found] ` <20060701142419.GB28750@tlg.swandive.local>
@ 2006-07-01 21:30   ` Andrew Morton
  2006-07-01 22:26     ` 2.6.17-mm5 James Bottomley
                       ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Andrew Morton @ 2006-07-01 21:30 UTC
  To: Grant Wilson; +Cc: linux-kernel, Neil Brown, linux-scsi

On Sat, 1 Jul 2006 15:24:19 +0100
Grant Wilson <grant.wilson@zen.co.uk> wrote:

> More RAID1 problems - OOPS on shutdown.

Thanks.  Please copy the mailing lists on these reports - I'm not an MD,
SCSI or SATA developer, and this is in their area.

> [   37.482699] md: Autodetecting RAID arrays.
> [   37.547908] md: autorun ...
> [   37.566449] md: considering sdb2 ...
> [   37.589664] md:  adding sdb2 ...
> [   37.610757] md:  adding sda2 ...
> [   37.632116] md: created md1
> [   37.650587] md: bind<sda2>
> [   37.668571] md: bind<sdb2>
> [   37.686541] md: running: <sdb2><sda2>
> [   37.710807] raid1: raid set md1 active with 2 out of 2 mirrors
> [   37.747557] md: ... autorun DONE.
> [   37.784444] EXT3-fs: INFO: recovery required on readonly filesystem.
> [   37.824275] EXT3-fs: write access will be enabled during recovery.
> [   38.814113] kjournald starting.  Commit interval 5 seconds
> [   38.848761] EXT3-fs: sdc1: orphan cleanup on readonly fs
> [   38.985436] EXT3-fs: sdc1: 7 orphan inodes deleted
> [   39.015845] EXT3-fs: recovery complete.
> [   39.072168] EXT3-fs: mounted filesystem with ordered data mode.
> [   44.693986] Adding 995988k swap on /dev/sda1.  Priority:-1 extents:1 across:995988k
> [   44.744558] Adding 995988k swap on /dev/sdb1.  Priority:-2 extents:1 across:995988k
> [   44.966034] EXT3 FS on sdc1, internal journal
> [   49.305350] device-mapper: ioctl: 4.8.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> [   64.091331] raid1: Disk failure on sdb2, disabling device. 
> [   64.091333] 	Operation continuing on 1 devices
> [   64.212624] RAID1 conf printout:
> [   64.233951]  --- wd:1 rd:2
> [   64.252195]  disk 0, wo:0, o:1, dev:sda2
> [   64.277712]  disk 1, wo:1, o:0, dev:sdb2
> [   64.305627] RAID1 conf printout:
> [   64.326977]  --- wd:1 rd:2
> [   64.345220]  disk 0, wo:0, o:1, dev:sda2
> [

Which device drivers are being used for these disks?

> [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> [  155.292808] CPU 0 
> [  155.304968] Modules linked in: dm_mod evdev
> [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> [  155.966085] Call Trace:
> [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> [  156.542083] 
> [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 

The barrier code is in there again.

mddev->pers is NULL in md_error(), so the test of
!mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
now being exposed by the new barrier-handling problem.


This should get you further, but...

From: Andrew Morton <akpm@osdl.org>

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/md/md.c |    2 ++
 1 file changed, 2 insertions(+)

diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
--- a/drivers/md/md.c~md-oops-workaround
+++ a/drivers/md/md.c
@@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
 		__builtin_return_address(0),__builtin_return_address(1),
 		__builtin_return_address(2),__builtin_return_address(3));
 */
+	if (!mddev->pers)
+		return;
 	if (!mddev->pers->error_handler)
 		return;
 	mddev->pers->error_handler(mddev,rdev);
_
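
The fault address in the oops is consistent with this diagnosis: CR2 is 0x48 while RAX is 0, which is what loading a member at offset 0x48 through a NULL pointer would produce. Below is a minimal userspace sketch of the guarded check. The struct names and the 0x48 bytes of padding are hypothetical stand-ins chosen only so the handler lands at that offset; they are not the real mddev_t or personality layout.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the md structures. The padding is an
 * assumption made only so the member offset matches the oops. */
struct sketch_rdev {
	int faulty;
};

struct sketch_mddev;

struct sketch_pers {
	char pad[0x48];
	void (*error_handler)(struct sketch_mddev *, struct sketch_rdev *);
};

struct sketch_mddev {
	struct sketch_pers *pers;	/* NULL once the array has been stopped */
};

/* Address a load of ->error_handler would fault at if pers were NULL. */
static size_t sketch_fault_offset(void)
{
	return offsetof(struct sketch_pers, error_handler);
}

/* Same order of checks as the workaround.
 * Returns 1 if the handler ran, 0 if the call was skipped. */
static int sketch_md_error(struct sketch_mddev *mddev, struct sketch_rdev *rdev)
{
	if (!mddev->pers)
		return 0;
	if (!mddev->pers->error_handler)
		return 0;
	mddev->pers->error_handler(mddev, rdev);
	return 1;
}

/* Example handler: just records that the device was marked faulty. */
static void sketch_mark_faulty(struct sketch_mddev *mddev, struct sketch_rdev *rdev)
{
	(void)mddev;
	rdev->faulty = 1;
}
```

Calling sketch_md_error() with pers == NULL returns without touching the handler, which is all the two added lines do. It does not explain why md_error() is reached after the array has been torn down.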



* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
       [not found] ` <20060701181455.GA16412@aitel.hist.no>
@ 2006-07-01 22:22   ` Andrew Morton
  2006-07-01 22:52     ` Jeff Garzik
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Andrew Morton @ 2006-07-01 22:22 UTC
  To: Helge Hafting
  Cc: linux-kernel, linux-scsi, Neil Brown, Reuben Farrelly,
	Grant Wilson

On Sat, 1 Jul 2006 20:14:55 +0200
Helge Hafting <helgehaf@aitel.hist.no> wrote:

> I  just got mm5 up, and it has the same problem as mm4.
> Raid-1 does not work. I used 2.6.16 to resync my raids,
> and booted into 2.6.17-mm5.
> 
> [...]
> md:  adding sda2 ...
> md: created md0
> md: bind<sda2>
> md: bind<sdb1>
> md: running: <sdb1><sda2>
> raid1: raid set md0 active with 2 out of 2 mirrors
> md: ... autorun DONE.
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
>   Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:0: Attached scsi removable disk sdf
> sd 3:0:0:0: Attached scsi generic sg5 type 0
>   Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:1: Attached scsi removable disk sdg
> sd 3:0:0:1: Attached scsi generic sg6 type 0
>   Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:2: Attached scsi removable disk sdh
> sd 3:0:0:2: Attached scsi generic sg7 type 0
>   Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:3: Attached scsi removable disk sdi
> sd 3:0:0:3: Attached scsi generic sg8 type 0
> usb-storage: device scan complete
> loadkeys[2214]: segfault at 00000000000005a0 rip 00002b22e169feea rsp 00007fffc973c478 error 4
> Adding 1000424k swap on /dev/sda6.  Priority:1 extents:1 across:1000424k
> EXT3 FS on sdd1, internal journal
> raid1: Disk failure on sda2, disabling device. 
>         Operation continuing on 1 devices
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 0, wo:1, o:0, dev:sda2
>  disk 1, wo:0, o:1, dev:sdb1
> RAID1 conf printout:
>  --- wd:1 rd:2
> 
> ...
>
> As we see, the md devices are assembled, then the filesystems are
> mounted and swap turned on.  Then all three md devices fail a 
> partition at the same time.  Somehow, I don't believe that
> is correct. ;-)
> 

I assume this is still the broken-barriers bug.  Thanks for all the help on
this, guys.  More is to be asked for, I'm afraid.

I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
and git-libata-all trees have been omitted.  It's at 

http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2

(That's a diff against 2.6.17)

If that kernel works, then the next step is to test

http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2

which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.

Thanks.


* Re: 2.6.17-mm5
  2006-07-01 21:30   ` 2.6.17-mm5 Andrew Morton
@ 2006-07-01 22:26     ` James Bottomley
  2006-07-01 22:32       ` 2.6.17-mm5 Neil Brown
  2006-07-01 22:29     ` More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ] Neil Brown
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2006-07-01 22:26 UTC
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> On Sat, 1 Jul 2006 15:24:19 +0100
> Grant Wilson <grant.wilson@zen.co.uk> wrote:
> 
> > More RAID1 problems - OOPS on shutdown.

Actually, is there any more of the trace, like what was going on just
before the oops?

It looks very like a lifetime issue (i.e. md thinks the array is dead
and has torn it down, but there's still an outstanding command).  It
would be nice to know what the outstanding command might have been.

James


> Thanks.  Please copy the mailing lists on these reports - I'm not an MD,
> SCSI or SATA developer, and this is in their area.
> 
> > [   37.482699] md: Autodetecting RAID arrays.
> > [   37.547908] md: autorun ...
> > [   37.566449] md: considering sdb2 ...
> > [   37.589664] md:  adding sdb2 ...
> > [   37.610757] md:  adding sda2 ...
> > [   37.632116] md: created md1
> > [   37.650587] md: bind<sda2>
> > [   37.668571] md: bind<sdb2>
> > [   37.686541] md: running: <sdb2><sda2>
> > [   37.710807] raid1: raid set md1 active with 2 out of 2 mirrors
> > [   37.747557] md: ... autorun DONE.
> > [   37.784444] EXT3-fs: INFO: recovery required on readonly filesystem.
> > [   37.824275] EXT3-fs: write access will be enabled during recovery.
> > [   38.814113] kjournald starting.  Commit interval 5 seconds
> > [   38.848761] EXT3-fs: sdc1: orphan cleanup on readonly fs
> > [   38.985436] EXT3-fs: sdc1: 7 orphan inodes deleted
> > [   39.015845] EXT3-fs: recovery complete.
> > [   39.072168] EXT3-fs: mounted filesystem with ordered data mode.
> > [   44.693986] Adding 995988k swap on /dev/sda1.  Priority:-1 extents:1 across:995988k
> > [   44.744558] Adding 995988k swap on /dev/sdb1.  Priority:-2 extents:1 across:995988k
> > [   44.966034] EXT3 FS on sdc1, internal journal
> > [   49.305350] device-mapper: ioctl: 4.8.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> > [   64.091331] raid1: Disk failure on sdb2, disabling device. 
> > [   64.091333] 	Operation continuing on 1 devices
> > [   64.212624] RAID1 conf printout:
> > [   64.233951]  --- wd:1 rd:2
> > [   64.252195]  disk 0, wo:0, o:1, dev:sda2
> > [   64.277712]  disk 1, wo:1, o:0, dev:sdb2
> > [   64.305627] RAID1 conf printout:
> > [   64.326977]  --- wd:1 rd:2
> > [   64.345220]  disk 0, wo:0, o:1, dev:sda2
> > [
> 
> Which device drivers are being used for these disks?
> 
> > [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> > [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> > [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> > [  155.292808] CPU 0 
> > [  155.304968] Modules linked in: dm_mod evdev
> > [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> > [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> > [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> > [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> > [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> > [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> > [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> > [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> > [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> > [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> > [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> > [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> > [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> > [  155.966085] Call Trace:
> > [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> > [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> > [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> > [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> > [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> > [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> > [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> > [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> > [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> > [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> > [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> > [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> > [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> > [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> > [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> > [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> > [  156.542083] 
> > [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 
> 
> The barrier code is in there again.
> 
> mddev->pers is NULL in md_error(), so the test of
> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 
> 
> This should get you further, but...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/md/md.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
> --- a/drivers/md/md.c~md-oops-workaround
> +++ a/drivers/md/md.c
> @@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
>  		__builtin_return_address(0),__builtin_return_address(1),
>  		__builtin_return_address(2),__builtin_return_address(3));
>  */
> +	if (!mddev->pers)
> +		return;
>  	if (!mddev->pers->error_handler)
>  		return;
>  	mddev->pers->error_handler(mddev,rdev);
> _
> 



* More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ]
  2006-07-01 21:30   ` 2.6.17-mm5 Andrew Morton
  2006-07-01 22:26     ` 2.6.17-mm5 James Bottomley
@ 2006-07-01 22:29     ` Neil Brown
  2006-07-01 22:54     ` 2.6.17-mm5 Jeff Garzik
  2006-07-27 21:02     ` 2.6.17-mm5 Ming Zhang
  3 siblings, 0 replies; 20+ messages in thread
From: Neil Brown @ 2006-07-01 22:29 UTC
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, linux-scsi

On Saturday July 1, akpm@osdl.org wrote:
> 
> mddev->pers is NULL in md_error(), so the test of
> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 

Yes, this is a real MD bug which would hit whenever writing a
superblock fails during array-shutdown.  I guess that has never
happened before!  The workaround you propose is probably as good as
any, but I'll think through it some more and see.

It seems that superblock writes are always failing in some
configurations at the moment!

I wonder what we *should* do when writing to the superblock on the
last device of a raid1 fails... maybe switch the array to read-only?
I'll have a think about that too.

But we need to find out why barrier-writes are returning EIO.
Hopefully Reuben's testing will shed some light.

NeilBrown


* Re: 2.6.17-mm5
  2006-07-01 22:26     ` 2.6.17-mm5 James Bottomley
@ 2006-07-01 22:32       ` Neil Brown
  2006-07-01 22:56         ` 2.6.17-mm5 Jeff Garzik
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2006-07-01 22:32 UTC
  To: James Bottomley; +Cc: Andrew Morton, Grant Wilson, linux-kernel, linux-scsi

On Saturday July 1, James.Bottomley@SteelEye.com wrote:
> On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> > On Sat, 1 Jul 2006 15:24:19 +0100
> > Grant Wilson <grant.wilson@zen.co.uk> wrote:
> > 
> > > More RAID1 problems - OOPS on shutdown.
> 
> Actually, is there any more of the trace, like what was going on just
> before the oops?
> 
> It looks very like a lifetime issue (i.e. md thinks the array is dead
> and has torn it down, but there's still an outstanding command).  It
> would be nice to know what the outstanding command might have been.

md writes the superblock after tearing down the array, which is
admittedly a bit careless.

The problem seems to be simply that on some hardware at least,
BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.

NeilBrown
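
The EIO also appears to be visible in the register dump of the original oops, assuming the error code is passed as the third integer argument and so travels in RDX under the usual x86-64 calling convention. RDX and R15 both hold 00000000fffffffb, which read as a signed 32-bit value is -5. Below is a small userspace sketch of that decoding; the helper name is made up for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Read the low 32 bits of a register value as a signed int, the way
 * an 'int error' argument would be interpreted by the callee. */
static int sketch_reg_to_errno(uint64_t reg)
{
	uint32_t low = (uint32_t)reg;

	/* Two's-complement decode of the low 32 bits. */
	if (low & 0x80000000u)
		return -(int)(~low + 1u);
	return (int)low;
}
```

Feeding it 0xfffffffb gives -EIO on Linux, which matches the failure described for the barrier writes.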


* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:22   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Andrew Morton
@ 2006-07-01 22:52     ` Jeff Garzik
  2006-07-01 22:58       ` Andrew Morton
  2006-07-02  4:43     ` Reuben Farrelly
  2006-07-02  5:13     ` Reuben Farrelly
  2 siblings, 1 reply; 20+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:52 UTC
  To: Andrew Morton
  Cc: Helge Hafting, linux-kernel, linux-scsi, Neil Brown,
	Reuben Farrelly, Grant Wilson

On Sat, Jul 01, 2006 at 03:22:58PM -0700, Andrew Morton wrote:
> Helge Hafting <helgehaf@aitel.hist.no> wrote:
> > kjournald starting.  Commit interval 5 seconds
> > EXT3-fs: mounted filesystem with ordered data mode.
> >   Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > sd 3:0:0:0: Attached scsi removable disk sdf
> > sd 3:0:0:0: Attached scsi generic sg5 type 0
> >   Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > sd 3:0:0:1: Attached scsi removable disk sdg
> > sd 3:0:0:1: Attached scsi generic sg6 type 0
> >   Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > sd 3:0:0:2: Attached scsi removable disk sdh
> > sd 3:0:0:2: Attached scsi generic sg7 type 0
> >   Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00

> I assume this is still the broken-barriers bug.  Thanks for all the help on
> this, guys.  More is to be asked for, I'm afraid.
> 
> I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> and git-libata-all trees have been omitted.  It's at 

What does USB storage have to do with SATA?

	Jeff





* Re: 2.6.17-mm5
  2006-07-01 21:30   ` 2.6.17-mm5 Andrew Morton
  2006-07-01 22:26     ` 2.6.17-mm5 James Bottomley
  2006-07-01 22:29     ` More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ] Neil Brown
@ 2006-07-01 22:54     ` Jeff Garzik
  2006-07-27 21:02     ` 2.6.17-mm5 Ming Zhang
  3 siblings, 0 replies; 20+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:54 UTC
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, Jul 01, 2006 at 02:30:47PM -0700, Andrew Morton wrote:
> Grant Wilson <grant.wilson@zen.co.uk> wrote:
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 

Also, would be nice to re-test without preempt.

Disabling preempt _continues_ to fix (bandaid?) problems...

	Jeff





* Re: 2.6.17-mm5
  2006-07-01 22:32       ` 2.6.17-mm5 Neil Brown
@ 2006-07-01 22:56         ` Jeff Garzik
  2006-07-02  0:10           ` 2.6.17-mm5 James Bottomley
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:56 UTC
  To: Neil Brown
  Cc: James Bottomley, Andrew Morton, Grant Wilson, linux-kernel,
	linux-scsi

On Sun, Jul 02, 2006 at 08:32:28AM +1000, Neil Brown wrote:
> The problem seems to be simply that on some hardware at least,
> BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.

Could be that <whatever device> is choking on FLUSH CACHE (ATA)
or SYNCHRONIZE CACHE (SCSI).

That's one possible reason why EIO may result from a barrier...

	Jeff





* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:52     ` Jeff Garzik
@ 2006-07-01 22:58       ` Andrew Morton
  0 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2006-07-01 22:58 UTC
  To: Jeff Garzik
  Cc: helgehaf, linux-kernel, linux-scsi, neilb, reuben-lkml,
	grant.wilson

On Sat, 1 Jul 2006 18:52:12 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> On Sat, Jul 01, 2006 at 03:22:58PM -0700, Andrew Morton wrote:
> > Helge Hafting <helgehaf@aitel.hist.no> wrote:
> > > kjournald starting.  Commit interval 5 seconds
> > > EXT3-fs: mounted filesystem with ordered data mode.
> > >   Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > > sd 3:0:0:0: Attached scsi removable disk sdf
> > > sd 3:0:0:0: Attached scsi generic sg5 type 0
> > >   Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > > sd 3:0:0:1: Attached scsi removable disk sdg
> > > sd 3:0:0:1: Attached scsi generic sg6 type 0
> > >   Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > > sd 3:0:0:2: Attached scsi removable disk sdh
> > > sd 3:0:0:2: Attached scsi generic sg7 type 0
> > >   Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> 
> > I assume this is still the broken-barriers bug.  Thanks for all the help on
> > this, guys.  More is to be asked for, I'm afraid.
> > 
> > I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> > and git-libata-all trees have been omitted.  It's at 
> 
> What does USB storage have to do with SATA?
> 

Please read the mailing list - several of these reports have been with
SATA.

Yes, thank you.  As this report is against usb-storage, the bug most
probably lies in git-scsi-misc.



* Re: 2.6.17-mm5
  2006-07-01 22:56         ` 2.6.17-mm5 Jeff Garzik
@ 2006-07-02  0:10           ` James Bottomley
  0 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2006-07-02  0:10 UTC
  To: Jeff Garzik
  Cc: Neil Brown, Andrew Morton, Grant Wilson, linux-kernel, linux-scsi

On Sat, 2006-07-01 at 18:56 -0400, Jeff Garzik wrote:
> On Sun, Jul 02, 2006 at 08:32:28AM +1000, Neil Brown wrote:
> > The problem seems to be simply that on some hardware at least,
> > BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.
> 
> Could be that <whatever device> is choking on FLUSH CACHE (ATA)
> or SYNCHRONIZE CACHE (SCSI).
> 
> That's one possible reason why EIO may result from a barrier...

There is no barrier implementation on SCSI (basically you can't maintain
barriers in the face of TCQ, so only depth-one devices can do it and
hence all the SCSI drivers turn it off), so it must be a FLUSH CACHE.

This one looks like it went down via prepare_flush rather than
issue_flush, so the normal error printing that issue_flush does is
skipped.  This patch should tell us what the error was on the command.

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3d04a9f..3e3e3b7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1162,7 +1162,20 @@ static int scsi_issue_flush_fn(request_q
 
 static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 {
+	int res = cmd->result;
+	struct scsi_sense_hdr sshdr;
+
 	BUG_ON(!blk_pc_request(cmd->request));
+	if (res) {
+		printk(KERN_ERR "REQ_BLOCK_PC FAILED for ");
+		__scsi_print_command(cmd->cmnd);
+		printk(KERN_ERR "FAILED\n  status = %x, message = %02x, "
+		       "host = %d, driver = %02x\n  ",
+		       status_byte(res), msg_byte(res),
+		       host_byte(res), driver_byte(res));
+		if (scsi_command_normalize_sense(cmd, &sshdr))
+			scsi_print_sense_hdr("sd", &sshdr);
+	}
 	/*
 	 * This will complete the whole command with uptodate=1 so
 	 * as far as the block layer is concerned the command completed


James
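
The four values printed by the patch are byte fields of the 32-bit cmd->result. Below is a userspace sketch of the split, assuming the conventional layout (status in bits 0-7, message in bits 8-15, host in bits 16-23, driver in bits 24-31). The type and helper names are illustrative only, and the kernel's own status_byte() macro may additionally shift the status field, which this sketch does not reproduce.

```c
#include <stdint.h>

/* Illustrative container for the four byte fields of a SCSI result. */
struct sketch_scsi_result {
	uint8_t status;		/* SCSI status byte as returned by the target */
	uint8_t msg;		/* message byte */
	uint8_t host;		/* host adapter (DID_*) code */
	uint8_t driver;		/* driver (DRIVER_*) code */
};

/* Split a 32-bit result word into its four byte fields. */
static struct sketch_scsi_result sketch_split_result(uint32_t result)
{
	struct sketch_scsi_result r;

	r.status = (uint8_t)(result & 0xff);
	r.msg    = (uint8_t)((result >> 8) & 0xff);
	r.host   = (uint8_t)((result >> 16) & 0xff);
	r.driver = (uint8_t)((result >> 24) & 0xff);
	return r;
}
```

A result of zero splits into four zero fields, which is the success case the debug output is meant to skip.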




* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:22   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Andrew Morton
  2006-07-01 22:52     ` Jeff Garzik
@ 2006-07-02  4:43     ` Reuben Farrelly
  2006-07-02  6:09       ` Andrew Morton
  2006-07-02  5:13     ` Reuben Farrelly
  2 siblings, 1 reply; 20+ messages in thread
From: Reuben Farrelly @ 2006-07-02  4:43 UTC
  To: Andrew Morton
  Cc: Helge Hafting, linux-kernel, linux-scsi, Neil Brown, Grant Wilson



On 2/07/2006 10:22 a.m., Andrew Morton wrote:
> On Sat, 1 Jul 2006 20:14:55 +0200
> Helge Hafting <helgehaf@aitel.hist.no> wrote:
> 
>> I  just got mm5 up, and it has the same problem as mm4.
>> Raid-1 does not work. I used 2.6.16 to resync my raids,
>> and booted into 2.6.17-mm5.
<snip>
>> As we see, the md devices are assembled, then the filesystems are
>> mounted and swap turned on.  Then all three md devices fail a 
>> partition at the same time.  Somehow, I don't believe that
>> is correct. ;-)
>>
> 
> I assume this is still the broken-barriers bug.  Thanks for all the help on
> this, guys.  More is to be asked for, I'm afraid.
> 
> I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> and git-libata-all trees have been omitted.  It's at 
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2
> 
> (That's a diff against 2.6.17)

Works.

> If that kernel works, then the next step is to test
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2
> 
> which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.

Works.  I'm running it now and it looks to be all fine (including the 
workaround/fix for MSI)

In both cases I rebooted twice with each kernel to be sure it wasn't a one-off.

This then must point to git-scsi-misc being implicated, if not the source.......

Reuben


* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:22   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Andrew Morton
  2006-07-01 22:52     ` Jeff Garzik
  2006-07-02  4:43     ` Reuben Farrelly
@ 2006-07-02  5:13     ` Reuben Farrelly
  2006-07-02 13:53       ` James Bottomley
  2 siblings, 1 reply; 20+ messages in thread
From: Reuben Farrelly @ 2006-07-02  5:13 UTC
  To: Andrew Morton
  Cc: Helge Hafting, linux-kernel, linux-scsi, Neil Brown, Grant Wilson


On 2/07/2006 10:22 a.m., Andrew Morton wrote:
> I assume this is still the broken-barriers bug.  Thanks for all the help on
> this, guys.  More is to be asked for, I'm afraid.
> 
> I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> and git-libata-all trees have been omitted.  It's at 
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2
> 
> (That's a diff against 2.6.17)
> 
> If that kernel works, then the next step is to test
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2
> 
> which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.

Just for kicks, after testing those two trees (see previous email) I took my 
2.6.17-mm5 without git-scsi-misc and then patched git-scsi-misc.patch back in, 
rebuilt and rebooted and noted that RAID broke again.  Reverted the patch and it 
all worked.

So I can conclude, definitely and reproducibly, that that's the one.........

reuben




* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02  4:43     ` Reuben Farrelly
@ 2006-07-02  6:09       ` Andrew Morton
  0 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2006-07-02  6:09 UTC
  To: Reuben Farrelly, James Bottomley
  Cc: helgehaf, linux-kernel, linux-scsi, neilb, grant.wilson

On Sun, 02 Jul 2006 16:43:56 +1200
Reuben Farrelly <reuben-lkml@reub.net> wrote:

> 
> 
> On 2/07/2006 10:22 a.m., Andrew Morton wrote:
> > On Sat, 1 Jul 2006 20:14:55 +0200
> > Helge Hafting <helgehaf@aitel.hist.no> wrote:
> > 
> >> I  just got mm5 up, and it has the same problem as mm4.
> >> Raid-1 does not work. I used 2.6.16 to resync my raids,
> >> and booted into 2.6.17-mm5.
> <snip>
> >> As we see, the md devices are assembled, then the filesystems are
> >> mounted and swap turned on.  Then all three md devices fail a 
> >> partition at the same time.  Somehow, I don't believe that
> >> is correct. ;-)
> >>
> > 
> > I assume this is still the broken-barriers bug.  Thanks for all the help on
> > this, guys.  More is to be asked for, I'm afraid.
> > 
> > I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> > and git-libata-all trees have been omitted.  It's at 
> > 
> > http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2
> > 
> > (That's a diff against 2.6.17)
> 
> Works.
> 
> > If that kernel works, then the next step is to test
> > 
> > http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2
> > 
> > which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.
> 
> Works.  I'm running it now and it looks to be all fine (including the 
> workaround/fix for MSI)
> 
> In both cases I rebooted twice with each kernel to be sure it wasn't a one-off.
> 
> This then must point to git-scsi-misc being implicated, if not the source.......
> 

Yep, everything points to that, thanks.


* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02  5:13     ` Reuben Farrelly
@ 2006-07-02 13:53       ` James Bottomley
  2006-07-02 14:28         ` Grant Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2006-07-02 13:53 UTC (permalink / raw)
  To: Reuben Farrelly
  Cc: Andrew Morton, Helge Hafting, linux-kernel, linux-scsi,
	Neil Brown, Grant Wilson

On Sun, 2006-07-02 at 17:13 +1200, Reuben Farrelly wrote:
> Just for kicks, after testing those two trees (see previous email) I
> took my 
> 2.6.17-mm5 without git-scsi-misc and then patched git-scsi-misc.patch
> back in, 
> rebuilt and rebooted and noted that RAID broke again.  Reverted the
> patch and it 
> all worked.
> 
> So I can conclude that definitely and reproducibly that's the
> one.........

OK, I have a theory.  I think 

[SCSI] sd/scsi_lib simplify sd_rw_intr and scsi_io_completion

failed to take into account completion of zero-length commands (which is
what a flush is).  Could you try the whole of -mm with this patch?
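
To make the theory concrete, here is a toy model of the control flow.  It
is a sketch only, not kernel code: enum outcome, complete_gated() and
complete_ungated() are made-up names, and partial completions, sense
data and requeueing are all ignored.  With a "good_bytes > 0" gate, a
flush that succeeds with zero bytes transferred never reaches the
success path and is completed as failed instead:

```c
#include <assert.h>

/* Toy outcome of a command completion; hypothetical, not a kernel type. */
enum outcome { COMPLETED_OK, COMPLETED_FAILED };

/* Gated flow: success needs a clean result AND at least one byte moved. */
static enum outcome complete_gated(unsigned int good_bytes, int result)
{
	if (good_bytes > 0 && result == 0)
		return COMPLETED_OK;
	return COMPLETED_FAILED;	/* zero-length commands land here */
}

/* Ungated flow: success depends on the result alone. */
static enum outcome complete_ungated(unsigned int good_bytes, int result)
{
	(void)good_bytes;		/* deliberately not consulted */
	return result == 0 ? COMPLETED_OK : COMPLETED_FAILED;
}

/* A successful flush moves no data: good_bytes == 0 and result == 0. */
void flush_model_selfcheck(void)
{
	assert(complete_gated(0, 0) == COMPLETED_FAILED);
	assert(complete_ungated(0, 0) == COMPLETED_OK);
}
```

In this model the two flows agree for every command that actually
transfers data; they differ only for the zero-length case.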

Thanks,

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4c4add5..3d04a9f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -855,7 +855,8 @@ static void scsi_release_buffers(struct 
  *		b) We can just use scsi_requeue_command() here.  This would
  *		   be used if we just wanted to retry, for example.
  */
-void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
+void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes,
+			unsigned int block_bytes)
 {
 	int result = cmd->result;
 	int this_count = cmd->bufflen;
@@ -920,72 +921,87 @@ void scsi_io_completion(struct scsi_cmnd
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
 	 */
-	if (good_bytes > 0) {
-		SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
-					      "%d bytes done.\n",
+	if (good_bytes >= 0) {
+		SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, %d bytes done.\n",
 					      req->nr_sectors, good_bytes));
 		SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 		if (clear_errors)
 			req->errors = 0;
+		/*
+		 * If multiple sectors are requested in one buffer, then
+		 * they will have been finished off by the first command.
+		 * If not, then we have a multi-buffer command.
+		 *
+		 * If block_bytes != 0, it means we had a medium error
+		 * of some sort, and that we want to mark some number of
+		 * sectors as not uptodate.  Thus we want to inhibit
+		 * requeueing right here - we will requeue down below
+		 * when we handle the bad sectors.
+		 */
 
-		/* A number of bytes were successfully read.  If there
-		 * is leftovers and there is some kind of error
-		 * (result != 0), retry the rest.
+		/*
+		 * If the command completed without error, then either
+		 * finish off the rest of the command, or start a new one.
 		 */
-		if (scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
+		if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
 			return;
 	}
-
-	/* good_bytes = 0, or (inclusive) there were leftovers and
-	 * result = 0, so scsi_end_request couldn't retry.
+	/*
+	 * Now, if we were good little boys and girls, Santa left us a request
+	 * sense buffer.  We can extract information from this, so we
+	 * can choose a block to remap, etc.
 	 */
 	if (sense_valid && !sense_deferred) {
 		switch (sshdr.sense_key) {
 		case UNIT_ATTENTION:
 			if (cmd->device->removable) {
-				/* Detected disc change.  Set a bit
+				/* detected disc change.  set a bit 
 				 * and quietly refuse further access.
 				 */
 				cmd->device->changed = 1;
-				scsi_end_request(cmd, 0, this_count, 1);
+				scsi_end_request(cmd, 0,
+						this_count, 1);
 				return;
 			} else {
-				/* Must have been a power glitch, or a
-				 * bus reset.  Could not have been a
-				 * media change, so we just retry the
-				 * request and see what happens.
-				 */
+				/*
+				* Must have been a power glitch, or a
+				* bus reset.  Could not have been a
+				* media change, so we just retry the
+				* request and see what happens.  
+				*/
 				scsi_requeue_command(q, cmd);
 				return;
 			}
 			break;
 		case ILLEGAL_REQUEST:
-			/* If we had an ILLEGAL REQUEST returned, then
-			 * we may have performed an unsupported
-			 * command.  The only thing this should be
-			 * would be a ten byte read where only a six
-			 * byte read was supported.  Also, on a system
-			 * where READ CAPACITY failed, we may have
-			 * read past the end of the disk.
-			 */
+			/*
+		 	* If we had an ILLEGAL REQUEST returned, then we may
+		 	* have performed an unsupported command.  The only
+		 	* thing this should be would be a ten byte read where
+			* only a six byte read was supported.  Also, on a
+			* system where READ CAPACITY failed, we may have read
+			* past the end of the disk.
+		 	*/
 			if ((cmd->device->use_10_for_rw &&
 			    sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
 			    (cmd->cmnd[0] == READ_10 ||
 			     cmd->cmnd[0] == WRITE_10)) {
 				cmd->device->use_10_for_rw = 0;
-				/* This will cause a retry with a
-				 * 6-byte command.
+				/*
+				 * This will cause a retry with a 6-byte
+				 * command.
 				 */
 				scsi_requeue_command(q, cmd);
-				return;
+				result = 0;
 			} else {
 				scsi_end_request(cmd, 0, this_count, 1);
 				return;
 			}
 			break;
 		case NOT_READY:
-			/* If the device is in the process of becoming
+			/*
+			 * If the device is in the process of becoming
 			 * ready, or has a temporary blockage, retry.
 			 */
 			if (sshdr.asc == 0x04) {
@@ -1005,7 +1021,7 @@ void scsi_io_completion(struct scsi_cmnd
 			}
 			if (!(req->flags & REQ_QUIET)) {
 				scmd_printk(KERN_INFO, cmd,
-					    "Device not ready: ");
+					   "Device not ready: ");
 				scsi_print_sense_hdr("", &sshdr);
 			}
 			scsi_end_request(cmd, 0, this_count, 1);
@@ -1013,21 +1029,21 @@ void scsi_io_completion(struct scsi_cmnd
 		case VOLUME_OVERFLOW:
 			if (!(req->flags & REQ_QUIET)) {
 				scmd_printk(KERN_INFO, cmd,
-					    "Volume overflow, CDB: ");
+					   "Volume overflow, CDB: ");
 				__scsi_print_command(cmd->data_cmnd);
 				scsi_print_sense("", cmd);
 			}
-			/* See SSC3rXX or current. */
-			scsi_end_request(cmd, 0, this_count, 1);
+			scsi_end_request(cmd, 0, block_bytes, 1);
 			return;
 		default:
 			break;
 		}
-	}
+	}			/* driver byte != 0 */
 	if (host_byte(result) == DID_RESET) {
-		/* Third party bus reset or reset for error recovery
-		 * reasons.  Just retry the request and see what
-		 * happens.
+		/*
+		 * Third party bus reset or reset for error
+		 * recovery reasons.  Just retry the request
+		 * and see what happens.  
 		 */
 		scsi_requeue_command(q, cmd);
 		return;
@@ -1035,13 +1051,21 @@ void scsi_io_completion(struct scsi_cmnd
 	if (result) {
 		if (!(req->flags & REQ_QUIET)) {
 			scmd_printk(KERN_INFO, cmd,
-				    "SCSI error: return code = 0x%08x\n",
-				    result);
+				   "SCSI error: return code = 0x%x\n", result);
+
 			if (driver_byte(result) & DRIVER_SENSE)
 				scsi_print_sense("", cmd);
 		}
+		/*
+		 * Mark a single buffer as not uptodate.  Queue the remainder.
+		 * We sometimes get this cruft in the event that a medium error
+		 * isn't properly reported.
+		 */
+		block_bytes = req->hard_cur_sectors << 9;
+		if (!block_bytes)
+			block_bytes = req->data_len;
+		scsi_end_request(cmd, 0, block_bytes, 1);
 	}
-	scsi_end_request(cmd, 0, this_count, !result);
 }
 EXPORT_SYMBOL(scsi_io_completion);
 
@@ -1145,7 +1169,7 @@ static void scsi_blk_pc_done(struct scsi
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->bufflen);
+	scsi_io_completion(cmd, cmd->bufflen, 0);
 }
 
 static void scsi_setup_blk_pc_cmnd(struct scsi_cmnd *cmd)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f899ff0..3541990 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -891,10 +891,11 @@ #endif
 static void sd_rw_intr(struct scsi_cmnd * SCpnt)
 {
 	int result = SCpnt->result;
- 	unsigned int xfer_size = SCpnt->request_bufflen;
- 	unsigned int good_bytes = result ? 0 : xfer_size;
- 	u64 start_lba = SCpnt->request->sector;
- 	u64 bad_lba;
+	int this_count = SCpnt->request_bufflen;
+	int good_bytes = (result == 0 ? this_count : 0);
+	sector_t block_sectors = 1;
+	u64 first_err_block;
+	sector_t error_sector;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int sense_deferred = 0;
@@ -905,6 +906,7 @@ static void sd_rw_intr(struct scsi_cmnd 
 		if (sense_valid)
 			sense_deferred = scsi_sense_is_deferred(&sshdr);
 	}
+
 #ifdef CONFIG_SCSI_LOGGING
 	SCSI_LOG_HLCOMPLETE(1, printk("sd_rw_intr: %s: res=0x%x\n", 
 				SCpnt->request->rq_disk->disk_name, result));
@@ -914,72 +916,89 @@ #ifdef CONFIG_SCSI_LOGGING
 				sshdr.sense_key, sshdr.asc, sshdr.ascq));
 	}
 #endif
-	if (driver_byte(result) != DRIVER_SENSE &&
-	    (!sense_valid || sense_deferred))
-		goto out;
+	/*
+	   Handle MEDIUM ERRORs that indicate partial success.  Since this is a
+	   relatively rare error condition, no care is taken to avoid
+	   unnecessary additional work such as memcpy's that could be avoided.
+	 */
+	if (driver_byte(result) != 0 &&
+		 sense_valid && !sense_deferred) {
+		switch (sshdr.sense_key) {
+		case MEDIUM_ERROR:
+			if (!blk_fs_request(SCpnt->request))
+				break;
+			info_valid = scsi_get_sense_info_fld(
+				SCpnt->sense_buffer, SCSI_SENSE_BUFFERSIZE,
+				&first_err_block);
+			/*
+			 * May want to warn and skip if following cast results
+			 * in actual truncation (if sector_t < 64 bits)
+			 */
+			error_sector = (sector_t)first_err_block;
+			if (SCpnt->request->bio != NULL)
+				block_sectors = bio_sectors(SCpnt->request->bio);
+			switch (SCpnt->device->sector_size) {
+			case 1024:
+				error_sector <<= 1;
+				if (block_sectors < 2)
+					block_sectors = 2;
+				break;
+			case 2048:
+				error_sector <<= 2;
+				if (block_sectors < 4)
+					block_sectors = 4;
+				break;
+			case 4096:
+				error_sector <<=3;
+				if (block_sectors < 8)
+					block_sectors = 8;
+				break;
+			case 256:
+				error_sector >>= 1;
+				break;
+			default:
+				break;
+			}
 
-	switch (sshdr.sense_key) {
-	case HARDWARE_ERROR:
-	case MEDIUM_ERROR:
-		if (!blk_fs_request(SCpnt->request))
-			goto out;
-		info_valid = scsi_get_sense_info_fld(SCpnt->sense_buffer,
-						     SCSI_SENSE_BUFFERSIZE,
-						     &bad_lba);
-		if (!info_valid)
-			goto out;
-		if (xfer_size <= SCpnt->device->sector_size)
-			goto out;
-		switch (SCpnt->device->sector_size) {
-		case 256:
-			start_lba <<= 1;
-			break;
-		case 512:
+			error_sector &= ~(block_sectors - 1);
+			good_bytes = (error_sector - SCpnt->request->sector) << 9;
+			if (good_bytes < 0 || good_bytes >= this_count)
+				good_bytes = 0;
 			break;
-		case 1024:
-			start_lba >>= 1;
-			break;
-		case 2048:
-			start_lba >>= 2;
+
+		case RECOVERED_ERROR: /* an error occurred, but it recovered */
+		case NO_SENSE: /* LLDD got sense data */
+			/*
+			 * Inform the user, but make sure that it's not treated
+			 * as a hard error.
+			 */
+			scsi_print_sense("sd", SCpnt);
+			SCpnt->result = 0;
+			memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+			good_bytes = this_count;
 			break;
-		case 4096:
-			start_lba >>= 3;
+
+		case ILLEGAL_REQUEST:
+			if (SCpnt->device->use_10_for_rw &&
+			    (SCpnt->cmnd[0] == READ_10 ||
+			     SCpnt->cmnd[0] == WRITE_10))
+				SCpnt->device->use_10_for_rw = 0;
+			if (SCpnt->device->use_10_for_ms &&
+			    (SCpnt->cmnd[0] == MODE_SENSE_10 ||
+			     SCpnt->cmnd[0] == MODE_SELECT_10))
+				SCpnt->device->use_10_for_ms = 0;
 			break;
+
 		default:
-			/* Print something here with limiting frequency. */
-			goto out;
 			break;
 		}
-		/* This computation should always be done in terms of
-		 * the resolution of the device's medium.
-		 */
-		good_bytes = (bad_lba - start_lba)*SCpnt->device->sector_size;
-		break;
-	case RECOVERED_ERROR:
-	case NO_SENSE:
-		/* Inform the user, but make sure that it's not treated
-		 * as a hard error.
-		 */
-		scsi_print_sense("sd", SCpnt);
-		SCpnt->result = 0;
-		memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
-		good_bytes = xfer_size;
-		break;
-	case ILLEGAL_REQUEST:
-		if (SCpnt->device->use_10_for_rw &&
-		    (SCpnt->cmnd[0] == READ_10 ||
-		     SCpnt->cmnd[0] == WRITE_10))
-			SCpnt->device->use_10_for_rw = 0;
-		if (SCpnt->device->use_10_for_ms &&
-		    (SCpnt->cmnd[0] == MODE_SENSE_10 ||
-		     SCpnt->cmnd[0] == MODE_SELECT_10))
-			SCpnt->device->use_10_for_ms = 0;
-		break;
-	default:
-		break;
 	}
- out:
-	scsi_io_completion(SCpnt, good_bytes);
+	/*
+	 * This calls the generic completion function, now that we know
+	 * how many actual sectors finished, and how many sectors we need
+	 * to say have failed.
+	 */
+	scsi_io_completion(SCpnt, good_bytes, block_sectors << 9);
 }
 
 static int media_not_present(struct scsi_disk *sdkp,
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index fd94408..ebf6579 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -292,7 +292,7 @@ #endif
 	 * how many actual sectors finished, and how many sectors we need
 	 * to say have failed.
 	 */
-	scsi_io_completion(SCpnt, good_bytes);
+	scsi_io_completion(SCpnt, good_bytes, block_sectors << 9);
 }
 
 static int sr_init_command(struct scsi_cmnd * SCpnt)
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 371f70d..e46cd40 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -143,7 +143,7 @@ #define SCSI_STATE_MLQUEUE         0x100
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
 extern void scsi_put_command(struct scsi_cmnd *);
-extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
+extern void scsi_io_completion(struct scsi_cmnd *, unsigned int, unsigned int);
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
 extern void scsi_req_abort_cmd(struct scsi_cmnd *cmd);
 



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 13:53       ` James Bottomley
@ 2006-07-02 14:28         ` Grant Wilson
  2006-07-02 15:06           ` James Bottomley
  0 siblings, 1 reply; 20+ messages in thread
From: Grant Wilson @ 2006-07-02 14:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: Reuben Farrelly, Andrew Morton, Helge Hafting, linux-kernel,
	linux-scsi, Neil Brown

James Bottomley wrote:
> On Sun, 2006-07-02 at 17:13 +1200, Reuben Farrelly wrote:
>> Just for kicks, after testing those two trees (see previous email) I
>> took my 
>> 2.6.17-mm5 without git-scsi-misc and then patched git-scsi-misc.patch
>> back in, 
>> rebuilt and rebooted and noted that RAID broke again.  Reverted the
>> patch and it 
>> all worked.
>>
>> So I can conclude that definitely and reproducibly that's the
>> one.........
> 
> OK, I have a theory.  I think 
> 
> [SCSI] sd/scsi_lib simplify sd_rw_intr and scsi_io_completion
> 
> failed to take into account completion of zero-length commands (which is
> what a flush is).  Could you try the whole of -mm with this patch?
> 
> Thanks,
> 
> James
> 
[patch snipped]

With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
SATA drives with no problems.

Thanks,
Grant

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 14:28         ` Grant Wilson
@ 2006-07-02 15:06           ` James Bottomley
  2006-07-02 15:43             ` Grant Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2006-07-02 15:06 UTC (permalink / raw)
  To: Grant Wilson
  Cc: Reuben Farrelly, Andrew Morton, Helge Hafting, linux-kernel,
	linux-scsi, Neil Brown

On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
> SATA drives with no problems.

That's great, thanks.  Now we know what the problem patch is, I'd like
to try an 11th hour correction of the logic fault in the original.  Could
you try this patch against original -mm (by reversing the previous
patch)?  I think it should correct the problem.
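
One detail of the patch below that is easy to miss: besides dropping the
"good_bytes > 0" test, it changes the last argument of the
scsi_end_request() call from "!!result" to "result == 0".  For any given
result these two expressions are logical opposites, as this small
standalone check illustrates (arg_before() and arg_after() are
hypothetical helper names, not kernel functions):

```c
#include <assert.h>

/* Last argument as written before the patch. */
static int arg_before(int result) { return !!result; }

/* Last argument as written after the patch. */
static int arg_after(int result)  { return result == 0; }

/* Exactly one of the two expressions is true for every result value. */
void end_request_arg_selfcheck(void)
{
	int samples[] = { 0, 1, 2, 0x08000002 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(arg_before(samples[i]) + arg_after(samples[i]) == 1);
}
```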

Thanks,

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index bf5191f..08af9aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -920,22 +920,20 @@ void scsi_io_completion(struct scsi_cmnd
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
 	 */
-	if (good_bytes > 0) {
-		SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
-					      "%d bytes done.\n",
-					      req->nr_sectors, good_bytes));
-		SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
-
-		if (clear_errors)
-			req->errors = 0;
-
-		/* A number of bytes were successfully read.  If there
-		 * is leftovers and there is some kind of error
-		 * (result != 0), retry the rest.
-		 */
-		if (scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
-			return;
-	}
+	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
+				      "%d bytes done.\n",
+				      req->nr_sectors, good_bytes));
+	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
+
+	if (clear_errors)
+		req->errors = 0;
+
+	/* A number of bytes were successfully read.  If there
+	 * are leftovers and there is some kind of error
+	 * (result != 0), retry the rest.
+	 */
+	if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
+		return;
 
 	/* good_bytes = 0, or (inclusive) there were leftovers and
 	 * result = 0, so scsi_end_request couldn't retry.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 15:06           ` James Bottomley
@ 2006-07-02 15:43             ` Grant Wilson
  2006-07-02 19:07               ` Helge Hafting
  0 siblings, 1 reply; 20+ messages in thread
From: Grant Wilson @ 2006-07-02 15:43 UTC (permalink / raw)
  To: James Bottomley
  Cc: Reuben Farrelly, Andrew Morton, Helge Hafting, linux-kernel,
	linux-scsi, Neil Brown

James Bottomley wrote:
> On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
>> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
>> SATA drives with no problems.
> 
> That's great, thanks.  Now we know what the problem patch is, I'd like
> to try an 11th hour correction of the logic fault in the original.  Could
> you try this patch against original -mm (by reversing the previous
> patch)?  I think it should correct the problem.
> 
> Thanks,
> 
> James
> 
[snip]

With the first patch reversed and the second applied to -mm5 my RAID-1
array is still working correctly on both disks.

Thanks again,
Grant

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 15:43             ` Grant Wilson
@ 2006-07-02 19:07               ` Helge Hafting
  2006-07-03  6:52                 ` Reuben Farrelly
  0 siblings, 1 reply; 20+ messages in thread
From: Helge Hafting @ 2006-07-02 19:07 UTC (permalink / raw)
  To: Grant Wilson
  Cc: James Bottomley, Reuben Farrelly, Andrew Morton, linux-kernel,
	linux-scsi, Neil Brown

On Sun, Jul 02, 2006 at 04:43:14PM +0100, Grant Wilson wrote:
> James Bottomley wrote:
> > On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
> >> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
> >> SATA drives with no problems.
> > 
> > That's great, thanks.  Now we know what the problem patch is, I'd like
> > to try an 11th hour correction of the logic fault in the original.  Could
> > you try this patch against original -mm (by reversing the previous
> > patch)?  I think it should correct the problem.
> > 
> > Thanks,
> > 
> > James
> > 
> [snip]
> 
> With the first patch reversed and the second applied to -mm5 my RAID-1
> array is still working correctly on both disks.
> 
The patch makes 2.6.17-mm5 md work on SATA and SCSI for me too.

Helge Hafting

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 19:07               ` Helge Hafting
@ 2006-07-03  6:52                 ` Reuben Farrelly
  0 siblings, 0 replies; 20+ messages in thread
From: Reuben Farrelly @ 2006-07-03  6:52 UTC (permalink / raw)
  To: Helge Hafting
  Cc: Grant Wilson, James Bottomley, Andrew Morton, linux-kernel,
	linux-scsi, Neil Brown



On 3/07/2006 7:07 a.m., Helge Hafting wrote:
> On Sun, Jul 02, 2006 at 04:43:14PM +0100, Grant Wilson wrote:
>> James Bottomley wrote:
>>> On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
>>>> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
>>>> SATA drives with no problems.
>>> That's great, thanks.  Now we know what the problem patch is, I'd like
>>> to try an 11th hour correction of the logic fault in the original.  Could
>>> you try this patch against original -mm (by reversing the previous
>>> patch)?  I think it should correct the problem.
>>>
>>> Thanks,
>>>
>>> James
>>>
>> [snip]
>>
>> With the first patch reversed and the second applied to -mm5 my RAID-1
>> array is still working correctly on both disks.
>>
> The patch makes 2.6.17-mm5 md work on SATA and SCSI for me too.
> 
> Helge Hafting

+1.  Fixes everything here up too.

So with two patches applied (this one and an unrelated MSI fix) I'm all up and 
running perfectly on -mm5.

Thanks,
Reuben

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 21:30   ` 2.6.17-mm5 Andrew Morton
                       ` (2 preceding siblings ...)
  2006-07-01 22:54     ` 2.6.17-mm5 Jeff Garzik
@ 2006-07-27 21:02     ` Ming Zhang
  3 siblings, 0 replies; 20+ messages in thread
From: Ming Zhang @ 2006-07-27 21:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> On Sat, 1 Jul 2006 15:24:19 +0100
<...>

> 
> > [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> > [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> > [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> > [  155.292808] CPU 0 
> > [  155.304968] Modules linked in: dm_mod evdev
> > [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> > [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> > [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> > [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> > [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> > [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> > [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> > [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> > [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> > [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> > [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> > [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> > [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> > [  155.966085] Call Trace:
> > [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> > [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> > [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> > [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> > [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> > [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> > [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> > [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> > [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> > [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> > [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> > [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> > [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> > [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> > [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> > [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> > [  156.542083] 
> > [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 
> 
> The barrier code is in there again.
> 
> mddev->pers is NULL in md_error(), so the test of


Just curious: how did you work out that the cause is "mddev->pers is NULL"?

thanks!
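
One way to get there from the oops text alone (a sketch of the
reasoning, not necessarily what Andrew did): CR2, the faulting address,
is 0x48 and RAX is 0.  Assuming the Code: bytes start at the faulting
instruction, 48 8b 40 48 decodes to "mov 0x48(%rax),%rax", a load from
offset 0x48 of a NULL pointer, and it is followed by a test of the
loaded value and an indirect call through it.  That matches reading
->error_handler through a NULL mddev->pers, provided error_handler sits
at offset 0x48 in that structure, which is an assumption here.  struct
fake_pers below is hypothetical; its padding is chosen only so that the
member lands at 0x48.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen so the handler lands at 0x48. */
struct fake_pers {
	char pad[0x48];
	void (*error_handler)(void);
};

/*
 * Address the CPU would touch when loading ->error_handler through
 * `pers`.  Computed arithmetically; nothing is dereferenced.
 */
static uintptr_t handler_field_address(const struct fake_pers *pers)
{
	return (uintptr_t)pers + offsetof(struct fake_pers, error_handler);
}

/* A NULL base pointer gives the small fault address seen in CR2. */
void fault_address_selfcheck(void)
{
	assert(handler_field_address(NULL) == 0x48);
}
```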


> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 
> 
> This should get you further, but...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/md/md.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
> --- a/drivers/md/md.c~md-oops-workaround
> +++ a/drivers/md/md.c
> @@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
>  		__builtin_return_address(0),__builtin_return_address(1),
>  		__builtin_return_address(2),__builtin_return_address(3));
>  */
> +	if (!mddev->pers)
> +		return;
>  	if (!mddev->pers->error_handler)
>  		return;
>  	mddev->pers->error_handler(mddev,rdev);
> _
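
The guard in the workaround above can be tried in user space with
stand-in types.  This is a minimal sketch: toy_pers, toy_mddev and
toy_md_error() are hypothetical names, not the real md structures or
API.  It only shows that checking the outer pointer first makes the
inner check safe:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the personality and device structures (hypothetical). */
struct toy_pers {
	void (*error_handler)(int *hits);
};

struct toy_mddev {
	struct toy_pers *pers;	/* NULL when no personality is attached */
};

static void count_hit(int *hits)
{
	(*hits)++;
}

/* Returns 1 if the handler ran, 0 if the call was skipped. */
static int toy_md_error(struct toy_mddev *mddev, int *hits)
{
	if (!mddev->pers)		/* the added guard */
		return 0;
	if (!mddev->pers->error_handler)
		return 0;
	mddev->pers->error_handler(hits);
	return 1;
}

/* With pers == NULL the call is skipped instead of faulting. */
void toy_md_error_selfcheck(void)
{
	struct toy_mddev stopped = { NULL };
	int hits = 0;

	assert(toy_md_error(&stopped, &hits) == 0);
	assert(hits == 0);
}
```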


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2006-07-27 21:02 UTC | newest]

Thread overview: 20+ messages
     [not found] <20060701033524.3c478698.akpm@osdl.org>
     [not found] ` <20060701142419.GB28750@tlg.swandive.local>
2006-07-01 21:30   ` 2.6.17-mm5 Andrew Morton
2006-07-01 22:26     ` 2.6.17-mm5 James Bottomley
2006-07-01 22:32       ` 2.6.17-mm5 Neil Brown
2006-07-01 22:56         ` 2.6.17-mm5 Jeff Garzik
2006-07-02  0:10           ` 2.6.17-mm5 James Bottomley
2006-07-01 22:29     ` More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ] Neil Brown
2006-07-01 22:54     ` 2.6.17-mm5 Jeff Garzik
2006-07-27 21:02     ` 2.6.17-mm5 Ming Zhang
     [not found] ` <20060701181455.GA16412@aitel.hist.no>
2006-07-01 22:22   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Andrew Morton
2006-07-01 22:52     ` Jeff Garzik
2006-07-01 22:58       ` Andrew Morton
2006-07-02  4:43     ` Reuben Farrelly
2006-07-02  6:09       ` Andrew Morton
2006-07-02  5:13     ` Reuben Farrelly
2006-07-02 13:53       ` James Bottomley
2006-07-02 14:28         ` Grant Wilson
2006-07-02 15:06           ` James Bottomley
2006-07-02 15:43             ` Grant Wilson
2006-07-02 19:07               ` Helge Hafting
2006-07-03  6:52                 ` Reuben Farrelly
