* [PATCH] raid1: set bi_idx after bio cloning
@ 2005-05-13 21:39 Mike Tran
2005-05-13 23:08 ` Neil Brown
0 siblings, 1 reply; 5+ messages in thread
From: Mike Tran @ 2005-05-13 21:39 UTC (permalink / raw)
To: neilb, axboe, linux-raid
Hi Neil et al,
The scenario: LVM (striped) volumes on top of two raid1 devices
The problem: corruption in building the scatter-gather list for the
device driver (see the blk_rq_map_sg() function in drivers/block/)
In the DM code, a bio can sometimes be split into several clones, and
DM sets each clone's bi_idx accordingly. MD raid1 also clones bios but
does not adjust bi_idx. The problem can be fixed either in
__bio_clone() or in the caller of bio_clone(); I chose to fix raid1.
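To make the failure mode concrete, here is a minimal user-space sketch
(illustrative only -- the field names mirror struct bio, but this is a
caricature, not kernel code, and map_segments() merely stands in for
what blk_rq_map_sg() relies on) of what happens when a clone keeps the
source's bvec array but lets bi_idx fall back to 0:

#include <stdio.h>

/* Caricature of the 2.6 bio fields that matter here. */
struct bio_vec { int page_id; unsigned int bv_len; };

struct bio {
	unsigned short bi_idx;		/* first still-active bvec */
	unsigned short bi_vcnt;		/* total bvecs in bi_io_vec */
	struct bio_vec *bi_io_vec;
};

/* Roughly what blk_rq_map_sg() assumes: active data starts at bi_idx. */
static void map_segments(const char *who, const struct bio *bio)
{
	unsigned short i;

	printf("%s maps bvecs:", who);
	for (i = bio->bi_idx; i < bio->bi_vcnt; i++)
		printf(" %d", bio->bi_io_vec[i].page_id);
	printf("\n");
}

int main(void)
{
	struct bio_vec vec[4] = { {0, 4096}, {1, 4096}, {2, 4096}, {3, 4096} };

	/* The bio DM hands to raid1 after a split: same bvec array as the
	 * original, but its slice starts at index 2. */
	struct bio parent = { .bi_idx = 2, .bi_vcnt = 4, .bi_io_vec = vec };

	/* raid1's clone without the fix: bi_idx is silently 0 again, so
	 * bvecs 0 and 1 -- which belong to an earlier split -- get mapped
	 * into the scatter-gather list as well. */
	struct bio bad_clone = { .bi_idx = 0, .bi_vcnt = 4, .bi_io_vec = vec };

	map_segments("parent   ", &parent);	/* 2 3 */
	map_segments("bad clone", &bad_clone);	/* 0 1 2 3 -- corruption */
	return 0;
}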
--- linux-2.6.12-rc4-orig/drivers/md/raid1.c	2005-05-10 13:35:06.000000000 -0500
+++ linux-2.6.12-rc4-patch/drivers/md/raid1.c	2005-05-13 15:41:43.000000000 -0500
@@ -577,6 +577,7 @@
r1_bio->bios[rdisk] = read_bio;
+ read_bio->bi_idx = bio->bi_idx;
read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
read_bio->bi_bdev = mirror->rdev->bdev;
read_bio->bi_end_io = raid1_end_read_request;
--
Regards,
Mike T.
* Re: [PATCH] raid1: set bi_idx after bio cloning
2005-05-13 21:39 [PATCH] raid1: set bi_idx after bio cloning Mike Tran
@ 2005-05-13 23:08 ` Neil Brown
2005-05-16 15:25 ` [PATCH] bio: " Mike Tran
0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2005-05-13 23:08 UTC (permalink / raw)
To: Mike Tran; +Cc: axboe, linux-raid
On Friday May 13, mhtran@us.ibm.com wrote:
> Hi Neil et al,
>
> The scenario: LVM (striped) volumes on top of two raid1 devices
> The problem: corruption in building the scatter-gather list for the
> device driver (see the blk_rq_map_sg() function in drivers/block/)
>
> In the DM code, a bio can sometimes be split into several clones, and
> DM sets each clone's bi_idx accordingly. MD raid1 also clones bios but
> does not adjust bi_idx. The problem can be fixed either in
> __bio_clone() or in the caller of bio_clone(); I chose to fix raid1.
Presumably the same fix would be needed in the "write" path, and also in
the read-retry path in raid1d?
Why not just put it in __bio_clone, which would be one change instead
of (at least) 3?
NeilBrown
>
> --- linux-2.6.12-rc4-orig/drivers/md/raid1.c	2005-05-10 13:35:06.000000000 -0500
> +++ linux-2.6.12-rc4-patch/drivers/md/raid1.c	2005-05-13 15:41:43.000000000 -0500
> @@ -577,6 +577,7 @@
>
> r1_bio->bios[rdisk] = read_bio;
>
> + read_bio->bi_idx = bio->bi_idx;
> read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
> read_bio->bi_bdev = mirror->rdev->bdev;
> read_bio->bi_end_io = raid1_end_read_request;
>
> --
> Regards,
> Mike T.
>
* Re: [PATCH] bio: set bi_idx after bio cloning
2005-05-13 23:08 ` Neil Brown
@ 2005-05-16 15:25 ` Mike Tran
2005-05-16 22:19 ` problems with 3ware 8506-8 post-disk failure Harry Mangalam
2005-05-23 7:31 ` [PATCH] bio: set bi_idx after bio cloning Jens Axboe
0 siblings, 2 replies; 5+ messages in thread
From: Mike Tran @ 2005-05-16 15:25 UTC (permalink / raw)
To: Neil Brown, axboe; +Cc: linux-raid
Hi Jens,
Unless you have a specific reason for not doing it in __bio_clone(), I
agree with Neil's suggestion. Please consider the following patch:
--- linux-2.6.12-rc4-orig/fs/bio.c 2005-05-16 10:13:27.000000000 -0500
+++ linux-2.6.12-rc4-patch/fs/bio.c 2005-05-16 10:15:06.000000000 -0500
@@ -255,10 +255,7 @@
bio->bi_flags |= 1 << BIO_CLONED;
bio->bi_rw = bio_src->bi_rw;
- /*
- * notes -- maybe just leave bi_idx alone. assume identical mapping
- * for the clone
- */
+ bio->bi_idx = bio_src->bi_idx;
bio->bi_vcnt = bio_src->bi_vcnt;
bio->bi_size = bio_src->bi_size;
bio_phys_segments(q, bio);
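For context, here is a rough user-space sketch (illustrative only --
bio_clone_sketch() is a stand-in, not the real __bio_clone() or any
dm.c logic) of why propagating bi_idx here covers every caller: a
DM-style split leaves each clone's bi_idx pointing at its slice of the
shared bvec array, and any further clone must inherit that index:

#include <stdio.h>

/* Illustrative only -- not the real struct bio. */
struct bio_vec { int page_id; };
struct bio { unsigned short bi_idx, bi_vcnt; struct bio_vec *bi_io_vec; };

/* What the clone path does with the patch above applied: the clone
 * inherits the source's bi_idx instead of leaving it at 0. */
static void bio_clone_sketch(struct bio *bio, const struct bio *src)
{
	bio->bi_io_vec = src->bi_io_vec;
	bio->bi_idx = src->bi_idx;	/* the one-line fix */
	bio->bi_vcnt = src->bi_vcnt;
}

int main(void)
{
	struct bio_vec vec[4] = { {0}, {1}, {2}, {3} };
	/* A DM-style split: this bio covers the second half of the
	 * original, so its slice of the bvec array starts at index 2. */
	struct bio split = { .bi_idx = 2, .bi_vcnt = 4, .bi_io_vec = vec };
	struct bio clone;

	bio_clone_sketch(&clone, &split);
	printf("clone starts at bvec %d (expect 2)\n", clone.bi_idx);
	return 0;
}

With the fix in __bio_clone(), raid1's read, write, and read-retry
paths all pick up the correct bi_idx for free.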
--
Regards,
Mike T.
On Fri, 2005-05-13 at 18:08, Neil Brown wrote:
> On Friday May 13, mhtran@us.ibm.com wrote:
> > Hi Neil et al,
> >
> > The scenario: LVM (striped) volumes on top of two raid1 devices
> > The problem: corruption in building the scatter-gather list for the
> > device driver (see the blk_rq_map_sg() function in drivers/block/)
> >
> > In the DM code, a bio can sometimes be split into several clones, and
> > DM sets each clone's bi_idx accordingly. MD raid1 also clones bios but
> > does not adjust bi_idx. The problem can be fixed either in
> > __bio_clone() or in the caller of bio_clone(); I chose to fix raid1.
>
> Presumably the same fix would be needed in the "write" path, and also in
> the read-retry path in raid1d?
>
> Why not just put it in __bio_clone, which would be one change instead
> of (at least) 3?
>
> NeilBrown
>
> >
> > --- linux-2.6.12-rc4-orig/drivers/md/raid1.c	2005-05-10 13:35:06.000000000 -0500
> > +++ linux-2.6.12-rc4-patch/drivers/md/raid1.c	2005-05-13 15:41:43.000000000 -0500
> > @@ -577,6 +577,7 @@
> >
> > r1_bio->bios[rdisk] = read_bio;
> >
> > + read_bio->bi_idx = bio->bi_idx;
> > read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
> > read_bio->bi_bdev = mirror->rdev->bdev;
> > read_bio->bi_end_io = raid1_end_read_request;
> >
> > --
> > Regards,
> > Mike T.
> >
* problems with 3ware 8506-8 post-disk failure
2005-05-16 15:25 ` [PATCH] bio: " Mike Tran
@ 2005-05-16 22:19 ` Harry Mangalam
2005-05-23 7:31 ` [PATCH] bio: set bi_idx after bio cloning Jens Axboe
1 sibling, 0 replies; 5+ messages in thread
From: Harry Mangalam @ 2005-05-16 22:19 UTC (permalink / raw)
To: linux-raid
Scenario:
dual opteron/4G/Ubuntu pure 64bit SMP / OS on separate IDE drive, 3ware
8506-8port driving 8x WD2500JD disks in Chenbro hotswap cages as RAID5,
config'ed as both reiserfs (pre-catastrophe) and ext3 (post-catastrophe).
I'm responsible for getting this system up (done) and reliable (not done).
The short version is that it ran well for a few weeks until we discovered on a
reboot that a disk had silently failed, degrading the RAID5. In trying to
repair that failure, 3ware's 3dm2 software indicated that it was
repairing the array, but failed to do so, causing the loss of the entire
array. I tried to rescue the data with reiserfs's fsck but was only able to
recover individual chunks. Since most of the info was huge binary files and
most of it was backed up elsewhere, we decided not to attempt to rescue
anything and re-formatted with ext3, supposedly because it was considered more
reliable and better suited for large files. After that, the raid stayed up
for a day or so and I loaded it down with huge disk i/o, trying to see what
would happen. The same port / disk # failed again (though at least this time
the software notified us), and it seems pretty suspicious that the same port
number keeps failing.
I played around with the motherboard Silicon Image 4-port SATA controller and
sw raid (via mdadm) for a while and found that after a certain amount of
futzing, it looked not too bad, but the amount of futzing made me a bit
nervous, especially since someone else is going to have to care for it. The
speed of the SW RAID was about 10-20% better than the 3ware by bonnie++, but
I liked the idea of having the RAID look like one big SCSI disk. So I went for
the 3ware.
I'll detail the complete catastrophe later (already written up in large chunks
- just have to remove some inflammatory language before posting), but my
question to the group is what people think of 3ware's support. The common
opinion on 3ware seems to be that it's great that they support Linux and the
HW works fine (also my experience), but my opinion has been colored
considerably by what happens when a RAID fails - when you really DO need to
recover and need a straightforward path to do so.
In short, I've found 3ware support for recovery procedures to be hard to find
(via google for example and also on their website), hard to understand
because of some peculiar nomenclature, and sometimes misleading due to
oddities of their software.
Is this just my experience, or is this a widely held view? I realize that I'm
talking to a group that seems to be heavily weighted towards SW RAID, but
maybe it's just me. If anyone can compare recovery paths between the 2 (SW
vs 3ware HW) I'd be very happy to hear the stories. Given this recent
experience, I'm re-evaluating whether I should switch back and go SW RAID,
especially given another large catastrophe involving 3ware ccontrollers on
campus.
Have people found that the Chenbro hotswap cages are a contributing factor to
RAID failure? That's what one 3ware person indicated.
--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm@tacgi.com
<<plain text preferred>>
* Re: [PATCH] bio: set bi_idx after bio cloning
2005-05-16 15:25 ` [PATCH] bio: " Mike Tran
2005-05-16 22:19 ` problems with 3ware 8506-8 post-disk failure Harry Mangalam
@ 2005-05-23 7:31 ` Jens Axboe
1 sibling, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2005-05-23 7:31 UTC (permalink / raw)
To: Mike Tran; +Cc: Neil Brown, linux-raid
On Mon, May 16 2005, Mike Tran wrote:
>
> Hi Jens,
>
> Unless you have a specific reason for not doing it in __bio_clone(), I agree
> with Neil's suggestion. Please consider the following patch:
>
> --- linux-2.6.12-rc4-orig/fs/bio.c 2005-05-16 10:13:27.000000000 -0500
> +++ linux-2.6.12-rc4-patch/fs/bio.c 2005-05-16 10:15:06.000000000 -0500
> @@ -255,10 +255,7 @@
> bio->bi_flags |= 1 << BIO_CLONED;
> bio->bi_rw = bio_src->bi_rw;
>
> - /*
> - * notes -- maybe just leave bi_idx alone. assume identical mapping
> - * for the clone
> - */
> + bio->bi_idx = bio_src->bi_idx;
> bio->bi_vcnt = bio_src->bi_vcnt;
> bio->bi_size = bio_src->bi_size;
> bio_phys_segments(q, bio);
That patch is fine with me, I'll add it. Thanks.
--
Jens Axboe
end of thread, other threads: [~2005-05-23 7:31 UTC | newest]
Thread overview: 5+ messages
2005-05-13 21:39 [PATCH] raid1: set bi_idx after bio cloning Mike Tran
2005-05-13 23:08 ` Neil Brown
2005-05-16 15:25 ` [PATCH] bio: " Mike Tran
2005-05-16 22:19 ` problems with 3ware 8506-8 post-disk failure Harry Mangalam
2005-05-23 7:31 ` [PATCH] bio: set bi_idx after bio cloning Jens Axboe