* [PATCH] raid1: set bi_idx after bio cloning
From: Mike Tran @ 2005-05-13 21:39 UTC
To: neilb, axboe, linux-raid
Hi Neil et al,
The scenario: LVM (striped) volumes on top of two raid1 devices.
The problem: corruption when building the scatter-gather list for the
device driver (see blk_rq_map_sg() in drivers/block/).

In the DM code, a bio can sometimes be split into several clones, and
DM sets each clone's bi_idx accordingly. MD raid1 also clones bios but
does not adjust bi_idx. The problem can be fixed either in
__bio_clone() or in the callers of bio_clone(); I chose to fix raid1.
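
For illustration, here is a minimal userspace sketch of the mechanism
(the structs are pared down and the names are mine, not the kernel's):

/*
 * Sketch only: a pared-down bio. In the real kernel, bi_io_vec is an
 * array of (page, len, offset) segments and bi_idx marks the first
 * segment that is still live after a split.
 */
struct bvec_sketch {
	unsigned int	bv_len;		/* bytes in this segment */
	unsigned int	bv_offset;	/* offset into the page */
};

struct bio_sketch {
	unsigned short		bi_idx;		/* first live segment */
	unsigned short		bi_vcnt;	/* total segments in the vec */
	struct bvec_sketch	*bi_io_vec;
};

/*
 * Conceptually what blk_rq_map_sg() does: walk the live segments.
 * If a clone keeps bi_idx == 0 while its parent was split at, say,
 * segment 2, this loop re-maps segments 0 and 1, which belong to a
 * different clone -- hence the corrupted scatter-gather list.
 */
static void map_live_segments(const struct bio_sketch *bio)
{
	unsigned short i;

	for (i = bio->bi_idx; i < bio->bi_vcnt; i++) {
		const struct bvec_sketch *bv = &bio->bi_io_vec[i];
		/* ... merge bv into the scatter-gather list ... */
		(void)bv;
	}
}

The raid1 fix: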
--- linux-2.6.12-rc4-orig/drivers/md/raid1.c	2005-05-10 13:35:06.000000000 -0500
+++ linux-2.6.12-rc4-patch/drivers/md/raid1.c	2005-05-13 15:41:43.000000000 -0500
@@ -577,6 +577,7 @@
 
 	r1_bio->bios[rdisk] = read_bio;
 
+	read_bio->bi_idx = bio->bi_idx;
 	read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
 	read_bio->bi_bdev = mirror->rdev->bdev;
 	read_bio->bi_end_io = raid1_end_read_request;
--
Regards,
Mike T.
* Re: [PATCH] raid1: set bi_idx after bio cloning
From: Neil Brown @ 2005-05-13 23:08 UTC
To: Mike Tran; +Cc: axboe, linux-raid

On Friday May 13, mhtran@us.ibm.com wrote:
> Hi Neil et al,
>
> The scenario: LVM (striped) volumes on top of two raid1 devices.
> The problem: corruption when building the scatter-gather list for the
> device driver (see blk_rq_map_sg() in drivers/block/).
>
> In the DM code, a bio can sometimes be split into several clones, and
> DM sets each clone's bi_idx accordingly. MD raid1 also clones bios but
> does not adjust bi_idx. The problem can be fixed either in
> __bio_clone() or in the callers of bio_clone(); I chose to fix raid1.

Presumably the same fix would be needed in the "write" path and in the
read-retry path in raid1d?

Why not just put it in __bio_clone(), which would be one change instead
of (at least) 3?

NeilBrown

> [...]
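
To illustrate Neil's point, a hedged userspace sketch (the toy names
are mine, not raid1.c's): a caller-side fix repeats the same assignment
at every clone site, while one assignment inside the clone helper would
cover them all.

#include <stdlib.h>

struct toy_bio {
	unsigned short bi_idx;	/* first live segment of the I/O vec */
	/* ... other fields elided ... */
};

/* Stand-in for bio_clone(): hands back a copy that, like the unpatched
 * __bio_clone(), does not carry bi_idx over. */
static struct toy_bio *toy_clone(const struct toy_bio *src)
{
	struct toy_bio *clone = calloc(1, sizeof(*clone));
	(void)src;		/* bi_idx deliberately left at zero */
	return clone;
}

/* Caller-side fixes: the read, write, and raid1d retry paths would
 * each need to repeat the same line after cloning. */
static void fix_at_callers(struct toy_bio *master)
{
	struct toy_bio *rd = toy_clone(master);
	rd->bi_idx = master->bi_idx;		/* fix #1: read path */

	struct toy_bio *wr = toy_clone(master);
	wr->bi_idx = master->bi_idx;		/* fix #2: write path */

	struct toy_bio *retry = toy_clone(master);
	retry->bi_idx = master->bi_idx;		/* fix #3: retry path */

	free(rd); free(wr); free(retry);
}

Copying bi_idx once inside the clone helper makes every call site
correct at a stroke.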
* Re: [PATCH] bio: set bi_idx after bio cloning
From: Mike Tran @ 2005-05-16 15:25 UTC
To: Neil Brown, axboe; +Cc: linux-raid

Hi Jens,

Unless you have a specific reason for not doing it in __bio_clone(), I
agree with Neil's suggestion. Please consider the following patch:

--- linux-2.6.12-rc4-orig/fs/bio.c	2005-05-16 10:13:27.000000000 -0500
+++ linux-2.6.12-rc4-patch/fs/bio.c	2005-05-16 10:15:06.000000000 -0500
@@ -255,10 +255,7 @@
 	bio->bi_flags |= 1 << BIO_CLONED;
 	bio->bi_rw = bio_src->bi_rw;
 
-	/*
-	 * notes -- maybe just leave bi_idx alone. assume identical mapping
-	 * for the clone
-	 */
+	bio->bi_idx = bio_src->bi_idx;
 	bio->bi_vcnt = bio_src->bi_vcnt;
 	bio->bi_size = bio_src->bi_size;
 	bio_phys_segments(q, bio);

--
Regards,
Mike T.

On Fri, 2005-05-13 at 18:08, Neil Brown wrote:
> Presumably the same fix would be needed in the "write" path and in the
> read-retry path in raid1d?
>
> Why not just put it in __bio_clone(), which would be one change
> instead of (at least) 3?
>
> NeilBrown
> [...]
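
For illustration, a hedged userspace sketch of why the clone must
inherit bi_idx (the split logic is simplified from my reading of the
2.6-era DM clone path and the names are mine; details may differ):

#include <assert.h>

struct bio_sketch {
	unsigned short	bi_idx;		/* first live segment */
	unsigned short	bi_vcnt;	/* one past the last live segment */
};

/* DM-style split: the clone's live window starts mid-vec, at 'idx'. */
static void dm_style_split(struct bio_sketch *clone,
			   unsigned short idx, unsigned short count)
{
	clone->bi_idx = idx;
	clone->bi_vcnt = idx + count;
}

/* __bio_clone() with the patch above applied: bi_idx is carried over. */
static void clone_sketch(struct bio_sketch *dst,
			 const struct bio_sketch *src)
{
	dst->bi_vcnt = src->bi_vcnt;
	dst->bi_idx = src->bi_idx;	/* the line the patch adds */
}

int main(void)
{
	struct bio_sketch split, clone;

	dm_style_split(&split, 2, 3);	/* live window: segments 2..4 */
	clone_sketch(&clone, &split);
	assert(clone.bi_idx == 2);	/* stale (0) without the fix */
	return 0;
}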
* problems with 3ware 8506-8 post-disk failure
From: Harry Mangalam @ 2005-05-16 22:19 UTC
To: linux-raid

Scenario: dual Opteron / 4G / Ubuntu pure 64-bit SMP, OS on a separate
IDE drive, 3ware 8506 8-port driving 8x WD2500JD disks in Chenbro
hotswap cages as RAID5, configured as both reiserfs (pre-catastrophe)
and ext3 (post-catastrophe). I'm responsible for getting this system
up (done) and reliable (not done).

The short version is that it ran well for a few weeks until we
discovered on a reboot that a disk had silently failed, degrading the
RAID5. In trying to repair that failure, 3ware's 3dm2 software
indicated that it was repairing the array but failed to do so, causing
the loss of the entire array. I tried to rescue the data with
reiserfs's fsck but was only able to recover individual chunks. Since
most of the data was huge binary files and most of it was backed up
elsewhere, we decided not to attempt a rescue and reformatted with
ext3, supposedly because it is considered more reliable and better
suited to large files.

After that, the RAID stayed up for a day or so while I loaded it down
with heavy disk I/O to see what would happen. The same port/disk
number failed again (though at least this time the software notified
us), and the same port failing twice seems pretty suspicious.

I played around with the motherboard's Silicon Image 4-port SATA
controller and software RAID (via mdadm) for a while and found that
after a certain amount of futzing it looked not too bad, but the
amount of futzing made me nervous, especially since someone else is
going to have to care for this system. The software RAID was about
10-20% faster than the 3ware by bonnie++, but I liked the idea of
having the RAID look like one big SCSI disk, so I went with the 3ware.

I'll detail the complete catastrophe later (it's already written up in
large chunks; I just have to remove some inflammatory language before
posting), but my question to the group is what people think of 3ware's
support. The common opinion on 3ware seems to be that it's great that
they support Linux and the hardware works fine (also my experience),
but my opinion has been shaded considerably by what happens when a
RAID fails -- when you really DO need to recover and need a
straightforward path to do so. In short, I've found 3ware's
documentation on recovery procedures hard to find (via Google, for
example, and also on their website), hard to understand because of
some peculiar nomenclature, and sometimes misleading due to oddities
of their software.

Is this just my experience, or is it a widely held view? I realize I'm
talking to a group that seems heavily weighted towards software RAID,
but maybe it's just me. If anyone can compare recovery paths between
the two (software vs. 3ware hardware), I'd be very happy to hear the
stories. Given this recent experience, I'm re-evaluating whether I
should switch back to software RAID, especially given another large
catastrophe involving 3ware controllers on campus.

Have people found that the Chenbro hotswap cages are a contributing
factor to RAID failure? That's what one 3ware person indicated.

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm@tacgi.com
<<plain text preferred>>
* Re: [PATCH] bio: set bi_idx after bio cloning
From: Jens Axboe @ 2005-05-23 7:31 UTC
To: Mike Tran; +Cc: Neil Brown, linux-raid

On Mon, May 16 2005, Mike Tran wrote:
> Hi Jens,
>
> Unless you have a specific reason for not doing it in __bio_clone(),
> I agree with Neil's suggestion. Please consider the following patch:
> [...]

That patch is fine with me, I'll add it. Thanks.

--
Jens Axboe