From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neil Brown
Subject: Re: New RAID causing system lockups
Date: Tue, 14 Sep 2010 09:51:11 +1000
Message-ID: <20100914095111.6e3045c7@notabene>
References: <20100912064308.46d96742@notabene>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Mike Hartman
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On Mon, 13 Sep 2010 11:57:03 -0400 Mike Hartman wrote:

> >>> I don't know yet what is causing the lock-up.  A quick look at your logs
> >>> suggests that it could be related to the barrier handling.  Maybe trying to
> >>> handle a barrier during a reshape is prone to races of some sort - I wouldn't
> >>> be very surprised by that.
> >>
> >> Just note that during the second lockup no reshape or resync was going
> >> on. The array state was stable, I was just writing to it.
> >>
> >>>
> >>> I'll have a look at the code and see what I can find.
> >>
> >> Thanks a lot. If it was only a risk when I was growing/reshaping the
> >> array, and covered by the backup file, it would just be an
> >> inconvenience. But since it can seemingly happen at any time it's a
> >> problem.
> >>
> >
> > The lockup just happened again. I wasn't doing any
> > growing/reshaping/anything like that. Just copying some data into the
> > partition that lives on md0. dmesg_3.txt has been uploaded alongside
> > the other files at http://www.hartmanipulation.com/raid/. The trace
> > looks pretty similar to me.
> >
>
> The lockup just happened for the fourth time, less than an hour after
> I rebooted to clear the previous lockup from last night. All I did was
> boot the system, start the RAID, and start copying some files onto it.
> The problem seems to be getting worse - up until now I got at least a
> full day of fairly heavy usage out of the system before it happened.
> dmesg_4.txt has been uploaded alongside the other files. Let me know
> if there's any other system information that would be useful.
>
> Mike

Hi Mike,
 thanks for the updates.

I'm not entirely clear on what is happening (in fact, due to a cold that I
am still fighting off, nothing is entirely clear at the moment), but it
looks very likely that the problem is due to an interplay between barrier
handling and the multi-level structure of your array (a raid0 being a
member of a raid5).

When a barrier request is processed, both arrays will schedule 'work' to be
done by the 'events' thread, and I'm guessing that you can get into a
situation where one work item is waiting for the other, while the other is
queued behind the first on the single work queue (I wonder if that makes
sense... the sketch below may make it clearer).

Anyway, this patch might make a difference.  It reduces the number of work
items scheduled, in a way that could conceivably fix the problem.

If you can test this, please report the results.  I cannot easily reproduce
the problem so there is limited testing that I can do.
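To make the suspected deadlock concrete, here is a stripped-down sketch of
it as an ordinary module.  To be clear, this is not the md code and all the
names are invented for illustration; it only shows the single-queue pattern
I have in mind.

/*
 * Hypothetical illustration, not the md code: two work items share one
 * single-threaded workqueue.  The first item queues the second and then
 * waits for it, but the worker thread cannot start the second item until
 * the first returns, so neither ever finishes.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

static struct workqueue_struct *shared_wq;	/* one thread, like 'events' */
static DECLARE_COMPLETION(member_done);

/* stands in for the raid0 member's part of the barrier */
static void member_work_fn(struct work_struct *ws)
{
	complete(&member_done);
}
static DECLARE_WORK(member_work, member_work_fn);

/* stands in for the raid5's part, which waits for the member's part */
static void top_work_fn(struct work_struct *ws)
{
	queue_work(shared_wq, &member_work);
	/* blocks forever: member_work is queued behind us on the
	 * same single worker thread */
	wait_for_completion(&member_done);
}
static DECLARE_WORK(top_work, top_work_fn);

static int __init demo_init(void)
{
	shared_wq = create_singlethread_workqueue("demo");
	if (!shared_wq)
		return -ENOMEM;
	queue_work(shared_wq, &top_work);
	return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");

In md both work items go through schedule_work(), i.e. onto the shared
per-cpu 'events' thread, which behaves like the single queue above.  The
patch below avoids scheduling any work for the cases that can be completed
directly in the completion callback, so fewer items can pile up behind each
other.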
Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f20d13e..7f2785c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
 
 #define POST_REQUEST_BARRIER ((void*)1)
 
+static void md_barrier_done(mddev_t *mddev)
+{
+	struct bio *bio = mddev->barrier;
+
+	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
+		bio_endio(bio, -EOPNOTSUPP);
+	else if (bio->bi_size == 0)
+		bio_endio(bio, 0);
+	else {
+		/* other options need to be handled from process context */
+		schedule_work(&mddev->barrier_work);
+		return;
+	}
+	mddev->barrier = NULL;
+	wake_up(&mddev->sb_wait);
+}
+
 static void md_end_barrier(struct bio *bio, int err)
 {
 	mdk_rdev_t *rdev = bio->bi_private;
@@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
 			wake_up(&mddev->sb_wait);
 		} else
 			/* The pre-request barrier has finished */
-			schedule_work(&mddev->barrier_work);
+			md_barrier_done(mddev);
 	}
 	bio_put(bio);
 }
@@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
 
 	atomic_set(&mddev->flush_pending, 1);
 
-	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
-		bio_endio(bio, -EOPNOTSUPP);
-	else if (bio->bi_size == 0)
-		/* an empty barrier - all done */
-		bio_endio(bio, 0);
-	else {
-		bio->bi_rw &= ~REQ_HARDBARRIER;
-		if (mddev->pers->make_request(mddev, bio))
-			generic_make_request(bio);
-		mddev->barrier = POST_REQUEST_BARRIER;
-		submit_barriers(mddev);
-	}
+	bio->bi_rw &= ~REQ_HARDBARRIER;
+	if (mddev->pers->make_request(mddev, bio))
+		generic_make_request(bio);
+	mddev->barrier = POST_REQUEST_BARRIER;
+	submit_barriers(mddev);
+
 	if (atomic_dec_and_test(&mddev->flush_pending)) {
 		mddev->barrier = NULL;
 		wake_up(&mddev->sb_wait);
@@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
 	submit_barriers(mddev);
 
 	if (atomic_dec_and_test(&mddev->flush_pending))
-		schedule_work(&mddev->barrier_work);
+		md_barrier_done(mddev);
 }
 EXPORT_SYMBOL(md_barrier_request);