From: NeilBrown
Subject: Re: Question about RAID1 plug/unplug code
Date: Fri, 12 Sep 2014 16:16:49 +1000
Message-ID: <20140912161649.09b2d898@notabene.brown>
References: <20140909114538.552a4a7d@notabene.brown> <20140909194532.621f4500@notabene.brown> <20140910193631.0c8cd1f4@notabene.brown>
To: Alexander Lyakas
Cc: linux-raid, Yair Hershko
List-Id: linux-raid.ids

On Thu, 11 Sep 2014 11:22:40 +0300 Alexander Lyakas wrote:

> Hi Neil,
>
> On Wed, Sep 10, 2014 at 12:36 PM, NeilBrown wrote:
> > On Wed, 10 Sep 2014 11:01:30 +0300 Alexander Lyakas
> > wrote:
> >
> >> Hello Neil,
> >>
> >> On Tue, Sep 9, 2014 at 12:45 PM, NeilBrown wrote:
> >> > On Tue, 9 Sep 2014 11:33:13 +0300 Alexander Lyakas
> >> > wrote:
> >> >
> >> >> Hi Neil,
> >> >>
> >> >> On Tue, Sep 9, 2014 at 4:45 AM, NeilBrown wrote:
> >> >> > On Mon, 8 Sep 2014 16:55:52 +0300 Alexander Lyakas
> >> >> > wrote:
> >> >> >
> >> >> >> Hi Neil,
> >> >> >> We have been seeing high latency on the md/raid1 block device, due to
> >> >> >> the fact that all WRITEs are handed off to the raid1d thread. This thread
> >> >> >> also calls bitmap_unplug(), which writes the bitmap synchronously.
> >> >> >> While it waits for the bitmap, it cannot issue the other WRITEs waiting
> >> >> >> in its pending_bio_list. This is especially visible with SSDs: MD's
> >> >> >> latency is much higher than the SSD's own latency (I was stoned by Peter
> >> >> >> Grandi when I brought up this issue previously for raid5).
> >> >> >>
> >> >> >> Then I noticed this commit:
> >> >> >>
> >> >> >> commit f54a9d0e59c4bea3db733921ca9147612a6f292c
> >> >> >> Author: NeilBrown
> >> >> >> Date: Thu Aug 2 08:33:20 2012 +1000
> >> >> >>
> >> >> >>     md/raid1: submit IO from originating thread instead of md thread.
> >> >> >>
> >> >> >> Looking at the code, I learned that to avoid switching into raid1d,
> >> >> >> the caller has to use blk_start_plug/blk_finish_plug. So I added these
> >> >> >> calls in our kernel module, which submits bios to MD. The results were
> >> >> >> awesome: MD latency went down significantly.
> >> >> >
> >> >> > That's good to hear.
> >> >> >
> >> >> >>
> >> >> >> So I have several questions about this plug/unplug mechanism.
> >> >> >>
> >> >> >> 1/ Originally this infrastructure was supposed to help IO schedulers
> >> >> >> merge requests. It is useful when one has a bunch of requests to
> >> >> >> submit in one shot.
> >> >> >
> >> >> > That is exactly the whole point of plugging: allow the device to handle a
> >> >> > batch of requests together instead of one at a time.
> >> >> >
> >> >> >> But in MD's case, this infrastructure is used for a different purpose:
> >> >> >> not to merge requests (which may help bandwidth, but probably not
> >> >> >> latency), but to avoid making raid1d a bottleneck - to be able to
> >> >> >> submit requests from multiple threads in parallel, which brings down
> >> >> >> latency significantly in our case. Indeed, "struct blk_plug" has a
> >> >> >> special "cb_list", which is used only by MD.
> >> >> >
> >> >> > I don't think the way md uses plugging is conceptually different from any
> >> >> > other use: it is always about gathering a batch together.
> >> >> > "cb_list" is handled by blk_check_plugged(), which is also used by
> >> >> > block/umem.c and btrfs.
> >> >> >
> >> >> > The base plugging code assumes that it is only gathering a batch of requests
> >> >> > for a single device - if the target device changes, the batch is flushed.
> >> >> > It also assumed that it was "struct request" that was batched.
> >> >> > For devices like md that want to queue 'struct bio', something else was needed.
> >> >> > Also, with layered devices it can be useful to gather multiple batches for
> >> >> > multiple layers.
> >> >> > So I created "cb_list" etc. and a more generic interface.
> >> >> >
> >> >> >> In my case I have only individual bios (not a bunch of bios), and
> >> >> >> after I wrap them with plug/unplug, MD latency gets better. So we are
> >> >> >> using the plug infrastructure for a different purpose.
> >> >> >> Is my understanding correct? Was this your intention?
> >> >> >
> >> >> > I don't really understand what you are doing.  There is no point in using
> >> >> > plugging for individual bios.  The main point for raid1 writes is to gather
> >> >> > a lot of writes together so that multiple bitmap bits can be set all at
> >> >> > once.
> >> >> > It should be possible to submit individual bios directly from make_request
> >> >> > without passing them to raid1d and without using plugging.
> >> >> Can you please explain how that is possible?
> >> >> You have this code for WRITEs:
> >> >>         cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
> >> >>         if (cb)
> >> >>                 plug = container_of(cb, struct raid1_plug_cb, cb);
> >> >>         else
> >> >>                 plug = NULL;
> >> >>         spin_lock_irqsave(&conf->device_lock, flags);
> >> >>         if (plug) {
> >> >>                 bio_list_add(&plug->pending, mbio);
> >> >>                 plug->pending_cnt++;
> >> >>         } else {
> >> >>                 bio_list_add(&conf->pending_bio_list, mbio);
> >> >>                 conf->pending_count++;
> >> >>         }
> >> >>         spin_unlock_irqrestore(&conf->device_lock, flags);
> >> >>
> >> >> If blk_check_plugged returns NULL, then you always hand the
> >> >> WRITE to raid1d. So the only option to avoid the handoff to raid1d is for
> >> >> the caller to plug. Otherwise, all WRITEs are handed off to raid1d and
> >> >> latency becomes terrible.
> >> >> So in my case, I use plug/unplug for individual bios only to avoid the
> >> >> handoff to raid1d.
> >> >> What am I missing in this analysis?
> >> >
> >> > if blk_check_plugged succeeds then it has arranged for raid1_unplug to be
> >> > called a little later by that same process.
> >> > So there is nothing to stop you calling raid1_unplug immediately.
> >> >
> >> > raid1_unplug essentially does:
> >> >    bitmap_unplug()
> >> >    generic_make_request()
> >> >
> >> > so you can very nearly just do that, without any plugging.
> >> I am sorry, but I did not understand your reply. Maybe I did not
> >> explain myself; I will try again.
> >>
> >> I am not changing the raid1.c code. I just want to avoid the handoff to
> >> raid1d on WRITEs. According to your code, there are only two possible
> >> flows:
> >>
> >> Flow 1 - with plugging
> >> # caller calls blk_start_plug
> >> # caller calls submit_bio
> >> # blk_check_plugged succeeds, and the bio is put onto the plug->pending list
> >> # caller calls blk_finish_plug
> >> # raid1_unplug is called in the same caller's thread, so it does
> >> bitmap_unplug and generic_make_request
> >>
> >> Flow 2 - without plugging
> >> # caller calls submit_bio
> >> # blk_check_plugged fails, and the bio is put onto conf->pending_bio_list,
> >> which means it will be submitted by raid1d
> >>
> >> My conclusion from that: to avoid the handoff to raid1d, the caller always
> >> needs to plug, even if it has a single bio to submit. But you said "it
> >> should be possible to submit individual bios directly from
> >> make_request without passing them to raid1d and without using
> >> plugging". So can you explain how that is possible? I prefer not to
> >> change the raid1.c code.
> >>
> >> >
> >> > There is a bit of extra subtlety but I can't really know how relevant that
> >> > might be to you without actually seeing your code.
> >> My code (in a different kernel module, not in raid1.c) is simply doing
> >> submit_bio. I want to wrap this with plug/unplug to avoid the handoff
> >> to raid1d and improve raid1 latency.
> >>
> >
> > I think I need to see the code you are working with to be able to suggest
> > anything useful.
> I am working with kernel 3.8.13. But your master branch has the same
> code with respect to plug/unplug logic.
>
> > But if it works with plugging, then just do it that way(?).
> It works perfectly, and latency is much better. The only doubt is with
> bitmap_unplug being called from multiple threads now. However, that can
> happen for anybody that uses plug/unplug on top of MD raid1 (like ext4,
> for example). So the question is whether it is safe for MD users to
> plug/unplug when submitting bios to MD. If not, would you be fixing
> this?

Didn't I already answer that question?

  Date: Tue, 9 Sep 2014 11:45:38 +1000

  Hmmm... there could be an issue there.  It is possible that some callers of
  bitmap_unplug won't block when they should.

  bitmap_unplug should probably wait unconditionally.  See

  http://git.neil.brown.name/?p=md.git;a=commitdiff;h=339f60d943f848eb516aa4b82b5e187dbbe088dc

  (not tested yet).

NeilBrown

> Alex.