linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>,
	Yair Hershko <yair@zadarastorage.com>
Subject: Re: Question about RAID1 plug/unplug code
Date: Tue, 9 Sep 2014 19:45:32 +1000	[thread overview]
Message-ID: <20140909194532.621f4500@notabene.brown> (raw)
In-Reply-To: <CAGRgLy4QAz6KVncLsVypsOD85jMUNaOgrFc13=knHMXb92SvXw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5106 bytes --]

On Tue, 9 Sep 2014 11:33:13 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hi Neil,
> 
> 
> On Tue, Sep 9, 2014 at 4:45 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 8 Sep 2014 16:55:52 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Hi Neil,
> >> We have been seeing high latency on the md/raid1 block device, due to
> >> the fact that all WRITEs are handed off to raid1d thread. This thread
> >> also calls bitmap_unplug(), which writes the bitmap synchronously.
> >> While it waits for the bitmap, it cannot trigger other WRITEs waiting
> >> in its pending_bio_list. This is especially seen with SSDs: MD's
> >> latency is much higher that SSD latency (I have been stoned by Peter
> >> Grandi when I brought up this issue previously for raid5).
> >>
> >> Then I have noticed the commit:
> >>
> >> commit f54a9d0e59c4bea3db733921ca9147612a6f292c
> >> Author: NeilBrown <neilb@suse.de>
> >> Date:   Thu Aug 2 08:33:20 2012 +1000
> >>
> >>     md/raid1: submit IO from originating thread instead of md thread.
> >>
> >> Looking at the code, I learned that to avoid switching into raid1d,
> >> the caller has to use blk_start_plug/blk_finish_plug. So I added these
> >> calls in our kernel module, which submits bios to MD. Results were
> >> awesome, MD latency got down significantly.
> >
> > That's good to hear.
> >
> >>
> >> So I have several questions about this plug/unplug thing.
> >>
> >> 1/ Originally this infrastructure was supposed to help IO schedulers
> >> in merging requests. It is useful when one has a bunch of requests to
> >> submit in one shot.
> >
> > That is exactly the whole point of plugging:  allow the device to handle a
> > batch of requests together instead of one at a time.
> >
> >> But in MD case, thus infrastructure is used for a different purpose:
> >> not to merge requests (which may help bandwidth, but probably not
> >> latency), but to avoid making raid1d a bottleneck, to be able to
> >> submit requests from multiple threads in parallel, which brings down
> >> latency significantly in our case. Indeed "struct blk_plug" has a
> >> special "cb_list", which is used only by MD.
> >
> > I don't think the way md uses plugging is conceptually different from any
> > other use: it is always about gathering a batch together.
> > "cb_list" is handled by blk_check_plugged() which is also used by
> > block/umem.c and btrfs.
> >
> > The base plugging code assumes that it is only gathering a batch of requests
> > for a single device - if the target device changes then the batch is flushed.
> > It also assumed that it was "struct request" that was batched.
> > Devices like md that want to queue 'struct bio', something else was needed.
> > Also with layered devices it can be useful to gather multiple batches for
> > multiple layers.
> > So I created "cb_list" etc and a more generic interface.
> >
> >> In my case I have only individual bios (not a bunch of bios), and I
> >> after wrap them with plug/unplug, MD latency gets better. So we are
> >> using the plug infrastructure for a different purpose.
> >> Is my understanding correct? Was this your intention?
> >
> > I don't really understand what you are doing.  There is no point in using
> > plugging for individual bios.  The  main point for raid1 writes is to gather
> > a lot of writes together so that all multiple bitmap bits can be set all at
> > once.
> > It should be possible to submit individual bios directly from make_request
> > without passing them to raid1d and without using plugging.
> Can you pls explain how it is possible?
> You have this code for WRITEs:
>         cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
>         if (cb)
>             plug = container_of(cb, struct raid1_plug_cb, cb);
>         else
>             plug = NULL;
>         spin_lock_irqsave(&conf->device_lock, flags);
>         if (plug) {
>             bio_list_add(&plug->pending, mbio);
>             plug->pending_cnt++;
>         } else {
>             bio_list_add(&conf->pending_bio_list, mbio);
>             conf->pending_count++;
>         }
>         spin_unlock_irqrestore(&conf->device_lock, flags);
> 
> If the thread blk_check_plugged returns NULL, then you always hand the
> WRITE to raid1d. So the only option to avoid handoff to raid1d is for
> the caller to plug. Otherwise, all WRITEs are handed off to raid1d and
> latency becomes terrible.
> So in my case, I use plug/unplug for individual bios only to avoid the
> handoff to raid1d.
> What am I missing in this analysis?

if blk_check_plugged succeeds then it has arranged for raid1_unplug to be
called a little later by that same process.
So there is nothing to stop you calling raid1_unplug immediately.

raid1_unplug essentially does:
  bitmap_unplug()
  generic_make_request()

so you can very nearly just do that, without any plugging.

There is a bit of extra subtlety but I can't really know how relevant that
might be to you without actually seeing you code.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2014-09-09  9:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-08 13:55 Question about RAID1 plug/unplug code Alexander Lyakas
2014-09-09  1:45 ` NeilBrown
2014-09-09  8:33   ` Alexander Lyakas
2014-09-09  9:45     ` NeilBrown [this message]
2014-09-10  8:01       ` Alexander Lyakas
2014-09-10  9:36         ` NeilBrown
2014-09-11  8:22           ` Alexander Lyakas
2014-09-12  6:16             ` NeilBrown
2014-09-14 17:52               ` Alexander Lyakas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140909194532.621f4500@notabene.brown \
    --to=neilb@suse.de \
    --cc=alex.bolshoy@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=yair@zadarastorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).