linux-raid.vger.kernel.org archive mirror
From: Andras Korn <korn@raidlist.elan.rulez.org>
To: linux-raid@vger.kernel.org
Subject: Re: write-behind has no measurable effect?
Date: Tue, 15 Feb 2011 02:00:52 +0100
Message-ID: <20110215010052.GA13135@hellgate.intra.guy>
In-Reply-To: <20110215104109.06b12b33@notabene.brown>

On Tue, Feb 15, 2011 at 10:41:09AM +1100, NeilBrown wrote:

> > > I suspect your tests did not test for low latency in a low-throughput
> > > scenario.
> > 
> > I thought they did. "High latency" was, in my case, caused by the high seek
> > times (compared to the SSD) of the spinning disks. Throughput-wise, they
> > certainly could have kept up (their sequential read/write performance even
> > exceeds that of the SSD).
> 
> A "MB/s" number is not going to show a difference with write-behind as it is
> fundamentally about throughput.  We cannot turn random writes into sequential
> writes just by doing 'write-behind' as the same locations on disk still have
> to be written to.

Thanks, I understand now; I had hoped write-behind would in fact re-order
the writes to the slow devices. In retrospect, I'm not sure what gave me
that notion. (Reckless optimism, probably. :)

> > What does it actually do? md(4) isn't very forthcoming, and the wiki has no
> > relevant hits either.
> 
> write-behind makes a copy of the data, submits writes to all devices in
> parallel, and reports success to the upper layer as soon as all the
> non-write-behind writes have finished.

So this really only makes a difference for synchronous writes (because
otherwise success would be reported as soon as the write is buffered),
right?
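
To make that description concrete, here is a toy Python model of a single
mirrored write as I read it. It is only a sketch, not md's code: the device
names and latencies are made up, and it assumes at least one device is not
flagged write-behind.

```python
def ack_and_finish_times(latency_ms, write_behind):
    """Toy model of one mirrored write.

    latency_ms:   device name -> time for that device to complete the write
    write_behind: set of device names flagged write-behind (hypothetical names)
    Returns (time the upper layer sees success, time all copies are on disk).
    """
    # Only the devices that are NOT write-behind are waited for.
    waited_on = [t for dev, t in latency_ms.items() if dev not in write_behind]
    ack = max(waited_on)                 # success is reported once these finish
    finished = max(latency_ms.values())  # the slow copy still has to be written
    return ack, finished


latencies = {"ssd": 0.1, "hdd": 8.0}   # made-up per-write latencies in ms
print(ack_and_finish_times(latencies, write_behind=set()))    # (8.0, 8.0)
print(ack_and_finish_times(latencies, write_behind={"hdd"}))  # (0.1, 8.0)
```

The second number is the same in both cases: the slow disk does the same
seeks either way, so only the latency seen by the caller changes, not the
sustained throughput.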

> The approach you suggest could be synthesised by:
> 
>  - add a write-intent bitmap with fairly small chunks.  This should be
>    an external bitmap and should be directly on the fastest drive
>  - have some daemon that fails the 'slow' device, waits 30 seconds, re-adds
>    it, waits for recovery to complete, and loops back.

Ewww. :)
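
For the record, the daemon half of that suggestion might look roughly like
the Python sketch below. The array and device paths are placeholders, the
explicit --remove step before --re-add is my assumption (it is not in the
description above), and the external bitmap is assumed to have been set up
beforehand. Untested against a real array.

```python
import subprocess
import time


def cycle_commands(array, slow_dev):
    """Return the mdadm invocations for one fail / re-add cycle, in order."""
    return [
        ["mdadm", array, "--fail", slow_dev],    # kick the slow device out
        ["mdadm", array, "--remove", slow_dev],  # assumed necessary before re-add
        ["mdadm", array, "--re-add", slow_dev],  # bitmap limits recovery to dirty chunks
        ["mdadm", "--wait", array],              # block until recovery has finished
    ]


def run_one_cycle(array, slow_dev, pause_s=30,
                  run=subprocess.run, sleep=time.sleep):
    """Fail the slow device, wait, re-add it, and wait for recovery."""
    fail, remove, re_add, wait = cycle_commands(array, slow_dev)
    run(fail, check=True)
    run(remove, check=True)
    sleep(pause_s)             # writes pile up in the bitmap meanwhile
    run(re_add, check=True)
    # --wait can exit non-zero when there was nothing to wait for,
    # so its exit status is deliberately not treated as an error.
    run(wait, check=False)
```

A real daemon would call run_one_cycle("/dev/md0", "/dev/sdb1") in a loop
(both paths hypothetical). Note that the array has no redundancy for writes
made while the slow device is out and until recovery completes.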

> Actually I just realised another reason why you don't see any improvement.
> You are using an internal bitmap.  This requires a synch write to both
> devices.

Yes, that was something I actually wanted to ask. Since it's write_behind_,
it wouldn't need to be a synchronous write though - you could at least allow
the write-mostly disk to reorder it, couldn't you?

>  The use-case for which write-behind was developed involved an external
> bitmap.

My use case, fwiw, is that I have a single SSD and would like to exploit its
close-to-zero seek time while also providing redundancy (using spinning
disks) with eventual consistency. It's not for databases or anything
irreplaceable, just things like logs, svn working copies, vserver system
files... and an external jfs journal. (I know journal i/o is very nearly
sequential, but I don't have a spinning disk to dedicate to it, and if I use
the same disk for other purposes as well, seeking would definitely occur,
decreasing performance.)

> Maybe I should disable bitmap updates to write-behind devices .....

Or make them asynchronous, or lazy (like, update the bitmap whenever you
must seek into the vicinity anyway), or just infrequent. But yes, this
sounds like a very good idea.

Another approach to take would be to mark as dirty, on the fast devices, all
areas being written to, and in the background continuously synch them to the
slow devices, sequentially (marking as clean synched-and-as-yet-unwritten-to
areas); so that the array would be resyncing continually, but be very fast
for random writes. This would of course also require the bitmap to only be
synchronously updated on the fast devices.
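
A toy, single-threaded Python model of what I mean, purely to illustrate the
idea (the class and method names are invented, and nothing like this exists
in md as far as I know):

```python
class LazyMirror:
    """Toy model: writes hit the fast device and mark a chunk dirty;
    a background pass copies dirty chunks to the slow device in
    ascending (i.e. sequential) order."""

    def __init__(self, n_chunks):
        self.fast = [None] * n_chunks
        self.slow = [None] * n_chunks
        self.dirty = set()          # the bitmap, kept only on the fast device

    def write(self, chunk, data):
        self.fast[chunk] = data     # fast device is updated immediately
        self.dirty.add(chunk)       # slow device is now stale for this chunk

    def sync_pass(self):
        """Copy all currently dirty chunks to the slow device, lowest first.
        Returns the order in which chunks were written to the slow device."""
        order = sorted(self.dirty)
        for chunk in order:
            self.slow[chunk] = self.fast[chunk]
            self.dirty.discard(chunk)   # synced and not rewritten since: clean
        return order


m = LazyMirror(8)
for chunk, data in [(7, "a"), (2, "b"), (5, "c"), (2, "d")]:  # random writes
    m.write(chunk, data)
print(m.sync_pass())   # [2, 5, 7]
```

The random writes reach the slow device as one ascending sweep, and the
chunk that was written twice is copied only once. The price is that the
slow device lags behind by however long a pass takes.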

Otoh, this is really a different mechanism from the current write-behind,
aimed at a different use-case, so maybe it could be implemented
orthogonally. (Patches welcome, I'm sure; it's times like these I hate not
being a coder.)

-- 
                     Andras Korn <korn at elan.rulez.org>
                    Take my advice, I don't use it anyway.
