Re: dstat shows unexpected result for two disk RAID1

From: Nicholas D Steeves <nsteeves@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: dstat shows unexpected result for two disk RAID1
Date: Wed, 9 Mar 2016 17:51:34 -0500
Message-ID: <CAD=QJKgtjgJoH2=d2i2OhK2YD8eWyCeAfX36B04-G33vbW-TXw@mail.gmail.com>
In-Reply-To: <CAJCQCtShrmtnO-d3u+jnFOWqct5QQ0oKErJpDeC+0pw-XiECBA@mail.gmail.com>

On 9 March 2016 at 16:36, Roman Mamedov <rm@romanrm.net> wrote:
> On Wed, 9 Mar 2016 15:25:19 -0500
> Nicholas D Steeves <nsteeves@gmail.com> wrote:
>
>> I understood that a btrfs RAID1 would at best grab one block from sdb
>> and then one block from sdd in round-robin fashion, or at worse grab
>> one chunk from sdb and then one chunk from sdd.  Alternatively I
>> thought that it might read from both simultaneously, to make sure that
>> all data matches, while at the same time providing single-disk
>> performance.  None of these was the case.  Running a single
>> IO-intensive process reads from a single drive.
>
> No RAID1 implementation reads from disks in a round-robin fashion, as that
> would give terrible performance giving disks a constant seek load instead of
> the normal linear read scenario.

On 9 March 2016 at 16:26, Chris Murphy <lists@colorremedies.com> wrote:
> It's normal and recognized to be sub-optimal. So it's an optimization
> opportunity. :-)
>
> I see parallelization of reads and writes to data single profile
> multiple devices as useful also, similar to XFS allocation group
> parallelization. Those AGs are spread across multiple devices in
> md/lvm linear layouts, so if you have processes that read/write to
> multiple AGs at a time, those I/Os happen at the same time when on
> separate devices.

Chris, yes, that's exactly how I thought that it would work.  Roman,
when I said round-robin (please forgive my naïveté) I meant that I had
hoped chunk A1 would be read from disk0 at the same time as chunk A2
from disk1.  Could the btree associated with chunk A1 be searched to
put disk1 to work reading ahead?  Then, when disk0 finishes reading A1
into memory, A2 gets concatenated.

If disk0 finishes reading chunk A1 first, change the primary read disk
for the PID to disk1 and let reading A2 continue, and put disk0 to work
using the same method as disk1 was using previously, but on chunk A3.
Else, if disk1 finishes reading A2 before disk0 finishes A1, then disk0
remains the primary read disk for the PID and disk1 begins reading A3.
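
To make the handoff above concrete, here is a toy Python model.  It is
not btrfs code and does not describe how btrfs currently behaves.  It
assumes each disk needs a fixed, made-up time per chunk, ignores seeks
and queueing, and reads the two cases above as a single rule: whichever
disk finishes first takes the next unread chunk.  The names
simulate_reads and read_time are invented for illustration, and chunk
index 0 stands for A1.

```python
import heapq


def simulate_reads(num_chunks, read_time):
    """Toy model of the proposed handoff (hypothetical, not btrfs code).

    read_time[d] is the assumed time disk d needs to read one chunk.
    Returns (assignment, total_time), where assignment is a list of
    (chunk_index, disk_index) pairs in the order chunks were handed out.
    """
    events = []        # min-heap of (finish_time, disk, chunk)
    assignment = []
    next_chunk = 0

    # Every disk starts on its own chunk: disk0 on A1, disk1 on A2, ...
    for disk, cost in enumerate(read_time):
        if next_chunk < num_chunks:
            heapq.heappush(events, (cost, disk, next_chunk))
            assignment.append((next_chunk, disk))
            next_chunk += 1

    finished_at = {}
    while events:
        now, disk, chunk = heapq.heappop(events)
        finished_at[chunk] = now
        # The disk that just finished picks up the next unread chunk.
        if next_chunk < num_chunks:
            heapq.heappush(events, (now + read_time[disk], disk, next_chunk))
            assignment.append((next_chunk, disk))
            next_chunk += 1

    return assignment, max(finished_at.values(), default=0.0)


if __name__ == "__main__":
    print(simulate_reads(4, [1.0, 1.0]))
    print(simulate_reads(4, [1.0, 3.0]))
```

In this model, two equally fast disks alternate chunks and finish four
chunks in two time units instead of four.  If disk1 is three times
slower, disk0 ends up reading three of the four chunks.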

That's how I thought that it would work, and that the scheduler could
interrupt the readahead operation on the non-primary disk.  E.g., disk1
would become the primary read disk for PID2, while disk0 would
continue as primary for PID1.  And if there's a long queue of reads or
writes then this simplest case would be limited in the following way:
disk0 and disk1 never actually get to read or write to the same chunk.
Is this the explanation why, for practical reasons, dstat shows the
behaviour it shows?
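
As a sketch of the "primary read disk per PID" idea, here is one simple
policy, assumed purely for illustration: pick the disk from the PID
modulo the number of mirrors.  The function names and the example PIDs
are hypothetical.

```python
def primary_disk(pid, num_disks=2):
    """Hypothetical policy: map a process to a mirror by its PID."""
    return pid % num_disks


def group_by_primary(pids, num_disks=2):
    """Group PIDs by the disk that would serve as their primary."""
    groups = {disk: [] for disk in range(num_disks)}
    for pid in pids:
        groups[primary_disk(pid, num_disks)].append(pid)
    return groups


if __name__ == "__main__":
    print(group_by_primary([4000, 4001, 4002]))
```

Under this assumed policy a given process always maps to the same
disk, so a single IO-intensive reader would keep only one drive busy,
while several readers would be spread across both.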

If this is the case, would it be possible for the non-primary read
disk for PID1 to tag the A[x] chunk it wrote to memory with a request
for the PID to use what it wrote to memory from A[x]?  And also for
the "primary" disk to resume from location y in A[x] instead of
beginning from scratch with A[x]?  Roman, in this case, the seeks would
be time-saving, no?

Unfortunately, I don't know how to implement this, but I had imagined
that the btree for a directory contained pointers (I'm using this term
loosely rather than programmatically) to all extents associated with all
files contained underneath it.  Or does it point to the chunk, which
then points to the extent?  At any rate, is this similar to the
dir_index of ext4, and is this the method btrfs uses?

Best regards,
Nicholas

