linux-btrfs.vger.kernel.org archive mirror
From: "G. Richard Bellamy" <rbellamy@pteradigm.com>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs and iostat - how do I measure the live performance of my btrsf filesystems?
Date: Fri, 22 Aug 2014 21:13:58 -0700
Message-ID: <CADw2B2P8bPwzcQeO0r5VnLXSJFO+MG_D5egUQ9pJG5M60J0UuA@mail.gmail.com>
In-Reply-To: <pan$fd73$abec75f5$f92eda58$95927b3e@cox.net>

Um. Derp. Yeah, it's actually sd[defh].

Thanks for the continuing education.

On Fri, Aug 22, 2014 at 8:24 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> G. Richard Bellamy posted on Fri, 22 Aug 2014 14:36:22 -0700 as excerpted:
>
>> An interesting exercise saw me reading data from my RAID10 to a USB
>> device, which produced the following representative iostat:
>>
>> Linux 3.14.17-1-lts (eanna) 08/22/2014 _x86_64_ (24 CPU)
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            3.53    0.00    0.50    2.83    0.00   93.14
>>
>> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
>> sda               1.89         0.01         0.01        839        998
>> sdc               0.00         0.00         0.00          1          0
>> sdb               1.23         0.02         0.01       1254        998
>
>> sdi             175.40         0.00        20.26         39    1454881
>
>> sdd               0.26         0.01         0.00        827         58
>> sde              28.86        12.29         0.00     882447         61
>> sdf               0.00         0.00         0.00          1          0
>> sdh              25.25        12.29         0.00     882448         57
>> sdg               0.25         0.01         0.00        826         60
>>
>> /dev/sdi is the USB drive, and /dev/sd[defg] are the four devices in the
>> raid10 volume. I'm reading a large (1.1T) file from the raid10 volume
>> and writing it to the USB drive.
>>
>> You can see that roughly two drives from the raid10 are being read
>> from - I assume this corresponds to the two spans (the 'no lower than
>> the (n/spans)x' speed I mentioned in my original post) - and that they
>> aggregate to 24.58MB/s reads. This corresponds to the 20.26MB/s writes
>> to the USB drive.
>>
>> The raid10 volume is only being used for this file operation, nothing
>> else is touching it but the kernel and btrfs.
>>
>> I'm curious how others would read this?
>
> Something's not adding up.  You say sd[defg] are the btrfs raid10, but
> it's sde and sdh that are getting the read traffic.  Are you sure sdh
> isn't part of the raid10 and one of sd[dfg] (perhaps f, seeing d and g
> appear to balance out leaving f the odd one out?) is?
>
> Assuming sdh is indeed part of the raid10, it makes sense, and the fact
> that only two of the four devices are being actively read matches what's
> known about btrfs raid1/10 at this point -- it has a relatively dumb read
> allocation algorithm that was good enough for a first implementation but
> obviously isn't optimal: reads are allocated based on the last bit of the
> PID (or TID, IDR which), so even/odd.  Since this is a single transfer
> process, all the activity is on one or the other, so it's reading from
> the two-device-wide stripe, but always from the same one of the two
> mirrors supporting each stripe.
>
> If you had a second read process going on and it was the same even/odd
> pid, you'd be doubling up on the same two devices.  Only with a
> relatively even mix of even/odd pid reads will you see things even out
> across all four.  See what I mean about a "relatively dumb" not well
> optimized first implementation?
>
> As they say btrfs is stabilizing now, presumably one of these kernel
> cycles we'll see something better in terms of read mirror allocation
> algorithm, perhaps as part of N-way-mirroring, when that gets implemented
> (roadmapped for after raid5/6 is completed, it's two-way-mirroring only
> now, regardless of the number of devices).
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
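
For anyone following along, here is a toy model of the even/odd read
allocation Duncan describes above. It is only a sketch, not the kernel
code: the function names are made up, and the way sd[defh] are paired
into mirrors is an assumption for illustration (the thread does not say
which device mirrors which).

```python
from collections import Counter

# Hypothetical raid10 layout: two stripes, each mirrored on two devices.
# The pairing below is an assumption, not something stated in the thread.
LAYOUT = [("sdd", "sde"), ("sdf", "sdh")]


def pick_mirror(pid, num_mirrors=2):
    """Toy rule: pick a mirror copy from the reader's PID (even/odd for two)."""
    return pid % num_mirrors


def devices_read(pid, stripes=LAYOUT):
    """Return the one device per stripe that a reader with this PID hits."""
    return [pair[pick_mirror(pid, len(pair))] for pair in stripes]


def load_by_device(pids, stripes=LAYOUT):
    """Count how many readers land on each device, including idle devices."""
    counts = Counter({dev: 0 for pair in stripes for dev in pair})
    for pid in pids:
        counts.update(devices_read(pid, stripes))
    return dict(counts)


if __name__ == "__main__":
    print(load_by_device([1001]))        # one reader: two devices busy
    print(load_by_device([1001, 1003]))  # same parity: same two devices
    print(load_by_device([1000, 1001]))  # mixed parity: all four devices
```

Under this model a single reader only ever touches one device per
stripe, and a second reader with the same PID parity doubles up on the
same two devices, which is the pattern the iostat output shows.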

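The read/write comparison in the quoted iostat output can also be
checked mechanically. A minimal sketch, assuming the six-column
`iostat -m` device layout shown above; the helper names are made up and
the sample rows are copied from the quoted report.

```python
# Device rows copied from the quoted iostat report:
# device, tps, MB_read/s, MB_wrtn/s, MB_read, MB_wrtn
IOSTAT = """\
sdi 175.40 0.00 20.26 39 1454881
sdd 0.26 0.01 0.00 827 58
sde 28.86 12.29 0.00 882447 61
sdf 0.00 0.00 0.00 1 0
sdh 25.25 12.29 0.00 882448 57
"""


def parse_rates(text):
    """Map device name -> (MB_read/s, MB_wrtn/s) for each six-field row."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers or anything that is not a device row
        rates[fields[0]] = (float(fields[2]), float(fields[3]))
    return rates


def total_read(rates, devices):
    """Sum MB_read/s over the given devices, rounded to iostat's precision."""
    return round(sum(rates[dev][0] for dev in devices), 2)


RATES = parse_rates(IOSTAT)

if __name__ == "__main__":
    print("sde+sdh read MB/s:", total_read(RATES, ["sde", "sdh"]))
    print("sd[defh] read MB/s:", total_read(RATES, ["sdd", "sde", "sdf", "sdh"]))
    print("sdi write MB/s:", RATES["sdi"][1])
```

Note that these are averages since boot rather than a live interval, so
the read and write totals only roughly line up.
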