linux-btrfs.vger.kernel.org archive mirror
From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs and iostat - how do I measure the live performance of my btrsf filesystems?
Date: Sat, 23 Aug 2014 03:24:02 +0000 (UTC)
Message-ID: <pan$fd73$abec75f5$f92eda58$95927b3e@cox.net>
In-Reply-To: CADw2B2Ow+s+V03vT_WPMH5w9pWck-d0uaDi6KjLGOygKugAciA@mail.gmail.com

G. Richard Bellamy posted on Fri, 22 Aug 2014 14:36:22 -0700 as excerpted:

> An interesting exercise saw me reading data from my RAID10 to a USB
> device, which produced the following representative iostat:
> 
> Linux 3.14.17-1-lts (eanna) 08/22/2014 _x86_64_ (24 CPU)
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            3.53    0.00    0.50    2.83    0.00   93.14
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda               1.89         0.01         0.01        839        998
> sdc               0.00         0.00         0.00          1          0
> sdb               1.23         0.02         0.01       1254        998

> sdi             175.40         0.00        20.26         39    1454881

> sdd               0.26         0.01         0.00        827         58
> sde              28.86        12.29         0.00     882447         61
> sdf               0.00         0.00         0.00          1          0
> sdh              25.25        12.29         0.00     882448         57
> sdg               0.25         0.01         0.00        826         60
> 
> /dev/sdi is the USB drive, and /dev/sd[defg] are the four devices in the
> raid10 volume. I'm reading a large (1.1T) file from the raid10 volume
> and writing it to the USB drive.
> 
> You can see that there are approximately two drives from the raid10
> which are being read from - I assume this corresponds to the two spans
> (the 'no lower than the (n/spans)x' speed I mentioned in my original
> post) - and that they aggregate to 24.58MB/s reads. This corresponds to
> the 20.26MB/s writes to the USB drive.
> 
> The raid10 volume is only being used for this file operation, nothing
> else is touching it but the kernel and btrfs.
> 
> I'm curious how others would read this?

Something's not adding up.  You say sd[defg] are the btrfs raid10, but 
it's sde and sdh that are getting the read traffic.  Are you sure sdh 
isn't actually part of the raid10, and one of sd[dfg] (perhaps f, since 
d and g appear to balance out, leaving f the odd one out) isn't?
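To make that check mechanical, here is a minimal Python sketch that picks out the devices actually carrying read traffic from a device table like the one quoted above.  It assumes the `iostat -m` column layout shown there (Device, tps, MB_read/s, MB_wrtn/s, MB_read, MB_wrtn); the helper names `parse_device_table` and `busy_readers`, and the 1.0 MB/s threshold, are illustrative choices, not part of iostat or btrfs.

```python
# Minimal sketch: find which block devices carry read traffic in an
# `iostat -m`-style device table.  The column layout is assumed to match
# the output quoted in this thread; the 1.0 MB/s threshold is arbitrary.

SAMPLE = """\
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.89         0.01         0.01        839        998
sdc               0.00         0.00         0.00          1          0
sdb               1.23         0.02         0.01       1254        998
sdi             175.40         0.00        20.26         39    1454881
sdd               0.26         0.01         0.00        827         58
sde              28.86        12.29         0.00     882447         61
sdf               0.00         0.00         0.00          1          0
sdh              25.25        12.29         0.00     882448         57
sdg               0.25         0.01         0.00        826         60
"""


def parse_device_table(text):
    """Return {device: {column_name: float}} for each data row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()[1:]  # drop the "Device:" label
    table = {}
    for ln in lines[1:]:
        fields = ln.split()
        table[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return table


def busy_readers(table, threshold=1.0):
    """Devices whose MB_read/s exceeds the threshold, sorted by name."""
    return sorted(dev for dev, cols in table.items()
                  if cols["MB_read/s"] > threshold)


if __name__ == "__main__":
    stats = parse_device_table(SAMPLE)
    readers = busy_readers(stats)
    total = sum(stats[d]["MB_read/s"] for d in readers)
    print(readers, round(total, 2))  # ['sde', 'sdh'] 24.58
```

On the quoted numbers this flags sde and sdh, not sdd/sdf/sdg.  For live figures, note that iostat's first report covers the time since boot; the interval form (e.g. `iostat -m 5`) gives per-interval rates after that.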

Assuming sdh is indeed part of the raid10, it makes sense.  The fact 
that only two of the four devices are being actively read matches what's 
known about btrfs raid1/10 at this point: it has a relatively dumb read 
allocation algorithm that was good enough for a first implementation but 
obviously isn't optimal.  Reads are allocated based on the last bit of 
the PID (or TID, I don't recall which), so even/odd.  Since this is a 
single transfer process, all the activity lands on one parity or the 
other, so it's reading from the two-device-wide stripe, but always from 
the same one of the two mirrors backing each stripe.
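A toy model helps show why a single reader lights up exactly two of the four devices.  This is a simplified Python sketch under stated assumptions, not btrfs source code: the raid10 is modeled as two stripes, each stored on a pair of mirror devices, and the mirror for every read is chosen purely by pid parity.  The device names devA..devD and the helpers `device_for_read`/`devices_touched` are hypothetical.

```python
# Simplified model (an assumption, not btrfs source code): a 4-device
# raid10 as two stripes, each stored on a pair of mirror devices.
# The mirror used for a read is chosen purely from pid % 2.

LAYOUT = {
    0: ("devA", "devB"),  # stripe 0: (mirror 0, mirror 1), hypothetical names
    1: ("devC", "devD"),  # stripe 1
}


def device_for_read(block, pid, layout=LAYOUT):
    """Pick the device serving `block` for a reader with this pid."""
    stripe = block % len(layout)        # consecutive blocks alternate stripes
    mirrors = layout[stripe]
    return mirrors[pid % len(mirrors)]  # even/odd pid picks the mirror


def devices_touched(pid, nblocks=1000):
    """Set of devices one reader hits while reading nblocks in a row."""
    return {device_for_read(b, pid) for b in range(nblocks)}


if __name__ == "__main__":
    print(sorted(devices_touched(pid=4242)))  # ['devA', 'devC']
    print(sorted(devices_touched(pid=4243)))  # ['devB', 'devD']
```

Either way the reader spans both stripes, but it never touches the other mirror of either one, which matches two busy devices and two idle ones.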

If you had a second read process going on with the same even/odd pid 
parity, you'd be doubling up on the same two devices.  Only with a 
relatively even mix of even- and odd-pid reads will you see the load 
even out across all four.  See what I mean about a "relatively dumb", 
not well-optimized first implementation?
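The doubling-up effect can be sketched the same way.  Again this is a simplified Python model, not btrfs code: two stripes, each mirrored on two hypothetical devices (devA..devD), mirror chosen by pid % 2, and every reader assumed to read the same block range.  `read_load` is an illustrative helper that counts block reads per device.

```python
from collections import Counter

# Simplified model (an assumption, not btrfs source code): two stripes,
# each mirrored on two hypothetical devices; mirror chosen by pid % 2.
LAYOUT = {0: ("devA", "devB"), 1: ("devC", "devD")}


def read_load(pids, blocks_per_reader=1000):
    """Count block reads per device when each pid reads the same range."""
    # Start every device at zero so idle devices show up in the result.
    load = Counter({dev: 0 for pair in LAYOUT.values() for dev in pair})
    for pid in pids:
        for block in range(blocks_per_reader):
            mirrors = LAYOUT[block % len(LAYOUT)]  # stripe for this block
            load[mirrors[pid % len(mirrors)]] += 1  # parity picks mirror
    return dict(sorted(load.items()))


if __name__ == "__main__":
    # Two even pids: {'devA': 1000, 'devB': 0, 'devC': 1000, 'devD': 0}
    print(read_load([100, 102]))
    # One even, one odd: 500 reads on each of the four devices
    print(read_load([100, 101]))
```

Same-parity readers pile onto one mirror set while the other sits idle; only a mix of parities spreads the work over all four devices.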

Now that btrfs is said to be stabilizing, presumably one of these kernel 
cycles we'll see a better read-mirror allocation algorithm, perhaps as 
part of N-way mirroring when that gets implemented (it's roadmapped for 
after raid5/6 is completed; for now it's two-way mirroring only, 
regardless of the number of devices).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

