From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Raid10 performance issues during copy and balance, only half the spindles used for reading data
Date: Tue, 10 Mar 2015 04:37:46 +0000 (UTC)
Message-ID: <pan$f286d$674de9b9$44f18402$f6b4f970@cox.net>
In-Reply-To: <54FE3093.3020902@gmail.com>
Sven Witterstein posted on Tue, 10 Mar 2015 00:45:23 +0100 as excerpted:
> During balance or copies, the second image of the stripeset A + B | A' +
> B' is never used, thus throwing away about 40% of the performance: it
> NEVER read from A' + B', even though 50% of the needed data could have
> been read from there..., so 2 disks were maxed out while the others were
> writing at about 40% of their I/O capacity.
>
> Also when rsyncing to ssd raid0 zpool (just for testing, the ssd-pool is
> the working pool, the zfs and btrfs disk pools are for backup) - only 3
> disks of 6 are read from.
>
> By contrast, a properly set up mdadm "far" or "offset" layout + xfs, and
> zfs itself, use all spindles (devices) to read from, and net data is
> delivered twice as fast.
>
> I would love to see btrfs trying harder to deliver data - I am not sure
> whether it is a missing feature in btrfs raid10 right now or a bug in
> the 3.16 kernel line I am using (Mint Rebecca on my workstation).
>
> If anybody knows about this, or I am missing something (-m=raid10
> -d=raid10 was OK when rebalancing, I hope?), I'd like to be enlightened.
> (When I googled, it was always stated that btrfs would read from all
> spindles, but that's not the case for me...)
Known issue, explained below...
The btrfs raid1 (and thus raid10, since it's inherited) read-scheduling
algorithm remains a rather simplistic one, suitable for btrfs development
and testing, but not yet optimized.
The existing algorithm is a very simple even/odd PID-based one: which
copy a read is serviced from is decided by the parity of the reading
process's PID. Thus, single-thread testing will indeed always read from
the same side of the pair-mirror. (Btrfs raid1 and raid10 are pair-
mirrored only; no N-way-mirroring is available yet, tho it's the next new
feature on the raid roadmap now that raid56 is essentially code-complete
as of 3.19, altho not yet well bug-flushed.) With a reasonably balanced
mix of even- and odd-PID readers, however, you should indeed get
reasonably balanced read activity.
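To make that concrete, here's a tiny userspace sketch in C of the
selection logic as described above (my own illustration, not the actual
btrfs kernel code; pick_mirror() is a hypothetical helper):

/*
 * Illustrative sketch only: under the even/odd scheme, the mirror copy a
 * reader hits is effectively determined by the parity of its PID, so a
 * single-threaded reader always lands on the same device of the pair.
 */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: choose which of num_copies mirror copies to read. */
static int pick_mirror(pid_t reader_pid, int num_copies)
{
        return (int)(reader_pid % num_copies); /* even PID: copy 0, odd PID: copy 1 */
}

int main(void)
{
        /* btrfs raid1/raid10 is pair-mirrored, so num_copies is 2. */
        printf("PID %d would read from mirror copy %d\n",
               (int)getpid(), pick_mirror(getpid(), 2));
        return 0;
}

Run it from a few different shells and the chosen copy flips with the
parity of the PID, which is exactly why a single reader process hammers
only one half of the pair.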
The obvious worst case, of course, is a script that alternately spawns
reader and writer PIDs, or any other arrangement such that all the
readers tend to land on the same side of the even/odd split.
Meanwhile, as stated above, this sort of extremely simplistic algorithm
is reasonably suited to testing, as it's very easy to force multi-PID
read scenarios with either good balance, or a worst-case stress test
where all activity comes from one side or the other (a minimal sketch of
such a parity check follows below). However, it's obviously not
production-grade optimization yet, and that's one of the clearest
remaining indicators (other than flat-out bugs) that btrfs really is
/not/ fully stable yet, even for raid types that have been around long
enough to be effectively as stable as btrfs itself is (unlike the raid56
code newly completed in 3.19).
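And since the scheme is that simple, checking whether a given workload
would spread across both halves of the pair-mirror is mostly a matter of
looking at reader PID parity. Here's the minimal sketch referenced above
(again my own illustration, not a btrfs test tool; real readers would
actually read from the filesystem instead of just exiting):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        int even = 0, odd = 0;

        /* Spawn a handful of "reader" processes and tally their PID parity. */
        for (int i = 0; i < 8; i++) {
                pid_t child = fork();
                if (child < 0) {
                        perror("fork");
                        exit(1);
                }
                if (child == 0) {
                        /* A real test would read from the btrfs filesystem here. */
                        _exit(getpid() % 2);
                }
                int status;
                waitpid(child, &status, 0);
                if (WEXITSTATUS(status) == 0)
                        even++;
                else
                        odd++;
        }
        /* Roughly equal counts mean both halves of the pair-mirror get used;
         * all on one parity is the worst case described above. */
        printf("even-PID readers: %d, odd-PID readers: %d\n", even, odd);
        return 0;
}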
OK, but when /can/ we expect optimization?
Good question. With the caveat that I'm only an admin and list regular
myself, not a dev, and that I've seen no specifics on this particular
matter, reasonable speculation would put better raid1/10 read
optimization either as part of N-way-mirroring or shortly thereafter.
N-way-mirroring is a definitely planned and long-roadmapped feature that
was waiting on raid56, since the N-way-mirroring code is planned to build
on the raid56 code, and arguably, optimizing before that would be
premature optimization of the pair-mirror special case.
So when can N-way-mirroring be expected?
Another good question. A /very/ good one for me, personally, since
that's the feature I really /really/ want to see for my own use case.
Given that various btrfs features have repeatedly taken longer to
implement than planned, and raid56 alone took about three years (its
original introduction slipped from 3.5 or so to 3.9, where it arrived in
a code-incomplete state, undegraded runtime worked but recovery not so
much, and only with 3.19 is the code essentially complete, altho I'd
consider it in bug-testing until 3.21 aka 4.1 at least), I'm really not
expecting N-way-mirroring until maybe this time next year... and even
that's potentially wildly optimistic, given the three years raid56 took.
So again, a best guess for raid1 read optimization, still keeping in
mind that I'm simply a btrfs user and list regular myself, and that I've
not seen any specific discussion on the timing here, only the explanation
of the current algorithm I repeated above...
Some time in 2016... if we're lucky. I'd frankly be surprised to see it
this year. I do expect we'll see it before 2020, and I'd /hope/ by 2018,
but 2016-2018, 1-3 years out... really is about my best guess, given
btrfs history.
(FWIW, I've seen people compare zfs to btrfs in terms of feature
development timing. ZFS moved faster; Wikipedia says 2001-2006, so half
a decade, but I believe they had a rather larger dedicated/paid team
working on it, and it /still/ took them half a decade. Btrfs has fewer
dedicated engineers working on it but /does/ have the advantages of free
and open source, tho AFAIK that shows up mostly in the bug testing/
reporting and to some extent fixing department, not so much main feature
development. Person-hour-wise, from the comparison I read, it's
reasonably equivalent; btrfs is simply doing it with fewer devs,
resulting in it being spread out rather longer. I think some folks are
on record as predicting btrfs would take about a decade to reach a
comparable level, and looking back and forward, that's quite a good
prediction, a decade out on a software project, where software
development happens at internet speed.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman