From: Sven Witterstein <sven.witterstein@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Raid10 performance issues during copy and balance, only half the spindles used for reading data
Date: Sun, 15 Mar 2015 02:30:11 +0100
Message-ID: <5504E0A3.3040908@gmail.com>
Hi Duncan,
thank you for that explanation.
> The existing algorithm is a very simple even/odd PID-based algorithm.
> Thus, single-thread testing will indeed always read from the same side of
> the pair-mirror (since btrfs raid1 and raid10 are pair-mirrored, no N-way-
> mirroring available yet, tho it's the next new feature on the raid roadmap
> now that raid56 is essentially code-complete altho not yet well bug-
> flushed, with 3.19). With a reasonably balanced mix of even/odd-PID
> readers, however, you should indeed get reasonable balanced read activity.
OK, that is really not a good design for production:
whether a PID is even or odd says nothing about which PIDs will request
I/O from the pool at a given time, so it is a poor criterion for
distributing reads.
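
To illustrate the point, here is a minimal sketch of an even/odd PID
rule as described in the quote above. It is not the actual btrfs code;
the function name pick_mirror_by_pid and the example PIDs are made up
for illustration.

```python
import os
from collections import Counter


def pick_mirror_by_pid(pid: int, num_copies: int = 2) -> int:
    """Hypothetical stand-in for the even/odd rule: PID parity picks the copy."""
    return pid % num_copies


# A single-threaded copy (one PID) issuing many reads always hits the same copy.
single_reader = Counter(pick_mirror_by_pid(os.getpid()) for _ in range(1000))

# Several readers only help if their PIDs happen to be mixed.
# Here all four (made-up) PIDs are even, so one copy serves every read.
even_pids = [1200, 1202, 1204, 1206]
unlucky = Counter(pick_mirror_by_pid(p) for p in even_pids for _ in range(250))
```

In both cases the counter ends up with a single key: the other half of
the pair-mirror is never read from, regardless of how busy the first
half is.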
In raid10 (with N-way mirroring to come), read requests from all PIDs
probably need to be put into a common queue and spread across all
available redundant stripesets (or simple mirrors, if a "striped
mirrors" 1+0 layout à la zfs is used, or were possible).
The same would apply to a pure-SSD pool, although some of the fancy
"near/far/seek time" considerations become obsolete there.
Something like that...
Probably an option parameter, by analogy to the I/O scheduler's
elevator= setting (itself an idea from the single-spindle, pre-SSD
era), such as:

elevator=cfq
  (for btrfs: "try to balance reads between devices via a common read
  queue", or "max out all resources and distribute them fairly to the
  requesting apps")
  (optimal for large reads, including several in parallel, such as
  balance / send/receive / copy / tar within the pool while at the same
  time backing up to external storage...)
elevator=noop
  (assign by even/odd PID, the current behavior (testing))
elevator=jumpy
  (e.g. assign a read to the stripeset that has the smallest number of
  other reads on it; every random x seconds, switch stripeset if the
  number of "customers" on one of the other 1..N redundant stripesets
  has decreased, similar to the kernel moving a long-running process
  between cores)
  (optimised for smaller r/w operations, such as many users accessing a
  central server)
etc.

Something like this would leave room to experiment in the years until
2020, as you outlined, and to review whether mdadm's raid10
near/far/offset layouts should be considered at all when most future
storage will be non-rotating...
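
The core of the "jumpy" idea above (send each read to the copy with the
fewest reads currently in flight) could be sketched roughly as follows.
This is only an illustration under my own assumptions: the class
LeastLoadedSelector and its methods are hypothetical, nothing like it
exists in btrfs as far as I know, and the periodic random re-switching
is left out.

```python
class LeastLoadedSelector:
    """Hypothetical read balancer: track in-flight reads per redundant copy."""

    def __init__(self, num_copies: int) -> None:
        self.in_flight = [0] * num_copies

    def start_read(self) -> int:
        # Pick the copy with the fewest outstanding reads (lowest index on ties).
        copy = min(range(len(self.in_flight)), key=self.in_flight.__getitem__)
        self.in_flight[copy] += 1
        return copy

    def finish_read(self, copy: int) -> None:
        # Called when the read completes, so the copy becomes attractive again.
        self.in_flight[copy] -= 1


selector = LeastLoadedSelector(num_copies=2)
# Four reads issued back to back by the same process, none finished yet.
picks = [selector.start_read() for _ in range(4)]
```

Unlike the PID-parity rule, even a single reader that keeps several
reads in flight ends up alternating between both copies here, because
the choice depends on device load and not on who is asking.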
Some kind of self-optimizing should also be included: if the filesystem
knew how much is to be read, it could decide whether it makes sense to
try different methods and find the fastest, much like gparted's
adaptive block size...
Again, it would be interesting to know whether the impact on
non-rotating storage is insignificant.
In my use case it is a simple rsync, cp -a, or nemo/nautilus copy
between zfs and btrfs pools. Those are single-threaded, I guess, and I
understand that btrfs has nothing like the ton of z_read processes that
probably account for the "flying" zfs reads on a 6-disk raidz2, 3x2, or
2x3 vdev layout compared to btrfs reads.
I still find it strange that a balance also uses only half the spindles,
but it is explainable if the same logic is used as for any other read
from the array. At least scrub reads all data and not only one copy ;-)
Interestingly enough, all my other btrfs filesystems are single-SSD ones
for the operating system, with auto-snapshots to be able to revert, and
one is a 2-disk raid0 for throw-away data, so I never had a setup that
would expose this behaviour...
Goodbye,
Sven.