From: Lukas Straub <lukasstraub2@web.de>
To: Roger Willcocks <roger@filmlight.ltd.uk>
Cc: dm-devel@redhat.com, Song Liu <song@kernel.org>,
linux-raid@vger.kernel.org
Subject: Re: [dm-devel] Raid0 performance regression
Date: Sun, 23 Jan 2022 18:00:58 +0000
Message-ID: <20220123180058.372f72ce@gecko.fritz.box>
In-Reply-To: <747C2684-B570-473E-9146-E8AB53102236@filmlight.ltd.uk>
CC'ing Song Liu (md-raid maintainer) and the linux-raid mailing list.

On Fri, 21 Jan 2022 16:38:03 +0000
Roger Willcocks <roger@filmlight.ltd.uk> wrote:
> Hi folks,
>
> we noticed a thirty percent drop in performance on one of our raid
> arrays when switching from CentOS 6.5 to 8.4; it uses raid0-like
> striping to balance (by time) access to a pair of hardware raid-6
> arrays. The underlying issue is also present in the native raid0
> driver so herewith the gory details; I'd appreciate your thoughts.
>
> --
>
> blkdev_direct_IO() calls submit_bio() which calls an outermost
> generic_make_request() (aka submit_bio_noacct()).
>
> md_make_request() calls blk_queue_split() which cuts an incoming
> request into two parts with the first no larger than get_max_io_size()
> bytes (which in the case of raid0, is the chunk size):
>
> R -> AB
>
> blk_queue_split() gives the second part 'B' to generic_make_request()
> to worry about later and returns the first part 'A'.
>
> md_make_request() then passes 'A' to a more specific request handler,
> in this case raid0_make_request().
>
> raid0_make_request() cuts its incoming request into two parts at the
> next chunk boundary:
>
> A -> ab
>
> It then fixes up the device (chooses a physical device) for 'a', and
> gives both parts, separately, to generic_make_request().
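
A minimal Python sketch of the two cuts, in abstract units rather than sectors. It assumes a chunk of 8 units, max_io_size equal to the chunk size, and a request spanning units 3 to 50, which roughly matches the ASCII art further down. `queue_split()` and `chunk_split()` are hypothetical helpers that model only the arithmetic, not the kernel's bio handling.

```python
CHUNK = 8   # hypothetical raid0 chunk size, in abstract units
MAX_IO = 8  # hypothetical get_max_io_size() result, equal to CHUNK here


def queue_split(start, end, max_io=MAX_IO):
    """Model of blk_queue_split(): R -> A, B, with A at most max_io long.
    Returns (R, None) when R already fits."""
    if end - start <= max_io:
        return (start, end), None
    return (start, start + max_io), (start + max_io, end)


def chunk_split(start, end, chunk=CHUNK):
    """Model of the raid0 cut: A -> a, b at the next chunk boundary.
    Returns (A, None) when A does not cross a boundary."""
    boundary = (start // chunk + 1) * chunk
    if end <= boundary:
        return (start, end), None
    return (start, boundary), (boundary, end)


A, B = queue_split(3, 50)
a, b = chunk_split(*A)
print("A =", A, "B =", B, "a =", a, "b =", b)
```

This prints A = (3, 11), B = (11, 50), a = (3, 8) and b = (8, 11): 'b' lies entirely inside the second chunk, yet in this model it is still addressed to the md device, just like 'B'.
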
>
> This is where things go awry, because 'b' is still targeted to the
> original device (same as 'B'), but 'B' was queued before 'b'. So we
> end up with:
>
> R -> Bab
>
> The outermost generic_make_request() then cuts 'B' at
> get_max_io_size(), and the process repeats. Ascii art follows:
>
>
>     /---------------------------------------------------/ incoming rq
>
>     /--------/--------/--------/--------/--------/------/ max_io_size
>
> |--------|--------|--------|--------|--------|--------|--------| chunks
>
> |...=====|---=====|---=====|---=====|---=====|---=====|--......| rq out
>       a    b   c    d   e    f   g    h   i    j   k   l
>
> Actual submission order for two-disk raid0: 'aeilhd' and 'cgkjfb'
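
The ordering can be reproduced with a small Python model. It assumes that generic_make_request() in kernels of that era orders the bios queued by each make_request_fn call as lower-level (member disk) bios first, then same-level (md) bios, then whatever was already pending; it also assumes 'B' is queued before 'b', and reuses the hypothetical 8-unit chunk and 3..50 request span. `simulate()` and its `b_first` switch are illustrative names only.

```python
CHUNK, MAX_IO, NDISKS = 8, 8, 2  # hypothetical sizes, in abstract units


def simulate(start, end, b_first=False):
    """Return the per-disk submission order of the pieces, lettered a, b, c...
    by position. b_first=True queues 'b' ahead of 'B' (a rough model of the
    "requeue A before B" idea); the default models the reported behaviour."""
    pending = [("md", start, end)]  # bios waiting in the outermost loop
    submitted = []                  # (member disk, piece start), in order
    while pending:
        level, s, e = pending.pop(0)
        if level == "disk":         # reached a member disk: record it
            submitted.append(((s // CHUNK) % NDISKS, s))
            continue
        same = []                   # bios still aimed at the md device
        if e - s > MAX_IO:          # blk_queue_split(): R -> A + B
            same.append(("md", s + MAX_IO, e))
            e = s + MAX_IO
        boundary = (s // CHUNK + 1) * CHUNK
        if e > boundary:            # raid0: A -> a + b at the chunk boundary
            same.append(("md", boundary, e))
            e = boundary
        if b_first:
            same.reverse()          # put 'b' ahead of 'B'
        lower = [("disk", s, e)]    # 'a', remapped to a member disk
        pending = lower + same + pending
    letters = {s: chr(ord("a") + i)
               for i, s in enumerate(sorted(s for _, s in submitted))}
    return ["".join(letters[s] for d, s in submitted if d == disk)
            for disk in range(NDISKS)]


print(simulate(3, 50))                # ['aeilhd', 'cgkjfb']
print(simulate(3, 50, b_first=True))  # ['adehil', 'bcfgjk']
```

With the default ordering the model yields the per-disk orders quoted above; with `b_first=True` every piece reaches its disk in ascending order. The model ignores timing, plugging and merging, so it speaks to ordering only.
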
>
> --
>
> There are several potential fixes:
>
> the simplest is for raid0 to pass UINT_MAX to blk_queue_max_hw_sectors()
> instead of the chunk size, so that raid0_make_request() receives the
> entire transfer length and cuts it up at chunk boundaries;
>
> the neatest is for raid0_make_request() to recognise that 'b' doesn't
> cross a chunk boundary, so it can be sent directly to the physical
> device;
>
> and the correct one is for blk_queue_split() to requeue 'A' before 'B'.
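
For the second option, the check itself is cheap. A minimal sketch, assuming a single-zone raid0 with equal-sized members (so chunks are striped round-robin) and sizes in sectors; `fits_in_chunk()` and `map_to_member()` are hypothetical helpers, not md functions.

```python
def fits_in_chunk(start, length, chunk):
    """True if sectors [start, start + length) stay inside one chunk."""
    return start // chunk == (start + length - 1) // chunk


def map_to_member(sector, chunk, ndisks):
    """Map an array sector to (member index, sector on that member),
    assuming plain round-robin striping over equal-sized members."""
    chunk_no, offset = divmod(sector, chunk)
    stripe, disk = divmod(chunk_no, ndisks)
    return disk, stripe * chunk + offset


# A 64-sector bio at sector 600, with a 512-sector (256K) chunk, two disks.
if fits_in_chunk(600, 64, 512):
    print("send directly to", map_to_member(600, 512, 2))  # (1, 88)
```

A bio that passes `fits_in_chunk()` needs no further split, so in this sketch it can be remapped and handed to the member device in one step instead of going round the md queue again.
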
>
> --
>
> There's also a second issue: with a large raid0 chunk size (256K), the
> segments submitted to the physical device are at least 128K and
> trigger the early unplug code in blk_mq_make_request(), so the
> requests are never merged. There are legitimate reasons for a large
> chunk size, so this seems unhelpful.
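
The 128K figure follows from the arithmetic: with max_io_size equal to the chunk size, each 'A' (except possibly the last) is exactly one chunk long, so the two pieces satisfy a + b = 256K and the larger one is always at least 128K. A quick Python check over 4K-aligned start offsets, treating 128K as the unplug threshold quoted above (an assumption here, not a value read from the kernel source); `pieces()` is a hypothetical helper.

```python
KIB = 1024
CHUNK = 256 * KIB       # raid0 chunk size from the report
THRESHOLD = 128 * KIB   # unplug threshold as quoted above (assumed)


def pieces(offset_in_chunk, chunk=CHUNK):
    """Sizes of 'a' and 'b' when a chunk-sized 'A' starting at this
    offset within a chunk is cut at the next chunk boundary."""
    a = chunk - offset_in_chunk
    return a, chunk - a


# Smallest "larger piece" over every 4K-aligned starting offset.
worst = min(max(pieces(off)) for off in range(0, CHUNK, 4 * KIB))
print(worst // KIB, "KiB; reaches threshold:", worst >= THRESHOLD)
```

The worst case is a start offset of exactly half a chunk, where both pieces are 128K; at every other offset one piece is larger.
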
>
> --
>
> As I said, I'd appreciate your thoughts.
>
> --
>
> Roger