* [PATCH 0/2] md/raid0: Fix performance regression for large sequential IO
@ 2023-08-14 9:27 Jan Kara
2023-08-14 9:27 ` [PATCH 1/2] md/raid0: Factor out helper for mapping and submitting a bio Jan Kara
2023-08-14 9:27 ` [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes Jan Kara
0 siblings, 2 replies; 9+ messages in thread
From: Jan Kara @ 2023-08-14 9:27 UTC
To: Song Liu; +Cc: linux-raid, Neil Brown, Jan Kara
Hello,
one of our customers reported a noticeable performance regression with his
raid0 setup after updating to newer kernels. Investigation of the block
traces made it obvious that the raid0 code generates a very suboptimal IO
pattern for large requests that are not properly aligned to the raid chunk
boundary (see patch 2 for a detailed description). The regression was
introduced six years ago by commit f00d7c85be9e ("md/raid0: fix up bio
splitting.") but went unnoticed for a long time, likely because triggering
it requires multimegabyte requests (possible only after the multipage bvec
work) that are misaligned to the chunk boundary (which usually happens only
if the filesystems are created with suboptimal parameters). These patches
fix the regression.
Honza
* [PATCH 1/2] md/raid0: Factor out helper for mapping and submitting a bio
2023-08-14 9:27 [PATCH 0/2] md/raid0: Fix performance regression for large sequential IO Jan Kara
@ 2023-08-14 9:27 ` Jan Kara
2023-08-15 1:33 ` Yu Kuai
2023-08-14 9:27 ` [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes Jan Kara
1 sibling, 1 reply; 9+ messages in thread
From: Jan Kara @ 2023-08-14 9:27 UTC
To: Song Liu; +Cc: linux-raid, Neil Brown, Jan Kara
Factor out helper function for mapping and submitting a bio out of
raid0_make_request(). We will use it later for submitting both parts of
a split bio.
Signed-off-by: Jan Kara <jack@suse.cz>
---
drivers/md/raid0.c | 79 +++++++++++++++++++++++-----------------------
1 file changed, 40 insertions(+), 39 deletions(-)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index d1ac73fcd852..d3c55f2e9b18 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -557,54 +557,21 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
bio_endio(bio);
}
-static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
+static void raid0_map_submit_bio(struct mddev *mddev, struct bio *bio)
{
struct r0conf *conf = mddev->private;
struct strip_zone *zone;
struct md_rdev *tmp_dev;
- sector_t bio_sector;
- sector_t sector;
- sector_t orig_sector;
- unsigned chunk_sects;
- unsigned sectors;
-
- if (unlikely(bio->bi_opf & REQ_PREFLUSH)
- && md_flush_request(mddev, bio))
- return true;
-
- if (unlikely((bio_op(bio) == REQ_OP_DISCARD))) {
- raid0_handle_discard(mddev, bio);
- return true;
- }
-
- bio_sector = bio->bi_iter.bi_sector;
- sector = bio_sector;
- chunk_sects = mddev->chunk_sectors;
-
- sectors = chunk_sects -
- (likely(is_power_of_2(chunk_sects))
- ? (sector & (chunk_sects-1))
- : sector_div(sector, chunk_sects));
-
- /* Restore due to sector_div */
- sector = bio_sector;
-
- if (sectors < bio_sectors(bio)) {
- struct bio *split = bio_split(bio, sectors, GFP_NOIO,
- &mddev->bio_set);
- bio_chain(split, bio);
- submit_bio_noacct(bio);
- bio = split;
- }
+ sector_t bio_sector = bio->bi_iter.bi_sector;
+ sector_t sector = bio_sector;
if (bio->bi_pool != &mddev->bio_set)
md_account_bio(mddev, &bio);
- orig_sector = sector;
zone = find_zone(mddev->private, &sector);
switch (conf->layout) {
case RAID0_ORIG_LAYOUT:
- tmp_dev = map_sector(mddev, zone, orig_sector, &sector);
+ tmp_dev = map_sector(mddev, zone, bio_sector, &sector);
break;
case RAID0_ALT_MULTIZONE_LAYOUT:
tmp_dev = map_sector(mddev, zone, sector, &sector);
@@ -612,13 +579,13 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
default:
WARN(1, "md/raid0:%s: Invalid layout\n", mdname(mddev));
bio_io_error(bio);
- return true;
+ return;
}
if (unlikely(is_rdev_broken(tmp_dev))) {
bio_io_error(bio);
md_error(mddev, tmp_dev);
- return true;
+ return;
}
bio_set_dev(bio, tmp_dev->bdev);
@@ -630,6 +597,40 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
bio_sector);
mddev_check_write_zeroes(mddev, bio);
submit_bio_noacct(bio);
+}
+
+static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
+{
+ sector_t sector;
+ unsigned chunk_sects;
+ unsigned sectors;
+
+ if (unlikely(bio->bi_opf & REQ_PREFLUSH)
+ && md_flush_request(mddev, bio))
+ return true;
+
+ if (unlikely((bio_op(bio) == REQ_OP_DISCARD))) {
+ raid0_handle_discard(mddev, bio);
+ return true;
+ }
+
+ sector = bio->bi_iter.bi_sector;
+ chunk_sects = mddev->chunk_sectors;
+
+ sectors = chunk_sects -
+ (likely(is_power_of_2(chunk_sects))
+ ? (sector & (chunk_sects-1))
+ : sector_div(sector, chunk_sects));
+
+ if (sectors < bio_sectors(bio)) {
+ struct bio *split = bio_split(bio, sectors, GFP_NOIO,
+ &mddev->bio_set);
+ bio_chain(split, bio);
+ submit_bio_noacct(bio);
+ bio = split;
+ }
+
+ raid0_map_submit_bio(mddev, bio);
return true;
}
--
2.35.3
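For context before the next patch: patch 2 changes the split branch above to map and submit the remainder through this new helper as well. Assembled from the two diffs in this series, the resulting raid0_make_request() looks roughly as follows (a sketch with the REQ_PREFLUSH and REQ_OP_DISCARD special cases elided, not a verbatim copy of the final code):

/* Sketch of raid0_make_request() after both patches in this series. */
static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
{
        sector_t sector = bio->bi_iter.bi_sector;
        unsigned chunk_sects = mddev->chunk_sectors;
        unsigned sectors;

        /* Sectors remaining up to the next chunk boundary. */
        sectors = chunk_sects -
                (likely(is_power_of_2(chunk_sects))
                 ? (sector & (chunk_sects - 1))
                 : sector_div(sector, chunk_sects));

        if (sectors < bio_sectors(bio)) {
                struct bio *split = bio_split(bio, sectors, GFP_NOIO,
                                              &mddev->bio_set);
                bio_chain(split, bio);
                /* Patch 2: map and submit the remainder right away instead
                 * of punting it back through submit_bio_noacct(). */
                raid0_map_submit_bio(mddev, bio);
                bio = split;
        }

        /* Map and submit the first, now chunk-aligned, part. */
        raid0_map_submit_bio(mddev, bio);
        return true;
}

Note that bio_split() returns the first 'sectors' sectors as a new bio and leaves the remainder in 'bio', so the remainder is mapped and submitted first and the chunk-aligned head second; since both are already mapped to the underlying disks, the bio_list sorting no longer interleaves them with later splits.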
* [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes
2023-08-14 9:27 [PATCH 0/2] md/raid0: Fix performance regression for large sequential IO Jan Kara
2023-08-14 9:27 ` [PATCH 1/2] md/raid0: Factor out helper for mapping and submitting a bio Jan Kara
@ 2023-08-14 9:27 ` Jan Kara
2023-08-14 12:15 ` Yu Kuai
2023-08-15 1:33 ` Yu Kuai
1 sibling, 2 replies; 9+ messages in thread
From: Jan Kara @ 2023-08-14 9:27 UTC
To: Song Liu; +Cc: linux-raid, Neil Brown, Jan Kara
Commit f00d7c85be9e ("md/raid0: fix up bio splitting."), among other
things, changed how a bio that needs to be split is submitted. Before this
commit, we split the bio, then mapped and submitted each part. After this
commit, we map only the first part of the split bio and submit the second
part unmapped. Due to bio sorting in __submit_bio_noacct() this results in
the following request ordering (in the blktrace excerpts below, Q = queued,
X = split, A = remapped to a lower device, G = get request):
9,0 18 1181 0.525037895 15995 Q WS 1479315464 + 63392
Split off chunk-sized (1024 sectors) request:
9,0 18 1182 0.629019647 15995 X WS 1479315464 / 1479316488
The request is not aligned to the chunk, so it is split in
raid0_make_request(). The first part is mapped and punted to the
bio_list:
8,0 18 7053 0.629020455 15995 A WS 739921928 + 1016 <- (9,0) 1479315464
Now raid0_make_request() returns, second part is postponed on
bio_list. __submit_bio_noacct() resorts the bio_list, mapped request
is submitted to the underlying device:
8,0 18 7054 0.629022782 15995 G WS 739921928 + 1016
Now we take another request from the bio_list, which is the remainder
of the original huge request. Another chunk-sized piece is split off
from it and the situation repeats:
9,0 18 1183 0.629024499 15995 X WS 1479316488 / 1479317512
8,16 18 6998 0.629025110 15995 A WS 739921928 + 1016 <- (9,0) 1479316488
8,16 18 6999 0.629026728 15995 G WS 739921928 + 1016
...
9,0 18 1184 0.629032940 15995 X WS 1479317512 / 1479318536 [libnetacq-write]
8,0 18 7059 0.629033294 15995 A WS 739922952 + 1016 <- (9,0) 1479317512
8,0 18 7060 0.629033902 15995 G WS 739922952 + 1016
...
This repeats until the whole original huge request is consumed. Only
then do we finally get to processing the second parts of the split-off
requests (in reverse order):
8,16 18 7181 0.629161384 15995 A WS 739952640 + 8 <- (9,0) 1479377920
8,0 18 7239 0.629162140 15995 A WS 739952640 + 8 <- (9,0) 1479376896
8,16 18 7186 0.629163881 15995 A WS 739951616 + 8 <- (9,0) 1479375872
8,0 18 7242 0.629164421 15995 A WS 739951616 + 8 <- (9,0) 1479374848
...
I guess it is obvious that this IO pattern is an extremely inefficient
way to perform sequential IO. It also makes the bio_list grow rather
long.
Change raid0_make_request() to map both parts of the split bio. Since we
know we are provided with at most chunk-sized bios, we will always need
to split the incoming bio at most once.
Fixes: f00d7c85be9e ("md/raid0: fix up bio splitting.")
Signed-off-by: Jan Kara <jack@suse.cz>
---
drivers/md/raid0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index d3c55f2e9b18..595856948dff 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -626,7 +626,7 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
struct bio *split = bio_split(bio, sectors, GFP_NOIO,
&mddev->bio_set);
bio_chain(split, bio);
- submit_bio_noacct(bio);
+ raid0_map_submit_bio(mddev, bio);
bio = split;
}
--
2.35.3
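The "bio sorting in __submit_bio_noacct()" referenced in the commit message is the recursion-avoidance loop in block/blk-core.c. A simplified sketch of its shape (details vary across kernel versions; this is an illustration rather than an exact copy):

/* Simplified sketch of __submit_bio_noacct() from block/blk-core.c.
 * A bio re-submitted from inside ->submit_bio() is only queued on
 * current->bio_list and processed by a later loop iteration, after
 * bios destined for lower-level devices. */
static void __submit_bio_noacct(struct bio *bio)
{
        struct bio_list bio_list_on_stack[2];

        bio_list_init(&bio_list_on_stack[0]);
        current->bio_list = bio_list_on_stack;

        do {
                struct request_queue *q = bdev_get_queue(bio->bi_bdev);
                struct bio_list lower, same;

                /* Save anything queued by previous iterations. */
                bio_list_on_stack[1] = bio_list_on_stack[0];
                bio_list_init(&bio_list_on_stack[0]);

                /* Calls e.g. md's ->submit_bio(); any submit_bio_noacct()
                 * done in there only appends to bio_list_on_stack[0]. */
                __submit_bio(bio);

                /* Sort: bios for lower devices before bios re-queued for
                 * the same (here: md) device. */
                bio_list_init(&lower);
                bio_list_init(&same);
                while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
                        if (q == bdev_get_queue(bio->bi_bdev))
                                bio_list_add(&same, bio);
                        else
                                bio_list_add(&lower, bio);

                bio_list_merge(&bio_list_on_stack[0], &lower);
                bio_list_merge(&bio_list_on_stack[0], &same);
                bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
        } while ((bio = bio_list_pop(&bio_list_on_stack[0])));

        current->bio_list = NULL;
}

The unmapped remainder still targets the md device, so it is sorted into 'same' and handled only after the bios for the lower devices; each loop iteration then re-enters raid0_make_request() for the remainder and splits off just one more chunk, which produces the interleaved pattern in the traces above.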
* Re: [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes
2023-08-14 9:27 ` [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes Jan Kara
@ 2023-08-14 12:15 ` Yu Kuai
2023-08-14 13:15 ` Jan Kara
2023-08-15 1:33 ` Yu Kuai
1 sibling, 1 reply; 9+ messages in thread
From: Yu Kuai @ 2023-08-14 12:15 UTC
To: Jan Kara, Song Liu; +Cc: linux-raid, Neil Brown, yukuai (C)
Hi,
On 2023/08/14 17:27, Jan Kara wrote:
> [...]
>
> Change raid0_make_request() to map both parts of the split bio. Since we
> know we are provided with at most chunk-sized bios, we will always need
> to split the incoming bio at most once.
I understand the problem now, but I'm lost here; can you explain "at
most once" a bit more? Do you mean that the original bio won't be split
again?
Thanks,
Kuai
* Re: [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes
2023-08-14 12:15 ` Yu Kuai
@ 2023-08-14 13:15 ` Jan Kara
2023-08-15 1:27 ` Yu Kuai
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2023-08-14 13:15 UTC
To: Yu Kuai; +Cc: Jan Kara, Song Liu, linux-raid, Neil Brown, yukuai (C)
On Mon 14-08-23 20:15:58, Yu Kuai wrote:
> Hi,
>
> On 2023/08/14 17:27, Jan Kara wrote:
> > [...]
> >
> > Change raid0_make_request() to map both parts of the split bio. Since we
> > know we are provided with at most chunk-sized bios, we will always need
> > to split the incoming bio at most once.
>
> I understand the problem now, but I'm lost here; can you explain "at
> most once" a bit more? Do you mean that the original bio won't be split
> again?
Yes. md_submit_bio() splits the incoming bio with bio_split_to_limits(), so
we are guaranteed raid0_make_request() gets a bio at most chunk_sectors
long. If this bio is misaligned, it will be split in raid0_make_request().
But after that we are sure both parts of the bio meet the requirements of
the raid0 code, so we can directly map them to the underlying device and
submit them.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
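The "at most once" guarantee can also be checked mechanically: once bio_split_to_limits() caps a bio at chunk_sects, the tail left after one boundary split can never cross another chunk boundary. Below is a small standalone C program illustrating the invariant; it mirrors the alignment arithmetic from the patch for a power-of-2 chunk size and is not kernel code:

#include <assert.h>

/* Sectors remaining up to the next chunk boundary, as computed in
 * raid0_make_request() for a power-of-2 chunk size. */
static unsigned sectors_to_boundary(unsigned long long sector,
                                    unsigned chunk_sects)
{
        return chunk_sects - (sector & (chunk_sects - 1));
}

int main(void)
{
        const unsigned chunk_sects = 1024;      /* 512 KiB chunk */
        unsigned long long sector;
        unsigned len;

        /* Exhaustively check all start offsets within a few chunks and
         * all bio lengths up to chunk_sects (the cap guaranteed by
         * bio_split_to_limits()). */
        for (sector = 0; sector < 4ULL * chunk_sects; sector++) {
                for (len = 1; len <= chunk_sects; len++) {
                        unsigned head = sectors_to_boundary(sector, chunk_sects);

                        if (head >= len)
                                continue;       /* fits in one chunk: no split */

                        /* After splitting off 'head' sectors, the tail starts
                         * exactly on a chunk boundary and is shorter than a
                         * chunk, so it never needs a second split. */
                        assert((sector + head) % chunk_sects == 0);
                        assert(len - head < chunk_sects);
                }
        }
        return 0;
}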
* Re: [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes
2023-08-14 13:15 ` Jan Kara
@ 2023-08-15 1:27 ` Yu Kuai
0 siblings, 0 replies; 9+ messages in thread
From: Yu Kuai @ 2023-08-15 1:27 UTC
To: Jan Kara, Yu Kuai; +Cc: Song Liu, linux-raid, Neil Brown, yukuai (C)
Hi, Jan!
On 2023/08/14 21:15, Jan Kara wrote:
> On Mon 14-08-23 20:15:58, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/08/14 17:27, Jan Kara wrote:
>>> [...]
>>>
>>> Change raid0_make_request() to map both parts of the split bio. Since we
>>> know we are provided with at most chunk-sized bios, we will always need
>>> to split the incoming bio at most once.
>>
>> I understand the problem now, but I'm lost here; can you explain "at
>> most once" a bit more? Do you mean that the original bio won't be split
>> again?
>
> Yes. md_submit_bio() splits the incoming bio with bio_split_to_limits(), so
> we are guaranteed raid0_make_request() gets a bio at most chunk_sectors
> long. If this bio is misaligned, it will be split in raid0_make_request().
> But after that we are sure both parts of the bio meet the requirements of
> the raid0 code, so we can directly map them to the underlying device and
> submit them.
Thanks for the explanation, I understand this now.
Kuai
* Re: [PATCH 1/2] md/raid0: Factor out helper for mapping and submitting a bio
2023-08-14 9:27 ` [PATCH 1/2] md/raid0: Factor out helper for mapping and submitting a bio Jan Kara
@ 2023-08-15 1:33 ` Yu Kuai
2023-08-15 7:38 ` Song Liu
0 siblings, 1 reply; 9+ messages in thread
From: Yu Kuai @ 2023-08-15 1:33 UTC
To: Jan Kara, Song Liu; +Cc: linux-raid, Neil Brown, yukuai (C)
On 2023/08/14 17:27, Jan Kara wrote:
> Factor out helper function for mapping and submitting a bio out of
> raid0_make_request(). We will use it later for submitting both parts of
> a split bio.
>
LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> [...]
* Re: [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes
2023-08-14 9:27 ` [PATCH 2/2] md/raid0: Fix performance regression for large sequential writes Jan Kara
2023-08-14 12:15 ` Yu Kuai
@ 2023-08-15 1:33 ` Yu Kuai
1 sibling, 0 replies; 9+ messages in thread
From: Yu Kuai @ 2023-08-15 1:33 UTC
To: Jan Kara, Song Liu; +Cc: linux-raid, Neil Brown, yukuai (C)
On 2023/08/14 17:27, Jan Kara wrote:
> [...]
>
LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
* Re: [PATCH 1/2] md/raid0: Factor out helper for mapping and submitting a bio
2023-08-15 1:33 ` Yu Kuai
@ 2023-08-15 7:38 ` Song Liu
0 siblings, 0 replies; 9+ messages in thread
From: Song Liu @ 2023-08-15 7:38 UTC
To: Yu Kuai; +Cc: Jan Kara, linux-raid, Neil Brown, yukuai (C)
On Tue, Aug 15, 2023 at 9:33 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> On 2023/08/14 17:27, Jan Kara wrote:
> > Factor out helper function for mapping and submitting a bio out of
> > raid0_make_request(). We will use it later for submitting both parts of
> > a split bio.
> >
> LGTM
>
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
Applied the set to md-next. Thanks!
Song