* read regression for dm-snapshot with loopback
@ 2024-10-03 21:13 Leah Rumancik
2024-10-04 5:58 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Leah Rumancik @ 2024-10-03 21:13 UTC (permalink / raw)
To: stable; +Cc: axboe, Christoph Hellwig, bvanassche, linux-block
Hello,
I have been investigating a read performance regression of dm-snapshot
on top of loopback in which the read time for a dd command increased
from 2min to 40min. I bisected the issue to dc5fc361d89 ("block:
attempt direct issue of plug list"). I blktraced before and after this
commit and the main difference I saw was that before this commit, when
the performance was good, there were a lot of IO unplugs on the loop
dev. After this commit, I saw 0 IO unplugs.
On the mainline, I was also able to bisect to a commit which fixed
this issue: 667ea36378cf ("loop: don't set QUEUE_FLAG_NOMERGES"). I
also blktraced before and after this commit, and unsurprisingly, the
main difference was that commit resulted in IO merges whereas
previously there were none being.
I don't totally understand what is going on with the first commit
which introduced the issue but I'd guess some modifying of the plug
list behavior resulted in IO not getting merged/grouped but when we
enabled QUEUE_FLAG_NOMERGES, we were then able to optimize through
this mechanism. Buuuut 2min->40min seems like a huge performance drop
just from merged vs non-merged IO, no? So perhaps it's more
complicated than that...
dc5fc361d89 -> 5.16
667ea36378c -> 6.11
6.6.y and 6.1.y and were both experiencing the performance issue. I
tried porting 667ea36378 to these branches; it applied cleanly and
resolved the issue for both. So perhaps we should consider it for the
stable trees, but it'd be great if someone from the block layer could
chime in with a better idea of what's going on here.
- leah
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: read regression for dm-snapshot with loopback
2024-10-03 21:13 read regression for dm-snapshot with loopback Leah Rumancik
@ 2024-10-04 5:58 ` Christoph Hellwig
2024-10-05 0:41 ` Leah Rumancik
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2024-10-04 5:58 UTC (permalink / raw)
To: Leah Rumancik; +Cc: stable, axboe, Christoph Hellwig, bvanassche, linux-block
Hi Leah,
On Thu, Oct 03, 2024 at 02:13:52PM -0700, Leah Rumancik wrote:
> Hello,
>
> I have been investigating a read performance regression of dm-snapshot
> on top of loopback in which the read time for a dd command increased
> from 2min to 40min. I bisected the issue to dc5fc361d89 ("block:
> attempt direct issue of plug list"). I blktraced before and after this
> commit and the main difference I saw was that before this commit, when
> the performance was good, there were a lot of IO unplugs on the loop
> dev. After this commit, I saw 0 IO unplugs.
/me makes a not that it might make sense to enhance the tracing to show
which of the trace_block_unplug call sites did a particular unplug because
that might be helpful here, but I suspect the ones you saw the ones
from blk_mq_dispatch_plug_list, which now gets bypassed.
I'm not really sure how that changed things, except that I know in
old kernels we had issues with reordering I/O in this path, and
maybe your workload hit that? Did the issue order change in the
blktrace?
> On the mainline, I was also able to bisect to a commit which fixed
> this issue: 667ea36378cf ("loop: don't set QUEUE_FLAG_NOMERGES"). I
> also blktraced before and after this commit, and unsurprisingly, the
> main difference was that commit resulted in IO merges whereas
> previously there were none being.
With QUEUE_FLAG_NOMERGES even very basic merging is enabled, so that
would fix any smaller amount of reordering. It is in fact a bit
stupid that this ever got set by default on the loop driver.
> 6.6.y and 6.1.y and were both experiencing the performance issue. I
> tried porting 667ea36378 to these branches; it applied cleanly and
> resolved the issue for both. So perhaps we should consider it for the
> stable trees, but it'd be great if someone from the block layer could
> chime in with a better idea of what's going on here.
I'm totally fine with backporting the patch.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: read regression for dm-snapshot with loopback
2024-10-04 5:58 ` Christoph Hellwig
@ 2024-10-05 0:41 ` Leah Rumancik
2024-10-05 1:22 ` Stable backport request (was Re: read regression for dm-snapshot with loopback) Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Leah Rumancik @ 2024-10-05 0:41 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: stable, axboe, bvanassche, linux-block
Cool, thanks. I'll poke around some more next week, but sounds good,
let's go ahead with 667ea36378 for 6.6 and 6.1 then.
- leah
On Thu, Oct 3, 2024 at 10:58 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Hi Leah,
>
> On Thu, Oct 03, 2024 at 02:13:52PM -0700, Leah Rumancik wrote:
> > Hello,
> >
> > I have been investigating a read performance regression of dm-snapshot
> > on top of loopback in which the read time for a dd command increased
> > from 2min to 40min. I bisected the issue to dc5fc361d89 ("block:
> > attempt direct issue of plug list"). I blktraced before and after this
> > commit and the main difference I saw was that before this commit, when
> > the performance was good, there were a lot of IO unplugs on the loop
> > dev. After this commit, I saw 0 IO unplugs.
>
> /me makes a not that it might make sense to enhance the tracing to show
> which of the trace_block_unplug call sites did a particular unplug because
> that might be helpful here, but I suspect the ones you saw the ones
> from blk_mq_dispatch_plug_list, which now gets bypassed.
>
> I'm not really sure how that changed things, except that I know in
> old kernels we had issues with reordering I/O in this path, and
> maybe your workload hit that? Did the issue order change in the
> blktrace?
>
> > On the mainline, I was also able to bisect to a commit which fixed
> > this issue: 667ea36378cf ("loop: don't set QUEUE_FLAG_NOMERGES"). I
> > also blktraced before and after this commit, and unsurprisingly, the
> > main difference was that commit resulted in IO merges whereas
> > previously there were none being.
>
> With QUEUE_FLAG_NOMERGES even very basic merging is enabled, so that
> would fix any smaller amount of reordering. It is in fact a bit
> stupid that this ever got set by default on the loop driver.
>
> > 6.6.y and 6.1.y and were both experiencing the performance issue. I
> > tried porting 667ea36378 to these branches; it applied cleanly and
> > resolved the issue for both. So perhaps we should consider it for the
> > stable trees, but it'd be great if someone from the block layer could
> > chime in with a better idea of what's going on here.
>
> I'm totally fine with backporting the patch.
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Stable backport request (was Re: read regression for dm-snapshot with loopback)
2024-10-05 0:41 ` Leah Rumancik
@ 2024-10-05 1:22 ` Jens Axboe
2024-10-06 0:22 ` Sasha Levin
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2024-10-05 1:22 UTC (permalink / raw)
To: Leah Rumancik, Christoph Hellwig
Cc: stable, bvanassche, linux-block, Greg Kroah-Hartman
On 10/4/24 6:41 PM, Leah Rumancik wrote:
> Cool, thanks. I'll poke around some more next week, but sounds good,
> let's go ahead with 667ea36378 for 6.6 and 6.1 then.
Greg, can you pickup 667ea36378cf for 6.1-stable and 6.6-stable?
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Stable backport request (was Re: read regression for dm-snapshot with loopback)
2024-10-05 1:22 ` Stable backport request (was Re: read regression for dm-snapshot with loopback) Jens Axboe
@ 2024-10-06 0:22 ` Sasha Levin
2024-10-06 0:36 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2024-10-06 0:22 UTC (permalink / raw)
To: Jens Axboe
Cc: Leah Rumancik, Christoph Hellwig, stable, bvanassche, linux-block,
Greg Kroah-Hartman
On Fri, Oct 04, 2024 at 07:22:24PM -0600, Jens Axboe wrote:
>On 10/4/24 6:41 PM, Leah Rumancik wrote:
>> Cool, thanks. I'll poke around some more next week, but sounds good,
>> let's go ahead with 667ea36378 for 6.6 and 6.1 then.
>
>Greg, can you pickup 667ea36378cf for 6.1-stable and 6.6-stable?
Queued up, thanks!
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Stable backport request (was Re: read regression for dm-snapshot with loopback)
2024-10-06 0:22 ` Sasha Levin
@ 2024-10-06 0:36 ` Jens Axboe
0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-10-06 0:36 UTC (permalink / raw)
To: Sasha Levin
Cc: Leah Rumancik, Christoph Hellwig, stable, bvanassche, linux-block,
Greg Kroah-Hartman
On 10/5/24 6:22 PM, Sasha Levin wrote:
> On Fri, Oct 04, 2024 at 07:22:24PM -0600, Jens Axboe wrote:
>> On 10/4/24 6:41 PM, Leah Rumancik wrote:
>>> Cool, thanks. I'll poke around some more next week, but sounds good,
>>> let's go ahead with 667ea36378 for 6.6 and 6.1 then.
>>
>> Greg, can you pickup 667ea36378cf for 6.1-stable and 6.6-stable?
>
> Queued up, thanks!
Thanks Sasha!
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-10-06 0:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-03 21:13 read regression for dm-snapshot with loopback Leah Rumancik
2024-10-04 5:58 ` Christoph Hellwig
2024-10-05 0:41 ` Leah Rumancik
2024-10-05 1:22 ` Stable backport request (was Re: read regression for dm-snapshot with loopback) Jens Axboe
2024-10-06 0:22 ` Sasha Levin
2024-10-06 0:36 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox