* Merging raw block device writes
From: Michael Kelley @ 2023-11-25 17:38 UTC (permalink / raw)
To: axboe@kernel.dk, hch@lst.de, linux-block@vger.kernel.org,
linux-hyperv@vger.kernel.org
About 18 months ago, I raised an issue with merging of direct raw block
device writes. [1] At the time, nobody offered any input on the issue.
To recap, when multiple direct write requests are in-flight to a raw block
device with I/O scheduler "none", and requests must wait for budget
(i.e., the device is SCSI), write requests don't go on a blk-mq software
queue, and no merging is done. Direct read requests that must wait for
budget *do* go on a software queue and merges happen.
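The conditions described above can be checked from sysfs. A minimal sketch, assuming a SCSI disk named sda (a placeholder; substitute your device):

```shell
# Print the settings that determine whether the no-scheduler merge
# path applies to a given block device.
merge_config() {
    q="/sys/block/$1/queue"
    if [ -r "$q/scheduler" ]; then
        # Active scheduler is shown in brackets, e.g. "[none] mq-deadline"
        printf 'scheduler: %s\n' "$(cat "$q/scheduler")"
        # nomerges: 0 = merging allowed, 1 = only simple merges, 2 = none
        printf 'nomerges:  %s\n' "$(cat "$q/nomerges")"
    else
        printf 'no queue sysfs for %s\n' "$1"
    fi
}

merge_config sda    # "sda" is an assumed device name
```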
Recently, I noticed that the problem has been fixed in the latest
upstream kernel, and I had time to do further investigation on the
issue. Bisecting shows the problem first occurred in 5.16-rc1 with
commit dc5fc361d891 from Jens Axboe. This commit actually prevents
merging of both reads and writes. But reads were indirectly fixed in
commit 54a88eb838d3 from Pavel Begunkov, also in 5.16-rc1, so
the read problem never occurred in a release. There's no mention
of merging in either commit message, so I suspect the effect on
merging was unintentional in both cases. In 5.16, blkdev_read_iter()
does not create a plug list, while blkdev_write_iter() does. But the
lower level __blkdev_direct_IO() creates a plug list for both reads
and writes, which is why commit dc5fc361d891 broke both. Then
commit 54a88eb838d3 bypassed __blkdev_direct_IO() in most
cases, and the new path does not create a plug list. So reads
typically proceed without a plug list, and the merging can happen.
Writes still don't merge because of the plug list in the higher level
blkdev_write_iter().
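For anyone less familiar with plugging, the pattern at issue looks roughly like this (a simplified sketch of the kernel-internal API, not compilable in userspace; details elided):

```c
/*
 * While a plug is active, submitted requests are held on the per-task
 * plug list rather than being placed on a blk-mq software queue, which
 * is where the no-scheduler merging described above would otherwise
 * happen.
 */
struct blk_plug plug;

blk_start_plug(&plug);     /* sets current->plug */
submit_bio(bio);           /* request accumulates on the plug list */
blk_finish_plug(&plug);    /* flushes the plug list to the driver */
```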
The situation stayed that way until 6.5-rc1 when commit
712c7364655f from Christoph removed the plug list from
blkdev_write_iter(). Again, there's no mention of merging in the
commit message, so fixing the merge problem may be happenstance.
Hyper-V guests and the Azure cloud have a particular interest here
because Hyper-V guests use SCSI as the standard interface to virtual
disks. Azure cloud disks can be throttled to a limited number of IOPS,
so the number of in-flight I/Os can be relatively high, and
merging can be beneficial to staying within the throttle
limits. On the flip side, this problem hasn't generated complaints
over the last 18 months that I'm aware of, though that may be more
because commercial distros haven't been running 5.16 or later kernels
until relatively recently.
In any case, the 6.5 kernel fixes the problem, at least in the
common cases where there's no plug list. But I still wonder if
there's a latent problem with the original commit dc5fc361d891
that should be looked at by someone with more blk-mq expertise
than I have.
Michael
[1] https://lore.kernel.org/linux-block/PH0PR21MB3025A7D1326A92A4B8BDB5FED7B59@PH0PR21MB3025.namprd21.prod.outlook.com/
* Re: Merging raw block device writes
From: hch @ 2023-11-27 6:59 UTC (permalink / raw)
To: Michael Kelley
Cc: axboe@kernel.dk, hch@lst.de, linux-block@vger.kernel.org,
linux-hyperv@vger.kernel.org
On Sat, Nov 25, 2023 at 05:38:28PM +0000, Michael Kelley wrote:
> Hyper-V guests and the Azure cloud have a particular interest here
> because Hyper-V guests use SCSI as the standard interface to virtual
> disks. Azure cloud disks can be throttled to a limited number of IOPS,
> so the number of in-flight I/Os can be relatively high, and
> merging can be beneficial to staying within the throttle
> limits. On the flip side, this problem hasn't generated complaints
> over the last 18 months that I'm aware of, though that may be more
> because commercial distros haven't been running 5.16 or later kernels
> until relatively recently.
I think the more important thing is that if you care about reducing
the number of I/Os you probably should use an I/O scheduler. Reducing
the number of I/Os without an I/O scheduler isn't (and I'll argue
shouldn't be) a concern for the no-scheduler path.
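Switching away from "none" is a one-line sysfs write; a sketch, with sdX as a placeholder device name:

```shell
# Select an I/O scheduler so merging is handled by the scheduler rather
# than the best-effort no-scheduler path.  "sdX" is a placeholder; this
# requires root and a real device.
echo mq-deadline > /sys/block/sdX/queue/scheduler
cat /sys/block/sdX/queue/scheduler   # active scheduler shown in brackets
```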
* Re: Merging raw block device writes
From: Jens Axboe @ 2023-11-27 16:10 UTC (permalink / raw)
To: hch@lst.de, Michael Kelley
Cc: linux-block@vger.kernel.org, linux-hyperv@vger.kernel.org
On 11/26/23 11:59 PM, hch@lst.de wrote:
> On Sat, Nov 25, 2023 at 05:38:28PM +0000, Michael Kelley wrote:
>> Hyper-V guests and the Azure cloud have a particular interest here
>> because Hyper-V guests use SCSI as the standard interface to virtual
>> disks. Azure cloud disks can be throttled to a limited number of IOPS,
>> so the number of in-flight I/Os can be relatively high, and
>> merging can be beneficial to staying within the throttle
>> limits. On the flip side, this problem hasn't generated complaints
>> over the last 18 months that I'm aware of, though that may be more
>> because commercial distros haven't been running 5.16 or later kernels
>> until relatively recently.
>
> I think the more important thing is that if you care about reducing
> the number of I/Os you probably should use an I/O scheduler. Reducing
> the number of I/Os without an I/O scheduler isn't (and I'll argue
> shouldn't be) a concern for the no-scheduler path.
Yep fully agree.
--
Jens Axboe
* RE: Merging raw block device writes
From: Michael Kelley @ 2023-11-28 19:29 UTC (permalink / raw)
To: Jens Axboe, hch@lst.de
Cc: linux-block@vger.kernel.org, linux-hyperv@vger.kernel.org
From: Jens Axboe <axboe@kernel.dk> Sent: Monday, November 27, 2023 8:10 AM
>
> On 11/26/23 11:59 PM, hch@lst.de wrote:
> > On Sat, Nov 25, 2023 at 05:38:28PM +0000, Michael Kelley wrote:
> >> Hyper-V guests and the Azure cloud have a particular interest here
> >> because Hyper-V guests use SCSI as the standard interface to virtual
> >> disks. Azure cloud disks can be throttled to a limited number of IOPS,
> >> so the number of in-flight I/Os can be relatively high, and
> >> merging can be beneficial to staying within the throttle
> >> limits. On the flip side, this problem hasn't generated complaints
> >> over the last 18 months that I'm aware of, though that may be more
> >> because commercial distros haven't been running 5.16 or later kernels
> >> until relatively recently.
> >
> > I think the more important thing is that if you care about reducing
> > the number of I/Os you probably should use an I/O scheduler. Reducing
> > the number of I/Os without an I/O scheduler isn't (and I'll argue
> > shouldn't be) a concern for the no-scheduler path.
>
> Yep fully agree.
>
OK. But there *is* intentional functionality in blk-mq to do merging
even when there's no I/O scheduler. If that functionality breaks, is
that considered a bug and regression? The functionality only affects
performance and not correctness, so maybe it's a bit of a gray area.
It's all working again as of 6.5, so the only potential code action is to
backport Christoph's commit to stable releases. But it still seems like
there should be an explicit statement about what to expect going forward.
Should the code for doing merging with no I/O scheduler be removed, or
at least put on the deprecation path?
Michael
* Re: Merging raw block device writes
From: Jens Axboe @ 2023-11-28 21:59 UTC (permalink / raw)
To: Michael Kelley, hch@lst.de
Cc: linux-block@vger.kernel.org, linux-hyperv@vger.kernel.org
On 11/28/23 12:29 PM, Michael Kelley wrote:
> From: Jens Axboe <axboe@kernel.dk> Sent: Monday, November 27, 2023 8:10 AM
>>
>> On 11/26/23 11:59 PM, hch@lst.de wrote:
>>> On Sat, Nov 25, 2023 at 05:38:28PM +0000, Michael Kelley wrote:
>>>> Hyper-V guests and the Azure cloud have a particular interest here
>>>> because Hyper-V guests use SCSI as the standard interface to virtual
>>>> disks. Azure cloud disks can be throttled to a limited number of IOPS,
>>>> so the number of in-flight I/Os can be relatively high, and
>>>> merging can be beneficial to staying within the throttle
>>>> limits. On the flip side, this problem hasn't generated complaints
>>>> over the last 18 months that I'm aware of, though that may be more
>>>> because commercial distros haven't been running 5.16 or later kernels
>>>> until relatively recently.
>>>
>>> I think the more important thing is that if you care about reducing
>>> the number of I/Os you probably should use an I/O scheduler. Reducing
>>> the number of I/Os without an I/O scheduler isn't (and I'll argue
>>> shouldn't be) a concern for the no-scheduler path.
>>
>> Yep fully agree.
>>
>
> OK. But there *is* intentional functionality in blk-mq to do merging
> even when there's no I/O scheduler. If that functionality breaks, is
> that considered a bug and regression? The functionality only affects
> performance and not correctness, so maybe it's a bit of a gray area.
>
> It's all working again as of 6.5, so the only potential code action is to
> backport Christoph's commit to stable releases. But it still seems like
> there should be an explicit statement about what to expect going forward.
> Should the code for doing merging with no I/O scheduler be removed, or
> at least put on the deprecation path?
It's mostly a "you get what you get" thing. If we can do merging cheap
or for free, then we do that above the IO scheduler. It'd be great to
have some tests for this to ensure we don't regress, unknowingly.
But in general, if you want guaranteed good merging, then an IO
scheduler is the right choice.
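A rough sketch of what such a test could watch: the merged-writes counter in the device stat file, before and after a workload (the fio invocation and device name below are placeholders/assumptions, not part of any existing test):

```shell
# Field 6 of /sys/block/<dev>/stat is the cumulative count of merged
# write requests (see Documentation/block/stat.rst).
write_merges() {
    awk '{ print $6 }'
}

# Against a real device (sdX is a placeholder; fio assumed installed):
#   before=$(write_merges < /sys/block/sdX/stat)
#   fio --name=seqw --filename=/dev/sdX --direct=1 --rw=write --bs=4k \
#       --ioengine=libaio --iodepth=32 --size=64M
#   after=$(write_merges < /sys/block/sdX/stat)
#   echo "writes merged: $((after - before))"

# Demonstration on a synthetic stat line:
echo "1000 2000 3000 40 500 600 7000 80 0 90 100" | write_merges
```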
--
Jens Axboe