* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio
[not found] ` <aKxu83upEBhf5gT7@kbusch-mbp>
@ 2025-08-26 13:03 ` Christoph Hellwig
2025-08-26 13:47 ` Keith Busch
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2025-08-26 13:03 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe,
iommu
On Mon, Aug 25, 2025 at 08:10:59AM -0600, Keith Busch wrote:
> On Mon, Aug 25, 2025 at 06:46:50AM -0700, Christoph Hellwig wrote:
> > On Thu, Aug 21, 2025 at 01:44:19PM -0700, Keith Busch wrote:
> >
> > Also use the chance to document why all this is PAGE_SIZE based and
> > not based on either the iommu granule size or the virt boundary.
>
> This is a good opportunity to double check my assumptions:
Always a good idea!
>
> PAGE_SIZEs, iommu granules, and virt boundaries are all power-of-two
> values, and PAGE_SIZE is always the largest (or tied for largest) of
> these.
I just had an offlist conversation with someone trying to make a nvme
device with a virt boundary larger than PAGE_SIZE work. No idea
where that device came from.
I hink IOMMU granule > PAGE_SIZE would be painful, but adding the
IOMMU list to double check.
It would also be really good to document all these assumptions with
both comments and assert so that we immediately see when they are
violated.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio
2025-08-26 13:03 ` [PATCHv3 1/2] block: accumulate segment page gaps per bio Christoph Hellwig
@ 2025-08-26 13:47 ` Keith Busch
2025-08-26 13:57 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2025-08-26 13:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe,
iommu
On Tue, Aug 26, 2025 at 03:03:44PM +0200, Christoph Hellwig wrote:
> On Mon, Aug 25, 2025 at 08:10:59AM -0600, Keith Busch wrote:
> >
> > PAGE_SIZEs, iommu granules, and virt boundaries are all power-of-two
> > values, and PAGE_SIZE is always the largest (or tied for largest) of
> > these.
>
> I just had an offlist conversation with someone trying to make a nvme
> device with a virt boundary larger than PAGE_SIZE work. No idea
> where that device came from.
Currently, the virtual boundary is always compared to bv_offset, which
is a page offset. If the virtual boundary is larger than a page, then we
need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every
place we need to check against the virt boundary.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio
2025-08-26 13:47 ` Keith Busch
@ 2025-08-26 13:57 ` Christoph Hellwig
2025-08-26 22:33 ` Keith Busch
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2025-08-26 13:57 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Christoph Hellwig, Keith Busch, linux-block,
linux-nvme, axboe, iommu
On Tue, Aug 26, 2025 at 07:47:46AM -0600, Keith Busch wrote:
> Currently, the virtual boundary is always compared to bv_offset, which
> is a page offset. If the virtual boundary is larger than a page, then we
> need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every
> place we need to check against the virt boundary.
bv_offset is only guaranteed to be a page offset if your use
bio_for_each_segment(_all) or the low-level helpers implementing
it and not bio_for_each_bvec(_all) where it can be much larger
than PAGE_SIZE.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio
2025-08-26 13:57 ` Christoph Hellwig
@ 2025-08-26 22:33 ` Keith Busch
2025-08-27 7:37 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2025-08-26 22:33 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe,
iommu
On Tue, Aug 26, 2025 at 03:57:34PM +0200, Christoph Hellwig wrote:
> On Tue, Aug 26, 2025 at 07:47:46AM -0600, Keith Busch wrote:
> > Currently, the virtual boundary is always compared to bv_offset, which
> > is a page offset. If the virtual boundary is larger than a page, then we
> > need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every
> > place we need to check against the virt boundary.
>
> bv_offset is only guaranteed to be a page offset if your use
> bio_for_each_segment(_all) or the low-level helpers implementing
> it and not bio_for_each_bvec(_all) where it can be much larger
> than PAGE_SIZE.
Yes, good point. So we'd have a folio offset when it's not a single
page, but I don't think we want to special case large folios for every
virt boundary check. It's looking like replace bvec's "page + offset"
with phys addrs, yeah?!
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio
2025-08-26 22:33 ` Keith Busch
@ 2025-08-27 7:37 ` Christoph Hellwig
2025-08-30 1:47 ` Keith Busch
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2025-08-27 7:37 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Christoph Hellwig, Keith Busch, linux-block,
linux-nvme, axboe, iommu
On Tue, Aug 26, 2025 at 04:33:15PM -0600, Keith Busch wrote:
> On Tue, Aug 26, 2025 at 03:57:34PM +0200, Christoph Hellwig wrote:
> > On Tue, Aug 26, 2025 at 07:47:46AM -0600, Keith Busch wrote:
> > > Currently, the virtual boundary is always compared to bv_offset, which
> > > is a page offset. If the virtual boundary is larger than a page, then we
> > > need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every
> > > place we need to check against the virt boundary.
> >
> > bv_offset is only guaranteed to be a page offset if your use
> > bio_for_each_segment(_all) or the low-level helpers implementing
> > it and not bio_for_each_bvec(_all) where it can be much larger
> > than PAGE_SIZE.
>
> Yes, good point. So we'd have a folio offset when it's not a single
> page, but I don't think we want to special case large folios for every
> virt boundary check. It's looking like replace bvec's "page + offset"
> with phys addrs, yeah?!
Basically everything should be using physical address. The page + offset
is just a weird and inefficient way to represent that and we really
need to get rid of it.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio
2025-08-27 7:37 ` Christoph Hellwig
@ 2025-08-30 1:47 ` Keith Busch
2025-09-02 5:36 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2025-08-30 1:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe,
iommu
On Wed, Aug 27, 2025 at 09:37:09AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 26, 2025 at 04:33:15PM -0600, Keith Busch wrote:
> > virt boundary check. It's looking like replace bvec's "page + offset"
> > with phys addrs, yeah?!
>
> Basically everything should be using physical address. The page + offset
> is just a weird and inefficient way to represent that and we really
> need to get rid of it.
I was plowing ahead with converting to phys addrs only to discover
skb_frag_t overlays a bvec with tightly coupled expectations on its
layout. I'm not comfortable right now messing with that type. I think it
may need to be decoupled to proceed on this path. :(
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio
2025-08-30 1:47 ` Keith Busch
@ 2025-09-02 5:36 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2025-09-02 5:36 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Christoph Hellwig, Keith Busch, linux-block,
linux-nvme, axboe, iommu, willy, netdev, linux-mm
On Fri, Aug 29, 2025 at 07:47:40PM -0600, Keith Busch wrote:
> On Wed, Aug 27, 2025 at 09:37:09AM +0200, Christoph Hellwig wrote:
> > On Tue, Aug 26, 2025 at 04:33:15PM -0600, Keith Busch wrote:
> > > virt boundary check. It's looking like replace bvec's "page + offset"
> > > with phys addrs, yeah?!
> >
> > Basically everything should be using physical address. The page + offset
> > is just a weird and inefficient way to represent that and we really
> > need to get rid of it.
>
> I was plowing ahead with converting to phys addrs only to discover
> skb_frag_t overlays a bvec with tightly coupled expectations on its
> layout. I'm not comfortable right now messing with that type. I think it
> may need to be decoupled to proceed on this path. :(
willy got really angry at this for all the right reasons, and we
still need it fixed. Can we just revert the unreviewed crap the
network folks did here to move the kernel forward?
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-09-02 5:36 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250821204420.2267923-1-kbusch@meta.com>
[not found] ` <20250821204420.2267923-2-kbusch@meta.com>
[not found] ` <aKxpSorluMXgOFEI@infradead.org>
[not found] ` <aKxu83upEBhf5gT7@kbusch-mbp>
2025-08-26 13:03 ` [PATCHv3 1/2] block: accumulate segment page gaps per bio Christoph Hellwig
2025-08-26 13:47 ` Keith Busch
2025-08-26 13:57 ` Christoph Hellwig
2025-08-26 22:33 ` Keith Busch
2025-08-27 7:37 ` Christoph Hellwig
2025-08-30 1:47 ` Keith Busch
2025-09-02 5:36 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).