* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio [not found] ` <aKxu83upEBhf5gT7@kbusch-mbp> @ 2025-08-26 13:03 ` Christoph Hellwig 2025-08-26 13:47 ` Keith Busch 0 siblings, 1 reply; 7+ messages in thread From: Christoph Hellwig @ 2025-08-26 13:03 UTC (permalink / raw) To: Keith Busch Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe, iommu On Mon, Aug 25, 2025 at 08:10:59AM -0600, Keith Busch wrote: > On Mon, Aug 25, 2025 at 06:46:50AM -0700, Christoph Hellwig wrote: > > On Thu, Aug 21, 2025 at 01:44:19PM -0700, Keith Busch wrote: > > > > Also use the chance to document why all this is PAGE_SIZE based and > > not based on either the iommu granule size or the virt boundary. > > This is a good opportunity to double check my assumptions: Always a good idea! > > PAGE_SIZEs, iommu granules, and virt boundaries are all power-of-two > values, and PAGE_SIZE is always the largest (or tied for largest) of > these. I just had an offlist conversation with someone trying to make a nvme device with a virt boundary larger than PAGE_SIZE work. No idea where that device came from. I hink IOMMU granule > PAGE_SIZE would be painful, but adding the IOMMU list to double check. It would also be really good to document all these assumptions with both comments and assert so that we immediately see when they are violated. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio 2025-08-26 13:03 ` [PATCHv3 1/2] block: accumulate segment page gaps per bio Christoph Hellwig @ 2025-08-26 13:47 ` Keith Busch 2025-08-26 13:57 ` Christoph Hellwig 0 siblings, 1 reply; 7+ messages in thread From: Keith Busch @ 2025-08-26 13:47 UTC (permalink / raw) To: Christoph Hellwig Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe, iommu On Tue, Aug 26, 2025 at 03:03:44PM +0200, Christoph Hellwig wrote: > On Mon, Aug 25, 2025 at 08:10:59AM -0600, Keith Busch wrote: > > > > PAGE_SIZEs, iommu granules, and virt boundaries are all power-of-two > > values, and PAGE_SIZE is always the largest (or tied for largest) of > > these. > > I just had an offlist conversation with someone trying to make a nvme > device with a virt boundary larger than PAGE_SIZE work. No idea > where that device came from. Currently, the virtual boundary is always compared to bv_offset, which is a page offset. If the virtual boundary is larger than a page, then we need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every place we need to check against the virt boundary. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio 2025-08-26 13:47 ` Keith Busch @ 2025-08-26 13:57 ` Christoph Hellwig 2025-08-26 22:33 ` Keith Busch 0 siblings, 1 reply; 7+ messages in thread From: Christoph Hellwig @ 2025-08-26 13:57 UTC (permalink / raw) To: Keith Busch Cc: Christoph Hellwig, Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe, iommu On Tue, Aug 26, 2025 at 07:47:46AM -0600, Keith Busch wrote: > Currently, the virtual boundary is always compared to bv_offset, which > is a page offset. If the virtual boundary is larger than a page, then we > need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every > place we need to check against the virt boundary. bv_offset is only guaranteed to be a page offset if your use bio_for_each_segment(_all) or the low-level helpers implementing it and not bio_for_each_bvec(_all) where it can be much larger than PAGE_SIZE. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio 2025-08-26 13:57 ` Christoph Hellwig @ 2025-08-26 22:33 ` Keith Busch 2025-08-27 7:37 ` Christoph Hellwig 0 siblings, 1 reply; 7+ messages in thread From: Keith Busch @ 2025-08-26 22:33 UTC (permalink / raw) To: Christoph Hellwig Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe, iommu On Tue, Aug 26, 2025 at 03:57:34PM +0200, Christoph Hellwig wrote: > On Tue, Aug 26, 2025 at 07:47:46AM -0600, Keith Busch wrote: > > Currently, the virtual boundary is always compared to bv_offset, which > > is a page offset. If the virtual boundary is larger than a page, then we > > need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every > > place we need to check against the virt boundary. > > bv_offset is only guaranteed to be a page offset if your use > bio_for_each_segment(_all) or the low-level helpers implementing > it and not bio_for_each_bvec(_all) where it can be much larger > than PAGE_SIZE. Yes, good point. So we'd have a folio offset when it's not a single page, but I don't think we want to special case large folios for every virt boundary check. It's looking like replace bvec's "page + offset" with phys addrs, yeah?! ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio 2025-08-26 22:33 ` Keith Busch @ 2025-08-27 7:37 ` Christoph Hellwig 2025-08-30 1:47 ` Keith Busch 0 siblings, 1 reply; 7+ messages in thread From: Christoph Hellwig @ 2025-08-27 7:37 UTC (permalink / raw) To: Keith Busch Cc: Christoph Hellwig, Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe, iommu On Tue, Aug 26, 2025 at 04:33:15PM -0600, Keith Busch wrote: > On Tue, Aug 26, 2025 at 03:57:34PM +0200, Christoph Hellwig wrote: > > On Tue, Aug 26, 2025 at 07:47:46AM -0600, Keith Busch wrote: > > > Currently, the virtual boundary is always compared to bv_offset, which > > > is a page offset. If the virtual boundary is larger than a page, then we > > > need something like "page_to_phys(bv.bv_page) + bv.bv_offset" every > > > place we need to check against the virt boundary. > > > > bv_offset is only guaranteed to be a page offset if your use > > bio_for_each_segment(_all) or the low-level helpers implementing > > it and not bio_for_each_bvec(_all) where it can be much larger > > than PAGE_SIZE. > > Yes, good point. So we'd have a folio offset when it's not a single > page, but I don't think we want to special case large folios for every > virt boundary check. It's looking like replace bvec's "page + offset" > with phys addrs, yeah?! Basically everything should be using physical address. The page + offset is just a weird and inefficient way to represent that and we really need to get rid of it. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio 2025-08-27 7:37 ` Christoph Hellwig @ 2025-08-30 1:47 ` Keith Busch 2025-09-02 5:36 ` Christoph Hellwig 0 siblings, 1 reply; 7+ messages in thread From: Keith Busch @ 2025-08-30 1:47 UTC (permalink / raw) To: Christoph Hellwig Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe, iommu On Wed, Aug 27, 2025 at 09:37:09AM +0200, Christoph Hellwig wrote: > On Tue, Aug 26, 2025 at 04:33:15PM -0600, Keith Busch wrote: > > virt boundary check. It's looking like replace bvec's "page + offset" > > with phys addrs, yeah?! > > Basically everything should be using physical address. The page + offset > is just a weird and inefficient way to represent that and we really > need to get rid of it. I was plowing ahead with converting to phys addrs only to discover skb_frag_t overlays a bvec with tightly coupled expectations on its layout. I'm not comfortable right now messing with that type. I think it may need to be decoupled to proceed on this path. :( ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv3 1/2] block: accumulate segment page gaps per bio 2025-08-30 1:47 ` Keith Busch @ 2025-09-02 5:36 ` Christoph Hellwig 0 siblings, 0 replies; 7+ messages in thread From: Christoph Hellwig @ 2025-09-02 5:36 UTC (permalink / raw) To: Keith Busch Cc: Christoph Hellwig, Christoph Hellwig, Keith Busch, linux-block, linux-nvme, axboe, iommu, willy, netdev, linux-mm On Fri, Aug 29, 2025 at 07:47:40PM -0600, Keith Busch wrote: > On Wed, Aug 27, 2025 at 09:37:09AM +0200, Christoph Hellwig wrote: > > On Tue, Aug 26, 2025 at 04:33:15PM -0600, Keith Busch wrote: > > > virt boundary check. It's looking like replace bvec's "page + offset" > > > with phys addrs, yeah?! > > > > Basically everything should be using physical address. The page + offset > > is just a weird and inefficient way to represent that and we really > > need to get rid of it. > > I was plowing ahead with converting to phys addrs only to discover > skb_frag_t overlays a bvec with tightly coupled expectations on its > layout. I'm not comfortable right now messing with that type. I think it > may need to be decoupled to proceed on this path. :( willy got really angry at this for all the right reasons, and we still need it fixed. Can we just revert the unreviewed crap the network folks did here to move the kernel forward? ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-09-02 5:36 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250821204420.2267923-1-kbusch@meta.com>
[not found] ` <20250821204420.2267923-2-kbusch@meta.com>
[not found] ` <aKxpSorluMXgOFEI@infradead.org>
[not found] ` <aKxu83upEBhf5gT7@kbusch-mbp>
2025-08-26 13:03 ` [PATCHv3 1/2] block: accumulate segment page gaps per bio Christoph Hellwig
2025-08-26 13:47 ` Keith Busch
2025-08-26 13:57 ` Christoph Hellwig
2025-08-26 22:33 ` Keith Busch
2025-08-27 7:37 ` Christoph Hellwig
2025-08-30 1:47 ` Keith Busch
2025-09-02 5:36 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).