The Linux Kernel Mailing List
From: David Laight <david.laight.linux@gmail.com>
To: David Hu <xuehaohu@google.com>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Pranjal Shrivastava" <praan@google.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Nicolin Chen" <nicolinc@nvidia.com>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Alex Williamson" <alex@shazbot.org>,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	iommu@lists.linux.dev, jmoroni@google.com, kpberry@google.com,
	chriscli@google.com, sashiko-bot@kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] dma-buf: Split sgl into page-aligned 2G chunks
Date: Thu, 2 Jul 2026 09:10:40 +0100
Message-ID: <20260702091040.35eff00c@pumpkin>
In-Reply-To: <CAPd9Lg9uY1RZvYUtcbKUg=VdWM61M2f3aqmS5veUg_8M_Ce80g@mail.gmail.com>

On Thu, 2 Jul 2026 00:56:40 -0400
David Hu <xuehaohu@google.com> wrote:

> On Tue, Jun 30, 2026 at 8:42 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Jun 23, 2026 at 11:53:50PM +0100, David Laight wrote:
> >  
> > > > If we restrict incoming dmabuf transfers to fit within VFS-centric
> > > > limits (2GB), we impose unnecessary overhead on the RDMA stack, forcing
> > > > it to manage a significantly higher number of memory registrations. By
> > > > cleanly splitting these massive contiguous device buffers into
> > > > page-aligned SGL entries, we directly improve the efficiency of P2P
> > > > transfers and memory registration.  
> > >
> > > But a divide by '4G - PAGE_SIZE' is also non-trivial (and, I think,
> > > affects a lot of I/O) when the quotient is always 1.
> > > Splitting into 2G chunks is a lot cheaper.  
> >
> > Doesn't matter, this isn't fast path stuff. It is better to use fewer
> > SGL entries, IMHO.
> >  
> > > > Since this change doesn't seem to have a negative impact on standard file
> > > > I/O or break existing VFS constraints, I'm curious why we shouldn't
> > > > support splitting these >4GB P2P transfers? Am I missing something?  
> > >
> > > I was only wondering whether it was needed...
> > > It does bring up the question of why the >4GB transfers even need splitting.
> > > But that is another question.  
> >
> > SGL can only store an unsigned int size, so any large physical range
> > has to be split down.
> >
> > RDMA nowadays has code to process the SGL and restore the > 4G
> > sizes, since most RDMA HW can accept that.
> >
> > commit 486055f5e09df959ad4e3aa4ee75b5c91ddeec2e
> > Author: Michael Margolin <mrgolin@amazon.com>
> > Date:   Mon Feb 17 14:16:23 2025 +0000
> >
> >     RDMA/core: Fix best page size finding when it can cross SG entries
> >
> > So whatever this produces needs to be compatible with that to undo it.  
> 
> Thank you everyone. It looks like most open issues are sorted out.
> I'll wait for maintainers to weigh in before sending out v3 (which
> will remove the type cast for min() per David L.'s feedback, and
> revert to ALIGN_DOWN(UINT_MAX, PAGE_SIZE) per Jason's feedback).

Does this code get used a lot for 'normal' transfers?
I'm away from my normal systems and can't check.
But if pretty much all of the fragments are small (< 4G) then
it is probably worth adding a check for 'size < limit' before
anything else and optimising that case.
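
For illustration only, a minimal userspace sketch of that early check.
It assumes 4 KiB pages and the ALIGN_DOWN(UINT_MAX, PAGE_SIZE) limit
mentioned above; the sketch_* names are hypothetical and are not taken
from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages and a 32-bit SGL length field. */
#define SKETCH_PAGE_SIZE 4096ULL
/* Stand-in for ALIGN_DOWN(UINT_MAX, PAGE_SIZE): 0xFFFFF000 here. */
#define SKETCH_MAX_SEG (0xFFFFFFFFULL & ~(SKETCH_PAGE_SIZE - 1))

/* How many SGL entries one contiguous range of 'size' bytes needs. */
static size_t sketch_nr_entries(uint64_t size)
{
	assert(size != 0);

	/* Common case: the range fits in one entry, so skip the divide. */
	if (size <= SKETCH_MAX_SEG)
		return 1;

	/* Rare case: round up without overflowing near UINT64_MAX. */
	return (size_t)(size / SKETCH_MAX_SEG + (size % SKETCH_MAX_SEG != 0));
}
```

With these assumptions an 8G range needs 3 entries, and anything up to
0xFFFFF000 bytes never reaches the divide.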

	David

> 
> Hi Jason,
> 
> Thank you for your feedback. I took a closer look at the commit to
> ensure compatibility. This patch is perfectly complementary, and
> actually prevents a failure in an edge case for the latest
> `ib_umem_find_best_pgsz` [1].
> 
> Regards,
> David
> 
> [1] For a dma-buf split at `0xFFFFFFFF`, a discontiguity in a later
> buffer makes us hit this code path in
> `ib_umem_find_best_pgsz`:
> 
> ```
> if (i != 0)
>     mask |= va;
> ```
> (*After `va` had been incremented by `0xFFFFFFFF`, due to `va +=
> sg_dma_len(sg) - pgoff`)
> (*Which will set the lowest bit of `mask` to 1)
> 
> Because `count_trailing_zeros(mask)` then returns 0,
> `ib_umem_find_best_pgsz()` will always return 0 in such cases.
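
The arithmetic in [1] can be checked with a small userspace sketch. This
is not the real `ib_umem_find_best_pgsz()`; it only models `va` advancing
by one segment length (assuming pgoff is 0) and counts the low zero bits
of the result. The sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Portable count of trailing zero bits; 'mask' must be non-zero. */
static unsigned int sketch_low_zero_bits(uint64_t mask)
{
	unsigned int n = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		n++;
	}
	return n;
}

/* Model of 'va += sg_dma_len(sg) - pgoff' with pgoff assumed to be 0. */
static uint64_t sketch_va_after(uint64_t va, uint32_t seg_len)
{
	return va + seg_len;
}
```

Starting from a 4G-aligned va, a 0xFFFFFFFF segment leaves bit 0 of va
set, so OR-ing va into mask leaves no trailing zeros. A 0xFFFFF000
segment keeps the low 12 bits clear.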



Thread overview: 14+ messages
2026-06-21 22:21 [PATCH] dma-buf: Split sgl by largest page-aligned chunk David Hu
2026-06-22  8:13 ` David Laight
2026-06-22 21:26   ` David Hu
2026-06-23  8:25     ` David Laight
2026-06-23 21:03       ` David Hu
2026-06-23  1:54 ` [PATCH v2] dma-buf: Split sgl into page-aligned 2G chunks David Hu
2026-06-23  8:44   ` David Laight
2026-06-23 20:55     ` Pranjal Shrivastava
2026-06-23 22:53       ` David Laight
2026-06-24 14:31         ` Leon Romanovsky
2026-06-30 12:42         ` Jason Gunthorpe
2026-07-02  4:56           ` David Hu
2026-07-02  8:10             ` David Laight [this message]
2026-06-30 12:38     ` Jason Gunthorpe
