Netdev List
 help / color / mirror / Atom feed
* [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
@ 2026-05-19 20:35 David Carlier
  2026-05-19 21:07 ` Bobby Eshleman
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: David Carlier @ 2026-05-19 20:35 UTC (permalink / raw)
  To: netdev
  Cc: stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel,
	kaiyuanz, almasrymina, bobbyeshleman, linux-kernel, David Carlier

net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
PAGE_SIZE multiples without checking:

  - tx_vec is sized dmabuf->size / PAGE_SIZE, and
    net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
    before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
    N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
    N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.

  - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
    covers the full byte len, so a non-page-multiple non-final sg
    desyncs num_niovs from the gen_pool region for every later sg, on
    both RX and TX.

dma-buf does not require page-aligned sizes, so the bind path has to
enforce what its own indexing assumes. Reject both with -EINVAL.

The size check is TX-only (only tx_vec is sized off dmabuf->size); the
SG-length check covers both directions.

Fixes: bd61848900bf ("net: devmem: Implement TX path")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
Changes in v2:
  - Reframe commit message around the kernel-side OOB instead of
    "real exporters already page-align", which read as the OOB being
    unreachable and undercut Cc: stable (Stanislav Fomichev).
  - Hoist the SG-length check out of the if (TX) branch so it covers
    RX too; RX has the same num_niovs / gen_pool desync on a
    contract-violating exporter, just without an OOB. Keep the
    size-multiple check TX-only (Stanislav Fomichev).
  - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
    match the existing call site at the tx_vec[] assignment
    (Bobby Eshleman).

 net/core/devmem.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 468344739db2..4f71de44c0fb 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	}
 
 	if (direction == DMA_TO_DEVICE) {
+		if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
+			err = -EINVAL;
+			NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
+			goto err_unmap;
+		}
 		binding->tx_vec = kvmalloc_objs(struct net_iov *,
 						dmabuf->size / PAGE_SIZE);
 		if (!binding->tx_vec) {
@@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		size_t len = sg_dma_len(sg);
 		struct net_iov *niov;
 
+		if (!IS_ALIGNED(len, PAGE_SIZE)) {
+			err = -EINVAL;
+			NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
+			goto err_free_chunks;
+		}
+
 		owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
 				     dev_to_node(&dev->dev));
 		if (!owner) {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
  2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
@ 2026-05-19 21:07 ` Bobby Eshleman
  2026-05-19 23:33 ` Stanislav Fomichev
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Bobby Eshleman @ 2026-05-19 21:07 UTC (permalink / raw)
  To: David Carlier
  Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
	sdf.kernel, kaiyuanz, almasrymina, linux-kernel

On Tue, May 19, 2026 at 09:35:30PM +0100, David Carlier wrote:
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
> 
>   - tx_vec is sized dmabuf->size / PAGE_SIZE, and
>     net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
>     before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
>     N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
>     N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> 
>   - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
>     covers the full byte len, so a non-page-multiple non-final sg
>     desyncs num_niovs from the gen_pool region for every later sg, on
>     both RX and TX.
> 
> dma-buf does not require page-aligned sizes, so the bind path has to
> enforce what its own indexing assumes. Reject both with -EINVAL.
> 
> The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> SG-length check covers both directions.
> 
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> Changes in v2:
>   - Reframe commit message around the kernel-side OOB instead of
>     "real exporters already page-align", which read as the OOB being
>     unreachable and undercut Cc: stable (Stanislav Fomichev).
>   - Hoist the SG-length check out of the if (TX) branch so it covers
>     RX too; RX has the same num_niovs / gen_pool desync on a
>     contract-violating exporter, just without an OOB. Keep the
>     size-multiple check TX-only (Stanislav Fomichev).
>   - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
>     match the existing call site at the tx_vec[] assignment
>     (Bobby Eshleman).
> 
>  net/core/devmem.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 468344739db2..4f71de44c0fb 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
>  	}
>  
>  	if (direction == DMA_TO_DEVICE) {
> +		if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> +			err = -EINVAL;
> +			NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> +			goto err_unmap;
> +		}
>  		binding->tx_vec = kvmalloc_objs(struct net_iov *,
>  						dmabuf->size / PAGE_SIZE);
>  		if (!binding->tx_vec) {
> @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
>  		size_t len = sg_dma_len(sg);
>  		struct net_iov *niov;
>  
> +		if (!IS_ALIGNED(len, PAGE_SIZE)) {
> +			err = -EINVAL;
> +			NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> +			goto err_free_chunks;
> +		}
> +
>  		owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
>  				     dev_to_node(&dev->dev));
>  		if (!owner) {
> -- 
> 2.53.0

LGTM.

Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
  2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
  2026-05-19 21:07 ` Bobby Eshleman
@ 2026-05-19 23:33 ` Stanislav Fomichev
  2026-05-20 18:55 ` Mina Almasry
  2026-05-21  2:10 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2026-05-19 23:33 UTC (permalink / raw)
  To: David Carlier
  Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
	kaiyuanz, almasrymina, bobbyeshleman, linux-kernel

On 05/19, David Carlier wrote:
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
> 
>   - tx_vec is sized dmabuf->size / PAGE_SIZE, and
>     net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
>     before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
>     N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
>     N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> 
>   - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
>     covers the full byte len, so a non-page-multiple non-final sg
>     desyncs num_niovs from the gen_pool region for every later sg, on
>     both RX and TX.
> 
> dma-buf does not require page-aligned sizes, so the bind path has to
> enforce what its own indexing assumes. Reject both with -EINVAL.
> 
> The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> SG-length check covers both directions.
> 
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> Changes in v2:
>   - Reframe commit message around the kernel-side OOB instead of
>     "real exporters already page-align", which read as the OOB being
>     unreachable and undercut Cc: stable (Stanislav Fomichev).
>   - Hoist the SG-length check out of the if (TX) branch so it covers
>     RX too; RX has the same num_niovs / gen_pool desync on a
>     contract-violating exporter, just without an OOB. Keep the
>     size-multiple check TX-only (Stanislav Fomichev).
>   - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
>     match the existing call site at the tx_vec[] assignment
>     (Bobby Eshleman).

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
  2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
  2026-05-19 21:07 ` Bobby Eshleman
  2026-05-19 23:33 ` Stanislav Fomichev
@ 2026-05-20 18:55 ` Mina Almasry
  2026-05-20 19:28   ` David CARLIER
  2026-05-21  2:10 ` patchwork-bot+netdevbpf
  3 siblings, 1 reply; 8+ messages in thread
From: Mina Almasry @ 2026-05-20 18:55 UTC (permalink / raw)
  To: David Carlier
  Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
	sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel

On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
>
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
>
>   - tx_vec is sized dmabuf->size / PAGE_SIZE, and
>     net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
>     before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
>     N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
>     N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
>
>   - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
>     covers the full byte len, so a non-page-multiple non-final sg
>     desyncs num_niovs from the gen_pool region for every later sg, on
>     both RX and TX.
>
> dma-buf does not require page-aligned sizes, so the bind path has to
> enforce what its own indexing assumes. Reject both with -EINVAL.
>
> The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> SG-length check covers both directions.
>
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> Changes in v2:
>   - Reframe commit message around the kernel-side OOB instead of
>     "real exporters already page-align", which read as the OOB being
>     unreachable and undercut Cc: stable (Stanislav Fomichev).
>   - Hoist the SG-length check out of the if (TX) branch so it covers
>     RX too; RX has the same num_niovs / gen_pool desync on a
>     contract-violating exporter, just without an OOB. Keep the
>     size-multiple check TX-only (Stanislav Fomichev).
>   - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
>     match the existing call site at the tx_vec[] assignment
>     (Bobby Eshleman).
>
>  net/core/devmem.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 468344739db2..4f71de44c0fb 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
>         }
>
>         if (direction == DMA_TO_DEVICE) {
> +               if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> +                       err = -EINVAL;
> +                       NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> +                       goto err_unmap;
> +               }
>                 binding->tx_vec = kvmalloc_objs(struct net_iov *,
>                                                 dmabuf->size / PAGE_SIZE);
>                 if (!binding->tx_vec) {
> @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
>                 size_t len = sg_dma_len(sg);
>                 struct net_iov *niov;
>
> +               if (!IS_ALIGNED(len, PAGE_SIZE)) {
> +                       err = -EINVAL;
> +                       NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> +                       goto err_free_chunks;
> +               }
> +
>                 owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
>                                      dev_to_node(&dev->dev));
>                 if (!owner) {
> --
> 2.53.0
>

No hold on, I don't think we actually have a bug here. AFAIR all
you're describing is intionional . Yes the TX vectors and their niov
arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.

But net_devmem_get_niov_at has this check that prevents us from trying
to send past the dma-buf size, even if it's not page_aligned:

```
if (virt_addr >= binding->dmabuf->size)
return NULL;
```

IIRC the NULL should be bubbled up to the user as some error.

Please double check that we actually have a bug here. If not, please
don't merge this. This change could break existing users using devmem
TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
case.

And if we have a bug, lets fix it in some way that doesn't deprecate
support for non page-aligned TX dmabufs. You may be breaking users
here.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
  2026-05-20 18:55 ` Mina Almasry
@ 2026-05-20 19:28   ` David CARLIER
  2026-05-20 20:42     ` Mina Almasry
  0 siblings, 1 reply; 8+ messages in thread
From: David CARLIER @ 2026-05-20 19:28 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
	sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel

On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote:
>
> On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
> >
> > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > PAGE_SIZE multiples without checking:
> >
> >   - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> >     net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> >     before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> >     N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> >     N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> >
> >   - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> >     covers the full byte len, so a non-page-multiple non-final sg
> >     desyncs num_niovs from the gen_pool region for every later sg, on
> >     both RX and TX.
> >
> > dma-buf does not require page-aligned sizes, so the bind path has to
> > enforce what its own indexing assumes. Reject both with -EINVAL.
> >
> > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > SG-length check covers both directions.
> >
> > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: David Carlier <devnexen@gmail.com>
> > ---
> > Changes in v2:
> >   - Reframe commit message around the kernel-side OOB instead of
> >     "real exporters already page-align", which read as the OOB being
> >     unreachable and undercut Cc: stable (Stanislav Fomichev).
> >   - Hoist the SG-length check out of the if (TX) branch so it covers
> >     RX too; RX has the same num_niovs / gen_pool desync on a
> >     contract-violating exporter, just without an OOB. Keep the
> >     size-multiple check TX-only (Stanislav Fomichev).
> >   - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> >     match the existing call site at the tx_vec[] assignment
> >     (Bobby Eshleman).
> >
> >  net/core/devmem.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 468344739db2..4f71de44c0fb 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> >         }
> >
> >         if (direction == DMA_TO_DEVICE) {
> > +               if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > +                       err = -EINVAL;
> > +                       NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > +                       goto err_unmap;
> > +               }
> >                 binding->tx_vec = kvmalloc_objs(struct net_iov *,
> >                                                 dmabuf->size / PAGE_SIZE);
> >                 if (!binding->tx_vec) {
> > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> >                 size_t len = sg_dma_len(sg);
> >                 struct net_iov *niov;
> >
> > +               if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > +                       err = -EINVAL;
> > +                       NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > +                       goto err_free_chunks;
> > +               }
> > +
> >                 owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> >                                      dev_to_node(&dev->dev));
> >                 if (!owner) {
> > --
> > 2.53.0
> >
>
> No hold on, I don't think we actually have a bug here. AFAIR all
> you're describing is intionional . Yes the TX vectors and their niov
> arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
>
> But net_devmem_get_niov_at has this check that prevents us from trying
> to send past the dma-buf size, even if it's not page_aligned:
>
> ```
> if (virt_addr >= binding->dmabuf->size)
> return NULL;
> ```
>
> IIRC the NULL should be bubbled up to the user as some error.
>
> Please double check that we actually have a bug here. If not, please
> don't merge this. This change could break existing users using devmem
> TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> case.
>
> And if we have a bug, lets fix it in some way that doesn't deprecate
> support for non page-aligned TX dmabufs. You may be breaking users
> here.
>
> --
> Thanks,
> Mina

Hi, Mina, note that the guard you're quoting doesn't cover the case
I'm describing. It's in bytes against dmabuf->size, while tx_vec is
sized dmabuf->size / PAGE_SIZE
  (truncating) -- there's a sub-page window where the check passes and
the index doesn't.

  Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:

  tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
  sendmsg with iov_base = 4096:
    virt_addr >= dmabuf->size   ->  4096 >= 4097, passes.
    tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.

  And the OOB pointer isn't just returned to the caller -- it flows
through get_netmem() -> __get_netmem(), which dereferences it
(net_is_devmem_iov() reads ->type) and
   on a matching byte refcounts the binding off of it. So this is a
controlled OOB deref, not just a stray read.

  On breaking users: the partial-page tail was never TX-usable to
begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
entries while
  gen_pool_add_owner() covers the full byte len, so any sendmsg into
[num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
NULL or this OOB. It's not
  deprecating a working configuration -- it's rejecting one that wasn't working.

  Also worth flagging: the second hunk (sg_dma_len alignment) is the
RX path too. net_devmem_get_niov_at() is TX-only, so the guard you
quoted doesn't apply there, and
  the num_niovs vs gen_pool_add_owner(len) desync hits every non-final
SG with a sub-page length, both directions.

  If you'd prefer to keep non-aligned binds legal, an alternative
would be tightening the guard in net_devmem_get_niov_at() to virt_addr
>= num_niovs * PAGE_SIZE
  instead of >= dmabuf->size. Happy to spin that as v3 -- but it
doesn't help the SG-length leg, which still needs rejecting at bind
time.

Cheers.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
  2026-05-20 19:28   ` David CARLIER
@ 2026-05-20 20:42     ` Mina Almasry
  2026-05-20 21:02       ` David CARLIER
  0 siblings, 1 reply; 8+ messages in thread
From: Mina Almasry @ 2026-05-20 20:42 UTC (permalink / raw)
  To: David CARLIER
  Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
	sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel

On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@gmail.com> wrote:
>
> On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
> > >
> > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > > PAGE_SIZE multiples without checking:
> > >
> > >   - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > >     net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > >     before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > >     N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > >     N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> > >
> > >   - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > >     covers the full byte len, so a non-page-multiple non-final sg
> > >     desyncs num_niovs from the gen_pool region for every later sg, on
> > >     both RX and TX.
> > >
> > > dma-buf does not require page-aligned sizes, so the bind path has to
> > > enforce what its own indexing assumes. Reject both with -EINVAL.
> > >
> > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > > SG-length check covers both directions.
> > >
> > > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: David Carlier <devnexen@gmail.com>
> > > ---
> > > Changes in v2:
> > >   - Reframe commit message around the kernel-side OOB instead of
> > >     "real exporters already page-align", which read as the OOB being
> > >     unreachable and undercut Cc: stable (Stanislav Fomichev).
> > >   - Hoist the SG-length check out of the if (TX) branch so it covers
> > >     RX too; RX has the same num_niovs / gen_pool desync on a
> > >     contract-violating exporter, just without an OOB. Keep the
> > >     size-multiple check TX-only (Stanislav Fomichev).
> > >   - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > >     match the existing call site at the tx_vec[] assignment
> > >     (Bobby Eshleman).
> > >
> > >  net/core/devmem.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > index 468344739db2..4f71de44c0fb 100644
> > > --- a/net/core/devmem.c
> > > +++ b/net/core/devmem.c
> > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > >         }
> > >
> > >         if (direction == DMA_TO_DEVICE) {
> > > +               if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > > +                       err = -EINVAL;
> > > +                       NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > > +                       goto err_unmap;
> > > +               }
> > >                 binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > >                                                 dmabuf->size / PAGE_SIZE);
> > >                 if (!binding->tx_vec) {
> > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > >                 size_t len = sg_dma_len(sg);
> > >                 struct net_iov *niov;
> > >
> > > +               if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > > +                       err = -EINVAL;
> > > +                       NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > > +                       goto err_free_chunks;
> > > +               }
> > > +
> > >                 owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > >                                      dev_to_node(&dev->dev));
> > >                 if (!owner) {
> > > --
> > > 2.53.0
> > >
> >
> > No hold on, I don't think we actually have a bug here. AFAIR all
> > you're describing is intionional . Yes the TX vectors and their niov
> > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
> >
> > But net_devmem_get_niov_at has this check that prevents us from trying
> > to send past the dma-buf size, even if it's not page_aligned:
> >
> > ```
> > if (virt_addr >= binding->dmabuf->size)
> > return NULL;
> > ```
> >
> > IIRC the NULL should be bubbled up to the user as some error.
> >
> > Please double check that we actually have a bug here. If not, please
> > don't merge this. This change could break existing users using devmem
> > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> > case.
> >
> > And if we have a bug, lets fix it in some way that doesn't deprecate
> > support for non page-aligned TX dmabufs. You may be breaking users
> > here.
> >
> > --
> > Thanks,
> > Mina
>
> Hi, Mina, note that the guard you're quoting doesn't cover the case
> I'm describing. It's in bytes against dmabuf->size, while tx_vec is
> sized dmabuf->size / PAGE_SIZE
>   (truncating) -- there's a sub-page window where the check passes and
> the index doesn't.
>
>   Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:
>
>   tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
>   sendmsg with iov_base = 4096:
>     virt_addr >= dmabuf->size   ->  4096 >= 4097, passes.
>     tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.
>
>   And the OOB pointer isn't just returned to the caller -- it flows
> through get_netmem() -> __get_netmem(), which dereferences it
> (net_is_devmem_iov() reads ->type) and
>    on a matching byte refcounts the binding off of it. So this is a
> controlled OOB deref, not just a stray read.
>

Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096
and dmabuf->size is 4097, the intention in the code was to have tx_vec
be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is
[4097]. I have an off-by-one error in the tx_vec allocation :(

tx_vec[1] would contain an niov and as is stands we assume nvios are
PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the
callers trying to use tx_vec[1] would only use it for a range that's
valid, so using [4097] and not a range like [4097->8192].

However when I dug deeper on proper pruning of page alignment
assumptions in net_devmem_bind_dmabuf, the problems are deeper than
this patch suggests. We don't properly handle the (probably
non-existent?) edge case where the dma-buf itself is page aligned but
for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :(

I think probably all of this is very theoretical. In practice probably
the dmabuf implementations in the wild seem to be page-aligned.
udmabuf doesn't support non-page-aligned dmabufs even so we can't add
tests for these things. So I guess fine, lets merge this.

>   On breaking users: the partial-page tail was never TX-usable to
> begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
> entries while
>   gen_pool_add_owner() covers the full byte len, so any sendmsg into
> [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
> NULL or this OOB. It's not
>   deprecating a working configuration -- it's rejecting one that wasn't working.
>

Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using
actually sendmsging up to the last PAGE_SIZE boundary is working,
andthat would break entirely. I guess lets merge this and on the
offchance someone is hitting this edge case we can revisit.

Reviewed-by: Mina Almasry <almasrymina@google.com>

I don't know if you're interested in also fixing the edge case where
the sg table entries themselves are not not page-aligned. I think that
also doesn't work properly?

--
Thanks,
Mina

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
  2026-05-20 20:42     ` Mina Almasry
@ 2026-05-20 21:02       ` David CARLIER
  0 siblings, 0 replies; 8+ messages in thread
From: David CARLIER @ 2026-05-20 21:02 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
	sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel

On Wed, 20 May 2026 at 21:42, Mina Almasry <almasrymina@google.com> wrote:
>
> On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@gmail.com> wrote:
> >
> > On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote:
> > >
> > > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
> > > >
> > > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > > > PAGE_SIZE multiples without checking:
> > > >
> > > >   - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > > >     net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > > >     before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > > >     N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > > >     N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> > > >
> > > >   - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > > >     covers the full byte len, so a non-page-multiple non-final sg
> > > >     desyncs num_niovs from the gen_pool region for every later sg, on
> > > >     both RX and TX.
> > > >
> > > > dma-buf does not require page-aligned sizes, so the bind path has to
> > > > enforce what its own indexing assumes. Reject both with -EINVAL.
> > > >
> > > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > > > SG-length check covers both directions.
> > > >
> > > > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: David Carlier <devnexen@gmail.com>
> > > > ---
> > > > Changes in v2:
> > > >   - Reframe commit message around the kernel-side OOB instead of
> > > >     "real exporters already page-align", which read as the OOB being
> > > >     unreachable and undercut Cc: stable (Stanislav Fomichev).
> > > >   - Hoist the SG-length check out of the if (TX) branch so it covers
> > > >     RX too; RX has the same num_niovs / gen_pool desync on a
> > > >     contract-violating exporter, just without an OOB. Keep the
> > > >     size-multiple check TX-only (Stanislav Fomichev).
> > > >   - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > > >     match the existing call site at the tx_vec[] assignment
> > > >     (Bobby Eshleman).
> > > >
> > > >  net/core/devmem.c | 11 +++++++++++
> > > >  1 file changed, 11 insertions(+)
> > > >
> > > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > > index 468344739db2..4f71de44c0fb 100644
> > > > --- a/net/core/devmem.c
> > > > +++ b/net/core/devmem.c
> > > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > >         }
> > > >
> > > >         if (direction == DMA_TO_DEVICE) {
> > > > +               if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > > > +                       err = -EINVAL;
> > > > +                       NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > > > +                       goto err_unmap;
> > > > +               }
> > > >                 binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > > >                                                 dmabuf->size / PAGE_SIZE);
> > > >                 if (!binding->tx_vec) {
> > > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > >                 size_t len = sg_dma_len(sg);
> > > >                 struct net_iov *niov;
> > > >
> > > > +               if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > > > +                       err = -EINVAL;
> > > > +                       NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > > > +                       goto err_free_chunks;
> > > > +               }
> > > > +
> > > >                 owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > > >                                      dev_to_node(&dev->dev));
> > > >                 if (!owner) {
> > > > --
> > > > 2.53.0
> > > >
> > >
> > > No hold on, I don't think we actually have a bug here. AFAIR all
> > > you're describing is intionional . Yes the TX vectors and their niov
> > > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
> > >
> > > But net_devmem_get_niov_at has this check that prevents us from trying
> > > to send past the dma-buf size, even if it's not page_aligned:
> > >
> > > ```
> > > if (virt_addr >= binding->dmabuf->size)
> > > return NULL;
> > > ```
> > >
> > > IIRC the NULL should be bubbled up to the user as some error.
> > >
> > > Please double check that we actually have a bug here. If not, please
> > > don't merge this. This change could break existing users using devmem
> > > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> > > case.
> > >
> > > And if we have a bug, lets fix it in some way that doesn't deprecate
> > > support for non page-aligned TX dmabufs. You may be breaking users
> > > here.
> > >
> > > --
> > > Thanks,
> > > Mina
> >
> > Hi, Mina, note that the guard you're quoting doesn't cover the case
> > I'm describing. It's in bytes against dmabuf->size, while tx_vec is
> > sized dmabuf->size / PAGE_SIZE
> >   (truncating) -- there's a sub-page window where the check passes and
> > the index doesn't.
> >
> >   Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:
> >
> >   tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
> >   sendmsg with iov_base = 4096:
> >     virt_addr >= dmabuf->size   ->  4096 >= 4097, passes.
> >     tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.
> >
> >   And the OOB pointer isn't just returned to the caller -- it flows
> > through get_netmem() -> __get_netmem(), which dereferences it
> > (net_is_devmem_iov() reads ->type) and
> >    on a matching byte refcounts the binding off of it. So this is a
> > controlled OOB deref, not just a stray read.
> >
>
> Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096
> and dmabuf->size is 4097, the intention in the code was to have tx_vec
> be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is
> [4097]. I have an off-by-one error in the tx_vec allocation :(
>
> tx_vec[1] would contain an niov and as is stands we assume nvios are
> PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the
> callers trying to use tx_vec[1] would only use it for a range that's
> valid, so using [4097] and not a range like [4097->8192].
>
> However when I dug deeper on proper pruning of page alignment
> assumptions in net_devmem_bind_dmabuf, the problems are deeper than
> this patch suggests. We don't properly handle the (probably
> non-existent?) edge case where the dma-buf itself is page aligned but
> for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :(
>
> I think probably all of this is very theoretical. In practice probably
> the dmabuf implementations in the wild seem to be page-aligned.
> udmabuf doesn't support non-page-aligned dmabufs even so we can't add
> tests for these things. So I guess fine, lets merge this.
>
> >   On breaking users: the partial-page tail was never TX-usable to
> > begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
> > entries while
> >   gen_pool_add_owner() covers the full byte len, so any sendmsg into
> > [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
> > NULL or this OOB. It's not
> >   deprecating a working configuration -- it's rejecting one that wasn't working.
> >
>
> Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using
> actually sendmsging up to the last PAGE_SIZE boundary is working,
> andthat would break entirely. I guess lets merge this and on the
> offchance someone is hitting this edge case we can revisit.
>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
>
> I don't know if you're interested in also fixing the edge case where
> the sg table entries themselves are not not page-aligned. I think that
> also doesn't work properly?
>
> --
> Thanks,
> Mina

Thanks Mina, appreciate the review.

  On the SG side -- the second hunk in this patch already rejects any
SG with !IS_ALIGNED(sg_dma_len(sg), PAGE_SIZE) at bind time, so the
num_niovs vs
  gen_pool_add_owner(len) desync I described in the changelog is
covered for both RX and TX.

  What's not checked is the sg_dma_address() alignment itself -- an SG
with a page-multiple length but a sub-page start offset would still
slip through and produce
  niovs whose dma addresses straddle pages. Like you said, it's
theoretical (in-tree exporters and udmabuf all hand back page-aligned
addresses), but happy to send a
  follow-up as a separate fix if you'd like.

Cheers.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
  2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
                   ` (2 preceding siblings ...)
  2026-05-20 18:55 ` Mina Almasry
@ 2026-05-21  2:10 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-21  2:10 UTC (permalink / raw)
  To: David CARLIER
  Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
	sdf.kernel, kaiyuanz, almasrymina, bobbyeshleman, linux-kernel

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 19 May 2026 21:35:30 +0100 you wrote:
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
> 
>   - tx_vec is sized dmabuf->size / PAGE_SIZE, and
>     net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
>     before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
>     N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
>     N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
    https://git.kernel.org/netdev/net/c/4eb82ba54342

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-05-21  2:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
2026-05-19 21:07 ` Bobby Eshleman
2026-05-19 23:33 ` Stanislav Fomichev
2026-05-20 18:55 ` Mina Almasry
2026-05-20 19:28   ` David CARLIER
2026-05-20 20:42     ` Mina Almasry
2026-05-20 21:02       ` David CARLIER
2026-05-21  2:10 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox