* [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
@ 2026-05-19 20:35 David Carlier
2026-05-19 21:07 ` Bobby Eshleman
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: David Carlier @ 2026-05-19 20:35 UTC (permalink / raw)
To: netdev
Cc: stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel,
kaiyuanz, almasrymina, bobbyeshleman, linux-kernel, David Carlier
net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
PAGE_SIZE multiples without checking:
- tx_vec is sized dmabuf->size / PAGE_SIZE, and
net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
- owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
covers the full byte len, so a non-page-multiple non-final sg
desyncs num_niovs from the gen_pool region for every later sg, on
both RX and TX.
dma-buf does not require page-aligned sizes, so the bind path has to
enforce what its own indexing assumes. Reject both with -EINVAL.
The size check is TX-only (only tx_vec is sized off dmabuf->size); the
SG-length check covers both directions.
Fixes: bd61848900bf ("net: devmem: Implement TX path")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
Changes in v2:
- Reframe commit message around the kernel-side OOB instead of
"real exporters already page-align", which read as the OOB being
unreachable and undercut Cc: stable (Stanislav Fomichev).
- Hoist the SG-length check out of the if (TX) branch so it covers
RX too; RX has the same num_niovs / gen_pool desync on a
contract-violating exporter, just without an OOB. Keep the
size-multiple check TX-only (Stanislav Fomichev).
- Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
match the existing call site at the tx_vec[] assignment
(Bobby Eshleman).
net/core/devmem.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 468344739db2..4f71de44c0fb 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
}
if (direction == DMA_TO_DEVICE) {
+ if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
+ err = -EINVAL;
+ NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
+ goto err_unmap;
+ }
binding->tx_vec = kvmalloc_objs(struct net_iov *,
dmabuf->size / PAGE_SIZE);
if (!binding->tx_vec) {
@@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
size_t len = sg_dma_len(sg);
struct net_iov *niov;
+ if (!IS_ALIGNED(len, PAGE_SIZE)) {
+ err = -EINVAL;
+ NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
+ goto err_free_chunks;
+ }
+
owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
dev_to_node(&dev->dev));
if (!owner) {
--
2.53.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
@ 2026-05-19 21:07 ` Bobby Eshleman
2026-05-19 23:33 ` Stanislav Fomichev
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Bobby Eshleman @ 2026-05-19 21:07 UTC (permalink / raw)
To: David Carlier
Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
sdf.kernel, kaiyuanz, almasrymina, linux-kernel
On Tue, May 19, 2026 at 09:35:30PM +0100, David Carlier wrote:
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
>
> - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
>
> - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> covers the full byte len, so a non-page-multiple non-final sg
> desyncs num_niovs from the gen_pool region for every later sg, on
> both RX and TX.
>
> dma-buf does not require page-aligned sizes, so the bind path has to
> enforce what its own indexing assumes. Reject both with -EINVAL.
>
> The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> SG-length check covers both directions.
>
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> Changes in v2:
> - Reframe commit message around the kernel-side OOB instead of
> "real exporters already page-align", which read as the OOB being
> unreachable and undercut Cc: stable (Stanislav Fomichev).
> - Hoist the SG-length check out of the if (TX) branch so it covers
> RX too; RX has the same num_niovs / gen_pool desync on a
> contract-violating exporter, just without an OOB. Keep the
> size-multiple check TX-only (Stanislav Fomichev).
> - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> match the existing call site at the tx_vec[] assignment
> (Bobby Eshleman).
>
> net/core/devmem.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 468344739db2..4f71de44c0fb 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> }
>
> if (direction == DMA_TO_DEVICE) {
> + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> + err = -EINVAL;
> + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> + goto err_unmap;
> + }
> binding->tx_vec = kvmalloc_objs(struct net_iov *,
> dmabuf->size / PAGE_SIZE);
> if (!binding->tx_vec) {
> @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> size_t len = sg_dma_len(sg);
> struct net_iov *niov;
>
> + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> + err = -EINVAL;
> + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> + goto err_free_chunks;
> + }
> +
> owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> dev_to_node(&dev->dev));
> if (!owner) {
> --
> 2.53.0
LGTM.
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
2026-05-19 21:07 ` Bobby Eshleman
@ 2026-05-19 23:33 ` Stanislav Fomichev
2026-05-20 18:55 ` Mina Almasry
2026-05-21 2:10 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2026-05-19 23:33 UTC (permalink / raw)
To: David Carlier
Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
kaiyuanz, almasrymina, bobbyeshleman, linux-kernel
On 05/19, David Carlier wrote:
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
>
> - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
>
> - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> covers the full byte len, so a non-page-multiple non-final sg
> desyncs num_niovs from the gen_pool region for every later sg, on
> both RX and TX.
>
> dma-buf does not require page-aligned sizes, so the bind path has to
> enforce what its own indexing assumes. Reject both with -EINVAL.
>
> The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> SG-length check covers both directions.
>
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> Changes in v2:
> - Reframe commit message around the kernel-side OOB instead of
> "real exporters already page-align", which read as the OOB being
> unreachable and undercut Cc: stable (Stanislav Fomichev).
> - Hoist the SG-length check out of the if (TX) branch so it covers
> RX too; RX has the same num_niovs / gen_pool desync on a
> contract-violating exporter, just without an OOB. Keep the
> size-multiple check TX-only (Stanislav Fomichev).
> - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> match the existing call site at the tx_vec[] assignment
> (Bobby Eshleman).
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
2026-05-19 21:07 ` Bobby Eshleman
2026-05-19 23:33 ` Stanislav Fomichev
@ 2026-05-20 18:55 ` Mina Almasry
2026-05-20 19:28 ` David CARLIER
2026-05-21 2:10 ` patchwork-bot+netdevbpf
3 siblings, 1 reply; 8+ messages in thread
From: Mina Almasry @ 2026-05-20 18:55 UTC (permalink / raw)
To: David Carlier
Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel
On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
>
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
>
> - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
>
> - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> covers the full byte len, so a non-page-multiple non-final sg
> desyncs num_niovs from the gen_pool region for every later sg, on
> both RX and TX.
>
> dma-buf does not require page-aligned sizes, so the bind path has to
> enforce what its own indexing assumes. Reject both with -EINVAL.
>
> The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> SG-length check covers both directions.
>
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> Changes in v2:
> - Reframe commit message around the kernel-side OOB instead of
> "real exporters already page-align", which read as the OOB being
> unreachable and undercut Cc: stable (Stanislav Fomichev).
> - Hoist the SG-length check out of the if (TX) branch so it covers
> RX too; RX has the same num_niovs / gen_pool desync on a
> contract-violating exporter, just without an OOB. Keep the
> size-multiple check TX-only (Stanislav Fomichev).
> - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> match the existing call site at the tx_vec[] assignment
> (Bobby Eshleman).
>
> net/core/devmem.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 468344739db2..4f71de44c0fb 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> }
>
> if (direction == DMA_TO_DEVICE) {
> + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> + err = -EINVAL;
> + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> + goto err_unmap;
> + }
> binding->tx_vec = kvmalloc_objs(struct net_iov *,
> dmabuf->size / PAGE_SIZE);
> if (!binding->tx_vec) {
> @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> size_t len = sg_dma_len(sg);
> struct net_iov *niov;
>
> + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> + err = -EINVAL;
> + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> + goto err_free_chunks;
> + }
> +
> owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> dev_to_node(&dev->dev));
> if (!owner) {
> --
> 2.53.0
>
No hold on, I don't think we actually have a bug here. AFAIR all
you're describing is intionional . Yes the TX vectors and their niov
arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
But net_devmem_get_niov_at has this check that prevents us from trying
to send past the dma-buf size, even if it's not page_aligned:
```
if (virt_addr >= binding->dmabuf->size)
return NULL;
```
IIRC the NULL should be bubbled up to the user as some error.
Please double check that we actually have a bug here. If not, please
don't merge this. This change could break existing users using devmem
TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
case.
And if we have a bug, lets fix it in some way that doesn't deprecate
support for non page-aligned TX dmabufs. You may be breaking users
here.
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
2026-05-20 18:55 ` Mina Almasry
@ 2026-05-20 19:28 ` David CARLIER
2026-05-20 20:42 ` Mina Almasry
0 siblings, 1 reply; 8+ messages in thread
From: David CARLIER @ 2026-05-20 19:28 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel
On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote:
>
> On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
> >
> > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > PAGE_SIZE multiples without checking:
> >
> > - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> >
> > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > covers the full byte len, so a non-page-multiple non-final sg
> > desyncs num_niovs from the gen_pool region for every later sg, on
> > both RX and TX.
> >
> > dma-buf does not require page-aligned sizes, so the bind path has to
> > enforce what its own indexing assumes. Reject both with -EINVAL.
> >
> > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > SG-length check covers both directions.
> >
> > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: David Carlier <devnexen@gmail.com>
> > ---
> > Changes in v2:
> > - Reframe commit message around the kernel-side OOB instead of
> > "real exporters already page-align", which read as the OOB being
> > unreachable and undercut Cc: stable (Stanislav Fomichev).
> > - Hoist the SG-length check out of the if (TX) branch so it covers
> > RX too; RX has the same num_niovs / gen_pool desync on a
> > contract-violating exporter, just without an OOB. Keep the
> > size-multiple check TX-only (Stanislav Fomichev).
> > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > match the existing call site at the tx_vec[] assignment
> > (Bobby Eshleman).
> >
> > net/core/devmem.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 468344739db2..4f71de44c0fb 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > }
> >
> > if (direction == DMA_TO_DEVICE) {
> > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > + err = -EINVAL;
> > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > + goto err_unmap;
> > + }
> > binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > dmabuf->size / PAGE_SIZE);
> > if (!binding->tx_vec) {
> > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > size_t len = sg_dma_len(sg);
> > struct net_iov *niov;
> >
> > + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > + err = -EINVAL;
> > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > + goto err_free_chunks;
> > + }
> > +
> > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > dev_to_node(&dev->dev));
> > if (!owner) {
> > --
> > 2.53.0
> >
>
> No hold on, I don't think we actually have a bug here. AFAIR all
> you're describing is intionional . Yes the TX vectors and their niov
> arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
>
> But net_devmem_get_niov_at has this check that prevents us from trying
> to send past the dma-buf size, even if it's not page_aligned:
>
> ```
> if (virt_addr >= binding->dmabuf->size)
> return NULL;
> ```
>
> IIRC the NULL should be bubbled up to the user as some error.
>
> Please double check that we actually have a bug here. If not, please
> don't merge this. This change could break existing users using devmem
> TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> case.
>
> And if we have a bug, lets fix it in some way that doesn't deprecate
> support for non page-aligned TX dmabufs. You may be breaking users
> here.
>
> --
> Thanks,
> Mina
Hi, Mina, note that the guard you're quoting doesn't cover the case
I'm describing. It's in bytes against dmabuf->size, while tx_vec is
sized dmabuf->size / PAGE_SIZE
(truncating) -- there's a sub-page window where the check passes and
the index doesn't.
Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:
tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
sendmsg with iov_base = 4096:
virt_addr >= dmabuf->size -> 4096 >= 4097, passes.
tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.
And the OOB pointer isn't just returned to the caller -- it flows
through get_netmem() -> __get_netmem(), which dereferences it
(net_is_devmem_iov() reads ->type) and
on a matching byte refcounts the binding off of it. So this is a
controlled OOB deref, not just a stray read.
On breaking users: the partial-page tail was never TX-usable to
begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
entries while
gen_pool_add_owner() covers the full byte len, so any sendmsg into
[num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
NULL or this OOB. It's not
deprecating a working configuration -- it's rejecting one that wasn't working.
Also worth flagging: the second hunk (sg_dma_len alignment) is the
RX path too. net_devmem_get_niov_at() is TX-only, so the guard you
quoted doesn't apply there, and
the num_niovs vs gen_pool_add_owner(len) desync hits every non-final
SG with a sub-page length, both directions.
If you'd prefer to keep non-aligned binds legal, an alternative
would be tightening the guard in net_devmem_get_niov_at() to virt_addr
>= num_niovs * PAGE_SIZE
instead of >= dmabuf->size. Happy to spin that as v3 -- but it
doesn't help the SG-length leg, which still needs rejecting at bind
time.
Cheers.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
2026-05-20 19:28 ` David CARLIER
@ 2026-05-20 20:42 ` Mina Almasry
2026-05-20 21:02 ` David CARLIER
0 siblings, 1 reply; 8+ messages in thread
From: Mina Almasry @ 2026-05-20 20:42 UTC (permalink / raw)
To: David CARLIER
Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel
On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@gmail.com> wrote:
>
> On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
> > >
> > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > > PAGE_SIZE multiples without checking:
> > >
> > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> > >
> > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > > covers the full byte len, so a non-page-multiple non-final sg
> > > desyncs num_niovs from the gen_pool region for every later sg, on
> > > both RX and TX.
> > >
> > > dma-buf does not require page-aligned sizes, so the bind path has to
> > > enforce what its own indexing assumes. Reject both with -EINVAL.
> > >
> > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > > SG-length check covers both directions.
> > >
> > > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: David Carlier <devnexen@gmail.com>
> > > ---
> > > Changes in v2:
> > > - Reframe commit message around the kernel-side OOB instead of
> > > "real exporters already page-align", which read as the OOB being
> > > unreachable and undercut Cc: stable (Stanislav Fomichev).
> > > - Hoist the SG-length check out of the if (TX) branch so it covers
> > > RX too; RX has the same num_niovs / gen_pool desync on a
> > > contract-violating exporter, just without an OOB. Keep the
> > > size-multiple check TX-only (Stanislav Fomichev).
> > > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > > match the existing call site at the tx_vec[] assignment
> > > (Bobby Eshleman).
> > >
> > > net/core/devmem.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > index 468344739db2..4f71de44c0fb 100644
> > > --- a/net/core/devmem.c
> > > +++ b/net/core/devmem.c
> > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > }
> > >
> > > if (direction == DMA_TO_DEVICE) {
> > > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > > + err = -EINVAL;
> > > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > > + goto err_unmap;
> > > + }
> > > binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > > dmabuf->size / PAGE_SIZE);
> > > if (!binding->tx_vec) {
> > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > size_t len = sg_dma_len(sg);
> > > struct net_iov *niov;
> > >
> > > + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > > + err = -EINVAL;
> > > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > > + goto err_free_chunks;
> > > + }
> > > +
> > > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > > dev_to_node(&dev->dev));
> > > if (!owner) {
> > > --
> > > 2.53.0
> > >
> >
> > No hold on, I don't think we actually have a bug here. AFAIR all
> > you're describing is intionional . Yes the TX vectors and their niov
> > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
> >
> > But net_devmem_get_niov_at has this check that prevents us from trying
> > to send past the dma-buf size, even if it's not page_aligned:
> >
> > ```
> > if (virt_addr >= binding->dmabuf->size)
> > return NULL;
> > ```
> >
> > IIRC the NULL should be bubbled up to the user as some error.
> >
> > Please double check that we actually have a bug here. If not, please
> > don't merge this. This change could break existing users using devmem
> > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> > case.
> >
> > And if we have a bug, lets fix it in some way that doesn't deprecate
> > support for non page-aligned TX dmabufs. You may be breaking users
> > here.
> >
> > --
> > Thanks,
> > Mina
>
> Hi, Mina, note that the guard you're quoting doesn't cover the case
> I'm describing. It's in bytes against dmabuf->size, while tx_vec is
> sized dmabuf->size / PAGE_SIZE
> (truncating) -- there's a sub-page window where the check passes and
> the index doesn't.
>
> Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:
>
> tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
> sendmsg with iov_base = 4096:
> virt_addr >= dmabuf->size -> 4096 >= 4097, passes.
> tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.
>
> And the OOB pointer isn't just returned to the caller -- it flows
> through get_netmem() -> __get_netmem(), which dereferences it
> (net_is_devmem_iov() reads ->type) and
> on a matching byte refcounts the binding off of it. So this is a
> controlled OOB deref, not just a stray read.
>
Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096
and dmabuf->size is 4097, the intention in the code was to have tx_vec
be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is
[4097]. I have an off-by-one error in the tx_vec allocation :(
tx_vec[1] would contain an niov and as is stands we assume nvios are
PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the
callers trying to use tx_vec[1] would only use it for a range that's
valid, so using [4097] and not a range like [4097->8192].
However when I dug deeper on proper pruning of page alignment
assumptions in net_devmem_bind_dmabuf, the problems are deeper than
this patch suggests. We don't properly handle the (probably
non-existent?) edge case where the dma-buf itself is page aligned but
for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :(
I think probably all of this is very theoretical. In practice probably
the dmabuf implementations in the wild seem to be page-aligned.
udmabuf doesn't support non-page-aligned dmabufs even so we can't add
tests for these things. So I guess fine, lets merge this.
> On breaking users: the partial-page tail was never TX-usable to
> begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
> entries while
> gen_pool_add_owner() covers the full byte len, so any sendmsg into
> [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
> NULL or this OOB. It's not
> deprecating a working configuration -- it's rejecting one that wasn't working.
>
Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using
actually sendmsging up to the last PAGE_SIZE boundary is working,
andthat would break entirely. I guess lets merge this and on the
offchance someone is hitting this edge case we can revisit.
Reviewed-by: Mina Almasry <almasrymina@google.com>
I don't know if you're interested in also fixing the edge case where
the sg table entries themselves are not not page-aligned. I think that
also doesn't work properly?
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
2026-05-20 20:42 ` Mina Almasry
@ 2026-05-20 21:02 ` David CARLIER
0 siblings, 0 replies; 8+ messages in thread
From: David CARLIER @ 2026-05-20 21:02 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel
On Wed, 20 May 2026 at 21:42, Mina Almasry <almasrymina@google.com> wrote:
>
> On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@gmail.com> wrote:
> >
> > On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote:
> > >
> > > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote:
> > > >
> > > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > > > PAGE_SIZE multiples without checking:
> > > >
> > > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > > > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > > > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > > > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > > > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> > > >
> > > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > > > covers the full byte len, so a non-page-multiple non-final sg
> > > > desyncs num_niovs from the gen_pool region for every later sg, on
> > > > both RX and TX.
> > > >
> > > > dma-buf does not require page-aligned sizes, so the bind path has to
> > > > enforce what its own indexing assumes. Reject both with -EINVAL.
> > > >
> > > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > > > SG-length check covers both directions.
> > > >
> > > > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: David Carlier <devnexen@gmail.com>
> > > > ---
> > > > Changes in v2:
> > > > - Reframe commit message around the kernel-side OOB instead of
> > > > "real exporters already page-align", which read as the OOB being
> > > > unreachable and undercut Cc: stable (Stanislav Fomichev).
> > > > - Hoist the SG-length check out of the if (TX) branch so it covers
> > > > RX too; RX has the same num_niovs / gen_pool desync on a
> > > > contract-violating exporter, just without an OOB. Keep the
> > > > size-multiple check TX-only (Stanislav Fomichev).
> > > > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > > > match the existing call site at the tx_vec[] assignment
> > > > (Bobby Eshleman).
> > > >
> > > > net/core/devmem.c | 11 +++++++++++
> > > > 1 file changed, 11 insertions(+)
> > > >
> > > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > > index 468344739db2..4f71de44c0fb 100644
> > > > --- a/net/core/devmem.c
> > > > +++ b/net/core/devmem.c
> > > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > > }
> > > >
> > > > if (direction == DMA_TO_DEVICE) {
> > > > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > > > + err = -EINVAL;
> > > > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > > > + goto err_unmap;
> > > > + }
> > > > binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > > > dmabuf->size / PAGE_SIZE);
> > > > if (!binding->tx_vec) {
> > > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > > size_t len = sg_dma_len(sg);
> > > > struct net_iov *niov;
> > > >
> > > > + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > > > + err = -EINVAL;
> > > > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > > > + goto err_free_chunks;
> > > > + }
> > > > +
> > > > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > > > dev_to_node(&dev->dev));
> > > > if (!owner) {
> > > > --
> > > > 2.53.0
> > > >
> > >
> > > No hold on, I don't think we actually have a bug here. AFAIR all
> > > you're describing is intionional . Yes the TX vectors and their niov
> > > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
> > >
> > > But net_devmem_get_niov_at has this check that prevents us from trying
> > > to send past the dma-buf size, even if it's not page_aligned:
> > >
> > > ```
> > > if (virt_addr >= binding->dmabuf->size)
> > > return NULL;
> > > ```
> > >
> > > IIRC the NULL should be bubbled up to the user as some error.
> > >
> > > Please double check that we actually have a bug here. If not, please
> > > don't merge this. This change could break existing users using devmem
> > > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> > > case.
> > >
> > > And if we have a bug, lets fix it in some way that doesn't deprecate
> > > support for non page-aligned TX dmabufs. You may be breaking users
> > > here.
> > >
> > > --
> > > Thanks,
> > > Mina
> >
> > Hi, Mina, note that the guard you're quoting doesn't cover the case
> > I'm describing. It's in bytes against dmabuf->size, while tx_vec is
> > sized dmabuf->size / PAGE_SIZE
> > (truncating) -- there's a sub-page window where the check passes and
> > the index doesn't.
> >
> > Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:
> >
> > tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
> > sendmsg with iov_base = 4096:
> > virt_addr >= dmabuf->size -> 4096 >= 4097, passes.
> > tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.
> >
> > And the OOB pointer isn't just returned to the caller -- it flows
> > through get_netmem() -> __get_netmem(), which dereferences it
> > (net_is_devmem_iov() reads ->type) and
> > on a matching byte refcounts the binding off of it. So this is a
> > controlled OOB deref, not just a stray read.
> >
>
> Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096
> and dmabuf->size is 4097, the intention in the code was to have tx_vec
> be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is
> [4097]. I have an off-by-one error in the tx_vec allocation :(
>
> tx_vec[1] would contain an niov and as is stands we assume nvios are
> PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the
> callers trying to use tx_vec[1] would only use it for a range that's
> valid, so using [4097] and not a range like [4097->8192].
>
> However when I dug deeper on proper pruning of page alignment
> assumptions in net_devmem_bind_dmabuf, the problems are deeper than
> this patch suggests. We don't properly handle the (probably
> non-existent?) edge case where the dma-buf itself is page aligned but
> for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :(
>
> I think probably all of this is very theoretical. In practice probably
> the dmabuf implementations in the wild seem to be page-aligned.
> udmabuf doesn't support non-page-aligned dmabufs even so we can't add
> tests for these things. So I guess fine, lets merge this.
>
> > On breaking users: the partial-page tail was never TX-usable to
> > begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
> > entries while
> > gen_pool_add_owner() covers the full byte len, so any sendmsg into
> > [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
> > NULL or this OOB. It's not
> > deprecating a working configuration -- it's rejecting one that wasn't working.
> >
>
> Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using
> actually sendmsging up to the last PAGE_SIZE boundary is working,
> andthat would break entirely. I guess lets merge this and on the
> offchance someone is hitting this edge case we can revisit.
>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
>
> I don't know if you're interested in also fixing the edge case where
> the sg table entries themselves are not not page-aligned. I think that
> also doesn't work properly?
>
> --
> Thanks,
> Mina
Thanks Mina, appreciate the review.
On the SG side -- the second hunk in this patch already rejects any
SG with !IS_ALIGNED(sg_dma_len(sg), PAGE_SIZE) at bind time, so the
num_niovs vs
gen_pool_add_owner(len) desync I described in the changelog is
covered for both RX and TX.
What's not checked is the sg_dma_address() alignment itself -- an SG
with a page-multiple length but a sub-page start offset would still
slip through and produce
niovs whose dma addresses straddle pages. Like you said, it's
theoretical (in-tree exporters and udmabuf all hand back page-aligned
addresses), but happy to send a
follow-up as a separate fix if you'd like.
Cheers.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
` (2 preceding siblings ...)
2026-05-20 18:55 ` Mina Almasry
@ 2026-05-21 2:10 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-21 2:10 UTC (permalink / raw)
To: David CARLIER
Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf,
sdf.kernel, kaiyuanz, almasrymina, bobbyeshleman, linux-kernel
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 19 May 2026 21:35:30 +0100 you wrote:
> net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> PAGE_SIZE multiples without checking:
>
> - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
>
> [...]
Here is the summary with links:
- [net,v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
https://git.kernel.org/netdev/net/c/4eb82ba54342
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-21 2:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier
2026-05-19 21:07 ` Bobby Eshleman
2026-05-19 23:33 ` Stanislav Fomichev
2026-05-20 18:55 ` Mina Almasry
2026-05-20 19:28 ` David CARLIER
2026-05-20 20:42 ` Mina Almasry
2026-05-20 21:02 ` David CARLIER
2026-05-21 2:10 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox