* [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length
@ 2026-05-19 20:35 David Carlier
2026-05-19 21:07 ` Bobby Eshleman
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: David Carlier @ 2026-05-19 20:35 UTC (permalink / raw)
To: netdev
Cc: stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel,
kaiyuanz, almasrymina, bobbyeshleman, linux-kernel, David Carlier
net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
PAGE_SIZE multiples without checking:
- tx_vec is sized dmabuf->size / PAGE_SIZE, and
net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
- owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
covers the full byte len, so a non-page-multiple non-final sg
desyncs num_niovs from the gen_pool region for every later sg, on
both RX and TX.
dma-buf does not require page-aligned sizes, so the bind path has to
enforce what its own indexing assumes. Reject both with -EINVAL.
The size check is TX-only (only tx_vec is sized off dmabuf->size); the
SG-length check covers both directions.
Fixes: bd61848900bf ("net: devmem: Implement TX path")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
Changes in v2:
- Reframe commit message around the kernel-side OOB instead of
"real exporters already page-align", which read as the OOB being
unreachable and undercut Cc: stable (Stanislav Fomichev).
- Hoist the SG-length check out of the if (TX) branch so it covers
RX too; RX has the same num_niovs / gen_pool desync on a
contract-violating exporter, just without an OOB. Keep the
size-multiple check TX-only (Stanislav Fomichev).
- Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
match the existing call site at the tx_vec[] assignment
(Bobby Eshleman).
net/core/devmem.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 468344739db2..4f71de44c0fb 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
}
if (direction == DMA_TO_DEVICE) {
+ if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
+ err = -EINVAL;
+ NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
+ goto err_unmap;
+ }
binding->tx_vec = kvmalloc_objs(struct net_iov *,
dmabuf->size / PAGE_SIZE);
if (!binding->tx_vec) {
@@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
size_t len = sg_dma_len(sg);
struct net_iov *niov;
+ if (!IS_ALIGNED(len, PAGE_SIZE)) {
+ err = -EINVAL;
+ NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
+ goto err_free_chunks;
+ }
+
owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
dev_to_node(&dev->dev));
if (!owner) {
--
2.53.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length 2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier @ 2026-05-19 21:07 ` Bobby Eshleman 2026-05-19 23:33 ` Stanislav Fomichev ` (2 subsequent siblings) 3 siblings, 0 replies; 8+ messages in thread From: Bobby Eshleman @ 2026-05-19 21:07 UTC (permalink / raw) To: David Carlier Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel, kaiyuanz, almasrymina, linux-kernel On Tue, May 19, 2026 at 09:35:30PM +0100, David Carlier wrote: > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be > PAGE_SIZE multiples without checking: > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() > covers the full byte len, so a non-page-multiple non-final sg > desyncs num_niovs from the gen_pool region for every later sg, on > both RX and TX. > > dma-buf does not require page-aligned sizes, so the bind path has to > enforce what its own indexing assumes. Reject both with -EINVAL. > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the > SG-length check covers both directions. > > Fixes: bd61848900bf ("net: devmem: Implement TX path") > Cc: stable@vger.kernel.org > Signed-off-by: David Carlier <devnexen@gmail.com> > --- > Changes in v2: > - Reframe commit message around the kernel-side OOB instead of > "real exporters already page-align", which read as the OOB being > unreachable and undercut Cc: stable (Stanislav Fomichev). > - Hoist the SG-length check out of the if (TX) branch so it covers > RX too; RX has the same num_niovs / gen_pool desync on a > contract-violating exporter, just without an OOB. Keep the > size-multiple check TX-only (Stanislav Fomichev). > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to > match the existing call site at the tx_vec[] assignment > (Bobby Eshleman). > > net/core/devmem.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/net/core/devmem.c b/net/core/devmem.c > index 468344739db2..4f71de44c0fb 100644 > --- a/net/core/devmem.c > +++ b/net/core/devmem.c > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev, > } > > if (direction == DMA_TO_DEVICE) { > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) { > + err = -EINVAL; > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE"); > + goto err_unmap; > + } > binding->tx_vec = kvmalloc_objs(struct net_iov *, > dmabuf->size / PAGE_SIZE); > if (!binding->tx_vec) { > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, > size_t len = sg_dma_len(sg); > struct net_iov *niov; > > + if (!IS_ALIGNED(len, PAGE_SIZE)) { > + err = -EINVAL; > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned"); > + goto err_free_chunks; > + } > + > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > dev_to_node(&dev->dev)); > if (!owner) { > -- > 2.53.0 LGTM. Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length 2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier 2026-05-19 21:07 ` Bobby Eshleman @ 2026-05-19 23:33 ` Stanislav Fomichev 2026-05-20 18:55 ` Mina Almasry 2026-05-21 2:10 ` patchwork-bot+netdevbpf 3 siblings, 0 replies; 8+ messages in thread From: Stanislav Fomichev @ 2026-05-19 23:33 UTC (permalink / raw) To: David Carlier Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf, kaiyuanz, almasrymina, bobbyeshleman, linux-kernel On 05/19, David Carlier wrote: > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be > PAGE_SIZE multiples without checking: > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() > covers the full byte len, so a non-page-multiple non-final sg > desyncs num_niovs from the gen_pool region for every later sg, on > both RX and TX. > > dma-buf does not require page-aligned sizes, so the bind path has to > enforce what its own indexing assumes. Reject both with -EINVAL. > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the > SG-length check covers both directions. > > Fixes: bd61848900bf ("net: devmem: Implement TX path") > Cc: stable@vger.kernel.org > Signed-off-by: David Carlier <devnexen@gmail.com> > --- > Changes in v2: > - Reframe commit message around the kernel-side OOB instead of > "real exporters already page-align", which read as the OOB being > unreachable and undercut Cc: stable (Stanislav Fomichev). > - Hoist the SG-length check out of the if (TX) branch so it covers > RX too; RX has the same num_niovs / gen_pool desync on a > contract-violating exporter, just without an OOB. Keep the > size-multiple check TX-only (Stanislav Fomichev). > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to > match the existing call site at the tx_vec[] assignment > (Bobby Eshleman). Acked-by: Stanislav Fomichev <sdf@fomichev.me> Thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length 2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier 2026-05-19 21:07 ` Bobby Eshleman 2026-05-19 23:33 ` Stanislav Fomichev @ 2026-05-20 18:55 ` Mina Almasry 2026-05-20 19:28 ` David CARLIER 2026-05-21 2:10 ` patchwork-bot+netdevbpf 3 siblings, 1 reply; 8+ messages in thread From: Mina Almasry @ 2026-05-20 18:55 UTC (permalink / raw) To: David Carlier Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote: > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be > PAGE_SIZE multiples without checking: > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() > covers the full byte len, so a non-page-multiple non-final sg > desyncs num_niovs from the gen_pool region for every later sg, on > both RX and TX. > > dma-buf does not require page-aligned sizes, so the bind path has to > enforce what its own indexing assumes. Reject both with -EINVAL. > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the > SG-length check covers both directions. > > Fixes: bd61848900bf ("net: devmem: Implement TX path") > Cc: stable@vger.kernel.org > Signed-off-by: David Carlier <devnexen@gmail.com> > --- > Changes in v2: > - Reframe commit message around the kernel-side OOB instead of > "real exporters already page-align", which read as the OOB being > unreachable and undercut Cc: stable (Stanislav Fomichev). > - Hoist the SG-length check out of the if (TX) branch so it covers > RX too; RX has the same num_niovs / gen_pool desync on a > contract-violating exporter, just without an OOB. Keep the > size-multiple check TX-only (Stanislav Fomichev). > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to > match the existing call site at the tx_vec[] assignment > (Bobby Eshleman). > > net/core/devmem.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/net/core/devmem.c b/net/core/devmem.c > index 468344739db2..4f71de44c0fb 100644 > --- a/net/core/devmem.c > +++ b/net/core/devmem.c > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev, > } > > if (direction == DMA_TO_DEVICE) { > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) { > + err = -EINVAL; > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE"); > + goto err_unmap; > + } > binding->tx_vec = kvmalloc_objs(struct net_iov *, > dmabuf->size / PAGE_SIZE); > if (!binding->tx_vec) { > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, > size_t len = sg_dma_len(sg); > struct net_iov *niov; > > + if (!IS_ALIGNED(len, PAGE_SIZE)) { > + err = -EINVAL; > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned"); > + goto err_free_chunks; > + } > + > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > dev_to_node(&dev->dev)); > if (!owner) { > -- > 2.53.0 > No hold on, I don't think we actually have a bug here. AFAIR all you're describing is intionional . Yes the TX vectors and their niov arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE. But net_devmem_get_niov_at has this check that prevents us from trying to send past the dma-buf size, even if it's not page_aligned: ``` if (virt_addr >= binding->dmabuf->size) return NULL; ``` IIRC the NULL should be bubbled up to the user as some error. Please double check that we actually have a bug here. If not, please don't merge this. This change could break existing users using devmem TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use case. And if we have a bug, lets fix it in some way that doesn't deprecate support for non page-aligned TX dmabufs. You may be breaking users here. -- Thanks, Mina ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length 2026-05-20 18:55 ` Mina Almasry @ 2026-05-20 19:28 ` David CARLIER 2026-05-20 20:42 ` Mina Almasry 0 siblings, 1 reply; 8+ messages in thread From: David CARLIER @ 2026-05-20 19:28 UTC (permalink / raw) To: Mina Almasry Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote: > > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote: > > > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be > > PAGE_SIZE multiples without checking: > > > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and > > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size > > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = > > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = > > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. > > > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() > > covers the full byte len, so a non-page-multiple non-final sg > > desyncs num_niovs from the gen_pool region for every later sg, on > > both RX and TX. > > > > dma-buf does not require page-aligned sizes, so the bind path has to > > enforce what its own indexing assumes. Reject both with -EINVAL. > > > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the > > SG-length check covers both directions. > > > > Fixes: bd61848900bf ("net: devmem: Implement TX path") > > Cc: stable@vger.kernel.org > > Signed-off-by: David Carlier <devnexen@gmail.com> > > --- > > Changes in v2: > > - Reframe commit message around the kernel-side OOB instead of > > "real exporters already page-align", which read as the OOB being > > unreachable and undercut Cc: stable (Stanislav Fomichev). > > - Hoist the SG-length check out of the if (TX) branch so it covers > > RX too; RX has the same num_niovs / gen_pool desync on a > > contract-violating exporter, just without an OOB. Keep the > > size-multiple check TX-only (Stanislav Fomichev). > > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to > > match the existing call site at the tx_vec[] assignment > > (Bobby Eshleman). > > > > net/core/devmem.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/net/core/devmem.c b/net/core/devmem.c > > index 468344739db2..4f71de44c0fb 100644 > > --- a/net/core/devmem.c > > +++ b/net/core/devmem.c > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > } > > > > if (direction == DMA_TO_DEVICE) { > > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) { > > + err = -EINVAL; > > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE"); > > + goto err_unmap; > > + } > > binding->tx_vec = kvmalloc_objs(struct net_iov *, > > dmabuf->size / PAGE_SIZE); > > if (!binding->tx_vec) { > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > size_t len = sg_dma_len(sg); > > struct net_iov *niov; > > > > + if (!IS_ALIGNED(len, PAGE_SIZE)) { > > + err = -EINVAL; > > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned"); > > + goto err_free_chunks; > > + } > > + > > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > > dev_to_node(&dev->dev)); > > if (!owner) { > > -- > > 2.53.0 > > > > No hold on, I don't think we actually have a bug here. AFAIR all > you're describing is intionional . Yes the TX vectors and their niov > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE. > > But net_devmem_get_niov_at has this check that prevents us from trying > to send past the dma-buf size, even if it's not page_aligned: > > ``` > if (virt_addr >= binding->dmabuf->size) > return NULL; > ``` > > IIRC the NULL should be bubbled up to the user as some error. > > Please double check that we actually have a bug here. If not, please > don't merge this. This change could break existing users using devmem > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use > case. > > And if we have a bug, lets fix it in some way that doesn't deprecate > support for non page-aligned TX dmabufs. You may be breaking users > here. > > -- > Thanks, > Mina Hi, Mina, note that the guard you're quoting doesn't cover the case I'm describing. It's in bytes against dmabuf->size, while tx_vec is sized dmabuf->size / PAGE_SIZE (truncating) -- there's a sub-page window where the check passes and the index doesn't. Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097: tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0). sendmsg with iov_base = 4096: virt_addr >= dmabuf->size -> 4096 >= 4097, passes. tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one. And the OOB pointer isn't just returned to the caller -- it flows through get_netmem() -> __get_netmem(), which dereferences it (net_is_devmem_iov() reads ->type) and on a matching byte refcounts the binding off of it. So this is a controlled OOB deref, not just a stray read. On breaking users: the partial-page tail was never TX-usable to begin with. The fill loop only populates num_niovs = len / PAGE_SIZE entries while gen_pool_add_owner() covers the full byte len, so any sendmsg into [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either NULL or this OOB. It's not deprecating a working configuration -- it's rejecting one that wasn't working. Also worth flagging: the second hunk (sg_dma_len alignment) is the RX path too. net_devmem_get_niov_at() is TX-only, so the guard you quoted doesn't apply there, and the num_niovs vs gen_pool_add_owner(len) desync hits every non-final SG with a sub-page length, both directions. If you'd prefer to keep non-aligned binds legal, an alternative would be tightening the guard in net_devmem_get_niov_at() to virt_addr >= num_niovs * PAGE_SIZE instead of >= dmabuf->size. Happy to spin that as v3 -- but it doesn't help the SG-length leg, which still needs rejecting at bind time. Cheers. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length 2026-05-20 19:28 ` David CARLIER @ 2026-05-20 20:42 ` Mina Almasry 2026-05-20 21:02 ` David CARLIER 0 siblings, 1 reply; 8+ messages in thread From: Mina Almasry @ 2026-05-20 20:42 UTC (permalink / raw) To: David CARLIER Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@gmail.com> wrote: > > On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote: > > > > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote: > > > > > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be > > > PAGE_SIZE multiples without checking: > > > > > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and > > > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size > > > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = > > > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = > > > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. > > > > > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() > > > covers the full byte len, so a non-page-multiple non-final sg > > > desyncs num_niovs from the gen_pool region for every later sg, on > > > both RX and TX. > > > > > > dma-buf does not require page-aligned sizes, so the bind path has to > > > enforce what its own indexing assumes. Reject both with -EINVAL. > > > > > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the > > > SG-length check covers both directions. > > > > > > Fixes: bd61848900bf ("net: devmem: Implement TX path") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: David Carlier <devnexen@gmail.com> > > > --- > > > Changes in v2: > > > - Reframe commit message around the kernel-side OOB instead of > > > "real exporters already page-align", which read as the OOB being > > > unreachable and undercut Cc: stable (Stanislav Fomichev). > > > - Hoist the SG-length check out of the if (TX) branch so it covers > > > RX too; RX has the same num_niovs / gen_pool desync on a > > > contract-violating exporter, just without an OOB. Keep the > > > size-multiple check TX-only (Stanislav Fomichev). > > > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to > > > match the existing call site at the tx_vec[] assignment > > > (Bobby Eshleman). > > > > > > net/core/devmem.c | 11 +++++++++++ > > > 1 file changed, 11 insertions(+) > > > > > > diff --git a/net/core/devmem.c b/net/core/devmem.c > > > index 468344739db2..4f71de44c0fb 100644 > > > --- a/net/core/devmem.c > > > +++ b/net/core/devmem.c > > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > > } > > > > > > if (direction == DMA_TO_DEVICE) { > > > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) { > > > + err = -EINVAL; > > > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE"); > > > + goto err_unmap; > > > + } > > > binding->tx_vec = kvmalloc_objs(struct net_iov *, > > > dmabuf->size / PAGE_SIZE); > > > if (!binding->tx_vec) { > > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > > size_t len = sg_dma_len(sg); > > > struct net_iov *niov; > > > > > > + if (!IS_ALIGNED(len, PAGE_SIZE)) { > > > + err = -EINVAL; > > > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned"); > > > + goto err_free_chunks; > > > + } > > > + > > > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > > > dev_to_node(&dev->dev)); > > > if (!owner) { > > > -- > > > 2.53.0 > > > > > > > No hold on, I don't think we actually have a bug here. AFAIR all > > you're describing is intionional . Yes the TX vectors and their niov > > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE. > > > > But net_devmem_get_niov_at has this check that prevents us from trying > > to send past the dma-buf size, even if it's not page_aligned: > > > > ``` > > if (virt_addr >= binding->dmabuf->size) > > return NULL; > > ``` > > > > IIRC the NULL should be bubbled up to the user as some error. > > > > Please double check that we actually have a bug here. If not, please > > don't merge this. This change could break existing users using devmem > > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use > > case. > > > > And if we have a bug, lets fix it in some way that doesn't deprecate > > support for non page-aligned TX dmabufs. You may be breaking users > > here. > > > > -- > > Thanks, > > Mina > > Hi, Mina, note that the guard you're quoting doesn't cover the case > I'm describing. It's in bytes against dmabuf->size, while tx_vec is > sized dmabuf->size / PAGE_SIZE > (truncating) -- there's a sub-page window where the check passes and > the index doesn't. > > Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097: > > tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0). > sendmsg with iov_base = 4096: > virt_addr >= dmabuf->size -> 4096 >= 4097, passes. > tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one. > > And the OOB pointer isn't just returned to the caller -- it flows > through get_netmem() -> __get_netmem(), which dereferences it > (net_is_devmem_iov() reads ->type) and > on a matching byte refcounts the binding off of it. So this is a > controlled OOB deref, not just a stray read. > Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096 and dmabuf->size is 4097, the intention in the code was to have tx_vec be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is [4097]. I have an off-by-one error in the tx_vec allocation :( tx_vec[1] would contain an niov and as is stands we assume nvios are PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the callers trying to use tx_vec[1] would only use it for a range that's valid, so using [4097] and not a range like [4097->8192]. However when I dug deeper on proper pruning of page alignment assumptions in net_devmem_bind_dmabuf, the problems are deeper than this patch suggests. We don't properly handle the (probably non-existent?) edge case where the dma-buf itself is page aligned but for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :( I think probably all of this is very theoretical. In practice probably the dmabuf implementations in the wild seem to be page-aligned. udmabuf doesn't support non-page-aligned dmabufs even so we can't add tests for these things. So I guess fine, lets merge this. > On breaking users: the partial-page tail was never TX-usable to > begin with. The fill loop only populates num_niovs = len / PAGE_SIZE > entries while > gen_pool_add_owner() covers the full byte len, so any sendmsg into > [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either > NULL or this OOB. It's not > deprecating a working configuration -- it's rejecting one that wasn't working. > Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using actually sendmsging up to the last PAGE_SIZE boundary is working, andthat would break entirely. I guess lets merge this and on the offchance someone is hitting this edge case we can revisit. Reviewed-by: Mina Almasry <almasrymina@google.com> I don't know if you're interested in also fixing the edge case where the sg table entries themselves are not not page-aligned. I think that also doesn't work properly? -- Thanks, Mina ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length 2026-05-20 20:42 ` Mina Almasry @ 2026-05-20 21:02 ` David CARLIER 0 siblings, 0 replies; 8+ messages in thread From: David CARLIER @ 2026-05-20 21:02 UTC (permalink / raw) To: Mina Almasry Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel, kaiyuanz, bobbyeshleman, linux-kernel On Wed, 20 May 2026 at 21:42, Mina Almasry <almasrymina@google.com> wrote: > > On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@gmail.com> wrote: > > > > On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@google.com> wrote: > > > > > > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@gmail.com> wrote: > > > > > > > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be > > > > PAGE_SIZE multiples without checking: > > > > > > > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and > > > > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size > > > > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = > > > > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = > > > > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. > > > > > > > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() > > > > covers the full byte len, so a non-page-multiple non-final sg > > > > desyncs num_niovs from the gen_pool region for every later sg, on > > > > both RX and TX. > > > > > > > > dma-buf does not require page-aligned sizes, so the bind path has to > > > > enforce what its own indexing assumes. Reject both with -EINVAL. > > > > > > > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the > > > > SG-length check covers both directions. > > > > > > > > Fixes: bd61848900bf ("net: devmem: Implement TX path") > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: David Carlier <devnexen@gmail.com> > > > > --- > > > > Changes in v2: > > > > - Reframe commit message around the kernel-side OOB instead of > > > > "real exporters already page-align", which read as the OOB being > > > > unreachable and undercut Cc: stable (Stanislav Fomichev). > > > > - Hoist the SG-length check out of the if (TX) branch so it covers > > > > RX too; RX has the same num_niovs / gen_pool desync on a > > > > contract-violating exporter, just without an OOB. Keep the > > > > size-multiple check TX-only (Stanislav Fomichev). > > > > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to > > > > match the existing call site at the tx_vec[] assignment > > > > (Bobby Eshleman). > > > > > > > > net/core/devmem.c | 11 +++++++++++ > > > > 1 file changed, 11 insertions(+) > > > > > > > > diff --git a/net/core/devmem.c b/net/core/devmem.c > > > > index 468344739db2..4f71de44c0fb 100644 > > > > --- a/net/core/devmem.c > > > > +++ b/net/core/devmem.c > > > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > > > } > > > > > > > > if (direction == DMA_TO_DEVICE) { > > > > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) { > > > > + err = -EINVAL; > > > > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE"); > > > > + goto err_unmap; > > > > + } > > > > binding->tx_vec = kvmalloc_objs(struct net_iov *, > > > > dmabuf->size / PAGE_SIZE); > > > > if (!binding->tx_vec) { > > > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > > > size_t len = sg_dma_len(sg); > > > > struct net_iov *niov; > > > > > > > > + if (!IS_ALIGNED(len, PAGE_SIZE)) { > > > > + err = -EINVAL; > > > > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned"); > > > > + goto err_free_chunks; > > > > + } > > > > + > > > > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL, > > > > dev_to_node(&dev->dev)); > > > > if (!owner) { > > > > -- > > > > 2.53.0 > > > > > > > > > > No hold on, I don't think we actually have a bug here. AFAIR all > > > you're describing is intionional . Yes the TX vectors and their niov > > > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE. > > > > > > But net_devmem_get_niov_at has this check that prevents us from trying > > > to send past the dma-buf size, even if it's not page_aligned: > > > > > > ``` > > > if (virt_addr >= binding->dmabuf->size) > > > return NULL; > > > ``` > > > > > > IIRC the NULL should be bubbled up to the user as some error. > > > > > > Please double check that we actually have a bug here. If not, please > > > don't merge this. This change could break existing users using devmem > > > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use > > > case. > > > > > > And if we have a bug, lets fix it in some way that doesn't deprecate > > > support for non page-aligned TX dmabufs. You may be breaking users > > > here. > > > > > > -- > > > Thanks, > > > Mina > > > > Hi, Mina, note that the guard you're quoting doesn't cover the case > > I'm describing. It's in bytes against dmabuf->size, while tx_vec is > > sized dmabuf->size / PAGE_SIZE > > (truncating) -- there's a sub-page window where the check passes and > > the index doesn't. > > > > Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097: > > > > tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0). > > sendmsg with iov_base = 4096: > > virt_addr >= dmabuf->size -> 4096 >= 4097, passes. > > tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one. > > > > And the OOB pointer isn't just returned to the caller -- it flows > > through get_netmem() -> __get_netmem(), which dereferences it > > (net_is_devmem_iov() reads ->type) and > > on a matching byte refcounts the binding off of it. So this is a > > controlled OOB deref, not just a stray read. > > > > Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096 > and dmabuf->size is 4097, the intention in the code was to have tx_vec > be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is > [4097]. I have an off-by-one error in the tx_vec allocation :( > > tx_vec[1] would contain an niov and as is stands we assume nvios are > PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the > callers trying to use tx_vec[1] would only use it for a range that's > valid, so using [4097] and not a range like [4097->8192]. > > However when I dug deeper on proper pruning of page alignment > assumptions in net_devmem_bind_dmabuf, the problems are deeper than > this patch suggests. We don't properly handle the (probably > non-existent?) edge case where the dma-buf itself is page aligned but > for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :( > > I think probably all of this is very theoretical. In practice probably > the dmabuf implementations in the wild seem to be page-aligned. > udmabuf doesn't support non-page-aligned dmabufs even so we can't add > tests for these things. So I guess fine, lets merge this. > > > On breaking users: the partial-page tail was never TX-usable to > > begin with. The fill loop only populates num_niovs = len / PAGE_SIZE > > entries while > > gen_pool_add_owner() covers the full byte len, so any sendmsg into > > [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either > > NULL or this OOB. It's not > > deprecating a working configuration -- it's rejecting one that wasn't working. > > > > Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using > actually sendmsging up to the last PAGE_SIZE boundary is working, > andthat would break entirely. I guess lets merge this and on the > offchance someone is hitting this edge case we can revisit. > > Reviewed-by: Mina Almasry <almasrymina@google.com> > > I don't know if you're interested in also fixing the edge case where > the sg table entries themselves are not not page-aligned. I think that > also doesn't work properly? > > -- > Thanks, > Mina Thanks Mina, appreciate the review. On the SG side -- the second hunk in this patch already rejects any SG with !IS_ALIGNED(sg_dma_len(sg), PAGE_SIZE) at bind time, so the num_niovs vs gen_pool_add_owner(len) desync I described in the changelog is covered for both RX and TX. What's not checked is the sg_dma_address() alignment itself -- an SG with a page-multiple length but a sub-page start offset would still slip through and produce niovs whose dma addresses straddle pages. Like you said, it's theoretical (in-tree exporters and udmabuf all hand back page-aligned addresses), but happy to send a follow-up as a separate fix if you'd like. Cheers. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length 2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier ` (2 preceding siblings ...) 2026-05-20 18:55 ` Mina Almasry @ 2026-05-21 2:10 ` patchwork-bot+netdevbpf 3 siblings, 0 replies; 8+ messages in thread From: patchwork-bot+netdevbpf @ 2026-05-21 2:10 UTC (permalink / raw) To: David CARLIER Cc: netdev, stable, davem, edumazet, kuba, pabeni, horms, sdf, sdf.kernel, kaiyuanz, almasrymina, bobbyeshleman, linux-kernel Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Tue, 19 May 2026 21:35:30 +0100 you wrote: > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be > PAGE_SIZE multiples without checking: > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. > > [...] Here is the summary with links: - [net,v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length https://git.kernel.org/netdev/net/c/4eb82ba54342 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-21 2:10 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-19 20:35 [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length David Carlier 2026-05-19 21:07 ` Bobby Eshleman 2026-05-19 23:33 ` Stanislav Fomichev 2026-05-20 18:55 ` Mina Almasry 2026-05-20 19:28 ` David CARLIER 2026-05-20 20:42 ` Mina Almasry 2026-05-20 21:02 ` David CARLIER 2026-05-21 2:10 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox