* [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding
@ 2025-10-28 6:07 Shivaji Kant
2025-10-28 7:03 ` Eric Dumazet
2025-10-28 18:11 ` Bobby Eshleman
0 siblings, 2 replies; 5+ messages in thread
From: Shivaji Kant @ 2025-10-28 6:07 UTC (permalink / raw)
To: netdev
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Mina Almasry, Stanislav Fomichev, Pavel Begunkov,
Pranjal Shrivastava, Shivaji Kant, Vedant Mathur
The Devmem TX binding lookup function, performs a strict
check against the socket's destination cache (`dst`) to
ensure the bound `dmabuf_id` corresponds to the correct
network device (`dst->dev->ifindex == binding->dev->ifindex`).
However, this check incorrectly fails and returns `-ENODEV`
if the socket's route cache entry (`dst`) is merely missing
or expired (`dst == NULL`). This scenario is observed during
network events, such as when flow steering rules are deleted,
leading to a temporary route cache invalidation.
The parent caller, `tcp_sendmsg_locked()`, is already
responsible for acquiring or validating the route (`dst_entry`).
If `dst` is `NULL`, `tcp_sendmsg_locked()` will correctly
derive the route before transmission.
This patch removes the `dst` validation from
`net_devmem_get_binding()`. The function now only validates
the existence of the binding and its TX vector, relying on the
calling context for device/route correctness. This allows
temporary route cache misses to be handled gracefully by the
TCP/IP stack without ENODEV error on the Devmem TX path.
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: Vedant Mathur <vedantmathur@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: bd61848900bf ("net: devmem: Implement TX path")
Signed-off-by: Shivaji Kant <shivajikant@google.com>
---
net/core/devmem.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index d9de31a6cc7f..1d04754bc756 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -17,6 +17,7 @@
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
#include <net/sock.h>
+#include <net/tcp.h>
#include <trace/events/page_pool.h>
#include "devmem.h"
@@ -357,7 +358,8 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
unsigned int dmabuf_id)
{
struct net_devmem_dmabuf_binding *binding;
- struct dst_entry *dst = __sk_dst_get(sk);
+ struct net_device *dst_dev;
+ struct dst_entry *dst;
int err = 0;
binding = net_devmem_lookup_dmabuf(dmabuf_id);
@@ -366,16 +368,35 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
goto out_err;
}
+ rcu_read_lock();
+ dst = __sk_dst_get(sk);
+ /* If dst is NULL (route expired), attempt to rebuild it. */
+ if (unlikely(!dst)) {
+ if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk)) {
+ err = -EHOSTUNREACH;
+ goto out_unlock;
+ }
+ dst = __sk_dst_get(sk);
+ if (unlikely(!dst)) {
+ err = -ENODEV;
+ goto out_unlock;
+ }
+ }
+
/* The dma-addrs in this binding are only reachable to the corresponding
* net_device.
*/
- if (!dst || !dst->dev || dst->dev->ifindex != binding->dev->ifindex) {
+ dst_dev = dst_dev_rcu(dst);
+ if (unlikely(!dst_dev) || unlikely(dst_dev != binding->dev)) {
err = -ENODEV;
- goto out_err;
+ goto out_unlock;
}
+ rcu_read_unlock();
return binding;
+out_unlock:
+ rcu_read_unlock();
out_err:
if (binding)
net_devmem_dmabuf_binding_put(binding);
--
2.51.1.838.g19442a804e-goog
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding
2025-10-28 6:07 [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding Shivaji Kant
@ 2025-10-28 7:03 ` Eric Dumazet
2025-10-28 7:27 ` Shivaji Kant
2025-10-28 18:11 ` Bobby Eshleman
1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2025-10-28 7:03 UTC (permalink / raw)
To: Shivaji Kant
Cc: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni,
Simon Horman, Mina Almasry, Stanislav Fomichev, Pavel Begunkov,
Pranjal Shrivastava, Vedant Mathur
On Mon, Oct 27, 2025 at 11:07 PM Shivaji Kant <shivajikant@google.com> wrote:
>
> The Devmem TX binding lookup function, performs a strict
> check against the socket's destination cache (`dst`) to
> ensure the bound `dmabuf_id` corresponds to the correct
> network device (`dst->dev->ifindex == binding->dev->ifindex`).
>
> However, this check incorrectly fails and returns `-ENODEV`
> if the socket's route cache entry (`dst`) is merely missing
> or expired (`dst == NULL`). This scenario is observed during
> network events, such as when flow steering rules are deleted,
> leading to a temporary route cache invalidation.
>
> The parent caller, `tcp_sendmsg_locked()`, is already
> responsible for acquiring or validating the route (`dst_entry`).
> If `dst` is `NULL`, `tcp_sendmsg_locked()` will correctly
> derive the route before transmission.
>
> This patch removes the `dst` validation from
> `net_devmem_get_binding()`. The function now only validates
> the existence of the binding and its TX vector, relying on the
> calling context for device/route correctness. This allows
> temporary route cache misses to be handled gracefully by the
> TCP/IP stack without ENODEV error on the Devmem TX path.
>
> Reported-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Vedant Mathur <vedantmathur@google.com>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Signed-off-by: Shivaji Kant <shivajikant@google.com>
> ---
> net/core/devmem.c | 27 ++++++++++++++++++++++++---
> 1 file changed, 24 insertions(+), 3 deletions(-)
>
Patch looks good, but the title should be improved a bit.
"Remove dst (ENODEV) check in net_devmem_get_binding"
It is not about removing the check, more about refreshing the dst if necessary ?
Please wait ~24 hours before sending an updated version, to give time
for other reviewers to chime in.
Thanks !
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding
2025-10-28 7:03 ` Eric Dumazet
@ 2025-10-28 7:27 ` Shivaji Kant
0 siblings, 0 replies; 5+ messages in thread
From: Shivaji Kant @ 2025-10-28 7:27 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni,
Simon Horman, Mina Almasry, Stanislav Fomichev, Pavel Begunkov,
Pranjal Shrivastava, Vedant Mathur
Thanks Eric for pointing it out.
The description of the patch needs to be updated.
Will send a v2 with updated description in 24 hrs.
On Tue, Oct 28, 2025 at 12:33 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Oct 27, 2025 at 11:07 PM Shivaji Kant <shivajikant@google.com> wrote:
> >
> > The Devmem TX binding lookup function, performs a strict
> > check against the socket's destination cache (`dst`) to
> > ensure the bound `dmabuf_id` corresponds to the correct
> > network device (`dst->dev->ifindex == binding->dev->ifindex`).
> >
> > However, this check incorrectly fails and returns `-ENODEV`
> > if the socket's route cache entry (`dst`) is merely missing
> > or expired (`dst == NULL`). This scenario is observed during
> > network events, such as when flow steering rules are deleted,
> > leading to a temporary route cache invalidation.
> >
> > The parent caller, `tcp_sendmsg_locked()`, is already
> > responsible for acquiring or validating the route (`dst_entry`).
> > If `dst` is `NULL`, `tcp_sendmsg_locked()` will correctly
> > derive the route before transmission.
> >
> > This patch removes the `dst` validation from
> > `net_devmem_get_binding()`. The function now only validates
> > the existence of the binding and its TX vector, relying on the
> > calling context for device/route correctness. This allows
> > temporary route cache misses to be handled gracefully by the
> > TCP/IP stack without ENODEV error on the Devmem TX path.
> >
> > Reported-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Vedant Mathur <vedantmathur@google.com>
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > Signed-off-by: Shivaji Kant <shivajikant@google.com>
> > ---
> > net/core/devmem.c | 27 ++++++++++++++++++++++++---
> > 1 file changed, 24 insertions(+), 3 deletions(-)
> >
>
> Patch looks good, but the title should be improved a bit.
>
> "Remove dst (ENODEV) check in net_devmem_get_binding"
>
> It is not about removing the check, more about refreshing the dst if necessary ?
>
> Please wait ~24 hours before sending an updated version, to give time
> for other reviewers to chime in.
>
> Thanks !
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding
2025-10-28 6:07 [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding Shivaji Kant
2025-10-28 7:03 ` Eric Dumazet
@ 2025-10-28 18:11 ` Bobby Eshleman
2025-10-28 18:14 ` Shivaji Kant
1 sibling, 1 reply; 5+ messages in thread
From: Bobby Eshleman @ 2025-10-28 18:11 UTC (permalink / raw)
To: Shivaji Kant
Cc: netdev, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mina Almasry, Stanislav Fomichev,
Pavel Begunkov, Pranjal Shrivastava, Vedant Mathur
On Tue, Oct 28, 2025 at 06:07:14AM +0000, Shivaji Kant wrote:
> The Devmem TX binding lookup function, performs a strict
> check against the socket's destination cache (`dst`) to
> ensure the bound `dmabuf_id` corresponds to the correct
> network device (`dst->dev->ifindex == binding->dev->ifindex`).
>
> However, this check incorrectly fails and returns `-ENODEV`
> if the socket's route cache entry (`dst`) is merely missing
> or expired (`dst == NULL`). This scenario is observed during
> network events, such as when flow steering rules are deleted,
> leading to a temporary route cache invalidation.
>
> The parent caller, `tcp_sendmsg_locked()`, is already
> responsible for acquiring or validating the route (`dst_entry`).
> If `dst` is `NULL`, `tcp_sendmsg_locked()` will correctly
> derive the route before transmission.
>
> This patch removes the `dst` validation from
> `net_devmem_get_binding()`. The function now only validates
> the existence of the binding and its TX vector, relying on the
> calling context for device/route correctness. This allows
> temporary route cache misses to be handled gracefully by the
> TCP/IP stack without ENODEV error on the Devmem TX path.
>
> Reported-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Vedant Mathur <vedantmathur@google.com>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Fixes: bd61848900bf ("net: devmem: Implement TX path")
> Signed-off-by: Shivaji Kant <shivajikant@google.com>
> ---
> net/core/devmem.c | 27 ++++++++++++++++++++++++---
> 1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index d9de31a6cc7f..1d04754bc756 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -17,6 +17,7 @@
> #include <net/page_pool/helpers.h>
> #include <net/page_pool/memory_provider.h>
> #include <net/sock.h>
> +#include <net/tcp.h>
> #include <trace/events/page_pool.h>
>
> #include "devmem.h"
> @@ -357,7 +358,8 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
> unsigned int dmabuf_id)
> {
> struct net_devmem_dmabuf_binding *binding;
> - struct dst_entry *dst = __sk_dst_get(sk);
> + struct net_device *dst_dev;
> + struct dst_entry *dst;
> int err = 0;
>
> binding = net_devmem_lookup_dmabuf(dmabuf_id);
> @@ -366,16 +368,35 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
> goto out_err;
> }
>
> + rcu_read_lock();
> + dst = __sk_dst_get(sk);
> + /* If dst is NULL (route expired), attempt to rebuild it. */
> + if (unlikely(!dst)) {
> + if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk)) {
> + err = -EHOSTUNREACH;
> + goto out_unlock;
> + }
Echoing your discussion with Eric, I think the message might want to
call out this part. Besides that, all looks good!
Pending that nit:
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Best,
Bobby
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding
2025-10-28 18:11 ` Bobby Eshleman
@ 2025-10-28 18:14 ` Shivaji Kant
0 siblings, 0 replies; 5+ messages in thread
From: Shivaji Kant @ 2025-10-28 18:14 UTC (permalink / raw)
To: Bobby Eshleman
Cc: netdev, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Mina Almasry, Stanislav Fomichev,
Pavel Begunkov, Pranjal Shrivastava, Vedant Mathur
On Tue, Oct 28, 2025 at 11:41 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> On Tue, Oct 28, 2025 at 06:07:14AM +0000, Shivaji Kant wrote:
> > The Devmem TX binding lookup function, performs a strict
> > check against the socket's destination cache (`dst`) to
> > ensure the bound `dmabuf_id` corresponds to the correct
> > network device (`dst->dev->ifindex == binding->dev->ifindex`).
> >
> > However, this check incorrectly fails and returns `-ENODEV`
> > if the socket's route cache entry (`dst`) is merely missing
> > or expired (`dst == NULL`). This scenario is observed during
> > network events, such as when flow steering rules are deleted,
> > leading to a temporary route cache invalidation.
> >
> > The parent caller, `tcp_sendmsg_locked()`, is already
> > responsible for acquiring or validating the route (`dst_entry`).
> > If `dst` is `NULL`, `tcp_sendmsg_locked()` will correctly
> > derive the route before transmission.
> >
> > This patch removes the `dst` validation from
> > `net_devmem_get_binding()`. The function now only validates
> > the existence of the binding and its TX vector, relying on the
> > calling context for device/route correctness. This allows
> > temporary route cache misses to be handled gracefully by the
> > TCP/IP stack without ENODEV error on the Devmem TX path.
> >
> > Reported-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Vedant Mathur <vedantmathur@google.com>
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > Signed-off-by: Shivaji Kant <shivajikant@google.com>
> > ---
> > net/core/devmem.c | 27 ++++++++++++++++++++++++---
> > 1 file changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index d9de31a6cc7f..1d04754bc756 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -17,6 +17,7 @@
> > #include <net/page_pool/helpers.h>
> > #include <net/page_pool/memory_provider.h>
> > #include <net/sock.h>
> > +#include <net/tcp.h>
> > #include <trace/events/page_pool.h>
> >
> > #include "devmem.h"
> > @@ -357,7 +358,8 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
> > unsigned int dmabuf_id)
> > {
> > struct net_devmem_dmabuf_binding *binding;
> > - struct dst_entry *dst = __sk_dst_get(sk);
> > + struct net_device *dst_dev;
> > + struct dst_entry *dst;
> > int err = 0;
> >
> > binding = net_devmem_lookup_dmabuf(dmabuf_id);
> > @@ -366,16 +368,35 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
> > goto out_err;
> > }
> >
> > + rcu_read_lock();
> > + dst = __sk_dst_get(sk);
> > + /* If dst is NULL (route expired), attempt to rebuild it. */
> > + if (unlikely(!dst)) {
> > + if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk)) {
> > + err = -EHOSTUNREACH;
> > + goto out_unlock;
> > + }
>
> Echoing your discussion with Eric, I think the message might want to
> call out this part. Besides that, all looks good!
>
> Pending that nit:
>
> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
>
> Best,
> Bobby
Thanks Bobby for the review.
Will surely update the description in the v2 patch.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-10-28 18:15 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-28 6:07 [PATCH] net: devmem: Remove dst (ENODEV) check in net_devmem_get_binding Shivaji Kant
2025-10-28 7:03 ` Eric Dumazet
2025-10-28 7:27 ` Shivaji Kant
2025-10-28 18:11 ` Bobby Eshleman
2025-10-28 18:14 ` Shivaji Kant
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).