[PATCH net-next v1 0/7] devmem TCP fixes

* [PATCH net-next v1 0/7] devmem TCP fixes
@ 2024-10-29 20:55 Mina Almasry
  2024-10-29 20:55 ` [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long Mina Almasry
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mina Almasry @ 2024-10-29 20:55 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-kselftest
  Cc: Mina Almasry, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jesper Dangaard Brouer,
	Ilias Apalodimas, Shuah Khan

A few unrelated devmem TCP fixes bundled in a series for some
convenience (if that's ok).

Patch 1-2: fix naming and provide page_pool_alloc_netmem for fragged
netmem.

Patch 3-4: fix issues with dma-buf dma addresses being potentially
passed to dma_sync_for_* helpers.

Patch 6-7: fix syzbot SO_DEVMEM_DONTNEED issue and add test for this
case.


Mina Almasry (6):
  net: page_pool: rename page_pool_alloc_netmem to *_netmems
  net: page_pool: create page_pool_alloc_netmem
  page_pool: disable sync for cpu for dmabuf memory provider
  netmem: add netmem_prefetch
  net: fix SO_DEVMEM_DONTNEED looping too long
  ncdevmem: add test for too many token_count

Samiullah Khawaja (1):
  page_pool: Set `dma_sync` to false for devmem memory provider

 include/net/netmem.h                   |  7 ++++
 include/net/page_pool/helpers.h        | 50 ++++++++++++++++++--------
 include/net/page_pool/types.h          |  2 +-
 net/core/devmem.c                      |  9 +++--
 net/core/page_pool.c                   | 11 +++---
 net/core/sock.c                        | 46 ++++++++++++++----------
 tools/testing/selftests/net/ncdevmem.c | 11 ++++++
 7 files changed, 93 insertions(+), 43 deletions(-)

-- 
2.47.0.163.g1226f6d8fa-goog


* [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-10-29 20:55 [PATCH net-next v1 0/7] devmem TCP fixes Mina Almasry
@ 2024-10-29 20:55 ` Mina Almasry
  2024-10-30 14:33   ` Stanislav Fomichev
  2024-10-29 20:55 ` [PATCH net-next v1 7/7] ncdevmem: add test for too many token_count Mina Almasry
  2024-11-01  2:41 ` [PATCH net-next v1 0/7] devmem TCP fixes Jakub Kicinski
  2 siblings, 1 reply; 14+ messages in thread
From: Mina Almasry @ 2024-10-29 20:55 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-kselftest
  Cc: Mina Almasry, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jesper Dangaard Brouer,
	Ilias Apalodimas, Shuah Khan, Yi Lai

Check we're going to free a reasonable number of frags in token_count
before starting the loop, to prevent looping too long.

Also minor code cleanups:
- Flip checks to reduce indentation.
- Use sizeof(*tokens) everywhere for consistency.

Cc: Yi Lai <yi1.lai@linux.intel.com>

Signed-off-by: Mina Almasry <almasrymina@google.com>

---
 net/core/sock.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7f398bd07fb7..8603b8d87f2e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes)

 #ifdef CONFIG_PAGE_POOL

-/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in
+/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in
  * 1 syscall. The limit exists to limit the amount of memory the kernel
- * allocates to copy these tokens.
+ * allocates to copy these tokens, and to prevent looping over the frags for
+ * too long.
  */
-#define MAX_DONTNEED_TOKENS 128
+#define MAX_DONTNEED_FRAGS 1024

 static noinline_for_stack int
 sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
@@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	unsigned int num_tokens, i, j, k, netmem_num = 0;
 	struct dmabuf_token *tokens;
 	netmem_ref netmems[16];
+	u64 num_frags = 0;
 	int ret = 0;

 	if (!sk_is_tcp(sk))
 		return -EBADF;

-	if (optlen % sizeof(struct dmabuf_token) ||
-	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
+	if (optlen % sizeof(*tokens) ||
+	    optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
 		return -EINVAL;

-	tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
+	num_tokens = optlen / sizeof(*tokens);
+	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
 	if (!tokens)
 		return -ENOMEM;

-	num_tokens = optlen / sizeof(struct dmabuf_token);
 	if (copy_from_sockptr(tokens, optval, optlen)) {
 		kvfree(tokens);
 		return -EFAULT;
 	}

+	for (i = 0; i < num_tokens; i++) {
+		num_frags += tokens[i].token_count;
+		if (num_frags > MAX_DONTNEED_FRAGS) {
+			kvfree(tokens);
+			return -E2BIG;
+		}
+	}
+
 	xa_lock_bh(&sk->sk_user_frags);
 	for (i = 0; i < num_tokens; i++) {
 		for (j = 0; j < tokens[i].token_count; j++) {
 			netmem_ref netmem = (__force netmem_ref)__xa_erase(
 				&sk->sk_user_frags, tokens[i].token_start + j);

-			if (netmem &&
-			    !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
-				netmems[netmem_num++] = netmem;
-				if (netmem_num == ARRAY_SIZE(netmems)) {
-					xa_unlock_bh(&sk->sk_user_frags);
-					for (k = 0; k < netmem_num; k++)
-						WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
-					netmem_num = 0;
-					xa_lock_bh(&sk->sk_user_frags);
-				}
-				ret++;
+			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
+				continue;
+
+			netmems[netmem_num++] = netmem;
+			if (netmem_num == ARRAY_SIZE(netmems)) {
+				xa_unlock_bh(&sk->sk_user_frags);
+				for (k = 0; k < netmem_num; k++)
+					WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
+				netmem_num = 0;
+				xa_lock_bh(&sk->sk_user_frags);
 			}
+			ret++;
 		}
 	}

--
2.47.0.163.g1226f6d8fa-goog
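
The pre-check added by this patch can be modelled outside the kernel.
What follows is a minimal userspace sketch, not the kernel code: the
struct is a local stand-in for the UAPI struct dmabuf_token, and
total_frags_or_reject() is a hypothetical helper name. Only the limit
(1024), the 64-bit running total and the -E2BIG result are taken from
the diff above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DONTNEED_FRAGS 1024 /* value taken from the patch */

/* Local stand-in for the UAPI struct dmabuf_token. */
struct dmabuf_token {
	uint32_t token_start;
	uint32_t token_count;
};

/*
 * Hypothetical helper: sum token_count over all tokens before doing any
 * work. Returns the total number of frags, or -E2BIG as soon as the
 * running total exceeds MAX_DONTNEED_FRAGS.
 */
static long total_frags_or_reject(const struct dmabuf_token *tokens,
				  size_t num_tokens)
{
	uint64_t num_frags = 0; /* 64-bit running total, as in the patch */
	size_t i;

	for (i = 0; i < num_tokens; i++) {
		num_frags += tokens[i].token_count;
		if (num_frags > MAX_DONTNEED_FRAGS)
			return -E2BIG;
	}
	return (long)num_frags;
}
```

The point of the sketch is that the bound applies to the sum of
token_count, not to the number of tokens, so a single token with a very
large token_count is rejected before the release loop starts.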

* Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-10-29 20:55 ` [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long Mina Almasry
@ 2024-10-30 14:33   ` Stanislav Fomichev
  2024-10-30 14:46     ` Mina Almasry
  0 siblings, 1 reply; 14+ messages in thread
From: Stanislav Fomichev @ 2024-10-30 14:33 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Ilias Apalodimas, Shuah Khan, Yi Lai

On 10/29, Mina Almasry wrote:
[..]

> +			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> +				continue;

Any reason we are not returning explicit error to the callers here?
That probably needs some mechanism to signal which particular one failed
so the users can restart?

* Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-10-30 14:33   ` Stanislav Fomichev
@ 2024-10-30 14:46     ` Mina Almasry
  2024-10-30 15:06       ` Stanislav Fomichev
  0 siblings, 1 reply; 14+ messages in thread
From: Mina Almasry @ 2024-10-30 14:46 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Ilias Apalodimas, Shuah Khan, Yi Lai

On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 10/29, Mina Almasry wrote:
> [..]
>
> > +                     if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > +                             continue;
>
> Any reason we are not returning explicit error to the callers here?
> That probably needs some mechanism to signal which particular one failed
> so the users can restart?

Only because I can't think of a simple way to return to the user an
array of the frags that failed to DONTNEED.

Also, this error should be extremely rare, or never hit at all. I don't
know how we would end up not finding a netmem here, or finding that the
netmem is a page. The only ways are if the user is malicious (messing
with the token ids passed to the kernel) or if there is a kernel bug.

Also, the information is useless to the user. If the user sees 'frag
128 failed to free', there is nothing the user can really do to
recover at runtime. The only use would be for the user to log the
error. We already WARN_ON_ONCE on the error that the user would not
be able to trigger.

-- 
Thanks,
Mina

* Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-10-30 14:46     ` Mina Almasry
@ 2024-10-30 15:06       ` Stanislav Fomichev
  2024-11-05 21:28         ` Mina Almasry
  0 siblings, 1 reply; 14+ messages in thread
From: Stanislav Fomichev @ 2024-10-30 15:06 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Ilias Apalodimas, Shuah Khan, Yi Lai

On 10/30, Mina Almasry wrote:
> On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 10/29, Mina Almasry wrote:
> > [..]
> >
> > > +                     if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > > +                             continue;
> >
> > Any reason we are not returning explicit error to the callers here?
> > That probably needs some mechanism to signal which particular one failed
> > so the users can restart?
> 
> Only because I can't think of a simple way to return to the user an
> array of the frags that failed to DONTNEED.

I'd expect the call to return as soon as it hits the invalid frag
entry (plus the number of entries that it successfully refilled up to
the invalid one). But too late I guess.

> Also, this error should be extremely rare, or never hit at all. I don't
> know how we would end up not finding a netmem here, or finding that the
> netmem is a page. The only ways are if the user is malicious (messing
> with the token ids passed to the kernel) or if there is a kernel bug.

I do hit this error with a 1500 MTU, so it would've been nice to
understand which particular token triggered that. It might be
something buggy on the driver side; I need to investigate. (It's
super low prio because it's 1500 MTU.)

> Also, the information is useless to the user. If the user sees 'frag
> 128 failed to free', there is nothing the user can really do to
> recover at runtime. The only use would be for the user to log the
> error. We already WARN_ON_ONCE on the error that the user would not
> be able to trigger.

I'm thinking from the POV of the user application. It might have bugs as
well and try to refill something that should not have been refilled.
Having info about which particular token has failed (even just for
logging purposes) might have been nice.
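
The two behaviours discussed in this message can be compared with a toy
model. This is a sketch under stated assumptions, not kernel code:
struct frag_table is a made-up stand-in for sk_user_frags, TABLE_SIZE is
an arbitrary toy capacity, and both release_*() functions are
hypothetical names. release_skip_invalid() mirrors the "continue" in the
patch; release_stop_at_invalid() mirrors the early return suggested
here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64 /* toy capacity, not a kernel constant */

/* Local stand-in for the UAPI struct dmabuf_token. */
struct dmabuf_token {
	uint32_t token_start;
	uint32_t token_count;
};

/* Toy stand-in for sk_user_frags: live[id] is true while frag id is held. */
struct frag_table {
	bool live[TABLE_SIZE];
};

/* Erase one id; true if it was present (like a non-NULL __xa_erase()). */
static bool table_erase(struct frag_table *t, uint32_t id)
{
	if (id >= TABLE_SIZE || !t->live[id])
		return false;
	t->live[id] = false;
	return true;
}

/* Behaviour in the patch: skip ids that are not found, count the rest. */
static int release_skip_invalid(struct frag_table *t,
				const struct dmabuf_token *tokens, size_t n)
{
	int released = 0;

	for (size_t i = 0; i < n; i++)
		for (uint32_t j = 0; j < tokens[i].token_count; j++)
			if (table_erase(t, tokens[i].token_start + j))
				released++;
	return released;
}

/* Alternative from the review: stop at the first id that is not found. */
static int release_stop_at_invalid(struct frag_table *t,
				   const struct dmabuf_token *tokens, size_t n)
{
	int released = 0;

	for (size_t i = 0; i < n; i++)
		for (uint32_t j = 0; j < tokens[i].token_count; j++) {
			if (!table_erase(t, tokens[i].token_start + j))
				return released;
			released++;
		}
	return released;
}
```

With the early return, the count alone tells the caller where the first
bad frag sits in its input; with the skip behaviour, the count only says
how many frags were released in total.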

* Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-10-30 15:06       ` Stanislav Fomichev
@ 2024-11-05 21:28         ` Mina Almasry
  2024-11-05 21:46           ` Stanislav Fomichev
  0 siblings, 1 reply; 14+ messages in thread
From: Mina Almasry @ 2024-11-05 21:28 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Ilias Apalodimas, Shuah Khan, Yi Lai

On Wed, Oct 30, 2024 at 8:07 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 10/30, Mina Almasry wrote:
> > On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > >
> > > On 10/29, Mina Almasry wrote:
> > > [..]
> > >
> > > > +                     if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > > > +                             continue;
> > >
> > > Any reason we are not returning explicit error to the callers here?
> > > That probably needs some mechanism to signal which particular one failed
> > > so the users can restart?
> >
> > Only because I can't think of a simple way to return to the user an
> > array of the frags that failed to DONTNEED.
>
> I'd expect the call to return as soon as it hits the invalid frag
> entry (plus the number of entries that it successfully refilled up to
> the invalid one). But too late I guess.
>
> > Also, this error should be extremely rare, or never hit at all. I don't
> > know how we would end up not finding a netmem here, or finding that the
> > netmem is a page. The only ways are if the user is malicious (messing
> > with the token ids passed to the kernel) or if there is a kernel bug.
>
> I do hit this error with a 1500 MTU, so it would've been nice to
> understand which particular token triggered that. It might be
> something buggy on the driver side; I need to investigate. (It's
> super low prio because it's 1500 MTU.)
>

Hmm, I've never seen this, neither in production (where the code is
similar to what was upstreamed, but I guess not exactly the same) nor
in my upstream ncdevmem testing.

FWIW leaked frags are extremely bad, because there is no opportunity
to reap them until the entire dmabuf has been rebound. You will need
to root cause this if you're seeing it and are interested in using
devmem tcp in prod.

sk_user_frags is only really touched in:
- sock_devmem_dontneed(), where they are removed from the xarray.
- tcp_recvmsg_dmabuf(), where they are added to the xarray.
- tcp_v4_destroy_sock(), where they are freed (but not removed from
the xarray?!).

The only root causes for this bug I see are:

1. You're racing tcp_v4_destroy_sock() with sock_devmem_dontneed(), so
somehow you're trying to release a frag already released in that loop?
Or,
2. You're releasing a frag that was never added by
tcp_recvmsg_dmabuf(), i.e. there is a bug in tcp_recvmsg_dmabuf() where
it passes the frag_id to userspace via put_cmsg() but never adds it to
sk_user_frags. That should be accompanied by an ncdevmem validation
error.

The way to debug #2 is really to test with the ncdevmem validation. I
got the sense from reviewing the test series that you don't like to
use it, but it's how I root cause such issues. You should familiarize
yourself with it if you want to root cause such issues as well. To use
it:

client: yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | tr \\n \\0 \
          | head -c 1G | nc <server IP> -p 5224
server: ncdevmem -s <server IP> -c <client IP> -f eth1 -l -p 5224 -v 7

If you see a validation error with your missing frag, send me the
logs, I may be able to guess what's wrong.
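
For reference, the client command above produces a repeating 7-byte
stream: the bytes 0x01 to 0x06 followed by the newline that tr turns
into 0x00. A receiver-side check for that stream could look like the
sketch below. This is an illustration only and not ncdevmem's
validation code; validate_pattern() and its period argument are
hypothetical, and mapping "-v 7" to a period of 7 is an assumption.

```c
#include <stddef.h>

/*
 * Check that buf continues the repeating pattern 1, 2, ..., period - 1, 0.
 * *seed holds the next expected byte and is carried across calls, so a
 * stream can be checked chunk by chunk; start it at 1.
 * Returns the index of the first mismatching byte, or -1 if all match.
 */
static long validate_pattern(const unsigned char *buf, size_t len,
			     unsigned char *seed, unsigned char period)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != *seed)
			return (long)i;
		if (++*seed == period)
			*seed = 0; /* wrap: the byte after period - 1 is 0 */
	}
	return -1;
}
```

A mismatch index reported by a check like this points at the offset in
the received stream where data went missing or was corrupted.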

> > Also, the information is useless to the user. If the user sees 'frag
> > 128 failed to free', there is nothing the user can really do to
> > recover at runtime. The only use would be for the user to log the
> > error. We already WARN_ON_ONCE on the error that the user would not
> > be able to trigger.
>
> I'm thinking from the POV of the user application. It might have bugs as
> well and try to refill something that should not have been refilled.
> Having info about which particular token has failed (even just for
> logging purposes) might have been nice.

Yeah, it may have been nice. On the flip side, it complicates calling
sock_devmem_dontneed(): userspace needs to count the freed frags in
its input, remove them, skip the leaked one, and re-call the syscall.
In return, userspace gets to know the id of the frag that leaked, but
the usefulness of that information is slightly questionable to me.
:shrug:


-- 
Thanks,
Mina
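
The userspace bookkeeping described in this message (count the freed
frags, drop them, skip the leaked one, resubmit the rest) can be
sketched as follows. This assumes the hypothetical "return the number
released before the first invalid frag" semantics discussed in the
thread, which is not what the patch implements. The struct is a local
stand-in for the UAPI struct dmabuf_token and remaining_after_failure()
is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the UAPI struct dmabuf_token. */
struct dmabuf_token {
	uint32_t token_start;
	uint32_t token_count;
};

/*
 * Given the tokens that were submitted and the number of frags the call
 * reported as released before it stopped, build the token list to
 * resubmit. The frag right after the released ones is treated as the bad
 * one: its id is stored in *bad_id and it is left out of the output.
 * out must have room for n entries.
 * Returns the number of tokens written to out, or -1 if released covers
 * the whole input (no bad frag to skip).
 */
static int remaining_after_failure(const struct dmabuf_token *in, size_t n,
				   uint32_t released,
				   struct dmabuf_token *out, uint32_t *bad_id)
{
	uint64_t skip = (uint64_t)released + 1; /* released frags + bad one */
	int out_n = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t start = in[i].token_start;
		uint32_t count = in[i].token_count;

		if (skip > 0) {
			if (count < skip) {
				/* Token fully released; nothing to resubmit. */
				skip -= count;
				continue;
			}
			/* The bad frag is the last one skipped in this token. */
			*bad_id = start + (uint32_t)skip - 1;
			start += (uint32_t)skip;
			count -= (uint32_t)skip;
			skip = 0;
		}
		if (count > 0)
			out[out_n++] = (struct dmabuf_token){ start, count };
	}
	return skip ? -1 : out_n;
}
```

Even in this small form the caller has to split a token in the middle
and carry state between calls, which is the extra complexity weighed
above against learning the id of the leaked frag.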

* Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-11-05 21:28         ` Mina Almasry
@ 2024-11-05 21:46           ` Stanislav Fomichev
  2024-11-05 23:43             ` Mina Almasry
  0 siblings, 1 reply; 14+ messages in thread
From: Stanislav Fomichev @ 2024-11-05 21:46 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Ilias Apalodimas, Shuah Khan, Yi Lai

On 11/05, Mina Almasry wrote:
> On Wed, Oct 30, 2024 at 8:07 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 10/30, Mina Almasry wrote:
> > > On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > >
> > > > On 10/29, Mina Almasry wrote:
> > > > > Check we're going to free a reasonable number of frags in token_count
> > > > > before starting the loop, to prevent looping too long.
> > > > >
> > > > > Also minor code cleanups:
> > > > > - Flip checks to reduce indentation.
> > > > > - Use sizeof(*tokens) everywhere for consistency.
> > > > >
> > > > > Cc: Yi Lai <yi1.lai@linux.intel.com>
> > > > >
> > > > > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > > > >
> > > > > ---
> > > > >  net/core/sock.c | 46 ++++++++++++++++++++++++++++------------------
> > > > >  1 file changed, 28 insertions(+), 18 deletions(-)
> > > > >
> > > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > > index 7f398bd07fb7..8603b8d87f2e 100644
> > > > > --- a/net/core/sock.c
> > > > > +++ b/net/core/sock.c
> > > > > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
> > > > >
> > > > >  #ifdef CONFIG_PAGE_POOL
> > > > >
> > > > > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in
> > > > > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in
> > > > >   * 1 syscall. The limit exists to limit the amount of memory the kernel
> > > > > - * allocates to copy these tokens.
> > > > > + * allocates to copy these tokens, and to prevent looping over the frags for
> > > > > + * too long.
> > > > >   */
> > > > > -#define MAX_DONTNEED_TOKENS 128
> > > > > +#define MAX_DONTNEED_FRAGS 1024
> > > > >
> > > > >  static noinline_for_stack int
> > > > >  sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
> > > > > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
> > > > >       unsigned int num_tokens, i, j, k, netmem_num = 0;
> > > > >       struct dmabuf_token *tokens;
> > > > >       netmem_ref netmems[16];
> > > > > +     u64 num_frags = 0;
> > > > >       int ret = 0;
> > > > >
> > > > >       if (!sk_is_tcp(sk))
> > > > >               return -EBADF;
> > > > >
> > > > > -     if (optlen % sizeof(struct dmabuf_token) ||
> > > > > -         optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
> > > > > +     if (optlen % sizeof(*tokens) ||
> > > > > +         optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
> > > > >               return -EINVAL;
> > > > >
> > > > > -     tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
> > > > > +     num_tokens = optlen / sizeof(*tokens);
> > > > > +     tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
> > > > >       if (!tokens)
> > > > >               return -ENOMEM;
> > > > >
> > > > > -     num_tokens = optlen / sizeof(struct dmabuf_token);
> > > > >       if (copy_from_sockptr(tokens, optval, optlen)) {
> > > > >               kvfree(tokens);
> > > > >               return -EFAULT;
> > > > >       }
> > > > >
> > > > > +     for (i = 0; i < num_tokens; i++) {
> > > > > +             num_frags += tokens[i].token_count;
> > > > > +             if (num_frags > MAX_DONTNEED_FRAGS) {
> > > > > +                     kvfree(tokens);
> > > > > +                     return -E2BIG;
> > > > > +             }
> > > > > +     }
> > > > > +
> > > > >       xa_lock_bh(&sk->sk_user_frags);
> > > > >       for (i = 0; i < num_tokens; i++) {
> > > > >               for (j = 0; j < tokens[i].token_count; j++) {
> > > > >                       netmem_ref netmem = (__force netmem_ref)__xa_erase(
> > > > >                               &sk->sk_user_frags, tokens[i].token_start + j);
> > > > >
> > > > > -                     if (netmem &&
> > > > > -                         !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
> > > > > -                             netmems[netmem_num++] = netmem;
> > > > > -                             if (netmem_num == ARRAY_SIZE(netmems)) {
> > > > > -                                     xa_unlock_bh(&sk->sk_user_frags);
> > > > > -                                     for (k = 0; k < netmem_num; k++)
> > > > > -                                             WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
> > > > > -                                     netmem_num = 0;
> > > > > -                                     xa_lock_bh(&sk->sk_user_frags);
> > > > > -                             }
> > > > > -                             ret++;
> > > >
> > > > [..]
> > > >
> > > > > +                     if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > > > > +                             continue;
> > > >
> > > > Any reason we are not returning explicit error to the callers here?
> > > > That probably needs some mechanism to signal which particular one failed
> > > > so the users can restart?
> > >
> > > Only because I can't think of a simple way to return an array of frags
> > > failed to DONTNEED to the user.
> >
> > I'd expect the call to return as soon as it hits the invalid frag
> > entry (plus the number of entries that it successfully refilled up to
> > the invalid one). But too late I guess.
> >
> > > Also, this error should be extremely rare or never hit really. I don't
> > > know how we end up not finding a netmem here or the netmem is page.
> > > The only way is if the user is malicious (messing with the token ids
> > > passed to the kernel) or if a kernel bug is happening.
> >
> > I do hit this error with 1500 mtu, so it would've been nice to
> > understand which particular token triggered that. It might be
> > something buggy on the driver side, I need to investigate. (it's
> > super low prio because 1500)
> >
> 
> Hmm, I've never seen this, in production (code is similar to
> upstreamed, but I guess not exactly the same), and in my ncdevmem
> upstream testing.
> 
> FWIW leaked frags are extremely bad, because there is no opportunity
> to reap them until the entire dmabuf has been rebound. You will need
> to root cause this if you're seeing it and are interested in using
> devmem tcp in prod.
> 
> sk_user_frags is only really touched in:
> - sock_devmem_dontneed(), where they are removed from the xarray.
> - tcp_recvmsg_dmabuf(), where they are added to the xarray.
> - tcp_v4_destroy_sock(), where they are freed (but not removed from
> the xarray?!).
> 
> The only root causes for this bug I see are:
> 
> 1. You're racing tcp_v4_destroy_sock() with sock_devmem_dontneed(), so
> somehow you're trying to release a frag already released in that loop?
> Or,
> 2. You're releasing a frag that was never added by
> tcp_recvmsg_dmabuf(). I.e. There is a bug in tcp_recvmsg_dmabuf() that
> it put_cmsg the frag_id to the userspace but never adds it to the
> sk_user_frags. That should be accompanied by a ncdevmem validation
> error.
> 
> The way to debug #2 is really to test with the ncdevmem validation. I
> got the sense from reviewing the test series that you don't like to
> use it, but it's how I root cause such issues. You should familiarize
> yourself with it if you want to root cause such issues as well. To use
> it:
> 
> client: yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | tr \\n \\0 \
>           | head -c 1G | nc <server ip> -p 5224
>  server: ncdevmem -s <server IP> -c <client IP> -f eth1 -l -p 5224 -v 7
> 
> If you see a validation error with your missing frag, send me the
> logs, I may be able to guess what's wrong.

Ack, let's put this discussion on the stack and I'll resurrect it
once we have something upstream that's mimicking whatever I have
on my side in terms of generic devmem tx + device drivers :-)

> > > Also, the information is useless to the user. If the user sees 'frag
> > > 128 failed to free'. There is nothing really the user can do to
> > > recover at runtime. Only usefulness that could come is for the user to
> > > log the error. We already WARN_ON_ONCE on the error the user would not
> > > be able to trigger.
> >
> > I'm thinking from the pow of user application. It might have bugs as
> > well and try to refill something that should not have been refilled.
> > Having info about which particular token has failed (even just for
> > the logging purposes) might have been nice.
> 
> Yeah, it may have been nice. On the flip side it complicates calling
> sock_devmem_dontneed(). The userspace need to count the freed frags in
> its input, remove them, skip the leaked one, and re-call the syscall.
> On the flipside the userspace gets to know the id of the frag that
> leaked but the usefulness of the information is slightly questionable
> for me. :shrug:

Right, because I was gonna suggest for this patch, instead of having
a separate extra loop that returns -E2BIG (since this loop is basically
mostly cycles wasted assuming most of the calls are gonna be well behaved),
can we keep num_frags freed as we go and stop and return once
we reach MAX_DONTNEED_FRAGS?

for (i = 0; i < num_tokens; i++) {
	for (j ...) {
		netmem_ref netmem ...
		...
	}
	num_frags += tokens[i].token_count;
	if (num_frags > MAX_DONTNEED_FRAGS)
		return ret;
}

Or do you still find it confusing because userspace has to handle that?
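
To make the trade-off concrete, here is a minimal userspace simulation of the
two strategies. It is not kernel code: `struct tok`, the `present[]` table,
`release_frag()` and both `dontneed_*()` functions are hypothetical stand-ins
for `struct dmabuf_token`, `sk->sk_user_frags` and `sock_devmem_dontneed()`.
Only the limit of 1024 is taken from the patch (MAX_DONTNEED_FRAGS).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGS  1024   /* mirrors MAX_DONTNEED_FRAGS in the patch */
#define TABLE_SIZE 4096   /* hypothetical size of the simulated frag table */

struct tok {              /* stand-in for struct dmabuf_token */
	uint32_t start;
	uint32_t count;
};

/* Release one simulated frag; returns 1 if it was present, 0 otherwise. */
static int release_frag(bool *present, uint64_t id)
{
	if (id >= TABLE_SIZE || !present[id])
		return 0;
	present[id] = false;
	return 1;
}

/* Strategy in the patch: validate the total first, then free everything. */
static int dontneed_precheck(bool *present, const struct tok *t, size_t n)
{
	uint64_t total = 0;
	int freed = 0;
	size_t i;
	uint32_t j;

	for (i = 0; i < n; i++) {
		total += t[i].count;
		if (total > MAX_FRAGS)
			return -E2BIG;  /* nothing has been freed yet */
	}
	for (i = 0; i < n; i++)
		for (j = 0; j < t[i].count; j++)
			freed += release_frag(present, (uint64_t)t[i].start + j);
	return freed;
}

/* Strategy suggested above: free as we go, stop once the budget is exceeded. */
static int dontneed_incremental(bool *present, const struct tok *t, size_t n)
{
	uint64_t total = 0;
	int freed = 0;
	size_t i;
	uint32_t j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < t[i].count; j++)
			freed += release_frag(present, (uint64_t)t[i].start + j);
		total += t[i].count;
		if (total > MAX_FRAGS)
			break;          /* partial progress goes back to the caller */
	}
	return freed;
}
```

With three tokens of 600 frags each, the pre-check variant fails with -E2BIG
and leaves the table untouched, while the incremental variant frees the first
1200 frags and leaves the last 600 for a later call. Note that in this form
the incremental variant only checks the budget after each token's inner loop,
so a single token with a huge count is still walked in full.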


* Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-11-05 21:46           ` Stanislav Fomichev
@ 2024-11-05 23:43             ` Mina Almasry
  2024-11-06  0:13               ` Stanislav Fomichev
  0 siblings, 1 reply; 14+ messages in thread
From: Mina Almasry @ 2024-11-05 23:43 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Ilias Apalodimas, Shuah Khan, Yi Lai

On Tue, Nov 5, 2024 at 1:46 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > > Also, the information is useless to the user. If the user sees 'frag
> > > > 128 failed to free'. There is nothing really the user can do to
> > > > recover at runtime. Only usefulness that could come is for the user to
> > > > log the error. We already WARN_ON_ONCE on the error the user would not
> > > > be able to trigger.
> > >
> > > I'm thinking from the pow of user application. It might have bugs as
> > > well and try to refill something that should not have been refilled.
> > > Having info about which particular token has failed (even just for
> > > the logging purposes) might have been nice.
> >
> > Yeah, it may have been nice. On the flip side it complicates calling
> > sock_devmem_dontneed(). The userspace need to count the freed frags in
> > its input, remove them, skip the leaked one, and re-call the syscall.
> > On the flipside the userspace gets to know the id of the frag that
> > leaked but the usefulness of the information is slightly questionable
> > for me. :shrug:
>
> Right, because I was gonna suggest for this patch, instead of having
> a separate extra loop that returns -E2BIG (since this loop is basically
> mostly cycles wasted assuming most of the calls are gonna be well behaved),
> can we keep num_frags freed as we go and stop and return once
> we reach MAX_DONTNEED_FRAGS?
>
> for (i = 0; i < num_tokens; i++) {
>         for (j ...) {
>                 netmem_ref netmem ...
>                 ...
>         }
>         num_frags += tokens[i].token_count;
>         if (num_frags > MAX_DONTNEED_FRAGS)
>                 return ret;
> }
>
> Or do you still find it confusing because userspace has to handle that?

Ah, I don't think this will work, because it creates this scenario:

- user calls SO_DEVMEM_DONTNEED passing 1030 tokens.
- Kernel returns 500 freed.
- User doesn't know whether:
(a)  The remaining 530 tokens are all attached to the last
tokens.count and that's why the kernel returned early, or
(b) the kernel leaked 530 tokens because it could not find any of them
in the sk_user_frags.

In (a) the user is supposed to re-call SO_DEVMEM_DONTNEED on the
remaining 530 tokens, but in (b) the user is not supposed to do that
(the tokens have leaked and there is nothing the user can do to
recover).

The current interface is more simple. The kernel either returns an
error (nothing has been freed): re-call SO_DEVMEM_DONTNEED on all the
tokens after resolving the error, or,

the kernel returns a positive value which means all the tokens have
been freed (or unrecoverably leaked), and the userspace must not call
SO_DEVMEM_DONTNEED on this batch again.
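
A minimal sketch of how a caller might encode that contract. The enum, the
function name and the choice of which errno values count as retryable are
assumptions made for illustration, not part of the uapi. `ret` and `err` are
meant to be the setsockopt() return value and the errno it left behind.

```c
#include <errno.h>

enum dontneed_action {
	BATCH_DONE,        /* ret >= 0: never pass this batch again */
	RETRY_WHOLE_BATCH, /* transient error: nothing was freed, retry as-is */
	FIX_THEN_RETRY,    /* bad input (size, too many frags): fix it first */
};

/*
 * Map the result of one SO_DEVMEM_DONTNEED call to what the caller does
 * next. Treating ENOMEM as the only transient error is an assumption.
 */
static enum dontneed_action classify_dontneed(int ret, int err)
{
	if (ret >= 0)
		return BATCH_DONE;         /* freed, or unrecoverably leaked */
	if (err == ENOMEM)
		return RETRY_WHOLE_BATCH;
	return FIX_THEN_RETRY;             /* e.g. EINVAL, E2BIG */
}
```

The point of the all-or-nothing rule is that the caller never has to work out
which part of a batch was consumed: a non-negative return retires the whole
batch, and an error leaves the whole batch still owned by the caller.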


--
Thanks,
Mina


* Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
  2024-11-05 23:43             ` Mina Almasry
@ 2024-11-06  0:13               ` Stanislav Fomichev
  0 siblings, 0 replies; 14+ messages in thread
From: Stanislav Fomichev @ 2024-11-06  0:13 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Ilias Apalodimas, Shuah Khan, Yi Lai

On 11/05, Mina Almasry wrote:
> On Tue, Nov 5, 2024 at 1:46 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > > > Also, the information is useless to the user. If the user sees 'frag
> > > > > 128 failed to free'. There is nothing really the user can do to
> > > > > recover at runtime. Only usefulness that could come is for the user to
> > > > > log the error. We already WARN_ON_ONCE on the error the user would not
> > > > > be able to trigger.
> > > >
> > > > I'm thinking from the pow of user application. It might have bugs as
> > > > well and try to refill something that should not have been refilled.
> > > > Having info about which particular token has failed (even just for
> > > > the logging purposes) might have been nice.
> > >
> > > Yeah, it may have been nice. On the flip side it complicates calling
> > > sock_devmem_dontneed(). The userspace need to count the freed frags in
> > > its input, remove them, skip the leaked one, and re-call the syscall.
> > > On the flipside the userspace gets to know the id of the frag that
> > > leaked but the usefulness of the information is slightly questionable
> > > for me. :shrug:
> >
> > Right, because I was gonna suggest for this patch, instead of having
> > a separate extra loop that returns -E2BIG (since this loop is basically
> > mostly cycles wasted assuming most of the calls are gonna be well behaved),
> > can we keep num_frags freed as we go and stop and return once
> > we reach MAX_DONTNEED_FRAGS?
> >
> > for (i = 0; i < num_tokens; i++) {
> >         for (j ...) {
> >                 netmem_ref netmem ...
> >                 ...
> >         }
> >         num_frags += tokens[i].token_count;
> >         if (num_frags > MAX_DONTNEED_FRAGS)
> >                 return ret;
> > }
> >
> > Or do you still find it confusing because userspace has to handle that?
> 
> Ah, I don't think this will work, because it creates this scenario:
> 
> - user calls SO_DEVMEM_DONTNEED passing 1030 tokens.
> - Kernel returns 500 freed.
> - User doesn't know whether:
> (a)  The remaining 530 tokens are all attached to the last
> tokens.count and that's why the kernel returned early, or
> (b) the kernel leaked 530 tokens because it could not find any of them
> in the sk_user_frags.
> 
> In (a) the user is supposed to recall SO_DEVMEM_DONTNEED on the
> remaining 530 tokens, but in (b) the user is not supposed to do that
> (the tokens have leaked and there is nothing the user can do to
> recover).

I kinda feel like people will still write code against internal limits anyway?
At least that's what we did with the internal version of your code: you
know that you can't return more than 128 tokens per call
so you don't even try. If you get an error, or ret != the expected
length, you kill the connection. It seems like there is no graceful
recovery from that?
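
Coding against the limit could look roughly like the helper below. It is only
a sketch: `struct tok` is a stand-in for `struct dmabuf_token`, and FRAG_LIMIT
and `tokens_per_call()` are hypothetical names, with 1024 assumed from
MAX_DONTNEED_FRAGS in this patch.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAG_LIMIT 1024   /* assumed per-call limit (MAX_DONTNEED_FRAGS) */

struct tok {              /* stand-in for struct dmabuf_token */
	uint32_t start;
	uint32_t count;
};

/*
 * Return how many leading tokens of t[0..n) fit into one call, i.e. at most
 * FRAG_LIMIT tokens whose summed counts do not exceed FRAG_LIMIT. Returns 0
 * when the first token alone is too large and the caller has to split it.
 */
static size_t tokens_per_call(const struct tok *t, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n && i < FRAG_LIMIT; i++) {
		total += t[i].count;
		if (total > FRAG_LIMIT)
			break;
	}
	return i;
}
```

The caller would pass the first `tokens_per_call()` tokens to setsockopt(),
advance by that many, and repeat until the array is empty, so the kernel-side
check never has a reason to reject a batch.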

Regarding your (a) vs (b) example, you can try to call DONTNEED another
time for both cases and either get non-zero and make some progress
or get 0 and give up?

> The current interface is more simple. The kernel either returns an
> error (nothing has been freed): recall SO_DEVMEM_DONTNEED on all the
> tokens after resolving the error, or,
> 
> the kernel returns a positive value which means all the tokens have
> been freed (or unrecoverably leaked), and the userspace must not call
> SO_DEVMEM_DONTNEED on this batch again.

Totally agree that it's more simple. But my worry is that we now
essentially waste a bunch of cpu looping over and testing for the
condition that's not gonna happen in well-behaved applications.
But maybe I'm overblowing it, idk.

(I'm gonna wait for you to respin before formally sending acks because
 it's not clear which series goes where...)


* [PATCH net-next v1 7/7] ncdevmem: add test for too many token_count
  2024-10-29 20:55 [PATCH net-next v1 0/7] devmem TCP fixes Mina Almasry
  2024-10-29 20:55 ` [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long Mina Almasry
@ 2024-10-29 20:55 ` Mina Almasry
  2024-11-01  2:41 ` [PATCH net-next v1 0/7] devmem TCP fixes Jakub Kicinski
  2 siblings, 0 replies; 14+ messages in thread
From: Mina Almasry @ 2024-10-29 20:55 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-kselftest
  Cc: Mina Almasry, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jesper Dangaard Brouer,
	Ilias Apalodimas, Shuah Khan

Add test for fixed issue: user passing a token with a very large
token_count. Expect an error in this case.

Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 tools/testing/selftests/net/ncdevmem.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/net/ncdevmem.c b/tools/testing/selftests/net/ncdevmem.c
index 64d6805381c5..3fd2aee461f3 100644
--- a/tools/testing/selftests/net/ncdevmem.c
+++ b/tools/testing/selftests/net/ncdevmem.c
@@ -391,6 +391,17 @@ int do_server(void)
 				continue;
 			}
 
+			token.token_start = dmabuf_cmsg->frag_token;
+			token.token_count = 8192;
+
+			ret = setsockopt(client_fd, SOL_SOCKET,
+					 SO_DEVMEM_DONTNEED, &token,
+					 sizeof(token));
+			if (ret >= 0)
+				error(1, 0,
+				      "DONTNEED of too many frags should have failed. ret=%ld\n",
+				      ret);
+
 			token.token_start = dmabuf_cmsg->frag_token;
 			token.token_count = 1;
 
-- 
2.47.0.163.g1226f6d8fa-goog



* Re: [PATCH net-next v1 0/7] devmem TCP fixes
  2024-10-29 20:55 [PATCH net-next v1 0/7] devmem TCP fixes Mina Almasry
  2024-10-29 20:55 ` [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long Mina Almasry
  2024-10-29 20:55 ` [PATCH net-next v1 7/7] ncdevmem: add test for too many token_count Mina Almasry
@ 2024-11-01  2:41 ` Jakub Kicinski
  2024-11-01 13:14   ` Mina Almasry
  2 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-11-01  2:41 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Jesper Dangaard Brouer,
	Ilias Apalodimas, Shuah Khan

On Tue, 29 Oct 2024 20:55:20 +0000 Mina Almasry wrote:
> A few unrelated devmem TCP fixes bundled in a series for some
> convenience (if that's ok).

These two should go to net I presume? It's missing input validation.

Either way you gotta repost either as two properly separate series,
or combine them as one, cause right now they are neither and patchwork
doesn't recognize they are related.


* Re: [PATCH net-next v1 0/7] devmem TCP fixes
  2024-11-01  2:41 ` [PATCH net-next v1 0/7] devmem TCP fixes Jakub Kicinski
@ 2024-11-01 13:14   ` Mina Almasry
  2024-11-02  2:27     ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Mina Almasry @ 2024-11-01 13:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Jesper Dangaard Brouer,
	Ilias Apalodimas, Shuah Khan

On Thu, Oct 31, 2024 at 7:42 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 29 Oct 2024 20:55:20 +0000 Mina Almasry wrote:
> > A few unrelated devmem TCP fixes bundled in a series for some
> > convenience (if that's ok).
>
> These two should go to net I presume? It's missing input validation.
>
> Either way you gotta repost either as two properly separate series,
> or combine them as one, cause right now they are neither and patchwork
> doesn't recognize they are related.
>

Yeah my apologies. I made a mistake posting the series and posted the
cover letter twice. Looks like that confused patchwork very much.

I'll also repost targeting net since these are fixes to existing code.

But what is the 'missing input validation'? Do you mean the input
validation for the SO_DEVMEM_DONTNEED API? That should be handled in
the patch  "net: fix SO_DEVMEM_DONTNEED looping too long" in this
series, unless I missed something.

--
Thanks,
Mina


* Re: [PATCH net-next v1 0/7] devmem TCP fixes
  2024-11-01 13:14   ` Mina Almasry
@ 2024-11-02  2:27     ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-11-02  2:27 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, linux-kernel, linux-kselftest, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Jesper Dangaard Brouer,
	Ilias Apalodimas, Shuah Khan

On Fri, 1 Nov 2024 06:14:14 -0700 Mina Almasry wrote:
> But what is the 'missing input validation'? Do you mean the input
> validation for the SO_DEVMEM_DONTNEED API? That should be handled in
> the patch  "net: fix SO_DEVMEM_DONTNEED looping too long" in this
> series, unless I missed something.

I guess it's borderline but to me it feels like net material.
It changes the user visible behavior. Someone can write their
code to free 2k tokens on 6.12 and it will break on 6.13.
I don't feel strongly but the way the series ended up getting
split I figured maybe it was also your intuition. If you do
follow the net path -- please move the refactor out to the net-next
series.
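
One reading of the behavior change, sketched in userspace C. The two check
functions are hypothetical and only mirror the length and count checks from
the sock.c hunk in patch 6/7. `struct tok` stands in for `struct dmabuf_token`,
assumed here to be two 32-bit fields. "2k tokens" is taken to mean a single
token with token_count = 2048.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct tok {              /* stand-in for struct dmabuf_token */
	uint32_t start;
	uint32_t count;
};

/* Before the patch: only the option length is bounded (128 tokens). */
static int check_before(size_t optlen)
{
	if (optlen % sizeof(struct tok) || optlen > sizeof(struct tok) * 128)
		return -EINVAL;
	return 0;
}

/* After the patch: up to 1024 tokens, and at most 1024 frags in total. */
static int check_after(const struct tok *t, size_t optlen)
{
	uint64_t frags = 0;
	size_t i, n;

	if (optlen % sizeof(struct tok) || optlen > sizeof(struct tok) * 1024)
		return -EINVAL;
	n = optlen / sizeof(struct tok);
	for (i = 0; i < n; i++) {
		frags += t[i].count;
		if (frags > 1024)
			return -E2BIG;
	}
	return 0;
}
```

Under these assumptions a request for 2048 frags in one token passes the old
check and is rejected with -E2BIG by the new one, which is the user-visible
difference described above.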


* [PATCH net-next v1 0/7] devmem TCP fixes
@ 2024-10-29 20:45 Mina Almasry
  0 siblings, 0 replies; 14+ messages in thread
From: Mina Almasry @ 2024-10-29 20:45 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-kselftest
  Cc: Mina Almasry, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jesper Dangaard Brouer,
	Ilias Apalodimas, Shuah Khan

A few unrelated devmem TCP fixes bundled in a series for some
convenience (if that's ok).

Patch 1-2: fix naming and provide page_pool_alloc_netmem for fragged
netmem.

Patch 3-4: fix issues with dma-buf dma addresses being potentially
passed to dma_sync_for_* helpers.

Patch 5-6: fix syzbot SO_DEVMEM_DONTNEED issue and add test for this
case.


Mina Almasry (6):
  net: page_pool: rename page_pool_alloc_netmem to *_netmems
  net: page_pool: create page_pool_alloc_netmem
  page_pool: disable sync for cpu for dmabuf memory provider
  netmem: add netmem_prefetch
  net: fix SO_DEVMEM_DONTNEED looping too long
  ncdevmem: add test for too many token_count

Samiullah Khawaja (1):
  page_pool: Set `dma_sync` to false for devmem memory provider

 include/net/netmem.h                   |  7 ++++
 include/net/page_pool/helpers.h        | 50 ++++++++++++++++++--------
 include/net/page_pool/types.h          |  2 +-
 net/core/devmem.c                      |  9 +++--
 net/core/page_pool.c                   | 11 +++---
 net/core/sock.c                        | 46 ++++++++++++++----------
 tools/testing/selftests/net/ncdevmem.c | 11 ++++++
 7 files changed, 93 insertions(+), 43 deletions(-)

-- 
2.47.0.163.g1226f6d8fa-goog



Thread overview: 14+ messages
2024-10-29 20:55 [PATCH net-next v1 0/7] devmem TCP fixes Mina Almasry
2024-10-29 20:55 ` [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long Mina Almasry
2024-10-30 14:33   ` Stanislav Fomichev
2024-10-30 14:46     ` Mina Almasry
2024-10-30 15:06       ` Stanislav Fomichev
2024-11-05 21:28         ` Mina Almasry
2024-11-05 21:46           ` Stanislav Fomichev
2024-11-05 23:43             ` Mina Almasry
2024-11-06  0:13               ` Stanislav Fomichev
2024-10-29 20:55 ` [PATCH net-next v1 7/7] ncdevmem: add test for too many token_count Mina Almasry
2024-11-01  2:41 ` [PATCH net-next v1 0/7] devmem TCP fixes Jakub Kicinski
2024-11-01 13:14   ` Mina Almasry
2024-11-02  2:27     ` Jakub Kicinski
  -- strict thread matches above, loose matches on Subject: below --
2024-10-29 20:45 Mina Almasry
