From: Stanislav Fomichev <stfomichev@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Willem de Bruijn <willemb@google.com>,
"David S. Miller" <davem@davemloft.net>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Jonathan Corbet <corbet@lwn.net>,
Yi Lai <yi1.lai@linux.intel.com>,
Stanislav Fomichev <sdf@fomichev.me>
Subject: Re: [PATCH net v2 2/2] net: clarify SO_DEVMEM_DONTNEED behavior in documentation
Date: Fri, 8 Nov 2024 10:07:39 -0800
Message-ID: <Zy5Ta-M868VvBme2@mini-arch>
In-Reply-To: <CAHS8izP8UoGZXoFCEshYrL=o2+T6o4g-PDdgDG=Cfc0X=EXyVQ@mail.gmail.com>
On 11/08, Mina Almasry wrote:
> On Thu, Nov 7, 2024 at 7:01 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 11/07, Mina Almasry wrote:
> > > On Thu, Nov 7, 2024 at 5:30 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > >
> > > > On 11/07, Mina Almasry wrote:
> > > > > Document new behavior when the number of frags passed is too big.
> > > > >
> > > > > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > > > > ---
> > > > > Documentation/networking/devmem.rst | 9 +++++++++
> > > > > 1 file changed, 9 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> > > > > index a55bf21f671c..d95363645331 100644
> > > > > --- a/Documentation/networking/devmem.rst
> > > > > +++ b/Documentation/networking/devmem.rst
> > > > > @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
> > > > > Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
> > > > > and will lead to packet drops.
> > > > >
> > > > > +The user must pass no more than 128 tokens, with no more than 1024 total frags
> > > > > +among the token->token_count across all the tokens. If the user provides more
> > > > > +than 1024 frags, the kernel will free up to 1024 frags and return early.
> > > > > +
> > > > > +The kernel returns the number of actual frags freed. The number of frags freed
> > > > > +can be less than the tokens provided by the user in case of:
> > > > > +
> > > >
> > > > [..]
> > > >
> > > > > +(a) an internal kernel leak bug.
> > > >
> > > > If you're gonna respin, might be worth mentioning that the dmesg
> > > > will contain a warning in case of a leak?
> > >
> > > We will not actually warn in the likely cases of leak.
> > >
> > > We warn when we find an entry in the xarray that is not a net_iov, or
> > > if napi_pp_put_page fails on that net_iov. Both are very unlikely to
> > > happen honestly.
> > >
> > > The likely 'leaks' are when we don't find the frag_id in the xarray.
> > > We do not warn on that because the user can intentionally trigger the
> > > warning with invalid input. If the user is actually giving valid input
> > > and the warn still happens, likely a kernel bug like I mentioned in
> > > another thread, but we still don't warn.
> >
> > In this case, maybe don't mention the leaks at all? If it's not
> > actionable, not sure how it helps?
>
> It's good to explain what the return code of the setsockopt means, and
> when it would be less than the number of passed in tokens.
>
> Also it's not really 'not actionable'. I expect serious users of
> devmem tcp to log such leaks in metrics and try to root cause the
> userspace or kernel bug causing them if they happen.

Right now it reads like both (a) and (b) have a similar probability. Maybe
even (a) is more probable because you mention it first? In theory, any syscall
can have a bug in it where it returns something bogus, so maybe at least
downplay the 'leak' part a bit? "In extremely rare cases, the kernel
might free fewer frags than requested .... "

Imagine a situation where the user inadvertently tries to free the same token
twice or something and gets an unexpected return value. Why? Might be
a kernel leak, right?

From the POV of the kernel, the most probable cases where we return
fewer frags than requested are:

1. user gave us more than 1024 frags
2. user gave us incorrect tokens
...
99. kernel is full of bugs and we lost the frag
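
To make that ordering concrete, here is a rough userspace-side sketch of how
a caller might triage a short return from SO_DEVMEM_DONTNEED before suspecting
the kernel. It only does the arithmetic and does not issue the setsockopt
itself. All sketch_* names are hypothetical, the struct just mirrors the
token_start/token_count layout of struct dmabuf_token so the example builds
without recent uapi headers, and the 128/1024 limits are taken from the
proposed doc text quoted above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Limits as stated in the proposed devmem.rst text (assumed, not probed). */
#define SKETCH_MAX_TOKENS 128
#define SKETCH_MAX_FRAGS  1024

/* Hypothetical stand-in with the same two fields as struct dmabuf_token. */
struct sketch_dmabuf_token {
	uint32_t token_start;
	uint32_t token_count;
};

enum sketch_result {
	SKETCH_OK,          /* every requested frag was freed */
	SKETCH_ERROR,       /* setsockopt failed outright (negative return) */
	SKETCH_OVER_LIMIT,  /* case 1: the request exceeded the per-call limits */
	SKETCH_UNEXPLAINED, /* cases 2..99: bad or duplicate tokens, rarely a bug */
};

/* Sum token_count over all tokens; 64-bit so the total cannot wrap. */
static uint64_t sketch_requested_frags(const struct sketch_dmabuf_token *tokens,
				       size_t n)
{
	uint64_t total = 0;

	assert(tokens != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += tokens[i].token_count;
	return total;
}

/* 'freed' is the value the caller got back from setsockopt(). */
static enum sketch_result sketch_classify(const struct sketch_dmabuf_token *tokens,
					  size_t n, long freed)
{
	uint64_t requested = sketch_requested_frags(tokens, n);

	if (freed < 0)
		return SKETCH_ERROR;
	if ((uint64_t)freed == requested)
		return SKETCH_OK;
	/* Check the caller's own mistakes first, as in the list above. */
	if (n > SKETCH_MAX_TOKENS || requested > SKETCH_MAX_FRAGS)
		return SKETCH_OVER_LIMIT;
	return SKETCH_UNEXPLAINED;
}
```

A caller that sees SKETCH_UNEXPLAINED would presumably audit its own token
bookkeeping (double frees, stale tokens) before logging it as a suspected
kernel leak.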