public inbox for linux-kernel@vger.kernel.org
From: Dmitry Safonov <0x7f454c46@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Dmitry Safonov via B4 Relay
	<devnull+0x7f454c46.gmail.com@kernel.org>,
	 "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	 Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	David Ahern <dsahern@kernel.org>,
	 Ivan Delalande <colona@arista.com>,
	Matthieu Baerts <matttbe@kernel.org>,
	 Mat Martineau <martineau@kernel.org>,
	Geliang Tang <geliang@kernel.org>,
	 John Fastabend <john.fastabend@gmail.com>,
	Davide Caratti <dcaratti@redhat.com>,
	 Kuniyuki Iwashima <kuniyu@amazon.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	 mptcp@lists.linux.dev, Johannes Berg <johannes@sipsolutions.net>
Subject: Re: [PATCH net v2 0/5] Make TCP-MD5-diag slightly less broken
Date: Wed, 20 Nov 2024 00:19:50 +0000
Message-ID: <CAJwJo6YcPt5+9uQt4yuYS_7o+O8ubjEgOBrq9RmH+b8OpJxdGA@mail.gmail.com>
In-Reply-To: <20241118161243.21dd9bc0@kernel.org>

On Tue, 19 Nov 2024 at 00:12, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 16 Nov 2024 03:52:47 +0000 Dmitry Safonov wrote:
> > Kind of agree. But then, it seems to be quite rare. Even on a
> > purposely created selftest it doesn't fire every time (maybe I'm not
> > skilful enough). Yet I'm somewhat sceptical about a retry in the kernel:
> > the need for it is caused by another thread manipulating keys, so we
> > may need another retry after the first retry... So then we would
> > have to introduce a limit on retries :D
>
> Wouldn't be the first time ;)
> But I'd just retry once with a "very large" buffer.
>
> > Hmm, what do you think about a kind of middle-ground/compromise
> > solution: keeping this NLM_F_DUMP_INTR flag and logic, but making it
> > hardly ever/never happen by purposely allocating a larger skb? I don't
> > want to set some value in stone, as one day it might no longer be enough
> > for all the different socket infos, but maybe just add 4kB more to the
> > initial allocation? Then, to reproduce it, another thread would have
> > to add 4kB/sizeof(tcp_diag_md5sig) = 4kB/100 ~= 40 MD5 keys on the
> > socket between this thread's skb allocation and filling of the info
> > array. I'd call it "attempting to be nice to a user, but not at their
> > busylooping expense".
>
> The size of the retry buffer should be larger than any valid size.
> We can add a warning if calculated size >= 32kB.

Currently, md5/ao keys are limited by sock_kmalloc(), which enforces the
optmem_max sysctl limit. The default nowadays is 128KB.

From [1] I see that the current in-kernel struct tcp_md5sig_key hits
optmem_max on:
# ok 38 optmem limit was hit on adding 655 key
IOW, with the default limit and sizeof(struct tcp_diag_md5sig) = 100,
the maximum skb size would be ~65kB (655 * 100 = 65,500 bytes). That
sounds a little too big for a kmemcache allocation.
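A quick check of that arithmetic, as a userspace C sketch. It assumes the uapi layout of struct tcp_diag_md5sig from <linux/inet_diag.h> with TCP_MD5SIG_MAXKEYLEN = 80; the mirror struct and md5_diag_payload() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct tcp_diag_md5sig (assumed uapi layout). */
struct tcp_diag_md5sig_mirror {
	uint8_t  tcpm_family;
	uint8_t  tcpm_prefixlen;
	uint16_t tcpm_keylen;
	uint32_t tcpm_addr[4];	/* __be32 in the real header */
	uint8_t  tcpm_key[80];	/* TCP_MD5SIG_MAXKEYLEN */
};

/* 1 + 1 + 2 + 16 + 80 bytes, 4-byte aligned, so no trailing padding. */
static_assert(sizeof(struct tcp_diag_md5sig_mirror) == 100,
	      "unexpected tcp_diag_md5sig layout");

/* Payload bytes needed to report nkeys MD5 keys, ignoring
 * netlink/inet_diag header overhead. */
size_t md5_diag_payload(size_t nkeys)
{
	return nkeys * sizeof(struct tcp_diag_md5sig_mirror);
}
```

For the 655 keys from [1], md5_diag_payload(655) is 65,500 bytes, roughly double the 32kB single-message allocation considered below, before any header overhead is added.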

Initially, my idea was to cap this old version of tcp-md5-diag at
U16_MAX. Now I'm thinking of adopting your idea: always allocate a
32kB skb for the single-message case and mark it somehow if it's not
big enough to fit all the keys on a socket (NLM_F_DUMP_INTR or any
other way for userspace to learn that the single message wasn't
enough).
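The single-message variant could be modelled in userspace roughly as below. This is a sketch under assumptions: DIAG_MSG_CAP is the fixed 32kB budget from the paragraph above, hdr_len stands in for whatever netlink/inet_diag headers precede the keys, and diag_fill_single() with its result struct are hypothetical names, not kernel API. The truncated flag marks where NLM_F_DUMP_INTR (or whichever signal is chosen) would be set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DIAG_MSG_CAP	(32 * 1024)	/* hypothetical fixed single-message budget */
#define DIAG_KEY_SIZE	100		/* sizeof(struct tcp_diag_md5sig) */

struct diag_fill_result {
	size_t keys_written;	/* keys that fit in the message */
	size_t bytes_used;	/* headers plus key payload */
	bool   truncated;	/* some keys did not fit: signal userspace */
};

/* Pack as many whole keys as fit after hdr_len bytes of headers and
 * flag the message if any of the socket's nkeys keys were left out. */
struct diag_fill_result diag_fill_single(size_t nkeys, size_t hdr_len)
{
	struct diag_fill_result res = { 0, hdr_len, false };
	size_t room = hdr_len < DIAG_MSG_CAP ? DIAG_MSG_CAP - hdr_len : 0;
	size_t fit = room / DIAG_KEY_SIZE;

	res.keys_written = nkeys < fit ? nkeys : fit;
	res.bytes_used += res.keys_written * DIAG_KEY_SIZE;
	res.truncated = res.keys_written < nkeys;
	return res;
}
```

With 64 bytes of headers, 327 keys fit; a socket with 1000 keys would get a truncated message that userspace has to detect.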

Then, as I planned, teach the multi-message dump iterator to stop at
the N-th md5/ao key between recvmsg() calls and continue the dump from
that key on the next recvmsg().
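The resumable multi-message dump can be modelled with a cursor that survives between recvmsg() calls, similar in spirit to the per-dump state kept in struct netlink_callback. The names dump_cursor, dump_keys_round() and dump_rounds_needed() are hypothetical, and each round counts only whole 100-byte keys, ignoring headers:

```c
#include <assert.h>
#include <stddef.h>

#define KEY_SIZE 100	/* sizeof(struct tcp_diag_md5sig) */

/* Hypothetical per-dump state preserved across recvmsg() calls. */
struct dump_cursor {
	size_t next_key;	/* index of the first key not yet dumped */
};

/* One round (one recvmsg()): emit keys from cur->next_key until the
 * buffer is full. Returns keys emitted; 0 means nothing left to send. */
size_t dump_keys_round(struct dump_cursor *cur, size_t nkeys, size_t buf_len)
{
	size_t emitted = 0;

	while (cur->next_key < nkeys && (emitted + 1) * KEY_SIZE <= buf_len) {
		cur->next_key++;
		emitted++;
	}
	return emitted;
}

/* Number of recvmsg() rounds that carry keys for a socket with nkeys. */
size_t dump_rounds_needed(size_t nkeys, size_t buf_len)
{
	struct dump_cursor cur = { 0 };
	size_t rounds = 0;

	assert(buf_len >= KEY_SIZE);	/* otherwise no round makes progress */
	while (dump_keys_round(&cur, nkeys, buf_len) > 0)
		rounds++;
	return rounds;
}
```

With a 32kB buffer, 1000 keys take four rounds (327 + 327 + 327 + 19), and no single message has to hold all of them.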

> If we support an inf number of md5 keys we need to cap it.

Yeah, unfortunately, we have some customers with 1000 peers (and
because of that, we internally test BGP with even more peers).
And that assumes one key per peer, which is not necessarily true
for AO.

> Eric is back later this week, perhaps we should wait for his advice.

Sure, I will be glad to have advice from you both, thanks!

> > > Right, the table based parsing doesn't work well with multi-attr,
> > > but other table formats aren't fundamentally better. Or at least
> > > I never came up with a good way of solving this. And the multi-attr
> > > at least doesn't suffer from the u16 problem.
> >
> > Yeah, and an array of structs also makes it impossible to extend such
> > an ABI with new members.
> >
> > And with regard to u16, I was thinking of this diff for net-next, but
> > was not sure if it's worth it:
> >
> > diff --git a/lib/nlattr.c b/lib/nlattr.c
> > index be9c576b6e2d..01c5a49ffa34 100644
> > --- a/lib/nlattr.c
> > +++ b/lib/nlattr.c
> > @@ -903,6 +903,9 @@ struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
> >  {
> >   struct nlattr *nla;
> >
> > + DEBUG_NET_WARN_ONCE(attrlen >= U16_MAX,
> > +     "requested nlattr::nla_len %d >= U16_MAX", attrlen);
> > +
> >   nla = skb_put(skb, nla_total_size(attrlen));
> >   nla->nla_type = attrtype;
> >   nla->nla_len = nla_attr_size(attrlen);
>
> I'm slightly worried that this can be triggered already from user
> space, but we can try DEBUG_NET_* and see. Here and in nla_nest_end().

Yeah, I thought that CONFIG_DEBUG_NET is not enabled on generic
distros, but its help text says:
:          Enable extra sanity checks in networking.
:          This is mostly used by fuzzers, but is safe to select.

so I'm not sure that keeps any production users from enabling it.
Still, it would be interesting to see whether netdev produces any
warnings with those new additions.
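The u16 concern comes from struct nlattr storing its length in a 16-bit nla_len field that also counts the 4-byte attribute header. A userspace model shows where it wraps; nla_attr_size() and nla_total_size() mirror the helpers in include/net/netlink.h, while stored_nla_len() and nla_len_overflows() are hypothetical helpers for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in <linux/netlink.h>, redefined to keep the sketch standalone. */
#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	4	/* NLA_ALIGN(sizeof(struct nlattr)): two u16 fields */

/* Mirrors of the kernel helpers: header plus payload, then padded total. */
static inline int nla_attr_size(int payload) { return NLA_HDRLEN + payload; }
static inline int nla_total_size(int payload) { return NLA_ALIGN(nla_attr_size(payload)); }

/* Value that would actually land in the u16 nla_len field. */
uint16_t stored_nla_len(int attrlen)
{
	return (uint16_t)nla_attr_size(attrlen);
}

/* True if the header-inclusive length cannot be represented in nla_len. */
bool nla_len_overflows(int attrlen)
{
	return nla_attr_size(attrlen) > UINT16_MAX;
}
```

In this model a 65,532-byte payload already wraps nla_len to 0 while still passing a test of attrlen >= U16_MAX, so a check on nla_attr_size(attrlen) > U16_MAX would also catch payloads of 65,532 to 65,534 bytes.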

[1] https://netdev-3.bots.linux.dev/vmksft-tcp-ao/results/867500/14-setsockopt-closed-ipv4/stdout

Thanks,
             Dmitry

Thread overview: 23+ messages
2024-11-13 18:46 [PATCH net v2 0/5] Make TCP-MD5-diag slightly less broken Dmitry Safonov via B4 Relay
2024-11-13 18:46 ` [PATCH net v2 1/5] net/diag: Do not race on dumping MD5 keys with adding new MD5 keys Dmitry Safonov via B4 Relay
2024-11-13 18:46 ` [PATCH net v2 2/5] net/diag: Warn only once on EMSGSIZE Dmitry Safonov via B4 Relay
2024-11-13 18:46 ` [PATCH net v2 3/5] net/diag: Pre-allocate optional info only if requested Dmitry Safonov via B4 Relay
2024-11-13 18:46 ` [PATCH net v2 4/5] net/diag: Always pre-allocate tcp_ulp info Dmitry Safonov via B4 Relay
2024-11-13 18:46 ` [PATCH net v2 5/5] net/netlink: Correct the comment on netlink message max cap Dmitry Safonov via B4 Relay
2024-11-16  0:13   ` Jakub Kicinski
2024-11-16  0:08 ` [PATCH net v2 0/5] Make TCP-MD5-diag slightly less broken Jakub Kicinski
2024-11-16  0:48   ` Dmitry Safonov
2024-11-16  1:58     ` Jakub Kicinski
2024-11-16  3:52       ` Dmitry Safonov
2024-11-19  0:12         ` Jakub Kicinski
2024-11-20  0:19           ` Dmitry Safonov [this message]
2024-11-20  8:44   ` Johannes Berg
2024-11-20 16:13     ` Dmitry Safonov
2024-11-20 19:36       ` Johannes Berg
2024-11-16  0:20 ` patchwork-bot+netdevbpf
2024-12-05  1:13 ` Jakub Kicinski
2024-12-05  9:09   ` Eric Dumazet
2024-12-06  0:31     ` Jakub Kicinski
2024-12-06  2:49     ` Dmitry Safonov
2024-12-06 15:14       ` Eric Dumazet
2024-12-06 20:35         ` Dmitry Safonov
