From: John Fastabend <john.fastabend@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Jakub Kicinski <kuba@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>,
bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
"davidhwei@meta.com" <davidhwei@meta.com>
Subject: Re: Sockmap's parser/verdict programs and epoll notifications
Date: Tue, 03 Oct 2023 22:40:43 -0700
Message-ID: <651cfadbe3308_314bc2083f@john.notmuch>
In-Reply-To: <CAEf4BzaaCvMdKMA=N01Gm1uN2XB_5bcYDZF0oXZR=XyoDePfXg@mail.gmail.com>
Andrii Nakryiko wrote:
> On Tue, Oct 3, 2023 at 5:42 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon, 02 Oct 2023 22:16:13 -0700 John Fastabend wrote:
> > > > This with the other piece we want from our side to allow running
> > > > verdict and sk_msg programs on sockets without having them in a
> > > > sockmap/sockhash it would seem like a better system to me. The
> > > > idea to drop the sockmap/sockhash is because we never remove progs
> > > > once they are added and we add them from sockops side. The filter
> > > > to sockets is almost always the port + metadata related to the
> > > > process or environment. This simplifies having to manage the
> > > > sockmap/sockhash and guess what size it should be. Sometimes we
> > > > overrun these maps and have to kill connections until we can
> > > > get more space.
> >
> > That's a step in the right direction for sure, but I still think that
> > Google's auto-lowat is the best approach. We just need a hook that
> > looks at incoming data and sets rcvlowat appropriately. That's it.
> > TCP looks at rcvlowat in a number of places to make protocol decisions,
> > not just the wake-up. Plus Google will no longer have to carry their
> > OOT patch..
>
> David can correct me, but when he tried the SO_RCVLOWAT approach to
> solving this problem, he saw no improvements (and it might have
> actually been a regression in terms of behavior). I'd say that this
> sounds a bit suspicious and we have plans to get back to SO_RCVLOWAT
> and try to understand the behavior a bit better.
Not sure how large your packets are, but you might need to bump your
sk_rcvbuf size as well. Otherwise, even if you set SO_RCVLOWAT, you can
hit memory pressure, which will wake up the application regardless,
iirc.
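For reference, a minimal user-space sketch of the combination described above: raise SO_RCVBUF first, then set SO_RCVLOWAT, and read both back. The helper names and the sizes in the usage note are made up for illustration. The kernel may clamp either value (SO_RCVBUF against net.core.rmem_max, and Linux reports back roughly double the requested buffer size), so the read-back is the only reliable view of what took effect. Whether a larger buffer actually avoids the early wake-ups is the untested suggestion above, not something this sketch demonstrates.

```c
#include <assert.h>
#include <sys/socket.h>

/* Read back an integer SOL_SOCKET option; returns -1 on error. */
int get_sock_int(int fd, int opt)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, opt, &val, &len) < 0)
        return -1;
    return val;
}

/*
 * Hypothetical helper: size the receive buffer before asking for a
 * low-water mark, so the mark is not requested against a buffer that
 * is too small to hold that many bytes. Returns 0 on success, -1 on
 * error (errno is set by setsockopt).
 */
int set_rcvbuf_and_lowat(int fd, int rcvbuf, int lowat)
{
    assert(rcvbuf > 0 && lowat > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)) < 0)
        return -1;
    return 0;
}
```

Usage would look like set_rcvbuf_and_lowat(fd, 262144, 16384) on a connected TCP socket, followed by get_sock_int(fd, SO_RCVBUF) and get_sock_int(fd, SO_RCVLOWAT) to see what the kernel actually applied.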
>
> I'll just say that the simpler the solution - the better. And if this
> rcvlowat hook gets us the ability to delay network notification to
> user-space until a full logical packet (where packet size is provided
> by BPF program without user space involvement) is assembled (up to
> some reasonable limits, of course), that would be great.
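To make the "full logical packet" idea concrete, here is a rough user-space approximation. It is not the proposed hook (which would run in the kernel or in BPF without user-space involvement); it only shows the arithmetic such a hook would do. The framing is an assumption: a hypothetical 4-byte big-endian length header followed by the payload, with a made-up 1 MiB cap standing in for the "reasonable limits".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

#define HDR_LEN   4u          /* hypothetical: 4-byte big-endian payload length */
#define MAX_FRAME (1u << 20)  /* hypothetical cap on the low-water mark */

/*
 * Given the first `avail` bytes queued on the socket, return how many
 * bytes must be queued before waking the reader is useful.
 */
uint32_t frame_lowat(const uint8_t *buf, size_t avail)
{
    uint32_t payload;

    if (avail < HDR_LEN)
        return HDR_LEN;             /* need the whole header first */

    assert(buf != NULL);
    payload = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
              ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

    if (payload > MAX_FRAME - HDR_LEN)
        return MAX_FRAME;           /* clamp oversized frames */
    return HDR_LEN + payload;
}

/*
 * Peek at the queued bytes without consuming them and set SO_RCVLOWAT
 * to the size of the next frame. Returns 0 on success, -1 on error.
 */
int update_lowat(int fd)
{
    uint8_t hdr[HDR_LEN];
    ssize_t n = recv(fd, hdr, sizeof(hdr), MSG_PEEK | MSG_DONTWAIT);
    int lowat;

    if (n < 0) {
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        n = 0;                      /* nothing queued yet */
    }

    lowat = (int)frame_lowat(hdr, (size_t)n);
    return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));
}
```

An application would call update_lowat(fd) after each wake-up, before going back to epoll. That still costs a wake-up per header, which is the overhead an in-kernel hook would avoid.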
When we created the sockmap/sockhash maps, verdict progs, etc., one
of the goals was to avoid touching the TCP code paths as much as
possible. We also wanted to work on top of KTLS. Maybe you wouldn't
need it, but if you need to read a header that spans multiple skbs,
that is hard without something to reassemble it. Perhaps here you
could get away without needing this though.
I'll still fix the parser program and start working on simplifying
the verdict programs so they can run without maps and so on, because
it helps other use cases. Maybe it will end up working for this
case, or you will find a simpler mechanism.