From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
netdev@vger.kernel.org, Marek Majkowski <marek@cloudflare.com>
Subject: Re: [PATCH net] sk_msg: Keep reference on socket file while psock lives
Date: Mon, 27 May 2019 11:27:31 +0200 [thread overview]
Message-ID: <87k1ecz1q4.fsf@cloudflare.com> (raw)
In-Reply-To: <5ce81306aacbe_39402ae86c50a5bc2f@john-XPS-13-9360.notmuch>
On Fri, May 24, 2019 at 05:51 PM CEST, John Fastabend wrote:
> Jakub Sitnicki wrote:
>>
>> Now that those pesky crashes are gone, we plan to look into drops when
>> doing echo with sockmap. Marek tried running echo-sockmap [1] with
>> latest bpf-next (plus mentioned crash fixes) and reports that not all
>> data bounces back:
>>
>> $ yes| head -c $[1024*1024] | nc -q2 192.168.1.33 1234 |wc -c
>> 971832
>> $ yes| head -c $[1024*1024] | nc -q2 192.168.1.33 1234 |wc -c
>> 867352
>> $ yes| head -c $[1024*1024] | nc -q2 192.168.1.33 1234 |wc -c
>> 952648
>>
>> I'm tring to turn echo-sockmap into a selftest but as you can probably
>> guess over loopback all works fine.
>>
>
> Right, sockmap when used from recvmsg with redirect is lossy. This
> was a design choice I made that apparently caught a few people
> by surprise. The original rationale for this was when doing a
> multiplex operation, e.g. single ingress socket to many egress
> sockets blocking would cause head of line blocking on all
> sockets. To resolve this I simply dropped the packet and then allow
> the flow to continue. This pushes the logic up to the application
> to do retries, etc. when this happens. FWIW userspace proxies I
> tested also had similar points where they fell over and dropped
> packets. In hind sight though it probably would have made more
> sense to make this behavior opt-in vs the default. But, the
> use case I was solving at the time I wrote this could handle
> drops and was actually a NxM sockets with N ingress sockets and M
> egress sockets so head of line blocking was a real problem.
>
> Adding a flag to turn this into a blocking op has been on my
> todo list for awhile. Especially when sockmap is being used as
> a single ingress to single egress socket then blocking vs dropping
> makes much more sense.
>
> The place to look is in sk_psock_verdict_apply() in __SK_REDIRECT
> case there is a few checks and notice we can fallthrough to a
> kfree_skb(skb). This is most likely the drops you are hitting.
> Maybe annotate it with a dbg statement to check.
>
> To fix this we could have a flag to _not_ drop but enqueue the
> packet regardless of the test or hold it until space is
> available. I even think sk_psock_strp_read could push back
> on the stream parser which would eventually push back via TCP
> and get the behavior you want.
>
> Also, I have a couple items on my TODO list that I'll eventually
> get to. First we run without stream parsers in some Cilium
> use cases. I'll push some patches to allow this in the next
> months or so. This avoids the annoying stream parser prog that
> simply returns skb->len. This is mostly an optimizations. A
> larger change I want to make at some point is to remove the
> backlog workqueue altogether. Originally it was added for
> simplicity but actually causes some latency spikes when
> looking at 99+ percentiles. It really doesn't need to be
> there it was a hold over from some original architecture that
> got pushed upstream. If you have time and want to let me know
> if you would like to tackle removing it.
This is great stuff. Thanks for explaining the sockmap's design and
decisions behind it. The opt-in blocking mode idea is spot on.
I imagine we'll get back to sockmap once we have a replacement for
TPROXY figured out (unrelated to sockmap). Let's sync then.
-Jakub
prev parent reply other threads:[~2019-05-27 9:27 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-11 9:09 [PATCH net] sk_msg: Keep reference on socket file while psock lives Jakub Sitnicki
2019-02-19 16:00 ` Daniel Borkmann
2019-02-20 9:47 ` Jakub Sitnicki
2019-05-21 20:07 ` John Fastabend
2019-05-22 11:14 ` Jakub Sitnicki
2019-05-23 15:58 ` John Fastabend
2019-05-24 8:05 ` Jakub Sitnicki
2019-05-24 15:51 ` John Fastabend
2019-05-27 9:27 ` Jakub Sitnicki [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87k1ecz1q4.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=daniel@iogearbox.net \
--cc=john.fastabend@gmail.com \
--cc=marek@cloudflare.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.