From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
Magnus Karlsson <magnus.karlsson@gmail.com>,
Jesper Dangaard Brouer <brouer@redhat.com>
Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eelco Chaudron" <echaudro@redhat.com>,
"Lorenzo Bianconi" <lorenzo.bianconi@redhat.com>,
"Björn Töpel" <bjorn@kernel.org>,
"Magnus Karlsson" <magnus.karlsson@intel.com>,
"Jonathan Lemon" <jonathan.lemon@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
"Jakub Kicinski" <kuba@kernel.org>,
"Alexei Starovoitov" <ast@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <kafai@fb.com>,
"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
"KP Singh" <kpsingh@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"Xie He" <xie.he.0141@gmail.com>,
"Eric Dumazet" <edumazet@google.com>,
"John Ogness" <john.ogness@linutronix.de>,
"Wang Hai" <wanghai38@huawei.com>,
"Tanner Love" <tannerlove@google.com>,
"Eyal Birger" <eyal.birger@gmail.com>,
"Menglong Dong" <dong.menglong@zte.com.cn>,
"Network Development" <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next] xsk: support AF_PACKET
Date: Fri, 28 May 2021 14:35:25 +0200
Message-ID: <8735u7gho2.fsf@toke.dk>
In-Reply-To: <066a0c0a-ad48-517a-4bd0-8920bdbf0dd8@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 5/28/21 12:54 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>> On 5/28/21 12:00 PM, Magnus Karlsson wrote:
>>>> On Fri, May 28, 2021 at 11:52 AM Jesper Dangaard Brouer
>>>> <brouer@redhat.com> wrote:
>>>>> On Fri, 28 May 2021 17:02:01 +0800
>>>>> Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>> On Fri, 28 May 2021 10:55:58 +0200, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>> Xuan Zhuo <xuanzhuo@linux.alibaba.com> writes:
>>>>>>>
>>>>>>>> In xsk mode, users cannot use AF_PACKET(tcpdump) to observe the current
>>>>>>>> rx/tx data packets. This feature is very important in many cases. So
>>>>>>>> this patch allows AF_PACKET to obtain xsk packets.
>>>>>>>
>>>>>>> You can use xdpdump to dump the packets from the XDP program before it
>>>>>>> gets redirected into the XSK:
>>>>>>> https://github.com/xdp-project/xdp-tools/tree/master/xdp-dump
>>>>>>
>>>>>> Wow, this is a good idea.
>>>>>
>>>>> Yes, it is rather cool (credit to Eelco). Notice the extra info you
>>>>> can capture from 'exit', like XDP return codes, if_index, rx_queue.
>>>>>
>>>>> The tool uses the perf ring-buffer to send/copy data to userspace.
>>>>> This is actually surprisingly fast, but I still think AF_XDP will be
>>>>> faster (but it usually 'steals' the packet).
>>>>>
>>>>> Another (crazy?) idea for extending this (and xdpdump) is to leverage
>>>>> Hangbin's recent XDP_REDIRECT extension e624d4ed4aa8 ("xdp: Extend
>>>>> xdp_redirect_map with broadcast support"). We now have an
>>>>> xdp_redirect_map flag BPF_F_BROADCAST, what if we create a
>>>>> BPF_F_CLONE_PASS flag?
>>>>>
>>>>> The semantics of the BPF_F_CLONE_PASS flag would be to copy/clone the
>>>>> packet for the specified map target index (e.g. an AF_XDP map), and
>>>>> afterwards do what veth/cpumap do: create an SKB from the
>>>>> xdp_frame (see __xdp_build_skb_from_frame()) and send it to the netstack.
>>>>> (Feel free to kick me if this doesn't make any sense)
>>>>
>>>> This would be a smooth way to implement clone support for AF_XDP. If
>>>> we had this and someone added AF_XDP support to libpcap, we could both
>>>> capture AF_XDP traffic with tcpdump (using this clone functionality in
>>>> the XDP program) and speed up tcpdump for dumping traffic destined for
>>>> regular sockets. Would that solve your use case Xuan? Note that I have
>>>> not looked into the BPF_F_CLONE_PASS code, so do not know at this
>>>> point what it would take to support this for XSKMAPs.
>>>
>>> Recently also ended up with something similar for our XDP LB to record pcaps [0] ;)
>>> My question is.. tcpdump doesn't really care where the packet data comes from,
>>> so why not extend libpcap's Linux-related internals to capture from either the
>>> perf RB or the BPF ringbuf rather than AF_PACKET sockets? Cloning is slow, and if
>>> you need to end up creating an skb which is then cloned once again inside AF_PACKET
>>> it's even worse. By just reading out of, say, the perf RB you don't need any
>>> clones at all.
>>
>> We discussed this when creating xdpdump and decided to keep it as a
>> separate tool for the time being. I forget the details of the
>> discussion, maybe Eelco remembers.
>>
>> Anyway, xdpdump does have a "pipe pcap to stdout" feature so you can do
>> `xdpdump | tcpdump` and get the interactive output; and it will also
>> save pcap information to disk, of course (using pcap-ng so it can also
>> save metadata like XDP program name and return code).
>
> Right, and this should yield significantly better performance compared to
> cloning & pushing traffic into AF_PACKET. I presume not many folks are aware
> of xdpdump (yet) which is probably why such a patch was created here..

What, are you implying we haven't achieved world domination yet?
Inconceivable! ;)

> a native libpcap implementation could solve that aspect fwiw and
> additionally hook at the same points as AF_PACKET via BPF but without
> the hassle/overhead of things like dev_queue_xmit_nit() in fast path.
> (Maybe another option could be to have a drop-in replacement
> libpcap.so for tcpdump using it transparently.)

I do believe that Michael was open to adding something like this to
tcpdump/libpcap when I last talked to him about it; and I'm certainly
not opposed to it either! Hooking up tcpdump like this may be a bit of a
firehose, though, so it would be nice to be able to carry over the
kernel-side filtering as well. I suppose it should be possible to write
an eBPF bytecode generator that does a bit of setup and then just
translates the cBPF packet filtering ops, no? This would be cool to have
in any case; IIRC Cloudflare did something like that but took a detour
through C code generation?

-Toke
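
To make the "a bit of setup, then translate the cBPF ops" idea above a little more concrete, here is a minimal sketch. It is not how libpcap, the kernel, or any existing tool does it. Everything named here (CbpfInsn, translate, the op names, the register assignment, the L<n>/reject labels) is hypothetical, and the output is readable eBPF-style pseudo-assembly rather than encoded bytecode. It assumes an XDP-style context where the data and data_end fields sit at offsets 0 and 4, and it covers only absolute loads, jeq and ret.

```python
from typing import List, NamedTuple


class CbpfInsn(NamedTuple):
    """One classic BPF instruction (hypothetical, simplified representation)."""
    op: str      # "ldb_abs", "ldh_abs", "ldw_abs", "jeq" or "ret"
    k: int = 0   # immediate: packet offset, compare value or return value
    jt: int = 0  # relative jump if true (counted from the next instruction)
    jf: int = 0  # relative jump if false


# Load width in bytes and the C-style type used in the pseudo-assembly.
LOADS = {"ldb_abs": (1, "u8"), "ldh_abs": (2, "u16"), "ldw_abs": (4, "u32")}


def translate(prog: List[CbpfInsn]) -> List[str]:
    """Emit eBPF-style pseudo-assembly for a small cBPF subset."""
    # Setup: cBPF has implicit packet access, eBPF needs explicit pointers.
    out = [
        "r6 = *(u32 *)(r1 + 0)",  # data (assumed XDP-style context)
        "r7 = *(u32 *)(r1 + 4)",  # data_end
    ]
    for i, insn in enumerate(prog):
        out.append(f"L{i}:")  # one label per cBPF instruction
        if insn.op in LOADS:
            size, ctype = LOADS[insn.op]
            out += [
                # Bounds check; an out-of-range cBPF load rejects the packet.
                "r2 = r6",
                f"r2 += {insn.k + size}",
                "if r2 > r7 goto reject",
                # r0 plays the role of the cBPF accumulator A.
                f"r0 = *({ctype} *)(r6 + {insn.k})",
            ]
            if size > 1:
                # cBPF loads yield host-order values from network-order data.
                out.append(f"r0 = be{size * 8} r0")
        elif insn.op == "jeq":
            out.append(f"if r0 == {insn.k} goto L{i + 1 + insn.jt}")
            out.append(f"goto L{i + 1 + insn.jf}")
        elif insn.op == "ret":
            # A real generator would map accept/reject to whatever the
            # capture hook expects instead of returning the snap length.
            out += [f"r0 = {insn.k}", "exit"]
        else:
            raise ValueError(f"unsupported cBPF op: {insn.op}")
    out += ["reject:", "r0 = 0", "exit"]
    return out


# The classic "ip" filter: accept if the EtherType at offset 12 is 0x0800.
ip_filter = [
    CbpfInsn("ldh_abs", k=12),
    CbpfInsn("jeq", k=0x0800, jt=0, jf=1),
    CbpfInsn("ret", k=262144),
    CbpfInsn("ret", k=0),
]
print("\n".join(translate(ip_filter)))
```

The setup is mostly the two pointer loads plus a bounds check in front of every packet access; after that each cBPF op maps to a handful of eBPF-style instructions, which is why a direct translator seems plausible without a detour through C.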