From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Yuri Benditovich <yuri.benditovich@daynix.com>
Cc: Yan Vugenfirer <yan@daynix.com>, Jason Wang <jasowang@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Andrew Melnychenko <andrew@daynix.com>,
qemu-devel@nongnu.org
Subject: Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
Date: Wed, 4 Nov 2020 12:04:15 +0000
Message-ID: <20201104120415.GH565323@redhat.com>
In-Reply-To: <CAOEp5OciCLsKtnZ=mYavaFbePBwh7VWVg9NyFrns6zy18YKC=w@mail.gmail.com>

On Wed, Nov 04, 2020 at 01:49:05PM +0200, Yuri Benditovich wrote:
> On Wed, Nov 4, 2020 at 4:08 AM Jason Wang <jasowang@redhat.com> wrote:
>
> >
> > > On 2020/11/3 6:32 PM, Yuri Benditovich wrote:
> > >
> > >
> > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > > On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> > > > > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > > > RSS (Receive Side Scaling) is used to distribute network packets
> > > > > to guest virtqueues by calculating packet hash.
> > > > > eBPF RSS allows us to use RSS with vhost TAP.
> > > >
> > > > > This set of patches introduces the usage of eBPF for packet
> > > > > steering and RSS hash calculation:
> > > > > * RSS (Receive Side Scaling) is used to distribute network packets
> > > > >   to guest virtqueues by calculating packet hash
> > > > > * eBPF RSS is supposed to be faster than the already existing
> > > > >   'software' implementation in QEMU
> > > > > * Additionally adds support for the usage of RSS with vhost
> > > > >
> > > > Supported kernels: 5.8+
> > > >
> > > > Implementation notes:
> > > > > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF
> > > > > program.
> > > > > Added eBPF support to QEMU directly through a system call, see
> > > > > bpf(2) for details.
> > > > > The eBPF program is part of QEMU and is presented as an array of
> > > > > bpf instructions.
> > > > > The program can be recompiled with the provided Makefile.ebpf
> > > > > (need to adjust 'linuxhdrs'), although it's not required to build
> > > > > QEMU with eBPF support.
> > > > > Added changes to virtio-net and vhost; primarily, eBPF RSS is used.
> > > > > 'Software' RSS is used in the case of hash population and as a
> > > > > fallback option.
> > > > > For vhost, the hash population feature is not reported to the
> > > > > guest.
> > > >
> > > > Please also see the documentation in PATCH 6/6.
> > > >
> > > > > I am sending those patches as RFC to initiate the discussions
> > > > > and get feedback on the following points:
> > > > > * Fallback when eBPF is not supported by the kernel
> > > >
> > > >
> > > > Yes, and it could also be a lack of CAP_BPF.
> > >
> > >
> > > > * Live migration to the kernel that doesn't have eBPF support
> > >
> > >
> > > > Is there anything that needs special treatment here?
> > >
> > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > (everything works) -> dest. system 5.6 (bpf does not work), the
> > > adapter functions, but all the steering does not use proper queues.
> >
> >
> > Right, I think we need to disable vhost on dest.
> >
> >
> Is it acceptable to disable vhost at the time of migration?
>
>
> > >
> > >
> > >
> > > > * Integration with current QEMU build
> > >
> > >
> > > Yes, a question here:
> > >
> > > > 1) Any reason for not using libbpf, e.g. it has been shipped with
> > > > some distros
> > >
> > >
> > > > We intentionally do not use libbpf, as it is present only on some
> > > > distros. We can switch to libbpf, but this will disable bpf if
> > > > libbpf is not installed
> >
> >
> > That's better I think.
> >
>
> We think the preferred way is to have the eBPF code built into QEMU (not
> distributed as a separate file).
>
> Our initial idea was to not use libbpf because this:
> 1. Does not create an additional dependency at build time or at run time
> 2. Gives us a smaller footprint of the loadable eBPF blob inside QEMU
> 3. Does not add too much code to QEMU
>
> We can switch to libbpf, in this case:
> 1. Presence of dynamic library is not guaranteed on the target system

Again, if a distro or user wants to use this feature in QEMU, they
should be expected to build the library.

> 2. Static library is large

QEMU doesn't support static linking for system emulators. It may
happen to work at times, but there are no expectations in this respect.

> 3. libbpf uses eBPF ELF, which is significantly bigger than just the array
> of instructions (maybe we will succeed in reducing the ELF to some
> suitable size and still have it built-in)
>
> Please let us know whether you still think libbpf is better and why.

It looks like both the Clang and GCC compilers for BPF are moving towards
a world where they use BTF to get "compile once, run everywhere"
portability for the compiled bytecode. IIUC, libbpf is what is responsible
for processing the BTF data when loading it into the running kernel. This
all looks like a good thing in general.

If we introduce BPF to QEMU without using libbpf, and then later decide
we absolutely need libbpf features, it creates an upgrade
backwards-compatibility issue for existing deployments. It is better to
use libbpf right from the start, so we're set up to take full advantage
of what it offers long term.
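
For illustration only, here is a minimal sketch of what an optional libbpf
dependency with a software fallback could look like. It is not taken from
the patch series: HAVE_LIBBPF, ebpf_rss_attach(), rss_setup(), the object
file path and the program name "rss_steering" are all hypothetical names
chosen for this example. Only the libbpf calls and the TUNSETSTEERINGEBPF
ioctl are existing interfaces.

```c
/* Which RSS implementation ends up in use. */
enum rss_backend { RSS_BACKEND_SOFTWARE, RSS_BACKEND_EBPF };

#ifdef HAVE_LIBBPF /* hypothetical macro, set by the build when libbpf is found */
#include <bpf/libbpf.h>
#include <linux/if_tun.h>
#include <sys/ioctl.h>

/* Load an eBPF object with libbpf and attach it to a TAP fd as the
 * steering program. Returns 0 on success, -1 on any failure. */
static int ebpf_rss_attach(int tap_fd, const char *obj_path)
{
    struct bpf_object *obj = bpf_object__open_file(obj_path, NULL);
    /* Older libbpf returns an error pointer, newer versions return NULL. */
    if (!obj || libbpf_get_error(obj)) {
        return -1;
    }
    /* Fails on kernels without eBPF support or without CAP_BPF. */
    if (bpf_object__load(obj) != 0) {
        bpf_object__close(obj);
        return -1;
    }
    /* "rss_steering" is an assumed program name, not one from the patches. */
    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "rss_steering");
    int prog_fd = prog ? bpf_program__fd(prog) : -1;
    if (prog_fd < 0 || ioctl(tap_fd, TUNSETSTEERINGEBPF, &prog_fd) != 0) {
        bpf_object__close(obj);
        return -1;
    }
    /* In this sketch the object is simply kept open for the process lifetime. */
    return 0;
}
#else
/* Built without libbpf: eBPF RSS is unavailable, always report failure. */
static int ebpf_rss_attach(int tap_fd, const char *obj_path)
{
    (void)tap_fd;
    (void)obj_path;
    return -1;
}
#endif

/* Prefer eBPF steering; fall back to software RSS if it cannot be set up. */
enum rss_backend rss_setup(int tap_fd, const char *obj_path)
{
    if (ebpf_rss_attach(tap_fd, obj_path) == 0) {
        return RSS_BACKEND_EBPF;
    }
    return RSS_BACKEND_SOFTWARE;
}
```

A caller would do something like rss_setup(tap_fd, "/path/to/rss.bpf.o")
and enable the in-QEMU software RSS when it returns RSS_BACKEND_SOFTWARE.
As noted earlier in the thread, that fallback does not help when vhost is
in use, so that case still needs a separate decision.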
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|