* XDP offload to hypervisor
@ 2017-01-23 21:40 Michael S. Tsirkin
From: Michael S. Tsirkin @ 2017-01-23 21:40 UTC
To: John Fastabend
Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel
I've been thinking about passing XDP programs from the guest to the
hypervisor. Basically, after getting an incoming packet, we could run
an XDP program in the host kernel.
If the result is XDP_DROP or XDP_TX we don't need to wake up the guest at all!
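As a rough illustration of that idea, the host-side receive path could run
the guest's program and only queue the packet toward the guest on XDP_PASS.
This is a minimal sketch only: tun_offload_rx() and tun_xmit_back() are
made-up names rather than existing kernel interfaces, and it assumes the
guest program has already been handed to the host and validated.

#include <linux/bpf.h>
#include <linux/filter.h>

static int tun_xmit_back(struct xdp_buff *xdp);  /* hypothetical transmit-back helper */

/* Sketch: run the guest-supplied XDP program on the host side before
 * deciding whether to wake the guest at all. */
static int tun_offload_rx(struct bpf_prog *guest_prog,
                          void *data, unsigned int len,
                          unsigned int headroom)
{
        struct xdp_buff xdp = {
                .data_hard_start = data - headroom,
                .data            = data,
                .data_end        = data + len,
        };

        switch (bpf_prog_run_xdp(guest_prog, &xdp)) {
        case XDP_DROP:
                return 0;                    /* packet dropped, guest never woken */
        case XDP_TX:
                return tun_xmit_back(&xdp);  /* bounced back out, still no guest exit */
        case XDP_PASS:
        default:
                return -EAGAIN;              /* fall back: queue to the guest as usual */
        }
}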
When using tun for networking - especially with adjust_head - this
unfortunately probably means we need to do a data copy unless there is
enough headroom. How much is enough though?
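The copy can at least be kept off the fast path: check the available
headroom first and only fall back to a copy when it is short. A sketch only;
the helper name and the idea of passing the headroom in explicitly are
assumptions, and 256 is simply today's XDP_PACKET_HEADROOM value.

#include <linux/slab.h>
#include <linux/string.h>

/* Sketch: run XDP in place when the frame already has enough headroom,
 * otherwise copy into a buffer that does (the caller owns the copy). */
static void *xdp_prepare_headroom(void *data, unsigned int len,
                                  unsigned int headroom,
                                  unsigned int required)
{
        void *copy;

        if (headroom >= required)
                return data;                    /* fast path: no copy */

        copy = kmalloc(required + len, GFP_ATOMIC);
        if (!copy)
                return NULL;
        memcpy(copy + required, data, len);     /* slow path: one extra copy */
        return copy + required;
}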
Another issue is around the host/guest ABI. Guest BPF could add new
features at any point. What if the hypervisor cannot support them all?
I guess we could try loading the program into the hypervisor and, on
failure to load, run it within the guest instead - but this ignores the
question of cross-version compatibility: someone might start a guest on
a new host and then try to move it to an old one. So we will need a
"behave like an older host" option such that the guest can start and
then be migrated to an older host later. This will likely mean
implementing this validation of programs in qemu userspace, unless linux
can supply something like it. Is this (disabling some features)
something that might be of interest to the larger bpf community?
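One hedged sketch of what "behave like an older host" could mean in
practice: before qemu offloads a guest program, it scans the instructions
and rejects any helper call outside a negotiated allow-list, keeping the
program in the guest otherwise. The allow-list and the scan itself are
assumptions for illustration, not an existing qemu or kernel interface.

#include <linux/bpf.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HELPER_ID 512   /* illustrative bound on helper ids */

/* Return true if every helper the program calls is permitted by the
 * negotiated compatibility level; otherwise run the program in the guest. */
static bool prog_fits_host_level(const struct bpf_insn *insns, int insn_cnt,
                                 const uint64_t *allowed_helpers)
{
        for (int i = 0; i < insn_cnt; i++) {
                if (insns[i].code != (BPF_JMP | BPF_CALL))
                        continue;
                uint32_t helper = (uint32_t)insns[i].imm;   /* helper id of the call */
                if (helper >= MAX_HELPER_ID ||
                    !(allowed_helpers[helper / 64] & (1ULL << (helper % 64))))
                        return false;                       /* feature the older host lacks */
        }
        return true;
}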
With a device such as macvtap there exist configurations where a single
guest is in control of the device (aka passthrough mode). In that case
there's a potential to run xdp on the host before the host skb is built,
unless the host already has an xdp program attached. If it does, we
could run the program within the guest - but what if a guest program got
attached first? Maybe we should pass a flag in the packet meaning "xdp
already ran on this packet in the host", so the guest can skip running
it. Unless we do a full reset there's always a potential for packets to
slip through, e.g. on xdp program changes. Maybe a flush command is
needed, or a forced queue or device reset, to make sure nothing is in
flight. Does this make sense?
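Purely as an illustration of the flag idea, the guest rx path could check
a header bit before running its own copy of the program. No such flag
exists in the virtio spec; the name below is invented for this sketch,
and the sketch deliberately ignores the program-change race above.

#include <linux/filter.h>
#include <linux/virtio_net.h>

#define VIRTIO_NET_HDR_F_XDP_RAN_IN_HOST 0x10   /* hypothetical flag bit */

/* Sketch: skip the guest-side run when the host claims it already ran
 * the program on this packet; packets that slipped through (e.g. around
 * a program change) still get the guest-side run. */
static u32 virtnet_maybe_run_xdp(const struct virtio_net_hdr *hdr,
                                 struct bpf_prog *prog, struct xdp_buff *xdp)
{
        if (hdr->flags & VIRTIO_NET_HDR_F_XDP_RAN_IN_HOST)
                return XDP_PASS;
        return bpf_prog_run_xdp(prog, xdp);
}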
Thanks!
--
MST
* Re: XDP offload to hypervisor
From: John Fastabend @ 2017-01-23 21:56 UTC
To: Michael S. Tsirkin
Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-23 01:40 PM, Michael S. Tsirkin wrote:
> I've been thinking about passing XDP programs from guest to the
> hypervisor. Basically, after getting an incoming packet, we could run
> an XDP program in host kernel.

Interesting. I am planning on adding XDP to the tun driver. My use case
is we want to use XDP to restrict VM traffic. I was planning on pushing
the xdp program execution into tun_get_user(), so that is different than
"offloading" an xdp program into the hypervisor.

> If the result is XDP_DROP or XDP_TX we don't need to wake up the guest at all!

Nice win.

> When using tun for networking - especially with adjust_head - this
> unfortunately probably means we need to do a data copy unless there is
> enough headroom. How much is enough though?

We were looking at making headroom configurable on Intel drivers, or at
least matching it with the XDP headroom guidelines (although the
developers had the same complaint about 256B being large). Then at least
on supported drivers the copy could be an exception path.

> Another issue is around host/guest ABI. Guest BPF could add new features
> at any point. What if hypervisor can not support it all? I guess we
> could try loading program into hypervisor and run it within guest on
> failure to load, but this ignores question of cross-version
> compatibility - someone might start guest on a new host
> then try to move to an old one. So we will need an option
> "behave like an older host" such that guest can start and then
> move to an older host later. This will likely mean
> implementing this validation of programs in qemu userspace unless linux
> can supply something like this. Is this (disabling some features)
> something that might be of interest to larger bpf community?

This is interesting to me at least. Another interesting "feature" of
running bpf in qemu userspace is that it could presumably work with
vhost_user as well?

> With a device such as macvtap there exist configurations where a single
> guest is in control of the device (aka passthrough mode) in that case
> there's a potential to run xdp on host before host skb is built, unless
> host already has an xdp program attached. If it does we could run the
> program within guest, but what if a guest program got attached first?
> Maybe we should pass a flag in the packet "xdp passed on this packet in
> host". Then, guest can skip running it. Unless we do a full reset
> there's always a potential for packets to slip through, e.g. on xdp
> program changes. Maybe a flush command is needed, or force queue or
> device reset to make sure nothing is going on. Does this make sense?

Could the virtio driver pretend it is "offloading" the XDP program to
hardware? This would make it explicit in the VM that the program is run
before data is received by virtio_net. Then qemu is enabling the
offload framework, which would be interesting.

> Thanks!
* Re: XDP offload to hypervisor
From: Michael S. Tsirkin @ 2017-01-23 22:26 UTC
To: John Fastabend
Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Mon, Jan 23, 2017 at 01:56:16PM -0800, John Fastabend wrote:
> On 17-01-23 01:40 PM, Michael S. Tsirkin wrote:
> > I've been thinking about passing XDP programs from guest to the
> > hypervisor. Basically, after getting an incoming packet, we could run
> > an XDP program in host kernel.
>
> Interesting. I am planning on adding XDP to tun driver. My use case
> is we want to use XDP to restrict VM traffic. I was planning on pushing
> the xdp program execution into tun_get_user(). So different than "offloading"
> an xdp program into hypervisor.

tun currently supports TUNATTACHFILTER. Do you plan to extend it then?
So maybe there's a need to support more than one program. Would it work
if we run one (host-supplied) and then, if we get XDP_PASS, run another
(guest-supplied), and otherwise don't wake up the guest?

> > If the result is XDP_DROP or XDP_TX we don't need to wake up the guest at all!
>
> nice win.
>
> > When using tun for networking - especially with adjust_head - this
> > unfortunately probably means we need to do a data copy unless there is
> > enough headroom. How much is enough though?
>
> We were looking at making headroom configurable on Intel drivers or at
> least matching it with XDP headroom guidelines (although the developers
> had the same complaint about 256B being large). Then at least on supported
> drivers the copy could be an exception path.

So I am concerned that userspace comes to depend on support for the
256-byte headroom that this patchset enables. How about

-#define XDP_PACKET_HEADROOM 256
+#define XDP_PACKET_HEADROOM 64

so we start with a conservative value? In fact NET_SKB_PAD would be
ideal I think, but it's platform dependent. Or at least do the
equivalent for virtio only ...

> > Another issue is around host/guest ABI. Guest BPF could add new features
> > at any point. What if hypervisor can not support it all? I guess we
> > could try loading program into hypervisor and run it within guest on
> > failure to load, but this ignores question of cross-version
> > compatibility - someone might start guest on a new host
> > then try to move to an old one. So we will need an option
> > "behave like an older host" such that guest can start and then
> > move to an older host later. This will likely mean
> > implementing this validation of programs in qemu userspace unless linux
> > can supply something like this. Is this (disabling some features)
> > something that might be of interest to larger bpf community?
>
> This is interesting to me at least. Another interesting "feature" of
> running bpf in qemu userspace is it could work with vhost_user as well
> presumably?

I think with vhost user you would want to push it out to the switch,
not run it in qemu. IOW qemu gets the program and sends it to the
switch. The response is sent to the guest so it knows whether the
switch can support it.

> > With a device such as macvtap there exist configurations where a single
> > guest is in control of the device (aka passthrough mode) in that case
> > there's a potential to run xdp on host before host skb is built, unless
> > host already has an xdp program attached. If it does we could run the
> > program within guest, but what if a guest program got attached first?
> > Maybe we should pass a flag in the packet "xdp passed on this packet in
> > host". Then, guest can skip running it. Unless we do a full reset
> > there's always a potential for packets to slip through, e.g. on xdp
> > program changes. Maybe a flush command is needed, or force queue or
> > device reset to make sure nothing is going on. Does this make sense?
>
> Could the virtio driver pretend its "offloading" the XDP program to
> hardware? This would make it explicit in VM that the program is run
> before data is received by virtio_net. Then qemu is enabling the
> offload framework which would be interesting.

On the qemu side this is not a problem: a command causes a trap and
qemu could flush the queue. But the packets are still in the rx queue
and get processed by napi later. I think the cleanest interface for it
might be a command consuming an rx buffer and writing a pre-defined
pattern into it. This way the guest can figure out how far the device
got in the rx queue.

> > Thanks!
* Re: XDP offload to hypervisor
From: Jason Wang @ 2017-01-25 2:45 UTC
To: John Fastabend, Michael S. Tsirkin
Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 2017-01-24 05:56, John Fastabend wrote:
> On 17-01-23 01:40 PM, Michael S. Tsirkin wrote:
>> I've been thinking about passing XDP programs from guest to the
>> hypervisor. Basically, after getting an incoming packet, we could run
>> an XDP program in host kernel.
>
> Interesting. I am planning on adding XDP to tun driver. My use case
> is we want to use XDP to restrict VM traffic. I was planning on pushing
> the xdp program execution into tun_get_user(). So different than "offloading"
> an xdp program into hypervisor.

This looks interesting to me. BTW, I was playing with a patch that tries
to make use of XDP to accelerate macvtap in passthrough mode rx on the
host. With the patch, an XDP buffer instead of an skb could be used for
vhost rx, and tests show nice results.

But this seems to conflict with the XDP offload idea here.

Thanks
* Re: XDP offload to hypervisor
From: Michael S. Tsirkin @ 2017-01-25 3:17 UTC
To: Jason Wang
Cc: John Fastabend, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Wed, Jan 25, 2017 at 10:45:18AM +0800, Jason Wang wrote:
> On 2017-01-24 05:56, John Fastabend wrote:
> > On 17-01-23 01:40 PM, Michael S. Tsirkin wrote:
> > > I've been thinking about passing XDP programs from guest to the
> > > hypervisor. Basically, after getting an incoming packet, we could run
> > > an XDP program in host kernel.
> >
> > Interesting. I am planning on adding XDP to tun driver. My use case
> > is we want to use XDP to restrict VM traffic. I was planning on pushing
> > the xdp program execution into tun_get_user(). So different than "offloading"
> > an xdp program into hypervisor.
>
> This looks interesting to me. BTW, I was playing with a patch that tries
> to make use of XDP to accelerate macvtap in passthrough mode rx on the
> host. With the patch, an XDP buffer instead of an skb could be used for
> vhost rx, and tests show nice results.
>
> But this seems to conflict with the XDP offload idea here.
>
> Thanks

One way is to add the ability to attach two XDP programs to macvtap:
guest and host. Run the host one first; on XDP_PASS, run the guest one.
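A sketch of the two-program chaining suggested here, with illustrative
names rather than existing macvtap code: the host-owned program always
runs first, the offloaded guest program only sees packets the host
program passed, and only a final XDP_PASS wakes the guest.

#include <linux/filter.h>

static u32 macvtap_run_xdp_chain(struct bpf_prog *host_prog,
                                 struct bpf_prog *guest_prog,
                                 struct xdp_buff *xdp)
{
        u32 act = XDP_PASS;

        if (host_prog)
                act = bpf_prog_run_xdp(host_prog, xdp);
        if (act == XDP_PASS && guest_prog)
                act = bpf_prog_run_xdp(guest_prog, xdp);  /* offloaded guest program */
        return act;     /* XDP_PASS here means "deliver to the guest" */
}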
* Re: XDP offload to hypervisor
From: Alexei Starovoitov @ 2017-01-24 1:02 UTC
To: Michael S. Tsirkin
Cc: John Fastabend, jasowang, john.r.fastabend, netdev, daniel

On Mon, Jan 23, 2017 at 11:40:29PM +0200, Michael S. Tsirkin wrote:
> I've been thinking about passing XDP programs from guest to the
> hypervisor. Basically, after getting an incoming packet, we could run
> an XDP program in host kernel.
>
> If the result is XDP_DROP or XDP_TX we don't need to wake up the guest at all!

that's an interesting idea!
Long term 'xdp offload' needs to be defined, since NICs become smarter
and can accelerate xdp programs. So pushing the xdp program down from
virtio in the guest into the host, and from x86 into the nic cpu,
should probably be handled through the same api.

> When using tun for networking - especially with adjust_head - this
> unfortunately probably means we need to do a data copy unless there is
> enough headroom. How much is enough though?

Frankly I don't understand the whole virtio nit picking that was happening.
imo virtio+xdp by itself is only useful for debugging, development and
testing of xdp programs in a VM. The discussion about performance of
virtio+xdp will only be meaningful when the corresponding host part is
done, likely in the form of vhost extensions and maybe driver changes.
Trying to optimize virtio+xdp when the host is doing traditional
skb+vhost isn't going to be impactful.
But when the host can do xdp in a physical NIC that can deliver raw
pages into vhost that get picked up by guest virtio, then we hopefully
will be around 10G line rate. A page pool is likely needed in such a
scenario, plus some new xdp action like xdp_tx_into_vhost or whatever.
And the guest will be seeing full pages that the host nic provided, so
the discussion about headroom will be automatically solved.
Arguing that skb has 64-byte headroom and therefore we need to
reduce XDP_PACKET_HEADROOM is really upside down.

> Another issue is around host/guest ABI. Guest BPF could add new features
> at any point. What if hypervisor can not support it all? I guess we
> could try loading program into hypervisor and run it within guest on
> failure to load, but this ignores question of cross-version
> compatibility - someone might start guest on a new host
> then try to move to an old one. So we will need an option
> "behave like an older host" such that guest can start and then
> move to an older host later. This will likely mean
> implementing this validation of programs in qemu userspace unless linux
> can supply something like this. Is this (disabling some features)
> something that might be of interest to larger bpf community?

In the case of x86->nic offload not all xdp features will be supported
by the nic, and that is expected. The user will request 'offload of xdp
prog' in some form and if it cannot be done, then xdp programs will run
on x86 as before. Same thing, I imagine, is applicable to virtio->host
offload. Therefore I don't see a need for user space visible
feature negotiation.

> With a device such as macvtap there exist configurations where a single
> guest is in control of the device (aka passthrough mode) in that case
> there's a potential to run xdp on host before host skb is built, unless
> host already has an xdp program attached. If it does we could run the
> program within guest, but what if a guest program got attached first?
> Maybe we should pass a flag in the packet "xdp passed on this packet in
> host". Then, guest can skip running it. Unless we do a full reset
> there's always a potential for packets to slip through, e.g. on xdp
> program changes. Maybe a flush command is needed, or force queue or
> device reset to make sure nothing is going on. Does this make sense?

All valid questions and concerns.
Since there is still no xdp_adjust_head support in virtio,
it feels kinda early to get into detailed 'virtio offload' discussion.
* Re: XDP offload to hypervisor
From: Michael S. Tsirkin @ 2017-01-24 2:47 UTC
To: Alexei Starovoitov
Cc: John Fastabend, jasowang, john.r.fastabend, netdev, daniel

On Mon, Jan 23, 2017 at 05:02:02PM -0800, Alexei Starovoitov wrote:
> > Another issue is around host/guest ABI. Guest BPF could add new features
> > at any point. What if hypervisor can not support it all? I guess we
> > could try loading program into hypervisor and run it within guest on
> > failure to load, but this ignores question of cross-version
> > compatibility - someone might start guest on a new host
> > then try to move to an old one. So we will need an option
> > "behave like an older host" such that guest can start and then
> > move to an older host later. This will likely mean
> > implementing this validation of programs in qemu userspace unless linux
> > can supply something like this. Is this (disabling some features)
> > something that might be of interest to larger bpf community?
>
> In case of x86->nic offload not all xdp features will be supported
> by the nic and that is expected. The user will request 'offload of xdp prog'
> in some form and if it cannot be done, then xdp programs will run
> on x86 as before. Same thing, I imagine, is applicable to virtio->host
> offload. Therefore I don't see a need for user space visible
> feature negotiation.

Not userspace visible - guest visible. As guests move between hosts,
you need to make sure the source host does not commit to a feature that
the destination host does not support.

--
MST
* Re: XDP offload to hypervisor
From: Michael S. Tsirkin @ 2017-01-24 3:33 UTC
To: Alexei Starovoitov
Cc: John Fastabend, jasowang, john.r.fastabend, netdev, daniel

On Mon, Jan 23, 2017 at 05:02:02PM -0800, Alexei Starovoitov wrote:
> Frankly I don't understand the whole virtio nit picking that was happening.
> imo virtio+xdp by itself is only useful for debugging, development and testing
> of xdp programs in a VM. The discussion about performance of virtio+xdp
> will only be meaningful when corresponding host part is done.
> Likely in the form of vhost extensions and maybe driver changes.
> Trying to optimize virtio+xdp when host is doing traditional skb+vhost
> isn't going to be impactful.

Well if packets can be dropped without a host/guest
transition then yes, that will have an impact even
with traditional skbs.

--
MST
* Re: XDP offload to hypervisor
From: Alexei Starovoitov @ 2017-01-24 3:50 UTC
To: Michael S. Tsirkin
Cc: John Fastabend, jasowang, john.r.fastabend, netdev, daniel

On Tue, Jan 24, 2017 at 05:33:37AM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 23, 2017 at 05:02:02PM -0800, Alexei Starovoitov wrote:
> > Frankly I don't understand the whole virtio nit picking that was happening.
> > imo virtio+xdp by itself is only useful for debugging, development and testing
> > of xdp programs in a VM. The discussion about performance of virtio+xdp
> > will only be meaningful when corresponding host part is done.
> > Likely in the form of vhost extensions and maybe driver changes.
> > Trying to optimize virtio+xdp when host is doing traditional skb+vhost
> > isn't going to be impactful.
>
> Well if packets can be dropped without a host/guest
> transition then yes, that will have an impact even
> with traditional skbs.

I don't think it's worth optimizing for though, since the speed of drop
matters for ddos-like use cases, and if we let the host be flooded with
skbs we have already lost: the only thing the cpu is doing is allocating
skbs and moving them around. Whether the drop happens upon entry into
the VM or the host does it in some post-vhost layer doesn't change the
picture much.
That said, I do like the idea of offloading virtio+xdp into the host
somehow.
* Re: XDP offload to hypervisor
From: Michael S. Tsirkin @ 2017-01-24 4:35 UTC
To: Alexei Starovoitov
Cc: John Fastabend, jasowang, john.r.fastabend, netdev, daniel

On Mon, Jan 23, 2017 at 07:50:31PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 24, 2017 at 05:33:37AM +0200, Michael S. Tsirkin wrote:
> > On Mon, Jan 23, 2017 at 05:02:02PM -0800, Alexei Starovoitov wrote:
> > > Frankly I don't understand the whole virtio nit picking that was happening.
> > > imo virtio+xdp by itself is only useful for debugging, development and testing
> > > of xdp programs in a VM. The discussion about performance of virtio+xdp
> > > will only be meaningful when corresponding host part is done.
> > > Likely in the form of vhost extensions and maybe driver changes.
> > > Trying to optimize virtio+xdp when host is doing traditional skb+vhost
> > > isn't going to be impactful.
> >
> > Well if packets can be dropped without a host/guest
> > transition then yes, that will have an impact even
> > with traditional skbs.
>
> I don't think it's worth optimizing for though, since the speed of drop
> matters for ddos-like use case

It's not just drops. adjust_head + xmit can handle bridging without
entering the VM.

> and if we let host be flooded with skbs,
> we already lost, since the only thing cpu is doing is allocating skbs
> and moving them around. Whether drop is happening upon entry into VM
> or host does it in some post-vhost layer doesn't change the picture much.
> That said, I do like the idea of offloading virtio+xdp into host somehow.
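For reference, the kind of program being talked about here is a stock XDP
pattern rather than anything from this thread: a reflector that rewrites
the Ethernet header and returns XDP_TX, so forwarded packets never enter
the VM (a real bridge would also use bpf_xdp_adjust_head() for
encapsulation changes).

#include <linux/bpf.h>
#include <linux/if_ether.h>

#define SEC(NAME) __attribute__((section(NAME), used))

SEC("xdp")
int xdp_reflect(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        unsigned char tmp[ETH_ALEN];

        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;        /* too short to touch the header */

        __builtin_memcpy(tmp, eth->h_source, ETH_ALEN);         /* swap MACs */
        __builtin_memcpy(eth->h_source, eth->h_dest, ETH_ALEN);
        __builtin_memcpy(eth->h_dest, tmp, ETH_ALEN);
        return XDP_TX;                  /* bounce it out the same interface */
}

char _license[] SEC("license") = "GPL";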
* Re: XDP offload to hypervisor
From: Jason Wang @ 2017-01-25 2:51 UTC
To: Alexei Starovoitov, Michael S. Tsirkin
Cc: John Fastabend, john.r.fastabend, netdev, daniel

On 2017-01-24 09:02, Alexei Starovoitov wrote:
> On Mon, Jan 23, 2017 at 11:40:29PM +0200, Michael S. Tsirkin wrote:
>> I've been thinking about passing XDP programs from guest to the
>> hypervisor. Basically, after getting an incoming packet, we could run
>> an XDP program in host kernel.
>>
>> If the result is XDP_DROP or XDP_TX we don't need to wake up the guest at all!
> that's an interesting idea!
> Long term 'xdp offload' needs to be defined, since NICs become smarter
> and can accelerate xdp programs.
> So pushing the xdp program down from virtio in the guest into host
> and from x86 into nic cpu should probably be handled through the same api.
>
>> When using tun for networking - especially with adjust_head - this
>> unfortunately probably means we need to do a data copy unless there is
>> enough headroom. How much is enough though?
> Frankly I don't understand the whole virtio nit picking that was happening.
> imo virtio+xdp by itself is only useful for debugging, development and testing
> of xdp programs in a VM. The discussion about performance of virtio+xdp
> will only be meaningful when corresponding host part is done.

I was doing a prototype to make XDP rx work for macvtap (with minor
changes in the driver, e.g. mlx4). Tests show improvements; I plan to
post it as an RFC after the Spring Festival holiday in China. This is
even useful for nested VMs, but it cannot work well for XDP offload.

> Likely in the form of vhost extensions and maybe driver changes.
> Trying to optimize virtio+xdp when host is doing traditional skb+vhost
> isn't going to be impactful.
> But when host can do xdp in physical NIC that can deliver raw
> pages into vhost that gets picked up by guest virtio, then we hopefully
> will be around 10G line rate. page pool is likely needed in such scenario.
> Some new xdp action like xdp_tx_into_vhost or whatever.

Yes, in my prototype the mlx4 XDP rx page pool was reused.

Thanks
* Re: XDP offload to hypervisor
From: Jason Wang @ 2017-01-25 3:03 UTC
To: Alexei Starovoitov, Michael S. Tsirkin
Cc: John Fastabend, john.r.fastabend, netdev, daniel

On 2017-01-24 09:02, Alexei Starovoitov wrote:
> Frankly I don't understand the whole virtio nit picking that was happening.
> imo virtio+xdp by itself is only useful for debugging, development and testing
> of xdp programs in a VM. The discussion about performance of virtio+xdp
> will only be meaningful when corresponding host part is done.

Maybe not - consider that you may have a dpdk backend on the host.

Thanks
* Re: XDP offload to hypervisor
From: Jason Wang @ 2017-01-25 2:41 UTC
To: Michael S. Tsirkin, John Fastabend
Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 2017-01-24 05:40, Michael S. Tsirkin wrote:
> I've been thinking about passing XDP programs from guest to the
> hypervisor. Basically, after getting an incoming packet, we could run
> an XDP program in host kernel.
>
> If the result is XDP_DROP or XDP_TX we don't need to wake up the guest at all!

Interesting, but there are some questions:

- This may work well for XDP_DROP and XDP_TX, and manage to work for
  XDP_PASS. But what if XDP were extended with other capabilities in the
  future, e.g. forwarding to another interface or to userspace?
- For XDP_DROP, it can be done through a socket filter.
- XDP_TX needs to be translated into something like XDP_RX, at least for
  tun. Otherwise it may bring some confusion if tun supports XDP, or if
  XDP were supported in the tx path in the future.

> When using tun for networking - especially with adjust_head - this
> unfortunately probably means we need to do a data copy unless there is
> enough headroom. How much is enough though?

Not a tun-specific issue, I believe?

> Another issue is around host/guest ABI. Guest BPF could add new features
> at any point. What if hypervisor can not support it all? I guess we
> could try loading program into hypervisor and run it within guest on
> failure to load, but this ignores question of cross-version
> compatibility - someone might start guest on a new host
> then try to move to an old one. So we will need an option
> "behave like an older host" such that guest can start and then
> move to an older host later.

I'm skeptical about whether this can be done easily.
* Re: XDP offload to hypervisor
From: Michael S. Tsirkin @ 2017-01-25 3:12 UTC
To: Jason Wang
Cc: John Fastabend, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Wed, Jan 25, 2017 at 10:41:24AM +0800, Jason Wang wrote:
> On 2017-01-24 05:40, Michael S. Tsirkin wrote:
> > I've been thinking about passing XDP programs from guest to the
> > hypervisor. Basically, after getting an incoming packet, we could run
> > an XDP program in host kernel.
> >
> > If the result is XDP_DROP or XDP_TX we don't need to wake up the guest at all!
>
> Interesting, but there are some questions:
>
> - This may work well for XDP_DROP and XDP_TX, and manage to work for
>   XDP_PASS. But what if XDP were extended with other capabilities in the
>   future, e.g. forwarding to another interface or to userspace?

This is exactly what I am saying. Any future extensions will need
feature negotiation.

> - For XDP_DROP, it can be done through a socket filter.
> - XDP_TX needs to be translated into something like XDP_RX, at least for
>   tun. Otherwise it may bring some confusion if tun supports XDP, or if
>   XDP were supported in the tx path in the future.
>
> > When using tun for networking - especially with adjust_head - this
> > unfortunately probably means we need to do a data copy unless there is
> > enough headroom. How much is enough though?
>
> Not a tun-specific issue, I believe?

It is tun specific, because tun gets skbs from the linux net core while
XDP expects pre-skb pages.
Thread overview: 14+ messages

2017-01-23 21:40 XDP offload to hypervisor  Michael S. Tsirkin
2017-01-23 21:56 ` John Fastabend
2017-01-23 22:26   ` Michael S. Tsirkin
2017-01-25  2:45   ` Jason Wang
2017-01-25  3:17     ` Michael S. Tsirkin
2017-01-24  1:02 ` Alexei Starovoitov
2017-01-24  2:47   ` Michael S. Tsirkin
2017-01-24  3:33   ` Michael S. Tsirkin
2017-01-24  3:50     ` Alexei Starovoitov
2017-01-24  4:35       ` Michael S. Tsirkin
2017-01-25  2:51   ` Jason Wang
2017-01-25  3:03   ` Jason Wang
2017-01-25  2:41 ` Jason Wang
2017-01-25  3:12   ` Michael S. Tsirkin