* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
@ 2019-06-26 14:40 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 28+ messages in thread
From: Jesper Dangaard Brouer @ 2019-06-26 14:40 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Machulsky, Zorik, Jubran, Samih, davem@davemloft.net,
netdev@vger.kernel.org, Woodhouse, David, Matushevsky, Alexander,
Bshara, Saeed, Wilson, Matt, Liguori, Anthony, Bshara, Nafea,
Tzalik, Guy, Belgazal, Netanel, Saidi, Ali,
Herrenschmidt, Benjamin, Kiyanovski, Arthur, Daniel Borkmann,
Ilias Apalodimas, Alexei Starovoitov, Jakub Kicinski,
xdp-newbies@vger.kernel.org, brouer
On Wed, 26 Jun 2019 13:52:16 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > On Tue, 25 Jun 2019 03:19:22 +0000
> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
> >
> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
> >>
> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
> >>
> >> > This commit implements the basic functionality of drop/pass logic in the
> >> > ena driver.
> >>
> >> Usually we require a driver to implement all the XDP return codes,
> >> before we accept it. But as Daniel and I discussed with Zorik during
> >> NetConf[1], we are going to make an exception and accept the driver
> >> if you also implement XDP_TX.
> >>
> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
> >>
> >> Jesper, thanks for your comments and very helpful discussion during
> >> NetConf! That's the plan, as we agreed. From our side I would like to
> >> reiterate again the importance of multi-buffer support by xdp frame.
> >> We would really prefer not to see our MTU shrinking because of xdp
> >> support.
> >
> > Okay we really need to make a serious attempt to find a way to support
> > multi-buffer packets with XDP. With the important criteria of not
> > hurting performance of the single-buffer per packet design.
> >
> > I've created a design document[2], that I will update based on our
> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> >
> > The use-case that really convinced me was Eric's packet header-split.
> >
> >
> > Lets refresh: Why XDP don't have multi-buffer support:
> >
> > XDP is designed for maximum performance, which is why certain driver-level
> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
> > As it e.g. complicated the driver RX-loop and memory model handling.
> >
> > The single buffer per packet design, is also tied into eBPF Direct-Access
> > (DA) to packet data, which can only be allowed if the packet memory is in
> > contiguous memory. This DA feature is essential for XDP performance.
> >
> >
> > One way forward is to define that XDP only get access to the first
> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
>
> Yeah, I think this would be reasonable. As long as we can have a
> metadata field with the full length + still give XDP programs the
> ability to truncate the packet (i.e., discard the subsequent pages)
You touch upon some interesting complications already:
1. It is valuable for XDP bpf_prog to know "full" length?
(if so, then we need to extend xdp ctx with info)
But if we need to know the full length, when the first-buffer is
processed. Then realize that this affect the drivers RX-loop, because
then we need to "collect" all the buffers before we can know the
length (although some HW provide this in first descriptor).
We likely have to change drivers RX-loop anyhow, as XDP_TX and
XDP_REDIRECT will also need to "collect" all buffers before the packet
can be forwarded. (Although this could potentially happen later in
driver loop when it meet/find the End-Of-Packet descriptor bit).
2. Can we even allow helper bpf_xdp_adjust_tail() ?
Wouldn't it be easier to disallow a BPF-prog with this helper, when
driver have configured multi-buffer? Or will it be too restrictive,
if jumbo-frame is very uncommon and only enabled because switch infra
could not be changed (like Amazon case).
Perhaps it is better to let bpf_xdp_adjust_tail() fail runtime?
> I think many (most?) use cases will work fine without having access
> to the full packet data...
I agree. Other people should voice their concerns if they don't
agree...
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 14:40 ` Jesper Dangaard Brouer
@ 2019-06-26 15:01 ` Toke Høiland-Jørgensen
-1 siblings, 0 replies; 28+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-06-26 15:01 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Machulsky, Zorik, Jubran, Samih, davem@davemloft.net,
netdev@vger.kernel.org, Woodhouse, David, Matushevsky, Alexander,
Bshara, Saeed, Wilson, Matt, Liguori, Anthony, Bshara, Nafea,
Tzalik, Guy, Belgazal, Netanel, Saidi, Ali,
Herrenschmidt, Benjamin, Kiyanovski, Arthur, Daniel Borkmann,
Ilias Apalodimas, Alexei Starovoitov, Jakub Kicinski,
xdp-newbies@vger.kernel.org
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Wed, 26 Jun 2019 13:52:16 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>> > On Tue, 25 Jun 2019 03:19:22 +0000
>> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
>> >
>> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
>> >>
>> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
>> >>
>> >> > This commit implements the basic functionality of drop/pass logic in the
>> >> > ena driver.
>> >>
>> >> Usually we require a driver to implement all the XDP return codes,
>> >> before we accept it. But as Daniel and I discussed with Zorik during
>> >> NetConf[1], we are going to make an exception and accept the driver
>> >> if you also implement XDP_TX.
>> >>
>> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
>> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
>> >>
>> >> Jesper, thanks for your comments and very helpful discussion during
>> >> NetConf! That's the plan, as we agreed. From our side I would like to
>> >> reiterate again the importance of multi-buffer support by xdp frame.
>> >> We would really prefer not to see our MTU shrinking because of xdp
>> >> support.
>> >
>> > Okay we really need to make a serious attempt to find a way to support
>> > multi-buffer packets with XDP. With the important criteria of not
>> > hurting performance of the single-buffer per packet design.
>> >
>> > I've created a design document[2], that I will update based on our
>> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
>> >
>> > The use-case that really convinced me was Eric's packet header-split.
>> >
>> >
>> > Lets refresh: Why XDP don't have multi-buffer support:
>> >
>> > XDP is designed for maximum performance, which is why certain driver-level
>> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
>> > As it e.g. complicated the driver RX-loop and memory model handling.
>> >
>> > The single buffer per packet design, is also tied into eBPF Direct-Access
>> > (DA) to packet data, which can only be allowed if the packet memory is in
>> > contiguous memory. This DA feature is essential for XDP performance.
>> >
>> >
>> > One way forward is to define that XDP only get access to the first
>> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
>> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
>> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
>>
>> Yeah, I think this would be reasonable. As long as we can have a
>> metadata field with the full length + still give XDP programs the
>> ability to truncate the packet (i.e., discard the subsequent pages)
>
> You touch upon some interesting complications already:
>
> 1. It is valuable for XDP bpf_prog to know "full" length?
> (if so, then we need to extend xdp ctx with info)
Valuable, quite likely. A hard requirement, probably not (for all use
cases).
> But if we need to know the full length, when the first-buffer is
> processed. Then realize that this affect the drivers RX-loop, because
> then we need to "collect" all the buffers before we can know the
> length (although some HW provide this in first descriptor).
>
> We likely have to change drivers RX-loop anyhow, as XDP_TX and
> XDP_REDIRECT will also need to "collect" all buffers before the packet
> can be forwarded. (Although this could potentially happen later in
> driver loop when it meet/find the End-Of-Packet descriptor bit).
>
>
> 2. Can we even allow helper bpf_xdp_adjust_tail() ?
>
> Wouldn't it be easier to disallow a BPF-prog with this helper, when
> driver have configured multi-buffer?
Easier, certainly. But then it's even easier to not implement this at
all ;)
> Or will it be too restrictive, if jumbo-frame is very uncommon and
> only enabled because switch infra could not be changed (like Amazon
> case).
I think it would be preferable to support it; but maybe we can let that
depend on how difficult it actually turns out to be to allow it?
> Perhaps it is better to let bpf_xdp_adjust_tail() fail runtime?
If we do disallow it, I think I'd lean towards failing the call at
runtime...
-Toke
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
@ 2019-06-26 15:01 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 28+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-06-26 15:01 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Machulsky, Zorik, Jubran, Samih, davem@davemloft.net,
netdev@vger.kernel.org, Woodhouse, David, Matushevsky, Alexander,
Bshara, Saeed, Wilson, Matt, Liguori, Anthony, Bshara, Nafea,
Tzalik, Guy, Belgazal, Netanel, Saidi, Ali,
Herrenschmidt, Benjamin, Kiyanovski, Arthur, Daniel Borkmann,
Ilias Apalodimas, Alexei Starovoitov, Jakub Kicinski,
xdp-newbies@vger.kernel.org, brouer
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Wed, 26 Jun 2019 13:52:16 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>> > On Tue, 25 Jun 2019 03:19:22 +0000
>> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
>> >
>> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
>> >>
>> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
>> >>
>> >> > This commit implements the basic functionality of drop/pass logic in the
>> >> > ena driver.
>> >>
>> >> Usually we require a driver to implement all the XDP return codes,
>> >> before we accept it. But as Daniel and I discussed with Zorik during
>> >> NetConf[1], we are going to make an exception and accept the driver
>> >> if you also implement XDP_TX.
>> >>
>> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
>> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
>> >>
>> >> Jesper, thanks for your comments and very helpful discussion during
>> >> NetConf! That's the plan, as we agreed. From our side I would like to
>> >> reiterate again the importance of multi-buffer support by xdp frame.
>> >> We would really prefer not to see our MTU shrinking because of xdp
>> >> support.
>> >
>> > Okay we really need to make a serious attempt to find a way to support
>> > multi-buffer packets with XDP. With the important criteria of not
>> > hurting performance of the single-buffer per packet design.
>> >
>> > I've created a design document[2], that I will update based on our
>> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
>> >
>> > The use-case that really convinced me was Eric's packet header-split.
>> >
>> >
>> > Lets refresh: Why XDP don't have multi-buffer support:
>> >
>> > XDP is designed for maximum performance, which is why certain driver-level
>> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
>> > As it e.g. complicated the driver RX-loop and memory model handling.
>> >
>> > The single buffer per packet design, is also tied into eBPF Direct-Access
>> > (DA) to packet data, which can only be allowed if the packet memory is in
>> > contiguous memory. This DA feature is essential for XDP performance.
>> >
>> >
>> > One way forward is to define that XDP only get access to the first
>> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
>> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
>> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
>>
>> Yeah, I think this would be reasonable. As long as we can have a
>> metadata field with the full length + still give XDP programs the
>> ability to truncate the packet (i.e., discard the subsequent pages)
>
> You touch upon some interesting complications already:
>
> 1. It is valuable for XDP bpf_prog to know "full" length?
> (if so, then we need to extend xdp ctx with info)
Valuable, quite likely. A hard requirement, probably not (for all use
cases).
> But if we need to know the full length, when the first-buffer is
> processed. Then realize that this affect the drivers RX-loop, because
> then we need to "collect" all the buffers before we can know the
> length (although some HW provide this in first descriptor).
>
> We likely have to change drivers RX-loop anyhow, as XDP_TX and
> XDP_REDIRECT will also need to "collect" all buffers before the packet
> can be forwarded. (Although this could potentially happen later in
> driver loop when it meet/find the End-Of-Packet descriptor bit).
>
>
> 2. Can we even allow helper bpf_xdp_adjust_tail() ?
>
> Wouldn't it be easier to disallow a BPF-prog with this helper, when
> driver have configured multi-buffer?
Easier, certainly. But then it's even easier to not implement this at
all ;)
> Or will it be too restrictive, if jumbo-frame is very uncommon and
> only enabled because switch infra could not be changed (like Amazon
> case).
I think it would be preferable to support it; but maybe we can let that
depend on how difficult it actually turns out to be to allow it?
> Perhaps it is better to let bpf_xdp_adjust_tail() fail runtime?
If we do disallow it, I think I'd lean towards failing the call at
runtime...
-Toke
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 15:01 ` Toke Høiland-Jørgensen
(?)
@ 2019-06-26 15:20 ` Willem de Bruijn
2019-06-26 16:42 ` Jonathan Lemon
2019-06-28 8:46 ` Jesper Dangaard Brouer
-1 siblings, 2 replies; 28+ messages in thread
From: Willem de Bruijn @ 2019-06-26 15:20 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jesper Dangaard Brouer, Machulsky, Zorik, Jubran, Samih,
davem@davemloft.net, netdev@vger.kernel.org, Woodhouse, David,
Matushevsky, Alexander, Bshara, Saeed, Wilson, Matt,
Liguori, Anthony, Bshara, Nafea, Tzalik, Guy, Belgazal, Netanel,
Saidi, Ali, Herrenschmidt, Benjamin, Kiyanovski, Arthur,
Daniel Borkmann, Ilias Apalodimas, Alexei Starovoitov,
Jakub Kicinski, xdp-newbies@vger.kernel.org
On Wed, Jun 26, 2019 at 11:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > On Wed, 26 Jun 2019 13:52:16 +0200
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >>
> >> > On Tue, 25 Jun 2019 03:19:22 +0000
> >> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
> >> >
> >> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
> >> >>
> >> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
> >> >>
> >> >> > This commit implements the basic functionality of drop/pass logic in the
> >> >> > ena driver.
> >> >>
> >> >> Usually we require a driver to implement all the XDP return codes,
> >> >> before we accept it. But as Daniel and I discussed with Zorik during
> >> >> NetConf[1], we are going to make an exception and accept the driver
> >> >> if you also implement XDP_TX.
> >> >>
> >> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
> >> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
> >> >>
> >> >> Jesper, thanks for your comments and very helpful discussion during
> >> >> NetConf! That's the plan, as we agreed. From our side I would like to
> >> >> reiterate again the importance of multi-buffer support by xdp frame.
> >> >> We would really prefer not to see our MTU shrinking because of xdp
> >> >> support.
> >> >
> >> > Okay we really need to make a serious attempt to find a way to support
> >> > multi-buffer packets with XDP. With the important criteria of not
> >> > hurting performance of the single-buffer per packet design.
> >> >
> >> > I've created a design document[2], that I will update based on our
> >> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> >> >
> >> > The use-case that really convinced me was Eric's packet header-split.
Thanks for starting this discussion Jesper!
> >> >
> >> >
> >> > Lets refresh: Why XDP don't have multi-buffer support:
> >> >
> >> > XDP is designed for maximum performance, which is why certain driver-level
> >> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
> >> > As it e.g. complicated the driver RX-loop and memory model handling.
> >> >
> >> > The single buffer per packet design, is also tied into eBPF Direct-Access
> >> > (DA) to packet data, which can only be allowed if the packet memory is in
> >> > contiguous memory. This DA feature is essential for XDP performance.
> >> >
> >> >
> >> > One way forward is to define that XDP only get access to the first
> >> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
> >> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
> >> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
> >>
> >> Yeah, I think this would be reasonable. As long as we can have a
> >> metadata field with the full length + still give XDP programs the
> >> ability to truncate the packet (i.e., discard the subsequent pages)
> >
> > You touch upon some interesting complications already:
> >
> > 1. It is valuable for XDP bpf_prog to know "full" length?
> > (if so, then we need to extend xdp ctx with info)
>
> Valuable, quite likely. A hard requirement, probably not (for all use
> cases).
Agreed.
One common validation use would be to drop any packets whose header
length disagrees with the actual packet length.
> > But if we need to know the full length, when the first-buffer is
> > processed. Then realize that this affect the drivers RX-loop, because
> > then we need to "collect" all the buffers before we can know the
> > length (although some HW provide this in first descriptor).
> >
> > We likely have to change drivers RX-loop anyhow, as XDP_TX and
> > XDP_REDIRECT will also need to "collect" all buffers before the packet
> > can be forwarded. (Although this could potentially happen later in
> > driver loop when it meet/find the End-Of-Packet descriptor bit).
Yes, this might be quite a bit of refactoring of device driver code.
Should we move forward with some initial constraints, e.g., no
XDP_REDIRECT, no "full" length and no bpf_xdp_adjust_tail?
That already allows many useful programs.
As long as we don't arrive at a design that cannot be extended with
those features later.
> >
> >
> > 2. Can we even allow helper bpf_xdp_adjust_tail() ?
> >
> > Wouldn't it be easier to disallow a BPF-prog with this helper, when
> > driver have configured multi-buffer?
>
> Easier, certainly. But then it's even easier to not implement this at
> all ;)
>
> > Or will it be too restrictive, if jumbo-frame is very uncommon and
> > only enabled because switch infra could not be changed (like Amazon
> > case).
Header-split, LRO and jumbo frame are certainly not limited to the Amazon case.
> I think it would be preferable to support it; but maybe we can let that
> depend on how difficult it actually turns out to be to allow it?
>
> > Perhaps it is better to let bpf_xdp_adjust_tail() fail runtime?
>
> If we do disallow it, I think I'd lean towards failing the call at
> runtime...
Disagree. I'd rather have a program fail at load if it depends on
multi-frag support while the (driver) implementation does not yet
support it.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 15:20 ` Willem de Bruijn
@ 2019-06-26 16:42 ` Jonathan Lemon
2019-06-26 20:00 ` Jesper Dangaard Brouer
2019-06-28 8:46 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 28+ messages in thread
From: Jonathan Lemon @ 2019-06-26 16:42 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
Machulsky, Zorik, Jubran, Samih, davem, netdev, Woodhouse, David,
Matushevsky, Alexander, Bshara, Saeed, Wilson, Matt,
Liguori, Anthony, Bshara, Nafea, Tzalik, Guy, Belgazal, Netanel,
Saidi, Ali, Herrenschmidt, Benjamin, Kiyanovski, Arthur,
Daniel Borkmann, Ilias Apalodimas, Alexei Starovoitov,
Jakub Kicinski, xdp-newbies
On 26 Jun 2019, at 8:20, Willem de Bruijn wrote:
> On Wed, Jun 26, 2019 at 11:01 AM Toke Høiland-Jørgensen
> <toke@redhat.com> wrote:
>>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>>> On Wed, 26 Jun 2019 13:52:16 +0200
>>> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>>>
>>>>> On Tue, 25 Jun 2019 03:19:22 +0000
>>>>> "Machulsky, Zorik" <zorik@amazon.com> wrote:
>>>>>
>>>>>> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer"
>>>>>> <brouer@redhat.com> wrote:
>>>>>>
>>>>>> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com>
>>>>>> wrote:
>>>>>>
>>>>>> > This commit implements the basic functionality of drop/pass
>>>>>> logic in the
>>>>>> > ena driver.
>>>>>>
>>>>>> Usually we require a driver to implement all the XDP return
>>>>>> codes,
>>>>>> before we accept it. But as Daniel and I discussed with
>>>>>> Zorik during
>>>>>> NetConf[1], we are going to make an exception and accept the
>>>>>> driver
>>>>>> if you also implement XDP_TX.
>>>>>>
>>>>>> As we trust that Zorik/Amazon will follow and implement
>>>>>> XDP_REDIRECT
>>>>>> later, given he/you wants AF_XDP support which requires
>>>>>> XDP_REDIRECT.
>>>>>>
>>>>>> Jesper, thanks for your comments and very helpful discussion
>>>>>> during
>>>>>> NetConf! That's the plan, as we agreed. From our side I would
>>>>>> like to
>>>>>> reiterate again the importance of multi-buffer support by xdp
>>>>>> frame.
>>>>>> We would really prefer not to see our MTU shrinking because of
>>>>>> xdp
>>>>>> support.
>>>>>
>>>>> Okay we really need to make a serious attempt to find a way to
>>>>> support
>>>>> multi-buffer packets with XDP. With the important criteria of not
>>>>> hurting performance of the single-buffer per packet design.
>>>>>
>>>>> I've created a design document[2], that I will update based on our
>>>>> discussions: [2]
>>>>> https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
>>>>>
>>>>> The use-case that really convinced me was Eric's packet
>>>>> header-split.
>
> Thanks for starting this discussion Jesper!
>
>>>>>
>>>>>
>>>>> Lets refresh: Why XDP don't have multi-buffer support:
>>>>>
>>>>> XDP is designed for maximum performance, which is why certain
>>>>> driver-level
>>>>> use-cases were not supported, like multi-buffer packets (like
>>>>> jumbo-frames).
>>>>> As it e.g. complicated the driver RX-loop and memory model
>>>>> handling.
>>>>>
>>>>> The single buffer per packet design, is also tied into eBPF
>>>>> Direct-Access
>>>>> (DA) to packet data, which can only be allowed if the packet
>>>>> memory is in
>>>>> contiguous memory. This DA feature is essential for XDP
>>>>> performance.
>>>>>
>>>>>
>>>>> One way forward is to define that XDP only get access to the first
>>>>> packet buffer, and it cannot see subsequent buffers. For XDP_TX
>>>>> and
>>>>> XDP_REDIRECT to work then XDP still need to carry pointers (plus
>>>>> len+offset) to the other buffers, which is 16 bytes per extra
>>>>> buffer.
>>>>
>>>> Yeah, I think this would be reasonable. As long as we can have a
>>>> metadata field with the full length + still give XDP programs the
>>>> ability to truncate the packet (i.e., discard the subsequent pages)
>>>
>>> You touch upon some interesting complications already:
>>>
>>> 1. It is valuable for XDP bpf_prog to know "full" length?
>>> (if so, then we need to extend xdp ctx with info)
>>
>> Valuable, quite likely. A hard requirement, probably not (for all use
>> cases).
>
> Agreed.
>
> One common validation use would be to drop any packets whose header
> length disagrees with the actual packet length.
>
>>> But if we need to know the full length, when the first-buffer is
>>> processed. Then realize that this affect the drivers RX-loop,
>>> because
>>> then we need to "collect" all the buffers before we can know the
>>> length (although some HW provide this in first descriptor).
>>>
>>> We likely have to change drivers RX-loop anyhow, as XDP_TX and
>>> XDP_REDIRECT will also need to "collect" all buffers before the
>>> packet
>>> can be forwarded. (Although this could potentially happen later in
>>> driver loop when it meet/find the End-Of-Packet descriptor bit).
>
> Yes, this might be quite a bit of refactoring of device driver code.
>
> Should we move forward with some initial constraints, e.g., no
> XDP_REDIRECT, no "full" length and no bpf_xdp_adjust_tail?
>
> That already allows many useful programs.
>
> As long as we don't arrive at a design that cannot be extended with
> those features later.
I think collecting all frames until EOP and then processing them
at once sounds reasonable.
>>> 2. Can we even allow helper bpf_xdp_adjust_tail() ?
>>>
>>> Wouldn't it be easier to disallow a BPF-prog with this helper, when
>>> driver have configured multi-buffer?
>>
>> Easier, certainly. But then it's even easier to not implement this at
>> all ;)
>>
>>> Or will it be too restrictive, if jumbo-frame is very uncommon and
>>> only enabled because switch infra could not be changed (like Amazon
>>> case).
>
> Header-split, LRO and jumbo frame are certainly not limited to the
> Amazon case.
>
>> I think it would be preferable to support it; but maybe we can let
>> that
>> depend on how difficult it actually turns out to be to allow it?
>>
>>> Perhaps it is better to let bpf_xdp_adjust_tail() fail runtime?
>>
>> If we do disallow it, I think I'd lean towards failing the call at
>> runtime...
>
> Disagree. I'd rather have a program fail at load if it depends on
> multi-frag support while the (driver) implementation does not yet
> support it.
If all packets are collected together (like the bulk queue does), and
then
passed to XDP, this could easily be made backwards compatible. If the
XDP
program isn't 'multi-frag' aware, then each packet is just passed in
individually.
Of course, passing in the equivalent of a iovec requires some form of
loop
support on the BPF side, doesn't it?
--
Jonathan
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 16:42 ` Jonathan Lemon
@ 2019-06-26 20:00 ` Jesper Dangaard Brouer
2019-06-27 22:07 ` Jonathan Lemon
0 siblings, 1 reply; 28+ messages in thread
From: Jesper Dangaard Brouer @ 2019-06-26 20:00 UTC (permalink / raw)
To: Jonathan Lemon
Cc: Willem de Bruijn, Toke Høiland-Jørgensen,
Machulsky, Zorik, Jubran, Samih, davem, netdev, Woodhouse, David,
Matushevsky, Alexander, Bshara, Saeed, Wilson, Matt,
Liguori, Anthony, Bshara, Nafea, Tzalik, Guy, Belgazal, Netanel,
Saidi, Ali, Herrenschmidt, Benjamin, Kiyanovski, Arthur,
Daniel Borkmann, Ilias Apalodimas, Alexei Starovoitov,
Jakub Kicinski, xdp-newbies, brouer
On Wed, 26 Jun 2019 09:42:07 -0700 "Jonathan Lemon" <jonathan.lemon@gmail.com> wrote:
> If all packets are collected together (like the bulk queue does), and
> then passed to XDP, this could easily be made backwards compatible.
> If the XDP program isn't 'multi-frag' aware, then each packet is just
> passed in individually.
My proposal#1 is XDP only access first-buffer[1], as this simplifies things.
(AFAIK) What you are proposing is that all the buffers are passed to
the XDP prog (in form of a iovec). I need some more details about your
suggestion.
Specifically:
- What is the semantic when a 3 buffer packet is input and XDP prog
choose to return XDP_DROP for packet #2 ?
- Same situation of packet #2 wants a XDP_TX or redirect?
> Of course, passing in the equivalent of a iovec requires some form of
> loop support on the BPF side, doesn't it?
The data structure used for holding these packet buffers/segments also
needs to be discussed. I would either use an array of bio_vec[2] or
skb_frag_t (aka skb_frag_struct). The skb_frag_t would be most
obvious, as we already have to write this when creating an SKB, in
skb_shared_info area. (Structs listed below signature).
The problem is also that size of these structs (16 bytes) per
buffer/segment, and we likely need to support 17 segments, as this need
to be compatible with SKBs (size 272 bytes).
My idea here is that we simply use the same memory area, that we have to
store skb_shared_info into. As this allow us to get the SKB setup for
free, when doing XDP_PASS or when doing SKB alloc after XDP_REDIRECT.
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org#proposal1-xdp-only-access-first-buffer
[2] https://lore.kernel.org/netdev/20190501041757.8647-1-willy@infradead.org/
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
$ pahole -C skb_frag_struct vmlinux
struct skb_frag_struct {
struct {
struct page * p; /* 0 8 */
} page; /* 0 8 */
__u32 page_offset; /* 8 4 */
__u32 size; /* 12 4 */
/* size: 16, cachelines: 1, members: 3 */
/* last cacheline: 16 bytes */
};
$ pahole -C bio_vec vmlinux
struct bio_vec {
struct page * bv_page; /* 0 8 */
unsigned int bv_len; /* 8 4 */
unsigned int bv_offset; /* 12 4 */
/* size: 16, cachelines: 1, members: 3 */
/* last cacheline: 16 bytes */
};
$ pahole -C skb_shared_info vmlinux
struct skb_shared_info {
__u8 __unused; /* 0 1 */
__u8 meta_len; /* 1 1 */
__u8 nr_frags; /* 2 1 */
__u8 tx_flags; /* 3 1 */
short unsigned int gso_size; /* 4 2 */
short unsigned int gso_segs; /* 6 2 */
struct sk_buff * frag_list; /* 8 8 */
struct skb_shared_hwtstamps hwtstamps; /* 16 8 */
unsigned int gso_type; /* 24 4 */
u32 tskey; /* 28 4 */
atomic_t dataref; /* 32 0 */
/* XXX 8 bytes hole, try to pack */
void * destructor_arg; /* 40 8 */
skb_frag_t frags[17]; /* 48 272 */
/* size: 320, cachelines: 5, members: 13 */
/* sum members: 312, holes: 1, sum holes: 8 */
};
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 20:00 ` Jesper Dangaard Brouer
@ 2019-06-27 22:07 ` Jonathan Lemon
0 siblings, 0 replies; 28+ messages in thread
From: Jonathan Lemon @ 2019-06-27 22:07 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Willem de Bruijn, Toke Høiland-Jørgensen,
Machulsky, Zorik, Jubran, Samih, davem, netdev, Woodhouse, David,
Matushevsky, Alexander, Bshara, Saeed, Wilson, Matt,
Liguori, Anthony, Bshara, Nafea, Tzalik, Guy, Belgazal, Netanel,
Saidi, Ali, Herrenschmidt, Benjamin, Kiyanovski, Arthur,
Daniel Borkmann, Ilias Apalodimas, Alexei Starovoitov,
Jakub Kicinski, xdp-newbies
On 26 Jun 2019, at 13:00, Jesper Dangaard Brouer wrote:
> On Wed, 26 Jun 2019 09:42:07 -0700 "Jonathan Lemon"
> <jonathan.lemon@gmail.com> wrote:
>
>> If all packets are collected together (like the bulk queue does), and
>> then passed to XDP, this could easily be made backwards compatible.
>> If the XDP program isn't 'multi-frag' aware, then each packet is just
>> passed in individually.
>
> My proposal#1 is XDP only access first-buffer[1], as this simplifies
> things.
>
> (AFAIK) What you are proposing is that all the buffers are passed to
> the XDP prog (in form of a iovec). I need some more details about
> your
> suggestion.
I was thinking this over yesterday - and was probably conflating packets
and buffers a bit. Suppose that for the purposes of this discussion,
we're
talking about a single packet that is split over multiple buffer areas.
Say, on RX, with header split:
buf[0] = header
buf[1] = data
For LRO (hw recv) and jumbo frames (and TSO):
buf[0] = hdr + data
buf[1] = data
buf[n] = data
GRO cases, where individual packets are reassembled by software, aren't
handled here.
> Specifically:
>
> - What is the semantic when a 3 buffer packet is input and XDP prog
> choose to return XDP_DROP for packet #2 ?
>
> - Same situation of packet #2 wants a XDP_TX or redirect?
The collection of buffers represents a single packet, so this isn't
applicable here, right?
However, just thinking about incomplete data words (aka: pullup) gives
me a headache - seems this would complicate the BPF/verifier quite a
bit.
So perhaps just restricting things to the first entry would do for now?
As far as the exact data structure used to hold the buffers, it would
be nice if it had the same layout as a bio_vec, in case someone wanted
to get clever and start transferring things over directly.
--
Jonathan
>> Of course, passing in the equivalent of a iovec requires some form of
>> loop support on the BPF side, doesn't it?
>
> The data structure used for holding these packet buffers/segments also
> needs to be discussed. I would either use an array of bio_vec[2] or
> skb_frag_t (aka skb_frag_struct). The skb_frag_t would be most
> obvious, as we already have to write this when creating an SKB, in
> skb_shared_info area. (Structs listed below signature).
>
> The problem is also that size of these structs (16 bytes) per
> buffer/segment, and we likely need to support 17 segments, as this
> need
> to be compatible with SKBs (size 272 bytes).
>
> My idea here is that we simply use the same memory area, that we have
> to
> store skb_shared_info into. As this allow us to get the SKB setup for
> free, when doing XDP_PASS or when doing SKB alloc after XDP_REDIRECT.
>
>
> [1]
> https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org#proposal1-xdp-only-access-first-buffer
>
> [2]
> https://lore.kernel.org/netdev/20190501041757.8647-1-willy@infradead.org/
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
>
>
> $ pahole -C skb_frag_struct vmlinux
> struct skb_frag_struct {
> struct {
> struct page * p; /* 0 8 */
> } page; /* 0 8 */
> __u32 page_offset; /* 8 4 */
> __u32 size; /* 12 4 */
>
> /* size: 16, cachelines: 1, members: 3 */
> /* last cacheline: 16 bytes */
> };
>
> $ pahole -C bio_vec vmlinux
> struct bio_vec {
> struct page * bv_page; /* 0 8 */
> unsigned int bv_len; /* 8 4 */
> unsigned int bv_offset; /* 12 4 */
>
> /* size: 16, cachelines: 1, members: 3 */
> /* last cacheline: 16 bytes */
> };
>
> $ pahole -C skb_shared_info vmlinux
> struct skb_shared_info {
> __u8 __unused; /* 0 1 */
> __u8 meta_len; /* 1 1 */
> __u8 nr_frags; /* 2 1 */
> __u8 tx_flags; /* 3 1 */
> short unsigned int gso_size; /* 4 2 */
> short unsigned int gso_segs; /* 6 2 */
> struct sk_buff * frag_list; /* 8 8 */
> struct skb_shared_hwtstamps hwtstamps; /* 16 8 */
> unsigned int gso_type; /* 24 4 */
> u32 tskey; /* 28 4 */
> atomic_t dataref; /* 32 0 */
>
> /* XXX 8 bytes hole, try to pack */
>
> void * destructor_arg; /* 40 8 */
> skb_frag_t frags[17]; /* 48 272 */
>
> /* size: 320, cachelines: 5, members: 13 */
> /* sum members: 312, holes: 1, sum holes: 8 */
> };
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 15:20 ` Willem de Bruijn
2019-06-26 16:42 ` Jonathan Lemon
@ 2019-06-28 8:46 ` Jesper Dangaard Brouer
1 sibling, 0 replies; 28+ messages in thread
From: Jesper Dangaard Brouer @ 2019-06-28 8:46 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Toke Høiland-Jørgensen, Machulsky, Zorik, Jubran, Samih,
davem@davemloft.net, netdev@vger.kernel.org, Woodhouse, David,
Matushevsky, Alexander, Bshara, Saeed, Wilson, Matt,
Liguori, Anthony, Bshara, Nafea, Tzalik, Guy, Belgazal, Netanel,
Saidi, Ali, Herrenschmidt, Benjamin, Kiyanovski, Arthur,
Daniel Borkmann, Ilias Apalodimas, Alexei Starovoitov,
Jakub Kicinski, xdp-newbies@vger.kernel.org, brouer
On Wed, 26 Jun 2019 11:20:45 -0400 Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> On Wed, Jun 26, 2019 at 11:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > Jesper Dangaard Brouer <brouer@redhat.com> writes:
> > > On Wed, 26 Jun 2019 13:52:16 +0200
> > > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> > >>
[...]
> > >
> > > You touch upon some interesting complications already:
> > >
> > > 1. It is valuable for XDP bpf_prog to know "full" length?
> > > (if so, then we need to extend xdp ctx with info)
> >
> > Valuable, quite likely. A hard requirement, probably not (for all use
> > cases).
>
> Agreed.
>
> One common validation use would be to drop any packets whose header
> length disagrees with the actual packet length.
That is a good point.
Added a section "XDP access to full packet length?" to capture this:
- https://github.com/xdp-project/xdp-project/commit/da5b84264b85b0d
- https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org#xdp-access-to-full-packet-length
> > > But if we need to know the full length, when the first-buffer is
> > > processed. Then realize that this affect the drivers RX-loop, because
> > > then we need to "collect" all the buffers before we can know the
> > > length (although some HW provide this in first descriptor).
> > >
> > > We likely have to change drivers RX-loop anyhow, as XDP_TX and
> > > XDP_REDIRECT will also need to "collect" all buffers before the packet
> > > can be forwarded. (Although this could potentially happen later in
> > > driver loop when it meet/find the End-Of-Packet descriptor bit).
>
> Yes, this might be quite a bit of refactoring of device driver code.
>
> Should we move forward with some initial constraints, e.g., no
> XDP_REDIRECT, no "full" length and no bpf_xdp_adjust_tail?
I generally like this...
If not adding "full" length. Maybe we could add an indication to
XDP-developer, that his is a multi-buffer/multi-segment packet, such
that header length validation code against packet length (data_end-data)
is not possible. This is user visible, so we would have to keep it
forever... I'm leaning towards adding "full" length from beginning.
> That already allows many useful programs.
>
> As long as we don't arrive at a design that cannot be extended with
> those features later.
That is the important part...
> > > 2. Can we even allow helper bpf_xdp_adjust_tail() ?
[...]
> >
> > > Perhaps it is better to let bpf_xdp_adjust_tail() fail runtime?
> >
> > If we do disallow it, I think I'd lean towards failing the call at
> > runtime...
>
> Disagree. I'd rather have a program fail at load if it depends on
> multi-frag support while the (driver) implementation does not yet
> support it.
I usually agree that we should fail the program, early at load time.
For XDP we are unfortunately missing some knobs to do this, see[1].
Specifically for bpf_xdp_adjust_tail(), it might be better to fail
runtime. Because, the driver might have enabled TSO for TCP packets,
while your XDP use-case is for adjusting UDP-packets (and do XDP level
replies), which will never see multi-buffer packets. If we fail use of
bpf_xdp_adjust_tail(), then you would have to disable TSO to allow
loading your XDP-prog, hurting the other TSO-TCP use-case.
[1] http://vger.kernel.org/netconf2019_files/xdp-feature-detection.pdf
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 14:40 ` Jesper Dangaard Brouer
@ 2019-06-26 15:14 ` Toke Høiland-Jørgensen
-1 siblings, 0 replies; 28+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-06-26 15:14 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Machulsky, Zorik, Jubran, Samih, davem@davemloft.net,
netdev@vger.kernel.org, Woodhouse, David, Matushevsky, Alexander,
Bshara, Saeed, Wilson, Matt, Liguori, Anthony, Bshara, Nafea,
Tzalik, Guy, Belgazal, Netanel, Saidi, Ali,
Herrenschmidt, Benjamin, Kiyanovski, Arthur, Daniel Borkmann,
Ilias Apalodimas, Alexei Starovoitov, Jakub Kicinski,
xdp-newbies@vger.kernel.org
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Wed, 26 Jun 2019 13:52:16 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>> > On Tue, 25 Jun 2019 03:19:22 +0000
>> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
>> >
>> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
>> >>
>> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
>> >>
>> >> > This commit implements the basic functionality of drop/pass logic in the
>> >> > ena driver.
>> >>
>> >> Usually we require a driver to implement all the XDP return codes,
>> >> before we accept it. But as Daniel and I discussed with Zorik during
>> >> NetConf[1], we are going to make an exception and accept the driver
>> >> if you also implement XDP_TX.
>> >>
>> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
>> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
>> >>
>> >> Jesper, thanks for your comments and very helpful discussion during
>> >> NetConf! That's the plan, as we agreed. From our side I would like to
>> >> reiterate again the importance of multi-buffer support by xdp frame.
>> >> We would really prefer not to see our MTU shrinking because of xdp
>> >> support.
>> >
>> > Okay we really need to make a serious attempt to find a way to support
>> > multi-buffer packets with XDP. With the important criteria of not
>> > hurting performance of the single-buffer per packet design.
>> >
>> > I've created a design document[2], that I will update based on our
>> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
>> >
>> > The use-case that really convinced me was Eric's packet header-split.
>> >
>> >
>> > Lets refresh: Why XDP don't have multi-buffer support:
>> >
>> > XDP is designed for maximum performance, which is why certain driver-level
>> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
>> > As it e.g. complicated the driver RX-loop and memory model handling.
>> >
>> > The single buffer per packet design, is also tied into eBPF Direct-Access
>> > (DA) to packet data, which can only be allowed if the packet memory is in
>> > contiguous memory. This DA feature is essential for XDP performance.
>> >
>> >
>> > One way forward is to define that XDP only get access to the first
>> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
>> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
>> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
>>
>> Yeah, I think this would be reasonable. As long as we can have a
>> metadata field with the full length + still give XDP programs the
>> ability to truncate the packet (i.e., discard the subsequent pages)
>
> You touch upon some interesting complications already:
>
> 1. It is valuable for XDP bpf_prog to know "full" length?
> (if so, then we need to extend xdp ctx with info)
>
> But if we need to know the full length, when the first-buffer is
> processed. Then realize that this affect the drivers RX-loop, because
> then we need to "collect" all the buffers before we can know the
> length (although some HW provide this in first descriptor).
>
> We likely have to change drivers RX-loop anyhow, as XDP_TX and
> XDP_REDIRECT will also need to "collect" all buffers before the packet
> can be forwarded. (Although this could potentially happen later in
> driver loop when it meet/find the End-Of-Packet descriptor bit).
A few more points (mostly thinking out loud here):
- In any case we probably need to loop through the subsequent
descriptors in all cases, right? (i.e., if we run XDP on first
segment, and that returns DROP, the rest that are part of the packet
still need to be discarded). So we may as well delay XDP execution
until we have the whole packet?
- Will this allow us to run XDP on hardware-assembled GRO super-packets?
-Toke
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
@ 2019-06-26 15:14 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 28+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-06-26 15:14 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Machulsky, Zorik, Jubran, Samih, davem@davemloft.net,
netdev@vger.kernel.org, Woodhouse, David, Matushevsky, Alexander,
Bshara, Saeed, Wilson, Matt, Liguori, Anthony, Bshara, Nafea,
Tzalik, Guy, Belgazal, Netanel, Saidi, Ali,
Herrenschmidt, Benjamin, Kiyanovski, Arthur, Daniel Borkmann,
Ilias Apalodimas, Alexei Starovoitov, Jakub Kicinski,
xdp-newbies@vger.kernel.org, brouer
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Wed, 26 Jun 2019 13:52:16 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>> > On Tue, 25 Jun 2019 03:19:22 +0000
>> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
>> >
>> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
>> >>
>> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
>> >>
>> >> > This commit implements the basic functionality of drop/pass logic in the
>> >> > ena driver.
>> >>
>> >> Usually we require a driver to implement all the XDP return codes,
>> >> before we accept it. But as Daniel and I discussed with Zorik during
>> >> NetConf[1], we are going to make an exception and accept the driver
>> >> if you also implement XDP_TX.
>> >>
>> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
>> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
>> >>
>> >> Jesper, thanks for your comments and very helpful discussion during
>> >> NetConf! That's the plan, as we agreed. From our side I would like to
>> >> reiterate again the importance of multi-buffer support by xdp frame.
>> >> We would really prefer not to see our MTU shrinking because of xdp
>> >> support.
>> >
>> > Okay we really need to make a serious attempt to find a way to support
>> > multi-buffer packets with XDP. With the important criteria of not
>> > hurting performance of the single-buffer per packet design.
>> >
>> > I've created a design document[2], that I will update based on our
>> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
>> >
>> > The use-case that really convinced me was Eric's packet header-split.
>> >
>> >
>> > Lets refresh: Why XDP don't have multi-buffer support:
>> >
>> > XDP is designed for maximum performance, which is why certain driver-level
>> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
>> > As it e.g. complicated the driver RX-loop and memory model handling.
>> >
>> > The single buffer per packet design, is also tied into eBPF Direct-Access
>> > (DA) to packet data, which can only be allowed if the packet memory is in
>> > contiguous memory. This DA feature is essential for XDP performance.
>> >
>> >
>> > One way forward is to define that XDP only get access to the first
>> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
>> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
>> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
>>
>> Yeah, I think this would be reasonable. As long as we can have a
>> metadata field with the full length + still give XDP programs the
>> ability to truncate the packet (i.e., discard the subsequent pages)
>
> You touch upon some interesting complications already:
>
> 1. It is valuable for XDP bpf_prog to know "full" length?
> (if so, then we need to extend xdp ctx with info)
>
> But if we need to know the full length, when the first-buffer is
> processed. Then realize that this affect the drivers RX-loop, because
> then we need to "collect" all the buffers before we can know the
> length (although some HW provide this in first descriptor).
>
> We likely have to change drivers RX-loop anyhow, as XDP_TX and
> XDP_REDIRECT will also need to "collect" all buffers before the packet
> can be forwarded. (Although this could potentially happen later in
> driver loop when it meet/find the End-Of-Packet descriptor bit).
A few more points (mostly thinking out loud here):
- In any case we probably need to loop through the subsequent
descriptors in all cases, right? (i.e., if we run XDP on first
segment, and that returns DROP, the rest that are part of the packet
still need to be discarded). So we may as well delay XDP execution
until we have the whole packet?
- Will this allow us to run XDP on hardware-assembled GRO super-packets?
-Toke
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
2019-06-26 15:14 ` Toke Høiland-Jørgensen
@ 2019-06-26 16:36 ` Jesper Dangaard Brouer
-1 siblings, 0 replies; 28+ messages in thread
From: Jesper Dangaard Brouer @ 2019-06-26 16:36 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Machulsky, Zorik, Jubran, Samih, davem@davemloft.net,
netdev@vger.kernel.org, Woodhouse, David, Matushevsky, Alexander,
Bshara, Saeed, Wilson, Matt, Liguori, Anthony, Bshara, Nafea,
Tzalik, Guy, Belgazal, Netanel, Saidi, Ali,
Herrenschmidt, Benjamin, Kiyanovski, Arthur, Daniel Borkmann,
Ilias Apalodimas, Alexei Starovoitov, Jakub Kicinski,
xdp-newbies@vger.kernel.org, brouer
On Wed, 26 Jun 2019 17:14:32 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > On Wed, 26 Jun 2019 13:52:16 +0200
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >>
> >> > On Tue, 25 Jun 2019 03:19:22 +0000
> >> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
> >> >
> >> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
> >> >>
> >> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
> >> >>
> >> >> > This commit implements the basic functionality of drop/pass logic in the
> >> >> > ena driver.
> >> >>
> >> >> Usually we require a driver to implement all the XDP return codes,
> >> >> before we accept it. But as Daniel and I discussed with Zorik during
> >> >> NetConf[1], we are going to make an exception and accept the driver
> >> >> if you also implement XDP_TX.
> >> >>
> >> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
> >> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
> >> >>
> >> >> Jesper, thanks for your comments and very helpful discussion during
> >> >> NetConf! That's the plan, as we agreed. From our side I would like to
> >> >> reiterate again the importance of multi-buffer support by xdp frame.
> >> >> We would really prefer not to see our MTU shrinking because of xdp
> >> >> support.
> >> >
> >> > Okay we really need to make a serious attempt to find a way to support
> >> > multi-buffer packets with XDP. With the important criteria of not
> >> > hurting performance of the single-buffer per packet design.
> >> >
> >> > I've created a design document[2], that I will update based on our
> >> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> >> >
> >> > The use-case that really convinced me was Eric's packet header-split.
> >> >
> >> >
> >> > Lets refresh: Why XDP don't have multi-buffer support:
> >> >
> >> > XDP is designed for maximum performance, which is why certain driver-level
> >> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
> >> > As it e.g. complicated the driver RX-loop and memory model handling.
> >> >
> >> > The single buffer per packet design, is also tied into eBPF Direct-Access
> >> > (DA) to packet data, which can only be allowed if the packet memory is in
> >> > contiguous memory. This DA feature is essential for XDP performance.
> >> >
> >> >
> >> > One way forward is to define that XDP only get access to the first
> >> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
> >> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
> >> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
> >>
> >> Yeah, I think this would be reasonable. As long as we can have a
> >> metadata field with the full length + still give XDP programs the
> >> ability to truncate the packet (i.e., discard the subsequent pages)
> >
> > You touch upon some interesting complications already:
> >
> > 1. It is valuable for XDP bpf_prog to know "full" length?
> > (if so, then we need to extend xdp ctx with info)
> >
> > But if we need to know the full length, when the first-buffer is
> > processed. Then realize that this affect the drivers RX-loop, because
> > then we need to "collect" all the buffers before we can know the
> > length (although some HW provide this in first descriptor).
> >
> > We likely have to change drivers RX-loop anyhow, as XDP_TX and
> > XDP_REDIRECT will also need to "collect" all buffers before the packet
> > can be forwarded. (Although this could potentially happen later in
> > driver loop when it meet/find the End-Of-Packet descriptor bit).
>
> A few more points (mostly thinking out loud here):
>
> - In any case we probably need to loop through the subsequent
> descriptors in all cases, right? (i.e., if we run XDP on first
> segment, and that returns DROP, the rest that are part of the packet
> still need to be discarded). So we may as well delay XDP execution
> until we have the whole packet?
For the XDP_DROP case, drivers usually have way to discard remaining
buffers/segments, to handle error cases. But it heavily depend on the
driver, how tricky/convoluted this code is...
Generally I would say it would make sense to delay XDP execution until
all buffers/segments are "collected". It would be the clean approach,
but will likely require refactoring of driver level code.
> - Will this allow us to run XDP on hardware-assembled GRO super-packets?
Big YES. This is usually called LRO or TSO packets. And yes, I also
want to support this use-case, which is also listed in [2]. If we go
down this road, this use-case is also important. (Especially related to
my alloc SKBs outside drivers[3], this is a hardware offload we must
support).
[2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[3] http://vger.kernel.org/netconf2019_files/xdp-metadata-discussion.pdf
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
@ 2019-06-26 16:36 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 28+ messages in thread
From: Jesper Dangaard Brouer @ 2019-06-26 16:36 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Machulsky, Zorik, Jubran, Samih, davem@davemloft.net,
netdev@vger.kernel.org, Woodhouse, David, Matushevsky, Alexander,
Bshara, Saeed, Wilson, Matt, Liguori, Anthony, Bshara, Nafea,
Tzalik, Guy, Belgazal, Netanel, Saidi, Ali,
Herrenschmidt, Benjamin, Kiyanovski, Arthur, Daniel Borkmann,
Ilias Apalodimas, Alexei Starovoitov, Jakub Kicinski,
xdp-newbies@vger.kernel.org, brouer
On Wed, 26 Jun 2019 17:14:32 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > On Wed, 26 Jun 2019 13:52:16 +0200
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >>
> >> > On Tue, 25 Jun 2019 03:19:22 +0000
> >> > "Machulsky, Zorik" <zorik@amazon.com> wrote:
> >> >
> >> >> On 6/23/19, 7:21 AM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
> >> >>
> >> >> On Sun, 23 Jun 2019 10:06:49 +0300 <sameehj@amazon.com> wrote:
> >> >>
> >> >> > This commit implements the basic functionality of drop/pass logic in the
> >> >> > ena driver.
> >> >>
> >> >> Usually we require a driver to implement all the XDP return codes,
> >> >> before we accept it. But as Daniel and I discussed with Zorik during
> >> >> NetConf[1], we are going to make an exception and accept the driver
> >> >> if you also implement XDP_TX.
> >> >>
> >> >> As we trust that Zorik/Amazon will follow and implement XDP_REDIRECT
> >> >> later, given he/you wants AF_XDP support which requires XDP_REDIRECT.
> >> >>
> >> >> Jesper, thanks for your comments and very helpful discussion during
> >> >> NetConf! That's the plan, as we agreed. From our side I would like to
> >> >> reiterate again the importance of multi-buffer support by xdp frame.
> >> >> We would really prefer not to see our MTU shrinking because of xdp
> >> >> support.
> >> >
> >> > Okay we really need to make a serious attempt to find a way to support
> >> > multi-buffer packets with XDP. With the important criteria of not
> >> > hurting performance of the single-buffer per packet design.
> >> >
> >> > I've created a design document[2], that I will update based on our
> >> > discussions: [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> >> >
> >> > The use-case that really convinced me was Eric's packet header-split.
> >> >
> >> >
> >> > Lets refresh: Why XDP don't have multi-buffer support:
> >> >
> >> > XDP is designed for maximum performance, which is why certain driver-level
> >> > use-cases were not supported, like multi-buffer packets (like jumbo-frames).
> >> > As it e.g. complicated the driver RX-loop and memory model handling.
> >> >
> >> > The single buffer per packet design, is also tied into eBPF Direct-Access
> >> > (DA) to packet data, which can only be allowed if the packet memory is in
> >> > contiguous memory. This DA feature is essential for XDP performance.
> >> >
> >> >
> >> > One way forward is to define that XDP only get access to the first
> >> > packet buffer, and it cannot see subsequent buffers. For XDP_TX and
> >> > XDP_REDIRECT to work then XDP still need to carry pointers (plus
> >> > len+offset) to the other buffers, which is 16 bytes per extra buffer.
> >>
> >> Yeah, I think this would be reasonable. As long as we can have a
> >> metadata field with the full length + still give XDP programs the
> >> ability to truncate the packet (i.e., discard the subsequent pages)
> >
> > You touch upon some interesting complications already:
> >
> > 1. It is valuable for XDP bpf_prog to know "full" length?
> > (if so, then we need to extend xdp ctx with info)
> >
> > But if we need to know the full length, when the first-buffer is
> > processed. Then realize that this affect the drivers RX-loop, because
> > then we need to "collect" all the buffers before we can know the
> > length (although some HW provide this in first descriptor).
> >
> > We likely have to change drivers RX-loop anyhow, as XDP_TX and
> > XDP_REDIRECT will also need to "collect" all buffers before the packet
> > can be forwarded. (Although this could potentially happen later in
> > driver loop when it meet/find the End-Of-Packet descriptor bit).
>
> A few more points (mostly thinking out loud here):
>
> - In any case we probably need to loop through the subsequent
> descriptors in all cases, right? (i.e., if we run XDP on first
> segment, and that returns DROP, the rest that are part of the packet
> still need to be discarded). So we may as well delay XDP execution
> until we have the whole packet?
For the XDP_DROP case, drivers usually have way to discard remaining
buffers/segments, to handle error cases. But it heavily depend on the
driver, how tricky/convoluted this code is...
Generally I would say it would make sense to delay XDP execution until
all buffers/segments are "collected". It would be the clean approach,
but will likely require refactoring of driver level code.
> - Will this allow us to run XDP on hardware-assembled GRO super-packets?
Big YES. This is usually called LRO or TSO packets. And yes, I also
want to support this use-case, which is also listed in [2]. If we go
down this road, this use-case is also important. (Especially related to
my alloc SKBs outside drivers[3], this is a hardware offload we must
support).
[2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[3] http://vger.kernel.org/netconf2019_files/xdp-metadata-discussion.pdf
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 28+ messages in thread