* [PATCH 00/11] XDP unaligned chunk placement support
@ 2019-06-20 9:09 Kevin Laatz
0 siblings, 0 replies; 10+ messages in thread
From: Kevin Laatz @ 2019-06-20 9:09 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson
Cc: bpf, intel-wired-lan, bruce.richardson, ciara.loftus, Kevin Laatz
This patchset adds the ability to use unaligned chunks in the XDP umem.
Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as limiting the packet sizes that are
supported.
The changes in this patchset remove these restrictions, allowing XDP to be
more flexible in where it can place a chunk within a umem. Relaxing the
placement rules lets an application use an arbitrary buffer size and place
it wherever there is a free address in the umem. These changes add the
ability to support jumbo frames and make it easy to integrate with other
existing frameworks that have their own memory management systems, such as
DPDK.
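To make the difference concrete, here is a minimal sketch (illustrative
only, these are not the kernel's actual helpers) of how a chunk's base
address is recovered in the two modes:

#include <stdint.h>

/* Aligned mode: chunk size is a power of two (2k default, PAGE_SIZE max),
 * so the base address can be recovered with a single mask.  Unaligned
 * mode: the address handed in on the fill ring is used as-is, so any
 * placement within the umem is valid.
 */
static uint64_t chunk_base_aligned(uint64_t addr, uint64_t chunk_size)
{
	return addr & ~(chunk_size - 1);
}

static uint64_t chunk_base_unaligned(uint64_t addr)
{
	return addr;	/* no masking: the application chose the placement */
}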
Structure of the patchset:
Patch 1:
- Remove unnecessary masking and headroom addition during zero-copy Rx
buffer recycling in i40e. This change is required in order for the
buffer recycling to work in the unaligned chunk mode.
Patch 2:
- Remove unnecessary masking and headroom addition during
zero-copy Rx buffer recycling in ixgbe. This change is required in
order for the buffer recycling to work in the unaligned chunk mode.
Patch 3:
- Adds an offset parameter to zero_copy_allocator. This change will
enable us to calculate the original handle in zca_free. This will be
required for unaligned chunk mode since we can't easily mask back to
the original handle.
Patch 4:
- Adds the offset parameter to i40e_zca_free. This change is needed for
calculating the handle since we can't easily mask back to the original
handle like we can in the aligned case.
Patch 5:
- Adds the offset parameter to ixgbe_zca_free. This change is needed for
calculating the handle since we can't easily mask back to the original
handle like we can in the aligned case.
Patch 6:
- Add infrastructure for unaligned chunks. Since we are dealing
with unaligned chunks that could potentially cross a physical page
boundary, we add checks to keep track of that information. We can
later use this information to correctly handle buffers that are
placed at an address where they cross a page boundary.
Patch 7:
- Add flags for umem configuration to libbpf (a usage sketch follows
this list)
Patch 8:
- Modify xdpsock application to add a command line option for
unaligned chunks
Patch 9:
- Addition of a command line argument to pass in a desired buffer size
and buffer recycling for unaligned mode. Passing in a buffer size
allows the application to use arbitrarily sized chunks in unaligned
chunk mode. Since we are now using unaligned chunks, we need to
recycle our buffers in a slightly different way.
Patch 10:
- Adds hugepage support to the xdpsock application
Patch 11:
- Documentation update to include the unaligned chunk scenario. We need
to explicitly state that the incoming addresses are only masked in the
aligned chunk mode and not the unaligned chunk mode.
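For reference, a hedged sketch of what the libbpf change in patch 7
enables. The flag name XDP_UMEM_UNALIGNED_CHUNK_FLAG and the flags field
in struct xsk_umem_config are assumed from this series' uapi/libbpf
additions and may differ in the final code:

#include <bpf/xsk.h>
#include <linux/if_xdp.h>

static int create_unaligned_umem(void *bufs, __u64 size,
				 struct xsk_umem **umem,
				 struct xsk_ring_prod *fq,
				 struct xsk_ring_cons *cq,
				 __u32 frame_size)
{
	struct xsk_umem_config cfg = {
		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size = frame_size,	/* arbitrary size, not 2^n */
		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,	/* assumed name */
	};

	/* bufs would typically be hugepage-backed memory (see patch 10) */
	return xsk_umem__create(umem, bufs, size, fq, cq, &cfg);
}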
Kevin Laatz (11):
i40e: simplify Rx buffer recycle
ixgbe: simplify Rx buffer recycle
xdp: add offset param to zero_copy_allocator
i40e: add offset to zca_free
ixgbe: add offset to zca_free
xsk: add support to allow unaligned chunk placement
libbpf: add flags to umem config
samples/bpf: add unaligned chunks mode support to xdpsock
samples/bpf: add buffer recycling for unaligned chunks to xdpsock
samples/bpf: use hugepages in xdpsock app
doc/af_xdp: include unaligned chunk case
Documentation/networking/af_xdp.rst | 10 +-
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 21 ++--
drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 +-
.../ethernet/intel/ixgbe/ixgbe_txrx_common.h | 3 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 21 ++--
include/net/xdp.h | 3 +-
include/net/xdp_sock.h | 2 +
include/uapi/linux/if_xdp.h | 4 +
net/core/xdp.c | 11 ++-
net/xdp/xdp_umem.c | 17 ++--
net/xdp/xsk.c | 60 +++++++++--
net/xdp/xsk_queue.h | 60 +++++++++--
samples/bpf/xdpsock_user.c | 99 ++++++++++++++-----
tools/include/uapi/linux/if_xdp.h | 4 +
tools/lib/bpf/xsk.c | 7 ++
tools/lib/bpf/xsk.h | 2 +
16 files changed, 241 insertions(+), 86 deletions(-)
--
2.17.1
* Re: [PATCH 00/11] XDP unaligned chunk placement support
[not found] ` <20190627142534.4f4b8995@cakuba.netronome.com>
@ 2019-06-28 16:19 ` Laatz, Kevin
2019-06-28 16:51 ` Björn Töpel
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Laatz, Kevin @ 2019-06-28 16:19 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel, magnus.karlsson,
bpf, intel-wired-lan, bruce.richardson, ciara.loftus
On 27/06/2019 22:25, Jakub Kicinski wrote:
> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>> On the application side (xdpsock), we don't have to worry about the user
>> defined headroom, since it is 0, so we only need to account for the
>> XDP_PACKET_HEADROOM when computing the original address (in the default
>> scenario).
> That assumes specific layout for the data inside the buffer. Some NICs
> will prepend information like timestamp to the packet, meaning the
> packet would start at offset XDP_PACKET_HEADROOM + metadata len..
Yes, if NICs prepend extra data to the packet, that would be a problem for
using this feature in isolation. However, if we also add support for
in-order RX and TX rings, that would no longer be an issue. Even for NICs
which do prepend data, this patchset should not break anything that is
currently working.
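As a minimal sketch of that default scenario (not the actual xdpsock code,
and assuming zero user headroom), the original fill address is recovered by
stepping back over the kernel's XDP_PACKET_HEADROOM; a NIC-prepended
metadata area would shift the packet start and break this simple
subtraction:

#include <stdint.h>

#define XDP_PACKET_HEADROOM 256	/* kernel headroom before the packet */

static uint64_t orig_fill_addr(uint64_t rx_desc_addr)
{
	/* valid only with no user headroom and no NIC metadata prepend */
	return rx_desc_addr - XDP_PACKET_HEADROOM;
}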
>
> I think that's very limiting. What is the challenge in providing
> aligned addresses, exactly?
The challenges are two-fold:
1) it prevents using arbitrary buffer sizes, which will be an issue
supporting e.g. jumbo frames in future.
2) higher level user-space frameworks which may want to use AF_XDP, such
as DPDK, do not currently support having buffers with 'fixed' alignment.
The reason that DPDK uses arbitrary placement is that:
- fixed alignment would stop things working on certain NICs which need
the actual writable space specified in units of 1k - therefore we need
2k + metadata space.
- we place padding between buffers to avoid constantly hitting
the same memory channels when accessing memory.
- it allows the application to choose the actual buffer size it
wants to use.
We make use of the above to allow us to speed up processing
significantly and also reduce the packet buffer memory size.
Not having arbitrary buffer alignment also means an AF_XDP driver
for DPDK cannot be a drop-in replacement for existing drivers in those
frameworks. Even with a new capability to allow an arbitrary buffer
alignment, existing apps will need to be modified to use that new
capability.
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:19 ` [PATCH 00/11] XDP unaligned chunk placement support Laatz, Kevin
@ 2019-06-28 16:51 ` Björn Töpel
2019-06-28 20:08 ` Jakub Kicinski
2019-06-28 20:25 ` Jakub Kicinski
2019-06-28 20:29 ` Jonathan Lemon
2 siblings, 1 reply; 10+ messages in thread
From: Björn Töpel @ 2019-06-28 16:51 UTC (permalink / raw)
To: Laatz, Kevin, Jakub Kicinski
Cc: Jonathan Lemon, netdev, ast, daniel, magnus.karlsson, bpf,
intel-wired-lan, bruce.richardson, ciara.loftus
On 2019-06-28 18:19, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
>> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>>> On the application side (xdpsock), we don't have to worry about the user
>>> defined headroom, since it is 0, so we only need to account for the
>>> XDP_PACKET_HEADROOM when computing the original address (in the default
>>> scenario).
>> That assumes specific layout for the data inside the buffer. Some NICs
>> will prepend information like timestamp to the packet, meaning the
>> packet would start at offset XDP_PACKET_HEADROOM + metadata len..
>
> Yes, if NICs prepend extra data to the packet that would be a problem for
> using this feature in isolation. However, if we also add in support for
> in-order
> RX and TX rings, that would no longer be an issue. However, even for NICs
> which do prepend data, this patchset should not break anything that is
> currently
> working.
(Late on the ball. I'm in vacation mode.)
In your example Jakub, how would this look in XDP? Wouldn't the
timestamp be part of the metadata (xdp_md.data_meta)? Isn't
data-data_meta (if valid) <= XDP_PACKET_HEADROOM? That was my assumption.
There was some discussion on having the meta data length in the struct
xdp_desc, before AF_XDP was merged, but the conclusion was that this was
*not* needed, because AF_XDP and the XDP program had an implicit
contract. If you're running AF_XDP, you also have an XDP program running
and you can determine the meta data length (and also get back the
original buffer).
So, today in AF_XDP if XDP metadata is added, the userland application
can look it up before the xdp_desc.addr (just like regular XDP), and how
the XDP/AF_XDP application determines the length/layout of the metadata is
out-of-band/not specified.
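As a rough sketch of that implicit contract (struct my_meta is invented
here; the real layout is whatever the XDP program and the application
agree on out-of-band):

#include <stdint.h>

struct my_meta {		/* assumed layout, agreed with the XDP program */
	uint64_t rx_timestamp;
	uint32_t hash;
};

static struct my_meta *get_meta(void *umem_area, uint64_t desc_addr)
{
	uint8_t *pkt = (uint8_t *)umem_area + desc_addr;

	/* metadata sits immediately before the packet, data_meta style */
	return (struct my_meta *)(pkt - sizeof(struct my_meta));
}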
This is a bit messy/handwavy TBH, so maybe adding the length to the
descriptor *is* a good idea (extending the options part of the
xdp_desc)? Less clean though. OTOH the layout of the meta data still
needs to be determined.
Björn
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:51 ` Björn Töpel
@ 2019-06-28 20:08 ` Jakub Kicinski
0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2019-06-28 20:08 UTC (permalink / raw)
To: Björn Töpel
Cc: Laatz, Kevin, Jonathan Lemon, netdev, ast, daniel,
magnus.karlsson, bpf, intel-wired-lan, bruce.richardson,
ciara.loftus
On Fri, 28 Jun 2019 18:51:37 +0200, Björn Töpel wrote:
> In your example Jakub, how would this look in XDP? Wouldn't the
> timestamp be part of the metadata (xdp_md.data_meta)? Isn't
> data-data_meta (if valid) <= XDP_PACKET_HEADROOM? That was my assumption.
The driver parses the metadata and copies it outside of the prepend
before XDP runs. Then XDP runs unaware of the prepend contents.
That's the current situation.
XDP_PACKET_HEADROOM is before the entire frame. Like this:
  buffer start
 /                 DMA addr given to the device
/                 /
v                v
| XDP_HEADROOM | meta data | packet data |
Length of meta data comes in the standard fixed size descriptor.
The metadata prepend is in TV form ("TLV with no length field", length's
implied by type).
> There were some discussion on having meta data length in the struct
> xdp_desc, before AF_XDP was merged, but the conclusion was that this was
> *not* needed, because AF_XDP and the XDP program had an implicit
> contract. If you're running AF_XDP, you also have an XDP program running
> and you can determine the meta data length (and also getting back the
> original buffer).
>
> So, today in AF_XDP if XDP metadata is added, the userland application
> can look it up before the xdp_desc.addr (just like regular XDP), and how
> the XDP/AF_XDP application determines length/layout of the metadata i
> out-of-band/not specified.
>
> This is a bit messy/handwavy TBH, so maybe adding the length to the
> descriptor *is* a good idea (extending the options part of the
> xdp_desc)? Less clean though. OTOH the layout of the meta data still
> need to be determined.
Right, the device prepend is not exposed as metadata to XDP.
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:19 ` [PATCH 00/11] XDP unaligned chunk placement support Laatz, Kevin
2019-06-28 16:51 ` Björn Töpel
@ 2019-06-28 20:25 ` Jakub Kicinski
2019-06-28 20:29 ` Jonathan Lemon
2 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2019-06-28 20:25 UTC (permalink / raw)
To: Laatz, Kevin
Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel, magnus.karlsson,
bpf, intel-wired-lan, bruce.richardson, ciara.loftus
On Fri, 28 Jun 2019 17:19:09 +0100, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
> > On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
> >> On the application side (xdpsock), we don't have to worry about the user
> >> defined headroom, since it is 0, so we only need to account for the
> >> XDP_PACKET_HEADROOM when computing the original address (in the default
> >> scenario).
> > That assumes specific layout for the data inside the buffer. Some NICs
> > will prepend information like timestamp to the packet, meaning the
> > packet would start at offset XDP_PACKET_HEADROOM + metadata len..
>
> Yes, if NICs prepend extra data to the packet that would be a problem for
> using this feature in isolation. However, if we also add in support for
> in-order RX and TX rings, that would no longer be an issue.
Can you shed more light on in-order rings? Do you mean that RX frames
come in the order the buffers were placed in the fill queue? That wouldn't
make practical sense, no? Even if the application does no reordering,
there is also XDP_DROP and XDP_TX. Please explain :)
> However, even for NICs which do prepend data, this patchset should
> not break anything that is currently working.
My understanding from the beginnings of AF_XDP was that we were
searching for a format flexible enough to support most if not all NICs.
Creating an ABI which will preclude vendors from supporting DPDK via
AF_XDP would seriously undermine the neutrality aspect.
> > I think that's very limiting. What is the challenge in providing
> > aligned addresses, exactly?
> The challenges are two-fold:
> 1) it prevents using arbitrary buffer sizes, which will be an issue
> supporting e.g. jumbo frames in future.
Presumably support for jumbos would require a multi-buffer setup, and
therefore extensions to the ring format. Should we perhaps look into
implementing unaligned chunks by extending the ring format as well?
> 2) higher level user-space frameworks which may want to use AF_XDP, such
> as DPDK, do not currently support having buffers with 'fixed' alignment.
> The reason that DPDK uses arbitrary placement is that:
> - it would stop things working on certain NICs which need the
> actual writable space specified in units of 1k - therefore we need 2k +
> metadata space.
> - we place padding between buffers to avoid constantly hitting
> the same memory channels when accessing memory.
> - it allows the application to choose the actual buffer size it
> wants to use.
> We make use of the above to allow us to speed up processing
> significantly and also reduce the packet buffer memory size.
>
> Not having arbitrary buffer alignment also means an AF_XDP driver
> for DPDK cannot be a drop-in replacement for existing drivers in those
> frameworks. Even with a new capability to allow an arbitrary buffer
> alignment, existing apps will need to be modified to use that new
> capability.
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 16:19 ` [PATCH 00/11] XDP unaligned chunk placement support Laatz, Kevin
2019-06-28 16:51 ` Björn Töpel
2019-06-28 20:25 ` Jakub Kicinski
@ 2019-06-28 20:29 ` Jonathan Lemon
2019-07-01 14:58 ` Laatz, Kevin
[not found] ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
2 siblings, 2 replies; 10+ messages in thread
From: Jonathan Lemon @ 2019-06-28 20:29 UTC (permalink / raw)
To: Laatz, Kevin
Cc: Jakub Kicinski, netdev, ast, daniel, bjorn.topel, magnus.karlsson,
bpf, intel-wired-lan, bruce.richardson, ciara.loftus
On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
> On 27/06/2019 22:25, Jakub Kicinski wrote:
>> On Thu, 27 Jun 2019 12:14:50 +0100, Laatz, Kevin wrote:
>>> On the application side (xdpsock), we don't have to worry about the
>>> user
>>> defined headroom, since it is 0, so we only need to account for the
>>> XDP_PACKET_HEADROOM when computing the original address (in the
>>> default
>>> scenario).
>> That assumes specific layout for the data inside the buffer. Some
>> NICs
>> will prepend information like timestamp to the packet, meaning the
>> packet would start at offset XDP_PACKET_HEADROOM + metadata len..
>
> Yes, if NICs prepend extra data to the packet that would be a problem
> for
> using this feature in isolation. However, if we also add in support
> for in-order
> RX and TX rings, that would no longer be an issue. However, even for
> NICs
> which do prepend data, this patchset should not break anything that is
> currently
> working.
I read this as "the correct buffer address is recovered from the shadow
ring". I'm not sure I'm comfortable with that, and I'm also not sold on
in-order completion for the RX/TX rings.
>> I think that's very limiting. What is the challenge in providing
>> aligned addresses, exactly?
> The challenges are two-fold:
> 1) it prevents using arbitrary buffer sizes, which will be an issue
> supporting e.g. jumbo frames in future.
> 2) higher level user-space frameworks which may want to use AF_XDP,
> such as DPDK, do not currently support having buffers with 'fixed'
> alignment.
> The reason that DPDK uses arbitrary placement is that:
> - it would stop things working on certain NICs which
> need the actual writable space specified in units of 1k - therefore we
> need 2k + metadata space.
> - we place padding between buffers to avoid constantly
> hitting the same memory channels when accessing memory.
> - it allows the application to choose the actual buffer
> size it wants to use.
> We make use of the above to allow us to speed up processing
> significantly and also reduce the packet buffer memory size.
>
> Not having arbitrary buffer alignment also means an AF_XDP
> driver for DPDK cannot be a drop-in replacement for existing drivers
> in those frameworks. Even with a new capability to allow an arbitrary
> buffer alignment, existing apps will need to be modified to use that
> new capability.
Since all buffers in the umem are the same chunk size, the original
buffer address can be recalculated with some multiply/shift math.
However, this is more expensive than just a mask operation.
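A quick sketch of the recalculation being compared here (illustrative
only):

#include <stdint.h>

/* Recover the chunk base for a uniform but not power-of-two chunk size:
 * a divide/multiply instead of the single mask the aligned mode can use.
 */
static uint64_t chunk_base(uint64_t addr, uint64_t chunk_size)
{
	return (addr / chunk_size) * chunk_size;
}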
--
Jonathan
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-06-28 20:29 ` Jonathan Lemon
@ 2019-07-01 14:58 ` Laatz, Kevin
[not found] ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
1 sibling, 0 replies; 10+ messages in thread
From: Laatz, Kevin @ 2019-07-01 14:58 UTC (permalink / raw)
To: Jonathan Lemon
Cc: Jakub Kicinski, netdev, ast, daniel, bjorn.topel, magnus.karlsson,
bpf, intel-wired-lan, bruce.richardson, ciara.loftus
On 28/06/2019 21:29, Jonathan Lemon wrote:
> On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
>> On 27/06/2019 22:25, Jakub Kicinski wrote:
>>> I think that's very limiting. What is the challenge in providing
>>> aligned addresses, exactly?
>> The challenges are two-fold:
>> 1) it prevents using arbitrary buffer sizes, which will be an issue
>> supporting e.g. jumbo frames in future.
>> 2) higher level user-space frameworks which may want to use AF_XDP,
>> such as DPDK, do not currently support having buffers with 'fixed'
>> alignment.
>> The reason that DPDK uses arbitrary placement is that:
>> - it would stop things working on certain NICs which need the
>> actual writable space specified in units of 1k - therefore we need 2k
>> + metadata space.
>> - we place padding between buffers to avoid constantly
>> hitting the same memory channels when accessing memory.
>> - it allows the application to choose the actual buffer size
>> it wants to use.
>> We make use of the above to allow us to speed up processing
>> significantly and also reduce the packet buffer memory size.
>>
>> Not having arbitrary buffer alignment also means an AF_XDP driver
>> for DPDK cannot be a drop-in replacement for existing drivers in
>> those frameworks. Even with a new capability to allow an arbitrary
>> buffer alignment, existing apps will need to be modified to use that
>> new capability.
>
> Since all buffers in the umem are the same chunk size, the original
> buffer
> address can be recalculated with some multiply/shift math. However,
> this is
> more expensive than just a mask operation.
Yes, we can do this.
Another option we have is to add a socket option for querying the
metadata length from the driver (assuming it doesn't vary per packet).
We can use that information to get back to the original address using
subtraction.
Alternatively, we can change the Rx descriptor format to include the
metadata length. We could do this in a couple of ways: for example,
rather than returning the address of the start of the packet, instead
return the buffer address that was passed in, and add another 16-bit
field to specify the start-of-packet offset within that buffer. If
using 16 bits of descriptor space is not desirable, an alternative could
be to limit umem sizes to e.g. 2^48 bytes (256 terabytes should be
enough, right :-) ) and use the remaining 16 bits of the address as a
packet offset. Other variations on these approaches are obviously
possible too.
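A hedged sketch of that last option (the shift and helper names below are
invented, purely to illustrate the 48-bit address / 16-bit offset split):

#include <stdint.h>

#define ADDR_OFFSET_SHIFT	48
#define ADDR_MASK		((1ULL << ADDR_OFFSET_SHIFT) - 1)

static uint64_t pack_addr(uint64_t buf_base, uint64_t pkt_offset)
{
	return buf_base | (pkt_offset << ADDR_OFFSET_SHIFT);
}

static uint64_t unpack_base(uint64_t addr)
{
	return addr & ADDR_MASK;
}

static uint64_t unpack_offset(uint64_t addr)
{
	return addr >> ADDR_OFFSET_SHIFT;
}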
* Re: [PATCH 00/11] XDP unaligned chunk placement support
[not found] ` <07e404eb-f712-b15a-4884-315aff3f7c7d@intel.com>
@ 2019-07-01 21:20 ` Jakub Kicinski
2019-07-02 9:27 ` Richardson, Bruce
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2019-07-01 21:20 UTC (permalink / raw)
To: Laatz, Kevin
Cc: Jonathan Lemon, netdev, ast, daniel, bjorn.topel, magnus.karlsson,
bpf, intel-wired-lan, bruce.richardson, ciara.loftus
On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
> On 28/06/2019 21:29, Jonathan Lemon wrote:
> > On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
> >> On 27/06/2019 22:25, Jakub Kicinski wrote:
> >>> I think that's very limiting. What is the challenge in providing
> >>> aligned addresses, exactly?
> >> The challenges are two-fold:
> >> 1) it prevents using arbitrary buffer sizes, which will be an issue
> >> supporting e.g. jumbo frames in future.
> >> 2) higher level user-space frameworks which may want to use AF_XDP,
> >> such as DPDK, do not currently support having buffers with 'fixed'
> >> alignment.
> >> The reason that DPDK uses arbitrary placement is that:
> >> - it would stop things working on certain NICs which need the
> >> actual writable space specified in units of 1k - therefore we need 2k
> >> + metadata space.
> >> - we place padding between buffers to avoid constantly
> >> hitting the same memory channels when accessing memory.
> >> - it allows the application to choose the actual buffer size
> >> it wants to use.
> >> We make use of the above to allow us to speed up processing
> >> significantly and also reduce the packet buffer memory size.
> >>
> >> Not having arbitrary buffer alignment also means an AF_XDP driver
> >> for DPDK cannot be a drop-in replacement for existing drivers in
> >> those frameworks. Even with a new capability to allow an arbitrary
> >> buffer alignment, existing apps will need to be modified to use that
> >> new capability.
> >
> > Since all buffers in the umem are the same chunk size, the original
> > buffer
> > address can be recalculated with some multiply/shift math. However,
> > this is
> > more expensive than just a mask operation.
>
> Yes, we can do this.
That'd be best; can DPDK reasonably guarantee the slicing is uniform?
E.g. it's not disparate buffer pools with different bases?
> Another option we have is to add a socket option for querying the
> metadata length from the driver (assuming it doesn't vary per packet).
> We can use that information to get back to the original address using
> subtraction.
Unfortunately the metadata depends on the packet and how much info
the device was able to extract. So it's variable length.
> Alternatively, we can change the Rx descriptor format to include the
> metadata length. We could do this in a couple of ways, for example,
> rather than returning the address as the start of the packet, instead
> return the buffer address that was passed in, and adding another 16-bit
> field to specify the start of packet offset with that buffer. If using
> another 16-bits of the descriptor space is not desirable, an alternative
> could be to limit umem sizes to e.g. 2^48 bits (256 terabytes should be
> enough, right :-) ) and use the remaining 16 bits of the address as a
> packet offset. Other variations on these approach are obviously possible
> too.
Seems reasonable to me..
* RE: [PATCH 00/11] XDP unaligned chunk placement support
2019-07-01 21:20 ` Jakub Kicinski
@ 2019-07-02 9:27 ` Richardson, Bruce
2019-07-02 16:33 ` Jonathan Lemon
0 siblings, 1 reply; 10+ messages in thread
From: Richardson, Bruce @ 2019-07-02 9:27 UTC (permalink / raw)
To: Jakub Kicinski, Laatz, Kevin
Cc: Jonathan Lemon, netdev@vger.kernel.org, ast@kernel.org,
daniel@iogearbox.net, Topel, Bjorn, Karlsson, Magnus,
bpf@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
Loftus, Ciara
> -----Original Message-----
> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
> Sent: Monday, July 1, 2019 10:20 PM
> To: Laatz, Kevin <kevin.laatz@intel.com>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>; netdev@vger.kernel.org;
> ast@kernel.org; daniel@iogearbox.net; Topel, Bjorn
> <bjorn.topel@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>;
> bpf@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
> Subject: Re: [PATCH 00/11] XDP unaligned chunk placement support
>
> On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
> > On 28/06/2019 21:29, Jonathan Lemon wrote:
> > > On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
> > >> On 27/06/2019 22:25, Jakub Kicinski wrote:
> > >>> I think that's very limiting. What is the challenge in providing
> > >>> aligned addresses, exactly?
> > >> The challenges are two-fold:
> > >> 1) it prevents using arbitrary buffer sizes, which will be an issue
> > >> supporting e.g. jumbo frames in future.
> > >> 2) higher level user-space frameworks which may want to use AF_XDP,
> > >> such as DPDK, do not currently support having buffers with 'fixed'
> > >> alignment.
> > >> The reason that DPDK uses arbitrary placement is that:
> > >> - it would stop things working on certain NICs which need
> > >> the actual writable space specified in units of 1k - therefore we
> > >> need 2k
> > >> + metadata space.
> > >> - we place padding between buffers to avoid constantly
> > >> hitting the same memory channels when accessing memory.
> > >> - it allows the application to choose the actual buffer
> > >> size it wants to use.
> > >> We make use of the above to allow us to speed up processing
> > >> significantly and also reduce the packet buffer memory size.
> > >>
> > >> Not having arbitrary buffer alignment also means an AF_XDP
> > >> driver for DPDK cannot be a drop-in replacement for existing
> > >> drivers in those frameworks. Even with a new capability to allow an
> > >> arbitrary buffer alignment, existing apps will need to be modified
> > >> to use that new capability.
> > >
> > > Since all buffers in the umem are the same chunk size, the original
> > > buffer address can be recalculated with some multiply/shift math.
> > > However, this is more expensive than just a mask operation.
> >
> > Yes, we can do this.
>
> That'd be best, can DPDK reasonably guarantee the slicing is uniform?
> E.g. it's not desperate buffer pools with different bases?
It's generally uniform, but handling the crossing of (huge)page boundaries
complicates things a bit. Therefore I think the final option below
is best as it avoids any such problems.
>
> > Another option we have is to add a socket option for querying the
> > metadata length from the driver (assuming it doesn't vary per packet).
> > We can use that information to get back to the original address using
> > subtraction.
>
> Unfortunately the metadata depends on the packet and how much info the
> device was able to extract. So it's variable length.
>
> > Alternatively, we can change the Rx descriptor format to include the
> > metadata length. We could do this in a couple of ways, for example,
> > rather than returning the address as the start of the packet, instead
> > return the buffer address that was passed in, and adding another
> > 16-bit field to specify the start of packet offset with that buffer.
> > If using another 16-bits of the descriptor space is not desirable, an
> > alternative could be to limit umem sizes to e.g. 2^48 bits (256
> > terabytes should be enough, right :-) ) and use the remaining 16 bits
> > of the address as a packet offset. Other variations on these approach
> > are obviously possible too.
>
> Seems reasonable to me..
I think this is probably the best solution, and also has the advantage that
a buffer retains its base address the full way through the cycle of Rx and Tx.
* Re: [PATCH 00/11] XDP unaligned chunk placement support
2019-07-02 9:27 ` Richardson, Bruce
@ 2019-07-02 16:33 ` Jonathan Lemon
0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Lemon @ 2019-07-02 16:33 UTC (permalink / raw)
To: Richardson, Bruce
Cc: Jakub Kicinski, Laatz, Kevin, netdev, ast, daniel, Topel, Bjorn,
Karlsson, Magnus, bpf, intel-wired-lan, Loftus, Ciara
On 2 Jul 2019, at 2:27, Richardson, Bruce wrote:
>> -----Original Message-----
>> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
>> Sent: Monday, July 1, 2019 10:20 PM
>> To: Laatz, Kevin <kevin.laatz@intel.com>
>> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>;
>> netdev@vger.kernel.org;
>> ast@kernel.org; daniel@iogearbox.net; Topel, Bjorn
>> <bjorn.topel@intel.com>; Karlsson, Magnus
>> <magnus.karlsson@intel.com>;
>> bpf@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Richardson,
>> Bruce
>> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
>> Subject: Re: [PATCH 00/11] XDP unaligned chunk placement support
>>
>> On Mon, 1 Jul 2019 15:44:29 +0100, Laatz, Kevin wrote:
>>> On 28/06/2019 21:29, Jonathan Lemon wrote:
>>>> On 28 Jun 2019, at 9:19, Laatz, Kevin wrote:
>>>>> On 27/06/2019 22:25, Jakub Kicinski wrote:
>>>>>> I think that's very limiting. What is the challenge in
>>>>>> providing
>>>>>> aligned addresses, exactly?
>>>>> The challenges are two-fold:
>>>>> 1) it prevents using arbitrary buffer sizes, which will be an
>>>>> issue
>>>>> supporting e.g. jumbo frames in future.
>>>>> 2) higher level user-space frameworks which may want to use
>>>>> AF_XDP,
>>>>> such as DPDK, do not currently support having buffers with 'fixed'
>>>>> alignment.
>>>>> The reason that DPDK uses arbitrary placement is that:
>>>>> - it would stop things working on certain NICs which
>>>>> need
>>>>> the actual writable space specified in units of 1k - therefore we
>>>>> need 2k
>>>>> + metadata space.
>>>>> - we place padding between buffers to avoid
>>>>> constantly
>>>>> hitting the same memory channels when accessing memory.
>>>>> - it allows the application to choose the actual
>>>>> buffer
>>>>> size it wants to use.
>>>>> We make use of the above to allow us to speed up processing
>>>>> significantly and also reduce the packet buffer memory size.
>>>>>
>>>>> Not having arbitrary buffer alignment also means an AF_XDP
>>>>> driver for DPDK cannot be a drop-in replacement for existing
>>>>> drivers in those frameworks. Even with a new capability to allow
>>>>> an
>>>>> arbitrary buffer alignment, existing apps will need to be modified
>>>>> to use that new capability.
>>>>
>>>> Since all buffers in the umem are the same chunk size, the original
>>>> buffer address can be recalculated with some multiply/shift math.
>>>> However, this is more expensive than just a mask operation.
>>>
>>> Yes, we can do this.
>>
>> That'd be best, can DPDK reasonably guarantee the slicing is uniform?
>> E.g. it's not desperate buffer pools with different bases?
>
> It's generally uniform, but handling the crossing of (huge)page
> boundaries
> complicates things a bit. Therefore I think the final option below
> is best as it avoids any such problems.
>
>>
>>> Another option we have is to add a socket option for querying the
>>> metadata length from the driver (assuming it doesn't vary per
>>> packet).
>>> We can use that information to get back to the original address
>>> using
>>> subtraction.
>>
>> Unfortunately the metadata depends on the packet and how much info
>> the
>> device was able to extract. So it's variable length.
>>
>>> Alternatively, we can change the Rx descriptor format to include the
>>> metadata length. We could do this in a couple of ways, for example,
>>> rather than returning the address as the start of the packet,
>>> instead
>>> return the buffer address that was passed in, and adding another
>>> 16-bit field to specify the start of packet offset with that buffer.
>>> If using another 16-bits of the descriptor space is not desirable,
>>> an
>>> alternative could be to limit umem sizes to e.g. 2^48 bits (256
>>> terabytes should be enough, right :-) ) and use the remaining 16
>>> bits
>>> of the address as a packet offset. Other variations on these
>>> approach
>>> are obviously possible too.
>>
>> Seems reasonable to me..
>
> I think this is probably the best solution, and also has the advantage
> that
> a buffer retains its base address the full way through the cycle of Rx
> and Tx.
I like this as well - it also has the advantage that drivers can keep
performing adjustments on the handle, which ends up just modifying the
offset.
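As a tiny follow-on to the packing sketch earlier in the thread (same
invented ADDR_OFFSET_SHIFT layout), a driver-side adjustment such as
reserving headroom would only touch the offset bits, leaving the base
address intact for the whole Rx/Tx cycle:

#include <stdint.h>

static uint64_t handle_add_offset(uint64_t handle, uint64_t delta)
{
	uint64_t base = handle & ((1ULL << 48) - 1);
	uint64_t off  = (handle >> 48) + delta;

	return base | (off << 48);	/* base unchanged, only offset moves */
}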
--
Jonathan