From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: <bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
	<andrii@kernel.org>, <netdev@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <bjorn@kernel.org>,
	<echaudro@redhat.com>, <lorenzo@kernel.org>,
	<martin.lau@linux.dev>, <tirthendu.sarkar@intel.com>,
	<john.fastabend@gmail.com>, <horms@kernel.org>
Subject: Re: [PATCH v5 bpf 02/11] xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
Date: Wed, 24 Jan 2024 12:42:45 +0100
Message-ID: <ZbD3tbiM/xD+aEJB@boxer>
In-Reply-To: <CAJ8uoz2W6nqJ=vk6+RR7zEohkv7CTBO+2KsAQAfgp=gf_5-ycA@mail.gmail.com>

On Wed, Jan 24, 2024 at 09:20:26AM +0100, Magnus Karlsson wrote:
> On Mon, 22 Jan 2024 at 23:16, Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > XDP multi-buffer support introduced the XDP_FLAGS_HAS_FRAGS flag that
> > drivers use to notify the data path whether an xdp_buff contains
> > fragments or not. The data path looks up this flag on the first
> > buffer, the one that occupies the linear part of the xdp_buff, so
> > drivers modify it only there. This is sufficient for SKB and XDP_DRV
> > modes, where the xdp_buff is usually allocated on the stack or resides
> > within the struct representing the driver's queue, and fragments are
> > carried via skb_frag_t structs. IOW, we are dealing with only one
> > xdp_buff.
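> >
> > For reference, the data path check is just a bit test on the head
> > buffer's flags - a sketch of the helpers as they live in
> > include/net/xdp.h at the time of writing (details may drift):
> >
> > 	static __always_inline bool xdp_buff_has_frags(struct xdp_buff *xdp)
> > 	{
> > 		return !!(xdp->flags & XDP_FLAGS_HAS_FRAGS);
> > 	}
> >
> > 	static __always_inline void xdp_buff_clear_frags_flag(struct xdp_buff *xdp)
> > 	{
> > 		xdp->flags &= ~XDP_FLAGS_HAS_FRAGS;
> > 	}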
> >
> > ZC mode, though, relies on a list of xdp_buff structs that is carried
> > via xsk_buff_pool::xskb_list, so the ZC data path has to make sure that
> > fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
> > xsk_buff_free() could misbehave if executed against an xdp_buff that
> > carries a frag with the XDP_FLAGS_HAS_FRAGS flag set. Such a scenario
> > can take place when the supplied XDP program calls bpf_xdp_adjust_tail()
> > with a negative offset that in turn releases the tail fragment from a
> > multi-buffer frame.
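> >
> > A minimal XDP program that exercises this path could look like the
> > sketch below (illustrative only; the shrink amount is arbitrary and
> > just needs to be large enough to consume the whole tail frag):
> >
> > 	#include <linux/bpf.h>
> > 	#include <bpf/bpf_helpers.h>
> >
> > 	SEC("xdp")
> > 	int shrink_tail(struct xdp_md *ctx)
> > 	{
> > 		/* negative delta shrinks the frame from the tail; on a
> > 		 * multi-buffer frame this can release the tail fragment,
> > 		 * which for ZC ends up in xsk_buff_free() */
> > 		if (bpf_xdp_adjust_tail(ctx, -512))
> > 			return XDP_DROP;
> > 		return XDP_PASS;
> > 	}
> >
> > 	char _license[] SEC("license") = "GPL";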
> >
> > Calling xsk_buff_free() on a tail fragment with XDP_FLAGS_HAS_FRAGS set
> > would release all the nodes from xskb_list that were produced by the
> > driver before XDP program execution, which is not what is intended -
> > only the tail fragment should be deleted from xskb_list and then put
> > onto xsk_buff_pool::free_list. Such a multi-buffer frame will never
> > make it up to user space, so from the AF_XDP application's POV there is
> > no traffic running; however, since free_list constantly gets new nodes,
> > the driver is still able to feed the HW Rx queue with recycled buffers.
> > The bottom line is that instead of being redirected to user space,
> > traffic is continuously dropped.
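> >
> > To see why, here is roughly what xsk_buff_free() does when it sees the
> > flag (paraphrased sketch of include/net/xdp_sock_drv.h; exact code may
> > differ):
> >
> > 	static inline void xsk_buff_free(struct xdp_buff *xdp)
> > 	{
> > 		struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
> > 		struct xdp_buff_xsk *pos, *tmp;
> >
> > 		/* with XDP_FLAGS_HAS_FRAGS set, *every* node still on the
> > 		 * pool's xskb_list gets released, not just this buffer */
> > 		if (xdp_buff_has_frags(xdp))
> > 			list_for_each_entry_safe(pos, tmp, &xskb->pool->xskb_list,
> > 						 xskb_list_node) {
> > 				list_del(&pos->xskb_list_node);
> > 				xp_free(pos);
> > 			}
> >
> > 		xp_free(xskb);
> > 	}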
> >
> > To fix this, let us clear the mentioned flag on the xsk_buff_pool side
> > at allocation time, which is what should have been done right from the
> > start of XSK multi-buffer support.
> >
> > Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
> > Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
> > Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 1 -
> >  drivers/net/ethernet/intel/ice/ice_xsk.c   | 1 -
> >  net/xdp/xsk_buff_pool.c                    | 3 +++
> >  3 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > index e99fa854d17f..fede0bb3e047 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > @@ -499,7 +499,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
> >                 xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog);
> >                 i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets,
> >                                           &rx_bytes, xdp_res, &failure);
> > -               first->flags = 0;
> >                 next_to_clean = next_to_process;
> >                 if (failure)
> >                         break;
> > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > index 5d1ae8e4058a..d9073a618ad6 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > @@ -895,7 +895,6 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
> >
> >                 if (!first) {
> >                         first = xdp;
> > -                       xdp_buff_clear_frags_flag(first);
> >                 } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) {
> >                         break;
> >                 }
> > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> > index 28711cc44ced..dc5659da6728 100644
> > --- a/net/xdp/xsk_buff_pool.c
> > +++ b/net/xdp/xsk_buff_pool.c
> > @@ -555,6 +555,7 @@ struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool)
> >
> >         xskb->xdp.data = xskb->xdp.data_hard_start + XDP_PACKET_HEADROOM;
> >         xskb->xdp.data_meta = xskb->xdp.data;
> > +       xskb->xdp.flags = 0;
> >
> >         if (pool->dma_need_sync) {
> >                 dma_sync_single_range_for_device(pool->dev, xskb->dma, 0,
> > @@ -601,6 +602,7 @@ static u32 xp_alloc_new_from_fq(struct xsk_buff_pool *pool, struct xdp_buff **xd
> >                 }
> >
> >                 *xdp = &xskb->xdp;
> > +               xskb->xdp.flags = 0;
> 
> Thanks for catching this. I am thinking we should have an if-statement
> here and only do this when multi-buffer is enabled. The reason we have
> two different paths for aligned and unaligned mode here is that in
> aligned mode we do not have to touch the xdp_buff at all at allocation
> time, which provides a nice speed-up. So let us only do this when
> necessary. What do you think? Same goes for the line in
> xp_alloc_reused().
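> 
> Something like the below, where the condition is whatever state tells
> us multi-buffer is enabled for the pool (pool->mb is made up here just
> to illustrate, it is not an existing field):
> 
> 	/* only pay for clearing flags when multi-buffer can set them */
> 	if (pool->mb)
> 		xskb->xdp.flags = 0;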
> 

Good point. How about keeping flags = 0 in xp_alloc() and adding it to
xsk_buff_set_size()? We do touch the xdp_buff there, and these two paths
cover the batched and non-batched APIs. I do agree that doing it in
xp_alloc_new_from_fq() and in xp_alloc_reused() is not really handy.
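
Something along these lines (untested sketch, typed from memory, so the
exact body of xsk_buff_set_size() may differ):

	static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
	{
		xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
		xdp->data_meta = xdp->data;
		xdp->data_end = xdp->data + size;
		xdp->flags = 0; /* pool now owns clearing xdp_buff::flags */
	}

That way every xdp_buff handed out to a driver starts with clean flags,
no matter which allocation path produced it, and the per-driver clearing
can go away.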

> >                 xdp++;
> >         }
> >
> > @@ -621,6 +623,7 @@ static u32 xp_alloc_reused(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u3
> >                 list_del_init(&xskb->free_list_node);
> >
> >                 *xdp = &xskb->xdp;
> > +               xskb->xdp.flags = 0;
> >                 xdp++;
> >         }
> >         pool->free_list_cnt -= nb_entries;
> > --
> > 2.34.1
> >
> >

Thread overview: 28+ messages
2024-01-22 22:15 [PATCH v5 bpf 00/11] net: bpf_xdp_adjust_tail() and Intel mbuf fixes Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 01/11] xsk: recycle buffer in case Rx queue was full Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 02/11] xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags Maciej Fijalkowski
2024-01-24  8:20   ` Magnus Karlsson
2024-01-24 11:42     ` Maciej Fijalkowski [this message]
2024-01-24 11:49       ` Magnus Karlsson
2024-01-22 22:16 ` [PATCH v5 bpf 03/11] xsk: fix usage of multi-buffer BPF helpers for ZC XDP Maciej Fijalkowski
2024-01-24  1:53   ` Jakub Kicinski
2024-01-24 12:02     ` Maciej Fijalkowski
2024-01-24 16:53       ` Jakub Kicinski
2024-01-24 16:56         ` Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 04/11] ice: work on pre-XDP prog frag count Maciej Fijalkowski
2024-01-24  8:37   ` Magnus Karlsson
2024-01-24 14:05     ` Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 05/11] i40e: handle multi-buffer packets that are shrunk by xdp prog Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 06/11] ice: remove redundant xdp_rxq_info registration Maciej Fijalkowski
2024-01-24  8:39   ` Magnus Karlsson
2024-01-22 22:16 ` [PATCH v5 bpf 07/11] intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers Maciej Fijalkowski
2024-01-24  8:44   ` Magnus Karlsson
2024-01-24 16:21     ` Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 08/11] ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue Maciej Fijalkowski
2024-01-24  8:51   ` Magnus Karlsson
2024-01-24 13:58     ` Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 09/11] xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 10/11] i40e: set xdp_rxq_info::frag_size Maciej Fijalkowski
2024-01-22 22:16 ` [PATCH v5 bpf 11/11] i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue Maciej Fijalkowski
2024-01-24  8:52   ` Magnus Karlsson
2024-01-24  8:58 ` [PATCH v5 bpf 00/11] net: bpf_xdp_adjust_tail() and Intel mbuf fixes Magnus Karlsson
