All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Gerhard Engleder <gerhard@engleder-embedded.com>
Cc: <netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<davem@davemloft.net>, <kuba@kernel.org>, <edumazet@google.com>,
	<pabeni@redhat.com>, <bjorn@kernel.org>,
	<magnus.karlsson@intel.com>, <jonathan.lemon@gmail.com>
Subject: Re: [PATCH net-next 4/5] tsnep: Add XDP socket zero-copy RX support
Date: Mon, 3 Apr 2023 19:26:12 +0200	[thread overview]
Message-ID: <ZCsMNKCK0xQECDJh@boxer> (raw)
In-Reply-To: <ZCsKkygVjB3J+XrO@boxer>

On Mon, Apr 03, 2023 at 07:19:15PM +0200, Maciej Fijalkowski wrote:
> On Sun, Apr 02, 2023 at 09:38:37PM +0200, Gerhard Engleder wrote:
> 
> Hey Gerhard,
> 
> > Add support for XSK zero-copy to RX path. The setup of the XSK pool can
> > be done at runtime. If the netdev is running, then the queue must be
> > disabled and enabled during reconfiguration. This can be done easily
> > with functions introduced in previous commits.
> > 
> > A more important property is that, if the netdev is running, then the
> > setup of the XSK pool shall not stop the netdev in case of errors. A
> > broken netdev after a failed XSK pool setup is bad behavior. Therefore,
> > the allocation and setup of resources during XSK pool setup is done only
> > before any queue is disabled. Additionally, freeing and later allocation
> > of resources is eliminated in some cases. Page pool entries are kept for
> > later use. Two memory models are registered in parallel. As a result,
> > the XSK pool setup cannot fail during queue reconfiguration.
> > 
> > In contrast to other drivers, XSK pool setup and XDP BPF program setup
> > are separate actions. XSK pool setup can be done without any XDP BPF
> > program. The XDP BPF program can be added, removed or changed without
> > any reconfiguration of the XSK pool.
> 
> I won't argue about your design, but I'd be glad if you would present any
> perf numbers (ZC vs copy mode) just to give us some overview how your
> implementation works out. Also, please consider using batching APIs and
> see if this gives you any boost (my assumption is that it would).
> 
> > 
> > Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
> > ---
> >  drivers/net/ethernet/engleder/tsnep.h      |   7 +
> >  drivers/net/ethernet/engleder/tsnep_main.c | 432 ++++++++++++++++++++-
> >  drivers/net/ethernet/engleder/tsnep_xdp.c  |  67 ++++
> >  3 files changed, 488 insertions(+), 18 deletions(-)

(...)

> > +
> >  static bool tsnep_xdp_run_prog(struct tsnep_rx *rx, struct bpf_prog *prog,
> >  			       struct xdp_buff *xdp, int *status,
> > -			       struct netdev_queue *tx_nq, struct tsnep_tx *tx)
> > +			       struct netdev_queue *tx_nq, struct tsnep_tx *tx,
> > +			       bool zc)
> >  {
> >  	unsigned int length;
> > -	unsigned int sync;
> >  	u32 act;
> >  
> >  	length = xdp->data_end - xdp->data_hard_start - XDP_PACKET_HEADROOM;
> >  
> >  	act = bpf_prog_run_xdp(prog, xdp);
> > -
> > -	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
> > -	sync = xdp->data_end - xdp->data_hard_start - XDP_PACKET_HEADROOM;
> > -	sync = max(sync, length);
> > -
> >  	switch (act) {
> >  	case XDP_PASS:
> >  		return false;
> > @@ -1027,8 +1149,21 @@ static bool tsnep_xdp_run_prog(struct tsnep_rx *rx, struct bpf_prog *prog,
> >  		trace_xdp_exception(rx->adapter->netdev, prog, act);
> >  		fallthrough;
> >  	case XDP_DROP:
> > -		page_pool_put_page(rx->page_pool, virt_to_head_page(xdp->data),
> > -				   sync, true);
> > +		if (zc) {
> > +			xsk_buff_free(xdp);
> > +		} else {
> > +			unsigned int sync;
> > +
> > +			/* Due xdp_adjust_tail: DMA sync for_device cover max
> > +			 * len CPU touch
> > +			 */
> > +			sync = xdp->data_end - xdp->data_hard_start -
> > +			       XDP_PACKET_HEADROOM;
> > +			sync = max(sync, length);
> > +			page_pool_put_page(rx->page_pool,
> > +					   virt_to_head_page(xdp->data), sync,
> > +					   true);
> > +		}
> >  		return true;
> >  	}
> >  }
> > @@ -1181,7 +1316,8 @@ static int tsnep_rx_poll(struct tsnep_rx *rx, struct napi_struct *napi,
> >  					 length, false);
> >  
> >  			consume = tsnep_xdp_run_prog(rx, prog, &xdp,
> > -						     &xdp_status, tx_nq, tx);
> > +						     &xdp_status, tx_nq, tx,
> > +						     false);
> >  			if (consume) {
> >  				rx->packets++;
> >  				rx->bytes += length;
> > @@ -1205,6 +1341,125 @@ static int tsnep_rx_poll(struct tsnep_rx *rx, struct napi_struct *napi,
> >  	return done;
> >  }
> >  
> > +static int tsnep_rx_poll_zc(struct tsnep_rx *rx, struct napi_struct *napi,
> > +			    int budget)
> > +{
> > +	struct tsnep_rx_entry *entry;
> > +	struct netdev_queue *tx_nq;
> > +	struct bpf_prog *prog;
> > +	struct tsnep_tx *tx;
> > +	int desc_available;
> > +	int xdp_status = 0;
> > +	struct page *page;
> > +	int done = 0;
> > +	int length;
> > +
> > +	desc_available = tsnep_rx_desc_available(rx);
> > +	prog = READ_ONCE(rx->adapter->xdp_prog);
> > +	if (prog) {
> > +		tx_nq = netdev_get_tx_queue(rx->adapter->netdev,
> > +					    rx->tx_queue_index);
> > +		tx = &rx->adapter->tx[rx->tx_queue_index];
> > +	}
> > +
> > +	while (likely(done < budget) && (rx->read != rx->write)) {
> > +		entry = &rx->entry[rx->read];
> > +		if ((__le32_to_cpu(entry->desc_wb->properties) &
> > +		     TSNEP_DESC_OWNER_COUNTER_MASK) !=
> > +		    (entry->properties & TSNEP_DESC_OWNER_COUNTER_MASK))
> > +			break;
> > +		done++;
> > +
> > +		if (desc_available >= TSNEP_RING_RX_REFILL) {
> > +			bool reuse = desc_available >= TSNEP_RING_RX_REUSE;
> > +
> > +			desc_available -= tsnep_rx_refill_zc(rx, desc_available,
> > +							     reuse);
> > +			if (!entry->xdp) {
> > +				/* buffer has been reused for refill to prevent
> > +				 * empty RX ring, thus buffer cannot be used for
> > +				 * RX processing
> > +				 */
> > +				rx->read = (rx->read + 1) % TSNEP_RING_SIZE;
> > +				desc_available++;
> > +
> > +				rx->dropped++;
> > +
> > +				continue;
> > +			}
> > +		}
> > +
> > +		/* descriptor properties shall be read first, because valid data
> > +		 * is signaled there
> > +		 */
> > +		dma_rmb();
> > +
> > +		prefetch(entry->xdp->data);
> > +		length = __le32_to_cpu(entry->desc_wb->properties) &
> > +			 TSNEP_DESC_LENGTH_MASK;
> > +		entry->xdp->data_end = entry->xdp->data + length;
> > +		xsk_buff_dma_sync_for_cpu(entry->xdp, rx->xsk_pool);
> > +
> > +		/* RX metadata with timestamps is in front of actual data,
> > +		 * subtract metadata size to get length of actual data and
> > +		 * consider metadata size as offset of actual data during RX
> > +		 * processing
> > +		 */
> > +		length -= TSNEP_RX_INLINE_METADATA_SIZE;
> > +
> > +		rx->read = (rx->read + 1) % TSNEP_RING_SIZE;
> > +		desc_available++;
> > +
> > +		if (prog) {
> > +			bool consume;
> > +
> > +			entry->xdp->data += TSNEP_RX_INLINE_METADATA_SIZE;
> > +			entry->xdp->data_meta += TSNEP_RX_INLINE_METADATA_SIZE;
> > +
> > +			consume = tsnep_xdp_run_prog(rx, prog, entry->xdp,
> > +						     &xdp_status, tx_nq, tx,
> > +						     true);
> 
> reason for separate xdp run prog routine for ZC was usually "likely-fying"
> XDP_REDIRECT action as this is the main action for AF_XDP which was giving
> us perf improvement. Please try this out on your side to see if this
> yields any positive value.

One more thing - you have to handle XDP_TX action in a ZC specific way.
Your current code will break if you enable xsk_pool and return XDP_TX from
XDP prog.

  reply	other threads:[~2023-04-03 17:26 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-02 19:38 [PATCH net-next 0/5] tsnep: XDP socket zero-copy support Gerhard Engleder
2023-04-02 19:38 ` [PATCH net-next 1/5] tsnep: Rework TX/RX queue initialization Gerhard Engleder
2023-04-02 19:38 ` [PATCH net-next 2/5] tsnep: Add functions for queue enable/disable Gerhard Engleder
2023-04-02 19:38 ` [PATCH net-next 3/5] tsnep: Move skb receive action to separate function Gerhard Engleder
2023-04-02 19:38 ` [PATCH net-next 4/5] tsnep: Add XDP socket zero-copy RX support Gerhard Engleder
2023-04-03 17:19   ` Maciej Fijalkowski
2023-04-03 17:26     ` Maciej Fijalkowski [this message]
2023-04-05 19:13       ` Gerhard Engleder
2023-04-07 13:41         ` Maciej Fijalkowski
2023-04-15 14:19           ` Gerhard Engleder
2023-04-05 18:49     ` Gerhard Engleder
2023-04-02 19:38 ` [PATCH net-next 5/5] tsnep: Add XDP socket zero-copy TX support Gerhard Engleder
2023-04-03 22:36   ` Maciej Fijalkowski
2023-04-05 19:15     ` Gerhard Engleder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZCsMNKCK0xQECDJh@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gerhard@engleder-embedded.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.