Re: [net-next PATCH v4 7/7] net: ravb: Allocate RX buffers via page pool

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Simon Horman <horms@kernel.org>
To: Paul Barker <paul.barker.ct@bp.renesas.com>
Cc: "Sergey Shtylyov" <s.shtylyov@omp.ru>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Niklas Söderlund" <niklas.soderlund+renesas@ragnatech.se>,
	"Biju Das" <biju.das.jz@bp.renesas.com>,
	"Claudiu Beznea" <claudiu.beznea.uj@bp.renesas.com>,
	"Yoshihiro Shimoda" <yoshihiro.shimoda.uh@renesas.com>,
	netdev@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [net-next PATCH v4 7/7] net: ravb: Allocate RX buffers via page pool
Date: Sat, 1 Jun 2024 11:13:00 +0100	[thread overview]
Message-ID: <20240601101300.GA491852@kernel.org> (raw)
In-Reply-To: <20240528150339.6791-8-paul.barker.ct@bp.renesas.com>

On Tue, May 28, 2024 at 04:03:39PM +0100, Paul Barker wrote:
> This patch makes multiple changes that can't be separated:
> 
>   1) Allocate plain RX buffers via a page pool instead of allocating
>      SKBs, then use build_skb() when a packet is received.
>   2) For GbEth IP, reduce the RX buffer size to 2kB.
>   3) For GbEth IP, merge packets which span more than one RX descriptor
>      as SKB fragments instead of copying data.
> 
> Implementing (1) without (2) would require the use of an order-1 page
> pool (instead of an order-0 page pool split into page fragments) for
> GbEth.
> 
> Implementing (2) without (3) would leave us no space to re-assemble
> packets which span more than one RX descriptor.
> 
> Implementing (3) without (1) would not be possible as the network stack
> expects to use put_page() or page_pool_put_page() to free SKB fragments
> after an SKB is consumed.
> 
> RX checksum offload support is adjusted to handle both linear and
> nonlinear (fragmented) packets.
> 
> This patch gives the following improvements during testing with iperf3.
> 
>   * RZ/G2L:
>     * TCP RX: same bandwidth at -43% CPU load (70% -> 40%)
>     * UDP RX: same bandwidth at -17% CPU load (88% -> 74%)
> 
>   * RZ/G2UL:
>     * TCP RX: +30% bandwidth (726Mbps -> 941Mbps)
>     * UDP RX: +417% bandwidth (108Mbps -> 558Mbps)
> 
>   * RZ/G3S:
>     * TCP RX: +64% bandwidth (562Mbps -> 920Mbps)
>     * UDP RX: +420% bandwidth (90Mbps -> 468Mbps)
> 
>   * RZ/Five:
>     * TCP RX: +217% bandwidth (145Mbps -> 459Mbps)
>     * UDP RX: +470% bandwidth (20Mbps -> 114Mbps)
> 
> There is no significant impact on bandwidth or CPU load in testing on
> RZ/G2H or R-Car M3N.
> 
> Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>

Hi Paul,

Some minor feedback from my side.

...

> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c

...

> @@ -298,13 +269,14 @@ static void ravb_ring_free(struct net_device *ndev, int q)
>  		priv->tx_ring[q] = NULL;
>  	}
>  
> -	/* Free RX skb ringbuffer */
> -	if (priv->rx_skb[q]) {
> -		for (i = 0; i < priv->num_rx_ring[q]; i++)
> -			dev_kfree_skb(priv->rx_skb[q][i]);
> +	/* Free RX buffers */
> +	for (i = 0; i < priv->num_rx_ring[q]; i++) {
> +		if (priv->rx_buffers[q][i].page)
> +			page_pool_put_page(priv->rx_pool[q], priv->rx_buffers[q][i].page, 0, true);

nit: Networking still prefers code to be 80 columns wide or less.
     It looks like that can be trivially achieved here.

     Flagged by checkpatch.pl --max-line-length=80

>  	}
> -	kfree(priv->rx_skb[q]);
> -	priv->rx_skb[q] = NULL;
> +	kfree(priv->rx_buffers[q]);
> +	priv->rx_buffers[q] = NULL;
> +	page_pool_destroy(priv->rx_pool[q]);
>  
>  	/* Free aligned TX buffers */
>  	kfree(priv->tx_align[q]);
> @@ -317,35 +289,56 @@ static void ravb_ring_free(struct net_device *ndev, int q)
>  	priv->tx_skb[q] = NULL;
>  }
>  
> +static int
> +ravb_alloc_rx_buffer(struct net_device *ndev, int q, u32 entry, gfp_t gfp_mask,
> +		     struct ravb_rx_desc *rx_desc)
> +{
> +	struct ravb_private *priv = netdev_priv(ndev);
> +	const struct ravb_hw_info *info = priv->info;
> +	struct ravb_rx_buffer *rx_buff = &priv->rx_buffers[q][entry];
> +	dma_addr_t dma_addr;
> +	unsigned int size;

nit: I would appreciate it if some consideration could be given to
     moving this driver towards rather than away from reverse xmas
     tree - longest line to shortest - for local variable declarations.

     I'm not suggesting a clean-up patch. Rather, that in cases
     like this where new code is added, and also in cases where
     code is modified, reverse xmas tree is preferred.

     Here I would suggest separating the assinment of rx_buf from
     it's declaration (completely untested!):

	struct ravb_private *priv = netdev_priv(ndev);
	const struct ravb_hw_info *info = priv->info;
	struct ravb_rx_buffer *rx_buff;
	dma_addr_t dma_addr;
	unsigned int size;

	rx_buff = &priv->rx_buffers[q][entry];

     Edward Cree's xmastree tool can be helpful here:
     https://github.com/ecree-solarflare/xmastree

> +
> +	size = info->rx_buffer_size;
> +	rx_buff->page = page_pool_alloc(priv->rx_pool[q], &rx_buff->offset, &size,
> +					gfp_mask);
> +	if (unlikely(!rx_buff->page)) {
> +		/* We just set the data size to 0 for a failed mapping
> +		 * which should prevent DMA from happening...
> +		 */
> +		rx_desc->ds_cc = cpu_to_le16(0);
> +		return -ENOMEM;
> +	}
> +
> +	dma_addr = page_pool_get_dma_addr(rx_buff->page) + rx_buff->offset;
> +	dma_sync_single_for_device(ndev->dev.parent, dma_addr,
> +				   info->rx_buffer_size, DMA_FROM_DEVICE);
> +	rx_desc->dptr = cpu_to_le32(dma_addr);
> +
> +	/* The end of the RX buffer is used to store skb shared data, so we need
> +	 * to ensure that the hardware leaves enough space for this.
> +	 */
> +	rx_desc->ds_cc = cpu_to_le16(info->rx_buffer_size
> +				     - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> +				     - ETH_FCS_LEN + sizeof(__sum16));
> +	return 0;
> +}

...

> @@ -816,14 +824,26 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
>  			if (desc_status & MSC_CEEF)
>  				stats->rx_missed_errors++;
>  		} else {
> +			struct ravb_rx_buffer *rx_buff = &priv->rx_buffers[q][entry];
> +			void *rx_addr = page_address(rx_buff->page) + rx_buff->offset;
>  			die_dt = desc->die_dt & 0xF0;
> -			skb = ravb_get_skb_gbeth(ndev, entry, desc);
> +			dma_sync_single_for_cpu(ndev->dev.parent, le32_to_cpu(desc->dptr),
> +						desc_len, DMA_FROM_DEVICE);
> +
>  			switch (die_dt) {
>  			case DT_FSINGLE:
>  			case DT_FSTART:
>  				/* Start of packet:
> -				 * Set initial data length.
> +				 * Prepare an SKB and add initial data.
>  				 */
> +				skb = napi_build_skb(rx_addr, info->rx_buffer_size);
> +				if (unlikely(!skb)) {
> +					stats->rx_errors++;
> +					page_pool_put_page(priv->rx_pool[q],
> +							   rx_buff->page, 0, true);

Here skb is NULL.

> +					break;
> +				}
> +				skb_mark_for_recycle(skb);
>  				skb_put(skb, desc_len);
>  
>  				/* Save this SKB if the packet spans multiple
> @@ -836,14 +856,23 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
>  			case DT_FMID:
>  			case DT_FEND:
>  				/* Continuing a packet:
> -				 * Move data into the saved SKB.
> +				 * Add this buffer as an RX frag.
>  				 */
> -				skb_copy_to_linear_data_offset(priv->rx_1st_skb,
> -							       priv->rx_1st_skb->len,
> -							       skb->data,
> -							       desc_len);
> -				skb_put(priv->rx_1st_skb, desc_len);
> -				dev_kfree_skb(skb);
> +
> +				/* rx_1st_skb will be NULL if napi_build_skb()
> +				 * failed for the first descriptor of a
> +				 * multi-descriptor packet.
> +				 */
> +				if (unlikely(!priv->rx_1st_skb)) {
> +					stats->rx_errors++;
> +					page_pool_put_page(priv->rx_pool[q],
> +							   rx_buff->page, 0, true);

And here skb seems to be uninitialised.

> +					break;
> +				}
> +				skb_add_rx_frag(priv->rx_1st_skb,
> +						skb_shinfo(priv->rx_1st_skb)->nr_frags,
> +						rx_buff->page, rx_buff->offset,
> +						desc_len, info->rx_buffer_size);
>  
>  				/* Set skb to point at the whole packet so that
>  				 * we only need one code path for finishing a

The code between the hunk above and the hunk below is:

				/* Set skb to point at the whole packet so that
				 * we only need one code path for finishing a
				 * packet.
				 */
				skb = priv->rx_1st_skb;
			}
			switch (die_dt) {
			case DT_FSINGLE:
			case DT_FEND:
				/* Finishing a packet:
				 * Determine protocol & checksum, hand off to
				 * NAPI and update our stats.
				 */
				skb->protocol = eth_type_trans(skb, ndev);
				if (ndev->features & NETIF_F_RXCSUM)
					ravb_rx_csum_gbeth(skb);
				stats->rx_bytes += skb->len;
				napi_gro_receive(&priv->napi[q], skb);
				rx_packets++;

It seems that the inter-hunk code above may dereference skb when it is NULL
or uninitialised.

Flagged by Smatch.

> @@ -865,7 +894,16 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
>  				stats->rx_bytes += skb->len;
>  				napi_gro_receive(&priv->napi[q], skb);
>  				rx_packets++;
> +
> +				/* Clear rx_1st_skb so that it will only be
> +				 * non-NULL when valid.
> +				 */
> +				if (die_dt == DT_FEND)
> +					priv->rx_1st_skb = NULL;
>  			}
> +
> +			/* Mark this RX buffer as consumed. */
> +			rx_buff->page = NULL;
>  		}
>  	}
>  

...

next prev parent reply	other threads:[~2024-06-01 10:13 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-28 15:03 [net-next PATCH v4 0/7] Improve GbEth performance on Renesas RZ/G2L and related SoCs Paul Barker
2024-05-28 15:03 ` [net-next PATCH v4 1/7] net: ravb: Simplify poll & receive functions Paul Barker
2024-05-28 16:21   ` Sergey Shtylyov
2024-05-28 15:03 ` [net-next PATCH v4 2/7] net: ravb: Consider busypolling status when re-enabling interrupts Paul Barker
2024-05-28 16:44   ` Sergey Shtylyov
2024-05-28 16:47     ` Sergey Shtylyov
2024-05-29 19:09       ` Paul Barker
2024-05-28 15:03 ` [net-next PATCH v4 3/7] net: ravb: Refactor RX ring refill Paul Barker
2024-05-28 20:50   ` Sergey Shtylyov
2024-05-28 15:03 ` [net-next PATCH v4 4/7] net: ravb: Refactor GbEth RX code path Paul Barker
2024-05-29 18:30   ` Sergey Shtylyov
2024-05-29 19:07     ` Paul Barker
2024-05-30 20:37       ` Sergey Shtylyov
2024-05-28 15:03 ` [net-next PATCH v4 5/7] net: ravb: Enable SW IRQ Coalescing for GbEth Paul Barker
2024-05-28 15:03 ` [net-next PATCH v4 6/7] net: ravb: Use NAPI threaded mode on 1-core CPUs with GbEth IP Paul Barker
2024-05-28 15:03 ` [net-next PATCH v4 7/7] net: ravb: Allocate RX buffers via page pool Paul Barker
2024-05-29 20:52   ` Sergey Shtylyov
2024-05-30  9:21     ` Paul Barker
2024-05-30 10:29       ` Paul Barker
2024-05-31 17:25       ` Sergey Shtylyov
2024-06-01 10:13   ` Simon Horman [this message]
2024-06-03  8:02     ` Paul Barker
2024-06-03 12:07       ` Simon Horman
2024-06-03 12:15         ` Paul Barker
2024-06-05 17:18           ` Sergey Shtylyov
2024-06-03 20:45       ` Sergey Shtylyov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240601101300.GA491852@kernel.org \
    --to=horms@kernel.org \
    --cc=biju.das.jz@bp.renesas.com \
    --cc=claudiu.beznea.uj@bp.renesas.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=niklas.soderlund+renesas@ragnatech.se \
    --cc=pabeni@redhat.com \
    --cc=paul.barker.ct@bp.renesas.com \
    --cc=s.shtylyov@omp.ru \
    --cc=yoshihiro.shimoda.uh@renesas.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.