Re: [PATCH 07/10] net: emac: fix DMA API mapping and unmapping correctness

Netdev List
 help / color / mirror / Atom feed

From: Paolo Abeni <pabeni@redhat.com>
To: rosenp@gmail.com
Cc: netdev@vger.kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 07/10] net: emac: fix DMA API mapping and unmapping correctness
Date: Thu,  2 Jul 2026 12:39:31 +0200	[thread overview]
Message-ID: <20260702103931.147571-1-pabeni@redhat.com> (raw)
In-Reply-To: <20260630041634.284127-8-rosenp@gmail.com>

From: AI Reviewer <ai@example.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: emac: fix DMA API mapping and unmapping correctness

This patch adds dma_mapping_error() checks after every dma_map_single()
and skb_frag_dma_map() call in the ibm emac driver, introduces a shadow
rx_dma[] array in struct emac_instance, and replaces the map-and-leak
pattern in emac_recycle_rx_skb() with dma_sync_single_for_device().

A few questions follow inline.

> net: emac: fix DMA API mapping and unmapping correctness

[Low]
The subject says the fix is for DMA API mapping and unmapping
correctness, but the in-tree comment added by this patch states:

  + * TX buffers still lack dma_unmap calls for the reasons explained in the
  + * original note below (a single skb may be split across multiple BDs on
  + * TAH-equipped EMACs, making per-fragment tracking complex).

Should the subject be scoped to the RX path (for example, "fix DMA API
correctness on the RX path") since the TX-side DMA API violation is
intentionally preserved?

[Low]
The commit message bullet says:

    - emac_recycle_rx_skb: map first, check error, then unmap old;
      use dma_sync_single_for_device() with SKB_DATA_ALIGN(len + NET_IP_ALIGN)

But emac_recycle_rx_skb() only calls dma_sync_single_for_device() and
performs no map/check/unmap sequence; the "map first, check, unmap old"
sequence is in the emac_poll_rx() MAL_RX_CTRL_FIRST branch. Would it
help to split this bullet across the two functions?

[Low]
The changelog ends with:

    For whatever reason after this patch, ath9k stopped throwing DMA
    errors with CONFIG_DMA_API_DEBUG.

Is this perhaps an artifact of CONFIG_DMA_API_DEBUG hash exhaustion
caused by the leaked mappings in the pre-patch driver? It might be
worth investigating and stating the mechanism rather than leaving it
as an unexplained side effect.

[Low]
The emac_recycle_rx_skb() change (discarding the returned DMA address
from dma_map_single()) looks like a long-standing functional bug on
systems with IOMMU/SWIOTLB. Would a Fixes: tag and stable consideration
be appropriate for that hunk?

> diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
> index ced9690cddc3..aed1ad21e2ea 100644
> --- a/drivers/net/ethernet/ibm/emac/core.c
> +++ b/drivers/net/ethernet/ibm/emac/core.c

[ ... ]

> @@ -1058,6 +1066,7 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
>  	/* Second pass, allocate new skbs */
>  	for (i = 0; i < NUM_RX_BUFF; ++i) {
>  		struct sk_buff *skb;
> +		dma_addr_t dma;
>  
>  		skb = netdev_alloc_skb_ip_align(dev->ndev, rx_skb_size);
>  		if (!skb) {
> @@ -1066,12 +1075,24 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
>  		}
>  
>  		BUG_ON(!dev->rx_skb[i]);
> +		dma_unmap_single(&dev->ofdev->dev,
> +				 dev->rx_dma[i],
> +				 dev->rx_sync_size, DMA_FROM_DEVICE);
>  		dev_kfree_skb(dev->rx_skb[i]);
>  
> -		dev->rx_desc[i].data_ptr =
> -		    dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
> -				   rx_sync_size, DMA_FROM_DEVICE)
> -				   + NET_IP_ALIGN;
> +		dma = dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
> +				     rx_sync_size, DMA_FROM_DEVICE);
> +		if (dma_mapping_error(&dev->ofdev->dev, dma)) {
> +			dev_kfree_skb(skb);
> +			dev->rx_skb[i] = NULL;
> +			dev->rx_dma[i] = 0;
> +			dev->rx_desc[i].data_ptr = 0;
> +			dev->rx_desc[i].ctrl = 0;
> +			ret = -ENOMEM;
> +			goto oom;
> +		}

[High]
Can this error path leave the descriptor ring in a state that crashes
the next NAPI poll?

The first pass earlier in emac_resize_rx_ring() initialises every
slot's ctrl with:

    dev->rx_desc[i].ctrl = MAL_RX_CTRL_EMPTY |
        (i == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);

so software (which polls with "if (ctrl & MAL_RX_CTRL_EMPTY) break;")
treats those slots as owned by hardware and skips them. The new error
path here sets ctrl = 0 instead, which clears MAL_RX_CTRL_EMPTY (and
MAL_RX_CTRL_WRAP for the last slot).

The oom: label below unconditionally re-enables RX:

    oom:
        clear_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags);
        dev->rx_slot = 0;
        mal_enable_rx_channel(dev->mal, dev->mal_rx_chan);
        emac_rx_enable(dev);
        emac_netif_start(dev);

On the next emac_poll_rx() call, the loop reads ctrl = 0, finds the
EMPTY bit clear, and proceeds with skb = dev->rx_skb[slot] = NULL,
followed by memcpy()/skb_put()/eth_type_trans() on that NULL skb.

Additionally, the slot's rx_skb[i] is now NULL but the loop above
starts each iteration with BUG_ON(!dev->rx_skb[i]). Would a subsequent
MTU change deterministically panic on that BUG_ON()?

Should this error path mirror the first pass and use MAL_RX_CTRL_EMPTY
(plus MAL_RX_CTRL_WRAP for the last slot) rather than zero?

[ ... ]

> @@ -1176,15 +1201,23 @@ static void emac_clear_mal_desc(struct mal_descriptor *desc, int count)
>  static int
>  __emac_prepare_rx_skb(struct sk_buff *skb, struct emac_instance *dev, int slot)
>  {
> +	dma_addr_t dma;
> +
>  	if (unlikely(!skb))
>  		return -ENOMEM;
>  
>  	dev->rx_skb[slot] = skb;
>  	dev->rx_desc[slot].data_len = 0;
>  
> -	dev->rx_desc[slot].data_ptr =
> -	    dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
> -			   dev->rx_sync_size, DMA_FROM_DEVICE) + NET_IP_ALIGN;
> +	dma = dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
> +			     dev->rx_sync_size, DMA_FROM_DEVICE);
> +	if (dma_mapping_error(&dev->ofdev->dev, dma)) {
> +		dev->rx_skb[slot] = NULL;
> +		dev_kfree_skb(skb);
> +		return -ENOMEM;
> +	}
> +	dev->rx_desc[slot].data_ptr = dma + NET_IP_ALIGN;
> +	dev->rx_dma[slot] = dma;

[Critical]
Can this corrupt the slot state on a DMA mapping failure?

The new skb is installed into the slot before the mapping is performed:

    dev->rx_skb[slot] = skb;
    dev->rx_desc[slot].data_len = 0;

    dma = dma_map_single(...);
    if (dma_mapping_error(...)) {
        dev->rx_skb[slot] = NULL;
        dev_kfree_skb(skb);
        return -ENOMEM;
    }

On failure, the new skb is freed and rx_skb[slot] is set to NULL, but
dev->rx_desc[slot].data_ptr and dev->rx_dma[slot] still hold the old
mapping, and the previous skb pointer that the caller saved (the
emac_poll_rx() local "skb") is also lost.

The emac_poll_rx() caller paths handle the failure as:

    } else if (unlikely(emac_alloc_rx_skb_napi(dev, slot)))
        goto oom;
    ...
    oom:
        ++dev->estats.rx_dropped_oom;
        emac_recycle_rx_skb(dev, slot, 0);

emac_recycle_rx_skb() then re-arms the descriptor with
MAL_RX_CTRL_EMPTY, handing the still-mapped old buffer back to
hardware. On the next received packet, dev->rx_skb[slot] is NULL and:

    skb = dev->rx_skb[slot];
    ...
    memcpy(copy_skb->data - NET_IP_ALIGN,
           skb->data - NET_IP_ALIGN,
           len + NET_IP_ALIGN);

would dereference NULL in NAPI context. The MAL_RX_CTRL_FIRST branch
that calls emac_alloc_rx_skb_napi() has the same shape.

Would it be safer to perform the dma_map_single() and error check
before installing the new skb into rx_skb[slot], so the slot still
points at a valid skb and mapping on failure?

[ ... ]

> @@ -1808,12 +1863,17 @@ static int emac_poll_rx(void *param, int budget)
>  		continue;
>  	sg:
>  		if (ctrl & MAL_RX_CTRL_FIRST) {
> +			dma_addr_t old_dma = dev->rx_dma[slot];
> +
>  			BUG_ON(dev->rx_sg_skb);
>  			if (unlikely(emac_alloc_rx_skb_napi(dev, slot))) {
>  				DBG(dev, "rx OOM %d" NL, slot);
>  				++dev->estats.rx_dropped_oom;
>  				emac_recycle_rx_skb(dev, slot, 0);
>  			} else {
> +				dma_unmap_single(&dev->ofdev->dev, old_dma,
> +						 dev->rx_sync_size,
> +						 DMA_FROM_DEVICE);
>  				dev->rx_sg_skb = skb;
>  				skb_put(skb, len);
>  			}

[High]
The commit message bullet says:

    - emac_poll_rx: use rx_dma[] shadow array, unmap old skb before
      passing to napi_gro_receive

but the new dma_unmap_single() above only covers the SG
MAL_RX_CTRL_FIRST branch. The non-SG large-packet path
(len >= EMAC_RX_COPY_THRESH) is in emac_poll_rx():

    skb = dev->rx_skb[slot];
    ...
    if (len && len < EMAC_RX_COPY_THRESH) {
        ...
    } else if (unlikely(emac_alloc_rx_skb_napi(dev, slot)))
        goto oom;

    skb_put(skb, len);
    push_packet:
        skb->protocol = eth_type_trans(skb, dev->ndev);
        emac_rx_csum(dev, skb, ctrl);
        napi_gro_receive(&dev->mal->napi, skb);

emac_alloc_rx_skb_napi() calls __emac_prepare_rx_skb() which
overwrites dev->rx_dma[slot] with the new mapping, so the previous
mapping address is lost. The original "skb" is then handed to
napi_gro_receive() while still mapped DMA_FROM_DEVICE.

Does this leak one DMA mapping per large-packet RX, and on the
non-coherent PowerPC platforms this patch targets, hand the stack a
buffer that has not been dma_unmap_single()'d?

Should the non-SG large-packet branch also save old_dma before
emac_alloc_rx_skb_napi() and call dma_unmap_single() before
napi_gro_receive(), mirroring the SG-first branch?

[ ... ]
-- 
This is an AI-generated review.

next prev parent reply	other threads:[~2026-07-02 10:39 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-30  4:16 [PATCH 00/10] net: emac: various cleanups, fixes, and feature additions Rosen Penev
2026-06-30  4:16 ` [PATCH 01/10] net: emac: remove emac_xaht_base() Rosen Penev
2026-06-30  4:16 ` [PATCH 02/10] net: emac: fix sparse __iomem warnings in IAHT register access Rosen Penev
2026-07-02 10:37   ` Paolo Abeni
2026-06-30  4:16 ` [PATCH 03/10] net: emac: use DMA-specific and SMP memory barriers Rosen Penev
2026-07-02 10:42   ` Paolo Abeni
2026-06-30  4:16 ` [PATCH 04/10] net: emac: mal: replace of_get_property with of_property_read_u32 Rosen Penev
2026-06-30  4:16 ` [PATCH 05/10] net: emac: mal: replace busy-wait in mal_poll_disable with wait_event Rosen Penev
2026-07-02 10:46   ` Paolo Abeni
2026-06-30  4:16 ` [PATCH 06/10] net: emac: batch stats, eliminate modulo, tighten barrier in RX poll Rosen Penev
2026-06-30  4:16 ` [PATCH 07/10] net: emac: fix DMA API mapping and unmapping correctness Rosen Penev
2026-07-02 10:39   ` Paolo Abeni [this message]
2026-06-30  4:16 ` [PATCH 08/10] net: emac: replace #ifdef CONFIG_PPC_DCR_NATIVE with IS_ENABLED() Rosen Penev
2026-06-30  4:16 ` [PATCH 09/10] net: emac: add Byte Queue Limits (BQL) support Rosen Penev
2026-06-30  4:16 ` [PATCH 10/10] net: emac: use ndo_get_stats64 instead of ndo_get_stats Rosen Penev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260702103931.147571-1-pabeni@redhat.com \
    --to=pabeni@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rosenp@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox