Netdev List
From: Jacob Keller <jacob.e.keller@intel.com>
To: David Laight <david.laight.linux@gmail.com>,
	John Ousterhout <ouster@cs.stanford.edu>
Cc: <stable@vger.kernel.org>, <anthony.l.nguyen@intel.com>,
	<intel-wired-lan@lists.osuosl.org>,
	<przemyslaw.kitszel@intel.com>, <netdev@vger.kernel.org>
Subject: Re: [PATCH net v3] ice: fix packet corruption due to extraneous page flip
Date: Thu, 14 May 2026 09:43:32 -0700	[thread overview]
Message-ID: <30dc284c-8cc0-4bae-b7b0-99d6d71a66e3@intel.com> (raw)
In-Reply-To: <20260514110112.12bdf5ff@pumpkin>

On 5/14/2026 3:01 AM, David Laight wrote:
> On Wed, 13 May 2026 21:47:11 -0700
> John Ousterhout <ouster@cs.stanford.edu> wrote:
> 
>> On Wed, May 13, 2026 at 1:49 PM David Laight
>> <david.laight.linux@gmail.com> wrote:
>>>
>>> On Wed, 13 May 2026 09:28:40 -0700
>>> John Ousterhout <ouster@cs.stanford.edu> wrote:
>>>  
>>>> On Wed, May 13, 2026 at 2:07 AM David Laight
>>>> <david.laight.linux@gmail.com> wrote:  
>>>>>
>>>>> On Tue, 12 May 2026 11:19:53 -0700
>>>>> John Ousterhout <ouster@cs.stanford.edu> wrote:
>>>>>  
>>>>>> Consider the following sequence of events:
>>>>>> * The bottom half of a buffer page is filled with data from
>>>>>>   packet A. The page has a net reference count (reference
>>>>>>   count minus bias) of 1. The page is returned to the NIC,
>>>>>>   flipped so that the top half is active.
>>>>>> * Before the reference on the page is released, the NIC returns
>>>>>>   the page with no data in it ('size' is zero in ice_clean_rx_irq).
>>>>>>   In this case the bias does not get decremented. The page still
>>>>>>   has a net reference count of 1, so it gets returned to the NIC.
>>>>>>   However, ice_put_rx_mbuf flipped the page so that the bottom
>>>>>>   half is active.
>>>>>> * If the NIC stores another packet in the page before packet A
>>>>>>   has released its reference, the data in packet A will be
>>>>>>   overwritten with data from the new packet.
>>>>>> * Unfortunately, zero-length buffers occur frequently: they
>>>>>>   appear whenever a packet uses every available byte in a
>>>>>>   buffer, ending precisely at its end. In that case the NIC
>>>>>>   seems to generate an extra zero-length buffer.
>>>>>> The fix is for ice_put_rx_mbuf not to flip a page when the
>>>>>> buffer's size is 0.  
>>>>>
>>>>> How is this different from packet B (in the top half) being
>>>>> freed before packet A (in the bottom half)?  
>>>>
>>>> I'm not sure exactly what you're referring to here. Are you asking
>>>> about a situation where both halves of the page get filled with packet
>>>> data and then the second half to be filled is the first to be freed? I
>>>> believe that the ICE driver abandons a page if both halves are ever
>>>> occupied simultaneously; the page will be returned to the system once
>>>> both halves have dropped their references. Thus it doesn't matter
>>>> which half is freed first.  
>>>
>>> That is what I was thinking; it seems like the logic is overcomplicated.
>>>
>>> If you need to put 4k pages into some kind of iommu rather than 2k buffers
>>> (to contain 1536-byte ethernet packets) then I'd have thought you'd
>>> initially put both halves into adjacent rx ring entries.
>>> If a rx buffer is discarded (eg a zero length fragment or a CRC error,
>>> or even 'copy break' for short packets) then, as an optimisation,
>>> you could reuse the buffer for another receive.
>>> The same could be done if the page is freed by an application.
>>>
>>> However it sounds like it doesn't use the 2nd half until the first
>>> completes - otherwise you'd never 'flip' to make the other half
>>> active.
>>>
>>> Thinks...
>>> By only putting half of each 4k 'page' into the rx ring the code
>>> will usually save (expensive) iommu setup in the (probably) normal
>>> case where the buffers are freed 'reasonably quickly'.
>>> But that really requires a 'free/with_nic/busy' state for each half
>>> rather than trying to guess from a reference count.
>>>
>>> But if the low-level code is recycling the rx buffer (for any reason)
>>> it wants to use the same buffer.
>>>
>>> The ethernet driver I wrote (a long time ago, early 90s) allocated
>>> 64k as 128 512-byte buffers and did an aligned word-sized copy of
>>> every receive frame - most frames were in contiguous memory.
>>> The simplicity of it made up for the cost of the copy, especially
>>> since that was an iommu system.  
>>
>> I'm not here to defend the logic (and it has been replaced with
>> something that is probably simpler and more efficient); I'm just
>> suggesting a bug fix for the stable releases that still have this
>> logic.

Right. We definitely want a fix for the possible data corruption in
stable. Ideally one as simple as possible.
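
The sequence above is easy to convince yourself of with a few lines of
user-space C. This is only an illustrative sketch of the refcount/bias
bookkeeping described in the commit message; the struct and function
names below are made up for the model, not taken from the driver:

```c
#include <assert.h>

/* Toy model of the half-page recycling scheme described above.
 * All names are illustrative, not actual ice driver fields. */
struct rx_page_model {
	int refcount;     /* raw page reference count */
	int bias;         /* references still held by the driver */
	int active_half;  /* 0 = bottom, 1 = top: where the NIC writes next */
};

/* Net reference count, as the driver computes it. */
static int net_refs(const struct rx_page_model *p)
{
	return p->refcount - p->bias;
}

/* Buggy behaviour: the page is flipped even when size == 0. */
static void put_rx_mbuf_buggy(struct rx_page_model *p, int size)
{
	if (size)
		p->bias--;       /* one reference handed off with the data */
	p->active_half ^= 1;     /* unconditional flip: the bug */
}

/* Fixed behaviour: a zero-length buffer carried no data, so keep the
 * current half and reuse the buffer as-is. */
static void put_rx_mbuf_fixed(struct rx_page_model *p, int size)
{
	if (!size)
		return;
	p->bias--;
	p->active_half ^= 1;
}
```

Running the buggy sequence (packet A fills the bottom half, then the
zero-length buffer arrives) leaves the active half pointing back at A's
data while the net reference count is still 1, so the next receive
lands on top of A; the fixed version leaves the flip alone.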

> 
> You've forced me to look at all of the function :-)
> I've noticed a few things:
> - If ice_add_xdp_frag() fails (because there are too many fragments)
>   then the rest of the fragments are left in the rx ring (instead
>   of being discarded) - so they are likely to be treated as a full
>   packet later on.
> - Frames with status errors (crc, framing etc) are discarded after
>   the skb is built - surely that should happen before the xdp 'program'
>   is called.
> - If the remote system sends a very, very long frame (traditionally the PHY's
>   'jabber detect' didn't always work) you can end up with all of the rx
>   ring being full of a single partial packet.
> 
> I think you need to avoid calling ice_add_xdp_frag() when 'size == 0'.
> Then in ice_put_rx_mbuf() unconditionally call ice_put_rx_buf() for
> zero-length fragments.
> The comment would be 'zero-length fragments can always be reused'.
> 

That seems correct.
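
As a quick sanity check of that flow, here is a rough user-space model
with the size == 0 case handled up front. The helpers and counters are
stand-ins I invented so the control flow can be observed; they are not
the real ice_add_xdp_frag()/ice_put_rx_buf():

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for the driver helpers. */
struct rx_stats {
	int frags_added;  /* buffers attached to the in-progress frame */
	int reused;       /* zero-length buffers recycled without a flip */
	int flipped;      /* buffers flipped for reuse after carrying data */
};

/* Suggested shape: a zero-length buffer is never added as a fragment
 * and is always recycled in place, since it can always be reused. */
static void clean_rx_one(struct rx_stats *s, int size)
{
	if (size == 0) {
		s->reused++;      /* would be ice_put_rx_buf() directly */
		return;
	}
	s->frags_added++;         /* would be ice_add_xdp_frag() */
	s->flipped++;             /* normal flip-for-reuse path */
}

/* Process a burst of completions from the ring. */
static void clean_rx_burst(struct rx_stats *s, const int *sizes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		clean_rx_one(s, sizes[i]);
}
```

Feeding it a frame that ends exactly on a buffer boundary followed by
the extra zero-length completion and the next frame, e.g. sizes
{2048, 0, 700}, attaches two fragments and recycles one buffer in
place, never touching the frag list for the empty one.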

> The zero-length fragments almost certainly exist because the MAC hardware
> advances to the new buffer expecting more data - but only gets the
> 4-byte CRC. So the zero-length buffer contains the receive status.
> 

That matches my understanding.

Thread overview: 8+ messages
2026-05-12 18:19 [PATCH net v3] ice: fix packet corruption due to extraneous page flip John Ousterhout
2026-05-13  9:07 ` David Laight
2026-05-13 16:28   ` John Ousterhout
2026-05-13 20:49     ` David Laight
2026-05-14  4:47       ` John Ousterhout
2026-05-14 10:01         ` David Laight
2026-05-14 16:43           ` Jacob Keller [this message]
2026-05-14  9:08 ` [Intel-wired-lan] " Loktionov, Aleksandr
