Netdev List
 help / color / mirror / Atom feed
* [PATCH net v2] ice: fix packet corruption due to extraneous page flip
@ 2026-05-07 18:38 John Ousterhout
  2026-05-07 22:11 ` Jacob Keller
  0 siblings, 1 reply; 6+ messages in thread
From: John Ousterhout @ 2026-05-07 18:38 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: intel-wired-lan, przemyslaw.kitszel, netdev, John Ousterhout

Consider the following sequence of events:
* The bottom half of a buffer page is filled with data from
  packet A. The page has a net reference count (reference count
  - bias) of 1. The page is returned to the NIC, flipped to
  use the top half.
* Before the reference on the page is released, the NIC returns
  the page with no data in it ('size' is zero in ice_clean_rx_irq).
  In this case the bias does not get decremented. The page still
  has a net reference count of 1, so it gets returned to the NIC.
  However, ice_put_rx_mbuf flipped the page so that the bottom
  half is active.
* If the NIC stores another packet in the page before packet A
  has released its reference, the data in packet A will be
  overwritten with data from the new packet.
The fix is for ice_put_rx_mbuf not to flip pages that have a
size of 0.

Note: major revisions to the ice driver make this patch irrelevant
for recent versions. It applies to longterm stable versions
6.18.27 and 6.12.86; it also seems relevant for 6.6.137, but would
need modifications for that version. I have not examined earlier
versions

Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 51c459a3e722..081c7a7392b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
 
 	while (idx != ntc) {
+		union ice_32b_rx_flex_desc *rx_desc;
+		unsigned int size;
+
+		rx_desc = ICE_RX_DESC(rx_ring, idx);
+		size = le16_to_cpu(rx_desc->wb.pkt_len) &
+		       ICE_RX_FLX_DESC_PKT_LEN_M;
+
 		buf = &rx_ring->rx_buf[idx];
 		if (++idx == cnt)
 			idx = 0;
@@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		 * To do this, only adjust pagecnt_bias for fragments up to
 		 * the total remaining after the XDP program has run.
 		 */
-		if (verdict != ICE_XDP_CONSUMED)
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-		else if (i++ <= xdp_frags)
+		if (verdict != ICE_XDP_CONSUMED) {
+			/* Don't "flip" the page if size is 0: in this case
+			 * the data in the current half will not be used so
+			 * it's OK to reuse that half. And, since the bias
+			 * didn't get decremented for this half, the page can
+			 * be returned to the NIC even if the other half is
+			 * still in use, so flipping the page could cause
+			 * live packet data to be overwritten.
+			 */
+			if (size != 0)
+				ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+		} else if (i++ <= xdp_frags) {
 			buf->pagecnt_bias++;
+		}
 
 		ice_put_rx_buf(rx_ring, buf);
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] ice: fix packet corruption due to extraneous page flip
  2026-05-07 18:38 [PATCH net v2] ice: fix packet corruption due to extraneous page flip John Ousterhout
@ 2026-05-07 22:11 ` Jacob Keller
  2026-05-08  2:37   ` John Ousterhout
  0 siblings, 1 reply; 6+ messages in thread
From: Jacob Keller @ 2026-05-07 22:11 UTC (permalink / raw)
  To: John Ousterhout, anthony.l.nguyen, Jakub Kicinski, Paolo Abeni
  Cc: intel-wired-lan, przemyslaw.kitszel, netdev, stable

On 5/7/2026 11:38 AM, John Ousterhout wrote:
> Note: major revisions to the ice driver make this patch irrelevant
> for recent versions. It applies to longterm stable versions
> 6.18.27 and 6.12.86; it also seems relevant for 6.6.137, but would
> need modifications for that version. I have not examined earlier
> versions
> 

From this description I take it this only applies to the ice driver
prior to its conversion to page pool?

In that case, I think you need to Cc: stable@vger.kernel.org and include
the relevant versions you intend to target.

I think this case is "unique" since there would not be an upstream
equivalent patch. But that is merely because we removed the faulty code
before it could be fixed.

I'm not 100% sure whta method to follow since typical stable rules don't
really like taking patches that don't apply to mainline...

Even with it being somewhat rare to get 0 size packet, it is not
impossible and packet corruption is a Big(TM) deal.

Thanks,
Jake

> Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index 51c459a3e722..081c7a7392b7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>  		xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>  
>  	while (idx != ntc) {
> +		union ice_32b_rx_flex_desc *rx_desc;
> +		unsigned int size;
> +
> +		rx_desc = ICE_RX_DESC(rx_ring, idx);
> +		size = le16_to_cpu(rx_desc->wb.pkt_len) &
> +		       ICE_RX_FLX_DESC_PKT_LEN_M;
> +
>  		buf = &rx_ring->rx_buf[idx];
>  		if (++idx == cnt)
>  			idx = 0;
> @@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>  		 * To do this, only adjust pagecnt_bias for fragments up to
>  		 * the total remaining after the XDP program has run.
>  		 */
> -		if (verdict != ICE_XDP_CONSUMED)
> -			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> -		else if (i++ <= xdp_frags)
> +		if (verdict != ICE_XDP_CONSUMED) {
> +			/* Don't "flip" the page if size is 0: in this case
> +			 * the data in the current half will not be used so
> +			 * it's OK to reuse that half. And, since the bias
> +			 * didn't get decremented for this half, the page can
> +			 * be returned to the NIC even if the other half is
> +			 * still in use, so flipping the page could cause
> +			 * live packet data to be overwritten.
> +			 */
> +			if (size != 0)
> +				ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> +		} else if (i++ <= xdp_frags) {
>  			buf->pagecnt_bias++;
> +		}
>  
>  		ice_put_rx_buf(rx_ring, buf);
>  	}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] ice: fix packet corruption due to extraneous page flip
  2026-05-07 22:11 ` Jacob Keller
@ 2026-05-08  2:37   ` John Ousterhout
  2026-05-08 21:55     ` Jacob Keller
  0 siblings, 1 reply; 6+ messages in thread
From: John Ousterhout @ 2026-05-08  2:37 UTC (permalink / raw)
  To: Jacob Keller
  Cc: anthony.l.nguyen, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	przemyslaw.kitszel, netdev, stable

Correct: this patch only applies to the ice driver before its conversion.

The patch applies to versions 6.18.27 and 6.12.86. I believe the bug
may also be present in 6.6.137, but the code has a slightly different
structure there (the function ice_put_rx_mbuf doesn't yet exist in
that version) so the patch would need to be reworked a bit.

This situation isn't all that rare. It isn't a zero-length packet that
triggers it; it seems to happen if a packet uses every available byte
in a buffer, ending precisely at the end of the buffer. When this
happens, the NIC seems to generate an extra zero-length "buffer". This
happens quite frequently (thousands of times per second in some of my
workloads).

What keeps corruption from happening constantly is that there is only
a problem if the "other half" of the buffer page is still active when
the 0-length buffer is received from the NIC. I suspect that with TCP
this is pretty unlikely: packet buffers get recycled quickly. If the
other half is not in use, then it doesn't matter whether the page gets
"flipped" while processing the 0-length buffer. I ran into this
problem because I was testing Homa under conditions that caused some
packet buffers to stay alive for longer periods of time.

-John-


On Thu, May 7, 2026 at 3:11 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> On 5/7/2026 11:38 AM, John Ousterhout wrote:
> > Note: major revisions to the ice driver make this patch irrelevant
> > for recent versions. It applies to longterm stable versions
> > 6.18.27 and 6.12.86; it also seems relevant for 6.6.137, but would
> > need modifications for that version. I have not examined earlier
> > versions
> >
>
> From this description I take it this only applies to the ice driver
> prior to its conversion to page pool?
>
> In that case, I think you need to Cc: stable@vger.kernel.org and include
> the relevant versions you intend to target.
>
> I think this case is "unique" since there would not be an upstream
> equivalent patch. But that is merely because we removed the faulty code
> before it could be fixed.
>
> I'm not 100% sure whta method to follow since typical stable rules don't
> really like taking patches that don't apply to mainline...
>
> Even with it being somewhat rare to get 0 size packet, it is not
> impossible and packet corruption is a Big(TM) deal.
>
> Thanks,
> Jake
>
> > Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
> >  1 file changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > index 51c459a3e722..081c7a7392b7 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > @@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> >               xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
> >
> >       while (idx != ntc) {
> > +             union ice_32b_rx_flex_desc *rx_desc;
> > +             unsigned int size;
> > +
> > +             rx_desc = ICE_RX_DESC(rx_ring, idx);
> > +             size = le16_to_cpu(rx_desc->wb.pkt_len) &
> > +                    ICE_RX_FLX_DESC_PKT_LEN_M;
> > +
> >               buf = &rx_ring->rx_buf[idx];
> >               if (++idx == cnt)
> >                       idx = 0;
> > @@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> >                * To do this, only adjust pagecnt_bias for fragments up to
> >                * the total remaining after the XDP program has run.
> >                */
> > -             if (verdict != ICE_XDP_CONSUMED)
> > -                     ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> > -             else if (i++ <= xdp_frags)
> > +             if (verdict != ICE_XDP_CONSUMED) {
> > +                     /* Don't "flip" the page if size is 0: in this case
> > +                      * the data in the current half will not be used so
> > +                      * it's OK to reuse that half. And, since the bias
> > +                      * didn't get decremented for this half, the page can
> > +                      * be returned to the NIC even if the other half is
> > +                      * still in use, so flipping the page could cause
> > +                      * live packet data to be overwritten.
> > +                      */
> > +                     if (size != 0)
> > +                             ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> > +             } else if (i++ <= xdp_frags) {
> >                       buf->pagecnt_bias++;
> > +             }
> >
> >               ice_put_rx_buf(rx_ring, buf);
> >       }
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] ice: fix packet corruption due to extraneous page flip
  2026-05-08  2:37   ` John Ousterhout
@ 2026-05-08 21:55     ` Jacob Keller
  2026-05-08 21:59       ` John Ousterhout
  0 siblings, 1 reply; 6+ messages in thread
From: Jacob Keller @ 2026-05-08 21:55 UTC (permalink / raw)
  To: John Ousterhout
  Cc: anthony.l.nguyen, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	przemyslaw.kitszel, netdev, stable

On 5/7/2026 7:37 PM, John Ousterhout wrote:
> Correct: this patch only applies to the ice driver before its conversion.
> 
> The patch applies to versions 6.18.27 and 6.12.86. I believe the bug
> may also be present in 6.6.137, but the code has a slightly different
> structure there (the function ice_put_rx_mbuf doesn't yet exist in
> that version) so the patch would need to be reworked a bit.
> 
> This situation isn't all that rare. It isn't a zero-length packet that
> triggers it; it seems to happen if a packet uses every available byte
> in a buffer, ending precisely at the end of the buffer. When this
> happens, the NIC seems to generate an extra zero-length "buffer". This
> happens quite frequently (thousands of times per second in some of my
> workloads).
> 
> What keeps corruption from happening constantly is that there is only
> a problem if the "other half" of the buffer page is still active when
> the 0-length buffer is received from the NIC. I suspect that with TCP
> this is pretty unlikely: packet buffers get recycled quickly. If the
> other half is not in use, then it doesn't matter whether the page gets
> "flipped" while processing the 0-length buffer. I ran into this
> problem because I was testing Homa under conditions that caused some
> packet buffers to stay alive for longer periods of time.
> 
> -John-
Right. So I think we need to make sure the patch is cc'd to stable.
Technically it doesn't strictly follow any of the 3 rules, but its
closest to 3 with a clarification that there is no upstream equivalent
due to the libeth Rx refactor.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] ice: fix packet corruption due to extraneous page flip
  2026-05-08 21:55     ` Jacob Keller
@ 2026-05-08 21:59       ` John Ousterhout
  2026-05-08 22:34         ` Jacob Keller
  0 siblings, 1 reply; 6+ messages in thread
From: John Ousterhout @ 2026-05-08 21:59 UTC (permalink / raw)
  To: Jacob Keller
  Cc: anthony.l.nguyen, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	przemyslaw.kitszel, netdev, stable

On Fri, May 8, 2026 at 2:55 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> On 5/7/2026 7:37 PM, John Ousterhout wrote:
> > Correct: this patch only applies to the ice driver before its conversion.
> >
> > The patch applies to versions 6.18.27 and 6.12.86. I believe the bug
> > may also be present in 6.6.137, but the code has a slightly different
> > structure there (the function ice_put_rx_mbuf doesn't yet exist in
> > that version) so the patch would need to be reworked a bit.
> >
> > This situation isn't all that rare. It isn't a zero-length packet that
> > triggers it; it seems to happen if a packet uses every available byte
> > in a buffer, ending precisely at the end of the buffer. When this
> > happens, the NIC seems to generate an extra zero-length "buffer". This
> > happens quite frequently (thousands of times per second in some of my
> > workloads).
> >
> > What keeps corruption from happening constantly is that there is only
> > a problem if the "other half" of the buffer page is still active when
> > the 0-length buffer is received from the NIC. I suspect that with TCP
> > this is pretty unlikely: packet buffers get recycled quickly. If the
> > other half is not in use, then it doesn't matter whether the page gets
> > "flipped" while processing the 0-length buffer. I ran into this
> > problem because I was testing Homa under conditions that caused some
> > packet buffers to stay alive for longer periods of time.
> >
> > -John-
> Right. So I think we need to make sure the patch is cc'd to stable.
> Technically it doesn't strictly follow any of the 3 rules, but its
> closest to 3 with a clarification that there is no upstream equivalent
> due to the libeth Rx refactor.

It looks like messages on this chain have been cc-ed to stable since
your first message. Is that sufficient, or do I need to resubmit (e.g.
v3) with stable in the cc list?

-John-

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] ice: fix packet corruption due to extraneous page flip
  2026-05-08 21:59       ` John Ousterhout
@ 2026-05-08 22:34         ` Jacob Keller
  0 siblings, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2026-05-08 22:34 UTC (permalink / raw)
  To: John Ousterhout
  Cc: anthony.l.nguyen, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	przemyslaw.kitszel, netdev, stable

On 5/8/2026 2:59 PM, John Ousterhout wrote:
> On Fri, May 8, 2026 at 2:55 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>
>> On 5/7/2026 7:37 PM, John Ousterhout wrote:
>>> Correct: this patch only applies to the ice driver before its conversion.
>>>
>>> The patch applies to versions 6.18.27 and 6.12.86. I believe the bug
>>> may also be present in 6.6.137, but the code has a slightly different
>>> structure there (the function ice_put_rx_mbuf doesn't yet exist in
>>> that version) so the patch would need to be reworked a bit.
>>>
>>> This situation isn't all that rare. It isn't a zero-length packet that
>>> triggers it; it seems to happen if a packet uses every available byte
>>> in a buffer, ending precisely at the end of the buffer. When this
>>> happens, the NIC seems to generate an extra zero-length "buffer". This
>>> happens quite frequently (thousands of times per second in some of my
>>> workloads).
>>>
>>> What keeps corruption from happening constantly is that there is only
>>> a problem if the "other half" of the buffer page is still active when
>>> the 0-length buffer is received from the NIC. I suspect that with TCP
>>> this is pretty unlikely: packet buffers get recycled quickly. If the
>>> other half is not in use, then it doesn't matter whether the page gets
>>> "flipped" while processing the 0-length buffer. I ran into this
>>> problem because I was testing Homa under conditions that caused some
>>> packet buffers to stay alive for longer periods of time.
>>>
>>> -John-
>> Right. So I think we need to make sure the patch is cc'd to stable.
>> Technically it doesn't strictly follow any of the 3 rules, but its
>> closest to 3 with a clarification that there is no upstream equivalent
>> due to the libeth Rx refactor.
> 
> It looks like messages on this chain have been cc-ed to stable since
> your first message. Is that sufficient, or do I need to resubmit (e.g.
> v3) with stable in the cc list?
> 
> -John-

I had added cc to stable to get some visibility, but I suspect that it
won't show up to the stable maintainers without being sent fully as a
patch that can be picked up by patchwork etc. Thus....

Its probably best to send a version to stable along with a comment about
why you can't list an upstream commit id following the guidelines from
Documentation/process/stable-rules.rst specifically the "option 3" rule,
since we can't apply this fix to any main tree, and there is no
equivalent commit already to backport.

Its a bit unorthodox but I can't see any other solution. It is also
important to be extremely clear in the commit to explain why it deviates
from the upstream (which was fixed accidentally by libeth refactor and
pagepool conversion) as to why we need a separate commit is necessary.

For now I would just target the kernels that the patch easily applies
on. Fixing some is better than fixing none. For the 6.6.x series, I can
try to poke someone from Intel to see if we can get something tested.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-08 22:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-07 18:38 [PATCH net v2] ice: fix packet corruption due to extraneous page flip John Ousterhout
2026-05-07 22:11 ` Jacob Keller
2026-05-08  2:37   ` John Ousterhout
2026-05-08 21:55     ` Jacob Keller
2026-05-08 21:59       ` John Ousterhout
2026-05-08 22:34         ` Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox