* [Intel-wired-lan] [PATCH net v3] ice: fix packet corruption due to extraneous page flip
@ 2026-05-12 18:19 John Ousterhout
2026-05-13 9:07 ` David Laight
0 siblings, 1 reply; 4+ messages in thread
From: John Ousterhout @ 2026-05-12 18:19 UTC (permalink / raw)
To: stable
Cc: anthony.l.nguyen, intel-wired-lan, przemyslaw.kitszel, netdev,
jacob.e.keller, John Ousterhout
Consider the following sequence of events:
* The bottom half of a buffer page is filled with data from
  packet A. The page has a net reference count (reference count
  - bias) of 1. The page is returned to the NIC, flipped to
  use the top half.
* Before the reference on the page is released, the NIC returns
  the page with no data in it ('size' is zero in ice_clean_rx_irq).
  In this case the bias does not get decremented. The page still
  has a net reference count of 1, so it gets returned to the NIC.
  However, ice_put_rx_mbuf flipped the page so that the bottom
  half is active.
* If the NIC stores another packet in the page before packet A
  has released its reference, the data in packet A will be
  overwritten with data from the new packet.
* Unfortunately zero-length buffers occur frequently: they seem
  to occur whenever a packet uses every available byte in a
  buffer, ending precisely at the end of the buffer. When this
  happens the NIC seems to generate an extra zero-length
  buffer.

The fix is for ice_put_rx_mbuf not to flip pages that have a
size of 0.

This patch applies directly to longterm stable versions 6.18.27
and 6.12.86; it also seems relevant for 6.6.137 but would need
modifications for that version. I have not examined earlier
versions.

Unfortunately there is no upstream commit id for this patch because
the ICE driver has undergone a major revision (libeth refactor and
pagepool conversion) that eliminated the buggy code. Thus the
problem no longer exists in the main line.

Cc: stable@vger.kernel.org # 6.12+
Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
---
drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 51c459a3e722..081c7a7392b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
 
 	while (idx != ntc) {
+		union ice_32b_rx_flex_desc *rx_desc;
+		unsigned int size;
+
+		rx_desc = ICE_RX_DESC(rx_ring, idx);
+		size = le16_to_cpu(rx_desc->wb.pkt_len) &
+		       ICE_RX_FLX_DESC_PKT_LEN_M;
+
 		buf = &rx_ring->rx_buf[idx];
 		if (++idx == cnt)
 			idx = 0;
@@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		 * To do this, only adjust pagecnt_bias for fragments up to
 		 * the total remaining after the XDP program has run.
 		 */
-		if (verdict != ICE_XDP_CONSUMED)
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-		else if (i++ <= xdp_frags)
+		if (verdict != ICE_XDP_CONSUMED) {
+			/* Don't "flip" the page if size is 0: in this case
+			 * the data in the current half will not be used so
+			 * it's OK to reuse that half. And, since the bias
+			 * didn't get decremented for this half, the page can
+			 * be returned to the NIC even if the other half is
+			 * still in use, so flipping the page could cause
+			 * live packet data to be overwritten.
+			 */
+			if (size != 0)
+				ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+		} else if (i++ <= xdp_frags) {
 			buf->pagecnt_bias++;
+		}
 
 		ice_put_rx_buf(rx_ring, buf);
 	}
--
2.43.0
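
The sequence in the commit message can be replayed with a small
user-space model. This is only a sketch under stated assumptions: the
toy_* names, the 2048-byte half size, and the simplified reuse rule
(net reference count of at most 1) are hypothetical stand-ins, not the
driver's actual structures or logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_HALF 2048u /* assumed size of one half-page buffer */

/* Toy stand-in for the driver's per-buffer state (not struct ice_rx_buf). */
struct toy_rx_buf {
	unsigned char page[2 * TOY_HALF];
	unsigned int page_offset;  /* half the NIC will write next */
	unsigned int refcount;     /* total references on the page */
	unsigned int pagecnt_bias; /* references still owned by the driver */
};

/*
 * Handle one completed descriptor and report whether the page may go back
 * to the NIC. flip_on_zero selects the pre-patch (true) or patched (false)
 * behaviour for zero-length buffers.
 */
static bool toy_put_rx_buf(struct toy_rx_buf *buf, unsigned int size,
			   bool flip_on_zero)
{
	if (size != 0) {
		assert(buf->pagecnt_bias > 0);
		buf->pagecnt_bias--; /* the packet now holds one reference */
	}
	if (size != 0 || flip_on_zero)
		buf->page_offset ^= TOY_HALF; /* "flip" to the other half */

	/* Simplified reuse rule: at most one outstanding reference. */
	return buf->refcount - buf->pagecnt_bias <= 1;
}

/* Replay the sequence; return true if packet A's bytes are still intact. */
static bool toy_packet_a_survives(bool flip_on_zero)
{
	struct toy_rx_buf buf = { .refcount = 2, .pagecnt_bias = 2 };
	bool with_nic;
	unsigned int i;

	/* Packet A lands in the bottom half and is passed up the stack. */
	memset(buf.page + buf.page_offset, 'A', TOY_HALF);
	with_nic = toy_put_rx_buf(&buf, TOY_HALF, flip_on_zero);

	/* The NIC completes the top half with no data in it. */
	if (with_nic)
		with_nic = toy_put_rx_buf(&buf, 0, flip_on_zero);

	/* Packet B arrives while packet A still holds its reference. */
	if (with_nic)
		memset(buf.page + buf.page_offset, 'B', TOY_HALF);

	for (i = 0; i < TOY_HALF; i++) {
		if (buf.page[i] != 'A')
			return false;
	}
	return true;
}
```

In this model, toy_packet_a_survives(true) reports corruption because
the zero-length completion flips the page back onto packet A, while
toy_packet_a_survives(false) leaves packet A untouched.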
* Re: [Intel-wired-lan] [PATCH net v3] ice: fix packet corruption due to extraneous page flip
2026-05-12 18:19 [Intel-wired-lan] [PATCH net v3] ice: fix packet corruption due to extraneous page flip John Ousterhout
@ 2026-05-13 9:07 ` David Laight
2026-05-13 16:28 ` John Ousterhout
0 siblings, 1 reply; 4+ messages in thread
From: David Laight @ 2026-05-13 9:07 UTC (permalink / raw)
To: John Ousterhout
Cc: stable, anthony.l.nguyen, intel-wired-lan, przemyslaw.kitszel,
netdev, jacob.e.keller
On Tue, 12 May 2026 11:19:53 -0700
John Ousterhout <ouster@cs.stanford.edu> wrote:
> Consider the following sequence of events:
> * The bottom half of a buffer page is filled with data from
> packet A. The page has a net reference count (reference count
> - bias) of 1. The page is returned to the NIC, flipped to
> use the top half.
> * Before the reference on the page is released, the NIC returns
> the page with no data in it ('size' is zero in ice_clean_rx_irq).
> In this case the bias does not get decremented. The page still
> has a net reference count of 1, so it gets returned to the NIC.
> However, ice_put_rx_mbuf flipped the page so that the bottom
> half is active.
> * If the NIC stores another packet in the page before packet A
> has released its reference, the data in packet A will be
> overwritten with data from the new packet.
> * Unfortunately zero-length buffers occur frequently: they seem
> to occur whenever a packet uses every available byte in a
> buffer, ending precisely at the end of the buffer. When this
> happens the NIC seems to generate an extra zero-length
> buffer.
> The fix is for ice_put_rx_mbuf not to flip pages that have a
> size of 0.
How is this different from packet B (in the top half) being
freed before packet A (in the bottom half)?
> This patch applies directly to longterm stable versions 6.18.27
> and 6.12.86; it also seems relevant for 6.6.137 but would need
> modifications for that version. I have not examined earlier
> versions.
>
> Unfortunately there is no upstream commit id for this patch because
> the ICE driver has undergone a major revision (libeth refactor and
> pagepool conversion) that eliminated the buggy code. Thus the
> problem no longer exists in the main line.
>
> Cc: stable@vger.kernel.org # 6.12+
> Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
> ---
> drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index 51c459a3e722..081c7a7392b7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>
> while (idx != ntc) {
> + union ice_32b_rx_flex_desc *rx_desc;
> + unsigned int size;
> +
> + rx_desc = ICE_RX_DESC(rx_ring, idx);
> + size = le16_to_cpu(rx_desc->wb.pkt_len) &
> + ICE_RX_FLX_DESC_PKT_LEN_M;
> +
Looks like you only need to calculate 'size' for the !ICE_XDP_CONSUMED path.
You could also use the (likely cheaper) test for zero:
if (!(rx_desc->wb.pkt_len & cpu_to_le16(ICE_RX_FLX_DESC_PKT_LEN_M)))
-- David
> buf = &rx_ring->rx_buf[idx];
> if (++idx == cnt)
> idx = 0;
> @@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> * To do this, only adjust pagecnt_bias for fragments up to
> * the total remaining after the XDP program has run.
> */
> - if (verdict != ICE_XDP_CONSUMED)
> - ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> - else if (i++ <= xdp_frags)
> + if (verdict != ICE_XDP_CONSUMED) {
> + /* Don't "flip" the page if size is 0: in this case
> + * the data in the current half will not be used so
> + * it's OK to reuse that half. And, since the bias
> + * didn't get decremented for this half, the page can
> + * be returned to the NIC even if the other half is
> + * still in use, so flipping the page could cause
> + * live packet data to be overwritten.
> + */
> + if (size != 0)
> + ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> + } else if (i++ <= xdp_frags) {
> buf->pagecnt_bias++;
> + }
>
> ice_put_rx_buf(rx_ring, buf);
> }
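
The two ways of testing for a zero length discussed above can be
compared in user space. The following is a sketch, not kernel code:
toy_le16 and the toy_* helpers are hypothetical stand-ins for __le16,
cpu_to_le16() and le16_to_cpu(), and the 0x3FFF mask is an assumed
value for ICE_RX_FLX_DESC_PKT_LEN_M. The field is modelled as two raw
bytes so the result does not depend on the host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PKT_LEN_M 0x3FFFu /* assumed length mask */

/* A 16-bit field as it sits in the descriptor: little-endian bytes. */
typedef struct { uint8_t b[2]; } toy_le16;

static toy_le16 toy_cpu_to_le16(uint16_t v)
{
	toy_le16 r = { { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) } };
	return r;
}

static uint16_t toy_le16_to_cpu(toy_le16 v)
{
	return (uint16_t)(v.b[0] | (v.b[1] << 8));
}

/* Form used in the patch: convert to CPU order, mask, compare. */
static bool toy_len_is_zero_convert(toy_le16 pkt_len)
{
	return (toy_le16_to_cpu(pkt_len) & TOY_PKT_LEN_M) == 0;
}

/* Suggested form: mask in wire order against a pre-converted constant. */
static bool toy_len_is_zero_wire(toy_le16 pkt_len)
{
	toy_le16 m = toy_cpu_to_le16(TOY_PKT_LEN_M);

	return ((pkt_len.b[0] & m.b[0]) | (pkt_len.b[1] & m.b[1])) == 0;
}

/* Check every 16-bit value; return true if both forms always agree. */
static bool toy_forms_agree(void)
{
	uint32_t v;

	for (v = 0; v <= 0xFFFF; v++) {
		toy_le16 raw = toy_cpu_to_le16((uint16_t)v);

		if (toy_len_is_zero_convert(raw) != toy_len_is_zero_wire(raw))
			return false;
	}
	return true;
}
```

Both forms give the same answer for every input because a bitwise AND
commutes with a byte swap. In the kernel the converted mask is a
compile-time constant, so the wire-order test would avoid swapping the
descriptor field on big-endian hosts.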
* Re: [Intel-wired-lan] [PATCH net v3] ice: fix packet corruption due to extraneous page flip
2026-05-13 9:07 ` David Laight
@ 2026-05-13 16:28 ` John Ousterhout
2026-05-13 20:49 ` David Laight
0 siblings, 1 reply; 4+ messages in thread
From: John Ousterhout @ 2026-05-13 16:28 UTC (permalink / raw)
To: David Laight
Cc: stable, anthony.l.nguyen, intel-wired-lan, przemyslaw.kitszel,
netdev, jacob.e.keller
On Wed, May 13, 2026 at 2:07 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Tue, 12 May 2026 11:19:53 -0700
> John Ousterhout <ouster@cs.stanford.edu> wrote:
>
> > Consider the following sequence of events:
> > * The bottom half of a buffer page is filled with data from
> > packet A. The page has a net reference count (reference count
> > - bias) of 1. The page is returned to the NIC, flipped to
> > use the top half.
> > * Before the reference on the page is released, the NIC returns
> > the page with no data in it ('size' is zero in ice_clean_rx_irq).
> > In this case the bias does not get decremented. The page still
> > has a net reference count of 1, so it gets returned to the NIC.
> > However, ice_put_rx_mbuf flipped the page so that the bottom
> > half is active.
> > * If the NIC stores another packet in the page before packet A
> > has released its reference, the data in packet A will be
> > overwritten with data from the new packet.
> > * Unfortunately zero-length buffers occur frequently: they seem
> > to occur whenever a packet uses every available byte in a
> > buffer, ending precisely at the end of the buffer. When this
> > happens the NIC seems to generate an extra zero-length
> > buffer.
> > The fix is for ice_put_rx_mbuf not to flip pages that have a
> > size of 0.
>
> How is this different from packet B (in the top half) being
> freed before packet A (in the bottom half)?
I'm not sure exactly what you're referring to here. Are you asking
about a situation where both halves of the page get filled with packet
data and then the second half to be filled is the first to be freed? I
believe that the ICE driver abandons a page if both halves are ever
occupied simultaneously; the page will be returned to the system once
both halves have dropped their references. Thus it doesn't matter
which half is freed first.
> > This patch applies directly to longterm stable versions 6.18.27
> > and 6.12.86; it also seems relevant for 6.6.137 but would need
> > modifications for that version. I have not examined earlier
> > versions.
> >
> > Unfortunately there is no upstream commit id for this patch because
> > the ICE driver has undergone a major revision (libeth refactor and
> > pagepool conversion) that eliminated the buggy code. Thus the
> > problem no longer exists in the main line.
> >
> > Cc: stable@vger.kernel.org # 6.12+
> > Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
> > ---
> > drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
> > 1 file changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > index 51c459a3e722..081c7a7392b7 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > @@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> > xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
> >
> > while (idx != ntc) {
> > + union ice_32b_rx_flex_desc *rx_desc;
> > + unsigned int size;
> > +
> > + rx_desc = ICE_RX_DESC(rx_ring, idx);
> > + size = le16_to_cpu(rx_desc->wb.pkt_len) &
> > + ICE_RX_FLX_DESC_PKT_LEN_M;
> > +
>
> Looks like you only need to calculate 'size' for the !ICE_XDP_CONSUMED path.
> You could also use the (likely cheaper) test for zero:
> if (!(rx_desc->wb.pkt_len & cpu_to_le16(ICE_RX_FLX_DESC_PKT_LEN_M)))
>
> -- David
>
> > buf = &rx_ring->rx_buf[idx];
> > if (++idx == cnt)
> > idx = 0;
> > @@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> > * To do this, only adjust pagecnt_bias for fragments up to
> > * the total remaining after the XDP program has run.
> > */
> > - if (verdict != ICE_XDP_CONSUMED)
> > - ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> > - else if (i++ <= xdp_frags)
> > + if (verdict != ICE_XDP_CONSUMED) {
> > + /* Don't "flip" the page if size is 0: in this case
> > + * the data in the current half will not be used so
> > + * it's OK to reuse that half. And, since the bias
> > + * didn't get decremented for this half, the page can
> > + * be returned to the NIC even if the other half is
> > + * still in use, so flipping the page could cause
> > + * live packet data to be overwritten.
> > + */
> > + if (size != 0)
> > + ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
> > + } else if (i++ <= xdp_frags) {
> > buf->pagecnt_bias++;
> > + }
> >
> > ice_put_rx_buf(rx_ring, buf);
> > }
>
* Re: [Intel-wired-lan] [PATCH net v3] ice: fix packet corruption due to extraneous page flip
2026-05-13 16:28 ` John Ousterhout
@ 2026-05-13 20:49 ` David Laight
0 siblings, 0 replies; 4+ messages in thread
From: David Laight @ 2026-05-13 20:49 UTC (permalink / raw)
To: John Ousterhout
Cc: stable, anthony.l.nguyen, intel-wired-lan, przemyslaw.kitszel,
netdev, jacob.e.keller
On Wed, 13 May 2026 09:28:40 -0700
John Ousterhout <ouster@cs.stanford.edu> wrote:
> On Wed, May 13, 2026 at 2:07 AM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Tue, 12 May 2026 11:19:53 -0700
> > John Ousterhout <ouster@cs.stanford.edu> wrote:
> >
> > > Consider the following sequence of events:
> > > * The bottom half of a buffer page is filled with data from
> > > packet A. The page has a net reference count (reference count
> > > - bias) of 1. The page is returned to the NIC, flipped to
> > > use the top half.
> > > * Before the reference on the page is released, the NIC returns
> > > the page with no data in it ('size' is zero in ice_clean_rx_irq).
> > > In this case the bias does not get decremented. The page still
> > > has a net reference count of 1, so it gets returned to the NIC.
> > > However, ice_put_rx_mbuf flipped the page so that the bottom
> > > half is active.
> > > * If the NIC stores another packet in the page before packet A
> > > has released its reference, the data in packet A will be
> > > overwritten with data from the new packet.
> > > * Unfortunately zero-length buffers occur frequently: they seem
> > > to occur whenever a packet uses every available byte in a
> > > buffer, ending precisely at the end of the buffer. When this
> > > happens the NIC seems to generate an extra zero-length
> > > buffer.
> > > The fix is for ice_put_rx_mbuf not to flip pages that have a
> > > size of 0.
> >
> > How is this different from packet B (in the top half) being
> > freed before packet A (in the bottom half)?
>
> I'm not sure exactly what you're referring to here. Are you asking
> about a situation where both halves of the page get filled with packet
> data and then the second half to be filled is the first to be freed? I
> believe that the ICE driver abandons a page if both halves are ever
> occupied simultaneously; the page will be returned to the system once
> both halves have dropped their references. Thus it doesn't matter
> which half is freed first.
That is what I was thinking; the logic seems overcomplicated.
If you need to put 4k pages into some kind of iommu rather than 2k buffers
(to contain 1536 byte ethernet packets) then I'd have thought you'd
initially put both halves into adjacent rx ring entries.
If a rx buffer is discarded (eg a zero length fragment or a CRC error,
or even 'copy break' for short packets) then, as an optimisation,
you could reuse the buffer for another receive.
The same could be done if the page is freed by an application.
However it sounds like it doesn't use the 2nd half until the first
completes - otherwise you'd never 'flip' to make the other half
active.
Thinks...
By only putting half of each 4k 'page' into the rx ring the code
will usually save (expensive) iommu setup in the (probably) normal
case where the buffers are freed 'reasonably quickly'.
But that really requires a 'free/with_nic/busy' state for each half
rather than trying to guess from a reference count.
But if the low-level code is recycling the rx buffer (for any reason)
it wants to use the same buffer.
The ethernet driver I wrote (a long time ago, early 90s) allocated
64k as 128 512byte buffers and did an aligned word-sized copy of
every receive frame - most frames were in contiguous memory.
The simplicity of it made up for the cost of the copy, especially
since that was an iommu system.
-- David
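
The 'free/with_nic/busy' idea can be sketched as an explicit state per
half page. This is a hypothetical illustration of that suggestion only:
the toy_* names are invented here, and nothing below reflects how the
ice driver tracks its buffers.

```c
#include <assert.h>

/* Explicit ownership state for one half of a page. */
enum toy_half_state {
	TOY_FREE,     /* nobody is using this half */
	TOY_WITH_NIC, /* posted to the rx ring */
	TOY_BUSY      /* holds packet data owned by the stack */
};

struct toy_page {
	enum toy_half_state half[2];
};

/* Post a free half to the NIC; return its index, or -1 if none is free. */
static int toy_post_half(struct toy_page *pg)
{
	int h;

	for (h = 0; h < 2; h++) {
		if (pg->half[h] == TOY_FREE) {
			pg->half[h] = TOY_WITH_NIC;
			return h;
		}
	}
	return -1;
}

/*
 * The NIC has completed half h. A zero-length (discarded) buffer goes
 * straight back to TOY_FREE so the same half is reused; otherwise the
 * stack owns the data until toy_release() is called.
 */
static void toy_complete(struct toy_page *pg, int h, unsigned int size)
{
	assert(pg->half[h] == TOY_WITH_NIC);
	pg->half[h] = size ? TOY_BUSY : TOY_FREE;
}

/* The stack has finished with the packet stored in half h. */
static void toy_release(struct toy_page *pg, int h)
{
	assert(pg->half[h] == TOY_BUSY);
	pg->half[h] = TOY_FREE;
}
```

With explicit states, a zero-length completion on one half can never
select the other half while it is still TOY_BUSY, so no decision has to
be inferred from a reference count.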