* [PATCH net] net: fix potential out-of-bound access in set_rps_cpu()
@ 2026-02-20 14:43 Eric Dumazet
From: Eric Dumazet @ 2026-02-20 14:43 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev,
eric.dumazet, Eric Dumazet, Krishna Kumar
The blamed commit forgot that set_rps_cpu() was reloading
rxqueue->rps_flow_table.
If this pointer has changed since get_rps_cpu() read it initially,
and the new table size is smaller than @flow_id,
we can access out-of-bounds data and either corrupt memory or crash.
Fix this by passing the flow_table pointer from get_rps_cpu()
to set_rps_cpu(), more in line with RCU rules.
Fixes: 48aa30443e52 ("net: Cache hash and flow_id to avoid recalculation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Krishna Kumar <krikku@gmail.com>
---
net/core/dev.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 096b3ff13f6b9bf685cb74d0e762a2b00e97d9de..7c5e69c771caf7fd590e602384be49019853b490 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4993,13 +4993,11 @@ static bool rps_flow_is_active(struct rps_dev_flow *rflow,
static struct rps_dev_flow *
set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
struct rps_dev_flow *rflow, u16 next_cpu, u32 hash,
- u32 flow_id)
+ u32 flow_id, struct rps_dev_flow_table *flow_table)
{
if (next_cpu < nr_cpu_ids) {
u32 head;
#ifdef CONFIG_RFS_ACCEL
- struct netdev_rx_queue *rxqueue;
- struct rps_dev_flow_table *flow_table;
struct rps_dev_flow *old_rflow;
struct rps_dev_flow *tmp_rflow;
unsigned int tmp_cpu;
@@ -5014,11 +5012,6 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
if (rxq_index == skb_get_rx_queue(skb))
goto out;
- rxqueue = dev->_rx + rxq_index;
- flow_table = rcu_dereference(rxqueue->rps_flow_table);
- if (!flow_table)
- goto out;
-
tmp_rflow = &flow_table->flows[flow_id];
tmp_cpu = READ_ONCE(tmp_rflow->cpu);
@@ -5134,7 +5127,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
rflow->last_qtail)) >= 0)) {
tcpu = next_cpu;
rflow = set_rps_cpu(dev, skb, rflow, next_cpu, hash,
- flow_id);
+ flow_id, flow_table);
}
if (tcpu < nr_cpu_ids && cpu_online(tcpu)) {
--
2.53.0.345.g96ddfc5eaa-goog
* Re: [net] net: fix potential out-of-bound access in set_rps_cpu()
From: Jakub Kicinski @ 2026-02-20 21:52 UTC (permalink / raw)
To: edumazet
Cc: Jakub Kicinski, willemb, netdev, davem, pabeni, krikku, horms,
kuniyu, eric.dumazet
Hi Eric!
I'm attaching the AI review. It's a bit long, I stopped reading
half way thru. But looking at the code I think AI is onto something.
We have fetched the flow_table in get_rps_cpu() based on
skb_get_rx_queue(skb), but in set_rps_cpu() we are fetching the table
for the target / next_cpu, which we make sure is _not_ the same:
if (rxq_index == skb_get_rx_queue(skb))
---
net: fix potential out-of-bound access in set_rps_cpu()
This commit addresses an out-of-bound access issue introduced when
set_rps_cpu() was reloading rxqueue->rps_flow_table. The fix passes the
flow_table pointer from get_rps_cpu() to set_rps_cpu() to follow RCU rules.
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 096b3ff13f6b..7c5e69c771ca 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4993,13 +4993,11 @@ static bool rps_flow_is_active(struct rps_dev_flow *rflow,
> static struct rps_dev_flow *
> set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
> struct rps_dev_flow *rflow, u16 next_cpu, u32 hash,
> - u32 flow_id)
> + u32 flow_id, struct rps_dev_flow_table *flow_table)
> {
> if (next_cpu < nr_cpu_ids) {
> u32 head;
> #ifdef CONFIG_RFS_ACCEL
> - struct netdev_rx_queue *rxqueue;
> - struct rps_dev_flow_table *flow_table;
> struct rps_dev_flow *old_rflow;
> struct rps_dev_flow *tmp_rflow;
> unsigned int tmp_cpu;
> @@ -5014,11 +5012,6 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
> if (rxq_index == skb_get_rx_queue(skb))
> goto out;
>
> - rxqueue = dev->_rx + rxq_index;
> - flow_table = rcu_dereference(rxqueue->rps_flow_table);
> - if (!flow_table)
> - goto out;
> -
> tmp_rflow = &flow_table->flows[flow_id];
> tmp_cpu = READ_ONCE(tmp_rflow->cpu);
>
Does this change introduce an aliasing issue in the CONFIG_RFS_ACCEL path?
In get_rps_cpu(), the flow_table parameter is obtained from the receiving
queue (dev->_rx[skb_get_rx_queue(skb)]) and rflow points to
&flow_table->flows[flow_id].
When this rflow is passed into set_rps_cpu() with the same flow_table and
flow_id, tmp_rflow is computed as &flow_table->flows[flow_id]:
net/core/dev.c:set_rps_cpu() {
tmp_rflow = &flow_table->flows[flow_id];
tmp_cpu = READ_ONCE(tmp_rflow->cpu);
rc = cpu_rmap_lookup_index(dev->rx_cpu_rmap, next_cpu);
if (rc >= 0)
next_cpu = rc;
old_rflow = rflow;
rflow = tmp_rflow;
WRITE_ONCE(rflow->filter, rc);
WRITE_ONCE(rflow->hash, hash);
if (old_rflow->filter == rc)
WRITE_ONCE(old_rflow->filter, RPS_NO_FILTER);
}
Since rflow and tmp_rflow are derived from the same flow_table and flow_id,
they point to the same memory location. After the assignments old_rflow =
rflow and rflow = tmp_rflow, all three pointers alias the same object.
The code stores rc into rflow->filter with WRITE_ONCE(rflow->filter, rc),
then checks if (old_rflow->filter == rc). Because old_rflow is rflow
(which is tmp_rflow), this condition is always true, causing
WRITE_ONCE(old_rflow->filter, RPS_NO_FILTER) to immediately clear the
filter ID that was just set.
This means the hardware filter ID is never retained in the rps_dev_flow
entry. On subsequent packets for the same flow, the stale-flow check:
if (READ_ONCE(tmp_rflow->filter) != RPS_NO_FILTER)
will always find RPS_NO_FILTER, causing ndo_rx_flow_steer() to be called
redundantly for every packet even when the flow is already steered.
Was the intent to use different flow tables for the receiving and target
queues so that the filter deduplication logic could work properly?
> @@ -5134,7 +5127,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
> rflow->last_qtail)) >= 0)) {
> tcpu = next_cpu;
> rflow = set_rps_cpu(dev, skb, rflow, next_cpu, hash,
> - flow_id);
> + flow_id, flow_table);
> }
>
> if (tcpu < nr_cpu_ids && cpu_online(tcpu)) {
--
pw-bot: cr
* Re: [net] net: fix potential out-of-bound access in set_rps_cpu()
From: Eric Dumazet @ 2026-02-20 22:03 UTC (permalink / raw)
To: Jakub Kicinski
Cc: willemb, netdev, davem, pabeni, krikku, horms, kuniyu,
eric.dumazet
On Fri, Feb 20, 2026 at 10:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Hi Eric!
>
> I'm attaching the AI review. It's a bit long, I stopped reading
> half way thru. But looking at the code I think AI is onto something.
> We have fetched the flow_table in get_rps_cpu() based on
> skb_get_rx_queue(skb), but in set_rps_cpu() we are fetching the table
> for the target / next_cpu, which we make sure is _not_ the same:
>
> if (rxq_index == skb_get_rx_queue(skb))
>
> [... AI review quoted in full above; snipped ...]
It is indeed very very very long.
I guess I will send a plain revert then.
I was trying to fix the patch, but it appears it was completely broken.