[PATCH net 0/2] gve: fix crashes on invalid TX queue indices

* [PATCH net 0/2] gve: fix crashes on invalid TX queue indices
@ 2026-01-05 23:25 Joshua Washington
  2026-01-05 23:25 ` [PATCH net 1/2] gve: drop packets on invalid queue indices in GQI TX path Joshua Washington
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Joshua Washington @ 2026-01-05 23:25 UTC (permalink / raw)
  To: netdev
  Cc: Joshua Washington, Harshitha Ramamurthy, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Ankit Garg, Praveen Kaligineedi,
	Catherine Sullivan, Luigi Rizzo, Jon Olson, Sagi Shahar,
	Bailey Forrest, linux-kernel, stable

From: Ankit Garg <nktgrg@google.com>

This series fixes a kernel panic in the GVE driver caused by
out-of-bounds array access when the network stack provides an invalid
TX queue index.

The issue impacts both GQI and DQO queue formats. For both cases, the
driver is updated to validate the queue index and drop the packet if
the index is out of range.
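
The validate-then-drop behaviour described above can be modelled in plain
userspace C. This is only a sketch under stated assumptions: `model_priv`,
`model_ring`, `model_tx` and `MODEL_TX_OK` are hypothetical stand-ins and
not the driver's real types. The actual patches operate on the skb and use
skb_get_queue_mapping(), net_warn_ratelimited() and dev_kfree_skb_any().

```c
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAX_QUEUES 4	/* hypothetical capacity of the tx[] array */
#define MODEL_TX_OK 0		/* stand-in for NETDEV_TX_OK */

struct model_ring {
	unsigned long sent;	/* packets handed to this ring */
};

struct model_priv {
	struct model_ring tx[MODEL_MAX_QUEUES];
	uint16_t num_queues;	/* currently configured TX queues */
	unsigned long dropped;	/* packets dropped for a bad index */
};

/* Validate the queue index before indexing tx[]; drop on a bad index. */
static int model_tx(struct model_priv *priv, uint16_t qid)
{
	if (qid >= priv->num_queues) {
		fprintf(stderr,
			"qid %u out of range, num tx queue %u, dropping packet\n",
			(unsigned int)qid, (unsigned int)priv->num_queues);
		priv->dropped++;
		/* The packet is consumed here, so report it as handled. */
		return MODEL_TX_OK;
	}

	priv->tx[qid].sent++;
	return MODEL_TX_OK;
}
```

In this model the out-of-range case still returns the "OK" status, which
mirrors the patches returning NETDEV_TX_OK after freeing the skb: the packet
is gone, so the caller has nothing to retry.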

Ankit Garg (2):
  gve: drop packets on invalid queue indices in GQI TX path
  gve: drop packets on invalid queue indices in DQO TX path

 drivers/net/ethernet/google/gve/gve_tx.c     | 12 +++++++++---
 drivers/net/ethernet/google/gve/gve_tx_dqo.c |  9 ++++++++-
 2 files changed, 17 insertions(+), 4 deletions(-)

-- 
2.52.0.351.gbe84eed79e-goog


* [PATCH net 1/2] gve: drop packets on invalid queue indices in GQI TX path
  2026-01-05 23:25 [PATCH net 0/2] gve: fix crashes on invalid TX queue indices Joshua Washington
@ 2026-01-05 23:25 ` Joshua Washington
  2026-01-05 23:25 ` [PATCH net 2/2] gve: drop packets on invalid queue indices in DQO " Joshua Washington
  2026-01-07  2:22 ` [PATCH net 0/2] gve: fix crashes on invalid TX queue indices Jakub Kicinski
  2 siblings, 0 replies; 8+ messages in thread
From: Joshua Washington @ 2026-01-05 23:25 UTC (permalink / raw)
  To: netdev
  Cc: Joshua Washington, Harshitha Ramamurthy, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Ankit Garg, Praveen Kaligineedi,
	Catherine Sullivan, Luigi Rizzo, Jon Olson, Sagi Shahar,
	Bailey Forrest, linux-kernel, stable

From: Ankit Garg <nktgrg@google.com>

The driver currently assumes that the skb queue mapping is within the
range of configured TX queues. However, the stack may provide an index
that exceeds the number of active queues.

In GQI format, an out-of-range index triggers a warning, but the driver
still dereferences the tx array, potentially causing a crash like the
one below:

[    6.700970] Call Trace:
[    6.703576]  ? __warn+0x94/0xe0
[    6.706863]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.712223]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.716197]  ? report_bug+0xb1/0xe0
[    6.721195]  ? do_error_trap+0x9e/0xd0
[    6.725084]  ? do_invalid_op+0x36/0x40
[    6.730355]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.734353]  ? invalid_op+0x14/0x20
[    6.739372]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.743350]  ? netif_skb_features+0xcf/0x2a0
[    6.749137]  dev_hard_start_xmit+0xd7/0x240

Change that behavior to log a warning and drop the packet.

Cc: stable@vger.kernel.org
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
---
 drivers/net/ethernet/google/gve/gve_tx.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 97efc8d..30d1686 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -739,12 +739,18 @@ drop:
 netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gve_priv *priv = netdev_priv(dev);
+	u16 qid = skb_get_queue_mapping(skb);
 	struct gve_tx_ring *tx;
 	int nsegs;
 
-	WARN(skb_get_queue_mapping(skb) >= priv->tx_cfg.num_queues,
-	     "skb queue index out of range");
-	tx = &priv->tx[skb_get_queue_mapping(skb)];
+	if (unlikely(qid >= priv->tx_cfg.num_queues)) {
+		net_warn_ratelimited("%s: skb qid %d out of range, num tx queue %d. dropping packet",
+				     dev->name, qid, priv->tx_cfg.num_queues);
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	tx = &priv->tx[qid];
 	if (unlikely(gve_maybe_stop_tx(priv, tx, skb))) {
 		/* We need to ring the txq doorbell -- we have stopped the Tx
 		 * queue for want of resources, but prior calls to gve_tx()
-- 
2.52.0.351.gbe84eed79e-goog


* [PATCH net 2/2] gve: drop packets on invalid queue indices in DQO TX path
  2026-01-05 23:25 [PATCH net 0/2] gve: fix crashes on invalid TX queue indices Joshua Washington
  2026-01-05 23:25 ` [PATCH net 1/2] gve: drop packets on invalid queue indices in GQI TX path Joshua Washington
@ 2026-01-05 23:25 ` Joshua Washington
  2026-01-07  2:22 ` [PATCH net 0/2] gve: fix crashes on invalid TX queue indices Jakub Kicinski
  2 siblings, 0 replies; 8+ messages in thread
From: Joshua Washington @ 2026-01-05 23:25 UTC (permalink / raw)
  To: netdev
  Cc: Joshua Washington, Harshitha Ramamurthy, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Ankit Garg, Praveen Kaligineedi,
	Catherine Sullivan, Luigi Rizzo, Jon Olson, Sagi Shahar,
	Bailey Forrest, linux-kernel, stable

From: Ankit Garg <nktgrg@google.com>

The driver currently assumes that the skb queue mapping is within the
range of configured TX queues. However, the stack may provide an index
that exceeds the number of active queues.

In DQO format, the driver performs no validation and dereferences the
tx array directly, potentially causing a crash like the one below (the
trace is from GQI format, but both formats handle an out-of-bounds
queue index the same way):

[    6.700970] Call Trace:
[    6.703576]  ? __warn+0x94/0xe0
[    6.706863]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.712223]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.716197]  ? report_bug+0xb1/0xe0
[    6.721195]  ? do_error_trap+0x9e/0xd0
[    6.725084]  ? do_invalid_op+0x36/0x40
[    6.730355]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.734353]  ? invalid_op+0x14/0x20
[    6.739372]  ? gve_tx+0xa9f/0xc30 [gve]
[    6.743350]  ? netif_skb_features+0xcf/0x2a0
[    6.749137]  dev_hard_start_xmit+0xd7/0x240

Change that behavior to log a warning and drop the packet.

Cc: stable@vger.kernel.org
Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
---
 drivers/net/ethernet/google/gve/gve_tx_dqo.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
index 40b89b3..8ebcc84 100644
--- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
@@ -1045,9 +1045,16 @@ static void gve_xsk_reorder_queue_pop_dqo(struct gve_tx_ring *tx)
 netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gve_priv *priv = netdev_priv(dev);
+	u16 qid = skb_get_queue_mapping(skb);
 	struct gve_tx_ring *tx;

-	tx = &priv->tx[skb_get_queue_mapping(skb)];
+	if (unlikely(qid >= priv->tx_cfg.num_queues)) {
+		net_warn_ratelimited("%s: skb qid %d out of range, num tx queue %d. dropping packet",
+				     dev->name, qid, priv->tx_cfg.num_queues);
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+	tx = &priv->tx[qid];
 	if (unlikely(gve_try_tx_skb(priv, tx, skb) < 0)) {
 		/* We need to ring the txq doorbell -- we have stopped the Tx
 		 * queue for want of resources, but prior calls to gve_tx()
--
2.52.0.351.gbe84eed79e-goog


* Re: [PATCH net 0/2] gve: fix crashes on invalid TX queue indices
  2026-01-05 23:25 [PATCH net 0/2] gve: fix crashes on invalid TX queue indices Joshua Washington
  2026-01-05 23:25 ` [PATCH net 1/2] gve: drop packets on invalid queue indices in GQI TX path Joshua Washington
  2026-01-05 23:25 ` [PATCH net 2/2] gve: drop packets on invalid queue indices in DQO " Joshua Washington
@ 2026-01-07  2:22 ` Jakub Kicinski
  2026-01-08 15:35   ` Ankit Garg
  2 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2026-01-07  2:22 UTC (permalink / raw)
  To: Joshua Washington
  Cc: netdev, Harshitha Ramamurthy, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Willem de Bruijn, Ankit Garg,
	Praveen Kaligineedi, Catherine Sullivan, Luigi Rizzo, Jon Olson,
	Sagi Shahar, Bailey Forrest, linux-kernel, stable

On Mon,  5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> This series fixes a kernel panic in the GVE driver caused by
> out-of-bounds array access when the network stack provides an invalid
> TX queue index.

Do you know how? I seem to recall we had such issues due to bugs
in the qdisc layer, most of which were fixed.

Fixing this at the source, if possible, would be far preferable
to sprinkling this condition to all the drivers.

* Re: [PATCH net 0/2] gve: fix crashes on invalid TX queue indices
  2026-01-07  2:22 ` [PATCH net 0/2] gve: fix crashes on invalid TX queue indices Jakub Kicinski
@ 2026-01-08 15:35   ` Ankit Garg
  2026-01-08 16:31     ` Jakub Kicinski
  2026-01-08 16:37     ` Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: Ankit Garg @ 2026-01-08 15:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Joshua Washington, netdev, Harshitha Ramamurthy, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn,
	Praveen Kaligineedi, Catherine Sullivan, Luigi Rizzo, Jon Olson,
	Sagi Shahar, Bailey Forrest, linux-kernel, stable

On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> > This series fixes a kernel panic in the GVE driver caused by
> > out-of-bounds array access when the network stack provides an invalid
> > TX queue index.
>
> Do you know how? I seem to recall we had such issues due to bugs
> in the qdisc layer, most of which were fixed.
>
> Fixing this at the source, if possible, would be far preferable
> to sprinkling this condition to all the drivers.

That matches our observation: we have encountered this panic on older
kernels (specifically Rocky Linux 8) but have not been able to
reproduce it on recent upstream kernels.

Could you point us to the specific qdisc fixes you recall? We'd like
to verify if the issue we are seeing on the older kernel is indeed one
of those known/fixed bugs.

If it turns out this is fully resolved in the core network stack
upstream, we can drop this patch for the mainline driver. However, if
there is ambiguity, do you think there is value in keeping this check
to prevent the driver from crashing on invalid input?

Thanks,
Ankit Garg

* Re: [PATCH net 0/2] gve: fix crashes on invalid TX queue indices
  2026-01-08 15:35   ` Ankit Garg
@ 2026-01-08 16:31     ` Jakub Kicinski
  2026-01-08 16:37     ` Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-01-08 16:31 UTC (permalink / raw)
  To: Ankit Garg
  Cc: Joshua Washington, netdev, Harshitha Ramamurthy, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn,
	Praveen Kaligineedi, Catherine Sullivan, Luigi Rizzo, Jon Olson,
	Sagi Shahar, Bailey Forrest, linux-kernel, stable

On Thu, 8 Jan 2026 07:35:59 -0800 Ankit Garg wrote:
> On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon,  5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:  
> > > This series fixes a kernel panic in the GVE driver caused by
> > > out-of-bounds array access when the network stack provides an invalid
> > > TX queue index.  
> >
> > Do you know how? I seem to recall we had such issues due to bugs
> > in the qdisc layer, most of which were fixed.
> >
> > Fixing this at the source, if possible, would be far preferable
> > to sprinkling this condition to all the drivers.  
> 
> That matches our observation—we have encountered this panic on older
> kernels (specifically Rocky Linux 8) but have not been able to
> reproduce it on recent upstream kernels.
> 
> Could you point us to the specific qdisc fixes you recall? We'd like
> to verify if the issue we are seeing on the older kernel is indeed one
> of those known/fixed bugs.

Very old - ac5b70198adc25

> If it turns out this is fully resolved in the core network stack
> upstream, we can drop this patch for the mainline driver. However, if
> there is ambiguity, do you think there is value in keeping this check
> to prevent the driver from crashing on invalid input?

The API contract is that the stack does not send frames for queues
which don't exist (index >= real_num_tx_queues) down to the drivers.
There's no ambiguity, IMO: if the stack sends such frames, it's a bug
in the stack.

* Re: [PATCH net 0/2] gve: fix crashes on invalid TX queue indices
  2026-01-08 15:35   ` Ankit Garg
  2026-01-08 16:31     ` Jakub Kicinski
@ 2026-01-08 16:37     ` Eric Dumazet
  2026-01-08 20:53       ` Ankit Garg
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2026-01-08 16:37 UTC (permalink / raw)
  To: Ankit Garg
  Cc: Jakub Kicinski, Joshua Washington, netdev, Harshitha Ramamurthy,
	Andrew Lunn, David S. Miller, Paolo Abeni, Willem de Bruijn,
	Praveen Kaligineedi, Catherine Sullivan, Luigi Rizzo, Jon Olson,
	Sagi Shahar, Bailey Forrest, linux-kernel, stable

On Thu, Jan 8, 2026 at 4:36 PM Ankit Garg <nktgrg@google.com> wrote:
>
> On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon,  5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> > > This series fixes a kernel panic in the GVE driver caused by
> > > out-of-bounds array access when the network stack provides an invalid
> > > TX queue index.
> >
> > Do you know how? I seem to recall we had such issues due to bugs
> > in the qdisc layer, most of which were fixed.
> >
> > Fixing this at the source, if possible, would be far preferable
> > to sprinkling this condition to all the drivers.
> That matches our observation—we have encountered this panic on older
> kernels (specifically Rocky Linux 8) but have not been able to
> reproduce it on recent upstream kernels.

What is the kernel version used in Rocky Linux 8?

Note that the test against real_num_tx_queues is done before reaching
the Qdisc layer.

It might help to give a stack trace of a panic.

>
> Could you point us to the specific qdisc fixes you recall? We'd like
> to verify if the issue we are seeing on the older kernel is indeed one
> of those known/fixed bugs.
>
> If it turns out this is fully resolved in the core network stack
> upstream, we can drop this patch for the mainline driver. However, if
> there is ambiguity, do you think there is value in keeping this check
> to prevent the driver from crashing on invalid input?

We already have many costly checks, and netdev_core_pick_tx() should
already prevent such panic.
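
Roughly, the core-stack guard referred to here caps a selected queue index
against real_num_tx_queues before the driver ever sees it. A minimal
userspace sketch of that idea follows. `model_dev` and `model_cap_txqueue`
are hypothetical names, the fallback to queue 0 is an assumption made for
illustration, and this is not the kernel's implementation.

```c
#include <stdint.h>
#include <stdio.h>

struct model_dev {
	uint16_t real_num_tx_queues;	/* queues the device currently exposes */
};

/* Return a queue index that is always below real_num_tx_queues. */
static uint16_t model_cap_txqueue(const struct model_dev *dev,
				  uint16_t queue_index)
{
	if (queue_index >= dev->real_num_tx_queues) {
		fprintf(stderr, "queue %u out of range (%u queues), using 0\n",
			(unsigned int)queue_index,
			(unsigned int)dev->real_num_tx_queues);
		return 0;	/* assumed fallback queue */
	}
	return queue_index;
}
```

With a guard of this shape on the stack side, a per-packet check in each
driver's transmit routine becomes redundant, which is the cost argument
made above.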

>
> Thanks,
> Ankit Garg

* Re: [PATCH net 0/2] gve: fix crashes on invalid TX queue indices
  2026-01-08 16:37     ` Eric Dumazet
@ 2026-01-08 20:53       ` Ankit Garg
  0 siblings, 0 replies; 8+ messages in thread
From: Ankit Garg @ 2026-01-08 20:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jakub Kicinski, Joshua Washington, netdev, Harshitha Ramamurthy,
	Andrew Lunn, David S. Miller, Paolo Abeni, Willem de Bruijn,
	Praveen Kaligineedi, Catherine Sullivan, Luigi Rizzo, Jon Olson,
	Sagi Shahar, Bailey Forrest, linux-kernel, stable

On Thu, Jan 8, 2026 at 8:37 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jan 8, 2026 at 4:36 PM Ankit Garg <nktgrg@google.com> wrote:
> >
> > On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Mon,  5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> > > > This series fixes a kernel panic in the GVE driver caused by
> > > > out-of-bounds array access when the network stack provides an invalid
> > > > TX queue index.
> > >
> > > Do you know how? I seem to recall we had such issues due to bugs
> > > in the qdisc layer, most of which were fixed.
> > >
> > > Fixing this at the source, if possible, would be far preferable
> > > to sprinkling this condition to all the drivers.
> > That matches our observation—we have encountered this panic on older
> > kernels (specifically Rocky Linux 8) but have not been able to
> > reproduce it on recent upstream kernels.
>
> What is the kernel version used in Rocky Linux 8 ?
>
The kernel version where we observed this is 4.18.0 (full version
4.18.0-553.81.1+2.1.el8_10_ciq).

> Note that the test against real_num_tx_queues is done before reaching
> the Qdisc layer.
>
> It might help to give a stack trace of a panic.
>
The crash happens in the sch_direct_xmit path, per the trace.

I wonder if sch_direct_xmit is acting as an optimization to bypass the
queueing layer, and if that is somehow bypassing the queue index
checks you mentioned?

I'll try to dig a bit deeper into that specific flow, but here is the
trace in the meantime:

Call Trace:
? __warn+0x94/0xe0
? gve_tx+0xa9f/0xc30 [gve]
? gve_tx+0xa9f/0xc30 [gve]
? report_bug+0xb1/0xe0
? do_error_trap+0x9e/0xd0
? do_invalid_op+0x36/0x40
? gve_tx+0xa9f/0xc30 [gve]
? invalid_op+0x14/0x20
? gve_tx+0xa9f/0xc30 [gve]
? netif_skb_features+0xcf/0x2a0
dev_hard_start_xmit+0xd7/0x240
sch_direct_xmit+0x9f/0x370
__dev_queue_xmit+0xa04/0xc50
ip_finish_output2+0x26d/0x430
? __ip_finish_output+0xdf/0x1d0
ip_output+0x70/0xf0
__ip_queue_xmit+0x165/0x400
__tcp_transmit_skb+0xa6b/0xb90
tcp_connect+0xae3/0xd40
tcp_v4_connect+0x476/0x4f0
__inet_stream_connect+0xda/0x380

> >
> > Could you point us to the specific qdisc fixes you recall? We'd like
> > to verify if the issue we are seeing on the older kernel is indeed one
> > of those known/fixed bugs.
> >
> > If it turns out this is fully resolved in the core network stack
> > upstream, we can drop this patch for the mainline driver. However, if
> > there is ambiguity, do you think there is value in keeping this check
> > to prevent the driver from crashing on invalid input?
>
> We already have many costly checks, and netdev_core_pick_tx() should
> already prevent such panic.
>
> >
> > Thanks,
> > Ankit Garg
