From: sashiko-bot@kernel.org
To: "Harshitha Ramamurthy" <hramamurthy@google.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH net-next v2 15/15] gve: add link status/speed ctrl ops
Date: Wed, 03 Jun 2026 23:59:31 +0000
Message-ID: <20260603235932.569201F00898@smtp.kernel.org>
In-Reply-To: <20260602235900.3050782-16-hramamurthy@google.com>

Thank you for your contribution! Sashiko AI review found 4 potential issues to consider:

New issues:
- [Medium] Data race and double-read TOCTOU on unprotected `priv->link_up` boolean field.

Pre-existing issues:
- [Critical] DMA memory corruption risk in AdminQ command timeout handling.
- [Critical] Use-After-Free of `priv->adminq` when an ethtool command is issued after a failed device reset.
- [High] The driver fails to cancel the `service_task` work item during device suspend, allowing it to run while the device is in D3hot, triggering AER crashes.
--

commit ef66cc6af8d7947d84b4c6d611c745d03fb318bf
Author: Joshua Washington <joshwash@google.com>

gve: add link status/speed ctrl ops

Refactor link status check to use a control plane op. Introduce
new op for retrieving the link status in AQ mode. This op reads
the link status from the device status register and stores the
value in priv.

Also add an op for retrieving link speed in AQ mode which calls into
gve_adminq_report_link_speed.

> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> index 799381d24c71f..b33633555b710 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> @@ -1872,3 +1872,12 @@ int gve_adminq_create_queues(struct gve_priv *priv)
>  
>  	return err;
>  }
> +
> +int gve_adminq_report_link_status(struct gve_priv *priv)
> +{
> +	u32 status;
> +
> +	status = ioread32be(&priv->reg_bar0->device_status);
> +	priv->link_up = !!(GVE_DEVICE_STATUS_LINK_STATUS_MASK & status);

[Severity: Medium]
Can this cause a data race? This function directly writes to priv->link_up
without locks, while it is read concurrently in other contexts.

> +	return 0;
> +}

[ ... ]

> diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
> index 54b623b678ce3..aaf5b8b1671b5 100644
> --- a/drivers/net/ethernet/google/gve/gve_ethtool.c
> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> @@ -749,12 +749,11 @@ static int gve_get_link_ksettings(struct net_device *netdev,
>  	int err = 0;
>  
>  	if (priv->link_speed == 0)
> -		err = gve_adminq_report_link_speed(priv);
> +		err = priv->ctrl_ops->report_link_speed(priv);

[Severity: Critical]
This is a pre-existing issue, but I have a question about the timeout handling
inside gve_adminq_report_link_speed(). 

If gve_adminq_execute_cmd() returns a timeout error, the function drops the
error and immediately frees the DMA memory via dma_free_coherent(). However,
since a host CPU timeout does not guarantee the NIC has aborted the DMA
operation, could the hardware eventually execute a DMA write to the provided
bus address after the memory has been reallocated, causing memory corruption?
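
To make the concern concrete, here is a minimal userspace sketch of one
possible mitigation: propagate the timeout and keep the buffer allocated
(quarantined) instead of freeing it while the device may still own it. All
names below (model_cmd_buf, model_report_link_speed, the exec callbacks) are
hypothetical stand-ins, and malloc()/free() stand in for
dma_alloc_coherent()/dma_free_coherent(). This is not the driver's code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of a command buffer the device may still write to. */
struct model_cmd_buf {
	void *mem;
	int quarantined;	/* set when ownership could not be reclaimed */
};

/* Hypothetical executor: returns 0 on completion or a negative errno. */
typedef int (*model_exec_fn)(void *mem);

static int model_exec_ok(void *mem)
{
	(void)mem;
	return 0;
}

static int model_exec_timeout(void *mem)
{
	(void)mem;
	return -ETIMEDOUT;
}

static int model_report_link_speed(struct model_cmd_buf *buf,
				   model_exec_fn exec)
{
	int err;

	buf->mem = malloc(64);
	if (!buf->mem)
		return -ENOMEM;

	err = exec(buf->mem);
	if (err == -ETIMEDOUT) {
		/*
		 * The device may still complete the command later, so the
		 * buffer stays allocated and the error is propagated.
		 */
		buf->quarantined = 1;
		return err;
	}

	free(buf->mem);
	buf->mem = NULL;
	return err;
}
```

The trade-off in this sketch is a deliberate leak on timeout; a real fix
would presumably release the buffer only after the device has been reset or
has otherwise confirmed it no longer uses the address.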

[Severity: Critical]
This is another pre-existing issue, but could this ethtool call trigger a
use-after-free if a previous device reset failed?

If a device reset fails, gve_adminq_free() destroys the adminq_pool and frees
the DMA memory backing priv->adminq. If user-space subsequently issues an
ethtool command like gve_get_link_ksettings, the driver invokes
report_link_speed() and eventually gve_adminq_issue_cmd(). 

If the device's tail counter was reset to 0, and the host's head counter is
still 0, could gve_adminq_issue_cmd() write the command payload into the
now-dangling priv->adminq buffer?
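
As an illustration only, the following userspace sketch shows the kind of
guard that would turn the dangling write into an error: the free path clears
the ring pointer and the issue path refuses to run without it. The
model_adminq type and its helpers are hypothetical and simplified, with
calloc()/free() standing in for the DMA pool. A real fix would also need to
order the check against a concurrent free, which this single-threaded model
ignores.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified admin queue: a ring of fixed-size slots. */
struct model_adminq {
	unsigned char *ring;
	size_t slots;
	size_t slot_size;
	unsigned int head;
};

static int model_adminq_alloc(struct model_adminq *q, size_t slots,
			      size_t slot_size)
{
	q->ring = calloc(slots, slot_size);
	if (!q->ring)
		return -ENOMEM;
	q->slots = slots;
	q->slot_size = slot_size;
	q->head = 0;
	return 0;
}

static void model_adminq_free(struct model_adminq *q)
{
	free(q->ring);
	q->ring = NULL;	/* leave no dangling pointer behind */
}

static int model_adminq_issue_cmd(struct model_adminq *q, const void *cmd,
				  size_t len)
{
	if (!q->ring)
		return -ENODEV;	/* queue was torn down, e.g. after a failed reset */
	if (len > q->slot_size)
		return -EINVAL;

	memcpy(q->ring + (q->head % q->slots) * q->slot_size, cmd, len);
	q->head++;
	return 0;
}
```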

>  
>  	cmd->base.speed = priv->link_speed;
>  
>  	cmd->base.duplex = DUPLEX_FULL;

[ ... ]

> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index a103a6304b495..74778ee4d4d0e 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -1475,15 +1475,15 @@ static int gve_close(struct net_device *dev)
>  	return err;
>  }
>  
> -static void gve_handle_link_status(struct gve_priv *priv, bool link_status)
> +static void gve_handle_link_status(struct gve_priv *priv)
>  {
>  	if (!gve_get_napi_enabled(priv))
>  		return;
>  
> -	if (link_status == netif_carrier_ok(priv->dev))
> +	if (priv->link_up == netif_carrier_ok(priv->dev))
>  		return;
>  
> -	if (link_status) {
> +	if (priv->link_up) {

[Severity: Medium]
Is there a time-of-check to time-of-use race here? 

Because priv->link_up is read twice in quick succession without protection,
and gve_adminq_report_link_status() can modify it concurrently from the
lockless service_task work item, can the value change between the two
checks?

Would reading the field once with READ_ONCE() into a local variable keep the
compiler from reloading it and ensure both checks see the same value?
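
A minimal userspace sketch of the snapshot-once pattern follows. The
READ_ONCE()/WRITE_ONCE() macros here are simplified stand-ins for the kernel
macros of the same name (they assume GCC or Clang for __typeof__), and
model_priv with its carrier_ok field is a hypothetical reduction of the
driver state, with carrier_ok standing in for netif_carrier_ok().

```c
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel macros (GCC/Clang). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical reduced model of the driver state. */
struct model_priv {
	bool link_up;		/* written by the status poller */
	bool carrier_ok;	/* stands in for netif_carrier_ok() */
};

/* Writer side: publish the new link state with a single marked store. */
static void model_report_link_status(struct model_priv *priv,
				     unsigned int status,
				     unsigned int link_mask)
{
	WRITE_ONCE(priv->link_up, !!(status & link_mask));
}

/*
 * Reader side: load link_up exactly once, then use only the local copy.
 * Returns 1 if the carrier state was changed, 0 if it already matched.
 */
static int model_handle_link_status(struct model_priv *priv)
{
	bool link_up = READ_ONCE(priv->link_up);

	if (link_up == priv->carrier_ok)
		return 0;

	priv->carrier_ok = link_up;
	return 1;
}
```

With the local copy, the comparison and the branch cannot disagree about the
link state, whatever the writer does in between.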

>  		netdev_info(priv->dev, "Device link is up.\n");
>  		netif_carrier_on(priv->dev);
>  	} else {

[ ... ]

> @@ -2299,12 +2298,14 @@ static void gve_service_task(struct work_struct *work)
>  {
>  	struct gve_priv *priv = container_of(work, struct gve_priv,
>  					     service_task);
> +	const struct gve_ctrl_ops *ops = priv->ctrl_ops;
>  	u32 status = ioread32be(&priv->reg_bar0->device_status);
>  
>  	gve_handle_status(priv, status);
>  
>  	gve_handle_reset(priv);
> -	gve_handle_link_status(priv, GVE_DEVICE_STATUS_LINK_STATUS_MASK & status);
> +	ops->report_link_status(priv);
> +	gve_handle_link_status(priv);

[Severity: High]
This is a pre-existing issue, but is the service_task safely stopped during
system suspend?

Looking at gve_suspend(), the function does not call cancel_work_sync() for
the service_task. Because the priv->gve_wq workqueue is allocated without the
WQ_FREEZABLE flag, it will not be frozen by the PM core. 

If this service_task runs while the device is in D3hot, could ioread32be()
return all Fs (0xFFFFFFFF), causing the GVE_DEVICE_STATUS_RESET_MASK check to
pass? If so, would this trigger gve_reset() to write to MMIO registers of a
suspended PCIe device, resulting in a PCIe Master Abort?
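
For illustration, a small userspace sketch of the all-ones check that would
keep a read from an inaccessible device from being interpreted as a reset
request. The mask values and function names are hypothetical (the driver's
real bit positions are not shown in this patch), and this check would
complement, not replace, cancelling the work in the suspend path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the driver's real masks may differ. */
#define MODEL_STATUS_RESET_MASK	(1u << 1)
#define MODEL_STATUS_LINK_MASK	(1u << 2)

/* An MMIO read from an inaccessible PCIe device typically returns all ones. */
static bool model_device_present(uint32_t status)
{
	return status != UINT32_MAX;
}

/* Only trust the reset bit when the read itself looks valid. */
static bool model_reset_requested(uint32_t status)
{
	if (!model_device_present(status))
		return false;

	return (status & MODEL_STATUS_RESET_MASK) != 0;
}
```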

>  }
>  
>  static void gve_set_netdev_xdp_features(struct gve_priv *priv)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602235900.3050782-1-hramamurthy@google.com?part=15

