From: Jakub Kicinski <kuba@kernel.org>
To: dipayanroy@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, pabeni@redhat.com, leon@kernel.org,
	longli@microsoft.com, kotaranov@microsoft.com, horms@kernel.org,
	shradhagupta@linux.microsoft.com, ssengar@linux.microsoft.com,
	ernis@linux.microsoft.com, shirazsaleem@microsoft.com,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	stephen@networkplumber.org, jacob.e.keller@intel.com,
	dipayanroy@microsoft.com, leitao@debian.org, kees@kernel.org,
	john.fastabend@gmail.com, hawk@kernel.org, bpf@vger.kernel.org,
	daniel@iogearbox.net, ast@kernel.org, sdf@fomichev.me,
	yury.norov@gmail.com
Subject: Re: [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
Date: Mon, 11 May 2026 19:21:33 -0700
Message-ID: <20260512022133.856196-1-kuba@kernel.org>
In-Reply-To: <20260508115100.488506-3-dipayanroy@linux.microsoft.com>
References: <20260508115100.488506-3-dipayanroy@linux.microsoft.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

---

net: mana: force full-page RX buffers via ethtool private flag

This patch adds an ethtool private flag "full-page-rx" that forces one
RX buffer per page, bypassing the page_pool fragment path, to address a
15-20% throughput regression seen on some 4K PAGE_SIZE ARM64 platforms
under high connection counts. Toggling the flag triggers mana_detach() +
mana_attach() to apply the new RX buffer layout.
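For reference, toggling such a private flag from userspace would look
roughly like the following (a usage sketch, not taken from the patch;
the interface name "eth0" is a placeholder, while "full-page-rx" is the
flag name given in the patch description):

```shell
# List the private flags the driver exposes (hypothetical interface eth0):
ethtool --show-priv-flags eth0

# Force one RX buffer per page; per the patch this triggers
# mana_detach() + mana_attach() to rebuild the RX buffer layout:
ethtool --set-priv-flags eth0 full-page-rx on
```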
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index a28ca461c135..dfd222de11d0 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c

[ ... ]

> @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
>  	return 0;
>  }
>

[ ... ]

> +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +	u32 changed = apc->priv_flags ^ priv_flags;
> +	u32 old_priv_flags = apc->priv_flags;
> +	bool schedule_port_reset = false;
> +	int err = 0;
> +
> +	if (!changed)
> +		return 0;
> +
> +	/* Reject unknown bits */
> +	if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> +		return -EINVAL;
> +
> +	if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> +		apc->priv_flags = priv_flags;
> +
> +		if (!apc->port_is_up) {
> +			/* Port is down, flag updated to apply on next up
> +			 * so just return.
> +			 */
> +			return 0;
> +		}
> +
> +		/* Pre-allocate buffers to prevent failure in mana_attach
> +		 * later
> +		 */
> +		err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
> +		if (err) {
> +			netdev_err(ndev,
> +				   "Insufficient memory for new allocations\n");
> +			apc->priv_flags = old_priv_flags;
> +			return err;
> +		}
> +
> +		err = mana_detach(ndev, false);
> +		if (err) {
> +			netdev_err(ndev, "mana_detach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Port is in an inconsistent state. Restore
> +			 * 'port_is_up' so that queue reset work handler
> +			 * can properly detach and re-attach.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;
> +			goto out;
> +		}
> +
> +		err = mana_attach(ndev);
> +		if (err) {
> +			netdev_err(ndev, "mana_attach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Restore 'port_is_up' so the reset work handler
> +			 * can properly detach/attach. Without this,
> +			 * the handler sees port_is_up=false and skips
> +			 * queue allocation, leaving the port dead.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;
> +		}
> +	}
> +
> +out:
> +	mana_pre_dealloc_rxbufs(apc);
> +
> +	if (schedule_port_reset)
> +		queue_work(apc->ac->per_port_queue_reset_wq,
> +			   &apc->queue_reset_work);
> +
> +	return err;
> +}

[High] Can this failure path oops inside the reset work handler?

When mana_attach() fails after mana_detach() has already torn the vport
down, apc->tx_qp has been set to NULL by mana_destroy_txq() and
apc->rxqs has been cleared by mana_cleanup_port_context(). This code
then forces apc->port_is_up back to true and queues
apc->queue_reset_work:

	apc->port_is_up = true;
	schedule_port_reset = true;

When mana_per_port_queue_reset_work_handler() runs, it calls
mana_detach(ndev, false), which latches port_st_save = port_is_up =
true and, because port_st_save is true, invokes mana_dealloc_queues().
The TX drain loop there does:

	txq = &apc->tx_qp[i].txq;
	atomic_read(&txq->pending_sends);

with no NULL check on apc->tx_qp, and mana_destroy_vport() further down
iterates apc->rxqs[rxq_idx] while rxqs is NULL.

Would that produce a NULL pointer dereference in the workqueue worker
whenever mana_attach() fails here (for example on an allocation failure
in mana_init_port_context() or mana_create_txq(), or a transient HW
command failure in mana_query_vport_cfg() / mana_cfg_vport())? The
mana_detach() failure path above (goto out) reaches
mana_dealloc_queues() through the same chain and looks to have the same
exposure.

For comparison, mana_change_mtu() handles a mana_attach() failure by
returning the error without scheduling a reset. Would a similar
treatment here avoid the asynchronous oops, or is there a reason the
reset must be scheduled in this specific failure case?

-- 
pw-bot: cr