Archive-only list for patches
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Haiyang Zhang <haiyangz@microsoft.com>,
	Long Li <longli@microsoft.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 6.1 10/71] net: mana: Fix race of mana_hwc_post_rx_wqe and new hwc response
Date: Sun,  1 Sep 2024 18:17:15 +0200	[thread overview]
Message-ID: <20240901160802.276744046@linuxfoundation.org> (raw)
In-Reply-To: <20240901160801.879647959@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Haiyang Zhang <haiyangz@microsoft.com>

commit 8af174ea863c72f25ce31cee3baad8a301c0cf0f upstream.

The mana_hwc_rx_event_handler() / mana_hwc_handle_resp() calls
complete(&ctx->comp_event) before posting the wqe back. It's
possible that other callers, like mana_create_txq(), start the
next round of mana_hwc_send_request() before the posting of wqe.
And if the HW is fast enough to respond, it can hit no_wqe error
on the HW channel, then the response message is lost. The mana
driver may fail to create queues and open, because of waiting for
the HW response and timed out.
Sample dmesg:
[  528.610840] mana 39d4:00:02.0: HWC: Request timed out!
[  528.614452] mana 39d4:00:02.0: Failed to send mana message: -110, 0x0
[  528.618326] mana 39d4:00:02.0 enP14804s2: Failed to create WQ object: -110

To fix it, move posting of rx wqe before complete(&ctx->comp_event).

Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/microsoft/mana/hw_channel.c |   62 ++++++++++++-----------
 1 file changed, 34 insertions(+), 28 deletions(-)

--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -51,9 +51,33 @@ static int mana_hwc_verify_resp_msg(cons
 	return 0;
 }
 
+static int mana_hwc_post_rx_wqe(const struct hwc_wq *hwc_rxq,
+				struct hwc_work_request *req)
+{
+	struct device *dev = hwc_rxq->hwc->dev;
+	struct gdma_sge *sge;
+	int err;
+
+	sge = &req->sge;
+	sge->address = (u64)req->buf_sge_addr;
+	sge->mem_key = hwc_rxq->msg_buf->gpa_mkey;
+	sge->size = req->buf_len;
+
+	memset(&req->wqe_req, 0, sizeof(struct gdma_wqe_request));
+	req->wqe_req.sgl = sge;
+	req->wqe_req.num_sge = 1;
+	req->wqe_req.client_data_unit = 0;
+
+	err = mana_gd_post_and_ring(hwc_rxq->gdma_wq, &req->wqe_req, NULL);
+	if (err)
+		dev_err(dev, "Failed to post WQE on HWC RQ: %d\n", err);
+	return err;
+}
+
 static void mana_hwc_handle_resp(struct hw_channel_context *hwc, u32 resp_len,
-				 const struct gdma_resp_hdr *resp_msg)
+				 struct hwc_work_request *rx_req)
 {
+	const struct gdma_resp_hdr *resp_msg = rx_req->buf_va;
 	struct hwc_caller_ctx *ctx;
 	int err;
 
@@ -61,6 +85,7 @@ static void mana_hwc_handle_resp(struct
 		      hwc->inflight_msg_res.map)) {
 		dev_err(hwc->dev, "hwc_rx: invalid msg_id = %u\n",
 			resp_msg->response.hwc_msg_id);
+		mana_hwc_post_rx_wqe(hwc->rxq, rx_req);
 		return;
 	}
 
@@ -74,30 +99,13 @@ static void mana_hwc_handle_resp(struct
 	memcpy(ctx->output_buf, resp_msg, resp_len);
 out:
 	ctx->error = err;
-	complete(&ctx->comp_event);
-}
-
-static int mana_hwc_post_rx_wqe(const struct hwc_wq *hwc_rxq,
-				struct hwc_work_request *req)
-{
-	struct device *dev = hwc_rxq->hwc->dev;
-	struct gdma_sge *sge;
-	int err;
-
-	sge = &req->sge;
-	sge->address = (u64)req->buf_sge_addr;
-	sge->mem_key = hwc_rxq->msg_buf->gpa_mkey;
-	sge->size = req->buf_len;
 
-	memset(&req->wqe_req, 0, sizeof(struct gdma_wqe_request));
-	req->wqe_req.sgl = sge;
-	req->wqe_req.num_sge = 1;
-	req->wqe_req.client_data_unit = 0;
+	/* Must post rx wqe before complete(), otherwise the next rx may
+	 * hit no_wqe error.
+	 */
+	mana_hwc_post_rx_wqe(hwc->rxq, rx_req);
 
-	err = mana_gd_post_and_ring(hwc_rxq->gdma_wq, &req->wqe_req, NULL);
-	if (err)
-		dev_err(dev, "Failed to post WQE on HWC RQ: %d\n", err);
-	return err;
+	complete(&ctx->comp_event);
 }
 
 static void mana_hwc_init_event_handler(void *ctx, struct gdma_queue *q_self,
@@ -216,14 +224,12 @@ static void mana_hwc_rx_event_handler(vo
 		return;
 	}
 
-	mana_hwc_handle_resp(hwc, rx_oob->tx_oob_data_size, resp);
+	mana_hwc_handle_resp(hwc, rx_oob->tx_oob_data_size, rx_req);
 
-	/* Do no longer use 'resp', because the buffer is posted to the HW
-	 * in the below mana_hwc_post_rx_wqe().
+	/* Can no longer use 'resp', because the buffer is posted to the HW
+	 * in mana_hwc_handle_resp() above.
 	 */
 	resp = NULL;
-
-	mana_hwc_post_rx_wqe(hwc_rxq, rx_req);
 }
 
 static void mana_hwc_tx_event_handler(void *ctx, u32 gdma_txq_id,



  parent reply	other threads:[~2024-09-01 16:45 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-01 16:17 [PATCH 6.1 00/71] 6.1.108-rc1 review Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 01/71] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 02/71] LoongArch: Remove the unused dma-direct.h Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 03/71] btrfs: run delayed iputs when flushing delalloc Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 04/71] smb/client: avoid dereferencing rdata=NULL in smb2_new_read_req() Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 05/71] pinctrl: rockchip: correct RK3328 iomux width flag for GPIO2-B pins Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 06/71] pinctrl: single: fix potential NULL dereference in pcs_get_function() Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 07/71] of: Add cleanup.h based auto release via __free(device_node) markings Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 08/71] wifi: wfx: repair open network AP mode Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 09/71] wifi: mwifiex: duplicate static structs used in driver instances Greg Kroah-Hartman
2024-09-01 16:17 ` Greg Kroah-Hartman [this message]
2024-09-01 16:17 ` [PATCH 6.1 11/71] mptcp: close subflow when receiving TCP+FIN Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 12/71] mptcp: sched: check both backup in retrans Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 13/71] mptcp: pm: skip connecting to already established sf Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 14/71] mptcp: pm: reset MPC endp ID when re-added Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 15/71] mptcp: pm: send ACK on an active subflow Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 16/71] mptcp: pm: do not remove already closed subflows Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 17/71] mptcp: pm: ADD_ADDR 0 is not a new address Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 18/71] drm/amdgpu: align pp_power_profile_mode with kernel docs Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 19/71] drm/amdgpu/swsmu: always force a state reprogram on init Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 20/71] ata: libata-core: Fix null pointer dereference on error Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 21/71] usb: typec: fix up incorrectly backported "usb: typec: tcpm: unregister existing source caps before re-registration" Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 22/71] mmc: Avoid open coding by using mmc_op_tuning() Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 23/71] mmc: mtk-sd: receive cmd8 data when hs400 tuning fail Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 24/71] mptcp: unify pm get_local_id interfaces Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 25/71] mptcp: pm: remove mptcp_pm_remove_subflow() Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 26/71] mptcp: pm: only mark subflow endp as available Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 27/71] mptcp: pm: check add_addr_accept_max before accepting new ADD_ADDR Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 28/71] of: Introduce for_each_*_child_of_node_scoped() to automate of_node_put() handling Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 29/71] thermal: of: Fix OF node leak in thermal_of_trips_init() error path Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 30/71] thermal: of: Fix OF node leak in of_thermal_zone_find() error paths Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 31/71] ASoC: amd: acp: fix module autoloading Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 32/71] ASoC: SOF: amd: Fix for acp init sequence Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 33/71] pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 34/71] mm: Fix missing folio invalidation calls during truncation Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 35/71] btrfs: fix extent map use-after-free when adding pages to compressed bio Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 36/71] soundwire: stream: fix programming slave ports for non-continous port maps Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 37/71] phy: xilinx: add runtime PM support Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 38/71] phy: xilinx: phy-zynqmp: dynamic clock support for power-save Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 39/71] phy: xilinx: phy-zynqmp: Fix SGMII linkup failure on resume Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 40/71] dmaengine: dw: Add peripheral bus width verification Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 41/71] dmaengine: dw: Add memory " Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 42/71] Bluetooth: hci_core: Fix not handling hibernation actions Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 43/71] iommu: Do not return 0 from map_pages if it doesnt do anything Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 44/71] netfilter: nf_tables: restore IP sanity checks for netdev/egress Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 45/71] wifi: iwlwifi: fw: fix wgds rev 3 exact size Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 46/71] ethtool: check device is present when getting link settings Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 47/71] netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 48/71] selftests: forwarding: no_forwarding: Down ports on cleanup Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 49/71] selftests: forwarding: local_termination: " Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 50/71] bonding: implement xdo_dev_state_free and call it after deletion Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 51/71] gtp: fix a potential NULL pointer dereference Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 52/71] sctp: fix association labeling in the duplicate COOKIE-ECHO case Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 53/71] drm/amd/display: avoid using null object of framebuffer Greg Kroah-Hartman
2024-09-01 16:17 ` [PATCH 6.1 54/71] net: busy-poll: use ktime_get_ns() instead of local_clock() Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 55/71] nfc: pn533: Add poll mod list filling check Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 56/71] soc: qcom: cmd-db: Map shared memory as WC, not WB Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 57/71] cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 58/71] USB: serial: option: add MeiG Smart SRM825L Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 59/71] usb: dwc3: omap: add missing depopulate in probe error path Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 60/71] usb: dwc3: core: Prevent USB core invalid event buffer address access Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 61/71] usb: dwc3: st: fix probed platform device ref count on probe error path Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 62/71] usb: dwc3: st: add missing depopulate in " Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 63/71] usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes() Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 64/71] usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 65/71] usb: cdnsp: fix for Link TRB with TC Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 66/71] phy: zynqmp: Enable reference clock correctly Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 67/71] igc: Fix reset adapter logics when tx mode change Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 68/71] igc: Fix qbv tx latency by setting gtxoffset Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 69/71] scsi: aacraid: Fix double-free on probe failure Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 70/71] apparmor: fix policy_unpack_test on big endian systems Greg Kroah-Hartman
2024-09-01 16:18 ` [PATCH 6.1 71/71] fbdev: offb: fix up missing cleanup.h Greg Kroah-Hartman
2024-09-02  7:10 ` [PATCH 6.1 00/71] 6.1.108-rc1 review Pavel Machek
2024-09-02 15:42 ` Naresh Kamboju
2024-09-02 18:55 ` Florian Fainelli
2024-09-03  7:23 ` Ron Economos
2024-09-03  8:44 ` Jon Hunter
2024-09-03 11:48 ` Mark Brown
2024-09-04  7:24 ` Yann Sionneau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240901160802.276744046@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=haiyangz@microsoft.com \
    --cc=longli@microsoft.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox