From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, "D. Wythe" <alibuda@linux.alibaba.com>,
Wen Gu <guwen@linux.alibaba.com>,
Wenjia Zhang <wenjia@linux.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.15 15/69] net/smc: avoid data corruption caused by decline
Date: Thu, 30 Nov 2023 16:22:12 +0000 [thread overview]
Message-ID: <20231130162133.585896957@linuxfoundation.org> (raw)
In-Reply-To: <20231130162133.035359406@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: D. Wythe <alibuda@linux.alibaba.com>
[ Upstream commit e6d71b437abc2f249e3b6a1ae1a7228e09c6e563 ]
We found a data corruption issue during testing of SMC-R on Redis
applications.
The benchmark has a low probability of reporting a strange error as
shown below.
"Error: Protocol error, got "\xe2" as reply type byte"
Finally, we found that the retrieved error data was as follows:
0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
It is quite obvious that this is a SMC DECLINE message, which means that
the applications received SMC protocol message.
We found that this was caused by the following situations:
client server
¦ clc proposal
------------->
¦ clc accept
<-------------
¦ clc confirm
------------->
wait llc confirm
send llc confirm
¦failed llc confirm
¦ x------
(after 2s)timeout
wait llc confirm rsp
wait decline
(after 1s) timeout
(after 2s) timeout
¦ decline
-------------->
¦ decline
<--------------
As a result, a decline message was sent in the implementation, and this
message was read from TCP by the already-fallback connection.
This patch double the client timeout as 2x of the server value,
With this simple change, the Decline messages should never cross or
collide (during Confirm link timeout).
This issue requires an immediate solution, since the protocol updates
involve a more long-term solution.
Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/smc/af_smc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 49cf523a783a2..8c11eb70c0f69 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -398,8 +398,12 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
struct smc_llc_qentry *qentry;
int rc;
- /* receive CONFIRM LINK request from server over RoCE fabric */
- qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
+ /* Receive CONFIRM LINK request from server over RoCE fabric.
+ * Increasing the client's timeout by twice as much as the server's
+ * timeout by default can temporarily avoid decline messages of
+ * both sides crossing or colliding
+ */
+ qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
SMC_LLC_CONFIRM_LINK);
if (!qentry) {
struct smc_clc_msg_decline dclc;
--
2.42.0
next prev parent reply other threads:[~2023-11-30 16:31 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-30 16:21 [PATCH 5.15 00/69] 5.15.141-rc1 review Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 5.15 01/69] afs: Fix afs_server_list to be cleaned up with RCU Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 5.15 02/69] afs: Make error on cell lookup failure consistent with OpenAFS Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 03/69] drm/panel: boe-tv101wum-nl6: Fine tune the panel power sequence Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 04/69] drm/panel: auo,b101uan08.3: " Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 05/69] drm/panel: simple: Fix Innolux G101ICE-L01 bus flags Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 06/69] drm/panel: simple: Fix Innolux G101ICE-L01 timings Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 07/69] wireguard: use DEV_STATS_INC() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 08/69] octeontx2-pf: Fix memory leak during interface down Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 09/69] ata: pata_isapnp: Add missing error check for devm_ioport_map() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 10/69] drm/rockchip: vop: Fix color for RGB888/BGR888 format on VOP full Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 11/69] HID: core: store the unique system identifier in hid_device Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 12/69] HID: fix HID device resource race between HID core and debugging support Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 13/69] ipv4: Correct/silence an endian warning in __ip_do_redirect Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 14/69] net: usb: ax88179_178a: fix failed operations during ax88179_reset Greg Kroah-Hartman
2023-11-30 16:22 ` Greg Kroah-Hartman [this message]
2023-11-30 16:22 ` [PATCH 5.15 16/69] arm/xen: fix xen_vcpu_info allocation alignment Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 17/69] octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 18/69] amd-xgbe: handle corner-case during sfp hotplug Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 19/69] amd-xgbe: handle the corner-case during tx completion Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 20/69] amd-xgbe: propagate the correct speed and duplex status Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 21/69] net: axienet: Fix check for partial TX checksum Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 22/69] afs: Return ENOENT if no cell DNS record can be found Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 23/69] afs: Fix file locking on R/O volumes to operate in local mode Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 24/69] nvmet: nul-terminate the NQNs passed in the connect command Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 25/69] USB: dwc3: qcom: fix resource leaks on probe deferral Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 26/69] USB: dwc3: qcom: fix ACPI platform device leak Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 27/69] lockdep: Fix block chain corruption Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 28/69] MIPS: KVM: Fix a build warning about variable set but not used Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 29/69] media: camss: Replace hard coded value with parameter Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 30/69] media: camss: sm8250: Virtual channels for CSID Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 31/69] media: qcom: camss: Fix set CSI2_RX_CFG1_VC_MODE when VC is greater than 3 Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 32/69] media: qcom: camss: Fix csid-gen2 for test pattern generator Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 33/69] ext4: add a new helper to check if es must be kept Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 34/69] ext4: factor out __es_alloc_extent() and __es_free_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 35/69] ext4: use pre-allocated es in __es_insert_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 36/69] ext4: use pre-allocated es in __es_remove_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 37/69] ext4: using nofail preallocation in ext4_es_remove_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 38/69] ext4: using nofail preallocation in ext4_es_insert_delayed_block() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 39/69] ext4: using nofail preallocation in ext4_es_insert_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 40/69] ext4: fix slab-use-after-free " Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 41/69] ext4: make sure allocate pending entry not fail Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 42/69] tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 43/69] proc: sysctl: prevent aliased sysctls from getting passed to init Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 44/69] ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CVA Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 45/69] swiotlb-xen: provide the "max_mapping_size" method Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 46/69] bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 47/69] md: fix bi_status reporting in md_end_clone_io Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 48/69] bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 49/69] io_uring/fs: consider link->flags when getting path for LINKAT Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 50/69] s390/dasd: protect device queue against concurrent access Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 51/69] USB: serial: option: add Luat Air72*U series products Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 52/69] hv_netvsc: Fix race of register_netdevice_notifier and VF register Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 53/69] hv_netvsc: Mark VF as slave before exposing it to user-mode Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 54/69] dm-delay: fix a race between delay_presuspend and delay_bio Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 55/69] bcache: check return value from btree_node_alloc_replacement() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 56/69] bcache: prevent potential division by zero error Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 57/69] bcache: fixup init dirty data errors Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 58/69] bcache: fixup lock c->root error Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 59/69] usb: cdnsp: Fix deadlock issue during using NCM gadget Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 60/69] USB: serial: option: add Fibocom L7xx modules Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 61/69] USB: serial: option: fix FM101R-GL defines Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 5.15 62/69] USB: serial: option: dont claim interface 4 for ZTE MF290 Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 63/69] usb: typec: tcpm: Skip hard reset when in error recovery Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 64/69] USB: dwc2: write HCINT with INTMASK applied Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 65/69] usb: dwc3: Fix default mode initialization Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 66/69] usb: dwc3: set the dma max_seg_size Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 67/69] USB: dwc3: qcom: fix software node leak on probe errors Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 68/69] USB: dwc3: qcom: fix wakeup after probe deferral Greg Kroah-Hartman
2023-11-30 16:23 ` [PATCH 5.15 69/69] io_uring: fix off-by one bvec index Greg Kroah-Hartman
2023-11-30 17:21 ` [PATCH 5.15 00/69] 5.15.141-rc1 review Daniel Díaz
2023-11-30 17:44 ` Guenter Roeck
2023-11-30 18:11 ` Daniel Díaz
2023-11-30 18:56 ` Guenter Roeck
2023-12-01 8:21 ` Greg Kroah-Hartman
2023-12-01 9:35 ` Francis Laniel
2023-12-01 9:44 ` Greg Kroah-Hartman
2023-12-01 14:34 ` Daniel Díaz
2023-12-01 23:04 ` Greg Kroah-Hartman
2023-12-04 14:55 ` Daniel Díaz
2023-12-01 6:31 ` Harshit Mogalapalli
2023-11-30 22:27 ` Pavel Machek
2023-11-30 18:57 ` Florian Fainelli
2023-12-01 0:08 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231130162133.585896957@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=alibuda@linux.alibaba.com \
--cc=davem@davemloft.net \
--cc=guwen@linux.alibaba.com \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=wenjia@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox