All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Eric Dumazet <edumazet@google.com>,
	Jason Xing <kernelxing@tencent.com>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	Paolo Abeni <pabeni@redhat.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 29/80] tcp: avoid the lookup process failing to get sk in ehash table
Date: Fri,  3 Feb 2023 11:12:23 +0100	[thread overview]
Message-ID: <20230203101016.412855262@linuxfoundation.org> (raw)
In-Reply-To: <20230203101015.263854890@linuxfoundation.org>

From: Jason Xing <kernelxing@tencent.com>

[ Upstream commit 3f4ca5fafc08881d7a57daa20449d171f2887043 ]

While one cpu is working on looking up the right socket from ehash
table, another cpu is done deleting the request socket and is about
to add (or is adding) the big socket from the table. It means that
we could miss both of them, even though it has little chance.

Let me draw a call trace map of the server side.
   CPU 0                           CPU 1
   -----                           -----
tcp_v4_rcv()                  syn_recv_sock()
                            inet_ehash_insert()
                            -> sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
                            -> __sk_nulls_add_node_rcu(sk, list)

Notice that the CPU 0 is receiving the data after the final ack
during 3-way shakehands and CPU 1 is still handling the final ack.

Why could this be a real problem?
This case is happening only when the final ack and the first data
receiving by different CPUs. Then the server receiving data with
ACK flag tries to search one proper established socket from ehash
table, but apparently it fails as my map shows above. After that,
the server fetches a listener socket and then sends a RST because
it finds a ACK flag in the skb (data), which obeys RST definition
in RFC 793.

Besides, Eric pointed out there's one more race condition where it
handles tw socket hashdance. Only by adding to the tail of the list
before deleting the old one can we avoid the race if the reader has
already begun the bucket traversal and it would possibly miss the head.

Many thanks to Eric for great help from beginning to end.

Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/
Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/inet_hashtables.c    | 17 +++++++++++++++--
 net/ipv4/inet_timewait_sock.c |  8 ++++----
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 3c58019f0718..d64522af9c3a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -579,8 +579,20 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
 	spin_lock(lock);
 	if (osk) {
 		WARN_ON_ONCE(sk->sk_hash != osk->sk_hash);
-		ret = sk_nulls_del_node_init_rcu(osk);
-	} else if (found_dup_sk) {
+		ret = sk_hashed(osk);
+		if (ret) {
+			/* Before deleting the node, we insert a new one to make
+			 * sure that the look-up-sk process would not miss either
+			 * of them and that at least one node would exist in ehash
+			 * table all the time. Otherwise there's a tiny chance
+			 * that lookup process could find nothing in ehash table.
+			 */
+			__sk_nulls_add_node_tail_rcu(sk, list);
+			sk_nulls_del_node_init_rcu(osk);
+		}
+		goto unlock;
+	}
+	if (found_dup_sk) {
 		*found_dup_sk = inet_ehash_lookup_by_sk(sk, list);
 		if (*found_dup_sk)
 			ret = false;
@@ -589,6 +601,7 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
 	if (ret)
 		__sk_nulls_add_node_rcu(sk, list);
 
+unlock:
 	spin_unlock(lock);
 
 	return ret;
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 88c5069b5d20..fedd19c22b39 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -80,10 +80,10 @@ void inet_twsk_put(struct inet_timewait_sock *tw)
 }
 EXPORT_SYMBOL_GPL(inet_twsk_put);
 
-static void inet_twsk_add_node_rcu(struct inet_timewait_sock *tw,
-				   struct hlist_nulls_head *list)
+static void inet_twsk_add_node_tail_rcu(struct inet_timewait_sock *tw,
+					struct hlist_nulls_head *list)
 {
-	hlist_nulls_add_head_rcu(&tw->tw_node, list);
+	hlist_nulls_add_tail_rcu(&tw->tw_node, list);
 }
 
 static void inet_twsk_add_bind_node(struct inet_timewait_sock *tw,
@@ -119,7 +119,7 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 
 	spin_lock(lock);
 
-	inet_twsk_add_node_rcu(tw, &ehead->chain);
+	inet_twsk_add_node_tail_rcu(tw, &ehead->chain);
 
 	/* Step 3: Remove SK from hash chain */
 	if (__sk_nulls_del_node_init_rcu(sk))
-- 
2.39.0




  parent reply	other threads:[~2023-02-03 10:18 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-03 10:11 [PATCH 4.19 00/80] 4.19.272-rc1 review Greg Kroah-Hartman
2023-02-03 10:11 ` [PATCH 4.19 01/80] memory: mvebu-devbus: Fix missing clk_disable_unprepare in mvebu_devbus_probe() Greg Kroah-Hartman
2023-02-03 10:11 ` [PATCH 4.19 02/80] ARM: dts: imx6qdl-gw560x: Remove incorrect uart-has-rtscts Greg Kroah-Hartman
2023-02-03 10:11 ` [PATCH 4.19 03/80] HID: intel_ish-hid: Add check for ishtp_dma_tx_map Greg Kroah-Hartman
2023-02-03 10:11 ` [PATCH 4.19 04/80] EDAC/highbank: Fix memory leak in highbank_mc_probe() Greg Kroah-Hartman
2023-02-03 10:11 ` [PATCH 4.19 05/80] tomoyo: fix broken dependency on *.conf.default Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 06/80] IB/hfi1: Reject a zero-length user expected buffer Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 07/80] IB/hfi1: Reserve user expected TIDs Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 08/80] IB/hfi1: Fix expected receive setup error exit issues Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 09/80] affs: initialize fsdata in affs_truncate() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 10/80] amd-xgbe: TX Flow Ctrl Registers are h/w ver dependent Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 11/80] amd-xgbe: Delay AN timeout during KR training Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 12/80] bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 13/80] phy: rockchip-inno-usb2: Fix missing clk_disable_unprepare() in rockchip_usb2phy_power_on() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 14/80] net: nfc: Fix use-after-free in local_cleanup() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 15/80] wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 16/80] net: usb: sr9700: Handle negative len Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 17/80] net: mdio: validate parameter addr in mdiobus_get_phy() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 18/80] HID: check empty report_list in hid_validate_values() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 19/80] usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 20/80] usb: gadget: f_fs: Ensure ep0req is dequeued before free_request Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 21/80] net: mlx5: eliminate anonymous module_init & module_exit Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 22/80] dmaengine: Fix double increment of client_count in dma_chan_get() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 23/80] net: macb: fix PTP TX timestamp failure due to packet padding Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 24/80] HID: betop: check shape of output reports Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 25/80] dmaengine: xilinx_dma: commonize DMA copy size calculation Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 26/80] dmaengine: xilinx_dma: program hardware supported buffer length Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 27/80] dmaengine: xilinx_dma: Fix devm_platform_ioremap_resource error handling Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 28/80] dmaengine: xilinx_dma: call of_node_put() when breaking out of for_each_child_of_node() Greg Kroah-Hartman
2023-02-03 10:12 ` Greg Kroah-Hartman [this message]
2023-02-03 10:12 ` [PATCH 4.19 30/80] w1: fix deadloop in __w1_remove_master_device() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 31/80] w1: fix WARNING after calling w1_process() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 32/80] netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 33/80] block: fix and cleanup bio_check_ro Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 34/80] perf env: Do not return pointers to local variables Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 35/80] fs: reiserfs: remove useless new_opts in reiserfs_remount Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 36/80] Bluetooth: hci_sync: cancel cmd_timer if hci_open failed Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 37/80] scsi: hpsa: Fix allocation size for scsi_host_alloc() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 38/80] module: Dont wait for GOING modules Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 39/80] tracing: Make sure trace_printk() can output as soon as it can be used Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 40/80] trace_events_hist: add check for return value of create_hist_field Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 41/80] smbd: Make upper layer decide when to destroy the transport Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 42/80] cifs: Fix oops due to uncleared server->smbd_conn in reconnect Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 43/80] ARM: 9280/1: mm: fix warning on phys_addr_t to void pointer assignment Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 44/80] EDAC/device: Respect any driver-supplied workqueue polling value Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 45/80] net: fix UaF in netns ops registration error path Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 46/80] netfilter: nft_set_rbtree: skip elements in transaction from garbage collection Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 47/80] netlink: remove hash::nelems check in netlink_insert Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 48/80] netlink: annotate data races around nlk->portid Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 49/80] netlink: annotate data races around dst_portid and dst_group Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 50/80] netlink: annotate data races around sk_state Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 51/80] ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 52/80] netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 53/80] netrom: Fix use-after-free of a listening socket Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 54/80] sctp: fail if no bound addresses can be used for a given scope Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 55/80] net: ravb: Fix possible hang if RIS2_QFF1 happen Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 56/80] net/tg3: resolve deadlock in tg3_reset_task() during EEH Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 57/80] Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode" Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 58/80] x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 59/80] drm/i915/display: fix compiler warning about array overrun Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 60/80] x86/asm: Fix an assembler warning with current binutils Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 61/80] x86/entry/64: Add instruction suffix to SYSRET Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 62/80] ARM: dts: imx: Fix pca9547 i2c-mux node name Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 63/80] dmaengine: imx-sdma: Fix a possible memory leak in sdma_transfer_init Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 64/80] sysctl: add a new register_sysctl_init() interface Greg Kroah-Hartman
2023-02-03 10:12 ` [PATCH 4.19 65/80] panic: unset panic_on_warn inside panic() Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 66/80] exit: Add and use make_task_dead Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 67/80] objtool: Add a missing comma to avoid string concatenation Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 68/80] hexagon: Fix function name in die() Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 69/80] h8300: Fix build errors from do_exit() to make_task_dead() transition Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 70/80] ia64: make IA64_MCA_RECOVERY bool instead of tristate Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 71/80] exit: Put an upper limit on how often we can oops Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 72/80] exit: Expose "oops_count" to sysfs Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 73/80] exit: Allow oops_limit to be disabled Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 74/80] panic: Consolidate open-coded panic_on_warn checks Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 75/80] panic: Introduce warn_limit Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 76/80] panic: Expose "warn_count" to sysfs Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 77/80] docs: Fix path paste-o for /sys/kernel/warn_count Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 78/80] exit: Use READ_ONCE() for all oops/warn limit reads Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 79/80] ipv6: ensure sane device mtu in tunnels Greg Kroah-Hartman
2023-02-03 10:13 ` [PATCH 4.19 80/80] usb: host: xhci-plat: add wakeup entry at sysfs Greg Kroah-Hartman
2023-02-03 11:04 ` [PATCH 4.19 00/80] 4.19.272-rc1 review Naresh Kamboju
2023-02-03 12:28   ` Krzysztof Kozlowski
2023-02-03 15:51     ` Guenter Roeck
2023-02-03 16:56       ` Krzysztof Kozlowski
2023-02-03 17:04         ` Greg Kroah-Hartman
2023-02-03 16:38     ` Greg Kroah-Hartman
2023-02-03 16:50   ` Greg Kroah-Hartman
2023-02-03 17:19     ` Guenter Roeck
2023-02-04  1:04 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230203101016.412855262@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=edumazet@google.com \
    --cc=kernelxing@tencent.com \
    --cc=kuniyu@amazon.com \
    --cc=pabeni@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.