From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Kuniyuki Iwashima <kuniyu@amazon.com>,
	Paolo Abeni <pabeni@redhat.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 30/73] udp: Update reuse->has_conns under reuseport_lock.
Date: Fri, 28 Oct 2022 14:03:27 +0200
Message-ID: <20221028120233.677150258@linuxfoundation.org>
In-Reply-To: <20221028120232.344548477@linuxfoundation.org>
From: Kuniyuki Iwashima <kuniyu@amazon.com>
[ Upstream commit 69421bf98482d089e50799f45e48b25ce4a8d154 ]
When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel
could wrongly select an unconnected socket for packets sent to the
connected socket.

However, the current way of setting has_conns is invalid and can
trigger that very problem. reuseport_has_conns() changes has_conns under
rcu_read_lock(), which upgrades the RCU reader to an updater. It must
then do the update under the updater's lock, reuseport_lock, but
currently it does not.

For this reason, there is a race, shown below, where we fail to set
has_conns, resulting in the wrong socket selection. To avoid the race,
let's split the reader and updater with proper locking.
  cpu1                                                cpu2
  +----+                                              +----+
  __ip[46]_datagram_connect()                         reuseport_grow()
  .                                                   .
  |- reuseport_has_conns(sk, true)                    |- more_reuse = __reuseport_alloc(more_socks_size)
  |  .                                                |
  |  |- rcu_read_lock()
  |  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
  |  |
  |  |                                                |  /* reuse->has_conns == 0 here */
  |  |                                                |- more_reuse->has_conns = reuse->has_conns
  |  |- reuse->has_conns = 1                          |  /* more_reuse->has_conns SHOULD BE 1 HERE */
  |  |                                                |
  |  |                                                |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
  |  |                                                |                     more_reuse)
  |  `- rcu_read_unlock()                             `- kfree_rcu(reuse, rcu)
  |
  |- sk->sk_state = TCP_ESTABLISHED
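
The interleaving in the diagram can be replayed deterministically in
user space. The following is only a minimal sketch, not kernel code:
struct group, struct sock_model, replay_race() and replay_serialized()
are hypothetical names invented for illustration, and plain loads and
stores stand in for the RCU primitives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct sock_reuseport. */
struct group {
	bool has_conns;
};

/* Hypothetical stand-in for struct sock; cb models sk->sk_reuseport_cb. */
struct sock_model {
	struct group *cb;
};

/*
 * Replay the racy interleaving from the diagram in a single thread.
 * Returns the has_conns value visible through sk.cb afterwards.
 */
bool replay_race(void)
{
	struct group *old = calloc(1, sizeof(*old));
	struct group *grown = calloc(1, sizeof(*grown));
	struct sock_model sk = { .cb = old };
	struct group *seen;
	bool result;

	assert(old && grown);

	/* cpu1: reuse = rcu_dereference(sk->sk_reuseport_cb) */
	seen = sk.cb;

	/* cpu2: more_reuse->has_conns = reuse->has_conns (still 0) */
	grown->has_conns = old->has_conns;

	/* cpu1: reuse->has_conns = 1, which lands in the old copy only */
	seen->has_conns = true;

	/* cpu2: rcu_assign_pointer() publishes the stale copy */
	sk.cb = grown;

	result = sk.cb->has_conns;
	free(old);	/* stands in for kfree_rcu(reuse, rcu) */
	free(grown);
	return result;
}

/*
 * The same steps when setter and grow path cannot interleave, as when
 * both run under one lock. set_first picks which side runs first.
 */
bool replay_serialized(bool set_first)
{
	struct group *old = calloc(1, sizeof(*old));
	struct group *grown = calloc(1, sizeof(*grown));
	struct sock_model sk = { .cb = old };
	bool result;

	assert(old && grown);

	if (set_first)
		sk.cb->has_conns = true;	/* setter completes first */

	grown->has_conns = sk.cb->has_conns;	/* grow path copies ... */
	sk.cb = grown;				/* ... and publishes */

	if (!set_first)
		sk.cb->has_conns = true;	/* setter sees the new copy */

	result = sk.cb->has_conns;
	free(old);
	free(grown);
	return result;
}
```

In this model replay_race() ends with the flag cleared, while
replay_serialized() keeps it set for either ordering, which is the
property the locking in this patch is meant to provide.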
Note that the likely(reuse) in reuseport_has_conns_set() is always true,
but we put the test there for ease of review. [0]

For the record, sk_reuseport_cb is usually changed under lock_sock().
The only exception is the reuseport_grow() & TCP reqsk migration case:

  1) shutdown() a TCP listener, which is moved into the latter part of
     reuse->socks[] to migrate reqsk.
  2) A new listen() overflows reuse->socks[] and calls reuseport_grow().
  3) reuse->max_socks overflows u16 with the new listener.
  4) reuseport_grow() pops the old shutdown()ed listener from the array
     and updates its sk->sk_reuseport_cb to NULL without lock_sock().

The sk->sk_reuseport_cb of a shutdown()ed TCP socket can be changed
without lock_sock(), but reuseport_has_conns_set() is called only for
UDP under lock_sock(), so likely(reuse) is never false in
reuseport_has_conns_set().
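
The reader/updater split can be sketched in user space as follows.
This is an illustrative model under stated assumptions, not the kernel
implementation: group_lock, group_cb, has_conns_set(), has_conns() and
group_grow() are hypothetical names, a pthread mutex models
reuseport_lock, and the reader also takes the mutex because this sketch
has no RCU (the kernel reader uses rcu_read_lock() instead).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct sock_reuseport. */
struct group {
	bool has_conns;
};

/* Models reuseport_lock: every writer of the group takes it. */
static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models sk->sk_reuseport_cb; NULL means "not in a group". */
static struct group *group_cb;

/* Updater: sets the flag on whichever copy is current, under the lock. */
void has_conns_set(void)
{
	pthread_mutex_lock(&group_lock);
	if (group_cb)		/* mirrors the likely(reuse) test */
		group_cb->has_conns = true;
	pthread_mutex_unlock(&group_lock);
}

/* Reader: only reports the flag and never modifies it. */
bool has_conns(void)
{
	bool ret;

	pthread_mutex_lock(&group_lock);
	ret = group_cb && group_cb->has_conns;
	pthread_mutex_unlock(&group_lock);
	return ret;
}

/*
 * Grow path: copy the flag and publish the new group in one critical
 * section, so a concurrent has_conns_set() runs either entirely before
 * the copy or entirely after the publish.
 */
void group_grow(void)
{
	struct group *more = malloc(sizeof(*more));

	assert(more);
	pthread_mutex_lock(&group_lock);
	more->has_conns = group_cb ? group_cb->has_conns : false;
	free(group_cb);		/* the kernel defers this with kfree_rcu() */
	group_cb = more;
	pthread_mutex_unlock(&group_lock);
}
```

Calling has_conns_set() before or after any number of group_grow()
calls leaves has_conns() returning true in this model, because the copy
in group_grow() can no longer overlap with the store.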
[0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/sock_reuseport.h | 11 +++++------
 net/core/sock_reuseport.c    | 16 ++++++++++++++++
 net/ipv4/datagram.c          |  2 +-
 net/ipv4/udp.c               |  2 +-
 net/ipv6/datagram.c          |  2 +-
 net/ipv6/udp.c               |  2 +-
 6 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index 0e558ca7afbf..6348c6f26903 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -39,21 +39,20 @@ extern struct sock *reuseport_select_sock(struct sock *sk,
 extern int reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog);
 extern int reuseport_detach_prog(struct sock *sk);
 
-static inline bool reuseport_has_conns(struct sock *sk, bool set)
+static inline bool reuseport_has_conns(struct sock *sk)
 {
 	struct sock_reuseport *reuse;
 	bool ret = false;
 
 	rcu_read_lock();
 	reuse = rcu_dereference(sk->sk_reuseport_cb);
-	if (reuse) {
-		if (set)
-			reuse->has_conns = 1;
-		ret = reuse->has_conns;
-	}
+	if (reuse && reuse->has_conns)
+		ret = true;
 	rcu_read_unlock();
 
 	return ret;
 }
 
+void reuseport_has_conns_set(struct sock *sk);
+
 #endif /* _SOCK_REUSEPORT_H */
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index f478c65a281b..364cf6c6912b 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -18,6 +18,22 @@ DEFINE_SPINLOCK(reuseport_lock);
 
 static DEFINE_IDA(reuseport_ida);
 
+void reuseport_has_conns_set(struct sock *sk)
+{
+	struct sock_reuseport *reuse;
+
+	if (!rcu_access_pointer(sk->sk_reuseport_cb))
+		return;
+
+	spin_lock_bh(&reuseport_lock);
+	reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
+					  lockdep_is_held(&reuseport_lock));
+	if (likely(reuse))
+		reuse->has_conns = 1;
+	spin_unlock_bh(&reuseport_lock);
+}
+EXPORT_SYMBOL(reuseport_has_conns_set);
+
 static int reuseport_sock_index(struct sock *sk,
 				const struct sock_reuseport *reuse,
 				bool closed)
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 4a8550c49202..112c6e892d30 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -70,7 +70,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	}
 	inet->inet_daddr = fl4->daddr;
 	inet->inet_dport = usin->sin_port;
-	reuseport_has_conns(sk, true);
+	reuseport_has_conns_set(sk);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 	inet->inet_id = prandom_u32();
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4446aa8237ff..b093daaa3deb 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -446,7 +446,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 			result = lookup_reuseport(net, sk, skb,
 						  saddr, sport, daddr, hnum);
 			/* Fall back to scoring if group has connections */
-			if (result && !reuseport_has_conns(sk, false))
+			if (result && !reuseport_has_conns(sk))
 				return result;
 
 			result = result ? : sk;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 206f66310a88..f4559e5bc84b 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -256,7 +256,7 @@ int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr,
 		goto out;
 	}
 
-	reuseport_has_conns(sk, true);
+	reuseport_has_conns_set(sk);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 out:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9b504bf49214..514e6a55959f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -179,7 +179,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 			result = lookup_reuseport(net, sk, skb,
 						  saddr, sport, daddr, hnum);
 			/* Fall back to scoring if group has connections */
-			if (result && !reuseport_has_conns(sk, false))
+			if (result && !reuseport_has_conns(sk))
 				return result;
 
 			result = result ? : sk;
--
2.35.1