* [PATCH AUTOSEL 4.19 2/6] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test
2023-11-22 15:36 [PATCH AUTOSEL 4.19 1/6] hrtimers: Push pending hrtimers away from outgoing CPU earlier Sasha Levin
@ 2023-11-22 15:36 ` Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 3/6] tg3: Move the [rt]x_dropped counters to tg3_napi Sasha Levin
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2023-11-22 15:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jozsef Kadlecsik, Pablo Neira Ayuso, Sasha Levin, fw, davem,
edumazet, kuba, pabeni, justinstitt, kuniyu, netfilter-devel,
coreteam, netdev
From: Jozsef Kadlecsik <kadlec@netfilter.org>
[ Upstream commit 28628fa952fefc7f2072ce6e8016968cc452b1ba ]
Linkui Xiao reported that there's a race condition when ipset swap and destroy is
called, which can lead to crash in add/del/test element operations. Swap then
destroy are usual operations to replace a set with another one in a production
system. The issue can in some cases be reproduced with the script:
ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576
ipset add hash_ip1 172.20.0.0/16
ipset add hash_ip1 192.168.0.0/16
iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
while [ 1 ]
do
# ... Ongoing traffic...
ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
ipset add hash_ip2 172.20.0.0/16
ipset swap hash_ip1 hash_ip2
ipset destroy hash_ip2
sleep 0.05
done
In the race case the possible order of the operations are
CPU0 CPU1
ip_set_test
ipset swap hash_ip1 hash_ip2
ipset destroy hash_ip2
hash_net_kadt
Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which
is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy
removed it, hash_net_kadt crashes.
The fix is to force ip_set_swap() to wait for all readers to finish accessing the
old set pointers by calling synchronize_rcu().
The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>.
v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
ip_set_destroy() unnecessarily when all sets are destroyed.
v3: Florian Westphal pointed out that all netfilter hooks run with rcu_read_lock() held
and em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair.
So there's no need to extend the rcu read locked area in ipset itself.
Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/
Reported by: Linkui Xiao <xiaolinkui@kylinos.cn>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/ipset/ip_set_core.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 31756d1bf83e7..031bb83aed70a 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -64,6 +64,8 @@ MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET);
ip_set_dereference((inst)->ip_set_list)[id]
#define ip_set_ref_netlink(inst,id) \
rcu_dereference_raw((inst)->ip_set_list)[id]
+#define ip_set_dereference_nfnl(p) \
+ rcu_dereference_check(p, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET))
/* The set types are implemented in modules and registered set types
* can be found in ip_set_type_list. Adding/deleting types is
@@ -552,15 +554,10 @@ __ip_set_put_netlink(struct ip_set *set)
static inline struct ip_set *
ip_set_rcu_get(struct net *net, ip_set_id_t index)
{
- struct ip_set *set;
struct ip_set_net *inst = ip_set_pernet(net);
- rcu_read_lock();
- /* ip_set_list itself needs to be protected */
- set = rcu_dereference(inst->ip_set_list)[index];
- rcu_read_unlock();
-
- return set;
+ /* ip_set_list and the set pointer need to be protected */
+ return ip_set_dereference_nfnl(inst->ip_set_list)[index];
}
int
@@ -1227,6 +1224,9 @@ static int ip_set_swap(struct net *net, struct sock *ctnl, struct sk_buff *skb,
ip_set(inst, to_id) = from;
write_unlock_bh(&ip_set_ref_lock);
+ /* Make sure all readers of the old set pointers are completed. */
+ synchronize_rcu();
+
return 0;
}
--
2.42.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 4.19 3/6] tg3: Move the [rt]x_dropped counters to tg3_napi
2023-11-22 15:36 [PATCH AUTOSEL 4.19 1/6] hrtimers: Push pending hrtimers away from outgoing CPU earlier Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 2/6] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test Sasha Levin
@ 2023-11-22 15:36 ` Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 4/6] tg3: Increment tx_dropped in tg3_tso_bug() Sasha Levin
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2023-11-22 15:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alex Pakhunov, Vincent Wong, Michael Chan, Jakub Kicinski,
Sasha Levin, pavan.chebbi, mchan, davem, edumazet, pabeni, netdev
From: Alex Pakhunov <alexey.pakhunov@spacex.com>
[ Upstream commit 907d1bdb8b2cc0357d03a1c34d2a08d9943760b1 ]
This change moves [rt]x_dropped counters to tg3_napi so that they can be
updated by a single writer, race-free.
Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231113182350.37472-1-alexey.pakhunov@spacex.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/broadcom/tg3.c | 38 +++++++++++++++++++++++++----
drivers/net/ethernet/broadcom/tg3.h | 4 +--
2 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index f0b5c8a4d29f5..84a5bddf614c8 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6859,7 +6859,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
desc_idx, *post_ptr);
drop_it_no_recycle:
/* Other statistics kept track of by card. */
- tp->rx_dropped++;
+ tnapi->rx_dropped++;
goto next_pkt;
}
@@ -8163,7 +8163,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
drop:
dev_kfree_skb_any(skb);
drop_nofree:
- tp->tx_dropped++;
+ tnapi->tx_dropped++;
return NETDEV_TX_OK;
}
@@ -9342,7 +9342,7 @@ static void __tg3_set_rx_mode(struct net_device *);
/* tp->lock is held. */
static int tg3_halt(struct tg3 *tp, int kind, bool silent)
{
- int err;
+ int err, i;
tg3_stop_fw(tp);
@@ -9363,6 +9363,13 @@ static int tg3_halt(struct tg3 *tp, int kind, bool silent)
/* And make sure the next sample is new data */
memset(tp->hw_stats, 0, sizeof(struct tg3_hw_stats));
+
+ for (i = 0; i < TG3_IRQ_MAX_VECS; ++i) {
+ struct tg3_napi *tnapi = &tp->napi[i];
+
+ tnapi->rx_dropped = 0;
+ tnapi->tx_dropped = 0;
+ }
}
return err;
@@ -11919,6 +11926,9 @@ static void tg3_get_nstats(struct tg3 *tp, struct rtnl_link_stats64 *stats)
{
struct rtnl_link_stats64 *old_stats = &tp->net_stats_prev;
struct tg3_hw_stats *hw_stats = tp->hw_stats;
+ unsigned long rx_dropped;
+ unsigned long tx_dropped;
+ int i;
stats->rx_packets = old_stats->rx_packets +
get_stat64(&hw_stats->rx_ucast_packets) +
@@ -11965,8 +11975,26 @@ static void tg3_get_nstats(struct tg3 *tp, struct rtnl_link_stats64 *stats)
stats->rx_missed_errors = old_stats->rx_missed_errors +
get_stat64(&hw_stats->rx_discards);
- stats->rx_dropped = tp->rx_dropped;
- stats->tx_dropped = tp->tx_dropped;
+ /* Aggregate per-queue counters. The per-queue counters are updated
+ * by a single writer, race-free. The result computed by this loop
+ * might not be 100% accurate (counters can be updated in the middle of
+ * the loop) but the next tg3_get_nstats() will recompute the current
+ * value so it is acceptable.
+ *
+ * Note that these counters wrap around at 4G on 32bit machines.
+ */
+ rx_dropped = (unsigned long)(old_stats->rx_dropped);
+ tx_dropped = (unsigned long)(old_stats->tx_dropped);
+
+ for (i = 0; i < tp->irq_cnt; i++) {
+ struct tg3_napi *tnapi = &tp->napi[i];
+
+ rx_dropped += tnapi->rx_dropped;
+ tx_dropped += tnapi->tx_dropped;
+ }
+
+ stats->rx_dropped = rx_dropped;
+ stats->tx_dropped = tx_dropped;
}
static int tg3_get_regs_len(struct net_device *dev)
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index a772a33b685c5..b8cc8ff4e8782 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3018,6 +3018,7 @@ struct tg3_napi {
u16 *rx_rcb_prod_idx;
struct tg3_rx_prodring_set prodring;
struct tg3_rx_buffer_desc *rx_rcb;
+ unsigned long rx_dropped;
u32 tx_prod ____cacheline_aligned;
u32 tx_cons;
@@ -3026,6 +3027,7 @@ struct tg3_napi {
u32 prodmbox;
struct tg3_tx_buffer_desc *tx_ring;
struct tg3_tx_ring_info *tx_buffers;
+ unsigned long tx_dropped;
dma_addr_t status_mapping;
dma_addr_t rx_rcb_mapping;
@@ -3219,8 +3221,6 @@ struct tg3 {
/* begin "everything else" cacheline(s) section */
- unsigned long rx_dropped;
- unsigned long tx_dropped;
struct rtnl_link_stats64 net_stats_prev;
struct tg3_ethtool_stats estats_prev;
--
2.42.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 4.19 4/6] tg3: Increment tx_dropped in tg3_tso_bug()
2023-11-22 15:36 [PATCH AUTOSEL 4.19 1/6] hrtimers: Push pending hrtimers away from outgoing CPU earlier Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 2/6] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 3/6] tg3: Move the [rt]x_dropped counters to tg3_napi Sasha Levin
@ 2023-11-22 15:36 ` Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 5/6] kconfig: fix memory leak from range properties Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 6/6] drm/amdgpu: correct chunk_ptr to a pointer to chunk Sasha Levin
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2023-11-22 15:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alex Pakhunov, Vincent Wong, Pavan Chebbi, Jakub Kicinski,
Sasha Levin, mchan, davem, edumazet, pabeni, netdev
From: Alex Pakhunov <alexey.pakhunov@spacex.com>
[ Upstream commit 17dd5efe5f36a96bd78012594fabe21efb01186b ]
tg3_tso_bug() drops a packet if it cannot be segmented for any reason.
The number of discarded frames should be incremented accordingly.
Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20231113182350.37472-2-alexey.pakhunov@spacex.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/broadcom/tg3.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 84a5bddf614c8..68bb4a2ff7cee 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7889,8 +7889,10 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
segs = skb_gso_segment(skb, tp->dev->features &
~(NETIF_F_TSO | NETIF_F_TSO6));
- if (IS_ERR(segs) || !segs)
+ if (IS_ERR(segs) || !segs) {
+ tnapi->tx_dropped++;
goto tg3_tso_bug_end;
+ }
do {
nskb = segs;
--
2.42.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 4.19 5/6] kconfig: fix memory leak from range properties
2023-11-22 15:36 [PATCH AUTOSEL 4.19 1/6] hrtimers: Push pending hrtimers away from outgoing CPU earlier Sasha Levin
` (2 preceding siblings ...)
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 4/6] tg3: Increment tx_dropped in tg3_tso_bug() Sasha Levin
@ 2023-11-22 15:36 ` Sasha Levin
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 6/6] drm/amdgpu: correct chunk_ptr to a pointer to chunk Sasha Levin
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2023-11-22 15:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Masahiro Yamada, Sasha Levin, linux-kbuild
From: Masahiro Yamada <masahiroy@kernel.org>
[ Upstream commit ae1eff0349f2e908fc083630e8441ea6dc434dc0 ]
Currently, sym_validate_range() duplicates the range string using
xstrdup(), which is overwritten by a subsequent sym_calc_value() call.
It results in a memory leak.
Instead, only the pointer should be copied.
Below is a test case, with a summary from Valgrind.
[Test Kconfig]
config FOO
int "foo"
range 10 20
[Test .config]
CONFIG_FOO=0
[Before]
LEAK SUMMARY:
definitely lost: 3 bytes in 1 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,465 bytes in 21 blocks
suppressed: 0 bytes in 0 blocks
[After]
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,462 bytes in 20 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
scripts/kconfig/symbol.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c
index 703b9b899ee9c..5adb60b7e12f3 100644
--- a/scripts/kconfig/symbol.c
+++ b/scripts/kconfig/symbol.c
@@ -119,9 +119,9 @@ static long long sym_get_range_val(struct symbol *sym, int base)
static void sym_validate_range(struct symbol *sym)
{
struct property *prop;
+ struct symbol *range_sym;
int base;
long long val, val2;
- char str[64];
switch (sym->type) {
case S_INT:
@@ -137,17 +137,15 @@ static void sym_validate_range(struct symbol *sym)
if (!prop)
return;
val = strtoll(sym->curr.val, NULL, base);
- val2 = sym_get_range_val(prop->expr->left.sym, base);
+ range_sym = prop->expr->left.sym;
+ val2 = sym_get_range_val(range_sym, base);
if (val >= val2) {
- val2 = sym_get_range_val(prop->expr->right.sym, base);
+ range_sym = prop->expr->right.sym;
+ val2 = sym_get_range_val(range_sym, base);
if (val <= val2)
return;
}
- if (sym->type == S_INT)
- sprintf(str, "%lld", val2);
- else
- sprintf(str, "0x%llx", val2);
- sym->curr.val = xstrdup(str);
+ sym->curr.val = range_sym->curr.val;
}
static void sym_set_changed(struct symbol *sym)
--
2.42.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 4.19 6/6] drm/amdgpu: correct chunk_ptr to a pointer to chunk.
2023-11-22 15:36 [PATCH AUTOSEL 4.19 1/6] hrtimers: Push pending hrtimers away from outgoing CPU earlier Sasha Levin
` (3 preceding siblings ...)
2023-11-22 15:36 ` [PATCH AUTOSEL 4.19 5/6] kconfig: fix memory leak from range properties Sasha Levin
@ 2023-11-22 15:36 ` Sasha Levin
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2023-11-22 15:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: YuanShang, Christian König, Alex Deucher, Sasha Levin,
Xinhui.Pan, airlied, daniel, guchun.chen, srinivasan.shanmugam,
luben.tuikov, amd-gfx, dri-devel
From: YuanShang <YuanShang.Mao@amd.com>
[ Upstream commit 50d51374b498457c4dea26779d32ccfed12ddaff ]
The variable "chunk_ptr" should be a pointer pointing
to a struct drm_amdgpu_cs_chunk instead of to a pointer
of that.
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 70e446c2acf82..94b06c918e80d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -147,7 +147,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
}
for (i = 0; i < p->nchunks; i++) {
- struct drm_amdgpu_cs_chunk __user **chunk_ptr = NULL;
+ struct drm_amdgpu_cs_chunk __user *chunk_ptr = NULL;
struct drm_amdgpu_cs_chunk user_chunk;
uint32_t __user *cdata;
--
2.42.0
^ permalink raw reply related [flat|nested] 6+ messages in thread