* [PATCH AUTOSEL 5.0 006/173] ipc: prevent lockup on alloc_msg and free_msg
[not found] <20190601131934.25053-1-sashal@kernel.org>
@ 2019-06-01 13:16 ` Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 033/173] bpf: fix undefined behavior in narrow load handling Sasha Levin
` (5 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-06-01 13:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Li Rongqing, Zhang Yu, Davidlohr Bueso, Manfred Spraul,
Arnd Bergmann, Andrew Morton, Linus Torvalds, Sasha Levin, netdev,
bpf
From: Li Rongqing <lirongqing@baidu.com>
[ Upstream commit d6a2946a88f524a47cc9b79279667137899db807 ]
msgctl10 of ltp triggers the following lockup When CONFIG_KASAN is
enabled on large memory SMP systems, the pages initialization can take a
long time, if msgctl10 requests a huge block memory, and it will block
rcu scheduler, so release cpu actively.
After adding schedule() in free_msg, free_msg can not be called when
holding spinlock, so adding msg to a tmp list, and free it out of
spinlock
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
rcu: Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
rcu: (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
msgctl10 R running task 21608 32505 2794 0x00000082
Call Trace:
preempt_schedule_irq+0x4c/0xb0
retint_kernel+0x1b/0x2d
RIP: 0010:__is_insn_slot_addr+0xfb/0x250
Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff <41> c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
kernel_text_address+0xc1/0x100
__kernel_text_address+0xe/0x30
unwind_get_return_address+0x2f/0x50
__save_stack_trace+0x92/0x100
create_object+0x380/0x650
__kmalloc+0x14c/0x2b0
load_msg+0x38/0x1a0
do_msgsnd+0x19e/0xcf0
do_syscall_64+0x117/0x400
entry_SYSCALL_64_after_hwframe+0x49/0xbe
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
rcu: (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
msgctl10 R running task 21608 32170 32155 0x00000082
Call Trace:
preempt_schedule_irq+0x4c/0xb0
retint_kernel+0x1b/0x2d
RIP: 0010:lock_acquire+0x4d/0x340
Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 <48> c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
is_bpf_text_address+0x32/0xe0
kernel_text_address+0xec/0x100
__kernel_text_address+0xe/0x30
unwind_get_return_address+0x2f/0x50
__save_stack_trace+0x92/0x100
save_stack+0x32/0xb0
__kasan_slab_free+0x130/0x180
kfree+0xfa/0x2d0
free_msg+0x24/0x50
do_msgrcv+0x508/0xe60
do_syscall_64+0x117/0x400
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Davidlohr said:
"So after releasing the lock, the msg rbtree/list is empty and new
calls will not see those in the newly populated tmp_msg list, and
therefore they cannot access the delayed msg freeing pointers, which
is good. Also the fact that the node_cache is now freed before the
actual messages seems to be harmless as this is wanted for
msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
info->lock the thing is freed anyway so it should not change things"
Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
ipc/mqueue.c | 10 ++++++++--
ipc/msgutil.c | 6 ++++++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index c595bed7bfcbc..43f8ddd637159 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -391,7 +391,8 @@ static void mqueue_evict_inode(struct inode *inode)
struct user_struct *user;
unsigned long mq_bytes, mq_treesize;
struct ipc_namespace *ipc_ns;
- struct msg_msg *msg;
+ struct msg_msg *msg, *nmsg;
+ LIST_HEAD(tmp_msg);
clear_inode(inode);
@@ -402,10 +403,15 @@ static void mqueue_evict_inode(struct inode *inode)
info = MQUEUE_I(inode);
spin_lock(&info->lock);
while ((msg = msg_get(info)) != NULL)
- free_msg(msg);
+ list_add_tail(&msg->m_list, &tmp_msg);
kfree(info->node_cache);
spin_unlock(&info->lock);
+ list_for_each_entry_safe(msg, nmsg, &tmp_msg, m_list) {
+ list_del(&msg->m_list);
+ free_msg(msg);
+ }
+
/* Total amount of bytes accounted for the mqueue */
mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index 84598025a6ade..e65593742e2be 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -18,6 +18,7 @@
#include <linux/utsname.h>
#include <linux/proc_ns.h>
#include <linux/uaccess.h>
+#include <linux/sched.h>
#include "util.h"
@@ -64,6 +65,9 @@ static struct msg_msg *alloc_msg(size_t len)
pseg = &msg->next;
while (len > 0) {
struct msg_msgseg *seg;
+
+ cond_resched();
+
alen = min(len, DATALEN_SEG);
seg = kmalloc(sizeof(*seg) + alen, GFP_KERNEL_ACCOUNT);
if (seg == NULL)
@@ -176,6 +180,8 @@ void free_msg(struct msg_msg *msg)
kfree(msg);
while (seg != NULL) {
struct msg_msgseg *tmp = seg->next;
+
+ cond_resched();
kfree(seg);
seg = tmp;
}
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.0 033/173] bpf: fix undefined behavior in narrow load handling
[not found] <20190601131934.25053-1-sashal@kernel.org>
2019-06-01 13:16 ` [PATCH AUTOSEL 5.0 006/173] ipc: prevent lockup on alloc_msg and free_msg Sasha Levin
@ 2019-06-01 13:17 ` Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 054/173] percpu: remove spurious lock dependency between percpu and sched Sasha Levin
` (4 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-06-01 13:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Krzesimir Nowak, Alban Crequy, Iago López Galeiras,
Yonghong Song, Daniel Borkmann, Sasha Levin, netdev, bpf
From: Krzesimir Nowak <krzesimir@kinvolk.io>
[ Upstream commit e2f7fc0ac6957cabff4cecf6c721979b571af208 ]
Commit 31fd85816dbe ("bpf: permits narrower load from bpf program
context fields") made the verifier add AND instructions to clear the
unwanted bits with a mask when doing a narrow load. The mask is
computed with
(1 << size * 8) - 1
where "size" is the size of the narrow load. When doing a 4 byte load
of a an 8 byte field the verifier shifts the literal 1 by 32 places to
the left. This results in an overflow of a signed integer, which is an
undefined behavior. Typically, the computed mask was zero, so the
result of the narrow load ended up being zero too.
Cast the literal to long long to avoid overflows. Note that narrow
load of the 4 byte fields does not have the undefined behavior,
because the load size can only be either 1 or 2 bytes, so shifting 1
by 8 or 16 places will not overflow it. And reading 4 bytes would not
be a narrow load of a 4 bytes field.
Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Reviewed-by: Alban Crequy <alban@kinvolk.io>
Reviewed-by: Iago López Galeiras <iago@kinvolk.io>
Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/verifier.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d53825b6fcd94..3b287b42212ea 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6612,7 +6612,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
insn->dst_reg,
shift);
insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
- (1 << size * 8) - 1);
+ (1ULL << size * 8) - 1);
}
}
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.0 054/173] percpu: remove spurious lock dependency between percpu and sched
[not found] <20190601131934.25053-1-sashal@kernel.org>
2019-06-01 13:16 ` [PATCH AUTOSEL 5.0 006/173] ipc: prevent lockup on alloc_msg and free_msg Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 033/173] bpf: fix undefined behavior in narrow load handling Sasha Levin
@ 2019-06-01 13:17 ` Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 061/173] netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression Sasha Levin
` (3 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-06-01 13:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: John Sperbeck, Dennis Zhou, Sasha Levin, linux-mm, netdev, bpf
From: John Sperbeck <jsperbeck@google.com>
[ Upstream commit 198790d9a3aeaef5792d33a560020861126edc22 ]
In free_percpu() we sometimes call pcpu_schedule_balance_work() to
queue a work item (which does a wakeup) while holding pcpu_lock.
This creates an unnecessary lock dependency between pcpu_lock and
the scheduler's pi_lock. There are other places where we call
pcpu_schedule_balance_work() without hold pcpu_lock, and this case
doesn't need to be different.
Moving the call outside the lock prevents the following lockdep splat
when running tools/testing/selftests/bpf/{test_maps,test_progs} in
sequence with lockdep enabled:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-dbg-DEV #1 Not tainted
------------------------------------------------------
kworker/23:255/18872 is trying to acquire lock:
000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520
but task is already holding lock:
00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (pcpu_lock){..-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
pcpu_alloc+0xfa/0x780
__alloc_percpu_gfp+0x12/0x20
alloc_htab_elem+0x184/0x2b0
__htab_percpu_map_update_elem+0x252/0x290
bpf_percpu_hash_update+0x7c/0x130
__do_sys_bpf+0x1912/0x1be0
__x64_sys_bpf+0x1a/0x20
do_syscall_64+0x59/0x400
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #3 (&htab->buckets[i].lock){....}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
htab_map_update_elem+0x1af/0x3a0
-> #2 (&rq->lock){-.-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
task_fork_fair+0x37/0x160
sched_fork+0x211/0x310
copy_process.part.43+0x7b1/0x2160
_do_fork+0xda/0x6b0
kernel_thread+0x29/0x30
rest_init+0x22/0x260
arch_call_rest_init+0xe/0x10
start_kernel+0x4fd/0x520
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x6f/0x72
secondary_startup_64+0xa4/0xb0
-> #1 (&p->pi_lock){-.-.}:
lock_acquire+0x9e/0x180
_raw_spin_lock_irqsave+0x3a/0x50
try_to_wake_up+0x41/0x600
wake_up_process+0x15/0x20
create_worker+0x16b/0x1e0
workqueue_init+0x279/0x2ee
kernel_init_freeable+0xf7/0x288
kernel_init+0xf/0x180
ret_from_fork+0x24/0x30
-> #0 (&(&pool->lock)->rlock){-.-.}:
__lock_acquire+0x101f/0x12a0
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
__queue_work+0xb2/0x520
queue_work_on+0x38/0x80
free_percpu+0x221/0x260
pcpu_freelist_destroy+0x11/0x20
stack_map_free+0x2a/0x40
bpf_map_free_deferred+0x3c/0x50
process_one_work+0x1f7/0x580
worker_thread+0x54/0x410
kthread+0x10f/0x150
ret_from_fork+0x24/0x30
other info that might help us debug this:
Chain exists of:
&(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pcpu_lock);
lock(&htab->buckets[i].lock);
lock(pcpu_lock);
lock(&(&pool->lock)->rlock);
*** DEADLOCK ***
3 locks held by kworker/23:255/18872:
#0: 00000000b36a6e16 ((wq_completion)events){+.+.},
at: process_one_work+0x17a/0x580
#1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
at: process_one_work+0x17a/0x580
#2: 00000000e3e7a6aa (pcpu_lock){..-.},
at: free_percpu+0x36/0x260
stack backtrace:
CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
Hardware name: ...
Workqueue: events bpf_map_free_deferred
Call Trace:
dump_stack+0x67/0x95
print_circular_bug.isra.38+0x1c6/0x220
check_prev_add.constprop.50+0x9f6/0xd20
__lock_acquire+0x101f/0x12a0
lock_acquire+0x9e/0x180
_raw_spin_lock+0x2f/0x40
__queue_work+0xb2/0x520
queue_work_on+0x38/0x80
free_percpu+0x221/0x260
pcpu_freelist_destroy+0x11/0x20
stack_map_free+0x2a/0x40
bpf_map_free_deferred+0x3c/0x50
process_one_work+0x1f7/0x580
worker_thread+0x54/0x410
kthread+0x10f/0x150
ret_from_fork+0x24/0x30
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/percpu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/percpu.c b/mm/percpu.c
index 59bd6a51954c9..4ad3a4214249e 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1721,6 +1721,7 @@ void free_percpu(void __percpu *ptr)
struct pcpu_chunk *chunk;
unsigned long flags;
int off;
+ bool need_balance = false;
if (!ptr)
return;
@@ -1742,7 +1743,7 @@ void free_percpu(void __percpu *ptr)
list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)
if (pos != chunk) {
- pcpu_schedule_balance_work();
+ need_balance = true;
break;
}
}
@@ -1750,6 +1751,9 @@ void free_percpu(void __percpu *ptr)
trace_percpu_free_percpu(chunk->base_addr, off, ptr);
spin_unlock_irqrestore(&pcpu_lock, flags);
+
+ if (need_balance)
+ pcpu_schedule_balance_work();
}
EXPORT_SYMBOL_GPL(free_percpu);
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.0 061/173] netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression
[not found] <20190601131934.25053-1-sashal@kernel.org>
` (2 preceding siblings ...)
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 054/173] percpu: remove spurious lock dependency between percpu and sched Sasha Levin
@ 2019-06-01 13:17 ` Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 064/173] netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast Sasha Levin
` (2 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-06-01 13:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kristian Evensen, Nicolas Dichtel, Pablo Neira Ayuso, Sasha Levin,
netfilter-devel, coreteam, netdev
From: Kristian Evensen <kristian.evensen@gmail.com>
[ Upstream commit f8e608982022fad035160870f5b06086d3cba54d ]
Commit 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter
on flush") introduced a user-space regression when flushing connection
track entries. Before this commit, the nfgen_family field was not used
by the kernel and all entries were removed. Since this commit,
nfgen_family is used to filter out entries that should not be removed.
One example a broken tool is conntrack. conntrack always sets
nfgen_family to AF_INET, so after 59c08c69c278 only IPv4 entries were
removed with the -F parameter.
Pablo Neira Ayuso suggested using nfgenmsg->version to resolve the
regression, and this commit implements his suggestion. nfgenmsg->version
is so far set to zero, so it is well-suited to be used as a flag for
selecting old or new flush behavior. If version is 0, nfgen_family is
ignored and all entries are used. If user-space sets the version to one
(or any other value than 0), then the new behavior is used. As version
only can have two valid values, I chose not to add a new
NFNETLINK_VERSION-constant.
Fixes: 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_conntrack_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 36619ad8ab8c2..8233dfafb339d 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1254,7 +1254,7 @@ static int ctnetlink_del_conntrack(struct net *net, struct sock *ctnl,
struct nf_conntrack_tuple tuple;
struct nf_conn *ct;
struct nfgenmsg *nfmsg = nlmsg_data(nlh);
- u_int8_t u3 = nfmsg->nfgen_family;
+ u_int8_t u3 = nfmsg->version ? nfmsg->nfgen_family : AF_UNSPEC;
struct nf_conntrack_zone zone;
int err;
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.0 064/173] netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast
[not found] <20190601131934.25053-1-sashal@kernel.org>
` (3 preceding siblings ...)
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 061/173] netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression Sasha Levin
@ 2019-06-01 13:17 ` Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 065/173] netfilter: nf_conntrack_h323: restore boundary check correctness Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 067/173] netfilter: nf_tables: fix base chain stat rcu_dereference usage Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-06-01 13:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Taehee Yoo, Pablo Neira Ayuso, Sasha Levin, netfilter-devel,
coreteam, netdev
From: Taehee Yoo <ap420073@gmail.com>
[ Upstream commit 43c8f131184faf20c07221f3e09724611c6525d8 ]
rhashtable_insert_fast() may return an error value when memory
allocation fails, but flow_offload_add() does not check for errors.
This patch just adds missing error checking.
Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_flow_table_core.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index c0c72ae9df42f..604446c0264cc 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -185,14 +185,25 @@ static const struct rhashtable_params nf_flow_offload_rhash_params = {
int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
{
- flow->timeout = (u32)jiffies;
+ int err;
- rhashtable_insert_fast(&flow_table->rhashtable,
- &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
- nf_flow_offload_rhash_params);
- rhashtable_insert_fast(&flow_table->rhashtable,
- &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
- nf_flow_offload_rhash_params);
+ err = rhashtable_insert_fast(&flow_table->rhashtable,
+ &flow->tuplehash[0].node,
+ nf_flow_offload_rhash_params);
+ if (err < 0)
+ return err;
+
+ err = rhashtable_insert_fast(&flow_table->rhashtable,
+ &flow->tuplehash[1].node,
+ nf_flow_offload_rhash_params);
+ if (err < 0) {
+ rhashtable_remove_fast(&flow_table->rhashtable,
+ &flow->tuplehash[0].node,
+ nf_flow_offload_rhash_params);
+ return err;
+ }
+
+ flow->timeout = (u32)jiffies;
return 0;
}
EXPORT_SYMBOL_GPL(flow_offload_add);
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.0 065/173] netfilter: nf_conntrack_h323: restore boundary check correctness
[not found] <20190601131934.25053-1-sashal@kernel.org>
` (4 preceding siblings ...)
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 064/173] netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast Sasha Levin
@ 2019-06-01 13:17 ` Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 067/173] netfilter: nf_tables: fix base chain stat rcu_dereference usage Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-06-01 13:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Jankowski, Pablo Neira Ayuso, Sasha Levin, netfilter-devel,
coreteam, netdev
From: Jakub Jankowski <shasta@toxcorp.com>
[ Upstream commit f5e85ce8e733c2547827f6268136b70b802eabdb ]
Since commit bc7d811ace4a ("netfilter: nf_ct_h323: Convert
CHECK_BOUND macro to function"), NAT traversal for H.323
doesn't work, failing to parse H323-UserInformation.
nf_h323_error_boundary() compares contents of the bitstring,
not the addresses, preventing valid H.323 packets from being
conntrack'd.
This looks like an oversight from when CHECK_BOUND macro was
converted to a function.
To fix it, stop dereferencing bs->cur and bs->end.
Fixes: bc7d811ace4a ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function")
Signed-off-by: Jakub Jankowski <shasta@toxcorp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_conntrack_h323_asn1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index 1601275efe2d1..4c2ef42e189cb 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -172,7 +172,7 @@ static int nf_h323_error_boundary(struct bitstr *bs, size_t bytes, size_t bits)
if (bits % BITS_PER_BYTE > 0)
bytes++;
- if (*bs->cur + bytes > *bs->end)
+ if (bs->cur + bytes > bs->end)
return 1;
return 0;
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.0 067/173] netfilter: nf_tables: fix base chain stat rcu_dereference usage
[not found] <20190601131934.25053-1-sashal@kernel.org>
` (5 preceding siblings ...)
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 065/173] netfilter: nf_conntrack_h323: restore boundary check correctness Sasha Levin
@ 2019-06-01 13:17 ` Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-06-01 13:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Florian Westphal, Pablo Neira Ayuso, Sasha Levin, netfilter-devel,
coreteam, netdev
From: Florian Westphal <fw@strlen.de>
[ Upstream commit edbd82c5fba009f68d20b5db585be1e667c605f6 ]
Following splat gets triggered when nfnetlink monitor is running while
xtables-nft selftests are running:
net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
1 lock held by xtables-nft-mul/27006:
#0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
Call Trace:
nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
nf_tables_chain_notify+0xf8/0x1a0
nf_tables_commit+0x165c/0x1740
nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
or from the transaction path if a userspace process subscribed to nftables
notifications.
In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
hold transaction mutex so the pointer can be NULLed right after the check.
Just unconditionally fetch the value, then have the helper return
immediately if its NULL.
In the notification case we don't hold the rcu read lock, but updates are
prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
aware of this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_tables_api.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 25c2b98b9a960..eb3915b76ff47 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1147,6 +1147,9 @@ static int nft_dump_stats(struct sk_buff *skb, struct nft_stats __percpu *stats)
u64 pkts, bytes;
int cpu;
+ if (!stats)
+ return 0;
+
memset(&total, 0, sizeof(total));
for_each_possible_cpu(cpu) {
cpu_stats = per_cpu_ptr(stats, cpu);
@@ -1204,6 +1207,7 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
if (nft_is_base_chain(chain)) {
const struct nft_base_chain *basechain = nft_base_chain(chain);
const struct nf_hook_ops *ops = &basechain->ops;
+ struct nft_stats __percpu *stats;
struct nlattr *nest;
nest = nla_nest_start(skb, NFTA_CHAIN_HOOK);
@@ -1225,8 +1229,9 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
if (nla_put_string(skb, NFTA_CHAIN_TYPE, basechain->type->name))
goto nla_put_failure;
- if (rcu_access_pointer(basechain->stats) &&
- nft_dump_stats(skb, rcu_dereference(basechain->stats)))
+ stats = rcu_dereference_check(basechain->stats,
+ lockdep_commit_lock_is_held(net));
+ if (nft_dump_stats(skb, stats))
goto nla_put_failure;
}
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-06-01 13:39 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20190601131934.25053-1-sashal@kernel.org>
2019-06-01 13:16 ` [PATCH AUTOSEL 5.0 006/173] ipc: prevent lockup on alloc_msg and free_msg Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 033/173] bpf: fix undefined behavior in narrow load handling Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 054/173] percpu: remove spurious lock dependency between percpu and sched Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 061/173] netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 064/173] netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 065/173] netfilter: nf_conntrack_h323: restore boundary check correctness Sasha Levin
2019-06-01 13:17 ` [PATCH AUTOSEL 5.0 067/173] netfilter: nf_tables: fix base chain stat rcu_dereference usage Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).