* [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
@ 2017-10-17 14:55 Daniel Borkmann
  2017-10-17 14:55 ` [PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag Daniel Borkmann
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-17 14:55 UTC (permalink / raw)
  To: davem
  Cc: tj, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel, Daniel Borkmann
The set fixes a splat in devmap percpu allocation when we alloc
the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
patch 1 is rather small, so if this could be routed via -net, for
example, with Tejun's Ack that would be good. Patch 3 gets rid of
remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
internals and should not be used.
Thanks!
Daniel Borkmann (3):
  mm, percpu: add support for __GFP_NOWARN flag
  bpf: fix splat for illegal devmap percpu allocation
  bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
 kernel/bpf/arraymap.c |  2 +-
 kernel/bpf/devmap.c   |  5 +++--
 kernel/bpf/hashtab.c  |  4 ----
 mm/percpu.c           | 15 ++++++++++-----
 4 files changed, 14 insertions(+), 12 deletions(-)
-- 
1.9.3
^ permalink raw reply	[flat|nested] 19+ messages in thread
* [PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag
  2017-10-17 14:55 [PATCH net 0/3] Fix for BPF devmap percpu allocation splat Daniel Borkmann
@ 2017-10-17 14:55 ` Daniel Borkmann
  2017-10-17 15:53   ` Alexei Starovoitov
  2017-10-17 14:55 ` [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation Daniel Borkmann
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-17 14:55 UTC (permalink / raw)
  To: davem
  Cc: tj, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel, Daniel Borkmann
Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL anyway for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.
The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger than supported allocation requests are
made such that allocations limits can be tweaked if warranted.
This makes sense for in-kernel users, however, there are users
of pcpu allocator where allocation size is derived from user
space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.
The current work-around was to check allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
bailing out prior to a call to pcpu_alloc() in order to
avoid throwing the WARN(). This is bad in multiple ways since
PCPU_MIN_UNIT_SIZE is an implementation detail, and having
the checks on call-sites only complicates the code for no
good reason. Thus, lets fix it generically by supporting the
__GFP_NOWARN flag that users can then use with calling the
__alloc_percpu_gfp() helper instead.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
---
 mm/percpu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/mm/percpu.c b/mm/percpu.c
index aa121ce..a0e0c82 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1329,7 +1329,9 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
  * @gfp: allocation flags
  *
  * Allocate percpu area of @size bytes aligned at @align.  If @gfp doesn't
- * contain %GFP_KERNEL, the allocation is atomic.
+ * contain %GFP_KERNEL, the allocation is atomic. If @gfp has __GFP_NOWARN
+ * then no warning will be triggered on invalid or failed allocation
+ * requests.
  *
  * RETURNS:
  * Percpu pointer to the allocated area on success, NULL on failure.
@@ -1337,10 +1339,11 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 				 gfp_t gfp)
 {
+	bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
+	bool do_warn = !(gfp & __GFP_NOWARN);
 	static int warn_limit = 10;
 	struct pcpu_chunk *chunk;
 	const char *err;
-	bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
 	int slot, off, cpu, ret;
 	unsigned long flags;
 	void __percpu *ptr;
@@ -1361,7 +1364,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 	if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE ||
 		     !is_power_of_2(align))) {
-		WARN(true, "illegal size (%zu) or align (%zu) for percpu allocation\n",
+		WARN(do_warn, "illegal size (%zu) or align (%zu) for percpu allocation\n",
 		     size, align);
 		return NULL;
 	}
@@ -1482,7 +1485,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 fail:
 	trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align);
 
-	if (!is_atomic && warn_limit) {
+	if (!is_atomic && do_warn && warn_limit) {
 		pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
 			size, align, is_atomic, err);
 		dump_stack();
@@ -1507,7 +1510,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
  *
  * Allocate zero-filled percpu area of @size bytes aligned at @align.  If
  * @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can
- * be called from any context but is a lot more likely to fail.
+ * be called from any context but is a lot more likely to fail. If @gfp
+ * has __GFP_NOWARN then no warning will be triggered on invalid or failed
+ * allocation requests.
  *
  * RETURNS:
  * Percpu pointer to the allocated area on success, NULL on failure.
-- 
1.9.3
^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation
  2017-10-17 14:55 [PATCH net 0/3] Fix for BPF devmap percpu allocation splat Daniel Borkmann
  2017-10-17 14:55 ` [PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag Daniel Borkmann
@ 2017-10-17 14:55 ` Daniel Borkmann
  2017-10-17 15:54   ` Alexei Starovoitov
  2017-10-17 16:28   ` John Fastabend
  2017-10-17 14:55 ` [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations Daniel Borkmann
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-17 14:55 UTC (permalink / raw)
  To: davem
  Cc: tj, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel, Daniel Borkmann
It was reported that syzkaller was able to trigger a splat on
devmap percpu allocation due to illegal/unsupported allocation
request size passed to __alloc_percpu():
  [   70.094249] illegal size (32776) or align (8) for percpu allocation
  [   70.094256] ------------[ cut here ]------------
  [   70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
  [...]
  [   70.094325] Call Trace:
  [   70.094328]  __alloc_percpu_gfp+0x12/0x20
  [   70.094330]  dev_map_alloc+0x134/0x1e0
  [   70.094331]  SyS_bpf+0x9bc/0x1610
  [   70.094333]  ? selinux_task_setrlimit+0x5a/0x60
  [   70.094334]  ? security_task_setrlimit+0x43/0x60
  [   70.094336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
This was due to too large max_entries for the map such that we
surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
fail naturally here, so switch to __alloc_percpu_gfp() and pass
__GFP_NOWARN instead.
Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Shankara Pailoor <sp3485@columbia.edu>
Reported-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a..920428d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -111,8 +111,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	err = -ENOMEM;
 
 	/* A per cpu bitfield with a bit per possible net device */
-	dtab->flush_needed = __alloc_percpu(dev_map_bitmap_size(attr),
-					    __alignof__(unsigned long));
+	dtab->flush_needed = __alloc_percpu_gfp(dev_map_bitmap_size(attr),
+						__alignof__(unsigned long),
+						GFP_KERNEL | __GFP_NOWARN);
 	if (!dtab->flush_needed)
 		goto free_dtab;
 
-- 
1.9.3
^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
  2017-10-17 14:55 [PATCH net 0/3] Fix for BPF devmap percpu allocation splat Daniel Borkmann
  2017-10-17 14:55 ` [PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag Daniel Borkmann
  2017-10-17 14:55 ` [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation Daniel Borkmann
@ 2017-10-17 14:55 ` Daniel Borkmann
  2017-10-17 15:55   ` Alexei Starovoitov
  2017-10-17 16:29   ` John Fastabend
  2017-10-17 15:03 ` [PATCH net 0/3] Fix for BPF devmap percpu allocation splat David Laight
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-17 14:55 UTC (permalink / raw)
  To: davem
  Cc: tj, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel, Daniel Borkmann
PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
allocator. Given we support __GFP_NOWARN now, lets just let
the allocation request fail naturally instead. The two call
sites from BPF mistakenly assumed __GFP_NOWARN would work, so
no changes needed to their actual __alloc_percpu_gfp() calls
which use the flag already.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/arraymap.c | 2 +-
 kernel/bpf/hashtab.c  | 4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 98c0f00..e263673 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -98,7 +98,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array_size += (u64) attr->max_entries * elem_size * num_possible_cpus();
 
 	if (array_size >= U32_MAX - PAGE_SIZE ||
-	    elem_size > PCPU_MIN_UNIT_SIZE || bpf_array_alloc_percpu(array)) {
+	    bpf_array_alloc_percpu(array)) {
 		bpf_map_area_free(array);
 		return ERR_PTR(-ENOMEM);
 	}
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 431126f..6533f08 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -317,10 +317,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		 */
 		goto free_htab;
 
-	if (percpu && round_up(htab->map.value_size, 8) > PCPU_MIN_UNIT_SIZE)
-		/* make sure the size for pcpu_alloc() is reasonable */
-		goto free_htab;
-
 	htab->elem_size = sizeof(struct htab_elem) +
 			  round_up(htab->map.key_size, 8);
 	if (percpu)
-- 
1.9.3
^ permalink raw reply related	[flat|nested] 19+ messages in thread
* RE: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-17 14:55 [PATCH net 0/3] Fix for BPF devmap percpu allocation splat Daniel Borkmann
                   ` (2 preceding siblings ...)
  2017-10-17 14:55 ` [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations Daniel Borkmann
@ 2017-10-17 15:03 ` David Laight
  2017-10-17 15:11   ` Daniel Borkmann
  2017-10-18 13:25 ` Tejun Heo
  2017-10-19 12:14 ` David Miller
  5 siblings, 1 reply; 19+ messages in thread
From: David Laight @ 2017-10-17 15:03 UTC (permalink / raw)
  To: 'Daniel Borkmann', davem@davemloft.net
  Cc: tj@kernel.org, ast@kernel.org, john.fastabend@gmail.com,
	mark.rutland@arm.com, richard@nod.at, sp3485@columbia.edu,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
From: Daniel Borkmann
> Sent: 17 October 2017 15:56
> 
> The set fixes a splat in devmap percpu allocation when we alloc
> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
> patch 1 is rather small, so if this could be routed via -net, for
> example, with Tejun's Ack that would be good. Patch 3 gets rid of
> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
> internals and should not be used.
Does it make sense to allow the user program to try to allocate ever
smaller very large maps until it finds one that succeeds - thus
using up all the percpu space?
Or is this a 'root only' 'shoot self in foot' job?
	David
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-17 15:03 ` [PATCH net 0/3] Fix for BPF devmap percpu allocation splat David Laight
@ 2017-10-17 15:11   ` Daniel Borkmann
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-17 15:11 UTC (permalink / raw)
  To: David Laight, davem@davemloft.net
  Cc: tj@kernel.org, ast@kernel.org, john.fastabend@gmail.com,
	mark.rutland@arm.com, richard@nod.at, sp3485@columbia.edu,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
On 10/17/2017 05:03 PM, David Laight wrote:
> From: Daniel Borkmann
>> Sent: 17 October 2017 15:56
>>
>> The set fixes a splat in devmap percpu allocation when we alloc
>> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
>> patch 1 is rather small, so if this could be routed via -net, for
>> example, with Tejun's Ack that would be good. Patch 3 gets rid of
>> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
>> internals and should not be used.
>
> Does it make sense to allow the user program to try to allocate ever
> smaller very large maps until it finds one that succeeds - thus
> using up all the percpu space?
>
> Or is this a 'root only' 'shoot self in foot' job?
It's root only although John still has a pending fix to be flushed
out for -net first in the next days to actually enforce that cap
(devmap is not in an official kernel yet at this point, so all good),
but apart from this, all map allocs in general are accounted for
as well.
Thanks,
Daniel
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag
  2017-10-17 14:55 ` [PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag Daniel Borkmann
@ 2017-10-17 15:53   ` Alexei Starovoitov
  0 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2017-10-17 15:53 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, tj, ast, john.fastabend, mark.rutland, richard, sp3485,
	netdev, linux-kernel
On Tue, Oct 17, 2017 at 04:55:52PM +0200, Daniel Borkmann wrote:
> Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
> Currently, we always throw a warning when size or alignment
> is unsupported (and also dump stack on failed allocation
> requests). The warning itself is harmless since we return
> NULL anyway for any failed request, which callers are
> required to handle anyway. However, it becomes harmful when
> panic_on_warn is set.
> 
> The rationale for the WARN() in pcpu_alloc() is that it can
> be tracked when larger than supported allocation requests are
> made such that allocations limits can be tweaked if warranted.
> This makes sense for in-kernel users, however, there are users
> of pcpu allocator where allocation size is derived from user
> space requests, e.g. when creating BPF maps. In these cases,
> the requests should fail gracefully without throwing a splat.
> 
> The current work-around was to check allocation size against
> the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
> bailing out prior to a call to pcpu_alloc() in order to
> avoid throwing the WARN(). This is bad in multiple ways since
> PCPU_MIN_UNIT_SIZE is an implementation detail, and having
> the checks on call-sites only complicates the code for no
> good reason. Thus, lets fix it generically by supporting the
> __GFP_NOWARN flag that users can then use with calling the
> __alloc_percpu_gfp() helper instead.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
The approach looks great to me. We've been doing this dance around
allocator warning for long time. It's really not a job of bpf code
to guess into valid parameters of pcpu alloc.
Adding support for __GFP_NOWARN and using it in bpf is much cleaner
fix that avoids layering violations.
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation
  2017-10-17 14:55 ` [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation Daniel Borkmann
@ 2017-10-17 15:54   ` Alexei Starovoitov
  2017-10-17 16:28   ` John Fastabend
  1 sibling, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2017-10-17 15:54 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, tj, ast, john.fastabend, mark.rutland, richard, sp3485,
	netdev, linux-kernel
On Tue, Oct 17, 2017 at 04:55:53PM +0200, Daniel Borkmann wrote:
> It was reported that syzkaller was able to trigger a splat on
> devmap percpu allocation due to illegal/unsupported allocation
> request size passed to __alloc_percpu():
> 
>   [   70.094249] illegal size (32776) or align (8) for percpu allocation
>   [   70.094256] ------------[ cut here ]------------
>   [   70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
>   [...]
>   [   70.094325] Call Trace:
>   [   70.094328]  __alloc_percpu_gfp+0x12/0x20
>   [   70.094330]  dev_map_alloc+0x134/0x1e0
>   [   70.094331]  SyS_bpf+0x9bc/0x1610
>   [   70.094333]  ? selinux_task_setrlimit+0x5a/0x60
>   [   70.094334]  ? security_task_setrlimit+0x43/0x60
>   [   70.094336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> 
> This was due to too large max_entries for the map such that we
> surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
> fail naturally here, so switch to __alloc_percpu_gfp() and pass
> __GFP_NOWARN instead.
> 
> Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: Shankara Pailoor <sp3485@columbia.edu>
> Reported-by: Richard Weinberger <richard@nod.at>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
  2017-10-17 14:55 ` [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations Daniel Borkmann
@ 2017-10-17 15:55   ` Alexei Starovoitov
  2017-10-17 16:29   ` John Fastabend
  1 sibling, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2017-10-17 15:55 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, tj, ast, john.fastabend, mark.rutland, richard, sp3485,
	netdev, linux-kernel
On Tue, Oct 17, 2017 at 04:55:54PM +0200, Daniel Borkmann wrote:
> PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
> allocator. Given we support __GFP_NOWARN now, lets just let
> the allocation request fail naturally instead. The two call
> sites from BPF mistakenly assumed __GFP_NOWARN would work, so
> no changes needed to their actual __alloc_percpu_gfp() calls
> which use the flag already.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation
  2017-10-17 14:55 ` [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation Daniel Borkmann
  2017-10-17 15:54   ` Alexei Starovoitov
@ 2017-10-17 16:28   ` John Fastabend
  1 sibling, 0 replies; 19+ messages in thread
From: John Fastabend @ 2017-10-17 16:28 UTC (permalink / raw)
  To: Daniel Borkmann, davem
  Cc: tj, ast, mark.rutland, richard, sp3485, netdev, linux-kernel
On 10/17/2017 07:55 AM, Daniel Borkmann wrote:
> It was reported that syzkaller was able to trigger a splat on
> devmap percpu allocation due to illegal/unsupported allocation
> request size passed to __alloc_percpu():
> 
>   [   70.094249] illegal size (32776) or align (8) for percpu allocation
>   [   70.094256] ------------[ cut here ]------------
>   [   70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
>   [...]
>   [   70.094325] Call Trace:
>   [   70.094328]  __alloc_percpu_gfp+0x12/0x20
>   [   70.094330]  dev_map_alloc+0x134/0x1e0
>   [   70.094331]  SyS_bpf+0x9bc/0x1610
>   [   70.094333]  ? selinux_task_setrlimit+0x5a/0x60
>   [   70.094334]  ? security_task_setrlimit+0x43/0x60
>   [   70.094336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> 
> This was due to too large max_entries for the map such that we
> surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
> fail naturally here, so switch to __alloc_percpu_gfp() and pass
> __GFP_NOWARN instead.
> 
> Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: Shankara Pailoor <sp3485@columbia.edu>
> Reported-by: Richard Weinberger <richard@nod.at>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: John Fastabend <john.fastabend@gmail.com>
> ---
Thanks!
Acked-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
  2017-10-17 14:55 ` [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations Daniel Borkmann
  2017-10-17 15:55   ` Alexei Starovoitov
@ 2017-10-17 16:29   ` John Fastabend
  1 sibling, 0 replies; 19+ messages in thread
From: John Fastabend @ 2017-10-17 16:29 UTC (permalink / raw)
  To: Daniel Borkmann, davem
  Cc: tj, ast, mark.rutland, richard, sp3485, netdev, linux-kernel
On 10/17/2017 07:55 AM, Daniel Borkmann wrote:
> PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
> allocator. Given we support __GFP_NOWARN now, lets just let
> the allocation request fail naturally instead. The two call
> sites from BPF mistakenly assumed __GFP_NOWARN would work, so
> no changes needed to their actual __alloc_percpu_gfp() calls
> which use the flag already.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
Nice cleanup. Thanks!
Acked-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-17 14:55 [PATCH net 0/3] Fix for BPF devmap percpu allocation splat Daniel Borkmann
                   ` (3 preceding siblings ...)
  2017-10-17 15:03 ` [PATCH net 0/3] Fix for BPF devmap percpu allocation splat David Laight
@ 2017-10-18 13:25 ` Tejun Heo
  2017-10-18 14:03   ` Daniel Borkmann
  2017-10-18 21:45   ` Dennis Zhou
  2017-10-19 12:14 ` David Miller
  5 siblings, 2 replies; 19+ messages in thread
From: Tejun Heo @ 2017-10-18 13:25 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel, Dennis Zhou
Hello, Daniel.
(cc'ing Dennis)
On Tue, Oct 17, 2017 at 04:55:51PM +0200, Daniel Borkmann wrote:
> The set fixes a splat in devmap percpu allocation when we alloc
> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
> patch 1 is rather small, so if this could be routed via -net, for
> example, with Tejun's Ack that would be good. Patch 3 gets rid of
> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
> internals and should not be used.
> 
> Thanks!
> 
> Daniel Borkmann (3):
>   mm, percpu: add support for __GFP_NOWARN flag
This looks fine.
>   bpf: fix splat for illegal devmap percpu allocation
>   bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
These look okay too but if it helps percpu allocator can expose the
maximum size / alignment supported to take out the guessing game too.
Also, the reason why PCPU_MIN_UNIT_SIZE is what it is is because
nobody needed anything bigger.  Increasing the size doesn't really
cost much at least on 64bit archs.  Is that something we want to be
considering?
Thanks.
-- 
tejun
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-18 13:25 ` Tejun Heo
@ 2017-10-18 14:03   ` Daniel Borkmann
  2017-10-18 14:22     ` Daniel Borkmann
  2017-10-18 21:45   ` Dennis Zhou
  1 sibling, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-18 14:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: davem, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel, Dennis Zhou
On 10/18/2017 03:25 PM, Tejun Heo wrote:
> Hello, Daniel.
>
> (cc'ing Dennis)
>
> On Tue, Oct 17, 2017 at 04:55:51PM +0200, Daniel Borkmann wrote:
>> The set fixes a splat in devmap percpu allocation when we alloc
>> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
>> patch 1 is rather small, so if this could be routed via -net, for
>> example, with Tejun's Ack that would be good. Patch 3 gets rid of
>> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
>> internals and should not be used.
>>
>> Thanks!
>>
>> Daniel Borkmann (3):
>>    mm, percpu: add support for __GFP_NOWARN flag
>
> This looks fine.
Great, thanks!
>>    bpf: fix splat for illegal devmap percpu allocation
>>    bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
>
> These look okay too but if it helps percpu allocator can expose the
> maximum size / alignment supported to take out the guessing game too.
At least from BPF side there's right now no infra for exposing
max possible alloc sizes for maps to e.g. user space as indication.
There are few users left in the tree, where it would make sense for
having some helpers though:
   arch/tile/kernel/setup.c:729:   if (size < PCPU_MIN_UNIT_SIZE)
   arch/tile/kernel/setup.c:730:           size = PCPU_MIN_UNIT_SIZE;
   drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:346: unsigned int max = (PCPU_MIN_UNIT_SIZE - sizeof(*pools)) << 3;
   drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:352: /* make sure per cpu pool fits into PCPU_MIN_UNIT_SIZE */
   drivers/scsi/libfc/fc_exch.c:2488:       /* reduce range so per cpu pool fits into PCPU_MIN_UNIT_SIZE pool */
   drivers/scsi/libfc/fc_exch.c:2489:      pool_exch_range = (PCPU_MIN_UNIT_SIZE - sizeof(*pool)) /
> Also, the reason why PCPU_MIN_UNIT_SIZE is what it is is because
> nobody needed anything bigger.  Increasing the size doesn't really
> cost much at least on 64bit archs.  Is that something we want to be
> considering?
For devmap (and cpumap) itself it wouldn't make sense. For per-cpu
hashtable we could indeed consider it in the future.
Thanks,
Daniel
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-18 14:03   ` Daniel Borkmann
@ 2017-10-18 14:22     ` Daniel Borkmann
  2017-10-18 15:28       ` Alexei Starovoitov
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-18 14:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: davem, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel, Dennis Zhou
On 10/18/2017 04:03 PM, Daniel Borkmann wrote:
> On 10/18/2017 03:25 PM, Tejun Heo wrote:
>> Hello, Daniel.
>>
>> (cc'ing Dennis)
>>
>> On Tue, Oct 17, 2017 at 04:55:51PM +0200, Daniel Borkmann wrote:
>>> The set fixes a splat in devmap percpu allocation when we alloc
>>> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
>>> patch 1 is rather small, so if this could be routed via -net, for
>>> example, with Tejun's Ack that would be good. Patch 3 gets rid of
>>> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
>>> internals and should not be used.
>>>
>>> Thanks!
>>>
>>> Daniel Borkmann (3):
>>>    mm, percpu: add support for __GFP_NOWARN flag
>>
>> This looks fine.
>
> Great, thanks!
>
>>>    bpf: fix splat for illegal devmap percpu allocation
>>>    bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
>>
>> These look okay too but if it helps percpu allocator can expose the
>> maximum size / alignment supported to take out the guessing game too.
>
> At least from BPF side there's right now no infra for exposing
> max possible alloc sizes for maps to e.g. user space as indication.
> There are few users left in the tree, where it would make sense for
> having some helpers though:
>
>    arch/tile/kernel/setup.c:729:   if (size < PCPU_MIN_UNIT_SIZE)
>    arch/tile/kernel/setup.c:730:           size = PCPU_MIN_UNIT_SIZE;
>    drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:346: unsigned int max = (PCPU_MIN_UNIT_SIZE - sizeof(*pools)) << 3;
>    drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:352: /* make sure per cpu pool fits into PCPU_MIN_UNIT_SIZE */
>    drivers/scsi/libfc/fc_exch.c:2488:       /* reduce range so per cpu pool fits into PCPU_MIN_UNIT_SIZE pool */
>    drivers/scsi/libfc/fc_exch.c:2489:      pool_exch_range = (PCPU_MIN_UNIT_SIZE - sizeof(*pool)) /
>
>> Also, the reason why PCPU_MIN_UNIT_SIZE is what it is is because
>> nobody needed anything bigger.  Increasing the size doesn't really
>> cost much at least on 64bit archs.  Is that something we want to be
>> considering?
>
> For devmap (and cpumap) itself it wouldn't make sense. For per-cpu
> hashtable we could indeed consider it in the future.
Higher prio imo would be to make the allocation itself faster
though, I remember we talked about this back in May wrt hashtable,
but I kind of lost track whether there was an update on this in
the mean time. ;-)
Cheers,
Daniel
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-18 14:22     ` Daniel Borkmann
@ 2017-10-18 15:28       ` Alexei Starovoitov
  2017-10-18 15:31         ` Daniel Borkmann
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2017-10-18 15:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Tejun Heo, David S. Miller, Alexei Starovoitov, John Fastabend,
	Mark Rutland, Richard Weinberger, sp3485, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Dennis Zhou
On Wed, Oct 18, 2017 at 7:22 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Higher prio imo would be to make the allocation itself faster
> though, I remember we talked about this back in May wrt hashtable,
> but I kind of lost track whether there was an update on this in
> the mean time. ;-)
new percpu allocator by Dennis fixed those issues. It's in 4.14
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-18 15:28       ` Alexei Starovoitov
@ 2017-10-18 15:31         ` Daniel Borkmann
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2017-10-18 15:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tejun Heo, David S. Miller, Alexei Starovoitov, John Fastabend,
	Mark Rutland, Richard Weinberger, sp3485, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Dennis Zhou
On 10/18/2017 05:28 PM, Alexei Starovoitov wrote:
> On Wed, Oct 18, 2017 at 7:22 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Higher prio imo would be to make the allocation itself faster
>> though, I remember we talked about this back in May wrt hashtable,
>> but I kind of lost track whether there was an update on this in
>> the mean time. ;-)
>
> new percpu allocator by Dennis fixed those issues. It's in 4.14
Ah, perfect!
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-18 13:25 ` Tejun Heo
  2017-10-18 14:03   ` Daniel Borkmann
@ 2017-10-18 21:45   ` Dennis Zhou
  2017-10-21 16:00     ` Tejun Heo
  1 sibling, 1 reply; 19+ messages in thread
From: Dennis Zhou @ 2017-10-18 21:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daniel Borkmann, davem, ast, john.fastabend, mark.rutland,
	richard, sp3485, netdev, linux-kernel
Hi Daniel and Tejun,
On Wed, Oct 18, 2017 at 06:25:26AM -0700, Tejun Heo wrote:
> > Daniel Borkmann (3):
> >   mm, percpu: add support for __GFP_NOWARN flag
> 
> This looks fine.
> 
Looks good to me too.
> >   bpf: fix splat for illegal devmap percpu allocation
> >   bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
> 
> These look okay too but if it helps percpu allocator can expose the
> maximum size / alignment supported to take out the guessing game too.
> 
I can add this once we've addressed the below if we want to.
> Also, the reason why PCPU_MIN_UNIT_SIZE is what it is is because
> nobody needed anything bigger.  Increasing the size doesn't really
> cost much at least on 64bit archs.  Is that something we want to be
> considering?
> 
I'm not sure I see the reason we can't match the minimum allocation size
with the unit size? It seems weird to arbitrate the maximum allocation
size given a lower bound on the unit size.
Thanks,
Dennis
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-17 14:55 [PATCH net 0/3] Fix for BPF devmap percpu allocation splat Daniel Borkmann
                   ` (4 preceding siblings ...)
  2017-10-18 13:25 ` Tejun Heo
@ 2017-10-19 12:14 ` David Miller
  5 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2017-10-19 12:14 UTC (permalink / raw)
  To: daniel
  Cc: tj, ast, john.fastabend, mark.rutland, richard, sp3485, netdev,
	linux-kernel
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 17 Oct 2017 16:55:51 +0200
> The set fixes a splat in devmap percpu allocation when we alloc
> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
> patch 1 is rather small, so if this could be routed via -net, for
> example, with Tejun's Ack that would be good. Patch 3 gets rid of
> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
> internals and should not be used.
Series applied.
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat
  2017-10-18 21:45   ` Dennis Zhou
@ 2017-10-21 16:00     ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2017-10-21 16:00 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Daniel Borkmann, davem, ast, john.fastabend, mark.rutland,
	richard, sp3485, netdev, linux-kernel
Hello,
On Wed, Oct 18, 2017 at 04:45:08PM -0500, Dennis Zhou wrote:
> I'm not sure I see the reason we can't match the minimum allocation size
> with the unit size? It seems weird to arbitrate the maximum allocation
> size given a lower bound on the unit size.
idk, it can be weird for the maximum allowed allocation size varying
widely depending on how the machine boots up.
Thanks.
-- 
tejun
^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2017-10-21 16:00 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-17 14:55 [PATCH net 0/3] Fix for BPF devmap percpu allocation splat Daniel Borkmann
2017-10-17 14:55 ` [PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag Daniel Borkmann
2017-10-17 15:53   ` Alexei Starovoitov
2017-10-17 14:55 ` [PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation Daniel Borkmann
2017-10-17 15:54   ` Alexei Starovoitov
2017-10-17 16:28   ` John Fastabend
2017-10-17 14:55 ` [PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations Daniel Borkmann
2017-10-17 15:55   ` Alexei Starovoitov
2017-10-17 16:29   ` John Fastabend
2017-10-17 15:03 ` [PATCH net 0/3] Fix for BPF devmap percpu allocation splat David Laight
2017-10-17 15:11   ` Daniel Borkmann
2017-10-18 13:25 ` Tejun Heo
2017-10-18 14:03   ` Daniel Borkmann
2017-10-18 14:22     ` Daniel Borkmann
2017-10-18 15:28       ` Alexei Starovoitov
2017-10-18 15:31         ` Daniel Borkmann
2017-10-18 21:45   ` Dennis Zhou
2017-10-21 16:00     ` Tejun Heo
2017-10-19 12:14 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).