[RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps

Netdev List
 help / color / mirror / Atom feed

* [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
@ 2018-10-30 15:23 Quentin Monnet
  2018-11-01 17:18 ` Quentin Monnet
  0 siblings, 1 reply; 4+ messages in thread
From: Quentin Monnet @ 2018-10-30 15:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: netdev, oss-drivers, Quentin Monnet

The limit for memory locked in the kernel by a process is usually set to
64 bytes by default. This can be an issue when creating large BPF maps.
A workaround is to raise this limit for the current process before
trying to create a new BPF map. Changing the hard limit requires the
CAP_SYS_RESOURCE and can usually only be done by root user (but then
only root can create BPF maps).

As far as I know there is not API to get the current amount of memory
locked for a user, therefore we cannot raise the limit only when
required. One solution, used by bcc, is to try to create the map, and on
getting a EPERM error, raising the limit to infinity before giving
another try. Another approach, used in iproute, is to raise the limit in
all cases, before trying to create the map.

Here we do the same as in iproute2: the rlimit is raised to infinity
before trying to load the map.

I send this patch as a RFC to see if people would prefer the bcc
approach instead, or the rlimit change to be in bpftool rather than in
libbpf.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/lib/bpf/bpf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 03f9bcc4ef50..456a5a7b112c 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -26,6 +26,8 @@
 #include <unistd.h>
 #include <asm/unistd.h>
 #include <linux/bpf.h>
+#include <sys/resource.h>
+#include <sys/types.h>
 #include "bpf.h"
 #include "libbpf.h"
 #include <errno.h>
@@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
 int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
 {
 	__u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
+	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
 	union bpf_attr attr;

+	setrlimit(RLIMIT_MEMLOCK, &rinf);
+
 	memset(&attr, '\0', sizeof(attr));

 	attr.map_type = create_attr->map_type;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
  2018-10-30 15:23 [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps Quentin Monnet
@ 2018-11-01 17:18 ` Quentin Monnet
  2018-11-02  9:08   ` Daniel Borkmann
  0 siblings, 1 reply; 4+ messages in thread
From: Quentin Monnet @ 2018-11-01 17:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: netdev, oss-drivers

2018-10-30 15:23 UTC+0000 ~ Quentin Monnet <quentin.monnet@netronome.com>
> The limit for memory locked in the kernel by a process is usually set to
> 64 bytes by default. This can be an issue when creating large BPF maps.
> A workaround is to raise this limit for the current process before
> trying to create a new BPF map. Changing the hard limit requires the
> CAP_SYS_RESOURCE and can usually only be done by root user (but then
> only root can create BPF maps).

Sorry, the parenthesis is not correct: non-root users can in fact create
BPF maps as well. If a non-root user calls the function to create a map,
setrlimit() will fail silently (but set errno), and the program will
simply go on with its rlimit unchanged.

> 
> As far as I know there is not API to get the current amount of memory
> locked for a user, therefore we cannot raise the limit only when
> required. One solution, used by bcc, is to try to create the map, and on
> getting a EPERM error, raising the limit to infinity before giving
> another try. Another approach, used in iproute, is to raise the limit in
> all cases, before trying to create the map.
> 
> Here we do the same as in iproute2: the rlimit is raised to infinity
> before trying to load the map.
> 
> I send this patch as a RFC to see if people would prefer the bcc
> approach instead, or the rlimit change to be in bpftool rather than in
> libbpf.
> 
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
>  tools/lib/bpf/bpf.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 03f9bcc4ef50..456a5a7b112c 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -26,6 +26,8 @@
>  #include <unistd.h>
>  #include <asm/unistd.h>
>  #include <linux/bpf.h>
> +#include <sys/resource.h>
> +#include <sys/types.h>
>  #include "bpf.h"
>  #include "libbpf.h"
>  #include <errno.h>
> @@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>  int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>  {
>  	__u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
> +	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>  	union bpf_attr attr;
>  
> +	setrlimit(RLIMIT_MEMLOCK, &rinf);
> +
>  	memset(&attr, '\0', sizeof(attr));
>  
>  	attr.map_type = create_attr->map_type;
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
  2018-11-01 17:18 ` Quentin Monnet
@ 2018-11-02  9:08   ` Daniel Borkmann
  2018-11-02 11:04     ` Quentin Monnet
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Borkmann @ 2018-11-02  9:08 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov; +Cc: netdev, oss-drivers

On 11/01/2018 06:18 PM, Quentin Monnet wrote:
> 2018-10-30 15:23 UTC+0000 ~ Quentin Monnet <quentin.monnet@netronome.com>
>> The limit for memory locked in the kernel by a process is usually set to
>> 64 bytes by default. This can be an issue when creating large BPF maps.
>> A workaround is to raise this limit for the current process before
>> trying to create a new BPF map. Changing the hard limit requires the
>> CAP_SYS_RESOURCE and can usually only be done by root user (but then
>> only root can create BPF maps).
> 
> Sorry, the parenthesis is not correct: non-root users can in fact create
> BPF maps as well. If a non-root user calls the function to create a map,
> setrlimit() will fail silently (but set errno), and the program will
> simply go on with its rlimit unchanged.
> 
>> As far as I know there is not API to get the current amount of memory
>> locked for a user, therefore we cannot raise the limit only when
>> required. One solution, used by bcc, is to try to create the map, and on
>> getting a EPERM error, raising the limit to infinity before giving
>> another try. Another approach, used in iproute, is to raise the limit in
>> all cases, before trying to create the map.
>>
>> Here we do the same as in iproute2: the rlimit is raised to infinity
>> before trying to load the map.
>>
>> I send this patch as a RFC to see if people would prefer the bcc
>> approach instead, or the rlimit change to be in bpftool rather than in
>> libbpf.

I'd avoid doing something like this in a generic library; it's basically an
ugly hack for the kind of accounting we're doing and only shows that while
this was "good enough" to start off with in the early days, we should be
doing something better today if every application raises it to inf anyway
then it's broken. :) It just shows that this missed its purpose. Similarly
to the jit_limit discussion on rlimit, perhaps we should be considering
switching to something else entirely from kernel side. Could be something
like memcg but this definitely needs some more evaluation first. (Meanwhile
I'd not change the lib but callers instead and once we have something better
in place we remove this type of "raising to inf" from the tree ...)

>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
>> ---
>>  tools/lib/bpf/bpf.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 03f9bcc4ef50..456a5a7b112c 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -26,6 +26,8 @@
>>  #include <unistd.h>
>>  #include <asm/unistd.h>
>>  #include <linux/bpf.h>
>> +#include <sys/resource.h>
>> +#include <sys/types.h>
>>  #include "bpf.h"
>>  #include "libbpf.h"
>>  #include <errno.h>
>> @@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>>  int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>>  {
>>  	__u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
>> +	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>  	union bpf_attr attr;
>>  
>> +	setrlimit(RLIMIT_MEMLOCK, &rinf);
>> +
>>  	memset(&attr, '\0', sizeof(attr));
>>  
>>  	attr.map_type = create_attr->map_type;
>>
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
  2018-11-02  9:08   ` Daniel Borkmann
@ 2018-11-02 11:04     ` Quentin Monnet
  0 siblings, 0 replies; 4+ messages in thread
From: Quentin Monnet @ 2018-11-02 11:04 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers

2018-11-02 10:08 UTC+0100 ~ Daniel Borkmann <daniel@iogearbox.net>
> On 11/01/2018 06:18 PM, Quentin Monnet wrote:
>> 2018-10-30 15:23 UTC+0000 ~ Quentin Monnet <quentin.monnet@netronome.com>
>>> The limit for memory locked in the kernel by a process is usually set to
>>> 64 bytes by default. This can be an issue when creating large BPF maps.
>>> A workaround is to raise this limit for the current process before
>>> trying to create a new BPF map. Changing the hard limit requires the
>>> CAP_SYS_RESOURCE and can usually only be done by root user (but then
>>> only root can create BPF maps).
>>
>> Sorry, the parenthesis is not correct: non-root users can in fact create
>> BPF maps as well. If a non-root user calls the function to create a map,
>> setrlimit() will fail silently (but set errno), and the program will
>> simply go on with its rlimit unchanged.
>>
>>> As far as I know there is not API to get the current amount of memory
>>> locked for a user, therefore we cannot raise the limit only when
>>> required. One solution, used by bcc, is to try to create the map, and on
>>> getting a EPERM error, raising the limit to infinity before giving
>>> another try. Another approach, used in iproute, is to raise the limit in
>>> all cases, before trying to create the map.
>>>
>>> Here we do the same as in iproute2: the rlimit is raised to infinity
>>> before trying to load the map.
>>>
>>> I send this patch as a RFC to see if people would prefer the bcc
>>> approach instead, or the rlimit change to be in bpftool rather than in
>>> libbpf.
> 
> I'd avoid doing something like this in a generic library; it's basically an
> ugly hack for the kind of accounting we're doing and only shows that while
> this was "good enough" to start off with in the early days, we should be
> doing something better today if every application raises it to inf anyway
> then it's broken. :) It just shows that this missed its purpose. Similarly
> to the jit_limit discussion on rlimit, perhaps we should be considering
> switching to something else entirely from kernel side. Could be something
> like memcg but this definitely needs some more evaluation first.

Changing the way limitations are enforced sounds like a cleaner
long-term approach indeed.

> (Meanwhile
> I'd not change the lib but callers instead and once we have something better
> in place we remove this type of "raising to inf" from the tree ...)

Understood, for the time beeing I'll repost a patch adding the
modification to bpftool once bpf-next is open.

Thanks Daniel!
Quentin

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-11-02 20:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-30 15:23 [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps Quentin Monnet
2018-11-01 17:18 ` Quentin Monnet
2018-11-02  9:08   ` Daniel Borkmann
2018-11-02 11:04     ` Quentin Monnet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox