All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [RFC PATCH net-next] bpf: fix potential percpu map overcopy to user.
Date: Wed, 19 Oct 2016 17:15:41 +0200	[thread overview]
Message-ID: <58078E1D.6080108@iogearbox.net> (raw)
In-Reply-To: <CALDO+SaebqahYHn2PU=nk6mYOMjRp+F8PKb3cFaGMv4_xjLwZg@mail.gmail.com>

On 10/19/2016 07:31 AM, William Tu wrote:
>> ...
>>> -     if (copy_to_user(uvalue, value, value_size) != 0)
>>> +     if (copy_to_user(uvalue, value, min_t(u32, usize, value_size)) != 0)
>>>                goto free_value;
>>
>> I think such approach won't actually fix anything. User space
>> may lose some of the values and won't have any idea what was lost.
>> I think we need to fix sample code to avoid using sysconf(_SC_NPROCESSORS_CONF)
>> and use /sys/devices/system/cpu/possible instead.
>> I would argue that glibc should be fixed as well since relying on
>> ls -d /sys/devices/system/cpu/cpu[0-9]*|wc -l turned out to be incorrect.
>
> Thanks for the feedback. I think glibc is correct. The
> _SC_NPROCESSORS_CONF presents the number of processors
> configured/populated and is indeed "ls
> /sys/devices/system/cpu/cpu[0-9]*|wc -l". This means the actual number
> of CPUs installed on your system. On the other hand, the

In glibc __get_nprocs_conf() seems to try a number of things, first it
tries equivalent of /sys/devices/system/cpu/cpu[0-9]*|wc -l, if that fails,
depending on the config, it either tries to count cpus in /proc/cpuinfo,
or returns the _SC_NPROCESSORS_ONLN value instead. If /proc/cpuinfo has
some issue, it returns just 1 worst case. _SC_NPROCESSORS_ONLN will parse
/sys/devices/system/cpu/online, if that fails it looks into /proc/stat
for cpuX entries, and if also that fails for some reason, /proc/cpuinfo
is consulted (and returning 1 if unlikely all breaks down).

> num_possible_cpus() includes both the installed CPUs and the empty CPU
> socket/slot, in order to support CPU hotplug.

Correct.

> As a example, one of my dual socket motherboard with 1 CPU installed has
> # /sys/devices/system/cpu/possible
> 0-239
> # /sys/devices/system/cpu/cpu[0-9]*|wc -l
> 12
> Note that these 12 cpus could be online/offline by
> # echo 1/0 > /sys/devices/system/cpu/cpuX/online
> Even if it is offline, the entry is still there.
>
> Thinking about another solution, maybe we should use
> "num_present_cpus()" which means the configured/populated CPUs and the
> value is the same as sysconf(_SC_NPROCESSORS_CONF). Consider:
> 1) cpuX is online/offline: the num_present_cpus() remains the same.
> 2) new cpu is hotplug into the empty socket: the num_present_cpus()
> gets updates, and also the sysconf(_SC_NPROCESSORS_CONF).
>
> +++ b/kernel/bpf/syscall.c
> @@ -297,7 +297,7 @@ static int map_lookup_elem(union bpf_attr *attr)
>
>          if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
>              map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
> -               value_size = round_up(map->value_size, 8) * num_possible_cpus();
> +               value_size = round_up(map->value_size, 8) * num_present_cpus();
>          else
>                  value_size = map->value_size;

But as you say in 2) that also has a chance of being racy on CPU hotplug
compared to num_possible_cpus() which is fixed at boot time.

Documentation/cputopology.txt +106 says /sys/devices/system/cpu/possible
outputs cpu_possible_mask. That is the same as in num_possible_cpus(), so
first step would be to fix the buggy example code, imho.

What perhaps could be done in a second step to reduce overhead is an option
for bpf(2) to pass in a cpu mask similarly as for sched_{get,set}affinity()
syscalls, where user space can construct a mask via CPU_SET(3). For the
syscall time, kernel would lock hot plugging via get_online_cpus() and
put_online_cpus(), it would check whether passed CPUs are online to query
and if so then it would copy the values into the user provided buffer. I'd
think this might be useful in a number of ways anyway.

Thanks,
Daniel

  parent reply	other threads:[~2016-10-19 15:15 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-16 16:41 [RFC PATCH net-next] bpf: fix potential percpu map overcopy to user William Tu
2016-10-19  0:50 ` Alexei Starovoitov
2016-10-19  5:31   ` William Tu
2016-10-19 10:05     ` Daniel Borkmann
2016-10-20 19:21       ` William Tu
2016-10-19 15:15     ` Daniel Borkmann [this message]
2016-10-20 16:04       ` Daniel Borkmann
2016-10-20 16:58         ` Alexei Starovoitov
2016-10-20 18:41           ` William Tu
2016-10-20 19:39             ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58078E1D.6080108@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=alexei.starovoitov@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=u9012063@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.