From: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>,
Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-man <linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Silvan Jegen <s.jegen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Walter Harms <wharms-fPG8STNUNVg@public.gmane.org>
Subject: Re: Draft 3 of bpf(2) man page for review
Date: Thu, 23 Jul 2015 15:36:23 +0200
Message-ID: <55B0EDD7.8020407@gmail.com>
In-Reply-To: <55B0E252.2010207-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
Hi Daniel,
On 07/23/2015 02:47 PM, Daniel Borkmann wrote:
> On 07/23/2015 01:23 PM, Michael Kerrisk (man-pages) wrote:
> ...
>>> Btw, a user obviously can close() the map fds if he
>>> wants to, but ultimately they're freed when the program unloads.
>>
>> Okay. (Not sure if you meant that something should be added to the page.)
>
> I think not necessary.
Okay.
> [...]
>>>> The attributes key_size and value_size will be used by the
>>>
>>> attribute's?
>>
>> Nope. But I changed this to "The key_size and value_size attributes will be",
>> which may read clearer.
>
> Sorry, true, I was a bit confused. :)
NP.
> [...]
>>> The type __u64 is kernel internal, so if there's no strict reason to use it,
>>> we should just use what's provided by stdint.h.
>>
>> Agreed. Done. (By the way, what about all the __u32 and __u64 elements in the
>> bpf_attr union?)
>
> I wouldn't change the bpf_attr from the uapi.
Okay.
> Just the provided example code here, I presume people might copy from here when
> they build their own library and in userspace uint64_t seems to be more natural.
Yup.
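As an illustration of the point about preferring stdint.h types in example code, here is a minimal sketch of a pointer-widening helper written with uint64_t rather than __u64. The name ptr_to_u64 is an assumption made for this sketch; the thread does not name the helper.

```c
#include <stdint.h>

/* Hypothetical helper (name assumed for illustration): widen a
 * user-space pointer to the 64-bit integer form used for
 * pointer-carrying fields, using stdint.h types only. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
    /* Convert via uintptr_t so the cast is well defined on both
     * 32-bit and 64-bit builds. */
    return (uint64_t)(uintptr_t)ptr;
}
```

The bpf_attr union itself keeps its uapi types; only the surrounding user-space code changes.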
> [...]
>>>> * map_update_elem() replaces elements in a non-atomic
>>>> fashion; for atomic updates, a hash-table map should be
>>>> used instead.
>>>
>>> This point here is most important, i.e. to not have false user expectations.
>>> Maybe it's also worth mentioning that when you have a value_size of sizeof(long),
>>> you can however use __sync_fetch_and_add() atomic builtin from the LLVM backend.
>>
>> I think I'll leave out that detail for the moment.
>
> Ok, I guess we could revisit/clarify that at a later point in time. I'd add
> a TODO comment to the source or the like, as this also is related to the 2nd
> below use case (aggregation/accounting), where an array is typically used.
Okay. FIXME added.
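For reference, the builtin mentioned above behaves as sketched below when used on a long-sized counter. This is a plain user-space illustration compiled with GCC or Clang, not eBPF program code; the function name counter_add is an assumption, and how the pointer to the map value is obtained inside an eBPF program is not shown.

```c
/* Atomically add `delta` to a long-sized counter and return the
 * value the counter held before the addition.
 * counter_add is a hypothetical name used for this sketch. */
static inline long counter_add(long *counter, long delta)
{
    /* GCC/Clang atomic builtin: fetch the old value, then add. */
    return __sync_fetch_and_add(counter, delta);
}
```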
>>>> Among the uses for array maps are the following:
>>>>
>>>> * As "global" eBPF variables: an array of 1 element whose
>>>> key is (index) 0 and where the value is a collection of
>>>> 'global' variables which eBPF programs can use to keep
>>>> state between events.
>>>>
>>>> * Aggregation of tracing events into a fixed set of buck‐
>>>> ets.
>
> [...]
>>>> * license is a license string, which must be GPL compatible to
>>>> call helper functions marked gpl_only.
>>>
>>> Not strictly. So here, the same rules apply as with kernel modules. I.e. what
>>> the kernel checks for are the following license strings:
>>>
>>> static inline int license_is_gpl_compatible(const char *license)
>>> {
>>> return (strcmp(license, "GPL") == 0
>>> || strcmp(license, "GPL v2") == 0
>>> || strcmp(license, "GPL and additional rights") == 0
>>> || strcmp(license, "Dual BSD/GPL") == 0
>>> || strcmp(license, "Dual MIT/GPL") == 0
>>> || strcmp(license, "Dual MPL/GPL") == 0);
>>> }
>>>
>>> With any of them, the eBPF program is declared GPL compatible. Maybe of interest
>>> for those that want to use dual licensing of some sort.
>>
>> So, I'm a little unclear here. What text do you suggest for the page?
>
> Maybe we should mention in addition that the same licensing rules apply as
> in case with kernel modules, so also dual licenses could be used.
Done.
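A user-space loader could apply the same string comparison before attempting to load a program that calls gpl_only helpers. The sketch below reuses the license strings from the kernel check quoted above; the function name license_allows_gpl_only_helpers is hypothetical, and the kernel's own check remains authoritative.

```c
#include <string.h>

/* License strings treated as GPL compatible, copied from the
 * kernel check quoted in this thread. */
static const char *const gpl_compatible[] = {
    "GPL", "GPL v2", "GPL and additional rights",
    "Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
};

/* Hypothetical pre-check: returns 1 if `license` exactly matches
 * one of the listed strings, 0 otherwise. */
static int license_allows_gpl_only_helpers(const char *license)
{
    size_t i;

    for (i = 0; i < sizeof(gpl_compatible) / sizeof(gpl_compatible[0]); i++)
        if (strcmp(license, gpl_compatible[i]) == 0)
            return 1;
    return 0;
}
```

The comparison is exact and case sensitive, so a dual-license string such as "Dual BSD/GPL" passes while "gpl" does not.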
>>>> * log_buf is a pointer to a caller-allocated buffer in which the
>>>> in-kernel verifier can store the verification log. This log
>>>> is a multi-line string that can be checked by the program
>>>> author in order to understand how the verifier came to the
>>>> conclusion that the BPF program is unsafe. The format of the
>>>> output can change at any time as the verifier evolves.
>>>>
>>>> * log_size size of the buffer pointed to by log_buf. If the
>>>> size of the buffer is not large enough to store all verifier
>>>> messages, -1 is returned and errno is set to ENOSPC.
>>>>
>>>> * log_level verbosity level of the verifier. A value of zero
>>>> means that the verifier will not provide a log.
>>>
>>> Note that the log buffer is optional as mentioned here log_level = 0. The
>>> above example code of bpf_prog_load() suggests that it always needs to be
>>> provided.
>>>
>>> I once ran indeed into an issue where the program itself was correct, but
>>> it got rejected by the kernel, because my log buffer size was too small, so
>>> in tc, we now have it larger as bpf_log_buf[65536] ...
>>
>> So, I'm not clear. Do you mean that some piece of text here in the page
>> should be changed? If so, could you elaborate?
>
> I'd maybe only mention in addition that in log_level=0 case, we also must not
> provide a log_buf and log_size, otherwise we get EINVAL.
I changed the text to:
* log_level verbosity level of the verifier. A value of zero
means that the verifier will not provide a log; in this case,
log_buf must be a NULL pointer, and log_size must be zero.
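The rule in the revised text can be expressed as a small pre-flight check that a loader runs before filling in bpf_attr. This is only a sketch: check_log_args is a hypothetical name, and the second branch (a non-zero log_level requires a non-empty buffer) is an assumption drawn from the description of log_buf as a caller-allocated buffer, not a statement of what the kernel enforces.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical pre-flight check of the verifier log arguments.
 * Returns 0 if the combination is consistent, -EINVAL otherwise. */
static int check_log_args(unsigned int log_level, const char *log_buf,
                          size_t log_size)
{
    if (log_level == 0)
        /* No log requested: buffer and size must both be unset. */
        return (log_buf == NULL && log_size == 0) ? 0 : -EINVAL;

    /* Log requested (assumption): a non-empty buffer is needed. */
    return (log_buf != NULL && log_size > 0) ? 0 : -EINVAL;
}
```

For example, check_log_args(0, NULL, 0) is accepted, while passing a buffer together with log_level 0 is rejected, matching the EINVAL case described above.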
> [...]
>>> I had to read this twice. ;) Maybe this needs to be reworded slightly.
>>>
>>> It just means that depending on the program type that the author selects,
>>> you might end up with a different subset of helper functions, and a
>>> different program input/context. For example tracing does not have the
>>> exact same helpers as socket filters (it might have some that can be used
>>> by both). Also, the eBPF program input (context) for socket filters is a
>>> network packet, whereas for tracing you operate on a set of registers.
>>
>> Changed. Now we have:
>>
>> eBPF program types
>> The eBPF program type (prog_type) determines the subset of a ker‐
>> nel helper functions that the program may call. The program type
>
> s/a//
Fixed.
>> also determines dthe program input (context)—the format of struct
>
> s/dthe/the/
Fixed.
>> bpf_context (which is the data blob passed into the eBPF program
>> as the first argument).
>>
>> For example, a tracing program does not have the exact same sub‐
>> set of helper functions as a socket filter program (though they
>> may have some helpers in common). Similarly, the input (context)
>> for a tracing program is a set of register values, while for a
>> socket filter it is a network packet.
>>
>> The set of functions available to eBPF programs of a given type
>> may increase in the future.
>
> That's fine with me.
Okay.
> [...]
>>> I would also make a note about the JIT compiler here, i.e. that it's disabled
>>> by default, and can be enabled via:
>>>
>>> * Normal mode: echo 1 > /proc/sys/net/core/bpf_jit_enable
>>>
>>> * Debugging mode: echo 2 > /proc/sys/net/core/bpf_jit_enable
>>> [opcodes dumped in hex into the kernel log, which can then be disassembled
>>
>> Here, I assume you mean that the generated (native) opcodes are dumped, right?
>
> Yes.
>
>>> with tools/net/bpf_jit_disasm.c from the kernel tree]
>>>
>>> When enabled, after an eBPF program gets loaded, it's transparently compiled /
>>> translated inside the kernel into machine opcodes for better performance,
>>> currently on x86_64, arm64 and s390.
>>
>> According to Documentation/networking/filter.txt the JIT compiler supports
>> many more architectures:
>>
>> The Linux kernel has a built-in BPF JIT compiler for x86_64,
>> SPARC, PowerPC, ARM, ARM64, MIPS and s390 and can be enabled
>> through CONFIG_BPF_JIT.
>>
>> Or am I misunderstanding something?
>
> The others only work for cBPF and have not (yet) been converted over to eBPF.
>
> For the three mentioned above, the kernel internally migrates cBPF into eBPF
> instructions and then JITs the eBPF result eventually.
Thanks for clearing that up. I added the following sentence:
The JIT compiler for eBPF is currently available for the x86-64, arm64,
and s390 architectures.
Okay?
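A program can inspect the current JIT setting by reading the bpf_jit_enable file discussed in this thread. The sketch below is illustrative only: both function names are hypothetical, the path is passed in by the caller, and the mode labels simply restate the values 0, 1 and 2 as described here.

```c
#include <stdio.h>

/* Read the integer stored in `path` (for example
 * /proc/sys/net/core/bpf_jit_enable). Returns the value, or -1 if
 * the file cannot be opened or does not contain an integer. */
static int read_bpf_jit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Map a bpf_jit_enable value to a short label. */
static const char *bpf_jit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "disabled";
    case 1:  return "normal";
    case 2:  return "debug";
    default: return "unknown";
    }
}
```

Writing to the file to change the mode requires appropriate privileges and is left out of the sketch.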
>
>> I added the following:
>>
>> The kernel contains a just-in-time (JIT) compiler that translates
>> eBPF bytecode into native machine code for better performance.
>> The JIT compiler is disabled by default, but its operation can be
>> controlled by writing one of the following values to
>> /proc/sys/net/core/bpf_jit_enable:
>>
>> 0 Disable JIT compilation (default).
>>
>> 1 Normal compilation.
>>
>> 2 Debugging mode. The generated opcodes are dumped in hexadeci‐
>> mal into the kernel log. These opcodes can then be disassem‐
>> bled using the program tools/net/bpf_jit_disasm.c provided in
>> the kernel source tree.
>>
>>>> SEE ALSO
>>>> seccomp(2), socket(7), tc(8), tc-bpf(8)
>>>>
>>>> Both classic and extended BPF are explained in the kernel source
>>>> file Documentation/networking/filter.txt.
>>>>
>>>
>
> Rest looks good for an initial version!
Yup!
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/