From: <gregkh@linuxfoundation.org>
To: daniel@iogearbox.net, ast@kernel.org, gregkh@linuxfoundation.org
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "bpf: avoid false sharing of map refcount with max_entries" has been added to the 4.4-stable tree
Date: Thu, 01 Feb 2018 09:07:27 +0100
Message-ID: <1517472447136116@kroah.com>
In-Reply-To: <6c5f91e38c952be4831f6764a92cedb7a48be095.1517279268.git.daniel@iogearbox.net>
This is a note to let you know that I've just added the patch titled
bpf: avoid false sharing of map refcount with max_entries
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
bpf-avoid-false-sharing-of-map-refcount-with-max_entries.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Thu Feb 1 09:05:44 CET 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 30 Jan 2018 03:37:43 +0100
Subject: bpf: avoid false sharing of map refcount with max_entries
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org
Message-ID: <6c5f91e38c952be4831f6764a92cedb7a48be095.1517279268.git.daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]
In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation"), also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the map's reference counter is altered. Therefore, enforce
that these members are placed in separate cachelines.
pahole dump after change:
struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *            inner_map_meta;      /*     8     8 */
        void *                      security;            /*    16     8 */
        enum bpf_map_type           map_type;            /*    24     4 */
        u32                         key_size;            /*    28     4 */
        u32                         value_size;          /*    32     4 */
        u32                         max_entries;         /*    36     4 */
        u32                         map_flags;           /*    40     4 */
        u32                         pages;               /*    44     4 */
        u32                         id;                  /*    48     4 */
        int                         numa_node;           /*    52     4 */
        bool                        unpriv_array;        /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *        user;                /*    64     8 */
        atomic_t                    refcnt;              /*    72     4 */
        atomic_t                    usercnt;             /*    76     4 */
        struct work_struct          work;                /*    80    32 */
        char                        name[16];            /*   112    16 */

        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
};
Now all members in the first cacheline are read-only throughout
the lifetime of the map and are set up once during map creation;
the overall struct size and number of cachelines don't change from
the reordering. struct bpf_map is usually embedded as the first
member in the map structs of specific map implementations, so also
avoid letting these members sit at the end, where they could share
a cacheline with the first map values, e.g. in the array map: remote
CPUs could trigger map updates just as well for those (intentionally
dirtying members like max_entries as well) while having subsequent
values in cache.
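For illustration, here is a minimal sketch of such an embedding,
loosely modeled on struct bpf_array from kernel/bpf/arraymap.c (the
exact member set varies across kernel versions, so take it as an
approximation, not the literal 4.4 definition):

  struct bpf_array {
          struct bpf_map map;     /* must stay the first member */
          u32 elem_size;
          /* ... */
          union {
                  char value[0] __aligned(8);     /* plain array values */
                  void *ptrs[0] __aligned(8);     /* fd array entries */
                  void __percpu *pptrs[0] __aligned(8);
          };
  };

If frequently modified members such as refcnt sat at the end of
struct bpf_map, they could land in the same cacheline as the first
bytes of value[]; the reordering keeps them away from that boundary.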
Quoting from Google's Project Zero blog [1]:
Additionally, at least on the Intel machine on which this was
tested, bouncing modified cache lines between cores is slow,
apparently because the MESI protocol is used for cache coherence
[8]. Changing the reference counter of an eBPF array on one
physical CPU core causes the cache line containing the reference
counter to be bounced over to that CPU core, making reads of the
reference counter on all other CPU cores slow until the changed
reference counter has been written back to memory. Because the
length and the reference counter of an eBPF array are stored in
the same cache line, this also means that changing the reference
counter on one physical CPU core causes reads of the eBPF array's
length to be slow on other physical CPU cores (intentional false
sharing).
While this doesn't 'control' the out-of-bounds speculation by
masking the index as in commit b2157399cc98, triggering a
manipulation of the map's reference counter is really trivial, so
let's not allow max_entries to be easily affected through it.
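To see how trivial such a refcount manipulation is, consider this
hedged userspace sketch: any process that can open a pinned map
(the pin path below is a hypothetical example) bumps and drops the
map's refcnt on whatever CPU it happens to run on:

  #include <unistd.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* BPF_OBJ_GET takes a reference on the pinned map (an atomic
   * increment of bpf_map.refcnt); close() drops it again. Looping
   * this from a remote CPU keeps dirtying the refcount's cacheline. */
  static int bounce_map_refcnt(const char *pin_path)
  {
          union bpf_attr attr;
          int fd;

          memset(&attr, 0, sizeof(attr));
          attr.pathname = (__u64)(unsigned long)pin_path;

          fd = syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
          if (fd < 0)
                  return -1;
          close(fd);
          return 0;
  }

  /* e.g. bounce_map_refcnt("/sys/fs/bpf/some_map") in a loop */

Before the reordering, each such bounce also slowed down reads of
max_entries on other CPUs, since both shared a cacheline.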
Splitting into separate cachelines also generally makes sense from
a performance perspective in that the fast path won't take a cache
miss when the map gets pinned, reused in other progs, etc. from the
control path, and thus also avoids unintentional false sharing.
[1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/bpf.h | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -31,17 +31,25 @@ struct bpf_map_ops {
};
struct bpf_map {
- atomic_t refcnt;
+ /* 1st cacheline with read-mostly members of which some
+ * are also accessed in fast-path (e.g. ops, max_entries).
+ */
+ const struct bpf_map_ops *ops ____cacheline_aligned;
enum bpf_map_type map_type;
u32 key_size;
u32 value_size;
u32 max_entries;
u32 pages;
bool unpriv_array;
- struct user_struct *user;
- const struct bpf_map_ops *ops;
- struct work_struct work;
+ /* 7 bytes hole */
+
+ /* 2nd cacheline with misc members to avoid false sharing
+ * particularly with refcounting.
+ */
+ struct user_struct *user ____cacheline_aligned;
+ atomic_t refcnt;
atomic_t usercnt;
+ struct work_struct work;
};
struct bpf_map_type_list {
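For reference, ____cacheline_aligned is defined in <linux/cache.h>;
on SMP builds it expands to a compiler alignment attribute roughly
like:

  /* simplified from include/linux/cache.h */
  #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))

so each annotated member starts on a fresh cacheline boundary, which
is what pushes user, refcnt, usercnt and work out of the read-mostly
first cacheline.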
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.4/bpf-fix-branch-pruning-logic.patch
queue-4.4/bpf-avoid-false-sharing-of-map-refcount-with-max_entries.patch
queue-4.4/x86-bpf_jit-small-optimization-in-emit_bpf_tail_call.patch
queue-4.4/bpf-reject-stores-into-ctx-via-st-and-xadd.patch
queue-4.4/bpf-fix-32-bit-divide-by-zero.patch
queue-4.4/bpf-fix-bpf_tail_call-x64-jit.patch
queue-4.4/bpf-arsh-is-not-supported-in-32-bit-alu-thus-reject-it.patch
queue-4.4/bpf-fix-divides-by-zero.patch
queue-4.4/bpf-introduce-bpf_jit_always_on-config.patch