stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>
Subject: [PATCH 4.4 07/67] bpf: avoid false sharing of map refcount with max_entries
Date: Fri,  2 Feb 2018 17:57:36 +0100	[thread overview]
Message-ID: <20180202140816.064013359@linuxfoundation.org> (raw)
In-Reply-To: <20180202140815.091718203@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]

In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.

Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/bpf.h |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -31,17 +31,25 @@ struct bpf_map_ops {
 };
 
 struct bpf_map {
-	atomic_t refcnt;
+	/* 1st cacheline with read-mostly members of which some
+	 * are also accessed in fast-path (e.g. ops, max_entries).
+	 */
+	const struct bpf_map_ops *ops ____cacheline_aligned;
 	enum bpf_map_type map_type;
 	u32 key_size;
 	u32 value_size;
 	u32 max_entries;
 	u32 pages;
 	bool unpriv_array;
-	struct user_struct *user;
-	const struct bpf_map_ops *ops;
-	struct work_struct work;
+	/* 7 bytes hole */
+
+	/* 2nd cacheline with misc members to avoid false sharing
+	 * particularly with refcounting.
+	 */
+	struct user_struct *user ____cacheline_aligned;
+	atomic_t refcnt;
 	atomic_t usercnt;
+	struct work_struct work;
 };
 
 struct bpf_map_type_list {

  parent reply	other threads:[~2018-02-02 17:00 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-02 16:57 [PATCH 4.4 00/67] 4.4.115-stable review Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 02/67] bpf: fix branch pruning logic Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 03/67] x86: bpf_jit: small optimization in emit_bpf_tail_call() Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 04/67] bpf: fix bpf_tail_call() x64 JIT Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 05/67] bpf: introduce BPF_JIT_ALWAYS_ON config Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 06/67] bpf: arsh is not supported in 32 bit alu thus reject it Greg Kroah-Hartman
2018-02-02 16:57 ` Greg Kroah-Hartman [this message]
2018-02-02 16:57 ` [PATCH 4.4 08/67] bpf: fix divides by zero Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 09/67] bpf: fix 32-bit divide " Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 10/67] bpf: reject stores into ctx via st and xadd Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 11/67] x86/pti: Make unpoison of pgd for trusted boot work for real Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 13/67] ALSA: seq: Make ioctls race-free Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 14/67] crypto: aesni - handle zero length dst buffer Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 15/67] crypto: af_alg - whitelist mask and type Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 16/67] power: reset: zx-reboot: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 17/67] gpio: iop: " Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 18/67] gpio: ath79: add missing MODULE_DESCRIPTION/LICENSE Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 19/67] mtd: nand: denali_pci: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 20/67] igb: Free IRQs when device is hotplugged Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 27/67] ACPI / bus: Leave modalias empty for devices which are not present Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 28/67] cpufreq: Add Loongson machine dependencies Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 29/67] bcache: check return value of register_shrinker Greg Kroah-Hartman
2018-02-02 16:57 ` [PATCH 4.4 30/67] drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 31/67] drm/amdkfd: Fix SDMA ring buffer size calculation Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 32/67] drm/amdkfd: Fix SDMA oversubsription handling Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 33/67] openvswitch: fix the incorrect flow action alloc size Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 34/67] mac80211: fix the update of path metric for RANN frame Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 35/67] btrfs: fix deadlock when writing out space cache Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 37/67] xen-netfront: remove warning when unloading module Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 38/67] nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0) Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 39/67] nfsd: Ensure we check stateid validity in the seqid operation checks Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 40/67] grace: replace BUG_ON by WARN_ONCE in exit_net hook Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 41/67] nfsd: check for use of the closed special stateid Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 42/67] lockd: fix "list_add double add" caused by legacy signal interface Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 43/67] hwmon: (pmbus) Use 64bit math for DIRECT format values Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 44/67] net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 45/67] quota: Check for register_shrinker() failure Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 46/67] SUNRPC: Allow connect to return EHOSTUNREACH Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 47/67] kmemleak: add scheduling point to kmemleak_scan() Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 48/67] drm/omap: Fix error handling path in omap_dmm_probe() Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 49/67] xfs: ubsan fixes Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 50/67] scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 51/67] scsi: ufs: ufshcd: fix potential NULL pointer dereference in ufshcd_config_vreg Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 52/67] media: usbtv: add a new usbid Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 53/67] usb: gadget: dont dereference g until after it has been null checked Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 54/67] staging: rtl8188eu: Fix incorrect response to SIOCGIWESSID Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 55/67] usb: option: Add support for FS040U modem Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 57/67] USB: cdc-acm: Do not log urb submission errors on disconnect Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 58/67] CDC-ACM: apply quirk for card reader Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 59/67] USB: serial: io_edgeport: fix possible sleep-in-atomic Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 60/67] usbip: prevent bind loops on devices attached to vhci_hcd Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 61/67] usbip: list: dont list " Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 62/67] USB: serial: simple: add Motorola Tetra driver Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 63/67] usb: f_fs: Prevent gadget unbind if it is already unbound Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 64/67] usb: uas: unconditionally bring back host after reset Greg Kroah-Hartman
2018-03-03  0:19   ` Ben Hutchings
2018-03-03  9:14     ` Hans de Goede
2018-02-02 16:58 ` [PATCH 4.4 65/67] selinux: general protection fault in sock_has_perm Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 66/67] serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS Greg Kroah-Hartman
2018-02-02 16:58 ` [PATCH 4.4 67/67] spi: imx: do not access registers while clocks disabled Greg Kroah-Hartman
2018-02-02 18:40 ` [PATCH 4.4 00/67] 4.4.115-stable review Nathan Chancellor
2018-02-03  5:18   ` Greg Kroah-Hartman
2018-02-02 22:20 ` Shuah Khan
2018-02-02 22:33 ` Dan Rue
2018-02-03 15:28 ` Guenter Roeck
2018-02-03 15:44   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180202140816.064013359@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).