From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>
Subject: [PATCH 3.18 31/45] powerpc/numa: Fix percpu allocations to be NUMA aware
Date: Mon, 12 Jun 2017 17:26:41 +0200 [thread overview]
Message-ID: <20170612152555.070623666@linuxfoundation.org> (raw)
In-Reply-To: <20170612152553.118037974@linuxfoundation.org>
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman <mpe@ellerman.id.au>
commit ba4a648f12f4cd0a8003dd229b6ca8a53348ee4b upstream.
In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.
Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.
This is visible in the dmesg output, as all pcpu allocs being in group 0:
pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:
pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
We can also check the data_offset in the paca of various CPUs, with the fix we
see:
CPU 0: data_offset = 0x0ffe8b0000
CPU 24: data_offset = 0x1ffe5b0000
And we can see from dmesg that CPU 24 has an allocation on node 1:
node 0: [mem 0x0000000000000000-0x0000000fffffffff]
node 1: [mem 0x0000001000000000-0x0000001fffffffff]
Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/powerpc/include/asm/topology.h | 14 ++++++++++++++
arch/powerpc/kernel/setup_64.c | 4 ++--
2 files changed, 16 insertions(+), 2 deletions(-)
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topolog
extern int sysfs_add_device_to_node(struct device *dev, int nid);
extern void sysfs_remove_device_from_node(struct device *dev, int nid);
+static inline int early_cpu_to_node(int cpu)
+{
+ int nid;
+
+ nid = numa_cpu_lookup_table[cpu];
+
+ /*
+ * Fall back to node 0 if nid is unset (it should be, except bugs).
+ * This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
+ */
+ return (nid < 0) ? 0 : nid;
+}
#else
+static inline int early_cpu_to_node(int cpu) { return 0; }
+
static inline void dump_numa_cpu_topology(void) {}
static inline int sysfs_add_device_to_node(struct device *dev, int nid)
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -754,7 +754,7 @@ void ppc64_boot_msg(unsigned int src, co
static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
{
- return __alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu)), size, align,
+ return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size, align,
__pa(MAX_DMA_ADDRESS));
}
@@ -765,7 +765,7 @@ static void __init pcpu_fc_free(void *pt
static int pcpu_cpu_distance(unsigned int from, unsigned int to)
{
- if (cpu_to_node(from) == cpu_to_node(to))
+ if (early_cpu_to_node(from) == early_cpu_to_node(to))
return LOCAL_DISTANCE;
else
return REMOTE_DISTANCE;
next prev parent reply other threads:[~2017-06-12 15:48 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-12 15:26 [PATCH 3.18 00/45] 3.18.57-stable review Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 01/45] bnx2x: Fix Multi-Cos Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 02/45] ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 03/45] cxgb4: avoid enabling napi twice to the same queue Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 04/45] tcp: disallow cwnd undo when switching congestion control Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 05/45] ipv6: Fix leak in ipv6_gso_segment() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 06/45] net: ping: do not abuse udp_poll() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 07/45] net: ethoc: enable NAPI before poll may be scheduled Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 08/45] serial: ifx6x60: fix use-after-free on module unload Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 09/45] KEYS: fix dereferencing NULL payload with nonzero length Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 10/45] KEYS: fix freeing uninitialized memory in key_update() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 11/45] crypto: gcm - wait for crypto op not signal safe Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 12/45] nfsd4: fix null dereference on replay Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 15/45] arm: KVM: Allow unaligned accesses at HYP Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 16/45] dmaengine: ep93xx: Always start from BASE0 Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 17/45] ext4: fix SEEK_HOLE Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 18/45] ext4: keep existing extra fields when inode expands Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 19/45] usb: gadget: f_mass_storage: Serialize wake and sleep execution Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 20/45] usb: chipidea: udc: fix NULL pointer dereference if udc_start failed Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 21/45] usb: chipidea: debug: check before accessing ci_role Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 22/45] staging/lustre/lov: remove set_fs() call from lov_getstripe() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 23/45] iio: proximity: as3935: fix AS3935_INT mask Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 24/45] drivers: char: random: add get_random_long() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 25/45] random: properly align get_random_int_hash Greg Kroah-Hartman
2017-06-12 15:26 ` [kernel-hardening] [PATCH 3.18 26/45] stackprotector: Increase the per-task stack canarys random range from 32 bits to 64 bits on 64-bit platforms Greg Kroah-Hartman
2017-06-12 15:26 ` Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 27/45] btrfs: use correct types for page indices in btrfs_page_exists_in_range Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 28/45] btrfs: fix memory leak in update_space_info failure path Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 29/45] scsi: qla2xxx: dont disable a not previously enabled PCI device Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 30/45] powerpc/eeh: Avoid use after free in eeh_handle_special_event() Greg Kroah-Hartman
2017-06-12 15:26 ` Greg Kroah-Hartman [this message]
2017-06-12 15:26 ` [PATCH 3.18 32/45] perf/core: Drop kernel samples even though :u is specified Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 33/45] drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 34/45] drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 35/45] ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 36/45] ASoC: Fix use-after-free at card unregistration Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 37/45] drivers: char: mem: Fix wraparound check to allow mappings up to the end Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 38/45] serial: sh-sci: Fix panic when serial console and DMA are enabled Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 39/45] arm64: hw_breakpoint: fix watchpoint matching for tagged pointers Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 40/45] arm64: entry: improve data abort handling of " Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 41/45] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 42/45] tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 43/45] usercopy: Adjust tests to deal with SMAP/PAN Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 44/45] arm64: ensure extension of smp_store_release value Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 3.18 45/45] mlx5: stop including <asm-generic/kmap_types.h> Greg Kroah-Hartman
2017-06-12 21:52 ` [PATCH 3.18 00/45] 3.18.57-stable review Guenter Roeck
2017-06-13 0:49 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170612152555.070623666@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mpe@ellerman.id.au \
--cc=npiggin@gmail.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.