stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>
Subject: [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware
Date: Mon, 12 Jun 2017 17:26:12 +0200	[thread overview]
Message-ID: <20170612152600.887366512@linuxfoundation.org> (raw)
In-Reply-To: <20170612152556.133240249@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michael Ellerman <mpe@ellerman.id.au>

commit ba4a648f12f4cd0a8003dd229b6ca8a53348ee4b upstream.

In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/include/asm/topology.h |   14 ++++++++++++++
 arch/powerpc/kernel/setup_64.c      |    4 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topolog
 extern int sysfs_add_device_to_node(struct device *dev, int nid);
 extern void sysfs_remove_device_from_node(struct device *dev, int nid);
 
+static inline int early_cpu_to_node(int cpu)
+{
+	int nid;
+
+	nid = numa_cpu_lookup_table[cpu];
+
+	/*
+	 * Fall back to node 0 if nid is unset (it should be, except bugs).
+	 * This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
+	 */
+	return (nid < 0) ? 0 : nid;
+}
 #else
 
+static inline int early_cpu_to_node(int cpu) { return 0; }
+
 static inline void dump_numa_cpu_topology(void) {}
 
 static inline int sysfs_add_device_to_node(struct device *dev, int nid)
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -751,7 +751,7 @@ void __init setup_arch(char **cmdline_p)
 
 static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
 {
-	return __alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu)), size, align,
+	return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size, align,
 				    __pa(MAX_DMA_ADDRESS));
 }
 
@@ -762,7 +762,7 @@ static void __init pcpu_fc_free(void *pt
 
 static int pcpu_cpu_distance(unsigned int from, unsigned int to)
 {
-	if (cpu_to_node(from) == cpu_to_node(to))
+	if (early_cpu_to_node(from) == early_cpu_to_node(to))
 		return LOCAL_DISTANCE;
 	else
 		return REMOTE_DISTANCE;

  parent reply	other threads:[~2017-06-12 15:26 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-12 15:25 [PATCH 4.4 00/90] 4.4.72-stable review Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 01/90] bnx2x: Fix Multi-Cos Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 02/90] ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 03/90] cxgb4: avoid enabling napi twice to the same queue Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 04/90] tcp: disallow cwnd undo when switching congestion control Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 05/90] vxlan: fix use-after-free on deletion Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 06/90] ipv6: Fix leak in ipv6_gso_segment() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 07/90] net: ping: do not abuse udp_poll() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 08/90] net: ethoc: enable NAPI before poll may be scheduled Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 09/90] net: bridge: start hello timer only if device is up Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 10/90] sparc64: mm: fix copy_tsb to correctly copy huge page TSBs Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 11/90] sparc: Machine description indices can vary Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 12/90] sparc64: reset mm cpumask after wrap Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 13/90] sparc64: combine activate_mm and switch_mm Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 14/90] sparc64: redefine first version Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 15/90] sparc64: add per-cpu mm of secondary contexts Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 16/90] sparc64: new context wrap Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 17/90] sparc64: delete old wrap code Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 18/90] arch/sparc: support NR_CPUS = 4096 Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 19/90] serial: ifx6x60: fix use-after-free on module unload Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 20/90] ptrace: Properly initialize ptracer_cred on fork Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 21/90] KEYS: fix dereferencing NULL payload with nonzero length Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 22/90] KEYS: fix freeing uninitialized memory in key_update() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 23/90] crypto: gcm - wait for crypto op not signal safe Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 25/90] nfsd4: fix null dereference on replay Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 26/90] nfsd: Fix up the "supattr_exclcreat" attributes Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 29/90] arm: KVM: Allow unaligned accesses at HYP Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 31/90] dmaengine: usb-dmac: Fix DMAOR AE bit definition Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 32/90] dmaengine: ep93xx: Always start from BASE0 Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 33/90] xen/privcmd: Support correctly 64KB page granularity when mapping memory Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 34/90] xen-netfront: do not cast grant table reference to signed short Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 35/90] xen-netfront: cast grant table reference first to type int Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 36/90] ext4: fix SEEK_HOLE Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 37/90] ext4: keep existing extra fields when inode expands Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 38/90] ext4: fix fdatasync(2) after extent manipulation operations Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 39/90] usb: gadget: f_mass_storage: Serialize wake and sleep execution Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 40/90] usb: chipidea: udc: fix NULL pointer dereference if udc_start failed Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 41/90] usb: chipidea: debug: check before accessing ci_role Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 42/90] staging/lustre/lov: remove set_fs() call from lov_getstripe() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 43/90] iio: light: ltr501 Fix interchanged als/ps register field Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 44/90] iio: proximity: as3935: fix AS3935_INT mask Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 45/90] drivers: char: random: add get_random_long() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 46/90] random: properly align get_random_int_hash Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 47/90] stackprotector: Increase the per-task stack canarys random range from 32 bits to 64 bits on 64-bit platforms Greg Kroah-Hartman
2017-06-12 15:41   ` [kernel-hardening] " Jann Horn
2017-06-12 15:45     ` Jann Horn
2017-06-12 15:25 ` [PATCH 4.4 48/90] cpufreq: cpufreq_register_driver() should return -ENODEV if init fails Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 49/90] target: Re-add check to reject control WRITEs with overflow data Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 50/90] drm/msm: Expose our reservation object when exporting a dmabuf Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 51/90] Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 52/90] cpuset: consider dying css as offline Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 53/90] fs: add i_blocksize() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 54/90] ufs: restore proper tail allocation Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 55/90] fix ufs_isblockset() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 56/90] ufs: restore maintaining ->i_blocks Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 57/90] ufs: set correct ->s_maxsize Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 58/90] ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 59/90] ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 60/90] cxl: Fix error path on bad ioctl Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 61/90] btrfs: use correct types for page indices in btrfs_page_exists_in_range Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 62/90] btrfs: fix memory leak in update_space_info failure path Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 63/90] KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 64/90] scsi: qla2xxx: dont disable a not previously enabled PCI device Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 65/90] powerpc/eeh: Avoid use after free in eeh_handle_special_event() Greg Kroah-Hartman
2017-06-12 15:26 ` Greg Kroah-Hartman [this message]
2017-07-28 13:53   ` [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware Michal Hocko
2017-07-28 22:41     ` Greg Kroah-Hartman
2017-07-31  6:41       ` Michal Hocko
2017-08-03 19:29         ` Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 67/90] powerpc/hotplug-mem: Fix missing endian conversion of aa_index Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 68/90] perf/core: Drop kernel samples even though :u is specified Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 69/90] drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 70/90] drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 71/90] drm/vmwgfx: Make sure backup_handle is always valid Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 72/90] drm/nouveau/tmr: fully separate alarm execution/pending lists Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 73/90] ALSA: timer: Fix race between read and ioctl Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 74/90] ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 75/90] ASoC: Fix use-after-free at card unregistration Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 76/90] drivers: char: mem: Fix wraparound check to allow mappings up to the end Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 77/90] tty: Drop krefs for interrupted tty lock Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 78/90] serial: sh-sci: Fix panic when serial console and DMA are enabled Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 79/90] net: better skb->sender_cpu and skb->napi_id cohabitation Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 80/90] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 81/90] NFS: Ensure we revalidate attributes before using execute_ok() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 82/90] NFSv4: Dont perform cached access checks before weve OPENed the file Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 83/90] Make __xfs_xattr_put_listen preperly report errors Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 84/90] arm64: hw_breakpoint: fix watchpoint matching for tagged pointers Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 85/90] arm64: entry: improve data abort handling of " Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 86/90] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 87/90] tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 88/90] usercopy: Adjust tests to deal with SMAP/PAN Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 89/90] arm64: armv8_deprecated: ensure extension of addr Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 90/90] arm64: ensure extension of smp_store_release value Greg Kroah-Hartman
2017-06-12 21:53 ` [PATCH 4.4 00/90] 4.4.72-stable review Guenter Roeck
2017-06-13  0:45 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170612152600.887366512@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).