All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Gu Zheng <guz.fnst@cn.fujitsu.com>,
	Xishi Qiu <qiuxishi@huawei.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	Taku Izumi <izumi.taku@jp.fujitsu.com>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	Xie XiuQi <xiexiuqi@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.10 10/34] mm/memory hotplug: postpone the reset of obsolete pgdat
Date: Fri, 17 Apr 2015 15:28:42 +0200	[thread overview]
Message-ID: <20150417132554.426859206@linuxfoundation.org> (raw)
In-Reply-To: <20150417132553.751904098@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

commit b0dc3a342af36f95a68fe229b8f0f73552c5ca08 upstream.

Qiu Xishi reported the following BUG when testing hot-add/hot-remove node under
stress condition:

  BUG: unable to handle kernel paging request at 0000000000025f60
  IP: next_online_pgdat+0x1/0x50
  PGD 0
  Oops: 0000 [#1] SMP
  ACPI: Device does not support D3cold
  Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
  CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
  Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
  Workqueue: events vmstat_update
  task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
  RIP: 0010: next_online_pgdat+0x1/0x50
  RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
  RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
  RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
  RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
  R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
  R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
  FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
    refresh_cpu_vm_stats+0xd0/0x140
    vmstat_update+0x11/0x50
    process_one_work+0x194/0x3d0
    worker_thread+0x12b/0x410
    kthread+0xc6/0xd0
    ret_from_fork+0x7c/0xb0

The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of
try_offline_node, which will reset all the content of pgdat to 0, as the
pgdat is accessed lock-free, so that the users still using the pgdat
will panic, such as the vmstat_update routine.

process A:				offline node XX:

vmstat_updat()
   refresh_cpu_vm_stats()
     for_each_populated_zone()
       find online node XX
     cond_resched()
					offline cpu and memory, then try_offline_node()
					node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
       zone = next_zone(zone)
         pg_data_t *pgdat = zone->zone_pgdat;  // here pgdat is NULL now
           next_online_pgdat(pgdat)
             next_online_node(pgdat->node_id);  // NULL pointer access

So the solution here is postponing the reset of obsolete pgdat from
try_offline_node() to hotadd_new_pgdat(), and just resetting
pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset
0 to avoid breaking pointer information in pgdat.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Xishi Qiu <qiuxishi@huawei.com>
Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memory_hotplug.c |   13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1039,6 +1039,10 @@ static pg_data_t __ref *hotadd_new_pgdat
 			return NULL;
 
 		arch_refresh_nodedata(nid, pgdat);
+	} else {
+		/* Reset the nr_zones and classzone_idx to 0 before reuse */
+		pgdat->nr_zones = 0;
+		pgdat->classzone_idx = 0;
 	}
 
 	/* we can use NODE_DATA(nid) from here */
@@ -1802,15 +1806,6 @@ void try_offline_node(int nid)
 		if (is_vmalloc_addr(zone->wait_table))
 			vfree(zone->wait_table);
 	}
-
-	/*
-	 * Since there is no way to guarentee the address of pgdat/zone is not
-	 * on stack of any kernel threads or used by other kernel objects
-	 * without reference counting or other symchronizing method, do not
-	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
-	 * the memory when the node is online again.
-	 */
-	memset(pgdat, 0, sizeof(*pgdat));
 }
 EXPORT_SYMBOL(try_offline_node);
 



  parent reply	other threads:[~2015-04-17 14:54 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-17 13:28 [PATCH 3.10 00/34] 3.10.75-stable review Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 01/34] ALSA: hda - Add one more node in the EAPD supporting candidate list Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 02/34] ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 03/34] ALSA: hda - Fix headphone pin config for Lifebook T731 Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 04/34] selinux: fix sel_write_enforce broken return value Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 05/34] tcp: Fix crash in TCP Fast Open Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 06/34] IB/core: Avoid leakage from kernel to user space Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 07/34] IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 08/34] iwlwifi: dvm: run INIT firmware again upon .start() Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 09/34] nbd: fix possible memory leak Greg Kroah-Hartman
2015-04-17 13:28 ` Greg Kroah-Hartman [this message]
2015-04-17 13:28 ` [PATCH 3.10 11/34] writeback: add missing INITIAL_JIFFIES init in global_update_bandwidth() Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 12/34] writeback: fix possible underflow in write bandwidth calculation Greg Kroah-Hartman
2015-04-17 13:28   ` Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 14/34] USB: ftdi_sio: Added custom PID for Synapse Wireless product Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 15/34] USB: ftdi_sio: Use jtag quirk for SNAP Connect E10 Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 16/34] Defer processing of REQ_PREEMPT requests for blocked devices Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 17/34] iio: inv_mpu6050: Clear timestamps fifo while resetting hardware fifo Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 18/34] iio: imu: Use iio_trigger_get for indio_dev->trig assignment Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 19/34] dmaengine: omap-dma: Fix memory leak when terminating running transfer Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 20/34] cpuidle: ACPI: do not overwrite name and description of C0 Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 21/34] usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 22/34] cifs: fix use-after-free bug in find_writable_file Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 23/34] be2iscsi: Fix kernel panic when device initialization fails Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 24/34] ocfs2: _really_ sync the right range Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 25/34] iscsi target: fix oops when adding reject pdu Greg Kroah-Hartman
2015-04-17 13:28 ` [PATCH 3.10 26/34] [media] media: s5p-mfc: fix mmap support for 64bit arch Greg Kroah-Hartman
2015-04-17 13:29 ` [PATCH 3.10 28/34] ipc: fix compat msgrcv with negative msgtyp Greg Kroah-Hartman
2015-04-17 13:29 ` [PATCH 3.10 29/34] net: rds: use correct size for max unacked packets and bytes Greg Kroah-Hartman
2015-04-17 13:29 ` [PATCH 3.10 30/34] net: llc: use correct size for sysctl timeout entries Greg Kroah-Hartman
2015-04-17 13:29 ` [PATCH 3.10 31/34] kernel.h: define u8, s8, u32, etc. limits Greg Kroah-Hartman
2015-04-20  9:43   ` Konstantin Khlebnikov
2015-04-20 14:35     ` Greg Kroah-Hartman
2015-04-20 15:01       ` Alex Elder
2015-04-20 18:22         ` Greg Kroah-Hartman
2015-04-17 13:29 ` [PATCH 3.10 32/34] IB/mlx4: Saturate RoCE port PMA counters in case of overflow Greg Kroah-Hartman
2015-04-17 13:29 ` [PATCH 3.10 33/34] console: Fix console name size mismatch Greg Kroah-Hartman
2015-04-17 13:29 ` [PATCH 3.10 34/34] pagemap: do not leak physical addresses to non-privileged userspace Greg Kroah-Hartman
2015-04-17 17:35 ` [PATCH 3.10 00/34] 3.10.75-stable review Shuah Khan
2015-04-17 20:00 ` Guenter Roeck
2015-04-18  9:02   ` Greg Kroah-Hartman
2015-04-18 13:40     ` Guenter Roeck
2015-04-18 19:10     ` Guenter Roeck
2015-04-18 20:35       ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150417132554.426859206@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qiuxishi@huawei.com \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=tangchen@cn.fujitsu.com \
    --cc=torvalds@linux-foundation.org \
    --cc=xiexiuqi@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.