stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	alan@lxorguk.ukuu.org.uk, Michal Hocko <mhocko@suse.cz>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 46/47] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT
Date: Wed,  9 Jan 2013 12:35:41 -0800	[thread overview]
Message-ID: <20130109201734.795331570@linuxfoundation.org> (raw)
In-Reply-To: <20130109201729.435233560@linuxfoundation.org>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Hocko <mhocko@suse.cz>

commit 53a59fc67f97374758e63a9c785891ec62324c81 upstream.

Since commit e303297e6c3a ("mm: extended batches for generic
mmu_gather") we are batching pages to be freed until either
tlb_next_batch cannot allocate a new batch or we are done.

This works just fine most of the time but we can get in troubles with
non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
on large machines where too aggressive batching might lead to soft
lockups during process exit path (exit_mmap) because there are no
scheduling points down the free_pages_and_swap_cache path and so the
freeing can take long enough to trigger the soft lockup.

The lockup is harmless except when the system is setup to panic on
softlockup which is not that unusual.

The simplest way to work around this issue is to limit the maximum
number of batches in a single mmu_gather.  10k of collected pages should
be safe to prevent from soft lockups (we would have 2ms for one) even if
they are all freed without an explicit scheduling point.

This patch doesn't add any new explicit scheduling points because it
relies on zap_pmd_range during page tables zapping which calls
cond_resched per PMD.

The following lockup has been reported for 3.0 kernel with a huge
process (in order of hundreds gigs but I do know any more details).

  BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
  Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
  Supported: Yes
  CPU 56
  Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
  RIP: 0010:  _raw_spin_unlock_irqrestore+0x8/0x10
  RSP: 0018:ffff883ec1037af0  EFLAGS: 00000206
  RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
  RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
  RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
  R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
  R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
  FS:  00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
  Call Trace:
    release_pages+0xc5/0x260
    free_pages_and_swap_cache+0x9d/0xc0
    tlb_flush_mmu+0x5c/0x80
    tlb_finish_mmu+0xe/0x50
    exit_mmap+0xbd/0x120
    mmput+0x49/0x120
    exit_mm+0x122/0x160
    do_exit+0x17a/0x430
    do_group_exit+0x3d/0xb0
    get_signal_to_deliver+0x247/0x480
    do_signal+0x71/0x1b0
    do_notify_resume+0x98/0xb0
    int_signal+0x12/0x17
  DWARF2 unwinder stuck at int_signal+0x12/0x17

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/asm-generic/tlb.h |    9 +++++++++
 mm/memory.c               |    5 +++++
 2 files changed, 14 insertions(+)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -78,6 +78,14 @@ struct mmu_gather_batch {
 #define MAX_GATHER_BATCH	\
 	((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
 
+/*
+ * Limit the maximum number of mmu_gather batches to reduce a risk of soft
+ * lockups for non-preemptible kernels on huge machines when a lot of memory
+ * is zapped during unmapping.
+ * 10K pages freed at once should be safe even without a preemption point.
+ */
+#define MAX_GATHER_BATCH_COUNT	(10000UL/MAX_GATHER_BATCH)
+
 /* struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
  */
@@ -94,6 +102,7 @@ struct mmu_gather {
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
+	unsigned int		batch_count;
 };
 
 #define HAVE_GENERIC_MMU_GATHER
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -205,10 +205,14 @@ static int tlb_next_batch(struct mmu_gat
 		return 1;
 	}
 
+	if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
+		return 0;
+
 	batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
 	if (!batch)
 		return 0;
 
+	tlb->batch_count++;
 	batch->next = NULL;
 	batch->nr   = 0;
 	batch->max  = MAX_GATHER_BATCH;
@@ -235,6 +239,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->local.nr   = 0;
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
 	tlb->active     = &tlb->local;
+	tlb->batch_count = 0;
 
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;



  parent reply	other threads:[~2013-01-09 20:35 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-09 20:34 [ 00/47] 3.0.58-stable review Greg Kroah-Hartman
2013-01-09 20:34 ` [ 01/47] bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices Greg Kroah-Hartman
2013-01-09 20:34 ` [ 02/47] bonding: fix race condition in bonding_store_slaves_active Greg Kroah-Hartman
2013-01-09 20:34 ` [ 03/47] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails Greg Kroah-Hartman
2013-01-09 20:34 ` [ 04/47] sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall Greg Kroah-Hartman
2013-01-09 20:35 ` [ 05/47] ne2000: add the right platform device Greg Kroah-Hartman
2013-01-09 20:35 ` [ 06/47] irda: sir_dev: Fix copy/paste typo Greg Kroah-Hartman
2013-01-09 20:35 ` [ 07/47] usb/ipheth: Add iPhone 5 support Greg Kroah-Hartman
2013-01-09 20:35 ` [ 08/47] pnpacpi: fix incorrect TEST_ALPHA() test Greg Kroah-Hartman
2013-01-09 20:35 ` [ 09/47] exec: do not leave bprm->interp on stack Greg Kroah-Hartman
2013-01-09 20:35 ` [ 10/47] x86, 8042: Enable A20 using KBC to fix S3 resume on some MSI laptops Greg Kroah-Hartman
2013-01-09 20:35 ` [ 11/47] virtio: force vring descriptors to be allocated from lowmem Greg Kroah-Hartman
2013-01-09 20:35 ` [ 12/47] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED Greg Kroah-Hartman
2013-01-09 20:35 ` [ 13/47] tmpfs mempolicy: fix /proc/mounts corrupting memory Greg Kroah-Hartman
2013-01-09 20:35 ` [ 14/47] ALSA: usb-audio: Avoid autopm calls after disconnection Greg Kroah-Hartman
2013-01-09 20:35 ` [ 15/47] ALSA: usb-audio: Fix missing autopm for MIDI input Greg Kroah-Hartman
2013-01-09 20:35 ` [ 16/47] p54usb: add USB ID for T-Com Sinus 154 data II Greg Kroah-Hartman
2013-01-09 20:35 ` [ 17/47] p54usb: add USBIDs for two more p54usb devices Greg Kroah-Hartman
2013-01-09 20:35 ` [ 18/47] usb: gadget: phonet: free requests in pn_bind()s error path Greg Kroah-Hartman
2013-01-09 20:35 ` [ 19/47] ACPI / scan: Do not use dummy HID for system bus ACPI nodes Greg Kroah-Hartman
2013-01-09 20:35 ` [ 20/47] NFS: avoid NULL dereference in nfs_destroy_server Greg Kroah-Hartman
2013-01-09 20:35 ` [ 21/47] NFS: Fix calls to drop_nlink() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 22/47] nfsd4: fix oops on unusual readlike compound Greg Kroah-Hartman
2013-01-09 20:35 ` [ 23/47] nfs: fix null checking in nfs_get_option_str() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 24/47] Input: walkera0701 - fix crash on startup Greg Kroah-Hartman
2013-01-09 20:35 ` [ 25/47] genirq: Always force thread affinity Greg Kroah-Hartman
2013-01-09 20:35 ` [ 26/47] xhci: Add Lynx Point LP to list of Intel switchable hosts Greg Kroah-Hartman
2013-01-09 20:35 ` [ 27/47] cgroup: remove incorrect dget/dput() pair in cgroup_create_dir() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 28/47] x86, amd: Disable way access filter on Piledriver CPUs Greg Kroah-Hartman
2013-01-09 20:35 ` [ 29/47] ftrace: Do not function trace inlined functions Greg Kroah-Hartman
2013-01-09 20:35 ` [ 30/47] sparc: huge_ptep_set_* functions need to call set_huge_pte_at() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 31/47] net: sched: integer overflow fix Greg Kroah-Hartman
2013-01-09 20:35 ` [ 32/47] tcp: implement RFC 5961 3.2 Greg Kroah-Hartman
2013-01-09 20:35 ` [ 33/47] tcp: implement RFC 5961 4.2 Greg Kroah-Hartman
2013-01-09 20:35 ` [ 34/47] tcp: refine SYN handling in tcp_validate_incoming Greg Kroah-Hartman
2013-01-09 20:35 ` [ 35/47] tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming() Greg Kroah-Hartman
2013-01-09 20:35 ` [ 36/47] tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation Greg Kroah-Hartman
2013-01-09 20:35 ` [ 37/47] ARM: mm: use pteval_t to represent page protection values Greg Kroah-Hartman
2013-01-09 20:35 ` [ 38/47] ARM: missing ->mmap_sem around find_vma() in swp_emulate.c Greg Kroah-Hartman
2013-01-09 20:35 ` [ 39/47] solos-pci: fix double-free of TX skb in DMA mode Greg Kroah-Hartman
2013-01-09 20:35 ` [ 40/47] PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz Greg Kroah-Hartman
2013-01-09 20:35 ` [ 41/47] Bluetooth: ath3k: Add support for VAIO VPCEH [0489:e027] Greg Kroah-Hartman
2013-01-09 20:35 ` [ 42/47] Bluetooth: cancel power_on work when unregistering the device Greg Kroah-Hartman
2013-01-09 20:35 ` [ 43/47] CRIS: fix I/O macros Greg Kroah-Hartman
2013-01-09 20:35 ` [ 44/47] drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield Greg Kroah-Hartman
2013-01-09 20:35 ` [ 45/47] drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time Greg Kroah-Hartman
2013-01-09 20:35 ` Greg Kroah-Hartman [this message]
2013-01-09 20:35 ` [ 47/47] can: Do not call dev_put if restart timer is running upon close Greg Kroah-Hartman
2013-01-10 18:01 ` [ 00/47] 3.0.58-stable review Shuah Khan
2013-01-11 14:47 ` Satoru Takeuchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130109201734.795331570@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).