[ 28/49] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 28/49] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT
Date: Sun, 13 Jan 2013 17:43:23 +0000	[thread overview]
Message-ID: <20130113174301.644385394@decadent.org.uk> (raw)
In-Reply-To: <20130113174255.736888844@decadent.org.uk>

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michal Hocko <mhocko@suse.cz>

commit 53a59fc67f97374758e63a9c785891ec62324c81 upstream.

Since commit e303297e6c3a ("mm: extended batches for generic
mmu_gather") we are batching pages to be freed until either
tlb_next_batch cannot allocate a new batch or we are done.

This works just fine most of the time but we can get in troubles with
non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
on large machines where too aggressive batching might lead to soft
lockups during process exit path (exit_mmap) because there are no
scheduling points down the free_pages_and_swap_cache path and so the
freeing can take long enough to trigger the soft lockup.

The lockup is harmless except when the system is setup to panic on
softlockup which is not that unusual.

The simplest way to work around this issue is to limit the maximum
number of batches in a single mmu_gather.  10k of collected pages should
be safe to prevent from soft lockups (we would have 2ms for one) even if
they are all freed without an explicit scheduling point.

This patch doesn't add any new explicit scheduling points because it
relies on zap_pmd_range during page tables zapping which calls
cond_resched per PMD.

The following lockup has been reported for 3.0 kernel with a huge
process (in order of hundreds gigs but I do know any more details).

  BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
  Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
  Supported: Yes
  CPU 56
  Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
  RIP: 0010:  _raw_spin_unlock_irqrestore+0x8/0x10
  RSP: 0018:ffff883ec1037af0  EFLAGS: 00000206
  RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
  RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
  RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
  R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
  R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
  FS:  00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
  Call Trace:
    release_pages+0xc5/0x260
    free_pages_and_swap_cache+0x9d/0xc0
    tlb_flush_mmu+0x5c/0x80
    tlb_finish_mmu+0xe/0x50
    exit_mmap+0xbd/0x120
    mmput+0x49/0x120
    exit_mm+0x122/0x160
    do_exit+0x17a/0x430
    do_group_exit+0x3d/0xb0
    get_signal_to_deliver+0x247/0x480
    do_signal+0x71/0x1b0
    do_notify_resume+0x98/0xb0
    int_signal+0x12/0x17
  DWARF2 unwinder stuck at int_signal+0x12/0x17

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 include/asm-generic/tlb.h |    9 +++++++++
 mm/memory.c               |    5 +++++
 2 files changed, 14 insertions(+)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index ed6642a..25f01d0 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -78,6 +78,14 @@ struct mmu_gather_batch {
 #define MAX_GATHER_BATCH	\
 	((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
 
+/*
+ * Limit the maximum number of mmu_gather batches to reduce a risk of soft
+ * lockups for non-preemptible kernels on huge machines when a lot of memory
+ * is zapped during unmapping.
+ * 10K pages freed at once should be safe even without a preemption point.
+ */
+#define MAX_GATHER_BATCH_COUNT	(10000UL/MAX_GATHER_BATCH)
+
 /* struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
  */
@@ -96,6 +104,7 @@ struct mmu_gather {
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
+	unsigned int		batch_count;
 };
 
 #define HAVE_GENERIC_MMU_GATHER
diff --git a/mm/memory.c b/mm/memory.c
index e0a9b0c..49fb1cf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -184,10 +184,14 @@ static int tlb_next_batch(struct mmu_gather *tlb)
 		return 1;
 	}
 
+	if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
+		return 0;
+
 	batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
 	if (!batch)
 		return 0;
 
+	tlb->batch_count++;
 	batch->next = NULL;
 	batch->nr   = 0;
 	batch->max  = MAX_GATHER_BATCH;
@@ -216,6 +220,7 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
 	tlb->local.nr   = 0;
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
 	tlb->active     = &tlb->local;
+	tlb->batch_count = 0;
 
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;

next prev parent reply	other threads:[~2013-01-13 17:47 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-13 17:42 [ 00/49] 3.2.37-stable review Ben Hutchings
2013-01-13 17:42 ` [ 01/49] ext4: fix extent tree corruption caused by hole punch Ben Hutchings
2013-01-13 17:42 ` [ 02/49] i915: ensure that VGA plane is disabled Ben Hutchings
2013-01-13 17:42 ` [ 03/49] ext4: check dioread_nolock on remount Ben Hutchings
2013-01-13 17:42 ` [ 04/49] jbd2: fix assertion failure in jbd2_journal_flush() Ben Hutchings
2013-01-13 17:43 ` [ 05/49] hwmon: (lm73} Detect and report i2c bus errors Ben Hutchings
2013-01-13 17:43 ` [ 06/49] ext4: do not try to write superblock on ro remount w/o journal Ben Hutchings
2013-01-13 17:43 ` [ 07/49] PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz Ben Hutchings
2013-01-13 17:43 ` [ 08/49] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED Ben Hutchings
2013-01-13 17:43 ` [ 09/49] cifs: adjust sequence number downward after signing NT_CANCEL request Ben Hutchings
2013-01-13 17:43 ` [ 10/49] tmpfs mempolicy: fix /proc/mounts corrupting memory Ben Hutchings
2013-01-13 17:43 ` [ 11/49] p54usb: add USB ID for T-Com Sinus 154 data II Ben Hutchings
2013-01-13 17:43 ` [ 12/49] ath9k_hw: Fix RX gain initvals for AR9485 Ben Hutchings
2013-01-13 17:43 ` [ 13/49] p54usb: add USBIDs for two more p54usb devices Ben Hutchings
2013-01-13 17:43 ` [ 14/49] powerpc/vdso: Remove redundant locking in update_vsyscall_tz() Ben Hutchings
2013-01-13 17:43 ` [ 15/49] powerpc: Add missing NULL terminator to avoid boot panic on PPC40x Ben Hutchings
2013-01-13 17:43 ` [ 16/49] drm/radeon: add connector table for SAM440ep embedded board Ben Hutchings
2013-01-13 17:43 ` [ 17/49] drm/radeon: add connector table for Mac G4 Silver Ben Hutchings
2013-01-14 18:48   ` Albrecht Dreß
2013-01-13 17:43 ` [ 18/49] drm/radeon: Properly handle DDC probe for DP bridges Ben Hutchings
2013-01-13 17:43 ` [ 19/49] NFSv4: Add ACCESS operation to OPEN compound Ben Hutchings
2013-01-15 18:42   ` Herton Ronaldo Krzesinski
2013-01-16  1:08     ` Ben Hutchings
2013-01-13 17:43 ` [ 20/49] NFSv4: dont check MAY_WRITE access bit in OPEN Ben Hutchings
2013-01-13 17:43 ` [ 21/49] NFS4: nfs4_opendata_access should return errno Ben Hutchings
2013-01-13 17:43 ` [ 22/49] NFS: Fix access to suid/sgid executables Ben Hutchings
2013-01-13 17:43 ` [ 23/49] drm/nouveau: fix init with agpgart-uninorth Ben Hutchings
2013-01-13 17:43 ` [ 24/49] video: mxsfb: fix crash when unblanking the display Ben Hutchings
2013-01-13 17:43 ` [ 25/49] nfs: fix null checking in nfs_get_option_str() Ben Hutchings
2013-01-13 17:43 ` [ 26/49] SUNRPC: Ensure that we free the rpc_task after cleanups are done Ben Hutchings
2013-01-13 17:43 ` [ 27/49] ACPI / scan: Do not use dummy HID for system bus ACPI nodes Ben Hutchings
2013-01-13 17:43 ` Ben Hutchings [this message]
2013-01-13 17:43 ` [ 29/49] drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield Ben Hutchings
2013-01-13 17:43 ` [ 30/49] drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time Ben Hutchings
2013-01-13 17:43 ` [ 31/49] udf: dont increment lenExtents while writing to a hole Ben Hutchings
2013-01-13 17:43 ` [ 32/49] epoll: prevent missed events on EPOLL_CTL_MOD Ben Hutchings
2013-01-13 17:43 ` [ 33/49] rt2x00: Dont let mac80211 send a BAR when an AMPDU subframe fails Ben Hutchings
2013-01-13 17:43 ` [ 34/49] mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL Ben Hutchings
2013-01-13 17:43 ` [ 35/49] Revert: "rt2x00: Dont let mac80211 send a BAR when an AMPDU subframe fails" Ben Hutchings
2013-01-13 17:43 ` [ 36/49] ftrace: Do not function trace inlined functions Ben Hutchings
2013-01-13 17:43 ` [ 37/49] sparc: huge_ptep_set_* functions need to call set_huge_pte_at() Ben Hutchings
2013-01-13 17:43 ` [ 38/49] inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock Ben Hutchings
2013-01-13 17:43 ` [ 39/49] net: sched: integer overflow fix Ben Hutchings
2013-01-13 17:43 ` [ 40/49] tcp: implement RFC 5961 3.2 Ben Hutchings
2013-01-13 17:43 ` [ 41/49] tcp: implement RFC 5961 4.2 Ben Hutchings
2013-01-13 17:43 ` [ 42/49] tcp: refine SYN handling in tcp_validate_incoming Ben Hutchings
2013-01-13 17:43 ` [ 43/49] tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming() Ben Hutchings
2013-01-13 17:43 ` [ 44/49] tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation Ben Hutchings
2013-01-13 17:43 ` [ 45/49] [SCSI] mvsas: Fix oops when ata commond timeout Ben Hutchings
2013-01-13 17:43 ` [ 46/49] RDMA/nes: Fix for crash when registering zero length MR for CQ Ben Hutchings
2013-01-13 17:43 ` [ 47/49] RDMA/nes: Fix for terminate timer crash Ben Hutchings
2013-01-13 17:43 ` [ 48/49] ACPI : do not use Lid and Sleep button for S5 wakeup Ben Hutchings
2013-01-13 17:43 ` [ 49/49] aoe: do not call bdi_init after blk_alloc_queue Ben Hutchings
2013-01-13 22:44 ` [ 00/49] 3.2.37-stable review Ben Hutchings

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:ed6642a dfblob:25f01d0 dfblob:e0a9b0c dfblob:49fb1cf )
 OR (
bs:"[ 28/49] mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130113174301.644385394@decadent.org.uk \
    --to=ben@decadent.org.uk \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox