linux-mm.kvack.org archive mirror
From: Mel Gorman <mgorman@suse.de>
To: Jim Schutt <jaschut@sandia.gov>
Cc: Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
	Minchan Kim <minchan@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3
Date: Sun, 12 Aug 2012 21:22:57 +0100
Message-ID: <20120812202257.GA4177@suse.de>
In-Reply-To: <502542C7.8050306@sandia.gov>

On Fri, Aug 10, 2012 at 11:20:07AM -0600, Jim Schutt wrote:
> On 08/10/2012 05:02 AM, Mel Gorman wrote:
> >On Thu, Aug 09, 2012 at 04:38:24PM -0600, Jim Schutt wrote:
> 
> >>>
> >>>Ok, this is an untested hack and I expect it would drop allocation success
> >>>rates again under load (but not as much). Can you test again and see what
> >>>effect, if any, it has please?
> >>>
> >>>---8<---
> >>>mm: compaction: back out if contended
> >>>
> >>>---
> >>
> >><snip>
> >>
> >>Initial testing with this patch looks very good from
> >>my perspective; CPU utilization stays reasonable,
> >>write-out rate stays high, no signs of stress.
> >>Here's an example after ~10 minutes under my test load:
> >>
> 
> Hmmm, I wonder if I should have tested this patch longer,
> in view of the trouble I ran into testing the new patch?
> See below.
> 

The two patches are quite different in what they do. I think it's
unlikely they would share a common bug.

> > <SNIP>
> >---8<---
> >mm: compaction: Abort async compaction if locks are contended or taking too long
> 
> 
> Hmmm, while testing this patch, a couple of my servers got
> stuck after ~30 minutes or so, like this:
> 
> [ 2515.869936] INFO: task ceph-osd:30375 blocked for more than 120 seconds.
> [ 2515.876630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2515.884447] ceph-osd        D 0000000000000000     0 30375      1 0x00000000
> [ 2515.891531]  ffff8802e1a99e38 0000000000000082 ffff88056b38e298 ffff8802e1a99fd8
> [ 2515.899013]  ffff8802e1a98010 ffff8802e1a98000 ffff8802e1a98000 ffff8802e1a98000
> [ 2515.906482]  ffff8802e1a99fd8 ffff8802e1a98000 ffff880697d31700 ffff8802e1a84500
> [ 2515.913968] Call Trace:
> [ 2515.916433]  [<ffffffff8147fded>] schedule+0x5d/0x60
> [ 2515.921417]  [<ffffffff81480b25>] rwsem_down_failed_common+0x105/0x140
> [ 2515.927938]  [<ffffffff81480b73>] rwsem_down_write_failed+0x13/0x20
> [ 2515.934195]  [<ffffffff8124bcd3>] call_rwsem_down_write_failed+0x13/0x20
> [ 2515.940934]  [<ffffffff8147edc5>] ? down_write+0x45/0x50
> [ 2515.946244]  [<ffffffff81127b62>] sys_mprotect+0xd2/0x240
> [ 2515.951640]  [<ffffffff81489412>] system_call_fastpath+0x16/0x1b
> <SNIP>
> 
> I tried to capture a perf trace while this was going on, but it
> never completed.  "ps" on this system reports lots of kernel threads
> and some user-space stuff, but hangs part way through - no ceph
> executables in the output, oddly.
> 

ps is probably blocking because it is trying to read a /proc file for a
process that is holding mmap_sem and not releasing it.
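
To illustrate, here is a rough sketch of the dependency chain implied
by the two traces. The call paths are an assumption based on the
3.5-era sources rather than verbatim kernel code:

	/* faulting thread: holds mmap_sem for read, spins in compaction */
	do_page_fault()
		down_read(&mm->mmap_sem);
		/* ... stuck in isolate_migratepages_range() ... */

	/* ceph-osd: sys_mprotect() queues for write behind that reader */
	sys_mprotect()
		down_write(&mm->mmap_sem);	/* blocks */

	/* ps: /proc/<pid>/cmdline goes through access_process_vm() */
	access_process_vm()
		down_read(&mm->mmap_sem);	/* queues behind the writer */

Once a writer is queued, new readers wait behind it, so every /proc
reader that needs the target's address space wedges too. That would
explain ps hanging part way through its output.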

> I can retest your earlier patch for a longer period, to
> see if it does the same thing, or I can do some other thing
> if you tell me what it is.
> 
> Also, FWIW I sorted a little through SysRq-T output from such
> a system; these bits looked interesting:
> 
> [ 3663.685097] INFO: rcu_sched self-detected stall on CPU { 17}  (t=60000 jiffies)
> [ 3663.685099] sending NMI to all CPUs:
> [ 3663.685101] NMI backtrace for cpu 0
> [ 3663.685102] CPU 0 Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput sg joydev sd_mod hid_generic coretemp hwmon kvm crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode serio_raw pcspkr ata_piix libata button mlx4_ib ib_mad ib_core mlx4_en mlx4_core mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod ioatdma nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca e1000 [last unloaded: scsi_wait_scan]
> [ 3663.685138]
> [ 3663.685140] Pid: 100027, comm: ceph-osd Not tainted 3.5.0-00019-g472719a #221 Supermicro X8DTH-i/6/iF/6F/X8DTH
> [ 3663.685142] RIP: 0010:[<ffffffff81480ed5>]  [<ffffffff81480ed5>] _raw_spin_lock_irqsave+0x45/0x60
> [ 3663.685148] RSP: 0018:ffff880a08191898  EFLAGS: 00000012
> [ 3663.685149] RAX: ffff88063fffcb00 RBX: ffff88063fffcb00 RCX: 00000000000000c5
> [ 3663.685149] RDX: 00000000000000bf RSI: 000000000000015a RDI: ffff88063fffcb00
> [ 3663.685150] RBP: ffff880a081918a8 R08: 0000000000000000 R09: 0000000000000000
> [ 3663.685151] R10: ffff88063fffcb98 R11: ffff88063fffcc38 R12: 0000000000000246
> [ 3663.685152] R13: ffff88063fffcba8 R14: ffff88063fffcb90 R15: ffff88063fffc680
> [ 3663.685153] FS:  00007fff90ae0700(0000) GS:ffff880627c00000(0000) knlGS:0000000000000000
> [ 3663.685154] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 3663.685155] CR2: ffffffffff600400 CR3: 00000002b8fbe000 CR4: 00000000000007f0
> [ 3663.685156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3663.685157] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 3663.685158] Process ceph-osd (pid: 100027, threadinfo ffff880a08190000, task ffff880a9a29ae00)
> [ 3663.685158] Stack:
> [ 3663.685159]  000000000000130a 0000000000000000 ffff880a08191948 ffffffff8111a760
> [ 3663.685162]  ffffffff81a13420 0000000000000009 ffffea000004c240 0000000000000000
> [ 3663.685165]  ffff88063fffcba0 000000003fffcb98 ffff880a08191a18 0000000000001600
> [ 3663.685168] Call Trace:
> [ 3663.685169]  [<ffffffff8111a760>] isolate_migratepages_range+0x150/0x4e0
> [ 3663.685173]  [<ffffffff8111a5b0>] ? isolate_freepages+0x330/0x330
> [ 3663.685175]  [<ffffffff8111af5b>] compact_zone+0x46b/0x4f0
> [ 3663.685178]  [<ffffffff8111b3f8>] compact_zone_order+0xe8/0x100
> [ 3663.685180]  [<ffffffff8111b4b6>] try_to_compact_pages+0xa6/0x110
> [ 3663.685182]  [<ffffffff81100339>] __alloc_pages_direct_compact+0xd9/0x250
> [ 3663.685187]  [<ffffffff81100883>] __alloc_pages_slowpath+0x3d3/0x750
> [ 3663.685190]  [<ffffffff81100d3e>] __alloc_pages_nodemask+0x13e/0x1d0
> [ 3663.685192]  [<ffffffff8113c894>] alloc_pages_vma+0x124/0x150
> [ 3663.685195]  [<ffffffff8114e065>] do_huge_pmd_anonymous_page+0xf5/0x1e0
> [ 3663.685199]  [<ffffffff81121bcd>] handle_mm_fault+0x21d/0x320
> [ 3663.685202]  [<ffffffff8124bca4>] ? call_rwsem_down_read_failed+0x14/0x30
> [ 3663.685205]  [<ffffffff81484e49>] do_page_fault+0x439/0x4a0
> [ 3663.685208]  [<ffffffff8124bdaa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
> [ 3663.685211]  [<ffffffff8148154f>] page_fault+0x1f/0x30

I went through the patch again but only found the following, which is a
weak candidate. Still, can you retest with the patch below applied on
top, and with CONFIG_PROVE_LOCKING set, please?

---8<---
diff --git a/mm/compaction.c b/mm/compaction.c
index 1827d9a..d4a51c6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -64,7 +64,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
 {
 	if (need_resched() || spin_is_contended(lock)) {
 		if (locked) {
-			spin_unlock_irq(lock);
+			spin_unlock_irqrestore(lock, *flags);
 			locked = false;
 		}
 
@@ -276,8 +276,8 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
 	list_for_each_entry(page, &cc->migratepages, lru)
 		count[!!page_is_file_cache(page)]++;
 
-	__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
-	__mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
+	mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
+	mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
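
For what it's worth, the thinking behind the two hunks: the scanners
take the zone locks with spin_lock_irqsave(), so dropping them with
spin_unlock_irq() unconditionally re-enables interrupts even if they
were disabled before the lock was taken. Similarly,
__mod_zone_page_state() expects interrupts to already be off, while
mod_zone_page_state() disables them itself, which matters if
acct_isolated() can now be called with interrupts enabled. A minimal
sketch of the first problem, illustrative only and not taken from the
patch:

	unsigned long flags;

	spin_lock_irqsave(&zone->lru_lock, flags); /* irq state -> flags */
	/* ... isolate pages ... */

	/* spin_unlock_irq(&zone->lru_lock);	wrong: forces irqs back on */

	spin_unlock_irqrestore(&zone->lru_lock, flags); /* restores state */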


Thread overview: 18+ messages
2012-08-09 13:49 [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3 Mel Gorman
2012-08-09 13:49 ` [PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages Mel Gorman
2012-08-09 13:49 ` [PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures Mel Gorman
2012-08-10  8:49   ` Minchan Kim
2012-08-09 13:49 ` [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available Mel Gorman
2012-08-09 23:35   ` Minchan Kim
2012-08-09 13:49 ` [PATCH 4/5] mm: have order > 0 compaction start off where it left Mel Gorman
2012-08-09 13:49 ` [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages Mel Gorman
2012-08-09 14:36 ` [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3 Jim Schutt
2012-08-09 14:51   ` Mel Gorman
2012-08-09 18:16 ` Jim Schutt
2012-08-09 20:46   ` Mel Gorman
2012-08-09 22:38     ` Jim Schutt
2012-08-10 11:02       ` Mel Gorman
2012-08-10 17:20         ` Jim Schutt
2012-08-12 20:22           ` Mel Gorman [this message]
2012-08-13 20:35             ` Jim Schutt
2012-08-14  9:23               ` Mel Gorman
