From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mel@csn.ul.ie>,
Linux Kernel List <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Minchan Kim <minchan.kim@gmail.com>,
Christoph Lameter <cl@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
"Wu, Fengguang" <fengguang.wu@intel.com>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails
Date: Sun, 5 Sep 2010 21:45:54 +0800
Message-ID: <20100905134554.GA7083@localhost>
In-Reply-To: <20100905131447.GJ705@dastard>
[restoring CC list]
On Sun, Sep 05, 2010 at 09:14:47PM +0800, Dave Chinner wrote:
> On Sun, Sep 05, 2010 at 02:05:39PM +0800, Wu Fengguang wrote:
> > On Sun, Sep 05, 2010 at 10:15:55AM +0800, Dave Chinner wrote:
> > > On Sun, Sep 05, 2010 at 09:54:00AM +0800, Wu Fengguang wrote:
> > > > Dave, could you post (publicly) the kconfig and /proc/vmstat?
> > > >
> > > > I'd like to check if you have swap or memory compaction enabled..
> > >
> > > Swap is enabled - it has 512MB of swap space:
> > >
> > > $ free
> > > total used free shared buffers cached
> > > Mem: 4054304 100928 3953376 0 4096 43108
> > > -/+ buffers/cache: 53724 4000580
> > > Swap: 497976 0 497976
> >
> > It looks like swap is not used at all.
>
> It isn't 30s after boot, but I haven't checked after a livelock.
That's fine. I see in your fs_mark-wedge-1.png that there is no
read/write IO at all when the CPUs are 100% busy. So there should be no
swap IO at "livelock" time.
> > > And memory compaction is not enabled:
> > >
> > > $ grep COMPACT .config
> > > # CONFIG_COMPACTION is not set
Memory compaction is not likely the cause either. It only kicks in for
order > 3 allocations.
> > >
> > > The .config is pretty much a 'make defconfig' and then enabling XFS and
> > > whatever debug I need (e.g. locking, memleak, etc).
> >
> > Thanks! The problem seems hard to debug -- you cannot login at all
> > when it is doing lock contentions, so cannot get sysrq call traces.
>
> Well, I don't know whether it is lock contention at all. The sets of
> traces I have got previously have shown backtraces on all CPUs in
> direct reclaim with several in draining queues, but no apparent lock
> contention.
That's interesting. Do you still have the full backtraces?
Maybe your system eats too much slab cache (icache/dcache) by creating
so many zero-sized files. The system may run into problems reclaiming
so many (dirty) slab pages.
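One rough way to check the slab hypothesis is to rank caches by their
approximate footprint from /proc/slabinfo-style text. The sketch below is
only an illustration: the sample numbers are made up, the helper names
slab_usage_kib() and top_caches() are hypothetical, and the footprint is
estimated as num_objs * objsize, which ignores per-slab overhead.

```python
# Sketch: rank slab caches by approximate memory footprint.
# SAMPLE_SLABINFO is invented for illustration; on a real system the text
# would come from /proc/slabinfo (usually readable only by root).

SAMPLE_SLABINFO = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
xfs_inode         900000 900000   1024    4    1 : tunables   54   27    8 : slabdata 225000 225000      0
dentry            950000 950000    192   20    1 : tunables  120   60    8 : slabdata  47500  47500      0
kmalloc-64          2000   2360     64   59    1 : tunables  120   60    8 : slabdata     40     40      0
"""


def slab_usage_kib(text):
    """Return {cache_name: approx KiB}, estimated as num_objs * objsize."""
    usage = {}
    for line in text.splitlines():
        # Skip the version line, the column header, and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name = fields[0]
        num_objs = int(fields[2])
        objsize = int(fields[3])
        usage[name] = num_objs * objsize // 1024
    return usage


def top_caches(text, n=2):
    """Return the n largest caches as (name, KiB) pairs, largest first."""
    usage = slab_usage_kib(text)
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, kib in top_caches(SAMPLE_SLABINFO):
        print(f"{name:20s} {kib:10d} KiB")
```

If the inode and dentry caches dominate such a listing while the workload
is running, that would be consistent with the guess above, though it would
not by itself prove that slab reclaim is what wedges the system.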
> > How about enabling CONFIG_LOCK_STAT? Then you can check
> > /proc/lock_stat when the contentions are over.
>
> Enabling the locking debug/stats gathering slows the workload
> by a factor of 3 and doesn't produce the livelock....
Oh sorry.. but it would still be interesting to check the top
contended locks for this workload without any livelocks :)
Thanks,
Fengguang