Re: [PATCH 1/2] mm: free large amount of 0-order pages in workqueue


From: Sasha Levin <sasha.levin@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, mhocko@suse.cz,
	Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	open list:MEMORY MANAGEMENT <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] mm: free large amount of 0-order pages in workqueue
Date: Wed, 01 Apr 2015 09:20:52 -0400
Message-ID: <551BF0B4.2060309@oracle.com>
In-Reply-To: <20150331155455.dd725010cec78112cd549c5b@linux-foundation.org>

On 03/31/2015 06:54 PM, Andrew Morton wrote:
> On Tue, 31 Mar 2015 18:39:42 -0400 Sasha Levin <sasha.levin@oracle.com> wrote:
> 
>>
>>> Stick a cond_resched() in __vunmap() ;)
>>
>> If only it was that simple :)
>>
>> Not only does it get called in atomic context,
> 
> Drat.  Who's calling vfree() from non-interrupt, atomic context for
> vast regions?

I have to admit that I don't have a clue. Michal and I discussed it at LSF/MM, and
he mentioned in his mail on the subject:

On 03/17/2015 04:58 AM, Michal Hocko wrote:
> Hmm, just looked into the git log and it seems that there are/were
> some callers of vfree with spinlock held (e.g. 9265f1d0c759 (GFS2:
> gfs2_dir_get_hash_table(): avoiding deferred vfree() is easy here...))
> and who knows how many others like that we have so cond_resched here is
> no-no.
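To make Michal's constraint concrete, here is a userspace sketch (not kernel code) of a long free loop that yields only when its caller says sleeping is allowed. `sched_yield()` stands in for cond_resched(), and the function name and the `can_sleep` parameter are hypothetical: vfree() takes no such hint, which is exactly why the callee cannot safely reschedule on its own when a caller like the GFS2 one may be holding a spinlock.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Userspace model only: free n "pages" (heap objects), yielding the CPU
 * every `batch` frees, but only when the caller says it may sleep.
 * sched_yield() stands in for cond_resched(); can_sleep is a
 * hypothetical hint that vfree() does not take.
 * Returns how many times the loop yielded.
 */
static unsigned free_pages_maybe_yield(void **pages, size_t n, size_t batch,
                                       bool can_sleep)
{
    unsigned yields = 0;

    assert(batch > 0);
    for (size_t i = 0; i < n; i++) {
        free(pages[i]);
        pages[i] = NULL;
        /* Yielding with a spinlock held is illegal, so only the caller can decide. */
        if (can_sleep && (i + 1) % batch == 0 && i + 1 < n) {
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```

With `can_sleep == false` the loop behaves like today's path: correct, but it can run for as long as the free takes.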

>> but the problem is not just the
>> thread locking up, it's also the lock dependencies, which cause other processes
>> to lock up. This is the example I mentioned in the commit log with shmem.
>>
>> We have one random process crying about being stuck for two minutes:
>>
>> [ 2885.711517] INFO: task trinity-c5:7071 blocked for more than 120 seconds.
>> [ 2885.714534]       Not tainted 4.0.0-rc6-next-20150331-sasha-00036-g29ef5d2 #2108
>> [ 2885.717519] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 2885.719472] trinity-c5      D ffff88011604fc18 26704  7071   9144 0x10000004
>> [ 2885.721271]  ffff88011604fc18 ffff880127bb3d80 0000000000000001 0000000000000000
>> [ 2885.722842]  ffff8801291e1588 ffff8801291e1560 ffff880127bb3008 ffff8801f9218000
>> [ 2885.724431]  ffff880127bb3000 ffff88011604fbf8 ffff880116048000 ffffed0022c09002
>> [ 2885.726088] Call Trace:
>> [ 2885.726612] schedule (./arch/x86/include/asm/bitops.h:311 (discriminator 1) kernel/sched/core.c:2827 (discriminator 1))
>> [ 2885.727523] schedule_preempt_disabled (kernel/sched/core.c:2859)
>> [ 2885.728639] mutex_lock_nested (kernel/locking/mutex.c:585 kernel/locking/mutex.c:623)
>> [ 2885.736019] chown_common (fs/open.c:595)
>> [ 2885.745761] SyS_fchown (fs/open.c:663 fs/open.c:650)
>> [ 2885.746714] tracesys_phase2 (arch/x86/kernel/entry_64.S:340)
>> [ 2885.747758] 2 locks held by trinity-c5/7071:
>> [ 2885.748545] #0: (sb_writers#10){.+.+.+}, at: mnt_want_write_file (fs/namespace.c:445)
>> [ 2885.751407] #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: chown_common (fs/open.c:595)
>> [ 2885.755143] Mutex: counter: -1 owner: trinity-c6
>>
>> While shmem is working tirelessly to free up its pages:
>>
>> [ 2896.340953] trinity-c6      R  running task    27040  6561   9144 0x10000006
>> [ 2896.342673]  ffff8802e72576a8 ffff8802e7257758 ffffffffabfdd628 003c5e36ef1674fa
>> [ 2896.344267]  ffff8801533e1588 ffff8801533e1560 ffff8802d3963778 ffff8802ad220000
>> [ 2896.345824]  ffff8802d3963000 0000000000000000 ffff8802e7250000 ffffed005ce4a002
>> [ 2896.347286] Call Trace:
>> [ 2896.347784] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
>> [ 2896.348977] preempt_schedule_common (./arch/x86/include/asm/preempt.h:77 (discriminator 1) kernel/sched/core.c:2867 (discriminator 1))
>> [ 2896.350279] preempt_schedule (kernel/sched/core.c:2893)
>> [ 2896.351349] ___preempt_schedule (arch/x86/lib/thunk_64.S:51)
>> [ 2896.353782] __debug_check_no_obj_freed (lib/debugobjects.c:713)
>> [ 2896.360001] debug_check_no_obj_freed (lib/debugobjects.c:727)
>> [ 2896.361574] free_pages_prepare (mm/page_alloc.c:823)
>> [ 2896.362657] free_hot_cold_page (mm/page_alloc.c:1550)
>> [ 2896.363735] free_hot_cold_page_list (mm/page_alloc.c:1596 (discriminator 3))
>> [ 2896.364846] release_pages (mm/swap.c:935)
>> [ 2896.367979] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:1013)
>> [ 2896.369149] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:446)
>> [ 2896.377070] shmem_truncate_range (mm/shmem.c:541)
>> [ 2896.378450] shmem_setattr (mm/shmem.c:577)
>> [ 2896.379556] notify_change (fs/attr.c:270)
>> [ 2896.382804] do_truncate (fs/open.c:62)
>> [ 2896.387739] do_sys_ftruncate.constprop.4 (fs/open.c:191)
>> [ 2896.389450] SyS_ftruncate (fs/open.c:199)
>> [ 2896.390879] tracesys_phase2 (arch/x86/kernel/entry_64.S:340)
> 
> OK, so shmem_undo_range() is full of cond_resched()s but it's holding
> i_mutex for too long.  Hugh, fix your junk!
> 
> Rather than mucking with the core page allocator I really do think it
> would be better to bodge the offending callers for this problem.
> 
> And/or maybe extend the softlockup timeout when crazy debug options are
> selected.  You're the only person who this will hurt ;)
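A userspace sketch of the kind of caller-side change Andrew suggests: release objects in batches and drop the caller's lock between batches, so a task blocked on it (like trinity-c5 in chown_common() above) gets a chance to run. This is a hypothetical illustration with a pthread mutex standing in for i_mutex; whether shmem can actually drop i_mutex partway through a truncate is not established here, and real code would have to revalidate its state after retaking the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/*
 * Hypothetical caller-side fix, modelled in userspace: release n objects
 * under `lock`, but drop and retake the lock every `batch` objects so
 * that threads blocked on it can run. A real caller would have to
 * revalidate whatever the lock protects after retaking it; that step
 * is omitted here. Returns the number of times the lock was dropped.
 */
static unsigned release_with_lock_break(pthread_mutex_t *lock, void **objs,
                                        size_t n, size_t batch)
{
    unsigned breaks = 0;

    assert(batch > 0);
    pthread_mutex_lock(lock);
    for (size_t i = 0; i < n; i++) {
        free(objs[i]);
        objs[i] = NULL;
        if ((i + 1) % batch == 0 && i + 1 < n) {
            pthread_mutex_unlock(lock);   /* open a window for waiters */
            sched_yield();
            pthread_mutex_lock(lock);
            breaks++;
        }
    }
    pthread_mutex_unlock(lock);
    return breaks;
}
```

pthread mutexes are not fair, so the unlock/yield pair only gives a waiter a chance at the lock, not a guarantee.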

Two minutes is too short, but I'm hitting (unrelated) things like the
lru_add_drain_all() hang even with a 20-minute timeout. At some point it
just stops being fuzzing and turns into an exercise in freeing large
chunks of memory :/
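For comparison, the idea named in the patch subject moves large frees off the caller's path entirely. The sketch below is a userspace model of that idea, not the patch itself: a pthread worker stands in for the workqueue, and DEFER_THRESHOLD and all function names are hypothetical. It also allocates its job descriptor with malloc(), which kernel code in atomic context could not do with a sleeping allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cutoff: smaller frees are cheap enough to do inline. */
#define DEFER_THRESHOLD 16

/* One deferred free: an array of objects plus the array itself. */
struct free_job {
    void **objs;
    size_t n;
    struct free_job *next;
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static struct free_job *q_head;    /* pending jobs, protected by q_lock */
static bool q_stop;                /* set by stop_worker() */
static size_t freed_by_worker;     /* objects freed by the worker */

/* Usage helper: allocate an array of n small objects. */
static void **alloc_objs(size_t n)
{
    void **objs = malloc(n * sizeof *objs);
    if (!objs)
        return NULL;
    for (size_t i = 0; i < n; i++)
        objs[i] = malloc(64);
    return objs;
}

static void free_array(void **objs, size_t n)
{
    if (!objs)
        return;
    for (size_t i = 0; i < n; i++)
        free(objs[i]);
    free(objs);
}

/* Worker thread: stands in for the work item doing the slow frees. */
static void *free_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&q_lock);
    for (;;) {
        while (!q_head && !q_stop)
            pthread_cond_wait(&q_cond, &q_lock);
        if (!q_head)                    /* stop requested and queue empty */
            break;
        struct free_job *job = q_head;
        q_head = job->next;
        pthread_mutex_unlock(&q_lock);
        free_array(job->objs, job->n);  /* slow part runs without q_lock */
        pthread_mutex_lock(&q_lock);
        freed_by_worker += job->n;
        free(job);
    }
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Free inline if small, otherwise queue for the worker. Returns true if deferred. */
static bool free_objs(void **objs, size_t n)
{
    struct free_job *job;

    if (n < DEFER_THRESHOLD || !(job = malloc(sizeof *job))) {
        free_array(objs, n);   /* small, or no memory for a job: free inline */
        return false;
    }
    job->objs = objs;
    job->n = n;
    pthread_mutex_lock(&q_lock);
    job->next = q_head;        /* LIFO order: freeing order does not matter */
    q_head = job;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
    return true;
}

/* Ask the worker to drain the queue and exit, then wait for it. */
static void stop_worker(pthread_t tid)
{
    pthread_mutex_lock(&q_lock);
    q_stop = true;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
    pthread_join(tid, NULL);
}
```

The caller returns as soon as the job is queued, so any lock it holds is released quickly; the trade-off is that the memory only becomes available again once the worker catches up.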


Thanks,
Sasha


Thread overview: 13+ messages
2015-03-31 22:11 [PATCH 1/2] mm: free large amount of 0-order pages in workqueue Sasha Levin
2015-03-31 22:11 ` [PATCH 2/2] mm: __free_pages batch up 0-order pages for freeing Sasha Levin
2015-04-01 12:48   ` Rasmus Villemoes
2015-03-31 22:31 ` [PATCH 1/2] mm: free large amount of 0-order pages in workqueue Andrew Morton
2015-03-31 22:39   ` Sasha Levin
2015-03-31 22:54     ` Andrew Morton
2015-04-01 13:20       ` Sasha Levin [this message]
2015-04-25 21:51       ` Sasha Levin
2015-04-01 12:57 ` Vlastimil Babka
