From: Vlastimil Babka <vbabka@suse.cz>
To: Richard Yao <ryao@gentoo.org>, Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-kernel@vger.kernel.org, mthode@mthode.org,
kernel@gentoo.org, Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>, Glauber Costa <glommer@openvz.org>,
Rik van Riel <riel@redhat.com>,
Vladimir Davydov <vdavydov@parallels.com>,
Dave Chinner <dchinner@redhat.com>,
"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: [PATCH] mm: vmscan: unlock_page page when forcing reclaim
Date: Mon, 21 Jul 2014 09:18:12 +0200
Message-ID: <53CCBEB4.1050401@suse.cz>
In-Reply-To: <53C96CBF.4040705@gentoo.org>

On 07/18/2014 08:51 PM, Richard Yao wrote:
> On 07/18/2014 12:38 PM, Johannes Weiner wrote:
>> I don't really understand how the scenario you describe can happen.
>>
>> Successfully reclaiming a page means that __remove_mapping() was able
>> to freeze a page count of 2 (page cache and LRU isolation), but
>> filemap_fault() increases the refcount on the page before trying to
>> lock the page. If __remove_mapping() wins, find_get_page() does not
>> work and the fault does not lock the page. If find_get_page() wins,
>> __remove_mapping() does not work and the reclaimer aborts and does a
>> regular unlock_page().
>>
>> page_check_references() is purely about reclaim strategy, it should
>> not be essential for correctness.
>>
>
> You are right that something else happened here. I had not spotted
> the cmpxchg being done in __remove_mapping(). If I spot something that
> looks like it could be what went wrong doing this, I will propose a new
> fix to the list for review. Thanks for your time.
>
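
A minimal userspace model (C11 atomics, not kernel code) may help
illustrate the ordering Johannes describes. The names model_page,
model_freeze_refs(), model_get_unless_zero() and model_race() are
hypothetical: they only mimic the idea behind the cmpxchg-based refcount
freeze in __remove_mapping() and the get-unless-zero lookup that
find_get_page() relies on, and they ignore everything else (mapping,
radix tree, memory ordering details).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a struct page: only the refcount matters. */
struct model_page {
    atomic_int count;
};

/* Reclaim side (modelled loosely on page_freeze_refs()): succeed only if
 * exactly `expected` references exist, swapping them for 0 in one cmpxchg. */
static bool model_freeze_refs(struct model_page *p, int expected)
{
    int old = expected;
    return atomic_compare_exchange_strong(&p->count, &old, 0);
}

/* Fault side (modelled loosely on get_page_unless_zero()): never revive a
 * page whose count has already been frozen to 0. */
static bool model_get_unless_zero(struct model_page *p)
{
    int old = atomic_load(&p->count);
    while (old != 0) {
        /* On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference taken by the fault side. */
static void model_put(struct model_page *p)
{
    int prev = atomic_fetch_sub(&p->count, 1);
    assert(prev > 0);
}

/* Run both sides in a fixed order and report who "won". The page starts
 * with 2 references: page cache + LRU isolation. */
static void model_race(bool reclaim_first, bool *reclaimed, bool *faulted)
{
    struct model_page page;
    atomic_init(&page.count, 2);

    if (reclaim_first) {
        *reclaimed = model_freeze_refs(&page, 2);
        *faulted = model_get_unless_zero(&page);
    } else {
        *faulted = model_get_unless_zero(&page);
        *reclaimed = model_freeze_refs(&page, 2);
    }
    /* Exactly one side succeeds: the outcomes are mutually exclusive. */
    assert(*reclaimed != *faulted);
    if (*faulted)
        model_put(&page);
}
```

In this model, whichever atomic operation lands first wins: if reclaim
freezes the count first, the lookup sees 0 and backs off (so the fault
never gets to lock the page); if the lookup takes its reference first,
the count is 3 and the freeze fails, so reclaim aborts.
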
> P.S. The system had ECC RAM, so this was not a bit flip. My current
> method for debugging this involves using cscope to construct possible
> call paths under a couple of assumptions:
>
> 1. Something set PG_locked without calling unlock_page().
> 2. The only ways of doing #1 that I see in the code are calling
> __clear_page_locked() or failing to clear the bit. I do not believe that
> a patch was accepted that did the latter, so I assume the former.

Could it be that the process holding the lock was itself stuck doing
something, rather than this being a missed unlock?

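Regarding point 2, a rough single-threaded model of why clearing the lock
bit without a wakeup matters. All names here are hypothetical; the model
only mimics unlock_page() (clear the bit and wake waiters) versus
__clear_page_locked() (clear the bit only). The real kernel keeps
PG_locked in page->flags and uses hashed wait queues, which this sketch
does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page lock bit plus its wait queue. */
struct model_page_lock {
    bool locked;   /* stands in for PG_locked */
    int sleepers;  /* tasks blocked waiting for the bit to clear */
};

/* lock_page()-like attempt: take the bit if free, otherwise sleep on it.
 * Returns true if the lock was acquired immediately. */
static bool model_lock_or_sleep(struct model_page_lock *l)
{
    if (!l->locked) {
        l->locked = true;
        return true;
    }
    l->sleepers++;
    return false;
}

/* unlock_page()-like: clear the bit AND wake sleepers; returns number woken. */
static int model_unlock(struct model_page_lock *l)
{
    assert(l->locked); /* unlocking an unlocked page is a bug */
    int woken = l->sleepers;
    l->locked = false;
    l->sleepers = 0;
    return woken;
}

/* __clear_page_locked()-like: clear the bit only, wake nobody. This is only
 * safe when the caller knows no one else can be waiting on the page. */
static int model_clear_locked(struct model_page_lock *l)
{
    assert(l->locked);
    l->locked = false;
    return 0;
}
```

In this model, a task that went to sleep before model_clear_locked() is
left waiting even though the bit is clear, which is why the non-waking
variant is only usable once nobody else can hold a reference (e.g. after
reclaim has frozen the refcount, as Johannes describes).
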
> I have root access to the system, so each time I do a lookup using
> cscope, I go through the list to logically eliminate possibilities by
> inspecting the system where the problem occurred. When I cannot
> eliminate a possibility, I recurse. This is prone to false positives
> should I miss a subtle piece of code that prevents a problem and it is
> very tedious, but I do not see a better way of debugging based on what I
> have at my disposal. If anyone has any suggestions, I would appreciate them.

You could try enabling CONFIG_DEBUG_VM, possibly also lockdep, or try a
git bisect if there's a previous kernel version that is known to work.

> P.P.S. I *really* wish that I had used kdump when this issue happened,
> but sadly, the system is not set up for kdump.

So it has happened only once so far? How about enabling kdump and
waiting to see whether it happens again.