From: Richard Yao <ryao@gentoo.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-kernel@vger.kernel.org, mthode@mthode.org,
kernel@gentoo.org, Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>, Glauber Costa <glommer@openvz.org>,
Rik van Riel <riel@redhat.com>,
Vladimir Davydov <vdavydov@parallels.com>,
Dave Chinner <dchinner@redhat.com>,
open list:MEMORY MANAGEMENT <linux-mm@kvack.org>
Subject: Re: [PATCH] mm: vmscan: unlock_page page when forcing reclaim
Date: Fri, 18 Jul 2014 14:51:43 -0400
Message-ID: <53C96CBF.4040705@gentoo.org>
In-Reply-To: <20140718163843.GK29639@cmpxchg.org>
On 07/18/2014 12:38 PM, Johannes Weiner wrote:
> I don't really understand how the scenario you describe can happen.
>
> Successfully reclaiming a page means that __remove_mapping() was able
> to freeze a page count of 2 (page cache and LRU isolation), but
> filemap_fault() increases the refcount on the page before trying to
> lock the page. If __remove_mapping() wins, find_get_page() does not
> work and the fault does not lock the page. If find_get_page() wins,
> __remove_mapping() does not work and the reclaimer aborts and does a
> regular unlock_page().
>
> page_check_references() is purely about reclaim strategy, it should
> not be essential for correctness.
>
You are right that something else happened here. I had not spotted
the cmpxchg being done in __remove_mapping(). If I find something that
looks like it could explain what went wrong, I will propose a new
fix to the list for review. Thanks for your time.
P.S. The system had ECC RAM, so this was not a bit flip. My current
method for debugging this involves using cscope to construct possible
call paths under a couple of assumptions:
1. Something set PG_locked and never called unlock_page() on the page.
2. The only ways of doing #1 that I see in the code are calling
__clear_page_locked() or failing to clear the bit. I do not believe that
a patch that did the latter was accepted, so I assume the former.
I have root access to the system, so each time I do a lookup using
cscope, I go through the list to logically eliminate possibilities by
inspecting the system where the problem occurred. When I cannot
eliminate a possibility, I recurse. This is prone to false positives
should I miss a subtle piece of code that prevents a problem, and it is
very tedious, but I do not see a better way to debug this with what I
have at my disposal. If anyone has any suggestions, I would appreciate them.
P.P.S. I *really* wish that I had used kdump when this issue happened,
but sadly, the system is not set up for kdump.