From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, Martin Schwidefsky <schwidefsky@de.ibm.com>,
Mel Gorman <mgorman@suse.de>,
linux-s390@vger.kernel.org, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH] mm: Fix XFS oops due to dirty pages without buffers on s390
Date: Tue, 23 Oct 2012 12:21:53 +0200
Message-ID: <20121023102153.GD3064@quack.suse.cz>
In-Reply-To: <20121022123852.a4bd5f2a.akpm@linux-foundation.org>

On Mon 22-10-12 12:38:52, Andrew Morton wrote:
> On Mon, 22 Oct 2012 17:06:46 +0200
> Jan Kara <jack@suse.cz> wrote:
>
> > On s390 any write to a page (even from kernel itself) sets architecture
> > specific page dirty bit. Thus when a page is written to via buffered write, HW
> > dirty bit gets set and when we later map and unmap the page, page_remove_rmap()
> > finds the dirty bit and calls set_page_dirty().
> >
> > Dirtying of a page which shouldn't be dirty can cause all sorts of problems to
> > filesystems. The bug we observed in practice is that buffers from the page get
> > freed, so when the page gets later marked as dirty and writeback writes it, XFS
> > crashes due to an assertion BUG_ON(!PagePrivate(page)) in page_buffers() called
> > from xfs_count_page_state().
> >
> > A similar problem can also happen when a zero_user_segment() call from
> > xfs_vm_writepage() (or block_write_full_page() for that matter) sets the
> > hardware dirty bit during writeback, the buffers later get freed, and then
> > the page is unmapped.
> >
> > Fix the issue by ignoring s390 HW dirty bit for page cache pages of mappings
> > with mapping_cap_account_dirty(). This is safe because for such mappings when a
> > page gets marked as writeable in PTE it is also marked dirty in do_wp_page() or
> > do_page_fault(). When the dirty bit is cleared by clear_page_dirty_for_io(),
> > the page gets writeprotected in page_mkclean(). So pagecache page is writeable
> > if and only if it is dirty.
> >
> > Thanks to Hugh Dickins <hughd@google.com> for pointing out mapping has to have
> > mapping_cap_account_dirty() for things to work and proposing a cleaned up
> > variant of the patch.
> >
> > The patch has survived about two hours of running fsx-linux on tmpfs while
> > heavily swapping and several days of running on our build machines where the
> > original problem was triggered.
>
> That seems a fairly serious problem. To which kernel version(s) should
> we apply the fix?

Well, XFS will crash starting from the 2.6.36 kernel, where the assertion was
added. Before that, XFS just silently added buffers (as other filesystems
do) and wrote / redirtied the page (unnecessarily). So looking at the
maintained -stable branches, I think pushing the patch to -stable from 3.0
on should be enough.

> > diff --git a/mm/rmap.c b/mm/rmap.c
>
> It's a bit surprising that none of the added comments mention the s390
> pte-dirtying oddity. I don't see an obvious place to mention this, but
> I for one didn't know about this and it would be good if we could
> capture the info _somewhere_?

As Hugh says, the comment before page_test_and_clear_dirty() has been updated
somewhat. But do you mean recording somewhere the catch that the s390 HW dirty
bit also gets set whenever we write to a page from the kernel? I guess we could
add that to the comment before the page_test_and_clear_dirty() call in
page_remove_rmap() and also before the definition of
page_test_and_clear_dirty(). That way most people who add or remove these
calls will be warned. OK?
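
For readers who did not know about the s390 behaviour, the rule from the
changelog quoted above can be modelled in a few lines of userspace C. This is
only a toy sketch, not the patch and not kernel code: struct toy_page, the
toy_*() helpers and the skip_accounted switch are hypothetical names made up
for illustration, and writeback is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types and helpers are hypothetical, not kernel code. */
struct toy_page {
	bool hw_dirty;      /* s390 storage-key dirty bit; set by ANY store to the page */
	bool sw_dirty;      /* stands in for PageDirty() */
	bool account_dirty; /* stands in for mapping_cap_account_dirty(mapping) */
};

/* A store from the kernel (buffered write, zeroing) still sets the HW bit. */
static void toy_kernel_store(struct toy_page *p)
{
	p->hw_dirty = true;
}

/* Stands in for writeback finishing: software dirty state is clean again. */
static void toy_writeback_done(struct toy_page *p)
{
	p->sw_dirty = false;
}

/* Stands in for page_test_and_clear_dirty(): report and clear the HW bit. */
static bool toy_test_and_clear_hw_dirty(struct toy_page *p)
{
	bool was_dirty = p->hw_dirty;

	p->hw_dirty = false;
	return was_dirty;
}

/*
 * Stands in for the last-unmap step in page_remove_rmap().
 * With skip_accounted set, the HW bit is ignored for mappings that account
 * dirty pages, because for them "writeable in PTE" already implies sw_dirty.
 */
static void toy_remove_rmap(struct toy_page *p, bool skip_accounted)
{
	if (skip_accounted && p->account_dirty)
		return;
	if (toy_test_and_clear_hw_dirty(p))
		p->sw_dirty = true;
}

/* Buffered write, writeback, then map + unmap; returns the final sw_dirty. */
static bool toy_scenario(bool account_dirty, bool skip_accounted)
{
	struct toy_page p = { .account_dirty = account_dirty };

	toy_kernel_store(&p);   /* buffered write: HW bit set by the kernel store */
	p.sw_dirty = true;      /* the write path marks the page dirty */
	toy_writeback_done(&p); /* page written out, software state clean */
	assert(p.hw_dirty && !p.sw_dirty);

	toy_remove_rmap(&p, skip_accounted); /* later read-only map and unmap */
	return p.sw_dirty;
}
```

In this model toy_scenario(true, false) returns true, which is the spurious
redirtying that trips the XFS assertion, while toy_scenario(true, true) leaves
the page clean. toy_scenario(false, true) still returns true: mappings without
dirty accounting keep relying on the HW bit, which is why the check on
mapping_cap_account_dirty() is needed.
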
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
Thread overview: 27+ messages
2012-10-22 15:06 [PATCH] mm: Fix XFS oops due to dirty pages without buffers on s390 Jan Kara
2012-10-22 19:38 ` Andrew Morton
2012-10-23 4:40 ` Hugh Dickins
2012-10-23 10:21 ` Jan Kara [this message]
2012-10-23 21:56 ` Andrew Morton
2012-10-24 8:30 ` Martin Schwidefsky
2012-10-25 20:01 ` Jan Kara
2012-12-14 8:45 ` Martin Schwidefsky
2012-12-17 23:31 ` Hugh Dickins
2012-12-18 7:30 ` Martin Schwidefsky
-- strict thread matches above, loose matches on Subject: below --
2012-10-01 16:26 Jan Kara
2012-10-08 14:28 ` Mel Gorman
2012-10-09 4:24 ` Hugh Dickins
2012-10-09 8:18 ` Martin Schwidefsky
2012-10-09 23:21 ` Hugh Dickins
2012-10-10 21:57 ` Hugh Dickins
2012-10-19 14:38 ` Martin Schwidefsky
2012-10-09 9:32 ` Mel Gorman
2012-10-09 23:00 ` Hugh Dickins
2012-10-09 16:21 ` Jan Kara
2012-10-10 2:19 ` Hugh Dickins
2012-10-10 8:55 ` Jan Kara
2012-10-10 21:28 ` Hugh Dickins
2012-10-11 7:42 ` Martin Schwidefsky
2012-10-10 21:56 ` Dave Chinner
2012-10-11 7:44 ` Martin Schwidefsky
2012-10-17 0:43 ` Jan Kara