From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Krzysztof Oledzki <olel@ans.pl>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Osterried <osterried@jesse.de>,
protasnb@gmail.com, bugme-daemon@bugzilla.kernel.org
Subject: Re: [Bug 9182] Critical memory leak (dirty pages)
Date: Thu, 20 Dec 2007 12:19:13 +1100
Message-ID: <200712201219.13873.nickpiggin@yahoo.com.au>
In-Reply-To: <20071220010508.GA19332@atrey.karlin.mff.cuni.cz>
On Thursday 20 December 2007 12:05, Jan Kara wrote:
> > On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
> > > I'll confirm this tomorrow but it seems that even switching to
> > > data=ordered (AFAIK default on ext3) is indeed enough to cure this
> > > problem.
> >
> > Ok, do we actually have any ext3 expert following this? I have no idea
> > about what the journalling code does, but I have painful memories of ext3
> > doing really odd buffer-head-based IO and totally bypassing all the
> > normal page dirty logic.
> >
> > Judging by the symptoms (sorry for not following this well, it came up
> > while I was mostly away travelling), something probably *does* clear the
> > dirty bit on the pages, but the dirty *accounting* is not done properly,
> > so the kernel keeps thinking it has dirty pages.
> >
> > Now, a simple "grep" shows that ext3 does not actually do any
> > ClearPageDirty() or similar on its own, although maybe I missed some
> > other subtle way this can happen. And the *normal* VFS routines that do
> > ClearPageDirty should all be doing the proper accounting.
> >
> > So I see a couple of possible cases:
> >
> > - actually clearing the PG_dirty bit somehow, without doing the
> > accounting.
> >
> > This looks very unlikely. PG_dirty is always cleared by some variant
> > of "*ClearPageDirty()", and that bit definition isn't used for anything
> > else in the whole kernel judging by "grep" (the page allocator tests the
> > bit, that's it).
> >
> > And there aren't that many hits for ClearPageDirty, and they all seem
> > to do the proper "dec_zone_page_state(page, NR_FILE_DIRTY);" etc if
> > the mapping has dirty state accounting.
> >
> > The exceptions seem to be:
> > - the page freeing path, but that path checks that "mapping" is NULL
> > (so no accounting), and would complain loudly if it wasn't
> > - the swap state stuff ("move_from_swap_cache()"), but that should
> > only ever trigger for swap cache pages (we have a BUG_ON() in that
> > path), and those don't do dirty accounting anyway.
> > - pageout(), but again only for pages that have a NULL mapping.
> >
> > - ext3 might be clearing (probably indirectly) the "page->mapping" thing
> > or similar, which in turn will make the VFS think that even a dirty
> > page isn't actually to be accounted for - so when the page *turned*
> > dirty, it was accounted as a dirty page, but then, when it was
> > cleaned, the accounting wasn't reversed because ->mapping had become
> > NULL.
> >
> > This would be some interaction with the truncation logic, and quite
> > frankly, that should be all shared with the non-journal case, so I
> > find this all very unlikely.
> >
> > However, that second case is interesting, because the pageout case
> > actually has a comment like this:
> >
> > /*
> > * Some data journaling orphaned pages can have
> > * page->mapping == NULL while being dirty with clean buffers.
> > */
> >
> > which really sounds like the case in question.
> >
> > I may know the VM, but that special case was added due to insane
> > journaling filesystems, and I don't know what insane things they do.
> > Which is why I'm wondering if there is any ext3 person who knows the
> > journaling code?
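To make the second hypothesis above concrete, here is a minimal user-space sketch. It is not kernel code: struct page_model, struct mapping_model, nr_file_dirty and the model_* functions are hypothetical stand-ins for struct page, struct address_space, the NR_FILE_DIRTY counter and the dirtying/cleaning paths. They are reduced to the single property under discussion: the decrement is conditional on page->mapping at the moment the page is cleaned, not at the moment it was dirtied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct mapping_model {
	bool account_dirty;		/* stands in for mapping_cap_account_dirty() */
};

struct page_model {
	struct mapping_model *mapping;	/* becomes NULL once the page is orphaned */
	bool dirty;			/* stands in for PG_dirty */
};

static long nr_file_dirty;		/* stands in for the NR_FILE_DIRTY counter */

/* Dirtying: the page is counted only if its mapping does dirty accounting. */
static void model_set_page_dirty(struct page_model *page)
{
	if (!page->dirty) {
		page->dirty = true;
		if (page->mapping && page->mapping->account_dirty)
			nr_file_dirty++;
	}
}

/* Cleaning: the decrement looks at page->mapping as it is *now*. */
static void model_clear_page_dirty(struct page_model *page)
{
	if (page->dirty) {
		page->dirty = false;
		if (page->mapping && page->mapping->account_dirty)
			nr_file_dirty--;
	}
}

/* Suspected sequence: dirtied, orphaned, then cleaned. Returns pages leaked. */
static long model_orphan_then_clean(void)
{
	struct mapping_model m = { .account_dirty = true };
	struct page_model p = { .mapping = &m, .dirty = false };
	long before = nr_file_dirty;

	model_set_page_dirty(&p);	/* counted: mapping is still set */
	p.mapping = NULL;		/* page orphaned while still dirty */
	model_clear_page_dirty(&p);	/* bit cleared, counter not decremented */

	return nr_file_dirty - before;
}

/* Control case: cleaned first, orphaned afterwards. Returns pages leaked. */
static long model_clean_then_orphan(void)
{
	struct mapping_model m = { .account_dirty = true };
	struct page_model p = { .mapping = &m, .dirty = false };
	long before = nr_file_dirty;

	model_set_page_dirty(&p);
	model_clear_page_dirty(&p);	/* cleaned while mapping is valid */
	p.mapping = NULL;		/* orphaning afterwards is harmless */

	return nr_file_dirty - before;
}
```

In this model model_orphan_then_clean() leaves one page counted as dirty and model_clean_then_orphan() leaves none; the only difference is whether ->mapping is cleared before or after the page is cleaned.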
>
> Yes, I'm looking into the problem... I think those orphan pages
> without mapping are created because we cannot drop truncated
> buffers/pages immediately. There can be a committing transaction that
> still needs the data in those buffers and until it commits we have to
> keep the pages (and even maybe write them to disk etc.). But eventually,
> we should write the buffers, call try_to_free_buffers() which calls
> cancel_dirty_page() and everything should be happy... in theory ;)
If mapping is NULL, then try_to_free_buffers won't call cancel_dirty_page,
I think?

I don't know whether ext3 can be changed to not require/allow these dirty
pages, but I would prefer Linus's dirty page accounting fix to go into that
path (the /* can this still happen? */ in try_to_free_buffers()), if possible.
Then you could also have a WARN_ON in __remove_from_page_cache().
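The point about the mapping == NULL branch can be sketched the same way. This is again a hypothetical, simplified user-space model rather than the real fs/buffer.c code: it keeps only the behaviour described in this thread, namely that try_to_free_buffers() reaches cancel_dirty_page() on the mapped path, while the /* can this still happen? */ branch drops the buffers and returns without touching PG_dirty. Locking, writeback and the per-bdi counters are ignored, and model_would_warn_on_removal() is only one possible reading of the suggested WARN_ON, not an existing kernel check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct mapping_model {
	bool account_dirty;		/* stands in for mapping_cap_account_dirty() */
};

struct page_model {
	struct mapping_model *mapping;	/* NULL for an orphaned page */
	bool dirty;			/* stands in for PG_dirty */
	bool has_buffers;		/* stands in for page_has_buffers() */
};

static long nr_file_dirty;		/* stands in for NR_FILE_DIRTY */

/* Like cancel_dirty_page(): accounting is undone only via page->mapping. */
static void model_cancel_dirty_page(struct page_model *page)
{
	if (page->dirty) {
		page->dirty = false;
		if (page->mapping && page->mapping->account_dirty)
			nr_file_dirty--;
	}
}

/* Simplified try_to_free_buffers(): only the mapped branch cancels dirtiness. */
static void model_try_to_free_buffers(struct page_model *page)
{
	page->has_buffers = false;	/* buffers are dropped in both branches */
	if (page->mapping == NULL)	/* the "can this still happen?" branch */
		return;			/* PG_dirty and the counter are left alone */
	model_cancel_dirty_page(page);
}

/* Hypothetical WARN_ON-style check: a page leaving the page cache
 * should no longer be dirty. Returns true if the check would fire. */
static bool model_would_warn_on_removal(const struct page_model *page)
{
	return page->dirty;
}
```

With a mapped page the model clears the dirty bit and decrements the counter; with an orphaned page it drops the buffers but leaves both the bit and the counter as they were, which is the leak this thread is chasing.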