From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Krzysztof Oledzki <olel@ans.pl>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Osterried <osterried@jesse.de>,
protasnb@gmail.com, bugme-daemon@bugzilla.kernel.org
Subject: Re: [Bug 9182] Critical memory leak (dirty pages)
Date: Thu, 20 Dec 2007 12:19:13 +1100
Message-ID: <200712201219.13873.nickpiggin@yahoo.com.au>
In-Reply-To: <20071220010508.GA19332@atrey.karlin.mff.cuni.cz>

On Thursday 20 December 2007 12:05, Jan Kara wrote:
> > On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
> > > I'll confirm this tomorrow but it seems that even switching to
> > > data=ordered (AFAIK the default on ext3) is indeed enough to cure this
> > > problem.
> >
> > Ok, do we actually have any ext3 expert following this? I have no idea
> > about what the journalling code does, but I have painful memories of ext3
> > doing really odd buffer-head-based IO and totally bypassing all the
> > normal page dirty logic.
> >
> > Judging by the symptoms (sorry for not following this well, it came up
> > while I was mostly away travelling), something probably *does* clear the
> > dirty bit on the pages, but the dirty *accounting* is not done properly,
> > so the kernel keeps thinking it has dirty pages.
> >
> > Now, a simple "grep" shows that ext3 does not actually do any
> > ClearPageDirty() or similar on its own, although maybe I missed some
> > other subtle way this can happen. And the *normal* VFS routines that do
> > ClearPageDirty should all be doing the proper accounting.
> >
> > So I see a couple of possible cases:
> >
> > - actually clearing the PG_dirty bit somehow, without doing the
> > accounting.
> >
> > This looks very unlikely. PG_dirty is always cleared by some variant
> > of "*ClearPageDirty()", and that bit definition isn't used for anything
> > else in the whole kernel judging by "grep" (the page allocator tests the
> > bit, that's it).
> >
> > And there aren't that many hits for ClearPageDirty, and they all seem
> > to do the proper "dec_zone_page_state(page, NR_FILE_DIRTY);" etc if
> > the mapping has dirty state accounting.
> >
> > The exceptions seem to be:
> > - the page freeing path, but that path checks that "mapping" is NULL
> > (so no accounting), and would complain loudly if it wasn't
> > - the swap state stuff ("move_from_swap_cache()"), but that should
> > only ever trigger for swap cache pages (we have a BUG_ON() in that
> > path), and those don't do dirty accounting anyway.
> > - pageout(), but again only for pages that have a NULL mapping.
> >
> > - ext3 might be clearing (probably indirectly) the "page->mapping" thing
> > or similar, which in turn will make the VFS think that even a dirty
> > page isn't actually to be accounted for - so when the page *turned*
> > dirty, it was accounted as a dirty page, but then, when it was
> > cleaned, the accounting wasn't reversed because ->mapping had become
> > NULL.
> >
> > This would be some interaction with the truncation logic, and quite
> > frankly, that should be all shared with the non-journal case, so I
> > find this all very unlikely.
> >
> > However, that second case is interesting, because the pageout case
> > actually has a comment like this:
> >
> > /*
> > * Some data journaling orphaned pages can have
> > * page->mapping == NULL while being dirty with clean buffers.
> > */
> >
> > which really sounds like the case in question.
> >
> > I may know the VM, but that special case was added due to insane
> > journaling filesystems, and I don't know what insane things they do.
> > Which is why I'm wondering if there is any ext3 person who knows the
> > journaling code?
>
> Yes, I'm looking into the problem... I think those orphan pages
> without mapping are created because we cannot drop truncated
> buffers/pages immediately. There can be a committing transaction that
> still needs the data in those buffers and until it commits we have to
> keep the pages (and even maybe write them to disk etc.). But eventually,
> we should write the buffers, call try_to_free_buffers() which calls
> cancel_dirty_page() and everything should be happy... in theory ;)

If mapping is NULL, then try_to_free_buffers() won't call cancel_dirty_page(),
I think?

I don't know whether ext3 can be changed to not require/allow these dirty
pages, but I would rather Linus's dirty page accounting fix went into that
path (the /* can this still happen? */ branch in try_to_free_buffers()), if
possible. Then you could also have a WARN_ON in __remove_from_page_cache().