From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org
Subject: Re: __set_page_dirty_nobuffers superfluous check
Date: Sat, 14 Aug 2004 10:37:17 -0300 [thread overview]
Message-ID: <20040814133717.GA32755@logos.cnet> (raw)
In-Reply-To: <Pine.LNX.4.44.0408140745280.20187-100000@localhost.localdomain>
Hi Hugh!
Thanks for writing this down :)
On Sat, Aug 14, 2004 at 07:46:45AM +0100, Hugh Dickins wrote:
> On Fri, 13 Aug 2004, Marcelo Tosatti wrote:
>
> > While wandering through mm/page-writeback.c I noticed
> > __set_page_dirty_nobuffers does:
> >
> > int __set_page_dirty_nobuffers(struct page *page)
> > {
> > int ret = 0;
> >
> > if (!TestSetPageDirty(page)) {
> > struct address_space *mapping = page_mapping(page);
> >
> > if (mapping) {
> > spin_lock_irq(&mapping->tree_lock);
> > mapping = page_mapping(page);
> > if (page_mapping(page)) { /* Race with truncate? */
> > BUG_ON(page_mapping(page) != mapping); <------------------
> > if (!mapping->backing_dev_info->memory_backed)
> > inc_page_state(nr_dirty);
> > radix_tree_tag_set(&mapping->page_tree,
> > page_index(page), PAGECACHE_TAG_DIRTY);
> > }
> >
> > How could the mapping ever change if we have tree_lock?
> >
> > Its basically a check which assumes there might be
> > buggy page->mapping writers who do so without the lock, yes?
>
> Nicely observed - and four evaluations of page_mapping(page)
> within seven lines is two too many, even if optimized away.
>
> But actually your interpretation is wrong: because this has evolved
> from a sensible check on something seriously in doubt,
> to a pointless duplication of effort.
>
> It makes sense if you look at the original, in 2.6.6 or earlier:
>
> if (!TestSetPageDirty(page)) {
> struct address_space *mapping = page->mapping;
>
> if (mapping) {
> spin_lock_irq(&mapping->tree_lock);
> if (page->mapping) { /* Race with truncate? */
> BUG_ON(page->mapping != mapping);
> if (!mapping->backing_dev_info->memory_backed)
>
> What this is actually worrying about (along with truncation suddenly
> setting page->mapping to NULL, won't happen without tree_lock) is
> tmpfs swizzling page->mapping between a tmpfs struct address_space *
> and &swapper_space (move_to/from_swap_cache in swap_state.c); or
> (more distant concern) the page getting reused while we're in here,
> coming in with one page->mapping and then suddenly another.
>
> It's not doubting that tree_lock protects against that, but _which_
> tree_lock? If page->mapping suddenly changes underneath us, then
> the spin_lock_irq(&mapping->tree_lock) may have been done on the
> wrong mapping->tree_lock - to lock "mapping->tree_lock" you have
> to choose "mapping" first, but perhaps that's not stable without
> its tree_lock.
>
> In most cases, the (assumed) hold on the page in question prevents
> page->mapping from changing from one non-NULL to another non-NULL
> here, even without the tree_lock. But that's not enough to protect
> against the tmpfs swizzling: what protects against that? Er, er,
> it's the way tmpfs pages only go to swap when not in use, and are
> brought back from swap before being used, and shmem.c insists on
> page lock in each direction; but really it needs an audit of every
> use of set_page_dirty to be sure. Though I've never heard of the
> (earlier, useful) BUG_ON actually firing.
>
> And I think you'll find that in practice it's just a waste for ramfs,
> tmpfs and swap to be coming through __set_page_dirty_nobuffers at all:
> since we don't do mpage operations on the "memory_backed" filesystems,
> all the radix-tree tagging and dirty-inode operations on them are just
> a waste of time? and they'd do better to use a .set_page_dirty which
> just does SetPageDirty. But akpm does wonder from time to time whether
> to reintroduce mpage operations on at least swap, so may resist such a
> simplification.
Makes sense, why arent tmpfs/swap using mpage operations?
> Anyway, perhaps a suitable patch to make sense of that BUG_ON would be:
>
> --- 2.6.8/mm/page-writeback.c 2004-08-10 05:40:21.000000000 +0100
> +++ linux/mm/page-writeback.c 2004-08-14 07:21:58.744468256 +0100
> @@ -580,12 +580,13 @@ int __set_page_dirty_nobuffers(struct pa
>
> if (!TestSetPageDirty(page)) {
> struct address_space *mapping = page_mapping(page);
> + struct address_space *mapping2;
>
> if (mapping) {
> spin_lock_irq(&mapping->tree_lock);
> - mapping = page_mapping(page);
> - if (page_mapping(page)) { /* Race with truncate? */
> - BUG_ON(page_mapping(page) != mapping);
> + mapping2 = page_mapping(page);
> + if (mapping2) { /* Race with truncate? */
> + BUG_ON(mapping2 != mapping);
> if (!mapping->backing_dev_info->memory_backed)
> inc_page_state(nr_dirty);
> radix_tree_tag_set(&mapping->page_tree,
I see.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2004-08-14 13:37 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-08-13 18:05 __set_page_dirty_nobuffers superfluous check Marcelo Tosatti
2004-08-14 6:46 ` Hugh Dickins
2004-08-14 13:37 ` Marcelo Tosatti [this message]
2004-08-16 12:37 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20040814133717.GA32755@logos.cnet \
--to=marcelo.tosatti@cyclades.com \
--cc=hugh@veritas.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.