From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ying Han <yinghan@google.com>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel" <linux-kernel@vger.kernel.org>,
"linux-mm" <linux-mm@kvack.org>,
guichaz@gmail.com, Alex Khesin <alexk@google.com>,
Mike Waychison <mikew@google.com>,
Rohit Seth <rohitseth@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: ftruncate-mmap: pages are lost after writing to mmaped file.
Date: Fri, 20 Mar 2009 03:34:54 +1100 [thread overview]
Message-ID: <200903200334.55710.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <alpine.LFD.2.00.0903190902000.17240@localhost.localdomain>
On Friday 20 March 2009 03:20:57 Linus Torvalds wrote:
> On Fri, 20 Mar 2009, Nick Piggin wrote:
> > But I think we do have a race in __set_page_dirty_buffers():
> >
> > The page may not have buffers between the mapping->private_lock
> > critical section and the __set_page_dirty call there. So between
> > them, another thread might do a create_empty_buffers which can
> > see !PageDirty and thus it will create clean buffers.
>
> Hmm.
>
> Creating clean buffers is locked by the page lock, nothing else. And not
> all page dirtiers hold the page lock (in fact, most try to avoid it - the
> rule is that you either have to hold the page lock _or_ hold a reference
> to the 'mapping', and the latter is what the mmap code does, I think).
>
> So yeah, the page lock isn't sufficient.
No. FWIW, I thought there might be a race due to page fault code
not holding page lock over set_page_dirty (well there *are* some kinds
of races, but they're another story). So I tried out my patches that
move that lock over set_page_dirty for __do_fault and do_wp_page
(so the lock is held over pte_mkwrite and set_page_dirty), but that
still didn't solve the problem either.
> > Holding mapping->private_lock over the __set_page_dirty should
> > fix it, although I guess you'd want to release it before calling
> > __mark_inode_dirty so as not to put inode_lock under there. I
> > have a patch for this if it sounds reasonable.
>
> That would seem to make sense. Maybe moving the "TestSetPageDirty()" from
> inside __set_page_dirty() to the caller? Something like the appended?
>
> This is TOTALLY untested. Of course.
Yeah, probably no need to hold private_lock while tagging the radix
tree (which is what my version did). So maybe this one is a little
better. I did test mine, it worked, but it didn't solve the problem.
Still, it does appear to solve a real race, which we should close.
>
> Linus
>
> ---
> fs/buffer.c | 23 +++++++++++------------
> 1 files changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 9f69741..891e1c7 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -760,15 +760,9 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
> * If warn is true, then emit a warning if the page is not uptodate and
> has * not been truncated.
> */
> -static int __set_page_dirty(struct page *page,
> +static void __set_page_dirty(struct page *page,
> struct address_space *mapping, int warn)
> {
> - if (unlikely(!mapping))
> - return !TestSetPageDirty(page);
> -
> - if (TestSetPageDirty(page))
> - return 0;
> -
> spin_lock_irq(&mapping->tree_lock);
> if (page->mapping) { /* Race with truncate? */
> WARN_ON_ONCE(warn && !PageUptodate(page));
> @@ -785,8 +779,6 @@ static int __set_page_dirty(struct page *page,
> }
> spin_unlock_irq(&mapping->tree_lock);
> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> -
> - return 1;
> }
>
> /*
> @@ -816,6 +808,7 @@ static int __set_page_dirty(struct page *page,
> */
> int __set_page_dirty_buffers(struct page *page)
> {
> + int newly_dirty;
> struct address_space *mapping = page_mapping(page);
>
> if (unlikely(!mapping))
> @@ -831,9 +824,12 @@ int __set_page_dirty_buffers(struct page *page)
> bh = bh->b_this_page;
> } while (bh != head);
> }
> + newly_dirty = !TestSetPageDirty(page);
> spin_unlock(&mapping->private_lock);
>
> - return __set_page_dirty(page, mapping, 1);
> + if (newly_dirty)
> + __set_page_dirty(page, mapping, 1);
> + return newly_dirty;
> }
> EXPORT_SYMBOL(__set_page_dirty_buffers);
>
> @@ -1262,8 +1258,11 @@ void mark_buffer_dirty(struct buffer_head *bh)
> return;
> }
>
> - if (!test_set_buffer_dirty(bh))
> - __set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0);
> + if (!test_set_buffer_dirty(bh)) {
> + struct page *page = bh->b_page;
> + if (!TestSetPageDirty(page))
> + __set_page_dirty(page, page_mapping(page), 0);
> + }
> }
>
> /*
WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ying Han <yinghan@google.com>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
guichaz@gmail.com, Alex Khesin <alexk@google.com>,
Mike Waychison <mikew@google.com>,
Rohit Seth <rohitseth@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: ftruncate-mmap: pages are lost after writing to mmaped file.
Date: Fri, 20 Mar 2009 03:34:54 +1100 [thread overview]
Message-ID: <200903200334.55710.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <alpine.LFD.2.00.0903190902000.17240@localhost.localdomain>
On Friday 20 March 2009 03:20:57 Linus Torvalds wrote:
> On Fri, 20 Mar 2009, Nick Piggin wrote:
> > But I think we do have a race in __set_page_dirty_buffers():
> >
> > The page may not have buffers between the mapping->private_lock
> > critical section and the __set_page_dirty call there. So between
> > them, another thread might do a create_empty_buffers which can
> > see !PageDirty and thus it will create clean buffers.
>
> Hmm.
>
> Creating clean buffers is locked by the page lock, nothing else. And not
> all page dirtiers hold the page lock (in fact, most try to avoid it - the
> rule is that you either have to hold the page lock _or_ hold a reference
> to the 'mapping', and the latter is what the mmap code does, I think).
>
> So yeah, the page lock isn't sufficient.
No. FWIW, I thought there might be a race due to page fault code
not holding page lock over set_page_dirty (well there *are* some kinds
of races, but they're another story). So I tried out my patches that
move that lock over set_page_dirty for __do_fault and do_wp_page
(so the lock is held over pte_mkwrite and set_page_dirty), but that
still didn't solve the problem either.
> > Holding mapping->private_lock over the __set_page_dirty should
> > fix it, although I guess you'd want to release it before calling
> > __mark_inode_dirty so as not to put inode_lock under there. I
> > have a patch for this if it sounds reasonable.
>
> That would seem to make sense. Maybe moving the "TestSetPageDirty()" from
> inside __set_page_dirty() to the caller? Something like the appended?
>
> This is TOTALLY untested. Of course.
Yeah, probably no need to hold private_lock while tagging the radix
tree (which is what my version did). So maybe this one is a little
better. I did test mine, it worked, but it didn't solve the problem.
Still, it does appear to solve a real race, which we should close.
>
> Linus
>
> ---
> fs/buffer.c | 23 +++++++++++------------
> 1 files changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 9f69741..891e1c7 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -760,15 +760,9 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
> * If warn is true, then emit a warning if the page is not uptodate and
> has * not been truncated.
> */
> -static int __set_page_dirty(struct page *page,
> +static void __set_page_dirty(struct page *page,
> struct address_space *mapping, int warn)
> {
> - if (unlikely(!mapping))
> - return !TestSetPageDirty(page);
> -
> - if (TestSetPageDirty(page))
> - return 0;
> -
> spin_lock_irq(&mapping->tree_lock);
> if (page->mapping) { /* Race with truncate? */
> WARN_ON_ONCE(warn && !PageUptodate(page));
> @@ -785,8 +779,6 @@ static int __set_page_dirty(struct page *page,
> }
> spin_unlock_irq(&mapping->tree_lock);
> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> -
> - return 1;
> }
>
> /*
> @@ -816,6 +808,7 @@ static int __set_page_dirty(struct page *page,
> */
> int __set_page_dirty_buffers(struct page *page)
> {
> + int newly_dirty;
> struct address_space *mapping = page_mapping(page);
>
> if (unlikely(!mapping))
> @@ -831,9 +824,12 @@ int __set_page_dirty_buffers(struct page *page)
> bh = bh->b_this_page;
> } while (bh != head);
> }
> + newly_dirty = !TestSetPageDirty(page);
> spin_unlock(&mapping->private_lock);
>
> - return __set_page_dirty(page, mapping, 1);
> + if (newly_dirty)
> + __set_page_dirty(page, mapping, 1);
> + return newly_dirty;
> }
> EXPORT_SYMBOL(__set_page_dirty_buffers);
>
> @@ -1262,8 +1258,11 @@ void mark_buffer_dirty(struct buffer_head *bh)
> return;
> }
>
> - if (!test_set_buffer_dirty(bh))
> - __set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0);
> + if (!test_set_buffer_dirty(bh)) {
> + struct page *page = bh->b_page;
> + if (!TestSetPageDirty(page))
> + __set_page_dirty(page, page_mapping(page), 0);
> + }
> }
>
> /*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-03-19 16:35 UTC|newest]
Thread overview: 121+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-18 19:44 ftruncate-mmap: pages are lost after writing to mmaped file Ying Han
2009-03-18 19:44 ` Ying Han
2009-03-18 22:11 ` Andrew Morton
2009-03-18 22:11 ` Andrew Morton
2009-03-18 22:40 ` Linus Torvalds
2009-03-18 22:40 ` Linus Torvalds
2009-03-18 23:18 ` Ying Han
2009-03-18 23:18 ` Ying Han
2009-03-18 23:36 ` Linus Torvalds
2009-03-18 23:36 ` Linus Torvalds
2009-03-18 23:54 ` Ying Han
2009-03-18 23:54 ` Ying Han
2009-03-19 15:48 ` Nick Piggin
2009-03-19 15:48 ` Nick Piggin
2009-03-19 16:16 ` Peter Zijlstra
2009-03-19 16:16 ` Peter Zijlstra
2009-03-19 16:36 ` Nick Piggin
2009-03-19 16:36 ` Nick Piggin
2009-03-19 16:20 ` Linus Torvalds
2009-03-19 16:20 ` Linus Torvalds
2009-03-19 16:34 ` Nick Piggin [this message]
2009-03-19 16:34 ` Nick Piggin
2009-03-19 16:51 ` Linus Torvalds
2009-03-19 16:51 ` Linus Torvalds
2009-03-19 17:03 ` Jan Kara
2009-03-19 17:03 ` Jan Kara
2009-03-19 17:06 ` Jan Kara
2009-03-19 17:06 ` Jan Kara
2009-03-19 20:05 ` Linus Torvalds
2009-03-19 20:05 ` Linus Torvalds
2009-03-19 20:21 ` Linus Torvalds
2009-03-19 20:21 ` Linus Torvalds
2009-03-19 21:17 ` Ying Han
2009-03-19 21:17 ` Ying Han
2009-03-19 22:16 ` Jan Kara
2009-03-19 22:16 ` Jan Kara
2009-03-19 16:46 ` Jan Kara
2009-03-19 16:46 ` Jan Kara
2009-03-24 7:44 ` Nick Piggin
2009-03-24 7:44 ` Nick Piggin
2009-03-24 10:27 ` Nick Piggin
2009-03-24 10:27 ` Nick Piggin
2009-03-24 10:32 ` Andrew Morton
2009-03-24 10:32 ` Andrew Morton
2009-03-24 15:35 ` Nick Piggin
2009-03-24 15:35 ` Nick Piggin
2009-03-26 18:29 ` Jan Kara
2009-03-26 18:29 ` Jan Kara
2009-03-26 0:03 ` Ying Han
2009-03-26 0:03 ` Ying Han
2009-03-24 12:39 ` Jan Kara
2009-03-24 12:39 ` Jan Kara
2009-03-24 12:55 ` Jan Kara
2009-03-24 12:55 ` Jan Kara
2009-03-24 13:26 ` Jan Kara
2009-03-24 13:26 ` Jan Kara
2009-03-24 14:01 ` Chris Mason
2009-03-24 14:01 ` Chris Mason
2009-03-24 14:07 ` Jan Kara
2009-03-24 14:07 ` Jan Kara
2009-03-26 8:18 ` Aneesh Kumar K.V
2009-03-26 8:18 ` Aneesh Kumar K.V
2009-03-24 14:30 ` Nick Piggin
2009-03-24 14:30 ` Nick Piggin
2009-03-24 14:47 ` Jan Kara
2009-03-24 14:47 ` Jan Kara
2009-03-24 14:56 ` Peter Zijlstra
2009-03-24 14:56 ` Peter Zijlstra
2009-03-24 15:29 ` Jan Kara
2009-03-24 15:29 ` Jan Kara
2009-03-24 20:14 ` OGAWA Hirofumi
2009-03-24 20:14 ` OGAWA Hirofumi
2009-03-26 8:47 ` Aneesh Kumar K.V
2009-03-26 8:47 ` Aneesh Kumar K.V
2009-03-26 11:37 ` Jan Kara
2009-03-26 11:37 ` Jan Kara
2009-03-26 23:02 ` Linus Torvalds
2009-03-26 23:02 ` Linus Torvalds
2009-03-24 15:03 ` Nick Piggin
2009-03-24 15:03 ` Nick Piggin
2009-03-24 15:48 ` Jan Kara
2009-03-24 15:48 ` Jan Kara
2009-03-24 17:35 ` Jan Kara
2009-03-24 17:35 ` Jan Kara
2009-03-24 17:35 ` Jan Kara
2009-04-01 22:36 ` Ying Han
2009-04-01 22:36 ` Ying Han
2009-04-02 10:11 ` Jan Kara
2009-04-02 10:11 ` Jan Kara
2009-04-02 11:24 ` Nick Piggin
2009-04-02 11:24 ` Nick Piggin
2009-04-02 11:34 ` Jan Kara
2009-04-02 11:34 ` Jan Kara
2009-04-02 15:51 ` Nick Piggin
2009-04-02 15:51 ` Nick Piggin
2009-04-02 17:44 ` Ying Han
2009-04-02 17:44 ` Ying Han
2009-04-02 22:52 ` Ying Han
2009-04-02 22:52 ` Ying Han
2009-04-02 23:39 ` Jan Kara
2009-04-02 23:39 ` Jan Kara
2009-04-03 0:25 ` Ying Han
2009-04-03 0:25 ` Ying Han
2009-04-03 1:29 ` Ying Han
2009-04-03 1:29 ` Ying Han
2009-04-03 9:41 ` Jan Kara
2009-04-03 9:41 ` Jan Kara
2009-04-03 21:34 ` Ying Han
2009-04-03 21:34 ` Ying Han
2009-04-03 0:13 ` Ying Han
2009-04-03 0:13 ` Ying Han
2009-03-27 20:35 ` Ying Han
2009-03-27 20:35 ` Ying Han
2009-03-20 0:34 ` Ying Han
2009-03-20 0:34 ` Ying Han
2009-03-20 0:49 ` Linus Torvalds
2009-03-20 0:49 ` Linus Torvalds
2009-03-20 7:00 ` Ying Han
2009-03-20 7:00 ` Ying Han
2009-03-25 23:15 ` Ying Han
2009-03-25 23:15 ` Ying Han
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200903200334.55710.nickpiggin@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=alexk@google.com \
--cc=guichaz@gmail.com \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mikew@google.com \
--cc=rohitseth@google.com \
--cc=torvalds@linux-foundation.org \
--cc=yinghan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.