Hole punching and mmap races

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: xfs@oss.sgi.com, linux-ext4@vger.kernel.org,
	Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org
Subject: Hole punching and mmap races
Date: Wed, 16 May 2012 00:48:05 +0200	[thread overview]
Message-ID: <20120515224805.GA25577@quack.suse.cz> (raw)

  Hello,

  Hugh pointed me to ext4 hole punching code which is clearly missing some
locking. But looking at the code more deeply I realized I don't see
anything preventing the following race in XFS or ext4:

TASK1				TASK2
				punch_hole(file, 0, 4096)
				  filemap_write_and_wait()
				  truncate_pagecache_range()
addr = mmap(file);
addr[0] = 1
  ^^ writeably fault a page
				  remove file blocks

						FLUSHER
						write out file
						  ^^ interesting things can
happen because we expect blocks under the first page to be allocated /
reserved but they are not...

I'm pretty sure ext4 has this problem, I'm not completely sure whether
XFS has something to protect against such race but I don't see anything.

It's not easy to protect against these races. For truncate, i_size protects
us against similar races but for hole punching we don't have any such
mechanism. One way to avoid the race would be to hold mmap_sem while we are
invalidating the page cache and punching hole but that sounds a bit ugly.
Alternatively we could just have some special lock (rwsem?) held during
page_mkwrite() (for reading) and during whole hole punching (for writing)
to serialize these two operations.

Another alternative, which doesn't really look more appealing, is to go
page-by-page and always free corresponding blocks under page lock.

Any other ideas or thoughts?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

WARNING: multiple messages have this Message-ID (diff)

From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-ext4@vger.kernel.org,
	Hugh Dickins <hughd@google.com>,
	xfs@oss.sgi.com
Subject: Hole punching and mmap races
Date: Wed, 16 May 2012 00:48:05 +0200	[thread overview]
Message-ID: <20120515224805.GA25577@quack.suse.cz> (raw)

  Hello,

  Hugh pointed me to ext4 hole punching code which is clearly missing some
locking. But looking at the code more deeply I realized I don't see
anything preventing the following race in XFS or ext4:

TASK1				TASK2
				punch_hole(file, 0, 4096)
				  filemap_write_and_wait()
				  truncate_pagecache_range()
addr = mmap(file);
addr[0] = 1
  ^^ writeably fault a page
				  remove file blocks

						FLUSHER
						write out file
						  ^^ interesting things can
happen because we expect blocks under the first page to be allocated /
reserved but they are not...

I'm pretty sure ext4 has this problem, I'm not completely sure whether
XFS has something to protect against such race but I don't see anything.

It's not easy to protect against these races. For truncate, i_size protects
us against similar races but for hole punching we don't have any such
mechanism. One way to avoid the race would be to hold mmap_sem while we are
invalidating the page cache and punching hole but that sounds a bit ugly.
Alternatively we could just have some special lock (rwsem?) held during
page_mkwrite() (for reading) and during whole hole punching (for writing)
to serialize these two operations.

Another alternative, which doesn't really look more appealing, is to go
page-by-page and always free corresponding blocks under page lock.

Any other ideas or thoughts?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)

From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: xfs@oss.sgi.com, linux-ext4@vger.kernel.org,
	Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org
Subject: Hole punching and mmap races
Date: Wed, 16 May 2012 00:48:05 +0200	[thread overview]
Message-ID: <20120515224805.GA25577@quack.suse.cz> (raw)

  Hello,

  Hugh pointed me to ext4 hole punching code which is clearly missing some
locking. But looking at the code more deeply I realized I don't see
anything preventing the following race in XFS or ext4:

TASK1				TASK2
				punch_hole(file, 0, 4096)
				  filemap_write_and_wait()
				  truncate_pagecache_range()
addr = mmap(file);
addr[0] = 1
  ^^ writeably fault a page
				  remove file blocks

						FLUSHER
						write out file
						  ^^ interesting things can
happen because we expect blocks under the first page to be allocated /
reserved but they are not...

I'm pretty sure ext4 has this problem, I'm not completely sure whether
XFS has something to protect against such race but I don't see anything.

It's not easy to protect against these races. For truncate, i_size protects
us against similar races but for hole punching we don't have any such
mechanism. One way to avoid the race would be to hold mmap_sem while we are
invalidating the page cache and punching hole but that sounds a bit ugly.
Alternatively we could just have some special lock (rwsem?) held during
page_mkwrite() (for reading) and during whole hole punching (for writing)
to serialize these two operations.

Another alternative, which doesn't really look more appealing, is to go
page-by-page and always free corresponding blocks under page lock.

Any other ideas or thoughts?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next             reply	other threads:[~2012-05-15 22:48 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-15 22:48 Jan Kara [this message]
2012-05-15 22:48 ` Hole punching and mmap races Jan Kara
2012-05-15 22:48 ` Jan Kara
2012-05-16  2:14 ` Dave Chinner
2012-05-16  2:14   ` Dave Chinner
2012-05-16  2:14   ` Dave Chinner
2012-05-16 13:04   ` Jan Kara
2012-05-16 13:04     ` Jan Kara
2012-05-16 13:04     ` Jan Kara
2012-05-17  7:43     ` Dave Chinner
2012-05-17  7:43       ` Dave Chinner
2012-05-17  7:43       ` Dave Chinner
2012-05-17 23:28       ` Jan Kara
2012-05-17 23:28         ` Jan Kara
2012-05-17 23:28         ` Jan Kara
2012-05-18 10:12         ` Dave Chinner
2012-05-18 10:12           ` Dave Chinner
2012-05-18 10:12           ` Dave Chinner
2012-05-18 13:32           ` Jan Kara
2012-05-18 13:32             ` Jan Kara
2012-05-18 13:32             ` Jan Kara
2012-05-19  1:40             ` Dave Chinner
2012-05-19  1:40               ` Dave Chinner
2012-05-24 12:35               ` Jan Kara
2012-05-24 12:35                 ` Jan Kara
2012-05-24 12:35                 ` Jan Kara
2012-06-05  5:51                 ` Dave Chinner
2012-06-05  5:51                   ` Dave Chinner
2012-06-05  5:51                   ` Dave Chinner
2012-06-05  6:22                   ` Marco Stornelli
2012-06-05  6:22                     ` Marco Stornelli
2012-06-05  6:22                     ` Marco Stornelli
2012-06-05 23:15                   ` Jan Kara
2012-06-05 23:15                     ` Jan Kara
2012-06-05 23:15                     ` Jan Kara
2012-06-06  0:06                     ` Dave Chinner
2012-06-06  0:06                       ` Dave Chinner
2012-06-06  0:06                       ` Dave Chinner
2012-06-06  9:58                       ` Jan Kara
2012-06-06  9:58                         ` Jan Kara
2012-06-06  9:58                         ` Jan Kara
2012-06-06 13:36                         ` Dave Chinner
2012-06-06 13:36                           ` Dave Chinner
2012-06-06 13:36                           ` Dave Chinner
2012-06-07 21:58                           ` Jan Kara
2012-06-07 21:58                             ` Jan Kara
2012-06-07 21:58                             ` Jan Kara
2012-06-08  0:57                             ` Dave Chinner
2012-06-08  0:57                               ` Dave Chinner
2012-06-08 21:36                               ` Jan Kara
2012-06-08 21:36                                 ` Jan Kara
2012-06-08 23:06                                 ` Dave Chinner
2012-06-08 23:06                                   ` Dave Chinner
2012-06-12  8:56                                   ` Jan Kara
2012-06-12  8:56                                     ` Jan Kara
2012-06-12  8:56                                     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120515224805.GA25577@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=hughd@google.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.