linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: amir73il@users.sourceforge.net
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, lczerner@redhat.com,
	Amir Goldstein <amir73il@users.sf.net>,
	Yongqiang Yang <xiaoqiangnk@gmail.com>
Subject: [PATCH v1 30/36] ext4: snapshot race conditions - concurrent COW operations
Date: Tue,  7 Jun 2011 18:07:57 +0300	[thread overview]
Message-ID: <1307459283-22130-31-git-send-email-amir73il@users.sourceforge.net> (raw)
In-Reply-To: <1307459283-22130-1-git-send-email-amir73il@users.sourceforge.net>

From: Amir Goldstein <amir73il@users.sf.net>

Wait for pending COW operations to complete.
When concurrent tasks try to COW the same buffer, the task that takes
the active snapshot i_data_sem is elected as the the COWing task.
The COWing task allocates a new snapshot block and creates a buffer
cache entry with ref_count=1 for that new block.  It then locks the
new buffer and marks it with the buffer_new flag.  The rest of the
tasks wait (in msleep(1) loop), until the buffer_new flag is cleared.
The COWing task copies the source buffer into the 'new' buffer,
unlocks it, clears the new_buffer flag and drops its reference count.
On active snapshot readpage, the buffer cache is checked.
If a 'new' buffer entry is found, the reader task waits until the
buffer_new flag is cleared and then copies the 'new' buffer directly
into the snapshot file page.
The sleep loop method was copied from LVM snapshot code, which does
the same thing to deal with these (rare) races without wait queues.


Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
 fs/ext4/inode.c          |   26 ++++++++++++++++++
 fs/ext4/snapshot.c       |   11 ++++++++
 fs/ext4/snapshot.h       |   64 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/snapshot_inode.c |   40 ++++++++++++++++++++++++++++
 4 files changed, 141 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index de40993..89a97da 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1049,6 +1049,7 @@ static int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
 	int depth;
 	int count = 0;
 	ext4_fsblk_t first_block = 0;
+	struct buffer_head *sbh = NULL;
 
 	trace_ext4_ind_map_blocks_enter(inode, map->m_lblk, map->m_len, flags);
 	J_ASSERT(!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)));
@@ -1155,6 +1156,25 @@ static int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
 	if (err)
 		goto cleanup;
 
+	if (SNAPMAP_ISCOW(flags)) {
+		/*
+		 * COWing block or creating COW bitmap.
+		 * we now have exclusive access to the COW destination block
+		 * and we are about to create the snapshot block mapping
+		 * and make it public.
+		 * grab the buffer cache entry and mark it new
+		 * to indicate a pending COW operation.
+		 * the refcount for the buffer cache will be released
+		 * when the COW operation is either completed or canceled.
+		 */
+		sbh = sb_getblk(inode->i_sb, le32_to_cpu(chain[depth-1].key));
+		if (!sbh) {
+			err = -EIO;
+			goto cleanup;
+		}
+		ext4_snapshot_start_pending_cow(sbh);
+	}
+
 	if (map->m_flags & EXT4_MAP_REMAP) {
 		map->m_len = count;
 		/* move old block to snapshot */
@@ -1198,6 +1218,12 @@ got_it:
 	/* Clean up and exit */
 	partial = chain + depth - 1;	/* the whole chain */
 cleanup:
+	/* cancel pending COW operation on failure to alloc snapshot block */
+	if (SNAPMAP_ISCOW(flags)) {
+		if (err < 0 && sbh)
+			ext4_snapshot_end_pending_cow(sbh);
+		brelse(sbh);
+	}
 	while (partial > chain) {
 		BUFFER_TRACE(partial->bh, "call brelse");
 		brelse(partial->bh);
diff --git a/fs/ext4/snapshot.c b/fs/ext4/snapshot.c
index 000e655..bd6a833 100644
--- a/fs/ext4/snapshot.c
+++ b/fs/ext4/snapshot.c
@@ -115,6 +115,8 @@ ext4_snapshot_complete_cow(handle_t *handle, struct inode *snapshot,
 	if (sync)
 		sync_dirty_buffer(sbh);
 out:
+	/* COW operation is complete */
+	ext4_snapshot_end_pending_cow(sbh);
 	return err;
 }
 
@@ -688,6 +690,12 @@ int ext4_snapshot_test_and_cow(const char *where, handle_t *handle,
 	 * we allocated this block -
 	 * copy block data to snapshot and complete COW operation
 	 */
+	snapshot_debug(3, "COWing block [%llu/%llu] of snapshot "
+			"(%u)...\n",
+			SNAPSHOT_BLOCK_TUPLE(block),
+			active_snapshot->i_generation);
+	/* sleep 1 tunable delay unit */
+	snapshot_test_delay(SNAPTEST_COW);
 	err = ext4_snapshot_copy_buffer_cow(handle, active_snapshot,
 			sbh, bh);
 	if (err)
@@ -700,6 +708,9 @@ int ext4_snapshot_test_and_cow(const char *where, handle_t *handle,
 
 	trace_cow_inc(handle, copied);
 test_pending_cow:
+	if (sbh)
+		/* wait for pending COW to complete */
+		ext4_snapshot_test_pending_cow(sbh, block);
 
 cowed:
 	/* mark the buffer COWed in the current transaction */
diff --git a/fs/ext4/snapshot.h b/fs/ext4/snapshot.h
index 44bac96..37f5c2d 100644
--- a/fs/ext4/snapshot.h
+++ b/fs/ext4/snapshot.h
@@ -474,6 +474,70 @@ static inline int ext4_snapshot_mow_in_tid(struct inode *inode)
 		      ext4_snapshot_get_tid(inode->i_sb));
 }
 
+/*
+ * Pending COW functions
+ */
+
+/*
+ * Start pending COW operation from get_blocks_handle()
+ * after allocating snapshot block and before connecting it
+ * to the snapshot inode.
+ */
+static inline void ext4_snapshot_start_pending_cow(struct buffer_head *sbh)
+{
+	/*
+	 * setting the 'new' flag on a newly allocated snapshot block buffer
+	 * indicates that the COW operation is pending.
+	 */
+	set_buffer_new(sbh);
+	/* keep buffer in cache as long as we need to test the 'new' flag */
+	get_bh(sbh);
+}
+
+/*
+ * End pending COW operation started in get_blocks_handle().
+ * Called on failure to connect the new snapshot block to the inode
+ * or on successful completion of the COW operation.
+ */
+static inline void ext4_snapshot_end_pending_cow(struct buffer_head *sbh)
+{
+	/*
+	 * clearing the 'new' flag from the snapshot block buffer
+	 * indicates that the COW operation is complete.
+	 */
+	clear_buffer_new(sbh);
+	/* we no longer need to keep the buffer in cache */
+	put_bh(sbh);
+}
+
+/*
+ * Test for pending COW operation and wait for its completion.
+ */
+static inline void ext4_snapshot_test_pending_cow(struct buffer_head *sbh,
+						sector_t blocknr)
+{
+	while (buffer_new(sbh)) {
+		/* wait for pending COW to complete */
+		snapshot_debug_once(2, "waiting for pending cow: "
+				"block = [%llu/%llu]...\n",
+				SNAPSHOT_BLOCK_TUPLE(blocknr));
+		/*
+		 * An unusually long pending COW operation can be caused by
+		 * the debugging function snapshot_test_delay(SNAPTEST_COW)
+		 * and by waiting for tracked reads to complete.
+		 * The new COW buffer is locked during those events, so wait
+		 * on the buffer before the short msleep.
+		 */
+		wait_on_buffer(sbh);
+		/*
+		 * This is an unlikely event that can happen only once per
+		 * block/snapshot, so msleep(1) is sufficient and there is
+		 * no need for a wait queue.
+		 */
+		msleep(1);
+		/* XXX: Should we fail after N retries? */
+	}
+}
 
 #else /* CONFIG_EXT4_FS_SNAPSHOT */
 
diff --git a/fs/ext4/snapshot_inode.c b/fs/ext4/snapshot_inode.c
index a97411e..55cac07 100644
--- a/fs/ext4/snapshot_inode.c
+++ b/fs/ext4/snapshot_inode.c
@@ -183,6 +183,7 @@ static int ext4_snapshot_read_through(struct inode *inode, sector_t iblock,
 	int err;
 	struct ext4_map_blocks map;
 	struct inode *prev_snapshot;
+	struct buffer_head *sbh = NULL;
 
 	map.m_lblk = iblock;
 	map.m_pblk = 0;
@@ -214,6 +215,45 @@ get_block:
 	bh_result->b_state = (bh_result->b_state & ~EXT4_MAP_FLAGS) |
 		map.m_flags;
 
+	/*
+	 * On read of active snapshot, a mapped block may belong to a non
+	 * completed COW operation.  Use the buffer cache to test this
+	 * condition.  if (bh_result->b_blocknr == SNAPSHOT_BLOCK(iblock)),
+	 * then this is either read through to block device or moved block.
+	 * Either way, it is not a COWed block, so it cannot be pending COW.
+	 */
+	if (ext4_snapshot_is_active(inode) &&
+	    bh_result->b_blocknr != SNAPSHOT_BLOCK(iblock))
+		sbh = sb_find_get_block(inode->i_sb, bh_result->b_blocknr);
+	if (!sbh)
+		return 0;
+	/* wait for pending COW to complete */
+	ext4_snapshot_test_pending_cow(sbh, SNAPSHOT_BLOCK(iblock));
+	lock_buffer(sbh);
+	if (buffer_uptodate(sbh)) {
+		/*
+		 * Avoid disk I/O and copy out snapshot page directly
+		 * from block device page when possible.
+		 */
+		BUG_ON(!sbh->b_page);
+		BUG_ON(!bh_result->b_page);
+		lock_buffer(bh_result);
+		copy_highpage(bh_result->b_page, sbh->b_page);
+		set_buffer_uptodate(bh_result);
+		unlock_buffer(bh_result);
+	} else if (buffer_dirty(sbh)) {
+		/*
+		 * If snapshot data buffer is dirty (just been COWed),
+		 * then it is not safe to read it from disk yet.
+		 * We shouldn't get here because snapshot data buffer
+		 * only becomes dirty during COW and because we waited
+		 * for pending COW to complete, which means that a
+		 * dirty snapshot data buffer should be uptodate.
+		 */
+		WARN_ON(1);
+	}
+	unlock_buffer(sbh);
+	brelse(sbh);
 	return 0;
 }
 
-- 
1.7.4.1


  parent reply	other threads:[~2011-06-07 15:10 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-07 15:07 [PATCH v1 00/30] Ext4 snapshots amir73il
2011-06-07 15:07 ` [PATCH v1 01/36] ext4: EXT4 snapshots (Experimental) amir73il
2011-06-07 15:07 ` [PATCH v1 02/36] ext4: snapshot debugging support amir73il
2011-06-07 15:07 ` [PATCH v1 03/36] ext4: snapshot hooks - inside JBD hooks amir73il
2011-06-07 15:07 ` [PATCH v1 04/36] ext4: snapshot hooks - block bitmap access amir73il
2011-06-07 15:07 ` [PATCH v1 05/36] ext4: snapshot hooks - delete blocks amir73il
2011-06-07 15:07 ` [PATCH v1 06/36] ext4: snapshot hooks - move data blocks amir73il
2011-06-07 15:07 ` [PATCH v1 07/36] ext4: snapshot hooks - direct I/O amir73il
2011-06-07 15:07 ` [PATCH v1 08/36] ext4: snapshot hooks - move extent file data blocks amir73il
2011-06-07 15:07 ` [PATCH v1 09/36] ext4: snapshot file amir73il
2011-06-07 15:07 ` [PATCH v1 10/36] ext4: snapshot file - read through to block device amir73il
2011-06-07 15:07 ` [PATCH v1 11/36] ext4: snapshot file - permissions amir73il
2011-06-07 15:07 ` [PATCH v1 12/36] ext4: snapshot file - store on disk amir73il
2011-06-07 15:07 ` [PATCH v1 13/36] ext4: snapshot file - increase maximum file size limit to 16TB amir73il
2011-06-07 15:07 ` [PATCH v1 14/36] ext4: snapshot block operations amir73il
2011-06-07 15:07 ` [PATCH v1 15/36] ext4: snapshot block operation - copy blocks to snapshot amir73il
2011-06-07 15:07 ` [PATCH v1 16/36] ext4: snapshot block operation - move " amir73il
2011-06-07 15:07 ` [PATCH v1 17/36] ext4: snapshot block operation - copy block bitmap " amir73il
2011-06-07 15:07 ` [PATCH v1 18/36] ext4: snapshot control amir73il
2011-06-07 15:07 ` [PATCH v1 19/36] ext4: snapshot control - init new snapshot amir73il
2011-06-07 15:07 ` [PATCH v1 20/36] ext4: snapshot control - fix " amir73il
2011-06-07 15:07 ` [PATCH v1 21/36] ext4: snapshot control - reserve disk space for snapshot amir73il
2011-06-07 15:07 ` [PATCH v1 22/36] ext4: snapshot journaled - increase transaction credits amir73il
2011-06-07 15:07 ` [PATCH v1 23/36] ext4: snapshot journaled - implement journal_release_buffer() amir73il
2011-06-07 15:07 ` [PATCH v1 24/36] ext4: snapshot journaled - bypass to save credits amir73il
2011-06-07 15:07 ` [PATCH v1 25/36] ext4: snapshot journaled - cache last COW tid in journal_head amir73il
2011-06-07 15:07 ` [PATCH v1 26/36] ext4: snapshot journaled - trace COW/buffer credits amir73il
2011-06-07 15:07 ` [PATCH v1 27/36] ext4: snapshot list support amir73il
2011-06-07 15:07 ` [PATCH v1 28/36] ext4: snapshot list - read through to previous snapshot amir73il
2011-06-07 15:07 ` [PATCH v1 29/36] ext4: snapshot race conditions - concurrent COW bitmap operations amir73il
2011-06-07 15:07 ` amir73il [this message]
2011-06-07 15:07 ` [PATCH v1 31/36] ext4: snapshot race conditions - tracked reads amir73il
2011-06-07 15:07 ` [PATCH v1 32/36] ext4: snapshot exclude - the exclude bitmap amir73il
2011-06-07 15:08 ` [PATCH v1 33/36] ext4: snapshot cleanup amir73il
2011-06-07 15:08 ` [PATCH v1 34/36] ext4: snapshot cleanup - shrink deleted snapshots amir73il
2011-06-07 15:08 ` [PATCH v1 35/36] ext4: snapshot cleanup - merge shrunk snapshots amir73il
2011-06-07 15:08 ` [PATCH v1 36/36] ext4: snapshot rocompat - enable rw mount amir73il
2011-06-07 15:56 ` [PATCH v1 00/30] Ext4 snapshots Lukas Czerner
2011-06-07 16:31   ` Amir G.
2011-06-08 10:09     ` Lukas Czerner
2011-06-08 14:04       ` Amir G.
2011-06-08 14:41         ` Eric Sandeen
2011-06-08 15:01           ` Amir G.
2011-06-08 15:22             ` Eric Sandeen
2011-06-08 15:33               ` Amir G.
2011-06-08 15:38         ` Lukas Czerner
2011-06-08 15:59           ` Amir G.
2011-06-08 16:19             ` Mike Snitzer
2011-06-09  1:59           ` Yongqiang Yang
2011-06-09  3:18             ` Amir G.
2011-06-09  3:51               ` Yongqiang Yang
2011-06-09  6:50                 ` Lukas Czerner
2011-06-09  7:57                   ` Amir G.
2011-06-09  8:13                     ` david
2011-06-09 10:06                       ` Amir G.
2011-06-09 10:17                         ` Lukas Czerner
2011-06-09  8:46                     ` Lukas Czerner
2011-06-09 10:54                       ` Amir G.
2011-06-09 12:59                         ` Lukas Czerner
2011-06-10  7:06                           ` Amir G.
2011-06-10  9:00                             ` Lukas Czerner
2011-06-10 12:02                               ` Amir G.
2011-06-13  9:56                               ` Amir G.
2011-06-13 10:54                                 ` Lukas Czerner
2011-06-13 12:56                                   ` Amir G.
2011-06-13 13:11                                     ` Lukas Czerner
2011-06-13 13:26                                       ` Amir G.
2011-06-13 13:50                                         ` Joe Thornber
2011-06-10 22:51                         ` Valdis.Kletnieks
2011-06-11  1:09                           ` Amir G.
2011-06-21 11:06 ` Amir G.
2011-06-21 15:45   ` Andreas Dilger
2011-06-22  6:38     ` Amir G.

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1307459283-22130-31-git-send-email-amir73il@users.sourceforge.net \
    --to=amir73il@users.sourceforge.net \
    --cc=amir73il@users.sf.net \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=xiaoqiangnk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).