linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics
@ 2013-04-14 19:01 Dmitry Monakhov
  2013-04-14 19:01 ` [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:01 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

use wait_for_stable_page() instead of wait_on_page_writeback()

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/inode.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 62189c8..1be5827 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1028,7 +1028,8 @@ retry_journal:
 		ext4_journal_stop(handle);
 		goto retry_grab;
 	}
-	wait_on_page_writeback(page);
+	/* In case writeback began while the page was unlocked */
+	wait_for_stable_page(page);
 
 	if (ext4_should_dioread_nolock(inode))
 		ret = __block_write_begin(page, pos, len, ext4_get_block_write);
@@ -2715,7 +2716,7 @@ retry_journal:
 		goto retry_grab;
 	}
 	/* In case writeback began while the page was unlocked */
-	wait_on_page_writeback(page);
+	wait_for_stable_page(page);
 
 	ret = __block_write_begin(page, pos, len, ext4_da_get_block_prep);
 	if (ret < 0) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit
  2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
@ 2013-04-14 19:01 ` Dmitry Monakhov
  2013-04-15 12:29   ` Jan Kara
  2013-04-14 19:01 ` [PATCH 3/5] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:01 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Current implementation of jbd2_journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already has jbd2_journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/jbd2/journal.c     |   58 ++++++++++++++++++++++++++++++++++++------------
 fs/jbd2/transaction.c |   23 -------------------
 include/linux/jbd2.h  |    2 +-
 3 files changed, 44 insertions(+), 39 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 886ec2f..cfe0aca 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -566,21 +566,19 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
 }
 
 /*
- * Force and wait upon a commit if the calling process is not within
- * transaction.  This is used for forcing out undo-protected data which contains
- * bitmaps, when the fs is running out of space.
- *
- * We can only force the running transaction if we don't have an active handle;
- * otherwise, we will deadlock.
- *
- * Returns true if a transaction was started.
+ * Force and wait any uncommitted transactions.  We can only force the running
+ * transaction if we don't have an active handle, otherwise, we will deadlock.
  */
-int jbd2_journal_force_commit_nested(journal_t *journal)
+int __jbd2_journal_force_commit(journal_t *journal, int nested, int *progress)
 {
 	transaction_t *transaction = NULL;
 	tid_t tid;
-	int need_to_start = 0;
+	int need_to_start = 0, ret = 0;
 
+	J_ASSERT(!current->journal_info || nested);
+
+	if (progress)
+		*progress = 0;
 	read_lock(&journal->j_state_lock);
 	if (journal->j_running_transaction && !current->journal_info) {
 		transaction = journal->j_running_transaction;
@@ -590,16 +588,46 @@ int jbd2_journal_force_commit_nested(journal_t *journal)
 		transaction = journal->j_committing_transaction;
 
 	if (!transaction) {
+		/* Nothing to commit */
 		read_unlock(&journal->j_state_lock);
-		return 0;	/* Nothing to retry */
+		return 0;
 	}
-
 	tid = transaction->t_tid;
 	read_unlock(&journal->j_state_lock);
 	if (need_to_start)
-		jbd2_log_start_commit(journal, tid);
-	jbd2_log_wait_commit(journal, tid);
-	return 1;
+		ret = jbd2_log_start_commit(journal, tid);
+	if (!ret)
+		ret = jbd2_log_wait_commit(journal, tid);
+	if (!ret && progress)
+		*progress = 1;
+
+	return ret;
+}
+
+/**
+ * Force and wait upon a commit if the calling process is not within
+ * transaction.  This is used for forcing out undo-protected data which contains
+ * bitmaps, when the fs is running out of space.
+ *
+ * @journal: journal to force
+ * Returns true if progress was made.
+ */
+int jbd2_journal_force_commit_nested(journal_t *journal)
+{
+	int progress;
+
+	__jbd2_journal_force_commit(journal, 1, &progress);
+	return progress;
+}
+
+/**
+ * int journal_force_commit() - force any uncommitted transactions
+ * @journal: journal to force
+ *
+ */
+int jbd2_journal_force_commit(journal_t *journal)
+{
+	return __jbd2_journal_force_commit(journal, 0, NULL);
 }
 
 /*
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 325bc01..bb2a5c0 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1515,29 +1515,6 @@ int jbd2_journal_stop(handle_t *handle)
 	return err;
 }
 
-/**
- * int jbd2_journal_force_commit() - force any uncommitted transactions
- * @journal: journal to force
- *
- * For synchronous operations: force any uncommitted transactions
- * to disk.  May seem kludgy, but it reuses all the handle batching
- * code in a very simple manner.
- */
-int jbd2_journal_force_commit(journal_t *journal)
-{
-	handle_t *handle;
-	int ret;
-
-	handle = jbd2_journal_start(journal, 1);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-	} else {
-		handle->h_sync = 1;
-		ret = jbd2_journal_stop(handle);
-	}
-	return ret;
-}
-
 /*
  *
  * List management code snippets: various functions for manipulating the
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f9fe889..7b38517 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1125,6 +1125,7 @@ extern void	   jbd2_journal_ack_err    (journal_t *);
 extern int	   jbd2_journal_clear_err  (journal_t *);
 extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
 extern int	   jbd2_journal_force_commit(journal_t *);
+extern int	   jbd2_journal_force_commit_nested(journal_t *);
 extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
 extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
 				struct jbd2_inode *inode, loff_t new_size);
@@ -1199,7 +1200,6 @@ int __jbd2_log_space_left(journal_t *); /* Called with journal locked */
 int jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
-int jbd2_journal_force_commit_nested(journal_t *journal);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
 int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/5] ext4: fix data integrity for ext4_sync_fs
  2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
  2013-04-14 19:01 ` [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
@ 2013-04-14 19:01 ` Dmitry Monakhov
  2013-04-15 13:59   ` Jan Kara
  2013-04-14 19:01 ` [PATCH 4/5] jbd: optimize journal_force_commit Dmitry Monakhov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:01 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Inode's data or non journaled quota may be written w/o jounral so we must
send a barrier at the end of ext4_sync_fs. But it can be skipped if journal
commit will do it for us.

Also fix data integrity for nojournal mode.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/fsync.c      |    2 +-
 fs/ext4/super.c      |   34 +++++++++++++++++++++++++++++++++-
 fs/jbd2/journal.c    |   18 +++++++++++++++++-
 include/linux/jbd2.h |    2 +-
 4 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index e0ba8a4..8a0dee8 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -164,7 +164,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 
 	commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
 	if (journal->j_flags & JBD2_BARRIER &&
-	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
+	    !jbd2_trans_will_send_data_barrier(journal, &commit_tid))
 		needs_barrier = true;
 	ret = jbd2_complete_transaction(journal, commit_tid);
 	if (needs_barrier) {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f355c28..f241644 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -69,6 +69,7 @@ static void ext4_mark_recovery_complete(struct super_block *sb,
 static void ext4_clear_journal_err(struct super_block *sb,
 				   struct ext4_super_block *es);
 static int ext4_sync_fs(struct super_block *sb, int wait);
+static int ext4_sync_fs_nojournal(struct super_block *sb, int wait);
 static int ext4_remount(struct super_block *sb, int *flags, char *data);
 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int ext4_unfreeze(struct super_block *sb);
@@ -1096,6 +1097,7 @@ static const struct super_operations ext4_nojournal_sops = {
 	.dirty_inode	= ext4_dirty_inode,
 	.drop_inode	= ext4_drop_inode,
 	.evict_inode	= ext4_evict_inode,
+	.sync_fs	= ext4_sync_fs_nojournal,
 	.put_super	= ext4_put_super,
 	.statfs		= ext4_statfs,
 	.remount_fs	= ext4_remount,
@@ -4529,6 +4531,7 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 {
 	int ret = 0;
 	tid_t target;
+	bool needs_barrier = false;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 
 	trace_ext4_sync_fs(sb, wait);
@@ -4538,10 +4541,39 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 	 * no dirty dquots
 	 */
 	dquot_writeback_dquots(sb, -1);
+	/*
+	 * Data writeback is possible w/o journal transaction, so barrier must
+	 * being sent at the end of the function. But we can skip it if
+	 * transaction_commit will do it for us.
+	 */
+	if (sbi->s_journal->j_flags & JBD2_BARRIER &&
+	    !jbd2_trans_will_send_data_barrier(sbi->s_journal, NULL))
+		needs_barrier = true;
+
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
-			jbd2_log_wait_commit(sbi->s_journal, target);
+			ret = jbd2_log_wait_commit(sbi->s_journal, target);
+	}
+	if (needs_barrier) {
+		int err;
+		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
+		if (!ret)
+			ret = err;
 	}
+
+	return ret;
+}
+
+static int ext4_sync_fs_nojournal(struct super_block *sb, int wait)
+{
+	int ret = 0;
+
+	trace_ext4_sync_fs(sb, wait);
+	flush_workqueue(EXT4_SB(sb)->dio_unwritten_wq);
+	dquot_writeback_dquots(sb, -1);
+	if (test_opt(sb, BARRIER))
+		ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
+
 	return ret;
 }
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index cfe0aca..35de8a0 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -668,14 +668,30 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid)
  * may or may not have sent the barrier. Used to avoid sending barrier
  * twice in common cases.
  */
-int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid)
+int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t *wait_tid)
 {
 	int ret = 0;
 	transaction_t *commit_trans;
+	tid_t tid;
 
 	if (!(journal->j_flags & JBD2_BARRIER))
 		return 0;
 	read_lock(&journal->j_state_lock);
+
+	/* Caller want to wait some specific transaction? */
+	if (wait_tid)
+		tid = *wait_tid;
+	else  {
+		/* Most recent uncommitted transaction */
+		if (journal->j_running_transaction) {
+			ret = 1;
+			goto out;
+		}
+		if (!journal->j_commit_sequence) {
+			goto out;
+		}
+		tid = journal->j_commit_sequence;
+	}
 	/* Transaction already committed? */
 	if (tid_geq(journal->j_commit_sequence, tid))
 		goto out;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 7b38517..8d77790 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1203,7 +1203,7 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
 int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
-int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
+int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t *tid);
 
 void __jbd2_log_wait_for_space(journal_t *journal);
 extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/5] jbd: optimize journal_force_commit
  2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
  2013-04-14 19:01 ` [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
  2013-04-14 19:01 ` [PATCH 3/5] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
@ 2013-04-14 19:01 ` Dmitry Monakhov
  2013-04-14 19:01 ` [PATCH 5/5] ext3: fix data integrity for ext4_sync_fs Dmitry Monakhov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:01 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Current implementation of journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already has journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/jbd/journal.c     |   53 ++++++++++++++++++++++++++++++++++++++-----------
 fs/jbd/transaction.c |   23 ---------------------
 2 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 81cc7ea..82eaec4 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -482,24 +482,23 @@ int log_start_commit(journal_t *journal, tid_t tid)
 }
 
 /*
- * Force and wait upon a commit if the calling process is not within
- * transaction.  This is used for forcing out undo-protected data which contains
- * bitmaps, when the fs is running out of space.
- *
- * We can only force the running transaction if we don't have an active handle;
- * otherwise, we will deadlock.
- *
- * Returns true if a transaction was started.
+ * Force and wait any uncommitted transactions.  We can only force the running
+ * transaction if we don't have an active handle, otherwise, we will deadlock.
  */
-int journal_force_commit_nested(journal_t *journal)
+static int __journal_force_commit(journal_t *journal, int nested, int *progress)
 {
 	transaction_t *transaction = NULL;
 	tid_t tid;
+	int ret = 0;
+
+	J_ASSERT(!current->journal_info || nested);
+	if (progress)
+		*progress = 0;
 
 	spin_lock(&journal->j_state_lock);
 	if (journal->j_running_transaction && !current->journal_info) {
 		transaction = journal->j_running_transaction;
-		__log_start_commit(journal, transaction->t_tid);
+		ret = __log_start_commit(journal, transaction->t_tid);
 	} else if (journal->j_committing_transaction)
 		transaction = journal->j_committing_transaction;
 
@@ -510,8 +509,38 @@ int journal_force_commit_nested(journal_t *journal)
 
 	tid = transaction->t_tid;
 	spin_unlock(&journal->j_state_lock);
-	log_wait_commit(journal, tid);
-	return 1;
+	if (!ret)
+		ret = log_wait_commit(journal, tid);
+	if (!ret && progress)
+		*progress = 1;
+
+	return ret;
+}
+
+/**
+ * Force and wait upon a commit if the calling process is not within
+ * transaction.  This is used for forcing out undo-protected data which contains
+ * bitmaps, when the fs is running out of space.
+ *
+ * @journal: journal to force
+ * Returns true if progress was made.
+ */
+int journal_force_commit_nested(journal_t *journal)
+{
+	int progress;
+
+	__journal_force_commit(journal, 1, &progress);
+	return progress;
+}
+
+/**
+ * int journal_force_commit() - force any uncommitted transactions
+ * @journal: journal to force
+ *
+ */
+int journal_force_commit(journal_t *journal)
+{
+	return __journal_force_commit(journal, 0, NULL);
 }
 
 /*
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..d3b9a13 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1483,29 +1483,6 @@ int journal_stop(handle_t *handle)
 	return err;
 }
 
-/**
- * int journal_force_commit() - force any uncommitted transactions
- * @journal: journal to force
- *
- * For synchronous operations: force any uncommitted transactions
- * to disk.  May seem kludgy, but it reuses all the handle batching
- * code in a very simple manner.
- */
-int journal_force_commit(journal_t *journal)
-{
-	handle_t *handle;
-	int ret;
-
-	handle = journal_start(journal, 1);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-	} else {
-		handle->h_sync = 1;
-		ret = journal_stop(handle);
-	}
-	return ret;
-}

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] ext3: fix data integrity for ext4_sync_fs
  2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
                   ` (2 preceding siblings ...)
  2013-04-14 19:01 ` [PATCH 4/5] jbd: optimize journal_force_commit Dmitry Monakhov
@ 2013-04-14 19:01 ` Dmitry Monakhov
  2013-04-15 12:01 ` [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Jan Kara
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:01 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Inode's data or non journaled quota may be written w/o jounral so we must
send a barrier at the end of ext3_sync_fs. But it can be skipped if journal
commit will do it for us.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext3/super.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index fb5120a..ee140a8 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2521,6 +2521,7 @@ int ext3_force_commit(struct super_block *sb)
 static int ext3_sync_fs(struct super_block *sb, int wait)
 {
 	tid_t target;
+	int ret = 0;
 
 	trace_ext3_sync_fs(sb, wait);
 	/*
@@ -2530,9 +2531,12 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
 	dquot_writeback_dquots(sb, -1);
 	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
 		if (wait)
-			log_wait_commit(EXT3_SB(sb)->s_journal, target);
+			ret = log_wait_commit(EXT3_SB(sb)->s_journal, target);
+	} else {
+		ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
 	}
-	return 0;
+
+	return ret;
 }
 
 /*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics
  2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
                   ` (3 preceding siblings ...)
  2013-04-14 19:01 ` [PATCH 5/5] ext3: fix data integrity for ext4_sync_fs Dmitry Monakhov
@ 2013-04-15 12:01 ` Jan Kara
  2013-04-22 12:36 ` Zheng Liu
  2013-08-28 18:32 ` Theodore Ts'o
  6 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2013-04-15 12:01 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Sun 14-04-13 23:01:33, Dmitry Monakhov wrote:
> use wait_for_stable_page() instead of wait_on_page_writeback()
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/inode.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 62189c8..1be5827 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1028,7 +1028,8 @@ retry_journal:
>  		ext4_journal_stop(handle);
>  		goto retry_grab;
>  	}
> -	wait_on_page_writeback(page);
> +	/* In case writeback began while the page was unlocked */
> +	wait_for_stable_page(page);
>  
>  	if (ext4_should_dioread_nolock(inode))
>  		ret = __block_write_begin(page, pos, len, ext4_get_block_write);
> @@ -2715,7 +2716,7 @@ retry_journal:
>  		goto retry_grab;
>  	}
>  	/* In case writeback began while the page was unlocked */
> -	wait_on_page_writeback(page);
> +	wait_for_stable_page(page);
>  
>  	ret = __block_write_begin(page, pos, len, ext4_da_get_block_prep);
>  	if (ret < 0) {
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit
  2013-04-14 19:01 ` [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
@ 2013-04-15 12:29   ` Jan Kara
  2013-04-17  7:39     ` Dmitry Monakhov
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2013-04-15 12:29 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Sun 14-04-13 23:01:34, Dmitry Monakhov wrote:
> Current implementation of jbd2_journal_force_commit() is suboptimal because
> result in empty and useless commits. But callers just want to force and wait
> any unfinished commits. We already has jbd2_journal_force_commit_nested()
> which does exactly what we want, except we are guaranteed that we do not hold
> journal transaction open.
  Umm, I have a questions regarding this patch:
Grep shows there are just two places in the code which use
ext4_force_commit() (and thus jbd2_journal_force_commit()). These are
ext4_write_inode() and ext4_sync_file() (in data=journal mode). The first
callsite can use _nested() variant immediately as we even assert there's
no handle started. The second call site can use the _nested variant as well
because if we had the transaction started when entering ext4_sync_file() we
would have serious problems (lock inversion, deadlocks in !data=journal
modes) anyway. So IMO there's no need for !nested variant at all (at least
in ext4, ocfs2 uses it as well, IMHO it can be converted as well but that's
a different topic). Thoughts?

									Honza

> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/jbd2/journal.c     |   58 ++++++++++++++++++++++++++++++++++++------------
>  fs/jbd2/transaction.c |   23 -------------------
>  include/linux/jbd2.h  |    2 +-
>  3 files changed, 44 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 886ec2f..cfe0aca 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -566,21 +566,19 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
>  }
>  
>  /*
> - * Force and wait upon a commit if the calling process is not within
> - * transaction.  This is used for forcing out undo-protected data which contains
> - * bitmaps, when the fs is running out of space.
> - *
> - * We can only force the running transaction if we don't have an active handle;
> - * otherwise, we will deadlock.
> - *
> - * Returns true if a transaction was started.
> + * Force and wait any uncommitted transactions.  We can only force the running
> + * transaction if we don't have an active handle, otherwise, we will deadlock.
>   */
> -int jbd2_journal_force_commit_nested(journal_t *journal)
> +int __jbd2_journal_force_commit(journal_t *journal, int nested, int *progress)
>  {
>  	transaction_t *transaction = NULL;
>  	tid_t tid;
> -	int need_to_start = 0;
> +	int need_to_start = 0, ret = 0;
>  
> +	J_ASSERT(!current->journal_info || nested);
> +
> +	if (progress)
> +		*progress = 0;
>  	read_lock(&journal->j_state_lock);
>  	if (journal->j_running_transaction && !current->journal_info) {
>  		transaction = journal->j_running_transaction;
> @@ -590,16 +588,46 @@ int jbd2_journal_force_commit_nested(journal_t *journal)
>  		transaction = journal->j_committing_transaction;
>  
>  	if (!transaction) {
> +		/* Nothing to commit */
>  		read_unlock(&journal->j_state_lock);
> -		return 0;	/* Nothing to retry */
> +		return 0;
>  	}
> -
>  	tid = transaction->t_tid;
>  	read_unlock(&journal->j_state_lock);
>  	if (need_to_start)
> -		jbd2_log_start_commit(journal, tid);
> -	jbd2_log_wait_commit(journal, tid);
> -	return 1;
> +		ret = jbd2_log_start_commit(journal, tid);
> +	if (!ret)
> +		ret = jbd2_log_wait_commit(journal, tid);
> +	if (!ret && progress)
> +		*progress = 1;
> +
> +	return ret;
> +}
> +
> +/**
> + * Force and wait upon a commit if the calling process is not within
> + * transaction.  This is used for forcing out undo-protected data which contains
> + * bitmaps, when the fs is running out of space.
> + *
> + * @journal: journal to force
> + * Returns true if progress was made.
> + */
> +int jbd2_journal_force_commit_nested(journal_t *journal)
> +{
> +	int progress;
> +
> +	__jbd2_journal_force_commit(journal, 1, &progress);
> +	return progress;
> +}
> +
> +/**
> + * int journal_force_commit() - force any uncommitted transactions
> + * @journal: journal to force
> + *
> + */
> +int jbd2_journal_force_commit(journal_t *journal)
> +{
> +	return __jbd2_journal_force_commit(journal, 0, NULL);
>  }
>  
>  /*
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 325bc01..bb2a5c0 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1515,29 +1515,6 @@ int jbd2_journal_stop(handle_t *handle)
>  	return err;
>  }
>  
> -/**
> - * int jbd2_journal_force_commit() - force any uncommitted transactions
> - * @journal: journal to force
> - *
> - * For synchronous operations: force any uncommitted transactions
> - * to disk.  May seem kludgy, but it reuses all the handle batching
> - * code in a very simple manner.
> - */
> -int jbd2_journal_force_commit(journal_t *journal)
> -{
> -	handle_t *handle;
> -	int ret;
> -
> -	handle = jbd2_journal_start(journal, 1);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> -	} else {
> -		handle->h_sync = 1;
> -		ret = jbd2_journal_stop(handle);
> -	}
> -	return ret;
> -}
> -
>  /*
>   *
>   * List management code snippets: various functions for manipulating the
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index f9fe889..7b38517 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1125,6 +1125,7 @@ extern void	   jbd2_journal_ack_err    (journal_t *);
>  extern int	   jbd2_journal_clear_err  (journal_t *);
>  extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
>  extern int	   jbd2_journal_force_commit(journal_t *);
> +extern int	   jbd2_journal_force_commit_nested(journal_t *);
>  extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
>  extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
>  				struct jbd2_inode *inode, loff_t new_size);
> @@ -1199,7 +1200,6 @@ int __jbd2_log_space_left(journal_t *); /* Called with journal locked */
>  int jbd2_log_start_commit(journal_t *journal, tid_t tid);
>  int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
>  int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
> -int jbd2_journal_force_commit_nested(journal_t *journal);
>  int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
>  int jbd2_complete_transaction(journal_t *journal, tid_t tid);
>  int jbd2_log_do_checkpoint(journal_t *journal);
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] ext4: fix data integrity for ext4_sync_fs
  2013-04-14 19:01 ` [PATCH 3/5] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
@ 2013-04-15 13:59   ` Jan Kara
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2013-04-15 13:59 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Sun 14-04-13 23:01:35, Dmitry Monakhov wrote:
> Inode's data or non journaled quota may be written w/o jounral so we must
> send a barrier at the end of ext4_sync_fs. But it can be skipped if journal
> commit will do it for us.
> 
> Also fix data integrity for nojournal mode.
  Looks good, just some nits below.

> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index cfe0aca..35de8a0 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -668,14 +668,30 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid)
>   * may or may not have sent the barrier. Used to avoid sending barrier
>   * twice in common cases.
>   */
> -int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid)
> +int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t *wait_tid)
  I don't like the pointer for wait_tid. It looks somewhat confusing. Can't
we just either create a function to return the last transaction's tid and
pass that value to jbd2_trans_will_send_data_barrier() or create another
function without 'tid' argument which would take the last transaction?

								Honza
>  {
>  	int ret = 0;
>  	transaction_t *commit_trans;
> +	tid_t tid;
>  
>  	if (!(journal->j_flags & JBD2_BARRIER))
>  		return 0;
>  	read_lock(&journal->j_state_lock);
> +
> +	/* Caller want to wait some specific transaction? */
> +	if (wait_tid)
> +		tid = *wait_tid;
> +	else  {
> +		/* Most recent uncommitted transaction */
> +		if (journal->j_running_transaction) {
> +			ret = 1;
> +			goto out;
> +		}
> +		if (!journal->j_commit_sequence) {
> +			goto out;
> +		}
> +		tid = journal->j_commit_sequence;
> +	}
>  	/* Transaction already committed? */
>  	if (tid_geq(journal->j_commit_sequence, tid))
>  		goto out;
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 7b38517..8d77790 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1203,7 +1203,7 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
>  int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
>  int jbd2_complete_transaction(journal_t *journal, tid_t tid);
>  int jbd2_log_do_checkpoint(journal_t *journal);
> -int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
> +int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t *tid);
>  
>  void __jbd2_log_wait_for_space(journal_t *journal);
>  extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit
  2013-04-15 12:29   ` Jan Kara
@ 2013-04-17  7:39     ` Dmitry Monakhov
  2013-04-18 18:07       ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Monakhov @ 2013-04-17  7:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, jack

On Mon, 15 Apr 2013 14:29:13 +0200, Jan Kara <jack@suse.cz> wrote:
> On Sun 14-04-13 23:01:34, Dmitry Monakhov wrote:
> > Current implementation of jbd2_journal_force_commit() is suboptimal because
> > result in empty and useless commits. But callers just want to force and wait
> > any unfinished commits. We already has jbd2_journal_force_commit_nested()
> > which does exactly what we want, except we are guaranteed that we do not hold
> > journal transaction open.
>   Umm, I have a questions regarding this patch:
> Grep shows there are just two places in the code which use
> ext4_force_commit() (and thus jbd2_journal_force_commit()). These are
> ext4_write_inode() and ext4_sync_file() (in data=journal mode). The first
> callsite can use _nested() variant immediately as we even assert there's
> no handle started. The second call site can use the _nested variant as well
> because if we had the transaction started when entering ext4_sync_file() we
> would have serious problems (lock inversion, deadlocks in !data=journal
> modes) anyway. So IMO there's no need for !nested variant at all (at least
> in ext4, ocfs2 uses it as well, IMHO it can be converted as well but that's
> a different topic). Thoughts?
I'm not sure that I completely understand what you meant, but it seems
incorrect to use jbd2_journal_force_commit_nested() in 
ext4_write_inode() and ext4_sync_file(). Because nested variant has
probabilistic behavior, It may skip real transaction commit if we hold
a transaction running.  ext4_write_inode() and ext4_sync_file() 
are the functions where we demand deterministic behavior. If we silently
miss real transaction commit because current->journal_info != NULL (due
to some bugs) this breaks data integrity assumptions and it is better to
make it loud and trigger a BUGON. This also true for ocfs2_sync_file().
At the same time ocfs2_file_aio_write() looks like a candidate for xxx_nested().
> 
> 									Honza
> 
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > ---
> >  fs/jbd2/journal.c     |   58 ++++++++++++++++++++++++++++++++++++------------
> >  fs/jbd2/transaction.c |   23 -------------------
> >  include/linux/jbd2.h  |    2 +-
> >  3 files changed, 44 insertions(+), 39 deletions(-)
> > 
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 886ec2f..cfe0aca 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -566,21 +566,19 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
> >  }
> >  
> >  /*
> > - * Force and wait upon a commit if the calling process is not within
> > - * transaction.  This is used for forcing out undo-protected data which contains
> > - * bitmaps, when the fs is running out of space.
> > - *
> > - * We can only force the running transaction if we don't have an active handle;
> > - * otherwise, we will deadlock.
> > - *
> > - * Returns true if a transaction was started.
> > + * Force and wait any uncommitted transactions.  We can only force the running
> > + * transaction if we don't have an active handle, otherwise, we will deadlock.
> >   */
> > -int jbd2_journal_force_commit_nested(journal_t *journal)
> > +int __jbd2_journal_force_commit(journal_t *journal, int nested, int *progress)
> >  {
> >  	transaction_t *transaction = NULL;
> >  	tid_t tid;
> > -	int need_to_start = 0;
> > +	int need_to_start = 0, ret = 0;
> >  
> > +	J_ASSERT(!current->journal_info || nested);
> > +
> > +	if (progress)
> > +		*progress = 0;
> >  	read_lock(&journal->j_state_lock);
> >  	if (journal->j_running_transaction && !current->journal_info) {
> >  		transaction = journal->j_running_transaction;
> > @@ -590,16 +588,46 @@ int jbd2_journal_force_commit_nested(journal_t *journal)
> >  		transaction = journal->j_committing_transaction;
> >  
> >  	if (!transaction) {
> > +		/* Nothing to commit */
> >  		read_unlock(&journal->j_state_lock);
> > -		return 0;	/* Nothing to retry */
> > +		return 0;
> >  	}
> > -
> >  	tid = transaction->t_tid;
> >  	read_unlock(&journal->j_state_lock);
> >  	if (need_to_start)
> > -		jbd2_log_start_commit(journal, tid);
> > -	jbd2_log_wait_commit(journal, tid);
> > -	return 1;
> > +		ret = jbd2_log_start_commit(journal, tid);
> > +	if (!ret)
> > +		ret = jbd2_log_wait_commit(journal, tid);
> > +	if (!ret && progress)
> > +		*progress = 1;
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * Force and wait upon a commit if the calling process is not within
> > + * transaction.  This is used for forcing out undo-protected data which contains
> > + * bitmaps, when the fs is running out of space.
> > + *
> > + * @journal: journal to force
> > + * Returns true if progress was made.
> > + */
> > +int jbd2_journal_force_commit_nested(journal_t *journal)
> > +{
> > +	int progress;
> > +
> > +	__jbd2_journal_force_commit(journal, 1, &progress);
> > +	return progress;
> > +}
> > +
> > +/**
> > + * int journal_force_commit() - force any uncommitted transactions
> > + * @journal: journal to force
> > + *
> > + */
> > +int jbd2_journal_force_commit(journal_t *journal)
> > +{
> > +	return __jbd2_journal_force_commit(journal, 0, NULL);
> >  }
> >  
> >  /*
> > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > index 325bc01..bb2a5c0 100644
> > --- a/fs/jbd2/transaction.c
> > +++ b/fs/jbd2/transaction.c
> > @@ -1515,29 +1515,6 @@ int jbd2_journal_stop(handle_t *handle)
> >  	return err;
> >  }
> >  
> > -/**
> > - * int jbd2_journal_force_commit() - force any uncommitted transactions
> > - * @journal: journal to force
> > - *
> > - * For synchronous operations: force any uncommitted transactions
> > - * to disk.  May seem kludgy, but it reuses all the handle batching
> > - * code in a very simple manner.
> > - */
> > -int jbd2_journal_force_commit(journal_t *journal)
> > -{
> > -	handle_t *handle;
> > -	int ret;
> > -
> > -	handle = jbd2_journal_start(journal, 1);
> > -	if (IS_ERR(handle)) {
> > -		ret = PTR_ERR(handle);
> > -	} else {
> > -		handle->h_sync = 1;
> > -		ret = jbd2_journal_stop(handle);
> > -	}
> > -	return ret;
> > -}
> > -
> >  /*
> >   *
> >   * List management code snippets: various functions for manipulating the
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index f9fe889..7b38517 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1125,6 +1125,7 @@ extern void	   jbd2_journal_ack_err    (journal_t *);
> >  extern int	   jbd2_journal_clear_err  (journal_t *);
> >  extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
> >  extern int	   jbd2_journal_force_commit(journal_t *);
> > +extern int	   jbd2_journal_force_commit_nested(journal_t *);
> >  extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
> >  extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
> >  				struct jbd2_inode *inode, loff_t new_size);
> > @@ -1199,7 +1200,6 @@ int __jbd2_log_space_left(journal_t *); /* Called with journal locked */
> >  int jbd2_log_start_commit(journal_t *journal, tid_t tid);
> >  int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
> >  int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
> > -int jbd2_journal_force_commit_nested(journal_t *journal);
> >  int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
> >  int jbd2_complete_transaction(journal_t *journal, tid_t tid);
> >  int jbd2_log_do_checkpoint(journal_t *journal);
> > -- 
> > 1.7.1
> > 
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit
  2013-04-17  7:39     ` Dmitry Monakhov
@ 2013-04-18 18:07       ` Jan Kara
  2013-04-22  8:11         ` Dmitry Monakhov
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2013-04-18 18:07 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, linux-ext4

On Wed 17-04-13 11:39:27, Dmitry Monakhov wrote:
> On Mon, 15 Apr 2013 14:29:13 +0200, Jan Kara <jack@suse.cz> wrote:
> > On Sun 14-04-13 23:01:34, Dmitry Monakhov wrote:
> > > Current implementation of jbd2_journal_force_commit() is suboptimal because
> > > result in empty and useless commits. But callers just want to force and wait
> > > any unfinished commits. We already has jbd2_journal_force_commit_nested()
> > > which does exactly what we want, except we are guaranteed that we do not hold
> > > journal transaction open.
> >   Umm, I have a questions regarding this patch:
> > Grep shows there are just two places in the code which use
> > ext4_force_commit() (and thus jbd2_journal_force_commit()). These are
> > ext4_write_inode() and ext4_sync_file() (in data=journal mode). The first
> > callsite can use _nested() variant immediately as we even assert there's
> > no handle started. The second call site can use the _nested variant as well
> > because if we had the transaction started when entering ext4_sync_file() we
> > would have serious problems (lock inversion, deadlocks in !data=journal
> > modes) anyway. So IMO there's no need for !nested variant at all (at least
> > in ext4, ocfs2 uses it as well, IMHO it can be converted as well but that's
> > a different topic). Thoughts?
> I'm not sure that I completely understand what you meant, but it seems
> incorrect to use jbd2_journal_force_commit_nested() in 
> ext4_write_inode() and ext4_sync_file(). Because nested variant has
> probabilistic behavior, It may skip real transaction commit if we hold
> a transaction running.  ext4_write_inode() and ext4_sync_file() 
> are the functions where we demand deterministic behavior. If we silently
> miss real transaction commit because current->journal_info != NULL (due
> to some bugs) this breaks data integrity assumptions and it is better to
> make it loud and trigger a BUGON.
  I see. I was confused by the fact that 'nested' argument got used only in
the assertion but now I see why that is.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit
  2013-04-18 18:07       ` Jan Kara
@ 2013-04-22  8:11         ` Dmitry Monakhov
  2013-04-23  8:51           ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Monakhov @ 2013-04-22  8:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jan Kara, linux-ext4

On Thu, 18 Apr 2013 20:07:27 +0200, Jan Kara <jack@suse.cz> wrote:
> On Wed 17-04-13 11:39:27, Dmitry Monakhov wrote:
> > On Mon, 15 Apr 2013 14:29:13 +0200, Jan Kara <jack@suse.cz> wrote:
> > > On Sun 14-04-13 23:01:34, Dmitry Monakhov wrote:
> > > > Current implementation of jbd2_journal_force_commit() is suboptimal because
> > > > result in empty and useless commits. But callers just want to force and wait
> > > > any unfinished commits. We already has jbd2_journal_force_commit_nested()
> > > > which does exactly what we want, except we are guaranteed that we do not hold
> > > > journal transaction open.
> > >   Umm, I have a questions regarding this patch:
> > > Grep shows there are just two places in the code which use
> > > ext4_force_commit() (and thus jbd2_journal_force_commit()). These are
> > > ext4_write_inode() and ext4_sync_file() (in data=journal mode). The first
> > > callsite can use _nested() variant immediately as we even assert there's
> > > no handle started. The second call site can use the _nested variant as well
> > > because if we had the transaction started when entering ext4_sync_file() we
> > > would have serious problems (lock inversion, deadlocks in !data=journal
> > > modes) anyway. So IMO there's no need for !nested variant at all (at least
> > > in ext4, ocfs2 uses it as well, IMHO it can be converted as well but that's
> > > a different topic). Thoughts?
> > I'm not sure that I completely understand what you meant, but it seems
> > incorrect to use jbd2_journal_force_commit_nested() in 
> > ext4_write_inode() and ext4_sync_file(). Because nested variant has
> > probabilistic behavior, It may skip real transaction commit if we hold
> > a transaction running.  ext4_write_inode() and ext4_sync_file() 
> > are the functions where we demand deterministic behavior. If we silently
> > miss real transaction commit because current->journal_info != NULL (due
> > to some bugs) this breaks data integrity assumptions and it is better to
> > make it loud and trigger a BUGON.
>   I see. I was confused by the fact that 'nested' argument got used only in
> the assertion but now I see why that is.
Do you give me your ACK/Reviewed  signature?
> 
> 									Honza
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics
  2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
                   ` (4 preceding siblings ...)
  2013-04-15 12:01 ` [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Jan Kara
@ 2013-04-22 12:36 ` Zheng Liu
  2013-08-28 18:32 ` Theodore Ts'o
  6 siblings, 0 replies; 15+ messages in thread
From: Zheng Liu @ 2013-04-22 12:36 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Sun, Apr 14, 2013 at 11:01:33PM +0400, Dmitry Monakhov wrote:
> use wait_for_stable_page() instead of wait_on_page_writeback()
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Regards,
                                                - Zheng
> ---
>  fs/ext4/inode.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 62189c8..1be5827 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1028,7 +1028,8 @@ retry_journal:
>  		ext4_journal_stop(handle);
>  		goto retry_grab;
>  	}
> -	wait_on_page_writeback(page);
> +	/* In case writeback began while the page was unlocked */
> +	wait_for_stable_page(page);
>  
>  	if (ext4_should_dioread_nolock(inode))
>  		ret = __block_write_begin(page, pos, len, ext4_get_block_write);
> @@ -2715,7 +2716,7 @@ retry_journal:
>  		goto retry_grab;
>  	}
>  	/* In case writeback began while the page was unlocked */
> -	wait_on_page_writeback(page);
> +	wait_for_stable_page(page);
>  
>  	ret = __block_write_begin(page, pos, len, ext4_da_get_block_prep);
>  	if (ret < 0) {
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit
  2013-04-22  8:11         ` Dmitry Monakhov
@ 2013-04-23  8:51           ` Jan Kara
  2013-08-28 18:33             ` Theodore Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2013-04-23  8:51 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, linux-ext4

On Mon 22-04-13 12:11:11, Dmitry Monakhov wrote:
> On Thu, 18 Apr 2013 20:07:27 +0200, Jan Kara <jack@suse.cz> wrote:
> > On Wed 17-04-13 11:39:27, Dmitry Monakhov wrote:
> > > On Mon, 15 Apr 2013 14:29:13 +0200, Jan Kara <jack@suse.cz> wrote:
> > > > On Sun 14-04-13 23:01:34, Dmitry Monakhov wrote:
> > > > > Current implementation of jbd2_journal_force_commit() is suboptimal because
> > > > > result in empty and useless commits. But callers just want to force and wait
> > > > > any unfinished commits. We already has jbd2_journal_force_commit_nested()
> > > > > which does exactly what we want, except we are guaranteed that we do not hold
> > > > > journal transaction open.
> > > >   Umm, I have a questions regarding this patch:
> > > > Grep shows there are just two places in the code which use
> > > > ext4_force_commit() (and thus jbd2_journal_force_commit()). These are
> > > > ext4_write_inode() and ext4_sync_file() (in data=journal mode). The first
> > > > callsite can use _nested() variant immediately as we even assert there's
> > > > no handle started. The second call site can use the _nested variant as well
> > > > because if we had the transaction started when entering ext4_sync_file() we
> > > > would have serious problems (lock inversion, deadlocks in !data=journal
> > > > modes) anyway. So IMO there's no need for !nested variant at all (at least
> > > > in ext4, ocfs2 uses it as well, IMHO it can be converted as well but that's
> > > > a different topic). Thoughts?
> > > I'm not sure that I completely understand what you meant, but it seems
> > > incorrect to use jbd2_journal_force_commit_nested() in 
> > > ext4_write_inode() and ext4_sync_file(). Because nested variant has
> > > probabilistic behavior, It may skip real transaction commit if we hold
> > > a transaction running.  ext4_write_inode() and ext4_sync_file() 
> > > are the functions where we demand deterministic behavior. If we silently
> > > miss real transaction commit because current->journal_info != NULL (due
> > > to some bugs) this breaks data integrity assumptions and it is better to
> > > make it loud and trigger a BUGON.
> >   I see. I was confused by the fact that 'nested' argument got used only in
> > the assertion but now I see why that is.
> Do you give me your ACK/Reviewed  signature?
  Looking at it with a fresh mind I think there's still one bug:
 	read_unlock(&journal->j_state_lock);
 	if (need_to_start)
-		jbd2_log_start_commit(journal, tid);
-	jbd2_log_wait_commit(journal, tid);
-	return 1;
+		ret = jbd2_log_start_commit(journal, tid);
+	if (!ret)
+		ret = jbd2_log_wait_commit(journal, tid);
  jbd2_log_start_commit() will return 0 if it didn't wake jbd2 thread and 1
if it did. In either case we should call jbd2_log_wait_commit()...

+	if (!ret && progress)
+		*progress = 1;

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics
  2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
                   ` (5 preceding siblings ...)
  2013-04-22 12:36 ` Zheng Liu
@ 2013-08-28 18:32 ` Theodore Ts'o
  6 siblings, 0 replies; 15+ messages in thread
From: Theodore Ts'o @ 2013-08-28 18:32 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Sun, Apr 14, 2013 at 11:01:33PM +0400, Dmitry Monakhov wrote:
> use wait_for_stable_page() instead of wait_on_page_writeback()
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit
  2013-04-23  8:51           ` Jan Kara
@ 2013-08-28 18:33             ` Theodore Ts'o
  0 siblings, 0 replies; 15+ messages in thread
From: Theodore Ts'o @ 2013-08-28 18:33 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, linux-ext4

On Tue, Apr 23, 2013 at 10:51:40AM +0200, Jan Kara wrote:
> > Do you give me your ACK/Reviewed  signature?
>   Looking at it with a fresh mind I think there's still one bug:
>  	read_unlock(&journal->j_state_lock);
>  	if (need_to_start)
> -		jbd2_log_start_commit(journal, tid);
> -	jbd2_log_wait_commit(journal, tid);
> -	return 1;
> +		ret = jbd2_log_start_commit(journal, tid);
> +	if (!ret)
> +		ret = jbd2_log_wait_commit(journal, tid);
>   jbd2_log_start_commit() will return 0 if it didn't wake jbd2 thread and 1
> if it did. In either case we should call jbd2_log_wait_commit()...
> 
> +	if (!ret && progress)
> +		*progress = 1;

I was going through old patch series, and I came across this one.

Dmitry, did you have a chance to look at Jan's comments?  

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-08-28 18:33 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-14 19:01 [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Dmitry Monakhov
2013-04-14 19:01 ` [PATCH 2/5] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
2013-04-15 12:29   ` Jan Kara
2013-04-17  7:39     ` Dmitry Monakhov
2013-04-18 18:07       ` Jan Kara
2013-04-22  8:11         ` Dmitry Monakhov
2013-04-23  8:51           ` Jan Kara
2013-08-28 18:33             ` Theodore Ts'o
2013-04-14 19:01 ` [PATCH 3/5] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
2013-04-15 13:59   ` Jan Kara
2013-04-14 19:01 ` [PATCH 4/5] jbd: optimize journal_force_commit Dmitry Monakhov
2013-04-14 19:01 ` [PATCH 5/5] ext3: fix data integrity for ext4_sync_fs Dmitry Monakhov
2013-04-15 12:01 ` [PATCH 1/5] ext4: convert write_begin methods to stable_page_writes semantics Jan Kara
2013-04-22 12:36 ` Zheng Liu
2013-08-28 18:32 ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).