linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/6] jbd2: don't hold j_state_lock while calling wake_up()
@ 2014-03-09  6:05 Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 2/6] jbd2: calculate statistics without holding j_state_lock and j_list_lock Theodore Ts'o
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Theodore Ts'o @ 2014-03-09  6:05 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

The j_state_lock is one of the hottest locks in the jbd2 layer and
thus one of its scalability bottlenecks.

We don't need to be holding the j_state_lock while we are calling
wake_up(&journal->j_wait_commit), so release the lock a little bit
earlier.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd2/journal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 244b6f6..67b8e30 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -302,8 +302,8 @@ static void journal_kill_thread(journal_t *journal)
 	journal->j_flags |= JBD2_UNMOUNT;
 
 	while (journal->j_task) {
-		wake_up(&journal->j_wait_commit);
 		write_unlock(&journal->j_state_lock);
+		wake_up(&journal->j_wait_commit);
 		wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
 		write_lock(&journal->j_state_lock);
 	}
@@ -710,8 +710,8 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
 	while (tid_gt(tid, journal->j_commit_sequence)) {
 		jbd_debug(1, "JBD2: want %d, j_commit_sequence=%d\n",
 				  tid, journal->j_commit_sequence);
-		wake_up(&journal->j_wait_commit);
 		read_unlock(&journal->j_state_lock);
+		wake_up(&journal->j_wait_commit);
 		wait_event(journal->j_wait_done_commit,
 				!tid_gt(tid, journal->j_commit_sequence));
 		read_lock(&journal->j_state_lock);
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/6] jbd2: calculate statistics without holding j_state_lock and j_list_lock
  2014-03-09  6:05 [PATCH 1/6] jbd2: don't hold j_state_lock while calling wake_up() Theodore Ts'o
@ 2014-03-09  6:05 ` Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 3/6] jbd2: add transaction to checkpoint list earlier Theodore Ts'o
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2014-03-09  6:05 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

The two hottest locks, and thus the biggest scalability bottlenecks,
in the jbd2 layer, are the j_list_lock and j_state_lock.  This has
inspired some people to do some truly unnatural things[1].

[1] https://www.usenix.org/system/files/conference/fast14/fast14-paper_kang.pdf

We don't need to be holding both j_state_lock and j_list_lock while
calculating the journal statistics, so move those calculations to the
very end of jbd2_journal_commit_transaction.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd2/commit.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 765b31d..af36252 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -1083,24 +1083,7 @@ restart_loop:
 		atomic_read(&commit_transaction->t_handle_count);
 	trace_jbd2_run_stats(journal->j_fs_dev->bd_dev,
 			     commit_transaction->t_tid, &stats.run);
-
-	/*
-	 * Calculate overall stats
-	 */
-	spin_lock(&journal->j_history_lock);
-	journal->j_stats.ts_tid++;
-	if (commit_transaction->t_requested)
-		journal->j_stats.ts_requested++;
-	journal->j_stats.run.rs_wait += stats.run.rs_wait;
-	journal->j_stats.run.rs_request_delay += stats.run.rs_request_delay;
-	journal->j_stats.run.rs_running += stats.run.rs_running;
-	journal->j_stats.run.rs_locked += stats.run.rs_locked;
-	journal->j_stats.run.rs_flushing += stats.run.rs_flushing;
-	journal->j_stats.run.rs_logging += stats.run.rs_logging;
-	journal->j_stats.run.rs_handle_count += stats.run.rs_handle_count;
-	journal->j_stats.run.rs_blocks += stats.run.rs_blocks;
-	journal->j_stats.run.rs_blocks_logged += stats.run.rs_blocks_logged;
-	spin_unlock(&journal->j_history_lock);
+	stats.ts_requested = (commit_transaction->t_requested) ? 1 : 0;
 
 	commit_transaction->t_state = T_COMMIT_CALLBACK;
 	J_ASSERT(commit_transaction == journal->j_committing_transaction);
@@ -1157,4 +1140,21 @@ restart_loop:
 	spin_unlock(&journal->j_list_lock);
 	write_unlock(&journal->j_state_lock);
 	wake_up(&journal->j_wait_done_commit);
+
+	/*
+	 * Calculate overall stats
+	 */
+	spin_lock(&journal->j_history_lock);
+	journal->j_stats.ts_tid++;
+	journal->j_stats.ts_requested += stats.ts_requested;
+	journal->j_stats.run.rs_wait += stats.run.rs_wait;
+	journal->j_stats.run.rs_request_delay += stats.run.rs_request_delay;
+	journal->j_stats.run.rs_running += stats.run.rs_running;
+	journal->j_stats.run.rs_locked += stats.run.rs_locked;
+	journal->j_stats.run.rs_flushing += stats.run.rs_flushing;
+	journal->j_stats.run.rs_logging += stats.run.rs_logging;
+	journal->j_stats.run.rs_handle_count += stats.run.rs_handle_count;
+	journal->j_stats.run.rs_blocks += stats.run.rs_blocks;
+	journal->j_stats.run.rs_blocks_logged += stats.run.rs_blocks_logged;
+	spin_unlock(&journal->j_history_lock);
 }
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/6] jbd2: add transaction to checkpoint list earlier
  2014-03-09  6:05 [PATCH 1/6] jbd2: don't hold j_state_lock while calling wake_up() Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 2/6] jbd2: calculate statistics without holding j_state_lock and j_list_lock Theodore Ts'o
@ 2014-03-09  6:05 ` Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 4/6] jbd2: check jh->b_transaction without taking j_list_lock Theodore Ts'o
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2014-03-09  6:05 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

We don't otherwise need j_list_lock during the rest of commit phase
#7, so add the transaction to the checkpoint list at the very end of
commit phase #6.  This allows us to drop j_list_lock earlier, which is
a good thing since it is a super hot lock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd2/commit.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index af36252..5f26139 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -1065,6 +1065,25 @@ restart_loop:
 		goto restart_loop;
 	}
 
+	/* Add the transaction to the checkpoint list
+	 * __journal_remove_checkpoint() can not destroy transaction
+	 * under us because it is not marked as T_FINISHED yet */
+	if (journal->j_checkpoint_transactions == NULL) {
+		journal->j_checkpoint_transactions = commit_transaction;
+		commit_transaction->t_cpnext = commit_transaction;
+		commit_transaction->t_cpprev = commit_transaction;
+	} else {
+		commit_transaction->t_cpnext =
+			journal->j_checkpoint_transactions;
+		commit_transaction->t_cpprev =
+			commit_transaction->t_cpnext->t_cpprev;
+		commit_transaction->t_cpnext->t_cpprev =
+			commit_transaction;
+		commit_transaction->t_cpprev->t_cpnext =
+				commit_transaction;
+	}
+	spin_unlock(&journal->j_list_lock);
+
 	/* Done with this transaction! */
 
 	jbd_debug(3, "JBD2: commit phase 7\n");
@@ -1103,24 +1122,6 @@ restart_loop:
 
 	write_unlock(&journal->j_state_lock);
 
-	if (journal->j_checkpoint_transactions == NULL) {
-		journal->j_checkpoint_transactions = commit_transaction;
-		commit_transaction->t_cpnext = commit_transaction;
-		commit_transaction->t_cpprev = commit_transaction;
-	} else {
-		commit_transaction->t_cpnext =
-			journal->j_checkpoint_transactions;
-		commit_transaction->t_cpprev =
-			commit_transaction->t_cpnext->t_cpprev;
-		commit_transaction->t_cpnext->t_cpprev =
-			commit_transaction;
-		commit_transaction->t_cpprev->t_cpnext =
-				commit_transaction;
-	}
-	spin_unlock(&journal->j_list_lock);
-	/* Drop all spin_locks because commit_callback may be block.
-	 * __journal_remove_checkpoint() can not destroy transaction
-	 * under us because it is not marked as T_FINISHED yet */
 	if (journal->j_commit_callback)
 		journal->j_commit_callback(journal, commit_transaction);
 
@@ -1131,7 +1132,7 @@ restart_loop:
 	write_lock(&journal->j_state_lock);
 	spin_lock(&journal->j_list_lock);
 	commit_transaction->t_state = T_FINISHED;
-	/* Recheck checkpoint lists after j_list_lock was dropped */
+	/* Check if the transaction can be dropped now that we are finished */
 	if (commit_transaction->t_checkpoint_list == NULL &&
 	    commit_transaction->t_checkpoint_io_list == NULL) {
 		__jbd2_journal_drop_transaction(journal, commit_transaction);
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/6] jbd2: check jh->b_transaction without taking j_list_lock
  2014-03-09  6:05 [PATCH 1/6] jbd2: don't hold j_state_lock while calling wake_up() Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 2/6] jbd2: calculate statistics without holding j_state_lock and j_list_lock Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 3/6] jbd2: add transaction to checkpoint list earlier Theodore Ts'o
@ 2014-03-09  6:05 ` Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 5/6] jbd2: minimize region locked by j_list_lock in journal_get_create_access() Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 6/6] jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() Theodore Ts'o
  4 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2014-03-09  6:05 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

jh->b_transaction is adequately protected for reading by the
jbd_lock_bh_state(bh), so we don't need to take j_list_lock in
__journal_try_to_free_buffer().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd2/transaction.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 60bb365..78900a1 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1821,11 +1821,11 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
 	if (buffer_locked(bh) || buffer_dirty(bh))
 		goto out;
 
-	if (jh->b_next_transaction != NULL)
+	if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
 		goto out;
 
 	spin_lock(&journal->j_list_lock);
-	if (jh->b_cp_transaction != NULL && jh->b_transaction == NULL) {
+	if (jh->b_cp_transaction != NULL) {
 		/* written-back checkpointed metadata buffer */
 		JBUFFER_TRACE(jh, "remove from checkpoint list");
 		__jbd2_journal_remove_checkpoint(jh);
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 5/6] jbd2: minimize region locked by j_list_lock in journal_get_create_access()
  2014-03-09  6:05 [PATCH 1/6] jbd2: don't hold j_state_lock while calling wake_up() Theodore Ts'o
                   ` (2 preceding siblings ...)
  2014-03-09  6:05 ` [PATCH 4/6] jbd2: check jh->b_transaction without taking j_list_lock Theodore Ts'o
@ 2014-03-09  6:05 ` Theodore Ts'o
  2014-03-09  6:05 ` [PATCH 6/6] jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() Theodore Ts'o
  4 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2014-03-09  6:05 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

It's not needed until we start trying to modifying fields in the
journal_head which are protected by j_list_lock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd2/transaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 78900a1..357f3dc 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1073,7 +1073,6 @@ int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)
 	 * reused here.
 	 */
 	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
 	J_ASSERT_JH(jh, (jh->b_transaction == transaction ||
 		jh->b_transaction == NULL ||
 		(jh->b_transaction == journal->j_committing_transaction &&
@@ -1096,12 +1095,14 @@ int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)
 		jh->b_modified = 0;
 
 		JBUFFER_TRACE(jh, "file as BJ_Reserved");
+		spin_lock(&journal->j_list_lock);
 		__jbd2_journal_file_buffer(jh, transaction, BJ_Reserved);
 	} else if (jh->b_transaction == journal->j_committing_transaction) {
 		/* first access by this transaction */
 		jh->b_modified = 0;
 
 		JBUFFER_TRACE(jh, "set next transaction");
+		spin_lock(&journal->j_list_lock);
 		jh->b_next_transaction = transaction;
 	}
 	spin_unlock(&journal->j_list_lock);
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 6/6] jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
  2014-03-09  6:05 [PATCH 1/6] jbd2: don't hold j_state_lock while calling wake_up() Theodore Ts'o
                   ` (3 preceding siblings ...)
  2014-03-09  6:05 ` [PATCH 5/6] jbd2: minimize region locked by j_list_lock in journal_get_create_access() Theodore Ts'o
@ 2014-03-09  6:05 ` Theodore Ts'o
  4 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2014-03-09  6:05 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

It's not needed until we start trying to modifying fields in the
journal_head which are protected by j_list_lock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd2/transaction.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 357f3dc..d999b1f 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1416,7 +1416,6 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 	BUFFER_TRACE(bh, "entry");
 
 	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
 
 	if (!buffer_jbd(bh))
 		goto not_jbd;
@@ -1469,6 +1468,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 		 * we know to remove the checkpoint after we commit.
 		 */
 
+		spin_lock(&journal->j_list_lock);
 		if (jh->b_cp_transaction) {
 			__jbd2_journal_temp_unlink_buffer(jh);
 			__jbd2_journal_file_buffer(jh, transaction, BJ_Forget);
@@ -1481,6 +1481,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 				goto drop;
 			}
 		}
+		spin_unlock(&journal->j_list_lock);
 	} else if (jh->b_transaction) {
 		J_ASSERT_JH(jh, (jh->b_transaction ==
 				 journal->j_committing_transaction));
@@ -1492,7 +1493,9 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 
 		if (jh->b_next_transaction) {
 			J_ASSERT(jh->b_next_transaction == transaction);
+			spin_lock(&journal->j_list_lock);
 			jh->b_next_transaction = NULL;
+			spin_unlock(&journal->j_list_lock);
 
 			/*
 			 * only drop a reference if this transaction modified
@@ -1504,7 +1507,6 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 	}
 
 not_jbd:
-	spin_unlock(&journal->j_list_lock);
 	jbd_unlock_bh_state(bh);
 	__brelse(bh);
 drop:
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-03-09  6:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-09  6:05 [PATCH 1/6] jbd2: don't hold j_state_lock while calling wake_up() Theodore Ts'o
2014-03-09  6:05 ` [PATCH 2/6] jbd2: calculate statistics without holding j_state_lock and j_list_lock Theodore Ts'o
2014-03-09  6:05 ` [PATCH 3/6] jbd2: add transaction to checkpoint list earlier Theodore Ts'o
2014-03-09  6:05 ` [PATCH 4/6] jbd2: check jh->b_transaction without taking j_list_lock Theodore Ts'o
2014-03-09  6:05 ` [PATCH 5/6] jbd2: minimize region locked by j_list_lock in journal_get_create_access() Theodore Ts'o
2014-03-09  6:05 ` [PATCH 6/6] jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).