[PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync
@ 2024-02-27 21:25 Phillip Susi
  2024-02-29  8:07 ` Theodore Tso
  0 siblings, 1 reply; 5+ messages in thread
From: Phillip Susi @ 2024-02-27 21:25 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4, Phillip Susi

I noticed that every time I sync ( which happens automatically when
you suspend to ram ), ext4 issues a flush to the block device, even
though there have been no writes to flush.  This appears to be because
jbd2_trans_will_send_data_barrier() returns a 0 when no transaction
has been started.  The intent appears to be that a transaction that
has completed should return 0, and that when there is NO transaction,
it should return a 1, but the tests were in the wrong order, leading
to the 0 to be returned before checking for the absense of a
transaction at all.  Reversing the order allows my disk to remain in
runtime_pm when syncing.

I *think* this is correct, but I'm not very familliar with jbd2, so it
may have unintended consequences.  What do you think?
---
 fs/jbd2/journal.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b6c114c11b97..be13dae767be 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -632,14 +632,16 @@ int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid)
 	if (!(journal->j_flags & JBD2_BARRIER))
 		return 0;
 	read_lock(&journal->j_state_lock);
-	/* Transaction already committed? */
-	if (tid_geq(journal->j_commit_sequence, tid))
-		goto out;
 	commit_trans = journal->j_committing_transaction;
 	if (!commit_trans || commit_trans->t_tid != tid) {
 		ret = 1;
 		goto out;
 	}
+	/* Transaction already committed? */
+	if (tid_geq(journal->j_commit_sequence, tid))
+	{
+		goto out;
+	}
 	/*
 	 * Transaction is being committed and we already proceeded to
 	 * submitting a flush to fs partition?
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync
  2024-02-27 21:25 [PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync Phillip Susi
@ 2024-02-29  8:07 ` Theodore Tso
  2024-02-29 14:23   ` Phillip Susi
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Tso @ 2024-02-29  8:07 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-ext4

On Tue, Feb 27, 2024 at 04:25:46PM -0500, Phillip Susi wrote:
> I noticed that every time I sync ( which happens automatically when
> you suspend to ram ), ext4 issues a flush to the block device, even
> though there have been no writes to flush.  This appears to be because
> jbd2_trans_will_send_data_barrier() returns a 0 when no transaction
> has been started.  The intent appears to be that a transaction that
> has completed should return 0, and that when there is NO transaction,
> it should return a 1, but the tests were in the wrong order, leading
> to the 0 to be returned before checking for the absense of a
> transaction at all.  Reversing the order allows my disk to remain in
> runtime_pm when syncing.
> 
> I *think* this is correct, but I'm not very familliar with jbd2, so it
> may have unintended consequences.  What do you think?

Yeah, this change is going to problems.  The basic idea here is if
when we request that a transaction to commit, will it issue a a
commit?  If so, then fsync(2) doesn't need to issue a barrier (i.e., a
cache flush command).

So for example, if a database does an overwriting write to a file
block which is already allocated, and then follows it up with a
fdatasync(2), there won't be any need to make any metadata changes as
part of writing out the changed block.  Hence, we won't need to start
a new jbd2 transaction, and in that case, current transaction has
already commited, so the jbd2 layer won't need to do anything, and so
it won't have issued a commit.  In that case,
jbd2_trans_will_send_data_barrier() needs to return false, so that
fdatasync(2) will actually issue a cache flush command.

The patch you've proposed will cause fdatasync(2) to not issue a
barrier, which could lead to the write to the database file getting
lost after a power fail event, which would make the database
adminisrtator very sad.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync
  2024-02-29  8:07 ` Theodore Tso
@ 2024-02-29 14:23   ` Phillip Susi
  2024-02-29 14:58     ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Phillip Susi @ 2024-02-29 14:23 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4

"Theodore Tso" <tytso@mit.edu> writes:

> Yeah, this change is going to problems.  The basic idea here is if
> when we request that a transaction to commit, will it issue a a
> commit?  If so, then fsync(2) doesn't need to issue a barrier (i.e., a
> cache flush command).
>
> So for example, if a database does an overwriting write to a file
> block which is already allocated, and then follows it up with a
> fdatasync(2), there won't be any need to make any metadata changes as
> part of writing out the changed block.  Hence, we won't need to start
> a new jbd2 transaction, and in that case, current transaction has
> already commited, so the jbd2 layer won't need to do anything, and so
> it won't have issued a commit.  In that case,
> jbd2_trans_will_send_data_barrier() needs to return false, so that
> fdatasync(2) will actually issue a cache flush command.
>
> The patch you've proposed will cause fdatasync(2) to not issue a
> barrier, which could lead to the write to the database file getting
> lost after a power fail event, which would make the database
> adminisrtator very sad.

So because no metadata changed, jbd2 will not issue a barrier to end the
transaction?  How can we fix this then?  Is there some way that jbd2 can
know whether file data has been written, and thus, issue the barrier to
close the transaction?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync
  2024-02-29 14:23   ` Phillip Susi
@ 2024-02-29 14:58     ` Theodore Ts'o
  2024-02-29 20:19       ` Phillip Susi
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2024-02-29 14:58 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-ext4

On Thu, Feb 29, 2024 at 09:23:35AM -0500, Phillip Susi wrote:
> 
> So because no metadata changed, jbd2 will not issue a barrier to end the
> transaction?  How can we fix this then?  Is there some way that jbd2 can
> know whether file data has been written, and thus, issue the barrier to
> close the transaction?

Because no metadata changed, jbd2 will not even *start* a (jbd2)
transaction as a result of that write (overwrite) to an already
allocated data block..  Since it didn't start a jbd2 transaction for
this file system operation, there's no reason to force a jbd2
transaction to close.

(Note: this could because there *is* no currently existing open
transaction, or there might be a currently open transaction, but it's
not relevent to the activity associated with the file descriptor being
fsync'ed.)

This is a critical performance optimization, because for many database
workloads, which are overwriting existing blocks and using
fdatasync(2), there is no reason to force a jbd2 transaction commit
for every single fdatasync(2) issued by the database.  However, we
still need to send a cache flush operation so that the data block is
safely persistend onto stable storage.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync
  2024-02-29 14:58     ` Theodore Ts'o
@ 2024-02-29 20:19       ` Phillip Susi
  0 siblings, 0 replies; 5+ messages in thread
From: Phillip Susi @ 2024-02-29 20:19 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

"Theodore Ts'o" <tytso@mit.edu> writes:

> Because no metadata changed, jbd2 will not even *start* a (jbd2)
> transaction as a result of that write (overwrite) to an already
> allocated data block..  Since it didn't start a jbd2 transaction for
> this file system operation, there's no reason to force a jbd2
> transaction to close.
>
> (Note: this could because there *is* no currently existing open
> transaction, or there might be a currently open transaction, but it's
> not relevent to the activity associated with the file descriptor being
> fsync'ed.)
>
> This is a critical performance optimization, because for many database
> workloads, which are overwriting existing blocks and using
> fdatasync(2), there is no reason to force a jbd2 transaction commit
> for every single fdatasync(2) issued by the database.  However, we
> still need to send a cache flush operation so that the data block is
> safely persistend onto stable storage.

So maybe what ext4's sync_fs needs to know is whether ANY writes have
been done since the last transaction committed?  Is there a way to know
that?  As long as NOTHING has been written since the last commit, then
there is no need to issue a flush.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-02-29 20:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-27 21:25 [PATCH] [RFC] Fix jbd2 to stop waking up sleeping disks on sync Phillip Susi
2024-02-29  8:07 ` Theodore Tso
2024-02-29 14:23   ` Phillip Susi
2024-02-29 14:58     ` Theodore Ts'o
2024-02-29 20:19       ` Phillip Susi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox