[PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync
@ 2025-11-03  4:07 Qu Wenruo
  2025-11-03  4:07 ` [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb() Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Qu Wenruo @ 2025-11-03  4:07 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel; +Cc: viro, brauner, jack

The first patch is a cleanup related to sync_inodes_one_sb() callback.
Since it always wait for the writeback, there is no need to pass any
parameter for it.

The second patch is a fix mostly affecting btrfs, as btrfs requires a
explicit sync_fc() call with wait == 1, to commit its super blocks,
and sync_bdevs() won't cut it at all.

However the current emergency sync never passes wait == 1, it means
btrfs will writeback all dirty data and metadata, but still no super
block update, resulting everything still pointing back to the old
data/metadata.

This lead to a problem where btrfs doesn't seem to do anything during
emergency sync.

The second patch fixes the problem by passing wait == 1 for the second
iteration of sync_fs_one_sb().

[REASON FOR RFC]
I am not sure which way should I fix the bug.

I can definitely put btrfs to ignore the @wait parameter and always do
transaction commit, that will definitely fix the bug, but btrfs will do
two transaction commits for emergency sync.
Which may or may not be a problem for emergency sync itself, but will
definitely cause a lot of unnessary small transactions during regular
sync_fs() calls and degrade the peroformance.

On the other hand, I also didn't see why we can not follow the common
pattern inside emergency_sync(), all other call sites are syncing the fs
first with nowait, then wait.
(E.g. sync_filesyastem() and ksys_sync()).

I know it's an emergency sync thus we don't want to wait, but please
also remember that sync_inodes_one_sb() is always waiting, and I'm
pretty sure we spend most of the time inside sync_inodes_one_sb(), thus
it looks more sane to fix the only exception inside fs/sync.c.

Qu Wenruo (2):
  fs: do not pass a parameter for sync_inodes_one_sb()
  fs: fully sync all fses even for an emergency sync

 fs/sync.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

-- 
2.51.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb()
  2025-11-03  4:07 [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync Qu Wenruo
@ 2025-11-03  4:07 ` Qu Wenruo
  2025-11-03 11:51   ` Christoph Hellwig
  2025-11-03  4:07 ` [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2025-11-03  4:07 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel; +Cc: viro, brauner, jack

The function sync_inodes_one_sb() will always wait for the writeback,
and ignore the optional parameter.

Explicitly pass NULL as parameter for the call sites inside
do_sync_work().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/sync.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 7e26c6f1f2a1..1081cd0e89c7 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -126,10 +126,10 @@ static void do_sync_work(struct work_struct *work)
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, NULL);
 	iterate_supers(sync_fs_one_sb, &nowait);
 	sync_bdevs(false);
-	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, NULL);
 	iterate_supers(sync_fs_one_sb, &nowait);
 	sync_bdevs(false);
 	printk("Emergency Sync complete\n");
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-03  4:07 [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync Qu Wenruo
  2025-11-03  4:07 ` [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb() Qu Wenruo
@ 2025-11-03  4:07 ` Qu Wenruo
  2025-11-03 11:56   ` Christoph Hellwig
  2025-11-03 16:00 ` [PATCH RFC 0/2] fs: fully sync all fsese " Askar Safin
  2025-11-05 11:31 ` Christian Brauner
  3 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2025-11-03  4:07 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel; +Cc: viro, brauner, jack, Askar Safin

[BUG]
There is a bug report that during emergency sync, btrfs only write back
all the dirty data and metadadta, but no full transaction commit,
resulting the super block still pointing to the old trees, thus the end
user can only see the old data, not the newer one.

[CAUSE]
Initially this looks like a btrfs specific bug, since ext4 doesn't get
affected by this one.

But the root problem here is, a combination of btrfs features and the no
wait nature of emergency sync.

Firstly do_sync_work() will call sync_inodes_one_sb() for every fs, to
writeback all the dirty pages for the fs.

Btrfs will properly writeback all dirty pages, including both data and
the updated metadata. So far so good.

Then sync_fs_one_sb() called with @nowait, in the case of btrfs it means
no full transaction commit, thus no super block update.

At this stage, btrfs is only one super block update away to be fully committed.
I believe it's the more or less the same for other fses too.

The problem is the next step, sync_bdevs().
Normally other fses have their super block already updated in the page
cache of the block device, but btrfs only updates the super block during
full transaction commit.

So sync_bdevs() may work for other fses, but not for btrfs, btrfs is
still using its older super block, all pointing back to the old metadata
and data.

Thus if after emergency sync, power loss happened, the end user will
only see the old data, not the newer one, despite that everything but the
super block is already written back.

[FIX]
Since the emergency sync is already executing in a workqueue, I didn't
see much need to only do a nowait sync.
Especially after the fact that sync_inodes_one_sb() always wait for the
writeback to finish.

Instead for the second iteration of sync_fs_one_sb(), pass wait == 1
into it, so fses like btrfs can properly commit its super blocks.

Reported-by: Askar Safin <safinaskar@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/20251101150429.321537-1-safinaskar@gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/sync.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/sync.c b/fs/sync.c
index 1081cd0e89c7..dd692df54a85 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -121,6 +121,7 @@ SYSCALL_DEFINE0(sync)
 static void do_sync_work(struct work_struct *work)
 {
 	int nowait = 0;
+	int wait = 1;

 	/*
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
@@ -130,7 +131,7 @@ static void do_sync_work(struct work_struct *work)
 	iterate_supers(sync_fs_one_sb, &nowait);
 	sync_bdevs(false);
 	iterate_supers(sync_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_fs_one_sb, &wait);
 	sync_bdevs(false);
 	printk("Emergency Sync complete\n");
 	kfree(work);
-- 
2.51.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb()
  2025-11-03  4:07 ` [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb() Qu Wenruo
@ 2025-11-03 11:51   ` Christoph Hellwig
  2025-11-04  8:42     ` Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2025-11-03 11:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, linux-fsdevel, viro, brauner, jack

On Mon, Nov 03, 2025 at 02:37:28PM +1030, Qu Wenruo wrote:
> The function sync_inodes_one_sb() will always wait for the writeback,
> and ignore the optional parameter.

Indeed.

> Explicitly pass NULL as parameter for the call sites inside
> do_sync_work().

Note that this also kinda defeats having the two passes.

Te non-blocking sync seems to have gone away in:

b3de653105180b57af90ef2f5b8441f085f4ff56
Author: Jan Kara <jack@suse.cz>
Date:   Tue Jul 3 16:45:30 2012 +0200

    vfs: Reorder operations during sys_sync

(the argument was later briefly used for something else and that
then got reverted)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-03  4:07 ` [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync Qu Wenruo
@ 2025-11-03 11:56   ` Christoph Hellwig
  2025-11-03 12:04     ` Christoph Hellwig
  2025-11-03 20:55     ` Qu Wenruo
  0 siblings, 2 replies; 18+ messages in thread
From: Christoph Hellwig @ 2025-11-03 11:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, linux-fsdevel, viro, brauner, jack, Askar Safin

The emergency sync being non-blocking goes back to day 1.  I think the
idea behind it is to not lock up a already messed up system by
blocking forever, even if it is in workqueue.  Changing this feels
a bit risky to me.

On Mon, Nov 03, 2025 at 02:37:29PM +1030, Qu Wenruo wrote:
> At this stage, btrfs is only one super block update away to be fully committed.
> I believe it's the more or less the same for other fses too.

Most file systems do not need a superblock update to commit data.

> The problem is the next step, sync_bdevs().
> Normally other fses have their super block already updated in the page
> cache of the block device, but btrfs only updates the super block during
> full transaction commit.
>
> So sync_bdevs() may work for other fses, but not for btrfs, btrfs is
> still using its older super block, all pointing back to the old metadata
> and data.
> 

At least for XFS, no metadata is written through the block device
mapping anyway.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-03 11:56   ` Christoph Hellwig
@ 2025-11-03 12:04     ` Christoph Hellwig
  2025-11-03 20:46       ` Qu Wenruo
  2025-11-03 20:55     ` Qu Wenruo
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2025-11-03 12:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, linux-fsdevel, viro, brauner, jack, Askar Safin

Forgot to add, but I think the main issue here is that btrfs clears the
writeback flag on the folio before commiting the metadata for the
writeback.  I tried to fix this a few years ago, and IIRC Josef tried
again a few month ago.  If that is fixed, even a non-blocking writeback
will commit all the updates it actually kicked off, just like for other
file systems. (It will also sort out all kinds of annoying locking
issues)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync
  2025-11-03  4:07 [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync Qu Wenruo
  2025-11-03  4:07 ` [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb() Qu Wenruo
  2025-11-03  4:07 ` [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync Qu Wenruo
@ 2025-11-03 16:00 ` Askar Safin
  2025-11-05 11:31 ` Christian Brauner
  3 siblings, 0 replies; 18+ messages in thread
From: Askar Safin @ 2025-11-03 16:00 UTC (permalink / raw)
  To: wqu; +Cc: brauner, jack, linux-btrfs, linux-fsdevel, viro

Qu Wenruo <wqu@suse.com>:
> The first patch is a cleanup related to sync_inodes_one_sb() callback.
> Since it always wait for the writeback, there is no need to pass any
> parameter for it.

I tested this patchset, and it indeed works (in Qemu).

-- 
Askar Safin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-03 12:04     ` Christoph Hellwig
@ 2025-11-03 20:46       ` Qu Wenruo
  0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2025-11-03 20:46 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo
  Cc: linux-btrfs, linux-fsdevel, viro, brauner, jack, Askar Safin



在 2025/11/3 22:34, Christoph Hellwig 写道:
> Forgot to add, but I think the main issue here is that btrfs clears the
> writeback flag on the folio before commiting the metadata for the
> writeback.

During my debug runs, the metadata are properly written back to storage, 
during the sync_inodes_one_sb() calls.

So that doesn't seem to be the problem.

>  I tried to fix this a few years ago, and IIRC Josef tried
> again a few month ago.  If that is fixed, even a non-blocking writeback
> will commit all the updates it actually kicked off, just like for other
> file systems. (It will also sort out all kinds of annoying locking
> issues)

The problem is still there, even if all new metadata are committed, the 
super block is still not, thus after a power loss we are still seeing 
the old metadata, not the newer ones.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-03 11:56   ` Christoph Hellwig
  2025-11-03 12:04     ` Christoph Hellwig
@ 2025-11-03 20:55     ` Qu Wenruo
  2025-11-04  8:28       ` Jan Kara
  1 sibling, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2025-11-03 20:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-btrfs, linux-fsdevel, viro, brauner, jack, Askar Safin



在 2025/11/3 22:26, Christoph Hellwig 写道:
> The emergency sync being non-blocking goes back to day 1.  I think the
> idea behind it is to not lock up a already messed up system by
> blocking forever, even if it is in workqueue.  Changing this feels
> a bit risky to me.

Considering everything is already done in task context (baked by the 
global per-cpu workqueue), it at least won't block anything else.

And I'd say if the fs is already screwed up and hanging, the 
sync_inodes_one_sb() call are more likely to hang than the final 
sync_fs() call.

> 
> On Mon, Nov 03, 2025 at 02:37:29PM +1030, Qu Wenruo wrote:
>> At this stage, btrfs is only one super block update away to be fully committed.
>> I believe it's the more or less the same for other fses too.
> 
> Most file systems do not need a superblock update to commit data.

That's the main difference, btrfs always needs a superblock update to 
switch metadata due to its metadata COW nature.

The only good news is, emergency sync is not that a hot path, we have a 
lot of time to properly fix.

>> The problem is the next step, sync_bdevs().
>> Normally other fses have their super block already updated in the page
>> cache of the block device, but btrfs only updates the super block during
>> full transaction commit.
>>
>> So sync_bdevs() may work for other fses, but not for btrfs, btrfs is
>> still using its older super block, all pointing back to the old metadata
>> and data.
>>
> 
> At least for XFS, no metadata is written through the block device
> mapping anyway.
> 

So does that mean sync_inodes_one_sb() on XFS (or even ext4?) will 
always submit needed metadata (journal?) to disk?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-03 20:55     ` Qu Wenruo
@ 2025-11-04  8:28       ` Jan Kara
  2025-11-04  8:39         ` Jan Kara
  2025-11-04  8:43         ` Qu Wenruo
  0 siblings, 2 replies; 18+ messages in thread
From: Jan Kara @ 2025-11-04  8:28 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, linux-btrfs, linux-fsdevel, viro, brauner,
	jack, Askar Safin

On Tue 04-11-25 07:25:06, Qu Wenruo wrote:
> 
> 
> 在 2025/11/3 22:26, Christoph Hellwig 写道:
> > The emergency sync being non-blocking goes back to day 1.  I think the
> > idea behind it is to not lock up a already messed up system by
> > blocking forever, even if it is in workqueue.  Changing this feels
> > a bit risky to me.
> 
> Considering everything is already done in task context (baked by the global
> per-cpu workqueue), it at least won't block anything else.
> 
> And I'd say if the fs is already screwed up and hanging, the
> sync_inodes_one_sb() call are more likely to hang than the final sync_fs()
> call.

Well, but notice that sync_inodes_one_sb() is always called with wait == 0
from do_sync_work() exactly to skip inodes already marked as under
writeback, locked pages or pages under writeback as waiting for these has
high chances of locking up. Suddently calling sync_fs_one_sb() with wait ==
1 can change things. That being said for ext4 the chances of locking up
ext4_sync_fs() with wait == 1 after sync_fs_one_sb() managed to do
non-trivial work are fairly minimal so I don't have strong objections
myself.

> > On Mon, Nov 03, 2025 at 02:37:29PM +1030, Qu Wenruo wrote:
> > > At this stage, btrfs is only one super block update away to be fully committed.
> > > I believe it's the more or less the same for other fses too.
> > 
> > Most file systems do not need a superblock update to commit data.
> 
> That's the main difference, btrfs always needs a superblock update to switch
> metadata due to its metadata COW nature.
> 
> The only good news is, emergency sync is not that a hot path, we have a lot
> of time to properly fix.

I'd say even better news is that emergency sync is used practically only
when debugging the kernel. So we can do what we wish and will have to live
with whatever pain we inflict onto ourselves ;).

> > > The problem is the next step, sync_bdevs().
> > > Normally other fses have their super block already updated in the page
> > > cache of the block device, but btrfs only updates the super block during
> > > full transaction commit.
> > > 
> > > So sync_bdevs() may work for other fses, but not for btrfs, btrfs is
> > > still using its older super block, all pointing back to the old metadata
> > > and data.
> > > 
> > 
> > At least for XFS, no metadata is written through the block device
> > mapping anyway.
> > 
> 
> So does that mean sync_inodes_one_sb() on XFS (or even ext4?) will always
> submit needed metadata (journal?) to disk?

No, sync_inodes_one_sb() will just prepare transaction in memory (both for
ext4 and xfs). For ext4 sync_fs_one_sb() with wait == 0 will start writeback
of this transaction to the disk and sync_fs_one_sb() with wait == 1 will make
sure the transaction is fully written out (committed). For xfs
sync_fs_one_sb() with wait == 0 does nothing, sync_fs_one_sb() with wait
== 1 makes sure the transaction is committed.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-04  8:28       ` Jan Kara
@ 2025-11-04  8:39         ` Jan Kara
  2025-11-05 11:32           ` Christian Brauner
  2025-11-04  8:43         ` Qu Wenruo
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Kara @ 2025-11-04  8:39 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, linux-btrfs, linux-fsdevel, viro, brauner,
	jack, Askar Safin

On Tue 04-11-25 09:28:27, Jan Kara wrote:
> On Tue 04-11-25 07:25:06, Qu Wenruo wrote:
> > 
> > 
> > 在 2025/11/3 22:26, Christoph Hellwig 写道:
> > > The emergency sync being non-blocking goes back to day 1.  I think the
> > > idea behind it is to not lock up a already messed up system by
> > > blocking forever, even if it is in workqueue.  Changing this feels
> > > a bit risky to me.
> > 
> > Considering everything is already done in task context (baked by the global
> > per-cpu workqueue), it at least won't block anything else.
> > 
> > And I'd say if the fs is already screwed up and hanging, the
> > sync_inodes_one_sb() call are more likely to hang than the final sync_fs()
> > call.
> 
> Well, but notice that sync_inodes_one_sb() is always called with wait == 0
> from do_sync_work() exactly to skip inodes already marked as under
> writeback, locked pages or pages under writeback as waiting for these has
> high chances of locking up. Suddently calling sync_fs_one_sb() with wait ==
> 1 can change things. That being said for ext4 the chances of locking up
> ext4_sync_fs() with wait == 1 after sync_fs_one_sb() managed to do
> non-trivial work are fairly minimal so I don't have strong objections
> myself.

Ah, ok, now I've checked the code and read patch 1 in this thread. Indeed
sync_inodes_one_sb() ignores the wait parameter and waits for everything.
Given we've been running like this for over 10 years and nobody complained
I agree calling sync_fs_one_sb() with wait == 1 is worth trying if it makes
life better for btrfs.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb()
  2025-11-03 11:51   ` Christoph Hellwig
@ 2025-11-04  8:42     ` Jan Kara
  2025-11-05 18:50       ` Daniel Vacek
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2025-11-04  8:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Qu Wenruo, linux-btrfs, linux-fsdevel, viro, brauner, jack

On Mon 03-11-25 03:51:21, Christoph Hellwig wrote:
> On Mon, Nov 03, 2025 at 02:37:28PM +1030, Qu Wenruo wrote:
> > The function sync_inodes_one_sb() will always wait for the writeback,
> > and ignore the optional parameter.
> 
> Indeed.

Yeah, apparently I've broken non-blocking nature of emergency sync without
nobody noticing for 13 years. Which probably means the non-blocking logic
isn't that important ;)...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-04  8:28       ` Jan Kara
  2025-11-04  8:39         ` Jan Kara
@ 2025-11-04  8:43         ` Qu Wenruo
  2025-11-04 12:42           ` Jan Kara
  1 sibling, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2025-11-04  8:43 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-btrfs, linux-fsdevel, viro, brauner,
	Askar Safin



[...]
>>> On Mon, Nov 03, 2025 at 02:37:29PM +1030, Qu Wenruo wrote:
>>>> At this stage, btrfs is only one super block update away to be fully committed.
>>>> I believe it's the more or less the same for other fses too.
>>>
>>> Most file systems do not need a superblock update to commit data.
>>
>> That's the main difference, btrfs always needs a superblock update to switch
>> metadata due to its metadata COW nature.
>>
>> The only good news is, emergency sync is not that a hot path, we have a lot
>> of time to properly fix.
> 
> I'd say even better news is that emergency sync is used practically only
> when debugging the kernel. So we can do what we wish and will have to live
> with whatever pain we inflict onto ourselves ;).

Then what about some documents around the sysrq-s usage?

The current docs only shows "Will attempt to sync all mounted 
filesystems", thus I guess that's the reason why the original reporter 
is testing it, expecting it's a proper way to sync the fses.

> 
>>>> The problem is the next step, sync_bdevs().
>>>> Normally other fses have their super block already updated in the page
>>>> cache of the block device, but btrfs only updates the super block during
>>>> full transaction commit.
>>>>
>>>> So sync_bdevs() may work for other fses, but not for btrfs, btrfs is
>>>> still using its older super block, all pointing back to the old metadata
>>>> and data.
>>>>
>>>
>>> At least for XFS, no metadata is written through the block device
>>> mapping anyway.
>>>
>>
>> So does that mean sync_inodes_one_sb() on XFS (or even ext4?) will always
>> submit needed metadata (journal?) to disk?
> 
> No, sync_inodes_one_sb() will just prepare transaction in memory (both for
> ext4 and xfs). For ext4 sync_fs_one_sb() with wait == 0 will start writeback
> of this transaction to the disk and sync_fs_one_sb() with wait == 1 will make
> sure the transaction is fully written out (committed). For xfs
> sync_fs_one_sb() with wait == 0 does nothing, sync_fs_one_sb() with wait
> == 1 makes sure the transaction is committed.

Then my question is, why EXT4 (and XFS) is fine with the emergency sync 
with a power loss, at least according to the original reporter.

Is the journal already committed for every metadata changes?

Thanks,
Qu

> 
> 								Honza


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-04  8:43         ` Qu Wenruo
@ 2025-11-04 12:42           ` Jan Kara
  2025-11-04 15:48             ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2025-11-04 12:42 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Jan Kara, Christoph Hellwig, linux-btrfs, linux-fsdevel, viro,
	brauner, Askar Safin

On Tue 04-11-25 19:13:04, Qu Wenruo wrote:
> [...]
> > > > On Mon, Nov 03, 2025 at 02:37:29PM +1030, Qu Wenruo wrote:
> > > > > At this stage, btrfs is only one super block update away to be fully committed.
> > > > > I believe it's the more or less the same for other fses too.
> > > > 
> > > > Most file systems do not need a superblock update to commit data.
> > > 
> > > That's the main difference, btrfs always needs a superblock update to switch
> > > metadata due to its metadata COW nature.
> > > 
> > > The only good news is, emergency sync is not that a hot path, we have a lot
> > > of time to properly fix.
> > 
> > I'd say even better news is that emergency sync is used practically only
> > when debugging the kernel. So we can do what we wish and will have to live
> > with whatever pain we inflict onto ourselves ;).
> 
> Then what about some documents around the sysrq-s usage?
> 
> The current docs only shows "Will attempt to sync all mounted filesystems",
> thus I guess that's the reason why the original reporter is testing it,
> expecting it's a proper way to sync the fses.

The key is in the word "attempt" :). But yes, we could expand the
description to something like "Will try to writeback data on all mounted
filesystems. There are no guarantees when the data actually hits the
storage or whether all of the data ever hits it. It is just a desperate
attempt to limit data loss if possible."

> > > > > The problem is the next step, sync_bdevs().
> > > > > Normally other fses have their super block already updated in the page
> > > > > cache of the block device, but btrfs only updates the super block during
> > > > > full transaction commit.
> > > > > 
> > > > > So sync_bdevs() may work for other fses, but not for btrfs, btrfs is
> > > > > still using its older super block, all pointing back to the old metadata
> > > > > and data.
> > > > > 
> > > > 
> > > > At least for XFS, no metadata is written through the block device
> > > > mapping anyway.
> > > > 
> > > 
> > > So does that mean sync_inodes_one_sb() on XFS (or even ext4?) will always
> > > submit needed metadata (journal?) to disk?
> > 
> > No, sync_inodes_one_sb() will just prepare transaction in memory (both for
> > ext4 and xfs). For ext4 sync_fs_one_sb() with wait == 0 will start writeback
> > of this transaction to the disk and sync_fs_one_sb() with wait == 1 will make
> > sure the transaction is fully written out (committed). For xfs
> > sync_fs_one_sb() with wait == 0 does nothing, sync_fs_one_sb() with wait
> > == 1 makes sure the transaction is committed.
> 
> Then my question is, why EXT4 (and XFS) is fine with the emergency sync with
> a power loss, at least according to the original reporter.
> 
> Is the journal already committed for every metadata changes?

No it is not. For ext4 if the transaction grows large we'll commit it. Also
if it gets too old (5s by default), we'll commit it. XFS will likely have
similar constraints. So data loss is limited but definitely not completely
avoided. But that never was the point of emergency sync.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-04 12:42           ` Jan Kara
@ 2025-11-04 15:48             ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2025-11-04 15:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Qu Wenruo, Christoph Hellwig, linux-btrfs, linux-fsdevel, viro,
	brauner, Askar Safin

On Tue, Nov 04, 2025 at 01:42:14PM +0100, Jan Kara wrote:
> No it is not. For ext4 if the transaction grows large we'll commit it. Also
> if it gets too old (5s by default), we'll commit it. XFS will likely have
> similar constraints. So data loss is limited but definitely not completely
> avoided. But that never was the point of emergency sync.

Yes, XFS will eventually clear the log as well.  But the important part
is that writeback of any kind will actually log the metadata changes.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync
  2025-11-03  4:07 [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync Qu Wenruo
                   ` (2 preceding siblings ...)
  2025-11-03 16:00 ` [PATCH RFC 0/2] fs: fully sync all fsese " Askar Safin
@ 2025-11-05 11:31 ` Christian Brauner
  3 siblings, 0 replies; 18+ messages in thread
From: Christian Brauner @ 2025-11-05 11:31 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, Qu Wenruo; +Cc: Christian Brauner, viro, jack

On Mon, 03 Nov 2025 14:37:27 +1030, Qu Wenruo wrote:
> The first patch is a cleanup related to sync_inodes_one_sb() callback.
> Since it always wait for the writeback, there is no need to pass any
> parameter for it.
> 
> The second patch is a fix mostly affecting btrfs, as btrfs requires a
> explicit sync_fc() call with wait == 1, to commit its super blocks,
> and sync_bdevs() won't cut it at all.
> 
> [...]

Applied to the vfs-6.19.misc branch of the vfs/vfs.git tree.
Patches in the vfs-6.19.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.19.misc

[1/2] fs: do not pass a parameter for sync_inodes_one_sb()
      https://git.kernel.org/vfs/vfs/c/fbc22c299636
[2/2] fs: fully sync all fses even for an emergency sync
      https://git.kernel.org/vfs/vfs/c/2706659d642e

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync
  2025-11-04  8:39         ` Jan Kara
@ 2025-11-05 11:32           ` Christian Brauner
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Brauner @ 2025-11-05 11:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: Qu Wenruo, Christoph Hellwig, linux-btrfs, linux-fsdevel, viro,
	Askar Safin

On Tue, Nov 04, 2025 at 09:39:14AM +0100, Jan Kara wrote:
> On Tue 04-11-25 09:28:27, Jan Kara wrote:
> > On Tue 04-11-25 07:25:06, Qu Wenruo wrote:
> > > 
> > > 
> > > 在 2025/11/3 22:26, Christoph Hellwig 写道:
> > > > The emergency sync being non-blocking goes back to day 1.  I think the
> > > > idea behind it is to not lock up a already messed up system by
> > > > blocking forever, even if it is in workqueue.  Changing this feels
> > > > a bit risky to me.
> > > 
> > > Considering everything is already done in task context (baked by the global
> > > per-cpu workqueue), it at least won't block anything else.
> > > 
> > > And I'd say if the fs is already screwed up and hanging, the
> > > sync_inodes_one_sb() call are more likely to hang than the final sync_fs()
> > > call.
> > 
> > Well, but notice that sync_inodes_one_sb() is always called with wait == 0
> > from do_sync_work() exactly to skip inodes already marked as under
> > writeback, locked pages or pages under writeback as waiting for these has
> > high chances of locking up. Suddently calling sync_fs_one_sb() with wait ==
> > 1 can change things. That being said for ext4 the chances of locking up
> > ext4_sync_fs() with wait == 1 after sync_fs_one_sb() managed to do
> > non-trivial work are fairly minimal so I don't have strong objections
> > myself.
> 
> Ah, ok, now I've checked the code and read patch 1 in this thread. Indeed
> sync_inodes_one_sb() ignores the wait parameter and waits for everything.
> Given we've been running like this for over 10 years and nobody complained
> I agree calling sync_fs_one_sb() with wait == 1 is worth trying if it makes
> life better for btrfs.

Agreed. But emergency_sync() is really just a best-effort thing and I'm
rather weary accumulating complexity for it otherwise.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb()
  2025-11-04  8:42     ` Jan Kara
@ 2025-11-05 18:50       ` Daniel Vacek
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Vacek @ 2025-11-05 18:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, linux-fsdevel, viro,
	brauner

On Tue, 4 Nov 2025 at 09:50, Jan Kara <jack@suse.cz> wrote:
>
> On Mon 03-11-25 03:51:21, Christoph Hellwig wrote:
> > On Mon, Nov 03, 2025 at 02:37:28PM +1030, Qu Wenruo wrote:
> > > The function sync_inodes_one_sb() will always wait for the writeback,
> > > and ignore the optional parameter.
> >
> > Indeed.
>
> Yeah, apparently I've broken non-blocking nature of emergency sync without
> nobody noticing for 13 years. Which probably means the non-blocking logic
> isn't that important ;)...

Or it means it's not being widely tested or used heavily in production
and when it fails it's hard to tell if that was because it is broken
or because the system state was already severely impaired at that
time.

--nX

>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-11-05 18:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-03  4:07 [PATCH RFC 0/2] fs: fully sync all fsese even for an emergency sync Qu Wenruo
2025-11-03  4:07 ` [PATCH RFC 1/2] fs: do not pass a parameter for sync_inodes_one_sb() Qu Wenruo
2025-11-03 11:51   ` Christoph Hellwig
2025-11-04  8:42     ` Jan Kara
2025-11-05 18:50       ` Daniel Vacek
2025-11-03  4:07 ` [PATCH RFC 2/2] fs: fully sync all fses even for an emergency sync Qu Wenruo
2025-11-03 11:56   ` Christoph Hellwig
2025-11-03 12:04     ` Christoph Hellwig
2025-11-03 20:46       ` Qu Wenruo
2025-11-03 20:55     ` Qu Wenruo
2025-11-04  8:28       ` Jan Kara
2025-11-04  8:39         ` Jan Kara
2025-11-05 11:32           ` Christian Brauner
2025-11-04  8:43         ` Qu Wenruo
2025-11-04 12:42           ` Jan Kara
2025-11-04 15:48             ` Christoph Hellwig
2025-11-03 16:00 ` [PATCH RFC 0/2] fs: fully sync all fsese " Askar Safin
2025-11-05 11:31 ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).