* Directory is not persisted after writing to the file within directory if system crashes
From: Vyacheslav Kovalevsky @ 2025-10-24 15:19 UTC (permalink / raw)
To: clm, dsterba; +Cc: linux-btrfs, linux-kernel
Under some circumstances, a directory entry is not persisted after writing
to a file inside the directory that was opened with the `O_SYNC` flag, if
the system crashes.
Detailed description
====================
Hello, we have found another issue with btrfs crash behavior.
In short, an empty file is created and synced. Then a new directory is
created, the old file is opened with the `O_SYNC` flag, and some data is
written to it. After this, a new hard link to the file is created inside
the directory and the root directory is `fsync`ed (so the new directory
should persist). However, after a crash, the directory entry is missing
even though the data written to the old file was persisted.
System info
===========
Linux version 6.18.0-rc2, also tested on 6.14.11.
How to reproduce
================
```
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main() {
    int status;
    int file_fd;
    int root_fd;

    status = creat("file1", S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
    printf("CREAT: %d\n", status);

    // persist `file1`
    sync();

    status = mkdir("dir", S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
    printf("MKDIR: %d\n", status);

    status = open("file1", O_WRONLY | O_SYNC);
    printf("OPEN: %d\n", status);
    file_fd = status;

    status = write(file_fd, "Test data!", 10);
    printf("WRITE: %d\n", status);

    status = link("file1", "dir/file2");
    printf("LINK: %d\n", status);

    status = open(".", O_RDONLY | O_DIRECTORY);
    printf("OPEN: %d\n", status);
    root_fd = status;

    // persist `dir`
    status = fsync(root_fd);
    printf("FSYNC: %d\n", status);
}
```
Steps:
1. Create and mount a new btrfs file system in the default configuration.
2. Change directory to the root of the file system and run the compiled test.
3. Cause a hard system crash (e.g. QEMU `system_reset` command).
4. Remount the file system after the crash.
5. Observe that the `dir` directory is missing.
Notes:
- ext4 does persist `dir` and `dir/file2` even though it was not synced.
- xfs does persist `dir` but does not persist `dir/file2`.
P.S. Want to apologize for formatting in previous report, first time
using Thunderbird and plain text.
* Re: Directory is not persisted after writing to the file within directory if system crashes
From: Filipe Manana @ 2025-10-24 16:17 UTC (permalink / raw)
To: Vyacheslav Kovalevsky; +Cc: clm, dsterba, linux-btrfs, linux-kernel
On Fri, Oct 24, 2025 at 4:21 PM Vyacheslav Kovalevsky
<slava.kovalevskiy.2014@gmail.com> wrote:
>
> Under some circumstances, directory entry is not persisted after writing
> to the file inside the directory that was opened with `O_SYNC` flag if
> system crashes.
>
>
> Detailed description
> ====================
>
> Hello, we have found another issue with btrfs crash behavior.
>
> In short, empty file is created and synced. Then, a new directory is
> created, old file is opened with `O_SYNC` flag and some data is written.
> After this, a new hard link is created inside the directory and the root
> is `fsync`ed (directory should persist). However, after a crash, the
> directory entry is missing even though data written to the old file was
> persisted.
>
>
> System info
> ===========
>
> Linux version 6.18.0-rc2, also tested on 6.14.11.
>
>
> How to reproduce
> ================
>
> ```
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> int main() {
> int status;
> int file_fd;
> int root_fd;
>
> status = creat("file1", S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
> printf("CREAT: %d\n", status);
>
> // persist `file1`
> sync();
>
> status = mkdir("dir", S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
> printf("MKDIR: %d\n", status);
>
> status = open("file1", O_WRONLY | O_SYNC);
> printf("OPEN: %d\n", status);
> file_fd = status;
>
> status = write(file_fd, "Test data!", 10);
> printf("WRITE: %d\n", status);
>
> status = link("file1", "dir/file2");
> printf("LINK: %d\n", status);
>
> status = open(".", O_RDONLY | O_DIRECTORY);
> printf("OPEN: %d\n", status);
> root_fd = status;
>
> // persist `dir`
> status = fsync(root_fd);
> printf("FSYNC: %d\n", status);
> }
> ```
>
> Steps:
>
> 1. Create and mount new btrfs file system in default configuration.
> 2. Change directory to root of the file system and run the compiled test.
> 3. Cause hard system crash (e.g. QEMU `system_reset` command).
> 4. Remount file system after crash.
> 5. Observe that `dir` directory is missing.
I converted that to a test case for fstests and couldn't reproduce:
"dir", "file1" and "dir/file2" exist after the power failure.
The conversion for fstests:
#! /bin/bash
# SPDX-License-Identifier: GPL-2.0
# Copyright (c) 2025 SUSE S.A. All Rights Reserved.
#
# FS QA Test 780
#
# what am I here for?
#
. ./common/preamble
_begin_fstest auto quick log

_cleanup()
{
        _cleanup_flakey
        cd /
        rm -r -f $tmp.*
}

. ./common/filter
. ./common/dmflakey

_require_scratch
_require_dm_target flakey

rm -f $seqres.full

_scratch_mkfs >>$seqres.full 2>&1 || _fail "mkfs failed"
_require_metadata_journaling $SCRATCH_DEV
_init_flakey
_mount_flakey

touch $SCRATCH_MNT/file1

_scratch_sync

mkdir $SCRATCH_MNT/dir
echo -n "hello world" > $SCRATCH_MNT/file1
ln $SCRATCH_MNT/file1 $SCRATCH_MNT/dir/file2

$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/

# Simulate a power failure and then mount again the filesystem to replay the
# journal/log.
_flakey_drop_and_remount

ls -R $SCRATCH_MNT/ | _filter_scratch

_unmount_flakey

# success, all done
_exit 0
>
> Notes:
>
> - ext4 does persist `dir` and `dir/file2` even though it was not synced.
> - xfs does persist `dir` but does not persist `dir/file2`.
>
>
> P.S. Want to apologize for formatting in previous report, first time
> using Thunderbird and plain text.
>
>
>
>
* Re: Directory is not persisted after writing to the file within directory if system crashes
From: Vyacheslav Kovalevsky @ 2025-10-25 9:49 UTC (permalink / raw)
To: Filipe Manana; +Cc: clm, dsterba, linux-btrfs, linux-kernel
On 24/10/2025 19:17, Filipe Manana wrote:
> I converted that to a test case for fstests and couldn't reproduce,
> "dir", "file1" and "dir/file2" exist after the power failure.
>
> The conversion for fstests:
>
> #! /bin/bash
> # SPDX-License-Identifier: GPL-2.0
> # Copyright (c) 2025 SUSE S.A. All Rights Reserved.
> #
> # FS QA Test 780
> #
> # what am I here for?
> #
> . ./common/preamble
> _begin_fstest auto quick log
>
> _cleanup()
> {
> _cleanup_flakey
> cd /
> rm -r -f $tmp.*
> }
>
> . ./common/filter
> . ./common/dmflakey
>
> _require_scratch
> _require_dm_target flakey
>
> rm -f $seqres.full
> _scratch_mkfs >>$seqres.full 2>&1 || _fail "mkfs failed"
> _require_metadata_journaling $SCRATCH_DEV
> _init_flakey
> _mount_flakey
>
> touch $SCRATCH_MNT/file1
>
> _scratch_sync
>
> mkdir $SCRATCH_MNT/dir
> echo -n "hello world" > $SCRATCH_MNT/file1
> ln $SCRATCH_MNT/file1 $SCRATCH_MNT/dir/file2
>
> $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/
>
> # Simulate a power failure and then mount again the filesystem to replay the
> # journal/log.
> _flakey_drop_and_remount
>
> ls -R $SCRATCH_MNT/ | _filter_scratch
>
> _unmount_flakey
>
> # success, all done
> _exit 0
I think the line with `echo` may not be the correct translation:
> echo -n "hello world" > $SCRATCH_MNT/file1
In the original test, the file was opened with the `O_SYNC` flag. If you
remove the flag, the directory will be there after the system crashes. I
also forgot to close the file after the `creat` call in the original test,
which may be important as well.
The test itself is quite weird (why would `dir` be gone after a seemingly
unrelated operation?), so any detail can matter.
Please run the original test with a real system crash. I will also
double check everything on my side.
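For anyone re-running the reproducer, a minimal sketch of a variant that closes the descriptor returned by `creat` and checks every return value might look like the following. It is an illustration, not the program used for the original report: the function name `run_sequence`, the `MODE` macro, opening the base directory first, and the use of `openat`/`mkdirat`/`linkat` relative to that directory (instead of working in the current directory) are all assumptions made here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Same permission bits as in the original test. */
#define MODE (S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH)

/* Hypothetical helper: performs the reproducer's sequence relative to
 * `base`, stopping at the first failed call.
 * Returns 0 on success, -1 on error. */
int run_sequence(const char *base)
{
    static const char data[] = "Test data!";
    int root_fd = -1, file_fd = -1, fd, rc = -1;

    assert(base != NULL);

    root_fd = open(base, O_RDONLY | O_DIRECTORY);
    if (root_fd < 0) { perror("open base"); goto out; }

    /* Equivalent to creat("file1", MODE); this time the fd is closed. */
    fd = openat(root_fd, "file1", O_CREAT | O_WRONLY | O_TRUNC, MODE);
    if (fd < 0) { perror("creat file1"); goto out; }
    close(fd);

    /* persist `file1` */
    sync();

    if (mkdirat(root_fd, "dir", MODE) < 0) { perror("mkdir dir"); goto out; }

    /* The O_SYNC open is the detail that matters for the report. */
    file_fd = openat(root_fd, "file1", O_WRONLY | O_SYNC);
    if (file_fd < 0) { perror("open file1"); goto out; }

    if (write(file_fd, data, sizeof(data) - 1) != (ssize_t)(sizeof(data) - 1)) {
        perror("write file1");
        goto out;
    }

    if (linkat(root_fd, "file1", root_fd, "dir/file2", 0) < 0) {
        perror("link dir/file2");
        goto out;
    }

    /* persist `dir` */
    if (fsync(root_fd) < 0) { perror("fsync root"); goto out; }

    rc = 0;
out:
    if (file_fd >= 0)
        close(file_fd);
    if (root_fd >= 0)
        close(root_fd);
    return rc;
}
```

Calling `run_sequence(".")` from the root of the test file system performs the same operations as the original test, apart from the extra `close` and the earlier open of the root directory.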
* Re: Directory is not persisted after writing to the file within directory if system crashes
From: Chris Murphy @ 2025-10-25 23:40 UTC (permalink / raw)
To: Vyacheslav Kovalevsky, Filipe Manana
Cc: Chris Mason, David Sterba, Btrfs BTRFS, linux-kernel
On Sat, Oct 25, 2025, at 5:49 AM, Vyacheslav Kovalevsky wrote:
> I think the line with `echo` may not be the correct translation:
> > echo -n "hello world" > $SCRATCH_MNT/file1
>
> In the original test, the file was opened with `O_SYNC` flag, if you
> remove it, the directory will be there when the system crashes. I also
> forgot to close the file after the `creat` call in the original test,
> may be important as well.
>
> The test itself is quite weird (why would `dir` be gone after seemingly
> unrelated operation?), any detail can matter.
>
> Please run the original test with a real system crash.
This would produce hardware-specific results rather than determining whether the file system is behaving correctly. It's possible the hardware is acknowledging the sequence of metadata write, flush, superblock write, flush, but is still not really persisting the data on disk.
--
Chris Murphy
* Re: Directory is not persisted after writing to the file within directory if system crashes
From: Filipe Manana @ 2025-10-26 9:04 UTC (permalink / raw)
To: Vyacheslav Kovalevsky; +Cc: clm, dsterba, linux-btrfs, linux-kernel
On Sat, Oct 25, 2025 at 10:49 AM Vyacheslav Kovalevsky
<slava.kovalevskiy.2014@gmail.com> wrote:
>
> On 24/10/2025 19:17, Filipe Manana wrote:
> > I converted that to a test case for fstests and couldn't reproduce,
> > "dir", "file1" and "dir/file2" exist after the power failure.
> >
> > The conversion for fstests:
> >
> > #! /bin/bash
> > # SPDX-License-Identifier: GPL-2.0
> > # Copyright (c) 2025 SUSE S.A. All Rights Reserved.
> > #
> > # FS QA Test 780
> > #
> > # what am I here for?
> > #
> > . ./common/preamble
> > _begin_fstest auto quick log
> >
> > _cleanup()
> > {
> > _cleanup_flakey
> > cd /
> > rm -r -f $tmp.*
> > }
> >
> > . ./common/filter
> > . ./common/dmflakey
> >
> > _require_scratch
> > _require_dm_target flakey
> >
> > rm -f $seqres.full
> > _scratch_mkfs >>$seqres.full 2>&1 || _fail "mkfs failed"
> > _require_metadata_journaling $SCRATCH_DEV
> > _init_flakey
> > _mount_flakey
> >
> > touch $SCRATCH_MNT/file1
> >
> > _scratch_sync
> >
> > mkdir $SCRATCH_MNT/dir
> > echo -n "hello world" > $SCRATCH_MNT/file1
> > ln $SCRATCH_MNT/file1 $SCRATCH_MNT/dir/file2
> >
> > $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/
> >
> > # Simulate a power failure and then mount again the filesystem to replay the
> > # journal/log.
> > _flakey_drop_and_remount
> >
> > ls -R $SCRATCH_MNT/ | _filter_scratch
> >
> > _unmount_flakey
> >
> > # success, all done
> > _exit 0
>
> I think the line with `echo` may not be the correct translation:
> > echo -n "hello world" > $SCRATCH_MNT/file1
An echo is just a write...
>
> In the original test, the file was opened with `O_SYNC` flag, if you
> remove it, the directory will be there when the system crashes. I also
> forgot to close the file after the `creat` call in the original test,
> may be important as well.
O_SYNC, which is what I missed before, is essentially just an
implicit fsync after every write to a file.
Adding an fsync after the echo:
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/file1
triggers the problem of "dir" not being persisted.
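To make the "implicit fsync after every write" reading concrete, here is a rough sketch of the two forms side by side in C. The function names `write_with_osync` and `write_then_fsync` are made up for illustration, and the sketch only shows the two call patterns; it does not claim they are identical in every respect on every filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Variant A: with O_SYNC, each write() is asked to be durable before it
 * returns. Returns the number of bytes written, or -1 on error. */
ssize_t write_with_osync(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_SYNC);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    close(fd);
    return n;
}

/* Variant B: a plain write() followed by an explicit fsync() on the same
 * descriptor. Returns the number of bytes written, or -1 on error. */
ssize_t write_then_fsync(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    if (n >= 0 && fsync(fd) < 0)
        n = -1;
    close(fd);
    return n;
}
```

Variant A corresponds to the original C reproducer; variant B corresponds to the fstests conversion with the extra `fsync` on `file1`.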
>
> The test itself is quite weird (why would `dir` be gone after seemingly
> unrelated operation?), any detail can matter.
"dir" should be persisted as well as "dir/file2", according to the
SOMC (Strictly Ordered Metadata Consistency) that Dave Chinner
discussed many times in the past in fstests and btrfs mailing lists.
You should also reach out to the xfs mailing list and mention that
"dir/file2" is not persisted.
>
> Please run the original test with a real system crash. I will also
> double check everything on my side.
I've said before in another thread: we don't need to trigger qemu
crashes in order to test fsync.
Just use the dm flakey target with fstests - no need to do reboots,
much more practical and way less time consuming.
In 12 years of fixing fsync stuff on btrfs, I haven't yet seen any
case where dm flakey didn't do the job of reproducing issues.
>
* Re: Directory is not persisted after writing to the file within directory if system crashes
From: Christoph Hellwig @ 2025-10-27 7:19 UTC (permalink / raw)
To: Filipe Manana
Cc: Vyacheslav Kovalevsky, clm, dsterba, linux-btrfs, linux-kernel
On Sun, Oct 26, 2025 at 09:04:13AM +0000, Filipe Manana wrote:
> >
> > The test itself is quite weird (why would `dir` be gone after seemingly
> > unrelated operation?), any detail can matter.
>
> "dir" should be persisted as well as "dir/file2", according to the
> SOMC (Strictly Ordered Metadata Consistency) that Dave Chinner
> discussed many times in the past in fstests and btrfs mailing lists.
>
> You should also reach the xfs mailing list and mention that
> "dir/file2" is not persisted.
No. The fsync is on the root directory. So the only thing that needs
to be persisted is transactions touching that. The transaction to
create dir is persisted because it is an entry in the root directory.
Creating and writing to file2 has nothing to do with the root directory
and absolutely should not be affected by an fsync on an unrelated inode.
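Under this reading, an application that wants the new name `dir/file2` to survive a crash would not rely on the fsync of the root alone, but would also fsync the directory that holds the new entry. A minimal sketch of that pattern follows; the helper names `fsync_dir` and `persist_new_link` are hypothetical, and the sketch says nothing about what any particular filesystem additionally persists.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open the directory at `path` and fsync it, so that
 * changes to its own entries are requested to be durable.
 * Returns 0 on success, -1 on error. */
int fsync_dir(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* After link("file1", "dir/file2"): the new name is an entry of `dir`,
 * and `dir` itself is an entry of `root`, so both directories are synced. */
int persist_new_link(const char *root, const char *dir)
{
    if (fsync_dir(dir) < 0)
        return -1;
    return fsync_dir(root);
}
```

For the reproducer in this thread, that would be `persist_new_link(".", "dir")` after the `link` call.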