From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "fstests@vger.kernel.org" <fstests@vger.kernel.org>,
	"wqu@suse.com" <wqu@suse.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"glaubitz@physik.fu-berlin.de" <glaubitz@physik.fu-berlin.de>,
	"frank.li@vivo.com" <frank.li@vivo.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: RE: [RFC] Why generic/073 is generic but not btrfs specific?
Date: Thu, 6 Nov 2025 22:29:46 +0000
Message-ID: <ee43d81115d91ceb359f697162f21ce50cee29ff.camel@ibm.com>
In-Reply-To: <a43fd07d-88e6-473d-a0be-3ba3203785e6@suse.com>

On Fri, 2025-11-07 at 08:02 +1030, Qu Wenruo wrote:
> 
> > On 2025/11/7 07:41, Viacheslav Dubeyko wrote:
> > On Fri, 2025-11-07 at 07:28 +1030, Qu Wenruo wrote:
> > > 
> > > > On 2025/11/7 07:10, Viacheslav Dubeyko wrote:
> > > > Hello,
> > > > 
> > > > Running generic/073 for the case of HFS+ finishes with volume corruption:
> > > > 
> > > > sudo ./check generic/073
> > > > FSTYP -- hfsplus
> > > > PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC
> > > > Wed Oct 1 15:02:44 PDT 2025
> > > > MKFS_OPTIONS -- /dev/loop51
> > > > MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
> > > > 
> > > > generic/073 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
> > > > (see XFSTESTS-2/xfstests-dev/results//generic/073.full for details)
> > > > 
> > > > Ran: generic/073
> > > > Failures: generic/073
> > > > Failed 1 of 1 tests
> > > > 
> > > > sudo fsck.hfsplus -d /dev/loop51
> > > > ** /dev/loop51
> > > > Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
> > > > Executing fsck_hfs (version 540.1-Linux).
> > > > ** Checking non-journaled HFS Plus Volume.
> > > > The volume name is untitled
> > > > ** Checking extents overflow file.
> > > > ** Checking catalog file.
> > > > ** Checking multi-linked files.
> > > > ** Checking catalog hierarchy.
> > > > Invalid directory item count
> > > > (It should be 1 instead of 0)
> > > > ** Checking extended attributes file.
> > > > ** Checking volume bitmap.
> > > > ** Checking volume information.
> > > > Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
> > > > CBTStat = 0x0000 CatStat = 0x00004000
> > > > ** Repairing volume.
> > > > ** Rechecking volume.
> > > > ** Checking non-journaled HFS Plus Volume.
> > > > The volume name is untitled
> > > > ** Checking extents overflow file.
> > > > ** Checking catalog file.
> > > > ** Checking multi-linked files.
> > > > ** Checking catalog hierarchy.
> > > > ** Checking extended attributes file.
> > > > ** Checking volume bitmap.
> > > > ** Checking volume information.
> > > > ** The volume untitled was repaired successfully.
> > > > 
> > > > Initially, I suspected that something was wrong with the HFS+ driver logic. But
> > > > after testing and debugging the issue, I believe that the HFS+ logic is correct.
> > > > 
> > > > As far as I can see, generic/073 is checking a btrfs-specific case:
> > > > 
> > > > # Test file A fsync after moving one other unrelated file B between directories
> > > > # and fsyncing B's old parent directory before fsyncing the file A. Check that
> > > > # after a crash all the file A data we fsynced is available.
> > > > #
> > > > # This test is motivated by an issue discovered in btrfs which caused the file
> > > > # data to be lost (despite fsync returning success to user space). That btrfs
> > > > # bug was fixed by the following linux kernel patch:
> > > > #
> > > > #   Btrfs: fix data loss in the fast fsync path
> > > > 
> > > > The test performs these steps in its final phase:
> > > > 
> > > > mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar
> > > > $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1
> > > > $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
> > > > 
> > > > So, we move the file bar from the testdir_1 folder into the testdir_2 folder.
> > > > This means that the HFS+ logic decrements the number of entries in testdir_1
> > > > and increments the number of entries in testdir_2. Finally, we fsync only
> > > > testdir_1 and foo but not testdir_2.
> > > 
> > > If the fs is using a journal, shouldn't the increment on testdir_2
> > > already be journaled? Then, after a power loss, the increment on
> > > testdir_2/bar should be replayed and the end user should still see
> > > that inode.
> > > 
> > 
> > Technically speaking, HFS+ is a journaling file system in Apple's
> > implementation. But this functionality is not implemented and fully supported
> > on the Linux kernel side. Potentially, it can be done, but we don't have such
> > functionality yet. So, HFS/HFS+ doesn't use journaling on the Linux kernel side
> > and no journal replay can happen. :)
> 
> That's fine, btrfs doesn't use a (traditional) journal either, since
> its metadata is already fully COW protected.
> 
> The journal-like part is called the log-tree, which is only utilized for
> speeding up fsync(), so that no full fs sync is needed (which is super
> expensive for btrfs).
> 
> That's the reason why btrfs' fsync() is super complex, Filipe has spent
> tons of his time in this area, and it is btrfs that exposed this problem.
> 
> 
> But if HFS+ on Linux doesn't support a metadata journal, I'm afraid you
> will need to take the hard path just like btrfs, and manually check whether
> an inode needs its parent inodes updated during fsync.

Yeah, you are right, this is one of the possible ways. Let me check the
complexity of this change.
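
To verify whatever change I end up making, I'm planning to replay the same
sequence by hand and simulate the power cut with dm-flakey directly, roughly
like the sketch below. The loop device, mount point and sizes are just
examples from my setup, not anything the test mandates:

# Rough sketch of a manual reproduction with dm-flakey; /dev/loop51 and
# /mnt/scratch are only examples.
DEV=/dev/loop51
MNT=/mnt/scratch
SIZE=$(blockdev --getsz $DEV)    # device size in 512-byte sectors

# Pass-through dm-flakey device on top of the scratch device.
dmsetup create scratch-flakey --table "0 $SIZE flakey $DEV 0 180 0"
mkfs.hfsplus /dev/mapper/scratch-flakey
mount /dev/mapper/scratch-flakey $MNT

mkdir $MNT/testdir_1 $MNT/testdir_2
xfs_io -f -c "pwrite 0 64K" $MNT/foo
touch $MNT/testdir_1/bar

# The sequence from the test's final phase.
mv $MNT/testdir_1/bar $MNT/testdir_2/bar
xfs_io -c "fsync" $MNT/testdir_1
xfs_io -c "fsync" $MNT/foo

# "Power failure": silently drop every write from now on, then unmount.
dmsetup suspend scratch-flakey
dmsetup load scratch-flakey --table "0 $SIZE flakey $DEV 0 0 180 1 drop_writes"
dmsetup resume scratch-flakey
umount $MNT

# Back to pass-through mode, then check the on-disk state.
dmsetup suspend scratch-flakey
dmsetup load scratch-flakey --table "0 $SIZE flakey $DEV 0 180 0"
dmsetup resume scratch-flakey
fsck.hfsplus -d /dev/mapper/scratch-flakey

If the fsync path starts persisting the entry count of the new parent
directory as well, fsck.hfsplus should stop reporting the invalid directory
item count after the simulated power cut.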

> > 
> > > To me this looks like a bug in the HFS logic where something is not
> > > properly journaled (the increment on testdir_2/bar).
> > > 
> > > 
> > 
> > This statement would be correct if we supported journaling for HFS+. But HFS
> > has never supported journaling at all.
> > 
> > > Finally, if you ask an end user whether it is acceptable that, after
> > > moving an inode and fsyncing the old directory, the inode is no longer
> > > reachable, I'm pretty sure no end user will think it's acceptable.
> > > 
> > 
> > As far as I can see, fsck only corrects the number of entries for the
> > testdir_2 folder. The rest is pretty OK.
> 
> So it means the nlink number on the child inode is correct. I guess that,
> since testdir_2 is not the directory on which fsync is called, its number
> of references is not updated.
> 
> With a journal it would be much easier, but without it, you have to iterate
> through the changed inode's backrefs and update the involved directories
> one by one during fsync(), just like btrfs...
> 
> > 
> > > > As a result, this is why fsck.hfsplus detects the volume corruption
> > > > afterwards. As far as I can see, the HFS+ driver behavior is completely
> > > > correct and nothing needs to be fixed in the HFS+ logic here.
> > > 
> > > Then I guess you may also want to ask why EXT4/XFS/Btrfs/F2fs all pass
> > > the test case.
> > > 
> > 
> > So, ext4 and xfs support journaling. Maybe it's interesting to check what is
> > going on with F2FS, FAT, and other file systems that have no journaling support.
> 
> IIRC F2FS also supports journaling.
> 
> For VFAT the test case is rejected due to the lack of journal support.
> 
> On the other hand, if HFS+ doesn't support a journal at all, you may want
> to add hfs/hfs+ to _has_metadata_journaling(), so that the test case
> will be skipped.
> 
> But still, since this test case already leads to fsck error reports, it
> is worth some improvement to avoid such problems.
> 

Makes sense. Let me check the complexity of modifying HFS+ to handle such a
situation correctly. But HFS should be added to _has_metadata_journaling()
if it is not already there.
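
Something like the sketch below is what I have in mind for common/rc. It is
only an illustration of the idea; I haven't checked the exact case patterns
in the current _has_metadata_journaling(), so the real change may look
different:

# Sketch only: make _has_metadata_journaling() in common/rc report that hfs
# and hfsplus have no metadata journaling on Linux, so tests that call
# _require_metadata_journaling get skipped for them.  The real function
# handles more filesystems and corner cases (e.g. ext4 formatted without a
# journal); only the idea is shown here.
_has_metadata_journaling()
{
	local fs=${1:-$FSTYP}

	case "$fs" in
	hfs|hfsplus|vfat)
		echo "$fs does not support metadata journaling"
		return 1
		;;
	esac

	return 0
}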

Thanks,
Slava.

