Date: Tue, 9 Jan 2024 21:58:09 -0800
From: "Darrick J. Wong"
To: tytso@mit.edu
Cc: linux-ext4@vger.kernel.org, "Brian J. Murrell"
Subject: [RFC PATCH 6/2] e2scrub: skip filesystems that don't have journals
Message-ID: <20240110055809.GD722946@frogsfrogsfrogs>
References: <170268089742.2679199.16836622895526209331.stgit@frogsfrogsfrogs>
In-Reply-To: <170268089742.2679199.16836622895526209331.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Brian J. Murrell reported that e2scrub reports failures with one of his
filesystems.  From the email discussion after he supplied a metadump:

AHA!
This is an ext2 filesystem, since it doesn't have the "has_journal" or
"extents" features turned on:

# e2image -r /tmp/disk.qcow2 /dev/sda
# dumpe2fs /dev/sda -h
dumpe2fs 1.47.1~WIP-2023-12-27 (27-Dec-2023)
Filesystem volume name:
Last mounted on:          /opt
Filesystem UUID:          2c70368a-0d54-4805-8620-fda19466d819
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         not clean with errors

(Note: Filesystem state == "clean" means that EXT2_VALID_FS is set in
the superblock s_state field; "not clean with errors" means that the
flag is not set.)

I bet the "journal only" preen doesn't actually reset the filesystem
state either:

# e2fsck -E journal_only -p /dev/sda
# dumpe2fs /dev/sda -h | grep state
dumpe2fs 1.47.1~WIP-2023-12-27 (27-Dec-2023)
Filesystem state:         not clean with errors

Nope.  So now I know what happened -- when mounting an ext* filesystem
that doesn't have a journal, the driver clears EXT2_VALID_FS from the
primary superblock.  This forces the system to run e2fsck after a crash,
because that's what you have to do for unjournalled filesystems.

The "e2fsck -E journal_only -p" call in e2scrub only replays the
journal.  Since there is no journal, it exits almost immediately.
That's the intended behavior, but then it means that the "e2fsck -fy"
call immediately after sees that the superblock doesn't have
EXT2_VALID_FS set, sets it, and makes e2fsck return 1.

So that's why you're getting the e2scrub failures.

Contrast this to what you get when the filesystem has a journal:

# dumpe2fs -h /dev/sdb
dumpe2fs 1.47.0 (5-Feb-2023)
Filesystem volume name:
Last mounted on:
Filesystem UUID:          e18b8b57-a75e-4316-87ce-6a08969476c3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean

Filesystems with journals retain their EXT4_VALID_FS state when they're
mounted.

Hmm.  What e2scrub should do about unjournalled filesystems is a thorny
question.  My initial thought is that it should skip them, because a
mounted unjournalled filesystem cannot by definition be kept consistent.

Therefore, teach e2scrub_all to avoid them and e2scrub to fail them at
the outset.  Restricting the scope of e2scrub sucks, but in the meantime
at least it means that your filesystem isn't massively corrupt.

Thanks for the metadump, it was very useful for root cause analysis.

Reported-by: "Brian J. Murrell"
Signed-off-by: Darrick J. Wong
---
 scrub/e2scrub.in     |    7 +++++++
 scrub/e2scrub_all.in |    4 ++++
 2 files changed, 11 insertions(+)

diff --git a/scrub/e2scrub.in b/scrub/e2scrub.in
index 7ed57f2d..043bc12b 100644
--- a/scrub/e2scrub.in
+++ b/scrub/e2scrub.in
@@ -159,6 +159,13 @@ if [ ! -e "${dev}" ]; then
 	exitcode 16
 fi
 
+# Do not scrub unjournalled filesystems; they are inconsistent when mounted
+if [ "${reap}" -eq 0 ] && ! dumpe2fs -h "${dev}" | grep -q 'has_journal'; then
+	echo "${arg}: Filesystem has no journal, cannot check."
+	print_help
+	exitcode 16
+fi
+
 # Make sure this is an LVM device we can snapshot
 lvm_vars="$(lvs --nameprefixes -o name,vgname,lv_role --noheadings "${dev}" 2> /dev/null)"
 eval "${lvm_vars}"
diff --git a/scrub/e2scrub_all.in b/scrub/e2scrub_all.in
index 437f6cc2..fe4dda95 100644
--- a/scrub/e2scrub_all.in
+++ b/scrub/e2scrub_all.in
@@ -125,6 +125,10 @@ ls_scan_targets() {
 	while read vars ; do
 		eval "${vars}"
 
+		# Skip unjournalled filesystems; they are inconsistent when
+		# mounted
+		dumpe2fs -h "${NAME}" | grep -q 'has_journal' || continue
+
 		if [ "${scrub_all}" -eq 1 ] || [ -n "${MOUNTPOINT}" ]; then
 			echo ${MOUNTPOINT:-${NAME}}
 		fi
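
For illustration only, and not part of the patch: the grep -q 'has_journal'
test matches that string anywhere in the dumpe2fs header.  A stricter
variant might look only at the "Filesystem features:" line, whose format
is assumed here to be the one shown in the dumps in the commit message.
The helper name has_journal_feature is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not from e2fsprogs: reads "dumpe2fs -h" output on
# stdin and succeeds only if has_journal is listed as a whole word on the
# "Filesystem features:" line.
has_journal_feature() {
	awk -F: '
		$1 == "Filesystem features" {
			# Split the feature list on runs of whitespace.
			n = split($2, feats, " ")
			for (i = 1; i <= n; i++)
				if (feats[i] == "has_journal")
					found = 1
		}
		END { exit(found ? 0 : 1) }
	'
}
```

It could be used roughly as
"dumpe2fs -h "${dev}" 2>/dev/null | has_journal_feature || echo skip".
Whether the extra strictness is worth it is a judgment call; the plain
grep is shorter and probably good enough in practice.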