public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2 0/1] xfsprogs: random fixes for 5.19
@ 2022-07-19 21:44 Darrick J. Wong
  2022-07-19 21:44 ` [PATCH 1/1] mkfs: complain about impossible log size constraints Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-07-19 21:44 UTC (permalink / raw)
  To: sandeen, djwong; +Cc: linux-xfs

Hi all,

This is a rollup of all the random fixes I've collected for xfsprogs
5.19.  At this point it's just an assorted collection, no particular
theme.  Some of them are leftovers from last week's posting.

v2: note the actual cause (too small filesystem) when we abort due to
    impossible log geometry

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=random-fixes

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=random-fixes
---
 mkfs/xfs_mkfs.c |    7 +++++++
 1 file changed, 7 insertions(+)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/1] mkfs: complain about impossible log size constraints
  2022-07-19 21:44 [PATCHSET v2 0/1] xfsprogs: random fixes for 5.19 Darrick J. Wong
@ 2022-07-19 21:44 ` Darrick J. Wong
  2022-08-05  2:23   ` Eric Sandeen
  2022-07-21 16:41 ` [PATCH 2/1] libxfs: stop overriding MAP_SYNC in publicly exported header files Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2022-07-19 21:44 UTC (permalink / raw)
  To: sandeen, djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

xfs/042 trips over an impossible fs geometry when nrext64 is enabled.
The minimum log size calculation comes out to 4287 blocks, but the mkfs
parameters specify an AG size of 4096 blocks.  This eventually causes
mkfs to complain that the autoselected log size doesn't meet the minimum
size, but we could be a little more explicit in pointing out that the
two size constraints make for an impossible geometry.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 mkfs/xfs_mkfs.c |    7 +++++++
 1 file changed, 7 insertions(+)


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index b140b815..a5e2df76 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3413,6 +3413,13 @@ _("external log device size %lld blocks too small, must be at least %lld blocks\
 	 * an AG.
 	 */
 	max_logblocks = libxfs_alloc_ag_max_usable(mp) - 1;
+	if (max_logblocks < min_logblocks) {
+		fprintf(stderr,
+_("max log size %d smaller than min log size %d, filesystem is too small\n"),
+				max_logblocks,
+				min_logblocks);
+		usage();
+	}
 
 	/* internal log - if no size specified, calculate automatically */
 	if (!cfg->logblocks) {


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/1] libxfs: stop overriding MAP_SYNC in publicly exported header files
  2022-07-19 21:44 [PATCHSET v2 0/1] xfsprogs: random fixes for 5.19 Darrick J. Wong
  2022-07-19 21:44 ` [PATCH 1/1] mkfs: complain about impossible log size constraints Darrick J. Wong
@ 2022-07-21 16:41 ` Darrick J. Wong
  2022-08-05  2:40 ` [PATCH 3/1] xfs_repair: fix printf format specifiers on 32-bit platforms Darrick J. Wong
  2022-08-05  2:41 ` [PATCH 4/1] xfs_repair: retain superblock buffer to avoid write hook deadlock Darrick J. Wong
  3 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-07-21 16:41 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Florian Fainelli most recently reported that xfsprogs doesn't build with
musl on mips:

"MIPS platforms building with recent kernel headers and the musl-libc
toolchain will expose the following build failure:

mmap.c: In function 'mmap_f':
mmap.c:196:12: error: 'MAP_SYNC' undeclared (first use in this function); did you mean 'MS_SYNC'?
  196 |    flags = MAP_SYNC | MAP_SHARED_VALIDATE;
      |            ^~~~~~~~
      |            MS_SYNC
mmap.c:196:12: note: each undeclared identifier is reported only once for each function it appears in
make[4]: *** [../include/buildrules:81: mmap.o] Error 1"

At first glance, the build failure here is caused by the fact that:

1. The configure script doesn't detect MAP_SYNC support
2. The build system doesn't set HAVE_MAP_SYNC
2. io/mmap.c includes input.h -> projects.h -> xfs.h and later sys/mman.h
3. include/linux.h #define's MAP_SYNC to 0 if HAVE_MAP_SYNC is not set
4. musl's sys/mman.h #undef MAP_SYNC on platforms that don't support it
5. io/mmap.c tries to use MAP_SYNC, not realizing that libc undefined it

Normally, xfs_io only exports functionality that is defined by the libc
and/or kernel headers on the build system.  We often make exceptions for
new functionality so that we have a way to test them before the header
file packages catch up, hence this '#ifndef HAVE_FOO #define FOO'
paradigm.

MAP_SYNC is a gross and horribly broken example of this.  These support
crutches are supposed to be *private* to xfsprogs for benefit of early
testing, but they were instead added to include/linux.h, which we
provide to user programs in the xfslibs-dev package.  IOWs, we've been
#defining MAP_SYNC to zero for unsuspecting programs.

Worst yet, gcc 11.3 doesn't even warn about overriding a #define to 0:

#include <stdio.h>
#include <sys/mman.h>
#ifdef STUPID
# include <xfs/xfs.h>
#endif

int main(int argc, char *argv[]) {
	printf("MAP_SYNC 0x%x\n", MAP_SYNC);
}

$ gcc -o a a.c -Wall
$ ./a
MAP_SYNC 0x80000
$ gcc -DSTUPID -o a a.c -Wall
$ ./a
MAP_SYNC 0x0

Four years have gone by since the introduction of MAP_SYNC, so let's get
rid of the override code entirely -- any platform that supports MAP_SYNC
has had plenty of chances to ensure their header files have the right
bits.  While we're at it, fix AC_HAVE_MAP_SYNC to look for MAP_SYNC in
the same header file that the one user (io/mmap.c) uses -- sys/mman.h.

Annoyingly, I had to test this by hand because the sole fstest that
exercises MAP_SYNC (generic/470) requires dm-logwrites and dm-thinp,
neither of which support fsdax on current kernels.

Reported-by: info@mobile-stream.com
Reported-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
---
 include/linux.h       |    8 --------
 io/io.h               |    2 +-
 io/mmap.c             |   25 +++++++++++++------------
 m4/package_libcdev.m4 |    3 +--
 4 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/include/linux.h b/include/linux.h
index 3d9f4e3d..eddc4ad9 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -251,14 +251,6 @@ struct fsxattr {
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
 #endif
 
-#ifndef HAVE_MAP_SYNC
-#define MAP_SYNC 0
-#define MAP_SHARED_VALIDATE 0
-#else
-#include <asm-generic/mman.h>
-#include <asm-generic/mman-common.h>
-#endif /* HAVE_MAP_SYNC */
-
 /*
  * Reminder: anything added to this file will be compiled into downstream
  * userspace projects!
diff --git a/io/io.h b/io/io.h
index ada0a149..de4ef607 100644
--- a/io/io.h
+++ b/io/io.h
@@ -58,7 +58,7 @@ typedef struct mmap_region {
 	size_t		length;		/* length of mapping */
 	off64_t		offset;		/* start offset into backing file */
 	int		prot;		/* protection mode of the mapping */
-	bool		map_sync;	/* is this a MAP_SYNC mapping? */
+	int		flags;		/* MAP_* flags passed to mmap() */
 	char		*name;		/* name of backing file */
 } mmap_region_t;
 
diff --git a/io/mmap.c b/io/mmap.c
index 8c048a0a..425957d4 100644
--- a/io/mmap.c
+++ b/io/mmap.c
@@ -46,8 +46,11 @@ print_mapping(
 	for (i = 0, p = pflags; p->prot != PROT_NONE; i++, p++)
 		buffer[i] = (map->prot & p->prot) ? p->mode : '-';
 
-	if (map->map_sync)
+#ifdef HAVE_MAP_SYNC
+	if ((map->flags & (MAP_SYNC | MAP_SHARED_VALIDATE)) ==
+			  (MAP_SYNC | MAP_SHARED_VALIDATE))
 		sprintf(&buffer[i], " S");
+#endif
 
 	printf("%c%03d%c 0x%lx - 0x%lx %s  %14s (%lld : %ld)\n",
 		braces? '[' : ' ', index, braces? ']' : ' ',
@@ -139,7 +142,9 @@ mmap_help(void)
 " -r -- map with PROT_READ protection\n"
 " -w -- map with PROT_WRITE protection\n"
 " -x -- map with PROT_EXEC protection\n"
+#ifdef HAVE_MAP_SYNC
 " -S -- map with MAP_SYNC and MAP_SHARED_VALIDATE flags\n"
+#endif
 " -s <size> -- first do mmap(size)/munmap(size), try to reserve some free space\n"
 " If no protection mode is specified, all are used by default.\n"
 "\n"));
@@ -193,18 +198,14 @@ mmap_f(
 			prot |= PROT_EXEC;
 			break;
 		case 'S':
+#ifdef HAVE_MAP_SYNC
 			flags = MAP_SYNC | MAP_SHARED_VALIDATE;
-
-			/*
-			 * If MAP_SYNC and MAP_SHARED_VALIDATE aren't defined
-			 * in the system headers we will have defined them
-			 * both as 0.
-			 */
-			if (!flags) {
-				printf("MAP_SYNC not supported\n");
-				return 0;
-			}
 			break;
+#else
+			printf("MAP_SYNC not supported\n");
+			exitcode = 1;
+			return command_usage(&mmap_cmd);
+#endif
 		case 's':
 			length2 = cvtnum(blocksize, sectsize, optarg);
 			break;
@@ -281,7 +282,7 @@ mmap_f(
 	mapping->offset = offset;
 	mapping->name = filename;
 	mapping->prot = prot;
-	mapping->map_sync = (flags == (MAP_SYNC | MAP_SHARED_VALIDATE));
+	mapping->flags = flags;
 	return 0;
 }
 
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index df44174d..5293dd1a 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -387,8 +387,7 @@ AC_DEFUN([AC_HAVE_MAP_SYNC],
   [ AC_MSG_CHECKING([for MAP_SYNC])
     AC_COMPILE_IFELSE(
     [	AC_LANG_PROGRAM([[
-#include <asm-generic/mman.h>
-#include <asm-generic/mman-common.h>
+#include <sys/mman.h>
 	]], [[
 int flags = MAP_SYNC | MAP_SHARED_VALIDATE;
 	]])

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] mkfs: complain about impossible log size constraints
  2022-07-19 21:44 ` [PATCH 1/1] mkfs: complain about impossible log size constraints Darrick J. Wong
@ 2022-08-05  2:23   ` Eric Sandeen
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2022-08-05  2:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 7/19/22 4:44 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> xfs/042 trips over an impossible fs geometry when nrext64 is enabled.
> The minimum log size calculation comes out to 4287 blocks, but the mkfs
> parameters specify an AG size of 4096 blocks.  This eventually causes
> mkfs to complain that the autoselected log size doesn't meet the minimum
> size, but we could be a little more explicit in pointing out that the
> two size constraints make for an impossible geometry.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Thanks for the update.

Reviewed-by : Eric Sandeen <sandeen@redhat.com>

> ---
>  mkfs/xfs_mkfs.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index b140b815..a5e2df76 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -3413,6 +3413,13 @@ _("external log device size %lld blocks too small, must be at least %lld blocks\
>  	 * an AG.
>  	 */
>  	max_logblocks = libxfs_alloc_ag_max_usable(mp) - 1;
> +	if (max_logblocks < min_logblocks) {
> +		fprintf(stderr,
> +_("max log size %d smaller than min log size %d, filesystem is too small\n"),
> +				max_logblocks,
> +				min_logblocks);
> +		usage();
> +	}
>  
>  	/* internal log - if no size specified, calculate automatically */
>  	if (!cfg->logblocks) {
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 3/1] xfs_repair: fix printf format specifiers on 32-bit platforms
  2022-07-19 21:44 [PATCHSET v2 0/1] xfsprogs: random fixes for 5.19 Darrick J. Wong
  2022-07-19 21:44 ` [PATCH 1/1] mkfs: complain about impossible log size constraints Darrick J. Wong
  2022-07-21 16:41 ` [PATCH 2/1] libxfs: stop overriding MAP_SYNC in publicly exported header files Darrick J. Wong
@ 2022-08-05  2:40 ` Darrick J. Wong
  2022-08-05  2:41 ` [PATCH 4/1] xfs_repair: retain superblock buffer to avoid write hook deadlock Darrick J. Wong
  3 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-08-05  2:40 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

armv7l builds spit out the following warnings:

In file included from ../include/platform_defs.h:44,
                 from ../include/libxfs.h:13,
                 from bmap.c:7:
bmap.c: In function ‘blkmap_alloc’:
bmap.c:41:11: error: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘xfs_extnum_t’ {aka ‘long long unsigned int’} [-Werror=format=]
   41 |         _("Number of extents requested in blkmap_alloc (%d) overflows 32 bits.\n"
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bmap.c:41:9: note: in expansion of macro ‘_’
   41 |         _("Number of extents requested in blkmap_alloc (%d) overflows 32 bits.\n"
      |         ^
bmap.c:41:58: note: format string is defined here
   41 |         _("Number of extents requested in blkmap_alloc (%d) overflows 32 bits.\n"
      |                                                         ~^
      |                                                          |
      |                                                          int
      |                                                         %lld
In file included from ../include/platform_defs.h:44,
                 from ../include/libxfs.h:13,
                 from bmap.c:7:
bmap.c:54:35: error: format ‘%zu’ expects argument of type ‘size_t’, but argument 2 has type ‘xfs_extnum_t’ {aka ‘long long unsigned int’} [-Werror=format=]
   54 |                         do_warn(_("malloc failed in blkmap_alloc (%zu bytes)\n"),
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bmap.c:54:33: note: in expansion of macro ‘_’
   54 |                         do_warn(_("malloc failed in blkmap_alloc (%zu bytes)\n"),
      |                                 ^
bmap.c:54:69: note: format string is defined here
   54 |                         do_warn(_("malloc failed in blkmap_alloc (%zu bytes)\n"),
      |                                                                   ~~^
      |                                                                     |
      |                                                                     unsigned int
      |                                                                   %llu

Fix these.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 logprint/log_misc.c |    8 ++++----
 repair/bmap.c       |    8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 82e1f682..836156e0 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -493,16 +493,16 @@ xlog_print_trans_inode_core(
 	nextents = ip->di_big_nextents;
     else
 	nextents = ip->di_nextents;
-    printf(_("size 0x%llx nblocks 0x%llx extsize 0x%x nextents 0x%lx\n"),
+    printf(_("size 0x%llx nblocks 0x%llx extsize 0x%x nextents 0x%llx\n"),
 	   (unsigned long long)ip->di_size, (unsigned long long)ip->di_nblocks,
-	   ip->di_extsize, nextents);
+	   ip->di_extsize, (unsigned long long)nextents);
 
     if (ip->di_flags2 & XFS_DIFLAG2_NREXT64)
 	nextents = ip->di_big_anextents;
     else
 	nextents = ip->di_anextents;
-    printf(_("naextents 0x%lx forkoff %d dmevmask 0x%x dmstate 0x%hx\n"),
-	   nextents, (int)ip->di_forkoff, ip->di_dmevmask, ip->di_dmstate);
+    printf(_("naextents 0x%llx forkoff %d dmevmask 0x%x dmstate 0x%hx\n"),
+	   (unsigned long long)nextents, (int)ip->di_forkoff, ip->di_dmevmask, ip->di_dmstate);
     printf(_("flags 0x%x gen 0x%x\n"),
 	   ip->di_flags, ip->di_gen);
     if (ip->di_version == 3) {
diff --git a/repair/bmap.c b/repair/bmap.c
index 44e43ab4..cd1a8b07 100644
--- a/repair/bmap.c
+++ b/repair/bmap.c
@@ -38,10 +38,10 @@ blkmap_alloc(
 #if (BITS_PER_LONG == 32)	/* on 64-bit platforms this is never true */
 	if (nex > BLKMAP_NEXTS_MAX) {
 		do_warn(
-	_("Number of extents requested in blkmap_alloc (%d) overflows 32 bits.\n"
+	_("Number of extents requested in blkmap_alloc (%llu) overflows 32 bits.\n"
 	  "If this is not a corruption, then you will need a 64 bit system\n"
 	  "to repair this filesystem.\n"),
-			nex);
+			(unsigned long long)nex);
 		return NULL;
 	}
 #endif
@@ -51,8 +51,8 @@ blkmap_alloc(
 	if (!blkmap || blkmap->naexts < nex) {
 		blkmap = realloc(blkmap, BLKMAP_SIZE(nex));
 		if (!blkmap) {
-			do_warn(_("malloc failed in blkmap_alloc (%zu bytes)\n"),
-				BLKMAP_SIZE(nex));
+			do_warn(_("malloc failed in blkmap_alloc (%llu bytes)\n"),
+				(unsigned long long)BLKMAP_SIZE(nex));
 			return NULL;
 		}
 		pthread_setspecific(key, blkmap);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/1] xfs_repair: retain superblock buffer to avoid write hook deadlock
  2022-07-19 21:44 [PATCHSET v2 0/1] xfsprogs: random fixes for 5.19 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2022-08-05  2:40 ` [PATCH 3/1] xfs_repair: fix printf format specifiers on 32-bit platforms Darrick J. Wong
@ 2022-08-05  2:41 ` Darrick J. Wong
  3 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-08-05  2:41 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Every now and then I experience the following deadlock in xfs_repair
when I'm running the offline repair fuzz tests:

#0  futex_wait (private=0, expected=2, futex_word=0x55555566df70) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x55555566df70, private=0) at ./nptl/lowlevellock.c:49
#2  lll_mutex_lock_optimized (mutex=0x55555566df70) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x55555566df70) at ./nptl/pthread_mutex_lock.c:93
#4  cache_shake (cache=cache@entry=0x55555566de60, priority=priority@entry=2, purge=purge@entry=false) at cache.c:231
#5  cache_node_get (cache=cache@entry=0x55555566de60, key=key@entry=0x7fffe55e01b0, nodep=nodep@entry=0x7fffe55e0168) at cache.c:452
#6  __cache_lookup (key=key@entry=0x7fffe55e01b0, flags=0, bpp=bpp@entry=0x7fffe55e0228) at rdwr.c:405
#7  libxfs_getbuf_flags (btp=0x55555566de00, blkno=0, len=<optimized out>, flags=<optimized out>, bpp=0x7fffe55e0228) at rdwr.c:457
#8  libxfs_buf_read_map (btp=0x55555566de00, map=map@entry=0x7fffe55e0280, nmaps=nmaps@entry=1, flags=flags@entry=0, bpp=bpp@entry=0x7fffe55e0278, ops=0x5555556233e0 <xfs_sb_buf_ops>)
    at rdwr.c:704
#9  libxfs_buf_read (ops=<optimized out>, bpp=0x7fffe55e0278, flags=0, numblks=<optimized out>, blkno=0, target=<optimized out>)
    at /storage/home/djwong/cdev/work/xfsprogs/build-x86_64/libxfs/libxfs_io.h:195
#10 libxfs_getsb (mp=mp@entry=0x7fffffffd690) at rdwr.c:162
#11 force_needsrepair (mp=0x7fffffffd690) at xfs_repair.c:924
#12 repair_capture_writeback (bp=<optimized out>) at xfs_repair.c:1000
#13 libxfs_bwrite (bp=0x7fffe011e530) at rdwr.c:869
#14 cache_shake (cache=cache@entry=0x55555566de60, priority=priority@entry=2, purge=purge@entry=false) at cache.c:240
#15 cache_node_get (cache=cache@entry=0x55555566de60, key=key@entry=0x7fffe55e0470, nodep=nodep@entry=0x7fffe55e0428) at cache.c:452
#16 __cache_lookup (key=key@entry=0x7fffe55e0470, flags=1, bpp=bpp@entry=0x7fffe55e0538) at rdwr.c:405
#17 libxfs_getbuf_flags (btp=0x55555566de00, blkno=12736, len=<optimized out>, flags=<optimized out>, bpp=0x7fffe55e0538) at rdwr.c:457
#18 __libxfs_buf_get_map (btp=<optimized out>, map=map@entry=0x7fffe55e05b0, nmaps=<optimized out>, flags=flags@entry=1, bpp=bpp@entry=0x7fffe55e0538) at rdwr.c:501
#19 libxfs_buf_get_map (btp=<optimized out>, map=map@entry=0x7fffe55e05b0, nmaps=<optimized out>, flags=flags@entry=1, bpp=bpp@entry=0x7fffe55e0538) at rdwr.c:525
#20 pf_queue_io (args=args@entry=0x5555556722c0, map=map@entry=0x7fffe55e05b0, nmaps=<optimized out>, flag=flag@entry=11) at prefetch.c:124
#21 pf_read_bmbt_reclist (args=0x5555556722c0, rp=<optimized out>, numrecs=78) at prefetch.c:220
#22 pf_scan_lbtree (dbno=dbno@entry=1211, level=level@entry=1, isadir=isadir@entry=1, args=args@entry=0x5555556722c0, func=0x55555557f240 <pf_scanfunc_bmap>) at prefetch.c:298
#23 pf_read_btinode (isadir=1, dino=<optimized out>, args=0x5555556722c0) at prefetch.c:385
#24 pf_read_inode_dirs (args=args@entry=0x5555556722c0, bp=bp@entry=0x7fffdc023790) at prefetch.c:459
#25 pf_read_inode_dirs (bp=<optimized out>, args=0x5555556722c0) at prefetch.c:411
#26 pf_batch_read (args=args@entry=0x5555556722c0, which=which@entry=PF_PRIMARY, buf=buf@entry=0x7fffd001d000) at prefetch.c:609
#27 pf_io_worker (param=0x5555556722c0) at prefetch.c:673
#28 start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#29 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

From this stack trace, we see that xfs_repair's prefetch module is
getting some xfs_buf objects ahead of initiating a read (#19).  The
buffer cache has hit its limit, so it calls cache_shake (#14) to free
some unused xfs_bufs.  The buffer it finds is a dirty buffer, so it
calls libxfs_bwrite to flush it out to disk, which in turn invokes the
buffer write hook that xfs_repair set up in 3b7667cb to mark the ondisk
filesystem's superblock as NEEDSREPAIR until repair actually completes.

Unfortunately, the NEEDSREPAIR handler itself needs to grab the
superblock buffer, so it makes another call into the buffer cache (#9),
which sees that the cache is full and tries to shake it(#4).  Hence we
deadlock on cm_mutex because shaking is not reentrant.

Fix this by retaining a reference to the superblock buffer when possible
so that the writeback hook doesn't have to access the buffer cache to
set NEEDSREPAIR.

Fixes: 3b7667cb ("xfs_repair: set NEEDSREPAIR the first time we write to a filesystem")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    2 +
 libxfs/libxfs_io.h       |    1 +
 libxfs/rdwr.c            |    8 +++++
 repair/phase2.c          |    8 +++++
 repair/protos.h          |    1 +
 repair/xfs_repair.c      |   75 ++++++++++++++++++++++++++++++++++++++++------
 6 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 370ad8b3..9e5da210 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -51,9 +51,11 @@
 #define xfs_buf_delwri_submit		libxfs_buf_delwri_submit
 #define xfs_buf_get			libxfs_buf_get
 #define xfs_buf_get_uncached		libxfs_buf_get_uncached
+#define xfs_buf_lock			libxfs_buf_lock
 #define xfs_buf_read			libxfs_buf_read
 #define xfs_buf_read_uncached		libxfs_buf_read_uncached
 #define xfs_buf_relse			libxfs_buf_relse
+#define xfs_buf_unlock			libxfs_buf_unlock
 #define xfs_bunmapi			libxfs_bunmapi
 #define xfs_bwrite			libxfs_bwrite
 #define xfs_calc_dquots_per_chunk	libxfs_calc_dquots_per_chunk
diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index 9c0e2704..fae86427 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -226,6 +226,7 @@ xfs_buf_hold(struct xfs_buf *bp)
 }
 
 void xfs_buf_lock(struct xfs_buf *bp);
+void xfs_buf_unlock(struct xfs_buf *bp);
 
 int libxfs_buf_get_uncached(struct xfs_buftarg *targ, size_t bblen, int flags,
 		struct xfs_buf **bpp);
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index fe69f9b4..61083dac 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -384,6 +384,14 @@ xfs_buf_lock(
 		pthread_mutex_lock(&bp->b_lock);
 }
 
+void
+xfs_buf_unlock(
+	struct xfs_buf	*bp)
+{
+	if (use_xfs_buf_lock)
+		pthread_mutex_unlock(&bp->b_lock);
+}
+
 static int
 __cache_lookup(
 	struct xfs_bufkey	*key,
diff --git a/repair/phase2.c b/repair/phase2.c
index 13832701..df27adab 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -251,6 +251,14 @@ phase2(
 	} else
 		do_log(_("Phase 2 - using internal log\n"));
 
+	/*
+	 * Now that we've set up the buffer cache the way we want it, try to
+	 * grab our own reference to the primary sb so that the hooks will not
+	 * have to call out to the buffer cache.
+	 */
+	if (mp->m_buf_writeback_fn)
+		retain_primary_sb(mp);
+
 	/* Zero log if applicable */
 	do_log(_("        - zero log...\n"));
 
diff --git a/repair/protos.h b/repair/protos.h
index 03ebae14..83e471ff 100644
--- a/repair/protos.h
+++ b/repair/protos.h
@@ -16,6 +16,7 @@ int	get_sb(xfs_sb_t			*sbp,
 		xfs_off_t			off,
 		int			size,
 		xfs_agnumber_t		agno);
+int retain_primary_sb(struct xfs_mount *mp);
 void	write_primary_sb(xfs_sb_t	*sbp,
 			int		size);
 
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index d08b0cec..7fd891e3 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -738,6 +738,63 @@ check_fs_vs_host_sectsize(
 	}
 }
 
+/*
+ * If we set up a writeback function to set NEEDSREPAIR while the filesystem is
+ * dirty, there's a chance that calling libxfs_getsb could deadlock the buffer
+ * cache while trying to get the primary sb buffer if the first non-sb write to
+ * the filesystem is the result of a cache shake.  Retain a reference to the
+ * primary sb buffer to avoid all that.
+ */
+static struct xfs_buf *primary_sb_bp;	/* buffer for superblock */
+
+int
+retain_primary_sb(
+	struct xfs_mount	*mp)
+{
+	int			error;
+
+	error = -libxfs_buf_read(mp->m_ddev_targp, XFS_SB_DADDR,
+			XFS_FSS_TO_BB(mp, 1), 0, &primary_sb_bp,
+			&xfs_sb_buf_ops);
+	if (error)
+		return error;
+
+	libxfs_buf_unlock(primary_sb_bp);
+	return 0;
+}
+
+static void
+drop_primary_sb(void)
+{
+	if (!primary_sb_bp)
+		return;
+
+	libxfs_buf_lock(primary_sb_bp);
+	libxfs_buf_relse(primary_sb_bp);
+	primary_sb_bp = NULL;
+}
+
+static int
+get_primary_sb(
+	struct xfs_mount	*mp,
+	struct xfs_buf		**bpp)
+{
+	int			error;
+
+	*bpp = NULL;
+
+	if (!primary_sb_bp) {
+		error = retain_primary_sb(mp);
+		if (error)
+			return error;
+	}
+
+	libxfs_buf_lock(primary_sb_bp);
+	xfs_buf_hold(primary_sb_bp);
+	*bpp = primary_sb_bp;
+	return 0;
+}
+
 /* Clear needsrepair after a successful repair run. */
 void
 clear_needsrepair(
@@ -758,15 +815,14 @@ clear_needsrepair(
 		do_warn(
 	_("Cannot clear needsrepair due to flush failure, err=%d.\n"),
 			error);
-		return;
+		goto drop;
 	}
 
 	/* Clear needsrepair from the superblock. */
-	bp = libxfs_getsb(mp);
-	if (!bp || bp->b_error) {
+	error = get_primary_sb(mp, &bp);
+	if (error) {
 		do_warn(
-	_("Cannot clear needsrepair from primary super, err=%d.\n"),
-			bp ? bp->b_error : ENOMEM);
+	_("Cannot clear needsrepair from primary super, err=%d.\n"), error);
 	} else {
 		mp->m_sb.sb_features_incompat &=
 				~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
@@ -775,6 +831,8 @@ clear_needsrepair(
 	}
 	if (bp)
 		libxfs_buf_relse(bp);
+drop:
+	drop_primary_sb();
 }
 
 static void
@@ -797,11 +855,10 @@ force_needsrepair(
 	    xfs_sb_version_needsrepair(&mp->m_sb))
 		return;
 
-	bp = libxfs_getsb(mp);
-	if (!bp || bp->b_error) {
+	error = get_primary_sb(mp, &bp);
+	if (error) {
 		do_log(
-	_("couldn't get superblock to set needsrepair, err=%d\n"),
-				bp ? bp->b_error : ENOMEM);
+	_("couldn't get superblock to set needsrepair, err=%d\n"), error);
 	} else {
 		/*
 		 * It's possible that we need to set NEEDSREPAIR before we've

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-05  2:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-19 21:44 [PATCHSET v2 0/1] xfsprogs: random fixes for 5.19 Darrick J. Wong
2022-07-19 21:44 ` [PATCH 1/1] mkfs: complain about impossible log size constraints Darrick J. Wong
2022-08-05  2:23   ` Eric Sandeen
2022-07-21 16:41 ` [PATCH 2/1] libxfs: stop overriding MAP_SYNC in publicly exported header files Darrick J. Wong
2022-08-05  2:40 ` [PATCH 3/1] xfs_repair: fix printf format specifiers on 32-bit platforms Darrick J. Wong
2022-08-05  2:41 ` [PATCH 4/1] xfs_repair: retain superblock buffer to avoid write hook deadlock Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox