* [PATCHSET 3/3] fstests: integrate with coredump capturing
@ 2025-07-29 20:08 Darrick J. Wong
2025-07-29 20:10 ` [PATCH 1/2] fsstress: don't abort when stat(".") returns EIO Darrick J. Wong
2025-07-29 20:11 ` [PATCH 2/2] check: collect core dumps from systemd-coredump Darrick J. Wong
0 siblings, 2 replies; 8+ messages in thread
From: Darrick J. Wong @ 2025-07-29 20:08 UTC (permalink / raw)
To: djwong, zlang; +Cc: fstests, linux-xfs
Hi all,
Integrate fstests with coredump capturing tools such as systemd-coredump.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.
--D
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=coredump-capture
---
Commits in this patchset:
* fsstress: don't abort when stat(".") returns EIO
* check: collect core dumps from systemd-coredump
---
check | 2 ++
common/rc | 44 ++++++++++++++++++++++++++++++++++++++++++++
ltp/fsstress.c | 15 ++++++++++++++-
3 files changed, 60 insertions(+), 1 deletion(-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] fsstress: don't abort when stat(".") returns EIO
2025-07-29 20:08 [PATCHSET 3/3] fstests: integrate with coredump capturing Darrick J. Wong
@ 2025-07-29 20:10 ` Darrick J. Wong
2025-07-30 14:23 ` Christoph Hellwig
2025-07-29 20:11 ` [PATCH 2/2] check: collect core dumps from systemd-coredump Darrick J. Wong
1 sibling, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2025-07-29 20:10 UTC (permalink / raw)
To: djwong, zlang; +Cc: fstests, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
First, start with the premise that fstests is run with a nonzero limit
on the size of core dumps so that we can capture the state of
misbehaving fs utilities like fsck and scrub if they crash.
When fsstress is compiled with DEBUG defined (which is the default), it
will periodically call check_cwd to ensure that the current working
directory hasn't changed out from underneath it.
If the filesystem is XFS and it shuts down, the stat64() calls will
start returning EIO. In this case, we follow the out: label and call
abort() to exit the program. Historically this did not produce any core
dumps because $PWD is on the dead filesystem and the write fails.
However, modern systems are often configured to capture coredumps using
some external mechanism, e.g. abrt/systemd-coredump. In this case, the
capture tool will succeed in capturing every crashed process, which
fills the crash dump directory with a lot of useless junk. Worse, if
the capture tool is configured to pass the dumps to fstests, it will
flag the test as failed because something dumped core.
This is really silly, because basic stat requests for the current
working directory can be satisfied from the inode cache without a disk
access. In this narrow situation, EIO only happens when the fs has shut
down, so just exit the program.
We really should have a way to query if a filesystem is shut down that
isn't conflated with (possibly transient) EIO errors. But for now this
is what we have to do. :(
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
ltp/fsstress.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index 8dbfb81f95a538..d4abe561787f19 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -1049,8 +1049,21 @@ check_cwd(void)
ret = stat64(".", &statbuf);
if (ret != 0) {
+ int error = errno;
+
fprintf(stderr, "fsstress: check_cwd stat64() returned %d with errno: %d (%s)\n",
- ret, errno, strerror(errno));
+ ret, error, strerror(error));
+
+ /*
+ * The current working directory is pinned in memory, which
+ * means that stat should not have had to do any disk accesses
+ * to retrieve stat information. Treat an EIO as an indication
+ * that the filesystem shut down and exit instead of dumping
+ * core like the abort() below does.
+ */
+ if (error == EIO)
+ exit(1);
+
goto out;
}
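[Editor's note: the "nonzero limit on the size of core dumps" premise above corresponds to something like the following in the shell that launches ./check. This is a sketch of the stated precondition, not part of the patch; raising the soft limit can fail if the hard limit is lower.]

```shell
# Enable core dumps before running fstests, per the premise above.
# Raises the soft core-size limit to the hard limit's maximum.
ulimit -c unlimited
# Confirm the limit actually took effect:
ulimit -c
```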
* [PATCH 2/2] check: collect core dumps from systemd-coredump
2025-07-29 20:08 [PATCHSET 3/3] fstests: integrate with coredump capturing Darrick J. Wong
2025-07-29 20:10 ` [PATCH 1/2] fsstress: don't abort when stat(".") returns EIO Darrick J. Wong
@ 2025-07-29 20:11 ` Darrick J. Wong
2025-08-02 13:47 ` Zorro Lang
2025-08-13 15:18 ` [PATCH v2 " Darrick J. Wong
1 sibling, 2 replies; 8+ messages in thread
From: Darrick J. Wong @ 2025-07-29 20:11 UTC (permalink / raw)
To: djwong, zlang; +Cc: fstests, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
On modern RHEL (>=8) and Debian KDE systems, systemd-coredump can be
installed to capture core dumps from crashed programs. If this is the
case, we would like to capture core dumps from programs that crash
during the test. Set up an (admittedly overwrought) pipeline to extract
dumps created during the test and then capture them the same way that we
pick up "core" and "core.$pid" files.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
check | 2 ++
common/rc | 44 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/check b/check
index ce7eacb7c45d9e..77581e438c46b9 100755
--- a/check
+++ b/check
@@ -924,6 +924,7 @@ function run_section()
$1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
END {printf "%s", lasttime}' "$check.time"
rm -f core $seqres.notrun
+ _start_coredumpctl_collection
start=`_wallclock`
$timestamp && _timestamp
@@ -957,6 +958,7 @@ function run_section()
# just "core". Use globbing to find the most common patterns,
# assuming there are no other coredump capture packages set up.
local cores=0
+ _finish_coredumpctl_collection
for i in core core.*; do
test -f "$i" || continue
if ((cores++ == 0)); then
diff --git a/common/rc b/common/rc
index 04b721b7318a7e..e4c4d05387f44e 100644
--- a/common/rc
+++ b/common/rc
@@ -5034,6 +5034,50 @@ _check_kmemleak()
fi
}
+# Current timestamp, in a format that systemd likes
+_systemd_now() {
+ timedatectl show --property=TimeUSec --value
+}
+
+# Do what we need to do to capture core dumps from coredumpctl
+_start_coredumpctl_collection() {
+ command -v coredumpctl &>/dev/null || return
+ command -v timedatectl &>/dev/null || return
+ command -v jq &>/dev/null || return
+
+ sysctl kernel.core_pattern | grep -q systemd-coredump || return
+ COREDUMPCTL_START_TIMESTAMP="$(_systemd_now)"
+}
+
+# Capture core dumps from coredumpctl.
+#
+# coredumpctl list only supports json output as a machine-readable format. The
+# human-readable format intermingles spaces from the timestamp with actual
+# column separators, so we cannot parse that sanely. The json output is an
+# array of:
+# {
+# "time" : 1749744847150926,
+# "pid" : 2297,
+# "uid" : 0,
+# "gid" : 0,
+# "sig" : 6,
+# "corefile" : "present",
+# "exe" : "/run/fstests/e2fsprogs/fuse2fs",
+# "size" : 47245
+# },
+# So we use jq to filter out lost corefiles, then print the pid and exe
+# separated by a pipe and hope that nobody ever puts a pipe in an executable
+# name.
+_finish_coredumpctl_collection() {
+ test -n "$COREDUMPCTL_START_TIMESTAMP" || return
+
+ coredumpctl list --since="$COREDUMPCTL_START_TIMESTAMP" --json=short 2>/dev/null | \
+ jq --raw-output 'map(select(.corefile == "present")) | map("\(.pid)|\(.exe)") | .[]' | while IFS='|' read pid exe; do
+ test -e "core.$pid" || coredumpctl dump --output="core.$pid" "$pid" "$exe" &>> $seqres.full
+ done
+ unset COREDUMPCTL_START_TIMESTAMP
+}
+
# don't check dmesg log after test
_disable_dmesg_check()
{
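[Editor's note: the jq filter in _finish_coredumpctl_collection can be exercised standalone against the sample record quoted in the comment above. The second "missing" record below is fabricated to show the filter dropping lost corefiles; it is not from the patch.]

```shell
# Demo of the jq filter used above. The "present" record is the sample
# from the commit's comment; the "missing" one must be filtered out.
json='[
  { "time": 1749744847150926, "pid": 2297, "uid": 0, "gid": 0,
    "sig": 6, "corefile": "present",
    "exe": "/run/fstests/e2fsprogs/fuse2fs", "size": 47245 },
  { "time": 1749744847151000, "pid": 2301, "uid": 0, "gid": 0,
    "sig": 11, "corefile": "missing",
    "exe": "/run/fstests/e2fsprogs/fuse2fs", "size": 0 }
]'
out="$(printf '%s' "$json" | \
    jq --raw-output 'map(select(.corefile == "present")) | map("\(.pid)|\(.exe)") | .[]')"
printf '%s\n' "$out"
```

Only the pid and exe of the record whose corefile is "present" survive, joined by the pipe separator that the while-read loop in _finish_coredumpctl_collection splits on.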
* Re: [PATCH 1/2] fsstress: don't abort when stat(".") returns EIO
2025-07-29 20:10 ` [PATCH 1/2] fsstress: don't abort when stat(".") returns EIO Darrick J. Wong
@ 2025-07-30 14:23 ` Christoph Hellwig
2025-07-30 14:55 ` Darrick J. Wong
0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2025-07-30 14:23 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: zlang, fstests, linux-xfs
On Tue, Jul 29, 2025 at 01:10:50PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> First, start with the premise that fstests is run with a nonzero limit
> on the size of core dumps so that we can capture the state of
> misbehaving fs utilities like fsck and scrub if they crash.
Can you explain what this has to do with core dumping?
I'm just really confused between this patch content and the subject of
this patch and the entire series.
> This is really silly, because basic stat requests for the current
> working directory can be satisfied from the inode cache without a disk
> access. In this narrow situation, EIO only happens when the fs has shut
> down, so just exit the program.
If we think it's silly we can trivially drop the xfs_is_shutdown check
in xfs_vn_getattr. But is it really silly? We've tried to basically
make every file system operation consistently fail on shut down
file systems.
> We really should have a way to query if a filesystem is shut down that
> isn't conflated with (possibly transient) EIO errors. But for now this
> is what we have to do. :(
Well, a new STATX_ flag would work, assuming stat doesn't actually
fail :) Otherwise a new ioctl/fcntl would make sense, especially as
the shutdown concept has spread beyond XFS.
* Re: [PATCH 1/2] fsstress: don't abort when stat(".") returns EIO
2025-07-30 14:23 ` Christoph Hellwig
@ 2025-07-30 14:55 ` Darrick J. Wong
0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2025-07-30 14:55 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: zlang, fstests, linux-xfs
On Wed, Jul 30, 2025 at 07:23:24AM -0700, Christoph Hellwig wrote:
> On Tue, Jul 29, 2025 at 01:10:50PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > First, start with the premise that fstests is run with a nonzero limit
> > on the size of core dumps so that we can capture the state of
> > misbehaving fs utilities like fsck and scrub if they crash.
>
> Can you explain what this has to do with core dumping?
>
> I'm just really confused between this patch content and the subject of
> this patch and the entire series..
It's a bugfix ahead of new behaviors introduced in patch 2. I clearly
didn't explain this well enough, so I'll try again.
Before abrt/systemd-coredump, FS_IOC_SHUTDOWN fsstress tests would do
something like the following:
1. start fsstress, which chdirs to $TEST_DIR
2. shut down the filesystem
3. fsstress tries to stat($TEST_DIR), fails, and calls abort
4. abort triggers coredump
5. kernel fails to write "core" to $TEST_DIR (because fs is shut down)
6. test finishes, no core files written to $here, test passes
Once you install systemd-coredump, that changes to:
same 1-4 above
5. kernel pipes core file to coredumpctl, which writes it to /var/crash
6. test finishes, no core files written to $here, test passes
And then with patch 2 of this series, that becomes:
same 1-4 above
5. kernel pipes core file to coredumpctl, which writes it to /var/crash
6. test finishes, ./check queries coredumpctl for any new coredumps,
and copies them to $here
7. ./check finds core files written to $here, test fails
Now we've caused a test failure where there was none before, simply
because the crash reporting improved.
Therefore this patch changes fsstress not to call abort() from check_cwd
when it has a reasonable suspicion that the fs has died.
(Did that help? /me is still pre-coffee...)
> > This is really silly, because basic stat requests for the current
> > working directory can be satisfied from the inode cache without a disk
> > access. In this narrow situation, EIO only happens when the fs has shut
> > down, so just exit the program.
>
> If we think it's silly we can trivially drop the xfs_is_shutdown check
> in xfs_vn_getattr. But is it really silly? We've tried to basically
> make every file system operation consistently fail on shut down
> file systems,
No no, "really silly" refers to failing tests that we didn't use to
fail.
> > We really should have a way to query if a filesystem is shut down that
> > isn't conflated with (possibly transient) EIO errors. But for now this
> > is what we have to do. :(
>
> Well, a new STATX_ flag would work, assuming stat doesn't actually
> fail :) Otherwise a new ioctl/fcntl would make sense, especially as
> the shutdown concept has spread beyond XFS.
I think we ought to add a new ioctl or something so that callers can
positively identify a shut down filesystem. bfoster I think was asking
about that for fstests some years back, and ended up coding a bunch of
grep heuristics to work around the lack of a real call.
I think we can't drop the "stat{,x} returns EIO on shutdown fs" behavior
because I know of a few, uh, users whose heartbeat monitor periodically
queries statx($PWD) and reboots the node if it returns errno.
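[Editor's note: a heartbeat-style probe under today's EIO convention could be sketched in shell as below. The function name and the error-message match are illustrative assumptions, not anything from the patch or from the users mentioned above.]

```shell
# Treat EIO from a stat of a pinned directory as "the fs shut down",
# mirroring the heuristic fsstress now uses. Assumes GNU stat.
fs_looks_shutdown() {
    # Capture stat's stderr; the assignment propagates stat's status.
    err="$(stat -c %i "$1" 2>&1 >/dev/null)"
    if [ $? -ne 0 ]; then
        # Only EIO counts as shutdown; ENOENT etc. do not.
        printf '%s\n' "$err" | grep -q "Input/output error"
    else
        return 1
    fi
}

# A healthy mount is not reported as shut down:
fs_looks_shutdown / && echo "shut down" || echo "healthy"
```

A positive query interface (the ioctl discussed above) would make this sort of errno sniffing unnecessary.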
--D
* Re: [PATCH 2/2] check: collect core dumps from systemd-coredump
2025-07-29 20:11 ` [PATCH 2/2] check: collect core dumps from systemd-coredump Darrick J. Wong
@ 2025-08-02 13:47 ` Zorro Lang
2025-08-12 18:14 ` Darrick J. Wong
2025-08-13 15:18 ` [PATCH v2 " Darrick J. Wong
1 sibling, 1 reply; 8+ messages in thread
From: Zorro Lang @ 2025-08-02 13:47 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: fstests, linux-xfs
On Tue, Jul 29, 2025 at 01:11:06PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> On modern RHEL (>=8) and Debian KDE systems, systemd-coredump can be
> installed to capture core dumps from crashed programs. If this is the
> case, we would like to capture core dumps from programs that crash
> during the test. Set up an (admittedly overwrought) pipeline to extract
> dumps created during the test and then capture them the same way that we
> pick up "core" and "core.$pid" files.
>
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
> check | 2 ++
> common/rc | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 46 insertions(+)
>
>
> diff --git a/check b/check
> index ce7eacb7c45d9e..77581e438c46b9 100755
> --- a/check
> +++ b/check
> @@ -924,6 +924,7 @@ function run_section()
> $1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
> END {printf "%s", lasttime}' "$check.time"
> rm -f core $seqres.notrun
> + _start_coredumpctl_collection
>
> start=`_wallclock`
> $timestamp && _timestamp
> @@ -957,6 +958,7 @@ function run_section()
> # just "core". Use globbing to find the most common patterns,
> # assuming there are no other coredump capture packages set up.
> local cores=0
> + _finish_coredumpctl_collection
> for i in core core.*; do
> test -f "$i" || continue
> if ((cores++ == 0)); then
> diff --git a/common/rc b/common/rc
> index 04b721b7318a7e..e4c4d05387f44e 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -5034,6 +5034,50 @@ _check_kmemleak()
> fi
> }
>
> +# Current timestamp, in a format that systemd likes
> +_systemd_now() {
> + timedatectl show --property=TimeUSec --value
> +}
> +
> +# Do what we need to do to capture core dumps from coredumpctl
> +_start_coredumpctl_collection() {
> + command -v coredumpctl &>/dev/null || return
> + command -v timedatectl &>/dev/null || return
> + command -v jq &>/dev/null || return
> +
> + sysctl kernel.core_pattern | grep -q systemd-coredump || return
# rpm -qf `which coredumpctl`
systemd-udev-252-53.el9.x86_64
# rpm -qf `which timedatectl`
systemd-252-53.el9.x86_64
# rpm -qf `which jq`
jq-1.6-17.el9.x86_64
# rpm -qf /usr/lib/systemd/systemd-coredump
systemd-udev-252-53.el9.x86_64
So we have 3 optional runtime dependencies; how about mentioning them in the README?
Thanks,
Zorro
* Re: [PATCH 2/2] check: collect core dumps from systemd-coredump
2025-08-02 13:47 ` Zorro Lang
@ 2025-08-12 18:14 ` Darrick J. Wong
0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2025-08-12 18:14 UTC (permalink / raw)
To: Zorro Lang; +Cc: fstests, linux-xfs
On Sat, Aug 02, 2025 at 09:47:00PM +0800, Zorro Lang wrote:
> On Tue, Jul 29, 2025 at 01:11:06PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > On modern RHEL (>=8) and Debian KDE systems, systemd-coredump can be
> > installed to capture core dumps from crashed programs. If this is the
> > case, we would like to capture core dumps from programs that crash
> > during the test. Set up an (admittedly overwrought) pipeline to extract
> > dumps created during the test and then capture them the same way that we
> > pick up "core" and "core.$pid" files.
> >
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > ---
> > check | 2 ++
> > common/rc | 44 ++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 46 insertions(+)
> >
> >
> > diff --git a/check b/check
> > index ce7eacb7c45d9e..77581e438c46b9 100755
> > --- a/check
> > +++ b/check
> > @@ -924,6 +924,7 @@ function run_section()
> > $1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
> > END {printf "%s", lasttime}' "$check.time"
> > rm -f core $seqres.notrun
> > + _start_coredumpctl_collection
> >
> > start=`_wallclock`
> > $timestamp && _timestamp
> > @@ -957,6 +958,7 @@ function run_section()
> > # just "core". Use globbing to find the most common patterns,
> > # assuming there are no other coredump capture packages set up.
> > local cores=0
> > + _finish_coredumpctl_collection
> > for i in core core.*; do
> > test -f "$i" || continue
> > if ((cores++ == 0)); then
> > diff --git a/common/rc b/common/rc
> > index 04b721b7318a7e..e4c4d05387f44e 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -5034,6 +5034,50 @@ _check_kmemleak()
> > fi
> > }
> >
> > +# Current timestamp, in a format that systemd likes
> > +_systemd_now() {
> > + timedatectl show --property=TimeUSec --value
> > +}
> > +
> > +# Do what we need to do to capture core dumps from coredumpctl
> > +_start_coredumpctl_collection() {
> > + command -v coredumpctl &>/dev/null || return
> > + command -v timedatectl &>/dev/null || return
> > + command -v jq &>/dev/null || return
> > +
> > + sysctl kernel.core_pattern | grep -q systemd-coredump || return
>
> # rpm -qf `which coredumpctl`
> systemd-udev-252-53.el9.x86_64
> # rpm -qf `which timedatectl`
> systemd-252-53.el9.x86_64
> # rpm -qf `which jq`
> jq-1.6-17.el9.x86_64
> # rpm -qf /usr/lib/systemd/systemd-coredump
> systemd-udev-252-53.el9.x86_64
>
> So we have 3 optional running dependences, how about metion that in README?
Done.
--D
* [PATCH v2 2/2] check: collect core dumps from systemd-coredump
2025-07-29 20:11 ` [PATCH 2/2] check: collect core dumps from systemd-coredump Darrick J. Wong
2025-08-02 13:47 ` Zorro Lang
@ 2025-08-13 15:18 ` Darrick J. Wong
1 sibling, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2025-08-13 15:18 UTC (permalink / raw)
To: zlang; +Cc: fstests, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
On modern RHEL (>=8) and Debian KDE systems, systemd-coredump can be
installed to capture core dumps from crashed programs. If this is the
case, we would like to capture core dumps from programs that crash
during the test. Set up an (admittedly overwrought) pipeline to extract
dumps created during the test and then capture them the same way that we
pick up "core" and "core.$pid" files.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
v2: update README
---
README | 20 ++++++++++++++++++++
check | 2 ++
common/rc | 44 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 66 insertions(+)
diff --git a/README b/README
index de452485af87a3..14e54a00c9e1a2 100644
--- a/README
+++ b/README
@@ -109,6 +109,11 @@ Ubuntu or Debian
$ sudo apt-get install exfatprogs f2fs-tools ocfs2-tools udftools xfsdump \
xfslibs-dev
+3. Install packages for optional features:
+
+ systemd coredump capture:
+ $ sudo apt install systemd-coredump systemd jq
+
Fedora
------
@@ -124,6 +129,11 @@ Fedora
$ sudo yum install btrfs-progs exfatprogs f2fs-tools ocfs2-tools xfsdump \
xfsprogs-devel
+3. Install packages for optional features:
+
+ systemd coredump capture:
+ $ sudo yum install systemd systemd-udev jq
+
RHEL or CentOS
--------------
@@ -159,6 +169,11 @@ RHEL or CentOS
For ocfs2 build and install:
- see https://github.com/markfasheh/ocfs2-tools
+5. Install packages for optional features:
+
+ systemd coredump capture:
+ $ sudo yum install systemd systemd-udev jq
+
SUSE Linux Enterprise or openSUSE
---------------------------------
@@ -176,6 +191,11 @@ SUSE Linux Enterprise or openSUSE
For XFS install:
$ sudo zypper install xfsdump xfsprogs-devel
+3. Install packages for optional features:
+
+ systemd coredump capture:
+ $ sudo zypper install systemd systemd-coredump jq
+
Build and install test, libs and utils
--------------------------------------
diff --git a/check b/check
index 7ef6c9b3d69df5..37f733d0f2afb2 100755
--- a/check
+++ b/check
@@ -924,6 +924,7 @@ function run_section()
$1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
END {printf "%s", lasttime}' "$check.time"
rm -f core $seqres.notrun
+ _start_coredumpctl_collection
start=`_wallclock`
$timestamp && _timestamp
@@ -957,6 +958,7 @@ function run_section()
# just "core". Use globbing to find the most common patterns,
# assuming there are no other coredump capture packages set up.
local cores=0
+ _finish_coredumpctl_collection
for i in core core.*; do
test -f "$i" || continue
if ((cores++ == 0)); then
diff --git a/common/rc b/common/rc
index 3b853a913bee44..335d995909f74c 100644
--- a/common/rc
+++ b/common/rc
@@ -5053,6 +5053,50 @@ _check_kmemleak()
fi
}
+# Current timestamp, in a format that systemd likes
+_systemd_now() {
+ timedatectl show --property=TimeUSec --value
+}
+
+# Do what we need to do to capture core dumps from coredumpctl
+_start_coredumpctl_collection() {
+ command -v coredumpctl &>/dev/null || return
+ command -v timedatectl &>/dev/null || return
+ command -v jq &>/dev/null || return
+
+ sysctl kernel.core_pattern | grep -q systemd-coredump || return
+ COREDUMPCTL_START_TIMESTAMP="$(_systemd_now)"
+}
+
+# Capture core dumps from coredumpctl.
+#
+# coredumpctl list only supports json output as a machine-readable format. The
+# human-readable format intermingles spaces from the timestamp with actual
+# column separators, so we cannot parse that sanely. The json output is an
+# array of:
+# {
+# "time" : 1749744847150926,
+# "pid" : 2297,
+# "uid" : 0,
+# "gid" : 0,
+# "sig" : 6,
+# "corefile" : "present",
+# "exe" : "/run/fstests/e2fsprogs/fuse2fs",
+# "size" : 47245
+# },
+# So we use jq to filter out lost corefiles, then print the pid and exe
+# separated by a pipe and hope that nobody ever puts a pipe in an executable
+# name.
+_finish_coredumpctl_collection() {
+ test -n "$COREDUMPCTL_START_TIMESTAMP" || return
+
+ coredumpctl list --since="$COREDUMPCTL_START_TIMESTAMP" --json=short 2>/dev/null | \
+ jq --raw-output 'map(select(.corefile == "present")) | map("\(.pid)|\(.exe)") | .[]' | while IFS='|' read pid exe; do
+ test -e "core.$pid" || coredumpctl dump --output="core.$pid" "$pid" "$exe" &>> $seqres.full
+ done
+ unset COREDUMPCTL_START_TIMESTAMP
+}
+
# don't check dmesg log after test
_disable_dmesg_check()
{