[PATCH 0/1] binfmt_elf, coredump: Log the reason of the failed core dumps

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/1] binfmt_elf, coredump: Log the reason of the failed core dumps
@ 2024-06-17 23:41 Roman Kisel
  2024-06-17 23:41 ` [PATCH 1/1] " Roman Kisel
  0 siblings, 1 reply; 10+ messages in thread
From: Roman Kisel @ 2024-06-17 23:41 UTC (permalink / raw)
  To: akpm, apais, ardb, bigeasy, brauner, ebiederm, jack, keescook,
	linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg, tandersen,
	vincent.whitchurch, viro
  Cc: apais, ssengar, sunilmut, vdso

A powerful way to diagnose crashes is to analyze the core dump produced upon
the failure. Missing or malformed core dump files hinder these investigations.
I'd like to propose changes that add logging as to why the kernel would not
finish writing out the core dump file.

These changes don't attempt to turn the code into a state machine with the numerical
error codes. This is just the next step to not logging which is logging :).

Please let me know what is good, bad and ugly with these changes!

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>

Roman Kisel (1):
  binfmt_elf, coredump: Log the reason of the failed core dumps

 fs/binfmt_elf.c          | 48 +++++++++++++++++++++-------
 fs/coredump.c            | 69 +++++++++++++++++++++++++++++++---------
 include/linux/coredump.h |  4 +--
 kernel/signal.c          |  5 ++-
 4 files changed, 96 insertions(+), 30 deletions(-)

base-commit: 831bcbcead6668ebf20b64fdb27518f1362ace3a
-- 
2.45.2

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-17 23:41 [PATCH 0/1] binfmt_elf, coredump: Log the reason of the failed core dumps Roman Kisel
@ 2024-06-17 23:41 ` Roman Kisel
  2024-06-17 23:52   ` Kees Cook
                     ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Roman Kisel @ 2024-06-17 23:41 UTC (permalink / raw)
  To: akpm, apais, ardb, bigeasy, brauner, ebiederm, jack, keescook,
	linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg, tandersen,
	vincent.whitchurch, viro
  Cc: apais, ssengar, sunilmut, vdso

Missing, failed, or corrupted core dumps might impede crash
investigations. To improve reliability of that process and consequently
the programs themselves, one needs to trace the path from producing
a core dumpfile to analyzing it. That path starts from the core dump file
written to the disk by the kernel or to the standard input of a user
mode helper program to which the kernel streams the coredump contents.
There are cases where the kernel will interrupt writing the core out or
produce a truncated/not-well-formed core dump.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 fs/binfmt_elf.c          | 48 +++++++++++++++++++++-------
 fs/coredump.c            | 69 +++++++++++++++++++++++++++++++---------
 include/linux/coredump.h |  4 +--
 kernel/signal.c          |  5 ++-
 4 files changed, 96 insertions(+), 30 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a43897b03ce9..26f6ff00913d 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1994,8 +1994,10 @@ static int elf_core_dump(struct coredump_params *cprm)
 	 * Collect all the non-memory information about the process for the
 	 * notes.  This also sets up the file header.
 	 */
-	if (!fill_note_info(&elf, e_phnum, &info, cprm))
+	if (!fill_note_info(&elf, e_phnum, &info, cprm)) {
+		pr_err("Error collecting note info for the core dump; dumping core failed");
 		goto end_coredump;
+	}
 
 	has_dumped = 1;
 
@@ -2010,8 +2012,10 @@ static int elf_core_dump(struct coredump_params *cprm)
 		sz += elf_coredump_extra_notes_size();
 
 		phdr4note = kmalloc(sizeof(*phdr4note), GFP_KERNEL);
-		if (!phdr4note)
+		if (!phdr4note) {
+			pr_err("Error allocating program headers note entry; dumping core failed");
 			goto end_coredump;
+		}
 
 		fill_elf_note_phdr(phdr4note, sz, offset);
 		offset += sz;
@@ -2025,18 +2029,24 @@ static int elf_core_dump(struct coredump_params *cprm)
 
 	if (e_phnum == PN_XNUM) {
 		shdr4extnum = kmalloc(sizeof(*shdr4extnum), GFP_KERNEL);
-		if (!shdr4extnum)
+		if (!shdr4extnum) {
+			pr_err("Error allocating extra program headers; dumping core failed");
 			goto end_coredump;
+		}
 		fill_extnum_info(&elf, shdr4extnum, e_shoff, segs);
 	}
 
 	offset = dataoff;
 
-	if (!dump_emit(cprm, &elf, sizeof(elf)))
+	if (!dump_emit(cprm, &elf, sizeof(elf))) {
+		pr_err("Error emitting the ELF header; dumping core failed");
 		goto end_coredump;
+	}
 
-	if (!dump_emit(cprm, phdr4note, sizeof(*phdr4note)))
+	if (!dump_emit(cprm, phdr4note, sizeof(*phdr4note))) {
+		pr_err("Error emitting the program header for notes; dumping core failed");
 		goto end_coredump;
+	}
 
 	/* Write program headers for segments dump */
 	for (i = 0; i < cprm->vma_count; i++) {
@@ -2059,20 +2069,28 @@ static int elf_core_dump(struct coredump_params *cprm)
 			phdr.p_flags |= PF_X;
 		phdr.p_align = ELF_EXEC_PAGESIZE;
 
-		if (!dump_emit(cprm, &phdr, sizeof(phdr)))
+		if (!dump_emit(cprm, &phdr, sizeof(phdr))) {
+			pr_err("Error emitting program headers; dumping core failed");
 			goto end_coredump;
+		}
 	}
 
-	if (!elf_core_write_extra_phdrs(cprm, offset))
+	if (!elf_core_write_extra_phdrs(cprm, offset)) {
+		pr_err("Error writing out extra program headers; dumping core failed");
 		goto end_coredump;
+	}
 
 	/* write out the notes section */
-	if (!write_note_info(&info, cprm))
+	if (!write_note_info(&info, cprm)) {
+		pr_err("Error writing out notes; dumping core failed");
 		goto end_coredump;
+	}
 
 	/* For cell spufs */
-	if (elf_coredump_extra_notes_write(cprm))
+	if (elf_coredump_extra_notes_write(cprm)) {
+		pr_err("Error writing out extra notes; dumping core failed");
 		goto end_coredump;
+	}
 
 	/* Align to page */
 	dump_skip_to(cprm, dataoff);
@@ -2080,16 +2098,22 @@ static int elf_core_dump(struct coredump_params *cprm)
 	for (i = 0; i < cprm->vma_count; i++) {
 		struct core_vma_metadata *meta = cprm->vma_meta + i;
 
-		if (!dump_user_range(cprm, meta->start, meta->dump_size))
+		if (!dump_user_range(cprm, meta->start, meta->dump_size)) {
+			pr_err("Error writing out the process memory; dumping core failed");
 			goto end_coredump;
+		}
 	}
 
-	if (!elf_core_write_extra_data(cprm))
+	if (!elf_core_write_extra_data(cprm)) {
+		pr_err("Error writing out extra data; dumping core failed");
 		goto end_coredump;
+	}
 
 	if (e_phnum == PN_XNUM) {
-		if (!dump_emit(cprm, shdr4extnum, sizeof(*shdr4extnum)))
+		if (!dump_emit(cprm, shdr4extnum, sizeof(*shdr4extnum))) {
+			pr_err("Error writing out extra program headers; dumping core failed");
 			goto end_coredump;
+		}
 	}
 
 end_coredump:
diff --git a/fs/coredump.c b/fs/coredump.c
index a57a06b80f57..a7200c9024c6 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -519,7 +519,7 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 	return err;
 }
 
-void do_coredump(const kernel_siginfo_t *siginfo)
+int do_coredump(const kernel_siginfo_t *siginfo)
 {
 	struct core_state core_state;
 	struct core_name cn;
@@ -527,7 +527,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 	struct linux_binfmt * binfmt;
 	const struct cred *old_cred;
 	struct cred *cred;
-	int retval = 0;
+	int retval;
 	int ispipe;
 	size_t *argv = NULL;
 	int argc = 0;
@@ -551,14 +551,20 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 	audit_core_dumps(siginfo->si_signo);
 
 	binfmt = mm->binfmt;
-	if (!binfmt || !binfmt->core_dump)
+	if (!binfmt || !binfmt->core_dump) {
+		retval = -ENOEXEC;
 		goto fail;
-	if (!__get_dumpable(cprm.mm_flags))
+	}
+	if (!__get_dumpable(cprm.mm_flags)) {
+		retval = -EACCES;
 		goto fail;
+	}
 
 	cred = prepare_creds();
-	if (!cred)
+	if (!cred) {
+		retval = -EPERM;
 		goto fail;
+	}
 	/*
 	 * We cannot trust fsuid as being the "true" uid of the process
 	 * nor do we know its entire history. We only know it was tainted
@@ -588,6 +594,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		if (ispipe < 0) {
 			printk(KERN_WARNING "format_corename failed\n");
 			printk(KERN_WARNING "Aborting core\n");
+			retval = ispipe;
 			goto fail_unlock;
 		}
 
@@ -611,6 +618,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 				"Process %d(%s) has RLIMIT_CORE set to 1\n",
 				task_tgid_vnr(current), current->comm);
 			printk(KERN_WARNING "Aborting core\n");
+			retval = -EPERM;
 			goto fail_unlock;
 		}
 		cprm.limit = RLIM_INFINITY;
@@ -620,6 +628,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 			printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
 			       task_tgid_vnr(current), current->comm);
 			printk(KERN_WARNING "Skipping core dump\n");
+			retval = -E2BIG;
 			goto fail_dropcount;
 		}
 
@@ -628,6 +637,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		if (!helper_argv) {
 			printk(KERN_WARNING "%s failed to allocate memory\n",
 			       __func__);
+			retval = -ENOMEM;
 			goto fail_dropcount;
 		}
 		for (argi = 0; argi < argc; argi++)
@@ -654,14 +664,17 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW |
 				 O_LARGEFILE | O_EXCL;
 
-		if (cprm.limit < binfmt->min_coredump)
+		if (cprm.limit < binfmt->min_coredump) {
+			retval = -E2BIG;
 			goto fail_unlock;
+		}
 
 		if (need_suid_safe && cn.corename[0] != '/') {
 			printk(KERN_WARNING "Pid %d(%s) can only dump core "\
 				"to fully qualified path!\n",
 				task_tgid_vnr(current), current->comm);
 			printk(KERN_WARNING "Skipping core dump\n");
+			retval = -EPERM;
 			goto fail_unlock;
 		}
 
@@ -707,20 +720,28 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		} else {
 			cprm.file = filp_open(cn.corename, open_flags, 0600);
 		}
-		if (IS_ERR(cprm.file))
+		if (IS_ERR(cprm.file)) {
+			retval = PTR_ERR(cprm.file);
 			goto fail_unlock;
+		}
 
 		inode = file_inode(cprm.file);
-		if (inode->i_nlink > 1)
+		if (inode->i_nlink > 1) {
+			retval = -EMLINK;
 			goto close_fail;
-		if (d_unhashed(cprm.file->f_path.dentry))
+		}
+		if (d_unhashed(cprm.file->f_path.dentry)) {
+			retval = -EEXIST;
 			goto close_fail;
+		}
 		/*
 		 * AK: actually i see no reason to not allow this for named
 		 * pipes etc, but keep the previous behaviour for now.
 		 */
-		if (!S_ISREG(inode->i_mode))
+		if (!S_ISREG(inode->i_mode)) {
+			retval = -EISDIR;
 			goto close_fail;
+		}
 		/*
 		 * Don't dump core if the filesystem changed owner or mode
 		 * of the file during file creation. This is an issue when
@@ -732,17 +753,22 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 				    current_fsuid())) {
 			pr_info_ratelimited("Core dump to %s aborted: cannot preserve file owner\n",
 					    cn.corename);
+			retval = -EPERM;
 			goto close_fail;
 		}
 		if ((inode->i_mode & 0677) != 0600) {
 			pr_info_ratelimited("Core dump to %s aborted: cannot preserve file permissions\n",
 					    cn.corename);
+			retval = -EPERM;
 			goto close_fail;
 		}
-		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
+		if (!(cprm.file->f_mode & FMODE_CAN_WRITE)) {
+			retval = -EACCES;
 			goto close_fail;
-		if (do_truncate(idmap, cprm.file->f_path.dentry,
-				0, 0, cprm.file))
+		}
+		retval = do_truncate(idmap, cprm.file->f_path.dentry,
+				0, 0, cprm.file);
+		if (retval)
 			goto close_fail;
 	}
 
@@ -758,10 +784,14 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		 */
 		if (!cprm.file) {
 			pr_info("Core dump to |%s disabled\n", cn.corename);
+			retval = -EPERM;
 			goto close_fail;
 		}
-		if (!dump_vma_snapshot(&cprm))
+		if (!dump_vma_snapshot(&cprm)) {
+			pr_err("Can't get VMA snapshot for core dump |%s\n", cn.corename);
+			retval = -EACCES;
 			goto close_fail;
+		}
 
 		file_start_write(cprm.file);
 		core_dumped = binfmt->core_dump(&cprm);
@@ -777,9 +807,18 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		}
 		file_end_write(cprm.file);
 		free_vma_snapshot(&cprm);
+	} else {
+		pr_err("Core dump to |%s has been interrupted\n", cn.corename);
+		retval = -EAGAIN;
+		goto fail;
 	}
+	pr_info("Core dump to |%s: vma_count %d, vma_data_size %lu, written %lld bytes, pos %lld\n",
+		cn.corename, cprm.vma_count, cprm.vma_data_size, cprm.written, cprm.pos);
 	if (ispipe && core_pipe_limit)
 		wait_for_dump_helpers(cprm.file);
+
+	retval = 0;
+
 close_fail:
 	if (cprm.file)
 		filp_close(cprm.file, NULL);
@@ -794,7 +833,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 fail_creds:
 	put_cred(cred);
 fail:
-	return;
+	return retval;
 }
 
 /*
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 0904ba010341..8b29be758a87 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -42,9 +42,9 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
 extern int dump_align(struct coredump_params *cprm, int align);
 int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		    unsigned long len);
-extern void do_coredump(const kernel_siginfo_t *siginfo);
+extern int do_coredump(const kernel_siginfo_t *siginfo);
 #else
-static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
+static inline int do_coredump(const kernel_siginfo_t *siginfo) {}
 #endif
 
 #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
diff --git a/kernel/signal.c b/kernel/signal.c
index 1f9dd41c04be..f2ecf29a994d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2675,6 +2675,7 @@ bool get_signal(struct ksignal *ksig)
 	struct sighand_struct *sighand = current->sighand;
 	struct signal_struct *signal = current->signal;
 	int signr;
+	int ret;
 
 	clear_notify_signal();
 	if (unlikely(task_work_pending(current)))
@@ -2891,7 +2892,9 @@ bool get_signal(struct ksignal *ksig)
 			 * first and our do_group_exit call below will use
 			 * that value and ignore the one we pass it.
 			 */
-			do_coredump(&ksig->info);
+			ret = do_coredump(&ksig->info);
+			if (ret)
+				pr_err("coredump has not been created, error %d\n", ret);
 		}
 
 		/*
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-17 23:41 ` [PATCH 1/1] " Roman Kisel
@ 2024-06-17 23:52   ` Kees Cook
  2024-06-18 15:49     ` Roman Kisel
  2024-06-18  6:18   ` Sebastian Andrzej Siewior
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2024-06-17 23:52 UTC (permalink / raw)
  To: Roman Kisel
  Cc: akpm, apais, ardb, bigeasy, brauner, ebiederm, jack,
	linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg, tandersen,
	vincent.whitchurch, viro, apais, ssengar, sunilmut, vdso

On Mon, Jun 17, 2024 at 04:41:30PM -0700, Roman Kisel wrote:
> Missing, failed, or corrupted core dumps might impede crash
> investigations. To improve reliability of that process and consequently
> the programs themselves, one needs to trace the path from producing
> a core dumpfile to analyzing it. That path starts from the core dump file
> written to the disk by the kernel or to the standard input of a user
> mode helper program to which the kernel streams the coredump contents.
> There are cases where the kernel will interrupt writing the core out or
> produce a truncated/not-well-formed core dump.

Hm, I'm all for better diagnostics, but they need to be helpful and not
be a risk to the system. All the added "pr_*()" calls need to use the
_ratelimited variant to avoid a user inducing massive spam to the system
logs. And please standardize the reporting to include information about
the task that is dumping. Otherwise the logging isn't useful for anyone
reading it. Something that includes pid and task->comm at the very
least. :)

For example, see report_mem_rw_reject() in
https://lore.kernel.org/lkml/20240613133937.2352724-2-adrian.ratiu@collabora.com/

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-17 23:41 ` [PATCH 1/1] " Roman Kisel
  2024-06-17 23:52   ` Kees Cook
@ 2024-06-18  6:18   ` Sebastian Andrzej Siewior
  2024-06-18 16:30     ` Roman Kisel
  2024-06-18 10:54   ` kernel test robot
  2024-06-18 11:31   ` kernel test robot
  3 siblings, 1 reply; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-06-18  6:18 UTC (permalink / raw)
  To: Roman Kisel
  Cc: akpm, apais, ardb, brauner, ebiederm, jack, keescook,
	linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg, tandersen,
	vincent.whitchurch, viro, apais, ssengar, sunilmut, vdso

On 2024-06-17 16:41:30 [-0700], Roman Kisel wrote:
> Missing, failed, or corrupted core dumps might impede crash
> investigations. To improve reliability of that process and consequently
> the programs themselves, one needs to trace the path from producing
> a core dumpfile to analyzing it. That path starts from the core dump file
> written to the disk by the kernel or to the standard input of a user
> mode helper program to which the kernel streams the coredump contents.
> There are cases where the kernel will interrupt writing the core out or
> produce a truncated/not-well-formed core dump.

How much of this happened and how much of this is just "let me handle
everything that could go wrong".
The cases where it was interrupted without a hint probably deserve a
note rather then leaving a half of coredump back.

> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index a57a06b80f57..a7200c9024c6 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -777,9 +807,18 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>  		}
>  		file_end_write(cprm.file);
>  		free_vma_snapshot(&cprm);
> +	} else {
> +		pr_err("Core dump to |%s has been interrupted\n", cn.corename);
> +		retval = -EAGAIN;
> +		goto fail;
>  	}
> +	pr_info("Core dump to |%s: vma_count %d, vma_data_size %lu, written %lld bytes, pos %lld\n",
> +		cn.corename, cprm.vma_count, cprm.vma_data_size, cprm.written, cprm.pos);

Probably too noisy in the default case. The offsets probably don't
matter unless you debug.

>  	if (ispipe && core_pipe_limit)
>  		wait_for_dump_helpers(cprm.file);
> +
> +	retval = 0;
> +
>  close_fail:
>  	if (cprm.file)
>  		filp_close(cprm.file, NULL);
> diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> index 0904ba010341..8b29be758a87 100644
> --- a/include/linux/coredump.h
> +++ b/include/linux/coredump.h
> @@ -42,9 +42,9 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
>  extern int dump_align(struct coredump_params *cprm, int align);
>  int dump_user_range(struct coredump_params *cprm, unsigned long start,
>  		    unsigned long len);
> -extern void do_coredump(const kernel_siginfo_t *siginfo);
> +extern int do_coredump(const kernel_siginfo_t *siginfo);
>  #else
> -static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
> +static inline int do_coredump(const kernel_siginfo_t *siginfo) {}

This probably does not compile.

>  #endif
>  
>  #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 1f9dd41c04be..f2ecf29a994d 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2675,6 +2675,7 @@ bool get_signal(struct ksignal *ksig)
>  	struct sighand_struct *sighand = current->sighand;
>  	struct signal_struct *signal = current->signal;
>  	int signr;
> +	int ret;
>  
>  	clear_notify_signal();
>  	if (unlikely(task_work_pending(current)))
> @@ -2891,7 +2892,9 @@ bool get_signal(struct ksignal *ksig)
>  			 * first and our do_group_exit call below will use
>  			 * that value and ignore the one we pass it.
>  			 */
> -			do_coredump(&ksig->info);
> +			ret = do_coredump(&ksig->info);
> +			if (ret)
> +				pr_err("coredump has not been created, error %d\n", ret);

So you preserve the error code just for one additional note.

>  		}
>  
>  		/*

Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-17 23:41 ` [PATCH 1/1] " Roman Kisel
  2024-06-17 23:52   ` Kees Cook
  2024-06-18  6:18   ` Sebastian Andrzej Siewior
@ 2024-06-18 10:54   ` kernel test robot
  2024-06-18 11:31   ` kernel test robot
  3 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2024-06-18 10:54 UTC (permalink / raw)
  To: Roman Kisel, akpm, apais, ardb, bigeasy, brauner, ebiederm, jack,
	keescook, linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg,
	tandersen, vincent.whitchurch, viro
  Cc: oe-kbuild-all, apais, ssengar, sunilmut, vdso

Hi Roman,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 831bcbcead6668ebf20b64fdb27518f1362ace3a]

url:    https://github.com/intel-lab-lkp/linux/commits/Roman-Kisel/binfmt_elf-coredump-Log-the-reason-of-the-failed-core-dumps/20240618-074419
base:   831bcbcead6668ebf20b64fdb27518f1362ace3a
patch link:    https://lore.kernel.org/r/20240617234133.1167523-2-romank%40linux.microsoft.com
patch subject: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
config: x86_64-buildonly-randconfig-002-20240618 (https://download.01.org/0day-ci/archive/20240618/202406181823.dr4ogEY0-lkp@intel.com/config)
compiler: gcc-8 (Ubuntu 8.4.0-3ubuntu2) 8.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240618/202406181823.dr4ogEY0-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406181823.dr4ogEY0-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from kernel/signal.c:29:
   include/linux/coredump.h: In function 'do_coredump':
>> include/linux/coredump.h:47:1: warning: no return statement in function returning non-void [-Wreturn-type]
    static inline int do_coredump(const kernel_siginfo_t *siginfo) {}
    ^~~~~~


vim +47 include/linux/coredump.h

    34	
    35	/*
    36	 * These are the only things you should do on a core-file: use only these
    37	 * functions to write out all the necessary info.
    38	 */
    39	extern void dump_skip_to(struct coredump_params *cprm, unsigned long to);
    40	extern void dump_skip(struct coredump_params *cprm, size_t nr);
    41	extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
    42	extern int dump_align(struct coredump_params *cprm, int align);
    43	int dump_user_range(struct coredump_params *cprm, unsigned long start,
    44			    unsigned long len);
    45	extern int do_coredump(const kernel_siginfo_t *siginfo);
    46	#else
  > 47	static inline int do_coredump(const kernel_siginfo_t *siginfo) {}
    48	#endif
    49	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-17 23:41 ` [PATCH 1/1] " Roman Kisel
                     ` (2 preceding siblings ...)
  2024-06-18 10:54   ` kernel test robot
@ 2024-06-18 11:31   ` kernel test robot
  3 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2024-06-18 11:31 UTC (permalink / raw)
  To: Roman Kisel, akpm, apais, ardb, bigeasy, brauner, ebiederm, jack,
	keescook, linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg,
	tandersen, vincent.whitchurch, viro
  Cc: llvm, oe-kbuild-all, apais, ssengar, sunilmut, vdso

Hi Roman,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 831bcbcead6668ebf20b64fdb27518f1362ace3a]

url:    https://github.com/intel-lab-lkp/linux/commits/Roman-Kisel/binfmt_elf-coredump-Log-the-reason-of-the-failed-core-dumps/20240618-074419
base:   831bcbcead6668ebf20b64fdb27518f1362ace3a
patch link:    https://lore.kernel.org/r/20240617234133.1167523-2-romank%40linux.microsoft.com
patch subject: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
config: arm-allnoconfig (https://download.01.org/0day-ci/archive/20240618/202406181954.9Z65WD4Z-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 78ee473784e5ef6f0b19ce4cb111fb6e4d23c6b2)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240618/202406181954.9Z65WD4Z-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406181954.9Z65WD4Z-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from fs/coredump.c:6:
   In file included from include/linux/mm.h:2253:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
>> fs/coredump.c:816:32: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     815 |         pr_info("Core dump to |%s: vma_count %d, vma_data_size %lu, written %lld bytes, pos %lld\n",
         |                                                                ~~~
         |                                                                %zu
     816 |                 cn.corename, cprm.vma_count, cprm.vma_data_size, cprm.written, cprm.pos);
         |                                              ^~~~~~~~~~~~~~~~~~
   include/linux/printk.h:537:34: note: expanded from macro 'pr_info'
     537 |         printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         |                                 ~~~     ^~~~~~~~~~~
   include/linux/printk.h:464:60: note: expanded from macro 'printk'
     464 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:436:19: note: expanded from macro 'printk_index_wrap'
     436 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                         ~~~~    ^~~~~~~~~~~
   2 warnings generated.


vim +816 fs/coredump.c

   521	
   522	int do_coredump(const kernel_siginfo_t *siginfo)
   523	{
   524		struct core_state core_state;
   525		struct core_name cn;
   526		struct mm_struct *mm = current->mm;
   527		struct linux_binfmt * binfmt;
   528		const struct cred *old_cred;
   529		struct cred *cred;
   530		int retval;
   531		int ispipe;
   532		size_t *argv = NULL;
   533		int argc = 0;
   534		/* require nonrelative corefile path and be extra careful */
   535		bool need_suid_safe = false;
   536		bool core_dumped = false;
   537		static atomic_t core_dump_count = ATOMIC_INIT(0);
   538		struct coredump_params cprm = {
   539			.siginfo = siginfo,
   540			.limit = rlimit(RLIMIT_CORE),
   541			/*
   542			 * We must use the same mm->flags while dumping core to avoid
   543			 * inconsistency of bit flags, since this flag is not protected
   544			 * by any locks.
   545			 */
   546			.mm_flags = mm->flags,
   547			.vma_meta = NULL,
   548			.cpu = raw_smp_processor_id(),
   549		};
   550	
   551		audit_core_dumps(siginfo->si_signo);
   552	
   553		binfmt = mm->binfmt;
   554		if (!binfmt || !binfmt->core_dump) {
   555			retval = -ENOEXEC;
   556			goto fail;
   557		}
   558		if (!__get_dumpable(cprm.mm_flags)) {
   559			retval = -EACCES;
   560			goto fail;
   561		}
   562	
   563		cred = prepare_creds();
   564		if (!cred) {
   565			retval = -EPERM;
   566			goto fail;
   567		}
   568		/*
   569		 * We cannot trust fsuid as being the "true" uid of the process
   570		 * nor do we know its entire history. We only know it was tainted
   571		 * so we dump it as root in mode 2, and only into a controlled
   572		 * environment (pipe handler or fully qualified path).
   573		 */
   574		if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
   575			/* Setuid core dump mode */
   576			cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
   577			need_suid_safe = true;
   578		}
   579	
   580		retval = coredump_wait(siginfo->si_signo, &core_state);
   581		if (retval < 0)
   582			goto fail_creds;
   583	
   584		old_cred = override_creds(cred);
   585	
   586		ispipe = format_corename(&cn, &cprm, &argv, &argc);
   587	
   588		if (ispipe) {
   589			int argi;
   590			int dump_count;
   591			char **helper_argv;
   592			struct subprocess_info *sub_info;
   593	
   594			if (ispipe < 0) {
   595				printk(KERN_WARNING "format_corename failed\n");
   596				printk(KERN_WARNING "Aborting core\n");
   597				retval = ispipe;
   598				goto fail_unlock;
   599			}
   600	
   601			if (cprm.limit == 1) {
   602				/* See umh_pipe_setup() which sets RLIMIT_CORE = 1.
   603				 *
   604				 * Normally core limits are irrelevant to pipes, since
   605				 * we're not writing to the file system, but we use
   606				 * cprm.limit of 1 here as a special value, this is a
   607				 * consistent way to catch recursive crashes.
   608				 * We can still crash if the core_pattern binary sets
   609				 * RLIM_CORE = !1, but it runs as root, and can do
   610				 * lots of stupid things.
   611				 *
   612				 * Note that we use task_tgid_vnr here to grab the pid
   613				 * of the process group leader.  That way we get the
   614				 * right pid if a thread in a multi-threaded
   615				 * core_pattern process dies.
   616				 */
   617				printk(KERN_WARNING
   618					"Process %d(%s) has RLIMIT_CORE set to 1\n",
   619					task_tgid_vnr(current), current->comm);
   620				printk(KERN_WARNING "Aborting core\n");
   621				retval = -EPERM;
   622				goto fail_unlock;
   623			}
   624			cprm.limit = RLIM_INFINITY;
   625	
   626			dump_count = atomic_inc_return(&core_dump_count);
   627			if (core_pipe_limit && (core_pipe_limit < dump_count)) {
   628				printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
   629				       task_tgid_vnr(current), current->comm);
   630				printk(KERN_WARNING "Skipping core dump\n");
   631				retval = -E2BIG;
   632				goto fail_dropcount;
   633			}
   634	
   635			helper_argv = kmalloc_array(argc + 1, sizeof(*helper_argv),
   636						    GFP_KERNEL);
   637			if (!helper_argv) {
   638				printk(KERN_WARNING "%s failed to allocate memory\n",
   639				       __func__);
   640				retval = -ENOMEM;
   641				goto fail_dropcount;
   642			}
   643			for (argi = 0; argi < argc; argi++)
   644				helper_argv[argi] = cn.corename + argv[argi];
   645			helper_argv[argi] = NULL;
   646	
   647			retval = -ENOMEM;
   648			sub_info = call_usermodehelper_setup(helper_argv[0],
   649							helper_argv, NULL, GFP_KERNEL,
   650							umh_pipe_setup, NULL, &cprm);
   651			if (sub_info)
   652				retval = call_usermodehelper_exec(sub_info,
   653								  UMH_WAIT_EXEC);
   654	
   655			kfree(helper_argv);
   656			if (retval) {
   657				printk(KERN_INFO "Core dump to |%s pipe failed\n",
   658				       cn.corename);
   659				goto close_fail;
   660			}
   661		} else {
   662			struct mnt_idmap *idmap;
   663			struct inode *inode;
   664			int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW |
   665					 O_LARGEFILE | O_EXCL;
   666	
   667			if (cprm.limit < binfmt->min_coredump) {
   668				retval = -E2BIG;
   669				goto fail_unlock;
   670			}
   671	
   672			if (need_suid_safe && cn.corename[0] != '/') {
   673				printk(KERN_WARNING "Pid %d(%s) can only dump core "\
   674					"to fully qualified path!\n",
   675					task_tgid_vnr(current), current->comm);
   676				printk(KERN_WARNING "Skipping core dump\n");
   677				retval = -EPERM;
   678				goto fail_unlock;
   679			}
   680	
   681			/*
   682			 * Unlink the file if it exists unless this is a SUID
   683			 * binary - in that case, we're running around with root
   684			 * privs and don't want to unlink another user's coredump.
   685			 */
   686			if (!need_suid_safe) {
   687				/*
   688				 * If it doesn't exist, that's fine. If there's some
   689				 * other problem, we'll catch it at the filp_open().
   690				 */
   691				do_unlinkat(AT_FDCWD, getname_kernel(cn.corename));
   692			}
   693	
   694			/*
   695			 * There is a race between unlinking and creating the
   696			 * file, but if that causes an EEXIST here, that's
   697			 * fine - another process raced with us while creating
   698			 * the corefile, and the other process won. To userspace,
   699			 * what matters is that at least one of the two processes
   700			 * writes its coredump successfully, not which one.
   701			 */
   702			if (need_suid_safe) {
   703				/*
   704				 * Using user namespaces, normal user tasks can change
   705				 * their current->fs->root to point to arbitrary
   706				 * directories. Since the intention of the "only dump
   707				 * with a fully qualified path" rule is to control where
   708				 * coredumps may be placed using root privileges,
   709				 * current->fs->root must not be used. Instead, use the
   710				 * root directory of init_task.
   711				 */
   712				struct path root;
   713	
   714				task_lock(&init_task);
   715				get_fs_root(init_task.fs, &root);
   716				task_unlock(&init_task);
   717				cprm.file = file_open_root(&root, cn.corename,
   718							   open_flags, 0600);
   719				path_put(&root);
   720			} else {
   721				cprm.file = filp_open(cn.corename, open_flags, 0600);
   722			}
   723			if (IS_ERR(cprm.file)) {
   724				retval = PTR_ERR(cprm.file);
   725				goto fail_unlock;
   726			}
   727	
   728			inode = file_inode(cprm.file);
   729			if (inode->i_nlink > 1) {
   730				retval = -EMLINK;
   731				goto close_fail;
   732			}
   733			if (d_unhashed(cprm.file->f_path.dentry)) {
   734				retval = -EEXIST;
   735				goto close_fail;
   736			}
   737			/*
   738			 * AK: actually i see no reason to not allow this for named
   739			 * pipes etc, but keep the previous behaviour for now.
   740			 */
   741			if (!S_ISREG(inode->i_mode)) {
   742				retval = -EISDIR;
   743				goto close_fail;
   744			}
   745			/*
   746			 * Don't dump core if the filesystem changed owner or mode
   747			 * of the file during file creation. This is an issue when
   748			 * a process dumps core while its cwd is e.g. on a vfat
   749			 * filesystem.
   750			 */
   751			idmap = file_mnt_idmap(cprm.file);
   752			if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
   753					    current_fsuid())) {
   754				pr_info_ratelimited("Core dump to %s aborted: cannot preserve file owner\n",
   755						    cn.corename);
   756				retval = -EPERM;
   757				goto close_fail;
   758			}
   759			if ((inode->i_mode & 0677) != 0600) {
   760				pr_info_ratelimited("Core dump to %s aborted: cannot preserve file permissions\n",
   761						    cn.corename);
   762				retval = -EPERM;
   763				goto close_fail;
   764			}
   765			if (!(cprm.file->f_mode & FMODE_CAN_WRITE)) {
   766				retval = -EACCES;
   767				goto close_fail;
   768			}
   769			retval = do_truncate(idmap, cprm.file->f_path.dentry,
   770					0, 0, cprm.file);
   771			if (retval)
   772				goto close_fail;
   773		}
   774	
   775		/* get us an unshared descriptor table; almost always a no-op */
   776		/* The cell spufs coredump code reads the file descriptor tables */
   777		retval = unshare_files();
   778		if (retval)
   779			goto close_fail;
   780		if (!dump_interrupted()) {
   781			/*
   782			 * umh disabled with CONFIG_STATIC_USERMODEHELPER_PATH="" would
   783			 * have this set to NULL.
   784			 */
   785			if (!cprm.file) {
   786				pr_info("Core dump to |%s disabled\n", cn.corename);
   787				retval = -EPERM;
   788				goto close_fail;
   789			}
   790			if (!dump_vma_snapshot(&cprm)) {
   791				pr_err("Can't get VMA snapshot for core dump |%s\n", cn.corename);
   792				retval = -EACCES;
   793				goto close_fail;
   794			}
   795	
   796			file_start_write(cprm.file);
   797			core_dumped = binfmt->core_dump(&cprm);
   798			/*
   799			 * Ensures that file size is big enough to contain the current
   800			 * file postion. This prevents gdb from complaining about
   801			 * a truncated file if the last "write" to the file was
   802			 * dump_skip.
   803			 */
   804			if (cprm.to_skip) {
   805				cprm.to_skip--;
   806				dump_emit(&cprm, "", 1);
   807			}
   808			file_end_write(cprm.file);
   809			free_vma_snapshot(&cprm);
   810		} else {
   811			pr_err("Core dump to |%s has been interrupted\n", cn.corename);
   812			retval = -EAGAIN;
   813			goto fail;
   814		}
   815		pr_info("Core dump to |%s: vma_count %d, vma_data_size %lu, written %lld bytes, pos %lld\n",
 > 816			cn.corename, cprm.vma_count, cprm.vma_data_size, cprm.written, cprm.pos);
   817		if (ispipe && core_pipe_limit)
   818			wait_for_dump_helpers(cprm.file);
   819	
   820		retval = 0;
   821	
   822	close_fail:
   823		if (cprm.file)
   824			filp_close(cprm.file, NULL);
   825	fail_dropcount:
   826		if (ispipe)
   827			atomic_dec(&core_dump_count);
   828	fail_unlock:
   829		kfree(argv);
   830		kfree(cn.corename);
   831		coredump_finish(core_dumped);
   832		revert_creds(old_cred);
   833	fail_creds:
   834		put_cred(cred);
   835	fail:
   836		return retval;
   837	}
   838	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-17 23:52   ` Kees Cook
@ 2024-06-18 15:49     ` Roman Kisel
  0 siblings, 0 replies; 10+ messages in thread
From: Roman Kisel @ 2024-06-18 15:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: akpm, apais, ardb, bigeasy, brauner, ebiederm, jack,
	linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg, tandersen,
	vincent.whitchurch, viro, apais, ssengar, sunilmut, vdso



On 6/17/2024 4:52 PM, Kees Cook wrote:
> On Mon, Jun 17, 2024 at 04:41:30PM -0700, Roman Kisel wrote:
>> Missing, failed, or corrupted core dumps might impede crash
>> investigations. To improve reliability of that process and consequently
>> the programs themselves, one needs to trace the path from producing
>> a core dumpfile to analyzing it. That path starts from the core dump file
>> written to the disk by the kernel or to the standard input of a user
>> mode helper program to which the kernel streams the coredump contents.
>> There are cases where the kernel will interrupt writing the core out or
>> produce a truncated/not-well-formed core dump.
> 
> Hm, I'm all for better diagnostics, but they need to be helpful and not
> be a risk to the system. All the added "pr_*()" calls need to use the
> _ratelimited variant to avoid a user inducing massive spam to the system
> logs. And please standardize the reporting to include information about
> the task that is dumping. Otherwise the logging isn't useful for anyone
> reading it. Something that includes pid and task->comm at the very
> least. :)
Appreciate your suggestions very much! Rate-limiting has definitely 
slipped off my mind, my bad. Will also fix the reporting format to make 
it useful.

> 
> For example, see report_mem_rw_reject() in
> https://lore.kernel.org/lkml/20240613133937.2352724-2-adrian.ratiu@collabora.com/
Thanks, that's awesome!

> 
> -Kees
> 

-- 
Thank you,
Roman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-18  6:18   ` Sebastian Andrzej Siewior
@ 2024-06-18 16:30     ` Roman Kisel
  2024-06-18 21:21       ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Roman Kisel @ 2024-06-18 16:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: akpm, apais, ardb, brauner, ebiederm, jack, keescook,
	linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg, tandersen,
	vincent.whitchurch, viro, apais, ssengar, sunilmut, vdso



On 6/17/2024 11:18 PM, Sebastian Andrzej Siewior wrote:
> On 2024-06-17 16:41:30 [-0700], Roman Kisel wrote:
>> Missing, failed, or corrupted core dumps might impede crash
>> investigations. To improve reliability of that process and consequently
>> the programs themselves, one needs to trace the path from producing
>> a core dumpfile to analyzing it. That path starts from the core dump file
>> written to the disk by the kernel or to the standard input of a user
>> mode helper program to which the kernel streams the coredump contents.
>> There are cases where the kernel will interrupt writing the core out or
>> produce a truncated/not-well-formed core dump.
> 
> How much of this happened and how much of this is just "let me handle
> everything that could go wrong".
Some of that must be happening as there are truncated dump files. 
Haven't run the logging code at large scale yet with the systems being 
stressed a lot by the customer workloads to hit all edge cases. Sent the 
changes to the kernel mail list out of abundance of caution first, and 
being ecstatic about that: on the other thread Kees noticed I didn't use 
the ratelimited logging. That has absolutely made me day and whole week, 
just glowing :) Might've been a close call due to something in a crash loop.

I think it'd be fair to say that I am asking to please "let me handle 
(log) everything that could go wrong", ratelimited, as these error cases 
are present in the code, and logging can give a clue why the core dump 
collection didn't succeed and what one would need to explore to increase 
reliability of the system.

> The cases where it was interrupted without a hint probably deserve a
> note rather then leaving a half of coredump back.
Wholeheartedly agree!

> 
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index a57a06b80f57..a7200c9024c6 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -777,9 +807,18 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>>   		}
>>   		file_end_write(cprm.file);
>>   		free_vma_snapshot(&cprm);
>> +	} else {
>> +		pr_err("Core dump to |%s has been interrupted\n", cn.corename);
>> +		retval = -EAGAIN;
>> +		goto fail;
>>   	}
>> +	pr_info("Core dump to |%s: vma_count %d, vma_data_size %lu, written %lld bytes, pos %lld\n",
>> +		cn.corename, cprm.vma_count, cprm.vma_data_size, cprm.written, cprm.pos);
> 
> Probably too noisy in the default case. The offsets probably don't
> matter unless you debug.
Will make less noisy, thanks!

> 
>>   	if (ispipe && core_pipe_limit)
>>   		wait_for_dump_helpers(cprm.file);
>> +
>> +	retval = 0;
>> +
>>   close_fail:
>>   	if (cprm.file)
>>   		filp_close(cprm.file, NULL);
>> diff --git a/include/linux/coredump.h b/include/linux/coredump.h
>> index 0904ba010341..8b29be758a87 100644
>> --- a/include/linux/coredump.h
>> +++ b/include/linux/coredump.h
>> @@ -42,9 +42,9 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
>>   extern int dump_align(struct coredump_params *cprm, int align);
>>   int dump_user_range(struct coredump_params *cprm, unsigned long start,
>>   		    unsigned long len);
>> -extern void do_coredump(const kernel_siginfo_t *siginfo);
>> +extern int do_coredump(const kernel_siginfo_t *siginfo);
>>   #else
>> -static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
>> +static inline int do_coredump(const kernel_siginfo_t *siginfo) {}
> 
> This probably does not compile.
D'oh! It does compile, and somehow the warning didn't show up for me. 
Fortunately, you and the kernel robot noticed that one silly piece I 
wrote here. Thank you very much!

For the inclined reader, both C99 and C11 require just these two things 
of the "return" statement (6.8.6.4 The return statement/Constraints):

"A return statement with an expression shall not appear in a function 
whose return type is void. A return statement without an expression 
shall only appear in a function whose return type is void".

One can omit the "return" statement in which case a warning is emitted 
(by gcc), and instead of "ret", gcc emits "ud2" or "brk #0x1000" or 
"trap", etc. to cause a fault.

> 
>>   #endif
>>   
>>   #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 1f9dd41c04be..f2ecf29a994d 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2675,6 +2675,7 @@ bool get_signal(struct ksignal *ksig)
>>   	struct sighand_struct *sighand = current->sighand;
>>   	struct signal_struct *signal = current->signal;
>>   	int signr;
>> +	int ret;
>>   
>>   	clear_notify_signal();
>>   	if (unlikely(task_work_pending(current)))
>> @@ -2891,7 +2892,9 @@ bool get_signal(struct ksignal *ksig)
>>   			 * first and our do_group_exit call below will use
>>   			 * that value and ignore the one we pass it.
>>   			 */
>> -			do_coredump(&ksig->info);
>> +			ret = do_coredump(&ksig->info);
>> +			if (ret)
>> +				pr_err("coredump has not been created, error %d\n", ret);
> 
> So you preserve the error code just for one additional note.
Couldn't see how not to do that and report the error code... Might move 
the declaration closer to the point of use, into the innermost 
enclosing/basic block. The C standard used by the kernel permits mixed 
declaration and code, yet not much of that seems to be actually used and 
I hesitated to do

		if (sig_kernel_coredump(signr)) {
			if (print_fatal_signals)
				print_fatal_signal(signr);
			proc_coredump_connector(current);
-			do_coredump(&ksig->info);
+			int ret = do_coredump(&ksig->info);
+			if (ret)
+				pr_err("coredump has not been created, error %d\n", ret);

Feel like moving the declaration inside that "if" statement if that 
looks better.

> 
>>   		}
>>   
>>   		/*
> 
> Sebastian

-- 
Thank you,
Roman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-18 16:30     ` Roman Kisel
@ 2024-06-18 21:21       ` Eric W. Biederman
  2024-06-20 19:10         ` Roman Kisel
  0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2024-06-18 21:21 UTC (permalink / raw)
  To: Roman Kisel
  Cc: Sebastian Andrzej Siewior, akpm, apais, ardb, brauner, jack,
	keescook, linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg,
	tandersen, vincent.whitchurch, viro, apais, ssengar, sunilmut,
	vdso

Roman Kisel <romank@linux.microsoft.com> writes:

> On 6/17/2024 11:18 PM, Sebastian Andrzej Siewior wrote:
>> On 2024-06-17 16:41:30 [-0700], Roman Kisel wrote:
>>> Missing, failed, or corrupted core dumps might impede crash
>>> investigations. To improve reliability of that process and consequently
>>> the programs themselves, one needs to trace the path from producing
>>> a core dumpfile to analyzing it. That path starts from the core dump file
>>> written to the disk by the kernel or to the standard input of a user
>>> mode helper program to which the kernel streams the coredump contents.
>>> There are cases where the kernel will interrupt writing the core out or
>>> produce a truncated/not-well-formed core dump.
>> How much of this happened and how much of this is just "let me handle
>> everything that could go wrong".
> Some of that must be happening as there are truncated dump files. Haven't run
> the logging code at large scale yet with the systems being stressed a lot by the
> customer workloads to hit all edge cases. Sent the changes to the kernel mail
> list out of abundance of caution first, and being ecstatic about that: on the
> other thread Kees noticed I didn't use the ratelimited logging. That has
> absolutely made me day and whole week, just glowing :) Might've been a close
> call due to something in a crash loop.

Another reason you could have truncated coredumps is the coredumping
process being killed.

I suspect if you want reasons why the coredump is truncated you are
going to want to instrument dump_interrupted, dump_skip and dump_emit
rather than their callers.  As they don't actually report why the
failed.

Are you using systemd-coredump?  Or another pipe based coredump
collector?  It might be the dump collector is truncating things.

Do you know if your application uses io_uring?  There were some weird
issues with io_uring and coredumps that were causing things to get
truncation at one point.  As I recall a hack was put in the coredump
code so that it worked but maybe there is another odd case that still
needs to be handled.
>
> I think it'd be fair to say that I am asking to please "let me handle (log)
> everything that could go wrong", ratelimited, as these error cases are present
> in the code, and logging can give a clue why the core dump collection didn't
> succeed and what one would need to explore to increase reliability of the
> system.

If you are looking for reasons you definitely want to instrument
fs/coredump.c much more than fs/binfmt_elf.c.  As fs/coredump.c is the
code that actually performs the writes.

One of these days if someone is ambitious we should probably merge the
coredump code from fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c and just
hardcode the coredump code to always produce an elf format coredump.
Just for the simplicity of it all.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
  2024-06-18 21:21       ` Eric W. Biederman
@ 2024-06-20 19:10         ` Roman Kisel
  0 siblings, 0 replies; 10+ messages in thread
From: Roman Kisel @ 2024-06-20 19:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Sebastian Andrzej Siewior, akpm, apais, ardb, brauner, jack,
	keescook, linux-fsdevel, linux-kernel, linux-mm, nagvijay, oleg,
	tandersen, vincent.whitchurch, viro, apais, ssengar, sunilmut,
	vdso



On 6/18/2024 2:21 PM, Eric W. Biederman wrote:
> Roman Kisel <romank@linux.microsoft.com> writes:
> 
>> On 6/17/2024 11:18 PM, Sebastian Andrzej Siewior wrote:
>>> On 2024-06-17 16:41:30 [-0700], Roman Kisel wrote:
>>>> Missing, failed, or corrupted core dumps might impede crash
>>>> investigations. To improve reliability of that process and consequently
>>>> the programs themselves, one needs to trace the path from producing
>>>> a core dumpfile to analyzing it. That path starts from the core dump file
>>>> written to the disk by the kernel or to the standard input of a user
>>>> mode helper program to which the kernel streams the coredump contents.
>>>> There are cases where the kernel will interrupt writing the core out or
>>>> produce a truncated/not-well-formed core dump.
>>> How much of this happened and how much of this is just "let me handle
>>> everything that could go wrong".
>> Some of that must be happening as there are truncated dump files. Haven't run
>> the logging code at large scale yet with the systems being stressed a lot by the
>> customer workloads to hit all edge cases. Sent the changes to the kernel mail
>> list out of abundance of caution first, and being ecstatic about that: on the
>> other thread Kees noticed I didn't use the ratelimited logging. That has
>> absolutely made me day and whole week, just glowing :) Might've been a close
>> call due to something in a crash loop.
> 
> Another reason you could have truncated coredumps is the coredumping
> process being killed.
> 
> I suspect if you want reasons why the coredump is truncated you are
> going to want to instrument dump_interrupted, dump_skip and dump_emit
> rather than their callers.  As they don't actually report why the
> failed.
I'll add logging there as well, thanks for the great idea!

> 
> Are you using systemd-coredump?  Or another pipe based coredump
> collector?  It might be the dump collector is truncating things.
There is a collector program set via core_pattern so that the core dump 
is streamed to its standard input. That is a very simple memcpy-like 
bytes-in..bytes-out code. It logs how many bytes it receives and how 
many bytes it writes, and no bytes are lost in this path. Of the system 
itself, it is built out of the latest stable LTS kernel and a small user 
land, not based on any distribution and packet management. One might say 
it resembles an appliance.

> 
> Do you know if your application uses io_uring?  There were some weird
> issues with io_uring and coredumps that were causing things to get
> truncation at one point.  As I recall a hack was put in the coredump
> code so that it worked but maybe there is another odd case that still
> needs to be handled.
Couldn't appreciate the pointer more! There are cases when the user land 
reaches out to io_uring, not the work horse though.

>>
>> I think it'd be fair to say that I am asking to please "let me handle (log)
>> everything that could go wrong", ratelimited, as these error cases are present
>> in the code, and logging can give a clue why the core dump collection didn't
>> succeed and what one would need to explore to increase reliability of the
>> system.
> 
> If you are looking for reasons you definitely want to instrument
> fs/coredump.c much more than fs/binfmt_elf.c.  As fs/coredump.c is the
> code that actually performs the writes.
Understood, thank you very much!

> 
> One of these days if someone is ambitious we should probably merge the
> coredump code from fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c and just
> hardcode the coredump code to always produce an elf format coredump.
> Just for the simplicity of it all.
I've had loads of experience with collecting and analyzing ELF core dump 
files, including a tool that parses machine state, rebuilds the 
necessary Linux kernel structures and produces ELF core dump files for 
the user land processes from that. Perhaps I could embark on that 
ambitious journey if no one else has time :)

> 
> Eric

-- 
Thank you,
Roman

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-06-20 19:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-17 23:41 [PATCH 0/1] binfmt_elf, coredump: Log the reason of the failed core dumps Roman Kisel
2024-06-17 23:41 ` [PATCH 1/1] " Roman Kisel
2024-06-17 23:52   ` Kees Cook
2024-06-18 15:49     ` Roman Kisel
2024-06-18  6:18   ` Sebastian Andrzej Siewior
2024-06-18 16:30     ` Roman Kisel
2024-06-18 21:21       ` Eric W. Biederman
2024-06-20 19:10         ` Roman Kisel
2024-06-18 10:54   ` kernel test robot
2024-06-18 11:31   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).