All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jann Horn <jannh@google.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 5.2 14/20] sched/fair: Dont free p->numa_faults with concurrent readers
Date: Fri,  2 Aug 2019 11:40:08 +0200	[thread overview]
Message-ID: <20190802092102.527902959@linuxfoundation.org> (raw)
In-Reply-To: <20190802092055.131876977@linuxfoundation.org>

From: Jann Horn <jannh@google.com>

commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/exec.c                            |    2 +-
 include/linux/sched/numa_balancing.h |    4 ++--
 kernel/fork.c                        |    2 +-
 kernel/sched/fair.c                  |   24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1828,7 +1828,7 @@ static int __do_execve_file(int fd, stru
 	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	if (filename)
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
 extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
 extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
 					int src_nid, int dst_cpu);
 #else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(s
 static inline void set_numabalancing_state(bool enabled)
 {
 }
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
 {
 }
 static inline bool should_numa_migrate_memory(struct task_struct *p,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -727,7 +727,7 @@ void __put_task_struct(struct task_struc
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2336,13 +2336,23 @@ no_join:
 	return;
 }
 
-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA staticstics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
 
+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2355,8 +2365,14 @@ void task_numa_free(struct task_struct *
 		put_numa_group(grp);
 	}
 
-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
 
 /*



  parent reply	other threads:[~2019-08-02  9:57 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02  9:39 [PATCH 5.2 00/20] 5.2.6-stable review Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 5.2 01/20] vsock: correct removal of socket from the list Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 5.2 02/20] ISDN: hfcsusb: checking idx of ep configuration Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 5.2 03/20] ALSA: usb-audio: Sanity checks for each pipe and EP types Greg Kroah-Hartman
2019-08-02 13:48   ` Sasha Levin
2019-08-02 15:51     ` Greg Kroah-Hartman
2019-08-07  2:38       ` Sasha Levin
2019-08-02  9:39 ` [PATCH 5.2 04/20] bpf: fix NULL deref in btf_type_is_resolve_source_only Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 5.2 05/20] media: au0828: fix null dereference in error path Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 06/20] ath10k: Change the warning message string Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 07/20] media: cpia2_usb: first wake up, then free in disconnect Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 08/20] media: pvrusb2: use a different format for warnings Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 09/20] NFS: Cleanup if nfs_match_client is interrupted Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 10/20] media: radio-raremono: change devm_k*alloc to k*alloc Greg Kroah-Hartman
2019-08-02 10:04   ` Joe Perches
2019-08-06  5:34     ` Luke Nowakowski-Krijger
2019-08-02  9:40 ` [PATCH 5.2 11/20] xfrm: policy: fix bydst hlist corruption on hash rebuild Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 12/20] nvme: fix multipath crash when ANA is deactivated Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 13/20] Bluetooth: hci_uart: check for missing tty operations Greg Kroah-Hartman
2019-08-02  9:40 ` Greg Kroah-Hartman [this message]
2019-08-02  9:40 ` [PATCH 5.2 15/20] sched/fair: Use RCU accessors consistently for ->numa_group Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 16/20] /proc/<pid>/cmdline: remove all the special cases Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 17/20] /proc/<pid>/cmdline: add back the setproctitle() special case Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 18/20] drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 19/20] Fix allyesconfig output Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 5.2 20/20] ceph: hold i_ceph_lock when removing caps for freeing inode Greg Kroah-Hartman
2019-08-02 17:21 ` [PATCH 5.2 00/20] 5.2.6-stable review Thierry Reding
2019-08-03  7:09   ` Greg Kroah-Hartman
2019-08-05 11:48     ` Thierry Reding
2019-08-09  8:52       ` Greg Kroah-Hartman
2019-08-09 13:04         ` Thierry Reding
2019-08-09 15:49           ` Greg Kroah-Hartman
2019-08-12  9:05             ` Thierry Reding
2021-01-04 12:39               ` Greg Kroah-Hartman
2019-08-02 23:25 ` shuah
2019-08-03  7:08   ` Greg Kroah-Hartman
2019-08-03  5:50 ` Naresh Kamboju
2019-08-03  7:11   ` Greg Kroah-Hartman
2019-08-03 16:00 ` Guenter Roeck
2019-08-04  7:15   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190802092102.527902959@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.