All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jann Horn <jannh@google.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4.19 24/32] sched/fair: Dont free p->numa_faults with concurrent readers
Date: Fri,  2 Aug 2019 11:39:58 +0200	[thread overview]
Message-ID: <20190802092109.441126804@linuxfoundation.org> (raw)
In-Reply-To: <20190802092101.913646560@linuxfoundation.org>

From: Jann Horn <jannh@google.com>

commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/exec.c                            |    2 +-
 include/linux/sched/numa_balancing.h |    4 ++--
 kernel/fork.c                        |    2 +-
 kernel/sched/fair.c                  |   24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1826,7 +1826,7 @@ static int __do_execve_file(int fd, stru
 	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	if (filename)
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
 extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
 extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
 					int src_nid, int dst_cpu);
 #else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(s
 static inline void set_numabalancing_state(bool enabled)
 {
 }
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
 {
 }
 static inline bool should_numa_migrate_memory(struct task_struct *p,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -679,7 +679,7 @@ void __put_task_struct(struct task_struc
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2345,13 +2345,23 @@ no_join:
 	return;
 }
 
-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA staticstics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
 
+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2364,8 +2374,14 @@ void task_numa_free(struct task_struct *
 		put_numa_group(grp);
 	}
 
-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
 
 /*



  parent reply	other threads:[~2019-08-02  9:56 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02  9:39 [PATCH 4.19 00/32] 4.19.64-stable review Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 01/32] hv_sock: Add support for delayed close Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 02/32] vsock: correct removal of socket from the list Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 03/32] NFS: Fix dentry revalidation on NFSv4 lookup Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 04/32] NFS: Refactor nfs_lookup_revalidate() Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 05/32] NFSv4: Fix lookup revalidate of regular files Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 06/32] usb: dwc2: Disable all EPs on disconnect Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 07/32] usb: dwc2: Fix disable " Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 08/32] arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 09/32] binder: fix possible UAF when freeing buffer Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 10/32] ISDN: hfcsusb: checking idx of ep configuration Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 11/32] media: au0828: fix null dereference in error path Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 12/32] ath10k: Change the warning message string Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 13/32] media: cpia2_usb: first wake up, then free in disconnect Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 14/32] media: pvrusb2: use a different format for warnings Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 15/32] NFS: Cleanup if nfs_match_client is interrupted Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 16/32] media: radio-raremono: change devm_k*alloc to k*alloc Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 17/32] iommu/vt-d: Dont queue_iova() if there is no flush queue Greg Kroah-Hartman
2019-08-02  9:39   ` Greg Kroah-Hartman
2019-08-03 21:34   ` Pavel Machek
2019-08-03 21:34     ` Pavel Machek
2019-08-06 22:47     ` Dmitry Safonov via iommu
2019-08-06 22:47       ` Dmitry Safonov
2019-08-06 23:16       ` Dmitry Safonov via iommu
2019-08-06 23:16         ` Dmitry Safonov
2019-08-02  9:39 ` [PATCH 4.19 18/32] iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 19/32] Bluetooth: hci_uart: check for missing tty operations Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 20/32] vhost: introduce vhost_exceeds_weight() Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 21/32] vhost_net: fix possible infinite loop Greg Kroah-Hartman
2019-08-03 21:49   ` Pavel Machek
2019-08-05  4:17     ` Jason Wang
2019-08-02  9:39 ` [PATCH 4.19 22/32] vhost: vsock: add weight support Greg Kroah-Hartman
2019-08-02  9:39 ` [PATCH 4.19 23/32] vhost: scsi: " Greg Kroah-Hartman
2019-08-02  9:39 ` Greg Kroah-Hartman [this message]
2019-08-02  9:39 ` [PATCH 4.19 25/32] sched/fair: Use RCU accessors consistently for ->numa_group Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 26/32] /proc/<pid>/cmdline: remove all the special cases Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 27/32] /proc/<pid>/cmdline: add back the setproctitle() special case Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 28/32] drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 29/32] Fix allyesconfig output Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 30/32] ceph: hold i_ceph_lock when removing caps for freeing inode Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 31/32] block, scsi: Change the preempt-only flag into a counter Greg Kroah-Hartman
2019-08-02  9:40 ` [PATCH 4.19 32/32] scsi: core: Avoid that a kernel warning appears during system resume Greg Kroah-Hartman
2019-08-02 17:20 ` [PATCH 4.19 00/32] 4.19.64-stable review Thierry Reding
2019-08-02 23:22 ` shuah
2019-08-03  5:46 ` Naresh Kamboju
2019-08-03  9:58 ` Pavel Machek
2019-08-03 10:34   ` Greg Kroah-Hartman
2019-08-03 15:59 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190802092109.441126804@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.