All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jann Horn <jannh@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 5.2 68/76] sched/fair: Don't free p->numa_faults with concurrent readers
Date: Fri,  2 Aug 2019 09:19:42 -0400	[thread overview]
Message-ID: <20190802131951.11600-68-sashal@kernel.org> (raw)
In-Reply-To: <20190802131951.11600-1-sashal@kernel.org>

From: Jann Horn <jannh@google.com>

[ Upstream commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 ]

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/exec.c                            |  2 +-
 include/linux/sched/numa_balancing.h |  4 ++--
 kernel/fork.c                        |  2 +-
 kernel/sched/fair.c                  | 24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 89a500bb897a6..39902cc9eb6f4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1828,7 +1828,7 @@ static int __do_execve_file(int fd, struct filename *filename,
 	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	if (filename)
diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index e7dd04a84ba89..3988762efe15c 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
 extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
 extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
 					int src_nid, int dst_cpu);
 #else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(struct task_struct *p)
 static inline void set_numabalancing_state(bool enabled)
 {
 }
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
 {
 }
 static inline bool should_numa_migrate_memory(struct task_struct *p,
diff --git a/kernel/fork.c b/kernel/fork.c
index fe83343da24ba..d3f006ed2f9d5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -727,7 +727,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f35930f5e528a..8adf7b303d04d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2336,13 +2336,23 @@ no_join:
 	return;
 }
 
-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA staticstics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
 
+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2355,8 +2365,14 @@ void task_numa_free(struct task_struct *p)
 		put_numa_group(grp);
 	}
 
-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
 
 /*
-- 
2.20.1


  parent reply	other threads:[~2019-08-02 13:34 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02 13:18 [PATCH AUTOSEL 5.2 01/76] powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 02/76] netfilter: nfnetlink: avoid deadlock due to synchronous request_module Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 03/76] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 04/76] vfio-ccw: Don't call cp_free if we are processing a channel program Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 05/76] netfilter: Fix rpfilter dropping vrf packets by mistake Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 06/76] netfilter: nf_tables: fix module autoload for redir Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 07/76] netfilter: conntrack: always store window size un-scaled Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 08/76] netfilter: nft_hash: fix symhash with modulus one Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 12/76] rq-qos: don't reset has_sleepers on spurious wakeups Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 13/76] rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 14/76] rq-qos: use a mb for got_token Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 15/76] netfilter: nf_tables: Support auto-loading for inet nat Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 16/76] drm/amd/display: No audio endpoint for Dell MST display Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 17/76] drm/amd/display: Clock does not lower in Updateplanes Sasha Levin
     [not found] ` <20190802131951.11600-1-sashal-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2019-08-02 13:18   ` [PATCH AUTOSEL 5.2 18/76] drm/amd/display: Wait for backlight programming completion in set backlight level Sasha Levin
2019-08-02 13:18   ` [PATCH AUTOSEL 5.2 19/76] drm/amd/display: fix DMCU hang when going into Modern Standby Sasha Levin
2019-08-02 13:18   ` [PATCH AUTOSEL 5.2 20/76] drm/amd/display: use encoder's engine id to find matched free audio device Sasha Levin
2019-08-02 13:18   ` [PATCH AUTOSEL 5.2 21/76] drm/amd/display: put back front end initialization sequence Sasha Levin
2019-08-02 13:18   ` [PATCH AUTOSEL 5.2 22/76] drm/amd/display: allocate 4 ddc engines for RV2 Sasha Levin
2019-08-02 13:18   ` [PATCH AUTOSEL 5.2 25/76] drm/amd/display: Increase size of audios array Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 23/76] drm/amd/display: Fix dc_create failure handling and 666 color depths Sasha Levin
2019-08-02 13:18 ` [PATCH AUTOSEL 5.2 24/76] drm/amd/display: Only enable audio if speaker allocation exists Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 27/76] nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 28/76] mac80211: fix possible memory leak in ieee80211_assign_beacon Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 29/76] mac80211: don't warn about CW params when not using them Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 31/76] hwmon: (occ) Fix division by zero issue Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 32/76] hwmon: (nct6775) Fix register address and added missed tolerance for nct6106 Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 33/76] ARM: dts: imx6ul: fix clock frequency property name of I2C buses Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 36/76] mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 37/76] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 40/76] iommu/vt-d: Check if domain->pgd was allocated Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 41/76] drm/msm/dpu: Correct dpu encoder spinlock initialization Sasha Levin
2019-08-02 13:19   ` Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 42/76] drm/msm: stop abusing dma_map/unmap for cache Sasha Levin
2019-08-02 13:19   ` Sasha Levin
2019-08-03  0:14   ` Rob Clark
2019-08-14  2:03     ` Sasha Levin
2019-08-14  2:03       ` Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 43/76] drm: silence variable 'conn' set but not used Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 44/76] arm64: dts: imx8mm: Correct SAI3 RXC/TXFS pin's mux option #1 Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 45/76] arm64: dts: imx8mq: fix SAI compatible Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 46/76] cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init() Sasha Levin
2019-08-02 13:19   ` Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 47/76] s390/qdio: add sanity checks to the fast-requeue path Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 57/76] drbd: dynamically allocate shash descriptor Sasha Levin
2019-08-02 13:19   ` [Drbd-dev] " Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 58/76] ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id() Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 59/76] nvme: ignore subnqn for ADATA SX6000LNP Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 60/76] nvme: fix memory leak caused by incorrect subsystem free Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 61/76] nvme: fix multipath crash when ANA is deactivated Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 63/76] ARM: dts: bcm: bcm47094: add missing #cells for mdio-bus-mux Sasha Levin
2019-08-02 13:19   ` Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 64/76] scsi: megaraid_sas: fix panic on loading firmware crashdump Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 65/76] scsi: ibmvfc: fix WARN_ON during event pool release Sasha Levin
2019-08-02 13:19   ` Sasha Levin
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 66/76] scsi: scsi_dh_alua: always use a 2 second delay before retrying RTPG Sasha Levin
2019-08-02 13:19 ` Sasha Levin [this message]
2019-08-02 13:19 ` [PATCH AUTOSEL 5.2 75/76] s390/dma: provide proper ARCH_ZONE_DMA_BITS value Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190802131951.11600-68-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=jannh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.