From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jann Horn <jannh@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 5.2 68/76] sched/fair: Don't free p->numa_faults with concurrent readers
Date: Fri,  2 Aug 2019 09:19:42 -0400
Message-ID: <20190802131951.11600-68-sashal@kernel.org>
In-Reply-To: <20190802131951.11600-1-sashal@kernel.org>

From: Jann Horn <jannh@google.com>

[ Upstream commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 ]

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.

I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
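Note for readers (not part of the commit): the reset-instead-of-free idea
described above can be tried in userspace with the rough sketch below. It is
only an illustration under stated assumptions. struct task_sketch,
task_sketch_init(), task_sketch_numa_free() and NR_STATS are hypothetical
stand-ins invented here, the numa_group handling is left out, and calloc()/
free() replace the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for NR_NUMA_HINT_FAULT_STATS * nr_node_ids. */
#define NR_STATS 8

/* Hypothetical, heavily reduced stand-in for the task_struct fields involved. */
struct task_sketch {
	unsigned long *numa_faults;
	unsigned long total_numa_faults;
};

/* Allocate zeroed statistics; returns 0 on success, -1 on allocation failure. */
static int task_sketch_init(struct task_sketch *t)
{
	assert(t != NULL);
	t->numa_faults = calloc(NR_STATS, sizeof(*t->numa_faults));
	t->total_numa_faults = 0;
	return t->numa_faults ? 0 : -1;
}

/*
 * final == true:  no reader can exist any more, so the array is released.
 * final == false: readers may still hold the pointer, so the allocation
 *                 stays in place and only its contents are zeroed.
 */
static void task_sketch_numa_free(struct task_sketch *t, bool final)
{
	unsigned long *numa_faults = t->numa_faults;
	int i;

	if (!numa_faults)
		return;

	if (final) {
		t->numa_faults = NULL;
		free(numa_faults);
	} else {
		t->total_numa_faults = 0;
		for (i = 0; i < NR_STATS; i++)
			numa_faults[i] = 0;
	}
}
```

In this sketch, calling task_sketch_numa_free(&t, false) leaves t.numa_faults
pointing at the same allocation, so a reader that loaded the pointer earlier
sees zeroed counters rather than freed memory. The sketch does not model the
concurrency itself; it only shows which path keeps the allocation alive.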
 fs/exec.c                            |  2 +-
 include/linux/sched/numa_balancing.h |  4 ++--
 kernel/fork.c                        |  2 +-
 kernel/sched/fair.c                  | 24 ++++++++++++++++++++----
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 89a500bb897a6..39902cc9eb6f4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1828,7 +1828,7 @@ static int __do_execve_file(int fd, struct filename *filename,
 	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
-	task_numa_free(current);
+	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
 	if (filename)
diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index e7dd04a84ba89..3988762efe15c 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -19,7 +19,7 @@
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
 extern void set_numabalancing_state(bool enabled);
-extern void task_numa_free(struct task_struct *p);
+extern void task_numa_free(struct task_struct *p, bool final);
 extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page,
 					int src_nid, int dst_cpu);
 #else
@@ -34,7 +34,7 @@ static inline pid_t task_numa_group_id(struct task_struct *p)
 static inline void set_numabalancing_state(bool enabled)
 {
 }
-static inline void task_numa_free(struct task_struct *p)
+static inline void task_numa_free(struct task_struct *p, bool final)
 {
 }
 static inline bool should_numa_migrate_memory(struct task_struct *p,
diff --git a/kernel/fork.c b/kernel/fork.c
index fe83343da24ba..d3f006ed2f9d5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -727,7 +727,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
-	task_numa_free(tsk);
+	task_numa_free(tsk, true);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f35930f5e528a..8adf7b303d04d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2336,13 +2336,23 @@ no_join:
 	return;
 }
 
-void task_numa_free(struct task_struct *p)
+/*
+ * Get rid of NUMA statistics associated with a task (either current or dead).
+ * If @final is set, the task is dead and has reached refcount zero, so we can
+ * safely free all relevant data structures. Otherwise, there might be
+ * concurrent reads from places like load balancing and procfs, and we should
+ * reset the data back to default state without freeing ->numa_faults.
+ */
+void task_numa_free(struct task_struct *p, bool final)
 {
 	struct numa_group *grp = p->numa_group;
-	void *numa_faults = p->numa_faults;
+	unsigned long *numa_faults = p->numa_faults;
 	unsigned long flags;
 	int i;
 
+	if (!numa_faults)
+		return;
+
 	if (grp) {
 		spin_lock_irqsave(&grp->lock, flags);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
@@ -2355,8 +2365,14 @@ void task_numa_free(struct task_struct *p)
 		put_numa_group(grp);
 	}
 
-	p->numa_faults = NULL;
-	kfree(numa_faults);
+	if (final) {
+		p->numa_faults = NULL;
+		kfree(numa_faults);
+	} else {
+		p->total_numa_faults = 0;
+		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
+			numa_faults[i] = 0;
+	}
 }
 
 /*
-- 
2.20.1

