* [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic
[not found] <20190327180158.10245-1-sashal@kernel.org>
@ 2019-03-27 17:57 ` Sasha Levin
2019-03-28 5:57 ` Mike Rapoport
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 030/262] mm/sparse: fix a bad comparison Sasha Levin
` (11 subsequent siblings)
12 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:57 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Mike Rapoport, Catalin Marinas, Christophe Leroy,
Christoph Hellwig, David S. Miller, Dennis Zhou,
Geert Uytterhoeven, Greentime Hu, Greg Kroah-Hartman, Guan Xuetao,
Guo Ren, Guo Ren, Heiko Carstens, Juergen Gross, Mark Salter,
Matt Turner, Max Filippov, Michal Simek, Paul Burton, Petr Mladek,
Richard Weinberger, Rich Felker, Rob Herring, Rob Herring,
Russell King, Stafford Horne, Tony Luck, Vineet Gupta,
Yoshinori Sato, Andrew Morton, Linus Torvalds, Sasha Levin,
linuxppc-dev, linux-mm
From: Mike Rapoport <rppt@linux.ibm.com>
[ Upstream commit 337555744e6e39dd1d87698c6084dd88a606d60a ]
The memblock_phys_alloc_try_nid() function tries to allocate memory from
the requested node and then falls back to allocation from any node in
the system. The memblock_alloc_base() fallback used by this function
panics if the allocation fails.
Replace the memblock_alloc_base() fallback with the direct call to
memblock_alloc_range_nid() and update the memblock_phys_alloc_try_nid()
callers to check the returned value and panic in case of error.
Link: http://lkml.kernel.org/r/1548057848-15136-7-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com> [c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com> [Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/arm64/mm/numa.c | 4 ++++
arch/powerpc/mm/numa.c | 4 ++++
mm/memblock.c | 4 +++-
3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index ae34e3a1cef1..2c61ea4e290b 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -237,6 +237,10 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
pr_info("Initmem setup node %d [<memory-less node>]\n", nid);
nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+ if (!nd_pa)
+ panic("Cannot allocate %zu bytes for node %d data\n",
+ nd_size, nid);
+
nd = __va(nd_pa);
/* report and initialize */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 87f0dd004295..8ec2ed30d44c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -788,6 +788,10 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
int tnid;
nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+ if (!nd_pa)
+ panic("Cannot allocate %zu bytes for node %d data\n",
+ nd_size, nid);
+
nd = __va(nd_pa);
/* report and initialize */
diff --git a/mm/memblock.c b/mm/memblock.c
index ea31045ba704..d5923df56acc 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1342,7 +1342,9 @@ phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t ali
if (res)
return res;
- return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
+ return memblock_alloc_range_nid(size, align, 0,
+ MEMBLOCK_ALLOC_ACCESSIBLE,
+ NUMA_NO_NODE, MEMBLOCK_NONE);
}
/**
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 030/262] mm/sparse: fix a bad comparison
[not found] <20190327180158.10245-1-sashal@kernel.org>
2019-03-27 17:57 ` [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 031/262] mm/cma.c: cma_declare_contiguous: correct err handling Sasha Levin
` (10 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qian Cai, Dave Hansen, Andrew Morton, Linus Torvalds, Sasha Levin,
linux-mm
From: Qian Cai <cai@lca.pw>
[ Upstream commit d778015ac95bc036af73342c878ab19250e01fe1 ]
next_present_section_nr() could only return an unsigned number -1, so
just check it specifically where compilers will convert -1 to unsigned
if needed.
mm/sparse.c: In function 'sparse_init_nid':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:478:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:497:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c: In function 'sparse_init':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:520:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin + 1, pnum_end) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
Fixes: c4e1be9ec113 ("mm, sparsemem: break out of loops early")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/sparse.c b/mm/sparse.c
index 7ea5dc6c6b19..77a0554fa5bd 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -197,7 +197,7 @@ static inline int next_present_section_nr(int section_nr)
}
#define for_each_present_section_nr(start, section_nr) \
for (section_nr = next_present_section_nr(start-1); \
- ((section_nr >= 0) && \
+ ((section_nr != -1) && \
(section_nr <= __highest_present_section_nr)); \
section_nr = next_present_section_nr(section_nr))
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 031/262] mm/cma.c: cma_declare_contiguous: correct err handling
[not found] <20190327180158.10245-1-sashal@kernel.org>
2019-03-27 17:57 ` [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 030/262] mm/sparse: fix a bad comparison Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 032/262] mm/page_ext.c: fix an imbalance with kmemleak Sasha Levin
` (9 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Peng Fan, Laura Abbott, Joonsoo Kim, Michal Hocko,
Vlastimil Babka, Marek Szyprowski, Andrey Konovalov,
Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm
From: Peng Fan <peng.fan@nxp.com>
[ Upstream commit 0d3bd18a5efd66097ef58622b898d3139790aa9d ]
In case cma_init_reserved_mem failed, need to free the memblock
allocated by memblock_reserve or memblock_alloc_range.
Quote Catalin's comments:
https://lkml.org/lkml/2019/2/26/482
Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
ignores the memblock_reserve() as a memblock_alloc() implementation
detail. It is, however, tolerant to memblock_free() being called on
a sub-range or just a different range from a previous memblock_alloc().
So the original patch looks fine to me. FWIW:
Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/cma.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/cma.c b/mm/cma.c
index c7b39dd3b4f6..f4f3a8a57d86 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -353,12 +353,14 @@ int __init cma_declare_contiguous(phys_addr_t base,
ret = cma_init_reserved_mem(base, size, order_per_bit, name, res_cma);
if (ret)
- goto err;
+ goto free_mem;
pr_info("Reserved %ld MiB at %pa\n", (unsigned long)size / SZ_1M,
&base);
return 0;
+free_mem:
+ memblock_free(base, size);
err:
pr_err("Failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
return ret;
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 032/262] mm/page_ext.c: fix an imbalance with kmemleak
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (2 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 031/262] mm/cma.c: cma_declare_contiguous: correct err handling Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 033/262] mm, swap: bounds check swap_info array accesses to avoid NULL derefs Sasha Levin
` (8 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qian Cai, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm
From: Qian Cai <cai@lca.pw>
[ Upstream commit 0c81585499601acd1d0e1cbf424cabfaee60628c ]
After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining. At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().
BUG: unable to handle kernel paging request at ffff888453d00000
PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
RIP: 0010:scan_block+0xb5/0x290
Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
FS: 00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
Call Trace:
scan_gray_list+0x269/0x430
kmemleak_scan+0x5a8/0x10f0
kmemleak_write+0x541/0x6ca
full_proxy_write+0xf8/0x190
__vfs_write+0xeb/0x980
vfs_write+0x15a/0x4f0
ksys_write+0xd2/0x1b0
__x64_sys_write+0x73/0xb0
do_syscall_64+0xeb/0xaaa
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f6c0dad73b8
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
CR2: ffff888453d00000
---[ end trace ccf646c7456717c5 ]---
Kernel panic - not syncing: Fatal exception
Shutting down cpus with NMI
Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/page_ext.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 8c78b8d45117..f116431c3dee 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -273,6 +273,7 @@ static void free_page_ext(void *addr)
table_size = get_entry_size() * PAGES_PER_SECTION;
BUG_ON(PageReserved(page));
+ kmemleak_free(addr);
free_pages_exact(addr, table_size);
}
}
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 033/262] mm, swap: bounds check swap_info array accesses to avoid NULL derefs
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (3 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 032/262] mm/page_ext.c: fix an imbalance with kmemleak Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 035/262] mm,oom: don't kill global init via memory.oom.group Sasha Levin
` (7 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Daniel Jordan, Alan Stern, Andi Kleen, Dave Hansen, Omar Sandoval,
Paul McKenney, Shaohua Li, Stephen Rothwell, Tejun Heo,
Will Deacon, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm
From: Daniel Jordan <daniel.m.jordan@oracle.com>
[ Upstream commit c10d38cc8d3e43f946b6c2bf4602c86791587f30 ]
Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:
Smatch complains that the NULL checks on "si" aren't consistent. This
seems like a real bug because we have not ensured that the type is
valid and so "si" can be NULL.
Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:
CPU0 CPU1
alloc_swap_info() if (type >= nr_swapfiles)
swap_info[type] = p /* handle invalid entry */
smp_wmb() smp_rmb()
++nr_swapfiles p = swap_info[type]
Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].
Ying Huang noticed other places in swapfile.c don't order these reads
properly. Introduce swap_type_to_swap_info to encourage correct usage.
Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).
This ordering need not be enforced in places where swap_lock is held
(e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.
Link: http://lkml.kernel.org/r/20190131024410.29859-1-daniel.m.jordan@oracle.com
Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/swapfile.c | 51 +++++++++++++++++++++++++++++----------------------
1 file changed, 29 insertions(+), 22 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index dbac1d49469d..67f60e051814 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -98,6 +98,15 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
atomic_t nr_rotate_swap = ATOMIC_INIT(0);
+static struct swap_info_struct *swap_type_to_swap_info(int type)
+{
+ if (type >= READ_ONCE(nr_swapfiles))
+ return NULL;
+
+ smp_rmb(); /* Pairs with smp_wmb in alloc_swap_info. */
+ return READ_ONCE(swap_info[type]);
+}
+
static inline unsigned char swap_count(unsigned char ent)
{
return ent & ~SWAP_HAS_CACHE; /* may include COUNT_CONTINUED flag */
@@ -1044,12 +1053,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
/* The only caller of this function is now suspend routine */
swp_entry_t get_swap_page_of_type(int type)
{
- struct swap_info_struct *si;
+ struct swap_info_struct *si = swap_type_to_swap_info(type);
pgoff_t offset;
- si = swap_info[type];
+ if (!si)
+ goto fail;
+
spin_lock(&si->lock);
- if (si && (si->flags & SWP_WRITEOK)) {
+ if (si->flags & SWP_WRITEOK) {
atomic_long_dec(&nr_swap_pages);
/* This is called for allocating swap entry, not cache */
offset = scan_swap_map(si, 1);
@@ -1060,6 +1071,7 @@ swp_entry_t get_swap_page_of_type(int type)
atomic_long_inc(&nr_swap_pages);
}
spin_unlock(&si->lock);
+fail:
return (swp_entry_t) {0};
}
@@ -1071,9 +1083,9 @@ static struct swap_info_struct *__swap_info_get(swp_entry_t entry)
if (!entry.val)
goto out;
type = swp_type(entry);
- if (type >= nr_swapfiles)
+ p = swap_type_to_swap_info(type);
+ if (!p)
goto bad_nofile;
- p = swap_info[type];
if (!(p->flags & SWP_USED))
goto bad_device;
offset = swp_offset(entry);
@@ -1697,10 +1709,9 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
sector_t swapdev_block(int type, pgoff_t offset)
{
struct block_device *bdev;
+ struct swap_info_struct *si = swap_type_to_swap_info(type);
- if ((unsigned int)type >= nr_swapfiles)
- return 0;
- if (!(swap_info[type]->flags & SWP_WRITEOK))
+ if (!si || !(si->flags & SWP_WRITEOK))
return 0;
return map_swap_entry(swp_entry(type, offset), &bdev);
}
@@ -2258,7 +2269,7 @@ static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
struct swap_extent *se;
pgoff_t offset;
- sis = swap_info[swp_type(entry)];
+ sis = swp_swap_info(entry);
*bdev = sis->bdev;
offset = swp_offset(entry);
@@ -2700,9 +2711,7 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
if (!l)
return SEQ_START_TOKEN;
- for (type = 0; type < nr_swapfiles; type++) {
- smp_rmb(); /* read nr_swapfiles before swap_info[type] */
- si = swap_info[type];
+ for (type = 0; (si = swap_type_to_swap_info(type)); type++) {
if (!(si->flags & SWP_USED) || !si->swap_map)
continue;
if (!--l)
@@ -2722,9 +2731,7 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
else
type = si->type + 1;
- for (; type < nr_swapfiles; type++) {
- smp_rmb(); /* read nr_swapfiles before swap_info[type] */
- si = swap_info[type];
+ for (; (si = swap_type_to_swap_info(type)); type++) {
if (!(si->flags & SWP_USED) || !si->swap_map)
continue;
++*pos;
@@ -2831,14 +2838,14 @@ static struct swap_info_struct *alloc_swap_info(void)
}
if (type >= nr_swapfiles) {
p->type = type;
- swap_info[type] = p;
+ WRITE_ONCE(swap_info[type], p);
/*
* Write swap_info[type] before nr_swapfiles, in case a
* racing procfs swap_start() or swap_next() is reading them.
* (We never shrink nr_swapfiles, we never free this entry.)
*/
smp_wmb();
- nr_swapfiles++;
+ WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
} else {
kvfree(p);
p = swap_info[type];
@@ -3358,7 +3365,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
{
struct swap_info_struct *p;
struct swap_cluster_info *ci;
- unsigned long offset, type;
+ unsigned long offset;
unsigned char count;
unsigned char has_cache;
int err = -EINVAL;
@@ -3366,10 +3373,10 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
if (non_swap_entry(entry))
goto out;
- type = swp_type(entry);
- if (type >= nr_swapfiles)
+ p = swp_swap_info(entry);
+ if (!p)
goto bad_file;
- p = swap_info[type];
+
offset = swp_offset(entry);
if (unlikely(offset >= p->max))
goto out;
@@ -3466,7 +3473,7 @@ int swapcache_prepare(swp_entry_t entry)
struct swap_info_struct *swp_swap_info(swp_entry_t entry)
{
- return swap_info[swp_type(entry)];
+ return swap_type_to_swap_info(swp_type(entry));
}
struct swap_info_struct *page_swap_info(struct page *page)
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 035/262] mm,oom: don't kill global init via memory.oom.group
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (4 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 033/262] mm, swap: bounds check swap_info array accesses to avoid NULL derefs Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 036/262] memcg: killed threads should not invoke memcg OOM killer Sasha Levin
` (6 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Tetsuo Handa, Roman Gushchin, Johannes Weiner, Andrew Morton,
Linus Torvalds, Sasha Levin, linux-mm
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[ Upstream commit d342a0b38674867ea67fde47b0e1e60ffe9f17a2 ]
Since setting global init process to some memory cgroup is technically
possible, oom_kill_memcg_member() must check it.
Tasks in /test1 are going to be killed due to memory.oom.group set
Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char *argv[])
{
static char buffer[10485760];
static int pipe_fd[2] = { EOF, EOF };
unsigned int i;
int fd;
char buf[64] = { };
if (pipe(pipe_fd))
return 1;
if (chdir("/sys/fs/cgroup/"))
return 1;
fd = open("cgroup.subtree_control", O_WRONLY);
write(fd, "+memory", 7);
close(fd);
mkdir("test1", 0755);
fd = open("test1/memory.oom.group", O_WRONLY);
write(fd, "1", 1);
close(fd);
fd = open("test1/cgroup.procs", O_WRONLY);
write(fd, "1", 1);
snprintf(buf, sizeof(buf) - 1, "%d", getpid());
write(fd, buf, strlen(buf));
close(fd);
snprintf(buf, sizeof(buf) - 1, "%lu", sizeof(buffer) * 5);
fd = open("test1/memory.max", O_WRONLY);
write(fd, buf, strlen(buf));
close(fd);
for (i = 0; i < 10; i++)
if (fork() == 0) {
char c;
close(pipe_fd[1]);
read(pipe_fd[0], &c, 1);
memset(buffer, 0, sizeof(buffer));
sleep(3);
_exit(0);
}
close(pipe_fd[0]);
close(pipe_fd[1]);
sleep(3);
return 0;
}
[ 37.052923][ T9185] a.out invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 37.056169][ T9185] CPU: 4 PID: 9185 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[ 37.059205][ T9185] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 37.062954][ T9185] Call Trace:
[ 37.063976][ T9185] dump_stack+0x67/0x95
[ 37.065263][ T9185] dump_header+0x51/0x570
[ 37.066619][ T9185] ? trace_hardirqs_on+0x3f/0x110
[ 37.068171][ T9185] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[ 37.069967][ T9185] oom_kill_process+0x18d/0x210
[ 37.071515][ T9185] out_of_memory+0x11b/0x380
[ 37.072936][ T9185] mem_cgroup_out_of_memory+0xb6/0xd0
[ 37.074601][ T9185] try_charge+0x790/0x820
[ 37.076021][ T9185] mem_cgroup_try_charge+0x42/0x1d0
[ 37.077629][ T9185] mem_cgroup_try_charge_delay+0x11/0x30
[ 37.079370][ T9185] do_anonymous_page+0x105/0x5e0
[ 37.080939][ T9185] __handle_mm_fault+0x9cb/0x1070
[ 37.082485][ T9185] handle_mm_fault+0x1b2/0x3a0
[ 37.083819][ T9185] ? handle_mm_fault+0x47/0x3a0
[ 37.085181][ T9185] __do_page_fault+0x255/0x4c0
[ 37.086529][ T9185] do_page_fault+0x28/0x260
[ 37.087788][ T9185] ? page_fault+0x8/0x30
[ 37.088978][ T9185] page_fault+0x1e/0x30
[ 37.090142][ T9185] RIP: 0033:0x7f8b183aefe0
[ 37.091433][ T9185] Code: 20 f3 44 0f 7f 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f 84 00 00 00 00 00 <66> 44 0f 7f 01 66 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41
[ 37.096917][ T9185] RSP: 002b:00007fffc5d329e8 EFLAGS: 00010206
[ 37.098615][ T9185] RAX: 00000000006010e0 RBX: 0000000000000008 RCX: 0000000000c30000
[ 37.100905][ T9185] RDX: 00000000010010c0 RSI: 0000000000000000 RDI: 00000000006010e0
[ 37.103349][ T9185] RBP: 0000000000000000 R08: 00007f8b188f4740 R09: 0000000000000000
[ 37.105797][ T9185] R10: 00007fffc5d32420 R11: 00007f8b183aef40 R12: 0000000000000005
[ 37.108228][ T9185] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
[ 37.110840][ T9185] memory: usage 51200kB, limit 51200kB, failcnt 125
[ 37.113045][ T9185] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 37.115808][ T9185] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 37.117660][ T9185] Memory cgroup stats for /test1: cache:0KB rss:49484KB rss_huge:30720KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:49700KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 37.123371][ T9185] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=9188,uid=0
[ 37.128158][ T9185] Memory cgroup out of memory: Killed process 9188 (a.out) total-vm:14456kB, anon-rss:10324kB, file-rss:504kB, shmem-rss:0kB
[ 37.132710][ T9185] Tasks in /test1 are going to be killed due to memory.oom.group set
[ 37.132833][ T54] oom_reaper: reaped process 9188 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.135498][ T9185] Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
[ 37.143434][ T9185] Memory cgroup out of memory: Killed process 9182 (a.out) total-vm:14456kB, anon-rss:76kB, file-rss:588kB, shmem-rss:0kB
[ 37.144328][ T54] oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.147585][ T9185] Memory cgroup out of memory: Killed process 9183 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.157222][ T9185] Memory cgroup out of memory: Killed process 9184 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:508kB, shmem-rss:0kB
[ 37.157259][ T9185] Memory cgroup out of memory: Killed process 9185 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.157291][ T9185] Memory cgroup out of memory: Killed process 9186 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:508kB, shmem-rss:0kB
[ 37.157306][ T54] oom_reaper: reaped process 9183 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.157328][ T9185] Memory cgroup out of memory: Killed process 9187 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[ 37.157452][ T9185] Memory cgroup out of memory: Killed process 9189 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.158733][ T9185] Memory cgroup out of memory: Killed process 9190 (a.out) total-vm:14456kB, anon-rss:552kB, file-rss:512kB, shmem-rss:0kB
[ 37.160083][ T54] oom_reaper: reaped process 9186 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.160187][ T54] oom_reaper: reaped process 9189 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.206941][ T54] oom_reaper: reaped process 9185 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.212300][ T9185] Memory cgroup out of memory: Killed process 9191 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[ 37.212317][ T54] oom_reaper: reaped process 9190 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.218860][ T9185] Memory cgroup out of memory: Killed process 9192 (a.out) total-vm:14456kB, anon-rss:1080kB, file-rss:512kB, shmem-rss:0kB
[ 37.227667][ T54] oom_reaper: reaped process 9192 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.292323][ T9193] abrt-hook-ccpp (9193) used greatest stack depth: 10480 bytes left
[ 37.351843][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[ 37.354833][ T1] CPU: 7 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[ 37.357876][ T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 37.361685][ T1] Call Trace:
[ 37.363239][ T1] dump_stack+0x67/0x95
[ 37.365010][ T1] panic+0xfc/0x2b0
[ 37.366853][ T1] do_exit+0xd55/0xd60
[ 37.368595][ T1] do_group_exit+0x47/0xc0
[ 37.370415][ T1] get_signal+0x32a/0x920
[ 37.372449][ T1] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[ 37.374596][ T1] do_signal+0x32/0x6e0
[ 37.376430][ T1] ? exit_to_usermode_loop+0x26/0x9b
[ 37.378418][ T1] ? prepare_exit_to_usermode+0xa8/0xd0
[ 37.380571][ T1] exit_to_usermode_loop+0x3e/0x9b
[ 37.382588][ T1] prepare_exit_to_usermode+0xa8/0xd0
[ 37.384594][ T1] ? page_fault+0x8/0x30
[ 37.386453][ T1] retint_user+0x8/0x18
[ 37.388160][ T1] RIP: 0033:0x7f42c06974a8
[ 37.389922][ T1] Code: Bad RIP value.
[ 37.391788][ T1] RSP: 002b:00007ffc3effd388 EFLAGS: 00010213
[ 37.394075][ T1] RAX: 000000000000000e RBX: 00007ffc3effd390 RCX: 0000000000000000
[ 37.396963][ T1] RDX: 000000000000002a RSI: 00007ffc3effd390 RDI: 0000000000000004
[ 37.399550][ T1] RBP: 00007ffc3effd680 R08: 0000000000000000 R09: 0000000000000000
[ 37.402334][ T1] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
[ 37.404890][ T1] R13: ffffffffffffffff R14: 0000000000000884 R15: 000056460b1ac3b0
Link: http://lkml.kernel.org/r/201902010336.x113a4EO027170@www262.sakura.ne.jp
Fixes: 3d8b38eb81cac813 ("mm, oom: introduce memory.oom.group")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/oom_kill.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 26ea8636758f..da0e44914085 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -928,7 +928,8 @@ static void __oom_kill_process(struct task_struct *victim)
*/
static int oom_kill_memcg_member(struct task_struct *task, void *unused)
{
- if (task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
+ if (task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN &&
+ !is_global_init(task)) {
get_task_struct(task);
__oom_kill_process(task);
}
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 036/262] memcg: killed threads should not invoke memcg OOM killer
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (5 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 035/262] mm,oom: don't kill global init via memory.oom.group Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 037/262] mm, mempolicy: fix uninit memory access Sasha Levin
` (5 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Tetsuo Handa, David Rientjes, Kirill Tkhai, Andrew Morton,
Linus Torvalds, Sasha Levin, cgroups, linux-mm
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[ Upstream commit 7775face207922ea62a4e96b9cd45abfdc7b9840 ]
If a memory cgroup contains a single process with many threads
(including different process group sharing the mm) then it is possible
to trigger a race when the oom killer complains that there are no oom
elible tasks and complain into the log which is both annoying and
confusing because there is no actual problem. The race looks as
follows:
P1 oom_reaper P2
try_charge try_charge
mem_cgroup_out_of_memory
mutex_lock(oom_lock)
out_of_memory
oom_kill_process(P1,P2)
wake_oom_reaper
mutex_unlock(oom_lock)
oom_reap_task
mutex_lock(oom_lock)
select_bad_process # no victim
The problem is more visible with many threads.
Fix this by checking for fatal_signal_pending from
mem_cgroup_out_of_memory when the oom_lock is already held.
The oom bypass is safe because we do the same early in the try_charge
path already. The situation migh have changed in the mean time. It
should be safe to check for fatal_signal_pending and tsk_is_oom_victim
but for a better code readability abstract the current charge bypass
condition into should_force_charge and reuse it from that path. "
Link: http://lkml.kernel.org/r/01370f70-e1f6-ebe4-b95e-0df21a0bc15e@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/memcontrol.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af7f18b32389..79a7d2a06bba 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -248,6 +248,12 @@ enum res_type {
iter != NULL; \
iter = mem_cgroup_iter(NULL, iter, NULL))
+static inline bool should_force_charge(void)
+{
+ return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
+ (current->flags & PF_EXITING);
+}
+
/* Some nice accessors for the vmpressure. */
struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg)
{
@@ -1389,8 +1395,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
};
bool ret;
- mutex_lock(&oom_lock);
- ret = out_of_memory(&oc);
+ if (mutex_lock_killable(&oom_lock))
+ return true;
+ /*
+ * A few threads which were not waiting at mutex_lock_killable() can
+ * fail to bail out. Therefore, check again after holding oom_lock.
+ */
+ ret = should_force_charge() || out_of_memory(&oc);
mutex_unlock(&oom_lock);
return ret;
}
@@ -2209,9 +2220,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
* bypass the last charges so that they can exit quickly and
* free their memory.
*/
- if (unlikely(tsk_is_oom_victim(current) ||
- fatal_signal_pending(current) ||
- current->flags & PF_EXITING))
+ if (unlikely(should_force_charge()))
goto force;
/*
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 037/262] mm, mempolicy: fix uninit memory access
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (6 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 036/262] memcg: killed threads should not invoke memcg OOM killer Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 038/262] mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! Sasha Levin
` (4 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Vlastimil Babka, Alexander Potapenko, Dmitry Vyukov,
Andrea Arcangeli, Kirill A. Shutemov, Michal Hocko,
David Rientjes, Yisheng Xie, zhong jiang, Andrew Morton,
Linus Torvalds, Sasha Levin, linux-mm
From: Vlastimil Babka <vbabka@suse.cz>
[ Upstream commit 2e25644e8da4ed3a27e7b8315aaae74660be72dc ]
Syzbot with KMSAN reports (excerpt):
==================================================================
BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:353 [inline]
BUG: KMSAN: uninit-value in mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
CPU: 1 PID: 17420 Comm: syz-executor4 Not tainted 4.20.0-rc7+ #15
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
__msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:295
mpol_rebind_policy mm/mempolicy.c:353 [inline]
mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
update_tasks_nodemask+0x608/0xca0 kernel/cgroup/cpuset.c:1120
update_nodemasks_hier kernel/cgroup/cpuset.c:1185 [inline]
update_nodemask kernel/cgroup/cpuset.c:1253 [inline]
cpuset_write_resmask+0x2a98/0x34b0 kernel/cgroup/cpuset.c:1728
...
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
kmem_cache_alloc+0x572/0xb90 mm/slub.c:2777
mpol_new mm/mempolicy.c:276 [inline]
do_mbind mm/mempolicy.c:1180 [inline]
kernel_mbind+0x8a7/0x31a0 mm/mempolicy.c:1347
__do_sys_mbind mm/mempolicy.c:1354 [inline]
As it's difficult to report where exactly the uninit value resides in
the mempolicy object, we have to guess a bit. mm/mempolicy.c:353
contains this part of mpol_rebind_policy():
if (!mpol_store_user_nodemask(pol) &&
nodes_equal(pol->w.cpuset_mems_allowed, *newmask))
"mpol_store_user_nodemask(pol)" is testing pol->flags, which I couldn't
ever see being uninitialized after leaving mpol_new(). So I'll guess
it's actually about accessing pol->w.cpuset_mems_allowed on line 354,
but still part of statement starting on line 353.
For w.cpuset_mems_allowed to be not initialized, and the nodes_equal()
reachable for a mempolicy where mpol_set_nodemask() is called in
do_mbind(), it seems the only possibility is a MPOL_PREFERRED policy
with empty set of nodes, i.e. MPOL_LOCAL equivalent, with MPOL_F_LOCAL
flag. Let's exclude such policies from the nodes_equal() check. Note
the uninit access should be benign anyway, as rebinding this kind of
policy is always a no-op. Therefore no actual need for stable
inclusion.
Link: http://lkml.kernel.org/r/a71997c3-e8ae-a787-d5ce-3db05768b27c@suse.cz
Link: http://lkml.kernel.org/r/73da3e9c-cc84-509e-17d9-0c434bb9967d@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: syzbot+b19c2dc2c990ea657a71@syzkaller.appspotmail.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/mempolicy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ee2bce59d2bf..846f80e212ac 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -350,7 +350,7 @@ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask)
{
if (!pol)
return;
- if (!mpol_store_user_nodemask(pol) &&
+ if (!mpol_store_user_nodemask(pol) && !(pol->flags & MPOL_F_LOCAL) &&
nodes_equal(pol->w.cpuset_mems_allowed, *newmask))
return;
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 038/262] mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (7 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 037/262] mm, mempolicy: fix uninit memory access Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 039/262] mm/slab.c: kmemleak no scan alien caches Sasha Levin
` (3 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Uladzislau Rezki (Sony), Ingo Molnar, Joel Fernandes,
Matthew Wilcox, Michal Hocko, Oleksiy Avramchenko, Steven Rostedt,
Tejun Heo, Thomas Garnier, Thomas Gleixner, Andrew Morton,
Linus Torvalds, Sasha Levin, linux-mm
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
[ Upstream commit afd07389d3f4933c7f7817a92fb5e053d59a3182 ]
One of the vmalloc stress test case triggers the kernel BUG():
<snip>
[60.562151] ------------[ cut here ]------------
[60.562154] kernel BUG at mm/vmalloc.c:512!
[60.562206] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[60.562247] CPU: 0 PID: 430 Comm: vmalloc_test/0 Not tainted 4.20.0+ #161
[60.562293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[60.562351] RIP: 0010:alloc_vmap_area+0x36f/0x390
<snip>
it can happen due to big align request resulting in overflowing of
calculated address, i.e. it becomes 0 after ALIGN()'s fixup.
Fix it by checking if calculated address is within vstart/vend range.
Link: http://lkml.kernel.org/r/20190124115648.9433-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/vmalloc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2cd24186ba84..583630bf247d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -498,7 +498,11 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
}
found:
- if (addr + size > vend)
+ /*
+ * Check also calculated address against the vstart,
+ * because it can be 0 because of big align request.
+ */
+ if (addr + size > vend || addr < vstart)
goto overflow;
va->va_start = addr;
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 039/262] mm/slab.c: kmemleak no scan alien caches
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (8 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 038/262] mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 044/262] page_poison: play nicely with KASAN Sasha Levin
` (2 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qian Cai, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, Catalin Marinas, Andrew Morton, Linus Torvalds,
Sasha Levin, linux-mm
From: Qian Cai <cai@lca.pw>
[ Upstream commit 92d1d07daad65c300c7d0b68bbef8867e9895d54 ]
Kmemleak throws endless warnings during boot due to in
__alloc_alien_cache(),
alc = kmalloc_node(memsize, gfp, node);
init_arraycache(&alc->ac, entries, batch);
kmemleak_no_scan(ac);
Kmemleak does not track the array cache (alc->ac) but the alien cache
(alc) instead, so let it track the latter by lifting kmemleak_no_scan()
out of init_arraycache().
There is another place that calls init_arraycache(), but
alloc_kmem_cache_cpus() uses the percpu allocation where will never be
considered as a leak.
kmemleak: Found object by alias at 0xffff8007b9aa7e38
CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
Call trace:
dump_backtrace+0x0/0x168
show_stack+0x24/0x30
dump_stack+0x88/0xb0
lookup_object+0x84/0xac
find_and_get_object+0x84/0xe4
kmemleak_no_scan+0x74/0xf4
setup_kmem_cache_node+0x2b4/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
ret_from_fork+0x10/0x18
kmemleak: Object 0xffff8007b9aa7e00 (size 256):
kmemleak: comm "swapper/0", pid 1, jiffies 4294697137
kmemleak: min_count = 1
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
kmemleak_alloc+0x84/0xb8
kmem_cache_alloc_node_trace+0x31c/0x3a0
__kmalloc_node+0x58/0x78
setup_kmem_cache_node+0x26c/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38
CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
Call trace:
dump_backtrace+0x0/0x168
show_stack+0x24/0x30
dump_stack+0x88/0xb0
kmemleak_no_scan+0x90/0xf4
setup_kmem_cache_node+0x2b4/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
ret_from_fork+0x10/0x18
Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw
Fixes: 1fe00d50a9e8 ("slab: factor out initialization of array cache")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/slab.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index 91c1863df93d..757e646baa5d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -550,14 +550,6 @@ static void start_cpu_timer(int cpu)
static void init_arraycache(struct array_cache *ac, int limit, int batch)
{
- /*
- * The array_cache structures contain pointers to free object.
- * However, when such objects are allocated or transferred to another
- * cache the pointers are not cleared and they could be counted as
- * valid references during a kmemleak scan. Therefore, kmemleak must
- * not scan such objects.
- */
- kmemleak_no_scan(ac);
if (ac) {
ac->avail = 0;
ac->limit = limit;
@@ -573,6 +565,14 @@ static struct array_cache *alloc_arraycache(int node, int entries,
struct array_cache *ac = NULL;
ac = kmalloc_node(memsize, gfp, node);
+ /*
+ * The array_cache structures contain pointers to free object.
+ * However, when such objects are allocated or transferred to another
+ * cache the pointers are not cleared and they could be counted as
+ * valid references during a kmemleak scan. Therefore, kmemleak must
+ * not scan such objects.
+ */
+ kmemleak_no_scan(ac);
init_arraycache(ac, entries, batchcount);
return ac;
}
@@ -667,6 +667,7 @@ static struct alien_cache *__alloc_alien_cache(int node, int entries,
alc = kmalloc_node(memsize, gfp, node);
if (alc) {
+ kmemleak_no_scan(alc);
init_arraycache(&alc->ac, entries, batch);
spin_lock_init(&alc->lock);
}
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 044/262] page_poison: play nicely with KASAN
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (9 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 039/262] mm/slab.c: kmemleak no scan alien caches Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 045/262] kasan: fix kasan_check_read/write definitions Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 061/262] mm/resource: Return real error codes from walk failures Sasha Levin
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qian Cai, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm
From: Qian Cai <cai@lca.pw>
[ Upstream commit 4117992df66a26fa33908b4969e04801534baab1 ]
KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING).
It triggers false positives in the allocation path:
BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
Read of size 8 at addr ffff88881f800000 by task swapper/0
CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
__asan_report_load8_noabort+0x19/0x20
memchr_inv+0x2ea/0x330
kernel_poison_pages+0x103/0x3d5
get_page_from_freelist+0x15e7/0x4d90
because KASAN has not yet unpoisoned the shadow page for allocation
before it checks memchr_inv() but only found a stale poison pattern.
Also, false positives in free path,
BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
check_memory_region+0x22d/0x250
memset+0x28/0x40
kernel_poison_pages+0x29e/0x3d5
__free_pages_ok+0x75f/0x13e0
due to KASAN adds poisoned redzones around slab objects, but the page
poisoning needs to poison the whole page.
Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/page_alloc.c | 2 +-
mm/page_poison.c | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b9f577b1a2a..10d0f2ed9f69 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1945,8 +1945,8 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
arch_alloc_page(page, order);
kernel_map_pages(page, 1 << order, 1);
- kernel_poison_pages(page, 1 << order, 1);
kasan_alloc_pages(page, order);
+ kernel_poison_pages(page, 1 << order, 1);
set_page_owner(page, order, gfp_flags);
}
diff --git a/mm/page_poison.c b/mm/page_poison.c
index f0c15e9017c0..21d4f97cb49b 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -6,6 +6,7 @@
#include <linux/page_ext.h>
#include <linux/poison.h>
#include <linux/ratelimit.h>
+#include <linux/kasan.h>
static bool want_page_poisoning __read_mostly;
@@ -40,7 +41,10 @@ static void poison_page(struct page *page)
{
void *addr = kmap_atomic(page);
+ /* KASAN still think the page is in-use, so skip it. */
+ kasan_disable_current();
memset(addr, PAGE_POISON, PAGE_SIZE);
+ kasan_enable_current();
kunmap_atomic(addr);
}
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 045/262] kasan: fix kasan_check_read/write definitions
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (10 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 044/262] page_poison: play nicely with KASAN Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 061/262] mm/resource: Return real error codes from walk failures Sasha Levin
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Arnd Bergmann, Ard Biesheuvel, Will Deacon, Mark Rutland,
Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov,
Stephen Rothwell, Andrew Morton, Linus Torvalds, Sasha Levin,
kasan-dev, linux-mm
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit bcf6f55a0d05eedd8ebb6ecc60ae3f93205ad833 ]
Building little-endian allmodconfig kernels on arm64 started failing
with the generated atomic.h implementation, since we now try to call
kasan helpers from the EFI stub:
aarch64-linux-gnu-ld: drivers/firmware/efi/libstub/arm-stub.stub.o: in function `atomic_set':
include/generated/atomic-instrumented.h:44: undefined reference to `__efistub_kasan_check_write'
I suspect that we get similar problems in other files that explicitly
disable KASAN for some reason but call atomic_t based helper functions.
We can fix this by checking the predefined __SANITIZE_ADDRESS__ macro
that the compiler sets instead of checking CONFIG_KASAN, but this in
turn requires a small hack in mm/kasan/common.c so we do see the extern
declaration there instead of the inline function.
Link: http://lkml.kernel.org/r/20181211133453.2835077-1-arnd@arndb.de
Fixes: b1864b828644 ("locking/atomics: build atomic headers as required")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/kasan-checks.h | 2 +-
mm/kasan/common.c | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/kasan-checks.h b/include/linux/kasan-checks.h
index d314150658a4..a61dc075e2ce 100644
--- a/include/linux/kasan-checks.h
+++ b/include/linux/kasan-checks.h
@@ -2,7 +2,7 @@
#ifndef _LINUX_KASAN_CHECKS_H
#define _LINUX_KASAN_CHECKS_H
-#ifdef CONFIG_KASAN
+#if defined(__SANITIZE_ADDRESS__) || defined(__KASAN_INTERNAL)
void kasan_check_read(const volatile void *p, unsigned int size);
void kasan_check_write(const volatile void *p, unsigned int size);
#else
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 09b534fbba17..80bbe62b16cd 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -14,6 +14,8 @@
*
*/
+#define __KASAN_INTERNAL
+
#include <linux/export.h>
#include <linux/interrupt.h>
#include <linux/init.h>
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH AUTOSEL 5.0 061/262] mm/resource: Return real error codes from walk failures
[not found] <20190327180158.10245-1-sashal@kernel.org>
` (11 preceding siblings ...)
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 045/262] kasan: fix kasan_check_read/write definitions Sasha Levin
@ 2019-03-27 17:58 ` Sasha Levin
12 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-03-27 17:58 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Dave Hansen, Dan Williams, Dave Jiang, Ross Zwisler, Vishal Verma,
Tom Lendacky, Andrew Morton, Michal Hocko, linux-nvdimm, linux-mm,
Huang Ying, Fengguang Wu, Borislav Petkov, Yaowei Bai,
Takashi Iwai, Jerome Glisse, Benjamin Herrenschmidt,
Paul Mackerras, linuxppc-dev, Keith Busch, Sasha Levin
From: Dave Hansen <dave.hansen@linux.intel.com>
[ Upstream commit 5cd401ace914dc68556c6d2fcae0c349444d5f86 ]
walk_system_ram_range() can return an error code either becuase
*it* failed, or because the 'func' that it calls returned an
error. The memory hotplug does the following:
ret = walk_system_ram_range(..., func);
if (ret)
return ret;
and 'ret' makes it out to userspace, eventually. The problem
s, walk_system_ram_range() failues that result from *it* failing
(as opposed to 'func') return -1. That leads to a very odd
-EPERM (-1) return code out to userspace.
Make walk_system_ram_range() return -EINVAL for internal
failures to keep userspace less confused.
This return code is compatible with all the callers that I
audited.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying <ying.huang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/resource.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index 915c02e8e5dd..ca7ed5158cff 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -382,7 +382,7 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
int (*func)(struct resource *, void *))
{
struct resource res;
- int ret = -1;
+ int ret = -EINVAL;
while (start < end &&
!find_next_iomem_res(start, end, flags, desc, first_lvl, &res)) {
@@ -462,7 +462,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
unsigned long flags;
struct resource res;
unsigned long pfn, end_pfn;
- int ret = -1;
+ int ret = -EINVAL;
start = (u64) start_pfn << PAGE_SHIFT;
end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
--
2.19.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic
2019-03-27 17:57 ` [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic Sasha Levin
@ 2019-03-28 5:57 ` Mike Rapoport
2019-04-04 0:57 ` Sasha Levin
0 siblings, 1 reply; 15+ messages in thread
From: Mike Rapoport @ 2019-03-28 5:57 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-kernel, stable, Catalin Marinas, Christophe Leroy,
Christoph Hellwig, David S. Miller, Dennis Zhou,
Geert Uytterhoeven, Greentime Hu, Greg Kroah-Hartman, Guan Xuetao,
Guo Ren, Guo Ren, Heiko Carstens, Juergen Gross, Mark Salter,
Matt Turner, Max Filippov, Michal Simek, Paul Burton, Petr Mladek,
Richard Weinberger, Rich Felker, Rob Herring, Rob Herring,
Russell King, Stafford Horne, Tony Luck, Vineet Gupta,
Yoshinori Sato, Andrew Morton, Linus Torvalds, linuxppc-dev,
linux-mm
Hi,
On Wed, Mar 27, 2019 at 01:57:50PM -0400, Sasha Levin wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> [ Upstream commit 337555744e6e39dd1d87698c6084dd88a606d60a ]
>
> The memblock_phys_alloc_try_nid() function tries to allocate memory from
> the requested node and then falls back to allocation from any node in
> the system. The memblock_alloc_base() fallback used by this function
> panics if the allocation fails.
>
> Replace the memblock_alloc_base() fallback with the direct call to
> memblock_alloc_range_nid() and update the memblock_phys_alloc_try_nid()
> callers to check the returned value and panic in case of error.
This is a part of memblock refactoring, I don't think it should be applied
to -stable.
> Link: http://lkml.kernel.org/r/1548057848-15136-7-git-send-email-rppt@linux.ibm.com
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Dennis Zhou <dennis@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Greentime Hu <green.hu@gmail.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Guan Xuetao <gxt@pku.edu.cn>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky]
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Juergen Gross <jgross@suse.com> [Xen]
> Cc: Mark Salter <msalter@redhat.com>
> Cc: Matt Turner <mattst88@gmail.com>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Michal Simek <monstr@monstr.eu>
> Cc: Paul Burton <paul.burton@mips.com>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Rich Felker <dalias@libc.org>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Stafford Horne <shorne@gmail.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> arch/arm64/mm/numa.c | 4 ++++
> arch/powerpc/mm/numa.c | 4 ++++
> mm/memblock.c | 4 +++-
> 3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index ae34e3a1cef1..2c61ea4e290b 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -237,6 +237,10 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> pr_info("Initmem setup node %d [<memory-less node>]\n", nid);
>
> nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> + if (!nd_pa)
> + panic("Cannot allocate %zu bytes for node %d data\n",
> + nd_size, nid);
> +
> nd = __va(nd_pa);
>
> /* report and initialize */
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 87f0dd004295..8ec2ed30d44c 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -788,6 +788,10 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> int tnid;
>
> nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> + if (!nd_pa)
> + panic("Cannot allocate %zu bytes for node %d data\n",
> + nd_size, nid);
> +
> nd = __va(nd_pa);
>
> /* report and initialize */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index ea31045ba704..d5923df56acc 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1342,7 +1342,9 @@ phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t ali
>
> if (res)
> return res;
> - return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
> + return memblock_alloc_range_nid(size, align, 0,
> + MEMBLOCK_ALLOC_ACCESSIBLE,
> + NUMA_NO_NODE, MEMBLOCK_NONE);
> }
>
> /**
> --
> 2.19.1
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic
2019-03-28 5:57 ` Mike Rapoport
@ 2019-04-04 0:57 ` Sasha Levin
0 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-04-04 0:57 UTC (permalink / raw)
To: Mike Rapoport
Cc: linux-kernel, stable, Catalin Marinas, Christophe Leroy,
Christoph Hellwig, David S. Miller, Dennis Zhou,
Geert Uytterhoeven, Greentime Hu, Greg Kroah-Hartman, Guan Xuetao,
Guo Ren, Guo Ren, Heiko Carstens, Juergen Gross, Mark Salter,
Matt Turner, Max Filippov, Michal Simek, Paul Burton, Petr Mladek,
Richard Weinberger, Rich Felker, Rob Herring, Rob Herring,
Russell King, Stafford Horne, Tony Luck, Vineet Gupta,
Yoshinori Sato, Andrew Morton, Linus Torvalds, linuxppc-dev,
linux-mm
On Thu, Mar 28, 2019 at 07:57:21AM +0200, Mike Rapoport wrote:
>Hi,
>
>On Wed, Mar 27, 2019 at 01:57:50PM -0400, Sasha Levin wrote:
>> From: Mike Rapoport <rppt@linux.ibm.com>
>>
>> [ Upstream commit 337555744e6e39dd1d87698c6084dd88a606d60a ]
>>
>> The memblock_phys_alloc_try_nid() function tries to allocate memory from
>> the requested node and then falls back to allocation from any node in
>> the system. The memblock_alloc_base() fallback used by this function
>> panics if the allocation fails.
>>
>> Replace the memblock_alloc_base() fallback with the direct call to
>> memblock_alloc_range_nid() and update the memblock_phys_alloc_try_nid()
>> callers to check the returned value and panic in case of error.
>
>This is a part of memblock refactoring, I don't think it should be applied
>to -stable.
Dropped, thanks!
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2019-04-04 0:57 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20190327180158.10245-1-sashal@kernel.org>
2019-03-27 17:57 ` [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic Sasha Levin
2019-03-28 5:57 ` Mike Rapoport
2019-04-04 0:57 ` Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 030/262] mm/sparse: fix a bad comparison Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 031/262] mm/cma.c: cma_declare_contiguous: correct err handling Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 032/262] mm/page_ext.c: fix an imbalance with kmemleak Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 033/262] mm, swap: bounds check swap_info array accesses to avoid NULL derefs Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 035/262] mm,oom: don't kill global init via memory.oom.group Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 036/262] memcg: killed threads should not invoke memcg OOM killer Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 037/262] mm, mempolicy: fix uninit memory access Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 038/262] mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 039/262] mm/slab.c: kmemleak no scan alien caches Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 044/262] page_poison: play nicely with KASAN Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 045/262] kasan: fix kasan_check_read/write definitions Sasha Levin
2019-03-27 17:58 ` [PATCH AUTOSEL 5.0 061/262] mm/resource: Return real error codes from walk failures Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).