* [patch 2/6] mm: fix vma_is_anonymous() false-positives
@ 2018-07-21 0:53 akpm
2018-07-21 19:48 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: akpm @ 2018-07-21 0:53 UTC (permalink / raw)
To: stable, oleg, marcel.ziswiler, dvyukov, aarcange, kirill.shutemov,
akpm, mm-commits, torvalds
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: mm: fix vma_is_anonymous() false-positives
vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous VMA.
This is unreliable as ->mmap may not set ->vm_ops.
False-positive vma_is_anonymous() may lead to crashes:
next ffff8801ce5e7040 prev ffff8801d20eca50 mm ffff88019c1e13c0
prot 27 anon_vma ffff88019680cdd8 vm_ops 0000000000000000
pgoff 0 file ffff8801b2ec2d00 private_data 0000000000000000
flags: 0xff(read|write|exec|shared|mayread|maywrite|mayexec|mayshare)
------------[ cut here ]------------
kernel BUG at mm/memory.c:1422!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 18486 Comm: syz-executor3 Not tainted 4.18.0-rc3+ #136
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:zap_pmd_range mm/memory.c:1421 [inline]
RIP: 0010:zap_pud_range mm/memory.c:1466 [inline]
RIP: 0010:zap_p4d_range mm/memory.c:1487 [inline]
RIP: 0010:unmap_page_range+0x1c18/0x2220 mm/memory.c:1508
Code: ff 31 ff 4c 89 e6 42 c6 04 33 f8 e8 92 dd d0 ff 4d 85 e4 0f 85 4a eb ff
ff e8 54 dc d0 ff 48 8b bd 10 fc ff ff e8 82 95 fe ff <0f> 0b e8 41 dc d0 ff
0f 0b 4c 89 ad 18 fc ff ff c7 85 7c fb ff ff
RSP: 0018:ffff8801b0587330 EFLAGS: 00010286
RAX: 000000000000013c RBX: 1ffff100360b0e9c RCX: ffffc90002620000
RDX: 0000000000000000 RSI: ffffffff81631851 RDI: 0000000000000001
RBP: ffff8801b05877c8 R08: ffff880199d40300 R09: ffffed003b5c4fc0
R10: ffffed003b5c4fc0 R11: ffff8801dae27e07 R12: 0000000000000000
R13: ffff88019c1e13c0 R14: dffffc0000000000 R15: 0000000020e01000
FS: 00007fca32251700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f04c540d000 CR3: 00000001ac1f0000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
unmap_single_vma+0x1a0/0x310 mm/memory.c:1553
zap_page_range_single+0x3cc/0x580 mm/memory.c:1644
unmap_mapping_range_vma mm/memory.c:2792 [inline]
unmap_mapping_range_tree mm/memory.c:2813 [inline]
unmap_mapping_pages+0x3a7/0x5b0 mm/memory.c:2845
unmap_mapping_range+0x48/0x60 mm/memory.c:2880
truncate_pagecache+0x54/0x90 mm/truncate.c:800
truncate_setsize+0x70/0xb0 mm/truncate.c:826
simple_setattr+0xe9/0x110 fs/libfs.c:409
notify_change+0xf13/0x10f0 fs/attr.c:335
do_truncate+0x1ac/0x2b0 fs/open.c:63
do_sys_ftruncate+0x492/0x560 fs/open.c:205
__do_sys_ftruncate fs/open.c:215 [inline]
__se_sys_ftruncate fs/open.c:213 [inline]
__x64_sys_ftruncate+0x59/0x80 fs/open.c:213
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reproducer:
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_ENABLE _IO('c', 100)
#define KCOV_DISABLE _IO('c', 101)
#define COVER_SIZE (1024<<10)
#define KCOV_TRACE_PC 0
#define KCOV_TRACE_CMP 1
int main(int argc, char **argv)
{
int fd;
unsigned long *cover;
system("mount -t debugfs none /sys/kernel/debug");
fd = open("/sys/kernel/debug/kcov", O_RDWR);
ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
munmap(cover, COVER_SIZE * sizeof(unsigned long));
cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
memset(cover, 0, COVER_SIZE * sizeof(unsigned long));
ftruncate(fd, 3UL << 20);
return 0;
}
This can be fixed by assigning anonymous VMAs own vm_ops and not relying
on it being NULL.
If ->mmap() failed to set ->vm_ops, mmap_region() will set it to
dummy_vm_ops. This way we will have non-NULL ->vm_ops for all VMAs.
[kirill@shutemov.name: add comments]
Link: http://lkml.kernel.org/r/20180711121521.omugjfpuuyxscjjf@kshutemo-mobl1
[kirill.shutemov@linux.intel.com: v2]
Link: http://lkml.kernel.org/r/20180712145626.41665-2-kirill.shutemov@linux.intel.com
[kirill.shutemov@linux.intel.com: fix splat reported by Marcel]
Link: http://lkml.kernel.org/r/20180716142049.ioa2irsd2d7sphn4@black.fi.intel.com
Link: http://lkml.kernel.org/r/20180710134821.84709-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: syzbot+3f84280d52be9b7083cc@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm/kernel/process.c | 1 +
arch/ia64/kernel/perfmon.c | 1 +
arch/ia64/mm/init.c | 2 ++
drivers/char/mem.c | 1 +
fs/exec.c | 1 +
fs/hugetlbfs/inode.c | 1 +
include/linux/mm.h | 5 ++++-
mm/khugepaged.c | 4 ++--
mm/mmap.c | 11 +++++++++++
mm/nommu.c | 9 ++++++++-
mm/shmem.c | 1 +
mm/util.c | 12 ++++++++++++
12 files changed, 45 insertions(+), 4 deletions(-)
diff -puN arch/arm/kernel/process.c~mm-fix-vma_is_anonymous-false-positives arch/arm/kernel/process.c
--- a/arch/arm/kernel/process.c~mm-fix-vma_is_anonymous-false-positives
+++ a/arch/arm/kernel/process.c
@@ -334,6 +334,7 @@ static struct vm_area_struct gate_vma =
.vm_start = 0xffff0000,
.vm_end = 0xffff0000 + PAGE_SIZE,
.vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC,
+ .vm_ops = &dummy_vm_ops,
};
static int __init gate_vma_init(void)
diff -puN arch/ia64/kernel/perfmon.c~mm-fix-vma_is_anonymous-false-positives arch/ia64/kernel/perfmon.c
--- a/arch/ia64/kernel/perfmon.c~mm-fix-vma_is_anonymous-false-positives
+++ a/arch/ia64/kernel/perfmon.c
@@ -2292,6 +2292,7 @@ pfm_smpl_buffer_alloc(struct task_struct
vma->vm_file = get_file(filp);
vma->vm_flags = VM_READ|VM_MAYREAD|VM_DONTEXPAND|VM_DONTDUMP;
vma->vm_page_prot = PAGE_READONLY; /* XXX may need to change */
+ vma->vm_ops = &dummy_vm_ops;
/*
* Now we have everything we need and we can initialize
diff -puN arch/ia64/mm/init.c~mm-fix-vma_is_anonymous-false-positives arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c~mm-fix-vma_is_anonymous-false-positives
+++ a/arch/ia64/mm/init.c
@@ -122,6 +122,7 @@ ia64_init_addr_space (void)
vma->vm_end = vma->vm_start + PAGE_SIZE;
vma->vm_flags = VM_DATA_DEFAULT_FLAGS|VM_GROWSUP|VM_ACCOUNT;
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+ vma->vm_ops = &dummy_vm_ops;
down_write(¤t->mm->mmap_sem);
if (insert_vm_struct(current->mm, vma)) {
up_write(¤t->mm->mmap_sem);
@@ -141,6 +142,7 @@ ia64_init_addr_space (void)
vma->vm_page_prot = __pgprot(pgprot_val(PAGE_READONLY) | _PAGE_MA_NAT);
vma->vm_flags = VM_READ | VM_MAYREAD | VM_IO |
VM_DONTEXPAND | VM_DONTDUMP;
+ vma->vm_ops = &dummy_vm_ops;
down_write(¤t->mm->mmap_sem);
if (insert_vm_struct(current->mm, vma)) {
up_write(¤t->mm->mmap_sem);
diff -puN drivers/char/mem.c~mm-fix-vma_is_anonymous-false-positives drivers/char/mem.c
--- a/drivers/char/mem.c~mm-fix-vma_is_anonymous-false-positives
+++ a/drivers/char/mem.c
@@ -708,6 +708,7 @@ static int mmap_zero(struct file *file,
#endif
if (vma->vm_flags & VM_SHARED)
return shmem_zero_setup(vma);
+ vma->vm_ops = &anon_vm_ops;
return 0;
}
diff -puN fs/exec.c~mm-fix-vma_is_anonymous-false-positives fs/exec.c
--- a/fs/exec.c~mm-fix-vma_is_anonymous-false-positives
+++ a/fs/exec.c
@@ -307,6 +307,7 @@ static int __bprm_mm_init(struct linux_b
* configured yet.
*/
BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
+ vma->vm_ops = &anon_vm_ops;
vma->vm_end = STACK_TOP_MAX;
vma->vm_start = vma->vm_end - PAGE_SIZE;
vma->vm_flags = VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
diff -puN fs/hugetlbfs/inode.c~mm-fix-vma_is_anonymous-false-positives fs/hugetlbfs/inode.c
--- a/fs/hugetlbfs/inode.c~mm-fix-vma_is_anonymous-false-positives
+++ a/fs/hugetlbfs/inode.c
@@ -597,6 +597,7 @@ static long hugetlbfs_fallocate(struct f
memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
pseudo_vma.vm_file = file;
+ pseudo_vma.vm_ops = &dummy_vm_ops;
for (index = start; index < end; index++) {
/*
diff -puN include/linux/mm.h~mm-fix-vma_is_anonymous-false-positives include/linux/mm.h
--- a/include/linux/mm.h~mm-fix-vma_is_anonymous-false-positives
+++ a/include/linux/mm.h
@@ -1536,9 +1536,12 @@ int clear_page_dirty_for_io(struct page
int get_cmdline(struct task_struct *task, char *buffer, int buflen);
+extern const struct vm_operations_struct anon_vm_ops;
+extern const struct vm_operations_struct dummy_vm_ops;
+
static inline bool vma_is_anonymous(struct vm_area_struct *vma)
{
- return !vma->vm_ops;
+ return vma->vm_ops == &anon_vm_ops;
}
#ifdef CONFIG_SHMEM
diff -puN mm/khugepaged.c~mm-fix-vma_is_anonymous-false-positives mm/khugepaged.c
--- a/mm/khugepaged.c~mm-fix-vma_is_anonymous-false-positives
+++ a/mm/khugepaged.c
@@ -440,7 +440,7 @@ int khugepaged_enter_vma_merge(struct vm
* page fault if needed.
*/
return 0;
- if (vma->vm_ops || (vm_flags & VM_NO_KHUGEPAGED))
+ if (!vma_is_anonymous(vma) || (vm_flags & VM_NO_KHUGEPAGED))
/* khugepaged not yet working on file or special mappings */
return 0;
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
@@ -831,7 +831,7 @@ static bool hugepage_vma_check(struct vm
return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
HPAGE_PMD_NR);
}
- if (!vma->anon_vma || vma->vm_ops)
+ if (!vma->anon_vma || !vma_is_anonymous(vma))
return false;
if (is_vma_temporary_stack(vma))
return false;
diff -puN mm/mmap.c~mm-fix-vma_is_anonymous-false-positives mm/mmap.c
--- a/mm/mmap.c~mm-fix-vma_is_anonymous-false-positives
+++ a/mm/mmap.c
@@ -561,6 +561,8 @@ static unsigned long count_vma_pages_ran
void __vma_link_rb(struct mm_struct *mm, struct vm_area_struct *vma,
struct rb_node **rb_link, struct rb_node *rb_parent)
{
+ WARN_ONCE(!vma->vm_ops, "missing vma->vm_ops");
+
/* Update tracking information for the gap following the new vma. */
if (vma->vm_next)
vma_gap_update(vma->vm_next);
@@ -1762,6 +1764,11 @@ unsigned long mmap_region(struct file *f
*/
vma->vm_file = get_file(file);
error = call_mmap(file, vma);
+
+ /* All mappings must have ->vm_ops set */
+ if (!vma->vm_ops)
+ vma->vm_ops = &dummy_vm_ops;
+
if (error)
goto unmap_and_free_vma;
@@ -1780,6 +1787,9 @@ unsigned long mmap_region(struct file *f
error = shmem_zero_setup(vma);
if (error)
goto free_vma;
+ } else {
+ /* vma_is_anonymous() relies on this. */
+ vma->vm_ops = &anon_vm_ops;
}
vma_link(mm, vma, prev, rb_link, rb_parent);
@@ -2992,6 +3002,7 @@ static int do_brk_flags(unsigned long ad
INIT_LIST_HEAD(&vma->anon_vma_chain);
vma->vm_mm = mm;
+ vma->vm_ops = &anon_vm_ops;
vma->vm_start = addr;
vma->vm_end = addr + len;
vma->vm_pgoff = pgoff;
diff -puN mm/nommu.c~mm-fix-vma_is_anonymous-false-positives mm/nommu.c
--- a/mm/nommu.c~mm-fix-vma_is_anonymous-false-positives
+++ a/mm/nommu.c
@@ -1058,6 +1058,8 @@ static int do_mmap_shared_file(struct vm
int ret;
ret = call_mmap(vma->vm_file, vma);
+ if (!vma->vm_ops)
+ vma->vm_ops = &dummy_vm_ops;
if (ret == 0) {
vma->vm_region->vm_top = vma->vm_region->vm_end;
return 0;
@@ -1089,6 +1091,8 @@ static int do_mmap_private(struct vm_are
*/
if (capabilities & NOMMU_MAP_DIRECT) {
ret = call_mmap(vma->vm_file, vma);
+ if (!vma->vm_ops)
+ vma->vm_ops = &dummy_vm_ops;
if (ret == 0) {
/* shouldn't return success if we're not sharing */
BUG_ON(!(vma->vm_flags & VM_MAYSHARE));
@@ -1137,6 +1141,8 @@ static int do_mmap_private(struct vm_are
fpos = vma->vm_pgoff;
fpos <<= PAGE_SHIFT;
+ vma->vm_ops = &dummy_vm_ops;
+
ret = kernel_read(vma->vm_file, base, len, &fpos);
if (ret < 0)
goto error_free;
@@ -1144,7 +1150,8 @@ static int do_mmap_private(struct vm_are
/* clear the last little bit */
if (ret < len)
memset(base + ret, 0, len - ret);
-
+ } else {
+ vma->vm_ops = &anon_vm_ops;
}
return 0;
diff -puN mm/shmem.c~mm-fix-vma_is_anonymous-false-positives mm/shmem.c
--- a/mm/shmem.c~mm-fix-vma_is_anonymous-false-positives
+++ a/mm/shmem.c
@@ -1424,6 +1424,7 @@ static void shmem_pseudo_vma_init(struct
/* Bias interleave by inode number to distribute better across nodes */
vma->vm_pgoff = index + info->vfs_inode.i_ino;
vma->vm_policy = mpol_shared_policy_lookup(&info->policy, index);
+ vma->vm_ops = &dummy_vm_ops;
}
static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma)
diff -puN mm/util.c~mm-fix-vma_is_anonymous-false-positives mm/util.c
--- a/mm/util.c~mm-fix-vma_is_anonymous-false-positives
+++ a/mm/util.c
@@ -20,6 +20,18 @@
#include "internal.h"
+/*
+ * All anonymous VMAs have ->vm_ops set to anon_vm_ops.
+ * vma_is_anonymous() reiles on anon_vm_ops to detect anonymous VMA.
+ */
+const struct vm_operations_struct anon_vm_ops = {};
+
+/*
+ * All VMAs have to have ->vm_ops set. dummy_vm_ops can be used if the VMA
+ * doesn't need to handle any of the operations.
+ */
+const struct vm_operations_struct dummy_vm_ops = {};
+
static inline int is_kernel_rodata(unsigned long addr)
{
return addr >= (unsigned long)__start_rodata &&
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-21 0:53 [patch 2/6] mm: fix vma_is_anonymous() false-positives akpm
@ 2018-07-21 19:48 ` Linus Torvalds
2018-07-21 22:34 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2018-07-21 19:48 UTC (permalink / raw)
To: Andrew Morton
Cc: stable, Oleg Nesterov, marcel.ziswiler, Dmitry Vyukov,
Andrea Arcangeli, Kirill A. Shutemov, mm-commits
On Fri, Jul 20, 2018 at 5:53 PM <akpm@linux-foundation.org> wrote:
>
> vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous VMA.
> This is unreliable as ->mmap may not set ->vm_ops.
I detest this patch.
It does:
+ if (!vma->vm_ops)
+ vma->vm_ops = &dummy_vm_ops;
which seems eminently reasonable.
Except it does it in the wrong place - it should just do it in the vma
allocator, and guarantee that every vma has that initial dummy vm ops
pointer.
Yeah, yeah "the vma allocator" doesn't really exist right now, because people do
vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
by hand, but that would be a nice independent cleanup. We could also
puit the required
INIT_LIST_HEAD(&vma->anon_vma_chain);
etc into the allocation function.
(There's a couple of cases that don't do the "zalloc" version because
they will memcpy the old contents, but if people care we could do a
"vma_dup()" that does that, and also does all the other required
cleanup.
Then the small handful of actual anonymous cases could just NULL it
out, and we wouldn't need to make "vma_is_anonymous()" more complex
and expensive.
Instead, this patch forces random drivers to set a vm_ops that those
drivers very much by definition do not care about.
So instead fo doing cleanup, it adds complexity.
So I'm not going to apply this. Instead, I'll do the "let's introduce
a vma_alloc()/vma_free()". Initially doing *only* the allocation, then
we can start moving things into it (the vm_ops initialization, the
INIT_LIST_HEAD etc).
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-21 19:48 ` Linus Torvalds
@ 2018-07-21 22:34 ` Linus Torvalds
2018-07-21 22:49 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2018-07-21 22:34 UTC (permalink / raw)
To: Andrew Morton
Cc: stable, Oleg Nesterov, marcel.ziswiler, Dmitry Vyukov,
Andrea Arcangeli, Kirill A. Shutemov, mm-commits
On Sat, Jul 21, 2018 at 12:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I'm not going to apply this. Instead, I'll do the "let's introduce
> a vma_alloc()/vma_free()". Initially doing *only* the allocation, then
> we can start moving things into it (the vm_ops initialization, the
> INIT_LIST_HEAD etc).
Ok, I'm pushing that out, so that we can try the "vm_ops model
defaults to dummy_vm_ops" model instead of people having to set it
explicitly.
Even if that doesn't turn out to be a good idea (ie Kirill might have
some reason I'm missing for why he really wants to have an explicit
"anon_vm_ops"), the patches to not have people use the vm_area_cachep
directly seem to be valid cleanups.
But I basically think that with those patches in place, we can:
- make vm_area_alloc() just default vm_ops to &dummy_vm_ops
- just take the part of Kirill's patch that does
vma->vm_ops = &anon_vm_ops;
and instead of '&anon_vm_ops', set it to NULL.
End result: vma_is_anonymous() continues to work as-is, and we don't
have any false positives because the anon vma's are now _explicitly_
initialized that way.
I'll just do a quick set of extra build and boot tests, and push that
baseline out (but without that final part that would introduce the
dummy_vm_ops).
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-21 22:34 ` Linus Torvalds
@ 2018-07-21 22:49 ` Linus Torvalds
2018-07-22 20:21 ` Linus Torvalds
2018-07-23 8:56 ` Kirill A. Shutemov
0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-07-21 22:49 UTC (permalink / raw)
To: Andrew Morton
Cc: stable, Oleg Nesterov, marcel.ziswiler, Dmitry Vyukov,
Andrea Arcangeli, Kirill A. Shutemov, mm-commits
[-- Attachment #1: Type: text/plain, Size: 851 bytes --]
On Sat, Jul 21, 2018 at 3:34 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But I basically think that with those patches in place, we can:
>
> - make vm_area_alloc() just default vm_ops to &dummy_vm_ops
>
> - just take the part of Kirill's patch that does
>
> vma->vm_ops = &anon_vm_ops;
>
> and instead of '&anon_vm_ops', set it to NULL.
IOW, Kirill's patch now just boils down to the attached trial patch.
I will *not* be committing this last patch, I think it needs more
testing and ack's. And honestly, authorship should go to Kirill if
he's ok with it, because all the basic "these are the anonymous vma's"
places came from Kirill's patch and work.
But it works for me, and I did commit and push out the parts that were
just trivial cleanups with no actual semantic changes.
Comments?
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2316 bytes --]
drivers/char/mem.c | 1 +
fs/exec.c | 1 +
kernel/fork.c | 2 ++
mm/mmap.c | 4 +++-
mm/nommu.c | 3 ++-
5 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffeb60d3434c..0d4c185c2f6c 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -708,6 +708,7 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
#endif
if (vma->vm_flags & VM_SHARED)
return shmem_zero_setup(vma);
+ vma->vm_ops = NULL;
return 0;
}
diff --git a/fs/exec.c b/fs/exec.c
index 72e961a62adb..b0c6efce324a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -293,6 +293,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
bprm->vma = vma = vm_area_alloc(mm);
if (!vma)
return -ENOMEM;
+ vma->vm_ops = NULL;
if (down_write_killable(&mm->mmap_sem)) {
err = -EINTR;
diff --git a/kernel/fork.c b/kernel/fork.c
index a191c05e757d..dd50b5220713 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -310,11 +310,13 @@ static struct kmem_cache *mm_cachep;
struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
{
+ static const struct vm_operations_struct dummy_vm_ops = {};
struct vm_area_struct *vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
if (vma) {
vma->vm_mm = mm;
INIT_LIST_HEAD(&vma->anon_vma_chain);
+ vma->vm_ops = &dummy_vm_ops;
}
return vma;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index ff1944d8d458..b90176eb9e94 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1778,7 +1778,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
error = shmem_zero_setup(vma);
if (error)
goto free_vma;
- }
+ } else
+ vma->vm_ops = NULL;
vma_link(mm, vma, prev, rb_link, rb_parent);
/* Once vma denies write, undo our temporary denial count */
@@ -2983,6 +2984,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
return -ENOMEM;
}
+ vma->vm_ops = NULL;
vma->vm_start = addr;
vma->vm_end = addr + len;
vma->vm_pgoff = pgoff;
diff --git a/mm/nommu.c b/mm/nommu.c
index 1d22fdbf7d7c..e1f8b7eb5683 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1145,7 +1145,8 @@ static int do_mmap_private(struct vm_area_struct *vma,
if (ret < len)
memset(base + ret, 0, len - ret);
- }
+ } else
+ vma->vm_ops = NULL;
return 0;
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-21 22:49 ` Linus Torvalds
@ 2018-07-22 20:21 ` Linus Torvalds
2018-07-23 3:17 ` Jason Gunthorpe
2018-09-28 17:43 ` Jason Gunthorpe
2018-07-23 8:56 ` Kirill A. Shutemov
1 sibling, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-07-22 20:21 UTC (permalink / raw)
To: Andrew Morton, Doug Ledford, Jason Gunthorpe, Yishai Hadas,
Lijun Ou, Wei Hu(Xavier)
Cc: stable, Oleg Nesterov, marcel.ziswiler, Dmitry Vyukov,
Andrea Arcangeli, Kirill A. Shutemov, mm-commits
On Sat, Jul 21, 2018 at 3:49 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, Kirill's patch now just boils down to the attached trial patch.
Side note: we have a few rdma drivers that mess with vm_ops.
One does so intentionally, see mlx4_ib_vma_open() in
drivers/infiniband/hw/mlx4/main.c
and one *may* be intentional but it's not entirely clear:
hns_roce_vma_open() may bve the same issue, but
hns_roce_disassociate_ucontext() just looks pointless in
drivers/infiniband/hw/hns/hns_roce_main.c
Afaik, they should just use VM_DONTCOPY to not see the fork open
case, VM_DONTEXPAND to fail a size-changing mremap, and set a
vm_ops->split() function that returns an error.
Maybe we should have a VM_DONTSPLIT to make that ->split() case be simpler.
Their vma->vm_ops games seem very bogus, and will cause oddities. So
right now they *will* get copied on fork and split, and then you have
random crazy pages being mapped but not the right vm_ops. In addition
to the confusion with vma_is_anonymous().
Doug/Jason/etc?
(There's also scif_munmap() that clears vm_ops, but at minmap time it
doesn't really matter..)
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-22 20:21 ` Linus Torvalds
@ 2018-07-23 3:17 ` Jason Gunthorpe
2018-09-28 17:43 ` Jason Gunthorpe
1 sibling, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2018-07-23 3:17 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Doug Ledford, Yishai Hadas, Lijun Ou,
Wei Hu(Xavier), stable, Oleg Nesterov, marcel.ziswiler,
Dmitry Vyukov, Andrea Arcangeli, Kirill A. Shutemov, mm-commits
On Sun, Jul 22, 2018 at 01:21:06PM -0700, Linus Torvalds wrote:
> On Sat, Jul 21, 2018 at 3:49 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > IOW, Kirill's patch now just boils down to the attached trial patch.
>
> Side note: we have a few rdma drivers that mess with vm_ops.
>
> One does so intentionally, see mlx4_ib_vma_open() in
>
> drivers/infiniband/hw/mlx4/main.c
Both drivers should be doing this for identical reasons, the long
comment in mlx4/main.c appears to be the rational and expected
behavior..
mlx5/main.c also does this.
Most likely this entire process should be part of the core support
library and not open coded like this in drivers.
> and one *may* be intentional but it's not entirely clear:
> hns_roce_vma_open() may bve the same issue, but
> hns_roce_disassociate_ucontext() just looks pointless in
>
> drivers/infiniband/hw/hns/hns_roce_main.c
All three drivers should have code like this as well, it is triggered
during something like a PCI hot unplug,
What the drivers are doing is mapping PCI BAR memory in response to
mmap().
When the device becomes unplugged then the BAR memory map is to be
revoked and replaced with a zero page, so the physical device can be
physically un-plugged, or hard reset. (ie access to the BAR memory may
now MCE or something)
This is probably the strongest reason why these pages need to be
protected, as the driver must be able to hunt down all user space
mappings of the BAR page and do this zeroing, so "don't copy" and
other flags seem necessary with the current vma tracking approach.
Would be glad to hear if this algorithm with zap_vma_ptes makes any
sense or is totally wrong.
However, I believe, at least for mlx4/5, it is actually actively
tested and does work in the straightfoward case.
> Afaik, they should just use VM_DONTCOPY to not see the fork open
> case, VM_DONTEXPAND to fail a size-changing mremap, and set a
> vm_ops->split() function that returns an error.
This does makes more sense, I certainly would prefer well defined
flags over this strange manipulation.
If you say this approach will do what is needed I'll ask the driver
maintainer to code and test with it..
Jason
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-21 22:49 ` Linus Torvalds
2018-07-22 20:21 ` Linus Torvalds
@ 2018-07-23 8:56 ` Kirill A. Shutemov
2018-07-23 19:16 ` Linus Torvalds
1 sibling, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2018-07-23 8:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, stable, Oleg Nesterov, marcel.ziswiler,
Dmitry Vyukov, Andrea Arcangeli, mm-commits
On Sat, Jul 21, 2018 at 10:49:57PM +0000, Linus Torvalds wrote:
> On Sat, Jul 21, 2018 at 3:34 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But I basically think that with those patches in place, we can:
> >
> > - make vm_area_alloc() just default vm_ops to &dummy_vm_ops
> >
> > - just take the part of Kirill's patch that does
> >
> > vma->vm_ops = &anon_vm_ops;
> >
> > and instead of '&anon_vm_ops', set it to NULL.
>
> IOW, Kirill's patch now just boils down to the attached trial patch.
>
> I will *not* be committing this last patch, I think it needs more
> testing and ack's. And honestly, authorship should go to Kirill if
> he's ok with it, because all the basic "these are the anonymous vma's"
> places came from Kirill's patch and work.
>
> But it works for me, and I did commit and push out the parts that were
> just trivial cleanups with no actual semantic changes.
>
> Comments?
Allocation helpers are great idea. But they don't cover several corner
cases: allocation of VMA on stack or in data section.
For instance, with proposed approach gate VMA is anon on ARM, but non-anon
on x86. I don't think it causes problems, but it's inconsistent and may
become a problem in the future. I think it worth manually initialize such
VMAs with dummy_vm_ops.
I also would rather keep ->vm_ops non-NULL for anonymous VMAs. This way it's
easier to catch broken drivers: place a WARN() in __vma_link_rb().
With non-NULL vm_ops we can drop all vma->vm_ops checks. There's patch in
-mm tree doing this[1].
[1] https://ozlabs.org/~akpm/mmots/broken-out/mm-drop-unneeded-vm_ops-checks-v2.patch
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-23 8:56 ` Kirill A. Shutemov
@ 2018-07-23 19:16 ` Linus Torvalds
0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-07-23 19:16 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, stable, Oleg Nesterov, marcel.ziswiler,
Dmitry Vyukov, Andrea Arcangeli, mm-commits
On Mon, Jul 23, 2018 at 1:55 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> I also would rather keep ->vm_ops non-NULL for anonymous VMAs. This way it's
> easier to catch broken drivers: place a WARN() in __vma_link_rb().
>
> With non-NULL vm_ops we can drop all vma->vm_ops checks. There's patch in
> -mm tree doing this[1].
Yeah, that patch is completely broken. See the separate email where I
pointed out how there are drivers that explicitly set vm_ops to NULL
while the vma is still live.
So no. That kind of stuff isn't even an option unless everything else
is cleaned up.
So I really think there's a lot more cleanup here before
'&anon_vm_ops" would make any sense what-so-ever.
But if we had a wrapper function for it, I'd at least find it more
palatable. Something simple like
static inline void vma_set_anonymous(struct vm_area_struct *vma)
{
static const struct vm_operations_struct anon_vm_ops = {};
vma->vm_ops = &anon_vm_ops;
vma->vm_flags |= VM_ANONYMOUS;
}
would look fine to me. And I'd rather have it in a flag than anywhere
else. That's what those flags *are* for.
In fact, I think one cleanup we should do on the way to this is to fix
the current "vm_flags". It's insanely different sizes on 32-bit and
64-bit, because we use 'unsigned long' for it. And it's a big
inconvenience, because we're tight on flags.
So we could make it a u64, which would get us reliably more flags. Or
even make a separate field for some of the more specialized flags, and
have two (separate) flag fields instead. Right now we do some odd
things due to packing (not as bad as the page flags, but not pretty).
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/6] mm: fix vma_is_anonymous() false-positives
2018-07-22 20:21 ` Linus Torvalds
2018-07-23 3:17 ` Jason Gunthorpe
@ 2018-09-28 17:43 ` Jason Gunthorpe
1 sibling, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2018-09-28 17:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Doug Ledford, Yishai Hadas, Lijun Ou,
Wei Hu(Xavier), stable, Oleg Nesterov, marcel.ziswiler,
Dmitry Vyukov, Andrea Arcangeli, Kirill A. Shutemov, mm-commits
On Sun, Jul 22, 2018 at 01:21:06PM -0700, Linus Torvalds wrote:
> On Sat, Jul 21, 2018 at 3:49 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > IOW, Kirill's patch now just boils down to the attached trial patch.
>
> Side note: we have a few rdma drivers that mess with vm_ops.
>
> One does so intentionally, see mlx4_ib_vma_open() in
>
> drivers/infiniband/hw/mlx4/main.c
>
> and one *may* be intentional but it's not entirely clear:
> hns_roce_vma_open() may bve the same issue, but
> hns_roce_disassociate_ucontext() just looks pointless in
>
> drivers/infiniband/hw/hns/hns_roce_main.c
>
> Afaik, they should just use VM_DONTCOPY to not see the fork open
> case, VM_DONTEXPAND to fail a size-changing mremap, and set a
> vm_ops->split() function that returns an error.
>
> Maybe we should have a VM_DONTSPLIT to make that ->split() case be simpler.
>
> Their vma->vm_ops games seem very bogus, and will cause oddities. So
> right now they *will* get copied on fork and split, and then you have
> random crazy pages being mapped but not the right vm_ops. In addition
> to the confusion with vma_is_anonymous().
We've now got patches in rdma.git's for-next (and linux-next) that
conslidates this driver code and eliminates the vm_ops games in all
the infiniband drivers.
Hopefully this helps whatever this vma_is_anonymous() work was.
https://patchwork.kernel.org/project/linux-rdma/list/?series=19461&archive=both&state=*
Regards,
Jason
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-09-29 0:08 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-21 0:53 [patch 2/6] mm: fix vma_is_anonymous() false-positives akpm
2018-07-21 19:48 ` Linus Torvalds
2018-07-21 22:34 ` Linus Torvalds
2018-07-21 22:49 ` Linus Torvalds
2018-07-22 20:21 ` Linus Torvalds
2018-07-23 3:17 ` Jason Gunthorpe
2018-09-28 17:43 ` Jason Gunthorpe
2018-07-23 8:56 ` Kirill A. Shutemov
2018-07-23 19:16 ` Linus Torvalds
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).