* /dev/kmem BUG on mmap
@ 2012-07-29 22:28 Sasha Levin
2012-07-30 9:43 ` Johannes Weiner
0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2012-07-29 22:28 UTC (permalink / raw)
To: gregkh, arnd; +Cc: linux-kernel@vger.kernel.org
Hi all,
I was poking around /dev/kmem related code, and noticed the following in mmap_kmem():
	/* Turn a kernel-virtual address into a physical page frame */
	pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
Which looked odd since vm_pgoff is the offset into the mapping, so I'd assume that PAGE_OFFSET should be added to it as well, otherwise we get an invalid address.
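The arithmetic in that line can be replayed in userspace to see what a small file offset turns into. This is only a sketch under stated assumptions: a PAGE_SHIFT of 12 and a PAGE_OFFSET of 0xffff880000000000 are assumed values for x86-64 kernels of this era, and both helper functions are hypothetical names used for illustration, not kernel functions.

```c
#include <stdint.h>

/* Assumed values for x86-64 around Linux 3.5; other configurations differ. */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_OFFSET UINT64_C(0xffff880000000000)

/*
 * Hypothetical helper: the address that ends up as the argument of __pa()
 * for a given mmap() file offset. The mmap path stores the offset in pages
 * in vm_pgoff, and mmap_kmem() shifts it back up by PAGE_SHIFT.
 */
static inline uint64_t kmem_offset_to_vaddr(uint64_t file_offset)
{
	uint64_t vm_pgoff = file_offset >> SKETCH_PAGE_SHIFT;

	return vm_pgoff << SKETCH_PAGE_SHIFT;
}

/* Returns 1 if that address is at or above the assumed direct-map base. */
static inline int kmem_offset_is_direct_mapped(uint64_t file_offset)
{
	return kmem_offset_to_vaddr(file_offset) >= SKETCH_PAGE_OFFSET;
}
```

Under these assumptions, a one-page offset (4096) becomes the address 0x1000, which is nowhere near the direct mapping, while an offset equal to the direct-map base itself passes the comparison.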
I tested it by writing something like this:
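	#include <fcntl.h>
	#include <sys/mman.h>

	int main(void)
	{
		int fd;
		void *addr;

		/* No error checking: this is only a minimal reproducer. */
		fd = open("/dev/kmem", O_RDONLY);
		/* The file offset 4096 ends up as vm_pgoff = 1 in mmap_kmem(). */
		addr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 4096);

		return 0;
	}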
Which indeed triggered a VM_BUG:
[ 32.285431] kernel BUG at arch/x86/mm/physaddr.c:18!
[ 32.285431] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 32.285431] CPU 0
[ 32.285431] Pid: 5643, comm: a.out Tainted: G W 3.5.0-next-20120727-sasha #504
[ 32.285431] RIP: 0010:[<ffffffff810acd97>] [<ffffffff810acd97>] __phys_addr+0x57/0xa0
[ 32.285431] RSP: 0018:ffff88000be67d68 EFLAGS: 00010213
[ 32.285431] RAX: ffff87ffffffffff RBX: ffff88000d67cb00 RCX: 00000000000080d0
[ 32.285431] RDX: 0000000000000071 RSI: ffff88000bfc8dc8 RDI: 0000000000001000
[ 32.285431] RBP: ffff88000be67d68 R08: 0000000000000001 R09: ffff88000bfc8dc8
[ 32.285431] R10: ffff88000bfc81f8 R11: 0000000000000002 R12: ffff88000bfc8dc8
[ 32.285431] R13: 00007f26f80e6000 R14: ffff88000bf81000 R15: ffff88000bfc8dc8
[ 32.285431] FS: 00007f26f80e8700(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
[ 32.285431] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 32.285431] CR2: 00007f26f7c07d50 CR3: 000000000bfb8000 CR4: 00000000000406f0
[ 32.285431] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 32.285431] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 32.285431] Process a.out (pid: 5643, threadinfo ffff88000be66000, task ffff88000bf2b000)
[ 32.285431] Stack:
[ 32.285431] ffff88000be67d88 ffffffff81bb0737 ffff88000bfc95e0 ffff88000bfc95f0
[ 32.285431] ffff88000be67e48 ffffffff812163ae ffff88000d67cb00 0000000000000001
[ 32.285431] 0000000000000000 0000000000000000 ffff88000be67dd8 ffff88000bfc81f8
[ 32.285431] Call Trace:
[ 32.285431] [<ffffffff81bb0737>] mmap_kmem+0x27/0x90
[ 32.285431] [<ffffffff812163ae>] mmap_region+0x35e/0x5f0
[ 32.285431] [<ffffffff812168f9>] do_mmap_pgoff+0x2b9/0x350
[ 32.285431] [<ffffffff8120120c>] ? vm_mmap_pgoff+0x6c/0xb0
[ 32.285431] [<ffffffff81201224>] vm_mmap_pgoff+0x84/0xb0
[ 32.285431] [<ffffffff8124f280>] ? fget_raw+0x260/0x260
[ 32.285431] [<ffffffff81213d9e>] sys_mmap_pgoff+0x15e/0x190
[ 32.285431] [<ffffffff8198ab2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 32.285431] [<ffffffff8106e4ed>] sys_mmap+0x1d/0x20
[ 32.285431] [<ffffffff8361f6f9>] system_call_fastpath+0x16/0x1b
[ 32.285431] Code: 00 00 00 00 eb fe 66 0f 1f 44 00 00 48 03 05 91 02 78 03 eb 57 0f 1f 80 00 00 00 00 48 b8 ff ff ff ff ff 87 ff ff 48 39 c7 77 11 <0f> 0b 0f 1f 80 00 00 00 00 eb fe 66 0f 1f 44 00 00 48 b8 00 00
[ 32.285431] RIP [<ffffffff810acd97>] __phys_addr+0x57/0xa0
[ 32.285431] RSP <ffff88000be67d68>
I could send a patch to do what I think it's supposed to be doing, but I find it odd since apparently /dev/kmem has been broken for a while now - which doesn't make sense.
What am I missing?
* Re: /dev/kmem BUG on mmap
2012-07-29 22:28 /dev/kmem BUG on mmap Sasha Levin
@ 2012-07-30 9:43 ` Johannes Weiner
2012-07-31 2:06 ` Hugh Dickins
0 siblings, 1 reply; 3+ messages in thread
From: Johannes Weiner @ 2012-07-30 9:43 UTC (permalink / raw)
To: Sasha Levin; +Cc: gregkh, arnd, Hugh Dickins, linux-kernel@vger.kernel.org
On Mon, Jul 30, 2012 at 12:28:35AM +0200, Sasha Levin wrote:
> Hi all,
>
> I was poking around /dev/kmem related code, and noticed the following in mmap_kmem():
>
> 	/* Turn a kernel-virtual address into a physical page frame */
> 	pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
>
> Which looked odd since vm_pgoff is the offset into the mapping, so
> I'd assume that PAGE_OFFSET should be added to it as well, otherwise
> we get an invalid address.

It's supposed to be used with kernel offsets in the first place,
i.e. vma->vm_pgoff << PAGE_SHIFT should actually be a kernel virtual
address. See 6d3154c Revert "[PATCH] Fix up mmap_kmem".

> I tested it by writing something like this:
>
> 	int main(void)
> 	{
> 		int fd;
> 		void *addr;
>
> 		fd = open("/dev/kmem", O_RDONLY);
> 		addr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 4096);
>
> 		return 0;
> 	}
>
> Which indeed triggered a VM_BUG:
>
> [ 32.285431] kernel BUG at arch/x86/mm/physaddr.c:18!

x86's debug-version of __pa() triggers that bug. I'm reluctant to add
a whole lot of error checking to this interface, given that you should
already know what you are doing. OTOH, crashing like this is not very
nice, either.

Is there a portable way to check if an address is a kernel virtual
one? It looks like comparing to PAGE_OFFSET would work on most archs,
but not necessarily on powerpc for example.

Johannes
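The PAGE_OFFSET comparison raised in this message could look roughly like the following userspace model. It is a sketch, not a proposed patch: kmem_check_pgoff() is a hypothetical helper, the constants are assumed x86-64 values, and returning -EINVAL is one possible choice of error code rather than anything the thread settled on.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for x86-64 around Linux 3.5; not portable across archs. */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_OFFSET UINT64_C(0xffff880000000000)

/*
 * Hypothetical pre-check for mmap_kmem(): reject a page offset whose
 * reconstructed address lies below the direct-map base, instead of
 * handing it to __pa(). Returns 0 if the comparison passes.
 */
static inline int kmem_check_pgoff(uint64_t vm_pgoff)
{
	uint64_t vaddr = vm_pgoff << SKETCH_PAGE_SHIFT;

	if (vaddr < SKETCH_PAGE_OFFSET)
		return -EINVAL;
	return 0;
}
```

With these assumed constants, a vm_pgoff of 1 (the reproducer's case) is rejected, while the page offset of the direct-map base itself is accepted. The comparison is only as portable as the assumption that kernel virtual addresses start at PAGE_OFFSET, which is the open question for powerpc.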
* Re: /dev/kmem BUG on mmap
2012-07-30 9:43 ` Johannes Weiner
@ 2012-07-31 2:06 ` Hugh Dickins
0 siblings, 0 replies; 3+ messages in thread
From: Hugh Dickins @ 2012-07-31 2:06 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Sasha Levin, gregkh, arnd, linux-kernel
On Mon, 30 Jul 2012, Johannes Weiner wrote:
> On Mon, Jul 30, 2012 at 12:28:35AM +0200, Sasha Levin wrote:
> > Hi all,
> >
> > I was poking around /dev/kmem related code, and noticed the following in mmap_kmem():
> >
> > 	/* Turn a kernel-virtual address into a physical page frame */
> > 	pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
> >
> > Which looked odd since vm_pgoff is the offset into the mapping, so
> > I'd assume that PAGE_OFFSET should be added to it as well, otherwise
> > we get an invalid address.
>
> It's supposed to be used with kernel offsets in the first place,
> i.e. vma->vm_pgoff << PAGE_SHIFT should actually be a kernel virtual
> address. See 6d3154c Revert "[PATCH] Fix up mmap_kmem".

Yes. Some would say we should add a comment; but already it has one.

> > I tested it by writing something like this:
> >
> > 	int main(void)
> > 	{
> > 		int fd;
> > 		void *addr;
> >
> > 		fd = open("/dev/kmem", O_RDONLY);
> > 		addr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 4096);
> >
> > 		return 0;
> > 	}
> >
> > Which indeed triggered a VM_BUG:
> >
> > [ 32.285431] kernel BUG at arch/x86/mm/physaddr.c:18!
>
> x86's debug-version of __pa() triggers that bug. I'm reluctant to add
> a whole lot of error checking to this interface, given that you should
> already know what you are doing. OTOH, crashing like this is not very
> nice, either.
>
> Is there a portable way to check if an address is a kernel virtual
> one? It looks like comparing to PAGE_OFFSET would work on most archs,
> but not necessarily on powerpc for example.

I didn't look into powerpc; even on x86, comparing with PAGE_OFFSET
first would filter out the most likely crashes, but leave it crashing
on >= KERNEL_IMAGE_SIZE and !phys_addr_valid().

I think that's why it's so long said just __pa(), because different
architectures would not agree on the appropriate prior validation.
Debug crashes added at as low level as __pa() come as a surprise.

Thank you to Sasha for bringing this to our attention, and if there
were an obvious right answer, I'd definitely prefer to fail than crash
an out-of-range mmap arg here, even if only CAP_SYS_RAWIO gets this far.

You could say that the right answer is to add the __pa_nodebug() to every
architecture (or in asm-generic), and then use that here; but is it worth
bothering?

Once I read the DEBUG_VIRTUAL Kconfig entry:

	Enable some costly sanity checks in virtual to page code. This can
	catch mistakes with virt_to_page() and friends.

	If unsure, say N.

I'm inclined to think that few would turn DEBUG_VIRTUAL on, and those
who do so might as well welcome this crash as the costly way in which
it catches their mistakes - with apology to Sasha.

Not an answer I'm especially proud of, but doubt it's worth more.

Hugh
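The point about a PAGE_OFFSET comparison leaving two other crash conditions in place can be illustrated with a small userspace classifier. Everything here is an assumption made for illustration: the constants are typical x86-64 values of this era, SKETCH_PHYS_BITS stands in for a CPU-dependent physical address width, classify_vaddr() is a hypothetical function, and the branch structure is a model built from the three conditions named above, not a copy of the kernel's __phys_addr().

```c
#include <stdint.h>

/* Assumed x86-64 values around Linux 3.5; real kernels may differ. */
#define SKETCH_PAGE_OFFSET       UINT64_C(0xffff880000000000)
#define SKETCH_START_KERNEL_MAP  UINT64_C(0xffffffff80000000)
#define SKETCH_KERNEL_IMAGE_SIZE UINT64_C(0x20000000)	/* 512 MiB */
#define SKETCH_PHYS_BITS         36	/* hypothetical physical address width */

enum pa_verdict {
	PA_OK,
	PA_BELOW_PAGE_OFFSET,	/* caught by a plain PAGE_OFFSET comparison */
	PA_BEYOND_IMAGE,	/* offset into kernel text map >= KERNEL_IMAGE_SIZE */
	PA_PHYS_INVALID,	/* resulting physical address too wide */
};

/* Hypothetical model: which debug condition, if any, an address would hit. */
static inline enum pa_verdict classify_vaddr(uint64_t x)
{
	if (x >= SKETCH_START_KERNEL_MAP) {
		if (x - SKETCH_START_KERNEL_MAP >= SKETCH_KERNEL_IMAGE_SIZE)
			return PA_BEYOND_IMAGE;
		return PA_OK;
	}
	if (x < SKETCH_PAGE_OFFSET)
		return PA_BELOW_PAGE_OFFSET;
	if ((x - SKETCH_PAGE_OFFSET) >> SKETCH_PHYS_BITS)
		return PA_PHYS_INVALID;
	return PA_OK;
}
```

In this model, 0x1000 is the only kind of input a PAGE_OFFSET comparison filters out; addresses such as 0xffffffffa0000000 or 0xffff881000000000 are above PAGE_OFFSET yet still land in one of the other two cases, which is why a single comparison in mmap_kmem() would not make the interface crash-free under DEBUG_VIRTUAL.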