All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops
       [not found] <bug-13850-10286@http.bugzilla.kernel.org/>
@ 2009-07-28 23:05 ` Andrew Morton
  2009-07-28 23:48   ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2009-07-28 23:05 UTC (permalink / raw)
  To: scgtrp
  Cc: bugzilla-daemon, bugme-daemon, Amerigo Wang, KAMEZAWA Hiroyuki,
	linux-kernel


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 27 Jul 2009 03:19:11 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13850
> 
>            Summary: reading /proc/kcore causes oops
>            Product: Other
>            Version: 2.5
>     Kernel Version: 2.6.30
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: other_other@kernel-bugs.osdl.org
>         ReportedBy: scgtrp@gmail.com
>         Regression: No
> 
> 
> When trying to use an old trick for finding lost data by grep'ing /proc/kcore,
> I managed to oops my server's kernel. I tried again on my desktop with cat
> /proc/kcore >/dev/null. cat was killed, and a similar oops appeared in my dmesg
> which I managed to capture:
> 
> Jul 26 23:04:13 mike kernel: BUG: unable to handle kernel paging request at
> e07cf000
> Jul 26 23:04:13 mike kernel: IP: [<c0224dd1>] read_kcore+0x2c1/0x4b0
> Jul 26 23:04:13 mike kernel: *pde = 1b5f4067 *pte = 00000000 
> Jul 26 23:04:13 mike kernel: Oops: 0000 [#2] PREEMPT SMP 
> Jul 26 23:04:13 mike kernel: last sysfs file: /sys/power/state
> Jul 26 23:04:13 mike kernel: Modules linked in: ipv6 sg sd_mod fuse usb_storage
> usbhid hid snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
> snd_pcm_oss snd_mixer_oss ppdev snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm
> snd_timer ohci_hcd parport_pc lp parport snd soundcore snd_page_alloc nvidia(P)
> agpgart k8temp ehci_hcd forcedeth i2c_nforce2 i2c_core usbcore evdev thermal
> processor fan button battery ac rtc_cmos rtc_core rtc_lib ext3 jbd mbcache
> ide_gd_mod ide_cd_mod cdrom sata_nv libata amd74xx ide_pci_generic ide_core
> scsi_mod
> Jul 26 23:04:13 mike kernel: 
> Jul 26 23:04:13 mike kernel: Pid: 4835, comm: cat Tainted: P      D   
> (2.6.30-ARCH #1) W3107
> Jul 26 23:04:13 mike kernel: EIP: 0060:[<c0224dd1>] EFLAGS: 00210286 CPU: 0
> Jul 26 23:04:13 mike kernel: EIP is at read_kcore+0x2c1/0x4b0
> Jul 26 23:04:13 mike kernel: EAX: ddb71ac0 EBX: 00001000 ECX: 00000400 EDX:
> e07d0000
> Jul 26 23:04:13 mike kernel: ESI: e07cf000 EDI: da20e000 EBP: d73fbf30 ESP:
> d73fbefc
> Jul 26 23:04:13 mike kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Jul 26 23:04:13 mike kernel: Process cat (pid: 4835, ti=d73fa000 task=d4b5cc00
> task.ti=d73fa000)
> Jul 26 23:04:13 mike kernel: Stack:
> Jul 26 23:04:13 mike kernel: da20e000 e07cf000 d73fbf90 00000000 09459000
> 00000000 00008000 00001000
> Jul 26 23:04:13 mike kernel: 00000000 6798ed89 ddaf0000 ce920c80 c0224b10
> fffffffb c0219e89 d73fbf90
> Jul 26 23:04:13 mike kernel: 09459000 00008000 d73fbf90 6798ed89 ce920c80
> 00008000 09459000 d73fbf80
> Jul 26 23:04:13 mike kernel: Call Trace:
> Jul 26 23:04:13 mike kernel: [<c0224b10>] ? read_kcore+0x0/0x4b0
> Jul 26 23:04:13 mike kernel: [<c0219e89>] ? proc_reg_read+0x79/0xc0
> Jul 26 23:04:13 mike kernel: [<c01d1b43>] ? vfs_read+0xc3/0x1a0
> Jul 26 23:04:13 mike kernel: [<c0219e10>] ? proc_reg_read+0x0/0xc0
> Jul 26 23:04:13 mike kernel: [<c01d1d28>] ? sys_read+0x58/0xb0
> Jul 26 23:04:13 mike kernel: [<c0103c73>] ? sysenter_do_call+0x12/0x28
> Jul 26 23:04:13 mike kernel: Code: 89 fb 0f 43 f2 89 ca 29 f2 29 f3 39 f9 0f 46
> da 29 5c 24 14 f6 40 0c 01 8d 14 33 75 19 89 d9 89 f7 c1 e9 02 2b 7c 24 04 03
> 3c 24 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 4c 24 14 8b 00 85 c9 74 0a 
> Jul 26 23:04:13 mike kernel: EIP: [<c0224dd1>] read_kcore+0x2c1/0x4b0 SS:ESP
> 0068:d73fbefc
> Jul 26 23:04:13 mike kernel: CR2: 00000000e07cf000
> Jul 26 23:04:13 mike kernel: ---[ end trace 3bb140bf57c1987e ]---
> Jul 26 23:04:13 mike kernel: note: cat[4835] exited with preempt_count 1
> 
> I understand it's quite a ridiculous thing to do, but userspace shouldn't be
> able to cause kernel errors, no matter what kind of insane things I try.
> 

gee, read_kcore() is huge.  This makes it pretty hard to work out where
exactly the kernel died.

Is it reproducible, or do you still have the vmlinux from the above
oops on-disk?

If so, can you please help work out where it crashed?  You could run
something like

	addr2line -e vmlinux 0xc0224dd1

or

	gdb vmlinux
	(gdb) l *0xc0224dd1

both of these will need CONFIG_DEBUG_INFO=y.


It is possible to work out where the kernel crashed using the above
Code: line, but it's a bit of a pain.

Thanks.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops
  2009-07-28 23:05 ` [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops Andrew Morton
@ 2009-07-28 23:48   ` KAMEZAWA Hiroyuki
  2009-07-29  2:46     ` Mike Smith
  0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-28 23:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: scgtrp, bugzilla-daemon, bugme-daemon, Amerigo Wang, linux-kernel

On Tue, 28 Jul 2009 16:05:27 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Mon, 27 Jul 2009 03:19:11 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=13850
> > 
> >            Summary: reading /proc/kcore causes oops
> >            Product: Other
> >            Version: 2.5
> >     Kernel Version: 2.6.30
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: other_other@kernel-bugs.osdl.org
> >         ReportedBy: scgtrp@gmail.com
> >         Regression: No
> > 
> > 
> > When trying to use an old trick for finding lost data by grep'ing /proc/kcore,
> > I managed to oops my server's kernel. I tried again on my desktop with cat
> > /proc/kcore >/dev/null. cat was killed, and a similar oops appeared in my dmesg
> > which I managed to capture:
> > 
> > Jul 26 23:04:13 mike kernel: BUG: unable to handle kernel paging request at
> > e07cf000
> > Jul 26 23:04:13 mike kernel: IP: [<c0224dd1>] read_kcore+0x2c1/0x4b0
> > Jul 26 23:04:13 mike kernel: *pde = 1b5f4067 *pte = 00000000 
> > Jul 26 23:04:13 mike kernel: Oops: 0000 [#2] PREEMPT SMP 
> > Jul 26 23:04:13 mike kernel: last sysfs file: /sys/power/state
> > Jul 26 23:04:13 mike kernel: Modules linked in: ipv6 sg sd_mod fuse usb_storage
> > usbhid hid snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
> > snd_pcm_oss snd_mixer_oss ppdev snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm
> > snd_timer ohci_hcd parport_pc lp parport snd soundcore snd_page_alloc nvidia(P)
> > agpgart k8temp ehci_hcd forcedeth i2c_nforce2 i2c_core usbcore evdev thermal
> > processor fan button battery ac rtc_cmos rtc_core rtc_lib ext3 jbd mbcache
> > ide_gd_mod ide_cd_mod cdrom sata_nv libata amd74xx ide_pci_generic ide_core
> > scsi_mod
> > Jul 26 23:04:13 mike kernel: 
> > Jul 26 23:04:13 mike kernel: Pid: 4835, comm: cat Tainted: P      D   
> > (2.6.30-ARCH #1) W3107
> > Jul 26 23:04:13 mike kernel: EIP: 0060:[<c0224dd1>] EFLAGS: 00210286 CPU: 0
> > Jul 26 23:04:13 mike kernel: EIP is at read_kcore+0x2c1/0x4b0
> > Jul 26 23:04:13 mike kernel: EAX: ddb71ac0 EBX: 00001000 ECX: 00000400 EDX:
> > e07d0000
> > Jul 26 23:04:13 mike kernel: ESI: e07cf000 EDI: da20e000 EBP: d73fbf30 ESP:
> > d73fbefc
> > Jul 26 23:04:13 mike kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > Jul 26 23:04:13 mike kernel: Process cat (pid: 4835, ti=d73fa000 task=d4b5cc00
> > task.ti=d73fa000)
> > Jul 26 23:04:13 mike kernel: Stack:
> > Jul 26 23:04:13 mike kernel: da20e000 e07cf000 d73fbf90 00000000 09459000
> > 00000000 00008000 00001000
> > Jul 26 23:04:13 mike kernel: 00000000 6798ed89 ddaf0000 ce920c80 c0224b10
> > fffffffb c0219e89 d73fbf90
> > Jul 26 23:04:13 mike kernel: 09459000 00008000 d73fbf90 6798ed89 ce920c80
> > 00008000 09459000 d73fbf80
> > Jul 26 23:04:13 mike kernel: Call Trace:
> > Jul 26 23:04:13 mike kernel: [<c0224b10>] ? read_kcore+0x0/0x4b0
> > Jul 26 23:04:13 mike kernel: [<c0219e89>] ? proc_reg_read+0x79/0xc0
> > Jul 26 23:04:13 mike kernel: [<c01d1b43>] ? vfs_read+0xc3/0x1a0
> > Jul 26 23:04:13 mike kernel: [<c0219e10>] ? proc_reg_read+0x0/0xc0
> > Jul 26 23:04:13 mike kernel: [<c01d1d28>] ? sys_read+0x58/0xb0
> > Jul 26 23:04:13 mike kernel: [<c0103c73>] ? sysenter_do_call+0x12/0x28
> > Jul 26 23:04:13 mike kernel: Code: 89 fb 0f 43 f2 89 ca 29 f2 29 f3 39 f9 0f 46
> > da 29 5c 24 14 f6 40 0c 01 8d 14 33 75 19 89 d9 89 f7 c1 e9 02 2b 7c 24 04 03
> > 3c 24 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 4c 24 14 8b 00 85 c9 74 0a 
> > Jul 26 23:04:13 mike kernel: EIP: [<c0224dd1>] read_kcore+0x2c1/0x4b0 SS:ESP
> > 0068:d73fbefc
> > Jul 26 23:04:13 mike kernel: CR2: 00000000e07cf000
> > Jul 26 23:04:13 mike kernel: ---[ end trace 3bb140bf57c1987e ]---
> > Jul 26 23:04:13 mike kernel: note: cat[4835] exited with preempt_count 1
> > 
> > I understand it's quite a ridiculous thing to do, but userspace shouldn't be
> > able to cause kernel errors, no matter what kind of insane things I try.
> > 
> 
> gee, read_kcore() is huge.  This makes it pretty hard to work out where
> exactly the kernel died.
> 
> Is it reproducible, or do you still have the vmlinux from the above
> oops on-disk?
> 
> If so, can you please help work out where it crashed?  You could run
> something like
> 
> 	addr2line -e vmlinux 0xc0224dd1
> 
> or
> 
> 	gdb vmlinux
> 	(gdb) l *0xc0224dd1
> 
yes, disassemble will be helpful.
If you compiled the kernel by yourself, 
# objdump -d fs/proc/kcore.o

will also help us.

Hmm, but this message is curious.

unable to handle kernel paging request at e07cf000

What's layout of memory does your server have ? Could you show
# grep "System RAM" /proc/iomem
or head of dmesg ?

IIUC, current code doesn't assume any memory hole in direct-map area.
(And my new patch series should handle it even in CONFIG_HIGHMEM case.)

Thanks,
-Kame



> both of these will need CONFIG_DEBUG_INFO=y.
> 
> 
> It is possible to work out where the kernel crashed using the above
> Code: line, but it's a bit of a pain.
> 
> Thanks.
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops
  2009-07-28 23:48   ` KAMEZAWA Hiroyuki
@ 2009-07-29  2:46     ` Mike Smith
  2009-07-29  3:32       ` KAMEZAWA Hiroyuki
  2009-08-03 11:14       ` [BUGFIX][PATCH 0/3] kcore: avoid access to holes in vmalloc v3 (Was " KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 31+ messages in thread
From: Mike Smith @ 2009-07-29  2:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, Amerigo Wang,
	linux-kernel

> What's layout of memory does your server have ?
The log I gave was from my desktop, so I'll assume you wanted that
instead of the server:
[mike: mike in ~]$ grep "System RAM" /proc/iomem
00010000-0009efff : System RAM
00100000-1dedffff : System RAM

> If so, can you please help work out where it crashed?
I didn't compile the kernel myself; it's the stock Arch Linux kernel,
which may explain the following useless output (this was the only file
I could find called vmlinux):
[mike: mike in /usr/src/linux-2.6.30-ARCH]$ addr2line -e vmlinux 0xc0224dd1
kcore.c:0
[mike: mike in /usr/src/linux-2.6.30-ARCH]$ gdb vmlinux
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(no debugging symbols found)
(gdb) l *0xc0224dd1
No symbol table is loaded.  Use the "file" command.


I'll compile a minimal kernel later with debug symbols and see if I
can reproduce it on that.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops
  2009-07-29  2:46     ` Mike Smith
@ 2009-07-29  3:32       ` KAMEZAWA Hiroyuki
  2009-07-29  8:36         ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry KAMEZAWA Hiroyuki
  2009-07-31  7:07         ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
  2009-08-03 11:14       ` [BUGFIX][PATCH 0/3] kcore: avoid access to holes in vmalloc v3 (Was " KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-29  3:32 UTC (permalink / raw)
  To: Mike Smith
  Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, Amerigo Wang,
	linux-kernel

On Tue, 28 Jul 2009 22:46:56 -0400
Mike Smith <scgtrp@gmail.com> wrote:

> > What's layout of memory does your server have ?
> The log I gave was from my desktop, so I'll assume you wanted that
> instead of the server:
> [mike: mike in ~]$ grep "System RAM" /proc/iomem
> 00010000-0009efff : System RAM
> 00100000-1dedffff : System RAM
> 
>From this, your kernel's valid direct-map address range will be

c0010000-c009efff
c0100000-ddedffff

And, 
==
unable to handle kernel paging request at e07cf000
==
e07cf000 doesn't exist in direct map. It seems this is vmalloc() area.

At looking into mm/vmalloc.c, this area is unmapped under
- purge_lock
But proc/kcore just access this just under vmlist_lock.

No guards at all. This is _a_ problem. But it seems race is not
reproducable easily. I'll think more but is it guaranteed whether
vmalloc area(struct vm_struct) linked to vmlist has always valid pages ?
Considering get_vm_area(), it's not true I think.

I wonder fs/proc/kcore.c's vmalloc area access needs some fix. let me try.

Thanks,
-Kame






^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry
  2009-07-29  3:32       ` KAMEZAWA Hiroyuki
@ 2009-07-29  8:36         ` KAMEZAWA Hiroyuki
  2009-07-29  8:48           ` [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole KAMEZAWA Hiroyuki
  2009-07-29  9:40           ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry Amerigo Wang
  2009-07-31  7:07         ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-29  8:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

> I wonder fs/proc/kcore.c's vmalloc area access needs some fix. let me try.
> 
> Thanks,
Finally, I wrote 2 patches. maybe 2/2 is a fix for the bug.
but needs comments.

Comaparing 2 calls, vfree() and vread()
==
vfree()
  -> __vumap()
     -> remove_vm_area()
       -> free_unmap_vmap_area         #
	 -> try_purge_vmap_area_lazy() #
           ->__purge_vmap_area_lazy(); #
             -> unmap_vmap_area()      # unmap memory here.
       -> write_lock(&vmlist_lock)
          remvoe vm_struct from list
          write_unlock(&vmlist_lock).
==
vread()
 -> read_lock(&vmlist_lock);
    get vm_struct -> do memcpy
    read_unlock(&vmlist_lock);
==

Hmm, maybe not related to original bug but above order should be fixed.

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

vmap area should be purged after vm_struct is removed from the list
because vread/vwrite etc...believes the range is valid while it's on
vm_struct list.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmalloc.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6.31-rc4/mm/vmalloc.c
===================================================================
--- linux-2.6.31-rc4.orig/mm/vmalloc.c
+++ linux-2.6.31-rc4/mm/vmalloc.c
@@ -1256,17 +1256,21 @@ struct vm_struct *remove_vm_area(const v
 	if (va && va->flags & VM_VM_AREA) {
 		struct vm_struct *vm = va->private;
 		struct vm_struct *tmp, **p;
-
-		vmap_debug_free_range(va->va_start, va->va_end);
-		free_unmap_vmap_area(va);
-		vm->size -= PAGE_SIZE;
-
+		/*
+		 * remove from list and disallow access to this vm_struct
+		 * before unmap. (address range confliction is maintained by
+		 * vmap.)
+		 */
 		write_lock(&vmlist_lock);
 		for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
 			;
 		*p = tmp->next;
 		write_unlock(&vmlist_lock);
 
+		vmap_debug_free_range(va->va_start, va->va_end);
+		free_unmap_vmap_area(va);
+		vm->size -= PAGE_SIZE;
+
 		return vm;
 	}
 	return NULL;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole.
  2009-07-29  8:36         ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry KAMEZAWA Hiroyuki
@ 2009-07-29  8:48           ` KAMEZAWA Hiroyuki
  2009-07-29  9:59             ` Amerigo Wang
  2009-07-29  9:40           ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry Amerigo Wang
  1 sibling, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-29  8:48 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

Considering recent changes, I think this an answer to this bug.

All get_vm_area() calls specified VM_IOREMAP before 2.6.30.
Now, percpu calls get_vm_area() without IOREMAP (it means kcore doesn't skip this)
And it seems pcpu does delayed map to pre-allocated vmalloc-area.
Then, now, vmalloc() area has memory holes. (I'm sorry if I misunderstand.)
Then, vread()/vwrite() should be aware of memory holes in vmalloc.
And yes, /proc/kcore should be.

plz review. 
==
vread/vwrite access vmalloc area without checking there is a page or not.
In most case, this works well.

After per-cpu-alloc patch, There tend to be a hole in valid vmalloc area.
Then, skip the hole (not mapped page) is necessary.

/proc/kcore has the same problem, now, it uses its own code but
it's better to use vread().

Question: vread() should skip IOREMAP area always ?

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 drivers/char/mem.c      |    3 +
 fs/proc/kcore.c         |   32 +----------------
 include/linux/vmalloc.h |    3 +
 mm/vmalloc.c            |   90 +++++++++++++++++++++++++++++++++++++-----------
 4 files changed, 77 insertions(+), 51 deletions(-)

Index: linux-2.6.31-rc4/mm/vmalloc.c
===================================================================
--- linux-2.6.31-rc4.orig/mm/vmalloc.c
+++ linux-2.6.31-rc4/mm/vmalloc.c
@@ -1629,7 +1629,61 @@ void *vmalloc_32_user(unsigned long size
 }
 EXPORT_SYMBOL(vmalloc_32_user);
 
-long vread(char *buf, char *addr, unsigned long count)
+/*
+ * small helper routine , copy contents to buf from addr.
+ * If the page is not present, fill zero.
+ */
+
+static int aligned_vread(char *buf, char *addr, unsigned long count)
+{
+	struct page *p;
+	int copied;
+
+	while (count) {
+		unsigned long offset, length;
+
+		offset = (unsinged long)addr & PAGE_MASK;
+		length = PAGE_SIZE - offset;
+		if (length > count)
+			length = count;
+		p = vmalloc_to_page(addr);
+		if (p)
+			memcpy(buf, addr, length);
+		else
+			memset(buf, 0, length);
+		addr += length;
+		buf += length;
+		copiled += length;
+		count -= length;
+	}
+	return copied;
+}
+
+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
+{
+	struct page *p;
+	int copied;
+
+	while (count) {
+		unsigned long offset, length;
+
+		offset = (unsinged long)addr & PAGE_MASK;
+		length = PAGE_SIZE - offset;
+		if (length > count)
+			length = count;
+		/* confirm the page is present */
+		p = vmalloc_to_page(addr);
+		if (p)
+			memcpy(addr, buf, length);
+		addr += length;
+		buf += length;
+		copiled += length;
+		count -= length;
+	}
+	return copied;
+}
+
+long vread(char *buf, char *addr, unsigned long count, bool skip_ioremap)
 {
 	struct vm_struct *tmp;
 	char *vaddr, *buf_start = buf;
@@ -1640,10 +1694,11 @@ long vread(char *buf, char *addr, unsign
 		count = -(unsigned long) addr;
 
 	read_lock(&vmlist_lock);
-	for (tmp = vmlist; tmp; tmp = tmp->next) {
+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
 			continue;
+
 		while (addr < vaddr) {
 			if (count == 0)
 				goto finished;
@@ -1653,14 +1708,15 @@ long vread(char *buf, char *addr, unsign
 			count--;
 		}
 		n = vaddr + tmp->size - PAGE_SIZE - addr;
-		do {
-			if (count == 0)
-				goto finished;
-			*buf = *addr;
-			buf++;
-			addr++;
-			count--;
-		} while (--n > 0);
+		if (n > count)
+			n = count;
+		if (!(tmp->flags & VM_IOREMAP) || !skip_ioremap)
+			aligned_vread(buf, addr, n);
+		else
+			memset(buf, 0, n);
+		buf += n;
+		addr += n;
+		count -= n;
 	}
 finished:
 	read_unlock(&vmlist_lock);
@@ -1678,7 +1734,7 @@ long vwrite(char *buf, char *addr, unsig
 		count = -(unsigned long) addr;
 
 	read_lock(&vmlist_lock);
-	for (tmp = vmlist; tmp; tmp = tmp->next) {
+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
 			continue;
@@ -1690,14 +1746,10 @@ long vwrite(char *buf, char *addr, unsig
 			count--;
 		}
 		n = vaddr + tmp->size - PAGE_SIZE - addr;
-		do {
-			if (count == 0)
-				goto finished;
-			*addr = *buf;
-			buf++;
-			addr++;
-			count--;
-		} while (--n > 0);
+		copied = aligned_vwrite(buf, addr, n);
+		buf += n;
+		addr += n;
+		count -= n;
 	}
 finished:
 	read_unlock(&vmlist_lock);
Index: linux-2.6.31-rc4/drivers/char/mem.c
===================================================================
--- linux-2.6.31-rc4.orig/drivers/char/mem.c
+++ linux-2.6.31-rc4/drivers/char/mem.c
@@ -466,7 +466,8 @@ static ssize_t read_kmem(struct file *fi
 
 			if (len > PAGE_SIZE)
 				len = PAGE_SIZE;
-			len = vread(kbuf, (char *)p, len);
+			/* we read memory even if ioremap...*/
+			len = vread(kbuf, (char *)p, len, false);
 			if (!len)
 				break;
 			if (copy_to_user(buf, kbuf, len)) {
Index: linux-2.6.31-rc4/fs/proc/kcore.c
===================================================================
--- linux-2.6.31-rc4.orig/fs/proc/kcore.c
+++ linux-2.6.31-rc4/fs/proc/kcore.c
@@ -331,40 +331,12 @@ read_kcore(struct file *file, char __use
 			struct vm_struct *m;
 			unsigned long curstart = start;
 			unsigned long cursize = tsz;
-
+			
 			elf_buf = kzalloc(tsz, GFP_KERNEL);
 			if (!elf_buf)
 				return -ENOMEM;
 
-			read_lock(&vmlist_lock);
-			for (m=vmlist; m && cursize; m=m->next) {
-				unsigned long vmstart;
-				unsigned long vmsize;
-				unsigned long msize = m->size - PAGE_SIZE;
-
-				if (((unsigned long)m->addr + msize) < 
-								curstart)
-					continue;
-				if ((unsigned long)m->addr > (curstart + 
-								cursize))
-					break;
-				vmstart = (curstart < (unsigned long)m->addr ? 
-					(unsigned long)m->addr : curstart);
-				if (((unsigned long)m->addr + msize) > 
-							(curstart + cursize))
-					vmsize = curstart + cursize - vmstart;
-				else
-					vmsize = (unsigned long)m->addr + 
-							msize - vmstart;
-				curstart = vmstart + vmsize;
-				cursize -= vmsize;
-				/* don't dump ioremap'd stuff! (TA) */
-				if (m->flags & VM_IOREMAP)
-					continue;
-				memcpy(elf_buf + (vmstart - start),
-					(char *)vmstart, vmsize);
-			}
-			read_unlock(&vmlist_lock);
+			vread(elf_buf, start, tsz, true);
 			if (copy_to_user(buffer, elf_buf, tsz)) {
 				kfree(elf_buf);
 				return -EFAULT;
Index: linux-2.6.31-rc4/include/linux/vmalloc.h
===================================================================
--- linux-2.6.31-rc4.orig/include/linux/vmalloc.h
+++ linux-2.6.31-rc4/include/linux/vmalloc.h
@@ -105,7 +105,8 @@ extern struct vm_struct *alloc_vm_area(s
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
-extern long vread(char *buf, char *addr, unsigned long count);
+extern long vread(char *buf, char *addr, unsigned long count,
+		  bool skip_ioremap);
 extern long vwrite(char *buf, char *addr, unsigned long count);
 
 /*


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry
  2009-07-29  8:36         ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry KAMEZAWA Hiroyuki
  2009-07-29  8:48           ` [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole KAMEZAWA Hiroyuki
@ 2009-07-29  9:40           ` Amerigo Wang
  2009-07-29 11:53             ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 31+ messages in thread
From: Amerigo Wang @ 2009-07-29  9:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

On Wed, Jul 29, 2009 at 05:36:00PM +0900, KAMEZAWA Hiroyuki wrote:
>> I wonder fs/proc/kcore.c's vmalloc area access needs some fix. let me try.
>> 
>> Thanks,
>Finally, I wrote 2 patches. maybe 2/2 is a fix for the bug.
>but needs comments.
>
>Comaparing 2 calls, vfree() and vread()
>==
>vfree()
>  -> __vumap()
>     -> remove_vm_area()
>       -> free_unmap_vmap_area         #
>	 -> try_purge_vmap_area_lazy() #
>           ->__purge_vmap_area_lazy(); #
>             -> unmap_vmap_area()      # unmap memory here.
>       -> write_lock(&vmlist_lock)
>          remvoe vm_struct from list
>          write_unlock(&vmlist_lock).
>==
>vread()
> -> read_lock(&vmlist_lock);
>    get vm_struct -> do memcpy
>    read_unlock(&vmlist_lock);
>==


I think this is a very good catch!

Your patch looks reasonable for me.

>
>Hmm, maybe not related to original bug but above order should be fixed.
>
>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>vmap area should be purged after vm_struct is removed from the list
>because vread/vwrite etc...believes the range is valid while it's on
>vm_struct list.
>
>Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>

Aside, would you like to replace vmlist, a single list, with our
generic list API? :) Just as you did for kcore. Pls send it in
a seprate patch.

Thanks!


>---
> mm/vmalloc.c |   14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
>Index: linux-2.6.31-rc4/mm/vmalloc.c
>===================================================================
>--- linux-2.6.31-rc4.orig/mm/vmalloc.c
>+++ linux-2.6.31-rc4/mm/vmalloc.c
>@@ -1256,17 +1256,21 @@ struct vm_struct *remove_vm_area(const v
> 	if (va && va->flags & VM_VM_AREA) {
> 		struct vm_struct *vm = va->private;
> 		struct vm_struct *tmp, **p;
>-
>-		vmap_debug_free_range(va->va_start, va->va_end);
>-		free_unmap_vmap_area(va);
>-		vm->size -= PAGE_SIZE;
>-
>+		/*
>+		 * remove from list and disallow access to this vm_struct
>+		 * before unmap. (address range confliction is maintained by
>+		 * vmap.)
>+		 */
> 		write_lock(&vmlist_lock);
> 		for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
> 			;
> 		*p = tmp->next;
> 		write_unlock(&vmlist_lock);
> 
>+		vmap_debug_free_range(va->va_start, va->va_end);
>+		free_unmap_vmap_area(va);
>+		vm->size -= PAGE_SIZE;
>+
> 		return vm;
> 	}
> 	return NULL;
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole.
  2009-07-29  8:48           ` [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole KAMEZAWA Hiroyuki
@ 2009-07-29  9:59             ` Amerigo Wang
  2009-07-29 11:51               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 31+ messages in thread
From: Amerigo Wang @ 2009-07-29  9:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

On Wed, Jul 29, 2009 at 05:48:10PM +0900, KAMEZAWA Hiroyuki wrote:
>Considering recent changes, I think this an answer to this bug.
>
>All get_vm_area() calls specified VM_IOREMAP before 2.6.30.
>Now, percpu calls get_vm_area() without IOREMAP (it means kcore doesn't skip this)
>And it seems pcpu does delayed map to pre-allocated vmalloc-area.
>Then, now, vmalloc() area has memory holes. (I'm sorry if I misunderstand.)


You mean the guard holes in vmalloc area?

IIRC, there are some guard holes between vmalloc areas, but I haven't
checked the source code. ;)


>Then, vread()/vwrite() should be aware of memory holes in vmalloc.
>And yes, /proc/kcore should be.
>
>plz review. 
>==
>vread/vwrite access vmalloc area without checking there is a page or not.
>In most case, this works well.
>
>After per-cpu-alloc patch, There tend to be a hole in valid vmalloc area.
>Then, skip the hole (not mapped page) is necessary.
>
>/proc/kcore has the same problem, now, it uses its own code but
>it's better to use vread().
>
>Question: vread() should skip IOREMAP area always ?
>
>Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


Some comments below..


>---
> drivers/char/mem.c      |    3 +
> fs/proc/kcore.c         |   32 +----------------
> include/linux/vmalloc.h |    3 +
> mm/vmalloc.c            |   90 +++++++++++++++++++++++++++++++++++++-----------
> 4 files changed, 77 insertions(+), 51 deletions(-)
>
>Index: linux-2.6.31-rc4/mm/vmalloc.c
>===================================================================
>--- linux-2.6.31-rc4.orig/mm/vmalloc.c
>+++ linux-2.6.31-rc4/mm/vmalloc.c
>@@ -1629,7 +1629,61 @@ void *vmalloc_32_user(unsigned long size
> }
> EXPORT_SYMBOL(vmalloc_32_user);
> 
>-long vread(char *buf, char *addr, unsigned long count)
>+/*
>+ * small helper routine , copy contents to buf from addr.
>+ * If the page is not present, fill zero.
>+ */
>+
>+static int aligned_vread(char *buf, char *addr, unsigned long count)
>+{
>+	struct page *p;
>+	int copied;
>+
>+	while (count) {
>+		unsigned long offset, length;
>+
>+		offset = (unsinged long)addr & PAGE_MASK;
>+		length = PAGE_SIZE - offset;
>+		if (length > count)
>+			length = count;
>+		p = vmalloc_to_page(addr);
>+		if (p)
>+			memcpy(buf, addr, length);
>+		else
>+			memset(buf, 0, length);
>+		addr += length;
>+		buf += length;
>+		copiled += length;
>+		count -= length;
>+	}
>+	return copied;
>+}
>+
>+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
>+{
>+	struct page *p;
>+	int copied;
>+
>+	while (count) {
>+		unsigned long offset, length;
>+
>+		offset = (unsinged long)addr & PAGE_MASK;
>+		length = PAGE_SIZE - offset;
>+		if (length > count)
>+			length = count;
>+		/* confirm the page is present */
>+		p = vmalloc_to_page(addr);
>+		if (p)
>+			memcpy(addr, buf, length);


So when !p, you copy nothing, but still increase the length,
*silently*?

This looks a bit weird for me... I am thinking if we need
to return the number of bytes that is *actually* copied?


>+		addr += length;
>+		buf += length;
>+		copiled += length;
>+		count -= length;
>+	}
>+	return copied;
>+}
>+
>+long vread(char *buf, char *addr, unsigned long count, bool skip_ioremap)
> {
> 	struct vm_struct *tmp;
> 	char *vaddr, *buf_start = buf;
>@@ -1640,10 +1694,11 @@ long vread(char *buf, char *addr, unsign
> 		count = -(unsigned long) addr;
> 
> 	read_lock(&vmlist_lock);
>-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> 		vaddr = (char *) tmp->addr;
> 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> 			continue;
>+
> 		while (addr < vaddr) {
> 			if (count == 0)
> 				goto finished;
>@@ -1653,14 +1708,15 @@ long vread(char *buf, char *addr, unsign
> 			count--;
> 		}
> 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>-		do {
>-			if (count == 0)
>-				goto finished;
>-			*buf = *addr;
>-			buf++;
>-			addr++;
>-			count--;
>-		} while (--n > 0);
>+		if (n > count)
>+			n = count;
>+		if (!(tmp->flags & VM_IOREMAP) || !skip_ioremap)
>+			aligned_vread(buf, addr, n);
>+		else
>+			memset(buf, 0, n);
>+		buf += n;
>+		addr += n;
>+		count -= n;


If I set 'skip_ioremap' true, I want the return value can be
what it _actually_ copies, i.e. kick out the bytes counted
for VM_IOREMAP. What do you think?


> 	}
> finished:
> 	read_unlock(&vmlist_lock);
>@@ -1678,7 +1734,7 @@ long vwrite(char *buf, char *addr, unsig
> 		count = -(unsigned long) addr;
> 
> 	read_lock(&vmlist_lock);
>-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> 		vaddr = (char *) tmp->addr;
> 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> 			continue;
>@@ -1690,14 +1746,10 @@ long vwrite(char *buf, char *addr, unsig
> 			count--;
> 		}
> 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>-		do {
>-			if (count == 0)
>-				goto finished;
>-			*addr = *buf;
>-			buf++;
>-			addr++;
>-			count--;
>-		} while (--n > 0);
>+		copied = aligned_vwrite(buf, addr, n);
>+		buf += n;
>+		addr += n;
>+		count -= n;
> 	}
> finished:
> 	read_unlock(&vmlist_lock);
>Index: linux-2.6.31-rc4/drivers/char/mem.c
>===================================================================
>--- linux-2.6.31-rc4.orig/drivers/char/mem.c
>+++ linux-2.6.31-rc4/drivers/char/mem.c
>@@ -466,7 +466,8 @@ static ssize_t read_kmem(struct file *fi
> 
> 			if (len > PAGE_SIZE)
> 				len = PAGE_SIZE;
>-			len = vread(kbuf, (char *)p, len);
>+			/* we read memory even if ioremap...*/
>+			len = vread(kbuf, (char *)p, len, false);
> 			if (!len)
> 				break;
> 			if (copy_to_user(buf, kbuf, len)) {
>Index: linux-2.6.31-rc4/fs/proc/kcore.c
>===================================================================
>--- linux-2.6.31-rc4.orig/fs/proc/kcore.c
>+++ linux-2.6.31-rc4/fs/proc/kcore.c
>@@ -331,40 +331,12 @@ read_kcore(struct file *file, char __use
> 			struct vm_struct *m;
> 			unsigned long curstart = start;
> 			unsigned long cursize = tsz;
>-
>+			


I am sure this change, a line only has spaces, is not what you want. :-)



> 			elf_buf = kzalloc(tsz, GFP_KERNEL);
> 			if (!elf_buf)
> 				return -ENOMEM;
> 
>-			read_lock(&vmlist_lock);
>-			for (m=vmlist; m && cursize; m=m->next) {
>-				unsigned long vmstart;
>-				unsigned long vmsize;
>-				unsigned long msize = m->size - PAGE_SIZE;
>-
>-				if (((unsigned long)m->addr + msize) < 
>-								curstart)
>-					continue;
>-				if ((unsigned long)m->addr > (curstart + 
>-								cursize))
>-					break;
>-				vmstart = (curstart < (unsigned long)m->addr ? 
>-					(unsigned long)m->addr : curstart);
>-				if (((unsigned long)m->addr + msize) > 
>-							(curstart + cursize))
>-					vmsize = curstart + cursize - vmstart;
>-				else
>-					vmsize = (unsigned long)m->addr + 
>-							msize - vmstart;
>-				curstart = vmstart + vmsize;
>-				cursize -= vmsize;
>-				/* don't dump ioremap'd stuff! (TA) */
>-				if (m->flags & VM_IOREMAP)
>-					continue;
>-				memcpy(elf_buf + (vmstart - start),
>-					(char *)vmstart, vmsize);
>-			}
>-			read_unlock(&vmlist_lock);
>+			vread(elf_buf, start, tsz, true);
> 			if (copy_to_user(buffer, elf_buf, tsz)) {
> 				kfree(elf_buf);
> 				return -EFAULT;
>Index: linux-2.6.31-rc4/include/linux/vmalloc.h
>===================================================================
>--- linux-2.6.31-rc4.orig/include/linux/vmalloc.h
>+++ linux-2.6.31-rc4/include/linux/vmalloc.h
>@@ -105,7 +105,8 @@ extern struct vm_struct *alloc_vm_area(s
> extern void free_vm_area(struct vm_struct *area);
> 
> /* for /dev/kmem */
>-extern long vread(char *buf, char *addr, unsigned long count);
>+extern long vread(char *buf, char *addr, unsigned long count,
>+		  bool skip_ioremap);
> extern long vwrite(char *buf, char *addr, unsigned long count);
> 
> /*
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole.
  2009-07-29  9:59             ` Amerigo Wang
@ 2009-07-29 11:51               ` KAMEZAWA Hiroyuki
  2009-07-31  9:38                 ` Amerigo Wang
  0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-29 11:51 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	linux-kernel

On Wed, 29 Jul 2009 17:59:32 +0800
Amerigo Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, Jul 29, 2009 at 05:48:10PM +0900, KAMEZAWA Hiroyuki wrote:
> >Considering recent changes, I think this an answer to this bug.
> >
> >All get_vm_area() calls specified VM_IOREMAP before 2.6.30.
> >Now, percpu calls get_vm_area() without IOREMAP (it means kcore doesn't skip this)
> >And it seems pcpu does delayed map to pre-allocated vmalloc-area.
> >Then, now, vmalloc() area has memory holes. (I'm sorry if I misunderstand.)
> 
> 
> You mean the guard holes in vmalloc area?
> 
no, not guard hole.  memory hole in the middle of valid vmalloc area, like this

start             end
      XXXXX0XXXXXG   XXXXXXXXG  XXXXXXXXG
X= pte is present
O= pte is none
G= pte is none as guard page.


> IIRC, there are some guard holes between vmalloc areas, but I haven't
> checked the source code. ;)
> 
> 
> >Then, vread()/vwrite() should be aware of memory holes in vmalloc.
> >And yes, /proc/kcore should be.
> >
> >plz review. 
> >==
> >vread/vwrite access vmalloc area without checking there is a page or not.
> >In most case, this works well.
> >
> >After per-cpu-alloc patch, There tend to be a hole in valid vmalloc area.
> >Then, skip the hole (not mapped page) is necessary.
> >
> >/proc/kcore has the same problem, now, it uses its own code but
> >it's better to use vread().
> >
> >Question: vread() should skip IOREMAP area always ?
> >
> >Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> 
> Some comments below..
> 
> 
> >---
> > drivers/char/mem.c      |    3 +
> > fs/proc/kcore.c         |   32 +----------------
> > include/linux/vmalloc.h |    3 +
> > mm/vmalloc.c            |   90 +++++++++++++++++++++++++++++++++++++-----------
> > 4 files changed, 77 insertions(+), 51 deletions(-)
> >
> >Index: linux-2.6.31-rc4/mm/vmalloc.c
> >===================================================================
> >--- linux-2.6.31-rc4.orig/mm/vmalloc.c
> >+++ linux-2.6.31-rc4/mm/vmalloc.c
> >@@ -1629,7 +1629,61 @@ void *vmalloc_32_user(unsigned long size
> > }
> > EXPORT_SYMBOL(vmalloc_32_user);
> > 
> >-long vread(char *buf, char *addr, unsigned long count)
> >+/*
> >+ * small helper routine , copy contents to buf from addr.
> >+ * If the page is not present, fill zero.
> >+ */
> >+
> >+static int aligned_vread(char *buf, char *addr, unsigned long count)
> >+{
> >+	struct page *p;
> >+	int copied;
> >+
> >+	while (count) {
> >+		unsigned long offset, length;
> >+
> >+		offset = (unsinged long)addr & PAGE_MASK;
> >+		length = PAGE_SIZE - offset;
> >+		if (length > count)
> >+			length = count;
> >+		p = vmalloc_to_page(addr);
> >+		if (p)
> >+			memcpy(buf, addr, length);
> >+		else
> >+			memset(buf, 0, length);
> >+		addr += length;
> >+		buf += length;
> >+		copiled += length;
> >+		count -= length;
> >+	}
> >+	return copied;
> >+}
> >+
> >+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
> >+{
> >+	struct page *p;
> >+	int copied;
> >+
> >+	while (count) {
> >+		unsigned long offset, length;
> >+
> >+		offset = (unsinged long)addr & PAGE_MASK;
> >+		length = PAGE_SIZE - offset;
> >+		if (length > count)
> >+			length = count;
> >+		/* confirm the page is present */
> >+		p = vmalloc_to_page(addr);
> >+		if (p)
> >+			memcpy(addr, buf, length);
> 
> 
> So when !p, you copy nothing, but still increase the length,
> *silently*?
Ah, thanks, it's careless bug...will fix.



> 
> This looks a bit weird for me... I am thinking if we need
> to return the number of bytes that is *actually* copied?
> 
I think it makes the caller complex.

Then, I have 2 options.
 a) skip memory hole
 b) if we find memory hole, return immediately.

Do you like b) ? 
But in old behavior, vwrite()/vread() incremetns its "addr"
(at read zero-fill).

Then, I used a).


> 
> >+		addr += length;
> >+		buf += length;
> >+		copiled += length;
> >+		count -= length;
> >+	}
> >+	return copied;
> >+}
> >+
> >+long vread(char *buf, char *addr, unsigned long count, bool skip_ioremap)
> > {
> > 	struct vm_struct *tmp;
> > 	char *vaddr, *buf_start = buf;
> >@@ -1640,10 +1694,11 @@ long vread(char *buf, char *addr, unsign
> > 		count = -(unsigned long) addr;
> > 
> > 	read_lock(&vmlist_lock);
> >-	for (tmp = vmlist; tmp; tmp = tmp->next) {
> >+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> > 		vaddr = (char *) tmp->addr;
> > 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> > 			continue;
> >+
> > 		while (addr < vaddr) {
> > 			if (count == 0)
> > 				goto finished;
> >@@ -1653,14 +1708,15 @@ long vread(char *buf, char *addr, unsign
> > 			count--;
> > 		}
> > 		n = vaddr + tmp->size - PAGE_SIZE - addr;
> >-		do {
> >-			if (count == 0)
> >-				goto finished;
> >-			*buf = *addr;
> >-			buf++;
> >-			addr++;
> >-			count--;
> >-		} while (--n > 0);
> >+		if (n > count)
> >+			n = count;
> >+		if (!(tmp->flags & VM_IOREMAP) || !skip_ioremap)
> >+			aligned_vread(buf, addr, n);
> >+		else
> >+			memset(buf, 0, n);
> >+		buf += n;
> >+		addr += n;
> >+		count -= n;
> 
> 
> If I set 'skip_ioremap' true, I want the return value can be
> what it _actually_ copies, i.e. kick out the bytes counted
> for VM_IOREMAP. What do you think?
> 

One one od the reasons is The same reason to above. 
(adjusted to old behavior)

IIUC, In following kind of codes,

	len = vread(buf,addr,xxx);
	bud += len;
	addr += len

len should be the length that the caller should increment buffer address.
So, I just skipped hole.



> 
> > 	}
> > finished:
> > 	read_unlock(&vmlist_lock);
> >@@ -1678,7 +1734,7 @@ long vwrite(char *buf, char *addr, unsig
> > 		count = -(unsigned long) addr;
> > 
> > 	read_lock(&vmlist_lock);
> >-	for (tmp = vmlist; tmp; tmp = tmp->next) {
> >+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> > 		vaddr = (char *) tmp->addr;
> > 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> > 			continue;
> >@@ -1690,14 +1746,10 @@ long vwrite(char *buf, char *addr, unsig
> > 			count--;
> > 		}
> > 		n = vaddr + tmp->size - PAGE_SIZE - addr;
> >-		do {
> >-			if (count == 0)
> >-				goto finished;
> >-			*addr = *buf;
> >-			buf++;
> >-			addr++;
> >-			count--;
> >-		} while (--n > 0);
> >+		copied = aligned_vwrite(buf, addr, n);
> >+		buf += n;
> >+		addr += n;
> >+		count -= n;
> > 	}
> > finished:
> > 	read_unlock(&vmlist_lock);
> >Index: linux-2.6.31-rc4/drivers/char/mem.c
> >===================================================================
> >--- linux-2.6.31-rc4.orig/drivers/char/mem.c
> >+++ linux-2.6.31-rc4/drivers/char/mem.c
> >@@ -466,7 +466,8 @@ static ssize_t read_kmem(struct file *fi
> > 
> > 			if (len > PAGE_SIZE)
> > 				len = PAGE_SIZE;
> >-			len = vread(kbuf, (char *)p, len);
> >+			/* we read memory even if ioremap...*/
> >+			len = vread(kbuf, (char *)p, len, false);
> > 			if (!len)
> > 				break;
> > 			if (copy_to_user(buf, kbuf, len)) {
> >Index: linux-2.6.31-rc4/fs/proc/kcore.c
> >===================================================================
> >--- linux-2.6.31-rc4.orig/fs/proc/kcore.c
> >+++ linux-2.6.31-rc4/fs/proc/kcore.c
> >@@ -331,40 +331,12 @@ read_kcore(struct file *file, char __use
> > 			struct vm_struct *m;
> > 			unsigned long curstart = start;
> > 			unsigned long cursize = tsz;
> >-
> >+			
> 
> 
> I am sure this change, a line only has spaces, is not what you want. :-)
> 
yes ;(


Thank you for review. I'll fix vwrite_aligned().


-Kame

> 
> 
> > 			elf_buf = kzalloc(tsz, GFP_KERNEL);
> > 			if (!elf_buf)
> > 				return -ENOMEM;
> > 
> >-			read_lock(&vmlist_lock);
> >-			for (m=vmlist; m && cursize; m=m->next) {
> >-				unsigned long vmstart;
> >-				unsigned long vmsize;
> >-				unsigned long msize = m->size - PAGE_SIZE;
> >-
> >-				if (((unsigned long)m->addr + msize) < 
> >-								curstart)
> >-					continue;
> >-				if ((unsigned long)m->addr > (curstart + 
> >-								cursize))
> >-					break;
> >-				vmstart = (curstart < (unsigned long)m->addr ? 
> >-					(unsigned long)m->addr : curstart);
> >-				if (((unsigned long)m->addr + msize) > 
> >-							(curstart + cursize))
> >-					vmsize = curstart + cursize - vmstart;
> >-				else
> >-					vmsize = (unsigned long)m->addr + 
> >-							msize - vmstart;
> >-				curstart = vmstart + vmsize;
> >-				cursize -= vmsize;
> >-				/* don't dump ioremap'd stuff! (TA) */
> >-				if (m->flags & VM_IOREMAP)
> >-					continue;
> >-				memcpy(elf_buf + (vmstart - start),
> >-					(char *)vmstart, vmsize);
> >-			}
> >-			read_unlock(&vmlist_lock);
> >+			vread(elf_buf, start, tsz, true);
> > 			if (copy_to_user(buffer, elf_buf, tsz)) {
> > 				kfree(elf_buf);
> > 				return -EFAULT;
> >Index: linux-2.6.31-rc4/include/linux/vmalloc.h
> >===================================================================
> >--- linux-2.6.31-rc4.orig/include/linux/vmalloc.h
> >+++ linux-2.6.31-rc4/include/linux/vmalloc.h
> >@@ -105,7 +105,8 @@ extern struct vm_struct *alloc_vm_area(s
> > extern void free_vm_area(struct vm_struct *area);
> > 
> > /* for /dev/kmem */
> >-extern long vread(char *buf, char *addr, unsigned long count);
> >+extern long vread(char *buf, char *addr, unsigned long count,
> >+		  bool skip_ioremap);
> > extern long vwrite(char *buf, char *addr, unsigned long count);
> > 
> > /*
> >
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry
  2009-07-29  9:40           ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry Amerigo Wang
@ 2009-07-29 11:53             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-29 11:53 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	linux-kernel

On Wed, 29 Jul 2009 17:40:34 +0800
Amerigo Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, Jul 29, 2009 at 05:36:00PM +0900, KAMEZAWA Hiroyuki wrote:
> >> I wonder fs/proc/kcore.c's vmalloc area access needs some fix. let me try.
> >> 
> >> Thanks,
> >Finally, I wrote 2 patches. maybe 2/2 is a fix for the bug.
> >but needs comments.
> >
> >Comaparing 2 calls, vfree() and vread()
> >==
> >vfree()
> >  -> __vumap()
> >     -> remove_vm_area()
> >       -> free_unmap_vmap_area         #
> >	 -> try_purge_vmap_area_lazy() #
> >           ->__purge_vmap_area_lazy(); #
> >             -> unmap_vmap_area()      # unmap memory here.
> >       -> write_lock(&vmlist_lock)
> >          remvoe vm_struct from list
> >          write_unlock(&vmlist_lock).
> >==
> >vread()
> > -> read_lock(&vmlist_lock);
> >    get vm_struct -> do memcpy
> >    read_unlock(&vmlist_lock);
> >==
> 
> 
> I think this is a very good catch!
> 
> Your patch looks reasonable for me.
> 
> >
> >Hmm, maybe not related to original bug but above order should be fixed.
> >
> >From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> >vmap area should be purged after vm_struct is removed from the list
> >because vread/vwrite etc...believes the range is valid while it's on
> >vm_struct list.
> >
> >Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> 
> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
> 
> Aside, would you like to replace vmlist, a single list, with our
> generic list API? :) Just as you did for kcore. Pls send it in
> a seprate patch.
> 

I'll schedule it up after investigation.  (I'm not sure why private
list handle code is used.) and some pending patchs.

Thanks,
-Kame

> Thanks!
> 
> 
> >---
> > mm/vmalloc.c |   14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> >Index: linux-2.6.31-rc4/mm/vmalloc.c
> >===================================================================
> >--- linux-2.6.31-rc4.orig/mm/vmalloc.c
> >+++ linux-2.6.31-rc4/mm/vmalloc.c
> >@@ -1256,17 +1256,21 @@ struct vm_struct *remove_vm_area(const v
> > 	if (va && va->flags & VM_VM_AREA) {
> > 		struct vm_struct *vm = va->private;
> > 		struct vm_struct *tmp, **p;
> >-
> >-		vmap_debug_free_range(va->va_start, va->va_end);
> >-		free_unmap_vmap_area(va);
> >-		vm->size -= PAGE_SIZE;
> >-
> >+		/*
> >+		 * remove from list and disallow access to this vm_struct
> >+		 * before unmap. (address range confliction is maintained by
> >+		 * vmap.)
> >+		 */
> > 		write_lock(&vmlist_lock);
> > 		for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
> > 			;
> > 		*p = tmp->next;
> > 		write_unlock(&vmlist_lock);
> > 
> >+		vmap_debug_free_range(va->va_start, va->va_end);
> >+		free_unmap_vmap_area(va);
> >+		vm->size -= PAGE_SIZE;
> >+
> > 		return vm;
> > 	}
> > 	return NULL;
> >
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic  (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops
  2009-07-29  3:32       ` KAMEZAWA Hiroyuki
  2009-07-29  8:36         ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry KAMEZAWA Hiroyuki
@ 2009-07-31  7:07         ` KAMEZAWA Hiroyuki
  2009-07-31  7:11           ` [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole KAMEZAWA Hiroyuki
                             ` (3 more replies)
  1 sibling, 4 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-31  7:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

On Wed, 29 Jul 2009 12:32:09 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 28 Jul 2009 22:46:56 -0400
> Mike Smith <scgtrp@gmail.com> wrote:
> 
> > > What's layout of memory does your server have ?
> > The log I gave was from my desktop, so I'll assume you wanted that
> > instead of the server:
> > [mike: mike in ~]$ grep "System RAM" /proc/iomem
> > 00010000-0009efff : System RAM
> > 00100000-1dedffff : System RAM
> > 
> From this, your kernel's valid direct-map address range will be
> 
> c0010000-c009efff
> c0100000-ddedffff
> 
Ok, I reproduced the bug on x86-32 and here is a fix.

I reproduced the bug on x86-32 host with mem=512M boot option.

As I expected, the bug is from holes in vmalloc area.
(This hole means thera are a memory hole withing [start ...start+size-PAGE_SIZE) 
 of valid vm_struct.)

For review, I divided all into 3 patches. all series will be reply to this email.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole
  2009-07-31  7:07         ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
@ 2009-07-31  7:11           ` KAMEZAWA Hiroyuki
  2009-07-31  9:57             ` Amerigo Wang
  2009-07-31  7:12           ` [BUGFIX][PATCH 2/3] kcore should use vread KAMEZAWA Hiroyuki
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-31  7:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

vread/vwrite access vmalloc area without checking there is a page or not.

In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
( -PAGE_SIZE is for a guard page.)

After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
virtual address but remap _later_. There tend to be a hole in valid vmalloc
area in vm_struct lists.
Then, skip the hole (not mapped page) is necessary.
This patch updates vread/vwrite() for avoiding memory hole.

Routines which access vmalloc area without knowing for which addr is used
are
  - /proc/kcore
  - /dev/kmem

kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
checked and /dev/kmem will avoid to read/write it.
Fixes to /proc/kcore will be in the next patch in series.

And, this itself fixes the bug as
# dd if=/dev/kmem of=/dev/null bs=1024 count=1048576 skip=3145728
can cause panic.


Changelog v1->v2:
 - enhanced comments.
 - treat IOREMAP as hole always.
 - zero-fill memory hole if [addr...addr+size] includes valid pages.
 - returns 0 if [addr...addr+size) includes no valid pages.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmalloc.c |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 136 insertions(+), 21 deletions(-)

Index: linux-2.6.31-rc4/mm/vmalloc.c
===================================================================
--- linux-2.6.31-rc4.orig/mm/vmalloc.c	2009-07-31 14:06:41.000000000 +0900
+++ linux-2.6.31-rc4/mm/vmalloc.c	2009-07-31 15:51:56.000000000 +0900
@@ -1625,10 +1625,89 @@
 }
 EXPORT_SYMBOL(vmalloc_32_user);
 
+/*
+ * small helper routine , copy contents to buf from addr.
+ * If the page is not present, fill zero.
+ */
+
+static int aligned_vread(char *buf, char *addr, unsigned long count)
+{
+	struct page *p;
+	int copied;
+
+	while (count) {
+		unsigned long offset, length;
+
+		offset = (unsigned long)addr & PAGE_MASK;
+		length = PAGE_SIZE - offset;
+		if (length > count)
+			length = count;
+		p = vmalloc_to_page(addr);
+		if (p)
+			memcpy(buf, addr, length);
+		else
+			memset(buf, 0, length);
+		/* If no page, we fill 0 this area and incremetns buffer addr */
+		addr += length;
+		buf += length;
+		copied += length;
+		count -= length;
+	}
+	return copied;
+}
+
+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
+{
+	struct page *p;
+	int copied;
+
+	while (count) {
+		unsigned long offset, length;
+
+		offset = (unsigned long)addr & PAGE_MASK;
+		length = PAGE_SIZE - offset;
+		if (length > count)
+			length = count;
+		/* confirm the page is present */
+		p = vmalloc_to_page(addr);
+		if (p)
+			memcpy(addr, buf, length);
+		/* If no page, we skip this area but incremetns buffer addr */
+		addr += length;
+		buf += length;
+		copied += length;
+		count -= length;
+	}
+	return copied;
+}
+
+/**
+ *	vread() -  read vmalloc area in safe way.
+ *	@buf:		buffer for reading data
+ *	@addr:		vm address.
+ *	@count:		number of bytes to be read.
+ *
+ *	Returns # of bytes which addr and buf shuld be incremented
+ *	(same to count).
+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
+ *
+ *	This function checks that addr is a valid vmalloc'ed area, and
+ *	copy data from that area to given buffer. If given memory range of
+ *	[addr...addr+count) includes some valid address, data is copied to
+ *	proper area of @buf. If there are memory holes, they'll be zero-filled.
+ *	IOREMAP area is treated as memory hole and no copy is done.
+ *
+ *	Note: In usual ops, vread() is never necessary because the caller should
+ *	know vmalloc() area is valid and can use memcpy(). This is for routines
+ *	which have to access vmalloc area without any informaion, as /dev/kmem.
+ *
+ */
+
 long vread(char *buf, char *addr, unsigned long count)
 {
 	struct vm_struct *tmp;
 	char *vaddr, *buf_start = buf;
+	unsigned long buflen = count;
 	unsigned long n;
 
 	/* Don't allow overflow */
@@ -1636,7 +1715,7 @@
 		count = -(unsigned long) addr;
 
 	read_lock(&vmlist_lock);
-	for (tmp = vmlist; tmp; tmp = tmp->next) {
+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
 			continue;
@@ -1649,32 +1728,65 @@
 			count--;
 		}
 		n = vaddr + tmp->size - PAGE_SIZE - addr;
-		do {
-			if (count == 0)
-				goto finished;
-			*buf = *addr;
-			buf++;
-			addr++;
-			count--;
-		} while (--n > 0);
+		if (n > count)
+			n = count;
+		if (!(tmp->flags & VM_IOREMAP))
+			aligned_vread(buf, addr, n);
+		else /* IOREMAP area is treated as memory hole */
+			memset(buf, 0, n);
+		buf += n;
+		addr += n;
+		count -= n;
 	}
 finished:
 	read_unlock(&vmlist_lock);
-	return buf - buf_start;
+
+	if (buf == buf_start)
+		return 0;
+	/* zero-fill memory holes */
+	if (buf != buf_start + buflen)
+		memset(buf, 0, buflen - (buf - buf_start));
+
+	return buflen;
 }
 
+/**
+ *	vwrite() -  write vmalloc area in safe way.
+ *	@buf:		buffer for source data
+ *	@addr:		vm address.
+ *	@count:		number of bytes to be read.
+ *
+ *	Returns # of bytes which addr and buf shuld be incremented
+ *	(same to count).
+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
+ *
+ *	This function checks that addr is a valid vmalloc'ed area, and
+ *	copy data from that buffer to there. If given memory range of
+ *	[addr...addr+count) includes some valid address, data is copied from
+ *	proper area of @buf. If there are memory holes, no copy. just skip.
+ *	IOREMAP area is treated as memory hole and no copy is done.
+ *
+ *	Note: In usual ops, vwrite() is never necessary because the caller
+ *	should know vmalloc() area is valid and can use memcpy().
+ *	This is for routines which have to access vmalloc area without
+ *	any informaion, as /dev/kmem.
+ */
+
 long vwrite(char *buf, char *addr, unsigned long count)
 {
 	struct vm_struct *tmp;
-	char *vaddr, *buf_start = buf;
+	char *vaddr;
+	unsigned long buflen;
 	unsigned long n;
+	int copied = 0;
 
 	/* Don't allow overflow */
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
+	buflen = count;
 
 	read_lock(&vmlist_lock);
-	for (tmp = vmlist; tmp; tmp = tmp->next) {
+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
 			continue;
@@ -1686,18 +1798,21 @@
 			count--;
 		}
 		n = vaddr + tmp->size - PAGE_SIZE - addr;
-		do {
-			if (count == 0)
-				goto finished;
-			*addr = *buf;
-			buf++;
-			addr++;
-			count--;
-		} while (--n > 0);
+		if (n > count)
+			n = count;
+		if (!(tmp->flags & VM_IOREMAP)) {
+			aligned_vwrite(buf, addr, n);
+			copied++;
+		}
+		buf += n;
+		addr += n;
+		count -= n;
 	}
 finished:
 	read_unlock(&vmlist_lock);
-	return buf - buf_start;
+	if (!copied)
+		return 0;
+	return buflen;
 }
 
 /**


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 2/3] kcore should use vread
  2009-07-31  7:07         ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
  2009-07-31  7:11           ` [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole KAMEZAWA Hiroyuki
@ 2009-07-31  7:12           ` KAMEZAWA Hiroyuki
  2009-07-31  7:13           ` [BUGFIX][PATCH 3/3] unmap vmalloc area after hide it KAMEZAWA Hiroyuki
  2009-07-31  7:21           ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
  3 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-31  7:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

/proc/kcore has its own routine to access vmallc area. It can be replaced
with new vread(). And by this, /proc/kcore can do safe access to vmalloc area.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
---
 fs/proc/kcore.c |   35 ++---------------------------------
 1 file changed, 2 insertions(+), 33 deletions(-)

Index: linux-2.6.31-rc4/fs/proc/kcore.c
===================================================================
--- linux-2.6.31-rc4.orig/fs/proc/kcore.c	2009-06-10 12:05:27.000000000 +0900
+++ linux-2.6.31-rc4/fs/proc/kcore.c	2009-07-31 15:52:23.000000000 +0900
@@ -328,43 +328,12 @@
 				return -EFAULT;
 		} else if (is_vmalloc_addr((void *)start)) {
 			char * elf_buf;
-			struct vm_struct *m;
-			unsigned long curstart = start;
-			unsigned long cursize = tsz;
 
 			elf_buf = kzalloc(tsz, GFP_KERNEL);
 			if (!elf_buf)
 				return -ENOMEM;
-
-			read_lock(&vmlist_lock);
-			for (m=vmlist; m && cursize; m=m->next) {
-				unsigned long vmstart;
-				unsigned long vmsize;
-				unsigned long msize = m->size - PAGE_SIZE;
-
-				if (((unsigned long)m->addr + msize) < 
-								curstart)
-					continue;
-				if ((unsigned long)m->addr > (curstart + 
-								cursize))
-					break;
-				vmstart = (curstart < (unsigned long)m->addr ? 
-					(unsigned long)m->addr : curstart);
-				if (((unsigned long)m->addr + msize) > 
-							(curstart + cursize))
-					vmsize = curstart + cursize - vmstart;
-				else
-					vmsize = (unsigned long)m->addr + 
-							msize - vmstart;
-				curstart = vmstart + vmsize;
-				cursize -= vmsize;
-				/* don't dump ioremap'd stuff! (TA) */
-				if (m->flags & VM_IOREMAP)
-					continue;
-				memcpy(elf_buf + (vmstart - start),
-					(char *)vmstart, vmsize);
-			}
-			read_unlock(&vmlist_lock);
+			vread(elf_buf, start, tsz);
+			/* we have to zero-fill user buffer even if no read */
 			if (copy_to_user(buffer, elf_buf, tsz)) {
 				kfree(elf_buf);
 				return -EFAULT;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 3/3] unmap vmalloc area after hide it
  2009-07-31  7:07         ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
  2009-07-31  7:11           ` [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole KAMEZAWA Hiroyuki
  2009-07-31  7:12           ` [BUGFIX][PATCH 2/3] kcore should use vread KAMEZAWA Hiroyuki
@ 2009-07-31  7:13           ` KAMEZAWA Hiroyuki
  2009-07-31  7:21           ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
  3 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-31  7:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

vmap area should be purged after vm_struct is removed from the list
because vread/vwrite etc...believes the range is valid while it's on
vm_struct list.

Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmalloc.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6.31-rc4/mm/vmalloc.c
===================================================================
--- linux-2.6.31-rc4.orig/mm/vmalloc.c	2009-07-31 15:51:56.000000000 +0900
+++ linux-2.6.31-rc4/mm/vmalloc.c	2009-07-31 15:52:42.000000000 +0900
@@ -1256,17 +1256,21 @@
 	if (va && va->flags & VM_VM_AREA) {
 		struct vm_struct *vm = va->private;
 		struct vm_struct *tmp, **p;
-
-		vmap_debug_free_range(va->va_start, va->va_end);
-		free_unmap_vmap_area(va);
-		vm->size -= PAGE_SIZE;
-
+		/*
+		 * remove from list and disallow access to this vm_struct
+		 * before unmap. (address range confliction is maintained by
+		 * vmap.)
+		 */
 		write_lock(&vmlist_lock);
 		for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
 			;
 		*p = tmp->next;
 		write_unlock(&vmlist_lock);
 
+		vmap_debug_free_range(va->va_start, va->va_end);
+		free_unmap_vmap_area(va);
+		vm->size -= PAGE_SIZE;
+
 		return vm;
 	}
 	return NULL;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic  (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops
  2009-07-31  7:07         ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
                             ` (2 preceding siblings ...)
  2009-07-31  7:13           ` [BUGFIX][PATCH 3/3] unmap vmalloc area after hide it KAMEZAWA Hiroyuki
@ 2009-07-31  7:21           ` KAMEZAWA Hiroyuki
  3 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-31  7:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

On Fri, 31 Jul 2009 16:07:48 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 29 Jul 2009 12:32:09 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Tue, 28 Jul 2009 22:46:56 -0400
> > Mike Smith <scgtrp@gmail.com> wrote:
> > 
> > > > What's layout of memory does your server have ?
> > > The log I gave was from my desktop, so I'll assume you wanted that
> > > instead of the server:
> > > [mike: mike in ~]$ grep "System RAM" /proc/iomem
> > > 00010000-0009efff : System RAM
> > > 00100000-1dedffff : System RAM
> > > 
> > From this, your kernel's valid direct-map address range will be
> > 
> > c0010000-c009efff
> > c0100000-ddedffff
> > 
> Ok, I reproduced the bug on x86-32 and here is a fix.
> 
> I reproduced the bug on x86-32 host with mem=512M boot option.
> 
> As I expected, the bug is from holes in vmalloc area.
> (This hole means thera are a memory hole withing [start ...start+size-PAGE_SIZE) 
>  of valid vm_struct.)
> 
> For review, I divided all into 3 patches. all series will be reply to this email.
> 
A memo.

[1/3] fixes the bug in /dev/kmem, also.
      This causes machine check on my host (and one of cpu was disabled)
      # dd if=/dev/kmem of=/dev/null bs=1024 count=1048576 skip=3145728
      I hope all users of kmem are sane people...but the patch fixes this.
      (If kmem has to access IOREMAP area, please teach me .)

[2/3] fix for this /proc/kcore bug based on [1/3]

[3/3] fix for vread/vwrite race conditions.
      not related to this reproducable bug itself but a fix for potential bug.

Thanks,
-Kame




> Thanks,
> -Kame
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole.
  2009-07-29 11:51               ` KAMEZAWA Hiroyuki
@ 2009-07-31  9:38                 ` Amerigo Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Amerigo Wang @ 2009-07-31  9:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Amerigo Wang, Mike Smith, Andrew Morton, bugzilla-daemon,
	bugme-daemon, linux-kernel

On Wed, Jul 29, 2009 at 08:51:23PM +0900, KAMEZAWA Hiroyuki wrote:
>On Wed, 29 Jul 2009 17:59:32 +0800
>Amerigo Wang <xiyou.wangcong@gmail.com> wrote:
>
>> On Wed, Jul 29, 2009 at 05:48:10PM +0900, KAMEZAWA Hiroyuki wrote:
>> >Considering recent changes, I think this an answer to this bug.
>> >
>> >All get_vm_area() calls specified VM_IOREMAP before 2.6.30.
>> >Now, percpu calls get_vm_area() without IOREMAP (it means kcore doesn't skip this)
>> >And it seems pcpu does delayed map to pre-allocated vmalloc-area.
>> >Then, now, vmalloc() area has memory holes. (I'm sorry if I misunderstand.)
>> 
>> 
>> You mean the guard holes in vmalloc area?
>> 
>no, not guard hole.  memory hole in the middle of valid vmalloc area, like this
>
>start             end
>      XXXXX0XXXXXG   XXXXXXXXG  XXXXXXXXG
>X= pte is present
>O= pte is none
>G= pte is none as guard page.
>


Hmm, I get your meaning now... thanks


>> >+
>> >+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
>> >+{
>> >+	struct page *p;
>> >+	int copied;
>> >+
>> >+	while (count) {
>> >+		unsigned long offset, length;
>> >+
>> >+		offset = (unsinged long)addr & PAGE_MASK;
>> >+		length = PAGE_SIZE - offset;
>> >+		if (length > count)
>> >+			length = count;
>> >+		/* confirm the page is present */
>> >+		p = vmalloc_to_page(addr);
>> >+		if (p)
>> >+			memcpy(addr, buf, length);
>> 
>> 
>> So when !p, you copy nothing, but still increase the length,
>> *silently*?
>Ah, thanks, it's careless bug...will fix.
>
>
>
>> 
>> This looks a bit weird for me... I am thinking if we need
>> to return the number of bytes that is *actually* copied?
>> 
>I think it makes the caller complex.
>
>Then, I have 2 options.
> a) skip memory hole
> b) if we find memory hole, return immediately.
>
>Do you like b) ? 
>But in old behavior, vwrite()/vread() incremetns its "addr"
>(at read zero-fill).
>
>Then, I used a).


Ok, just want to make sure it's safe to do a).



>
>
>> 
>> >+		addr += length;
>> >+		buf += length;
>> >+		copiled += length;
>> >+		count -= length;
>> >+	}
>> >+	return copied;
>> >+}
>> >+
>> >+long vread(char *buf, char *addr, unsigned long count, bool skip_ioremap)
>> > {
>> > 	struct vm_struct *tmp;
>> > 	char *vaddr, *buf_start = buf;
>> >@@ -1640,10 +1694,11 @@ long vread(char *buf, char *addr, unsign
>> > 		count = -(unsigned long) addr;
>> > 
>> > 	read_lock(&vmlist_lock);
>> >-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>> >+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
>> > 		vaddr = (char *) tmp->addr;
>> > 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
>> > 			continue;
>> >+
>> > 		while (addr < vaddr) {
>> > 			if (count == 0)
>> > 				goto finished;
>> >@@ -1653,14 +1708,15 @@ long vread(char *buf, char *addr, unsign
>> > 			count--;
>> > 		}
>> > 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>> >-		do {
>> >-			if (count == 0)
>> >-				goto finished;
>> >-			*buf = *addr;
>> >-			buf++;
>> >-			addr++;
>> >-			count--;
>> >-		} while (--n > 0);
>> >+		if (n > count)
>> >+			n = count;
>> >+		if (!(tmp->flags & VM_IOREMAP) || !skip_ioremap)
>> >+			aligned_vread(buf, addr, n);
>> >+		else
>> >+			memset(buf, 0, n);
>> >+		buf += n;
>> >+		addr += n;
>> >+		count -= n;
>> 
>> 
>> If I set 'skip_ioremap' true, I want the return value can be
>> what it _actually_ copies, i.e. kick out the bytes counted
>> for VM_IOREMAP. What do you think?
>> 
>
>One one od the reasons is The same reason to above. 
>(adjusted to old behavior)
>
>IIUC, In following kind of codes,
>
>	len = vread(buf,addr,xxx);
>	bud += len;
>	addr += len
>
>len should be the length that the caller should increment buffer address.
>So, I just skipped hole.


Well... old code doesn't have the parameter 'skip_ioremap'. :-)
For new code, we can do:

  len = vread(buf, addr, count, true);
  buf += len;
  addr += count;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole
  2009-07-31  7:11           ` [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole KAMEZAWA Hiroyuki
@ 2009-07-31  9:57             ` Amerigo Wang
  2009-07-31 10:32               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 31+ messages in thread
From: Amerigo Wang @ 2009-07-31  9:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

On Fri, Jul 31, 2009 at 04:11:28PM +0900, KAMEZAWA Hiroyuki wrote:
>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>vread/vwrite access vmalloc area without checking there is a page or not.
>
>In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
>memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
>( -PAGE_SIZE is for a guard page.)
>
>After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
>virtual address but remap _later_. There tend to be a hole in valid vmalloc
>area in vm_struct lists.
>Then, skip the hole (not mapped page) is necessary.
>This patch updates vread/vwrite() for avoiding memory hole.
>
>Routines which access vmalloc area without knowing for which addr is used
>are
>  - /proc/kcore
>  - /dev/kmem
>
>kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
>checked and /dev/kmem will avoid to read/write it.
>Fixes to /proc/kcore will be in the next patch in series.
>
>And, this itself fixes the bug as
># dd if=/dev/kmem of=/dev/null bs=1024 count=1048576 skip=3145728
>can cause panic.


What panic? :-) Would you mind to put it here?


>
>
>Changelog v1->v2:
> - enhanced comments.
> - treat IOREMAP as hole always.
> - zero-fill memory hole if [addr...addr+size] includes valid pages.
> - returns 0 if [addr...addr+size) includes no valid pages.
>
>Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


I am sorry that there are still some typos below.


>---
> mm/vmalloc.c |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 136 insertions(+), 21 deletions(-)
>
>Index: linux-2.6.31-rc4/mm/vmalloc.c
>===================================================================
>--- linux-2.6.31-rc4.orig/mm/vmalloc.c	2009-07-31 14:06:41.000000000 +0900
>+++ linux-2.6.31-rc4/mm/vmalloc.c	2009-07-31 15:51:56.000000000 +0900
>@@ -1625,10 +1625,89 @@
> }
> EXPORT_SYMBOL(vmalloc_32_user);
> 
>+/*
>+ * small helper routine , copy contents to buf from addr.
>+ * If the page is not present, fill zero.
>+ */
>+
>+static int aligned_vread(char *buf, char *addr, unsigned long count)
>+{
>+	struct page *p;
>+	int copied;


Forgot to initialize it??

>+
>+	while (count) {
>+		unsigned long offset, length;
>+
>+		offset = (unsigned long)addr & PAGE_MASK;
>+		length = PAGE_SIZE - offset;
>+		if (length > count)
>+			length = count;
>+		p = vmalloc_to_page(addr);
>+		if (p)
>+			memcpy(buf, addr, length);
>+		else
>+			memset(buf, 0, length);
>+		/* If no page, we fill 0 this area and incremetns buffer addr */


s/incremetns/increments/


>+		addr += length;
>+		buf += length;
>+		copied += length;
>+		count -= length;
>+	}
>+	return copied;
>+}
>+
>+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
>+{
>+	struct page *p;
>+	int copied;


ditto

>+
>+	while (count) {
>+		unsigned long offset, length;
>+
>+		offset = (unsigned long)addr & PAGE_MASK;
>+		length = PAGE_SIZE - offset;
>+		if (length > count)
>+			length = count;
>+		/* confirm the page is present */
>+		p = vmalloc_to_page(addr);
>+		if (p)
>+			memcpy(addr, buf, length);
>+		/* If no page, we skip this area but incremetns buffer addr */


ditto


>+		addr += length;
>+		buf += length;
>+		copied += length;
>+		count -= length;
>+	}
>+	return copied;
>+}
>+
>+/**
>+ *	vread() -  read vmalloc area in safe way.

in _a_ safe way...


>+ *	@buf:		buffer for reading data
>+ *	@addr:		vm address.
>+ *	@count:		number of bytes to be read.
>+ *
>+ *	Returns # of bytes which addr and buf shuld be incremented

s/shuld/should/

s/incremented/increased/


>+ *	(same to count).
>+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
>+ *
>+ *	This function checks that addr is a valid vmalloc'ed area, and
>+ *	copy data from that area to given buffer. If given memory range of


to _a_ given buffer. If _the_ given memory...


>+ *	[addr...addr+count) includes some valid address, data is copied to
>+ *	proper area of @buf. If there are memory holes, they'll be zero-filled.
>+ *	IOREMAP area is treated as memory hole and no copy is done.
>+ *
>+ *	Note: In usual ops, vread() is never necessary because the caller should
>+ *	know vmalloc() area is valid and can use memcpy(). This is for routines
>+ *	which have to access vmalloc area without any informaion, as /dev/kmem.
>+ *
>+ */
>+
> long vread(char *buf, char *addr, unsigned long count)
> {
> 	struct vm_struct *tmp;
> 	char *vaddr, *buf_start = buf;
>+	unsigned long buflen = count;
> 	unsigned long n;
> 
> 	/* Don't allow overflow */
>@@ -1636,7 +1715,7 @@
> 		count = -(unsigned long) addr;
> 
> 	read_lock(&vmlist_lock);
>-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> 		vaddr = (char *) tmp->addr;
> 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> 			continue;
>@@ -1649,32 +1728,65 @@
> 			count--;
> 		}
> 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>-		do {
>-			if (count == 0)
>-				goto finished;
>-			*buf = *addr;
>-			buf++;
>-			addr++;
>-			count--;
>-		} while (--n > 0);
>+		if (n > count)
>+			n = count;
>+		if (!(tmp->flags & VM_IOREMAP))
>+			aligned_vread(buf, addr, n);
>+		else /* IOREMAP area is treated as memory hole */
>+			memset(buf, 0, n);
>+		buf += n;
>+		addr += n;
>+		count -= n;
> 	}
> finished:
> 	read_unlock(&vmlist_lock);
>-	return buf - buf_start;
>+
>+	if (buf == buf_start)
>+		return 0;
>+	/* zero-fill memory holes */
>+	if (buf != buf_start + buflen)
>+		memset(buf, 0, buflen - (buf - buf_start));


Is this necessary?

>+
>+	return buflen;
> }
> 
>+/**
>+ *	vwrite() -  write vmalloc area in safe way.
>+ *	@buf:		buffer for source data
>+ *	@addr:		vm address.
>+ *	@count:		number of bytes to be read.
>+ *
>+ *	Returns # of bytes which addr and buf shuld be incremented
>+ *	(same to count).
>+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
>+ *
>+ *	This function checks that addr is a valid vmalloc'ed area, and
>+ *	copy data from that buffer to there. If given memory range of
>+ *	[addr...addr+count) includes some valid address, data is copied from
>+ *	proper area of @buf. If there are memory holes, no copy. just skip.
>+ *	IOREMAP area is treated as memory hole and no copy is done.
>+ *
>+ *	Note: In usual ops, vwrite() is never necessary because the caller
>+ *	should know vmalloc() area is valid and can use memcpy().
>+ *	This is for routines which have to access vmalloc area without
>+ *	any informaion, as /dev/kmem.
>+ */


ditto

>+
> long vwrite(char *buf, char *addr, unsigned long count)
> {
> 	struct vm_struct *tmp;
>-	char *vaddr, *buf_start = buf;
>+	char *vaddr;
>+	unsigned long buflen;
> 	unsigned long n;
>+	int copied = 0;
> 
> 	/* Don't allow overflow */
> 	if ((unsigned long) addr + count < count)
> 		count = -(unsigned long) addr;
>+	buflen = count;
> 
> 	read_lock(&vmlist_lock);
>-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> 		vaddr = (char *) tmp->addr;
> 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> 			continue;
>@@ -1686,18 +1798,21 @@
> 			count--;
> 		}
> 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>-		do {
>-			if (count == 0)
>-				goto finished;
>-			*addr = *buf;
>-			buf++;
>-			addr++;
>-			count--;
>-		} while (--n > 0);
>+		if (n > count)
>+			n = count;
>+		if (!(tmp->flags & VM_IOREMAP)) {
>+			aligned_vwrite(buf, addr, n);
>+			copied++;
>+		}
>+		buf += n;
>+		addr += n;
>+		count -= n;
> 	}
> finished:
> 	read_unlock(&vmlist_lock);
>-	return buf - buf_start;
>+	if (!copied)
>+		return 0;
>+	return buflen;
> }
> 
> /**
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole
  2009-07-31  9:57             ` Amerigo Wang
@ 2009-07-31 10:32               ` KAMEZAWA Hiroyuki
  2009-08-03  9:10                 ` Amerigo Wang
  0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-31 10:32 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: KAMEZAWA Hiroyuki, Mike Smith, Andrew Morton, bugzilla-daemon,
	bugme-daemon, Amerigo Wang, linux-kernel

Amerigo Wang さんは書きました:
> On Fri, Jul 31, 2009 at 04:11:28PM +0900, KAMEZAWA Hiroyuki wrote:
>>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>>vread/vwrite access vmalloc area without checking there is a page or not.
>>
>>In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
>>memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
>>( -PAGE_SIZE is for a guard page.)
>>
>>After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
>>virtual address but remap _later_. There tend to be a hole in valid
>> vmalloc
>>area in vm_struct lists.
>>Then, skip the hole (not mapped page) is necessary.
>>This patch updates vread/vwrite() for avoiding memory hole.
>>
>>Routines which access vmalloc area without knowing for which addr is used
>>are
>>  - /proc/kcore
>>  - /dev/kmem
>>
>>kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
>>checked and /dev/kmem will avoid to read/write it.
>>Fixes to /proc/kcore will be in the next patch in series.
>>
>>And, this itself fixes the bug as
>># dd if=/dev/kmem of=/dev/null bs=1024 count=1048576 skip=3145728
>>can cause panic.
>
>
> What panic? :-) Would you mind to put it here?
>
It directly reboot ;( and no log.
plz try.


>
>>
>>
>>Changelog v1->v2:
>> - enhanced comments.
>> - treat IOREMAP as hole always.
>> - zero-fill memory hole if [addr...addr+size] includes valid pages.
>> - returns 0 if [addr...addr+size) includes no valid pages.
>>
>>Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>
> I am sorry that there are still some typos below.
>
Ouch, thanks.


>
>>---
>> mm/vmalloc.c |  157
>> +++++++++++++++++++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 136 insertions(+), 21 deletions(-)
>>
>>Index: linux-2.6.31-rc4/mm/vmalloc.c
>>===================================================================
>>--- linux-2.6.31-rc4.orig/mm/vmalloc.c	2009-07-31 14:06:41.000000000
>> +0900
>>+++ linux-2.6.31-rc4/mm/vmalloc.c	2009-07-31 15:51:56.000000000 +0900
>>@@ -1625,10 +1625,89 @@
>> }
>> EXPORT_SYMBOL(vmalloc_32_user);
>>
>>+/*
>>+ * small helper routine , copy contents to buf from addr.
>>+ * If the page is not present, fill zero.
>>+ */
>>+
>>+static int aligned_vread(char *buf, char *addr, unsigned long count)
>>+{
>>+	struct page *p;
>>+	int copied;
>
>
> Forgot to initialize it??
>
Ah, yes. should be fixed.

>>+
>>+	while (count) {
>>+		unsigned long offset, length;
>>+
>>+		offset = (unsigned long)addr & PAGE_MASK;
>>+		length = PAGE_SIZE - offset;
>>+		if (length > count)
>>+			length = count;
>>+		p = vmalloc_to_page(addr);
>>+		if (p)
>>+			memcpy(buf, addr, length);
>>+		else
>>+			memset(buf, 0, length);
>>+		/* If no page, we fill 0 this area and incremetns buffer addr */
>
>
> s/incremetns/increments/
>
ok.

>
>>+		addr += length;
>>+		buf += length;
>>+		copied += length;
>>+		count -= length;
>>+	}
>>+	return copied;
>>+}
>>+
>>+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
>>+{
>>+	struct page *p;
>>+	int copied;
>
>
> ditto
>
Hmm..I though I removed all warnings..but it was an illusion..
ok, will fix.

>>+
>>+	while (count) {
>>+		unsigned long offset, length;
>>+
>>+		offset = (unsigned long)addr & PAGE_MASK;
>>+		length = PAGE_SIZE - offset;
>>+		if (length > count)
>>+			length = count;
>>+		/* confirm the page is present */
>>+		p = vmalloc_to_page(addr);
>>+		if (p)
>>+			memcpy(addr, buf, length);
>>+		/* If no page, we skip this area but incremetns buffer addr */
>
>
> ditto
>
ok

>
>>+		addr += length;
>>+		buf += length;
>>+		copied += length;
>>+		count -= length;
>>+	}
>>+	return copied;
>>+}
>>+
>>+/**
>>+ *	vread() -  read vmalloc area in safe way.
>
> in _a_ safe way...
>
ok.

>
>>+ *	@buf:		buffer for reading data
>>+ *	@addr:		vm address.
>>+ *	@count:		number of bytes to be read.
>>+ *
>>+ *	Returns # of bytes which addr and buf shuld be incremented
>
> s/shuld/should/
>
> s/incremented/increased/
>
ok.

>
>>+ *	(same to count).
>>+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
>>+ *
>>+ *	This function checks that addr is a valid vmalloc'ed area, and
>>+ *	copy data from that area to given buffer. If given memory range of
>
>
> to _a_ given buffer. If _the_ given memory...
>
ok.

>
>>+ *	[addr...addr+count) includes some valid address, data is copied to
>>+ *	proper area of @buf. If there are memory holes, they'll be
>> zero-filled.
>>+ *	IOREMAP area is treated as memory hole and no copy is done.
>>+ *
>>+ *	Note: In usual ops, vread() is never necessary because the caller
>> should
>>+ *	know vmalloc() area is valid and can use memcpy(). This is for
>> routines
>>+ *	which have to access vmalloc area without any informaion, as
>> /dev/kmem.
>>+ *
>>+ */
>>+
>> long vread(char *buf, char *addr, unsigned long count)
>> {
>> 	struct vm_struct *tmp;
>> 	char *vaddr, *buf_start = buf;
>>+	unsigned long buflen = count;
>> 	unsigned long n;
>>
>> 	/* Don't allow overflow */
>>@@ -1636,7 +1715,7 @@
>> 		count = -(unsigned long) addr;
>>
>> 	read_lock(&vmlist_lock);
>>-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>>+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
>> 		vaddr = (char *) tmp->addr;
>> 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
>> 			continue;
>>@@ -1649,32 +1728,65 @@
>> 			count--;
>> 		}
>> 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>>-		do {
>>-			if (count == 0)
>>-				goto finished;
>>-			*buf = *addr;
>>-			buf++;
>>-			addr++;
>>-			count--;
>>-		} while (--n > 0);
>>+		if (n > count)
>>+			n = count;
>>+		if (!(tmp->flags & VM_IOREMAP))
>>+			aligned_vread(buf, addr, n);
>>+		else /* IOREMAP area is treated as memory hole */
>>+			memset(buf, 0, n);
>>+		buf += n;
>>+		addr += n;
>>+		count -= n;
>> 	}
>> finished:
>> 	read_unlock(&vmlist_lock);
>>-	return buf - buf_start;
>>+
>>+	if (buf == buf_start)
>>+		return 0;
>>+	/* zero-fill memory holes */
>>+	if (buf != buf_start + buflen)
>>+		memset(buf, 0, buflen - (buf - buf_start));
>
>
> Is this necessary?
>
I wrote "filled by zero" and thinks it's sane, then does this.
Now, /proc/kcore allocates memory by kzalloc(), this is redundant.
/dev/kmem doesn't do that.


>>+
>>+	return buflen;
>> }
>>
>>+/**
>>+ *	vwrite() -  write vmalloc area in safe way.
>>+ *	@buf:		buffer for source data
>>+ *	@addr:		vm address.
>>+ *	@count:		number of bytes to be read.
>>+ *
>>+ *	Returns # of bytes which addr and buf shuld be incremented
>>+ *	(same to count).
>>+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
>>+ *
>>+ *	This function checks that addr is a valid vmalloc'ed area, and
>>+ *	copy data from that buffer to there. If given memory range of
>>+ *	[addr...addr+count) includes some valid address, data is copied from
>>+ *	proper area of @buf. If there are memory holes, no copy. just skip.
>>+ *	IOREMAP area is treated as memory hole and no copy is done.
>>+ *
>>+ *	Note: In usual ops, vwrite() is never necessary because the caller
>>+ *	should know vmalloc() area is valid and can use memcpy().
>>+ *	This is for routines which have to access vmalloc area without
>>+ *	any informaion, as /dev/kmem.
>>+ */
>
>
> ditto
>
ouch..

Thank you for review.
I'm sorry that new version will not appear until next week.
I can't access x86-32 in the weekend.

Regards,
-Kame


>>+
>> long vwrite(char *buf, char *addr, unsigned long count)
>> {
>> 	struct vm_struct *tmp;
>>-	char *vaddr, *buf_start = buf;
>>+	char *vaddr;
>>+	unsigned long buflen;
>> 	unsigned long n;
>>+	int copied = 0;
>>
>> 	/* Don't allow overflow */
>> 	if ((unsigned long) addr + count < count)
>> 		count = -(unsigned long) addr;
>>+	buflen = count;
>>
>> 	read_lock(&vmlist_lock);
>>-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>>+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
>> 		vaddr = (char *) tmp->addr;
>> 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
>> 			continue;
>>@@ -1686,18 +1798,21 @@
>> 			count--;
>> 		}
>> 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>>-		do {
>>-			if (count == 0)
>>-				goto finished;
>>-			*addr = *buf;
>>-			buf++;
>>-			addr++;
>>-			count--;
>>-		} while (--n > 0);
>>+		if (n > count)
>>+			n = count;
>>+		if (!(tmp->flags & VM_IOREMAP)) {
>>+			aligned_vwrite(buf, addr, n);
>>+			copied++;
>>+		}
>>+		buf += n;
>>+		addr += n;
>>+		count -= n;
>> 	}
>> finished:
>> 	read_unlock(&vmlist_lock);
>>-	return buf - buf_start;
>>+	if (!copied)
>>+		return 0;
>>+	return buflen;
>> }
>>
>> /**
>>
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole
  2009-07-31 10:32               ` KAMEZAWA Hiroyuki
@ 2009-08-03  9:10                 ` Amerigo Wang
  2009-08-03  9:10                   ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 31+ messages in thread
From: Amerigo Wang @ 2009-08-03  9:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Amerigo Wang, Mike Smith, Andrew Morton, bugzilla-daemon,
	bugme-daemon, linux-kernel

On Fri, Jul 31, 2009 at 07:32:15PM +0900, KAMEZAWA Hiroyuki wrote:
>Amerigo Wang さんは書きました:
>> On Fri, Jul 31, 2009 at 04:11:28PM +0900, KAMEZAWA Hiroyuki wrote:
>>>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>>
>>>vread/vwrite access vmalloc area without checking there is a page or not.
>>>
>>>In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
>>>memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
>>>( -PAGE_SIZE is for a guard page.)
>>>
>>>After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
>>>virtual address but remap _later_. There tend to be a hole in valid
>>> vmalloc
>>>area in vm_struct lists.
>>>Then, skip the hole (not mapped page) is necessary.
>>>This patch updates vread/vwrite() for avoiding memory hole.
>>>
>>>Routines which access vmalloc area without knowing for which addr is used
>>>are
>>>  - /proc/kcore
>>>  - /dev/kmem
>>>
>>>kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
>>>checked and /dev/kmem will avoid to read/write it.
>>>Fixes to /proc/kcore will be in the next patch in series.
>>>
>>>And, this itself fixes the bug as
>>># dd if=/dev/kmem of=/dev/null bs=1024 count=1048576 skip=3145728
>>>can cause panic.
>>
>>
>> What panic? :-) Would you mind to put it here?
>>
>It directly reboot ;( and no log.
>plz try.


I tried it on an x86_64 machine, no panic, just:

 dd: reading `/dev/kmem': Bad address

Only appears on i386? :)

>>>-	return buf - buf_start;
>>>+
>>>+	if (buf == buf_start)
>>>+		return 0;
>>>+	/* zero-fill memory holes */
>>>+	if (buf != buf_start + buflen)
>>>+		memset(buf, 0, buflen - (buf - buf_start));
>>
>>
>> Is this necessary?
>>
>I wrote "filled by zero" and thinks it's sane, then does this.
>Now, /proc/kcore allocates memory by kzalloc(), this is redundant.
>/dev/kmem doesn't do that.

OK.


>ouch..
>
>Thank you for review.
>I'm sorry that new version will not appear until next week.
>I can't access x86-32 in the weekend.

No problem, feel free to do it at any time that is convinient
for you.

Thanks.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole
  2009-08-03  9:10                 ` Amerigo Wang
@ 2009-08-03  9:10                   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-03  9:10 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	linux-kernel

On Mon, 3 Aug 2009 17:10:19 +0800
Amerigo Wang <xiyou.wangcong@gmail.com> wrote:

> On Fri, Jul 31, 2009 at 07:32:15PM +0900, KAMEZAWA Hiroyuki wrote:
> >Amerigo Wang さんは書きました:
> >> On Fri, Jul 31, 2009 at 04:11:28PM +0900, KAMEZAWA Hiroyuki wrote:
> >>>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >>>
> >>>vread/vwrite access vmalloc area without checking there is a page or not.
> >>>
> >>>In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
> >>>memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
> >>>( -PAGE_SIZE is for a guard page.)
> >>>
> >>>After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
> >>>virtual address but remap _later_. There tend to be a hole in valid
> >>> vmalloc
> >>>area in vm_struct lists.
> >>>Then, skip the hole (not mapped page) is necessary.
> >>>This patch updates vread/vwrite() for avoiding memory hole.
> >>>
> >>>Routines which access vmalloc area without knowing for which addr is used
> >>>are
> >>>  - /proc/kcore
> >>>  - /dev/kmem
> >>>
> >>>kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
> >>>checked and /dev/kmem will avoid to read/write it.
> >>>Fixes to /proc/kcore will be in the next patch in series.
> >>>
> >>>And, this itself fixes the bug as
> >>># dd if=/dev/kmem of=/dev/null bs=1024 count=1048576 skip=3145728
> >>>can cause panic.
> >>
> >>
> >> What panic? :-) Would you mind to put it here?
> >>
> >It directly reboot ;( and no log.
> >plz try.
> 
> 
> I tried it on an x86_64 machine, no panic, just:
> 
>  dd: reading `/dev/kmem': Bad address
> 
> Only appears on i386? :)
> 
I tested on i386/2cpu server.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 0/3] kcore: avoid access to holes in vmalloc v3 (Was Re: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops
  2009-07-29  2:46     ` Mike Smith
  2009-07-29  3:32       ` KAMEZAWA Hiroyuki
@ 2009-08-03 11:14       ` KAMEZAWA Hiroyuki
  2009-08-03 11:15         ` [BUGFIX][PATCH 1/3] unmap vmalloc area after hide it KAMEZAWA Hiroyuki
                           ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-03 11:14 UTC (permalink / raw)
  To: Mike Smith
  Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, Amerigo Wang,
	linux-kernel

On Tue, 28 Jul 2009 22:46:56 -0400
Mike Smith <scgtrp@gmail.com> wrote:

> > What's layout of memory does your server have ?
> The log I gave was from my desktop, so I'll assume you wanted that
> instead of the server:
> [mike: mike in ~]$ grep "System RAM" /proc/iomem
> 00010000-0009efff : System RAM
> 00100000-1dedffff : System RAM
> 

This is v3. updated against 2.6.31-rc5 and several fixes in addition
to type fixes. patches are reordered again but nochanges in concept.


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 1/3] unmap vmalloc area after hide it
  2009-08-03 11:14       ` [BUGFIX][PATCH 0/3] kcore: avoid access to holes in vmalloc v3 (Was " KAMEZAWA Hiroyuki
@ 2009-08-03 11:15         ` KAMEZAWA Hiroyuki
  2009-08-03 11:18         ` [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes KAMEZAWA Hiroyuki
  2009-08-03 11:19         ` [BUGFIX][PATCH 3/3] kcore: /proc/kcore should use vread KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-03 11:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

This is necessary to garantee sane access of vread/vwrite.

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

vmap area should be purged after vm_struct is removed from the list
because vread/vwrite etc...believes the range is valid while it's on
vm_struct list.

Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmalloc.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6.31-rc4/mm/vmalloc.c
===================================================================
--- linux-2.6.31-rc4.orig/mm/vmalloc.c	2009-07-31 15:51:56.000000000 +0900
+++ linux-2.6.31-rc4/mm/vmalloc.c	2009-07-31 15:52:42.000000000 +0900
@@ -1256,17 +1256,21 @@
 	if (va && va->flags & VM_VM_AREA) {
 		struct vm_struct *vm = va->private;
 		struct vm_struct *tmp, **p;
-
-		vmap_debug_free_range(va->va_start, va->va_end);
-		free_unmap_vmap_area(va);
-		vm->size -= PAGE_SIZE;
-
+		/*
+		 * remove from list and disallow access to this vm_struct
+		 * before unmap. (address range confliction is maintained by
+		 * vmap.)
+		 */
 		write_lock(&vmlist_lock);
 		for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
 			;
 		*p = tmp->next;
 		write_unlock(&vmlist_lock);
 
+		vmap_debug_free_range(va->va_start, va->va_end);
+		free_unmap_vmap_area(va);
+		vm->size -= PAGE_SIZE;
+
 		return vm;
 	}
 	return NULL;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-03 11:14       ` [BUGFIX][PATCH 0/3] kcore: avoid access to holes in vmalloc v3 (Was " KAMEZAWA Hiroyuki
  2009-08-03 11:15         ` [BUGFIX][PATCH 1/3] unmap vmalloc area after hide it KAMEZAWA Hiroyuki
@ 2009-08-03 11:18         ` KAMEZAWA Hiroyuki
  2009-08-04  9:23           ` Amerigo Wang
  2009-08-04 21:30           ` Andrew Morton
  2009-08-03 11:19         ` [BUGFIX][PATCH 3/3] kcore: /proc/kcore should use vread KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-03 11:18 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

vread/vwrite access vmalloc area without checking there is a page or not.
In most case, this works well.

In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
( -PAGE_SIZE is for a guard page.)

After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
virtual address but remap _later_. There tend to be a hole in valid vmalloc
area in vm_struct lists.
Then, skip the hole (not mapped page) is necessary.
This patch updates vread/vwrite() for avoiding memory hole.

Routines which access vmalloc area without knowing for which addr is used
are
  - /proc/kcore
  - /dev/kmem

kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
checked and /dev/kmem will avoid to read/write it.
Fixes to /proc/kcore will be in the next patch in series.

Changelog v2->v3:
 - fixed typos.
 - use kmap. (if not using kmap, we have to add lock here.)
 - fixed PAGE_MASK miss-use.
Changelog v1->v2:
 - enhanced comments.
 - treat IOREMAP as hole always.
 - zero-fill memory hole if [addr...addr+size] includes valid pages.
 - returns 0 if [addr...addr+size) includes no valid pages.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmalloc.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 159 insertions(+), 23 deletions(-)

Index: linux-2.6.31-rc5/mm/vmalloc.c
===================================================================
--- linux-2.6.31-rc5.orig/mm/vmalloc.c	2009-08-03 18:56:33.000000000 +0900
+++ linux-2.6.31-rc5/mm/vmalloc.c	2009-08-03 19:54:39.000000000 +0900
@@ -25,7 +25,7 @@
 #include <linux/rcupdate.h>
 #include <linux/pfn.h>
 #include <linux/kmemleak.h>
-
+#include <linux/highmem.h>
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
 #include <asm/tlbflush.h>
@@ -1629,10 +1629,109 @@
 }
 EXPORT_SYMBOL(vmalloc_32_user);
 
+/*
+ * small helper routine , copy contents to buf from addr.
+ * If the page is not present, fill zero.
+ */
+
+static int aligned_vread(char *buf, char *addr, unsigned long count)
+{
+	struct page *p;
+	int copied = 0;
+
+	while (count) {
+		unsigned long offset, length;
+
+		offset = (unsigned long)addr & ~PAGE_MASK;
+		length = PAGE_SIZE - offset;
+		if (length > count)
+			length = count;
+		p = vmalloc_to_page(addr);
+		/*
+		 * To do safe access to this _mapped_ area, we need
+		 * lock. But adding lock here means that we need to add
+		 * overhead of vmalloc()/vfree() calles for this _debug_
+		 * interface, rarely used. Instead of that, we'll use
+		 * kmap() and get small overhead in this access function.
+		 */
+		if (p) {
+			/* we can expect USR1 is not used */
+			void *map = kmap_atomic(p, KM_USER1);
+			memcpy(buf, map + offset, length);
+			kunmap_atomic(map, KM_USER1);
+		} else
+			memset(buf, 0, length);
+
+		addr += length;
+		buf += length;
+		copied += length;
+		count -= length;
+	}
+	return copied;
+}
+
+static int aligned_vwrite(char *buf, char *addr, unsigned long count)
+{
+	struct page *p;
+	int copied = 0;
+
+	while (count) {
+		unsigned long offset, length;
+
+		offset = (unsigned long)addr & ~PAGE_MASK;
+		length = PAGE_SIZE - offset;
+		if (length > count)
+			length = count;
+		p = vmalloc_to_page(addr);
+		/*
+		 * To do safe access to this _mapped_ area, we need
+		 * lock. But adding lock here means that we need to add
+		 * overhead of vmalloc()/vfree() calles for this _debug_
+		 * interface, rarely used. Instead of that, we'll use
+		 * kmap() and get small overhead in this access function.
+		 */
+		if (p) {
+			/* we can expect USR1 is not used */
+			void *map = kmap_atomic(p, KM_USER1);
+			memcpy(map + offset, buf, length);
+			kunmap_atomic(map, KM_USER1);
+		}
+		addr += length;
+		buf += length;
+		copied += length;
+		count -= length;
+	}
+	return copied;
+}
+
+/**
+ *	vread() -  read vmalloc area in a safe way.
+ *	@buf:		buffer for reading data
+ *	@addr:		vm address.
+ *	@count:		number of bytes to be read.
+ *
+ *	Returns # of bytes which addr and buf should be increased.
+ *	(same to count).
+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
+ *
+ *	This function checks that addr is a valid vmalloc'ed area, and
+ *	copy data from that area to a given buffer. If the given memory range of
+ *	[addr...addr+count) includes some valid address, data is copied to
+ *	proper area of @buf. If there are memory holes, they'll be zero-filled.
+ *	IOREMAP area is treated as memory hole and no copy is done.
+ *
+ *	Note: In usual ops, vread() is never necessary because the caller should
+ *	know vmalloc() area is valid and can use memcpy(). This is for routines
+ *	which have to access vmalloc area without any informaion, as /dev/kmem.
+ *
+ *	The caller should guarantee KM_USER1 is not used.
+ */
+
 long vread(char *buf, char *addr, unsigned long count)
 {
 	struct vm_struct *tmp;
 	char *vaddr, *buf_start = buf;
+	unsigned long buflen = count;
 	unsigned long n;
 
 	/* Don't allow overflow */
@@ -1640,7 +1739,7 @@
 		count = -(unsigned long) addr;
 
 	read_lock(&vmlist_lock);
-	for (tmp = vmlist; tmp; tmp = tmp->next) {
+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
 			continue;
@@ -1653,32 +1752,66 @@
 			count--;
 		}
 		n = vaddr + tmp->size - PAGE_SIZE - addr;
-		do {
-			if (count == 0)
-				goto finished;
-			*buf = *addr;
-			buf++;
-			addr++;
-			count--;
-		} while (--n > 0);
+		if (n > count)
+			n = count;
+		if (!(tmp->flags & VM_IOREMAP))
+			aligned_vread(buf, addr, n);
+		else /* IOREMAP area is treated as memory hole */
+			memset(buf, 0, n);
+		buf += n;
+		addr += n;
+		count -= n;
 	}
 finished:
 	read_unlock(&vmlist_lock);
-	return buf - buf_start;
+
+	if (buf == buf_start)
+		return 0;
+	/* zero-fill memory holes */
+	if (buf != buf_start + buflen)
+		memset(buf, 0, buflen - (buf - buf_start));
+
+	return buflen;
 }
 
+/**
+ *	vwrite() -  write vmalloc area in a safe way.
+ *	@buf:		buffer for source data
+ *	@addr:		vm address.
+ *	@count:		number of bytes to be read.
+ *
+ *	Returns # of bytes which addr and buf should be incresed.
+ *	(same to count).
+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
+ *
+ *	This function checks that addr is a valid vmalloc'ed area, and
+ *	copy data from a buffer to the given addr. If specified range of
+ *	[addr...addr+count) includes some valid address, data is copied from
+ *	proper area of @buf. If there are memory holes, no copy to hole.
+ *	IOREMAP area is treated as memory hole and no copy is done.
+ *
+ *	Note: In usual ops, vwrite() is never necessary because the caller
+ *	should know vmalloc() area is valid and can use memcpy().
+ *	This is for routines which have to access vmalloc area without
+ *	any informaion, as /dev/kmem.
+ *
+ *	The caller should guarantee KM_USER1 is not used.
+ */
+
 long vwrite(char *buf, char *addr, unsigned long count)
 {
 	struct vm_struct *tmp;
-	char *vaddr, *buf_start = buf;
-	unsigned long n;
+	char *vaddr;
+	unsigned long n, buflen;
+	int copied = 0;
 
 	/* Don't allow overflow */
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
+	buflen = count;
 
 	read_lock(&vmlist_lock);
-	for (tmp = vmlist; tmp; tmp = tmp->next) {
+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
 		vaddr = (char *) tmp->addr;
 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
 			continue;
@@ -1690,18 +1823,21 @@
 			count--;
 		}
 		n = vaddr + tmp->size - PAGE_SIZE - addr;
-		do {
-			if (count == 0)
-				goto finished;
-			*addr = *buf;
-			buf++;
-			addr++;
-			count--;
-		} while (--n > 0);
+		if (n > count)
+			n = count;
+		if (!(tmp->flags & VM_IOREMAP)) {
+			aligned_vwrite(buf, addr, n);
+			copied++;
+		}
+		buf += n;
+		addr += n;
+		count -= n;
 	}
 finished:
 	read_unlock(&vmlist_lock);
-	return buf - buf_start;
+	if (!copied)
+		return 0;
+	return buflen;
 }
 
 /**


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX][PATCH 3/3] kcore: /proc/kcore should use vread
  2009-08-03 11:14       ` [BUGFIX][PATCH 0/3] kcore: avoid access to holes in vmalloc v3 (Was " KAMEZAWA Hiroyuki
  2009-08-03 11:15         ` [BUGFIX][PATCH 1/3] unmap vmalloc area after hide it KAMEZAWA Hiroyuki
  2009-08-03 11:18         ` [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes KAMEZAWA Hiroyuki
@ 2009-08-03 11:19         ` KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-03 11:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

/proc/kcore has its own routine to access vmallc area. It can be replaced
with vread(). And by this, /proc/kcore can do safe access to vmalloc area.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
---
 fs/proc/kcore.c |   35 ++---------------------------------
 1 file changed, 2 insertions(+), 33 deletions(-)

Index: linux-2.6.31-rc5/fs/proc/kcore.c
===================================================================
--- linux-2.6.31-rc5.orig/fs/proc/kcore.c	2009-08-03 19:50:58.000000000 +0900
+++ linux-2.6.31-rc5/fs/proc/kcore.c	2009-08-03 19:56:01.000000000 +0900
@@ -328,43 +328,12 @@
 				return -EFAULT;
 		} else if (is_vmalloc_addr((void *)start)) {
 			char * elf_buf;
-			struct vm_struct *m;
-			unsigned long curstart = start;
-			unsigned long cursize = tsz;
 
 			elf_buf = kzalloc(tsz, GFP_KERNEL);
 			if (!elf_buf)
 				return -ENOMEM;
-
-			read_lock(&vmlist_lock);
-			for (m=vmlist; m && cursize; m=m->next) {
-				unsigned long vmstart;
-				unsigned long vmsize;
-				unsigned long msize = m->size - PAGE_SIZE;
-
-				if (((unsigned long)m->addr + msize) < 
-								curstart)
-					continue;
-				if ((unsigned long)m->addr > (curstart + 
-								cursize))
-					break;
-				vmstart = (curstart < (unsigned long)m->addr ? 
-					(unsigned long)m->addr : curstart);
-				if (((unsigned long)m->addr + msize) > 
-							(curstart + cursize))
-					vmsize = curstart + cursize - vmstart;
-				else
-					vmsize = (unsigned long)m->addr + 
-							msize - vmstart;
-				curstart = vmstart + vmsize;
-				cursize -= vmsize;
-				/* don't dump ioremap'd stuff! (TA) */
-				if (m->flags & VM_IOREMAP)
-					continue;
-				memcpy(elf_buf + (vmstart - start),
-					(char *)vmstart, vmsize);
-			}
-			read_unlock(&vmlist_lock);
+			vread(elf_buf, (char *)start, tsz);
+			/* we have to zero-fill user buffer even if no read */
 			if (copy_to_user(buffer, elf_buf, tsz)) {
 				kfree(elf_buf);
 				return -EFAULT;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-03 11:18         ` [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes KAMEZAWA Hiroyuki
@ 2009-08-04  9:23           ` Amerigo Wang
  2009-08-04  9:55             ` KAMEZAWA Hiroyuki
  2009-08-04 21:30           ` Andrew Morton
  1 sibling, 1 reply; 31+ messages in thread
From: Amerigo Wang @ 2009-08-04  9:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	Amerigo Wang, linux-kernel

On Mon, Aug 03, 2009 at 08:18:45PM +0900, KAMEZAWA Hiroyuki wrote:
>From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>vread/vwrite access vmalloc area without checking there is a page or not.
>In most case, this works well.
>
>In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
>memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
>( -PAGE_SIZE is for a guard page.)
>
>After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
>virtual address but remap _later_. There tend to be a hole in valid vmalloc
>area in vm_struct lists.
>Then, skip the hole (not mapped page) is necessary.
>This patch updates vread/vwrite() for avoiding memory hole.
>
>Routines which access vmalloc area without knowing for which addr is used
>are
>  - /proc/kcore
>  - /dev/kmem
>
>kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
>checked and /dev/kmem will avoid to read/write it.
>Fixes to /proc/kcore will be in the next patch in series.
>
>Changelog v2->v3:
> - fixed typos.
> - use kmap. (if not using kmap, we have to add lock here.)


Hmm.. I missed this.


> - fixed PAGE_MASK miss-use.
>Changelog v1->v2:
> - enhanced comments.
> - treat IOREMAP as hole always.
> - zero-fill memory hole if [addr...addr+size] includes valid pages.
> - returns 0 if [addr...addr+size) includes no valid pages.
>
>Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


This time it looks much better now, but it still has a small problem.
Please check it below.


>---
> mm/vmalloc.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 159 insertions(+), 23 deletions(-)
>

<snip>


>+
>+/**
>+ *	vread() -  read vmalloc area in a safe way.
>+ *	@buf:		buffer for reading data
>+ *	@addr:		vm address.
>+ *	@count:		number of bytes to be read.
>+ *
>+ *	Returns # of bytes which addr and buf should be increased.
>+ *	(same to count).
>+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.


If I read it correctly, your code doesn't do what you described here,
it doesn't return 0 when there is no valid area.


>+ *
>+ *	This function checks that addr is a valid vmalloc'ed area, and
>+ *	copy data from that area to a given buffer. If the given memory range of
>+ *	[addr...addr+count) includes some valid address, data is copied to
>+ *	proper area of @buf. If there are memory holes, they'll be zero-filled.
>+ *	IOREMAP area is treated as memory hole and no copy is done.
>+ *
>+ *	Note: In usual ops, vread() is never necessary because the caller should
>+ *	know vmalloc() area is valid and can use memcpy(). This is for routines
>+ *	which have to access vmalloc area without any informaion, as /dev/kmem.
>+ *
>+ *	The caller should guarantee KM_USER1 is not used.
>+ */
>+
> long vread(char *buf, char *addr, unsigned long count)
> {
> 	struct vm_struct *tmp;
> 	char *vaddr, *buf_start = buf;
>+	unsigned long buflen = count;
> 	unsigned long n;
> 
> 	/* Don't allow overflow */
>@@ -1640,7 +1739,7 @@
> 		count = -(unsigned long) addr;
> 
> 	read_lock(&vmlist_lock);
>-	for (tmp = vmlist; tmp; tmp = tmp->next) {
>+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> 		vaddr = (char *) tmp->addr;
> 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> 			continue;
>@@ -1653,32 +1752,66 @@
> 			count--;
> 		}
> 		n = vaddr + tmp->size - PAGE_SIZE - addr;
>-		do {
>-			if (count == 0)
>-				goto finished;
>-			*buf = *addr;
>-			buf++;
>-			addr++;
>-			count--;
>-		} while (--n > 0);
>+		if (n > count)
>+			n = count;
>+		if (!(tmp->flags & VM_IOREMAP))
>+			aligned_vread(buf, addr, n);
>+		else /* IOREMAP area is treated as memory hole */
>+			memset(buf, 0, n);
>+		buf += n;
>+		addr += n;
>+		count -= n;
> 	}
> finished:
> 	read_unlock(&vmlist_lock);
>-	return buf - buf_start;
>+
>+	if (buf == buf_start)
>+		return 0;
>+	/* zero-fill memory holes */
>+	if (buf != buf_start + buflen)
>+		memset(buf, 0, buflen - (buf - buf_start));
>+
>+	return buflen;
> }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-04  9:23           ` Amerigo Wang
@ 2009-08-04  9:55             ` KAMEZAWA Hiroyuki
  2009-08-05  2:09               ` Amerigo Wang
  0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-04  9:55 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: Mike Smith, Andrew Morton, bugzilla-daemon, bugme-daemon,
	linux-kernel

On Tue, 4 Aug 2009 17:23:16 +0800
Amerigo Wang <xiyou.wangcong@gmail.com> wrote:

> On Mon, Aug 03, 2009 at 08:18:45PM +0900, KAMEZAWA Hiroyuki wrote:
> >From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> >vread/vwrite access vmalloc area without checking there is a page or not.
> >In most case, this works well.
> >
> >In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
> >memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
> >( -PAGE_SIZE is for a guard page.)
> >
> >After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
> >virtual address but remap _later_. There tend to be a hole in valid vmalloc
> >area in vm_struct lists.
> >Then, skip the hole (not mapped page) is necessary.
> >This patch updates vread/vwrite() for avoiding memory hole.
> >
> >Routines which access vmalloc area without knowing for which addr is used
> >are
> >  - /proc/kcore
> >  - /dev/kmem
> >
> >kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
> >checked and /dev/kmem will avoid to read/write it.
> >Fixes to /proc/kcore will be in the next patch in series.
> >
> >Changelog v2->v3:
> > - fixed typos.
> > - use kmap. (if not using kmap, we have to add lock here.)
> 
> 
> Hmm.. I missed this.
> 
> 
> > - fixed PAGE_MASK miss-use.
> >Changelog v1->v2:
> > - enhanced comments.
> > - treat IOREMAP as hole always.
> > - zero-fill memory hole if [addr...addr+size] includes valid pages.
> > - returns 0 if [addr...addr+size) includes no valid pages.
> >
> >Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> 
> This time it looks much better now, but it still has a small problem.
> Please check it below.
> 
> 
thanks.


> >---
> > mm/vmalloc.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 159 insertions(+), 23 deletions(-)
> >
> 
> <snip>
> 
> 
> >+
> >+/**
> >+ *	vread() -  read vmalloc area in a safe way.
> >+ *	@buf:		buffer for reading data
> >+ *	@addr:		vm address.
> >+ *	@count:		number of bytes to be read.
> >+ *
> >+ *	Returns # of bytes which addr and buf should be increased.
> >+ *	(same to count).
> >+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
> 
> 
> If I read it correctly, your code doesn't do what you described here,
> it doesn't return 0 when there is no valid area.
> 

Hmm.. valid area is ambiguous, ok, how about this ?

If [addr...addr+count) has no intersects with exisiting vm_struct, returns 0.

Thanks,
-Kame




> 
> >+ *
> >+ *	This function checks that addr is a valid vmalloc'ed area, and
> >+ *	copy data from that area to a given buffer. If the given memory range of
> >+ *	[addr...addr+count) includes some valid address, data is copied to
> >+ *	proper area of @buf. If there are memory holes, they'll be zero-filled.
> >+ *	IOREMAP area is treated as memory hole and no copy is done.
> >+ *
> >+ *	Note: In usual ops, vread() is never necessary because the caller should
> >+ *	know vmalloc() area is valid and can use memcpy(). This is for routines
> >+ *	which have to access vmalloc area without any informaion, as /dev/kmem.
> >+ *
> >+ *	The caller should guarantee KM_USER1 is not used.
> >+ */
> >+
> > long vread(char *buf, char *addr, unsigned long count)
> > {
> > 	struct vm_struct *tmp;
> > 	char *vaddr, *buf_start = buf;
> >+	unsigned long buflen = count;
> > 	unsigned long n;
> > 
> > 	/* Don't allow overflow */
> >@@ -1640,7 +1739,7 @@
> > 		count = -(unsigned long) addr;
> > 
> > 	read_lock(&vmlist_lock);
> >-	for (tmp = vmlist; tmp; tmp = tmp->next) {
> >+	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
> > 		vaddr = (char *) tmp->addr;
> > 		if (addr >= vaddr + tmp->size - PAGE_SIZE)
> > 			continue;
> >@@ -1653,32 +1752,66 @@
> > 			count--;
> > 		}
> > 		n = vaddr + tmp->size - PAGE_SIZE - addr;
> >-		do {
> >-			if (count == 0)
> >-				goto finished;
> >-			*buf = *addr;
> >-			buf++;
> >-			addr++;
> >-			count--;
> >-		} while (--n > 0);
> >+		if (n > count)
> >+			n = count;
> >+		if (!(tmp->flags & VM_IOREMAP))
> >+			aligned_vread(buf, addr, n);
> >+		else /* IOREMAP area is treated as memory hole */
> >+			memset(buf, 0, n);
> >+		buf += n;
> >+		addr += n;
> >+		count -= n;
> > 	}
> > finished:
> > 	read_unlock(&vmlist_lock);
> >-	return buf - buf_start;
> >+
> >+	if (buf == buf_start)
> >+		return 0;
> >+	/* zero-fill memory holes */
> >+	if (buf != buf_start + buflen)
> >+		memset(buf, 0, buflen - (buf - buf_start));
> >+
> >+	return buflen;
> > }
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-03 11:18         ` [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes KAMEZAWA Hiroyuki
  2009-08-04  9:23           ` Amerigo Wang
@ 2009-08-04 21:30           ` Andrew Morton
  2009-08-05  1:08             ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2009-08-04 21:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kamezawa.hiroyu, scgtrp, bugzilla-daemon, bugme-daemon,
	xiyou.wangcong, linux-kernel

On Mon, 3 Aug 2009 20:18:45 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> vread/vwrite access vmalloc area without checking there is a page or not.
> In most case, this works well.
> 
> In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
> memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
> ( -PAGE_SIZE is for a guard page.)
> 
> After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
> virtual address but remap _later_. There tend to be a hole in valid vmalloc
> area in vm_struct lists.
> Then, skip the hole (not mapped page) is necessary.
> This patch updates vread/vwrite() for avoiding memory hole.
> 
> Routines which access vmalloc area without knowing for which addr is used
> are
>   - /proc/kcore
>   - /dev/kmem
> 
> kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
> checked and /dev/kmem will avoid to read/write it.
> Fixes to /proc/kcore will be in the next patch in series.
> 
> Changelog v2->v3:
>  - fixed typos.
>  - use kmap. (if not using kmap, we have to add lock here.)
>  - fixed PAGE_MASK miss-use.
> Changelog v1->v2:
>  - enhanced comments.
>  - treat IOREMAP as hole always.
>  - zero-fill memory hole if [addr...addr+size] includes valid pages.
>  - returns 0 if [addr...addr+size) includes no valid pages.
> 
> ...
>
> +static int aligned_vread(char *buf, char *addr, unsigned long count)
> +{
> +	struct page *p;
> +	int copied = 0;
> +
> +	while (count) {
> +		unsigned long offset, length;
> +
> +		offset = (unsigned long)addr & ~PAGE_MASK;
> +		length = PAGE_SIZE - offset;
> +		if (length > count)
> +			length = count;
> +		p = vmalloc_to_page(addr);
> +		/*
> +		 * To do safe access to this _mapped_ area, we need
> +		 * lock. But adding lock here means that we need to add
> +		 * overhead of vmalloc()/vfree() calles for this _debug_
> +		 * interface, rarely used. Instead of that, we'll use
> +		 * kmap() and get small overhead in this access function.
> +		 */
> +		if (p) {
> +			/* we can expect USR1 is not used */

It would be nice if the comment were to explain _why_ KM_USER1 is known
to be free here.


> +			void *map = kmap_atomic(p, KM_USER1);
> +			memcpy(buf, map + offset, length);
> +			kunmap_atomic(map, KM_USER1);

Can use clear_highpage().

> +		} else
> +			memset(buf, 0, length);
> +
> +		addr += length;
> +		buf += length;
> +		copied += length;
> +		count -= length;
> +	}
> +	return copied;
> +}
> +
> +static int aligned_vwrite(char *buf, char *addr, unsigned long count)
> +{
> +	struct page *p;
> +	int copied = 0;
> +
> +	while (count) {
> +		unsigned long offset, length;
> +
> +		offset = (unsigned long)addr & ~PAGE_MASK;
> +		length = PAGE_SIZE - offset;
> +		if (length > count)
> +			length = count;
> +		p = vmalloc_to_page(addr);
> +		/*
> +		 * To do safe access to this _mapped_ area, we need
> +		 * lock. But adding lock here means that we need to add
> +		 * overhead of vmalloc()/vfree() calles for this _debug_
> +		 * interface, rarely used. Instead of that, we'll use
> +		 * kmap() and get small overhead in this access function.
> +		 */
> +		if (p) {
> +			/* we can expect USR1 is not used */
> +			void *map = kmap_atomic(p, KM_USER1);
> +			memcpy(map + offset, buf, length);
> +			kunmap_atomic(map, KM_USER1);

clear_highpage().

> +		}
> +		addr += length;
> +		buf += length;
> +		copied += length;
> +		count -= length;
> +	}
> +	return copied;
> +}


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-04 21:30           ` Andrew Morton
@ 2009-08-05  1:08             ` KAMEZAWA Hiroyuki
  2009-08-05  1:24               ` Andrew Morton
  0 siblings, 1 reply; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-05  1:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: scgtrp, bugzilla-daemon, bugme-daemon, xiyou.wangcong,
	linux-kernel

On Tue, 4 Aug 2009 14:30:31 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 3 Aug 2009 20:18:45 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > vread/vwrite access vmalloc area without checking there is a page or not.
> > In most case, this works well.
> > 
> > In old ages, the caller of get_vm_ara() is only IOREMAP and there is no
> > memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
> > ( -PAGE_SIZE is for a guard page.)
> > 
> > After per-cpu-alloc patch, it uses get_vm_area() for reserve continuous
> > virtual address but remap _later_. There tend to be a hole in valid vmalloc
> > area in vm_struct lists.
> > Then, skip the hole (not mapped page) is necessary.
> > This patch updates vread/vwrite() for avoiding memory hole.
> > 
> > Routines which access vmalloc area without knowing for which addr is used
> > are
> >   - /proc/kcore
> >   - /dev/kmem
> > 
> > kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
> > checked and /dev/kmem will avoid to read/write it.
> > Fixes to /proc/kcore will be in the next patch in series.
> > 
> > Changelog v2->v3:
> >  - fixed typos.
> >  - use kmap. (if not using kmap, we have to add lock here.)
> >  - fixed PAGE_MASK miss-use.
> > Changelog v1->v2:
> >  - enhanced comments.
> >  - treat IOREMAP as hole always.
> >  - zero-fill memory hole if [addr...addr+size] includes valid pages.
> >  - returns 0 if [addr...addr+size) includes no valid pages.
> > 
> > ...
> >
> > +static int aligned_vread(char *buf, char *addr, unsigned long count)
> > +{
> > +	struct page *p;
> > +	int copied = 0;
> > +
> > +	while (count) {
> > +		unsigned long offset, length;
> > +
> > +		offset = (unsigned long)addr & ~PAGE_MASK;
> > +		length = PAGE_SIZE - offset;
> > +		if (length > count)
> > +			length = count;
> > +		p = vmalloc_to_page(addr);
> > +		/*
> > +		 * To do safe access to this _mapped_ area, we need
> > +		 * lock. But adding lock here means that we need to add
> > +		 * overhead of vmalloc()/vfree() calles for this _debug_
> > +		 * interface, rarely used. Instead of that, we'll use
> > +		 * kmap() and get small overhead in this access function.
> > +		 */
> > +		if (p) {
> > +			/* we can expect USR1 is not used */
> 
> It would be nice if the comment were to explain _why_ KM_USER1 is known
> to be free here.
> 
ok, will do. Hmm, but KM_USR0 is better ?
I'm not sure difference between KM_USR0/KM_USR1...



> 
> > +			void *map = kmap_atomic(p, KM_USER1);
> > +			memcpy(buf, map + offset, length);
> > +			kunmap_atomic(map, KM_USER1);
> 
> Can use clear_highpage().
> 
ah, "buf" is expected to be kmalloc'ed buffer. I'll explain it in Usage
or adds comment.

Thank you for review!

-Kame


> > +		} else
> > +			memset(buf, 0, length);
> > +
> > +		addr += length;
> > +		buf += length;
> > +		copied += length;
> > +		count -= length;
> > +	}
> > +	return copied;
> > +}
> > +
> > +static int aligned_vwrite(char *buf, char *addr, unsigned long count)
> > +{
> > +	struct page *p;
> > +	int copied = 0;
> > +
> > +	while (count) {
> > +		unsigned long offset, length;
> > +
> > +		offset = (unsigned long)addr & ~PAGE_MASK;
> > +		length = PAGE_SIZE - offset;
> > +		if (length > count)
> > +			length = count;
> > +		p = vmalloc_to_page(addr);
> > +		/*
> > +		 * To do safe access to this _mapped_ area, we need
> > +		 * lock. But adding lock here means that we need to add
> > +		 * overhead of vmalloc()/vfree() calles for this _debug_
> > +		 * interface, rarely used. Instead of that, we'll use
> > +		 * kmap() and get small overhead in this access function.
> > +		 */
> > +		if (p) {
> > +			/* we can expect USR1 is not used */
> > +			void *map = kmap_atomic(p, KM_USER1);
> > +			memcpy(map + offset, buf, length);
> > +			kunmap_atomic(map, KM_USER1);
> 
> clear_highpage().
> 
> > +		}
> > +		addr += length;
> > +		buf += length;
> > +		copied += length;
> > +		count -= length;
> > +	}
> > +	return copied;
> > +}
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-05  1:08             ` KAMEZAWA Hiroyuki
@ 2009-08-05  1:24               ` Andrew Morton
  2009-08-05  1:39                 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2009-08-05  1:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: scgtrp, bugzilla-daemon, bugme-daemon, xiyou.wangcong,
	linux-kernel

On Wed, 5 Aug 2009 10:08:43 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> > > +		/*
> > > +		 * To do safe access to this _mapped_ area, we need
> > > +		 * lock. But adding lock here means that we need to add
> > > +		 * overhead of vmalloc()/vfree() calles for this _debug_
> > > +		 * interface, rarely used. Instead of that, we'll use
> > > +		 * kmap() and get small overhead in this access function.
> > > +		 */
> > > +		if (p) {
> > > +			/* we can expect USR1 is not used */
> > 
> > It would be nice if the comment were to explain _why_ KM_USER1 is known
> > to be free here.
> > 
> ok, will do. Hmm, but KM_USR0 is better ?
> I'm not sure difference between KM_USR0/KM_USR1...

KM_USER0 is conventionally the first one to use.  We only use KM_USER1
if it is known that KM_USER0 is already used.

> 
> 
> > 
> > > +			void *map = kmap_atomic(p, KM_USER1);
> > > +			memcpy(buf, map + offset, length);
> > > +			kunmap_atomic(map, KM_USER1);
> > 
> > Can use clear_highpage().
> > 
> ah, "buf" is expected to be kmalloc'ed buffer. I'll explain it in Usage
> or adds comment.
> 

argh, memcpy != memset.  I knew that ;)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-05  1:24               ` Andrew Morton
@ 2009-08-05  1:39                 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-05  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: scgtrp, bugzilla-daemon, bugme-daemon, xiyou.wangcong,
	linux-kernel

On Tue, 4 Aug 2009 18:24:57 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 5 Aug 2009 10:08:43 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > > > +		/*
> > > > +		 * To do safe access to this _mapped_ area, we need
> > > > +		 * lock. But adding lock here means that we need to add
> > > > +		 * overhead of vmalloc()/vfree() calles for this _debug_
> > > > +		 * interface, rarely used. Instead of that, we'll use
> > > > +		 * kmap() and get small overhead in this access function.
> > > > +		 */
> > > > +		if (p) {
> > > > +			/* we can expect USR1 is not used */
> > > 
> > > It would be nice if the comment were to explain _why_ KM_USER1 is known
> > > to be free here.
> > > 
> > ok, will do. Hmm, but KM_USR0 is better ?
> > I'm not sure difference between KM_USR0/KM_USR1...
> 
> KM_USER0 is conventionally the first one to use.  We only use KM_USER1
> if it is known that KM_USER0 is already used.
> 
Hmm, It sounds better to update this to use KM_USER0 when I applying Amerigo's comment.

Thanks,
-Kame


> > 
> > 
> > > 
> > > > +			void *map = kmap_atomic(p, KM_USER1);
> > > > +			memcpy(buf, map + offset, length);
> > > > +			kunmap_atomic(map, KM_USER1);
> > > 
> > > Can use clear_highpage().
> > > 
> > ah, "buf" is expected to be kmalloc'ed buffer. I'll explain it in Usage
> > or adds comment.
> > 
> 
> argh, memcpy != memset.  I knew that ;)
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes.
  2009-08-04  9:55             ` KAMEZAWA Hiroyuki
@ 2009-08-05  2:09               ` Amerigo Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Amerigo Wang @ 2009-08-05  2:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Amerigo Wang, Mike Smith, Andrew Morton, bugzilla-daemon,
	bugme-daemon, linux-kernel

On Tue, Aug 04, 2009 at 06:55:26PM +0900, KAMEZAWA Hiroyuki wrote:
>> >+
>> >+/**
>> >+ *	vread() -  read vmalloc area in a safe way.
>> >+ *	@buf:		buffer for reading data
>> >+ *	@addr:		vm address.
>> >+ *	@count:		number of bytes to be read.
>> >+ *
>> >+ *	Returns # of bytes which addr and buf should be increased.
>> >+ *	(same to count).
>> >+ *	If [addr...addr+count) doesn't includes any valid area, returns 0.
>> 
>> 
>> If I read it correctly, your code doesn't do what you described here,
>> it doesn't return 0 when there is no valid area.
>> 
>
>Hmm.. valid area is ambiguous, ok, how about this ?
>
>If [addr...addr+count) has no intersects with exisiting vm_struct, returns 0.

Ah, I see your points now, Ok then.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2009-08-05  2:06 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <bug-13850-10286@http.bugzilla.kernel.org/>
2009-07-28 23:05 ` [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops Andrew Morton
2009-07-28 23:48   ` KAMEZAWA Hiroyuki
2009-07-29  2:46     ` Mike Smith
2009-07-29  3:32       ` KAMEZAWA Hiroyuki
2009-07-29  8:36         ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry KAMEZAWA Hiroyuki
2009-07-29  8:48           ` [RFC][PATCH 2/2] vmalloc: vread vwrite avoid memory hole KAMEZAWA Hiroyuki
2009-07-29  9:59             ` Amerigo Wang
2009-07-29 11:51               ` KAMEZAWA Hiroyuki
2009-07-31  9:38                 ` Amerigo Wang
2009-07-29  9:40           ` [RFC][PATCH 1/2] vmalloc: reorder unmap and removal entry Amerigo Wang
2009-07-29 11:53             ` KAMEZAWA Hiroyuki
2009-07-31  7:07         ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
2009-07-31  7:11           ` [BUGFIX][PATCH 1/3] fix vread/vwrite to be aware of memory hole KAMEZAWA Hiroyuki
2009-07-31  9:57             ` Amerigo Wang
2009-07-31 10:32               ` KAMEZAWA Hiroyuki
2009-08-03  9:10                 ` Amerigo Wang
2009-08-03  9:10                   ` KAMEZAWA Hiroyuki
2009-07-31  7:12           ` [BUGFIX][PATCH 2/3] kcore should use vread KAMEZAWA Hiroyuki
2009-07-31  7:13           ` [BUGFIX][PATCH 3/3] unmap vmalloc area after hide it KAMEZAWA Hiroyuki
2009-07-31  7:21           ` [BUGFIX][PATCH 0/3] fix bug for /proc/kcore causes panic (Was: [Bugme-new] [Bug 13850] New: reading /proc/kcore causes oops KAMEZAWA Hiroyuki
2009-08-03 11:14       ` [BUGFIX][PATCH 0/3] kcore: avoid access to holes in vmalloc v3 (Was " KAMEZAWA Hiroyuki
2009-08-03 11:15         ` [BUGFIX][PATCH 1/3] unmap vmalloc area after hide it KAMEZAWA Hiroyuki
2009-08-03 11:18         ` [BUGFIX][PATCH 2/3] kcore: fix vread/vwrite to be aware of holes KAMEZAWA Hiroyuki
2009-08-04  9:23           ` Amerigo Wang
2009-08-04  9:55             ` KAMEZAWA Hiroyuki
2009-08-05  2:09               ` Amerigo Wang
2009-08-04 21:30           ` Andrew Morton
2009-08-05  1:08             ` KAMEZAWA Hiroyuki
2009-08-05  1:24               ` Andrew Morton
2009-08-05  1:39                 ` KAMEZAWA Hiroyuki
2009-08-03 11:19         ` [BUGFIX][PATCH 3/3] kcore: /proc/kcore should use vread KAMEZAWA Hiroyuki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.