* *sigh* /proc/*/pagemap
@ 2008-07-05 1:07 Alexey Dobriyan
2008-07-05 1:53 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Alexey Dobriyan @ 2008-07-05 1:07 UTC (permalink / raw)
To: torvalds; +Cc: akpm, mpm, linux-kernel
I'm few days from my patch-sending facility, but could someone
_please_ do the following before release:
1) initialize pagemap_walk.mm to "mm" , so the code starts working as
advertised, and
2) also initialize ->private to "&pm" so it wouldn't immediately oops
in pagemap_pte_hole(), and
3) unstatic struct pagemap_walk, so two threads won't fsckup each other
(including those started by root, including flipping ->mm when you
don't have permissions),
and
4) remove second ptrace_may_attach(), and
5) check with microscope allocation there -- page-aligned address and size == 0
should allocate 0 bytes, and
6) actually check that it works.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
2008-07-05 1:07 *sigh* /proc/*/pagemap Alexey Dobriyan
@ 2008-07-05 1:53 ` Andrew Morton
2008-07-05 7:44 ` Alexey Dobriyan
2008-07-05 17:40 ` Linus Torvalds
0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2008-07-05 1:53 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: torvalds, akpm, mpm, linux-kernel
On Sat, 5 Jul 2008 05:07:02 +0400 "Alexey Dobriyan" <adobriyan@gmail.com> wrote:
> I'm few days from my patch-sending facility, but could someone
> _please_ do the following before release:
>
> 1) initialize pagemap_walk.mm to "mm" , so the code starts working as
> advertised, and
> 2) also initialize ->private to "&pm" so it wouldn't immediately oops
> in pagemap_pte_hole(), and
Geeze you're picky! If everyone was like you we wouldn't need that
nice oops-printing code.
> 3) unstatic struct pagemap_walk, so two threads won't fsckup each other
> (including those started by root, including flipping ->mm when you
> don't have permissions),
Below
> and
> 4) remove second ptrace_may_attach(), and
Can't find what you're referring to here.
> 5) check with microscope allocation there -- page-aligned address and size == 0
> should allocate 0 bytes, and
Where?
> 6) actually check that it works.
Will have a shot.
From: Andrew Morton <akpm@linux-foundation.org>
- initialize pagemap_walk.mm to "mm" , so the code starts working as
advertised
- initialize ->private to "&pm" so it wouldn't immediately oops in
pagemap_pte_hole()
- unstatic struct pagemap_walk, so two threads won't fsckup each other
(including those started by root, including flipping ->mm when you don't
have permissions)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/proc/task_mmu.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff -puN fs/proc/task_mmu.c~pagemap-fixes-to-pagemap_read fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~pagemap-fixes-to-pagemap_read
+++ a/fs/proc/task_mmu.c
@@ -602,11 +602,6 @@ static int pagemap_pte_range(pmd_t *pmd,
return err;
}
-static struct mm_walk pagemap_walk = {
- .pmd_entry = pagemap_pte_range,
- .pte_hole = pagemap_pte_hole
-};
-
/*
* /proc/pid/pagemap - an array mapping virtual pages to pfns
*
@@ -641,6 +636,7 @@ static ssize_t pagemap_read(struct file
struct pagemapread pm;
int pagecount;
int ret = -ESRCH;
+ static struct mm_walk pagemap_walk;
if (!task)
goto out;
@@ -659,6 +655,7 @@ static ssize_t pagemap_read(struct file
if (!mm)
goto out_task;
+
ret = -ENOMEM;
uaddr = (unsigned long)buf & PAGE_MASK;
uend = (unsigned long)(buf + count);
@@ -684,6 +681,11 @@ static ssize_t pagemap_read(struct file
pm.out = (u64 *)buf;
pm.end = (u64 *)(buf + count);
+ pagemap_walk.pmd_entry = pagemap_pte_range;
+ pagemap_walk.pte_hole = pagemap_pte_hole;
+ pagemap_walk.mm = mm;
+ pagemap_walk.private = ±
+
if (!ptrace_may_attach(task)) {
ret = -EIO;
} else {
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
2008-07-05 1:53 ` Andrew Morton
@ 2008-07-05 7:44 ` Alexey Dobriyan
2008-07-05 8:02 ` Andrew Morton
[not found] ` <28fa9c5e0807161906q68411e9bn5975dc277f67b086@mail.gmail.com>
2008-07-05 17:40 ` Linus Torvalds
1 sibling, 2 replies; 9+ messages in thread
From: Alexey Dobriyan @ 2008-07-05 7:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: torvalds, akpm, mpm, linux-kernel
On 7/5/08, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sat, 5 Jul 2008 05:07:02 +0400 "Alexey Dobriyan" <adobriyan@gmail.com>
> wrote:
>> 3) unstatic struct pagemap_walk, so two threads won't fsckup each other
>> (including those started by root, including flipping ->mm when you
>> don't have permissions),
>
> Below
Below.
>> 4) remove second ptrace_may_attach(), and
>
> Can't find what you're referring to here.
pagemap_read() contains two calls to ptrace_may_attach(),
second one looks unneeded.
>> 5) check with microscope allocation there -- page-aligned address and size
>> == 0
>> should allocate 0 bytes, and
>
> Where?
kmalloc() in pagemap_read(). kmalloc(0) and integer wraparound look possible.
>> 6) actually check that it works.
>
> Will have a shot.
Ha-ha!
> - unstatic struct pagemap_walk, so two threads won't fsckup each other
> (including those started by root, including flipping ->mm when you don't
> have permissions)
> --- a/fs/proc/task_mmu.c~pagemap-fixes-to-pagemap_read
> +++ a/fs/proc/task_mmu.c
> @@ -641,6 +636,7 @@ static ssize_t pagemap_read(struct file
> struct pagemapread pm;
> int pagecount;
> int ret = -ESRCH;
> + static struct mm_walk pagemap_walk;
No, can't have static here, two threads doing pagemap_read() will overwrite
each other's .mm and "out" at least. Like "pm" it shouldn't be global.
It doesn't oops here for unknown reasons, though.
> @@ -684,6 +681,11 @@ static ssize_t pagemap_read(struct file
> pm.out = (u64 *)buf;
> pm.end = (u64 *)(buf + count);
>
> + pagemap_walk.pmd_entry = pagemap_pte_range;
> + pagemap_walk.pte_hole = pagemap_pte_hole;
> + pagemap_walk.mm = mm;
> + pagemap_walk.private = ±
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
2008-07-05 7:44 ` Alexey Dobriyan
@ 2008-07-05 8:02 ` Andrew Morton
[not found] ` <28fa9c5e0807161906q68411e9bn5975dc277f67b086@mail.gmail.com>
1 sibling, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2008-07-05 8:02 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: torvalds, mpm, linux-kernel
On Sat, 5 Jul 2008 11:44:54 +0400 "Alexey Dobriyan" <adobriyan@gmail.com> wrote:
> On 7/5/08, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Sat, 5 Jul 2008 05:07:02 +0400 "Alexey Dobriyan" <adobriyan@gmail.com>
> > wrote:
>
> >> 3) unstatic struct pagemap_walk, so two threads won't fsckup each other
> >> (including those started by root, including flipping ->mm when you
> >> don't have permissions),
> >
> > Below
>
> Below.
>
> >> 4) remove second ptrace_may_attach(), and
> >
> > Can't find what you're referring to here.
>
> pagemap_read() contains two calls to ptrace_may_attach(),
> second one looks unneeded.
Agree.
> >> 5) check with microscope allocation there -- page-aligned address and size
> >> == 0
> >> should allocate 0 bytes, and
> >
> > Where?
>
> kmalloc() in pagemap_read(). kmalloc(0) and integer wraparound look possible.
hm, maybe. Plug that anyway. Used kcalloc() even though the memset
isn't needed (we need a non-zeroing kcalloc)
> >> 6) actually check that it works.
> >
> > Will have a shot.
>
> Ha-ha!
>
> > - unstatic struct pagemap_walk, so two threads won't fsckup each other
> > (including those started by root, including flipping ->mm when you don't
> > have permissions)
>
> > --- a/fs/proc/task_mmu.c~pagemap-fixes-to-pagemap_read
> > +++ a/fs/proc/task_mmu.c
> > @@ -641,6 +636,7 @@ static ssize_t pagemap_read(struct file
> > struct pagemapread pm;
> > int pagecount;
> > int ret = -ESRCH;
> > + static struct mm_walk pagemap_walk;
>
> No, can't have static here, two threads doing pagemap_read() will overwrite
> each other's .mm and "out" at least. Like "pm" it shouldn't be global.
doh, copy-n-paste strikes again.
From: Andrew Morton <akpm@linux-foundation.org>
Fix some issues noted by Alexey:
- initialize pagemap_walk.mm to "mm" , so the code starts working as
advertised
- initialize ->private to "&pm" so it wouldn't immediately oops in
pagemap_pte_hole()
- unstatic struct pagemap_walk, so two threads won't fsckup each other
(including those started by root, including flipping ->mm when you don't
have permissions)
- pagemap_read() contains two calls to ptrace_may_attach(), second one
looks unneeded.
- avoid possible kmalloc(0) and integer wraparound.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/proc/task_mmu.c | 72 ++++++++++++++++++++++---------------------
1 file changed, 38 insertions(+), 34 deletions(-)
diff -puN fs/proc/task_mmu.c~pagemap-fixes-to-pagemap_read fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~pagemap-fixes-to-pagemap_read
+++ a/fs/proc/task_mmu.c
@@ -602,11 +602,6 @@ static int pagemap_pte_range(pmd_t *pmd,
return err;
}
-static struct mm_walk pagemap_walk = {
- .pmd_entry = pagemap_pte_range,
- .pte_hole = pagemap_pte_hole
-};
-
/*
* /proc/pid/pagemap - an array mapping virtual pages to pfns
*
@@ -641,6 +636,11 @@ static ssize_t pagemap_read(struct file
struct pagemapread pm;
int pagecount;
int ret = -ESRCH;
+ struct mm_walk pagemap_walk;
+ unsigned long src;
+ unsigned long svpfn;
+ unsigned long start_vaddr;
+ unsigned long end_vaddr;
if (!task)
goto out;
@@ -659,11 +659,15 @@ static ssize_t pagemap_read(struct file
if (!mm)
goto out_task;
- ret = -ENOMEM;
+
uaddr = (unsigned long)buf & PAGE_MASK;
uend = (unsigned long)(buf + count);
pagecount = (PAGE_ALIGN(uend) - uaddr) / PAGE_SIZE;
- pages = kmalloc(pagecount * sizeof(struct page *), GFP_KERNEL);
+ ret = 0;
+ if (pagecount == 0)
+ goto out_mm;
+ pages = kcalloc(pagecount, sizeof(struct page *), GFP_KERNEL);
+ ret = -ENOMEM;
if (!pages)
goto out_mm;
@@ -684,33 +688,33 @@ static ssize_t pagemap_read(struct file
pm.out = (u64 *)buf;
pm.end = (u64 *)(buf + count);
- if (!ptrace_may_attach(task)) {
- ret = -EIO;
- } else {
- unsigned long src = *ppos;
- unsigned long svpfn = src / PM_ENTRY_BYTES;
- unsigned long start_vaddr = svpfn << PAGE_SHIFT;
- unsigned long end_vaddr = TASK_SIZE_OF(task);
-
- /* watch out for wraparound */
- if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT)
- start_vaddr = end_vaddr;
-
- /*
- * The odds are that this will stop walking way
- * before end_vaddr, because the length of the
- * user buffer is tracked in "pm", and the walk
- * will stop when we hit the end of the buffer.
- */
- ret = walk_page_range(start_vaddr, end_vaddr,
- &pagemap_walk);
- if (ret == PM_END_OF_BUFFER)
- ret = 0;
- /* don't need mmap_sem for these, but this looks cleaner */
- *ppos += (char *)pm.out - buf;
- if (!ret)
- ret = (char *)pm.out - buf;
- }
+ pagemap_walk.pmd_entry = pagemap_pte_range;
+ pagemap_walk.pte_hole = pagemap_pte_hole;
+ pagemap_walk.mm = mm;
+ pagemap_walk.private = ±
+
+ src = *ppos;
+ svpfn = src / PM_ENTRY_BYTES;
+ start_vaddr = svpfn << PAGE_SHIFT;
+ end_vaddr = TASK_SIZE_OF(task);
+
+ /* watch out for wraparound */
+ if (svpfn > TASK_SIZE_OF(task) >> PAGE_SHIFT)
+ start_vaddr = end_vaddr;
+
+ /*
+ * The odds are that this will stop walking way
+ * before end_vaddr, because the length of the
+ * user buffer is tracked in "pm", and the walk
+ * will stop when we hit the end of the buffer.
+ */
+ ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
+ if (ret == PM_END_OF_BUFFER)
+ ret = 0;
+ /* don't need mmap_sem for these, but this looks cleaner */
+ *ppos += (char *)pm.out - buf;
+ if (!ret)
+ ret = (char *)pm.out - buf;
out_pages:
for (; pagecount; pagecount--) {
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
2008-07-05 1:53 ` Andrew Morton
2008-07-05 7:44 ` Alexey Dobriyan
@ 2008-07-05 17:40 ` Linus Torvalds
2008-07-07 18:23 ` Matt Mackall
1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2008-07-05 17:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alexey Dobriyan, torvalds, akpm, mpm, linux-kernel
On Fri, 4 Jul 2008, Andrew Morton wrote:
> int pagecount;
> int ret = -ESRCH;
> + static struct mm_walk pagemap_walk;
>
...
>
> + pagemap_walk.pmd_entry = pagemap_pte_range;
> + pagemap_walk.pte_hole = pagemap_pte_hole;
> + pagemap_walk.mm = mm;
> + pagemap_walk.private = ±
> +
No can do. You have one single pagemap_walk, but perhaps multiple users,
who all disagree about what it should contain.
Quite frankly, I think we should just remove the whole f*cking crap. I
think it's also potentially a security hole to give physical page
information and swap info - even if it's just your own pages.
Matt, can you explain what the point was of this whole thing? I'm really
_this_ close to just removing the POS right now. It's been a big source of
bugs, and it looks entirely pointless.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
2008-07-05 17:40 ` Linus Torvalds
@ 2008-07-07 18:23 ` Matt Mackall
2008-07-07 18:34 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Matt Mackall @ 2008-07-07 18:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Alexey Dobriyan, torvalds, akpm, linux-kernel,
Dave Hansen
On Sat, 2008-07-05 at 10:40 -0700, Linus Torvalds wrote:
>
> On Fri, 4 Jul 2008, Andrew Morton wrote:
> > int pagecount;
> > int ret = -ESRCH;
> > + static struct mm_walk pagemap_walk;
> >
> ...
> >
> > + pagemap_walk.pmd_entry = pagemap_pte_range;
> > + pagemap_walk.pte_hole = pagemap_pte_hole;
> > + pagemap_walk.mm = mm;
> > + pagemap_walk.private = ±
> > +
>
> No can do. You have one single pagemap_walk, but perhaps multiple users,
> who all disagree about what it should contain.
[Sorry, been out of town for a few days.]
This bug got introduced a couple weeks ago when we revamped things to
deal with hugepages not being marked in pagetables on non-x86 despite
the existence of the relevant helper functions.
> Quite frankly, I think we should just remove the whole f*cking crap. I
> think it's also potentially a security hole to give physical page
> information and swap info - even if it's just your own pages.
It does expose information that can be used to advantage, sure. But it's
not a hole in the sense that it can be exploited on its own. If you can
read or write directly to/from physical pages or swap, you already own
the box and this just makes your job easier.
> Matt, can you explain what the point was of this whole thing? I'm really
> _this_ close to just removing the POS right now. It's been a big source of
> bugs, and it looks entirely pointless.
It exists to make the VM stop being a big black box. Before now the VM
exposed little beyond statistics, many of which are basically
meaningless (RSS?). With pagemap, you can actually see precisely where
things are getting allocated, how they're getting shared, etc. Think
NUMA, think cell phones.
Here's an example: for most of time, the page allocators have handed
back pages in reverse order. So a series of sequential pages in a
mapping would show up in pessimal order in terms of I/O coalescing (it
was basically never happening). This was discovered a few years ago with
painstaking debugging (why was coalescing so rare?), fixed, and then
promptly broken again.
When I first got pagemap working, I immediately spotted the regression
with hexdump without even looking for it: all the PFNs were counting
backwards. No statistic is ever going to give you that level of detail.
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
2008-07-07 18:23 ` Matt Mackall
@ 2008-07-07 18:34 ` Linus Torvalds
2008-07-07 18:51 ` Matt Mackall
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2008-07-07 18:34 UTC (permalink / raw)
To: Matt Mackall
Cc: Andrew Morton, Alexey Dobriyan, torvalds, akpm, linux-kernel,
Dave Hansen
On Mon, 7 Jul 2008, Matt Mackall wrote:
>
> It exists to make the VM stop being a big black box. Before now the VM
> exposed little beyond statistics, many of which are basically
> meaningless (RSS?). With pagemap, you can actually see precisely where
> things are getting allocated, how they're getting shared, etc. Think
> NUMA, think cell phones.
Umm. How about setting it under CONFIG_DEBUG_VM or something like that
then?
Right now it's not even _asked_ about, unless you're EMBEDDED, and
defaults to 'y'. Which means that it has all the downsides, an none of the
upsides, for 99.99% of all users.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
2008-07-07 18:34 ` Linus Torvalds
@ 2008-07-07 18:51 ` Matt Mackall
0 siblings, 0 replies; 9+ messages in thread
From: Matt Mackall @ 2008-07-07 18:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Alexey Dobriyan, torvalds, akpm, linux-kernel,
Dave Hansen
On Mon, 2008-07-07 at 11:34 -0700, Linus Torvalds wrote:
>
> On Mon, 7 Jul 2008, Matt Mackall wrote:
> >
> > It exists to make the VM stop being a big black box. Before now the VM
> > exposed little beyond statistics, many of which are basically
> > meaningless (RSS?). With pagemap, you can actually see precisely where
> > things are getting allocated, how they're getting shared, etc. Think
> > NUMA, think cell phones.
>
> Umm. How about setting it under CONFIG_DEBUG_VM or something like that
> then?
I'm certainly not suggesting it's only or even primarily for NUMA or
embedded, just -most obviously- useful for them. Fact is, people have
been interested enough in this POS that I've been getting patches from
all over. Nor is it about debugging the VM, per se. It's about having
transparency in a critical area of the kernel. What pages of my browser
are never swapped in? What is the memory footprint of my forking server,
really? Why the hell can't I get a huge page? Etc.
> Right now it's not even _asked_ about, unless you're EMBEDDED, and
> defaults to 'y'. Which means that it has all the downsides, an none of the
> upsides, for 99.99% of all users.
That's what I'd done originally (it's certainly -my- inclination).
Various people (Rusty comes to mind) convinced me to go the other way.
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: *sigh* /proc/*/pagemap
[not found] ` <28fa9c5e0807161906q68411e9bn5975dc277f67b086@mail.gmail.com>
@ 2008-07-17 20:17 ` Alexey Dobriyan
0 siblings, 0 replies; 9+ messages in thread
From: Alexey Dobriyan @ 2008-07-17 20:17 UTC (permalink / raw)
To: akpm, mpm; +Cc: linux-kernel, pagexec
> > >> 5) check with microscope allocation there -- page-aligned address and
> > size
> > >> == 0
> > >> should allocate 0 bytes, and
> > >
> > > Where?
> >
> > kmalloc() in pagemap_read(). kmalloc(0) and integer wraparound look
> > possible.
Now that I unsuccessfully tried to reproduce multiplication wraparound (on 32-bit),
integer wraparound in kmalloc() simply can't happen here.
Relevant code from pagemap_read():
uaddr = (unsigned long)buf & PAGE_MASK;
uend = (unsigned long)(buf + count);
pagecount = (PAGE_ALIGN(uend) - uaddr) / PAGE_SIZE;
pages = kmalloc(pagecount * sizeof(struct page *), GFP_KERNEL);
pagecount is effectively "count / PAGE_SIZE", where "count" is size_t.
kmalloc() takes size_t as first argument, so one has to overflow
count / PAGE_SIZE * sizeof(pointer).
For sure, page is bigger than pointer.
Anyone still see holes?
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-07-17 20:20 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-05 1:07 *sigh* /proc/*/pagemap Alexey Dobriyan
2008-07-05 1:53 ` Andrew Morton
2008-07-05 7:44 ` Alexey Dobriyan
2008-07-05 8:02 ` Andrew Morton
[not found] ` <28fa9c5e0807161906q68411e9bn5975dc277f67b086@mail.gmail.com>
2008-07-17 20:17 ` Alexey Dobriyan
2008-07-05 17:40 ` Linus Torvalds
2008-07-07 18:23 ` Matt Mackall
2008-07-07 18:34 ` Linus Torvalds
2008-07-07 18:51 ` Matt Mackall
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox