/proc/$pid/pagemap troubles

The Linux Kernel Mailing List

From: Dave Hansen <haveblue@us.ibm.com>
To: Matt Mackall <mpm@selenic.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: /proc/$pid/pagemap troubles
Date: Tue, 31 Jul 2007 13:36:14 -0700
Message-ID: <1185914174.18414.184.camel@localhost>

Since the pagemap code has a little header on it to help describe the
format, I wrote a little C program to parse its output.  I get some
strange results.  If I do this:

	fd = open("/proc/1/pagemap", O_RDONLY);
	count = read(fd, &endianness, 1);

count will always be 4, even though only 1 byte was requested.

hexdump gets similar, but even worse results:

        qemu:~# strace hexdump -C /proc/self/pagemap 
        ...
        read(0, "\1\f\4\4\377\377\377\377\377\377\377\377\377\377\377\377"..., 16) = 20
        read(0, 0x804d39c, 4294967292)          = -1 EFAULT (Bad address)
        --- SIGSEGV (Segmentation fault) @ 0 (0) ---
        +++ killed by SIGSEGV +++

Note that the kernel returns 20 to the read request of 16.  I think the
kernel is actually copying over something important in hexdump's memory
which is adjacent to the buffer and causing it to segfault.
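
One way to confirm both symptoms from userspace is to read into a
buffer that is followed by known guard bytes, then check the return
value and the guard afterwards.  This is only a sketch: checked_read(),
GUARD_LEN and GUARD_BYTE are made-up names, and a guard of 16 bytes
will not catch an overrun that lands further away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define GUARD_LEN  16		/* hypothetical guard size */
#define GUARD_BYTE 0xA5		/* hypothetical guard pattern */

/*
 * Hypothetical helper: read up to `want` bytes from fd into `out`.
 * Returns the byte count on success, -1 if read() or malloc() fails,
 * -2 if read() reports more bytes than were requested, and -3 if the
 * guard bytes placed right after the buffer were modified.
 */
long checked_read(int fd, void *out, size_t want)
{
	unsigned char *buf;
	ssize_t got;
	size_t i;
	long ret;

	assert(out != NULL && want > 0);

	buf = malloc(want + GUARD_LEN);
	if (!buf)
		return -1;
	memset(buf + want, GUARD_BYTE, GUARD_LEN);

	got = read(fd, buf, want);

	if (got < 0) {
		ret = -1;
	} else if ((size_t)got > want) {
		ret = -2;		/* e.g. 20 returned for a request of 16 */
	} else {
		memcpy(out, buf, (size_t)got);
		ret = (long)got;
	}

	/* A damaged guard takes priority over the other results. */
	for (i = 0; i < GUARD_LEN; i++) {
		if (buf[want + i] != GUARD_BYTE) {
			ret = -3;
			break;
		}
	}

	free(buf);
	return ret;
}
```

On a well-behaved file descriptor this returns the same value read()
would; a -2 or -3 from a pagemap file would match what strace shows
above.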

The code is basically not organized to output the right thing for any
unaligned access, and it apparently gets confused about exactly what
userspace has asked for.  I think this is largely due to its overwriting
of "count" in pagemap_read().

So, a couple of questions.  Don't we need to support non-sizeof(unsigned
long)-aligned reads?
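
To make the question concrete, here is a userspace model of what I
would expect from a read of an arbitrary byte range.  It is not the
kernel code and table_read() is a made-up name; it only assumes the
data is a flat array of unsigned long entries.  The point is that the
caller's count is only ever clamped for the copy, never rounded up or
overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model (not kernel code): serve a read of `count` bytes at
 * byte offset `pos` from a table of `nr` unsigned long entries.
 * Neither pos nor count has to be a multiple of sizeof(unsigned long).
 * Returns the number of bytes copied, which is never more than count.
 */
size_t table_read(const unsigned long *table, size_t nr,
		  size_t pos, void *dst, size_t count)
{
	size_t total = nr * sizeof(unsigned long);
	size_t n;

	assert(table != NULL && dst != NULL);

	if (pos >= total)
		return 0;		/* at or past the end: EOF */

	n = total - pos;		/* bytes left in the table */
	if (n > count)
		n = count;		/* clamp to what was asked for */

	memcpy(dst, (const unsigned char *)table + pos, n);
	return n;
}
```

With this model a 1-byte read returns 1, and a read that starts in the
middle of an entry just returns the tail of that entry first.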

Do we _really_ need that header in each and every file?

> * first byte:   0 for big endian, 1 for little

Do we ever have cases where userspace and kernel differ in their
endianness?  Or, are you hoping to dump these files raw on one
architecture and parse them on another?  I would think it makes more
sense to get this from elsewhere.  Or, should we just output in network
byte order and be done with it?

If anything, we could put it in /proc/$pid/status.
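
For reference, emitting a fixed byte order is cheap on both sides.  A
minimal sketch, assuming 64-bit entries (the helper names put_be64()
and get_be64() are mine, not anything in the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[0..7] in network (big-endian) byte order. */
void put_be64(unsigned char out[8], uint64_t v)
{
	int i;

	assert(out != NULL);
	for (i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Load a big-endian 64-bit value from in[0..7]. */
uint64_t get_be64(const unsigned char in[8])
{
	uint64_t v = 0;
	int i;

	assert(in != NULL);
	for (i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}
```

A parser written this way gives the same result on any host, so the
endianness byte would not be needed.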

> * second byte:  page shift (eg 12 for 4096 byte pages)

This might actually (in theory) change on a per-process basis, so it
makes sense.  But it seems more global to the process than just pagemap
output.  Would this always be the same as getpagesize()?  Or should it
always map 1:1 with the amount of memory mapped by a kernel pte_t?  I
_think_ these can be slightly different because we have 64k PAGE_SIZE on
ppc64, but allow mappings to happen in 4k.
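
For comparison, this is roughly how userspace could derive a page shift
on its own, using sysconf(_SC_PAGESIZE).  The function names are made
up for illustration, and whether this value would always match the
header byte is exactly the open question above.

```c
#include <assert.h>
#include <unistd.h>

/* Return log2(size) for a power-of-two page size, or -1 otherwise. */
int page_shift_from_size(long size)
{
	int shift = 0;

	if (size <= 0 || (size & (size - 1)) != 0)
		return -1;
	while ((1L << shift) < size)
		shift++;
	return shift;
}

/* Page shift as userspace sees it, via sysconf(_SC_PAGESIZE). */
int userspace_page_shift(void)
{
	long size = sysconf(_SC_PAGESIZE);

	assert(size > 0);
	return page_shift_from_size(size);
}
```

A 4096-byte page gives 12 and a 64k page gives 16, so a mismatch with
the second header byte would show up immediately.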

> * third byte:   entry size in bytes (currently either 4 or 8)

This one really boils down to "what is the kernel's sizeof(unsigned
long)" because we'll always store pfns in those.  It seems like we
should have a better way to go fetch that.

> * fourth byte:  header size

If we can get rid of the other three, this, of course, goes away.
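
For completeness, decoding the four header bytes as quoted above looks
something like this.  It is a sketch only: struct pagemap_header and
parse_pagemap_header() are hypothetical names, and the sanity checks
are my reading of the quoted format description rather than anything
the kernel promises.

```c
#include <assert.h>
#include <stddef.h>

struct pagemap_header {			/* hypothetical name */
	int little_endian;		/* first byte: 0 big, 1 little */
	unsigned int page_shift;	/* second byte, e.g. 12 */
	unsigned int entry_size;	/* third byte: 4 or 8 */
	unsigned int header_size;	/* fourth byte */
};

/*
 * Decode the 4-byte header at the start of buf.
 * Returns 0 on success, -1 if the bytes do not fit the quoted format.
 */
int parse_pagemap_header(const unsigned char *buf, size_t len,
			 struct pagemap_header *h)
{
	assert(buf != NULL && h != NULL);

	if (len < 4)
		return -1;
	if (buf[0] > 1)			/* endianness must be 0 or 1 */
		return -1;
	if (buf[2] != 4 && buf[2] != 8)	/* entry size: 4 or 8 */
		return -1;
	if (buf[3] < 4)			/* header holds at least these 4 bytes */
		return -1;

	h->little_endian = buf[0];
	h->page_shift = buf[1];
	h->entry_size = buf[2];
	h->header_size = buf[3];
	return 0;
}
```

Fed the first four bytes from the strace output ("\1\f\4\4"), this
yields little endian, a page shift of 12, 4-byte entries and a 4-byte
header.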

-- Dave
