From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1765150AbXGaVhU (ORCPT ); Tue, 31 Jul 2007 17:37:20 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753414AbXGaVhG (ORCPT ); Tue, 31 Jul 2007 17:37:06 -0400
Received: from waste.org ([66.93.16.53]:35394 "EHLO waste.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752272AbXGaVhE (ORCPT ); Tue, 31 Jul 2007 17:37:04 -0400
Date: Tue, 31 Jul 2007 16:37:18 -0500
From: Matt Mackall
To: Dave Hansen
Cc: "linux-kernel@vger.kernel.org"
Subject: Re: /proc/$pid/pagemap troubles
Message-ID: <20070731213718.GW11115@waste.org>
References: <1185914174.18414.184.camel@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1185914174.18414.184.camel@localhost>
User-Agent: Mutt/1.5.13 (2006-08-11)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 31, 2007 at 01:36:14PM -0700, Dave Hansen wrote:
> Since the pagemap code has a little header on it to help describe the
> format, I wrote a little c program to parse its output. I get some
> strange results. If I do this:
>
>     fd = open("/proc/1/pagemap", O_RDONLY);
>     count = read(fd, &endianness, 1);
>
> count will always be 4.

Known bug, fixed in my pending and not-currently-working update. It
ought to return 0 for short reads.

> hexdump gets similar, but even worse results:
>
> qemu:~# strace hexdump -C /proc/self/pagemap
> ...
> read(0, "\1\f\4\4\377\377\377\377\377\377\377\377\377\377\377\377"..., 16) = 20
> read(0, 0x804d39c, 4294967292) = -1 EFAULT (Bad address)
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
>
> Note that the kernel returns 20 to the read request of 16. I think the
> kernel is actually copying over something important in hexdump's memory
> which is adjacent to the buffer and causing it to segfault.

Also fixed.
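
For illustration, the contract discussed here (a read never returns more
than the caller asked for, and a request shorter than one entry returns 0)
can be modelled in userspace roughly as below. This is a sketch, not the
kernel's pagemap_read(): the name entry_aligned_read and all of its
parameters are hypothetical, and rounding the length down to whole entries
is an assumption about what "return 0 for short reads" means.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical userspace model, not kernel code.
 * Copies data from src (avail bytes long), starting at offset pos,
 * into dst. The result is capped at count and rounded down to a
 * multiple of unit (the entry size), so it can never exceed count.
 */
static ssize_t entry_aligned_read(void *dst, size_t count,
                                  const void *src, size_t avail,
                                  size_t pos, size_t unit)
{
    size_t left, n;

    if (unit == 0 || pos >= avail)
        return 0;               /* nothing to hand out */

    left = avail - pos;
    n = count < left ? count : left;
    n -= n % unit;              /* whole entries only; short request gives 0 */

    memcpy(dst, (const char *)src + pos, n);
    return (ssize_t)n;
}
```

With 4-byte entries, a 1-byte request yields 0 and a 16-byte request
yields at most 16, so the caller's buffer is never overrun.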

> The code is basically organized not to output the right thing for any
> unaligned access, and it apparently gets confused about exactly what
> userspace has asked for. I think this is largely due to its overwriting
> of "count" in pagemap_read().
>
> So, a couple of questions. Don't we need to support non-sizeof(unsigned
> long)-aligned reads?

Why? We should obviously never return more data than we were asked for
(that's clearly a bug), but lots of things refuse to read or write
stuff that isn't well sized and aligned.

> Do we _really_ need that header in each and every file?

Well there's either a header or there isn't.

> > * first byte: 0 for big endian, 1 for little
>
> Do we ever have cases where userspace and kernel differ in their
> endianness? Or, are you hoping to dump these files raw on one
> architecture and parse them on another?

Potentially, yes.

> > * second byte: page shift (eg 12 for 4096 byte pages)
>
> This might actually (in theory) change on a per-process basis, so it
> makes sense. But, it seems more global to the process than just pagemap
> output. Would this always be the same as getpagesize()? Or, should it
> always map 1:1 with the amount of memory mapped by a kernel pte_t. I
> _think_ these can be slightly different because we have 64k PAGE_SIZE on
> ppc64, but allow mappings to happen in 4k
>
> > * third byte: entry size in bytes (currently either 4 or 8)
>
> This one really boils down to "what is the kernel's sizeof(unsigned
> long)" because we'll always store pfns in those. It seems like we
> should have a better way to go fetch that.
>
> > * fourth byte: header size
>
> If we can get rid of the other three this, of course, goes away.

True. But the variable-sized header lets us add other stuff later.

-- 
Mathematics is the supreme nostalgia of our time.
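
As an illustration of the four header bytes listed in the thread, the
sketch below decodes them from a buffer. The struct and function names are
hypothetical, the validity checks (entry size of 4 or 8, header size of at
least 4) are assumptions drawn from the descriptions above, and the layout
is the one discussed in this thread, which may differ from what any given
kernel actually exposes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four header bytes described above. */
struct pagemap_header {
    int      little_endian;  /* first byte: 0 = big endian, 1 = little */
    unsigned page_shift;     /* second byte: e.g. 12 for 4096-byte pages */
    unsigned entry_size;     /* third byte: bytes per entry, 4 or 8 */
    unsigned header_size;    /* fourth byte: total header length in bytes */
};

/*
 * Decode the header from buf (len bytes available).
 * Returns 0 on success, -1 if the buffer is too short or a field
 * falls outside the values mentioned in the thread.
 */
static int parse_pagemap_header(const uint8_t *buf, size_t len,
                                struct pagemap_header *out)
{
    if (len < 4)
        return -1;                      /* need all four bytes */
    if (buf[0] > 1)
        return -1;                      /* endianness flag must be 0 or 1 */
    if (buf[2] != 4 && buf[2] != 8)
        return -1;                      /* "currently either 4 or 8" */
    if (buf[3] < 4)
        return -1;                      /* assumed: header covers these bytes */

    out->little_endian = buf[0];
    out->page_shift    = buf[1];
    out->entry_size    = buf[2];
    out->header_size   = buf[3];
    return 0;
}
```

Fed the first four bytes from the strace output quoted earlier
("\1\f\4\4"), this gives little endian, a page shift of 12 (4096-byte
pages), 4-byte entries and a 4-byte header. Because the header carries its
own size, a reader can skip header_size bytes before the first entry even
if fields are added later.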