From: Matt Mackall <mpm@selenic.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: /proc/$pid/pagemap troubles
Date: Tue, 31 Jul 2007 19:14:04 -0500 [thread overview]
Message-ID: <20070801001404.GY11115@waste.org> (raw)
In-Reply-To: <1185921790.18414.223.camel@localhost>
On Tue, Jul 31, 2007 at 03:43:10PM -0700, Dave Hansen wrote:
> On Tue, 2007-07-31 at 16:37 -0500, Matt Mackall wrote:
> >
> > > So, a couple of questions. Don't we need to support
> > non-sizeof(unsigned
> > > long)-aligned reads?
> >
> > Why? We should obviously never return more data than we were asked for
> > (that's clearly a bug), but lots of things refuse to read or write
> > stuff that isn't well sized and aligned.
>
> For the 1-byte stuff at the beginning of the file, I think it makes
> perfect sense:
>
> char __big_endian;
> int big_endian;
> read(fd, &__big_endian, 1);
> big_endian = __big_endian;
True.
> > > Do we _really_ need that header in each and every file?
> >
> > Well there's either a header or there isn't.
>
> I just mean that perhaps we should be putting some of that stuff in a
> global file. I think there's some added simplicity both in the kernel
> and in the userspace programs if we just let "*ppos << PAGE_SHIFT ==
> vaddr".
Absolutely. It's the source of both of the long read bugs you
mentioned. But there are also advantages to having it attached.
> I was going to send you a patch, but since you're updating anyway, could
> you add some symbolic names like this:
>
> #define PAGEMAP_ENTRY_SIZE_BYTES sizeof(unsigned long)
> #define PAGEMAP_HEADER_SIZE_BYTES sizeof(unsigned long)
> #define PAGEMAP_MAX_VIRT_PFN (((~0UL) >> PAGE_SHIFT) + 1)
> #define PAGEMAP_BUFFER_SLOTS (PAGE_SIZE/PAGEMAP_ENTRY_SIZE_BYTES)
> #define PAGEMAP_UNPOPULATED_PAGE (-1UL)
I suppose.
> + evpfn = min((src + count) / sizeof(unsigned long),
> + ((~0UL) >> PAGE_SHIFT) + 1);
>
> Should that hunk of code be any different for 32-bit processes on a
> 64-bit kernel? Do we want to use TASK_SIZE_OF() or restrict these to
> actual virtual address space (which is only 48 bits on x86_64).
No. A page for a 32-bit task can line anywhere in the >32-bit physical
address space.
> It also seems that we need some replacement for the CONFIG_HIGHPTE
> #ifdefs. What is the basic reason for those? The copy_to_user() called
> by add_to_pagemap()->flush_pagemap() conflicts with the kmap_atomic()
> from pte_offset_map()?
My not-yet-working code uses get_user_pages and does away with all the
double-buffering and a bunch of the locking issues.
> As I think about it, do we really even need to walk the VMA list at all?
> We don't do anything differently if an address is inside a VMA. We just
> save time by not walking the pagetables for areas that we know are
> zeroed. But, we won't be able to walk down to the PTE level in most
> cases, anyway. So, it probably won't waste _that_ much time.
Perhaps not.
> +struct pagemapread {
> + struct mm_struct *mm;
> + unsigned long next;
> + unsigned long *buf;
> + pte_t *ptebuf;
> + unsigned long pos;
> + size_t count;
> + int index;
> + char __user *out;
> +};
>
> Just looking at the structure, it's _really_ hard to tell what fields go
> with other fields, and it took me a while to unravel it all. There's
> also some redundancy in it, which I find a bit confusing.
Most of those go away with the get_user_pages approach.
--
Mathematics is the supreme nostalgia of our time.
next prev parent reply other threads:[~2007-08-01 0:13 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-31 20:36 /proc/$pid/pagemap troubles Dave Hansen
2007-07-31 21:37 ` Matt Mackall
2007-07-31 22:43 ` Dave Hansen
2007-08-01 0:14 ` Matt Mackall [this message]
2007-08-01 16:06 ` Dave Hansen
2007-07-31 22:58 ` Andreas Schwab
2007-07-31 23:06 ` Dave Hansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070801001404.GY11115@waste.org \
--to=mpm@selenic.com \
--cc=haveblue@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.