All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Jared Hulbert <jaredeh@gmail.com>, benh@kernel.crashing.org
Cc: Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [RFC] Changing VM_PFNMAP assumptions and rules
Date: Sun, 11 Nov 2007 11:09:34 +1100	[thread overview]
Message-ID: <200711111109.34562.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <6934efce0711091115i3f859a00id0b869742029b661@mail.gmail.com>

On Saturday 10 November 2007 06:15, Jared Hulbert wrote:
> Per conversations regarding XIP from the vm/fs mini-summit a couple
> months back I've got a patch to air out.
>
> The basic problem is that the assumptions about PFN mappings stemming
> from the rules of remap_pfn_range() aren't always valid.  For example:
> what stops one from using vm_insert_pfn() to map PFN's into a vma in
> an arbitrary order?  Nothing.  Yet those PFN's cause problems in two
> ways.
>
> First, vm_normal_page() won't return NULL. 

They will, because it isn't allowed to be a COW mapping, and hence it
fails the vm_normal_page() test.

> My answer to this is to 
> simply check if pfn_valid()  if it isn't then we've got a proper PFN
> that can only be a PFN.  If you do have a valid PFN then you are (A) a
> 'cow'ed' PFN that is now a real page or (B) you are a real page
> pretending to be a PFN only.  The thing that makes me nervous is that
> my hack doesn't let that page pretend to be a PFN.  I can't figure out
> why a page would need/want to pretend to be a PFN so I don't see
> anything wrong with this, but maybe somebody does.
>
> Second, there are a few random BUG_ON() that don't seem to serve any
> purpose other than to punish the PFN's that don't abide by
> remap_pfn_range() rules.  I just get rid of them.  The problem is I
> don't really understand why they are there in the first place so for
> all I know I'm horribly breaking spufs or something.

They are perhaps slightly undercommented, but they are definitely
required. And it is to ensure that everything works correctly.


> Okay so I haven't tried this out on 2.6.24-rc1 yet, but the same basic
> idea worked on 2.6.23 and older.  I just wanted to get feedback on
> this approach.  I don't know the vm all that well so I want to make
> sure I'm not doing something really stupid that breaks a bunch of code
> paths I don't use.

You actually can't just use pfn_valid, because there are cases where
you actually *cannot* touch the underlying struct page's mapcount,
flags, etc. I think the only real user is /dev/mem.

So my suggestion to you, if you want to support COW pfnmaps, is to
create a new VM_FLAG type (VM_INVALIDPFNMAP? ;)), which has the
pfn_valid() == COW semantics that you want.

Keep the streamlined fastpath in vm_normal_page()...

  if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_JAREDMAP))) {
    if (vma->vm_flags & VM_JAREDMAP) {
      if (!pfn_valid(pfn))
        return NULL;
    } else {
      unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
      if (pfn == vma->vm_pgoff + off)
        return NULL;
      if (!is_cow_mapping(vma->vm_flags))
        return NULL;
    }
}

The tests in vm_insert_pfn would just be complementary to your new
scheme..
BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_JAREDMAP)));
BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
BUG_ON((vma->vm_flags & VM_JAREDMAP) && pfn_valid(pfn));

May not work out so easy, but AFAIKS it will work. See how much milage
that gets you.

The other thing you might like is to allow pfn_valid(pfn) pfns to go
into these mappings, and you know it is fine to twiddle with the
struct page (eg. if you want to switch between different pfns, which
I know the spufs guys want to). That's not too hard: just take out some
of the assertions. You might have to do a little bit of setup work too,
like increment the page count and mapcount etc. but just so long as you
put that in a mm/memory.c helper rather than your own code, it should
be clean enough.

>
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 9791e47..fb962d0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -366,29 +366,19 @@ static inline int is_cow_mapping(unsigned int flags)
>   * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
>   * will have each page table entry just pointing to a raw page frame
>   * number, and as far as the VM layer is concerned, those do not have
> - * pages associated with them - even if the PFN might point to memory
> - * that otherwise is perfectly fine and has a "struct page".
> + * pages associated with them.
>   *
> - * The way we recognize those mappings is through the rules set up
> - * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
> - * and the vm_pgoff will point to the first PFN mapped: thus every
> - * page that is a raw mapping will always honor the rule
> - *
> - *	pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
> - *
> - * and if that isn't true, the page has been COW'ed (in which case it
> - * _does_ have a "struct page" associated with it even if it is in a
> - * VM_PFNMAP range).
> + * The old "remap_pfn_range()" rules don't work for all applications.
> + * Each "page" in a PFN mapping either has a page struct backing it
> + * or it doesn't.  If it does then treat it like the page it is, if
> + * if it doesn't then it is not a normal page so just return NULL.
>   */
>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long
> addr, pte_t pte)
>  {
>  	unsigned long pfn = pte_pfn(pte);
>
>  	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
> -		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
> -		if (pfn == vma->vm_pgoff + off)
> -			return NULL;
> -		if (!is_cow_mapping(vma->vm_flags))
> +		if (!pfn_valid(pfn))
>  			return NULL;
>  	}
>
> @@ -1212,7 +1202,6 @@ int vm_insert_pfn(struct vm_area_struct *vma,
> unsigned long addr,
>  	spinlock_t *ptl;
>
>  	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
> -	BUG_ON(is_cow_mapping(vma->vm_flags));
>
>  	retval = -ENOMEM;
>  	pte = get_locked_pte(mm, addr, &ptl);
> @@ -2216,8 +2205,6 @@ static int __do_fault(struct mm_struct *mm,
> struct vm_area_struct *vma,
>  	vmf.flags = flags;
>  	vmf.page = NULL;
>
> -	BUG_ON(vma->vm_flags & VM_PFNMAP);
> -
>  	if (likely(vma->vm_ops->fault)) {
>  		ret = vma->vm_ops->fault(vma, &vmf);
>  		if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
>
> --

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-11-11  0:09 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-09 19:15 [RFC] Changing VM_PFNMAP assumptions and rules Jared Hulbert
2007-11-11  0:09 ` Nick Piggin [this message]
2007-11-12 22:03   ` Jared Hulbert
2007-11-12 22:29     ` Benjamin Herrenschmidt
2007-11-12 23:53       ` Jared Hulbert
2007-11-13  0:24         ` Benjamin Herrenschmidt
2007-11-13 12:08     ` Nick Piggin
2007-11-14  1:29       ` Jared Hulbert
2007-11-13 17:26         ` Nick Piggin
2007-11-14 18:52           ` Jared Hulbert
2007-11-16 23:42           ` Jared Hulbert
2007-11-19  0:17             ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200711111109.34562.nickpiggin@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=benh@kernel.crashing.org \
    --cc=jaredeh@gmail.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.