All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Andi Kleen <ak@linux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>, linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache
Date: Wed, 19 Feb 2014 01:57:14 +0200	[thread overview]
Message-ID: <20140218235714.GA16064@node.dhcp.inet.fi> (raw)
In-Reply-To: <CA+55aFwEAYhhUijNUf1dRppzh=+5QfXTAdGQe8D_mJH77tPHug@mail.gmail.com>

On Tue, Feb 18, 2014 at 10:28:11AM -0800, Linus Torvalds wrote:
> On Tue, Feb 18, 2014 at 10:07 AM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > Patch is wrong. Correct one is below.
> 
> Hmm. I don't hate this. Looking through it, it's fairly simple
> conceptually, and the code isn't that complex either. I can live with
> this.
> 
> I think it's a bit odd how you pass both "max_pgoff" and "nr_pages" to
> the fault-around function, though. In fact, I'd consider that a bug.
> Passing in "FAULT_AROUND_PAGES" is just wrong, since the code cannot -
> and in fact *must* not - actually fault in that many pages, since the
> starting/ending address can be limited by other things.
> 
> So I think that part of the code is bogus. You need to remove
> nr_pages, because any use of it is just incorrect. I don't think it
> can actually matter, since the max_pgoff checks are more restrictive,
> but if you think it can matter please explain how and why it wouldn't
> be a major bug?

I don't like this too...

Current max_pgoff is end of page table (or end of vma, if it ends before).

If we drop nr_pages but keep current max_pgoff, we will potentially setup
PTRS_PER_PTE pages a time: i.e. page fault to first page of page table and
all pages are ready. nr_pages limits the number.

It's not necessary bad idea to populate whole page table at once. I need
to measure how much latency we will add by doing that.

The only problem I see is that we take ptl for a bit too long. But with
split ptl it will affect only page table we populate.

Other approach is too limit ourself to FAULT_AROUND_PAGES from start_addr.
In this case sometimes we will do useless radix-tree lookup even if we had
chance to populated pages further in the page table.

> Apart from that, I'd really like to see numbers for different ranges
> of FAULT_AROUND_ORDER, because I think 5 is pretty high, but on the
> whole I don't find this horrible, and you still lock the page so it
> doesn't involve any new rules. I'm not hugely happy with another raw
> radix-tree user, but it's not horrible.
> 
> Btw, is the "radix_tree_deref_retry(page) -> goto restart" really
> necessary? I'd be almost more inclined to just make it just do a
> "break;" to break out of the loop and stop doing anything clever at
> all.

The code has not ready yet. I'll rework it. It just what I had by the end
of the day. I wanted to know if setup pte directly from ->fault_nonblock()
is okayish approach or considered layering violation.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Andi Kleen <ak@linux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>, linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache
Date: Wed, 19 Feb 2014 01:57:14 +0200	[thread overview]
Message-ID: <20140218235714.GA16064@node.dhcp.inet.fi> (raw)
In-Reply-To: <CA+55aFwEAYhhUijNUf1dRppzh=+5QfXTAdGQe8D_mJH77tPHug@mail.gmail.com>

On Tue, Feb 18, 2014 at 10:28:11AM -0800, Linus Torvalds wrote:
> On Tue, Feb 18, 2014 at 10:07 AM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > Patch is wrong. Correct one is below.
> 
> Hmm. I don't hate this. Looking through it, it's fairly simple
> conceptually, and the code isn't that complex either. I can live with
> this.
> 
> I think it's a bit odd how you pass both "max_pgoff" and "nr_pages" to
> the fault-around function, though. In fact, I'd consider that a bug.
> Passing in "FAULT_AROUND_PAGES" is just wrong, since the code cannot -
> and in fact *must* not - actually fault in that many pages, since the
> starting/ending address can be limited by other things.
> 
> So I think that part of the code is bogus. You need to remove
> nr_pages, because any use of it is just incorrect. I don't think it
> can actually matter, since the max_pgoff checks are more restrictive,
> but if you think it can matter please explain how and why it wouldn't
> be a major bug?

I don't like this too...

Current max_pgoff is end of page table (or end of vma, if it ends before).

If we drop nr_pages but keep current max_pgoff, we will potentially setup
PTRS_PER_PTE pages a time: i.e. page fault to first page of page table and
all pages are ready. nr_pages limits the number.

It's not necessary bad idea to populate whole page table at once. I need
to measure how much latency we will add by doing that.

The only problem I see is that we take ptl for a bit too long. But with
split ptl it will affect only page table we populate.

Other approach is too limit ourself to FAULT_AROUND_PAGES from start_addr.
In this case sometimes we will do useless radix-tree lookup even if we had
chance to populated pages further in the page table.

> Apart from that, I'd really like to see numbers for different ranges
> of FAULT_AROUND_ORDER, because I think 5 is pretty high, but on the
> whole I don't find this horrible, and you still lock the page so it
> doesn't involve any new rules. I'm not hugely happy with another raw
> radix-tree user, but it's not horrible.
> 
> Btw, is the "radix_tree_deref_retry(page) -> goto restart" really
> necessary? I'd be almost more inclined to just make it just do a
> "break;" to break out of the loop and stop doing anything clever at
> all.

The code has not ready yet. I'll rework it. It just what I had by the end
of the day. I wanted to know if setup pte directly from ->fault_nonblock()
is okayish approach or considered layering violation.

-- 
 Kirill A. Shutemov

  reply	other threads:[~2014-02-18 23:57 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-17 18:38 [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache Kirill A. Shutemov
2014-02-17 18:38 ` Kirill A. Shutemov
2014-02-17 18:38 ` [PATCH 1/2] mm: introduce vm_ops->fault_nonblock() Kirill A. Shutemov
2014-02-17 18:38   ` Kirill A. Shutemov
2014-02-17 18:38 ` [PATCH 2/2] mm: implement ->fault_nonblock() for page cache Kirill A. Shutemov
2014-02-17 18:38   ` Kirill A. Shutemov
2014-02-17 19:01 ` [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in " Linus Torvalds
2014-02-17 19:01   ` Linus Torvalds
2014-02-17 19:49   ` Kirill A. Shutemov
2014-02-17 19:49     ` Kirill A. Shutemov
2014-02-17 20:24     ` Linus Torvalds
2014-02-17 20:24       ` Linus Torvalds
2014-02-18 13:28   ` Rik van Riel
2014-02-18 13:28     ` Rik van Riel
2014-02-18 14:15     ` Wilcox, Matthew R
2014-02-18 14:15       ` Wilcox, Matthew R
2014-02-18 18:02       ` Linus Torvalds
2014-02-18 18:02         ` Linus Torvalds
2014-02-18 18:53         ` Matthew Wilcox
2014-02-18 18:53           ` Matthew Wilcox
2014-02-18 19:07           ` Linus Torvalds
2014-02-18 19:07             ` Linus Torvalds
2014-02-18 14:23     ` Kirill A. Shutemov
2014-02-18 14:23       ` Kirill A. Shutemov
2014-02-18 17:51     ` Linus Torvalds
2014-02-18 17:51       ` Linus Torvalds
2014-02-18 17:59   ` Kirill A. Shutemov
2014-02-18 17:59     ` Kirill A. Shutemov
2014-02-18 18:07     ` Kirill A. Shutemov
2014-02-18 18:07       ` Kirill A. Shutemov
2014-02-18 18:07       ` Kirill A. Shutemov
2014-02-18 18:28       ` Linus Torvalds
2014-02-18 18:28         ` Linus Torvalds
2014-02-18 23:57         ` Kirill A. Shutemov [this message]
2014-02-18 23:57           ` Kirill A. Shutemov
2014-02-19  0:29           ` Linus Torvalds
2014-02-19  0:29             ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140218235714.GA16064@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.