From: Minchan Kim <minchan.kernel.2@gmail.com>
To: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Arun Sharma <asharma@fb.com>,
	John Stultz <john.stultz@linaro.org>, Mel Gorman <mel@csn.ul.ie>,
	Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>, Neil Brown <neilb@suse.de>,
	Mike Hommey <mh@glandium.org>, Taras Glek <tglek@mozilla.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Jason Evans <je@fb.com>,
	sanjay@google.com, Paul Turner <pjt@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michel Lespinasse <walken@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC v7 00/11] Support vrange for anonymous page
Date: Sun, 14 Apr 2013 16:42:04 +0900
Message-ID: <20130414074204.GC8241@blaptop>
In-Reply-To: <5166D037.6040405@gmail.com>

Hi KOSAKI,

On Thu, Apr 11, 2013 at 11:01:11AM -0400, KOSAKI Motohiro wrote:
> >>>> and adding new syscall invocation is unwelcome.
> >>>
> >>> Sure. But one more system call could be cheaper than page-granularity
> >>> operation on purged range.
> >>
> >> I don't think vrange(VOLATILE) cost is related to this discussion.
> >> Whether sending SIGBUS or just nuke pte, purge should be done on vmscan,
> >> not vrange() syscall.
> > 
> > Again, please see the MADV_FREE. http://lwn.net/Articles/230799/
> > It changes the pte and page flags on all pages of the range through
> > zap_pte_range. So it would make vrange(VOLATILE) expensive, and
> > the bigger the range is, the bigger the cost is.
> 
> This hadn't crossed my mind. Now try_to_discard_one() inserts a vrange
> to make SIGBUS. Then we can insert pte_none() at the same cost too. Am
> I missing something?

For your requirement, we need a tracking model that detects whether a page
is currently in use by the process before the VM discards it, *if* we don't
provide the paired vrange(NOVOLATILE) system call (see below). That tracking
would have to be set up in the vrange(VOLATILE) system call context.

> 
> I couldn't imagine why the pte should be zapped on vrange(VOLATILE).

Sorry, my explanation was too poor to understand.
I will try again.

First of all, what you want is almost the same as MADV_FREE,
so let's look at that first.

If you call madvise(range, MADV_FREE), the VM has to examine every page
mapped in the page table for range(start, start + len). So we need a page
table lookup for the whole range, and we mark a flag (e.g. PG_lazyfree) in
each page descriptor as a hint that the kernel should discard the page
instead of swapping it out when reclaim happens. We also need to clear the
dirty bit in each PTE so that we can detect whether the page has been
dirtied since madvise(range, MADV_FREE) was called, because we must not
discard pages that the process has used again after the call. So if the VM
finds a page with PG_lazyfree set but the PTE shows that it was dirtied
recently, the VM can't discard that page.
So the overhead of the madvise(MADV_FREE) system call is as follows:

1. Look up all pages in the page table for the range.
2. Mark a bit (PG_lazyfree) in the page descriptor of each page mapped in
   the range.
3. Clear the dirty bit and flush the TLB.
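
To make the three steps above concrete, here is a minimal userspace sketch
in C. It is only a toy model, not kernel code: struct toy_page,
toy_madv_free() and toy_can_discard() are names made up for illustration,
and the TLB flush is only mentioned in a comment. The point it shows is
that the work is done once per page, so it grows with the range size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a PTE plus its page descriptor; not kernel structures. */
struct toy_page {
    bool mapped;     /* a page is present at this address */
    bool pte_dirty;  /* models the dirty bit in the PTE */
    bool lazyfree;   /* models the PG_lazyfree page flag */
};

/*
 * Model of madvise(range, MADV_FREE): visit every page slot in the range,
 * mark mapped pages lazyfree and clear their dirty bit. Returns the number
 * of slots visited, which is always equal to the range size in pages.
 */
size_t toy_madv_free(struct toy_page *pages, size_t npages)
{
    size_t visited = 0;

    assert(pages != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++) {
        visited++;                    /* step 1: page table lookup */
        if (!pages[i].mapped)
            continue;
        pages[i].lazyfree = true;     /* step 2: mark the descriptor */
        pages[i].pte_dirty = false;   /* step 3: clear the dirty bit */
    }
    /* A real implementation would also have to flush the TLB here. */
    return visited;
}

/* Reclaim may discard a lazyfree page only if it was not dirtied again. */
bool toy_can_discard(const struct toy_page *p)
{
    return p->mapped && p->lazyfree && !p->pte_dirty;
}
```

If the process writes to a page after the call, the dirty bit is set again
and toy_can_discard() returns false for that page, which is the tracking
that the per-page work buys.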

So madvise(MADV_FREE) would be better than madvise(MADV_DONTNEED) because it
can avoid page faults if memory pressure doesn't happen, but the system call
overhead could still be huge, and in particular it grows in proportion to
the range size.

Let's talk about vrange(range, VOLATILE).
Its overhead is very small: it just marks a flag in the structure that
represents the range (i.e., struct vrange). When the VM wants to reclaim
some pages and finds that a page is mapped in a VOLATILE area, it can
discard the page instead of swapping it out. This moves the overhead from
the system call itself to the VM reclaim path, which is a very slow path in
the system anyway, and I think that's a desirable design (and that's why we
have rmap).
But a problem remains. The VM can't detect whether a page is used by the
process after it calls vrange(range, VOLATILE), because we didn't do any
per-page work in vrange(VOLATILE), so the VM might discard the page out
from under the process. That doesn't happen with madvise(MADV_FREE) because
it cleared the PTE dirty bit so that it can detect whether the page has
been used since madvise was called.
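
As a rough illustration of the paragraph above, here is a toy C model of
the VOLATILE side. The names (struct toy_vrange, toy_vrange_volatile(),
toy_reclaim_may_discard()) and the fields are assumptions made up for this
sketch, not the layout used in the patch set. It only shows that the system
call side is a single flag store and that the decision is made later, on
the reclaim side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct vrange; the fields are illustrative only. */
struct toy_vrange {
    size_t start;      /* first page index covered by the range */
    size_t len;        /* number of pages covered */
    bool is_volatile;  /* set by the modelled vrange(VOLATILE) */
};

/* Model of vrange(range, VOLATILE): one flag store, whatever len is. */
void toy_vrange_volatile(struct toy_vrange *r)
{
    assert(r != NULL);
    r->is_volatile = true;
}

/*
 * Reclaim-side check: a page may be discarded instead of swapped out when
 * it lies inside a volatile range. Nothing here records whether the
 * process touched the page after the call, which is the remaining problem.
 */
bool toy_reclaim_may_discard(const struct toy_vrange *r, size_t page)
{
    return r->is_volatile && page >= r->start && page - r->start < r->len;
}
```

Note that toy_reclaim_may_discard() gives the same answer for a page the
process is still using and for one it has abandoned, because no per-page
state was recorded at system call time.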

The solution in vrange is a new vrange(range, NOVOLATILE) system call, which
hints to the kernel that it must not discard pages in the range any more.
The cost of vrange(range, NOVOLATILE) is very small, too:
it just clears the flag in the struct vrange that represents the range.

So I think calling the pair of volatile system calls would be cheaper than
a single madvise(MADV_FREE).
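
The comparison can be sketched with a toy cost model in C. Everything here
is an assumption for illustration: the names are made up, each call simply
returns how many stores it performs in the model, and the model ignores the
cost of finding or inserting the range as well as any work done later in
reclaim. It is not a measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct vrange; the fields are illustrative only. */
struct toy_vrange {
    size_t len;        /* number of pages covered */
    bool is_volatile;  /* toggled by the paired calls */
};

/* Model of vrange(range, VOLATILE): one flag store. */
size_t toy_vrange_volatile(struct toy_vrange *r)
{
    assert(r != NULL);
    r->is_volatile = true;
    return 1;
}

/* Model of vrange(range, NOVOLATILE): one flag store, clearing the hint. */
size_t toy_vrange_novolatile(struct toy_vrange *r)
{
    assert(r != NULL);
    r->is_volatile = false;
    return 1;
}

/* Model of madvise(MADV_FREE): one PTE/descriptor update per page. */
size_t toy_madv_free_cost(size_t npages)
{
    return npages;
}

/* Reclaim may discard pages of the range only while it is volatile. */
bool toy_reclaim_may_discard(const struct toy_vrange *r)
{
    return r->is_volatile;
}
```

In this model a process that brackets its idle period with the two calls
pays 2 stores in total, while a single MADV_FREE over a 1024-page range
pays 1024 per-page updates.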

I hope this helps your understanding, but I'm not sure, because I am writing
this in an airport, where it is very hard to focus on my work. :(

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

