linux-api.vger.kernel.org archive mirror
From: Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
Cc: Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Michael Kerrisk
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"
	<linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
	KOSAKI Motohiro
	<kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
	"Kirill A. Shutemov"
	<kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>,
	Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Jason Evans <je-b10kYP2dOMg@public.gmane.org>,
	Shaohua Li <shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	yalin wang
	<yalin.wang2010-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Daniel Micay
	<danielmicay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
Date: Thu, 5 Nov 2015 10:48:55 +0900
Message-ID: <20151105014855.GJ7357@bbox>
In-Reply-To: <CALCETrWWgbPNwCr-=LF8p33H25C_aNS5vy4wd3NUap6SmrsmkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, Nov 04, 2015 at 05:29:57PM -0800, Andy Lutomirski wrote:
> On Wed, Nov 4, 2015 at 4:56 PM, Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > On Wed, Nov 04, 2015 at 04:42:37PM -0800, Andy Lutomirski wrote:
> >> On Wed, Nov 4, 2015 at 4:13 PM, Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >> > On Tue, Nov 03, 2015 at 07:41:35PM -0800, Andy Lutomirski wrote:
> >> >> On Nov 3, 2015 5:30 PM, "Minchan Kim" <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >> >> >
> >> >> > Linux doesn't have the ability to free pages lazily, while other OSes
> >> >> > already support it via madvise(MADV_FREE).
> >> >> >
> >> >> > The gain is clear: the kernel can discard freed pages rather than
> >> >> > swapping them out or hitting OOM if memory pressure happens.
> >> >> >
> >> >> > Without memory pressure, freed pages would be reused by userspace without
> >> >> > additional overhead (e.g., page fault + allocation + zeroing).
> >> >> >
> >> >>
> >> >> [...]
> >> >>
> >> >> >
> >> >> > How it works:
> >> >> >
> >> >> > When the madvise syscall is called, the VM clears the dirty bit of the
> >> >> > ptes in the range.  If memory pressure happens, the VM checks the dirty
> >> >> > bit in the page table, and if it is still "clean", the page is a
> >> >> > "lazyfree" page, so the VM can discard it instead of swapping it out.
> >> >> > If there was a store to the page before the VM picked it for reclaim,
> >> >> > the dirty bit is set, so the VM swaps the page out instead of
> >> >> > discarding it.
> >> >>
> >> >> What happens if you MADV_FREE something that's MAP_SHARED or isn't
> >> >> ordinary anonymous memory?  There's a long history of MADV_DONTNEED on
> >> >> such mappings causing exploitable problems, and I think it would be
> >> >> nice if MADV_FREE were obviously safe.
> >> >
> >> > It filters out VM_LOCKED|VM_HUGETLB|VM_PFNMAP, and it filters out
> >> > file-backed vmas and MAP_SHARED with vma_is_anonymous.
> >> >
> >> >>
> >> >> Does this set the write protect bit?
> >> >
> >> > No.
> >> >
> >> >>
> >> >> What happens on architectures without hardware dirty tracking?  For
> >> >> that matter, even on architecture with hardware dirty tracking, what
> >> >> happens in multithreaded processes that have the dirty TLB state
> >> >> cached in a different CPU's TLB?
> >> >>
> >> >> Using the dirty bit for these semantics scares me.  This API creates a
> >> >> page that can have visible nonzero contents and then can
> >> >> asynchronously and magically zero itself thereafter.  That makes me
> >> >> nervous.  Could we use the accessed bit instead?  Then the observable
> >> >
> >> > The access bit is used by the aging algorithm for reclaim. In addition,
> >> > we support the clear_refs feature.
> >> > IOW, it could be reset at any time, so it's hard to use as a marker for
> >> > lazy freeing at the moment.
> >> >
> >>
> >> That's unfortunate.  I think that the ABI would be much nicer if it
> >> used the accessed bit.
> >>
> >> In any case, shouldn't the aging algorithm be irrelevant here?  A
> >> MADV_FREE page that isn't accessed can be discarded, whereas we could
> >> hopefully just say that a MADV_FREE page that is accessed gets moved
> >> to whatever list holds recently accessed pages and also stops being a
> >> candidate for discarding due to MADV_FREE?
> >
> > I meant that if we use the access bit as the indicator for a lazily
> > freed page, we could discard a valid page that was never hinted by
> > MADV_FREE but merely has its access bit cleared in the page table by
> > the aging algorithm.
> 
> Oh, is the rule that the anonymous pages that are clean are discarded
> instead of swapped out?  That is, does your patch set detect that an

A page that is swapped in after being swapped out has a clean pte, and
the swap device still holds valid data as long as the page isn't
touched, so the VM discards the page rather than swapping it out again.
Of course, the pte should then point to the swap slot.
If the VM decides to remove the page from the swap slot, the page should
be marked PG_dirty.

> anonymous page can be discarded if it's clean and that the lack of a
> dirty bit is the only indication that the page has been hit with
> MADV_FREE?

The lack of a dirty bit, yes; strictly speaking, the lack of PG_dirty,
because the page I mentioned above has a clean pte but will have
PG_dirty set.

> 
> If so, that seems potentially error prone -- I had assumed that pages
> that were swapped in but not written since swap-in would also be
> clean, and I don't see how you distinguish them.

I hope the above answers this.
> 
> --Andy
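
To make the distinction in the reply above concrete, here is a toy model
in Python. It is only a sketch of one reading of this message, not code
from the patch set. Every name in it (Page, Action, madvise_free, store,
free_swap_slot, reclaim) is hypothetical, and it assumes that MADV_FREE
clears both the pte dirty bit and PG_dirty and releases any swap slot.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DISCARD = "discard"          # drop the page; its contents are lost
    DROP_KEEP_SWAP = "drop"      # drop the page; the swap slot still holds the data
    SWAP_OUT = "swap out"        # write the page to swap before freeing it


@dataclass
class Page:
    """Hypothetical stand-in for an anonymous page and its pte state."""
    pte_dirty: bool = True   # dirty bit in the page table entry
    pg_dirty: bool = True    # PG_dirty flag on the page
    in_swap: bool = False    # a swap slot still holds a valid copy


def madvise_free(page: Page) -> None:
    # Assumption: the hint clears both dirty indications and any swap slot.
    page.pte_dirty = False
    page.pg_dirty = False
    page.in_swap = False


def store(page: Page) -> None:
    # A write from userspace sets the pte dirty bit again.
    page.pte_dirty = True


def free_swap_slot(page: Page) -> None:
    # Per the reply: once the swap copy is gone, the page is marked PG_dirty.
    page.in_swap = False
    page.pg_dirty = True


def reclaim(page: Page) -> Action:
    # Any dirty indication means the in-memory data must be preserved.
    if page.pte_dirty or page.pg_dirty:
        return Action.SWAP_OUT
    # Clean, but swap still has the data: dropping the page loses nothing.
    if page.in_swap:
        return Action.DROP_KEEP_SWAP
    # Clean and no swap copy: in this model only MADV_FREE gets a page here.
    return Action.DISCARD


if __name__ == "__main__":
    hinted = Page()
    madvise_free(hinted)
    print("hinted page:", reclaim(hinted))

    swapped_in = Page(pte_dirty=False, pg_dirty=False, in_swap=True)
    print("swapped-in page:", reclaim(swapped_in))
    free_swap_slot(swapped_in)
    print("after its swap slot is freed:", reclaim(swapped_in))
```

In this model a swapped-in, unwritten page never reaches the discard
case: either its swap slot is still valid, or freeing the slot has set
PG_dirty. That is the separation the reply relies on.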
