public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org
Subject: Re: nonlinear swapping w/o pte_chains [Re: VMA_MERGING_FIXUP and patch]
Date: Wed, 24 Mar 2004 15:37:29 +0100
Message-ID: <20040324143729.GC2065@dualathlon.random>
In-Reply-To: <Pine.LNX.4.44.0403240931430.7474-100000@localhost.localdomain>

On Wed, Mar 24, 2004 at 10:12:58AM +0000, Hugh Dickins wrote:
> On Tue, 23 Mar 2004, Andrea Arcangeli wrote:
> > 
> > I don't think I can use the tlb gather because I've to set the pte back
> > immediately, or can I? The IPI flood and huge pagetable walk with total
> > destruction of the address space with huge mappings will be very bad in
> > terms of usability during swapping of huge nonlinear vmas, but hey, if
> > you want to swap smoothly, you should use the vmas.
> 
> Thanks a lot for the preview (or would have been a preview if I'd been
> awake - and now I've found it easiest to look at 2.6.5-rc1 patched with
> the 2.6.5-rc1-aa2 objrmap and anon_vma you pointed Martin to in other
> mail, which includes your latest fixes).
> 
> I think you're being too harsh on the nonlinear vmas!  I know you're
> not keen on them, but punishing them this hard!  If I read it right,
> page_referenced will never (unless PageReferenced, or mapped into
> a nonlinear also) report a page from a nonlinear vma as referenced
> (I do agree with that part).  So they'll soon reach try_to_unmap,
> and each one which gets there will cause every page in every nonlinear
> vma of that inode to be unmapped from the nonlinears right then?
> Yes, that'll teach 'em to use sys_remap_file_pages without VM_LOCKED.

Yep ;)

> For mine I'll try to carry on with the less draconian approach I
> started yesterday, scanning just a range each time (rather 2.4 style).

That will DoS real-life workloads, that's why I had to be draconian.  After
you've finished I'll send you a testcase to try; it's a real-life testcase,
not an exploit.  The only way to keep the complexity of a pagetable scan
under control is to do what 2.4 does, that is to drop every pte we find in
our way so the vm will stop calling try_to_unmap: we must avoid walking the
vma more than once to swap it out.  This will cause a minor fault flood, but
that's ok, a nonlinear vma doesn't need to be fast at swapping.
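
To make the complexity argument concrete, here is a rough user-space toy
model, not kernel code. Everything in it (toy_vma, toy_unmap_one,
toy_unmap_all, the `visited` counter) is a hypothetical name invented for
illustration, and it assumes the naive alternative is one full walk of the
vma per page that reaches try_to_unmap:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: a "pte" is just a page id, TOY_NONE means not present. */
#define TOY_NPTES 8
#define TOY_NONE  (-1)

struct toy_vma {
	int pte[TOY_NPTES];
};

/* Walk the whole vma to drop only the pte mapping `page`.
 * `visited` accumulates how many ptes were looked at. */
size_t toy_unmap_one(struct toy_vma *vma, int page, size_t *visited)
{
	size_t cleared = 0;

	assert(vma != NULL && visited != NULL);
	for (size_t i = 0; i < TOY_NPTES; i++) {
		(*visited)++;
		if (vma->pte[i] == page) {
			vma->pte[i] = TOY_NONE;
			cleared++;
		}
	}
	return cleared;
}

/* Walk the vma once and drop every present pte found on the way. */
size_t toy_unmap_all(struct toy_vma *vma, size_t *visited)
{
	size_t cleared = 0;

	assert(vma != NULL && visited != NULL);
	for (size_t i = 0; i < TOY_NPTES; i++) {
		(*visited)++;
		if (vma->pte[i] != TOY_NONE) {
			vma->pte[i] = TOY_NONE;
			cleared++;
		}
	}
	return cleared;
}
```

In this model, unmapping 8 pages one at a time looks at 64 ptes in total,
while the single draconian pass looks at 8 and leaves nothing for a later
call to find.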

> At the very least, I think your unmap (and mine) needs to
> ptep_test_and_clear_young just before unmap_pte_page, and back out if
> the page is young (referenced).  I was going to recommend that anyway:
> at last got around to considering that issue of whether the failed
> trylocks should report referenced or not (return 1 or 0).  Looking at
> how shrink_list goes, even before 2.6.5-rc1, I'd expect it to behave
> better your way (proceed to try_to_unmap, which will rightly say
> SWAP_AGAIN if it fails the same trylock) than how it was before in
> objrmap; but that will behave better with a ptep_test_and_clear_young
> check first too.

Cute, I agree we should recheck the young bit inside the unmap path.
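
Roughly, the check would look like the user-space sketch below. This is only
a toy model under my own assumptions: toy_pte, toy_test_and_clear_young and
toy_try_unmap are hypothetical names, and the helper merely imitates what
ptep_test_and_clear_young does (return the old young bit and clear it), it
is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pte: only the two bits that matter for this sketch. */
struct toy_pte {
	bool present;
	bool young;
};

enum toy_result { TOY_UNMAPPED, TOY_REFERENCED, TOY_NOT_PRESENT };

/* Imitates the semantics of ptep_test_and_clear_young:
 * report whether the pte was young, and clear the bit. */
bool toy_test_and_clear_young(struct toy_pte *pte)
{
	bool was_young = pte->young;

	pte->young = false;
	return was_young;
}

/* Back out if the page was referenced since the last check,
 * otherwise drop the mapping. */
enum toy_result toy_try_unmap(struct toy_pte *pte)
{
	assert(pte != NULL);
	if (!pte->present)
		return TOY_NOT_PRESENT;
	if (toy_test_and_clear_young(pte))
		return TOY_REFERENCED;	/* keep the pte mapped */
	pte->present = false;
	return TOY_UNMAPPED;
}
```

A young pte survives the first call with its young bit cleared, and is only
unmapped by a later call if nothing touched it in between.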

> Sorry to see the #if VMA_MERGING_FIXUPs are still there.  I've a
> growing feeling that it won't make enough difference when they're
> gone.  But maybe you have a cunning plan to merge all the anon_vmas
> which would result from an mmap next page, write data in, mprotect ro,
> mmap next page, write data in, mprotect ro, ..... workload.

The problem is that mprotect (and mremap) merging is low priority compared
to nonlinear==mlock and the i_mmap{shared} complexity, so I'll address it
only after I have scalable swapping for huge i_mmap{shared} lists too, which
is a prerequisite for getting this merged.  mprotect merging doesn't sound
like a prerequisite, though I certainly agree we should fix it up soon (and
once we fix it, it'll work for files too, something that has never worked to
date; I feel it'll be as important for files as it has been so far for anon
ram, and nobody has complained yet that it's not enabled for files ;).
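
For reference, the workload you describe above (mmap next page, write data
in, mprotect ro, repeat) can be approximated in user space with something
like the sketch below. The function name toy_mmap_write_protect is
hypothetical, and passing NULL as the mmap hint is my assumption: the kernel
usually places such mappings next to each other, but nothing guarantees it.
The mappings are deliberately left in place so the resulting layout can be
inspected in /proc/self/maps.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one anonymous page at a time, fill it, then make it read-only.
 * Returns how many pages completed all three steps. */
size_t toy_mmap_write_protect(size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t done = 0;

	if (psz <= 0)
		return 0;

	for (size_t i = 0; i < npages; i++) {
		unsigned char *p = mmap(NULL, (size_t)psz,
					PROT_READ | PROT_WRITE,
					MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			break;

		memset(p, 0xaa, (size_t)psz);		/* write data in */

		if (mprotect(p, (size_t)psz, PROT_READ) != 0) {
			munmap(p, (size_t)psz);
			break;
		}
		assert(p[0] == 0xaa);			/* still readable */
		done++;
	}
	return done;
}
```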
