From: Andrew Morton <akpm@zip.com.au>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Linus Torvalds <torvalds@transmeta.com>,
"Martin J. Bligh" <fletch@aracnet.com>,
Rik van Riel <riel@conectiva.com.br>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: scalable kmap (was Re: vm lock contention reduction)
Date: Mon, 08 Jul 2002 13:39:04 -0700
Message-ID: <3D29F868.1338ACF3@zip.com.au>
In-Reply-To: 20020708080953.GC1350@dualathlon.random
Andrea Arcangeli wrote:
>
> ...
> > generic_file_write()
> > {
> >         ...
> >         atomic_inc(&current->mm->dont_unmap_pages);
> >
> >         {
> >                 volatile char dummy;
> >                 __get_user(dummy, addr);
> >                 __get_user(dummy, addr+bytes+1);
> >         }
> >         lock_page();
> >         ->prepare_write()
> >         kmap_atomic()
> >         copy_from_user()
> >         kunmap_atomic()
> >         ->commit_write()
> >         atomic_dec(&current->mm->dont_unmap_pages);
> >         unlock_page()
> > }
> >
> > and over in mm/rmap.c:try_to_unmap_one(), check mm->dont_unmap_pages.
> >
> > Obviously, all this is dependent on CONFIG_HIGHMEM.
> >
> > Workable?
>
> the above pseudocode still won't work correctly,
Sure. It's crap. It can be used to get mlockall() for free.
> if you don't pin the
> page as Martin proposed and you only rely on its virtual mapping to stay
> there because the page can go away under you despite the
> swap_out/rmap-unmapping work, if there's a parallel thread running
> munmap+re-mmap under you. So at the very least you need the mmap_sem at
> every generic_file_write to keep other threads from changing your virtual
> address under you. And you'll basically need to make the mmap_sem
> recursive, because you have to take it before running __get_user to
> avoid races. You could easily do that using my rwsem, I made two versions
> of them, with one that supports recursion, however this is just for your
> info, I'm not suggesting to make it recursive.
I think I'll just go for pinning the damn page. It's a spinlock and
maybe three cachelines but the kernel is about to do a 4k memcpy
anyway. And get_user_pages() doesn't show up much on O_DIRECT
profiles and it'll be a net win and we need to do SOMETHING, dammit.
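
A minimal userspace sketch of the pinning idea, under stated assumptions. The names `struct page_model`, `pin_page`, `unpin_page`, `try_reclaim` and `copy_from_pinned` are hypothetical and are not kernel API; in the kernel the reference would come from something like get_user_pages(). The sketch only models one property: while an extra reference is held, reclaim leaves the page alone, so the copy reads from the page itself rather than trusting a virtual mapping to stay put.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_MODEL 4096

/* Hypothetical stand-in for a physical page; not the kernel's struct page. */
struct page_model {
    int refcount;                        /* 1 = only the mapping holds it */
    bool freed;                          /* set once reclaim has taken the page */
    unsigned char data[PAGE_SIZE_MODEL];
};

/* Take an extra reference so that reclaim leaves the page alone. */
static void pin_page(struct page_model *p)
{
    p->refcount++;
}

/* Drop the reference taken by pin_page(). */
static void unpin_page(struct page_model *p)
{
    p->refcount--;
}

/* Model of reclaim: succeeds only when nobody holds an extra reference. */
static bool try_reclaim(struct page_model *p)
{
    if (p->refcount > 1)
        return false;
    p->freed = true;
    return true;
}

/*
 * Copy from the source page while it is pinned. Returns the number of
 * bytes copied, or 0 if the page was already gone or the request does
 * not fit in one page.
 */
static size_t copy_from_pinned(struct page_model *src, unsigned char *dst,
                               size_t bytes)
{
    size_t copied = 0;

    if (bytes > PAGE_SIZE_MODEL)
        return 0;
    pin_page(src);
    if (!src->freed) {
        memcpy(dst, src->data, bytes);
        copied = bytes;
    }
    unpin_page(src);
    return copied;
}
```

The model is single-threaded and ignores locking entirely; it says nothing about the cost of the spinlock and cachelines mentioned above.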
> ...
> The only reason I can imagine rmap being useful on today's
> hardware for all kinds of vma (what the patch provides compared to what
> we have now) is to more efficiently defragment ram with an algorithm in
> the memory balancing to provide largepages more efficiently from mixed
> zones, if somebody would suggest rmap for this reason (nobody did yet)
It has been discussed. But no action yet.
> I
> would have to agree completely that it is very useful for that, OTOH it
> seems everybody is reserving (or planning to reserve) a zone for
> largepages anyways so that we don't run into fragmentation in the first
> place. And btw - talking about largepages - we have three concurrent and
> controversial largepage implementations for linux available today, they
> all have different API, one is even shipped in production by a vendor,
What implementation do you favour?
> and while auditing the code I saw that it also exports an API visible to
> userspace [ignoring the sysctl] (unlike what I was told):
>
> +#define MAP_BIGPAGE 0x40 /* bigpage mapping */
> [..]
> _trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN) |
> _trans(flags, MAP_DENYWRITE, VM_DENYWRITE) |
> + _trans(flags, MAP_BIGPAGE, VM_BIGMAP) |
> _trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE);
> return prot_bits | flag_bits;
> #undef _trans
>
> that's a new unofficial bitflag to mmap that any proprietary userspace
> can pass to mmap today. Other implementations of the largepage feature
> use madvise or other syscalls to tell the kernel to allocate
> largepages. At least the above won't return -EINVAL so the binary-only
> app will work transparently on a mainline kernel, but it can eventually
> malfunction if we use 0x40 for something else in 2.5. So I think we
> should do something about the largepages too ASAP into 2.5 (like
> async-io).
Yup. I don't think the -aa kernel has a large page patch, does it?
Is that something which you have time to look into?
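
The flag translation in the quoted hunk can be modelled in userspace to show why an unknown mmap bit goes unnoticed. This is a sketch under assumptions: every `SK_` name is hypothetical, the bit values other than 0x40 are illustrative rather than taken from any particular tree, and only the translation step is modelled, not the rest of the mmap path.

```c
/* Illustrative bit values; assumptions, not any kernel tree's headers. */
#define SK_MAP_BIGPAGE    0x0040UL   /* the unofficial bit from the quoted patch */
#define SK_MAP_GROWSDOWN  0x0100UL
#define SK_MAP_DENYWRITE  0x0800UL
#define SK_MAP_EXECUTABLE 0x1000UL

#define SK_VM_GROWSDOWN   0x0100UL
#define SK_VM_DENYWRITE   0x0800UL
#define SK_VM_EXECUTABLE  0x1000UL

/* If x has bit1 set, yield bit2; otherwise yield 0. */
#define SK_TRANS(x, bit1, bit2) (((x) & (bit1)) ? (bit2) : 0UL)

/*
 * Translate mmap flags the way a kernel WITHOUT the bigpage patch might:
 * only the bits listed here are carried over, anything else is dropped.
 */
static unsigned long sk_calc_flag_bits(unsigned long flags)
{
    return SK_TRANS(flags, SK_MAP_GROWSDOWN,  SK_VM_GROWSDOWN) |
           SK_TRANS(flags, SK_MAP_DENYWRITE,  SK_VM_DENYWRITE) |
           SK_TRANS(flags, SK_MAP_EXECUTABLE, SK_VM_EXECUTABLE);
}
```

In this model, `sk_calc_flag_bits(SK_MAP_BIGPAGE)` yields 0: the bit is silently ignored rather than rejected, which is the behaviour the quoted text relies on and also why reusing 0x40 for something else later could break such a binary.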