From: Matthew Dillon <dillon@apollo.backplane.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Rik van Riel <riel@conectiva.com.br>,
Chris Wedgwood <cw@f00f.org>,
linux-mm@kvack.org, linux-kernel@vger.rutgers.edu
Subject: Re: RFC: design for new VM
Date: Fri, 4 Aug 2000 18:52:16 -0700 (PDT)
Message-ID: <200008050152.SAA89298@apollo.backplane.com>
In-Reply-To: Pine.LNX.4.10.10008041655420.11340-100000@penguin.transmeta.com
:I agree that from a page table standpoint you should be correct.
:
:I don't think that the other issues are as easily resolved, though.
:Especially with address space ID's on other architectures it can get
:_really_ interesting to do TLB invalidates correctly to other CPU's etc
:(you need to keep track of who shares parts of your page tables etc).
:
:...
:> mismatch, such as call mprotect(), the shared page table would be split.
:
:Right. But what about the TLB?
I'm not advocating trying to share TLB entries; that would be
a disaster. I'm contemplating sharing just the physical page table
structure. E.g., if you mmap() a 1GB file shared (or private read-only)
into 300 independent processes, it should be possible to share all the
metadata required to support that mapping except for the TLB entries
themselves. ASNs shouldn't make a difference... presumably the tags on
the TLB entries are added on after the metadata lookup. I'm also not
advocating attempting to share intermediate 'partial' in-memory TLB
caches (hash tables or other structures). Those are typically fixed in
size, per-cpu, and would not be impacted by scale.
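
A rough way to picture the proposal is a toy Python model, not kernel
code. The names PageTable, AddressSpace, fault and mprotect below are
hypothetical and are not FreeBSD or Linux interfaces. The sketch assumes
one shared table per mapping with a reference count, a fault that only
the first process has to service, and a split into a private copy when
one process changes protection. TLB state is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class PageTable:
    """Toy page table: virtual page number -> (physical frame, writable)."""
    entries: dict = field(default_factory=dict)
    refs: int = 0  # number of address spaces pointing at this table


class AddressSpace:
    """One process's view of a mapping; it may share its PageTable."""

    def __init__(self, table):
        self.table = table
        table.refs += 1

    def fault(self, vpn, frame):
        """Install a pte on first touch. Returns True if work was done."""
        if vpn in self.table.entries:
            return False  # a process sharing this table already faulted it in
        self.table.entries[vpn] = (frame, False)
        return True

    def mprotect(self, vpn, writable):
        """Change protection; split off a private copy if the table is shared."""
        if self.table.refs > 1:
            self.table.refs -= 1
            self.table = PageTable(dict(self.table.entries), refs=1)
        if vpn in self.table.entries:
            frame, _ = self.table.entries[vpn]
            self.table.entries[vpn] = (frame, writable)


# 300 processes map the same file and all touch page 0.
shared = PageTable()
procs = [AddressSpace(shared) for _ in range(300)]
faults = sum(p.fault(vpn=0, frame=42) for p in procs)

# One process changes protection and gets its own table; the rest still share.
procs[0].mprotect(0, writable=True)
```

In this model only the first toucher installs the pte, and the mprotect()
caller ends up with a private table while the other 299 address spaces
keep sharing the original one.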
:You have to have some page table locking mechanism for SMP eventually: I
:think you miss some of the problems because the current FreeBSD SMP stuff
:is mostly still "big kernel lock" (outdated info?), and you'll end up
:kicking yourself in a big way when you have the 300 processes sharing the
:same lock for that region..
If it were a long-held lock I'd worry, but if it's a lock on a pte
I don't think it can hurt. After all, even with separate page tables
if 300 processes fault on the same backing file offset you are going
to hit a bottleneck with MP locking anyway, just at a deeper level
(the filesystem rather than the VM system). The BSDI folks did a lot
of testing with their fine-grained MP implementation and found that
putting a global lock around the entire VM system had absolutely no
impact on MP performance.
:> (Linux falls on its face for other reasons, mainly the fact that it
:> maps all of physical memory into KVM in order to manage it).
:
:Not true any more.. Trying to map 64GB of RAM convinced us otherwise ;)
Oh, that's cool! I don't think anyone in FreeBSDland has bothered with
large-memory (> 4GB) configurations; there doesn't seem to be
much demand for such a thing on IA32.
:> I think the loss of MP locking for this situation is outweighed by the
:> benefit of a huge reduction in page faults -- rather than see 300
:> processes each take a page fault on the same page, only the first process
:> would and the pte would already be in place when the others got to it.
:> When it comes right down to it, page faults on shared data sets are not
:> really an issue for MP scaleability.
:
:I think you'll find that there are all these small details that just
:cannot be solved cleanly. Do you want to be stuck with a x86-only
:solution?
:
:That said, I cannot honestly say that I have tried very hard to come up
:with solutions. I just have this feeling that it's a dark ugly hole that I
:wouldn't want to go down..
:
: Linus
Well, I don't think this is x86-specific. Or, that is, I don't think it
would pollute the machine-independent code. FreeBSD has virtually no
notion of 'page tables' outside the i386-specific VM files... it doesn't
use page tables (or two-level page-like tables... is Linux still using
those?) to store meta information at all in the higher levels of the
kernel. It uses architecture-independent VM objects and vm_map_entry
structures for that. Physical page tables on FreeBSD are
throw-away-at-any-time entities. The actual implementation of the
'page table' in the IA32 sense occurs entirely in the machine-dependent
subdirectory for IA32.
A page-table sharing mechanism would have to record the knowledge --
the 'potential' for sharing -- at a higher level (the vm_map_entry
structure), but it would be up to the machine-dependent VM code to
implement any actual sharing given that knowledge. So while the specific
implementation for IA32 is definitely machine-specific, it would have
no effect on other OS ports (of course, we have only one other
working port at the moment, to the Alpha, but you get the idea).
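
The layering described above can be sketched as a toy Python model. This
is an illustration under stated assumptions, not FreeBSD's actual
structures: VmMapEntry and its fields, share_hint, SharingPmap and
PrivatePmap are all hypothetical names. The machine-independent entry
only records that sharing is possible; each machine-dependent layer
decides whether to act on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmMapEntry:
    """Machine-independent description of one mapping (hypothetical fields)."""
    obj_id: int       # which backing VM object is mapped
    offset: int       # offset into that object
    size: int         # length of the mapping in bytes
    prot: str         # e.g. "r" or "rw"
    share_hint: bool  # MI code only records that sharing is *possible*


class SharingPmap:
    """MD layer that acts on the hint: one table per identical mapping."""

    def __init__(self):
        self.tables = {}

    def table_for(self, pid, entry):
        key = (entry.obj_id, entry.offset, entry.size, entry.prot)
        if not entry.share_hint:
            key = (pid,) + key  # no hint: fall back to a per-process table
        return self.tables.setdefault(key, {})


class PrivatePmap:
    """MD layer that ignores the hint: one table per process."""

    def __init__(self):
        self.tables = {}

    def table_for(self, pid, entry):
        return self.tables.setdefault((pid, entry), {})


# The same MI entry is handed to two different MD implementations.
entry = VmMapEntry(obj_id=1, offset=0, size=1 << 30, prot="r", share_hint=True)
ia32, other = SharingPmap(), PrivatePmap()
for pid in range(300):
    ia32.table_for(pid, entry)
    other.table_for(pid, entry)
```

The machine-independent entry is identical in both cases; only the
machine-dependent side differs, ending up with one table in the sharing
layer and 300 in the layer that ignores the hint.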
-Matt
Matthew Dillon
<dillon@backplane.com>