From: "Martin J. Bligh" <mbligh@aracnet.com>
To: Andrea Arcangeli <andrea@suse.de>, Andrew Morton <akpm@osdl.org>
Cc: Rik van Riel <riel@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: 2.4.23aa2 (bugfixes and important VM improvements for the high end)
Date: Fri, 27 Feb 2004 14:03:07 -0800 [thread overview]
Message-ID: <162060000.1077919387@flay> (raw)
In-Reply-To: <20040227211548.GI8834@dualathlon.random>
> note that the 4:4 split is wrong in 99% of cases where people need 64GB.
> I'm advocating strongly for the 2:2 split to everybody I talk with; I'm
> trying to spread the 2:2 idea because IMHO it's an order of magnitude
> simpler and an order of magnitude superior. Unfortunately I could not
> get a single number to back my 2:2 claims, since the 4:4 buzzword is
> spreading and people only test with 4:4, so it's pretty hard for me to
> spread the 2:2 buzzword.
For the record, I for one am not opposed to doing 2:2 instead of 4:4.
What pisses me off is people trying to squeeze large amounts of memory
into 3:1, and distros pretending it's supportable, when it's never
stable across a broad spectrum of workloads. Between 2:2 and 4:4,
it's just a different overhead tradeoff.
> 4:4 makes no sense at all, the only advantage of 4:4 w.r.t. 2:2 is that
> they can map 2.7G per task of shm instead of 1.7G per task of shm.
Eh? You have a 2GB difference of user address space, but only a 1GB
difference of shm size. You lost a GB somewhere ;-) Depending on whether
you move TASK_UNMAPPED_BASE or not, you might mean 2.7 vs 0.7, or at a
pinch 3.5 vs 1.5, I'm not sure.
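A back-of-envelope sketch of that arithmetic (my own numbers, treating 4:4
as roughly a full 4GB task size, assuming the stock TASK_UNMAPPED_BASE =
TASK_SIZE/3 default; shm_window is just an illustrative helper, not a
kernel function). With the base held fixed across splits, the shm-window
difference matches the 2GB user-space difference:

```python
GB = 1 << 30

def shm_window(user_space, unmapped_base):
    """Contiguous mmap window above TASK_UNMAPPED_BASE, in bytes."""
    return user_space - unmapped_base

user_4_4 = 4 * GB          # ~whole 4GB address space for the task
user_2_2 = 2 * GB          # 2GB user / 2GB kernel

# Default placement was TASK_SIZE/3; keep it fixed across both splits to
# compare like with like (the "2.7 vs 0.7" pair):
base = user_4_4 // 3       # ~1.33GB
print(shm_window(user_4_4, base) / GB)   # ~2.67 ("2.7G")
print(shm_window(user_2_2, base) / GB)   # ~0.67 ("0.7G")

# Or lower the base to ~0.5GB for both (the "3.5 vs 1.5" pair):
base = GB // 2
print(shm_window(user_4_4, base) / GB)   # 3.5
print(shm_window(user_2_2, base) / GB)   # 1.5
```

Either way the two windows differ by the full 2GB, which is why 2.7 vs 1.7
doesn't add up unless the base moved between the two configurations.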
> syscall and irq. I expect the databases will run an order of magnitude
> faster with _2:2_ in a 64G configuration, with _1.7G_ per process of shm
> mapped, instead of their 4:4 split with 2.7G (or more, up to 3.9 ;)
> mapped per task.
That may well be true for some workloads, I suspect it's slower for others.
One could call the tradeoff either way.
> I don't mind if 4:4 gets merged, but I recommend db vendors benchmark
> _2:2_ against 4:4 before remotely considering deploying 4:4 in
> production. Then of course let me know, since I've had no luck getting
> any numbers back and I've no access to any 64G box.
If you send me a *simple* simulation test, I'll gladly run it for you ;-)
But I'm not going to go fiddle with Oracle, and thousands of disks ;-)
> I don't care about 256G with 2:2 split, since intel and hp are now going
> x86-64 too.
Yeah, I don't think we ever need to deal with that kind of insanity ;-)
>> averse to objrmap for file-backed mappings either - I agree that the search
>> problems which were demonstrated are unlikely to bite in real life.
>
> cool.
>
> Martin's patch from IBM is a great start IMHO. I found a bug in the vma
> flags check though, VM_RESERVED should be checked too, not only
> VM_LOCKED, unless I'm missing something, but it's a minor issue.
I didn't actually write it - that was Dave McCracken ;-) I just suggested
the partial approach (because I'm dirty and lazy ;-)) and carried it
in my tree.
I agree with Andrew's comments though - it's not nice having the dual
approach of the partial, but the complexity of the full approach is a
bit scary and buys you little in real terms (performance and space).
I still believe that creating an "address_space like structure" for
anon memory, shared across VMAs is an idea that might give us cleaner
code - it also fixes other problems like Andi's NUMA API binding.
> We can write a testcase ourselves, it's pretty easy: just create a 2.7G
> file in /dev/shm, mmap(MAP_SHARED) it from 1k processes, and fault in
> all the pagetables from all tasks touching the shm vma. Then run a
> second copy until the machine starts swapping and see how things go. To
> do this you probably need 8G, which is why I didn't write the testcase
> myself yet ;). Maybe I can simulate with less shm and fewer tasks on 1G
> boxes too, but the extreme lru effects of point 3 won't be visible
> there; the very same software configuration works fine on 1/2G boxes on
> stock 2.4. Problems show up when the lru grows due to the algorithm not
> contemplating millions of dirty swapcache pages in a row at the end of
> the lru and some gigs of free cache at the head of the lru. The
> rmap-only issues can also be tested with math; no testcase is needed
> for that.
I don't have time to go write it at the moment, but I can certainly run
it on high-end hardware if that helps.
M.
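For reference, a scaled-down sketch of the testcase Andrea describes:
create a file in /dev/shm (falling back to /tmp if it's absent), mmap it
MAP_SHARED from several processes, and touch one byte per page so each
task faults in its own pagetables. The sizes and process count here are
shrunk for illustration; the real test would use a ~2.7GB file and ~1000
processes, then a second copy to push the box into swap.

```python
import mmap
import os

SIZE = 4 << 20          # scaled down from ~2.7GB
NPROCS = 4              # scaled down from ~1000
PATH = "/dev/shm/vmtest" if os.path.isdir("/dev/shm") else "/tmp/vmtest"

fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, SIZE)

pids = []
for _ in range(NPROCS):
    pid = os.fork()
    if pid == 0:
        # Each child maps the same shared file and faults in every page,
        # populating a full set of pagetables for this task.
        m = mmap.mmap(fd, SIZE, mmap.MAP_SHARED,
                      mmap.PROT_READ | mmap.PROT_WRITE)
        page = mmap.PAGESIZE
        for off in range(0, SIZE, page):
            m[off] = 1
        m.close()
        os._exit(0)
    pids.append(pid)

failures = sum(os.waitpid(p, 0)[1] != 0 for p in pids)
os.close(fd)
os.unlink(PATH)
print("failures:", failures)
```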