From: "Martin J. Bligh" <mbligh@aracnet.com>
To: Andrea Arcangeli <andrea@suse.de>, Andrew Morton <akpm@osdl.org>
Cc: Rik van Riel <riel@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: 2.4.23aa2 (bugfixes and important VM improvements for the high end)
Date: Fri, 27 Feb 2004 14:03:07 -0800
Message-ID: <162060000.1077919387@flay>
In-Reply-To: <20040227211548.GI8834@dualathlon.random>
> note that the 4:4 split is wrong in 99% of cases where people need
> 64G. I'm advocating strongly for the 2:2 split to everybody I talk
> with, I'm trying to spread the 2:2 idea because IMHO it's an order of
> magnitude simpler and an order of magnitude superior. Unfortunately I
> could not get a single number to back my 2:2 claims, since the 4:4
> buzzword is spreading and people only test with 4:4. so it's pretty hard
> for me to spread the 2:2 buzzword.
For the record, I for one am not opposed to doing 2:2 instead of 4:4.
What pisses me off is people trying to squeeze large amounts of memory
into 3:1, and distros pretending it's supportable, when it's never
stable across a broad spectrum of workloads. Between 2:2 and 4:4,
it's just a different overhead tradeoff.
> 4:4 makes no sense at all, the only advantage of 4:4 w.r.t. 2:2 is that
> they can map 2.7G per task of shm instead of 1.7G per task of shm.
Eh? You have a 2GB difference of user address space, and a 1GB difference
of shm size. You lost a GB somewhere ;-) Depending on whether you move
TASK_UNMAPPED_BASE or not, you might mean 2.7 vs 0.7 or at a pinch
3.5 vs 1.5, I'm not sure.
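
To make the arithmetic concrete, here is a rough sketch. It assumes the shm window is simply the user address space minus the mmap base, ignoring the stack and anything else mapped in that range. The two base values (about 1.3G and 0.5G) are assumptions picked to reproduce the figures above, and 4:4 is rounded up to a full 4G of user space. The function and table names are made up for illustration.

```python
def shm_window_gb(task_size_gb, unmapped_base_gb):
    """Address space between the mmap base and the top of user space, in GB.

    Ignores the stack and any other mappings in that range.
    """
    return task_size_gb - unmapped_base_gb


# Approximate user address space per split (4:4 is really slightly under 4G).
USER_SPACE_GB = {"4:4": 4.0, "2:2": 2.0}

# Assumed mmap base positions, chosen only to match the numbers in the text.
MMAP_BASE_GB = {"base left at ~1.3G": 4.0 / 3.0, "base lowered to 0.5G": 0.5}


def compare_splits():
    """Return (label, 4:4 window, 2:2 window, difference) rounded to 0.1G."""
    rows = []
    for label, base in MMAP_BASE_GB.items():
        big = shm_window_gb(USER_SPACE_GB["4:4"], base)
        small = shm_window_gb(USER_SPACE_GB["2:2"], base)
        rows.append((label, round(big, 1), round(small, 1), round(big - small, 1)))
    return rows


for label, big, small, diff in compare_splits():
    print(f"{label}: 4:4 -> {big}G, 2:2 -> {small}G, difference {diff}G")
```

Wherever the base sits, the gap between the two splits stays at 2G as long as both use the same base, which is why a 1G gap looks like a lost GB.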
> syscall and irq. I expect the databases will run an order of magnitude
> faster with _2:2_ in a 64G configuration, with _1.7G_ per process of shm
> mapped, instead of their 4:4 split with 2.7G (or more, up to 3.9 ;)
> mapped per task.
That may well be true for some workloads, I suspect it's slower for others.
One could call the tradeoff either way.
> I don't mind if 4:4 gets merged but I recommend db vendors to benchmark
> _2:2_ against 4:4 before remotely considering deploying 4:4 in
> production. Then of course let me know since I had not the luck to get
> any number back and I've no access to any 64G box.
If you send me a *simple* simulation test, I'll gladly run it for you ;-)
But I'm not going to go fiddle with Oracle, and thousands of disks ;-)
> I don't care about 256G with 2:2 split, since intel and hp are now going
> x86-64 too.
Yeah, I don't think we ever need to deal with that kind of insanity ;-)
>> averse to objrmap for file-backed mappings either - I agree that the search
>> problems which were demonstrated are unlikely to bite in real life.
>
> cool.
>
> Martin's patch from IBM is a great start IMHO. I found a bug in the vma
> flags check though, VM_RESERVED should be checked too, not only
> VM_LOCKED, unless I'm missing something, but it's a minor issue.
I didn't actually write it - that was Dave McCracken ;-) I just suggested
the partial approach (because I'm dirty and lazy ;-)) and carried it
in my tree.
I agree with Andrew's comments though - it's not nice having the dual
approach of the partial, but the complexity of the full approach is a
bit scary and buys you little in real terms (performance and space).
I still believe that creating an "address_space like structure" for
anon memory, shared across VMAs is an idea that might give us cleaner
code - it also fixes other problems like Andi's NUMA API binding.
> We can write a testcase ourselves, it's pretty easy, just create a 2.7G
> file in /dev/shm, and mmap(MAP_SHARED) it from 1k processes and fault in
> all the pagetables from all tasks touching the shm vma. Then run a
> second copy until the machine starts swapping and see how things go. To
> do this you need probably 8G, this is why I didn't write the testcase
> myself yet ;). maybe I can simulate with less shm and fewer tasks on 1G
> boxes too, but the extreme lru effects of point 3 won't be visible
> there, the very same software configuration works fine on 1/2G boxes on
> stock 2.4. problems show up when the lru grows due to the algorithm not
> contemplating millions of dirty swapcache in a row at the end of the lru
> and some gigs of free cache at the head of the lru. the rmap-only issues
> can also be tested with math, no testcase is needed for that.
I don't have time to go write it at the moment, but I can certainly run it
on high-end hardware if that helps.
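
For reference, a scaled-down sketch of the testcase described in the quoted text might look like the following. It is an illustration only, not anybody's actual test: the function name and parameters are hypothetical, the sizes are tiny by default, and it falls back to the regular temp directory when /dev/shm is missing. Each child maps the same file MAP_SHARED, writes one byte per page to fault in its own pagetables, and then holds the mapping until the parent releases it, so all the mappings exist at the same time.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def touch_shared_mapping(size_bytes, nprocs, shm_dir="/dev/shm"):
    """Map one shared file from nprocs children and touch every page in each.

    Returns how many children faulted in the whole mapping while all the
    other mappings were still alive.
    """
    if not os.path.isdir(shm_dir):
        shm_dir = None  # fallback: default temp dir, may not be tmpfs
    done_r, done_w = os.pipe()  # children report here, one byte each
    hold_r, hold_w = os.pipe()  # children block here until released
    with tempfile.NamedTemporaryFile(dir=shm_dir) as f:
        f.truncate(size_bytes)
        pids = []
        for _ in range(nprocs):
            pid = os.fork()
            if pid == 0:
                result = b"-"
                try:
                    os.close(hold_w)  # so EOF arrives once the parent closes it
                    m = mmap.mmap(f.fileno(), size_bytes,
                                  flags=mmap.MAP_SHARED,
                                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
                    for off in range(0, size_bytes, PAGE):
                        m[off] = 1  # write fault instantiates this task's pte
                    result = b"+"
                except Exception:
                    pass
                finally:
                    try:
                        os.write(done_w, result)
                        os.read(hold_r, 1)  # keep the mapping until released
                    finally:
                        os._exit(0 if result == b"+" else 1)
            pids.append(pid)
        os.close(done_w)
        reports = b""
        while len(reports) < nprocs:
            chunk = os.read(done_r, nprocs)
            if not chunk:
                break  # every child is gone
            reports += chunk
        os.close(hold_w)  # release all children at once
        for pid in pids:
            os.waitpid(pid, 0)
        os.close(done_r)
        os.close(hold_r)
        return reports.count(b"+")


if __name__ == "__main__":
    # Tiny default run; the real thing would be ~2.7G and ~1000 tasks.
    print(touch_shared_mapping(64 * PAGE, 4))
```

To approach the scenario in the quote, raise size_bytes and nprocs and start a second copy until the box swaps. Note that a child killed before it reports (say by the OOM killer) would leave the parent waiting, so this is not hardened for that case.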
M.