From: torvalds@transmeta.com (Linus Torvalds)
To: linux-kernel@vger.kernel.org
Subject: Re: Break 2.4 VM in five easy steps
Date: 6 Jun 2001 13:47:35 -0700 [thread overview]
Message-ID: <9fm4t7$412$1@penguin.transmeta.com> (raw)
In-Reply-To: <3B1D5ADE.7FA50CD0@illusionary.com>
In article <3B1D5ADE.7FA50CD0@illusionary.com>,
Derek Glidden <dglidden@illusionary.com> wrote:
>
>After reading the messages to this list for the last couple of weeks and
>playing around on my machine, I'm convinced that the VM system in 2.4 is
>still severely broken.
Now, this may well be true, but what you actually demonstrated is that
"swapoff()" is extremely (and I mean _EXTREMELY_) inefficient, to the
point that it can certainly be called broken.
It got worse in 2.4.x not so much due to any generic VM worseness, as
due to the fact that the much more persistent swap cache behaviour in
2.4.x just exposes the fundamental inefficiencies of "swapoff()" more
clearly. I don't think the swapoff() algorithm itself has changed, it's
just that the algorithm was always exponential, I think (and because of
the persistent swap cache, the "n" in the algorithm became much bigger).
So this is really a separate problem from the general VM balancing
issues. Go and look at the "try_to_unuse()" logic, and wince.
I'd love to have somebody look a bit more at swap-off. It may well be,
for example, that swap-off does not correctly notice dead swap-pages at
all - somebody should verify that it doesn't try to read in and
"try_to_unuse()" dead swap entries. That would make the inefficiency
show up even more clearly.
(Quick look gives the following: right now try_to_unuse() in
mm/swapfile.c does something like
lock_page(page);
if (PageSwapCache(page))
delete_from_swap_cache_nolock(page);
UnlockPage(page);
read_lock(&tasklist_lock);
for_each_task(p)
unuse_process(p->mm, entry, page);
read_unlock(&tasklist_lock);
shmem_unuse(entry, page);
/* Now get rid of the extra reference to the temporary
page we've been using. */
page_cache_release(page);
and we should trivially notice that if the page count is 1, it cannot be
mapped in any process, so we should maybe add something like
lock_page(page);
if (PageSwapCache(page))
delete_from_swap_cache_nolock(page);
UnlockPage(page);
+ if (page_count(page) == 1)
+ goto nothing_to_do;
read_lock(&tasklist_lock);
for_each_task(p)
unuse_process(p->mm, entry, page);
read_unlock(&tasklist_lock);
shmem_unuse(entry, page);
+
+ nothing_to_do:
+
/* Now get rid of the extra reference to the temporary
page we've been using. */
page_cache_release(page);
which should (assuming I got the page count thing right - I'v eobviously
not tested the above change) make sure that we don't spend tons of time
on dead swap pages.
Somebody interested in trying the above add? And looking for other more
obvious bandaid fixes. It won't "fix" swapoff per se, but it might make
it bearable and bring it to the 2.2.x levels.
The _real_ fix is to really make "swapoff()" work the other way around -
go through each process and look for swap entries in the page tables
_first_, and bring all entries for that device in sanely, and after
everything is brought in just drop all the swap cache pages for that
device.
The current swapoff() thing is really a quick hack that has lived on
since early 1992 with quick hacks to make it work with the big VM
changes that have happened since.
That would make swapoff be O(n) in VM size (and you can easily do some
further micro-optimizations at that time by avoiding shared mappings
with backing store and other things that cannot have swap info involved)
Is anybody interested in making "swapoff()" better? Please speak up..
Linus
next prev parent reply other threads:[~2001-06-06 20:48 UTC|newest]
Thread overview: 142+ messages / expand[flat|nested] mbox.gz Atom feed top
2001-06-05 22:19 Break 2.4 VM in five easy steps Derek Glidden
2001-06-05 23:38 ` Jeffrey W. Baker
2001-06-06 1:42 ` Russell Leighton
2001-06-06 7:14 ` Sean Hunter
2001-06-06 7:47 ` Jonathan Morton
2001-06-06 2:16 ` Andrew Morton
2001-06-06 3:19 ` Derek Glidden
2001-06-06 8:19 ` Xavier Bestel
2001-06-06 8:54 ` Sean Hunter
2001-06-06 9:16 ` Xavier Bestel
2001-06-06 9:25 ` Sean Hunter
2001-06-06 10:04 ` Jonathan Morton
2001-06-06 9:57 ` Dr S.M. Huen
2001-06-06 10:06 ` DBs (ML)
2001-06-06 10:08 ` Vivek Dasmohapatra
2001-06-06 10:19 ` Lauri Tischler
2001-06-06 10:22 ` Sean Hunter
2001-06-06 10:48 ` Alexander Viro
2001-06-06 16:58 ` dean gaudet
2001-06-06 17:10 ` Remi Turk
2001-06-06 22:44 ` Kai Henningsen
2001-06-09 7:17 ` Rik van Riel
2001-06-06 16:47 ` dean gaudet
2001-06-06 17:17 ` Kurt Roeckx
2001-06-06 18:35 ` Dr S.M. Huen
2001-06-06 18:40 ` Mark Salisbury
2001-06-06 19:11 ` android
2001-06-07 0:27 ` Mike A. Harris
2001-06-07 0:20 ` Mike A. Harris
2001-06-09 8:16 ` Rik van Riel
2001-06-09 8:57 ` Mike A. Harris
2001-06-07 21:31 ` Shane Nay
2001-06-07 20:00 ` Marcelo Tosatti
2001-06-07 21:55 ` Shane Nay
2001-06-07 20:29 ` Marcelo Tosatti
2001-06-07 23:29 ` VM Report was:Re: " Shane Nay
2001-06-08 1:18 ` Jonathan Morton
2001-06-08 12:50 ` Mike Galbraith
2001-06-08 14:19 ` Tobias Ringstrom
2001-06-08 15:51 ` John Stoffel
2001-06-08 17:01 ` Mike Galbraith
2001-06-08 17:43 ` John Stoffel
2001-06-08 17:35 ` Marcelo Tosatti
2001-06-08 20:58 ` John Stoffel
2001-06-08 20:04 ` Marcelo Tosatti
2001-06-08 23:44 ` Jonathan Morton
2001-06-09 2:36 ` Andrew Morton
2001-06-09 6:33 ` Mark Hahn
2001-06-09 3:43 ` Mike Galbraith
2001-06-09 4:05 ` Jonathan Morton
2001-06-09 5:09 ` Mike Galbraith
2001-06-09 5:07 ` Mike Galbraith
2001-06-08 18:30 ` Mike Galbraith
2001-06-09 12:31 ` Zlatko Calusic
2001-06-09 3:34 ` Rik van Riel
2001-06-08 16:51 ` Mike Galbraith
2001-06-08 19:09 ` Tobias Ringstrom
2001-06-09 4:36 ` Mike Galbraith
2001-06-06 11:16 ` Daniel Phillips
2001-06-06 15:28 ` Richard Gooch
2001-06-06 15:42 ` Christian Bornträger
2001-06-06 15:57 ` Requirement: swap = RAM x 2.5 ?? Jeff Garzik
2001-06-06 16:12 ` Richard Gooch
2001-06-06 16:15 ` Jeff Garzik
2001-06-06 16:19 ` Richard Gooch
2001-06-06 16:53 ` Mike Galbraith
2001-06-06 17:05 ` Greg Hennessy
2001-06-06 18:42 ` Eric W. Biederman
2001-06-07 1:29 ` Jan Harkes
2001-06-06 17:14 ` Break 2.4 VM in five easy steps Ben Greear
2001-06-06 12:07 ` Jonathan Morton
2001-06-06 13:58 ` Gerhard Mack
2001-06-08 4:56 ` C. Martins
2001-06-06 14:41 ` Derek Glidden
2001-06-06 20:29 ` José Luis Domingo López
2001-06-06 14:16 ` Disconnect
[not found] ` <3B1DEAC7.43DEFA1C@idb.hist.no>
2001-06-06 14:51 ` Derek Glidden
2001-06-06 21:34 ` Alan Cox
2001-06-09 8:07 ` Rik van Riel
2001-06-07 7:23 ` Helge Hafting
2001-06-07 16:56 ` Eric W. Biederman
2001-06-07 20:24 ` José Luis Domingo López
2001-06-06 4:03 ` Jeffrey W. Baker
2001-06-06 13:32 ` Eric W. Biederman
2001-06-06 14:41 ` Marc Heckmann
2001-06-06 14:51 ` Hugh Dickins
2001-06-06 13:08 ` Eric W. Biederman
2001-06-06 16:48 ` Jeffrey W. Baker
[not found] ` <m2lmn61ceb.fsf@sympatico.ca>
2001-06-06 14:37 ` Derek Glidden
2001-06-07 0:34 ` Mike A. Harris
2001-06-07 3:13 ` Miles Lane
2001-06-07 15:49 ` Derek Glidden
2001-06-07 19:06 ` Miles Lane
2001-06-09 5:57 ` Mike A. Harris
2001-06-06 18:59 ` Mike Galbraith
2001-06-06 19:39 ` Derek Glidden
2001-06-06 20:47 ` Linus Torvalds [this message]
2001-06-07 7:42 ` Eric W. Biederman
2001-06-07 8:11 ` Linus Torvalds
2001-06-07 8:54 ` Eric W. Biederman
2001-06-06 21:39 ` android
2001-06-06 22:08 ` Jonathan Morton
2001-06-06 22:27 ` android
2001-06-06 22:33 ` Antoine
2001-06-06 22:38 ` Robert Love
2001-06-06 22:40 ` Jonathan Morton
-- strict thread matches above, loose matches on Subject: below --
2001-06-06 15:31 Derek Glidden
2001-06-06 15:46 ` John Alvord
2001-06-06 15:58 ` Derek Glidden
2001-06-06 18:27 ` Eric W. Biederman
2001-06-06 18:47 ` Derek Glidden
2001-06-06 18:52 ` Eric W. Biederman
2001-06-06 19:06 ` Mike Galbraith
2001-06-06 19:28 ` Eric W. Biederman
2001-06-07 4:32 ` Mike Galbraith
2001-06-07 6:38 ` Eric W. Biederman
2001-06-07 7:28 ` Mike Galbraith
2001-06-07 7:59 ` Eric W. Biederman
2001-06-07 8:15 ` Mike Galbraith
2001-06-07 17:10 ` Marcelo Tosatti
2001-06-06 19:28 ` Derek Glidden
2001-06-09 7:55 ` Rik van Riel
2001-06-06 20:43 ` Daniel Phillips
2001-06-06 21:57 ` LA Walsh
2001-06-07 6:35 ` Eric W. Biederman
2001-06-07 15:25 ` LA Walsh
2001-06-07 16:42 ` Eric W. Biederman
2001-06-07 20:47 ` LA Walsh
2001-06-08 19:38 ` Pavel Machek
2001-06-09 7:34 ` Rik van Riel
2001-06-06 21:30 ` Alan Cox
2001-06-06 21:57 ` Derek Glidden
2001-06-09 8:09 ` Rik van Riel
2001-06-07 10:46 Bernd Jendrissek
[not found] ` <20010607153835.T14203@jessica>
2001-06-08 7:37 ` Bernd Jendrissek
2001-06-08 19:32 ` Pavel Machek
2001-06-11 12:06 ` Maciej Zenczykowski
2001-06-11 19:04 ` Rik van Riel
2001-06-12 7:46 ` Bernd Jendrissek
2001-06-07 14:22 Bulent Abali
2001-06-07 15:38 ` Mike Galbraith
2001-06-10 22:04 Rob Landley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='9fm4t7$412$1@penguin.transmeta.com' \
--to=torvalds@transmeta.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox