public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: Rik van Riel <riel@conectiva.com.br>
Cc: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>,
	Daniel Phillips <phillips@bonn-fries.net>,
	Bill Davidsen <davidsen@tmr.com>,
	Mike Fedyk <mfedyk@matchmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.4.19pre1aa1
Date: Tue, 5 Mar 2002 00:01:02 +0100
Message-ID: <20020305000102.S20606@dualathlon.random>
In-Reply-To: <20020304191942.M20606@dualathlon.random> <Pine.LNX.4.44L.0203041732520.1413-100000@duckman.distro.conectiva>

On Mon, Mar 04, 2002 at 06:36:47PM -0300, Rik van Riel wrote:
> On Mon, 4 Mar 2002, Andrea Arcangeli wrote:
> 
> > > 2) We can do local per-node scanning - no need to bounce
> > > information to and fro across the interconnect just to see what's
> > > worth swapping out.
> >
> > the lru lists are global at the moment, so for the normal swapout
> > activity rmap won't allow you to do what you mention above
> 
> Actually, the lru lists are per zone and have been for a while.

They're not in my tree, and for very good reasons: Ben made that mistake
the first time at some point during 2.3. Per-zone information has a big
downside: all normal machines (say with 64M or 2G of ram), where the
theoretical O(N) complexity is perfectly fine for lowmem dma/normal
allocations, will get hurt very much by the per-node lrus.
You're the one saying that the system load is very low and that it's
better to make more accurate page replacement decisions.

I think they may be worthwhile only on a hundred-gigabyte machine, but
the whole point is that on such a box you'll have only one zone anyway,
so per-zone will match per-node in that case :).

So I think they should be at least per-node in 2.5 to make 99% of the
userbase happy.  And again, whether they have to be global or per-node
depends on the kind of NUMA, so it would probably be much better to have
them per-node or global depending on a compile-time configuration #define.
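
A minimal sketch of what such a compile-time switch could look like. The
names LRU_PER_NODE, MAX_NODES, page_lru() and lru_add() are hypothetical
and chosen for illustration only; they are not taken from any kernel tree,
and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical knob, not a real kernel config option:
 * 0 = one global LRU, 1 = one LRU per node. */
#ifndef LRU_PER_NODE
#define LRU_PER_NODE 1
#endif

#define MAX_NODES 4 /* illustrative upper bound */

struct page {
    int node;              /* node that owns the physical page */
    struct page *lru_next; /* singly linked for brevity */
};

struct lru_list {
    struct page *head;
    size_t nr_pages;
};

#if LRU_PER_NODE
static struct lru_list lru[MAX_NODES];
#else
static struct lru_list lru[1];
#endif

/* The only function that knows which granularity was compiled in. */
static struct lru_list *page_lru(const struct page *page)
{
    assert(page->node >= 0 && page->node < MAX_NODES);
#if LRU_PER_NODE
    return &lru[page->node];
#else
    return &lru[0];
#endif
}

/* Put a page at the head of whichever list it belongs to. */
static void lru_add(struct page *page)
{
    struct lru_list *l = page_lru(page);

    page->lru_next = l->head;
    l->head = page;
    l->nr_pages++;
}
```

Callers only ever go through page_lru(), so building with
-DLRU_PER_NODE=0 collapses everything onto a single global list without
touching them.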

> The thing which was lacking up to now is a pagecache_lru_lock
> per zone, because this clashes with truncate().  Arjan came up
> with a creative solution to fix this problem and I'll integrate
> it into -rmap soon...

Making it a per-lru spinlock is a natural scalability optimization, but
anyway pagemap_lru_lock isn't a very critical spinlock.  Before
worrying about pagemap_lru_lock I'd worry about the pagecache_lock, I
think (even the pagecache_lock doesn't matter much for most usages). Of
course it also depends on the workload, but the important workloads will
hit the pagecache_lock first.

> > (furthermore rmap gives you only the pointer to the pte chain, but there's
> > no guarantee the pte is in the same node as the physical page, even
> > assuming we'll have per-node inactive/active list, so you'll fall into
> > the bouncing scenario anyways rmap or not, only the cpu usage will be
> > lower and as side effect you'll bounce less, but you're not avoiding the
> > interconnect overhead with the per-node scanning).
> 
> Well, if we need to free memory from node A, we will need to
> do that anyway. If we don't scan the page tables from node B,
> maybe we'll never be able to free memory from node A.
> 
> The only thing -rmap does is make sure we only scan the page
> tables belonging to the physical pages in node A, instead of
> having to scan the page tables of all processes in all nodes.

Correct. And as said, this is a scalability optimization: the more ptes
you have, the more you want to skip the ones belonging to pages in
node B, or you may end up wasting too much system time on a 512G system etc...
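
A toy sketch of that trade-off, assuming a simplified reverse map where
each page carries a chain of the ptes that map it. The struct layouts and
unmap_node() are hypothetical and are not the -rmap implementation. Only
the chains of pages in the chosen node are walked, yet a pte found that
way may still sit in another node's memory.

```c
#include <assert.h>
#include <stddef.h>

struct pte {
    int present; /* mapping still valid */
    int node;    /* node holding the page table itself */
};

/* One link per pte that maps the page (the reverse mapping). */
struct pte_chain {
    struct pte *pte;
    struct pte_chain *next;
};

struct page {
    int node;                /* node holding the physical page */
    struct pte_chain *chain; /* all ptes mapping this page */
};

/*
 * Unmap every page that lives on `node` and return how many ptes were
 * touched. Ptes of pages on other nodes are never visited. The count of
 * touched ptes that live on a different node is stored in *remote_ptes.
 * A per-node page list would make the node filter below unnecessary.
 */
static size_t unmap_node(struct page *pages, size_t nr_pages, int node,
                         size_t *remote_ptes)
{
    size_t visited = 0;

    assert(remote_ptes != NULL);
    *remote_ptes = 0;
    for (size_t i = 0; i < nr_pages; i++) {
        if (pages[i].node != node)
            continue;
        for (struct pte_chain *pc = pages[i].chain; pc; pc = pc->next) {
            pc->pte->present = 0;
            visited++;
            if (pc->pte->node != node)
                (*remote_ptes)++; /* still crosses the interconnect */
        }
    }
    return visited;
}
```

The work done scales with the number of ptes mapping pages in the chosen
node rather than with all ptes in the system, while the remote counter
shows where interconnect traffic can still occur.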

> I'd appreciate it if you could look at the implementation and
> look for areas to optimise. However, note that I don't believe

I haven't had time to look into that much yet (I've only done a short
review so far), but I will certainly do so in some more time, looking
at it from a 2.5 long-term perspective. I didn't much like that you
resurrected some of the old code that I don't think pays off. I would
have preferred it if you had put rmap on top of my vm patch without
reintroducing the older logic. I still don't see the need for
inactive_dirty, or why you dropped classzone and put in the unreliable
"plenty stuff" that reintroduces design bugs that will lead kswapd to go
crazy again. But ok, I don't worry too much about that: the rmap bits that
maintain the additional information are orthogonal to the other changes,
and that's the interesting part of the patch after all.

Andrea
