From: Bill Davidsen <davidsen@tmr.com>
To: Ray Bryant <raybry@sgi.com>
Cc: Buddy Lumpkin <b.lumpkin@comcast.net>,
"'Con Kolivas'" <kernel@kolivas.org>,
"'FabF'" <fabian.frederick@skynet.be>,
"'Bernd Eckenfels'" <ecki-news2004-05@lina.inka.de>,
linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net,
linux-mm@kvack.org
Subject: Re: why swap at all?
Date: Wed, 09 Jun 2004 15:24:13 -0400
Message-ID: <40C763DD.7090003@tmr.com>
In-Reply-To: <40C5D7FB.7020402@sgi.com>
Ray Bryant wrote:
>
> Buddy Lumpkin wrote:
>
>> <snip> One method would be to keep the
>> pagecache on its own list, and move pages to the head of the list any
>> time they are modified or referenced, and reclaim from the tail.
>> All pages on this list can be considered as "free memory", because any
>> new memory requests would just cause pages to be evicted from the tail
>> of the list.
>>
>
> We have code running on Altix that does exactly this. (Please note,
> however, that this is for our version of Linux 2.4.21 -- Yeah, it's
> old, but that is what the product runs at the moment -- we are in
> the process of switching over to Linux 2.6, when all of this will
> have to be re-evaluated.) The changes are in three parts:
>
> (1) We added a new page list, the reclaim list. Pages are put
> onto the reclaim list when they are inserted into the page cache.
> They are removed from the list when they are marked dirty (buffers
> from the page go on to the LRU dirty list) or when the pages are
> mmap'd into an address space, since in either of these situations,
> the pages are not reclaimable. (This list is per node in our
> NUMA system.)
>
> (2) We added code in __alloc_pages() so that if the local node
> allocation is going to fail (remember that Altix is a NUMA machine),
> we call out to a routine to scan the reclaim list on that node and
> to release enough clean buffer cache pages to make the local
> allocation succeed (plus a few pages, for efficiency). If this
> doesn't work, we most likely end up spilling the allocation over
> to another node.
>
> (3) We added code in generic_file_write() to limit the size of
> the page cache on buffered file I/O write operations. If the
> current size of the page cache is larger than the limit, we
> call the same routine as above to release some page cache pages.
> If we can't free enough pages to get below the limit, we throttle
> the write process by delaying it for a bit. This was all to
> avoid the problem of a large buffered file I/O request causing
> the page cache to grow to the point where the system would start
> to swap. (On our large memory systems, dropping into the
> swapping code can cause the system to freeze for tens of seconds,
> and that is something we would like to avoid.)
>
> (We actually don't enforce the page cache limit unless the amount
> of free memory has dropped below a certain threshold. This is to
> keep the page cache from being limited if there is lots of free
> memory -- even though we only limit the page cache on writes,
> it turns out that the kernel is constantly writing to the disk,
> so this also effectively causes the page cache to be limited
> for reads as well.)
>
> This code was also written in response to customer demand. They
> don't like the fact that the buffer cache grows and grows on our
> Altix systems, and they want old buffer cache pages to be cleared
> out when they are no longer needed. Since we almost never suffer
> memory pressure on our systems (and if we do, we are likely in
> trouble), kswapd almost never does this. Buffer cache pages can
> sit around for days with no one removing them. The above was one
> approach to solve that problem.
>
> Please note: YMMV. An Altix is not a desktop system and I make
> no claims that the above approach is appropriate for everyone.
> For us, it turns out to work better to bias storage allocation
> against unbridled growth of the page cache. Indeed, we have
> spent a lot of time trying to solve problems related to page
> cache on Altix systems. Assuming we get our OLS paper done
> in time, you can read more about this in our paper at OLS.
> (If not, we intend to post our experiences paper on the
> oss.sgi.com website.)
>
> Finally, let me reiterate that we are beginning the process of
> evaluating the 2.6 memory manager wrt the same problem as above.
> Before we will propose a change such as above for 2.6, we have
> to convince ourselves that (1) setting vm_swappiness appropriately
> doesn't solve the problem, and (2) that patches such as the ones
> that Nick Piggin has been proposing don't solve the problem
> either, and that (3) there isn't some other mechanism to deal
> with this in 2.6.
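
To make parts (1) and (2) of the quoted scheme concrete, here is a minimal sketch in Python. It is an illustration only, not SGI's code: the names ReclaimList, alloc_pages and the extra parameter are hypothetical, pages are plain integers, and releasing the oldest page first is an assumption borrowed from Buddy's tail-eviction suggestion rather than something Ray states.

```python
from collections import OrderedDict


class ReclaimList:
    """Toy per-node list of clean, unmapped page-cache pages (hypothetical)."""

    def __init__(self):
        # Oldest entry first; the newest entry sits at the end.
        self._pages = OrderedDict()

    def add(self, page_id):
        """Page inserted into the page cache: it becomes reclaimable."""
        self._pages[page_id] = True
        self._pages.move_to_end(page_id)

    def remove(self, page_id):
        """Page was dirtied or mmap'd: it is no longer reclaimable."""
        self._pages.pop(page_id, None)

    def reclaim(self, count):
        """Release up to `count` of the oldest pages and return their ids."""
        freed = []
        while self._pages and len(freed) < count:
            page_id, _ = self._pages.popitem(last=False)
            freed.append(page_id)
        return freed

    def __len__(self):
        return len(self._pages)


def alloc_pages(free_pages, wanted, reclaim_list, extra=2):
    """Try a local allocation, reclaiming clean cache pages on a shortfall.

    Returns (ok, free_pages_after). ok=False stands in for spilling the
    allocation over to another node. `extra` models the "plus a few
    pages, for efficiency" remark and is an assumed value.
    """
    if free_pages < wanted:
        shortfall = wanted - free_pages
        free_pages += len(reclaim_list.reclaim(shortfall + extra))
    if free_pages < wanted:
        return False, free_pages
    return True, free_pages - wanted
```

A real implementation would of course live in the allocator and need locking per node; the sketch only shows the ordering of the decisions.
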
I have to admit that the definition of "desktop machine" has changed a
lot in the last few years, in terms of hardware, but since the 486 days
I have been asking "what can I build/buy for <$2k which best fits my
overall computing?" With the onset of cheap memory and Opteron, NUMA
will in all probability be a factor in the next few years, and SMP has
been one since the dual Pentium systems were new.

That said, I think that your work will be useful, even if it is used
piecemeal or as inspiration to Nick, Andrea, and others who have been
working in the area. I find Nick's work as of 2.6.7-rc1-mm1 so good I
haven't moved any of my desktop machines beyond it, but it sounds as if
your work addresses the issue I mentioned about limiting buffer usage,
and Rik's comment that the code lacks checks and balances. You seem to
have a balance; I'd love to see it.
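
For readers following along, the write-side limit in part (3) of Ray's description, together with his note that the limit is only enforced once free memory drops below a threshold, might be sketched as below. This is a rough Python illustration under stated assumptions, not the Altix code: the function name, its parameters and the release callback are all hypothetical, and "delay" is reduced to a boolean the caller would act on.

```python
def throttle_buffered_write(cache_pages, free_pages, limit, free_threshold, release):
    """Decide what a buffered write should do before growing the page cache.

    release(n) is a caller-supplied function (an assumption of this sketch)
    that tries to free n clean page-cache pages and returns how many it
    actually freed. Returns (cache_pages_after, must_delay).
    """
    # The limit is only enforced once free memory is scarce.
    if free_pages >= free_threshold:
        return cache_pages, False
    # Page cache is within the limit: nothing to do.
    if cache_pages <= limit:
        return cache_pages, False
    # Over the limit: try to release enough clean pages to get back to it.
    freed = release(cache_pages - limit)
    cache_pages -= freed
    # Still over the limit: the writer should be delayed for a bit.
    return cache_pages, cache_pages > limit
```

The release callback would be the same reclaim routine used on the allocation path, which is what gives the scheme a single point of balance between cache growth and free memory.
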
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me