public inbox for linux-kernel@vger.kernel.org
From: Bill Davidsen <davidsen@tmr.com>
To: Ray Bryant <raybry@sgi.com>
Cc: Buddy Lumpkin <b.lumpkin@comcast.net>,
	"'Con Kolivas'" <kernel@kolivas.org>,
	"'FabF'" <fabian.frederick@skynet.be>,
	"'Bernd Eckenfels'" <ecki-news2004-05@lina.inka.de>,
	linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net,
	linux-mm@kvack.org
Subject: Re: why swap at all?
Date: Wed, 09 Jun 2004 15:24:13 -0400
Message-ID: <40C763DD.7090003@tmr.com>
In-Reply-To: <40C5D7FB.7020402@sgi.com>

Ray Bryant wrote:
> 
> Buddy Lumpkin wrote:
> 
>>  <snip> One method would be to keep the
>> pagecache on its own list, and move pages to the head of the list any
>> time they are modified or referenced, and reclaim from the tail.
>> All pages on this list can be considered as "free memory", because any
>> new memory requests would just cause pages to be evicted from the tail
>> of the list.
>>
> 
> We have code running on Altix that does exactly this.  (Please note,
> however, that this is for our version of Linux 2.4.21 -- Yeah, it's
> old, but that is what the product runs at the moment -- we are in
> the process of switching over to Linux 2.6 when all of this will
> have to be re-evaluated.)  The changes are in three parts:
> 
> (1)  We added a new page list, the reclaim list.  Pages are put
> onto the reclaim list when they are inserted into the page cache.
> They are removed from the list when they are marked dirty (buffers
> from the page go on to the LRU dirty list) or when the pages are
> mmap'd into an address space, since in either of these situations,
> the pages are not reclaimable.  (This list is per node in our
> NUMA system.)
> 
> (2)  We added code in __alloc_pages() so that if the local node
> allocation is going to fail (remember that Altix is a NUMA machine),
> we call out to a routine to scan the reclaim list on that node and
> to release enough clean buffer cache pages to make the local
> allocation succeed (plus a few pages, for efficiency).  If this
> doesn't work, we most likely end up spilling the allocation over
> to another node.
> 
> (3)  We added code in generic_file_write() to limit the size of
> the page cache on buffered file I/O write operations.  If the
> current size of the page cache is larger than the limit, we
> call the same routine as above to release some page cache pages.
> If we can't free enough pages to get below the limit, we throttle
> the write process by delaying it for a bit.  This was all to
> avoid the problem of a large buffered file I/O request causing
> the page cache to grow to the point where the system would start
> to swap.  (On our large memory systems, dropping into the
> swapping code can cause the system to freeze for tens of seconds,
> and that is something we would like to avoid).
> 
> (We actually don't enforce the page cache limit unless the amount
> of free memory has dropped below a certain threshold.  This is to
> keep the page cache from being limited if there is lots of free
> memory -- even though we only limit the page cache on writes,
> it turns out that the kernel is constantly writing to the disk,
> so this also effectively causes the page cache to be limited
> for reads as well.)
> 
> This code was also written in response to customer demand.  They
> don't like the fact that the buffer cache grows and grows on our
> Altix systems, and they want old buffer cache pages to be cleared
> out when they are no longer needed.  Since we almost never suffer
> memory pressure on our systems (and if we do, we are likely in
> trouble), kswapd almost never does this.  Buffer cache pages can
> sit around for days with no one removing them.  The above was one
> approach to solve that problem.
> 
> Please note: YMMV.  An Altix is not a desktop system and I make
> no claims that the above approach is appropriate for everyone.
> For us, it turns out to work better to bias storage allocation
> against unbridled growth of the page cache.  Indeed, we have
> spent a lot of time trying to solve problems related to page
> cache on Altix systems.  Assuming we get our OLS paper done
> in time, you can read more about this in our paper at OLS.
> (If not, we intend to post our experiences paper on the
> oss.sgi.com website.)
> 
> Finally, let me reiterate that we are beginning the process of
> evaluating the 2.6 memory manager wrt the same problem as above.
> Before we will propose a change such as above for 2.6, we have
> to convince ourselves that (1) setting vm_swappiness appropriately
> doesn't solve the problem, (2) patches such as the ones that Nick
> Piggin has been proposing don't solve the problem either, and
> (3) there isn't some other mechanism to deal with this in 2.6.
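
For readers who want something concrete, parts (1) and (2) of the scheme
Ray describes can be modelled roughly in user-space C. This is only a
sketch, not the SGI code, which is not shown in this thread. Every name
here (struct node, reclaim_list_add, reclaim_local, alloc_local) is
hypothetical, and releasing the oldest page first is an assumption
borrowed from Buddy's suggestion to reclaim from the tail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the SGI patch. */
struct page {
    struct page *prev, *next;   /* links on the per-node reclaim list */
    int on_reclaim_list;
};

struct node {
    struct page head;           /* sentinel: head.next is newest, head.prev is oldest */
    size_t nr_reclaimable;      /* clean, unmapped page cache pages on the list */
    size_t nr_free;             /* free pages on this node */
};

static void node_init(struct node *n, size_t nr_free)
{
    n->head.prev = n->head.next = &n->head;
    n->head.on_reclaim_list = 0;
    n->nr_reclaimable = 0;
    n->nr_free = nr_free;
}

/* Part (1): a page inserted into the page cache goes onto the reclaim list. */
static void reclaim_list_add(struct node *n, struct page *p)
{
    p->next = n->head.next;
    p->prev = &n->head;
    n->head.next->prev = p;
    n->head.next = p;
    p->on_reclaim_list = 1;
    n->nr_reclaimable++;
}

/* Part (1): a page that is dirtied or mmap'd leaves the reclaim list. */
static void reclaim_list_del(struct node *n, struct page *p)
{
    if (!p->on_reclaim_list)
        return;
    assert(n->nr_reclaimable > 0);
    p->prev->next = p->next;
    p->next->prev = p->prev;
    p->prev = p->next = NULL;
    p->on_reclaim_list = 0;
    n->nr_reclaimable--;
}

/* Part (2): release pages from the tail until `want` pages are free
 * or the list is empty.  Returns the number of pages released. */
static size_t reclaim_local(struct node *n, size_t want)
{
    size_t freed = 0;

    while (n->nr_free < want && n->head.prev != &n->head) {
        reclaim_list_del(n, n->head.prev);
        n->nr_free++;
        freed++;
    }
    return freed;
}

/* Part (2): try a local allocation of `nr` pages, reclaiming `extra`
 * pages beyond the request for efficiency.  Returns 0 on success, or
 * -1 when the caller would have to spill over to another node. */
static int alloc_local(struct node *n, size_t nr, size_t extra)
{
    if (n->nr_free < nr)
        reclaim_local(n, nr + extra);
    if (n->nr_free < nr)
        return -1;
    n->nr_free -= nr;
    return 0;
}
```

The counters stand in for real page accounting; a kernel version would
also need per-node locking, which this model leaves out entirely.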

I have to admit that the definition of "desktop machine" has changed a
lot in the last few years, in terms of hardware, but I have been running
since 486 days with "what can I build/buy for <$2k which best fits my
overall computing?" With the onset of cheap memory and Opteron, NUMA
will in all probability be a factor in the next few years, and SMP has
been one since the dual Pentium systems were new.

That said, I think that your work will be useful, even if it is used
piecemeal or as inspiration to Nick, Andrea, and others who have been
working in the area. I find Nick's work as of 2.6.7-rc1-mm1 so good I
haven't moved any of my desktop machines beyond it, but it sounds as if
your work addresses the issue I mentioned about limiting buffer usage,
and Rik's comment that the code lacks checks and balances. You seem to
have a balance; I'd love to see it.
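
To make the "balance" concrete: the write-side limit Ray describes in
part (3) boils down to a decision like the one below. This is a rough
user-space sketch under stated assumptions, not the Altix code. The
struct, its fields, and both function names are hypothetical, and the
"delay it for a bit" step is reduced to a return value the caller would
act on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters and tunables; the real patch's names are not given. */
struct pc_state {
    size_t cache_pages;     /* pages currently in the page cache */
    size_t clean_pages;     /* subset that is clean and reclaimable */
    size_t free_pages;      /* free memory, in pages */
    size_t cache_limit;     /* page cache size limit, in pages */
    size_t free_threshold;  /* the limit is enforced only below this much free memory */
};

enum write_action { WRITE_PROCEED, WRITE_THROTTLE };

/* Release up to `nr` clean page cache pages; returns how many were released. */
static size_t release_clean(struct pc_state *s, size_t nr)
{
    size_t n = nr < s->clean_pages ? nr : s->clean_pages;

    assert(n <= s->cache_pages);
    s->clean_pages -= n;
    s->cache_pages -= n;
    s->free_pages += n;
    return n;
}

/* The check a buffered write would make before adding more pages. */
static enum write_action check_cache_limit(struct pc_state *s)
{
    /* Plenty of free memory: do not enforce the limit at all. */
    if (s->free_pages >= s->free_threshold)
        return WRITE_PROCEED;

    /* Already under the limit: nothing to do. */
    if (s->cache_pages <= s->cache_limit)
        return WRITE_PROCEED;

    /* Over the limit: try to release enough clean pages to get back under it. */
    release_clean(s, s->cache_pages - s->cache_limit);

    /* Still over the limit: the writer should be delayed for a bit. */
    return s->cache_pages <= s->cache_limit ? WRITE_PROCEED : WRITE_THROTTLE;
}
```

The free-memory test comes first so that the page cache is left alone
whenever there is lots of free memory, which matches the parenthetical
in Ray's description.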


-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me
