tmem and swap pagecache

From: John Moser <john.r.moser@gmail.com>
To: linux-kernel@vger.kernel.org
Subject: tmem and swap pagecache
Date: Wed, 20 Jun 2012 19:12:17 -0400
Message-ID: <4FE258D1.9040304@gmail.com>

DragonFly BSD has a feature by which--if you put a swap partition on an
SSD--you can configure the swap area to also accept page cache data.  In
this way, page cache for spinning hard disks can be pushed out to the
SSD for acceleration, using the swap device.

It seems to me that creating a tmem[1] module that supplies page cache 
swap or supplies a swap area with page cache logic (i.e. a swap area 
that acts as a combined area, using its entire size as page cache but 
evicting page cache for swap when full) would be a good use of tmem.
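
To make the "combined area" idea concrete, here is a minimal sketch in
plain user-space C.  It only counts slots; the struct and the area_*
function names are invented for illustration and are not part of tmem
or of any kernel interface.  The one rule it models is the one stated
above: page cache may use the whole area, but it gives way to swap when
the area is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a combined swap/page-cache area (counts only). */
struct combined_area {
    size_t capacity;    /* total page slots on the device */
    size_t swap_used;   /* slots holding swapped-out pages (must be kept) */
    size_t cache_used;  /* slots holding clean page cache (may be dropped) */
};

/* Number of unused slots. */
size_t area_free(const struct combined_area *a)
{
    assert(a->swap_used + a->cache_used <= a->capacity);
    return a->capacity - a->swap_used - a->cache_used;
}

/*
 * Store one page-cache page.  When the area is full, one older cache
 * page is dropped to make room.  Swap is never displaced by cache, so
 * this fails if every slot already holds swap.
 */
bool area_put_cache(struct combined_area *a)
{
    if (area_free(a) == 0) {
        if (a->cache_used == 0)
            return false;
        a->cache_used--;        /* drop an old cache page */
    }
    a->cache_used++;
    return true;
}

/*
 * Store one swap page.  When the area is full, a cache page is evicted
 * in favor of swap.  Fails only when the area is entirely swap.
 */
bool area_put_swap(struct combined_area *a)
{
    if (area_free(a) == 0) {
        if (a->cache_used == 0)
            return false;       /* genuinely out of swap space */
        a->cache_used--;        /* evict cache to make room for swap */
    }
    a->swap_used++;
    return true;
}
```

A real implementation would also have to choose which cache page to
drop; the sketch deliberately leaves that out.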

It isn't apparent to me that tmem has any way of knowing where a page
comes from, however.  Specifically, as far as I can tell (I may be
wrong), you can't hint to tmem that a page is from page cache (you can
tell it the page is persistent or ephemeral--probably swap or cache,
but not necessarily), much less that it's page cache backed by a
spinning disk versus an SSD or USB drive.
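
As a sketch of the kind of hint I mean, in plain user-space C: every
name below (put_hint, page_origin, backing_class,
ssd_backend_wants_page) is hypothetical.  As far as I know nothing
like this exists in tmem today.  The idea is only that a put carries
the page's origin and the class of its backing device, and the backend
decides whether it cares.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The distinction a caller can already express, per the text above. */
enum pool_kind { POOL_PERSISTENT, POOL_EPHEMERAL };

/* Hypothetical extra hints; not part of any existing tmem interface. */
enum page_origin   { ORIGIN_UNKNOWN, ORIGIN_SWAP, ORIGIN_PAGE_CACHE };
enum backing_class { BACKING_UNKNOWN, BACKING_ROTATIONAL,
                     BACKING_SSD, BACKING_USB };

struct put_hint {
    enum pool_kind     pool;
    enum page_origin   origin;
    enum backing_class backing;
};

/*
 * Example policy for an SSD-backed store: always take swap, take page
 * cache only when its backing device is slower than the SSD, and fall
 * back to the pool kind alone when no origin hint is given.
 */
bool ssd_backend_wants_page(const struct put_hint *h)
{
    assert(h != NULL);
    switch (h->origin) {
    case ORIGIN_SWAP:
        return true;
    case ORIGIN_PAGE_CACHE:
        return h->backing == BACKING_ROTATIONAL ||
               h->backing == BACKING_USB;
    default:
        return h->pool == POOL_PERSISTENT;
    }
}
```

Under this policy a page cached from a file on the SSD itself is
declined, while a page cached from a spinning disk is accepted.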

To my mind, it would be useful to be able to have a database server or
mass file server with 4GB or 8GB of RAM and a 128GB SSD ($150, cheaper
than 128GB of RAM...) that can back an extended page cache.  Effectively
you get an L2 page cache: maybe 2-3GB of L1 page cache in system RAM and
108GB of L2 page cache on the SSD, with the other 20GB in use for the /
file system (whose data you specifically do NOT want page cached onto
the same SSD).

[Tangent]

It may alternately be a good idea to check the backing device of page
cache in general and favor evicting SSD-backed page cache over page
cache backed by slow devices.  Now we're getting too specific, though:
we'd want to generically call these devices "fast" or "slow", or more
precisely "faster than X" and "slower than Y".

Essentially I mean:  what about spinning hard disks on SATA2 vs SATA3 vs
USB vs eSATA?  eSATA is way faster than USB, and SATA3 is faster than
SATA2 unless your SATA3 disk is itself slower than the SATA2 disk, and
the SATA2 disk is already slower than the SATA2 link.  Obviously you
want to evict the cache for a 7200RPM SATA3 hard disk with 64MB cache
more readily than that for a spinning USB 1.0 hard disk, if they're both
used roughly as recently.  If the SATA3 drive's data is 10% more likely
to be used first but the USB drive takes 500% of the access time,
overall you're likely to come out faster favoring the USB drive's page
cache.  As you evict more of the SATA3 drive's cache, what's left has
been used far more recently than anything from the USB drive, so it
starts making more sense to evict the old USB drive's data.
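
The trade-off above can be written down as a rough expected-cost
comparison.  This is a sketch in plain user-space C under my own
assumption that the penalty for evicting a page is (likelihood it is
needed again) times (time to read it back from its device); the struct
and function names are invented for illustration, and the numbers in
the usage note are the relative ones from the paragraph above.

```c
#include <assert.h>
#include <stddef.h>

/* One eviction candidate, in relative (unitless) terms. */
struct candidate {
    double reuse_likelihood;  /* relative chance the page is needed again */
    double refetch_cost;      /* relative time to re-read it from its device */
};

/* Expected cost of having evicted this page. */
double eviction_penalty(const struct candidate *c)
{
    return c->reuse_likelihood * c->refetch_cost;
}

/* Return the index of the candidate that is cheapest to evict. */
size_t pick_victim(const struct candidate *c, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (eviction_penalty(&c[i]) < eviction_penalty(&c[best]))
            best = i;
    }
    return best;
}
```

With a SATA3 page at { 1.10, 1.0 } and a USB page at { 1.0, 5.0 }, the
SATA3 page is the one evicted.  Once the remaining SATA3 pages are hot
enough that their reuse likelihood exceeds about five times that of the
USB page, the choice flips to the USB page.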

[/Tangent]

But even if you favor SSD eviction over hard drive eviction (by any 
metric), you'll eventually come to a point where you only have so much 
page cache and having the ability to swap page cache to a huge SSD 
becomes more attractive.  It becomes an extremely large read cache.  The 
Seagate Momentus XT fares pretty well with an 8GB read cache for this 
(although on their specialized, tightly integrated hardware, they've 
managed to use write-back caching to gain a LOT of write performance, 
too); it would make sense that using an SSD as a big page cache would 
supply similar gains in specialized environments for cheaper than the 
cost of RAM.

Of course that just takes us back to the original question:  Can we hint 
to tmem where the data is coming from, and let it decide if it cares and 
how to handle it?

[1] http://lwn.net/Articles/340409/