From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Nick Piggin <npiggin@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Ted Ts'o <tytso@mit.edu>, Lukas Czerner <lczerner@redhat.com>,
	Boaz Harrosh <bharrosh@panasas.com>,
	linux-fsdevel@vger.kernel.org,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
Date: Mon, 23 Apr 2012 09:23:03 +0100
Message-ID: <1335169383.4191.9.camel@dabdike.lan>
In-Reply-To: <CAPa8GCDkP_53VGAeQPeYgf3GW3KZ09BvnqduArQE7svf2mMj4A@mail.gmail.com>

On Sun, 2012-04-22 at 16:30 +1000, Nick Piggin wrote:
> On 22 April 2012 09:56, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> > On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
> >> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
> >>>
> >>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
> >>> hint hierarchy file->page cache->device then we should, of course,
> >>> choose the best API and naming scheme for file->page cache.  The only
> >>> real point I was making is that we should tie in the page cache, and
> >>> currently it only knows about "hot" and "cold" pages.
> >>
> >> The problem is that "hot" and "cold" will have different meanings from
> >> the perspective of the file system versus the page cache.  The file
> >> system may consider a file "hot" if it is accessed frequently ---
> >> compared to the other 2 TB of data on that HDD.  The memory subsystem
> >> will consider a page "hot" compared to what has been recently accessed
> >> in the 8GB of memory that you might have in your system.  Now consider
> >> that you might have a dozen or so 2TB disks that each have their "hot"
> >> areas, and it's not at all obvious that just because a file, or even
> >> part of a file is marked "hot", that it deserves to be in memory at
> >> any particular point in time.
> >
> > So these intentionally have different meanings; I see no reason why
> > the fs uses the hot/cold words. It seems to cause confusion.
> 
> Right. It has nothing to do with hot/cold usage in the page allocator,
> which is about how many lines of that page are in CPU cache.

Well, no, it's a similar concept: we have no idea whether the page is
cached or not.  What we do is estimate that from the elapsed time since
we last touched the page.  In some sense, this is similar to the fs
definition: a hot page hint would mean we expect to touch the page
frequently and a cold page hint would mean we don't.  That is, for a hot
page the elapsed time between touches would be short, and for a cold
page it would be long.  Now I still think there's a mismatch in the time
scales: an elapsed time long enough for mm to make the page cold isn't
necessarily long by the file's standards, because the mm idea is
conditioned by local events (like memory pressure).
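
To make the time-scale mismatch concrete, here is a minimal sketch of
classifying by elapsed time since last touch.  Everything in it is
hypothetical (the names `classify`, `TEMP_HOT`, `TEMP_COLD` and the
threshold values are assumptions for illustration); it is not how the
page allocator or any filesystem actually decides this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not kernel code. */
enum temp { TEMP_HOT, TEMP_COLD };

/*
 * Classify an object as hot or cold from the time elapsed since it was
 * last touched.  The threshold is supplied by the caller, so two
 * subsystems with different time scales can disagree about the same
 * elapsed time.
 */
static enum temp classify(uint64_t now, uint64_t last_touch,
                          uint64_t threshold)
{
    /* Guard against unsigned wrap-around if the timestamps are swapped. */
    assert(now >= last_touch);
    return (now - last_touch) <= threshold ? TEMP_HOT : TEMP_COLD;
}
```

With an elapsed time of 600 ticks, an assumed mm-style threshold of 100
gives TEMP_COLD, while an assumed fs-style threshold of 10000 gives
TEMP_HOT: same touch history, different answer.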

> However it could be propagated up to page reclaim level, at least.
> Perhaps readahead/writeback too. But IMO it would be better to nail down
> the semantics for block and filesystem before getting worried about that.

Sure ... I just forwarded the email in case mm people had an interest.
If you want FS and storage to develop the hints first and then figure
out if we can involve the page cache, that's more or less what was
happening anyway.

> > But I don't know full story of this feature and I might be overlooking
> > something.
> 
> Also, "hot" and "cold" (as others have noted) is a big hammer that perhaps
> catches a tiny subset of useful work (probably more likely: benchmarks).
> 
> Is it read often? Written often? Both? Are reads and writes random or linear?
> Is it latency bound, or throughput bound? (i.e., are queue depths high or
> low?)
> 
> A filesystem and storage device might care about all of these things.
> Particularly if you have something more advanced than a single disk.
> Caches, tiers of storage, etc.

Experience has taught me to be wary of fine-grained hints: they tend to
be more trouble than they're worth (the definitions are either
inaccurate or so tediously precise that no-one can be bothered to read
them).  A small set of broad hints is usually more usable than a huge
set of fine-grained ones, so from that point of view, I like the
O_HOT/O_COLD ones.

James


