linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
       [not found]   ` <alpine.LFD.2.00.1204201120060.27750@dhcp-27-109.brq.redhat.com>
@ 2012-04-20 11:01     ` James Bottomley
  2012-04-20 11:23       ` Lukas Czerner
  2012-04-21 18:26       ` Jeff Garzik
  0 siblings, 2 replies; 12+ messages in thread
From: James Bottomley @ 2012-04-20 11:01 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Boaz Harrosh, Theodore Ts'o, linux-fsdevel,
	Ext4 Developers List, linux-mm

On Fri, 2012-04-20 at 11:45 +0200, Lukas Czerner wrote:
> On Fri, 20 Apr 2012, Boaz Harrosh wrote:
> 
> > On 04/19/2012 10:20 PM, Theodore Ts'o wrote:
> > 
> > > As I had brought up during one of the lightning talks at the Linux
> > > Storage and Filesystem workshop, I am interested in introducing two new
> > > open flags, O_HOT and O_COLD.  These flags are passed down to the
> > > individual file system's inode operations' create function, and the file
> > > system can use these flags as a hint regarding whether the file is
> > > likely to be accessed frequently or not.
> > > 
> > > In the future I plan to do further work on how ext4 would use these
> > > flags, but I want to first get the ability to pass these flags plumbed
> > > into the VFS layer and the code points for O_HOT and O_COLD reserved.
> > > 
> > > 
> > > Theodore Ts'o (3):
> > >   fs: add new open flags O_HOT and O_COLD
> > >   fs: propagate the open_flags structure down to the low-level fs's
> > >     create()
> > >   ext4: use the O_HOT and O_COLD open flags to influence inode
> > >     allocation
> > > 
> > 
> > 
> > I would expect that the first, and most important patch to this
> > set would be the man page which would define the new API. 
> > What do you mean by cold/normal/hot? what is expected if supported?
> > how can we know if supported? ....
> 
> Well, this is exactly my concern as well. There is no way anyone would
> know what it actually means a what users can expect form using it. The
> result of this is very simple, everyone will just use O_HOT for
> everything (if they will use it at all).
> 
> Ted, as I've mentioned on LSF I think that the HOT/COLD name is really
> bad choice for exactly this reason. It means nothing. If you want to use
> this flag to place the inode on the faster part of the disk, then just
> say so and name the flag accordingly, this way everyone can use it.
> However for this to actually work we need some fs<->storage interface to
> query storage layout, which actually should not be that hard to do. I am
> afraid that in current form it will suit only Google and Taobao. I would
> really like to have interface to pass tags between user->fs and
> fs<->storage, but this one does not seem like a good start.

I think this is a little unfair.  We already have the notion of hot and
cold pages within the page cache.  The definitions for storage is
similar: a hot block is one which will likely be read again shortly and
a cold block is one that likely won't (ignoring the 30 odd gradations of
in-between that the draft standard currently mandates)

The concern I have is that the notion of hot and cold files *isn't*
propagated to the page cache, it's just shared between the fs and the
disk.  It looks like we could tie the notion of file opened with O_HOT
or O_COLD into the page reclaimers and actually call
free_hot_cold_page() with the correct flag, meaning we might get an
immediate benefit even in the absence of hint supporting disks.

I cc'd linux-mm to see if there might be an interest in this ... or even
if it's worth it: I can also see we don't necessarily want userspace to
be able to tamper with our idea of what's hot and cold in the page
cache, since we get it primarily from the lru lists.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 11:01     ` [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags James Bottomley
@ 2012-04-20 11:23       ` Lukas Czerner
  2012-04-20 14:07         ` Christoph Lameter
  2012-04-20 14:42         ` James Bottomley
  2012-04-21 18:26       ` Jeff Garzik
  1 sibling, 2 replies; 12+ messages in thread
From: Lukas Czerner @ 2012-04-20 11:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Lukas Czerner, Boaz Harrosh, Theodore Ts'o, linux-fsdevel,
	Ext4 Developers List, linux-mm

On Fri, 20 Apr 2012, James Bottomley wrote:

> On Fri, 2012-04-20 at 11:45 +0200, Lukas Czerner wrote:
> > On Fri, 20 Apr 2012, Boaz Harrosh wrote:
> > 
> > > On 04/19/2012 10:20 PM, Theodore Ts'o wrote:
> > > 
> > > > As I had brought up during one of the lightning talks at the Linux
> > > > Storage and Filesystem workshop, I am interested in introducing two new
> > > > open flags, O_HOT and O_COLD.  These flags are passed down to the
> > > > individual file system's inode operations' create function, and the file
> > > > system can use these flags as a hint regarding whether the file is
> > > > likely to be accessed frequently or not.
> > > > 
> > > > In the future I plan to do further work on how ext4 would use these
> > > > flags, but I want to first get the ability to pass these flags plumbed
> > > > into the VFS layer and the code points for O_HOT and O_COLD reserved.
> > > > 
> > > > 
> > > > Theodore Ts'o (3):
> > > >   fs: add new open flags O_HOT and O_COLD
> > > >   fs: propagate the open_flags structure down to the low-level fs's
> > > >     create()
> > > >   ext4: use the O_HOT and O_COLD open flags to influence inode
> > > >     allocation
> > > > 
> > > 
> > > 
> > > I would expect that the first, and most important patch to this
> > > set would be the man page which would define the new API. 
> > > What do you mean by cold/normal/hot? what is expected if supported?
> > > how can we know if supported? ....
> > 
> > Well, this is exactly my concern as well. There is no way anyone would
> > know what it actually means a what users can expect form using it. The
> > result of this is very simple, everyone will just use O_HOT for
> > everything (if they will use it at all).
> > 
> > Ted, as I've mentioned on LSF I think that the HOT/COLD name is really
> > bad choice for exactly this reason. It means nothing. If you want to use
> > this flag to place the inode on the faster part of the disk, then just
> > say so and name the flag accordingly, this way everyone can use it.
> > However for this to actually work we need some fs<->storage interface to
> > query storage layout, which actually should not be that hard to do. I am
> > afraid that in current form it will suit only Google and Taobao. I would
> > really like to have interface to pass tags between user->fs and
> > fs<->storage, but this one does not seem like a good start.
> 
> I think this is a little unfair.  We already have the notion of hot and
> cold pages within the page cache.  The definitions for storage is
> similar: a hot block is one which will likely be read again shortly and
> a cold block is one that likely won't (ignoring the 30 odd gradations of
> in-between that the draft standard currently mandates)

You're right, but there is a crucial difference, you can not compare
a page with a file. Page will be read or .. well not read so often, but
that's just one dimension. Files has a lot more dimensions, will it be
rewritten often ? will it be read often, appended often, do we need
really fast first access ? do we need fast metadata operation ? Will
this file be there forever, or is it just temporary ? Do we need fast
read/write ? and many more...

> 
> The concern I have is that the notion of hot and cold files *isn't*
> propagated to the page cache, it's just shared between the fs and the
> disk.  It looks like we could tie the notion of file opened with O_HOT
> or O_COLD into the page reclaimers and actually call
> free_hot_cold_page() with the correct flag, meaning we might get an
> immediate benefit even in the absence of hint supporting disks.

And this is actually very good idea, but the file flag should not be
O_HOT/O_COLD (and in this case being it open flag is really disputable
as well), but rather hold-this-file-in-memory-longer-than-others, or
will-read-this-file-quite-often. Moreover since with Ted's patches O_HOT
means put the file on faster part of the disk (or rather whatever fs
thinks is fast part of the disk, since the interface to get such
information is missing) we already have one "meaning" and with this
we'll add yet another, completely different meaning to the single
flag. That seems messy.

Thanks!
-Lukas

> 
> I cc'd linux-mm to see if there might be an interest in this ... or even
> if it's worth it: I can also see we don't necessarily want userspace to
> be able to tamper with our idea of what's hot and cold in the page
> cache, since we get it primarily from the lru lists.
> 
> James
> 
> 
> 

-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 11:23       ` Lukas Czerner
@ 2012-04-20 14:07         ` Christoph Lameter
  2012-04-20 14:42         ` James Bottomley
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Lameter @ 2012-04-20 14:07 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: James Bottomley, Boaz Harrosh, Theodore Ts'o, linux-fsdevel,
	Ext4 Developers List, linux-mm

> > I cc'd linux-mm to see if there might be an interest in this ... or even
> > if it's worth it: I can also see we don't necessarily want userspace to
> > be able to tamper with our idea of what's hot and cold in the page
> > cache, since we get it primarily from the lru lists.
> >
> > James

The notion of hor and cold in the page allocator refers to processor cache
hotness and is used for pages on the per cpu free lists.

F.e. cold pages are used when I/O is soon expected to occur on them
because we want to avoid having to evict cache lines. Cold pages have been
freed a long time ago.

Hot pages are those that have been recently freed (we know that some
cachelines are present therefore) and thus it is likely that acquisition
by another process will allow that process to reuse the cacheline already
present avoiding a trip to memory.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 11:23       ` Lukas Czerner
  2012-04-20 14:07         ` Christoph Lameter
@ 2012-04-20 14:42         ` James Bottomley
  2012-04-20 14:58           ` Ted Ts'o
  1 sibling, 1 reply; 12+ messages in thread
From: James Bottomley @ 2012-04-20 14:42 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Boaz Harrosh, Theodore Ts'o, linux-fsdevel,
	Ext4 Developers List, linux-mm

On Fri, 2012-04-20 at 13:23 +0200, Lukas Czerner wrote:
> On Fri, 20 Apr 2012, James Bottomley wrote:
> 
> > On Fri, 2012-04-20 at 11:45 +0200, Lukas Czerner wrote:
> > > On Fri, 20 Apr 2012, Boaz Harrosh wrote:
> > > 
> > > > On 04/19/2012 10:20 PM, Theodore Ts'o wrote:
> > > > 
> > > > > As I had brought up during one of the lightning talks at the Linux
> > > > > Storage and Filesystem workshop, I am interested in introducing two new
> > > > > open flags, O_HOT and O_COLD.  These flags are passed down to the
> > > > > individual file system's inode operations' create function, and the file
> > > > > system can use these flags as a hint regarding whether the file is
> > > > > likely to be accessed frequently or not.
> > > > > 
> > > > > In the future I plan to do further work on how ext4 would use these
> > > > > flags, but I want to first get the ability to pass these flags plumbed
> > > > > into the VFS layer and the code points for O_HOT and O_COLD reserved.
> > > > > 
> > > > > 
> > > > > Theodore Ts'o (3):
> > > > >   fs: add new open flags O_HOT and O_COLD
> > > > >   fs: propagate the open_flags structure down to the low-level fs's
> > > > >     create()
> > > > >   ext4: use the O_HOT and O_COLD open flags to influence inode
> > > > >     allocation
> > > > > 
> > > > 
> > > > 
> > > > I would expect that the first, and most important patch to this
> > > > set would be the man page which would define the new API. 
> > > > What do you mean by cold/normal/hot? what is expected if supported?
> > > > how can we know if supported? ....
> > > 
> > > Well, this is exactly my concern as well. There is no way anyone would
> > > know what it actually means a what users can expect form using it. The
> > > result of this is very simple, everyone will just use O_HOT for
> > > everything (if they will use it at all).
> > > 
> > > Ted, as I've mentioned on LSF I think that the HOT/COLD name is really
> > > bad choice for exactly this reason. It means nothing. If you want to use
> > > this flag to place the inode on the faster part of the disk, then just
> > > say so and name the flag accordingly, this way everyone can use it.
> > > However for this to actually work we need some fs<->storage interface to
> > > query storage layout, which actually should not be that hard to do. I am
> > > afraid that in current form it will suit only Google and Taobao. I would
> > > really like to have interface to pass tags between user->fs and
> > > fs<->storage, but this one does not seem like a good start.
> > 
> > I think this is a little unfair.  We already have the notion of hot and
> > cold pages within the page cache.  The definitions for storage is
> > similar: a hot block is one which will likely be read again shortly and
> > a cold block is one that likely won't (ignoring the 30 odd gradations of
> > in-between that the draft standard currently mandates)
> 
> You're right, but there is a crucial difference, you can not compare
> a page with a file. Page will be read or .. well not read so often, but
> that's just one dimension. Files has a lot more dimensions, will it be
> rewritten often ? will it be read often, appended often, do we need
> really fast first access ? do we need fast metadata operation ? Will
> this file be there forever, or is it just temporary ? Do we need fast
> read/write ? and many more...

Yes and no.  I agree with your assessment.  The major point you could
ding me on actually is that just because a file is hot doesn't mean all
its pages are it could only have a few hot pages in it.  You could also
argue that the time scale over which the page cache considers a page hot
and that over which a disk does the same might be so dissimilar as to
render the two usages orthogonal.

The points about read and write are valid, but we could extend the page
cache to them too.  For instance, our readahead decisions are done at a
bit of the wrong level (statically in block).  If the page cache knew a
file was streaming (a movie file, for instance), we could adjust the
readahead dynamically for that file.

Where this might be leading is that the file/filesystem hints to the
page cache, and the page cache hints to the device.  That way, we could
cope with the hot file with only a few hot pages case.

The drawback is that we really don't have much of this machinery in the
page cache at the moment, and it's questionable if we really want it.
Solving our readahead problem would be brilliant, especially if the
interface were hintable, but not necessarily if it involves huge
algorithmic expense in our current page cache.

> > The concern I have is that the notion of hot and cold files *isn't*
> > propagated to the page cache, it's just shared between the fs and the
> > disk.  It looks like we could tie the notion of file opened with O_HOT
> > or O_COLD into the page reclaimers and actually call
> > free_hot_cold_page() with the correct flag, meaning we might get an
> > immediate benefit even in the absence of hint supporting disks.
> 
> And this is actually very good idea, but the file flag should not be
> O_HOT/O_COLD (and in this case being it open flag is really disputable
> as well), but rather hold-this-file-in-memory-longer-than-others, or
> will-read-this-file-quite-often. Moreover since with Ted's patches O_HOT
> means put the file on faster part of the disk (or rather whatever fs
> thinks is fast part of the disk, since the interface to get such
> information is missing) we already have one "meaning" and with this
> we'll add yet another, completely different meaning to the single
> flag. That seems messy.

I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
hint hierarchy file->page cache->device then we should, of course,
choose the best API and naming scheme for file->page cache.  The only
real point I was making is that we should tie in the page cache, and
currently it only knows about "hot" and "cold" pages.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 14:42         ` James Bottomley
@ 2012-04-20 14:58           ` Ted Ts'o
  2012-04-21 23:56             ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Ted Ts'o @ 2012-04-20 14:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List,
	linux-mm

On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
> 
> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
> hint hierarchy file->page cache->device then we should, of course,
> choose the best API and naming scheme for file->page cache.  The only
> real point I was making is that we should tie in the page cache, and
> currently it only knows about "hot" and "cold" pages.

The problem is that "hot" and "cold" will have different meanings from
the perspective of the file system versus the page cache.  The file
system may consider a file "hot" if it is accessed frequently ---
compared to the other 2 TB of data on that HDD.  The memory subsystem
will consider a page "hot" compared to what has been recently accessed
in the 8GB of memory that you might have your system.  Now consider
that you might have a dozen or so 2TB disks that each have their "hot"
areas, and it's not at all obvious that just because a file, or even
part of a file is marked "hot", that it deserves to be in memory at
any particular point in time.

						- Ted

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 11:01     ` [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags James Bottomley
  2012-04-20 11:23       ` Lukas Czerner
@ 2012-04-21 18:26       ` Jeff Garzik
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff Garzik @ 2012-04-21 18:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Lukas Czerner, Boaz Harrosh, Theodore Ts'o, linux-fsdevel,
	Ext4 Developers List, linux-mm

On 04/20/2012 07:01 AM, James Bottomley wrote:
> The concern I have is that the notion of hot and cold files *isn't*
> propagated to the page cache, it's just shared between the fs and the
> disk.

Bingo -- full-file hint is too coarse-grained for some workloads.  Page 
granularity would propagate to the VM as well as block layer, and give 
the required flexibility to all workloads.  As well as covering the 
full-file case.

	Jeff


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 14:58           ` Ted Ts'o
@ 2012-04-21 23:56             ` KOSAKI Motohiro
  2012-04-22  6:30               ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2012-04-21 23:56 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: James Bottomley, Lukas Czerner, Boaz Harrosh, linux-fsdevel,
	Ext4 Developers List, linux-mm

On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
>>
>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
>> hint hierarchy file->page cache->device then we should, of course,
>> choose the best API and naming scheme for file->page cache.  The only
>> real point I was making is that we should tie in the page cache, and
>> currently it only knows about "hot" and "cold" pages.
>
> The problem is that "hot" and "cold" will have different meanings from
> the perspective of the file system versus the page cache.  The file
> system may consider a file "hot" if it is accessed frequently ---
> compared to the other 2 TB of data on that HDD.  The memory subsystem
> will consider a page "hot" compared to what has been recently accessed
> in the 8GB of memory that you might have your system.  Now consider
> that you might have a dozen or so 2TB disks that each have their "hot"
> areas, and it's not at all obvious that just because a file, or even
> part of a file is marked "hot", that it deserves to be in memory at
> any particular point in time.

So, this have intentionally different meanings I have no seen a reason why
fs uses hot/cold words. It seems to bring a confusion.

But I don't know full story of this feature and I might be overlooking
something.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-21 23:56             ` KOSAKI Motohiro
@ 2012-04-22  6:30               ` Nick Piggin
  2012-04-23  8:23                 ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2012-04-22  6:30 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Ted Ts'o, James Bottomley, Lukas Czerner, Boaz Harrosh,
	linux-fsdevel, Ext4 Developers List, linux-mm

On 22 April 2012 09:56, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
>> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
>>>
>>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
>>> hint hierarchy file->page cache->device then we should, of course,
>>> choose the best API and naming scheme for file->page cache.  The only
>>> real point I was making is that we should tie in the page cache, and
>>> currently it only knows about "hot" and "cold" pages.
>>
>> The problem is that "hot" and "cold" will have different meanings from
>> the perspective of the file system versus the page cache.  The file
>> system may consider a file "hot" if it is accessed frequently ---
>> compared to the other 2 TB of data on that HDD.  The memory subsystem
>> will consider a page "hot" compared to what has been recently accessed
>> in the 8GB of memory that you might have your system.  Now consider
>> that you might have a dozen or so 2TB disks that each have their "hot"
>> areas, and it's not at all obvious that just because a file, or even
>> part of a file is marked "hot", that it deserves to be in memory at
>> any particular point in time.
>
> So, this have intentionally different meanings I have no seen a reason why
> fs uses hot/cold words. It seems to bring a confusion.

Right. It has nothing to do with hot/cold usage in the page allocator,
which is about how many lines of that page are in CPU cache.

However it could be propagated up to page reclaim level, at least.
Perhaps readahead/writeback too. But IMO it would be better to nail down
the semantics for block and filesystem before getting worried about that.


>
> But I don't know full story of this feature and I might be overlooking
> something.

Also, "hot" and "cold" (as others have noted) is a big hammer that perhaps
catches a tiny subset of useful work (probably more likely: benchmarks).

Is it read often? Written often? Both? Are reads and writes random or linear?
Is it latency bound, or throughput bound? (i.e., are queue depths high or
low?)

A filesystem and storage device might care about all of these things.
Particularly if you have something more advanced than a single disk.
Caches, tiers of storage, etc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-22  6:30               ` Nick Piggin
@ 2012-04-23  8:23                 ` James Bottomley
  2012-04-23 11:47                   ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2012-04-23  8:23 UTC (permalink / raw)
  To: Nick Piggin
  Cc: KOSAKI Motohiro, Ted Ts'o, Lukas Czerner, Boaz Harrosh,
	linux-fsdevel, Ext4 Developers List, linux-mm

On Sun, 2012-04-22 at 16:30 +1000, Nick Piggin wrote:
> On 22 April 2012 09:56, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> > On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
> >> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
> >>>
> >>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
> >>> hint hierarchy file->page cache->device then we should, of course,
> >>> choose the best API and naming scheme for file->page cache.  The only
> >>> real point I was making is that we should tie in the page cache, and
> >>> currently it only knows about "hot" and "cold" pages.
> >>
> >> The problem is that "hot" and "cold" will have different meanings from
> >> the perspective of the file system versus the page cache.  The file
> >> system may consider a file "hot" if it is accessed frequently ---
> >> compared to the other 2 TB of data on that HDD.  The memory subsystem
> >> will consider a page "hot" compared to what has been recently accessed
> >> in the 8GB of memory that you might have your system.  Now consider
> >> that you might have a dozen or so 2TB disks that each have their "hot"
> >> areas, and it's not at all obvious that just because a file, or even
> >> part of a file is marked "hot", that it deserves to be in memory at
> >> any particular point in time.
> >
> > So, this have intentionally different meanings I have no seen a reason why
> > fs uses hot/cold words. It seems to bring a confusion.
> 
> Right. It has nothing to do with hot/cold usage in the page allocator,
> which is about how many lines of that page are in CPU cache.

Well, no it's a similar concept:  we have no idea whether the page is
cached or not.  What we do is estimate that by elapsed time since we
last touched the page.  In some sense, this is similar to the fs
definition: a hot page hint would mean we expect to touch the page
frequently and a cold page means we wouldn't.  i.e. for a hot page, the
elapsed time between touches would be short and for a cold page it would
be long.  Now I still think there's a mismatch in the time scales: a
long elapsed time for mm making the page cold isn't necessarily the same
long elapsed time for the file, because the mm idea is conditioned by
local events (like memory pressure).

> However it could be propagated up to page reclaim level, at least.
> Perhaps readahead/writeback too. But IMO it would be better to nail down
> the semantics for block and filesystem before getting worried about that.

Sure ... I just forwarded the email in case mm people had an interest.
If you want FS and storage to develop the hints first and then figure
out if we can involve the page cache, that's more or less what was
happening anyway.

> > But I don't know full story of this feature and I might be overlooking
> > something.
> 
> Also, "hot" and "cold" (as others have noted) is a big hammer that perhaps
> catches a tiny subset of useful work (probably more likely: benchmarks).
> 
> Is it read often? Written often? Both? Are reads and writes random or linear?
> Is it latency bound, or throughput bound? (i.e., are queue depths high or
> low?)
> 
> A filesystem and storage device might care about all of these things.
> Particularly if you have something more advanced than a single disk.
> Caches, tiers of storage, etc.

Experience has taught me to be wary of fine grained hints: they tend to
be more trouble than they're worth (the definitions are either
inaccurate or so tediously precise that no-one can be bothered to read
them).  A small set of broad hints is usually more useable than a huge
set of fine grained ones, so from that point of view, I like the
O_HOT/O_COLD ones.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-23  8:23                 ` James Bottomley
@ 2012-04-23 11:47                   ` Nick Piggin
  2012-04-24  6:18                     ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2012-04-23 11:47 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Ted Ts'o, Lukas Czerner, Boaz Harrosh,
	linux-fsdevel, Ext4 Developers List, linux-mm

On 23 April 2012 18:23, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Sun, 2012-04-22 at 16:30 +1000, Nick Piggin wrote:
>> On 22 April 2012 09:56, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
>> > On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
>> >> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
>> >>>
>> >>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
>> >>> hint hierarchy file->page cache->device then we should, of course,
>> >>> choose the best API and naming scheme for file->page cache.  The only
>> >>> real point I was making is that we should tie in the page cache, and
>> >>> currently it only knows about "hot" and "cold" pages.
>> >>
>> >> The problem is that "hot" and "cold" will have different meanings from
>> >> the perspective of the file system versus the page cache.  The file
>> >> system may consider a file "hot" if it is accessed frequently ---
>> >> compared to the other 2 TB of data on that HDD.  The memory subsystem
>> >> will consider a page "hot" compared to what has been recently accessed
>> >> in the 8GB of memory that you might have your system.  Now consider
>> >> that you might have a dozen or so 2TB disks that each have their "hot"
>> >> areas, and it's not at all obvious that just because a file, or even
>> >> part of a file is marked "hot", that it deserves to be in memory at
>> >> any particular point in time.
>> >
>> > So, this have intentionally different meanings I have no seen a reason why
>> > fs uses hot/cold words. It seems to bring a confusion.
>>
>> Right. It has nothing to do with hot/cold usage in the page allocator,
>> which is about how many lines of that page are in CPU cache.
>
> Well, no it's a similar concept:  we have no idea whether the page is
> cached or not.
>
>  What we do is estimate that by elapsed time since we
> last touched the page.  In some sense, this is similar to the fs
> definition: a hot page hint would mean we expect to touch the page
> frequently and a cold page means we wouldn't.  i.e. for a hot page, the
> elapsed time between touches would be short and for a cold page it would
> be long.  Now I still think there's a mismatch in the time scales: a
> long elapsed time for mm making the page cold isn't necessarily the same
> long elapsed time for the file, because the mm idea is conditioned by
> local events (like memory pressure).

I suspect the mismatch would make it have virtually no correlation.
Experiments could surely be made, though.


>> However it could be propagated up to page reclaim level, at least.
>> Perhaps readahead/writeback too. But IMO it would be better to nail down
>> the semantics for block and filesystem before getting worried about that.
>
> Sure ... I just forwarded the email in case mm people had an interest.
> If you want FS and storage to develop the hints first and then figure
> out if we can involve the page cache, that's more or less what was
> happening anyway.

OK, good. mm layers can always look up any such flags quite easily, so
I think there is no problem of mechanism, only policy.


>> > But I don't know full story of this feature and I might be overlooking
>> > something.
>>
>> Also, "hot" and "cold" (as others have noted) is a big hammer that perhaps
>> catches a tiny subset of useful work (probably more likely: benchmarks).
>>
>> Is it read often? Written often? Both? Are reads and writes random or linear?
>> Is it latency bound, or throughput bound? (i.e., are queue depths high or
>> low?)
>>
>> A filesystem and storage device might care about all of these things.
>> Particularly if you have something more advanced than a single disk.
>> Caches, tiers of storage, etc.
>
> Experience has taught me to be wary of fine grained hints: they tend to
> be more trouble than they're worth (the definitions are either
> inaccurate or so tediously precise that no-one can be bothered to read
> them).  A small set of broad hints is usually more useable than a huge
> set of fine grained ones, so from that point of view, I like the
> O_HOT/O_COLD ones.

So long as the implementations can be sufficiently general that large majority
of "reasonable" application of the flags does not result in a slowdown, perhaps.

But while defining the API, you have to think about these things and not
just dismiss them completely.

Read vs write can be very important for caches and tiers, same for
random/linear,
latency constraints, etc. These things aren't exactly a huge unwieldy matrix. We
already have similar concepts in fadvise and such.

Thanks,
Nick

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-23 11:47                   ` Nick Piggin
@ 2012-04-24  6:18                     ` Nick Piggin
  2012-04-24 15:00                       ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2012-04-24  6:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: KOSAKI Motohiro, Ted Ts'o, Lukas Czerner, Boaz Harrosh,
	linux-fsdevel, Ext4 Developers List, linux-mm

On 23 April 2012 21:47, Nick Piggin <npiggin@gmail.com> wrote:
> On 23 April 2012 18:23, James Bottomley

>> Experience has taught me to be wary of fine grained hints: they tend to
>> be more trouble than they're worth (the definitions are either
>> inaccurate or so tediously precise that no-one can be bothered to read
>> them).  A small set of broad hints is usually more useable than a huge
>> set of fine grained ones, so from that point of view, I like the
>> O_HOT/O_COLD ones.
>
> So long as the implementations can be sufficiently general that large majority
> of "reasonable" application of the flags does not result in a slowdown, perhaps.
>
> But while defining the API, you have to think about these things and not
> just dismiss them completely.
>
> Read vs write can be very important for caches and tiers, same for
> random/linear,
> latency constraints, etc. These things aren't exactly a huge unwieldy matrix. We
> already have similar concepts in fadvise and such.

I'm not saying it's necessarily a bad idea as such. But experience
has taught me that if you define an API before having much
experience of the implementation and its users, and without
being able to write meaningful documentation for it, then it's
going to be a bad API.

So rather than pushing through these flags first, I think it would
be better to actually do implementation work, and get some
benchmarks (if not real apps) and have something working
like that before turning anything into an API.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-24  6:18                     ` Nick Piggin
@ 2012-04-24 15:00                       ` KOSAKI Motohiro
  0 siblings, 0 replies; 12+ messages in thread
From: KOSAKI Motohiro @ 2012-04-24 15:00 UTC (permalink / raw)
  To: Nick Piggin
  Cc: James Bottomley, KOSAKI Motohiro, Ted Ts'o, Lukas Czerner,
	Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm

(4/24/12 2:18 AM), Nick Piggin wrote:
> On 23 April 2012 21:47, Nick Piggin<npiggin@gmail.com>  wrote:
>> On 23 April 2012 18:23, James Bottomley
>
>>> Experience has taught me to be wary of fine grained hints: they tend to
>>> be more trouble than they're worth (the definitions are either
>>> inaccurate or so tediously precise that no-one can be bothered to read
>>> them).  A small set of broad hints is usually more useable than a huge
>>> set of fine grained ones, so from that point of view, I like the
>>> O_HOT/O_COLD ones.
>>
>> So long as the implementations can be sufficiently general that large majority
>> of "reasonable" application of the flags does not result in a slowdown, perhaps.
>>
>> But while defining the API, you have to think about these things and not
>> just dismiss them completely.
>>
>> Read vs write can be very important for caches and tiers, same for
>> random/linear,
>> latency constraints, etc. These things aren't exactly a huge unwieldy matrix. We
>> already have similar concepts in fadvise and such.
>
> I'm not saying it's necessarily a bad idea as such. But experience
> has taught me that if you define an API before having much
> experience of the implementation and its users, and without
> being able to write meaningful documentation for it, then it's
> going to be a bad API.
>
> So rather than pushing through these flags first, I think it would
> be better to actually do implementation work, and get some
> benchmarks (if not real apps) and have something working
> like that before turning anything into an API.

Fully agreed.

I _guess_ O_COLD has an enough real world usefullness because a backup operation
makes a lot of "write once read never" inodes. Moreover it doesn't have a system wide
side effect.

In the other hands, I don't imagine how O_HOT works yet. Beccause of, many apps want
to run faster than other apps and it definitely don't work _if_ all applications turn on
O_HOT for every open operations. So, I'm not sure why apps don't do such intentional
abuse yet.

So, we might need some API design discussions.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-04-24 15:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1334863211-19504-1-git-send-email-tytso@mit.edu>
     [not found] ` <4F912880.70708@panasas.com>
     [not found]   ` <alpine.LFD.2.00.1204201120060.27750@dhcp-27-109.brq.redhat.com>
2012-04-20 11:01     ` [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags James Bottomley
2012-04-20 11:23       ` Lukas Czerner
2012-04-20 14:07         ` Christoph Lameter
2012-04-20 14:42         ` James Bottomley
2012-04-20 14:58           ` Ted Ts'o
2012-04-21 23:56             ` KOSAKI Motohiro
2012-04-22  6:30               ` Nick Piggin
2012-04-23  8:23                 ` James Bottomley
2012-04-23 11:47                   ` Nick Piggin
2012-04-24  6:18                     ` Nick Piggin
2012-04-24 15:00                       ` KOSAKI Motohiro
2012-04-21 18:26       ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).