* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
       [not found] ` <alpine.LFD.2.00.1204201120060.27750@dhcp-27-109.brq.redhat.com>
@ 2012-04-20 11:01   ` James Bottomley
  2012-04-20 11:23     ` Lukas Czerner
  2012-04-21 18:26     ` Jeff Garzik
  0 siblings, 2 replies; 12+ messages in thread
From: James Bottomley @ 2012-04-20 11:01 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Boaz Harrosh, Theodore Ts'o, linux-fsdevel, Ext4 Developers List, linux-mm

On Fri, 2012-04-20 at 11:45 +0200, Lukas Czerner wrote:
> On Fri, 20 Apr 2012, Boaz Harrosh wrote:
>
> > On 04/19/2012 10:20 PM, Theodore Ts'o wrote:
> >
> > > As I had brought up during one of the lightning talks at the Linux
> > > Storage and Filesystem workshop, I am interested in introducing two new
> > > open flags, O_HOT and O_COLD. These flags are passed down to the
> > > individual file system's inode operations' create function, and the file
> > > system can use these flags as a hint regarding whether the file is
> > > likely to be accessed frequently or not.
> > >
> > > In the future I plan to do further work on how ext4 would use these
> > > flags, but I want to first get the ability to pass these flags plumbed
> > > into the VFS layer and the code points for O_HOT and O_COLD reserved.
> > >
> > >
> > > Theodore Ts'o (3):
> > >   fs: add new open flags O_HOT and O_COLD
> > >   fs: propagate the open_flags structure down to the low-level fs's
> > >     create()
> > >   ext4: use the O_HOT and O_COLD open flags to influence inode
> > >     allocation
> > >
> >
> >
> > I would expect that the first, and most important patch to this
> > set would be the man page which would define the new API.
> > What do you mean by cold/normal/hot? what is expected if supported?
> > how can we know if supported? ....
>
> Well, this is exactly my concern as well. There is no way anyone would
> know what it actually means and what users can expect from using it. The
> result of this is very simple, everyone will just use O_HOT for
> everything (if they will use it at all).
>
> Ted, as I've mentioned on LSF I think that the HOT/COLD name is really
> bad choice for exactly this reason. It means nothing. If you want to use
> this flag to place the inode on the faster part of the disk, then just
> say so and name the flag accordingly, this way everyone can use it.
> However for this to actually work we need some fs<->storage interface to
> query storage layout, which actually should not be that hard to do. I am
> afraid that in current form it will suit only Google and Taobao. I would
> really like to have interface to pass tags between user->fs and
> fs<->storage, but this one does not seem like a good start.

I think this is a little unfair. We already have the notion of hot and
cold pages within the page cache. The definitions for storage are
similar: a hot block is one which will likely be read again shortly and
a cold block is one that likely won't (ignoring the 30 odd gradations of
in-between that the draft standard currently mandates).

The concern I have is that the notion of hot and cold files *isn't*
propagated to the page cache, it's just shared between the fs and the
disk. It looks like we could tie the notion of file opened with O_HOT
or O_COLD into the page reclaimers and actually call
free_hot_cold_page() with the correct flag, meaning we might get an
immediate benefit even in the absence of hint supporting disks.

I cc'd linux-mm to see if there might be an interest in this ... or even
if it's worth it: I can also see we don't necessarily want userspace to
be able to tamper with our idea of what's hot and cold in the page
cache, since we get it primarily from the lru lists.

James

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
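For readers following the thread, here is a minimal sketch of how an application might pass the proposed hint when creating a file. It rests on several assumptions: O_HOT and O_COLD are only proposed in this RFC and are not defined by released kernel headers, so the sketch defines them as 0 (making the hint a no-op) purely so that it compiles. The names placement_hint, hint_to_open_flags() and create_with_hint() are hypothetical and appear nowhere in the patches.

```c
#include <fcntl.h>
#include <sys/types.h>

/*
 * O_HOT and O_COLD are the flags proposed in this RFC. They are not part
 * of any released header, so defining them as 0 is an assumption made
 * only to let this sketch compile; the hint is then silently ignored.
 */
#ifndef O_HOT
#define O_HOT 0
#endif
#ifndef O_COLD
#define O_COLD 0
#endif

/* Hypothetical application-side label for the hint. */
enum placement_hint { HINT_NONE, HINT_HOT, HINT_COLD };

/* Map the application's label onto the proposed open flags. */
int hint_to_open_flags(enum placement_hint hint)
{
    switch (hint) {
    case HINT_HOT:
        return O_HOT;
    case HINT_COLD:
        return O_COLD;
    default:
        return 0;
    }
}

/*
 * Create a file and pass the hint along with O_CREAT, since the cover
 * letter says the flags are handed to the file system's create function.
 * Returns a file descriptor, or -1 with errno set.
 */
int create_with_hint(const char *path, enum placement_hint hint)
{
    int flags = O_CREAT | O_WRONLY | O_TRUNC | hint_to_open_flags(hint);

    return open(path, flags, 0644);
}
```

A caller would write something like create_with_hint("index.db", HINT_HOT). Whether that changes anything, and what "hot" would promise, is exactly what the rest of the thread argues about.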
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 11:01   ` [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags James Bottomley
@ 2012-04-20 11:23     ` Lukas Czerner
  2012-04-20 14:07       ` Christoph Lameter
  2012-04-20 14:42       ` James Bottomley
  2012-04-21 18:26     ` Jeff Garzik
  1 sibling, 2 replies; 12+ messages in thread
From: Lukas Czerner @ 2012-04-20 11:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Lukas Czerner, Boaz Harrosh, Theodore Ts'o, linux-fsdevel, Ext4 Developers List, linux-mm

On Fri, 20 Apr 2012, James Bottomley wrote:

> On Fri, 2012-04-20 at 11:45 +0200, Lukas Czerner wrote:
> > On Fri, 20 Apr 2012, Boaz Harrosh wrote:
> >
> > > On 04/19/2012 10:20 PM, Theodore Ts'o wrote:
> > >
> > > > As I had brought up during one of the lightning talks at the Linux
> > > > Storage and Filesystem workshop, I am interested in introducing two new
> > > > open flags, O_HOT and O_COLD. These flags are passed down to the
> > > > individual file system's inode operations' create function, and the file
> > > > system can use these flags as a hint regarding whether the file is
> > > > likely to be accessed frequently or not.
> > > >
> > > > In the future I plan to do further work on how ext4 would use these
> > > > flags, but I want to first get the ability to pass these flags plumbed
> > > > into the VFS layer and the code points for O_HOT and O_COLD reserved.
> > > >
> > > >
> > > > Theodore Ts'o (3):
> > > >   fs: add new open flags O_HOT and O_COLD
> > > >   fs: propagate the open_flags structure down to the low-level fs's
> > > >     create()
> > > >   ext4: use the O_HOT and O_COLD open flags to influence inode
> > > >     allocation
> > > >
> > >
> > >
> > > I would expect that the first, and most important patch to this
> > > set would be the man page which would define the new API.
> > > What do you mean by cold/normal/hot? what is expected if supported?
> > > how can we know if supported? ....
> >
> > Well, this is exactly my concern as well. There is no way anyone would
> > know what it actually means and what users can expect from using it. The
> > result of this is very simple, everyone will just use O_HOT for
> > everything (if they will use it at all).
> >
> > Ted, as I've mentioned on LSF I think that the HOT/COLD name is really
> > bad choice for exactly this reason. It means nothing. If you want to use
> > this flag to place the inode on the faster part of the disk, then just
> > say so and name the flag accordingly, this way everyone can use it.
> > However for this to actually work we need some fs<->storage interface to
> > query storage layout, which actually should not be that hard to do. I am
> > afraid that in current form it will suit only Google and Taobao. I would
> > really like to have interface to pass tags between user->fs and
> > fs<->storage, but this one does not seem like a good start.
>
> I think this is a little unfair. We already have the notion of hot and
> cold pages within the page cache. The definitions for storage are
> similar: a hot block is one which will likely be read again shortly and
> a cold block is one that likely won't (ignoring the 30 odd gradations of
> in-between that the draft standard currently mandates).

You're right, but there is a crucial difference: you cannot compare
a page with a file. A page will be read or .. well, not read so often, but
that's just one dimension. Files have a lot more dimensions: will it be
rewritten often? Will it be read often, appended often? Do we need
really fast first access? Do we need fast metadata operations? Will
this file be there forever, or is it just temporary? Do we need fast
read/write? And many more...

>
> The concern I have is that the notion of hot and cold files *isn't*
> propagated to the page cache, it's just shared between the fs and the
> disk. It looks like we could tie the notion of file opened with O_HOT
> or O_COLD into the page reclaimers and actually call
> free_hot_cold_page() with the correct flag, meaning we might get an
> immediate benefit even in the absence of hint supporting disks.

And this is actually a very good idea, but the file flag should not be
O_HOT/O_COLD (and in this case whether it should be an open flag at all
is really disputable as well), but rather
hold-this-file-in-memory-longer-than-others, or
will-read-this-file-quite-often. Moreover, since with Ted's patches O_HOT
means put the file on the faster part of the disk (or rather whatever the
fs thinks is the fast part of the disk, since the interface to get such
information is missing), we already have one "meaning" and with this
we'll add yet another, completely different meaning to the single
flag. That seems messy.

Thanks!
-Lukas

>
> I cc'd linux-mm to see if there might be an interest in this ... or even
> if it's worth it: I can also see we don't necessarily want userspace to
> be able to tamper with our idea of what's hot and cold in the page
> cache, since we get it primarily from the lru lists.
>
> James
>
>
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 11:23     ` Lukas Czerner
@ 2012-04-20 14:07       ` Christoph Lameter
  2012-04-20 14:42       ` James Bottomley
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Lameter @ 2012-04-20 14:07 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: James Bottomley, Boaz Harrosh, Theodore Ts'o, linux-fsdevel, Ext4 Developers List, linux-mm

> > I cc'd linux-mm to see if there might be an interest in this ... or even
> > if it's worth it: I can also see we don't necessarily want userspace to
> > be able to tamper with our idea of what's hot and cold in the page
> > cache, since we get it primarily from the lru lists.
> >
> > James

The notion of hot and cold in the page allocator refers to processor cache
hotness and is used for pages on the per-CPU free lists. For example, cold
pages are used when I/O is soon expected to occur on them because we want
to avoid having to evict cache lines. Cold pages have been freed a long
time ago.

Hot pages are those that have been recently freed (we know that some
cachelines are present therefore) and thus it is likely that acquisition
by another process will allow that process to reuse the cacheline already
present, avoiding a trip to memory.
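The per-CPU free list behaviour described above can be illustrated with a toy model. This is a user-space sketch under stated assumptions, not the kernel's code: pcp_list, pcp_free() and pcp_alloc() are hypothetical names, and pages are plain integers. The one idea it tries to capture is that recently freed (hot) pages go to the front of the list and are handed out first, while cold pages sit at the back and are handed out to callers that do not care about CPU cache contents, such as buffers about to be filled by I/O.

```c
#include <stddef.h>

#define PCP_CAPACITY 8

/* Toy stand-in for a per-CPU free list: a small ring-buffer deque. */
struct pcp_list {
    int pages[PCP_CAPACITY];
    size_t head;    /* index of the hottest (most recently freed) page */
    size_t count;   /* number of pages currently on the list */
};

/*
 * Free a page onto the list. Hot pages are pushed at the head, cold pages
 * are appended at the tail. Returns 0 on success, -1 if the list is full.
 */
int pcp_free(struct pcp_list *l, int page, int cold)
{
    if (l->count == PCP_CAPACITY)
        return -1;

    if (cold) {
        l->pages[(l->head + l->count) % PCP_CAPACITY] = page;
    } else {
        l->head = (l->head + PCP_CAPACITY - 1) % PCP_CAPACITY;
        l->pages[l->head] = page;
    }
    l->count++;
    return 0;
}

/*
 * Allocate a page. A normal request takes the hottest page from the head;
 * a cold request takes from the tail, leaving cache-warm pages for others.
 * Returns the page number, or -1 if the list is empty.
 */
int pcp_alloc(struct pcp_list *l, int cold)
{
    int page;

    if (l->count == 0)
        return -1;

    if (cold) {
        page = l->pages[(l->head + l->count - 1) % PCP_CAPACITY];
    } else {
        page = l->pages[l->head];
        l->head = (l->head + 1) % PCP_CAPACITY;
    }
    l->count--;
    return page;
}
```

Note that nothing in this model records how often a page's contents are read, which is the sense of "hot" the file system proposal has in mind.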
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 11:23     ` Lukas Czerner
  2012-04-20 14:07       ` Christoph Lameter
@ 2012-04-20 14:42       ` James Bottomley
  2012-04-20 14:58         ` Ted Ts'o
  1 sibling, 1 reply; 12+ messages in thread
From: James Bottomley @ 2012-04-20 14:42 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Boaz Harrosh, Theodore Ts'o, linux-fsdevel, Ext4 Developers List, linux-mm

On Fri, 2012-04-20 at 13:23 +0200, Lukas Czerner wrote:
> On Fri, 20 Apr 2012, James Bottomley wrote:
>
> > On Fri, 2012-04-20 at 11:45 +0200, Lukas Czerner wrote:
> > > On Fri, 20 Apr 2012, Boaz Harrosh wrote:
> > >
> > > > On 04/19/2012 10:20 PM, Theodore Ts'o wrote:
> > > >
> > > > > As I had brought up during one of the lightning talks at the Linux
> > > > > Storage and Filesystem workshop, I am interested in introducing two new
> > > > > open flags, O_HOT and O_COLD. These flags are passed down to the
> > > > > individual file system's inode operations' create function, and the file
> > > > > system can use these flags as a hint regarding whether the file is
> > > > > likely to be accessed frequently or not.
> > > > >
> > > > > In the future I plan to do further work on how ext4 would use these
> > > > > flags, but I want to first get the ability to pass these flags plumbed
> > > > > into the VFS layer and the code points for O_HOT and O_COLD reserved.
> > > > >
> > > > >
> > > > > Theodore Ts'o (3):
> > > > >   fs: add new open flags O_HOT and O_COLD
> > > > >   fs: propagate the open_flags structure down to the low-level fs's
> > > > >     create()
> > > > >   ext4: use the O_HOT and O_COLD open flags to influence inode
> > > > >     allocation
> > > > >
> > > >
> > > >
> > > > I would expect that the first, and most important patch to this
> > > > set would be the man page which would define the new API.
> > > > What do you mean by cold/normal/hot? what is expected if supported?
> > > > how can we know if supported? ....
> > >
> > > Well, this is exactly my concern as well. There is no way anyone would
> > > know what it actually means and what users can expect from using it. The
> > > result of this is very simple, everyone will just use O_HOT for
> > > everything (if they will use it at all).
> > >
> > > Ted, as I've mentioned on LSF I think that the HOT/COLD name is really
> > > bad choice for exactly this reason. It means nothing. If you want to use
> > > this flag to place the inode on the faster part of the disk, then just
> > > say so and name the flag accordingly, this way everyone can use it.
> > > However for this to actually work we need some fs<->storage interface to
> > > query storage layout, which actually should not be that hard to do. I am
> > > afraid that in current form it will suit only Google and Taobao. I would
> > > really like to have interface to pass tags between user->fs and
> > > fs<->storage, but this one does not seem like a good start.
> >
> > I think this is a little unfair. We already have the notion of hot and
> > cold pages within the page cache. The definitions for storage are
> > similar: a hot block is one which will likely be read again shortly and
> > a cold block is one that likely won't (ignoring the 30 odd gradations of
> > in-between that the draft standard currently mandates).
>
> You're right, but there is a crucial difference: you cannot compare
> a page with a file. A page will be read or .. well, not read so often, but
> that's just one dimension. Files have a lot more dimensions: will it be
> rewritten often? Will it be read often, appended often? Do we need
> really fast first access? Do we need fast metadata operations? Will
> this file be there forever, or is it just temporary? Do we need fast
> read/write? And many more...

Yes and no. I agree with your assessment. The major point you could
ding me on actually is that just because a file is hot doesn't mean all
its pages are; it could have only a few hot pages in it. You could also
argue that the time scale over which the page cache considers a page hot
and that over which a disk does the same might be so dissimilar as to
render the two usages orthogonal.

The points about read and write are valid, but we could extend the page
cache to them too. For instance, our readahead decisions are done at a
bit of the wrong level (statically in block). If the page cache knew a
file was streaming (a movie file, for instance), we could adjust the
readahead dynamically for that file.

Where this might be leading is that the file/filesystem hints to the
page cache, and the page cache hints to the device. That way, we could
cope with the hot file with only a few hot pages case.

The drawback is that we really don't have much of this machinery in the
page cache at the moment, and it's questionable if we really want it.
Solving our readahead problem would be brilliant, especially if the
interface were hintable, but not necessarily if it involves huge
algorithmic expense in our current page cache.

> > The concern I have is that the notion of hot and cold files *isn't*
> > propagated to the page cache, it's just shared between the fs and the
> > disk. It looks like we could tie the notion of file opened with O_HOT
> > or O_COLD into the page reclaimers and actually call
> > free_hot_cold_page() with the correct flag, meaning we might get an
> > immediate benefit even in the absence of hint supporting disks.
>
> And this is actually a very good idea, but the file flag should not be
> O_HOT/O_COLD (and in this case whether it should be an open flag at all
> is really disputable as well), but rather
> hold-this-file-in-memory-longer-than-others, or
> will-read-this-file-quite-often. Moreover, since with Ted's patches O_HOT
> means put the file on the faster part of the disk (or rather whatever the
> fs thinks is the fast part of the disk, since the interface to get such
> information is missing), we already have one "meaning" and with this
> we'll add yet another, completely different meaning to the single
> flag. That seems messy.

I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
hint hierarchy file->page cache->device then we should, of course,
choose the best API and naming scheme for file->page cache. The only
real point I was making is that we should tie in the page cache, and
currently it only knows about "hot" and "cold" pages.

James
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 14:42       ` James Bottomley
@ 2012-04-20 14:58         ` Ted Ts'o
  2012-04-21 23:56           ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Ted Ts'o @ 2012-04-20 14:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm

On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
>
> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
> hint hierarchy file->page cache->device then we should, of course,
> choose the best API and naming scheme for file->page cache. The only
> real point I was making is that we should tie in the page cache, and
> currently it only knows about "hot" and "cold" pages.

The problem is that "hot" and "cold" will have different meanings from
the perspective of the file system versus the page cache. The file
system may consider a file "hot" if it is accessed frequently ---
compared to the other 2 TB of data on that HDD. The memory subsystem
will consider a page "hot" compared to what has been recently accessed
in the 8GB of memory that you might have in your system. Now consider
that you might have a dozen or so 2TB disks that each have their "hot"
areas, and it's not at all obvious that just because a file, or even
part of a file is marked "hot", that it deserves to be in memory at
any particular point in time.

					- Ted
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-20 14:58         ` Ted Ts'o
@ 2012-04-21 23:56           ` KOSAKI Motohiro
  2012-04-22  6:30             ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2012-04-21 23:56 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: James Bottomley, Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm

On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
>>
>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
>> hint hierarchy file->page cache->device then we should, of course,
>> choose the best API and naming scheme for file->page cache. The only
>> real point I was making is that we should tie in the page cache, and
>> currently it only knows about "hot" and "cold" pages.
>
> The problem is that "hot" and "cold" will have different meanings from
> the perspective of the file system versus the page cache. The file
> system may consider a file "hot" if it is accessed frequently ---
> compared to the other 2 TB of data on that HDD. The memory subsystem
> will consider a page "hot" compared to what has been recently accessed
> in the 8GB of memory that you might have in your system. Now consider
> that you might have a dozen or so 2TB disks that each have their "hot"
> areas, and it's not at all obvious that just because a file, or even
> part of a file is marked "hot", that it deserves to be in memory at
> any particular point in time.

So, these have intentionally different meanings. I have not seen a reason
why fs uses hot/cold words. It seems to bring confusion.

But I don't know the full story of this feature and I might be overlooking
something.
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-21 23:56           ` KOSAKI Motohiro
@ 2012-04-22  6:30             ` Nick Piggin
  2012-04-23  8:23               ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2012-04-22 6:30 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Ted Ts'o, James Bottomley, Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm

On 22 April 2012 09:56, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
>> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
>>>
>>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
>>> hint hierarchy file->page cache->device then we should, of course,
>>> choose the best API and naming scheme for file->page cache. The only
>>> real point I was making is that we should tie in the page cache, and
>>> currently it only knows about "hot" and "cold" pages.
>>
>> The problem is that "hot" and "cold" will have different meanings from
>> the perspective of the file system versus the page cache. The file
>> system may consider a file "hot" if it is accessed frequently ---
>> compared to the other 2 TB of data on that HDD. The memory subsystem
>> will consider a page "hot" compared to what has been recently accessed
>> in the 8GB of memory that you might have in your system. Now consider
>> that you might have a dozen or so 2TB disks that each have their "hot"
>> areas, and it's not at all obvious that just because a file, or even
>> part of a file is marked "hot", that it deserves to be in memory at
>> any particular point in time.
>
> So, these have intentionally different meanings. I have not seen a reason
> why fs uses hot/cold words. It seems to bring confusion.

Right. It has nothing to do with hot/cold usage in the page allocator,
which is about how many lines of that page are in CPU cache.

However it could be propagated up to page reclaim level, at least.
Perhaps readahead/writeback too. But IMO it would be better to nail down
the semantics for block and filesystem before getting worried about that.

>
> But I don't know the full story of this feature and I might be overlooking
> something.

Also, "hot" and "cold" (as others have noted) is a big hammer that perhaps
catches a tiny subset of useful work (probably more likely: benchmarks).

Is it read often? Written often? Both? Are reads and writes random or linear?
Is it latency bound, or throughput bound? (i.e., are queue depths high or
low?)

A filesystem and storage device might care about all of these things.
Particularly if you have something more advanced than a single disk.
Caches, tiers of storage, etc.
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags
  2012-04-22  6:30             ` Nick Piggin
@ 2012-04-23  8:23               ` James Bottomley
  2012-04-23 11:47                 ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2012-04-23 8:23 UTC (permalink / raw)
  To: Nick Piggin
  Cc: KOSAKI Motohiro, Ted Ts'o, Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm

On Sun, 2012-04-22 at 16:30 +1000, Nick Piggin wrote:
> On 22 April 2012 09:56, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
> > On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote:
> >> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote:
> >>>
> >>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a
> >>> hint hierarchy file->page cache->device then we should, of course,
> >>> choose the best API and naming scheme for file->page cache. The only
> >>> real point I was making is that we should tie in the page cache, and
> >>> currently it only knows about "hot" and "cold" pages.
> >>
> >> The problem is that "hot" and "cold" will have different meanings from
> >> the perspective of the file system versus the page cache. The file
> >> system may consider a file "hot" if it is accessed frequently ---
> >> compared to the other 2 TB of data on that HDD. The memory subsystem
> >> will consider a page "hot" compared to what has been recently accessed
> >> in the 8GB of memory that you might have in your system. Now consider
> >> that you might have a dozen or so 2TB disks that each have their "hot"
> >> areas, and it's not at all obvious that just because a file, or even
> >> part of a file is marked "hot", that it deserves to be in memory at
> >> any particular point in time.
> >
> > So, these have intentionally different meanings. I have not seen a reason
> > why fs uses hot/cold words. It seems to bring confusion.
>
> Right. It has nothing to do with hot/cold usage in the page allocator,
> which is about how many lines of that page are in CPU cache.

Well, no, it's a similar concept: we have no idea whether the page is
cached or not. What we do is estimate that by elapsed time since we
last touched the page. In some sense, this is similar to the fs
definition: a hot page hint would mean we expect to touch the page
frequently and a cold page means we wouldn't. i.e. for a hot page, the
elapsed time between touches would be short and for a cold page it would
be long. Now I still think there's a mismatch in the time scales: a
long elapsed time for mm making the page cold isn't necessarily the same
long elapsed time for the file, because the mm idea is conditioned by
local events (like memory pressure).

> However it could be propagated up to page reclaim level, at least.
> Perhaps readahead/writeback too. But IMO it would be better to nail down
> the semantics for block and filesystem before getting worried about that.

Sure ... I just forwarded the email in case mm people had an interest.
If you want FS and storage to develop the hints first and then figure
out if we can involve the page cache, that's more or less what was
happening anyway.

> > But I don't know the full story of this feature and I might be overlooking
> > something.
>
> Also, "hot" and "cold" (as others have noted) is a big hammer that perhaps
> catches a tiny subset of useful work (probably more likely: benchmarks).
>
> Is it read often? Written often? Both? Are reads and writes random or linear?
> Is it latency bound, or throughput bound? (i.e., are queue depths high or
> low?)
>
> A filesystem and storage device might care about all of these things.
> Particularly if you have something more advanced than a single disk.
> Caches, tiers of storage, etc.

Experience has taught me to be wary of fine grained hints: they tend to
be more trouble than they're worth (the definitions are either
inaccurate or so tediously precise that no-one can be bothered to read
them). A small set of broad hints is usually more useable than a huge
set of fine grained ones, so from that point of view, I like the
O_HOT/O_COLD ones.

James
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags 2012-04-23 8:23 ` James Bottomley @ 2012-04-23 11:47 ` Nick Piggin 2012-04-24 6:18 ` Nick Piggin 0 siblings, 1 reply; 12+ messages in thread From: Nick Piggin @ 2012-04-23 11:47 UTC (permalink / raw) To: James Bottomley Cc: KOSAKI Motohiro, Ted Ts'o, Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm On 23 April 2012 18:23, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Sun, 2012-04-22 at 16:30 +1000, Nick Piggin wrote: >> On 22 April 2012 09:56, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: >> > On Fri, Apr 20, 2012 at 10:58 AM, Ted Ts'o <tytso@mit.edu> wrote: >> >> On Fri, Apr 20, 2012 at 06:42:08PM +0400, James Bottomley wrote: >> >>> >> >>> I'm not at all wedded to O_HOT and O_COLD; I think if we establish a >> >>> hint hierarchy file->page cache->device then we should, of course, >> >>> choose the best API and naming scheme for file->page cache. The only >> >>> real point I was making is that we should tie in the page cache, and >> >>> currently it only knows about "hot" and "cold" pages. >> >> >> >> The problem is that "hot" and "cold" will have different meanings from >> >> the perspective of the file system versus the page cache. The file >> >> system may consider a file "hot" if it is accessed frequently --- >> >> compared to the other 2 TB of data on that HDD. The memory subsystem >> >> will consider a page "hot" compared to what has been recently accessed >> >> in the 8GB of memory that you might have your system. Now consider >> >> that you might have a dozen or so 2TB disks that each have their "hot" >> >> areas, and it's not at all obvious that just because a file, or even >> >> part of a file is marked "hot", that it deserves to be in memory at >> >> any particular point in time. >> > >> > So, this have intentionally different meanings I have no seen a reason why >> > fs uses hot/cold words. It seems to bring a confusion. >> >> Right. 
It has nothing to do with hot/cold usage in the page allocator, >> which is about how many lines of that page are in CPU cache. > > Well, no it's a similar concept: we have no idea whether the page is > cached or not. > > What we do is estimate that by elapsed time since we > last touched the page. In some sense, this is similar to the fs > definition: a hot page hint would mean we expect to touch the page > frequently and a cold page means we wouldn't. i.e. for a hot page, the > elapsed time between touches would be short and for a cold page it would > be long. Now I still think there's a mismatch in the time scales: a > long elapsed time for mm making the page cold isn't necessarily the same > long elapsed time for the file, because the mm idea is conditioned by > local events (like memory pressure). I suspect the mismatch would make it have virtually no correlation. Experiments could surely be made, though. >> However it could be propagated up to page reclaim level, at least. >> Perhaps readahead/writeback too. But IMO it would be better to nail down >> the semantics for block and filesystem before getting worried about that. > > Sure ... I just forwarded the email in case mm people had an interest. > If you want FS and storage to develop the hints first and then figure > out if we can involve the page cache, that's more or less what was > happening anyway. OK, good. mm layers can always look up any such flags quite easily, so I think there is no problem of mechanism, only policy. >> > But I don't know full story of this feature and I might be overlooking >> > something. >> >> Also, "hot" and "cold" (as others have noted) is a big hammer that perhaps >> catches a tiny subset of useful work (probably more likely: benchmarks). >> >> Is it read often? Written often? Both? Are reads and writes random or linear? >> Is it latency bound, or throughput bound? (i.e., are queue depths high or >> low?) 
>> >> A filesystem and storage device might care about all of these things. >> Particularly if you have something more advanced than a single disk. >> Caches, tiers of storage, etc. > > Experience has taught me to be wary of fine grained hints: they tend to > be more trouble than they're worth (the definitions are either > inaccurate or so tediously precise that no-one can be bothered to read > them). A small set of broad hints is usually more useable than a huge > set of fine grained ones, so from that point of view, I like the > O_HOT/O_COLD ones. So long as the implementations can be sufficiently general that large majority of "reasonable" application of the flags does not result in a slowdown, perhaps. But while defining the API, you have to think about these things and not just dismiss them completely. Read vs write can be very important for caches and tiers, same for random/linear, latency constraints, etc. These things aren't exactly a huge unwieldy matrix. We already have similar concepts in fadvise and such. Thanks, Nick -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
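For readers who have not used them, the "similar concepts in fadvise" that Nick refers to can be sketched roughly as below. This is only an illustration under stated assumptions: the `access_pattern` enum and the `advice_for()` and `hint_access_pattern()` names are hypothetical and are not part of the patch set; only `posix_fadvise()` and the `POSIX_FADV_*` constants are existing interfaces. Note that these hints mainly steer page-cache and readahead behaviour. They say nothing about read vs write or about on-disk placement, which is the gap the thread is discussing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Hypothetical names, used only for this sketch. */
enum access_pattern {
    ACCESS_DEFAULT, /* no particular expectation */
    ACCESS_LINEAR,  /* mostly sequential reads */
    ACCESS_RANDOM,  /* mostly random reads */
    ACCESS_ONCE     /* data will be touched once and not reused */
};

/* Map a pattern onto the advice values that posix_fadvise() already defines. */
int advice_for(enum access_pattern p)
{
    switch (p) {
    case ACCESS_LINEAR: return POSIX_FADV_SEQUENTIAL;
    case ACCESS_RANDOM: return POSIX_FADV_RANDOM;
    case ACCESS_ONCE:   return POSIX_FADV_NOREUSE;
    default:            return POSIX_FADV_NORMAL;
    }
}

/*
 * Apply the hint to the whole file (offset 0, len 0 means "to end of file").
 * Returns 0 on success or an errno value, as posix_fadvise() does.
 */
int hint_access_pattern(int fd, enum access_pattern p)
{
    assert(fd >= 0);
    return posix_fadvise(fd, 0, 0, advice_for(p));
}
```

The point of the sketch is that the existing hint vocabulary is already a short list of access patterns rather than a single hot/cold axis.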
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags 2012-04-23 11:47 ` Nick Piggin @ 2012-04-24 6:18 ` Nick Piggin 2012-04-24 15:00 ` KOSAKI Motohiro 0 siblings, 1 reply; 12+ messages in thread From: Nick Piggin @ 2012-04-24 6:18 UTC (permalink / raw) To: James Bottomley Cc: KOSAKI Motohiro, Ted Ts'o, Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm On 23 April 2012 21:47, Nick Piggin <npiggin@gmail.com> wrote: > On 23 April 2012 18:23, James Bottomley >> Experience has taught me to be wary of fine grained hints: they tend to >> be more trouble than they're worth (the definitions are either >> inaccurate or so tediously precise that no-one can be bothered to read >> them). A small set of broad hints is usually more useable than a huge >> set of fine grained ones, so from that point of view, I like the >> O_HOT/O_COLD ones. > > So long as the implementations can be sufficiently general that large majority > of "reasonable" application of the flags does not result in a slowdown, perhaps. > > But while defining the API, you have to think about these things and not > just dismiss them completely. > > Read vs write can be very important for caches and tiers, same for > random/linear, > latency constraints, etc. These things aren't exactly a huge unwieldy matrix. We > already have similar concepts in fadvise and such. I'm not saying it's necessarily a bad idea as such. But experience has taught me that if you define an API before having much experience of the implementation and its users, and without being able to write meaningful documentation for it, then it's going to be a bad API. So rather than pushing through these flags first, I think it would be better to actually do implementation work, and get some benchmarks (if not real apps) and have something working like that before turning anything into an API. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags 2012-04-24 6:18 ` Nick Piggin @ 2012-04-24 15:00 ` KOSAKI Motohiro 0 siblings, 0 replies; 12+ messages in thread From: KOSAKI Motohiro @ 2012-04-24 15:00 UTC (permalink / raw) To: Nick Piggin Cc: James Bottomley, KOSAKI Motohiro, Ted Ts'o, Lukas Czerner, Boaz Harrosh, linux-fsdevel, Ext4 Developers List, linux-mm (4/24/12 2:18 AM), Nick Piggin wrote: > On 23 April 2012 21:47, Nick Piggin<npiggin@gmail.com> wrote: >> On 23 April 2012 18:23, James Bottomley > >>> Experience has taught me to be wary of fine grained hints: they tend to >>> be more trouble than they're worth (the definitions are either >>> inaccurate or so tediously precise that no-one can be bothered to read >>> them). A small set of broad hints is usually more useable than a huge >>> set of fine grained ones, so from that point of view, I like the >>> O_HOT/O_COLD ones. >> >> So long as the implementations can be sufficiently general that large majority >> of "reasonable" application of the flags does not result in a slowdown, perhaps. >> >> But while defining the API, you have to think about these things and not >> just dismiss them completely. >> >> Read vs write can be very important for caches and tiers, same for >> random/linear, >> latency constraints, etc. These things aren't exactly a huge unwieldy matrix. We >> already have similar concepts in fadvise and such. > > I'm not saying it's necessarily a bad idea as such. But experience > has taught me that if you define an API before having much > experience of the implementation and its users, and without > being able to write meaningful documentation for it, then it's > going to be a bad API. > > So rather than pushing through these flags first, I think it would > be better to actually do implementation work, and get some > benchmarks (if not real apps) and have something working > like that before turning anything into an API. Fully agreed. 
I _guess_ O_COLD has enough real-world usefulness, because a backup operation creates a lot of "write once, read never" inodes. Moreover, it doesn't have a system-wide side effect. On the other hand, I can't yet imagine how O_HOT would work. Many apps want to run faster than other apps, and it definitely won't work _if_ all applications turn on O_HOT for every open operation. So I'm not sure why apps wouldn't abuse it intentionally like that. So, we might need some API design discussions.
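The backup case above ("write once, read never") already has a page-cache-side idiom with existing interfaces, sketched below. This is not what the proposed O_COLD would do: per the cover letter, the ext4 patch uses the flag to influence inode allocation, i.e. on-disk placement, whereas this sketch only asks the kernel to drop the cached pages after the data is written. The function name `write_cold_copy()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write buf to path, then tell the kernel we do not expect to read it back.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int write_cold_copy(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Dirty pages cannot be dropped, so flush them to storage first. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* Purely advisory: a failure here is not treated as an error. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    return close(fd);
}
```

Because the effect is limited to the pages of this one file, it matches the observation that the cold hint has no system-wide side effect.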
* Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags 2012-04-20 11:01 ` [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags James Bottomley 2012-04-20 11:23 ` Lukas Czerner @ 2012-04-21 18:26 ` Jeff Garzik 1 sibling, 0 replies; 12+ messages in thread From: Jeff Garzik @ 2012-04-21 18:26 UTC (permalink / raw) To: James Bottomley Cc: Lukas Czerner, Boaz Harrosh, Theodore Ts'o, linux-fsdevel, Ext4 Developers List, linux-mm On 04/20/2012 07:01 AM, James Bottomley wrote: > The concern I have is that the notion of hot and cold files *isn't* > propagated to the page cache, it's just shared between the fs and the > disk. Bingo -- full-file hint is too coarse-grained for some workloads. Page granularity would propagate to the VM as well as block layer, and give the required flexibility to all workloads. As well as covering the full-file case. Jeff
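As a point of comparison for the granularity argument, the existing `posix_fadvise()` call already takes a byte range instead of applying to the whole file. The sketch below is only an illustration of a sub-file hint, not the interface Jeff proposes: `page_span_of()` and `hint_range_willneed()` are hypothetical names, and the outward rounding is there only to make the page granularity explicit.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: a page-aligned byte range. */
struct page_span {
    off_t start;
    off_t len;
};

/* Round [off, off + len) outward to multiples of the page size. */
struct page_span page_span_of(off_t off, off_t len, off_t page)
{
    struct page_span s;
    off_t end = off + len;

    s.start = off - (off % page);       /* round start down */
    if (end % page != 0)
        end += page - (end % page);     /* round end up */
    s.len = end - s.start;
    return s;
}

/*
 * Hint that only the pages covering [off, off + len) will be needed soon.
 * Returns 0 on success or an errno value.
 */
int hint_range_willneed(int fd, off_t off, off_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    struct page_span s;

    if (page <= 0 || off < 0 || len <= 0)
        return EINVAL;

    s = page_span_of(off, len, (off_t)page);
    return posix_fadvise(fd, s.start, s.len, POSIX_FADV_WILLNEED);
}
```

A whole-file hint is then just the special case where the range covers the entire file, which is the "covering the full-file case" remark above.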