* VFS hot tracking: How to calculate data temperature?
  3 replies; 29+ messages in thread
From: Zhi Yong Wu @ 2012-11-02 4:04 UTC
To: linux-fsdevel
Cc: linuxram, zwu.kernel, Dave Chinner, cmm, Ben Chociej

Hi guys,

VFS hot tracking currently shows results like the one below, which is
very strange and not nice.

inode #279, reads 0, writes 1, avg read time 18446744073709551615,
avg write time 5251566408153596, temp 109

Does anyone know of a simpler but effective way to calculate
data temperature?

--
Regards,

Zhi Yong Wu
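The "avg read time" in the report above equals 2^64 - 1, the largest value an unsigned 64-bit integer can hold. The thread does not say where the value comes from; one plausible reading, assumed here, is a -1 sentinel or an underflow stored in a 64-bit unsigned field for an inode that has zero reads. The short Python sketch below only illustrates that arithmetic. The names `as_u64` and `avg_time` are made up for this example and do not come from the patch set.

```python
U64_MAX = 2**64 - 1


def as_u64(value):
    """Wrap a Python int the way an unsigned 64-bit field would."""
    return value & U64_MAX


def avg_time(total_time, count):
    """Integer average that reports 0 instead of a bogus value when nothing was counted."""
    return total_time // count if count else 0


print(as_u64(-1))      # 18446744073709551615, the value shown in the report
print(avg_time(0, 0))  # 0
```

Guarding the average against a zero count is one way to keep such a value out of the report; whether that matches the actual cause is an assumption.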
* Re: VFS hot tracking: How to calculate data temperature?
From: Ram Pai @ 2012-11-02 4:43 UTC
To: Zhi Yong Wu
Cc: linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej

On Fri, Nov 02, 2012 at 12:04:23PM +0800, Zhi Yong Wu wrote:
> Hi guys,
>
> VFS hot tracking currently shows results like the one below, which is
> very strange and not nice.
>
> inode #279, reads 0, writes 1, avg read time 18446744073709551615,
> avg write time 5251566408153596, temp 109
>
> Does anyone know of a simpler but effective way to calculate
> data temperature?

Intuitively, data gets hot when it is accessed frequently, and cools down as
the frequency of its access decreases.

It is like heating water: the longer you heat it, the hotter it gets. The more
intense the flame, the faster it heats. So it is a function of both
intensity and continuity. In the case of data, intensity can be mapped to
the access frequency within a discrete interval, and continuity can be
mapped to sustaining the same access frequency across further discrete
intervals. Any reduction of intensity cools down the data. Any break in
continuity cools down the data.

We need to define the size of a 'discrete interval', keep collecting the
frequency of access in each discrete interval, and take a weighted
average across all these discrete intervals. That should give us the
temperature of the data.

Here is an example to make my thoughts clear.

Let's say we define the discrete interval as 1 sec.

If a given piece of data is accessed 100 times in this 1 sec, then the
temperature of the data at the end of the 1 sec becomes (0+100)/2=50, where
0 is the temperature of the data at the beginning of the 1 sec. In the next
1 sec, if the data is accessed 100 times again, then the temperature of the
data at the end of the 2nd sec becomes (50+100)/2=75, and so on and so forth.
If the data is not accessed at all during an interval, the temperature goes
down by half. This way of calculating biases the temperature towards the most
recent accesses. In other words, a high amount of recent access raises the
temperature of the data significantly more than the same high amount of
non-recent access.

This technique requires us to capture *only* two pieces of information: the
access frequency and the latest temperature of the data.

A crude approach is to schedule a kernel daemon approximately every second to
update the latest temperature using the access frequency since the last time
the temperature got updated. But this approach is not scalable. The other
approach is to decouple the frequency of temperature calculation from the size
of the discrete interval. The kernel daemon thread can take its own sweet time
to calculate the temperature. Whenever the daemon gets to calculate the new
temperature of a given inode, it can evenly spread the load since the last
temperature calculation across all the discrete intervals.

The formula for the calculation will be (Ty+((2^y)-1)x)/(2^y), where 'x' is
the number of accesses in the last 'y' discrete intervals and 'T' is the
previous temperature. Of course, I am approximating that there were exactly
'x/y' accesses in each of these 'y' discrete intervals.

This way of calculating requires one more field to be added to the
data structure: the timestamp of the last temperature update.

RP
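The averaging rule described in the message above can be sketched in a few lines of Python. This is a minimal userspace illustration, not code from the patch set: the names `step` and `batch_update` are made up here, and floating point is used only for readability (kernel code would need integer or fixed-point arithmetic). Unrolling the per-interval rule y times with x/y accesses per interval gives (T + (2^y - 1)(x/y)) / 2^y, that is (Ty + (2^y - 1)x) / (y * 2^y). The sketch uses this unrolled form, which carries an extra factor y in the denominator compared with the formula as written in the message.

```python
def step(temp, accesses):
    """One discrete interval: average the old temperature with the access count."""
    return (temp + accesses) / 2


def batch_update(temp, x, y):
    """Apply y intervals at once, assuming x accesses spread evenly (x / y each).

    Equivalent to calling step(temp, x / y) y times in a row.
    """
    if y <= 0:
        return temp
    per_interval = x / y
    weight = 0.5 ** y  # share of the old temperature that survives y halvings
    return temp * weight + per_interval * (1 - weight)


print(step(0, 100))             # 50.0, first second of the example
print(step(50, 100))            # 75.0, second second of the example
print(batch_update(0, 200, 2))  # 75.0, both seconds applied in one update
print(batch_update(75, 0, 1))   # 37.5, an idle interval halves the temperature
```

With this form the daemon only needs the previous temperature, the access count since the last update, and the number of elapsed intervals derived from the stored timestamp.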
* Re: VFS hot tracking: How to calculate data temperature?
From: Zhi Yong Wu @ 2012-11-02 6:39 UTC
To: Ram Pai
Cc: linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej

Any other comments?

--
Regards,

Zhi Yong Wu
* Re: VFS hot tracking: How to calculate data temperature?
From: Zhi Yong Wu @ 2012-11-02 6:38 UTC
To: linux-fsdevel
Cc: linuxram, zwu.kernel, Dave Chinner, cmm, Ben Chociej, James Northrup

Here is another question.

How can we save the file temperature across umount, so that it is
preserved after a reboot?

This is a requirement from a DB product.
I thought that we could save the file temperature in its inode struct,
that is, add one new field to struct inode, so that this info is
written to disk with the inode.

Any comments or ideas are appreciated, thanks.

--
Regards,

Zhi Yong Wu
* Re: VFS hot tracking: How to calculate data temperature?
From: Zheng Liu @ 2012-11-02 8:41 UTC
To: Zhi Yong Wu
Cc: linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup

On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> Here is another question.
>
> How can we save the file temperature across umount, so that it is
> preserved after a reboot?
>
> This is a requirement from a DB product.
> I thought that we could save the file temperature in its inode struct,
> that is, add one new field to struct inode, so that this info is
> written to disk with the inode.
>
> Any comments or ideas are appreciated, thanks.

Hi Zhiyong,

I think that we might define a callback function. If a filesystem
wants to save this data, it can implement a function to save it. Each
filesystem can decide for itself whether to add it or not.

BTW, actually I don't really care about how to save this data because
I only want to observe which file is accessed in real time, which is
very useful for me to track down a problem in our production system.

Regards,
Zheng
* Re: VFS hot tracking: How to calculate data temperature?
From: Darrick J. Wong @ 2012-11-02 20:10 UTC
To: Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup

On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote:
> On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> > Here is another question.
> >
> > How can we save the file temperature across umount, so that it is
> > preserved after a reboot?
> >
> > This is a requirement from a DB product.
> > I thought that we could save the file temperature in its inode struct,
> > that is, add one new field to struct inode, so that this info is
> > written to disk with the inode.
> >
> > Any comments or ideas are appreciated, thanks.
>
> Hi Zhiyong,
>
> I think that we might define a callback function. If a filesystem
> wants to save this data, it can implement a function to save it. Each
> filesystem can decide for itself whether to add it or not.
>
> BTW, actually I don't really care about how to save this data because
> I only want to observe which file is accessed in real time, which is
> very useful for me to track down a problem in our production system.

<shrug> I _think_ the vfs quota code simply asks the filesystem for a special
inode where it saves the quota data in whatever (FS-agnostic) format it wants.
Have you considered something like that?

(Or, maybe everyone secretly hates doing that? Secret files, yaaay...)

--D

>
> Regards,
> Zheng
* Re: VFS hot tracking: How to calculate data temperature?
From: Zhi Yong Wu @ 2012-11-05 2:34 UTC
To: Darrick J. Wong
Cc: linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup, linux-kernel mlist

On Sat, Nov 3, 2012 at 4:10 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote:
>> On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
>> > Here is another question.
>> >
>> > How can we save the file temperature across umount, so that it is
>> > preserved after a reboot?
>> >
>> > This is a requirement from a DB product.
>> > I thought that we could save the file temperature in its inode struct,
>> > that is, add one new field to struct inode, so that this info is
>> > written to disk with the inode.
>> >
>> > Any comments or ideas are appreciated, thanks.
>>
>> Hi Zhiyong,
>>
>> I think that we might define a callback function. If a filesystem
>> wants to save this data, it can implement a function to save it. Each
>> filesystem can decide for itself whether to add it or not.
>>
>> BTW, actually I don't really care about how to save this data because
>> I only want to observe which file is accessed in real time, which is
>> very useful for me to track down a problem in our production system.
>
> <shrug> I _think_ the vfs quota code simply asks the filesystem for a special
> inode where it saves the quota data in whatever (FS-agnostic) format it wants.
> Have you considered something like that?
No, but it is a good hint for my issue. Thanks.
>
> (Or, maybe everyone secretly hates doing that? Secret files, yaaay...)
Ah, are you thinking of doing that?
>
> --D
>>
>> Regards,
>> Zheng

--
Regards,

Zhi Yong Wu
* Re: VFS hot tracking: How to calculate data temperature?
From: Dave Chinner @ 2012-11-05 8:35 UTC
To: Darrick J. Wong
Cc: Zhi Yong Wu, linux-fsdevel, linuxram, cmm, Ben Chociej, James Northrup

On Fri, Nov 02, 2012 at 01:10:48PM -0700, Darrick J. Wong wrote:
> On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote:
> > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> > > Here is another question.
> > >
> > > How can we save the file temperature across umount, so that it is
> > > preserved after a reboot?
> > >
> > > This is a requirement from a DB product.
> > > I thought that we could save the file temperature in its inode struct,
> > > that is, add one new field to struct inode, so that this info is
> > > written to disk with the inode.
> > >
> > > Any comments or ideas are appreciated, thanks.
> >
> > Hi Zhiyong,
> >
> > I think that we might define a callback function. If a filesystem
> > wants to save this data, it can implement a function to save it. Each
> > filesystem can decide for itself whether to add it or not.
> >
> > BTW, actually I don't really care about how to save this data because
> > I only want to observe which file is accessed in real time, which is
> > very useful for me to track down a problem in our production system.
>
> <shrug> I _think_ the vfs quota code simply asks the filesystem for a special
> inode where it saves the quota data in whatever (FS-agnostic) format it wants.
> Have you considered something like that?

Doesn't make a lot of sense because the data is per-inode. Storing
it per-inode is the only way it can be efficiently stored and
indexed as it is accessed at the same time the inode is accessed.

Quota information, OTOH, is per user/group/project - they are shared
structures and have a completely different lookup and index
mechanism to per-inode data structures. Hence I don't think that the
quota model would be a good fit for such data.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: VFS hot tracking: How to calculate data temperature?
From: Zhi Yong Wu @ 2012-11-05 2:29 UTC
To: Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup, linux-kernel mlist

On Fri, Nov 2, 2012 at 4:41 PM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
>> Here is another question.
>>
>> How can we save the file temperature across umount, so that it is
>> preserved after a reboot?
>>
>> This is a requirement from a DB product.
>> I thought that we could save the file temperature in its inode struct,
>> that is, add one new field to struct inode, so that this info is
>> written to disk with the inode.
>>
>> Any comments or ideas are appreciated, thanks.
>
> Hi Zhiyong,
>
> I think that we might define a callback function. If a filesystem
> wants to save this data, it can implement a function to save it. Each
> filesystem can decide for itself whether to add it or not.
Great idea; the temperature saving function may be very specific to each FS.
But I am wondering if we can find a generic way to save temperature
info first.
>
> BTW, actually I don't really care about how to save this data because
> I only want to observe which file is accessed in real time, which is
> very useful for me to track down a problem in our production system.
Heh, but other guys or products care about this.
>
> Regards,
> Zheng

--
Regards,

Zhi Yong Wu
* Re: VFS hot tracking: How to calculate data temperature?
From: Zheng Liu @ 2012-11-06 8:39 UTC
To: Zhi Yong Wu
Cc: linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup, linux-kernel mlist

On Mon, Nov 05, 2012 at 10:29:39AM +0800, Zhi Yong Wu wrote:
> On Fri, Nov 2, 2012 at 4:41 PM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> >> Here is another question.
> >>
> >> How can we save the file temperature across umount, so that it is
> >> preserved after a reboot?
> >>
> >> This is a requirement from a DB product.
> >> I thought that we could save the file temperature in its inode struct,
> >> that is, add one new field to struct inode, so that this info is
> >> written to disk with the inode.
> >>
> >> Any comments or ideas are appreciated, thanks.
> >
> > Hi Zhiyong,
> >
> > I think that we might define a callback function. If a filesystem
> > wants to save this data, it can implement a function to save it. Each
> > filesystem can decide for itself whether to add it or not.
> Great idea; the temperature saving function may be very specific to each FS.
> But I am wondering if we can find a generic way to save temperature
> info first.

I don't think a generic way is better because it cannot support a
variety of filesystems. So maybe you must answer this question first:
for how many filesystems do you want to save this info? Such as ext4,
xfs, btrfs, etc. Then we can try to find a generic way. If these three
filesystems are the only ones you want to support, maybe saving it in
an xattr is an option.

Regards,
Zheng
* Re: VFS hot tracking: How to calculate data temperature?
From: Zhi Yong Wu @ 2012-11-06 9:00 UTC
To: Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup, linux-kernel mlist

On Tue, Nov 6, 2012 at 4:39 PM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> On Mon, Nov 05, 2012 at 10:29:39AM +0800, Zhi Yong Wu wrote:
>> On Fri, Nov 2, 2012 at 4:41 PM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
>> > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
>> >> Here is another question.
>> >>
>> >> How can we save the file temperature across umount, so that it is
>> >> preserved after a reboot?
>> >>
>> >> This is a requirement from a DB product.
>> >> I thought that we could save the file temperature in its inode struct,
>> >> that is, add one new field to struct inode, so that this info is
>> >> written to disk with the inode.
>> >>
>> >> Any comments or ideas are appreciated, thanks.
>> >
>> > Hi Zhiyong,
>> >
>> > I think that we might define a callback function. If a filesystem
>> > wants to save this data, it can implement a function to save it. Each
>> > filesystem can decide for itself whether to add it or not.
>> Great idea; the temperature saving function may be very specific to each FS.
>> But I am wondering if we can find a generic way to save temperature
>> info first.
>
> I don't think a generic way is better because it cannot support a
> variety of filesystems. So maybe you must answer this question first:
> for how many filesystems do you want to save this info? Such as ext4,
> xfs, btrfs, etc. Then we can try to find a generic way. If these three
> filesystems are the only ones you want to support, maybe saving it in
> an xattr is an option.
Yes, xattr is a good choice based on the current discussion. Maybe we
can provide one generic way plus a callback registration
infrastructure: if an FS registers its own saving callback, that
callback function will be used; otherwise the generic way will be
applied.

By the way, as Dave mentioned, reviewing patchset v4+ has the
highest priority, then how to calculate data temperature, and the
lowest priority is how to save data temperature info.
>
> Regards,
> Zheng

--
Regards,

Zhi Yong Wu
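The "generic way plus optional per-filesystem callback" idea from the message above can be sketched as a small dispatch table. This is only a minimal illustration in Python under stated assumptions: the names `register_save_callback`, `save_temperature`, and `save_generic` are hypothetical, and the generic backend is a plain dictionary standing in for whatever generic storage (for example an xattr) would be chosen.

```python
# Stand-in for the generic backend; a real one might write an xattr.
generic_store = {}

# Filesystem type -> saving callback registered by that filesystem.
save_callbacks = {}


def save_generic(inode_no, temp):
    """Generic fallback used when a filesystem registered nothing."""
    generic_store[inode_no] = temp


def register_save_callback(fs_type, callback):
    """Let a filesystem provide its own way of saving temperatures."""
    save_callbacks[fs_type] = callback


def save_temperature(fs_type, inode_no, temp):
    """Use the filesystem's callback if one exists, else the generic way."""
    callback = save_callbacks.get(fs_type, save_generic)
    callback(inode_no, temp)


# Usage: one filesystem registers a callback, another relies on the fallback.
fs_private_log = []
register_save_callback("btrfs", lambda ino, temp: fs_private_log.append((ino, temp)))
save_temperature("btrfs", 279, 109)
save_temperature("ext4", 12, 7)
```

After the two calls, the registered callback has received inode 279 and the generic store holds the entry for inode 12.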
* Re: VFS hot tracking: How to calculate data temperature?
From: Zheng Liu @ 2012-11-07 6:45 UTC
To: Zhi Yong Wu
Cc: linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup, linux-kernel mlist

On Tue, Nov 06, 2012 at 05:00:19PM +0800, Zhi Yong Wu wrote:
> On Tue, Nov 6, 2012 at 4:39 PM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > On Mon, Nov 05, 2012 at 10:29:39AM +0800, Zhi Yong Wu wrote:
> >> On Fri, Nov 2, 2012 at 4:41 PM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> >> > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> >> >> Here is another question.
> >> >>
> >> >> How can we save the file temperature across umount, so that it is
> >> >> preserved after a reboot?
> >> >>
> >> >> This is a requirement from a DB product.
> >> >> I thought that we could save the file temperature in its inode struct,
> >> >> that is, add one new field to struct inode, so that this info is
> >> >> written to disk with the inode.
> >> >>
> >> >> Any comments or ideas are appreciated, thanks.
> >> >
> >> > Hi Zhiyong,
> >> >
> >> > I think that we might define a callback function. If a filesystem
> >> > wants to save this data, it can implement a function to save it. Each
> >> > filesystem can decide for itself whether to add it or not.
> >> Great idea; the temperature saving function may be very specific to each FS.
> >> But I am wondering if we can find a generic way to save temperature
> >> info first.
> >
> > I don't think a generic way is better because it cannot support a
> > variety of filesystems. So maybe you must answer this question first:
> > for how many filesystems do you want to save this info? Such as ext4,
> > xfs, btrfs, etc. Then we can try to find a generic way. If these three
> > filesystems are the only ones you want to support, maybe saving it in
> > an xattr is an option.
> Yes, xattr is a good choice based on the current discussion. Maybe we
> can provide one generic way plus a callback registration
> infrastructure: if an FS registers its own saving callback, that
> callback function will be used; otherwise the generic way will be
> applied.
>
> By the way, as Dave mentioned, reviewing patchset v4+ has the
> highest priority, then how to calculate data temperature, and the
> lowest priority is how to save data temperature info.

Great! Thanks for sharing the news with me. IMHO the highest priority
is that we must know the overhead that this patch set costs once these
patches are applied. My point of view is that there is no overhead at
all when it is disabled, and it only brings a little overhead when it
is enabled.

Regards,
Zheng
* Re: VFS hot tracking: How to calculate data temperature?
From: Ram Pai @ 2012-11-06 9:36 UTC
To: Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup

On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote:
> On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> > Here is another question.
> >
> > How can we save the file temperature across umount, so that it is
> > preserved after a reboot?
> >
> > This is a requirement from a DB product.
> > I thought that we could save the file temperature in its inode struct,
> > that is, add one new field to struct inode, so that this info is
> > written to disk with the inode.
> >
> > Any comments or ideas are appreciated, thanks.
>
> Hi Zhiyong,
>
> I think that we might define a callback function. If a filesystem
> wants to save this data, it can implement a function to save it. Each
> filesystem can decide for itself whether to add it or not.
>
> BTW, actually I don't really care about how to save this data because
> I only want to observe which file is accessed in real time, which is
> very useful for me to track down a problem in our production system.

To me, umounting a filesystem is a way of explicitly telling the VFS that the
filesystem's data is not hot anymore. So probably, it really does not make
sense to store temperatures across mount boundaries.

RP
* Re: VFS hot tracking: How to calculate data temperature?
From: Darrick J. Wong @ 2012-11-06 23:10 UTC
To: Ram Pai
Cc: Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup

On Tue, Nov 06, 2012 at 05:36:38PM +0800, Ram Pai wrote:
> On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote:
> > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> > > Here is another question.
> > >
> > > How can we save the file temperature across umount, so that it is
> > > preserved after a reboot?
> > >
> > > This is a requirement from a DB product.
> > > I thought that we could save the file temperature in its inode struct,
> > > that is, add one new field to struct inode, so that this info is
> > > written to disk with the inode.
> > >
> > > Any comments or ideas are appreciated, thanks.
> >
> > Hi Zhiyong,
> >
> > I think that we might define a callback function. If a filesystem
> > wants to save this data, it can implement a function to save it. Each
> > filesystem can decide for itself whether to add it or not.
> >
> > BTW, actually I don't really care about how to save this data because
> > I only want to observe which file is accessed in real time, which is
> > very useful for me to track down a problem in our production system.
>
> To me, umounting a filesystem is a way of explicitly telling the VFS that the
> filesystem's data is not hot anymore. So probably, it really does not make
> sense to store temperatures across mount boundaries.

I'd prefer that file heat data be retained across mounts -- we shouldn't
throw away all of our observations just because of a system crash / power
outage / scheduled reboot.

Or, imagine if you're a defragging tool. If you're clever enough to try
consolidating all the hot blocks in one place on disk so that you could
aggressively read them all in at once (e.g. ureadahead), I think you'd want to
be able to access as big of an observation pool as possible.

This just occurred to me -- are you saving all of the file's heat data, like
the per-range read/write counters, and the averages? Or just a single compiled
heat rating for the whole file? I suggested a big hidden file a few days ago
because I'd thought you were trying to save all the range/heat data, which
would probably be painful to shoehorn into an xattr. If you're only storing a
single number, then the xattr way is probably ok.

--D

>
> RP
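The "single number in an xattr" case from the message above can be sketched as follows. This is a minimal illustration in Python, assuming the temperature fits in an unsigned 32-bit integer. The attribute name `user.hot_temp` and the helper names are hypothetical and do not come from the thread or the patch set. `os.setxattr` and `os.getxattr` are available on Linux only and need a filesystem with user xattrs enabled, so the sketch defines the helpers that use them without calling them.

```python
import os
import struct

XATTR_NAME = "user.hot_temp"  # hypothetical attribute name, an assumption
TEMP_FORMAT = "<I"            # one little-endian unsigned 32-bit value


def encode_temp(temp):
    """Pack a single whole-file temperature into a fixed 4-byte value."""
    return struct.pack(TEMP_FORMAT, temp)


def decode_temp(raw):
    """Inverse of encode_temp."""
    return struct.unpack(TEMP_FORMAT, raw)[0]


def save_temp(path, temp):
    """Store the temperature as an xattr (Linux only, not called here)."""
    os.setxattr(path, XATTR_NAME, encode_temp(temp))


def load_temp(path):
    """Read the temperature back from the xattr (Linux only, not called here)."""
    return decode_temp(os.getxattr(path, XATTR_NAME))


print(len(encode_temp(109)))          # 4
print(decode_temp(encode_temp(109)))  # 109
```

A fixed 4-byte value stays small no matter how large the file is, whereas per-range counters grow with the number of tracked ranges, which is the case the message calls painful to fit into an xattr.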
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-06 23:10 ` Darrick J. Wong @ 2012-11-07 6:36 ` Zheng Liu 2012-11-07 19:25 ` Darrick J. Wong 0 siblings, 1 reply; 29+ messages in thread From: Zheng Liu @ 2012-11-07 6:36 UTC (permalink / raw) To: Darrick J. Wong Cc: Ram Pai, Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup On Tue, Nov 06, 2012 at 03:10:11PM -0800, Darrick J. Wong wrote: > On Tue, Nov 06, 2012 at 05:36:38PM +0800, Ram Pai wrote: > > On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote: > > > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote: > > > > Here also has another question. > > > > > > > > How to save the file temperature among the umount to be able to > > > > preserve the file tempreture after reboot? > > > > > > > > This above is the requirement from DB product. > > > > I thought that we can save file temperature in its inode struct, that > > > > is, add one new field in struct inode, then this info will be written > > > > to disk with inode. > > > > > > > > Any comments or ideas are appreciated, thanks. > > > > > > Hi Zhiyong, > > > > > > I think that we might define a callback function. If a filesystem wants > > > to save these data, it can implement a function to save them. The > > > filesystem can decide whether adding it or not by themselves. > > > > > > BTW, actually I don't really care about how to save these data because I > > > only want to observe which file is accessed in real time, which is very > > > useful for me to track a problem in our product system. > > > > To me, umounting a filesystem is a way of explicitly telling the VFS that the > > filesystem's data is not hot anymore. So probably, it really does not make > > sense to store temperatures across mount boundaries. > > I'd prefer that file heat data to be retained across mounts -- we shouldn't > throw away all of our observations just because of a system crash / power > outage / scheduled reboot. 
> > Or, imagine if you're a defragging tool. If you're clever enough to try > consolidating all the hot blocks in one place on disk so that you could > aggressively read them all in at once (e.g. ureadahead), I think you'd want to > be able to access as big of an observation pool as possible. > > This just occurred to me -- are you saving all of the file's heat data, like > the per-range read/write counters, and the averages? Or just a single compiled > heat rating for the whole file? I suggested a big hidden file a few days ago > because I'd thought you were trying to save all the range/heat data, which > would probably be painful to shoehorn into an xattr. If you're only storing a > single number, then the xattr way is probably ok. Hi Darrick, Maybe the best way is to provide a new mount option or a switch in sysfs to turn it on and off, so that the user can decide whether it is enabled. After all, it will bring some extra overhead. For me, at least, turning it on in our production system is unacceptable unless there is a problem that I need to track. Regards, Zheng ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-07 6:36 ` Zheng Liu @ 2012-11-07 19:25 ` Darrick J. Wong 2012-11-08 2:48 ` Zheng Liu 0 siblings, 1 reply; 29+ messages in thread From: Darrick J. Wong @ 2012-11-07 19:25 UTC (permalink / raw) To: Ram Pai, Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup On Wed, Nov 07, 2012 at 02:36:42PM +0800, Zheng Liu wrote: > On Tue, Nov 06, 2012 at 03:10:11PM -0800, Darrick J. Wong wrote: > > On Tue, Nov 06, 2012 at 05:36:38PM +0800, Ram Pai wrote: > > > On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote: > > > > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote: > > > > > Here also has another question. > > > > > > > > > > How to save the file temperature among the umount to be able to > > > > > preserve the file tempreture after reboot? > > > > > > > > > > This above is the requirement from DB product. > > > > > I thought that we can save file temperature in its inode struct, that > > > > > is, add one new field in struct inode, then this info will be written > > > > > to disk with inode. > > > > > > > > > > Any comments or ideas are appreciated, thanks. > > > > > > > > Hi Zhiyong, > > > > > > > > I think that we might define a callback function. If a filesystem wants > > > > to save these data, it can implement a function to save them. The > > > > filesystem can decide whether adding it or not by themselves. > > > > > > > > BTW, actually I don't really care about how to save these data because I > > > > only want to observe which file is accessed in real time, which is very > > > > useful for me to track a problem in our product system. > > > > > > To me, umounting a filesystem is a way of explicitly telling the VFS that the > > > filesystem's data is not hot anymore. So probably, it really does not make > > > sense to store temperatures across mount boundaries. 
> > > > I'd prefer that file heat data to be retained across mounts -- we shouldn't > > throw away all of our observations just because of a system crash / power > > outage / scheduled reboot. > > > > Or, imagine if you're a defragging tool. If you're clever enough to try > > consolidating all the hot blocks in one place on disk so that you could > > aggressively read them all in at once (e.g. ureadahead), I think you'd want to > > be able to access as big of an observation pool as possible. > > > > This just occurred to me -- are you saving all of the file's heat data, like > > the per-range read/write counters, and the averages? Or just a single compiled > > heat rating for the whole file? I suggested a big hidden file a few days ago > > because I'd thought you were trying to save all the range/heat data, which > > would probably be painful to shoehorn into an xattr. If you're only storing a > > single number, then the xattr way is probably ok. > > Hi Darrick, > > Maybe the best way is that a new mount option or a switch in sysfs is > provided to turn on/off it. The user can decide whether it is enabled > or not. After all it will bring some extra overhead. At least turning > it on in our product system is unacceptable for me if there is no any > problem that I need to track. Hmm... who are the intended in-kernel users of the hot tracking feature? I'm starting to wonder if it's possible (or desirable) to implement some of this in userspace and have the kernel ask for the hot data as needed, or simply write a driver program that handles the strategy and only needs the kernel interface that moves extents around. I feel like we could just write a regular program that uses ftrace to record io activity and manage all the observations that we pick up, and then the db, defrag, dedupe, etc. programs can just call into that? On the other hand, writing some daemon program has its own problems with distribution, starting it up, and killing it off at shutdown. 
But it would make Zheng's (non)use case easier -- if you don't want it, don't run it. Perhaps this approach has already been discussed and thrown out? In which case I'll shut up. :) --D > > Regards, > Zheng ^ permalink raw reply [flat|nested] 29+ messages in thread
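For illustration only, here is a rough sketch of the userspace side Darrick describes: a store that collects observations and lets other programs ask for the hottest inodes. Everything in it is an assumption made for the example. The `HeatTracker` class and its method names are hypothetical, the events would have to come from some tracing front end (ftrace parsing is not shown), and the `(old + count) / 2` update is borrowed from Ram Pai's example earlier in the thread rather than from the patchset.

```python
from collections import defaultdict


class HeatTracker:
    """Toy userspace store of per-inode access observations (hypothetical)."""

    def __init__(self):
        self.temperature = defaultdict(float)  # inode -> temperature so far
        self.pending = defaultdict(int)        # inode -> accesses in this interval

    def record_io(self, inode):
        """Note one read or write against an inode in the current interval."""
        self.pending[inode] += 1

    def end_interval(self):
        """Fold this interval's counts into each temperature: (old + count) / 2."""
        for inode in set(self.temperature) | set(self.pending):
            self.temperature[inode] = (self.temperature[inode] + self.pending[inode]) / 2
        self.pending.clear()

    def hottest(self, n=10):
        """Return up to n (inode, temperature) pairs, hottest first."""
        ranked = sorted(self.temperature.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]
```

A db, defrag or dedupe program would then only need `hottest()`; an inode that stops being accessed cools by half at every interval, and if you don't want any of it, you simply don't run the collector.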
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-07 19:25 ` Darrick J. Wong @ 2012-11-08 2:48 ` Zheng Liu 0 siblings, 0 replies; 29+ messages in thread From: Zheng Liu @ 2012-11-08 2:48 UTC (permalink / raw) To: Darrick J. Wong Cc: Ram Pai, Zhi Yong Wu, linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, James Northrup On Wed, Nov 07, 2012 at 11:25:33AM -0800, Darrick J. Wong wrote: > On Wed, Nov 07, 2012 at 02:36:42PM +0800, Zheng Liu wrote: > > On Tue, Nov 06, 2012 at 03:10:11PM -0800, Darrick J. Wong wrote: > > > On Tue, Nov 06, 2012 at 05:36:38PM +0800, Ram Pai wrote: > > > > On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote: > > > > > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote: > > > > > > Here also has another question. > > > > > > > > > > > > How to save the file temperature among the umount to be able to > > > > > > preserve the file tempreture after reboot? > > > > > > > > > > > > This above is the requirement from DB product. > > > > > > I thought that we can save file temperature in its inode struct, that > > > > > > is, add one new field in struct inode, then this info will be written > > > > > > to disk with inode. > > > > > > > > > > > > Any comments or ideas are appreciated, thanks. > > > > > > > > > > Hi Zhiyong, > > > > > > > > > > I think that we might define a callback function. If a filesystem wants > > > > > to save these data, it can implement a function to save them. The > > > > > filesystem can decide whether adding it or not by themselves. > > > > > > > > > > BTW, actually I don't really care about how to save these data because I > > > > > only want to observe which file is accessed in real time, which is very > > > > > useful for me to track a problem in our product system. > > > > > > > > To me, umounting a filesystem is a way of explicitly telling the VFS that the > > > > filesystem's data is not hot anymore. 
So probably, it really does not make > > > > sense to store temperatures across mount boundaries. > > > > > > I'd prefer that file heat data to be retained across mounts -- we shouldn't > > > throw away all of our observations just because of a system crash / power > > > outage / scheduled reboot. > > > > > > Or, imagine if you're a defragging tool. If you're clever enough to try > > > consolidating all the hot blocks in one place on disk so that you could > > > aggressively read them all in at once (e.g. ureadahead), I think you'd want to > > > be able to access as big of an observation pool as possible. > > > > > > This just occurred to me -- are you saving all of the file's heat data, like > > > the per-range read/write counters, and the averages? Or just a single compiled > > > heat rating for the whole file? I suggested a big hidden file a few days ago > > > because I'd thought you were trying to save all the range/heat data, which > > > would probably be painful to shoehorn into an xattr. If you're only storing a > > > single number, then the xattr way is probably ok. > > > > Hi Darrick, > > > > Maybe the best way is that a new mount option or a switch in sysfs is > > provided to turn on/off it. The user can decide whether it is enabled > > or not. After all it will bring some extra overhead. At least turning > > it on in our product system is unacceptable for me if there is no any > > problem that I need to track. > > Hmm... who are the intended in-kernel users of the hot tracking feature? I'm > starting to wonder if it's possible (or desirable) to implement some of this in > userspace and have the kernel ask for the hot data as needed, or simply write a > driver program that handles the strategy and only needs the kernel interface > that moves extents around. I feel like we could just write a regular program > that uses ftrace to record io activity and manage all the observations that we > pick up, and then the db, defrag, dedupe, etc. 
programs can just call into > that? > > On the other hand, writing some daemon program has its own problems with > distribution, starting it up, and killing it off at shutdown. But it would > make Zheng's (non)use case easier -- if you don't want it, don't run it. In fact, I often need to help application developers track down IO problems, so let me describe my own solution. I usually use blktrace to capture IO activity, then run a script to pick out the read/write IO requests, which carry the on-disk sector numbers. Then I use the 'ex' command in debugfs (for an ext4 filesystem) to get the layout of all extents. Finally, I run another script to work out which file was accessed. So far I haven't found a better method. So the hot tracking feature is very useful for me. That is why I am concerned about the overhead that the feature brings, and why I hope it can be enabled and disabled dynamically. As I said, we can write a userspace program to do these things, but my method has some defects. On one hand, we can't export the layout for all servers in advance because it uses a huge amount of disk space. OTOH, when I need to use my own method to track a problem, I need to export the layout first, and that takes a long time. Maybe the problem has disappeared by the time you finish exporting the layout. :-( Thus, writing some daemon program might be another solution, but for me it is not the best one. At least we need to do something in the kernel to record IO activity so that the user can easily retrieve it. Regards, Zheng ^ permalink raw reply [flat|nested] 29+ messages in thread
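The last step of the workflow Zheng describes, going from traced sector numbers back to files, could look roughly like the sketch below. It is only an illustration under several assumptions: the function names are made up, the extent list is assumed to have been parsed already into non-overlapping `(start, length, path)` tuples, and the traced numbers are assumed to have been converted to the same unit and offset as the extent map (the two tools may not report the same units). No blktrace or debugfs output parsing is shown.

```python
import bisect
from collections import Counter


def build_index(extents):
    """Sort (start, length, path) extents so a sector can be found by bisection."""
    ordered = sorted(extents)
    starts = [start for start, _length, _path in ordered]
    return starts, ordered


def file_for_sector(index, sector):
    """Return the path whose extent covers sector, or None if no extent does."""
    starts, ordered = index
    pos = bisect.bisect_right(starts, sector) - 1
    if pos < 0:
        return None
    start, length, path = ordered[pos]
    return path if sector < start + length else None


def count_accesses(index, sectors):
    """Count traced requests per file; unmapped sectors are tallied under None."""
    return Counter(file_for_sector(index, s) for s in sectors)
```

The lookup itself is cheap; the expensive part Zheng complains about is producing the extent list in the first place, which this sketch takes as given.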
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-02 6:38 ` Zhi Yong Wu 2012-11-02 8:41 ` Zheng Liu @ 2012-11-02 21:27 ` Mingming.cao 2012-11-05 2:35 ` Zhi Yong Wu 1 sibling, 1 reply; 29+ messages in thread From: Mingming.cao @ 2012-11-02 21:27 UTC (permalink / raw) To: Zhi Yong Wu Cc: linux-fsdevel, linuxram, Dave Chinner, Ben Chociej, James Northrup On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: > Here also has another question. > > How to save the file temperature among the umount to be able to > preserve the file temperature after reboot? > > This above is the requirement from DB product. > I thought that we can save file temperature in its inode struct, that > is, add one new field in struct inode, then this info will be written > to disk with inode. > > Any comments or ideas are appreciated, thanks. > > Maybe we could save the last file temperature in extended attributes, and save only the per-inode temperature for now. Mingming > On Fri, Nov 2, 2012 at 12:04 PM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote: > > HI, guys > > > > VFS hot tracking currently show result as below, and it is very > > strange and not nice. > > > > inode #279, reads 0, writes 1, avg read time 18446744073709551615, > > avg write time 5251566408153596, temp 109 > > > > Do anyone know if there is one simpler but effective way to calculate > > data temperature? > > > > -- > > Regards, > > > > Zhi Yong Wu > > > ^ permalink raw reply [flat|nested] 29+ messages in thread
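As a rough userspace illustration of keeping a single per-inode temperature in an extended attribute: the attribute name `user.hot_temp` and the 4-byte little-endian encoding are assumptions made up for this sketch, not anything defined by the hot tracking patches, and an in-kernel implementation would go through the filesystem's own xattr handlers rather than these calls. `os.setxattr` and `os.getxattr` are Linux-only.

```python
import os
import struct

XATTR_NAME = "user.hot_temp"  # hypothetical name, not defined by any patchset
_FORMAT = "<I"                # one little-endian unsigned 32-bit integer


def encode_temperature(temp):
    """Pack a temperature into the 4-byte value stored in the xattr."""
    return struct.pack(_FORMAT, temp)


def decode_temperature(raw):
    """Unpack a 4-byte xattr value back into an integer temperature."""
    (temp,) = struct.unpack(_FORMAT, raw)
    return temp


def save_temperature(path, temp):
    """Store the temperature on the file; raises OSError if that is not possible."""
    os.setxattr(path, XATTR_NAME, encode_temperature(temp))


def load_temperature(path):
    """Return the stored temperature, or None if it cannot be read."""
    try:
        return decode_temperature(os.getxattr(path, XATTR_NAME))
    except OSError:
        return None
```

A single small number like this fits an xattr easily; the full per-range counters and averages are a different matter.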
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-02 21:27 ` Mingming.cao @ 2012-11-05 2:35 ` Zhi Yong Wu 2012-11-05 8:28 ` Dave Chinner 0 siblings, 1 reply; 29+ messages in thread From: Zhi Yong Wu @ 2012-11-05 2:35 UTC (permalink / raw) To: cmm Cc: linux-fsdevel, linuxram, Dave Chinner, Ben Chociej, James Northrup, linux-kernel mlist On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: >> Here also has another question. >> >> How to save the file temperature among the umount to be able to >> preserve the file tempreture after reboot? >> >> This above is the requirement from DB product. >> I thought that we can save file temperature in its inode struct, that >> is, add one new field in struct inode, then this info will be written >> to disk with inode. >> >> Any comments or ideas are appreciated, thanks. >> >> > > Maybe could save the last file temperature with extended attributes. It seems that only ext4 has the concept of extended attributes. > Just save the per-inode temperature only for now. > > Mingming >> On Fri, Nov 2, 2012 at 12:04 PM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote: >> > HI, guys >> > >> > VFS hot tracking currently show result as below, and it is very >> > strange and not nice. >> > >> > inode #279, reads 0, writes 1, avg read time 18446744073709551615, >> > avg write time 5251566408153596, temp 109 >> > >> > Do anyone know if there is one simpler but effective way to calculate >> > data temperature? >> > >> > -- >> > Regards, >> > >> > Zhi Yong Wu >> >> >> > > > -- Regards, Zhi Yong Wu ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-05 2:35 ` Zhi Yong Wu @ 2012-11-05 8:28 ` Dave Chinner 2012-11-05 8:44 ` Zhi Yong Wu 0 siblings, 1 reply; 29+ messages in thread From: Dave Chinner @ 2012-11-05 8:28 UTC (permalink / raw) To: Zhi Yong Wu Cc: cmm, linux-fsdevel, linuxram, Ben Chociej, James Northrup, linux-kernel mlist On Mon, Nov 05, 2012 at 10:35:50AM +0800, Zhi Yong Wu wrote: > On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: > > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: > >> Here also has another question. > >> > >> How to save the file temperature among the umount to be able to > >> preserve the file temperature after reboot? > >> > >> This above is the requirement from DB product. > >> I thought that we can save file temperature in its inode struct, that > >> is, add one new field in struct inode, then this info will be written > >> to disk with inode. > >> > >> Any comments or ideas are appreciated, thanks. > >> > >> > > > > Maybe could save the last file temperature with extended attributes. > It seems that only ext4 has the concept of extended attributes. All major filesystems have xattr support. They are used extensively by the security and integrity subsystems, for example. Saving the information might be something that is useful to certain applications, but let's have the people that need that functionality spell out their requirements before discussing how or what to implement. Indeed, discussion should really focus on getting the core, in-memory infrastructure sorted out first before trying to expand the functionality further... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-05 8:28 ` Dave Chinner @ 2012-11-05 8:44 ` Zhi Yong Wu 2012-11-05 10:33 ` Steven Whitehouse 0 siblings, 1 reply; 29+ messages in thread From: Zhi Yong Wu @ 2012-11-05 8:44 UTC (permalink / raw) To: Dave Chinner Cc: cmm, linux-fsdevel, linuxram, Ben Chociej, James Northrup, linux-kernel mlist On Mon, Nov 5, 2012 at 4:28 PM, Dave Chinner <david@fromorbit.com> wrote: > On Mon, Nov 05, 2012 at 10:35:50AM +0800, Zhi Yong Wu wrote: >> On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: >> > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: >> >> Here also has another question. >> >> >> >> How to save the file temperature among the umount to be able to >> >> preserve the file temperature after reboot? >> >> >> >> This above is the requirement from DB product. >> >> I thought that we can save file temperature in its inode struct, that >> >> is, add one new field in struct inode, then this info will be written >> >> to disk with inode. >> >> >> >> Any comments or ideas are appreciated, thanks. >> >> >> >> >> > >> > Maybe could save the last file temperature with extended attributes. >> It seems that only ext4 has the concept of extended attributes. > > All major filesystems have xattr support. They are used extensively > by the security and integrity subsystems, for example. Got it, thanks. > > Saving the information might be something that is useful to certain > applications, but let's have the people that need that functionality > spell out their requirements before discussing how or what to > implement. Indeed, discussion should really focus on getting the > core, in-memory infrastructure sorted out first before trying to > expand the functionality further... Ah, but the latest patchset needs some love from experienced FS guys :) > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com -- Regards, Zhi Yong Wu ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-05 8:44 ` Zhi Yong Wu @ 2012-11-05 10:33 ` Steven Whitehouse 2012-11-05 11:46 ` Zhi Yong Wu 0 siblings, 1 reply; 29+ messages in thread From: Steven Whitehouse @ 2012-11-05 10:33 UTC (permalink / raw) To: Zhi Yong Wu Cc: Dave Chinner, cmm, linux-fsdevel, linuxram, Ben Chociej, James Northrup, linux-kernel mlist Hi, On Mon, 2012-11-05 at 16:44 +0800, Zhi Yong Wu wrote: > On Mon, Nov 5, 2012 at 4:28 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Nov 05, 2012 at 10:35:50AM +0800, Zhi Yong Wu wrote: > >> On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: > >> > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: > >> >> Here also has another question. > >> >> > >> >> How to save the file temperature among the umount to be able to > >> >> preserve the file tempreture after reboot? > >> >> > >> >> This above is the requirement from DB product. > >> >> I thought that we can save file temperature in its inode struct, that > >> >> is, add one new field in struct inode, then this info will be written > >> >> to disk with inode. > >> >> > >> >> Any comments or ideas are appreciated, thanks. > >> >> > >> >> > >> > > >> > Maybe could save the last file temperature with extended attributes. > >> It seems that only ext4 has the concept of extended attributes. > > > > All major filesystems have xattr support. They are used extensively > > by the security and integrity subsystems, for example. > got it, thanks. > > > > Saving the information might be something that is useful to certian > > applications, but lets have the people that need that functionality > > spell out their requirements before discussing how or what to > > implement. Indeed, discussion shoul dreally focus on getting the > > core, in-memory infrastructure sorted out first before trying to > > expand the functionality further... > ah, but the latest patchset need some love from experienced FS guys:)....... 
There is one other possible issue with saving the data into the filesystem, which is that it may disturb what you are trying to measure. Some filesystems (GFS2 is one) store data for small inodes in the same block as the inode itself. So that means the accesses to the saved hot tracking info may potentially affect the data access times too. Also there is a very limited amount of space to expand the number of fields in the inode, so xattr may be the only solution, depending on how much data needs to be stored in each case. In the GFS2 case (I don't think it is unique in this) xattrs are stored out of line and having to access them in every open means an extra block read per inode, which again has performance implications. So that is not an insurmountable problem, but something to take into account in selecting a solution, Steve. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-05 10:33 ` Steven Whitehouse @ 2012-11-05 11:46 ` Zhi Yong Wu 2012-11-05 11:57 ` Steven Whitehouse 0 siblings, 1 reply; 29+ messages in thread From: Zhi Yong Wu @ 2012-11-05 11:46 UTC (permalink / raw) To: Steven Whitehouse Cc: Dave Chinner, cmm, linux-fsdevel, linuxram, Ben Chociej, James Northrup, linux-kernel mlist On Mon, Nov 5, 2012 at 6:33 PM, Steven Whitehouse <swhiteho@redhat.com> wrote: > Hi, > > On Mon, 2012-11-05 at 16:44 +0800, Zhi Yong Wu wrote: >> On Mon, Nov 5, 2012 at 4:28 PM, Dave Chinner <david@fromorbit.com> wrote: >> > On Mon, Nov 05, 2012 at 10:35:50AM +0800, Zhi Yong Wu wrote: >> >> On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: >> >> > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: >> >> >> Here also has another question. >> >> >> >> >> >> How to save the file temperature among the umount to be able to >> >> >> preserve the file tempreture after reboot? >> >> >> >> >> >> This above is the requirement from DB product. >> >> >> I thought that we can save file temperature in its inode struct, that >> >> >> is, add one new field in struct inode, then this info will be written >> >> >> to disk with inode. >> >> >> >> >> >> Any comments or ideas are appreciated, thanks. >> >> >> >> >> >> >> >> > >> >> > Maybe could save the last file temperature with extended attributes. >> >> It seems that only ext4 has the concept of extended attributes. >> > >> > All major filesystems have xattr support. They are used extensively >> > by the security and integrity subsystems, for example. >> got it, thanks. >> > >> > Saving the information might be something that is useful to certian >> > applications, but lets have the people that need that functionality >> > spell out their requirements before discussing how or what to >> > implement. 
Indeed, discussion should really focus on getting the >> > core, in-memory infrastructure sorted out first before trying to >> > expand the functionality further... >> ah, but the latest patchset needs some love from experienced FS guys:)....... > > There is one other possible issue with saving the data into the > filesystem, which is that it may disturb what you are trying to measure. > Some filesystems (GFS2 is one) store data for small inodes in the same > block as the inode itself. So that means the accesses to the saved hot > tracking info may potentially affect the data access times too. Also > there is a very limited amount of space to expand the number of fields > in the inode, so xattr may be the only solution, depending on how much > data needs to be stored in each case. Very good analysis; both of these possible issues are very meaningful, thanks. > > In the GFS2 case (I don't think it is unique in this) xattrs are stored > out of line and having to access them in every open means an extra block > read per inode, which again has performance implications. > > So that is not an insurmountable problem, but something to take into > account in selecting a solution, In summary, it looks like you prefer xattr as the solution. > > Steve. > > > -- Regards, Zhi Yong Wu ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-05 11:46 ` Zhi Yong Wu @ 2012-11-05 11:57 ` Steven Whitehouse 2012-11-05 12:18 ` Zhi Yong Wu 0 siblings, 1 reply; 29+ messages in thread From: Steven Whitehouse @ 2012-11-05 11:57 UTC (permalink / raw) To: Zhi Yong Wu Cc: Dave Chinner, cmm, linux-fsdevel, linuxram, Ben Chociej, James Northrup, linux-kernel mlist Hi, On Mon, 2012-11-05 at 19:46 +0800, Zhi Yong Wu wrote: > On Mon, Nov 5, 2012 at 6:33 PM, Steven Whitehouse <swhiteho@redhat.com> wrote: > > Hi, > > > > On Mon, 2012-11-05 at 16:44 +0800, Zhi Yong Wu wrote: > >> On Mon, Nov 5, 2012 at 4:28 PM, Dave Chinner <david@fromorbit.com> wrote: > >> > On Mon, Nov 05, 2012 at 10:35:50AM +0800, Zhi Yong Wu wrote: > >> >> On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: > >> >> > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: > >> >> >> Here also has another question. > >> >> >> > >> >> >> How to save the file temperature among the umount to be able to > >> >> >> preserve the file tempreture after reboot? > >> >> >> > >> >> >> This above is the requirement from DB product. > >> >> >> I thought that we can save file temperature in its inode struct, that > >> >> >> is, add one new field in struct inode, then this info will be written > >> >> >> to disk with inode. > >> >> >> > >> >> >> Any comments or ideas are appreciated, thanks. > >> >> >> > >> >> >> > >> >> > > >> >> > Maybe could save the last file temperature with extended attributes. > >> >> It seems that only ext4 has the concept of extended attributes. > >> > > >> > All major filesystems have xattr support. They are used extensively > >> > by the security and integrity subsystems, for example. > >> got it, thanks. > >> > > >> > Saving the information might be something that is useful to certian > >> > applications, but lets have the people that need that functionality > >> > spell out their requirements before discussing how or what to > >> > implement. 
Indeed, discussion shoul dreally focus on getting the > >> > core, in-memory infrastructure sorted out first before trying to > >> > expand the functionality further... > >> ah, but the latest patchset need some love from experienced FS guys:)....... > > > > There is one other possible issue with saving the data into the > > filesystem, which is that it may disturb what you are trying to measure. > > Some filesystems (GFS2 is one) store data for small inodes in the same > > block as the inode itself. So that means the accesses to the saved hot > > tracking info may potentially affect the data access times too. Also > > there is a very limited amount of space to expand the number of fields > > in the inode, so xattr may be the only solution, depending on how much > > data needs to be stored in each case. > Very good analysis, two possible issues are very meanful, thanks. > > > > In the GFS2 case (I don't think it is unique in this) xattrs are stored > > out of line and having to access them in every open means an extra block > > read per inode, which again has performance implications. > > > > So that is not an insurmountable problem, but something to take into > > account in selecting a solution, > In summary, you look like preferring to xattr as its solution. > Well, that depends on exactly how large the data to be stored is, and other factors. It will add overhead to the storage/retrieval but at least it is fairly generic (wrt on-disk format) so likely to be easier to retrofit to existing filesystems. I suspect this may be one of those cases where there is no obvious right answer and it is a case of selecting the least worst option, if that makes sense? Steve. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-05 11:57 ` Steven Whitehouse @ 2012-11-05 12:18 ` Zhi Yong Wu 2012-11-05 12:25 ` Steven Whitehouse 0 siblings, 1 reply; 29+ messages in thread From: Zhi Yong Wu @ 2012-11-05 12:18 UTC (permalink / raw) To: Steven Whitehouse Cc: Dave Chinner, cmm, linux-fsdevel, linuxram, Ben Chociej, James Northrup, linux-kernel mlist On Mon, Nov 5, 2012 at 7:57 PM, Steven Whitehouse <swhiteho@redhat.com> wrote: > Hi, > > On Mon, 2012-11-05 at 19:46 +0800, Zhi Yong Wu wrote: >> On Mon, Nov 5, 2012 at 6:33 PM, Steven Whitehouse <swhiteho@redhat.com> wrote: >> > Hi, >> > >> > On Mon, 2012-11-05 at 16:44 +0800, Zhi Yong Wu wrote: >> >> On Mon, Nov 5, 2012 at 4:28 PM, Dave Chinner <david@fromorbit.com> wrote: >> >> > On Mon, Nov 05, 2012 at 10:35:50AM +0800, Zhi Yong Wu wrote: >> >> >> On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: >> >> >> > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: >> >> >> >> Here also has another question. >> >> >> >> >> >> >> >> How to save the file temperature among the umount to be able to >> >> >> >> preserve the file tempreture after reboot? >> >> >> >> >> >> >> >> This above is the requirement from DB product. >> >> >> >> I thought that we can save file temperature in its inode struct, that >> >> >> >> is, add one new field in struct inode, then this info will be written >> >> >> >> to disk with inode. >> >> >> >> >> >> >> >> Any comments or ideas are appreciated, thanks. >> >> >> >> >> >> >> >> >> >> >> > >> >> >> > Maybe could save the last file temperature with extended attributes. >> >> >> It seems that only ext4 has the concept of extended attributes. >> >> > >> >> > All major filesystems have xattr support. They are used extensively >> >> > by the security and integrity subsystems, for example. >> >> got it, thanks. 
>> >> > >> >> > Saving the information might be something that is useful to certian >> >> > applications, but lets have the people that need that functionality >> >> > spell out their requirements before discussing how or what to >> >> > implement. Indeed, discussion shoul dreally focus on getting the >> >> > core, in-memory infrastructure sorted out first before trying to >> >> > expand the functionality further... >> >> ah, but the latest patchset need some love from experienced FS guys:)....... >> > >> > There is one other possible issue with saving the data into the >> > filesystem, which is that it may disturb what you are trying to measure. >> > Some filesystems (GFS2 is one) store data for small inodes in the same >> > block as the inode itself. So that means the accesses to the saved hot >> > tracking info may potentially affect the data access times too. Also >> > there is a very limited amount of space to expand the number of fields >> > in the inode, so xattr may be the only solution, depending on how much >> > data needs to be stored in each case. >> Very good analysis, two possible issues are very meanful, thanks. >> > >> > In the GFS2 case (I don't think it is unique in this) xattrs are stored >> > out of line and having to access them in every open means an extra block >> > read per inode, which again has performance implications. >> > >> > So that is not an insurmountable problem, but something to take into >> > account in selecting a solution, >> In summary, you look like preferring to xattr as its solution. >> > > Well, that depends on exactly how large the data to be stored is, and > other factors. It will add overhead to the storage/retrieval but at > least it is fairly generic (wrt on-disk format) so likely to be easier > to retrofit to existing filesystems. 
Do you have any more detailed ideas about how to retrofit this to existing filesystems? :) > > I suspect this may be one of those cases where there is no obvious right > answer and it is a case of selecting the least worst option, if that > makes sense? Then we can only find out which solution is better via large-scale performance tests. > > Steve. > > -- Regards, Zhi Yong Wu ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-05 12:18 ` Zhi Yong Wu @ 2012-11-05 12:25 ` Steven Whitehouse 0 siblings, 0 replies; 29+ messages in thread From: Steven Whitehouse @ 2012-11-05 12:25 UTC (permalink / raw) To: Zhi Yong Wu Cc: Dave Chinner, cmm, linux-fsdevel, linuxram, Ben Chociej, James Northrup, linux-kernel mlist Hi, On Mon, 2012-11-05 at 20:18 +0800, Zhi Yong Wu wrote: > On Mon, Nov 5, 2012 at 7:57 PM, Steven Whitehouse <swhiteho@redhat.com> wrote: > > Hi, > > > > On Mon, 2012-11-05 at 19:46 +0800, Zhi Yong Wu wrote: > >> On Mon, Nov 5, 2012 at 6:33 PM, Steven Whitehouse <swhiteho@redhat.com> wrote: > >> > Hi, > >> > > >> > On Mon, 2012-11-05 at 16:44 +0800, Zhi Yong Wu wrote: > >> >> On Mon, Nov 5, 2012 at 4:28 PM, Dave Chinner <david@fromorbit.com> wrote: > >> >> > On Mon, Nov 05, 2012 at 10:35:50AM +0800, Zhi Yong Wu wrote: > >> >> >> On Sat, Nov 3, 2012 at 5:27 AM, Mingming.cao <cmm@us.ibm.com> wrote: > >> >> >> > On Fri, 2012-11-02 at 14:38 +0800, Zhi Yong Wu wrote: > >> >> >> >> Here also has another question. > >> >> >> >> > >> >> >> >> How to save the file temperature among the umount to be able to > >> >> >> >> preserve the file tempreture after reboot? > >> >> >> >> > >> >> >> >> This above is the requirement from DB product. > >> >> >> >> I thought that we can save file temperature in its inode struct, that > >> >> >> >> is, add one new field in struct inode, then this info will be written > >> >> >> >> to disk with inode. > >> >> >> >> > >> >> >> >> Any comments or ideas are appreciated, thanks. > >> >> >> >> > >> >> >> >> > >> >> >> > > >> >> >> > Maybe could save the last file temperature with extended attributes. > >> >> >> It seems that only ext4 has the concept of extended attributes. > >> >> > > >> >> > All major filesystems have xattr support. They are used extensively > >> >> > by the security and integrity subsystems, for example. > >> >> got it, thanks. 
> >> >> > > >> >> > Saving the information might be something that is useful to certain > >> >> > applications, but let's have the people that need that functionality > >> >> > spell out their requirements before discussing how or what to > >> >> > implement. Indeed, discussion should really focus on getting the > >> >> > core, in-memory infrastructure sorted out first before trying to > >> >> > expand the functionality further... > >> >> ah, but the latest patchset needs some love from experienced FS guys:)....... > >> > > >> > There is one other possible issue with saving the data into the > >> > filesystem, which is that it may disturb what you are trying to measure. > >> > Some filesystems (GFS2 is one) store data for small inodes in the same > >> > block as the inode itself. So that means the accesses to the saved hot > >> > tracking info may potentially affect the data access times too. Also > >> > there is a very limited amount of space to expand the number of fields > >> > in the inode, so xattr may be the only solution, depending on how much > >> > data needs to be stored in each case. > >> Very good analysis, the two possible issues are very meaningful, thanks. > >> > > >> > In the GFS2 case (I don't think it is unique in this) xattrs are stored > >> > out of line and having to access them in every open means an extra block > >> > read per inode, which again has performance implications. > >> > > >> > So that is not an insurmountable problem, but something to take into > >> > account in selecting a solution, > >> In summary, it looks like you prefer xattr as the solution. > >> > > > > Well, that depends on exactly how large the data to be stored is, and > > other factors. It will add overhead to the storage/retrieval but at > > least it is fairly generic (wrt on-disk format) so likely to be easier > > to retrofit to existing filesystems. 
> Do you have a more detailed idea of how to retrofit this to existing filesystems? :) Well I think we've already covered the obvious ways... > > > > I suspect this may be one of those cases where there is no obvious right > > answer and it is a case of selecting the least worst option, if that > > makes sense? > Then we can only check which solution is better via large-scale > performance tests. Indeed, and that will be to a certain extent fs dependent too, Steve.
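The xattr option discussed in this message can be tried out from userspace. What follows is only a minimal sketch under stated assumptions, not the patchset's method: the attribute name `user.hot_tracking.temp` and both helper names are hypothetical, the value is stored as ASCII decimal, and an in-kernel implementation would go through the filesystem's own xattr handlers instead of `os.setxattr`.

```python
import os

# Hypothetical attribute name; the thread does not define one.
TEMP_XATTR = "user.hot_tracking.temp"


def save_temperature(path, temp):
    """Store temp as an ASCII decimal xattr; return False if that is not possible."""
    if not hasattr(os, "setxattr"):  # os.setxattr is only available on Linux
        return False
    try:
        os.setxattr(path, TEMP_XATTR, str(temp).encode("ascii"))
        return True
    except OSError:
        # For example, the filesystem does not support user xattrs.
        return False


def load_temperature(path):
    """Return the saved temperature, or None if it is absent or unreadable."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return int(os.getxattr(path, TEMP_XATTR).decode("ascii"))
    except (OSError, ValueError):
        return None
```

Reading the attribute back on every open is an extra lookup, which is the overhead described above for filesystems that keep xattrs out of line.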
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-02 4:04 VFS hot tracking: How to calculate data temperature? Zhi Yong Wu 2012-11-02 4:43 ` Ram Pai 2012-11-02 6:38 ` Zhi Yong Wu @ 2012-11-09 1:12 ` Zhi Yong Wu 2012-11-09 3:20 ` Zheng Liu [not found] ` <CAPkEcwg0ZHjV3JVxoKSzFqKLHavhGdTufLZBdBGQ6xXDMrSU-w@mail.gmail.com> 2 siblings, 2 replies; 29+ messages in thread From: Zhi Yong Wu @ 2012-11-09 1:12 UTC (permalink / raw) To: linux-fsdevel Cc: linuxram, zwu.kernel, Dave Chinner, cmm, Ben Chociej, Zheng Liu, Darrick J. Wong, Ram Pai, James Northrup, Steven Whitehouse On Fri, Nov 2, 2012 at 12:04 PM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote: > Hi, guys > > VFS hot tracking currently shows results as below, which is very > strange and not nice. > > inode #279, reads 0, writes 1, avg read time 18446744073709551615, > avg write time 5251566408153596, temp 109 > > Does anyone know if there is a simpler but effective way to calculate > data temperature? inode 279, reads 0, writes 1, temp 109 Since we have no better way, and the avg read/write times are intermediate values, I want to show it in the above format. What do you think of it? > > -- > Regards, > > Zhi Yong Wu -- Regards, Zhi Yong Wu
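For reference, the avg read time in the original output, 18446744073709551615, is exactly 2^64 - 1, which is what -1 looks like when printed as an unsigned 64-bit value. Whether that is the actual cause here is an assumption; the thread does not say. The sketch below only shows the arithmetic and the shortened output format from this message; `as_u64` and `format_hot_entry` are hypothetical helper names.

```python
U64_MASK = (1 << 64) - 1


def as_u64(value):
    """Reinterpret an integer as an unsigned 64-bit value (wraps modulo 2**64)."""
    return value & U64_MASK


def format_hot_entry(ino, reads, writes, temp):
    """Render one entry in the shortened format, without the avg times."""
    return f"inode {ino}, reads {reads}, writes {writes}, temp {temp}"


print(as_u64(-1))                        # 18446744073709551615
print(format_hot_entry(279, 0, 1, 109))  # inode 279, reads 0, writes 1, temp 109
```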
* Re: VFS hot tracking: How to calculate data temperature? 2012-11-09 1:12 ` Zhi Yong Wu @ 2012-11-09 3:20 ` Zheng Liu [not found] ` <CAPkEcwg0ZHjV3JVxoKSzFqKLHavhGdTufLZBdBGQ6xXDMrSU-w@mail.gmail.com> 1 sibling, 0 replies; 29+ messages in thread From: Zheng Liu @ 2012-11-09 3:20 UTC (permalink / raw) To: Zhi Yong Wu Cc: linux-fsdevel, linuxram, Dave Chinner, cmm, Ben Chociej, Zheng Liu, Darrick J. Wong, Ram Pai, James Northrup, Steven Whitehouse On Fri, Nov 09, 2012 at 09:12:10AM +0800, Zhi Yong Wu wrote: > On Fri, Nov 2, 2012 at 12:04 PM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote: > > Hi, guys > > > > VFS hot tracking currently shows results as below, which is very > > strange and not nice. > > > > inode #279, reads 0, writes 1, avg read time 18446744073709551615, > > avg write time 5251566408153596, temp 109 > > > > Does anyone know if there is a simpler but effective way to calculate > > data temperature? > > inode 279, reads 0, writes 1, temp 109 > > Since we have no better way, and the avg read/write times are > intermediate values, I want to show it in the above format. What do you think of > it? Looks good to me. Regards, Zheng
[parent not found: <CAPkEcwg0ZHjV3JVxoKSzFqKLHavhGdTufLZBdBGQ6xXDMrSU-w@mail.gmail.com>]
* Re: VFS hot tracking: How to calculate data temperature? [not found] ` <CAPkEcwg0ZHjV3JVxoKSzFqKLHavhGdTufLZBdBGQ6xXDMrSU-w@mail.gmail.com> @ 2012-11-11 23:32 ` Zhi Yong Wu 0 siblings, 0 replies; 29+ messages in thread From: Zhi Yong Wu @ 2012-11-11 23:32 UTC (permalink / raw) To: james northrup Cc: linux-fsdevel@vger.kernel.org, linuxram, Dave Chinner, cmm, Ben Chociej, Zheng Liu, Darrick J. Wong, Ram Pai, Steven Whitehouse On Fri, Nov 9, 2012 at 11:05 PM, james northrup <northrup.james@gmail.com> wrote: > is 'temp 109' a bucket of inodes or is temp per-inode? Both. > > > On Thu, Nov 8, 2012 at 5:12 PM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote: >> >> On Fri, Nov 2, 2012 at 12:04 PM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote: >> > Hi, guys >> > >> > VFS hot tracking currently shows results as below, which is very >> > strange and not nice. >> > >> > inode #279, reads 0, writes 1, avg read time 18446744073709551615, >> > avg write time 5251566408153596, temp 109 >> > >> > Does anyone know if there is a simpler but effective way to calculate >> > data temperature? >> >> inode 279, reads 0, writes 1, temp 109 >> >> Since we have no better way, and the avg read/write times are >> intermediate values, I want to show it in the above format. What do you think of >> it? >> >> > >> > -- >> > Regards, >> > >> > Zhi Yong Wu >> >> >> >> -- >> Regards, >> >> Zhi Yong Wu > > -- Regards, Zhi Yong Wu