From: Ram Pai <linuxram@us.ibm.com>
To: Zhi Yong Wu <zwu.kernel@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linuxram@linux.vnet.ibm.com,
Dave Chinner <david@fromorbit.com>,
cmm@us.ibm.com, Ben Chociej <bchociej@gmail.com>
Subject: Re: VFS hot tracking: How to calculate data temperature?
Date: Fri, 2 Nov 2012 12:43:12 +0800
Message-ID: <20121102044312.GY4555@ram-ThinkPad-T61>
In-Reply-To: <CAEH94LjTedn0n00ZtqFxBL=kDA3u8G39q+x79-vPoMYvEKW1gw@mail.gmail.com>
On Fri, Nov 02, 2012 at 12:04:23PM +0800, Zhi Yong Wu wrote:
> Hi, guys
>
> VFS hot tracking currently shows results like the one below, which look
> very strange and not nice.
>
> inode #279, reads 0, writes 1, avg read time 18446744073709551615,
> avg write time 5251566408153596, temp 109
>
> Does anyone know of a simpler but effective way to calculate
> data temperature?

Intuitively, data gets hot when it is accessed frequently, and cools down as
the frequency of its access decreases.

It is like heating water: the longer you heat it, the hotter it gets, and the
more intense the flame, the faster it heats. So temperature is a function of
both intensity and continuity. In the case of data, intensity maps to the
access frequency within a discrete interval, and continuity maps to sustaining
that access frequency across subsequent discrete intervals. Any reduction in
intensity cools the data down, and so does any break in continuity.

We need to define the size of a 'discrete interval', keep collecting the
access frequency in each discrete interval, and take a weighted average
across all these discrete intervals. That should give us the temperature of
the data.

Here is an example to make my thoughts clear.

Let's say we define the discrete interval as 1 sec. If a given piece of data
is accessed 100 times in this 1 sec, then the temperature of the data at the
end of the 1 sec becomes (0+100)/2=50, where 0 is the temperature of the data
at the beginning of the 1 sec. In the next 1 sec, if the data is accessed 100
times again, then the temperature of the data at the end of the 2nd sec
becomes (50+100)/2=75, and so on and so forth. If during any interval the data
is not accessed at all, the temperature goes down by half. This way of
calculation biases the temperature towards the most recent access. In other
words, a high amount of recent access raises the temperature of the data
significantly more than the same high amount of non-recent access.
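
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration of the halving rule, not code from the hot tracking patches;
the name step() is made up for the example, and floating point is used purely
for readability.

```python
def step(temperature: float, accesses: int) -> float:
    """Advance one discrete interval: average the old temperature
    with the number of accesses seen during that interval."""
    return (temperature + accesses) / 2


# Reproduce the example: 100 accesses, 100 accesses, then an idle interval.
t = 0.0
history = []
for accesses in (100, 100, 0):
    t = step(t, accesses)
    history.append(t)

print(history)  # [50.0, 75.0, 37.5]

# Recency bias: the same burst counts for more when it is the latest interval.
old_burst = step(step(0.0, 100), 0)     # burst, then idle
recent_burst = step(step(0.0, 0), 100)  # idle, then burst
print(old_burst, recent_burst)  # 25.0 50.0
```

Each interval halves the weight of everything that came before it, which is
where the bias towards recent access comes from.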

This technique will require us to capture *only* two pieces of information:
the access frequency and the latest temperature of the data.

A crude approach is to schedule a kernel daemon approximately every second to
update the latest temperature using the access frequency since the last time
the temperature got updated. But this approach is not scalable. The other
approach is to decouple the frequency of temperature calculation from the size
of the discrete interval. The kernel daemon thread can take its own sweet time
to calculate the temperature. Whenever the daemon gets to calculate the new
temperature of a given inode, it can spread the accesses recorded since the
last temperature calculation evenly across all the discrete intervals that
have elapsed since then.

The formula for the calculation will be (Ty+((2^y)-1)x)/(y*(2^y)), where 'x'
is the number of accesses in the last 'y' discrete intervals and 'T' is the
previous temperature. This is what 'y' successive halving steps with 'x/y'
accesses each add up to; for y=1 it reduces to (T+x)/2. Of course, I am
approximating that there were exactly 'x/y' accesses in each of these 'y'
discrete intervals.
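
As a sanity check on the closed form: applying the per-interval rule
T -> (T+a)/2 for 'y' intervals with a = x/y accesses each gives
T/2^y + (x/y)(1 - 1/2^y), which is (Ty+((2^y)-1)x)/(y*(2^y)). The Python
sketch below compares that expression against the step-by-step iteration.
The function names are hypothetical and floating point is used only to keep
the example short.

```python
def catch_up(temperature: float, accesses: int, intervals: int) -> float:
    """Closed form: temperature after `intervals` halving steps, assuming
    the accesses were spread evenly (accesses / intervals per step)."""
    if intervals <= 0:
        return temperature
    decay = 2.0 ** -intervals
    return temperature * decay + (accesses / intervals) * (1.0 - decay)


def catch_up_iterative(temperature: float, accesses: int, intervals: int) -> float:
    """Reference version: apply the one-interval rule `intervals` times."""
    per_interval = accesses / intervals
    for _ in range(intervals):
        temperature = (temperature + per_interval) / 2
    return temperature


print(catch_up(0.0, 200, 2))  # 75.0, same as two intervals of 100 accesses
print(catch_up(75.0, 0, 3))   # 9.375, three idle intervals halve it three times
```

The closed form lets the daemon do one update per inode however many
intervals have gone by, instead of one update per interval.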

This way of calculation will require one more field to be added to the data
structure: the timestamp of the last temperature update.
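
To make the bookkeeping concrete, here is a rough Python sketch of the three
fields and a lazy update driven by the timestamp. Everything in it is an
assumption made for illustration (the names HotItem, record_access, refresh,
and INTERVAL_SECONDS, and the use of floating point); an in-kernel version
would presumably use integer arithmetic and the kernel's own time source.

```python
from dataclasses import dataclass

INTERVAL_SECONDS = 1.0  # assumed size of one discrete interval


@dataclass
class HotItem:
    temperature: float = 0.0
    accesses: int = 0         # accesses recorded since last_update
    last_update: float = 0.0  # timestamp of the last temperature update

    def record_access(self) -> None:
        self.accesses += 1

    def refresh(self, now: float) -> float:
        """Fold the accesses since last_update into the temperature,
        spreading them evenly over the whole intervals that have elapsed."""
        intervals = int((now - self.last_update) // INTERVAL_SECONDS)
        if intervals < 1:
            return self.temperature  # less than one full interval has passed
        decay = 2.0 ** -intervals
        self.temperature = (self.temperature * decay
                            + (self.accesses / intervals) * (1.0 - decay))
        self.accesses = 0
        self.last_update += intervals * INTERVAL_SECONDS
        return self.temperature


item = HotItem()
for _ in range(200):
    item.record_access()
print(item.refresh(now=2.0))  # 75.0: 200 accesses spread over 2 intervals
print(item.refresh(now=3.0))  # 37.5: one idle interval halves the temperature
```

Advancing last_update by whole intervals only keeps any partial interval, and
the accesses counted during it, for the next refresh.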
RP