All of lore.kernel.org
 help / color / mirror / Atom feed
From: zwu.kernel@gmail.com
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, viro@zeniv.linux.org.uk,
	david@fromorbit.com, dave@jikos.cz, darrick.wong@oracle.com,
	andi@firstfloor.org, hch@infradead.org,
	linuxram@linux.vnet.ibm.com, zwu.kernel@gmail.com,
	wuzhy@linux.vnet.ibm.com
Subject: [PATCH RESEND v1 16/16] vfs: add documentation
Date: Thu, 20 Dec 2012 22:43:35 +0800	[thread overview]
Message-ID: <1356014615-15073-17-git-send-email-zwu.kernel@gmail.com> (raw)
In-Reply-To: <1356014615-15073-1-git-send-email-zwu.kernel@gmail.com>

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

  Add one doc for VFS hot tracking feature

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 Documentation/filesystems/00-INDEX         |    2 +
 Documentation/filesystems/hot_tracking.txt |  255 ++++++++++++++++++++++++++++
 2 files changed, 257 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/hot_tracking.txt

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 7b52ba7..b973fd8 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -120,3 +120,5 @@ xfs.txt
 	- info and mount options for the XFS filesystem.
 xip.txt
 	- info on execute-in-place for file mappings.
+hot_tracking.txt
+	- info on hot data tracking in VFS layer
diff --git a/Documentation/filesystems/hot_tracking.txt b/Documentation/filesystems/hot_tracking.txt
new file mode 100644
index 0000000..cd6a592
--- /dev/null
+++ b/Documentation/filesystems/hot_tracking.txt
@@ -0,0 +1,255 @@
+Hot Data Tracking
+
+September, 2012		Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
+
+CONTENTS
+
+1. Introduction
+2. Motivation
+3. The Design
+4. How to Calc Frequency of Reads/Writes & Temperature
+5. Git Development Tree
+6. Usage Example
+
+
+1. Introduction
+
+  The feature adds experimental support for tracking data temperature
+information in VFS layer.  Essentially, this means maintaining some key
+stats(like number of reads/writes, last read/write time, frequency of
+reads/writes), then distilling those numbers down to a single
+"temperature" value that reflects what data is "hot," and using that
+temperature to move data to SSDs.
+
+  The long-term goal of the feature is to allow some FSs,
+e.g. Btrfs to intelligently utilize SSDs in a heterogenous volume.
+Incidentally, this project has been motivated by
+the Project Ideas page on the Btrfs wiki.
+
+  Of course, users are warned not to run this code outside of development
+environments. These patches are EXPERIMENTAL, and as such they might eat
+your data and/or memory. That said, the code should be relatively safe
+when the hottrack mount option are disabled.
+
+
+2. Motivation
+
+  The overall goal of enabling hot data relocation to SSD has been
+motivated by the Project Ideas page on the Btrfs wiki at
+<https://btrfs.wiki.kernel.org/index.php/Project_ideas>.
+It will divide into two steps. VFS provide hot data tracking function
+while specific FS will provide hot data relocation function.
+So as the first step of this goal, it is hoped that the patchset
+for hot data tracking will eventually mature into VFS.
+
+  This is essentially the traditional cache argument: SSD is fast and
+expensive; HDD is cheap but slow. ZFS, for example, can already take
+advantage of SSD caching. Btrfs should also be able to take advantage of
+hybrid storage without many broad, sweeping changes to existing code.
+
+
+3. The Design
+
+These include the following parts:
+
+    * Hooks in existing vfs functions to track data access frequency
+
+    * New radix-trees for tracking access frequency of inodes and sub-file
+ranges
+    The relationship between super_block and radix-tree is as below:
+hot_info.hot_inode_tree
+    Each FS instance can find hot tracking info s_hotinfo.
+In this hot_info, it store a lot of hot tracking info such as hot_inode_tree,
+inode and range list, etc.
+
+    * A list for indexing data by its temperature
+
+    * A debugfs interface for dumping data from the radix-trees
+
+    * A background kthread for updating inode heat info
+
+    * Mount options for enabling temperature tracking(-o hot_track,
+default mean disabled)
+    * An ioctl to retrieve the frequency information collected for a certain
+file
+    * Ioctls to enable/disable frequency tracking per inode.
+
+Let us see their relationship as below:
+
+    * hot_info.hot_inode_tree indexes hot_inode_items, one per inode
+
+    * hot_inode_item contains access frequency data for that inode
+
+    * hot_inode_item holds a heat list node to index the access
+frequency data for that inode
+
+    * hot_inode_item.hot_range_tree indexes hot_range_items for that inode
+
+    * hot_range_item contains access frequency data for that range
+
+    * hot_range_item holds a heat list node to index the access
+frequency data for that range
+
+    * hot_info.heat_inode_map indexes per-inode heat list nodes
+
+    * hot_info.heat_range_map indexes per-range heat list nodes
+
+  How about some ascii art? :) Just looking at the hot inode item case
+(the range item case is the same pattern, though), we have:
+
+heat_inode_map           hot_inode_tree
+    |                         |
+    |                         V
+    |           +-------hot_comm_item--------+
+    |           |       frequency data       |
++---+           |        list_head           |
+|               V            ^ |             V
+| ...<--hot_comm_item-->...  | |  ...<--hot_comm_item-->...
+|       frequency data       | |        frequency data
++-------->list_head----------+ +--------->list_head--->.....
+       hot_range_tree                  hot_range_tree
+                                             |
+             heat_range_map                  V
+                   |           +-------hot_comm_item--------+
+                   |           |       frequency data       |
+               +---+           |        list_head           |
+               |               V            ^ |             V
+               | ...<--hot_comm_item-->...  | |  ...<--hot_comm_item-->...
+               |       frequency data       | |        frequency data
+               +-------->list_head----------+ +--------->list_head--->.....
+
+
+4. How to Calc Frequency of Reads/Writes & Temperature
+
+1.) hot_rw_freq_calc()
+
+  This function does the actual work of updating the frequency numbers,
+whatever they turn out to be. FREQ_POWER determines how many atime
+deltas we keep track of (as a power of 2). So, setting it to anything above
+16ish is probably overkill. Also, the higher the power, the more bits get
+right shifted out of the timestamp, reducing precision, so take note of that
+as well.
+
+  The caller should have already locked freq_data's parent's spinlock.
+
+  FREQ_POWER, defined immediately below, determines how heavily to weight
+the current frequency numbers against the newest access. For example, a value
+of 4 means that the new access information will be weighted 1/16th (ie 2^-4)
+as heavily as the existing frequency info. In essence, this is a kludged-
+together version of a weighted average, since we can't afford to keep all of
+the information that it would take to get a _real_ weighted average.
+
+2.) Some Micro explaination
+
+  The following comments explain what exactly comprises a unit of heat.
+Each of six values of heat are calculated and combined in order to form an
+overall temperature for the data:
+
+    * NRR - number of reads since mount
+    * NRW - number of writes since mount
+    * LTR - time elapsed since last read (ns)
+    * LTW - time elapsed since last write (ns)
+    * AVR - average delta between recent reads (ns)
+    * AVW - average delta between recent writes (ns)
+
+  These values are divided (right-shifted) according to the *_DIVIDER_POWER
+values defined below to bring the numbers into a reasonable range. You can
+modify these values to fit your needs. However, each heat unit is a u32 and
+thus maxes out at 2^32 - 1. Therefore, you must choose your dividers quite
+carefully or else they could max out or be stuck at zero quite easily.
+(E.g., if you chose AVR_DIVIDER_POWER = 0, nothing less than 4s of atime
+delta would bring the temperature above zero, ever.)
+
+  Finally, each value is added to the overall temperature between 0 and 8
+times, depending on its *_COEFF_POWER value. Note that the coefficients are
+also actually implemented with shifts, so take care to treat these values
+as powers of 2. (I.e., 0 means we'll add it to the temp once; 1 = 2x, etc.)
+
+    * AVR/AVW cold unit = 2^X ns of average delta
+    * AVR/AVW heat unit = HEAT_MAX_VALUE - cold unit
+
+  E.g., data with an average delta between 0 and 2^X ns will have a cold
+value of 0, which means a heat value equal to HEAT_MAX_VALUE.
+
+3.) hot_temp_calc()
+
+  This function is responsible for distilling the six heat
+criteria, which are described in detail in hot_tracking.h) down into a single
+temperature value for the data, which is an integer between 0
+and HEAT_MAX_VALUE.
+
+  To accomplish this, the raw values from the hot_freq_data structure
+are shifted various ways in order to make the temperature calculation more
+or less sensitive to each value.
+
+  Once this calibration has happened, we do some additional normalization and
+make sure that everything fits nicely in a u32. From there, we take a very
+rudimentary kind of "average" of each of the values, where the *_COEFF_POWER
+values act as weights for the average.
+
+  Finally, we use the HEAT_HASH_BITS value, which determines the size of the
+heat list array, to normalize the temperature to the proper granularity.
+
+
+5. Git Development Tree
+
+  This feature is still on development and review, so if you're interested,
+you can pull from the git repository at the following location:
+
+  https://github.com/wuzhy/kernel.git hot_tracking
+  git://github.com/wuzhy/kernel.git hot_tracking
+
+
+6. Usage Example
+
+1.) To use hot tracking, you should mount like this:
+
+$ mount -o hot_track /dev/sdb /mnt
+[ 1505.894078] device label test devid 1 transid 29 /dev/sdb
+[ 1505.952977] btrfs: disk space caching is enabled
+[ 1506.069678] vfs: turning on hot data tracking
+
+2.) Mount debugfs at first:
+
+$ mount -t debugfs none /sys/kernel/debug
+$ ls -l /sys/kernel/debug/hot_track/
+total 0
+drwxr-xr-x 2 root root 0 Aug  8 04:40 sdb
+$ ls -l /sys/kernel/debug/hot_track/sdb
+total 0
+-rw-r--r-- 1 root root 0 Aug  8 04:40 rt_stats_inode
+-rw-r--r-- 1 root root 0 Aug  8 04:40 rt_stats_range
+
+3.) View information about hot tracking from debugfs:
+
+$ echo "hot tracking test" > /mnt/file
+$ cat /sys/kernel/debug/hot_track/sdb/rt_stats_inode
+inode #279, reads 0, writes 1, temp 109
+$ cat /sys/kernel/debug/hot_track/sdb/range_data
+inode #279, range 0+1048576, reads 0, writes 1, temp 64
+
+$ echo "hot data tracking test" >> /mnt/file
+$ cat /sys/kernel/debug/hot_track/sdb/rt_stats_inode
+inode #279, reads 0, writes 2, temp 109
+$ cat /sys/kernel/debug/hot_track/sdb/range_data
+inode #279, range 0+1048576 reads 0, writes 2, temp 64
+
+4.) Check temp sorting result of some nodes:
+
+$ cat /sys/kernel/debug/hot_track/loop0/hot_spots_inode
+inode #5248773, reads 0, writes 244, temp 111
+inode #878523, reads 0, writes 1, temp 109
+inode #878524, reads 0, writes 1, temp 109
+
+5.) Tune some hot tracking parameters as below:
+
+$ cat /proc/sys/fs/hot-kick-time
+300
+$ echo 360 > /proc/sys/fs/hot-kick-time
+$ cat /proc/sys/fs/hot-kick-time
+360
+$ cat /proc/sys/fs/hot-update-delay
+300
+$ echo 360 > /proc/sys/fs/hot-update-delay
+$ cat /proc/sys/fs/hot-update-delay
+360
-- 
1.7.6.5

  parent reply	other threads:[~2012-12-20 14:43 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-20 14:43 [PATCH RESEND v1 00/16] vfs: hot data tracking zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 01/16] vfs: introduce some data structures zwu.kernel
2013-01-10  0:48   ` David Sterba
2013-01-10  6:24     ` Zhi Yong Wu
2012-12-20 14:43 ` [PATCH RESEND v1 02/16] vfs: add init and cleanup functions zwu.kernel
2013-01-10  0:48   ` David Sterba
2013-01-11  7:21     ` Zhi Yong Wu
2012-12-20 14:43 ` [PATCH RESEND v1 03/16] vfs: add I/O frequency update function zwu.kernel
2013-01-10  0:51   ` David Sterba
2013-01-11  7:38     ` Zhi Yong Wu
2013-01-11 14:27       ` David Sterba
2013-01-11 14:54         ` Zhi Yong Wu
2012-12-20 14:43 ` [PATCH RESEND v1 04/16] vfs: add two map arrays zwu.kernel
2013-01-10  0:51   ` David Sterba
2012-12-20 14:43 ` [PATCH RESEND v1 05/16] vfs: add hooks to enable hot tracking zwu.kernel
2013-01-10  0:52   ` David Sterba
2013-01-11  7:47     ` Zhi Yong Wu
2012-12-20 14:43 ` [PATCH RESEND v1 06/16] vfs: add temp calculation function zwu.kernel
2013-01-10  0:53   ` David Sterba
2013-01-11  8:08     ` Zhi Yong Wu
2012-12-20 14:43 ` [PATCH RESEND v1 07/16] vfs: add map info update function zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 08/16] vfs: add aging function zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 09/16] vfs: add one work queue zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 10/16] vfs: add FS hot type support zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 11/16] vfs: register one shrinker zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 12/16] vfs: add one ioctl interface zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 13/16] vfs: add debugfs support zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 14/16] proc: add two hot_track proc files zwu.kernel
2012-12-20 14:43 ` [PATCH RESEND v1 15/16] btrfs: add hot tracking support zwu.kernel
2012-12-20 14:43 ` zwu.kernel [this message]
2012-12-20 14:55 ` [PATCH RESEND v1 00/16] vfs: hot data tracking Zhi Yong Wu
2013-01-07 13:49 ` Zhi Yong Wu
2013-01-08  7:52 ` Zhi Yong Wu
2013-02-22  0:32 ` Zhi Yong Wu
2013-02-25 11:40   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1356014615-15073-17-git-send-email-zwu.kernel@gmail.com \
    --to=zwu.kernel@gmail.com \
    --cc=andi@firstfloor.org \
    --cc=darrick.wong@oracle.com \
    --cc=dave@jikos.cz \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxram@linux.vnet.ibm.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wuzhy@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.