* Collecting aged Ext4 profiles
@ 2017-07-16 0:14 Saurabh Kadekodi
2017-07-17 9:56 ` Lukas Czerner
[not found] ` <D52E9B43-C169-4C71-BFDA-0100C75FEDC1@dilger.ca>
0 siblings, 2 replies; 5+ messages in thread
From: Saurabh Kadekodi @ 2017-07-16 0:14 UTC (permalink / raw)
To: linux-ext4
Hi,
I am a PhD student studying file and storage systems and I am currently conducting research on local file system aging. My research aims at understanding realistic aging patterns and analyzing the effects of aging on file system data structures and its performance. For this purpose, I would like to capture characteristics of naturally aged file systems (i.e. not aged via synthetic workload generators).
In order to facilitate this profile capture, I have written a shell / python based profiling tool (fsagestats - https://github.com/saurabhkadekodi/fsagestats) that does a file system tree walk and captures different characteristics (file age, file size and directory depth) of files and directories and produces distributions. I do not care about file names or data within each file. It also runs e2freefrag in order to understand the level of free space fragmentation, e4defrag in order to capture the fragmentation score, and copies a large file (~ 2GB) and runs filefrag in order to understand the file fragmentation, all of which are directly correlated with the file system performance. It dumps the results in the results dir, which is to be specified when you run fsagestats. You can send me the aging profile by tarring up the results directory and sending it via email.
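As an illustration only, and not the fsagestats code itself, the tree walk described above could be sketched in Python roughly as follows. The names `bucket` and `collect_profile`, the power-of-two bucketing, and the use of modification time as "age" are all assumptions made for this sketch. It records counts only, never file names or contents.

```python
import os
import time
from collections import Counter


def bucket(value):
    """Power-of-two bucket index (hypothetical helper): 0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3."""
    return int(value).bit_length()


def collect_profile(root, now=None):
    """Walk `root` and count files by size bucket, age bucket and directory depth.

    Assumption: "age" is measured from st_mtime, in whole days.
    """
    now = time.time() if now is None else now
    root = os.path.abspath(root)
    sizes, ages, depths = Counter(), Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files directly under `root` are at depth 0.
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            sizes[bucket(st.st_size)] += 1
            age_days = max(0, int((now - st.st_mtime) // 86400))
            ages[bucket(age_days)] += 1
            depths[depth] += 1
    return {"size": dict(sizes), "age_days": dict(ages), "depth": dict(depths)}
```

Calling `collect_profile("/mnt/data")` (a placeholder mount point) returns three histograms that can be written to a results directory. Dividing each depth count by the total number of files gives the fraction of files at each depth. A real collector would also need to stay on a single file system, for example by comparing `st_dev` against that of the root.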
Since I do not have access to Ext4 systems that see a lot of churn, I am reaching out to the Ext4 community in order to find volunteers willing to run my script and capture their Ext4 aging profile. Please feel free to modify the script as per your installation or as you see fit. Since fsagestats collects no private information, I eventually intend to host these profiles publicly (unless explicitly requested not to) to aid other researchers / enthusiasts.
In case you have any questions or concerns, please let me know.
Thanks,
Saurabh Kadekodi
PS: cc’ing the response and / or the aging profile to saukad@cs.cmu.edu is greatly appreciated.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Collecting aged Ext4 profiles
2017-07-16 0:14 Collecting aged Ext4 profiles Saurabh Kadekodi
@ 2017-07-17 9:56 ` Lukas Czerner
[not found] ` <D52E9B43-C169-4C71-BFDA-0100C75FEDC1@dilger.ca>
1 sibling, 0 replies; 5+ messages in thread
From: Lukas Czerner @ 2017-07-17 9:56 UTC (permalink / raw)
To: Saurabh Kadekodi; +Cc: linux-ext4, spetrovi
Hi,
my colleague Samuel Petrovic <spetrovi@redhat.com> has done some research in this area as well, so you may find it useful to compare notes. Adding him to cc.
Cheers,
-Lukas
On Sat, Jul 15, 2017 at 05:14:21PM -0700, Saurabh Kadekodi wrote:
> Hi,
>
> I am a PhD student studying file and storage systems and I am currently conducting research on local file system aging. My research aims at understanding realistic aging patterns and analyzing the effects of aging on file system data structures and its performance. For this purpose, I would like to capture characteristics of naturally aged file systems (i.e. not aged via synthetic workload generators).
>
> In order to facilitate this profile capture, I have written a shell / python based profiling tool (fsagestats - https://github.com/saurabhkadekodi/fsagestats) that does a file system tree walk and captures different characteristics (file age, file size and directory depth) of files and directories and produces distributions. I do not care about file names or data within each file. It also runs e2freefrag in order to understand the level of free space fragmentation, e4defrag in order to capture the fragmentation score, and copies a large file (~ 2GB) and runs filefrag in order to understand the file fragmentation, all of which are directly correlated with the file system performance. It dumps the results in the results dir, which is to be specified when you run fsagestats. You can send me the aging profile by tarring up the results directory and sending it via email.
>
> Since I do not have access to Ext4 systems that see a lot of churn, I am reaching out to the Ext4 community in order to find volunteers willing to run my script and capture their Ext4 aging profile. Please feel free to modify the script as per your installation or as you see fit. Since fsagestats collects no private information, I eventually intend to host these profiles publicly (unless explicitly requested not to) to aid other researchers / enthusiasts.
>
> In case you have any questions or concerns, please let me know.
>
> Thanks,
> Saurabh Kadekodi
>
> PS: cc’ing the response and / or the aging profile to saukad@cs.cmu.edu is greatly appreciated.
^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <D52E9B43-C169-4C71-BFDA-0100C75FEDC1@dilger.ca>]
* Re: Collecting aged Ext4 profiles
[not found] ` <D52E9B43-C169-4C71-BFDA-0100C75FEDC1@dilger.ca>
@ 2017-07-17 20:08 ` Andreas Dilger
2017-07-17 20:12 ` Saurabh Kadekodi
0 siblings, 1 reply; 5+ messages in thread
From: Andreas Dilger @ 2017-07-17 20:08 UTC (permalink / raw)
To: Saurabh Kadekodi; +Cc: linux-ext4
[-- Attachment #1: Type: text/plain, Size: 2807 bytes --]
On Jul 16, 2017, at 12:34 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> On Jul 15, 2017, at 18:14, Saurabh Kadekodi <saukad@cs.cmu.edu> wrote:
>
>> Hi,
>>
>> I am a PhD student studying file and storage systems and I am currently conducting research on local file system aging. My research aims at understanding realistic aging patterns and analyzing the effects of aging on file system data structures and its performance. For this purpose, I would like to capture characteristics of naturally aged file systems (i.e. not aged via synthetic workload generators).
>>
>> In order to facilitate this profile capture, I have written a shell / python based profiling tool (fsagestats - https://github.com/saurabhkadekodi/fsagestats) that does a file system tree walk and captures different characteristics (file age, file size and directory depth) of files and directories and produces distributions. I do not care about file names or data within each file. It also runs e2freefrag in order to understand the level of free space fragmentation, e4defrag in order to capture the fragmentation score, and copies a large file (~ 2GB) and runs filefrag in order to understand the file fragmentation, all of which are directly correlated with the file system performance. It dumps the results in the results dir, which is to be specified when you run fsagestats. You can send me the aging profile by tarring up the results directory and sending it via email.
>>
>> Since I do not have access to Ext4 systems that see a lot of churn, I am reaching out to the Ext4 community in order to find volunteers willing to run my script and capture their Ext4 aging profile. Please feel free to modify the script as per your installation or as you see fit. Since fsagestats collects no private information, I eventually intend to host these profiles publicly (unless explicitly requested not to) to aid other researchers / enthusiasts.
>>
>> In case you have any questions or concerns, please let me know.
>>
>> Thanks,
>> Saurabh Kadekodi
>>
>> PS: cc’ing the response and / or the aging profile to saukad@cs.cmu.edu is greatly appreciated.
>
> How does your fsagestats tool compare to the existing fsstats tool (http://web.cs.dal.ca/~morven/CSCI3120/fsstats)? If there isn't a significant difference between the two, it would be nice to stick with the existing tool to collect the filesystem information so that the body of data collected continues to grow.
Actually, a slightly better URL is https://github.com/adilger/fsstats which is a proper Git repo and includes the original license. The original project URL http://www.pdsi-scidac.org/fsstats/ is no longer functional. I also have a local archive of results from that project if you are interested.
Cheers, Andreas
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Collecting aged Ext4 profiles
2017-07-17 20:08 ` Andreas Dilger
@ 2017-07-17 20:12 ` Saurabh Kadekodi
2017-07-23 1:40 ` Saurabh Kadekodi
0 siblings, 1 reply; 5+ messages in thread
From: Saurabh Kadekodi @ 2017-07-17 20:12 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4
Thanks Andreas. Yes, it would be great if you could share the archive. I will go through fsstats and check the exact difference. In case it captures what I need to, I agree that using fsstats would be more apt.
Thanks,
Saurabh
> On Jul 17, 2017, at 1:08 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>
> On Jul 16, 2017, at 12:34 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>> On Jul 15, 2017, at 18:14, Saurabh Kadekodi <saukad@cs.cmu.edu> wrote:
>>
>>> Hi,
>>>
>>> I am a PhD student studying file and storage systems and I am currently conducting research on local file system aging. My research aims at understanding realistic aging patterns and analyzing the effects of aging on file system data structures and its performance. For this purpose, I would like to capture characteristics of naturally aged file systems (i.e. not aged via synthetic workload generators).
>>>
>>> In order to facilitate this profile capture, I have written a shell / python based profiling tool (fsagestats - https://github.com/saurabhkadekodi/fsagestats) that does a file system tree walk and captures different characteristics (file age, file size and directory depth) of files and directories and produces distributions. I do not care about file names or data within each file. It also runs e2freefrag in order to understand the level of free space fragmentation, e4defrag in order to capture the fragmentation score, and copies a large file (~ 2GB) and runs filefrag in order to understand the file fragmentation, all of which are directly correlated with the file system performance. It dumps the results in the results dir, which is to be specified when you run fsagestats. You can send me the aging profile by tarring up the results directory and sending it via email.
>>>
>>> Since I do not have access to Ext4 systems that see a lot of churn, I am reaching out to the Ext4 community in order to find volunteers willing to run my script and capture their Ext4 aging profile. Please feel free to modify the script as per your installation or as you see fit. Since fsagestats collects no private information, I eventually intend to host these profiles publicly (unless explicitly requested not to) to aid other researchers / enthusiasts.
>>>
>>> In case you have any questions or concerns, please let me know.
>>>
>>> Thanks,
>>> Saurabh Kadekodi
>>>
>>> PS: cc’ing the response and / or the aging profile to saukad@cs.cmu.edu is greatly appreciated.
>>
>> How does your fsagestats tool compare to the existing fsstats tool (http://web.cs.dal.ca/~morven/CSCI3120/fsstats)? If there isn't a significant difference between the two, it would be nice to stick with the existing tool to collect the filesystem information so that the body of data collected continues to grow.
>
> Actually, a slightly better URL is https://github.com/adilger/fsstats which is a
> proper Git repo and includes the original license. The original project URL
> http://www.pdsi-scidac.org/fsstats/ is no longer functional. I also have a local
> archive of results from that project if you are interested.
>
> Cheers, Andreas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Collecting aged Ext4 profiles
2017-07-17 20:12 ` Saurabh Kadekodi
@ 2017-07-23 1:40 ` Saurabh Kadekodi
0 siblings, 0 replies; 5+ messages in thread
From: Saurabh Kadekodi @ 2017-07-23 1:40 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4
fsstats captures most of what I want (age and size distributions). It does not capture the directory depth distribution (i.e. what fraction of the files sit at each depth in the fs hierarchy), which can be important in an aging study because Ext4 chooses to split high-level directories across different block groups, resulting in some fragmentation. fsstats also does not capture free space fragmentation and the fragmentation score, both of which are important for my study. If fsstats is more convenient, it would be great if the following commands could also be run in order to capture the fragmentation:
1. e2freefrag ext4_dev
2. e4defrag -c mount_point
Thanks,
Saurabh
> On Jul 17, 2017, at 1:12 PM, Saurabh Kadekodi <saukad@cs.cmu.edu> wrote:
>
> Thanks Andreas. Yes, it would be great if you could share the archive. I will go through fsstats and check the exact difference. In case it captures what I need to, I agree that using fsstats would be more apt.
>
> Thanks,
> Saurabh
>
>> On Jul 17, 2017, at 1:08 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>
>> On Jul 16, 2017, at 12:34 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>>> On Jul 15, 2017, at 18:14, Saurabh Kadekodi <saukad@cs.cmu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am a PhD student studying file and storage systems and I am currently conducting research on local file system aging. My research aims at understanding realistic aging patterns and analyzing the effects of aging on file system data structures and its performance. For this purpose, I would like to capture characteristics of naturally aged file systems (i.e. not aged via synthetic workload generators).
>>>>
>>>> In order to facilitate this profile capture, I have written a shell / python based profiling tool (fsagestats - https://github.com/saurabhkadekodi/fsagestats) that does a file system tree walk and captures different characteristics (file age, file size and directory depth) of files and directories and produces distributions. I do not care about file names or data within each file. It also runs e2freefrag in order to understand the level of free space fragmentation, e4defrag in order to capture the fragmentation score, and copies a large file (~ 2GB) and runs filefrag in order to understand the file fragmentation, all of which are directly correlated with the file system performance. It dumps the results in the results dir, which is to be specified when you run fsagestats. You can send me the aging profile by tarring up the results directory and sending it via email.
>>>>
>>>> Since I do not have access to Ext4 systems that see a lot of churn, I am reaching out to the Ext4 community in order to find volunteers willing to run my script and capture their Ext4 aging profile. Please feel free to modify the script as per your installation or as you see fit. Since fsagestats collects no private information, I eventually intend to host these profiles publicly (unless explicitly requested not to) to aid other researchers / enthusiasts.
>>>>
>>>> In case you have any questions or concerns, please let me know.
>>>>
>>>> Thanks,
>>>> Saurabh Kadekodi
>>>>
>>>> PS: cc’ing the response and / or the aging profile to saukad@cs.cmu.edu is greatly appreciated.
>>>
>>> How does your fsagestats tool compare to the existing fsstats tool (http://web.cs.dal.ca/~morven/CSCI3120/fsstats)? If there isn't a significant difference between the two, it would be nice to stick with the existing tool to collect the filesystem information so that the body of data collected continues to grow.
>>
>> Actually, a slightly better URL is https://github.com/adilger/fsstats which is a
>> proper Git repo and includes the original license. The original project URL
>> http://www.pdsi-scidac.org/fsstats/ is no longer functional. I also have a local
>> archive of results from that project if you are interested.
>>
>> Cheers, Andreas
^ permalink raw reply [flat|nested] 5+ messages in thread
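For anyone who would rather script the two fragmentation commands requested in the last message alongside an fsstats run, a minimal Python wrapper might look like the sketch below. The function names, the output file names and the results-directory layout are assumptions made for illustration and are not part of fsagestats or fsstats. Only `e2freefrag <device>` and `e4defrag -c <mount_point>` come from the thread.

```python
import subprocess
from pathlib import Path


def fragmentation_commands(device, mount_point):
    """Map a (hypothetical) output file name to the argv of each report-only command."""
    return {
        "e2freefrag.txt": ["e2freefrag", device],             # free space fragmentation
        "e4defrag_score.txt": ["e4defrag", "-c", mount_point],  # fragmentation score only
    }


def capture(device, mount_point, results_dir):
    """Run both commands and save their combined stdout/stderr under `results_dir`.

    Both tools usually need root and must be installed (they ship with e2fsprogs).
    """
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, argv in fragmentation_commands(device, mount_point).items():
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=False,  # keep whatever output there is, even on a non-zero exit
        )
        (out / filename).write_text(proc.stdout)
```

A call such as `capture("/dev/sdX1", "/mnt/data", "results")` (placeholder device and mount point) leaves two text files in `results`, which can then be tarred up together with the fsstats output.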