From: Dave Chinner <david@fromorbit.com>
To: Stefan Ring <stefanrin@gmail.com>
Cc: Saurabh Kadekodi <saukad@cs.cmu.edu>, linux-xfs@vger.kernel.org
Subject: Re: Collecting aged XFS profiles
Date: Tue, 18 Jul 2017 09:48:15 +1000
Message-ID: <20170717234814.GK17762@dastard>
In-Reply-To: <CAAxjCEzevLnmxyVRME1-kT867KjH2rtOdspv7nF6TOfLUDD+xA@mail.gmail.com>
On Mon, Jul 17, 2017 at 09:00:19PM +0200, Stefan Ring wrote:
> On Sun, Jul 16, 2017 at 2:11 AM, Saurabh Kadekodi
> <saukad@cs.cmu.edu> wrote:
> > Hi,
> >
> > I am a PhD student studying file and storage systems and I am
> > currently conducting research on local file system aging. My
> > research aims at understanding realistic aging patterns and
> > analyzing the effects of aging on file system data structures
> > and its performance. For this purpose, I would like to capture
> > characteristics of naturally aged file systems (i.e. not aged
> > via synthetic workload generators).

Hi Saurabh - it's a great idea to do this, but I suspect you might
want to spend some more time learning about the mechanisms
and policies XFS uses to prevent aging and maintain performance. I'm
suggesting this because knowing what the filesystem is trying to do
will drastically change your idea of what information needs to be
gathered....

> > In order to facilitate this profile capture, I have written a
> > shell / python based profiling tool (fsagestats -
> > https://github.com/saurabhkadekodi/fsagestats) that does a file
> > system tree walk and captures different characteristics (file
> > age, file size and directory depth) of files and directories and
> > produces distributions. I do not care about file names or data
> > within each file. It also runs xfs_db in order to capture the
> > free space fragmentation, file fragmentation, directory
> > fragmentation and overall fragmentation; all of which are
> > directly correlated with the file system performance. It dumps
> > the results in the results dir, which is to be specified when
> > you run fsagestats. You can send me the aging profile by tarring
> > up the results directory and sending it via email.
> >
> > Since I do not have access to XFS systems that see a lot of
> > churn, I am reaching out to the XFS community in order to find
> > volunteers willing to run my script and capture their XFS aging
> > profile. Please feel free to modify the script as per your
> > installation or as you see fit. Since fsagestats collects no
> > private information, I eventually intend to host these profiles
> > publicly (unless explicitly requested not to) to aid other
> > researchers / enthusiasts.
> >
> > In case you have any questions or concerns, please let me know.
>
> I have a nicely aged filesystem (1 TB) on our dev server with around
> 10 million files on it. I will not run a script that executes two
> xfs_io calls *for each file* on it. Why don't you just use Python's
> os.stat to get at the ctime and the size?

Ok, had a look at the script. You can replace most of it with
pretty much one line.

$ find <dir> -exec stat -c "%n %Z %s" {} \;

Processing the dirents to get the "distribution stats" could be done
by piping the output into a five line awk script. I'll leave that
as an exercise for the reader.
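
A minimal sketch of that exercise, written in Python rather than awk.
It assumes input lines in the "%n %Z %s" form printed by the stat
command above (name, ctime in seconds since the epoch, size in bytes).
The power-of-two size buckets, the 30-day age buckets, the function
names and the sample paths are illustrative assumptions, not anything
fsagestats itself does.

```python
import sys
import time
from collections import Counter

SECONDS_PER_DAY = 86400

def parse_line(line):
    """Split one 'name ctime size' line; the name may itself contain spaces."""
    name, ctime, size = line.rstrip("\n").rsplit(" ", 2)
    return name, int(ctime), int(size)

def distributions(lines, now):
    """Return (size histogram, age histogram) as Counters keyed by bucket."""
    sizes, ages = Counter(), Counter()
    for line in lines:
        try:
            _, ctime, size = parse_line(line)
        except ValueError:
            continue  # skip lines that do not parse, e.g. names with newlines
        # Bucket b holds sizes with 2^(b-1) <= size < 2^b; bucket 0 is empty files.
        sizes[size.bit_length()] += 1
        # Age since last inode change, in buckets of roughly 30 days.
        ages[max(0, now - ctime) // SECONDS_PER_DAY // 30] += 1
    return sizes, ages

def report(lines, now=None, out=sys.stdout):
    """Print both histograms, smallest bucket first."""
    now = int(time.time()) if now is None else now
    sizes, ages = distributions(lines, now)
    for bucket in sorted(sizes):
        print(f"size < 2^{bucket} bytes: {sizes[bucket]} entries", file=out)
    for month in sorted(ages):
        print(f"ctime age ~{month} months: {ages[month]} entries", file=out)

# Hypothetical sample input; on a real system pass sys.stdin to report().
sample = ["/srv/a file.txt 1500000000 4096", "/srv/empty 1500250000 0"]
report(sample, now=1500336000)
```

To use it on real data, replace the sample call with report(sys.stdin)
and pipe the find output into the script.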

IMO, the script is not gathering anything particularly useful about
how the filesystem has aged. The information being gathered doesn't
tell us anything useful about how the allocator is performing for
the given workload, nor does it provide insight into the locality
characteristics and fragmentation of related files and directories
which directly influence IO (and hence filesystem) performance.

e.g. if the inode64 allocator is in use, then all the files in a
directory should be in the same physical region. As such, a key sign
of an aged filesystem is that the allocator is not able to maintain
the desired locality relationships between files.
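
One rough way to put a number on that locality is sketched below, in
Python. It relies on the fact that an XFS inode number carries its
allocation group (AG) number in the high bits, above agblklog +
inopblog low bits, both of which are superblock fields (xfs_db can
show them). The function names, the (directory inode, child inode)
input format and the sample geometry are illustrative assumptions.
Inode placement is also only a proxy: it says where the inodes live,
not where the data extents ended up.

```python
from collections import defaultdict

def ino_to_agno(ino, agblklog, inopblog):
    """Allocation group number encoded in the high bits of an XFS inode number."""
    return ino >> (agblklog + inopblog)

def same_ag_fraction(entries, agblklog, inopblog):
    """entries: iterable of (dir_ino, child_ino) pairs for regular files.

    Returns {dir_ino: fraction of children whose inode is in the same AG as
    the directory inode}. Subdirectories should be left out, since inode64
    deliberately spreads new directories across AGs.
    """
    total = defaultdict(int)
    local = defaultdict(int)
    for dir_ino, child_ino in entries:
        total[dir_ino] += 1
        dir_ag = ino_to_agno(dir_ino, agblklog, inopblog)
        if ino_to_agno(child_ino, agblklog, inopblog) == dir_ag:
            local[dir_ino] += 1
    return {d: local[d] / total[d] for d in total}

# Hypothetical geometry and inode numbers, for illustration only.
AGBLKLOG, INOPBLOG = 16, 3
pairs = [(128, 131), (128, 132), (128, (1 << 19) + 200)]
for dir_ino, frac in same_ag_fraction(pairs, AGBLKLOG, INOPBLOG).items():
    print(f"dir inode {dir_ino}: {frac:.2f} of files in the directory's AG")
```

A fraction well below 1.0 for many directories would suggest the
allocator could no longer keep files next to their parent directory.
The pairs themselves could come from a tree walk that records st_ino
for each directory and its entries.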

To analyse such things, maybe consider gathering obfuscated metadump
images rather than asking people to run scripts that gather limited
information. That way you can develop scripts to extract the
information your research requires from the filesystem images you
receive, rather than try to draw tenuous conclusions from a limited
data set...

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com