From: Ric Wheeler <ricwheeler@gmail.com>
To: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Cc: linux-mm@kvack.org,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Mel Gorman <mgorman@suse.de>, Andreas Dilger <adilger@dilger.ca>,
sage@inktank.com
Subject: Re: Linux Plumbers IO & File System Micro-conference
Date: Fri, 19 Jul 2013 15:57:37 -0400
Message-ID: <51E99A31.2070208@gmail.com>
In-Reply-To: <51E998E0.10207@itwm.fraunhofer.de>
On 07/19/2013 03:52 PM, Bernd Schubert wrote:
> Hello Ric, hi all,
>
> On 07/12/2013 07:20 PM, Ric Wheeler wrote:
>>
>> If you have topics that you would like to add, wait until the
>> instructions get posted at the link above. If you are impatient, feel
>> free to email me directly (but probably best to drop the broad mailing
>> lists from the reply).
>
> Sorry, this will be a rather long introduction; the short conclusion is below.
>
>
> Introduction to the meta-cache issue:
> =====================================
> For quite a while we have been redesigning our FhGFS storage layout to work
> around meta-cache issues of the underlying file systems. However, there are
> constraints, as data and meta-data are distributed between several
> targets/servers. Other distributed file systems, such as Lustre and (I think)
> cephfs, should have similar issues.
>
> So the main issue we have is that streaming reads/writes evict meta-pages from
> the page-cache, which results in lots of directory-block reads when creating
> files. FhGFS, Lustre and (I believe) cephfs use hash-directories to
> store object files. Access to files in these hash-directories is rather random
> and with an increasing number of files, access to hash directory-blocks/pages
> also gets entirely random. Streaming IO easily evicts these pages, which
> results in high latencies when users perform file creates/deletes, as the
> corresponding directory blocks have to be re-read from disk again and again.
> Now one could argue that hash-directories are a poor choice, and indeed we are
> mostly solving that issue in FhGFS now (current stable release on the meta
> side, upcoming release on the data/storage side).
> However, given the problem of distributed meta-data and distributed data, we
> have not yet found a way to entirely eliminate hash directories. For example,
> recently one of our users created 80 million directories with one or two files
> in each, and even with the new layout that would still be an
> issue. It is even an issue with direct access on the underlying file system.
> Of course, basically empty directories should be avoided altogether, but users
> have their own way of doing IO.
> Furthermore, the meta-cache vs. streaming-cache issue is not limited to
> directory blocks only, but any cached meta-data are affected. Mel recently
> wrote a few patches to improve meta-caching ("Obey mark_page_accessed hint
> given by filesystems"), but at least for our directory-block issue that
> doesn't seem to help.
>
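As a rough illustration of why access to hash directories becomes random, the
following Python sketch maps object IDs to hash directories. The SHA-1 hash,
the two-level 128 x 128 layout and the names `hash_dir_for` and `object_path`
are assumptions made for this example, not the actual FhGFS, Lustre or cephfs
schemes; only the count of 16384 directories comes from the example further
down.

```python
import hashlib

NUM_HASH_DIRS = 16384  # directory count taken from the example in this mail


def hash_dir_for(object_id: str, num_dirs: int = NUM_HASH_DIRS) -> str:
    """Map an object ID to a two-level hash directory (hypothetical layout)."""
    digest = hashlib.sha1(object_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_dirs
    # Split the bucket number into two levels of up to 128 entries each
    # (128 * 128 = 16384).
    return f"{bucket // 128:02X}/{bucket % 128:02X}"


def object_path(object_id: str) -> str:
    """Relative path of the object file inside its hash directory."""
    return f"{hash_dir_for(object_id)}/{object_id}"


if __name__ == "__main__":
    for i in range(4):
        print(object_path(f"obj-{i}"))
```

Consecutively named objects land in unrelated directories, so a burst of
creates touches a different directory block almost every time, which is what
makes the cached directory pages so sensitive to eviction.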
> Conclusion:
> ===========
> From my point of view, there should be a small, but configurable, number of
> pages reserved for meta-data only. If streaming IO were unable to evict these
> pages, the meta-cache issues of our and other file systems would probably be
> solved entirely.
>
>
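The effect of such a reserve can be shown with a toy model. The sketch below
is a plain LRU simulation in Python, not the kernel's page-cache or reclaim
logic; the class `PageCache`, the helper `directory_hits_after_streaming` and
all sizes (1000 pages, 100 directory blocks, 5000 streamed pages, a reserve of
128 pages) are assumptions chosen only for illustration.

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU page cache with an optional reserve for meta-data pages."""

    def __init__(self, total_pages, meta_reserved=0):
        self.data_capacity = total_pages - meta_reserved
        self.meta_capacity = meta_reserved
        self.data = OrderedDict()  # shared LRU list (holds meta-data too if no reserve)
        self.meta = OrderedDict()  # reserved LRU list, meta-data only

    def access(self, key, is_meta):
        """Touch a page; return True on a cache hit, False on a miss."""
        if is_meta and self.meta_capacity > 0:
            lru, capacity = self.meta, self.meta_capacity
        else:
            lru, capacity = self.data, self.data_capacity
        hit = key in lru
        if hit:
            lru.move_to_end(key)         # mark as most recently used
        else:
            lru[key] = True
            if len(lru) > capacity:
                lru.popitem(last=False)  # evict the least recently used page
        return hit


def directory_hits_after_streaming(meta_reserved):
    """Warm 100 directory blocks, stream 5000 pages, then count directory hits."""
    cache = PageCache(total_pages=1000, meta_reserved=meta_reserved)
    dir_blocks = [("dir", i) for i in range(100)]
    for key in dir_blocks:
        cache.access(key, is_meta=True)
    for i in range(5000):                # streaming IO: every page is used once
        cache.access(("stream", i), is_meta=False)
    return sum(cache.access(key, is_meta=True) for key in dir_blocks)


if __name__ == "__main__":
    print(directory_hits_after_streaming(meta_reserved=0))    # prints 0
    print(directory_hits_after_streaming(meta_reserved=128))  # prints 100
```

Without a reserve, the streamed pages push every directory block out of the
single LRU list; with a reserve, the directory blocks survive because streamed
data never competes for those pages.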
> Example:
> ========
>
> Just a very basic simple bonnie++ test with 60000 files on ext4 with inlined
> data to reduce block and bitmap lookups and writes.
>
> Entirely cached hash directories (16384), which are populated with about 16
> million files, so 1000 files per hash-dir.
>
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> fslab3              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> 60:32:32             1702  14  2025  12  1332   4  1873  16  2047  13  1266   3
>> Latency                3874ms    6645ms    8659ms     505ms    7257ms    9627ms
>> 1.96,1.96,fslab3,1,1374655110,,,,,,,,,,,,,,60,32,32,,,1702,14,2025,12,1332,4,1873,16,2047,13,1266,3,,,,,,,3874ms,6645ms,8659ms,505ms,7257ms,9627ms
>>
>
>
> Now after clients did some streaming IO:
>
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> fslab3              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> 60:32:32              541   4  2343  16  2103   6   586   5  1947  13  1603   4
>> Latency                 190ms     166ms    3459ms    6762ms    6518ms    9185ms
>
>
> With longer/more streaming that can go down to 25 creates/s. iostat and btrace
> show lots of meta-reads then, which correspond to directory-block reads.
>
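To put the create rates quoted above into perspective, here is a small
back-of-the-envelope calculation in Python. The rates and the file count are
taken from this mail; comparing the 25 creates/s figure against the
sequential-create column is an assumption, since the mail does not say which
column it refers to, and the helper names are made up for the example.

```python
FILES = 60000          # number of files in the bonnie++ run
CACHED = 1702          # sequential creates/s with cached hash directories
AFTER_STREAMING = 541  # sequential creates/s after clients did streaming IO
WORST_CASE = 25        # creates/s after longer/more streaming


def slowdown(before, after):
    """Factor by which the create rate dropped."""
    return before / after


def seconds_for(files, rate):
    """Time needed to create `files` files at `rate` creates/s."""
    return files / rate


if __name__ == "__main__":
    for label, rate in [("cached", CACHED),
                        ("after streaming", AFTER_STREAMING),
                        ("worst case", WORST_CASE)]:
        print(f"{label:16s} {slowdown(CACHED, rate):5.1f}x slower, "
              f"{seconds_for(FILES, rate):7.1f} s for {FILES} files")
```

That is a drop by a factor of roughly 3.1 after the first streaming run and by
roughly 68 in the worst case, where creating the 60000 files would take about
40 minutes instead of about 35 seconds.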
> Now after running 'find' over these hash directories to re-read all blocks:
>
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> fslab3              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> 60:32:32             1878  16  2766  16  2464   7  1506  13  2054  13  1433   4
>> Latency                 349ms     164ms    1594ms    7730ms    6204ms    8112ms
>
>
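The 'find' pass used above to re-read the directory blocks can be approximated
in a few lines of Python. This is only a sketch of the same idea (list every
directory once so its blocks are read again); the function name
`walk_directories` and the path in the usage line are hypothetical, and
whether the pages then stay cached is still up to the kernel.

```python
import os


def walk_directories(root):
    """Recursively list every directory under root, similar to a bare `find root`.

    Returns (directories_visited, entries_seen).
    """
    dirs_visited = 0
    entries_seen = 0
    stack = [root]
    while stack:
        path = stack.pop()
        dirs_visited += 1
        # Listing a directory makes the file system read its directory blocks.
        with os.scandir(path) as entries:
            for entry in entries:
                entries_seen += 1
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return dirs_visited, entries_seen
```

Usage would be something like `walk_directories("/path/to/hash-dirs")`, run
after the streaming IO has finished.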
>
> Would a dedicated meta-cache be a topic for discussion?
>
>
> Thanks,
> Bernd
>
Hi Bernd,
I think that sounds like an interesting idea to discuss - can you add a proposal
here:
http://www.linuxplumbersconf.org/2013/ocw/events/LPC2013/proposals
Thanks!
Ric