From: Brian Foster <bfoster@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>,
linux-fsdevel@vger.kernel.org, dchinner@redhat.com,
jbacik@fb.com
Subject: Re: [PATCH v6 1/2] sb: add a new writeback list for sync
Date: Thu, 21 Jan 2016 12:13:06 -0500
Message-ID: <20160121171306.GC19272@bfoster.bfoster>
In-Reply-To: <20160121163411.GP10810@quack.suse.cz>
On Thu, Jan 21, 2016 at 05:34:11PM +0100, Jan Kara wrote:
> On Thu 21-01-16 10:22:57, Brian Foster wrote:
> > On Thu, Jan 21, 2016 at 07:11:59AM +1100, Dave Chinner wrote:
> > > On Wed, Jan 20, 2016 at 02:26:26PM +0100, Jan Kara wrote:
> > > > On Tue 19-01-16 12:59:12, Brian Foster wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > >
...
> > > >
> >
> > Hi Jan, Dave,
> >
...
> > > > a) How much sync(2) speed has improved if there's not much to wait for.
> > >
> > > Depends on the size of the inode cache when sync is run. If it's
> > > empty it's not noticeable. When you have tens of millions of cached,
> > > clean inodes the inode list traversal can take tens of seconds.
> > > This is the sort of problem Josef reported that FB were having...
> > >
> >
> > FWIW, Ceph has indicated this is a pain point for them as well. The
> > results at [0] below show the difference in sync time, before and after
> > this patch, with a heavily populated inode cache.
> >
> > > > b) See whether parallel heavy stat(2) load which is rotating lots of inodes
> > > > in inode cache sees some improvement when it doesn't have to contend with
> > > > sync(2) on s_inode_list_lock. I believe Dave Chinner had some loads where
> > > > the contention on s_inode_list_lock due to sync and rotation of inodes was
> > > > pretty heavy.
> > >
> > > Just my usual fsmark workloads - they have parallel find and
> > > parallel ls -lR traversals over the created fileset. Even just
> > > running sync during creation (because there are millions of cached
> > > inodes, and ~250,000 inodes being instantiated and reclaimed every
> > > second) causes lock contention problems....
> > >
> >
> > I ran a similar parallel (16x) fs_mark workload using '-S 4', which
> > performs a sync() per pass. Without this patch it shows a steady
> > degradation in files/sec as the inode cache grows. Results at [1].
>
> Thanks for the results. I think it would be good if you incorporated them
> in the changelog, since other people will likely be asking similar
> questions when they see that struct inode is growing. Other than that,
> feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
No problem, thanks! Sure, I don't want to dump the raw data into the
commit log description and make it too long, but I can reference the core
sync time impact. I've appended the following for now:
"With this change, filesystem sync times are significantly reduced for
fs' with largely populated inode caches and otherwise no other work to
do. For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem
with a ~10m entry inode cache, sync times are reduced from ~7.3s to less
than 0.1s when the filesystem is fully clean."
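As a rough illustration of why the numbers move the way they do, the core
of the change boils down to something like the following (simplified
pseudo-kernel-C sketch; the s_inodes_wb/i_wb_list names are only
illustrative here, see the patch itself for the real code):

	/*
	 * Sketch only: locking (s_inode_list_lock), inode refcounting
	 * and cursor handling are elided for brevity.
	 */

	/* Before the patch, sync's wait_sb_inodes() walks every cached
	 * inode on the superblock just to find the few with dirty pages,
	 * contending on s_inode_list_lock the whole time: */
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		if (inode->i_mapping->nrpages == 0)
			continue;		/* clean inode, skip */
		filemap_fdatawait(inode->i_mapping);
	}

	/* With the patch, inodes are added to a small per-sb writeback
	 * list when pages are queued for writeback, so sync only walks
	 * inodes that might actually need waiting on: */
	list_for_each_entry(inode, &sb->s_inodes_wb, i_wb_list)
		filemap_fdatawait(inode->i_mapping);

That lines up with the raw numbers in [0] below: ~7.3s of pure sys time
over ~10.5 million cached inodes works out to well under a microsecond
per inode traversed, while with the patch the list to walk is effectively
empty on a clean fs.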
I'll repost in a day or so if I don't receive any other feedback.
Brian
> Honza
> > 16xcpu, 32GB RAM x86-64 server
> > Storage is LVM volumes on hw raid0.
> >
> > [0] -- sync test w/ ~10m clean inode cache
> > - 10TB pre-populated XFS fs, cache populated via parallel find/stat
> > workload
> >
> > --- 4.4.0+
> >
> > # cat /proc/slabinfo | grep xfs
> > xfs_dqtrx 0 0 528 62 8 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_dquot 0 0 656 49 8 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_buf 496293 496893 640 51 8 : tunables 0 0 0 : slabdata 9743 9743 0
> > xfs_icr 0 0 144 56 2 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_inode 10528071 10529150 1728 18 8 : tunables 0 0 0 : slabdata 584999 584999 0
> > xfs_efd_item 0 0 400 40 4 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_da_state 544 544 480 34 4 : tunables 0 0 0 : slabdata 16 16 0
> > xfs_btree_cur 0 0 208 39 2 : tunables 0 0 0 : slabdata 0 0 0
> >
> > # time sync
> >
> > real 0m7.322s
> > user 0m0.000s
> > sys 0m7.314s
> > # time sync
> >
> > real 0m7.299s
> > user 0m0.000s
> > sys 0m7.296s
> >
> > --- 4.4.0+ w/ sync patch
> >
> > # cat /proc/slabinfo | grep xfs
> > xfs_dqtrx 0 0 528 62 8 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_dquot 0 0 656 49 8 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_buf 428214 428514 640 51 8 : tunables 0 0 0 : slabdata 8719 8719 0
> > xfs_icr 0 0 144 56 2 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_inode 11054375 11054438 1728 18 8 : tunables 0 0 0 : slabdata 721323 721323 0
> > xfs_efd_item 0 0 400 40 4 : tunables 0 0 0 : slabdata 0 0 0
> > xfs_da_state 544 544 480 34 4 : tunables 0 0 0 : slabdata 16 16 0
> > xfs_btree_cur 0 0 208 39 2 : tunables 0 0 0 : slabdata 0 0 0
> >
> > # time sync
> >
> > real 0m0.040s
> > user 0m0.001s
> > sys 0m0.003s
> > # time sync
> >
> > real 0m0.002s
> > user 0m0.001s
> > sys 0m0.002s
> >
> > [1] -- fs_mark -D 1000 -S4 -n 1000 -d /mnt/0 ... -d /mnt/15 -L 32
> > - 1TB XFS fs
> >
> > --- 4.4.0+
> >
> > FSUse% Count Size Files/sec App Overhead
> > 2 16000 51200 3313.3 822514
> > 2 32000 51200 3353.6 310268
> > 2 48000 51200 3475.2 289941
> > 2 64000 51200 3104.6 289993
> > 2 80000 51200 2944.9 292124
> > 2 96000 51200 3010.4 288042
> > 3 112000 51200 2756.4 289761
> > 3 128000 51200 2753.2 288096
> > 3 144000 51200 2474.4 290797
> > 3 160000 51200 2657.9 290898
> > 3 176000 51200 2498.0 288247
> > 3 192000 51200 2415.5 287329
> > 3 208000 51200 2336.1 291113
> > 3 224000 51200 2352.9 290103
> > 3 240000 51200 2309.6 289580
> > 3 256000 51200 2344.3 289828
> > 3 272000 51200 2293.0 291282
> > 3 288000 51200 2295.5 286538
> > 4 304000 51200 2119.0 288906
> > 4 320000 51200 2059.6 293605
> > 4 336000 51200 2129.1 289825
> > 4 352000 51200 1929.8 288186
> > 4 368000 51200 1987.5 294596
> > 4 384000 51200 1929.1 293528
> > 4 400000 51200 1934.8 288138
> > 4 416000 51200 1823.6 292318
> > 4 432000 51200 1838.7 290890
> > 4 448000 51200 1797.5 288816
> > 4 464000 51200 1823.2 287190
> > 4 480000 51200 1738.7 295745
> > 4 496000 51200 1716.4 293821
> > 5 512000 51200 1726.7 290445
> >
> > --- 4.4.0+ w/ sync patch
> >
> > FSUse% Count Size Files/sec App Overhead
> > 2 16000 51200 3409.7 999579
> > 2 32000 51200 3481.3 286877
> > 2 48000 51200 3447.3 282743
> > 2 64000 51200 3522.3 283400
> > 2 80000 51200 3427.0 286360
> > 2 96000 51200 3360.2 307219
> > 3 112000 51200 3377.7 286625
> > 3 128000 51200 3363.7 285929
> > 3 144000 51200 3345.7 283138
> > 3 160000 51200 3384.9 291081
> > 3 176000 51200 3084.1 285265
> > 3 192000 51200 3388.4 291439
> > 3 208000 51200 3242.8 286332
> > 3 224000 51200 3337.9 285006
> > 3 240000 51200 3442.8 292109
> > 3 256000 51200 3230.3 283432
> > 3 272000 51200 3358.3 286996
> > 3 288000 51200 3309.0 288058
> > 4 304000 51200 3293.4 284309
> > 4 320000 51200 3221.4 284476
> > 4 336000 51200 3241.5 283968
> > 4 352000 51200 3228.3 284354
> > 4 368000 51200 3255.7 286072
> > 4 384000 51200 3094.6 290240
> > 4 400000 51200 3385.6 288158
> > 4 416000 51200 3265.2 284387
> > 4 432000 51200 3315.2 289656
> > 4 448000 51200 3275.1 284562
> > 4 464000 51200 3238.4 294976
> > 4 480000 51200 3060.0 290088
> > 4 496000 51200 3359.5 286949
> > 5 512000 51200 3156.2 288126
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR