From: Brian Foster <bfoster@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>,
linux-fsdevel@vger.kernel.org, dchinner@redhat.com,
jbacik@fb.com
Subject: Re: [PATCH v6 1/2] sb: add a new writeback list for sync
Date: Thu, 21 Jan 2016 12:13:06 -0500
Message-ID: <20160121171306.GC19272@bfoster.bfoster>
In-Reply-To: <20160121163411.GP10810@quack.suse.cz>
On Thu, Jan 21, 2016 at 05:34:11PM +0100, Jan Kara wrote:
> On Thu 21-01-16 10:22:57, Brian Foster wrote:
> > On Thu, Jan 21, 2016 at 07:11:59AM +1100, Dave Chinner wrote:
> > > On Wed, Jan 20, 2016 at 02:26:26PM +0100, Jan Kara wrote:
> > > > On Tue 19-01-16 12:59:12, Brian Foster wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > >
...
> > > >
> >
> > Hi Jan, Dave,
> >
...
> > > > a) How much sync(2) speed has improved if there's not much to wait for.
> > >
> > > Depends on the size of the inode cache when sync is run. If it's
> > > empty it's not noticeable. When you have tens of millions of cached,
> > > clean inodes, the inode list traversal can take tens of seconds.
> > > This is the sort of problem Josef reported that FB were having...
> > >
> >
> > FWIW, Ceph has indicated this is a pain point for them as well. The
> > results at [0] below show the difference in sync time with a largely
> > populated inode cache before and after this patch.
> >
> > > > b) See whether parallel heavy stat(2) load which is rotating lots of inodes
> > > > in inode cache sees some improvement when it doesn't have to contend with
> > > > sync(2) on s_inode_list_lock. I believe Dave Chinner had some loads where
> > > > the contention on s_inode_list_lock due to sync and rotation of inodes was
> > > > pretty heavy.
> > >
> > > Just my usual fsmark workloads - they have parallel find and
> > > parallel ls -lR traversals over the created fileset. Even just
> > > running sync during creation (because there are millions of cached
> > > inodes, and ~250,000 inodes being instantiated and reclaimed every
> > > second) causes lock contention problems....
> > >
> >
> > I ran a similar parallel (16x) fs_mark workload using '-S 4', which
> > incorporates a sync() per pass. Without this patch, throughput degrades
> > slowly as the inode cache grows. Results at [1].
>
> Thanks for the results. I think it would be good if you incorporated them
> in the changelog since other people will likely be asking similar
> questions when seeing that struct inode is growing. Other than that, feel free to
> add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
No problem, thanks! I'd rather not dump the raw data into the commit log
description, since that would make it too long, but I can reference the
core sync time impact. I've appended the following for now:
"With this change, filesystem sync times are significantly reduced for
filesystems with heavily populated inode caches and no other work to do.
For example, on a 16-CPU 2GHz x86-64 server with a 10TB XFS filesystem
and an inode cache of ~10 million entries, sync times drop from ~7.3s to
less than 0.1s when the filesystem is fully clean."
I'll repost in a day or so if I don't receive any other feedback.
Brian
> Honza
> > 16xcpu, 32GB RAM x86-64 server
> > Storage is LVM volumes on hw raid0.
> >
> > [0] -- sync test w/ ~10m clean inode cache
> > - 10TB pre-populated XFS fs, cache populated via parallel find/stat
> > workload
> >
> > --- 4.4.0+
> >
> > # cat /proc/slabinfo | grep xfs
> > xfs_dqtrx            0        0  528  62  8 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_dquot            0        0  656  49  8 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_buf         496293   496893  640  51  8 : tunables 0 0 0 : slabdata   9743   9743 0
> > xfs_icr              0        0  144  56  2 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_inode     10528071 10529150 1728  18  8 : tunables 0 0 0 : slabdata 584999 584999 0
> > xfs_efd_item         0        0  400  40  4 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_da_state       544      544  480  34  4 : tunables 0 0 0 : slabdata     16     16 0
> > xfs_btree_cur        0        0  208  39  2 : tunables 0 0 0 : slabdata      0      0 0
> >
> > # time sync
> >
> > real 0m7.322s
> > user 0m0.000s
> > sys 0m7.314s
> > # time sync
> >
> > real 0m7.299s
> > user 0m0.000s
> > sys 0m7.296s
> >
> > --- 4.4.0+ w/ sync patch
> >
> > # cat /proc/slabinfo | grep xfs
> > xfs_dqtrx            0        0  528  62  8 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_dquot            0        0  656  49  8 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_buf         428214   428514  640  51  8 : tunables 0 0 0 : slabdata   8719   8719 0
> > xfs_icr              0        0  144  56  2 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_inode     11054375 11054438 1728  18  8 : tunables 0 0 0 : slabdata 721323 721323 0
> > xfs_efd_item         0        0  400  40  4 : tunables 0 0 0 : slabdata      0      0 0
> > xfs_da_state       544      544  480  34  4 : tunables 0 0 0 : slabdata     16     16 0
> > xfs_btree_cur        0        0  208  39  2 : tunables 0 0 0 : slabdata      0      0 0
> >
> > # time sync
> >
> > real 0m0.040s
> > user 0m0.001s
> > sys 0m0.003s
> > # time sync
> >
> > real 0m0.002s
> > user 0m0.001s
> > sys 0m0.002s
> >
> > [1] -- fs_mark -D 1000 -S4 -n 1000 -d /mnt/0 ... -d /mnt/15 -L 32
> > - 1TB XFS fs
> >
> > --- 4.4.0+
> >
> > FSUse%   Count   Size  Files/sec  App Overhead
> >      2   16000  51200     3313.3        822514
> >      2   32000  51200     3353.6        310268
> >      2   48000  51200     3475.2        289941
> >      2   64000  51200     3104.6        289993
> >      2   80000  51200     2944.9        292124
> >      2   96000  51200     3010.4        288042
> >      3  112000  51200     2756.4        289761
> >      3  128000  51200     2753.2        288096
> >      3  144000  51200     2474.4        290797
> >      3  160000  51200     2657.9        290898
> >      3  176000  51200     2498.0        288247
> >      3  192000  51200     2415.5        287329
> >      3  208000  51200     2336.1        291113
> >      3  224000  51200     2352.9        290103
> >      3  240000  51200     2309.6        289580
> >      3  256000  51200     2344.3        289828
> >      3  272000  51200     2293.0        291282
> >      3  288000  51200     2295.5        286538
> >      4  304000  51200     2119.0        288906
> >      4  320000  51200     2059.6        293605
> >      4  336000  51200     2129.1        289825
> >      4  352000  51200     1929.8        288186
> >      4  368000  51200     1987.5        294596
> >      4  384000  51200     1929.1        293528
> >      4  400000  51200     1934.8        288138
> >      4  416000  51200     1823.6        292318
> >      4  432000  51200     1838.7        290890
> >      4  448000  51200     1797.5        288816
> >      4  464000  51200     1823.2        287190
> >      4  480000  51200     1738.7        295745
> >      4  496000  51200     1716.4        293821
> >      5  512000  51200     1726.7        290445
> >
> > --- 4.4.0+ w/ sync patch
> >
> > FSUse%   Count   Size  Files/sec  App Overhead
> >      2   16000  51200     3409.7        999579
> >      2   32000  51200     3481.3        286877
> >      2   48000  51200     3447.3        282743
> >      2   64000  51200     3522.3        283400
> >      2   80000  51200     3427.0        286360
> >      2   96000  51200     3360.2        307219
> >      3  112000  51200     3377.7        286625
> >      3  128000  51200     3363.7        285929
> >      3  144000  51200     3345.7        283138
> >      3  160000  51200     3384.9        291081
> >      3  176000  51200     3084.1        285265
> >      3  192000  51200     3388.4        291439
> >      3  208000  51200     3242.8        286332
> >      3  224000  51200     3337.9        285006
> >      3  240000  51200     3442.8        292109
> >      3  256000  51200     3230.3        283432
> >      3  272000  51200     3358.3        286996
> >      3  288000  51200     3309.0        288058
> >      4  304000  51200     3293.4        284309
> >      4  320000  51200     3221.4        284476
> >      4  336000  51200     3241.5        283968
> >      4  352000  51200     3228.3        284354
> >      4  368000  51200     3255.7        286072
> >      4  384000  51200     3094.6        290240
> >      4  400000  51200     3385.6        288158
> >      4  416000  51200     3265.2        284387
> >      4  432000  51200     3315.2        289656
> >      4  448000  51200     3275.1        284562
> >      4  464000  51200     3238.4        294976
> >      4  480000  51200     3060.0        290088
> >      4  496000  51200     3359.5        286949
> >      5  512000  51200     3156.2        288126
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR