From: Abhijith Das <adas@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: Remove inode from ordered write list in gfs2_write_inode()
Date: Mon, 29 Jan 2018 11:08:26 -0500 (EST)
Message-ID: <803959826.4237011.1517242106693.JavaMail.zimbra@redhat.com>
In-Reply-To: <f52b5cdb-37d2-9afa-2cab-02f6263f1824@redhat.com>
----- Original Message -----
> From: "Steven Whitehouse" <swhiteho@redhat.com>
> To: "Abhi Das" <adas@redhat.com>, cluster-devel at redhat.com
> Sent: Monday, January 29, 2018 3:46:39 AM
> Subject: Re: [Cluster-devel] [GFS2 PATCH] gfs2: Remove inode from ordered write list in gfs2_write_inode()
>
> Hi,
>
> Looks good. Do you have any figures for how big a reduction in inodes on
> the order list this gives?
>
Here are some numbers from my million-file creation test. It creates a
directory tree with 10000 directories at the deepest level, each containing
100 files of 8K.

I added some instrumentation to periodically print the size of the ordered
write list, along with counts of the places from which inodes are added to
and removed from it.

This is the output without the patch:
[820545.521293] Ord list Size:23904 +[add_inode:23904] -[trunc:0 evict:0 wait:0 write_inode:0 ord_write:0 setflags:0]
[820575.535980] Ord list Size:73670 +[add_inode:73670] -[trunc:0 evict:0 wait:0 write_inode:0 ord_write:0 setflags:0]
...
[820966.796087] Ord list Size:667066 +[add_inode:667066] -[trunc:0 evict:0 wait:0 write_inode:0 ord_write:0 setflags:0]
[820996.811789] Ord list Size:705551 +[add_inode:710262] -[trunc:0 evict:0 wait:0 write_inode:0 ord_write:4711 setflags:0]
[821026.825525] Ord list Size:676518 +[add_inode:751608] -[trunc:0 evict:36 wait:0 write_inode:0 ord_write:75054 setflags:0]
...
[821146.927283] Ord list Size:587353 +[add_inode:922395] -[trunc:0 evict:376 wait:0 write_inode:0 ord_write:334666 setflags:0]
[821177.204024] Ord list Size:573822 +[add_inode:965143] -[trunc:0 evict:379 wait:0 write_inode:0 ord_write:390942 setflags:0]
[821207.402720] Ord list Size:569946 +[add_inode:1000000] -[trunc:0 evict:10143 wait:0 write_inode:0 ord_write:419911 setflags:0]
>> Test has ended here when we hit a million files - took 662 seconds
[821237.416418] Ord list Size:564908 +[add_inode:1000000] -[trunc:0 evict:10143 wait:0 write_inode:0 ord_write:424949 setflags:0]
[821267.429116] Ord list Size:564908 +[add_inode:1000000] -[trunc:0 evict:10143 wait:0 write_inode:0 ord_write:424949 setflags:0]
...
[821387.475956] Ord list Size:564908 +[add_inode:1000000] -[trunc:0 evict:10143 wait:0 write_inode:0 ord_write:424949 setflags:0]
>> I did an unmount here
[821398.943003] Ord list Size:0 +[add_inode:1000000] -[trunc:0 evict:575051 wait:0 write_inode:0 ord_write:424949 setflags:0]
As you can see, the size of the ordered list climbs steadily as we create more
files. A portion of it gets pruned in gfs2_ordered_write(), where we remove
inodes whose mappings have zero pages (the result of my patch from a few weeks
ago). A large part of the list is still around well after the test completes
and doesn't get cleared out until unmount.
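The counters in each line should satisfy a simple bookkeeping rule: the reported size equals add_inode minus the sum of the six removal counters. The following is a minimal userspace sketch for checking that on captured log lines. It assumes the line format is exactly as printed above; the struct and function names (ord_stats, ord_parse, ord_removed, ord_consistent) are made up for illustration and are not part of GFS2 or of the instrumentation itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counters from one "Ord list" instrumentation line (hypothetical type). */
struct ord_stats {
	long size;        /* reported length of the ordered write list */
	long add_inode;   /* total inodes ever added */
	long trunc;       /* removals, broken down by call site */
	long evict;
	long wait;
	long write_inode;
	long ord_write;
	long setflags;
};

/* Parse one log line. Returns 0 on success, -1 if the line does not match. */
static int ord_parse(const char *line, struct ord_stats *st)
{
	const char *p;
	int n;

	assert(line != NULL && st != NULL);

	/* Skip the leading "[timestamp] " prefix, if any. */
	p = strstr(line, "Ord list Size:");
	if (!p)
		return -1;

	n = sscanf(p,
		   "Ord list Size:%ld +[add_inode:%ld] "
		   "-[trunc:%ld evict:%ld wait:%ld "
		   "write_inode:%ld ord_write:%ld setflags:%ld]",
		   &st->size, &st->add_inode, &st->trunc, &st->evict,
		   &st->wait, &st->write_inode, &st->ord_write,
		   &st->setflags);
	return n == 8 ? 0 : -1;
}

/* Total removals across all call sites. */
static long ord_removed(const struct ord_stats *st)
{
	return st->trunc + st->evict + st->wait +
	       st->write_inode + st->ord_write + st->setflags;
}

/* Nonzero if the reported size matches additions minus removals. */
static int ord_consistent(const struct ord_stats *st)
{
	return st->size == st->add_inode - ord_removed(st);
}
```

For example, the line at 820996.811789 parses to add_inode 710262 and ord_write 4711 with every other removal counter at zero, and 710262 - 4711 = 705551 matches the reported size, so ord_consistent() returns nonzero for it.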
With the latest patch:
[819186.894664] Ord list Size:2586 +[add_inode:2586] -[trunc:0 evict:0 wait:0 write_inode:0 ord_write:0 setflags:0]
[819217.871372] Ord list Size:46996 +[add_inode:48240] -[trunc:0 evict:0 wait:0 write_inode:1244 ord_write:0 setflags:0]
[819247.891066] Ord list Size:52424 +[add_inode:97568] -[trunc:0 evict:0 wait:0 write_inode:45144 ord_write:0 setflags:0]
[819278.379780] Ord list Size:48550 +[add_inode:144557] -[trunc:0 evict:0 wait:0 write_inode:96007 ord_write:0 setflags:0]
...
[819760.628004] Ord list Size:49686 +[add_inode:937372] -[trunc:0 evict:0 wait:0 write_inode:887686 ord_write:0 setflags:0]
[819791.206728] Ord list Size:48754 +[add_inode:986127] -[trunc:0 evict:0 wait:0 write_inode:937373 ord_write:0 setflags:0]
[819821.223436] Ord list Size:14212 +[add_inode:1000000] -[trunc:0 evict:0 wait:0 write_inode:985788 ord_write:0 setflags:0]
>> test ends here - took 635s
[819851.236137] Ord list Size:0 +[add_inode:1000000] -[trunc:0 evict:0 wait:0 write_inode:1000000 ord_write:0 setflags:0]
[819881.247826] Ord list Size:0 +[add_inode:1000000] -[trunc:0 evict:0 wait:0 write_inode:1000000 ord_write:0 setflags:0]
With non-dirty inodes removed from the ordered list in gfs2_write_inode(), we
don't let things get out of hand: only about 50k inodes are on the list at any
point. Also, once the test completes, the entire list is cleared out.
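To put a rough figure on the reduction, the arithmetic can be sketched as below. The inputs are only the samples printed in this message (the elided lines may contain larger values), so the results are approximate. The helper names are hypothetical.

```c
#include <assert.h>

/* How many times smaller the list is: before / after. */
static double reduction_factor(long before, long after)
{
	assert(after > 0);
	return (double)before / (double)after;
}

/* Average creation rate over the whole run, in files per second. */
static double files_per_sec(long files, long seconds)
{
	assert(seconds > 0);
	return (double)files / (double)seconds;
}

/* Percentage by which the run time dropped. */
static double pct_faster(long before_s, long after_s)
{
	assert(before_s > 0);
	return 100.0 * (double)(before_s - after_s) / (double)before_s;
}
```

With the largest sizes shown in each run, reduction_factor(705551, 52424) is about 13.5. The run times give files_per_sec(1000000, 662) of about 1511 and files_per_sec(1000000, 635) of about 1575, and pct_faster(662, 635) is about 4.1.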
In my testing I haven't encountered any issues with it, but if somebody can tack
this patch on to their test kernels (or I can provide one) I'd be more comfortable.
Cheers!
--Abhi
> Steve.
>
>
> On 29/01/18 05:02, Abhi Das wrote:
> > The vfs clears the I_DIRTY inode flag before calling gfs2_write_inode()
> > having queued any data that needed to be written to disk.
> > This is a good time to remove such inodes from our ordered write list
> > so they don't hang around for long periods of time.
> >
> > Signed-off-by: Abhi Das <adas@redhat.com>
> > ---
> > fs/gfs2/super.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> > index d81d46e..596feb6 100644
> > --- a/fs/gfs2/super.c
> > +++ b/fs/gfs2/super.c
> > @@ -766,6 +766,12 @@ static int gfs2_write_inode(struct inode *inode,
> > struct writeback_control *wbc)
> > ret = filemap_fdatawait(metamapping);
> > if (ret)
> > mark_inode_dirty_sync(inode);
> > + else {
> > + spin_lock(&inode->i_lock);
> > + if (!(inode->i_state & I_DIRTY))
> > + gfs2_ordered_del_inode(ip);
> > + spin_unlock(&inode->i_lock);
> > + }
> > return ret;
> > }
> >
>
>
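The rule the patch adds can be modelled in userspace roughly as follows: after a successful wait on metadata writeback, an inode that is no longer dirty is dropped from the ordered list, while a failed wait re-dirties it and leaves it queued. This is only a sketch under simplifying assumptions: it is single-threaded, so the i_lock and list locking are omitted, a single bool stands in for the I_DIRTY bits, and every type and function name here (model_inode, ordered_list, model_write_inode and so on) is invented for illustration rather than taken from fs/gfs2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for an inode that may sit on the ordered list. */
struct model_inode {
	bool dirty;       /* stands in for the I_DIRTY state bits */
	bool on_ordered;  /* true while linked on the ordered list */
	struct model_inode *prev, *next;
};

/* Circular doubly linked list with a sentinel head. */
struct ordered_list {
	struct model_inode head;
	size_t len;
};

static void ordered_init(struct ordered_list *ol)
{
	ol->head.prev = ol->head.next = &ol->head;
	ol->len = 0;
}

/* Append an inode unless it is already queued. */
static void ordered_add(struct ordered_list *ol, struct model_inode *ip)
{
	assert(ip != &ol->head);
	if (ip->on_ordered)
		return;
	ip->next = &ol->head;
	ip->prev = ol->head.prev;
	ol->head.prev->next = ip;
	ol->head.prev = ip;
	ip->on_ordered = true;
	ol->len++;
}

/* Unlink an inode if it is queued; a no-op otherwise. */
static void ordered_del(struct ordered_list *ol, struct model_inode *ip)
{
	if (!ip->on_ordered)
		return;
	ip->prev->next = ip->next;
	ip->next->prev = ip->prev;
	ip->prev = ip->next = NULL;
	ip->on_ordered = false;
	ol->len--;
}

/*
 * Model of the tail of the write_inode path. wait_ret is the result of
 * waiting on metadata writeback (0 on success, nonzero on error).
 */
static int model_write_inode(struct ordered_list *ol,
			     struct model_inode *ip, int wait_ret)
{
	if (wait_ret)
		ip->dirty = true;        /* error: keep the inode dirty */
	else if (!ip->dirty)
		ordered_del(ol, ip);     /* clean: no reason to keep it queued */
	return wait_ret;
}
```

In this model, a clean inode leaves the list on the first successful model_write_inode() call, which mirrors the write_inode counter tracking add_inode in the patched run above.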
Thread overview: 4+ messages
2018-01-29 5:02 [Cluster-devel] [GFS2 PATCH] gfs2: Remove inode from ordered write list in gfs2_write_inode() Abhi Das
2018-01-29 9:46 ` Steven Whitehouse
2018-01-29 16:08 ` Abhijith Das [this message]
2018-01-30 17:23 ` Bob Peterson