Re: [PATCH] [RFC] xfs: initialise attr fork on inode create

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] [RFC] xfs: initialise attr fork on inode create
Date: Sat, 5 Dec 2020 06:34:44 -0500	[thread overview]
Message-ID: <20201205113444.GA1485029@bfoster> (raw)
In-Reply-To: <20201204212222.GG3913616@dread.disaster.area>

On Sat, Dec 05, 2020 at 08:22:22AM +1100, Dave Chinner wrote:
> On Fri, Dec 04, 2020 at 07:31:37AM -0500, Brian Foster wrote:
> > On Thu, Dec 03, 2020 at 10:27:24AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > When we allocate a new inode, we often need to add an attribute to
> > > the inode as part of the create. This can happen as a result of
> > > needing to add default ACLs or security labels before the inode is
> > > made visible to userspace.
> > > 
> > > This is highly inefficient right now. We do the create transaction
> > > to allocate the inode, then we do an "add attr fork" transaction to
> > > modify the just created empty inode to set the inode fork offset to
> > > allow attributes to be stored, then we go and do the attribute
> > > creation.
> > > 
> > > This means 3 transactions instead of 1 to allocate an inode, and
> > > this greatly increases the load on the CIL commit code, resulting in
> > > excessive contention on the CIL spin locks and performance
> > > degradation:
> > > 
> > >  18.99%  [kernel]                [k] __pv_queued_spin_lock_slowpath
> > >   3.57%  [kernel]                [k] do_raw_spin_lock
> > >   2.51%  [kernel]                [k] __raw_callee_save___pv_queued_spin_unlock
> > >   2.48%  [kernel]                [k] memcpy
> > >   2.34%  [kernel]                [k] xfs_log_commit_cil
> > > 
> > > The typical profile resulting from running fsmark on a selinux enabled
> > > filesytem is adds this overhead to the create path:
> > > 
> > ...
> > > 
> > > And fsmark creation rate performance drops by ~25%. The key point to
> > > note here is that half the additional overhead comes from adding the
> > > attribute fork to the newly created inode. That's crazy, considering
> > > we can do this same thing at inode create time with a couple of
> > > lines of code and no extra overhead.
> > > 
> > > So, if we know we are going to add an attribute immediately after
> > > creating the inode, let's just initialise the attribute fork inside
> > > the create transaction and chop that whole chunk of code out of
> > > the create fast path. This completely removes the performance
> > > drop caused by enabling SELinux, and the profile looks like:
> > > 
> > ...
> > > 
> > > Which indicates the XFS overhead of creating the selinux xattr has
> > > been halved. This doesn't fix the CIL lock contention problem, just
> > > means it's not a limiting factor for this workload. Lock contention
> > > in the security subsystems is going to be an issue soon, though...
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_inode_fork.c | 20 +++++++++++++++-----
> > >  fs/xfs/libxfs/xfs_inode_fork.h |  1 +
> > >  fs/xfs/xfs_inode.c             | 24 ++++++++++++++++++++----
> > >  fs/xfs/xfs_inode.h             |  5 +++--
> > >  fs/xfs/xfs_iops.c              | 10 +++++++++-
> > >  fs/xfs/xfs_qm.c                |  2 +-
> > >  fs/xfs/xfs_symlink.c           |  2 +-
> > >  7 files changed, 50 insertions(+), 14 deletions(-)
> > > 
> > ...
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 2bfbcf28b1bd..9ee2e0b4c6fd 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > ...
> > > @@ -918,6 +919,18 @@ xfs_ialloc(
> > >  		ASSERT(0);
> > >  	}
> > >  
> > > +	/*
> > > +	 * If we need to create attributes immediately after allocating the
> > > +	 * inode, initialise an empty attribute fork right now. We use the
> > > +	 * default fork offset for attributes here as we don't know exactly what
> > > +	 * size or how many attributes we might be adding. We can do this safely
> > > +	 * here because we know the data fork is completely empty right now.
> > > +	 */
> > > +	if (init_attrs) {
> > > +		ip->i_afp = xfs_ifork_alloc(XFS_DINODE_FMT_EXTENTS, 0);
> > > +		ip->i_d.di_forkoff = xfs_default_attroffset(ip) >> 3;
> > > +	}
> > > +
> > 
> > Seems reasonable in principle, but why not refactor
> > xfs_bmap_add_attrfork() such that the internals (i.e. everything within
> > the transaction/ilock code) can be properly reused in both contexts
> > rather than open-coding (and thus duplicating) a somewhat stripped down
> > version?
> 
> We don't know the size of the attribute that is being created, so
> the attr size dependent parts of it can't be used.
> 

Not sure I see the problem here. It looks to me that
xfs_bmap_add_attrfork() would do the right thing if we just passed a
size of zero. The only place the size value is actually used is down in
xfs_attr_shortform_bytesfit(), and I'd expect that to identify that the
requested size is <= than the current afork size (also zero for a newly
allocated inode..?) and bail out.

That said, I wouldn't be opposed to tweaking xfs_bmap_set_attrforkoff()
by a line or two to just skip the shortform call if size == 0. Then we
can be more explicit about the "size == 0 means preemptive fork alloc,
use the default offset" use case and perhaps actually document it with
some comments as well.

Brian

> > At a glance, it looks like there are some subtle differences in
> > the initial setup of the attr fork for a device node inode, for example.
> 
> Yes, there's a difference, but it's largely irrelevant as adding
> the first attribute to a device format inode will reset the
> forkoffset to the min via xfs_attr_shortform_bytesfit().
> 
> And if the attribute is larger than will fit in the default fork
> offset space, but can fit the attr in shrotform by shrinking the
> empty data fork space, xfs_attr_shortform_bytesfit() will do that as
> well. IOWs, we only need to set a non-zero fork offset here and init
> the ip->i_afp pointer - immediately setting an attribute on the
> empty inode literal area will do the rest for the fork offset setup
> for us...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

next prev parent reply	other threads:[~2020-12-05 11:40 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-02 23:27 [PATCH] [RFC] xfs: initialise attr fork on inode create Dave Chinner
2020-12-03  8:40 ` Christoph Hellwig
2020-12-03 21:44   ` Dave Chinner
2020-12-04  7:54     ` Christoph Hellwig
2020-12-07 17:22       ` Casey Schaufler
2020-12-07 17:25         ` Christoph Hellwig
2020-12-07 20:49           ` Dave Chinner
2020-12-04 12:31 ` Brian Foster
2020-12-04 21:22   ` Dave Chinner
2020-12-05 11:34     ` Brian Foster [this message]
2020-12-06 23:33       ` Dave Chinner
2020-12-07 16:30         ` Brian Foster
2020-12-07 17:18           ` Darrick J. Wong
2020-12-07 17:12 ` Darrick J. Wong
2020-12-07 17:31 ` Christoph Hellwig
2020-12-07 20:42   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201205113444.GA1485029@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.