From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, hch@infradead.org
Subject: Re: [PATCH 06/16] xfs: defer inode inactivation to a workqueue
Date: Thu, 17 Jun 2021 11:41:11 -0700 [thread overview]
Message-ID: <20210617184111.GD158232@locust> (raw)
In-Reply-To: <YMta7aDtYSSV/CPd@bfoster>
On Thu, Jun 17, 2021 at 10:23:41AM -0400, Brian Foster wrote:
> On Tue, Jun 15, 2021 at 01:53:24PM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 15, 2021 at 10:43:28AM -0400, Brian Foster wrote:
> > > On Mon, Jun 14, 2021 at 12:27:20PM -0700, Darrick J. Wong wrote:
> > > > On Mon, Jun 14, 2021 at 12:17:03PM -0400, Brian Foster wrote:
> > > > > On Sun, Jun 13, 2021 at 10:20:29AM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > >
> > > > > > Instead of calling xfs_inactive directly from xfs_fs_destroy_inode,
> > > > > > defer the inactivation phase to a separate workqueue. With this change,
> > > > > > we can speed up directory tree deletions by reducing the duration of
> > > > > > unlink() calls to the directory and unlinked list updates.
> > > > > >
> > > > > > By moving the inactivation work to the background, we can reduce the
> > > > > > total cost of deleting a lot of files by performing the file deletions
> > > > > > in disk order instead of directory entry order, which can be arbitrary.
> > > > > >
> > > > > > We introduce two new inode flags -- NEEDS_INACTIVE and INACTIVATING.
> > > > > > The first flag helps our worker find inodes needing inactivation, and
> > > > > > the second flag marks inodes that are in the process of being
> > > > > > inactivated. A concurrent xfs_iget on the inode can still resurrect the
> > > > > > inode by clearing NEEDS_INACTIVE (or bailing if INACTIVATING is set).
> > > > > >
> > > > > > Unfortunately, deferring the inactivation has one huge downside --
> > > > > > eventual consistency. Since all the freeing is deferred to a worker
> > > > > > thread, one can rm a file but the space doesn't come back immediately.
> > > > > > This can cause some odd side effects with quota accounting and statfs,
> > > > > > so we flush inactivation work during syncfs in order to maintain the
> > > > > > existing behaviors, at least for callers that unlink() and sync().
> > > > > >
> > > > > > For this patch we'll set the delay to zero to mimic the old timing as
> > > > > > much as possible; in the next patch we'll play with different delay
> > > > > > settings.
> > > > > >
> > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > > ---
> > > > >
> > > > > Just some high level questions on a first/cursory pass..
> > > > >
> > > > > > Documentation/admin-guide/xfs.rst | 3
> > > > > > fs/xfs/scrub/common.c | 7 +
> > > > > > fs/xfs/xfs_icache.c | 332 ++++++++++++++++++++++++++++++++++---
> > > > > > fs/xfs/xfs_icache.h | 5 +
> > > > > > fs/xfs/xfs_inode.h | 19 ++
> > > > > > fs/xfs/xfs_log_recover.c | 7 +
> > > > > > fs/xfs/xfs_mount.c | 26 +++
> > > > > > fs/xfs/xfs_mount.h | 21 ++
> > > > > > fs/xfs/xfs_super.c | 53 ++++++
> > > > > > fs/xfs/xfs_trace.h | 50 +++++-
> > > > > > 10 files changed, 490 insertions(+), 33 deletions(-)
> > > > > >
> > > > > >
> > > > > > diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
> > > > > > index 8de008c0c5ad..f9b109bfc6a6 100644
> > > > > > --- a/Documentation/admin-guide/xfs.rst
> > > > > > +++ b/Documentation/admin-guide/xfs.rst
> > > > > > @@ -524,7 +524,8 @@ and the short name of the data device. They all can be found in:
> > > > > > mount time quotacheck.
> > > > > > xfs-gc Background garbage collection of disk space that have been
> > > > > > speculatively allocated beyond EOF or for staging copy on
> > > > > > - write operations.
> > > > > > + write operations; and files that are no longer linked into
> > > > > > + the directory tree.
> > > > > > ================ ===========
> > > > > >
> > > > > > For example, the knobs for the quotacheck workqueue for /dev/nvme0n1 would be
> > > > > ...
> > > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > > > index 4002f0b84401..e094c16aa8c5 100644
> > > > > > --- a/fs/xfs/xfs_icache.c
> > > > > > +++ b/fs/xfs/xfs_icache.c
> > > > > ...
> > > > > > @@ -262,6 +285,9 @@ xfs_perag_set_inode_tag(
> > > > > > case XFS_ICI_BLOCKGC_TAG:
> > > > > > xfs_blockgc_queue(pag);
> > > > > > break;
> > > > > > + case XFS_ICI_INODEGC_TAG:
> > > > > > + xfs_inodegc_queue(mp);
> > > > > > + break;
> > > > > > }
> > > > > >
> > > > > > trace_xfs_perag_set_inode_tag(mp, pag->pag_agno, tag, _RET_IP_);
> > > > > > @@ -338,28 +364,26 @@ xfs_inode_mark_reclaimable(
> > > > > > {
> > > > > > struct xfs_mount *mp = ip->i_mount;
> > > > > > struct xfs_perag *pag;
> > > > > > + unsigned int tag;
> > > > > > bool need_inactive = xfs_inode_needs_inactive(ip);
> > > > >
> > > > > Nit: I think we usually try to avoid function calls in the variable
> > > > > declarations like this..
> > > >
> > > > <shrug> For boolean flags that don't change value in the function body
> > > > I'd rather they be initialized at declaration time so that code
> > > > maintainers don't have to remember if a variable is initialized or not.
> > > > need_inactive has scope and a valid value wherever you are in the
> > > > function body.
> > > >
> > >
> > > I think the usual pattern is to init such variables right after the
> > > declarations so as to not obfuscate function calls (or at least I find
> > > that much cleaner).
> >
> > Very well, I'll change it back.
> >
> > > > > >
> > > > > > if (!need_inactive) {
> > > > > > /* Going straight to reclaim, so drop the dquots. */
> > > > > > xfs_qm_dqdetach(ip);
> > > > > > - } else {
> > > > > > - xfs_inactive(ip);
> > > > > > - }
> > > > > >
> > > > >
> > > > > Ok, so obviously this disconnects the unlink -> destroy path from the
> > > > > historical inactive -> reclaimable sequence. Is there any concern for
> > > > > example from a memory usage standpoint or anything where we might want
> > > > > to consider waiting on outstanding work here? E.g., if we had a large
> > > > > filesystem, limited memory and enough CPUs doing unlinks where the
> > > > > background reclaim simply can't keep up.
> > > >
> > > > So far in my evaluation, deferred inactivation has improved reclaim
> > > > behavior because evict_inodes can push all eligible inodes all the way
> > > > to IRECLAIMABLE without getting stuck behind xfs_inactive, and the
> > > > shrinker can free other cached inodes sooner. I think we're still
> > > > mostly reliant on xfs_inode_alloc looping when memory is low to throttle
> > > > the size of the inode cache, since I think mass deletions are what push
> > > > the inodegc code the hardest. If reclaim can't free inodes fast enough,
> > > > I think that means the AIL is running slowly, in which case both
> > > > frontend and background workers will block on the log.
> > > >
> > > > > I suppose it's a similar situation that inode reclaim is in, but we do
> > > > > have shrinker feedback to hook back in and actually free available
> > > > > objects. Do we have or need something like that if too many objects are
> > > > > stuck waiting on inactivation?
> > > >
> > > > There's no direct link between the shrinker and the inodegc worker,
> > > > which means that it can't push the inodegc worker to get even more
> > > > inodes to IRECLAIMABLE. I've played with racing directory tree
> > > > deletions of a filesystem with 8M inodes against fsstress on VMs with
> > > > 512M of memory, but so far I haven't been able to drive the system into
> > > > OOM any more easily than I could before this patch. The unlink calls
> > > > seem to get stuck under xfs_iget, like I expect.
> > > >
> > >
> > > Ok, that's sort of the scenario I was thinking about. I have a high
> > > memory/cpu count box I occasionally use for such purposes. I threw an
> > > fs_mark 1k file creation workload at a large fs on that system and let
> > > it run to -ENOSPC, then ran a 64x parallel rm -rf of the top-level
> > > subdirs. In the baseline case (a 5.12 based distro kernel), I see fairly
> > > stable xfs_inode slab usage throughout the removal. It mostly hovers
> > > from 150k-300k objects, sometimes spikes into the 500k-600k range, and
> > > the test completes as expected.
> >
> > Hm. Without any patches applied, I loaded up a filesystem with 10
> > million inodes in a VM with 512M of RAM and ran rm -rf /mnt/*. The
> > xfs_inode slab usage hovered around 20-30k objects, and occasionally
> > spiked to about 80k. The inode count would decrease at about 7k inodes
> > per second...
> >
> > > I see much different behavior from the same test on for-next plus this
> > > patch. xfs_inode slab usage continuously increases from the onset,
> > > eventually gets up to over 175m objects (~295GB worth of inodes), and
> > > the oom killer takes everything down. This occurs after only about 10%
> > > of the total consumed space has been removed.
> >
> > ...but as soon as I pushed the patch stack up to this particular patch,
> > the slab cache usage went straight up to about 200k objects and the VM
> > OOMed shortly thereafter. Until that happened, I could see the icount
> > going down about about 12k inodes per second, but that doesn't do users
> > much good. I observed that if I made xfs_inode_mark_reclaimable call
> > flush_work on the inodegc work, the behaviors went back to the steady
> > but slow deletion, though the VM still OOMed.
> >
> > I bet I'm messing up the shrinker accounting here, because an inode
> > that's queued for inactivation is no longer on the vfs s_inode_lru and
> > it's not tagged for reclaim, so a call to super_cache_count will not
> > know about the inode, decide there isn't any reclaim work to do for this
> > XFS and trip the OOM killer.
> >
>
> Sounds plausible.
Indeed, if I bump the return value by one if there's any inactivation
work to do, the problem seems to go away, at least if reclaim_inodes_nr
requeues the inodegc worker immediately.
> > The other thing we might want to do is throttle the frontend so it won't
> > get too far ahead of the background workers. If xfs_reclaim_inodes_nr
> > requeues the worker immediately and xfs_inode_mark_reclaimable calls
> > flush_work on the background worker, the OOMs seem to go away.
> >
>
> Yes, that seems like the primary disconnect. At some point and somehow,
> I think the unlink work needs to throttle against the debt of background
> work it creates by just speeding along dumping fully unlinked inodes on
> the "needs inactivation" workqueue. From there, the rate of unlinks or
> frees might not matter as much, because that could always change based
> on cpu count, RAM, storage, etc.
<nod> At least for the single-threaded inodegc case, it seems to work
all right if reclaim_inodes_nr() can set a flag to indicate that we hit
memory pressure, kick the inodegc worker immediately, and then
mark_reclaimable can flush_work() if the inode needs inactivation, the
caller is not itself in a memory reclaim context, and the pressure flag
is set.
By 'all right', I mean that with no patches and 512M of RAM, the VM will
OOM on a deltree of my 10 million inode fs; bumping the memory to 960M
makes the OOM go away; and it doesn't come back once I roll all the
patches back onto the branch.
I don't know that this really qualifies as winning, though the test fs
has a 1GB log, so with DRAM < LOG I guess that implies we can OOM long
before anyone throttles on the log space.
(xfs_repair -n will also sometimes OOM on this filesystem when the VM
only has 512M of RAM...)
> > > I've not dug into the behavior enough to see where things are getting
> > > backed up during this test, and I'm sure that this is not an
> > > ideal/typical configuration (and increased slab usage is probably to be
> > > expected given the nature of the change), but it does seem like we might
> > > need some kind of adjustment in place to manage this kind of breakdown
> > > scenario.
> >
> > When I pushed the stack up to the parallel inactivation series, the
> > behavior changes again -- this time it takes a lot longer to push the VM
> > to OOM. I think what's going on is that once I switch this to per-AG
> > workers, the workers can keep up with the frontend that's inactivating
> > the inodes. At least until those workers start competing with the
> > unlink() activity for log space, and then they fall behind and the box
> > OOMs again. Unfortunately, the changes I made above didn't prevent the
> > OOM.
> >
>
> Hm, Ok. So it sounds like that's a significant optimization but not
> necessarily a fundamental fix. I.e., an environment where CPUs greatly
> exceeds the AG count might still run into this kind of thing..? I can
> repeat the test with subsequent patches to explore that, but my test
> system is currently occupied with testing Dave's latest logging patches.
Hehe. I /think/ I wrote an implementation bug in the parallel inodegc
patch when I pushed it back on the stack. Though TBH I haven't made
that much progress with the single-threaded case since I've also been
running around in circles with the recent logging problems.
> > > > The VFS still has this (erroneous) notion that calling ->destroy_inode
> > > > will directly reclaim inodes from XFS, but the only difference from
> > > > today's code is that it can take longer for inodes to get to that state.
> > > > To fix that, I've also thought about making xfs_fs_free_cached_objects
> > > > requeue the inodegc worker immediately (but without waiting for it), but
> > > > so far I haven't needed to do that to avoid trouble.
> > > >
> > > > > > - if (!XFS_FORCED_SHUTDOWN(mp) && ip->i_delayed_blks) {
> > > > > > - xfs_check_delalloc(ip, XFS_DATA_FORK);
> > > > > > - xfs_check_delalloc(ip, XFS_COW_FORK);
> > > > > > - ASSERT(0);
> > > > > > + if (!XFS_FORCED_SHUTDOWN(mp) && ip->i_delayed_blks) {
> > > > > > + xfs_check_delalloc(ip, XFS_DATA_FORK);
> > > > > > + xfs_check_delalloc(ip, XFS_COW_FORK);
> > > > > > + ASSERT(0);
> > > > > > + }
> > > > > > }
> > > > > >
> > > > > > XFS_STATS_INC(mp, vn_reclaim);
> > > > > >
> > > > > > /*
> > > > > > - * We should never get here with one of the reclaim flags already set.
> > > > > > + * We should never get here with any of the reclaim flags already set.
> > > > > > */
> > > > > > - ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIMABLE));
> > > > > > - ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_IRECLAIM));
> > > > > > + ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_ALL_IRECLAIM_FLAGS));
> > > > > >
> > > > > > /*
> > > > > > * We always use background reclaim here because even if the inode is
> > > > > ...
> > > > > > @@ -889,22 +936,32 @@ xfs_dqrele_igrab(
> > > > > >
> > > > > > /*
> > > > > > * Skip inodes that are anywhere in the reclaim machinery because we
> > > > > > - * drop dquots before tagging an inode for reclamation.
> > > > > > + * drop dquots before tagging an inode for reclamation. If the inode
> > > > > > + * is being inactivated, skip it because inactivation will drop the
> > > > > > + * dquots for us.
> > > > > > */
> > > > > > - if (ip->i_flags & (XFS_IRECLAIM | XFS_IRECLAIMABLE))
> > > > > > + if (ip->i_flags & (XFS_IRECLAIM | XFS_IRECLAIMABLE | XFS_INACTIVATING))
> > > > > > goto out_unlock;
> > > > >
> > > > > I was a little curious why we'd skip reclaimable inodes here but not
> > > > > need_inactive..
> > > > >
> > > > > >
> > > > > > /*
> > > > > > - * The inode looks alive; try to grab a VFS reference so that it won't
> > > > > > - * get destroyed. If we got the reference, return true to say that
> > > > > > - * we grabbed the inode.
> > > > > > + * If the inode is queued but not undergoing inactivation, set the
> > > > > > + * inactivating flag so everyone will leave it alone and return true
> > > > > > + * to say that we are taking ownership of it.
> > > > > > + *
> > > > > > + * Otherwise, the inode looks alive; try to grab a VFS reference so
> > > > > > + * that it won't get destroyed. If we got the reference, return true
> > > > > > + * to say that we grabbed the inode.
> > > > > > *
> > > > > > * If we can't get the reference, then we know the inode had its VFS
> > > > > > * state torn down and hasn't yet entered the reclaim machinery. Since
> > > > > > * we also know that dquots are detached from an inode before it enters
> > > > > > * reclaim, we can skip the inode.
> > > > > > */
> > > > > > - ret = igrab(VFS_I(ip)) != NULL;
> > > > > > + ret = true;
> > > > > > + if (ip->i_flags & XFS_NEED_INACTIVE)
> > > > > > + ip->i_flags |= XFS_INACTIVATING;
> > > > > > + else if (!igrab(VFS_I(ip)))
> > > > > > + ret = false;
> > > > >
> > > > > ... but I guess we process the latter (in that we release the dquots and
> > > > > revert the just added inactivating state). Is this a functional
> > > > > requirement or considered an optimization? FWIW, it would be helpful if
> > > > > the comment more explained why we did this as such as opposed to just
> > > > > reflecting what the code already does.
> > > >
> > > > It's an optimization to avoid stalling quotaoff on NEEDS_INACTIVE
> > > > inodes. This means that quotaoff can now cause the background inodegc
> > > > worker to have to loop again, but seeing as quotaoff can pin the log
> > > > tail, I decided that was a reasonable tradeoff.
> > > > How does this rewording sound to you?
> > > >
> > >
> > > Ok. I guess it would be nice to see some separation between core
> > > functionality and this kind of optimization. It would make things more
> > > clear to review for one, but also provides a trail of documentation and
> > > intent if something like the approach to handling dquots causes some
> > > problem in the future and the poor developer who might have to analyze
> > > it otherwise has to navigate how this fits into the broader functional
> > > rework. But I know people tend to bristle lately over factoring and
> > > readability feedback and this is v7, so that's just my .02. That aside,
> > > the comment below looks fine to me.
> >
> > I don't mind breaking up a large patch into smaller more targeted ones,
> > because that's directly relevant to the code changes being proposed. It
> > also isn't a terribly large change. It's more things like being told to
> > fix or remove quotaoff (or that whole mess over building latency qos
> > functions to decide when to defer ioend clear_pagewriteback to a
> > workqueue) that bother me.
> >
>
> IMO, it's more important to separate things based on mechanism/policy or
> core function vs. optimizations and optional enhancements as opposed to
> things like patch size. I suppose sometimes a patch can just end up big
> enough that it's just easier to see a single functional change split up
> into smaller chunks, but it's not clear to me that is the case here.
> *shrug* Anyways, my comment is more around the mix of content than the
> size of the patch.
<nod> Ultimately, I did kick the optimization out to a separate patch.
I will probably end up modifying this patch to flush_work from
mark_reclaimable, then add a new patch to adjust the behavior to what we
really want (i.e. having a real feedback loop based on memory reclaim
trigger) so that we can capture some performance benefit.
> With regard to the quotaoff thing, it seems like this work has to
> include some nasty logic (here and in a subsequent patch) just to
> prevent quotaoff from becoming more deadlock prone than it already is.
Yep.
> Given that we have outstanding work (that both Dave and I have
> previously agreed is viable) to make quotaoff not deadlock prone, can we
> consider whether incorporating that work would simplify this work such
> that it can drop much of that logic?
As I've said before, I only keep dragging the quotaoff changes along in
this series because I don't want the resolution of that question to
become a prerequisite for landing this change...
> IIRC, the last time this came up Christoph suggested turning off
> quotaoff entirely, to which we all agreed. A patchset came, had some
> issues and hasn't been seen again.
...because that patchset came, had issues, and hasn't been resubmitted.
I can only build my development tree off of a stable point on the
upstream branch, so whatever I touch in quotaoff here is more or less
the bare minimum needed to keep fstests happy.
> So as far I'm concerned, if we're
> willing to turn quotaoff entirely, I don't care too much about if
> something like deferred inactivation makes quotaoff incrementally more
> slow (as opposed to turning it into a potential deadlock monster).
If we dumped quotaoff entirely, I'd gut these parts from the patchset.
> > > > /*
> > > > * If the inode is queued but not currently undergoing
> > > > * inactivation, we want to slip in to drop the dquots ASAP
> > > > * because quotaoff can pin the log tail and cause log livelock.
> > > > * Avoiding that is worth potentially forcing the inodegc worker
> > > > * to make another pass. Set INACTIVATING to prevent inodegc
> > > > * and iget from touching the inode.
> > > > *
> > > > * Otherwise, the inode looks alive; try to grab a VFS reference
> > > > * so that it won't get destroyed. If we got the reference,
> > > > * return true to say that we grabbed the inode.
> > > > *
> > > > * If we can't get the reference, then we know the inode had its
> > > > * VFS state torn down and hasn't yet entered the reclaim
> > > > * machinery. Since we also know that dquots are detached from
> > > > * an inode before it enters reclaim, we can skip the inode.
> > > > */
> > > >
> > > > > >
> > > > > > out_unlock:
> > > > > > spin_unlock(&ip->i_flags_lock);
> > > > > > @@ -917,6 +974,8 @@ xfs_dqrele_inode(
> > > > > > struct xfs_inode *ip,
> > > > > > struct xfs_icwalk *icw)
> > > > > > {
> > > > > > + bool live_inode;
> > > > > > +
> > > > > > if (xfs_iflags_test(ip, XFS_INEW))
> > > > > > xfs_inew_wait(ip);
> > > > > >
> > > > > > @@ -934,7 +993,19 @@ xfs_dqrele_inode(
> > > > > > ip->i_pdquot = NULL;
> > > > > > }
> > > > > > xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > > - xfs_irele(ip);
> > > > > > +
> > > > > > + /*
> > > > > > + * If we set INACTIVATING earlier to prevent this inode from being
> > > > > > + * touched, clear that state to let the inodegc claim it. Otherwise,
> > > > > > + * it's a live inode and we need to release it.
> > > > > > + */
> > > > > > + spin_lock(&ip->i_flags_lock);
> > > > > > + live_inode = !(ip->i_flags & XFS_INACTIVATING);
> > > > > > + ip->i_flags &= ~XFS_INACTIVATING;
> > > > > > + spin_unlock(&ip->i_flags_lock);
> > > > > > +
> > > > > > + if (live_inode)
> > > > > > + xfs_irele(ip);
> > > > > > }
> > > > > >
> > > > > > /*
> > > > > ...
> > > > > > +/*
> > > > > > + * Force all currently queued inode inactivation work to run immediately, and
> > > > > > + * wait for the work to finish.
> > > > > > + */
> > > > > > +void
> > > > > > +xfs_inodegc_flush(
> > > > > > + struct xfs_mount *mp)
> > > > > > +{
> > > > > > + trace_xfs_inodegc_flush(mp, 0, _RET_IP_);
> > > > > > + flush_delayed_work(&mp->m_inodegc_work);
> > > > > > +}
> > > > > > +
> > > > > > +/* Disable the inode inactivation background worker and wait for it to stop. */
> > > > > > +void
> > > > > > +xfs_inodegc_stop(
> > > > > > + struct xfs_mount *mp)
> > > > > > +{
> > > > > > + if (!test_and_clear_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
> > > > > > + return;
> > > > > > +
> > > > > > + cancel_delayed_work_sync(&mp->m_inodegc_work);
> > > > > > + trace_xfs_inodegc_stop(mp, 0, _RET_IP_);
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Enable the inode inactivation background worker and schedule deferred inode
> > > > > > + * inactivation work if there is any.
> > > > > > + */
> > > > > > +void
> > > > > > +xfs_inodegc_start(
> > > > > > + struct xfs_mount *mp)
> > > > > > +{
> > > > > > + if (test_and_set_bit(XFS_OPFLAG_INODEGC_RUNNING_BIT, &mp->m_opflags))
> > > > > > + return;
> > > > > > +
> > > > > > + trace_xfs_inodegc_start(mp, 0, _RET_IP_);
> > > > > > + xfs_inodegc_queue(mp);
> > > > > > +}
> > > > > > +
> > > > >
> > > > > This seems like a decent amount of boilerplate. Is all of this really
> > > > > necessary and/or is it worth it just for the added tracepoints?
> > > >
> > > > I've gotten a lot of use out of the tracepoints to debug all sorts of
> > > > weird mis-interactions between freeze, remounts, and the inodegc worker.
> > > >
> > >
> > > Is that due to heavy development activity or something you realistically
> > > expect to see used in production? I've nothing against tracepoints, but
> > > I almost always have custom tracepoints during development that are
> > > temporarily useful but eventually removed, and do question the tradeoff
> > > of creating a bunch of wrapper code just for the tracepoints.
> >
> > I've found myself repeatedly adding them to debug some unexpected
> > behavior, then removing them to polish things before publishing, and
> > finally got fed up with doing this over and over and over.
> >
>
> Fair enough. That's something that can be cleaned up down the road, if
> needed.
>
> > > > > ISTM
> > > > > that the start/stop thing is already technically racy wrt to queue
> > > > > activity and the bit check in the worker might be sufficient to avoid
> > > > > transaction/freeze deadlocks or whatever the original intent was. Hm?
> > > >
> > > > The bitflag isn't racy, though why isn't obvious. There are seven
> > > > places where we change the INODEGC_RUNNING state:
> > > >
> > > > - For freeze and ro remount, the VFS will take s_umount before calling
> > > > into XFS, and XFS then calls inodegc_flush to complete all the queued
> > > > work before changing the bit state.
> > > >
> > > > During the v6 review Dave wondered if we even needed the INODEGC
> > > > RUNNING flag, and I concluded that we need it for freeze, since we
> > > > allow inodes to sleep under xfs_inactive() while trying to allocate a
> > > > transaction. Under the new scheme, inodes can be queued for
> > > > inactivation on a frozen fs, but we need xfs_inodegc_queue to know not
> > > > to kick the worker without having to mess around with VFS locks to
> > > > check the freeze state.
> > > >
> > >
> > > Not sure I'm clear on what you describe above, but I'm not really
> > > concerned about the existence of the state bit as much. The above refers
> > > to xfs_inodegc_queue(), which already checks the bit, whereas my comment
> > > was just more around the need for the xfs_inodegc_start/stop() wrappers.
> >
> > Now I'm completely confused about what you're asking about here. Are
> > you suggesting that I could set and clear the INODEGC RUNNING bit
> > directly from xfs_fs_freeze and xfs_fs_unfreeze, change
> > xfs_inodegc_worker to bail out if INODEGC_RUNNING is set, and then
> > replace all the inodegc_stop/start calls with direct calls to
> > cancel_work_delayed_sync/xfs_inodegc_queue?
> >
>
> Eh, I'm probably getting lost in the weeds here. My primary concern was
> the proliferation of unnecessary boilerplate code and if we could manage
> the background/workqueue task like we manage other such background
> workers with respect to freeze and whatnot. If the helpers are going to
> stay around anyways, then this doesn't matter that much. Feel free to
> disregard this feedback.
Ah, ok. This is totally one of those things where I want the
tracepoints now because they're really helpful for debugging the
machinery, but if it lands and stabilizes for a few months I won't
object to them disappearing in one of the "get rid of trivial helpers"
cleanups that seem so popular in these parts. ;)
--D
>
> Brian
>
> > --D
> >
> > >
> > > Brian
> > >
> > > > - For thaw and rw remount, the VFS takes s_umount and calls XFS, which
> > > > re-enables the worker.
> > > >
> > > > - Mount and unmount run in isolation (since we can't have any open files
> > > > on the filesystem) so there won't be any other threads trying to call
> > > > into the filesystem.
> > > >
> > > > - I think xchk_{start,stop}_reaping is racy though. That code probably
> > > > has to sb_start_write to prevent freeze or ro remount from messing
> > > > with the INODEGC_RUNNING state. That would still leave a hole if
> > > > fscounters scrub races with a rw remount and immediately files start
> > > > getting deleted, but maybe that's just not worth worrying about.
> > > > Scrub will try again a few times, and in djwong-dev I solve that by
> > > > freezing the filesystem if the unlocked fscounters scrub fails.
> > > >
> > > > --D
> > > >
> > > > >
> > > > > Brian
> > > > >
> > > > > > /* XFS Inode Cache Walking Code */
> > > > > >
> > > > > > /*
> > > > > > @@ -1708,6 +1979,8 @@ xfs_icwalk_igrab(
> > > > > > return xfs_blockgc_igrab(ip);
> > > > > > case XFS_ICWALK_RECLAIM:
> > > > > > return xfs_reclaim_igrab(ip, icw);
> > > > > > + case XFS_ICWALK_INODEGC:
> > > > > > + return xfs_inodegc_igrab(ip);
> > > > > > default:
> > > > > > return false;
> > > > > > }
> > > > > > @@ -1736,6 +2009,9 @@ xfs_icwalk_process_inode(
> > > > > > case XFS_ICWALK_RECLAIM:
> > > > > > xfs_reclaim_inode(ip, pag);
> > > > > > break;
> > > > > > + case XFS_ICWALK_INODEGC:
> > > > > > + xfs_inodegc_inactivate(ip, pag, icw);
> > > > > > + break;
> > > > > > }
> > > > > > return error;
> > > > > > }
> > > > > > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> > > > > > index 00dc98a92835..840eac06a71b 100644
> > > > > > --- a/fs/xfs/xfs_icache.h
> > > > > > +++ b/fs/xfs/xfs_icache.h
> > > > > > @@ -80,4 +80,9 @@ int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
> > > > > > void xfs_blockgc_stop(struct xfs_mount *mp);
> > > > > > void xfs_blockgc_start(struct xfs_mount *mp);
> > > > > >
> > > > > > +void xfs_inodegc_worker(struct work_struct *work);
> > > > > > +void xfs_inodegc_flush(struct xfs_mount *mp);
> > > > > > +void xfs_inodegc_stop(struct xfs_mount *mp);
> > > > > > +void xfs_inodegc_start(struct xfs_mount *mp);
> > > > > > +
> > > > > > #endif
> > > > > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > > > > index e3137bbc7b14..fa5be0d071ad 100644
> > > > > > --- a/fs/xfs/xfs_inode.h
> > > > > > +++ b/fs/xfs/xfs_inode.h
> > > > > > @@ -240,6 +240,7 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
> > > > > > #define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */
> > > > > > #define XFS_IPINNED (1 << __XFS_IPINNED_BIT)
> > > > > > #define XFS_IEOFBLOCKS (1 << 9) /* has the preallocblocks tag set */
> > > > > > +#define XFS_NEED_INACTIVE (1 << 10) /* see XFS_INACTIVATING below */
> > > > > > /*
> > > > > > * If this unlinked inode is in the middle of recovery, don't let drop_inode
> > > > > > * truncate and free the inode. This can happen if we iget the inode during
> > > > > > @@ -248,6 +249,21 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
> > > > > > #define XFS_IRECOVERY (1 << 11)
> > > > > > #define XFS_ICOWBLOCKS (1 << 12)/* has the cowblocks tag set */
> > > > > >
> > > > > > +/*
> > > > > > + * If we need to update on-disk metadata before this IRECLAIMABLE inode can be
> > > > > > + * freed, then NEED_INACTIVE will be set. Once we start the updates, the
> > > > > > + * INACTIVATING bit will be set to keep iget away from this inode. After the
> > > > > > + * inactivation completes, both flags will be cleared and the inode is a
> > > > > > + * plain old IRECLAIMABLE inode.
> > > > > > + */
> > > > > > +#define XFS_INACTIVATING (1 << 13)
> > > > > > +
> > > > > > +/* All inode state flags related to inode reclaim. */
> > > > > > +#define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
> > > > > > + XFS_IRECLAIM | \
> > > > > > + XFS_NEED_INACTIVE | \
> > > > > > + XFS_INACTIVATING)
> > > > > > +
> > > > > > /*
> > > > > > * Per-lifetime flags need to be reset when re-using a reclaimable inode during
> > > > > > * inode lookup. This prevents unintended behaviour on the new inode from
> > > > > > @@ -255,7 +271,8 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
> > > > > > */
> > > > > > #define XFS_IRECLAIM_RESET_FLAGS \
> > > > > > (XFS_IRECLAIMABLE | XFS_IRECLAIM | \
> > > > > > - XFS_IDIRTY_RELEASE | XFS_ITRUNCATED)
> > > > > > + XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
> > > > > > + XFS_INACTIVATING)
> > > > > >
> > > > > > /*
> > > > > > * Flags for inode locking.
> > > > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > > > > index 1227503d2246..9d8fc85bd28d 100644
> > > > > > --- a/fs/xfs/xfs_log_recover.c
> > > > > > +++ b/fs/xfs/xfs_log_recover.c
> > > > > > @@ -2784,6 +2784,13 @@ xlog_recover_process_iunlinks(
> > > > > > }
> > > > > > xfs_buf_rele(agibp);
> > > > > > }
> > > > > > +
> > > > > > + /*
> > > > > > + * Flush the pending unlinked inodes to ensure that the inactivations
> > > > > > + * are fully completed on disk and the incore inodes can be reclaimed
> > > > > > + * before we signal that recovery is complete.
> > > > > > + */
> > > > > > + xfs_inodegc_flush(mp);
> > > > > > }
> > > > > >
> > > > > > STATIC void
> > > > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > > > > index c3a96fb3ad80..ab65a14e51e6 100644
> > > > > > --- a/fs/xfs/xfs_mount.c
> > > > > > +++ b/fs/xfs/xfs_mount.c
> > > > > > @@ -514,7 +514,8 @@ xfs_check_summary_counts(
> > > > > > * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
> > > > > > * internal inode structures can be sitting in the CIL and AIL at this point,
> > > > > > * so we need to unpin them, write them back and/or reclaim them before unmount
> > > > > > - * can proceed.
> > > > > > + * can proceed. In other words, callers are required to have inactivated all
> > > > > > + * inodes.
> > > > > > *
> > > > > > * An inode cluster that has been freed can have its buffer still pinned in
> > > > > > * memory because the transaction is still sitting in a iclog. The stale inodes
> > > > > > @@ -546,6 +547,7 @@ xfs_unmount_flush_inodes(
> > > > > > mp->m_flags |= XFS_MOUNT_UNMOUNTING;
> > > > > >
> > > > > > xfs_ail_push_all_sync(mp->m_ail);
> > > > > > + xfs_inodegc_stop(mp);
> > > > > > cancel_delayed_work_sync(&mp->m_reclaim_work);
> > > > > > xfs_reclaim_inodes(mp);
> > > > > > xfs_health_unmount(mp);
> > > > > > @@ -782,6 +784,9 @@ xfs_mountfs(
> > > > > > if (error)
> > > > > > goto out_log_dealloc;
> > > > > >
> > > > > > + /* Enable background inode inactivation workers. */
> > > > > > + xfs_inodegc_start(mp);
> > > > > > +
> > > > > > /*
> > > > > > * Get and sanity-check the root inode.
> > > > > > * Save the pointer to it in the mount structure.
> > > > > > @@ -936,6 +941,15 @@ xfs_mountfs(
> > > > > > xfs_irele(rip);
> > > > > > /* Clean out dquots that might be in memory after quotacheck. */
> > > > > > xfs_qm_unmount(mp);
> > > > > > +
> > > > > > + /*
> > > > > > + * Inactivate all inodes that might still be in memory after a log
> > > > > > + * intent recovery failure so that reclaim can free them. Metadata
> > > > > > + * inodes and the root directory shouldn't need inactivation, but the
> > > > > > + * mount failed for some reason, so pull down all the state and flee.
> > > > > > + */
> > > > > > + xfs_inodegc_flush(mp);
> > > > > > +
> > > > > > /*
> > > > > > * Flush all inode reclamation work and flush the log.
> > > > > > * We have to do this /after/ rtunmount and qm_unmount because those
> > > > > > @@ -983,6 +997,16 @@ xfs_unmountfs(
> > > > > > uint64_t resblks;
> > > > > > int error;
> > > > > >
> > > > > > + /*
> > > > > > + * Perform all on-disk metadata updates required to inactivate inodes
> > > > > > + * that the VFS evicted earlier in the unmount process. Freeing inodes
> > > > > > + * and discarding CoW fork preallocations can cause shape changes to
> > > > > > + * the free inode and refcount btrees, respectively, so we must finish
> > > > > > + * this before we discard the metadata space reservations. Metadata
> > > > > > + * inodes and the root directory do not require inactivation.
> > > > > > + */
> > > > > > + xfs_inodegc_flush(mp);
> > > > > > +
> > > > > > xfs_blockgc_stop(mp);
> > > > > > xfs_fs_unreserve_ag_blocks(mp);
> > > > > > xfs_qm_unmount_quotas(mp);
> > > > > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > > > > index c78b63fe779a..dc906b78e24c 100644
> > > > > > --- a/fs/xfs/xfs_mount.h
> > > > > > +++ b/fs/xfs/xfs_mount.h
> > > > > > @@ -154,6 +154,13 @@ typedef struct xfs_mount {
> > > > > > uint8_t m_rt_checked;
> > > > > > uint8_t m_rt_sick;
> > > > > >
> > > > > > + /*
> > > > > > + * This atomic bitset controls flags that alter the behavior of the
> > > > > > + * filesystem. Use only the atomic bit helper functions here; see
> > > > > > + * XFS_OPFLAG_* for information about the actual flags.
> > > > > > + */
> > > > > > + unsigned long m_opflags;
> > > > > > +
> > > > > > /*
> > > > > > * End of read-mostly variables. Frequently written variables and locks
> > > > > > * should be placed below this comment from now on. The first variable
> > > > > > @@ -184,6 +191,7 @@ typedef struct xfs_mount {
> > > > > > uint64_t m_resblks_avail;/* available reserved blocks */
> > > > > > uint64_t m_resblks_save; /* reserved blks @ remount,ro */
> > > > > > struct delayed_work m_reclaim_work; /* background inode reclaim */
> > > > > > + struct delayed_work m_inodegc_work; /* background inode inactive */
> > > > > > struct xfs_kobj m_kobj;
> > > > > > struct xfs_kobj m_error_kobj;
> > > > > > struct xfs_kobj m_error_meta_kobj;
> > > > > > @@ -258,6 +266,19 @@ typedef struct xfs_mount {
> > > > > > #define XFS_MOUNT_DAX_ALWAYS (1ULL << 26)
> > > > > > #define XFS_MOUNT_DAX_NEVER (1ULL << 27)
> > > > > >
> > > > > > +/*
> > > > > > + * Operation flags -- each entry here is a bit index into m_opflags and is
> > > > > > + * not itself a flag value. Use the atomic bit functions to access.
> > > > > > + */
> > > > > > +enum xfs_opflag_bits {
> > > > > > + /*
> > > > > > + * If set, background inactivation worker threads will be scheduled to
> > > > > > + * process queued inodegc work. If not, queued inodes remain in memory
> > > > > > + * waiting to be processed.
> > > > > > + */
> > > > > > + XFS_OPFLAG_INODEGC_RUNNING_BIT = 0,
> > > > > > +};
> > > > > > +
> > > > > > /*
> > > > > > * Max and min values for mount-option defined I/O
> > > > > > * preallocation sizes.
> > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > > > index dd1ee333dcb3..0b01d9499395 100644
> > > > > > --- a/fs/xfs/xfs_super.c
> > > > > > +++ b/fs/xfs/xfs_super.c
> > > > > > @@ -714,6 +714,8 @@ xfs_fs_sync_fs(
> > > > > > {
> > > > > > struct xfs_mount *mp = XFS_M(sb);
> > > > > >
> > > > > > + trace_xfs_fs_sync_fs(mp, wait, _RET_IP_);
> > > > > > +
> > > > > > /*
> > > > > > * Doing anything during the async pass would be counterproductive.
> > > > > > */
> > > > > > @@ -730,6 +732,25 @@ xfs_fs_sync_fs(
> > > > > > flush_delayed_work(&mp->m_log->l_work);
> > > > > > }
> > > > > >
> > > > > > + /*
> > > > > > + * Flush all deferred inode inactivation work so that the free space
> > > > > > + * counters will reflect recent deletions. Do not force the log again
> > > > > > + * because log recovery can restart the inactivation from the info that
> > > > > > + * we just wrote into the ondisk log.
> > > > > > + *
> > > > > > + * For regular operation this isn't strictly necessary since we aren't
> > > > > > + * required to guarantee that unlinking frees space immediately, but
> > > > > > + * that is how XFS historically behaved.
> > > > > > + *
> > > > > > + * If, however, the filesystem is at FREEZE_PAGEFAULTS, this is our
> > > > > > + * last chance to complete the inactivation work before the filesystem
> > > > > > + * freezes and the log is quiesced. The background worker will not
> > > > > > + * activate again until the fs is thawed because the VFS won't evict
> > > > > > + * any more inodes until freeze_super drops s_umount and we disable the
> > > > > > + * worker in xfs_fs_freeze.
> > > > > > + */
> > > > > > + xfs_inodegc_flush(mp);
> > > > > > +
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > @@ -844,6 +865,17 @@ xfs_fs_freeze(
> > > > > > */
> > > > > > flags = memalloc_nofs_save();
> > > > > > xfs_blockgc_stop(mp);
> > > > > > +
> > > > > > + /*
> > > > > > + * Stop the inodegc background worker. freeze_super already flushed
> > > > > > + * all pending inodegc work when it sync'd the filesystem after setting
> > > > > > + * SB_FREEZE_PAGEFAULTS, and it holds s_umount, so we know that inodes
> > > > > > + * cannot enter xfs_fs_destroy_inode until the freeze is complete.
> > > > > > + * If the filesystem is read-write, inactivated inodes will queue but
> > > > > > + * the worker will not run until the filesystem thaws or unmounts.
> > > > > > + */
> > > > > > + xfs_inodegc_stop(mp);
> > > > > > +
> > > > > > xfs_save_resvblks(mp);
> > > > > > ret = xfs_log_quiesce(mp);
> > > > > > memalloc_nofs_restore(flags);
> > > > > > @@ -859,6 +891,14 @@ xfs_fs_unfreeze(
> > > > > > xfs_restore_resvblks(mp);
> > > > > > xfs_log_work_queue(mp);
> > > > > > xfs_blockgc_start(mp);
> > > > > > +
> > > > > > + /*
> > > > > > + * Don't reactivate the inodegc worker on a readonly filesystem because
> > > > > > + * inodes are sent directly to reclaim.
> > > > > > + */
> > > > > > + if (!(mp->m_flags & XFS_MOUNT_RDONLY))
> > > > > > + xfs_inodegc_start(mp);
> > > > > > +
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > @@ -1665,6 +1705,9 @@ xfs_remount_rw(
> > > > > > if (error && error != -ENOSPC)
> > > > > > return error;
> > > > > >
> > > > > > + /* Re-enable the background inode inactivation worker. */
> > > > > > + xfs_inodegc_start(mp);
> > > > > > +
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > @@ -1687,6 +1730,15 @@ xfs_remount_ro(
> > > > > > return error;
> > > > > > }
> > > > > >
> > > > > > + /*
> > > > > > + * Stop the inodegc background worker. xfs_fs_reconfigure already
> > > > > > + * flushed all pending inodegc work when it sync'd the filesystem.
> > > > > > + * The VFS holds s_umount, so we know that inodes cannot enter
> > > > > > + * xfs_fs_destroy_inode during a remount operation. In readonly mode
> > > > > > + * we send inodes straight to reclaim, so no inodes will be queued.
> > > > > > + */
> > > > > > + xfs_inodegc_stop(mp);
> > > > > > +
> > > > > > /* Free the per-AG metadata reservation pool. */
> > > > > > error = xfs_fs_unreserve_ag_blocks(mp);
> > > > > > if (error) {
> > > > > > @@ -1810,6 +1862,7 @@ static int xfs_init_fs_context(
> > > > > > mutex_init(&mp->m_growlock);
> > > > > > INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker);
> > > > > > INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
> > > > > > + INIT_DELAYED_WORK(&mp->m_inodegc_work, xfs_inodegc_worker);
> > > > > > mp->m_kobj.kobject.kset = xfs_kset;
> > > > > > /*
> > > > > > * We don't create the finobt per-ag space reservation until after log
> > > > > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > > > > index d0b4799ad1e6..ca9bfbd28886 100644
> > > > > > --- a/fs/xfs/xfs_trace.h
> > > > > > +++ b/fs/xfs/xfs_trace.h
> > > > > > @@ -156,6 +156,45 @@ DEFINE_PERAG_REF_EVENT(xfs_perag_put);
> > > > > > DEFINE_PERAG_REF_EVENT(xfs_perag_set_inode_tag);
> > > > > > DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inode_tag);
> > > > > >
> > > > > > +DECLARE_EVENT_CLASS(xfs_fs_class,
> > > > > > + TP_PROTO(struct xfs_mount *mp, int data, unsigned long caller_ip),
> > > > > > + TP_ARGS(mp, data, caller_ip),
> > > > > > + TP_STRUCT__entry(
> > > > > > + __field(dev_t, dev)
> > > > > > + __field(unsigned long long, mflags)
> > > > > > + __field(unsigned long, opflags)
> > > > > > + __field(unsigned long, sbflags)
> > > > > > + __field(int, data)
> > > > > > + __field(unsigned long, caller_ip)
> > > > > > + ),
> > > > > > + TP_fast_assign(
> > > > > > + __entry->dev = mp->m_super->s_dev;
> > > > > > + __entry->mflags = mp->m_flags;
> > > > > > + __entry->opflags = mp->m_opflags;
> > > > > > + __entry->sbflags = mp->m_super->s_flags;
> > > > > > + __entry->data = data;
> > > > > > + __entry->caller_ip = caller_ip;
> > > > > > + ),
> > > > > > + TP_printk("dev %d:%d flags 0x%llx opflags 0x%lx sflags 0x%lx data %d caller %pS",
> > > > > > + MAJOR(__entry->dev), MINOR(__entry->dev),
> > > > > > + __entry->mflags,
> > > > > > + __entry->opflags,
> > > > > > + __entry->sbflags,
> > > > > > + __entry->data,
> > > > > > + (char *)__entry->caller_ip)
> > > > > > +);
> > > > > > +
> > > > > > +#define DEFINE_FS_EVENT(name) \
> > > > > > +DEFINE_EVENT(xfs_fs_class, name, \
> > > > > > + TP_PROTO(struct xfs_mount *mp, int data, unsigned long caller_ip), \
> > > > > > + TP_ARGS(mp, data, caller_ip))
> > > > > > +DEFINE_FS_EVENT(xfs_inodegc_flush);
> > > > > > +DEFINE_FS_EVENT(xfs_inodegc_start);
> > > > > > +DEFINE_FS_EVENT(xfs_inodegc_stop);
> > > > > > +DEFINE_FS_EVENT(xfs_inodegc_queue);
> > > > > > +DEFINE_FS_EVENT(xfs_inodegc_worker);
> > > > > > +DEFINE_FS_EVENT(xfs_fs_sync_fs);
> > > > > > +
> > > > > > DECLARE_EVENT_CLASS(xfs_ag_class,
> > > > > > TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno),
> > > > > > TP_ARGS(mp, agno),
> > > > > > @@ -615,14 +654,17 @@ DECLARE_EVENT_CLASS(xfs_inode_class,
> > > > > > TP_STRUCT__entry(
> > > > > > __field(dev_t, dev)
> > > > > > __field(xfs_ino_t, ino)
> > > > > > + __field(unsigned long, iflags)
> > > > > > ),
> > > > > > TP_fast_assign(
> > > > > > __entry->dev = VFS_I(ip)->i_sb->s_dev;
> > > > > > __entry->ino = ip->i_ino;
> > > > > > + __entry->iflags = ip->i_flags;
> > > > > > ),
> > > > > > - TP_printk("dev %d:%d ino 0x%llx",
> > > > > > + TP_printk("dev %d:%d ino 0x%llx iflags 0x%lx",
> > > > > > MAJOR(__entry->dev), MINOR(__entry->dev),
> > > > > > - __entry->ino)
> > > > > > + __entry->ino,
> > > > > > + __entry->iflags)
> > > > > > )
> > > > > >
> > > > > > #define DEFINE_INODE_EVENT(name) \
> > > > > > @@ -666,6 +708,10 @@ DEFINE_INODE_EVENT(xfs_inode_free_eofblocks_invalid);
> > > > > > DEFINE_INODE_EVENT(xfs_inode_set_cowblocks_tag);
> > > > > > DEFINE_INODE_EVENT(xfs_inode_clear_cowblocks_tag);
> > > > > > DEFINE_INODE_EVENT(xfs_inode_free_cowblocks_invalid);
> > > > > > +DEFINE_INODE_EVENT(xfs_inode_set_reclaimable);
> > > > > > +DEFINE_INODE_EVENT(xfs_inode_reclaiming);
> > > > > > +DEFINE_INODE_EVENT(xfs_inode_set_need_inactive);
> > > > > > +DEFINE_INODE_EVENT(xfs_inode_inactivating);
> > > > > >
> > > > > > /*
> > > > > > * ftrace's __print_symbolic requires that all enum values be wrapped in the
> > > > > >
> > > > >
> > > >
> > >
> >
>
next prev parent reply other threads:[~2021-06-17 18:41 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-13 17:19 [PATCHSET v7 00/16] xfs: deferred inode inactivation Darrick J. Wong
2021-06-13 17:20 ` [PATCH 01/16] xfs: refactor the inode recycling code Darrick J. Wong
2021-06-16 8:13 ` Christoph Hellwig
2021-06-13 17:20 ` [PATCH 02/16] xfs: move xfs_inactive call to xfs_inode_mark_reclaimable Darrick J. Wong
2021-06-16 8:18 ` Christoph Hellwig
2021-06-13 17:20 ` [PATCH 03/16] xfs: detach dquots from inode if we don't need to inactivate it Darrick J. Wong
2021-06-14 16:13 ` Brian Foster
2021-06-14 17:27 ` Darrick J. Wong
2021-06-16 8:21 ` Christoph Hellwig
2021-06-13 17:20 ` [PATCH 04/16] xfs: clean up xfs_inactive a little bit Darrick J. Wong
2021-06-14 16:14 ` Brian Foster
2021-06-14 17:34 ` Darrick J. Wong
2021-06-16 8:23 ` Christoph Hellwig
2021-06-13 17:20 ` [PATCH 05/16] xfs: separate primary inode selection criteria in xfs_iget_cache_hit Darrick J. Wong
2021-06-14 16:14 ` Brian Foster
2021-06-16 8:26 ` Christoph Hellwig
2021-06-13 17:20 ` [PATCH 06/16] xfs: defer inode inactivation to a workqueue Darrick J. Wong
2021-06-14 16:17 ` Brian Foster
2021-06-14 19:27 ` Darrick J. Wong
2021-06-15 14:43 ` Brian Foster
2021-06-15 20:53 ` Darrick J. Wong
2021-06-17 14:23 ` Brian Foster
2021-06-17 18:41 ` Darrick J. Wong [this message]
2021-06-18 14:00 ` Brian Foster
2021-06-18 14:58 ` Darrick J. Wong
2021-06-21 5:19 ` Christoph Hellwig
2021-06-13 17:20 ` [PATCH 07/16] xfs: drop dead dquots before scheduling inode for inactivation Darrick J. Wong
2021-06-13 17:20 ` [PATCH 08/16] xfs: expose sysfs knob to control inode inactivation delay Darrick J. Wong
2021-06-13 17:20 ` [PATCH 09/16] xfs: reduce inactivation delay when things are tight Darrick J. Wong
2021-06-13 17:20 ` [PATCH 10/16] xfs: inactivate inodes any time we try to free speculative preallocations Darrick J. Wong
2021-06-13 17:20 ` [PATCH 11/16] xfs: flush inode inactivation work when compiling usage statistics Darrick J. Wong
2021-06-13 17:21 ` [PATCH 12/16] xfs: parallelize inode inactivation Darrick J. Wong
2021-06-13 17:21 ` [PATCH 13/16] xfs: don't run speculative preallocation gc when fs is frozen Darrick J. Wong
2021-06-13 17:21 ` [PATCH 14/16] xfs: scale speculative preallocation gc delay based on free space Darrick J. Wong
2021-06-13 17:21 ` [PATCH 15/16] xfs: use background worker pool when transactions can't get " Darrick J. Wong
2021-06-13 17:21 ` [PATCH 16/16] xfs: avoid buffer deadlocks when walking fs inodes Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210617184111.GD158232@locust \
--to=djwong@kernel.org \
--cc=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox