public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* quotaoff, transaction quiesce, and dquot logging
@ 2020-09-04 15:59 Brian Foster
  2020-09-04 22:29 ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2020-09-04 15:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Hi Dave,

I'm finally getting back to the quotaoff thing we discussed a ways
back[1] and doing some auditing to make sure that I understand the
approach and that it seems correct. To refresh, your original prototype
and the slightly different one I'm looking into implement the same
general scheme:

1.) quiesce the transaction subsystem
2.) disable quota(s) (update state flags)
3.) log quotaoff start/end items (synchronous)
4.) open the transaction subsystem
5.) release all inode dquot references and purge dquots

The idea is that the critical invariant requred for quotaoff is that no
dquots are logged after the quotaoff end item is committed to the log.
Otherwise there is no guarantee that the tail pushes past the quotaoff
item and a subsequent crash/recovery incorrectly replays dquot changes
for an inactive quota mode.

As it is, I think there's at least one assumption we've made that isn't
entirely accurate. It looks to me that steps 1-4 don't guarantee that
dquots aren't logged after the transaction subsystem is released. The
current code (and my prototype) only clear the *QUOTA_ACTIVE flags at
that point, and various transactions might have already acquired or
attached dquots to inodes before the transaction allocation even occurs.
Once the transaction is allocated, various paths basically only care if
we have a dquot or not.

For example, xfs_create() gets the dquots up front, allocs the
transaction and xfs_trans_reserve_quota_bydquots() attaches any of the
associated dquots to the transaction. xfs_trans_reserve_quota_bydquots()
checks for (!QUOTA_ON() || !QUOTA_RUNNING()), but those only help us if
all quotas have been disabled. Consider if one of multiple active quotas
are being turned off, and that this path already has dquots for both,
for example.

I do notice that your prototype[1] clears all of the quota flags (not
just the ACTIVE flags) after the transaction barrier is released. This
prevents further modifications in some cases, but it doesn't seem like
that is enough to avoid violating the invariant described above. E.g.,
xfs_trans_apply_dquot_deltas() logs the dquot regardless of whether
changes are made (and actually looks like it can make some changes on
the dquot even if the transaction doesn't) after the dquot is attached
to the transaction.

This does make me wonder a bit whether we should rework the transaction
commit path to avoid modifying/logging the dquot completely if the quota
is inactive or accounting is disabled. When starting to look around with
that in mind, I see the following in xfs_quota_defs.h:

/*
 * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
 * quota will be not be switched off as long as that inode lock is held.
 */
#define XFS_IS_QUOTA_ON(mp)     ((mp)->m_qflags & (XFS_UQUOTA_ACTIVE | \
                                                   XFS_GQUOTA_ACTIVE | \
                                                   XFS_PQUOTA_ACTIVE))
...

So I'm wondering how safe that actually would be, or even how safe it is
to clear the ACCT|ENFD flags before we release/purge dquots. It seems
like that conflicts with the above documentation, at least, but I'm not
totally clear on the reason for that rule. In any event, I'm still
poking around a bit, but unless I'm missing something in the analysis
above it doesn't seem like this is a matter of simply altering the
quotaoff path as originally expected. Thoughts or ideas appreciated.

Brian

[1] https://lore.kernel.org/linux-xfs/20200702115144.GH2005@dread.disaster.area/


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: quotaoff, transaction quiesce, and dquot logging
  2020-09-04 15:59 quotaoff, transaction quiesce, and dquot logging Brian Foster
@ 2020-09-04 22:29 ` Dave Chinner
  2020-09-08 15:56   ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2020-09-04 22:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 04, 2020 at 11:59:49AM -0400, Brian Foster wrote:
> Hi Dave,
> 
> I'm finally getting back to the quotaoff thing we discussed a ways
> back[1] and doing some auditing to make sure that I understand the
> approach and that it seems correct. To refresh, your original prototype
> and the slightly different one I'm looking into implement the same
> general scheme:
> 
> 1.) quiesce the transaction subsystem
> 2.) disable quota(s) (update state flags)
> 3.) log quotaoff start/end items (synchronous)
> 4.) open the transaction subsystem
> 5.) release all inode dquot references and purge dquots
> 
> The idea is that the critical invariant requred for quotaoff is that no
> dquots are logged after the quotaoff end item is committed to the log.
> Otherwise there is no guarantee that the tail pushes past the quotaoff
> item and a subsequent crash/recovery incorrectly replays dquot changes
> for an inactive quota mode.
> 
> As it is, I think there's at least one assumption we've made that isn't
> entirely accurate. It looks to me that steps 1-4 don't guarantee that
> dquots aren't logged after the transaction subsystem is released. The
> current code (and my prototype) only clear the *QUOTA_ACTIVE flags at
> that point, and various transactions might have already acquired or
> attached dquots to inodes before the transaction allocation even occurs.
> Once the transaction is allocated, various paths basically only care if
> we have a dquot or not.
> 
> For example, xfs_create() gets the dquots up front, allocs the
> transaction and xfs_trans_reserve_quota_bydquots() attaches any of the
> associated dquots to the transaction. xfs_trans_reserve_quota_bydquots()
> checks for (!QUOTA_ON() || !QUOTA_RUNNING()), but those only help us if
> all quotas have been disabled. Consider if one of multiple active quotas
> are being turned off, and that this path already has dquots for both,
> for example.

Right, that's one of the main problems I tried to solve.

> 
> I do notice that your prototype[1] clears all of the quota flags (not
> just the ACTIVE flags) after the transaction barrier is released. This
> prevents further modifications in some cases, but it doesn't seem like
> that is enough to avoid violating the invariant described above. E.g.,
> xfs_trans_apply_dquot_deltas() logs the dquot regardless of whether
> changes are made (and actually looks like it can make some changes on
> the dquot even if the transaction doesn't) after the dquot is attached
> to the transaction.

It should, because the transaction barrier includes a draining
mechanism to wait for all quota modifying transactions already
running to drain out. That is, any transaction marked with
XFS_TRANS_QUOTA (via it's initial reservation) will have an elevated
q->qi_active_trans count, and that only gets decremented when the
transaction completes and the dquot is released from the
transaction (i.e. in xfs_trans_free_dqinfo()).

The barrier sets the XFS_QUOTA_OFF_RUNNING_BIT, at which point new
transactions marked with XFS_TRANS_QUOTA will enter
xfs_trans_quota_enabled() in xfs_trans_alloc(). If the
XFS_QUOTA_OFF_RUNNING_BIT is set, they block waiting for it to be
cleared.

This forms the "no new quota modifying transactions can start" part
of the barrier.

If quota is running and the XFS_QUOTA_OFF_RUNNING_BIT is not set,
they increment q->qi_active_trans to indicate that there is a
running dquot modifying transaction in progress. This state is
maintained across transaction duplication/rolling so the reference
only goes away when the transaction is completely finished with
dquots and released them from the transaction.

The quota off code sets the XFS_QUOTA_OFF_RUNNING_BIT the waits for
q->qi_active_trans to go to zero, thereby setting the "stop new
xacts from starting" barrier and then waiting for all dquot
modifying xacts to complete. At this point, there are not
transactions holding dquot references, no dquots being modified,
and all new transactions that might modify dquots are being held in
xfs_trans_alloc() and will check xfs_trans_quota_running() when they
are woken. 

At this point, we might still have uncommitted dquots in the CIL,
so we do a xfs_log_force(mp, XFS_LOG_SYNC) to flush all the
modifications to the journal, thereby guaranteeing that we order
them in the journal before the first quotaoff item is committed.

With all dquot mods being stable now, and no new modifications able
to occur, we can clear all the quota flags in one go. We
don't need to keep the quotas we are turning off running while we
clear out all the inode references to them, because once the
running/active/acct/enforced flags are cleared, no new modification
will try to modify that quota because all the "is this quota type
running" checks will fail the moment we clear all the in-memory
quota flags.

IOWs, the barrier mechanism I added was designed to provide the
exact "no dquots are logged after the quotaoff item is committed to
the log" invariant you describe. It's basically the same mechanism
we use for draining direct IO via taking IOLOCKs to prevent new
submissions and calling inode_dio_wait() to drain everything in
flight....

> This does make me wonder a bit whether we should rework the transaction
> commit path to avoid modifying/logging the dquot completely if the quota
> is inactive or accounting is disabled.

If there are no dquots for an inactive quota type attached to the
transaction, it won't log anything. That's what the barrier+drain
acheives.

> When starting to look around with
> that in mind, I see the following in xfs_quota_defs.h:
> 
> /*
>  * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
>  * quota will be not be switched off as long as that inode lock is held.
>  */
> #define XFS_IS_QUOTA_ON(mp)     ((mp)->m_qflags & (XFS_UQUOTA_ACTIVE | \
>                                                    XFS_GQUOTA_ACTIVE | \
>                                                    XFS_PQUOTA_ACTIVE))
> ...
> 
> So I'm wondering how safe that actually would be, or even how safe it is
> to clear the ACCT|ENFD flags before we release/purge dquots.

I don't see any problem with this - we only care that when we
restart transactions that the dquots for the disabled quota type are
not joined into new transactions, and clearing the ACTIVE flag for
that dquot type should do that.

> It seems
> like that conflicts with the above documentation, at least, but I'm not
> totally clear on the reason for that rule.

Releasing the dquot in the quota-off code required an inode lock
cycle to guarantee that the inode was not currently being modified
in a transaction that might hold a reference to the dquota. Hence
the ACTIVE check under the inode lock is the only way to guarantee
the dquot doesn't get ripped out from underneath an inode/dquot
modification in progress...

> In any event, I'm still
> poking around a bit, but unless I'm missing something in the analysis
> above it doesn't seem like this is a matter of simply altering the
> quotaoff path as originally expected. Thoughts or ideas appreciated.

I suspect you missed the importance of q->qi_active_trans for
draining all active quota modifying transactions before making quota
flag changes...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: quotaoff, transaction quiesce, and dquot logging
  2020-09-04 22:29 ` Dave Chinner
@ 2020-09-08 15:56   ` Brian Foster
  2020-09-08 21:07     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2020-09-08 15:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Sat, Sep 05, 2020 at 08:29:36AM +1000, Dave Chinner wrote:
> On Fri, Sep 04, 2020 at 11:59:49AM -0400, Brian Foster wrote:
> > Hi Dave,
> > 
> > I'm finally getting back to the quotaoff thing we discussed a ways
> > back[1] and doing some auditing to make sure that I understand the
> > approach and that it seems correct. To refresh, your original prototype
> > and the slightly different one I'm looking into implement the same
> > general scheme:
> > 
> > 1.) quiesce the transaction subsystem
> > 2.) disable quota(s) (update state flags)
> > 3.) log quotaoff start/end items (synchronous)
> > 4.) open the transaction subsystem
> > 5.) release all inode dquot references and purge dquots
> > 
> > The idea is that the critical invariant requred for quotaoff is that no
> > dquots are logged after the quotaoff end item is committed to the log.
> > Otherwise there is no guarantee that the tail pushes past the quotaoff
> > item and a subsequent crash/recovery incorrectly replays dquot changes
> > for an inactive quota mode.
> > 
> > As it is, I think there's at least one assumption we've made that isn't
> > entirely accurate. It looks to me that steps 1-4 don't guarantee that
> > dquots aren't logged after the transaction subsystem is released. The
> > current code (and my prototype) only clear the *QUOTA_ACTIVE flags at
> > that point, and various transactions might have already acquired or
> > attached dquots to inodes before the transaction allocation even occurs.
> > Once the transaction is allocated, various paths basically only care if
> > we have a dquot or not.
> > 
> > For example, xfs_create() gets the dquots up front, allocs the
> > transaction and xfs_trans_reserve_quota_bydquots() attaches any of the
> > associated dquots to the transaction. xfs_trans_reserve_quota_bydquots()
> > checks for (!QUOTA_ON() || !QUOTA_RUNNING()), but those only help us if
> > all quotas have been disabled. Consider if one of multiple active quotas
> > are being turned off, and that this path already has dquots for both,
> > for example.
> 
> Right, that's one of the main problems I tried to solve.
> 
> > 
> > I do notice that your prototype[1] clears all of the quota flags (not
> > just the ACTIVE flags) after the transaction barrier is released. This
> > prevents further modifications in some cases, but it doesn't seem like
> > that is enough to avoid violating the invariant described above. E.g.,
> > xfs_trans_apply_dquot_deltas() logs the dquot regardless of whether
> > changes are made (and actually looks like it can make some changes on
> > the dquot even if the transaction doesn't) after the dquot is attached
> > to the transaction.
> 
> It should, because the transaction barrier includes a draining
> mechanism to wait for all quota modifying transactions already
> running to drain out. That is, any transaction marked with
> XFS_TRANS_QUOTA (via it's initial reservation) will have an elevated
> q->qi_active_trans count, and that only gets decremented when the
> transaction completes and the dquot is released from the
> transaction (i.e. in xfs_trans_free_dqinfo()).
> 
> The barrier sets the XFS_QUOTA_OFF_RUNNING_BIT, at which point new
> transactions marked with XFS_TRANS_QUOTA will enter
> xfs_trans_quota_enabled() in xfs_trans_alloc(). If the
> XFS_QUOTA_OFF_RUNNING_BIT is set, they block waiting for it to be
> cleared.
> 
> This forms the "no new quota modifying transactions can start" part
> of the barrier.
> 
> If quota is running and the XFS_QUOTA_OFF_RUNNING_BIT is not set,
> they increment q->qi_active_trans to indicate that there is a
> running dquot modifying transaction in progress. This state is
> maintained across transaction duplication/rolling so the reference
> only goes away when the transaction is completely finished with
> dquots and released them from the transaction.
> 
> The quota off code sets the XFS_QUOTA_OFF_RUNNING_BIT the waits for
> q->qi_active_trans to go to zero, thereby setting the "stop new
> xacts from starting" barrier and then waiting for all dquot
> modifying xacts to complete. At this point, there are not
> transactions holding dquot references, no dquots being modified,
> and all new transactions that might modify dquots are being held in
> xfs_trans_alloc() and will check xfs_trans_quota_running() when they
> are woken. 
> 
> At this point, we might still have uncommitted dquots in the CIL,
> so we do a xfs_log_force(mp, XFS_LOG_SYNC) to flush all the
> modifications to the journal, thereby guaranteeing that we order
> them in the journal before the first quotaoff item is committed.
> 
> With all dquot mods being stable now, and no new modifications able
> to occur, we can clear all the quota flags in one go. We
> don't need to keep the quotas we are turning off running while we
> clear out all the inode references to them, because once the
> running/active/acct/enforced flags are cleared, no new modification
> will try to modify that quota because all the "is this quota type
> running" checks will fail the moment we clear all the in-memory
> quota flags.
> 
> IOWs, the barrier mechanism I added was designed to provide the
> exact "no dquots are logged after the quotaoff item is committed to
> the log" invariant you describe. It's basically the same mechanism
> we use for draining direct IO via taking IOLOCKs to prevent new
> submissions and calling inode_dio_wait() to drain everything in
> flight....
> 

Right, I follow all of the above. I've been experimenting with an
approach that just freezes all transactions as opposed to only quota
transactions just to reduce the amount of code involved. What I'm trying
to point out is that I don't think this quotaoff logic alone is
sufficient to prevent dquot log ordering problems.

Consider the following example scenario:

- fs mounted w/ user+group quotas enabled
- inode 0x123 is in-core w/ user+group dquots already attached
- user executes 'xfs_quota -xc "off -g" <mnt>' to turn off group quotas
- quotaoff drains all outstanding transactions, clears (group) quota
  flag, logs quotaoff start/end ...

Meanwhile..

- user executes an fallocate request on inode 0x123, which blocks down
  in xfs_alloc_file_space() -> xfs_trans_alloc() due to the quotaoff in
  progress.
- quotaoff releases the trans barrier and begins doing its dquot
  flush/purge thing..
- falloc grabs the 0x123 ilock and xfs_trans_reserve_quota_bydquots() ->
  xfs_trans_dqresv() -> xfs_trans_mod_dquot() joins the user/group
  dquots to the transaction quota ctx because they are still attached to
  the inode at this point (and user quota is still enabled), hence quota
  blocks are reserved in both
- xfs_trans_mod_dquot_byino() (via xfs_bmapi_write() -> ... -> xfs_bmap_btalloc() ->
  xfs_bmap_btalloc_accounting()) skips accounting the allocated blocks
  to the group dquot because it is not enabled
- xfs_trans_commit() -> ... -> xfs_trans_apply_dquot_deltas() logs both
  dquots (regardless of the previous step) because the group dquot was
  attached to the transaction quota context in the reserve step, even
  though it was disabled before that point

This basic scenario can be confirmed with a simple assert check in
xfs_trans_log_dquot():

	ASSERT(xfs_this_quota_on(tp->t_mountp, xfs_dquot_type(dqp)));

... which can fail when included on top of the prototype(s). The
remainder of my email around the various flags and whatnot was more
related to how to address this as opposed to being an issue with the
concept itself. It seems like we should be able to head this issue off
somewhere in this sequence (i.e., checking the appropriate flag before
the dquot is attached), but it also seems like the quotaoff start/end
plus various quota flags all fit together a certain way and I feel like
some pieces of the puzzle are still missing from a design standpoint...

Brian

> > This does make me wonder a bit whether we should rework the transaction
> > commit path to avoid modifying/logging the dquot completely if the quota
> > is inactive or accounting is disabled.
> 
> If there are no dquots for an inactive quota type attached to the
> transaction, it won't log anything. That's what the barrier+drain
> acheives.
> 
> > When starting to look around with
> > that in mind, I see the following in xfs_quota_defs.h:
> > 
> > /*
> >  * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
> >  * quota will be not be switched off as long as that inode lock is held.
> >  */
> > #define XFS_IS_QUOTA_ON(mp)     ((mp)->m_qflags & (XFS_UQUOTA_ACTIVE | \
> >                                                    XFS_GQUOTA_ACTIVE | \
> >                                                    XFS_PQUOTA_ACTIVE))
> > ...
> > 
> > So I'm wondering how safe that actually would be, or even how safe it is
> > to clear the ACCT|ENFD flags before we release/purge dquots.
> 
> I don't see any problem with this - we only care that when we
> restart transactions that the dquots for the disabled quota type are
> not joined into new transactions, and clearing the ACTIVE flag for
> that dquot type should do that.
> 
> > It seems
> > like that conflicts with the above documentation, at least, but I'm not
> > totally clear on the reason for that rule.
> 
> Releasing the dquot in the quota-off code required an inode lock
> cycle to guarantee that the inode was not currently being modified
> in a transaction that might hold a reference to the dquota. Hence
> the ACTIVE check under the inode lock is the only way to guarantee
> the dquot doesn't get ripped out from underneath an inode/dquot
> modification in progress...
> 
> > In any event, I'm still
> > poking around a bit, but unless I'm missing something in the analysis
> > above it doesn't seem like this is a matter of simply altering the
> > quotaoff path as originally expected. Thoughts or ideas appreciated.
> 
> I suspect you missed the importance of q->qi_active_trans for
> draining all active quota modifying transactions before making quota
> flag changes...
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: quotaoff, transaction quiesce, and dquot logging
  2020-09-08 15:56   ` Brian Foster
@ 2020-09-08 21:07     ` Dave Chinner
  2020-09-09 15:00       ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2020-09-08 21:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Sep 08, 2020 at 11:56:02AM -0400, Brian Foster wrote:
> On Sat, Sep 05, 2020 at 08:29:36AM +1000, Dave Chinner wrote:
> > IOWs, the barrier mechanism I added was designed to provide the
> > exact "no dquots are logged after the quotaoff item is committed to
> > the log" invariant you describe. It's basically the same mechanism
> > we use for draining direct IO via taking IOLOCKs to prevent new
> > submissions and calling inode_dio_wait() to drain everything in
> > flight....
> > 
> 
> Right, I follow all of the above. I've been experimenting with an
> approach that just freezes all transactions as opposed to only quota
> transactions just to reduce the amount of code involved. What I'm trying
> to point out is that I don't think this quotaoff logic alone is
> sufficient to prevent dquot log ordering problems.
> 
> Consider the following example scenario:
> 
> - fs mounted w/ user+group quotas enabled
> - inode 0x123 is in-core w/ user+group dquots already attached
> - user executes 'xfs_quota -xc "off -g" <mnt>' to turn off group quotas
> - quotaoff drains all outstanding transactions, clears (group) quota
>   flag, logs quotaoff start/end ...
> 
> Meanwhile..
> 
> - user executes an fallocate request on inode 0x123, which blocks down
>   in xfs_alloc_file_space() -> xfs_trans_alloc() due to the quotaoff in
>   progress.
> - quotaoff releases the trans barrier and begins doing its dquot
>   flush/purge thing..
> - falloc grabs the 0x123 ilock and xfs_trans_reserve_quota_bydquots() ->
>   xfs_trans_dqresv() -> xfs_trans_mod_dquot() joins the user/group
>   dquots to the transaction quota ctx because they are still attached to
>   the inode at this point (and user quota is still enabled), hence quota
>   blocks are reserved in both

There's the bug. The patch I wrote needs to ensure that the quotas
are enabled when attaching the dquot to the dqinfo. The code
currently checks for global "quota on" state, but it doesn't check
individual quota state...

> - xfs_trans_mod_dquot_byino() (via xfs_bmapi_write() -> ... -> xfs_bmap_btalloc() ->
>   xfs_bmap_btalloc_accounting()) skips accounting the allocated blocks
>   to the group dquot because it is not enabled

Right, the reservation functions need to do the same thing as
xfs_trans_mod_dquot_byino(). I simply missed that for the
reservation functions. i.e. Adding the same style of check like:

	if (XFS_IS_UQUOTA_ON(mp) && udq)

before doing anything with user quota will avoid this problem as
we are already in transaction context and the UQUOTA on or off state
will not change until the transaction ends.

> concept itself. It seems like we should be able to head this issue off
> somewhere in this sequence (i.e., checking the appropriate flag before
> the dquot is attached), but it also seems like the quotaoff start/end
> plus various quota flags all fit together a certain way and I feel like
> some pieces of the puzzle are still missing from a design standpoint...

I can't think of anything that is missing - the quota off barrier
gives us an atomic quota state change w.r.t. running transactions,
so we just need to make sure we check the quota state before joining
anything quota related to a transaction rather than assume that the
presence of a dquot attached to an inode means that quotas are on.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: quotaoff, transaction quiesce, and dquot logging
  2020-09-08 21:07     ` Dave Chinner
@ 2020-09-09 15:00       ` Brian Foster
  2020-09-09 22:59         ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2020-09-09 15:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Sep 09, 2020 at 07:07:20AM +1000, Dave Chinner wrote:
> On Tue, Sep 08, 2020 at 11:56:02AM -0400, Brian Foster wrote:
> > On Sat, Sep 05, 2020 at 08:29:36AM +1000, Dave Chinner wrote:
> > > IOWs, the barrier mechanism I added was designed to provide the
> > > exact "no dquots are logged after the quotaoff item is committed to
> > > the log" invariant you describe. It's basically the same mechanism
> > > we use for draining direct IO via taking IOLOCKs to prevent new
> > > submissions and calling inode_dio_wait() to drain everything in
> > > flight....
> > > 
> > 
> > Right, I follow all of the above. I've been experimenting with an
> > approach that just freezes all transactions as opposed to only quota
> > transactions just to reduce the amount of code involved. What I'm trying
> > to point out is that I don't think this quotaoff logic alone is
> > sufficient to prevent dquot log ordering problems.
> > 
> > Consider the following example scenario:
> > 
> > - fs mounted w/ user+group quotas enabled
> > - inode 0x123 is in-core w/ user+group dquots already attached
> > - user executes 'xfs_quota -xc "off -g" <mnt>' to turn off group quotas
> > - quotaoff drains all outstanding transactions, clears (group) quota
> >   flag, logs quotaoff start/end ...
> > 
> > Meanwhile..
> > 
> > - user executes an fallocate request on inode 0x123, which blocks down
> >   in xfs_alloc_file_space() -> xfs_trans_alloc() due to the quotaoff in
> >   progress.
> > - quotaoff releases the trans barrier and begins doing its dquot
> >   flush/purge thing..
> > - falloc grabs the 0x123 ilock and xfs_trans_reserve_quota_bydquots() ->
> >   xfs_trans_dqresv() -> xfs_trans_mod_dquot() joins the user/group
> >   dquots to the transaction quota ctx because they are still attached to
> >   the inode at this point (and user quota is still enabled), hence quota
> >   blocks are reserved in both
> 
> There's the bug. The patch I wrote needs to ensure that the quotas
> are enabled when attaching the dquot to the dqinfo. The code
> currently checks for global "quota on" state, but it doesn't check
> individual quota state...
> 

Ok. I was surmising about something similar down in the commit path, but
it seems more appropriate to avoid attaching the dquot to the
transaction (and not doing any accounting, reservation or otherwise) in
the first place.

> > - xfs_trans_mod_dquot_byino() (via xfs_bmapi_write() -> ... -> xfs_bmap_btalloc() ->
> >   xfs_bmap_btalloc_accounting()) skips accounting the allocated blocks
> >   to the group dquot because it is not enabled
> 
> Right, the reservation functions need to do the same thing as
> xfs_trans_mod_dquot_byino(). I simply missed that for the
> reservation functions. i.e. Adding the same style of check like:
> 
> 	if (XFS_IS_UQUOTA_ON(mp) && udq)
> 
> before doing anything with user quota will avoid this problem as
> we are already in transaction context and the UQUOTA on or off state
> will not change until the transaction ends.
> 
> > concept itself. It seems like we should be able to head this issue off
> > somewhere in this sequence (i.e., checking the appropriate flag before
> > the dquot is attached), but it also seems like the quotaoff start/end
> > plus various quota flags all fit together a certain way and I feel like
> > some pieces of the puzzle are still missing from a design standpoint...
> 
> I can't think of anything that is missing - the quota off barrier
> gives us an atomic quota state change w.r.t. running transactions,
> so we just need to make sure we check the quota state before joining
> anything quota related to a transaction rather than assume that the
> presence of a dquot attached to an inode means that quotas are on.
> 

This gets back to my earlier questions around the various quota flags.
If I trace through the code of some operations, it seems like this
approach should work (once this logging issue is addressed, and more
testing required of course). However if I refer back to the runtime
macro comment:

/*
 * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
 * quota will be not be switched off as long as that inode lock is held.
 */

This will technically no longer be the case because the updated quotaoff
will clear all of the flags before cycling any ilocks and detaching
dquots. I'm aware it will drain the transaction subsystem, but does
anything else depend on not seeing such a state change with an inode
lock held? I haven't seen anything so far that would conflict, but the
comment here is rather vague on details.

Conversely, if not, I'm wondering whether there's a need for an ACTIVE
flag at all if we'd clear it at the same time as the ACCT|ENFD flags
during quotaoff anyways. It sounds like the answer to both those
questions is no based on your previous responses, perhaps reason being
that the transaction drain on the quotaoff side effectively replaces the
need for this rule on the general transaction side. Hm? Note that I
wouldn't remove the ACTIVE flag immediately anyways, but I want to make
sure the concern is clear..

Thanks for the feedback.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: quotaoff, transaction quiesce, and dquot logging
  2020-09-09 15:00       ` Brian Foster
@ 2020-09-09 22:59         ` Dave Chinner
  2020-09-10 13:20           ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2020-09-09 22:59 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, Sep 09, 2020 at 11:00:35AM -0400, Brian Foster wrote:
> On Wed, Sep 09, 2020 at 07:07:20AM +1000, Dave Chinner wrote:
> > On Tue, Sep 08, 2020 at 11:56:02AM -0400, Brian Foster wrote:
> > > - xfs_trans_mod_dquot_byino() (via xfs_bmapi_write() -> ... -> xfs_bmap_btalloc() ->
> > >   xfs_bmap_btalloc_accounting()) skips accounting the allocated blocks
> > >   to the group dquot because it is not enabled
> > 
> > Right, the reservation functions need to do the same thing as
> > xfs_trans_mod_dquot_byino(). I simply missed that for the
> > reservation functions. i.e. Adding the same style of check like:
> > 
> > 	if (XFS_IS_UQUOTA_ON(mp) && udq)
> > 
> > before doing anything with user quota will avoid this problem as
> > we are already in transaction context and the UQUOTA on or off state
> > will not change until the transaction ends.
> > 
> > > concept itself. It seems like we should be able to head this issue off
> > > somewhere in this sequence (i.e., checking the appropriate flag before
> > > the dquot is attached), but it also seems like the quotaoff start/end
> > > plus various quota flags all fit together a certain way and I feel like
> > > some pieces of the puzzle are still missing from a design standpoint...
> > 
> > I can't think of anything that is missing - the quota off barrier
> > gives us an atomic quota state change w.r.t. running transactions,
> > so we just need to make sure we check the quota state before joining
> > anything quota related to a transaction rather than assume that the
> > presence of a dquot attached to an inode means that quotas are on.
> > 
> 
> This gets back to my earlier questions around the various quota flags.
> If I trace through the code of some operations, it seems like this
> approach should work (once this logging issue is addressed, and more
> testing required of course). However if I refer back to the runtime
> macro comment:
> 
> /*
>  * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
>  * quota will be not be switched off as long as that inode lock is held.
>  */
> 
> This will technically no longer be the case because the updated quotaoff
> will clear all of the flags before cycling any ilocks and detaching
> dquots. I'm aware it will drain the transaction subsystem, but does
> anything else depend on not seeing such a state change with an inode
> lock held? I haven't seen anything so far that would conflict, but the
> comment here is rather vague on details.

Not that I know of. I would probably rewrite the above comment as:

/*
 * Checking XFS_IS_*QUOTA_ON() while inside an active quota modifying
 * transaction context guarantees quota will be not be switched until after the
 * entire rolling transaction chain is completed.
 */

To clarify the situation. Having the inode locked will now only
guarantee that the dquot will not go away while the inode is locked,
it doesn't guarantee that quota will not switch off any more.

> Conversely, if not, I'm wondering whether there's a need for an ACTIVE
> flag at all if we'd clear it at the same time as the ACCT|ENFD flags
> during quotaoff anyways. It sounds like the answer to both those
> questions is no based on your previous responses, perhaps reason being
> that the transaction drain on the quotaoff side effectively replaces the
> need for this rule on the general transaction side. Hm? Note that I
> wouldn't remove the ACTIVE flag immediately anyways, but I want to make
> sure the concern is clear..

Yes, I think you are right - the ACTIVE flag could probably away as
it doesn't really play a part in the quota-off dance anymore. We'd
still need the IS_QUOTA_ON() checks, but they'd look at ACCT|ENFD
instead...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: quotaoff, transaction quiesce, and dquot logging
  2020-09-09 22:59         ` Dave Chinner
@ 2020-09-10 13:20           ` Brian Foster
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Foster @ 2020-09-10 13:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Sep 10, 2020 at 08:59:37AM +1000, Dave Chinner wrote:
> On Wed, Sep 09, 2020 at 11:00:35AM -0400, Brian Foster wrote:
> > On Wed, Sep 09, 2020 at 07:07:20AM +1000, Dave Chinner wrote:
> > > On Tue, Sep 08, 2020 at 11:56:02AM -0400, Brian Foster wrote:
> > > > - xfs_trans_mod_dquot_byino() (via xfs_bmapi_write() -> ... -> xfs_bmap_btalloc() ->
> > > >   xfs_bmap_btalloc_accounting()) skips accounting the allocated blocks
> > > >   to the group dquot because it is not enabled
> > > 
> > > Right, the reservation functions need to do the same thing as
> > > xfs_trans_mod_dquot_byino(). I simply missed that for the
> > > reservation functions. i.e. Adding the same style of check like:
> > > 
> > > 	if (XFS_IS_UQUOTA_ON(mp) && udq)
> > > 
> > > before doing anything with user quota will avoid this problem as
> > > we are already in transaction context and the UQUOTA on or off state
> > > will not change until the transaction ends.
> > > 
> > > > concept itself. It seems like we should be able to head this issue off
> > > > somewhere in this sequence (i.e., checking the appropriate flag before
> > > > the dquot is attached), but it also seems like the quotaoff start/end
> > > > plus various quota flags all fit together a certain way and I feel like
> > > > some pieces of the puzzle are still missing from a design standpoint...
> > > 
> > > I can't think of anything that is missing - the quota off barrier
> > > gives us an atomic quota state change w.r.t. running transactions,
> > > so we just need to make sure we check the quota state before joining
> > > anything quota related to a transaction rather than assume that the
> > > presence of a dquot attached to an inode means that quotas are on.
> > > 
> > 
> > This gets back to my earlier questions around the various quota flags.
> > If I trace through the code of some operations, it seems like this
> > approach should work (once this logging issue is addressed, and more
> > testing required of course). However if I refer back to the runtime
> > macro comment:
> > 
> > /*
> >  * Checking XFS_IS_*QUOTA_ON() while holding any inode lock guarantees
> >  * quota will be not be switched off as long as that inode lock is held.
> >  */
> > 
> > This will technically no longer be the case because the updated quotaoff
> > will clear all of the flags before cycling any ilocks and detaching
> > dquots. I'm aware it will drain the transaction subsystem, but does
> > anything else depend on not seeing such a state change with an inode
> > lock held? I haven't seen anything so far that would conflict, but the
> > comment here is rather vague on details.
> 
> Not that I know of. I would probably rewrite the above comment as:
> 
> /*
>  * Checking XFS_IS_*QUOTA_ON() while inside an active quota modifying
>  * transaction context guarantees quota will be not be switched until after the
>  * entire rolling transaction chain is completed.
>  */
> 
> To clarify the situation. Having the inode locked will now only
> guarantee that the dquot will not go away while the inode is locked,
> it doesn't guarantee that quota will not switch off any more.
> 

Ok, that makes more sense.

> > Conversely, if not, I'm wondering whether there's a need for an ACTIVE
> > flag at all if we'd clear it at the same time as the ACCT|ENFD flags
> > during quotaoff anyways. It sounds like the answer to both those
> > questions is no based on your previous responses, perhaps reason being
> > that the transaction drain on the quotaoff side effectively replaces the
> > need for this rule on the general transaction side. Hm? Note that I
> > wouldn't remove the ACTIVE flag immediately anyways, but I want to make
> > sure the concern is clear..
> 
> Yes, I think you are right - the ACTIVE flag could probably away as
> it doesn't really play a part in the quota-off dance anymore. We'd
> still need the IS_QUOTA_ON() checks, but they'd look at ACCT|ENFD
> instead...
> 

Ack. Thanks for the sanity check.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-10 13:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-09-04 15:59 quotaoff, transaction quiesce, and dquot logging Brian Foster
2020-09-04 22:29 ` Dave Chinner
2020-09-08 15:56   ` Brian Foster
2020-09-08 21:07     ` Dave Chinner
2020-09-09 15:00       ` Brian Foster
2020-09-09 22:59         ` Dave Chinner
2020-09-10 13:20           ` Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox