* [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot @ 2013-11-26 13:38 Jeff Liu 2013-11-28 10:43 ` Christoph Hellwig 0 siblings, 1 reply; 12+ messages in thread From: Jeff Liu @ 2013-11-26 13:38 UTC (permalink / raw) To: xfs@oss.sgi.com From: Jie Liu <jeff.liu@oracle.com> xfs_quota(8) will hang up if trying to turn group/project quota off before the user quota is off, this could be 100% reproduced by: # mount -ouquota,gquota /dev/sda7 /xfs # mkdir /xfs/test # xfs_quota -xc 'off -g' /xfs <-- hangs up # echo w > /proc/sysrq-trigger # dmesg SysRq : Show Blocked State task PC stack pid father xfs_quota D 0000000000000000 0 27574 2551 0x00000000 [snip] Call Trace: [<ffffffff81aaa21d>] schedule+0xad/0xc0 [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0 [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0 [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0 [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs] [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30 [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs] [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs] [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs] [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs] [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs] [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs] [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs] [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs] [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs] [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0 [<ffffffff812fba4b>] ? iput+0x5b/0x90 [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0 [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b It's fine if we turn user quota off at first, then turn off other kind of quotas if they are enabled since the group/project dquot refcount is decreased to zero once the user quota if off. Otherwise, those dquots refcount is non-zero due to the user dquot might refer to them as hint(s). Hence, above operation cause an infinite loop at xfs_qm_dquot_walk() while trying to purge dquot cache. This problem has been around since Linux 3.4, it was introduced by: [ b84a3a9675 xfs: remove the per-filesystem list of dquots ] Originally we will release the group dquot pointers because the user dquots maybe carrying around as a hint via xfs_qm_detach_gdquots(). However, with above change, there is no such work to be done before purging group/project dquot cache. In order to solve this problem, this patch introduces a special routine xfs_qm_dqpurge_hints(), and it would release the group/project dquot pointers the user dquots maybe carrying around as a hint, and then it will proceed to purge the user dquot cache if requested. Cc: stable@vger.kernel.org Signed-off-by: Jie Liu <jeff.liu@oracle.com> --- fs/xfs/xfs_qm.c | 71 ++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 21 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 14a4996..424ef73 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -134,8 +134,6 @@ xfs_qm_dqpurge( { struct xfs_mount *mp = dqp->q_mount; struct xfs_quotainfo *qi = mp->m_quotainfo; - struct xfs_dquot *gdqp = NULL; - struct xfs_dquot *pdqp = NULL; xfs_dqlock(dqp); if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) { @@ -143,21 +141,6 @@ xfs_qm_dqpurge( return EAGAIN; } - /* - * If this quota has a hint attached, prepare for releasing it now. - */ - gdqp = dqp->q_gdquot; - if (gdqp) { - xfs_dqlock(gdqp); - dqp->q_gdquot = NULL; - } - - pdqp = dqp->q_pdquot; - if (pdqp) { - xfs_dqlock(pdqp); - dqp->q_pdquot = NULL; - } - dqp->dq_flags |= XFS_DQ_FREEING; xfs_dqflock(dqp); @@ -206,11 +189,47 @@ xfs_qm_dqpurge( XFS_STATS_DEC(xs_qm_dquot_unused); xfs_qm_dqdestroy(dqp); + return 0; +} + +/* + * Release the group or project dquot pointers the user dquots maybe carrying + * around as a hint, and proceed to purge the user dquot cache if requested. +*/ +STATIC int +xfs_qm_dqpurge_hints( + struct xfs_dquot *dqp, + void *data) +{ + struct xfs_dquot *gdqp = NULL; + struct xfs_dquot *pdqp = NULL; + uint flags = *((uint *)data); + xfs_dqlock(dqp); + if (dqp->dq_flags & XFS_DQ_FREEING) { + xfs_dqunlock(dqp); + return EAGAIN; + } + + /* If this quota has a hint attached, prepare for releasing it now */ + gdqp = dqp->q_gdquot; if (gdqp) - xfs_qm_dqput(gdqp); + dqp->q_gdquot = NULL; + + pdqp = dqp->q_pdquot; if (pdqp) - xfs_qm_dqput(pdqp); + dqp->q_pdquot = NULL; + + xfs_dqunlock(dqp); + + if (gdqp) + xfs_qm_dqrele(gdqp); + if (pdqp) + xfs_qm_dqrele(pdqp); + + if (flags & XFS_QMOPT_UQUOTA) + return xfs_qm_dqpurge(dqp, NULL); + return 0; } @@ -222,8 +241,18 @@ xfs_qm_dqpurge_all( struct xfs_mount *mp, uint flags) { - if (flags & XFS_QMOPT_UQUOTA) - xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); + /* + * We have to release group/project dquot hint(s) from the user dquot + * at first if they are there, otherwise we would run into an infinite + * loop while walking through radix tree to purge other type of dquots + * since their refcount is not zero if the user dquot refers to them + * as hint. + * + * Call the special xfs_qm_dqpurge_hints() will end up go through the + * general xfs_qm_dqpurge() against user dquot cache if requested. + */ + xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, &flags); + if (flags & XFS_QMOPT_GQUOTA) xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); if (flags & XFS_QMOPT_PQUOTA) -- 1.8.3.2 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-11-26 13:38 [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot Jeff Liu @ 2013-11-28 10:43 ` Christoph Hellwig 2013-11-29 9:36 ` Jeff Liu 0 siblings, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2013-11-28 10:43 UTC (permalink / raw) To: Jeff Liu; +Cc: xfs@oss.sgi.com On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote: > + if (flags & XFS_QMOPT_UQUOTA) > + return xfs_qm_dqpurge(dqp, NULL); To me it doesn't make any sense to overload this function for the user quotas that don't have hints. I'd suggest dropping this hunk and keeping a separate walk for releasing the uquots. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-11-28 10:43 ` Christoph Hellwig @ 2013-11-29 9:36 ` Jeff Liu 2013-12-06 21:01 ` Ben Myers 0 siblings, 1 reply; 12+ messages in thread From: Jeff Liu @ 2013-11-29 9:36 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs@oss.sgi.com On 11/28 2013 18:43 PM, Christoph Hellwig wrote: > On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote: >> + if (flags & XFS_QMOPT_UQUOTA) >> + return xfs_qm_dqpurge(dqp, NULL); > > To me it doesn't make any sense to overload this function for the user > quotas that don't have hints. To me it would like a silly compromise. > > I'd suggest dropping this hunk and keeping a separate walk for > releasing the uquots. I thought this over and yup, that is an overload if neither group nor project are enabled, or we don't want to turn user quota off. But even so, we currently also have overloads by checking group/project hints before releasing any type of quota in xfs_qm_purge(). In this point, this fix can reduce a bit overloads by moving those checkups to xfs_qm_purge_hints() if we want to turn group/project quotas off. If we considering to drop above hunk to release user quota separately, we finally would have to walk through user quota to remove those hints again, i.e, /* Remove group/project hints from user dquot */ STATIC int xfs_qm_dqpurge_hints( struct xfs_dquot *dqp, void *data) { uint flags = *((uint *)data); struct xfs_dquot *gdqp; struct xfs_dquot *pdqp; xfs_dqlock(dqp); if (dqp->dq_flags & XFS_DQ_FREEING) { xfs_dqunlock(dqp); return EAGAIN; } /* If this quota has a hint attached, prepare for releasing it now */ gdqp = dqp->q_gdquot; if (gdqp) dqp->q_gdquot = NULL; pdqp = dqp->q_pdquot; if (pdqp) dqp->q_pdquot = NULL; xfs_dqunlock(dqp); if (gdqp) xfs_qm_dqrele(gdqp); if (pdqp) xfs_qm_dqrele(pdqp); return 0; } void xfs_qm_dqpurge_all() { xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); if (flags & XFS_QMOPT_UQUOTA) xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); if (flags & XFS_QMOPT_GQUOTA) xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); if (flags & XFS_QMOPT_PQUOTA) xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); } Above code is what I can figured out as per your suggestions for now, but it would introduce overheads for walking through user dquots to release hints separately if we want to turn user quota off. Any thoughts? Thanks, -Jeff _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-11-29 9:36 ` Jeff Liu @ 2013-12-06 21:01 ` Ben Myers 2013-12-07 5:51 ` Jeff Liu 0 siblings, 1 reply; 12+ messages in thread From: Ben Myers @ 2013-12-06 21:01 UTC (permalink / raw) To: Jeff Liu, Christoph Hellwig; +Cc: xfs@oss.sgi.com Hey Jeff, On Fri, Nov 29, 2013 at 05:36:01PM +0800, Jeff Liu wrote: > On 11/28 2013 18:43 PM, Christoph Hellwig wrote: > > On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote: > >> + if (flags & XFS_QMOPT_UQUOTA) > >> + return xfs_qm_dqpurge(dqp, NULL); > > > > To me it doesn't make any sense to overload this function for the user > > quotas that don't have hints. > To me it would like a silly compromise. > > > > I'd suggest dropping this hunk and keeping a separate walk for > > releasing the uquots. > I thought this over and yup, that is an overload if neither group nor project > are enabled, or we don't want to turn user quota off. > > But even so, we currently also have overloads by checking group/project hints > before releasing any type of quota in xfs_qm_purge(). In this point, this fix > can reduce a bit overloads by moving those checkups to xfs_qm_purge_hints() if > we want to turn group/project quotas off. > > If we considering to drop above hunk to release user quota separately, we finally > would have to walk through user quota to remove those hints again, i.e, > > /* Remove group/project hints from user dquot */ > STATIC int > xfs_qm_dqpurge_hints( > struct xfs_dquot *dqp, > void *data) > { > uint flags = *((uint *)data); > struct xfs_dquot *gdqp; > struct xfs_dquot *pdqp; > > xfs_dqlock(dqp); > if (dqp->dq_flags & XFS_DQ_FREEING) { > xfs_dqunlock(dqp); > return EAGAIN; > } > > /* If this quota has a hint attached, prepare for releasing it now */ > gdqp = dqp->q_gdquot; > if (gdqp) > dqp->q_gdquot = NULL; > > pdqp = dqp->q_pdquot; > if (pdqp) > dqp->q_pdquot = NULL; > > xfs_dqunlock(dqp); > > if (gdqp) > xfs_qm_dqrele(gdqp); > if (pdqp) > xfs_qm_dqrele(pdqp); > > return 0; > } > > void > xfs_qm_dqpurge_all() > { > xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); > > if (flags & XFS_QMOPT_UQUOTA) > xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); > if (flags & XFS_QMOPT_GQUOTA) > xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); > if (flags & XFS_QMOPT_PQUOTA) > xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); > } > > Above code is what I can figured out as per your suggestions for now, but it > would introduce overheads for walking through user dquots to release hints > separately if we want to turn user quota off. > > Any thoughts? I was gonna pull in the single walk version, but now I realize that it is still under discussion. I'm happy with either implementation, with maybe a slight preference for a single user quota walk. Can you and Christoph come to an agreement? Thanks, Ben _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-06 21:01 ` Ben Myers @ 2013-12-07 5:51 ` Jeff Liu 2013-12-09 1:26 ` Dave Chinner 0 siblings, 1 reply; 12+ messages in thread From: Jeff Liu @ 2013-12-07 5:51 UTC (permalink / raw) To: Ben Myers, Christoph Hellwig; +Cc: xfs@oss.sgi.com Hi Ben, On 12/07 2013 05:01 AM, Ben Myers wrote: > Hey Jeff, > > On Fri, Nov 29, 2013 at 05:36:01PM +0800, Jeff Liu wrote: >> On 11/28 2013 18:43 PM, Christoph Hellwig wrote: >>> On Tue, Nov 26, 2013 at 09:38:49PM +0800, Jeff Liu wrote: >>>> + if (flags & XFS_QMOPT_UQUOTA) >>>> + return xfs_qm_dqpurge(dqp, NULL); >>> >>> To me it doesn't make any sense to overload this function for the user >>> quotas that don't have hints. >> To me it would like a silly compromise. >>> >>> I'd suggest dropping this hunk and keeping a separate walk for >>> releasing the uquots. >> I thought this over and yup, that is an overload if neither group nor project >> are enabled, or we don't want to turn user quota off. >> >> But even so, we currently also have overloads by checking group/project hints >> before releasing any type of quota in xfs_qm_purge(). In this point, this fix >> can reduce a bit overloads by moving those checkups to xfs_qm_purge_hints() if >> we want to turn group/project quotas off. >> >> If we considering to drop above hunk to release user quota separately, we finally >> would have to walk through user quota to remove those hints again, i.e, >> >> /* Remove group/project hints from user dquot */ >> STATIC int >> xfs_qm_dqpurge_hints( >> struct xfs_dquot *dqp, >> void *data) >> { >> uint flags = *((uint *)data); >> struct xfs_dquot *gdqp; >> struct xfs_dquot *pdqp; >> >> xfs_dqlock(dqp); >> if (dqp->dq_flags & XFS_DQ_FREEING) { >> xfs_dqunlock(dqp); >> return EAGAIN; >> } >> >> /* If this quota has a hint attached, prepare for releasing it now */ >> gdqp = dqp->q_gdquot; >> if (gdqp) >> dqp->q_gdquot = NULL; >> >> pdqp = dqp->q_pdquot; >> if (pdqp) >> dqp->q_pdquot = NULL; >> >> xfs_dqunlock(dqp); >> >> if (gdqp) >> xfs_qm_dqrele(gdqp); >> if (pdqp) >> xfs_qm_dqrele(pdqp); >> >> return 0; >> } >> >> void >> xfs_qm_dqpurge_all() >> { >> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); >> >> if (flags & XFS_QMOPT_UQUOTA) >> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); >> if (flags & XFS_QMOPT_GQUOTA) >> xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); >> if (flags & XFS_QMOPT_PQUOTA) >> xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); >> } >> >> Above code is what I can figured out as per your suggestions for now, but it >> would introduce overheads for walking through user dquots to release hints >> separately if we want to turn user quota off. >> >> Any thoughts? > > I was gonna pull in the single walk version, but now I realize that it is still > under discussion. I'm happy with either implementation, with maybe a slight > preference for a single user quota walk. Can you and Christoph come to an > agreement? For now, I can not figure out a more optimized solution. Well, I just realized I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints() since they will be evaluated by dqp pointers dereference anyway. As a minor fix, the revised version was shown as follows. Christoph, as I mentioned previously, keeping a separate walk to release the user dquots would also have overloads in some cases, would you happy to have this fix although it is not most optimized? Also, Ben, would you please pull in another fix below? It is independent to the current fix and it has already been reviewed by Christoph. :) [PATCH v2 1/3] xfs: fix assertion failure at xfs_setattr_nonsize Thanks, -Jeff From: Jie Liu <jeff.liu@oracle.com> xfs_quota(8) will hang up if trying to turn group/project quota off before the user quota is off, this could be 100% reproduced by: # mount -ouquota,gquota /dev/sda7 /xfs # mkdir /xfs/test # xfs_quota -xc 'off -g' /xfs <-- hangs up # echo w > /proc/sysrq-trigger # dmesg SysRq : Show Blocked State task PC stack pid father xfs_quota D 0000000000000000 0 27574 2551 0x00000000 [snip] Call Trace: [<ffffffff81aaa21d>] schedule+0xad/0xc0 [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0 [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0 [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0 [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs] [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30 [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs] [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs] [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs] [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs] [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs] [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs] [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs] [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs] [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs] [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0 [<ffffffff812fba4b>] ? iput+0x5b/0x90 [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0 [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b It's fine if we turn user quota off at first, then turn off other kind of quotas if they are enabled since the group/project dquot refcount is decreased to zero once the user quota if off. Otherwise, those dquots refcount is non-zero due to the user dquot might refer to them as hint(s). Hence, above operation cause an infinite loop at xfs_qm_dquot_walk() while trying to purge dquot cache. This problem has been around since Linux 3.4, it was introduced by: [ b84a3a9675 xfs: remove the per-filesystem list of dquots ] Originally we will release the group dquot pointers because the user dquots maybe carrying around as a hint via xfs_qm_detach_gdquots(). However, with above change, there is no such work to be done before purging group/project dquot cache. In order to solve this problem, this patch introduces a special routine xfs_qm_dqpurge_hints(), and it would release the group/project dquot pointers the user dquots maybe carrying around as a hint, and then it will proceed to purge the user dquot cache if requested. Cc: stable@vger.kernel.org Signed-off-by: Jie Liu <jeff.liu@oracle.com> --- fs/xfs/xfs_qm.c | 71 ++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 21 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 14a4996..424ef73 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -134,8 +134,6 @@ xfs_qm_dqpurge( { struct xfs_mount *mp = dqp->q_mount; struct xfs_quotainfo *qi = mp->m_quotainfo; - struct xfs_dquot *gdqp = NULL; - struct xfs_dquot *pdqp = NULL; xfs_dqlock(dqp); if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) { @@ -143,21 +141,6 @@ xfs_qm_dqpurge( return EAGAIN; } - /* - * If this quota has a hint attached, prepare for releasing it now. - */ - gdqp = dqp->q_gdquot; - if (gdqp) { - xfs_dqlock(gdqp); - dqp->q_gdquot = NULL; - } - - pdqp = dqp->q_pdquot; - if (pdqp) { - xfs_dqlock(pdqp); - dqp->q_pdquot = NULL; - } - dqp->dq_flags |= XFS_DQ_FREEING; xfs_dqflock(dqp); @@ -206,11 +189,47 @@ xfs_qm_dqpurge( XFS_STATS_DEC(xs_qm_dquot_unused); xfs_qm_dqdestroy(dqp); + return 0; +} + +/* + * Release the group or project dquot pointers the user dquots maybe carrying + * around as a hint, and proceed to purge the user dquot cache if requested. +*/ +STATIC int +xfs_qm_dqpurge_hints( + struct xfs_dquot *dqp, + void *data) +{ + uint flags = *((uint *)data); + struct xfs_dquot *gdqp; + struct xfs_dquot *pdqp; + xfs_dqlock(dqp); + if (dqp->dq_flags & XFS_DQ_FREEING) { + xfs_dqunlock(dqp); + return EAGAIN; + } + + /* If this quota has a hint attached, prepare for releasing it now */ + gdqp = dqp->q_gdquot; if (gdqp) - xfs_qm_dqput(gdqp); + dqp->q_gdquot = NULL; + + pdqp = dqp->q_pdquot; if (pdqp) - xfs_qm_dqput(pdqp); + dqp->q_pdquot = NULL; + + xfs_dqunlock(dqp); + + if (gdqp) + xfs_qm_dqrele(gdqp); + if (pdqp) + xfs_qm_dqrele(pdqp); + + if (flags & XFS_QMOPT_UQUOTA) + return xfs_qm_dqpurge(dqp, NULL); + return 0; } @@ -222,8 +241,18 @@ xfs_qm_dqpurge_all( struct xfs_mount *mp, uint flags) { - if (flags & XFS_QMOPT_UQUOTA) - xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); + /* + * We have to release group/project dquot hint(s) from the user dquot + * at first if they are there, otherwise we would run into an infinite + * loop while walking through radix tree to purge other type of dquots + * since their refcount is not zero if the user dquot refers to them + * as hint. + * + * Call the special xfs_qm_dqpurge_hints() will end up go through the + * general xfs_qm_dqpurge() against user dquot cache if requested. + */ + xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, &flags); + if (flags & XFS_QMOPT_GQUOTA) xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); if (flags & XFS_QMOPT_PQUOTA) -- 1.8.3.2 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-07 5:51 ` Jeff Liu @ 2013-12-09 1:26 ` Dave Chinner 2013-12-09 2:36 ` Dave Chinner 0 siblings, 1 reply; 12+ messages in thread From: Dave Chinner @ 2013-12-09 1:26 UTC (permalink / raw) To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs@oss.sgi.com On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote: > Hi Ben, > .... > >> void > >> xfs_qm_dqpurge_all() > >> { > >> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); > >> > >> if (flags & XFS_QMOPT_UQUOTA) > >> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); > >> if (flags & XFS_QMOPT_GQUOTA) > >> xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); > >> if (flags & XFS_QMOPT_PQUOTA) > >> xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); > >> } > >> > >> Above code is what I can figured out as per your suggestions for now, but it > >> would introduce overheads for walking through user dquots to release hints > >> separately if we want to turn user quota off. > >> > >> Any thoughts? > > > > I was gonna pull in the single walk version, but now I realize that it is still > > under discussion. I'm happy with either implementation, with maybe a slight > > preference for a single user quota walk. Can you and Christoph come to an > > agreement? > For now, I can not figure out a more optimized solution. Well, I just realized > I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints() > since they will be evaluated by dqp pointers dereference anyway. As a minor fix, > the revised version was shown as follows. > > Christoph, as I mentioned previously, keeping a separate walk to release the user > dquots would also have overloads in some cases, would you happy to have this fix > although it is not most optimized? I'm happy either way it is done - I'd prefer we fix the problem than bikeshed over an extra radix tree walk or not given for most people the overhead won't be significant. > From: Jie Liu <jeff.liu@oracle.com> > > xfs_quota(8) will hang up if trying to turn group/project quota off > before the user quota is off, this could be 100% reproduced by: ..... So from the perspective, I'm happy to consider the updated patch as: Reviewed-by: Dave Chinner <dchinner@redhat.com> However, I question the need for the hints at all now. The hints were necessary back when the quota manager had global lists and hashes, and the lookups were expensive. Hence there was a significant win to caching the group dquot on the user dquot as it avoided a significant amount of code, locks and dirty cachelines. Now, it's just a radix tree lookup under only a single lock and the process dirties far fewer cachelines (none in the radix tree at all) and so should be substantially faster than the old code. And with the dquots being attached and cached on inodes in the first place, I don't see much advantage to keeping hints on the user dquot. THis is especially true for project quotas where a user might be accessing files in different projects all the time and so thrashing the project quota hint on the user dquot.... Hence I wonder if removing the dquot hint caching altogether would result in smaller, simpler, faster code. And, in reality, if the radix tree lock is a contention point on lookup after removing the hints, then we can fix that quite easily by switching to RCU-based lockless lookups like we do for the inode cache.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-09 1:26 ` Dave Chinner @ 2013-12-09 2:36 ` Dave Chinner 2013-12-09 3:26 ` Jeff Liu 2013-12-09 7:10 ` Christoph Hellwig 0 siblings, 2 replies; 12+ messages in thread From: Dave Chinner @ 2013-12-09 2:36 UTC (permalink / raw) To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs@oss.sgi.com On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote: > On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote: > > Hi Ben, > > > .... > > >> void > > >> xfs_qm_dqpurge_all() > > >> { > > >> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); > > >> > > >> if (flags & XFS_QMOPT_UQUOTA) > > >> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); > > >> if (flags & XFS_QMOPT_GQUOTA) > > >> xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); > > >> if (flags & XFS_QMOPT_PQUOTA) > > >> xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); > > >> } > > >> > > >> Above code is what I can figured out as per your suggestions for now, but it > > >> would introduce overheads for walking through user dquots to release hints > > >> separately if we want to turn user quota off. > > >> > > >> Any thoughts? > > > > > > I was gonna pull in the single walk version, but now I realize that it is still > > > under discussion. I'm happy with either implementation, with maybe a slight > > > preference for a single user quota walk. Can you and Christoph come to an > > > agreement? > > For now, I can not figure out a more optimized solution. Well, I just realized > > I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints() > > since they will be evaluated by dqp pointers dereference anyway. As a minor fix, > > the revised version was shown as follows. > > > > Christoph, as I mentioned previously, keeping a separate walk to release the user > > dquots would also have overloads in some cases, would you happy to have this fix > > although it is not most optimized? > > I'm happy either way it is done - I'd prefer we fix the problem than > bikeshed over an extra radix tree walk or not given for most people > the overhead won't be significant. > > > From: Jie Liu <jeff.liu@oracle.com> > > > > xfs_quota(8) will hang up if trying to turn group/project quota off > > before the user quota is off, this could be 100% reproduced by: > ..... > > So from the perspective, I'm happy to consider the updated > patch as: > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > However, I question the need for the hints at all now. The hints > were necessary back when the quota manager had global lists and > hashes, and the lookups were expensive. Hence there was a > significant win to caching the group dquot on the user dquot as it > avoided a significant amount of code, locks and dirty cachelines. > > Now, it's just a radix tree lookup under only a single lock and the > process dirties far fewer cachelines (none in the radix tree at all) > and so should be substantially faster than the old code. And with > the dquots being attached and cached on inodes in the first place, I > don't see much advantage to keeping hints on the user dquot. THis is > especially true for project quotas where a user might be accessing > files in different projects all the time and so thrashing the > project quota hint on the user dquot.... > > Hence I wonder if removing the dquot hint caching altogether would > result in smaller, simpler, faster code. And, in reality, if the > radix tree lock is a contention point on lookup after removing the > hints, then we can fix that quite easily by switching to RCU-based > lockless lookups like we do for the inode cache.... Actually, scalability couldn't get any worse by removing the hints. If I run a concurrent workload with quota enabled, the global dquot locks (be it user, quota or project) completely serialises the workload. This result if from u/g/p all enabled, run by a single user in a single group and a project ID of zero: ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d /mnt/scratch/6 -d /mnt/scratch/7 -d /mnt/scratch/8 -d /mnt/scratch/9 -d /mnt/scratch/10 -d /mnt/scratch/11 -d /mnt/scratch/12 -d /mnt/scratch/13 -d /mnt/scratch/14 -d /mnt/scratch/15 # Version 3.3, 16 thread(s) starting at Mon Dec 9 12:53:46 2013 # Sync method: NO SYNC: Test does not issue sync() or fsync() calls. # Directories: Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory. # File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name) # Files info: size 0 bytes, written with an IO size of 16384 bytes per write # App overhead is time in microseconds spent in the test not doing file writing related system calls. FSUse% Count Size Files/sec App Overhead 0 1600000 0 17666.5 15377143 0 3200000 0 17018.6 15922906 0 4800000 0 17373.5 16149660 0 6400000 0 16564.9 17234139 .... Without quota enabled, that workload runs at >250,000 files/sec. Serialisation is completely on the dquot locks - so I don't see anything right now that hints are going to buy us in terms of improving concurrency or scalability, so I think we probably can just get rid of them. FWIW, getting rid of the hints and converting the dquot reference counter to an atomic actually improves performance a bit: FSUse% Count Size Files/sec App Overhead 0 1600000 0 17559.3 15606077 0 3200000 0 18738.9 14026009 0 4800000 0 18960.0 14381162 0 6400000 0 19026.5 14422024 0 8000000 0 18456.6 15369059 Sure, 10% improvement is 10%, but concurrency still sucks. At least it narrows down the cause - the transactional modifications are the serialisation issue. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-09 2:36 ` Dave Chinner @ 2013-12-09 3:26 ` Jeff Liu 2013-12-09 6:09 ` Dave Chinner 2013-12-09 7:10 ` Christoph Hellwig 1 sibling, 1 reply; 12+ messages in thread From: Jeff Liu @ 2013-12-09 3:26 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, Ben Myers, xfs@oss.sgi.com On 12/09 2013 10:36 AM, Dave Chinner wrote: > On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote: >> On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote: >>> Hi Ben, >>> >> .... >>>>> void >>>>> xfs_qm_dqpurge_all() >>>>> { >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); >>>>> >>>>> if (flags & XFS_QMOPT_UQUOTA) >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); >>>>> if (flags & XFS_QMOPT_GQUOTA) >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); >>>>> if (flags & XFS_QMOPT_PQUOTA) >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); >>>>> } >>>>> >>>>> Above code is what I can figured out as per your suggestions for now, but it >>>>> would introduce overheads for walking through user dquots to release hints >>>>> separately if we want to turn user quota off. >>>>> >>>>> Any thoughts? >>>> >>>> I was gonna pull in the single walk version, but now I realize that it is still >>>> under discussion. I'm happy with either implementation, with maybe a slight >>>> preference for a single user quota walk. Can you and Christoph come to an >>>> agreement? >>> For now, I can not figure out a more optimized solution. Well, I just realized >>> I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints() >>> since they will be evaluated by dqp pointers dereference anyway. As a minor fix, >>> the revised version was shown as follows. >>> >>> Christoph, as I mentioned previously, keeping a separate walk to release the user >>> dquots would also have overloads in some cases, would you happy to have this fix >>> although it is not most optimized? >> >> I'm happy either way it is done - I'd prefer we fix the problem than >> bikeshed over an extra radix tree walk or not given for most people >> the overhead won't be significant. >> >>> From: Jie Liu <jeff.liu@oracle.com> >>> >>> xfs_quota(8) will hang up if trying to turn group/project quota off >>> before the user quota is off, this could be 100% reproduced by: >> ..... >> >> So from the perspective, I'm happy to consider the updated >> patch as: >> >> Reviewed-by: Dave Chinner <dchinner@redhat.com> >> >> However, I question the need for the hints at all now. The hints >> were necessary back when the quota manager had global lists and >> hashes, and the lookups were expensive. Hence there was a >> significant win to caching the group dquot on the user dquot as it >> avoided a significant amount of code, locks and dirty cachelines. >> >> Now, it's just a radix tree lookup under only a single lock and the >> process dirties far fewer cachelines (none in the radix tree at all) >> and so should be substantially faster than the old code. And with >> the dquots being attached and cached on inodes in the first place, I >> don't see much advantage to keeping hints on the user dquot. THis is >> especially true for project quotas where a user might be accessing >> files in different projects all the time and so thrashing the >> project quota hint on the user dquot.... Ah, that accounts for it! Yesterday, I even thought to add an udquot member to struct xfs_dquot in order to avoid walk though user quota while turning off others, i.e, diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h index d22ed00..0037c7e 100644 --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -52,6 +52,13 @@ typedef struct xfs_dquot { int q_bufoffset; /* off of dq in buffer (# dquots) */ xfs_fileoff_t q_fileoffset; /* offset in quotas file */ + union { + struct xfs_dquot *q_udquot; + struct { + struct xfs_dquot *q_pdquot; + struct xfs_dquot *q_gdquot; + } gp_hints; + } hints; struct xfs_dquot*q_gdquot; /* group dquot, hint only */ struct xfs_dquot*q_pdquot; /* project dquot, hint only */ xfs_disk_dquot_t q_core; /* actual usage & quotas */ In this way, I can attach the q_udquot to group/project dquots while attaching them to the user's. Thus I don't need to walk through user dquots to fetch the hints but to fetch them via: gdquot->hints.q_udquot.g_pdquot/g_gdquot and then decrease the reference count, but that need more code changes and add complexities. >> >> Hence I wonder if removing the dquot hint caching altogether would >> result in smaller, simpler, faster code. And, in reality, if the >> radix tree lock is a contention point on lookup after removing the >> hints, then we can fix that quite easily by switching to RCU-based >> lockless lookups like we do for the inode cache.... > > Actually, scalability couldn't get any worse by removing the hints. > If I run a concurrent workload with quota enabled, the global dquot > locks (be it user, quota or project) completely serialises the > workload. This result if from u/g/p all enabled, run by a single > user in a single group and a project ID of zero: > > ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d /mnt/scratch/6 -d /mnt/scratch/7 -d /mnt/scratch/8 -d /mnt/scratch/9 -d /mnt/scratch/10 -d /mnt/scratch/11 -d /mnt/scratch/12 -d /mnt/scratch/13 -d /mnt/scratch/14 -d /mnt/scratch/15 > # Version 3.3, 16 thread(s) starting at Mon Dec 9 12:53:46 2013 > # Sync method: NO SYNC: Test does not issue sync() or fsync() calls. > # Directories: Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory. > # File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name) > # Files info: size 0 bytes, written with an IO size of 16384 bytes per write > # App overhead is time in microseconds spent in the test not doing file writing related system calls. > > FSUse% Count Size Files/sec App Overhead > 0 1600000 0 17666.5 15377143 > 0 3200000 0 17018.6 15922906 > 0 4800000 0 17373.5 16149660 > 0 6400000 0 16564.9 17234139 > .... > > Without quota enabled, that workload runs at >250,000 files/sec. > > Serialisation is completely on the dquot locks - so I don't see > anything right now that hints are going to buy us in terms of > improving concurrency or scalability, so I think we probably can > just get rid of them. > > FWIW, getting rid of the hints and converting the dquot reference > counter to an atomic actually improves performance a bit: > > FSUse% Count Size Files/sec App Overhead > 0 1600000 0 17559.3 15606077 > 0 3200000 0 18738.9 14026009 > 0 4800000 0 18960.0 14381162 > 0 6400000 0 19026.5 14422024 > 0 8000000 0 18456.6 15369059 > > Sure, 10% improvement is 10%, but concurrency still sucks. At least > it narrows down the cause - the transactional modifications are the > serialisation issue. Admire!! I'm still in considering of remove the hints but you have already shown the measuring results. :) Would you like to fix it in this way directly or make it as a increased improvement once my current fix got merged? Both fine to me. Thanks, -Jeff _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-09 3:26 ` Jeff Liu @ 2013-12-09 6:09 ` Dave Chinner 2013-12-09 6:36 ` Dave Chinner 0 siblings, 1 reply; 12+ messages in thread From: Dave Chinner @ 2013-12-09 6:09 UTC (permalink / raw) To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs@oss.sgi.com On Mon, Dec 09, 2013 at 11:26:14AM +0800, Jeff Liu wrote: > On 12/09 2013 10:36 AM, Dave Chinner wrote: > > On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote: > >> On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote: > >>> Hi Ben, > >>> > >> .... > >>>>> void > >>>>> xfs_qm_dqpurge_all() > >>>>> { > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); > >>>>> > >>>>> if (flags & XFS_QMOPT_UQUOTA) > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); > >>>>> if (flags & XFS_QMOPT_GQUOTA) > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); > >>>>> if (flags & XFS_QMOPT_PQUOTA) > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); > >>>>> } > >>>>> > >>>>> Above code is what I can figured out as per your suggestions for now, but it > >>>>> would introduce overheads for walking through user dquots to release hints > >>>>> separately if we want to turn user quota off. > >>>>> > >>>>> Any thoughts? > >>>> > >>>> I was gonna pull in the single walk version, but now I realize that it is still > >>>> under discussion. I'm happy with either implementation, with maybe a slight > >>>> preference for a single user quota walk. Can you and Christoph come to an > >>>> agreement? > >>> For now, I can not figure out a more optimized solution. Well, I just realized > >>> I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints() > >>> since they will be evaluated by dqp pointers dereference anyway. As a minor fix, > >>> the revised version was shown as follows. > >>> > >>> Christoph, as I mentioned previously, keeping a separate walk to release the user > >>> dquots would also have overloads in some cases, would you happy to have this fix > >>> although it is not most optimized? > >> > >> I'm happy either way it is done - I'd prefer we fix the problem than > >> bikeshed over an extra radix tree walk or not given for most people > >> the overhead won't be significant. > >> > >>> From: Jie Liu <jeff.liu@oracle.com> > >>> > >>> xfs_quota(8) will hang up if trying to turn group/project quota off > >>> before the user quota is off, this could be 100% reproduced by: > >> ..... > >> > >> So from the perspective, I'm happy to consider the updated > >> patch as: > >> > >> Reviewed-by: Dave Chinner <dchinner@redhat.com> > >> > >> However, I question the need for the hints at all now. The hints > >> were necessary back when the quota manager had global lists and > >> hashes, and the lookups were expensive. Hence there was a > >> significant win to caching the group dquot on the user dquot as it > >> avoided a significant amount of code, locks and dirty cachelines. > >> > >> Now, it's just a radix tree lookup under only a single lock and the > >> process dirties far fewer cachelines (none in the radix tree at all) > >> and so should be substantially faster than the old code. And with > >> the dquots being attached and cached on inodes in the first place, I > >> don't see much advantage to keeping hints on the user dquot. THis is > >> especially true for project quotas where a user might be accessing > >> files in different projects all the time and so thrashing the > >> project quota hint on the user dquot.... > Ah, that accounts for it! Yesterday, I even thought to add an udquot > member to struct xfs_dquot in order to avoid walk though user quota > while turning off others, i.e, > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h > index d22ed00..0037c7e 100644 > --- a/fs/xfs/xfs_dquot.h > +++ b/fs/xfs/xfs_dquot.h > @@ -52,6 +52,13 @@ typedef struct xfs_dquot { > int q_bufoffset; /* off of dq in buffer (# dquots) */ > xfs_fileoff_t q_fileoffset; /* offset in quotas file */ > > + union { > + struct xfs_dquot *q_udquot; > + struct { > + struct xfs_dquot *q_pdquot; > + struct xfs_dquot *q_gdquot; > + } gp_hints; > + } hints; > struct xfs_dquot*q_gdquot; /* group dquot, hint only */ > struct xfs_dquot*q_pdquot; /* project dquot, hint only */ > xfs_disk_dquot_t q_core; /* actual usage & quotas */ > > In this way, I can attach the q_udquot to group/project dquots while > attaching them to the user's. Thus I don't need to walk through user > dquots to fetch the hints but to fetch them via: > gdquot->hints.q_udquot.g_pdquot/g_gdquot and then decrease the reference > count, but that need more code changes and add complexities. Yeah, more complexity, but I can see why you might take that path ;) > >> Hence I wonder if removing the dquot hint caching altogether would > >> result in smaller, simpler, faster code. And, in reality, if the > >> radix tree lock is a contention point on lookup after removing the > >> hints, then we can fix that quite easily by switching to RCU-based > >> lockless lookups like we do for the inode cache.... > > > > Actually, scalability couldn't get any worse by removing the hints. > > If I run a concurrent workload with quota enabled, the global dquot > > locks (be it user, quota or project) completely serialises the > > workload. This result if from u/g/p all enabled, run by a single > > user in a single group and a project ID of zero: ..... > > > > FSUse% Count Size Files/sec App Overhead > > 0 1600000 0 17666.5 15377143 > > 0 3200000 0 17018.6 15922906 > > 0 4800000 0 17373.5 16149660 > > 0 6400000 0 16564.9 17234139 > > .... > > > > Without quota enabled, that workload runs at >250,000 files/sec. > > > > Serialisation is completely on the dquot locks - so I don't see > > anything right now that hints are going to buy us in terms of > > improving concurrency or scalability, so I think we probably can > > just get rid of them. > > > > FWIW, getting rid of the hints and converting the dquot reference > > counter to an atomic actually improves performance a bit: > > > > FSUse% Count Size Files/sec App Overhead > > 0 1600000 0 17559.3 15606077 > > 0 3200000 0 18738.9 14026009 > > 0 4800000 0 18960.0 14381162 > > 0 6400000 0 19026.5 14422024 > > 0 8000000 0 18456.6 15369059 > > > > Sure, 10% improvement is 10%, but concurrency still sucks. At least > > it narrows down the cause - the transactional modifications are the > > serialisation issue. > Admire!! I'm still in considering of remove the hints but you have already > shown the measuring results. :) Oh, removing 200 lines of code is easy ;) It's making the dquot transaction code lockless that's hard... > Would you like to fix it in this way directly or make it as a increased > improvement once my current fix got merged? Both fine to me. Well, lets see what other's think about remove the hints altogether. If there is agreement on doing that, then we may as well just do that straight away. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-09 6:09 ` Dave Chinner @ 2013-12-09 6:36 ` Dave Chinner 0 siblings, 0 replies; 12+ messages in thread From: Dave Chinner @ 2013-12-09 6:36 UTC (permalink / raw) To: Jeff Liu; +Cc: Christoph Hellwig, Ben Myers, xfs@oss.sgi.com On Mon, Dec 09, 2013 at 05:09:45PM +1100, Dave Chinner wrote: > On Mon, Dec 09, 2013 at 11:26:14AM +0800, Jeff Liu wrote: > > On 12/09 2013 10:36 AM, Dave Chinner wrote: > > > On Mon, Dec 09, 2013 at 12:26:42PM +1100, Dave Chinner wrote: > > >> On Sat, Dec 07, 2013 at 01:51:24PM +0800, Jeff Liu wrote: > > >>> Hi Ben, > > >>> > > >> .... > > >>>>> void > > >>>>> xfs_qm_dqpurge_all() > > >>>>> { > > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, NULL); > > >>>>> > > >>>>> if (flags & XFS_QMOPT_UQUOTA) > > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); > > >>>>> if (flags & XFS_QMOPT_GQUOTA) > > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); > > >>>>> if (flags & XFS_QMOPT_PQUOTA) > > >>>>> xfs_qm_dquot_walk(mp, XFS_DQ_PROJ, xfs_qm_dqpurge, NULL); > > >>>>> } > > >>>>> > > >>>>> Above code is what I can figured out as per your suggestions for now, but it > > >>>>> would introduce overheads for walking through user dquots to release hints > > >>>>> separately if we want to turn user quota off. > > >>>>> > > >>>>> Any thoughts? > > >>>> > > >>>> I was gonna pull in the single walk version, but now I realize that it is still > > >>>> under discussion. I'm happy with either implementation, with maybe a slight > > >>>> preference for a single user quota walk. Can you and Christoph come to an > > >>>> agreement? > > >>> For now, I can not figure out a more optimized solution. Well, I just realized > > >>> I don't need to initialize both gdqp and pdqp to NULL at xfs_qm_dqpurge_hints() > > >>> since they will be evaluated by dqp pointers dereference anyway. As a minor fix, > > >>> the revised version was shown as follows. > > >>> > > >>> Christoph, as I mentioned previously, keeping a separate walk to release the user > > >>> dquots would also have overloads in some cases, would you happy to have this fix > > >>> although it is not most optimized? > > >> > > >> I'm happy either way it is done - I'd prefer we fix the problem than > > >> bikeshed over an extra radix tree walk or not given for most people > > >> the overhead won't be significant. > > >> > > >>> From: Jie Liu <jeff.liu@oracle.com> > > >>> > > >>> xfs_quota(8) will hang up if trying to turn group/project quota off > > >>> before the user quota is off, this could be 100% reproduced by: > > >> ..... > > >> > > >> So from the perspective, I'm happy to consider the updated > > >> patch as: > > >> > > >> Reviewed-by: Dave Chinner <dchinner@redhat.com> > > >> > > >> However, I question the need for the hints at all now. The hints > > >> were necessary back when the quota manager had global lists and > > >> hashes, and the lookups were expensive. Hence there was a > > >> significant win to caching the group dquot on the user dquot as it > > >> avoided a significant amount of code, locks and dirty cachelines. > > >> > > >> Now, it's just a radix tree lookup under only a single lock and the > > >> process dirties far fewer cachelines (none in the radix tree at all) > > >> and so should be substantially faster than the old code. And with > > >> the dquots being attached and cached on inodes in the first place, I > > >> don't see much advantage to keeping hints on the user dquot. THis is > > >> especially true for project quotas where a user might be accessing > > >> files in different projects all the time and so thrashing the > > >> project quota hint on the user dquot.... > > Ah, that accounts for it! Yesterday, I even thought to add an udquot > > member to struct xfs_dquot in order to avoid walk though user quota > > while turning off others, i.e, > > > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h > > index d22ed00..0037c7e 100644 > > --- a/fs/xfs/xfs_dquot.h > > +++ b/fs/xfs/xfs_dquot.h > > @@ -52,6 +52,13 @@ typedef struct xfs_dquot { > > int q_bufoffset; /* off of dq in buffer (# dquots) */ > > xfs_fileoff_t q_fileoffset; /* offset in quotas file */ > > > > + union { > > + struct xfs_dquot *q_udquot; > > + struct { > > + struct xfs_dquot *q_pdquot; > > + struct xfs_dquot *q_gdquot; > > + } gp_hints; > > + } hints; > > struct xfs_dquot*q_gdquot; /* group dquot, hint only */ > > struct xfs_dquot*q_pdquot; /* project dquot, hint only */ > > xfs_disk_dquot_t q_core; /* actual usage & quotas */ > > > > In this way, I can attach the q_udquot to group/project dquots while > > attaching them to the user's. Thus I don't need to walk through user > > dquots to fetch the hints but to fetch them via: > > gdquot->hints.q_udquot.g_pdquot/g_gdquot and then decrease the reference > > count, but that need more code changes and add complexities. > > Yeah, more complexity, but I can see why you might take that path ;) > > > >> Hence I wonder if removing the dquot hint caching altogether would > > >> result in smaller, simpler, faster code. And, in reality, if the > > >> radix tree lock is a contention point on lookup after removing the > > >> hints, then we can fix that quite easily by switching to RCU-based > > >> lockless lookups like we do for the inode cache.... > > > > > > Actually, scalability couldn't get any worse by removing the hints. > > > If I run a concurrent workload with quota enabled, the global dquot > > > locks (be it user, quota or project) completely serialises the > > > workload. This result if from u/g/p all enabled, run by a single > > > user in a single group and a project ID of zero: > ..... > > > > > > FSUse% Count Size Files/sec App Overhead > > > 0 1600000 0 17666.5 15377143 > > > 0 3200000 0 17018.6 15922906 > > > 0 4800000 0 17373.5 16149660 > > > 0 6400000 0 16564.9 17234139 > > > .... > > > > > > Without quota enabled, that workload runs at >250,000 files/sec. > > > > > > Serialisation is completely on the dquot locks - so I don't see > > > anything right now that hints are going to buy us in terms of > > > improving concurrency or scalability, so I think we probably can > > > just get rid of them. > > > > > > FWIW, getting rid of the hints and converting the dquot reference > > > counter to an atomic actually improves performance a bit: > > > > > > FSUse% Count Size Files/sec App Overhead > > > 0 1600000 0 17559.3 15606077 > > > 0 3200000 0 18738.9 14026009 > > > 0 4800000 0 18960.0 14381162 > > > 0 6400000 0 19026.5 14422024 > > > 0 8000000 0 18456.6 15369059 > > > > > > Sure, 10% improvement is 10%, but concurrency still sucks. At least > > > it narrows down the cause - the transactional modifications are the > > > serialisation issue. > > Admire!! I'm still in considering of remove the hints but you have already > > shown the measuring results. :) > > Oh, removing 200 lines of code is easy ;) > > It's making the dquot transaction code lockless that's hard... Well, the easy bit is prototyped - xfs_trans_dqresv() uses cmpxchg loops now: FSUse% Count Size Files/sec App Overhead 0 1600000 0 24487.3 13020824 0 3200000 0 24399.3 13333231 0 4800000 0 24268.4 14574793 0 6400000 0 24026.3 14918698 And all the lock contention is locking the dquots during xfs_trans_commit(): - 3.44% [kernel] [k] __mutex_lock_slowpath - __mutex_lock_slowpath - 99.56% mutex_lock - 83.30% xfs_trans_dqlockedjoin xfs_trans_apply_dquot_deltas xfs_trans_commit xfs_create xfs_vn_mknod xfs_vn_create vfs_create .... I have a couple of interesting thoughts on how to deal with this; it involves building on top of Christoph's log formatting patch series and pushing the dquot modifications all the way into the ->iop_format method where they can be applied locklessly to the dquot.... If I can make that work, then I need to step back and have a deep think about how we can apply the technique generically at an object level, because if we can do this then lockless, concurrent transactions are possible for various types of objects. I can see much joy coming from not having to lock the inode to update the file size transactionally during IO completion.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-09 2:36 ` Dave Chinner 2013-12-09 3:26 ` Jeff Liu @ 2013-12-09 7:10 ` Christoph Hellwig 2013-12-09 23:07 ` Ben Myers 1 sibling, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2013-12-09 7:10 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, Jeff Liu, Ben Myers, xfs@oss.sgi.com The hint removal gets my vote. I actually had it written up the last time I touch the quota code, but for some reasons I stopped before sumitting it. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot 2013-12-09 7:10 ` Christoph Hellwig @ 2013-12-09 23:07 ` Ben Myers 0 siblings, 0 replies; 12+ messages in thread From: Ben Myers @ 2013-12-09 23:07 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jeff Liu, xfs@oss.sgi.com On Sun, Dec 08, 2013 at 11:10:03PM -0800, Christoph Hellwig wrote: > The hint removal gets my vote. I actually had it written up the last > time I touch the quota code, but for some reasons I stopped before > sumitting it. Sounds good to me too. -Ben _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-12-09 23:07 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-11-26 13:38 [PATCH v2 2/3] xfs: fix infinite loop by detaching the group/project hints from user dquot Jeff Liu 2013-11-28 10:43 ` Christoph Hellwig 2013-11-29 9:36 ` Jeff Liu 2013-12-06 21:01 ` Ben Myers 2013-12-07 5:51 ` Jeff Liu 2013-12-09 1:26 ` Dave Chinner 2013-12-09 2:36 ` Dave Chinner 2013-12-09 3:26 ` Jeff Liu 2013-12-09 6:09 ` Dave Chinner 2013-12-09 6:36 ` Dave Chinner 2013-12-09 7:10 ` Christoph Hellwig 2013-12-09 23:07 ` Ben Myers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox