From: Rich Johnston <rjohnston@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 3/4] xfstests: _check_quota_usage needs to unmount to get XFS quotacheck
Date: Thu, 16 Aug 2012 14:16:33 -0500
Message-ID: <502D4711.1010809@sgi.com>
In-Reply-To: <20120726225504.GB2877@dastard>
On 07/26/2012 05:55 PM, Dave Chinner wrote:
> On Thu, Jul 26, 2012 at 06:35:05PM +1000, Dave Chinner wrote:
>> From: Dave Chinner <dchinner@redhat.com>
>>
>> Remount won't run a quota check - it's only done during mount. Hence
>> all quota tests using this check function are not actually
>> validating XFS filesystems right now.
>>
>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> FWIW, this change is exposing some problems in the new dquot code:
>
>> ---
>> common.quota | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/common.quota b/common.quota
>> index 9736306..2fa784b 100644
>> --- a/common.quota
>> +++ b/common.quota
>> @@ -236,6 +236,11 @@ _check_quota_usage()
>> {
>> # Sync to get delalloc to disk
>> sync
>> +
>> + # kill caches to guarantee removal speculative delalloc
>> + # XXX: really need an ioctl instead of this big hammer
>> + echo 3 > /proc/sys/vm/drop_caches
>> +
>
> Some kind of locking issue is present:
>
> [ 1871.738970] XFS (vdb): Quotacheck: Done.
> [ 1877.795774] ------------[ cut here ]------------
> [ 1877.797347] WARNING: at kernel/mutex-debug.c:78 debug_mutex_unlock+0xda/0xe0()
> [ 1877.799416] Hardware name: Bochs
> [ 1877.799416] Modules linked in:
> [ 1877.799416] Pid: 2261, comm: 232 Not tainted 3.5.0-rc5-dgc+ #313
> [ 1877.799416] Call Trace:
> [ 1877.799416] [<ffffffff8107a83f>] warn_slowpath_common+0x7f/0xc0
> [ 1877.799416] [<ffffffff8107a89a>] warn_slowpath_null+0x1a/0x20
> [ 1877.799416] [<ffffffff810d022a>] debug_mutex_unlock+0xda/0xe0
> [ 1877.799416] [<ffffffff81b4c97c>] __mutex_unlock_slowpath+0x7c/0x130
> [ 1877.799416] [<ffffffff81b4ca3e>] mutex_unlock+0xe/0x10
> [ 1877.799416] [<ffffffff814b12d8>] xfs_qm_dqreclaim_one+0x178/0x3d0
> [ 1877.799416] [<ffffffff814b1620>] xfs_qm_shake+0xf0/0x170
> [ 1877.799416] [<ffffffff81137789>] shrink_slab+0x169/0x350
> [ 1877.799416] [<ffffffff81709b04>] ? do_raw_spin_lock+0x54/0x120
> [ 1877.799416] [<ffffffff8118a488>] ? iput+0x48/0x210
> [ 1877.799416] [<ffffffff8119b433>] drop_caches_sysctl_handler+0x73/0xa0
> [ 1877.799416] [<ffffffff811de863>] proc_sys_call_handler.isra.11+0xb3/0xd0
> [ 1877.799416] [<ffffffff811de898>] proc_sys_write+0x18/0x20
> [ 1877.799416] [<ffffffff81170298>] vfs_write+0xa8/0x160
> [ 1877.799416] [<ffffffff8117058a>] sys_write+0x4a/0x90
> [ 1877.799416] [<ffffffff81b57269>] system_call_fastpath+0x16/0x1b
> [ 1877.799416] ---[ end trace 4f2a89b2cbd5e64f ]---
>
> which is:
>
> DEBUG_LOCKS_WARN_ON(lock->owner != current);
>
> so something other than the task that locked the mutex unlocked it,
> or we are unlocking an unlocked dquot...
>
>> VFS_QUOTA=0
>> case $FSTYP in
>> ext2|ext3|ext4|ext4dev|reiserfs)
>> @@ -253,8 +258,9 @@ _check_quota_usage()
>> quotacheck -u -g $SCRATCH_MNT 2>/dev/null
>> else
>> # use XFS method to force quotacheck
>> - mount -o remount,noquota $SCRATCH_DEV
>> - mount -o remount,usrquota,grpquota $SCRATCH_DEV
>> + xfs_quota -x -c "off -ug" $SCRATCH_MNT
>
> And this is hanging with what appears to be a reference counting bug
> when purging dquots in generic/233:
>
> # echo w > /proc/sysrq-trigger
> [53710.206100] SysRq : Show Blocked State
> [53710.207213] task PC stack pid father
> [53710.208749] xfs_quota D ffff88003fc12880 3896 18147 17936 0x00000000
> [53710.209738] ffff88000f3afc18 0000000000000086 ffff88001cb160c0 ffff88000f3affd8
> [53710.209738] ffff88000f3affd8 ffff88000f3affd8 ffffffff81f9b420 ffff88001cb160c0
> [53710.209738] ffff88000f3afc08 ffffffff821ece80 ffff88000f3afc50 0000000100cbbe68
> [53710.209738] Call Trace:
> [53710.209738] [<ffffffff81b4dea9>] schedule+0x29/0x70
> [53710.209738] [<ffffffff81b4bcad>] schedule_timeout+0x13d/0x2c0
> [53710.209738] [<ffffffff81089f90>] ? usleep_range+0x50/0x50
> [53710.209738] [<ffffffff814aea90>] ? xfs_qm_need_dqattach+0x70/0x70
> [53710.209738] [<ffffffff81b4be4e>] schedule_timeout_uninterruptible+0x1e/0x20
> [53710.209738] [<ffffffff814aeef3>] xfs_qm_dquot_walk+0x153/0x170
> [53710.209738] [<ffffffff816fb81b>] ? radix_tree_lookup+0xb/0x10
> [53710.209738] [<ffffffff8149772a>] ? xfs_perag_get+0x3a/0x120
> [53710.209738] [<ffffffff814ace60>] ? xfs_trans_free_dqinfo+0x40/0x40
> [53710.209738] [<ffffffff81448aef>] ? xfs_inode_ag_iterator+0x8f/0xa0
> [53710.209738] [<ffffffff814aef93>] xfs_qm_dqpurge_all+0x83/0x90
> [53710.209738] [<ffffffff814ae4b9>] xfs_qm_scall_quotaoff+0x139/0x350
> [53710.209738] [<ffffffff814b2780>] xfs_fs_set_xstate+0xd0/0xf0
> [53710.209738] [<ffffffff811d1088>] sys_quotactl+0x1f8/0x740
> [53710.209738] [<ffffffff81174d7a>] ? sys_newstat+0x2a/0x40
> [53710.209738] [<ffffffff81b52635>] ? do_async_page_fault+0x35/0x90
> [53710.209738] [<ffffffff81b57269>] system_call_fastpath+0x16/0x1b
>
> It's hitting a dquot that either has the FREEING flag set or an
> elevated reference count, so is skipping it. It gets stuck in the
> loop forever retrying. That's probably related to the above lock
> issue.
>
> And generic/231 fails with a significant accounting difference:
>
> generic/231 [failed, exit status 1] - output mismatch (see tests/generic/231.out.bad)
> --- tests/generic/231.out 2012-07-26 18:42:30.000000000 +1000
> +++ results/generic/231.out.bad 2012-07-27 08:24:22.000000000 +1000
> @@ -2,15 +2,7 @@
> === FSX Standard Mode, Memory Mapping, 1 Tasks ===
> All operations completed A-OK!
> Comparing user usage
> -Comparing group usage
> -=== FSX Standard Mode, Memory Mapping, 4 Tasks ===
> -All operations completed A-OK!
> -All operations completed A-OK!
> -All operations completed A-OK!
> -All operations completed A-OK!
> -Comparing user usage
> -Comparing group usage
> -=== FSX Standard Mode, Memory Mapping, 1 Tasks ===
> -All operations completed A-OK!
> -Comparing user usage
> -Comparing group usage
> +4c4
> +< #1001 -- 524 0 0 3 0 0
> +---
> +> #1001 -- 316 0 0 3 0 0
>
> generic/270 and generic/233 give a similar mismatch when they don't
> hang.
>
> So, yeah, we haven't been verifying the quota accounting code as
> well as we should have been for some time now....
>
> Cheers,
>
> Dave.
>
I did see the hang sometimes, as well as the accounting mismatch. Dave,
do you want to look into this further? Otherwise I am OK with approving
this patch and tracking the accounting mismatch and the lockup under
another bug, because this patch is the right way to work around the
remount issue. I will leave it up to you.
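
For reference, here is a minimal dry-run sketch of the sequence the patch
is moving towards. It only prints the commands and does not run them. The
"off -ug" step is taken from the quoted hunk; the umount and mount steps
are an assumption based on the subject line, since the rest of the hunk is
not quoted here. The function name print_quotacheck_cycle and the example
device and mount point are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a full unmount/mount cycle
# intended to make XFS run quotacheck at mount time.
# print_quotacheck_cycle and its arguments are hypothetical names.
print_quotacheck_cycle() {
    dev=$1
    mnt=$2
    # Flush delalloc to disk first, as _check_quota_usage already does.
    printf '%s\n' "sync"
    # Turn quota accounting off (this step is from the quoted patch).
    printf '%s\n' "xfs_quota -x -c \"off -ug\" $mnt"
    # Assumed: a remount is not enough, so unmount completely ...
    printf '%s\n' "umount $mnt"
    # ... and mount again with quota options so the check runs at mount.
    printf '%s\n' "mount -o usrquota,grpquota $dev $mnt"
}

print_quotacheck_cycle /dev/vdb /mnt/scratch
```

Printing rather than executing keeps the sketch safe to run on any
machine; the real helper would use $SCRATCH_DEV and $SCRATCH_MNT.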
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs