From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jiaying Zhang Subject: Re: deadlocks with fs quotas Date: Mon, 29 Jun 2009 17:12:11 -0700 Message-ID: <5df78e1d0906291712m6187d0e5w14fffe8719558bf7@mail.gmail.com> References: <5df78e1d0906261805n198639ddoe6b5a0856a065bfd@mail.gmail.com> <20090627183129.GB31902@duck.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: linux-fsdevel@vger.kernel.org To: Jan Kara Return-path: Received: from smtp-out.google.com ([216.239.45.13]:16020 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751110AbZF3AML (ORCPT ); Mon, 29 Jun 2009 20:12:11 -0400 Received: from zps75.corp.google.com (zps75.corp.google.com [172.25.146.75]) by smtp-out.google.com with ESMTP id n5U0CELr020863 for ; Mon, 29 Jun 2009 17:12:14 -0700 Received: from qyk41 (qyk41.prod.google.com [10.241.83.169]) by zps75.corp.google.com with ESMTP id n5U0Bnlp020317 for ; Mon, 29 Jun 2009 17:12:12 -0700 Received: by qyk41 with SMTP id 41so1901908qyk.19 for ; Mon, 29 Jun 2009 17:12:12 -0700 (PDT) In-Reply-To: <20090627183129.GB31902@duck.suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sat, Jun 27, 2009 at 11:31 AM, Jan Kara wrote: > Hello, > > On Fri 26-06-09 18:05:15, Jiaying Zhang wrote: >> I am looking at the fs quota code while debugging a deadlock problem >> on ext2 file system. I found there is a potential deadlock between quotaon >> and quotaoff (or quotasync). Basically, all of quotactl operations need to >> be protected by dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call >> sb->s_op->quota_write that needs to grab the i_mutex of the quota file. >> But in vfs_quota_on_inode (called from quotaon operation), the current >> code tries to grab the i_mutex of the quota file first before getting >> quonoff_mutex. >> >> Here is a simple test to show the problem: >> $ while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done & >> $ while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done & >> >> After running for a while, the two processes get deadlocked. >> Below is my proposed change to fix the problem: > Thanks for the analysis and the patch. You are right, it is a bug and your > fix seems to be correct. Only, we should also use I_MUTEX_QUOTA for > acquiring i_mutex to be consistent with other places (and silence lockdep). > >> Index: git-linux/fs/quota/dquot.c >> =================================================================== >> --- git-linux.orig/fs/quota/dquot.c 2009-05-20 18:05:55.000000000 -0700 >> +++ git-linux/fs/quota/dquot.c 2009-06-26 17:57:04.000000000 -0700 >> @@ -2042,8 +2042,8 @@ >> * changes */ >> invalidate_bdev(sb->s_bdev); >> } >> - mutex_lock(&inode->i_mutex); >> mutex_lock(&dqopt->dqonoff_mutex); >> + mutex_lock(&inode->i_mutex); >> if (sb_has_quota_loaded(sb, type)) { >> error = -EBUSY; >> goto out_lock; >> @@ -2094,7 +2094,6 @@ >> dqopt->files[type] = NULL; >> iput(inode); >> out_lock: >> - mutex_unlock(&dqopt->dqonoff_mutex); >> if (oldflags != -1) { >> down_write(&dqopt->dqptr_sem); >> /* Set the flags back (in the case of accidental quotaon() >> @@ -2104,6 +2103,7 @@ >> up_write(&dqopt->dqptr_sem); >> } >> mutex_unlock(&inode->i_mutex); >> + mutex_unlock(&dqopt->dqonoff_mutex); >> out_fmt: >> put_quota_format(fmt); >> >> >> Also while debugging the problem, I found the following code path: >> >> shrink_icache_memory -> prune_icache (grab iprune_mutex) -> dispose_list -> >> clear_inode -> dquot_drop -> dqput -> dquot_release -> >> dqopt->ops[dquot->dq_type]->write_file_info -> >> sb->s_op->quota_write (i.e., ext2_quota_write for ext2) > Yes, I'm aware of this. > >> AFAICT, it seems very deadlock prone that the quota system tries to write >> the quota file while clearing an inode. The ext2_quota_write calls >> ext2_get_block that may block at alloc_page that in turn may try to call >> shrink_icache_memory when the system is low in memory. But the >> iprune_mutex is already hold by the process so the system will get >> into deadlock here. I haven't got a test case that shows the deadlock but >> want to raise this issue here first to see if I have missed anything. > I guess, you mean page allocation from sb_bread() or similar functions... > The point is that all these functions should perform allocations with GFP_FS > flag cleared and thus we can never reenter the inode pruning (or any other > filesystem) code. I see. We are using GFP_NOFS so the checking for (gfp_mask & __GFP_FS) in shrinkg_icache_memory fails and it stops there. Thanks for pointing this out. Jiaying > > Honza > > PS: I'm going on vacation for a week, I'll merge your patch when I return. > -- > Jan Kara > SUSE Labs, CR >