From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sunil Mushran
Date: Fri, 05 Feb 2010 10:39:47 -0800
Subject: [Ocfs2-devel] [PATCH] ocfs2: Plugs race between the dc thread and an unlock ast message
In-Reply-To: <20100205060148.GA3416@mail.oracle.com>
References: <1265221014-10591-1-git-send-email-sunil.mushran@oracle.com>
 <20100204102729.GA4339@laptop.oracle.com>
 <4B6B21BE.10708@oracle.com>
 <20100205060148.GA3416@mail.oracle.com>
Message-ID: <4B6C65F3.8000805@oracle.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ocfs2-devel@oss.oracle.com

Joel Becker wrote:
> Why cancel convert? Why not an actual unlock. That's what I
> thought you meant when you proposed the patch. After all, David's bug
> was at DLM_LOCK_IV, not NL.
>
> Cancel isn't in play here. Cancel only happens on the dc
> thread, and the dc thread has gotten past BUSY. So the lock isn't busy
> anymore. It does downconvert_worker in an unlocked state. When it
> comes back to recheck, another thread can't have done a cancel.
>
> That's why I asked about unlink. Imagine the dc thread is
> handling a bast while the unlink thread has gotten to clear_inode.
> Could ocfs2_drop_lock() be racing the downconvert?

Yes. I forgot the level was IV. So the full fix would be to ensure
the level is <= NL in the block below.

	/*
	 * How can we block and yet be at NL? We were trying to upconvert
	 * from NL and got canceled. The code comes back here, and now
	 * we notice and clear BLOCKING.
	 */
	if (lockres->l_level == DLM_LOCK_NL) {
		BUG_ON(lockres->l_ex_holders || lockres->l_ro_holders);
		lockres->l_blocking = DLM_LOCK_NL;
		lockres_clear_flags(lockres, OCFS2_LOCK_BLOCKED);
		spin_unlock_irqrestore(&lockres->l_lock, flags);
		goto leave;
	}
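
For illustration only, here is a minimal userspace sketch of what "level is <= NL" could look like. It is not the actual patch. It assumes DLM_LOCK_IV is numerically below DLM_LOCK_NL (-1 and 0 here), so one comparison covers both the canceled-upconvert case and the already-unlocked case. The struct model_lockres type, the model_clear_blocked_if_idle() helper, and the flag and EX values are hypothetical stand-ins for the kernel definitions, and the spinlock handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed numeric values; check the kernel's DLM headers before relying on them. */
#define DLM_LOCK_IV (-1)
#define DLM_LOCK_NL 0
#define DLM_LOCK_EX 5

/* Assumed flag bit, used only inside this model. */
#define OCFS2_LOCK_BLOCKED 0x04u

/* Hypothetical, stripped-down stand-in for struct ocfs2_lock_res. */
struct model_lockres {
	int l_level;
	int l_blocking;
	unsigned int l_flags;
	unsigned int l_ex_holders;
	unsigned int l_ro_holders;
};

/*
 * Returns true when the lock is at NL or below (NL or IV), in which case
 * there is nothing to downconvert: BLOCKED is cleared and the caller
 * would take the "goto leave" path. Returns false otherwise and leaves
 * the lock resource untouched.
 */
static bool model_clear_blocked_if_idle(struct model_lockres *lockres)
{
	if (lockres->l_level <= DLM_LOCK_NL) {
		/* Plays the role of BUG_ON(): no holders may remain at NL or IV. */
		assert(!lockres->l_ex_holders && !lockres->l_ro_holders);
		lockres->l_blocking = DLM_LOCK_NL;
		lockres->l_flags &= ~OCFS2_LOCK_BLOCKED;
		return true;
	}
	return false;
}
```

With the original == test, a lockres that ocfs2_drop_lock() had already taken to IV would fall through to the downconvert path. With <=, it takes the same early exit as the NL case.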