From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wr1-f43.google.com (mail-wr1-f43.google.com [209.85.221.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 859234D2EEF for ; Fri, 15 May 2026 15:35:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.43 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778859325; cv=none; b=T+Bgb2Z1D6qTOtHGs5lyhto7uGdaoH24dwxG5Vqimy/2FVZ3slJ/cC6bE4yC7ll6XzHOWQn8ViMek7FkCE0bUmEiUFkcxBuSHjpTE1vrkAczx4kq3guhmTMHbh5uZqPES55bRczekjvUQajtA4Txg758HzicnFJzujVv4Yjkh+0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778859325; c=relaxed/simple; bh=IkY4f31IVI+sS2rTQ19APf2Kf6Tq7o1apDJKC1b+guU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=W3ZOVU/aXMQNcmplXQJPaCccZhxmgfi893PPv+1c2/zZazQVMMWN3lSfF/40zo8lMmgqSOzPWhuitoYNY+AZ/1fcGpedWstn/j1OoxAl/6w6PmC0vx965gSqBz6w+M1+pAk/fgyN6cGGBYzNf3GYGJxozDC5H2kDEKVsYsj+fgs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b=e4se2kqw; arc=none smtp.client-ip=209.85.221.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b="e4se2kqw" Received: by mail-wr1-f43.google.com with SMTP id ffacd0b85a97d-45b030a5696so455248f8f.0 for ; Fri, 15 May 2026 08:35:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1778859316; x=1779464116; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=1PMkqrlxvVv2+zBG40scq5k2wIizxTATm8E5BebsVQs=; b=e4se2kqw3UOZX1cb8w/pA7U06HNFJihiKl8pBQ0JC5QQa3SL41CwwpTrFF/qIh3/tD jCuYmYvN9SpIMSS+wAFWY36yeQBDU9FQ+3RHVI2M2XBl8SjHtZmOzM0ClusljDHHo713 lPUBLyt8cigI1DZU8vVVXJCSdHMw1qU/xeLOEITh5AnIDiUCJLWZy0+bf+OnAEevZPCT rHZv8VAjXlazeOMnMdF2aDBw5BgBSzOrd0icWsKygqP9MD6T+2wf/PjFkV8KXG2NOtsj 7OoMY9c2gy1zaIdFj7+XPUiDa5FtQ8xiyxXdh4lur4l8ATmxmNAM5qhn7YFcMzbBkfhK 7xeg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778859316; x=1779464116; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1PMkqrlxvVv2+zBG40scq5k2wIizxTATm8E5BebsVQs=; b=Plg3fC1nu3qx70yFe8jC5A+RZdepuCNfPSb1KJY0VQEpmvUQmSN0IrDoAPjp+a7Sdb GIb+WegPFiyLuoF40w0RXsbba28jtU6hLIfxX2aDh8lxewcUvO3dPSgRIx4f+z+kR88X KMIdR/8K8ScW3iE6VupsxbHmUQeSuknFjeymFu1aYIAc5sDqqJspFF3rH/23zqLYIc0d bBUVJEqaKJ2DMw1MffoaD8yDlKQY3B9L93igYckvhEuV+g6tgqQ4J9Xj/P7li1cX3rn4 Pwog4oAOqsXrNl/2vUMSqnxV2OOia2/nOZwUw6Z6x6ieVNW+HRlaZbwmvJO8bxtZY1Pa Up/A== X-Forwarded-Encrypted: i=1; AFNElJ9p7klAax+cna7A79jQGZ0ITZwKKvFR2KHJcLtInybGOHYq5mFD8OM51uen0wDD1EXxqkmQ7KSxYWctIFY=@vger.kernel.org X-Gm-Message-State: AOJu0Yz8vl2AwidiJ7cFu+X1YyGD9AhkMvmiBqhs+5SWPH/w5GxmdiEu OagT5r3FNJ7bFiLYMBBSOUEWh7At9rB89ZeljCWSbPIoF8xdsU84N82z56ipoQr5Fek= X-Gm-Gg: Acq92OEWzuuIm3AqSvNWFr3mvCrqTuiQrV2oc5kQvMV6EMC6nj5bAlerSN57G6mAjhf YkNSdlVCyWqENe57cbPSAjQCEY2Rzi/CHdHQLe++SKsyUwC3P7ZXbtGVecBRXbBrJia6L52X4Kw HFvGqIzvxEbN7eUBJ556xy+5hGJeAshnk4hfkUVQ0K8olmSgAeZlKULHcKlP+TdYJzYT4t9pidJ FBoMF6aVRQa5/+DEc16/c/pgj2e2Sj2oQjspkWJZALDTRK4gAqH8qHe6BTcb4t4pvhc/iTl0GHz 90beB42q2bduwIp01PfJmH2fD/CYFQdzNASvp/IezDI7S416G8ZdEpxzsvPP6FxIFxZLiz7MGe8 F8tvnT0F20b+uDbBvX+SLUPiZxcsygQASry3rhXqVwwukZthldiBPrddHX0qW0cWjnWvuRo3HsZ 7Ff95lj2h2EH5rI1lr2ZScVZxQaeNDxCf6mgOFzvbncRI= X-Received: by 2002:a05:600c:1f8c:b0:48f:d410:6076 with SMTP id 5b1f17b1804b1-48fe630e3d5mr28492035e9.7.1778859316419; Fri, 15 May 2026 08:35:16 -0700 (PDT) Received: from localhost ([202.127.77.110]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2bd5cfe498asm64859135ad.39.2026.05.15.08.35.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 15 May 2026 08:35:15 -0700 (PDT) Date: Fri, 15 May 2026 23:35:13 +0800 From: Heming Zhao To: Tetsuo Handa Cc: Mark Fasheh , Joel Becker , Joseph Qi , jiangyiwen , Andrew Morton , ocfs2-devel@lists.linux.dev, LKML Subject: Re: [PATCH] ocfs2: kill osb->system_file_mutex lock Message-ID: References: <934355dd-a0b1-4e53-93ac-0a7ae7458051@I-love.SAKURA.ne.jp> <831c4fc1-c89f-48bc-84c6-25b2cefc2b20@I-love.SAKURA.ne.jp> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <831c4fc1-c89f-48bc-84c6-25b2cefc2b20@I-love.SAKURA.ne.jp> On Thu, May 14, 2026 at 04:09:25PM +0900, Tetsuo Handa wrote: > Hi Heming, > > I would like to clarify why the expectation of "being called only once" is logically > incorrect, reply to your concern regarding the reference count leak and explain why > this patch is completely safe and sufficient. > > 1. get_local_system_inode() can fail under memory pressure: > get_local_system_inode() allocates memory internally. Under heavy memory pressure, > this allocation can fail and return NULL. When this happens, the caller > ocfs2_get_system_file_inode() must fall back to calling _ocfs2_get_system_file_inode() > again to read the inode from disk. Therefore, the filesystem design must inherently > support multiple calls to _ocfs2_get_system_file_inode(). > > 2. Why cmpxchg() is sufficient and safe without the mutex: > The only thing the system_file_mutex is needed was to prevent a race where two > threads concurrently execute _ocfs2_get_system_file_inode(), obtain the SAME inode > pointer (since the underlying VFS iget_locked() returns the identical address for > the same slot), and both mistakenly invoke igrab() on it, leading to a reference > count leak. > > This patch perfectly solves that race condition by using cmpxchg() on the target > pointer array slot: > > * The thread that wins the cmpxchg() successfully initializes the slot with the > fetched inode and get the extra refcount because it is the first time to store > into the slot. > > * The thread that loses the cmpxchg() detects that another thread has already > initialized the slot with the exact same inode. The loser thread returns > without getting the extra refcount because it is not the first time to store > into the slot. > > Therefore, the reference counting contract is strictly and atomically maintained. > No references are leaked, and the array slot is never corrupted. Hi, The logic here is incorrect. The purpose of the refcount is to track how many consumers are using the inode. In the original code, if two threads concurrently access ocfs2_get_system_file_inode() while the inode is uninitialized, inode->i_count would ultimately be incremented by 3. However, with your patch, i_count will only be incremented by 2. To be more specific: Your patch explicitly triggers a race condition: when the target local_system inode is uninitialized and two threads enter simultaneously, Thread 1 wins the cmpxchg() and increments the refcount before exiting. Thread 2, however, loses the refcount increment simply because the atomic operation failed. btw, The issue addressed in commit 43b10a20372d was that after two concurrent threads returned, inode->i_count ended up being 4 when the correct value should have been 3. With your patch, the value will end up being 2, which is insufficient. In my opinion, the problem with the current code is that the scope of mutex_lock(&osb->system_file_mutex) is too broad. This mutex only needs to be held prior to calling _ocfs2_get_system_file_inode(). I previously highlighted this point in my initial review comment on the patch. Thanks, Heming > > 3. Standard filesystems do not use a global mutex for this: > Standard filesystems (like Ext4's ext4_get_journal_inode or XFS's > xfs_qm_init_quotainos) rely entirely on the VFS layer's internal hashing/locking (e.g., > iget_locked) to serialize metadata/system inode lookups. OCFS2's system_file_mutex is a > redundant global lock that heavily pollutes the lock dependency graph, triggering > possible deadlock warnings that block us from testing and fixing genuine deadlocks. > > Since the cmpxchg() approach guarantees atomic slot initialization + igrab(), the global > mutex is completely redundant and should be removed. > > Regards. >