From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C48E638FB1 for ; Thu, 9 Nov 2023 23:58:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="Ph61aCmr" Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id EE55221989; Thu, 9 Nov 2023 23:58:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1699574283; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=nVtUe0++z/uMkqi4t3djy4nGdDEtn27q9FxnRHIG5LY=; b=Ph61aCmrBZMYCeG6J45zUWb+PKE0knnEEm+9YIyDw2xc1bV89xi+81B/dD/Fp/C2HaCPY+ Yhz8eRr+Cb/0liDUCLvOvEh/GWEMeVfYp7bQkGui/5EIpygP2zlF1sR3jcwEhfnXJNeF7P AMtu6DSxiYDFD2GPGiCaky1U3wp8D2U= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 4F649132BE; Thu, 9 Nov 2023 23:58:03 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 7ChRDwtyTWUaZQAAMHmgww (envelope-from ); Thu, 09 Nov 2023 23:58:03 +0000 Date: Fri, 10 Nov 2023 00:58:02 +0100 From: Anthony Iliopoulos To: Srivathsa Dara Cc: "mark@fasheh.com" , "jlbec@evilplan.org" , "joseph.qi@linux.alibaba.com" , Rajesh Sivaramasubramaniom , Junxiao Bi , "ocfs2-devel@lists.linux.dev" , Gautham Ananthakrishna Subject: Re: [PATCH] ocfs2: call ocfs2_abort when journal abort Message-ID: References: <20231030120057.928280-1-srivathsa.d.dara@oracle.com> <7ABBFC08-4350-4D6C-B740-80DE590F9476@suse.com> Precedence: bulk X-Mailing-List: ocfs2-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Thu, Nov 09, 2023 at 07:26:16PM +0000, Srivathsa Dara wrote: > Hi Anthony, > thanks for the review > > -----Original Message----- > From: Anthony Iliopoulos > Sent: Wednesday, November 1, 2023 4:53 AM > To: Srivathsa Dara > Cc: mark@fasheh.com; jlbec@evilplan.org; joseph.qi@linux.alibaba.com; > Rajesh Sivaramasubramaniom ; > Junxiao Bi ; ocfs2-devel@lists.linux.dev > Subject: Re: [PATCH] ocfs2: call ocfs2_abort when journal abort > > On Mon, Oct 30, 2023 at 12:00:57PM +0000, Srivathsa Dara wrote: > > From: Ryan Ding > > > > > > journal can not recover from abort state, so we should take following > > action to prevent file system from corruption: > > Could you elaborate on how corruption can occur when the filesystem is > in this particular state? How could this be reproduced? > > [Srivathsa]: Corruption can occur due to storage failure How does that failure lead from aborting the journal to causing fs corruption? What exactly gets corrupted after journal abort? Can you provide more details and possibly a reproducer on this? > > 1. change to readonly filesystem when local mount. We can not afford > > further write, so change to RO state is reasonable. > > This is generally the strategy, but unrelated to the code that this commit > is touching. The ocfs2_commit_thread() only applies to non-local mounts. > > [Srivathsa]: Yes, you are right, this is only for cluster mode. I will remove > all local mount related comments. > > > 2. panic when cluster mount. Because we can not release lock resource in > > this state, other node will hung when it require a lock owned by this > > node. So panic and remaster is a reasonable choise. > > Panicking is never a reasonable choice :-) Why couldn't a remastering be > initiated without going through a reboot instead? > > [Srivathsa]: Without reboot, we run into a journal failure, the only option > is to have fs in read-only mode. Could you suggest any other way to make > this node release the locks held by it? Switching to read-only, and self-fencing so that another node will take over the recovery. > > ocfs2_abort() will do all the above work. > > > > Signed-off-by: Ryan Ding > > Signed-off-by: Srivathsa Dara > > --- > > fs/ocfs2/journal.c | 27 +++++++++++++++------------ > > 1 file changed, 15 insertions(+), 12 deletions(-) > > > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index > > ce215565d061..6dace475f019 100644 > > --- a/fs/ocfs2/journal.c > > +++ b/fs/ocfs2/journal.c > > @@ -14,7 +14,6 @@ > > #include > > #include > > #include > > -#include > > #include > > > > #include > > @@ -2326,7 +2325,7 @@ static int __ocfs2_wait_on_mount(struct > > ocfs2_super *osb, int quota) > > > > static int ocfs2_commit_thread(void *arg) { > > - int status; > > + int status = 0; > > struct ocfs2_super *osb = arg; > > struct ocfs2_journal *journal = osb->journal; > > > > @@ -2340,21 +2339,25 @@ static int ocfs2_commit_thread(void *arg) > > wait_event_interruptible(osb->checkpoint_event, > > atomic_read(&journal->j_num_trans) > > || kthread_should_stop()); > > + if (status < 0) { > > + /* As we can not terminate by ourself, just enter an > > + * empty loop to wait for stop. > > + */ > > + continue; > > + } > > the above is unnecessary, given that below will induce panic upon failure. > > > status = ocfs2_commit_cache(osb); > > if (status < 0) { > > - static unsigned long abort_warn_time; > > - > > - /* Warn about this once per minute */ > > - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) > > - mlog(ML_ERROR, "status = %d, journal is " > > - "already aborted.\n", status); > > /* > > - * After ocfs2_commit_cache() fails, j_num_trans has a > > - * non-zero value. Sleep here to avoid a busy-wait > > - * loop. > > + * journal can not recover from abort state, there is > > + * no need to keep commit cache. So we should either > > + * change to readonly(local mount) or just panic > > + * (cluster mount). > > commit cache doesn't apply to local mounts. > > > + * We should also clear j_num_trans to prevent further > > + * commit. > > that's redundant given that we're about to panic. > > > */ > > - msleep_interruptible(1000); > > + atomic_set(&journal->j_num_trans, 0); > > same here. > > > + ocfs2_abort(osb->sb, "Detected aborted journal"); > > ocfs2_commit_cache returning with an error does not necessarily mean that > the journal has been aborted; jbd2_journal_flush will return an error for other > conditions too, some of which could be recoverable. > > Therefore unconditionally aborting here is not necessarily warranted, and > presumably this is why the current code is a retry loop. > > [Srivathsa]: Even the journal is not aborted, there is some serious > error with the journal, keeping it running at this stage could lead > to corruption. How, precisely? Irrespective of the journal state, this should not be leading to any corruptions, the ocfs2_commit_thread will just keep unsuccessfully attempting a journal flush. Otherwise there's a bug somewhere that needs to be directly addressed. > But even if the journal has been aborted at that point, there's no reason why > the node cannot self-fence and effectively shutdown the local fs, instead of > aborting with a panic. > > [Srivathsa]: By self-fence, did you mean system panic/reboot due > to heartbeat loss? If so, that can't be guaranteed that it will happen > because the storage failure can be intermittent. Even if that happens, > it will panic or reboot, how effectively it will shutdown the local fs, > considering the journal is failing? I mean intentionally stopping the heartbeat and switching to read-only. > [Srivathsa]: will send a v2 addressing the redundant code. Apart from the redundant code, I see two separate issues: - the claim that there is a potential fs corruption bug after a journal abort in ocfs2_commit_thread, which needs to be clarified and addressed properly. - the method of handling irrecoverable journal errors. Regards, Anthony