From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Whitehouse Date: Tue, 28 Apr 2015 10:44:25 +0100 Subject: [Cluster-devel] [PATCH] GFS2: mark the journal idle to fix ro mounts In-Reply-To: <20150427173107.GR29132@octiron.msp.redhat.com> References: <1429888385-25449-1-git-send-email-bmarzins@redhat.com> <553E0906.6000708@redhat.com> <20150427173107.GR29132@octiron.msp.redhat.com> Message-ID: <553F5679.2060309@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi, On 27/04/15 18:31, Benjamin Marzinski wrote: > On Mon, Apr 27, 2015 at 11:01:42AM +0100, Steven Whitehouse wrote: >> Hi, >> >> On 24/04/15 16:13, Benjamin Marzinski wrote: >>> When gfs2 was mounted read-only and then unmounted, it was writing a >>> header block to the journal in the syncing gfs2_log_flush() call from >>> kill_sb(). This is because the journal was not being marked as idle >>> until the first log header was written out, and on a read-only mount >>> there never was a log header written out. Since the journal was not >>> marked idle, gfs2_log_flush() was writing out a header lock to make >>> sure it was empty during the sync. Not only did this cause IO to a >>> read-only filesystem, but the journalling isn't completely initialized >>> on read-only mounts, and so gfs2 was writing out the wrong sequence >>> number in the log header. >>> >>> Now, the journal is marked idle on mount, and gfs2_log_flush() won't >>> write out anything until there starts being transactions to flush. >> Does that mean that we should be doing more to initialize the log in the r/o >> mount case? It should know enough to recover the journals in the case that >> it is the first mounter, so did this perhaps only apply to subsequent >> mounters of the filesystem? > gfs2 currently has enough information to do recovery. Both > gfs2_recover_func() and gfs2_make_fs_rw() call gfs2_find_jhead() to > get information about the head of the journal and the sequence > numbers. > > gfs2_make_fs_rw() saves this information with these lines > > /* Initialize some head of the log stuff */ > sdp->sd_log_sequence = head.lh_sequence + 1; > gfs2_log_pointers_init(sdp, head.lh_blkno); > > This is what's not getting called on the read-only mounts that was > causing the fsck error. But the read only mounts should never be writing > anything in gfs2_log_flush(), which is where these values are used. So > we could make sure these values are always initialized, but the fact > that they weren't was what allowed us to catch this bug (and it would > still be a bug to write that header, even if it didn't mess with fsck). > > -Ben That makes sense to me - thanks, Steve.