From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Tue, 28 Apr 2015 10:44:25 +0100
Subject: [Cluster-devel] [PATCH] GFS2: mark the journal idle to fix ro
 mounts
In-Reply-To: <20150427173107.GR29132@octiron.msp.redhat.com>
References: <1429888385-25449-1-git-send-email-bmarzins@redhat.com>
	<553E0906.6000708@redhat.com>
	<20150427173107.GR29132@octiron.msp.redhat.com>
Message-ID: <553F5679.2060309@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

On 27/04/15 18:31, Benjamin Marzinski wrote:
> On Mon, Apr 27, 2015 at 11:01:42AM +0100, Steven Whitehouse wrote:
>> Hi,
>>
>> On 24/04/15 16:13, Benjamin Marzinski wrote:
>>> When gfs2 was mounted read-only and then unmounted, it was writing a
>>> header block to the journal in the syncing gfs2_log_flush() call from
>>> kill_sb(). This is because the journal was not being marked as idle
>>> until the first log header was written out, and on a read-only mount
>>> there never was a log header written out. Since the journal was not
>>> marked idle, gfs2_log_flush() was writing out a header lock to make
>>> sure it was empty during the sync.  Not only did this cause IO to a
>>> read-only filesystem, but the journalling isn't completely initialized
>>> on read-only mounts, and so gfs2 was writing out the wrong sequence
>>> number in the log header.
>>>
>>> Now, the journal is marked idle on mount, and gfs2_log_flush() won't
>>> write out anything until there starts being transactions to flush.
>> Does that mean that we should be doing more to initialize the log in the r/o
>> mount case? It should know enough to recover the journals in the case that
>> it is the first mounter, so did this perhaps only apply to subsequent
>> mounters of the filesystem?
> gfs2 currently has enough information to do recovery.  Both
> gfs2_recover_func() and gfs2_make_fs_rw() call gfs2_find_jhead() to
> get information about the head of the journal and the sequence
> numbers.
>
> gfs2_make_fs_rw() saves this information with these lines
>
>          /*  Initialize some head of the log stuff  */
>          sdp->sd_log_sequence = head.lh_sequence + 1;
>          gfs2_log_pointers_init(sdp, head.lh_blkno);
>
> This is what's not getting called on the read-only mounts that was
> causing the fsck error. But the read only mounts should never be writing
> anything in gfs2_log_flush(), which is where these values are used. So
> we could make sure these values are always initialized, but the fact
> that they weren't was what allowed us to catch this bug (and it would
> still be a bug to write that header, even if it didn't mess with fsck).
>
> -Ben
That makes sense to me - thanks,

Steve.