From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bob Peterson <rpeterso@redhat.com>
Date: Tue, 19 Jan 2021 10:44:44 -0500 (EST)
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: make recovery workqueue
 operate on a gfs2 mount point, not journal
In-Reply-To: <CAHc6FU7T5RzFhPWF_YbZY9a7+goVTPKrOybh46e12xe6zhL99Q@mail.gmail.com>
References: <2125295377.38904313.1608669538740.JavaMail.zimbra@redhat.com>
	<51252ca2-fa56-acb8-24cf-fb2e992f76de@redhat.com>
	<561946972.42407585.1609776564024.JavaMail.zimbra@redhat.com>
	<CAHc6FU7T5RzFhPWF_YbZY9a7+goVTPKrOybh46e12xe6zhL99Q@mail.gmail.com>
Message-ID: <1287886465.45164472.1611071084974.JavaMail.zimbra@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

----- Original Message -----
> On Mon, Jan 4, 2021 at 5:09 PM Bob Peterson <rpeterso@redhat.com> wrote:
> >
> > Hi,
> >
> > ----- Original Message -----
> > > Hi,
> > >
> > > On 22/12/2020 20:38, Bob Peterson wrote:
> > > > Hi,
> > > >
> > > > Before this patch, journal recovery was done by a workqueue function
> > > > that operated on a per-journal basis. The problem is, these could run
> > > > simultaneously which meant that they could all use the same bio,
> > > > sd_log_bio, to do their writing to all the various journals. These
> > > > operations overwrote one another eventually causing memory corruption.
> > >
> > > Why not just add more bios so that this issue goes away? It would make
> > > more sense than preventing recovery from running in parallel. In general
> > > recovery should be spread among nodes anyway, so the case of having
> > > multiple recoveries running on the same node in parallel should be
> > > fairly rare too.
> > >
> > > Steve.
> >
> > As I understand it, if we allocate a bio from the same bio_set (as
> > bio_alloc does) we need to submit the previous bio before getting the
> > next one, which means recovery processes cannot work in parallel, even
> > if they use different bio pointers.
> 
> Each recovery worker submits the current bio before allocating the
> next, so in the worst possible case, the recovery workers will end up
> getting serialized (that is, they will sleep in bio_alloc until they
> get their turn).
> 
> Andreas

Sure, the recovery workers' bio allocation and submission may be serialized,
but that's where the serialization ends. Nothing prevents the workers from
racing with each other over the one variable common to all of them:
sdp->sd_log_bio. For example, with 5 journals there are 5 different recovery
workers, all trying to use the same sdp->sd_log_bio at the same time.
My choice was between using 5 different pointers and a single point of use.
I chose the latter. If you like, I can temporarily revert the patch and try
to prove that this is what happens, but it seems like a waste of time.
The patch made the problem go away.
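
To make the hazard concrete, here is a rough userspace sketch. It is not
GFS2 code: struct fake_bio, struct fake_sbd and every function name below
are hypothetical stand-ins, and the interleaving (all workers publish their
bio, then all workers add a block) is just one assumed schedule, replayed
deterministically instead of with real threads. It only models the shared
slot that sdp->sd_log_bio represents.

```c
#include <assert.h>
#include <stddef.h>

#define NJOURNALS 5

/* Hypothetical stand-in for a bio; not the kernel's struct bio. */
struct fake_bio {
	int owner;    /* journal whose worker allocated this bio */
	int writes;   /* blocks added by the owning worker */
	int foreign;  /* blocks added by some other journal's worker */
};

/* Hypothetical stand-in for the superblock; one slot shared by all workers. */
struct fake_sbd {
	struct fake_bio *log_bio;  /* models sdp->sd_log_bio */
};

/* Step 1 of a worker: set up a bio and publish it in the shared slot. */
static void worker_get_bio(struct fake_sbd *sdp, struct fake_bio *bio, int jid)
{
	bio->owner = jid;
	bio->writes = 0;
	bio->foreign = 0;
	sdp->log_bio = bio;
}

/* Step 2 of a worker: add a block to whatever bio the shared slot holds. */
static void worker_add_block(struct fake_sbd *sdp, int jid)
{
	struct fake_bio *bio = sdp->log_bio;

	assert(bio != NULL);
	if (bio->owner == jid)
		bio->writes++;
	else
		bio->foreign++;  /* landed in another journal's bio */
}

/* Sum the blocks that ended up in a bio owned by a different journal. */
static int count_misdirected(const struct fake_bio *bios)
{
	int jid, total = 0;

	for (jid = 0; jid < NJOURNALS; jid++)
		total += bios[jid].foreign;
	return total;
}

/* Per-journal workers, interleaved: every worker runs step 1, then step 2. */
int interleaved_recovery(void)
{
	struct fake_sbd sdp = { NULL };
	struct fake_bio bios[NJOURNALS];
	int jid;

	for (jid = 0; jid < NJOURNALS; jid++)
		worker_get_bio(&sdp, &bios[jid], jid);
	for (jid = 0; jid < NJOURNALS; jid++)
		worker_add_block(&sdp, jid);
	return count_misdirected(bios);
}

/* One worker per mount: each journal finishes before the next one starts. */
int serialized_recovery(void)
{
	struct fake_sbd sdp = { NULL };
	struct fake_bio bios[NJOURNALS];
	int jid;

	for (jid = 0; jid < NJOURNALS; jid++) {
		worker_get_bio(&sdp, &bios[jid], jid);
		worker_add_block(&sdp, jid);
	}
	return count_misdirected(bios);
}
```

Under that assumed schedule, interleaved_recovery() reports 4 misdirected
blocks (journals 0 through 3 all write into journal 4's bio), while
serialized_recovery() reports 0. Giving each worker its own pointer would
also bring the count to 0; the sketch only shows the single-point-of-use
option.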

Bob