From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: make recovery workqueue operate on a gfs2 mount point, not journal
Date: Tue, 19 Jan 2021 13:18:42 -0500 (EST) [thread overview]
Message-ID: <1238899263.45200782.1611080322302.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <CAHc6FU6sNa5CA1Q9deyuVhBA7RohHhb59m6PZ3-EMFueW6W6kg@mail.gmail.com>
----- Original Message -----
> On Tue, Jan 19, 2021 at 4:44 PM Bob Peterson <rpeterso@redhat.com> wrote:
> > Sure, the recovery workers' bio allocations and submitting may be
> > serialized,
> > but that's where it ends. The recovery workers don't prevent races with
> > each
> > other when using the variable common to all of them: sdp->sd_log_bio.
> > This is the case when there are, for example, 5 journals with 5 different
> > recovery workers, all trying to use the same sdp->sd_log_bio at the same
> > time.
>
> Well, sdp->sd_log_bio obviously needs to be moved to a per-journal context.
I tried that and it didn't end well. If we keep multiple bio pointers, each
recovery worker still needs to make sure all the other bios are submitted
before allocating a new one. Sure, each worker could make sure _its own_
previous bio was submitted, and those would be serialized, but then the
workers can collectively run out of bios. I did see this happen, for example,
with 60 gfs2 mounts across 5 nodes and lots of workers requesting lots of
bios at the same time. Unless, of course, we allocate unique bio_sets
that get their own slabs, etc. We could introduce spinlock protection or
something to manage this, but when I tried it, I found multiple scenarios
that deadlock. It gets ugly really fast.
In practice, when multiple nodes in a cluster go down, their journals are
recovered by several of the remaining cluster nodes, which means the
recoveries happen simultaneously anyway, and pretty quickly. In my case,
I've got 5 nodes and 2 of them get shot, so the remaining 3 nodes do the
journal recovery, and I've never seen them conflict with one another.
Their glocks seem to distribute the work well.
The only time you're really going to see multiple journals recovered by the
same node (for the same file systems, anyway) is when the cluster loses
quorum. When quorum is regained, there is often a burst of requests to
recover multiple journals on the same few nodes, and the same node often
ends up recovering several journals for several file systems.
So the circumstances are unusual to begin with, but also very easy to
reproduce.
What's wrong with a single worker that handles them all? What's your actual
concern with doing it this way? Is it performance? Who cares if journal
recovery takes 1.4 seconds rather than 1.2 seconds?
Bob
Thread overview: 10+ messages
[not found] <290202568.38904309.1608669529163.JavaMail.zimbra@redhat.com>
2020-12-22 20:38 ` [Cluster-devel] [GFS2 PATCH] gfs2: make recovery workqueue operate on a gfs2 mount point, not journal Bob Peterson
2020-12-22 23:27 ` Andreas Gruenbacher
2020-12-23 2:47 ` Bob Peterson
2021-01-04 9:13 ` Steven Whitehouse
2021-01-04 16:09 ` Bob Peterson
2021-01-19 15:23 ` Andreas Gruenbacher
2021-01-19 15:44 ` Bob Peterson
2021-01-19 17:36 ` Andreas Gruenbacher
2021-01-19 18:18 ` Bob Peterson [this message]
2021-01-19 20:14 ` Andreas Gruenbacher