From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: make recovery workqueue operate on a gfs2 mount point, not journal
Date: Mon, 4 Jan 2021 11:09:24 -0500 (EST)	[thread overview]
Message-ID: <561946972.42407585.1609776564024.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <51252ca2-fa56-acb8-24cf-fb2e992f76de@redhat.com>

Hi,

----- Original Message -----
> Hi,
> 
> On 22/12/2020 20:38, Bob Peterson wrote:
> > Hi,
> >
> > Before this patch, journal recovery was done by a workqueue function that
> > operated on a per-journal basis. The problem is that these work items could
> > run simultaneously, which meant they could all use the same bio, sd_log_bio,
> > for their writes to the various journals. These writes overwrote one
> > another, eventually causing memory corruption.
> 
> Why not just add more bios so that this issue goes away? It would make
> more sense than preventing recovery from running in parallel. In general,
> recovery should be spread among the nodes anyway, so the case of multiple
> recoveries running in parallel on the same node should be fairly rare too.
> 
> Steve.

As I understand it, if we allocate a bio from the same bio_set (as bio_alloc
does), we need to submit the previous bio before allocating the next one,
which means the recovery processes cannot work in parallel even if they each
use their own bio pointer.
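
To illustrate, here is a tiny userspace model of that constraint (this is
not the kernel bio API; pool_alloc/pool_submit are made-up stand-ins for
bio_alloc/submit_bio, and the semaphore models a bio_set's reserved pool
of one element):

  #include <pthread.h>
  #include <semaphore.h>
  #include <stdio.h>

  static sem_t pool;                      /* models one reserved bio */

  static void pool_alloc(void)  { sem_wait(&pool); }  /* may sleep */
  static void pool_submit(void) { sem_post(&pool); }

  static void *recover_journal(void *arg)
  {
          long jid = (long)arg;

          for (int blk = 0; blk < 3; blk++) {
                  pool_alloc();   /* must not hold another element here */
                  printf("journal %ld: writing block %d\n", jid, blk);
                  pool_submit();  /* submit before the next alloc */
          }
          return NULL;
  }

  int main(void)
  {
          pthread_t t[2];

          sem_init(&pool, 0, 1);          /* capacity of one */
          for (long i = 0; i < 2; i++)
                  pthread_create(&t[i], NULL, recover_journal, (void *)i);
          for (int i = 0; i < 2; i++)
                  pthread_join(t[i], NULL);
          return 0;
  }

The two threads above do make progress, but only because each one submits
before it allocates again. That discipline is exactly what parallel
recoveries sharing a pool would violate: if each held one element while
waiting for a second, both would block forever once the pool ran dry.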

We can, of course, allocate several bio_sets, one for each journal, but I
remember Jeff Moyer telling me a bio_set costs about 1MB of memory, which
seems high. (I've not verified that.) I'm testing up to 60 mounts with 5
journals each (one per cluster node), so at 1MB per bio_set that would add
up to 300MB of memory. That's not horrible, but I remember we decided not
to allocate separate per-mount rb_trees for glock indexing because of the
memory needed, and that was much less memory than this by comparison.

We could also introduce new locking (and multiple bio pointers) to prevent
the bio from being used by multiple recoveries at the same time. I actually
tried that in an earlier attempt and immediately ran into deadlock issues,
probably because our journal writes also use the same bio.
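
For what it's worth, the shape of the deadlock looked roughly like this
(a hypothetical sketch, not the actual code I tried; the mutex and the
journal_metadata_write() helper are invented for illustration):

  static DEFINE_MUTEX(log_bio_mutex);    /* hypothetical sd_log_bio guard */

  static void recover_journal(struct gfs2_jdesc *jd)
  {
          mutex_lock(&log_bio_mutex);
          /* ... use sd_log_bio to replay jd's log ... */
          journal_metadata_write(jd->jd_inode); /* also wants log_bio_mutex */
          mutex_unlock(&log_bio_mutex);         /* never reached: deadlock */
  }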

This way is pretty simple and there are fewer recovery processes to worry
about when analyzing vmcores.
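
In other words, something along these lines (a rough sketch of the idea,
not the patch itself; the list and lock fields and the gfs2_recover_one()
helper are invented for illustration):

  struct gfs2_sbd {
          struct work_struct sd_recovery_work;  /* one per mount point */
          struct list_head   sd_dirty_journals; /* journals needing recovery */
          spinlock_t         sd_recovery_lock;
          /* ... */
  };

  static void gfs2_recovery_func(struct work_struct *work)
  {
          struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd,
                                              sd_recovery_work);
          struct gfs2_jdesc *jd;

          /* Replay the dirty journals one at a time, so sd_log_bio is
           * never used by two recoveries on this mount at once. */
          spin_lock(&sdp->sd_recovery_lock);
          while ((jd = list_first_entry_or_null(&sdp->sd_dirty_journals,
                                                struct gfs2_jdesc,
                                                jd_dirty_list)) != NULL) {
                  list_del_init(&jd->jd_dirty_list);
                  spin_unlock(&sdp->sd_recovery_lock);
                  gfs2_recover_one(jd);         /* invented helper */
                  spin_lock(&sdp->sd_recovery_lock);
          }
          spin_unlock(&sdp->sd_recovery_lock);
  }

With one work item per sdp, recoveries for different mounts can still run
in parallel; only the recoveries that share an sd_log_bio get serialized.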

Bob


