cluster-devel.redhat.com archive mirror
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing
Date: Tue, 18 Dec 2018 10:51:56 -0500 (EST)
Message-ID: <660117846.56648788.1545148316625.JavaMail.zimbra@redhat.com>
In-Reply-To: <4d7faf0ffc084ef79e01449ba9b09868@AMSPEX02CL01.citrite.net>

----- Original Message -----
> Hi Bob,
> 
> I agree, it's a hard problem. I'm just trying to understand that we've done
> the absolute best we can and that if this condition is hit then the best
> solution really is to just kill the node. I guess it's also a question of
> how common this actually ends up being. We have now got customers starting
> to use GFS2 for VM storage on XenServer so I guess we'll just have to see
> how many support calls we get in on it.
> 
> Thanks,
> 
> Mark.

Hi Mark,

I don't expect the problem to be very common in the real world. 
The user has to get IO errors while writing to the GFS2 journal, which is
not very common. The patch is basically reacting to a phenomenon we
recently started noticing in which the HBA (qla2xxx) driver shuts down
and stops accepting requests when you do abnormal reboots (which we sometimes
do to test node recovery). In these cases, the node doesn't go down right away.
It stays up just long enough to cause IO errors with subsequent withdraws,
which, we discovered, result in file system corruption.
Normal reboots, "/sbin/reboot -fin", and "echo b > /proc/sysrq-trigger" should
not have this problem, nor should node fencing, etc.

And like I said, I'm open to suggestions on how to fix it. I wish there was a
better solution.

As it is, I'd kind of like to get something into this merge window for the
upstream kernel, but I'll need to submit the pull request for that probably
tomorrow or Thursday. If we find a better solution, we can always revert these
changes and implement a new one.

Regards,

Bob Peterson



Thread overview: 6+ messages
2018-12-17 17:15 [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing Mark Syms
2018-12-17 20:20 ` Bob Peterson
2018-12-18  9:49   ` Mark Syms
2018-12-18 15:51     ` Bob Peterson [this message]
2018-12-18 16:09       ` Mark Syms
2018-12-19  9:16         ` Steven Whitehouse
