cluster-devel.redhat.com archive mirror
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors
Date: Thu, 11 Feb 2016 13:39:09 -0500 (EST)
Message-ID: <1845924157.21879838.1455215948999.JavaMail.zimbra@redhat.com>
In-Reply-To: <20160211172241.GA1737@redhat.com>

----- Original Message -----
> On Wed, Feb 10, 2016 at 01:55:26PM -0500, Bob Peterson wrote:
> > I've been doing a bunch of recovery testing with DLM and discovered some
> > issues. This collection of 6 patches addresses those issues. Some of them
> > are of my own making, introduced by the recent patches that made DLM
> > print socket connection errors, and recovery from those errors.
> 
> Thanks Bob, perhaps I've not been paying close enough attention, but it's
> unclear to me how this patch set relates to the most acute issue we have
> at the moment, which are the problems introduced here:
> 
>   From b3a5bbfd780d9e9291f5f257be06e9ad6db11657 Mon Sep 17 00:00:00 2001
>   From: Bob Peterson <rpeterso@redhat.com>
>   Date: Thu, 27 Aug 2015 09:34:47 -0500
>   Subject: [PATCH] dlm: print error from kernel_sendpage
> 
>   Print a dlm-specific error when a socket error occurs
>   when sending a dlm message.
> 
>   Signed-off-by: Bob Peterson <rpeterso@redhat.com>
>   Signed-off-by: David Teigland <teigland@redhat.com>
> 
> Could we begin with one patch that's easy to track that directly resolves
> the issues with that commit (perhaps even a revert if it's not simple to
> fix directly)?  That brings us back to a known-good place, from which we
> can look at cleanups and changes.
> 
Hi Dave,

My goal has always been to attain stability, which I think I've finally
achieved.

The problem is: While testing the dlm in multiple recovery situations,
Nate and I discovered multiple problems. Until recently, no one has tried
to run recovery tests on an upstream DLM, so I think we're finding some
old bugs that have been there for a while, as well as bugs with b3a5bbfd,
which you mentioned.

I agree that some of these patches might be unnecessary improvements.
I'll try to separate what is absolutely necessary from what is not,
and I'll document exactly why the necessary ones are needed.

I'll also try to post them in order of highest priority and repost
them as individual patches rather than a set.

The recovery tests are somewhat slow, so this will take some time.

BTW, have you had a chance to look at the patch I posted on 18 January,
titled "DLM: Replace nodeid_to_addr with kernel_getpeername"?
That definitely fixes one bug in patch b3a5bbfd which you mentioned.

I assume you're not suggesting I combine that patch with other patches
to stabilize b3a5bbfd, right? As you well know, this is very touchy
code and it's easier to diagnose and debug a larger number of smaller
patches.

Regards,

Bob Peterson
Red Hat File Systems




Thread overview: 22+ messages
2016-02-10 18:55 [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors Bob Peterson
2016-02-10 18:55 ` [Cluster-devel] [DLM PATCH 1/6] DLM: Don't create kernel socket until we have valid node address Bob Peterson
2016-02-10 18:55 ` [Cluster-devel] [DLM PATCH 2/6] DLM: Call original error report when socket is NULL Bob Peterson
2016-02-11 16:43   ` Andreas Gruenbacher
2016-02-10 18:55 ` [Cluster-devel] [DLM PATCH 3/6] DLM: Make consistent error path through tcp_create_listen_sock Bob Peterson
2016-02-11 16:52   ` Andreas Gruenbacher
2016-02-11 17:59     ` Bob Peterson
2016-02-11 21:09       ` [Cluster-devel] [DLM PATCH 3/6] DLM: Make consistent error path Andreas Gruenbacher
2016-02-10 18:55 ` [Cluster-devel] [DLM PATCH 4/6] DLM: Eliminate useless goto Bob Peterson
2016-02-11 16:53   ` Andreas Gruenbacher
2016-02-10 18:55 ` [Cluster-devel] [DLM PATCH 5/6] DLM: Add locking to protect save callback assignments Bob Peterson
2016-02-11 17:04   ` Andreas Gruenbacher
2016-02-10 18:55 ` [Cluster-devel] [DLM PATCH 6/6] DLM: save / restore all socket callbacks Bob Peterson
2016-02-11 15:31   ` Steven Whitehouse
2016-02-11 16:43     ` [Cluster-devel] [DLM PATCH 6/6][try #2] " Bob Peterson
2016-02-11 17:10       ` Andreas Gruenbacher
2016-02-11 17:05 ` [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors Andreas Gruenbacher
2016-02-11 17:22 ` David Teigland
2016-02-11 18:39   ` Bob Peterson [this message]
2016-02-11 18:59     ` David Teigland
2016-02-15 21:16     ` Bob Peterson
2016-02-15 21:24       ` David Teigland
