From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Teigland
Date: Thu, 11 Feb 2016 12:59:50 -0600
Subject: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors
In-Reply-To: <1845924157.21879838.1455215948999.JavaMail.zimbra@redhat.com>
References: <1455130532-9317-1-git-send-email-rpeterso@redhat.com> <20160211172241.GA1737@redhat.com> <1845924157.21879838.1455215948999.JavaMail.zimbra@redhat.com>
Message-ID: <20160211185950.GA3152@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Thu, Feb 11, 2016 at 01:39:09PM -0500, Bob Peterson wrote:
> The problem is: While testing the dlm in multiple recovery situations,
> Nate and I discovered multiple problems. Until recently, no one has tried
> to run recovery tests on an upstream DLM,

(Let's distinguish tcp connection testing/recovery vs locking
testing/recovery. I agree we've never looked at the tcp connections too
much since the node is typically dead anyway.)

> I agree that some of these patches might be unnecessary improvements.
> I'll try to pare them down to what is absolutely necessary and what
> is not. I'll also document exactly why the necessary ones are needed.

Improvements are fine, I was just confused about which were fixes vs
cleanups.

> I'll also try to post them in order of highest priority and repost
> them as individual patches rather than a set.
>
> The recovery tests are somewhat slow, so this will take some time.
>
> BTW, Have you had a chance to look at the patch I posted on 18 January,
> titled "DLM: Replace nodeid_to_addr with kernel_getpeername"?
> That definitely fixes one bug in patch b3a5bbfd which you mentioned.

Great, thanks, that's the key one that I'd missed or forgotten.

> I assume you're not suggesting I combine that patch with other patches
> to stabilize b3a5bbfd, right? As you well know, this is very touchy
> code and it's easier to diagnose and debug a larger number of smaller
> patches.

No, I don't have any concerns with the other improvements/fixes you have
since the main issue was fixed in that nodeid_to_addr replacement.