* [PATCH] Ceph RADOS cluster mutex helper for Samba CTDB
@ 2016-12-01 14:17 David Disseldorp
From: David Disseldorp @ 2016-12-01 14:17 UTC
To: Samba Technical; +Cc: ceph-devel@vger.kernel.org, Martin Schwenke
[-- Attachment #1: Type: text/plain, Size: 887 bytes --]
Hi,
The attached patch-set implements a cluster mutex helper for Samba CTDB
using Ceph librados.
ctdb_mutex_ceph_rados_helper can be used as a recovery lock provider
for CTDB. When configured, split brain avoidance during CTDB recovery
will be handled using locks against an object located in a Ceph RADOS
pool.
I've also attached a standalone test script - @Martin: does this belong
in the ctdb test suite, or can I just commit it as a standalone test?
It has a few non-standard dependencies: a running Ceph cluster, the
rados and jq binaries.
Feedback appreciated.
Cheers, David
--
ctdb/doc/Makefile | 3 +-
ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 ++++++
ctdb/tools/ctdb_mutex_ceph_rados_helper.c | 334 ++++++++++++++++++++
ctdb/wscript | 19 ++
4 files changed, 445 insertions(+), 1 deletion(-)
[-- Attachment #2: ctdb_reclock_ceph.patchset --]
[-- Type: text/x-patch, Size: 17146 bytes --]
From 47e653373073f1564844465eb416cdebb1dc4aa6 Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Thu, 1 Dec 2016 13:33:22 +0100
Subject: [PATCH 1/2] ctdb: cluster mutex helper using Ceph RADOS
ctdb_mutex_ceph_rados_helper implements the cluster mutex helper API
atop Ceph using the librados rados_lock_exclusive()/rados_unlock()
functionality.
Once configured, split brain avoidance during CTDB recovery will be
handled using locks against an object located in a Ceph RADOS pool.
Signed-off-by: David Disseldorp <ddiss@samba.org>
---
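Note on the parent check: the timer callback in this patch treats the
parent as gone only when kill(ppid, 0) fails with ESRCH. Below is a minimal
standalone sketch of that test. parent_still_around() is a hypothetical
name used for illustration, not a function in this patch.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper mirroring the condition used in the timer callback.
 * Signal 0 performs only the existence/permission check and delivers
 * nothing. Only ESRCH is taken to mean "no such process"; any other
 * failure (e.g. EPERM) still implies that the process exists.
 */
static bool parent_still_around(pid_t ppid)
{
	if (kill(ppid, 0) == 0) {
		return true;
	}
	return errno != ESRCH;
}
```

In the patch this check is re-armed every 5 seconds via tevent_add_timer(),
and the lock is dropped once the check reports that the parent has ended.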
ctdb/tools/ctdb_mutex_ceph_rados_helper.c | 334 ++++++++++++++++++++++++++++++
ctdb/wscript | 19 ++
2 files changed, 353 insertions(+)
create mode 100644 ctdb/tools/ctdb_mutex_ceph_rados_helper.c
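Note on the status output: the helper reports its result as a single
character on stdout. The sketch below restates how main() in this patch
maps the lock return value to that character. status_for_lock_result() is
a hypothetical name introduced for illustration; the macro values are
copied from the patch.

```c
#include <errno.h>

/* Status characters, copied from the macros defined in the patch. */
#define CTDB_MUTEX_STATUS_HOLDING   "0"
#define CTDB_MUTEX_STATUS_CONTENDED "1"
#define CTDB_MUTEX_STATUS_TIMEOUT   "2"
#define CTDB_MUTEX_STATUS_ERROR     "3"

/*
 * Hypothetical helper, not part of the patch: map a negative-errno style
 * lock result to the status string that would be written to stdout.
 */
static const char *status_for_lock_result(int ret)
{
	if ((ret == -EEXIST) || (ret == -EBUSY)) {
		/* someone else holds the lock */
		return CTDB_MUTEX_STATUS_CONTENDED;
	} else if (ret < 0) {
		/* unexpected failure */
		return CTDB_MUTEX_STATUS_ERROR;
	}
	/* lock obtained */
	return CTDB_MUTEX_STATUS_HOLDING;
}
```

CTDB_MUTEX_STATUS_TIMEOUT is defined in the patch but never emitted by this
helper, so the sketch does not return it either.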
diff --git a/ctdb/tools/ctdb_mutex_ceph_rados_helper.c b/ctdb/tools/ctdb_mutex_ceph_rados_helper.c
new file mode 100644
index 0000000..8d19965
--- /dev/null
+++ b/ctdb/tools/ctdb_mutex_ceph_rados_helper.c
@@ -0,0 +1,334 @@
+/*
+ CTDB mutex helper using Ceph librados locks
+
+ Copyright (C) David Disseldorp 2016
+
+ Based on ctdb_mutex_fcntl_helper.c, which is:
+ Copyright (C) Martin Schwenke 2015
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include "replace.h"
+#include "system/filesys.h"
+#include "system/network.h"
+
+/* protocol.h is just needed for ctdb_sock_addr, which is used in system.h */
+#include "protocol/protocol.h"
+#include "common/system.h"
+#include "lib/util/time.h"
+#include "tevent.h"
+#include "talloc.h"
+#include "rados/librados.h"
+
+#define CTDB_MUTEX_CEPH_LOCK_NAME "ctdb_reclock_mutex"
+#define CTDB_MUTEX_CEPH_LOCK_COOKIE CTDB_MUTEX_CEPH_LOCK_NAME
+#define CTDB_MUTEX_CEPH_LOCK_DESC "CTDB recovery lock"
+
+#define CTDB_MUTEX_STATUS_HOLDING "0"
+#define CTDB_MUTEX_STATUS_CONTENDED "1"
+#define CTDB_MUTEX_STATUS_TIMEOUT "2"
+#define CTDB_MUTEX_STATUS_ERROR "3"
+
+static char *progname = NULL;
+
+static int ctdb_mutex_rados_ctx_create(const char *ceph_cluster_name,
+ const char *ceph_auth_name,
+ const char *pool_name,
+ rados_t *_ceph_cluster,
+ rados_ioctx_t *_ioctx)
+{
+ rados_t ceph_cluster = NULL;
+ rados_ioctx_t ioctx = NULL;
+ int ret;
+
+ ret = rados_create2(&ceph_cluster, ceph_cluster_name, ceph_auth_name, 0);
+ if (ret < 0) {
+ fprintf(stderr, "%s: failed to initialise Ceph cluster %s as %s"
+ " - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+ strerror(-ret));
+ return ret;
+ }
+
+ /* path=NULL tells librados to use default locations */
+ ret = rados_conf_read_file(ceph_cluster, NULL);
+ if (ret < 0) {
+ fprintf(stderr, "%s: failed to parse Ceph cluster config"
+ " - (%s)\n", progname, strerror(-ret));
+ rados_shutdown(ceph_cluster);
+ return ret;
+ }
+
+ ret = rados_connect(ceph_cluster);
+ if (ret < 0) {
+ fprintf(stderr, "%s: failed to connect to Ceph cluster %s as %s"
+ " - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+ strerror(-ret));
+ rados_shutdown(ceph_cluster);
+ return ret;
+ }
+
+
+ ret = rados_ioctx_create(ceph_cluster, pool_name, &ioctx);
+ if (ret < 0) {
+ fprintf(stderr, "%s: failed to create Ceph ioctx for pool %s"
+ " - (%s)\n", progname, pool_name, strerror(-ret));
+ rados_shutdown(ceph_cluster);
+ return ret;
+ }
+
+ *_ceph_cluster = ceph_cluster;
+ *_ioctx = ioctx;
+
+ return 0;
+}
+
+static void ctdb_mutex_rados_ctx_destroy(rados_t ceph_cluster,
+ rados_ioctx_t ioctx)
+{
+ rados_ioctx_destroy(ioctx);
+ rados_shutdown(ceph_cluster);
+}
+
+static int ctdb_mutex_rados_lock(rados_ioctx_t *ioctx,
+ const char *oid)
+{
+ int ret;
+
+ ret = rados_lock_exclusive(ioctx, oid,
+ CTDB_MUTEX_CEPH_LOCK_NAME,
+ CTDB_MUTEX_CEPH_LOCK_COOKIE,
+ CTDB_MUTEX_CEPH_LOCK_DESC,
+ NULL, /* infinite duration */
+ 0);
+ if ((ret == -EEXIST) || (ret == -EBUSY)) {
+ /* lock contention */
+ return ret;
+ } else if (ret < 0) {
+ /* unexpected failure */
+ fprintf(stderr,
+ "%s: Failed to get lock on RADOS object '%s' - (%s)\n",
+ progname, oid, strerror(-ret));
+ return ret;
+ }
+
+ /* lock obtained */
+ return 0;
+}
+
+static int ctdb_mutex_rados_unlock(rados_ioctx_t *ioctx,
+ const char *oid)
+{
+ int ret;
+
+ ret = rados_unlock(ioctx, oid,
+ CTDB_MUTEX_CEPH_LOCK_NAME,
+ CTDB_MUTEX_CEPH_LOCK_COOKIE);
+ if (ret < 0) {
+ fprintf(stderr,
+ "%s: Failed to drop lock on RADOS object '%s' - (%s)\n",
+ progname, oid, strerror(-ret));
+ return ret;
+ }
+
+ return 0;
+}
+
+struct ctdb_mutex_rados_state {
+ bool holding_mutex;
+ const char *ceph_cluster_name;
+ const char *ceph_auth_name;
+ const char *pool_name;
+ const char *object;
+ int ppid;
+ struct tevent_context *ev;
+ struct tevent_signal *sig_ev;
+ struct tevent_timer *timer_ev;
+ rados_t ceph_cluster;
+ rados_ioctx_t ioctx;
+};
+
+static void ctdb_mutex_rados_sigterm_cb(struct tevent_context *ev,
+ struct tevent_signal *se,
+ int signum,
+ int count,
+ void *siginfo,
+ void *private_data)
+{
+ struct ctdb_mutex_rados_state *cmr_state = private_data;
+ int ret;
+
+ if (!cmr_state->holding_mutex) {
+ fprintf(stderr, "Sigterm callback invoked without mutex!\n");
+ ret = -EINVAL;
+ goto err_ctx_cleanup;
+ }
+
+ ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+ ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+ cmr_state->ioctx);
+ talloc_free(cmr_state);
+ exit(ret ? 1 : 0);
+}
+
+static void ctdb_mutex_rados_timer_cb(struct tevent_context *ev,
+ struct tevent_timer *te,
+ struct timeval current_time,
+ void *private_data)
+{
+ struct ctdb_mutex_rados_state *cmr_state = private_data;
+ int ret;
+
+ if (!cmr_state->holding_mutex) {
+ fprintf(stderr, "Timer callback invoked without mutex!\n");
+ ret = -EINVAL;
+ goto err_ctx_cleanup;
+ }
+
+ if ((kill(cmr_state->ppid, 0) == 0) || (errno != ESRCH)) {
+ /* parent still around, keep waiting */
+ cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+ timeval_current_ofs(5,0),
+ ctdb_mutex_rados_timer_cb,
+ cmr_state);
+ if (cmr_state->timer_ev == NULL) {
+ fprintf(stderr, "Failed to create timer event\n");
+ /* rely on signal cb */
+ }
+ return;
+ }
+
+ /* parent ended, drop lock and exit */
+ ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+ ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+ cmr_state->ioctx);
+ talloc_free(cmr_state);
+ exit(ret ? 1 : 0);
+}
+
+int main(int argc, char *argv[])
+{
+ int ret;
+ struct ctdb_mutex_rados_state *cmr_state;
+
+ progname = argv[0];
+
+ if (argc != 5) {
+ fprintf(stderr, "Usage: %s <Ceph Cluster> <Ceph user> "
+ "<RADOS pool> <RADOS object>\n",
+ progname);
+ ret = -EINVAL;
+ goto err_out;
+ }
+
+ ret = setvbuf(stdout, NULL, _IONBF, 0);
+ if (ret != 0) {
+ fprintf(stderr, "Failed to configure unbuffered stdout I/O\n");
+ }
+
+ cmr_state = talloc_zero(NULL, struct ctdb_mutex_rados_state);
+ if (cmr_state == NULL) {
+ fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+ ret = -ENOMEM;
+ goto err_out;
+ }
+
+ cmr_state->ceph_cluster_name = argv[1];
+ cmr_state->ceph_auth_name = argv[2];
+ cmr_state->pool_name = argv[3];
+ cmr_state->object = argv[4];
+
+ cmr_state->ppid = getppid();
+ if (cmr_state->ppid == 1) {
+ /*
+ * The original parent is gone and the process has
+ * been reparented to init. This can happen if the
+ * helper is started just as the parent is killed
+ * during shutdown. The error message doesn't need to
+ * be stellar, since there won't be anything around to
+ * capture and log it...
+ */
+ fprintf(stderr, "%s: PPID == 1\n", progname);
+ ret = -EPIPE;
+ goto err_state_free;
+ }
+
+ cmr_state->ev = tevent_context_init(cmr_state);
+ if (cmr_state->ev == NULL) {
+ fprintf(stderr, "tevent_context_init failed\n");
+ fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+ ret = -ENOMEM;
+ goto err_state_free;
+ }
+
+ /* wait for sigterm */
+ cmr_state->sig_ev = tevent_add_signal(cmr_state->ev, cmr_state, SIGTERM, 0,
+ ctdb_mutex_rados_sigterm_cb,
+ cmr_state);
+ if (cmr_state->sig_ev == NULL) {
+ fprintf(stderr, "Failed to create signal event\n");
+ fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+ ret = -ENOMEM;
+ goto err_state_free;
+ }
+
+ /* periodically check parent */
+ cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+ timeval_current_ofs(5,0),
+ ctdb_mutex_rados_timer_cb,
+ cmr_state);
+ if (cmr_state->timer_ev == NULL) {
+ fprintf(stderr, "Failed to create timer event\n");
+ fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+ ret = -ENOMEM;
+ goto err_state_free;
+ }
+
+ ret = ctdb_mutex_rados_ctx_create(cmr_state->ceph_cluster_name,
+ cmr_state->ceph_auth_name,
+ cmr_state->pool_name,
+ &cmr_state->ceph_cluster,
+ &cmr_state->ioctx);
+ if (ret < 0) {
+ fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+ goto err_state_free;
+ }
+
+ ret = ctdb_mutex_rados_lock(cmr_state->ioctx, cmr_state->object);
+ if ((ret == -EEXIST) || (ret == -EBUSY)) {
+ fprintf(stdout, CTDB_MUTEX_STATUS_CONTENDED);
+ goto err_ctx_cleanup;
+ } else if (ret < 0) {
+ fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+ goto err_ctx_cleanup;
+ }
+
+ cmr_state->holding_mutex = true;
+ fprintf(stdout, CTDB_MUTEX_STATUS_HOLDING);
+
+ /* wait for the signal / timer events to do their work */
+ ret = tevent_loop_wait(cmr_state->ev);
+ if (ret < 0) {
+ goto err_ctx_cleanup;
+ }
+err_ctx_cleanup:
+ ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+ cmr_state->ioctx);
+err_state_free:
+ talloc_free(cmr_state);
+err_out:
+ return ret ? 1 : 0;
+}
diff --git a/ctdb/wscript b/ctdb/wscript
index f4bccef..75ddee2 100644
--- a/ctdb/wscript
+++ b/ctdb/wscript
@@ -69,6 +69,9 @@ def set_options(opt):
opt.add_option('--enable-pmda',
help=("Turn on PCP pmda support (default=no)"),
action="store_true", dest='ctdb_pmda', default=False)
+ opt.add_option('--enable-ceph-reclock',
+ help=("Enable Ceph CTDB recovery lock helper (default=no)"),
+ action="store_true", dest='ctdb_ceph_reclock', default=False)
opt.add_option('--with-logdir',
help=("Path to log directory"),
@@ -159,6 +162,15 @@ def configure(conf):
conf.env.CTDB_PMDADIR = os.path.join(conf.env.LOCALSTATEDIR,
'lib/pcp/pmdas/ctdb')
+ if Options.options.ctdb_ceph_reclock:
+ if (conf.CHECK_HEADERS('rados/librados.h', False, False, 'rados') and
+ conf.CHECK_LIB('rados', shlib=True)):
+ Logs.info('Building with Ceph librados recovery lock support')
+ conf.define('HAVE_LIBRADOS', 1)
+ else:
+ Logs.error("Missing librados for Ceph recovery lock support")
+ sys.exit(1)
+
have_infiniband = False
if Options.options.ctdb_infiniband:
ib_support = True
@@ -517,6 +529,13 @@ def build(bld):
bld.INSTALL_FILES('${CTDB_PMDADIR}', 'utils/pmda/README',
destname='README')
+ if bld.env.HAVE_LIBRADOS:
+ bld.SAMBA_BINARY('ctdb_mutex_ceph_rados_helper',
+ source='tools/ctdb_mutex_ceph_rados_helper.c',
+ deps='ctdb-system rados',
+ includes='include',
+ install_path='${CTDB_HELPER_BINDIR}')
+
sed_expr1 = 's|/usr/local/var/lib/ctdb|%s|g' % (bld.env.CTDB_VARDIR)
sed_expr2 = 's|/usr/local/etc/ctdb|%s|g' % (bld.env.CTDB_ETCDIR)
sed_expr3 = 's|/usr/local/var/log|%s|g' % (bld.env.CTDB_LOGDIR)
--
2.10.2
From fb53f42d1e02e0f33d1a1304c7d708203d75a14d Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Thu, 1 Dec 2016 14:22:45 +0100
Subject: [PATCH 2/2] ctdb/doc: man page for Ceph RADOS cluster mutex helper
Signed-off-by: David Disseldorp <ddiss@samba.org>
---
ctdb/doc/Makefile | 3 +-
ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++++++++++++++++++++++++++
2 files changed, 92 insertions(+), 1 deletion(-)
create mode 100644 ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
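Note on the configuration line: the man page added here documents a
CTDB_RECOVERY_LOCK value of the form "!<helper> [Cluster] [User] [Pool]
[Object]", and the helper's main() rejects anything other than four
arguments. A rough sketch of a sanity check for that shape follows. Both
function names are hypothetical and this is not CTDB's own parser.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Count whitespace-separated words in s. */
static size_t count_words(const char *s)
{
	size_t n = 0;
	bool in_word = false;

	for (; *s != '\0'; s++) {
		if (isspace((unsigned char)*s)) {
			in_word = false;
		} else if (!in_word) {
			in_word = true;
			n++;
		}
	}
	return n;
}

/*
 * Hypothetical check, not CTDB code: true if the setting starts with '!'
 * and is followed by a command word plus exactly four arguments
 * (Cluster, User, Pool, Object), matching the helper's argc == 5 test.
 */
static bool reclock_setting_looks_valid(const char *setting)
{
	if (setting == NULL || setting[0] != '!') {
		return false;
	}
	return count_words(setting + 1) == 5;
}
```

For example, "!ctdb_mutex_ceph_rados_helper ceph client.admin rbd ctdb_reclock"
passes the check, where the pool name "rbd" and the object name
"ctdb_reclock" are placeholder values chosen for illustration only.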
diff --git a/ctdb/doc/Makefile b/ctdb/doc/Makefile
index f0f8215..5bbd748 100644
--- a/ctdb/doc/Makefile
+++ b/ctdb/doc/Makefile
@@ -8,7 +8,8 @@ DOCS = ctdb.1 ctdb.1.html \
ctdbd.conf.5 ctdbd.conf.5.html \
ctdb.7 ctdb.7.html \
ctdb-statistics.7 ctdb-statistics.7.html \
- ctdb-tunables.7 ctdb-tunables.7.html
+ ctdb-tunables.7 ctdb-tunables.7.html \
+ ctdb_mutex_ceph_rados_helper.7 ctdb_mutex_ceph_rados_helper.7.html
all: $(DOCS)
diff --git a/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
new file mode 100644
index 0000000..e5dedc7
--- /dev/null
+++ b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
@@ -0,0 +1,90 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE refentry
+ PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+ "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<refentry id="ctdb_mutex_ceph_rados_helper.7">
+
+ <refmeta>
+ <refentrytitle>Ceph RADOS Mutex</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo class="source">ctdb</refmiscinfo>
+ <refmiscinfo class="manual">CTDB - clustered TDB database</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>ctdb_mutex_ceph_rados_helper</refname>
+ <refpurpose>Ceph RADOS cluster mutex helper</refpurpose>
+ </refnamediv>
+
+ <refsect1>
+ <title>DESCRIPTION</title>
+ <para>
+ ctdb_mutex_ceph_rados_helper can be used as a recovery lock provider
+ for CTDB. When configured, split brain avoidance during CTDB recovery
+ will be handled using locks against an object located in a Ceph RADOS
+ pool.
+ To enable this functionality, include the following line in your CTDB
+ config file:
+ </para>
+ <screen format="linespecific">
+CTDB_RECOVERY_LOCK="!ctdb_mutex_ceph_rados_helper [Cluster] [User] [Pool] [Object]"
+
+Cluster: Ceph cluster name (e.g. ceph)
+User: Ceph cluster user name (e.g. client.admin)
+Pool: Ceph RADOS pool name
+Object: Ceph RADOS object name
+ </screen>
+ <para>
+ The Ceph cluster <parameter>Cluster</parameter> must be up and running,
+ with a configuration file and a keyring file for <parameter>User</parameter>
+ located in a librados default search path (e.g. /etc/ceph/).
+ <parameter>Pool</parameter> must already exist.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>SEE ALSO</title>
+ <para>
+ <citerefentry><refentrytitle>ctdb</refentrytitle>
+ <manvolnum>7</manvolnum></citerefentry>,
+
+ <citerefentry><refentrytitle>ctdbd</refentrytitle>
+ <manvolnum>1</manvolnum></citerefentry>,
+
+ <ulink url="http://ctdb.samba.org/"/>
+ </para>
+ </refsect1>
+
+ <refentryinfo>
+ <author>
+ <contrib>
+ This documentation was written by David Disseldorp
+ </contrib>
+ </author>
+
+ <copyright>
+ <year>2016</year>
+ <holder>David Disseldorp</holder>
+ </copyright>
+ <legalnotice>
+ <para>
+ This program is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License as
+ published by the Free Software Foundation; either version 3 of
+ the License, or (at your option) any later version.
+ </para>
+ <para>
+ This program is distributed in the hope that it will be
+ useful, but WITHOUT ANY WARRANTY; without even the implied
+ warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+ PURPOSE. See the GNU General Public License for more details.
+ </para>
+ <para>
+ You should have received a copy of the GNU General Public
+ License along with this program; if not, see
+ <ulink url="http://www.gnu.org/licenses"/>.
+ </para>
+ </legalnotice>
+ </refentryinfo>
+
+</refentry>
--
2.10.2
[-- Attachment #3: test_ceph_rados_reclock.sh --]
[-- Type: application/x-shellscript, Size: 4245 bytes --]
* Re: [PATCH] Ceph RADOS cluster mutex helper for Samba CTDB
@ 2016-12-06 12:14 David Disseldorp
From: David Disseldorp @ 2016-12-06 12:14 UTC
To: Amitay Isaacs; +Cc: Samba Technical, ceph-devel@vger.kernel.org
On Tue, 6 Dec 2016 18:58:41 +1100, Amitay Isaacs wrote:
> On Fri, Dec 2, 2016 at 1:17 AM, David Disseldorp <ddiss@suse.de> wrote:
>
> > Hi,
> >
> > The attached patch-set implements a cluster mutex helper for Samba CTDB
> > using Ceph librados.
> >
> > ctdb_mutex_ceph_rados_helper_lock can be used as a recovery lock provider
> > for CTDB. When configured, split brain avoidance during CTDB recovery
> > will be handled using locks against an object located in a Ceph RADOS
> > pool.
> >
> > I've also attached a standalone test script - @Martin: does this belong
> > in the ctdb test suite, or can I just commit it as a standalone test?
> > It has a few non-standard dependencies: a running Ceph cluster, the
> > rados and jq binaries.
> >
> > Feedback appreciated.
> >
> >
> This code does not belong in ctdb/tools. You can move it to an appropriate
> directory in ctdb/utils.
>
> Please include the test code also as part of the commit. Someone with
> ceph-rados setup should be able to run this test.
> I would appreciate if you can add few comments in the test script
> describing the requirements and how to run the test.
Thanks for the feedback, Amitay.
Please find a v2 patchset attached, with the following changes:
- move ctdb_mutex_ceph_rados_helper under ctdb/utils/ceph
- add test_ceph_rados_reclock.sh and document usage
Cheers, David
--
ctdb/doc/Makefile | 3 +-
ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++
.../utils/ceph/ctdb_mutex_ceph_rados_helper.c | 334 ++++++++++++++++++
ctdb/utils/ceph/test_ceph_rados_reclock.sh | 151 ++++++++
ctdb/wscript | 19 +
5 files changed, 596 insertions(+), 1 deletion(-)
* Re: [PATCH] Ceph RADOS cluster mutex helper for Samba CTDB 2016-12-06 12:14 ` David Disseldorp @ 2016-12-06 12:18 ` David Disseldorp [not found] ` <CAJ+X7mRh04D+Yvtf0xx3dT6rTa=9KvJagyK=1PJQC=1R+u++7w@mail.gmail.com> 0 siblings, 1 reply; 5+ messages in thread From: David Disseldorp @ 2016-12-06 12:18 UTC (permalink / raw) To: Amitay Isaacs; +Cc: Samba Technical, ceph-devel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 409 bytes --] This time with the patch-set attached... > ctdb/doc/Makefile | 3 +- > ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++ > .../utils/ceph/ctdb_mutex_ceph_rados_helper.c | 334 ++++++++++++++++++ > ctdb/utils/ceph/test_ceph_rados_reclock.sh | 151 ++++++++ > ctdb/wscript | 19 + > 5 files changed, 596 insertions(+), 1 deletion(-) [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: ctdb_reclock_ceph_v2.patchset --] [-- Type: text/x-patch, Size: 24638 bytes --] From f723522c259f6b46cec088cc951ce801e491a063 Mon Sep 17 00:00:00 2001 From: David Disseldorp <ddiss@samba.org> Date: Thu, 1 Dec 2016 13:33:22 +0100 Subject: [PATCH 1/3] ctdb: cluster mutex helper using Ceph RADOS ctdb_mutex_ceph_rados_helper implements the cluster mutex helper API atop Ceph using the librados rados_lock_exclusive()/rados_unlock() functionality. Once configured, split brain avoidance during CTDB recovery will be handled using locks against an object located in a Ceph RADOS pool. 
Signed-off-by: David Disseldorp <ddiss@samba.org> --- ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c | 334 +++++++++++++++++++++++++ ctdb/wscript | 19 ++ 2 files changed, 353 insertions(+) create mode 100644 ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c diff --git a/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c new file mode 100644 index 0000000..8d19965 --- /dev/null +++ b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c @@ -0,0 +1,334 @@ +/* + CTDB mutex helper using Ceph librados locks + + Copyright (C) David Disseldorp 2016 + + Based on ctdb_mutex_fcntl_helper.c, which is: + Copyright (C) Martin Schwenke 2015 + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see <http://www.gnu.org/licenses/>. 
+*/ + +#include "replace.h" +#include "system/filesys.h" +#include "system/network.h" + +/* protocol.h is just needed for ctdb_sock_addr, which is used in system.h */ +#include "protocol/protocol.h" +#include "common/system.h" +#include "lib/util/time.h" +#include "tevent.h" +#include "talloc.h" +#include "rados/librados.h" + +#define CTDB_MUTEX_CEPH_LOCK_NAME "ctdb_reclock_mutex" +#define CTDB_MUTEX_CEPH_LOCK_COOKIE CTDB_MUTEX_CEPH_LOCK_NAME +#define CTDB_MUTEX_CEPH_LOCK_DESC "CTDB recovery lock" + +#define CTDB_MUTEX_STATUS_HOLDING "0" +#define CTDB_MUTEX_STATUS_CONTENDED "1" +#define CTDB_MUTEX_STATUS_TIMEOUT "2" +#define CTDB_MUTEX_STATUS_ERROR "3" + +static char *progname = NULL; + +static int ctdb_mutex_rados_ctx_create(const char *ceph_cluster_name, + const char *ceph_auth_name, + const char *pool_name, + rados_t *_ceph_cluster, + rados_ioctx_t *_ioctx) +{ + rados_t ceph_cluster = NULL; + rados_ioctx_t ioctx = NULL; + int ret; + + ret = rados_create2(&ceph_cluster, ceph_cluster_name, ceph_auth_name, 0); + if (ret < 0) { + fprintf(stderr, "%s: failed to initialise Ceph cluster %s as %s" + " - (%s)\n", progname, ceph_cluster_name, ceph_auth_name, + strerror(-ret)); + return ret; + } + + /* path=NULL tells librados to use default locations */ + ret = rados_conf_read_file(ceph_cluster, NULL); + if (ret < 0) { + fprintf(stderr, "%s: failed to parse Ceph cluster config" + " - (%s)\n", progname, strerror(-ret)); + rados_shutdown(ceph_cluster); + return ret; + } + + ret = rados_connect(ceph_cluster); + if (ret < 0) { + fprintf(stderr, "%s: failed to connect to Ceph cluster %s as %s" + " - (%s)\n", progname, ceph_cluster_name, ceph_auth_name, + strerror(-ret)); + rados_shutdown(ceph_cluster); + return ret; + } + + + ret = rados_ioctx_create(ceph_cluster, pool_name, &ioctx); + if (ret < 0) { + fprintf(stderr, "%s: failed to create Ceph ioctx for pool %s" + " - (%s)\n", progname, pool_name, strerror(-ret)); + rados_shutdown(ceph_cluster); + return ret; + } + + 
*_ceph_cluster = ceph_cluster; + *_ioctx = ioctx; + + return 0; +} + +static void ctdb_mutex_rados_ctx_destroy(rados_t ceph_cluster, + rados_ioctx_t ioctx) +{ + rados_ioctx_destroy(ioctx); + rados_shutdown(ceph_cluster); +} + +static int ctdb_mutex_rados_lock(rados_ioctx_t *ioctx, + const char *oid) +{ + int ret; + + ret = rados_lock_exclusive(ioctx, oid, + CTDB_MUTEX_CEPH_LOCK_NAME, + CTDB_MUTEX_CEPH_LOCK_COOKIE, + CTDB_MUTEX_CEPH_LOCK_DESC, + NULL, /* infinite duration */ + 0); + if ((ret == -EEXIST) || (ret == -EBUSY)) { + /* lock contention */ + return ret; + } else if (ret < 0) { + /* unexpected failure */ + fprintf(stderr, + "%s: Failed to get lock on RADOS object '%s' - (%s)\n", + progname, oid, strerror(-ret)); + return ret; + } + + /* lock obtained */ + return 0; +} + +static int ctdb_mutex_rados_unlock(rados_ioctx_t *ioctx, + const char *oid) +{ + int ret; + + ret = rados_unlock(ioctx, oid, + CTDB_MUTEX_CEPH_LOCK_NAME, + CTDB_MUTEX_CEPH_LOCK_COOKIE); + if (ret < 0) { + fprintf(stderr, + "%s: Failed to drop lock on RADOS object '%s' - (%s)\n", + progname, oid, strerror(-ret)); + return ret; + } + + return 0; +} + +struct ctdb_mutex_rados_state { + bool holding_mutex; + const char *ceph_cluster_name; + const char *ceph_auth_name; + const char *pool_name; + const char *object; + int ppid; + struct tevent_context *ev; + struct tevent_signal *sig_ev; + struct tevent_timer *timer_ev; + rados_t ceph_cluster; + rados_ioctx_t ioctx; +}; + +static void ctdb_mutex_rados_sigterm_cb(struct tevent_context *ev, + struct tevent_signal *se, + int signum, + int count, + void *siginfo, + void *private_data) +{ + struct ctdb_mutex_rados_state *cmr_state = private_data; + int ret; + + if (!cmr_state->holding_mutex) { + fprintf(stderr, "Sigterm callback invoked without mutex!\n"); + ret = -EINVAL; + goto err_ctx_cleanup; + } + + ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object); +err_ctx_cleanup: + ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster, + 
cmr_state->ioctx); + talloc_free(cmr_state); + exit(ret ? 1 : 0); +} + +static void ctdb_mutex_rados_timer_cb(struct tevent_context *ev, + struct tevent_timer *te, + struct timeval current_time, + void *private_data) +{ + struct ctdb_mutex_rados_state *cmr_state = private_data; + int ret; + + if (!cmr_state->holding_mutex) { + fprintf(stderr, "Timer callback invoked without mutex!\n"); + ret = -EINVAL; + goto err_ctx_cleanup; + } + + if ((kill(cmr_state->ppid, 0) == 0) || (errno != ESRCH)) { + /* parent still around, keep waiting */ + cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state, + timeval_current_ofs(5,0), + ctdb_mutex_rados_timer_cb, + cmr_state); + if (cmr_state->timer_ev == NULL) { + fprintf(stderr, "Failed to create timer event\n"); + /* rely on signal cb */ + } + return; + } + + /* parent ended, drop lock and exit */ + ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object); +err_ctx_cleanup: + ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster, + cmr_state->ioctx); + talloc_free(cmr_state); + exit(ret ? 1 : 0); +} + +int main(int argc, char *argv[]) +{ + int ret; + struct ctdb_mutex_rados_state *cmr_state; + + progname = argv[0]; + + if (argc != 5) { + fprintf(stderr, "Usage: %s <Ceph Cluster> <Ceph user> " + "<RADOS pool> <RADOS object>\n", + progname); + ret = -EINVAL; + goto err_out; + } + + ret = setvbuf(stdout, NULL, _IONBF, 0); + if (ret != 0) { + fprintf(stderr, "Failed to configure unbuffered stdout I/O\n"); + } + + cmr_state = talloc_zero(NULL, struct ctdb_mutex_rados_state); + if (cmr_state == NULL) { + fprintf(stdout, CTDB_MUTEX_STATUS_ERROR); + ret = -ENOMEM; + goto err_out; + } + + cmr_state->ceph_cluster_name = argv[1]; + cmr_state->ceph_auth_name = argv[2]; + cmr_state->pool_name = argv[3]; + cmr_state->object = argv[4]; + + cmr_state->ppid = getppid(); + if (cmr_state->ppid == 1) { + /* + * The original parent is gone and the process has + * been reparented to init. 
This can happen if the + * helper is started just as the parent is killed + * during shutdown. The error message doesn't need to + * be stellar, since there won't be anything around to + * capture and log it... + */ + fprintf(stderr, "%s: PPID == 1\n", progname); + ret = -EPIPE; + goto err_state_free; + } + + cmr_state->ev = tevent_context_init(cmr_state); + if (cmr_state->ev == NULL) { + fprintf(stderr, "tevent_context_init failed\n"); + fprintf(stdout, CTDB_MUTEX_STATUS_ERROR); + ret = -ENOMEM; + goto err_state_free; + } + + /* wait for sigterm */ + cmr_state->sig_ev = tevent_add_signal(cmr_state->ev, cmr_state, SIGTERM, 0, + ctdb_mutex_rados_sigterm_cb, + cmr_state); + if (cmr_state->sig_ev == NULL) { + fprintf(stderr, "Failed to create signal event\n"); + fprintf(stdout, CTDB_MUTEX_STATUS_ERROR); + ret = -ENOMEM; + goto err_state_free; + } + + /* periodically check parent */ + cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state, + timeval_current_ofs(5,0), + ctdb_mutex_rados_timer_cb, + cmr_state); + if (cmr_state->timer_ev == NULL) { + fprintf(stderr, "Failed to create timer event\n"); + fprintf(stdout, CTDB_MUTEX_STATUS_ERROR); + ret = -ENOMEM; + goto err_state_free; + } + + ret = ctdb_mutex_rados_ctx_create(cmr_state->ceph_cluster_name, + cmr_state->ceph_auth_name, + cmr_state->pool_name, + &cmr_state->ceph_cluster, + &cmr_state->ioctx); + if (ret < 0) { + fprintf(stdout, CTDB_MUTEX_STATUS_ERROR); + goto err_state_free; + } + + ret = ctdb_mutex_rados_lock(cmr_state->ioctx, cmr_state->object); + if ((ret == -EEXIST) || (ret == -EBUSY)) { + fprintf(stdout, CTDB_MUTEX_STATUS_CONTENDED); + goto err_ctx_cleanup; + } else if (ret < 0) { + fprintf(stdout, CTDB_MUTEX_STATUS_ERROR); + goto err_ctx_cleanup; + } + + cmr_state->holding_mutex = true; + fprintf(stdout, CTDB_MUTEX_STATUS_HOLDING); + + /* wait for the signal / timer events to do their work */ + ret = tevent_loop_wait(cmr_state->ev); + if (ret < 0) { + goto err_ctx_cleanup; + } +err_ctx_cleanup: 
+ ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster, + cmr_state->ioctx); +err_state_free: + talloc_free(cmr_state); +err_out: + return ret ? 1 : 0; +} diff --git a/ctdb/wscript b/ctdb/wscript index 6137f92..c34f618 100644 --- a/ctdb/wscript +++ b/ctdb/wscript @@ -70,6 +70,9 @@ def set_options(opt): opt.add_option('--enable-pmda', help=("Turn on PCP pmda support (default=no)"), action="store_true", dest='ctdb_pmda', default=False) + opt.add_option('--enable-ceph-reclock', + help=("Enable Ceph CTDB recovery lock helper (default=no)"), + action="store_true", dest='ctdb_ceph_reclock', default=False) opt.add_option('--with-logdir', help=("Path to log directory"), @@ -160,6 +163,15 @@ def configure(conf): conf.env.CTDB_PMDADIR = os.path.join(conf.env.LOCALSTATEDIR, 'lib/pcp/pmdas/ctdb') + if Options.options.ctdb_ceph_reclock: + if (conf.CHECK_HEADERS('rados/librados.h', False, False, 'rados') and + conf.CHECK_LIB('rados', shlib=True)): + Logs.info('Building with Ceph librados recovery lock support') + conf.define('HAVE_LIBRADOS', 1) + else: + Logs.error("Missing librados for Ceph recovery lock support") + sys.exit(1) + have_infiniband = False if Options.options.ctdb_infiniband: ib_support = True @@ -516,6 +528,13 @@ def build(bld): bld.INSTALL_FILES('${CTDB_PMDADIR}', 'utils/pmda/README', destname='README') + if bld.env.HAVE_LIBRADOS: + bld.SAMBA_BINARY('ctdb_mutex_ceph_rados_helper', + source='utils/ceph/ctdb_mutex_ceph_rados_helper.c', + deps='ctdb-system rados', + includes='include', + install_path='${CTDB_HELPER_BINDIR}') + sed_expr1 = 's|/usr/local/var/lib/ctdb|%s|g' % (bld.env.CTDB_VARDIR) sed_expr2 = 's|/usr/local/etc/ctdb|%s|g' % (bld.env.CTDB_ETCDIR) sed_expr3 = 's|/usr/local/var/log|%s|g' % (bld.env.CTDB_LOGDIR) -- 2.10.2 From 2e4e170dba4fdbd5bdf0c62c4c71e361d97bf6ee Mon Sep 17 00:00:00 2001 From: David Disseldorp <ddiss@samba.org> Date: Thu, 1 Dec 2016 14:22:45 +0100 Subject: [PATCH 2/3] ctdb/doc: man page for Ceph RADOS cluster mutex helper Signed-off-by: 
David Disseldorp <ddiss@samba.org> --- ctdb/doc/Makefile | 3 +- ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++++++++++++++++++++++++++ 2 files changed, 92 insertions(+), 1 deletion(-) create mode 100644 ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml diff --git a/ctdb/doc/Makefile b/ctdb/doc/Makefile index 50ab719..756fe27 100644 --- a/ctdb/doc/Makefile +++ b/ctdb/doc/Makefile @@ -9,7 +9,8 @@ DOCS = ctdb.1 ctdb.1.html \ ctdb.7 ctdb.7.html \ ctdb-statistics.7 ctdb-statistics.7.html \ ctdb-etcd.7 ctdb-etcd.7.html \ - ctdb-tunables.7 ctdb-tunables.7.html + ctdb-tunables.7 ctdb-tunables.7.html \ + ctdb_mutex_ceph_rados_helper.7 ctdb_mutex_ceph_rados_helper.7.html all: $(DOCS) diff --git a/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml new file mode 100644 index 0000000..e5dedc7 --- /dev/null +++ b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml @@ -0,0 +1,90 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<!DOCTYPE refentry + PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" + "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"> +<refentry id="ctdb_mutex_ceph_rados_helper.7"> + + <refmeta> + <refentrytitle>Ceph RADOS Mutex</refentrytitle> + <manvolnum>7</manvolnum> + <refmiscinfo class="source">ctdb</refmiscinfo> + <refmiscinfo class="manual">CTDB - clustered TDB database</refmiscinfo> + </refmeta> + + <refnamediv> + <refname>ctdb_mutex_ceph_rados_helper</refname> + <refpurpose>Ceph RADOS cluster mutex helper</refpurpose> + </refnamediv> + + <refsect1> + <title>DESCRIPTION</title> + <para> + ctdb_mutex_ceph_rados_helper_lock can be used as a recovery lock provider + for CTDB. When configured, split brain avoidance during CTDB recovery + will be handled using locks against an object located in a Ceph RADOS + pool. 
+      To enable this functionality, include the following line in your CTDB
+      config file:
+    </para>
+    <screen format="linespecific">
+CTDB_RECOVERY_LOCK="!ctdb_mutex_ceph_rados_helper_lock [Cluster] [User] [Pool] [Object]"
+
+Cluster: Ceph cluster name (e.g. ceph)
+User: Ceph cluster user name (e.g. client.admin)
+Pool: Ceph RADOS pool name
+Object: Ceph RADOS object name
+    </screen>
+    <para>
+      The Ceph cluster <parameter>Cluster</parameter> must be up and running,
+      with a configuration, and keyring file for <parameter>User</parameter>
+      located in a librados default search path (e.g. /etc/ceph/).
+      <parameter>Pool</parameter> must already exist.
+    </para>
+  </refsect1>
+
+  <refsect1>
+    <title>SEE ALSO</title>
+    <para>
+      <citerefentry><refentrytitle>ctdb</refentrytitle>
+      <manvolnum>7</manvolnum></citerefentry>,
+
+      <citerefentry><refentrytitle>ctdbd</refentrytitle>
+      <manvolnum>1</manvolnum></citerefentry>,
+
+      <ulink url="http://ctdb.samba.org/"/>
+    </para>
+  </refsect1>
+
+  <refentryinfo>
+    <author>
+      <contrib>
+	This documentation was written by David Disseldorp
+      </contrib>
+    </author>
+
+    <copyright>
+      <year>2016</year>
+      <holder>David Disseldorp</holder>
+    </copyright>
+    <legalnotice>
+      <para>
+	This program is free software; you can redistribute it and/or
+	modify it under the terms of the GNU General Public License as
+	published by the Free Software Foundation; either version 3 of
+	the License, or (at your option) any later version.
+      </para>
+      <para>
+	This program is distributed in the hope that it will be
+	useful, but WITHOUT ANY WARRANTY; without even the implied
+	warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+	PURPOSE. See the GNU General Public License for more details.
+      </para>
+      <para>
+	You should have received a copy of the GNU General Public
+	License along with this program; if not, see
+	<ulink url="http://www.gnu.org/licenses"/>.
+      </para>
+    </legalnotice>
+  </refentryinfo>
+
+</refentry>
--
2.10.2

From ba0581fbfcd2d5718e2223aba3121f9a9c0f8a4a Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Tue, 6 Dec 2016 13:03:27 +0100
Subject: [PATCH 3/3] ctdb: add test script for ctdb_mutex_ceph_rados_helper

This standalone test script performs the following:
- using ctdb_mutex_ceph_rados_helper, take a lock on the Ceph RADOS object at
  $CLUSTER/$POOL/$OBJECT using the Ceph keyring for $USER
  + confirm that lock is obtained, via ctdb_mutex_ceph_rados_helper "0" output
- check RADOS object lock state, using the "rados lock info" command
- attempt to obtain the lock again, using ctdb_mutex_ceph_rados_helper
  + confirm that the lock is not successfully taken
- tell the first locker to drop the lock and exit, via SIGTERM
- once the first locker has exited, attempt to get the lock again
  + confirm that this attempt succeeds

Signed-off-by: David Disseldorp <ddiss@samba.org>
---
 ctdb/utils/ceph/test_ceph_rados_reclock.sh | 151 +++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)
 create mode 100755 ctdb/utils/ceph/test_ceph_rados_reclock.sh

diff --git a/ctdb/utils/ceph/test_ceph_rados_reclock.sh b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
new file mode 100755
index 0000000..1adacf6
--- /dev/null
+++ b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
@@ -0,0 +1,151 @@
+#!/bin/bash
+# standalone test for ctdb_mutex_ceph_rados_helper
+#
+# Copyright (C) David Disseldorp 2016
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+# XXX The following parameters may require configuration:
+CLUSTER="ceph"		# Name of the Ceph cluster under test
+USER="client.admin"	# Ceph user - a keyring must exist
+POOL="rbd"		# RADOS pool - must exist
+OBJECT="ctdb_reclock"	# RADOS object: target for lock requests
+
+# test procedure:
+# - using ctdb_mutex_ceph_rados_helper, take a lock on the Ceph RADOS object at
+#   CLUSTER/$POOL/$OBJECT using the Ceph keyring for $USER
+#   + confirm that lock is obtained, via ctdb_mutex_ceph_rados_helper "0" output
+# - check RADOS object lock state, using the "rados lock info" command
+# - attempt to obtain the lock again, using ctdb_mutex_ceph_rados_helper
+#   + confirm that the lock is not successfully taken ("1" output=contention)
+# - tell the first locker to drop the lock and exit, via SIGTERM
+# - once the first locker has exited, attempt to get the lock again
+#   + confirm that this attempt succeeds
+
+function _fail() {
+	echo "FAILED: $*"
+	exit 1
+}
+
+# this test requires the Ceph "rados" binary, and "jq" json parser
+which jq > /dev/null || exit 1
+which rados > /dev/null || exit 1
+which ctdb_mutex_ceph_rados_helper || exit 1
+
+TMP_DIR="$(mktemp --directory)" || exit 1
+rados -p "$POOL" rm "$OBJECT"
+
+(ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" \
+							> ${TMP_DIR}/first) &
+locker_pid=$!
+
+# TODO wait for ctdb_mutex_ceph_rados_helper to write one byte to stdout,
+# indicating lock acquisition success/failure
+sleep 1
+
+first_out=$(cat ${TMP_DIR}/first)
+[ "$first_out" == "0" ] \
+	|| _fail "expected lock acquisition (0), but got $first_out"
+
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+						> ${TMP_DIR}/lock_state_first
+
+# echo "with lock: `cat ${TMP_DIR}/lock_state_first`"
+
+LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_first)"
+[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected lock name: $LOCK_NAME"
+LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_first)"
+[ "$LOCK_TYPE" == "exclusive" ] \
+	|| _fail "unexpected lock type: $LOCK_TYPE"
+
+LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_first)"
+[ $LOCK_COUNT -eq 1 ] || _fail "expected 1 lock in rados state, got $LOCK_COUNT"
+LOCKER_COOKIE="$(jq -r '.lockers[0].cookie' ${TMP_DIR}/lock_state_first)"
+[ "$LOCKER_COOKIE" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected locker cookie: $LOCKER_COOKIE"
+LOCKER_DESC="$(jq -r '.lockers[0].description' ${TMP_DIR}/lock_state_first)"
+[ "$LOCKER_DESC" == "CTDB recovery lock" ] \
+	|| _fail "unexpected locker description: $LOCKER_DESC"
+
+# second attempt while first is still holding the lock - expect failure
+ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" \
+							> ${TMP_DIR}/second
+second_out=$(cat ${TMP_DIR}/second)
+[ "$second_out" == "1" ] \
+	|| _fail "expected lock contention (1), but got $second_out"
+
+# confirm lock state didn't change
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+						> ${TMP_DIR}/lock_state_second
+
+diff ${TMP_DIR}/lock_state_first ${TMP_DIR}/lock_state_second \
+	|| _fail "unexpected lock state change"
+
+# tell first locker to drop the lock and terminate
+kill $locker_pid || exit 1
+
+wait $locker_pid &> /dev/null
+
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+						> ${TMP_DIR}/lock_state_third
+# echo "without lock: `cat ${TMP_DIR}/lock_state_third`"
+
+LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_third)"
+[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected lock name: $LOCK_NAME"
+LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_third)"
+[ "$LOCK_TYPE" == "exclusive" ] \
+	|| _fail "unexpected lock type: $LOCK_TYPE"
+
+LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_third)"
+[ $LOCK_COUNT -eq 0 ] \
+	|| _fail "didn\'t expect any locks in rados state, got $LOCK_COUNT"
+
+exec >${TMP_DIR}/third -- ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" &
+locker_pid=$!
+
+sleep 1
+
+rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \
+						> ${TMP_DIR}/lock_state_fourth
+# echo "with lock again: `cat ${TMP_DIR}/lock_state_fourth`"
+
+LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected lock name: $LOCK_NAME"
+LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCK_TYPE" == "exclusive" ] \
+	|| _fail "unexpected lock type: $LOCK_TYPE"
+
+LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_fourth)"
+[ $LOCK_COUNT -eq 1 ] || _fail "expected 1 lock in rados state, got $LOCK_COUNT"
+LOCKER_COOKIE="$(jq -r '.lockers[0].cookie' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCKER_COOKIE" == "ctdb_reclock_mutex" ] \
+	|| _fail "unexpected locker cookie: $LOCKER_COOKIE"
+LOCKER_DESC="$(jq -r '.lockers[0].description' ${TMP_DIR}/lock_state_fourth)"
+[ "$LOCKER_DESC" == "CTDB recovery lock" ] \
+	|| _fail "unexpected locker description: $LOCKER_DESC"
+
+kill $locker_pid || exit 1
+wait $locker_pid &> /dev/null
+
+third_out=$(cat ${TMP_DIR}/third)
+[ "$third_out" == "0" ] \
+	|| _fail "expected lock acquisition (0), but got $third_out"
+
+rm ${TMP_DIR}/*
+rmdir $TMP_DIR
+
+echo "$0: all tests passed"
--
2.10.2

^ permalink raw reply related [flat|nested] 5+ messages in thread
[parent not found: <CAJ+X7mRh04D+Yvtf0xx3dT6rTa=9KvJagyK=1PJQC=1R+u++7w@mail.gmail.com>]
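The test script above carries a TODO about waiting for the helper to write its one status byte rather than relying on "sleep 1". A minimal sketch of one way to do that follows. It assumes the helper's stdout is redirected to a file, as in the script; wait_for_status is a hypothetical name that does not appear in the posted patch, and the fractional sleep assumes GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper, not part of the posted patch: poll until FILE is
# non-empty, giving up after roughly TIMEOUT seconds (default 10).
# On success the file contents (the status byte) are printed to stdout.
wait_for_status() {
	local file="$1" timeout="${2:-10}" waited=0

	while [ ! -s "$file" ]; do
		# each iteration sleeps 0.1s, so allow timeout*10 iterations
		[ "$waited" -ge "$((timeout * 10))" ] && return 1
		sleep 0.1
		waited=$((waited + 1))
	done
	cat "$file"
}
```

In the script this could replace the sleep and the cat, along the lines of: first_out=$(wait_for_status "${TMP_DIR}/first" 5) || _fail "no lock status written". Polling the file keeps the change small; reading from a pipe or FIFO would avoid polling but needs more restructuring of the test.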
* Re: [PATCH] Ceph RADOS cluster mutex helper for Samba CTDB
[not found] ` <CAJ+X7mRh04D+Yvtf0xx3dT6rTa=9KvJagyK=1PJQC=1R+u++7w@mail.gmail.com>
@ 2016-12-08 18:39 ` David Disseldorp
2016-12-09 3:11 ` Amitay Isaacs
0 siblings, 1 reply; 5+ messages in thread
From: David Disseldorp @ 2016-12-08 18:39 UTC (permalink / raw)
To: Amitay Isaacs; +Cc: Samba Technical, ceph-devel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 1332 bytes --]

Hi Amitay,

On Wed, 7 Dec 2016 13:32:34 +1100, Amitay Isaacs wrote:

> Hi David,
>
> On Tue, Dec 6, 2016 at 11:18 PM, David Disseldorp <ddiss@suse.de> wrote:
>
> > This time with the patch-set attached...
> >
> > > ctdb/doc/Makefile | 3 +-
> > > ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++
> > > .../utils/ceph/ctdb_mutex_ceph_rados_helper.c | 334 ++++++++++++++++++
> > > ctdb/utils/ceph/test_ceph_rados_reclock.sh | 151 ++++++++
> > > ctdb/wscript | 19 +
> > > 5 files changed, 596 insertions(+), 1 deletion(-)
> >
>
> In patch 1, why do you need to include any of the CTDB files
> (protocol/protocol.h and common/system.h) and have dependency on
> ctdb-system? I don't see you are using any of the functions defined in
> common/system.h.
>
> Please include the manpage in SAMBA_BINARY() definition. Also include it in
> manpages[] list. It might be better to merge patch 1 and patch 2.

Thanks for the feedback. Please find a new version attached (atop the
etcd changes), attempting to address your points above:
- drop unnecessary includes and ctdb-system dependency
  + add separate talloc and tevent deps
  + use tevent_timeval_current_ofs() instead of timeval_current_ofs()
- conditionally generate the man page

Cheers, David

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ctdb_reclock_ceph_v3.patchset --]
[-- Type: text/x-patch, Size: 25256 bytes --]

From 54c16ac1dfafb06111aeafb2377b06bd5db36994 Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Thu, 1 Dec 2016 13:33:22 +0100
Subject: [PATCH 1/3] ctdb: cluster mutex helper using Ceph RADOS

ctdb_mutex_ceph_rados_helper implements the cluster mutex helper API
atop Ceph using the librados rados_lock_exclusive()/rados_unlock()
functionality.
Once configured, split brain avoidance during CTDB recovery will be
handled using locks against an object located in a Ceph RADOS pool.

Signed-off-by: David Disseldorp <ddiss@samba.org>
---
 ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c | 328 +++++++++++++++++++++++++
 ctdb/wscript                                   |  19 ++
 2 files changed, 347 insertions(+)
 create mode 100644 ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c

diff --git a/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c
new file mode 100644
index 0000000..326a0b0
--- /dev/null
+++ b/ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c
@@ -0,0 +1,328 @@
+/*
+   CTDB mutex helper using Ceph librados locks
+
+   Copyright (C) David Disseldorp 2016
+
+   Based on ctdb_mutex_fcntl_helper.c, which is:
+   Copyright (C) Martin Schwenke 2015
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include "replace.h"
+
+#include "tevent.h"
+#include "talloc.h"
+#include "rados/librados.h"
+
+#define CTDB_MUTEX_CEPH_LOCK_NAME	"ctdb_reclock_mutex"
+#define CTDB_MUTEX_CEPH_LOCK_COOKIE	CTDB_MUTEX_CEPH_LOCK_NAME
+#define CTDB_MUTEX_CEPH_LOCK_DESC	"CTDB recovery lock"
+
+#define CTDB_MUTEX_STATUS_HOLDING "0"
+#define CTDB_MUTEX_STATUS_CONTENDED "1"
+#define CTDB_MUTEX_STATUS_TIMEOUT "2"
+#define CTDB_MUTEX_STATUS_ERROR "3"
+
+static char *progname = NULL;
+
+static int ctdb_mutex_rados_ctx_create(const char *ceph_cluster_name,
+				       const char *ceph_auth_name,
+				       const char *pool_name,
+				       rados_t *_ceph_cluster,
+				       rados_ioctx_t *_ioctx)
+{
+	rados_t ceph_cluster = NULL;
+	rados_ioctx_t ioctx = NULL;
+	int ret;
+
+	ret = rados_create2(&ceph_cluster, ceph_cluster_name, ceph_auth_name, 0);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to initialise Ceph cluster %s as %s"
+			" - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+			strerror(-ret));
+		return ret;
+	}
+
+	/* path=NULL tells librados to use default locations */
+	ret = rados_conf_read_file(ceph_cluster, NULL);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to parse Ceph cluster config"
+			" - (%s)\n", progname, strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+	ret = rados_connect(ceph_cluster);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to connect to Ceph cluster %s as %s"
+			" - (%s)\n", progname, ceph_cluster_name, ceph_auth_name,
+			strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+
+	ret = rados_ioctx_create(ceph_cluster, pool_name, &ioctx);
+	if (ret < 0) {
+		fprintf(stderr, "%s: failed to create Ceph ioctx for pool %s"
+			" - (%s)\n", progname, pool_name, strerror(-ret));
+		rados_shutdown(ceph_cluster);
+		return ret;
+	}
+
+	*_ceph_cluster = ceph_cluster;
+	*_ioctx = ioctx;
+
+	return 0;
+}
+
+static void ctdb_mutex_rados_ctx_destroy(rados_t ceph_cluster,
+					 rados_ioctx_t ioctx)
+{
+	rados_ioctx_destroy(ioctx);
+	rados_shutdown(ceph_cluster);
+}
+
+static int ctdb_mutex_rados_lock(rados_ioctx_t *ioctx,
+				 const char *oid)
+{
+	int ret;
+
+	ret = rados_lock_exclusive(ioctx, oid,
+				   CTDB_MUTEX_CEPH_LOCK_NAME,
+				   CTDB_MUTEX_CEPH_LOCK_COOKIE,
+				   CTDB_MUTEX_CEPH_LOCK_DESC,
+				   NULL, /* infinite duration */
+				   0);
+	if ((ret == -EEXIST) || (ret == -EBUSY)) {
+		/* lock contention */
+		return ret;
+	} else if (ret < 0) {
+		/* unexpected failure */
+		fprintf(stderr,
+			"%s: Failed to get lock on RADOS object '%s' - (%s)\n",
+			progname, oid, strerror(-ret));
+		return ret;
+	}
+
+	/* lock obtained */
+	return 0;
+}
+
+static int ctdb_mutex_rados_unlock(rados_ioctx_t *ioctx,
+				   const char *oid)
+{
+	int ret;
+
+	ret = rados_unlock(ioctx, oid,
+			   CTDB_MUTEX_CEPH_LOCK_NAME,
+			   CTDB_MUTEX_CEPH_LOCK_COOKIE);
+	if (ret < 0) {
+		fprintf(stderr,
+			"%s: Failed to drop lock on RADOS object '%s' - (%s)\n",
+			progname, oid, strerror(-ret));
+		return ret;
+	}
+
+	return 0;
+}
+
+struct ctdb_mutex_rados_state {
+	bool holding_mutex;
+	const char *ceph_cluster_name;
+	const char *ceph_auth_name;
+	const char *pool_name;
+	const char *object;
+	int ppid;
+	struct tevent_context *ev;
+	struct tevent_signal *sig_ev;
+	struct tevent_timer *timer_ev;
+	rados_t ceph_cluster;
+	rados_ioctx_t ioctx;
+};
+
+static void ctdb_mutex_rados_sigterm_cb(struct tevent_context *ev,
+					struct tevent_signal *se,
+					int signum,
+					int count,
+					void *siginfo,
+					void *private_data)
+{
+	struct ctdb_mutex_rados_state *cmr_state = private_data;
+	int ret;
+
+	if (!cmr_state->holding_mutex) {
+		fprintf(stderr, "Sigterm callback invoked without mutex!\n");
+		ret = -EINVAL;
+		goto err_ctx_cleanup;
+	}
+
+	ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+	talloc_free(cmr_state);
+	exit(ret ? 1 : 0);
+}
+
+static void ctdb_mutex_rados_timer_cb(struct tevent_context *ev,
+				      struct tevent_timer *te,
+				      struct timeval current_time,
+				      void *private_data)
+{
+	struct ctdb_mutex_rados_state *cmr_state = private_data;
+	int ret;
+
+	if (!cmr_state->holding_mutex) {
+		fprintf(stderr, "Timer callback invoked without mutex!\n");
+		ret = -EINVAL;
+		goto err_ctx_cleanup;
+	}
+
+	if ((kill(cmr_state->ppid, 0) == 0) || (errno != ESRCH)) {
+		/* parent still around, keep waiting */
+		cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+					tevent_timeval_current_ofs(5, 0),
+					ctdb_mutex_rados_timer_cb,
+					cmr_state);
+		if (cmr_state->timer_ev == NULL) {
+			fprintf(stderr, "Failed to create timer event\n");
+			/* rely on signal cb */
+		}
+		return;
+	}
+
+	/* parent ended, drop lock and exit */
+	ret = ctdb_mutex_rados_unlock(cmr_state->ioctx, cmr_state->object);
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+	talloc_free(cmr_state);
+	exit(ret ? 1 : 0);
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+	struct ctdb_mutex_rados_state *cmr_state;
+
+	progname = argv[0];
+
+	if (argc != 5) {
+		fprintf(stderr, "Usage: %s <Ceph Cluster> <Ceph user> "
+				"<RADOS pool> <RADOS object>\n",
+			progname);
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	ret = setvbuf(stdout, NULL, _IONBF, 0);
+	if (ret != 0) {
+		fprintf(stderr, "Failed to configure unbuffered stdout I/O\n");
+	}
+
+	cmr_state = talloc_zero(NULL, struct ctdb_mutex_rados_state);
+	if (cmr_state == NULL) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	cmr_state->ceph_cluster_name = argv[1];
+	cmr_state->ceph_auth_name = argv[2];
+	cmr_state->pool_name = argv[3];
+	cmr_state->object = argv[4];
+
+	cmr_state->ppid = getppid();
+	if (cmr_state->ppid == 1) {
+		/*
+		 * The original parent is gone and the process has
+		 * been reparented to init. This can happen if the
+		 * helper is started just as the parent is killed
+		 * during shutdown. The error message doesn't need to
+		 * be stellar, since there won't be anything around to
+		 * capture and log it...
+		 */
+		fprintf(stderr, "%s: PPID == 1\n", progname);
+		ret = -EPIPE;
+		goto err_state_free;
+	}
+
+	cmr_state->ev = tevent_context_init(cmr_state);
+	if (cmr_state->ev == NULL) {
+		fprintf(stderr, "tevent_context_init failed\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	/* wait for sigterm */
+	cmr_state->sig_ev = tevent_add_signal(cmr_state->ev, cmr_state, SIGTERM, 0,
+					      ctdb_mutex_rados_sigterm_cb,
+					      cmr_state);
+	if (cmr_state->sig_ev == NULL) {
+		fprintf(stderr, "Failed to create signal event\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	/* periodically check parent */
+	cmr_state->timer_ev = tevent_add_timer(cmr_state->ev, cmr_state,
+					       tevent_timeval_current_ofs(5, 0),
+					       ctdb_mutex_rados_timer_cb,
+					       cmr_state);
+	if (cmr_state->timer_ev == NULL) {
+		fprintf(stderr, "Failed to create timer event\n");
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		ret = -ENOMEM;
+		goto err_state_free;
+	}
+
+	ret = ctdb_mutex_rados_ctx_create(cmr_state->ceph_cluster_name,
+					  cmr_state->ceph_auth_name,
+					  cmr_state->pool_name,
+					  &cmr_state->ceph_cluster,
+					  &cmr_state->ioctx);
+	if (ret < 0) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		goto err_state_free;
+	}
+
+	ret = ctdb_mutex_rados_lock(cmr_state->ioctx, cmr_state->object);
+	if ((ret == -EEXIST) || (ret == -EBUSY)) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_CONTENDED);
+		goto err_ctx_cleanup;
+	} else if (ret < 0) {
+		fprintf(stdout, CTDB_MUTEX_STATUS_ERROR);
+		goto err_ctx_cleanup;
+	}
+
+	cmr_state->holding_mutex = true;
+	fprintf(stdout, CTDB_MUTEX_STATUS_HOLDING);
+
+	/* wait for the signal / timer events to do their work */
+	ret = tevent_loop_wait(cmr_state->ev);
+	if (ret < 0) {
+		goto err_ctx_cleanup;
+	}
+err_ctx_cleanup:
+	ctdb_mutex_rados_ctx_destroy(cmr_state->ceph_cluster,
+				     cmr_state->ioctx);
+err_state_free:
+	talloc_free(cmr_state);
+err_out:
+	return ret ? 1 : 0;
+}
diff --git a/ctdb/wscript b/ctdb/wscript
index d7b1891..59bd8e2 100644
--- a/ctdb/wscript
+++ b/ctdb/wscript
@@ -79,6 +79,9 @@ def set_options(opt):
     opt.add_option('--enable-etcd-reclock',
                    help=("Enable etcd recovery lock helper (default=no)"),
                    action="store_true", dest='ctdb_etcd_reclock', default=False)
+    opt.add_option('--enable-ceph-reclock',
+                   help=("Enable Ceph CTDB recovery lock helper (default=no)"),
+                   action="store_true", dest='ctdb_ceph_reclock', default=False)
 
     opt.add_option('--with-logdir',
                    help=("Path to log directory"),
@@ -201,6 +204,15 @@ def configure(conf):
         Logs.info('Building with etcd support')
     conf.env.etcd_reclock = have_etcd_reclock
 
+    if Options.options.ctdb_ceph_reclock:
+        if (conf.CHECK_HEADERS('rados/librados.h', False, False, 'rados') and
+                                    conf.CHECK_LIB('rados', shlib=True)):
+            Logs.info('Building with Ceph librados recovery lock support')
+            conf.define('HAVE_LIBRADOS', 1)
+        else:
+            Logs.error("Missing librados for Ceph recovery lock support")
+            sys.exit(1)
+
     conf.env.CTDB_BINDIR = os.path.join(conf.env.EXEC_PREFIX, 'bin')
     conf.env.CTDB_ETCDIR = os.path.join(conf.env.SYSCONFDIR, 'ctdb')
     conf.env.CTDB_VARDIR = os.path.join(conf.env.LOCALSTATEDIR, 'lib/ctdb')
@@ -540,6 +552,13 @@ def build(bld):
         bld.INSTALL_FILES('${CTDB_PMDADIR}', 'utils/pmda/README',
                           destname='README')
 
+    if bld.env.HAVE_LIBRADOS:
+        bld.SAMBA_BINARY('ctdb_mutex_ceph_rados_helper',
+                         source='utils/ceph/ctdb_mutex_ceph_rados_helper.c',
+                         deps='talloc tevent rados',
+                         includes='include',
+                         install_path='${CTDB_HELPER_BINDIR}')
+
     sed_expr1 = 's|/usr/local/var/lib/ctdb|%s|g' % (bld.env.CTDB_VARDIR)
     sed_expr2 = 's|/usr/local/etc/ctdb|%s|g' % (bld.env.CTDB_ETCDIR)
     sed_expr3 = 's|/usr/local/var/log|%s|g' % (bld.env.CTDB_LOGDIR)
--
2.10.2

From 35912b7dca417639615ad5662b5a76ee3e25a6ec Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Thu, 1 Dec 2016 14:22:45 +0100
Subject: [PATCH 2/3] ctdb/doc: man page for Ceph RADOS cluster mutex helper

Signed-off-by: David Disseldorp <ddiss@samba.org>
---
 ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++++++++++++++++++++++++++
 ctdb/wscript                                | 12 +++-
 2 files changed, 100 insertions(+), 2 deletions(-)
 create mode 100644 ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml

diff --git a/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
new file mode 100644
index 0000000..e5dedc7
--- /dev/null
+++ b/ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml
@@ -0,0 +1,90 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+	"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<refentry id="ctdb_mutex_ceph_rados_helper.7">
+
+  <refmeta>
+    <refentrytitle>Ceph RADOS Mutex</refentrytitle>
+    <manvolnum>7</manvolnum>
+    <refmiscinfo class="source">ctdb</refmiscinfo>
+    <refmiscinfo class="manual">CTDB - clustered TDB database</refmiscinfo>
+  </refmeta>
+
+  <refnamediv>
+    <refname>ctdb_mutex_ceph_rados_helper</refname>
+    <refpurpose>Ceph RADOS cluster mutex helper</refpurpose>
+  </refnamediv>
+
+  <refsect1>
+    <title>DESCRIPTION</title>
+    <para>
+      ctdb_mutex_ceph_rados_helper_lock can be used as a recovery lock provider
+      for CTDB. When configured, split brain avoidance during CTDB recovery
+      will be handled using locks against an object located in a Ceph RADOS
+      pool.
+      To enable this functionality, include the following line in your CTDB
+      config file:
+    </para>
+    <screen format="linespecific">
+CTDB_RECOVERY_LOCK="!ctdb_mutex_ceph_rados_helper_lock [Cluster] [User] [Pool] [Object]"
+
+Cluster: Ceph cluster name (e.g. ceph)
+User: Ceph cluster user name (e.g. client.admin)
+Pool: Ceph RADOS pool name
+Object: Ceph RADOS object name
+    </screen>
+    <para>
+      The Ceph cluster <parameter>Cluster</parameter> must be up and running,
+      with a configuration, and keyring file for <parameter>User</parameter>
+      located in a librados default search path (e.g. /etc/ceph/).
+      <parameter>Pool</parameter> must already exist.
+    </para>
+  </refsect1>
+
+  <refsect1>
+    <title>SEE ALSO</title>
+    <para>
+      <citerefentry><refentrytitle>ctdb</refentrytitle>
+      <manvolnum>7</manvolnum></citerefentry>,
+
+      <citerefentry><refentrytitle>ctdbd</refentrytitle>
+      <manvolnum>1</manvolnum></citerefentry>,
+
+      <ulink url="http://ctdb.samba.org/"/>
+    </para>
+  </refsect1>
+
+  <refentryinfo>
+    <author>
+      <contrib>
+	This documentation was written by David Disseldorp
+      </contrib>
+    </author>
+
+    <copyright>
+      <year>2016</year>
+      <holder>David Disseldorp</holder>
+    </copyright>
+    <legalnotice>
+      <para>
+	This program is free software; you can redistribute it and/or
+	modify it under the terms of the GNU General Public License as
+	published by the Free Software Foundation; either version 3 of
+	the License, or (at your option) any later version.
+      </para>
+      <para>
+	This program is distributed in the hope that it will be
+	useful, but WITHOUT ANY WARRANTY; without even the implied
+	warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+	PURPOSE. See the GNU General Public License for more details.
+      </para>
+      <para>
+	You should have received a copy of the GNU General Public
+	License along with this program; if not, see
+	<ulink url="http://www.gnu.org/licenses"/>.
+      </para>
+    </legalnotice>
+  </refentryinfo>
+
+</refentry>
diff --git a/ctdb/wscript b/ctdb/wscript
index 59bd8e2..d0e8ec7 100644
--- a/ctdb/wscript
+++ b/ctdb/wscript
@@ -58,6 +58,10 @@ manpages_etcd = [
     'ctdb-etcd.7',
 ]
 
+manpages_ceph = [
+    'ctdb_mutex_ceph_rados_helper.7',
+]
+
 def set_options(opt):
     opt.PRIVATE_EXTENSION_DEFAULT('ctdb')
 
@@ -273,7 +277,9 @@ def configure(conf):
         conf.env.ctdb_prebuilt_manpages = []
         manpages = manpages_binary + manpages_misc
         if conf.env.etcd_reclock:
-            manpages = manpages + manpages_etcd
+            manpages += manpages_etcd
+        if conf.env.HAVE_LIBRADOS:
+            manpages += manpages_ceph
         for m in manpages:
             if os.path.exists(os.path.join("doc", m)):
                 Logs.info(" %s: yes" % (m))
@@ -572,7 +578,9 @@ def build(bld):
 
     manpages_extra = manpages_misc
     if bld.env.etcd_reclock:
-        manpages_extra = manpages_extra + manpages_etcd
+        manpages_extra += manpages_etcd
+    if bld.env.HAVE_LIBRADOS:
+        manpages_extra += manpages_ceph
     for f in manpages_binary + manpages_extra:
         x = '%s.xml' % (f)
         bld.SAMBA_GENERATOR(x,
--
2.10.2

From dbc411675b338ba755c4521a0d859e2c9d67bf87 Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Tue, 6 Dec 2016 13:03:27 +0100
Subject: [PATCH 3/3] ctdb: add test script for ctdb_mutex_ceph_rados_helper

This standalone test script performs the following:
- using ctdb_mutex_ceph_rados_helper, take a lock on the Ceph RADOS object at
  $CLUSTER/$POOL/$OBJECT using the Ceph keyring for $USER
  + confirm that lock is obtained, via ctdb_mutex_ceph_rados_helper "0" output
- check RADOS object lock state, using the "rados lock info" command
- attempt to obtain the lock again, using ctdb_mutex_ceph_rados_helper
  + confirm that the lock is not successfully taken
- tell the first locker to drop the lock and exit, via SIGTERM
- once the first locker has exited, attempt to get the lock again
  + confirm that this attempt succeeds

Signed-off-by: David Disseldorp <ddiss@samba.org>
---
 ctdb/utils/ceph/test_ceph_rados_reclock.sh | 151 +++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)
 create mode 100755 ctdb/utils/ceph/test_ceph_rados_reclock.sh

diff --git a/ctdb/utils/ceph/test_ceph_rados_reclock.sh b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
new file mode 100755
index 0000000..1adacf6
--- /dev/null
+++ b/ctdb/utils/ceph/test_ceph_rados_reclock.sh
@@ -0,0 +1,151 @@
+#!/bin/bash
+# standalone test for ctdb_mutex_ceph_rados_helper
+#
+# Copyright (C) David Disseldorp 2016
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+# XXX The following parameters may require configuration:
+CLUSTER="ceph"		# Name of the Ceph cluster under test
+USER="client.admin"	# Ceph user - a keyring must exist
+POOL="rbd"		# RADOS pool - must exist
+OBJECT="ctdb_reclock"	# RADOS object: target for lock requests
+
+# test procedure:
+# - using ctdb_mutex_ceph_rados_helper, take a lock on the Ceph RADOS object at
+#   CLUSTER/$POOL/$OBJECT using the Ceph keyring for $USER
+#   + confirm that lock is obtained, via ctdb_mutex_ceph_rados_helper "0" output
+# - check RADOS object lock state, using the "rados lock info" command
+# - attempt to obtain the lock again, using ctdb_mutex_ceph_rados_helper
+#   + confirm that the lock is not successfully taken ("1" output=contention)
+# - tell the first locker to drop the lock and exit, via SIGTERM
+# - once the first locker has exited, attempt to get the lock again
+#   + confirm that this attempt succeeds
+
+function _fail() {
+	echo "FAILED: $*"
+	exit 1
+}
+
+# this test requires the Ceph "rados" binary, and "jq" json parser
+which jq > /dev/null || exit 1
+which rados > /dev/null || exit 1
+which ctdb_mutex_ceph_rados_helper || exit 1
+
+TMP_DIR="$(mktemp --directory)" || exit 1
+rados -p "$POOL" rm "$OBJECT"
+
+(ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" \
+							> ${TMP_DIR}/first) &
+locker_pid=$!
+ +# TODO wait for ctdb_mutex_ceph_rados_helper to write one byte to stdout, +# indicating lock acquisition success/failure +sleep 1 + +first_out=$(cat ${TMP_DIR}/first) +[ "$first_out" == "0" ] \ + || _fail "expected lock acquisition (0), but got $first_out" + +rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \ + > ${TMP_DIR}/lock_state_first + +# echo "with lock: `cat ${TMP_DIR}/lock_state_first`" + +LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_first)" +[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \ + || _fail "unexpected lock name: $LOCK_NAME" +LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_first)" +[ "$LOCK_TYPE" == "exclusive" ] \ + || _fail "unexpected lock type: $LOCK_TYPE" + +LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_first)" +[ $LOCK_COUNT -eq 1 ] || _fail "expected 1 lock in rados state, got $LOCK_COUNT" +LOCKER_COOKIE="$(jq -r '.lockers[0].cookie' ${TMP_DIR}/lock_state_first)" +[ "$LOCKER_COOKIE" == "ctdb_reclock_mutex" ] \ + || _fail "unexpected locker cookie: $LOCKER_COOKIE" +LOCKER_DESC="$(jq -r '.lockers[0].description' ${TMP_DIR}/lock_state_first)" +[ "$LOCKER_DESC" == "CTDB recovery lock" ] \ + || _fail "unexpected locker description: $LOCKER_DESC" + +# second attempt while first is still holding the lock - expect failure +ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" \ + > ${TMP_DIR}/second +second_out=$(cat ${TMP_DIR}/second) +[ "$second_out" == "1" ] \ + || _fail "expected lock contention (1), but got $second_out" + +# confirm lock state didn't change +rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \ + > ${TMP_DIR}/lock_state_second + +diff ${TMP_DIR}/lock_state_first ${TMP_DIR}/lock_state_second \ + || _fail "unexpected lock state change" + +# tell first locker to drop the lock and terminate +kill $locker_pid || exit 1 + +wait $locker_pid &> /dev/null + +rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \ + > ${TMP_DIR}/lock_state_third +# echo "without lock: `cat 
${TMP_DIR}/lock_state_third`" + +LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_third)" +[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \ + || _fail "unexpected lock name: $LOCK_NAME" +LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_third)" +[ "$LOCK_TYPE" == "exclusive" ] \ + || _fail "unexpected lock type: $LOCK_TYPE" + +LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_third)" +[ $LOCK_COUNT -eq 0 ] \ + || _fail "didn\'t expect any locks in rados state, got $LOCK_COUNT" + +exec >${TMP_DIR}/third -- ctdb_mutex_ceph_rados_helper "$CLUSTER" "$USER" "$POOL" "$OBJECT" & +locker_pid=$! + +sleep 1 + +rados -p "$POOL" lock info "$OBJECT" ctdb_reclock_mutex \ + > ${TMP_DIR}/lock_state_fourth +# echo "with lock again: `cat ${TMP_DIR}/lock_state_fourth`" + +LOCK_NAME="$(jq -r '.name' ${TMP_DIR}/lock_state_fourth)" +[ "$LOCK_NAME" == "ctdb_reclock_mutex" ] \ + || _fail "unexpected lock name: $LOCK_NAME" +LOCK_TYPE="$(jq -r '.type' ${TMP_DIR}/lock_state_fourth)" +[ "$LOCK_TYPE" == "exclusive" ] \ + || _fail "unexpected lock type: $LOCK_TYPE" + +LOCK_COUNT="$(jq -r '.lockers | length' ${TMP_DIR}/lock_state_fourth)" +[ $LOCK_COUNT -eq 1 ] || _fail "expected 1 lock in rados state, got $LOCK_COUNT" +LOCKER_COOKIE="$(jq -r '.lockers[0].cookie' ${TMP_DIR}/lock_state_fourth)" +[ "$LOCKER_COOKIE" == "ctdb_reclock_mutex" ] \ + || _fail "unexpected locker cookie: $LOCKER_COOKIE" +LOCKER_DESC="$(jq -r '.lockers[0].description' ${TMP_DIR}/lock_state_fourth)" +[ "$LOCKER_DESC" == "CTDB recovery lock" ] \ + || _fail "unexpected locker description: $LOCKER_DESC" + +kill $locker_pid || exit 1 +wait $locker_pid &> /dev/null + +third_out=$(cat ${TMP_DIR}/third) +[ "$third_out" == "0" ] \ + || _fail "expected lock acquisition (0), but got $third_out" + +rm ${TMP_DIR}/* +rmdir $TMP_DIR + +echo "$0: all tests passed" -- 2.10.2 ^ permalink raw reply related [flat|nested] 5+ messages in thread
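The TODO in the test script (wait for the helper to write its one-byte status rather than relying on a fixed "sleep 1") could be handled with a small polling function. The following is only a sketch and is not part of the posted patch: the function name _wait_for_output, its default timeout and the 0.1 second polling interval are all hypothetical, and fractional sleep assumes GNU coreutils (the script already depends on GNU mktemp).

```shell
# Hypothetical helper, not part of the posted patch: poll until the given
# file is non-empty, or give up after roughly $2 seconds (default 10).
function _wait_for_output() {
	local file="$1"
	local tries=$(( ${2:-10} * 10 ))

	while [ "$tries" -gt 0 ]; do
		# -s is true once the file exists and has a size above zero
		[ -s "$file" ] && return 0
		sleep 0.1
		tries=$(( tries - 1 ))
	done
	return 1
}
```

In the test it might replace each "sleep 1", for example: _wait_for_output "${TMP_DIR}/first" 5 || _fail "helper wrote no status". Polling the output file keeps the helper invocation unchanged, at the cost of a short busy-wait.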
* Re: [PATCH] Ceph RADOS cluster mutex helper for Samba CTDB
2016-12-08 18:39 ` David Disseldorp
@ 2016-12-09 3:11 ` Amitay Isaacs
0 siblings, 0 replies; 5+ messages in thread
From: Amitay Isaacs @ 2016-12-09 3:11 UTC (permalink / raw)
To: David Disseldorp; +Cc: ceph-devel@vger.kernel.org, Samba Technical

On Fri, Dec 9, 2016 at 5:39 AM, David Disseldorp <ddiss@suse.de> wrote:

> Hi Amitay,
>
> On Wed, 7 Dec 2016 13:32:34 +1100, Amitay Isaacs wrote:
>
> > Hi David,
> >
> > On Tue, Dec 6, 2016 at 11:18 PM, David Disseldorp <ddiss@suse.de> wrote:
> >
> > > This time with the patch-set attached...
> > >
> > > > ctdb/doc/Makefile | 3 +-
> > > > ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 90 +++++
> > > > .../utils/ceph/ctdb_mutex_ceph_rados_helper.c | 334 ++++++++++++++++++
> > > > ctdb/utils/ceph/test_ceph_rados_reclock.sh | 151 ++++++++
> > > > ctdb/wscript | 19 +
> > > > 5 files changed, 596 insertions(+), 1 deletion(-)
> > > >
> > >
> > In patch 1, why do you need to include any of the CTDB files
> > (protocol/protocol.h and common/system.h) and have dependency on
> > ctdb-system? I don't see you are using any of the functions defined in
> > common/system.h.
> >
> > Please include the manpage in SAMBA_BINARY() definition. Also include it in
> > manpages[] list. It might be better to merge patch 1 and patch 2.
>
> Thanks for the feedback. Please find a new version attached (atop the
> etcd changes), attempting to address your points above:
> - drop unnecessary includes and ctdb-system dependency
>   + add separate talloc and tevent deps
>   + use tevent_timeval_current_ofs() instead of timeval_current_ofs()
> - conditionally generate the man page
>
> Cheers, David

Looks good. Pushed to autobuild with following minor fixups.

- In manpage, replace ctdb_mutex_ceph_rados_helper_lock with
  ctdb_mutex_ceph_rados_helper
- In wscript, add missing manpages_ceph in the dist target

Amitay.
Thread overview: 5+ messages
2016-12-01 14:17 [PATCH] Ceph RADOS cluster mutex helper for Samba CTDB David Disseldorp
[not found] ` <CAJ+X7mTkBLQDYb+r9LELQe-sqfG_4YkQ9HbkDFAp70cPp7V8zA@mail.gmail.com>
2016-12-06 12:14 ` David Disseldorp
2016-12-06 12:18 ` David Disseldorp
[not found] ` <CAJ+X7mRh04D+Yvtf0xx3dT6rTa=9KvJagyK=1PJQC=1R+u++7w@mail.gmail.com>
2016-12-08 18:39 ` David Disseldorp
2016-12-09 3:11 ` Amitay Isaacs