From: Ian Jackson <ian.jackson@eu.citrix.com>
To: xen-devel@lists.xenproject.org
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: [OSSTEST PATCH 17/33] ms-ownerdaemon: Cope with db restart. Retry recording dead tasks.
Date: Fri, 8 Jul 2016 19:26:09 +0100
Message-ID: <1468002385-4407-18-git-send-email-ian.jackson@eu.citrix.com>
In-Reply-To: <1468002385-4407-1-git-send-email-ian.jackson@eu.citrix.com>

In chan-destroy-stuff, instead of accessing the db directly, add the
dead task(s) to a queue, and arrange to look at that queue.

Errors are handled by setting an `after' handler which we cancel if we
are successful.

The after handler requeues a queue run attempt as the first thing
(which will arrange that a further retry will occur if things are
still broken) and then attempts to reconnect to the database.

I have tested this with a test instance by renaming the `tasks' table
under its feet, and it functions as expected.
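
For illustration only, the arm-then-cancel pattern described above can
be sketched in standalone Tcl roughly as follows. This is not the
patch's code: fake-db-update, db_ok, attempts, done and the 50 ms
interval are hypothetical stand-ins (the real daemon uses
jobdb::transaction and the OwnerDaemonDbRetry setting), and the bgerror
override is there only so the sketch logs the failure itself.

```tcl
# Sketch only: fake-db-update, db_ok, attempts, done and the 50 ms retry
# interval are hypothetical stand-ins, not osstest code.
set dead_tasks {}
set db_ok 0          ;# pretend the database is unavailable at first
set attempts 0
set done 0

proc fake-db-update {task} {
    global db_ok
    if {!$db_ok} { error "database unavailable" }
}

proc record-dead-tasks {} {
    global dead_tasks attempts done
    if {![llength $dead_tasks]} return
    incr attempts

    # Arm the error handler before touching the database.
    set eafter [after 50 record-dead-tasks-retry]

    foreach task $dead_tasks {
        fake-db-update $task   ;# an error here leaves the timer armed
    }

    # Only reached on success: disarm the timer and empty the queue.
    after cancel $eafter
    set dead_tasks {}
    set done 1
}

proc record-dead-tasks-retry {} {
    global db_ok
    # Requeue first, so that a failure below still leads to another attempt.
    after idle record-dead-tasks
    set db_ok 1                ;# stands in for closing and reopening the db
}

# Errors raised inside event handlers are reported here, not to a caller.
proc bgerror {msg} { puts "background error: $msg" }

lappend dead_tasks 101 102
after idle record-dead-tasks
vwait done
puts "recorded after $attempts attempts"
```

The point of arming the timer first is that the failure path needs no
explicit error trapping around the database work: if the update throws,
the handler simply never reaches `after cancel', and the timer fires.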

DEPLOYMENT NOTE: The owner daemon cannot be restarted without shutting
everything down.  So this update should first be deployed in
Cambridge, probably, to see how it goes.  Also, it is less critical in
the main Xen production test lab because there the db and the owner
daemon are co-hosted on the same VM.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Put back the `unset tasks' which was mistakenly removed.  The
    effect of its lack is to fail to clear out the task list for
    previous uses of the channel (which is named after the fd); this
    is mostly harmless apart from log spam but causes the usual
    case to be something like
      OK created-task 456354 ownd [10.80.227.94]:44852-876
    rather than
      OK created-task 456354 ownd [10.80.227.94]:44852-876
    which some of the clients (rightly) don't expect.
---
 Osstest/Executive.pm |  1 +
 ms-ownerdaemon       | 38 ++++++++++++++++++++++++++++++++++----
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 468031c..0602925 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -113,6 +113,7 @@ augmentconfigdefaults(
 augmentconfigdefaults(
     OwnerDaemonHost => $c{ControlDaemonHost},
     QueueDaemonHost => $c{ControlDaemonHost},
+    OwnerDaemonDbRetry => $c{QueueDaemonRetry},
 );
 #---------- configuration reader etc. ----------
diff --git a/ms-ownerdaemon b/ms-ownerdaemon
index 3623d19..62ca645 100755
--- a/ms-ownerdaemon
+++ b/ms-ownerdaemon
@@ -22,16 +22,38 @@
 source ./tcl/daemonlib.tcl
+set dead_tasks {}
+
 proc chan-destroy-stuff {chan} {
+    global dead_tasks
+
     upvar #0 chanawait($chan) await
     catch { unset await }
     upvar #0 chantasks($chan) tasks
     if {![info exists tasks]} return
+    puts-chan-desc $chan "-- $tasks"
+
+    foreach task $tasks {
+        lappend dead_tasks $task
+    }
+    unset tasks
+    after idle record-dead-tasks
+}
+
+proc record-dead-tasks {} {
+    global c dead_tasks
+
+    if {![llength $dead_tasks]} return
+
+    puts "record-dead-tasks ... $dead_tasks"
+
+    set retry [expr {$c(OwnerDaemonDbRetry) * 1000}]
+    set eafter [after $retry record-dead-tasks-retry]
+
     jobdb::transaction resources {
-        puts-chan-desc $chan "-- $tasks"
-        foreach task $tasks {
+        foreach task $dead_tasks {
             jobdb::db-execute "
                 UPDATE tasks
                    SET live = 'f'
@@ -39,12 +61,20 @@ proc chan-destroy-stuff {chan} {
             "
         }
     }
-    puts-chan-desc $chan "== $tasks"
-    unset tasks
+    after cancel $eafter
+    puts "record-dead-tasks OK. $dead_tasks"
+    set dead_tasks {}
     after idle await-endings-notify
 }
+proc record-dead-tasks-retry {} {
+    after idle record-dead-tasks
+    puts "** reconnecting/retrying **"
+    catch { jobdb::db-close }
+    jobdb::db-open
+}
+
 proc await-endings-notify {} {
     global chanawait
     foreach chan [array names chanawait] {
--
2.1.4