From: Ian Jackson <ian.jackson@eu.citrix.com>
To: xen-devel@lists.xenproject.org
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>,
Ian Campbell <ian.campbell@citrix.com>
Subject: [OSSTEST PATCH 7/7] ms-ownerdaemon: Cope with db restart. Retry recording dead tasks.
Date: Thu, 7 Jan 2016 19:38:16 +0000
Message-ID: <1452195496-16016-8-git-send-email-ian.jackson@eu.citrix.com>
In-Reply-To: <1452195496-16016-1-git-send-email-ian.jackson@eu.citrix.com>
In chan-destroy-stuff, instead of accessing the db directly, add the
dead task(s) to a queue, and arrange to look at that queue.
Errors are handled by setting an `after' handler before touching the
database, which we cancel if the update succeeds.
The after handler first requeues a queue run (so that a further retry
occurs if things are still broken) and then attempts to reconnect to
the database.
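For illustration only (this is not part of the patch), the
arm-then-cancel pattern described above can be sketched as a standalone
Tcl script. Everything in it is a hypothetical stand-in: fake-db-write
and db_up replace the jobdb calls, run-queue and run-queue-retry mirror
record-dead-tasks and record-dead-tasks-retry, and the 50ms delay
replaces OwnerDaemonDbRetry. The simulated database is down for the
first attempt and comes back when the retry handler "reconnects".

```tcl
# Hypothetical stand-ins; none of these names exist in ms-ownerdaemon.
set pending {}      ;# queued items waiting to be recorded
set recorded {}     ;# what the simulated database has accepted
set db_up 0         ;# simulated connection state: down at first
set retry_ms 50     ;# retry delay in milliseconds (assumed value)
set attempts 0      ;# number of queue runs so far

proc fake-db-write {items} {
    global db_up recorded
    if {!$db_up} { error "database unavailable" }
    foreach item $items { lappend recorded $item }
}

proc run-queue {} {
    global pending retry_ms attempts done
    if {![llength $pending]} return
    incr attempts
    # Arm the retry timer before touching the database.
    set watchdog [after $retry_ms run-queue-retry]
    # If this raises an error we unwind here and the timer stays armed.
    fake-db-write $pending
    # Success: disarm the timer and empty the queue.
    after cancel $watchdog
    set pending {}
    set done 1
}

proc run-queue-retry {} {
    global db_up
    # Requeue first, so that a failed reconnect still leads to another run.
    after idle run-queue
    set db_up 1     ;# stands in for closing and reopening the connection
}

# Report errors raised inside event handlers instead of the default dump.
proc bgerror {msg} { puts "attempt failed: $msg" }

lappend pending 101 102
after idle run-queue
vwait done
puts "recorded after $attempts attempts: $recorded"
```

The point of requeueing before reconnecting is that each queue run
arms a fresh timer, so the cycle repeats for as long as the database
stays unavailable.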
I have tested this with a test instance by renaming the `tasks' table
under its feet, and it functions as expected.
DEPLOYMENT NOTE: The owner daemon cannot be restarted without shutting
everything down. So this update should probably be deployed in
Cambridge first, to see how it goes. It is also less critical in
the main Xen production test lab because there the db and the owner
daemon are co-hosted on the same VM.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
Osstest/Executive.pm | 1 +
ms-ownerdaemon | 37 +++++++++++++++++++++++++++++++++----
2 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 2314577..d31fafb 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -113,6 +113,7 @@ augmentconfigdefaults(
augmentconfigdefaults(
OwnerDaemonHost => $c{ControlDaemonHost},
QueueDaemonHost => $c{ControlDaemonHost},
+ OwnerDaemonDbRetry => $c{QueueDaemonRetry},
);
#---------- configuration reader etc. ----------
diff --git a/ms-ownerdaemon b/ms-ownerdaemon
index 502dcfe..318549a 100755
--- a/ms-ownerdaemon
+++ b/ms-ownerdaemon
@@ -22,16 +22,37 @@
source ./tcl/daemonlib.tcl
+set dead_tasks {}
+
proc chan-destroy-stuff {chan} {
+ global dead_tasks
+
upvar #0 chanawait($chan) await
catch { unset await }
upvar #0 chantasks($chan) tasks
if {![info exists tasks]} return
+ puts-chan-desc $chan "-- $tasks"
+
+ foreach task $tasks {
+ lappend dead_tasks $task
+ }
+ after idle record-dead-tasks
+}
+
+proc record-dead-tasks {} {
+ global c dead_tasks
+
+ if {![llength $dead_tasks]} return
+
+ puts "record-dead-tasks ... $dead_tasks"
+
+ set retry [expr {$c(OwnerDaemonDbRetry) * 1000}]
+ set eafter [after $retry record-dead-tasks-retry]
+
jobdb::transaction resources {
- puts-chan-desc $chan "-- $tasks"
- foreach task $tasks {
+ foreach task $dead_tasks {
jobdb::db-execute "
UPDATE tasks
SET live = 'f'
@@ -39,12 +60,20 @@ proc chan-destroy-stuff {chan} {
"
}
}
- puts-chan-desc $chan "== $tasks"
- unset tasks
+ after cancel $eafter
+ puts "record-dead-tasks OK. $dead_tasks"
+ set dead_tasks {}
after idle await-endings-notify
}
+proc record-dead-tasks-retry {} {
+ after idle record-dead-tasks
+ puts "** reconnecting/retrying **"
+ catch { jobdb::db-close }
+ jobdb::db-open
+}
+
proc await-endings-notify {} {
global chanawait
foreach chan [array names chanawait] {
--
1.7.10.4