* [PATCH 0 of 1] Deal with broken frontend/backend ring I/O.
@ 2011-06-20 8:26 Daniel Stodden
2011-06-20 8:26 ` [PATCH 1 of 1] xen-backwatch: " Daniel Stodden
0 siblings, 1 reply; 5+ messages in thread
From: Daniel Stodden @ 2011-06-20 8:26 UTC (permalink / raw)
To: Xen
Hi.
After running this blkback patch (Don't let in-flight requests defer
pending ones...)
http://lists.xensource.com/archives/html/xen-devel/2011-05/msg01968.html
for a while I guess it's mostly been verified.
Unfortunately, it also revealed a great potential to demo old guest
bugs. The 2.6.32 tree used to have a problem with lost notifications
during IRQ handler migration, due to a glitch in the dynirq handler
logic.
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commitdiff;h=c5783925493e315f91330241546da7915dcc46e3
Blkfront got fixed in stable/v2.6.32.y, but looks as at least RHEL6
didn't patch it (yet), so I suspect CentOS and derivatives to suffer
too.
Xen-blkfront is particularly sensitive to this. Some people seem to
report around one or two incidents per week. Presumably more on
heavily loaded systems (to repro, manually spinning the affinity mask
under scattered I/O will trigger almost immediately). That's going to
increase.
So let's learn to live with that. Main issue is that even if you know
what to blame, there's nothing in place to deal with it.
I'd like to propose toolstack support which provides people with a
workaround. With minimal kernel support, a watchdog can mostly live in
userland, is easy to do and won't need to clutter backend drivers.
This can hardly be considered a fix for what's essentiallly guest
problem. But it gives hosts a chance to automate guest recovery until
there's an update.
Also, it's nice for debugging. Ring I/O and event races are a constant
source of paranaoia whenever guests appear to wedge, and I believe it
might help to drastically reduce time spent on remote triage in some
cases.
It can also identify excessively blocking I/O (as opposed to a stuck
message dispatch).
Some potential use cases
- Run occasionally (cron). Alerting on production systems where
guest OSes resides in a different admistrative domain with no
prospect for a quick fix. Might go into distros.
- More frequently, once the machine is known to host guests prone to
error. There shouldn't be much of a performance impact anyway. But
it might want to be tuned to not start spamming the console logs.
- Command line test. For people reporting I/O issues, wherever
suspecting front/backend problems (or to dismiss that). Or to aid
driver hacking. Might also go in xen-bugtool.
I chose to drop it into tools/misc. It's rather standalone. Takes a
sysfs patch to blkback. I didn't add netback support, but I guess that
would look very similar if it ever becomes desirable.
Cheers,
Daniel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O
2011-06-20 8:26 [PATCH 0 of 1] Deal with broken frontend/backend ring I/O Daniel Stodden
@ 2011-06-20 8:26 ` Daniel Stodden
2011-06-20 16:49 ` Ian Jackson
0 siblings, 1 reply; 5+ messages in thread
From: Daniel Stodden @ 2011-06-20 8:26 UTC (permalink / raw)
To: Xen
[-- Attachment #1: Type: text/plain, Size: 502 bytes --]
Adds tool support to debug backends which expose I/O ring state in
sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
nodes for block I/O, where implemented.
Primary function is to observe ring state make progress over a period
of time, then report stuck message queue halves where pending
consumer/event are not moving.
Adding --kick will re-issue event notifications to frontends, and/or
kick backends out of wait state.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
[-- Attachment #2: xen-backwatch.diff --]
[-- Type: text/x-patch, Size: 17943 bytes --]
# HG changeset patch
# User Daniel Stodden <daniel.stodden@citrix.com>
# Date 1308558304 25200
# Node ID f73f9c3d6eaeac7a77c9a9fca0db2ccc45496521
# Parent eca057e4475ca455ec36f962b9179fd2c9674196
xen-backwatch: Deal with broken frontend/backend ring I/O.
Adds tool support to debug backends which expose I/O ring state in
sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
nodes for block I/O, where implemented.
Primary function is to observe ring state make progress over a period
of time, then report stuck message queue halves where pending
consumer/event are not moving.
Adding --kick will re-issue event notifications to frontends, and/or
kick backends out of wait state.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -21,7 +21,7 @@
INSTALL_BIN-$(CONFIG_X86) += xen-detect
INSTALL_BIN := $(INSTALL_BIN-y)
-INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd
+INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xen-ringwatch
INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash
INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
INSTALL_SBIN := $(INSTALL_SBIN-y)
diff --git a/tools/misc/xen-ringwatch b/tools/misc/xen-ringwatch
new file mode 100755
--- /dev/null
+++ b/tools/misc/xen-ringwatch
@@ -0,0 +1,619 @@
+#!/usr/bin/python
+#
+# Copyright (C) 2011 Citrix Systems, Inc.
+#
+# This library is free software; you can redistribute it and/or modify
+# it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
+# USA
+#
+
+"""Overview:
+
+ - Gather Xen I/O ring states
+ (from %s/*/ring)
+
+ - Update ring states every -T seconds.
+
+ - Determine if rings are idle or make progress.
+
+ - Determine if idle rings dropped notifications (%s).
+
+ - Instruct stuck backends to reissue notifications.
+"""
+
+import os
+import glob
+
+class Pattern(object):
+ """A regex pattern. Compiled on demand, then persisted."""
+
+ def __init__(self, regex):
+ self.regex = regex
+ self.__pattern = None
+
+ def get(self):
+ import re
+
+ if not self.__pattern:
+ self.__pattern = re.compile(self.regex)
+
+ return self.__pattern
+
+ def search(self, s):
+ return self.get().search(s)
+
+class XenBackend(object):
+ """A Xen I/O backend."""
+
+ SYSFS_BASEDIR = "/sys/devices/xen-backend"
+
+ def __init__(self, rd, devid):
+ self.rd = int(rd)
+ self.devid = int(devid)
+
+ def __repr__(self):
+ return "%s(%d, %d)" % (type(self).__name__,
+ self.rd, self.devid)
+
+ def name(self):
+ raise NotImplementedError
+
+ def path(self):
+ return "%s/%s" % (self.SYSFS_BASEDIR, self.name())
+
+ _name_pattern = None
+
+ @classmethod
+ def from_name(cls, name):
+ match = cls._name_pattern.search(name)
+ if not match:
+ raise Exception, "Malformed %s name: %s" % \
+ (type(self).__name__, name)
+
+ rd = match.group(1)
+ devid = match.group(2)
+
+ return cls(rd, devid)
+
+ _name_glob = None
+
+ @classmethod
+ def find(cls):
+ paths = glob.glob("%s/%s" % (cls.SYSFS_BASEDIR,
+ cls._name_glob))
+ for path in paths:
+ name = os.path.basename(path)
+ yield cls.from_name(name)
+
+ def find_rings(self):
+ for ring in self.Ring.find(self):
+ yield ring
+
+ class Ring(object):
+
+ def __init__(self, backend, name):
+ self.backend = backend
+ self.name = name
+
+ __size = None
+
+ def key(self):
+ return "%s/%s" % (self.backend.name(),
+ self.name)
+
+ def __str__(self):
+ return "%s(%s)" % (type(self).__name__, self.key())
+
+ @classmethod
+ def from_name(cls, backend, name):
+ return cls(backend, name)
+
+ _name_glob = None
+
+ @classmethod
+ def find(cls, backend):
+ paths = glob.glob("%s/%s" % (backend.path(),
+ cls._name_glob))
+ for path in paths:
+ name = os.path.basename(path)
+ yield cls.from_name(backend, name)
+
+ def path(self):
+ return "%s/%s" % (self.backend.path(),
+ self.name)
+
+ def read(self):
+ state = RingState.from_sysfs(self.path())
+ return state
+
+ def write(self, cmd):
+ f = file(self.path(), 'w')
+ try:
+ f.write(cmd.rstrip())
+ finally:
+ f.close()
+
+ def kick(self):
+ self.write("kick")
+
+ def poll(self):
+ self.write("poll")
+
+ __ring = None
+
+ TYPES = {}
+ XEN_BACKEND_NAME = None
+
+ @classmethod
+ def register(cls):
+ XenBackend.TYPES[cls.XEN_BACKEND_NAME] = cls
+
+class VBD(XenBackend):
+ """Xen blkif backends."""
+
+ XEN_BACKEND_NAME = 'vbd'
+
+ _name_pattern = Pattern("vbd-(\d+)-(\d+)")
+ _name_glob = "vbd-*-*"
+
+ def name(self):
+ return "vbd-%d-%d" % (self.rd, self.devid)
+
+ class Ring(XenBackend.Ring):
+ _name_glob = "io_ring"
+
+VBD.register()
+
+class VIF(XenBackend):
+ """Xen netif backends."""
+
+ XEN_BACKEND_NAME = 'vif'
+
+ _name_pattern = Pattern("vif-(\d+)-(\d+)")
+ _name_glob = "vif-*-*"
+
+ def name(self):
+ return "vif-%d-%d" % (self.rd, self.devid)
+
+ class Ring(XenBackend.Ring):
+ _name_glob = "{rx,tx}_ring"
+
+#VIF.register()
+
+class RingState(object):
+ """Overall backend ring state. Comprising req and rsp queue
+ indexes, and analysis."""
+
+ def __init__(self, size, req, rsp):
+ self.size = int(size)
+ self.req = req
+ self.rsp = rsp
+
+ _size_pattern = Pattern("nr_ents (\d+)")
+
+ @classmethod
+ def from_sysfs(cls, path):
+
+ f = file(path, "r")
+ try:
+ s = f.read()
+ finally:
+ f.close()
+
+ try:
+ (_nr_ents, _req, _rsp, _) = s.split("\n")
+
+ match = cls._size_pattern.search(_nr_ents)
+ nr_ents = int(match.group(1))
+
+ except Exception, e:
+ raise Exception, "Malformed %s input: %s (%s)" % \
+ (cls.__name__, repr(s), str(e))
+
+ req = cls.Req.from_sysfs(_req, size=nr_ents)
+ rsp = cls.Rsp.from_sysfs(_rsp, size=nr_ents)
+
+ return cls(nr_ents, req, rsp)
+
+ class Queue(dict):
+
+ def __init__(self, size):
+ self.size = int(size)
+
+ prod = None
+
+ @classmethod
+ def from_sysfs(cls, line, **d):
+
+ match = cls._pattern.search(line)
+ if not match:
+ raise Exception, "Malformed %s input: %s" % \
+ (cls.__name__, repr(s))
+
+ i = iter(match.groups())
+ for k in i:
+ d[k] = i.next()
+
+ return cls(**d)
+
+ def is_consumed(self):
+ return self.prod == self._cons()
+
+ class Req(Queue):
+
+ _pattern = Pattern("req (prod) (\d+) (cons) (\d+) (event) (\d+)")
+
+ def __init__(self, prod, cons, event, **d):
+ RingState.Queue.__init__(self, **d)
+ self.prod = int(prod)
+ self.cons = int(cons)
+ self.event = int(event)
+
+ def __repr__(self):
+ return "%s(prod=%d, cons=%d, event=%d)" % \
+ (type(self).__name__, self.prod, self.cons, self.event)
+
+ def _cons(self):
+ return self.cons
+
+ def __eq__(self, other):
+ return \
+ self.prod == other.prod and \
+ self.cons == other.cons and \
+ self.event == other.event
+
+ class Rsp(Queue):
+
+ _pattern = Pattern("rsp (prod) (\d+) (pvt) (\d+) (event) (\d+)")
+
+ def __init__(self, prod, pvt, event, **d):
+ RingState.Queue.__init__(self, **d)
+ self.prod = int(prod)
+ self.pvt = int(pvt)
+ self.event = int(event)
+
+ def __repr__(self):
+ return "%s(prod=%d, pvt=%d, event=%d)" % \
+ (type(self).__name__, self.prod, self.pvt, self.event)
+
+ def _cons(self):
+ return self.event - 1
+
+ def __eq__(self, other):
+ return \
+ self.prod == other.prod and \
+ self.pvt == other.pvt and \
+ self.event == other.event
+
+ def is_consumed(self):
+ return \
+ self.rsp.is_consumed() and \
+ self.req.is_consumed()
+
+ def is_pending(self):
+ return self.rsp.prod != self.req.prod
+
+ def kick(self, ring):
+ action = False
+
+ if not self.req.is_consumed():
+ action |= True
+ ring.poll()
+
+ if not self.rsp.is_consumed():
+ action |= True
+ ring.kick()
+
+ return action
+
+ def __eq__(self, other):
+ return \
+ self.size == other.size and \
+ self.req == other.req and \
+ self.rsp == other.rsp
+
+ def __repr__(self):
+ return "%s(size=%d, %s, %s)" % \
+ (type(self).__name__, self.size, self.req, self.rsp)
+
+ def display(self):
+ complete = { True: "complete", False: "pending" }
+
+ io = complete[not self.is_pending()]
+ req = complete[self.req.is_consumed()]
+ rsp = complete[self.rsp.is_consumed()]
+
+ return "%s: io: %s, req: %s, rsp: %s" % (self, io, req, rsp)
+
+class RingWatch(object):
+ """State machine watching I/O individual ring state"""
+
+ _NEW = "_NEW"
+ BUSY = "BUSY"
+ IDLE = "IDLE"
+ STCK = "STCK"
+
+ COMMENTS = { BUSY: "Message traffic observed (OK)",
+ IDLE: "No messages observed (Ring OK, I/O depends)",
+ STCK: "No pending req/rsp consumer progress observed (BUG)" }
+
+ def __init__(self, ring, state):
+ self.ring = ring
+ self.state = state
+ self.status = RingWatch._NEW
+
+ @classmethod
+ def new(cls, ring):
+ state = ring.read()
+ return cls(ring, state)
+
+ def __str__(self):
+ return "%s(%s)[%s]" % \
+ (type(self).__name__, self.ring.key(), self.status)
+
+ def is_stuck(self):
+ return self.status == self.STCK
+
+ def is_idle(self):
+ return self.status == self.IDLE
+
+ def kick(self):
+ if self.is_stuck():
+ return self.state.kick(self.ring)
+
+ def update(self):
+
+ prev = self.state
+ curr = self.ring.read()
+
+ if curr == prev:
+ if not curr.is_consumed():
+ self.status = self.STCK
+ else:
+ self.status = self.IDLE
+ else:
+ self.status = self.BUSY
+
+ self.state = curr
+
+ def display(self):
+ return "%s: %s" % (self,
+ self.state.display())
+
+class WatchList(object):
+ """Managed collection of I/O rings under surveillance."""
+
+ def __init__(self, gen):
+ self.gen = gen
+ self.list = {}
+
+ def update(self):
+
+ # NB. clear the watch list, then rebuild it. new entries get
+ # added, existing ones updates, those gone discarded.
+ prev = self.list
+ self.list = {}
+
+ for ring in self.gen():
+
+ key = ring.key()
+ entry = prev.get(key)
+
+ try:
+ if not entry:
+ entry = RingWatch.new(ring)
+ else:
+ entry.update()
+
+ except IOError, e:
+ pass
+ # NB. racing unplug, any ring.read() may raise.
+ # nothing left to memorize then.
+ else:
+ self.list[key] = entry
+
+ def __iter__(self):
+ return self.list.itervalues()
+
+ def pending(self):
+ for entry in self:
+ if entry.is_idle() and entry.state.is_pending():
+ yield entry
+
+ def stuck(self):
+ for entry in self:
+ if entry.is_stuck():
+ yield entry
+
+ def kick(self):
+ for entry in self.stuck():
+ try:
+ entry.kick()
+ except IOError:
+ # NB. racing unplug, any ring.write() may raise.
+ pass
+
+if __name__ == '__main__':
+ from sys import argv, stdout, stderr, exit
+ from getopt import gnu_getopt, GetoptError
+ from pprint import pprint
+
+ DEFAULT_PERIOD = 1 # secs
+
+ verbose = 0
+ period = DEFAULT_PERIOD
+ backends = XenBackend.TYPES.values()
+ kick = False
+ iowatch = False
+
+ OPTIONS = ((('h', 'help'),
+ "Print this help screen."),
+
+ (('v', 'verbose'),
+ "Increase output verbosity level (use n-times)."),
+
+ (('I', 'io'),
+ "Watch out for stuck I/O (not messaging), too. (%s)" % \
+ (iowatch)),
+
+ (('t', 'types'),
+ "Comma separated list of backend types to watch. (%s)" % \
+ ",".join(map(lambda t: t.XEN_BACKEND_NAME, backends))),
+
+ (('T', 'period'),
+ "Watch update period. (%d) [secs]" % \
+ (period)),
+
+ (('k', 'kick'),
+ "Kick broken guests out of cardiac arrest. (%s)" % \
+ (kick))
+ )
+
+ COMMANDS = {"check":
+ "Single iteration quick test (takes -T seconds)."}
+
+ def usage(stream):
+ prog = os.path.basename(argv[0])
+
+ print >>stream
+
+ print >>stream, "Usage:"
+ print >>stream, "\t%s [options] {%s}" % (prog, "|".join(COMMANDS))
+
+ print >>stream
+
+ print >>stream, "Commands:"
+ for (name, desc) in COMMANDS.iteritems():
+ print >>stream, "\t%s: \t%s" % (name, desc)
+
+ print >>stream
+
+ print >>stream, "Options:"
+ for ((short, _long), desc) in OPTIONS:
+ print >>stream, "\t-%s, --%s: \t%s" % (short, _long, desc)
+
+ print >>stream
+
+ def fail(msg = None):
+ if msg: print >>stderr, "Error: %s" % msg
+ usage(stderr)
+ exit(1)
+
+ def help():
+
+ usage(stdout)
+
+ print __doc__ % (XenBackend.SYSFS_BASEDIR, RingWatch.STCK)
+
+ print "Backend Types:"
+ for k, v in XenBackend.TYPES.iteritems():
+ print "\t%s: \t%s (%s)" % (k, v.__doc__, v._name_glob)
+
+ print
+ print "Ring States:"
+ for k, v in RingWatch.COMMENTS.iteritems():
+ print "\t%s: \t%s" % (k, v)
+
+ print
+
+ try:
+ opts, args = gnu_getopt(argv[1:],
+ "hIkt:vT:",
+ ["help",
+ "io",
+ "kick",
+ "type=",
+ "verbose",
+ "period="])
+ except GetoptError, e:
+ fail(str(e))
+
+ for (o, arg) in opts:
+ try:
+ if o in ('-h', '--help'):
+ help()
+ exit(0)
+
+ elif o in ['-v', '--verbose']:
+ verbose += 1
+
+ elif o in ['-I', '--io']:
+ iowatch = True
+
+ elif o in ('-T', '--period'):
+ period = int(arg)
+
+ elif o in ('-t', '--type'):
+ backends = ",".split(arg)
+ backends = map(lambda t: XenBackend.TYPES[t], backends)
+
+ elif o in ('-k', '--kick'):
+ kick = True
+
+ else:
+ raise "BUG: option %s unhandled." % o
+
+ except ValueError:
+ fail("%s: invalid argument '%s'." % (o, arg))
+
+ try:
+ cmd = args[0]
+ except IndexError:
+ fail("Missing command.")
+
+ def ring_select():
+ for _type in backends:
+ for backend in _type.find():
+ for ring in backend.find_rings():
+ yield ring
+
+ def show(entries):
+ for watch in entries:
+ print watch.display()
+
+ def pause():
+ import time
+ time.sleep(period)
+
+ watches = WatchList(ring_select)
+
+ if cmd == "check":
+
+ # init
+ watches.update()
+
+ if verbose >= 3:
+ show(watches)
+
+ # watch for one round
+ pause()
+ watches.update()
+
+ # show result
+ crit = list(watches.stuck())
+ stuck = bool(crit)
+
+ if (iowatch):
+ crit.extend(watches.pending())
+
+ if verbose >= 2:
+ show(watches)
+ elif crit:
+ show(crit)
+
+ if stuck and kick:
+ # deal with it
+ watches.kick()
+
+ else:
+ fail("Invalid command.")
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O
2011-06-20 8:26 ` [PATCH 1 of 1] xen-backwatch: " Daniel Stodden
@ 2011-06-20 16:49 ` Ian Jackson
2011-06-20 20:47 ` Daniel Stodden
0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2011-06-20 16:49 UTC (permalink / raw)
To: Daniel Stodden; +Cc: Xen
Daniel Stodden writes ("[Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O"):
> Adds tool support to debug backends which expose I/O ring state in
> sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
> nodes for block I/O, where implemented.
Thanks.
> Primary function is to observe ring state make progress over a period
> of time, then report stuck message queue halves where pending
> consumer/event are not moving.
This seems to have only one entry in COMMANDS, "check". Is that
right ? And it doesn't seem to provide a way to specify a particular
domain to look for ?
I'm happy to take it as-is as it seems like a better-than-nothing tool
but I just wanted to check I'd understood it, first.
Ian.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O
2011-06-20 16:49 ` Ian Jackson
@ 2011-06-20 20:47 ` Daniel Stodden
2011-06-21 14:39 ` Ian Jackson
0 siblings, 1 reply; 5+ messages in thread
From: Daniel Stodden @ 2011-06-20 20:47 UTC (permalink / raw)
To: Ian Jackson; +Cc: Xen
[-- Attachment #1: Type: text/plain, Size: 2670 bytes --]
On Mon, 2011-06-20 at 12:49 -0400, Ian Jackson wrote:
> Daniel Stodden writes ("[Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O"):
> > Adds tool support to debug backends which expose I/O ring state in
> > sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
> > nodes for block I/O, where implemented.
>
> Thanks.
>
> > Primary function is to observe ring state make progress over a period
> > of time, then report stuck message queue halves where pending
> > consumer/event are not moving.
>
> This seems to have only one entry in COMMANDS, "check". Is that
> right ?
The <command> thing should allow alternative ways to run it without
breaking existing deployments. I used to think about a 'daemon', but
then found that cron would likely do the job.
> And it doesn't seem to provide a way to specify a particular
> domain to look for ?
I briefly considered it initially, but after testing it just didn't look
so important anymore. :}
Presently, a
# xen-ringwatch check -v
RingWatch(vbd-1-51760/io_ring)[IDLE]: RingState(size=32, Req(prod=31, cons=31, event=32), Rsp(prod=31, pvt=31, event=32)): io: complete, req: complete, rsp: complete
RingWatch(vbd-1-51712/io_ring)[BUSY]: RingState(size=32, Req(prod=143236466, cons=143236466, event=143236467), Rsp(prod=143236459, pvt=143236459, event=143236460)): io: pending, req: complete, rsp: complete
will to dump the entire set of running backends, independent of state.
I should point out there's not really a significant overhead involved,
except some required wait period to come to a conclusion. It's all
glob/read/write/wait and all VBDs are watched in parallel. But even with
50 VMs, at some point I anticipated people to rather grep instead.
Here's a sample crontab invocation:
xen-ringwatch check -T 4 --kick | logger -p daemon.crit -t RINGWATCH-ALERT
Which will remain silent, until it actually discovers some watched
subset to .kick() and then outputs those, exclusively.
Jun 20 13:26:59 localhost RINGWATCH-ALERT: RingWatch(vbd-1-51712/io_ring)[STCK]: RingState(size=32, Req(prod=146141561, cons=146141561, event=146141562), Rsp(prod=146141561, pvt=146141561, event=146141530)): io: complete, req: complete, rsp: pending
> I'm happy to take it as-is as it seems like a better-than-nothing tool
> but I just wanted to check I'd understood it, first.
Found that the patch I sent was missing cleanup in some spots (mainly a
program rename, and the verbose variable in __main__ ended up off by
one). Can I sneak in the update attached before you push it?
Also, I never tried the make install target. Does it look okay to you?
Cheers,
Daniel
[-- Attachment #2: xen-ringwatch.diff --]
[-- Type: text/x-patch, Size: 17943 bytes --]
# HG changeset patch
# User Daniel Stodden <daniel.stodden@citrix.com>
# Date 1308558304 25200
# Node ID f73f9c3d6eaeac7a77c9a9fca0db2ccc45496521
# Parent eca057e4475ca455ec36f962b9179fd2c9674196
xen-ringwatch: Deal with broken frontend/backend ring I/O.
Adds tool support to debug backends which expose I/O ring state in
sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
nodes for block I/O, where implemented.
Primary function is to observe ring state make progress over a period
of time, then report stuck message queue halves where pending
consumer/event are not moving.
Adding --kick will re-issue event notifications to frontends, and/or
kick backends out of wait state.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -21,7 +21,7 @@
INSTALL_BIN-$(CONFIG_X86) += xen-detect
INSTALL_BIN := $(INSTALL_BIN-y)
-INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd
+INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xen-ringwatch
INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash
INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
INSTALL_SBIN := $(INSTALL_SBIN-y)
diff --git a/tools/misc/xen-ringwatch b/tools/misc/xen-ringwatch
new file mode 100755
--- /dev/null
+++ b/tools/misc/xen-ringwatch
@@ -0,0 +1,619 @@
+#!/usr/bin/python
+#
+# Copyright (C) 2011 Citrix Systems, Inc.
+#
+# This library is free software; you can redistribute it and/or modify
+# it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
+# USA
+#
+
+"""Overview:
+
+ - Gather Xen I/O ring states
+ (from %s/*/ring)
+
+ - Update ring states every -T seconds.
+
+ - Determine if rings are idle or make progress.
+
+ - Determine if idle rings dropped notifications (%s).
+
+ - Instruct stuck backends to reissue notifications.
+"""
+
+import os
+import glob
+
+class Pattern(object):
+ """A regex pattern. Compiled on demand, then persisted."""
+
+ def __init__(self, regex):
+ self.regex = regex
+ self.__pattern = None
+
+ def get(self):
+ import re
+
+ if not self.__pattern:
+ self.__pattern = re.compile(self.regex)
+
+ return self.__pattern
+
+ def search(self, s):
+ return self.get().search(s)
+
+class XenBackend(object):
+ """A Xen I/O backend."""
+
+ SYSFS_BASEDIR = "/sys/devices/xen-backend"
+
+ def __init__(self, rd, devid):
+ self.rd = int(rd)
+ self.devid = int(devid)
+
+ def __repr__(self):
+ return "%s(%d, %d)" % (type(self).__name__,
+ self.rd, self.devid)
+
+ def name(self):
+ raise NotImplementedError
+
+ def path(self):
+ return "%s/%s" % (self.SYSFS_BASEDIR, self.name())
+
+ _name_pattern = None
+
+ @classmethod
+ def from_name(cls, name):
+ match = cls._name_pattern.search(name)
+ if not match:
+ raise Exception, "Malformed %s name: %s" % \
+ (type(self).__name__, name)
+
+ rd = match.group(1)
+ devid = match.group(2)
+
+ return cls(rd, devid)
+
+ _name_glob = None
+
+ @classmethod
+ def find(cls):
+ paths = glob.glob("%s/%s" % (cls.SYSFS_BASEDIR,
+ cls._name_glob))
+ for path in paths:
+ name = os.path.basename(path)
+ yield cls.from_name(name)
+
+ def find_rings(self):
+ for ring in self.Ring.find(self):
+ yield ring
+
+ class Ring(object):
+
+ def __init__(self, backend, name):
+ self.backend = backend
+ self.name = name
+
+ __size = None
+
+ def key(self):
+ return "%s/%s" % (self.backend.name(),
+ self.name)
+
+ def __str__(self):
+ return "%s(%s)" % (type(self).__name__, self.key())
+
+ @classmethod
+ def from_name(cls, backend, name):
+ return cls(backend, name)
+
+ _name_glob = None
+
+ @classmethod
+ def find(cls, backend):
+ paths = glob.glob("%s/%s" % (backend.path(),
+ cls._name_glob))
+ for path in paths:
+ name = os.path.basename(path)
+ yield cls.from_name(backend, name)
+
+ def path(self):
+ return "%s/%s" % (self.backend.path(),
+ self.name)
+
+ def read(self):
+ state = RingState.from_sysfs(self.path())
+ return state
+
+ def write(self, cmd):
+ f = file(self.path(), 'w')
+ try:
+ f.write(cmd.rstrip())
+ finally:
+ f.close()
+
+ def kick(self):
+ self.write("kick")
+
+ def poll(self):
+ self.write("poll")
+
+ __ring = None
+
+ TYPES = {}
+ XEN_BACKEND_NAME = None
+
+ @classmethod
+ def register(cls):
+ XenBackend.TYPES[cls.XEN_BACKEND_NAME] = cls
+
+class VBD(XenBackend):
+ """Xen blkif backends."""
+
+ XEN_BACKEND_NAME = 'vbd'
+
+ _name_pattern = Pattern("vbd-(\d+)-(\d+)")
+ _name_glob = "vbd-*-*"
+
+ def name(self):
+ return "vbd-%d-%d" % (self.rd, self.devid)
+
+ class Ring(XenBackend.Ring):
+ _name_glob = "io_ring"
+
+VBD.register()
+
+class VIF(XenBackend):
+ """Xen netif backends."""
+
+ XEN_BACKEND_NAME = 'vif'
+
+ _name_pattern = Pattern("vif-(\d+)-(\d+)")
+ _name_glob = "vif-*-*"
+
+ def name(self):
+ return "vif-%d-%d" % (self.rd, self.devid)
+
+ class Ring(XenBackend.Ring):
+ _name_glob = "{rx,tx}_ring"
+
+#VIF.register()
+
+class RingState(object):
+ """Overall backend ring state. Comprising req and rsp queue
+ indexes, and analysis."""
+
+ def __init__(self, size, req, rsp):
+ self.size = int(size)
+ self.req = req
+ self.rsp = rsp
+
+ _size_pattern = Pattern("nr_ents (\d+)")
+
+ @classmethod
+ def from_sysfs(cls, path):
+
+ f = file(path, "r")
+ try:
+ s = f.read()
+ finally:
+ f.close()
+
+ try:
+ (_nr_ents, _req, _rsp, _) = s.split("\n")
+
+ match = cls._size_pattern.search(_nr_ents)
+ nr_ents = int(match.group(1))
+
+ except Exception, e:
+ raise Exception, "Malformed %s input: %s (%s)" % \
+ (cls.__name__, repr(s), str(e))
+
+ req = cls.Req.from_sysfs(_req, size=nr_ents)
+ rsp = cls.Rsp.from_sysfs(_rsp, size=nr_ents)
+
+ return cls(nr_ents, req, rsp)
+
+ class Queue(dict):
+
+ def __init__(self, size):
+ self.size = int(size)
+
+ prod = None
+
+ @classmethod
+ def from_sysfs(cls, line, **d):
+
+ match = cls._pattern.search(line)
+ if not match:
+ raise Exception, "Malformed %s input: %s" % \
+ (cls.__name__, repr(s))
+
+ i = iter(match.groups())
+ for k in i:
+ d[k] = i.next()
+
+ return cls(**d)
+
+ def is_consumed(self):
+ return self.prod == self._cons()
+
+ class Req(Queue):
+
+ _pattern = Pattern("req (prod) (\d+) (cons) (\d+) (event) (\d+)")
+
+ def __init__(self, prod, cons, event, **d):
+ RingState.Queue.__init__(self, **d)
+ self.prod = int(prod)
+ self.cons = int(cons)
+ self.event = int(event)
+
+ def __repr__(self):
+ return "%s(prod=%d, cons=%d, event=%d)" % \
+ (type(self).__name__, self.prod, self.cons, self.event)
+
+ def _cons(self):
+ return self.cons
+
+ def __eq__(self, other):
+ return \
+ self.prod == other.prod and \
+ self.cons == other.cons and \
+ self.event == other.event
+
+ class Rsp(Queue):
+
+ _pattern = Pattern("rsp (prod) (\d+) (pvt) (\d+) (event) (\d+)")
+
+ def __init__(self, prod, pvt, event, **d):
+ RingState.Queue.__init__(self, **d)
+ self.prod = int(prod)
+ self.pvt = int(pvt)
+ self.event = int(event)
+
+ def __repr__(self):
+ return "%s(prod=%d, pvt=%d, event=%d)" % \
+ (type(self).__name__, self.prod, self.pvt, self.event)
+
+ def _cons(self):
+ return self.event - 1
+
+ def __eq__(self, other):
+ return \
+ self.prod == other.prod and \
+ self.pvt == other.pvt and \
+ self.event == other.event
+
+ def is_consumed(self):
+ return \
+ self.rsp.is_consumed() and \
+ self.req.is_consumed()
+
+ def is_pending(self):
+ return self.rsp.prod != self.req.prod
+
+ def kick(self, ring):
+ action = False
+
+ if not self.req.is_consumed():
+ action |= True
+ ring.poll()
+
+ if not self.rsp.is_consumed():
+ action |= True
+ ring.kick()
+
+ return action
+
+ def __eq__(self, other):
+ return \
+ self.size == other.size and \
+ self.req == other.req and \
+ self.rsp == other.rsp
+
+ def __repr__(self):
+ return "%s(size=%d, %s, %s)" % \
+ (type(self).__name__, self.size, self.req, self.rsp)
+
+ def display(self):
+ complete = { True: "complete", False: "pending" }
+
+ io = complete[not self.is_pending()]
+ req = complete[self.req.is_consumed()]
+ rsp = complete[self.rsp.is_consumed()]
+
+ return "%s: io: %s, req: %s, rsp: %s" % (self, io, req, rsp)
+
+class RingWatch(object):
+ """State machine watching I/O individual ring state"""
+
+ _NEW = "_NEW"
+ BUSY = "BUSY"
+ IDLE = "IDLE"
+ STCK = "STCK"
+
+ COMMENTS = { BUSY: "Message traffic observed (OK)",
+ IDLE: "No messages observed (Ring OK, I/O depends)",
+ STCK: "No pending req/rsp consumer progress observed (BUG)" }
+
+ def __init__(self, ring, state):
+ self.ring = ring
+ self.state = state
+ self.status = RingWatch._NEW
+
+ @classmethod
+ def new(cls, ring):
+ state = ring.read()
+ return cls(ring, state)
+
+ def __str__(self):
+ return "%s(%s)[%s]" % \
+ (type(self).__name__, self.ring.key(), self.status)
+
+ def is_stuck(self):
+ return self.status == self.STCK
+
+ def is_idle(self):
+ return self.status == self.IDLE
+
+ def kick(self):
+ if self.is_stuck():
+ return self.state.kick(self.ring)
+
+ def update(self):
+
+ prev = self.state
+ curr = self.ring.read()
+
+ if curr == prev:
+ if not curr.is_consumed():
+ self.status = self.STCK
+ else:
+ self.status = self.IDLE
+ else:
+ self.status = self.BUSY
+
+ self.state = curr
+
+ def display(self):
+ return "%s: %s" % (self,
+ self.state.display())
+
+class WatchList(object):
+ """Managed collection of I/O rings under surveillance."""
+
+ def __init__(self, gen):
+ self.gen = gen
+ self.list = {}
+
+ def update(self):
+
+ # NB. clear the watch list, then rebuild it. new entries get
+ # added, existing ones updates, those gone discarded.
+ prev = self.list
+ self.list = {}
+
+ for ring in self.gen():
+
+ key = ring.key()
+ entry = prev.get(key)
+
+ try:
+ if not entry:
+ entry = RingWatch.new(ring)
+ else:
+ entry.update()
+
+ except IOError, e:
+ pass
+ # NB. racing unplug, any ring.read() may raise.
+ # nothing left to memorize then.
+ else:
+ self.list[key] = entry
+
+ def __iter__(self):
+ return self.list.itervalues()
+
+ def pending(self):
+ for entry in self:
+ if entry.is_idle() and entry.state.is_pending():
+ yield entry
+
+ def stuck(self):
+ for entry in self:
+ if entry.is_stuck():
+ yield entry
+
+ def kick(self):
+ for entry in self.stuck():
+ try:
+ entry.kick()
+ except IOError:
+ # NB. racing unplug, any ring.write() may raise.
+ pass
+
+if __name__ == '__main__':
+ from sys import argv, stdout, stderr, exit
+ from getopt import gnu_getopt, GetoptError
+ from pprint import pprint
+
+ DEFAULT_PERIOD = 1 # secs
+
+ verbose = 0
+ period = DEFAULT_PERIOD
+ backends = XenBackend.TYPES.values()
+ kick = False
+ iowatch = False
+
+ OPTIONS = ((('h', 'help'),
+ "Print this help screen."),
+
+ (('v', 'verbose'),
+ "Increase output verbosity level (use n-times)."),
+
+ (('I', 'io'),
+ "Watch out for stuck I/O (not messaging), too. (%s)" % \
+ (iowatch)),
+
+ (('t', 'types'),
+ "Comma separated list of backend types to watch. (%s)" % \
+ ",".join(map(lambda t: t.XEN_BACKEND_NAME, backends))),
+
+ (('T', 'period'),
+ "Watch update period. (%d) [secs]" % \
+ (period)),
+
+ (('k', 'kick'),
+ "Kick broken guests out of cardiac arrest. (%s)" % \
+ (kick))
+ )
+
+ COMMANDS = {"check":
+ "Single iteration quick test (takes -T seconds)."}
+
+ def usage(stream):
+ prog = os.path.basename(argv[0])
+
+ print >>stream
+
+ print >>stream, "Usage:"
+ print >>stream, "\t%s [options] {%s}" % (prog, "|".join(COMMANDS))
+
+ print >>stream
+
+ print >>stream, "Commands:"
+ for (name, desc) in COMMANDS.iteritems():
+ print >>stream, "\t%s: \t%s" % (name, desc)
+
+ print >>stream
+
+ print >>stream, "Options:"
+ for ((short, _long), desc) in OPTIONS:
+ print >>stream, "\t-%s, --%s: \t%s" % (short, _long, desc)
+
+ print >>stream
+
+ def fail(msg = None):
+ if msg: print >>stderr, "Error: %s" % msg
+ usage(stderr)
+ exit(1)
+
+ def help():
+
+ usage(stdout)
+
+ print __doc__ % (XenBackend.SYSFS_BASEDIR, RingWatch.STCK)
+
+ print "Backend Types:"
+ for k, v in XenBackend.TYPES.iteritems():
+ print "\t%s: \t%s (%s)" % (k, v.__doc__, v._name_glob)
+
+ print
+ print "Ring States:"
+ for k, v in RingWatch.COMMENTS.iteritems():
+ print "\t%s: \t%s" % (k, v)
+
+ print
+
+ try:
+ opts, args = gnu_getopt(argv[1:],
+ "hIkt:vT:",
+ ["help",
+ "io",
+ "kick",
+ "type=",
+ "verbose",
+ "period="])
+ except GetoptError, e:
+ fail(str(e))
+
+ for (o, arg) in opts:
+ try:
+ if o in ('-h', '--help'):
+ help()
+ exit(0)
+
+ elif o in ['-v', '--verbose']:
+ verbose += 1
+
+ elif o in ['-I', '--io']:
+ iowatch = True
+
+ elif o in ('-T', '--period'):
+ period = int(arg)
+
+ elif o in ('-t', '--type'):
+ backends = ",".split(arg)
+ backends = map(lambda t: XenBackend.TYPES[t], backends)
+
+ elif o in ('-k', '--kick'):
+ kick = True
+
+ else:
+ raise "BUG: option %s unhandled." % o
+
+ except ValueError:
+ fail("%s: invalid argument '%s'." % (o, arg))
+
+ try:
+ cmd = args[0]
+ except IndexError:
+ fail("Missing command.")
+
+ def ring_select():
+ for _type in backends:
+ for backend in _type.find():
+ for ring in backend.find_rings():
+ yield ring
+
+ def show(entries):
+ for watch in entries:
+ print watch.display()
+
+ def pause():
+ import time
+ time.sleep(period)
+
+ watches = WatchList(ring_select)
+
+ if cmd == "check":
+
+ # init
+ watches.update()
+
+ if verbose >= 2:
+ show(watches)
+
+ # watch for one round
+ pause()
+ watches.update()
+
+ # show result
+ crit = list(watches.stuck())
+ stuck = bool(crit)
+
+ if (iowatch):
+ crit.extend(watches.pending())
+
+ if verbose >= 1:
+ show(watches)
+ elif crit:
+ show(crit)
+
+ if stuck and kick:
+ # deal with it
+ watches.kick()
+
+ else:
+ fail("Invalid command.")
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O
2011-06-20 20:47 ` Daniel Stodden
@ 2011-06-21 14:39 ` Ian Jackson
0 siblings, 0 replies; 5+ messages in thread
From: Ian Jackson @ 2011-06-21 14:39 UTC (permalink / raw)
To: Daniel Stodden; +Cc: Xen
Daniel Stodden writes ("Re: [Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O"):
> On Mon, 2011-06-20 at 12:49 -0400, Ian Jackson wrote:
> > And it doesn't seem to provide a way to specify a particular
> > domain to look for ?
>
> I briefly considered it initially, but after testing it just didn't look
> so important anymore. :}
OK :-). Well, it's an improvement, so it should go in.
> Found that the patch I sent was missing cleanup in some spots (mainly a
> program rename, and the verbose variable in __main__ ended up off by
> one). Can I sneak in the update attached before you push it?
Sure.
> Also, I never tried the make install target. Does it look okay to you?
My build test invoked it and it DTRT.
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-06-21 14:39 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-20 8:26 [PATCH 0 of 1] Deal with broken frontend/backend ring I/O Daniel Stodden
2011-06-20 8:26 ` [PATCH 1 of 1] xen-backwatch: " Daniel Stodden
2011-06-20 16:49 ` Ian Jackson
2011-06-20 20:47 ` Daniel Stodden
2011-06-21 14:39 ` Ian Jackson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.