From: Daniel Stodden <daniel.stodden@citrix.com>
To: Xen <xen-devel@lists.xensource.com>
Subject: [PATCH 0 of 1] Deal with broken frontend/backend ring I/O.
Date: Mon, 20 Jun 2011 01:26:29 -0700
Message-ID: <patchbomb.1308558389@localhost6.localdomain6>


Hi.

After running this blkback patch (Don't let in-flight requests defer
pending ones...)

http://lists.xensource.com/archives/html/xen-devel/2011-05/msg01968.html

for a while I guess it's mostly been verified.

Unfortunately, it also revealed a great potential to demo old guest
bugs. The 2.6.32 tree used to have a problem with lost notifications
during IRQ handler migration, due to a glitch in the dynirq handler
logic.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commitdiff;h=c5783925493e315f91330241546da7915dcc46e3

Blkfront got fixed in stable/v2.6.32.y, but it looks as if at least
RHEL6 hasn't patched it (yet), so I suspect CentOS and derivatives
suffer too.

Xen-blkfront is particularly sensitive to this. Some people seem to
report around one or two incidents per week. Presumably more on
heavily loaded systems (to repro, manually spinning the affinity mask
under scattered I/O will trigger almost immediately). That's going to
increase.
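
A minimal sketch of the "spin the affinity mask" repro, to be run as
root inside the guest while it is doing I/O. It only assumes the
standard /proc/irq/<N>/smp_affinity interface; the IRQ number has to be
looked up by hand (e.g. in /proc/interrupts), and the helper names,
the delay and the one-CPU-at-a-time pattern are illustrative choices,
not part of the patch. Masks are written as plain hex, so this assumes
no more than 32 CPUs.

```python
import itertools
import os
import time


def cpu_masks(ncpus):
    """Yield hex affinity masks selecting one CPU at a time, forever."""
    for cpu in itertools.cycle(range(ncpus)):
        yield format(1 << cpu, "x")


def spin_affinity(irq, ncpus, rounds, delay=0.01, proc_root="/proc"):
    """Move IRQ `irq` across CPUs `rounds` times.

    proc_root is a parameter only so the function can be exercised
    against a scratch directory instead of the real /proc.
    """
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    for mask in itertools.islice(cpu_masks(ncpus), rounds):
        with open(path, "w") as f:
            f.write(mask)
        time.sleep(delay)
```

Something like spin_affinity(irq, os.cpu_count(), 10000) alongside a
scattered read/write load would be the intended use.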

So let's learn to live with that. Main issue is that even if you know
what to blame, there's nothing in place to deal with it.

I'd like to propose toolstack support which provides people with a
workaround. With minimal kernel support, a watchdog can mostly live in
userland, is easy to do and won't need to clutter backend drivers.

This can hardly be considered a fix for what's essentially a guest
problem. But it gives hosts a chance to automate guest recovery until
there's an update.

Also, it's nice for debugging. Ring I/O and event races are a constant
source of paranoia whenever guests appear to wedge, and I believe it
might help to drastically reduce time spent on remote triage in some
cases.

It can also identify excessively blocking I/O (as opposed to a stuck
message dispatch).
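
To illustrate the distinction, here is a rough sketch of the kind of
check a userland watchdog could make from two samples of ring state
taken some seconds apart. This is not the code from the patch: the
RingSample fields, the classify() helper and the result strings are
hypothetical, and how the counters would actually be read (the sysfs
layout) is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingSample:
    # All fields are assumed, free-running counters / gauges.
    prod: int      # messages posted to the ring so far
    cons: int      # messages consumed by the peer so far
    inflight: int  # requests taken off the ring, not yet completed


def classify(before, after):
    """Compare two samples of the same ring taken some time apart."""
    if after.cons != before.cons:
        # The consumer moved, so messages are being dispatched.
        return "ok"
    if before.inflight > 0 and after.inflight == before.inflight:
        # Nothing new was consumed and nothing completed:
        # the I/O itself is blocking, not the ring.
        return "blocking-io"
    if before.prod != before.cons and after.prod != after.cons:
        # Messages sat on the ring for the whole interval
        # although nobody is busy: likely a lost notification.
        return "stuck-dispatch"
    return "idle"
```

Checking for stalled in-flight I/O before blaming the dispatch path is
a deliberate choice in this sketch, so that a slow disk is not
misreported as a ring problem.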

Some potential use cases

  - Run occasionally (cron). Alerting on production systems where
    guest OSes reside in a different administrative domain with no
    prospect for a quick fix. Might go into distros.

  - More frequently, once the machine is known to host guests prone to
    error. There shouldn't be much of a performance impact anyway. But
    it might want to be tuned to not start spamming the console logs.

  - Command line test. For people reporting I/O issues, whenever
    front/backend problems are suspected (or to dismiss that). Or to
    aid driver hacking. Might also go in xen-bugtool.

I chose to drop it into tools/misc. It's rather standalone. Takes a
sysfs patch to blkback. I didn't add netback support, but I guess that
would look very similar if it ever becomes desirable.

Cheers,
Daniel

Thread overview: 5+ messages
2011-06-20  8:26 Daniel Stodden [this message]
2011-06-20  8:26 ` [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/backend ring I/O Daniel Stodden
2011-06-20 16:49   ` Ian Jackson
2011-06-20 20:47     ` Daniel Stodden
2011-06-21 14:39       ` Ian Jackson
