Re: [Qemu-devel] lock-free monitor? - Dr. David Alan Gilbert

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: xiecl.fnst@cn.fujitsu.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] lock-free monitor?
Date: Wed, 10 Feb 2016 15:33:42 +0000	[thread overview]
Message-ID: <20160210153341.GC2560@work-vm> (raw)
In-Reply-To: <87h9hg61lh.fsf@blackfin.pond.sub.org>

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> 
> >> > Hi,
> >> >   I wondered what it would take to be able to do a lock-free monitor;
> >> > i.e. one that could respond to (some) commands while the qemu big lock is held.
> >> 
> >> Requires a careful audit of the monitor code.
> >> 
> >> For instance, cur_mon needs to be made thread local for running multiple
> >> monitors concurrently.
> >
> > Hmm that's fun;  and cur_mon is used all over the place when 'fd' passing
> > is used - although that probably should be cleaned up somehow.
> 
> When cur_mon got created, we had no threads.  Even now, monitors only
> ever run under the BQL, so cur_mon's still fine.
> 
> Having a current monitor is simpler than passing one around, especially
> when you go through layers that don't know about it.  Such cases exist,
> and they made us abandon commit 376253e's hope to eliminate cur_mon.
> Unfortunately, I've since forgotten the details, so I can't point you to
> an example.

Yes,  I think maybe if we can try and remove the use of cur_mon one
use at a time outside of the monitor it might move it in the right direction.

> >> > My reason for asking is that there are cases in migration and colo
> >> > where a network block device has failed and is blocking, but it fails
> >> > at just the wrong time while the lock is held, and now, you don't have
> >> 
> >> Is it okay to block while holding the BQL?
> >
> > I'd hope the simple answer was no; unfortunately the more complex answer
> > is that we keep finding places that do.
> 
> Would you call these bugs?
> 
> Even if you do, we may want to recover from them.

They're a bug in the end result that needs fixing; so for example a 
crashing NBD server shouldn't lock you out of the monitor during a migrate,
and I don't think we current;y have other solutions.

> >                                          The cases I'm aware of are
> > mostly bdrv_drain_all calls in migration/colo, where one of the block
> > devices (e.g. an NBD network device) fails.  Generally these are called
> > at the end of a migration cycle when we're just ensuring that everything
> > really is drained and are normally called with the lock held to ensure
> > that nothing else is generating new traffic as we're clearing everything else.
> 
> Using the BQL for that drives a cork into the I/O pipeline with a Really
> Big Hammer.  Can we do better?
> 
> The answer may be "yes, but don't hold your breath."  Then we may need
> means to better cope with the bugs.

Yeh there are some places that the migraiton code holds the BQL where
I don't really understand all the things it's guarding against.

> >> > any way to unblock it since you can't do anything on the monitors.
> >> > If I could issue a command then I could have it end up doing a shutdown(2)
> >> > on the network connection and free things up.
> >> >
> >> > Maybe this would also help real-time people?
> >> >
> >> > I was thinking then, we could:
> >> >    a) Have a monitor that wasn't tied to the main loop at all - each instance
> >> > would be it's own thread and have it's own poll() (probably using the new
> >> > IO channels?)
> >> >    b) for most commands it would take the big lock before execute the command
> >> > and release it afterwards - so there would be no risk in those commands.
> >> 
> >> Saves you the auditing their code, which would probably be impractical.
> >> 
> >> >    c) Some commands you could mark as 'lock free' - they would be required
> >> > not to take any locks or wait for *anything* and for those the monitor would
> >> > not take the lock.
> >> 
> >> They also must not access shared state without suitable synchronization.
> >
> > Right; I'd expect most of the cases to either be reading simple status information,
> > or having to find whatever they need to do using rcu list walks.
> >
> >> >    d) Perhaps have some way of restricting a particular monitor session to only
> >> > allowing non-blocking commands.
> >> 
> >> Why?  To ensure you always have an emergency monitor available?
> >
> > Yes, mostly to stop screwups of putting a potentially blocking command down your
> > emergency route.
> 
> Understand.
> 
> >> >    e) Then I'd have to have a way of finding the broken device in a lockfree
> >> > way (RTU list walk?) and issuing the shutdown(2).
> >> >
> >> > Bonus points to anyone who can statically check commands in (c).
> >> 
> >> Tall order.
> >
> > Yes :-)   A (compile time selected) dynamic check might be possible
> > where the normal global lock functions and rcu block functions check if they're
> > in a monitor thread and assert.
> >
> >> > Does this make sense to everyone else, or does anyone have any better
> >> > suggestions?
> >> 
> >> Sounds like a big, hairy task to me.
> >> 
> >> Any alternatives?
> >
> > I hope so; although is the idea of making a lock-free monitor a generally
> > good idea we should do anyway?
> 
> There's no shortage of good ideas, but our bandwidth to implement,
> review, document and test them has limits.
> 
> The general plan is to reduce the scope of the BQL gradually.
> Liberating the monitor from the BQL is one step on the road to complete
> BQL elimination.  The question is whether this step needs to be taken
> *now*.

COLO probably needs something;  although in it's case I'm going to investigate
if some evil with iptables might be able to force a recovery by causing the socket
to close (after all we're assuming that case involves the destination is dead - but
I don't want to wait for a full TCP timeout).

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

next prev parent reply	other threads:[~2016-02-10 15:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-08 15:17 [Qemu-devel] lock-free monitor? Dr. David Alan Gilbert
2016-02-09 13:47 ` Stefan Hajnoczi
2016-02-09 13:52   ` Dr. David Alan Gilbert
2016-02-14  6:22   ` Fam Zheng
2016-02-15 13:42     ` Stefan Hajnoczi
2016-02-15 14:19       ` Markus Armbruster
2016-02-15 12:59   ` Kevin Wolf
2016-02-09 16:57 ` Markus Armbruster
2016-02-10  8:52   ` Dr. David Alan Gilbert
2016-02-10 15:12     ` Markus Armbruster
2016-02-10 15:33       ` Dr. David Alan Gilbert [this message]
2016-02-11  8:33         ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160210153341.GC2560@work-vm \
    --to=dgilbert@redhat.com \
    --cc=armbru@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=xiecl.fnst@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).