Re: [Qemu-devel] lock-free monitor? - Dr. David Alan Gilbert

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: xiecl.fnst@cn.fujitsu.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] lock-free monitor?
Date: Wed, 10 Feb 2016 15:33:42 +0000	[thread overview]
Message-ID: <20160210153341.GC2560@work-vm> (raw)
In-Reply-To: <87h9hg61lh.fsf@blackfin.pond.sub.org>

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> 
> >> > Hi,
> >> >   I wondered what it would take to be able to do a lock-free monitor;
> >> > i.e. one that could respond to (some) commands while the qemu big lock is held.
> >> 
> >> Requires a careful audit of the monitor code.
> >> 
> >> For instance, cur_mon needs to be made thread local for running multiple
> >> monitors concurrently.
> >
> > Hmm that's fun;  and cur_mon is used all over the place when 'fd' passing
> > is used - although that probably should be cleaned up somehow.
> 
> When cur_mon got created, we had no threads.  Even now, monitors only
> ever run under the BQL, so cur_mon's still fine.
> 
> Having a current monitor is simpler than passing one around, especially
> when you go through layers that don't know about it.  Such cases exist,
> and they made us abandon commit 376253e's hope to eliminate cur_mon.
> Unfortunately, I've since forgotten the details, so I can't point you to
> an example.

Yes,  I think maybe if we can try and remove the use of cur_mon one
use at a time outside of the monitor it might move it in the right direction.

> >> > My reason for asking is that there are cases in migration and colo
> >> > where a network block device has failed and is blocking, but it fails
> >> > at just the wrong time while the lock is held, and now, you don't have
> >> 
> >> Is it okay to block while holding the BQL?
> >
> > I'd hope the simple answer was no; unfortunately the more complex answer
> > is that we keep finding places that do.
> 
> Would you call these bugs?
> 
> Even if you do, we may want to recover from them.

They're a bug in the end result that needs fixing; so for example a 
crashing NBD server shouldn't lock you out of the monitor during a migrate,
and I don't think we current;y have other solutions.

> >                                          The cases I'm aware of are
> > mostly bdrv_drain_all calls in migration/colo, where one of the block
> > devices (e.g. an NBD network device) fails.  Generally these are called
> > at the end of a migration cycle when we're just ensuring that everything
> > really is drained and are normally called with the lock held to ensure
> > that nothing else is generating new traffic as we're clearing everything else.
> 
> Using the BQL for that drives a cork into the I/O pipeline with a Really
> Big Hammer.  Can we do better?
> 
> The answer may be "yes, but don't hold your breath."  Then we may need
> means to better cope with the bugs.

Yeh there are some places that the migraiton code holds the BQL where
I don't really understand all the things it's guarding against.

> >> > any way to unblock it since you can't do anything on the monitors.
> >> > If I could issue a command then I could have it end up doing a shutdown(2)
> >> > on the network connection and free things up.
> >> >
> >> > Maybe this would also help real-time people?
> >> >
> >> > I was thinking then, we could:
> >> >    a) Have a monitor that wasn't tied to the main loop at all - each instance
> >> > would be it's own thread and have it's own poll() (probably using the new
> >> > IO channels?)
> >> >    b) for most commands it would take the big lock before execute the command
> >> > and release it afterwards - so there would be no risk in those commands.
> >> 
> >> Saves you the auditing their code, which would probably be impractical.
> >> 
> >> >    c) Some commands you could mark as 'lock free' - they would be required
> >> > not to take any locks or wait for *anything* and for those the monitor would
> >> > not take the lock.
> >> 
> >> They also must not access shared state without suitable synchronization.
> >
> > Right; I'd expect most of the cases to either be reading simple status information,
> > or having to find whatever they need to do using rcu list walks.
> >
> >> >    d) Perhaps have some way of restricting a particular monitor session to only
> >> > allowing non-blocking commands.
> >> 
> >> Why?  To ensure you always have an emergency monitor available?
> >
> > Yes, mostly to stop screwups of putting a potentially blocking command down your
> > emergency route.
> 
> Understand.
> 
> >> >    e) Then I'd have to have a way of finding the broken device in a lockfree
> >> > way (RTU list walk?) and issuing the shutdown(2).
> >> >
> >> > Bonus points to anyone who can statically check commands in (c).
> >> 
> >> Tall order.
> >
> > Yes :-)   A (compile time selected) dynamic check might be possible
> > where the normal global lock functions and rcu block functions check if they're
> > in a monitor thread and assert.
> >
> >> > Does this make sense to everyone else, or does anyone have any better
> >> > suggestions?
> >> 
> >> Sounds like a big, hairy task to me.
> >> 
> >> Any alternatives?
> >
> > I hope so; although is the idea of making a lock-free monitor a generally
> > good idea we should do anyway?
> 
> There's no shortage of good ideas, but our bandwidth to implement,
> review, document and test them has limits.
> 
> The general plan is to reduce the scope of the BQL gradually.
> Liberating the monitor from the BQL is one step on the road to complete
> BQL elimination.  The question is whether this step needs to be taken
> *now*.

COLO probably needs something;  although in it's case I'm going to investigate
if some evil with iptables might be able to force a recovery by causing the socket
to close (after all we're assuming that case involves the destination is dead - but
I don't want to wait for a full TCP timeout).

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

next prev parent reply	other threads:[~2016-02-10 15:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-08 15:17 [Qemu-devel] lock-free monitor? Dr. David Alan Gilbert
2016-02-09 13:47 ` Stefan Hajnoczi
2016-02-09 13:52   ` Dr. David Alan Gilbert
2016-02-14  6:22   ` Fam Zheng
2016-02-15 13:42     ` Stefan Hajnoczi
2016-02-15 14:19       ` Markus Armbruster
2016-02-15 12:59   ` Kevin Wolf
2016-02-09 16:57 ` Markus Armbruster
2016-02-10  8:52   ` Dr. David Alan Gilbert
2016-02-10 15:12     ` Markus Armbruster
2016-02-10 15:33       ` Dr. David Alan Gilbert [this message]
2016-02-11  8:33         ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160210153341.GC2560@work-vm \
    --to=dgilbert@redhat.com \
    --cc=armbru@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=xiecl.fnst@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.