Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>,
	qemu-devel@nongnu.org, agl@us.ibm.com
Subject: Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw
Date: Fri, 04 Feb 2011 10:27:08 -0600	[thread overview]
Message-ID: <4D4C28DC.7000902@linux.vnet.ibm.com> (raw)
In-Reply-To: <AANLkTi=o+-AJihN-Jm6iNhjY_EWcrOvNHjCMfGbTxdY2@mail.gmail.com>

On 02/04/2011 12:13 AM, Stefan Hajnoczi wrote:
> On Thu, Feb 3, 2011 at 5:41 PM, Michael Roth<mdroth@linux.vnet.ibm.com>  wrote:
>> For things like logging and i/o on a frozen system...I agree we'd need some
>> flag for these kinds of situations. Maybe a disable_logging() flag....i
>> really don't like this though... I'd imagine even syslogd() could block
>> virtagent in this type of situation, so that would need to be disabled as
>> well.
>>
>> But doing so completely subverts our attempts and providing proper
>> accounting of what the agent is doing to the user. A user can freeze the
>> filesystem, knowing that logging would be disabled, then prod at whatever he
>> wants. So the handling should be something specific to fsfreeze, with
>> stricter requirements:
>>
>> If a user calls fsfreeze(), we disable logging, but also disable the ability
>> to do anything other than fsthaw() or fsstatus(). This actually solves the
>> potential deadlocking problem for other RPCs as well...since they cant be
>> executed in the first place.
>>
>> So I think that addresses the agent deadlocking itself, post-freeze.
>>
>> However, fsfreeze() itself might lock-up the agent as well...I'm not
>> confident we can really put any kind of bound on how long it'll take to
>> execute, and if we timeout on the client-side the agent can still block
>> here.
>>
>> Plus there are any number of other situations where an RPC can still hang
>> things...in the future when we potentially allow things like script
>> execution, they might do something like attempt to connect to a socket
>> that's already in use and wait on the server for an arbitrary amount of
>> time, or open a file on an nfs share that in currently unresponsive.
>>
>> So a solution for these situations is still needed, and I'm starting to
>> agree that threads are needed, but I don't think we should do RPCs
>> concurrently (not sure if that's what is being suggested or not). At least,
>> there's no pressing reason for it as things currently stand (there aren't
>> currently any RPCs where fast response times are all that important, so it's
>> okay to serialize them behind previous RPCs, and HMP/QMP are command at a
>> time), and it's something that Im fairly confident can be added if the need
>> arises in the future.
>>
>> But for dealing with a situation where an RPC can hang the agent, I think
>> one thread should do it. Basically:
>>
>> We associate each RPC with a time limit. Some RPCs, very special ones that
>> we'd trust with our kids, could potentially specify an unlimited timeout.
>> The client side should use this same timeout on it's end. In the future we
>> might allow the user to explicitly disable the timeout for a certain RPC.
>> The logic would then be:
>>
>> - read in a client RPC request
>> - start a thread to do RPC
>> - if there's a timeout, register an alarm(<timeout>), with a handler that
>> will call something like pthread_kill(current_worker_thread). On the thread
>> side, this signal will induce a pthread_exit()
>> - wait for the thread to return (pthread_join(current_worker_thread))
>> - return it's response back to the caller if it finished, return a timeout
>> indication otherwise
>
> I'm not sure about a timeout inside virtagent.  A client needs to
> protect itself with its own timeout and shouldn't rely on the server
> to prevent it from locking up - especially since the server is a guest
> which we have no control over.  So the timeout does not help the
> guest.

We actually have timeouts for the client already (though they'll need to 
be reworked a bit to handle the proposed solutions), what I'm proposing 
is an additional timeout on the guest/server side for the actual RPCs, 
since a blocking RPC can still hang the guest agent.

>
> Aborting an RPC handler could leave the system in an inconsistent
> state unless we are careful.  For example, aborting freeze requires
> thawing those file systems that have been successfully frozen so far.
> For other handlers it might leave temporary files around, or if they
> are not carefully written may partially update files in-place and
> leave them corrupted.
>
> So instead of a blanket timeout, I think handlers that perform
> operations that may block for unknown periods of time could
> specifically use timeouts.  That gives the handler control to perform
> cleanup.

Good point. Although, I'm not sure I want to push timeout handling to 
the actual RPCs though....something as simple as open()/read() can block 
indefinitely in certain situations, and it'll be difficult to account 
for every situation, and the resulting code will be tedious as well. I'd 
really like the make the actual RPC as simple as possible, since it's 
something that may be extended heavily over time.

So what if we simply allow an RPC to register a timeout handler at the 
beginning of the RPC call? So when the thread doing the RPC exits we:

- check to see if thread exited as a result of timeout
- check to see if a timeout handler was registered, if so, call it, 
reset the handler, then return a timeout indication
- if it didn't time out, return the response

The only burden this puts on the RPC author is that information they 
need to recover state would need to be accessible outside the thread, 
which is easily done by encapsulating state in static/global structs. So 
the timeout handler for fsfreeze, as it is currently written, would be 
something like:

va_fsfreeze_timeout_handler():
     foreach mnt in fsfreeze.mount_list:
         unfreeze(mnt)
     fsfreeze.mount_list = NULL

We'll need to be careful about lists/objects being in weird states due 
to the forced exit, but I think it's doable.

>
> Stefan

next prev parent reply	other threads:[~2011-02-04 16:28 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-01 10:58 [Qemu-devel] [PATCH 0/2] virtagent - fsfreeze support Jes.Sorensen
2011-02-01 10:58 ` [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw Jes.Sorensen
2011-02-01 14:12   ` Stefan Hajnoczi
2011-02-01 14:26     ` Jes Sorensen
2011-02-01 14:34       ` Stefan Hajnoczi
2011-02-01 14:36         ` Jes Sorensen
2011-02-01 14:41           ` Stefan Hajnoczi
2011-02-01 17:22             ` Michael Roth
2011-02-01 14:48   ` [Qemu-devel] " Adam Litke
2011-02-01 15:02     ` Jes Sorensen
2011-02-01 16:50   ` Michael Roth
2011-02-02  8:38     ` Jes Sorensen
2011-02-02  7:57   ` [Qemu-devel] " Stefan Hajnoczi
2011-02-02  8:48     ` Jes Sorensen
2011-02-03 17:41       ` Michael Roth
2011-02-04  6:13         ` Stefan Hajnoczi
2011-02-04 16:27           ` Michael Roth [this message]
2011-02-04 16:52             ` Stefan Hajnoczi
2011-02-04 11:03         ` Jes Sorensen
2011-02-04 16:51           ` Michael Roth
2011-02-01 10:58 ` [Qemu-devel] [PATCH 2/2] Add monitor commands for fsfreeze support Jes.Sorensen
2011-02-01 11:25 ` [Qemu-devel] [PATCH 0/2] virtagent - " Vasiliy G Tolstov
2011-02-01 13:02   ` Jes Sorensen
2011-02-01 16:04   ` Richard W.M. Jones
2011-02-01 20:04     ` Vasiliy G Tolstov
2011-02-01 20:17       ` Richard W.M. Jones
2011-02-01 14:16 ` Stefan Hajnoczi
2011-02-01 14:28   ` Jes Sorensen
  -- strict thread matches above, loose matches on Subject: below --
2011-02-02  8:42 [Qemu-devel] [PATCH v2 " Jes.Sorensen
2011-02-02  8:42 ` [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw Jes.Sorensen
2011-02-04 10:57 [Qemu-devel] [PATCH v3 0/2] virtagent - fsfreeze support Jes.Sorensen
2011-02-04 10:57 ` [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw Jes.Sorensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D4C28DC.7000902@linux.vnet.ibm.com \
    --to=mdroth@linux.vnet.ibm.com \
    --cc=Jes.Sorensen@redhat.com \
    --cc=agl@us.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).