From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:41680)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mdroth@linux.vnet.ibm.com>) id 1QHH1a-0006GY-DP
	for qemu-devel@nongnu.org; Tue, 03 May 2011 10:56:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mdroth@linux.vnet.ibm.com>) id 1QHH1Z-0002RP-7V
	for qemu-devel@nongnu.org; Tue, 03 May 2011 10:56:30 -0400
Received: from e34.co.us.ibm.com ([32.97.110.152]:34370)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mdroth@linux.vnet.ibm.com>) id 1QHH1Y-0002RL-W9
	for qemu-devel@nongnu.org; Tue, 03 May 2011 10:56:29 -0400
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com
	[9.17.195.106])
	by e34.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id p43Ei46F003944
	for <qemu-devel@nongnu.org>; Tue, 3 May 2011 08:44:04 -0600
Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245])
	by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	p43EvTfU136698
	for <qemu-devel@nongnu.org>; Tue, 3 May 2011 08:57:29 -0600
Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1])
	by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
	id p43F1N8c029852
	for <qemu-devel@nongnu.org>; Tue, 3 May 2011 09:01:23 -0600
Message-ID: <4DC01795.5000304@linux.vnet.ibm.com>
Date: Tue, 03 May 2011 09:56:21 -0500
From: Michael Roth <mdroth@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <1303138953-1334-1-git-send-email-mdroth@linux.vnet.ibm.com>
	<4DAFFD05.20304@redhat.com> <4DB0376B.8010505@linux.vnet.ibm.com>
	<4DBFFA61.2020104@redhat.com> <4DC008E7.8010504@linux.vnet.ibm.com>
	<4DC00D3E.3090804@redhat.com>
In-Reply-To: <4DC00D3E.3090804@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based
 host/guest communication (virtagent)
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: aliguori@linux.vnet.ibm.com, agl@linux.vnet.ibm.com, qemu-devel@nongnu.org

On 05/03/2011 09:12 AM, Jes Sorensen wrote:
> On 05/03/11 15:53, Michael Roth wrote:
>>>
>>> IMHO it is better to disable the commands rather than just logging, but
>>> either way should allow it to drop in.
>>
>> Kinda agree, but logging seems to be the real dependency. With the
>> server-side timeouts now in place even doing stuff like fopen/fwrite is
>> permitted (it would just timeout if it blocked too long). It's the
>> logging stuff that we don't really have a way to recover from, because
>> it's not run in a thread we can just nuke after a certain amount of time.
>>
>> Even when we're not frozen, we can't guarantee an fopen/fwrite/fread
>> will succeed, so failures shouldn't be too much of a surprise since they
>> need to be handled anyway. And determining whether or not a command
>> should be marked as executable during a freeze is somewhat nebulous
>> (fopen might work for read-only access, but hang for write access when
>> O_CREATE is set, fwrite might succeed if it doesn't require a flush,
>> etc), plus internal things like logging need to be taken into account.
>>
>> So, for now at least I think it's a reasonable way to do it.
>
> Please be very careful with any fwrite() calls - it's not just logging.
> Any writes to the frozen file systems will result in the caller being
> put to sleep until the file system is unfrozen. It won't timeout, and
> the agent will be stuck hanging in that call.
>
> It's fun playing with the fsfreeze stuff on your desktop system - doing
> it in an xterm has 'interesting' effects..... :)
>
> This is why I prefer the 'disable function' rather 'disable logging'
> approach.
>

I'll review the code to make sure, but what I mean is that the actual 
daemon only ever does disk i/o when it's doing logging. All the RPCs are 
handled in a separate worker thread, so if they do something fubar we 
should still be able to recover (the timeouts are done by 
pthread_cancel()'ing the worker thread). So things should be recoverable 
so long as we don't log when frozen. RPCs succeeding or failing we don't 
care too much about, since they'll just result in a nice "command timed 
out" response to the user eventually.

Checking logging_enabled is more to ensure we don't let the user exploit 
the fs_freeze side-effects to do sensitive stuff under the guest owner's 
nose (not that we couldn't anyway from host, but it may important for 
piece of mind about have an externally accessible/privileged agent in 
your guest).

works_under_fsfreeze is harder to determine beforehand, unless maybe you 
flag any command that did i/o other than logging...but there are 
actually use cases where that would be too restrictive. If they're using 
fwrite to pipe output through a character device, for instance, or 
twiddling knobs in sysfs. Better to keep the same semantics as you'd 
have when "physically" on the machine, when possible.

Will need to do some good testing though, fsfreeze is definitely a 
strange state to be in :)

>>> Sorry for the late reply, been a bit swamped here.
>>
>> No problem I have your patches in my tree now. They still need a little
>> bit of love and testing but I should be able to get them out on the list
>> shortly.
>
> Sounds great!
>
> Cheers,
> Jes