From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:38658) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QHG32-0004tH-Fr for qemu-devel@nongnu.org; Tue, 03 May 2011 09:53:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QHG31-0000m6-At for qemu-devel@nongnu.org; Tue, 03 May 2011 09:53:56 -0400 Received: from e34.co.us.ibm.com ([32.97.110.152]:46606) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QHG31-0000lp-5l for qemu-devel@nongnu.org; Tue, 03 May 2011 09:53:55 -0400 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e34.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id p43DfUaY021048 for ; Tue, 3 May 2011 07:41:30 -0600 Received: from d03av05.boulder.ibm.com (d03av05.boulder.ibm.com [9.17.195.85]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v9.1) with ESMTP id p43DrkBv115522 for ; Tue, 3 May 2011 07:53:47 -0600 Received: from d03av05.boulder.ibm.com (loopback [127.0.0.1]) by d03av05.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p43DriqM012322 for ; Tue, 3 May 2011 07:53:45 -0600 Message-ID: <4DC008E7.8010504@linux.vnet.ibm.com> Date: Tue, 03 May 2011 08:53:43 -0500 From: Michael Roth MIME-Version: 1.0 References: <1303138953-1334-1-git-send-email-mdroth@linux.vnet.ibm.com> <4DAFFD05.20304@redhat.com> <4DB0376B.8010505@linux.vnet.ibm.com> <4DBFFA61.2020104@redhat.com> In-Reply-To: <4DBFFA61.2020104@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based host/guest communication (virtagent) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jes Sorensen Cc: aliguori@linux.vnet.ibm.com, agl@linux.vnet.ibm.com, qemu-devel@nongnu.org On 05/03/2011 07:51 AM, Jes Sorensen wrote: > On 04/21/11 15:55, Michael Roth wrote: >>> Did you do anything with the fsfreeze patches, or were they dropped in >>> the migration to qapi? >> >> They were pending some changes required on the agent side that weren't >> really addressed/doable until this patchset, namely: >> >> 1) server-side timeout mechanism to recover from RPCs that can hang >> indefinitely or take a really long time (fsfreeze, fopen, etc), >> currently it's 30 seconds, may need to bump it to 60 for fsfreeze, or >> potentially add an RPC to change the server-side timeout >> 2) a simple way to temporarily turn off logging so agent doesn't >> deadlock itself >> 3) a way to register a cleanup handler when a timeout occurs. >> 4) disabling RPCs where proper accounting/logging is required >> (guest-open-file, guest-shutdown, etc) >> >> #4 isn't implemented...I think this could be done fairly in-evasively >> with something like: >> >> Response important_rpc(): >> if (!ga_log("syslog", LEVEL_CRITICAL, "important stuff happening")) >> return ERROR_LOGGING_CURRENTLY_DISABLED > > Either that, or maybe simply disable the full command while the freeze > is in progress? I fear we're more likely to miss a case of checking for > logging than we are to miss command disabling? > > It should still be very non evasive, maybe just a flag in the struct > declaring the functions marking it as logging-required and if the > no-logging flag is set, the command is made to wait, or return -EAGAIN > Yup when I actually starting dropping it in I realized this was a much better approach. Although, for now I just added something like "if (!logging_enabled) { error_set(QERR_GA_LOGGING_DISABLED); return }" to the start of functions where logging is considered critical, which will result in the user getting an error message about logging so it's not too much of a surprise to them. The actual dispatch code closely mirrors Anthony's dispatch stuff for QMP so I was hesitant to try to modify it to handle this automatically, since it would require some changes to how the schema parsing/handling is done (would probably need to add a "requires_logging" flag in the schema). Wouldn't take much though. Either way, should be a clean conversion if we decide to go that route. >> >> bool ga_log(log_domain, level, msg): >> if (log_domain == "syslog") >> if (!logging_enabled&& is_critical(log_level)) >> return False; >> syslog(msg, ...) >> else >> if (logging_enabled) >> normallog(msg, ...) >> return True >> >> With that I think we could actually drop the fsfreeze stuff in. Thoughts? > > IMHO it is better to disable the commands rather than just logging, but > either way should allow it to drop in. Kinda agree, but logging seems to be the real dependency. With the server-side timeouts now in place even doing stuff like fopen/fwrite is permitted (it would just timeout if it blocked too long). It's the logging stuff that we don't really have a way to recover from, because it's not run in a thread we can just nuke after a certain amount of time. Even when we're not frozen, we can't guarantee an fopen/fwrite/fread will succeed, so failures shouldn't be too much of a surprise since they need to be handled anyway. And determining whether or not a command should be marked as executable during a freeze is somewhat nebulous (fopen might work for read-only access, but hang for write access when O_CREATE is set, fwrite might succeed if it doesn't require a flush, etc), plus internal things like logging need to be taken into account. So, for now at least I think it's a reasonable way to do it. > > Sorry for the late reply, been a bit swamped here. No problem I have your patches in my tree now. They still need a little bit of love and testing but I should be able to get them out on the list shortly. > > Cheers, > Jes