From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=57714 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PlOtK-0007k3-Ho for qemu-devel@nongnu.org; Fri, 04 Feb 2011 11:52:16 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PlOtF-0005nX-Sp for qemu-devel@nongnu.org; Fri, 04 Feb 2011 11:52:10 -0500 Received: from mail-gy0-f173.google.com ([209.85.160.173]:53413) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PlOtF-0005nQ-QP for qemu-devel@nongnu.org; Fri, 04 Feb 2011 11:52:09 -0500 Received: by gye5 with SMTP id 5so1128424gye.4 for ; Fri, 04 Feb 2011 08:52:09 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <4D4C28DC.7000902@linux.vnet.ibm.com> References: <1296557928-30019-1-git-send-email-Jes.Sorensen@redhat.com> <1296557928-30019-2-git-send-email-Jes.Sorensen@redhat.com> <4D491A5A.5020008@redhat.com> <4D4AE8D7.3060209@linux.vnet.ibm.com> <4D4C28DC.7000902@linux.vnet.ibm.com> Date: Fri, 4 Feb 2011 16:52:08 +0000 Message-ID: Subject: Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw From: Stefan Hajnoczi Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Michael Roth Cc: Jes Sorensen , qemu-devel@nongnu.org, agl@us.ibm.com On Fri, Feb 4, 2011 at 4:27 PM, Michael Roth wr= ote: >> Aborting an RPC handler could leave the system in an inconsistent >> state unless we are careful. =A0For example, aborting freeze requires >> thawing those file systems that have been successfully frozen so far. >> For other handlers it might leave temporary files around, or if they >> are not carefully written may partially update files in-place and >> leave them corrupted. >> >> So instead of a blanket timeout, I think handlers that perform >> operations that may block for unknown periods of time could >> specifically use timeouts. =A0That gives the handler control to perform >> cleanup. > > Good point. Although, I'm not sure I want to push timeout handling to the > actual RPCs though....something as simple as open()/read() can block > indefinitely in certain situations, and it'll be difficult to account for > every situation, and the resulting code will be tedious as well. I'd real= ly > like the make the actual RPC as simple as possible, since it's something > that may be extended heavily over time. > > So what if we simply allow an RPC to register a timeout handler at the > beginning of the RPC call? So when the thread doing the RPC exits we: > > - check to see if thread exited as a result of timeout > - check to see if a timeout handler was registered, if so, call it, reset > the handler, then return a timeout indication > - if it didn't time out, return the response > > The only burden this puts on the RPC author is that information they need= to > recover state would need to be accessible outside the thread, which is > easily done by encapsulating state in static/global structs. So the timeo= ut > handler for fsfreeze, as it is currently written, would be something like= : > > va_fsfreeze_timeout_handler(): > =A0 =A0foreach mnt in fsfreeze.mount_list: > =A0 =A0 =A0 =A0unfreeze(mnt) > =A0 =A0fsfreeze.mount_list =3D NULL > > We'll need to be careful about lists/objects being in weird states due to > the forced exit, but I think it's doable. Yeah, still requires discipline but it could work. Stefan