From: "Denis V. Lunev"
Message-ID: <55EEBA45.3080503@openvz.org>
Date: Tue, 8 Sep 2015 13:36:53 +0300
In-Reply-To: <20150908102217.GA12263@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC 0/5] disk deadlines
To: Stefan Hajnoczi, Paolo Bonzini
Cc: Kevin Wolf, qemu-devel@nongnu.org, Raushaniya Maksudova

On 09/08/2015 01:22 PM, Stefan Hajnoczi wrote:
> On Tue, Sep 08, 2015 at 11:33:09AM +0200, Paolo Bonzini wrote:
>>
>> On 08/09/2015 10:00, Denis V. Lunev wrote:
>>> How does the given solution work?
>>>
>>> If the disk-deadlines option is enabled for a drive, the completion
>>> time of that drive's requests is tracked. The method is as follows
>>> (assume from here on that the option is enabled).
>>>
>>> Every drive has its own red-black tree for keeping its requests.
>>> The expiration time of a request is the key, and its cookie (the id
>>> of the request) is the corresponding node. Assume that every request
>>> has 8 seconds to complete. If a request is not completed in time for
>>> some reason (server crash or something else), the drive's timer fires
>>> and the corresponding callback requests that the Virtual Machine (VM)
>>> be stopped.
>>>
>>> The VM remains stopped until all requests from the disk that caused
>>> the stop are completed. Furthermore, if there are other disks with
>>> 'disk-deadlines=on' whose requests are still waiting to be completed,
>>> the VM is not restarted: it waits for the completion of all "late"
>>> requests from all disks.
>>>
>>> Finally, all requests that caused the VM to stop (or that simply were
>>> not completed in time) can be printed with the "info disk-deadlines"
>>> QEMU monitor command as follows:
>> This topic has come up several times in the past.
>>
>> I agree that the current behavior is not great, but I am not sure that
>> timeouts are safe. For example, how is disk-deadlines=on different from
>> NFS soft mounts? The NFS man page says
>>
>>     NB: A so-called "soft" timeout can cause silent data corruption in
>>     certain cases. As such, use the soft option only when client
>>     responsiveness is more important than data integrity. Using NFS
>>     over TCP or increasing the value of the retrans option may
>>     mitigate some of the risks of using the soft option.
>>
>> Note how it only says "mitigate", not solve.
> The risky part of "soft" mounts is probably that the client doesn't know
> whether or not the request completed. So it doesn't know the state of
> the data on the server after a write request. This is the classic
> Byzantine fault tolerance problem in distributed systems.
>
> This patch series pauses the guest like rerror=stop.
> Therefore it's different from NFS "soft" mounts, which are like
> rerror=report.
>
> Guests running without this patch series may suffer from the NFS "soft"
> mounts problem when they time out and give up on the I/O request just
> as it actually completes on the server, leaving the data in a different
> state than expected.
>
> This patch series solves that problem by pausing the guest. Action can
> be taken on the host to bring storage back and resume (similar to
> ENOSPC).
>
> In order for this to work well, QEMU's timeout value must be shorter
> than the guest's own timeout value.
>
> Stefan

nice summary, thank you :)
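
For readers who have not looked at the series, here is a minimal sketch
of the per-drive deadline tracking described above. It is not the code
from the patches: the names (DiskDeadlines, deadlines_insert,
deadlines_timer_cb, and so on) are made up for illustration, a sorted
singly linked list stands in for the red-black tree, and the point where
the series would stop the VM is only indicated in a comment.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define REQUEST_TIMEOUT_NS (8LL * 1000000000LL)   /* 8 seconds per request */

typedef struct DeadlineEntry {
    int64_t expire_ns;              /* key: absolute expiration time      */
    uint64_t cookie;                /* value: id of the in-flight request */
    struct DeadlineEntry *next;
} DeadlineEntry;

typedef struct DiskDeadlines {
    DeadlineEntry *head;            /* kept sorted by expire_ns           */
} DiskDeadlines;

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* On request submission: remember when the request must complete. */
void deadlines_insert(DiskDeadlines *dd, uint64_t cookie)
{
    DeadlineEntry *e = malloc(sizeof(*e));
    DeadlineEntry **p = &dd->head;

    e->expire_ns = now_ns() + REQUEST_TIMEOUT_NS;
    e->cookie = cookie;
    while (*p && (*p)->expire_ns <= e->expire_ns) {
        p = &(*p)->next;
    }
    e->next = *p;
    *p = e;
}

/* Number of requests on this disk that are already past their deadline. */
unsigned deadlines_expired(const DiskDeadlines *dd)
{
    int64_t now = now_ns();
    unsigned n = 0;

    for (DeadlineEntry *e = dd->head; e && e->expire_ns <= now; e = e->next) {
        n++;
    }
    return n;
}

/* Timer callback armed for the earliest deadline: if any request is
 * overdue, the guest is stopped (the real series would request vm_stop()
 * here). */
void deadlines_timer_cb(DiskDeadlines *dd)
{
    unsigned late = deadlines_expired(dd);

    if (late) {
        printf("disk-deadlines: %u late request(s), stopping VM\n", late);
    }
}

/* On request completion: drop the entry.  The VM is resumed only once
 * deadlines_expired() is zero for every disk with disk-deadlines=on. */
void deadlines_complete(DiskDeadlines *dd, uint64_t cookie)
{
    for (DeadlineEntry **p = &dd->head; *p; p = &(*p)->next) {
        if ((*p)->cookie == cookie) {
            DeadlineEntry *e = *p;
            *p = e->next;
            free(e);
            return;
        }
    }
}

The key point is the one Stefan makes above: the 8-second deadline kept
on the QEMU side has to expire before the guest's own I/O timeout, so
that the guest is paused before it gives up on the request itself.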