From: "Denis V. Lunev"
Message-ID: <55EEBA45.3080503@openvz.org>
Date: Tue, 8 Sep 2015 13:36:53 +0300
In-Reply-To: <20150908102217.GA12263@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC 0/5] disk deadlines
To: Stefan Hajnoczi, Paolo Bonzini
Cc: Kevin Wolf, qemu-devel@nongnu.org, Raushaniya Maksudova

On 09/08/2015 01:22 PM, Stefan Hajnoczi wrote:
> On Tue, Sep 08, 2015 at 11:33:09AM +0200, Paolo Bonzini wrote:
>>
>> On 08/09/2015 10:00, Denis V. Lunev wrote:
>>> How does the given solution work?
>>>
>>> If the disk-deadlines option is enabled for a drive, the completion
>>> time of that drive's requests is tracked. The method is as follows
>>> (assume from here on that the option is enabled).
>>>
>>> Every drive has its own red-black tree for keeping its requests.
>>> The expiration time of a request is the key, and its cookie (the id
>>> of the request) is the corresponding node. Assume that every request
>>> has 8 seconds to complete. If a request is not completed in time for
>>> some reason (server crash or something else), the drive's timer fires
>>> and the corresponding callback requests that the Virtual Machine (VM)
>>> be stopped.
>>>
>>> The VM remains stopped until all requests from the disk that caused
>>> the stop are completed. Furthermore, if there are other disks with
>>> 'disk-deadlines=on' whose requests are still waiting to be completed,
>>> the VM is not restarted: it waits for the completion of all "late"
>>> requests from all disks.
>>>
>>> Finally, all requests that caused the VM to stop (or that simply were
>>> not completed in time) can be printed with the "info disk-deadlines"
>>> QEMU monitor command as follows:
>> This topic has come up several times in the past.
>>
>> I agree that the current behavior is not great, but I am not sure that
>> timeouts are safe. For example, how is disk-deadlines=on different from
>> NFS soft mounts? The NFS man page says
>>
>>     NB: A so-called "soft" timeout can cause silent data corruption in
>>     certain cases. As such, use the soft option only when client
>>     responsiveness is more important than data integrity. Using NFS
>>     over TCP or increasing the value of the retrans option may
>>     mitigate some of the risks of using the soft option.
>>
>> Note how it only says "mitigate", not solve.
> The risky part of "soft" mounts is probably that the client doesn't know
> whether or not the request completed. So it doesn't know the state of
> the data on the server after a write request. This is the classic
> Byzantine fault tolerance problem in distributed systems.
>
> This patch series pauses the guest like rerror=stop.
> Therefore it's different from NFS "soft" mounts, which are like
> rerror=report.
>
> Guests running without this patch series may suffer from the NFS "soft"
> mounts problem when they time out and give up on the I/O request just
> as it actually completes on the server, leaving the data in a different
> state than expected.
>
> This patch series solves that problem by pausing the guest. Action can
> be taken on the host to bring storage back and resume (similar to
> ENOSPC).
>
> In order for this to work well, QEMU's timeout value must be shorter
> than the guest's own timeout value.
>
> Stefan

nice summary, thank you :)
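
For readers who have not looked at the series, here is a minimal sketch
of the per-drive deadline tracking described above. It is not the code
from the patches: the names (DiskDeadlines, deadlines_insert,
deadlines_timer_cb, and so on) are made up for illustration, a sorted
singly linked list stands in for the red-black tree, and the point where
the series would stop the VM is only indicated in a comment.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define REQUEST_TIMEOUT_NS (8LL * 1000000000LL)   /* 8 seconds per request */

typedef struct DeadlineEntry {
    int64_t expire_ns;              /* key: absolute expiration time      */
    uint64_t cookie;                /* value: id of the in-flight request */
    struct DeadlineEntry *next;
} DeadlineEntry;

typedef struct DiskDeadlines {
    DeadlineEntry *head;            /* kept sorted by expire_ns           */
} DiskDeadlines;

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* On request submission: remember when the request must complete. */
void deadlines_insert(DiskDeadlines *dd, uint64_t cookie)
{
    DeadlineEntry *e = malloc(sizeof(*e));
    DeadlineEntry **p = &dd->head;

    e->expire_ns = now_ns() + REQUEST_TIMEOUT_NS;
    e->cookie = cookie;
    while (*p && (*p)->expire_ns <= e->expire_ns) {
        p = &(*p)->next;
    }
    e->next = *p;
    *p = e;
}

/* Number of requests on this disk that are already past their deadline. */
unsigned deadlines_expired(const DiskDeadlines *dd)
{
    int64_t now = now_ns();
    unsigned n = 0;

    for (DeadlineEntry *e = dd->head; e && e->expire_ns <= now; e = e->next) {
        n++;
    }
    return n;
}

/* Timer callback armed for the earliest deadline: if any request is
 * overdue, the guest is stopped (the real series would request vm_stop()
 * here). */
void deadlines_timer_cb(DiskDeadlines *dd)
{
    unsigned late = deadlines_expired(dd);

    if (late) {
        printf("disk-deadlines: %u late request(s), stopping VM\n", late);
    }
}

/* On request completion: drop the entry.  The VM is resumed only once
 * deadlines_expired() is zero for every disk with disk-deadlines=on. */
void deadlines_complete(DiskDeadlines *dd, uint64_t cookie)
{
    for (DeadlineEntry **p = &dd->head; *p; p = &(*p)->next) {
        if ((*p)->cookie == cookie) {
            DeadlineEntry *e = *p;
            *p = e->next;
            free(e);
            return;
        }
    }
}

The key point is the one Stefan makes above: the 8-second deadline kept
on the QEMU side has to expire before the guest's own I/O timeout, so
that the guest is paused before it gives up on the request itself.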