From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Denis V. Lunev" <den@openvz.org>
To: Kevin Wolf
Cc: Stefan Hajnoczi, qemu-devel@nongnu.org, Raushaniya Maksudova
Subject: Re: [Qemu-devel] [PATCH 4/5] disk_deadlines: add control of requests time expiration
Date: Tue, 8 Sep 2015 17:23:12 +0300
Message-ID: <55EEEF50.2010200@openvz.org>
In-Reply-To: <20150908130556.GF4230@noname.redhat.com>
References: <1441699228-25767-1-git-send-email-den@openvz.org>
 <1441699228-25767-5-git-send-email-den@openvz.org>
 <20150908110624.GE4230@noname.redhat.com>
 <55EEC61F.9050301@openvz.org>
 <20150908130556.GF4230@noname.redhat.com>

On 09/08/2015 04:05 PM, Kevin Wolf wrote:
> Am 08.09.2015 um 13:27 hat Denis V. Lunev geschrieben:
>> Interesting point. Yes, it flushes all requests and most likely
>> hangs inside, waiting for requests to complete. But fortunately
>> this happens after the switch to the paused state, so the guest
>> becomes paused. That's why I missed this fact.
>>
>> This could be considered a problem, but I have no good solution
>> at the moment. I should think about it a bit.
> Let me suggest a radically different design. Note that I don't say this
> is necessarily how things should be done, I'm just trying to introduce
> some new ideas and broaden the discussion, so that we have a larger set
> of ideas from which we can pick the right solution(s).
>
> The core of my idea would be a new filter block driver 'timeout' that
> can be added on top of each BDS that could potentially fail, like a
> raw-posix BDS pointing to a file on NFS. This way most pieces of the
> solution are nicely modularised and don't touch the block layer core.
>
> During normal operation the driver would just be passing through
> requests to the lower layer. When it detects a timeout, however, it
> completes the request it received with -ETIMEDOUT. It also completes any
> new request it receives with -ETIMEDOUT without passing the request on
> until the request that originally timed out returns. This is our safety
> measure against anyone seeing whether or how the timed out request
> modified data.
>
> We need to make sure that bdrv_drain() doesn't wait for this request.
> Possibly we need to introduce a .bdrv_drain callback that replaces the
> default handling, because bdrv_requests_pending() in the default
> handling considers bs->file, which would still have the timed out
> request. We don't want to see this; bdrv_drain_all() should complete
> even though that request is still pending internally (externally, we
> returned -ETIMEDOUT, so we can consider it completed). This way the
> monitor stays responsive and background jobs can go on if they don't
> use the failing block device.
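Just to be sure I understand the shape of it, something like this?
(A sketch only: launch_below() and wait_for_completion() are made-up
placeholders for the coroutine/timer plumbing, and .bdrv_drain is the
new callback you propose, so none of this compiles against the current
tree.)

#include "block/block_int.h"

typedef struct TimeoutReq {
    int ret;                  /* completion status from bs->file */
} TimeoutReq;

/* Made-up helpers: submit the request to bs->file in a worker coroutine,
 * and wait for it with a deadline, returning false on expiry. */
TimeoutReq *launch_below(BlockDriverState *file, int64_t sector_num,
                         int nb_sectors, QEMUIOVector *qiov);
bool wait_for_completion(TimeoutReq *req, int64_t timeout_ns);

typedef struct BDRVTimeoutState {
    int64_t timeout_ns;       /* configured per-device deadline */
    bool stuck;               /* a timed out request is still in bs->file */
} BDRVTimeoutState;

static coroutine_fn int timeout_co_writev(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *qiov)
{
    BDRVTimeoutState *s = bs->opaque;

    /* Safety measure: while the original request is stuck below us, fail
     * every new request without passing it on, so nobody can see whether
     * or how the timed out request modified the data. */
    if (s->stuck) {
        return -ETIMEDOUT;
    }

    /* Normal operation: pass the request through to the lower layer... */
    TimeoutReq *req = launch_below(bs->file, sector_num, nb_sectors, qiov);

    /* ...but wait for it only up to the deadline. */
    if (!wait_for_completion(req, s->timeout_ns)) {
        s->stuck = true;      /* the request keeps running internally */
        return -ETIMEDOUT;    /* externally it is completed now */
    }
    return req->ret;
}

static BlockDriver bdrv_timeout = {
    .format_name    = "timeout",
    .instance_size  = sizeof(BDRVTimeoutState),
    .bdrv_co_writev = timeout_co_writev,
    /* plus the new .bdrv_drain callback that, unlike the default
     * bdrv_requests_pending() handling, ignores the request still
     * pending in bs->file, so bdrv_drain_all() can complete */
};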
> And then we essentially reuse the rerror/werror mechanism that we
> already have to stop the VM. The device models would be extended to
> always stop the VM on -ETIMEDOUT, regardless of the error policy. In
> this state, the VM would even be migratable if you make sure that the
> pending request can't modify the image on the destination host any
> more.
>
> Do you think this could work, or did I miss something important?
>
> Kevin

Could I propose an even more radical solution, then? My original
approach was based on the requirement that this code should be
maintainable out-of-tree. If the patch gets merged, that constraint
can be dropped.

Why not invent a 'terror' field in BdrvOptions and handle this in the
core block layer, without a filter? The RB tree entry would simply not
be created if the policy is set to 'ignore'.

Den
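P.S. For concreteness, the core-layer check I have in mind would be
roughly this (field and helper names are invented, only to show the
shape):

    /* on request submission in the core block layer */
    if (bs->terror_policy != TERROR_IGNORE) {
        /* track a deadline for this request in the per-BDS RB tree;
         * on expiry, complete it with -ETIMEDOUT and stop the VM,
         * as with werror=stop/rerror=stop */
        disk_deadlines_insert(bs, req);
    }

and on the command line it would simply sit next to the existing error
options:

    -drive file=disk.img,if=virtio,werror=stop,rerror=stop,terror=stop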