From: "Denis V. Lunev" <den@openvz.org>
To: Andrey Korolyov <andrey@xdel.ru>
Cc: Kevin Wolf <kwolf@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>,
Raushaniya Maksudova <rmaksudova@virtuozzo.com>
Subject: Re: [Qemu-devel] [PATCH RFC 0/5] disk deadlines
Date: Tue, 8 Sep 2015 13:50:17 +0300
Message-ID: <55EEBD69.5030701@openvz.org>
In-Reply-To: <CABYiri-k_HvR54JYXfAShAcGLs+seyXVJSO7ufnKHdhqXEXuig@mail.gmail.com>
On 09/08/2015 01:37 PM, Andrey Korolyov wrote:
> On Tue, Sep 8, 2015 at 12:41 PM, Denis V. Lunev <den@openvz.org> wrote:
>> On 09/08/2015 12:33 PM, Paolo Bonzini wrote:
>>>
>>> On 08/09/2015 10:00, Denis V. Lunev wrote:
>>>> How does the given solution work?
>>>>
>>>> If the disk-deadlines option is enabled for a drive, the completion
>>>> time of this drive's requests is monitored. The method is as
>>>> follows (further assume that this option is enabled).
>>>>
>>>> Every drive has its own red-black tree for keeping its requests.
>>>> The expiration time of a request is the key, and the cookie (the id
>>>> of the request) is the corresponding node. Assume that every request
>>>> has 8 seconds to be completed. If a request was not accomplished in
>>>> time for some reason (server crash or something else), the timer of
>>>> this drive fires and the corresponding callback requests that the
>>>> Virtual Machine (VM) be stopped.
>>>>
>>>> The VM remains stopped until all requests from the disk which caused
>>>> the VM to stop are completed. Furthermore, if there are other disks
>>>> with 'disk-deadlines=on' whose requests are waiting to be completed,
>>>> the VM is not started; instead, QEMU waits for completion of all
>>>> "late" requests from all disks.
>>>>
>>>> Furthermore, all requests which caused the VM to stop (or those that
>>>> just were not completed in time) can be printed using the
>>>> "info disk-deadlines" QEMU monitor command as follows:
>>> This topic has come up several times in the past.
>>>
>>> I agree that the current behavior is not great, but I am not sure that
>>> timeouts are safe. For example, how is disk-deadlines=on different from
>>> NFS soft mounts? The NFS man page says
>>>
>>> NB: A so-called "soft" timeout can cause silent data corruption in
>>> certain cases. As such, use the soft option only when client
>>> responsiveness is more important than data integrity. Using NFS
>>> over TCP or increasing the value of the retrans option may
>>> mitigate some of the risks of using the soft option.
>>>
>>> Note how it only says "mitigate", not solve.
>>>
>>> Paolo
>> This solution is far from perfect, as there is a race window for
>> request completion anyway. Still, the number of failures is
>> reduced by 2-3 orders of magnitude.
>>
>> The behavior is similar not to soft mounts, which could
>> corrupt the data, but to hard mounts, which are the default AFAIR.
>> It will not corrupt the data and should patiently wait for
>> the request to complete.
>>
>> Without the disk, the guest is not able to serve any request, and
>> thus keeping it running does not make much sense.
>>
>> This approach has been used by Odin in production for years, and
>> we were able to seriously reduce the number of end-user
>> complaints. We were unable to invent any reasonable
>> solution without guest modification or timeout tuning.
>>
>> Anyway, this code is off by default, storage-agnostic, and separated.
>> Yes, we will be able to maintain it for ourselves out-of-tree, but...
>> Den
>>
> Thanks, the series looks very promising. I have a rather side question:
> assuming that we have a guest for which scsi/ide usage is the only
> option, wouldn't the timekeeping issues from the pause/resume action
> be a corner problem there?
I do not think so. The guest can be paused/suspended by the
management layer and resumed. Normally it takes some time
for the guest to start seeing the time difference, and the
speedup is limited.
> The assumption is based on the fact that
> guests with appropriate kvmclock settings can handle the
> resulting timer jump rather softly, and at the same time they are
> mostly not bound to the 'legacy' storage interfaces, but guests with
> interfaces that are not prone to time-outs can commonly misbehave
> from a large timer jump as well. For IDE, the approach proposed by
> the patch is the only option, and for SCSI it is better to tune the
> guest driver timeout instead, if the guest OS allows that. So yes, a
> description of possible drawbacks would be very useful there.
OK. I will add a note about this.
Though there are cases when this timeout cannot be
tuned at all, even for SCSI; e.g. Windows will BSOD
with 0x7B early on boot without this solution applied,
and I do not know good ways to tweak this timeout
in the guest. It is far too specific.
Den