From: "Denis V. Lunev"
Date: Tue, 8 Sep 2015 13:50:17 +0300
Message-ID: <55EEBD69.5030701@openvz.org>
References: <1441699228-25767-1-git-send-email-den@openvz.org> <55EEAB55.2070908@redhat.com> <55EEAD4B.6090609@openvz.org>
Subject: Re: [Qemu-devel] [PATCH RFC 0/5] disk deadlines
To: Andrey Korolyov
Cc: Kevin Wolf, Paolo Bonzini, "qemu-devel@nongnu.org", Stefan Hajnoczi, Raushaniya Maksudova

On 09/08/2015 01:37 PM, Andrey Korolyov wrote:
> On Tue, Sep 8, 2015 at 12:41 PM, Denis V. Lunev wrote:
>> On 09/08/2015 12:33 PM, Paolo Bonzini wrote:
>>>
>>> On 08/09/2015 10:00, Denis V. Lunev wrote:
>>>> How does the given solution work?
>>>>
>>>> If the disk-deadlines option is enabled for a drive, the completion
>>>> time of this drive's requests is tracked. The method is as follows
>>>> (further assume that this option is enabled).
>>>>
>>>> Every drive has its own red-black tree for keeping its requests.
>>>> The expiration time of a request is the key, and its cookie (the id
>>>> of the request) is the corresponding node. Assume that every request
>>>> has 8 seconds to be completed. If a request is not completed in time
>>>> for some reason (server crash or something else), the timer of this
>>>> drive fires and an appropriate callback requests to stop the Virtual
>>>> Machine (VM).
>>>>
>>>> The VM remains stopped until all requests from the disk which caused
>>>> the VM's stopping are completed. Furthermore, if there are other
>>>> disks with 'disk-deadlines=on' whose requests are waiting to be
>>>> completed, the VM is not restarted: it waits for the completion of
>>>> all "late" requests from all disks.
>>>>
>>>> Furthermore, all requests which caused the VM stopping (or those
>>>> that just were not completed in time) can be printed using the
>>>> "info disk-deadlines" qemu monitor command as follows:
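For reference, the bookkeeping described above boils down to something
like the toy model below. This is an illustration only: every name in
it is invented, and the real code uses a proper red-black tree and a
timer instead of the sorted list and the explicit polling used here.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { DEADLINE_SEC = 8 };          /* per-request completion budget  */

typedef struct Req {
    long expire;                    /* key: absolute expiration time  */
    uint64_t cookie;                /* id of the in-flight request    */
    struct Req *next;
} Req;

typedef struct Drive {
    Req *pending;                   /* kept sorted, earliest first    */
} Drive;

static bool vm_running = true;

/* Remember a request when it is submitted to the storage backend. */
static void drive_track(Drive *d, uint64_t cookie, long now)
{
    Req *r = malloc(sizeof(*r));
    Req **p = &d->pending;

    r->expire = now + DEADLINE_SEC;
    r->cookie = cookie;
    while (*p && (*p)->expire <= r->expire) {
        p = &(*p)->next;
    }
    r->next = *p;
    *p = r;
}

/* Forget a request when its completion finally arrives. */
static void drive_complete(Drive *d, uint64_t cookie)
{
    for (Req **p = &d->pending; *p; p = &(*p)->next) {
        if ((*p)->cookie == cookie) {
            Req *victim = *p;
            *p = victim->next;
            free(victim);
            return;
        }
    }
}

/* The list is sorted, so only its head has the earliest deadline. */
static bool drive_has_overdue(const Drive *d, long now)
{
    return d->pending && d->pending->expire <= now;
}

/* Stand-in for the timer callback: stop the VM instead of letting the
 * guest time out, and resume only once no drive has late requests. */
static void deadlines_poll(Drive *drives, size_t n, long now)
{
    bool overdue = false;

    for (size_t i = 0; i < n; i++) {
        overdue |= drive_has_overdue(&drives[i], now);
    }
    if (overdue && vm_running) {
        vm_running = false;
        printf("t=%ld: late request, stopping the VM\n", now);
    } else if (!overdue && !vm_running) {
        vm_running = true;
        printf("t=%ld: all late requests done, resuming the VM\n", now);
    }
}

int main(void)
{
    Drive drv = { NULL };

    drive_track(&drv, 1, 0);        /* request 1 submitted at t=0     */
    deadlines_poll(&drv, 1, 5);     /* t=5: still within the budget   */
    deadlines_poll(&drv, 1, 9);     /* t=9: deadline blown, VM stops  */
    drive_complete(&drv, 1);        /* the completion finally arrives */
    deadlines_poll(&drv, 1, 10);    /* t=10: VM resumes               */
    return 0;
}

Compiled as is, this prints a "stop" at t=9, when request 1 outlives
its 8-second budget, and a "resume" at t=10, once the completion
finally arrives.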
>>> This topic has come up several times in the past.
>>>
>>> I agree that the current behavior is not great, but I am not sure
>>> that timeouts are safe. For example, how is disk-deadlines=on
>>> different from NFS soft mounts? The NFS man page says
>>>
>>>     NB: A so-called "soft" timeout can cause silent data corruption
>>>     in certain cases. As such, use the soft option only when client
>>>     responsiveness is more important than data integrity. Using NFS
>>>     over TCP or increasing the value of the retrans option may
>>>     mitigate some of the risks of using the soft option.
>>>
>>> Note how it only says "mitigate", not solve.
>>>
>>> Paolo
>> This solution is far from perfect, as there is still a race window
>> for request completion anyway. Still, the number of failures is
>> reduced by 2-3 orders of magnitude.
>>
>> The behavior is similar not to soft mounts, which can corrupt the
>> data, but to hard mounts, which are the default AFAIR. It will not
>> corrupt the data and will patiently wait for the request to complete.
>>
>> Without the disk the guest is not able to serve any requests, so
>> keeping it running does not make much sense.
>>
>> This approach has been used by Odin in production for years, and we
>> were able to significantly reduce the number of end-user complaints.
>> We were unable to invent any reasonable solution without guest
>> modification/timeout tuning.
>>
>> Anyway, this code is off by default, storage-agnostic and well
>> separated. Yes, we would be able to maintain it for ourselves
>> out-of-tree, but...
>>
>> Den
>>
> Thanks, the series looks very promising. I have a rather side
> question: assuming that we have a guest for which scsi/ide usage is
> the only option, wouldn't the timekeeping issues from the
> pause/resume action be a corner-case problem there?

I do not think so. The guest can be paused/suspended and resumed by
the management layer anyway. Normally it takes some time for the guest
to start seeing the time difference, and the catch-up speedup is
limited.

> The assumption is based on the fact that guests with appropriate
> kvmclock settings can handle the resulting timer jump rather softly,
> and at the same time such guests are mostly not bound to the 'legacy'
> storage interfaces; but guests with interfaces which are not prone to
> timing out can commonly misbehave after a large timer jump as well.
> For IDE, the approach proposed by the patch is the only option, while
> for SCSI it is better to tune the guest driver timeout instead, if
> the guest OS allows that. So yes, a description of the possible
> drawbacks would be very useful there.

OK, I will add a note about this. Though there are cases when this
timeout cannot be tuned at all, even in the SCSI case: e.g. Windows
will BSOD with STOP 0x7B early in boot without this solution applied,
and I do not know a good way to tweak that timeout in the guest. It is
far too specific.

Den
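P.S. To make the "tune the guest driver timeout" alternative above
concrete for Linux guests: the per-device SCSI command timeout is the
standard sysfs knob /sys/block/<dev>/device/timeout (value in seconds,
30 by default), so in practice raising it is a one-line
"echo 300 > /sys/block/sda/device/timeout" or a udev rule. Spelled out
as a sketch, with "sda" as a made-up example device:

#include <stdio.h>

int main(void)
{
    /* Linux SCSI command timeout for one disk, in seconds; writing it
     * needs root inside the guest. "sda" is just an example device. */
    const char *path = "/sys/block/sda/device/timeout";
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return 1;
    }
    fprintf(f, "%d\n", 300);    /* 5 minutes instead of the default 30 */
    return fclose(f) == 0 ? 0 : 1;
}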