From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S940439AbdAGHmm (ORCPT ); Sat, 7 Jan 2017 02:42:42 -0500
Received: from out1-smtp.messagingengine.com ([66.111.4.25]:42024 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751950AbdAGHmf (ORCPT ); Sat, 7 Jan 2017 02:42:35 -0500
X-ME-Sender:
X-Sasl-enc: S7vavvj+ACF0TuADi/q/OBnOB9zaCsNxBYjMAprjyy0t 1483774953
Date: Sat, 7 Jan 2017 08:42:33 +0100
From: Greg KH
To: Long Li
Cc: KY Srinivasan , Haiyang Zhang , "devel@linuxdriverproject.org" , "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
Message-ID: <20170107074233.GB18087@kroah.com>
References: <1483582340-26770-1-git-send-email-longli@exchange.microsoft.com> <20170105074742.GB4547@kroah.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.7.2 (2016-11-26)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jan 07, 2017 at 07:23:14AM +0000, Long Li wrote:
> > -----Original Message-----
> > From: Greg KH [mailto:greg@kroah.com]
> > Sent: Wednesday, January 04, 2017 11:48 PM
> > To: Long Li
> > Cc: KY Srinivasan ; Haiyang Zhang ; devel@linuxdriverproject.org; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
> >
> > On Wed, Jan 04, 2017 at 06:12:20PM -0800, Long Li wrote:
> > > From: Long Li
> > >
> > > Hyper-v host guarantees that a hypercall will finish in reasonable time.
> > > Retry infinitely on transient failures to avoid returning error to upper layer.
> >
> > Again, never retry "forever", always have a way out, otherwise you will crash.
> >
> > And again, why are you making this change? What problem does it solve?
>
> The problem it tries to solve is that in this code we are returning
> error prematurely on transient failures. The hypercall is used mostly
> in channel establishment. If we return a transient failure, the VM may
> not boot or not useful after boot due to some devices missing.
>
> Another approach is to increase the number of retries. But we don't
> know how many retries is safe, and Windows host side expects the guest
> retry infinitely and not return error on transient failures.

That implies a lot of trust in the host side, don't you think?

Worst case, make the delay a minute or so, but give the system a way out
in case there's a bug in the host. As there will be bugs in the host,
just like there are bugs in the client :)

thanks,

greg k-h
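
For illustration only, the "long delay, but with a way out" idea could look roughly like the userspace sketch below. It is not the hv driver code: the status codes, `call_with_budget()`, `flaky_call()` and `no_delay()` are hypothetical names made up for this example, and the one-minute budget is simply the figure mentioned above.

```c
#include <stddef.h>

/* Hypothetical status codes; the real Hyper-V values are not shown here. */
enum hc_status { HC_SUCCESS = 0, HC_TRANSIENT = 1, HC_FATAL = 2 };

typedef enum hc_status (*hc_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int ms);

/*
 * Call fn() until it stops reporting a transient failure, or until the
 * time spent waiting between attempts reaches budget_ms. The last status
 * is returned, so the caller sees HC_TRANSIENT if the budget ran out.
 */
static enum hc_status call_with_budget(hc_fn fn, void *ctx, delay_fn delay,
                                       unsigned int step_ms,
                                       unsigned int budget_ms)
{
    unsigned int waited_ms = 0;
    enum hc_status st;

    if (step_ms == 0)
        step_ms = 1;            /* a zero step would never use up the budget */

    for (;;) {
        st = fn(ctx);
        if (st != HC_TRANSIENT)
            return st;          /* success or a hard error */
        if (waited_ms >= budget_ms)
            return st;          /* the way out: give up and report it */
        delay(step_ms);
        waited_ms += step_ms;
    }
}

/* Test double: fails transiently *remaining times, then succeeds. */
static enum hc_status flaky_call(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return HC_TRANSIENT;
    }
    return HC_SUCCESS;
}

/* Stand-in for a real sleep so the example returns immediately. */
static void no_delay(unsigned int ms)
{
    (void)ms;
}
```

With a 1000 ms step and a 60000 ms budget, a call that fails three times still succeeds, while a host that never recovers makes the loop return HC_TRANSIENT after a bounded number of attempts instead of spinning forever. In kernel code the delay and the deadline would come from the kernel's own timing helpers rather than a callback; that part is left out here on purpose.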