From mboxrd@z Thu Jan 1 00:00:00 1970 From: Olaf Hering Subject: Re: [PATCH 2/3] Handle GNTST_eagain in kernel drivers Date: Mon, 2 Jan 2012 17:06:12 +0100 Message-ID: <20120102160611.GB12529@aepfle.de> References: <1324092141-9730-1-git-send-email-adin@scannell.ca> <1324092141-9730-3-git-send-email-adin@scannell.ca> <20111217143015.GD14816@konrad-lan> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Content-Disposition: inline In-Reply-To: <20111217143015.GD14816@konrad-lan> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Konrad Rzeszutek Wilk Cc: andres@gridcentric.ca, xen-devel@lists.xensource.com, Adin Scannell , JBeulich@suse.com, adin@gridcentric.com List-Id: xen-devel@lists.xenproject.org On Sat, Dec 17, Konrad Rzeszutek Wilk wrote: > > +#define gnttab_check_GNTST_eagain_while(__HCop, __HCarg_p) \ > > So why does this have to be a macro? What is the advantage of that > versus making this a function? I dont remember why I turned this into a macro instead of a function. > > +do { \ > > + u8 __hc_delay = 1; \ > > + int __ret; \ > > + while (unlikely((__HCarg_p)->status == GNTST_eagain && __hc_delay)) { \ > > + msleep(__hc_delay++); \ > > Ugh. Not sure what happend to this, but there are tons of '\' at the > end. A multiline macro needs backslashes at the end. > So why msleep? Why not go for a proper yield? Call the safe_halt() > function? It needs some interuptible sleep, whatever is best in this context. > > + __ret = HYPERVISOR_grant_table_op(__HCop, (__HCarg_p), 1); \ > > + BUG_ON(__ret); \ > > + } \ > > + if (__hc_delay == 0) { \ > > So this would happen if we rolled over __hc_delay, so did this more than > 255 times? Presumarily this can happen if the swapper in dom0 crashes.. Or if something in the paging paths goes wrong. > I would recommend you use 'WARN' here and include tons of details. > This is a pretty serious issues, is it not? Either the host is really busy and cant page-in quick enough even after so-many seconds. Or something in the pager/hypervisor does not work right. In either case, a backtrace wont help much as it does only cause noise. The printk below prints the function name (I think thats the reason why it is a macro) to give some hint. > > + printk(KERN_ERR "%s: gnt busy\n", __func__,); \ > > + (__HCarg_p)->status = GNTST_bad_page; \ > > + } \ > > + if ((__HCarg_p)->status != GNTST_okay) \ > > + printk(KERN_ERR "%s: gnt status %x\n", \ > > + __func__, (__HCarg_p)->status); \ > > Use GNTTABOP_error_msgs. Also should we continue? What is the > impact if we do continue? The times this is hit is if the status > is not GNTS_okay and if it is not GNTS_eagain - so what are the > situations in which this happens and what can the domain do > to recover? Should there be some helpfull message instead of > just "gnt status: X" ? The caller has to deal with the various !GNTST_okay states anyway, this patch wont change that fact. Olaf