From mboxrd@z Thu Jan 1 00:00:00 1970 From: BVK Chaitanya Subject: Re: On netfront accelerator add/remove watches Date: Wed, 30 Jul 2008 10:28:50 +0530 Message-ID: <488FF50A.5000404@symantec.com> References: <488E8F39.4020406@symantec.com> <488EFA50.1070708@solarflare.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <488EFA50.1070708@solarflare.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Neil Turton Cc: Xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org Neil Turton wrote: > Hi, > > BVK Chaitanya wrote: >> I see that netfront_accel_add_watch and netfront_accel_remove_watch >> functions are _not_ protected by accelerator_mutex in accel.c Is there >> any specific reason for this? > > Yes. These functions need to be synchronised by the callers. Adding a > mutex here would ensure that they didn't execute at the same time, but > wouldn't impose any order on the calls. This matters because add > followed by remove is different from remove followed by add. The > callers need to decide which order they should be executed in. > > So the watch is only added/removed from a xenbus callback. I think > these callbacks should be synchronised by xenbus. Can someone confirm that? OK. I understand it now. > >> I see that they sometimes get called twice (and result in BUG_ON) in >> very fast (20ms) domain suspend-resume cycles and I couldn't figure out >> how it is possible :-( > > Is that the BUG_ON in netfront_accelerator_add_watch? One possible > explanation is that suspend_cancel is called and then otherend_changed > is called. Can you add a printk to netfront_suspend_cancel to see if it > gets called just before the BUG_ON gets triggered? > Yes, BUG_ON was from netfront_accelerator_add_watch function. I think i got the problem: xen_suspend which calls suspend_cancel is not serialized properly. Under heavy load and very fine suspend-resume cycles, multiple suspend_cancel instances can be running simultaneously. regards, -- bvk-chaitanya