From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5170EBD1.1070602@dlhnet.de>
Date: Fri, 19 Apr 2013 09:01:37 +0200
From: Peter Lieven
MIME-Version: 1.0
References: <1365311880-11800-1-git-send-email-peter.crosthwaite@xilinx.com> <878v4kvt54.fsf@codemonkey.ws> <517001C1.8050005@dlhnet.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] mainloop.c: Keep unlocking BQL during busy-wait spin-out
To: Peter Crosthwaite
Cc: pbonzini@redhat.com, Anthony Liguori, qemu-devel@nongnu.org

On 18.04.2013 16:35, Peter Crosthwaite wrote:
> Hi Peter,
>
> On Fri, Apr 19, 2013 at 12:22 AM, Peter Lieven wrote:
>> On 15.04.2013 15:08, Anthony Liguori wrote:
>>> Peter Crosthwaite writes:
>>>
>>>> Modify Anthony's starvation detection logic to keep the BQL unlocked
>>>> until the starvation condition goes away. Otherwise the counter has to
>>>> count up to 1000 for each needed iteration until the busy-wait is
>>>> lifted.
>>>>
>>>> Reset the counter back to zero once glib_pollfds_fill returns with a
>>>> non-zero timeout (indicating a return to normality). The 1000 iteration
>>>> wait now only happens once on the transition from normal operation to
>>>> busy-wait starvation.
>>>>
>>>> Anthony's original patch fixed the serial paste bug, but this patch is
>>>> also needed to restore performance.
>>>>
>>>> Signed-off-by: Peter Crosthwaite
>>> I'm going through patches for 1.5 candidates.
>>>
>>> I believe the paste performance issue has been resolved now and this
>>> patch is no longer needed. I can't find a definitive statement on the
>>> list for that though.
>>
>> I am also hitting a problem that first occurred after Anthony's original
>> patch.
>> In my testing environment I had 3 vServers that, independently of the
>> load, became heavily unresponsive after reporting
>> "main-loop: WARNING: I/O thread spun for 1000 iterations".
>> Even QMP is not responding from time to time. But I am not using serial.
>> From the load statistics it seems that each of the vServers is using one
>> complete core busy waiting.
>>
>> I haven't seen this before this patch.
> Are you referring to my patch or Anthony's patch here? Does this patch
> introduce a regression (or even a change in behaviour) for you?

I only noticed that after Anthony's patch (or around that time) I got VMs
that became unresponsive after the thread spun notification.

Could the fake timeout that is introduced by this patch somehow itself
cause a problem?

Peter