From: Badari Pulavarty
Date: Fri, 15 Apr 2011 16:33:51 -0700
Subject: Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel@nongnu.org,
 Khoa Huynh, pbadari@linux.vnet.ibm.com, Christoph Hellwig
Message-ID: <4DA8D5DF.5070503@us.ibm.com>
In-Reply-To: <4DA8CE00.3090907@us.ibm.com>
References: <1302874855-14736-1-git-send-email-stefanha@linux.vnet.ibm.com>
 <20110415150513.GA29619@lst.de> <20110415153448.GA30116@lst.de>
 <1302884634.32391.3.camel@badari-desktop> <20110415172909.GB303@lst.de>
 <4DA8C4F0.4080507@us.ibm.com> <4DA8CE00.3090907@us.ibm.com>

On 4/15/2011 4:00 PM, Anthony Liguori wrote:
> On 04/15/2011 05:21 PM, pbadari@linux.vnet.ibm.com wrote:
>> On 4/15/2011 10:29 AM, Christoph Hellwig wrote:
>>> On Fri, Apr 15, 2011 at 09:23:54AM -0700, Badari Pulavarty wrote:
>>>> True. That brings up a different question - whether we are doing
>>>> enough testing on mainline QEMU :(
>>> It seems you're clearly not doing enough testing on any qemu. Even
>>> the RHEL6 qemu has had preadv/pwritev since the first beta.
>>
>> Christoph,
>>
>> When you say "you're" - you really mean RH, right? RH should have
>> caught this in their regression testing a year ago, as part of their
>> first beta. Correct?
>>
>> Unfortunately, you are picking on the person who spent time finding
>> and analyzing the regression, narrowing down the problem area, and
>> suggesting approaches to address the issue :(
>
> This is a pretty silly discussion to be having.
>
> The facts are:
>
> 1) NFS sucks with preadv/pwritev and O_DIRECT -- is anyone really
> surprised?
>
> 2) We could work around this in QEMU by doing something ugly
>
> 3) We have no way to detect when we no longer need a work around, which
> makes (2) really unappealing.
>
> 4) That leaves us with:
>    a) waiting for NFS to get fixed properly and just living with
>       worse performance on older kernels
>
>    b) having a user-tunable switch to enable bouncing
>
> I really dislike the idea of (b) because we're stuck with it forever
> and it's yet another switch for people to mistakenly depend on.
>
> I'm still waiting to see performance data without O_DIRECT.
> I suspect that using cache=writethrough will make most of this problem
> go away, in which case we can just recommend that as a workaround until
> NFS is properly fixed.

We need to run through all the cases and analyze the performance of
cache=writethrough. Our initial (smaller setup) analysis indicates that
it's better than unpatched O_DIRECT, but ~5% slower for sequential writes
and 30%+ slower for random read/writes and mixed I/O workloads. (In the
past, NFS O_SYNC performance was extremely poor compared to O_DIRECT, with
no scaling on older kernels due to congestion control issues.) Khoa will
collect the data over the next few days.

To be honest with you, we should kill cache=none, optimize just the one
case, and live with it (like the other commercial hypervisor). :(

Thanks,
Badari
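For reference, a rough sketch of what the bounce-buffer workaround in (2)
could look like: copy the scattered iovec segments into one aligned buffer
and issue a single pwrite(), so the NFS client sees one linear O_DIRECT
request instead of one round trip per segment. This is only illustrative
(the helper name, error handling, and alignment parameter are made up
here), not the actual patch:

/* Illustrative sketch only - not the actual raw-posix patch. */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/uio.h>

static ssize_t pwrite_linearized(int fd, const struct iovec *iov,
                                 int iovcnt, off_t offset, size_t align)
{
    size_t len = 0, off = 0;
    void *bounce;
    ssize_t ret;
    int i;

    for (i = 0; i < iovcnt; i++) {
        len += iov[i].iov_len;
    }

    /* O_DIRECT needs an aligned buffer, so use posix_memalign(). */
    if (posix_memalign(&bounce, align, len) != 0) {
        return -1;
    }

    /* Copy the scattered segments into one contiguous buffer... */
    for (i = 0; i < iovcnt; i++) {
        memcpy((char *)bounce + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    /* ...so the kernel sees a single linear direct write. */
    ret = pwrite(fd, bounce, len, offset);

    free(bounce);
    return ret;
}

The read side would be the mirror image: one pread() into the bounce
buffer, then memcpy() back out into the iovec segments.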