From: Badari Pulavarty
Date: Fri, 15 Apr 2011 16:33:51 -0700
Subject: Re: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel@nongnu.org,
 Khoa Huynh, pbadari@linux.vnet.ibm.com, Christoph Hellwig
Message-ID: <4DA8D5DF.5070503@us.ibm.com>
In-Reply-To: <4DA8CE00.3090907@us.ibm.com>
References: <1302874855-14736-1-git-send-email-stefanha@linux.vnet.ibm.com>
 <20110415150513.GA29619@lst.de> <20110415153448.GA30116@lst.de>
 <1302884634.32391.3.camel@badari-desktop> <20110415172909.GB303@lst.de>
 <4DA8C4F0.4080507@us.ibm.com> <4DA8CE00.3090907@us.ibm.com>

On 4/15/2011 4:00 PM, Anthony Liguori wrote:
> On 04/15/2011 05:21 PM, pbadari@linux.vnet.ibm.com wrote:
>> On 4/15/2011 10:29 AM, Christoph Hellwig wrote:
>>> On Fri, Apr 15, 2011 at 09:23:54AM -0700, Badari Pulavarty wrote:
>>>> True. That brings up a different question - whether we are doing
>>>> enough testing on mainline QEMU :(
>>> It seems you're clearly not doing enough testing on any qemu. Even
>>> the RHEL6 qemu has had preadv/pwritev since the first beta.
>>
>> Christoph,
>>
>> When you say "you're" - you really mean RH, right? RH should have
>> caught this in their regression testing a year ago, as part of their
>> first beta. Correct?
>>
>> Unfortunately, you are picking on the person who spent time finding
>> and analyzing the regression, narrowing down the problem area, and
>> suggesting approaches to address the issue :(
>
> This is a pretty silly discussion to be having.
>
> The facts are:
>
> 1) NFS sucks with preadv/pwritev and O_DIRECT -- is anyone really
> surprised?
>
> 2) We could work around this in QEMU by doing something ugly
>
> 3) We have no way to detect when we no longer need a work around, which
> makes (2) really unappealing.
>
> 4) That leaves us with:
>    a) waiting for NFS to get fixed properly and just living with
>       worse performance on older kernels
>
>    b) having a user-tunable switch to enable bouncing
>
> I really dislike the idea of (b) because we're stuck with it forever
> and it's yet another switch for people to mistakenly depend on.
>
> I'm still waiting to see performance data without O_DIRECT.
> I suspect that using cache=writethrough will make most of this problem
> go away, in which case we can just recommend that as a workaround until
> NFS is properly fixed.

We need to run through all the cases and analyze the performance of
cache=writethrough. Our initial (smaller setup) analysis indicates that
it's better than unpatched O_DIRECT, but ~5% slower for sequential writes
and 30%+ slower for random read/writes and mixed I/O workloads. (In the
past, NFS O_SYNC performance was extremely poor compared to O_DIRECT, with
no scaling on older kernels due to congestion control issues.) Khoa will
collect the data over the next few days.

To be honest with you, we should kill cache=none, optimize just the one
case, and live with it (like the other commercial hypervisor). :(

Thanks,
Badari
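For reference, a rough sketch of what the bounce-buffer workaround in (2)
could look like: copy the scattered iovec segments into one aligned buffer
and issue a single pwrite(), so the NFS client sees one linear O_DIRECT
request instead of one round trip per segment. This is only illustrative
(the helper name, error handling, and alignment parameter are made up
here), not the actual patch:

/* Illustrative sketch only - not the actual raw-posix patch. */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/uio.h>

static ssize_t pwrite_linearized(int fd, const struct iovec *iov,
                                 int iovcnt, off_t offset, size_t align)
{
    size_t len = 0, off = 0;
    void *bounce;
    ssize_t ret;
    int i;

    for (i = 0; i < iovcnt; i++) {
        len += iov[i].iov_len;
    }

    /* O_DIRECT needs an aligned buffer, so use posix_memalign(). */
    if (posix_memalign(&bounce, align, len) != 0) {
        return -1;
    }

    /* Copy the scattered segments into one contiguous buffer... */
    for (i = 0; i < iovcnt; i++) {
        memcpy((char *)bounce + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    /* ...so the kernel sees a single linear direct write. */
    ret = pwrite(fd, bounce, len, offset);

    free(bounce);
    return ret;
}

The read side would be the mirror image: one pread() into the bounce
buffer, then memcpy() back out into the iovec segments.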