In-Reply-To: <874o7m6ris.fsf@linux.vnet.ibm.com>
Date: Wed, 2 Mar 2011 10:20:41 +0000
Subject: Re: [Qemu-devel] [PATCH -V2 4/6] hw/9pfs: Implement syncfs
From: Stefan Hajnoczi
List-Id: qemu-devel.nongnu.org
To: "Aneesh Kumar K. V"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org

On Wed, Mar 2, 2011 at 5:05 AM, Aneesh Kumar K. V wrote:
> On Tue, 1 Mar 2011 20:27:19 +0000, Stefan Hajnoczi wrote:
>> On Tue, Mar 1, 2011 at 6:02 PM, Aneesh Kumar K. V wrote:
>> > On Tue, 1 Mar 2011 15:59:19 +0000, Stefan Hajnoczi wrote:
>> >> >> Please explain the semantics of P9_TSYNCFS.  Won't returning
>> >> >> success without doing anything lead to data integrity issues?
>> >> >
>> >> > I should actually include the 9P operation format in the commit
>> >> > message; I will add it in the next update.  Whether returning here
>> >> > would cause a data integrity issue depends on what sort of
>> >> > guarantee we want to provide.  Calling sync on the guest causes
>> >> > all the dirty pages in the guest to be flushed to the host.  All
>> >> > those changes are then in the host page cache and it would be
>> >> > nice to flush them as part of sync, but since we don't have a
>> >> > per-filesystem sync, that would imply flushing all dirty pages on
>> >> > the host, which can have a large performance impact.
>> >>
>> >> You get to define the semantics of P9_TSYNCFS?  I thought this was
>> >> part of a well-defined protocol :).  If this is a .L extension then
>> >> it's probably a bad design and shouldn't be added to the protocol
>> >> if we can't implement it.
>> >
>> > It is part of the .L extension and we can definitely implement it.
>> > There is a patch out there which is yet to be merged:
>> >
>> > http://thread.gmane.org/gmane.linux.file-systems/44628
>>
>> A future Linux-only ioctl :/.
>>
>> >> Is this operation supposed to flush the disk write cache too?
>> >
>> > I am not sure we need to specify that as part of the 9p operation.
>> > I guess we can only say "maximum possible data integrity".  Whether
>> > a sync causes a disk write cache flush depends on the file system.
>> > For ext* that can be controlled by the barrier mount option.
>>
>> So on a host with a safe configuration this operation should put data
>> on stable storage?
>>
>> >> I think virtio-9p has a file descriptor cache.  Would it be
>> >> possible to fsync() those file descriptors?
>> >
>> > Ideally we should, but that would involve a large number of fsync
>> > calls.
>>
>> Yep, that's why this is a weird operation to support, especially
>> since it's a .L add-on and not original 9P.  What's the use case,
>> since today's Linux userland cannot directly make use of this
>> operation?  I guess it has been added in order to pass through a
>> Linux-internal VFS superblock sync function?
>
> IMHO it would be nice to have a syncfs 9p operation because it enables
> the client to say "if possible, flush the dirty data on the server".
> I guess we should consider this as something the server can choose to
> ignore.  In a cloud setup even doing a per-filesystem sync can imply a
> performance impact because a VirtFS export may not map 1:1 to a mount
> point on the host.  There is also a plan to add a new option to the
> VirtFS export point which makes writes to exported files either O_SYNC
> or O_DIRECT, similar to the way it is done for image files.  That
> would mean Tsyncfs has little to do because we no longer have dirty
> data in the host page cache.
>
> So from the 9p .L protocol point of view, it is a valid operation
> which enables the client to request a flush of the server cache if
> possible, and the qemu 9p server chooses to ignore it because of the
> performance impact.  If you are not comfortable with doing nothing
> specific in the Tsyncfs operation, we can add a sync(2) call as part
> of this 9p operation and later switch to FS_IOC_SYNCFS when it becomes
> available.

The case we need to prevent is where applications running on virtfs
think they are getting guarantees that the implementation does not
provide.  Silently turning a sync operation into a no-op is a step in
that direction, so I agree that sync(2) would be safer.  (Rough
sketches of the interim fsync()/sync(2) approaches and a possible
per-filesystem variant are appended below.)

I'm not sure I understand your 1:1 mount point mapping argument.  The
FS_IOC_SYNCFS ioctl does not help us there since it syncs the entire
filesystem, not the directory tree that virtfs is mapped to.  It will
do a bunch of extra I/O, similar to how sync(2) does this across all
filesystems today.  This suggests again that P9_TSYNCFS is hard to
implement because FS_IOC_SYNCFS ends up not being useful.  Did I miss
something?

I'm looking for a use case where guests need a P9_TSYNCFS operation.
P9_TSYNCFS is not in linux-2.6 yet, so I don't have any example code
that exploits it.  Can you point me to something that shows off why
this operation is necessary?  It must be an optimization if 9P and NFS
make do without an equivalent?

Stefan
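
To make the trade-off concrete, here is a minimal sketch of the two
interim approaches discussed above: fsync()ing every file descriptor in
the server's fid cache versus a single sync(2) call.  This is not the
actual hw/9pfs code; the struct and function names are illustrative
assumptions.

/* Hypothetical sketch, not the real hw/9pfs handler. */
#include <unistd.h>
#include <errno.h>

struct fid_entry {
    int fd;                     /* open fd from the server's fid cache */
    struct fid_entry *next;
};

/* Option A: flush every cached fd individually (many fsync calls). */
static int tsyncfs_fsync_all(struct fid_entry *fids)
{
    struct fid_entry *f;
    int ret = 0;

    for (f = fids; f; f = f->next) {
        if (fsync(f->fd) < 0 && ret == 0) {
            ret = -errno;       /* remember the first failure, keep going */
        }
    }
    return ret;
}

/* Option B: one sync(2) call; flushes every filesystem on the host. */
static int tsyncfs_sync_all(void)
{
    sync();                     /* no error reporting, affects all mounts */
    return 0;
}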
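
And a sketch of the per-filesystem variant, assuming the proposed
interface eventually lands as a system call taking a file descriptor on
the target filesystem (called syncfs() below); the export_root_fd
parameter is likewise an assumption, standing in for an fd opened on
the export root.

/* Hypothetical sketch: assumes a per-filesystem syncfs(fd) call exists. */
#define _GNU_SOURCE
#include <unistd.h>
#include <errno.h>

static int tsyncfs_per_fs(int export_root_fd)
{
    /*
     * Flushes only the filesystem containing export_root_fd rather than
     * the whole host, but still more than just the exported subtree.
     */
    if (syncfs(export_root_fd) < 0) {
        return -errno;
    }
    return 0;
}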