Date: Fri, 9 Sep 2016 12:38:51 +0200
From: Kevin Wolf
Message-ID: <20160909103851.GA6682@noname.redhat.com>
Subject: Re: [Qemu-devel] vxhs caching behaviour (was: [PATCH v4 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support)
To: ashish mittal
Cc: Jeff Cody, qemu-devel@nongnu.org, Paolo Bonzini, Markus Armbruster, "Daniel P. Berrange", Ashish Mittal, Stefan Hajnoczi, Ketan.Nilangekar@veritas.com, Abhijit.Dey@veritas.com

Am 08.09.2016 um 22:46 hat ashish mittal geschrieben:
> Hi Kevin,
>
> By design, our writeback cache is on non-volatile SSD device. We do
> async writes to this cache and also maintain a persistent index map of
> the data written. This gives us the capability to recover write-back
> cache if needed.

So your server application uses something like O_SYNC to write data to
the cache SSD, so that the data isn't sitting only in the kernel page
cache or in a volatile write cache of the SSD?
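
For illustration, "durable" here would mean something roughly like the
following (a hypothetical sketch, not VxHS code; the function name and
arguments are made up): either the cache file is opened with
O_SYNC/O_DSYNC, or every write is paired with an explicit fdatasync().

/* Hypothetical sketch of a durable write to the cache SSD; not VxHS code. */
#include <unistd.h>

static int cache_write_durable(int cache_fd, const void *buf,
                               size_t len, off_t off)
{
    ssize_t n = pwrite(cache_fd, buf, len, off);

    if (n < 0 || (size_t)n != len) {
        return -1;
    }

    /* Without O_SYNC/O_DSYNC on the open(), an explicit fdatasync() is
     * needed so the data reaches stable storage instead of staying in
     * the kernel page cache or the SSD's volatile write cache. */
    return fdatasync(cache_fd);
}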

Kevin

> On Thu, Sep 8, 2016 at 7:20 AM, Kevin Wolf wrote:
> > Am 08.09.2016 um 16:00 hat Jeff Cody geschrieben:
> >> > >> +/*
> >> > >> + * This is called by QEMU when a flush gets triggered from within
> >> > >> + * a guest at the block layer, either for IDE or SCSI disks.
> >> > >> + */
> >> > >> +int vxhs_co_flush(BlockDriverState *bs)
> >> > >> +{
> >> > >> +    BDRVVXHSState *s = bs->opaque;
> >> > >> +    uint64_t size = 0;
> >> > >> +    int ret = 0;
> >> > >> +
> >> > >> +    ret = qemu_iio_ioctl(s->qnio_ctx,
> >> > >> +            s->vdisk_hostinfo[s->vdisk_cur_host_idx].vdisk_rfd,
> >> > >> +            VDISK_AIO_FLUSH, &size, NULL, IIO_FLAG_SYNC);
> >> > >> +
> >> > >> +    if (ret < 0) {
> >> > >> +        /*
> >> > >> +         * Currently not handling the flush ioctl
> >> > >> +         * failure because of network connection
> >> > >> +         * disconnect. Since all the writes are
> >> > >> +         * commited into persistent storage hence
> >> > >> +         * this flush call is noop and we can safely
> >> > >> +         * return success status to the caller.
> >> > >
> >> > > I'm not sure I understand here. Are you saying the qemu_iio_ioctl() call
> >> > > above is a noop?
> >> >
> >> > Yes, qemu_iio_ioctl(VDISK_AIO_FLUSH) is only a place-holder at present
> >> > in case we later want to add some functionality to it. I have now
> >> > added a comment to this affect to avoid any confusion.
> >>
> >> The problem is you don't know which version of the qnio library any given
> >> QEMU binary will be using, since it is a shared library. Future versions
> >> may implement the flush ioctl as expressed above, in which case we may hide
> >> a valid error.
> >>
> >> Am I correct in assuming that this call suppresses errors because an error
> >> is returned for an unknown ioctl operation of VDISK_AIO_FLUSH? If so, and
> >> you want a placeholder here for flushing, you should go all the way and stub
> >> out the underlying ioctl call to return success. Then QEMU can at least
> >> rely on the error return from the flush operation.
> >
> > So what's the story behind the missing flush command?
> >
> > Does the server always use something like O_SYNC, i.e. all potential
> > write caches in the stack operate in a writethrough mode? So each write
> > request is only completed successfully if it is ensured that the data is
> > safe on disk rather than in a volatile writeback cache?
> >
> > As soon as any writeback cache can be involved (e.g. the kernel page
> > cache or a volatile disk cache) and there is no flush command (a real
> > one, not just stubbed), the driver is not operating correctly and
> > therefore not ready for inclusion.
> >
> > So Ashish, can you tell us something about caching behaviour across the
> > storage stack when vxhs is involved?
> >
> > Kevin
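
To make the expectation concrete, here is a rough sketch of the flush
handler quoted above with the error suppression removed. It only reuses
the qemu_iio_ioctl() call from the patch; whether VDISK_AIO_FLUSH
actually does anything on the server side is exactly the open question
in this thread.

/* Sketch only: same qemu_iio_ioctl() call as in the quoted patch, but the
 * result is propagated to the block layer instead of being ignored. */
int vxhs_co_flush(BlockDriverState *bs)
{
    BDRVVXHSState *s = bs->opaque;
    uint64_t size = 0;
    int ret;

    ret = qemu_iio_ioctl(s->qnio_ctx,
                         s->vdisk_hostinfo[s->vdisk_cur_host_idx].vdisk_rfd,
                         VDISK_AIO_FLUSH, &size, NULL, IIO_FLAG_SYNC);
    if (ret < 0) {
        /* Let the guest see the failed flush; only a flush that is known
         * to be a no-op end to end may safely report success here. */
        return ret;
    }

    return 0;
}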