From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rusty Russell
Subject: Re: [PATCH] virtio-spec: document block CMD and FLUSH
Date: Tue, 4 May 2010 14:08:24 +0930
Message-ID: <201005041408.25069.rusty@rustcorp.com.au>
References: <20100218222220.GA14847@redhat.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Cc: virtualization@lists.linux-foundation.org, Anthony Liguori, qemu-devel@nongnu.org, kvm@vger.kernel.org, hch@lst.de, Neil Brown, Jens Axboe, tytso@mit.edu
To: "Michael S. Tsirkin"
Return-path:
Received: from ozlabs.org ([203.10.76.45]:45928 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752314Ab0EDEia (ORCPT ); Tue, 4 May 2010 00:38:30 -0400
In-Reply-To: <20100218222220.GA14847@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
> I took a stub at documenting CMD and FLUSH request types in virtio
> block. Christoph, could you look over this please?
>
> I note that the interface seems full of warts to me,
> this might be a first step to cleaning them.

ISTR Christoph had withdrawn some patches in this area, and was waiting for him to resubmit?

I've given up on figuring out the block device. What seem to me to be sane semantics along the lines of memory barriers are foreign to disk people: they want (and depend on) flushing everywhere.

For example, tdb transactions do not require a flush, they only require what I would call a barrier: that prior data be written out before any future data. Surely that would be more efficient in general than a flush! In fact, TDB wants only writes to *that file* (and metadata) written out first; it has no ordering issues with other I/O on the same device.

A generic I/O interface would allow you to specify "this request depends on these outstanding requests" and leave it at that. It might have some sync flush command for dumb applications and OSes.
The userspace API might not be as precise and only allow such a barrier against all prior writes on this fd.

ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects...

Cheers,
Rusty.
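
To make the "this request depends on these outstanding requests" idea concrete, here is a minimal sketch in Python. It is a toy model only, not any existing virtio, kernel, or tdb interface: DepQueue, Request, submit, ready, complete and barrier are all hypothetical names invented for illustration. The point is that ordering is expressed purely through dependencies, so unrelated I/O is never held back, and the coarser "barrier against all prior writes" falls out as a special case.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One write request (hypothetical model, not a real device structure)."""
    rid: int
    deps: frozenset = frozenset()  # ids that must be written out first
    done: bool = False


class DepQueue:
    """Toy queue in which ordering is expressed only through dependencies."""

    def __init__(self):
        self.requests = {}

    def submit(self, rid, deps=()):
        # A request may only depend on requests that were already submitted.
        unknown = [d for d in deps if d not in self.requests]
        if unknown:
            raise ValueError(f"unknown dependencies: {unknown}")
        self.requests[rid] = Request(rid, frozenset(deps))

    def ready(self):
        """Ids the device may write now, in whatever order it likes."""
        return sorted(
            r.rid for r in self.requests.values()
            if not r.done and all(self.requests[d].done for d in r.deps)
        )

    def complete(self, rid):
        # Refuse to complete a request whose dependencies are still pending.
        if rid not in self.ready():
            raise RuntimeError(f"request {rid} is not ready")
        self.requests[rid].done = True

    def barrier(self, rid):
        """Coarser variant: depend on every request submitted so far."""
        self.submit(rid, deps=tuple(self.requests))


q = DepQueue()
q.submit(0)               # data write
q.submit(1)               # data write
q.submit(2, deps=(0, 1))  # write that must follow both data writes
q.submit(3)               # unrelated I/O on the same device
print(q.ready())          # 0, 1 and 3 may go in any order; 2 must wait
```

Nothing here forces a flush of the whole device: request 3 can complete before, between, or after the others, and request 2 becomes eligible as soon as 0 and 1 are done.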