From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56676)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1aljIU-00073f-Uj for qemu-devel@nongnu.org;
 Thu, 31 Mar 2016 16:34:32 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1aljIQ-0004ew-S0 for qemu-devel@nongnu.org;
 Thu, 31 Mar 2016 16:34:30 -0400
Received: from mx1.redhat.com ([209.132.183.28]:45840)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1aljIQ-0004ec-Ke for qemu-devel@nongnu.org;
 Thu, 31 Mar 2016 16:34:26 -0400
References: <56FD7B7E.4060004@redhat.com>
 <64B326DA-CDF4-4537-B38A-46E7B57C319C@alex.org.uk>
 <56FD8069.7020101@redhat.com>
From: Eric Blake
Message-ID: <56FD89D0.5050408@redhat.com>
Date: Thu, 31 Mar 2016 14:34:24 -0600
MIME-Version: 1.0
In-Reply-To:
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="0TSFUOto1bNKWStAViac7ier3Uqxu2SrB"
Subject: Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Alex Bligh
Cc: "nbd-general@lists.sourceforge.net" ,
 "qemu-devel@nongnu.org"

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--0TSFUOto1bNKWStAViac7ier3Uqxu2SrB
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 03/31/2016 02:17 PM, Alex Bligh wrote:

> OK so I actually went and researched what my answer was last time I
> was asked ( :-) ):
>
> Here was my conclusion last time after trawling through lkml
> on the subject:
>
> From https://sourceforge.net/p/nbd/mailman/message/27569820/
>
>> You may process commands out of order, and reply out of order,
>> save that
>> a) all write commands *completed* before you process a REQ_FLUSH
>>    must be written to non-volatile storage prior to completing
>>    that REQ_FLUSH (though apparently you should, if possible, make
>>    this true for all write commands *received*, which is a stronger
>>    condition) [Ignore this if you don't set SEND_REQ_FLUSH]
>> b) a REQ_FUA flagged write must not complete until its payload
>>    is written to non-volatile storage [ignore this if you don't
>>    set SEND_REQ_FUA]
>>
>
>
> Perhaps it would be good for that to actually go in the docs!

Indeed.

>
> I don't think we need a 'stronger barrier' as the client can
> implement that itself merely by waiting for all commands to
> complete prior to sending FLUSH.
>
> Incidentally, last time I looked, the linux kernel always sent
> a FLUSH immediately after any bio marked FUA. Does qemu use
> more interesting behavioural modes?

I'm just learning the qemu nbd code myself, so I don't have a good
answer, other than what I wrote before:

>>
>> In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
>> that's the same function it calls for write+FUA.
>
> That's harmless, but unnecessary in the sense that current documented
> behaviour doesn't require it. Perhaps it should?

It _is_ a reasonable semantic - it means you are guaranteed that what
YOU read will match what anyone ELSE reads (without waiting for writes
to land, what YOU read SHOULD favor what is sitting in pending writes,
while what others read may be stale on-disk data about to be
overwritten by pending writes). And while I'm a bit fuzzy on the POSIX
semantics of O_SYNC and O_DSYNC with open(), and on sync() vs. fsync()
vs. fdatasync() vs. syncfs() (they are all subtly different, and I
never remember which is stronger in what scenarios, nor how Linux
subtly differs from what POSIX says), POSIX does have some wording
about syncs being useful even on reads to force the read to not
complete until you can guarantee that everyone else will read the same
content (that is, the sync flushes the pending writes, even though you
are doing a read operation).

>
> I suppose TRIM etc. should support FUA too?

Probably, and with similar semantics to WRITE (only affects this
transaction rather than all pending ones, but guarantees that the trim
lands on disk before returning).

>> Meanwhile, it sounds like FUA is valid on read, write, AND flush
>> (because the kernel supports all three),
>
> Do you have a pointer to what FUA means on kernel reads? Does it

No clue. I'm not a kernel expert, and was assuming that you knew more
about it than me.

> mean "force unit access for the read" or does it mean "flush any
> write for that block first"? The first is subtly different if the
> file is remote and being accessed by multiple people (e.g. NFS, Ceph etc.)

I would lean to the latter - FUA on a read seems like it is most useful
if it means "guarantee that no one else can read something older than
what I read", and NOT "give me possibly stale data because I accessed
the underlying storage rather than paying attention to in-flight writes
that would change what I read".
In other words, I think you should ALWAYS prefer data from in-flight
writes over going to backing storage, but USUALLY don't need the
overhead of waiting for those writes to complete; FUA slows down your
read, but gives you better data assurance.

>
>> even if we aren't quite sure
>> what to document of those flags. And that means qemu is correct, and
>> the NBD protocol has a bug. Since you contributed the FUA flag, is that
>> something you can try to improve?
>
> Yeah. My mess so I should clean it up. I think FUA should be valid
> on essentially everything.
>
> I think I might wait until structured replies is in though!

It's also tricky because we just barely documented that servers SHOULD
reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
commands where it is not documented; I don't know if we have an
adequate discovery system in place to learn _which_ commands support
FUA, especially if you are proposing that we expand the scope of FUA to
be valid alongside a TRIM request. It doesn't have to be solved today,
though, so I'm fine if you wait for structured replies first.

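To make the semantics discussed in this thread concrete, here is a
minimal Python sketch of a toy in-memory export with a volatile write
cache. It is only an illustration under stated assumptions, not the
qemu or nbd-server implementation: the names ToyExport, handle and
fua_allowed, and the numeric value of FLAG_FUA, are hypothetical.
Read+FUA is modelled as "land any pending writes for the blocks being
read first", which is the interpretation favoured above (qemu, as noted
earlier, calls blk_co_flush() instead). Unknown flags, or FUA on a
command outside fua_allowed, are rejected with EINVAL.

```python
import errno

FLAG_FUA = 1 << 0  # illustrative value; consult the NBD spec for the wire encoding
READ, WRITE, FLUSH = "read", "write", "flush"


class ToyExport:
    """Toy in-memory export with a volatile write cache (hypothetical model)."""

    def __init__(self, size, fua_allowed=(READ, WRITE, FLUSH)):
        self.disk = bytearray(size)  # stands in for non-volatile storage
        self.pending = {}            # offset -> byte accepted but not yet on disk
        self.fua_allowed = set(fua_allowed)

    def _land(self, offsets):
        """Move any pending bytes at the given offsets onto 'disk'."""
        for off in list(offsets):
            if off in self.pending:
                self.disk[off] = self.pending.pop(off)

    def handle(self, cmd, flags=0, offset=0, data=b"", length=0):
        """Process one request and return (error, payload)."""
        fua = bool(flags & FLAG_FUA)
        # Reject unknown flags, and FUA where this export does not allow it.
        if flags & ~FLAG_FUA or (fua and cmd not in self.fua_allowed):
            return errno.EINVAL, b""
        if cmd == WRITE:
            for i, byte in enumerate(data):
                self.pending[offset + i] = byte
            if fua:  # rule b): do not complete until the payload is on disk
                self._land(range(offset, offset + len(data)))
            return 0, b""
        if cmd == FLUSH:  # rule a): every completed write lands before we reply
            self._land(self.pending)
            return 0, b""
        if cmd == READ:
            span = range(offset, offset + length)
            if fua:  # assumed meaning: flush writes for these blocks first
                self._land(span)
            # Always prefer in-flight data over what is on disk.
            return 0, bytes(self.pending.get(o, self.disk[o]) for o in span)
        return errno.EINVAL, b""
```

With this model, a plain read after an unflushed write already returns
the new data while the "disk" is still stale; adding FLAG_FUA to the
read is what forces the disk to catch up before the reply. Passing
fua_allowed=(WRITE,) models a server that only documents FUA on writes
and answers EINVAL elsewhere.
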
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--0TSFUOto1bNKWStAViac7ier3Uqxu2SrB
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJW/YnQAAoJEKeha0olJ0Nq8ugH/3fsNKVUl6EY+C/c3GIby8er
TLGPUQc+ivQFbImu1Wo5zMtg9WthDaK81+rk9sPB6WRJ648bBQNMiCX/lFYAXMkH
YmSlh4msctRnMwHcd9zEJqP7EkZt+k8zUy6lEZL/AJPiN8kMSYCLPNX8EYvc9Ldc
iU4Bw8daBxKrzx/wkDCLoOG+PIBXikh/62YfFWH4s/uHH8oFlHiH3V+XXkAczPQw
CQC1yIvoKmkkaUZFEeOk4E3/YknJ4idxL+fyC6RmTD00WxHxXJathpcpsJeFkL/q
G4ldvk3+4pjHGxhcpkEyZKh76iRwvLZilMkCKcmXWbQHdnXwqJQXkt32azLl6zc=
=cTep
-----END PGP SIGNATURE-----

--0TSFUOto1bNKWStAViac7ier3Uqxu2SrB--