From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52469) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1azg93-0002pe-UE for qemu-devel@nongnu.org; Mon, 09 May 2016 04:02:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1azg8y-00064H-QW for qemu-devel@nongnu.org; Mon, 09 May 2016 04:02:24 -0400 Date: Mon, 9 May 2016 10:02:12 +0200 From: Kevin Wolf Message-ID: <20160509080212.GB5054@noname.redhat.com> References: <20160406110216.GE5098@noname.redhat.com> <5704F0A4.8090806@redhat.com> <20160406114140.GF5098@noname.redhat.com> <20160406115159.GG5098@noname.redhat.com> <20160406131053.GB7329@localhost.localdomain> <20160406132011.GI5098@noname.redhat.com> <570610D7.8000104@redhat.com> <977766574.2310591.1460348810883.JavaMail.zimbra@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Raghavendra Talur Cc: Raghavendra Gowdappa , Pranith Kumar Karampuri , Jeff Cody , Ric Wheeler , Jeff Darcy , qemu-block@nongnu.org, Vijay Bellur , qemu-devel@nongnu.org, kdhananj@redhat.com, Poornima Gurusiddaiah , Rajesh Joseph Am 09.05.2016 um 08:38 hat Raghavendra Talur geschrieben: >=20 >=20 > On Mon, Apr 11, 2016 at 9:56 AM, Raghavendra Gowdappa > wrote: >=20 >=20 > > +Raghavendra G who implemented this option in write-behind, to th= is > > upstream patch review discussion >=20 > Thanks Pranith. Kritika did inform us about the discussion. We were= working > on ways to find out solutions to the problems raised (and it was a = long > festive weekend in Bangalore). >=20 > Sorry for top-posting. I am trying to address two issues raised in = this > mail thread: > 1. No ways to identify whether setting of option succeeded in gfapi= . > 2. Why retaining cache after fsync failure is not default behavior? >=20 > 1. No ways to identify whether setting of option succeeded in gfapi= : >=20 > I've added Poornima and Raghavendra Talur who work on gfapi to assi= st on > this. >=20 >=20 > =A0There is currently no such feature in gfapi. We could think of two p= ossible > solutions: >=20 > =A0a. Have the qemu-glusterfs driver require a version of glusterfs-api= .rpm which > surely has this write behind option. In that case, if you use > glfs_set_xlator_option before glfs_init with right key and value, there= is no > way the volume set gets rejected. Note that this is a installation time > dependency, we don't have libgfapi library versioning. This solution is= good > enough, if the logic in qemu is=A0 >=20 > ret =3D glfs_set_xlator_option; > if (ret) { > =A0 =A0 exit because we don't have required environment. > }=A0 > proceed with work; This is not good enough. Basically the requirement is: Given a qemu binary that is put on a machine with a random gluster version installed, this qemu binary never corrupts a gluster image. Either it errors out or it works correctly. Just assuming that the right gluster version is installed is much too weak, and even version number checks aren't very good when you consider things like backports. > b. Second solution is to implement a case in write_behind getxattr FOP = to > handle such request and reply back with the current setting. Look at > dht_get_real_filename for similar feature. You will have to implement l= ogic > similar to something like below >=20 > ret =3D glfs_getxattr ( fs, "/", glusterfs.write-behind- > resync-failed-syncs-after-fsync-status, &value, size); >=20 > if ( (ret =3D -1 && errno =3D=3D ENODATA) || value =3D=3D 0 ) { > =A0 =A0 =A0 =A0// write behind in the stack does not understand this op= tion > =A0 =A0 =A0 =A0// =A0OR > =A0 =A0 =A0 //=A0 we have write-behind with required option set to off > =A0 =A0 > } else { > =A0 =A0 // We do have the required option > } Yes, please this one. Or a new glfs_setxattr_fail() that returns an error if the option didn't take effect. But I guess a glfs_getxattr() function makes sense anyway. In qemu, we try to allow querying everything that you can set. > One negative aspect of both the solutions above is the loosely coupled = nature > of logic. If the behavior ever changes in lower layer(which is gfapi or > write-behind in this case) the upper layer(qemu) will have to > a. have a dependency of the sort which defines both the minimum version= and > maximum version of package required > b. interpret the return values with more if-else conditions to maintain > backward compatibility. It's gluster's job to keep the compatibility with existing users of the APIs. Just like qemu has to make sure that old QMP clients keep working when we make changes or extensions to the protocol. In other words: If the behaviour of a lower layer changes while the upper layer only uses old APIs, that's clearly a bug in the lower layer. Kevin