qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Niels de Vos <ndevos@redhat.com>
To: Jeff Cody <jcody@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] block/gluster: glfs_lseek() workaround
Date: Wed, 24 May 2017 23:32:47 +0200	[thread overview]
Message-ID: <20170524213246.GR6614@ndevos-x240.usersys.redhat.com> (raw)
In-Reply-To: <20170524205003.GB4955@localhost.localdomain>

On Wed, May 24, 2017 at 04:50:03PM -0400, Jeff Cody wrote:
> On Wed, May 24, 2017 at 11:02:02AM +0200, Niels de Vos wrote:
> > On Tue, May 23, 2017 at 01:27:50PM -0400, Jeff Cody wrote:
> > > On current released versions of glusterfs, glfs_lseek() will sometimes
> > > return invalid values for SEEK_DATA or SEEK_HOLE.  For SEEK_DATA and
> > > SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in
> > > the case of error:
> > > 
> > > LSEEK(2):
> > > 
> > >     off_t lseek(int fd, off_t offset, int whence);
> > > 
> > >     [...]
> > > 
> > >     SEEK_HOLE
> > >               Adjust  the file offset to the next hole in the file greater
> > >               than or equal to offset.  If offset points into the middle of
> > >               a hole, then the file offset is set to offset.  If there is no
> > >               hole past offset, then the file offset is adjusted to the end
> > >               of the file (i.e., there is  an implicit hole at the end of
> > >               any file).
> > > 
> > >     [...]
> > > 
> > >     RETURN VALUE
> > >               Upon  successful  completion,  lseek()  returns  the resulting
> > >               offset location as measured in bytes from the beginning of the
> > >               file.  On error, the value (off_t) -1 is returned and errno is
> > >               set to indicate the error
> > > 
> > > However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a
> > > value less than the passed offset, yet greater than zero.
> > > 
> > > For instance, here are example values observed from this call:
> > > 
> > >     offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > >     if (offs < 0) {
> > >         return -errno;          /* D1 and (H3 or H4) */
> > >     }
> > > 
> > > start == 7608336384
> > > offs == 7607877632
> > > 
> > > This causes QEMU to abort on the assert test.  When this value is
> > > returned, errno is also 0.
> > > 
> > > This is a reported and known bug to glusterfs:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1425293
> > > 
> > > Although this is being fixed in gluster, we still should work around it
> > > in QEMU, given that multiple released versions of gluster behave this
> > > way.
> > 
> > Versions of GlusterFS 3.8.0 - 3.8.8 are affected, 3.8.9 has the fix
> > already and was released in February this year. We encourage users to
> > update to recent versions, and provide a stable (bugfix only) update
> > each month.
> 
> I am able to reproduce this bug on a server running glusterfs 3.11.0rc0.  Is
> that expected?

No, that really is not expected. The backport that was reported to fix
it for a 3.8.4 version is definitely part of 3.11 already. Could you
pass me some details about your Gluster environment and volume, either
by email or in a bug? I'll try to reproduce and debug it from the
Gluster side.

There is a holiday tomorrow (I'm in The Netherlands), and I'm travelling
the whole weekend. I might not be able to look into this before next
week. Sorry about that!

Thanks,
Niels


> 
> > 
> > The Red Hat Gluster Storage product that is often used in combination
> > with QEMU (+oVirt) does unfortunately not have an update where this is
> > fixed.
> > 
> > Using an unfixed Gluster storage environment can cause QEMU to segfault.
> > Although fixes exist for Gluster, not all users will have them
> > installed. Preventing the segfault in QEMU due to a broken storage
> > environment makes sense to me.
> > 
> > > This patch treats the return case of (offs < start) the same as if an
> > > error value other than ENXIO is returned; we will assume we learned
> > > nothing, and there are no holes in the file.
> > > 
> > > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > > ---
> > >  block/gluster.c | 18 ++++++++++++++++--
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 7c76cd0..c147909e 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -1275,7 +1275,14 @@ static int find_allocation(BlockDriverState *bs, off_t start,
> > >      if (offs < 0) {
> > >          return -errno;          /* D3 or D4 */
> > >      }
> > > -    assert(offs >= start);
> > > +
> > > +    if (offs < start) {
> > > +        /* This is not a valid return by lseek().  We are safe to just return
> > > +         * -EIO in this case, and we'll treat it like D4. Unfortunately some
> > > +         *  versions of libgfapi will return offs < start, so an assert here
> > > +         *  will unneccesarily abort QEMU. */
> > 
> > This is not really correct, the problem is not in libgfapi, but in the
> > protocol translator on the server-side. The version of libgfapi does not
> > matter here, it is the version on the server. But that might be too much
> > detail for the comment.
> > 
> > > +        return -EIO;
> > > +    }
> > >  
> > >      if (offs > start) {
> > >          /* D2: in hole, next data at offs */
> > > @@ -1307,7 +1314,14 @@ static int find_allocation(BlockDriverState *bs, off_t start,
> > >      if (offs < 0) {
> > >          return -errno;          /* D1 and (H3 or H4) */
> > >      }
> > > -    assert(offs >= start);
> > > +
> > > +    if (offs < start) {
> > > +        /* This is not a valid return by lseek().  We are safe to just return
> > > +         * -EIO in this case, and we'll treat it like H4. Unfortunately some
> > > +         *  versions of libgfapi will return offs < start, so an assert here
> > > +         *  will unneccesarily abort QEMU. */
> > > +        return -EIO;
> > > +    }
> > >  
> > >      if (offs > start) {
> > >          /*
> > > -- 
> > > 2.9.3
> > > 
> > 
> > You might want to explain the problem a little different in the commit
> > message. It is fine too if you think it would become too detailed, my
> > explanation is in the archives now in any case.
> > 
> > Reviewed-by: Niels de Vos <ndevos@redhat.com>

  reply	other threads:[~2017-05-24 21:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-23 17:27 [Qemu-devel] [PATCH] block/gluster: glfs_lseek() workaround Jeff Cody
2017-05-23 17:50 ` Eric Blake
2017-05-24  9:02 ` Niels de Vos
2017-05-24 16:26   ` Jeff Cody
2017-05-24 20:50   ` Jeff Cody
2017-05-24 21:32     ` Niels de Vos [this message]
2017-05-26 19:14 ` Jeff Cody

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170524213246.GR6614@ndevos-x240.usersys.redhat.com \
    --to=ndevos@redhat.com \
    --cc=jcody@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).