qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>,
	Yan-Jie Wang <jaywang0.tw@gmail.com>
Cc: QEMU Developers <qemu-devel@nongnu.org>,
	qemu-stable <qemu-stable@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] block/file-posix: fix the wrong result of find_allocation() in macOS.
Date: Mon, 10 Sep 2018 10:10:24 -0500	[thread overview]
Message-ID: <427e8ad2-c96a-fce8-b49e-c490c15d9ecd@redhat.com> (raw)
In-Reply-To: <CAFEAcA-AX9V+OGMGf+JD9dnFEiSfStuSrmEGGuhFhNQjTyrJcw@mail.gmail.com>

On 9/8/18 10:34 AM, Peter Maydell wrote:
> On 8 September 2018 at 15:15, Yan-Jie Wang <jaywang0.tw@gmail.com> wrote:
>> In macOS, lseek with SEEK_DATA behaves differently.
>> It seeks to the next data region even though offset is in the middle of
>> a data region. In addition, there may be many data regions without any
>> hole among them, like this: |---Data---|---Data---|
>>
>> Because of this, qemu-img convert with raw images as input may create
>> corrupted images in macOS especially for large files, and qemu-img
>> map may also report wrong things. This patch fixes this undesired
>> behaviors.
> 
> Hi. I have two general questions here:
> (1) is this behaviour of SEEK_DATA specific to macOS, or do the
> other BSDs (FreeBSD, OpenBSD, NetBSD) also have it ?

I hope no one else has it, because it is wrong, and should be fixed in 
MacOS to comply with the POSIX recommendation:

http://austingroupbugs.net/view.php?id=415#c862

The commit message ought to call out this link, when stating that the 
patch is a workaround for a buggy syscall.

> (2) is there a way to determine which flavour of SEEK_DATA we
> have as a configure-time test rather than having to hardcode
> an OS-specific #ifdef ?

Rather than hard-coding an OS-specific #ifdef, if we really are stuck 
working around MacOS braindead departure from implementation 
compatibility, then here's what Bruno recommended on the gnulib list:

https://lists.gnu.org/archive/html/bug-gnulib/2018-09/msg00058.html
> 
> How about one of these approaches?
> 
> (a) Put some knowledge about the extent boundaries into the code.
>     I.e. round offset down to the previous extend boundary before the
>     two lseek calls?

Except that I know of no clean way to learn what the actual extent 
boundary of a random offset is going to be.

> 
> (b) Evaluate both
>         d = lseek (fd, SEEK_DATA, offset);
>         h = lseek (fd, SEEK_HOLE, offset);
>     and since you don't know anything about the range from offset
>     to min(d,h)-1, assume it's data.
>     Then, if d < h, you have a data block, or if h < d, you have a hole.
> 

Our code currently optimizes by trying to use only one instead of two 
lseek() calls where possible, but this workaround seems like the most 
viable of any solution without even more syscalls.

> (c) for (int i = 1; i <= 16; i++)
>       {
>         unsigned long o = max (offset - (1 << i), 0);
>         d = lseek (fd, SEEK_DATA, o);
>         h = lseek (fd, SEEK_HOLE, o);
>         if (d < offset || h < offset)
>           break;
>       }

That may work to accomplish the semantics of proposal 1), but it is an 
awful lot of syscalls, and once you've made two, you don't really need 
to make more than two.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

  reply	other threads:[~2018-09-10 15:10 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-08 14:15 [Qemu-devel] [PATCH] block/file-posix: fix the wrong result of find_allocation() in macOS Yan-Jie Wang
2018-09-08 15:34 ` Peter Maydell
2018-09-10 15:10   ` Eric Blake [this message]
2018-09-11  2:01   ` 王彥傑

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=427e8ad2-c96a-fce8-b49e-c490c15d9ecd@redhat.com \
    --to=eblake@redhat.com \
    --cc=jaywang0.tw@gmail.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).