From: Hanna Reitz <hreitz@redhat.com>
To: Klaus Kiwi <kkiwi@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [qemu-web PATCH] Add a blog post about FUSE block exports
Date: Fri, 20 Aug 2021 11:03:00 +0200
Message-ID: <c55458f6-0fc4-b03f-ddf1-0a65d79b832c@redhat.com>
In-Reply-To: <CAELHpAD81hgKbvRV=R7jaLyi8Nwi-edd+mJ8arhXAp2=iAiokg@mail.gmail.com>
On 19.08.21 20:22, Klaus Kiwi wrote:
>
>
> On Thu, Aug 19, 2021 at 7:27 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
> This post explains when FUSE block exports are useful, how they work,
> and that it is fun to export an image file on its own path so it looks
> like your image file (in whatever format it was) is a raw image now.
>
>
> Thanks Hanna, great work. Even though you have explained this to me multiple
> times, thanks to this post I think I now finally understand *how* it works.
Oops, sorry for forgetting to CC you...
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> You can also find this patch here:
> https://gitlab.com/hreitz/qemu-web fuse-blkexport-v1
>
> My first patch to qemu-web, so I hope I am not doing anything overly
> stupid here (adding SVGs with extremely long lines comes to mind)...
> ---
> _posts/2021-08-18-fuse-blkexport.md | 488
> ++++++++++++++++++++++
> screenshots/2021-08-18-block-graph-a.svg | 2 +
> screenshots/2021-08-18-block-graph-b.svg | 2 +
> screenshots/2021-08-18-block-graph-c.svg | 2 +
> screenshots/2021-08-18-block-graph-d.svg | 2 +
> screenshots/2021-08-18-block-graph-e.svg | 2 +
> screenshots/2021-08-18-root-directory.svg | 2 +
> screenshots/2021-08-18-root-file.svg | 2 +
> 8 files changed, 502 insertions(+)
> create mode 100644 _posts/2021-08-18-fuse-blkexport.md
> create mode 100644 screenshots/2021-08-18-block-graph-a.svg
> create mode 100644 screenshots/2021-08-18-block-graph-b.svg
> create mode 100644 screenshots/2021-08-18-block-graph-c.svg
> create mode 100644 screenshots/2021-08-18-block-graph-d.svg
> create mode 100644 screenshots/2021-08-18-block-graph-e.svg
> create mode 100644 screenshots/2021-08-18-root-directory.svg
> create mode 100644 screenshots/2021-08-18-root-file.svg
>
> diff --git a/_posts/2021-08-18-fuse-blkexport.md
> b/_posts/2021-08-18-fuse-blkexport.md
> new file mode 100644
> index 0000000..e6a55d0
> --- /dev/null
> +++ b/_posts/2021-08-18-fuse-blkexport.md
> @@ -0,0 +1,488 @@
> +---
> +layout: post
> +title: "Exporting block devices as raw image files with FUSE"
> +date: 2021-08-18 18:00:00 +0200
> +author: Hanna Reitz
> +categories: [storage, features, tutorials]
>
>
> Non-fatal, but I feel that the title doesn't summarize everything this
> blog post is about.
> An alternative suggestion might be along the lines of "A look into QEMU's
> FUSE export feature, and how to use it to manipulate guest images".
Hmm, I don’t know. The feature itself doesn’t really allow you to
manipulate guest images, it only provides a translation layer so that
other tools can do it. I can definitely replace “Exporting block
devices” by “Presenting guest images”, but I’m not sure I want to go
much further, actually.
> +---
> +Sometimes, there is a VM disk image whose contents you want to
> manipulate
> +without booting the VM. For raw images, that process is usually
> fairly simple,
> +because most Linux systems bring tools for the job, e.g.:
> +* *dd* to just copy data to and from given offsets,
> +* *parted* to manipulate the partition table,
> +* *kpartx* to present all partitions as block devices,
> +* *mount* to access filesystems’ contents.
> +
> +Sadly, but naturally, such tools only work for raw images, and
> not for images
> +e.g. in QEMU’s qcow2 format. To access such an image’s content,
> the format has
> +to be translated to create a raw image, for example by:
> +* Exporting the image file with `qemu-nbd -c` as an NBD block
> device file,
> +* Converting between image formats using `qemu-img convert`,
> +* Accessing the image from a guest, where it appears as a normal
> block device.
> +
>
> Guessing that this would be the best place to mention
> guestmount/libguestfs, as Stefan
> mentioned in another reply to this thread?
Yes, probably replacing the “Accessing the image from a guest” point.
> Bonus points if you can identify (dis)advantages, similar to what you did
> below with the other methods.
>
> +Unfortunately, none of these methods is perfect: `qemu-nbd -c`
> generally
> +requires root rights, converting to a temporary raw copy requires
> additional
> +disk space and the conversion process takes time, and accessing
> the image from a
> +guest is just quite cumbersome in general (and also specifically
> something that
> +we set out to avoid in the first sentence of this blog post).
> +
> +As of QEMU 6.0, there is another method, namely FUSE block exports.
> +Conceptually, these are rather similar to using `qemu-nbd -c`,
> but they do not
> +require root rights.
> +
> +**Note**: FUSE block exports are a feature that can be enabled or
> disabled
> +during the build process with `--enable-fuse` or
> `--disable-fuse`, respectively;
> +omitting either configure option will enable the feature if and
> only if libfuse3
> +is present. It is possible that the QEMU build you are using
> does not have FUSE
> +block export support, because it was not compiled in.
> +
> +FUSE (*Filesystem in Userspace*) is a technology to let userspace
> processes
> +provide filesystem drivers. For example, *sshfs* is a program
> that allows
> +mounting remote directories from a machine accessible via SSH.
> +
>
>
> Nitpicking but maybe FUSE here could link to another
> tutorial/wikipedia page
> with more info?
The best I could do is link to Wikipedia, I suppose, but would that
really be helpful? I think this post itself kind of provides an intro
into what FUSE is.
> +QEMU can use FUSE to make a virtual block device appear as a
> normal file on the
> +host, so that tools like *kpartx* can interact with it regardless
> of the image
> +format.
> +
> +## Background information
> +
> +### File mounts
>
> I must confess that, as I went through the document, this felt a bit like
> breaking the flow (probably due to my preconception of always mounting a
> resource into some directory to see its contents, which I guess is what I
> was expecting to be covered before talking about mounting files).
>
> I understand now, however, that this introduction is necessary, but
> perhaps
> something like "Before we are able to use QEMU's FUSE exports, we need
> to clarify
> some fundamental concepts on the VFS and mountpoints: It is a
> little-known fact
> that <...>" would help me understand the flow better here.
Oh, sure!
> +A perhaps little-known fact is that, on Linux, filesystems do not
> need to have
> +a root directory, they only need to have a root node. A
> filesystem that only
> +provides a single regular file is perfectly valid.
> +
> +Conceptually, every filesystem is a tree, and mounting works by
> replacing one
> +subtree of the global VFS tree by the mounted filesystem’s tree.
> Normally, a
> +filesystem’s root node is a directory, like in the following example:
> +
> +||
> +|:--:|
> +|*Fig. 1: Mounting a regular filesystem with a directory as its
> root node*|
> +
> +Here, the directory `/foo` and its content (the files `/foo/a`
> and `/foo/b`) are
> +shadowed by the new filesystem (showing `/foo/x` and `/foo/y`).
> +
>
>
> Must confess that I wish there were a better term for it than 'shadowed
> directory' or 'shadowed file', avoiding potential confusion with things like
> /etc/shadow or 'shadow memory'. But I couldn't think of any.
>
> +Note that a filesystem’s root node generally has no name. After
> mounting, the
> +filesystem’s root directory’s name is determined by the original
> name of the
> +mount point.
> +
> +Because a tree does not need to have multiple nodes but may
> consist of just a
> +single leaf, a filesystem with a file for its root node works
> just as well,
> +though:
> +
> +||
> +|:--:|
> +|*Fig. 2: Mounting a filesystem with a regular (unnamed) file as
> its root node*|
> +
> +Here, FS B only consists of a single node, a regular file with no
> name. (As
> +above, a filesystem’s root node is generally unnamed.)
> Consequently, the mount
> +point for it must also be a regular file (`/foo/a` in our
> example), and just
> +like before, the content of `/foo/a` is shadowed, and when
> opening it, one will
> +instead see the contents of FS B’s unnamed root node.
> +
> +### QEMU block exports
> +
> +QEMU allows exporting block nodes via various protocols (as of
> 6.0: NBD,
> +vhost-user, FUSE). A block node is an element of QEMU’s block
> graph (see e.g.
> +[Managing the New Block
> Layer](http://events17.linuxfoundation.org/sites/events/files/slides/talk\_11.pdf),
> +a talk given at KVM Forum 2017), which can for example be
> attached to guest
> +devices. Here is a very simple example:
> +
> +||
> +|:--:|
> +|*Fig. 3: A simple block graph for attaching a qcow2 image to a
> virtio-blk guest device*|
> +
> +This is the simplest example for a block graph that connects a
> *virtio-blk*
> +guest device to a qcow2 image file. The *file* block driver,
> instanced in the
> +form of a block node named *prot-node*, accesses the actual file
> and provides
> +the node above it access to the raw content. This node above,
> named *fmt-node*,
> +is handled by the *qcow2* block driver, which is capable of
> interpreting the
> +qcow2 format. Parents of this node will therefore see the actual
> content of the
> +virtual disk that is represented by the qcow2 image. There is
> only one parent
> +here, which is the *virtio-blk* guest device, which will thus see
> the virtual
> +disk.
> +
> +The command line to achieve the above could look something like this:
> +```
> +$ qemu-system-x86_64 \
> + -blockdev node-name=prot-node,driver=file,filename=$image_path \
> + -blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
> + -device virtio-blk,drive=fmt-node
> +```
> +
> +Besides attaching guest devices to block nodes, you can also
> export them for
> +users outside of qemu, for example via NBD. Say you have a QMP
> channel open for
> +the QEMU instance above, then you could do this:
>
>
> As much as I hate to say it, wouldn't it be better to give the example
> below using (legacy?) qemu monitor commands instead of JSON (unless it
> cannot be done that way, of course)? They feel more intuitive/recognizable
> to me, I think.
nbd_server_start exists as an HMP command, but there’s no direct
equivalent of block-export-add. We do have nbd_server_add, but of note
is that the nbd-server-add QMP command is deprecated.
In any case, I prefer using the JSON QMP commands here, because they map
directly to the storage daemon’s command line (--nbd-server and --export).
If this is too confusing, then I’d rather jump directly to the storage
daemon; but I feel like there’s value in showing that block exports work
in the system emulator, too.
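For comparison, here is a rough sketch of a storage daemon invocation that mirrors the QMP example. The image path is a placeholder, and the function only prints the command line instead of running it, so treat it as an illustration rather than a tested setup. One difference worth noting: `--nbd-server` takes the address fields directly (`addr.type`, `addr.host`, `addr.port`), without the `data` wrapper that the `nbd-server-start` QMP command uses, whereas the `--export` options match the `block-export-add` arguments one to one.

```shell
# Placeholder; point this at a real qcow2 image before running anything.
image_path="${image_path:-/path/to/image.qcow2}"

# Print (not run) a qemu-storage-daemon command line that sets up the same
# two block nodes and the same NBD export as the QMP example.
qsd_cmdline() {
    echo qemu-storage-daemon \
        --blockdev "node-name=prot-node,driver=file,filename=$image_path" \
        --blockdev "node-name=fmt-node,driver=qcow2,file=prot-node" \
        --nbd-server "addr.type=inet,addr.host=localhost,addr.port=10809" \
        --export "type=nbd,id=fmt-node-export,node-name=fmt-node,name=guest-disk"
}

qsd_cmdline
```

Copying the printed line into a shell (with a real image path) would start the daemon in the foreground.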
>
> +```json
> +{
> + "execute": "nbd-server-start",
> + "arguments": {
> + "addr": {
> + "type": "inet",
> + "data": {
> + "host": "localhost",
> + "port": "10809"
> + }
> + }
> + }
> +}
> +{
> + "execute": "block-export-add",
> + "arguments": {
> + "type": "nbd",
> + "id": "fmt-node-export",
> + "node-name": "fmt-node",
> + "name": "guest-disk"
> + }
> +}
> +```
>
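Once both commands have succeeded, a client should be able to reach the export through an NBD URL built from the host, port, and export name used above. A minimal sketch, assuming the server from the example is still running; the helper name `nbd_url` is made up for illustration, and the final command is only printed:

```shell
# Build the NBD URL for an export: nbd://<host>:<port>/<export name>.
nbd_url() {
    printf 'nbd://%s:%s/%s\n' "$1" "$2" "$3"
}

url=$(nbd_url localhost 10809 guest-disk)

# Print the command to inspect the export; drop the "echo" to actually run it.
echo qemu-img info "$url"
```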
[...]
> The rest of it is very didactic and educational - thanks! And since
> none of my comments are critical:
> Reviewed-by: Klaus Heinrich Kiwi <kkiwi@redhat.com>
Thanks!
Hanna