From: Paolo Bonzini <pbonzini@redhat.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: "nbd-general@lists.sourceforge.net"
<nbd-general@lists.sourceforge.net>,
Kevin Wolf <kwolf@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Pavel Borzenkov <pborzenkov@virtuozzo.com>,
"Stefan stefanha@redhat. com" <stefanha@redhat.com>,
Wouter Verhelst <w@uter.be>, "Denis V. Lunev" <den@openvz.org>
Subject: Re: [Qemu-devel] [PATCH v2 1/1] NBD proto: add WRITE_ZEROES extension
Date: Thu, 31 Mar 2016 16:40:35 +0200
Message-ID: <56FD36E3.1010402@redhat.com>
In-Reply-To: <357ECCE6-4A6F-430A-9C2C-214D775CFBFE@alex.org.uk>
On 31/03/2016 16:27, Alex Bligh wrote:
> > > IE why not always permit trimming PROVIDED the data always reads back
> > > as zero? This would be far simpler.
> >
> > Because trimming can make future operations more expensive and cause
> > fragmentation (which may not be as bad as it used to be at the media
> > level, but it is still somewhat bad at the filesystem level).
> >
> > So if you want a fully-provisioned file, the simplest way to do so is to
> > write zeroes to it, and trimming is undesirable.
> But isn't the server in a better position to know this than the
> client?
There are at least three possible states for a sector:
- hole (thin-provisioned)
- allocated as data (disk contains actual zeroes)
- allocated as unwritten (blocks reserved on backing storage, reads as
zeroes but the disk may not contain actual zeroes)
It's always okay for the backend to convert a zero block to an unwritten
extent; it's generally not okay for a backend to take a request to
create an unwritten extent and instead create a hole.
It's all an "as if" situation. The server must provide the semantics
requested by the client. For example, writing to a hole could cause
ENOSPC, while writing to an unwritten extent could not. The server might know
better, because it certainly is in a better position to know how to
fulfill the client's request.
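
The "as if" rule above can be sketched as a small model. This is only an
illustration, not anything from the NBD spec or from qemu: the names
SectorState, space_reserved and substitution_ok are made up here, and the
model assumes the only guarantee that differs between the three states is
whether a later write can hit ENOSPC. Under that assumption it also accepts
allocating where a hole was requested, which is a consequence of the
simplification rather than something stated above.

```python
from enum import Enum


class SectorState(Enum):
    """The three states listed above (names are hypothetical)."""
    HOLE = "hole (thin-provisioned)"
    DATA = "allocated as data"
    UNWRITTEN = "allocated as unwritten"


def space_reserved(state):
    # All three states read back as zeroes; only a hole has no blocks
    # reserved on backing storage, so only a hole can ENOSPC on write.
    return state is not SectorState.HOLE


def substitution_ok(requested, actual):
    # The backend may produce `actual` when asked for `requested` as long
    # as it does not drop the reserved-space guarantee the client asked for.
    return space_reserved(actual) or not space_reserved(requested)


# A zero block may become an unwritten extent ...
print(substitution_ok(SectorState.DATA, SectorState.UNWRITTEN))
# ... but a request for an unwritten extent must not produce a hole.
print(substitution_ok(SectorState.UNWRITTEN, SectorState.HOLE))
```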
But even if it's just a hint, it makes sense for NBD to provide it.
It's not a coincidence that this hint exists at all levels: SCSI has an
UNMAP bit that can be set in the WRITE SAME command (and it has the UNMAP
command, which matches NBD's TRIM); the fallocate system call has
FALLOC_FL_ZERO_RANGE and FALLOC_FL_PUNCH_HOLE (plus Linux has the
BLKDISCARD ioctl, which again matches NBD's TRIM for block devices).
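
To make the correspondence concrete, here is a rough sketch of how a server
might pick a backend operation for a "make this range read as zeroes"
request. It is an assumption-laden illustration: plan_zero_request, the
may_trim parameter and the string-valued capability set are all
hypothetical, and the function only returns the name of the operation
rather than calling fallocate.

```python
def plan_zero_request(may_trim, backend_supports):
    """Return the name of the backend operation to use.

    may_trim: hypothetical hint, True if the client allows deallocation.
    backend_supports: set of operation names the backend can perform.
    """
    # The client allows a hole and the backend can punch one.
    if may_trim and "FALLOC_FL_PUNCH_HOLE" in backend_supports:
        return "FALLOC_FL_PUNCH_HOLE"
    # Keep the range allocated but let the filesystem mark it unwritten.
    if "FALLOC_FL_ZERO_RANGE" in backend_supports:
        return "FALLOC_FL_ZERO_RANGE"
    # No metadata shortcut available: the zeroes have to be written out.
    return "write zeroes"


print(plan_zero_request(True, {"FALLOC_FL_PUNCH_HOLE", "FALLOC_FL_ZERO_RANGE"}))
print(plan_zero_request(False, {"FALLOC_FL_PUNCH_HOLE"}))
```

The point of the ordering is that the hint only ever widens the server's
choices: without it, punching a hole is never selected, even when it is the
only shortcut the backend offers.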
> EG if the server has a back end implementation (as I suspect
> Ceph on qemu-nbd does)
Ceph doesn't, but gluster does.
> which never actually stores all zero blocks,
> it won't make a difference, and conceivably you're generating a whole
> pile of I/O to avoid sparseness when sparseness might be faster. Take
> for example a persistent memory interface, where fragmentation is
> irrelevant, and writing piles of zeroes to memory is a waste of time.
It certainly isn't a waste of time if your intention is to scrub data
belonging to a previous tenant, before giving access to someone else!
If you have a metadata layer above then you can handle the command there
(that's why we're adding it); if you don't, you do have to write the
zeroes.
Paolo