From: Paolo Bonzini <pbonzini@redhat.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: "nbd-general@lists.sourceforge.net"
<nbd-general@lists.sourceforge.net>,
Kevin Wolf <kwolf@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Pavel Borzenkov <pborzenkov@virtuozzo.com>,
"Stefan stefanha@redhat. com" <stefanha@redhat.com>,
Wouter Verhelst <w@uter.be>, "Denis V. Lunev" <den@openvz.org>
Subject: Re: [Qemu-devel] [PATCH v2 1/1] NBD proto: add WRITE_ZEROES extension
Date: Thu, 31 Mar 2016 16:40:35 +0200
Message-ID: <56FD36E3.1010402@redhat.com>
In-Reply-To: <357ECCE6-4A6F-430A-9C2C-214D775CFBFE@alex.org.uk>
On 31/03/2016 16:27, Alex Bligh wrote:
> > > IE why not always permit trimming PROVIDED the data always reads back
> > > as zero? This would be far simpler.
> >
> > Because trimming can make future operations more expensive and cause
> > fragmentation (which may not be as bad as it used to be at the media
> > level, but it is still somewhat bad at the filesystem level).
> >
> > So if you want a fully-provisioned file, the simplest way to do so is to
> > write zeroes to it, and trimming is undesirable.
> But isn't the server in a better position to know this than the
> client?
There are at least three possible states for a sector:
- hole (thin-provisioned)
- allocated as data (disk contains actual zeroes)
- allocated as unwritten (blocks reserved on backing storage, reads as
zeroes but the disk may not contain actual zeroes)
It's always okay for the backend to convert a zero block to an unwritten
extent; it's generally not okay for a backend to take a request to
create an unwritten extent and instead create a hole.
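To make the rule above concrete, here is a minimal sketch in C. The enum and
function names are hypothetical and not taken from NBD or QEMU. It encodes
only the constraint that a backend must not silently drop a space reservation
the client asked for. It also assumes that a request for a hole may be
satisfied by allocated storage, which is not something stated above.

```c
#include <stdbool.h>

/* The three sector states listed above (names are illustrative only). */
enum sector_state {
    SECTOR_HOLE,      /* thin-provisioned, no blocks reserved */
    SECTOR_DATA,      /* allocated, disk contains actual zeroes */
    SECTOR_UNWRITTEN  /* allocated, reads as zeroes, contents unspecified */
};

/* A state reserves backing storage unless it is a hole. */
static bool reserves_space(enum sector_state s)
{
    return s != SECTOR_HOLE;
}

/*
 * May the backend produce `produced` when the client asked for `requested`?
 * All three states read back as zeroes, so only the reservation matters:
 * if the client asked for reserved space, the result must reserve space too.
 * Assumption: a requested hole accepts any result.
 */
static bool substitution_ok(enum sector_state requested,
                            enum sector_state produced)
{
    return !reserves_space(requested) || reserves_space(produced);
}
```

With this model, substitution_ok(SECTOR_DATA, SECTOR_UNWRITTEN) holds, while
substitution_ok(SECTOR_UNWRITTEN, SECTOR_HOLE) does not.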
It's all an "as if" situation. The server must provide the semantics
requested by the client. For example, writing to a hole could cause
ENOSPC, whereas writing to an unwritten extent could not. The server might know
better, because it certainly is in a better position to know how to
fulfill the client's request.
But even if it's just a hint, it makes sense for NBD to provide it.
It's not a coincidence that this hint exists at all levels: SCSI has an
UNMAP bit that can be set in the WRITE SAME command (and it has UNMAP
which matches NBD's TRIM); the fallocate system call has
FALLOC_FL_ZERO_RANGE and FALLOC_FL_PUNCH_HOLE (plus Linux has the
BLKDISCARD ioctl which again matches NBD's TRIM for block devices).
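As an illustration of the fallocate flags mentioned above, here is a rough
sketch of how a file-backed server might honour the hint on Linux. The helper
name zero_range and its may_trim parameter are hypothetical. The sketch
assumes the range lies within the current file size, and it falls back to
writing zeroes whenever the fallocate call fails, which is a simplification
rather than what any particular server does.

```c
#define _GNU_SOURCE           /* exposes fallocate() in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>     /* FALLOC_FL_* flags (Linux-specific) */
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make [off, off + len) read back as zeroes.
 *   may_trim == true   a hole is acceptable, so space may be released
 *   may_trim == false  the range must stay allocated
 * Returns 0 on success, -1 on error with errno set by the failing call.
 */
static int zero_range(int fd, off_t off, off_t len, bool may_trim)
{
    static const char zeroes[4096];   /* static storage is zero-initialised */

    assert(off >= 0 && len > 0);

    /* Cheapest option when allowed: deallocate the range.
     * PUNCH_HOLE must be combined with KEEP_SIZE. */
    if (may_trim &&
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len) == 0)
        return 0;

    /* Keep the blocks reserved but have them read as zeroes. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) == 0)
        return 0;

    /* No support from the filesystem: write the zeroes explicitly. */
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeroes) ? (size_t)len
                                                   : sizeof(zeroes);
        ssize_t n = pwrite(fd, zeroes, chunk, off);

        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted, retry the same chunk */
        if (n <= 0)
            return -1;
        off += n;
        len -= n;
    }
    return 0;
}
```

A caller that wants a fully-provisioned file would pass may_trim as false; a
caller that only cares about reading back zeroes would pass true.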
> EG if the server has a back end implementation (as I suspect
> Ceph on qemu-nbd does)
Ceph doesn't, but gluster does.
> which never actually stores all zero blocks,
> it won't make a difference, and conceivably you're generating a whole
> pile of I/O to avoid sparseness when sparseness might be faster. Take
> for example a persistent memory interface, where fragmentation is
> irrelevant, and writing piles of zeroes to memory is a waste of time.
It certainly isn't a waste of time if your intention is to scrub data
belonging to a previous tenant, before giving access to someone else!
If you have a metadata layer above then you can handle the command there
(that's why we're adding it); if you haven't you do have to write the
zeroes.
Paolo