From: Bob Peterson <rpeterso@redhat.com>
To: linux-fsdevel@vger.kernel.org, Alexander Viro <aviro@redhat.com>,
cluster-devel <cluster-devel@redhat.com>,
Steven Whitehouse <swhiteho@redhat.com>
Subject: [PATCH][try6] VFS: new want_holesize and got_holesize buffer_head flags for fiemap
Date: Tue, 21 Oct 2014 21:09:55 -0400 (EDT)
Message-ID: <579956475.9102148.1413940195344.JavaMail.zimbra@redhat.com>
In-Reply-To: <635673293.9099204.1413939501068.JavaMail.zimbra@redhat.com>
Hi,
This is my sixth rework of this patch. The problem with the previous
version is that the underlying file system's block_map function
may set the buffer_head's b_state to 0, which fiemap may misinterpret
as meaning that the hole size has been returned in b_size.
This version implements a suggestion from Steve Whitehouse. It
improves on the design by (unfortunately) using two flags: one flag to
tell block_map to return the hole size if possible, and another flag to
tell fiemap that the hole size has been returned. This way there is
no possibility of a misunderstanding.
The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long time (we're talking days here) because
__generic_block_fiemap does a block-by-block search when it
encounters a hole.
The solution:
Allow the underlying file system to return the hole size so that
function __generic_block_fiemap can quickly skip the hole. I have a
companion patch to GFS2 that takes advantage of the new flags to
speed up fiemap on sparse GFS2 files. Other file systems can do the
same as they see fit. For GFS2, the time it takes to skip a 1PB hole
in a sparse file goes from days to milliseconds.
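To put the 1PB figure in perspective, here is a rough userspace sketch of
how many blocks a block-by-block scan has to visit. The helper name
hole_blocks() and the 4 KiB block size used below are illustrative
assumptions, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): number of file system blocks
 * covered by a hole of hole_bytes bytes, rounding up, when the block
 * size is (1 << blkbits) bytes. A block-by-block scan issues one
 * get_block call per block counted here.
 */
static uint64_t hole_blocks(uint64_t hole_bytes, unsigned int blkbits)
{
	assert(blkbits < 64);
	return (hole_bytes + ((UINT64_C(1) << blkbits) - 1)) >> blkbits;
}
```

With an assumed 4 KiB block size (blkbits = 12), hole_blocks(1ULL << 50, 12)
is 2^38, or roughly 275 billion get_block calls for a single 1PB hole.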
Patch description:
This patch changes function __generic_block_fiemap so that it sets a new
buffer_want_holesize bit. The new bit signals to the underlying file system
to return a hole size from its block_map function (if possible) in the
event that a hole is encountered at the requested block. If the block_map
function encounters a hole and sets buffer_got_holesize, fiemap takes the
returned b_size to be the size of the hole, in bytes. It then skips the
hole and moves to the next block after it. This may be repeated several
times in a row, especially for large holes, due to possible limitations of
the fs-specific block_map function. This is still much faster than trying
each block individually when large holes are encountered. If the
block_map function does not set buffer_got_holesize, the request for the
hole size has been ignored, and fiemap falls back to today's method of
doing a block-by-block search for the next valid block.
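The two-flag handshake described above can be sketched in userspace as
follows. This is only a model under stated assumptions: struct mock_bh,
struct mock_file, mock_get_block() and calls_to_skip_hole() are
hypothetical stand-ins, not the kernel's buffer_head or any real file
system's block_map function, and the per-call cap max_report_blocks is
invented to mimic the "possible limitations" mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the buffer_head state bits. */
enum {
	MOCK_MAPPED        = 1u << 0,
	MOCK_WANT_HOLESIZE = 1u << 1,	/* caller asks for the hole size */
	MOCK_GOT_HOLESIZE  = 1u << 2,	/* callee says b_size is a hole size */
};

struct mock_bh {
	unsigned int b_state;
	uint64_t b_size;		/* bytes */
};

/* Toy file: a hole over blocks [0, hole_blocks), mapped data after it. */
struct mock_file {
	uint64_t hole_blocks;
	unsigned int blkbits;		/* log2 of the block size */
	bool fs_reports_holes;		/* does the fs honor the request? */
	uint64_t max_report_blocks;	/* assumed per-call reporting cap */
};

static int mock_get_block(const struct mock_file *f, uint64_t blk,
			  struct mock_bh *bh)
{
	if (blk >= f->hole_blocks) {
		bh->b_state |= MOCK_MAPPED;
		return 0;
	}
	if (f->fs_reports_holes && (bh->b_state & MOCK_WANT_HOLESIZE)) {
		uint64_t left = f->hole_blocks - blk;

		assert(f->max_report_blocks > 0);
		if (left > f->max_report_blocks)
			left = f->max_report_blocks;
		bh->b_size = left << f->blkbits;
		bh->b_state |= MOCK_GOT_HOLESIZE;
	}
	return 0;
}

/* Mirrors the patched loop: returns the number of get_block calls made. */
static uint64_t calls_to_skip_hole(const struct mock_file *f,
				   uint64_t *first_mapped)
{
	uint64_t blk = 0, calls = 0;

	for (;;) {
		struct mock_bh bh = {
			.b_state = MOCK_WANT_HOLESIZE,
			.b_size = UINT64_C(1) << f->blkbits,
		};

		calls++;
		if (mock_get_block(f, blk, &bh))
			break;
		if (bh.b_state & MOCK_MAPPED)
			break;
		if (bh.b_state & MOCK_GOT_HOLESIZE)
			blk += bh.b_size >> f->blkbits;	/* skip the hole */
		else
			blk++;				/* old behaviour */
	}
	*first_mapped = blk;
	return calls;
}
```

In this model, a 1000-block hole reported at most 256 blocks at a time
is crossed in a handful of calls, while a file system that ignores the
request still works and simply costs one call per block, as it does today.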
Regards,
Bob Peterson
Red Hat File Systems
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
fs/ioctl.c | 7 ++++++-
include/linux/buffer_head.h | 4 ++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..9c07a40 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
memset(&map_bh, 0, sizeof(struct buffer_head));
map_bh.b_size = len;
+ set_buffer_want_holesize(&map_bh);
ret = get_block(inode, start_blk, &map_bh, 0);
if (ret)
break;
/* HOLE */
if (!buffer_mapped(&map_bh)) {
- start_blk++;
+ if (buffer_got_holesize(&map_bh))
+ start_blk += logical_to_blk(inode,
+ map_bh.b_size);
+ else
+ start_blk++;
/*
* We want to handle the case where there is an
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 324329c..fc8e47c 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -37,6 +37,8 @@ enum bh_state_bits {
BH_Meta, /* Buffer contains metadata */
BH_Prio, /* Buffer should be submitted with REQ_PRIO */
BH_Defer_Completion, /* Defer AIO completion to workqueue */
+ BH_Want_Holesize, /* Return hole size if possible */
+ BH_Got_Holesize, /* Hole size has been returned */
BH_PrivateStart,/* not a state bit, but the first bit available
* for private allocation by other entities
@@ -128,6 +130,8 @@ BUFFER_FNS(Boundary, boundary)
BUFFER_FNS(Write_EIO, write_io_error)
BUFFER_FNS(Unwritten, unwritten)
BUFFER_FNS(Meta, meta)
+BUFFER_FNS(Want_Holesize, want_holesize)
+BUFFER_FNS(Got_Holesize, got_holesize)
BUFFER_FNS(Prio, prio)
BUFFER_FNS(Defer_Completion, defer_completion)
--
1.9.3