* [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap
From: Bob Peterson @ 2014-09-03 15:52 UTC
  To: Alexander Viro, linux-fsdevel

Hi,

I sent this patch a few weeks back and it didn't receive any comments.
So I assume it's okay to add upstream?

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that function
__generic_block_fiemap can quickly skip the hole.

Preamble:
In cases where the fs-specific block_map() function finds a hole, it
can return the hole size in b_size. This is efficient because the file
system doesn't need to figure out block mapping a second time to
determine the hole size. The patch repurposes the buffer_meta flag
to tell when the fs-specific block_map() is passing back the hole_size:
If the fs-specific block_map() doesn't set the buffer_meta bit,
function __generic_block_fiemap() assumes a hole size of 1 as before.

Other file systems that want to take advantage of the new "hole size"
functionality need only write their own function to determine the hole
size, call it from their respective block_map() function, and
set_buffer_meta to put it into use. I've written a simple patch to GFS2
that does just that, as a follow-on.

Patch description:
This patch changes function __generic_block_fiemap so that if the
fs-specific block_map sets the buffer_meta flag corresponding to a hole,
it takes the returned b_size to be the size of the hole, in bytes.
This is much faster than trying each block individually when large
holes are encountered.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..954d1c3 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;
 
+		clear_buffer_meta(&map_bh);
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;
 
 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_meta(&map_bh))
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);
+			else
+				start_blk++;
 
 			/*
 			 * We want to handle the case where there is an

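To put the block-for-block cost described above in rough numbers, here is a minimal userspace sketch (not kernel code). It assumes that logical_to_blk() amounts to a right shift by the block-size bits; the names toy_logical_to_blk and lookups_needed are hypothetical and exist only for this illustration, and the "one call per hole" case assumes the file system can report the whole hole at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for logical_to_blk(): bytes -> blocks. */
static uint64_t toy_logical_to_blk(uint64_t bytes, unsigned int blkbits)
{
	assert(blkbits < 64);	/* a larger shift would be undefined */
	return bytes >> blkbits;
}

/*
 * Number of get_block() calls needed to cross one hole.
 *
 * Without a hole size, the loop advances one block per call.
 * With a hole size returned in b_size, a single call crosses the
 * hole (assumption: the file system reports the whole hole at once).
 */
static uint64_t lookups_needed(uint64_t hole_bytes, unsigned int blkbits,
			       int fs_reports_hole_size)
{
	uint64_t hole_blocks = toy_logical_to_blk(hole_bytes, blkbits);

	if (hole_blocks == 0)
		return 0;
	return fs_reports_hole_size ? 1 : hole_blocks;
}
```

For a 1 TiB hole with 4 KiB blocks, lookups_needed(1ULL << 40, 12, 0) is 268435456 calls, against a single call when the hole size is reported.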
* Re: [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap
From: Dave Chinner @ 2014-09-03 20:26 UTC
  To: Bob Peterson; +Cc: Alexander Viro, linux-fsdevel

On Wed, Sep 03, 2014 at 11:52:25AM -0400, Bob Peterson wrote:
> Hi,
>
> I sent this patch a few weeks back and it didn't receive any comments.
> So I assume it's okay to add upstream?
>
> The problem:
> If you do a fiemap operation on a very large sparse file, it can take
> an extremely long amount of time (we're talking days here) because
> function __generic_block_fiemap does a block-for-block search when it
> encounters a hole.
>
> The solution:
> Allow the underlying file system to return the hole size so that function
> __generic_block_fiemap can quickly skip the hole.
>
> Preamble:
> In cases where the fs-specific block_map() function finds a hole, it
> can return the hole size in b_size. This is efficient because the file
> system doesn't need to figure out block mapping a second time to
> determine the hole size. The patch repurposes the buffer_meta flag
> to tell when the fs-specific block_map() is passing back the hole_size:
> If the fs-specific block_map() doesn't set the buffer_meta bit,

Use of BH_Meta is a bit weird. Mapping the data blocks of a file and
expecting the return to have the BH_Meta flag set seems, well, A
Little Bit Wrong.

We're not short of buffer flags; perhaps add a purpose-specific
flag for this?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* [PATCH][try4] fs: Add hooks for get_hole_size to generic_block_fiemap
From: Bob Peterson @ 2014-09-05 0:04 UTC
  To: Alexander Viro, linux-fsdevel

Hi,

This version uses a new buffer flag, holesize, as Dave Chinner suggested.

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that function
__generic_block_fiemap can quickly skip the hole.

Preamble:
In cases where the fs-specific block_map() function finds a hole, it
can return the hole size in b_size. This is efficient because the file
system doesn't need to figure out block mapping a second time to
determine the hole size. The patch uses a new buffer_holesize flag
to tell when the fs-specific block_map() is passing back the hole_size:
If the fs-specific block_map() doesn't set the buffer_holesize bit,
function __generic_block_fiemap() assumes a hole size of 1 as before.

Other file systems that want to take advantage of the new "hole size"
functionality need only write their own function to determine the hole
size, call it from their respective block_map() function, and
set_buffer_holesize to put it into use. I've written a simple patch to
GFS2 that does just that, as a follow-on.

Patch description:
This patch changes function __generic_block_fiemap so that if the
fs-specific block_map sets the buffer_holesize flag corresponding to a
hole, it takes the returned b_size to be the size of the hole, in bytes.
This is much faster than trying each block individually when large
holes are encountered.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/ioctl.c                  | 7 ++++++-
 include/linux/buffer_head.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..121ba6f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;
 
+		clear_buffer_holesize(&map_bh);
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;
 
 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_holesize(&map_bh))
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);
+			else
+				start_blk++;
 
 			/*
 			 * We want to handle the case where there is an
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 324329c..39ed1f1 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -37,6 +37,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Holesize,	/* Hole encountered, hole size returned */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -128,6 +129,7 @@ BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
+BUFFER_FNS(Holesize, holesize)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
 

* [PATCH][try5] fs: if block_map clears buffer_holesize bit skip hole size from b_size
From: Bob Peterson @ 2014-09-11 15:29 UTC
  To: Alexander Viro, linux-fsdevel

Hi,

This version uses a new buffer flag, holesize, as Dave Chinner suggested.
It also incorporates a suggestion from Steve Whitehouse.

The problem:
If you do a fiemap operation on a very large sparse file, it can take
an extremely long amount of time (we're talking days here) because
function __generic_block_fiemap does a block-for-block search when it
encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that function
__generic_block_fiemap can quickly skip the hole.

Patch description:
This patch changes function __generic_block_fiemap so that it sets a new
buffer_holesize bit. The new bit signals to the underlying file system
to return a hole size from its block_map function (if possible) in the
event that a hole is encountered at the requested block. If the block_map
function encounters a hole, and clears buffer_holesize, fiemap takes the
returned b_size to be the size of the hole, in bytes. It then skips the
hole and moves to the next block. This may be repeated several times
in a row, especially for large holes, due to possible limitations of the
fs-specific block_map function. This is still much faster than trying
each block individually when large holes are encountered. If the
block_map function does not clear buffer_holesize, the request for
holesize has been ignored, and it falls back to today's method of doing a
block-by-block search for the next valid block.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/ioctl.c                  | 7 ++++++-
 include/linux/buffer_head.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..ae63b1f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;
 
+		set_buffer_holesize(&map_bh); /* return hole size if able */
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;
 
 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_holesize(&map_bh)) /* holesize ignored */
+				start_blk++;
+			else
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);
 
 			/*
 			 * We want to handle the case where there is an
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 324329c..b8ce396 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -37,6 +37,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Holesize,	/* Return hole size (and clear) if possible */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -128,6 +129,7 @@ BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
+BUFFER_FNS(Holesize, holesize)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
 

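The try5 handshake (caller sets the bit, file system clears it when it honors the request) can be modelled in userspace as below. This is only a sketch under stated assumptions, not the kernel or GFS2 code: struct toy_bh, struct toy_file, toy_get_block and toy_find_data are hypothetical names, the toy file is one leading hole followed by data, and max_hole_blocks stands in for the "possible limitations of the fs-specific block_map function" mentioned in the patch description.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAPPED   (1u << 0)	/* stands in for BH_Mapped */
#define TOY_HOLESIZE (1u << 1)	/* stands in for BH_Holesize */

struct toy_bh {
	unsigned int state;
	uint64_t b_size;		/* bytes */
};

/* Toy file: blocks [0, data_start) are a hole, data follows. */
struct toy_file {
	unsigned int blkbits;
	uint64_t data_start;		/* first mapped block */
	uint64_t max_hole_blocks;	/* per-call cap; 0 = flag is ignored */
};

/* Hypothetical fs-side mapper: reports the hole size when asked and able. */
static int toy_get_block(const struct toy_file *f, uint64_t blk,
			 struct toy_bh *bh)
{
	if (blk >= f->data_start) {
		bh->state |= TOY_MAPPED;
		return 0;
	}
	if ((bh->state & TOY_HOLESIZE) && f->max_hole_blocks) {
		uint64_t n = f->data_start - blk;

		if (n > f->max_hole_blocks)
			n = f->max_hole_blocks;
		bh->b_size = n << f->blkbits;
		bh->state &= ~TOY_HOLESIZE;	/* request honored */
	}
	return 0;
}

/* Mirrors the hole branch of the try5 loop; counts mapper calls. */
static uint64_t toy_find_data(const struct toy_file *f, uint64_t len_bytes,
			      uint64_t *calls)
{
	uint64_t blk = 0;

	*calls = 0;
	for (;;) {
		struct toy_bh bh = { .state = TOY_HOLESIZE, .b_size = len_bytes };

		(*calls)++;
		if (toy_get_block(f, blk, &bh) || (bh.state & TOY_MAPPED))
			return blk;
		if (bh.state & TOY_HOLESIZE)	/* holesize ignored */
			blk++;
		else
			blk += bh.b_size >> f->blkbits;
	}
}
```

With a 1000-block hole, a mapper that ignores the flag needs 1001 calls to reach the first data block, a mapper capped at 256 blocks per call needs 5, and an uncapped mapper needs 2. A mapper that never touches the bit keeps today's behaviour, which is the point of having the caller set it rather than the callee.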
Thread overview: 4+ messages
     [not found] <1623792099.11913714.1409074745372.JavaMail.zimbra@redhat.com>
2014-09-03 15:52 ` [PATCH][resend] fs: Add hooks for get_hole_size to generic_block_fiemap Bob Peterson
2014-09-03 20:26   ` Dave Chinner
2014-09-05  0:04   ` [PATCH][try4] " Bob Peterson
2014-09-11 15:29     ` [PATCH][try5] fs: if block_map clears buffer_holesize bit skip hole size from b_size Bob Peterson