* [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
[not found] <alpine.LSU.2.00.1207091533001.2051@eggly.anvils>
@ 2012-07-09 22:41 ` Hugh Dickins
2012-07-11 6:07 ` Cong Wang
0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2012-07-09 22:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Christoph Hellwig, Josef Bacik, Andi Kleen, Andreas Dilger,
Dave Chinner, Marco Stornelli, Jeff liu, Chris Mason, linux-mm,
linux-fsdevel, linux-kernel
Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE").
I believe it's correct, and it's been nice to have from rc1 to rc6;
but as the original commit said:
I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
would be of any use to them on tmpfs. This code adds 92 lines and 752
bytes on x86_64 - is that bloat or worthwhile?
Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs
to the dumb generic support for v3.5. We can always reinstate it later
if useful, and anyone needing it in a hurry can just get it out of git.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
---
But if someone protests at this reversion, of course we can drop it.
mm/shmem.c | 94 ---------------------------------------------------
1 file changed, 1 insertion(+), 93 deletions(-)
--- 3.5-rc6/mm/shmem.c 2012-07-07 18:20:40.635328642 -0700
+++ linux/mm/shmem.c 2012-07-07 19:20:02.986950655 -0700
@@ -1692,98 +1692,6 @@ static ssize_t shmem_file_splice_read(st
return error;
}
-/*
- * llseek SEEK_DATA or SEEK_HOLE through the radix_tree.
- */
-static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
- pgoff_t index, pgoff_t end, int origin)
-{
- struct page *page;
- struct pagevec pvec;
- pgoff_t indices[PAGEVEC_SIZE];
- bool done = false;
- int i;
-
- pagevec_init(&pvec, 0);
- pvec.nr = 1; /* start small: we may be there already */
- while (!done) {
- pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
- pvec.nr, pvec.pages, indices);
- if (!pvec.nr) {
- if (origin == SEEK_DATA)
- index = end;
- break;
- }
- for (i = 0; i < pvec.nr; i++, index++) {
- if (index < indices[i]) {
- if (origin == SEEK_HOLE) {
- done = true;
- break;
- }
- index = indices[i];
- }
- page = pvec.pages[i];
- if (page && !radix_tree_exceptional_entry(page)) {
- if (!PageUptodate(page))
- page = NULL;
- }
- if (index >= end ||
- (page && origin == SEEK_DATA) ||
- (!page && origin == SEEK_HOLE)) {
- done = true;
- break;
- }
- }
- shmem_deswap_pagevec(&pvec);
- pagevec_release(&pvec);
- pvec.nr = PAGEVEC_SIZE;
- cond_resched();
- }
- return index;
-}
-
-static loff_t shmem_file_llseek(struct file *file, loff_t offset, int origin)
-{
- struct address_space *mapping;
- struct inode *inode;
- pgoff_t start, end;
- loff_t new_offset;
-
- if (origin != SEEK_DATA && origin != SEEK_HOLE)
- return generic_file_llseek_size(file, offset, origin,
- MAX_LFS_FILESIZE);
- mapping = file->f_mapping;
- inode = mapping->host;
- mutex_lock(&inode->i_mutex);
- /* We're holding i_mutex so we can access i_size directly */
-
- if (offset < 0)
- offset = -EINVAL;
- else if (offset >= inode->i_size)
- offset = -ENXIO;
- else {
- start = offset >> PAGE_CACHE_SHIFT;
- end = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- new_offset = shmem_seek_hole_data(mapping, start, end, origin);
- new_offset <<= PAGE_CACHE_SHIFT;
- if (new_offset > offset) {
- if (new_offset < inode->i_size)
- offset = new_offset;
- else if (origin == SEEK_DATA)
- offset = -ENXIO;
- else
- offset = inode->i_size;
- }
- }
-
- if (offset >= 0 && offset != file->f_pos) {
- file->f_pos = offset;
- file->f_version = 0;
- }
- mutex_unlock(&inode->i_mutex);
- return offset;
-}
-
static long shmem_fallocate(struct file *file, int mode, loff_t offset,
loff_t len)
{
@@ -2787,7 +2695,7 @@ static const struct address_space_operat
static const struct file_operations shmem_file_operations = {
.mmap = shmem_mmap,
#ifdef CONFIG_TMPFS
- .llseek = shmem_file_llseek,
+ .llseek = generic_file_llseek,
.read = do_sync_read,
.write = do_sync_write,
.aio_read = shmem_file_aio_read,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-09 22:41 ` [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE Hugh Dickins
@ 2012-07-11 6:07 ` Cong Wang
2012-07-11 18:55 ` Hugh Dickins
0 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2012-07-11 6:07 UTC (permalink / raw)
To: linux-mm; +Cc: linux-fsdevel, linux-kernel
On Mon, 09 Jul 2012 at 22:41 GMT, Hugh Dickins <hughd@google.com> wrote:
> Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE").
> I believe it's correct, and it's been nice to have from rc1 to rc6;
> but as the original commit said:
>
> I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
> would be of any use to them on tmpfs. This code adds 92 lines and 752
> bytes on x86_64 - is that bloat or worthwhile?
I don't think 752 bytes matter much, especially for x86_64.
>
> Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs
> to the dumb generic support for v3.5. We can always reinstate it later
> if useful, and anyone needing it in a hurry can just get it out of git.
>
If you don't have burden to maintain it, I'd prefer to leave as it is,
I don't think 752-bytes is the reason we revert it.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-11 6:07 ` Cong Wang
@ 2012-07-11 18:55 ` Hugh Dickins
2012-07-11 23:01 ` Dave Chinner
0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2012-07-11 18:55 UTC (permalink / raw)
To: Cong Wang; +Cc: linux-mm, linux-fsdevel, linux-kernel
On Wed, 11 Jul 2012, Cong Wang wrote:
> On Mon, 09 Jul 2012 at 22:41 GMT, Hugh Dickins <hughd@google.com> wrote:
> > Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE").
> > I believe it's correct, and it's been nice to have from rc1 to rc6;
> > but as the original commit said:
> >
> > I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
> > would be of any use to them on tmpfs. This code adds 92 lines and 752
> > bytes on x86_64 - is that bloat or worthwhile?
>
>
> I don't think 752 bytes matter much, especially for x86_64.
>
> >
> > Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs
> > to the dumb generic support for v3.5. We can always reinstate it later
> > if useful, and anyone needing it in a hurry can just get it out of git.
> >
>
> If you don't have burden to maintain it, I'd prefer to leave as it is,
> I don't think 752-bytes is the reason we revert it.
Thank you, your vote has been counted ;)
and I'll be glad if yours stimulates some agreement or disagreement.
But your vote would count for a lot more if you know of some app which
would really benefit from this functionality in tmpfs: I've heard of none.
Hugh
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-11 18:55 ` Hugh Dickins
@ 2012-07-11 23:01 ` Dave Chinner
2012-07-12 2:50 ` Hugh Dickins
2012-07-12 3:21 ` Jeff Liu
0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2012-07-11 23:01 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Cong Wang, linux-mm, linux-fsdevel, linux-kernel
On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
> On Wed, 11 Jul 2012, Cong Wang wrote:
> > On Mon, 09 Jul 2012 at 22:41 GMT, Hugh Dickins <hughd@google.com> wrote:
> > > Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE").
> > > I believe it's correct, and it's been nice to have from rc1 to rc6;
> > > but as the original commit said:
> > >
> > > I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
> > > would be of any use to them on tmpfs. This code adds 92 lines and 752
> > > bytes on x86_64 - is that bloat or worthwhile?
> >
> >
> > I don't think 752 bytes matter much, especially for x86_64.
> >
> > >
> > > Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs
> > > to the dumb generic support for v3.5. We can always reinstate it later
> > > if useful, and anyone needing it in a hurry can just get it out of git.
> > >
> >
> > If you don't have burden to maintain it, I'd prefer to leave as it is,
> > I don't think 752-bytes is the reason we revert it.
>
> Thank you, your vote has been counted ;)
> and I'll be glad if yours stimulates some agreement or disagreement.
>
> But your vote would count for a lot more if you know of some app which
> would really benefit from this functionality in tmpfs: I've heard of none.
So what? I've heard of no apps that use this functionality on XFS,
either, but I have heard of a lot of people asking for it to be
implemented over the past couple of years so they can use it.
There's been patches written to make coreutils (cp) make use of it
instead of parsing FIEMAP output to find holes, though I don't know
if that's gone beyond more than "here's some patches"....
Besides, given that you can punch holes in tmpfs files, it seems
strange to then say "we don't need a method of skipping holes to
find data quickly"....
Besides, seek-hole/data is still shiny new and lots of developers
aren't even aware of it's presence in recent kernels. Removing new
functionality saying "no-one is using it" is like smashing the egg
before the chicken hatches (or is it cutting of the chickes's head
before it lays the egg?).
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-11 23:01 ` Dave Chinner
@ 2012-07-12 2:50 ` Hugh Dickins
2012-07-12 3:21 ` Jeff Liu
1 sibling, 0 replies; 12+ messages in thread
From: Hugh Dickins @ 2012-07-12 2:50 UTC (permalink / raw)
To: Dave Chinner; +Cc: Cong Wang, linux-mm, linux-fsdevel, linux-kernel
On Thu, 12 Jul 2012, Dave Chinner wrote:
> On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
> > On Wed, 11 Jul 2012, Cong Wang wrote:
> > >
> > > If you don't have burden to maintain it, I'd prefer to leave as it is,
> > > I don't think 752-bytes is the reason we revert it.
> >
> > Thank you, your vote has been counted ;)
> > and I'll be glad if yours stimulates some agreement or disagreement.
> >
> > But your vote would count for a lot more if you know of some app which
> > would really benefit from this functionality in tmpfs: I've heard of none.
>
> So what? I've heard of no apps that use this functionality on XFS,
> either, but I have heard of a lot of people asking for it to be
> implemented over the past couple of years so they can use it.
I'd certainly not ask you to remove your support for it from XFS:
nobody would call XFS a minimal filesystem.
But tmpfs has a tradition and a duty to keep fairly small:
it needs to be useful, but it shouldn't be carrying unused baggage.
> There's been patches written to make coreutils (cp) make use of it
> instead of parsing FIEMAP output to find holes, though I don't know
> if that's gone beyond more than "here's some patches"....
>
> Besides, given that you can punch holes in tmpfs files, it seems
> strange to then say "we don't need a method of skipping holes to
> find data quickly"....
tmpfs has been punching holes (via MADV_REMOVE) since 2.6.16 (and
that wasn't added on my whim, IBM wanted and did it). But I haven't
heard of anybody asking for a method of skipping them in six years.
>
> Besides, seek-hole/data is still shiny new and lots of developers
> aren't even aware of it's presence in recent kernels. Removing new
> functionality saying "no-one is using it" is like smashing the egg
> before the chicken hatches (or is it cutting of the chickes's head
> before it lays the egg?).
(You remind me of my chicken-and-egg sandwiches - you can't get them,
you see, it's chicken and egg.)
I'm not trying to remove SEEK_HOLE/SEEK_DATA support from the kernel:
I'm just saying that nobody has yet made the case for their usefulness
in tmpfs, so they're better removed from it before v3.5 is released.
Once we see how useful they have become in the grown-up filesystems,
or someone shows how useful they can be on tmpfs, then we reinstate.
Of course, I'm on both sides of this argument: I wrote that code,
I like it, I'll be glad to put it back when it's useful to someone.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-11 23:01 ` Dave Chinner
2012-07-12 2:50 ` Hugh Dickins
@ 2012-07-12 3:21 ` Jeff Liu
2012-07-16 9:28 ` Hugh Dickins
1 sibling, 1 reply; 12+ messages in thread
From: Jeff Liu @ 2012-07-12 3:21 UTC (permalink / raw)
To: Dave Chinner
Cc: Hugh Dickins, Cong Wang, linux-mm, linux-fsdevel, linux-kernel
On 07/12/2012 07:01 AM, Dave Chinner wrote:
> On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
>> On Wed, 11 Jul 2012, Cong Wang wrote:
>>> On Mon, 09 Jul 2012 at 22:41 GMT, Hugh Dickins <hughd@google.com> wrote:
>>>> Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE").
>>>> I believe it's correct, and it's been nice to have from rc1 to rc6;
>>>> but as the original commit said:
>>>>
>>>> I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
>>>> would be of any use to them on tmpfs. This code adds 92 lines and 752
>>>> bytes on x86_64 - is that bloat or worthwhile?
>>>
>>>
>>> I don't think 752 bytes matter much, especially for x86_64.
>>>
>>>>
>>>> Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs
>>>> to the dumb generic support for v3.5. We can always reinstate it later
>>>> if useful, and anyone needing it in a hurry can just get it out of git.
>>>>
>>>
>>> If you don't have burden to maintain it, I'd prefer to leave as it is,
>>> I don't think 752-bytes is the reason we revert it.
>>
>> Thank you, your vote has been counted ;)
>> and I'll be glad if yours stimulates some agreement or disagreement.
>>
>> But your vote would count for a lot more if you know of some app which
>> would really benefit from this functionality in tmpfs: I've heard of none.
>
> So what? I've heard of no apps that use this functionality on XFS,
> either, but I have heard of a lot of people asking for it to be
> implemented over the past couple of years so they can use it.
> There's been patches written to make coreutils (cp) make use of it
> instead of parsing FIEMAP output to find holes, though I don't know
> if that's gone beyond more than "here's some patches"...
Yes, for apps, cp(1) will make use of it to replace the old FIEMAP for efficient sparse file copy.
I have implemented an extent-scan module to coreutils a few years ago,
http://fossies.org/dox/coreutils-8.17/extent-scan_8c_source.html
It does extent scan through FIEMAP, however, SEEK_DATA/SEEK_HOLE is more convenient and easy to use
considering the call interface. So FIEMAP will be replaced by SEEK_XXX once it got supported by EXT4.
Moreover, I have discussed with Jim who is the coreutils maintainer previously, He would like to post
extent-scan module to Gnulib so that other GNU utilities which are relied on Gnulib might be a potential
user of it, at least, GNU tar will definitely need it for sparse file backup.
>
> Besides, given that you can punch holes in tmpfs files, it seems
> strange to then say "we don't need a method of skipping holes to
> find data quickly"....
So its deserve to keep this feature working on tmpfs considering hole punch. :)
Thanks,
-Jeff
>
> Besides, seek-hole/data is still shiny new and lots of developers
> aren't even aware of it's presence in recent kernels. Removing new
> functionality saying "no-one is using it" is like smashing the egg
> before the chicken hatches (or is it cutting of the chickes's head
> before it lays the egg?).
>
> Cheers,
>
> Dave.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-12 3:21 ` Jeff Liu
@ 2012-07-16 9:28 ` Hugh Dickins
2012-07-17 6:15 ` Jeff Liu
2012-07-31 14:30 ` Jim Meyering
0 siblings, 2 replies; 12+ messages in thread
From: Hugh Dickins @ 2012-07-16 9:28 UTC (permalink / raw)
To: Jeff Liu; +Cc: Dave Chinner, Cong Wang, linux-mm, linux-fsdevel, linux-kernel
On Thu, 12 Jul 2012, Jeff Liu wrote:
> On 07/12/2012 07:01 AM, Dave Chinner wrote:
> > On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
> >>
> >> But your vote would count for a lot more if you know of some app which
> >> would really benefit from this functionality in tmpfs: I've heard of none.
> >
> > So what? I've heard of no apps that use this functionality on XFS,
> > either, but I have heard of a lot of people asking for it to be
> > implemented over the past couple of years so they can use it.
> > There's been patches written to make coreutils (cp) make use of it
> > instead of parsing FIEMAP output to find holes, though I don't know
> > if that's gone beyond more than "here's some patches"...
>
> Yes, for apps, cp(1) will make use of it to replace the old FIEMAP for efficient sparse file copy.
> I have implemented an extent-scan module to coreutils a few years ago,
> http://fossies.org/dox/coreutils-8.17/extent-scan_8c_source.html
Thanks for confirming Dave's pointer to cp.
Of course, tmpfs has never supported FIBMAP or FIEMAP;
but SEEK_DATA and SEEK_HOLE do fit it much more naturally.
>
> It does extent scan through FIEMAP, however, SEEK_DATA/SEEK_HOLE is more convenient and easy to use
> considering the call interface. So FIEMAP will be replaced by SEEK_XXX once it got supported by EXT4.
>
> Moreover, I have discussed with Jim who is the coreutils maintainer previously, He would like to post
> extent-scan module to Gnulib so that other GNU utilities which are relied on Gnulib might be a potential
> user of it, at least, GNU tar will definitely need it for sparse file backup.
Thanks for the info. I confess I'm not hugely swayed by cp and sparse
file archive arguments - I doubt many people care, and I doubt those who
do care are using tmpfs for them.
But my doubts are just ignorance. I was hoping to hear, not that we have
tools to copy sparse files efficiently (umm, over the network?), but
what apps are actually working live with those sparse files on tmpfs,
and now need to seek around them. Some math or physics applications?
> >
> > Besides, given that you can punch holes in tmpfs files, it seems
> > strange to then say "we don't need a method of skipping holes to
> > find data quickly"....
>
> So its deserve to keep this feature working on tmpfs considering hole punch. :)
Well, thank you, as I said earlier I am on both sides of the argument.
(And feel uncomfortably like a prima donna waiting in the wings until
the audience has shouted long enough for the encore.)
It's now taken out of 3.5, but we can bring it back when there's more
demand. Your extent-scan is itself waiting for ext4 to support it:
maybe get noisy at me when that's imminent.
Hugh
>
> Thanks,
> -Jeff
>
> >
> > Besides, seek-hole/data is still shiny new and lots of developers
> > aren't even aware of it's presence in recent kernels. Removing new
> > functionality saying "no-one is using it" is like smashing the egg
> > before the chicken hatches (or is it cutting of the chickes's head
> > before it lays the egg?).
> >
> > Cheers,
> >
> > Dave.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-16 9:28 ` Hugh Dickins
@ 2012-07-17 6:15 ` Jeff Liu
2012-07-31 14:30 ` Jim Meyering
1 sibling, 0 replies; 12+ messages in thread
From: Jeff Liu @ 2012-07-17 6:15 UTC (permalink / raw)
To: Hugh Dickins
Cc: Dave Chinner, Cong Wang, linux-mm, linux-fsdevel, linux-kernel
On 07/16/2012 05:28 PM, Hugh Dickins wrote:
> On Thu, 12 Jul 2012, Jeff Liu wrote:
>> On 07/12/2012 07:01 AM, Dave Chinner wrote:
>>> On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
>>>>
>>>> But your vote would count for a lot more if you know of some app which
>>>> would really benefit from this functionality in tmpfs: I've heard of none.
>>>
>>> So what? I've heard of no apps that use this functionality on XFS,
>>> either, but I have heard of a lot of people asking for it to be
>>> implemented over the past couple of years so they can use it.
>>> There's been patches written to make coreutils (cp) make use of it
>>> instead of parsing FIEMAP output to find holes, though I don't know
>>> if that's gone beyond more than "here's some patches"...
>>
>> Yes, for apps, cp(1) will make use of it to replace the old FIEMAP for efficient sparse file copy.
>> I have implemented an extent-scan module to coreutils a few years ago,
>> http://fossies.org/dox/coreutils-8.17/extent-scan_8c_source.html
>
> Thanks for confirming Dave's pointer to cp.
>
> Of course, tmpfs has never supported FIBMAP or FIEMAP;
> but SEEK_DATA and SEEK_HOLE do fit it much more naturally.
>
>>
>> It does extent scan through FIEMAP, however, SEEK_DATA/SEEK_HOLE is more convenient and easy to use
>> considering the call interface. So FIEMAP will be replaced by SEEK_XXX once it got supported by EXT4.
>>
>> Moreover, I have discussed with Jim who is the coreutils maintainer previously, He would like to post
>> extent-scan module to Gnulib so that other GNU utilities which are relied on Gnulib might be a potential
>> user of it, at least, GNU tar will definitely need it for sparse file backup.
>
> Thanks for the info. I confess I'm not hugely swayed by cp and sparse
> file archive arguments - I doubt many people care, and I doubt those who
> do care are using tmpfs for them.
>
> But my doubts are just ignorance. I was hoping to hear, not that we have
> tools to copy sparse files efficiently (umm, over the network?), but
> what apps are actually working live with those sparse files on tmpfs,
> and now need to seek around them. Some math or physics applications?
>
>>>
>>> Besides, given that you can punch holes in tmpfs files, it seems
>>> strange to then say "we don't need a method of skipping holes to
>>> find data quickly"....
>>
>> So its deserve to keep this feature working on tmpfs considering hole punch. :)
>
> Well, thank you, as I said earlier I am on both sides of the argument.
> (And feel uncomfortably like a prima donna waiting in the wings until
> the audience has shouted long enough for the encore.)
Oh, sorry, I missed you response to Dave before.
>
> It's now taken out of 3.5, but we can bring it back when there's more
> demand.
Yep. :)
Thanks,
-Jeff
> Your extent-scan is itself waiting for ext4 to support it:
> maybe get noisy at me when that's imminent.
>
> Hugh
>
>>
>> Thanks,
>> -Jeff
>>
>>>
>>> Besides, seek-hole/data is still shiny new and lots of developers
>>> aren't even aware of it's presence in recent kernels. Removing new
>>> functionality saying "no-one is using it" is like smashing the egg
>>> before the chicken hatches (or is it cutting of the chickes's head
>>> before it lays the egg?).
>>>
>>> Cheers,
>>>
>>> Dave.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-16 9:28 ` Hugh Dickins
2012-07-17 6:15 ` Jeff Liu
@ 2012-07-31 14:30 ` Jim Meyering
2012-07-31 14:42 ` Jim Meyering
2012-08-08 2:08 ` Hugh Dickins
1 sibling, 2 replies; 12+ messages in thread
From: Jim Meyering @ 2012-07-31 14:30 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Hugh Dickins
Hugh Dickins wrote:
> On Thu, 12 Jul 2012, Jeff Liu wrote:
> > On 07/12/2012 07:01 AM, Dave Chinner wrote:
> > > On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
> > >>
> > >> But your vote would count for a lot more if you know of some app which
> > >> would really benefit from this functionality in tmpfs: I've heard
> > >> of none.
...
[Jeff mentioned "cp"]
grep is another tool that would benefit.
I often put very large files (often sparse, too) on tmpfs file systems
and would like "grep -r PAT /tmp" to work well in spite of those files.
Please consider restoring SEEK_HOLE/SEEK_DATA support for tmpfs.
The lack of cross-FS support in SEEK_HOLE/SEEK_DATA support is a bit of a
thorn in our sides. FIEMAP is not a viable option, and SEEK_HOLE support
works only if you happen to be using btrfs, xfs, ocfs2 or 3.5.0-rcN tmpfs.
Not something we can rely on for a feature whose lack can convert grep -r
into a memory-hogging apparently-hung job or OOM-killer-target.
What would you like to happen when you run
(deliberately or inadvertently) grep on a large sparse file?
I want it to search only the non-HOLE sections of that file,
especially when examining a hole involves accumulating a
"line" that may be so long that it exhausts virtual memory.
We're not quite there, but for now can at least avoid the
VM-abusing behavior with --binary-file=without-match option,
which says to treat "binary" (sparse) files as if they contain no match.
Sometimes.
With working SEEK_HOLE support, grep does the right thing here:
(${AWK-awk} 'BEGIN{ for (i=0;i<1000;i++) printf "%080d\n", 0 }' < /dev/null
echo x | dd bs=1024k seek=8000000
) >8T-or-so
$ env time --format=%e grep x 8T-or-so
0.00
But without SEEK_HOLE support, and with a lot of memory, grep takes a
long time to allocate all of that space before it finally chokes or is killed.
Here, it takes 46 seconds before running out of memory:
$ env time grep --binary-file=without-match x 8T-or-so
grep: memory exhausted
3.15user 25.48system 0:46.46elapsed 61%CPU\
(0avgtext+0avgdata 12583712maxresident)k
0inputs+8outputs (0major+2733623minor)pagefaults 0swaps
[Exit 2]
Until very recently, grep was trying to guess whether an input
has a hole using st_blocks and st_size, but with file systems now
using compression, that method it too subject to false-positives.
Ideally we would use SEEK_HOLE/SEEK_DATA, but until that is useful on
more linux file systems, I suspect we'll have to choose our method based
on the file system type (at the cost of a statvfs call for each st_dev),
possibly in combination with the linux kernel version.
Here's some background/discussion on the topic, including the
original report about the st_blocks-based heuristic not working:
http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4604/focus=4610
In case you want to see the SEEK_HOLE-using code, grep's file_is_binary
function is here:
http://git.savannah.gnu.org/cgit/grep.git/tree/src/main.c#n439
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-31 14:30 ` Jim Meyering
@ 2012-07-31 14:42 ` Jim Meyering
2012-08-08 2:08 ` Hugh Dickins
1 sibling, 0 replies; 12+ messages in thread
From: Jim Meyering @ 2012-07-31 14:42 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Hugh Dickins
Jim Meyering wrote:
> Hugh Dickins wrote:
>> On Thu, 12 Jul 2012, Jeff Liu wrote:
>> > On 07/12/2012 07:01 AM, Dave Chinner wrote:
>> > > On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
>> > >>
>> > >> But your vote would count for a lot more if you know of some app which
>> > >> would really benefit from this functionality in tmpfs: I've heard
>> > >> of none.
> ...
> [Jeff mentioned "cp"]
>
> grep is another tool that would benefit.
> I often put very large files (often sparse, too) on tmpfs file systems
> and would like "grep -r PAT /tmp" to work well in spite of those files.
>
> Please consider restoring SEEK_HOLE/SEEK_DATA support for tmpfs.
Also, lseek's SEEK_HOLE/SEEK_DATA support is slated to be required
by POSIX 2008: http://austingroupbugs.net/view.php?id=415
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-07-31 14:30 ` Jim Meyering
2012-07-31 14:42 ` Jim Meyering
@ 2012-08-08 2:08 ` Hugh Dickins
2012-08-14 17:03 ` Paul Eggert
1 sibling, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2012-08-08 2:08 UTC (permalink / raw)
To: Jim Meyering; +Cc: Paul Eggert, Zheng Liu, linux-fsdevel
On Tue, 31 Jul 2012, Jim Meyering wrote:
> Hugh Dickins wrote:
> > On Thu, 12 Jul 2012, Jeff Liu wrote:
> > > On 07/12/2012 07:01 AM, Dave Chinner wrote:
> > > > On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
> > > >>
> > > >> But your vote would count for a lot more if you know of some app which
> > > >> would really benefit from this functionality in tmpfs: I've heard
> > > >> of none.
> ...
> [Jeff mentioned "cp"]
>
> grep is another tool that would benefit.
> I often put very large files (often sparse, too) on tmpfs file systems
> and would like "grep -r PAT /tmp" to work well in spite of those files.
>
> Please consider restoring SEEK_HOLE/SEEK_DATA support for tmpfs.
Many thanks, Jim, for taking the time to explain your case and provide
the links below. I am sorry to have given you guys trouble by adding
and then reverting the tmpfs SEEK_HOLE before 3.5 final.
I really wanted to respond to you positively. But now that (I think) I
understand the grep case better, I believe the best I can do is to keep
an eye on ext4 development, and reintroduce tmpfs SEEK_HOLE/SEEK_DATA
in the same release that proper support comes to ext4, whenever that is
(I notice that Zheng Liu has posted patches for it, but I have no idea
how close to ready they are).
>
> The lack of cross-FS support in SEEK_HOLE/SEEK_DATA support is a bit of a
> thorn in our sides. FIEMAP is not a viable option, and SEEK_HOLE support
> works only if you happen to be using btrfs, xfs, ocfs2 or 3.5.0-rcN tmpfs.
> Not something we can rely on for a feature whose lack can convert grep -r
> into a memory-hogging apparently-hung job or OOM-killer-target.
>
> What would you like to happen when you run
> (deliberately or inadvertently) grep on a large sparse file?
> I want it to search only the non-HOLE sections of that file,
That will be a nice optimization once SEEK_DATA is more widely
supported, yes.
> especially when examining a hole involves accumulating a
> "line" that may be so long that it exhausts virtual memory.
I think you're conflating two different things there.
As I understand it, there's a bug (or silliness) in grep such that
it ends up exhausting memory when it cannot divide the file up into
sensibly lengthed lines. I agree that you don't want grep to collapse
with "memory exhausted" in such a case; and you'll know better than me
what a sensible maximum length for a grep line should be.
But what has that got to do with holes? I have not tried grepping a
larger-than-memory+swap file containing neither NULs nor newlines, so
please correct me if I'm wrong: but I expect it to fail grep's binary
file test (with or without proper SEEK_HOLE support), but then to hit
the memory exhausted error.
What it's got to do with holes is that simple text files don't have
NULs in them, and there tends to be a correlation between files
containing NULs and files not divided into lines of text (though
I'm blissfully ignorant of even the most common document formats).
There's also a correlation between files containing NULs and files
containing NULs in their first 32k. grep's file_is_binary() looks for
a NUL in that first 32k (?); and the heuristic has been "enhanced" to
check for sparse files too (whether by st_blocks or now SEEK_HOLE) -
though no attempt to check for isolated NULs beyond the first 32k.
And there's even a "big-hole" test for this behaviour in the tree.
But wouldn't the developer's common case (object files amidst source
in the tree) usually be handled by that check on the first 32k?
And all this stuff about holes: shouldn't grep instead just be
checking for over-long lines instead of running out of memory?
I apologize, this is not the right forum to discuss grep code:
let's take it away from linux-fsdevel if followup is needed.
I'll happily reintroduce tmpfs SEEK_DATA and SEEK_HOLE
when ext4 joins btrfs and xfs in supporting it.
Thanks,
Hugh
> We're not quite there, but for now can at least avoid the
> VM-abusing behavior with --binary-file=without-match option,
> which says to treat "binary" (sparse) files as if they contain no match.
> Sometimes.
>
> With working SEEK_HOLE support, grep does the right thing here:
>
> (${AWK-awk} 'BEGIN{ for (i=0;i<1000;i++) printf "%080d\n", 0 }' < /dev/null
> echo x | dd bs=1024k seek=8000000
> ) >8T-or-so
>
> $ env time --format=%e grep x 8T-or-so
> 0.00
>
> But without SEEK_HOLE support, and with a lot of memory, grep takes a
> long time to allocate all of that space before it finally chokes or is killed.
> Here, it takes 46 seconds before running out of memory:
>
> $ env time grep --binary-file=without-match x 8T-or-so
> grep: memory exhausted
> 3.15user 25.48system 0:46.46elapsed 61%CPU\
> (0avgtext+0avgdata 12583712maxresident)k
> 0inputs+8outputs (0major+2733623minor)pagefaults 0swaps
> [Exit 2]
>
> Until very recently, grep was trying to guess whether an input
> has a hole using st_blocks and st_size, but with file systems now
> using compression, that method it too subject to false-positives.
>
> Ideally we would use SEEK_HOLE/SEEK_DATA, but until that is useful on
> more linux file systems, I suspect we'll have to choose our method based
> on the file system type (at the cost of a statvfs call for each st_dev),
> possibly in combination with the linux kernel version.
>
> Here's some background/discussion on the topic, including the
> original report about the st_blocks-based heuristic not working:
>
> http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4604/focus=4610
>
> In case you want to see the SEEK_HOLE-using code, grep's file_is_binary
> function is here:
>
> http://git.savannah.gnu.org/cgit/grep.git/tree/src/main.c#n439
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
2012-08-08 2:08 ` Hugh Dickins
@ 2012-08-14 17:03 ` Paul Eggert
0 siblings, 0 replies; 12+ messages in thread
From: Paul Eggert @ 2012-08-14 17:03 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Jim Meyering, Zheng Liu, linux-fsdevel
On 08/07/2012 07:08 PM, Hugh Dickins wrote:
> wouldn't the developer's common case (object files amidst source
> in the tree) usually be handled by that check on the first 32k?
Yes, but grep should also handle the less-common case
where the first 32K is text and there's a large hole later.
The particular case I'm worried about is a denial-of-service
attack, so it's irrelevant that this case is uncommon in
typical files.
> shouldn't grep instead just be
> checking for over-long lines instead of running out of memory?
GNU programs should not have arbitrary limits. An arbitrary limit, such
as 100,000 bytes, that we put on line length, would cause grep to
not work on some valid inputs.
This is not to say that grep couldn't function better on files with
lots of nulls -- it can, and that's on our list of things to do --
but SEEK_HOLE is a big and obvious win in this area.
We also need SEEK_HOLE and SEEK_DATA for GNU 'tar', for the same reason
(denial-of-service attacks, mostly).
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-08-14 17:13 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <alpine.LSU.2.00.1207091533001.2051@eggly.anvils>
2012-07-09 22:41 ` [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE Hugh Dickins
2012-07-11 6:07 ` Cong Wang
2012-07-11 18:55 ` Hugh Dickins
2012-07-11 23:01 ` Dave Chinner
2012-07-12 2:50 ` Hugh Dickins
2012-07-12 3:21 ` Jeff Liu
2012-07-16 9:28 ` Hugh Dickins
2012-07-17 6:15 ` Jeff Liu
2012-07-31 14:30 ` Jim Meyering
2012-07-31 14:42 ` Jim Meyering
2012-08-08 2:08 ` Hugh Dickins
2012-08-14 17:03 ` Paul Eggert
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).