* Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek()
@ 2013-12-23 23:12 Jan Kara
2013-12-24 0:34 ` Josef Bacik
0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2013-12-23 23:12 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Josef Bacik
Hello,
so I've now hit a xfstests failure for UDF which is caused by the
implementation of SEEK_HOLE / SEEK_DATA in generic_file_llseek(). UDF uses
that function as its .llseek method but it supports holes as any other unix
filesystem (e.g. ext2). The test in xfstests assumes that when it creates a
file by pwrite(fd, buf, bufsz, off), then SEEK_DATA on offset 0 should
return 'off' (off is reasonably rounded) but that's not true for the
implementation in generic_file_llseek().
Now I'm not so much interested in that test itself - that can be tweaked to
detect that case. But I rather wanted to ask - how useful is it to
implement SEEK_HOLE / SEEK_DATA the way it is in generic_file_llseek()?
Because it seems to me that any serious user will have to detect whether
SEEK_HOLE / SEEK_DATA works reasonably and if not, fall back to some
heuristic anyway. So why bother inventing bogus values in
generic_file_llseek and thus making detection of working implementation
harder?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek()
2013-12-23 23:12 Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek() Jan Kara
@ 2013-12-24 0:34 ` Josef Bacik
2013-12-25 14:34 ` Jeff Liu
2013-12-30 21:18 ` Jan Kara
0 siblings, 2 replies; 5+ messages in thread
From: Josef Bacik @ 2013-12-24 0:34 UTC (permalink / raw)
To: Jan Kara, linux-fsdevel
On 12/23/2013 06:12 PM, Jan Kara wrote:
> Hello,
>
> so I've now hit a xfstests failure for UDF which is caused by the
> implementation of SEEK_HOLE / SEEK_DATA in generic_file_llseek(). UDF uses
> that function as its .llseek method but it supports holes as any other unix
> filesystem (e.g. ext2). The test in xfstests assumes that when it creates a
> file by pwrite(fd, buf, bufsz, off), then SEEK_DATA on offset 0 should
> return 'off' (off is reasonably rounded) but that's not true for the
> implementation in generic_file_llseek().
>
> Now I'm not so much interested in that test itself - that can be tweaked to
> detect that case. But I rather wanted to ask - how useful is it to
> implement SEEK_HOLE / SEEK_DATA the way it is in generic_file_llseek()?
> Because it seems to me that any serious user will have to detect whether
> SEEK_HOLE / SEEK_DATA works reasonably and if not, fall back to some
> heuristic anyway. So why bother inventing bogus values in
> generic_file_llseek and thus making detection of working implementation
> harder?
I'm writing this from my in-laws so I'm going to make some assumptions
about how the code works based on my memory, so sorry in advanced if
this is completely wrong ;).
IIRC with the generic implementation we treat everything <= i_size as
data and i_size as the first hole. The way the spec works is that if we
are currently at data and do seek_data then we just return our current
offset, same for a hole. In order to not be a jackass and have
-EOPNOTSUPP for anybody who didn't implement seek_hole/seek_data I just
did it this way where the only hole is the one that starts at i_size, so
seek_data before that is going to return the value.
As far as detecting an optimized handling of seek_hole/seek_data I'm not
sure what the best answer for that is. I suppose seek_hole/seek_data is
new enough that people will have checks for -EOPNOTSUPP anyway so we
could just switch it back to that, but that seems like a regression of
sorts to me. I'm not married to the implementation as it is so I'm open
to suggestions. Thanks,
Josef
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek()
2013-12-24 0:34 ` Josef Bacik
@ 2013-12-25 14:34 ` Jeff Liu
2013-12-30 21:18 ` Jan Kara
1 sibling, 0 replies; 5+ messages in thread
From: Jeff Liu @ 2013-12-25 14:34 UTC (permalink / raw)
To: Josef Bacik, Jan Kara, linux-fsdevel, coreutils
Hi,
On 12/24 2013 08:34 AM, Josef Bacik wrote:
> On 12/23/2013 06:12 PM, Jan Kara wrote:
>> Hello,
>>
>> so I've now hit a xfstests failure for UDF which is caused by the
>> implementation of SEEK_HOLE / SEEK_DATA in generic_file_llseek(). UDF
>> uses
>> that function as its .llseek method but it supports holes as any other
>> unix
>> filesystem (e.g. ext2). The test in xfstests assumes that when it
>> creates a
>> file by pwrite(fd, buf, bufsz, off), then SEEK_DATA on offset 0 should
>> return 'off' (off is reasonably rounded) but that's not true for the
>> implementation in generic_file_llseek().
>>
>> Now I'm not so much interested in that test itself - that can be
>> tweaked to
>> detect that case. But I rather wanted to ask - how useful is it to
>> implement SEEK_HOLE / SEEK_DATA the way it is in generic_file_llseek()?
>> Because it seems to me that any serious user will have to detect whether
>> SEEK_HOLE / SEEK_DATA works reasonably and if not, fall back to some
>> heuristic anyway. So why bother inventing bogus values in
>> generic_file_llseek and thus making detection of working implementation
>> harder?
> I'm writing this from my in-laws so I'm going to make some assumptions
> about how the code works based on my memory, so sorry in advanced if
> this is completely wrong ;).
>
> IIRC with the generic implementation we treat everything <= i_size as
> data and i_size as the first hole. The way the spec works is that if we
> are currently at data and do seek_data then we just return our current
> offset, same for a hole. In order to not be a jackass and have
> -EOPNOTSUPP for anybody who didn't implement seek_hole/seek_data I just
> did it this way where the only hole is the one that starts at i_size, so
> seek_data before that is going to return the value.
>
> As far as detecting an optimized handling of seek_hole/seek_data I'm not
> sure what the best answer for that is. I suppose seek_hole/seek_data is
> new enough that people will have checks for -EOPNOTSUPP anyway so we
> could just switch it back to that, but that seems like a regression of
> sorts to me. I'm not married to the implementation as it is so I'm open
> to suggestions. Thanks,
Maybe we could consider this semantics from user space tools point of view,
i.e, Coreutils cp, sha1susm, GUN tar, AST(cp, mv, pax), etc...
I once tried to introduce SEEK_HOLE/DATA to Coreutils to improve cp(1) about
two years ago, that is intended to replace my old implementation via fiemap
ioctl(2),
http://lists.gnu.org/archive/html/bug-coreutils/2011-08/msg00102.html
Currently cp(1) handle sparse files via fiemap ioctl(2) although it have some
issues that have been discussed previously, but maybe we still have to support
it as a back up given that it performs efficiently for dealing with sparse files
on file systems without SEEK_HOLE/DATA support, therefore we can fall back to
the old fiemap calls. Hence, I personally would like to see if we can return
EOPNOTSUPP in this case.
BTW, star do pre-checkups if the file system support SEEK_HOLE/DATA by checking
EOPNOTSUPP, maybe this is in terms of the Solaris interface I'm not yet verified.
Thanks,
-Jeff
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek()
2013-12-24 0:34 ` Josef Bacik
2013-12-25 14:34 ` Jeff Liu
@ 2013-12-30 21:18 ` Jan Kara
2013-12-31 6:00 ` Theodore Ts'o
1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2013-12-30 21:18 UTC (permalink / raw)
To: Josef Bacik; +Cc: Jan Kara, linux-fsdevel
On Mon 23-12-13 19:34:18, Josef Bacik wrote:
> On 12/23/2013 06:12 PM, Jan Kara wrote:
> > Hello,
> >
> > so I've now hit a xfstests failure for UDF which is caused by the
> >implementation of SEEK_HOLE / SEEK_DATA in generic_file_llseek(). UDF uses
> >that function as its .llseek method but it supports holes as any other unix
> >filesystem (e.g. ext2). The test in xfstests assumes that when it creates a
> >file by pwrite(fd, buf, bufsz, off), then SEEK_DATA on offset 0 should
> >return 'off' (off is reasonably rounded) but that's not true for the
> >implementation in generic_file_llseek().
> >
> >Now I'm not so much interested in that test itself - that can be tweaked to
> >detect that case. But I rather wanted to ask - how useful is it to
> >implement SEEK_HOLE / SEEK_DATA the way it is in generic_file_llseek()?
> >Because it seems to me that any serious user will have to detect whether
> >SEEK_HOLE / SEEK_DATA works reasonably and if not, fall back to some
> >heuristic anyway. So why bother inventing bogus values in
> >generic_file_llseek and thus making detection of working implementation
> >harder?
> I'm writing this from my in-laws so I'm going to make some
> assumptions about how the code works based on my memory, so sorry in
> advanced if this is completely wrong ;).
>
> IIRC with the generic implementation we treat everything <= i_size
> as data and i_size as the first hole. The way the spec works is
> that if we are currently at data and do seek_data then we just
> return our current offset, same for a hole. In order to not be a
> jackass and have -EOPNOTSUPP for anybody who didn't implement
> seek_hole/seek_data I just did it this way where the only hole is
> the one that starts at i_size, so seek_data before that is going to
> return the value.
Correct. My point is that I actually don't see anything 'jackass' in
returning -EOPNOTSUPP. I agree the way you've implemented
SEEK_HOLE/SEEK_DATA is a correct implementation of the spec but is it a
useful one? I mean e.g. cp(1) will rather want to fall back to its old
heuristic if SEEK_DATA just returns the current file offset. And if the
most of reasonable usecases of SEEK_DATA/SEEK_HOLE will need to detect the
case of dumb SEEK_DATA/SEEK_HOLE implementation, then what's the point?
> As far as detecting an optimized handling of seek_hole/seek_data I'm
> not sure what the best answer for that is. I suppose
> seek_hole/seek_data is new enough that people will have checks for
> -EOPNOTSUPP anyway so we could just switch it back to that, but that
> seems like a regression of sorts to me. I'm not married to the
> implementation as it is so I'm open to suggestions. Thanks,
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek()
2013-12-30 21:18 ` Jan Kara
@ 2013-12-31 6:00 ` Theodore Ts'o
0 siblings, 0 replies; 5+ messages in thread
From: Theodore Ts'o @ 2013-12-31 6:00 UTC (permalink / raw)
To: Jan Kara; +Cc: Josef Bacik, linux-fsdevel
On Mon, Dec 30, 2013 at 10:18:31PM +0100, Jan Kara wrote:
> > IIRC with the generic implementation we treat everything <= i_size
> > as data and i_size as the first hole. The way the spec works is
> > that if we are currently at data and do seek_data then we just
> > return our current offset, same for a hole. In order to not be a
> > jackass and have -EOPNOTSUPP for anybody who didn't implement
> > seek_hole/seek_data I just did it this way where the only hole is
> > the one that starts at i_size, so seek_data before that is going to
> > return the value.
>
> Correct. My point is that I actually don't see anything 'jackass' in
> returning -EOPNOTSUPP.
In fact, one useful reason in returning -EOPNOTSUPP is so the
userspace program can tell whether or not SEEK_HOLE/SEEK_DATA is
really implemented by the low-level file system. If the userspace
program knows that it's not available, it might want to fall back to
some alternate mechanism, such as scanning the contents of the file
blocks which are all zeros. (If the program knows that
SEEK_HOLE/SEEK_DATA is really there, it might to avoid the CPU
overhead of scanning the data buffers as it does the copy.)
- Ted
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-12-31 6:00 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-23 23:12 Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek() Jan Kara
2013-12-24 0:34 ` Josef Bacik
2013-12-25 14:34 ` Jeff Liu
2013-12-30 21:18 ` Jan Kara
2013-12-31 6:00 ` Theodore Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).