* Re: Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek()
2013-12-24 0:34 ` Josef Bacik
@ 2013-12-25 14:34 ` Jeff Liu
2013-12-30 21:18 ` Jan Kara
1 sibling, 0 replies; 5+ messages in thread
From: Jeff Liu @ 2013-12-25 14:34 UTC (permalink / raw)
To: Josef Bacik, Jan Kara, linux-fsdevel, coreutils
Hi,
On 12/24 2013 08:34 AM, Josef Bacik wrote:
> On 12/23/2013 06:12 PM, Jan Kara wrote:
>> Hello,
>>
>> so I've now hit a xfstests failure for UDF which is caused by the
>> implementation of SEEK_HOLE / SEEK_DATA in generic_file_llseek(). UDF
>> uses
>> that function as its .llseek method but it supports holes as any other
>> unix
>> filesystem (e.g. ext2). The test in xfstests assumes that when it
>> creates a
>> file by pwrite(fd, buf, bufsz, off), then SEEK_DATA on offset 0 should
>> return 'off' (off is reasonably rounded) but that's not true for the
>> implementation in generic_file_llseek().
>>
>> Now I'm not so much interested in that test itself - that can be
>> tweaked to
>> detect that case. But I rather wanted to ask - how useful is it to
>> implement SEEK_HOLE / SEEK_DATA the way it is in generic_file_llseek()?
>> Because it seems to me that any serious user will have to detect whether
>> SEEK_HOLE / SEEK_DATA works reasonably and if not, fall back to some
>> heuristic anyway. So why bother inventing bogus values in
>> generic_file_llseek and thus making detection of working implementation
>> harder?
> I'm writing this from my in-laws so I'm going to make some assumptions
> about how the code works based on my memory, so sorry in advanced if
> this is completely wrong ;).
>
> IIRC with the generic implementation we treat everything <= i_size as
> data and i_size as the first hole. The way the spec works is that if we
> are currently at data and do seek_data then we just return our current
> offset, same for a hole. In order to not be a jackass and have
> -EOPNOTSUPP for anybody who didn't implement seek_hole/seek_data I just
> did it this way where the only hole is the one that starts at i_size, so
> seek_data before that is going to return the value.
>
> As far as detecting an optimized handling of seek_hole/seek_data I'm not
> sure what the best answer for that is. I suppose seek_hole/seek_data is
> new enough that people will have checks for -EOPNOTSUPP anyway so we
> could just switch it back to that, but that seems like a regression of
> sorts to me. I'm not married to the implementation as it is so I'm open
> to suggestions. Thanks,
Maybe we could consider this semantics from user space tools point of view,
i.e, Coreutils cp, sha1susm, GUN tar, AST(cp, mv, pax), etc...
I once tried to introduce SEEK_HOLE/DATA to Coreutils to improve cp(1) about
two years ago, that is intended to replace my old implementation via fiemap
ioctl(2),
http://lists.gnu.org/archive/html/bug-coreutils/2011-08/msg00102.html
Currently cp(1) handle sparse files via fiemap ioctl(2) although it have some
issues that have been discussed previously, but maybe we still have to support
it as a back up given that it performs efficiently for dealing with sparse files
on file systems without SEEK_HOLE/DATA support, therefore we can fall back to
the old fiemap calls. Hence, I personally would like to see if we can return
EOPNOTSUPP in this case.
BTW, star do pre-checkups if the file system support SEEK_HOLE/DATA by checking
EOPNOTSUPP, maybe this is in terms of the Solaris interface I'm not yet verified.
Thanks,
-Jeff
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Usefulness of SEEK_HOLE / SEEK_DATA in generic_file_llseek()
2013-12-24 0:34 ` Josef Bacik
2013-12-25 14:34 ` Jeff Liu
@ 2013-12-30 21:18 ` Jan Kara
2013-12-31 6:00 ` Theodore Ts'o
1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2013-12-30 21:18 UTC (permalink / raw)
To: Josef Bacik; +Cc: Jan Kara, linux-fsdevel
On Mon 23-12-13 19:34:18, Josef Bacik wrote:
> On 12/23/2013 06:12 PM, Jan Kara wrote:
> > Hello,
> >
> > so I've now hit a xfstests failure for UDF which is caused by the
> >implementation of SEEK_HOLE / SEEK_DATA in generic_file_llseek(). UDF uses
> >that function as its .llseek method but it supports holes as any other unix
> >filesystem (e.g. ext2). The test in xfstests assumes that when it creates a
> >file by pwrite(fd, buf, bufsz, off), then SEEK_DATA on offset 0 should
> >return 'off' (off is reasonably rounded) but that's not true for the
> >implementation in generic_file_llseek().
> >
> >Now I'm not so much interested in that test itself - that can be tweaked to
> >detect that case. But I rather wanted to ask - how useful is it to
> >implement SEEK_HOLE / SEEK_DATA the way it is in generic_file_llseek()?
> >Because it seems to me that any serious user will have to detect whether
> >SEEK_HOLE / SEEK_DATA works reasonably and if not, fall back to some
> >heuristic anyway. So why bother inventing bogus values in
> >generic_file_llseek and thus making detection of working implementation
> >harder?
> I'm writing this from my in-laws so I'm going to make some
> assumptions about how the code works based on my memory, so sorry in
> advanced if this is completely wrong ;).
>
> IIRC with the generic implementation we treat everything <= i_size
> as data and i_size as the first hole. The way the spec works is
> that if we are currently at data and do seek_data then we just
> return our current offset, same for a hole. In order to not be a
> jackass and have -EOPNOTSUPP for anybody who didn't implement
> seek_hole/seek_data I just did it this way where the only hole is
> the one that starts at i_size, so seek_data before that is going to
> return the value.
Correct. My point is that I actually don't see anything 'jackass' in
returning -EOPNOTSUPP. I agree the way you've implemented
SEEK_HOLE/SEEK_DATA is a correct implementation of the spec but is it a
useful one? I mean e.g. cp(1) will rather want to fall back to its old
heuristic if SEEK_DATA just returns the current file offset. And if the
most of reasonable usecases of SEEK_DATA/SEEK_HOLE will need to detect the
case of dumb SEEK_DATA/SEEK_HOLE implementation, then what's the point?
> As far as detecting an optimized handling of seek_hole/seek_data I'm
> not sure what the best answer for that is. I suppose
> seek_hole/seek_data is new enough that people will have checks for
> -EOPNOTSUPP anyway so we could just switch it back to that, but that
> seems like a regression of sorts to me. I'm not married to the
> implementation as it is so I'm open to suggestions. Thanks,
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 5+ messages in thread