* SEEK_DATA/SEEK_HOLE support
@ 2011-10-02 15:04 Jeff Liu
2011-10-02 15:42 ` Christoph Hellwig
2011-11-14 10:24 ` Christoph Hellwig
0 siblings, 2 replies; 28+ messages in thread
From: Jeff Liu @ 2011-10-02 15:04 UTC (permalink / raw)
To: xfs
Dear developer,
Does anyone already worked on SEEK_DATA/SEEK_HOLE for XFS? I'd like to
implement it if not. :)
Thanks,
-Jeff
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-02 15:04 SEEK_DATA/SEEK_HOLE support Jeff Liu
@ 2011-10-02 15:42 ` Christoph Hellwig
2011-10-02 16:06 ` Jeff Liu
2011-11-14 10:24 ` Christoph Hellwig
1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2011-10-02 15:42 UTC (permalink / raw)
To: Jeff Liu; +Cc: xfs
On Sun, Oct 02, 2011 at 11:04:31PM +0800, Jeff Liu wrote:
> Dear developer,
>
> Does anyone already worked on SEEK_DATA/SEEK_HOLE for XFS? I'd like to
> implement it if not. :)
Dave mentioned he had a basic implementation, he might have some code
that you can improve on.
Did we get consensus about the the semantics of them for unwritten
extents? If we want them to be exact in the fact of unwritten extents
that is going to be the most work, e.g. if we find an unwritten extent
we'll then have to do a pagecache lookup and check if pages are dirty
and in that case not treat them as a hole.
The rest of the implementation should be easy, e.g. doing something like
the following pseudo-code which missed all the proper error codes and
conversions from the lseek arguments to filesystem blocks and the
required locking around it:
seek_hole()
{
xfs_bmap_search_extents(ip, bno, XFS_DATA_FORK, &eof, &lastx, &got,
&prev);
for (;;) {
if (eof) {
/* past the file, nothing do here */
return;
}
if (got.br_startoff > bno)
/* found hole, return it */
return;
}
if (++lastx == ifp->if_bytes / sizeof(xfs_bmbt_rec_t)) {
/* reached EOF */
return;
}
xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx), &got);
}
}
seek_data()
{
xfs_bmap_search_extents(ip, bno, XFS_DATA_FORK, &eof, &lastx, &got,
&prev);
for (;;) {
if (eof) {
/* past the file, nothing do here */
return;
}
/* next data extent */
return got.br_startoff < bno ? bno : got.br_startof;
}
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-02 15:42 ` Christoph Hellwig
@ 2011-10-02 16:06 ` Jeff Liu
2011-10-02 17:59 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Liu @ 2011-10-02 16:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
Hi Christoph,
Thanks for your prompt response!
On 10/02/2011 11:42 PM, Christoph Hellwig wrote:
> On Sun, Oct 02, 2011 at 11:04:31PM +0800, Jeff Liu wrote:
>> Dear developer,
>>
>> Does anyone already worked on SEEK_DATA/SEEK_HOLE for XFS? I'd like to
>> implement it if not. :)
>
> Dave mentioned he had a basic implementation, he might have some code
> that you can improve on.
>
> Did we get consensus about the the semantics of them for unwritten
> extents? If we want them to be exact in the fact of unwritten extents
> that is going to be the most work, e.g. if we find an unwritten extent
> we'll then have to do a pagecache lookup and check if pages are dirty
> and in that case not treat them as a hole.
IMHO, to avoid data loss in some user application like cp(1), for
unwritten extents, we always need to check the pages status. Just as
you mentioned above, return the map offset if pages are dirty for
SEEK_DATA, or a hole found.
-Jeff
>
> The rest of the implementation should be easy, e.g. doing something like
> the following pseudo-code which missed all the proper error codes and
> conversions from the lseek arguments to filesystem blocks and the
> required locking around it:
>
>
> seek_hole()
> {
> xfs_bmap_search_extents(ip, bno, XFS_DATA_FORK, &eof, &lastx, &got,
> &prev);
>
> for (;;) {
> if (eof) {
> /* past the file, nothing do here */
> return;
> }
>
> if (got.br_startoff > bno)
> /* found hole, return it */
> return;
> }
>
> if (++lastx == ifp->if_bytes / sizeof(xfs_bmbt_rec_t)) {
> /* reached EOF */
> return;
> }
> xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx), &got);
> }
> }
>
> seek_data()
> {
> xfs_bmap_search_extents(ip, bno, XFS_DATA_FORK, &eof, &lastx, &got,
> &prev);
>
> for (;;) {
> if (eof) {
> /* past the file, nothing do here */
> return;
> }
>
> /* next data extent */
> return got.br_startoff < bno ? bno : got.br_startof;
> }
> }
>
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-02 16:06 ` Jeff Liu
@ 2011-10-02 17:59 ` Christoph Hellwig
2011-10-02 19:11 ` Andi Kleen
2011-10-03 4:04 ` Jeff Liu
0 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2011-10-02 17:59 UTC (permalink / raw)
To: Jeff Liu; +Cc: xfs
On Mon, Oct 03, 2011 at 12:06:37AM +0800, Jeff Liu wrote:
> IMHO, to avoid data loss in some user application like cp(1), for
> unwritten extents, we always need to check the pages status. Just as
> you mentioned above, return the map offset if pages are dirty for
> SEEK_DATA, or a hole found.
I'd suggest to first implement the simple versions I schemed below,
which would treat unwritten extents as data. That is sub-optimal,
but a) safe and b) easy to implement. The second step would be to
add probing for unwritten extents, which is even something we could
do as a common helper routine shared by filesystems.
And the most important thing is of course adding QA for it. Josef
already wrote an xfstests case that needs to be resurrected, compared
against the latest Posix draft and if nessecary updated.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-02 17:59 ` Christoph Hellwig
@ 2011-10-02 19:11 ` Andi Kleen
2011-10-03 4:04 ` Jeff Liu
1 sibling, 0 replies; 28+ messages in thread
From: Andi Kleen @ 2011-10-02 19:11 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jeff Liu, xfs
Christoph Hellwig <hch@infradead.org> writes:
>
> And the most important thing is of course adding QA for it. Josef
> already wrote an xfstests case that needs to be resurrected, compared
> against the latest Posix draft and if nessecary updated.
Also when you do it please make sure you don't break unlocked lseek()
The patches for that are in Al's queue. Only take the lock
for SEEK_HOLE/DATA, but not for the other operations. Use the
new helper instead of cut'n'pasting code.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-02 17:59 ` Christoph Hellwig
2011-10-02 19:11 ` Andi Kleen
@ 2011-10-03 4:04 ` Jeff Liu
2011-10-03 23:43 ` Dave Chinner
1 sibling, 1 reply; 28+ messages in thread
From: Jeff Liu @ 2011-10-03 4:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On 10/03/2011 01:59 AM, Christoph Hellwig wrote:
> On Mon, Oct 03, 2011 at 12:06:37AM +0800, Jeff Liu wrote:
>> IMHO, to avoid data loss in some user application like cp(1), for
>> unwritten extents, we always need to check the pages status. Just as
>> you mentioned above, return the map offset if pages are dirty for
>> SEEK_DATA, or a hole found.
>
> I'd suggest to first implement the simple versions I schemed below,
> which would treat unwritten extents as data. That is sub-optimal,
> but a) safe and b) easy to implement. The second step would be to
> add probing for unwritten extents, which is even something we could
> do as a common helper routine shared by filesystems.
So I'll wait for Dave's patch become ready, and then continue to improve
it if necessary.
In the meantime, I'll try to figure out how to add a helper which can be
shared by all file systems for UNWRITTEN extents.
>
> And the most important thing is of course adding QA for it. Josef
> already wrote an xfstests case that needs to be resurrected, compared
> against the latest Posix draft and if nessecary updated.
Thanks,
-Jeff
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-03 4:04 ` Jeff Liu
@ 2011-10-03 23:43 ` Dave Chinner
2011-10-04 13:02 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2011-10-03 23:43 UTC (permalink / raw)
To: Jeff Liu; +Cc: Christoph Hellwig, xfs
On Mon, Oct 03, 2011 at 12:04:11PM +0800, Jeff Liu wrote:
> On 10/03/2011 01:59 AM, Christoph Hellwig wrote:
>
> > On Mon, Oct 03, 2011 at 12:06:37AM +0800, Jeff Liu wrote:
> >> IMHO, to avoid data loss in some user application like cp(1), for
> >> unwritten extents, we always need to check the pages status. Just as
> >> you mentioned above, return the map offset if pages are dirty for
> >> SEEK_DATA, or a hole found.
> >
> > I'd suggest to first implement the simple versions I schemed below,
> > which would treat unwritten extents as data. That is sub-optimal,
> > but a) safe and b) easy to implement. The second step would be to
> > add probing for unwritten extents, which is even something we could
> > do as a common helper routine shared by filesystems.
>
> So I'll wait for Dave's patch become ready, and then continue to improve
> it if necessary.
> In the meantime, I'll try to figure out how to add a helper which can be
> shared by all file systems for UNWRITTEN extents.
The lookup is pretty simple - if there's cached data over the
unwritten range, then I'm considering it a data range. If there's no
cached data over the unwritten extent, it's a hole. That makes the
lookup simply a case of finding the first cached page in the
unwritten extent.
It'll end up reading something like this:
iomap = offset_to_extent(offset);
first_index = extent_to_page_index(iomap);
nr_found = pagevec_lookup(&pvec, inode->i_mapping, first_index, 1);
if (!nr_found)
break;
offset = page->index << PAGECACHE_SHIFT;
pagevec_release(&pvec);
/* If we fell off the end of the extent lookup next extent */
if (offset >= end_of_extent(iomap)) {
offset = end_of_extent(iomap);
goto next_extent;
}
All the extent manipulations are pretty filesystem specific, so
there's not much that can be extracted into generic helper, I
think...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-03 23:43 ` Dave Chinner
@ 2011-10-04 13:02 ` Christoph Hellwig
2011-10-05 4:36 ` Dave Chinner
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2011-10-04 13:02 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, Jeff Liu, xfs
On Tue, Oct 04, 2011 at 10:43:05AM +1100, Dave Chinner wrote:
> The lookup is pretty simple - if there's cached data over the
> unwritten range, then I'm considering it a data range. If there's no
> cached data over the unwritten extent, it's a hole. That makes the
> lookup simply a case of finding the first cached page in the
> unwritten extent.
>
> It'll end up reading something like this:
>
> iomap = offset_to_extent(offset);
> first_index = extent_to_page_index(iomap);
>
> nr_found = pagevec_lookup(&pvec, inode->i_mapping, first_index, 1);
> if (!nr_found)
> break;
>
> offset = page->index << PAGECACHE_SHIFT;
> pagevec_release(&pvec);
>
> /* If we fell off the end of the extent lookup next extent */
> if (offset >= end_of_extent(iomap)) {
> offset = end_of_extent(iomap);
> goto next_extent;
> }
>
> All the extent manipulations are pretty filesystem specific, so
> there's not much that can be extracted into generic helper, I
> think...
Actually pretty similar code will work just fine if you passt the
start + len of the extents in (which we got from looking it up
fs-specificly):
Note that we have to look for both dirty and writeback pages to
make it safe.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-04 13:02 ` Christoph Hellwig
@ 2011-10-05 4:36 ` Dave Chinner
2011-10-05 5:32 ` Jeff Liu
2011-10-05 7:34 ` Michael Monnerie
0 siblings, 2 replies; 28+ messages in thread
From: Dave Chinner @ 2011-10-05 4:36 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jeff Liu, xfs
On Tue, Oct 04, 2011 at 09:02:08AM -0400, Christoph Hellwig wrote:
> On Tue, Oct 04, 2011 at 10:43:05AM +1100, Dave Chinner wrote:
> > The lookup is pretty simple - if there's cached data over the
> > unwritten range, then I'm considering it a data range. If there's no
> > cached data over the unwritten extent, it's a hole. That makes the
> > lookup simply a case of finding the first cached page in the
> > unwritten extent.
> >
> > It'll end up reading something like this:
> >
> > iomap = offset_to_extent(offset);
> > first_index = extent_to_page_index(iomap);
> >
> > nr_found = pagevec_lookup(&pvec, inode->i_mapping, first_index, 1);
> > if (!nr_found)
> > break;
> >
> > offset = page->index << PAGECACHE_SHIFT;
> > pagevec_release(&pvec);
> >
> > /* If we fell off the end of the extent lookup next extent */
> > if (offset >= end_of_extent(iomap)) {
> > offset = end_of_extent(iomap);
> > goto next_extent;
> > }
> >
> > All the extent manipulations are pretty filesystem specific, so
> > there's not much that can be extracted into generic helper, I
> > think...
>
> Actually pretty similar code will work just fine if you passt the
> start + len of the extents in (which we got from looking it up
> fs-specificly):
>
> Note that we have to look for both dirty and writeback pages to
> make it safe.
That will only work if you can prevent concurrent unwritten extent
conversion from happening while we do the separate tag lookups on
the range because it requires two radix tree tag lookups rather than
just one index lookup. i.e. miss the dirty page because it went
dirty->writeback during the dirty tag search, and miss the same page
when doing the writeback lookup because it went writeback->clean
very quickly due to IO completion.
So to stop that from happening, it requires that filesystems can
exclude unwritten extent conversion from happening while a
SEEK_HOLE/SEEK_DATA operation is in progress, and that the
filesystem can safely do mapping tree lookups while providing that
extent tree exclusion. I know that XFS has no problems here, but
filesystems that use i_mutex for everything might be in trouble.
Besides, if we just look for pages in the cache over unwritten
extents (i.e. someone has treated it as data already), then it can
be done locklessly without having to worry about page state changes
occurring concurrently...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-05 4:36 ` Dave Chinner
@ 2011-10-05 5:32 ` Jeff Liu
2011-10-05 9:23 ` Dave Chinner
2011-10-05 7:34 ` Michael Monnerie
1 sibling, 1 reply; 28+ messages in thread
From: Jeff Liu @ 2011-10-05 5:32 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On 10/05/2011 12:36 PM, Dave Chinner wrote:
> On Tue, Oct 04, 2011 at 09:02:08AM -0400, Christoph Hellwig wrote:
>> On Tue, Oct 04, 2011 at 10:43:05AM +1100, Dave Chinner wrote:
>>> The lookup is pretty simple - if there's cached data over the
>>> unwritten range, then I'm considering it a data range. If there's no
>>> cached data over the unwritten extent, it's a hole. That makes the
>>> lookup simply a case of finding the first cached page in the
>>> unwritten extent.
>>>
>>> It'll end up reading something like this:
>>>
>>> iomap = offset_to_extent(offset);
>>> first_index = extent_to_page_index(iomap);
>>>
>>> nr_found = pagevec_lookup(&pvec, inode->i_mapping, first_index, 1);
>>> if (!nr_found)
>>> break;
>>>
>>> offset = page->index << PAGECACHE_SHIFT;
>>> pagevec_release(&pvec);
>>>
>>> /* If we fell off the end of the extent lookup next extent */
>>> if (offset >= end_of_extent(iomap)) {
>>> offset = end_of_extent(iomap);
>>> goto next_extent;
>>> }
>>>
>>> All the extent manipulations are pretty filesystem specific, so
>>> there's not much that can be extracted into generic helper, I
>>> think...
>>
>> Actually pretty similar code will work just fine if you passt the
>> start + len of the extents in (which we got from looking it up
>> fs-specificly):
>>
>> Note that we have to look for both dirty and writeback pages to
>> make it safe.
>
> That will only work if you can prevent concurrent unwritten extent
> conversion from happening while we do the separate tag lookups on
> the range because it requires two radix tree tag lookups rather than
> just one index lookup. i.e. miss the dirty page because it went
> dirty->writeback during the dirty tag search, and miss the same page
> when doing the writeback lookup because it went writeback->clean
> very quickly due to IO completion.
>
> So to stop that from happening, it requires that filesystems can
> exclude unwritten extent conversion from happening while a
> SEEK_HOLE/SEEK_DATA operation is in progress, and that the
> filesystem can safely do mapping tree lookups while providing that
> extent tree exclusion. I know that XFS has no problems here, but
> filesystems that use i_mutex for everything might be in trouble.
>
> Besides, if we just look for pages in the cache over unwritten
> extents (i.e. someone has treated it as data already), then it can
> be done locklessly without having to worry about page state changes
> occurring concurrently...
>From some backup utility's point of view, AFAICS, the benefits is,
utilites does not need to perform read/write for an unwritten extents
without dirty or writeback pages, instead, just call lseek(2) to treat
it as hole. The issue is, is it worth to do that? i.e, exclude
unwritten extent conversions when SEEK_XXX is ongoing.
Thanks,
-Jeff
>
> Cheers,
>
> Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-05 4:36 ` Dave Chinner
2011-10-05 5:32 ` Jeff Liu
@ 2011-10-05 7:34 ` Michael Monnerie
2011-10-05 9:36 ` Dave Chinner
1 sibling, 1 reply; 28+ messages in thread
From: Michael Monnerie @ 2011-10-05 7:34 UTC (permalink / raw)
To: xfs; +Cc: Christoph Hellwig, Jeff Liu
[-- Attachment #1.1: Type: Text/Plain, Size: 2195 bytes --]
On Mittwoch, 5. Oktober 2011 Dave Chinner wrote:
> That will only work if you can prevent concurrent unwritten extent
> conversion from happening while we do the separate tag lookups on
> the range because it requires two radix tree tag lookups rather than
> just one index lookup. i.e. miss the dirty page because it went
> dirty->writeback during the dirty tag search, and miss the same page
> when doing the writeback lookup because it went writeback->clean
> very quickly due to IO completion.
>
> So to stop that from happening, it requires that filesystems can
> exclude unwritten extent conversion from happening while a
> SEEK_HOLE/SEEK_DATA operation is in progress, and that the
> filesystem can safely do mapping tree lookups while providing that
> extent tree exclusion. I know that XFS has no problems here, but
> filesystems that use i_mutex for everything might be in trouble.
>
> Besides, if we just look for pages in the cache over unwritten
> extents (i.e. someone has treated it as data already), then it can
> be done locklessly without having to worry about page state changes
> occurring concurrently...
I'd like to understand why it's important to care about locking here. As
I understand it, SEEK_* is used for example to copy a file efficiently.
If that is performed on a file that is currently being written to, the
resulting copy will probably be bogus anyway, even without SEEK_* usage.
There might be a case where it is important, but I can't see that atm.
If I understand it correctly, then if we do not lock during SEEK_*
operations, a part of the file might be missed to copy, but that's only
for cases where the source file is being written to. If that file is
100GB size (to be extreme), and we copy it while it's modified, we will
almost for sure have a copy that is partly modified, partly not,
depending on which area was modified before read and which not. So
where's the point?
--
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc
it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531
// Haus zu verkaufen: http://zmi.at/langegg/
[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-05 5:32 ` Jeff Liu
@ 2011-10-05 9:23 ` Dave Chinner
2011-10-05 13:56 ` Jeff Liu
0 siblings, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2011-10-05 9:23 UTC (permalink / raw)
To: Jeff Liu; +Cc: Christoph Hellwig, xfs
On Tue, Oct 04, 2011 at 10:32:11PM -0700, Jeff Liu wrote:
> On 10/05/2011 12:36 PM, Dave Chinner wrote:
>
> > On Tue, Oct 04, 2011 at 09:02:08AM -0400, Christoph Hellwig wrote:
> >> On Tue, Oct 04, 2011 at 10:43:05AM +1100, Dave Chinner wrote:
> >>> The lookup is pretty simple - if there's cached data over the
> >>> unwritten range, then I'm considering it a data range. If there's no
> >>> cached data over the unwritten extent, it's a hole. That makes the
> >>> lookup simply a case of finding the first cached page in the
> >>> unwritten extent.
> >>>
> >>> It'll end up reading something like this:
> >>>
> >>> iomap = offset_to_extent(offset);
> >>> first_index = extent_to_page_index(iomap);
> >>>
> >>> nr_found = pagevec_lookup(&pvec, inode->i_mapping, first_index, 1);
> >>> if (!nr_found)
> >>> break;
> >>>
> >>> offset = page->index << PAGECACHE_SHIFT;
> >>> pagevec_release(&pvec);
> >>>
> >>> /* If we fell off the end of the extent lookup next extent */
> >>> if (offset >= end_of_extent(iomap)) {
> >>> offset = end_of_extent(iomap);
> >>> goto next_extent;
> >>> }
> >>>
> >>> All the extent manipulations are pretty filesystem specific, so
> >>> there's not much that can be extracted into generic helper, I
> >>> think...
> >>
> >> Actually pretty similar code will work just fine if you passt the
> >> start + len of the extents in (which we got from looking it up
> >> fs-specificly):
> >>
> >> Note that we have to look for both dirty and writeback pages to
> >> make it safe.
> >
> > That will only work if you can prevent concurrent unwritten extent
> > conversion from happening while we do the separate tag lookups on
> > the range because it requires two radix tree tag lookups rather than
> > just one index lookup. i.e. miss the dirty page because it went
> > dirty->writeback during the dirty tag search, and miss the same page
> > when doing the writeback lookup because it went writeback->clean
> > very quickly due to IO completion.
>
> >
> > So to stop that from happening, it requires that filesystems can
> > exclude unwritten extent conversion from happening while a
> > SEEK_HOLE/SEEK_DATA operation is in progress, and that the
> > filesystem can safely do mapping tree lookups while providing that
> > extent tree exclusion. I know that XFS has no problems here, but
> > filesystems that use i_mutex for everything might be in trouble.
> >
> > Besides, if we just look for pages in the cache over unwritten
> > extents (i.e. someone has treated it as data already), then it can
> > be done locklessly without having to worry about page state changes
> > occurring concurrently...
>
> From some backup utility's point of view, AFAICS, the benefits is,
> utilites does not need to perform read/write for an unwritten extents
> without dirty or writeback pages, instead, just call lseek(2) to treat
> it as hole.
Yes, that's a use case for SEEK_HOLE/DATA, but that has nothing to
do with the filesystem implementation issues involved with page
state changes. What we nee dto avoid is a page state change from
causing a SEEK_XXX from missing a page it should have stopped the
seek at because it changed state concurrently. A page changing from
dirty to writeback is still a page containing DATA, so if we miss it
in both the scan for dirty pages and for writeback pages, then the
backup application will seek over data it should have backed up....
> The issue is, is it worth to do that? i.e, exclude
> unwritten extent conversions when SEEK_XXX is ongoing.
It's a moot point for XFS - we need to hold the ILOCK shared
while doing the extent tree walk, so we are going to hold off
unwritten extent conversion as that requires the ILOCK
exclusively...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-05 7:34 ` Michael Monnerie
@ 2011-10-05 9:36 ` Dave Chinner
2011-10-05 18:22 ` Michael Monnerie
0 siblings, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2011-10-05 9:36 UTC (permalink / raw)
To: Michael Monnerie; +Cc: Christoph Hellwig, Jeff Liu, xfs
On Wed, Oct 05, 2011 at 09:34:26AM +0200, Michael Monnerie wrote:
> On Mittwoch, 5. Oktober 2011 Dave Chinner wrote:
> > That will only work if you can prevent concurrent unwritten extent
> > conversion from happening while we do the separate tag lookups on
> > the range because it requires two radix tree tag lookups rather than
> > just one index lookup. i.e. miss the dirty page because it went
> > dirty->writeback during the dirty tag search, and miss the same page
> > when doing the writeback lookup because it went writeback->clean
> > very quickly due to IO completion.
> >
> > So to stop that from happening, it requires that filesystems can
> > exclude unwritten extent conversion from happening while a
> > SEEK_HOLE/SEEK_DATA operation is in progress, and that the
> > filesystem can safely do mapping tree lookups while providing that
> > extent tree exclusion. I know that XFS has no problems here, but
> > filesystems that use i_mutex for everything might be in trouble.
> >
> > Besides, if we just look for pages in the cache over unwritten
> > extents (i.e. someone has treated it as data already), then it can
> > be done locklessly without having to worry about page state changes
> > occurring concurrently...
>
> I'd like to understand why it's important to care about locking here. As
> I understand it, SEEK_* is used for example to copy a file efficiently.
> If that is performed on a file that is currently being written to, the
> resulting copy will probably be bogus anyway, even without SEEK_* usage.
The problem is that copying a file with dirty data hot in the cache
concurrently when writeback to that file is occurring. The
application is not modifying the file, and the state changes that
could lead SEEK_DATA missing data are driven entirely by background
kernel threads. The application leaves a file in this state:
dirty page
+------D--------+
unwritten extent
SEEK_DATA must stop at that dirty page over the unwritten extent.
If we do a dirty page lookup, but the flusher thread has started IO
on it, it will have transitioned to this state:
writeback page
+------W--------+
unwritten extent
And the dirty page lookup will not find it. If we allow unwritten
extent conversion to also run while we are doing the writeback page
cache lookup, then IO completion and conversion can occur before we
find the writeback page in the cache. i.e we see this state:
clean page
+------C--------+
unwritten extent
And we don't find the data page over the unwritten extent at all
because we haven't looked up clean pages in the cache.
At that point, our SEEK_DATA call has skipped over a valid data
page, the copy utility fails to copy the data over the unwritten
extent, thereby silently creating a corrupt copy....
> There might be a case where it is important, but I can't see that atm.
It's a data corruption problem, pure and simple....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-05 9:23 ` Dave Chinner
@ 2011-10-05 13:56 ` Jeff Liu
0 siblings, 0 replies; 28+ messages in thread
From: Jeff Liu @ 2011-10-05 13:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On 10/05/2011 05:23 PM, Dave Chinner wrote:
> On Tue, Oct 04, 2011 at 10:32:11PM -0700, Jeff Liu wrote:
>> On 10/05/2011 12:36 PM, Dave Chinner wrote:
>>
>>> On Tue, Oct 04, 2011 at 09:02:08AM -0400, Christoph Hellwig wrote:
>>>> On Tue, Oct 04, 2011 at 10:43:05AM +1100, Dave Chinner wrote:
>>>>> The lookup is pretty simple - if there's cached data over the
>>>>> unwritten range, then I'm considering it a data range. If there's no
>>>>> cached data over the unwritten extent, it's a hole. That makes the
>>>>> lookup simply a case of finding the first cached page in the
>>>>> unwritten extent.
>>>>>
>>>>> It'll end up reading something like this:
>>>>>
>>>>> iomap = offset_to_extent(offset);
>>>>> first_index = extent_to_page_index(iomap);
>>>>>
>>>>> nr_found = pagevec_lookup(&pvec, inode->i_mapping, first_index, 1);
>>>>> if (!nr_found)
>>>>> break;
>>>>>
>>>>> offset = page->index << PAGECACHE_SHIFT;
>>>>> pagevec_release(&pvec);
>>>>>
>>>>> /* If we fell off the end of the extent lookup next extent */
>>>>> if (offset >= end_of_extent(iomap)) {
>>>>> offset = end_of_extent(iomap);
>>>>> goto next_extent;
>>>>> }
>>>>>
>>>>> All the extent manipulations are pretty filesystem specific, so
>>>>> there's not much that can be extracted into generic helper, I
>>>>> think...
>>>>
>>>> Actually pretty similar code will work just fine if you passt the
>>>> start + len of the extents in (which we got from looking it up
>>>> fs-specificly):
>>>>
>>>> Note that we have to look for both dirty and writeback pages to
>>>> make it safe.
>>>
>>> That will only work if you can prevent concurrent unwritten extent
>>> conversion from happening while we do the separate tag lookups on
>>> the range because it requires two radix tree tag lookups rather than
>>> just one index lookup. i.e. miss the dirty page because it went
>>> dirty->writeback during the dirty tag search, and miss the same page
>>> when doing the writeback lookup because it went writeback->clean
>>> very quickly due to IO completion.
>>
>>>
>>> So to stop that from happening, it requires that filesystems can
>>> exclude unwritten extent conversion from happening while a
>>> SEEK_HOLE/SEEK_DATA operation is in progress, and that the
>>> filesystem can safely do mapping tree lookups while providing that
>>> extent tree exclusion. I know that XFS has no problems here, but
>>> filesystems that use i_mutex for everything might be in trouble.
>>>
>>> Besides, if we just look for pages in the cache over unwritten
>>> extents (i.e. someone has treated it as data already), then it can
>>> be done locklessly without having to worry about page state changes
>>> occurring concurrently...
>>
>> From some backup utility's point of view, AFAICS, the benefits is,
>> utilites does not need to perform read/write for an unwritten extents
>> without dirty or writeback pages, instead, just call lseek(2) to treat
>> it as hole.
>
> Yes, that's a use case for SEEK_HOLE/DATA, but that has nothing to
> do with the filesystem implementation issues involved with page
> state changes. What we nee dto avoid is a page state change from
> causing a SEEK_XXX from missing a page it should have stopped the
> seek at because it changed state concurrently. A page changing from
> dirty to writeback is still a page containing DATA, so if we miss it
> in both the scan for dirty pages and for writeback pages, then the
> backup application will seek over data it should have backed up....
>
>> The issue is, is it worth to do that? i.e, exclude
>> unwritten extent conversions when SEEK_XXX is ongoing.
>
> It's a moot point for XFS - we need to hold the ILOCK shared
> while doing the extent tree walk, so we are going to hold off
> unwritten extent conversion as that requires the ILOCK
> exclusively...
Thanks for your clarification!
Now I understood your option. Indeed, if the utility treat unwritten
extents as data, we do not need to worry about the pages state
conversion procedures. Otherwise, we need to hold off that until
SEEK_XXX iteration done to avoid missing a page copy.
Thanks,
-Jeff
>
> Cheers,
>
> Dave.
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-05 9:36 ` Dave Chinner
@ 2011-10-05 18:22 ` Michael Monnerie
2011-10-06 0:32 ` Dave Chinner
0 siblings, 1 reply; 28+ messages in thread
From: Michael Monnerie @ 2011-10-05 18:22 UTC (permalink / raw)
To: xfs; +Cc: Christoph Hellwig, Jeff Liu
[-- Attachment #1.1: Type: Text/Plain, Size: 678 bytes --]
On Mittwoch, 5. Oktober 2011 Dave Chinner wrote:
> It's a data corruption problem, pure and simple....
Thanks for the thorough explanation. So it's a problem when a file has
recently been modified and not yet been written back to disk. Would it
be worth to force a flush to disk before SEEK_* operations can start? I
don't know if it's easier to do all the lookups you suggested, or do an
fsync. That could have it's own impacts, though.
--
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc
it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531
// Haus zu verkaufen: http://zmi.at/langegg/
[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-05 18:22 ` Michael Monnerie
@ 2011-10-06 0:32 ` Dave Chinner
0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2011-10-06 0:32 UTC (permalink / raw)
To: Michael Monnerie; +Cc: Christoph Hellwig, Jeff Liu, xfs
On Wed, Oct 05, 2011 at 08:22:18PM +0200, Michael Monnerie wrote:
> On Mittwoch, 5. Oktober 2011 Dave Chinner wrote:
> > It's a data corruption problem, pure and simple....
>
> Thanks for the thorough explanation. So it's a problem when a file has
> recently been modified and not yet been written back to disk. Would it
> be worth to force a flush to disk before SEEK_* operations can start? I
> don't know if it's easier to do all the lookups you suggested, or do an
> fsync. That could have it's own impacts, though.
The problem is that application does not know whether fsync is
needed or not, so it would have to do that unconditionally for every
file it was copying. That has potential for quite severe, unexpected
performance regressions, so it's something we want to avoid.
There isn't really any additional complexity on the side of the
XFS SEEK_XXX code to handle this properly - my comments were
aimed at making sure that everyone understood the constraints of
taking parts of the XFS code and making it a generic helper because
other filesystems do not have the same locking heirachyi or
unwritten extent implementation as XFS does....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-10-02 15:04 SEEK_DATA/SEEK_HOLE support Jeff Liu
2011-10-02 15:42 ` Christoph Hellwig
@ 2011-11-14 10:24 ` Christoph Hellwig
2011-11-14 12:47 ` Jeff Liu
1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2011-11-14 10:24 UTC (permalink / raw)
To: Jeff Liu; +Cc: xfs
Just curious: did you start looking into implementing
SEEK_DATA/SEEK_HOLE support?
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-11-14 10:24 ` Christoph Hellwig
@ 2011-11-14 12:47 ` Jeff Liu
2011-11-14 12:50 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Liu @ 2011-11-14 12:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
Hi Christoph,
On 11/14/2011 06:24 PM, Christoph Hellwig wrote:
> Just curious: did you start looking into implementing
> SEEK_DATA/SEEK_HOLE support?
I have misunderstood your previous response:
"Dave mentioned he had a basic implementation, he might have some code
that you can improve on.”
So I am still waiting that now.
But actually I have already wrote a patch at that time, I'll run a few
tests and send it out for your guys review soon.
Thanks,
-Jeff
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: SEEK_DATA/SEEK_HOLE support
2011-11-14 12:47 ` Jeff Liu
@ 2011-11-14 12:50 ` Christoph Hellwig
2011-11-19 8:29 ` XFS SEEK_DATA/SEEK_HOLE support V1 Jeff Liu
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2011-11-14 12:50 UTC (permalink / raw)
To: Jeff Liu; +Cc: Christoph Hellwig, xfs
On Mon, Nov 14, 2011 at 08:47:36PM +0800, Jeff Liu wrote:
> Hi Christoph,
>
> On 11/14/2011 06:24 PM, Christoph Hellwig wrote:
>
> > Just curious: did you start looking into implementing
> > SEEK_DATA/SEEK_HOLE support?
>
> I have misunderstood your previous response:
>
> "Dave mentioned he had a basic implementation, he might have some code
> that you can improve on.?
>
> So I am still waiting that now.
He mentioned that, but given that there was no folloup with code
I wouldn't expect it to appear anymore.
> But actually I have already wrote a patch at that time, I'll run a few
> tests and send it out for your guys review soon.
Thanks a lot!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* XFS SEEK_DATA/SEEK_HOLE support V1
2011-11-14 12:50 ` Christoph Hellwig
@ 2011-11-19 8:29 ` Jeff Liu
2011-11-19 8:34 ` Jeff Liu
2011-11-19 8:37 ` [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1 Jeff Liu
0 siblings, 2 replies; 28+ messages in thread
From: Jeff Liu @ 2011-11-19 8:29 UTC (permalink / raw)
To: xfs; +Cc: Christoph Hellwig, Chris Mason, aelder
Hello,
Sorry for the delay! I have worked on another high priority task these days.
So the below patch is the first try to add SEEK_DATA/SEEK_HOLE to XFS, I have tested it through seek_test.c from Sunil, looks all works to me.
http://oss.oracle.com/~smushran/seek_data/seek_test.c
Any feedback are welcome, thank you!
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
---
fs/xfs/xfs_bmap.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_bmap.h | 1 +
fs/xfs/xfs_file.c | 51 ++++++++++++++++++++++++++++++++-
fs/xfs/xfs_iops.c | 64 +++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_iops.h | 3 ++
5 files changed, 199 insertions(+), 1 deletions(-)
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index c68baeb..04c3930 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -6133,3 +6133,84 @@ next_block:
return error;
}
+
+
+int
+xfs_seek_data_hole(
+ struct xfs_inode *ip,
+ loff_t *start,
+ u32 type)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ xfs_fileoff_t seekoff = *start;
+ xfs_fileoff_t filelen;
+ xfs_extnum_t lastx;
+ xfs_ifork_t *ifp;
+ struct xfs_bmbt_irec got;
+ struct xfs_bmbt_irec prev;
+ u64 extoff;
+ u64 extlen;
+ int eof;
+
+ if (xfs_get_extsz_hint(ip) ||
+ ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
+ filelen = XFS_MAXIOFFSET(mp);
+ } else {
+ filelen = ip->i_size;
+ }
+
+ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+ xfs_bmap_search_extents(ip, XFS_BB_TO_FSBT(mp, BTOBB(*start)),
+ XFS_DATA_FORK, &eof, &lastx, &got, &prev);
+
+ if (type == SEEK_DATA) {
+ extoff = BBTOB(XFS_FSB_TO_BB(mp, got.br_startoff));
+ extlen = BBTOB(XFS_FSB_TO_BB(mp, got.br_blockcount));
+
+ if (eof) {
+ if (seekoff < extoff + extlen)
+ *start = seekoff;
+ else {
+ /*
+ * There is no more data region past the
+ * supplied offset.
+ */
+ return XFS_ERROR(ENXIO);
+ }
+ }
+
+ *start = seekoff < extoff ? extoff : seekoff;
+ } else {
+ for (;;) {
+ extoff = BBTOB(XFS_FSB_TO_BB(mp, got.br_startoff));
+ extlen = BBTOB(XFS_FSB_TO_BB(mp, got.br_blockcount));
+ if (eof) {
+ /*
+ * There might be a hole at the end of the
+ * file, adjust to the file size.
+ */
+ if (seekoff >= extoff) {
+ *start = min_t(xfs_fileoff_t, filelen,
+ (extoff + extlen));
+ }
+ break;
+ }
+
+ /* A hole found */
+ if (seekoff < extoff) {
+ *start = seekoff;
+ break;
+ }
+
+ /* Fetch the next extent */
+ seekoff = extoff + extlen;
+ if (++lastx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
+ xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx),
+ &got);
+ else
+ eof = 1;
+ }
+ }
+
+ return 0;
+}
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index 89ee672..964e29b 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -196,6 +196,7 @@ int xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
xfs_extnum_t num);
uint xfs_default_attroffset(struct xfs_inode *ip);
+int xfs_seek_data_hole(struct xfs_inode *ip, loff_t *start, u32 type);
#ifdef __KERNEL__
/* bmap to userspace formatter - copy to user & advance pointer */
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 753ed9b..bf5471b 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1141,8 +1141,57 @@ xfs_vm_page_mkwrite(
return block_page_mkwrite(vma, vmf, xfs_get_blocks);
}
+STATIC loff_t
+xfs_file_llseek(
+ struct file *file,
+ loff_t offset,
+ int origin)
+{
+ struct inode *inode = file->f_mapping->host;
+ int ret;
+
+ if (origin != SEEK_DATA && origin != SEEK_HOLE)
+ return generic_file_llseek(file, offset, origin);
+
+ mutex_lock(&inode->i_mutex);
+ switch (origin) {
+ case SEEK_DATA:
+ case SEEK_HOLE:
+ if (offset >= i_size_read(inode)) {
+ ret = -ENXIO;
+ goto error;
+ }
+
+ ret = xfs_find_desired_extent(inode, &offset, origin);
+ if (ret)
+ goto error;
+ }
+
+ if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ if (offset > inode->i_sb->s_maxbytes) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_version = 0;
+ }
+
+ mutex_unlock(&inode->i_mutex);
+ return offset;
+
+error:
+ mutex_unlock(&inode->i_mutex);
+ return ret;
+}
+
const struct file_operations xfs_file_operations = {
- .llseek = generic_file_llseek,
+ .llseek = xfs_file_llseek,
.read = do_sync_read,
.write = do_sync_write,
.aio_read = xfs_file_aio_read,
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 23ce927..482c1ff 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1030,6 +1030,70 @@ xfs_vn_fiemap(
return 0;
}
+int
+xfs_find_desired_extent(
+ struct inode *inode,
+ loff_t *start,
+ u32 type)
+{
+ xfs_inode_t *ip = XFS_I(inode);
+ xfs_mount_t *mp = ip->i_mount;
+ struct xfs_ifork *ifp;
+ int lock;
+ int error;
+
+ if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
+ ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
+ ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
+ return XFS_ERROR(EINVAL);
+
+ xfs_ilock(ip, XFS_IOLOCK_SHARED);
+
+ /*
+ * Flush the delay alloc blocks. Even after flushing the inode,
+ * there can still be delalloc blocks on the inode beyond EOF
+ * due to speculative preallocation. These are not removed until
+ * the release function is called or the inode is inactivated.
+ * Hence we cannot assert here that ip->i_delayed_blks == 0.
+ */
+ if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
+ error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
+ if (error)
+ goto out_unlock_iolock;
+ }
+
+ lock = xfs_ilock_map_shared(ip);
+
+ if (XFS_FORCED_SHUTDOWN(mp)) {
+ error = EIO;
+ goto out_unlock_ilock;
+ }
+
+ XFS_STATS_INC(xs_blk_mapr);
+ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+
+ ASSERT(ifp->if_ext_max ==
+ XFS_IFORK_SIZE(ip, XFS_DATA_FORK) / (uint)sizeof(xfs_bmbt_rec_t));
+
+ if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+ error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+ if (error)
+ goto out_unlock_ilock;
+ }
+
+ error = xfs_seek_data_hole(ip, start, type);
+
+out_unlock_ilock:
+ xfs_iunlock_map_shared(ip, lock);
+out_unlock_iolock:
+ xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+
+ if (error)
+ return -error;
+
+ return 0;
+}
+
static const struct inode_operations xfs_inode_operations = {
.get_acl = xfs_get_acl,
.getattr = xfs_vn_getattr,
diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
index ef41c92..ea47abd 100644
--- a/fs/xfs/xfs_iops.h
+++ b/fs/xfs/xfs_iops.h
@@ -27,4 +27,7 @@ extern ssize_t xfs_vn_listxattr(struct dentry *, char *data, size_t size);
extern void xfs_setup_inode(struct xfs_inode *);
+extern int xfs_find_desired_extent(struct inode *inode, loff_t *start,
+ u32 type);
+
#endif /* __XFS_IOPS_H__ */
--
1.7.4.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: XFS SEEK_DATA/SEEK_HOLE support V1
2011-11-19 8:29 ` XFS SEEK_DATA/SEEK_HOLE support V1 Jeff Liu
@ 2011-11-19 8:34 ` Jeff Liu
2011-11-19 8:37 ` [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1 Jeff Liu
1 sibling, 0 replies; 28+ messages in thread
From: Jeff Liu @ 2011-11-19 8:34 UTC (permalink / raw)
To: jeff.liu; +Cc: Christoph Hellwig, aelder, Chris Mason, xfs
Sorry!! I forgot to write the subject beginning with "[PATCH]", will
re-sent it again.
Thanks,
-Jeff
On 11/19/2011 04:29 PM, Jeff Liu wrote:
> Hello,
>
> Sorry for the delay! I have worked on another high priority task these days.
>
> So the below patch is the first try to add SEEK_DATA/SEEK_HOLE to XFS, I have tested it through seek_test.c from Sunil, looks all works to me.
> http://oss.oracle.com/~smushran/seek_data/seek_test.c
>
> Any feedback are welcome, thank you!
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>
> ---
> fs/xfs/xfs_bmap.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_bmap.h | 1 +
> fs/xfs/xfs_file.c | 51 ++++++++++++++++++++++++++++++++-
> fs/xfs/xfs_iops.c | 64 +++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_iops.h | 3 ++
> 5 files changed, 199 insertions(+), 1 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> index c68baeb..04c3930 100644
> --- a/fs/xfs/xfs_bmap.c
> +++ b/fs/xfs/xfs_bmap.c
> @@ -6133,3 +6133,84 @@ next_block:
>
> return error;
> }
> +
> +
> +int
> +xfs_seek_data_hole(
> + struct xfs_inode *ip,
> + loff_t *start,
> + u32 type)
> +{
> + xfs_mount_t *mp = ip->i_mount;
> + xfs_fileoff_t seekoff = *start;
> + xfs_fileoff_t filelen;
> + xfs_extnum_t lastx;
> + xfs_ifork_t *ifp;
> + struct xfs_bmbt_irec got;
> + struct xfs_bmbt_irec prev;
> + u64 extoff;
> + u64 extlen;
> + int eof;
> +
> + if (xfs_get_extsz_hint(ip) ||
> + ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
> + filelen = XFS_MAXIOFFSET(mp);
> + } else {
> + filelen = ip->i_size;
> + }
> +
> + ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> + xfs_bmap_search_extents(ip, XFS_BB_TO_FSBT(mp, BTOBB(*start)),
> + XFS_DATA_FORK, &eof, &lastx, &got, &prev);
> +
> + if (type == SEEK_DATA) {
> + extoff = BBTOB(XFS_FSB_TO_BB(mp, got.br_startoff));
> + extlen = BBTOB(XFS_FSB_TO_BB(mp, got.br_blockcount));
> +
> + if (eof) {
> + if (seekoff < extoff + extlen)
> + *start = seekoff;
> + else {
> + /*
> + * There is no more data region past the
> + * supplied offset.
> + */
> + return XFS_ERROR(ENXIO);
> + }
> + }
> +
> + *start = seekoff < extoff ? extoff : seekoff;
> + } else {
> + for (;;) {
> + extoff = BBTOB(XFS_FSB_TO_BB(mp, got.br_startoff));
> + extlen = BBTOB(XFS_FSB_TO_BB(mp, got.br_blockcount));
> + if (eof) {
> + /*
> + * There might be a hole at the end of the
> + * file, adjust to the file size.
> + */
> + if (seekoff >= extoff) {
> + *start = min_t(xfs_fileoff_t, filelen,
> + (extoff + extlen));
> + }
> + break;
> + }
> +
> + /* A hole found */
> + if (seekoff < extoff) {
> + *start = seekoff;
> + break;
> + }
> +
> + /* Fetch the next extent */
> + seekoff = extoff + extlen;
> + if (++lastx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
> + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx),
> + &got);
> + else
> + eof = 1;
> + }
> + }
> +
> + return 0;
> +}
> diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
> index 89ee672..964e29b 100644
> --- a/fs/xfs/xfs_bmap.h
> +++ b/fs/xfs/xfs_bmap.h
> @@ -196,6 +196,7 @@ int xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
> int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
> xfs_extnum_t num);
> uint xfs_default_attroffset(struct xfs_inode *ip);
> +int xfs_seek_data_hole(struct xfs_inode *ip, loff_t *start, u32 type);
>
> #ifdef __KERNEL__
> /* bmap to userspace formatter - copy to user & advance pointer */
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 753ed9b..bf5471b 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1141,8 +1141,57 @@ xfs_vm_page_mkwrite(
> return block_page_mkwrite(vma, vmf, xfs_get_blocks);
> }
>
> +STATIC loff_t
> +xfs_file_llseek(
> + struct file *file,
> + loff_t offset,
> + int origin)
> +{
> + struct inode *inode = file->f_mapping->host;
> + int ret;
> +
> + if (origin != SEEK_DATA && origin != SEEK_HOLE)
> + return generic_file_llseek(file, offset, origin);
> +
> + mutex_lock(&inode->i_mutex);
> + switch (origin) {
> + case SEEK_DATA:
> + case SEEK_HOLE:
> + if (offset >= i_size_read(inode)) {
> + ret = -ENXIO;
> + goto error;
> + }
> +
> + ret = xfs_find_desired_extent(inode, &offset, origin);
> + if (ret)
> + goto error;
> + }
> +
> + if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + if (offset > inode->i_sb->s_maxbytes) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + if (offset != file->f_pos) {
> + file->f_pos = offset;
> + file->f_version = 0;
> + }
> +
> + mutex_unlock(&inode->i_mutex);
> + return offset;
> +
> +error:
> + mutex_unlock(&inode->i_mutex);
> + return ret;
> +}
> +
> const struct file_operations xfs_file_operations = {
> - .llseek = generic_file_llseek,
> + .llseek = xfs_file_llseek,
> .read = do_sync_read,
> .write = do_sync_write,
> .aio_read = xfs_file_aio_read,
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 23ce927..482c1ff 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1030,6 +1030,70 @@ xfs_vn_fiemap(
> return 0;
> }
>
> +int
> +xfs_find_desired_extent(
> + struct inode *inode,
> + loff_t *start,
> + u32 type)
> +{
> + xfs_inode_t *ip = XFS_I(inode);
> + xfs_mount_t *mp = ip->i_mount;
> + struct xfs_ifork *ifp;
> + int lock;
> + int error;
> +
> + if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
> + ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
> + ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> + return XFS_ERROR(EINVAL);
> +
> + xfs_ilock(ip, XFS_IOLOCK_SHARED);
> +
> + /*
> + * Flush the delay alloc blocks. Even after flushing the inode,
> + * there can still be delalloc blocks on the inode beyond EOF
> + * due to speculative preallocation. These are not removed until
> + * the release function is called or the inode is inactivated.
> + * Hence we cannot assert here that ip->i_delayed_blks == 0.
> + */
> + if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
> + error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
> + if (error)
> + goto out_unlock_iolock;
> + }
> +
> + lock = xfs_ilock_map_shared(ip);
> +
> + if (XFS_FORCED_SHUTDOWN(mp)) {
> + error = EIO;
> + goto out_unlock_ilock;
> + }
> +
> + XFS_STATS_INC(xs_blk_mapr);
> + ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> +
> + ASSERT(ifp->if_ext_max ==
> + XFS_IFORK_SIZE(ip, XFS_DATA_FORK) / (uint)sizeof(xfs_bmbt_rec_t));
> +
> + if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> + error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> + if (error)
> + goto out_unlock_ilock;
> + }
> +
> + error = xfs_seek_data_hole(ip, start, type);
> +
> +out_unlock_ilock:
> + xfs_iunlock_map_shared(ip, lock);
> +out_unlock_iolock:
> + xfs_iunlock(ip, XFS_IOLOCK_SHARED);
> +
> + if (error)
> + return -error;
> +
> + return 0;
> +}
> +
> static const struct inode_operations xfs_inode_operations = {
> .get_acl = xfs_get_acl,
> .getattr = xfs_vn_getattr,
> diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
> index ef41c92..ea47abd 100644
> --- a/fs/xfs/xfs_iops.h
> +++ b/fs/xfs/xfs_iops.h
> @@ -27,4 +27,7 @@ extern ssize_t xfs_vn_listxattr(struct dentry *, char *data, size_t size);
>
> extern void xfs_setup_inode(struct xfs_inode *);
>
> +extern int xfs_find_desired_extent(struct inode *inode, loff_t *start,
> + u32 type);
> +
> #endif /* __XFS_IOPS_H__ */
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
2011-11-19 8:29 ` XFS SEEK_DATA/SEEK_HOLE support V1 Jeff Liu
2011-11-19 8:34 ` Jeff Liu
@ 2011-11-19 8:37 ` Jeff Liu
2011-11-19 19:11 ` Christoph Hellwig
2011-11-20 0:30 ` Dave Chinner
1 sibling, 2 replies; 28+ messages in thread
From: Jeff Liu @ 2011-11-19 8:37 UTC (permalink / raw)
To: xfs; +Cc: Christoph Hellwig, Chris Mason, aelder
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
---
fs/xfs/xfs_bmap.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_bmap.h | 1 +
fs/xfs/xfs_file.c | 51 ++++++++++++++++++++++++++++++++-
fs/xfs/xfs_iops.c | 64 +++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_iops.h | 3 ++
5 files changed, 199 insertions(+), 1 deletions(-)
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index c68baeb..04c3930 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -6133,3 +6133,84 @@ next_block:
return error;
}
+
+
+int
+xfs_seek_data_hole(
+ struct xfs_inode *ip,
+ loff_t *start,
+ u32 type)
+{
+ xfs_mount_t *mp = ip->i_mount;
+ xfs_fileoff_t seekoff = *start;
+ xfs_fileoff_t filelen;
+ xfs_extnum_t lastx;
+ xfs_ifork_t *ifp;
+ struct xfs_bmbt_irec got;
+ struct xfs_bmbt_irec prev;
+ u64 extoff;
+ u64 extlen;
+ int eof;
+
+ if (xfs_get_extsz_hint(ip) ||
+ ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
+ filelen = XFS_MAXIOFFSET(mp);
+ } else {
+ filelen = ip->i_size;
+ }
+
+ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+ xfs_bmap_search_extents(ip, XFS_BB_TO_FSBT(mp, BTOBB(*start)),
+ XFS_DATA_FORK, &eof, &lastx, &got, &prev);
+
+ if (type == SEEK_DATA) {
+ extoff = BBTOB(XFS_FSB_TO_BB(mp, got.br_startoff));
+ extlen = BBTOB(XFS_FSB_TO_BB(mp, got.br_blockcount));
+
+ if (eof) {
+ if (seekoff < extoff + extlen)
+ *start = seekoff;
+ else {
+ /*
+ * There is no more data region past the
+ * supplied offset.
+ */
+ return XFS_ERROR(ENXIO);
+ }
+ }
+
+ *start = seekoff < extoff ? extoff : seekoff;
+ } else {
+ for (;;) {
+ extoff = BBTOB(XFS_FSB_TO_BB(mp, got.br_startoff));
+ extlen = BBTOB(XFS_FSB_TO_BB(mp, got.br_blockcount));
+ if (eof) {
+ /*
+ * There might be a hole at the end of the
+ * file, adjust to the file size.
+ */
+ if (seekoff >= extoff) {
+ *start = min_t(xfs_fileoff_t, filelen,
+ (extoff + extlen));
+ }
+ break;
+ }
+
+ /* A hole found */
+ if (seekoff < extoff) {
+ *start = seekoff;
+ break;
+ }
+
+ /* Fetch the next extent */
+ seekoff = extoff + extlen;
+ if (++lastx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
+ xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx),
+ &got);
+ else
+ eof = 1;
+ }
+ }
+
+ return 0;
+}
diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
index 89ee672..964e29b 100644
--- a/fs/xfs/xfs_bmap.h
+++ b/fs/xfs/xfs_bmap.h
@@ -196,6 +196,7 @@ int xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
xfs_extnum_t num);
uint xfs_default_attroffset(struct xfs_inode *ip);
+int xfs_seek_data_hole(struct xfs_inode *ip, loff_t *start, u32 type);
#ifdef __KERNEL__
/* bmap to userspace formatter - copy to user & advance pointer */
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 753ed9b..bf5471b 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1141,8 +1141,57 @@ xfs_vm_page_mkwrite(
return block_page_mkwrite(vma, vmf, xfs_get_blocks);
}
+STATIC loff_t
+xfs_file_llseek(
+ struct file *file,
+ loff_t offset,
+ int origin)
+{
+ struct inode *inode = file->f_mapping->host;
+ int ret;
+
+ if (origin != SEEK_DATA && origin != SEEK_HOLE)
+ return generic_file_llseek(file, offset, origin);
+
+ mutex_lock(&inode->i_mutex);
+ switch (origin) {
+ case SEEK_DATA:
+ case SEEK_HOLE:
+ if (offset >= i_size_read(inode)) {
+ ret = -ENXIO;
+ goto error;
+ }
+
+ ret = xfs_find_desired_extent(inode, &offset, origin);
+ if (ret)
+ goto error;
+ }
+
+ if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ if (offset > inode->i_sb->s_maxbytes) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_version = 0;
+ }
+
+ mutex_unlock(&inode->i_mutex);
+ return offset;
+
+error:
+ mutex_unlock(&inode->i_mutex);
+ return ret;
+}
+
const struct file_operations xfs_file_operations = {
- .llseek = generic_file_llseek,
+ .llseek = xfs_file_llseek,
.read = do_sync_read,
.write = do_sync_write,
.aio_read = xfs_file_aio_read,
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 23ce927..482c1ff 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1030,6 +1030,70 @@ xfs_vn_fiemap(
return 0;
}
+int
+xfs_find_desired_extent(
+ struct inode *inode,
+ loff_t *start,
+ u32 type)
+{
+ xfs_inode_t *ip = XFS_I(inode);
+ xfs_mount_t *mp = ip->i_mount;
+ struct xfs_ifork *ifp;
+ int lock;
+ int error;
+
+ if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
+ ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
+ ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
+ return XFS_ERROR(EINVAL);
+
+ xfs_ilock(ip, XFS_IOLOCK_SHARED);
+
+ /*
+ * Flush the delay alloc blocks. Even after flushing the inode,
+ * there can still be delalloc blocks on the inode beyond EOF
+ * due to speculative preallocation. These are not removed until
+ * the release function is called or the inode is inactivated.
+ * Hence we cannot assert here that ip->i_delayed_blks == 0.
+ */
+ if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
+ error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
+ if (error)
+ goto out_unlock_iolock;
+ }
+
+ lock = xfs_ilock_map_shared(ip);
+
+ if (XFS_FORCED_SHUTDOWN(mp)) {
+ error = EIO;
+ goto out_unlock_ilock;
+ }
+
+ XFS_STATS_INC(xs_blk_mapr);
+ ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+
+ ASSERT(ifp->if_ext_max ==
+ XFS_IFORK_SIZE(ip, XFS_DATA_FORK) / (uint)sizeof(xfs_bmbt_rec_t));
+
+ if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+ error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+ if (error)
+ goto out_unlock_ilock;
+ }
+
+ error = xfs_seek_data_hole(ip, start, type);
+
+out_unlock_ilock:
+ xfs_iunlock_map_shared(ip, lock);
+out_unlock_iolock:
+ xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+
+ if (error)
+ return -error;
+
+ return 0;
+}
+
static const struct inode_operations xfs_inode_operations = {
.get_acl = xfs_get_acl,
.getattr = xfs_vn_getattr,
diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
index ef41c92..ea47abd 100644
--- a/fs/xfs/xfs_iops.h
+++ b/fs/xfs/xfs_iops.h
@@ -27,4 +27,7 @@ extern ssize_t xfs_vn_listxattr(struct dentry *, char *data, size_t size);
extern void xfs_setup_inode(struct xfs_inode *);
+extern int xfs_find_desired_extent(struct inode *inode, loff_t *start,
+ u32 type);
+
#endif /* __XFS_IOPS_H__ */
--
1.7.4.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
2011-11-19 8:37 ` [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1 Jeff Liu
@ 2011-11-19 19:11 ` Christoph Hellwig
2011-11-20 13:15 ` Jeff Liu
2011-11-20 0:30 ` Dave Chinner
1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2011-11-19 19:11 UTC (permalink / raw)
To: Jeff Liu; +Cc: Christoph Hellwig, aelder, Chris Mason, xfs
On Sat, Nov 19, 2011 at 04:37:13PM +0800, Jeff Liu wrote:
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Thanks a lot Jeff. A few comments below:
> +int
> +xfs_seek_data_hole(
> + struct xfs_inode *ip,
> + loff_t *start,
> + u32 type)
> +{
> + xfs_mount_t *mp = ip->i_mount;
please use
struct xfs_mount *mp = ip->i_mount;
for all new code.
> + if (xfs_get_extsz_hint(ip) ||
> + ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
> + filelen = XFS_MAXIOFFSET(mp);
> + } else {
> + filelen = ip->i_size;
> + }
I don't understand this. XFS_MAXIOFFSET is the maximum possible file
size in an XFS filesystem - using it with an extent size hint or
the prealloc or appen only flags doesn't make sense to me.
> + if (type == SEEK_DATA) {
xfs_seek_data_hole shares almost no common code between the SEEK_DATA
and SEEK_HOLE cases, which suggests it probably should be two different
routines.
> +STATIC loff_t
> +xfs_file_llseek(
> + struct file *file,
> + loff_t offset,
> + int origin)
> +{
> + struct inode *inode = file->f_mapping->host;
> + int ret;
> +
> + if (origin != SEEK_DATA && origin != SEEK_HOLE)
> + return generic_file_llseek(file, offset, origin);
> +
> + mutex_lock(&inode->i_mutex);
> + switch (origin) {
> + case SEEK_DATA:
> + case SEEK_HOLE:
Having the if above and then the switch here seems like and odd style.
I'd do either an if, or a switch statement for all possible variants,
but not both.
> + if (offset >= i_size_read(inode)) {
> + ret = -ENXIO;
> + goto error;
> + }
> +
> + ret = xfs_find_desired_extent(inode, &offset, origin);
> + if (ret)
> + goto error;
> + }
> +
> + if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) {
> + ret = -EINVAL;
> + goto error;
> + }
I don't think this could ever happen on XFS.
> + if (offset > inode->i_sb->s_maxbytes) {
> + ret = -EINVAL;
> + goto error;
> + }
This also shouldn't happen if the low-level code does the right
thing.
> + if (offset != file->f_pos) {
> + file->f_pos = offset;
> + file->f_version = 0;
> + }
XFS never uses f_version, no need to update it.
> +int
> +xfs_find_desired_extent(
> + struct inode *inode,
> + loff_t *start,
> + u32 type)
I think this would better be merged with the code currenly in
xfs_file_llseek. Maybe move all the SEEK_DATA/SEEK_HOLE specific
code from there to this function?
Also please move this routine to be next to xfs_file_llseek in xfs_file.c,
which also means that it can be marked static.
> +{
> + xfs_inode_t *ip = XFS_I(inode);
> + xfs_mount_t *mp = ip->i_mount;
Just as above please use the struct versions for new code.
> + /*
> + * Flush the delay alloc blocks. Even after flushing the inode,
> + * there can still be delalloc blocks on the inode beyond EOF
> + * due to speculative preallocation. These are not removed until
> + * the release function is called or the inode is inactivated.
> + * Hence we cannot assert here that ip->i_delayed_blks == 0.
> + */
> + if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
> + error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
> + if (error)
> + goto out_unlock_iolock;
> + }
For the final version we should get rid of this flush and instead look
for pages having dirty unwritten extents in the pagecache and adjust
the result based on it. I'm fine with delaying this until all other
issues are sorted out.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
2011-11-19 8:37 ` [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1 Jeff Liu
2011-11-19 19:11 ` Christoph Hellwig
@ 2011-11-20 0:30 ` Dave Chinner
2011-11-20 13:59 ` Jeff Liu
1 sibling, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2011-11-20 0:30 UTC (permalink / raw)
To: Jeff Liu; +Cc: Christoph Hellwig, aelder, Chris Mason, xfs
On Sat, Nov 19, 2011 at 04:37:13PM +0800, Jeff Liu wrote:
>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Christoph has commented on the code-related aspects of the patch, so
I won't repeat that. I'll comment on structural/design issues
instead.
Firstly, the patch splits the functionality arbitrarily over three
different files, and I don't think that is necessary. There really
is no reason at all for xfs_bmap.c to know anything aout
SEEK_HOLE/SEEK_DATA semantics - that file is for extent manipulation
and search functions. SEEK semantics should be entirely encoded into
the function that deals with the seeking.
Secondly, xfs_find_desired_extent() and xfs_seek_hole_data() should
be one function, and named something like xfs_file_seek_extent().
Finally, don't think using low level extent search functions like
xfs_bmap_search_extents() is the right level to implement this
functionality (see my comments about SEEK_HOLE/SEEK_DATA semantics
in xfs-bmap.c above), especially as we already have functions for
looking up holes and extents at given offsets.
That is, to find the next hole at or after after a given offset, we
already have xfs_bmap_first_unused(). Indeed, this function already
has the exact semantics that SEEK_HOLE requires. Simply put:
case SEEK_HOLE:
fsb = XFS_B_TO_FSBT(mp, start_offset);
error = xfs_bmap_first_unused(NULL, ip, 1, &fsb,
XFS_DATA_FORK);
if (error)
return -error;
if (fsb <= XFS_B_TO_FSBT(mp, start_offset))
return start_offset;
return XFS_FSB_TO_B(mp, fsb);
As to the data extent search, I'd prefer you to use xfs_bmapi_read()
rather than xfs_bmap_search_extents() directly. I'd prefer that we
return unwritten extents as holes rather than data from the initial
implementation, and using the low level xfs_bmap_search_extents()
makes that quite complex. However, we already have a function for
handling that complexity: xfs_bmapi_read().
That is, xfs_bmapi_read() needs to be passed an array of two maps,
one for the current offset, and one for the next extent type. This
makes one call sufficient for most transitions. Done in a simple
loop it will handle all conditions of hole->unwritten->hole....
until it finds a data extent of EOF.
start_fsbno = XFS_B_TO_FSBT(mp, start_offset);
while (1) {
struct xfs_bmbt_irec maps[2];
int nmaps = 2;
count_fsb = XFS_B_TO_FSB(mp, XFS_MAXIOFFSET(mp));
error = xfs_bmapi_read(ip, fsbno, count_fsb,
&maps, &nmaps, XFS_BMAPI_ENTIRE);
if (error)
return -error;
if (!nmaps) {
/* no extents at given offset, must be beyond EOF */
return -ENXIO;
}
switch (map[0].br_startblock) {
case DELAYSTARTBLOCK:
/* landed in an in-memory data extent */
return map[0].br_startoff;
default:
/* landed in an allocated extent */
if (map[0].br_state == XFS_EXT_NORM) {
/* a real data extent */
return map[0].br_startoff;
}
/* Fall through to hole handling for unwritten extents */
case HOLESTARTBLOCK:
/*
* landed in a hole. If the next extent is a data
* extent, then return the start of it, otherwise
* we need to move the start offset and map more
* blocks.
*/
if (map[1].br_startblock == DELAYSTARTBLOCK ||
((map[1].br_startblock != HOLESTARTBLOCK &&
map[1].br_state == XFS_EXT_NORM)))
return map[1].br_startoff;
start_fsbno = map[1].br_startoff + map[1].br_blockcount;
break;
}
if (XFS_FSB_TO_B(mp, start_fsbno) > ip->i_size) {
/* Beyond EOF now */
return -ENXIO;
}
}
This can pretty much all be done in
fs/xfs/xfs_file.c::xfs_file_seek_extent() because all the functions
used by the above code are exported from xfs_bmap.c for external use
- that solves the scattering problem and uses interfaces that we
already know work in the intended manner.... ;)
BTW:
> +xfs_file_llseek(
> + struct file *file,
> + loff_t offset,
> + int origin)
> +{
> + struct inode *inode = file->f_mapping->host;
> + int ret;
> +
> + if (origin != SEEK_DATA && origin != SEEK_HOLE)
> + return generic_file_llseek(file, offset, origin);
> +
> + mutex_lock(&inode->i_mutex);
> + switch (origin) {
We don't need the i_mutex to be held here. We only need to hold the
ilock in shared mode for this operation to protect against extent
list modifications (like unwritten extent conversion and
truncation).
> +int
> +xfs_find_desired_extent(
> + struct inode *inode,
> + loff_t *start,
> + u32 type)
> +{
> + xfs_inode_t *ip = XFS_I(inode);
> + xfs_mount_t *mp = ip->i_mount;
> + struct xfs_ifork *ifp;
> + int lock;
> + int error;
> +
> + if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
> + ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
> + ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
> + return XFS_ERROR(EINVAL);
> +
> + xfs_ilock(ip, XFS_IOLOCK_SHARED);
> +
> + /*
> + * Flush the delay alloc blocks. Even after flushing the inode,
> + * there can still be delalloc blocks on the inode beyond EOF
> + * due to speculative preallocation. These are not removed until
> + * the release function is called or the inode is inactivated.
> + * Hence we cannot assert here that ip->i_delayed_blks == 0.
> + */
> + if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
> + error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
> + if (error)
> + goto out_unlock_iolock;
> + }
i.e. this IOLOCK and flush is completely unnecessary because we'll
find delayed allocation extents in the extent lookup and can handle
them just like real allocated extents....
> + lock = xfs_ilock_map_shared(ip);
i.e. this is the only lock we need to take.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
2011-11-19 19:11 ` Christoph Hellwig
@ 2011-11-20 13:15 ` Jeff Liu
0 siblings, 0 replies; 28+ messages in thread
From: Jeff Liu @ 2011-11-20 13:15 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: aelder, Chris Mason, xfs
Hi Christoph,
Thanks for your quick response!
On 11/20/2011 03:11 AM, Christoph Hellwig wrote:
> On Sat, Nov 19, 2011 at 04:37:13PM +0800, Jeff Liu wrote:
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>
> Thanks a lot Jeff. A few comments below:
>
>> +int
>> +xfs_seek_data_hole(
>> + struct xfs_inode *ip,
>> + loff_t *start,
>> + u32 type)
>> +{
>> + xfs_mount_t *mp = ip->i_mount;
>
> please use
>
> struct xfs_mount *mp = ip->i_mount;
>
> for all new code.
>
>> + if (xfs_get_extsz_hint(ip) ||
>> + ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
>> + filelen = XFS_MAXIOFFSET(mp);
>> + } else {
>> + filelen = ip->i_size;
>> + }
>
> I don't understand this. XFS_MAXIOFFSET is the maximum possible file
> size in an XFS filesystem - using it with an extent size hint or
> the prealloc or appen only flags doesn't make sense to me.
Sorry, I forgot why I using it to retrieve the file size before. :( but
it definitely don't needed here.
>
>> + if (type == SEEK_DATA) {
>
> xfs_seek_data_hole shares almost no common code between the SEEK_DATA
> and SEEK_HOLE cases, which suggests it probably should be two different
> routines.
Yes, it's better to split SEEK_DATA/SEEK_HOLE into two different functions.
>
>
>> +STATIC loff_t
>> +xfs_file_llseek(
>> + struct file *file,
>> + loff_t offset,
>> + int origin)
>> +{
>> + struct inode *inode = file->f_mapping->host;
>> + int ret;
>> +
>> + if (origin != SEEK_DATA && origin != SEEK_HOLE)
>> + return generic_file_llseek(file, offset, origin);
>> +
>> + mutex_lock(&inode->i_mutex);
>> + switch (origin) {
>> + case SEEK_DATA:
>> + case SEEK_HOLE:
>
> Having the if above and then the switch here seems like and odd style.
> I'd do either an if, or a switch statement for all possible variants,
> but not both.
I'll fix it with switch statement then. :-)
>
>> + if (offset >= i_size_read(inode)) {
>> + ret = -ENXIO;
>> + goto error;
>> + }
>> +
>> + ret = xfs_find_desired_extent(inode, &offset, origin);
>> + if (ret)
>> + goto error;
>> + }
>> +
>> + if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) {
>> + ret = -EINVAL;
>> + goto error;
>> + }
>
> I don't think this could ever happen on XFS.
>
>> + if (offset > inode->i_sb->s_maxbytes) {
>> + ret = -EINVAL;
>> + goto error;
>> + }
>
> This also shouldn't happen if the low-level code does the right
> thing.
>
>> + if (offset != file->f_pos) {
>> + file->f_pos = offset;
>> + file->f_version = 0;
>> + }
>
> XFS never uses f_version, no need to update it.
Thanks for pointing out my mistakes for above three issues,
I have not think carefully about them, just copy && paste the code block
from btrfs instead, they will be fixed in the next post.
>
>> +int
>> +xfs_find_desired_extent(
>> + struct inode *inode,
>> + loff_t *start,
>> + u32 type)
>
> I think this would better be merged with the code currenly in
> xfs_file_llseek. Maybe move all the SEEK_DATA/SEEK_HOLE specific
> code from there to this function?
>
> Also please move this routine to be next to xfs_file_llseek in xfs_file.c,
> which also means that it can be marked static.
ok, I'll merge both find_desired_extent() and seek_data_hole() to a
single routine and place it there.
>
>> +{
>> + xfs_inode_t *ip = XFS_I(inode);
>> + xfs_mount_t *mp = ip->i_mount;
>
> Just as above please use the struct versions for new code.
>
>> + /*
>> + * Flush the delay alloc blocks. Even after flushing the inode,
>> + * there can still be delalloc blocks on the inode beyond EOF
>> + * due to speculative preallocation. These are not removed until
>> + * the release function is called or the inode is inactivated.
>> + * Hence we cannot assert here that ip->i_delayed_blks == 0.
>> + */
>> + if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
>> + error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
>> + if (error)
>> + goto out_unlock_iolock;
>> + }
>
> For the final version we should get rid of this flush and instead look
> for pages having dirty unwritten extents in the pagecache and adjust
> the result based on it. I'm fine with delaying this until all other
> issues are sorted out.
Per Dave's comments, we would get ride of pages flush if using
xfs_bmapi_read() to retrieve the delayed extents info from the initial
implementation.
Thanks,
-Jeff
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
2011-11-20 0:30 ` Dave Chinner
@ 2011-11-20 13:59 ` Jeff Liu
2011-11-20 15:30 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Liu @ 2011-11-20 13:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs, Chris Mason, aelder
Hi Dave,
Thanks for your further comments!
On 11/20/2011 08:30 AM, Dave Chinner wrote:
> On Sat, Nov 19, 2011 at 04:37:13PM +0800, Jeff Liu wrote:
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>
> Christoph has commented on the code-related aspects of the patch, so
> I won't repeat that. I'll comment on structural/design issues
> instead.
>
> Firstly, the patch splits the functionality arbitrarily over three
> different files, and I don't think that is necessary. There really
> is no reason at all for xfs_bmap.c to know anything aout
> SEEK_HOLE/SEEK_DATA semantics - that file is for extent manipulation
> and search functions. SEEK semantics should be entirely encoded into
> the function that deals with the seeking.
Yes, I should merge and isolate those functions in xfs_file.c.
>
> Secondly, xfs_find_desired_extent() and xfs_seek_hole_data() should
> be one function, and named something like xfs_file_seek_extent().
Definitely! and sorry for my poor english, xfs_file_seek_extent() is distinct in this case.
>
> Finally, don't think using low level extent search functions like
> xfs_bmap_search_extents() is the right level to implement this
> functionality (see my comments about SEEK_HOLE/SEEK_DATA semantics
> in xfs-bmap.c above), especially as we already have functions for
> looking up holes and extents at given offsets.
>
> That is, to find the next hole at or after after a given offset, we
> already have xfs_bmap_first_unused(). Indeed, this function already
> has the exact semantics that SEEK_HOLE requires. Simply put:
>
> case SEEK_HOLE:
> fsb = XFS_B_TO_FSBT(mp, start_offset);
> error = xfs_bmap_first_unused(NULL, ip, 1, &fsb,
> XFS_DATA_FORK);
> if (error)
> return -error;
>
> if (fsb <= XFS_B_TO_FSBT(mp, start_offset))
> return start_offset;
> return XFS_FSB_TO_B(mp, fsb);
Thanks for pointing it out, I even don't know XFS has this convenient routine at that time. :(
>
>
> As to the data extent search, I'd prefer you to use xfs_bmapi_read()
> rather than xfs_bmap_search_extents() directly. I'd prefer that we
> return unwritten extents as holes rather than data from the initial
> implementation, and using the low level xfs_bmap_search_extents()
> makes that quite complex. However, we already have a function for
> handling that complexity: xfs_bmapi_read().
>
> That is, xfs_bmapi_read() needs to be passed an array of two maps,
> one for the current offset, and one for the next extent type. This
> makes one call sufficient for most transitions. Done in a simple
> loop it will handle all conditions of hole->unwritten->hole....
> until it finds a data extent of EOF.
>
>
> start_fsbno = XFS_B_TO_FSBT(mp, start_offset);
> while (1) {
> struct xfs_bmbt_irec maps[2];
> int nmaps = 2;
>
> count_fsb = XFS_B_TO_FSB(mp, XFS_MAXIOFFSET(mp));
> error = xfs_bmapi_read(ip, fsbno, count_fsb,
> &maps, &nmaps, XFS_BMAPI_ENTIRE);
>
> if (error)
> return -error;
> if (!nmaps) {
> /* no extents at given offset, must be beyond EOF */
> return -ENXIO;
> }
>
> switch (map[0].br_startblock) {
> case DELAYSTARTBLOCK:
> /* landed in an in-memory data extent */
> return map[0].br_startoff;
>
> default:
> /* landed in an allocated extent */
> if (map[0].br_state == XFS_EXT_NORM) {
> /* a real data extent */
> return map[0].br_startoff;
> }
> /* Fall through to hole handling for unwritten extents */
>
> case HOLESTARTBLOCK:
> /*
> * landed in a hole. If the next extent is a data
> * extent, then return the start of it, otherwise
> * we need to move the start offset and map more
> * blocks.
> */
> if (map[1].br_startblock == DELAYSTARTBLOCK ||
> ((map[1].br_startblock != HOLESTARTBLOCK &&
> map[1].br_state == XFS_EXT_NORM)))
> return map[1].br_startoff;
>
> start_fsbno = map[1].br_startoff + map[1].br_blockcount;
> break;
> }
>
> if (XFS_FSB_TO_B(mp, start_fsbno) > ip->i_size) {
> /* Beyond EOF now */
> return -ENXIO;
> }
> }
>
> This can pretty much all be done in
> fs/xfs/xfs_file.c::xfs_file_seek_extent() because all the functions
> used by the above code are exported from xfs_bmap.c for external use
> - that solves the scattering problem and uses interfaces that we
> already know work in the intended manner.... ;)
Thanks again for the detailed info!
At first, I have spent a few hours to think it over using xfs_bmap_search_extents() or xfs_bmapi_read().
Indeed, it will be more complex to deal with the unwritten extents later if using xfs_bmap_search_extents().
This change will reflected in my next post.
>
> BTW:
>
>> +xfs_file_llseek(
>> + struct file *file,
>> + loff_t offset,
>> + int origin)
>> +{
>> + struct inode *inode = file->f_mapping->host;
>> + int ret;
>> +
>> + if (origin != SEEK_DATA && origin != SEEK_HOLE)
>> + return generic_file_llseek(file, offset, origin);
>> +
>> + mutex_lock(&inode->i_mutex);
>> + switch (origin) {
>
> We don't need the i_mutex to be held here. We only need to hold the
> ilock in shared mode for this operation to protect against extent
> list modifications (like unwritten extent conversion and
> truncation).
looks we only need to hold the ilock in shared mode in xfs_file_seek_extent().
>
>> +int
>> +xfs_find_desired_extent(
>> + struct inode *inode,
>> + loff_t *start,
>> + u32 type)
>> +{
>> + xfs_inode_t *ip = XFS_I(inode);
>> + xfs_mount_t *mp = ip->i_mount;
>> + struct xfs_ifork *ifp;
>> + int lock;
>> + int error;
>> +
>> + if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
>> + ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
>> + ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
>> + return XFS_ERROR(EINVAL);
>> +
>> + xfs_ilock(ip, XFS_IOLOCK_SHARED);
>> +
>> + /*
>> + * Flush the delay alloc blocks. Even after flushing the inode,
>> + * there can still be delalloc blocks on the inode beyond EOF
>> + * due to speculative preallocation. These are not removed until
>> + * the release function is called or the inode is inactivated.
>> + * Hence we cannot assert here that ip->i_delayed_blks == 0.
>> + */
>> + if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
>> + error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
>> + if (error)
>> + goto out_unlock_iolock;
>> + }
>
> i.e. this IOLOCK and flush is completely unnecessary because we'll
> find delayed allocation extents in the extent lookup and can handle
> them just like real allocated extents....
>
>> + lock = xfs_ilock_map_shared(ip);
>
> i.e. this is the only lock we need to take.
Yes, if we get rid of the the flush pages here, xfs_ilock() can be removed safely.
Thanks,
-Jeff
>
> Cheers,
>
> Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
2011-11-20 13:59 ` Jeff Liu
@ 2011-11-20 15:30 ` Christoph Hellwig
2011-11-20 22:34 ` Dave Chinner
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2011-11-20 15:30 UTC (permalink / raw)
To: Jeff Liu; +Cc: Christoph Hellwig, xfs, Chris Mason, aelder
On Sun, Nov 20, 2011 at 09:59:31PM +0800, Jeff Liu wrote:
> > fsb = XFS_B_TO_FSBT(mp, start_offset);
> > error = xfs_bmap_first_unused(NULL, ip, 1, &fsb,
> > XFS_DATA_FORK);
> > if (error)
> > return -error;
> >
> > if (fsb <= XFS_B_TO_FSBT(mp, start_offset))
> > return start_offset;
> > return XFS_FSB_TO_B(mp, fsb);
>
> Thanks for pointing it out, I even don't know XFS has this convenient routine at that time. :(
I didn't remember it either, but Dave has been working the dir code
which makes use of this funtion lately :)
Btw, the documentation for the function doesn't mention that it starts
searching for the hole only after the passed in block number, which
is something that could be improved.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
2011-11-20 15:30 ` Christoph Hellwig
@ 2011-11-20 22:34 ` Dave Chinner
0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2011-11-20 22:34 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jeff Liu, xfs, Chris Mason, aelder
On Sun, Nov 20, 2011 at 10:30:13AM -0500, Christoph Hellwig wrote:
> On Sun, Nov 20, 2011 at 09:59:31PM +0800, Jeff Liu wrote:
> > > fsb = XFS_B_TO_FSBT(mp, start_offset);
> > > error = xfs_bmap_first_unused(NULL, ip, 1, &fsb,
> > > XFS_DATA_FORK);
> > > if (error)
> > > return -error;
> > >
> > > if (fsb <= XFS_B_TO_FSBT(mp, start_offset))
> > > return start_offset;
> > > return XFS_FSB_TO_B(mp, fsb);
> >
> > Thanks for pointing it out, I even don't know XFS has this convenient routine at that time. :(
>
> I didn't remember it either, but Dave has been working the dir code
> which makes use of this funtion lately :)
>
> Btw, the documentation for the function doesn't mention that it starts
> searching for the hole only after the passed in block number, which
> is something that could be improved.
Definitely.
A bit of code archeology shows that the location of the hole was
originally the return value of the function, then it got moved to a
function parameter so that the return value could be used for error
status (1995). Then the parameter go changed to be used as the first
block to start searching from when the dir2 code was introduced in
1999. So it's been wrong for quite some time ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2011-11-20 22:34 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-02 15:04 SEEK_DATA/SEEK_HOLE support Jeff Liu
2011-10-02 15:42 ` Christoph Hellwig
2011-10-02 16:06 ` Jeff Liu
2011-10-02 17:59 ` Christoph Hellwig
2011-10-02 19:11 ` Andi Kleen
2011-10-03 4:04 ` Jeff Liu
2011-10-03 23:43 ` Dave Chinner
2011-10-04 13:02 ` Christoph Hellwig
2011-10-05 4:36 ` Dave Chinner
2011-10-05 5:32 ` Jeff Liu
2011-10-05 9:23 ` Dave Chinner
2011-10-05 13:56 ` Jeff Liu
2011-10-05 7:34 ` Michael Monnerie
2011-10-05 9:36 ` Dave Chinner
2011-10-05 18:22 ` Michael Monnerie
2011-10-06 0:32 ` Dave Chinner
2011-11-14 10:24 ` Christoph Hellwig
2011-11-14 12:47 ` Jeff Liu
2011-11-14 12:50 ` Christoph Hellwig
2011-11-19 8:29 ` XFS SEEK_DATA/SEEK_HOLE support V1 Jeff Liu
2011-11-19 8:34 ` Jeff Liu
2011-11-19 8:37 ` [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1 Jeff Liu
2011-11-19 19:11 ` Christoph Hellwig
2011-11-20 13:15 ` Jeff Liu
2011-11-20 0:30 ` Dave Chinner
2011-11-20 13:59 ` Jeff Liu
2011-11-20 15:30 ` Christoph Hellwig
2011-11-20 22:34 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox