From: Jeff Liu <jeff.liu@oracle.com>
To: Mark Tinguely <tinguely@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>, Ben Myers <bpm@sgi.com>,
Chris Mason <chris.mason@oracle.com>,
xfs@oss.sgi.com
Subject: Re: Introduce SEEK_DATA/SEEK_HOLE to XFS V5
Date: Fri, 13 Jan 2012 10:41:52 +0800 [thread overview]
Message-ID: <4F0F99F0.5060606@oracle.com> (raw)
In-Reply-To: <4F0EF5DC.1070207@sgi.com>
On 01/12/2012 11:01 PM, Mark Tinguely wrote:
> On 01/12/12 07:52, Jeff Liu wrote:
>> Hi Mark,
>>
>> On 01/12/2012 05:12 AM, Mark Tinguely wrote:
>>
>>> xfs_has_unwritten_buffer() always returns the offset of the first
>>> dirty unwritten page. This can cause xfs_seek_data() and xfs_seek_hole()
>>> to give the wrong results in certain circumstances.
>>
>> Sorry, am was well understood your opinions in this point for now.
>> IMHO, we can only find and return the data buffer offset at a dirty or
>> unwritten page once the first page was probed.
>>
>
> From my tests, xfs_bmapi_read() can only find holes if they cross or
> start on a 64KB boundary. It would be nice if unwritten extents were at
> least that good at finding holes.
>
>
> In xfs_has_unwritten_buffer(), could you start searching from the seek
> offset? The variable *offset could pass in that seek address and us that
> offset as the starting "index" rather than the beginning of the extent?
>
> You start:
>
> index = XFS_FSB_TO_B(mp, map->br_startoff) >> PAGE_CACHE_SHIFT;
>
> Could we do?:
>
> index = XFS_FSB_TO_B(mp, *offset) >> PAGE_CACHE_SHIFT;
>
> And before calling xfs_has_unwritten_buffer():
> offset = seekoff;
Good catch!
Looks we need to examine the max value between seekoff and
map->br_startoff, before passing it to xfs_has_unwritten_buffer().
For SEEK_DATA, if the seekoff is less than map->br_startoff, IMHO, we
need to pass the map->br_startoff to it.
>
> Also, my idea to find the next data/hole requires that
> xfs_has_unwritten_buffer() finds the smallest PAGECACHE_TAG_DIRTY or
> PAGECACHE_TAG_WRITEBACK page if any starting at the seek offset.
By combining with all your comments below, now I feel a bits clear about
your opinions. :)
I think it is definitely needed if we continue to use the current idea,
i.e, probing the unwritten extent twice(DIRTY, WRITEBACK).
Thanks,
-Jeff
>
>
>>>
>>> In xfs_seek_data(), every page past first dirty/unwritten page in the
>>> unwritten extent will be reported as data.
>>
>> Hmm, consider the user level utility that make use of SEEK_XXX stuff to
>> copy data from an offset in source file:
>>
>> Generally, it will call xfs_seek_data() firstly,
>> if we read an unwritten extent and there is data buffer was probed in
>> xfs_seek_data(), it only means we can read file data starting from the
>> returned offset of xfs_has_unwritten_buffer().
>>
>> Then it will call xfs_seek_hole() to calculate this extent length.
>> next, a couple of read()/write() will be called in a loop depending on
>> the extent length.
>>
>> [ page 1 ] | [ page 2 ] | [ page 3 ] | .... [ page N ]
>> |data offset at page 2|
>>
>> If we got the data offset from page2, and there is no data at page 3,
>> the user utility call read(2) will returns ZERO, and it will break
>> immediately.
>>
>
> Something like:
> loop
> s = lseek(fd, off, SEEK_DATA);
> if (s == -1)
> if we errno == ENXIO
> return done /* eof */
> else
> return errno
>
> e = lseek(fd, s, SEEK_HOLE);
> if (e == -1)
> return errno
>
> dest = copy from s to e
> off = e
> end loop (if not eof or other condition)
>
> You will seek for next hole at the found data position. Even if
> xfs_has_unwritten_buffer() does the right thing and returns the
> dirty/unwritten page starting from seekoff, we need go a page past the
> current page (which has data) to look for the next hole.
>
>
>
> Something like (again psuedo-code)
> loop
> offset1 = offset2 = seekoff
> xfs_has_unwritten_buffer(seekoff, &offset1, DIRTY)
> xfs_has_unwritten_buffer(seekoff, &offset2, WRITEBACK)
> d = min(offset1, offset2)
>
> if (d > seekoff OR d == NULL)
> return found a hole at seekoff
>
> if (d == seekoff) /* standard case assuming how we
> * use SEEK_DATA/SEEK_HOLE
> * This is the step your code
> * does not perform. It jumps
> * to the next extent
> */
> seekoff += page size of dirty/writeback **
> end while the seekoff < extent size
>
> ** here we could jump to the next 64KB boundary and be as accurate as
> xfs_bmapi_read().
>
> Good job. This is an important feature.
>
> --Mark Tinguely.
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2012-01-13 2:42 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-06 13:28 Introduce SEEK_DATA/SEEK_HOLE to XFS V5 Jeff Liu
2012-01-10 17:18 ` Ben Myers
2012-01-11 5:45 ` Jeff Liu
2012-01-11 21:06 ` Mark Tinguely
2012-01-12 16:22 ` Christoph Hellwig
2012-01-12 17:50 ` Ben Myers
2012-01-11 21:07 ` Mark Tinguely
2012-01-12 13:29 ` Jeff Liu
2012-01-12 16:39 ` Christoph Hellwig
2012-01-13 2:14 ` Jeff Liu
2012-01-11 21:12 ` Mark Tinguely
2012-01-12 13:52 ` Jeff Liu
2012-01-12 15:01 ` Mark Tinguely
2012-01-12 16:41 ` Christoph Hellwig
2012-01-12 17:39 ` Ben Myers
2012-01-13 2:41 ` Jeff Liu [this message]
2012-01-11 15:43 ` Ben Myers
2012-01-11 22:28 ` Ben Myers
2012-01-12 13:21 ` Jeff Liu
2012-01-12 12:53 ` Jeff Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F0F99F0.5060606@oracle.com \
--to=jeff.liu@oracle.com \
--cc=bpm@sgi.com \
--cc=chris.mason@oracle.com \
--cc=hch@infradead.org \
--cc=tinguely@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.