From: Jeff Liu <jeff.liu@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
xfs@oss.sgi.com, Chris Mason <chris.mason@oracle.com>,
aelder@sgi.com
Subject: Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
Date: Sun, 20 Nov 2011 21:59:31 +0800 [thread overview]
Message-ID: <4EC907C3.7020901@oracle.com> (raw)
In-Reply-To: <20111120003031.GM7046@dastard>
Hi Dave,
Thanks for your further comments!
On 11/20/2011 08:30 AM, Dave Chinner wrote:
> On Sat, Nov 19, 2011 at 04:37:13PM +0800, Jeff Liu wrote:
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>
> Christoph has commented on the code-related aspects of the patch, so
> I won't repeat that. I'll comment on structural/design issues
> instead.
>
> Firstly, the patch splits the functionality arbitrarily over three
> different files, and I don't think that is necessary. There really
> is no reason at all for xfs_bmap.c to know anything aout
> SEEK_HOLE/SEEK_DATA semantics - that file is for extent manipulation
> and search functions. SEEK semantics should be entirely encoded into
> the function that deals with the seeking.
Yes, I should merge and isolate those functions in xfs_file.c.
>
> Secondly, xfs_find_desired_extent() and xfs_seek_hole_data() should
> be one function, and named something like xfs_file_seek_extent().
Definitely! and sorry for my poor english, xfs_file_seek_extent() is distinct in this case.
>
> Finally, don't think using low level extent search functions like
> xfs_bmap_search_extents() is the right level to implement this
> functionality (see my comments about SEEK_HOLE/SEEK_DATA semantics
> in xfs-bmap.c above), especially as we already have functions for
> looking up holes and extents at given offsets.
>
> That is, to find the next hole at or after after a given offset, we
> already have xfs_bmap_first_unused(). Indeed, this function already
> has the exact semantics that SEEK_HOLE requires. Simply put:
>
> case SEEK_HOLE:
> fsb = XFS_B_TO_FSBT(mp, start_offset);
> error = xfs_bmap_first_unused(NULL, ip, 1, &fsb,
> XFS_DATA_FORK);
> if (error)
> return -error;
>
> if (fsb <= XFS_B_TO_FSBT(mp, start_offset))
> return start_offset;
> return XFS_FSB_TO_B(mp, fsb);
Thanks for pointing it out, I even don't know XFS has this convenient routine at that time. :(
>
>
> As to the data extent search, I'd prefer you to use xfs_bmapi_read()
> rather than xfs_bmap_search_extents() directly. I'd prefer that we
> return unwritten extents as holes rather than data from the initial
> implementation, and using the low level xfs_bmap_search_extents()
> makes that quite complex. However, we already have a function for
> handling that complexity: xfs_bmapi_read().
>
> That is, xfs_bmapi_read() needs to be passed an array of two maps,
> one for the current offset, and one for the next extent type. This
> makes one call sufficient for most transitions. Done in a simple
> loop it will handle all conditions of hole->unwritten->hole....
> until it finds a data extent of EOF.
>
>
> start_fsbno = XFS_B_TO_FSBT(mp, start_offset);
> while (1) {
> struct xfs_bmbt_irec maps[2];
> int nmaps = 2;
>
> count_fsb = XFS_B_TO_FSB(mp, XFS_MAXIOFFSET(mp));
> error = xfs_bmapi_read(ip, fsbno, count_fsb,
> &maps, &nmaps, XFS_BMAPI_ENTIRE);
>
> if (error)
> return -error;
> if (!nmaps) {
> /* no extents at given offset, must be beyond EOF */
> return -ENXIO;
> }
>
> switch (map[0].br_startblock) {
> case DELAYSTARTBLOCK:
> /* landed in an in-memory data extent */
> return map[0].br_startoff;
>
> default:
> /* landed in an allocated extent */
> if (map[0].br_state == XFS_EXT_NORM) {
> /* a real data extent */
> return map[0].br_startoff;
> }
> /* Fall through to hole handling for unwritten extents */
>
> case HOLESTARTBLOCK:
> /*
> * landed in a hole. If the next extent is a data
> * extent, then return the start of it, otherwise
> * we need to move the start offset and map more
> * blocks.
> */
> if (map[1].br_startblock == DELAYSTARTBLOCK ||
> ((map[1].br_startblock != HOLESTARTBLOCK &&
> map[1].br_state == XFS_EXT_NORM)))
> return map[1].br_startoff;
>
> start_fsbno = map[1].br_startoff + map[1].br_blockcount;
> break;
> }
>
> if (XFS_FSB_TO_B(mp, start_fsbno) > ip->i_size) {
> /* Beyond EOF now */
> return -ENXIO;
> }
> }
>
> This can pretty much all be done in
> fs/xfs/xfs_file.c::xfs_file_seek_extent() because all the functions
> used by the above code are exported from xfs_bmap.c for external use
> - that solves the scattering problem and uses interfaces that we
> already know work in the intended manner.... ;)
Thanks again for the detailed info!
At first, I have spent a few hours to think it over using xfs_bmap_search_extents() or xfs_bmapi_read().
Indeed, it will be more complex to deal with the unwritten extents later if using xfs_bmap_search_extents().
This change will reflected in my next post.
>
> BTW:
>
>> +xfs_file_llseek(
>> + struct file *file,
>> + loff_t offset,
>> + int origin)
>> +{
>> + struct inode *inode = file->f_mapping->host;
>> + int ret;
>> +
>> + if (origin != SEEK_DATA && origin != SEEK_HOLE)
>> + return generic_file_llseek(file, offset, origin);
>> +
>> + mutex_lock(&inode->i_mutex);
>> + switch (origin) {
>
> We don't need the i_mutex to be held here. We only need to hold the
> ilock in shared mode for this operation to protect against extent
> list modifications (like unwritten extent conversion and
> truncation).
looks we only need to hold the ilock in shared mode in xfs_file_seek_extent().
>
>> +int
>> +xfs_find_desired_extent(
>> + struct inode *inode,
>> + loff_t *start,
>> + u32 type)
>> +{
>> + xfs_inode_t *ip = XFS_I(inode);
>> + xfs_mount_t *mp = ip->i_mount;
>> + struct xfs_ifork *ifp;
>> + int lock;
>> + int error;
>> +
>> + if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
>> + ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
>> + ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
>> + return XFS_ERROR(EINVAL);
>> +
>> + xfs_ilock(ip, XFS_IOLOCK_SHARED);
>> +
>> + /*
>> + * Flush the delay alloc blocks. Even after flushing the inode,
>> + * there can still be delalloc blocks on the inode beyond EOF
>> + * due to speculative preallocation. These are not removed until
>> + * the release function is called or the inode is inactivated.
>> + * Hence we cannot assert here that ip->i_delayed_blks == 0.
>> + */
>> + if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
>> + error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
>> + if (error)
>> + goto out_unlock_iolock;
>> + }
>
> i.e. this IOLOCK and flush is completely unnecessary because we'll
> find delayed allocation extents in the extent lookup and can handle
> them just like real allocated extents....
>
>> + lock = xfs_ilock_map_shared(ip);
>
> i.e. this is the only lock we need to take.
Yes, if we get rid of the the flush pages here, xfs_ilock() can be removed safely.
Thanks,
-Jeff
>
> Cheers,
>
> Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-11-20 13:59 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-02 15:04 SEEK_DATA/SEEK_HOLE support Jeff Liu
2011-10-02 15:42 ` Christoph Hellwig
2011-10-02 16:06 ` Jeff Liu
2011-10-02 17:59 ` Christoph Hellwig
2011-10-02 19:11 ` Andi Kleen
2011-10-03 4:04 ` Jeff Liu
2011-10-03 23:43 ` Dave Chinner
2011-10-04 13:02 ` Christoph Hellwig
2011-10-05 4:36 ` Dave Chinner
2011-10-05 5:32 ` Jeff Liu
2011-10-05 9:23 ` Dave Chinner
2011-10-05 13:56 ` Jeff Liu
2011-10-05 7:34 ` Michael Monnerie
2011-10-05 9:36 ` Dave Chinner
2011-10-05 18:22 ` Michael Monnerie
2011-10-06 0:32 ` Dave Chinner
2011-11-14 10:24 ` Christoph Hellwig
2011-11-14 12:47 ` Jeff Liu
2011-11-14 12:50 ` Christoph Hellwig
2011-11-19 8:29 ` XFS SEEK_DATA/SEEK_HOLE support V1 Jeff Liu
2011-11-19 8:34 ` Jeff Liu
2011-11-19 8:37 ` [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1 Jeff Liu
2011-11-19 19:11 ` Christoph Hellwig
2011-11-20 13:15 ` Jeff Liu
2011-11-20 0:30 ` Dave Chinner
2011-11-20 13:59 ` Jeff Liu [this message]
2011-11-20 15:30 ` Christoph Hellwig
2011-11-20 22:34 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4EC907C3.7020901@oracle.com \
--to=jeff.liu@oracle.com \
--cc=aelder@sgi.com \
--cc=chris.mason@oracle.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.