All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH] [RFC] Implement XFS_IOC_FSBULKSTAT_RANGE ioctl
Date: Wed, 4 Mar 2015 08:55:57 +1100	[thread overview]
Message-ID: <20150303215557.GU4251@dastard> (raw)
In-Reply-To: <1425393678-4550-1-git-send-email-cmaiolino@redhat.com>

On Tue, Mar 03, 2015 at 11:41:18AM -0300, Carlos Maiolino wrote:
> This patch aims to implement a ranged bulkstat ioctl, which will make users able
> to bulkstat inodes in a filesystem based on range instead on rely only in a
> whole filesystem bulkstat.
> 
> This first implementation add a per AG bulkstat possibility, but it also adds
> the necessary infrastructure to implement different kinds of ranged bulkstat,
> like per block.
> 
> The patch is already working and I've been testing it for a while, so, I think
> it's time to send this patch out and ask for comments on it.
> 
> The core of the implementation is very similar with what xfs_bulsktat() does by
> now, which, instead bulkstat the whole filesystem, it start/stop on the
> specified range.
> 
> As per Dave's suggesting, I added to  rgbulkreq structure (used to pass data
> to/from userland), a padding, so we can use the same structure for another kind
> of ranged bulkstats without the need to actually change the structure size.
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h |  12 +++
>  fs/xfs/xfs_ioctl.c     |  56 +++++++++++++
>  fs/xfs/xfs_itable.c    | 210 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_itable.h    |  16 ++++
>  4 files changed, 294 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 18dc721..88665aa 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -334,6 +334,17 @@ typedef struct xfs_fsop_bulkreq {
>  	__s32		__user *ocount;	/* output count pointer		*/
>  } xfs_fsop_bulkreq_t;
>  
> +typedef struct xfs_fsop_rgbulkreq {
> +	__u64		__user *lastip; /* last inode # pointer		*/
> +	__s32		icount;		/* count of entries in buffer	*/
> +	void		__user *ubuffer;/* user buffer for inode desc.	*/
> +	__s32		__user *ocount;	/* output count pointer		*/
> +	__u64		start;		/* start point of rgbulkstat	*/
> +	__u64		end;		/* end point of rgbulkstat	*/
> +	__u32		flags;		/* multipurpose flag field	*/
> +	__u32		pad[5];		/* reserved space		*/
> +} xfs_fsop_rgbulkreq_t;

No new typedefs. Also, call it struct xfs_fsop_bulkstat_range, so
it's clear what ioctl it belongs with.

And, as eric mentioned, the flag:

#define XFS_FSOP_BSRANGE_AGNUMBER	1

also needs to be defined to tell the kernel that start/end are ag
numbers.

> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index f7afb86..34a4de5 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -790,6 +790,59 @@ xfs_ioc_bulkstat(
>  }
>  
>  STATIC int
> +xfs_ioc_bulkstat_range(
> +	xfs_mount_t		*mp,
> +	unsigned int		cmd,
> +	void			__user *arg)
> +{
> +
> +	xfs_fsop_rgbulkreq_t	rgbulkreq;

Again, better name needed. Even just "bsreq" would be fine, because
it's just a bulkstat request structure....

> +	int			count;	/* # of records returned */
> +	xfs_ino_t		inlast; /* last inode number */
> +	int			done;
> +	int			error;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		return -EIO;
> +
> +	/* We do not support another ranged calls yet */
> +	if (cmd != XFS_IOC_FSBULKSTAT_RANGE)
> +		return -EINVAL;

Not necessary.

> +
> +	if (copy_from_user(&rgbulkreq, arg, sizeof(xfs_fsop_rgbulkreq_t)))
> +		return -EFAULT;
> +
> +	if (copy_from_user(&inlast, rgbulkreq.lastip, sizeof(__s64)))
> +		return -EFAULT;
> +
> +	if ((count = rgbulkreq.icount) <= 0)
> +		return -EINVAL;

Assignments need to be on their own line.

> +
> +	if (rgbulkreq.ubuffer == NULL)
> +		return -EINVAL;

Missing is validation of the start, end and flags fields.

> +
> +	error = xfs_bulkstat_range(mp, &inlast, &count, xfs_bulkstat_one,
> +				   sizeof(xfs_bstat_t), rgbulkreq.ubuffer,
> +				   rgbulkreq.start, rgbulkreq.end, &done);

Why do you need a "done" parameter?  Also, just pass the bsreq
structure, not the individual elements. I don't see any point in
passing the formatter or size parameters at this point, either,
because xfs_bulkstat_range() can easily define them.

> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 82e3142..d6b27ee 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -530,6 +530,216 @@ del_cursor:
>  	return error;
>  }
>  
> +/*
> + * Return stat information in bulk (by-inode) for specified
> + * filesystem range.
> + */
> +int
> +xfs_bulkstat_range(
> +	xfs_mount_t		*mp,	/* mount point for filesystem */
> +	xfs_ino_t		*lastinop, /* last inode returned */
> +	int			*ubcountp, /* size of buffer/count returned */
> +	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> +	size_t			statstruct_size, /* sizeof struct filling */
> +	char			__user *ubuffer, /* buffer with inode stats */
> +	__u64			start,	/* start bulkstat here */
> +	__u64			end,	/* end bulkstat here */
> +	int			*done) /* 1 if there are more stats to get */
> +{
> +	xfs_buf_t		*agbp;	/* agi header buffer */
> +	xfs_agino_t		agino;	/* inode # in allocation group */
> +	xfs_agnumber_t		agno;	/* allocation group number */
> +	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> +	size_t			irbsize; /* size of irec buffer in bytes */
> +	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> +	int			nirbuf;	/* size of irbuf */
> +	int			ubcount; /* size of user's buffer */
> +	struct xfs_bulkstat_agichunk ac;
> +	int			error = 0;
> +
> +	/*
> +	 * Get the last inode value, see if there is nothing to do.
> +	 */
> +	if (*lastinop != 0)
> +		agno = XFS_INO_TO_AGNO(mp, *lastinop);
> +	else
> +		agno = start;
> +
> +	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> +
> +	/*
> +	 * If specified end is bigger than mp->m_sb.sb_agount, should we
> +	 * gracefully interpret as if there is nothing to do, or trigger
> +	 * an error?
> +	 */

We should have already errored out before we get here in this case.

> +	if (agno >= mp->m_sb.sb_agcount) {
> +		*done = 1;
> +		*ubcountp = 0;
> +		return 0;
> +	}

>From this point onwards, you've pretty much copy-n-pasted the
xfs_bulkstat() implementation. What should happen here is:

	for (agno = start; agno < end; agno++) {
		error = xfs_bulkstat_range_ag(agno, ... );
		if (error)
			break;
	}

and all this:

> +	ubcount = *ubcountp;
> +	ac.ac_ubuffer = &ubuffer;
> +	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */
> +	ac.ac_ubelem = 0;
> +
> +	*ubcountp = 0;
> +	*done = 0;
> +
> +	irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4);
> +	if (!irbuf)
> +		return -ENOMEM;
> +
> +	nirbuf = irbsize / sizeof(*irbuf);
> +
> +	/*
> +	 * Loop over the allocation groups, starting from the last inode
> +	 * returned; - means start of the allocation group.
> +	 */
> +	while (agno <= end) {
> +		struct xfs_inobt_rec_incore	*irbp = irbuf;
> +		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> +		bool				end_of_ag = false;
> +		int				icount = 0;
> +		int				stat;
> +
> +		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> +		if (error)
> +			break;
> +
> +		/*
> +		 * Allocate and initialize a btree cursor for ialloc btree.
> +		 */
> +		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> +					    XFS_BTNUM_INO);
> +
> +		if (agino > 0) {
> +			/*
> +			 * In the middle of an allocation group, we need to get
> +			 * the remainder of the chunk we are in.
> +			 */
> +			struct xfs_inobt_rec_incore	r;
> +
> +			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> +			if (error)
> +				goto del_cursor;
> +			if (icount) {
> +				irbp->ir_startino = r.ir_startino;
> +				irbp->ir_freecount = r.ir_freecount;
> +				irbp->ir_free = r.ir_free;
> +				irbp++;
> +			}
> +			/* Increment to the next record */
> +			error = xfs_btree_increment(cur, 0, &stat);
> +		} else {
> +			/* Start of ag. Lookup the first inode chunk */
> +			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> +		}
> +		if (error || stat == 0) {
> +			end_of_ag = true;
> +			goto del_cursor;
> +		}
> +
> +		/*
> +		 * Loop through inode btree records in this ag,
> +		 * until we run out of inodes or space in the buffer.
> +		 */
> +		while (irbp < irbufend && icount < ubcount) {
> +			struct xfs_inobt_rec_incore	r;
> +
> +			error = xfs_inobt_get_rec(cur, &r, &stat);
> +			if (error || stat == 0) {
> +				end_of_ag = true;
> +				goto del_cursor;
> +			}
> +
> +			/*
> +			 * If this chunk has any allocated inodes, save it.
> +			 * Also start read-ahead now for this chunk.
> +			 */
> +			if (r.ir_freecount < XFS_INODES_PER_CHUNK) {
> +				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> +				irbp->ir_startino = r.ir_startino;
> +				irbp->ir_freecount = r.ir_freecount;
> +				irbp->ir_free = r.ir_free;
> +				irbp++;
> +				icount += XFS_INODES_PER_CHUNK - r.ir_freecount;
> +			}
> +			error = xfs_btree_increment(cur, 0, &stat);
> +			if (error || stat == 0) {
> +				end_of_ag = true;
> +				goto del_cursor;
> +			}
> +			cond_resched();
> +		}
> +
> +	/*
> +	 * Drop the btree buffers and the agi buffer as we can't hold any of the
> +	 * locks these represent when calling iget. If there is a pending error,
> +	 * then we are done.
> +	 */
> +del_cursor:
> +		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +		xfs_buf_relse(agbp);
> +		if (error)
> +			break;
> +		/*
> +		 * Now format all the good inodes into the user's buffer. The
> +		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> +		 * for the next loop iteration.
> +		 */
> +		irbufend = irbp;
> +		for (irbp = irbuf;
> +		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> +		     irbp++) {
> +			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp, formatter,
> +						statstruct_size, &ac, &agino);
> +			if (error)
> +				break;
> +
> +			cond_resched();
> +		}
> +
> +		/*
> +		 * If we have run out of space or had a formatting error, we
> +		 * are now done
> +		 */
> +		if (ac.ac_ubleft < statstruct_size || error)
> +			break;
> +
> +		if (end_of_ag) {
> +			agno++;
> +			agino = 0;
> +		}
> +	}
> +	/*
> +	 * Done, we are either out of specified inode range or space to
> +	 * put the data.
> +	 */
> +	kmem_free(irbuf);
> +	*ubcountp = ac.ac_ubelem;

Up to here is done by xfs_bulkstat_range_ag(). This factoring should
be done to xfs_bulkstat() as the first patch in the series, then you
can add the xfs_bulkstat_range() function that uses it, then finally
introduce the ioctl that wraps around xfs_bulkstat_range().

This avoids all of the copy/paste, and ensures that the bulkstat
APIs all use the same implementation which makes testing and
maintenance of the code in future much simpler.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2015-03-03 21:56 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-03 14:41 [PATCH] [RFC] Implement XFS_IOC_FSBULKSTAT_RANGE ioctl Carlos Maiolino
2015-03-03 20:58 ` Eric Sandeen
2015-03-03 21:55 ` Dave Chinner [this message]
2015-03-06 17:34   ` Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150303215557.GU4251@dastard \
    --to=david@fromorbit.com \
    --cc=cmaiolino@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.