[PATCH 0/3] Add agskip=value mount option

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] Add agskip=value mount option
@ 2013-01-29 15:39 rjohnston
  2013-01-29 15:39 ` [PATCH 1/3] xfs: add " rjohnston
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: rjohnston @ 2013-01-29 15:39 UTC (permalink / raw)
  To: xfs

This series will:

    1. Add agskip=value mount option to xfs
    2. Add agkip mount option test to xfstests
    3. Add agskip mount option to the mount.8 manpage (util-linux)

--Rich

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-29 15:39 [PATCH 0/3] Add agskip=value mount option rjohnston
@ 2013-01-29 15:39 ` rjohnston
  2013-01-29 20:45   ` Eric Sandeen
                     ` (2 more replies)
  2013-01-29 15:39 ` [PATCH 2/3] xfstests: add test for agskip " rjohnston
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 12+ messages in thread
From: rjohnston @ 2013-01-29 15:39 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-add-agskip-mount-option.patch --]
[-- Type: text/plain, Size: 6764 bytes --]

The agskip mount option specifies  the allocation group, (AG) for a new
file, relative to the start of the last created file. agskip has the
opposite effect of the rotorstep  system tunable  parameter.   Each
new  file  to  be placed in the location lastAG + agskipValue,
where lastAG is the allocation group of the last created file.

For example, agskip=3 means each new file will be  allocated  three  AGs  away
from the starting AG of the most recently created file.

The agskip mount option disables the rotorstep system tunable parameter.

Signed-off-by: Rich Johnston <rjohnston@sgi.com> 
---
 fs/xfs/xfs_alloc.c      |   35 ++++++++++++++++++++++++++++-------
 fs/xfs/xfs_filestream.c |    8 +++++++-
 fs/xfs/xfs_mount.c      |    1 +
 fs/xfs/xfs_mount.h      |    4 +++-
 fs/xfs/xfs_super.c      |   13 ++++++++++++-
 5 files changed, 51 insertions(+), 10 deletions(-)

Index: b/fs/xfs/xfs_alloc.c
===================================================================
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -2365,12 +2365,20 @@ xfs_alloc_vextent(
 		 * Try near allocation first, then anywhere-in-ag after
 		 * the first a.g. fails.
 		 */
-		if ((args->userdata  == XFS_ALLOC_INITIAL_USER_DATA) &&
-		    (mp->m_flags & XFS_MOUNT_32BITINODES)) {
-			args->fsbno = XFS_AGB_TO_FSB(mp,
-					((mp->m_agfrotor / rotorstep) %
-					mp->m_sb.sb_agcount), 0);
-			bump_rotor = 1;
+		if (args->userdata == XFS_ALLOC_INITIAL_USER_DATA) {
+			if (mp->m_flags & XFS_MOUNT_AGSKIP) {
+				spin_lock(&mp->m_agfrotor_lock);
+				args->fsbno = XFS_AGB_TO_FSB(mp,
+					mp->m_agfrotor, 0);
+				mp->m_agfrotor = (mp->m_agfrotor + mp->m_agskip)
+					% mp->m_sb.sb_agcount;
+				spin_unlock(&mp->m_agfrotor_lock);
+			} else if (mp->m_flags & XFS_MOUNT_32BITINODES) {
+				args->fsbno = XFS_AGB_TO_FSB(mp,
+						((mp->m_agfrotor / rotorstep) %
+						mp->m_sb.sb_agcount), 0);
+				bump_rotor = 1;
+			}
 		}
 		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
 		args->type = XFS_ALLOCTYPE_NEAR_BNO;
@@ -2385,8 +2393,21 @@ xfs_alloc_vextent(
 			/*
 			 * Start with the last place we left off.
 			 */
-			args->agno = sagno = (mp->m_agfrotor / rotorstep) %
+                        if (mp->m_flags & XFS_MOUNT_AGSKIP) {
+				/*
+				 * The spinlock makes the combined assignment
+				 * of args->fsbno and mp->m_agfrotor atomic.
+				 * mp->m_agfrotor can not be advanced until
+				 * args->fsbno is assigned.
+				*/
+                                spin_lock(&mp->m_agfrotor_lock);
+                                sagno = mp->m_agfrotor;
+                                spin_unlock(&mp->m_agfrotor_lock);
+                        } else {
+				sagno = (mp->m_agfrotor / rotorstep) %
 					mp->m_sb.sb_agcount;
+			}
+			args->agno = sagno;
 			args->type = XFS_ALLOCTYPE_THIS_AG;
 			flags = XFS_ALLOC_FLAG_TRYLOCK;
 		} else if (type == XFS_ALLOCTYPE_FIRST_AG) {
Index: b/fs/xfs/xfs_filestream.c
===================================================================
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -622,7 +622,13 @@ xfs_filestream_associate(
 	 * Set the starting AG using the rotor for inode32, otherwise
 	 * use the directory inode's AG.
 	 */
-	if (mp->m_flags & XFS_MOUNT_32BITINODES) {
+	if (mp->m_flags & XFS_MOUNT_AGSKIP) {
+		spin_lock(&mp->m_agfrotor_lock);
+		startag = mp->m_agfrotor;
+		mp->m_agfrotor = (mp->m_agfrotor + mp->m_agskip)
+			% mp->m_sb.sb_agcount;
+		spin_unlock(&mp->m_agfrotor_lock);
+	} else if (mp->m_flags & XFS_MOUNT_32BITINODES) {
 		rotorstep = xfs_rotorstep;
 		startag = (mp->m_agfrotor / rotorstep) % mp->m_sb.sb_agcount;
 		mp->m_agfrotor = (mp->m_agfrotor + 1) %
Index: b/fs/xfs/xfs_mount.c
===================================================================
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -768,6 +768,7 @@ STATIC void
 xfs_mount_common(xfs_mount_t *mp, xfs_sb_t *sbp)
 {
 	mp->m_agfrotor = mp->m_agirotor = 0;
+	spin_lock_init(&mp->m_agfrotor_lock);
 	spin_lock_init(&mp->m_agirotor_lock);
 	mp->m_maxagi = mp->m_sb.sb_agcount;
 	mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
Index: b/fs/xfs/xfs_mount.h
===================================================================
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -119,6 +119,7 @@ typedef struct xfs_mount {
 	char			*m_logname;	/* external log device name */
 	int			m_bsize;	/* fs logical block size */
 	xfs_agnumber_t		m_agfrotor;	/* last ag where space found */
+	spinlock_t		m_agfrotor_lock;/* .. and lock protecting it */
 	xfs_agnumber_t		m_agirotor;	/* last ag dir inode alloced */
 	spinlock_t		m_agirotor_lock;/* .. and lock protecting it */
 	xfs_agnumber_t		m_maxagi;	/* highest inode alloc group */
@@ -203,6 +204,7 @@ typedef struct xfs_mount {
 	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
 	int64_t			m_low_space[XFS_LOWSP_MAX];
 						/* low free space thresholds */
+	int			m_agskip;	/* extent allocation stride */
 
 	struct workqueue_struct	*m_data_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
@@ -246,7 +248,7 @@ typedef struct xfs_mount {
 #define XFS_MOUNT_FILESTREAMS	(1ULL << 24)	/* enable the filestreams
 						   allocator */
 #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
-
+#define XFS_MOUNT_AGSKIP	(1ULL << 26)	/* extent allocation stride */
 
 /*
  * Default minimum read and write sizes.
Index: b/fs/xfs/xfs_super.c
===================================================================
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -114,7 +114,7 @@ mempool_t *xfs_ioend_pool;
 #define MNTOPT_NODELAYLOG  "nodelaylog"	/* Delayed logging disabled */
 #define MNTOPT_DISCARD	   "discard"	/* Discard unused blocks */
 #define MNTOPT_NODISCARD   "nodiscard"	/* Do not discard unused blocks */
-
+#define MNTOPT_AGSKIP      "agskip"	/* initial extent allocation stride */
 /*
  * Table driven mount option parser.
  *
@@ -383,6 +383,15 @@ xfs_parseargs(
 		} else if (!strcmp(this_char, "irixsgid")) {
 			xfs_warn(mp,
 	"irixsgid is now a sysctl(2) variable, option is deprecated.");
+		} else if (!strcmp(this_char, MNTOPT_AGSKIP)) {
+			if (!value || !*value) {
+				xfs_warn(mp,
+					"%s option requires an argument",
+					this_char);
+				return EINVAL;
+			}
+			mp->m_flags |= XFS_MOUNT_AGSKIP;
+			mp->m_agskip = simple_strtoul(value, &eov, 10);
 		} else {
 			xfs_warn(mp, "unknown mount option [%s].", this_char);
 			return EINVAL;
@@ -567,6 +576,8 @@ xfs_showargs(
 
 	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
 		seq_puts(m, "," MNTOPT_NOQUOTA);
+	if (mp->m_flags & XFS_MOUNT_AGSKIP)
+		seq_printf(m, "," MNTOPT_AGSKIP "=%d", mp->m_agskip);
 
 	return 0;
 }

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/3] xfstests: add test for agskip mount option
  2013-01-29 15:39 [PATCH 0/3] Add agskip=value mount option rjohnston
  2013-01-29 15:39 ` [PATCH 1/3] xfs: add " rjohnston
@ 2013-01-29 15:39 ` rjohnston
  2013-01-29 15:39 ` [PATCH 3/3] --- sys-utils/mount.8 | 12 ++++++++++++ 1 file changed, 12 insertions(+) rjohnston
  2013-01-29 17:59 ` [PATCH 0/3] Add agskip=value mount option Eric Sandeen
  3 siblings, 0 replies; 12+ messages in thread
From: rjohnston @ 2013-01-29 15:39 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfstests-add-agskip-test.patch --]
[-- Type: text/plain, Size: 5479 bytes --]

Signed-off-by: Rich Johnston <rjohnston@sgi.com>
---
 295     |  147 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 295.out |    2 
 group   |    1 
 3 files changed, 150 insertions(+)

Index: b/295
===================================================================
--- /dev/null
+++ b/295
@@ -0,0 +1,147 @@
+#! /bin/bash
+# FS QA Test No. 295
+#
+# Test agskip mount options
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2013 SGI.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+#
+# This test is a modified version of agskip test written by Alain Renaud.
+#
+#   Phase 1: Create an xfs filesystem with AGCOUNT=32 and verify that the
+#            AGCOUNT is 32.
+#   Phase 2: Mount the filesystem with AGSKIP=3.
+#   Phase 3: Verify that the AGSKIP is working correctly:
+#            Create a file that spans more than one AG (size=1.5 * AGSIZE)
+#            Create another file that is less than AGSIZE.
+#            Use xfs_bmap to verify that the second extent is at location
+#            previous_AG+1.
+#            Continue creating/verifying that the AG's are created 3 AG's
+#            apart
+#
+# creator
+owner=rjohnston@sgi.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# real QA test starts here
+rm -f $seq.full
+_require_scratch
+
+BLK_SIZE=4096
+AGCOUNT=32
+AGSKIP=3
+
+# Modify as appropriate.
+_supported_fs xfs
+_supported_os Linux
+
+echo "Phase 1: mkfs the filesystem" >> $seq.full 2>&1
+echo "DEBUG: agcount=${AGCOUNT}" >> $seq.full 2>&1
+_scratch_mkfs_xfs " -dagcount=${AGCOUNT}" \
+    >> $seq.full 2>&1 || _fail "mkfs_xfs failed"
+
+# Get fs agcount
+_agcount=$( xfs_db -r -c 'sb 0' -c 'print' ${SCRATCH_DEV} | \
+    awk '$1 == "agcount" { print $3 }' ) >> $seq.full 2>&1
+_agblocks=$( xfs_db -r -c 'sb 0' -c 'print' ${SCRATCH_DEV} | \
+    awk '$1 == "agblocks" { print $3 }' ) >> $seq.full 2>&1
+
+# Print size in meg
+_agsize=$(( _agblocks * ${BLK_SIZE} / 1024 / 1024 )) >> $seq.full 2>&1
+
+if [[ "${_agcount}" -ne ${AGCOUNT} ]]
+then
+    echo "ERROR: agcount(${_agcount}) value is incorrect " >> $seq.full 2>&1
+    exit
+fi
+
+echo "Phase 2: mount filesystem with agskip option" >> $seq.full 2>&1
+echo "DEBUG: agskip=${AGSKIP} "  >> $seq.full 2>&1
+
+# mount
+_scratch_mount "-o agskip=${AGSKIP}" >> $seq.full 2>&1 \
+    || _fail "xfs_mount failed"
+
+echo "Phase 3: test agskip option"
+mkdir ${SCRATCH_MNT}/agskip-${AGSKIP}.dir >> $seq.full 2>&1 \
+    || _fail "mkdir failed"
+
+# make the first file with 2 extents
+FSIZE=$(( ${_agsize} + (  ${_agsize} / 2 )))
+FILE=${SCRATCH_MNT}/agskip-${AGSKIP}.dir/file-0
+echo "DEBUG: making first file with size = ${FSIZE}m" >> $seq.full 2>&1
+xfs_mkfile ${FSIZE}m ${FILE} >> $seq.full 2>&1 \
+    || _fail "failed to make file ${FILE}"
+
+agvalue=$( xfs_bmap -v ${FILE} | awk '$1 == "0:" { print $4 }')
+
+# make sure that the second extent is at location AG+1
+_agtmp=$( xfs_bmap -v ${FILE} | awk '$1 == "1:" { print $4 }')
+
+if [[ $(( ( agvalue + 1 ) % AGCOUNT )) -ne ${_agtmp} ]]
+then
+    echo -n "ERROR: The AG location of extent-1=${_agtmp}" >> $seq.full 2>&1
+    echo    "       as where extent-0=${agvalue}" >> $seq.full 2>&1
+    exit
+fi
+
+_cmpt=1
+while [[ ${_cmpt} -lt 25 ]]
+do
+    FILE=${SCRATCH_MNT}/agskip-${AGSKIP}.dir/file-${_cmpt}
+    if ! xfs_mkfile 10m ${FILE}
+    then
+        echo "ERROR: failed to make file ${FILE}" >> $seq.full 2>&1
+        exit
+    fi
+    _agtmp=$( xfs_bmap -v ${FILE} | awk '$1 == "0:" { print $4 }')
+    if [[ $(( ( agvalue + AGSKIP ) % AGCOUNT )) -ne ${_agtmp} ]]
+    then
+        echo "ERROR: The AG location of file ${FILE} not at AG+${AGSKIP}" \
+             >> $seq.full 2>&1
+        echo "       old AG == ${agvalue} new AG == ${_agtmp}" >> $seq.full 2>&1
+        exit
+    fi
+    agvalue=${_agtmp}
+    let _cmpt++
+done
+
+echo "# unmounting scratch" >> $seq.full 2>&1
+${UMOUNT_PROG} ${SCRATCH_MNT} >>$seq.full 2>&1 || _fail "unmount failed"
+
+# success, all done
+status=0
+_cleanup
+exit
Index: b/295.out
===================================================================
--- /dev/null
+++ b/295.out
@@ -0,0 +1,2 @@
+QA output created by 295
+Phase 3: test agskip option
Index: b/group
===================================================================
--- a/group
+++ b/group
@@ -413,3 +413,4 @@ deprecated
 292 auto mkfs quick
 293 auto quick
 294 auto quick
+294 auto mount

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/3] --- sys-utils/mount.8 | 12 ++++++++++++ 1 file changed, 12 insertions(+)
  2013-01-29 15:39 [PATCH 0/3] Add agskip=value mount option rjohnston
  2013-01-29 15:39 ` [PATCH 1/3] xfs: add " rjohnston
  2013-01-29 15:39 ` [PATCH 2/3] xfstests: add test for agskip " rjohnston
@ 2013-01-29 15:39 ` rjohnston
  2013-01-29 17:59 ` [PATCH 0/3] Add agskip=value mount option Eric Sandeen
  3 siblings, 0 replies; 12+ messages in thread
From: rjohnston @ 2013-01-29 15:39 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: util-linux-add-xfs-agskip-mount-option.patch --]
[-- Type: text/plain, Size: 1014 bytes --]

Index: b/sys-utils/mount.8
===================================================================
--- a/sys-utils/mount.8
+++ b/sys-utils/mount.8
@@ -2587,6 +2587,18 @@ None.
 
 .SH "Mount options for xfs"
 .TP
+.BI agskip= value
+Specifies the allocation group, (AG) for a new file, relative to the
+start of the last created file. agskip has the opposite effect of the
+rotorstep system tunable parameter.  Each new file to be placed in the
+location lastAG + agskipValue, where lastAG is the allocation group
+of the last created file.
+
+For example, agskip=3 means each new file will be allocated three AGs
+away from the starting AG of the most recently created file.
+
+The agskip mount option disables the rotorstep system tunable parameter.
+.TP
 .BI allocsize= size
 Sets the buffered I/O end-of-file preallocation size when
 doing delayed allocation writeout (default size is 64KiB).

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Add agskip=value mount option
  2013-01-29 15:39 [PATCH 0/3] Add agskip=value mount option rjohnston
                   ` (2 preceding siblings ...)
  2013-01-29 15:39 ` [PATCH 3/3] --- sys-utils/mount.8 | 12 ++++++++++++ 1 file changed, 12 insertions(+) rjohnston
@ 2013-01-29 17:59 ` Eric Sandeen
  3 siblings, 0 replies; 12+ messages in thread
From: Eric Sandeen @ 2013-01-29 17:59 UTC (permalink / raw)
  To: rjohnston; +Cc: xfs

On 1/29/13 9:39 AM, rjohnston@sgi.com wrote:
> This series will:
> 
>     1. Add agskip=value mount option to xfs
>     2. Add agkip mount option test to xfstests
>     3. Add agskip mount option to the mount.8 manpage (util-linux)

You left out:

Why?  :)

-Eric

> --Rich
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-29 15:39 ` [PATCH 1/3] xfs: add " rjohnston
@ 2013-01-29 20:45   ` Eric Sandeen
  2013-01-29 20:57   ` Eric Sandeen
  2013-01-30  1:04   ` Dave Chinner
  2 siblings, 0 replies; 12+ messages in thread
From: Eric Sandeen @ 2013-01-29 20:45 UTC (permalink / raw)
  To: rjohnston; +Cc: xfs

On 1/29/13 9:39 AM, rjohnston@sgi.com wrote:

<an attachment>

Please do inline if possible?

But anyway, although I've not looked at the patch itself yet,
please also add this to Documentation/filesystems/xfs.txt

And not just the "what" but the "why" - why would a person
use this?  And how would they select the value?

Thanks,
-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-29 15:39 ` [PATCH 1/3] xfs: add " rjohnston
  2013-01-29 20:45   ` Eric Sandeen
@ 2013-01-29 20:57   ` Eric Sandeen
  2013-01-30  1:04   ` Dave Chinner
  2 siblings, 0 replies; 12+ messages in thread
From: Eric Sandeen @ 2013-01-29 20:57 UTC (permalink / raw)
  To: rjohnston; +Cc: xfs

On 1/29/13 9:39 AM, rjohnston@sgi.com wrote:
> The agskip mount option specifies  the allocation group, (AG) for a new
> file, relative to the start of the last created file. agskip has the
> opposite effect of the rotorstep  system tunable  parameter.   Each
> new  file  to  be placed in the location lastAG + agskipValue,
> where lastAG is the allocation group of the last created file.
> 
> For example, agskip=3 means each new file will be  allocated  three  AGs  away
> from the starting AG of the most recently created file.
> 
> The agskip mount option disables the rotorstep system tunable parameter.
> 
> Signed-off-by: Rich Johnston <rjohnston@sgi.com> 
> ---
>  fs/xfs/xfs_alloc.c      |   35 ++++++++++++++++++++++++++++-------
>  fs/xfs/xfs_filestream.c |    8 +++++++-
>  fs/xfs/xfs_mount.c      |    1 +
>  fs/xfs/xfs_mount.h      |    4 +++-
>  fs/xfs/xfs_super.c      |   13 ++++++++++++-
>  5 files changed, 51 insertions(+), 10 deletions(-)
> 
> Index: b/fs/xfs/xfs_alloc.c
> ===================================================================
> --- a/fs/xfs/xfs_alloc.c
> +++ b/fs/xfs/xfs_alloc.c
> @@ -2365,12 +2365,20 @@ xfs_alloc_vextent(
>  		 * Try near allocation first, then anywhere-in-ag after
>  		 * the first a.g. fails.
>  		 */
> -		if ((args->userdata  == XFS_ALLOC_INITIAL_USER_DATA) &&
> -		    (mp->m_flags & XFS_MOUNT_32BITINODES)) {
> -			args->fsbno = XFS_AGB_TO_FSB(mp,
> -					((mp->m_agfrotor / rotorstep) %
> -					mp->m_sb.sb_agcount), 0);
> -			bump_rotor = 1;
> +		if (args->userdata == XFS_ALLOC_INITIAL_USER_DATA) {
> +			if (mp->m_flags & XFS_MOUNT_AGSKIP) {
> +				spin_lock(&mp->m_agfrotor_lock);

Hm, this seems like a bigger deal than the m_agirotor_lock, no?
That one got taken for dir creation or for AG sips, but this
is taken for every new file?

Also, I guess I had expected that this might cause new dirs to rotor
around, but it looks like with this option,there is no locality
of files in dirs (just like there is not with inode32, IIRC).
Is that correct? (and is that desired?)

> +				args->fsbno = XFS_AGB_TO_FSB(mp,
> +					mp->m_agfrotor, 0);
> +				mp->m_agfrotor = (mp->m_agfrotor + mp->m_agskip)
> +					% mp->m_sb.sb_agcount;
> +				spin_unlock(&mp->m_agfrotor_lock);
> +			} else if (mp->m_flags & XFS_MOUNT_32BITINODES) {
> +				args->fsbno = XFS_AGB_TO_FSB(mp,
> +						((mp->m_agfrotor / rotorstep) %
> +						mp->m_sb.sb_agcount), 0);
> +				bump_rotor = 1;
> +			}
>  		}
>  		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
>  		args->type = XFS_ALLOCTYPE_NEAR_BNO;
> @@ -2385,8 +2393,21 @@ xfs_alloc_vextent(
>  			/*
>  			 * Start with the last place we left off.
>  			 */
> -			args->agno = sagno = (mp->m_agfrotor / rotorstep) %
> +                        if (mp->m_flags & XFS_MOUNT_AGSKIP) {

tabs please

> +				/*
> +				 * The spinlock makes the combined assignment
> +				 * of args->fsbno and mp->m_agfrotor atomic.
> +				 * mp->m_agfrotor can not be advanced until
> +				 * args->fsbno is assigned.
> +				*/
> +                                spin_lock(&mp->m_agfrotor_lock);
> +                                sagno = mp->m_agfrotor;
> +                                spin_unlock(&mp->m_agfrotor_lock);

tabs please



> +                        } else {
> +				sagno = (mp->m_agfrotor / rotorstep) %
>  					mp->m_sb.sb_agcount;
> +			}
> +			args->agno = sagno;
>  			args->type = XFS_ALLOCTYPE_THIS_AG;
>  			flags = XFS_ALLOC_FLAG_TRYLOCK;
>  		} else if (type == XFS_ALLOCTYPE_FIRST_AG) {
> Index: b/fs/xfs/xfs_filestream.c
> ===================================================================
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -622,7 +622,13 @@ xfs_filestream_associate(
>  	 * Set the starting AG using the rotor for inode32, otherwise
>  	 * use the directory inode's AG.
>  	 */
> -	if (mp->m_flags & XFS_MOUNT_32BITINODES) {
> +	if (mp->m_flags & XFS_MOUNT_AGSKIP) {
> +		spin_lock(&mp->m_agfrotor_lock);
> +		startag = mp->m_agfrotor;
> +		mp->m_agfrotor = (mp->m_agfrotor + mp->m_agskip)
> +			% mp->m_sb.sb_agcount;
> +		spin_unlock(&mp->m_agfrotor_lock);
> +	} else if (mp->m_flags & XFS_MOUNT_32BITINODES) {
>  		rotorstep = xfs_rotorstep;
>  		startag = (mp->m_agfrotor / rotorstep) % mp->m_sb.sb_agcount;
>  		mp->m_agfrotor = (mp->m_agfrotor + 1) %
> Index: b/fs/xfs/xfs_mount.c
> ===================================================================
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -768,6 +768,7 @@ STATIC void
>  xfs_mount_common(xfs_mount_t *mp, xfs_sb_t *sbp)
>  {
>  	mp->m_agfrotor = mp->m_agirotor = 0;
> +	spin_lock_init(&mp->m_agfrotor_lock);
>  	spin_lock_init(&mp->m_agirotor_lock);
>  	mp->m_maxagi = mp->m_sb.sb_agcount;
>  	mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
> Index: b/fs/xfs/xfs_mount.h
> ===================================================================
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -119,6 +119,7 @@ typedef struct xfs_mount {
>  	char			*m_logname;	/* external log device name */
>  	int			m_bsize;	/* fs logical block size */
>  	xfs_agnumber_t		m_agfrotor;	/* last ag where space found */
> +	spinlock_t		m_agfrotor_lock;/* .. and lock protecting it */
>  	xfs_agnumber_t		m_agirotor;	/* last ag dir inode alloced */
>  	spinlock_t		m_agirotor_lock;/* .. and lock protecting it */
>  	xfs_agnumber_t		m_maxagi;	/* highest inode alloc group */
> @@ -203,6 +204,7 @@ typedef struct xfs_mount {
>  	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
>  	int64_t			m_low_space[XFS_LOWSP_MAX];
>  						/* low free space thresholds */
> +	int			m_agskip;	/* extent allocation stride */

Not sure I grok that comment.  But then we don't have a real description yet
anyway - just be sure it matches the described behavior, I guess?

>  
>  	struct workqueue_struct	*m_data_workqueue;
>  	struct workqueue_struct	*m_unwritten_workqueue;
> @@ -246,7 +248,7 @@ typedef struct xfs_mount {
>  #define XFS_MOUNT_FILESTREAMS	(1ULL << 24)	/* enable the filestreams
>  						   allocator */
>  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
> -
> +#define XFS_MOUNT_AGSKIP	(1ULL << 26)	/* extent allocation stride */
>  
>  /*
>   * Default minimum read and write sizes.
> Index: b/fs/xfs/xfs_super.c
> ===================================================================
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -114,7 +114,7 @@ mempool_t *xfs_ioend_pool;
>  #define MNTOPT_NODELAYLOG  "nodelaylog"	/* Delayed logging disabled */
>  #define MNTOPT_DISCARD	   "discard"	/* Discard unused blocks */
>  #define MNTOPT_NODISCARD   "nodiscard"	/* Do not discard unused blocks */
> -
> +#define MNTOPT_AGSKIP      "agskip"	/* initial extent allocation stride */

tabs please

>  /*
>   * Table driven mount option parser.
>   *
> @@ -383,6 +383,15 @@ xfs_parseargs(
>  		} else if (!strcmp(this_char, "irixsgid")) {
>  			xfs_warn(mp,
>  	"irixsgid is now a sysctl(2) variable, option is deprecated.");
> +		} else if (!strcmp(this_char, MNTOPT_AGSKIP)) {
> +			if (!value || !*value) {
> +				xfs_warn(mp,
> +					"%s option requires an argument",
> +					this_char);
> +				return EINVAL;
> +			}
> +			mp->m_flags |= XFS_MOUNT_AGSKIP;
> +			mp->m_agskip = simple_strtoul(value, &eov, 10);
>  		} else {
>  			xfs_warn(mp, "unknown mount option [%s].", this_char);
>  			return EINVAL;
> @@ -567,6 +576,8 @@ xfs_showargs(
>  
>  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
>  		seq_puts(m, "," MNTOPT_NOQUOTA);
> +	if (mp->m_flags & XFS_MOUNT_AGSKIP)
> +		seq_printf(m, "," MNTOPT_AGSKIP "=%d", mp->m_agskip);
>  
>  	return 0;
>  }
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-29 15:39 ` [PATCH 1/3] xfs: add " rjohnston
  2013-01-29 20:45   ` Eric Sandeen
  2013-01-29 20:57   ` Eric Sandeen
@ 2013-01-30  1:04   ` Dave Chinner
  2013-01-30 23:32     ` Ben Myers
  2 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-01-30  1:04 UTC (permalink / raw)
  To: rjohnston; +Cc: xfs

On Tue, Jan 29, 2013 at 09:39:15AM -0600, rjohnston@sgi.com wrote:
> The agskip mount option specifies  the allocation group, (AG) for a new
> file, relative to the start of the last created file. agskip has the
> opposite effect of the rotorstep  system tunable  parameter.   Each
> new  file  to  be placed in the location lastAG + agskipValue,
> where lastAG is the allocation group of the last created file.
> 
> For example, agskip=3 means each new file will be  allocated  three  AGs  away
> from the starting AG of the most recently created file.

What problem is this intended to solve? What are the limits to useful
values of the mount option for solving that problem?

> The agskip mount option disables the rotorstep system tunable parameter.

I think that's wrong. The both can co-exist, and I can see use cases
where you'd actually want both to be active on inode32 systems.
Specifying the size of the skip between each "rotorstep" should not
disable the number of files in each rotorstep....

> Signed-off-by: Rich Johnston <rjohnston@sgi.com> 
> ---
>  fs/xfs/xfs_alloc.c      |   35 ++++++++++++++++++++++++++++-------
>  fs/xfs/xfs_filestream.c |    8 +++++++-
>  fs/xfs/xfs_mount.c      |    1 +
>  fs/xfs/xfs_mount.h      |    4 +++-
>  fs/xfs/xfs_super.c      |   13 ++++++++++++-
>  5 files changed, 51 insertions(+), 10 deletions(-)
> 
> Index: b/fs/xfs/xfs_alloc.c
> ===================================================================
> --- a/fs/xfs/xfs_alloc.c
> +++ b/fs/xfs/xfs_alloc.c
> @@ -2365,12 +2365,20 @@ xfs_alloc_vextent(
>  		 * Try near allocation first, then anywhere-in-ag after
>  		 * the first a.g. fails.
>  		 */
> -		if ((args->userdata  == XFS_ALLOC_INITIAL_USER_DATA) &&
> -		    (mp->m_flags & XFS_MOUNT_32BITINODES)) {
> -			args->fsbno = XFS_AGB_TO_FSB(mp,
> -					((mp->m_agfrotor / rotorstep) %
> -					mp->m_sb.sb_agcount), 0);
> -			bump_rotor = 1;
> +		if (args->userdata == XFS_ALLOC_INITIAL_USER_DATA) {
> +			if (mp->m_flags & XFS_MOUNT_AGSKIP) {
> +				spin_lock(&mp->m_agfrotor_lock);
> +				args->fsbno = XFS_AGB_TO_FSB(mp,
> +					mp->m_agfrotor, 0);
> +				mp->m_agfrotor = (mp->m_agfrotor + mp->m_agskip)
> +					% mp->m_sb.sb_agcount;
> +				spin_unlock(&mp->m_agfrotor_lock);

Why are you changing the inode64 allocator behaviour like this? The
whole point of inode64 is that it allocates data close to the owner
inode and this breaks that association. Perhaps this modification to
the initial data allocation should still only be applied to the
inode32 allocator.

Indeed, if we want the inode64 allocator to have this behaviour,
then we should be applying the agskip parameter to the inode
allocator, not the data extent allocator. i.e to the agirotor in
xfs_ialloc_next_ag(). This will retain the data/metadata locality
properties of the inode64 allocator, yet still give the benefits of
an agskip allocation parameter to move the locality of entire
directories.

....

> @@ -2385,8 +2393,21 @@ xfs_alloc_vextent(
>  			/*
>  			 * Start with the last place we left off.
>  			 */
> -			args->agno = sagno = (mp->m_agfrotor / rotorstep) %
> +                        if (mp->m_flags & XFS_MOUNT_AGSKIP) {
> +				/*
> +				 * The spinlock makes the combined assignment
> +				 * of args->fsbno and mp->m_agfrotor atomic.
> +				 * mp->m_agfrotor can not be advanced until
> +				 * args->fsbno is assigned.
> +				*/
> +                                spin_lock(&mp->m_agfrotor_lock);
> +                                sagno = mp->m_agfrotor;
> +                                spin_unlock(&mp->m_agfrotor_lock);

There's whitespace damage there, and the comment doesn't make any
sense. Reading mp->m_agfrotor will always be an atomic access.

Also, there's nothing to stop you getting the "wrong" rotor value
here. i.e.

	thread 1	thread 2
	lock
	calc agfrotor
	unlock
			lock
			calc agfrotor
			unlock
	lock
	sagno = agfrotor
	unlock
			lock
			sagno = agfrotor
			unlock

And hence both allocations end up with the same start AG at twice
the distance of the skip value. I doubt this is what you intend.

....

> @@ -622,7 +622,13 @@ xfs_filestream_associate(
>  	 * Set the starting AG using the rotor for inode32, otherwise
>  	 * use the directory inode's AG.
>  	 */
> -	if (mp->m_flags & XFS_MOUNT_32BITINODES) {
> +	if (mp->m_flags & XFS_MOUNT_AGSKIP) {
> +		spin_lock(&mp->m_agfrotor_lock);
> +		startag = mp->m_agfrotor;
> +		mp->m_agfrotor = (mp->m_agfrotor + mp->m_agskip)
> +			% mp->m_sb.sb_agcount;
> +		spin_unlock(&mp->m_agfrotor_lock);
> +	} else if (mp->m_flags & XFS_MOUNT_32BITINODES) {
>  		rotorstep = xfs_rotorstep;
>  		startag = (mp->m_agfrotor / rotorstep) % mp->m_sb.sb_agcount;
>  		mp->m_agfrotor = (mp->m_agfrotor + 1) %

Please encapsulate all this agf rotor stuff in helper functions so
we don't duplicate logic all over the place...

....
> +		} else if (!strcmp(this_char, MNTOPT_AGSKIP)) {
> +			if (!value || !*value) {
> +				xfs_warn(mp,
> +					"%s option requires an argument",
> +					this_char);
> +				return EINVAL;
> +			}
> +			mp->m_flags |= XFS_MOUNT_AGSKIP;
> +			mp->m_agskip = simple_strtoul(value, &eov, 10);
>  		} else {

Hmmm. You don't validate whether the agskip option is valid for the
current filesystem configuration. What is a valid value for an
arbitrary configuration, and how can the filesystem validate it is
sensible?

>  			xfs_warn(mp, "unknown mount option [%s].", this_char);
>  			return EINVAL;
> @@ -567,6 +576,8 @@ xfs_showargs(
>  
>  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
>  		seq_puts(m, "," MNTOPT_NOQUOTA);
> +	if (mp->m_flags & XFS_MOUNT_AGSKIP)
> +		seq_printf(m, "," MNTOPT_AGSKIP "=%d", mp->m_agskip);
>  
>  	return 0;
>  }

Overall, I'm wondering if this is the right way to approach this
problem. It only really works correctly (in terms of distribution of
files/metadata) for fixed volume sizes (i.e. homogenous layouts) -
the common case where a skip is useful is after growing a filesystem
onto a new volume. It's rare that the new volume is the same as the
existing volumes, so it's hard to set a skip value that reliably
alternates between old and new volumes.

We talked about this allocation distribution problem last march when
we met at LSF, and I thought we agreed that pushing
agskip/agrotorstep mount options upstream was not the way we were
going to solve this problem after I outlined how I planned to solve
this problem. Indeed, I already have prototype patches for the AG control ioctls
that allow the xfs_spaceman program that associate an AG with an
"allocation zone" (an "AZ") and partially written kernel
patches that implement AZs, per-AZ agi/agf rotors and higher level AZ
iteration in the allocator.

(It also allows specification of per-AZ stripe unit/stripe width and
other allocation control parameters but those details are little
outside the scope of this particular discussion.)

IOWs, I'm close to having a significantly more capable and flexible
solution to this problem that doesn't require new mount options to
be added or on-disk format changes to be made. As such, I'm not sure
that we want to introduce a mount option that is only a partial
solution to the problem. We basically can't remove mount options
once they have been added, so introducing a new mount option just to
mark it deprecated in 6 months time seems like a bad idea....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-30  1:04   ` Dave Chinner
@ 2013-01-30 23:32     ` Ben Myers
  2013-01-31  6:13       ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Myers @ 2013-01-30 23:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: rjohnston, xfs

Hey Dave,

On Wed, Jan 30, 2013 at 12:04:30PM +1100, Dave Chinner wrote:
> On Tue, Jan 29, 2013 at 09:39:15AM -0600, rjohnston@sgi.com wrote:
> > The agskip mount option specifies  the allocation group, (AG) for a new
> > file, relative to the start of the last created file. agskip has the
> > opposite effect of the rotorstep  system tunable  parameter.   Each
> > new  file  to  be placed in the location lastAG + agskipValue,
> > where lastAG is the allocation group of the last created file.
> > 
> > For example, agskip=3 means each new file will be  allocated  three  AGs  away
> > from the starting AG of the most recently created file.
>
> Overall, I'm wondering if this is the right way to approach this
> problem.

We'll have to make sure we all understand the problem we're trying to solve
with this before going too far.

> It only really works correctly (in terms of distribution of
> files/metadata) for fixed volume sizes (i.e. homogenous layouts) -
> the common case where a skip is useful is after growing a filesystem
> onto a new volume. It's rare that the new volume is the same as the
> existing volumes, so it's hard to set a skip value that reliably
> alternates between old and new volumes.

Based upon what I've read so far on the internal bug when this was introduced,
this is more about being able to utilize all allocation groups in a filesystem
with many concats.  It's not so much related to balance after growing a
filesystem (which is another interesting problem).  The info should be added to
this series and be reposted.

> We talked about this allocation distribution problem last march when
> we met at LSF, and I thought we agreed that pushing
> agskip/agrotorstep mount options upstream was not the way we were
> going to solve this problem after I outlined how I planned to solve
> this problem.

If we can come up with something better, that's great.  But AFAICT the problem
still needs to be addressed.  This is just one way to do it.

> Indeed, I already have prototype patches for the AG control ioctls
> that allow the xfs_spaceman program that associate an AG with an
> "allocation zone" (an "AZ") and partially written kernel
> patches that implement AZs, per-AZ agi/agf rotors and higher level AZ
> iteration in the allocator.
> 
> (It also allows specification of per-AZ stripe unit/stripe width and
> other allocation control parameters but those details are little
> outside the scope of this particular discussion.)

Cool.

> IOWs, I'm close to having a significantly more capable and flexible
> solution to this problem that doesn't require new mount options to
> be added or on-disk format changes to be made. As such, I'm not sure
> that we want to introduce a mount option that is only a partial
> solution to the problem. We basically can't remove mount options
> once they have been added, so introducing a new mount option just to
> mark it deprecated in 6 months time seems like a bad idea....

I'm looking forward to seeing it.  ;)

In the meantime we'll get a conversation started as to how to resolve this
particular problem.... once we have a better understanding of it.

Regards,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-30 23:32     ` Ben Myers
@ 2013-01-31  6:13       ` Dave Chinner
  2013-01-31 20:24         ` Rich Johnston
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-01-31  6:13 UTC (permalink / raw)
  To: Ben Myers; +Cc: rjohnston, xfs

On Wed, Jan 30, 2013 at 05:32:03PM -0600, Ben Myers wrote:
> Hey Dave,
> 
> On Wed, Jan 30, 2013 at 12:04:30PM +1100, Dave Chinner wrote:
> > On Tue, Jan 29, 2013 at 09:39:15AM -0600, rjohnston@sgi.com wrote:
> > > The agskip mount option specifies  the allocation group, (AG) for a new
> > > file, relative to the start of the last created file. agskip has the
> > > opposite effect of the rotorstep  system tunable  parameter.   Each
> > > new  file  to  be placed in the location lastAG + agskipValue,
> > > where lastAG is the allocation group of the last created file.
> > > 
> > > For example, agskip=3 means each new file will be  allocated  three  AGs  away
> > > from the starting AG of the most recently created file.
> >
> > Overall, I'm wondering if this is the right way to approach this
> > problem.
> 
> We'll have to make sure we all understand the problem we're trying to solve
> with this before going too far.

I'm in no doubt about what it is for - I know the exact history of
this patch and exactly what problems it was designed to solve
because....

> > It only really works correctly (in terms of distribution of
> > files/metadata) for fixed volume sizes (i.e. homogenous layouts) -
> > the common case where a skip is useful is after growing a filesystem
> > onto a new volume. It's rare that the new volume is the same as the
> > existing volumes, so it's hard to set a skip value that reliably
> > alternates between old and new volumes.
> 
> Based upon what I've read so far on the internal bug when this was introduced,
> this is more about being able to utilize all allocation groups in a filesystem
> with many concats.

... I was still at SGI when that bug was raised and the ag_skip
patch was written for the "Enhanced XFS" module in the NAS product
SGI was selling at the time. It was written as a quick stopgap by
the NAS product engineers to counter the problems being seen on that
product due to the nature of the "concat of stripes" storage
configuration that product used.

It was never was proposed as an upstream solution because I NACKed
it internally.  Indeed, at the time I was already well down the path
of fixing the problem in XFS in a much more capable way. i.e.  this
stuff:

http://oss.sgi.com/archives/xfs/2009-02/msg00250.html
http://oss.sgi.com/archives/xfs/2009-02/msg00253.html

And that was planned to replace the agskip hack in the next NAS
product release. Unfortunately, once I left SGI nobody picked up the
work I was doing and "Enhanced XFS" turned into a zombie.  Indeed,
agskip was posted back here in 2009 as part of the same code dump as
the above patches when the XFS group in Melbourne was let go:

http://oss.sgi.com/archives/xfs/2009-02/msg00252.html

There's a bit of history to this patch ;)

> It's not so much related to balance after growing a
> filesystem (which is another interesting problem).  The info should be added to
> this series and be reposted.

Actually, that was one of the problems the patch solved on the NAS
product. It was a secondary problem as growing wasn't a common
operation, but it was definitely a concern....

> > We talked about this allocation distribution problem last march when
> > we met at LSF, and I thought we agreed that pushing
> > agskip/agrotorstep mount options upstream was not the way we were
> > going to solve this problem after I outlined how I planned to solve
> > this problem.
> 
> If we can come up with something better, that's great.  But AFAICT the problem
> still needs to be addressed.  This is just one way to do it.

I'm not saynig that it doesn't need to be addressed. I'm just saying
the sam ething I said 5 years ago: there's no point pushing it into
mainline when far more comprehensive solution is just around the
corner....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-31  6:13       ` Dave Chinner
@ 2013-01-31 20:24         ` Rich Johnston
  2013-02-04 13:18           ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Johnston @ 2013-01-31 20:24 UTC (permalink / raw)
  To: Ben Myers, xfs

Hey Dave,

On 01/31/2013 12:13 AM, Dave Chinner wrote:
> On Wed, Jan 30, 2013 at 05:32:03PM -0600, Ben Myers wrote:
>> Hey Dave,
>>
>> On Wed, Jan 30, 2013 at 12:04:30PM +1100, Dave Chinner wrote:
>>> On Tue, Jan 29, 2013 at 09:39:15AM -0600, rjohnston@sgi.com wrote:
>>>> The agskip mount option specifies  the allocation group, (AG) for a new
>>>> file, relative to the start of the last created file. agskip has the
>>>> opposite effect of the rotorstep  system tunable  parameter.   Each
>>>> new  file  to  be placed in the location lastAG + agskipValue,
>>>> where lastAG is the allocation group of the last created file.
>>>>
>>>> For example, agskip=3 means each new file will be  allocated  three  AGs  away
>>>> from the starting AG of the most recently created file.
>>>
>>> Overall, I'm wondering if this is the right way to approach this
>>> problem.
>>
>> We'll have to make sure we all understand the problem we're trying to solve
>> with this before going too far.
>
> I'm in no doubt about what it is for - I know the exact history of
> this patch and exactly what problems it was designed to solve
> because....
>
>>> It only really works correctly (in terms of distribution of
>>> files/metadata) for fixed volume sizes (i.e. homogenous layouts) -
>>> the common case where a skip is useful is after growing a filesystem
>>> onto a new volume. It's rare that the new volume is the same as the
>>> existing volumes, so it's hard to set a skip value that reliably
>>> alternates between old and new volumes.
>>
>> Based upon what I've read so far on the internal bug when this was introduced,
>> this is more about being able to utilize all allocation groups in a filesystem
>> with many concats.
>
> ... I was still at SGI when that bug was raised and the ag_skip
> patch was written for the "Enhanced XFS" module in the NAS product
> SGI was selling at the time. It was written as a quick stopgap by
> the NAS product engineers to counter the problems being seen on that
> product due to the nature of the "concat of stripes" storage
> configuration that product used.
>
> It was never was proposed as an upstream solution because I NACKed
> it internally.  Indeed, at the time I was already well down the path
> of fixing the problem in XFS in a much more capable way. i.e.  this
> stuff:
>

I did not see any references to the patchset you referenced below when I 
was  working on submitting this patchset. Thanks for pointing it out.

> http://oss.sgi.com/archives/xfs/2009-02/msg00250.html

The patchset (pluggable allocation policies) above looks very promising 
and I would like to port it to top of tree and use it instead of my 
agskip proposal.  Are there any changes to this patchset we should 
discuss before I start.

Thanks
--Rich

> http://oss.sgi.com/archives/xfs/2009-02/msg00253.html
>
> And that was planned to replace the agskip hack in the next NAS
> product release. Unfortunately, once I left SGI nobody picked up the
> work I was doing and "Enhanced XFS" turned into a zombie.  Indeed,
> agskip was posted back here in 2009 as part of the same code dump as
> the above patches when the XFS group in Melbourne was let go:
>
> http://oss.sgi.com/archives/xfs/2009-02/msg00252.html
>
> There's a bit of history to this patch ;)
>
>> It's not so much related to balance after growing a
>> filesystem (which is another interesting problem).  The info should be added to
>> this series and be reposted.
>
> Actually, that was one of the problems the patch solved on the NAS
> product. It was a secondary problem as growing wasn't a common
> operation, but it was definitely a concern....
>
>>> We talked about this allocation distribution problem last march when
>>> we met at LSF, and I thought we agreed that pushing
>>> agskip/agrotorstep mount options upstream was not the way we were
>>> going to solve this problem after I outlined how I planned to solve
>>> this problem.
>>
>> If we can come up with something better, that's great.  But AFAICT the problem
>> still needs to be addressed.  This is just one way to do it.
>
> I'm not saynig that it doesn't need to be addressed. I'm just saying
> the sam ething I said 5 years ago: there's no point pushing it into
> mainline when far more comprehensive solution is just around the
> corner....
>
> Cheers,
>
> Dave.
>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] xfs: add agskip=value mount option
  2013-01-31 20:24         ` Rich Johnston
@ 2013-02-04 13:18           ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2013-02-04 13:18 UTC (permalink / raw)
  To: Rich Johnston; +Cc: Ben Myers, xfs

On Thu, Jan 31, 2013 at 02:24:02PM -0600, Rich Johnston wrote:
> On 01/31/2013 12:13 AM, Dave Chinner wrote:
> >On Wed, Jan 30, 2013 at 05:32:03PM -0600, Ben Myers wrote:
> >>Hey Dave,
> >>
> >>On Wed, Jan 30, 2013 at 12:04:30PM +1100, Dave Chinner wrote:
> >>>On Tue, Jan 29, 2013 at 09:39:15AM -0600, rjohnston@sgi.com wrote:
> >>>>The agskip mount option specifies  the allocation group, (AG) for a new
> >>>>file, relative to the start of the last created file. agskip has the
> >>>>opposite effect of the rotorstep  system tunable  parameter.   Each
> >>>>new  file  to  be placed in the location lastAG + agskipValue,
> >>>>where lastAG is the allocation group of the last created file.
> >>>>
> >>>>For example, agskip=3 means each new file will be  allocated  three  AGs  away
> >>>>from the starting AG of the most recently created file.
> >>>
> >>>Overall, I'm wondering if this is the right way to approach this
> >>>problem.
> 
> I did not see any references to the patchset you referenced below
> when I was  working on submitting this patchset. Thanks for pointing
> it out.
> 
> >http://oss.sgi.com/archives/xfs/2009-02/msg00250.html
> 
> The patchset (pluggable allocation policies) above looks very
> promising and I would like to port it to top of tree and use it
> instead of my agskip proposal.  Are there any changes to this
> patchset we should discuss before I start.

Well, apart from the fact you'll probably have to rewrite parts of
it from the ground up because the xfs_bmap.c code and inode
allocation code has been significantly factored since those patches
were written. Porting them, IMO, is mostly not an option because the
chance of missing bug fixes and enhancements from the past 5 years
is too great.

Also, the patchset was still at the proof of concept stage, and some
of the stuff may not be supportable. e.g. the policy templates and
multiple instantiation code. They were wild ideas I was trying to
see if I could get to work, so they may well be something we don't
really want to try to do.

Also, the patch set is really an encoding of work in progress, not a
finished and refined patch set. Several of the patches rework the
same code over and over again and, for example, the on-disk
representation of the allocation policy is changed at least once
during the series. The on-disk format should be defined once during
a mergable patch series and not changed at all, so this shows that
the patch set is really a line of research rather than a
ready-for-production piece of work.

Further, the userspace interfaces will need to be completely
rethought. Looking at the per-parameter ioctl interface, it's pretty
nasty and convoluted probably isn't eaily usable - i never wrote any
userspace code for it, so it's just an encoding of an idea more than
anything.

Finally, it doesn't have any of the storage configuration
information encoded into it (eg. for a generic ag_skip like
solution) and no interface to add that information.  That was work
that was going to build on this policy interface, but now I think it
is completely orthogonal to the policy interface as it has
completely separate uses (like defining failure domains).

Right now I think that it is better to add the storage layout
information to the filesystem, then worry about how generic
allocation policies can use that information. However, doing an
initial abstraction of the inode and data allocation code (patches
3-7) configured via the existing mount options and making stuff like
the "everything inherits from the parent inode" functionality and
the inode64->inode32 transitions during growfs work correctly would
be a good set of functionality that coul dbe pulled mostly unchanged
from the patch set. That abstraction is relatively straight
forward and lays the foundation for more complex allocation policies
to be implemented, but doesn't go as far as some of the semi-insane
ideas I was trying out later in the series...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-02-04 13:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-29 15:39 [PATCH 0/3] Add agskip=value mount option rjohnston
2013-01-29 15:39 ` [PATCH 1/3] xfs: add " rjohnston
2013-01-29 20:45   ` Eric Sandeen
2013-01-29 20:57   ` Eric Sandeen
2013-01-30  1:04   ` Dave Chinner
2013-01-30 23:32     ` Ben Myers
2013-01-31  6:13       ` Dave Chinner
2013-01-31 20:24         ` Rich Johnston
2013-02-04 13:18           ` Dave Chinner
2013-01-29 15:39 ` [PATCH 2/3] xfstests: add test for agskip " rjohnston
2013-01-29 15:39 ` [PATCH 3/3] --- sys-utils/mount.8 | 12 ++++++++++++ 1 file changed, 12 insertions(+) rjohnston
2013-01-29 17:59 ` [PATCH 0/3] Add agskip=value mount option Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox