All of lore.kernel.org
 help / color / mirror / Atom feed
* [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning
@ 2010-04-06  1:17 Mark Fasheh
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: clean up localalloc mount option size parsing Mark Fasheh
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Mark Fasheh @ 2010-04-06  1:17 UTC (permalink / raw)
  To: ocfs2-devel

The following patches apply some lessons I've learned during some testing of
a couple non-trivial workloads on Ocfs2.  Roughly speaking, they tune "up"
the default local alloc windowxs, and tune "down" the reservations code a
bit.

I primarily ran two tests - one simulating a large multi-node data-processing
workload, and the multi-threaded writers test, mtwrite at:

http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/tests/mtwrite.c

All patches were developed and tested on top of the current set of
allocation changes in ocfs2.git (primarily reservations, and the local alloc
fixes we have).  A git branch containing all the patches is available, based
off 2.6.33:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/ocfs2-mark.git disk-alloc

Basically though, this all involved many many test runs (of which I have
lots of data) and a thorough checking of fragmentation levels.  I changed
only one value at a time. Details regarding some of the test runs are within
the specific patch descriptions.
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH 1/4] ocfs2: clean up localalloc mount option size parsing
  2010-04-06  1:17 [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Mark Fasheh
@ 2010-04-06  1:17 ` Mark Fasheh
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 2/4] ocfs2: increase the default size of local alloc windows Mark Fasheh
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Fasheh @ 2010-04-06  1:17 UTC (permalink / raw)
  To: ocfs2-devel

This patch pulls the local alloc sizing code into localalloc.c and provides
a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged
except that I correctly calculate the maximum local alloc size. The old code
in ocfs2_parse_options() calculated the max size as:

ocfs2_local_alloc_size(sb) * 8

which is correct, in bits. Unfortunately though the option passed in is in
megabytes. Ultimately, this bug made no real difference - the shrink code
would catch a too-large size and bring it down to something reasonable.
Still, it's less than efficient as-is.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/localalloc.c |   28 ++++++++++++++++++++++++++++
 fs/ocfs2/localalloc.h |    2 ++
 fs/ocfs2/ocfs2.h      |    6 ++++++
 fs/ocfs2/super.c      |   10 +++++-----
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 18fc176..9dc34a6 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -75,6 +75,34 @@ static int ocfs2_local_alloc_new_window(struct ocfs2_super *osb,
 static int ocfs2_local_alloc_slide_window(struct ocfs2_super *osb,
 					  struct inode *local_alloc_inode);
 
+void ocfs2_la_set_sizes(struct ocfs2_super *osb, int requested_mb)
+{
+	struct super_block *sb = osb->sb;
+	unsigned int la_default_mb = OCFS2_DEFAULT_LOCAL_ALLOC_SIZE;
+	unsigned int la_max_mb;
+
+	la_max_mb = ocfs2_clusters_to_megabytes(sb,
+						ocfs2_local_alloc_size(sb) * 8);
+
+	mlog(0, "requested: %dM, max: %uM, default: %uM\n",
+	     requested_mb, la_max_mb, la_default_mb);
+
+	if (requested_mb == -1) {
+		/* No user request - use defaults */
+		osb->local_alloc_default_bits =
+			ocfs2_megabytes_to_clusters(sb, la_default_mb);
+	} else if (requested_mb > la_max_mb) {
+		/* Request is too big, we give the maximum available */
+		osb->local_alloc_default_bits =
+			ocfs2_megabytes_to_clusters(sb, la_max_mb);
+	} else {
+		osb->local_alloc_default_bits =
+			ocfs2_megabytes_to_clusters(sb, requested_mb);
+	}
+
+	osb->local_alloc_bits = osb->local_alloc_default_bits;
+}
+
 static inline int ocfs2_la_state_enabled(struct ocfs2_super *osb)
 {
 	return (osb->local_alloc_state == OCFS2_LA_THROTTLED ||
diff --git a/fs/ocfs2/localalloc.h b/fs/ocfs2/localalloc.h
index ac5ea9f..04195c6 100644
--- a/fs/ocfs2/localalloc.h
+++ b/fs/ocfs2/localalloc.h
@@ -30,6 +30,8 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb);
 
 void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb);
 
+void ocfs2_la_set_sizes(struct ocfs2_super *osb, int requested_mb);
+
 int ocfs2_begin_local_alloc_recovery(struct ocfs2_super *osb,
 				     int node_num,
 				     struct ocfs2_dinode **alloc_copy);
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 4a6f00e..44b95a9 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -765,6 +765,12 @@ static inline unsigned int ocfs2_megabytes_to_clusters(struct super_block *sb,
 	return megs << (20 - OCFS2_SB(sb)->s_clustersize_bits);
 }
 
+static inline unsigned int ocfs2_clusters_to_megabytes(struct super_block *sb,
+						       unsigned int clusters)
+{
+	return clusters >> (20 - OCFS2_SB(sb)->s_clustersize_bits);
+}
+
 static inline void ocfs2_init_inode_steal_slot(struct ocfs2_super *osb)
 {
 	spin_lock(&osb->osb_lock);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index db354d1..6dfcb4e 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -93,7 +93,7 @@ struct mount_options
 	unsigned long	mount_opt;
 	unsigned int	atime_quantum;
 	signed short	slot;
-	unsigned int	localalloc_opt;
+	int		localalloc_opt;
 	unsigned int	resv_level;
 	char		cluster_stack[OCFS2_STACK_LABEL_LEN + 1];
 };
@@ -1027,8 +1027,8 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 	osb->s_atime_quantum = parsed_options.atime_quantum;
 	osb->preferred_slot = parsed_options.slot;
 	osb->osb_commit_interval = parsed_options.commit_interval;
-	osb->local_alloc_default_bits = ocfs2_megabytes_to_clusters(sb, parsed_options.localalloc_opt);
-	osb->local_alloc_bits = osb->local_alloc_default_bits;
+
+	ocfs2_la_set_sizes(osb, parsed_options.localalloc_opt);
 	osb->osb_resv_level = parsed_options.resv_level;
 
 	status = ocfs2_verify_userspace_stack(osb, &parsed_options);
@@ -1288,7 +1288,7 @@ static int ocfs2_parse_options(struct super_block *sb,
 	mopt->mount_opt = 0;
 	mopt->atime_quantum = OCFS2_DEFAULT_ATIME_QUANTUM;
 	mopt->slot = OCFS2_INVALID_SLOT;
-	mopt->localalloc_opt = OCFS2_DEFAULT_LOCAL_ALLOC_SIZE;
+	mopt->localalloc_opt = -1;
 	mopt->cluster_stack[0] = '\0';
 	mopt->resv_level = OCFS2_DEFAULT_RESV_LEVEL;
 
@@ -1381,7 +1381,7 @@ static int ocfs2_parse_options(struct super_block *sb,
 				status = 0;
 				goto bail;
 			}
-			if (option >= 0 && (option <= ocfs2_local_alloc_size(sb) * 8))
+			if (option >= 0)
 				mopt->localalloc_opt = option;
 			break;
 		case Opt_localflocks:
-- 
1.6.4.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH 2/4] ocfs2: increase the default size of local alloc windows
  2010-04-06  1:17 [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Mark Fasheh
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: clean up localalloc mount option size parsing Mark Fasheh
@ 2010-04-06  1:17 ` Mark Fasheh
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: change default reservation window sizes Mark Fasheh
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Fasheh @ 2010-04-06  1:17 UTC (permalink / raw)
  To: ocfs2-devel

I have observed that the current size of 8M gives us pretty poor
fragmentation on multi-threaded workloads which do lots of writes.

Generally, I can increase the size of local alloc windows and observe a
marked decrease in fragmentation, even up and beyond window sizes of 512
megabytes. This makes sense for a couple reasons - larger local alloc means
more room for reservation windows. On multi-node workloads the larger local
alloc helps as well because we don't have to do window slides as often.

Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no
longer used and the comment above it was out of date.

To test fragmentation, I used a workload which launched 4 threads that did
4k writes into a series of about 140 alternating files.

With resv_level=2, and a 4k/4k file system I observed the following average
fragmentation for various localalloc= parameters:

localalloc=	avg. fragmentation
	8		48
	32		16
	64		10
	120		7

On larger cluster sizes, the difference is more dramatic.

The new default size top out at 256M, which we'll only get for cluster
sizes of 32K and above.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/localalloc.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/ocfs2/localalloc.h |    1 +
 fs/ocfs2/ocfs2.h      |    3 +
 fs/ocfs2/ocfs2_fs.h   |    8 ---
 fs/ocfs2/super.c      |    3 +-
 5 files changed, 118 insertions(+), 11 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 9dc34a6..9b70a02 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -75,10 +75,120 @@ static int ocfs2_local_alloc_new_window(struct ocfs2_super *osb,
 static int ocfs2_local_alloc_slide_window(struct ocfs2_super *osb,
 					  struct inode *local_alloc_inode);
 
+/*
+ * ocfs2_la_default_mb() - determine a default size, in megabytes of
+ * the local alloc.
+ *
+ * Generally, we'd like to pick as large a local alloc as
+ * possible. Performance on large workloads tends to scale
+ * proportionally to la size. In addition to that, the reservations
+ * code functions more efficiently as it can reserve more windows for
+ * write.
+ *
+ * Some things work against us when trying to choose a large local alloc:
+ *
+ * - We need to ensure our sizing is picked to leave enough space in
+ *   group descriptors for other allocations (such as block groups,
+ *   etc). Picking default sizes which are a multiple of 4 could help
+ *   - block groups are allocated in 2mb and 4mb chunks.
+ *
+ * - Likewise, we don't want to starve other nodes of bits on small
+ *   file systems. This can easily be taken care of by limiting our
+ *   default to a reasonable size (256M) on larger cluster sizes.
+ *
+ * - Some file systems can't support very large sizes - 4k and 8k in
+ *   particular are limited to less than 128 and 256 megabytes respectively.
+ *
+ * The following reference table shows group descriptor and local
+ * alloc maximums at various cluster sizes (4k blocksize)
+ *
+ * csize: 4K	group: 126M	la: 121M
+ * csize: 8K	group: 252M	la: 243M
+ * csize: 16K	group: 504M	la: 486M
+ * csize: 32K	group: 1008M	la: 972M
+ * csize: 64K	group: 2016M	la: 1944M
+ * csize: 128K	group: 4032M	la: 3888M
+ * csize: 256K	group: 8064M	la: 7776M
+ * csize: 512K	group: 16128M	la: 15552M
+ * csize: 1024K	group: 32256M	la: 31104M
+ */
+#define	OCFS2_LA_MAX_DEFAULT_MB	256
+#define	OCFS2_LA_OLD_DEFAULT	8
+unsigned int ocfs2_la_default_mb(struct ocfs2_super *osb)
+{
+	unsigned int la_mb;
+	unsigned int gd_mb;
+	unsigned int megs_per_slot;
+	struct super_block *sb = osb->sb;
+
+	gd_mb = ocfs2_clusters_to_megabytes(osb->sb,
+					    8 * ocfs2_group_bitmap_size(sb));
+
+	/*
+	 * This takes care of files systems with very small group
+	 * descriptors - 512 byte blocksize at cluster sizes lower
+	 * than 16K and also 1k blocksize with 4k cluster size.
+	 */
+	if ((sb->s_blocksize == 512 && osb->s_clustersize <= 8192)
+	    || (sb->s_blocksize == 1024 && osb->s_clustersize == 4096))
+		return OCFS2_LA_OLD_DEFAULT;
+
+	/*
+	 * Leave enough room for some block groups and make the final
+	 * value we work from a multiple of 4.
+	 */
+	gd_mb -= 16;
+	gd_mb &= 0xFFFFFFFB;
+
+	la_mb = gd_mb;
+
+	/*
+	 * Keep window sizes down to a reasonable default
+	 */
+	if (la_mb > OCFS2_LA_MAX_DEFAULT_MB) {
+		/*
+		 * Some clustersize / blocksize combinations will have
+		 * given us a larger than OCFS2_LA_MAX_DEFAULT_MB
+		 * default size, but get poor distribution when
+		 * limited to exactly 256 megabytes.
+		 *
+		 * As an example, 16K clustersize at 4K blocksize
+		 * gives us a cluster group size of 504M. Paring the
+		 * local alloc size down to 256 however, would give us
+		 * only one window and around 200MB left in the
+		 * cluster group. Instead, find the first size below
+		 * 256 which would give us an even distribution.
+		 *
+		 * Larger cluster group sizes actually work out pretty
+		 * well when pared to 256, so we don't have to do this
+		 * for any group that fits more than two
+		 * OCFS2_LA_MAX_DEFAULT_MB windows.
+		 */
+		if (gd_mb > (2 * OCFS2_LA_MAX_DEFAULT_MB))
+			la_mb = 256;
+		else {
+			unsigned int gd_mult = gd_mb;
+
+			while (gd_mult > 256)
+				gd_mult = gd_mult >> 1;
+
+			la_mb = gd_mult;
+		}
+	}
+
+	megs_per_slot = osb->osb_clusters_at_boot / osb->max_slots;
+	megs_per_slot = ocfs2_clusters_to_megabytes(osb->sb, megs_per_slot);
+	/* Too many nodes, too few disk clusters. */
+	if (megs_per_slot < la_mb)
+		la_mb = megs_per_slot;
+
+	return la_mb;
+}
+
 void ocfs2_la_set_sizes(struct ocfs2_super *osb, int requested_mb)
 {
 	struct super_block *sb = osb->sb;
-	unsigned int la_default_mb = OCFS2_DEFAULT_LOCAL_ALLOC_SIZE;
+	unsigned int la_default_mb = ocfs2_la_default_mb(osb);
 	unsigned int la_max_mb;
 
 	la_max_mb = ocfs2_clusters_to_megabytes(sb,
@@ -185,7 +295,7 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb)
 		     osb->local_alloc_bits, (osb->bitmap_cpg - 1));
 		osb->local_alloc_bits =
 			ocfs2_megabytes_to_clusters(osb->sb,
-						    OCFS2_DEFAULT_LOCAL_ALLOC_SIZE);
+						    ocfs2_la_default_mb(osb));
 	}
 
 	/* read the alloc off disk */
diff --git a/fs/ocfs2/localalloc.h b/fs/ocfs2/localalloc.h
index 04195c6..1be9b58 100644
--- a/fs/ocfs2/localalloc.h
+++ b/fs/ocfs2/localalloc.h
@@ -31,6 +31,7 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb);
 void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb);
 
 void ocfs2_la_set_sizes(struct ocfs2_super *osb, int requested_mb);
+unsigned int ocfs2_la_default_mb(struct ocfs2_super *osb);
 
 int ocfs2_begin_local_alloc_recovery(struct ocfs2_super *osb,
 				     int node_num,
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 44b95a9..72e6eef 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -339,6 +339,9 @@ struct ocfs2_super
 	 */
 	unsigned int local_alloc_bits;
 	unsigned int local_alloc_default_bits;
+	/* osb_clusters_at_boot can become stale! Do not trust it to
+	 * be up to date. */
+	unsigned int osb_clusters_at_boot;
 
 	enum ocfs2_local_alloc_state local_alloc_state; /* protected
 							 * by osb_lock */
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 7638a38..d58bc76 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -340,14 +340,6 @@ struct reflink_arguments {
 #define OCFS2_MIN_JOURNAL_SIZE		(4 * 1024 * 1024)
 
 /*
- * Default local alloc size (in megabytes)
- *
- * The value chosen should be such that most allocations, including new
- * block groups, use local alloc.
- */
-#define OCFS2_DEFAULT_LOCAL_ALLOC_SIZE	8
-
-/*
  * Inline extended attribute size (in bytes)
  * The value chosen should be aligned to 16 byte boundaries.
  */
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 6dfcb4e..bca1ed3 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1499,7 +1499,7 @@ static int ocfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
 			   (unsigned) (osb->osb_commit_interval / HZ));
 
 	local_alloc_megs = osb->local_alloc_bits >> (20 - osb->s_clustersize_bits);
-	if (local_alloc_megs != OCFS2_DEFAULT_LOCAL_ALLOC_SIZE)
+	if (local_alloc_megs != ocfs2_la_default_mb(osb))
 		seq_printf(s, ",localalloc=%d", local_alloc_megs);
 
 	if (opts & OCFS2_MOUNT_LOCALFLOCKS)
@@ -2247,6 +2247,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	}
 
 	osb->bitmap_blkno = OCFS2_I(inode)->ip_blkno;
+	osb->osb_clusters_at_boot = OCFS2_I(inode)->ip_clusters;
 	iput(inode);
 
 	osb->bitmap_cpg = ocfs2_group_bitmap_size(sb) * 8;
-- 
1.6.4.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: change default reservation window sizes
  2010-04-06  1:17 [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Mark Fasheh
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: clean up localalloc mount option size parsing Mark Fasheh
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 2/4] ocfs2: increase the default size of local alloc windows Mark Fasheh
@ 2010-04-06  1:17 ` Mark Fasheh
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 4/4] ocfs2: Add dir_resv_level mount option Mark Fasheh
  2010-04-06  1:36 ` [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Joel Becker
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Fasheh @ 2010-04-06  1:17 UTC (permalink / raw)
  To: ocfs2-devel

The default reservation size of 4 (32-bit windows) is a bit too ambitious.
Scale it back to 16 bits (resv_level=2). I have been testing various sizes
on a 4-node cluster which runs a mixed workload that is heavily threaded.
With a 256MB local alloc, I get *roughly* the following levels of average file
fragmentation:

resv_level=0	70%
resv_level=1	21%
resv_level=2	23%
resv_level=3	24%
resv_level=4	60%
resv_level=5	did not test
resv_level=6	60%

resv_level=2 seemed like a good compromise between not letting windows be
too small, but not so big that heavier workloads will immediately suffer
without tuning.

This patch also change the behavior of directory reservations - they now
track file reservations.  The previous compromise of giving directory
windows only 8 bits wound up fragmenting more at some window sizes because
file allocations had smaller unused windows to poach from.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 Documentation/filesystems/ocfs2.txt |    2 +-
 fs/ocfs2/reservations.c             |    7 ++++---
 fs/ocfs2/reservations.h             |    2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index 412df90..32339e5 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -80,6 +80,6 @@ user_xattr	(*)	Enables Extended User Attributes.
 nouser_xattr		Disables Extended User Attributes.
 acl			Enables POSIX Access Control Lists support.
 noacl		(*)	Disables POSIX Access Control Lists support.
-resv_level=4	(*)	Set how agressive allocation reservations will be.
+resv_level=2	(*)	Set how agressive allocation reservations will be.
 			Valid values are between 0 (reservations off) to 8
 			(maximum space for reservations).
diff --git a/fs/ocfs2/reservations.c b/fs/ocfs2/reservations.c
index 7fc6cfe..87fa357 100644
--- a/fs/ocfs2/reservations.c
+++ b/fs/ocfs2/reservations.c
@@ -55,9 +55,10 @@ static unsigned int ocfs2_resv_window_bits(struct ocfs2_reservation_map *resmap,
 	if (!(resv->r_flags & OCFS2_RESV_FLAG_DIR)) {
 		/* 8, 16, 32, 64, 128, 256, 512, 1024 */
 		bits = 4 << osb->osb_resv_level;
-	} else
-		bits = OCFS2_RESV_DIR_WINDOW_BITS;
-
+	} else {
+		/* For now, treat directories the same as files. */
+		bits = 4 << osb->osb_resv_level;
+	}
 	return bits;
 }
 
diff --git a/fs/ocfs2/reservations.h b/fs/ocfs2/reservations.h
index 34bb308..022aff6 100644
--- a/fs/ocfs2/reservations.h
+++ b/fs/ocfs2/reservations.h
@@ -22,7 +22,7 @@
 
 #include <linux/rbtree.h>
 
-#define OCFS2_DEFAULT_RESV_LEVEL	4
+#define OCFS2_DEFAULT_RESV_LEVEL	2
 #define OCFS2_MAX_RESV_LEVEL	9
 #define OCFS2_MIN_RESV_LEVEL	0
 
-- 
1.6.4.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH 4/4] ocfs2: Add dir_resv_level mount option
  2010-04-06  1:17 [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Mark Fasheh
                   ` (2 preceding siblings ...)
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: change default reservation window sizes Mark Fasheh
@ 2010-04-06  1:17 ` Mark Fasheh
  2010-04-06  1:36 ` [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Joel Becker
  4 siblings, 0 replies; 6+ messages in thread
From: Mark Fasheh @ 2010-04-06  1:17 UTC (permalink / raw)
  To: ocfs2-devel

The default behavior for directory reservations stays the same, but we add a
mount option so people can tweak the size of directory reservations
according to their workloads.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 Documentation/filesystems/ocfs2.txt |    4 ++++
 fs/ocfs2/dir.c                      |    6 ++++--
 fs/ocfs2/ocfs2.h                    |    1 +
 fs/ocfs2/reservations.c             |    9 ++++++---
 fs/ocfs2/reservations.h             |    2 ++
 fs/ocfs2/super.c                    |   23 +++++++++++++++++++++++
 6 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index 32339e5..1f7ae14 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -83,3 +83,7 @@ noacl		(*)	Disables POSIX Access Control Lists support.
 resv_level=2	(*)	Set how agressive allocation reservations will be.
 			Valid values are between 0 (reservations off) to 8
 			(maximum space for reservations).
+dir_resv_level=	(*)	By default, directory reservations will scale with file
+			reservations - users should rarely need to change this
+			value. If allocation reservations are turned off, this
+			option will have no effect.
diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index b71acf3..dce39da 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -2993,7 +2993,8 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,
 	 * if we only get one now, that's enough to continue. The rest
 	 * will be claimed after the conversion to extents.
 	 */
-	data_ac->ac_resv = &oi->ip_la_data_resv;
+	if (ocfs2_dir_resv_allowed(osb))
+		data_ac->ac_resv = &oi->ip_la_data_resv;
 	ret = ocfs2_claim_clusters(osb, handle, data_ac, 1, &bit_off, &len);
 	if (ret) {
 		mlog_errno(ret);
@@ -3373,7 +3374,8 @@ static int ocfs2_extend_dir(struct ocfs2_super *osb,
 			goto bail;
 		}
 
-		data_ac->ac_resv = &OCFS2_I(dir)->ip_la_data_resv;
+		if (ocfs2_dir_resv_allowed(osb))
+			data_ac->ac_resv = &OCFS2_I(dir)->ip_la_data_resv;
 
 		credits = ocfs2_calc_extend_credits(sb, el, 1);
 	} else {
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 72e6eef..e946936 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -353,6 +353,7 @@ struct ocfs2_super
 	struct ocfs2_reservation_map	osb_la_resmap;
 
 	unsigned int	osb_resv_level;
+	unsigned int	osb_dir_resv_level;
 
 	/* Next three fields are for local node slot recovery during
 	 * mount. */
diff --git a/fs/ocfs2/reservations.c b/fs/ocfs2/reservations.c
index 87fa357..6497bcc 100644
--- a/fs/ocfs2/reservations.c
+++ b/fs/ocfs2/reservations.c
@@ -44,7 +44,11 @@ DEFINE_SPINLOCK(resv_lock);
 
 #define	OCFS2_MIN_RESV_WINDOW_BITS	8
 #define	OCFS2_MAX_RESV_WINDOW_BITS	1024
-#define	OCFS2_RESV_DIR_WINDOW_BITS	OCFS2_MIN_RESV_WINDOW_BITS
+
+int ocfs2_dir_resv_allowed(struct ocfs2_super *osb)
+{
+	return (osb->osb_resv_level && osb->osb_dir_resv_level);
+}
 
 static unsigned int ocfs2_resv_window_bits(struct ocfs2_reservation_map *resmap,
 					   struct ocfs2_alloc_reservation *resv)
@@ -56,8 +60,7 @@ static unsigned int ocfs2_resv_window_bits(struct ocfs2_reservation_map *resmap,
 		/* 8, 16, 32, 64, 128, 256, 512, 1024 */
 		bits = 4 << osb->osb_resv_level;
 	} else {
-		/* For now, treat directories the same as files. */
-		bits = 4 << osb->osb_resv_level;
+		bits = 4 << osb->osb_dir_resv_level;
 	}
 	return bits;
 }
diff --git a/fs/ocfs2/reservations.h b/fs/ocfs2/reservations.h
index 022aff6..25b0c0e 100644
--- a/fs/ocfs2/reservations.h
+++ b/fs/ocfs2/reservations.h
@@ -67,6 +67,8 @@ void ocfs2_resv_init_once(struct ocfs2_alloc_reservation *resv);
 void ocfs2_resv_set_type(struct ocfs2_alloc_reservation *resv,
 			 unsigned int flags);
 
+int ocfs2_dir_resv_allowed(struct ocfs2_super *osb);
+
 /**
  * ocfs2_resv_discard() - truncate a reservation
  * @resmap:
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bca1ed3..9f0e211 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -95,6 +95,7 @@ struct mount_options
 	signed short	slot;
 	int		localalloc_opt;
 	unsigned int	resv_level;
+	int		dir_resv_level;
 	char		cluster_stack[OCFS2_STACK_LABEL_LEN + 1];
 };
 
@@ -177,6 +178,7 @@ enum {
 	Opt_usrquota,
 	Opt_grpquota,
 	Opt_resv_level,
+	Opt_dir_resv_level,
 	Opt_err,
 };
 
@@ -204,6 +206,7 @@ static const match_table_t tokens = {
 	{Opt_usrquota, "usrquota"},
 	{Opt_grpquota, "grpquota"},
 	{Opt_resv_level, "resv_level=%u"},
+	{Opt_dir_resv_level, "dir_resv_level=%u"},
 	{Opt_err, NULL}
 };
 
@@ -1030,6 +1033,11 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 
 	ocfs2_la_set_sizes(osb, parsed_options.localalloc_opt);
 	osb->osb_resv_level = parsed_options.resv_level;
+	osb->osb_dir_resv_level = parsed_options.resv_level;
+	if (parsed_options.dir_resv_level == -1)
+		osb->osb_dir_resv_level = parsed_options.resv_level;
+	else
+		osb->osb_dir_resv_level = parsed_options.dir_resv_level;
 
 	status = ocfs2_verify_userspace_stack(osb, &parsed_options);
 	if (status)
@@ -1291,6 +1299,7 @@ static int ocfs2_parse_options(struct super_block *sb,
 	mopt->localalloc_opt = -1;
 	mopt->cluster_stack[0] = '\0';
 	mopt->resv_level = OCFS2_DEFAULT_RESV_LEVEL;
+	mopt->dir_resv_level = -1;
 
 	if (!options) {
 		status = 1;
@@ -1445,6 +1454,17 @@ static int ocfs2_parse_options(struct super_block *sb,
 			    option < OCFS2_MAX_RESV_LEVEL)
 				mopt->resv_level = option;
 			break;
+		case Opt_dir_resv_level:
+			if (is_remount)
+				break;
+			if (match_int(&args[0], &option)) {
+				status = 0;
+				goto bail;
+			}
+			if (option >= OCFS2_MIN_RESV_LEVEL &&
+			    option < OCFS2_MAX_RESV_LEVEL)
+				mopt->dir_resv_level = option;
+			break;
 		default:
 			mlog(ML_ERROR,
 			     "Unrecognized mount option \"%s\" "
@@ -1529,6 +1549,9 @@ static int ocfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
 	if (osb->osb_resv_level != OCFS2_DEFAULT_RESV_LEVEL)
 		seq_printf(s, ",resv_level=%d", osb->osb_resv_level);
 
+	if (osb->osb_dir_resv_level != osb->osb_resv_level)
+		seq_printf(s, ",dir_resv_level=%d", osb->osb_resv_level);
+
 	return 0;
 }
 
-- 
1.6.4.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning
  2010-04-06  1:17 [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Mark Fasheh
                   ` (3 preceding siblings ...)
  2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 4/4] ocfs2: Add dir_resv_level mount option Mark Fasheh
@ 2010-04-06  1:36 ` Joel Becker
  4 siblings, 0 replies; 6+ messages in thread
From: Joel Becker @ 2010-04-06  1:36 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Apr 05, 2010 at 06:17:12PM -0700, Mark Fasheh wrote:
> The following patches apply some lessons I've learned during some testing of
> a couple non-trivial workloads on Ocfs2.  Roughly speaking, they tune "up"
> the default local alloc windowxs, and tune "down" the reservations code a
> bit.
> 
> I primarily ran two tests - one simulating a large multi-node data-processing
> workload, and the multi-threaded writers test, mtwrite at:
> 
> http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/tests/mtwrite.c
> 
> All patches were developed and tested on top of the current set of
> allocation changes in ocfs2.git (primarily reservations, and the local alloc
> fixes we have).  A git branch containing all the patches is available, based
> off 2.6.33:
> 
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/ocfs2-mark.git disk-alloc
> 
> Basically though, this all involved many many test runs (of which I have
> lots of data) and a thorough checking of fragmentation levels.  I changed
> only one value at a time. Details regarding some of the test runs are within
> the specific patch descriptions.

	These patches are now in the merge-window branch of ocfs2.git.

Joel

-- 

"It is not the function of our government to keep the citizen from
 falling into error; it is the function of the citizen to keep the
 government from falling into error."
	- Robert H. Jackson

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-04-06  1:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-06  1:17 [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Mark Fasheh
2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: clean up localalloc mount option size parsing Mark Fasheh
2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 2/4] ocfs2: increase the default size of local alloc windows Mark Fasheh
2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: change default reservation window sizes Mark Fasheh
2010-04-06  1:17 ` [Ocfs2-devel] [PATCH 4/4] ocfs2: Add dir_resv_level mount option Mark Fasheh
2010-04-06  1:36 ` [Ocfs2-devel] [PATCH 0/4] Ocfs2 allocation tuning Joel Becker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.