All of lore.kernel.org
 help / color / mirror / Atom feed
From: jeffm@suse.com
To: ReiserFS Mailing List <reiserfs-devel@vger.kernel.org>
Subject: [patch 40/40] reiserfs: reorganize do_balan.c comments
Date: Mon, 11 Jun 2007 15:03:49 -0400	[thread overview]
Message-ID: <20070611190633.849415204@suse.com> (raw)
In-Reply-To: 20070611190309.532091171@suse.com

[-- Attachment #1: reiserfs-balance_leaf_comment.diff --]
[-- Type: text/plain, Size: 13687 bytes --]

 This patch just shuffles some comments around to make them more useful,
 as well as formatted in a more readable manner.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/reiserfs/do_balan.c |  233 ++++++++++++++++++++++---------------------------
 1 file changed, 108 insertions(+), 125 deletions(-)

--- a/fs/reiserfs/do_balan.c	2007-05-30 17:54:46.000000000 -0400
+++ b/fs/reiserfs/do_balan.c	2007-05-30 17:54:47.000000000 -0400
@@ -1,20 +1,47 @@
 /*
  * Copyright 2000 by Hans Reiser, licensing governed by reiserfs/README
+ *
+ * Big rule: Nothing performed during balance_leaf() may schedule. This
+ * is to ensure that the tree will be stable during the entire balancing
+ * run. This is clearly suboptimal for multithreaded use, but making
+ * the tree locking more scalable is a huge project that nobody has
+ * decided to take on yet.
+ *
+ * Summary:
+ *  if deleting/cutting something ( tb->insert_size[0] < 0 )
+ *    return(bl_when_delete());
+ *  else
+ *    if lnum is larger than 0 we put items into the left node
+ *    if rnum is larger than 0 we put items into the right node
+ *    if tb->snum[0] is larger than 0 we put items into the new node s1
+ *    if tb->snum[1] is larger than 0 we put items into the new node s2
+ * Note that all *num* count new items being created.
+ *
+ * Some interesting rules of balancing:
+ *
+ * we delete a maximum of two nodes per level per balancing: we never
+ * delete R, when we delete two of three nodes L, S, R then we move
+ * them into R.
+ *
+ * we only delete L if we are deleting two nodes, if we delete only
+ * one node we delete S
+ *
+ * if we shift leaves then we shift as much as we can: this is a
+ * deliberate policy of extremism in node packing which results in
+ * higher average utilization after repeated random balance operations
+ * at the cost of more memory copies and more balancing as a result of
+ * small insertions to full nodes.
+ *
+ * if we shift internal nodes we try to evenly balance the node
+ * utilization, with consequent less balancing at the cost of lower
+ * utilization.
+ *
+ * one could argue that the policy for directories in leaves should be
+ * that of internal nodes, but we will wait until another day to
+ * evaluate this....  It would be nice to someday measure and prove
+ * these assumptions as to what is optimal....
  */
 
-/* Now we have all buffers that must be used in balancing of the tree 	*/
-/* Further calculations can not cause schedule(), and thus the buffer 	*/
-/* tree will be stable until the balancing will be finished 		*/
-/* balance the tree according to the analysis made before,		*/
-/* and using buffers obtained after all above.				*/
-
-/**
- ** bl_when_delete
- ** balance_leaf
- ** do_balance
- **
- **/
-
 #include <asm/uaccess.h>
 #include <linux/time.h>
 #include <linux/reiserfs_fs.h>
@@ -76,29 +103,6 @@ inline void do_balance_mark_leaf_dirty(s
 #define do_balance_mark_internal_dirty do_balance_mark_leaf_dirty
 #define do_balance_mark_sb_dirty do_balance_mark_leaf_dirty
 
-/* summary:
- if deleting something ( tb->insert_size[0] < 0 )
-   return(bl_when_delete()); (flag d handled here)
- else
-   if lnum is larger than 0 we put items into the left node
-   if rnum is larger than 0 we put items into the right node
-   if tb->snum[0] is larger than 0 we put items into the new node s1
-   if tb->snum[1] is larger than 0 we put items into the new node s2
-Note that all *num* count new items being created.
-
-It would be easier to read balance_leaf() if each of these summary
-lines was a separate procedure rather than being inlined.  I think
-that there are many passages here and in bl_when_delete() in
-which two calls to one procedure can replace two passages, and it
-might save cache space and improve software maintenance costs to do so.
-
-Vladimir made the perceptive comment that we should offload most of
-the decision making in this function into fix_nodes/check_balance, and
-then create some sort of structure in tb that says what actions should
-be performed by do_balance.
-
--Hans */
-
 /* L[0] must be joined with S[0] */
 static int bl_delete_merge_left(struct tree_balance *tb)
 {
@@ -1352,20 +1356,30 @@ bl_current_node(struct tree_ba
 }
 
 
-static int balance_leaf(struct tree_balance *tb, struct item_head *ih,	/* item header of inserted item (this is on little endian) */
-			const char *body,	/* body  of inserted item or bytes to paste */
-			int flag,	/* i - insert, d - delete, c - cut, p - paste
-					   (see comment to do_balance) */
-			struct item_head *insert_key,	/* in our processing of one level we sometimes determine what
-							   must be inserted into the next higher level.  This insertion
-							   consists of a key or two keys and their corresponding
-							   pointers */
-			struct buffer_head **insert_ptr	/* inserted node-ptrs for the next level */
-    )
+/*
+ * balance_leaf(): The core driver for balancing the leaf level of reiserfs
+ *                 s-trees. This function completes successfully or panics.
+ *                 Upon successful completion, the leaf level is balanced.
+ *
+ * @tb:         tree balance struct containing state of this balance
+ * @ih:         item header of inserted item (this is on little endian)
+ * @body:       body of inserted item or bytes to paste
+ * @flag:	One of M_INSERT, M_PASTE, M_DELETE, or M_CUT
+ * @insert_key:	In our processing of one level we sometimes determine what
+ *              must be inserted into the next higher level.  This insertion
+ *              consists of a key or two keys and their corresponding
+ *              pointers
+ * @insert_ptr:	inserted node-ptrs for the next level
+ */
+static int balance_leaf(struct tree_balance *tb, struct item_head *ih,
+                        const char *body, int flag,
+                        struct item_head *insert_key,
+                        struct buffer_head **insert_ptr)
 {
 	struct buffer_head *tbS0 = PATH_PLAST_BUFFER(tb->tb_path);
-	int item_pos = PATH_LAST_POSITION(tb->tb_path);	/*  index into the array of item headers in S[0]
-							   of the affected item */
+	/* index into the array of item headers in S[0]
+         * of the affected item */
+	int item_pos = PATH_LAST_POSITION(tb->tb_path);
 	int pos_in_item;
 	int zeros_num;
 
@@ -1435,7 +1449,7 @@ static int balance_leaf(struct tree_bala
 		          pos_in_item);
 
 	return 0;
-}				/* Leaf level of the tree is balanced (end of balance_leaf) */
+}	/* Leaf level of the tree is balanced (end of balance_leaf) */
 
 /* Make empty node */
 void make_empty_node(struct buffer_info *bi)
@@ -1449,7 +1463,7 @@ void make_empty_node(struct buffer_info 
 	set_blkh_free_space(blkh, MAX_CHILD_SIZE(bi->bi_bh));
 
 	if (bi->bi_parent)
-		B_N_CHILD(bi->bi_parent, bi->bi_position)->dc_size = 0;	/* Endian safe if 0 */
+		B_N_CHILD(bi->bi_parent, bi->bi_position)->dc_size = 0;
 }
 
 /* Get first empty buffer */
@@ -1490,7 +1504,8 @@ static void store_thrown(struct tree_bal
 			get_bh(bh);	/* free_thrown puts this */
 			return;
 		}
-	reiserfs_warning(tb->tb_sb, "reiserfs-12321", "too many thrown buffers");
+	reiserfs_warning(tb->tb_sb, "reiserfs-12321",
+	                 "too many thrown buffers");
 }
 
 static void free_thrown(struct tree_balance *tb)
@@ -1504,7 +1519,8 @@ static void free_thrown(struct tree_bala
 				reiserfs_warning(tb->tb_sb, "reiserfs-12322",
 				                 "called with dirty buffer %d",
 				                 blocknr);
-			brelse(tb->thrown[i]);	/* incremented in store_thrown */
+			/* incremented in store_thrown */
+			brelse(tb->thrown[i]);
 			reiserfs_free_block(tb->transaction_handle, NULL,
 					    blocknr, 0);
 		}
@@ -1530,9 +1546,8 @@ void replace_key(struct tree_balance *tb
 	RFALSE(dest == NULL || src == NULL,
 	       "vs-12305: source or destination buffer is 0 (src=%p, dest=%p)",
 	       src, dest);
-	RFALSE(!B_IS_KEYS_LEVEL(dest),
-	       "vs-12310: invalid level (%z) for destination buffer. dest must be leaf",
-	       dest);
+	RFALSE(!B_IS_KEYS_LEVEL(dest), "vs-12310: invalid level (%z) "
+	       "for destination buffer. dest must be leaf", dest);
 	RFALSE(n_dest < 0 || n_src < 0,
 	       "vs-12315: src(%d) or dest(%d) key number < 0", n_src, n_dest);
 	RFALSE(n_dest >= B_NR_ITEMS(dest) || n_src >= B_NR_ITEMS(src),
@@ -1723,39 +1738,6 @@ static void check_internal_levels(struct
 
 #endif
 
-/* Now we have all of the buffers that must be used in balancing of
-   the tree.  We rely on the assumption that schedule() will not occur
-   while do_balance works. ( Only interrupt handlers are acceptable.)
-   We balance the tree according to the analysis made before this,
-   using buffers already obtained.  For SMP support it will someday be
-   necessary to add ordered locking of tb. */
-
-/* Some interesting rules of balancing:
-
-   we delete a maximum of two nodes per level per balancing: we never
-   delete R, when we delete two of three nodes L, S, R then we move
-   them into R.
-
-   we only delete L if we are deleting two nodes, if we delete only
-   one node we delete S
-
-   if we shift leaves then we shift as much as we can: this is a
-   deliberate policy of extremism in node packing which results in
-   higher average utilization after repeated random balance operations
-   at the cost of more memory copies and more balancing as a result of
-   small insertions to full nodes.
-
-   if we shift internal nodes we try to evenly balance the node
-   utilization, with consequent less balancing at the cost of lower
-   utilization.
-
-   one could argue that the policy for directories in leaves should be
-   that of internal nodes, but we will wait until another day to
-   evaluate this....  It would be nice to someday measure and prove
-   these assumptions as to what is optimal....
-
-*/
-
 static inline void do_balance_starts(struct tree_balance *tb)
 {
 	/* use print_cur_tb() to see initial state of struct
@@ -1794,43 +1776,45 @@ static inline void do_balance_completed(
 	free_thrown(tb);
 }
 
-void do_balance(struct tree_balance *tb,	/* tree_balance structure */
-		struct item_head *ih,	/* item header of inserted item */
-		const char *body,	/* body  of inserted item or bytes to paste */
-		int flag)
-{				/* i - insert, d - delete
-				   c - cut, p - paste
-
-				   Cut means delete part of an item
-				   (includes removing an entry from a
-				   directory).
-
-				   Delete means delete whole item.
-
-				   Insert means add a new item into the
-				   tree.
-
-				   Paste means to append to the end of an
-				   existing file or to insert a directory
-				   entry.  */
-	int child_pos,		/* position of a child node in its parent */
-	 h;			/* level of the tree being processed */
-	struct item_head insert_key[2];	/* in our processing of one level
-					   we sometimes determine what
-					   must be inserted into the next
-					   higher level.  This insertion
-					   consists of a key or two keys
-					   and their corresponding
-					   pointers */
-	struct buffer_head *insert_ptr[2];	/* inserted node-ptrs for the next
-						   level */
+/*
+ * do_balance(): The driver function for tree balancing. We balance the leaf
+ *               level first, and then update the internal tree to match.
+ *
+ * @tb: struct tree_balance containing state of balancing
+ * @ih: item header of inserted item
+ * @body: body of inserted item or bytes to paste
+ * @flag: One of M_INSERT, M_PASTE, M_DELETE, or M_CUT:
+ *        M_INSERT: Add a new item to the tree
+ *        M_PASTE:  Extend an existing item, e.g. appending a file or
+ *                  adding a new directory entry
+ *        M_DELETE: Delete the entire item
+ *        M_CUT:    Delete part of an item, e.g. removing a directory entry
+ */
+void do_balance(struct tree_balance *tb, struct item_head *ih,
+                const char *body, int flag)
+{
+	/* position of a child node in its parent */
+	int child_pos = PATH_H_B_ITEM_ORDER(tb->tb_path, 0);
+
+	/* level of the tree being processed */
+	int h;
+
+	/* in our processing of one level we sometimes determine what
+         * must be inserted into the next higher level.  This insertion
+         * consists of a key or two keys and their corresponding pointers */
+	struct item_head insert_key[2];
+
+	/* Buffers added to the leaf level that must be tracked for
+	 * balancing the internal levels. */
+	struct buffer_head *insert_ptr[2];
 
 	tb->tb_mode = flag;
 	tb->need_balance_dirty = 0;
 
-	if (FILESYSTEM_CHANGED_TB(tb)) {
-		reiserfs_panic(tb->tb_sb, "clm-6000", "fs generation has changed");
-	}
+	if (FILESYSTEM_CHANGED_TB(tb))
+		reiserfs_panic(tb->tb_sb, "clm-6000",
+		               "fs generation has changed");
+
 	/* if we have no real work to do  */
 	if (!tb->insert_size[0]) {
 		reiserfs_warning(tb->tb_sb, "PAP-12350",
@@ -1845,8 +1829,7 @@ void do_balance(struct tree_balance *tb,
 	/* balance leaf returns 0 except if combining L R and S into
 	   one node.  see balance_internal() for explanation of this
 	   line of code. */
-	child_pos = PATH_H_B_ITEM_ORDER(tb->tb_path, 0) +
-	    balance_leaf(tb, ih, body, flag, insert_key, insert_ptr);
+	child_pos += balance_leaf(tb, ih, body, flag, insert_key, insert_ptr);
 
 #ifdef CONFIG_REISERFS_CHECK
 	check_after_balance_leaf(tb);
@@ -1854,8 +1837,8 @@ void do_balance(struct tree_balance *tb,
 
 	/* Balance internal level of the tree. */
 	for (h = 1; h < MAX_HEIGHT && tb->insert_size[h]; h++)
-		child_pos =
-		    balance_internal(tb, h, child_pos, insert_key, insert_ptr);
+		child_pos = balance_internal(tb, h, child_pos,
+		                             insert_key, insert_ptr);
 
 	do_balance_completed(tb);
 

-- 
Jeff Mahoney
SUSE Labs


  parent reply	other threads:[~2007-06-11 19:03 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-11 19:03 [patch 00/40] reiserfs: patch queue (v2) jeffm
2007-06-11 19:03 ` [patch 01/40] reiserfs: fix up lockdep warnings jeffm
2007-06-11 19:03 ` [patch 02/40] reiserfs: dont use BUG when panicking jeffm
2007-06-11 19:03 ` [patch 03/40] reiserfs: use is_reusable to catch corruption jeffm
2007-06-13 15:54   ` Jeff Mahoney
2007-06-11 19:03 ` [patch 04/40] reiserfs: make bitmap use cached first zero bit jeffm
2007-06-11 19:03 ` [patch 05/40] reiserfs: use more consistent printk formatting jeffm
2007-06-11 19:03 ` [patch 06/40] reiserfs: make some warnings informational jeffm
2007-06-11 19:03 ` [patch 07/40] reiserfs: rework reiserfs_warning jeffm
2007-06-11 19:03 ` [patch 08/40] reiserfs: rework reiserfs_panic jeffm
2007-06-11 19:03 ` [patch 09/40] reiserfs: rearrange journal abort jeffm
2007-06-11 19:03 ` [patch 10/40] reiserfs: introduce reiserfs_error() jeffm
2007-06-11 19:03 ` [patch 11/40] reiserfs: use reiserfs_error() jeffm
2007-06-11 19:03 ` [patch 12/40] reiserfs: simplify xattr internal file lookups/opens jeffm
2007-06-11 19:03 ` [patch 13/40] reiserfs: eliminate per-super xattr lock jeffm
2007-06-11 19:03 ` [patch 14/40] reiserfs: make per-inode xattr locking more fine grained jeffm
2007-06-11 19:03 ` [patch 15/40] reiserfs: remove i_has_xattr_dir jeffm
2007-06-11 19:03 ` [patch 16/40] reiserfs: remove link detection code jeffm
2007-06-11 19:03 ` [patch 17/40] reiserfs: use generic xattr handlers jeffm
2007-06-11 19:03 ` [patch 18/40] reiserfs: use better open options for internal files jeffm
2007-06-11 19:03 ` [patch 19/40] reiserfs: add per-file data=ordered mode and use it for xattrs jeffm
2007-06-11 19:03 ` [patch 20/40] reiserfs: journaled xattrs jeffm
2007-06-11 19:03 ` [patch 21/40] reiserfs: use generic readdir for operations across all xattrs jeffm
2007-06-11 19:03 ` [patch 22/40] reiserfs: add atomic addition of selinux attributes during inode creation jeffm
2007-06-11 19:03 ` [patch 23/40] reiserfs: cleanup path functions jeffm
2007-06-11 19:03 ` [patch 24/40] reiserfs: strip trailing whitespace jeffm
2007-06-11 19:03 ` [patch 26/40] reiserfs: rename p_s_bh to bh jeffm
2007-06-11 19:03 ` [patch 27/40] reiserfs: rename p_s_inode to inode jeffm
2007-06-11 19:03 ` [patch 28/40] reiserfs: rename p_s_tb to tb jeffm
2007-06-11 19:03 ` [patch 29/40] reiserfs: rename p_._ variables jeffm
2007-06-11 19:03 ` [patch 30/40] reiserfs: rename _* variables jeffm
2007-06-11 19:03 ` [patch 31/40] reiserfs: factor out buffer_info initialization jeffm
2007-06-11 19:03 ` [patch 32/40] reiserfs: Turn tb->snum and tb->sbytes into an array jeffm
2007-06-11 19:03 ` [patch 33/40] reiserfs: split left balancing part of balance_leaf() off jeffm
2007-06-11 19:03 ` [patch 34/40] reiserfs: split right " jeffm
2007-06-11 19:03 ` [patch 35/40] reiserfs: split balance_leaf new node handling out jeffm
2007-06-11 19:03 ` [patch 36/40] reiserfs: split out current node handling from balance_leaf jeffm
2007-06-11 19:03 ` [patch 37/40] reiserfs: clean up bl_when_delete jeffm
2007-06-11 19:03 ` [patch 38/40] reiserfs: clean up balancing modes jeffm
2007-06-11 19:03 ` [patch 39/40] reiserfs: split bl_when_delete jeffm
2007-06-11 19:03 ` jeffm [this message]
2007-06-11 19:20 ` [patch 00/40] reiserfs: patch queue (v2) Jeff Mahoney
2007-06-14 19:41 ` Jeff Mahoney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070611190633.849415204@suse.com \
    --to=jeffm@suse.com \
    --cc=reiserfs-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.