From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: udpated data logging available
Date: 01 Jul 2003 21:44:21 -0400
Message-ID: <1057110260.20904.878.camel@tiny.suse.com>
References: <1055764071.24111.650.camel@tiny.suse.com>
 <200306260216.41204.christian.mayrhuber@gmx.net>
 <1056592066.20899.10.camel@tiny.suse.com>
 <200306261342.08725.Dieter.Nuetzel@hamburg.de>
 <1056632026.20899.39.camel@tiny.suse.com>
 <3EFAF6D7.9040408@netscape.net>
 <3F021C41.9090100@netscape.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-oAM8VtYzhvq2grnermaN"
Return-path:
list-help:
list-unsubscribe:
list-post:
Errors-To: flx@namesys.com
In-Reply-To: <3F021C41.9090100@netscape.net>
List-Id:
To: Manuel Krause
Cc: reiserfs-list

--=-oAM8VtYzhvq2grnermaN
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Tue, 2003-07-01 at 19:41, Manuel Krause wrote:
> >
> > Does the search_reada-4 contradict the new code or is it even dangerous
> > to combine them (what I luckily didn't trigger so far)?
> >
> > Thanks,
> >
> > Manuel
>
> No answer needed so far upon search_reada-4 ?!?!
>

Sorry, I've been doing some final testing on search_reada-5, which is
attached.  It doesn't help quite as much as search_reada-4, but it also
doesn't hurt the random io case anywhere near as badly.  It tries to be
smarter about only doing read ahead for the same object you are
searching for.  I'll upload in the morning.

> If I may remind, that only patch brought 2.4.20+ +reiserfs
> +data-logging to the high throughput values of 2.4.19 +reiserfs
> +data-logging when copying my backup partition (around 5GB) via cp.
>
>
>
> O.K. -- The new (experimental) patches run fine on all my previous
> simple test patterns _with_ search_reada-4 (cp my backup-partitions,
> home usage with NS 7.1 and OOo 1.1betas; VMware 3.2.1 sessions with
> defrag/SpeedDisk in Win98) with 2.4.21 +data-logging +rml-preempt-kernel.
>
> I didn't post definite timings upon my data as using the first new
> experimental data-logging patches led to a throughput/speed improvement
> of 3% only (compared to without exp patches) what is within in the
> typical fluctuation (copying via cp). And I avoided testing without
> search_reada so far, for the reason of needed retesting back to 2.4.19
> (disk content changed).
> So, at least, I can say "It didn't get slower - but may be a bit faster
> or even another bit more. - Depends..."
>

Most of the improvement comes in fsync heavy workloads.  The
data=ordered io is a little smoother as well, for better latencies in
general.

>
> Many thanks, your work is great indeed !
>

Thanks for your continued tests, they are very helpful.

-chris

--=-oAM8VtYzhvq2grnermaN
Content-Disposition: attachment; filename=search_reada-5.diff
Content-Type: text/plain; name=search_reada-5.diff; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

===== fs/reiserfs/stree.c 1.22 vs edited =====
--- 1.22/fs/reiserfs/stree.c Mon Jun 30 12:45:49 2003
+++ edited/fs/reiserfs/stree.c Mon Jun 30 13:33:02 2003
@@ -598,26 +598,32 @@
 
 
 
-#ifdef SEARCH_BY_KEY_READA
+#define SEARCH_BY_KEY_READA 8
 
 /* The function is NOT SCHEDULE-SAFE! */
-static void search_by_key_reada (struct super_block * s, int blocknr)
+static void search_by_key_reada (struct super_block * s,
+                                 struct buffer_head **bh,
+                                 unsigned long *b, int num)
 {
-    struct buffer_head * bh;
+    int i,j;
 
-    if (blocknr == 0)
-        return;
-
-    bh = getblk (s->s_dev, blocknr, s->s_blocksize);
-
-    if (!buffer_uptodate (bh)) {
-        ll_rw_block (READA, 1, &bh);
+    for (i = 0 ; i < num ; i++) {
+        bh[i] = sb_getblk (s, b[i]);
+        if (buffer_uptodate(bh[i])) {
+            brelse(bh[i]);
+            break;
+        }
+        touch_buffer(bh[i]);
+    }
+    if (i) {
+        ll_rw_block(READA, i, bh);
+    }
+    for(j = 0 ; j < i ; j++) {
+        if (bh[j])
+            brelse(bh[j]);
     }
-    bh->b_count --;
 }
 
-#endif
-
 /**************************************************************************
  * Algorithm   SearchByKey                                                *
  *             look for item in the Disk S+Tree by its key                *
@@ -660,6 +666,9 @@
     int n_node_level, n_retval;
     int right_neighbor_of_leaf_node;
     int fs_gen;
+    struct buffer_head *reada_bh[SEARCH_BY_KEY_READA];
+    unsigned long reada_blocks[SEARCH_BY_KEY_READA];
+    int reada_count = 0;
 
 #ifdef CONFIG_REISERFS_CHECK
     int n_repeat_counter = 0;
@@ -696,11 +705,11 @@
         fs_gen = get_generation (p_s_sb);
         expected_level --;
 
-#ifdef SEARCH_BY_KEY_READA
-        /* schedule read of right neighbor */
-        search_by_key_reada (p_s_sb, right_neighbor_of_leaf_node);
-#endif
-
+        /* schedule read of right neighbors */
+        if (reada_count) {
+            search_by_key_reada (p_s_sb, reada_bh, reada_blocks, reada_count);
+            reada_count = 0;
+        }
         /* Read the next tree node, and set the last element in the path to
            have a pointer to it. */
         if ( ! (p_s_bh = p_s_last_element->pe_buffer =
@@ -787,12 +796,37 @@
            an internal node.  Now we calculate child block number by
            position in the node. */
         n_block_number = B_N_CHILD_NUM(p_s_bh, p_s_last_element->pe_position);
-
-#ifdef SEARCH_BY_KEY_READA
-        /* if we are going to read leaf node, then calculate its right neighbor if possible */
-        if (n_node_level == DISK_LEAF_NODE_LEVEL + 1 && p_s_last_element->pe_position < B_NR_ITEMS (p_s_bh))
-            right_neighbor_of_leaf_node = B_N_CHILD_NUM(p_s_bh, p_s_last_element->pe_position + 1);
-#endif
+
+        /* if we are going to read leaf nodes, try for read ahead as well */
+        if (n_node_level == DISK_LEAF_NODE_LEVEL + 1 &&
+            p_s_last_element->pe_position < B_NR_ITEMS (p_s_bh) &&
+            !is_direct_cpu_key(p_s_key) &&
+            !is_statdata_cpu_key(p_s_key))
+        {
+            int pos = p_s_last_element->pe_position;
+            int limit = B_NR_ITEMS(p_s_bh);
+            struct buffer_head *tmp_bh;
+            struct key *le_key;
+
+            /* don't try to readahead if the leaf is already
+             * in ram.  get_hash_table doesn't schedule, so this
+             * is safe
+             */
+            tmp_bh = sb_get_hash_table(p_s_sb, n_block_number);
+            if (tmp_bh) {
+                brelse(tmp_bh);
+                continue;
+            }
+            while(pos <= limit && reada_count < SEARCH_BY_KEY_READA) {
+                le_key = B_N_PDELIM_KEY(p_s_bh, pos);
+                if (le32_to_cpu(le_key->k_objectid) !=
+                    p_s_key->on_disk_key.k_objectid)
+                {
+                    break;
+                }
+                reada_blocks[reada_count++] = B_N_CHILD_NUM(p_s_bh, pos++);
+            }
+        }
     }
 }

--=-oAM8VtYzhvq2grnermaN--
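
For anyone who wants to experiment with the "same object only" read ahead
idea outside the kernel, the collection loop from the last hunk can be
sketched in userspace roughly as follows.  This is an illustration under
stated assumptions, not the kernel code: struct slot, collect_reada and
READA_MAX are made-up names, one array stands in for both the delimiting
keys and the child pointers of an internal node, and the loop bound is
written conservatively (pos < nslots) rather than copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define READA_MAX 8 /* assumption: mirrors SEARCH_BY_KEY_READA in the patch */

/* Hypothetical stand-in for one position in an internal node: the object
 * id from the delimiting key at that position, plus the child block
 * number stored there. */
struct slot {
    uint32_t objectid;
    unsigned long child_block;
};

/* Starting at slot `pos`, collect child block numbers for read ahead for
 * as long as the slot still belongs to `objectid`.  Stops at the first
 * slot of a different object, at the end of the node, or once READA_MAX
 * blocks have been collected.  Returns the number of entries in `out`. */
int collect_reada(const struct slot *slots, size_t nslots, size_t pos,
                  uint32_t objectid, unsigned long *out)
{
    int count = 0;

    assert(slots != NULL && out != NULL);

    while (pos < nslots && count < READA_MAX) {
        if (slots[pos].objectid != objectid)
            break; /* next block belongs to another object */
        out[count++] = slots[pos].child_block;
        pos++;
    }
    return count;
}
```

The caller would then hand the collected block numbers to whatever issues
the actual reads; in the patch that job is done by search_by_key_reada on
the next pass through the search loop.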