Re: [BK][PATCH] ReiserFS CPU and memory bandwidth efficient large writes

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@digeo.com>
To: Hans Reiser <reiser@namesys.com>
Cc: Linus Torvalds <torvalds@transmeta.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BK][PATCH] ReiserFS CPU and memory bandwidth efficient large writes
Date: Fri, 13 Dec 2002 13:40:42 -0800	[thread overview]
Message-ID: <3DFA53DA.DE6788C1@digeo.com> (raw)
In-Reply-To: 3DFA2D4F.3010301@namesys.com

Hans Reiser wrote:
> ..
>     These three changesets implement reiserfs_file_write.

What a big patch.

I'd be interested in testing it a bit.  Could you please
describe under what circumstances the new code paths are
executed?  Is it simply appending to a normal old file?  No
special mount options?


A few little comments, and maybe a security problem....


> +int reiserfs_can_fit_pages ( struct super_block *sb /* superblock of filesystem
> +						       to estimate space */ )
> +{
> +	unsigned long space;
> +
> +	spin_lock(&REISERFS_SB(sb)->bitmap_lock);
> +	space = (SB_FREE_BLOCKS(sb) - REISERFS_SB(sb)->reserved_blocks) / ( PAGE_CACHE_SIZE/sb->s_blocksize);
> +	spin_unlock(&REISERFS_SB(sb)->bitmap_lock);
> +
> +	return space;
>  }

Both of the divisions here can be replaced with shifts.  Divides are expensive.

The locking here is peculiar:

	spin_lock(lock);
	value = calculation();
	spin_unlock(lock);
	return value;

Surely, if the calculation relies on the lock then the returned value is invalidated
as soon as you drop the lock?

> -                    offset = le_ih_k_offset( ih ) + (old_len - tb->rbytes );
> +                    offset = le_ih_k_offset( ih ) + (old_len - tb->rbytes )*(is_indirect_le_ih(ih)?tb->tb_sb->s_blocksize/UNFM_P_SIZE:1);

Can use a shift.

> +			  if (is_indirect_le_key(version,B_N_PKEY(tb->R[0],0))){
> +			      temp_rem = (n_rem / UNFM_P_SIZE) * 
> +			                 tb->tb_sb->s_blocksize;

Shift by i_blkbits (all over the place)


> +/* this function from inode.c would be used here, too */
> +extern void restart_transaction(struct reiserfs_transaction_handle *th,
> +                                struct inode *inode, struct path *path);

This decl should be in a header file.

> +
> +/* I really do not want to play with memory shortage right now, so
> +   to simplify the code, we are not going to write more than this much pages at
> +   a time. This still should considerably improve performance compared to 4k
> +   at a time case. */
> +#define REISERFS_WRITE_PAGES_AT_A_TIME 32

Page sizes vary.  A better value may be

	(128 * 1024) / PAGE_CACHE_SIZE

so that consistent behaviour is seem on platforms which have page sizes larger
than 4k.

> +int reiserfs_allocate_blocks_for_region(
> ...
> +    b_blocknr_t allocated_blocks[blocks_to_allocate]; // Pointer to a place where allocated blocknumbers would be stored. Right now statically allocated, later that will change.

This is a variable-sized array (aka: alloca).  It's not a problem IMO, but
worth pointing out...

> +    reiserfs_blocknr_hint_t hint; // hint structure for block allocator.
> +    size_t res; // return value of various functions that we call.
> +    int curr_block; // current block used to keep track of unmapped blocks.
> +    int i; // loop counter
> +    int itempos; // position in item
> +    unsigned int from = (pos & (PAGE_CACHE_SIZE - 1)); // writing position in
> +						       // first page
> +    unsigned int to = ((pos + write_bytes - 1) & (PAGE_CACHE_SIZE - 1)) + 1; /* last modified byte offset in last page */
> +    __u64 hole_size ; // amount of blocks for a file hole, if it needed to be created.
> +    int modifying_this_item = 0; // Flag for items traversal code to keep track
> +				 // of the fact that we already prepared
> +				 // current block for journal

How much stack is this function using, worst-case?

> +    for ( i = 0; i < num_pages; i++) {
> +	prepared_pages[i] = grab_cache_page(mapping, index + i); // locks the page

OK.  But note that locking multiple pages here is only free from AB/BA deadlocks
because this is the only path which does it, and it is singly-threaded via i_sem.

> +	if ( from != 0 ) {/* First page needs to be partially zeroed */
> +	    char *kaddr = kmap_atomic(prepared_pages[0], KM_USER0);
> +	    memset(kaddr, 0, from);
> +	    kunmap_atomic( kaddr, KM_USER0);
> +	    SetPageUptodate(prepared_pages[0]);
> +	}
> +	if ( to != PAGE_CACHE_SIZE ) { /* Last page needs to be partially zeroed */
> +	    char *kaddr = kmap_atomic(prepared_pages[num_pages-1], KM_USER0);
> +	    memset(kaddr+to, 0, PAGE_CACHE_SIZE - to);
> +	    kunmap_atomic( kaddr, KM_USER0);
> +	    SetPageUptodate(prepared_pages[num_pages-1]);
> +	}

This seems wrong.  This could be a newly-allocated pagecache page.  It is not
yet fully uptodate.  If (say) the subsequent copy_from_user gets a fault then
it appears that this now-uptodate pagecache page will leak uninitialised stuff?

> +			set_bit(BH_Uptodate, &bh->b_state);

set_buffer_uptodate(bh) is more conventional (two instances).

> +		/* We need to mark new file size in case this function will be
> +		   interrupted/aborted later on. And we may do this only for
> +		   holes. */
> +		inode->i_size += inode->i_sb->s_blocksize * blocks_needed;

64-bit multiply, should be a shift.

next prev parent reply	other threads:[~2002-12-13 21:33 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-12-13 18:56 [BK][PATCH] ReiserFS CPU and memory bandwidth efficient large writes Hans Reiser
2002-12-13 21:40 ` Andrew Morton [this message]
2002-12-14  1:11   ` Hans Reiser
2002-12-14 11:44   ` Oleg Drokin
2002-12-14 18:27     ` Andrew Morton
2002-12-14 13:21   ` Oleg Drokin
2002-12-14 18:42     ` Andrew Morton
2002-12-14 19:20       ` Oleg Drokin
2002-12-14 20:10         ` Andrew Morton
2002-12-14 20:25           ` Oleg Drokin
2002-12-16 18:24             ` Chris Mason
2002-12-17 10:53               ` Oleg Drokin
2002-12-14 22:21           ` Hans Reiser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3DFA53DA.DE6788C1@digeo.com \
    --to=akpm@digeo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=reiser@namesys.com \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox