From: Andrew Morton <akpm@digeo.com>
To: Hans Reiser <reiser@namesys.com>
Cc: Linus Torvalds <torvalds@transmeta.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BK][PATCH] ReiserFS CPU and memory bandwidth efficient large writes
Date: Fri, 13 Dec 2002 13:40:42 -0800 [thread overview]
Message-ID: <3DFA53DA.DE6788C1@digeo.com> (raw)
In-Reply-To: 3DFA2D4F.3010301@namesys.com
Hans Reiser wrote:
> ..
> These three changesets implement reiserfs_file_write.
What a big patch.
I'd be interested in testing it a bit. Could you please
describe under what circumstances the new code paths are
executed? Is it simply appending to a normal old file? No
special mount options?
A few little comments, and maybe a security problem....
> +int reiserfs_can_fit_pages ( struct super_block *sb /* superblock of filesystem
> + to estimate space */ )
> +{
> + unsigned long space;
> +
> + spin_lock(&REISERFS_SB(sb)->bitmap_lock);
> + space = (SB_FREE_BLOCKS(sb) - REISERFS_SB(sb)->reserved_blocks) / ( PAGE_CACHE_SIZE/sb->s_blocksize);
> + spin_unlock(&REISERFS_SB(sb)->bitmap_lock);
> +
> + return space;
> }
Both of the divisions here can be replaced with shifts. Divides are expensive.
The locking here is peculiar:
spin_lock(lock);
value = calculation();
spin_unlock(lock);
return value;
Surely, if the calculation relies on the lock then the returned value is invalidated
as soon as you drop the lock?
> - offset = le_ih_k_offset( ih ) + (old_len - tb->rbytes );
> + offset = le_ih_k_offset( ih ) + (old_len - tb->rbytes )*(is_indirect_le_ih(ih)?tb->tb_sb->s_blocksize/UNFM_P_SIZE:1);
Can use a shift.
> + if (is_indirect_le_key(version,B_N_PKEY(tb->R[0],0))){
> + temp_rem = (n_rem / UNFM_P_SIZE) *
> + tb->tb_sb->s_blocksize;
Shift by i_blkbits (all over the place)
> +/* this function from inode.c would be used here, too */
> +extern void restart_transaction(struct reiserfs_transaction_handle *th,
> + struct inode *inode, struct path *path);
This decl should be in a header file.
> +
> +/* I really do not want to play with memory shortage right now, so
> + to simplify the code, we are not going to write more than this much pages at
> + a time. This still should considerably improve performance compared to 4k
> + at a time case. */
> +#define REISERFS_WRITE_PAGES_AT_A_TIME 32
Page sizes vary. A better value may be
(128 * 1024) / PAGE_CACHE_SIZE
so that consistent behaviour is seem on platforms which have page sizes larger
than 4k.
> +int reiserfs_allocate_blocks_for_region(
> ...
> + b_blocknr_t allocated_blocks[blocks_to_allocate]; // Pointer to a place where allocated blocknumbers would be stored. Right now statically allocated, later that will change.
This is a variable-sized array (aka: alloca). It's not a problem IMO, but
worth pointing out...
> + reiserfs_blocknr_hint_t hint; // hint structure for block allocator.
> + size_t res; // return value of various functions that we call.
> + int curr_block; // current block used to keep track of unmapped blocks.
> + int i; // loop counter
> + int itempos; // position in item
> + unsigned int from = (pos & (PAGE_CACHE_SIZE - 1)); // writing position in
> + // first page
> + unsigned int to = ((pos + write_bytes - 1) & (PAGE_CACHE_SIZE - 1)) + 1; /* last modified byte offset in last page */
> + __u64 hole_size ; // amount of blocks for a file hole, if it needed to be created.
> + int modifying_this_item = 0; // Flag for items traversal code to keep track
> + // of the fact that we already prepared
> + // current block for journal
How much stack is this function using, worst-case?
> + for ( i = 0; i < num_pages; i++) {
> + prepared_pages[i] = grab_cache_page(mapping, index + i); // locks the page
OK. But note that locking multiple pages here is only free from AB/BA deadlocks
because this is the only path which does it, and it is singly-threaded via i_sem.
> + if ( from != 0 ) {/* First page needs to be partially zeroed */
> + char *kaddr = kmap_atomic(prepared_pages[0], KM_USER0);
> + memset(kaddr, 0, from);
> + kunmap_atomic( kaddr, KM_USER0);
> + SetPageUptodate(prepared_pages[0]);
> + }
> + if ( to != PAGE_CACHE_SIZE ) { /* Last page needs to be partially zeroed */
> + char *kaddr = kmap_atomic(prepared_pages[num_pages-1], KM_USER0);
> + memset(kaddr+to, 0, PAGE_CACHE_SIZE - to);
> + kunmap_atomic( kaddr, KM_USER0);
> + SetPageUptodate(prepared_pages[num_pages-1]);
> + }
This seems wrong. This could be a newly-allocated pagecache page. It is not
yet fully uptodate. If (say) the subsequent copy_from_user gets a fault then
it appears that this now-uptodate pagecache page will leak uninitialised stuff?
> + set_bit(BH_Uptodate, &bh->b_state);
set_buffer_uptodate(bh) is more conventional (two instances).
> + /* We need to mark new file size in case this function will be
> + interrupted/aborted later on. And we may do this only for
> + holes. */
> + inode->i_size += inode->i_sb->s_blocksize * blocks_needed;
64-bit multiply, should be a shift.
next prev parent reply other threads:[~2002-12-13 21:33 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-12-13 18:56 [BK][PATCH] ReiserFS CPU and memory bandwidth efficient large writes Hans Reiser
2002-12-13 21:40 ` Andrew Morton [this message]
2002-12-14 1:11 ` Hans Reiser
2002-12-14 11:44 ` Oleg Drokin
2002-12-14 18:27 ` Andrew Morton
2002-12-14 13:21 ` Oleg Drokin
2002-12-14 18:42 ` Andrew Morton
2002-12-14 19:20 ` Oleg Drokin
2002-12-14 20:10 ` Andrew Morton
2002-12-14 20:25 ` Oleg Drokin
2002-12-16 18:24 ` Chris Mason
2002-12-17 10:53 ` Oleg Drokin
2002-12-14 22:21 ` Hans Reiser
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3DFA53DA.DE6788C1@digeo.com \
--to=akpm@digeo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=reiser@namesys.com \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox