* [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages()
@ 2006-02-27 21:20 Badari Pulavarty
2006-02-27 21:22 ` [PATCH 1/4] change buffer_head.b_size to size_t Badari Pulavarty
From: Badari Pulavarty @ 2006-02-27 21:20 UTC (permalink / raw)
To: akpm; +Cc: lkml, pbadari
Hi,
Following patches add support to map multiple blocks in ->get_block().
This will allow us to handle mapping of multiple disk blocks for
mpage_readpages(), mpage_writepages(), etc. Instead of adding a new
argument, I use "b_size" to indicate the amount of disk mapping needed
by get_block(). On success, get_block() also indicates the amount of
disk mapping it actually did.
Now that get_block() can handle multiple blocks, there is no need
for ->get_blocks() which was added for DIO.
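The b_size handshake described above can be modeled in plain userspace C: the caller says how many bytes of mapping it wants, the filesystem maps as much contiguous disk as it can and reports back through the same field. This is a sketch only; `fake_bh`, `demo_get_block` and the single 8-block extent are invented for illustration, not kernel API.

```c
#include <stddef.h>

/* Minimal stand-in for the buffer_head fields used by the convention:
 * b_blocknr is the mapped start block; b_size is the mapping length in
 * bytes (on entry, the amount wanted; on return, the amount mapped). */
struct fake_bh {
	unsigned long long b_blocknr;
	size_t b_size;			/* size_t, as in patch 1/4 */
};

/* Pretend the file is backed by one 8-block extent starting at disk
 * block 1000; a real ->get_block() would consult fs metadata. */
static int demo_get_block(unsigned long long iblock, struct fake_bh *bh,
			  unsigned blkbits)
{
	const unsigned long long extent_len = 8;	/* blocks */
	unsigned long long wanted = bh->b_size >> blkbits;

	if (iblock >= extent_len)
		return -1;			/* hole / past EOF */
	bh->b_blocknr = 1000 + iblock;
	if (wanted > extent_len - iblock)	/* clamp to the extent */
		wanted = extent_len - iblock;
	bh->b_size = (size_t)wanted << blkbits;	/* report what was mapped */
	return 0;
}
```

A caller asking for 16 blocks starting at file block 2 would get back only the 6 blocks remaining in the extent, and would issue another call for the rest.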
[PATCH 1/4] change buffer_head.b_size to size_t
[PATCH 2/4] pass b_size to ->get_block()
[PATCH 3/4] map multiple blocks for mpage_readpages()
[PATCH 4/4] remove ->get_blocks() support
I noticed decent improvements (reduced sys time) on JFS, XFS and ext3
(with simple "dd" read tests).
          (rc3.mm1)     (rc3.mm1 + patches)
real      0m18.814s     0m18.482s
user      0m0.000s      0m0.004s
sys       0m3.240s      0m2.912s
Andrew, could you include this in the -mm tree?
Comments?
Thanks,
Badari
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH 1/4] change buffer_head.b_size to size_t 2006-02-27 21:20 [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Badari Pulavarty @ 2006-02-27 21:22 ` Badari Pulavarty 2006-03-02 1:51 ` Andrew Morton 2006-03-02 1:51 ` Andrew Morton 2006-02-27 21:23 ` [PATCH 2/4] pass b_size to ->get_block() Badari Pulavarty ` (3 subsequent siblings) 4 siblings, 2 replies; 18+ messages in thread From: Badari Pulavarty @ 2006-02-27 21:22 UTC (permalink / raw) To: akpm; +Cc: lkml [-- Attachment #1: Type: text/plain, Size: 368 bytes --] Increase the size of the buffer_head b_size field (only) for 64 bit platforms. Update some old and moldy comments in and around the structure as well. The b_size increase allows us to perform larger mappings and allocations for large I/O requests from userspace, which tie in with other changes allowing the get_block_t() interface to map multiple blocks at once. [-- Attachment #2: increase-bsize.patch --] [-- Type: text/x-patch, Size: 4662 bytes --] Increase the size of the buffer_head b_size field (only) for 64 bit platforms. Update some old and moldy comments in and around the structure as well. The b_size increase allows us to perform larger mappings and allocations for large I/O requests from userspace, which tie in with other changes allowing the get_block_t() interface to map multiple blocks at once. Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Index: linux-2.6.16-rc5/include/linux/buffer_head.h =================================================================== --- linux-2.6.16-rc5.orig/include/linux/buffer_head.h 2006-02-26 21:09:35.000000000 -0800 +++ linux-2.6.16-rc5/include/linux/buffer_head.h 2006-02-27 08:22:37.000000000 -0800 @@ -46,25 +46,28 @@ struct address_space; typedef void (bh_end_io_t)(struct buffer_head *bh, int uptodate); /* - * Keep related fields in common cachelines. 
The most commonly accessed - * field (b_state) goes at the start so the compiler does not generate - * indexed addressing for it. + * Historically, a buffer_head was used to map a single block + * within a page, and of course as the unit of I/O through the + * filesystem and block layers. Nowadays the basic I/O unit + * is the bio, and buffer_heads are used for extracting block + * mappings (via a get_block_t call), for tracking state within + * a page (via a page_mapping) and for wrapping bio submission + * for backward compatibility reasons (e.g. submit_bh). */ struct buffer_head { - /* First cache line: */ unsigned long b_state; /* buffer state bitmap (see above) */ struct buffer_head *b_this_page;/* circular list of page's buffers */ struct page *b_page; /* the page this bh is mapped to */ - atomic_t b_count; /* users using this block */ - u32 b_size; /* block size */ - sector_t b_blocknr; /* block number */ - char *b_data; /* pointer to data block */ + sector_t b_blocknr; /* start block number */ + size_t b_size; /* size of mapping */ + char *b_data; /* pointer to data within the page */ struct block_device *b_bdev; bh_end_io_t *b_end_io; /* I/O completion */ void *b_private; /* reserved for b_end_io */ struct list_head b_assoc_buffers; /* associated with another mapping */ + atomic_t b_count; /* users using this buffer_head */ }; /* Index: linux-2.6.16-rc5/fs/buffer.c =================================================================== --- linux-2.6.16-rc5.orig/fs/buffer.c 2006-02-26 21:09:35.000000000 -0800 +++ linux-2.6.16-rc5/fs/buffer.c 2006-02-27 08:22:37.000000000 -0800 @@ -432,7 +432,8 @@ __find_get_block_slow(struct block_devic printk("__find_get_block_slow() failed. 
" "block=%llu, b_blocknr=%llu\n", (unsigned long long)block, (unsigned long long)bh->b_blocknr); - printk("b_state=0x%08lx, b_size=%u\n", bh->b_state, bh->b_size); + printk("b_state=0x%08lx, b_size=%lu\n", bh->b_state, + (unsigned long)bh->b_size); printk("device blocksize: %d\n", 1 << bd_inode->i_blkbits); } out_unlock: Index: linux-2.6.16-rc5/fs/reiserfs/prints.c =================================================================== --- linux-2.6.16-rc5.orig/fs/reiserfs/prints.c 2006-02-26 21:09:35.000000000 -0800 +++ linux-2.6.16-rc5/fs/reiserfs/prints.c 2006-02-27 08:22:37.000000000 -0800 @@ -143,8 +143,8 @@ static void sprintf_buffer_head(char *bu char b[BDEVNAME_SIZE]; sprintf(buf, - "dev %s, size %d, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", - bdevname(bh->b_bdev, b), bh->b_size, + "dev %s, size %ld, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", + bdevname(bh->b_bdev, b), (unsigned long)bh->b_size, (unsigned long long)bh->b_blocknr, atomic_read(&(bh->b_count)), bh->b_state, bh->b_page, buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", Index: linux-2.6.16-rc5/fs/ocfs2/journal.c =================================================================== --- linux-2.6.16-rc5.orig/fs/ocfs2/journal.c 2006-02-26 21:09:35.000000000 -0800 +++ linux-2.6.16-rc5/fs/ocfs2/journal.c 2006-02-27 08:22:37.000000000 -0800 @@ -377,12 +377,12 @@ int ocfs2_journal_access(struct ocfs2_jo BUG_ON(!bh); BUG_ON(!(handle->flags & OCFS2_HANDLE_STARTED)); - mlog_entry("bh->b_blocknr=%llu, type=%d (\"%s\"), bh->b_size = %hu\n", + mlog_entry("bh->b_blocknr=%llu, type=%d (\"%s\"), bh->b_size = %lu\n", (unsigned long long)bh->b_blocknr, type, (type == OCFS2_JOURNAL_ACCESS_CREATE) ? "OCFS2_JOURNAL_ACCESS_CREATE" : "OCFS2_JOURNAL_ACCESS_WRITE", - bh->b_size); + (unsigned long)bh->b_size); /* we can safely remove this assertion after testing. */ if (!buffer_uptodate(bh)) { ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] change buffer_head.b_size to size_t 2006-02-27 21:22 ` [PATCH 1/4] change buffer_head.b_size to size_t Badari Pulavarty @ 2006-03-02 1:51 ` Andrew Morton 2006-03-02 1:51 ` Andrew Morton 1 sibling, 0 replies; 18+ messages in thread From: Andrew Morton @ 2006-03-02 1:51 UTC (permalink / raw) To: Badari Pulavarty; +Cc: linux-kernel Badari Pulavarty <pbadari@us.ibm.com> wrote: > > + size_t b_size; /* size of mapping */ > + char *b_data; /* pointer to data within the page */ > > struct block_device *b_bdev; > bh_end_io_t *b_end_io; /* I/O completion */ > void *b_private; /* reserved for b_end_io */ > struct list_head b_assoc_buffers; /* associated with another mapping */ > + atomic_t b_count; /* users using this buffer_head */ > }; > > /* > Index: linux-2.6.16-rc5/fs/buffer.c > =================================================================== > --- linux-2.6.16-rc5.orig/fs/buffer.c 2006-02-26 21:09:35.000000000 -0800 > +++ linux-2.6.16-rc5/fs/buffer.c 2006-02-27 08:22:37.000000000 -0800 > @@ -432,7 +432,8 @@ __find_get_block_slow(struct block_devic > printk("__find_get_block_slow() failed. " > "block=%llu, b_blocknr=%llu\n", > (unsigned long long)block, (unsigned long long)bh->b_blocknr); > - printk("b_state=0x%08lx, b_size=%u\n", bh->b_state, bh->b_size); > + printk("b_state=0x%08lx, b_size=%lu\n", bh->b_state, > + (unsigned long)bh->b_size); We print size_t with `%z'. Hence the cast isn't needed. (I'll edit the diff..) ^ permalink raw reply [flat|nested] 18+ messages in thread
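Andrew's point about `%z` can be checked in userspace: C99's `%zu` length modifier formats a `size_t` portably on both 32- and 64-bit targets, so the `(unsigned long)` casts in the patch are unnecessary (the kernel's printk supports the same modifier). `format_bsize` below is an illustrative helper, not kernel code.

```c
#include <stdio.h>
#include <stddef.h>

/* Format a b_size-like value the portable way (%zu) rather than
 * casting to unsigned long, mirroring akpm's suggested cleanup. */
static int format_bsize(char *buf, size_t buflen, size_t b_size)
{
	return snprintf(buf, buflen, "b_size=%zu", b_size);
}
```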
* Re: [PATCH 1/4] change buffer_head.b_size to size_t 2006-02-27 21:22 ` [PATCH 1/4] change buffer_head.b_size to size_t Badari Pulavarty 2006-03-02 1:51 ` Andrew Morton @ 2006-03-02 1:51 ` Andrew Morton 2006-03-02 2:22 ` Nathan Scott 1 sibling, 1 reply; 18+ messages in thread From: Andrew Morton @ 2006-03-02 1:51 UTC (permalink / raw) To: Badari Pulavarty; +Cc: linux-kernel Badari Pulavarty <pbadari@us.ibm.com> wrote: > > + * Historically, a buffer_head was used to map a single block > + * within a page, and of course as the unit of I/O through the > + * filesystem and block layers. Nowadays the basic I/O unit > + * is the bio, and buffer_heads are used for extracting block > + * mappings (via a get_block_t call), for tracking state within > + * a page (via a page_mapping) and for wrapping bio submission > + * for backward compatibility reasons (e.g. submit_bh). Well kinda. A buffer_head remains the kernel's basic abstraction for a "disk block". We cannot replace that with `struct page' (size isn't flexible) nor of course with `struct bio'. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] change buffer_head.b_size to size_t 2006-03-02 1:51 ` Andrew Morton @ 2006-03-02 2:22 ` Nathan Scott 2006-03-02 2:56 ` Andrew Morton 0 siblings, 1 reply; 18+ messages in thread From: Nathan Scott @ 2006-03-02 2:22 UTC (permalink / raw) To: Andrew Morton; +Cc: Badari Pulavarty, linux-kernel On Wed, Mar 01, 2006 at 05:51:48PM -0800, Andrew Morton wrote: > Badari Pulavarty <pbadari@us.ibm.com> wrote: > > > > + * Historically, a buffer_head was used to map a single block > > + * within a page, and of course as the unit of I/O through the > > + * filesystem and block layers. Nowadays the basic I/O unit > > + * is the bio, and buffer_heads are used for extracting block > > + * mappings (via a get_block_t call), for tracking state within > > + * a page (via a page_mapping) and for wrapping bio submission > > + * for backward compatibility reasons (e.g. submit_bh). > > Well kinda. A buffer_head remains the kernel's basic abstraction for a > "disk block". Thats what I said (meant to say) with "buffer_heads are used for extracting block mappings". I think by "disk block" you mean what I'm thinking of as a "block mapping" (series of contiguous blocks). I'd think of a sector_t as a "disk block", but maybe I'm just wierd that way... a better wordsmith should jump in and update the comment I guess. > We cannot replace that with `struct page' (size isn't > flexible) nor of course with `struct bio'. Indeed. -- Nathan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] change buffer_head.b_size to size_t 2006-03-02 2:22 ` Nathan Scott @ 2006-03-02 2:56 ` Andrew Morton 2006-03-06 19:54 ` Tim Pepper 0 siblings, 1 reply; 18+ messages in thread From: Andrew Morton @ 2006-03-02 2:56 UTC (permalink / raw) To: Nathan Scott; +Cc: pbadari, linux-kernel Nathan Scott <nathans@sgi.com> wrote: > > On Wed, Mar 01, 2006 at 05:51:48PM -0800, Andrew Morton wrote: > > Badari Pulavarty <pbadari@us.ibm.com> wrote: > > > > > > + * Historically, a buffer_head was used to map a single block > > > + * within a page, and of course as the unit of I/O through the > > > + * filesystem and block layers. Nowadays the basic I/O unit > > > + * is the bio, and buffer_heads are used for extracting block > > > + * mappings (via a get_block_t call), for tracking state within > > > + * a page (via a page_mapping) and for wrapping bio submission > > > + * for backward compatibility reasons (e.g. submit_bh). > > > > Well kinda. A buffer_head remains the kernel's basic abstraction for a > > "disk block". > > Thats what I said (meant to say) with > "buffer_heads are used for extracting block mappings". > > I think by "disk block" you mean what I'm thinking of as a > "block mapping" (series of contiguous blocks). I'd think > of a sector_t as a "disk block", but maybe I'm just wierd > that way... a better wordsmith should jump in and update > the comment I guess. Think of `struct buffer_head' as `struct block'. If you want to read a block off the disk, diddle with it and write it back (while, of course, being coherent with everything else which is going on), the bh is the only construct we have for this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] change buffer_head.b_size to size_t 2006-03-02 2:56 ` Andrew Morton @ 2006-03-06 19:54 ` Tim Pepper 0 siblings, 0 replies; 18+ messages in thread From: Tim Pepper @ 2006-03-06 19:54 UTC (permalink / raw) To: pbadari; +Cc: Andrew Morton, Nathan Scott, linux-kernel If you're looking for nice wording: I forget if it is Linux Device Drivers, Understanding the Linux Kernel or Linux Kernel Development, but more or less combined they manage to describe the bastard state of what struct buffer_head used to be and how bio coming along leaves bh primarily as mapping between a memory buffer and a disk block. With block related things being abstracted cleaner as of 2.6 (and that hopefully being a continuing goal) I'd vote for any change to this struct's comment focusing on what the struct is meant to do, not what abstractions have been (ab)used to accomplish different things over time. Tim ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/4] pass b_size to ->get_block() 2006-02-27 21:20 [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Badari Pulavarty 2006-02-27 21:22 ` [PATCH 1/4] change buffer_head.b_size to size_t Badari Pulavarty @ 2006-02-27 21:23 ` Badari Pulavarty 2006-03-02 1:52 ` Andrew Morton 2006-02-27 21:24 ` [PATCH 3/4] map multiple blocks for mpage_readpages() Badari Pulavarty ` (2 subsequent siblings) 4 siblings, 1 reply; 18+ messages in thread From: Badari Pulavarty @ 2006-02-27 21:23 UTC (permalink / raw) To: akpm; +Cc: lkml [-- Attachment #1: Type: text/plain, Size: 153 bytes --] Pass amount of disk needs to be mapped to get_block(). This way one can modify the fs ->get_block() functions to map multiple blocks at the same time. [-- Attachment #2: getblock-take-size.patch --] [-- Type: text/x-patch, Size: 3339 bytes --] Pass amount of disk needs to be mapped to get_block(). This way one can modify the fs ->get_block() functions to map multiple blocks at the same time. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> fs/buffer.c | 6 ++++++ fs/mpage.c | 2 ++ include/linux/buffer_head.h | 1 + 3 files changed, 9 insertions(+) Index: linux-2.6.16-rc5/fs/buffer.c =================================================================== --- linux-2.6.16-rc5.orig/fs/buffer.c 2006-02-27 08:22:37.000000000 -0800 +++ linux-2.6.16-rc5/fs/buffer.c 2006-02-27 08:22:45.000000000 -0800 @@ -1786,6 +1786,7 @@ static int __block_write_full_page(struc clear_buffer_dirty(bh); set_buffer_uptodate(bh); } else if (!buffer_mapped(bh) && buffer_dirty(bh)) { + bh->b_size = 1 << inode->i_blkbits; err = get_block(inode, block, bh, 1); if (err) goto recover; @@ -1939,6 +1940,7 @@ static int __block_prepare_write(struct if (buffer_new(bh)) clear_buffer_new(bh); if (!buffer_mapped(bh)) { + bh->b_size = 1 << inode->i_blkbits; err = get_block(inode, block, bh, 1); if (err) break; @@ -2094,6 +2096,7 @@ int block_read_full_page(struct page *pa fully_mapped = 0; if (iblock < lblock) { + 
bh->b_size = 1 << inode->i_blkbits; err = get_block(inode, iblock, bh, 0); if (err) SetPageError(page); @@ -2415,6 +2418,7 @@ int nobh_prepare_write(struct page *page create = 1; if (block_start >= to) create = 0; + map_bh.b_size = 1 << blkbits; ret = get_block(inode, block_in_file + block_in_page, &map_bh, create); if (ret) @@ -2675,6 +2679,7 @@ int block_truncate_page(struct address_s err = 0; if (!buffer_mapped(bh)) { + bh->b_size = 1 << inode->i_blkbits; err = get_block(inode, iblock, bh, 0); if (err) goto unlock; @@ -2761,6 +2766,7 @@ sector_t generic_block_bmap(struct addre struct inode *inode = mapping->host; tmp.b_state = 0; tmp.b_blocknr = 0; + tmp.b_size = 1 << inode->i_blkbits; get_block(inode, block, &tmp, 0); return tmp.b_blocknr; } Index: linux-2.6.16-rc5/include/linux/buffer_head.h =================================================================== --- linux-2.6.16-rc5.orig/include/linux/buffer_head.h 2006-02-27 08:22:37.000000000 -0800 +++ linux-2.6.16-rc5/include/linux/buffer_head.h 2006-02-27 08:22:45.000000000 -0800 @@ -280,6 +280,7 @@ map_bh(struct buffer_head *bh, struct su set_buffer_mapped(bh); bh->b_bdev = sb->s_bdev; bh->b_blocknr = block; + bh->b_size = sb->s_blocksize; } /* Index: linux-2.6.16-rc5/fs/mpage.c =================================================================== --- linux-2.6.16-rc5.orig/fs/mpage.c 2006-02-26 21:09:35.000000000 -0800 +++ linux-2.6.16-rc5/fs/mpage.c 2006-02-27 08:22:45.000000000 -0800 @@ -192,6 +192,7 @@ do_mpage_readpage(struct bio *bio, struc page_block++, block_in_file++) { bh.b_state = 0; if (block_in_file < last_block) { + bh.b_size = blocksize; if (get_block(inode, block_in_file, &bh, 0)) goto confused; } @@ -472,6 +473,7 @@ __mpage_writepage(struct bio *bio, struc for (page_block = 0; page_block < blocks_per_page; ) { map_bh.b_state = 0; + map_bh.b_size = 1 << blkbits; if (get_block(inode, block_in_file, &map_bh, 1)) goto confused; if (buffer_new(&map_bh)) ^ permalink raw reply [flat|nested] 18+ 
messages in thread
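The pattern patch 2/4 applies throughout fs/buffer.c — zero an on-stack mapping descriptor and set its size before calling ->get_block() — can be sketched as below, in the style of generic_block_bmap(). `tmp_bh`, `one_block_get_block` and `demo_bmap` are illustrative stand-ins, not the kernel structures.

```c
#include <stddef.h>
#include <string.h>

struct tmp_bh {
	unsigned long b_state;
	unsigned long long b_blocknr;
	size_t b_size;
};

/* A filesystem that maps only one block at a time may ignore the
 * incoming b_size, but must still leave a sane value behind. */
static int one_block_get_block(unsigned long long iblock, struct tmp_bh *bh,
			       unsigned blkbits)
{
	bh->b_blocknr = 500 + iblock;		/* pretend 1:1-ish layout */
	bh->b_size = (size_t)1 << blkbits;
	return 0;
}

/* generic_block_bmap-style caller: the bh lives on the stack, so every
 * field ->get_block() reads must be initialized here, b_size included. */
static unsigned long long demo_bmap(unsigned long long block, unsigned blkbits)
{
	struct tmp_bh tmp;

	memset(&tmp, 0, sizeof(tmp));		/* b_state = 0, b_blocknr = 0 */
	tmp.b_size = (size_t)1 << blkbits;	/* the new requirement */
	if (one_block_get_block(block, &tmp, blkbits))
		return 0;
	return tmp.b_blocknr;
}
```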
* Re: [PATCH 2/4] pass b_size to ->get_block() 2006-02-27 21:23 ` [PATCH 2/4] pass b_size to ->get_block() Badari Pulavarty @ 2006-03-02 1:52 ` Andrew Morton 2006-03-03 17:17 ` Badari Pulavarty 0 siblings, 1 reply; 18+ messages in thread From: Andrew Morton @ 2006-03-02 1:52 UTC (permalink / raw) To: Badari Pulavarty; +Cc: linux-kernel Badari Pulavarty <pbadari@us.ibm.com> wrote: > > Pass amount of disk needs to be mapped to get_block(). > This way one can modify the fs ->get_block() functions > to map multiple blocks at the same time. I can't say I terribly like this patch. Initialising b_size all over the place seems fragile. We're _already_ setting bh.b_size to the right thing in alloc_page_buffers(), and for a bh which is attached to pagecache_page->private, there's no reason why b_size would ever change. So what I think I'll do is to convert those places where you're needlessly assigning to b_size into temporary WARN_ON(b_size != blocksize). The only place where we need to initialise b_size is where we've got a non-pagecache bh allocated on the stack. We need to be sure that no ->get_block() implementations write garbage into bh->b_size if something goes wrong. b_size on a pagecache-based buffer_head must remain inviolate. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/4] pass b_size to ->get_block() 2006-03-02 1:52 ` Andrew Morton @ 2006-03-03 17:17 ` Badari Pulavarty 0 siblings, 0 replies; 18+ messages in thread From: Badari Pulavarty @ 2006-03-03 17:17 UTC (permalink / raw) To: Andrew Morton; +Cc: lkml On Wed, 2006-03-01 at 17:52 -0800, Andrew Morton wrote: > Badari Pulavarty <pbadari@us.ibm.com> wrote: > > > > Pass amount of disk needs to be mapped to get_block(). > > This way one can modify the fs ->get_block() functions > > to map multiple blocks at the same time. > > I can't say I terribly like this patch. Initialising b_size all over the > place seems fragile. I went through all the in-kernel places where we call ->getblock() and initialized b_size correctly. > > We're _already_ setting bh.b_size to the right thing in > alloc_page_buffers(), and for a bh which is attached to > pagecache_page->private, there's no reason why b_size would ever change. Good point. > So what I think I'll do is to convert those places where you're needlessly > assigning to b_size into temporary WARN_ON(b_size != blocksize). Yep. > The only place where we need to initialise b_size is where we've got a > non-pagecache bh allocated on the stack. > > We need to be sure that no ->get_block() implementations write garbage into > bh->b_size if something goes wrong. b_size on a pagecache-based > buffer_head must remain inviolate. Good news is, filesystems that allocate a single block currently don't care about b_size anyway. I guess to be paranoid, we should add WARN_ON() or BUG_ON() in jfs, xfs, ext3 (in -mm) ->getblock() to check for a valid b_size, before using it (not being negitive, multiple of blocksize, not-a-huge-number type checks ?). Thanks, Badari ^ permalink raw reply [flat|nested] 18+ messages in thread
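The paranoia checks Badari suggests for jfs/xfs/ext3 — reject a b_size that is zero, not a multiple of the block size, or absurdly large — amount to a few bit tests. A hedged sketch, with `MAX_MAP_BYTES` as an invented cap rather than any kernel constant:

```c
#include <stddef.h>

/* Illustrative upper bound on a single mapping request. */
#define MAX_MAP_BYTES	(1024u * 1024u)

/* Sanity checks a multi-block ->get_block() might apply before
 * trusting the caller-supplied b_size. */
static int bsize_is_sane(size_t b_size, unsigned blkbits)
{
	size_t blocksize = (size_t)1 << blkbits;

	if (b_size == 0)
		return 0;
	if (b_size & (blocksize - 1))	/* not block-aligned */
		return 0;
	if (b_size > MAX_MAP_BYTES)	/* "not-a-huge-number" check */
		return 0;
	return 1;
}
```

In the kernel these would likely be WARN_ON()s at the top of the filesystem's get_block implementation, as discussed above.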
* [PATCH 3/4] map multiple blocks for mpage_readpages() 2006-02-27 21:20 [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Badari Pulavarty 2006-02-27 21:22 ` [PATCH 1/4] change buffer_head.b_size to size_t Badari Pulavarty 2006-02-27 21:23 ` [PATCH 2/4] pass b_size to ->get_block() Badari Pulavarty @ 2006-02-27 21:24 ` Badari Pulavarty 2006-03-02 4:45 ` Andrew Morton 2006-02-27 21:24 ` [PATCH 4/4] remove ->get_blocks() support Badari Pulavarty 2006-03-02 1:52 ` [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Andrew Morton 4 siblings, 1 reply; 18+ messages in thread From: Badari Pulavarty @ 2006-02-27 21:24 UTC (permalink / raw) To: akpm; +Cc: lkml [-- Attachment #1: Type: text/plain, Size: 482 bytes --] This patch changes mpage_readpages() and get_block() to get the disk mapping information for multiple blocks at the same time. b_size represents the amount of disk mapping that needs to mapped. On the successful get_block() b_size indicates the amount of disk mapping thats actually mapped. Only the filesystems who care to use this information and provide multiple disk blocks at a time can choose to do so. No changes are needed for the filesystems who wants to ignore this. [-- Attachment #2: getblocks-readpages.patch --] [-- Type: text/x-patch, Size: 7570 bytes --] This patch changes mpage_readpages() and get_block() to get the disk mapping information for multiple blocks at the same time. b_size represents the amount of disk mapping that needs to mapped. On the successful get_block() b_size indicates the amount of disk mapping thats actually mapped. Only the filesystems who care to use this information and provide multiple disk blocks at a time can choose to do so. No changes are needed for the filesystems who wants to ignore this. 
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> fs/jfs/inode.c | 3 - fs/mpage.c | 98 +++++++++++++++++++++++++++++++++++--------- fs/xfs/linux-2.6/xfs_aops.c | 4 - 3 files changed, 83 insertions(+), 22 deletions(-) Index: linux-2.6.16-rc5/fs/mpage.c =================================================================== --- linux-2.6.16-rc5.orig/fs/mpage.c 2006-02-27 08:22:45.000000000 -0800 +++ linux-2.6.16-rc5/fs/mpage.c 2006-02-27 08:22:56.000000000 -0800 @@ -165,7 +165,9 @@ map_buffer_to_page(struct page *page, st static struct bio * do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, - sector_t *last_block_in_bio, get_block_t get_block) + sector_t *last_block_in_bio, struct buffer_head *map_bh, + unsigned long *first_logical_block, int *map_valid, + get_block_t get_block) { struct inode *inode = page->mapping->host; const unsigned blkbits = inode->i_blkbits; @@ -173,34 +175,72 @@ do_mpage_readpage(struct bio *bio, struc const unsigned blocksize = 1 << blkbits; sector_t block_in_file; sector_t last_block; + sector_t last_block_in_file; sector_t blocks[MAX_BUF_PER_PAGE]; unsigned page_block; unsigned first_hole = blocks_per_page; struct block_device *bdev = NULL; - struct buffer_head bh; int length; int fully_mapped = 1; + unsigned nblocks, i; if (page_has_buffers(page)) goto confused; block_in_file = (sector_t)page->index << (PAGE_CACHE_SHIFT - blkbits); - last_block = (i_size_read(inode) + blocksize - 1) >> blkbits; + last_block = block_in_file + (nr_pages * blocks_per_page); + last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits; + if (last_block > last_block_in_file) + last_block = last_block_in_file; + page_block = 0; + + /* + * Map blocks using the result from the last get_blocks call first. 
+ */ + nblocks = map_bh->b_size >> blkbits; + if (*map_valid && + block_in_file > *first_logical_block && + block_in_file < (*first_logical_block + nblocks)) { + unsigned map_offset = block_in_file - *first_logical_block; + unsigned last = nblocks - map_offset; + + for (i = 0; ; i++) { + if (i == last) { + *map_valid = 0; + break; + } + if (page_block == blocks_per_page) + break; + blocks[page_block] = map_bh->b_blocknr + map_offset + i; + page_block++; + block_in_file++; + } + bdev = map_bh->b_bdev; + } + + /* + * Then do more get_blocks calls until we are done with this page. + */ + map_bh->b_page = page; + while (page_block < blocks_per_page) { + map_bh->b_state = 0; + map_bh->b_size = 0; - bh.b_page = page; - for (page_block = 0; page_block < blocks_per_page; - page_block++, block_in_file++) { - bh.b_state = 0; if (block_in_file < last_block) { - bh.b_size = blocksize; - if (get_block(inode, block_in_file, &bh, 0)) + map_bh->b_size = (last_block - block_in_file) << blkbits; + if (get_block(inode, block_in_file, map_bh, 0)) goto confused; + *first_logical_block = block_in_file; + *map_valid = 1; } - if (!buffer_mapped(&bh)) { + if (!buffer_mapped(map_bh)) { fully_mapped = 0; if (first_hole == blocks_per_page) first_hole = page_block; + page_block++; + block_in_file++; + *map_valid = 0; continue; } @@ -210,8 +250,8 @@ do_mpage_readpage(struct bio *bio, struc * we just collected from get_block into the page's buffers * so readpage doesn't have to repeat the get_block call */ - if (buffer_uptodate(&bh)) { - map_buffer_to_page(page, &bh, page_block); + if (buffer_uptodate(map_bh)) { + map_buffer_to_page(page, map_bh, page_block); goto confused; } @@ -219,10 +259,20 @@ do_mpage_readpage(struct bio *bio, struc goto confused; /* hole -> non-hole */ /* Contiguous blocks? 
*/ - if (page_block && blocks[page_block-1] != bh.b_blocknr-1) + if (page_block && blocks[page_block-1] != map_bh->b_blocknr-1) goto confused; - blocks[page_block] = bh.b_blocknr; - bdev = bh.b_bdev; + nblocks = map_bh->b_size >> blkbits; + for (i = 0; ; i++) { + if (i == nblocks) { + *map_valid = 0; + break; + } else if (page_block == blocks_per_page) + break; + blocks[page_block] = map_bh->b_blocknr + i; + page_block++; + block_in_file++; + } + bdev = map_bh->b_bdev; } if (first_hole != blocks_per_page) { @@ -261,7 +311,7 @@ alloc_new: goto alloc_new; } - if (buffer_boundary(&bh) || (first_hole != blocks_per_page)) + if (buffer_boundary(map_bh) || (first_hole != blocks_per_page)) bio = mpage_bio_submit(READ, bio); else *last_block_in_bio = blocks[blocks_per_page - 1]; @@ -332,6 +382,9 @@ mpage_readpages(struct address_space *ma unsigned page_idx; sector_t last_block_in_bio = 0; struct pagevec lru_pvec; + struct buffer_head map_bh; + unsigned long first_logical_block = 0; + int map_valid = 0; pagevec_init(&lru_pvec, 0); for (page_idx = 0; page_idx < nr_pages; page_idx++) { @@ -343,7 +396,9 @@ mpage_readpages(struct address_space *ma page->index, GFP_KERNEL)) { bio = do_mpage_readpage(bio, page, nr_pages - page_idx, - &last_block_in_bio, get_block); + &last_block_in_bio, &map_bh, + &first_logical_block, + &map_valid, get_block); if (!pagevec_add(&lru_pvec, page)) __pagevec_lru_add(&lru_pvec); } else { @@ -365,9 +420,14 @@ int mpage_readpage(struct page *page, ge { struct bio *bio = NULL; sector_t last_block_in_bio = 0; + struct buffer_head map_bh; + unsigned long first_logical_block = 0; + int map_valid = 0; + - bio = do_mpage_readpage(bio, page, 1, - &last_block_in_bio, get_block); + bio = do_mpage_readpage(bio, page, 1, &last_block_in_bio, + &map_bh, &first_logical_block, &map_valid, + get_block); if (bio) mpage_bio_submit(READ, bio); return 0; Index: linux-2.6.16-rc5/fs/jfs/inode.c =================================================================== --- 
linux-2.6.16-rc5.orig/fs/jfs/inode.c 2006-02-26 21:09:35.000000000 -0800 +++ linux-2.6.16-rc5/fs/jfs/inode.c 2006-02-27 08:22:56.000000000 -0800 @@ -257,7 +257,8 @@ jfs_get_blocks(struct inode *ip, sector_ static int jfs_get_block(struct inode *ip, sector_t lblock, struct buffer_head *bh_result, int create) { - return jfs_get_blocks(ip, lblock, 1, bh_result, create); + return jfs_get_blocks(ip, lblock, bh_result->b_size >> ip->i_blkbits, + bh_result, create); } static int jfs_writepage(struct page *page, struct writeback_control *wbc) Index: linux-2.6.16-rc5/fs/xfs/linux-2.6/xfs_aops.c =================================================================== --- linux-2.6.16-rc5.orig/fs/xfs/linux-2.6/xfs_aops.c 2006-02-26 21:09:35.000000000 -0800 +++ linux-2.6.16-rc5/fs/xfs/linux-2.6/xfs_aops.c 2006-02-27 08:22:56.000000000 -0800 @@ -1136,8 +1136,8 @@ linvfs_get_block( struct buffer_head *bh_result, int create) { - return __linvfs_get_block(inode, iblock, 0, bh_result, - create, 0, BMAPI_WRITE); + return __linvfs_get_block(inode, iblock, bh_result->b_size >> inode->i_blkbits, + bh_result, create, 0, BMAPI_WRITE); } STATIC int ^ permalink raw reply [flat|nested] 18+ messages in thread
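The reuse logic in patch 3/4 — carry the last get_block result between do_mpage_readpage() calls and satisfy later pages from it while their blocks fall inside the mapped extent — can be modeled in userspace. The names below (`cached_map`, `blocks_from_cache`) are invented for this sketch.

```c
#include <stddef.h>

/* Cached result of the previous get_block call, as carried between
 * do_mpage_readpage() invocations by mpage_readpages(). */
struct cached_map {
	unsigned long long first_logical_block;	/* file offset, in blocks */
	unsigned long long b_blocknr;		/* disk start block */
	unsigned long nblocks;			/* extent length, in blocks */
	int valid;
};

/* Fill blocks[] for one page from the cache where possible; returns how
 * many of the page's blocks were satisfied without calling get_block.
 * Invalidates the cache once the extent is exhausted. */
static unsigned blocks_from_cache(struct cached_map *map,
				  unsigned long long block_in_file,
				  unsigned long long blocks[],
				  unsigned blocks_per_page)
{
	unsigned filled = 0;

	if (!map->valid ||
	    block_in_file < map->first_logical_block ||
	    block_in_file >= map->first_logical_block + map->nblocks)
		return 0;

	while (filled < blocks_per_page) {
		unsigned long long off =
			block_in_file - map->first_logical_block;

		if (off >= map->nblocks) {	/* ran off the extent */
			map->valid = 0;
			break;
		}
		blocks[filled++] = map->b_blocknr + off;
		block_in_file++;
	}
	return filled;
}
```

With a 6-block extent cached, a 4-block page starting inside it is filled entirely from the cache; the next page drains the remaining 2 blocks and invalidates the cache, forcing a fresh get_block call.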
* Re: [PATCH 3/4] map multiple blocks for mpage_readpages() 2006-02-27 21:24 ` [PATCH 3/4] map multiple blocks for mpage_readpages() Badari Pulavarty @ 2006-03-02 4:45 ` Andrew Morton 2006-03-03 16:51 ` Badari Pulavarty 0 siblings, 1 reply; 18+ messages in thread From: Andrew Morton @ 2006-03-02 4:45 UTC (permalink / raw) To: Badari Pulavarty; +Cc: linux-kernel Badari Pulavarty <pbadari@us.ibm.com> wrote: > > do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, > - sector_t *last_block_in_bio, get_block_t get_block) > + sector_t *last_block_in_bio, struct buffer_head *map_bh, > + unsigned long *first_logical_block, int *map_valid, > + get_block_t get_block) I wonder if we really need that map_valid pointer there. The normal way of communicating the validity of a bh is via buffer_uptodate(). I _think_ that's an appropriate thing to use here. With appropriate comments, of course.. If those options are not a sufficiently good fit then we can easily create a new buffer-head.state bit for this purpose - there are lots to spare. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/4] map multiple blocks for mpage_readpages() 2006-03-02 4:45 ` Andrew Morton @ 2006-03-03 16:51 ` Badari Pulavarty 0 siblings, 0 replies; 18+ messages in thread From: Badari Pulavarty @ 2006-03-03 16:51 UTC (permalink / raw) To: Andrew Morton; +Cc: lkml On Wed, 2006-03-01 at 20:45 -0800, Andrew Morton wrote: > Badari Pulavarty <pbadari@us.ibm.com> wrote: > > > > do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, > > - sector_t *last_block_in_bio, get_block_t get_block) > > + sector_t *last_block_in_bio, struct buffer_head *map_bh, > > + unsigned long *first_logical_block, int *map_valid, > > + get_block_t get_block) > > I wonder if we really need that map_valid pointer there. The normal way of > communicating the validity of a bh is via buffer_uptodate(). I _think_ > that's an appropriate thing to use here. With appropriate comments, of > course.. > > If those options are not a sufficiently good fit then we can easily create > a new buffer-head.state bit for this purpose - there are lots to spare. Yep. Can be done. I will take a pass at it. Thanks, Badari ^ permalink raw reply [flat|nested] 18+ messages in thread
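Andrew's alternative — encoding "this cached mapping is valid" as a buffer-state bit rather than a separate int — would look roughly like this. `BH_Extent_Valid` is a made-up bit name for the sketch; the real kernel would add it to enum bh_state_bits above BH_PrivateStart and generate the accessors with the BUFFER_FNS() macro using atomic set_bit/test_bit.

```c
/* Hypothetical state bit, modeled on the kernel's enum bh_state_bits. */
enum { BH_Extent_Valid = 16 };

struct mini_bh {
	unsigned long b_state;	/* buffer state bitmap */
};

static void set_buffer_extent_valid(struct mini_bh *bh)
{
	bh->b_state |= 1UL << BH_Extent_Valid;
}

static void clear_buffer_extent_valid(struct mini_bh *bh)
{
	bh->b_state &= ~(1UL << BH_Extent_Valid);
}

static int buffer_extent_valid(const struct mini_bh *bh)
{
	return (bh->b_state >> BH_Extent_Valid) & 1;
}
```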
* [PATCH 4/4] remove ->get_blocks() support
  2006-02-27 21:20 [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Badari Pulavarty
                   ` (2 preceding siblings ...)
  2006-02-27 21:24 ` [PATCH 3/4] map multiple blocks for mpage_readpages() Badari Pulavarty
@ 2006-02-27 21:24 ` Badari Pulavarty
  2006-03-02  1:52 ` [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Andrew Morton
  4 siblings, 0 replies; 18+ messages in thread
From: Badari Pulavarty @ 2006-02-27 21:24 UTC (permalink / raw)
  To: akpm; +Cc: lkml

[-- Attachment #1: Type: text/plain, Size: 199 bytes --]

Now that get_block() can handle mapping multiple disk blocks, there is no
need to have ->get_blocks(). This patch removes the fs-specific
->get_blocks() added for DIO and makes its users use get_block() instead.

[-- Attachment #2: remove-getblocks.patch --]
[-- Type: text/x-patch, Size: 16572 bytes --]

Now that get_block() can handle mapping multiple disk blocks, there is no
need to have ->get_blocks(). This patch removes the fs-specific
->get_blocks() added for DIO and makes its users use get_block() instead.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

 fs/block_dev.c              |    3 ++-
 fs/direct-io.c              |   27 ++++++++++++++-------------
 fs/ext2/inode.c             |   14 +-------------
 fs/ext3/inode.c             |    3 +--
 fs/fat/inode.c              |    2 +-
 fs/hfs/inode.c              |   13 +------------
 fs/hfsplus/inode.c          |   13 +------------
 fs/jfs/inode.c              |    2 +-
 fs/ocfs2/aops.c             |    2 +-
 fs/reiserfs/inode.c         |    1 -
 fs/xfs/linux-2.6/xfs_aops.c |    5 ++---
 include/linux/fs.h          |   17 +++++++----------
 12 files changed, 32 insertions(+), 70 deletions(-)

Index: linux-2.6.16-rc5/fs/direct-io.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/direct-io.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/direct-io.c	2006-02-27 08:23:02.000000000 -0800
@@ -86,12 +86,12 @@ struct dio {
 	unsigned first_block_in_page;	/* doesn't change, Used only once */
 	int boundary;			/* prev block is at a boundary */
 	int reap_counter;		/* rate limit reaping */
-	get_blocks_t *get_blocks;	/* block mapping function */
+	get_block_t *get_block;		/* block mapping function */
 	dio_iodone_t *end_io;		/* IO completion function */
 	sector_t final_block_in_bio;	/* current final block in bio + 1 */
 	sector_t next_block_for_io;	/* next block to be put under IO,
 					   in dio_blocks units */
-	struct buffer_head map_bh;	/* last get_blocks() result */
+	struct buffer_head map_bh;	/* last get_block() result */
 
 	/*
 	 * Deferred addition of a page to the dio.  These variables are
@@ -210,9 +210,9 @@ static struct page *dio_get_page(struct 
 
 /*
  * Called when all DIO BIO I/O has been completed - let the filesystem
- * know, if it registered an interest earlier via get_blocks.  Pass the
+ * know, if it registered an interest earlier via get_block.  Pass the
  * private field of the map buffer_head so that filesystems can use it
- * to hold additional state between get_blocks calls and dio_complete.
+ * to hold additional state between get_block calls and dio_complete.
  */
 static void dio_complete(struct dio *dio, loff_t offset, ssize_t bytes)
 {
@@ -488,7 +488,7 @@ static int dio_bio_reap(struct dio *dio)
  * The fs is allowed to map lots of blocks at once.  If it wants to do that,
  * it uses the passed inode-relative block number as the file offset, as usual.
  *
- * get_blocks() is passed the number of i_blkbits-sized blocks which direct_io
+ * get_block() is passed the number of i_blkbits-sized blocks which direct_io
  * has remaining to do.  The fs should not map more than this number of blocks.
 *
 * If the fs has mapped a lot of blocks, it should populate bh->b_size to
@@ -501,7 +501,7 @@ static int dio_bio_reap(struct dio *dio)
 * In the case of filesystem holes: the fs may return an arbitrarily-large
 * hole by returning an appropriate value in b_size and by clearing
 * buffer_mapped().  However the direct-io code will only process holes one
- * block at a time - it will repeatedly call get_blocks() as it walks the hole.
+ * block at a time - it will repeatedly call get_block() as it walks the hole.
 */
 static int get_more_blocks(struct dio *dio)
 {
@@ -543,7 +543,8 @@ static int get_more_blocks(struct dio *d
 	 * at a higher level for inside-i_size block-instantiating
 	 * writes.
 	 */
-	ret = (*dio->get_blocks)(dio->inode, fs_startblk, fs_count,
+	map_bh->b_size = fs_count << dio->blkbits;
+	ret = (*dio->get_block)(dio->inode, fs_startblk,
 					map_bh, create);
 	}
 	return ret;
@@ -778,11 +779,11 @@ static void dio_zero_block(struct dio *d
  * happily perform page-sized but 512-byte aligned IOs.  It is important that
  * blockdev IO be able to have fine alignment and large sizes.
  *
- * So what we do is to permit the ->get_blocks function to populate bh.b_size
+ * So what we do is to permit the ->get_block function to populate bh.b_size
  * with the size of IO which is permitted at this offset and this i_blkbits.
  *
  * For best results, the blockdev should be set up with 512-byte i_blkbits and
- * it should set b_size to PAGE_SIZE or more inside get_blocks().  This gives
+ * it should set b_size to PAGE_SIZE or more inside get_block().  This gives
  * fine alignment but still allows this function to work in PAGE_SIZE units.
  */
 static int do_direct_IO(struct dio *dio)
@@ -942,7 +943,7 @@ out:
 static ssize_t
 direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
 	const struct iovec *iov, loff_t offset, unsigned long nr_segs,
-	unsigned blkbits, get_blocks_t get_blocks, dio_iodone_t end_io,
+	unsigned blkbits, get_block_t get_block, dio_iodone_t end_io,
 	struct dio *dio)
 {
 	unsigned long user_addr;
@@ -964,7 +965,7 @@ direct_io_worker(int rw, struct kiocb *i
 	dio->boundary = 0;
 	dio->reap_counter = 0;
-	dio->get_blocks = get_blocks;
+	dio->get_block = get_block;
 	dio->end_io = end_io;
 	dio->map_bh.b_private = NULL;
 	dio->final_block_in_bio = -1;
@@ -1170,7 +1171,7 @@ direct_io_worker(int rw, struct kiocb *i
 ssize_t
 __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset,
-	unsigned long nr_segs, get_blocks_t get_blocks, dio_iodone_t end_io,
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
 	int dio_lock_type)
 {
 	int seg;
@@ -1266,7 +1267,7 @@ __blockdev_direct_IO(int rw, struct kioc
 			(end > i_size_read(inode)));
 
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
-				nr_segs, blkbits, get_blocks, end_io, dio);
+				nr_segs, blkbits, get_block, end_io, dio);
 
 	if (rw == READ && dio_lock_type == DIO_LOCKING)
 		reader_with_isem = 0;
Index: linux-2.6.16-rc5/fs/ext2/inode.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/ext2/inode.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/ext2/inode.c	2006-02-27 08:23:02.000000000 -0800
@@ -667,18 +667,6 @@ static sector_t ext2_bmap(struct address
 	return generic_block_bmap(mapping,block,ext2_get_block);
 }
 
-static int
-ext2_get_blocks(struct inode *inode, sector_t iblock, unsigned long max_blocks,
-			struct buffer_head *bh_result, int create)
-{
-	int ret;
-
-	ret = ext2_get_block(inode, iblock, bh_result, create);
-	if (ret == 0)
-		bh_result->b_size = (1 << inode->i_blkbits);
-	return ret;
-}
-
 static ssize_t
 ext2_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 			loff_t offset, unsigned long nr_segs)
@@ -687,7 +675,7 @@ ext2_direct_IO(int rw, struct kiocb *ioc
 	struct inode *inode = file->f_mapping->host;
 
 	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-				offset, nr_segs, ext2_get_blocks, NULL);
+				offset, nr_segs, ext2_get_block, NULL);
 }
 
 static int
Index: linux-2.6.16-rc5/fs/ext3/inode.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/ext3/inode.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/ext3/inode.c	2006-02-27 08:23:02.000000000 -0800
@@ -806,8 +806,7 @@ static int ext3_get_block(struct inode *
 
 static int
 ext3_direct_io_get_blocks(struct inode *inode, sector_t iblock,
-			unsigned long max_blocks, struct buffer_head *bh_result,
-			int create)
+			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();
 	int ret = 0;
Index: linux-2.6.16-rc5/fs/fat/inode.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/fat/inode.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/fat/inode.c	2006-02-27 08:23:02.000000000 -0800
@@ -101,11 +101,11 @@ static int __fat_get_blocks(struct inode
 }
 
 static int fat_get_blocks(struct inode *inode, sector_t iblock,
-			unsigned long max_blocks,
 			struct buffer_head *bh_result, int create)
 {
 	struct super_block *sb = inode->i_sb;
 	int err;
+	unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
 
 	err = __fat_get_blocks(inode, iblock, &max_blocks, bh_result, create);
 	if (err)
Index: linux-2.6.16-rc5/fs/hfs/inode.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/hfs/inode.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/hfs/inode.c	2006-02-27 08:23:02.000000000 -0800
@@ -98,17 +98,6 @@ static int hfs_releasepage(struct page *
 	return res ? try_to_free_buffers(page) : 0;
 }
 
-static int hfs_get_blocks(struct inode *inode, sector_t iblock, unsigned long max_blocks,
-			  struct buffer_head *bh_result, int create)
-{
-	int ret;
-
-	ret = hfs_get_block(inode, iblock, bh_result, create);
-	if (!ret)
-		bh_result->b_size = (1 << inode->i_blkbits);
-	return ret;
-}
-
 static ssize_t hfs_direct_IO(int rw, struct kiocb *iocb,
 		const struct iovec *iov, loff_t offset, unsigned long nr_segs)
 {
@@ -116,7 +105,7 @@ static ssize_t hfs_direct_IO(int rw, str
 	struct inode *inode = file->f_dentry->d_inode->i_mapping->host;
 
 	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-				  offset, nr_segs, hfs_get_blocks, NULL);
+				  offset, nr_segs, hfs_get_block, NULL);
 }
 
 static int hfs_writepages(struct address_space *mapping,
Index: linux-2.6.16-rc5/fs/hfsplus/inode.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/hfsplus/inode.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/hfsplus/inode.c	2006-02-27 08:23:02.000000000 -0800
@@ -93,17 +93,6 @@ static int hfsplus_releasepage(struct pa
 	return res ? try_to_free_buffers(page) : 0;
 }
 
-static int hfsplus_get_blocks(struct inode *inode, sector_t iblock, unsigned long max_blocks,
-			      struct buffer_head *bh_result, int create)
-{
-	int ret;
-
-	ret = hfsplus_get_block(inode, iblock, bh_result, create);
-	if (!ret)
-		bh_result->b_size = (1 << inode->i_blkbits);
-	return ret;
-}
-
 static ssize_t hfsplus_direct_IO(int rw, struct kiocb *iocb,
 		const struct iovec *iov, loff_t offset, unsigned long nr_segs)
 {
@@ -111,7 +100,7 @@ static ssize_t hfsplus_direct_IO(int rw,
 	struct inode *inode = file->f_dentry->d_inode->i_mapping->host;
 
 	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-				  offset, nr_segs, hfsplus_get_blocks, NULL);
+				  offset, nr_segs, hfsplus_get_block, NULL);
 }
 
 static int hfsplus_writepages(struct address_space *mapping,
Index: linux-2.6.16-rc5/fs/ocfs2/aops.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/ocfs2/aops.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/ocfs2/aops.c	2006-02-27 08:23:02.000000000 -0800
@@ -538,7 +538,6 @@ bail:
  *					fs_count, map_bh, dio->rw == WRITE);
  */
 static int ocfs2_direct_IO_get_blocks(struct inode *inode, sector_t iblock,
-				     unsigned long max_blocks,
 				     struct buffer_head *bh_result, int create)
 {
 	int ret;
@@ -546,6 +545,7 @@ static int ocfs2_direct_IO_get_blocks(st
 	u64 p_blkno;
 	int contig_blocks;
 	unsigned char blocksize_bits;
+	unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
 
 	if (!inode || !bh_result) {
 		mlog(ML_ERROR, "inode or bh_result is null\n");
Index: linux-2.6.16-rc5/fs/reiserfs/inode.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/reiserfs/inode.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/reiserfs/inode.c	2006-02-27 08:23:02.000000000 -0800
@@ -466,7 +466,6 @@ static int reiserfs_get_block_create_0(s
    direct_IO request. */
 static int reiserfs_get_blocks_direct_io(struct inode *inode,
 					 sector_t iblock,
-					 unsigned long max_blocks,
 					 struct buffer_head *bh_result,
 					 int create)
 {
Index: linux-2.6.16-rc5/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/xfs/linux-2.6/xfs_aops.c	2006-02-27 08:22:56.000000000 -0800
+++ linux-2.6.16-rc5/fs/xfs/linux-2.6/xfs_aops.c	2006-02-27 08:23:02.000000000 -0800
@@ -1144,12 +1144,11 @@ STATIC int
 linvfs_get_blocks_direct(
 	struct inode		*inode,
 	sector_t		iblock,
-	unsigned long		max_blocks,
 	struct buffer_head	*bh_result,
 	int			create)
 {
-	return __linvfs_get_block(inode, iblock, max_blocks, bh_result,
-					create, 1, BMAPI_WRITE|BMAPI_DIRECT);
+	return __linvfs_get_block(inode, iblock, bh_result->b_size >> inode->i_blkbits,
+					bh_result, create, 1, BMAPI_WRITE|BMAPI_DIRECT);
 }
 
 STATIC void
Index: linux-2.6.16-rc5/include/linux/fs.h
===================================================================
--- linux-2.6.16-rc5.orig/include/linux/fs.h	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/include/linux/fs.h	2006-02-27 08:23:02.000000000 -0800
@@ -240,9 +240,6 @@ extern void __init files_init(unsigned l
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
-typedef int (get_blocks_t)(struct inode *inode, sector_t iblock,
-			unsigned long max_blocks,
-			struct buffer_head *bh_result, int create);
 typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 			ssize_t bytes, void *private);
 
@@ -1621,7 +1618,7 @@ static inline void do_generic_file_read(
 ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset,
-	unsigned long nr_segs, get_blocks_t get_blocks, dio_iodone_t end_io,
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
 	int lock_type);
 
 enum {
@@ -1632,29 +1629,29 @@ enum {
 static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
-	loff_t offset, unsigned long nr_segs, get_blocks_t get_blocks,
+	loff_t offset, unsigned long nr_segs, get_block_t get_block,
 	dio_iodone_t end_io)
 {
 	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_blocks, end_io, DIO_LOCKING);
+				nr_segs, get_block, end_io, DIO_LOCKING);
 }
 
 static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
 	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
-	loff_t offset, unsigned long nr_segs, get_blocks_t get_blocks,
+	loff_t offset, unsigned long nr_segs, get_block_t get_block,
 	dio_iodone_t end_io)
 {
 	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_blocks, end_io, DIO_NO_LOCKING);
+				nr_segs, get_block, end_io, DIO_NO_LOCKING);
 }
 
 static inline ssize_t blockdev_direct_IO_own_locking(int rw, struct kiocb *iocb,
 	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
-	loff_t offset, unsigned long nr_segs, get_blocks_t get_blocks,
+	loff_t offset, unsigned long nr_segs, get_block_t get_block,
 	dio_iodone_t end_io)
 {
 	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_blocks, end_io, DIO_OWN_LOCKING);
+				nr_segs, get_block, end_io, DIO_OWN_LOCKING);
 }
 
 extern struct file_operations generic_ro_fops;
Index: linux-2.6.16-rc5/fs/block_dev.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/block_dev.c	2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/fs/block_dev.c	2006-02-27 08:23:02.000000000 -0800
@@ -135,9 +135,10 @@ blkdev_get_block(struct inode *inode, se
 
 static int
 blkdev_get_blocks(struct inode *inode, sector_t iblock,
-		unsigned long max_blocks, struct buffer_head *bh, int create)
+		struct buffer_head *bh, int create)
 {
 	sector_t end_block = max_block(I_BDEV(inode));
+	unsigned long max_blocks = bh->b_size >> inode->i_blkbits;
 
 	if ((iblock + max_blocks) > end_block) {
 		max_blocks = end_block - iblock;
Index: linux-2.6.16-rc5/fs/jfs/inode.c
===================================================================
--- linux-2.6.16-rc5.orig/fs/jfs/inode.c	2006-02-27 08:22:56.000000000 -0800
+++ linux-2.6.16-rc5/fs/jfs/inode.c	2006-02-27 08:23:02.000000000 -0800
@@ -301,7 +301,7 @@ static ssize_t jfs_direct_IO(int rw, str
 	struct inode *inode = file->f_mapping->host;
 
 	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-				offset, nr_segs, jfs_get_blocks, NULL);
+				offset, nr_segs, jfs_get_block, NULL);
 }
 
 struct address_space_operations jfs_aops = {
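The conversion that recurs throughout the patch above — the caller loads `bh->b_size` with the number of bytes it wants mapped, the implementation recovers the old `max_blocks` argument via `b_size >> i_blkbits`, and on success writes the mapped size back into `b_size` — can be sketched in userspace C. The names here (`fake_bh`, `toy_get_block`, the extent layout) are invented for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct fake_bh {
	unsigned long b_blocknr;   /* out: first mapped disk block */
	size_t        b_size;      /* in: bytes wanted; out: bytes mapped */
};

/* A toy filesystem whose only extent maps logical blocks 0..7
 * contiguously at disk block 1000.  Returns 0 on success, -1 if the
 * requested block lies beyond the extent (hole/EOF). */
static int toy_get_block(unsigned long iblock, struct fake_bh *bh,
			 unsigned blkbits)
{
	unsigned long max_blocks = bh->b_size >> blkbits; /* old argument */
	unsigned long extent_end = 8;
	unsigned long avail;

	if (iblock >= extent_end)
		return -1;

	/* Map as much as we can, but never more than the caller asked for. */
	avail = extent_end - iblock;
	if (avail > max_blocks)
		avail = max_blocks;

	bh->b_blocknr = 1000 + iblock;
	bh->b_size    = avail << blkbits;  /* report the mapping size back */
	return 0;
}
```

With 4K blocks (`blkbits = 12`), a caller asking for 16 blocks starting at logical block 2 gets back 6 blocks (the remainder of the extent), starting at disk block 1002 — the same in/out protocol the DIO and mpage paths use after this series.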
* Re: [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages()
  2006-02-27 21:20 [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Badari Pulavarty
                   ` (3 preceding siblings ...)
  2006-02-27 21:24 ` [PATCH 4/4] remove ->get_blocks() support Badari Pulavarty
@ 2006-03-02  1:52 ` Andrew Morton
  2006-03-03 17:05   ` Badari Pulavarty
  4 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2006-03-02 1:52 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-kernel, pbadari

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> I noticed decent improvements (reduced sys time) on JFS, XFS and ext3.
> (on simple "dd" read tests).
>
>		(rc3.mm1)	(rc3.mm1 + patches)
> real	0m18.814s	0m18.482s
> user	0m0.000s	0m0.004s
> sys	0m3.240s	0m2.912s

With which filesystem?  XFS and JFS implement a larger-than-b_size
->get_block, but ext3 doesn't.  I'd expect ext3 system time to increase a
bit, if anything?
* Re: [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages()
  2006-03-02  1:52 ` [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Andrew Morton
@ 2006-03-03 17:05   ` Badari Pulavarty
  2006-03-03 20:32     ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Badari Pulavarty @ 2006-03-03 17:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml

On Wed, 2006-03-01 at 17:52 -0800, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > I noticed decent improvements (reduced sys time) on JFS, XFS and ext3.
> > (on simple "dd" read tests).
> >
> >		(rc3.mm1)	(rc3.mm1 + patches)
> > real	0m18.814s	0m18.482s
> > user	0m0.000s	0m0.004s
> > sys	0m3.240s	0m2.912s
>
> With which filesystem?  XFS and JFS implement a larger-than-b_size
> ->get_block, but ext3 doesn't.  I'd expect ext3 system time to increase a
> bit, if anything?

These numbers are on JFS. With the current ext3 (mainline), I did see a
not-really-noticeable increase in sys time (due to code overhead).

I also tested on ext3 with Mingming's ext3 getblocks() support in -mm,
which showed a reduction in sys time.

Thanks,
Badari
* Re: [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages()
  2006-03-03 17:05 ` Badari Pulavarty
@ 2006-03-03 20:32   ` Andrew Morton
  2006-03-03 21:54     ` Badari Pulavarty
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2006-03-03 20:32 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-kernel

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> On Wed, 2006-03-01 at 17:52 -0800, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > I noticed decent improvements (reduced sys time) on JFS, XFS and ext3.
> > > (on simple "dd" read tests).
> > >
> > >		(rc3.mm1)	(rc3.mm1 + patches)
> > > real	0m18.814s	0m18.482s
> > > user	0m0.000s	0m0.004s
> > > sys	0m3.240s	0m2.912s
> >
> > With which filesystem?  XFS and JFS implement a larger-than-b_size
> > ->get_block, but ext3 doesn't.  I'd expect ext3 system time to increase a
> > bit, if anything?
>
> These numbers are on JFS. With the current ext3 (mainline), I did see a
> not-really-noticeable increase in sys time (due to code overhead).
>
> I also tested on ext3 with Mingming's ext3 getblocks() support in -mm,
> which showed a reduction in sys time.

OK, no surprises there.  Things will improve when someone gets around to
doing multi-block get_block for ext3.
* Re: [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages()
  2006-03-03 20:32 ` Andrew Morton
@ 2006-03-03 21:54   ` Badari Pulavarty
  0 siblings, 0 replies; 18+ messages in thread
From: Badari Pulavarty @ 2006-03-03 21:54 UTC (permalink / raw)
  To: Andrew Morton, mcao; +Cc: lkml

[-- Attachment #1: Type: text/plain, Size: 1217 bytes --]

On Fri, 2006-03-03 at 12:32 -0800, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > On Wed, 2006-03-01 at 17:52 -0800, Andrew Morton wrote:
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > >
> > > > I noticed decent improvements (reduced sys time) on JFS, XFS and ext3.
> > > > (on simple "dd" read tests).
> > > >
> > > >		(rc3.mm1)	(rc3.mm1 + patches)
> > > > real	0m18.814s	0m18.482s
> > > > user	0m0.000s	0m0.004s
> > > > sys	0m3.240s	0m2.912s
> > >
> > > With which filesystem?  XFS and JFS implement a larger-than-b_size
> > > ->get_block, but ext3 doesn't.  I'd expect ext3 system time to increase a
> > > bit, if anything?
> >
> > These numbers are on JFS. With the current ext3 (mainline), I did see a
> > not-really-noticeable increase in sys time (due to code overhead).
> >
> > I tested on ext3 with Mingming's ext3 getblocks() support in -mm also,
> > which showed a reduction in sys time.
>
> OK, no surprises there.  Things will improve when someone gets around to
> doing multi-block get_block for ext3.

Well, I already did this when I tested Mingming's multiblock work for
ext3 (-mm only patch).

Thanks,
Badari

[-- Attachment #2: ext3-getblocks.patch --]
[-- Type: text/x-patch, Size: 1738 bytes --]

Mingming Cao recently added multi-block allocation support for ext3,
currently used only by DIO. I added support to map multiple blocks for
mpage_readpages(). This patch adds support for ext3_get_block() to deal
with multi-block mapping. Basically it renames ext3_direct_io_get_blocks()
as ext3_get_block().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

Index: linux-2.6.16-rc5-mm2/fs/ext3/inode.c
===================================================================
--- linux-2.6.16-rc5-mm2.orig/fs/ext3/inode.c	2006-03-03 13:46:22.000000000 -0800
+++ linux-2.6.16-rc5-mm2/fs/ext3/inode.c	2006-03-03 13:52:15.000000000 -0800
@@ -936,9 +936,8 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int
-ext3_direct_io_get_blocks(struct inode *inode, sector_t iblock,
-			struct buffer_head *bh_result, int create)
+static int ext3_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();
 	int ret = 0;
@@ -987,12 +986,6 @@ get_block:
 	return ret;
 }
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
-			struct buffer_head *bh_result, int create)
-{
-	return ext3_direct_io_get_blocks(inode, iblock, bh_result, create);
-}
-
 /*
  * `handle' can be NULL if create is zero
  */
@@ -1645,10 +1638,10 @@ static ssize_t ext3_direct_IO(int rw, st
 
 	ret = blockdev_direct_IO(rw, iocb, inode,
 				 inode->i_sb->s_bdev, iov, offset, nr_segs,
-				 ext3_direct_io_get_blocks, NULL);
+				 ext3_get_block, NULL);
 
 	/*
-	 * Reacquire the handle: ext3_direct_io_get_block() can restart the
+	 * Reacquire the handle: ext3_get_block() can restart the
 	 * transaction
 	 */
	handle = journal_current_handle();
end of thread, other threads: [~2006-03-06 19:54 UTC | newest]

Thread overview: 18+ messages -- links below jump to the message on this page:

2006-02-27 21:20 [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Badari Pulavarty
2006-02-27 21:22 ` [PATCH 1/4] change buffer_head.b_size to size_t Badari Pulavarty
2006-03-02  1:51   ` Andrew Morton
2006-03-02  1:51   ` Andrew Morton
2006-03-02  2:22     ` Nathan Scott
2006-03-02  2:56       ` Andrew Morton
2006-03-06 19:54         ` Tim Pepper
2006-02-27 21:23 ` [PATCH 2/4] pass b_size to ->get_block() Badari Pulavarty
2006-03-02  1:52   ` Andrew Morton
2006-03-03 17:17     ` Badari Pulavarty
2006-02-27 21:24 ` [PATCH 3/4] map multiple blocks for mpage_readpages() Badari Pulavarty
2006-03-02  4:45   ` Andrew Morton
2006-03-03 16:51     ` Badari Pulavarty
2006-02-27 21:24 ` [PATCH 4/4] remove ->get_blocks() support Badari Pulavarty
2006-03-02  1:52 ` [PATCH 0/4] map multiple blocks in get_block() and mpage_readpages() Andrew Morton
2006-03-03 17:05   ` Badari Pulavarty
2006-03-03 20:32     ` Andrew Morton
2006-03-03 21:54       ` Badari Pulavarty