Re: [PATCH for 3.2] fs/direct-io.c: Calculate fs_count correctly in get_more

public inbox for linux-kernel@vger.kernel.org

* Re: [PATCH for 3.2] fs/direct-io.c: Calculate fs_count correctly in get_more_blocks.
       [not found]   ` <x49wrblav46.fsf@segfault.boston.devel.redhat.com>
@ 2011-11-01  3:31     ` Tao Ma
  2011-11-02  2:26     ` [PATCH V2 " Tao Ma
  1 sibling, 0 replies; 4+ messages in thread
From: Tao Ma @ 2011-11-01  3:31 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Al Viro,
	Andrew Morton

On 11/01/2011 02:12 AM, Jeff Moyer wrote:
> Tao Ma <tm@tao.ma> writes:
> 
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In get_more_blocks, we use dio_count to calculate fs_count and do some
>> tricky things to increase fs_count if dio_count isn't aligned. But
>> it still has some corner cases that aren't covered. See the
>> following example:
>> ./dio_write foo -s 1024 -w 4096 (direct write of 4096 bytes at offset 1024).
>> The same goes for any offset that isn't aligned to fs_blocksize.
>>
>> In this case, the old calculation sets fs_count to 1, but we will
>> actually write into 2 different blocks (if fs_blocksize=4096). The old code
>> still works, since it calls get_block twice (and may have to allocate
>> and create extents twice for file systems like ext4). So we'd better call
>> get_block just once with the proper fs_count.
> 
> This description was *really* hard for me to understand.  It seems to me
> that right now there's an inefficiency in the code.  It's not clear
> whether you're claiming that it was introduced recently, though.  Was
> it, or has this problem been around for a while?
Actually, it has been there for a long time. The good news is that it
isn't a bug, only some performance overhead.
> 
> How did you notice this?  Was there any evidence of a problem, such as
> performance overhead or less than ideal file layout?
I found it while digging into some ext4 issues. ext4 can't create the
whole 8K (the two blocks in the above case) in one go, so it has to create
blocks twice for just one direct I/O write. In some of our tests, this
has a noticeable cost.

> 
> Anyway, I agree that the code does not correctly calculate the number of
> file system blocks in a request.  I also agree that your patch fixes
> that issue.
> 
> Please amend the description and then you can add my:
So how about the following commit log (please feel free to modify it if I
still haven't described it correctly)?

In get_more_blocks, we use dio_count to calculate fs_count so that the
file system can map (and maybe also create) blocks. Some tricky things are
done to increase fs_count if dio_count isn't aligned.

But it still has some corner cases that aren't covered. See the
following example:
./dio_write foo -s 1024 -w 4096 (direct write of 4096 bytes at offset 1024).

In this case, the old calculation sets fs_count to 1, but we will
actually write into 2 different blocks (if fs_blocksize=4096). So the
underlying file system is called twice, which leads to some performance
overhead. Fix it by calculating fs_count correctly, so that the file
system knows what we really want to write.
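
To make the arithmetic concrete, here is a minimal userspace sketch of the
two calculations. It is not the kernel code itself: the helper names
fs_count_old and fs_count_new are made up for illustration, and the numbers
below assume 512-byte dio blocks with a 4096-byte fs block, i.e.
blkfactor = 3.

```c
/*
 * Userspace sketch of the fs_count calculation in get_more_blocks().
 * The function names are hypothetical; the parameters mirror the
 * sdio->block_in_file, sdio->final_block_in_request and sdio->blkfactor
 * fields, all expressed in dio-block units.
 */

/* Old calculation: round the number of dio blocks up to whole fs blocks. */
static unsigned long fs_count_old(unsigned long block_in_file,
				  unsigned long final_block_in_request,
				  unsigned int blkfactor)
{
	unsigned long dio_count = final_block_in_request - block_in_file;
	unsigned long fs_count = dio_count >> blkfactor;
	unsigned long blkmask = (1UL << blkfactor) - 1;

	/* Only the length is checked, the start offset is ignored. */
	if (dio_count & blkmask)
		fs_count++;
	return fs_count;
}

/* New calculation: count the fs blocks from the first to the last one touched. */
static unsigned long fs_count_new(unsigned long block_in_file,
				  unsigned long final_block_in_request,
				  unsigned int blkfactor)
{
	unsigned long fs_startblk = block_in_file >> blkfactor;
	unsigned long fs_endblk = (final_block_in_request - 1) >> blkfactor;

	return fs_endblk - fs_startblk + 1;
}
```

Under those assumptions, the write of 4096 bytes at offset 1024 covers dio
blocks 2 through 9, so block_in_file = 2 and final_block_in_request = 10.
The old helper returns 1 (dio_count is 8, which is a multiple of 8, so no
round-up happens), while the new helper returns 2 (fs blocks 0 and 1). For
an aligned request such as block_in_file = 0, final_block_in_request = 8,
both return 1.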

Thanks
Tao

> 
> Acked-by: Jeff Moyer <jmoyer@redhat.com>
> 
> Cheers,
> Jeff



* [PATCH V2 for 3.2] fs/direct-io.c: Calculate fs_count correctly in get_more_blocks.
       [not found]   ` <x49wrblav46.fsf@segfault.boston.devel.redhat.com>
  2011-11-01  3:31     ` [PATCH for 3.2] fs/direct-io.c: Calculate fs_count correctly in get_more_blocks Tao Ma
@ 2011-11-02  2:26     ` Tao Ma
  2011-11-02  7:36       ` Christoph Hellwig
  1 sibling, 1 reply; 4+ messages in thread
From: Tao Ma @ 2011-11-02  2:26 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Christoph Hellwig, Al Viro, Andrew Morton

From: Tao Ma <boyu.mt@taobao.com>

In get_more_blocks, we use dio_count to calculate fs_count so that the
file system can map (and maybe also create) blocks. Some tricky things are
done to increase fs_count if dio_count isn't aligned.

But it still has some corner cases that aren't covered. See the
following example:
./dio_write foo -s 1024 -w 4096 (direct write of 4096 bytes at offset 1024).

In this case, the old calculation sets fs_count to 1, but we will
actually write into 2 different blocks (if fs_blocksize=4096). So the
underlying file system is called twice, which leads to some performance
overhead. Fix it by calculating fs_count correctly, so that the file
system knows what we really want to write.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/direct-io.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index d740ab6..5582183 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -580,9 +580,8 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
 {
 	int ret;
 	sector_t fs_startblk;	/* Into file, in filesystem-sized blocks */
+	sector_t fs_endblk;	/* Into file, in filesystem-sized blocks */
 	unsigned long fs_count;	/* Number of filesystem-sized blocks */
-	unsigned long dio_count;/* Number of dio_block-sized blocks */
-	unsigned long blkmask;
 	int create;
 
 	/*
@@ -593,11 +592,9 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
 	if (ret == 0) {
 		BUG_ON(sdio->block_in_file >= sdio->final_block_in_request);
 		fs_startblk = sdio->block_in_file >> sdio->blkfactor;
-		dio_count = sdio->final_block_in_request - sdio->block_in_file;
-		fs_count = dio_count >> sdio->blkfactor;
-		blkmask = (1 << sdio->blkfactor) - 1;
-		if (dio_count & blkmask)	
-			fs_count++;
+		fs_endblk = (sdio->final_block_in_request - 1) >>
+				sdio->blkfactor;
+		fs_count = fs_endblk - fs_startblk + 1;
 
 		map_bh->b_state = 0;
 		map_bh->b_size = fs_count << dio->inode->i_blkbits;
-- 
1.7.1



* Re: [PATCH V2 for 3.2] fs/direct-io.c: Calculate fs_count correctly in get_more_blocks.
  2011-11-02  2:26     ` [PATCH V2 " Tao Ma
@ 2011-11-02  7:36       ` Christoph Hellwig
  2011-11-03  3:21         ` Tao Ma
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2011-11-02  7:36 UTC (permalink / raw)
  To: Tao Ma
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Al Viro,
	Andrew Morton

The patch looks good to me, but given that it neither fixes a bug
nor a regression I'd probably call it 3.3 material at this point.



* Re: [PATCH V2 for 3.2] fs/direct-io.c: Calculate fs_count correctly in get_more_blocks.
  2011-11-02  7:36       ` Christoph Hellwig
@ 2011-11-03  3:21         ` Tao Ma
  0 siblings, 0 replies; 4+ messages in thread
From: Tao Ma @ 2011-11-03  3:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel, linux-kernel, Christoph Hellwig, Al Viro,
	Andrew Morton

On 11/02/2011 03:36 PM, Christoph Hellwig wrote:
> The patch looks good to me, but given that it neither fixes a bug
> nor a regression I'd probably call it 3.3 material at this point.
OK, so will you take it, or should I ask Andrew to take it for now?

Thanks
Tao
