linux-fsdevel.vger.kernel.org archive mirror
* write is faster than seek?
@ 2008-06-11  7:20 Dmitri Monakhov
  2008-06-11  7:48 ` Jens Axboe
  2008-06-11 11:38 ` Alan D. Brunelle
  0 siblings, 2 replies; 11+ messages in thread
From: Dmitri Monakhov @ 2008-06-11  7:20 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jens.axboe

I've found that any break in a contiguous write sequence results in a
significant performance drawback. I have two types of requests:
1) Ideally sequential writes:
   for(i=0;i<num;i++) {
       write(fd, chunk, page_size*32);
   }
   fsync(fd);

2) Sequential writes with a gap at each 32nd page:
   for(i=0;i<num;i++) {
       write(fd, chunk, page_size*31);
       lseek(fd, page_size, SEEK_CUR);
   }
   fsync(fd);

I've found that the second IO pattern is about two times slower than the first
one, regardless of the IO scheduler or the HW disk. It is not clear to me why
this happens. Is it Linux-specific, or general hardware behaviour?
I naively expected that the disk hardware could merge several 31-page requests
into a contiguous one by filling the holes with some sort of dummy activity.
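(page_size is 4 KB here, so the first case issues 128 KB writes back to back,
while the second issues 124 KB writes each followed by a 4 KB hole.)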
        


* Re: write is faster than seek?
  2008-06-11  7:20 write is faster than seek? Dmitri Monakhov
@ 2008-06-11  7:48 ` Jens Axboe
  2008-06-11  8:11   ` Dmitri Monakhov
  2008-06-11 11:38 ` Alan D. Brunelle
  1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2008-06-11  7:48 UTC (permalink / raw)
  To: Dmitri Monakhov; +Cc: linux-fsdevel

On Wed, Jun 11 2008, Dmitri Monakhov wrote:
> I've found that any break in a contiguous write sequence results in a
> significant performance drawback. I have two types of requests:
> 1) Ideally sequential writes:
>    for(i=0;i<num;i++) {
>        write(fd, chunk, page_size*32);
>    }
>    fsync(fd);
> 
> 2) Sequential writes with a gap at each 32nd page:
>    for(i=0;i<num;i++) {
>        write(fd, chunk, page_size*31);
>        lseek(fd, page_size, SEEK_CUR);
>    }
>    fsync(fd);
> 
> I've found that the second IO pattern is about two times slower than the
> first one, regardless of the IO scheduler or the HW disk. It is not clear to
> me why this happens. Is it Linux-specific, or general hardware behaviour?
> I naively expected that the disk hardware could merge several 31-page
> requests into a contiguous one by filling the holes with some sort of dummy
> activity.

Performance should be about the same. The first is always going to be a
little faster, on some hardware probably quite a bit. Are you using
write back caching on the drive? I ran a quick test here, and the second
test is about ~5% slower on this drive.
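(hdparm -W /dev/sda should show the current write-caching setting, and
hdparm -W 0 /dev/sda turns it off, if I remember the flags right.)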

-- 
Jens Axboe



* Re: write is faster than seek?
  2008-06-11  7:48 ` Jens Axboe
@ 2008-06-11  8:11   ` Dmitri Monakhov
  2008-06-11  8:26     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitri Monakhov @ 2008-06-11  8:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2161 bytes --]

Jens Axboe <jens.axboe@oracle.com> writes:

> On Wed, Jun 11 2008, Dmitri Monakhov wrote:
>> I've found that any break in a contiguous write sequence results in a
>> significant performance drawback. I have two types of requests:
>> 1) Ideally sequential writes:
>>    for(i=0;i<num;i++) {
>>        write(fd, chunk, page_size*32);
>>    }
>>    fsync(fd);
>> 
>> 2) Sequential writes with a gap at each 32nd page:
>>    for(i=0;i<num;i++) {
>>        write(fd, chunk, page_size*31);
>>        lseek(fd, page_size, SEEK_CUR);
>>    }
>>    fsync(fd);
>> 
>> I've found that the second IO pattern is about two times slower than the
>> first one, regardless of the IO scheduler or the HW disk. It is not clear to
>> me why this happens. Is it Linux-specific, or general hardware behaviour?
>> I naively expected that the disk hardware could merge several 31-page
>> requests into a contiguous one by filling the holes with some sort of dummy
>> activity.
>
> Performance should be about the same. The first is always going to be a
> little faster, on some hardware probably quite a bit. Are you using
> write back caching on the drive? I ran a quick test here, and the second
> test is about ~5% slower on this drive.
Hmmm... that is definitely not what happens in my case.
I've tested the following SATA drive, with and without write cache (AHCI):
ata7.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
ata7.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32)
In all cases it is about two times slower.

# time   /tmp/wr_test /dev/sda3 32 0 800
real    0m1.183s
user    0m0.002s
sys     0m0.079s

# time   /tmp/wr_test /dev/sda3 31 1 800
real    0m3.240s
user    0m0.000s
sys     0m0.078s

The other disk (SCSI) is about 1.5 times slower:
  Vendor: FUJITSU   Model: MAW3073NP         Rev: 0104
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target2:0:4: Beginning Domain Validation
 target2:0:4: Ending Domain Validation
 target2:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 127)
SCSI device sde: 143638992 512-byte hdwr sectors (73543 MB)
sde: Write Protect is off
sde: Mode Sense: b3 00 00 08
SCSI device sde: drive cache: write back

test source:

[-- Attachment #2: wr_test.c --]
[-- Type: text/plain, Size: 424 bytes --]

/* Usage: wr_test <file> <chunk_pages> <gap_pages> <count>
 * Writes <count> chunks of <chunk_pages> * 4096 bytes each, seeking forward
 * <gap_pages> * 4096 bytes after every write, then fsync()s the result. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char** argv)
{
	char *buf;
	long chunk, off, num, i;
	int fd;

	if (argc != 5) {
		fprintf(stderr, "usage: %s <file> <chunk_pages> <gap_pages> <count>\n",
			argv[0]);
		return 1;
	}
	chunk = atol(argv[2]) * 4096;
	off = atol(argv[3]) * 4096;
	num = atol(argv[4]);
	buf = malloc(chunk);
	fd = open(argv[1], O_RDWR|O_CREAT, 0777);
	if (!buf || fd < 0) {
		perror("setup");
		return 1;
	}
	for (i = 0; i < num; i++) {
		if (write(fd, buf, chunk) != chunk) {
			perror("write");
			return 1;
		}
		lseek(fd, off, SEEK_CUR);	/* skip the hole, if any */
	}
	return fsync(fd);
}

[-- Attachment #3: Type: text/plain, Size: 22 bytes --]


>
> -- 
> Jens Axboe


* Re: write is faster than seek?
  2008-06-11  8:11   ` Dmitri Monakhov
@ 2008-06-11  8:26     ` Jens Axboe
  2008-06-11  9:28       ` Dmitri Monakhov
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2008-06-11  8:26 UTC (permalink / raw)
  To: Dmitri Monakhov; +Cc: linux-fsdevel

On Wed, Jun 11 2008, Dmitri Monakhov wrote:
> Jens Axboe <jens.axboe@oracle.com> writes:
> 
> > On Wed, Jun 11 2008, Dmitri Monakhov wrote:
> >> I've found that any break in a contiguous write sequence results in a
> >> significant performance drawback. I have two types of requests:
> >> 1) Ideally sequential writes:
> >>    for(i=0;i<num;i++) {
> >>        write(fd, chunk, page_size*32);
> >>    }
> >>    fsync(fd);
> >> 
> >> 2) Sequential writes with a gap at each 32nd page:
> >>    for(i=0;i<num;i++) {
> >>        write(fd, chunk, page_size*31);
> >>        lseek(fd, page_size, SEEK_CUR);
> >>    }
> >>    fsync(fd);
> >> 
> >> I've found that the second IO pattern is about two times slower than the
> >> first one, regardless of the IO scheduler or the HW disk. It is not clear to
> >> me why this happens. Is it Linux-specific, or general hardware behaviour?
> >> I naively expected that the disk hardware could merge several 31-page
> >> requests into a contiguous one by filling the holes with some sort of dummy
> >> activity.
> >
> > Performance should be about the same. The first is always going to be a
> > little faster, on some hardware probably quite a bit. Are you using
> > write back caching on the drive? I ran a quick test here, and the second
> > test is about ~5% slower on this drive.
> Hmmm... that is definitely not what happens in my case.
> I've tested the following SATA drive, with and without write cache (AHCI):
> ata7.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
> ata7.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> In all cases it is about two times slower.
> 
> # time   /tmp/wr_test /dev/sda3 32 0 800
> real    0m1.183s
> user    0m0.002s
> sys     0m0.079s
> 
> # time   /tmp/wr_test /dev/sda3 31 1 800
> real    0m3.240s
> user    0m0.000s
> sys     0m0.078s

Ahh, direct to device. Try making a 4kb fs on sda3, mount it, umount
it, then rerun the test.
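
Something along these lines should do it (the fs type shouldn't matter much,
ext3 with a 4k block size is just an example):

# mkfs.ext3 -b 4096 /dev/sda3
# mount /dev/sda3 /mnt && umount /mnt
# blockdev --getbsz /dev/sda3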

-- 
Jens Axboe



* Re: write is faster than seek?
  2008-06-11  8:26     ` Jens Axboe
@ 2008-06-11  9:28       ` Dmitri Monakhov
  2008-06-11  9:38         ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitri Monakhov @ 2008-06-11  9:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel

Jens Axboe <jens.axboe@oracle.com> writes:

> On Wed, Jun 11 2008, Dmitri Monakhov wrote:
>> Jens Axboe <jens.axboe@oracle.com> writes:
>> 
>> > On Wed, Jun 11 2008, Dmitri Monakhov wrote:
>> >> I've found that any break in a contiguous write sequence results in a
>> >> significant performance drawback. I have two types of requests:
>> >> 1) Ideally sequential writes:
>> >>    for(i=0;i<num;i++) {
>> >>        write(fd, chunk, page_size*32);
>> >>    }
>> >>    fsync(fd);
>> >> 
>> >> 2) Sequential writes with a gap at each 32nd page:
>> >>    for(i=0;i<num;i++) {
>> >>        write(fd, chunk, page_size*31);
>> >>        lseek(fd, page_size, SEEK_CUR);
>> >>    }
>> >>    fsync(fd);
>> >> 
>> >> I've found that the second IO pattern is about two times slower than the
>> >> first one, regardless of the IO scheduler or the HW disk. It is not clear to
>> >> me why this happens. Is it Linux-specific, or general hardware behaviour?
>> >> I naively expected that the disk hardware could merge several 31-page
>> >> requests into a contiguous one by filling the holes with some sort of dummy
>> >> activity.
>> >
>> > Performance should be about the same. The first is always going to be a
>> > little faster, on some hardware probably quite a bit. Are you using
>> > write back caching on the drive? I ran a quick test here, and the second
>> > test is about ~5% slower on this drive.
>> Hmmm... that is definitely not what happens in my case.
>> I've tested the following SATA drive, with and without write cache (AHCI):
>> ata7.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
>> ata7.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> In all cases it is about two times slower.
>> 
>> # time   /tmp/wr_test /dev/sda3 32 0 800
>> real    0m1.183s
>> user    0m0.002s
>> sys     0m0.079s
>> 
>> # time   /tmp/wr_test /dev/sda3 31 1 800
>> real    0m3.240s
>> user    0m0.000s
>> sys     0m0.078s
>
> Ahh, direct to device. Try making a 4kb fs on sda3, mount it, umount
> it, then rerun the test.
Nope... no changes at all.
# cat /sys/block/sda/queue/scheduler
[noop] anticipatory deadline cfq
# blockdev --getbsz  /dev/sda
4096
# hdparm -W 0 /dev/sda

/dev/sda:
 setting drive write-caching to 0 (off)

# time ./wr_test /dev/sda  32 0 800
real    0m1.185s
user    0m0.000s
sys     0m0.106s

# time ./wr_test /dev/sda  31 1 800
real    0m3.391s
user    0m0.002s
sys     0m0.112s


I'll try to play with the request queue parameters.
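(e.g. /sys/block/sda/queue/nr_requests and /sys/block/sda/queue/max_sectors_kb,
unless there is something better to poke.)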
>
> -- 
> Jens Axboe


* Re: write is faster than seek?
  2008-06-11  9:28       ` Dmitri Monakhov
@ 2008-06-11  9:38         ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2008-06-11  9:38 UTC (permalink / raw)
  To: Dmitri Monakhov; +Cc: linux-fsdevel

On Wed, Jun 11 2008, Dmitri Monakhov wrote:
> Jens Axboe <jens.axboe@oracle.com> writes:
> 
> > On Wed, Jun 11 2008, Dmitri Monakhov wrote:
> >> Jens Axboe <jens.axboe@oracle.com> writes:
> >> 
> >> > On Wed, Jun 11 2008, Dmitri Monakhov wrote:
> >> >> I've found that any break in a contiguous write sequence results in a
> >> >> significant performance drawback. I have two types of requests:
> >> >> 1) Ideally sequential writes:
> >> >>    for(i=0;i<num;i++) {
> >> >>        write(fd, chunk, page_size*32);
> >> >>    }
> >> >>    fsync(fd);
> >> >> 
> >> >> 2) Sequential writes with a gap at each 32nd page:
> >> >>    for(i=0;i<num;i++) {
> >> >>        write(fd, chunk, page_size*31);
> >> >>        lseek(fd, page_size, SEEK_CUR);
> >> >>    }
> >> >>    fsync(fd);
> >> >> 
> >> >> I've found that the second IO pattern is about two times slower than the
> >> >> first one, regardless of the IO scheduler or the HW disk. It is not clear to
> >> >> me why this happens. Is it Linux-specific, or general hardware behaviour?
> >> >> I naively expected that the disk hardware could merge several 31-page
> >> >> requests into a contiguous one by filling the holes with some sort of dummy
> >> >> activity.
> >> >
> >> > Performance should be about the same. The first is always going to be a
> >> > little faster, on some hardware probably quite a bit. Are you using
> >> > write back caching on the drive? I ran a quick test here, and the second
> >> > test is about ~5% slower on this drive.
> >> Hmmm... that is definitely not what happens in my case.
> >> I've tested the following SATA drive, with and without write cache (AHCI):
> >> ata7.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
> >> ata7.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> >> In all cases it is about two times slower.
> >> 
> >> # time   /tmp/wr_test /dev/sda3 32 0 800
> >> real    0m1.183s
> >> user    0m0.002s
> >> sys     0m0.079s
> >> 
> >> # time   /tmp/wr_test /dev/sda3 31 1 800
> >> real    0m3.240s
> >> user    0m0.000s
> >> sys     0m0.078s
> >
> > Ahh, direct to device. Try making a 4kb fs on sda3, mount it, umount
> > it, then rerun the test.
> Nope... no changes at all.
> # cat /sys/block/sda/queue/scheduler
> [noop] anticipatory deadline cfq
> # blockdev --getbsz  /dev/sda
> 4096
> # hdparm -W 0 /dev/sda
> 
> /dev/sda:
>  setting drive write-caching to 0 (off)
> 
> # time ./wr_test /dev/sda  32 0 800
> real    0m1.185s
> user    0m0.000s
> sys     0m0.106s
> 
> # time ./wr_test /dev/sda  31 1 800
> real    0m3.391s
> user    0m0.002s
> sys     0m0.112s
> 
> 
> I'll try to play with the request queue parameters.

Try finding out whether the issued IO pattern is OK first; you should
understand the problem before you attempt to fix it. Run blktrace on
/dev/sda while you do the two tests and compare the blkparse output!
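
Roughly (the "seq" output name is arbitrary; stop blktrace however you like):

# blktrace -d /dev/sda -o seq &
# ./wr_test /dev/sda 32 0 800
# kill %1
# blkparse -i seq > seq.out

and the same for the 31/1 run, then compare the two outputs.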

-- 
Jens Axboe



* Re: write is faster than seek?
  2008-06-11  7:20 write is faster than seek? Dmitri Monakhov
  2008-06-11  7:48 ` Jens Axboe
@ 2008-06-11 11:38 ` Alan D. Brunelle
  2008-06-11 11:49   ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Alan D. Brunelle @ 2008-06-11 11:38 UTC (permalink / raw)
  To: Dmitri Monakhov; +Cc: linux-fsdevel, jens.axboe

Dmitri Monakhov wrote:

Could it be that in the first case you will have merges, thus creating
fewer/larger I/O requests? Running iostat -x during the two runs, and
watching the output is a good first place to start.
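
For instance, run this in another terminal during each test (wrqm/s shows
write request merges, avgrq-sz is the average request size in sectors):

# iostat -x 1

If the first run shows heavy merging and large requests while the second one
does not, that already explains part of the gap.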

Alan



* Re: write is faster than seek?
  2008-06-11 11:38 ` Alan D. Brunelle
@ 2008-06-11 11:49   ` Jens Axboe
  2008-06-11 11:52     ` Alan D. Brunelle
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2008-06-11 11:49 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Dmitri Monakhov, linux-fsdevel

On Wed, Jun 11 2008, Alan D. Brunelle wrote:
> Dmitri Monakhov wrote:
> 
> Could it be that in the first case you will have merges, thus creating
> fewer/larger I/O requests? Running iostat -x during the two runs, and
> watching the output is a good first place to start.

I think it's mostly down to whether a specific drive is good at doing
124kb writes + 4k seek (and repeat) compared to regular streaming
writes. The tested disk was SATA with write back caching, there should
be no real command overhead gain in those size ranges.

-- 
Jens Axboe



* Re: write is faster than seek?
  2008-06-11 11:49   ` Jens Axboe
@ 2008-06-11 11:52     ` Alan D. Brunelle
  2008-06-11 11:55       ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Alan D. Brunelle @ 2008-06-11 11:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Dmitri Monakhov, linux-fsdevel

Jens Axboe wrote:
> On Wed, Jun 11 2008, Alan D. Brunelle wrote:
>> Dmitri Monakhov wrote:
>>
>> Could it be that in the first case you will have merges, thus creating
>> fewer/larger I/O requests? Running iostat -x during the two runs, and
>> watching the output is a good first place to start.
> 
> I think it's mostly down to whether a specific drive is good at doing
> 124kb writes + 4k seek (and repeat) compared to regular streaming
> writes. The tested disk was SATA with write back caching, there should
> be no real command overhead gain in those size ranges.
> 

Probably true, I'd think the iostat -x data would be very helpful though.


* Re: write is faster than seek?
  2008-06-11 11:52     ` Alan D. Brunelle
@ 2008-06-11 11:55       ` Jens Axboe
  2008-06-16 12:14         ` Dmitri Monakhov
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2008-06-11 11:55 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Dmitri Monakhov, linux-fsdevel

On Wed, Jun 11 2008, Alan D. Brunelle wrote:
> Jens Axboe wrote:
> > On Wed, Jun 11 2008, Alan D. Brunelle wrote:
> >> Dmitri Monakhov wrote:
> >>
> >> Could it be that in the first case you will have merges, thus creating
> >> fewer/larger I/O requests? Running iostat -x during the two runs, and
> >> watching the output is a good first place to start.
> > 
> > I think it's mostly down to whether a specific drive is good at doing
> > 124kb writes + 4k seek (and repeat) compared to regular streaming
> > writes. The tested disk was SATA with write back caching, there should
> > be no real command overhead gain in those size ranges.
> > 
> 
> Probably true, I'd think the iostat -x data would be very helpful though.

Definitely, the more data the better :). I already asked for blktrace
data, which should give us everything we need.

-- 
Jens Axboe



* Re: write is faster than seek?
  2008-06-11 11:55       ` Jens Axboe
@ 2008-06-16 12:14         ` Dmitri Monakhov
  0 siblings, 0 replies; 11+ messages in thread
From: Dmitri Monakhov @ 2008-06-16 12:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Alan D. Brunelle, linux-fsdevel

Jens Axboe <jens.axboe@oracle.com> writes:

> On Wed, Jun 11 2008, Alan D. Brunelle wrote:
>> Jens Axboe wrote:
>> > On Wed, Jun 11 2008, Alan D. Brunelle wrote:
>> >> Dmitri Monakhov wrote:
>> >>
>> >> Could it be that in the first case you will have merges, thus creating
>> >> fewer/larger I/O requests? Running iostat -x during the two runs, and
>> >> watching the output is a good first place to start.
>> > 
>> > I think it's mostly down to whether a specific drive is good at doing
>> > 124kb writes + 4k seek (and repeat) compared to regular streaming
>> > writes. The tested disk was SATA with write back caching, there should
>> > be no real command overhead gain in those size ranges.
>> > 
>> 
>> Probably true, I'd think the iostat -x data would be very helpful though.
>
> Definitely, the more data the better :). I already asked for blktrace
> data, that should give us everything we need.
It seems to be a hardware issue.
Test:
I've disabled the request merging logic in __make_request() and restricted
bio->bi_size to <= 128 sectors via merge_bvec_fn. IO scheduler: noop.
IO patterns tested:
1) CW (continuous writes)  write(,, PAGE_SIZE*16)
2) WH (writes with holes)  write(,, PAGE_SIZE*15); lseek(, PAGE_SIZE, SEEK_CUR)
3) CR (continuous reads)   read(,, PAGE_SIZE*16)
4) RH (reads with holes)   same as WH but sending bios directly, in order to
   explicitly bypass the read-ahead logic.

I've tested a SATA disk with NCQ on AHCI, and a SCSI disk.

Result for the SATA disk:
The performance drawback caused by the restricted bio size was negligible for
all IO patterns, so this is definitely not a queue starvation issue. The BIOs
submitted by pdflush were ordered in all cases (as expected). For all IO
patterns except WH, the driver completions were also ordered. But for the WH
pattern the completions seem to go crazy:
 Dispatched requests:
  8,0    1       14     0.000050684  3485  D   W 0 + 128 [pdflush]
  8,0    1       15     0.000055906  3485  D   W 136 + 128 [pdflush]
  8,0    1       16     0.000059269  3485  D   W 272 + 128 [pdflush]
  8,0    1       17     0.000062625  3485  D   W 408 + 128 [pdflush]
  8,0    1       31     0.000133306  3485  D   W 544 + 128 [pdflush]
  8,0    1       32     0.000136043  3485  D   W 680 + 128 [pdflush]
  8,0    1       33     0.000140446  3485  D   W 816 + 128 [pdflush]
  8,0    1       34     0.000142961  3485  D   W 952 + 128 [pdflush]
  8,0    1       48     0.000204734  3485  D   W 1088 + 128 [pdflush]
  8,0    1       49     0.000207358  3485  D   W 1224 + 128 [pdflush]
  8,0    1       50     0.000209505  3485  D   W 1360 + 128 [pdflush]
  ....
 Completed requests:
  8,0    0        1     0.045342874  3907  C   W 2856 + 128 [0]
  8,0    0        3     0.045374650  3907  C   W 2992 + 128 [0]
  8,0    0        5     0.057461715     0  C   W 1768 + 128 [0]
  8,0    0        7     0.057491967     0  C   W 1904 + 128 [0]
  8,0    0        9     0.060058695     0  C   W 680 + 128 [0]
  8,0    0       11     0.060075666     0  C   W 816 + 128 [0]
  8,0    0       13     0.063015540     0  C   W 1360 + 128 [0]
  8,0    0       15     0.063028859     0  C   W 1496 + 128 [0]
  8,0    0       17     0.073802939     0  C   W 3672 + 128 [0]
  8,0    0       19     0.073817422     0  C   W 3808 + 128 [0]
  8,0    0       21     0.075664013     0  C   W 544 + 128 [0]
  8,0    0       23     0.078348416     0  C   W 1088 + 128 [0]
  8,0    0       25     0.078362380     0  C   W 1224 + 128 [0]
  8,0    0       27     0.089371470     0  C   W 3400 + 128 [0]
  8,0    0       29     0.089385247     0  C   W 3536 + 128 [0]
  8,0    0       31     0.092328327     0  C   W 272 + 128 [0]
  ....
As you can see, the completions arrive in semi-random order. This happens
regardless of whether the hardware write cache is enabled or disabled, so this
is the hardware's fault.
Note: I got the same performance drawback on a Mac mini running Mac OS.

Results for the SCSI disk (bio size was restricted to 256 sectors):
All requests were dispatched and completed in order, but for some unknown
reason it takes longer to serve the "writes with holes" requests.

Disk request completion timeline comparison (time in seconds; "sector + nsectors"):
write(,, 32*PG_SZ)        || write(,, 31*PG_SZ); lseek(, PG_SZ, SEEK_CUR)
--------------------------++---------------------------------
time       sector         ||    time      sector
--------------------------++---------------------------------
0.001028   131072 + 96    ||  0.001020   131072 + 96 
0.010916   131176 + 256   ||  0.015471   131176 + 152
0.018810   131432 + 256   ||  0.022863   131336 + 248
0.020248   131688 + 256   ||  0.024771   131592 + 248
0.021674   131944 + 256   ||  0.031986   131848 + 248
0.023090   132200 + 256   ||  0.039276   132104 + 248
0.024575   132456 + 256   ||  0.046587   132360 + 248
0.026069   132712 + 256   ||  0.054503   132616 + 248
0.027566   132968 + 256   ||  0.061797   132872 + 248
0.029063   133224 + 256   ||  0.069087   133128 + 248
0.030558   133480 + 256   ||  0.076388   133384 + 248
0.032053   133736 + 256   ||  0.083756   133640 + 248
0.033544   133992 + 256   ||  0.085657   133896 + 248
0.035042   134248 + 256   ||  0.092878   134152 + 248
0.036518   134504 + 256   ||  0.100176   134408 + 248
0.038009   134760 + 256   ||  0.107473   134664 + 248
0.039510   135016 + 256   ||  0.115323   134920 + 248
0.041005   135272 + 256   ||  0.122638   135176 + 248
0.042500   135528 + 256   ||  0.129933   135432 + 248
0.043992   135784 + 256   ||  0.137224   135688 + 248
IMHO it is also a hardware issue.
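(Speculation: in the left column completions arrive roughly 1.4-1.5 ms apart,
in the right column roughly 7.3 ms apart, i.e. about 6 ms extra per request.
If this is a 10k RPM drive, that is one full revolution, as if the disk has to
wait a whole turn every time it skips the 4 KB hole instead of streaming
straight through.)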

>
> -- 
> Jens Axboe

