Question about FIO sequential writes

* Question about FIO sequential writes
From: Xiaofei Du @ 2014-04-15 22:14 UTC
  To: axboe; +Cc: fio, aaronc

Dear Jens,

I have a couple of questions about FIO.

First question: with the default ioengine and direct IO enabled, on a hard
drive with the write cache disabled, I get random writes that are faster
than sequential writes. I can't explain this. Can you help me understand
the reason? Could it be a bug?

Second question: with the libaio ioengine and direct IO enabled, on a hard
drive with the write cache disabled, I get very high IOPS for sequential
writes if I keep increasing the iodepth. The numbers seem unreasonable: up
to 16000 IOPS for 4k blocks. Is there anything wrong with sequential writes
in FIO?

Thanks a lot for your help.

Best,
Xiaofei

* Re: Question about FIO sequential writes
From: Jens Axboe @ 2014-04-15 22:24 UTC
  To: Xiaofei Du; +Cc: fio, aaronc

On 2014-04-15 16:14, Xiaofei Du wrote:
> Dear Jens,
>
> I have a couple of questions about FIO.
>
> First question: with the default ioengine and direct IO enabled, on a
> hard drive with the write cache disabled, I get random writes that are
> faster than sequential writes. I can't explain this. Can you help me
> understand the reason? Could it be a bug?

How big of a difference are you seeing? Depending on the time and size 
of the seek, the random write could be faster. The sequential write in 
this case has to wait for a full revolution of the platters before the 
head is positioned correctly again. For a random write, statistically 
you would have to wait for half a revolution. But you have seek time for 
that case too, so it depends on the IO pattern.

So it might be useful to include the job you ran and the outputs.

> Second question: with the libaio ioengine and direct IO enabled, on a
> hard drive with the write cache disabled, I get very high IOPS for
> sequential writes if I keep increasing the iodepth. The numbers seem
> unreasonable: up to 16000 IOPS for 4k blocks. Is there anything wrong
> with sequential writes in FIO?

If you go to a higher depth, the OS/driver may merge sequential IO. So 
this means that fio might be issuing 4K IOs and calculating IOPS based 
on that, but further down they are coalesced and (for instance) 128 4K 
writes are submitted and completed by the drive as a single 512K write.
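
The arithmetic behind this can be sketched in a few lines of Python. The merge factor of 128 is only the illustrative figure from the paragraph above, not a measured value, and the function name is made up for this example.

```python
def merged_view(fio_iops, block_bytes, merge_factor):
    """Return (device_iops, device_request_bytes, bytes_per_second).

    fio counts every block it submits; the drive only sees the merged
    requests, so its request rate is lower by the merge factor.
    """
    bytes_per_second = fio_iops * block_bytes
    device_iops = fio_iops / merge_factor
    device_request_bytes = block_bytes * merge_factor
    return device_iops, device_request_bytes, bytes_per_second


dev_iops, req_bytes, bps = merged_view(16000, 4096, 128)
print(f"fio reports 16000 IOPS of 4 KiB = {bps / 2**20:.1f} MiB/s")
print(f"drive sees {dev_iops:.0f} requests/s of {req_bytes // 1024} KiB each")
```

Under these assumptions the drive handles only about 125 requests per second, which is a plausible rate for a rotating disk even though fio reports 16000 IOPS.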

-- 
Jens Axboe

* Re: Question about FIO sequential writes
From: Xiaofei Du @ 2014-04-15 23:04 UTC
  To: Jens Axboe; +Cc: fio, aaronc

Dear Jens,

Thank you so much for your answers.

For your second answer:

I saw sequential writes reach 16000 IOPS with 4k blocks. Even if they are
merged in the driver or firmware, that is still 64MB/s. Is that reasonable
for a disk with the write cache disabled?

This is the job file content for the large iodepth:

[global]
bs=4k
ioengine=libaio
iodepth=640
size=2g
direct=1
buffered=0
filename=2gfile

[seq-write]
rw=write
stonewall


For your first answer:

This is the job file content.

########################################################################

[global]
bs=4k
size=100m
direct=1
filename=100mfile

[seq-write]
rw=write
stonewall

[rand-write]
rw=randwrite
stonewall

########################################################################

This is the output. In this run the difference is 115 vs 123 IOPS
(sequential vs random). On another disk, the numbers I got were 121 vs 141.
Random writes are always faster than sequential writes.

gregory@pacific:~$ fio job_file2
seq-write: (g=0): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
rand-write: (g=1): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.59
Starting 2 processes
seq-write: Laying out IO file(s) (1 file(s) / 100MB)
Jobs: 1 (f=1): [_w] [100.0% done] [0K/552K /s] [0 /135  iops] [eta 00m:00s]
seq-write: (groupid=0, jobs=1): err= 0: pid=31294
  write: io=102400KB, bw=473220 B/s, iops=115 , runt=221583msec
    clat (msec): min=7 , max=99 , avg= 8.65, stdev= 2.21
     lat (msec): min=7 , max=99 , avg= 8.65, stdev= 2.21
    bw (KB/s) : min=  311, max=  478, per=100.04%, avg=462.18, stdev=24.03
  cpu          : usr=0.06%, sys=0.55%, ctx=25785, majf=0, minf=25
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/25600/0, short=0/0/0

     lat (msec): 10=97.96%, 20=1.08%, 50=0.95%, 100=0.01%
rand-write: (groupid=1, jobs=1): err= 0: pid=31295
  write: io=102400KB, bw=507210 B/s, iops=123 , runt=206734msec
    clat (msec): min=1 , max=107 , avg= 8.07, stdev= 5.32
     lat (msec): min=1 , max=107 , avg= 8.07, stdev= 5.32
    bw (KB/s) : min=  339, max=  608, per=100.08%, avg=495.39, stdev=41.37
  cpu          : usr=0.10%, sys=0.45%, ctx=25775, majf=0, minf=24
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/25600/0, short=0/0/0

     lat (msec): 2=0.55%, 4=12.46%, 10=66.77%, 20=16.43%, 50=3.61%
     lat (msec): 100=0.14%, 250=0.04%

Run status group 0 (all jobs):
  WRITE: io=102400KB, aggrb=462KB/s, minb=473KB/s, maxb=473KB/s, mint=221583msec, maxt=221583msec

Run status group 1 (all jobs):
  WRITE: io=102400KB, aggrb=495KB/s, minb=507KB/s, maxb=507KB/s, mint=206734msec, maxt=206734msec

Disk stats (read/write):
  dm-0: ios=0/51500, merge=0/0, ticks=0/433244, in_queue=433252, util=99.29%, aggrios=0/51400, aggrmerge=0/116, aggrticks=0/429716, aggrin_queue=429636, aggrutil=99.26%
    sda: ios=0/51400, merge=0/116, ticks=0/429716, in_queue=429636, util=99.26%




* Re: Question about FIO sequential writes
From: Xiaofei Du @ 2014-04-16  1:10 UTC
  To: Jens Axboe; +Cc: fio, Aaron Carroll

Dear Jens,

For your second answer, I saw similar effects. I did another run on another
machine, which has an older disk and a newer kernel. On that machine,
iostat shows 65 IOPS at a throughput of around 32MB/s (so each op is about
512k), which agrees with what you said in your second answer.

It seems some versions of iostat show IOPS for the original block size,
while others show IOPS after the blocks have been merged.
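
The request size quoted above follows from dividing throughput by IOPS. A minimal sketch, assuming the "MB/s" reported by iostat is really MiB/s (the function name is hypothetical):

```python
def avg_request_kib(throughput_mib_s, iops):
    """Average size of one request in KiB, from throughput and request rate."""
    return throughput_mib_s * 1024 / iops


print(f"{avg_request_kib(32, 65):.0f} KiB per request")
```

The result is a little under 512 KiB because both inputs are rounded figures, but it is clearly far from the 4k block size that fio was issuing.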

I think 64MB/s sequential write throughput could be reasonable now. Thanks
a lot for your help.

Best,
Xiaofei



* Re: Question about FIO sequential writes
From: Jens Axboe @ 2014-04-16  1:44 UTC
  To: Xiaofei Du; +Cc: fio, aaronc

On 2014-04-15 17:04, Xiaofei Du wrote:
> [global]
> bs=4k
> size=100m
> direct=1
> filename=100mfile
>
> [seq-write]
> rw=write
> stonewall
>
> [rand-write]
> rw=randwrite
> stonewall
>
> ########################################################################
>
> This is the output. In this run the difference is 115 vs 123. On another
> disk, the number I got was 121 vs 141. Random writes are always faster
> than sequential writes.

Since the region is only 100M, it seems reasonable to expect random IO
within that region to be faster than sequential IO. Each sequential
write is subject to a full rotational penalty, which limits you to
120 IOPS (one write per revolution, assuming a 7200 RPM drive) if we
disregard software overhead, DMA, etc. So 115 is well within the
ballpark. For random IO, we have to move the head a bit, but if the seek
takes less than half a revolution (half of 1/RPM), then it's a win.

So I'd say things are looking as expected.
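
The reasoning above can be written as a rough model. This is a sketch only: the 7200 RPM value is an assumption (the thread never states the drive speed, but it matches the 120 IOPS figure), the seek times are hypothetical, and transfer time and software overhead are ignored.

```python
def revolution_ms(rpm):
    """Time for one platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def sequential_sync_iops(rpm):
    """Upper bound when every write waits one full revolution."""
    return 1000.0 / revolution_ms(rpm)


def random_sync_iops(rpm, seek_ms):
    """Upper bound when every write costs a seek plus, on average, half a revolution."""
    return 1000.0 / (seek_ms + revolution_ms(rpm) / 2)


RPM = 7200  # assumption: not stated in the thread

print(f"sequential: {sequential_sync_iops(RPM):.0f} IOPS")
for seek_ms in (2.0, 4.0, 6.0):  # hypothetical average seek times
    print(f"random, {seek_ms} ms seek: {random_sync_iops(RPM, seek_ms):.0f} IOPS")
```

In this model random writes win whenever the average seek is shorter than half a revolution (about 4.2 ms at 7200 RPM). A seek of about 4 ms gives roughly 122 IOPS, which is in the same range as the 123 IOPS in the posted output.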

-- 
Jens Axboe

* RE: Question about FIO sequential writes
From: Elliott, Robert (Server Storage) @ 2014-04-16  4:47 UTC
  To: Jens Axboe, Xiaofei Du; +Cc: fio, aaronc

> On 2014-04-15 17:04, Xiaofei Du wrote:
> > [global]
> > bs=4k
> > size=100m
> > direct=1
> > filename=100mfile
> >

A few more tips:
- I don't see ioengine=libaio or iodepth=nn in that excerpt.
- use iostat -mxyz 2 while the test is running to see the IO sizes that the block layer is sending to the LLD (low-level driver), how many block layer merges are occurring, the IO depth being maintained to the LLD, and the IOPS rate.
- set /sys/block/<drive name>/queue/nomerges to 2 to disable block layer/scheduler merging.
- your LLD or controller might have its own cache and/or do sequential merging (coalescing) too.  What controller are you using?
- by using a file, you don't know whether it's on the inner tracks or outer tracks or somewhere in between.
- by using a file, you don't know whether it's fragmented, making sequential transfers really random to the drive.
- by using a file, if using an "advanced format" drive (with large physical sectors), you have to be wary of unaligned transfers.  Is the partition aligned so all the accesses really go to the drive aligned to 4 KiB boundaries? 
- by using a tiny file size like 100 MiB, writes could easily get buffered in a write cache somewhere along the way; 64 and 128 MiB HDD volatile write cache sizes are common nowadays, and RAID controllers have multi-GiB non-volatile write caches.
- although you didn't report results with reads, beware that the drive could end up serving all 100 MiB of random read data from its cache (reads can always be cached, volatile or not), while throwing away sequential prefetched read data after returning that data because the drive does not expect the data to be read again.
- to avoid filesystem interference, directly access the drive with /dev/disk/by-path, /dev/disk/by-id, or /dev/sdNN type names (or \\.\PhysicalDriveNN in Windows).  If using /dev/sdNN or PhysicalDriveNN, be very careful not to overwrite your boot drive, since the mapping can change every reboot; I don't think fio provides any protection from doing so.
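
As one way to follow up on the alignment tip, here is a minimal Python sketch that checks whether a partition starts on a 4 KiB boundary. It assumes the Linux sysfs file /sys/block/<disk>/<partition>/start, which reports the start in 512-byte units; the helper names are made up for this example, and a file inside the partition can still be misaligned by the filesystem itself.

```python
SECTOR_BYTES = 512  # sysfs reports the partition start in 512-byte units


def is_aligned(start_sector, boundary_bytes=4096):
    """True if a partition starting at start_sector begins on the boundary."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0


def partition_start(disk, part):
    """Read the start sector of a partition, e.g. partition_start('sda', 'sda1')."""
    with open(f"/sys/block/{disk}/{part}/start") as f:
        return int(f.read())


# Sector 63 is the classic legacy start; sector 2048 starts at 1 MiB.
for start in (63, 2048):
    print(start, "aligned" if is_aligned(start) else "NOT aligned")
```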


* Re: Question about FIO sequential writes
From: Jens Axboe @ 2014-04-16 14:41 UTC
  To: Elliott, Robert (Server Storage), Xiaofei Du; +Cc: fio, aaronc

On 04/15/2014 10:47 PM, Elliott, Robert (Server Storage) wrote:
>> On 2014-04-15 17:04, Xiaofei Du wrote:
>>> [global]
>>> bs=4k
>>> size=100m
>>> direct=1
>>> filename=100mfile
>>>
> 
> A few more tips:
> - I don't see ioengine=libaio or iodepth=nn in that excerpt.
> - use iostat -mxyz 2 while the test is running to see the IO sizes that the block layer is sending to the LLD, how many block layer merges are occurring, the IO depth being maintained to the LLD, and the IOPS rate.
> - set /sys/block/<drive name>/queue/nomerges to 2 to disable block layer/scheduler merging.
> - your LLD or controller might have its own cache and/or do sequential merging (coalescing) too.  What controller are you using?
> - by using a file, you don't know whether it's on the inner tracks or outer tracks or somewhere in between.
> - by using a file, you don't know whether it's fragmented, making sequential transfers really random to the drive.
> - by using a file, if using an "advanced format" drive (with large physical sectors), you have to be wary of unaligned transfers.  Is the partition aligned so all the accesses really go to the drive aligned to 4 KiB boundaries? 
> - by using a tiny file size like 100 MiB, writes could easily get buffered in a write cache somewhere along the way; 64 and 128 MiB HDD volatile write cache sizes are common nowadays, and RAID controllers have multi-GiB non-volatile write caches.
> - although you didn't report results with reads, beware that the drive could end up serving all 100 MiB of random read data from its cache (reads can always be cached, volatile or not), while throwing away sequential prefetched read data after returning that data because the drive does not expect the data to be read again.
> - to avoid filesystem interference, directly access the drive with /dev/disk/by-path, /dev/disk/by-id, or /dev/sdNN type names (or \\.\PhysicalDriveNN in Windows).  If using /dev/sdNN or PhysicalDriveNN, be very careful not to overwrite your boot drive, since the mapping can change every reboot; I don't think fio provides any protection from doing so.

Fio has a --readonly safeguard switch that will protect you from doing
something stupid. It won't stop you from using the wrong drive, but it's
a useful way to confirm that your config doesn't contain writes that
would then be issued to a drive whose data you don't want to destroy.
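
To illustrate the kind of check this is about, here is a small Python sketch that scans a job file for sections whose rw= mode would write. It is not how fio implements --readonly; it is a hypothetical helper that only handles simple job files with one [global] section, and the list of write modes is an assumption that should be checked against the fio documentation (trim modes are destructive too and are not flagged here).

```python
import configparser

# rw= values that issue writes; an illustrative list, not taken from fio itself
WRITE_MODES = {"write", "randwrite", "rw", "readwrite", "randrw", "trimwrite"}


def jobs_that_write(job_text):
    """Return the names of job sections whose effective rw= mode writes."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(job_text)
    inherited = None
    if parser.has_section("global"):
        inherited = parser["global"].get("rw")
    writers = []
    for name in parser.sections():
        if name == "global":
            continue
        mode = parser[name].get("rw", inherited) or "read"  # fio defaults to reads
        if mode.split(":")[0].strip() in WRITE_MODES:
            writers.append(name)
    return writers


# The job file posted earlier in this thread
JOB = """
[global]
bs=4k
size=100m
direct=1
filename=100mfile

[seq-write]
rw=write
stonewall

[rand-write]
rw=randwrite
stonewall
"""

print(jobs_that_write(JOB))
```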


-- 
Jens Axboe



* Re: Question about FIO sequential writes
From: Xiaofei Du @ 2014-04-17  1:08 UTC
  To: Elliott, Robert (Server Storage); +Cc: Jens Axboe, fio, aaronc

Robert,

Thanks for the tips. I don't understand the last point (- to avoid
filesystem interference, directly access the drive with /dev/disk/by-path,
/dev/disk/by-id, ...)

I googled it, but didn't find anything directly related. Maybe I used the
wrong keywords. Could you please explain this last point, or send me some
links to read? Thanks a lot.

Best,
Xiaofei



* RE: Question about FIO sequential writes
From: Elliott, Robert (Server Storage) @ 2014-04-17 14:30 UTC
  To: Xiaofei Du; +Cc: Jens Axboe, fio, aaronc

> Thanks for the tips. I don't understand the last point (- to avoid
> filesystem interference, directly access the drive with
> /dev/disk/by-path, /dev/disk/by-id, ...)

> I googled it, but didn't find something related directly. Maybe
> I googled the wrong key words. So could you please help to
> explain this last point, or send me some links to read? Thanks a lot.

Point fio to a block device rather than a file, like:
filename=/dev/sdb

However, the /dev/sdNN names are not persistent and can change when you reboot.  It is dangerous to hardcode them in a job description file that does writes. You may use these types of paths instead:

filename=/dev/disk/by-path/pci-0000\:04\:00.0-scsi-0\:0\:0\:15
points to the controller’s PCI address (e.g., from lspci) and the obsolete bus:target:LUN mapping provided for the drive. The bus:target:LUN mapping might not be persistent, depending on your controller and its driver.

filename=/dev/disk/by-id/wwn-0x5000cca02b042ef0
points to the worldwide name of the drive (safest)
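
Before running a write job, it can help to see which /dev/sdNN node a persistent name currently resolves to. A minimal Python sketch, using the example WWN path from above (which will not exist on other machines); the function name is made up for this example:

```python
import os


def resolve_device(persistent_path):
    """Follow the udev symlink and return the path it currently points to."""
    return os.path.realpath(persistent_path)


# If the path does not exist, realpath simply returns it unchanged.
print(resolve_device("/dev/disk/by-id/wwn-0x5000cca02b042ef0"))
```

The job file should still use the persistent name; the resolved name is only a sanity check for the current boot.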

---
Rob Elliott    HP Server Storage



* Re: Question about FIO sequential writes
From: Xiaofei Du @ 2014-04-17 18:29 UTC
  To: Elliott, Robert (Server Storage); +Cc: Jens Axboe, fio, aaronc

Robert -- Thank you for the explanation.

Jens -- Thank you for your help as well.

Have a great day!!

Best,
Xiaofei




* Re: Question about FIO sequential writes
From: Bruce Cran @ 2014-04-21 16:38 UTC
  To: Elliott, Robert (Server Storage); +Cc: Jens Axboe, Xiaofei Du, fio, aaronc

Windows prevents you from overwriting a block if it contains a mounted volume.

Bruce

On Apr 15, 2014, at 10:47 PM, "Elliott, Robert (Server Storage)" <Elliott@hp.com> wrote:

>>> On 2014-04-15 17:04, Xiaofei Du wrote:
>>> [global]
>>> bs=4k
>>> size=100m
>>> direct=1
>>> filename=100mfile
> 
> A few more tips:
> - I don't see ioengine=libaio or iodepth=nn in that excerpt.
> - use iostat -mxyz 2 while the test is running to see the IO sizes that the block layer is sending to the LLD, how many block layer merges are occurring, the IO depth being maintained to the LLD, and the IOPS rate.
> - set /sys/block/<drive name>/queue/nomerges to 2 to disable block layer/scheduler merging.
> - your LLD or controller might have its own cache and/or do sequential merging (coalescing) too.  What controller are you using?
> - by using a file, you don't know whether it's on the inner tracks or outer tracks or somewhere in between.
> - by using a file, you don't know whether it's fragmented, making sequential transfers really random to the drive.
> - by using a file, if using an "advanced format" drive (with large physical sectors), you have to be wary of unaligned transfers.  Is the partition aligned so all the accesses really go to the drive aligned to 4 KiB boundaries? 
> - by using a tiny file size like 100 MiB, writes could easily get buffered in a write cache somewhere along the way; 64 and 128 MiB HDD volatile write cache sizes are common nowadays, and RAID controllers have multi-GiB non-volatile write caches.
> - although you didn't report results with reads, beware that the drive could end up serving all 100 MiB of random read data from its cache (reads can always be cached, volatile or not), while throwing away sequential prefetched read data after returning that data because the drive does not expect the data to be read again.
> - to avoid filesystem interference, directly access the drive with /dev/disk/by-path, /dev/disk/by-id, or /dev/sdNN type names (or \\.\PhysicalDriveNN in Windows).  If using /dev/sdNN or PhysicalDriveNN, be very careful not to overwrite your boot drive, since the mapping can change every reboot; I don't think fio provides any protection from doing so.
> 

