public inbox for linux-kernel@vger.kernel.org
* [noob q. on block layer] block IO read-ahead during sequential *write*?
From: Frantisek Rysanek @ 2008-01-07 12:25 UTC
  To: linux-kernel

Dear Everyone,

let me start with a simple example. The following commands:

 cp /dev/zero /dev/hda
 dd if=/dev/zero of=/dev/hda [bs=512]

both have one common side-effect: apart from the disk being properly 
overwritten with zeroes, the kernel seems to keep reading sectors 
ahead of the current seek position of the sequential write.

I see this on kernel 2.6.22.6 and nearby versions, and previously on 
2.6.16.* and 2.6.18.*. I believe Linux 2.4 kernels didn't do that.

I observe that behavior using `iostat 1` (from the Sysstat package). 
After I spotted the behavior, I first upgraded to Sysstat 8.0.3 
(latest at the time) to rule out any misinterpretation of 
/proc/diskstats (or whichever /proc node is the relevant one). The 
upgrade made no difference.

I'm using commands along those lines to erase hard drives, to check 
their sequential write speed, and as an additional test when checking 
for bad sectors / flawed behavior that goes unreported by SMART.
The read-ahead skews my view of sequential write performance :-)

Note that if I specify count=<some fixed number> to 'dd', 
the read-ahead doesn't kick in.

Makes me wonder if perhaps "cp" and "dd without count=" perform their 
writes unaligned to sector boundaries, so that the Linux block layer 
feels obliged to prefetch the trailing sector that gets only 
partially written within a given transaction. It's weird though:
at least bs=512 should be aligned to sector boundaries, even if 
uncounted. And it doesn't look like only every Nth sector gets read, 
since the read and write rates are about equal.

I've already discovered "hdparm -a", and some follow-ups called 
ioctl(BLKRASET) and 'bdi->ra_pages' in $KERNEL_SRC/block/* .
That's where I got stuck - trying to understand the whole read-ahead 
logic from the source code would take me too long to be worth it.
That's when I resorted to posting this to the LKML...

Is "hdparm -a" my only chance? Is there any kernel command line 
argument or /proc entry or ioctl() to modify this behavior 
selectively for writes? Should I write my own test prog, 
that will take care to strictly walk the sector boundaries?
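
For illustration, the read-ahead setting behind "hdparm -a" can also be queried and changed directly through the BLKRAGET and BLKRASET ioctls. The following is a minimal Python sketch under stated assumptions: the request numbers are taken from <linux/fs.h>, the helper names are made up for this example, and the device path passed in is whatever block device you choose. Note that this setting applies to the device as a whole; it is not a knob that acts selectively on writes.

```python
import array
import fcntl
import os

# ioctl request numbers from <linux/fs.h>: _IO(0x12, 98) and _IO(0x12, 99)
BLKRASET = 0x1262
BLKRAGET = 0x1263


def sectors_to_kib(sectors):
    """Read-ahead is expressed in 512-byte sectors; convert to KiB."""
    return sectors * 512 // 1024


def get_readahead_sectors(device):
    """Return the read-ahead window of a block device, in 512-byte sectors."""
    fd = os.open(device, os.O_RDONLY)
    try:
        buf = array.array("l", [0])        # the kernel stores a C long here
        fcntl.ioctl(fd, BLKRAGET, buf, True)
        return buf[0]
    finally:
        os.close(fd)


def set_readahead_sectors(device, sectors):
    """Set the read-ahead window (needs root); 0 disables read-ahead."""
    fd = os.open(device, os.O_RDONLY)
    try:
        # BLKRASET takes the value itself as the argument, not a pointer.
        fcntl.ioctl(fd, BLKRASET, sectors)
    finally:
        os.close(fd)
```

Calling get_readahead_sectors() on a device node should report the same figure that "hdparm -a" prints for it; nothing here touches the device until one of the two functions is called.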

Any ideas are welcome :-)

Frank Rysanek




* Re: [noob q. on block layer] block IO read-ahead during sequential *write*?
From: Jörn Engel @ 2008-01-07 12:49 UTC
  To: Frantisek Rysanek; +Cc: linux-kernel

On Mon, 7 January 2008 13:25:09 +0100, Frantisek Rysanek wrote:
> 
> let me start with a simple example. The following commands:
> 
>  cp /dev/zero /dev/hda
>  dd if=/dev/zero of=/dev/hda [bs=512]
> 
> both have one common side-effect: apart from the disk being properly 
> overwritten with zeroes, the kernel seems to keep reading sectors 
> ahead of the current seek position of the sequential write.

Block devices are cached in the page cache.  If you write less than a
full page, any remainder has to be read from the device.

If you retry the dd with bs=4096 (or whatever your architecture's page
size happens to be), does this still occur?
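
The effect described above can be illustrated with a small model. This is a simplified simulation in Python, not kernel code: it assumes a 4096-byte page, assumes that a page is read from the device only when a write covers part of it and it is not yet cached, and uses a made-up function name. The real kernel may track this at a finer granularity than a whole page.

```python
PAGE_SIZE = 4096  # assumed page size; os.sysconf("SC_PAGE_SIZE") gives the real one


def pages_read_for_sequential_write(bs, count, page_size=PAGE_SIZE):
    """Count pages a simplified page cache must read from the device while
    `count` sequential writes of `bs` bytes are issued starting at offset 0."""
    cached = set()   # indices of pages already present in the cache
    reads = 0
    offset = 0
    for _ in range(count):
        end = offset + bs
        first = offset // page_size
        last = (end - 1) // page_size
        for page in range(first, last + 1):
            page_start = page * page_size
            page_end = page_start + page_size
            fully_covered = offset <= page_start and end >= page_end
            if page not in cached and not fully_covered:
                reads += 1   # the rest of the page has to come from the device
            cached.add(page)
        offset = end
    return reads


if __name__ == "__main__":
    for bs in (512, 4096):
        print(bs, pages_read_for_sequential_write(bs, 16))
```

In this model, sixteen 512-byte writes touch two pages and both are read first, so as many bytes are read as written, while sixteen 4096-byte writes cause no reads at all.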

Jörn

-- 
Chance favors only the prepared mind.
-- Louis Pasteur


* Re: [noob q. on block layer] block IO read-ahead during sequential *write*?
From: Frantisek Rysanek @ 2008-01-07 15:51 UTC
  To: Jörn Engel, linux-kernel

On 7 Jan 2008 at 13:49, Jörn Engel wrote:
>
> If you retry the dd with bs=4096 (or whatever your architecture's page
> size happens to be), does this still occur?
> 
> Jörn
> 
Wow, thanks, it works. This is so obvious.
I tend to use values like bs=512000 or suchlike, but probably never
hit this particular value while I was looking at iostat.
Must've missed that modulo 4096 somehow.

Seems like cp is using 512B per transaction, judging by how slow the
transfer goes. If I use bs=500000 or even bs=1024, it goes faster.
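
The "modulo 4096" check can be written down in a few lines. This is a hedged sketch in Python with made-up helper names; it takes the page size as a parameter because 4096 is common but not universal.

```python
import os


def leftover_bytes(bs, page_size):
    """Bytes by which `bs` overshoots the last whole page (0 means page-aligned)."""
    return bs % page_size


def round_down_to_pages(bs, page_size):
    """Largest multiple of page_size not above bs, but never less than one page."""
    return max(page_size, bs - bs % page_size)


if __name__ == "__main__":
    page = os.sysconf("SC_PAGE_SIZE")   # 4096 on many systems, not all
    for bs in (512, 1024, 4096, 500000, 512000):
        print(bs, leftover_bytes(bs, page), round_down_to_pages(bs, page))
```

With a 4096-byte page, 512000 is exactly 125 pages, while 500000 leaves 288 bytes over and would round down to 499712.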

Have a nice day :-}

Frank Rysanek
