public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* struct stat{st_blksize} for /dev entries in 2.4.3
@ 2001-04-08 13:49 Russell Coker
  2001-04-08 18:46 ` Ulrich Drepper
  2001-04-09  7:32 ` Eric W. Biederman
  0 siblings, 2 replies; 3+ messages in thread
From: Russell Coker @ 2001-04-08 13:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: herbert

When you stat() the files under /dev the st_blksize is returned as 1024 
bytes.  Currently cat will look at the input block size and the output block 
size and use the maximum of them as it's buffer size.  I believe that 
programs such as cat should never use a buffer size smaller than a page of 
memory and reported this as a bug in cat.
Herbert Xu (the maintainer of the Debian package textutils which contains 
cat) considers that the device should return a larger number in the 
st_blksize.

Here are some test results, first a P3-650 with 256M of RAM using a 300M 
partition of a 10G disk:
root@lyta:/#time cat /dev/ide/host0/bus0/target0/lun0/part1 > /dev/null
 
real    0m37.959s
user    0m0.220s
sys     0m4.910s
My patched cat uses 25% of the user CPU time and 90% the system CPU time of 
the unpatched cat:
root@lyta:/#time /usr/src/textutils-new/src/cat 
/dev/ide/host0/bus0/target0/lun0/part1 > /dev/null
 
real    0m35.502s
user    0m0.060s
sys     0m4.440s

Now here's an AMD K6-350 with 64M of RAM using a 2G RAID-1 partition across 
two 46G disks:
root@ivanova:~#time cat /dev/md/1 > /dev/null
 
real    2m25.906s
user    0m2.200s
sys     1m16.290s

My patched cat uses 30% the user CPU time and 95% the system CPU time:
root@ivanova:~#time /tmp/cat /dev/md/1 > /dev/null
 
real    2m14.845s
user    0m0.720s
sys     1m12.030s


On an AMD Athlon 800 machine I noticed an even more significant difference 
(the command "cat /dev/zero > /dev/hdc1" was using 100% CPU time but the disk 
was not at maximum speed before I patched cat).  But I don't have suitable 
test results with me at this time so I can't give you the details.  Another 
issue is that an Athlon 800 is a reasonably fast CPU, and it should probably 
be able to handle 33000 reads and 33000 writes per second easily without 
using any significant amount of CPU time.


Now I would like some comments on the following issues:
Is this a bug in cat regardless of whether the behaviour of the kernel is 
right or wrong?  I have attached a patch for cat in case it is determined 
that cat is buggy.


Regardless of whether cat is doing the right thing or not, does it make sense 
for the st_blksize of /dev/* to be 1024 bytes?  Or should it be 4096?

My understanding is that the st_blksize is the recommended IO size (not 
mandatory).  So it shouldn't break anything if this is set to a minimum of 
the page size.  But setting it to the page size will hint that applications 
should use a page as the minimum IO block size and cause some applications to 
deliver better performance.



diff -ru textutils-2.0/src/cat.c textutils-new/src/cat.c
--- textutils-2.0/src/cat.c     Sun Apr  8 22:55:10 2001
+++ textutils-new/src/cat.c     Sun Apr  8 23:23:54 2001
@@ -790,6 +790,9 @@
       if (options == 0)
        {
          insize = max (insize, outsize);
+#ifdef _SC_PHYS_PAGES
+         insize = max (insize, sysconf(_SC_PAGESIZE));
+#endif
          inbuf = (unsigned char *) xmalloc (insize);
 
          simple_cat (inbuf, insize);

-- 
http://www.coker.com.au/bonnie++/     Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/       Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/     My home page

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: struct stat{st_blksize} for /dev entries in 2.4.3
  2001-04-08 13:49 struct stat{st_blksize} for /dev entries in 2.4.3 Russell Coker
@ 2001-04-08 18:46 ` Ulrich Drepper
  2001-04-09  7:32 ` Eric W. Biederman
  1 sibling, 0 replies; 3+ messages in thread
From: Ulrich Drepper @ 2001-04-08 18:46 UTC (permalink / raw)
  To: Russell Coker; +Cc: linux-kernel, herbert

Russell Coker <russell@coker.com.au> writes:

> diff -ru textutils-2.0/src/cat.c textutils-new/src/cat.c
> --- textutils-2.0/src/cat.c     Sun Apr  8 22:55:10 2001
> +++ textutils-new/src/cat.c     Sun Apr  8 23:23:54 2001
> @@ -790,6 +790,9 @@
>        if (options == 0)
>         {
>           insize = max (insize, outsize);
> +#ifdef _SC_PHYS_PAGES
> +         insize = max (insize, sysconf(_SC_PAGESIZE));
> +#endif
>           inbuf = (unsigned char *) xmalloc (insize);
>  
>           simple_cat (inbuf, insize);

The #ifdef is certainly wrong.  And there is no guarantee that any of
the _SC_* constants are defined as macros.  You'll have to add a
configure test.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: struct stat{st_blksize} for /dev entries in 2.4.3
  2001-04-08 13:49 struct stat{st_blksize} for /dev entries in 2.4.3 Russell Coker
  2001-04-08 18:46 ` Ulrich Drepper
@ 2001-04-09  7:32 ` Eric W. Biederman
  1 sibling, 0 replies; 3+ messages in thread
From: Eric W. Biederman @ 2001-04-09  7:32 UTC (permalink / raw)
  To: Russell Coker; +Cc: linux-kernel, herbert

Russell Coker <russell@coker.com.au> writes:

> When you stat() the files under /dev the st_blksize is returned as 1024 
> bytes.  Currently cat will look at the input block size and the output block 
> size and use the maximum of them as it's buffer size.  I believe that 
> programs such as cat should never use a buffer size smaller than a page of 
> memory and reported this as a bug in cat.

Hmm. I can't agree with that, at least not all of the way.  I suspect
there are some character devices that do I/O in small chunks for which
a small block size is preferable.  The old latency version bandwidth issue.
Plus this assumes an linux like architecture.  With a different
structure to the code who knows where the tradeoffs fall.

In particular consider where the tradeoffs go if we increase the
internal PAGE_SIZE but not the external one.  So I think we really
need to fix stat!

However it looks like you aren't using ext2 for root.  

Currently looking at 2.4.2 source I see the following things.
1) cp_new_stat is broken, if you don't set st_blksize in your
   filesystem.  It appears to be counting 1K blocks base on filesize.
   st_blocks is always in multiples of 512 bytes.

2) The st_blksize comes out of the inode structure, and it looks 
   currently the device layer doesn't set it so it gets left at
   whatever the filesystem sets this to.  This is PAGE_SIZE for ext2.
   Which filesystem are you runnning?

> 
> Now I would like some comments on the following issues:
> Is this a bug in cat regardless of whether the behaviour of the kernel is 
> right or wrong?  I have attached a patch for cat in case it is determined 
> that cat is buggy.

Unless you can prove that PAGE_SIZE is a determining factor of I/O
efficiency on all possible architectures I don't see a problem with
cat.

> Regardless of whether cat is doing the right thing or not, does it make sense 
> for the st_blksize of /dev/* to be 1024 bytes?  Or should it be
> 4096?

Hmm.  I think we might want to be even smarter than that, but as a
first approximation yes it should be PAGE_SIZE.

> My understanding is that the st_blksize is the recommended IO size (not 
> mandatory).  So it shouldn't break anything if this is set to a minimum of 
> the page size.  But setting it to the page size will hint that applications 
> should use a page as the minimum IO block size and cause some applications to 
> deliver better performance.

Correct.  Changing st_blksize should not break any correct program.

Eric

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2001-04-09  7:33 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-04-08 13:49 struct stat{st_blksize} for /dev entries in 2.4.3 Russell Coker
2001-04-08 18:46 ` Ulrich Drepper
2001-04-09  7:32 ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox