Writeback cache, SCSI, and ReiserFS


* Writeback cache, SCSI, and ReiserFS
@ 2002-10-16  5:33 JP Howard
  2002-10-16  6:59 ` Valdis.Kletnieks
  2002-10-16 12:52 ` Chris Mason
  0 siblings, 2 replies; 3+ messages in thread
From: JP Howard @ 2002-10-16  5:33 UTC (permalink / raw)
  To: ReiserFS List; +Cc: Rob Mueller

Here's a little program to test fsync() performance--if you run it with
the parameter '1' it does 10000 write/fsync combinations, with the
parameter '0' it does 10000 writes, followed by a single fsync:
----
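#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

static char buf[8192];      /* 8 KB of zeroes, written on every iteration */

int main(int argc, char **argv) {
  int i, fd, mode;
  if (argc < 2)
    return 1;               /* usage: prog 0|1 */
  mode = atoi(argv[1]);     /* 1 = fsync after every write, 0 = once at end */
  fd = open("/some/path", O_CREAT | O_RDWR, 0644);
  if (fd < 0)
    return 1;
  for (i=0; i<10000; i++) {
    if (i%10 == 0)          /* rewind every 10 writes: same 80 KB is reused */
      lseek(fd, 0, SEEK_SET);
    write(fd, buf, 8192);
    if (mode == 1) {
      fsync(fd);
    }
  }
  fsync(fd);
  close(fd);
  return 0;
}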
----

Now, according to our RAID controller (MegaRAID Series 475), we have
writeback cache turned on. Therefore, I would have thought that the above
program would run at the same speed regardless of using parameter '1' vs
'0', since fsync should return immediately when using a writeback cache.

But, it doesn't--it's about 30x slower when doing all the fsyncs.
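
For anyone wanting to reproduce the ratio, here is a minimal sketch of the same write loop wrapped in a timer. The function name timed_run, its parameters, and the use of clock_gettime(CLOCK_MONOTONIC) are assumptions for illustration; they are not part of the original test program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: runs the write loop from the test program against
 * `path` and returns elapsed wall-clock seconds, or -1.0 on error.
 * fsync_each = 1 syncs after every write, 0 syncs only once at the end. */
double timed_run(const char *path, int fsync_each, int nwrites) {
  static char buf[8192];
  struct timespec t0, t1;
  int i, fd;

  assert(path != NULL && nwrites >= 0);
  fd = open(path, O_CREAT | O_RDWR, 0644);
  if (fd < 0)
    return -1.0;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (i = 0; i < nwrites; i++) {
    if (i % 10 == 0)                      /* rewind every 10 writes */
      lseek(fd, 0, SEEK_SET);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
      close(fd);
      return -1.0;
    }
    if (fsync_each && fsync(fd) != 0) {
      close(fd);
      return -1.0;
    }
  }
  if (fsync(fd) != 0) {                   /* final sync in both modes */
    close(fd);
    return -1.0;
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);
  close(fd);
  return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Dividing timed_run(path, 1, 10000) by timed_run(path, 0, 10000) gives the slowdown factor described above. The result depends heavily on the filesystem and hardware under test.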

Now, I don't know where to look. Is this a filesystem issue (I've tested
with both ReiserFS and Ext3 with the same results), a kernel issue, or a
hardware issue? We're very keen to get good fsync performance, because
Cyrus is full of them. (Yes, I know Reiser v4 will have proper
transactions, but I'm looking for a short-term solution to this
particular issue!)

Curiously, scsiinfo says that write cache is *not* turned on, despite
what the BIOS and "megamgr" (the controller's management software) says:
----
# scsiinfo -c /dev/sda

Data from Caching Page
----------------------
Write Cache                        0
Read Cache                         1
Prefetch units                     0
<...>
----
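
For reference, the "Write Cache" field comes from the SCSI Caching mode page (page code 0x08), where byte 2 carries the WCE (write cache enable) bit at bit 2 and the RCD (read cache disable) bit at bit 0. The sketch below decodes those bits from a raw page buffer. The function names are hypothetical, and the assumption that scsiinfo prints "Read Cache" as the inverse of RCD is mine, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define CACHING_PAGE_CODE 0x08
#define WCE_BIT 0x04  /* byte 2, bit 2: write cache enable */
#define RCD_BIT 0x01  /* byte 2, bit 0: read cache disable */

/* `page` points at the start of the mode page itself (after any mode
 * parameter header and block descriptors); `len` is its length in bytes.
 * Returns 1 if enabled, 0 if disabled, -1 if this is not a caching page. */
int write_cache_enabled(const unsigned char *page, size_t len) {
  assert(page != NULL);
  if (len < 3 || (page[0] & 0x3f) != CACHING_PAGE_CODE)
    return -1;
  return (page[2] & WCE_BIT) ? 1 : 0;
}

/* Same contract as above; the read cache is on when RCD is clear. */
int read_cache_enabled(const unsigned char *page, size_t len) {
  assert(page != NULL);
  if (len < 3 || (page[0] & 0x3f) != CACHING_PAGE_CODE)
    return -1;
  return (page[2] & RCD_BIT) ? 0 : 1;
}
```

Under that assumption, a page whose byte 2 is 0x00 would decode to the output shown above: write cache 0, read cache 1.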

Has anyone dealt with this issue before? Are we correct that fsync()
should return immediately when using writeback cache? Any ideas on where
to look for a solution?

Apologies if this is not a ReiserFS issue--I don't really know where we
should be focussing our attention to resolve this.


* Re: Writeback cache, SCSI, and ReiserFS
  2002-10-16  5:33 Writeback cache, SCSI, and ReiserFS JP Howard
@ 2002-10-16  6:59 ` Valdis.Kletnieks
  2002-10-16 12:52 ` Chris Mason
  1 sibling, 0 replies; 3+ messages in thread
From: Valdis.Kletnieks @ 2002-10-16  6:59 UTC (permalink / raw)
  To: JP Howard; +Cc: ReiserFS List, Rob Mueller

On Wed, 16 Oct 2002 05:33:12 -0000, JP Howard said:

> Now, according to our RAID controller (MegaRAID Series 475), we have
> writeback cache turned on. 

OK.. that's a good start..

> Curiously, scsiinfo says that write cache is *not* turned on, despite

but bad news here..

> Has anyone dealt with this issue before? Are we correct that fsync()
> should return immediately when using writeback cache? Any ideas on where
> to look for a solution?

If your kernel is convinced you don't have writeback cache, it doesn't have
much choice but to actually do something when you call fsync().  I'd start
by figuring out why scsiinfo thinks it's off - you won't get anyplace until
you get that issue resolved.

Even if you have a writeback cache, I don't think it does you any good.
Looking in fs/buffer.c, and the code in drivers/scsi/*, it looks like you
don't have a choice but to sit and wait in sync_buffers() if any blocks
have been dirtied but are still in the in-buffer cache.  If there's any
dirtied buffers, you *have* to wait.  At *best*, you can stop waiting when
the disk reports the buffer is in the writeback cache, rather than being
paranoid and waiting until the disk reports the buffer is actually scribed
on the oxide of the platter.  However, my 2:30AM reading of the scsi driver
code doesn't indicate any way to detect anything more specific than "the
disk drive said it got there OK" (and the disk is allowed to waffle on
what the definition of 'there' is).
-- 
				Valdis Kletnieks
				Computer Systems Senior Engineer
				Virginia Tech


* Re: Writeback cache, SCSI, and ReiserFS
  2002-10-16  5:33 Writeback cache, SCSI, and ReiserFS JP Howard
  2002-10-16  6:59 ` Valdis.Kletnieks
@ 2002-10-16 12:52 ` Chris Mason
  1 sibling, 0 replies; 3+ messages in thread
From: Chris Mason @ 2002-10-16 12:52 UTC (permalink / raw)
  To: JP Howard; +Cc: ReiserFS List, Rob Mueller

On Wed, 2002-10-16 at 01:33, JP Howard wrote:
> Here's a little program to test fsync() performance--if you run it with
> the parameter '1' it does 10000 write/fsync combinations, with the
> parameter '0' it does 10000 writes, followed by a single fsync:
> ----
> int main(int argc, char **argv) {
>   int i, fd, mode;
>   mode = atoi(argv[1]);
>   fd = open("/some/path", O_CREAT | O_RDWR);
>   for (i=0; i<10000; i++) {
>     if (i%10 == 0)
>       lseek(fd, 0, SEEK_SET);
>     write(fd, buf, 8192);
>     if (mode == 1) {
>       fsync(fd);
>     }
>   }
>   fsync(fd);
> }
> ----
> 
> Now, according to our RAID controller (MegaRAID Series 475), we have
> writeback cache turned on. Therefore, I would have thought that the above
> program would run at the same speed regardless of using parameter '1' vs
> '0', since fsync should return immediately when using a writeback cache.

This is one of the things the data logging patches were optimized for. 
If you fsync after every write, the average transaction size is going to
be about 4 or 5 blocks.  If you fsync at the very end, the average
transaction size will be a few hundred blocks at least.

The data logging patches make small transactions much faster (even
without data=journal), and with data=journal the difference is huge.

Not 30x faster though.  You see that difference because of the lseek
back to zero, which means you are overwriting the same blocks over and
over again.  With fsync on, you wait for each write to hit the
controller's cache.  With fsync at the end, you wait for each write to
hit the page cache, and then wait for 8192 * 10 bytes to hit the
controller's cache at the end.

Even if the controller is writeback caching, fsync=1 sends many more
commands down the pipe, and each one has to be processed.
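
As a rough back-of-the-envelope sketch of the data volumes involved: the model below counts only file data and assumes the page cache absorbs every overwrite before the final fsync. It ignores journal and metadata blocks and any background writeback, so treat the numbers as illustrative. The function and macro names are hypothetical.

```c
#include <assert.h>

#define WRITE_SIZE 8192L    /* bytes per write() in the test program */
#define REWIND_EVERY 10L    /* the loop seeks back to 0 every 10 writes */

/* Data bytes sent to the controller when every write is followed by an
 * fsync: each of the nwrites buffers is flushed individually. */
long bytes_flushed_fsync_each(long nwrites) {
  assert(nwrites >= 0);
  return nwrites * WRITE_SIZE;
}

/* Data bytes sent by the single final fsync: repeated overwrites stay in
 * the page cache, so only the distinct region touched is still dirty. */
long bytes_flushed_fsync_once(long nwrites) {
  assert(nwrites >= 0);
  long distinct = nwrites < REWIND_EVERY ? nwrites : REWIND_EVERY;
  return distinct * WRITE_SIZE;
}
```

For the 10000-write test this gives 81,920,000 bytes across 10000 separate flushes in the fsync-every-write case, against 8192 * 10 = 81,920 bytes in one flush otherwise.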

So, if you really want to test your writeback cache on the controller,
run your test with fsync = 1 with the writeback on, then run with fsync =
1 with the writeback off.

If you get the same numbers both ways, you don't have writeback on.

-chris
