public inbox for linux-xfs@vger.kernel.org
* op-journaled fs, journal size and storage speeds
@ 2011-04-30 14:51 Peter Grandi
  2011-05-01  9:27 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Grandi @ 2011-04-30 14:51 UTC (permalink / raw)
  To: Linux fs XFS, Linux fs JFS

Been thinking about journals and RAID6s and SSDs.

In particular for file system designs like JFS and XFS that do
operation journaling (while ext[34] do block journaling).

The issue is: journal size?

It seems to me that adopting as a guideline a percentage of the
filesystem size is very wrong, and so I have been using a rule of
thumb like one second of expected transfer rate, so "in flight"
updates are never much behind.

But even at a single disk *sequential* transfer rate of say
80MB/s average, a journal that contains operation records could
conceivably hold dozens if not hundreds of thousands of pending
metadata updates, probably targeted at very widely scattered
locations on disk, and replaying a journal fully could take a long
time.

So the idea would be that the relevant transfer rate would be
the *random* one, and since that is around 4MB/s per single
disk, journal sizes would end up pretty small. But many people
allocate very large (at least compared to that) journals.

This seems to me a fairly bad idea, because then the journal
becomes a massive hot spot on the disk and draws the disk arm
like a black hole. I suspect that operations should not stay on
the journal for a long time. However if the journal is too small
processes that do metadata updates start to hang on it.

So some questions for which I have guesses but not good answers:

  * What should journal size be proportional to?
  * What is the downside of a too small journal?
  * What is the downside of a too large journal other than space?

Again I expect answers to be very different for ext[34] but I am
asking about operation-journaling file system designs like JFS and
XFS.

BTW, another consideration is that for filesystems that are
fairly journal-intensive, putting the journal on a low traffic
storage device can have large benefits.

But if they can be pretty small, I wonder whether putting the
journals of several filesystems on the same storage device then
becomes a sensible option as the locality will be quite narrow
(e.g. a single physical cylinder) or it could be worthwhile like
the database people do to journal to battery-backed RAM.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: op-journaled fs, journal size and storage speeds
  2011-04-30 14:51 op-journaled fs, journal size and storage speeds Peter Grandi
@ 2011-05-01  9:27 ` Dave Chinner
  2011-05-01 18:13   ` Peter Grandi
  2011-05-02  4:35   ` Stan Hoeppner
  0 siblings, 2 replies; 6+ messages in thread
From: Dave Chinner @ 2011-05-01  9:27 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux fs XFS, Linux fs JFS

On Sat, Apr 30, 2011 at 03:51:43PM +0100, Peter Grandi wrote:
> Been thinking about journals and RAID6s and SSDs.
> 
> In particular for file system designs like JFS and XFS that do
> operation journaling (while ext[34] do block journaling).

XFS is not an operation journalling filesystem. Most of the metadata
is dirty-region logged via buffers, just like ext3/4. Perhaps you
need to read some documentation like this:

http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Operation_Based_Logging

> The issue is: journal size?
>
> It seems to me that adopting as a guideline a percentage of the
> filesystem size is very wrong, and so I have been using a rule of
> thumb like one second of expected transfer rate, so "in flight"
> updates are never much behind.

How do you know what "one second" of "in flight" operations is going
to be?

I had to deal with this in XFS when implementing the delayed logging
code. It uses a number of operations or a percentage of log space to
determine when to checkpoint the modifications, and that is
typically load dependent as to when it triggers.

And then you've got the problem of concurrency - one second of a
single threaded workload is much different to one second of the same
workload spread across 20 CPU cores. You need to have limits that
work well in both cases, and structures that scale to that level of
concurrency.

In reality, there's not much point in trying to calculate what one
second's worth of metadata is going to be - more often than not
you'll hit some other limitation in the journal subsystem, run out
of memory or have to put limits in place anyway to avoid latency
problems. The easiest and most reliable method seems to be to size your
journal appropriately in the first place and have your algorithms key
off that....

> But even at a single disk *sequential* transfer rate of say
> 80MB/s average, a journal that contains operation records could
> conceivably hold dozens if not hundreds of thousands of pending
> metadata updates, probably targeted at very widely scattered
> locations on disk, and replaying a journal fully could take a long
> time.

17 minutes is my current record, from crashing a VM during a chmod -R
operation over a 100 million inode filesystem. That was on a ~2GB
log (maximum supported size).

http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Reducing_Recovery_Time

> So the idea would be that the relevant transfer rate would be
> the *random* one, and since that is around 4MB/s per single
> disk, journal sizes would end up pretty small. But many people
> allocate very large (at least compared to that) journals.
> 
> This seems to me a fairly bad idea, because then the journal
> becomes a massive hot spot on the disk and draws the disk arm
> like a black hole. I suspect that operations should not stay on

That's why you can configure an external log....

> the journal for a long time. However if the journal is too small
> processes that do metadata updates start to hang on it.

Well, yes. The journal needs to be large enough to hold all the
transaction reservations for the active transactions. XFS, in the
worst case for a default filesystem config, needs about 100MB of log
space per 300 concurrent transactions. Increasing transaction
concurrency was the main reason we increased the log size...

> So some questions for which I have guesses but not good answers:
> 
>   * What should journal size be proportional to?

Your workload.

>   * What is the downside of a too small journal?

Performance sucks.

>   * What is the downside of a too large journal other than space?

Recovery times too long, lots of outstanding metadata pinned in
memory (hello OOM-killer!), and other resource management related
scalability issues.

> Again I expect answers to be very different for ext[34] but I am
> asking about operation-journaling file system designs like JFS and
> XFS.

> BTW, another consideration is that for filesystems that are
> fairly journal-intensive, putting the journal on a low traffic
> storage device can have large benefits.

Yeah, nobody ever thought of an external log before.... :)

> But if they can be pretty small, I wonder whether putting the
> journals of several filesystems on the same storage device then
> becomes a sensible option as the locality will be quite narrow
> (e.g. a single physical cylinder) or it could be worthwhile like
> the database people do to journal to battery-backed RAM.

Got a supplier for the custom hardware you'd need? Just use a PCIe
SSD....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: op-journaled fs, journal size and storage speeds
  2011-05-01  9:27 ` Dave Chinner
@ 2011-05-01 18:13   ` Peter Grandi
  2011-05-02  1:23     ` Dave Chinner
  2011-05-02 10:40     ` Christoph Hellwig
  2011-05-02  4:35   ` Stan Hoeppner
  1 sibling, 2 replies; 6+ messages in thread
From: Peter Grandi @ 2011-05-01 18:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux fs XFS, Linux fs JFS


>> Been thinking about journals and RAID6s and SSDs. In particular
>> for file system designs like JFS and XFS that do operation
>> journaling (while ext[34] do block journaling).

> XFS is not an operation journalling filesystem. Most of the
> metadata is dirty-region logged via buffers, just like ext3/4.

Looking at the sources, XFS does operation journaling, in the
form of physical ("dirty region") operation logging, instead of
logical operation logging like JFS. Both are very different from
block journaling.

In more detail: to me there is a stark contrast between 'jbd.h':

  http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=include/linux/jbd.h;h=e06965081ba5548f74db935543af84334f58259e;hb=HEAD

where I find only a few journal transaction types (blocks) and
'xfs_trans.h' where I find many journal transaction types (ops):

 http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/xfs/xfs_trans.h;h=c2042b736b81131a780703d8a5907c848793eebb;hb=HEAD

Given that in the latter I see transaction types like
'XFS_TRANS_RENAME' or 'XFS_TRANS_MKDIR' it is hard to imagine how
one can argue that XFS journals something other than ops, even
if in a buffered way of sorts.

Ironically, comparing with 'jfs_logmgr.h':

  http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/jfs/jfs_logmgr.h;h=9236bc49ae7ff1aed9cad81a2b22c2c54e433ba0;hb=HEAD

I see lower-level transaction types there (but they are logged as
ops rather than as "dirty regions").

[ ... ]

>> It seems to me that adopting as a guideline a percentage of the
>> filesystem size is very wrong, and so I have been using a rule of
>> thumb like one second of expected transfer rate, so "in flight"
>> updates are never much behind.

> How do you know what "one second" of "in flight" operations is
> going to be?

Well, that's what I discuss later; it is a "rule of thumb" based
on *some* rationale, but I have been questioning it.

[ ... interesting summary of some of the many issues related to
journal sizing ... ]

> Easiest and most reliable method seems to be to size your
> journal appropriatly in the first place and have you
> algorithms key off that....

Sure, but *I* am asking that question :-).

[ ... ]

> 17 minutes is my current record, from crashing a VM during a
> chmod -R operation over a 100 million inode filesystem. That
> was on a ~2GB log (maximum supported size).

Uhhhm I happen to strongly relate to that (on a much smaller
scale :->).

[ ... ]

>> This seems to me a fairly bad idea, because then the journal
>> becomes a massive hot spot on the disk and draws the disk arm
>> like a black hole. I suspect that operations should not stay on

> That's why you can configure an external log....

...and lose barriers :-). But indeed.

>> the journal for a long time. However if the journal is too
>> small processes that do metadata updates start to hang on it.

> Well, yes. The journal needs to be large enough to hold all
> the transaction reservations for the active transactions. XFS,
> in the worst case for a default filesystem config, needs about
> 100MB of log space per 300 concurrent transactions. [ ... ]

So something like 300KB per transaction? That seems a pretty
extreme worst case. How is that possible? A metadata transaction
with a "dirty region" of 300KB sounds enormously expensive. It may
be about extent maps for a very fragmented file I guess. Also not
clear here what  concurrent  means because the log is sequential.
I'll guess that it means "in flight".

[ ... ]

>> * What should journal size be proportional to?

> Your workload.

Sure, as a very top level goal. But that's not an answer, it is
handwaving. As you argue earlier, it could be proportional in some
cases to IO threads; or it could be number of arms, filesystem
size, size of each volume, sequential transfer rate, random
transfer rate, large IO transfer rate, small IO transfer rate, ...

Some tighter guideline might be better than just guessing.

>> * What is the downside of a too small journal?

> Performance sucks.

But why? Without a journal at all, performance is better;
assuming a one-transaction journal, this becomes slower because
of writing everything twice, but that happens for any size of
journal, as it is unavoidable.

When the journal fills up the effect is the same as that of a 1
transaction journal. That's the same for every type of buffer.

So the effect of a journal larger than 1 transaction must be
felt only when the journal is not full, that is there are pauses
in the flow of transactions; and then it does not matter a lot
just how large the journal is.

So the journal should be large enough to accommodate the highest
possible rate of metadata updates for the longest time this
happens until there is a pause in the metadata updates.

This of course depends on workload, but some rule of thumb based
on experience might help.

And here my guess is that shorter journals are better than
longer ones, because also:

>> * What is the downside of a too large journal other than space?

> Recovery times too long, lots of outstanding metadata pinned
> in memory (hello OOM-killer!), and other resource management
> related scalability issues.

I would also have expected more seeks, as reading logged but not
yet finalized metadata has to go back to the journal, but I guess
that's a small effect.

>> BTW, another consideration is that for filesystems that are
>> fairly journal-intensive, putting the journal on a low traffic
>> storage device can have large benefits.

> Yeah, nobody ever thought of an external log before.... :)

I was just stating the obvious here, in order to contrast it with:

>> But if they can be pretty small, I wonder whether putting the
>> journals of several filesystems on the same storage device then
>> becomes a sensible option as the locality will be quite narrow
>> (e.g. a single physical cylinder) or it could be worthwhile like
>> the database people do to journal to battery-backed RAM.

For example as described in this old paper:

  http://www.evenenterprises.com/SSDoracl.pdf

> Got a supplier for the custom hardware you'd need?

There are still a few, for example at different ends of the scale:

  http://www.ramsan.com/solutions/oracle/
  http://www.microdirect.co.uk/home/product/39434/ACARD-RAM-Disk-SSD-ANS-9010B-6X-DDR-II-Slots

> Just use a PCIe SSD....

Yes, that's what many people are doing, but mostly for data,
rather than specifically journals.

As mentioned at the start I have indeed been thinking of SSDs.

But they seem to me fundamentally terrible for journals, because
of the large erase block sizes and the enormous latency of erase
operations (lots of read-erase-write cycles for small commits).
They seem more oriented to large mostly read-only data sets than
very small mostly write ones.

The saving grace is the capacitor-backed RAM in SSDs (used to work
around erase block size issues as you probably know) which to a
significant extent may act as the  battery-backed RAM  I was
mentioning; and similarly as another post says the  battery-backed
RAM  in RAID host adapters would do much the same function.

But neither does so as cleanly as a dedicated unit rather than a cache.

But as another contributor said a fast/small disk RAID1 might be
quite decent in many situations.


* Re: op-journaled fs, journal size and storage speeds
  2011-05-01 18:13   ` Peter Grandi
@ 2011-05-02  1:23     ` Dave Chinner
  2011-05-02 10:40     ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2011-05-02  1:23 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux fs XFS, Linux fs JFS

On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:
> 
> >> Been thinking about journals and RAID6s and SSDs. In particular
> >> for file system designs like JFS and XFS that do operation
> >> journaling (while ext[34] do block journaling).
> 
> > XFS is not an operation journalling filesystem. Most of the
> > metadata is dirty-region logged via buffers, just like ext3/4.
> 
> Looking at the sources, XFS does operation journaling, in the
> form of physical ("dirty region") operation logging,

Operation logging contains no physical changes - it just indicates
the change to be made, typically via an intent/done transaction pair.
It says what is going to be done, then what has been done, but not
the details of the changes made.

XFS _always_ logs the details of the changes made, and....

> instead of
> logical operation logging like JFS. Both are very different from
> block journaling.

When you are dirtying entire blocks, then the way the blocks are
logged is really no different to ext3/4's block logging...

> In more detail: to me there is a stark contrast between 'jbd.h':
> 
>   http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=include/linux/jbd.h;h=e06965081ba5548f74db935543af84334f58259e;hb=HEAD
> 
> where I find only a few journal transaction types (blocks) and
> 'xfs_trans.h' where I find many journal transaction types (ops):
> 
>  http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.38.y.git;a=blob;f=fs/xfs/xfs_trans.h;h=c2042b736b81131a780703d8a5907c848793eebb;hb=HEAD

Yeah, so that number goes into the transaction header on disk mainly
for debugging purposes - you can identify what operation triggered
the transaction in the log just by looking at the log.

However, that is _completely ignored_ for delayed logging - you'll
only ever see "checkpoint" transactions with delayed logging as it
throws away all the transaction specific metadata in memory...

> Given that in the latter I see transaction types like
> 'XFS_TRANS_RENAME' or 'XFS_TRANS_MKDIR' it is hard to imagine how
> one can argue that XFS journals something other than ops, even
> if in a buffered way of sorts.

Why don't you look at the transaction reservations that define what
one of those "transaction ops" contains? E.g. MKDIR uses the inode
create reservation:

/*
 * For create we can modify:
 *    the parent directory inode: inode size
 *    the new inode: inode size
 *    the inode btree entry: block size
 *    the superblock for the nlink flag: sector size
 *    the directory btree: (max depth + v2) * dir block size
 *    the directory inode's bmap btree: (max depth + v2) * block size
 * Or in the first xact we allocate some inodes giving:
 *    the agi and agf of the ag getting the new inodes: 2 * sectorsize
 *    the superblock for the nlink flag: sector size
 *    the inode blocks allocated: XFS_IALLOC_BLOCKS * blocksize
 *    the inode btree: max depth * blocksize
 *    the allocation btrees: 2 trees * (max depth - 1) * block size
 */
STATIC uint
xfs_calc_create_reservation(
        struct xfs_mount        *mp)
{
        return XFS_DQUOT_LOGRES(mp) +
                MAX((mp->m_sb.sb_inodesize +
                     mp->m_sb.sb_inodesize +
                     mp->m_sb.sb_sectsize +
                     XFS_FSB_TO_B(mp, 1) +
                     XFS_DIROP_LOG_RES(mp) +
                     128 * (3 + XFS_DIROP_LOG_COUNT(mp))),
                    (3 * mp->m_sb.sb_sectsize +
                     XFS_FSB_TO_B(mp, XFS_IALLOC_BLOCKS(mp)) +
                     XFS_FSB_TO_B(mp, mp->m_in_maxlevels) +
                     XFS_ALLOCFREE_LOG_RES(mp, 1) +
                     128 * (2 + XFS_IALLOC_BLOCKS(mp) + mp->m_in_maxlevels +
                            XFS_ALLOCFREE_LOG_COUNT(mp, 1))));
}

> > How do you know what "one second" of "in flight" operations is
> > going to be?
> 
> Well, that's what I discuss later; it is a "rule of thumb" based
> on *some* rationale, but I have been questioning it.
> 
> [ ... interesting summary of some of the many issues related to
> journal sizing ... ]
> 
> > Easiest and most reliable method seems to be to size your
> > journal appropriatly in the first place and have you
> > algorithms key off that....
> 
> Sure, but *I* am asking that question :-).

And my response is that there is no one correct answer, and that
physical limits are usually the issue...

> >> This seems to me a fairly bad idea, because then the journal
> >> becomes a massive hot spot on the disk and draws the disk arm
> >> like a black hole. I suspect that operations should not stay on
> 
> > That's why you can configure an external log....
> 
> ...and lose barriers :-). But indeed.

As always, if performance and data safety is your concern, spend a
few hundred dollars more and buy a decent HW RAID card with a BBWC....

> >> the journal for a long time. However if the journal is too
> >> small processes that do metadata updates start to hang on it.
> 
> > Well, yes. The journal needs to be large enough to hold all
> > the transaction reservations for the active transactions. XFS,
> > in the worst case for a default filesystem config, needs about
> > 100MB of log space per 300 concurrent transactions. [ ... ]
> 
> So something like 300KB per transaction?

Yup. And the size is dependent on filesystem block size, filesystem
and AG size (max btree depths). So for a 64k block size filesystem,
that 300KB transaction reservation blows out to about 3MB....

> That seems a pretty
> extreme worst case. How is that possible? A metadata transaction
> with a "dirty region" of 300KB sound enormously expensive. It may
> be about extent maps for a very fragmented file I guess.

It's actually very small. Have you ever looked at how much metadata
a directory contains?  Rule of thumb is that a directory consumes
about 100MB of metadata for every million entries for average-length
filenames. Having a create transaction consume 300KB at maximum for
a worst-case modification of a directory with a million, 10M or 100M
entries makes that 300KB look pretty small...


> clear here what  concurrent  means because the log is sequential.
> I'll guess that it means "in flight".
> 
> [ ... ]
> 
> >> * What should journal size be proportional to?
> 
> > Your workload.
> 
> Sure, as a very top level goal. But that's not an answer, it is
> handwaving. As you argue earlier, it could be proportional in some
> cases to IO threads; or it could be number of arms, filesystem
> size, size of each volume, sequential transfer rate, random
> transfer rate, large IO transfer rate, small IO transfer rate, ...

Nice definition of "workload dependent".

> Some tighter guideline might be better than just guessing.
> 
> >> * What is the downside of a too small journal?
> 
> > Performance sucks.
> 
> But why? Without a journal at all, performance is better;
> assuming a one-transaction journal, this becomes slower because
> of writing everything twice, but that happens for any size of
> journal, as it is unavoidable.

Why does having a writeback cache improve performance? Larger
journals enable longer caching of dirty metadata before writeback
must occur.

> When the journal fills up the effect is the same as that of a 1
> transaction journal. That's the same for every type of buffer.

And then you've got the problem of having to wait for those 10
objects to complete IO before you can do another transaction, while
if you have a large log, you can push on it before you run out of
space to try to ensure it never stalls. And when you have
100,000 metadata objects to write back, you can optimise the IO a
whole lot better than when you only have 10 objects.

> So the effect of a journal larger than 1 transaction must be
> felt only when the journal is not full,

Sure, and we've spent years optimising the metadata flushing to
ensure we empty the log as fast as possible under sustained
workloads. You need enough space in the journal to decouple
transactions from the flow of metadata writeback - how much is very
workload dependent.

> that is there are pauses
> in the flow of transactions; and then it does not matter a lot
> just how large the journal is.
>
> So the journal should be large enough to accommodate the highest
> possible rate of metadata updates for the longest time this
> happens until there is a pause in the metadata updates.

We need to be able to sustain hundreds of thousands of transactions
per second, every second, 24x7. There are no "pauses" we can take
advantage of to "catch up" - metadata writeback must take place
simultaneously with new transactions, and the journal must be large
enough to decouple these effectively.

> This of course depends on workload, but some rule of thumb based
> on experience might help.

Sure - we encode that experience in the mkfs and kernel default
behaviour. 

> And here my guess is that shorter journals are better than
> longer ones, because also:
> 
> >> * What is the downside of a too large journal other than space?
> 
> > Recovery times too long, lots of outstanding metadata pinned
> > in memory (hello OOM-killer!), and other resource management
> > related scalability issues.
> 
> I would also have expected more seeks, as reading logged but not
> yet finalized metadata has to go back to the journal, but I guess
> that's a small effect.

Say what? Nobody reads from the journal except during recovery.
Anything that is in the journal is dirty in memory, so any reads
come from the memory objects, not the journal....

> > Got a supplier for the custom hardware you'd need?
> 
> There are still a few, for example at different ends of the scale:
> 
>   http://www.ramsan.com/solutions/oracle/
>   http://www.microdirect.co.uk/home/product/39434/ACARD-RAM-Disk-SSD-ANS-9010B-6X-DDR-II-Slots

Neither of them are what I'd consider "battery backed RAM" - to the
filesystem they are simply fast block devices behind a SATA/SAS/FC
interface.  Effectively no different to a SAS/SATA/FC- or PCIe-based
flash SSD.

> But as another contributor said a fast/small disk RAID1 might be
> quite decent in many situations.

Not fast enough for an XFS log - I can push >500MB/s through the XFS
journal on a device (12 disk (7200rpm) RAID-0) that will do 700MB/s
for sequential data IO.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: op-journaled fs, journal size and storage speeds
  2011-05-01  9:27 ` Dave Chinner
  2011-05-01 18:13   ` Peter Grandi
@ 2011-05-02  4:35   ` Stan Hoeppner
  1 sibling, 0 replies; 6+ messages in thread
From: Stan Hoeppner @ 2011-05-02  4:35 UTC (permalink / raw)
  To: xfs

On 5/1/2011 4:27 AM, Dave Chinner wrote:

> Got a supplier for the custom hardware you'd need? Just use a PCIe
> SSD....

50GB OCZ RevoDrive PCIe x4 SSD MLC NAND
Dual SandForce 1200 controllers, internal RAID 0 design
70,000 write IOPS, 4KB aligned
350MB/s write sustained

$200 USD at Newegg:
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227596

Current best value for a PCIe SSD suitable for dedicated log drive use, 
can fit ~22 maximum size (2GB) XFS logs.  Note the MLC NAND.  If all 
your filesystems will sustain constant high rate metadata writes, an SLC 
based product is more suitable, though price is 10-50x higher for PCIe 
SLC cards.  If you want/need the 10x increase in flash cell life of SLC 
NAND, go with this Intel SLC SATAII SSD for ~2x the $$ of the Revo. 
Note its write IOPS are 'only' 33k, and its size is 32GB, 18GB less.

http://www.newegg.com/Product/Product.aspx?Item=N82E16820167013

-- 
Stan


* Re: op-journaled fs, journal size and storage speeds
  2011-05-01 18:13   ` Peter Grandi
  2011-05-02  1:23     ` Dave Chinner
@ 2011-05-02 10:40     ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2011-05-02 10:40 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux fs XFS, Linux fs JFS

On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:
> > That's why you can configure an external log....
> 
> ...and lose barriers :-). But indeed.

Using a writeback cache on the log device is rather pointless as
every write needs write-through semantics using FUA or a post-flush
anyway.  But I actually have a patch to allow for devices with
a writeback cache in external log configurations; it's just a bit
complicated as we basically need to copy the pre-flush state machine
into XFS to deal with the pre-flush being for a different device
than the actual write.

> >> But if they can be pretty small, I wonder whether putting the
> >> journals of several filesystems on the same storage device then
> >> becomes a sensible option as the locality will be quite narrow
> >> (e.g. a single physical cylinder) or it could be worthwhile like
> >> the database people do to journal to battery-backed RAM.
> 
> For example as described in this old paper:

It only makes sense if the log activity bursts for the different
filesystems happen at different times, or none of the filesystems
maxes out the log IOP rate.  

> But they seem to me fundamentally terrible for journals, because
> of the large erase block sizes and the enormous latency of erase
> operations (lots of read-erase-write cycles for small commits).
> They seem more oriented to large mostly read-only data sets than
> very small mostly write ones.

As mentioned earlier in this thread, XFS allows log writes to be
aligned and padded.  Just make sure to get a device with an erase block
size <= 256 kilobytes, which usually means SLC.  But even drives
with a larger erase block size and sane firmware tend to be faster
than plain old disks.  But as Dave mentioned there's nothing that's
going to beat a battery backed cache/memory for log IOP performance.

> The saving grace is the capacitor-backed RAM in SSDs (used to work
> around erase block size issues as you probably know) which to a
> significant extent may act as the  battery-backed RAM  I was
> mentioning; and similarly as another post says the  battery-backed
> RAM  in RAID host adapters would do much the same function.

Just make sure your device actually has it.  Both the Intel X25 SSDs
and many other consumer / prosumer SSDs actually don't have them
and will lose data in case of a power loss.

