From: Michael Tokarev <mjt@tls.msk.ru>
To: Arshavir Grigorian <ag@m-cam.com>
Cc: linux-raid@vger.kernel.org, pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgres on RAID5
Date: Tue, 15 Mar 2005 01:47:16 +0300
Message-ID: <42361474.9060500@tls.msk.ru>
In-Reply-To: <4235FC2F.9060000@m-cam.com>
Arshavir Grigorian wrote:
> Alex Turner wrote:
>
[]
> Well, by putting the pg_xlog directory on a separate disk/partition, I
> was able to increase this rate to about 50 or so per second (still
> pretty far from your numbers). Next I am going to try putting the
> pg_xlog on a RAID1+0 array and see if that helps.
pg_xlog is written synchronously, right? It should be, or else the
reliability of the database would be in serious question...
I posted a question on Feb-22 here in linux-raid, titled "*terrible*
direct-write performance with raid5". There's a problem with write
performance of a raid4/5/6 array, which is due to the design.
Consider a raid5 array (raid4 is exactly the same, and for raid6
just double the parity writes) with N data blocks and 1 parity block
per stripe. Whenever a portion of data is written, the parity block
has to be updated too, to stay consistent and recoverable. And here
the size of the write plays a very significant role. If your write
is smaller than chunk_size*N (N = number of data blocks in a stripe),
then in order to calculate the correct parity the raid code has to
read data from the remaining drives first. The only case where no
reads from the other drives are needed is when you write exactly
chunk_size*N bytes AND the write is aligned to a stripe boundary.
By default, chunk_size is 64KB (min is 4KB). So the only reasonable
direct-write size for N data drives is 64KB*N, or else the raid code
has to read the "missing" data to calculate the parity block. Of
course, in 99% of cases you're writing in much smaller sizes, say 4KB
or so. And here, the more drives you have, the LESS write speed you
will have.
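To make the parity argument concrete, here is a small Python sketch (not
the md driver's code, just XOR parity on toy data). The helper names
parity() and full_stripe_bytes() and the 16-byte toy chunk are my own
assumptions for illustration. It shows that a small write needs existing
data before parity can be computed, either the remaining data chunks as
described above or, alternatively, the old chunk plus the old parity,
while a full-stripe write needs no reads at all.

```python
import os
from functools import reduce

CHUNK = 16  # toy chunk size in bytes; a real array would use e.g. 64 KB


def parity(chunks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def full_stripe_bytes(n_data, chunk_size=64 * 1024):
    """Smallest write that covers every data chunk of one stripe."""
    return n_data * chunk_size


# One stripe with N = 4 data chunks, plus its parity chunk.
stripe = [os.urandom(CHUNK) for _ in range(4)]
old_parity = parity(stripe)

# Small write: only chunk 0 is replaced.
new_chunk = os.urandom(CHUNK)

# Option A (as described above): read the 3 remaining data chunks.
parity_a = parity([new_chunk] + stripe[1:])

# Option B (read-modify-write): read the old chunk 0 and the old parity.
parity_b = parity([old_parity, stripe[0], new_chunk])

# Both routes give the same parity, and both needed reads first.
assert parity_a == parity_b

# Full-stripe write: the writer supplies all 4 chunks, so nothing is read.
new_stripe = [os.urandom(CHUNK) for _ in range(4)]
parity_full = parity(new_stripe)

for n in (3, 7, 13):
    print(n, "data drives ->", full_stripe_bytes(n) // 1024, "KB per full stripe")
```

With 64KB chunks the loop prints 192, 448 and 832 KB: the write size
needed to avoid reads grows with the number of data drives, while a
typical 4KB database write stays the same size.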
When going through the O/S buffer and filesystem cache, the system has
many more chances to re-order requests and sometimes even to skip the
reads entirely (for example, when you perform many sequential writes
without a sync in between), so buffered writes may be much faster.
But not direct or synchronous writes, again especially when you're
doing a lot of sequential writes...
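One rough way to see the buffered vs. synchronous difference on a given
array is to time small writes both ways. The following is only a sketch
under assumptions: a Linux/Unix system where os.O_SYNC is available, and
hypothetical names (time_writes, compare, probe.dat) that are mine. Pass
a directory that lives on the array you want to test; the default
temporary directory may sit on tmpfs or another disk and tell you nothing.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 4096  # 4 KB writes, like the small-write case above
COUNT = 200           # number of sequential writes per run


def time_writes(path, extra_flags=0):
    """Write COUNT blocks sequentially and return the elapsed seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(COUNT):
            os.write(fd, BLOCK)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory=None):
    """Return (buffered_seconds, synced_seconds) for the same workload."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        path = os.path.join(d, "probe.dat")
        buffered = time_writes(path)          # goes through the page cache
        synced = time_writes(path, os.O_SYNC)  # each write waits for the disk
    return buffered, synced


if __name__ == "__main__":
    buffered, synced = compare()
    print(f"buffered: {COUNT / buffered:.0f} writes/s")
    print(f"O_SYNC:   {COUNT / synced:.0f} writes/s")
```

This measures only the write path of one file, so treat the numbers as a
quick sanity check and not as a benchmark of the database itself.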
So to me it looks like an inherent problem of the raid5 architecture
wrt database-like workloads -- databases tend to use synchronous
or direct writes to ensure good data consistency.
For pgsql, which (I don't know for sure, but reportedly) uses
synchronous writes only for the transaction log, it is a good idea to
put that log on a raid1 or raid10 array, but NOT on a raid5 array.
Just IMHO of course.
/mjt