* [Lustre-devel] Checksum Algorithm
@ 2007-11-06 16:59 RS RS
2007-11-06 19:30 ` Brian Behlendorf
2007-11-07 20:05 ` Andreas Dilger
0 siblings, 2 replies; 5+ messages in thread
From: RS RS @ 2007-11-06 16:59 UTC (permalink / raw)
To: lustre-devel
Hi,
We have seen a huge performance drop in 1.6.3, due to the checksum being enabled by default. I looked at the algorithm being used, and it is actually a CRC32, which is a very strong algorithm for detecting all sorts of problems, such as single bit errors, swapped bytes, and missing bytes.
I've been experimenting with using a simple XOR algorithm. I've been able to recover most of the lost performance. This algorithm will detect corrupted bytes and words. This algorithm will not detect swapped-byte errors, but I think that these are pretty rare. This algorithm will not detect missing bytes, but I suspect that other things in Lustre or LNET will detect this problem. This algorithm will not detect two errors that offset each other, such as a single bit error in two words that are a multiple of 4 bytes apart.
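To make the trade-offs described above concrete, here is a minimal Python sketch. It is not the Lustre implementation: it assumes the XOR is folded over 32-bit little-endian words (a guess based on the "multiple of 4 bytes" remark), and `xor32` is a hypothetical helper name. It compares the XOR against `zlib.crc32` for three kinds of corruption.

```python
import zlib


def xor32(data: bytes) -> int:
    """Fold the buffer into one 32-bit value by XORing its 32-bit words.

    Assumption: little-endian words, with a zero-padded tail.
    """
    if len(data) % 4:
        data += b"\x00" * (4 - len(data) % 4)
    acc = 0
    for i in range(0, len(data), 4):
        acc ^= int.from_bytes(data[i:i + 4], "little")
    return acc


original = bytes(range(16))  # four 32-bit words

# Case 1: a single flipped bit.
bit_flip = bytearray(original)
bit_flip[5] ^= 0x10

# Case 2: two aligned 32-bit words exchanged.
word_swap = original[4:8] + original[0:4] + original[8:]

# Case 3: the same bit flipped in two words that are 8 bytes apart.
offsetting = bytearray(original)
offsetting[1] ^= 0x04
offsetting[9] ^= 0x04

for name, corrupted in [("bit flip", bit_flip),
                        ("word swap", word_swap),
                        ("offsetting flips", offsetting)]:
    corrupted = bytes(corrupted)
    xor_caught = xor32(corrupted) != xor32(original)
    crc_caught = zlib.crc32(corrupted) != zlib.crc32(original)
    print(f"{name:17s} xor32 caught: {xor_caught!s:5s} crc32 caught: {crc_caught}")
```

In this sketch the XOR catches only the single bit flip, while CRC32 flags all three cases. XOR is order-independent, so exchanged words and paired flips of the same bit cancel out.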
Should we consider using a more efficient checksum algorithm, in order to regain performance? Should the algorithm be configurable?
-Roger
* [Lustre-devel] Checksum Algorithm
2007-11-06 16:59 [Lustre-devel] Checksum Algorithm RS RS
@ 2007-11-06 19:30 ` Brian Behlendorf
2007-11-06 19:57 ` Paul Nowoczynski
2007-11-07 8:39 ` Niklas Edmundsson
2007-11-07 20:05 ` Andreas Dilger
1 sibling, 2 replies; 5+ messages in thread
From: Brian Behlendorf @ 2007-11-06 19:30 UTC (permalink / raw)
To: lustre-devel
Roger,
We've been running with checksums enabled in our release for some time now
and have seen the exact same impact on performance. In our case single node
performance is impacted but aggregate FS performance remains good when enough
clients are involved. We are tracking the performance issue under bug 13805
and would love any input/insight you might have on the issue.
Bug13805 <https://bugzilla.lustre.org/show_bug.cgi?id=13805>
My view on the issue is that it is madness to run with checksums disabled
and we need to investigate more efficient checksum algorithms. The current
crc32 algorithm may be too heavyweight but the simple XOR algorithm you
propose I fear is not strong enough. I've seen too many cases now of various
network components corrupting data in all sorts of interesting ways.
Happily we have a lot of other choices for algorithms to investigate.
If you have the time I'd encourage you to investigate an assortment of
algorithms and see which work best. Making this a runtime option via
proc I think is also an excellent idea.
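One way to start such a comparison is a small timing harness. The Python sketch below is only illustrative: the candidate table and the names `CANDIDATES`, `throughput_mib_s` and `benchmark` are hypothetical, the two algorithms are simply the ones Python's `zlib` module ships, and user-space Python numbers say nothing definitive about in-kernel performance.

```python
import time
import zlib

# Hypothetical candidate table; the names are illustrative, not Lustre options.
CANDIDATES = {
    "crc32": zlib.crc32,
    "adler32": zlib.adler32,
}


def throughput_mib_s(func, buf: bytes, rounds: int = 20) -> float:
    """Checksum the buffer `rounds` times and return the rate in MiB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        func(buf)
    # Guard against a zero reading on very small buffers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (len(buf) * rounds) / (1024 * 1024) / elapsed


def benchmark(buf_size: int = 1 << 20) -> dict:
    """Time every candidate on the same buffer; return {name: MiB/s}."""
    buf = bytes(i & 0xFF for i in range(buf_size))
    return {name: throughput_mib_s(func, buf)
            for name, func in CANDIDATES.items()}


if __name__ == "__main__":
    for name, rate in benchmark().items():
        print(f"{name:8s} {rate:10.1f} MiB/s")
```

Keeping the candidates in a table keyed by name also mirrors how a runtime-selectable algorithm might be chosen, though the actual proc interface would be up to the implementation.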
--
Thanks,
Brian
> Hi,
>
> We have seen a huge performance drop in 1.6.3, due to the checksum being
> enabled by default. I looked at the algorithm being used, and it is
> actually a CRC32, which is a very strong algorithm for detecting all sorts
> of problems, such as single bit errors, swapped bytes, and missing bytes.
>
> I've been experimenting with using a simple XOR algorithm. I've been able
> to recover most of the lost performance. This algorithm will detect
> corrupted bytes and words. This algorithm will not detect swapped bytes
> errors, but I think that these are pretty rare. This algorithm will not
> detect missing bytes, but I suspect that other things in Lustre or LNET
> will detect this problem. This algorithm will not detect two errors that
> offset each other, such as a single bit error in two words that are a
> multiple of 4 bytes apart.
>
> Should we consider using a more efficient checksum algorithm, in order to
> regain performance? Should the algorithm be configurable?
>
> -Roger
* [Lustre-devel] Checksum Algorithm
2007-11-06 19:30 ` Brian Behlendorf
@ 2007-11-06 19:57 ` Paul Nowoczynski
2007-11-07 8:39 ` Niklas Edmundsson
1 sibling, 0 replies; 5+ messages in thread
From: Paul Nowoczynski @ 2007-11-06 19:57 UTC (permalink / raw)
To: lustre-devel
Brian,
How does the CRC mechanism work? I assume that the CRC is computed on the
client; does the server verify it? Also, are the CRCs stored on disk?
thanks,
paul
Brian Behlendorf wrote:
> Roger,
>
> We've been running with checksums enabled in our release for some time now
> and have seen the exact same impact on performance. In our case single node
> performance is impacted but aggregate FS performance remains good when enough
> clients are involved. We are tracking the performance issue under bug 13805
> and would love any input/insight you might have on the issue.
>
> Bug13805 <https://bugzilla.lustre.org/show_bug.cgi?id=13805>
>
> My view on the issue is that it is madness to run with checksums disabled
> and we need to investigate more efficient checksum algorithms. The current
> crc32 algorithm may be too heavyweight but the simple XOR algorithm you
> propose I fear is not strong enough. I've seen too many cases now of various
> network components corrupting data in all sorts of interesting ways.
> Happily we have a lot of other choices for algorithms to investigate.
>
> If you have the time I'd encourage you to investigate an assortment of
> algorithms and see which work best. Making this a runtime option via
> proc I think is also an excellent idea.
* [Lustre-devel] Checksum Algorithm
2007-11-06 19:30 ` Brian Behlendorf
2007-11-06 19:57 ` Paul Nowoczynski
@ 2007-11-07 8:39 ` Niklas Edmundsson
1 sibling, 0 replies; 5+ messages in thread
From: Niklas Edmundsson @ 2007-11-07 8:39 UTC (permalink / raw)
To: lustre-devel
On Tue, 6 Nov 2007, Brian Behlendorf wrote:
> My view on the issue is that it is madness to run with checksums disabled
> and we need to investigate more efficient checksum algorithms. The current
> crc32 algorithm may be too heavyweight but the simple XOR algorithm you
> propose I fear is not strong enough. I've seen too many cases now of various
> network components corrupting data in all sorts of interesting ways.
> Happily we have a lot of other choices for algorithms to investigate.
>
> If you have the time I'd encourage you to investigate an assortment of
> algorithms and see which work best. Making this a runtime option via
> proc I think is also an excellent idea.
I'd strongly recommend looking at which algorithms are used for
checksumming in ZFS; they have done rather extensive investigations on
the subject.
If I remember correctly, ZFS uses Fletcher by default with good
performance, with SHA-256 as an option for those who want it.
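For reference, a minimal Python sketch of a classic Fletcher-32 checksum follows. This is the textbook variant over 16-bit words, not necessarily the exact variant ZFS uses, and `fletcher32` is an illustrative name. The second running sum weights each word by its position, so reordered words change the result, which a plain XOR would miss.

```python
def fletcher32(data: bytes) -> int:
    """Classic Fletcher-32 over 16-bit little-endian words.

    Assumption: an odd-length buffer is zero-padded to a whole word.
    """
    if len(data) % 2:
        data += b"\x00"
    sum1 = 0  # plain running sum of the words
    sum2 = 0  # sum of the running sums, which makes the result order-sensitive
    for i in range(0, len(data), 2):
        word = int.from_bytes(data[i:i + 2], "little")
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


in_order = b"\x01\x00\x02\x00"   # words 1, 2
swapped = b"\x02\x00\x01\x00"    # words 2, 1

print(hex(fletcher32(b"abcde")))                    # 0xf04fc729
print(fletcher32(in_order) != fletcher32(swapped))  # True: the swap is caught
```

The per-word work is two additions and two reductions, which is why Fletcher-style sums are usually much cheaper than a CRC while still catching reordering.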
Anyhow, given the close ties between CFS and Sun these days, finding the
relevant info on the subject should be a no-brainer :)
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke at hpc2n.umu.se
---------------------------------------------------------------------------
OH NO, my wife burned the rice crispies--AGAIN!!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
* [Lustre-devel] Checksum Algorithm
2007-11-06 16:59 [Lustre-devel] Checksum Algorithm RS RS
2007-11-06 19:30 ` Brian Behlendorf
@ 2007-11-07 20:05 ` Andreas Dilger
1 sibling, 0 replies; 5+ messages in thread
From: Andreas Dilger @ 2007-11-07 20:05 UTC (permalink / raw)
To: lustre-devel
On Nov 06, 2007 11:59 -0500, RS RS wrote:
> We have seen a huge performance drop in 1.6.3, due to the checksum
> being enabled by default. I looked at the algorithm being used, and it is
> actually a CRC32, which is a very strong algorithm for detecting all sorts
> of problems, such as single bit errors, swapped bytes, and missing bytes.
> I've been experimenting with using a simple XOR algorithm. I've
> been able to recover most of the lost performance. This algorithm
> will detect corrupted bytes and words. This algorithm will not
> detect swapped bytes errors, but I think that these are pretty rare.
> This algorithm will not detect missing bytes, but I suspect that other
> things in Lustre or LNET will detect this problem. This algorithm will
> not detect two errors that offset each other, such as a single bit error
> in two words that are a multiple of 4 bytes apart.
Note that it is possible to disable checksums to get the previous behaviour
back, either at runtime (on all clients that should skip checksums):

    for C in /proc/fs/lustre/osc/*/checksums; do
        echo 0 > "$C"
    done

in the Lustre configuration:

    mgs> lctl conf_param testfs-OST0001.osc.checksums=0

or at compile time with "configure --disable-checksum ..."
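The runtime loop can also be scripted so that the current setting is inspected before and after the change. The Python sketch below is a rough equivalent, not an official tool: `set_checksums` and `read_checksums` are hypothetical helper names, and writing `1` to turn checksums back on is an assumption, since only the `0` case is shown above.

```python
import glob

# Path pattern taken from the shell loop above; it only exists on Lustre clients.
DEFAULT_PATTERN = "/proc/fs/lustre/osc/*/checksums"


def set_checksums(enabled: bool, pattern: str = DEFAULT_PATTERN) -> list:
    """Write 1 or 0 to every matching checksums file; return the paths written.

    Assumption: writing "1" re-enables checksums (only "0" appears in the thread).
    """
    value = "1" if enabled else "0"
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as handle:
            handle.write(value + "\n")
        written.append(path)
    return written


def read_checksums(pattern: str = DEFAULT_PATTERN) -> dict:
    """Return {path: current value} for every matching checksums file."""
    state = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            state[path] = handle.read().strip()
    return state


if __name__ == "__main__":
    print("before:", read_checksums())
    set_checksums(False)
    print("after: ", read_checksums())
```

On a machine without Lustre mounted the pattern matches nothing, so both helpers return empty results rather than failing.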
Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.