From: Edward Shishkin <edward@namesys.com>
To: Hans Reiser <reiser@namesys.com>
Cc: PFC <lists@peufeu.com>,
jpiszcz@lucidpixels.com, reiserfs-list@namesys.com,
Alexander Zarochentcev <zam@namesys.com>
Subject: Re: Linux Gazette benchmark Reiser 4
Date: Mon, 09 Jan 2006 01:07:46 +0300
Message-ID: <43C18D32.8020106@namesys.com>
In-Reply-To: <43BECFF3.10204@namesys.com>
Hans Reiser wrote:
>PFC wrote:
>
>
>
>> Hehe. Wow. Sure, a benchmark that runs in 0.03 seconds for the
>>fastest one and 0.07 seconds for the slowest one looks pretty
>>reliable to me. How much time does it take to spawn the "touch"
>>process 10k times ? Hm... I'd guess most of the benchmark time ?
>>
>>
>
>
>
Let's consider this important aspect of benchmarking more carefully.
So there is an interesting question: how large must a difference be
before we can say that some fs really wins on a given statistic? Is
there any guarantee you won't get, say, 0.05 and 0.02 on the next run?
Sorry, but I didn't find an answer in Justin's notes. NOTE5 (Tests
Performed) says that questionable tests were re-run, but it seems we
need something closer to research here rather than a re-run.
Below are some comments on how this problem is resolved (1*) in the mongo
benchmark. Look, for example, at this table:
http://www.namesys.com/benchmarks.html#mongo.2.6.11
Fractions like 0.982 (D/A) and 1.017 (C/A) are shown in black, which means
that we _cannot_ make any assumption about the winner, because
|1 - X/A| < 0.02. Where does the magic M = 0.02 come from?
Let's run the same phase with the same settings (file system, file set,
etc.) 10 times. For the same statistic X we will obtain a set of
different (because of measurement errors) values x1, x2, ..., x10.
Suppose that X has a normal distribution (any objections?). Then we can
calculate its confidence interval for a single measurement (2*) as
[X - d(P), X + d(P)], where d(P) = D*U(P), D is the dispersion (the
standard deviation, see (5*)) and U(P) is taken from the standard normal
table for the chosen confidence level P (3*).
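A minimal Python sketch of the interval above, assuming that U(P) is the
two-sided quantile of the standard normal distribution. The function names
and the example numbers (x = 10.0 sec, D = 0.1 sec) are made up for
illustration and are not taken from mongo.

```python
from statistics import NormalDist

def u_of_p(p):
    """U(P): the value u such that P(|Z| <= u) = p for a standard normal Z."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0)

def interval(x, D, p=0.85):
    """[x - d(P), x + d(P)] with d(P) = D * U(P) for a single measurement x."""
    d = D * u_of_p(p)
    return x - d, x + d

print(round(u_of_p(0.85), 2))  # 1.44, the table value from footnote (3*)

# Hypothetical measurement: 10.0 sec with D = 0.1 sec.
low, high = interval(10.0, 0.1)
print(f"[{low:.3f}, {high:.3f}]")
```

Computing U(P) instead of hard-coding 1.44 makes it easy to see how the
interval widens for a stricter P such as 0.95.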
Now we have the following simple criterion (4*): A and X can be told
apart only if their intervals do not overlap, that is
|A - X| >= 2d(P), i.e. |1 - X/A| >= 2D*U(P)/A

            |<-d->|          |<-d->|
------<-----|----->----<-----|----->------
            A                X
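The criterion can be sketched in a few lines of Python. This is only an
illustration: the function name and the sample numbers are hypothetical,
and d is assumed to be the same for A and X, as footnote (4*) says.

```python
def can_name_winner(a, x, d):
    """True if |A - X| >= 2d, i.e. the intervals
    [A - d, A + d] and [X - d, X + d] do not overlap."""
    return abs(a - x) >= 2.0 * d

# Hypothetical timings in seconds with d = 1.0:
print(can_name_winner(100.0, 98.2, 1.0))  # False: the gap 1.8 is below 2d = 2.0
print(can_name_winner(100.0, 95.0, 1.0))  # True: the gap 5.0 exceeds 2d = 2.0
```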
The magic M = 0.02 for the mongo benchmark was calculated as 2D*U(P)/A
for the confidence level P=0.85 (5*).
Now it is clear from the formula above why the measured values shouldn't
be too small: when A is small while D stays about the same, the threshold
2D*U(P)/A becomes large and the criterion fails. I am sure (and it
is easy to check) that 2d(P=0.85) is much more than |0.07 - 0.03| in
the case of find 10000 files. By the way, some settings that
produce small values (~5 sec) of the mongo STATS statistic also
make this criterion fail.
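To put numbers on this, here is a small Python sketch that turns the
criterion around and asks how large D may be for a given gap. It uses only
the values quoted in this mail (0.07 and 0.03 sec, M = 0.02, U = 1.44);
the function names are mine.

```python
U_085 = 1.44  # U(P) for P = 0.85, as given in footnote (3*)

def max_dispersion(a, x, u=U_085):
    """Largest D for which |A - X| >= 2*D*U(P) still holds."""
    return abs(a - x) / (2.0 * u)

def relative_dispersion(m, u=U_085):
    """D/A implied by a relative threshold M = 2*D*U(P)/A."""
    return m / (2.0 * u)

print(f"{max_dispersion(0.07, 0.03):.4f}")  # 0.0139
print(f"{relative_dispersion(0.02):.4f}")   # 0.0069
```

So the 0.07 vs 0.03 result would pass only if a single run of that test
scatters by no more than about 0.014 sec, and M = 0.02 corresponds to a
relative scatter D/A of about 0.7%.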
(1*) Maybe this is not a perfect way, but it is better than nothing.
(2*) For N measurements the expression for the boundaries becomes a bit
more complicated.
(3*) For P=0.85 (as can be found in any statistics textbook) U(P)=1.44.
(4*) One more assumption here: A and X have identical distributions.
(5*) Actually D = max(D_create, D_copy, D_read, D_delete, D_dd), where
D for each phase was estimated once from 10 measurements with some fixed
settings in the standard way:
D^2 = ((x - x1)^2 + ... + (x - x10)^2)/(10 - 1), where
x = (x1 + ... + x10)/10 is the average value.
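The estimate in (5*) could be computed along these lines in Python. This is
a sketch only: the timings are invented for illustration (they are NOT
mongo results), and only three of the five phases are shown to keep it
short.

```python
from math import sqrt

def dispersion(xs):
    """D from (5*): sqrt(((x - x1)^2 + ... + (x - xn)^2) / (n - 1))."""
    n = len(xs)
    x = sum(xs) / n  # average value
    return sqrt(sum((x - xi) ** 2 for xi in xs) / (n - 1))

# Hypothetical timings in seconds, ten runs per phase.
phases = {
    "create": [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 49.7, 50.0, 49.9],
    "copy":   [80.5, 79.9, 80.2, 80.8, 79.6, 80.1, 80.3, 79.8, 80.4, 80.0],
    "read":   [30.2, 30.0, 29.9, 30.1, 30.0, 30.1, 29.8, 30.0, 30.2, 29.9],
}

per_phase = {name: dispersion(xs) for name, xs in phases.items()}
D = max(per_phase.values())  # the worst phase sets D for all of them

for name, value in per_phase.items():
    print(f"D_{name} = {value:.3f}")
print(f"D = {D:.3f}")
```

Taking the maximum is the conservative choice: one D then covers every
phase, at the price of a wider interval for the quieter ones.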
Edward.