From mboxrd@z Thu Jan 1 00:00:00 1970
From: Edward Shishkin
Subject: Re: Linux Gazette benchmark Reiser 4
Date: Mon, 09 Jan 2006 01:07:46 +0300
Message-ID: <43C18D32.8020106@namesys.com>
References: <43BECFF3.10204@namesys.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Errors-To: flx@namesys.com
In-Reply-To: <43BECFF3.10204@namesys.com>
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Hans Reiser
Cc: PFC, jpiszcz@lucidpixels.com, reiserfs-list@namesys.com, Alexander Zarochentcev

Hans Reiser wrote:

>PFC wrote:
>
>> Hehe. Wow. Sure, a benchmark that runs in 0.03 seconds for the
>>fastest one and 0.07 seconds for the slowest one looks pretty
>>reliable to me. How much time does it take to spawn the "touch"
>>process 10k times ? Hm... I'd guess most of the benchmark time ?
>

Let's consider this important aspect of benchmarking more carefully.
So there is an interesting question: how large should a difference be
before we can conclude that some fs really wins on this statistic?
Is there any guarantee you won't get, say, 0.05 and 0.02 on the next run?
Sorry, but I didn't find an answer in Justin's notes. NOTE5 (Tests
Performed) says that questionable tests were re-run, but it seems we
need something like research here rather than a re-run.

Below are some comments on how this problem is resolved (1*) in the
mongo benchmark. Look for example at this table:
http://www.namesys.com/benchmarks.html#mongo.2.6.11
Fractions like 0.982 (D/A), 1.017 (C/A) are shown in black, which means
that we _can not_ make any assumptions about the winner, because
|1 - X/A| < 0.02.

What is the magic M = 0.02?
Let's run the same phase with the same settings (file system, file set,
etc.) 10 times. For the same statistic X we will obtain a set of
different (because of errors) values x1, x2, ..., x10.
Suppose that X has a normal distribution (any objections?).
It means that we can calculate its trusted (confidence) interval for a
single measurement (2*) as [X - d(P), X + d(P)], where d(P) = D*U(P),
D is the dispersion (standard deviation), and U(P) is found from the
standard table for any chosen value of the trusted probability P (3*).

Now we have the following simple criterion (4*):

|A - X| >= 2d(P), i.e. |1 - X/A| >= 2D*U(P)/A

            |<-d->|    |<-d->|
------<-----|----->----<-----|----->------
            A                X

The magic M = 0.02 for the mongo benchmark was calculated as 2D*U(P)/A
for the trusted probability P=0.85 (5*).

Now it is clear from the formula above why statistics shouldn't be too
small: the criterion can no longer be satisfied. I am sure (and it is
easy to check) that 2d(P=0.85) is much more than |0.07 - 0.03|, as in
the case of find 10000 files. By the way, some settings that give small
values (~5 sec) of the mongo STATS statistics also make this criterion
fail.

(1*) Maybe this is not a perfect way, but it is better than nothing.
(2*) For N measurements the expression for the boundaries becomes a bit
     more complicated.
(3*) For P=0.85 (as can be found in any statistics textbook) U(P)=1.44.
(4*) One more assumption here: A and X have identical distributions.
(5*) Actually D = max(D_create, D_copy, D_read, D_delete, D_dd),
     where D_each_phase was estimated once from 10 measurements with
     some fixed settings in the standard way:
     D^2 = ((x - x1)^2 + ... + (x - x10)^2)/(10 - 1),
     where x = (x1 + ... + x10)/10 is the average value.

Edward.
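
P.S. For anyone who wants to try the criterion on their own numbers, here is a minimal sketch in Python. It is not the mongo code: the function names half_width and differs and the timings in runs are made up for illustration, and it assumes normally distributed errors as above. It estimates D from repeated runs, takes U(P) from the normal distribution instead of a table, and checks |A - X| >= 2d(P).

```python
from statistics import NormalDist, mean, stdev


def half_width(samples, p=0.85):
    """Return d(P) = D * U(P) for a single measurement.

    D is the sample standard deviation (divisor n - 1, as in note (5*));
    U(P) is the two-sided normal quantile for trusted probability P.
    """
    d_spread = stdev(samples)
    u = NormalDist().inv_cdf((1 + p) / 2)  # about 1.44 for P = 0.85
    return d_spread * u


def differs(a, x, d):
    """Criterion |A - X| >= 2 d(P): the two trusted intervals do not overlap."""
    return abs(a - x) >= 2 * d


# Hypothetical timings in seconds from 10 runs of one phase (not real data).
runs = [50.1, 49.6, 50.4, 49.9, 50.2, 49.8, 50.0, 50.3, 49.7, 50.0]

d = half_width(runs)   # d(P) for P = 0.85
a = mean(runs)         # reference value A
m = 2 * d / a          # relative threshold, the analogue of the magic M

print(f"d(P) = {d:.3f} s, relative threshold M = {m:.4f}")
print("51.0 s vs A counts as a real difference:", differs(a, 51.0, d))
```

With several phases one would take the largest D over the phases, as in (5*), before computing the threshold.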