* Re: Markov models for Ceph
2014-07-07 15:19 Markov models for Ceph Koleos Fuscus
@ 2014-07-07 17:16 ` Loic Dachary
From: Loic Dachary @ 2014-07-07 17:16 UTC (permalink / raw)
To: Koleos Fuscus; +Cc: ceph-devel@vger.kernel.org
Hi koleosfuscus,
I downloaded http://www.kaymgee.com/Kevin_Greenan/Software_files/hfrs.tar, which is linked from http://www.kaymgee.com/Kevin_Greenan/Software.html
The model in hfrs/models/weaver_8_8_3.disk.ber.model
[num states]
4
0 1 a failure
1 0 b repair
1 2 c failure
2 1 d repair
2 3 e failure
[assign]
a=N*lam_d
b=mu
c=(N-1)*lam_d
d=2*mu
e=(N-2)*lam_d
N=8
lam_d=(1/461386.)
mu=(1/12.)
[END]
is semi-human-parsable, but hfrs/models/weaver_8_8_3.disk.ber.model is harder to follow:
[num states]
5
0 1 a failure
0 4 b failure
1 2 c failure
1 4 d failure
1 0 e repair
2 3 f failure
2 4 g failure
2 1 h repair
3 4 i failure
3 2 j repair
[assign]
a=(N-0)*lam_d*(1-0.000000)*(1-(0.000000*(1-(1-p)**(N-1))))
b=(N-0)*lam_d*(0.000000)+(N-0)*lam_d*(1-0.000000)*((0.000000*(1-(1-p)**(N-1))))
c=(N-1)*lam_d*(1-0.000000)*(1-(0.000000*(1-(1-p)**(N-2))))
d=(N-1)*lam_d*(0.000000)+(N-1)*lam_d*(1-0.000000)*((0.000000*(1-(1-p)**(N-2))))
e=1*mu
f=(N-2)*lam_d*(1-0.000000)*(1-(0.114286*(1-(1-p)**(N-3))))
g=(N-2)*lam_d*(0.000000)+(N-2)*lam_d*(1-0.000000)*((0.114286*(1-(1-p)**(N-3))))
h=2*mu
i=(N-3)*lam_d
j=3*mu
N=8
lam_d=(1/461386.)
mu=(1/12.)
p=0.0237
[END]
[Disk sector conditional fault tolerance]
[[0.0, 0.0, 0.0, 0.0, 0.0043956043956043956, 0.02197802197802198, 0.075924075924075921], [0.0, 0.0, 0.0, 0.01098901098901099, 0.057942057942057944, 0.19780219780219779, 1.0], [0.0, 0.0, 0.034632034632034632, 0.16623376623376623, 0.49494949494949497, 1.0, 1.0], [0.0, 0.11428571428571428, 0.44126984126984126, 0.98333333333333328, 1.0, 1.0, 1.0]]
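To make the first (4-state) model concrete, here is a minimal sketch of how the mean time to reach the last state could be computed from those rates. It assumes that state 3 is the absorbing "data loss" state and that each line of the state list reads "from-state to-state rate-name"; both are my reading of the file, not something taken from the HFRS code. The function name mean_time_to_absorption is hypothetical, and this is not the solver in mm_solve.py.

```python
# Rates copied from the [assign] section of the 4-state model above.
N = 8
lam_d = 1 / 461386.0   # per-disk failure rate
mu = 1 / 12.0          # repair rate

# (from_state, to_state) -> rate, assuming "src dst name" line semantics.
rates = {
    (0, 1): N * lam_d,        # a
    (1, 0): mu,               # b
    (1, 2): (N - 1) * lam_d,  # c
    (2, 1): 2 * mu,           # d
    (2, 3): (N - 2) * lam_d,  # e
}
NUM_STATES = 4
ABSORBING = 3  # assumption: the last state means data loss


def mean_time_to_absorption(rates, num_states, absorbing):
    """Solve Q t = -1 over the transient states (Gauss-Jordan, pure Python)."""
    transient = [s for s in range(num_states) if s != absorbing]
    index = {s: i for i, s in enumerate(transient)}
    n = len(transient)
    # Augmented matrix [Q | -1], Q being the generator restricted to transient states.
    aug = [[0.0] * n + [-1.0] for _ in range(n)]
    for (src, dst), rate in rates.items():
        if src == absorbing:
            continue
        aug[index[src]][index[src]] -= rate
        if dst != absorbing:
            aug[index[src]][index[dst]] += rate
    for col in range(n):
        # Partial pivoting, then eliminate the column from every other row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= factor * aug[col][k]
    return {s: aug[index[s]][n] / aug[index[s]][index[s]] for s in transient}


mttdl = mean_time_to_absorption(rates, NUM_STATES, ABSORBING)[0]
print("mean time from state 0 to state 3:", mttdl)
```

With these rates the result is roughly 4.06e12, in whatever time unit the rates use (presumably hours, which is an assumption). That is in the same range as the "Analytic MTTDL" quoted below for 2DFT.disk.model, although this sketch does not verify that the two models share the same parameters.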
Kevin writes that "The HFRS uses an extremely efficient mathematical technique, called importance sampling, which enables the observation of extremely low-probability events. I have implemented (and derived in my thesis) efficient simulation algorithms under both exponential and Weibull failure/repairs. The combination of these techniques, in addition to a custom Markov model solver, makes the HFRS an extremely useful tool for evaluating storage system reliability."
This means you need to understand both https://en.wikipedia.org/wiki/Markov_model and https://en.wikipedia.org/wiki/Importance_sampling, as well as the semantics of the input file, which are documented in the README.
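As a reminder of what importance sampling buys you, here is a generic textbook-style sketch, not Greenan's algorithm and unrelated to the HFRS code. It estimates the rare-event probability P(X > 20) for X ~ Exp(1) by drawing from a heavier-tailed exponential and reweighting each hit by the likelihood ratio. The function name, the threshold and the proposal rate are all illustrative choices.

```python
import math
import random


def tail_prob_importance_sampling(threshold, n_samples, proposal_rate, seed=0):
    """Estimate P(X > threshold) for X ~ Exp(1) by sampling from Exp(proposal_rate)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.expovariate(proposal_rate)
        if x > threshold:
            # Likelihood ratio f(x) / g(x): target density over proposal density.
            target = math.exp(-x)
            proposal = proposal_rate * math.exp(-proposal_rate * x)
            total += target / proposal
    return total / n_samples


threshold = 20.0
exact = math.exp(-threshold)  # closed form, about 2e-9
estimate = tail_prob_importance_sampling(
    threshold, n_samples=100_000, proposal_rate=1.0 / threshold
)
print("exact:", exact, "importance sampling estimate:", estimate)
```

A plain simulation drawing 100,000 samples from Exp(1) would almost never see a value above 20, so it would most likely report 0. The reweighted estimator sees many such values and still targets the right probability, which is the rare-event problem the quoted paragraph refers to.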
Nice find koleosfuscus :-)
Cheers
On 07/07/2014 17:19, Koleos Fuscus wrote:
> Hello Loic,
>
> You asked previously:
> In other words, is there a place where one could set things like "disk
> fail % of the time" and "network is X Gb/s" and "repairing a disk
> failure requires reading B bytes from M disks"? As far
> as I understand, such factors cannot be expressed with a single
> formula and this is why a Markov model is useful.
>
> I think we need to run simulations to have a more precise estimation
> of the reliability of an erasure coded system. Markov models are not
> as flexible as you may think. Besides, solving the equations when the
> number of components that may fail is large makes the problem
> non-trivial. Maybe standard simulation is enough. As observed by Greenan
> in his thesis, standard simulations have problems with rare events,
> which may not be observed during simulation time. I don't know if we
> should care about rare events for comparing methods.
>
> Greenan released the software used for his thesis. It is completely
> developed in Python.
> http://www.kaymgee.com/Kevin_Greenan/Software.html
>
> I found Greenan's tool while trying to validate the results of ceph-tool
> and the numbers are completely different:
>
> For instance:
>
> Parameters for ceph tool:
> Disk type consumer, FIT1=2167, FIT2=2167
> Size: 2000GiB
> RAID-6
> Replace 0h
> Rebuild 6000MiB/s
> Volumes:8
> NRE model: ignore
> Period: 10 years
>
> (I used these numbers to compare with the 2DFT.disk.model model of Greenan's tool)
>
> Parameters for Greenan's HFRS tool:
> python mm_solve.py -m 2DFT.disk.model -M
>
> Results
>
> CEPH:
>
> storage      durability  PL(site)    PL(copies)  PL(NRE)     PL(rep)     loss/PiB
> ----------   ----------  ----------  ----------  ----------  ----------  ----------
> RAID-6: 6+2  11-nines    0.000e+00   1.318e-12   0.000e+00   0.000e+00   9.887e+02
>
>
> HFRS:
>
> Analytic MTTDL: 4.06111903031e+12
> *********************
> Analytic prob. of failure: 2.15660e-08
> *********************
>
> Could you check if the parameters for ceph are correct and equivalent
> to the HFRS model? Do you think it makes sense to include Greenan's tool?
> Greenan has a number of models, including non-MDS codes. I am not sure
> yet how we can describe the LRC code in this platform, but it might be
> possible.
>
> koleosfuscus
>
> ________________________________________________________________
> "My reply is: the software has no known bugs, therefore it has not
> been updated."
> Wietse Venema
>
--
Loïc Dachary, Artisan Logiciel Libre