* Caching the erasure code decoding matrix
From: Loic Dachary @ 2015-05-11 20:46 UTC (permalink / raw)
To: Andreas-Joachim Peters; +Cc: Ceph Development
Hi Andreas,
I had a go at implementing a cache for the jerasure plugin, to avoid computing the decoding matrix every time a 4KB stripe needs it, in the same way you did for the ISA plugin.
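The idea can be illustrated with an LRU cache keyed by the erasure signature, so the matrix is inverted once per distinct set of missing chunks rather than once per stripe. This is a minimal, hypothetical C++ sketch: `DecodingMatrixCache`, the string signature format and the `compute` callback are illustrative assumptions, not the code in the draft commit or in the ISA plugin. It is also not thread-safe; a real plugin would need locking, which adds to the cost of maintaining the cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical LRU cache of decoding matrices keyed by an erasure signature
// (for example "1,2" when chunks 1 and 2 are erased). Illustrative only.
class DecodingMatrixCache {
 public:
  using Matrix = std::vector<uint8_t>;
  using Compute = std::function<Matrix(const std::string&)>;

  // Capacity is clamped to at least one entry.
  explicit DecodingMatrixCache(size_t capacity)
      : capacity_(capacity > 0 ? capacity : 1) {}

  // Returns the matrix for `signature`, calling `compute` only on a miss.
  // The returned reference stays valid until that entry is evicted.
  const Matrix& get(const std::string& signature, const Compute& compute) {
    auto it = index_.find(signature);
    if (it != index_.end()) {
      ++hits_;
      lru_.splice(lru_.begin(), lru_, it->second);  // mark most recently used
      return it->second->second;
    }
    ++misses_;
    lru_.emplace_front(signature, compute(signature));
    index_[signature] = lru_.begin();
    if (lru_.size() > capacity_) {  // evict the least recently used entry
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    return lru_.front().second;
  }

  size_t hits() const { return hits_; }
  size_t misses() const { return misses_; }

 private:
  using Entry = std::pair<std::string, Matrix>;
  size_t capacity_;
  std::list<Entry> lru_;  // front = most recently used
  std::map<std::string, std::list<Entry>::iterator> index_;
  size_t hits_ = 0;
  size_t misses_ = 0;
};
```

Because k and m are fixed for a given profile, the number of distinct signatures is small: for k=4, m=2 (as in the benchmark below) there are at most 6 + 15 = 21 recoverable patterns of one or two erased chunks, so a small capacity would usually suffice.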
The draft is at https://github.com/dachary/ceph/commit/a6fb5257fabd810704405c8bc13743d1592ecc54 if you're curious. Then I did some benchmarking and was quite disappointed. It looks like whenever the matrix needs to be computed, jerasure_invert_matrix takes ~4,000 cycles. Compared to the cost of galois_w08_region_multiply (~4 million cycles), that is very small [1].
With the ISA plugin, ec_init_table (~1,200 cycles) is cheaper than jerasure_invert_matrix, and ec_encode_data_avx (~1.5 million cycles) is likewise cheaper than galois_w08_region_multiply [2].
In both cases, though, the ratio stays at roughly 1000 to 1, which makes me wonder whether I'm missing something.
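Taking the quoted callgrind figures at face value as costs of comparable units of work, and using the functions as identified above, the largest share of decode time that caching the matrix could save is:

```latex
\frac{C_{\text{inv}}}{C_{\text{inv}} + C_{\text{mul}}}
  \approx \frac{4\,000}{4\,000 + 4\,000\,000} \approx 0.1\% \quad \text{(jerasure)},
\qquad
\frac{1\,200}{1\,200 + 1\,500\,000} \approx 0.08\% \quad \text{(ISA)}.
```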
What do you think?
Cheers
[1] jerasure profiling: make -j4 ceph_erasure_code_benchmark && rm -f bench.callgrind && valgrind --tool=callgrind --callgrind-out-file=bench.callgrind ./ceph_erasure_code_benchmark --plugin jerasure --parameter directory=.libs --workload decode --verbose --parameter technique=reed_sol_van --parameter k=4 --parameter m=2 --iterations 1024 --erased 1 --erased 2 && kcachegrind bench.callgrind
[2] isa profiling: make -j4 ceph_erasure_code_benchmark && rm -f bench.callgrind && valgrind --tool=callgrind --callgrind-out-file=bench.callgrind ./ceph_erasure_code_benchmark --plugin isa --parameter directory=.libs --workload decode --verbose --parameter technique=reed_sol_van --parameter k=4 --parameter m=2 --iterations 1024 --erased 1 --erased 2 && kcachegrind bench.callgrind
--
Loïc Dachary, Artisan Logiciel Libre
* Re: Caching the erasure code decoding matrix
From: Loic Dachary @ 2015-06-06 22:42 UTC (permalink / raw)
To: Andreas-Joachim Peters; +Cc: Ceph Development
Hi Andreas,
After discussing this a little with a few people, I'm tempted to conclude that caching the decoding matrix is probably not worth the complexity. It's not even clear to me whether maintaining the cache is cheaper than recomputing the decoding matrix.
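A simple cost model makes that trade-off explicit. Assume h is the cache hit rate and each C is a per-decode cycle cost, where C_lookup includes any locking and C_insert covers storing (and evicting) an entry; these symbols are introduced here for illustration. The cache pays off only when:

```latex
\underbrace{C_{\text{lookup}} + (1-h)\,\bigl(C_{\text{inv}} + C_{\text{insert}}\bigr)}_{\text{with cache}}
  < \underbrace{C_{\text{inv}}}_{\text{without cache}}
\iff
C_{\text{lookup}} + (1-h)\,C_{\text{insert}} < h\,C_{\text{inv}}
```

Even with h = 1 the saving is bounded by C_inv per decode, so the lookup alone has to cost less than one matrix inversion (~4,000 cycles for jerasure_invert_matrix, per the first message).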
Cheers
* RE: Caching the erasure code decoding matrix
From: Andreas Joachim Peters @ 2015-06-07 12:30 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
Hi Loic,
ec_init_table is not the function that inverts the matrix!
The loop that constructs the inverted matrix is in isa_decode.
Or maybe I misunderstood your email.
Decoding performance of the jerasure plug-in was already very good; they probably construct (and therefore invert) the matrix differently from the Intel library. Without caching the decoding matrix, the ISA library's performance in the benchmark tool was a disaster.
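To illustrate what constructing and inverting a decoding matrix involves, here is a minimal Gauss-Jordan inversion over GF(2^8). It is a sketch that assumes the primitive polynomial 0x11D commonly used for w=8 Reed-Solomon; `gf_mul`, `gf_inv` and `gf_invert_matrix` are hypothetical names, and this is not the loop in isa_decode nor jerasure_invert_matrix, both of which rely on table-driven arithmetic.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Multiply in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
uint8_t gf_mul(uint8_t a, uint8_t b) {
  uint8_t product = 0;
  while (b != 0) {
    if (b & 1) product ^= a;  // addition in GF(2^8) is XOR
    bool overflow = (a & 0x80) != 0;
    a = static_cast<uint8_t>(a << 1);
    if (overflow) a ^= 0x1D;  // x^8 == x^4 + x^3 + x^2 + 1
    b >>= 1;
  }
  return product;
}

// Multiplicative inverse: a^254 == a^-1 for a != 0 (the group has order 255).
uint8_t gf_inv(uint8_t a) {
  uint8_t r = 1;
  for (int i = 0; i < 254; ++i) r = gf_mul(r, a);
  return r;
}

// Invert an n x n row-major matrix over GF(2^8) by Gauss-Jordan elimination.
// Returns std::nullopt if the matrix is singular.
std::optional<std::vector<uint8_t>> gf_invert_matrix(std::vector<uint8_t> m,
                                                     size_t n) {
  std::vector<uint8_t> inv(n * n, 0);
  for (size_t i = 0; i < n; ++i) inv[i * n + i] = 1;  // start from identity

  for (size_t col = 0; col < n; ++col) {
    // Find a row at or below `col` with a nonzero entry in this column.
    size_t pivot = col;
    while (pivot < n && m[pivot * n + col] == 0) ++pivot;
    if (pivot == n) return std::nullopt;
    if (pivot != col) {
      for (size_t j = 0; j < n; ++j) {
        std::swap(m[col * n + j], m[pivot * n + j]);
        std::swap(inv[col * n + j], inv[pivot * n + j]);
      }
    }
    // Scale the pivot row so the pivot becomes 1.
    uint8_t s = gf_inv(m[col * n + col]);
    for (size_t j = 0; j < n; ++j) {
      m[col * n + j] = gf_mul(m[col * n + j], s);
      inv[col * n + j] = gf_mul(inv[col * n + j], s);
    }
    // Clear this column in every other row (subtraction is also XOR).
    for (size_t r = 0; r < n; ++r) {
      uint8_t f = m[r * n + col];
      if (r == col || f == 0) continue;
      for (size_t j = 0; j < n; ++j) {
        m[r * n + j] ^= gf_mul(f, m[col * n + j]);
        inv[r * n + j] ^= gf_mul(f, inv[col * n + j]);
      }
    }
  }
  return inv;
}
```

In Reed-Solomon decoding, `m` would be the k rows of the generator matrix that correspond to the surviving chunks, and the resulting inverse is what a cache would store for each erasure pattern.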
Cheers Andreas.