* Re: Substriping support in ErasureCodeInterface
       [not found] <CAA-SLH9NNinYZO2iev5nYnTfR+va73-UNh7GnwAroSjtxCot0Q@mail.gmail.com>
@ 2015-06-05 12:40 ` Loic Dachary
  2015-06-05 13:34   ` Sindre Stene
  0 siblings, 1 reply; 3+ messages in thread
From: Loic Dachary @ 2015-06-05 12:40 UTC (permalink / raw)
  To: Sindre Stene; +Cc: Ceph Development

Hi,

On 05/06/2015 14:02, Sindre Stene wrote:
> Is there, or is there planned support for substriping/subpacket support in ErasureCodeInterface ?

Not at the moment.

> Clarification; I am looking for a way to test a different erasure coding scheme, that (in addition to having multiple stripes) has substripes within those stripes, letting us further reduce the amount of chunks retrieved from the HDD. I have an implementation ready in C for the encoder/decoder etc. I am looking to plug that into Ceph through the ErasureCodeInterface.

https://github.com/ceph/ceph/blob/master/src/erasure-code/ErasureCodeInterface.h

> The principles involved (which I am sure you are familiar with) are similar to https://research.facebook.com/publications/1445994382319911/a-hitchhiker-s-guide-to-fast-and-efficient-data-reconstruction-in-erasure-coded-data-centers/ , but with more substripes (>4, < 256).

There is a declaration of intent for a Hitchhiker plugin at http://tracker.ceph.com/issues/11268 but there has been no progress yet.

> Can it currently be done without having to modify the Ceph code base ? (I am looking at master on ceph/ceph, and it seems to not be possible).
> Could I ask you to link me to the appropriate issue tracker / Jira / forum, if there is development going on already towards this goal? 

Why do you think the current interface is insufficient ? What would you need in addition ?

Cheers

> 
> Sincerely,
> Sindre B. Stene

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: Substriping support in ErasureCodeInterface
  2015-06-05 12:40 ` Substriping support in ErasureCodeInterface Loic Dachary
@ 2015-06-05 13:34   ` Sindre Stene
  2015-06-05 17:46     ` Loic Dachary
  0 siblings, 1 reply; 3+ messages in thread
From: Sindre Stene @ 2015-06-05 13:34 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Sending the mail again without pesky html tags.

On Fri, Jun 5, 2015 at 2:40 PM, Loic Dachary <loic@dachary.org> wrote:
>(...)
>Why do you think the current interface is insufficient ? What would you
>need in addition ?

I am not sure whether or not the interface is sufficient. Let me try to
explain my assumptions, so that we can clarify.

Let's say for the sake of simplicity that I have a system of 14 HDDs, with an
allocation unit size of 4K, and am using K = 10 systematic drives, and M = 4
redundancy drives (and no spares).
Let's say that I am using an encoding scheme with 8 substripes (for each of
the 14 stripes), and only one object size, that perfectly matches the
scheme: 4K * 10 * 8. My raw objects then consist of 80 chunks, and I am adding
32 chunks of redundancy data, making each encoded object take up 4K*8*14 =
448K. The chunks are to be physically stored with these offsets:
HDD0: chunks 0-7
HDD1: chunks 8-15
(...)
HDD13 (redundancy 3): chunks
Assuming that the coding scheme is MDS, the encoding scheme would guarantee
recovery of up to 4 lost hard drives. It would not guarantee recovery for 32
arbitrary chunks (which is the same data amount when considering a single
object), as they would have to be organized in adjacent groups of 8.
Assuming the crush map may be used to configure this sort of chunk
placement, perhaps the interface is indeed sufficient ?
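
The layout arithmetic above can be checked with a small sketch. All names
here (hdd_for_chunk, chunks_on_hdd, guaranteed_recoverable) are hypothetical
and only model the example; none of them come from Ceph or from the
ErasureCodeInterface.

```python
# Illustrative sketch only; the names below are hypothetical, not Ceph identifiers.
UNIT = 4 * 1024       # allocation unit size (4K)
K, M = 10, 4          # systematic and redundancy drives
SUBSTRIPES = 8        # chunks stored per drive for each object


def hdd_for_chunk(chunk: int) -> int:
    """Drive index holding a given chunk (8 consecutive chunks per drive)."""
    return chunk // SUBSTRIPES


def chunks_on_hdd(hdd: int) -> range:
    """Chunk indices stored on a given drive."""
    return range(hdd * SUBSTRIPES, (hdd + 1) * SUBSTRIPES)


def guaranteed_recoverable(lost_chunks) -> bool:
    """An MDS code over drives only guarantees recovery when the lost
    chunks fall on at most M distinct drives."""
    return len({hdd_for_chunk(c) for c in lost_chunks}) <= M


raw_size = UNIT * K * SUBSTRIPES            # 80 data chunks
encoded_size = UNIT * (K + M) * SUBSTRIPES  # 112 chunks in total
```

With these definitions, losing chunks 0-31 (four whole drives) is covered by
the guarantee, while losing the 32 chunks 4-35 is not, because they touch
five drives.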

And, not specific to the interface definition, but about how Ceph uses the
interface during operation and during tests:
Would the interface receive decode requests for sets of chunks that are not
organized in groups of 8?
Would the subpacketization (or grouping of chunks) create problems with the
unit tests?
Do you experts see any other implications or side-effects?

Motivation: The required read access for recovering one drive in a (14 total
disks, 10 systematic data disks) setup using Reed-Solomon is 10. This can
theoretically be reduced by ~40% by introducing substripes (splitting each
of the 14 parts into many smaller parts, but fundamentally storing the first
10 major parts in exactly the same way on the HDD, meaning that the I/O of
normal reads is not impacted at all). There are many trade-offs to
consider, and so we wish to test the performance differences.
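
As a rough illustration of the numbers in this paragraph, the sketch below
works out the per-object repair read for the 14-drive example. The 40% value
is the approximate theoretical reduction quoted above, not a measurement, and
the variable names are made up for the example.

```python
# Illustrative arithmetic only; the 40% figure is the approximate theoretical
# reduction quoted in the text, not a measured result.
UNIT = 4 * 1024          # allocation unit size (4K)
K = 10                   # systematic drives
SUBSTRIPES = 8           # chunks stored per drive for each object
REDUCTION_PERCENT = 40   # assumed saving from substriping

drive_share = UNIT * SUBSTRIPES    # bytes one drive stores per object (32K)
rs_repair_read = K * drive_share   # Reed-Solomon reads K full shares (320K)
substripe_repair_read = rs_repair_read * (100 - REDUCTION_PERCENT) // 100
```

Under these assumptions, repairing one drive's share of an object reads 320K
with Reed-Solomon and about 192K with substripes.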


Sincerely,
Sindre B. Stene


* Re: Substriping support in ErasureCodeInterface
  2015-06-05 13:34   ` Sindre Stene
@ 2015-06-05 17:46     ` Loic Dachary
  0 siblings, 0 replies; 3+ messages in thread
From: Loic Dachary @ 2015-06-05 17:46 UTC (permalink / raw)
  To: Sindre Stene; +Cc: Ceph Development

Hi,

On 05/06/2015 15:34, Sindre Stene wrote:
> Sending the mail again without pesky html tags.
> 
> On Fri, Jun 5, 2015 at 2:40 PM, Loic Dachary <loic@dachary.org> wrote:
>> (...)
>> Why do you think the current interface is insufficient ? What would you
>> need in addition ?
> 
> I am not sure whether or not the interface is sufficient. Let me try to
> explain my assumptions, so that we can clarify.
> 
> Let's say for the sake of simplicity that I have a system of 14 HDDs, with an
> allocation unit size of 4K, and am using K = 10 systematic drives, and M = 4
> redundancy drives (and no spares).
> Let's say that I am using an encoding scheme with 8 substripes (for each of
> the 14 stripes), and only one object size, that perfectly matches the
> scheme: 4K * 10 * 8. My raw objects then consist of 80 chunks, and I am adding
> 32 chunks of redundancy data, making each encoded object take up 4K*8*14 =
> 448K. The chunks are to be physically stored with these offsets:
> HDD0: chunks 0-7
> HDD1: chunks 8-15
> (...)
> HDD13 (redundancy 3): chunks
> Assuming that the coding scheme is MDS, the encoding scheme would guarantee
> recovery of up to 4 lost hard drives. It would not guarantee recovery for 32
> arbitrary chunks (which is the same data amount when considering a single
> object), as they would have to be organized in adjacent groups of 8.
> Assuming the crush map may be used to configure this sort of chunk
> placement, perhaps the interface is indeed sufficient ?

I think so. 

> And, not specific to the interface definition, but about how Ceph uses the
> interface during operation and during tests:
> Would the interface receive decode requests for sets of chunks that are not
> organized in groups of 8?

The plugin only decodes when at least one chunk is missing; otherwise it just concatenates the chunks. When it decodes, in the case of jerasure, it expects the caller (the OSD in this case) to ask for the minimum number of chunks that are needed. If I understand correctly, the plugin would hide the 8 substripe division from the caller entirely. If the caller needs to be aware of that substripe logic, a new interface would have to be defined and documented.
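
A minimal sketch of that behaviour for a generic MDS code, assuming K = 10 as
in the example above. This is a hypothetical model, not the
ErasureCodeInterface API; the function name and signature are made up for
illustration.

```python
# Hypothetical model of the behaviour described above; this is not the
# ErasureCodeInterface API, and the names are made up for illustration.
K = 10  # number of data chunks, as in the 10+4 example


def minimum_to_decode(want_to_read: set, available: set, k: int = K) -> set:
    """Return the chunk indices a caller should fetch, for an MDS code."""
    if want_to_read <= available:
        # Nothing wanted is missing: the wanted chunks are read as-is.
        return set(want_to_read)
    if len(available) < k:
        raise IOError("fewer than k chunks available, cannot decode")
    # Any k available chunks are enough to decode an MDS code; pick the
    # k lowest indices for determinism.
    return set(sorted(available)[:k])
```

A plugin with substripes could answer the same question with its own rule
while keeping the 8-way division internal, which is the case where the
caller does not need to know about it.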

> Would the subpacketization (or grouping of chunks) create problems with the
> unit tests?
> Do you experts see any other implications or side-effects?
> 
> Motivation: The required read access for recovering one drive in a (14 total
> disks, 10 systematic data disks) setup using Reed-Solomon is 10. This can
> theoretically be reduced by ~40% by introducing substripes (splitting each
> of the 14 parts into many smaller parts, but fundamentally storing the first
> 10 major parts in exactly the same way on the HDD, meaning that the I/O of
> normal reads is not impacted at all). There are many trade-offs to
> consider, and so we wish to test the performance differences.

There is a pull request pending that only reads part of the chunks when the size is smaller than a stripe. This may be useful for workloads involving small objects. Is that what you are thinking about?

Cheers

> 
> Sincerely,
> Sindre B. Stene

-- 
Loïc Dachary, Artisan Logiciel Libre

