RGW: Implement S3 storage class feature

* RGW: Implement S3 storage class feature
@ 2017-06-21 11:39 Jiaying Ren
  2017-06-21 14:04 ` Matt Benjamin
  0 siblings, 1 reply; 10+ messages in thread
From: Jiaying Ren @ 2017-06-21 11:39 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub; +Cc: ceph-devel

Hi~ Yehuda and Ceph developers:

We're prototyping the S3 storage class feature [1]. It seems this has
been attempted before [2]. I'd like to share the following as a
starting point for anyone who is interested in this feature; your
comments are appreciated.

* Storage Class Category

The storage classes that S3 currently supports can be classified by
whether the storage class can be set at upload time:

+ Direct Storage Class (such as STANDARD, STANDARD_IA and Reduced
  Redundancy Storage, which can be specified when uploading an object)
+ Indirect Storage Class (such as Glacier, which can only be reached
  through lifecycle management)

This proposal covers only the Direct Storage Class.

* Core Concept

RGW currently uses the following concepts to determine bucket/object
placement:

+ placement rule - a key-value pair, with the placement id as key and
  the placement info as value.
+ placement info - a collection of rados pools.
+ placement target - a placement id plus a list of placement tags,
  used only to determine whether a user may use the placement rule.

A placement target can be manipulated only in the zonegroup, and a
placement rule only in the zone.

* Feature Mapping

To keep S3 storage class and Swift storage policy orthogonal, we can
use the existing placement rule as the underlying building block and
map the two dialects' features as follows:

+ Swift storage policy = per-bucket placement rule
+ S3 storage class = per-object placement rule

Each storage class is represented by a placement rule that uses a
different data pool (for example, STANDARD uses a 3-replica data_pool
and Reduced Redundancy Storage uses a 2-replica data_pool). However,
we need to enforce that all storage classes defined in the same zone
use the same index_pool for the bucket index and the same pool for
object metadata.
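
The shared-pool constraint above could be checked roughly as follows.
This is a minimal sketch in Python, not RGW code: the dict layout only
mirrors the index_pool/data_pool fields of a placement info, the pool
names are made up, and check_shared_index_pool is a hypothetical
helper. It checks index_pool only; the metadata pool constraint would
presumably be checked the same way.

```python
def check_shared_index_pool(placement_rules):
    """Return the single index_pool shared by all rules, or raise ValueError.

    placement_rules maps a storage class name to its placement info dict.
    """
    index_pools = {info["index_pool"] for info in placement_rules.values()}
    if len(index_pools) != 1:
        raise ValueError(
            f"storage classes must share one index_pool, got {sorted(index_pools)}"
        )
    return next(iter(index_pools))


# Hypothetical zone config: two storage classes with different data pools.
rules = {
    "STANDARD": {"index_pool": "zone.index", "data_pool": "zone.data.3replica"},
    "RRS": {"index_pool": "zone.index", "data_pool": "zone.data.2replica"},
}
shared = check_shared_index_pool(rules)
```

A zone whose storage classes name different index pools would be
rejected with ValueError before any bucket is created on it.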

* Priority of placement rule

The following structs each need to contain a default placement rule:

+ zonegroup
+ user
+ bucket

From these we need to determine the placement rule used by a
bucket/object.

** bucket placement rule

The bucket default placement rule is chosen in this order of priority:

request rule > user default rule > zonegroup default rule

The bucket default placement rule must not be empty after bucket
creation.

** object placement rule

The object's placement rule is chosen in this order of priority:

request rule > bucket default rule
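
The two priority orders above can be sketched as a simple fallback
chain. This is only an illustration in Python under the assumption
that an unset rule is represented as None or an empty string; the
function names are hypothetical and do not exist in RGW.

```python
def resolve_bucket_rule(request_rule, user_default, zonegroup_default):
    """Pick the bucket default placement rule: request > user > zonegroup."""
    for rule in (request_rule, user_default, zonegroup_default):
        if rule:  # None or "" means "not specified at this level"
            return rule
    raise ValueError("bucket default placement rule must not be empty")


def resolve_object_rule(request_rule, bucket_default):
    """Pick the object's placement rule: request > bucket default."""
    return request_rule or bucket_default


# No rule in the bucket creation request, so the user default wins.
bucket_rule = resolve_bucket_rule(None, "RRS", "STANDARD")
# The upload request names a storage class, so it overrides the bucket default.
object_rule = resolve_object_rule("STANDARD", bucket_rule)
```

Because the bucket rule is never empty after creation, the object
lookup always has something to fall back to.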

* Todo List

+ the head of an rgw-object should contain only the metadata of the
  rgw-object; the first chunk of rgw-object data should be stored in
  the same pool as the tail of the rgw-object

* References

+ [1] http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html
+ [2] http://tracker.ceph.com/issues/12907

--
mikulely


* Re: RGW: Implement S3 storage class feature
  2017-06-21 11:39 RGW: Implement S3 storage class feature Jiaying Ren
@ 2017-06-21 14:04 ` Matt Benjamin
  2017-06-21 14:46   ` Daniel Gryniewicz
  0 siblings, 1 reply; 10+ messages in thread
From: Matt Benjamin @ 2017-06-21 14:04 UTC (permalink / raw)
  To: Jiaying Ren; +Cc: Yehuda Sadeh-Weinraub, ceph-devel

Hi,

Looks very coherent.

My main question is about...

----- Original Message -----
> From: "Jiaying Ren" <mikulely@gmail.com>
> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Wednesday, June 21, 2017 7:39:24 AM
> Subject: RGW: Implement S3 storage class feature
> 

> 
> * Todo List
> 
> + the head of rgw-object should only contains the metadata of
>   rgw-object,the first chunk of rgw-object data should be stored in
>   the same pool as the tail of rgw-object

Is this always desirable?

Matt
 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


* Re: RGW: Implement S3 storage class feature
  2017-06-21 14:04 ` Matt Benjamin
@ 2017-06-21 14:46   ` Daniel Gryniewicz
  2017-06-21 15:14     ` Yehuda Sadeh-Weinraub
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Gryniewicz @ 2017-06-21 14:46 UTC (permalink / raw)
  To: Matt Benjamin, Jiaying Ren; +Cc: Yehuda Sadeh-Weinraub, ceph-devel

On 06/21/2017 10:04 AM, Matt Benjamin wrote:
> Hi,
>
> Looks very coherent.
>
> My main question is about...
>
> ----- Original Message -----
>> From: "Jiaying Ren" <mikulely@gmail.com>
>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>> Subject: RGW: Implement S3 storage class feature
>>
>
>>
>> * Todo List
>>
>> + the head of rgw-object should only contains the metadata of
>>   rgw-object,the first chunk of rgw-object data should be stored in
>>   the same pool as the tail of rgw-object
>
> Is this always desirable?
>

Well, unless the head pool happens to have the correct storage class, 
it's necessary.  And I'd guess that verification of this is complicated, 
although maybe not.

Maybe we can use the head pool if it has >= the correct storage class?

Daniel



* Re: RGW: Implement S3 storage class feature
  2017-06-21 14:46   ` Daniel Gryniewicz
@ 2017-06-21 15:14     ` Yehuda Sadeh-Weinraub
  2017-06-21 15:50       ` Daniel Gryniewicz
  0 siblings, 1 reply; 10+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2017-06-21 15:14 UTC (permalink / raw)
  To: Daniel Gryniewicz; +Cc: Matt Benjamin, Jiaying Ren, ceph-devel

On Wed, Jun 21, 2017 at 7:46 AM, Daniel Gryniewicz <dang@redhat.com> wrote:
> On 06/21/2017 10:04 AM, Matt Benjamin wrote:
>>
>> Hi,
>>
>> Looks very coherent.
>>
>> My main question is about...
>>
>> ----- Original Message -----
>>>
>>> From: "Jiaying Ren" <mikulely@gmail.com>
>>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>>> Subject: RGW: Implement S3 storage class feature
>>>
>>
>>>
>>> * Todo List
>>>
>>> + the head of rgw-object should only contains the metadata of
>>>   rgw-object,the first chunk of rgw-object data should be stored in
>>>   the same pool as the tail of rgw-object
>>
>>
>> Is this always desirable?
>>
>
> Well, unless the head pool happens to have the correct storage class, it's
> necessary.  And I'd guess that verification of this is complicated, although
> maybe not.
>
> Maybe we can use the head pool if it has >= the correct storage class?
>
My original thinking was that when we reassign an object to a new
placement, we only touch its tail which is incompatible with that.
However, thinking about it some more I don't see why we need to have
this limitation, so it's probably possible to keep the data in the
head in one case, and modify the object and have the data in the tail
(object's head will need to be rewritten anyway because we modify the
manifest).
I think that the decision whether we keep data in the head could be a
property of the zone. In any case, once an object is created changing
this property will only affect newly created objects, and old objects
could still be read correctly. Having data in the head is an
optimization that supposedly reduces small objects latency, and I
still think it's useful in a mixed pools situation. The thought is
that the bulk of the data will be at the tail anyway. However, we
recently changed the default head size from 512k to 4M, so this might
not be true any more. Anyhow, I favour having this as a configurable
(which should be simple to add).

Yehuda


* Re: RGW: Implement S3 storage class feature
  2017-06-21 15:14     ` Yehuda Sadeh-Weinraub
@ 2017-06-21 15:50       ` Daniel Gryniewicz
  2017-06-21 16:37         ` Yehuda Sadeh-Weinraub
  2017-06-22  9:44         ` Jiaying Ren
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Gryniewicz @ 2017-06-21 15:50 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub; +Cc: Matt Benjamin, Jiaying Ren, ceph-devel

On 06/21/2017 11:14 AM, Yehuda Sadeh-Weinraub wrote:
> On Wed, Jun 21, 2017 at 7:46 AM, Daniel Gryniewicz <dang@redhat.com> wrote:
>> On 06/21/2017 10:04 AM, Matt Benjamin wrote:
>>>
>>> Hi,
>>>
>>> Looks very coherent.
>>>
>>> My main question is about...
>>>
>>> ----- Original Message -----
>>>>
>>>> From: "Jiaying Ren" <mikulely@gmail.com>
>>>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>>>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>>>> Subject: RGW: Implement S3 storage class feature
>>>>
>>>
>>>>
>>>> * Todo List
>>>>
>>>> + the head of rgw-object should only contains the metadata of
>>>>   rgw-object,the first chunk of rgw-object data should be stored in
>>>>   the same pool as the tail of rgw-object
>>>
>>>
>>> Is this always desirable?
>>>
>>
>> Well, unless the head pool happens to have the correct storage class, it's
>> necessary.  And I'd guess that verification of this is complicated, although
>> maybe not.
>>
>> Maybe we can use the head pool if it has >= the correct storage class?
>>
> My original thinking was that when we reassign an object to a new
> placement, we only touch its tail which is incompatible with that.
> However, thinking about it some more I don't see why we need to have
> this limitation, so it's probably possible to keep the data in the
> head in one case, and modify the object and have the data in the tail
> (object's head will need to be rewritten anyway because we modify the
> manifest).
> I think that the decision whether we keep data in the head could be a
> property of the zone. In any case, once an object is created changing
> this property will only affect newly created objects, and old objects
> could still be read correctly. Having data in the head is an
> optimization that supposedly reduces small objects latency, and I
> still think it's useful in a mixed pools situation. The thought is
> that the bulk of the data will be at the tail anyway. However, we
> recently changed the default head size from 512k to 4M, so this might
> not be true any more. Anyhow, I favour having this as a configurable
> (which should be simple to add).
>
> Yehuda
>


I would be strongly against keeping data in the head when the head is in 
a lower-level storage class.  That means that the entire object is 
violating the constraints of the storage class.

Of course, having the head in a lower storage class (data or not) is 
probably a violation.  Maybe we'd have to require that all heads go in 
the highest storage class.

Daniel


* Re: RGW: Implement S3 storage class feature
  2017-06-21 15:50       ` Daniel Gryniewicz
@ 2017-06-21 16:37         ` Yehuda Sadeh-Weinraub
  2017-06-22  9:44         ` Jiaying Ren
  1 sibling, 0 replies; 10+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2017-06-21 16:37 UTC (permalink / raw)
  To: Daniel Gryniewicz; +Cc: Matt Benjamin, Jiaying Ren, ceph-devel

On Wed, Jun 21, 2017 at 8:50 AM, Daniel Gryniewicz <dang@redhat.com> wrote:
> On 06/21/2017 11:14 AM, Yehuda Sadeh-Weinraub wrote:
>>
>> On Wed, Jun 21, 2017 at 7:46 AM, Daniel Gryniewicz <dang@redhat.com>
>> wrote:
>>>
>>> On 06/21/2017 10:04 AM, Matt Benjamin wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> Looks very coherent.
>>>>
>>>> My main question is about...
>>>>
>>>> ----- Original Message -----
>>>>>
>>>>>
>>>>> From: "Jiaying Ren" <mikulely@gmail.com>
>>>>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
>>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>>>>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>>>>> Subject: RGW: Implement S3 storage class feature
>>>>>
>>>>
>>>>>
>>>>> * Todo List
>>>>>
>>>>> + the head of rgw-object should only contains the metadata of
>>>>>   rgw-object,the first chunk of rgw-object data should be stored in
>>>>>   the same pool as the tail of rgw-object
>>>>
>>>>
>>>>
>>>> Is this always desirable?
>>>>
>>>
>>> Well, unless the head pool happens to have the correct storage class,
>>> it's
>>> necessary.  And I'd guess that verification of this is complicated,
>>> although
>>> maybe not.
>>>
>>> Maybe we can use the head pool if it has >= the correct storage class?
>>>
>> My original thinking was that when we reassign an object to a new
>> placement, we only touch its tail which is incompatible with that.
>> However, thinking about it some more I don't see why we need to have
>> this limitation, so it's probably possible to keep the data in the
>> head in one case, and modify the object and have the data in the tail
>> (object's head will need to be rewritten anyway because we modify the
>> manifest).
>> I think that the decision whether we keep data in the head could be a
>> property of the zone. In any case, once an object is created changing
>> this property will only affect newly created objects, and old objects
>> could still be read correctly. Having data in the head is an
>> optimization that supposedly reduces small objects latency, and I
>> still think it's useful in a mixed pools situation. The thought is
>> that the bulk of the data will be at the tail anyway. However, we
>> recently changed the default head size from 512k to 4M, so this might
>> not be true any more. Anyhow, I favour having this as a configurable
>> (which should be simple to add).
>>
>> Yehuda
>>
>
>
> I would be strongly against keeping data in the head when the head is in a
> lower-level storage class.  That means that the entire object is violating
> the constraints of the storage class.
>
> Of course, having the head in a lower storage class (data or not) is
> probably a violation.  Maybe we'd have to require that all heads go in the
> highest storage class.
>

I'd keep it simple. Note that all objects' heads in a bucket need to
reside in the same pool, otherwise we won't be able to locate them
(unless we start searching).

Yehuda


* Re: RGW: Implement S3 storage class feature
  2017-06-21 15:50       ` Daniel Gryniewicz
  2017-06-21 16:37         ` Yehuda Sadeh-Weinraub
@ 2017-06-22  9:44         ` Jiaying Ren
       [not found]           ` <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>
  1 sibling, 1 reply; 10+ messages in thread
From: Jiaying Ren @ 2017-06-22  9:44 UTC (permalink / raw)
  To: dang; +Cc: Yehuda Sadeh-Weinraub, Matt Benjamin, ceph-devel

On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@redhat.com> wrote:
>>>
>> My original thinking was that when we reassign an object to a new
>> placement, we only touch its tail which is incompatible with that.
>> However, thinking about it some more I don't see why we need to have
>> this limitation, so it's probably possible to keep the data in the
>> head in one case, and modify the object and have the data in the tail
>> (object's head will need to be rewritten anyway because we modify the
>> manifest).
>> I think that the decision whether we keep data in the head could be a
>> property of the zone.

Yes. I guess we also need to check the zone placement rule config when
pulling the realm in a multisite environment, to make sure the sync
peer supports the same storage classes. Multisite sync should also
respect the object's storage class.

>> In any case, once an object is created changing
>> this property will only affect newly created objects, and old objects
>> could still be read correctly. Having data in the head is an
>> optimization that supposedly reduces small objects latency, and I
>> still think it's useful in a mixed pools situation. The thought is
>> that the bulk of the data will be at the tail anyway. However, we
>> recently changed the default head size from 512k to 4M, so this might
>> not be true any more. Anyhow, I favour having this as a configurable
>> (which should be simple to add).
>>
>> Yehuda
>>
>
>
> I would be strongly against keeping data in the head when the head is in a
> lower-level storage class.  That means that the entire object is violating
> the constraints of the storage class.

Agreed. The default behavior of a storage class requires us to keep
the data that would go in the head in the same pool as the tail. Even
if we make this a configurable option, we should disable this kind of
inlining by default to match the default behavior of storage class.

>
> Of course, having the head in a lower storage class (data or not) is
> probably a violation.  Maybe we'd have to require that all heads go in the
> highest storage class.
>
> Daniel



[parent not found: <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>]

* Re: RGW: Implement S3 storage class feature
       [not found]           ` <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>
@ 2017-07-06 11:00             ` Jiaying Ren
  2017-07-13 17:48               ` yuxiang fang
  0 siblings, 1 reply; 10+ messages in thread
From: Jiaying Ren @ 2017-07-06 11:00 UTC (permalink / raw)
  To: 方钰翔, Yehuda Sadeh-Weinraub, Matt Benjamin,
	dang
  Cc: ceph-devel

Thanks all for your insight!
After more investigation, I'd like to share some results; your
comments are appreciated as always. ;-)

* proposal

** introduce tail_data_pool

Each storage class is represented as an individual placement rule.
Each placement rule has several pools:

+ index_pool (for the bucket index)
+ data_pool (for the head)
+ tail_data_pool (for the tail)

As a result, different storage classes use the same index_pool and
data_pool but different tail_data_pools; choosing a storage class
means choosing a tail_data_pool.

Here's a sample placement rule/storage class config:

#+BEGIN_EXAMPLE
    {
        "key": "STANDARD",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.3replica",  <- introduced for rgw_obj raw data
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    },
#+END_EXAMPLE

A multipart rgw_obj will be stored in the tail_data_pool. Furthermore,
for an rgw_obj that has only a head and no tail, we can refactor the
manifest to support disabling the inlining of the first data chunk of
the rgw_obj into the head, which finally matches the semantics of AWS
S3 storage classes:

#+BEGIN_EXAMPLE
    {
        "key": "STANDARD",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.3replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1  <- introduced for inline first data chunk of rgw_obj into head
        }
    },
#+END_EXAMPLE
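
To make the effect of tail_data_pool and inline_head concrete, here is
a minimal sketch in Python of where an object's bytes would land. It
assumes the semantics described above (the head always lives in
data_pool and carries data only when inline_head is set, everything
else goes to tail_data_pool) and the 4M default head size mentioned
earlier in the thread. place_object is a hypothetical helper, not an
RGW function.

```python
def place_object(size, head_size, placement):
    """Return (pool, nbytes) pairs describing where an object's data lands.

    The head always lives in data_pool; it carries data only when
    inline_head is set. All remaining data goes to tail_data_pool.
    """
    inline = min(size, head_size) if placement["inline_head"] else 0
    layout = [(placement["data_pool"], inline)]
    if size > inline:
        layout.append((placement["tail_data_pool"], size - inline))
    return layout


standard = {
    "data_pool": "us-east-1.rgw.buckets.data",
    "tail_data_pool": "us-east-1.rgw.buckets.3replica",
    "inline_head": 1,
}
MiB = 1024 * 1024

# A 4K object with inlining: all data sits in the head, in data_pool.
small_inline = place_object(4096, 4 * MiB, standard)
# The same object with inlining disabled: the head holds metadata only.
small_strict = place_object(4096, 4 * MiB, {**standard, "inline_head": 0})
```

With inline_head disabled, every data byte ends up in the pool that
defines the storage class, at the cost of a second rados object for
small uploads.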

** expose different storage class as individual placement rule

As a draft, placement list will list every storage class:

#+BEGIN_EXAMPLE
 ./bin/radosgw-admin -c ceph.conf zone  placement list
[
    {
        "key": "STANDARD",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.3replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    },

    {
        "key": "RRS",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.2replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    }
]
#+END_EXAMPLE

Another option would be to expose several storage classes in the same
placement rule:

#+BEGIN_EXAMPLE
 ./bin/radosgw-admin -c ceph.conf zone  placement list
[
    {
        "key": "default-placement",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "storage_class": {
              "STANDARD" : {
                           "data_pool": "us-east-1.rgw.3replica",
                           "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
                           "inline_head": 1
                           },
              "RRS" :      {
                           "data_pool": "us-east-1.rgw.2replica",
                           "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
                           "inline_head": 1
                           }
            },
            "index_type": 0,
            "compression": ""
        }
    }
]
#+END_EXAMPLE

This approach restricts the meaning of a storage class to "a different
data pool". But we may want to support things like Multi-Regional
Storage (
https://cloud.google.com/storage/docs/storage-classes#multi-regional )
in the future, so I'd prefer to expose storage classes at the
placement rule level.

* issues

If we introduce the tail_data_pool, we need some corresponding
modifications. I'm not sure about these; feedback is appreciated.

** use rgw_pool instead of placement rule in the RGWObjManifest

In the RGWObjManifest, we've defined two placement rules:

+ head_placement_rule
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L406)
+ tail_placement.placement_rule
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L119)

We then use the placement rule to find its data_pool. If we introduce
the tail_data_pool, there's no need to keep
tail_placement.placement_rule (although it is the same as
head_placement_rule).

Inside RGWObjManifest, `class rgw_obj_select` also defines a
`placement_rule`
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L127),
which is likewise used to find the data_pool of that placement rule.

So I propose replacing the placement rule in the RGWObjManifest with
an rgw_pool, so that we have the chance to use both tail_data_pool and
data_pool within the same placement rule.
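
To illustrate the idea: once a placement rule carries both data_pool
and tail_data_pool, "the pool of rule X" is ambiguous, whereas a
manifest that records pools directly is not. The following is a
minimal Python sketch with a hypothetical Manifest class; it is only
loosely modeled on RGWObjManifest, and its field names are
assumptions, not the real members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical manifest that stores pools rather than a placement rule."""
    head_pool: str
    tail_pool: str

    def pool_for(self, part):
        # part is "head" or "tail"; no zone config lookup is needed here.
        return self.head_pool if part == "head" else self.tail_pool


def manifest_from_rule(placement_info):
    """Resolve the rule once, at write time, and pin the pools in the manifest."""
    return Manifest(
        head_pool=placement_info["data_pool"],
        tail_pool=placement_info["tail_data_pool"],
    )


rrs = {
    "data_pool": "us-east-1.rgw.buckets.data",
    "tail_data_pool": "us-east-1.rgw.buckets.2replica",
}
m = manifest_from_rule(rrs)
```

One consequence of pinning pools this way is that an existing object
stays readable even if the placement rule is later edited to point at
different pools.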

On 23 June 2017 at 13:43, 方钰翔 <abcdeffyx@gmail.com> wrote:
> I think storing the head object and tail objects in different pools is also
> necessary.
>
> If we introduce a tail_data_pool in placement rule to store tail objects. we
> can create replicated pool for data_pool and ec for tail_data_pool to
> leverage the performance and capacity.


* Re: RGW: Implement S3 storage class feature
  2017-07-06 11:00             ` Jiaying Ren
@ 2017-07-13 17:48               ` yuxiang fang
  2017-07-18  2:12                 ` Jiaying Ren
  0 siblings, 1 reply; 10+ messages in thread
From: yuxiang fang @ 2017-07-13 17:48 UTC (permalink / raw)
  To: Jiaying Ren
  Cc: Yehuda Sadeh-Weinraub, Matt Benjamin, Daniel Gryniewicz,
	ceph-devel

I hope this mail gets through now that I'm using "plain text mode".

We once ran into a problem when we built radosgw object storage on an
ec pool (k=4, m=2). In a cosbench upload test we got 700-800 ops for
4K objects, versus 2212.18 ops for 4K objects on a 3-replica pool.
Performance on ec is lower than on the 3-replica pool for small
objects, but the two eventually reach similar throughput as the
object size grows (4MB, 8MB or bigger). This is easy to explain: cpu
is the bottleneck when uploading small objects, while the disks are
the bottleneck when uploading bigger objects.

Our customers always care about cost, so ec is a good choice to lower
the cost of capacity, but it also brings the performance problem
described above. So I looked for a way to improve ec performance for
radosgw-based object storage, and found a way to balance capacity and
performance.

My opinion is that we should support storing the head and tail objects
of a radosgw object separately, i.e. storing head objects in a
3-replica pool and tail objects in an ec pool. For small objects we
then get the performance of the 3-replica pool, and we also get the
67% capacity utilization of ec (3-replica only gives 33%).
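
To make the 67% and 33% figures concrete, here is a small sketch of the
arithmetic. The function names are mine and only for illustration: an ec
pool with k data chunks and m coding chunks keeps k/(k+m) of the raw
capacity usable, while a pool with n full replicas keeps 1/n.

```python
def ec_usable_fraction(k: int, m: int) -> float:
    """Usable share of raw capacity for an ec pool with k data and m coding chunks."""
    return k / (k + m)


def replica_usable_fraction(copies: int) -> float:
    """Usable share of raw capacity for a pool that keeps `copies` full replicas."""
    return 1 / copies


if __name__ == "__main__":
    # k=4, m=2 is the ec profile from the test above; 3 is the replica count.
    print(f"ec k=4, m=2: {ec_usable_fraction(4, 2):.0%}")
    print(f"3 replicas:  {replica_usable_fraction(3):.0%}")
```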

Consider this scenario: to upload big (MB or GB) objects we prefer
multipart. radosgw stripes every part into several tail rados objects,
with no head object, and all of them land in the ec pool. Since they
are big objects we get throughput similar to the 3-replica pool, and
we also get the capacity benefit.

The Pareto principle (also known as the 80–20 rule) also holds for
some workloads, i.e. 20% of the files/objects occupy 80% of the
capacity. This is not just a subjective guess: my company's shared
disk (like dropbox; it stores department e-docs, software, and so on)
obeys the rule, and is even 85-15 (15% files occupy 80% capacity).

As I wrote in a mail several days ago (which was rejected by the Mail
Delivery Subsystem): if we introduce a tail_data_pool in the placement
rule to store tail objects, we can create a replicated pool for
data_pool and an ec pool for tail_data_pool to balance performance and
capacity.
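
As a rough sketch of the idea (this is not the code in the PR): the dict
layout follows the placement rule sample quoted below, while
select_pool, the pool name ending in ".ec", and the fallback to
data_pool when no tail_data_pool is configured are only my assumptions
for illustration.

```python
from typing import Dict

# Hypothetical placement rule, shaped like the sample config in this thread.
# The ".ec" pool name is made up for this sketch.
PLACEMENT: Dict[str, str] = {
    "index_pool": "us-east-1.rgw.buckets.index",
    "data_pool": "us-east-1.rgw.buckets.data",     # replicated pool, holds heads
    "tail_data_pool": "us-east-1.rgw.buckets.ec",  # ec pool, holds tails
}


def select_pool(placement: Dict[str, str], is_head: bool) -> str:
    """Pick the rados pool for one rados object of an rgw object.

    Heads go to data_pool and tails go to tail_data_pool. Falling back to
    data_pool when tail_data_pool is unset is an assumption of this sketch.
    """
    if is_head:
        return placement["data_pool"]
    return placement.get("tail_data_pool") or placement["data_pool"]


if __name__ == "__main__":
    print("head ->", select_pool(PLACEMENT, is_head=True))
    print("tail ->", select_pool(PLACEMENT, is_head=False))
```

With this split, a small object that fits entirely in its head never
touches the ec pool, while multipart uploads (tail objects only) land
entirely in it.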

I have opened a PR and would like to request comments:
https://github.com/ceph/ceph/pull/16325


thanks
ivan from eisoo


On Thu, Jul 6, 2017 at 7:00 PM, Jiaying Ren <mikulely@gmail.com> wrote:
> Thanks all for your insight!
> After more investigation, I'd like to share some results; your
> comments are appreciated as always. ;-)
>
> * proposal
>
> ** introduce tail_data_pool
>
> Each storage class is presented as an individual placement rule. Each
> placement rule has several pools:
>
> + index_pool (for bucket index)
> + data_pool (for head)
> + tail_data_pool (for tail)
>
> Finally, different storage classes use the same index_pool and
> data_pool, but different tail_data_pools. Using different storage
> classes means using different tail_data_pools.
>
> Here's a placement rule/storage class config sample output:
>
> #+BEGIN_EXAMPLE
>     {
>         "key": "STANDARD",
>         "val": {
>             "index_pool": "us-east-1.rgw.buckets.index",
>             "data_pool": "us-east-1.rgw.buckets.data",
>             "tail_data_pool": "us-east-1.rgw.buckets.3replica", <- introduced for rgw_obj raw data
>             "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>             "index_type": 0,
>             "compression": "",
>             "inline_head": 1
>         }
>     },
> #+END_EXAMPLE
>
> Multipart rgw_obj data will be stored in tail_data_pool. Furthermore,
> for an rgw_obj that has only a head and no tail, we can refactor the
> Manifest to support disabling the inlining of the first data chunk of
> the rgw_obj into the head, which finally matches the semantics of the
> AWS S3 storage class:
>
> #+BEGIN_EXAMPLE
>     {
>         "key": "STANDARD",
>         "val": {
>             "index_pool": "us-east-1.rgw.buckets.index",
>             "data_pool": "us-east-1.rgw.buckets.data",
>             "tail_data_pool": "us-east-1.rgw.buckets.3replica",
>             "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>             "index_type": 0,
>             "compression": "",
>             "inline_head": 1  <- introduced for inline first data chunk of rgw_obj into head
>         }
>     },
> #+END_EXAMPLE
>
> ** expose different storage class as individual placement rule
>
> As a draft, placement list will list all storage classes:
>
> #+BEGIN_EXAMPLE
>  ./bin/radosgw-admin -c ceph.conf zone  placement list
> [
>     {
>         "key": "STANDARD",
>         "val": {
>             "index_pool": "us-east-1.rgw.buckets.index",
>             "data_pool": "us-east-1.rgw.buckets.data",
>             "tail_data_pool": "us-east-1.rgw.buckets.3replica",
>             "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>             "index_type": 0,
>             "compression": "",
>             "inline_head": 1
>         }
>     },
>
>     {
>         "key": "RRS",
>         "val": {
>             "index_pool": "us-east-1.rgw.buckets.index",
>             "data_pool": "us-east-1.rgw.buckets.data",
>             "tail_data_pool": "us-east-1.rgw.buckets.2replica",
>             "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>             "index_type": 0,
>             "compression": "",
>             "inline_head": 1
>         }
>     }
> ]
> #+END_EXAMPLE
>
> Another option would be to expose several storage classes in the same
> placement rule:
>
> #+BEGIN_EXAMPLE
>  ./bin/radosgw-admin -c ceph.conf zone  placement list
> [
>     {
>         "key": "default-placement",
>         "val": {
>             "index_pool": "us-east-1.rgw.buckets.index",
>             "storage_class":
>             {
>               "STANDARD" : {
>                            "data_pool": "us-east-1.rgw.3replica",
>                            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>                            "inline_head": 1
>                            },
>               "RRS" :      {
>                            "data_pool": "us-east-1.rgw.2replica",
>                            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>                            "inline_head": 1
>                            }
>             },
>             "index_type": 0,
>             "compression": ""
>         }
>     }
> ]
> #+END_EXAMPLE
>
> This approach restricts the meaning of storage class to a different
> data pool. But we may support things like Multi-Regional Storage (
> https://cloud.google.com/storage/docs/storage-classes#multi-regional )
> in the future, so I'd prefer to expose storage classes at the
> placement rule level.
>
> * issues
>
> If we introduce the tail_data_pool, we need corresponding
> modifications. I'm not sure about this; feedback is appreciated.
>
> ** use rgw_pool instead of placement rule in the RGWManifest
>
> In the RGWObjManifest, we've defined two placement rules:
>
> + head_placement_rule
> (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L406)
> + tail_placement.placement_rule
> (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L119)
>
> Then we use the placement rule to find the data_pool of that placement
> rule. If we introduce the tail_data_pool, there's no need to keep
> tail_placement.placement_rule (although it is the same as
> head_placement_rule).
>
> Inside RGWObjManifest, `class rgw_obj_select` also defines a
> `placement_rule`
> (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L127),
> which is ultimately used to find the data_pool of that
> placement rule.
>
> So I propose replacing the placement rule in the RGWManifest with
> rgw_pool, so that we have the chance to use tail_data_pool and
> data_pool in the same placement rule.
>
> On 23 June 2017 at 13:43, 方钰翔 <abcdeffyx@gmail.com> wrote:
>> I think storing the head object and tail objects in different pools is also
>> necessary.
>>
>> If we introduce a tail_data_pool in the placement rule to store tail objects,
>> we can create a replicated pool for data_pool and an ec pool for
>> tail_data_pool to balance performance and capacity.
>>
>> 2017-06-22 17:44 GMT+08:00 Jiaying Ren <mikulely@gmail.com>:
>>>
>>> On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@redhat.com> wrote:
>>> >>>
>>> >> My original thinking was that when we reassign an object to a new
>>> >> placement, we only touch its tail which is incompatible with that.
>>> >> However, thinking about it some more I don't see why we need to have
>>> >> this limitation, so it's probably possible to keep the data in the
>>> >> head in one case, and modify the object and have the data in the tail
>>> >> (object's head will need to be rewritten anyway because we modify the
>>> >> manifest).
>>> >> I think that the decision whether we keep data in the head could be a
>>> >> property of the zone.
>>>
>>> Yes, I guess we also need to check the zone placement rule config when
>>> pulling the realm in a multisite env, to make sure the sync peer has
>>> the same storage class support. Multisite sync should also respect
>>> the object storage class.
>>>
>>> >> In any case, once an object is created changing
>>> >> this property will only affect newly created objects, and old objects
>>> >> could still be read correctly. Having data in the head is an
>>> >> optimization that supposedly reduces small objects latency, and I
>>> >> still think it's useful in a mixed pools situation. The thought is
>>> >> that the bulk of the data will be at the tail anyway. However, we
>>> >> recently changed the default head size from 512k to 4M, so this might
>>> >> not be true any more. Anyhow, I favour having this as a configurable
>>> >> (which should be simple to add).
>>> >>
>>> >> Yehuda
>>> >>
>>> >
>>> >
>>> > I would be strongly against keeping data in the head when the head is in
>>> > a
>>> > lower-level storage class.  That means that the entire object is
>>> > violating
>>> > the constraints of the storage class.
>>>
>>> Agreed. The default behavior of storage class requires us to keep the
>>> data in the head in the same pool as the tail. Even if we make this
>>> a configurable option, we should disable this kind of inlining by
>>> default to match the default behavior of storage class.
>>>
>>> >
>>> > Of course, having the head in a lower storage class (data or not) is
>>> > probably a violation.  Maybe we'd have to require that all heads go in
>>> > the
>>> > highest storage class.
>>> >
>>> > Daniel
>>>
>>> On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@redhat.com> wrote:
>>> > On 06/21/2017 11:14 AM, Yehuda Sadeh-Weinraub wrote:
>>> >>
>>> >> On Wed, Jun 21, 2017 at 7:46 AM, Daniel Gryniewicz <dang@redhat.com>
>>> >> wrote:
>>> >>>
>>> >>> On 06/21/2017 10:04 AM, Matt Benjamin wrote:
>>> >>>>
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> Looks very coherent.
>>> >>>>
>>> >>>> My main question is about...
>>> >>>>
>>> >>>> ----- Original Message -----
>>> >>>>>
>>> >>>>>
>>> >>>>> From: "Jiaying Ren" <mikulely@gmail.com>
>>> >>>>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
>>> >>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>>> >>>>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>>> >>>>> Subject: RGW: Implement S3 storage class feature
>>> >>>>>
>>> >>>>
>>> >>>>>
>>> >>>>> * Todo List
>>> >>>>>
>>> >>>>> + the head of rgw-object should only contains the metadata of
>>> >>>>>   rgw-object,the first chunk of rgw-object data should be stored in
>>> >>>>>   the same pool as the tail of rgw-object
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> Is this always desirable?
>>> >>>>
>>> >>>
>>> >>> Well, unless the head pool happens to have the correct storage class,
>>> >>> it's
>>> >>> necessary.  And I'd guess that verification of this is complicated,
>>> >>> although
>>> >>> maybe not.
>>> >>>
>>> >>> Maybe we can use the head pool if it has >= the correct storage class?
>>> >>>
>>> >> My original thinking was that when we reassign an object to a new
>>> >> placement, we only touch its tail which is incompatible with that.
>>> >> However, thinking about it some more I don't see why we need to have
>>> >> this limitation, so it's probably possible to keep the data in the
>>> >> head in one case, and modify the object and have the data in the tail
>>> >> (object's head will need to be rewritten anyway because we modify the
>>> >> manifest).
>>> >> I think that the decision whether we keep data in the head could be a
>>> >> property of the zone. In any case, once an object is created changing
>>> >> this property will only affect newly created objects, and old objects
>>> >> could still be read correctly. Having data in the head is an
>>> >> optimization that supposedly reduces small objects latency, and I
>>> >> still think it's useful in a mixed pools situation. The thought is
>>> >> that the bulk of the data will be at the tail anyway. However, we
>>> >> recently changed the default head size from 512k to 4M, so this might
>>> >> not be true any more. Anyhow, I favour having this as a configurable
>>> >> (which should be simple to add).
>>> >>
>>> >> Yehuda
>>> >>
>>> >
>>> >
>>> > I would be strongly against keeping data in the head when the head is in
>>> > a
>>> > lower-level storage class.  That means that the entire object is
>>> > violating
>>> > the constraints of the storage class.
>>> >
>>> > Of course, having the head in a lower storage class (data or not) is
>>> > probably a violation.  Maybe we'd have to require that all heads go in
>>> > the
>>> > highest storage class.
>>> >
>>> > Daniel
>>
>>


* Re: RGW: Implement S3 storage class feature
  2017-07-13 17:48               ` yuxiang fang
@ 2017-07-18  2:12                 ` Jiaying Ren
  0 siblings, 0 replies; 10+ messages in thread
From: Jiaying Ren @ 2017-07-18  2:12 UTC (permalink / raw)
  To: yuxiang fang
  Cc: Yehuda Sadeh-Weinraub, Matt Benjamin, Daniel Gryniewicz,
	ceph-devel

Hi~ yuxiang:

Glad to see you've pushed this forward! I will review it.



end of thread, other threads:[~2017-07-18  2:12 UTC | newest]

Thread overview: 10+ messages
2017-06-21 11:39 RGW: Implement S3 storage class feature Jiaying Ren
2017-06-21 14:04 ` Matt Benjamin
2017-06-21 14:46   ` Daniel Gryniewicz
2017-06-21 15:14     ` Yehuda Sadeh-Weinraub
2017-06-21 15:50       ` Daniel Gryniewicz
2017-06-21 16:37         ` Yehuda Sadeh-Weinraub
2017-06-22  9:44         ` Jiaying Ren
     [not found]           ` <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>
2017-07-06 11:00             ` Jiaying Ren
2017-07-13 17:48               ` yuxiang fang
2017-07-18  2:12                 ` Jiaying Ren
