* RGW: Implement S3 storage class feature
@ 2017-06-21 11:39 Jiaying Ren
  2017-06-21 14:04 ` Matt Benjamin
  0 siblings, 1 reply; 10+ messages in thread
From: Jiaying Ren @ 2017-06-21 11:39 UTC
  To: Yehuda Sadeh-Weinraub; +Cc: ceph-devel

Hi~ Yehuda and Ceph developers:

We're prototyping the S3 storage class feature [1]. It seems that this
has been attempted before [2]. I'd like to share the following as a
starting point for anyone who is interested in this feature; your
comments are appreciated.

* Storage Class Category

The storage class types that S3 currently supports can be classified by
whether the storage class can be set while uploading an object:

+ Direct Storage Class (like Reduced Redundancy Storage, STANDARD,
  STANDARD_IA; we can specify it when uploading an object)
+ Indirect Storage Class (like Glacier; we can only use this storage
  class type through lifecycle management)

Here we're going to talk about the Direct Storage Class.

* Core Concept

RGW currently uses the following concepts to determine bucket/object
placement:

+ placement rule
  - a placement rule is a key-value pair, with the placement id as key
    and the placement info as value.
+ placement info
  - a collection of rados pools.
+ placement target
  - a placement target contains a placement id and a list of placement
    tags, which are only used to determine whether the user can use the
    placement rule or not.

Placement targets can be manipulated only in the zonegroup, and
placement rules only in the zone.

* Feature Mapping

In order to keep the S3 storage class and the Swift storage policy
orthogonal, we can leverage the current placement rule as the
underlying building block and map the dialect features as:

+ Swift storage policy = per bucket placement rule
+ S3 storage class = per object placement rule

Each storage class is represented by a placement rule that uses a
different data pool (e.g. STANDARD uses a 3-replica data_pool, Reduced
Redundancy Storage uses a 2-replica data_pool), but we need to enforce
that the storage classes defined in the same zone use the same
index_pool for the bucket index and the same pool for object metadata.

* Priority of placement rule

The following structs:

+ zonegroup
+ user
+ bucket

need to contain a default placement rule, from which we determine the
placement rule used by a bucket/object.

** bucket placement rule

The order of placement rule priority to determine the bucket default
placement rule:

request rule > user default rule > zonegroup default rule

The bucket default placement rule should not be empty after bucket
creation.

** object placement rule

The order of placement rule priority to determine the object default
placement rule:

request rule > bucket default rule

* Todo List

+ the head of an rgw-object should only contain the metadata of the
  rgw-object; the first chunk of rgw-object data should be stored in
  the same pool as the tail of the rgw-object

* References

+ [1] http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html
+ [2] http://tracker.ceph.com/issues/12907

--
mikulely

^ permalink raw reply	[flat|nested] 10+ messages in thread
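
The two priority orders in the proposal above can be sketched as a
simple fallback chain. This is only an illustration of the ordering;
the function names and the use of plain strings for placement rules are
assumptions made for the sketch, not RGW code.

```python
from typing import Optional


def resolve_bucket_rule(request_rule: Optional[str],
                        user_default: Optional[str],
                        zonegroup_default: Optional[str]) -> str:
    """request rule > user default rule > zonegroup default rule."""
    for rule in (request_rule, user_default, zonegroup_default):
        if rule:
            return rule
    # The proposal says the bucket default rule must not end up empty.
    raise ValueError("bucket placement rule must not be empty")


def resolve_object_rule(request_rule: Optional[str],
                        bucket_default: str) -> str:
    """request rule > bucket default rule."""
    return request_rule or bucket_default


# A bucket created without an explicit rule falls back to the user default.
bucket_rule = resolve_bucket_rule(None, "RRS", "STANDARD")
# An object uploaded without a storage class inherits the bucket rule.
object_rule = resolve_object_rule(None, bucket_rule)
print(bucket_rule, object_rule)
```

Because a bucket always ends up with a non-empty rule, the object-level
lookup never needs to consult the user or the zonegroup.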
* Re: RGW: Implement S3 storage class feature
  2017-06-21 11:39 RGW: Implement S3 storage class feature Jiaying Ren
@ 2017-06-21 14:04 ` Matt Benjamin
  2017-06-21 14:46   ` Daniel Gryniewicz
  0 siblings, 1 reply; 10+ messages in thread
From: Matt Benjamin @ 2017-06-21 14:04 UTC
  To: Jiaying Ren; +Cc: Yehuda Sadeh-Weinraub, ceph-devel

Hi,

Looks very coherent.

My main question is about...

----- Original Message -----
> From: "Jiaying Ren" <mikulely@gmail.com>
> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Wednesday, June 21, 2017 7:39:24 AM
> Subject: RGW: Implement S3 storage class feature
>
> * Todo List
>
> + the head of rgw-object should only contains the metadata of
> rgw-object,the first chunk of rgw-object data should be stored in
> the same pool as the tail of rgw-object

Is this always desirable?

Matt

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
* Re: RGW: Implement S3 storage class feature
  2017-06-21 14:04 ` Matt Benjamin
@ 2017-06-21 14:46   ` Daniel Gryniewicz
  2017-06-21 15:14     ` Yehuda Sadeh-Weinraub
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Gryniewicz @ 2017-06-21 14:46 UTC
  To: Matt Benjamin, Jiaying Ren; +Cc: Yehuda Sadeh-Weinraub, ceph-devel

On 06/21/2017 10:04 AM, Matt Benjamin wrote:
> Hi,
>
> Looks very coherent.
>
> My main question is about...
>
> ----- Original Message -----
>> From: "Jiaying Ren" <mikulely@gmail.com>
>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>> Subject: RGW: Implement S3 storage class feature
>>
>> * Todo List
>>
>> + the head of rgw-object should only contains the metadata of
>> rgw-object,the first chunk of rgw-object data should be stored in
>> the same pool as the tail of rgw-object
>
> Is this always desirable?
>

Well, unless the head pool happens to have the correct storage class,
it's necessary.  And I'd guess that verification of this is complicated,
although maybe not.

Maybe we can use the head pool if it has >= the correct storage class?

Daniel
* Re: RGW: Implement S3 storage class feature
  2017-06-21 14:46 ` Daniel Gryniewicz
@ 2017-06-21 15:14   ` Yehuda Sadeh-Weinraub
  2017-06-21 15:50     ` Daniel Gryniewicz
  0 siblings, 1 reply; 10+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2017-06-21 15:14 UTC
  To: Daniel Gryniewicz; +Cc: Matt Benjamin, Jiaying Ren, ceph-devel

On Wed, Jun 21, 2017 at 7:46 AM, Daniel Gryniewicz <dang@redhat.com> wrote:
> On 06/21/2017 10:04 AM, Matt Benjamin wrote:
>>> * Todo List
>>>
>>> + the head of rgw-object should only contains the metadata of
>>> rgw-object,the first chunk of rgw-object data should be stored in
>>> the same pool as the tail of rgw-object
>>
>> Is this always desirable?
>>
>
> Well, unless the head pool happens to have the correct storage class, it's
> necessary.  And I'd guess that verification of this is complicated, although
> maybe not.
>
> Maybe we can use the head pool if it has >= the correct storage class?
>

My original thinking was that when we reassign an object to a new
placement, we only touch its tail which is incompatible with that.
However, thinking about it some more I don't see why we need to have
this limitation, so it's probably possible to keep the data in the
head in one case, and modify the object and have the data in the tail
(object's head will need to be rewritten anyway because we modify the
manifest).
I think that the decision whether we keep data in the head could be a
property of the zone. In any case, once an object is created changing
this property will only affect newly created objects, and old objects
could still be read correctly.

Having data in the head is an optimization that supposedly reduces
small objects latency, and I still think it's useful in a mixed pools
situation. The thought is that the bulk of the data will be at the
tail anyway. However, we recently changed the default head size from
512k to 4M, so this might not be true any more. Anyhow, I favour
having this as a configurable (which should be simple to add).

Yehuda
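
The remark about the head size can be made concrete with a little
arithmetic. The sketch below assumes that the first head_size bytes of
an object are stored inline in the head and the remainder goes to the
tail; head_tail_split is a hypothetical helper for illustration, not an
RGW function.

```python
from typing import Tuple

KIB = 1024
MIB = 1024 * KIB


def head_tail_split(obj_size: int, head_size: int) -> Tuple[int, int]:
    """Return (bytes stored in the head, bytes stored in the tail)."""
    head_bytes = min(obj_size, head_size)
    return head_bytes, obj_size - head_bytes


# Compare the old (512k) and new (4M) default head sizes for a 2 MiB object.
for head_size in (512 * KIB, 4 * MIB):
    head, tail = head_tail_split(2 * MIB, head_size)
    print(f"head_size={head_size // KIB}K head={head // KIB}K tail={tail // KIB}K")
```

With a 512k head, three quarters of a 2 MiB object land in the tail;
with a 4M head the whole object stays in the head pool, which is why
"the bulk of the data will be at the tail" may no longer hold.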
* Re: RGW: Implement S3 storage class feature
  2017-06-21 15:14 ` Yehuda Sadeh-Weinraub
@ 2017-06-21 15:50   ` Daniel Gryniewicz
  2017-06-21 16:37     ` Yehuda Sadeh-Weinraub
  2017-06-22  9:44     ` Jiaying Ren
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Gryniewicz @ 2017-06-21 15:50 UTC
  To: Yehuda Sadeh-Weinraub; +Cc: Matt Benjamin, Jiaying Ren, ceph-devel

On 06/21/2017 11:14 AM, Yehuda Sadeh-Weinraub wrote:
> My original thinking was that when we reassign an object to a new
> placement, we only touch its tail which is incompatible with that.
> However, thinking about it some more I don't see why we need to have
> this limitation, so it's probably possible to keep the data in the
> head in one case, and modify the object and have the data in the tail
> (object's head will need to be rewritten anyway because we modify the
> manifest).
> I think that the decision whether we keep data in the head could be a
> property of the zone. In any case, once an object is created changing
> this property will only affect newly created objects, and old objects
> could still be read correctly. Having data in the head is an
> optimization that supposedly reduces small objects latency, and I
> still think it's useful in a mixed pools situation. The thought is
> that the bulk of the data will be at the tail anyway. However, we
> recently changed the default head size from 512k to 4M, so this might
> not be true any more. Anyhow, I favour having this as a configurable
> (which should be simple to add).
>
> Yehuda
>

I would be strongly against keeping data in the head when the head is in
a lower-level storage class.  That means that the entire object is
violating the constraints of the storage class.

Of course, having the head in a lower storage class (data or not) is
probably a violation.  Maybe we'd have to require that all heads go in
the highest storage class.

Daniel
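
The constraint argued for here (and the earlier ">= the correct storage
class" idea) can be written as a one-line check. The thread does not
define an ordering of storage classes, so the numeric ranking below is
purely an assumption for illustration, as are the names CLASS_RANK and
may_inline_in_head.

```python
# Hypothetical ranking: a higher number means a stronger storage class.
# Here RRS (2 replicas) is ranked below STANDARD (3 replicas).
CLASS_RANK = {"RRS": 1, "STANDARD": 2}


def may_inline_in_head(head_class: str, object_class: str) -> bool:
    """Allow object data in the head only if the head pool's class is at
    least as strong as the class requested for the object."""
    return CLASS_RANK[head_class] >= CLASS_RANK[object_class]


# A STANDARD head pool may hold inline data of an RRS object ...
print(may_inline_in_head("STANDARD", "RRS"))
# ... but an RRS head pool may not hold inline data of a STANDARD object.
print(may_inline_in_head("RRS", "STANDARD"))
```

Placing all heads in the highest-ranked class makes this check pass for
every object, which is the requirement suggested at the end of the
message.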
* Re: RGW: Implement S3 storage class feature
  2017-06-21 15:50 ` Daniel Gryniewicz
@ 2017-06-21 16:37   ` Yehuda Sadeh-Weinraub
  2017-06-22  9:44   ` Jiaying Ren
  1 sibling, 0 replies; 10+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2017-06-21 16:37 UTC
  To: Daniel Gryniewicz; +Cc: Matt Benjamin, Jiaying Ren, ceph-devel

On Wed, Jun 21, 2017 at 8:50 AM, Daniel Gryniewicz <dang@redhat.com> wrote:
> I would be strongly against keeping data in the head when the head is in a
> lower-level storage class.  That means that the entire object is violating
> the constraints of the storage class.
>
> Of course, having the head in a lower storage class (data or not) is
> probably a violation.  Maybe we'd have to require that all heads go in the
> highest storage class.
>

I'd keep it simple. Note that all objects' heads in a bucket need to
reside in the same pool, otherwise we won't be able to locate them
(unless we start searching).

Yehuda
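
The point about locating heads can be illustrated with a toy model of
the read path. The sketch below assumes that a bucket records a single
head pool and that each head stores a manifest naming the tail pool;
the dictionary layout and the function names are hypothetical and only
stand in for RADOS lookups.

```python
from typing import Dict, Optional

# Toy stand-in for RADOS: pool name -> {object name -> manifest}.
Pools = Dict[str, Dict[str, dict]]


def read_manifest(pools: Pools, bucket_head_pool: str,
                  name: str) -> Optional[dict]:
    """One lookup: the bucket tells us which pool holds every head."""
    return pools[bucket_head_pool].get(name)


def search_manifest(pools: Pools, name: str) -> Optional[dict]:
    """What we would need if heads could live in any pool: probe them all."""
    for objects in pools.values():
        if name in objects:
            return objects[name]
    return None


pools: Pools = {
    "buckets.data": {"obj1": {"tail_pool": "buckets.2replica"}},
    "buckets.2replica": {},
    "buckets.3replica": {},
}
manifest = read_manifest(pools, "buckets.data", "obj1")
print(manifest["tail_pool"])
```

Once the head is found, the tail can live in any pool because its
location is read from the manifest rather than derived from the bucket.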
* Re: RGW: Implement S3 storage class feature
  2017-06-21 15:50 ` Daniel Gryniewicz
  2017-06-21 16:37   ` Yehuda Sadeh-Weinraub
@ 2017-06-22  9:44   ` Jiaying Ren
       [not found]     ` <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>
  1 sibling, 1 reply; 10+ messages in thread
From: Jiaying Ren @ 2017-06-22  9:44 UTC
  To: dang; +Cc: Yehuda Sadeh-Weinraub, Matt Benjamin, ceph-devel

On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@redhat.com> wrote:
>> My original thinking was that when we reassign an object to a new
>> placement, we only touch its tail which is incompatible with that.
>> However, thinking about it some more I don't see why we need to have
>> this limitation, so it's probably possible to keep the data in the
>> head in one case, and modify the object and have the data in the tail
>> (object's head will need to be rewritten anyway because we modify the
>> manifest).
>> I think that the decision whether we keep data in the head could be a
>> property of the zone.

Yes, I guess we also need to check the zone placement rule config when
pulling the realm in a multisite environment, to make sure the sync
peer has the same storage class support. Multisite sync should also
respect the object's storage class.

>> In any case, once an object is created changing
>> this property will only affect newly created objects, and old objects
>> could still be read correctly. Having data in the head is an
>> optimization that supposedly reduces small objects latency, and I
>> still think it's useful in a mixed pools situation. The thought is
>> that the bulk of the data will be at the tail anyway. However, we
>> recently changed the default head size from 512k to 4M, so this might
>> not be true any more. Anyhow, I favour having this as a configurable
>> (which should be simple to add).
>>
>> Yehuda
>>
>
> I would be strongly against keeping data in the head when the head is in a
> lower-level storage class.  That means that the entire object is violating
> the constraints of the storage class.

Agreed. The default behavior of storage classes requires us to keep the
head's data in the same pool as the tail. Even if we make this a
configurable option, we should disable this kind of inlining by default
to match the default behavior of storage classes.

>
> Of course, having the head in a lower storage class (data or not) is
> probably a violation.  Maybe we'd have to require that all heads go in the
> highest storage class.
>
> Daniel
[parent not found: <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>]
* Re: RGW: Implement S3 storage class feature
       [not found]     ` <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>
@ 2017-07-06 11:00       ` Jiaying Ren
  2017-07-13 17:48         ` yuxiang fang
  0 siblings, 1 reply; 10+ messages in thread
From: Jiaying Ren @ 2017-07-06 11:00 UTC
  To: 方钰翔, Yehuda Sadeh-Weinraub, Matt Benjamin, dang
  Cc: ceph-devel

Thanks all for your insight!
After more investigation, I'd like to share some results; your comments
are appreciated as always. ;-)

* proposal

** introduce tail_data_pool

Each storage class is represented by an individual placement rule. Each
placement rule has several pools:

+ index_pool (for the bucket index)
+ data_pool (for the head)
+ tail_data_pool (for the tail)

In the end, different storage classes use the same index_pool and
data_pool, but different tail_data_pools. Using different storage
classes means using different tail_data_pools.

Here's a sample of the placement rule/storage class config output:

#+BEGIN_EXAMPLE
    {
        "key": "STANDARD",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.3replica",   <- introduced for rgw_obj raw data
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    },
#+END_EXAMPLE

Multipart rgw_obj will be stored in the tail_data_pool. Furthermore,
for an rgw_obj that only has a head and no tail, we can refactor the
manifest to support disabling the inlining of the first data chunk of
the rgw_obj into the head, which finally matches the semantics of the
AWS S3 storage class:

#+BEGIN_EXAMPLE
    {
        "key": "STANDARD",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.3replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1    <- introduced for inlining the first data chunk of rgw_obj into the head
        }
    },
#+END_EXAMPLE

** expose different storage class as individual placement rule

As a draft, placement list will list all storage classes:

#+BEGIN_EXAMPLE
 ./bin/radosgw-admin -c ceph.conf zone placement list
[
    {
        "key": "STANDARD",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.3replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    },
    {
        "key": "RRS",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.2replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    }
]
#+END_EXAMPLE

Another option would be to expose several storage classes in the same
placement rule:

#+BEGIN_EXAMPLE
 ./bin/radosgw-admin -c ceph.conf zone placement list
[
    {
        "key": "default-placement",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "storage_class": {
                "STANDARD": {
                    "data_pool": "us-east-1.rgw.3replica",
                    "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
                    "inline_head": 1
                },
                "RRS": {
                    "data_pool": "us-east-1.rgw.2replica",
                    "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
                    "inline_head": 1
                }
            },
            "index_type": 0,
            "compression": ""
        }
    }
]
#+END_EXAMPLE

This approach restricts the meaning of a storage class to "a different
data pool". But we may support things like Multi-Regional Storage
( https://cloud.google.com/storage/docs/storage-classes#multi-regional )
in the future, so I'd prefer to expose the storage class at the
placement rule level.

* issues

If we introduce the tail_data_pool, we need corresponding
modifications. I'm not sure about this; feedback is appreciated.

** use rgw_pool instead of placement rule in the RGWObjManifest

In the RGWObjManifest, we've defined two placement rules:

+ head_placement_rule
  (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L406)
+ tail_placement.placement_rule
  (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L119)

We then use the placement rule to find its data_pool. If we introduce
the tail_data_pool, there's no need to keep
tail_placement.placement_rule (although it is the same as
head_placement_rule).

Inside the RGWObjManifest, `class rgw_obj_select` also defines a
`placement_rule`
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L127),
which is ultimately used to find the data_pool of that placement rule.

So I propose replacing the placement rule in the RGWObjManifest with
rgw_pool, so that we have the chance to use tail_data_pool and
data_pool in the same placement rule.

On 23 June 2017 at 13:43, 方钰翔 <abcdeffyx@gmail.com> wrote:
> I think storing the head object and tail objects in different pools is also
> necessary.
>
> If we introduce a tail_data_pool in placement rule to store tail objects. we
> can create replicated pool for data_pool and ec for tail_data_pool to
> leverage the performance and capacity.
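
The pool selection implied by the tail_data_pool proposal in the
message above can be sketched as follows. The pool names are copied
from the sample output; the dictionary layout, the select_pools helper,
and the reading of inline_head (1 keeps the first data chunk in the
head, 0 sends it to the tail pool) are assumptions made for this
sketch. RRS is given inline_head 0 here only to show both cases; the
sample output uses 1 for both classes.

```python
from typing import Dict, Tuple

# Assumed in-memory form of the proposed zone placement config.
PLACEMENT: Dict[str, dict] = {
    "STANDARD": {
        "index_pool": "us-east-1.rgw.buckets.index",
        "data_pool": "us-east-1.rgw.buckets.data",
        "tail_data_pool": "us-east-1.rgw.buckets.3replica",
        "inline_head": 1,
    },
    "RRS": {
        "index_pool": "us-east-1.rgw.buckets.index",
        "data_pool": "us-east-1.rgw.buckets.data",
        "tail_data_pool": "us-east-1.rgw.buckets.2replica",
        "inline_head": 0,  # changed from the sample to show the other case
    },
}


def select_pools(storage_class: str) -> Tuple[str, str, str]:
    """Return (head pool, pool of the first data chunk, tail pool)."""
    rule = PLACEMENT[storage_class]
    head_pool = rule["data_pool"]
    tail_pool = rule["tail_data_pool"]
    # Assumed meaning: inline_head=1 keeps the first chunk in the head.
    first_chunk_pool = head_pool if rule["inline_head"] else tail_pool
    return head_pool, first_chunk_pool, tail_pool


for name in PLACEMENT:
    print(name, select_pools(name))
```

Both classes resolve to the same head pool, so heads stay locatable
from the bucket alone, while the tail pool is what distinguishes one
storage class from another.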
* Re: RGW: Implement S3 storage class feature
  2017-07-06 11:00       ` Jiaying Ren
@ 2017-07-13 17:48         ` yuxiang fang
  2017-07-18  2:12           ` Jiaying Ren
  0 siblings, 1 reply; 10+ messages in thread
From: yuxiang fang @ 2017-07-13 17:48 UTC
  To: Jiaying Ren
  Cc: Yehuda Sadeh-Weinraub, Matt Benjamin, Daniel Gryniewicz, ceph-devel

Hope this mail succeeds using "plain text mode".

We once faced a problem when we created radosgw object storage on an
EC pool (k=4, m=2). Using cosbench to do an upload test, we got 700-800
ops for 4K objects, but 2212.18 ops for 4K objects from a 3-replica
pool.

Performance from EC is lower than from the 3-replica pool, but they
eventually have similar throughput when the object size becomes bigger
(4MB, 8MB or bigger). This phenomenon is easy to explain: CPU is the
bottleneck when uploading small objects, but disks become the
bottleneck when uploading bigger objects.

Our customers always care about cost, so EC is a good choice to lower
the cost of capacity; but it also brings the trouble mentioned above.
So I wanted to find a way to improve the performance of EC for
radosgw-based object storage, and found a way to balance capacity and
performance.

My opinion is that we should support storing the head and tail objects
of a radosgw object separately, which means storing head objects in a
3-replica pool and tail objects in an EC pool. For small objects, we
get the performance of the 3-replica pool, and we also benefit from the
67% capacity utilization of EC (3-replica only has 33%).

Consider a scenario: we want to upload big (MB or GB) objects, so we
prefer to use multipart. radosgw will stripe every part into several
tail rados objects with no head object, and all of them will land in
the EC pool. So we get throughput similar to the 3-replica pool because
they are big objects, and we also benefit from the capacity
utilization.

The Pareto principle (also known as the 80–20 rule) also holds in some
workloads, that is, 20% of files/objects occupy 80% of the capacity.
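
The capacity figures quoted above follow from simple ratios: an EC pool
with k data and m coding chunks stores k usable bytes per k+m raw
bytes, and an n-replica pool stores 1 per n. The blended estimate at
the end additionally assumes that a given fraction of the logical bytes
lives entirely in the EC pool and the rest entirely in the replicated
pool; that split, like the function names, is an assumption for
illustration and not a measurement from the thread.

```python
def ec_utilization(k: int, m: int) -> float:
    """Usable fraction of raw capacity for an erasure-coded pool."""
    return k / (k + m)


def replica_utilization(copies: int) -> float:
    """Usable fraction of raw capacity for a replicated pool."""
    return 1 / copies


def blended_utilization(fraction_in_ec: float, k: int, m: int,
                        copies: int) -> float:
    """Usable fraction when part of the bytes is in EC, the rest replicated."""
    raw_per_byte = (fraction_in_ec / ec_utilization(k, m)
                    + (1 - fraction_in_ec) / replica_utilization(copies))
    return 1 / raw_per_byte


print(f"EC 4+2:     {ec_utilization(4, 2):.1%}")
print(f"3 replicas: {replica_utilization(3):.1%}")
print(f"80% of bytes in EC: {blended_utilization(0.8, 4, 2, 3):.1%}")
```

Under that assumed 80/20 split, the mixed layout sits between the two
pure layouts, closer to EC because most of the bytes are in the tail
pool.
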
It is not just a subjective guess: my company's shared disk (like
Dropbox, storing department e-docs, software, and so on) obeys the rule
and is even 85-15 (15% of files occupy 80% of the capacity).

As in the mail I replied with several days ago (rejected by the Mail
Delivery Subsystem): if we introduce a tail_data_pool in the placement
rule to store tail objects, we can create a replicated pool for
data_pool and an EC pool for tail_data_pool to balance performance and
capacity.

I have opened a PR and request comments.
https://github.com/ceph/ceph/pull/16325

thanks
ivan from eisoo

On Thu, Jul 6, 2017 at 7:00 PM, Jiaying Ren <mikulely@gmail.com> wrote:
> Thanks all for your insight!
> After more investigation,I'd like to
> share some output, your comments are appreciated as always. ;-)
>
> * proposal
>
> ** introduce tail_data_pool
>
> Each storage class is presented as individual placement rule. Each
> placement rule has serveral pools:
>
> + index_pool(for bucket index)
> + data_pool(for head)
> + tail_data_pool(for tail)
>
> Finally,different storage classes use the same index_pool and
> data_pool, but different tail_data_pool. Using different storage
> classes means using different tail_data_pools.
>
> Here's a placement rule/storage class config sample output:
>
> #+BEGIN_EXAMPLE
>     {
>         "key": "STANDARD",
>         "val": {
>             "index_pool": "us-east-1.rgw.buckets.index",
>             "data_pool": "us-east-1.rgw.buckets.data",
>             "tail_data_pool": "us-east-1.rgw.buckets.3replica",   <-
> introduced for rgw_obj raw data
>             "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>             "index_type": 0,
>             "compression": "",
>             "inline_head": 1
>         }
>     },
> #+END_EXAMPLE
>
> Multipart rgw_obj will be stored at tail_data_pool.
> Furthermore, for those rgw_obj that have only a head and no tail, we
> can refactor the Manifest to support disabling the inlining of the
> first data chunk of the rgw_obj into the head, which finally matches
> the semantics of AWS S3 storage classes:
>
> #+BEGIN_EXAMPLE
> {
>   "key": "STANDARD",
>   "val": {
>     "index_pool": "us-east-1.rgw.buckets.index",
>     "data_pool": "us-east-1.rgw.buckets.data",
>     "tail_data_pool": "us-east-1.rgw.buckets.3replica",
>     "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>     "index_type": 0,
>     "compression": "",
>     "inline_head": 1  <- introduced for inline first data chunk of rgw_obj into head
>   }
> },
> #+END_EXAMPLE
>
> ** expose different storage class as individual placement rule
>
> As a draft, placement list will list all storage classes:
>
> #+BEGIN_EXAMPLE
> ./bin/radosgw-admin -c ceph.conf zone placement list
> [
>   {
>     "key": "STANDARD",
>     "val": {
>       "index_pool": "us-east-1.rgw.buckets.index",
>       "data_pool": "us-east-1.rgw.buckets.data",
>       "tail_data_pool": "us-east-1.rgw.buckets.3replica",
>       "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>       "index_type": 0,
>       "compression": "",
>       "inline_head": 1
>     }
>   },
>   {
>     "key": "RRS",
>     "val": {
>       "index_pool": "us-east-1.rgw.buckets.index",
>       "data_pool": "us-east-1.rgw.buckets.data",
>       "tail_data_pool": "us-east-1.rgw.buckets.2replica",
>       "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>       "index_type": 0,
>       "compression": "",
>       "inline_head": 1
>     }
>   }
> ]
> #+END_EXAMPLE
>
> Another option would be to expose several storage classes in the same
> placement rule:
>
> #+BEGIN_EXAMPLE
> ./bin/radosgw-admin -c ceph.conf zone placement list
> [
>   {
>     "key": "default-placement",
>     "val": {
>       "index_pool": "us-east-1.rgw.buckets.index",
>       "storage_class": {
>         "STANDARD": {
>           "data_pool": "us-east-1.rgw.3replica",
>           "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>           "inline_head": 1
>         },
>         "RRS": {
>           "data_pool": "us-east-1.rgw.2replica",
>           "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
>           "inline_head": 1
>         }
>       },
>       "index_type": 0,
>       "compression": ""
>     }
>   }
> ]
> #+END_EXAMPLE
>
> This approach restricts the meaning of storage class to a different
> data pool. But we may support things like Multi-Regional Storage (
> https://cloud.google.com/storage/docs/storage-classes#multi-regional )
> in the future. So I'd prefer to expose storage class at the placement
> rule level.
>
> * issues
>
> If we introduce the tail_data_pool, we need corresponding
> modifications. I'm not sure about this; feedback is appreciated.
>
> ** use rgw_pool instead of placement rule in the RGWManifest
>
> In the RGWObjManifest, we've defined two placement rules:
>
> + head_placement_rule
>   (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L406)
> + tail_placement.placement_rule
>   (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L119)
>
> Then we use the placement rule to find the data_pool of the placement
> rule. If we introduce the tail_data_pool, there's no need to keep
> tail_placement.placement_rule (although it is the same as
> head_placement_rule).
>
> Inside the RGWObjManifest, `class rgw_obj_select` also defines a
> `placement_rule`
> (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L127),
> which is finally used to find the data_pool of that placement rule.
>
> So I propose replacing the placement rule in the RGWManifest with
> rgw_pool, so that we have the chance to use tail_data_pool and
> data_pool in the same placement rule.
>
> On 23 June 2017 at 13:43, 方钰翔 <abcdeffyx@gmail.com> wrote:
>> I think storing the head object and tail objects in different pools is also
>> necessary.
>>
>> If we introduce a tail_data_pool in the placement rule to store tail objects, we
>> can create a replicated pool for data_pool and an EC pool for tail_data_pool to
>> balance performance and capacity.
>>
>> 2017-06-22 17:44 GMT+08:00 Jiaying Ren <mikulely@gmail.com>:
>>>
>>> On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@redhat.com> wrote:
>>> >>>
>>> >> My original thinking was that when we reassign an object to a new
>>> >> placement, we only touch its tail which is incompatible with that.
>>> >> However, thinking about it some more I don't see why we need to have
>>> >> this limitation, so it's probably possible to keep the data in the
>>> >> head in one case, and modify the object and have the data in the tail
>>> >> (object's head will need to be rewritten anyway because we modify the
>>> >> manifest).
>>> >> I think that the decision whether we keep data in the head could be a
>>> >> property of the zone.
>>>
>>> Yes, I guess we also need to check the zone placement rule config when
>>> pulling the realm in the multisite env, to make sure the sync peer has
>>> the same storage class support; multisite sync should also respect the
>>> object storage class.
>>>
>>> >> In any case, once an object is created changing
>>> >> this property will only affect newly created objects, and old objects
>>> >> could still be read correctly. Having data in the head is an
>>> >> optimization that supposedly reduces small objects latency, and I
>>> >> still think it's useful in a mixed pools situation. The thought is
>>> >> that the bulk of the data will be at the tail anyway. However, we
>>> >> recently changed the default head size from 512k to 4M, so this might
>>> >> not be true any more. Anyhow, I favour having this as a configurable
>>> >> (which should be simple to add).
>>> >>
>>> >> Yehuda
>>> >>
>>> >
>>> >
>>> > I would be strongly against keeping data in the head when the head is in
>>> > a lower-level storage class. That means that the entire object is
>>> > violating the constraints of the storage class.
>>>
>>> Agreed. The default behavior of storage class requires us to keep the
>>> data in the head in the same pool as the tail. Even if we make this a
>>> configurable option, we should disable this kind of inlining by
>>> default to match the default behavior of storage class.
>>>
>>> >
>>> > Of course, having the head in a lower storage class (data or not) is
>>> > probably a violation. Maybe we'd have to require that all heads go in
>>> > the highest storage class.
>>> >
>>> > Daniel
>>>
>>> On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@redhat.com> wrote:
>>> > On 06/21/2017 11:14 AM, Yehuda Sadeh-Weinraub wrote:
>>> >>
>>> >> On Wed, Jun 21, 2017 at 7:46 AM, Daniel Gryniewicz <dang@redhat.com>
>>> >> wrote:
>>> >>>
>>> >>> On 06/21/2017 10:04 AM, Matt Benjamin wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> Looks very coherent.
>>> >>>>
>>> >>>> My main question is about...
>>> >>>>
>>> >>>> ----- Original Message -----
>>> >>>>>
>>> >>>>> From: "Jiaying Ren" <mikulely@gmail.com>
>>> >>>>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@redhat.com>
>>> >>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>>> >>>>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>>> >>>>> Subject: RGW: Implement S3 storage class feature
>>> >>>>>
>>> >>>>> * Todo List
>>> >>>>>
>>> >>>>> + the head of rgw-object should only contains the metadata of
>>> >>>>> rgw-object,the first chunk of rgw-object data should be stored in
>>> >>>>> the same pool as the tail of rgw-object
>>> >>>>
>>> >>>> Is this always desirable?
>>> >>>>
>>> >>>
>>> >>> Well, unless the head pool happens to have the correct storage class,
>>> >>> it's necessary. And I'd guess that verification of this is complicated,
>>> >>> although maybe not.
>>> >>>
>>> >>> Maybe we can use the head pool if it has >= the correct storage class?
>>> >>>
>>> >> My original thinking was that when we reassign an object to a new
>>> >> placement, we only touch its tail which is incompatible with that.
>>> >> However, thinking about it some more I don't see why we need to have
>>> >> this limitation, so it's probably possible to keep the data in the
>>> >> head in one case, and modify the object and have the data in the tail
>>> >> (object's head will need to be rewritten anyway because we modify the
>>> >> manifest).
>>> >> I think that the decision whether we keep data in the head could be a
>>> >> property of the zone. In any case, once an object is created changing
>>> >> this property will only affect newly created objects, and old objects
>>> >> could still be read correctly. Having data in the head is an
>>> >> optimization that supposedly reduces small objects latency, and I
>>> >> still think it's useful in a mixed pools situation. The thought is
>>> >> that the bulk of the data will be at the tail anyway. However, we
>>> >> recently changed the default head size from 512k to 4M, so this might
>>> >> not be true any more. Anyhow, I favour having this as a configurable
>>> >> (which should be simple to add).
>>> >>
>>> >> Yehuda
>>> >>
>>> >
>>> >
>>> > I would be strongly against keeping data in the head when the head is in
>>> > a lower-level storage class. That means that the entire object is
>>> > violating the constraints of the storage class.
>>> >
>>> > Of course, having the head in a lower storage class (data or not) is
>>> > probably a violation. Maybe we'd have to require that all heads go in
>>> > the highest storage class.
>>> >
>>> > Daniel

^ permalink raw reply [flat|nested] 10+ messages in thread
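
The capacity figures quoted in this thread (67% for EC with k=4, m=2
versus 33% for 3 replicas) can be checked with a few lines of
arithmetic. The sketch below is illustrative only: the function names
are made up for this example, and the 20/80 split of logical data
between heads and tails is a hypothetical workload, not a measurement
reported in the thread.

```python
from fractions import Fraction


def usable_fraction_replicated(copies: int) -> Fraction:
    """Usable share of raw capacity when every byte is stored `copies` times."""
    return Fraction(1, copies)


def usable_fraction_ec(k: int, m: int) -> Fraction:
    """Usable share of raw capacity for erasure coding with k data and m coding chunks."""
    return Fraction(k, k + m)


def blended_usable_fraction(head_bytes: int, tail_bytes: int,
                            head_eff: Fraction, tail_eff: Fraction) -> Fraction:
    """Usable share when heads and tails live in pools with different efficiency."""
    # Raw space consumed is logical size divided by the pool's usable fraction.
    raw_bytes = Fraction(head_bytes) / head_eff + Fraction(tail_bytes) / tail_eff
    return Fraction(head_bytes + tail_bytes) / raw_bytes


replica3 = usable_fraction_replicated(3)   # 1/3, the "33%" in the thread
ec_4_2 = usable_fraction_ec(4, 2)          # 2/3, the "67%" in the thread

# Hypothetical workload: 20 units of logical data in heads (replicated pool),
# 80 units in tails (EC pool).
mixed = blended_usable_fraction(20, 80, replica3, ec_4_2)

print(f"3-replica: {float(replica3):.0%}")
print(f"EC 4+2:    {float(ec_4_2):.0%}")
print(f"mixed:     {float(mixed):.1%}")
```

With this hypothetical split, the mixed layout needs 180 units of raw
space for 100 units of data, so 5/9 of the raw capacity is usable. That
is less than pure EC, but small objects are served from the replicated
pool.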
* Re: RGW: Implement S3 storage class feature
  2017-07-13 17:48 ` yuxiang fang
@ 2017-07-18  2:12 ` Jiaying Ren
  0 siblings, 0 replies; 10+ messages in thread
From: Jiaying Ren @ 2017-07-18 2:12 UTC (permalink / raw)
  To: yuxiang fang
  Cc: Yehuda Sadeh-Weinraub, Matt Benjamin, Daniel Gryniewicz, ceph-devel

Hi~ yuxiang:

Glad to see you've pushed this forward! will take a review.

^ permalink raw reply [flat|nested] 10+ messages in thread
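
The tail_data_pool and inline_head fields proposed in this thread
amount to a small pool-selection rule. The following is a minimal
sketch of that rule in Python, not RGW code: the PlacementRule class
and the helper names are hypothetical, and falling back to data_pool
when tail_data_pool is unset is an assumption made for the example.
The pool names are taken from the sample configs quoted in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlacementRule:
    """Hypothetical model of one placement rule / storage class entry."""
    index_pool: str
    data_pool: str
    data_extra_pool: str
    tail_data_pool: Optional[str] = None
    inline_head: bool = True


def head_pool(rule: PlacementRule) -> str:
    # The head object (metadata and manifest) always goes to data_pool.
    return rule.data_pool


def tail_pool(rule: PlacementRule) -> str:
    # Assumption: without a tail_data_pool, tails stay in data_pool.
    return rule.tail_data_pool or rule.data_pool


def first_chunk_pool(rule: PlacementRule) -> str:
    # With inline_head the first data chunk is stored in the head object;
    # otherwise it is written alongside the tail.
    return head_pool(rule) if rule.inline_head else tail_pool(rule)


standard = PlacementRule(
    index_pool="us-east-1.rgw.buckets.index",
    data_pool="us-east-1.rgw.buckets.data",
    data_extra_pool="us-east-1.rgw.buckets.non-ec",
    tail_data_pool="us-east-1.rgw.buckets.3replica",
)
rrs = PlacementRule(
    index_pool="us-east-1.rgw.buckets.index",
    data_pool="us-east-1.rgw.buckets.data",
    data_extra_pool="us-east-1.rgw.buckets.non-ec",
    tail_data_pool="us-east-1.rgw.buckets.2replica",
    inline_head=False,
)

for name, rule in (("STANDARD", standard), ("RRS", rrs)):
    print(name, head_pool(rule), first_chunk_pool(rule), tail_pool(rule))
```

In this sketch both storage classes share the same head pool and differ
only in where tail data lands. Setting inline_head to false moves the
first data chunk out of the shared head pool, which is the behavior
Daniel asks for when the head pool is in a lower storage class.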
Thread overview: 10+ messages
2017-06-21 11:39 RGW: Implement S3 storage class feature Jiaying Ren
2017-06-21 14:04 ` Matt Benjamin
2017-06-21 14:46 ` Daniel Gryniewicz
2017-06-21 15:14 ` Yehuda Sadeh-Weinraub
2017-06-21 15:50 ` Daniel Gryniewicz
2017-06-21 16:37 ` Yehuda Sadeh-Weinraub
2017-06-22 9:44 ` Jiaying Ren
[not found] ` <CAOi4hNd_wAshA_=4cW4X7Y_rijavs82-J2FbaE-L92P_Vy0RkA@mail.gmail.com>
2017-07-06 11:00 ` Jiaying Ren
2017-07-13 17:48 ` yuxiang fang
2017-07-18 2:12 ` Jiaying Ren