Initial proposal for bluestore compression control and statistics


* Initial proposal for bluestore compression control and statistics
@ 2016-05-19 17:27 Igor Fedotov
  2016-05-19 17:57 ` Piotr Dałek
  2016-05-19 19:40 ` Sage Weil
  0 siblings, 2 replies; 7+ messages in thread
From: Igor Fedotov @ 2016-05-19 17:27 UTC (permalink / raw)
  To: ceph-devel

Hi cephers,

please find my initial proposal with regard to bluestore compression 
control and related statistics.

Any comments/thoughts are highly appreciated.

==================================================================

COMPRESSION CONTROL OPTIONS

One can see the following means to control compression at the BlueStore level.

1) Per-store setting to enable/disable compression and specify default 
compression method

bluestore_compression = <zlib | snappy> / <force | optional | disable>

E.g.

bluestore_compression = zlib/force

The first token denotes the default/applied compression algorithm.
The second one:

'force' - enables compression for all objects

'optional' - burdens the caller with the need to enable compression by 
different means (see below)

'disable' - unconditionally disables any compression for the store.

This option is definitely useful for testing/debugging and probably has 
limited use in production.
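
As an illustration only, the two-token setting above could be parsed along these lines. This is a minimal Python sketch, not BlueStore code: the function name, the CompressionSetting type and the validation rules are assumptions made for the example; only the token values come from the proposal.

```python
from typing import NamedTuple

# Token values taken from the proposal text.
ALGORITHMS = {"zlib", "snappy"}
MODES = {"force", "optional", "disable"}


class CompressionSetting(NamedTuple):
    """Hypothetical holder for the parsed setting."""
    algorithm: str
    mode: str


def parse_bluestore_compression(value: str) -> CompressionSetting:
    """Parse '<algorithm>/<mode>', e.g. 'zlib/force' (hypothetical helper)."""
    algorithm, sep, mode = (part.strip() for part in value.partition("/"))
    if not sep:
        raise ValueError("expected '<algorithm>/<mode>', got %r" % value)
    if algorithm not in ALGORITHMS:
        raise ValueError("unknown compression algorithm %r" % algorithm)
    if mode not in MODES:
        raise ValueError("unknown compression mode %r" % mode)
    return CompressionSetting(algorithm, mode)
```

For example, parse_bluestore_compression("zlib/force") yields algorithm 'zlib' and mode 'force', while a value without the '/' separator is rejected.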

2) Per-object compression specification. One should be able to 
enable/disable compression for a specific object.

The following sub-options can be provided:

   a) Specify the compression mode (along with a disablement option) at 
object creation

   b) Specify the compression mode at an arbitrary moment via a specific 
method/ioctl call. Compression is to be applied to subsequent write requests

   c) Force object compression/decompression at an arbitrary moment via a 
specific method/ioctl call. Existing object content is to be 
compressed/decompressed and the appropriate mode set for subsequent 
write requests.

   d) Disable compression for short-lived objects if the corresponding hint 
has been provided via a set_alloc_hint2 call. See PR at 
https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941

Along with a specific compression algorithm, one should be able to specify 
default algorithm selection. E.g. the user can specify 'default' compression 
for an object instead of a specific 'zlib' or 'snappy' value.

This way one can avoid the need to care about the proper algorithm 
selection for each object.

The default algorithm is to be taken from the store setting (see above).

Such an option provides a pretty good level of flexibility. The upper level 
can introduce additional logic to control compression this way, e.g. 
enable/disable it for specific pools or control it dynamically depending on 
how compressible the object content is.

3) Per-write request compression control.

This option provides the highest level of flexibility but is probably 
overkill.

Any rationale for having it?

==================================================================

PER-STORE STATISTICS

The following statistics parameters are to be introduced on a per-store basis:

1) Allocated - total amount of data in allocated blobs

2) Stored - actual amount of stored object content, i.e. the sum of all 
objects' uncompressed content

3) StoredCompressed - amount of stored compressed data

4) StoredCompressedOriginal - original (uncompressed) amount of the stored 
compressed data

5) CompressionProcessed - amount of data processed by compression. 
This differs from 'StoredCompressed' as some data can end up stored 
uncompressed or be removed. Also, the parameter can potentially be reset by 
some means.

6) CompressOpsCount - number of compression operations completed. The 
parameter can be reset by some means.

7) CompressTime - amount of time spent on compression. The parameter 
can be reset by some means.

8) WriteOpsCount - number of write operations completed. The parameter 
can be reset by some means.

9) WriteTime - amount of time spent on processing write requests. The 
parameter can be reset by some means.

10) WrittenTotal - amount of written data.

11) DecompressionProcessed - amount of data processed by decompression. 
The parameter can be reset by some means.

12) DecompressOpsCount - number of decompression operations completed. 
The parameter can be reset by some means.

13) DecompressTime - amount of time spent on decompression. The parameter 
can be reset by some means.

14) ReadOpsCount - number of read operations completed. The parameter 
can be reset by some means.

15) ReadTime - amount of time spent on processing read requests. The 
parameter can be reset by some means.

16) ReadTotal - amount of read data. The parameter can be reset by some 
means.

Handling parameters 11)-16) can be a bit tricky as we might want to 
avoid KV updates during reading. Thus we need some means to periodically 
store these parameters or just track them in-memory.

==================================================================

PER-OBJECT STATISTICS NOTES

It might be useful to have per-object statistics similar to the per-store 
ones mentioned above. This way the upper level can review compression 
results and adjust the process accordingly.

The drawbacks, though, are an increased onode footprint and additional 
complexity in read op handling.

If collected, per-object statistics should be retrieved using a specific 
method/ioctl.

Perhaps we can introduce some object creation flag (or extend 
alloc_hints or provide an ioctl) to enable statistics collection for 
specific objects only?

Any thoughts on the need for that?

==================================================================

ADDITIONAL NOTES

1) It seems helpful to introduce an additional means to indicate a 
NO-MORE_WRITES event from the upper level to BlueStore. This way one 
provides a hint that allows bluestore to trigger some background 
optimization on the object, e.g. garbage collection, defragmentation, etc.


* Re: Initial proposal for bluestore compression control and statistics
  2016-05-19 17:27 Initial proposal for bluestore compression control and statistics Igor Fedotov
@ 2016-05-19 17:57 ` Piotr Dałek
  2016-05-20 13:52   ` Igor Fedotov
  2016-05-19 19:40 ` Sage Weil
  1 sibling, 1 reply; 7+ messages in thread
From: Piotr Dałek @ 2016-05-19 17:57 UTC (permalink / raw)
  To: ceph-devel

On Thu, May 19, 2016 at 08:27:02PM +0300, Igor Fedotov wrote:
> Hi cephers,
> 
> please find my initial proposal with regard to bluestore compression
> control and related statistics.
> 
> Any comments/thoughts are highly appreciated.
> 
> ==================================================================
> 
> COMPRESSION CONTROL OPTIONS
> 
> One can see following means to control  compression at BlueStore level.
> 
> 1) Per-store setting to enable/disable compression and specify
> default compression method
> 
> bluestore_compression = <zlib | snappy> / <force | optional | disable>
> 
> E.g.
> 
> bluestore_compression = zlib/force
> 
> The first token denotes default/applied compression algorithm.
> The second one:
> 
> 'force' - enables compression for any objects
> 
> 'optional' - burden the caller with the need to enable compression
> by different means (see below)
> 
> 'disable' - unconditionally disables any compression for the store.
> 
> This option is definitely useful for testing/debugging and has
> probably limited use in production.

If one uses Ceph for storage of pre-compressed data, having an option to
disable additional (Ceph-side) compression would be desirable, at least on
a per-Ceph level, though a per-pool setting would be better.
Regarding optional - see below.

> 2) Per-object compression specification. One should be able to
> enable/disable compression for specific object.
> 
> Following sub-option can be provided:
> 
>   a) Specify compression mode (along with disablement option) at
> object creation
> 
>   b) Specify compression mode at arbitrary moment via specific
> method/ioctl call. Compression to be applied for subsequent write
> requests
> 
>   c) Force object compression/decompression at arbitrary moment via
> specific method/ioctl call. Existing object content to be
> compressed/decompressed and appropriate mode to be set for
> subsequent write requests.
> 
>   d) Disable compression for short-lived objects if corresponding
> hint has been provided via set_alloc_hint2 call. See PR at https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941
> 
> Along with specific compression algorithm one should be able to
> specify default algorithm selection. E.g. user can specify 'default'
> compression for an object instead of specific 'zlib' or 'snappy'
> value.
> 
> This way one can avoid the need to care about the proper algorithm
> selection for each object.
> 
> Default algorithm to be taken from the store setting (see above)
> 
> Such an option provides pretty good level of flexibility. Upper
> level can introduce additional logic to control compression this
> way, e.g. enable/disable it for specific pools or dynamically
> control depending on how compressible object content is.

I would also add the ability to set a minimum acceptable compression ratio,
with at least two options (any and no-expand). "Any" would store compressed
objects regardless of how well they've compressed and "No-expand" would store
an object in compressed format only if the compressed size is smaller than
the uncompressed size. For zlib, expansion is more than possible (see "Maximum
expansion factor" at http://www.zlib.net/zlib_tech.html) and storing
doubly-compressed data will yield higher cpu and memory usage while
accessing the object *and* more storage being utilized. An additional option (set
in percentage or bytes) specifying the actual minimum acceptable compression
ratio would improve on this idea further, and for example, improve read
performance on large images (tens of gigabytes) that were compressed by only a
few hundred megabytes. 
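
To make the idea concrete, here is a rough Python sketch of the "any" / "no-expand" decision with an optional minimum saving given as a percentage. The function name, its parameters and the return shape are assumptions made for this example, not an existing Ceph interface; only zlib.compress() from the Python standard library is a real call.

```python
import zlib


def maybe_compress(data: bytes, policy: str = "no-expand",
                   min_saving_pct: float = 0.0):
    """Return (is_compressed, payload) for one chunk of data.

    policy 'any'       -> always keep the compressed form
    policy 'no-expand' -> keep it only if it is smaller than the input
                          and saves at least min_saving_pct percent
    (Both names and the percentage knob are illustrative assumptions.)
    """
    compressed = zlib.compress(data)
    if policy == "any":
        return True, compressed
    if policy != "no-expand":
        raise ValueError("unknown policy %r" % policy)

    saved = len(data) - len(compressed)
    required = len(data) * min_saving_pct / 100.0
    if saved > 0 and saved >= required:
        return True, compressed
    # Not worth it: store the original bytes and skip decompression on read.
    return False, data
```

With "no-expand", already-compressed or random input falls through to the uncompressed branch, so reads of such data never pay for decompression.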

> 3) Per-write request compression control.
> 
> This option provides the highest level of flexibility but is
> probably an overkill.
> 
> Any rationales to have it?

See above. If we're going to have a per-block compression flag, then writing
data in compressed format only if the compression actually shrunk the data
would improve read performance later.

-- 
Piotr Dałek
branch@predictor.org.pl
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Initial proposal for bluestore compression control and statistics
  2016-05-19 17:57 ` Piotr Dałek
@ 2016-05-20 13:52   ` Igor Fedotov
  2016-05-20 14:14     ` Piotr Dałek
  0 siblings, 1 reply; 7+ messages in thread
From: Igor Fedotov @ 2016-05-20 13:52 UTC (permalink / raw)
  To: Piotr Dałek, ceph-devel



On 19.05.2016 20:57, Piotr Dałek wrote:
> On Thu, May 19, 2016 at 08:27:02PM +0300, Igor Fedotov wrote:
>> Hi cephers,
>>
>> please find my initial proposal with regard to bluestore compression
>> control and related statistics.
>>
>> Any comments/thoughts are highly appreciated.
>>
>> ==================================================================
>>
>> COMPRESSION CONTROL OPTIONS
>>
>> One can see following means to control  compression at BlueStore level.
>>
>> 1) Per-store setting to enable/disable compression and specify
>> default compression method
>>
>> bluestore_compression = <zlib | snappy> / <force | optional | disable>
>>
>> E.g.
>>
>> bluestore_compression = zlib/force
>>
>> The first token denotes default/applied compression algorithm.
>> The second one:
>>
>> 'force' - enables compression for any objects
>>
>> 'optional' - burden the caller with the need to enable compression
>> by different means (see below)
>>
>> 'disable' - unconditionally disables any compression for the store.
>>
>> This option is definitely useful for testing/debugging and has
>> probably limited use in production.
> If one uses Ceph for storage of pre-compressed data, having an option to
> disable additional (Ceph-side) compression would be desireable, at least on
> per-Ceph level, but at least per-pool setting would be better.
> Regarding optional - see below.
>
>> 2) Per-object compression specification. One should be able to
>> enable/disable compression for specific object.
>>
>> Following sub-option can be provided:
>>
>>    a) Specify compression mode (along with disablement option) at
>> object creation
>>
>>    b) Specify compression mode at arbitrary moment via specific
>> method/ioctl call. Compression to be applied for subsequent write
>> requests
>>
>>    c) Force object compression/decompression at arbitrary moment via
>> specific method/ioctl call. Existing object content to be
>> compressed/decompressed and appropriate mode to be set for
>> subsequent write requests.
>>
>>    d) Disable compression for short-lived objects if corresponding
>> hint has been provided via set_alloc_hint2 call. See PR at https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941
>>
>> Along with specific compression algorithm one should be able to
>> specify default algorithm selection. E.g. user can specify 'default'
>> compression for an object instead of specific 'zlib' or 'snappy'
>> value.
>>
>> This way one can avoid the need to care about the proper algorithm
>> selection for each object.
>>
>> Default algorithm to be taken from the store setting (see above)
>>
>> Such an option provides pretty good level of flexibility. Upper
>> level can introduce additional logic to control compression this
>> way, e.g. enable/disable it for specific pools or dynamically
>> control depending on how compressible object content is.
> I would also add ability to set minimum acceptable compression ratio,
> with at least two options (any and no-expand). "Any" would store compressed
> objects regardless how well they've compressed and "No-expand" would store
> object in compressed format only if compressed size is smaller than
> uncompressed size.
Why do we need "Any" option? Isn't "No-expand" enough?
>   For zlib, this is more than possible (see "Maximum
> expansion factor" at http://www.zlib.net/zlib_tech.html) and storing
> doubly-compressed data will yield higher cpu and memory usage while
> accessing object *and* more storage being utilized. Additional option (set
> in percentage or bytes) specifying actual minimum acceptable compression
> ratio would improve on this idea further, and for example, improve read
> performance on large images (tens of gigabytes) that were compressed by only
> few hundred megabytes.
Sounds good.
>
>> 3) Per-write request compression control.
>>
>> This option provides the highest level of flexibility but is
>> probably an overkill.
>>
>> Any rationales to have it?
> See above. If we're going to have per-block compression flag, then writing
> compressed format data only if the compression actually shrunk the data
> would improve read performance later.
>


* Re: Initial proposal for bluestore compression control and statistics
  2016-05-20 13:52   ` Igor Fedotov
@ 2016-05-20 14:14     ` Piotr Dałek
  0 siblings, 0 replies; 7+ messages in thread
From: Piotr Dałek @ 2016-05-20 14:14 UTC (permalink / raw)
  To: ceph-devel

On Fri, May 20, 2016 at 04:52:18PM +0300, Igor Fedotov wrote:
> 
> On 19.05.2016 20:57, Piotr Dałek wrote:
> >I would also add ability to set minimum acceptable compression ratio,
> >with at least two options (any and no-expand). "Any" would store compressed
> >objects regardless how well they've compressed and "No-expand" would store
> >object in compressed format only if compressed size is smaller than
> >uncompressed size.
>
> Why do we need "Any" option? Isn't "No-expand" enough?

For example, when someone wants to benchmark Bluestore compression in a worst-case
scenario, or when someone wants to use compression as a data-masking technique. And
finally, "no-expand" implies additional code in the compressor/decompressor code
path and this option would bypass it.
(Not particularly practical uses, but there might be someone who finds this
useful... and it's not difficult to do in code).

-- 
Piotr Dałek
branch@predictor.org.pl
http://blog.predictor.org.pl

* Re: Initial proposal for bluestore compression control and statistics
  2016-05-19 17:27 Initial proposal for bluestore compression control and statistics Igor Fedotov
  2016-05-19 17:57 ` Piotr Dałek
@ 2016-05-19 19:40 ` Sage Weil
  2016-05-19 20:52   ` Sage Weil
  2016-05-20 14:36   ` Igor Fedotov
  1 sibling, 2 replies; 7+ messages in thread
From: Sage Weil @ 2016-05-19 19:40 UTC (permalink / raw)
  To: Igor Fedotov; +Cc: ceph-devel

Hi Igor!

On Thu, 19 May 2016, Igor Fedotov wrote:
> Hi cephers,
> 
> please find my initial proposal with regard to bluestore compression control
> and related statistics.
> 
> Any comments/thoughts are highly appreciated.
> 
> ==================================================================
> 
> COMPRESSION CONTROL OPTIONS
> 
> One can see following means to control  compression at BlueStore level.
> 
> 1) Per-store setting to enable/disable compression and specify default
> compression method
> 
> bluestore_compression = <zlib | snappy> / <force | optional | disable>
> 
> E.g.
> 
> bluestore_compression = zlib/force
> 
> The first token denotes default/applied compression algorithm.
> The second one:
> 
> 'force' - enables compression for any objects
> 
> 'optional' - burden the caller with the need to enable compression by
> different means (see below)
> 
> 'disable' - unconditionally disables any compression for the store.
> 
> This option is definitely useful for testing/debugging and has probably
> limited use in production.

Do we need the 'disable' option?  i.e., is there any difference between

 bluestore compression = snappy/disable

and

 bluestore compression =

Also, since we don't need to list multiple algorithms, we can probably 
just simplify this to be

 bluestore compression algorithm = snappy

and then

 bluestore compression = force | optional | disable

or maybe just

 bluestore compression force = true/false
 bluestore compression allow = true/false

with a check that prevents nonsensical combinations (force + allow).  Right now we 
don't have an enum config option type (although we perhaps should).

> 2) Per-object compression specification. One should be able to enable/disable
> compression for specific object.
> 
> Following sub-option can be provided:
> 
>   a) Specify compression mode (along with disablement option) at object
> creation
> 
>   b) Specify compression mode at arbitrary moment via specific method/ioctl
> call. Compression to be applied for subsequent write requests
> 
>   c) Force object compression/decompression at arbitrary moment via specific
> method/ioctl call. Existing object content to be compressed/decompressed and
> appropriate mode to be set for subsequent write requests.
> 
>   d) Disable compression for short-lived objects if corresponding hint has
> been provided via set_alloc_hint2 call. See PR at
> https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941

I think a, b, and d can be addressed by adding two hints to the 
set_alloc_hint2 operation:

 COMPRESSIBLE
 INCOMPRESSIBLE

The first would attempt compression if bluestore compression = allow, and 
the second would not try even if compression = force.

Alternatively, we could have

 bluestore compression = force | aggressive | passive | disable

where aggressive would try unless INCOMPRESSIBLE and passive would not try 
unless COMPRESSIBLE.

I would make the SHORTLIVED inference an independent heuristic that is 
optional, and basically makes SHORTLIVED => INCOMPRESSIBLE and LONGLIVED 
=> COMPRESSIBLE.
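
A small Python sketch of the four-mode alternative may help pin down the semantics. Assumptions made for the example: 'force' always compresses and 'disable' never does regardless of hints, the lifetime inference only applies when no explicit compressibility hint is present, and the function name and string-based hints are illustrative rather than an existing interface.

```python
def should_compress(mode, hints=(), infer_from_lifetime=False):
    """Decide whether to attempt compression for a write (illustrative).

    mode:  'force' | 'aggressive' | 'passive' | 'disable'
    hints: iterable of hint names such as 'COMPRESSIBLE', 'INCOMPRESSIBLE',
           'SHORTLIVED', 'LONGLIVED'
    """
    hints = set(hints)

    # Optional, independent heuristic: derive a compressibility hint from
    # the lifetime hint, but never override an explicit hint (assumption).
    if infer_from_lifetime and not hints & {"COMPRESSIBLE", "INCOMPRESSIBLE"}:
        if "SHORTLIVED" in hints:
            hints.add("INCOMPRESSIBLE")
        elif "LONGLIVED" in hints:
            hints.add("COMPRESSIBLE")

    if mode == "force":
        return True                           # assumed: hints are ignored
    if mode == "disable":
        return False                          # assumed: hints are ignored
    if mode == "aggressive":
        return "INCOMPRESSIBLE" not in hints  # try unless told not to
    if mode == "passive":
        return "COMPRESSIBLE" in hints        # try only when told to
    raise ValueError("unknown compression mode %r" % mode)
```

Under these assumptions an unhinted object is compressed in 'aggressive' mode and left alone in 'passive' mode.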

> Along with specific compression algorithm one should be able to specify
> default algorithm selection. E.g. user can specify 'default' compression for
> an object instead of specific 'zlib' or 'snappy' value.
> 
> This way one can avoid the need to care about the proper algorithm selection
> for each object.
> 
> Default algorithm to be taken from the store setting (see above)

Do we need to vary the alg per object?

For your c above, I think we probably want a 'compress' and 'decompress' 
rados op, but until we have an actual user that would make use of it, I 
don't think we should worry about it.  In the meantime, someone can 
just set the hint and rewrite the object if they want to force 
compression on existing data.

> Such an option provides pretty good level of flexibility. Upper level can
> introduce additional logic to control compression this way, e.g.
> enable/disable it for specific pools or dynamically control depending on how
> compressible object content is.
> 
> 3) Per-write request compression control.
> 
> This option provides the highest level of flexibility but is probably an
> overkill.
> 
> Any rationales to have it?

I don't think we need it.

> ==================================================================
> 
> PER-STORE STATISTICS
> 
> Following statistics parameters to be introduced on per-store basis:
> 
> 1) Allocated - total amount of data in allocated blobs
> 
> 2) Stored - actual amount of stored object content, i.e. sum of all objects
> uncompressed content
> 
> 3) StoredCompressed - amount of stored compressed data
> 
> 4) StoredCompressedOriginal - original amount of stored compressed data
> 
> 5) CompressionProcessed - amount of data processed by the compression. This
> differ from 'StoredCompressed' as some data can be finally stored uncompressed
> or removed. Also potentially the parameter can be reset by some means.
> 
> 6) CompressOpsCount - amount of compression operations completed. The
> parameter can be reset by some means.
> 
> 7) CompressTime - amount of time spent for compression. The parameter can be
> reset by some means.
> 
> 8) WriteOpsCount - amount of write operations completed. The parameter can be
> reset by some means.
> 
> 9) WriteTime - amount of time spent for write requests processing. The
> parameter can be reset by some means.
> 
> 10) WrittenTotal - amount of written data.
> 
> 11) DecompressionProcessed - amount of data processed by decompression. The
> parameter can be reset by some means.
> 
> 12) DecompressOpsCount - amount of decompression operations completed. The
> parameter can be reset by some means.
> 
> 13) DecompressTime - amount of time spent for compression. The parameter can
> be reset by some means.
> 
> 14) ReadOpsCount - amount of read operations completed. The parameter can be
> reset by some means.
> 
> 15) ReadTime - amount of time spent for read requests processing. The
> parameter can be reset by some means.
> 
> 16) ReadTotal - amount of read data. The parameter can be reset by some means.
> 
> Handling parameters 11)-16) can be a bit tricky as we might want to avoid KV
> updates during reading. Thus we need some means to periodically store these
> parameters or just track them in-memory.

These seem to break down into two categories:

 - stuff that tracks performance, and would probably just map to 
perfcounters, to be slurped up by your metrics and graphing infrastructure 
along with other performance stuff

 - stats about utilized storage that we might want to see from a 'df'.  
Specifically, 1-4.  I suspect we can keep some high-level global counters 
for this and update on a per-transaction basis... probably using a rocksdb 
merge operator for addition/subtraction?  Then we can extend the 
ObjectStore statfs() interface to pass these stats up to the OSD for 
reporting through the mon for 'ceph df' and 'ceph osd df'.
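
As a rough illustration of the second category, the per-transaction accounting could look like the Python sketch below: each transaction contributes signed deltas that are added into global counters, which is the behaviour an addition-style merge operator would provide. The class, the counter names and the derived ratio are assumptions for the example; nothing here is RocksDB or ObjectStore API.

```python
# Counter names mirror statistics 1-4 from the proposal (spelling assumed).
STORAGE_COUNTERS = (
    "allocated",
    "stored",
    "stored_compressed",
    "stored_compressed_original",
)


class StoreStats:
    """Hypothetical global counters updated once per transaction."""

    def __init__(self):
        self.totals = dict.fromkeys(STORAGE_COUNTERS, 0)

    def merge(self, delta):
        """Apply signed per-transaction deltas (additive merge)."""
        for name, change in delta.items():
            if name not in self.totals:
                raise KeyError(name)
            self.totals[name] += change

    def compression_ratio(self):
        """Original size / compressed size, or None if nothing is compressed."""
        compressed = self.totals["stored_compressed"]
        if compressed == 0:
            return None
        return self.totals["stored_compressed_original"] / compressed
```

A write that compresses 128 KiB down to 32 KiB would merge positive deltas; removing that object later would merge the same values negated.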

What that doesn't give you is per-pg stats.  Is that important?  If so, we 
need to do the accounting on a per-collection basis, and add a new 
ObjectStore statfs-like op for collections.

> ==================================================================
> 
> PER-OBJECT STATISTICS NOTES
> 
> It might be useful to have per-object statistics similar to the above
> mentioned per-store one. This way upper level can revise compression results
> and adjust the process accordingly.
> 
> The drawbacks are onode footprint increase and additional complexities for
> read op handling though.
> 
> If collected per-object statistics should be retrieved by using specific
> method/ioctl.
> 
> Perhaps we can introduce some object creation flag ( or extend alloc_hints or
> provide an ioctl ) to enable statistics collection for specific objects only?
> 
> Any thought on the need for that?

I think pool granularity would be enough.  I would expect users to be 
interested in a corpus, and object types generally break down by pool.
 
> ==================================================================
> 
> ADDITIONAL NOTES
> 
> 1) It seems helpful to introduce additional means to indicate NO-MORE_WRITES
> event from upper level to BlueStore. This way one provides a hint that allows
> bluestore to trigger some background optimization on the object, e.g. garbage
> collection, defragmentation, etc.

We could have a rados op for 'seal' that would prevent further writes.  
Just a hint would be sufficient for gc/optimization purposes, but here it 
probably makes sense to make it an enforcing flag.  Sam has been 
looking for something like this for a while.


The other topic not covered here is the compressed_blob_size.  The new 
write code will create a single large blob to satisfy an entire 
write currently.  With compression, we'll want to cap the blob size unless 
there is an IMMUTABLE or APPEND_ONLY hint (in which case we don't care 
about overwrites and may as well keep metadata compact).

Is that just

 bluestore compression max blob size = 128*1024

?
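
To illustrate what such a cap would mean for a single large write, here is a Python sketch. It assumes the cap is simply skipped when an IMMUTABLE or APPEND_ONLY hint is present and ignores alignment entirely; the function name and the (offset, length) representation are made up for the example.

```python
def split_into_blobs(offset, length, hints=(), max_blob_size=128 * 1024):
    """Split one write into (offset, length) blob extents (illustrative).

    With an IMMUTABLE or APPEND_ONLY hint, overwrites are not a concern,
    so the whole write stays in a single blob.
    """
    if {"IMMUTABLE", "APPEND_ONLY"} & set(hints):
        return [(offset, length)]

    blobs = []
    end = offset + length
    while offset < end:
        chunk = min(max_blob_size, end - offset)
        blobs.append((offset, chunk))
        offset += chunk
    return blobs
```

A 300 KiB write at offset 0 would be split into two 128 KiB blobs plus a 44 KiB tail, so a later small overwrite only has to deal with one capped blob.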

sage


* Re: Initial proposal for bluestore compression control and statistics
  2016-05-19 19:40 ` Sage Weil
@ 2016-05-19 20:52   ` Sage Weil
  2016-05-20 14:36   ` Igor Fedotov
  1 sibling, 0 replies; 7+ messages in thread
From: Sage Weil @ 2016-05-19 20:52 UTC (permalink / raw)
  To: Igor Fedotov; +Cc: ceph-devel

On Thu, 19 May 2016, Sage Weil wrote:
> On Thu, 19 May 2016, Igor Fedotov wrote:
> > Hi cephers,
> > 
> > please find my initial proposal with regard to bluestore compression control
> > and related statistics.
> > 
> > Any comments/thoughts are highly appreciated.
> > 
> > ==================================================================
> > 
> > COMPRESSION CONTROL OPTIONS
> > 
> > One can see following means to control  compression at BlueStore level.
> > 
> > 1) Per-store setting to enable/disable compression and specify default
> > compression method
> > 
> > bluestore_compression = <zlib | snappy> / <force | optional | disable>
> > 
> > E.g.
> > 
> > bluestore_compression = zlib/force
> > 
> > The first token denotes default/applied compression algorithm.
> > The second one:
> > 
> > 'force' - enables compression for any objects
> > 
> > 'optional' - burden the caller with the need to enable compression by
> > different means (see below)
> > 
> > 'disable' - unconditionally disables any compression for the store.
> > 
> > This option is definitely useful for testing/debugging and has probably
> > limited use in production.
> 
> Do we need the 'disable' option?  i.e., is there any difference between
> 
>  bluestore compression = snappy/disable
> 
> and
> 
>  bluestore compression =
> 
> Also, since we don't need to list multiple algorithms, we can probably 
> just simplify this to be
> 
>  bluestore compression algorithm = snappy
> 
> and then
> 
>  bluestore compression = force | optional | disable
> 
> or maybe just
> 
>  bluestore compression force = true/false
>  bluestore compression allow = true/false
> 
> with a check that prevents nonsensical (force + allow).  Right now we 
> don't have a enum config option type (although we perhaps should).
> 
> > 2) Per-object compression specification. One should be able to enable/disable
> > compression for specific object.
> > 
> > Following sub-option can be provided:
> > 
> >   a) Specify compression mode (along with disablement option) at object
> > creation
> > 
> >   b) Specify compression mode at arbitrary moment via specific method/ioctl
> > call. Compression to be applied for subsequent write requests
> > 
> >   c) Force object compression/decompression at arbitrary moment via specific
> > method/ioctl call. Existing object content to be compressed/decompressed and
> > appropriate mode to be set for subsequent write requests.
> > 
> >   d) Disable compression for short-lived objects if corresponding hint has
> > been provided via set_alloc_hint2 call. See PR at
> > https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941
> 
> I think a, b, and d can be addressed by adding two hints to the 
> set_alloc_hint2 operation:
> 
>  COMPRESSIBLE
>  INCOMPRESSIBLE
> 
> The first would attempt compression if bluestore compression = allow, and 
> the second would not try even if compression = force.
> 
> Alternatively, we could have
> 
>  bluestore compression = force | aggressive | passive | disable
> 
> where aggressive would try unless INCOMPRESSIBLE and passive would not try 
> unless COMPRESSIBLE.
> 
> I would make the SHORTLIVED inference an independent heuristic that is 
> optional, and basically makes SHORTLIVED => INCOMPRESSIBLE and LONGLIVED 
> => COMPRESSIBLE.

One other thought: might we want to take LONGLIVED + IMMUTABLE or 
APPEND_ONLY to mean we should use a different, more space-efficient codec,  
while using something fast and cheap (e.g., snappy) for everything else?

Maybe an additional option

 bluestore compression longlived algorithm = zlib

to go along with bluestore_compression_algorithm...
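
Read literally, the selection could be sketched as below in Python. The reading of the hint combination (LONGLIVED together with IMMUTABLE or APPEND_ONLY) and the function and parameter names are assumptions for the example; the default values simply mirror the snappy/zlib pairing mentioned above.

```python
def pick_algorithm(hints=(), default_algorithm="snappy",
                   longlived_algorithm="zlib"):
    """Choose a codec from allocation hints (illustrative sketch)."""
    hints = set(hints)
    write_once = bool(hints & {"IMMUTABLE", "APPEND_ONLY"})
    if "LONGLIVED" in hints and write_once:
        # Data that will sit around unmodified: favour space efficiency.
        return longlived_algorithm
    # Everything else: favour a fast, cheap codec.
    return default_algorithm
```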

sage


* Re: Initial proposal for bluestore compression control and statistics
  2016-05-19 19:40 ` Sage Weil
  2016-05-19 20:52   ` Sage Weil
@ 2016-05-20 14:36   ` Igor Fedotov
  1 sibling, 0 replies; 7+ messages in thread
From: Igor Fedotov @ 2016-05-20 14:36 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel



On 19.05.2016 22:40, Sage Weil wrote:
> Hi Igor!
>
> On Thu, 19 May 2016, Igor Fedotov wrote:
>> Hi cephers,
>>
>> please find my initial proposal with regard to bluestore compression control
>> and related statistics.
>>
>> Any comments/thoughts are highly appreciated.
>>
>> ==================================================================
>>
>> COMPRESSION CONTROL OPTIONS
>>
>> One can see following means to control  compression at BlueStore level.
>>
>> 1) Per-store setting to enable/disable compression and specify default
>> compression method
>>
>> bluestore_compression = <zlib | snappy> / <force | optional | disable>
>>
>> E.g.
>>
>> bluestore_compression = zlib/force
>>
>> The first token denotes default/applied compression algorithm.
>> The second one:
>>
>> 'force' - enables compression for any objects
>>
>> 'optional' - burden the caller with the need to enable compression by
>> different means (see below)
>>
>> 'disable' - unconditionally disables any compression for the store.
>>
>> This option is definitely useful for testing/debugging and has probably
>> limited use in production.
> Do we need the 'disable' option?  i.e., is there any difference between
>
>   bluestore compression = snappy/disable
>
> and
>
>   bluestore compression =
Actually there is no specific need for "disable". Blank is enough. But 
IMHO having an explicit token improves config readability a bit.
>
> Also, since we don't need to list multiple algorithms, we can probably
> just simplify this to be
>
>   bluestore compression algorithm = snappy
>
> and then
>
>   bluestore compression = force | optional | disable
>
> or maybe just
>
>   bluestore compression force = true/false
>   bluestore compression allow = true/false
>
> with a check that prevents nonsensical (force + allow).  Right now we
> don't have an enum config option type (although we perhaps should).
I'd prefer the variant with the "bluestore compression algorithm" and 
"bluestore compression" parameters.

>> 2) Per-object compression specification. One should be able to enable/disable
>> compression for specific object.
>>
>> Following sub-option can be provided:
>>
>>    a) Specify compression mode (along with disablement option) at object
>> creation
>>
>>    b) Specify compression mode at arbitrary moment via specific method/ioctl
>> call. Compression to be applied for subsequent write requests
>>
>>    c) Force object compression/decompression at arbitrary moment via specific
>> method/ioctl call. Existing object content to be compressed/decompressed and
>> appropriate mode to be set for subsequent write requests.
>>
>>    d) Disable compression for short-lived objects if corresponding hint has
>> been provided via set_alloc_hint2 call. See PR at
>> https://github.com/ceph/ceph/pull/6208/files/306c5e148cd2f538b3b6c8c2a1a3d5f38ef8e15a#r63775941
> I think a, b, and d can be addressed by adding two hints to the
> set_alloc_hint2 operation:
>
>   COMPRESSIBLE
>   INCOMPRESSIBLE
>
> The first would attempt compression if bluestore compression = allow, and
> the second would not try even if compression = force.
Cool!
> Alternatively, we could have
>
>   bluestore compression = force | aggressive | passive | disable
>
> where aggressive would try unless INCOMPRESSIBLE and passive would not try
> unless COMPRESSIBLE.
Sounds good!
>
> I would make the SHORTLIVED inference an independent heuristic that is
> optional, and basically makes SHORTLIVED => INCOMPRESSIBLE and LONGLIVED
> => COMPRESSIBLE.
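
The four-value mode and the hint handling discussed above can be sketched 
roughly as follows. This is an illustration under assumptions, not BlueStore 
code: the enum and function names are hypothetical, 'force' is assumed to 
compress regardless of hints (the thread only defines 'aggressive' and 
'passive' explicitly), and explicit COMPRESSIBLE/INCOMPRESSIBLE hints are 
assumed to take precedence over the optional lifetime heuristic.

```python
from enum import Enum, Flag


class Mode(Enum):
    FORCE = "force"
    AGGRESSIVE = "aggressive"
    PASSIVE = "passive"
    DISABLE = "disable"


class Hint(Flag):
    NONE = 0
    COMPRESSIBLE = 1
    INCOMPRESSIBLE = 2
    SHORTLIVED = 4
    LONGLIVED = 8


def should_compress(mode, hints=Hint.NONE, infer_from_lifetime=False):
    """Decide whether a write to an object with these hints is compressed."""
    explicit = hints & (Hint.COMPRESSIBLE | Hint.INCOMPRESSIBLE)
    if infer_from_lifetime and not explicit:
        # Optional heuristic: SHORTLIVED => INCOMPRESSIBLE,
        # LONGLIVED => COMPRESSIBLE.
        if hints & Hint.SHORTLIVED:
            hints |= Hint.INCOMPRESSIBLE
        elif hints & Hint.LONGLIVED:
            hints |= Hint.COMPRESSIBLE

    if mode is Mode.DISABLE:
        return False
    if mode is Mode.FORCE:
        return True  # assumption: force ignores hints in this variant
    if mode is Mode.AGGRESSIVE:
        # Try unless the caller said the data is incompressible.
        return not (hints & Hint.INCOMPRESSIBLE)
    # PASSIVE: do not try unless the caller said the data is compressible.
    return bool(hints & Hint.COMPRESSIBLE)
```

With this shape, per-object control (items a, b and d) reduces to setting 
hints, and the per-store setting only picks the default behaviour.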
>
>> Along with specific compression algorithm one should be able to specify
>> default algorithm selection. E.g. user can specify 'default' compression for
>> an object instead of specific 'zlib' or 'snappy' value.
>>
>> This way one can avoid the need to care about the proper algorithm selection
>> for each object.
>>
>> Default algorithm to be taken from the store setting (see above)
> Do we need to vary the alg per object?
There were some notes about that from Allen and Blair Bethwaite during 
the initial bluestore compression discussion.
> For your c above, I think we probably want a 'compress' and 'decompress'
> rados op, but until we have an actual user that would make use of it, I
> don't think we should worry about it.  In the meantime, someone can
> just set the hint and rewrite the object if they want to force
> compression on existing data.
Agreed
>> Such an option provides pretty good level of flexibility. Upper level can
>> introduce additional logic to control compression this way, e.g.
>> enable/disable it for specific pools or dynamically control depending on how
>> compressible object content is.
>>
>> 3) Per-write request compression control.
>>
>> This option provides the highest level of flexibility but is probably an
>> overkill.
>>
>> Any rationales to have it?
> I don't think we need it.
>
>> ==================================================================
>>
>> PER-STORE STATISTICS
>>
>> Following statistics parameters to be introduced on per-store basis:
>>
>> 1) Allocated - total amount of data in allocated blobs
>>
>> 2) Stored - actual amount of stored object content, i.e. sum of all objects
>> uncompressed content
>>
>> 3) StoredCompressed - amount of stored compressed data
>>
>> 4) StoredCompressedOriginal - original amount of stored compressed data
>>
>> 5) CompressionProcessed - amount of data processed by the compression. This
>> differs from 'StoredCompressed' as some data can be finally stored uncompressed
>> or removed. Also potentially the parameter can be reset by some means.
>>
>> 6) CompressOpsCount - amount of compression operations completed. The
>> parameter can be reset by some means.
>>
>> 7) CompressTime - amount of time spent for compression. The parameter can be
>> reset by some means.
>>
>> 8) WriteOpsCount - amount of write operations completed. The parameter can be
>> reset by some means.
>>
>> 9) WriteTime - amount of time spent for write requests processing. The
>> parameter can be reset by some means.
>>
>> 10) WrittenTotal - amount of written data.
>>
>> 11) DecompressionProcessed - amount of data processed by decompression. The
>> parameter can be reset by some means.
>>
>> 12) DecompressOpsCount - amount of decompression operations completed. The
>> parameter can be reset by some means.
>>
>> 13) DecompressTime - amount of time spent for decompression. The parameter can
>> be reset by some means.
>>
>> 14) ReadOpsCount - amount of read operations completed. The parameter can be
>> reset by some means.
>>
>> 15) ReadTime - amount of time spent for read requests processing. The
>> parameter can be reset by some means.
>>
>> 16) ReadTotal - amount of read data. The parameter can be reset by some means.
>>
>> Handling parameters 11)-16) can be a bit tricky as we might want to avoid KV
>> updates during reading. Thus we need some means to periodically store these
>> parameters or just track them in-memory.
> These seem to break down into two categories:
>
>   - stuff that tracks performance, and would probably just map to
> perfcounters, to be slurped up by your metrics and graphing infrastructure
> along with other performance stuff
>
>   - stats about utilized storage that we might want to see from a 'df'.
> Specifically, 1-4.  I suspect we can keep some high-level global counters
> for this and update on a per-transaction basis... probably using a rocksdb
> merge operator for addition/subtraction?  Then we can extend the
> ObjectStore statfs() interface to pass these stats up to the OSD for
> reporting through the mon for 'ceph df' and 'ceph osd df'.
Sounds good!
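
A rough sketch of the per-transaction accounting for counters 1-4, for 
illustration only. The class, its field names and the sample sizes are 
hypothetical; the dictionary update merely stands in for an additive merge 
applied in the KV store, and statfs() here is a stand-in for whatever the 
extended ObjectStore interface would return.

```python
class StoreStats:
    """Global utilization counters updated by signed per-transaction deltas."""

    FIELDS = (
        "allocated",                   # 1) space in allocated blobs
        "stored",                      # 2) uncompressed object content
        "stored_compressed",           # 3) compressed data as stored
        "stored_compressed_original",  # 4) original size of that data
    )

    def __init__(self):
        self.totals = {name: 0 for name in self.FIELDS}

    def merge(self, delta):
        """Add a transaction's signed deltas, like an additive merge operator."""
        unknown = set(delta) - set(self.FIELDS)
        if unknown:
            raise KeyError(f"unknown counters: {sorted(unknown)}")
        for name, change in delta.items():
            self.totals[name] += change

    def statfs(self):
        """Return a snapshot suitable for reporting upward."""
        return dict(self.totals)


stats = StoreStats()

# A 64 KiB write that compresses to 10000 bytes and occupies 12 KiB on disk
# (made-up numbers for the example).
write_delta = {
    "allocated": 12288,
    "stored": 65536,
    "stored_compressed": 10000,
    "stored_compressed_original": 65536,
}
stats.merge(write_delta)
after_write = stats.statfs()

# Removing the object applies the same deltas with the opposite sign.
stats.merge({name: -change for name, change in write_delta.items()})
after_remove = stats.statfs()
```

Because each transaction only contributes deltas, no read-modify-write of 
the counters is needed on the write path.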
> What that doesn't give you is per-pg stats.  Is that important?  If so, we
> need to do the accounting on a per-collection basis, and add a new
> ObjectStore statfs-like op for collections.
Don't think we need that at the moment
>> ==================================================================
>>
>> PER-OBJECT STATISTICS NOTES
>>
>> It might be useful to have per-object statistics similar to the above
>> mentioned per-store one. This way upper level can revise compression results
>> and adjust the process accordingly.
>>
>> The drawbacks are onode footprint increase and additional complexities for
>> read op handling though.
>>
>> If collected, per-object statistics should be retrieved by using a specific
>> method/ioctl.
>>
>> Perhaps we can introduce some object creation flag ( or extend alloc_hints or
>> provide an ioctl ) to enable statistics collection for specific objects only?
>>
>> Any thought on the need for that?
> I think pool granularity would be enough.  I would expect users to be
> interested about a corpus, and object types generally break down by pool.
>   
How do we know about the pool at BlueStore level? And how are we 
planning to track that information at BlueStore? Do we have any 
(persistent?) entities for that?
>> ==================================================================
>>
>> ADDITIONAL NOTES
>>
>> 1) It seems helpful to introduce additional means to indicate NO-MORE_WRITES
>> event from upper level to BlueStore. This way one provides a hint that allows
>> bluestore to trigger some background optimization on the object, e.g. garbage
>> collection, defragmentation, etc.
> We could have a rados op for 'seal' that would prevent further writes.
> Just a hint would be sufficient for gc/optimization purposes, but here it
> probably makes sense to make it an enforcing flag.  Sam has been
> looking for something like this for a while.
Isn't the "IMMUTABLE" flag such a "seal"?
This actually marks an object as READ-ONLY, right?

I meant a slightly different option though: a way to indicate that no 
more writes are expected in the near future, while they remain possible 
later. That way a caller issuing a batch of writes can signal that the 
batch is complete.
An additional indication could be a "MORE-DATA-FOLLOW" flag....

>
> The other topic not covered here is the compressed_blob_size.  The new
> write code will create a single large blob to satisfy an entire
> write currently.  With compression, we'll want to cap the blob size unless
> there is an IMMUTABLE or APPEND_ONLY hint (in which case we don't care
> about overwrites and may as well keep metadata compact).
>
> Is that just
>
>   bluestore compression max blob size = 128*1024
>
> ?
Will come back with that topic a bit later
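
For reference, the capping rule described above could look roughly like 
this. It is only a sketch under assumptions: the function name and the 
(offset, length) representation are hypothetical, and the default simply 
mirrors the 128*1024 value from the proposed option.

```python
def split_into_blobs(length, hints=frozenset(), max_blob_size=128 * 1024):
    """Return (offset, length) pairs describing the blobs for one write."""
    if length <= 0:
        return []
    # With IMMUTABLE or APPEND_ONLY there are no overwrites to worry about,
    # so a single large blob keeps the metadata compact.
    if {"IMMUTABLE", "APPEND_ONLY"} & set(hints):
        return [(0, length)]
    # Otherwise cap each compressed blob so a later overwrite only has to
    # deal with a bounded amount of data.
    return [
        (offset, min(max_blob_size, length - offset))
        for offset in range(0, length, max_blob_size)
    ]


capped = split_into_blobs(300 * 1024)
single = split_into_blobs(300 * 1024, hints={"IMMUTABLE"})
```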
> sage


