CEPH filesystem development
 help / color / mirror / Atom feed
From: Igor Fedotov <ifedotov@mirantis.com>
To: Allen Samuels <Allen.Samuels@sandisk.com>, Sage Weil <sage@newdream.net>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Adding compression support for bluestore.
Date: Mon, 21 Mar 2016 19:35:57 +0300	[thread overview]
Message-ID: <56F022ED.6000909@mirantis.com> (raw)
In-Reply-To: <CY1PR0201MB189754639EB3F2E219CF0E6EE88F0@CY1PR0201MB1897.namprd02.prod.outlook.com>



On 21.03.2016 18:14, Allen Samuels wrote:
>>
>> That's an interesting proposal but I can see following caveats here (I beg
>> pardon I  misunderstood something):
>> 1) Potentially uncontrolled extent map growth when extensive (over)writing
>> takes place.
> Yes, a naïve insertion policy could lead to uncontrolled growth, but I don't think this needs to be the case. I assume that when you add an "extent", you won't increase the size of the array unnecessarily, i.e., if the new extent doesn't overlap an existing extent then there's no reason to increase the size of the map array -- actually you want to insert the new extent at the <smallest> array index that doesn't overlap, only increasing the array size when that's not possible. I'm not 100% certain of the worst case, but I believe that it's limited to the ratio between the largest extent and the smallest extent. (i.e., if we assume writes are no larger than -- say -- 1MB and the smallest are 4K, then I think the max depth of the array is 1M/4K => 2^8, 256. Which is ugly but not awful -- since this is probably a contrived case. This might be a reason to limit the largest extent size to something a bit smaller (say 256K)...
It looks like I misunderstood something... It seemed to me that your 
array grows depending on the maximum amount of block versions.
Imagine you have 1000 writes 0~4K and 1000 writes 8K~4K
I supposed that this will create following array:
[
0: <0:{...,4K}, 8K:{...4K}>,
...
999: <0:{...,4K}, 8K:{...4K}>,
]

what's happening in your case?

>> 2) Read/Lookup algorithmic complexity. To find valid block (or detect
>> overwrite) one should sequentially enumerate the full array. Given 1) that
>> might be very ineffective.
> Only requires one log2 lookup for each index of the array.
This depends on 1) thus still unclear at the moment.
>> 3) It's not dealing with unaligned overwrites. What happens when some
>> block is partially overwritten?
> I'm not sure I understand what cases you're referring to. Can you give an example?
>
Well, as far as I understand in the proposal above you were operating 
the entire blocks (i.e. 4K data)
Thus overwriting the block is a simple case - you just need to create a 
new block "version" and insert it into an array.
But real user writes seems to be unaligned to block size.
E.g.
write 0~2048
write 1024~3072

you have to either track both blocks or merge them. The latter is a bit 
tricky for the compression case.




--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2016-03-21 16:36 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-15 16:29 Adding compression support for bluestore Igor Fedotov
2016-02-16  2:06 ` Haomai Wang
2016-02-17  0:11   ` Igor Fedotov
2016-02-19 23:13     ` Allen Samuels
2016-02-22 12:25       ` Sage Weil
2016-02-24 18:18         ` Igor Fedotov
2016-02-24 18:43           ` Allen Samuels
2016-02-26 17:41             ` Igor Fedotov
2016-03-15 17:12               ` Sage Weil
2016-03-16  1:06                 ` Allen Samuels
2016-03-16 18:34                 ` Igor Fedotov
2016-03-16 19:02                   ` Allen Samuels
2016-03-16 19:15                     ` Sage Weil
2016-03-16 19:20                       ` Allen Samuels
2016-03-16 19:29                         ` Sage Weil
2016-03-16 19:36                           ` Allen Samuels
2016-03-17 14:55                     ` Igor Fedotov
2016-03-17 15:28                       ` Allen Samuels
2016-03-18 13:00                         ` Igor Fedotov
2016-03-16 19:27                   ` Sage Weil
2016-03-16 19:41                     ` Allen Samuels
     [not found]                       ` <CA+z5DsxA9_LLozFrDOtnVRc7FcvN7S8OF12zswQZ4q4ysK_0BA@mail.gmail.com>
2016-03-16 22:56                         ` Blair Bethwaite
2016-03-17  3:21                           ` Allen Samuels
2016-03-17 10:01                             ` Willem Jan Withagen
2016-03-17 17:29                               ` Howard Chu
2016-03-17 15:21                             ` Igor Fedotov
2016-03-17 15:18                     ` Igor Fedotov
2016-03-17 15:33                       ` Sage Weil
2016-03-17 18:53                         ` Allen Samuels
2016-03-18 14:58                           ` Igor Fedotov
2016-03-18 15:53                         ` Igor Fedotov
2016-03-18 17:17                           ` Vikas Sinha-SSI
2016-03-19  3:14                             ` Allen Samuels
2016-03-21 14:19                             ` Igor Fedotov
2016-03-19  3:14                           ` Allen Samuels
2016-03-21 14:07                             ` Igor Fedotov
2016-03-21 15:14                               ` Allen Samuels
2016-03-21 16:35                                 ` Igor Fedotov [this message]
2016-03-21 17:14                                   ` Allen Samuels
2016-03-21 18:31                                     ` Igor Fedotov
2016-03-21 21:14                                       ` Allen Samuels
2016-03-21 15:32                             ` Igor Fedotov
2016-03-21 15:50                               ` Sage Weil
2016-03-21 18:01                                 ` Igor Fedotov
2016-03-24 12:45                                 ` Igor Fedotov
2016-03-24 22:29                                   ` Allen Samuels
2016-03-29 20:19                                   ` Sage Weil
2016-03-29 20:45                                     ` Allen Samuels
2016-03-30 12:32                                       ` Igor Fedotov
2016-03-30 12:28                                     ` Igor Fedotov
2016-03-30 12:47                                       ` Sage Weil
2016-03-31 21:56                                   ` Sage Weil
2016-04-01 18:54                                     ` Allen Samuels
2016-04-04 12:31                                     ` Igor Fedotov
2016-04-04 12:38                                     ` Igor Fedotov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56F022ED.6000909@mirantis.com \
    --to=ifedotov@mirantis.com \
    --cc=Allen.Samuels@sandisk.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=sage@newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox