All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Elder <elder@ieee.org>
To: Ilya Dryomov <ilya.dryomov@inktank.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH 6/6] rbd: prefix rbd writes with CEPH_OSD_OP_SETALLOCHINT osd op
Date: Tue, 25 Feb 2014 07:19:31 -0600	[thread overview]
Message-ID: <530C9863.3070003@ieee.org> (raw)
In-Reply-To: <CALFYKtB6F6xq-fpZrEZwxb+8k23Oda=W=hLEV_k+aO5aULhGPQ@mail.gmail.com>

On 02/25/2014 06:58 AM, Ilya Dryomov wrote:
> On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder <elder@ieee.org> wrote:
>> On 02/21/2014 12:55 PM, Ilya Dryomov wrote:
>>> In an effort to reduce fragmentation, prefix every rbd write with
>>> a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set
>>> to the object size (1 << order).  Backwards compatibility is taken care
>>> of on the libceph/osd side.
>>
>> If *every* write will include a hint, why even encode this as
>> a distinct opcode?  Why not just extend the definition of a
>> write operation to include the write hint data?  The server
>> side could check expected_object_size, and if 0 (or some other
>> invalid value) it means the client isn't supplying a hint.
>>
>> However, on the assumption you want this to be a distinct
>> OSD op I think you generally did the right thing.  See my
>> comments below.  For now I'm not indicating "Reviewed-by"
>> because it sounds like the nature of this change is under
>> discussion still.  And I really do think that if the hint
>> is not going to be made more generic (and possibly even if
>> it is) I'd rather see this hinting done using an extension
>> of the write operation (like I suggest above).  In this
>> case it is clearly directly tied to every write operation
>> and separating it sort of obscures that.
> 
> Yes, the assumption is that we want to do this in a separate op.  The
> hint is durable, in that it's enough to do it once, so it doesn't make
> much sense to fold it into the write op(s).  The reason every rbd write
> is prefixed is that rbd doesn't explicitly create objects and relies on
> writes creating them implicitly, so there is no place to stick a single
> hint op into.  To get around that we decided to prefix every rbd write
> with a hint (just like write and setattr ops, hint op will create an
> object implicitly if it doesn't exist).

I was thinking primarily in the RBD context and not the
OSD write more generally I guess.  I suspected it was durable
and knew why it still needs to be attached to every rbd write.

On a separate note, it seems to me we've discussed how one
could maintain a bitmap of created (known to be written) RBD
objects for an image, which could be used for layered images
to avoid the separate parent read request.  If such a thing
ever got implemented it could be used to skip the hint as well.

> I'll add the above paragraph to the commit message.

Everything else you incorporated or explained, so this looks
good to me.

Reviewed-by: Alex Elder <elder@linaro.org>


  reply	other threads:[~2014-02-25 13:19 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-21 18:55 [PATCH 0/6] libceph: CEPH_OSD_OP_SETALLOCHINT osd op Ilya Dryomov
2014-02-21 18:55 ` [PATCH 1/6] libceph: encode CEPH_OSD_OP_FLAG_* op flags Ilya Dryomov
2014-02-24 14:58   ` Alex Elder
2014-02-21 18:55 ` [PATCH 2/6] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op Ilya Dryomov
2014-02-23 16:03   ` Sage Weil
2014-02-24 14:59   ` Alex Elder
2014-02-25 12:52     ` Ilya Dryomov
2014-02-25 13:05       ` Alex Elder
2014-02-25 13:38         ` Ilya Dryomov
2014-02-25 17:12           ` Sage Weil
2014-02-25 17:12       ` Sage Weil
2014-02-21 18:55 ` [PATCH 3/6] libceph: bump CEPH_OSD_MAX_OP to 3 Ilya Dryomov
2014-02-24 14:59   ` Alex Elder
2014-02-21 18:55 ` [PATCH 4/6] rbd: do not hard-code CEPH_OSD_MAX_OP in rbd_osd_req_callback() Ilya Dryomov
2014-02-24 14:59   ` Alex Elder
2014-02-25 12:53     ` Ilya Dryomov
2014-02-21 18:55 ` [PATCH 5/6] rbd: num_ops parameter for rbd_osd_req_create() Ilya Dryomov
2014-02-24 14:59   ` Alex Elder
2014-02-21 18:55 ` [PATCH 6/6] rbd: prefix rbd writes with CEPH_OSD_OP_SETALLOCHINT osd op Ilya Dryomov
2014-02-24 14:59   ` Alex Elder
2014-02-25 12:58     ` Ilya Dryomov
2014-02-25 13:19       ` Alex Elder [this message]
2014-02-23 16:14 ` [PATCH 0/6] libceph: " Sage Weil
2014-02-23 16:15   ` Alex Elder
2014-02-24 14:58 ` Alex Elder
2014-02-25 12:50   ` Ilya Dryomov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=530C9863.3070003@ieee.org \
    --to=elder@ieee.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=ilya.dryomov@inktank.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.