From: Alex Elder <elder@ieee.org>
To: Douglas Fuller <dfuller@redhat.com>, Ilya Dryomov <idryomov@gmail.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: [PATCHv2 0/3] rbd: header read/refresh improvements
Date: Sun, 26 Apr 2015 09:39:03 -0500 [thread overview]
Message-ID: <553CF887.6000803@ieee.org> (raw)
In-Reply-To: <CD78243E-DADC-49D1-A23F-4E4880045A30@redhat.com>
On 04/26/2015 09:28 AM, Douglas Fuller wrote:
> Alex,
>
> I think you are correct in both your understanding and your
> impression of the approach.
OK, good to hear.
>> On Apr 26, 2015, at 4:44 AM, Ilya Dryomov <idryomov@gmail.com>
>> wrote:
>>
>> On Sun, Apr 26, 2015 at 9:29 AM, Alex Elder <elder@ieee.org>
>> wrote:
>>> On 04/24/2015 08:22 AM, Douglas Fuller wrote:
>>>>
>>>> Support multiple class op calls in one ceph_msg and consolidate
>>>> rbd header read and refresh processes to use this feature to
>>>> reduce the number of ceph_msgs sent for that process. Refresh
>>>> features on header refresh and begin returning EIO if features
>>>> have changed since mapping.
>>>
>>> This sounds pretty expensive. For every class operation you are
>>> copying the received data two extra times.
>
> I’d really like a solution to this, but I don’t think one is
> available given the constraints.
Now that I know I understand it, I will at least provide
some review comments on the patches. I'll also think about
it a little. My instinct says there should be no need to
make the copies, but the devil will be in the details.
>>> Will you please correct me where I'm wrong above, and maybe give
>>> a little better high-level explanation of what you're trying to
>>> do? I see in patch 1 you mention a problem with short reads, and
>>> there may be a simpler fix than that (to begin with). But beyond
>>> that, if you're trying to incorporate more ops in a message,
>>> there may be better ways to go about that as well.
>>
>> Yeah, the only objective of this was to pack more call ops into an
>> osd_request in order to reduce the number of network RTs during rbd
>> map and refresh. The fundamental problem the first patch is trying
>> to work around is the first ceph_msg_data item consuming all reply
>> buffers instead of just its own. We have to preallocate response
>> buffers and if the preallocated response buffer for the first op is
>> large enough (it always is) it will consume the entire ceph_msg,
>> along with replies to other ops.
>>
>> My understanding is that the first patch is supposed to be a
>> specific workarond for just that - I think it's noted somewhere
>> that it will break on reads combined with call ops or similar.
>
> That’s correct. This patch only works around that short response
> issue in this specific corner case. It does not fix cases where call
> ops returning data could be arbitrarily combined with reads or writes
> (and apparently they never have been, because that would not work).
> It should not introduce additional cases of breakage, though.
I knew when we added support for copyup that we were kind of
kludging things to allow *just one* more op in a message. Then
the discard support added another, but in both of these cases
it was not done in a nice, general way. The "right" way is
to generalize it, but if that was easy it would have been
done by now.
> I was told that the preferred way to proceed for now was to avoid
> changing the osd_client interface and to handle this case in
> osd_client instead of the messaging layer. Given those constraints,
> it was agreed in the standups and on #ceph-devel that it should be
> done this way.
Sorry, I haven't been keeping up with all the activity on the
mailing list. I pay closest attention to patches to the kernel
code.
> We can’t know the actual response sizes until they are decoded in
> osd_client. Therefore, we can’t copy them directly to the user
> buffers off the wire. That costs us one extra copy. The other copy is
> required because pagevec.c does not provide an interface for copying
> arbitrary data between two page vectors.
That could probably be changed, but that's a separate issue
(and somewhat out of your hands as far as getting it done when
you need it).
>> I too have my efficiency concerns and I raised them in one of the
>> standups but the argument was that this is only going to be used
>> for header init/refresh, not copyup, so the overhead is negligible.
>> I'm still not sold on the copying everything twice though and was
>> going to try to see if there is a simple way to special case this
>> even more and make the diffstat smaller.
>
> You and I agreed in that particular standup discussion; I don’t like
> it, either. I would prefer a general-case solution that avoids
> introducing so much extra overhead, especially for such a small
> amount of data (really just a few bytes).
>
> If there’s a way to handle this with a better method, I’m all ears. I
> had thought of taking advantage of the fact that the sum total of all
> this data will never exceed a single page, but that seems to me to
> require working around even more functionality for what is,
> essentially, a corner case.
OK, well I'll go look at that code path again to see if I can
come up with any bright ideas.
Thanks for the explanation.
-Alex
> -Doug
>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2015-04-26 14:39 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-24 13:22 [PATCHv2 0/3] rbd: header read/refresh improvements Douglas Fuller
2015-04-24 13:22 ` [PATCHv2 1/3] ceph: support multiple class method calls in one ceph_msg Douglas Fuller
2015-04-24 13:22 ` [PATCHv2 2/3] rbd: combine object method calls in header refresh using fewer ceph_msgs Douglas Fuller
2015-04-24 13:22 ` [PATCHv2 3/3] rbd: re-read features during header refresh and detect changes Douglas Fuller
2015-04-26 6:29 ` [PATCHv2 0/3] rbd: header read/refresh improvements Alex Elder
2015-04-26 8:44 ` Ilya Dryomov
2015-04-26 14:28 ` Douglas Fuller
2015-04-26 14:39 ` Alex Elder [this message]
2015-04-26 15:07 ` Douglas Fuller
2015-04-27 4:15 ` Sage Weil
2015-04-27 4:35 ` Douglas Fuller
2015-04-27 4:40 ` Alex Elder
2015-05-07 15:28 ` Alex Elder
2015-05-07 16:48 ` Douglas Fuller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=553CF887.6000803@ieee.org \
--to=elder@ieee.org \
--cc=ceph-devel@vger.kernel.org \
--cc=dfuller@redhat.com \
--cc=idryomov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.