From: Marcus Watts <mwatts@redhat.com>
To: Sage Weil <sweil@redhat.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: wip-addr
Date: Tue, 25 Aug 2015 06:06:12 -0400 [thread overview]
Message-ID: <20150825100612.GA23852@pika.b.linuxbox.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1508241046070.13116@cobra.newdream.net>
On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
...
> I looked over the wip-addr branch a bit. I have two basic
> questions/concerns:
>
> 1) In commit
> https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a
> we switch the type field from sa_family to transport_type. This seems
> like the way to go, but we need to deal with the fact that lots of
> clusters out there are IPv6 and have AF_INET6 filled in here. Probably
> both types should be interpreted to mean "existing/legacy IP messenger" or
> whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
>
> I think encode needs to make sure it fills in that value for the type when
> encoding the legacy entity_addr_t encoding, but could use a/the single
> valid value for the new encoding. And any get_transport_type() accessor
> should also return the single valid value.
With cohortfs I had the luxury of ignoring existing installations.
I think I concluded at the time that probably most real systems
had '0' stored here.
Yes, if there are things that have "junk" here, they'll need to be handled
appropriately - I think it can be mostly fixed by just having decode map
transport_type "AF_INET6" (which is probably 10) to 0. AF_INET is 2,
so maybe that too. People switching *back* to older code after using
new code won't be so happy - would probably need a patch for older code
if that's a concern.
If there's code out there that thinks get_transport_type() returns an address
family, that will need fixing. something like,
get_address_family(){ return in_addr->ss_family; }
but I didn't find any code that actually needed this.
>
> 2) In the later commit
> https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672
> you introduce a grammar for printing/parsing the addrs. This also makes
> sense since e.g. xio uses an IP to identify an endpoint. I think we
> should identify these based on the *protocol* and not the implementation,
> though... whether we use SimpleMessenger or AsyncMessenger is not a
> property of the address. Maybe "tcp://" makes more sense here? Or
> perhaps no prefix at all (a bare IP address), so that this looks the same
> as it did before in the case where the default protocol(s) are in use.
>
> I assume the xio protocol (whether it is rdma or tcp) is closely tied to
> libxio itself.. is that right? If so, using xio in that prefix makes
> sense. I'd include xio somewhere in the rdma prefix though (xrdma:// and
> xtcp://)?
Code base I had didn't have async messenger as a transport type; if that
needs to be addressable independently of simple messenger it will need a
prefix. I had a prefixless address decode as simple messenger - I
can make this more obvious than "unsigned type = 0;".
I had these prefixes,
sm
rdma
xtcp
but I can easily pick others. I figured rdma didn't need an x since
there didn't seem to be much chance anybody would want a non-xio version
of rdma.
>
> What do you think?
>
> Logistically, I think the steps for getting this ready for merge are:
>
> 1) Separate out the preliminary patches that pass a feature to the addr
> encoding.. without any of the other cohortfs patches that are currently on
> this branch. Once this builds we can merge it separately from the rest...
>
> 2) The entity_addrvec_t type.
>
> 3) The type -> transport_type switch.
There's a lot of cohortfs glop that still needs to go away in this branch.
I hadn't thought of structuring the patch that way. I can do it,
but the encoding feature will be tedious to split out.
There's a bunch of cascading stuff after entity_addrvec_t which
could go into a series of further patches, monmap, and so forth,
not all of which exist yet.
>
> 4) We should make the new entity_addr_t encoding encode sockaddr piece
> more compactly instead of eating up a full 80-byte sockaddr_storage even
> for the ~8-byte IPv4 sockaddr_in. Maybe just need to encode an
> explicit length for the sockaddr_* piece?
Yes, this could be made more memory efficient. It's mainly a matter
of making sure the constructor gets the proper amount of memory.
There's a C hack to do this, but I remember seeing a c++ technique
that was a bit less hackish.
A sockaddr_in is actually 16 bytes of which the last 8 are dummy
padding (sin_zero). (And there there are some systems that
actually check to be sure they're 0.)
BSD4.4 derived systems have a length coded into sockaddr too - but on
linux it's usually possible to get by without storing the length, and
I don't think ceph will need to do so.
...
-Marcus Watts
next prev parent reply other threads:[~2015-08-25 10:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-24 18:01 wip-addr Sage Weil
2015-08-25 10:06 ` Marcus Watts [this message]
2015-08-25 14:08 ` wip-addr Sage Weil
-- strict thread matches above, loose matches on Subject: below --
2015-10-02 23:24 wip-addr Marcus Watts
2015-10-09 21:49 ` wip-addr Sage Weil
[not found] ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
2015-10-10 3:20 ` wip-addr Haomai Wang
2015-10-10 12:07 ` wip-addr Sage Weil
2015-10-12 17:42 ` wip-addr David Zafman
2015-10-12 18:04 ` wip-addr Sage Weil
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150825100612.GA23852@pika.b.linuxbox.com \
--to=mwatts@redhat.com \
--cc=ceph-devel@vger.kernel.org \
--cc=sweil@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox