All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcus Watts <mwatts@redhat.com>
To: Sage Weil <sweil@redhat.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: wip-addr
Date: Tue, 25 Aug 2015 06:06:12 -0400	[thread overview]
Message-ID: <20150825100612.GA23852@pika.b.linuxbox.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1508241046070.13116@cobra.newdream.net>

On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
...
> I looked over the wip-addr branch a bit.  I have two basic 
> questions/concerns:
> 
> 1) In commit
> 	https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a 
> we switch the type field from sa_family to transport_type.  This seems 
> like the way to go, but we need to deal with the fact that lots of 
> clusters out there are IPv6 and have AF_INET6 filled in here.  Probably 
> both types should be interpreted to mean "existing/legacy IP messenger" or 
> whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
> 
> I think encode needs to make sure it fills in that value for the type when 
> encoding the legacy entity_addr_t encoding, but could use a/the single 
> valid value for the new encoding.  And any get_transport_type() accessor 
> should also return the single valid value.

With cohortfs I had the luxury of ignoring existing installations.

I think I concluded at the time that probably most real systems
had '0' stored here.
Yes, if there are things that have "junk" here, they'll need to be handled
appropriately - I think it can be mostly fixed by just having decode map
transport_type "AF_INET6" (which is probably 10) to 0.  AF_INET is 2,
so maybe that too.  People switching *back* to older code after using
new code won't be so happy - would probably need a patch for older code
if that's a concern.

If there's code out there that thinks get_transport_type() returns an address
family, that will need fixing.  something like,
	get_address_family(){ return in_addr->ss_family; }
but I didn't find any code that actually needed this.

> 
> 2) In the later commit
> 	https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672 
> you introduce a grammar for printing/parsing the addrs.  This also makes 
> sense since e.g. xio uses an IP to identify an endpoint.  I think we 
> should identify these based on the *protocol* and not the implementation, 
> though... whether we use SimpleMessenger or AsyncMessenger is not a 
> property of the address.  Maybe "tcp://" makes more sense here?  Or 
> perhaps no prefix at all (a bare IP address), so that this looks the same 
> as it did before in the case where the default protocol(s) are in use.
> 
> I assume the xio protocol (whether it is rdma or tcp) is closely tied to 
> libxio itself.. is that right?  If so, using xio in that prefix makes 
> sense.  I'd include xio somewhere in the rdma prefix though (xrdma:// and 
> xtcp://)?

Code base I had didn't have async messenger as a transport type; if that
needs to be addressable independently of simple messenger it will need a
prefix.  I had a prefixless address decode as simple messenger - I
can make this more obvious than "unsigned type = 0;".

I had these prefixes,
	sm
	rdma
	xtcp
but I can easily pick others.  I figured rdma didn't need an x since
there didn't seem to be much chance anybody would want a non-xio version
of rdma.

> 
> What do you think?
> 
> Logistically, I think the steps for getting this ready for merge are:
> 
> 1) Separate out the preliminary patches that pass a feature to the addr 
> encoding.. without any of the other cohortfs patches that are currently on 
> this branch.  Once this builds we can merge it separately from the rest...
> 
> 2) The entity_addrvec_t type.
> 
> 3) The type -> transport_type switch.

There's a lot of cohortfs glop that still needs to go away in this branch.

I hadn't thought of structuring the patch that way.  I can do it,
but the encoding feature will be tedious to split out.
There's a bunch of cascading stuff after entity_addrvec_t which
could go into a series of further patches, monmap, and so forth,
not all of which exist yet.

> 
> 4) We should make the new entity_addr_t encoding encode sockaddr piece 
> more compactly instead of eating up a full 80-byte sockaddr_storage even 
> for the ~8-byte IPv4 sockaddr_in.  Maybe just need to encode an 
> explicit length for the sockaddr_* piece?

Yes, this could be made more memory efficient.  It's mainly a matter
of making sure the constructor gets the proper amount of memory.
There's a C hack to do this, but I remember seeing a c++ technique
that was a bit less hackish.

A sockaddr_in is actually 16 bytes of which the last 8 are dummy
padding (sin_zero).  (And there there are some systems that
actually check to be sure they're 0.)

BSD4.4 derived systems have a length coded into sockaddr too - but on
linux it's usually possible to get by without storing the length, and
I don't think ceph will need to do so.

...

					-Marcus Watts

  reply	other threads:[~2015-08-25 10:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-24 18:01 wip-addr Sage Weil
2015-08-25 10:06 ` Marcus Watts [this message]
2015-08-25 14:08   ` wip-addr Sage Weil
  -- strict thread matches above, loose matches on Subject: below --
2015-10-02 23:24 wip-addr Marcus Watts
2015-10-09 21:49 ` wip-addr Sage Weil
     [not found]   ` <CACJqLyZh4WQZv_3isjXJy=t6YL700C2GjWuen-QG1D5=RkKHYw@mail.gmail.com>
2015-10-10  3:20     ` wip-addr Haomai Wang
2015-10-10 12:07     ` wip-addr Sage Weil
2015-10-12 17:42   ` wip-addr David Zafman
2015-10-12 18:04     ` wip-addr Sage Weil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150825100612.GA23852@pika.b.linuxbox.com \
    --to=mwatts@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=sweil@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.