* OSD rebind connects to ports of other OSDs
@ 2016-12-20 10:21 Willem Jan Withagen
2016-12-20 15:00 ` Willem Jan Withagen
2016-12-20 15:06 ` Mykola Golub
0 siblings, 2 replies; 11+ messages in thread
From: Willem Jan Withagen @ 2016-12-20 10:21 UTC (permalink / raw)
To: Ceph Development
Hi,
I've been banging my head against the wall for some time now.
But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
When rebinding, it binds to the ports of OSD.1 because those ports are
the first ones not in the avoid_list. That should be refused, since these
sockets belong to a different process.
UNLESS SO_REUSEPORT is set:
SO_REUSEPORT allows completely duplicate bindings by multiple processes
if they all set SO_REUSEPORT before binding the port. This option
permits multiple instances of a program to each receive UDP/IP
multicast or broadcast datagrams destined for the bound port.
That seems to be what happens here.
Output from sockstat in this state:
wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
Which clearly demonstrates the mess.
However, that option is not set anywhere in the Ceph code, nor is it a
setting that "just" gets set.
Any suggestions on where to look for this option getting set
incidentally or through a bug would be much appreciated.
Or a suggestion on how to easily debug this.
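One way to check from inside a process which reuse options are really set on a socket is to ask the kernel with getsockopt(). The helper below is only a sketch: the name socket_option_enabled is hypothetical and it is not part of Ceph.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical debugging helper (not part of Ceph): report whether a
// SOL_SOCKET boolean option such as SO_REUSEADDR or SO_REUSEPORT is
// enabled on fd. Returns 1 if enabled, 0 if not, or -errno on failure.
int socket_option_enabled(int fd, int option) {
  int value = 0;
  socklen_t len = sizeof(value);
  if (::getsockopt(fd, SOL_SOCKET, option, &value, &len) == -1) {
    return -errno;
  }
  return value != 0 ? 1 : 0;
}
```

Calling it with SO_REUSEADDR and SO_REUSEPORT right before bind() would show which of the two is in effect on that socket.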
Thanks,
--WjW
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 10:21 OSD rebind connects to ports of other OSDs Willem Jan Withagen
@ 2016-12-20 15:00 ` Willem Jan Withagen
2016-12-20 15:23 ` Sage Weil
2016-12-20 15:06 ` Mykola Golub
1 sibling, 1 reply; 11+ messages in thread
From: Willem Jan Withagen @ 2016-12-20 15:00 UTC (permalink / raw)
To: Ceph Development
On 20-12-2016 11:21, Willem Jan Withagen wrote:
> Hi,
>
> I've been banging my head against the wall for some time now.
> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>
> When rebinding it connects to the ports of OSD.1 because those ports are
> the first not in the avoid_list. That should be refused since these
> sockets belong to a different process.
> UNLESS SO_REUSEPORT is set:
> SO_REUSEPORT allows completely duplicate bindings by multiple processes
> if they all set SO_REUSEPORT before binding the port. This option
> permits multiple instances of a program to each receive UDP/IP
> multicast or broadcast datagrams destined for the bound port.
>
> Which seems that that happens.
> Output from sockstat in this state:
> wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
> wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
> wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
> wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
> wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
> wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
> wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
> wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
>
> Which clearly demonstrates the mess.
> How ever that option is nowhere set in the ceph-code, neither is it a
> setting that "just" gets set.
>
> Any suggestions where to look for this option to get set in an
> incidental/bug way would be much appreciated.
> Or a suggestion on how to easily debug this.
Right,
Compatibility in this area is rather thin. :)
For the question skip to the end.
So I'm going to need some functional description, to see if I can get it
right.
The OSD starts and builds a few messengers with SO_REUSEADDR set on their
sockets. On Linux, ports that are already in use are reported as in use.
The same happens on FreeBSD during startup: ports are iterated through
and sequential ports are selected.
So that is how it should be.
Now when the osd has gone down and comes up, it reports:
log_channel(cluster) log [WRN] : map e18 wrongly marked me down
on ./src/osd/OSD.cc:7120
Then it starts rebinding on its messenger connections:
int r = cluster_messenger->rebind(avoid_ports)
on ./src/osd/OSD.cc:7192.
It calls shutdown_connections() to shut down all of its connections.
Somewhere down the line SO_REUSEADDR is set again on the socket and the
socket is bound.
- Linux grabs the next available ports at the end, because its own
channels are to be avoided and the rest are taken.
- On FreeBSD the first available port is taken. If that is 6800,
then that is taken, even if the socket is owned by a different
process, which (per the man page) would require SO_REUSEPORT.
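The difference between the two bullets can be modelled without any sockets. The sketch below is not Ceph's rebind code: pick_port and the try_bind callback are hypothetical names, and the callback stands in for whatever bind() decides on a given platform.

```cpp
#include <cassert>
#include <functional>
#include <set>

// Hypothetical model of the port scan (not Ceph's implementation): walk
// [first, last] and return the first port that is not in avoid_ports and
// that try_bind accepts. Returns -1 when the range is exhausted.
int pick_port(int first, int last, const std::set<int>& avoid_ports,
              const std::function<bool(int)>& try_bind) {
  for (int port = first; port <= last; ++port) {
    if (avoid_ports.count(port)) {
      continue;  // a port this daemon used before the rebind
    }
    if (try_bind(port)) {
      return port;  // the platform let us bind here
    }
  }
  return -1;
}
```

With avoid_ports = {6800..6803} and another daemon holding 6804..6807, a try_bind that refuses ports in use yields 6808, while a try_bind that accepts every port yields 6804, which is the overlap visible in the sockstat output.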
If I disable SO_REUSEADDR in NetHandler::create_socket()
====
/* Make sure connection-intensive things like the benchmark
 * will be able to close/open sockets a zillion of times */
if (reuse_addr) {
  if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
    lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
               << strerror(errno) << dendl;
    close(s);
    return -errno;
  }
}
====
Then things start to work "as expected" and a port is refused when it
already has a listener bound to it.
Doing this has the disadvantage that it is not possible to immediately
kill and restart the OSD, because the ports are not yet released in the
netstat table. But that is a manageable issue, and that time can be
shortened by setting a sysctl.
So the questions are:
- How much rebinding is required?
- And why do we set SO_REUSEADDR if we are going to add the ports to
avoid_ports, so that a completely new port is required anyway?
--WjW
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 10:21 OSD rebind connects to ports of other OSDs Willem Jan Withagen
2016-12-20 15:00 ` Willem Jan Withagen
@ 2016-12-20 15:06 ` Mykola Golub
2016-12-20 16:48 ` Willem Jan Withagen
1 sibling, 1 reply; 11+ messages in thread
From: Mykola Golub @ 2016-12-20 15:06 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
This is due to the SO_REUSEADDR (not SO_REUSEPORT) socket option being
set. You should have mentioned that you were talking about FreeBSD.
Note, although osd-0 and osd-1 processes are bound to the same port,
they have different addresses: wildcard (*) for osd-1, and 127.0.0.1
for rebound osd-0. On FreeBSD if SO_REUSEADDR is set, it fails to bind
only when both address and port are the same, and wildcard is
considered a different address here. On Linux, bind fails in such a
case.
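A small standalone program can probe this on a given platform. The sketch below is an assumption-laden test harness, not Ceph code: bound_socket and specific_over_wildcard are hypothetical names, the port is chosen by the kernel, and the outcome with SO_REUSEADDR set depends on the operating system, so no result is claimed here.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Illustrative helper (not part of Ceph): open a TCP socket, optionally set
// SO_REUSEADDR, and bind it to addr:port (both in network byte order).
// Returns the fd on success or -errno on failure.
static int bound_socket(in_addr_t addr, in_port_t port, bool reuse_addr) {
  int s = ::socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return -errno;
  int on = 1;
  if (reuse_addr &&
      ::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
    int err = errno;  // save before close() can modify it
    ::close(s);
    return -err;
  }
  sockaddr_in sa{};
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = addr;
  sa.sin_port = port;
  if (::bind(s, reinterpret_cast<sockaddr*>(&sa), sizeof(sa)) == -1) {
    int err = errno;
    ::close(s);
    return -err;
  }
  return s;
}

// Listen on *:P (P picked by the kernel), then try to bind 127.0.0.1:P.
// Returns 0 if the second bind succeeded, otherwise a positive errno.
int specific_over_wildcard(bool reuse_addr) {
  int a = bound_socket(htonl(INADDR_ANY), 0, reuse_addr);
  if (a < 0) return -a;
  sockaddr_in sa{};
  socklen_t len = sizeof(sa);
  if (::getsockname(a, reinterpret_cast<sockaddr*>(&sa), &len) == -1 ||
      ::listen(a, 1) == -1) {
    int err = errno;
    ::close(a);
    return err;
  }
  int b = bound_socket(htonl(INADDR_LOOPBACK), sa.sin_port, reuse_addr);
  ::close(a);
  if (b < 0) return -b;
  ::close(b);
  return 0;
}
```

Comparing specific_over_wildcard(true) on Linux and on FreeBSD should show whether the two systems really treat the wildcard and 127.0.0.1 differently for a listening socket.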
See, for example, this for more details:
http://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t
The question is though why it rebinds to 127.0.0.1, and not to '*'? I
suppose this is wrong. How does it behave on Linux?
On Tue, Dec 20, 2016 at 11:21:19AM +0100, Willem Jan Withagen wrote:
> Hi,
>
> I've been banging my head against the wall for some time now.
> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>
> When rebinding it connects to the ports of OSD.1 because those ports are
> the first not in the avoid_list. That should be refused since these
> sockets belong to a different process.
> UNLESS SO_REUSEPORT is set:
> SO_REUSEPORT allows completely duplicate bindings by multiple processes
> if they all set SO_REUSEPORT before binding the port. This option
> permits multiple instances of a program to each receive UDP/IP
> multicast or broadcast datagrams destined for the bound port.
>
> Which seems that that happens.
> Output from sockstat in this state:
> wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
> wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
> wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
> wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
> wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
> wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
> wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
> wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
>
> Which clearly demonstrates the mess.
> How ever that option is nowhere set in the ceph-code, neither is it a
> setting that "just" gets set.
>
> Any suggestions where to look for this option to get set in an
> incidental/bug way would be much appreciated.
> Or a suggestion on how to easily debug this.
>
> Thanx,
> --WjW
--
Mykola Golub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 15:00 ` Willem Jan Withagen
@ 2016-12-20 15:23 ` Sage Weil
2016-12-20 15:50 ` Willem Jan Withagen
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Sage Weil @ 2016-12-20 15:23 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
> On 20-12-2016 11:21, Willem Jan Withagen wrote:
> > Hi,
> >
> > I've been banging my head against the wall for some time now.
> > But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
> >
> > When rebinding it connects to the ports of OSD.1 because those ports are
> > the first not in the avoid_list. That should be refused since these
> > sockets belong to a different process.
> > UNLESS SO_REUSEPORT is set:
> > SO_REUSEPORT allows completely duplicate bindings by multiple processes
> > if they all set SO_REUSEPORT before binding the port. This option
> > permits multiple instances of a program to each receive UDP/IP
> > multicast or broadcast datagrams destined for the bound port.
> >
> > Which seems that that happens.
> > Output from sockstat in this state:
> > wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
> > wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
> > wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
> > wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
> > wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
> > wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
> > wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
> > wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
> >
> > Which clearly demonstrates the mess.
> > How ever that option is nowhere set in the ceph-code, neither is it a
> > setting that "just" gets set.
> >
> > Any suggestions where to look for this option to get set in an
> > incidental/bug way would be much appreciated.
> > Or a suggestion on how to easily debug this.
>
> Right,
>
> Compatibility in this area is rather thin. :)
>
> For the question skip to the end.
>
> So I'm going to need some functional description, to see if I can get it
> right.
>
> Osd starts and build a few messengers with SO_REUSEADDR on the socket.
> On Linux used ports are being reported in use.
> As on FreeBSD during startup. Ports are nicely iterated thru
> and sequential ports are selected.
> So that is how it should be.
>
> Now when the osd has gone down and comes up, it reports:
> log_channel(cluster) log [WRN] : map e18 wrongly marked me down
> on ./src/osd/OSD.cc:7120
>
> Then it starts rebinding on its messenger connections:
> int r = cluster_messenger->rebind(avoid_ports)
> on ./src/osd/OSD.cc:7192.
> It calls shutdown_connections() to shutdown all of its connections.
>
> Somewhere down the line is SO_REUSEADDR set again on the socket and the
> socket is bound.
> - Linux grabs the next available ports at the end, because its own
> channels are to be avoided and the rest is taken.
>
> - On FreeBSD the first port available is taken. If that is 6800,
> than that is taken. Even if the socket is owned by a different
> process. Which (per man-page) would require SO_REUSEPORT.
>
> If I disable SO_REUSEADDR in NetHandler::create_socket()
> ====
> /* Make sure connection-intensive things like the benchmark
> * will be able to close/open sockets a zillion of times */
> if (reuse_addr) {
> if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on,sizeof(on))==-1){
> lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
> << strerror(errno) << dendl;
> close(s);
> return -errno;
> }
> }
> ====
> Then things start to work "as expected" and ports are refused when it
> has a listener connected.
>
> Doing this has the disadvantage that it is not possible to immediately
> kill and restart the OSD because the ports are not yet release in the
> netstat table.... But that is an overseeable issue, and that time can be
> shorted by setting a sysctl.
>
> So the question is:
> - how much rebinding is required.....
I think it's just for tests. My recollection is that we did this just
because we can run out of ports since we can't reuse one until the tcp
finwait2 (or whatever) timeout expires.
> - And why do we set SO_REUSEADDR if we are going to add the ports to
> avoid_ports. And thus a complete new port is required.
I suspect it's safe to drop the option if the Linux vs FreeBSD semantics
are in fact different.
s
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 15:23 ` Sage Weil
@ 2016-12-20 15:50 ` Willem Jan Withagen
2016-12-20 17:31 ` Mykola Golub
2016-12-20 18:39 ` Willem Jan Withagen
2 siblings, 0 replies; 11+ messages in thread
From: Willem Jan Withagen @ 2016-12-20 15:50 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
On 20-12-2016 16:23, Sage Weil wrote:
> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>> On 20-12-2016 11:21, Willem Jan Withagen wrote:
>>> Hi,
>>>
>>> I've been banging my head against the wall for some time now.
>>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>>>
>>> When rebinding it connects to the ports of OSD.1 because those ports are
>>> the first not in the avoid_list. That should be refused since these
>>> sockets belong to a different process.
>>> UNLESS SO_REUSEPORT is set:
>>> SO_REUSEPORT allows completely duplicate bindings by multiple processes
>>> if they all set SO_REUSEPORT before binding the port. This option
>>> permits multiple instances of a program to each receive UDP/IP
>>> multicast or broadcast datagrams destined for the bound port.
>>>
>>> Which seems that that happens.
>>> Output from sockstat in this state:
>>> wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
>>> wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
>>> wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
>>> wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
>>> wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
>>> wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
>>> wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
>>> wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
>>>
>>> Which clearly demonstrates the mess.
>>> How ever that option is nowhere set in the ceph-code, neither is it a
>>> setting that "just" gets set.
>>>
>>> Any suggestions where to look for this option to get set in an
>>> incidental/bug way would be much appreciated.
>>> Or a suggestion on how to easily debug this.
>>
>> Right,
>>
>> Compatibility in this area is rather thin. :)
>>
>> For the question skip to the end.
>>
>> So I'm going to need some functional description, to see if I can get it
>> right.
>>
>> Osd starts and build a few messengers with SO_REUSEADDR on the socket.
>> On Linux used ports are being reported in use.
>> As on FreeBSD during startup. Ports are nicely iterated thru
>> and sequential ports are selected.
>> So that is how it should be.
>>
>> Now when the osd has gone down and comes up, it reports:
>> log_channel(cluster) log [WRN] : map e18 wrongly marked me down
>> on ./src/osd/OSD.cc:7120
>>
>> Then it starts rebinding on its messenger connections:
>> int r = cluster_messenger->rebind(avoid_ports)
>> on ./src/osd/OSD.cc:7192.
>> It calls shutdown_connections() to shutdown all of its connections.
>>
>> Somewhere down the line is SO_REUSEADDR set again on the socket and the
>> socket is bound.
>> - Linux grabs the next available ports at the end, because its own
>> channels are to be avoided and the rest is taken.
>>
>> - On FreeBSD the first port available is taken. If that is 6800,
>> than that is taken. Even if the socket is owned by a different
>> process. Which (per man-page) would require SO_REUSEPORT.
>>
>> If I disable SO_REUSEADDR in NetHandler::create_socket()
>> ====
>> /* Make sure connection-intensive things like the benchmark
>> * will be able to close/open sockets a zillion of times */
>> if (reuse_addr) {
>> if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on,sizeof(on))==-1){
>> lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
>> << strerror(errno) << dendl;
>> close(s);
>> return -errno;
>> }
>> }
>> ====
>> Then things start to work "as expected" and ports are refused when it
>> has a listener connected.
>>
>> Doing this has the disadvantage that it is not possible to immediately
>> kill and restart the OSD because the ports are not yet release in the
>> netstat table.... But that is an overseeable issue, and that time can be
>> shorted by setting a sysctl.
>>
>> So the question is:
>> - how much rebinding is required.....
>
> I think it's just for tests. My recollection is that we did this just
> because we can run out of ports since we can't reuse one until the tcp
> finwait2 (or whatever) timeout expires.
>
>> - And why do we set SO_REUSEADDR if we are going to add the ports to
>> avoid_ports. And thus a complete new port is required.
>
> I suspect it's safe to drop the option if the Linux vs FreeBSD semantics
> are in fact different.
That would be great, since it'll allow me to read up on this over
Christmas. And I'll submit a PR that just excludes the code for now.
That way the FreeBSD Jenkins will correctly start building master again.
(With the patches I have outstanding, which are applied separately.)
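The PR itself is not shown in this thread. Purely as an illustration, one way to make the option skippable is to put it behind a flag that the caller controls. The function name configure_reuse_addr is hypothetical, and the sketch also returns the saved errno directly instead of reading it after close(), which is allowed to change it.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical variant of the snippet quoted earlier in the thread (not the
// actual PR): set SO_REUSEADDR only when reuse_addr is true.
// Returns 0 on success or when the option is skipped, -errno on failure.
// The caller stays responsible for closing fd.
int configure_reuse_addr(int fd, bool reuse_addr) {
  if (!reuse_addr) {
    return 0;  // option excluded, socket left untouched
  }
  int on = 1;
  if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
    return -errno;
  }
  return 0;
}
```

A platform that cannot tolerate the option would pass false here, while the default path stays as it is.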
--WjW
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 15:06 ` Mykola Golub
@ 2016-12-20 16:48 ` Willem Jan Withagen
2016-12-20 17:23 ` Mykola Golub
0 siblings, 1 reply; 11+ messages in thread
From: Willem Jan Withagen @ 2016-12-20 16:48 UTC (permalink / raw)
To: Mykola Golub; +Cc: Ceph Development
On 20-12-2016 16:06, Mykola Golub wrote:
> This is due to SO_REUSEADDR (not SO_REUSEPORT) socket option set. You
> should have mentioned that you were talking about FreeBSD.
Hi Mykola,
Sorry, I normally do, since I know there are subtle differences.
> Note, although osd-0 and osd-1 processes are bound to the same port,
> they have different addresses: wildcard (*) for osd-1, and 127.0.0.1
> for rebound osd-0. On FreeBSD if SO_REUSEADDR is set, it fails to bind
> only when both address and port are the same, and wildcard is
> considered as a different address here. On Linux bind fails in such
> case.
I suspected something like this, but when I try to simulate this in a
program, I still get 'address in use' when I first bind to *:port and
then to 127.0.0.1:port.
So that makes it rather vague.
> See, for example this for more details:
>
> http://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t
Interesting article. I archived it with the other material I already have
from the FreeBSD lists, where there is also a lot of misunderstanding on
this topic.
> The question is though why it rebinds to 127.0.0.1, and not to '*'? I
> suppose this is wrong. How does it behave on Linux?
Similar:
First round of binds:
2016-12-12 23:57:37.615409 7f36c2be6940 1 -- 0.0.0.0:6800/13799 _finish_bind bind my_inst.addr is 0.0.0.0:6800/13799
2016-12-12 23:57:37.615739 7f36c2be6940 1 -- 0.0.0.0:6801/13799 _finish_bind bind my_inst.addr is 0.0.0.0:6801/13799
2016-12-12 23:57:37.616090 7f36c2be6940 1 -- 0.0.0.0:6802/13799 _finish_bind bind my_inst.addr is 0.0.0.0:6802/13799
2016-12-12 23:57:37.616452 7f36c2be6940 1 -- 0.0.0.0:6803/13799 _finish_bind bind my_inst.addr is 0.0.0.0:6803/13799
So that is to INADDR_ANY
rebinds:
2016-12-12 23:57:50.094446 7f36b5ac6700 1 -- 127.0.0.1:6812/1013799 _finish_bind bind my_inst.addr is 127.0.0.1:6812/1013799
2016-12-12 23:57:50.094956 7f36b5ac6700 1 -- 127.0.0.1:6813/1013799 _finish_bind bind my_inst.addr is 127.0.0.1:6813/1013799
2016-12-12 23:57:50.095477 7f36b5ac6700 1 -- 127.0.0.1:6814/1013799 _finish_bind bind my_inst.addr is 127.0.0.1:6814/1013799
So that is on the hostname as specified in the config.
So your suggestion would be to not bind on INADDR_ANY but on the config
hostname with the initial bind as well?
Also, I got an email from Sage stating that SO_REUSEADDR not working is not
too bad, since it is mainly there to prevent running out of ports when they
are cycled through at high speed.
--WjW
>
> On Tue, Dec 20, 2016 at 11:21:19AM +0100, Willem Jan Withagen wrote:
>> Hi,
>>
>> I've been banging my head against the wall for some time now.
>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>>
>> When rebinding it connects to the ports of OSD.1 because those ports are
>> the first not in the avoid_list. That should be refused since these
>> sockets belong to a different process.
>> UNLESS SO_REUSEPORT is set:
>> SO_REUSEPORT allows completely duplicate bindings by multiple processes
>> if they all set SO_REUSEPORT before binding the port. This option
>> permits multiple instances of a program to each receive UDP/IP
>> multicast or broadcast datagrams destined for the bound port.
>>
>> Which seems that that happens.
>> Output from sockstat in this state:
>> wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
>> wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
>> wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
>> wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
>> wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
>> wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
>> wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
>> wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
>>
>> Which clearly demonstrates the mess.
>> How ever that option is nowhere set in the ceph-code, neither is it a
>> setting that "just" gets set.
>>
>> Any suggestions where to look for this option to get set in an
>> incidental/bug way would be much appreciated.
>> Or a suggestion on how to easily debug this.
>>
>> Thanx,
>> --WjW
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 16:48 ` Willem Jan Withagen
@ 2016-12-20 17:23 ` Mykola Golub
0 siblings, 0 replies; 11+ messages in thread
From: Mykola Golub @ 2016-12-20 17:23 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
On Tue, Dec 20, 2016 at 05:48:33PM +0100, Willem Jan Withagen wrote:
> First round of binds:
> 2016-12-12 23:57:37.615409 7f36c2be6940 1 -- 0.0.0.0:6800/13799
> _finish_bind bind my_inst.addr is 0.0.0.0:6800/13799
> 2016-12-12 23:57:37.615739 7f36c2be6940 1 -- 0.0.0.0:6801/13799
> _finish_bind bind my_inst.addr is 0.0.0.0:6801/13799
> 2016-12-12 23:57:37.616090 7f36c2be6940 1 -- 0.0.0.0:6802/13799
> _finish_bind bind my_inst.addr is 0.0.0.0:6802/13799
> 2016-12-12 23:57:37.616452 7f36c2be6940 1 -- 0.0.0.0:6803/13799
> _finish_bind bind my_inst.addr is 0.0.0.0:6803/13799
>
> So that is to INADDR_ANY
>
> rebinds:
> 2016-12-12 23:57:50.094446 7f36b5ac6700 1 -- 127.0.0.1:6812/1013799
> _finish_bind bind my_inst.addr is 127.0.0.1:6812/1013799
> 2016-12-12 23:57:50.094956 7f36b5ac6700 1 -- 127.0.0.1:6813/1013799
> _finish_bind bind my_inst.addr is 127.0.0.1:6813/1013799
> 2016-12-12 23:57:50.095477 7f36b5ac6700 1 -- 127.0.0.1:6814/1013799
> _finish_bind bind my_inst.addr is 127.0.0.1:6814/1013799
>
> so that is on the hostname as specified in the config.
>
> So your suggestion would be to not bind on INADDR_ANY but on the config
> hostname with the initial bind as well??
I am not familiar with rebind logic here, so it is difficult for me to
make suggestions. But I would expect that bind and rebind addresses
should be the same. Not sure it should be hostname though. In your
case for example, before rebind the osd was listening on all
interfaces, and after rebind it was listening only on loopback,
i.e. it was accessible only from the local host.
--
Mykola Golub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 15:23 ` Sage Weil
2016-12-20 15:50 ` Willem Jan Withagen
@ 2016-12-20 17:31 ` Mykola Golub
2016-12-20 18:39 ` Willem Jan Withagen
2 siblings, 0 replies; 11+ messages in thread
From: Mykola Golub @ 2016-12-20 17:31 UTC (permalink / raw)
To: Sage Weil; +Cc: Willem Jan Withagen, Ceph Development
On Tue, Dec 20, 2016 at 03:23:59PM +0000, Sage Weil wrote:
> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
> > So the question is:
> > - how much rebinding is required.....
>
> I think it's just for tests. My recollection is that we did this just
> because we can run out of ports since we can't reuse one until the tcp
> finwait2 (or whatever) timeout expires.
>
> > - And why do we set SO_REUSEADDR if we are going to add the ports to
> > avoid_ports. And thus a complete new port is required.
>
> I suspect it's safe to drop the option if the Linux vs FreeBSD semantics
> are in fact different.
SO_REUSEADDR is good to have, so that when restarting after a crash bind
does not fail with EADDRINUSE. Thus I would leave it.
For the rebind issue, I would first investigate whether it is expected
that it binds to all interfaces but rebinds to only one address. If
this is wrong, fixing it might also solve the issue on FreeBSD.
--
Mykola Golub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 15:23 ` Sage Weil
2016-12-20 15:50 ` Willem Jan Withagen
2016-12-20 17:31 ` Mykola Golub
@ 2016-12-20 18:39 ` Willem Jan Withagen
2016-12-20 18:43 ` Sage Weil
2 siblings, 1 reply; 11+ messages in thread
From: Willem Jan Withagen @ 2016-12-20 18:39 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
On 20-12-2016 16:23, Sage Weil wrote:
> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>> On 20-12-2016 11:21, Willem Jan Withagen wrote:
>>> Hi,
>>>
>>> I've been banging my head against the wall for some time now.
>>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>>>
>>> When rebinding it connects to the ports of OSD.1 because those ports are
>>> the first not in the avoid_list. That should be refused since these
>>> sockets belong to a different process.
>>> UNLESS SO_REUSEPORT is set:
>>> SO_REUSEPORT allows completely duplicate bindings by multiple processes
>>> if they all set SO_REUSEPORT before binding the port. This option
>>> permits multiple instances of a program to each receive UDP/IP
>>> multicast or broadcast datagrams destined for the bound port.
>>>
>>> Which seems that that happens.
>>> Output from sockstat in this state:
>>> wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
>>> wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
>>> wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
>>> wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
>>> wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
>>> wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
>>> wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
>>> wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
>>>
>>> Which clearly demonstrates the mess.
>>> How ever that option is nowhere set in the ceph-code, neither is it a
>>> setting that "just" gets set.
>>>
>>> Any suggestions where to look for this option to get set in an
>>> incidental/bug way would be much appreciated.
>>> Or a suggestion on how to easily debug this.
>>
>> Right,
>>
>> Compatibility in this area is rather thin. :)
>>
>> For the question skip to the end.
>>
>> So I'm going to need some functional description, to see if I can get it
>> right.
>>
>> Osd starts and build a few messengers with SO_REUSEADDR on the socket.
>> On Linux used ports are being reported in use.
>> As on FreeBSD during startup. Ports are nicely iterated thru
>> and sequential ports are selected.
>> So that is how it should be.
>>
>> Now when the osd has gone down and comes up, it reports:
>> log_channel(cluster) log [WRN] : map e18 wrongly marked me down
>> on ./src/osd/OSD.cc:7120
>>
>> Then it starts rebinding on its messenger connections:
>> int r = cluster_messenger->rebind(avoid_ports)
>> on ./src/osd/OSD.cc:7192.
>> It calls shutdown_connections() to shutdown all of its connections.
>>
>> Somewhere down the line is SO_REUSEADDR set again on the socket and the
>> socket is bound.
>> - Linux grabs the next available ports at the end, because its own
>> channels are to be avoided and the rest is taken.
>>
>> - On FreeBSD the first port available is taken. If that is 6800,
>> than that is taken. Even if the socket is owned by a different
>> process. Which (per man-page) would require SO_REUSEPORT.
>>
>> If I disable SO_REUSEADDR in NetHandler::create_socket()
>> ====
>> /* Make sure connection-intensive things like the benchmark
>> * will be able to close/open sockets a zillion of times */
>> if (reuse_addr) {
>> if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on,sizeof(on))==-1){
>> lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
>> << strerror(errno) << dendl;
>> close(s);
>> return -errno;
>> }
>> }
>> ====
>> Then things start to work "as expected" and ports are refused when it
>> has a listener connected.
>>
>> Doing this has the disadvantage that it is not possible to immediately
>> kill and restart the OSD because the ports are not yet release in the
>> netstat table.... But that is an overseeable issue, and that time can be
>> shorted by setting a sysctl.
>>
>> So the question is:
>> - how much rebinding is required.....
>
> I think it's just for tests. My recollection is that we did this just
> because we can run out of ports since we can't reuse one until the tcp
> finwait2 (or whatever) timeout expires.
>
>> - And why do we set SO_REUSEADDR if we are going to add the ports to
>> avoid_ports. And thus a complete new port is required.
>
> I suspect it's safe to drop the option if the Linux vs FreeBSD semantics
> are in fact different.
When I exclude the SO_REUSEADDR my Jenkins goes back to normal.
Will submit a PR.
--WjW
-------- Forwarded Message --------
Subject: Jenkins build is back to normal : ceph-master #107
Date: Tue, 20 Dec 2016 18:54:51 +0100 (CET)
From: jenkins@digiware.nl
Reply-To: jenkins@digiware.nl
To: wjw@digiware.nl
http://cephdev.digiware.nl:8180/jenkins/job/ceph-master/107/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 18:39 ` Willem Jan Withagen
@ 2016-12-20 18:43 ` Sage Weil
2016-12-20 18:48 ` Willem Jan Withagen
0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-12-20 18:43 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
> On 20-12-2016 16:23, Sage Weil wrote:
> > On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
> >> On 20-12-2016 11:21, Willem Jan Withagen wrote:
> >>> Hi,
> >>>
> >>> I've been banging my head against the wall for some time now.
> >>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
> >>>
> >>> When rebinding it connects to the ports of OSD.1 because those ports are
> >>> the first not in the avoid_list. That should be refused since these
> >>> sockets belong to a different process.
> >>> UNLESS SO_REUSEPORT is set:
> >>> SO_REUSEPORT allows completely duplicate bindings by multiple processes
> >>> if they all set SO_REUSEPORT before binding the port. This option
> >>> permits multiple instances of a program to each receive UDP/IP
> >>> multicast or broadcast datagrams destined for the bound port.
> >>>
> >>> Which seems that that happens.
> >>> Output from sockstat in this state:
> >>> wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
> >>> wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
> >>> wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
> >>> wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
> >>> wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
> >>> wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
> >>> wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
> >>> wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
> >>>
> >>> Which clearly demonstrates the mess.
> >>> However, that option is not set anywhere in the Ceph code, nor is it a
> >>> setting that "just" gets set.
> >>>
> >>> Any suggestions where to look for this option to get set in an
> >>> incidental/bug way would be much appreciated.
> >>> Or a suggestion on how to easily debug this.
> >>
> >> Right,
> >>
> >> Compatibility in this area is rather thin. :)
> >>
> >> For the question skip to the end.
> >>
> >> So I'm going to need some functional description, to see if I can get it
> >> right.
> >>
> >> The OSD starts and builds a few messengers with SO_REUSEADDR on the socket.
> >> On Linux, used ports are reported as in use, as they are on FreeBSD
> >> during startup. Ports are iterated through and sequential ports are
> >> selected.
> >> So that is how it should be.
> >>
> >> Now when the osd has gone down and comes up, it reports:
> >> log_channel(cluster) log [WRN] : map e18 wrongly marked me down
> >> on ./src/osd/OSD.cc:7120
> >>
> >> Then it starts rebinding on its messenger connections:
> >> int r = cluster_messenger->rebind(avoid_ports)
> >> on ./src/osd/OSD.cc:7192.
> >> It calls shutdown_connections() to shutdown all of its connections.
> >>
> >> Somewhere down the line SO_REUSEADDR is set again on the socket and the
> >> socket is bound.
> >> - Linux grabs the next available ports at the end, because its own
> >> channels are to be avoided and the rest is taken.
> >>
> >> - On FreeBSD the first port available is taken. If that is 6800,
> >> then that is taken, even if the socket is owned by a different
> >> process, which (per the man page) would require SO_REUSEPORT.
> >>
> >> If I disable SO_REUSEADDR in NetHandler::create_socket()
> >> ====
> >> /* Make sure connection-intensive things like the benchmark
> >> * will be able to close/open sockets a zillion of times */
> >> if (reuse_addr) {
> >> if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on,sizeof(on))==-1){
> >> lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
> >> << strerror(errno) << dendl;
> >> close(s);
> >> return -errno;
> >> }
> >> }
> >> ====
> >> Then things start to work "as expected" and a port is refused when it
> >> already has a listener bound to it.
> >>
> >> Doing this has the disadvantage that it is not possible to immediately
> >> kill and restart the OSD because the ports are not yet released in the
> >> netstat table. But that is a manageable issue, and that time can be
> >> shortened by setting a sysctl.
> >>
> >> So the question is:
> >> - how much rebinding is required.....
> >
> > I think it's just for tests. My recollection is that we did this just
> > because we can run out of ports since we can't reuse one until the tcp
> > finwait2 (or whatever) timeout expires.
> >
> >> - And why do we set SO_REUSEADDR if we are going to add the ports to
> >> avoid_ports, so that a completely new port is required anyway?
> >
> > I suspect it's safe to drop the option if the Linux vs FreeBSD semantics
> > are in fact different.
>
> When I exclude the SO_REUSEADDR my Jenkins goes back to normal.
> Will submit a PR.
Please #ifdef it so it's only excluded for FreeBSD.
Thanks!
sage
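
A guard of the kind requested here might look roughly like the sketch below. It is only an illustration: `create_tcp_socket` and `reuse_addr_enabled` are hypothetical names, and this is neither the real `NetHandler::create_socket()` nor the change that was eventually submitted. It assumes the compiler predefines `__FreeBSD__` on FreeBSD, and it saves `errno` before `close()`, which the quoted snippet does not do.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper (not Ceph code): creates a TCP socket and, on every
// platform except FreeBSD, sets SO_REUSEADDR when reuse_addr is true.
// Returns the socket fd on success or -errno on failure.
int create_tcp_socket(bool reuse_addr) {
  int s = ::socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0)
    return -errno;
#if !defined(__FreeBSD__)
  if (reuse_addr) {
    int on = 1;
    if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
      int err = errno;  // save errno first: close() may overwrite it
      ::close(s);
      return -err;
    }
  }
#else
  (void)reuse_addr;  // option deliberately skipped on FreeBSD
#endif
  return s;
}

// Hypothetical helper: reads SO_REUSEADDR back from the socket.
// Returns 1 if set, 0 if not set, -errno on error.
int reuse_addr_enabled(int s) {
  int val = 0;
  socklen_t len = sizeof(val);
  if (::getsockopt(s, SOL_SOCKET, SO_REUSEADDR, &val, &len) == -1)
    return -errno;
  return val != 0 ? 1 : 0;
}
```

Reading the option back with `getsockopt()` is one way to confirm on each platform whether the guard took effect.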
* Re: OSD rebind connects to ports of other OSDs
2016-12-20 18:43 ` Sage Weil
@ 2016-12-20 18:48 ` Willem Jan Withagen
0 siblings, 0 replies; 11+ messages in thread
From: Willem Jan Withagen @ 2016-12-20 18:48 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph Development
On 20-12-2016 19:43, Sage Weil wrote:
> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>> On 20-12-2016 16:23, Sage Weil wrote:
>>> On Tue, 20 Dec 2016, Willem Jan Withagen wrote:
>>>> On 20-12-2016 11:21, Willem Jan Withagen wrote:
>>>>> Hi,
>>>>>
>>>>> I've been banging my head against the wall for some time now.
>>>>> But rebinding OSD.0 (in cephtool-test-mon.sh) does not quite work.
>>>>>
>>>>> When rebinding it connects to the ports of OSD.1 because those ports are
>>>>> the first not in the avoid_list. That should be refused since these
>>>>> sockets belong to a different process.
>>>>> UNLESS SO_REUSEPORT is set:
>>>>> SO_REUSEPORT allows completely duplicate bindings by multiple processes
>>>>> if they all set SO_REUSEPORT before binding the port. This option
>>>>> permits multiple instances of a program to each receive UDP/IP
>>>>> multicast or broadcast datagrams destined for the bound port.
>>>>>
>>>>> Which seems to be what happens.
>>>>> Output from sockstat in this state:
>>>>> wjw ceph-osd-0 43305 14 tcp4 *:6800 *:*
>>>>> wjw ceph-osd-0 43305 15 tcp4 127.0.0.1:6804 *:*
>>>>> wjw ceph-osd-0 43305 16 tcp4 127.0.0.1:6805 *:*
>>>>> wjw ceph-osd-0 43305 45 tcp4 127.0.0.1:6806 *:*
>>>>> wjw ceph-osd-1 43318 14 tcp4 *:6804 *:*
>>>>> wjw ceph-osd-1 43318 15 tcp4 *:6805 *:*
>>>>> wjw ceph-osd-1 43318 16 tcp4 *:6806 *:*
>>>>> wjw ceph-osd-1 43318 17 tcp4 *:6807 *:*
>>>>>
>>>>> Which clearly demonstrates the mess.
>>>>> However, that option is not set anywhere in the Ceph code, nor is it a
>>>>> setting that "just" gets set.
>>>>>
>>>>> Any suggestions where to look for this option to get set in an
>>>>> incidental/bug way would be much appreciated.
>>>>> Or a suggestion on how to easily debug this.
>>>>
>>>> Right,
>>>>
>>>> Compatibility in this area is rather thin. :)
>>>>
>>>> For the question skip to the end.
>>>>
>>>> So I'm going to need some functional description, to see if I can get it
>>>> right.
>>>>
>>>> The OSD starts and builds a few messengers with SO_REUSEADDR on the socket.
>>>> On Linux, used ports are reported as in use, as they are on FreeBSD
>>>> during startup. Ports are iterated through and sequential ports are
>>>> selected.
>>>> So that is how it should be.
>>>>
>>>> Now when the osd has gone down and comes up, it reports:
>>>> log_channel(cluster) log [WRN] : map e18 wrongly marked me down
>>>> on ./src/osd/OSD.cc:7120
>>>>
>>>> Then it starts rebinding on its messenger connections:
>>>> int r = cluster_messenger->rebind(avoid_ports)
>>>> on ./src/osd/OSD.cc:7192.
>>>> It calls shutdown_connections() to shutdown all of its connections.
>>>>
>>>> Somewhere down the line SO_REUSEADDR is set again on the socket and the
>>>> socket is bound.
>>>> - Linux grabs the next available ports at the end, because its own
>>>> channels are to be avoided and the rest is taken.
>>>>
>>>> - On FreeBSD the first port available is taken. If that is 6800,
>>>> then that is taken, even if the socket is owned by a different
>>>> process, which (per the man page) would require SO_REUSEPORT.
>>>>
>>>> If I disable SO_REUSEADDR in NetHandler::create_socket()
>>>> ====
>>>> /* Make sure connection-intensive things like the benchmark
>>>> * will be able to close/open sockets a zillion of times */
>>>> if (reuse_addr) {
>>>> if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on,sizeof(on))==-1){
>>>> lderr(cct) << __func__ << " setsockopt SO_REUSEADDR failed: "
>>>> << strerror(errno) << dendl;
>>>> close(s);
>>>> return -errno;
>>>> }
>>>> }
>>>> ====
>>>> Then things start to work "as expected" and a port is refused when it
>>>> already has a listener bound to it.
>>>>
>>>> Doing this has the disadvantage that it is not possible to immediately
>>>> kill and restart the OSD because the ports are not yet released in the
>>>> netstat table. But that is a manageable issue, and that time can be
>>>> shortened by setting a sysctl.
>>>>
>>>> So the question is:
>>>> - how much rebinding is required.....
>>>
>>> I think it's just for tests. My recollection is that we did this just
>>> because we can run out of ports since we can't reuse one until the tcp
>>> finwait2 (or whatever) timeout expires.
>>>
>>>> - And why do we set SO_REUSEADDR if we are going to add the ports to
>>>> avoid_ports, so that a completely new port is required anyway?
>>>
>>> I suspect it's safe to drop the option if the Linux vs FreeBSD semantics
>>> are in fact different.
>>
>> When I exclude the SO_REUSEADDR my Jenkins goes back to normal.
>> Will submit a PR.
>
> Please #ifdef it so it's only excluded for FreeBSD.
Yup,
in #12593
--WjW
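
The platform difference discussed in this thread can be probed with a small standalone program. The sketch below is an assumption-laden illustration, not Ceph code: `reusable_tcp_socket` and `probe_double_bind` are hypothetical names. It mirrors the sockstat output quoted above, where one socket listens on `*:port` and another binds `127.0.0.1:port`, both with SO_REUSEADDR set. Linux documents that such a bind is refused while a listener holds the wildcard address; the sockstat output in this thread suggests FreeBSD accepts it.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical helper: TCP socket with SO_REUSEADDR set, or -1 on failure.
int reusable_tcp_socket() {
  int s = ::socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0)
    return -1;
  int on = 1;
  if (::setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
    ::close(s);
    return -1;
  }
  return s;
}

// Listens on *:<ephemeral port>, then tries to bind 127.0.0.1:<same port>
// from a second socket. Returns 0 if the second bind was accepted, the
// errno from bind() if it was refused, or -1 if the setup itself failed.
int probe_double_bind() {
  int listener = reusable_tcp_socket();
  if (listener < 0)
    return -1;

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = 0;  // let the kernel pick a free port
  socklen_t len = sizeof(addr);
  if (::bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1 ||
      ::listen(listener, 1) == -1 ||
      ::getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == -1) {
    ::close(listener);
    return -1;
  }

  int second = reusable_tcp_socket();
  if (second < 0) {
    ::close(listener);
    return -1;
  }
  // Same port as the listener, but a specific address instead of the wildcard.
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  int result = 0;
  if (::bind(second, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1)
    result = errno;  // capture before close() can change errno

  ::close(second);
  ::close(listener);
  return result;
}
```

Calling `probe_double_bind()` from a small `main()` and printing `strerror()` of a non-zero result on each platform would show whether the second bind is accepted or refused there.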
Thread overview: 11+ messages
2016-12-20 10:21 OSD rebind connects to ports of other OSDs Willem Jan Withagen
2016-12-20 15:00 ` Willem Jan Withagen
2016-12-20 15:23 ` Sage Weil
2016-12-20 15:50 ` Willem Jan Withagen
2016-12-20 17:31 ` Mykola Golub
2016-12-20 18:39 ` Willem Jan Withagen
2016-12-20 18:43 ` Sage Weil
2016-12-20 18:48 ` Willem Jan Withagen
2016-12-20 15:06 ` Mykola Golub
2016-12-20 16:48 ` Willem Jan Withagen
2016-12-20 17:23 ` Mykola Golub