conntrackd won't start, "can't open multicast server!"

netfilter-devel.vger.kernel.org archive mirror

* conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-04  8:10 UTC
  To: netfilter-devel, Pablo Neira Ayuso

Hi Pablo,

I am currently working on the official conntrack-tools 0.9.5 Debian
package; before that, I maintained the old "conntrack" program.
The daemon will not start (with examples/stats/conntrackd.conf):

 host:~# /usr/sbin/conntrackd -C /etc/conntrackd.conf 
 Notice: StripNAT clause is obsolete. Please, remove it from conntrackd.conf
 ERROR: conntrackd cannot start, please check the logfile for more info
 host:~# tail /var/log/conntrackd.log 
 [Fri Jan  4 09:01:25 2008] (pid=9353) --- starting in console mode ---
 [Fri Jan  4 09:01:25 2008] (pid=9353) [FAIL] can't open multicast server!
 [Fri Jan  4 09:01:25 2008] (pid=9353) [FAIL] initialization failed

This machine has CONFIG_IP_MULTICAST=y, although I do not understand
why conntrackd needs a multicast socket in the stats mode.  strace
says:

 socket(PF_UNSPEC, SOCK_DGRAM, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)
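
The PF_UNSPEC (0) family in that strace line suggests the socket was created from an address family that was never filled in, presumably because the stats example has no multicast section (an assumption, not checked against the 0.9.5 source). The kernel rejects family 0 with EAFNOSUPPORT; the hypothetical helper `try_dgram_socket()` below reproduces just that call, independent of conntrackd:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to create a datagram socket for 'family'.
 * Returns 0 on success, otherwise the errno value set by socket(). */
int try_dgram_socket(int family)
{
    int fd = socket(family, SOCK_DGRAM, 0);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

On Linux, `try_dgram_socket(AF_UNSPEC)` returns `EAFNOSUPPORT`, the same error strace reports above.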

Also, the example shipped in 0.9.5 includes the obsolete
"StripNAT" option.

Max

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-05 14:31 UTC
  To: Max Kellermann; +Cc: netfilter-devel

Hi Max,

Max Kellermann wrote:
> I am currently working on the official conntrack-tools 0.9.5 Debian
> package; before that, I maintained the old "conntrack" program.
> The daemon will not start (with examples/stats/conntrackd.conf):
> 
>  host:~# /usr/sbin/conntrackd -C /etc/conntrackd.conf 
>  Notice: StripNAT clause is obsolete. Please, remove it from conntrackd.conf
>  ERROR: conntrackd cannot start, please check the logfile for more info
>  host:~# tail /var/log/conntrackd.log 
>  [Fri Jan  4 09:01:25 2008] (pid=9353) --- starting in console mode ---
>  [Fri Jan  4 09:01:25 2008] (pid=9353) [FAIL] can't open multicast server!
>  [Fri Jan  4 09:01:25 2008] (pid=9353) [FAIL] initialization failed

You forgot the -S option to run it in statistics mode. I know that this
option is a bit confusing so I have applied a patch to obsolete it.
Thus, you won't need to pass -S to conntrackd anymore in the upcoming
0.9.6 release.

> Also, the example shipped in 0.9.5 includes the obsolete
> "StripNAT" option.

It has already been removed in SVN. BTW, SVN currently contains a patch
that enables logging to a file or syslog in statistics mode. Still, this
needs to be evaluated in terms of performance (e.g. on very busy
firewalls), and it needs some kind of tool that digests the logging
information.

-- 
"The honest are social misfits" -- Les Luthiers

* Re: conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-05 17:29 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

On 2008/01/05 15:31, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> You forgot the -S option to run it in statistics mode. I know that this
> option is a bit confusing so I have applied a patch to obsolete it.
> Thus, you won't need to pass -S to conntrackd anymore in the upcoming
> 0.9.6 release.

Right, with -S it starts up.  Somehow I must have missed that option
in the --help text.

By the way, it is not possible to run "conntrackd --help" as a regular
user.  It would be nice if users could view the usage information.  Why does
conntrackd check the capability mask at all?

The conntrackd manual page is missing from the source distribution; it
might be in SVN, since it is displayed on the conntrack-tools home
page.

I noticed conntrackd runs select() with a 200ms timeout, i.e. it wakes
up 5 times a second only to see that there is nothing to do.  Why is
that?  This increases power consumption for no good reason.

When I stop the daemon (running in foreground) with Ctrl-C, glibc
detects a heap corruption:

*** glibc detected *** /usr/sbin/conntrackd: corrupted double-linked
    list: 0x0000000000631d40 ***
======= Backtrace: =========
/lib/libc.so.6[0x2afb493221cc]
/lib/libc.so.6(cfree+0x8c)[0x2afb49325b5c]
/usr/lib/libnetfilter_conntrack.so.1(nfct_close+0x6f)[0x2afb48e9db2f]
/usr/sbin/conntrackd[0x4032de]
/lib/libc.so.6[0x2afb492e0040]
/lib/libc.so.6(sigprocmask+0x10)[0x2afb492e0440]
/usr/sbin/conntrackd[0x403350]
/lib/libc.so.6[0x2afb492e0040]
/lib/libc.so.6(__select+0x13)[0x2afb4937eb33]
/usr/sbin/conntrackd[0x402dd5]
/usr/sbin/conntrackd[0x402924]
/lib/libc.so.6(__libc_start_main+0xf4)[0x2afb492cc1c4]
/usr/sbin/conntrackd[0x402239]

I am using libnetfilter_conntrack 0.0.82.

Max

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-07 11:09 UTC
  To: Max Kellermann; +Cc: netfilter-devel

Max Kellermann wrote:
> On 2008/01/05 15:31, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> You forgot the -S option to run it in statistics mode. I know that this
>> option is a bit confusing so I have applied a patch to obsolete it.
>> Thus, you won't need to pass -S to conntrackd anymore in the upcoming
>> 0.9.6 release.
> 
> Right, with -S it starts up.  Somehow I must have missed that option
> in the --help text.
> 
> By the way, it is not possible to run "conntrackd --help" as a regular
> user.  It would be nice if users could view the usage information.  Why does
> conntrackd check the capability mask at all?

Netlink requires CAP_NET_ADMIN, and conntrackd checks for it before
starting. I'm going to move the capability check later so that the help
message can be shown. I'll commit a patch for it.
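
A minimal sketch of that reordering, assuming a hypothetical `startup()` entry point and `-h`/`--help` as the help options (not conntrackd's actual option parser): usage is printed before any privilege check, and CAP_NET_ADMIN is read from the effective capability set through the raw capget(2) syscall.

```c
#include <linux/capability.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if CAP_NET_ADMIN is in the calling process's effective set. */
int has_cap_net_admin(void)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,                         /* 0 means "this process" */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    memset(data, 0, sizeof(data));
    if (syscall(SYS_capget, &hdr, data) != 0)
        return 0;                         /* treat errors as "not permitted" */
    return (data[CAP_TO_INDEX(CAP_NET_ADMIN)].effective &
            CAP_TO_MASK(CAP_NET_ADMIN)) != 0;
}

/* Hypothetical startup order: handle the help options first, then check
 * privileges. Returns the intended exit status. */
int startup(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
            printf("Usage: %s [options]\n", argv[0]);
            return 0;                     /* no privileges needed for this */
        }
    }
    if (!has_cap_net_admin()) {
        fprintf(stderr, "%s: CAP_NET_ADMIN is required\n", argv[0]);
        return 1;
    }
    /* ... open Netlink sockets and run the daemon here ... */
    return 0;
}
```

Printing usage touches no Netlink socket, so only the code paths that actually need CAP_NET_ADMIN are gated by the check.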

> The conntrackd manual page is missing from the source distribution; it
> might be in SVN, since it is displayed on the conntrack-tools home
> page.

The conntrackd manual page wasn't available in 0.9.5, but it will be in
0.9.6. We couldn't bundle something into a package before it existed :)

> I noticed conntrackd runs select() with a 200ms timeout, i.e. it wakes
> up 5 times a second only to see that there is nothing to do.  Why is
> that?  This increases power consumption for no good reason.

I have implemented alarms based on time slices, so I use select() to wake
up and run expired alarms once the slice has been consumed. Are you really
observing this increase in power consumption?

> When I stop the daemon (running in foreground) with Ctrl-C, glibc
> detects a heap corruption:
> 
> *** glibc detected *** /usr/sbin/conntrackd: corrupted double-linked
>     list: 0x0000000000631d40 ***
> ======= Backtrace: =========
> /lib/libc.so.6[0x2afb493221cc]
> /lib/libc.so.6(cfree+0x8c)[0x2afb49325b5c]
> /usr/lib/libnetfilter_conntrack.so.1(nfct_close+0x6f)[0x2afb48e9db2f]
> /usr/sbin/conntrackd[0x4032de]
> /lib/libc.so.6[0x2afb492e0040]
> /lib/libc.so.6(sigprocmask+0x10)[0x2afb492e0440]
> /usr/sbin/conntrackd[0x403350]
> /lib/libc.so.6[0x2afb492e0040]
> /lib/libc.so.6(__select+0x13)[0x2afb4937eb33]
> /usr/sbin/conntrackd[0x402dd5]
> /usr/sbin/conntrackd[0x402924]
> /lib/libc.so.6(__libc_start_main+0xf4)[0x2afb492cc1c4]
> /usr/sbin/conntrackd[0x402239]

I'll investigate this. Are you using 0.9.5 or an SVN snapshot? Are you
using the `alarm' mode (formerly known as `persistent')?

-- 
"The honest are social misfits" -- Les Luthiers

* Re: conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-07 11:55 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

On 2008/01/07 12:09, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> The conntrackd manual page wasn't available in 0.9.5, but it will be in
> 0.9.6. We couldn't bundle something into a package before it existed :)

Ah, so the manual page on the project home page is from an unreleased
version.

> > I noticed conntrackd runs select() with a 200ms timeout, i.e. it wakes
> > up 5 times a second only to see that there is nothing to do.  Why is
> > that?  This increases power consumption for no good reason.
> 
> I have implemented alarms based on time slices, so I use select() to wake
> up and run expired alarms once the slice has been consumed. Are you really
> observing this increase in power consumption?

We can argue about whether fixing just conntrackd would be measurable,
but it's yet another daemon that polls when it could instead wait for
real events.  Let's fix all of them, and enjoy a tickless system
which lets the CPU sleep until there is real work to do.

Waking up daemons without a reason is sloppy design most of the time.
A quick look at the conntrackd code made me believe that conntrackd
just doesn't check when the next scheduled event is due, and instead
performs a check on all alarm objects in the current step 5 times a
second.  That is easily fixable, and not only saves CPU cycles and
power, but also leads to better overall design.

The whole alarm.c looks like duplicated effort, you could have used
libevent instead.

By the way, I saw an add_alarm() in cache_timer.c, but its callback
function "timeout()" neither sets a new "expires" value, nor does it
delete the alarm object.  That may lead to integer underflow in the
next do_alarm_run() invocation.

> I'll investigate this. Are you using 0.9.5 or an SVN snapshot? Are
> you using the `alarm' mode (formerly known as `persistent')?

I am using the most recent release, i.e. 0.9.5.  I have no idea about
"alarm" or "persistent" mode, and I did not find any documentation on
this.  I am using the "stats" example configuration from the tarball.

Max

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-09 23:06 UTC
  To: Max Kellermann; +Cc: netfilter-devel

Max Kellermann wrote:
> Waking up daemons without a reason is sloppy design most of the time.
> A quick look at the conntrackd code made me believe that conntrackd
> just doesn't check when the next scheduled event is due, and instead
> performs a check on all alarm objects in the current step 5 times a
> second.  That is easily fixable, and not only saves CPU cycles and
> power, but also leads to better overall design.

Indeed. This makes a lot of sense to me. I have committed a patch to SVN to
wake up the daemon only when there is an alarm event to process instead
of polling. I'll do some testing of it tomorrow.

> The whole alarm.c looks like duplicated effort, you could have used
> libevent instead.

Well, I think that libevent is overkill since conntrackd doesn't handle
that many descriptors, and IMO the alarm implementation is enough for
what conntrackd needs.

> By the way, I saw an add_alarm() in cache_timer.c, but its callback
> function "timeout()" neither sets a new "expires" value, nor does it
> delete the alarm object.  That may lead to integer underflow in the
> next do_alarm_run() invocation.

I have also changed this since I needed it for the latest commit.
However, AFAICS such an underflow never happens in 0.9.5.

>> I'll investigate this. Are you using 0.9.5 or an SVN snapshot? Are
>> you using the `alarm' mode (formerly known as `persistent')?
> 
> I am using the most recent release, i.e. 0.9.5.  I have no idea about
> "alarm" or "persistent" mode, and I did not find any documentation on
> this.  I am using the "stats" example configuration from the tarball.

Could you please check out a working copy from SVN and tell me if the
problem that you're reporting persists?

-- 
"The honest are social misfits" -- Les Luthiers

* Re: conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-14  9:40 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

On 2008/01/10 00:06, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Max Kellermann wrote:
> > Waking up daemons without a reason is sloppy design most of the time.
> > A quick look at the conntrackd code made me believe that conntrackd
> > just doesn't check when the next scheduled event is due, and instead
> > performs a check on all alarm objects in the current step 5 times a
> > second.  That is easily fixable, and not only saves CPU cycles and
> > power, but also leads to better overall design.
> 
> Indeed. This makes a lot of sense to me. I have committed a patch to SVN to
> wake up the daemon only when there is an alarm event to process instead
> of polling. I'll do some testing of it tomorrow.

Looks much better!

 15 files changed, 103 insertions(+), 208 deletions(-)

Also, the code has become smaller :)

Why did you create separate functions for setting secs and usecs?
(set_alarm_expiration_secs and set_alarm_expiration_usecs); neither
function is prototyped in a header file.  Why not add something
like this to alarm.h:

 static inline void
 set_alarm_expiration(struct alarm_list *t, long tv_sec, long tv_usec)
 {
     t->tv.tv_sec = tv_sec;
     t->tv.tv_usec = tv_usec;
 }

Since the alarm_list struct is public anyway, this looks more elegant
and creates smaller code.

The function do_alarm_run() assumes that all alarms are sorted by
their due time, but add_alarm() does not enforce this.

I do not understand why you use random() to generate the next alarm
time in sync-alarm.c.

Why do you call INIT_LIST_HEAD() on all alarm_list objects?  For
linux_list.h, that is only required on the sentinel (the global
variable "alarm_list" in this case which is already statically
initialized with the LIST_HEAD macro).
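
To illustrate why only the sentinel needs initializing, here is a stripped-down re-implementation of kernel-style list primitives (written for this example, not copied from linux_list.h): `list_add_tail()` overwrites both pointers of the new entry, so whatever the entry held before never matters.

```c
/* Minimal kernel-style intrusive doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a sentinel pointing at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Links 'entry' just before 'head', i.e. at the tail. Both of entry's
 * pointers are assigned here, so it needs no prior initialization. */
void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Only the sentinel is initialized, and statically. */
LIST_HEAD(alarm_list);
```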

> > The whole alarm.c looks like duplicated effort, you could have used
> > libevent instead.
> 
> Well, I think that libevent is overkill since conntrackd doesn't handle
> that many descriptors, and IMO the alarm implementation is enough for
> what conntrackd needs.

I tend to use libevent for small projects, too; maybe libowfat is more
appropriate for smaller projects.  That is a matter of taste.

> > I am using the most recent release, i.e. 0.9.5.  I have no idea about
> > "alarm" or "persistent" mode, and I did not find any documentation on
> > this.  I am using the "stats" example configuration from the tarball.
> 
> Could you please check out a working copy from SVN and tell me if the
> problem that you're reporting persists?

With conntrackd and libnetfilter_conntrack from SVN r7196, the crash
does not occur anymore.  Please tell me as soon as you release both,
so I can update the Debian package.

But have a look at the strace:

 select(6, [4 5], NULL, NULL, {1, 0})    = 0 (Timeout)

After that, it goes into an endless loop of:

 select(6, [4 5], NULL, NULL, {0, 0})    = 0 (Timeout)

This is because select() on Linux modifies the timeout value: on return,
it contains the time remaining.  So the timeout is zeroed after the first
select() times out, and do_alarm_run() never sets a new timeout value.
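
A sketch of the pitfall and of the suggested fix, using hypothetical helper names: `usec_left_after_timeout()` shows that a select() which times out leaves its timeval at zero on Linux, and `wait_once()` copies the caller's timeout into a fresh local before every call, passing NULL when no alarm is pending.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Demonstrates the pitfall: returns the time (in usec) that select()
 * leaves in its timeval after sleeping for 'usec' with no descriptors. */
long usec_left_after_timeout(long usec)
{
    struct timeval tv = { 0, usec };

    select(0, NULL, NULL, NULL, &tv);     /* no fds: just sleep */
    return tv.tv_sec * 1000000L + tv.tv_usec;
}

/* Waits until 'fd' is readable or 'timeout' elapses. A NULL 'timeout'
 * means no alarm is pending, so block until the descriptor is ready.
 * The caller's timeval is never handed to select() directly. */
int wait_once(int fd, const struct timeval *timeout)
{
    fd_set rfds;
    struct timeval tv, *tvp = NULL;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (timeout != NULL) {
        tv = *timeout;                    /* fresh copy for every call */
        tvp = &tv;
    }
    return select(fd + 1, &rfds, NULL, NULL, tvp);
}
```

On Linux, `usec_left_after_timeout(10000)` returns 0, which is exactly the `{0, 0}` timeout seen in the endless strace loop.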

I suggest you pass a NULL timeout when there is no alarm, and ensure
that you always set the correct next_alarm value in do_alarm_run().

Max

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-14 15:41 UTC
  To: Max Kellermann; +Cc: netfilter-devel, Netfilter-failover list

Max Kellermann wrote:
> Why did you create separate functions for setting secs and usecs?
> (set_alarm_expiration_secs and set_alarm_expiration_usecs); neither
> function is prototyped in a header file.  Why not add something
> like this to alarm.h:
> 
>  static inline void
>  set_alarm_expiration(struct alarm_list *t, long tv_sec, long tv_usec)
>  {
>      t->tv.tv_sec = tv_sec;
>      t->tv.tv_usec = tv_usec;
>  }
> 
> Since the alarm_list struct is public anyway, this looks more elegant
> and creates smaller code.

I have changed this. Thanks for the suggestion. I'd appreciate patches
for this sort of cleanup.

> The function do_alarm_run() assumes that all alarms are sorted by
> their due time, but add_alarm() does not enforce this.

Hm, add_alarm() always inserts new alarms at the end of the alarm list,
so AFAICS the list is sorted by due time.

> I do not understand why you use random() to generate the next alarm
> time in sync-alarm.c.

This is part of the alarm-based synchronization approach. We send a
synchronization message describing a conntrack entry every
random(RefreshTime) seconds. This approach is, of course, traffic-heavy
and CPU-consuming, but it is simple and resolves inconsistencies among
several replicas very well. The FTFW synchronization approach uses an
ACK/NACK-based protocol which requires fewer resources.
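
For illustration, a hypothetical helper that picks such a randomized refresh delay (a sketch only; the actual sync-alarm.c code may compute it differently). Drawing each entry's delay from [0, RefreshTime) seconds presumably spreads the resends out instead of sending every entry at the same instant.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical: choose the next refresh delay for one entry, with whole
 * seconds in [0, refresh_time) plus a random sub-second part.
 * 'refresh_time' must be > 0. The modulo makes the distribution only
 * approximately uniform, which is harmless for spreading traffic. */
void next_refresh(struct timeval *out, long refresh_time)
{
    out->tv_sec = random() % refresh_time;
    out->tv_usec = random() % 1000000;
}
```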

> Why do you call INIT_LIST_HEAD() on all alarm_list objects?  For
> linux_list.h, that is only required on the sentinel (the global
> variable "alarm_list" in this case which is already statically
> initialized with the LIST_HEAD macro).

Indeed, I have removed them.

>>> I am using the most recent release, i.e. 0.9.5.  I have no idea about
>>> "alarm" or "persistent" mode, and I did not find any documentation on
>>> this.  I am using the "stats" example configuration from the tarball.
>> Could you please check out a working copy from SVN and tell me if the
>> problem that you're reporting persists?
> 
> With conntrackd and libnetfilter_conntrack from SVN r7196, the crash
> does not occur anymore.  Please tell me as soon as you release both,
> so I can update the Debian package.
> 
> But have a look at the strace:
> 
>  select(6, [4 5], NULL, NULL, {1, 0})    = 0 (Timeout)
> 
> After that, it goes into an endless loop of:
> 
>  select(6, [4 5], NULL, NULL, {0, 0})    = 0 (Timeout)
> 
> This is because select() on Linux modifies the timeout value: on return,
> it contains the time remaining.  So the timeout is zeroed after the first
> select() times out, and do_alarm_run() never sets a new timeout value.
> 
> I suggest you pass a NULL timeout when there is no alarm, and ensure
> that you always set the correct next_alarm value in do_alarm_run().

Fixed in SVN. Moreover, I have removed the ugly 1-second initial wait in
the run() loop. Thanks for catching this issue. As soon as we finish
this discussion, I plan to move it to the testing stage and then release
a new version.

-- 
"The honest are social misfits" -- Les Luthiers
