* conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-04 8:10 UTC
To: netfilter-devel, Pablo Neira Ayuso

Hi Pablo,

I am currently working on the official conntrack-tools 0.9.5 Debian
package; I have been maintaining the old "conntrack" program before.
The daemon will not start (with examples/stats/conntrackd.conf):

 host:~# /usr/sbin/conntrackd -C /etc/conntrackd.conf
 Notice: StripNAT clause is obsolete. Please, remove it from conntrackd.conf
 ERROR: conntrackd cannot start, please check the logfile for more info
 host:~# tail /var/log/conntrackd.log
 [Fri Jan 4 09:01:25 2008] (pid=9353) --- starting in console mode ---
 [Fri Jan 4 09:01:25 2008] (pid=9353) [FAIL] can't open multicast server!
 [Fri Jan 4 09:01:25 2008] (pid=9353) [FAIL] initialization failed

This machine has CONFIG_IP_MULTICAST=y, although I do not understand
why conntrackd needs a multicast socket in the stats mode. strace
says:

 socket(PF_UNSPEC, SOCK_DGRAM, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)

Also, the example shipped in the 0.9.5 includes the obsolete
"StripNAT" option.

Max

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-05 14:31 UTC
To: Max Kellermann; +Cc: netfilter-devel

Hi Max,

Max Kellermann wrote:
> I am currently working on the official conntrack-tools 0.9.5 Debian
> package; I have been maintaining the old "conntrack" program before.
> The daemon will not start (with examples/stats/conntrackd.conf):
>
> host:~# /usr/sbin/conntrackd -C /etc/conntrackd.conf
> Notice: StripNAT clause is obsolete. Please, remove it from conntrackd.conf
> ERROR: conntrackd cannot start, please check the logfile for more info
> host:~# tail /var/log/conntrackd.log
> [Fri Jan 4 09:01:25 2008] (pid=9353) --- starting in console mode ---
> [Fri Jan 4 09:01:25 2008] (pid=9353) [FAIL] can't open multicast server!
> [Fri Jan 4 09:01:25 2008] (pid=9353) [FAIL] initialization failed

You forgot the -S option to run it in statistics mode. I know that this
option is a bit confusing so I have applied a patch to obsolete it.
Thus, you won't need to pass -S to conntrackd anymore in the upcoming
0.9.6 release.

> Also, the example shipped in the 0.9.5 includes the obsolete
> "StripNAT" option.

This was removed from SVN.

BTW, SVN currently contains a patch to enable logging via file/syslog
for the statistics mode. Still, this requires evaluation in terms of
performance (e.g. very busy firewalls) and some kind of tool that
digests the logging information.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

* Re: conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-05 17:29 UTC
To: Pablo Neira Ayuso; +Cc: netfilter-devel

On 2008/01/05 15:31, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> You forgot the -S option to run it in statistics mode. I know that this
> option is a bit confusing so I have applied a patch to obsolete it.
> Thus, you won't need to pass -S to conntrackd anymore in the upcoming
> 0.9.6 release.

Right, with -S it starts up. Somehow I must have missed that option
in the --help text.

By the way, it is not possible to run "conntrackd --help" as user. It
would be nice if users could view the usage information. Why does
conntrackd check the capability mask at all?

The conntrackd manual page is missing in the source distribution, it
might be in SVN, since it is displayed on the conntrack-tools home
page.

I noticed conntrackd runs select() with a 200ms timeout, i.e. it wakes
up 5 times a second only to see that there is nothing to do. Why
that? This leads to increased power consumption for no good.

When I stop the daemon (running in foreground) with Ctrl-C, glibc
detects a heap corruption:

 *** glibc detected *** /usr/sbin/conntrackd: corrupted double-linked list: 0x0000000000631d40 ***
 ======= Backtrace: =========
 /lib/libc.so.6[0x2afb493221cc]
 /lib/libc.so.6(cfree+0x8c)[0x2afb49325b5c]
 /usr/lib/libnetfilter_conntrack.so.1(nfct_close+0x6f)[0x2afb48e9db2f]
 /usr/sbin/conntrackd[0x4032de]
 /lib/libc.so.6[0x2afb492e0040]
 /lib/libc.so.6(sigprocmask+0x10)[0x2afb492e0440]
 /usr/sbin/conntrackd[0x403350]
 /lib/libc.so.6[0x2afb492e0040]
 /lib/libc.so.6(__select+0x13)[0x2afb4937eb33]
 /usr/sbin/conntrackd[0x402dd5]
 /usr/sbin/conntrackd[0x402924]
 /lib/libc.so.6(__libc_start_main+0xf4)[0x2afb492cc1c4]
 /usr/sbin/conntrackd[0x402239]

I am using libnetfilter_conntrack 0.0.82.

Max
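
Note on the crash report above: the backtrace shows nfct_close() and the
allocator being reached while a signal was delivered during select(). The
thread does not establish the root cause, so the following is only a
minimal sketch of one common way to keep cleanup out of signal context:
the handler sets a flag and the main loop does the teardown. The names
stop_requested, on_stop_signal and install_stop_handlers are hypothetical
and not taken from conntrackd.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: set by the handler, polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

/* The handler only records the request; it frees nothing and closes
 * nothing, so it cannot corrupt the heap by itself. */
static void on_stop_signal(int signo)
{
	(void)signo;
	stop_requested = 1;
}

/* Install the handler for SIGINT and SIGTERM; returns 0 on success. */
static int install_stop_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_stop_signal;
	sigemptyset(&sa.sa_mask);

	/* A select() that is pending when the signal arrives returns -1
	 * with errno set to EINTR, so the loop can then test the flag. */
	if (sigaction(SIGINT, &sa, NULL) < 0)
		return -1;
	if (sigaction(SIGTERM, &sa, NULL) < 0)
		return -1;
	return 0;
}
```

A main loop built on this would test stop_requested after each select()
return and call its cleanup functions from normal context, once.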

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-07 11:09 UTC
To: Max Kellermann; +Cc: netfilter-devel

Max Kellermann wrote:
> On 2008/01/05 15:31, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> You forgot the -S option to run it in statistics mode. I know that this
>> option is a bit confusing so I have applied a patch to obsolete it.
>> Thus, you won't need to pass -S to conntrackd anymore in the upcoming
>> 0.9.6 release.
>
> Right, with -S it starts up. Somehow I must have missed that option
> in the --help text.
>
> By the way, it is not possible to run "conntrackd --help" as user. It
> would be nice if users could view the usage information. Why does
> conntrackd check the capability mask at all?

Netlink requires CAP_NET_ADMIN. conntrackd checks for it before
starting. I'm going to do the capability checking later so that the help
message can be shown. I'll commit a patch later.

> The conntrackd manual page is missing in the source distribution, it
> might be in SVN, since it is displayed on the conntrack-tools home
> page.

The conntrackd page wasn't available in 0.9.5, but it will be in 0.9.6.
It's impossible that we can bundle something to a package when it
didn't exist at that time :)

> I noticed conntrackd runs select() with a 200ms timeout, i.e. it wakes
> up 5 times a second only to see that there is nothing to do. Why
> that? This leads to increased power consumption for no good.

I have implemented alarms based on time slices so I use select to wake
up expired alarms once the slice has been consumed. Are you really
observing this power consumption increment?

> When I stop the daemon (running in foreground) with Ctrl-C, glibc
> detects a heap corruption:
>
> *** glibc detected *** /usr/sbin/conntrackd: corrupted double-linked
> list: 0x0000000000631d40 ***
> ======= Backtrace: =========
> /lib/libc.so.6[0x2afb493221cc]
> /lib/libc.so.6(cfree+0x8c)[0x2afb49325b5c]
> /usr/lib/libnetfilter_conntrack.so.1(nfct_close+0x6f)[0x2afb48e9db2f]
> /usr/sbin/conntrackd[0x4032de]
> /lib/libc.so.6[0x2afb492e0040]
> /lib/libc.so.6(sigprocmask+0x10)[0x2afb492e0440]
> /usr/sbin/conntrackd[0x403350]
> /lib/libc.so.6[0x2afb492e0040]
> /lib/libc.so.6(__select+0x13)[0x2afb4937eb33]
> /usr/sbin/conntrackd[0x402dd5]
> /usr/sbin/conntrackd[0x402924]
> /lib/libc.so.6(__libc_start_main+0xf4)[0x2afb492cc1c4]
> /usr/sbin/conntrackd[0x402239]

I'll investigate this. Are you using 0.9.5 or a SVN snapshot? Are you
using the `alarm' mode (formerly known as `persistent')?

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

* Re: conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-07 11:55 UTC
To: Pablo Neira Ayuso; +Cc: netfilter-devel

On 2008/01/07 12:09, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> The conntrackd page wasn't available in 0.9.5, but it will be in 0.9.6.
> It's impossible that we can bundle something to a package when it
> didn't exist at that time :)

Ah, so the manual page on the project home page is from an unreleased
version.

> > I noticed conntrackd runs select() with a 200ms timeout, i.e. it wakes
> > up 5 times a second only to see that there is nothing to do. Why
> > that? This leads to increased power consumption for no good.
>
> I have implemented alarms based on time slices so I use select to wake
> up expired alarms once the slice has been consumed. Are you really
> observing this power consumption increment?

We can argue about whether fixing just conntrackd would be measurable,
but it's yet another daemon that is polling when it could better wait
for real events. Let's fix all of them, and enjoy a tickless system
which lets the CPU sleep until there is real work to do.

Waking up daemons without a reason is sloppy design most of the time.
A quick look at the conntrackd code made me believe that conntrackd
just doesn't check when the next scheduled event is due, and instead
performs a check on all alarm objects in the current step 5 times a
second. That is easily fixable, and not only saves CPU cycles and
power, but also leads to better overall design.

The whole alarm.c looks like duplicated effort, you could have used
libevent instead.

By the way, I saw an add_alarm() in cache_timer.c, but its callback
function "timeout()" neither sets a new "expires" value, nor does it
delete the alarm object. That may lead to integer underflow in the
next do_alarm_run() invocation.

> I'll investigate this. Are you using 0.9.5 or a SVN snapshot? Are
> you using the `alarm' mode (formerly known as `persistent')?

I am using the most recent release, i.e. 0.9.5. I have no idea about
"alarm" or "persistent" mode, and I did not find any documentation on
this. I am using the "stats" example configuration from the tarball.

Max
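
Note on the polling discussion above: the idea of sleeping until the
next scheduled event, rather than waking every 200ms, can be sketched as
follows. This is only an illustration under stated assumptions: struct
demo_alarm, tv_before() and next_timeout() are hypothetical names, and
the array of absolute due times stands in for whatever list conntrackd's
alarm.c really keeps.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical, simplified alarm record: only the absolute due time. */
struct demo_alarm {
	struct timeval due;
};

/* Return nonzero if a is strictly earlier than b. */
static int tv_before(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec;
	return a->tv_usec < b->tv_usec;
}

/*
 * Compute how long select() may sleep.
 * Returns NULL when no alarm is scheduled (block until an fd is ready),
 * otherwise returns out, filled with the time left until the earliest
 * alarm, or with zero if that alarm is already due.
 */
static struct timeval *next_timeout(const struct demo_alarm *alarms, size_t n,
				    const struct timeval *now,
				    struct timeval *out)
{
	const struct timeval *earliest = NULL;

	for (size_t i = 0; i < n; i++) {
		if (earliest == NULL || tv_before(&alarms[i].due, earliest))
			earliest = &alarms[i].due;
	}
	if (earliest == NULL)
		return NULL;

	if (!tv_before(now, earliest)) {
		out->tv_sec = 0;	/* already due: do not sleep */
		out->tv_usec = 0;
		return out;
	}

	out->tv_sec = earliest->tv_sec - now->tv_sec;
	out->tv_usec = earliest->tv_usec - now->tv_usec;
	if (out->tv_usec < 0) {		/* borrow one second */
		out->tv_sec -= 1;
		out->tv_usec += 1000000;
	}
	return out;
}
```

The return value can be handed to select() directly as its last
argument, so an idle daemon with no alarms never wakes up on a timer.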

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-09 23:06 UTC
To: Max Kellermann; +Cc: netfilter-devel

Max Kellermann wrote:
> Waking up daemons without a reason is sloppy design most of the time.
> A quick look at the conntrackd code made me believe that conntrackd
> just doesn't check when the next scheduled event is due, and instead
> performs a check on all alarm objects in the current step 5 times a
> second. That is easily fixable, and not only saves CPU cycles and
> power, but also leads to better overall design.

Indeed. This makes a lot of sense to me. I have committed a patch to SVN
to wake up the daemon only if there is any alarm event to process
instead of polling. I'll do some testing of it tomorrow.

> The whole alarm.c looks like duplicated effort, you could have used
> libevent instead.

Well, I think that libevent is too much since conntrackd handles not
that many descriptors and the alarm implementation is enough for what
conntrackd needs IMO.

> By the way, I saw an add_alarm() in cache_timer.c, but its callback
> function "timeout()" neither sets a new "expires" value, nor does it
> delete the alarm object. That may lead to integer underflow in the
> next do_alarm_run() invocation.

I have also changed this since I needed it for the latest commit.
However, AFAICS such underflow doesn't ever happen in 0.9.5.

>> I'll investigate this. Are you using 0.9.5 or a SVN snapshot? Are
>> you using the `alarm' mode (formerly known as `persistent')?
>
> I am using the most recent release, i.e. 0.9.5. I have no idea about
> "alarm" or "persistent" mode, and I did not find any documentation on
> this. I am using the "stats" example configuration from the tarball.

Please, could you check out a working copy from SVN and tell me if the
problem that you're reporting persists?

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

* Re: conntrackd won't start, "can't open multicast server!"
From: Max Kellermann @ 2008-01-14 9:40 UTC
To: Pablo Neira Ayuso; +Cc: netfilter-devel

On 2008/01/10 00:06, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Max Kellermann wrote:
> > Waking up daemons without a reason is sloppy design most of the time.
> > A quick look at the conntrackd code made me believe that conntrackd
> > just doesn't check when the next scheduled event is due, and instead
> > performs a check on all alarm objects in the current step 5 times a
> > second. That is easily fixable, and not only saves CPU cycles and
> > power, but also leads to better overall design.
>
> Indeed. This makes a lot of sense to me. I have committed a patch to SVN
> to wake up the daemon only if there is any alarm event to process
> instead of polling. I'll do some testing of it tomorrow.

Looks much better!

 15 files changed, 103 insertions(+), 208 deletions(-)

Also, the code has become smaller :)

Why did you create separate functions for setting secs and usecs?
(set_alarm_expiration_secs and set_alarm_expiration_usecs); both
functions are not prototyped in a header file. Why not add something
like this to alarm.h:

 static inline void
 set_alarm_expiration_secs(struct alarm_list *t, long tv_sec, long tv_usec)
 {
 	t->tv.tv_sec = tv_sec;
 	t->tv.tv_usec = tv_usec;
 }

Since the alarm_list struct is public anyway, this looks more elegant
and creates smaller code.

The function do_alarm_run() assumes that all alarms are sorted by
their due time, but add_alarm() does not enforce this.

I do not understand why you use random() to generate the next alarm
time in sync-alarm.c.

Why do you call INIT_LIST_HEAD() on all alarm_list objects? For
linux_list.h, that is only required on the sentinel (the global
variable "alarm_list" in this case which is already statically
initialized with the LIST_HEAD macro).

> > The whole alarm.c looks like duplicated effort, you could have used
> > libevent instead.
>
> Well, I think that libevent is too much since conntrackd handles not
> that many descriptors and the alarm implementation is enough for what
> conntrackd needs IMO.

I tend to use libevent for small projects, too; maybe libowfat is more
appropriate for smaller projects. That is a matter of taste.

> > I am using the most recent release, i.e. 0.9.5. I have no idea about
> > "alarm" or "persistent" mode, and I did not find any documentation on
> > this. I am using the "stats" example configuration from the tarball.
>
> Please, could you check out a working copy from SVN and tell me if the
> problem that you're reporting persists?

With conntrackd and libnetfilter_conntrack from SVN r7196, the crash
does not occur anymore. Please tell me as soon as you release both,
so I can update the Debian package.

But have a look at the strace:

 select(6, [4 5], NULL, NULL, {1, 0}) = 0 (Timeout)

After that, it goes into an endless loop of:

 select(6, [4 5], NULL, NULL, {0, 0}) = 0 (Timeout)

This is because select() modifies the timeout value: it contains the
remaining time when select() returns. So the timeout is zeroed after
the first select() because it times out, and do_alarm_run() never sets
a new timeout value.

I suggest you pass a NULL timeout when there is no alarm, and ensure
that you always set the correct next_alarm value in do_alarm_run().

Max
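
Note on the select() loop described above: on Linux, select() updates
the timeval it is given to the time not slept, so a struct reused across
iterations ends up as {0, 0} after the first timeout. A minimal sketch
of the suggested fix follows, assuming a hypothetical helper
wait_rounds() that is not part of conntrackd: the timeout is re-armed
from an untouched copy before every call, and a NULL interval means
"no alarm pending, block indefinitely".

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Wait for fd to become readable, rounds times in a row.
 * interval may be NULL, meaning there is no alarm and select() should
 * block until the descriptor is ready.
 * Returns the number of rounds that ended in a timeout, or -1 on error.
 */
static int wait_rounds(int fd, const struct timeval *interval, int rounds)
{
	int timeouts = 0;

	for (int i = 0; i < rounds; i++) {
		fd_set rfds;
		struct timeval tv, *tvp = NULL;
		int ret;

		FD_ZERO(&rfds);		/* select() overwrites the fd set too */
		FD_SET(fd, &rfds);

		if (interval != NULL) {
			tv = *interval;	/* fresh copy: select() may modify it */
			tvp = &tv;
		}

		ret = select(fd + 1, &rfds, NULL, NULL, tvp);
		if (ret < 0)
			return -1;
		if (ret == 0)
			timeouts++;
	}
	return timeouts;
}
```

Because the caller's interval is never passed to select() itself, it
keeps its value, and every round really sleeps for the full period.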

* Re: conntrackd won't start, "can't open multicast server!"
From: Pablo Neira Ayuso @ 2008-01-14 15:41 UTC
To: Max Kellermann; +Cc: netfilter-devel, Netfilter-failover list

Max Kellermann wrote:
> Why did you create separate functions for setting secs and usecs?
> (set_alarm_expiration_secs and set_alarm_expiration_usecs); both
> functions are not prototyped in a header file. Why not add something
> like this to alarm.h:
>
> static inline void
> set_alarm_expiration_secs(struct alarm_list *t, long tv_sec, long tv_usec)
> {
> 	t->tv.tv_sec = tv_sec;
> 	t->tv.tv_usec = tv_usec;
> }
>
> Since the alarm_list struct is public anyway, this looks more elegant
> and creates smaller code.

I have changed this. Thanks for the suggestion. I'd appreciate patches
for this sort of cleanup.

> The function do_alarm_run() assumes that all alarms are sorted by
> their due time, but add_alarm() does not enforce this.

Hm, add_alarm() always inserts new alarms at the end of the alarm list,
so the list is sorted by their due time AFAICS.

> I do not understand why you use random() to generate the next alarm
> time in sync-alarm.c.

This is part of the alarm-based synchronization approach. We send a
synchronization message which talks about a conntrack entry each
random(RefreshTime) seconds. This approach is, of course, spamming and
CPU consuming, but it is simple and it resolves inconsistent situations
among several replicas very well. The FTFW synchronization approach uses
an ACK/NACK based protocol which requires fewer resources.

> Why do you call INIT_LIST_HEAD() on all alarm_list objects? For
> linux_list.h, that is only required on the sentinel (the global
> variable "alarm_list" in this case which is already statically
> initialized with the LIST_HEAD macro).

Indeed, I have removed them.

>>> I am using the most recent release, i.e. 0.9.5. I have no idea about
>>> "alarm" or "persistent" mode, and I did not find any documentation on
>>> this. I am using the "stats" example configuration from the tarball.
>> Please, could you check out a working copy from SVN and tell me if the
>> problem that you're reporting persists?
>
> With conntrackd and libnetfilter_conntrack from SVN r7196, the crash
> does not occur anymore. Please tell me as soon as you release both,
> so I can update the Debian package.
>
> But have a look at the strace:
>
> select(6, [4 5], NULL, NULL, {1, 0}) = 0 (Timeout)
>
> After that, it goes into an endless loop of:
>
> select(6, [4 5], NULL, NULL, {0, 0}) = 0 (Timeout)
>
> This is because select() modifies the timeout value, it contains the
> rest time when select() returns. So timeout is zeroed after the first
> select() because it times out, and do_alarm_run() never sets a new
> timeout value.
>
> I suggest you pass a NULL timeout when there is no alarm, and ensure
> that you always set the correct next_alarm value in do_alarm_run().

Fixed in SVN. Moreover, I have removed the ugly 1 sec wait start in the
run() loop. Thanks for catching this issue.

As soon as we finish with this discussion, I plan to pass it to testing
stage and then release a new version.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers
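
Note on the ordering question above: appending at the tail keeps a list
sorted by due time only if every alarm is armed with the same delay, and
the thread does not settle whether that holds in conntrackd. For
comparison, this is a minimal sketch of an insertion that enforces the
order regardless of delay. struct sorted_alarm, due_before() and
sorted_add_alarm() are hypothetical names, and the singly linked list is
a simplification, not the linux_list.h structure the daemon uses.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical alarm node, not the conntrackd struct alarm_list. */
struct sorted_alarm {
	struct timeval due;
	struct sorted_alarm *next;
};

/* Return nonzero if a is strictly earlier than b. */
static int due_before(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec;
	return a->tv_usec < b->tv_usec;
}

/*
 * Insert a so that the list stays ordered by ascending due time.
 * Alarms with equal due times keep their insertion order.
 */
static void sorted_add_alarm(struct sorted_alarm **head, struct sorted_alarm *a)
{
	struct sorted_alarm **pos = head;

	/* Skip every node that is due no later than the new alarm. */
	while (*pos != NULL && !due_before(&a->due, &(*pos)->due))
		pos = &(*pos)->next;

	a->next = *pos;
	*pos = a;
}
```

With the list kept in this order, the head is always the next alarm to
fire, which is what a run function that stops at the first unexpired
entry relies on. The cost is a linear walk on each insertion.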