netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ULOGD 00/15] ulogd V2 improvements, round 2
@ 2008-02-02 20:48 heitzenberger
  2008-02-02 20:48 ` [ULOGD 01/15] Add NACCT output plugin heitzenberger
                   ` (15 more replies)
  0 siblings, 16 replies; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,

this is a follow-up to my previous ulogd V2 improvements patchset
send to you as an RFC.

The changes to the previous set of patches are rather small, namely
the code of the NFCT plugin contains two small changes after feedback
from Pablo Neira.

Because all those changes were strictly chronological, with some
of the changes confusingly reverted later, I decided to stuff the
NFCT-related changes into one, as well as for the SQLITE3 related
ones.  Therefore the number of patches reduces from 30 to 15.

After the feedback I've got I'll suggest to apply all the patches
without the last NFCT one.  It is applied just as an RFC.

My intention is to turn the output keys of the NFCT plugin compatible
to the former one, and also add a configuration option to control
whether to propagate one dataset per flow or two (as it was before).

Because the NFCT-related changes depend on a patchlet to
libnetfilter-conntrack which might actually not be applied, I'll
suggest to convert the code base to libnl sooner.  Any objections?

Any feedback welcome.

 /holger


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 01/15] Add NACCT output plugin
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 21:24   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 02/15] common.h: added heitzenberger
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-NACCT-plugin.diff

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/output/Makefile.am
===================================================================
--- ulogd-netfilter.orig/output/Makefile.am
+++ ulogd-netfilter/output/Makefile.am
@@ -4,7 +4,8 @@ LIBS=""
 SUBDIRS= pcap mysql pgsql sqlite3
 
 pkglib_LTLIBRARIES = ulogd_output_LOGEMU.la ulogd_output_SYSLOG.la \
-		     ulogd_output_OPRINT.la ulogd_output_IPFIX.la
+		     ulogd_output_OPRINT.la ulogd_output_IPFIX.la \
+			 ulogd_output_NACCT.la
 
 ulogd_output_LOGEMU_la_SOURCES = ulogd_output_LOGEMU.c
 ulogd_output_LOGEMU_la_LDFLAGS = -module
@@ -18,3 +19,5 @@ ulogd_output_OPRINT_la_LDFLAGS = -module
 ulogd_output_IPFIX_la_SOURCES = ulogd_output_IPFIX.c
 ulogd_output_IPFIX_la_LDFLAGS = -module
 
+ulogd_output_NACCT_la_SOURCES = ulogd_output_NACCT.c
+ulogd_output_NACCT_la_LDFLAGS = -module
Index: ulogd-netfilter/output/ulogd_output_NACCT.c
===================================================================
--- /dev/null
+++ ulogd-netfilter/output/ulogd_output_NACCT.c
@@ -0,0 +1,206 @@
+/*
+ * ulogd_outpout_NACCT.c
+ *
+ * ulogd output plugin for accounting which tries to stay mostly
+ * compatible with nacct output.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Holger Eitzenberger <holger@eitzenberger.org>  Astaro AG 2008
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <arpa/inet.h>
+#include <ulogd/ulogd.h>
+#include <ulogd/conffile.h>
+
+#define NACCT_FILE_DEFAULT	"/var/log/nacctdata.log"
+
+#define HIPQUAD(addr) \
+        ((unsigned char *)&addr)[3], \
+        ((unsigned char *)&addr)[2], \
+        ((unsigned char *)&addr)[1], \
+        ((unsigned char *)&addr)[0]
+
+/* config accessors (lazy me...) */
+#define NACCT_CFG_FILE(pi)	((pi)->config_kset->ces[0].u.string)
+#define NACCT_CFG_SYNC(pi)	((pi)->config_kset->ces[1].u.value)
+
+#define KEY(pi,idx)		((pi)->input.keys[(idx)].u.source)
+
+/* input keys */
+#define KEY_IP_SADDR(pi)		KEY(pi, 0)
+#define KEY_IP_DADDR(pi)		KEY(pi, 1)
+#define KEY_IP_PROTO(pi)		KEY(pi, 2)
+#define KEY_L4_SPORT(pi)		KEY(pi, 3)
+#define KEY_L4_DPORT(pi)		KEY(pi, 4)
+#define KEY_RAW_PKTLEN(pi)		KEY(pi, 5)
+#define KEY_RAW_PKTCNT(pi)		KEY(pi, 6)
+#define KEY_ICMP_CODE(pi)		KEY(pi, 7)
+#define KEY_ICMP_TYPE(pi)		KEY(pi, 8)
+#define KEY_FLOW_START(pi)		KEY(pi, 11)
+#define KEY_FLOW_END(pi)		KEY(pi, 13)
+
+struct nacct_priv {
+	FILE *of;
+};
+
+
+static int
+nacct_interp(struct ulogd_pluginstance *pi)
+{
+	struct nacct_priv *priv = (struct nacct_priv *)&pi->private;
+	static char buf[80];
+
+	/* try to be as close to nacct as possible.  Instead of nacct's
+	   'timestamp' value use 'flow.end.sec' */
+	if (KEY_IP_PROTO(pi)->u.value.ui8 == IPPROTO_ICMP) {
+		snprintf(buf, sizeof(buf),
+				 "%u\t%u\t%u.%u.%u.%u\t%u\t%u.%u.%u.%u\t%u\t%u\t%u",
+				 KEY_FLOW_END(pi)->u.value.ui32,
+				 KEY_IP_PROTO(pi)->u.value.ui8,
+				 HIPQUAD(KEY_IP_SADDR(pi)->u.value.ui32),
+				 KEY_ICMP_TYPE(pi)->u.value.ui8,
+				 HIPQUAD(KEY_IP_DADDR(pi)->u.value.ui32),
+				 KEY_ICMP_CODE(pi)->u.value.ui8,
+				 KEY_RAW_PKTCNT(pi)->u.value.ui32,
+				 KEY_RAW_PKTLEN(pi)->u.value.ui32);
+	} else {
+		snprintf(buf, sizeof(buf),
+				 "%u\t%u\t%u.%u.%u.%u\t%u\t%u.%u.%u.%u\t%u\t%u\t%u",
+				 KEY_FLOW_END(pi)->u.value.ui32,
+				 KEY_IP_PROTO(pi)->u.value.ui8,
+				 HIPQUAD(KEY_IP_SADDR(pi)->u.value.ui32),
+				 KEY_L4_SPORT(pi)->u.value.ui8,
+				 HIPQUAD(KEY_IP_DADDR(pi)->u.value.ui32),
+				 KEY_L4_DPORT(pi)->u.value.ui8,
+				 KEY_RAW_PKTCNT(pi)->u.value.ui32,
+				 KEY_RAW_PKTLEN(pi)->u.value.ui32);
+	}
+
+	fprintf(priv->of, "%s\n", buf);
+
+	if (NACCT_CFG_SYNC(pi) != 0)
+		fflush(priv->of);
+
+	return 0;
+}
+
+static struct config_keyset nacct_kset = {
+	.num_ces = 2,
+	.ces = {
+		{
+			.key = "file", 
+			.type = CONFIG_TYPE_STRING, 
+			.options = CONFIG_OPT_NONE,
+			.u = {.string = NACCT_FILE_DEFAULT },
+		},
+		{
+			.key = "sync",
+			.type = CONFIG_TYPE_INT,
+			.options = CONFIG_OPT_NONE,
+			.u = { .value = 0 },
+		},
+	},
+};
+
+static void
+sighup_handler_print(struct ulogd_pluginstance *pi, int signal)
+{
+	struct nacct_priv *oi = (struct nacct_priv *)&pi->private;
+
+	switch (signal) {
+	case SIGHUP:
+	{
+		ulogd_log(ULOGD_NOTICE, "NACCT: reopening logfile\n");
+		fclose(oi->of);
+		oi->of = fopen(NACCT_CFG_FILE(pi), "a");
+		if (!oi->of)
+			ulogd_log(ULOGD_ERROR, "%s: %s\n", NACCT_CFG_FILE(pi),
+					  strerror(errno));
+		break;
+	}
+
+	default:
+		break;
+	}
+}
+
+static int
+nacct_conf(struct ulogd_pluginstance *pi,
+		   struct ulogd_pluginstance_stack *stack)
+{
+	int ret;
+
+	if ((ret = ulogd_wildcard_inputkeys(pi)) < 0)
+		return ret;
+
+	if ((ret = config_parse_file(pi->id, pi->config_kset)) < 0)
+		return ret;
+
+	return 0;
+}
+
+static int
+nacct_init(struct ulogd_pluginstance *pi)
+{
+	struct nacct_priv *op = (struct nacct_priv *)&pi->private;
+
+	if ((op->of = fopen(NACCT_CFG_FILE(pi), "a")) == NULL) {
+		ulogd_log(ULOGD_FATAL, "%s: %s\n", 
+				  NACCT_CFG_FILE(pi), strerror(errno));
+		return -1;
+	}		
+	return 0;
+}
+
+static int
+nacct_fini(struct ulogd_pluginstance *pi)
+{
+	struct nacct_priv *op = (struct nacct_priv *)&pi->private;
+
+	if (op->of != stdout)
+		fclose(op->of);
+
+	return 0;
+}
+
+static struct ulogd_plugin nacct_plugin = {
+	.name = "NACCT", 
+	.input = {
+		.type = ULOGD_DTYPE_PACKET | ULOGD_DTYPE_FLOW,
+	},
+	.output = {
+		.type = ULOGD_DTYPE_SINK,
+	},
+	.configure = &nacct_conf,
+	.interp	= &nacct_interp,
+	.start 	= &nacct_init,
+	.stop	= &nacct_fini,
+	.signal = &sighup_handler_print,
+	.config_kset = &nacct_kset,
+	.version = ULOGD_VERSION,
+};
+
+void __attribute__ ((constructor)) init(void);
+
+void
+init(void)
+{
+	ulogd_register_plugin(&nacct_plugin);
+}

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 02/15] common.h: added
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
  2008-02-02 20:48 ` [ULOGD 01/15] Add NACCT output plugin heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 21:30   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 03/15] Replace timer code by working version heitzenberger
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-add-common-code.diff

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/include/ulogd/common.h
===================================================================
--- /dev/null
+++ ulogd-netfilter/include/ulogd/common.h
@@ -0,0 +1,50 @@
+/*
+ * common.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Holger Eitzenberger <holger@eitzenberger.org>  Astaro AG 2008
+ */
+#ifndef COMMON_H
+#define COMMON_H
+
+#include <stdlib.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <assert.h>
+
+#define min(x, y) ({ \
+        typeof(x) _x = (x);  typeof(y) _y = (y); \
+        _x < _y ? _x : _y; })
+#define max(x, y) ({ \
+        typeof(x) _x = (x);  typeof(y) _y = (y); \
+        _x > _y ? _x : _y; })
+
+#define SEC		* 1
+#define MIN		* 60 SEC
+#define HOUR	* 60 MIN
+#define DAY		* 24 HOUR
+
+#define __unused				__attribute__((unused))
+#define __fmt_printf(i, first)	__attribute__((format (printtf,(i),(f))))
+
+#ifdef DEBUG
+#define pr_debug(fmt, ...)		ulogd_log(ULOGD_DEBUG, fmt, ## __VA_ARGS__)
+#else
+#define pr_debug(fmt, ...)
+#endif /* DEBUG */
+
+#endif /* COMMON_H */

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 03/15] Replace timer code by working version
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
  2008-02-02 20:48 ` [ULOGD 01/15] Add NACCT output plugin heitzenberger
  2008-02-02 20:48 ` [ULOGD 02/15] common.h: added heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 22:45   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 04/15] Add IFI list heitzenberger
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-timer-handling.diff

Replace existing timer code by simple and more importantly working
version.  Current resolution is one second, which may be easily
extended if need be.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/include/ulogd/ulogd.h
===================================================================
--- ulogd-netfilter.orig/include/ulogd/ulogd.h
+++ ulogd-netfilter/include/ulogd/ulogd.h
@@ -240,17 +240,31 @@ void ulogd_unregister_fd(struct ulogd_fd
 int ulogd_select_main();
 
 /***********************************************************************
- * timer handling
+ * timer handling (timer.c)
  ***********************************************************************/
+#define TIMER_F_USED			0x0001
+#define TIMER_F_PERIODIC		0x0002
 
 struct ulogd_timer {
 	struct llist_head list;
-	struct timeval expires;
-	void (*cb)(void *data);
-	void *data;
+	unsigned expires;			/* seconds */
+	unsigned ival;				/* seconds */
+	unsigned flags;
+	void (* cb)(struct ulogd_timer *);
+	void *data;					/* usually (ulogd_pluginstance *) */
 };
 
+extern struct timeval tv_now;
+extern struct timeval tv_now_local;
+
+#define t_now			tv_now.tv_sec
+#define t_now_local		tv_now_local.tv_sec
+
+int ulogd_timer_init(void);
+int ulogd_timer_run(void);
 int ulogd_register_timer(struct ulogd_timer *timer);
 void ulogd_unregister_timer(struct ulogd_timer *timer);
+void ulogd_timer_schedule(void);
+int ulogd_timer_handle(void);
 
 #endif /* _ULOGD_H */
Index: ulogd-netfilter/src/timer.c
===================================================================
--- ulogd-netfilter.orig/src/timer.c
+++ ulogd-netfilter/src/timer.c
@@ -18,6 +18,9 @@
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *
+ * H. Eitzenberger <holger@eitzenberger.org>  Astaro AG, 2007
  */
 
 #include <unistd.h>
@@ -25,143 +28,115 @@
 #include <string.h>
 #include <sys/time.h>
 #include <time.h>
+#include <errno.h>
 
 #include <ulogd/ulogd.h>
+#include <ulogd/common.h>
 #include <ulogd/linuxlist.h>
 
 static LLIST_HEAD(ulogd_timers);
+static struct tm tm_local;
+struct timeval tv_now, tv_now_local;
 
-static void tv_normalize(struct timeval *out)
-{
-	out->tv_sec += (out->tv_usec / 1000000);
-	out->tv_usec = (out->tv_usec % 1000000);
-}
 
-/* subtract two struct timevals */
-static int tv_sub(struct timeval *res, const struct timeval *from,
-		  const struct timeval *sub)
+int
+ulogd_register_timer(struct ulogd_timer *timer)
 {
-	/* FIXME: this stinks.  Deal with wraps, carry, ... */
-	res->tv_sec = from->tv_sec - sub->tv_sec;
-	res->tv_usec = from->tv_usec - sub->tv_usec;
+	pr_debug("%s: timer=%p\n", __func__, timer);
+
+	if (timer->flags & TIMER_F_USED) {
+		ulogd_log(ULOGD_ERROR, "timer already registered\n");
+		return -1;
+	}
+
+	if (timer->flags & TIMER_F_PERIODIC) {
+		timer->expires = t_now + timer->ival;
+	} else {
+		if (timer->expires == 0) {
+			errno = EINVAL;
+			return -1;
+		}
+	}
+
+	timer->flags |= TIMER_F_USED;
+
+	llist_add_tail(&timer->list, &ulogd_timers);
 
 	return 0;
 }
 
-static int tv_add(struct timeval *res, const struct timeval *a1,
-		  const struct timeval *a2)
+
+void
+ulogd_unregister_timer(struct ulogd_timer *timer)
 {
-	unsigned int carry;
+	pr_debug("%s: timer=%p\n", __func__, timer);
 
-	res->tv_sec = a1->tv_sec + a2->tv_sec;
-	res->tv_usec = a1->tv_usec + a2->tv_usec;
+	/* TODO check for race conditions on unregister */
 
-	tv_normalize(res);
-}
+	if ((timer->flags & TIMER_F_USED) == 0)
+		return;					/* not registered */
 
-static int tv_later(const struct timeval *expires, const struct timeval *now)
-{
-	if (expires->tv_sec < now->tv_sec)
-		return 0;
-	else if (expires->tv_sec > now->tv_sec)
-		return 1;
-	else /* if (expires->tv_sec == now->tv_sec */ {
-		if (expires->tv_usec >= now->tv_usec)
-			return 1;
-	}
+	timer->flags &= ~TIMER_F_USED;
 
-	return 0;
+	llist_del(&timer->list);
 }
 
-static int tv_smaller(const struct timeval *t1, const struct timeval *t2)
-{
-	return tv_later(t2, t1);
-}
 
-static int calc_next_expiration(void)
+int
+ulogd_timer_handle(void)
 {
-	struct ulogd_timer *cur;
-	struct timeval min, now, diff;
-	struct itimerval iti;
-	int ret;
-
-retry:
-	if (llist_empty(&ulogd_timers))
-		return 0;
-
-	llist_for_each_entry(cur, &ulogd_timers, list) {
-		if (ulogd_timers.next == &cur->list)
-			min = cur->expires;
-
-		if (tv_smaller(&cur->expires, &min))
-			min = cur->expires;
-	}
+	struct ulogd_timer *t;
 
-	if (tv_sub(&diff, &min, &now) < 0) {
-		/* FIXME: run expired timer callbacks */
-		/* we cannot run timers from here since we might be
-		 * called from register_timer() within check_n_run() */
+	t_now = time(NULL);			/* UTC */
 
-		/* FIXME: restart with next minimum timer */
-		goto retry;
-	}
+	pr_debug("%s: t_now=%ld\n", __func__, t_now);
 
-	/* re-set kernel timer */
-	memset(&iti, 0, sizeof(iti));
-	memcpy(&iti.it_value, &diff, sizeof(iti.it_value));
-	ret = setitimer(ITIMER_REAL, &iti, NULL);
-	if (ret < 0)
-		return ret;
+	/* get offset to local time every hour */
+	if ((t_now % (1 HOUR)) == 0)
+		localtime_r(&t_now, &tm_local);
 
-	return 0;
-}
+	t_now_local = t_now + tm_local.tm_gmtoff;
 
-void ulogd_timer_check_n_run(void)
-{
-	struct ulogd_timer *cur, *cur2;
-	struct timeval now;
+	llist_for_each_entry(t, &ulogd_timers, list) {
+		assert(t->flags & TIMER_F_USED);
 
-	if (gettimeofday(&now, NULL) < 0)
-		return;
+		if (t->expires <= t_now) {
+			(t->cb)(t);
 
-	llist_for_each_entry_safe(cur, cur2, &ulogd_timers, list) {
-		if (tv_later(&cur->expires, &now)) {
-			/* fist delete it from the list of timers */
-			llist_del(&cur->list);
-			/* then call.  called function can re-add it */
-			(cur->cb)(cur->data);
+			if (t->flags & TIMER_F_PERIODIC)
+				t->expires = t_now + t->ival;
+			else
+				llist_del(&t->list);
 		}
 	}
 
-	calc_next_expiration();
+	return 1;
 }
 
 
-int ulogd_register_timer(struct ulogd_timer *timer)
+int
+ulogd_timer_init(void)
 {
-	int ret;
-	struct timeval tv;
-
-	ret = gettimeofday(&tv, NULL);
-	if (ret < 0)
-		return ret;
-
-	/* convert expiration time into absoulte time */
-	timer->expires.tv_sec += tv.tv_sec;
-	timer->expires.tv_usec += tv.tv_usec;
-
-	llist_add_tail(&timer->list, &ulogd_timers);
-
-	/* re-calculate next expiration */
-	calc_next_expiration();
+	t_now = time(NULL);
+	localtime_r(&t_now, &tm_local);
 
 	return 0;
 }
 
-void ulogd_unregister_timer(struct ulogd_timer *timer)
-{
-	llist_del(&timer->list);
 
-	/* re-calculate next expiration */
-	calc_next_expiration();
+/* start periodic timer */
+int
+ulogd_timer_run(void)
+{
+	struct itimerval itv = {	/* run timer every second */
+		.it_interval = { .tv_sec = 1, },
+		.it_value = { .tv_sec = 1, },
+	};
+
+	if (setitimer(ITIMER_REAL, &itv, NULL) < 0) {
+		ulogd_log(ULOGD_ERROR, "setitimer: %s\n", strerror(errno));
+		return -1;
+	}
+
+	return 0;
 }

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 04/15] Add IFI list
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (2 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 03/15] Replace timer code by working version heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 21:36   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 05/15] Add signalling subsystem heitzenberger
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-ifi.diff

IFI implements a list of up-to-date network interfaces with MAC
addresses, mostly used for logging and accounting.  It is therefore
quite redudant to the IFINDEX plugin, but IMO this should go into
ulogd core and not into a plugin.

A future step might be to drop IFINDEX plugin.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/include/ulogd/ifi.h
===================================================================
--- /dev/null
+++ ulogd-netfilter/include/ulogd/ifi.h
@@ -0,0 +1,43 @@
+/*
+ * ifi.h
+ *
+ * Maintain a list of network interfaces.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *
+ * Holger Eitzenberger <holger@eitzenberger.org>  Astaro AG, 2007.
+ */
+#ifndef IFI_H
+#define IFI_H
+
+#include <ulogd/linuxlist.h>
+#include <net/if.h>
+
+
+struct ifi {
+	struct llist_head link;
+	unsigned idx;			/* interface index */
+	unsigned used : 1;
+	unsigned flags;
+	char name[IFNAMSIZ];
+	unsigned char lladdr[6];
+};
+
+int ifi_init(void);
+void ifi_fini(void);
+
+struct ifi *ifi_find_by_idx(unsigned);
+
+#endif /* IFI_H */
Index: ulogd-netfilter/src/ifi.c
===================================================================
--- /dev/null
+++ ulogd-netfilter/src/ifi.c
@@ -0,0 +1,379 @@
+/*
+ * ifi.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Holger Eitzenberger <holger@eitzenberger.org>  Astaro AG 2007.
+ */
+
+#include <ulogd/ulogd.h>
+#include <ulogd/common.h>
+#include <ulogd/ifi.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <linux/rtnetlink.h>
+
+#define IFI_STATIC_MAX			64
+
+
+/* the first IFI_STATIC_MAX entries are kept in ifi_static[] for performance
+   reasons, whereas all entries with an interface index larger than
+   IFI_STATIX_MAX-1 are kept in the linked ifi_list. */
+static LLIST_HEAD(ifi_list);
+static struct ifi ifi_static[IFI_STATIC_MAX];
+
+static unsigned nl_seq;			/* last seq# */
+
+
+static struct ifi *
+ifi_alloc(void)
+{
+	struct ifi *ifi;
+
+	if ((ifi = calloc(1, sizeof(struct ifi))) == NULL)
+		return NULL;
+
+	return ifi;
+}
+
+
+struct ifi *
+ifi_find_by_idx(unsigned idx)
+{
+	struct ifi *ifi;
+
+	if (idx < IFI_STATIC_MAX)
+		ifi = &ifi_static[idx];
+	else {
+		llist_for_each_entry(ifi, &ifi_list, link) {
+			if (ifi->idx == idx)
+				break;
+		}
+
+		if (ifi == NULL)
+			return NULL;
+	}
+
+	return ifi->used ? ifi : NULL;
+}
+
+
+static struct ifi *
+ifi_find_or_add(unsigned idx)
+{
+	struct ifi *ifi = ifi_find_by_idx(idx);
+
+	if (ifi != NULL) {
+		pr_debug("%s: found ifi '%s' (index %u)\n", __func__,
+				 ifi->name, ifi->idx);
+		return ifi;
+	}
+
+	/* add */
+	if (idx < IFI_STATIC_MAX)
+		ifi = &ifi_static[idx];
+	else {
+		if ((ifi = ifi_alloc()) == NULL) {
+			ulogd_log(ULOGD_ERROR, "ifi: out of memory\n");
+			return NULL;
+		}
+
+		llist_add_tail(&ifi->link, &ifi_list);
+	}
+
+	ifi->idx = idx;
+	ifi->used = 1;
+
+	pr_debug("%s: added interface '%s' (index %u)\n", __func__,
+			 ifi->name, ifi->idx);
+
+	return ifi;
+}
+
+
+static bool
+ifi_del(unsigned idx)
+{
+	struct ifi *ifi;
+
+	if (idx < IFI_STATIC_MAX) {
+		ifi = &ifi_static[idx];
+
+		if (ifi->used) {
+			ifi->used = 0;
+
+			pr_debug("%s: deleted interface '%s' (index %u)\n", __func__,
+					 ifi->name, ifi->idx);
+
+			return true;
+		}
+	} else {
+		struct ifi *tmp;
+
+		llist_for_each_entry_safe(ifi, tmp, &ifi_list, link) {
+			if (ifi->idx == idx) {
+				llist_del(&ifi->link);
+				free(ifi);
+
+				return true;
+			}
+		}
+	}
+
+	return false;
+}
+
+
+static void dump_bytes(const char *, unsigned char *, size_t) __unused;
+
+static void
+dump_bytes(const char *prefix, unsigned char *data, size_t len)
+{
+	int i;
+	static char buf[1024];
+	char *pch = buf;
+
+	if (prefix)
+		pch += sprintf(pch, "%s: ", prefix);
+
+	for (i = 0; i < len; i++)
+		pch += sprintf(pch, "0x%.2x ", data[i]);
+
+	fprintf(stdout, "%s\n", buf);
+}
+
+
+static void dump_nlmsg(FILE *, struct nlmsghdr *) __unused;
+
+static void
+dump_nlmsg(FILE *fp, struct nlmsghdr *nlh)
+{
+	fprintf(fp, "rtmsg: len=%04x type=%08x flags=%08x seq=%08x\n",
+			nlh->nlmsg_len,	nlh->nlmsg_type, nlh->nlmsg_flags,
+			nlh->nlmsg_seq);
+}
+
+
+static ssize_t sprint_lladdr(char *, size_t, const unsigned char *) __unused;
+
+static ssize_t
+sprint_lladdr(char *buf, size_t len, const unsigned char *addr)
+{
+	char *pch = buf;
+
+	pch += sprintf(pch, "%02x:%02x:%02x:%02x:%02x:%02x",
+				   addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
+
+	return pch - buf;
+}
+
+static int
+rtnl_parse_attrs(struct rtattr *attr, size_t attr_len,
+				 struct rtattr **rta, size_t rta_len)
+{
+	memset(rta, 0, rta_len * sizeof(struct rtattr *));
+
+	while (RTA_OK(attr, attr_len)) {
+		if (attr->rta_type < rta_len)
+			rta[attr->rta_type] = attr;
+
+		attr = RTA_NEXT(attr, attr_len);
+	}
+
+	return 0;
+}
+
+
+static int
+nl_send(int fd, struct nlmsghdr *nlh)
+{
+	struct sockaddr_nl sa;
+
+	memset(&sa, 0, sizeof(sa));
+
+	sa.nl_family = AF_NETLINK;
+
+	nlh->nlmsg_pid = getpid();
+	nlh->nlmsg_seq = ++nl_seq;
+
+	return sendto(fd, nlh, nlh->nlmsg_len, 0, (struct sockaddr *)&sa,
+				  sizeof(sa));
+}
+
+
+static int
+nl_dump_request(int fd, int type)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct rtgenmsg gen;
+	} req = {
+		.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
+		.gen.rtgen_family = AF_UNSPEC,
+	};
+
+	req.nlh.nlmsg_type = type;
+	req.nlh.nlmsg_len = sizeof(req);
+
+	return nl_send(fd, &req.nlh);
+}
+
+
+static int
+nl_listen(int fd, char *buf, size_t len)
+{
+	return read(fd, buf, len);
+}
+
+
+static int
+rtnl_handle_link(struct nlmsghdr *nlh)
+{
+	struct ifinfomsg *m = NLMSG_DATA(nlh);
+	struct rtattr *ifla[IFLA_MAX];
+	struct ifi *ifi;
+	size_t len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(struct ifinfomsg));
+
+	rtnl_parse_attrs(IFLA_RTA(m), len, ifla, IFLA_MAX);
+
+	switch (nlh->nlmsg_type) {
+	case RTM_NEWLINK:
+		if (m->ifi_flags & IFF_UP) {
+			if ((ifi = ifi_find_or_add(m->ifi_index)) == NULL)
+				return -1;
+
+			ifi->flags = m->ifi_flags;
+
+			if (ifla[IFLA_ADDRESS])
+				memcpy(ifi->lladdr, RTA_DATA(ifla[IFLA_ADDRESS]), 6);
+
+			if (ifla[IFLA_IFNAME])
+				strcpy(ifi->name, RTA_DATA(ifla[IFLA_IFNAME]));
+		} else
+			ifi_del(m->ifi_index);
+		break;
+
+	case RTM_DELLINK:
+		break;
+
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+
+static int
+rtnl_handle_msg(struct nlmsghdr *nlh, size_t len)
+{
+	if (nlh == NULL)
+		return -1;
+
+	while (NLMSG_OK(nlh, len)) {
+		if (nlh->nlmsg_type & NLMSG_DONE)
+			return 0;
+
+#if 0
+		dump_nlmsg(stdout, nlh);
+#endif /* 0 */
+
+		switch (nlh->nlmsg_type) {
+		case RTM_NEWLINK:
+		case RTM_DELLINK:
+			rtnl_handle_link(nlh);
+			break;
+
+		case NLMSG_ERROR:
+			break;
+
+		default:
+			break;
+		}
+
+		nlh = NLMSG_NEXT(nlh, len);
+	}
+
+
+	return 0;
+}
+
+
+static int
+rtnl_read_cb(int fd, unsigned what, void *data)
+{
+	static char buf[4096];
+
+	for (;;) {
+		int nbytes;
+
+		if ((nbytes = nl_listen(fd, buf, sizeof(buf))) < 0) {
+			if (errno == EWOULDBLOCK)
+				return 0;
+
+			ulogd_log(ULOGD_ERROR, "nl_listen: %s\n", strerror(errno));
+
+			return -1;
+		}
+
+		rtnl_handle_msg((struct nlmsghdr *)buf, nbytes);
+	}
+
+	return 0;
+}
+
+
+static struct ulogd_fd nl_fd = {
+	.fd = -1,
+	.cb = rtnl_read_cb,
+	.when = ULOGD_FD_READ,
+};
+
+
+int
+ifi_init(void)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+		.nl_groups = RTNLGRP_LINK,
+	};
+
+	pr_debug("Initializing ifi\n");
+
+	sa.nl_pid = getpid();
+
+	if ((nl_fd.fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)) < 0) {
+		ulogd_log(ULOGD_ERROR, "ifi: socket: %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (bind(nl_fd.fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
+		ulogd_log(ULOGD_ERROR, "ifi: bind: %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (connect(nl_fd.fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
+		ulogd_log(ULOGD_ERROR, "ifi: connect: %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (ulogd_register_fd(&nl_fd) < 0)
+		return -1;
+
+	nl_dump_request(nl_fd.fd, RTM_GETLINK);
+
+	return 0;
+}
Index: ulogd-netfilter/src/Makefile.am
===================================================================
--- ulogd-netfilter.orig/src/Makefile.am
+++ ulogd-netfilter/src/Makefile.am
@@ -5,5 +5,5 @@ AM_CPPFLAGS = $(all_includes) -I$(top_sr
 
 sbin_PROGRAMS = ulogd
 
-ulogd_SOURCES = ulogd.c select.c timer.c conffile.c
+ulogd_SOURCES = ulogd.c ifi.c select.c timer.c conffile.c
 ulogd_LDFLAGS = -export-dynamic
Index: ulogd-netfilter/src/ulogd.c
===================================================================
--- ulogd-netfilter.orig/src/ulogd.c
+++ ulogd-netfilter/src/ulogd.c
@@ -63,6 +63,9 @@
 #include <syslog.h>
 #include <ulogd/conffile.h>
 #include <ulogd/ulogd.h>
+#include <ulogd/ifi.h>
+
+
 #ifdef DEBUG
 #define DEBUGP(format, args...) fprintf(stderr, format, ## args)
 #else
@@ -983,6 +986,9 @@ int main(int argc, char* argv[])
 	ulogd_log(ULOGD_INFO, 
 		  "initialization finished, entering main loop\n");
 
+	if (ifi_init() < 0)
+		exit(EXIT_FAILURE);
+
 	ulogd_main_loop();
 
 	/* hackish, but result is the same */

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 05/15] Add signalling subsystem
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (3 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 04/15] Add IFI list heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-19 19:38   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 06/15] Conffile cleanup, use common pr_debug() heitzenberger
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-fix-signal-handling.diff

This patch adds the concept of synchronous and asynchronous signal
handlers to ulogd, where 'synchronous' just means to be synchronous to
the underlying IO multiplexer.

Will later be used by plugins like SQLITE3 and NFCT.

One of the changes herein is the usage of the pthread library.  This
is strictly necessary because some plugins might (and SQLITE3 *does*)
use pthreads.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/src/Makefile.am
===================================================================
--- ulogd-netfilter.orig/src/Makefile.am
+++ ulogd-netfilter/src/Makefile.am
@@ -5,5 +5,5 @@ AM_CPPFLAGS = $(all_includes) -I$(top_sr
 
 sbin_PROGRAMS = ulogd
 
-ulogd_SOURCES = ulogd.c ifi.c select.c timer.c conffile.c
-ulogd_LDFLAGS = -export-dynamic
+ulogd_SOURCES = ulogd.c ifi.c select.c timer.c signal.c conffile.c
+ulogd_LDFLAGS = -export-dynamic -lpthread
Index: ulogd-netfilter/src/select.c
===================================================================
--- ulogd-netfilter.orig/src/select.c
+++ ulogd-netfilter/src/select.c
@@ -23,6 +23,7 @@
 
 #include <fcntl.h>
 #include <ulogd/ulogd.h>
+#include <ulogd/common.h>
 #include <ulogd/linuxlist.h>
 
 static int maxfd = 0;
@@ -59,6 +60,7 @@ int ulogd_select_main()
 {
 	struct ulogd_fd *ufd;
 	fd_set readset, writeset, exceptset;
+	struct timeval tv = { .tv_sec = 1, };
 	int i;
 
 	FD_ZERO(&readset);
@@ -77,7 +79,13 @@ int ulogd_select_main()
 			FD_SET(ufd->fd, &exceptset);
 	}
 
-	i = select(maxfd+1, &readset, &writeset, &exceptset, NULL);
+ again:
+	i = select(maxfd+1, &readset, &writeset, &exceptset, &tv);
+	if (i < 0) {
+		if (errno == EINTR)
+			goto again;
+	}
+
 	if (i > 0) {
 		/* call registered callback functions */
 		llist_for_each_entry(ufd, &ulogd_fds, list) {
@@ -96,5 +104,6 @@ int ulogd_select_main()
 				ufd->cb(ufd->fd, flags, ufd->data);
 		}
 	}
+
 	return i;
 }
Index: ulogd-netfilter/src/ulogd.c
===================================================================
--- ulogd-netfilter.orig/src/ulogd.c
+++ ulogd-netfilter/src/ulogd.c
@@ -53,16 +53,19 @@
 #include <errno.h>
 #include <time.h>
 #include <ctype.h>
-#include <signal.h>
 #include <dlfcn.h>
 #include <sys/types.h>
 #include <dirent.h>
 #include <getopt.h>
 #include <pwd.h>
 #include <grp.h>
+#include <pthread.h>
 #include <syslog.h>
-#include <ulogd/conffile.h>
+
 #include <ulogd/ulogd.h>
+#include <ulogd/common.h>
+#include <ulogd/conffile.h>
+#include <ulogd/signal.h>
 #include <ulogd/ifi.h>
 
 
@@ -204,7 +207,6 @@ int ulogd_wildcard_inputkeys(struct ulog
 
 	/* second pass: copy key names */
 	llist_for_each_entry(pi_cur, &stack->list, list) {
-		struct ulogd_key *cur;
 		int i;
 
 		for (i = 0; i < pi_cur->plugin->output.num_keys; i++)
@@ -298,7 +300,7 @@ void __ulogd_log(int level, char *file, 
 		vsyslog(ulogd2syslog_level(level), format, ap);
 		va_end(ap);
 	} else {
-		if (logfile)
+  		if (logfile)
 			outfd = logfile;
 		else
 			outfd = stderr;
@@ -711,21 +713,22 @@ out_buf:
 
 static int ulogd_main_loop(void)
 {
+	sigset_t curr;
 	int ret = 0;
 
-	while (1) {
+	ulogd_get_sigset(&curr);
+
+	pthread_sigmask(SIG_UNBLOCK, &curr, NULL);
+
+	for (;;) {
 		ret = ulogd_select_main();
 		if (ret == 0) 
 			continue;
 
 		if (ret < 0) {
-			if (errno == -EINTR)
-				continue;
-			else {
-				ulogd_log(ULOGD_ERROR, "select returned %s\n",
+			ulogd_log(ULOGD_ERROR, "select returned %s\n",
 					  strerror(errno));
-				break;
-			}
+			break;
 		}
 	}
 
@@ -736,7 +739,7 @@ static int ulogd_main_loop(void)
 static int logfile_open(const char *name)
 {
 	if (name)
-		ulogd_logfile = name;
+		ulogd_logfile = (char *)name;
 
 	if (!strcmp(name, "stdout")) {
 		logfile = stdout;
@@ -794,54 +797,83 @@ static int parse_conffile(const char *se
 	return 1;
 }
 
-static void deliver_signal_pluginstances(int signal)
+static int
+for_each_pluginstance(int (* cb)(struct ulogd_pluginstance *,
+								 struct ulogd_pluginstance_stack *,
+								 void *), void *arg)
 {
 	struct ulogd_pluginstance_stack *stack;
-	struct ulogd_pluginstance *pi;
+	int sum = 0;
+
+	pr_debug("%s: cb=%p\n", __func__, cb);
 
 	llist_for_each_entry(stack, &ulogd_pi_stacks, stack_list) {
+		struct ulogd_pluginstance *pi;
+
 		llist_for_each_entry(pi, &stack->list, list) {
-			if (pi->plugin->signal)
-				(*pi->plugin->signal)(pi, signal);
+			int ret;
+
+			if ((ret = cb(pi, stack, arg)) < 0)
+				return -1;
+
+			sum += ret;
 		}
 	}
+
+	return sum;
 }
 
-static void sigterm_handler(int signal)
+static int
+_do_signal(struct ulogd_pluginstance *pi,
+		   struct ulogd_pluginstance_stack *stack, void *arg)
 {
-	
-	ulogd_log(ULOGD_NOTICE, "sigterm received, exiting\n");
+	int signo = (int) arg;
 
-	deliver_signal_pluginstances(signal);
+	if (pi->plugin->signal) {
+		pi->plugin->signal(pi, signo);
 
-	if (logfile != stdout)
-		fclose(logfile);
+		return 1;
+	}
 
-	exit(0);
+	return 0;
 }
 
-static void signal_handler(int signal)
+static void
+sync_sig_handler(int signo)
 {
-	ulogd_log(ULOGD_NOTICE, "signal received, calling pluginstances\n");
-	
-	switch (signal) {
+	pr_debug("%s: signal '%d' received\n", __func__, signo);
+
+	switch (signo) {
 	case SIGHUP:
-		/* reopen logfile */
-		if (logfile != stdout && logfile != &syslog_dummy) {
-			fclose(logfile);
-			logfile = fopen(ulogd_logfile, "a");
-			if (!logfile)
-				sigterm_handler(signal);
-		}
 		break;
+
 	case SIGALRM:
-		ulogd_timer_check_n_run();
+		ulogd_timer_handle();
+		break;
+
+	case SIGTERM:
 		break;
+
 	default:
 		break;
 	}
 
-	deliver_signal_pluginstances(signal);
+	for_each_pluginstance(_do_signal, (void *) signo);
+}
+
+static void
+sig_handler(int signo)
+{
+	pr_debug("%s: signal '%d' received\n", __func__, signo);
+
+	switch (signo) {
+	case SIGINT:
+		exit(0);
+		break;
+
+	default:
+		break;
+	}
 }
 
 static void print_usage(void)
@@ -867,7 +899,8 @@ static struct option opts[] = {
 	{ 0 }
 };
 
-int main(int argc, char* argv[])
+int
+main(int argc, char* argv[])
 {
 	int argch;
 	int daemonize = 0;
@@ -922,6 +955,11 @@ int main(int argc, char* argv[])
 		}
 	}
 
+	if (ulogd_signal_init() < 0)
+		exit(EXIT_FAILURE);
+
+	ulogd_timer_init();
+
 	if (config_register_file(ulogd_configfile)) {
 		ulogd_log(ULOGD_FATAL, "error registering configfile \"%s\"\n",
 			  ulogd_configfile);
@@ -977,21 +1015,21 @@ int main(int argc, char* argv[])
 		setsid();
 	}
 
-	signal(SIGTERM, &sigterm_handler);
-	signal(SIGHUP, &signal_handler);
-	signal(SIGALRM, &signal_handler);
-	signal(SIGUSR1, &signal_handler);
-	signal(SIGUSR2, &signal_handler);
-
-	ulogd_log(ULOGD_INFO, 
-		  "initialization finished, entering main loop\n");
+	ulogd_register_signal(SIGTERM, sync_sig_handler, ULOGD_SIGF_SYNC);
+	ulogd_register_signal(SIGINT, sig_handler, 0);
+	ulogd_register_signal(SIGHUP, sync_sig_handler, ULOGD_SIGF_SYNC);
+	ulogd_register_signal(SIGALRM, sync_sig_handler, ULOGD_SIGF_SYNC);
+	ulogd_register_signal(SIGUSR1, sync_sig_handler, ULOGD_SIGF_SYNC);
+	ulogd_register_signal(SIGUSR2, sync_sig_handler, ULOGD_SIGF_SYNC);
 
 	if (ifi_init() < 0)
 		exit(EXIT_FAILURE);
 
+	ulogd_timer_run();
+
+	ulogd_log(ULOGD_INFO, "entering main loop\n");
+
 	ulogd_main_loop();
 
-	/* hackish, but result is the same */
-	sigterm_handler(SIGTERM);	
-	return(0);
+	return 0;
 }
Index: ulogd-netfilter/include/ulogd/signal.h
===================================================================
--- /dev/null
+++ ulogd-netfilter/include/ulogd/signal.h
@@ -0,0 +1,46 @@
+/*
+ * signal.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *
+ * Holger Eitzenberger <holger@eitzenberger.org>  Astaro AG, 2007.
+ */
+#ifndef SIGNAL_H
+#define SIGNAL_H
+
+#include <signal.h>
+#include <ulogd/linuxlist.h>
+
+
+/* signal flags */
+#define ULOGD_SIGF_SYNC		0x00000001 /* signal is synchronous */
+
+
+struct ulogd_signal {
+	struct llist_head link;
+	int signo;
+	unsigned flags;
+	void (* handler)(int);
+};
+
+
+struct ulogd_signal *ulogd_register_signal(int, void (*)(int), unsigned);
+int ulogd_unregister_signal(struct ulogd_signal *);
+int ulogd_sigaddset(int);
+int ulogd_get_sigset(sigset_t *);
+int ulogd_deliver_signal(int signo);
+int ulogd_signal_init(void);
+
+#endif /* SIGNAL_H */
Index: ulogd-netfilter/src/signal.c
===================================================================
--- /dev/null
+++ ulogd-netfilter/src/signal.c
@@ -0,0 +1,193 @@
+/*
+ * signal.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Holger Eitzenberger <holger@eitzenberger.org  Astaro AG, 2007.
+ */
+#include <ulogd/ulogd.h>
+#include <ulogd/common.h>
+#include <ulogd/signal.h>
+#include <unistd.h>
+
+#define SIG_F_USED		0x00000001
+
+
+static struct sig_state {
+	struct llist_head head;
+	struct llist_head async_head;
+	unsigned flags;
+	unsigned cnt;
+} sig_state[NSIG];
+static sigset_t currset;
+static int sig_pipe[2] = { -1, -1 };
+static struct ulogd_fd sig_pipe_fd;
+
+
+static void
+sig_handler(int signo)
+{
+	struct ulogd_signal *sig;
+	sigset_t sigset;
+
+	assert(sig_pipe[1] >= 0);
+
+	pr_debug("%s: received signal '%d'\n", __func__, signo);
+
+	sigemptyset(&sigset);
+	sigaddset(&sigset, signo);
+
+	pthread_sigmask(SIG_BLOCK, &sigset, NULL);
+
+	llist_for_each_entry(sig, &sig_state[signo].async_head, link)
+		sig->handler(signo);
+
+	pthread_sigmask(SIG_UNBLOCK, &sigset, NULL);
+
+	if (!llist_empty(&sig_state[signo].head))
+		write(sig_pipe[1], &signo, sizeof(signo));
+}
+
+struct ulogd_signal *
+ulogd_register_signal(int signo, void (* sigh)(int), unsigned flags)
+{
+	struct ulogd_signal *sig;
+
+	if (signo < 0 || signo > NSIG || sigh == NULL)
+		return NULL;
+
+	/* TODO check signo for values which may not be used synchronous */
+
+	pr_debug("%s: registering handler %p for signal '%d'\n", __func__,
+			 sigh, signo);
+
+	if ((sig = calloc(1, sizeof(struct ulogd_signal))) == NULL)
+		return NULL;
+
+	sig->signo = signo;
+	sig->flags = flags;
+	sig->handler = sigh;
+
+	sig_state[signo].cnt++;
+
+	/* add real signal handler */
+	if ((sig_state[signo].flags & SIG_F_USED) == 0) {
+		signal(signo, sig_handler);
+
+		sig_state[signo].flags |= SIG_F_USED;
+	}
+
+	if (flags & ULOGD_SIGF_SYNC)
+		llist_add_tail(&sig->link, &sig_state[signo].head);
+	else
+		llist_add_tail(&sig->link, &sig_state[signo].async_head);
+
+	ulogd_sigaddset(signo);
+
+	return sig;
+}
+
+int
+ulogd_unregister_signal(struct ulogd_signal *sig)
+{
+	if (sig == NULL || sig->signo < 0 || sig->signo >= NSIG)
+		return -1;
+
+	pr_debug("%s: unregistering handler %p\n", __func__, sig);
+
+	if (--sig_state[sig->signo].cnt == 0)
+		signal(sig->signo, SIG_DFL);
+
+	llist_del(&sig->link);
+
+	free(sig);
+
+	return 0;
+}
+
+int
+ulogd_sigaddset(int signo)
+{
+	return sigaddset(&currset, signo);
+}
+
+int
+ulogd_get_sigset(sigset_t *sigset)
+{
+	assert(sigset != NULL);
+
+	memcpy(sigset, &currset, sizeof(sigset_t));
+
+	return 0;
+}
+
+static int
+sig_pipe_cb(int fd, unsigned what, void *arg)
+{
+	struct ulogd_signal *sig;
+	sigset_t sigset;
+	int signo, nbytes;
+
+	assert(what == ULOGD_FD_READ);
+
+	nbytes = read(fd, &signo, sizeof(signo));
+
+	pr_debug("%s: signo=%d\n", __func__, signo);
+
+	assert(nbytes == sizeof(signo));
+
+	if (signo < 0 || signo > NSIG)
+		abort();
+
+	sigemptyset(&sigset);
+	sigaddset(&sigset, signo);
+
+	pthread_sigmask(SIG_BLOCK, &sigset, NULL);
+
+	llist_for_each_entry(sig, &sig_state[signo].head, link)
+		sig->handler(signo);
+
+	pthread_sigmask(SIG_UNBLOCK, &sigset, NULL);
+
+	return 0;
+}
+
+int
+ulogd_signal_init(void)
+{
+	int i;
+	sigset_t fullset;
+
+	sigfillset(&fullset);
+	pthread_sigmask(SIG_SETMASK, &fullset, NULL);
+
+	sigemptyset(&currset);
+
+	for (i = 0; i < NSIG; i++) {
+		INIT_LLIST_HEAD(&sig_state[i].head);
+		INIT_LLIST_HEAD(&sig_state[i].async_head);
+	}
+
+	/* init signal pipe */
+	if (pipe(sig_pipe) < 0) {
+		ulogd_log(ULOGD_FATAL, "unable to initialize signal pipe\n");
+		return -1;
+	}
+
+	sig_pipe_fd.fd = sig_pipe[0];
+	sig_pipe_fd.cb = sig_pipe_cb;
+	sig_pipe_fd.when = ULOGD_FD_READ;
+
+	return ulogd_register_fd(&sig_pipe_fd);
+}

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 06/15] Conffile cleanup, use common pr_debug()
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (4 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 05/15] Add signalling subsystem heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 21:43   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 07/15] Renice to -1 on startup heitzenberger
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-config-changes.diff

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/src/conffile.c
===================================================================
--- ulogd-netfilter.orig/src/conffile.c
+++ ulogd-netfilter/src/conffile.c
@@ -17,17 +17,10 @@
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
-
-#include <stdio.h>
-#include <string.h>
-#include <stdlib.h>
+#include <ulogd/ulogd.h>
+#include <ulogd/common.h>
 #include <ulogd/conffile.h>
 
-#ifdef DEBUG_CONF
-#define DEBUGC(format, args...) fprintf(stderr, format, ## args)
-#else
-#define DEBUGC(format, args...)
-#endif
 
 /* points to config entry with error */
 struct config_entry *config_errce = NULL;
@@ -101,6 +94,8 @@ int config_register_file(const char *fil
 	if (fname)
 		return 1;
 
+	pr_debug("%s: registered config file '%s'\n", __func__, file);
+
 	fname = (char *) malloc(strlen(file)+1);
 	if (!fname)
 		return -ERROOM;
@@ -121,12 +116,12 @@ int config_parse_file(const char *sectio
 	char linebuf[LINE_LEN+1];
 	char *line = linebuf;
 
+	pr_debug("%s: section='%s' file='%s'\n", __func__, section, fname);
+
 	cfile = fopen(fname, "r");
 	if (!cfile)
 		return -ERROPEN;
 
-	DEBUGC("parsing section [%s]\n", section);
-
 	/* Search for correct section */
 	while (fgets(line, LINE_LEN, cfile)) {
 		char wordbuf[LINE_LEN];
@@ -137,7 +132,7 @@ int config_parse_file(const char *sectio
 
 		if (!(wordend = get_word(line, " \t\n[]", (char *) wordbuf)))
 			continue;
-		DEBUGC("word: \"%s\"\n", wordbuf);
+		pr_debug("word: \"%s\"\n", wordbuf);
 		if (!strcmp(wordbuf, section)) {
 			found = 1;
 			break;
@@ -156,7 +151,7 @@ int config_parse_file(const char *sectio
 		char wordbuf[LINE_LEN];
 		char *wordend;
 		
-		DEBUGC("line read: %s\n", line);
+		pr_debug("line read: %s\n", line);
 		if (*line == '#')
 			continue;
 
@@ -164,14 +159,14 @@ int config_parse_file(const char *sectio
 			continue;
 
 		if (wordbuf[0] == '[' ) {
-			DEBUGC("Next section '%s' encountered\n", wordbuf);
+			pr_debug("Next section '%s' encountered\n", wordbuf);
 			break;
 		}
 
-		DEBUGC("parse_file: entering main loop\n");
+		pr_debug("parse_file: entering main loop\n");
 		for (i = 0; i < kset->num_ces; i++) {
 			struct config_entry *ce = &kset->ces[i];
-			DEBUGC("parse main loop, key: %s\n", ce->key);
+			pr_debug("parse main loop, key: %s\n", ce->key);
 			if (strcmp(ce->key, (char *) &wordbuf)) {
 				continue;
 			}
@@ -181,7 +176,7 @@ int config_parse_file(const char *sectio
 
 			if (ce->hit && !(ce->options & CONFIG_OPT_MULTI))
 			{
-				DEBUGC("->ce-hit and option not multi!\n");
+				pr_debug("->ce-hit and option not multi!\n");
 				config_errce = ce;
 				err = -ERRMULT;
 				goto cpf_error;
@@ -205,15 +200,15 @@ int config_parse_file(const char *sectio
 			}
 			break;
 		}
-		DEBUGC("parse_file: exiting main loop\n");
+		pr_debug("parse_file: exiting main loop\n");
 	}
 
 
 	for (i = 0; i < kset->num_ces; i++) {
 		struct config_entry *ce = &kset->ces[i];
-		DEBUGC("ce post loop, ce=%s\n", ce->key);
+		pr_debug("ce post loop, ce=%s\n", ce->key);
 		if ((ce->options & CONFIG_OPT_MANDATORY) && (ce->hit == 0)) {
-			DEBUGC("Mandatory config directive \"%s\" not found\n",
+			pr_debug("Mandatory config directive \"%s\" not found\n",
 				ce->key);
 			config_errce = ce;
 			err = -ERRMAND;

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 07/15] Renice to -1 on startup
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (5 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 06/15] Conffile cleanup, use common pr_debug() heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 21:47   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 08/15] Initial round to make plugins reconfigurable heitzenberger
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-renice-on-startup.diff

Thus possibly preventing e.g. ctnetlink from overruns on busy sites.

TODO: make this conditional

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/src/ulogd.c
===================================================================
--- ulogd-netfilter.orig/src/ulogd.c
+++ ulogd-netfilter/src/ulogd.c
@@ -1004,6 +1004,8 @@ main(int argc, char* argv[])
 		}
 	}
 
+	nice(-1);
+
 	if (daemonize){
 		if (fork()) {
 			exit(0);

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 08/15] Initial round to make plugins reconfigurable
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (6 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 07/15] Renice to -1 on startup heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 20:48 ` [ULOGD 09/15] llist: add llist_for_each_prev_safe() heitzenberger
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-allow-plugins-to-be-reconfigurable.diff

In order to make ulogd fully reconfigurable without restarting this
patch does some introductory work letting plugins be able to be
flagged as reconfigurable.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/include/ulogd/ulogd.h
===================================================================
--- ulogd-netfilter.orig/include/ulogd/ulogd.h
+++ ulogd-netfilter/include/ulogd/ulogd.h
@@ -125,6 +125,10 @@ struct ulogd_keyset {
 
 struct ulogd_pluginstance_stack;
 struct ulogd_pluginstance;
+
+/* plugin flags */
+#define ULOGD_PF_RECONF			0x00000001
+
 struct ulogd_plugin {
 	/* global list of plugins */
 	struct llist_head list;
@@ -135,6 +139,8 @@ struct ulogd_plugin {
 	/* ID for this plugin (dynamically assigned) */
 	unsigned int id;
 
+	unsigned flags;
+
 	struct ulogd_keyset input;
 	struct ulogd_keyset output;
 
Index: ulogd-netfilter/src/ulogd.c
===================================================================
--- ulogd-netfilter.orig/src/ulogd.c
+++ ulogd-netfilter/src/ulogd.c
@@ -218,6 +218,23 @@ int ulogd_wildcard_inputkeys(struct ulog
 	return 0;
 }
 
+/*
+  ulogd_pluginstance_reset_cfg() - reset to default config
+*/
+int
+ulogd_pluginstance_reset_cfg(const struct ulogd_pluginstance *pi)
+{
+	size_t size;
+
+	assert(pi->plugin != NULL);
+
+	size = sizeof(struct config_keyset)
+		+ pi->plugin->config_kset->num_ces * sizeof(struct config_entry);
+
+	memcpy(pi->config_kset, pi->plugin->config_kset, size);
+
+	return 0;
+}
 
 /***********************************************************************
  * PLUGIN MANAGEMENT 
@@ -823,6 +840,69 @@ for_each_pluginstance(int (* cb)(struct 
 	return sum;
 }
 
+enum ReconfOp
+{
+	INVAL = 0,
+	STOP,
+	CONFIGURE,
+	START,
+};
+
+static int
+_do_reconf(struct ulogd_pluginstance *pi,
+		   struct ulogd_pluginstance_stack *stack, void *arg)
+{
+	enum ReconfOp op = (unsigned)arg;
+	int ret = 0;
+
+	assert(pi != NULL);
+
+	if ((pi->plugin->flags & ULOGD_PF_RECONF) == 0)
+		return 0;
+
+	switch (op) {
+	case STOP:
+		ret = pi->plugin->stop(pi);
+		break;
+
+	case CONFIGURE:
+		ulogd_pluginstance_reset_cfg(pi);
+		ret = pi->plugin->configure(pi, stack);
+		break;
+
+	case START:
+		ret = pi->plugin->start(pi);
+		break;
+
+	default:
+		return -1;
+	}
+
+	if (ret < 0) {
+		ulogd_log(ULOGD_FATAL, "reconfiguring '%s' failed\n", pi->id);
+		return -1;
+	}
+
+	return 1;
+}
+
+static int
+reconfigure_plugins(void)
+{
+	ulogd_log(ULOGD_INFO, "reconfiguring plugins\n");
+
+	if (for_each_pluginstance(_do_reconf, (void *)STOP) < 0)
+		abort();
+
+	if (for_each_pluginstance(_do_reconf, (void *)CONFIGURE) < 0)
+		abort();
+
+	if (for_each_pluginstance(_do_reconf, (void *)START) < 0)
+		abort();
+
+	return 0;
+}
+
 static int
 _do_signal(struct ulogd_pluginstance *pi,
 		   struct ulogd_pluginstance_stack *stack, void *arg)
@@ -845,6 +925,7 @@ sync_sig_handler(int signo)
 
 	switch (signo) {
 	case SIGHUP:
+		reconfigure_plugins();
 		break;
 
 	case SIGALRM:

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 09/15] llist: add llist_for_each_prev_safe()
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (7 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 08/15] Initial round to make plugins reconfigurable heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 20:48 ` [ULOGD 10/15] Improve select performance heitzenberger
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-common-add-llist_for_each_prev_safe.diff

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/include/ulogd/linuxlist.h
===================================================================
--- ulogd-netfilter.orig/include/ulogd/linuxlist.h
+++ ulogd-netfilter/include/ulogd/linuxlist.h
@@ -245,6 +245,16 @@ static inline void llist_splice_init(str
         	pos = pos->prev, prefetch(pos->prev))
         	
 /**
+ * llist_for_each_prev_safe - iterate over llist backwards safe against removal
+ * @pos:	the type * to use as a loop counter.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your llist.
+ */
+#define llist_for_each_prev_safe(pos, n, head) \
+    for (pos = (head)->prev, n = pos->prev; pos != (head); \
+        pos = n, n = pos->prev)
+
+/**
  * llist_for_each_safe	-	iterate over a llist safe against removal of llist entry
  * @pos:	the &struct llist_head to use as a loop counter.
  * @n:		another &struct llist_head to use as temporary storage

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 10/15] Improve select performance
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (8 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 09/15] llist: add llist_for_each_prev_safe() heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-19 19:58   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 11/15] Add set_sockbuf_len() heitzenberger
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-select-performance-improvement.diff

The previous code consumed quite lots of CPU cycles because of 
inefficiently handling fd_sets.

This patch corrects that.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/src/select.c
===================================================================
--- ulogd-netfilter.orig/src/select.c
+++ ulogd-netfilter/src/select.c
@@ -24,86 +24,125 @@
 #include <fcntl.h>
 #include <ulogd/ulogd.h>
 #include <ulogd/common.h>
+#include <ulogd/signal.h>
 #include <ulogd/linuxlist.h>
 
-static int maxfd = 0;
+static fd_set readset, writeset, exceptset;
+static int maxfd = -1;
 static LLIST_HEAD(ulogd_fds);
 
-int ulogd_register_fd(struct ulogd_fd *fd)
+
+static int
+set_nonblock(int fd)
 {
 	int flags;
 
-	/* make FD nonblocking */
-	flags = fcntl(fd->fd, F_GETFL);
-	if (flags < 0)
+	if ((flags = fcntl(fd, F_GETFL)) < 0)
 		return -1;
+
 	flags |= O_NONBLOCK;
-	flags = fcntl(fd->fd, F_SETFL, flags);
-	if (flags < 0)
+
+	if ((flags = fcntl(fd, F_SETFL, flags)) < 0)
+		return -1;
+
+	return 0;
+}
+
+int
+ulogd_register_fd(struct ulogd_fd *ufd)
+{
+	if (set_nonblock(ufd->fd) < 0)
 		return -1;
 
-	/* Register FD */
-	if (fd->fd > maxfd)
-		maxfd = fd->fd;
+	if (ufd->when & ULOGD_FD_READ)
+		FD_SET(ufd->fd, &readset);
+	
+	if (ufd->when & ULOGD_FD_WRITE)
+		FD_SET(ufd->fd, &writeset);
+	
+	if (ufd->when & ULOGD_FD_EXCEPT)
+		FD_SET(ufd->fd, &exceptset);
+
+	if (ufd->fd > maxfd)
+		maxfd = ufd->fd;
 
-	llist_add_tail(&fd->list, &ulogd_fds);
+	llist_add_tail(&ufd->list, &ulogd_fds);
 
 	return 0;
 }
 
-void ulogd_unregister_fd(struct ulogd_fd *fd)
+void
+ulogd_unregister_fd(struct ulogd_fd *ufd)
 {
-	llist_del(&fd->list);
+	if (ufd->when & ULOGD_FD_READ)
+		FD_CLR(ufd->fd, &readset);
+	
+	if (ufd->when & ULOGD_FD_WRITE)
+		FD_CLR(ufd->fd, &writeset);
+	
+	if (ufd->when & ULOGD_FD_EXCEPT)
+		FD_CLR(ufd->fd, &exceptset);
+
+	llist_del(&ufd->list);
+
+	maxfd = -1;
+	llist_for_each_entry(ufd, &ulogd_fds, list) {
+		assert(ufd->fd >= 0);
+
+		if (ufd->fd > maxfd)
+			maxfd = ufd->fd;
+	}
 }
 
-int ulogd_select_main()
+/* ulogd_dispatch() - dispatch events */
+int
+ulogd_dispatch(void)
 {
-	struct ulogd_fd *ufd;
-	fd_set readset, writeset, exceptset;
-	struct timeval tv = { .tv_sec = 1, };
-	int i;
-
-	FD_ZERO(&readset);
-	FD_ZERO(&writeset);
-	FD_ZERO(&exceptset);
+	fd_set rds_tmp, wrs_tmp, exs_tmp;
+	sigset_t curr_ss;
 
-	/* prepare read and write fdsets */
-	llist_for_each_entry(ufd, &ulogd_fds, list) {
-		if (ufd->when & ULOGD_FD_READ)
-			FD_SET(ufd->fd, &readset);
+	ulogd_get_sigset(&curr_ss);
 
-		if (ufd->when & ULOGD_FD_WRITE)
-			FD_SET(ufd->fd, &writeset);
+	pthread_sigmask(SIG_UNBLOCK, &curr_ss, NULL);
 
-		if (ufd->when & ULOGD_FD_EXCEPT)
-			FD_SET(ufd->fd, &exceptset);
-	}
+	for (;;) {
+		struct ulogd_fd *ufd;
+		int n;
 
- again:
-	i = select(maxfd+1, &readset, &writeset, &exceptset, &tv);
-	if (i < 0) {
-		if (errno == EINTR)
-			goto again;
-	}
+	again:
+		rds_tmp = readset;
+		wrs_tmp = writeset;
+		exs_tmp = exceptset;
 
-	if (i > 0) {
-		/* call registered callback functions */
-		llist_for_each_entry(ufd, &ulogd_fds, list) {
-			int flags = 0;
-
-			if (FD_ISSET(ufd->fd, &readset))
-				flags |= ULOGD_FD_READ;
+		n = select(maxfd+1, &rds_tmp, &wrs_tmp, &exs_tmp, NULL);
+		if (n < 0) {
+			if (errno == EINTR)
+				goto again;
 
-			if (FD_ISSET(ufd->fd, &writeset))
-				flags |= ULOGD_FD_WRITE;
+			ulogd_log(ULOGD_FATAL, "select: %m\n");
 
-			if (FD_ISSET(ufd->fd, &exceptset))
-				flags |= ULOGD_FD_EXCEPT;
+			break;
+		}
 
-			if (flags)
-				ufd->cb(ufd->fd, flags, ufd->data);
+		if (n > 0) {
+			/* call registered callback functions */
+			llist_for_each_entry(ufd, &ulogd_fds, list) {
+				int flags = 0;
+
+				if (FD_ISSET(ufd->fd, &rds_tmp))
+					flags |= ULOGD_FD_READ;
+
+				if (FD_ISSET(ufd->fd, &wrs_tmp))
+					flags |= ULOGD_FD_WRITE;
+
+				if (FD_ISSET(ufd->fd, &exs_tmp))
+					flags |= ULOGD_FD_EXCEPT;
+
+				if (flags)
+					ufd->cb(ufd->fd, flags, ufd->data);
+			}
 		}
 	}
 
-	return i;
+	return 0;
 }
Index: ulogd-netfilter/src/ulogd.c
===================================================================
--- ulogd-netfilter.orig/src/ulogd.c
+++ ulogd-netfilter/src/ulogd.c
@@ -727,32 +727,6 @@ out_buf:
 	return ret;
 }
 	
-
-static int ulogd_main_loop(void)
-{
-	sigset_t curr;
-	int ret = 0;
-
-	ulogd_get_sigset(&curr);
-
-	pthread_sigmask(SIG_UNBLOCK, &curr, NULL);
-
-	for (;;) {
-		ret = ulogd_select_main();
-		if (ret == 0) 
-			continue;
-
-		if (ret < 0) {
-			ulogd_log(ULOGD_ERROR, "select returned %s\n",
-					  strerror(errno));
-			break;
-		}
-	}
-
-	return ret;
-}
-
-/* open the logfile */
 static int logfile_open(const char *name)
 {
 	if (name)
@@ -1112,7 +1086,7 @@ main(int argc, char* argv[])
 
 	ulogd_log(ULOGD_INFO, "entering main loop\n");
 
-	ulogd_main_loop();
+	ulogd_dispatch();
 
 	return 0;
 }
Index: ulogd-netfilter/include/ulogd/ulogd.h
===================================================================
--- ulogd-netfilter.orig/include/ulogd/ulogd.h
+++ ulogd-netfilter/include/ulogd/ulogd.h
@@ -243,7 +243,7 @@ struct ulogd_fd {
 
 int ulogd_register_fd(struct ulogd_fd *ufd);
 void ulogd_unregister_fd(struct ulogd_fd *ufd);
-int ulogd_select_main();
+int ulogd_dispatch(void);
 
 /***********************************************************************
  * timer handling (timer.c)

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 11/15] Add set_sockbuf_len()
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (9 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 10/15] Improve select performance heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-19 19:57   ` Pablo Neira Ayuso
  2008-02-02 20:48 ` [ULOGD 12/15] Introduce global state, skip some stacks during reconfiguration heitzenberger
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-add-set_sockbuf_len.diff

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/include/ulogd/common.h
===================================================================
--- ulogd-netfilter.orig/include/ulogd/common.h
+++ ulogd-netfilter/include/ulogd/common.h
@@ -47,4 +47,6 @@
 #define pr_debug(fmt, ...)
 #endif /* DEBUG */
 
+int set_sockbuf_len(int fd, int rcv_len, int snd_len);
+
 #endif /* COMMON_H */
Index: ulogd-netfilter/src/common.c
===================================================================
--- /dev/null
+++ ulogd-netfilter/src/common.c
@@ -0,0 +1,52 @@
+/*
+ * common.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Holger Eitzenberger <holger@eitzenberger.org>, 2007.
+ */
+#include <ulogd/ulogd.h>
+#include <ulogd/common.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+
+
+int
+set_sockbuf_len(int fd, int rcv_len, int snd_len)
+{
+	int ret;
+
+	pr_debug("%s: fd=%d rcv-len=%d snd-len\n", __func__, fd, rcv_len,
+			 snd_len);
+
+	if (snd_len > 0) {
+		ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd_len,
+						 sizeof(snd_len));
+		if (ret < 0) {
+			ulogd_log(ULOGD_ERROR, "setsockopt: SO_SNDBUF: %m\n");
+			return -1;
+		}
+	}
+
+	if (rcv_len > 0) {
+		ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv_len,
+						 sizeof(rcv_len));
+		if (ret < 0) {
+			ulogd_log(ULOGD_ERROR, "setsockopt: SO_RCVBUF: %m\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
Index: ulogd-netfilter/src/Makefile.am
===================================================================
--- ulogd-netfilter.orig/src/Makefile.am
+++ ulogd-netfilter/src/Makefile.am
@@ -5,5 +5,5 @@ AM_CPPFLAGS = $(all_includes) -I$(top_sr
 
 sbin_PROGRAMS = ulogd
 
-ulogd_SOURCES = ulogd.c ifi.c select.c timer.c signal.c conffile.c
+ulogd_SOURCES = ulogd.c ifi.c select.c timer.c signal.c common.c conffile.c
 ulogd_LDFLAGS = -export-dynamic -lpthread

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 12/15] Introduce global state, skip some stacks during reconfiguration
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (10 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 11/15] Add set_sockbuf_len() heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 20:48 ` [ULOGD 13/15] llist: turn poisoning off by default heitzenberger
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-fix-races-during-reconf.diff

Some plugins are not reconfigurable, therefore skips entire stacks if
there is at least one plugin which is not reconfigurable.

Also introduce a global state to counter some pathological
reconfiguration issues.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/src/ulogd.c
===================================================================
--- ulogd-netfilter.orig/src/ulogd.c
+++ ulogd-netfilter/src/ulogd.c
@@ -83,6 +83,7 @@ static FILE *logfile = NULL;		/* logfile
 static char *ulogd_configfile = ULOGD_CONFIGFILE;
 static char *ulogd_logfile = ULOGD_LOGFILE_DEFAULT;
 static FILE syslog_dummy;
+static enum GlobalState state;
 
 /* linked list for all registered plugins */
 static LLIST_HEAD(ulogd_plugins);
@@ -129,6 +130,19 @@ static struct config_keyset ulogd_kset =
 #define loglevel_ce	ulogd_kset.ces[2]
 #define stack_ce	ulogd_kset.ces[3]
 
+
+void
+ulogd_set_state(enum GlobalState s)
+{
+	state = s;
+}
+
+enum GlobalState
+ulogd_get_state(void)
+{
+	return state;
+}
+
 /***********************************************************************
  * UTILITY FUNCTIONS FOR PLUGINS
  ***********************************************************************/
@@ -195,8 +209,8 @@ int ulogd_wildcard_inputkeys(struct ulog
 
 	/* first pass: count keys */
 	llist_for_each_entry(pi_cur, &stack->list, list) {
-		ulogd_log(ULOGD_DEBUG, "iterating over pluginstance '%s'\n",
-			  pi_cur->id);
+		ulogd_log(ULOGD_DEBUG, "%s: iterating over pluginstance '%s'\n",
+				  upi->id, pi_cur->id);
 		num_keys += pi_cur->plugin->output.num_keys;
 	}
 
@@ -209,8 +223,12 @@ int ulogd_wildcard_inputkeys(struct ulog
 	llist_for_each_entry(pi_cur, &stack->list, list) {
 		int i;
 
-		for (i = 0; i < pi_cur->plugin->output.num_keys; i++)
+		for (i = 0; i < pi_cur->plugin->output.num_keys; i++) {
+			pr_debug("%s: copy key '%s' from plugin '%s'\n", upi->id,
+					 pi_cur->plugin->output.keys[i].name, pi_cur->id);
+
 			upi->input.keys[index++] = pi_cur->output.keys[i];
+		}
 	}
 
 	upi->input.num_keys = num_keys;
@@ -650,6 +668,10 @@ static int create_stack(const char *opti
 	}
 	INIT_LLIST_HEAD(&stack->list);
 
+	/* By default a stack is reconfigurable unless there is a plugin
+	   which is not reconfigurable */
+	stack->flags = ULOGD_SF_RECONF;
+
 	ulogd_log(ULOGD_DEBUG, "building new pluginstance stack (%s):\n",
 		  option);
 
@@ -698,6 +720,9 @@ static int create_stack(const char *opti
 			
 		ulogd_log(ULOGD_DEBUG, "pushing `%s' on stack\n", pl->name);
 		llist_add_tail(&pi->list, &stack->list);
+
+		if ((pi->plugin->flags & ULOGD_PF_RECONF) == 0)
+			stack->flags &= ~ULOGD_SF_RECONF;
 	}
 
 	/* PASS 2: resolve key connections from bottom to top of stack */
@@ -788,10 +813,20 @@ static int parse_conffile(const char *se
 	return 1;
 }
 
+/**
+ * Call callback on each ulogd_pluginstance.
+ * @arg cb      Callback to call for each ulogd_pluginstance.
+ * @arg arg     Argument to pass to each call to argument.
+ *
+ * Call callback on each ulogd_pluginstance, skipping stacks which are
+ * not reconfigurable.
+ *
+ * @return Number of calls.
+ */
 static int
 for_each_pluginstance(int (* cb)(struct ulogd_pluginstance *,
 								 struct ulogd_pluginstance_stack *,
-								 void *), void *arg)
+								 unsigned long), unsigned long arg)
 {
 	struct ulogd_pluginstance_stack *stack;
 	int sum = 0;
@@ -801,6 +836,9 @@ for_each_pluginstance(int (* cb)(struct 
 	llist_for_each_entry(stack, &ulogd_pi_stacks, stack_list) {
 		struct ulogd_pluginstance *pi;
 
+		if ((stack->flags & ULOGD_SF_RECONF) == 0)
+			continue;
+
 		llist_for_each_entry(pi, &stack->list, list) {
 			int ret;
 
@@ -824,14 +862,15 @@ enum ReconfOp
 
 static int
 _do_reconf(struct ulogd_pluginstance *pi,
-		   struct ulogd_pluginstance_stack *stack, void *arg)
+		   struct ulogd_pluginstance_stack *stack, unsigned long arg)
 {
-	enum ReconfOp op = (unsigned)arg;
+	enum ReconfOp op = arg;
 	int ret = 0;
 
 	assert(pi != NULL);
 
-	if ((pi->plugin->flags & ULOGD_PF_RECONF) == 0)
+	/* check whether stack is reconfigurable */
+	if ((stack->flags & ULOGD_SF_RECONF) == 0)
 		return 0;
 
 	switch (op) {
@@ -839,16 +878,12 @@ _do_reconf(struct ulogd_pluginstance *pi
 		ret = pi->plugin->stop(pi);
 		break;
 
-	case CONFIGURE:
-		ulogd_pluginstance_reset_cfg(pi);
-		ret = pi->plugin->configure(pi, stack);
-		break;
-
 	case START:
 		ret = pi->plugin->start(pi);
 		break;
 
 	default:
+		ulogd_log(ULOGD_ERROR, "%s: invalid op '%d'\n", __func__, op);
 		return -1;
 	}
 
@@ -863,25 +898,43 @@ _do_reconf(struct ulogd_pluginstance *pi
 static int
 reconfigure_plugins(void)
 {
+	struct ulogd_pluginstance_stack *stack;
+
 	ulogd_log(ULOGD_INFO, "reconfiguring plugins\n");
 
-	if (for_each_pluginstance(_do_reconf, (void *)STOP) < 0)
-		abort();
+	ulogd_set_state(GS_INITIALIZING);
 
-	if (for_each_pluginstance(_do_reconf, (void *)CONFIGURE) < 0)
-		abort();
+	/* Skip stacks which contain at least one plugin which is not
+	   reconfigurable.  This a workaround until all plugins are
+	   reconfigurable. */
+	if (for_each_pluginstance(_do_reconf, STOP) < 0)
+		goto err;
 
-	if (for_each_pluginstance(_do_reconf, (void *)START) < 0)
-		abort();
+	llist_for_each_entry(stack, &ulogd_pi_stacks, stack_list) {
+		if ((stack->flags & ULOGD_SF_RECONF) == 0)
+			continue;
+
+		if (create_stack_resolve_keys(stack) < 0)
+			goto err;
+	}
+
+	if (for_each_pluginstance(_do_reconf, START) < 0)
+		goto err;
+
+	ulogd_set_state(GS_RUNNING);
 
 	return 0;
+
+ err:
+ 	ulogd_log(ULOGD_FATAL, "failed to reconfigure\n");
+ 	return -1;
 }
 
 static int
 _do_signal(struct ulogd_pluginstance *pi,
-		   struct ulogd_pluginstance_stack *stack, void *arg)
+		   struct ulogd_pluginstance_stack *stack, unsigned long arg)
 {
-	int signo = (int) arg;
+	int signo = (int)arg;
 
 	if (pi->plugin->signal) {
 		pi->plugin->signal(pi, signo);
@@ -897,6 +950,9 @@ sync_sig_handler(int signo)
 {
 	pr_debug("%s: signal '%d' received\n", __func__, signo);
 
+	if (ulogd_get_state() != GS_RUNNING)
+		return;
+
 	switch (signo) {
 	case SIGHUP:
 		reconfigure_plugins();
@@ -913,7 +969,7 @@ sync_sig_handler(int signo)
 		break;
 	}
 
-	for_each_pluginstance(_do_signal, (void *) signo);
+	for_each_pluginstance(_do_signal, (unsigned long)signo);
 }
 
 static void
@@ -965,6 +1021,7 @@ main(int argc, char* argv[])
 	uid_t uid = 0;
 	gid_t gid = 0;
 
+	ulogd_set_state(GS_INITIALIZING);
 
 	while ((argch = getopt_long(argc, argv, "c:dh::Vu:", opts, NULL)) != -1) {
 		switch (argch) {
@@ -1086,6 +1143,8 @@ main(int argc, char* argv[])
 
 	ulogd_log(ULOGD_INFO, "entering main loop\n");
 
+	ulogd_set_state(GS_RUNNING);
+
 	ulogd_dispatch();
 
 	return 0;
Index: ulogd-netfilter/include/ulogd/ulogd.h
===================================================================
--- ulogd-netfilter.orig/include/ulogd/ulogd.h
+++ ulogd-netfilter/include/ulogd/ulogd.h
@@ -189,17 +189,29 @@ struct ulogd_pluginstance {
 	char private[0];
 };
 
+/* stack flags */
+#define ULOGD_SF_RECONF			0x00000001 /* stack is reconfigurable */
+
 struct ulogd_pluginstance_stack {
 	/* global list of pluginstance stacks */
 	struct llist_head stack_list;
 	/* list of plugins in this stack */
 	struct llist_head list;
 	char *name;
+	unsigned flags;
 };
 
 /***********************************************************************
  * PUBLIC INTERFACE 
  ***********************************************************************/
+enum GlobalState {
+	GS_INVAL = 0,
+	GS_INITIALIZING,			/* also reconfiguration */
+	GS_RUNNING,
+};
+
+void ulogd_set_state(enum GlobalState);
+enum GlobalState ulogd_get_state(void);
 
 void ulogd_propagate_results(struct ulogd_pluginstance *pi);
 

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 13/15] llist: turn poisoning off by default
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (11 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 12/15] Introduce global state, skip some stacks during reconfiguration heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 20:48 ` [ULOGD 14/15] SQLITE3: port to ulogd 2.00, mostly a rewrite heitzenberger
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-llist-poison-debug-only.diff

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter/include/ulogd/linuxlist.h
===================================================================
--- ulogd-netfilter.orig/include/ulogd/linuxlist.h
+++ ulogd-netfilter/include/ulogd/linuxlist.h
@@ -117,8 +117,10 @@ static inline void __llist_del(struct ll
 static inline void llist_del(struct llist_head *entry)
 {
 	__llist_del(entry->prev, entry->next);
+#ifdef ULOGD_LLIST_DEBUG
 	entry->next = LLIST_POISON1;
 	entry->prev = LLIST_POISON2;
+#endif /* ULOGD_LLIST_DEBUG */
 }
 
 /**

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 14/15] SQLITE3: port to ulogd 2.00, mostly a rewrite
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (12 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 13/15] llist: turn poisoning off by default heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 20:48 ` [ULOGD 15/15] NFCT: rework and let it scale heitzenberger
  2008-02-02 22:52 ` [ULOGD 00/15] ulogd V2 improvements, round 2 Pablo Neira Ayuso
  15 siblings, 0 replies; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-SQLITE3-plugin.diff

NOTE

As described in

 http://www.sqlite.org/cvstrac/wiki?p=CorruptionFollowingBusyError

some versions of SQLITE contain an error when transactions other than
EXCLUSIVE are used.  If a SQL commands to create a transaction or
inside a transaction ever gets SQLITE_BUSY a ROLLBACK has to be done,
as otherwise a corrupt database might be the result.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter-stuffed/output/sqlite3/ulogd_output_SQLITE3.c
===================================================================
--- ulogd-netfilter-stuffed.orig/output/sqlite3/ulogd_output_SQLITE3.c
+++ ulogd-netfilter-stuffed/output/sqlite3/ulogd_output_SQLITE3.c
@@ -1,4 +1,3 @@
-#if 0
 /*
  * ulogd output plugin for logging to a SQLITE database
  *
@@ -26,389 +25,805 @@
  *
  *  2005-02-09 Harald Welte <laforge@gnumonks.org>:
  *  	- port to ulogd-1.20 
+ *
+ *  Holger Eitzenberger <holger@eitzenberger.org>  Astaro AG, 2007
+ *  	- port to ulogd-2.00
  */
 
-#include <stdlib.h>
-#include <string.h>
-#include <arpa/inet.h>
 #include <ulogd/ulogd.h>
 #include <ulogd/conffile.h>
+#include <ulogd/common.h>
+#include <ulogd/linuxlist.h>
+#include <arpa/inet.h>
 #include <sqlite3.h>
 
-#ifdef DEBUG_SQLITE3
-#define DEBUGP(x, args...)	fprintf(stderr, x, ## args)
-#else
-#define DEBUGP(x, args...)
-#endif
+#define PFX		"SQLITE3: "
+
+/* config defaults */
+#define CFG_BUFFER_DEFAULT		100
+#define CFG_TIMER_DEFAULT		1 SEC
+#define CFG_MAX_BACKLOG_DEFAULT	0		/* unlimited */
+
+
+#define SQLITE3_BUSY_TIMEOUT 300
+
+/* number of colums we have (really should be configurable) */
+#define DB_NUM_COLS	10
 
-struct _field {
+
+/* map DB column to ulogd key */
+struct col {
 	char name[ULOGD_MAX_KEYLEN];
-	unsigned int id;
-	struct _field *next;
+	struct ulogd_key *key;
+};
+
+struct row {
+	struct llist_head link;
+	uint32_t ip_saddr;
+	uint32_t ip_daddr;
+	unsigned char ip_proto;
+	unsigned l4_dport;
+	unsigned raw_in_pktlen;
+	unsigned raw_in_pktcount;
+	unsigned raw_out_pktlen;
+	unsigned raw_out_pktcount;
+	unsigned flow_start_sec;
+	unsigned flow_duration;
 };
 
-/* the database handle we are using */
-static sqlite3 *dbh;
+#define RKEY(key)	((key)->u.source)
 
-/* a linked list of the fields the table has */
-static struct _field *fields;
 
-/* buffer for our insert statement */
-static char *stmt;
+struct sqlite3_priv {
+	sqlite3 *dbh;				/* database handle we are using */
+	char *stmt;
+	sqlite3_stmt *p_stmt;
+	int buffer_size;
 
-/* pointer to the final prepared statement */
-static sqlite3_stmt *p_stmt;
+	struct ulogd_timer timer;
 
-/* number of statements to buffer before we commit */
-static int buffer_size;
+	struct col cols[DB_NUM_COLS];
 
-/* number of statements currently in the buffer */
-static int buffer_ctr;
+	/* our backlog buffer */
+	struct llist_head rows;
+	int num_rows;
+	int max_backlog;
 
-/* our configuration directives */
-static config_entry_t db_ce = { 
-	.key = "db", 
-	.type = CONFIG_TYPE_STRING,
-	.options = CONFIG_OPT_MANDATORY,
-};
+	time_t commit_time;
 
-static config_entry_t table_ce = { 
-	.next = &db_ce, 
-	.key = "table",
-	.type = CONFIG_TYPE_STRING,
-	.options = CONFIG_OPT_MANDATORY,
+	unsigned disable : 1;
+	unsigned overlimit_msg : 1;
 };
 
-static config_entry_t buffer_ce = { 
-	.next = &table_ce,
-	.key = "buffer",
-	.type = CONFIG_TYPE_INT,
-	.options = CONFIG_OPT_MANDATORY,
+
+static struct config_keyset sqlite3_kset = {
+	.num_ces = 6,
+	.ces = {
+		{
+			.key = "db",
+			.type = CONFIG_TYPE_STRING,
+			.options = CONFIG_OPT_MANDATORY,
+		},
+		{
+			.key = "table",
+			.type = CONFIG_TYPE_STRING,
+			.options = CONFIG_OPT_MANDATORY,
+		},
+		{
+			.key = "buffer",
+			.type = CONFIG_TYPE_INT,
+			.options = CONFIG_OPT_NONE,
+			.u.value = CFG_BUFFER_DEFAULT,
+		},
+		{
+			.key = "timer",
+			.type = CONFIG_TYPE_INT,
+			.options = CONFIG_OPT_NONE,
+			.u.value = CFG_TIMER_DEFAULT,
+		},
+		{
+			.key = "max-backlog",
+			.type = CONFIG_TYPE_INT,
+			.options = CONFIG_OPT_NONE,
+			.u.value = CFG_MAX_BACKLOG_DEFAULT,
+		},
+		{
+			.key = "disable",
+			.type = CONFIG_TYPE_INT,
+			.options = CONFIG_OPT_NONE,
+			.u.value = 0,
+		},
+	},
 };
 
-/* our main output function, called by ulogd */
-static int _sqlite3_output(ulog_iret_t *result)
+#define db_ce(pi)		(pi)->config_kset->ces[0].u.string
+#define table_ce(pi)	(pi)->config_kset->ces[1].u.string
+#define buffer_ce(pi)	(pi)->config_kset->ces[2].u.value
+#define timer_ce(pi)	(pi)->config_kset->ces[3].u.value
+#define max_backlog_ce(pi)	(pi)->config_kset->ces[4].u.value
+#define disable_ce(pi)	(pi)->config_kset->ces[5].u.value
+
+
+#define SQL_CREATE_STR \
+		"create table daily(ip_saddr integer, ip_daddr integer, " \
+		"ip_protocol integer, l4_dport integer, raw_in_pktlen integer, " \
+		"raw_in_pktcount integer, raw_out_pktlen integer, " \
+		"raw_out_pktcount integer, flow_start_sec integer, " \
+		"flow_duration integer)"
+
+
+static struct row *
+row_new(void)
 {
-	struct _field *f;
-	ulog_iret_t *res;
-	int col_counter;
-#ifdef IP_AS_STRING
-	char *ipaddr;
-	struct in_addr addr;
-#endif
-
-	col_counter = 1;
-	for (f = fields; f; f = f->next) {
-		res = keyh_getres(f->id);
-
-		if (!res) {
-			ulogd_log(ULOGD_NOTICE,
-				"no result for %s ?!?\n", f->name);
-		}
-			
-		if (!res || !IS_VALID((*res))) {
-			/* no result, pass a null */
-			sqlite3_bind_null(p_stmt, col_counter);
-			col_counter++;
-			continue;
-		}
-		
-		switch (res->type) {
-			case ULOGD_RET_INT8:
-				sqlite3_bind_int(p_stmt,col_counter,res->value.i8);
-				break;
-			case ULOGD_RET_INT16:
-				sqlite3_bind_int(p_stmt,col_counter,res->value.i16);
-				break;
-			case ULOGD_RET_INT32:
-				sqlite3_bind_int(p_stmt,col_counter,res->value.i32);
-				break;
-			case ULOGD_RET_INT64:
-				sqlite3_bind_int64(p_stmt,col_counter,res->value.i64);
-				break;
-			case ULOGD_RET_UINT8:
-				sqlite3_bind_int(p_stmt,col_counter,res->value.ui8);
-				break;
-			case ULOGD_RET_UINT16:
-				sqlite3_bind_int(p_stmt,col_counter,res->value.ui16);
-				break;
-			case ULOGD_RET_IPADDR:
-#ifdef IP_AS_STRING
-				memset(&addr, 0, sizeof(addr));
-				addr.s_addr = ntohl(res->value.ui32);
-				ipaddr = inet_ntoa(addr);
-				sqlite3_bind_text(p_stmt,col_counter,ipaddr,strlen(ipaddr),SQLITE_STATIC);
-                                break;
-#endif /* IP_AS_STRING */
-			/* EVIL: fallthrough when logging IP as u_int32_t */
-			case ULOGD_RET_UINT32:
-				sqlite3_bind_int(p_stmt,col_counter,res->value.ui32);
-				break;
-			case ULOGD_RET_UINT64:
-				sqlite3_bind_int64(p_stmt,col_counter,res->value.ui64);
-				break;
-			case ULOGD_RET_BOOL:
-				sqlite3_bind_int(p_stmt,col_counter,res->value.b);
-				break;
-			case ULOGD_RET_STRING:
-				sqlite3_bind_text(p_stmt,col_counter,res->value.ptr,strlen(res->value.ptr),SQLITE_STATIC);
-				break;
-			default:
-				ulogd_log(ULOGD_NOTICE,
-					"unknown type %d for %s\n",
-					res->type, res->key);
-				break;
-		} 
-
-		col_counter++;
-	}
-
-	/* now we have created our statement, insert it */
-
-	if (sqlite3_step(p_stmt) == SQLITE_DONE) {
-		sqlite3_reset(p_stmt);
-		buffer_ctr++;
-	} else {
-		ulogd_log(ULOGD_ERROR, "sql error during insert: %s\n",
-				sqlite3_errmsg(dbh));
-		return 1;
-	}
+	struct row *row;
+
+	if ((row = calloc(1, sizeof(struct row))) == NULL)
+		ulogd_error("%s: out of memory\n", __func__);
+
+	return row;
+}
+
 
-	/* commit all of the inserts to the database, ie flush buffer */
-	if (buffer_ctr >= buffer_size) {
-		if (sqlite3_exec(dbh,"commit",NULL,NULL,NULL) != SQLITE_OK)
-			ulogd_log(ULOGD_ERROR,"unable to commit records to db.");
+static inline void
+__row_del(struct sqlite3_priv *priv, struct row *row)
+{
+	assert(row != NULL);
 
-		if (sqlite3_exec(dbh,"begin deferred",NULL,NULL,NULL) != SQLITE_OK)
-			ulogd_log(ULOGD_ERROR,"unable to begin a new transaction.");
+	free(row);
+}
+
+
+static void
+row_del(struct sqlite3_priv *priv, struct row *row)
+{
+	llist_del(&row->link);
 
-		buffer_ctr = 0;
-		DEBUGP("committing.\n");
+	__row_del(priv, row);
+
+	priv->num_rows--;
+}
+
+
+static int
+row_add(struct sqlite3_priv *priv, struct row *row)
+{
+	if (priv->max_backlog && priv->num_rows >= priv->max_backlog) {
+		if (!priv->overlimit_msg) {
+			ulogd_error(PFX "over max-backlog limit, dropping rows\n");
+
+			priv->overlimit_msg = 1;
+		}
+
+		__row_del(priv, row);
+
+		return -1;
 	}
 
+	llist_add_tail(&row->link, &priv->rows);
+
+	priv->num_rows++;
+
 	return 0;
 }
 
+/* set_commit_time() - set time for next try on locked database
+ *
+ * The database is effectively locked in between.
+ */
+static void
+set_commit_time(const struct ulogd_pluginstance *pi)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+
+	priv->commit_time = t_now + 1;
+
+	pr_debug("%s: commit time %d\n", __func__, priv->commit_time);
+}
+
 #define _SQLITE3_INSERTTEMPL   "insert into X (Y) values (Z)"
 
-/* create the static part of our insert statement */
-static int _sqlite3_createstmt(void)
+/* create static part of our insert statement */
+static int
+db_createstmt(struct ulogd_pluginstance *pi)
 {
-	struct _field *f;
-	unsigned int size;
+	struct sqlite3_priv *priv = (void *)pi->private;
 	char buf[ULOGD_MAX_KEYLEN];
 	char *underscore;
 	char *stmt_pos;
-	int col_count;
 	int i;
 
-	if (stmt) {
-		ulogd_log(ULOGD_NOTICE, "createstmt called, but stmt"
-			" already existing\n");	
-		return 1;
-	}
-
-	/* caclulate the size for the insert statement */
-	size = strlen(_SQLITE3_INSERTTEMPL) + strlen(table_ce.u.string);
+	if (priv->stmt != NULL)
+		free(priv->stmt);
 
-	DEBUGP("initial size: %u\n", size);
-
-	col_count = 0;
-	for (f = fields; f; f = f->next) {
-		/* we need space for the key and a comma, and a ? */
-		size += strlen(f->name) + 3;
-		DEBUGP("size is now %u since adding %s\n",size,f->name);
-		col_count++;
+	if ((priv->stmt = calloc(1, 1024)) == NULL) {
+		ulogd_error(PFX "out of memory\n");
+		return -1;
 	}
 
-	DEBUGP("there were %d columns\n",col_count);
-	DEBUGP("after calc name length: %u\n",size);
-
-	ulogd_log(ULOGD_DEBUG, "allocating %u bytes for statement\n", size);
-
-	stmt = (char *) malloc(size);
+	sprintf(priv->stmt, "insert into %s (", table_ce(pi));
+	stmt_pos = priv->stmt + strlen(priv->stmt);
 
-	if (!stmt) {
-		ulogd_log(ULOGD_ERROR, "OOM!\n");
-		return 1;
-	}
+	for (i = 0; i < DB_NUM_COLS; i++) {
+		struct col *col = &priv->cols[i];
 
-	sprintf(stmt, "insert into %s (", table_ce.u.string);
-	stmt_pos = stmt + strlen(stmt);
+		/* convert name */
+		strncpy(buf, col->name, ULOGD_MAX_KEYLEN);
 
-	for (f = fields; f; f = f->next) {
-		strncpy(buf, f->name, ULOGD_MAX_KEYLEN);	
 		while ((underscore = strchr(buf, '.')))
 			*underscore = '_';
+
 		sprintf(stmt_pos, "%s,", buf);
-		stmt_pos = stmt + strlen(stmt);
+		stmt_pos = priv->stmt + strlen(priv->stmt);
 	}
 
 	*(stmt_pos - 1) = ')';
 
 	sprintf(stmt_pos, " values (");
-	stmt_pos = stmt + strlen(stmt);
+	stmt_pos = priv->stmt + strlen(priv->stmt);
 
-	for (i = 0; i < col_count - 1; i++) {
+	for (i = 0; i < DB_NUM_COLS - 1; i++) {
 		sprintf(stmt_pos,"?,");
 		stmt_pos += 2;
 	}
 
 	sprintf(stmt_pos, "?)");
-	ulogd_log(ULOGD_DEBUG, "stmt='%s'\n", stmt);
+	ulogd_log(ULOGD_DEBUG, "%s: stmt='%s'\n", pi->id, priv->stmt);
 
-	DEBUGP("about to prepare statement.\n");
+	pr_debug("about to prepare statement.\n");
 
-	sqlite3_prepare(dbh,stmt,-1,&p_stmt,0);
-
-	DEBUGP("statement prepared.\n");
-
-	if (!p_stmt) {
-		ulogd_log(ULOGD_ERROR,"unable to prepare statement");
+	sqlite3_prepare(priv->dbh, priv->stmt, -1, &priv->p_stmt, 0);
+	if (priv->p_stmt == NULL) {
+		ulogd_error(PFX "prepare: %s\n", sqlite3_errmsg(priv->dbh));
 		return 1;
 	}
 
+	pr_debug("%s: statement prepared.\n", pi->id);
+
 	return 0;
 }
 
 
-/* length of "select * from \0" */
-#define SQLITE_SELECT_LEN 15
+static struct ulogd_key *
+ulogd_find_key(struct ulogd_pluginstance *pi, const char *name)
+{
+	int i;
+
+	for (i = 0; i < pi->input.num_keys; i++) {
+		if (strcmp(pi->input.keys[i].name, name) == 0)
+			return &pi->input.keys[i];
+	}
+
+	return NULL;
+}
+
+#define SELECT_ALL_STR			"select * from "
+#define SELECT_ALL_LEN			sizeof(SELECT_ALL_STR)
 
-/* find out which columns the table has */
-static int _sqlite3_get_columns(const char *table)
+static int
+db_count_cols(struct ulogd_pluginstance *pi, sqlite3_stmt **stmt)
 {
+	struct sqlite3_priv *priv = (void *)pi->private;
+	char query[SELECT_ALL_LEN + CONFIG_VAL_STRING_LEN] = SELECT_ALL_STR;
+
+	strncat(query, table_ce(pi), LINE_LEN);
+
+	if (sqlite3_prepare(priv->dbh, query, -1, stmt, 0) != SQLITE_OK) {
+		return -1;
+	}
+
+	return sqlite3_column_count(*stmt);
+}
+
+
+static int
+db_create_tbl(struct ulogd_pluginstance *pi)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+	char *errmsg;
+	int ret;
+
+	sqlite3_exec(priv->dbh, "drop table daily", NULL, NULL, NULL);
+
+	ret = sqlite3_exec(priv->dbh, SQL_CREATE_STR, NULL, NULL, &errmsg);
+	if (ret != SQLITE_OK) {
+		ulogd_error(PFX "create table: %s\n", errmsg);
+		sqlite3_free(errmsg);
+
+		return -1;
+	}
+
+	return 0;
+}
+
+
+/* initialize DB, possibly creating it */
+static int
+db_init(struct ulogd_pluginstance *pi)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
 	char buf[ULOGD_MAX_KEYLEN];
-	char query[SQLITE_SELECT_LEN + CONFIG_VAL_STRING_LEN] = "select * from \0";
 	char *underscore;
-	struct _field *f;
 	sqlite3_stmt *schema_stmt;
-	int column;
-	int result;
-	int id;
+	int num_cols, i;
 
-	if (!dbh)
-		return 1;
+	if (priv->dbh == NULL)
+		return -1;
 
-	strncat(query,table,LINE_LEN);
-	
-	result = sqlite3_prepare(dbh,query,-1,&schema_stmt,0);
-	
-	if (result != SQLITE_OK)
-		return 1;
+	num_cols = db_count_cols(pi, &schema_stmt);
+	if (num_cols != DB_NUM_COLS) {
+		ulogd_log(ULOGD_INFO, "%s: (re)creating database\n", pi->id);
+
+		if (db_create_tbl(pi) < 0)
+			return -1;
+
+		num_cols = db_count_cols(pi, &schema_stmt);
+	}
+
+	assert(num_cols == DB_NUM_COLS);
+
+	for (i = 0; i < DB_NUM_COLS; i++) {
+		struct col *col = &priv->cols[i];
+
+		strncpy(buf, sqlite3_column_name(schema_stmt, i), ULOGD_MAX_KEYLEN);
 
-	for (column = 0; column < sqlite3_column_count(schema_stmt); column++) {
 		/* replace all underscores with dots */
-		strncpy(buf, sqlite3_column_name(schema_stmt,column), ULOGD_MAX_KEYLEN);
-		while ((underscore = strchr(buf, '_')))
+		while ((underscore = strchr(buf, '_')) != NULL)
 			*underscore = '.';
 
-		DEBUGP("field '%s' found: ", buf);
+		pr_debug("column '%s' found\n", buf);
 
-		if (!(id = keyh_getid(buf))) {
-			DEBUGP(" no keyid!\n");
-			continue;
+		strncpy(col->name, buf, ULOGD_MAX_KEYLEN);
+
+		if ((col->key = ulogd_find_key(pi, buf)) == NULL) {
+			printf(PFX "%s: key not found\n", buf);
+			return -1;
 		}
+	}
+
+	ulogd_log(ULOGD_INFO, "%s: database opened\n", pi->id);
+
+	if (sqlite3_finalize(schema_stmt) != SQLITE_OK) {
+		ulogd_error(PFX "sqlite_finalize: %s\n",
+					sqlite3_errmsg(priv->dbh));
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static void
+db_reset(struct ulogd_pluginstance *pi)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+
+	sqlite3_finalize(priv->p_stmt);
+
+	sqlite3_close(priv->dbh);
+	priv->dbh = NULL;
+
+	free(priv->stmt);
+	priv->stmt = NULL;
+}
+
+
+static int
+db_start(struct ulogd_pluginstance *pi)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
 
-		DEBUGP("keyid %u\n", id);
+	ulogd_log(ULOGD_DEBUG, "%s: opening database connection\n", pi->id);
 
-		/* prepend it to the linked list */
-		f = (struct _field *) malloc(sizeof *f);
-		if (!f) {
-			ulogd_log(ULOGD_ERROR, "OOM!\n");
-			return 1;
+	if (sqlite3_open(db_ce(pi), &priv->dbh) != SQLITE_OK) {
+		ulogd_error(PFX "%s\n", sqlite3_errmsg(priv->dbh));
+		return -1;
+	}
+
+	/* set the timeout so that we don't automatically fail
+	   if the table is busy */
+	sqlite3_busy_timeout(priv->dbh, SQLITE3_BUSY_TIMEOUT);
+
+	/* read the fieldnames to know which values to insert */
+	if (db_init(pi) < 0)
+		return -1;
+
+	/* initialize our buffer size and counter */
+	priv->buffer_size = buffer_ce(pi);
+
+	priv->max_backlog = max_backlog_ce(pi);
+
+	/* create and prepare the actual insert statement */
+	db_createstmt(pi);
+
+	return 0;
+}
+
+/* db_err() - handle database errors */
+static int
+db_err(struct ulogd_pluginstance *pi, int ret)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+
+	pr_debug("%s: ret=%d (errcode %d)\n", __func__, ret,
+			 sqlite3_errcode(priv->dbh));
+
+	assert(ret != SQLITE_OK && ret != SQLITE_DONE);
+
+	if (ret == SQLITE_BUSY || ret == SQLITE_LOCKED)
+		set_commit_time(pi);
+	else {
+		switch (sqlite3_errcode(priv->dbh)) {
+		case SQLITE_LOCKED:
+		case SQLITE_BUSY:
+			set_commit_time(pi);
+			break;
+
+		case SQLITE_SCHEMA:
+			if (priv->stmt) {
+				sqlite3_finalize(priv->p_stmt);
+
+				db_createstmt(pi);
+
+				ulogd_log(ULOGD_INFO, "%s: database schema changed\n",
+						  pi->id);
+			}
+			break;
+
+		default:
+			ulogd_error("%s: transaction: %s\n", pi->id,
+						sqlite3_errmsg(priv->dbh));
+			break;
 		}
-		strncpy(f->name, buf, ULOGD_MAX_KEYLEN);
-		f->id = id;
-		f->next = fields;
-		fields = f;	
 	}
 
-	sqlite3_finalize(schema_stmt);
+	sqlite3_exec(priv->dbh, "rollback", NULL, NULL, NULL);
+
+	/* no sqlit3_clear_bindings(), as an unbind will be done implicitely
+	   on next bind. */
+	if (priv->p_stmt != NULL)
+		sqlite3_reset(priv->p_stmt);
+
 	return 0;
 }
 
-/** 
- * make connection and select database 
- * returns 0 if database failed to open.
- */
-static int _sqlite3_open_db(char *db_file)
+static int
+db_add_row(struct ulogd_pluginstance *pi, const struct row *row)
 {
-	DEBUGP("opening database.\n");
-	return sqlite3_open(db_file,&dbh);
+	struct sqlite3_priv *priv = (void *)pi->private;
+	int db_col = 1, ret;
+
+	do {
+		ret = sqlite3_bind_int64(priv->p_stmt, db_col++, row->ip_saddr);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int64(priv->p_stmt, db_col++, row->ip_daddr);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int(priv->p_stmt, db_col++, row->ip_proto);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int(priv->p_stmt, db_col++, row->l4_dport);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int(priv->p_stmt, db_col++, row->raw_in_pktlen);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int64(priv->p_stmt, db_col++, row->raw_in_pktcount);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int(priv->p_stmt, db_col++, row->raw_out_pktlen);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int64(priv->p_stmt, db_col++,
+								 row->raw_out_pktcount);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int(priv->p_stmt, db_col++, row->flow_start_sec);
+		if (ret != SQLITE_OK)
+			break;
+
+		ret = sqlite3_bind_int(priv->p_stmt, db_col++, row->flow_duration);
+		if (ret != SQLITE_OK)
+			break;
+
+		if ((ret = sqlite3_step(priv->p_stmt)) == SQLITE_DONE) {
+			/* no sqlite3_clear_bindings(), as an unbind will be
+			   implicetely done before next bind. */
+			sqlite3_reset(priv->p_stmt);
+
+			return 0;
+		}
+
+		/* according to the documentation sqlite3_step() always returns a
+		   generic SQLITE_ERROR.  In order to find out the cause of the
+		   error you have to call sqlite3_reset() or sqlite3_finalize(). */
+		ret = sqlite3_reset(priv->p_stmt);
+	} while (0);
+
+	return db_err(pi, ret);
 }
 
-/* give us an opportunity to close the database down properly */
-static void _sqlite3_fini(void)
+/* delete_rows() - delete rows from the tail of the list */
+static int
+delete_rows(struct ulogd_pluginstance *pi, int rows)
 {
-	DEBUGP("cleaning up db connection\n");
+	struct sqlite3_priv *priv = (void *)pi->private;
+	struct llist_head *curr, *tmp;
+
+    llist_for_each_prev_safe(curr, tmp, &priv->rows) {
+		struct row *row = container_of(curr, struct row, link);
 
-	/* free up our prepared statements so we can close the db */
-	if (p_stmt) {
-		sqlite3_finalize(p_stmt);
-		DEBUGP("prepared statement finalized\n");
+		if (rows-- == 0)
+			break;
+
+		row_del(priv, row);
 	}
 
-	if (dbh) {
-		int result;
-		/* flush the remaining insert statements to the database. */
-		result = sqlite3_exec(dbh,"commit",NULL,NULL,NULL);
+	return 0;
+}
+
+/*
+  db_commit_rows()
+
+  RETURN
+    >0	rows commited
+    0	locked
+   -1	error
+*/
+static int
+db_commit_rows(struct ulogd_pluginstance *pi)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+	struct row *row;
+	int ret, rows = 0, max_commit;
+
+	ret = sqlite3_exec(priv->dbh, "begin immediate transaction", NULL,
+					   NULL, NULL);
+	if (ret != SQLITE_OK)
+		return db_err(pi, ret);
+
+	/* Limit number of rows to commit.  Note that currently three times
+	   buffer_size is a bit arbitrary and therefore might be adjusted in
+	   the future. */
+	max_commit = max(3 * priv->buffer_size, 1024);
 
-		if (result != SQLITE_OK)
-			ulogd_log(ULOGD_ERROR,"unable to commit remaining records to db.");
+	llist_for_each_entry_reverse(row, &priv->rows, link) {
+		if (++rows > max_commit)
+			break;
 
-		sqlite3_close(dbh);
-		DEBUGP("database file closed\n");
+		if (db_add_row(pi, row) < 0)
+			return db_err(pi, ret);
 	}
+
+	ret = sqlite3_exec(priv->dbh, "commit", NULL, NULL, NULL);
+	if (ret != SQLITE_OK)
+		return db_err(pi, ret);
+
+	sqlite3_reset(priv->p_stmt);
+	
+	pr_debug("%s: commited %d/%d rows\n", pi->id, rows, priv->num_rows);
+
+	delete_rows(pi, rows);
+	
+	if (priv->commit_time >= t_now)
+		priv->commit_time = 0;		/* release commit lock */
+	
+	if (priv->overlimit_msg)
+		priv->overlimit_msg = 0;
+
+	return rows;
 }
 
-#define _SQLITE3_BUSY_TIMEOUT 300
 
-static int _sqlite3_init(void)
+/* our main output function, called by ulogd */
+static int
+sqlite3_interp(struct ulogd_pluginstance *pi)
 {
-	/* have the opts parsed */
-	config_parse_file("SQLITE3", &buffer_ce);
+	struct sqlite3_priv *priv = (void *)pi->private;
+	struct col *cols = priv->cols;
+	struct row *row;
+
+	if ((row = row_new()) == NULL)
+		return ULOGD_IRET_ERR;
+
+	row->ip_saddr = RKEY(cols[0].key)->u.value.ui32;
+	row->ip_daddr = RKEY(cols[1].key)->u.value.ui32;
+	row->ip_proto = RKEY(cols[2].key)->u.value.ui8;
+	row->l4_dport = RKEY(cols[3].key)->u.value.ui16;
+	row->raw_in_pktlen = RKEY(cols[4].key)->u.value.ui32;
+	row->raw_in_pktcount = RKEY(cols[5].key)->u.value.ui32;
+	row->raw_out_pktlen = RKEY(cols[6].key)->u.value.ui32;
+	row->raw_out_pktcount = RKEY(cols[7].key)->u.value.ui32;
+	row->flow_start_sec = RKEY(cols[9].key)->u.value.ui32;
+	row->flow_duration = RKEY(cols[10].key)->u.value.ui32;
 
-	if (_sqlite3_open_db(db_ce.u.string)) {
-		ulogd_log(ULOGD_ERROR, "can't open the database file\n");
-		return 1;
+	if (row_add(priv, row) < 0)
+		return ULOGD_IRET_OK;
+
+	if (priv->num_rows >= priv->buffer_size && priv->commit_time == 0)
+		db_commit_rows(pi);
+
+	return ULOGD_IRET_OK;
+}
+
+
+static void
+sqlite_timer_cb(struct ulogd_timer *t)
+{
+	struct ulogd_pluginstance *pi = t->data;
+	struct sqlite3_priv *priv = (void *)pi->private;
+	int rows;
+
+	pr_debug("%s: timer=%p\n", __func__, t);
+
+	if (priv->commit_time != 0 && priv->commit_time > t_now)
+		return;
+
+	if (priv->num_rows == 0)
+		return;
+
+	rows = db_commit_rows(pi);
+
+	ulogd_log(ULOGD_DEBUG, "%s: rows=%d commited=%d\n", pi->id,
+			  priv->num_rows, rows);
+}
+
+
+static int
+sqlite3_configure(struct ulogd_pluginstance *pi,
+				  struct ulogd_pluginstance_stack *stack)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+
+	memset(priv, 0, sizeof(struct sqlite3_priv));
+	
+	config_parse_file(pi->id, pi->config_kset);
+
+	if (ulogd_wildcard_inputkeys(pi) < 0)
+		return -1;
+
+	if (db_ce(pi) == NULL) {
+		ulogd_error("%s: configure: no database specified\n", pi->id);
+		return -1;
 	}
 
-	/* set the timeout so that we don't automatically fail
-         * if the table is busy. */
-	sqlite3_busy_timeout(dbh, _SQLITE3_BUSY_TIMEOUT);
+	if (table_ce(pi) == NULL) {
+		ulogd_error("%s: configure: no table specified\n", pi->id);
+		return -1;
+	}
 
-	/* read the fieldnames to know which values to insert */
-	if (_sqlite3_get_columns(table_ce.u.string)) {
-		ulogd_log(ULOGD_ERROR, "unable to get sqlite columns\n");
-		return 1;
+	if (timer_ce(pi) <= 0) {
+		ulogd_error("%s: configure: invalid timer value\n", pi->id);
+		return -1;
 	}
 
-	/* initialize our buffer size and counter */
-	buffer_size = buffer_ce.u.value;
-	buffer_ctr = 0;
+	if (max_backlog_ce(pi)) {
+		if (max_backlog_ce(pi) <= buffer_ce(pi)) {
+			ulogd_error("%s: configure: invalid max-backlog value\n",
+						pi->id);
+			return -1;
+		}
+	}
 
-	DEBUGP("Have a buffer size of : %d\n", buffer_size);
+	priv->max_backlog = max_backlog_ce(pi);
+	priv->disable = disable_ce(pi);
 
-	if (sqlite3_exec(dbh,"begin deferred",NULL,NULL,NULL) != SQLITE_OK)
-		ulogd_log(ULOGD_ERROR,"can't create a new transaction\n");
+	pr_debug("%s: db='%s' table='%s' timer=%d max-backlog=%d\n", pi->id,
+			 db_ce(pi), table_ce(pi), timer_ce(pi), max_backlog_ce(pi));
 
-	/* create and prepare the actual insert statement */
-	_sqlite3_createstmt();
+	return 0;
+}
+
+
+static int
+sqlite3_start(struct ulogd_pluginstance *pi)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+
+	pr_debug("%s: pi=%p\n", __func__, pi);
+
+	if (priv->disable) {
+		ulogd_log(ULOGD_NOTICE, "%s: disabled\n", pi->id);
+		return 0;
+	}
+
+	priv->num_rows = 0;
+	INIT_LLIST_HEAD(&priv->rows);
+
+	if (db_start(pi) < 0)
+		return -1;
+
+	/* init timer */
+	priv->timer.cb = sqlite_timer_cb;
+	priv->timer.ival = timer_ce(pi);
+	priv->timer.flags = TIMER_F_PERIODIC;
+	priv->timer.data = pi;
+
+	if (ulogd_register_timer(&priv->timer) < 0)
+		return -1;
+
+	ulogd_log(ULOGD_INFO, "%s: started\n", pi->id);
 
 	return 0;
 }
 
-static ulog_output_t _sqlite3_plugin = { 
-	.name = "sqlite3", 
-	.output = &_sqlite3_output, 
-	.init = &_sqlite3_init,
-	.fini = &_sqlite3_fini,
-};
 
-void _init(void) 
+/* give us an opportunity to close the database down properly */
+static int
+sqlite3_stop(struct ulogd_pluginstance *pi)
 {
-	register_output(&_sqlite3_plugin);
+	struct sqlite3_priv *priv = (void *)pi->private;
+
+	pr_debug("%s: pi=%p\n", __func__, pi);
+
+	if (priv->disable)
+		return 0;				/* wasn't started */
+
+	if (priv->dbh == NULL)
+		return 0;				/* already stopped */
+
+	db_reset(pi);
+
+	return 0;
 }
 
-#endif
+
+static void
+sqlite3_signal(struct ulogd_pluginstance *pi, int sig)
+{
+	struct sqlite3_priv *priv = (void *)pi->private;
+
+	switch (sig) {
+	case SIGUSR1:
+		if (priv->dbh != NULL) {
+			db_reset(pi);
+
+			if (db_start(pi) < 0) {
+				ulogd_log(ULOGD_FATAL, "%s: database reset failed\n", pi->id);
+				exit(EXIT_FAILURE);
+			}
+		}
+		break;
+
+	default:
+		break;
+	}
+}
+
+
+static struct ulogd_plugin sqlite3_plugin = { 
+	.name = "SQLITE3",
+	.flags = ULOGD_PF_RECONF,
+	.input = {
+		.type = ULOGD_DTYPE_PACKET | ULOGD_DTYPE_FLOW,
+	},
+	.output = {
+		.type = ULOGD_DTYPE_SINK,
+	},
+	.config_kset = &sqlite3_kset,
+	.priv_size = sizeof(struct sqlite3_priv),
+	.configure = sqlite3_configure,
+	.start = sqlite3_start,
+	.stop = sqlite3_stop,
+	.signal = sqlite3_signal,
+	.interp = sqlite3_interp,
+	.version = ULOGD_VERSION,
+};
+
+static void init(void) __attribute__((constructor));
+
+static void
+init(void) 
+{
+	ulogd_register_plugin(&sqlite3_plugin);
+
+	ulogd_log(ULOGD_INFO, "using Sqlite version %s\n", sqlite3_libversion());
+}

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [ULOGD 15/15] NFCT: rework and let it scale
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (13 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 14/15] SQLITE3: port to ulogd 2.00, mostly a rewrite heitzenberger
@ 2008-02-02 20:48 ` heitzenberger
  2008-02-02 22:52 ` [ULOGD 00/15] ulogd V2 improvements, round 2 Pablo Neira Ayuso
  15 siblings, 0 replies; 31+ messages in thread
From: heitzenberger @ 2008-02-02 20:48 UTC (permalink / raw)
  To: netfilter-devel; +Cc: holger

Hi,
Content-Disposition: inline; filename=ulogd-NFCT-plugin.diff

Also implement garbage collection to account for the fact that netlink
messages are sometimes lost (ENOBUFS) on busy sites.  This is done by
implementing a tuple cache (indexed by tuple) and a sequence cache and
going regularly over part of both caches if need be.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: ulogd-netfilter-stuffed/input/flow/linux_jhash.h
===================================================================
--- /dev/null
+++ ulogd-netfilter-stuffed/input/flow/linux_jhash.h
@@ -0,0 +1,146 @@
+#ifndef _LINUX_JHASH_H
+#define _LINUX_JHASH_H
+
+/* jhash.h: Jenkins hash support.
+ *
+ * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
+ *
+ * http://burtleburtle.net/bob/hash/
+ *
+ * These are the credits from Bob's sources:
+ *
+ * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
+ * hash(), hash2(), hash3, and mix() are externally useful functions.
+ * Routines to test the hash are included if SELF_TEST is defined.
+ * You can use this free for any purpose.  It has no warranty.
+ *
+ * Copyright (C) 2003 David S. Miller (davem@redhat.com)
+ *
+ * I've modified Bob's hash to be useful in the Linux kernel, and
+ * any bugs present are surely my fault.  -DaveM
+ */
+
+#define u32		uint32_t
+#define u8		uint8_t
+
+/* NOTE: Arguments are modified. */
+#define __jhash_mix(a, b, c) \
+{ \
+  a -= b; a -= c; a ^= (c>>13); \
+  b -= c; b -= a; b ^= (a<<8); \
+  c -= a; c -= b; c ^= (b>>13); \
+  a -= b; a -= c; a ^= (c>>12);  \
+  b -= c; b -= a; b ^= (a<<16); \
+  c -= a; c -= b; c ^= (b>>5); \
+  a -= b; a -= c; a ^= (c>>3);  \
+  b -= c; b -= a; b ^= (a<<10); \
+  c -= a; c -= b; c ^= (b>>15); \
+}
+
+/* The golden ration: an arbitrary value */
+#define JHASH_GOLDEN_RATIO	0x9e3779b9
+
+/* The most generic version, hashes an arbitrary sequence
+ * of bytes.  No alignment or length assumptions are made about
+ * the input key.
+ */
+static inline u32 jhash(const void *key, u32 length, u32 initval)
+{
+	u32 a, b, c, len;
+	const u8 *k = key;
+
+	len = length;
+	a = b = JHASH_GOLDEN_RATIO;
+	c = initval;
+
+	while (len >= 12) {
+		a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
+		b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
+		c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
+
+		__jhash_mix(a,b,c);
+
+		k += 12;
+		len -= 12;
+	}
+
+	c += length;
+	switch (len) {
+	case 11: c += ((u32)k[10]<<24);
+	case 10: c += ((u32)k[9]<<16);
+	case 9 : c += ((u32)k[8]<<8);
+	case 8 : b += ((u32)k[7]<<24);
+	case 7 : b += ((u32)k[6]<<16);
+	case 6 : b += ((u32)k[5]<<8);
+	case 5 : b += k[4];
+	case 4 : a += ((u32)k[3]<<24);
+	case 3 : a += ((u32)k[2]<<16);
+	case 2 : a += ((u32)k[1]<<8);
+	case 1 : a += k[0];
+	};
+
+	__jhash_mix(a,b,c);
+
+	return c;
+}
+
+/* A special optimized version that handles 1 or more of u32s.
+ * The length parameter here is the number of u32s in the key.
+ */
+static inline u32 jhash2(u32 *k, u32 length, u32 initval)
+{
+	u32 a, b, c, len;
+
+	a = b = JHASH_GOLDEN_RATIO;
+	c = initval;
+	len = length;
+
+	while (len >= 3) {
+		a += k[0];
+		b += k[1];
+		c += k[2];
+		__jhash_mix(a, b, c);
+		k += 3; len -= 3;
+	}
+
+	c += length * 4;
+
+	switch (len) {
+	case 2 : b += k[1];
+	case 1 : a += k[0];
+	};
+
+	__jhash_mix(a,b,c);
+
+	return c;
+}
+
+
+/* A special ultra-optimized versions that knows they are hashing exactly
+ * 3, 2 or 1 word(s).
+ *
+ * NOTE: In partilar the "c += length; __jhash_mix(a,b,c);" normally
+ *       done at the end is not done here.
+ */
+static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
+{
+	a += JHASH_GOLDEN_RATIO;
+	b += JHASH_GOLDEN_RATIO;
+	c += initval;
+
+	__jhash_mix(a, b, c);
+
+	return c;
+}
+
+static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
+{
+	return jhash_3words(a, b, 0, initval);
+}
+
+static inline u32 jhash_1word(u32 a, u32 initval)
+{
+	return jhash_3words(a, 0, 0, initval);
+}
+
+#endif /* _LINUX_JHASH_H */
Index: ulogd-netfilter-stuffed/input/flow/ulogd_inpflow_NFCT.c
===================================================================
--- ulogd-netfilter-stuffed.orig/input/flow/ulogd_inpflow_NFCT.c
+++ ulogd-netfilter-stuffed/input/flow/ulogd_inpflow_NFCT.c
@@ -25,88 +25,121 @@
  * 	  the messages via IPFX to one aggregator who then runs ulogd with a 
  * 	  network wide connection hash table.
  */
-
-#include <stdlib.h>
-#include <string.h>
-#include <errno.h>
-
-#include <sys/time.h>
-#include <time.h>
-#include <ulogd/linuxlist.h>
-
 #include <ulogd/ulogd.h>
+#include <ulogd/common.h>
+#include <ulogd/linuxlist.h>
 #include <ulogd/ipfix_protocol.h>
+#include <time.h>
+#include <sys/time.h>
+#include <netinet/in.h>
 
 #include <libnetfilter_conntrack/libnetfilter_conntrack.h>
+#include <linux/netfilter/nf_conntrack_tcp.h>
+#include "linux_jhash.h"
 
-typedef enum TIMES_ { START, STOP, __TIME_MAX } TIMES;
- 
-struct ct_timestamp {
+#define ORIG	NFCT_DIR_ORIGINAL
+#define REPL	NFCT_DIR_REPLY
+
+/* configuration defaults */
+#define TCACHE_SIZE		8192
+#define SCACHE_SIZE		512
+#define TCACHE_REQ_MAX	100
+#define TIMEOUT			30 SEC
+
+#define RCVBUF_LEN		(1 << 18)
+
+typedef enum TIMES_ { START, UPDATE, STOP, __TIME_MAX } TIMES;
+typedef unsigned conntrack_hash_t;
+
+struct conntrack {
 	struct llist_head list;
+	struct llist_head seq_link;
+	struct nfct_tuple tuple;
+	unsigned last_seq;
 	struct timeval time[__TIME_MAX];
-	int id;
+	time_t t_req;
+	unsigned used;
 };
 
-struct ct_htable {
-	struct llist_head *buckets;
-	int num_buckets;
-	int prealloc;
-	struct llist_head idle;
-	struct ct_timestamp *ts;
+struct cache_head {
+	struct llist_head link;
+	unsigned cnt;
+};
+ 
+struct cache {
+	struct cache_head *c_head;
+	unsigned c_num_heads;
+	unsigned c_curr_head;
+	unsigned c_cnt;
+	conntrack_hash_t (* c_hash)(struct cache *, struct conntrack *);
+	int (* c_add)(struct cache *, struct conntrack *);
+	int (* c_del)(struct cache *, struct conntrack *);
 };
 
 struct nfct_pluginstance {
 	struct nfct_handle *cth;
 	struct ulogd_fd nfct_fd;
+	struct nf_conntrack *nfct_opaque;
+	struct cache *tcache;       /* tuple cache */
+	struct cache *scache;       /* sequence cache */
 	struct ulogd_timer timer;
-	struct ct_htable *ct_active;
+	struct {
+		unsigned nl_err;
+		unsigned nl_ovr;
+	} stats;
 };
 
-#define HTABLE_SIZE	(8192)
-#define MAX_ENTRIES	(4 * HTABLE_SIZE)
-
+static unsigned num_conntrack;
 static struct config_keyset nfct_kset = {
-	.num_ces = 5,
+	.num_ces = 3,
 	.ces = {
 		{
-			.key	 = "pollinterval",
-			.type	 = CONFIG_TYPE_INT,
-			.options = CONFIG_OPT_NONE,
-			.u.value = 0,
-		},
-		{
-			.key	 = "hash_enable",
-			.type	 = CONFIG_TYPE_INT,
-			.options = CONFIG_OPT_NONE,
-			.u.value = 1,
-		},
-		{
-			.key	 = "hash_prealloc",
+			.key	 = "hash_buckets",
 			.type	 = CONFIG_TYPE_INT,
 			.options = CONFIG_OPT_NONE,
-			.u.value = 1,
+			.u.value = TCACHE_SIZE,
 		},
 		{
-			.key	 = "hash_buckets",
+			.key	 = "disable",
 			.type	 = CONFIG_TYPE_INT,
 			.options = CONFIG_OPT_NONE,
-			.u.value = HTABLE_SIZE,
+			.u.value = 0,
 		},
 		{
-			.key	 = "hash_max_entries",
+			.key	 = "timeout",
 			.type	 = CONFIG_TYPE_INT,
 			.options = CONFIG_OPT_NONE,
-			.u.value = MAX_ENTRIES,
+			.u.value = TIMEOUT,
 		},
 	},
 };
-#define pollint_ce(x)	(x->ces[0])
-#define usehash_ce(x)	(x->ces[1])
-#define prealloc_ce(x)	(x->ces[2])
-#define buckets_ce(x)	(x->ces[3])
-#define maxentries_ce(x) (x->ces[4])
+#define buckets_ce(pi)    ((pi)->config_kset->ces[0].u.value)
+#define disable_ce(pi)    ((pi)->config_kset->ces[1].u.value)
+#define timeout_ce(pi)    ((pi)->config_kset->ces[2].u.value)
+
+enum {
+	O_IP_SADDR = 0,
+	O_IP_DADDR,
+	O_IP_PROTO,
+	O_L4_SPORT,
+	O_L4_DPORT,
+	O_RAW_IN_PKTLEN,
+	O_RAW_IN_PKTCOUNT,
+	O_RAW_OUT_PKTLEN,
+	O_RAW_OUT_PKTCOUNT,
+	O_ICMP_CODE,
+	O_ICMP_TYPE,
+	O_CT_MARK,
+	O_CT_ID,
+	O_FLOW_START_SEC,
+	O_FLOW_START_USEC,
+	O_FLOW_END_SEC,
+	O_FLOW_END_USEC,
+	O_FLOW_DURATION,
+	__O_MAX
+};
 
-static struct ulogd_key nfct_okeys[] = {
+static struct ulogd_key nfct_okeys[__O_MAX] = {
 	{
 		.type 	= ULOGD_RET_IPADDR,
 		.flags 	= ULOGD_RETF_NONE,
@@ -155,7 +188,27 @@ static struct ulogd_key nfct_okeys[] = {
 	{
 		.type	= ULOGD_RET_UINT32,
 		.flags	= ULOGD_RETF_NONE,
-		.name	= "raw.pktlen",
+		.name	= "raw.in.pktlen",
+		.ipfix	= { 
+			.vendor 	= IPFIX_VENDOR_IETF,
+			.field_id 	= IPFIX_octetTotalCount,
+			/* FIXME: this could also be octetDeltaCount */
+		},
+	},
+	{
+		.type	= ULOGD_RET_UINT32,
+		.flags	= ULOGD_RETF_NONE,
+		.name	= "raw.in.pktcount",
+		.ipfix	= { 
+			.vendor 	= IPFIX_VENDOR_IETF,
+			.field_id 	= IPFIX_packetTotalCount,
+			/* FIXME: this could also be packetDeltaCount */
+		},
+	},
+	{
+		.type	= ULOGD_RET_UINT32,
+		.flags	= ULOGD_RETF_NONE,
+		.name	= "raw.out.pktlen",
 		.ipfix	= { 
 			.vendor 	= IPFIX_VENDOR_IETF,
 			.field_id 	= IPFIX_octetTotalCount,
@@ -165,7 +218,7 @@ static struct ulogd_key nfct_okeys[] = {
 	{
 		.type	= ULOGD_RET_UINT32,
 		.flags	= ULOGD_RETF_NONE,
-		.name	= "raw.pktcount",
+		.name	= "raw.out.pktcount",
 		.ipfix	= { 
 			.vendor 	= IPFIX_VENDOR_IETF,
 			.field_id 	= IPFIX_packetTotalCount,
@@ -245,355 +298,927 @@ static struct ulogd_key nfct_okeys[] = {
 		},
 	},
 	{
-		.type = ULOGD_RET_BOOL,
+		.type = ULOGD_RET_UINT32,
 		.flags = ULOGD_RETF_NONE,
-		.name = "dir",
+		.name = "flow.duration",
 	},
 };
 
-static struct ct_htable *htable_alloc(int htable_size, int prealloc)
+
+/* forward declarations */
+static int cache_del(struct cache *, struct conntrack *);
+static struct conntrack *tcache_find(const struct ulogd_pluginstance *,
+									 const struct nfct_tuple *);
+static struct conntrack *scache_find(const struct ulogd_pluginstance *,
+									 unsigned);
+
+
+static int
+nl_error(struct ulogd_pluginstance *pi, struct nlmsghdr *nlh, int *err)
 {
-	struct ct_htable *htable;
-	struct ct_timestamp *ct;
-	int i;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	struct nlmsgerr *e = NLMSG_DATA(nlh);
+	struct conntrack *ct;
 
-	htable = malloc(sizeof(*htable)
-			+ sizeof(struct llist_head)*htable_size);
-	if (!htable)
-		return NULL;
+	if (e->msg.nlmsg_seq == 0)
+		return 0;
 
-	htable->buckets = (void *)htable + sizeof(*htable);
-	htable->num_buckets = htable_size;
-	htable->prealloc = prealloc;
-	INIT_LLIST_HEAD(&htable->idle);
+	ct = scache_find(pi, e->msg.nlmsg_seq);
+	if (ct == NULL)
+		return 0;						/* already gone */
+
+	switch (-e->error) {
+	case ENOENT:
+		/* destroy message was lost (FIXME log all what we got) */
+		if (ct->used > 1) {
+			struct conntrack *ct_tmp = tcache_find(pi, &ct->tuple);
 
-	for (i = 0; i < htable->num_buckets; i++)
-		INIT_LLIST_HEAD(&htable->buckets[i]);
-	
-	if (!htable->prealloc)
-		return htable;
+			if (ct == ct_tmp)
+				cache_del(priv->tcache, ct);
+		}
+		cache_del(priv->scache, ct);
+		break;
 
-	ct = malloc(sizeof(struct ct_timestamp)
-		    * htable->num_buckets * htable->prealloc);
-	if (!ct) {
-		free(htable);
-		return NULL;
+	case 0:								/* "Success" */
+		break;
+
+	default:
+		ulogd_log(ULOGD_ERROR, "netlink error: %s (seq %u)\n",
+				  strerror(-e->error), e->msg.nlmsg_seq);
+		break;
+	}
+
+	*err = -e->error;
+
+	return 0;
+}
+
+
+/* this should go into its own file */
+static int
+nfnl_recv_msgs(struct nfnl_handle *nfnlh,
+			   int (* cb)(struct nlmsghdr *, void *arg), void *arg)
+{
+	static unsigned char buf[NFNL_BUFFSIZE];
+	struct ulogd_pluginstance *pi = arg;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+
+	for (;;) {
+		struct nlmsghdr *nlh = (void *)buf;
+		ssize_t nread;
+
+		nread = nfnl_recv(nfct_nfnlh(priv->cth), buf, sizeof(buf));
+		if (nread < 0) {
+			if (errno == EWOULDBLOCK)
+				break;
+
+			return -1;
+		}
+
+		while (NLMSG_OK(nlh, nread)) {
+			int err = 0;
+
+			if (nlh->nlmsg_type == NLMSG_ERROR) {
+				if (nl_error(pi, nlh, &err) == 0 && err != 0)
+					priv->stats.nl_err++;
+
+				break;
+			}
+
+			if (nlh->nlmsg_type == NLMSG_OVERRUN)
+				priv->stats.nl_ovr++;	/* continue?  payload? */
+
+			(cb)(nlh, pi);
+
+			nlh = NLMSG_NEXT(nlh, nread);
+		}
 	}
 
-	/* save the pointer for later free()ing */
-	htable->ts = ct;
+	return 0;
+}
+
+
+static int
+nfct_msg_type(const struct nlmsghdr *nlh)
+{
+	uint16_t type = NFNL_MSG_TYPE(nlh->nlmsg_type);
+	int nfct_type;
+
+	if (type == IPCTNL_MSG_CT_NEW) {
+		if (nlh->nlmsg_flags & (NLM_F_CREATE | NLM_F_EXCL))
+			nfct_type = NFCT_MSG_NEW;
+		else
+			nfct_type = NFCT_MSG_UPDATE;
+	} else if (type == IPCTNL_MSG_CT_DELETE)
+		nfct_type = NFCT_MSG_DESTROY;
+	else
+		nfct_type = NFCT_MSG_UNKNOWN;
+
+	return nfct_type;
+}
+
+
+/*
+ * nfct_get_conntrack_seq()
+ *
+ * Do GET_CONNTRACK, return seq# used.
+ */
+static int
+nfct_get_conntrack_seq(struct nfct_handle *cth, struct nfct_tuple *t,
+					  uint32_t *seq)
+{
+	static char buf[NFNL_BUFFSIZE];
+	struct nfnlhdr *req = (void *)buf;
+
+	memset(buf, 0, sizeof(buf));
+
+	/* intendedly do not set NLM_F_ACK in order to skip the
+	   ACK message (but NACKs are still send) */
+	nfnl_fill_hdr(nfct_subsys_ct(cth), &req->nlh, 0, t->l3protonum,
+				  0, IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
+
+	if (seq != NULL)
+		*seq = req->nlh.nlmsg_seq;
+
+	nfct_build_tuple(req, sizeof(buf), t, CTA_TUPLE_ORIG);
+
+	return nfnl_send(nfct_nfnlh(cth), &req->nlh);
+}
+
+/* time diff with second resolution */
+static inline unsigned
+tv_diff_sec(const struct timeval *tv1, const struct timeval *tv2)
+{
+	if (tv2->tv_sec >= tv1->tv_sec)
+		return max(tv2->tv_sec - tv1->tv_sec, 1);
+
+	return tv1->tv_sec - tv2->tv_sec;
+}
+
+struct conntrack *
+ct_alloc(const struct nfct_tuple *tuple)
+{
+	struct conntrack *ct;
+
+	if ((ct = calloc(1, sizeof(struct conntrack))) == NULL)
+		return NULL;
+
+	memcpy(&ct->tuple, tuple, sizeof(struct nfct_tuple));
+
+	num_conntrack++;
+
+	return ct;
+}
+
+static inline void
+ct_get(struct conntrack *ct)
+{
+	ct->used++;
+}
+
+static inline void
+ct_put(struct conntrack *ct)
+{
+	if (--ct->used == 0) {
+		assert(num_conntrack > 0);
 
-	for (i = 0; i < htable->num_buckets * htable->prealloc; i++)
-		llist_add(&ct[i].list, &htable->idle);
+		free(ct);
 
-	return htable;
+		num_conntrack--;
+	}
 }
 
-static void htable_free(struct ct_htable *htable)
+static struct cache *
+cache_alloc(int cache_size)
 {
-	struct llist_head *ptr, *ptr2;
+	struct cache *c;
 	int i;
 
-	if (htable->prealloc) {
-		/* the easy case */
-		free(htable->ts);
-		free(htable);
+	c = malloc(sizeof(*c) + sizeof(struct cache_head) * cache_size);
+	if (c == NULL)
+		return NULL;
+
+	c->c_head = (void *)c + sizeof(*c);
+	c->c_num_heads = cache_size;
+	c->c_curr_head = 0;
+	c->c_cnt = 0;
+
+	for (i = 0; i < c->c_num_heads; i++) {
+		INIT_LLIST_HEAD(&c->c_head[i].link);
+		c->c_head[i].cnt = 0;
+	}
+
+	return c;
+}
 
+static void
+cache_free(struct cache *c)
+{
+	int i;
+
+	if (c == NULL)
 		return;
+
+	for (i = 0; i < c->c_num_heads; i++) {
+		struct llist_head *ptr, *ptr2;
+
+		llist_for_each_safe(ptr, ptr2, &c->c_head[i].link)
+			free(container_of(ptr, struct conntrack, list));
 	}
 
-	/* non-prealloc case */
+	free(c);
+}
+
+int
+cache_add(struct cache *c, struct conntrack *ct)
+{
+	ct_get(ct);
+
+	ct->time[UPDATE].tv_sec = ct->time[START].tv_sec = t_now_local;
+
+	/* order of these two is important for debugging purposes */
+	c->c_cnt++;
+	c->c_add(c, ct);
+
+	return 0;
+}
+
+int
+cache_del(struct cache *c, struct conntrack *ct)
+{
+	assert(c->c_cnt > 0);
+	assert(ct->used > 0);
+
+	/* order of these two is important for debugging purposes */
+	c->c_del(c, ct);
+	c->c_cnt--;
+
+	ct_put(ct);
+
+	return 0;
+}
+
+static inline conntrack_hash_t
+cache_head_next(const struct cache *c)
+{
+	return (c->c_curr_head + 1) % c->c_num_heads;
+}
+
+static inline conntrack_hash_t
+cache_slice_end(const struct cache *c, unsigned n)
+{
+	return (c->c_curr_head + n) % c->c_num_heads;
+}
+
+/* tuple cache */
+static struct conntrack ct_search;		/* used by scache too */
+
+static conntrack_hash_t
+tcache_hash(struct cache *c, struct conntrack *ct)
+{
+	static unsigned rnd;
+	struct nfct_tuple *t = &ct->tuple;
+
+	if (rnd == 0U)
+		rnd = rand();
 
-	for (i = 0; i < htable->num_buckets; i++) {
-		llist_for_each_safe(ptr, ptr2, &htable->buckets[i])
-			free(container_of(ptr, struct ct_timestamp, list));
+	return jhash_3words(t->src.v4, t->dst.v4 ^ t->protonum,	t->l4src.all
+						| (t->l4dst.all << 16), rnd) % c->c_num_heads;
+}
+
+static int
+tcache_add(struct cache *c, struct conntrack *ct)
+{
+	conntrack_hash_t h = c->c_hash(c, ct);
+
+	llist_add(&ct->list, &c->c_head[h].link);
+	c->c_head[h].cnt++;
+
+	pr_debug("%s: ct=%p (h %u, %u/%u)\n", __func__, ct, h,
+			 c->c_head[h].cnt, c->c_cnt);
+
+	return 0;
+}
+
+static int
+tcache_del(struct cache *c, struct conntrack *ct)
+{
+	conntrack_hash_t h = c->c_hash(c, ct);
+
+	assert(c->c_head[h].cnt > 0);
+
+	pr_debug("%s: ct=%p (h %u, %u/%u)\n", __func__, ct, h,
+			 c->c_head[h].cnt, c->c_cnt);
+
+	llist_del(&ct->list);
+	c->c_head[h].cnt--;
+
+	return 0;
+}
+
+static struct conntrack *
+tcache_find(const struct ulogd_pluginstance *pi,
+			const struct nfct_tuple *tuple)
+{
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	struct cache *c = priv->tcache;
+	struct conntrack *ct;
+	conntrack_hash_t h;
+
+	memcpy(&ct_search.tuple, tuple, sizeof(struct nfct_tuple));
+	h = c->c_hash(c, &ct_search);
+
+	llist_for_each_entry(ct, &c->c_head[h].link, list) {
+		if (memcmp(&ct->tuple, tuple, sizeof(*tuple)) == 0)
+			return ct;
 	}
 
-	/* don't need to check for 'idle' list, since it is only used in
-	 * the preallocated case */
+	return NULL;
 }
 
-static int ct_hash_add(struct ct_htable *htable, unsigned int id)
+/* check entries in tuple cache */
+static int
+tcache_cleanup(struct ulogd_pluginstance *pi)
 {
-	struct ct_timestamp *ct;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	struct cache *c = priv->tcache;
+	conntrack_hash_t end = cache_slice_end(c, 32);
+	struct conntrack *ct;
+	int ret, req = 0;
+
+	do {
+		llist_for_each_entry_reverse(ct, &c->c_head[c->c_curr_head].link,
+									 list) {
+			if (tv_diff_sec(&ct->time[UPDATE], &tv_now) < timeout_ce(pi))
+				continue;
+
+			/* check if its still there */
+			ret = nfct_get_conntrack_seq(priv->cth, &ct->tuple,
+									   &ct->last_seq);
+			if (ret < 0) {
+				if (errno == EWOULDBLOCK)
+					break;
+
+				ulogd_log(ULOGD_ERROR, "nfct_get_conntrack: ct=%p: %m\n",
+						  ct);
+				break;
+			}
+
+			if (&ct->last_seq != 0) {
+				ct->t_req = t_now;
 
-	if (htable->prealloc) {
-		if (llist_empty(&htable->idle)) {
-			ulogd_log(ULOGD_ERROR, "Not enough ct_timestamp entries\n");
-			return -1;
+				assert(scache_find(pi, ct->last_seq) == NULL);
+
+				cache_add(priv->scache, ct);
+			}
+
+			if (++req > TCACHE_REQ_MAX)
+				break;
 		}
 
-		ct = container_of(htable->idle.next, struct ct_timestamp, list);
+		c->c_curr_head = cache_head_next(c);
 
-		ct->id = id;
-		gettimeofday(&ct->time[START], NULL);
+		if (req > TCACHE_REQ_MAX)
+			break;
+	} while (c->c_curr_head != end);
 
-		llist_move(&ct->list, &htable->buckets[id % htable->num_buckets]);
-	} else {
-		ct = malloc(sizeof *ct);
-		if (!ct) {
-			ulogd_log(ULOGD_ERROR, "Not enough memory\n");
-			return -1;
-		}
+	return req;
+}
 
-		ct->id = id;
-		gettimeofday(&ct->time[START], NULL);
+/* sequence cache */
+static conntrack_hash_t
+scache_hash(struct cache *c, struct conntrack *ct)
+{
+	static unsigned rnd;
 
-		llist_add(&ct->list, &htable->buckets[id % htable->num_buckets]);
-	}
+	if (rnd == 0U)
+		rnd = rand();
+
+	return (ct->last_seq ^ rnd) % c->c_num_heads;
+}
+
+static int
+scache_add(struct cache *c, struct conntrack *ct)
+{
+	conntrack_hash_t h = c->c_hash(c, ct);
+
+	llist_add(&ct->seq_link, &c->c_head[h].link);
+	c->c_head[h].cnt++;
+
+	pr_debug("%s: ct=%p (h %u, %u/%u)\n", __func__, ct, h,
+			 c->c_head[h].cnt, c->c_cnt);
 
 	return 0;
 }
 
-static struct ct_timestamp *ct_hash_get(struct ct_htable *htable, uint32_t id)
+static int
+scache_del(struct cache *c, struct conntrack *ct)
 {
-	struct ct_timestamp *ct = NULL;
-	struct llist_head *ptr;
+	conntrack_hash_t h = c->c_hash(c, ct);
 
-	llist_for_each(ptr, &htable->buckets[id % htable->num_buckets]) {
-		ct = container_of(ptr, struct ct_timestamp, list);
-		if (ct->id == id) {
-			gettimeofday(&ct->time[STOP], NULL);
-			if (htable->prealloc)
-				llist_move(&ct->list, &htable->idle);
-			else
-				free(ct);
-			break;
-		}
+	assert(c->c_head[h].cnt > 0);
+
+	pr_debug("%s: ct=%p (h %u, %u/%u)\n", __func__, ct, h,
+			 c->c_head[h].cnt, c->c_cnt);
+
+	llist_del(&ct->seq_link);
+	ct->last_seq = 0;
+
+	c->c_head[h].cnt--;
+
+	return 0;
+}
+
+static struct conntrack *
+scache_find(const struct ulogd_pluginstance *pi, unsigned seq)
+{
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	struct cache *c = priv->scache;
+	struct conntrack *ct;
+	conntrack_hash_t h;
+
+	ct_search.last_seq = seq;
+	h = c->c_hash(c, &ct_search);
+
+	llist_for_each_entry(ct, &c->c_head[h].link, seq_link) {
+		if (ct->last_seq == ct_search.last_seq)
+			return ct;
 	}
-	return ct;
+
+	return NULL;
+}
+
+static int
+scache_cleanup(struct ulogd_pluginstance *pi)
+{
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	struct cache *c = priv->scache;
+	conntrack_hash_t end = cache_slice_end(c, 16);
+	struct conntrack *ct;
+	int del = 0;
+
+	if (c->c_cnt == 0)
+		return 0;
+
+	do {
+		struct llist_head *curr, *tmp;
+
+		assert(c->c_curr_head < c->c_num_heads);
+
+		llist_for_each_prev_safe(curr, tmp, &c->c_head[c->c_curr_head].link) {
+			ct = container_of(curr, struct conntrack, seq_link);
+
+			assert(ct->t_req != 0);
+
+			if ((t_now - ct->t_req) < 5 SEC)
+				break;
+
+			cache_del(priv->scache, ct);
+			del++;
+		}
+
+		c->c_curr_head = cache_head_next(c);
+	} while (c->c_curr_head != end);
+
+	return del;
 }
 
-static int propagate_ct_flow(struct ulogd_pluginstance *upi, 
-		             struct nfct_conntrack *ct,
-			     unsigned int flags,
-			     int dir,
-			     struct ct_timestamp *ts)
+static int
+propagate_ct_flow(struct ulogd_pluginstance *upi, 
+				  const struct nfct_conntrack *nfct,
+				  const struct conntrack *ct)
 {
 	struct ulogd_key *ret = upi->output.keys;
 
-	ret[0].u.value.ui32 = htonl(ct->tuple[dir].src.v4);
-	ret[0].flags |= ULOGD_RETF_VALID;
+	ret[O_IP_SADDR].u.value.ui32 = htonl(nfct->tuple[ORIG].src.v4);
+	ret[O_IP_SADDR].flags |= ULOGD_RETF_VALID;
 
-	ret[1].u.value.ui32 = htonl(ct->tuple[dir].dst.v4);
-	ret[1].flags |= ULOGD_RETF_VALID;
+	ret[O_IP_DADDR].u.value.ui32 = htonl(nfct->tuple[REPL].src.v4);
+	ret[O_IP_DADDR].flags |= ULOGD_RETF_VALID;
 
-	ret[2].u.value.ui8 = ct->tuple[dir].protonum;
-	ret[2].flags |= ULOGD_RETF_VALID;
+	ret[O_IP_PROTO].u.value.ui8 = nfct->tuple[ORIG].protonum;
+	ret[O_IP_PROTO].flags |= ULOGD_RETF_VALID;
 
-	switch (ct->tuple[1].protonum) {
+	switch (nfct->tuple[ORIG].protonum) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
 	case IPPROTO_SCTP:
 		/* FIXME: DCCP */
-		ret[3].u.value.ui16 = htons(ct->tuple[dir].l4src.tcp.port);
-		ret[3].flags |= ULOGD_RETF_VALID;
-		ret[4].u.value.ui16 = htons(ct->tuple[dir].l4dst.tcp.port);
-		ret[4].flags |= ULOGD_RETF_VALID;
+		ret[O_L4_SPORT].u.value.ui16
+			= htons(nfct->tuple[ORIG].l4src.tcp.port);
+		ret[O_L4_SPORT].flags |= ULOGD_RETF_VALID;
+		ret[O_L4_DPORT].u.value.ui16
+			= htons(nfct->tuple[REPL].l4src.tcp.port);
+		ret[O_L4_DPORT].flags |= ULOGD_RETF_VALID;
 		break;
 	case IPPROTO_ICMP:
-		ret[7].u.value.ui8 = ct->tuple[dir].l4src.icmp.code;
-		ret[7].flags |= ULOGD_RETF_VALID;
-		ret[8].u.value.ui8 = ct->tuple[dir].l4src.icmp.type;
-		ret[8].flags |= ULOGD_RETF_VALID;
+		ret[O_ICMP_CODE].u.value.ui8 = nfct->tuple[ORIG].l4src.icmp.code;
+		ret[O_ICMP_CODE].flags |= ULOGD_RETF_VALID;
+		ret[O_ICMP_TYPE].u.value.ui8 = nfct->tuple[ORIG].l4src.icmp.type;
+		ret[O_ICMP_TYPE].flags |= ULOGD_RETF_VALID;
 		break;
 	}
 
-	if ((dir == NFCT_DIR_ORIGINAL && flags & NFCT_COUNTERS_ORIG) ||
-	    (dir == NFCT_DIR_REPLY && flags & NFCT_COUNTERS_RPLY)) {
-		ret[5].u.value.ui64 = ct->counters[dir].bytes;
-		ret[5].flags |= ULOGD_RETF_VALID;
+	ret[O_RAW_IN_PKTLEN].u.value.ui32 = nfct->counters[ORIG].bytes;
+	ret[O_RAW_IN_PKTLEN].flags |= ULOGD_RETF_VALID;
+	ret[O_RAW_IN_PKTCOUNT].u.value.ui32 = nfct->counters[ORIG].packets;
+	ret[O_RAW_IN_PKTCOUNT].flags |= ULOGD_RETF_VALID;
+
+	ret[O_RAW_OUT_PKTLEN].u.value.ui32 = nfct->counters[REPL].bytes;
+	ret[O_RAW_OUT_PKTLEN].flags |= ULOGD_RETF_VALID;
+	ret[O_RAW_OUT_PKTCOUNT].u.value.ui32 = nfct->counters[REPL].packets;
+	ret[O_RAW_OUT_PKTCOUNT].flags |= ULOGD_RETF_VALID;
+
+	ret[O_CT_MARK].u.value.ui32 = nfct->mark;
+	ret[O_CT_MARK].flags |= ULOGD_RETF_VALID;
+
+	ret[O_CT_ID].u.value.ui32 = nfct->id;
+	ret[O_CT_ID].flags |= ULOGD_RETF_VALID;
+
+	ret[O_FLOW_START_SEC].u.value.ui32 = ct->time[START].tv_sec;
+	ret[O_FLOW_START_SEC].flags |= ULOGD_RETF_VALID;
+	ret[O_FLOW_START_USEC].u.value.ui32 = ct->time[START].tv_usec;
+	ret[O_FLOW_START_USEC].flags |= ULOGD_RETF_VALID;
+
+	ret[O_FLOW_END_SEC].u.value.ui32 = ct->time[STOP].tv_sec;
+	ret[O_FLOW_END_SEC].flags |= ULOGD_RETF_VALID;
+	ret[O_FLOW_END_USEC].u.value.ui32 = ct->time[STOP].tv_usec;
+	ret[O_FLOW_END_USEC].flags |= ULOGD_RETF_VALID;
+
+	ret[O_FLOW_DURATION].u.value.ui32 = tv_diff_sec(&ct->time[START],
+													&ct->time[STOP]);
+	ret[O_FLOW_DURATION].flags |= ULOGD_RETF_VALID;
 
-		ret[6].u.value.ui64 = ct->counters[dir].packets;
-		ret[6].flags |= ULOGD_RETF_VALID;
-	}
+	ulogd_propagate_results(upi);
 
-	if (flags & NFCT_MARK) {
-		ret[9].u.value.ui32 = ct->mark;
-		ret[9].flags |= ULOGD_RETF_VALID;
-	}
+	return 0;
+}
 
-	if (flags & NFCT_ID) {
-		ret[10].u.value.ui32 = ct->id;
-		ret[10].flags |= ULOGD_RETF_VALID;
-	}
+static int
+propagate_ct(struct ulogd_pluginstance *upi,
+			 struct nfct_conntrack *nfct, struct conntrack *ct)
+{
+	struct nfct_pluginstance *priv = (void *)upi->private;
 
-	if (ts) {
-		ret[11].u.value.ui32 = ts->time[START].tv_sec;
-		ret[11].flags |= ULOGD_RETF_VALID;
-		ret[12].u.value.ui32 = ts->time[START].tv_usec;
-		ret[12].flags |= ULOGD_RETF_VALID;
-		ret[13].u.value.ui32 = ts->time[STOP].tv_sec;
-		ret[13].flags |= ULOGD_RETF_VALID;
-		ret[14].u.value.ui32 = ts->time[STOP].tv_usec;
-		ret[14].flags |= ULOGD_RETF_VALID;
-	}
+	do {
+		if (nfct->tuple[ORIG].src.v4 == INADDR_LOOPBACK
+			|| nfct->tuple[ORIG].dst.v4 == INADDR_LOOPBACK)
+			break;
 
-	ret[15].u.value.b = (dir == NFCT_DIR_ORIGINAL) ? 0 : 1;
-	ret[15].flags |= ULOGD_RETF_VALID;
+		ct->time[STOP].tv_sec = t_now_local;
 
-	ulogd_propagate_results(upi);
+		propagate_ct_flow(upi, nfct, ct);
+	} while (0);
+
+	cache_del(priv->tcache, ct);
 
 	return 0;
 }
 
-static int propagate_ct(struct ulogd_pluginstance *upi,
-			struct nfct_conntrack *ct,
-			unsigned int flags,
-			struct ct_timestamp *ctstamp)
+/* nfct_to_conntrack() - translate from opaque type to nfct_conntrack */
+static int
+nfct_to_conntrack(const struct ulogd_pluginstance *pi,
+				  const struct nf_conntrack *ct, struct nfct_conntrack *out)
 {
-	int rc;
+	bzero(out, sizeof(struct nfct_conntrack));
 
-	rc = propagate_ct_flow(upi, ct, flags, NFCT_DIR_ORIGINAL, ctstamp);
-	if (rc < 0)
-		return rc;
+	assert(nfct_attr_is_set(ct, ATTR_L3PROTO));
+	out->tuple[ORIG].l3protonum = nfct_get_attr_u8(ct, ATTR_L3PROTO);
 
-	return propagate_ct_flow(upi, ct, flags, NFCT_DIR_REPLY, ctstamp);
-}
+	assert(nfct_attr_is_set(ct, ATTR_L4PROTO));
+	out->tuple[ORIG].protonum = nfct_get_attr_u8(ct, ATTR_L4PROTO);
 
-static int event_handler(void *arg, unsigned int flags, int type,
-			 void *data)
-{
-	struct nfct_conntrack *ct = arg;
-	struct ulogd_pluginstance *upi = data;
-	struct nfct_pluginstance *cpi = 
-				(struct nfct_pluginstance *) upi->private;
-
-	if (type == NFCT_MSG_NEW) {
-		if (usehash_ce(upi->config_kset).u.value != 0)
-			ct_hash_add(cpi->ct_active, ct->id);
-	} else if (type == NFCT_MSG_DESTROY) {
-		struct ct_timestamp *ts = NULL;
+	out->tuple[ORIG].src.v4 = nfct_get_attr_u32(ct, ATTR_IPV4_SRC);
+	out->tuple[REPL].src.v4 = nfct_get_attr_u32(ct, ATTR_REPL_IPV4_SRC);
+	out->tuple[ORIG].dst.v4 = nfct_get_attr_u32(ct, ATTR_IPV4_DST);
+	out->tuple[REPL].dst.v4 = nfct_get_attr_u32(ct, ATTR_REPL_IPV4_DST);
+
+	if (out->tuple[ORIG].l3protonum == IPPROTO_ICMP) {
+		out->tuple[ORIG].l4src.icmp.type
+			= nfct_get_attr_u8(ct, ATTR_ICMP_TYPE);
+		out->tuple[ORIG].l4src.icmp.code
+			= nfct_get_attr_u8(ct, ATTR_ICMP_CODE);
+		out->tuple[ORIG].l4src.icmp.id
+			= nfct_get_attr_u16(ct, ATTR_ICMP_ID);
+	}
 
-		if (usehash_ce(upi->config_kset).u.value != 0)
-			ts = ct_hash_get(cpi->ct_active, ct->id);
+	if (out->tuple[ORIG].protonum == IPPROTO_TCP
+		|| out->tuple[ORIG].protonum == IPPROTO_UDP) {
+		assert(nfct_attr_is_set(ct, ATTR_ORIG_PORT_SRC));
+		out->tuple[ORIG].l4src.tcp.port
+			= nfct_get_attr_u16(ct, ATTR_ORIG_PORT_SRC);
+
+		assert(nfct_attr_is_set(ct, ATTR_ORIG_PORT_DST));
+		out->tuple[ORIG].l4dst.tcp.port
+			= nfct_get_attr_u16(ct, ATTR_ORIG_PORT_DST);
+
+		assert(nfct_attr_is_set(ct, ATTR_REPL_PORT_SRC));
+		out->tuple[REPL].l4src.tcp.port
+			= nfct_get_attr_u16(ct, ATTR_REPL_PORT_SRC);
+
+		assert(nfct_attr_is_set(ct, ATTR_REPL_PORT_DST));
+		out->tuple[REPL].l4dst.tcp.port
+			= nfct_get_attr_u16(ct, ATTR_REPL_PORT_DST);
 
-		return propagate_ct(upi, ct, flags, ts);
+		if (nfct_attr_is_set(ct, ATTR_TCP_STATE))
+			out->protoinfo.tcp.state = nfct_get_attr_u8(ct, ATTR_TCP_STATE);
 	}
+
+	if (nfct_attr_is_set(ct, ATTR_STATUS))
+		out->status = nfct_get_attr_u32(ct, ATTR_STATUS);
+	if (nfct_attr_is_set(ct, ATTR_TIMEOUT))
+		out->timeout = nfct_get_attr_u32(ct, ATTR_TIMEOUT);
+	if (nfct_attr_is_set(ct, ATTR_MARK))
+		out->mark = nfct_get_attr_u32(ct, ATTR_MARK);
+	if (nfct_attr_is_set(ct, ATTR_USE))
+		out->use = nfct_get_attr_u32(ct, ATTR_USE);
+
+	/* counter */
+	if (nfct_attr_is_set(ct, ATTR_ORIG_COUNTER_BYTES))
+		out->counters[ORIG].bytes
+			= nfct_get_attr_u32(ct, ATTR_ORIG_COUNTER_BYTES);
+	if (nfct_attr_is_set(ct, ATTR_ORIG_COUNTER_PACKETS))
+		out->counters[ORIG].packets
+			= nfct_get_attr_u32(ct, ATTR_ORIG_COUNTER_PACKETS);
+	if (nfct_attr_is_set(ct, ATTR_REPL_COUNTER_BYTES))
+		out->counters[ORIG].bytes
+			= nfct_get_attr_u32(ct, ATTR_REPL_COUNTER_BYTES);
+	if (nfct_attr_is_set(ct, ATTR_REPL_COUNTER_PACKETS))
+		out->counters[REPL].packets
+			= nfct_get_attr_u32(ct, ATTR_REPL_COUNTER_PACKETS);
+
 	return 0;
 }
 
-static int read_cb_nfct(int fd, unsigned int what, void *param)
+static int
+do_nfct_msg(struct nlmsghdr *nlh, void *arg)
 {
-	struct nfct_pluginstance *cpi = (struct nfct_pluginstance *) param;
+	struct ulogd_pluginstance *pi = arg;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	struct nfct_conntrack nfct;
+	struct conntrack *ct;
+	int type = nfct_msg_type(nlh);
 
-	if (!(what & ULOGD_FD_READ))
+	if (type == NFCT_MSG_UNKNOWN)
 		return 0;
 
-	/* FIXME: implement this */
-	nfct_event_conntrack(cpi->cth);
+	if (nfct_parse_conntrack(NFCT_T_ALL, nlh, priv->nfct_opaque) < 0)
+		return -1;
+
+	if (nfct_to_conntrack(pi, priv->nfct_opaque, &nfct) < 0)
+		return -1;
+
+	/* TODO handle NFCT_COUNTER_FILLING */
+
+	switch (type) { 
+	case NFCT_MSG_NEW:
+		if ((ct = ct_alloc(&nfct.tuple[ORIG])) == NULL)
+			return -1;
+
+		if (cache_add(priv->tcache, ct) < 0)
+			return -1;
+		break;
+
+	case NFCT_MSG_UPDATE:
+		if ((ct = tcache_find(pi, &nfct.tuple[ORIG])) == NULL) {
+			/* do not add CT to cache, as there would be no start
+			   information */
+			break;
+		}
+
+		ct->time[UPDATE].tv_sec = t_now_local;
+
+		if (ct->used > 1) {
+			struct conntrack *ct_tmp = scache_find(pi, nlh->nlmsg_seq);
+
+			if (ct_tmp != NULL) {
+				assert(ct_tmp == ct);
+
+				cache_del(priv->scache, ct);
+			}
+		}
+
+		/* handle TCP connections differently in order not to bloat CT
+		   hash with many TIME_WAIT connections */
+		if (nfct.tuple[ORIG].protonum == IPPROTO_TCP) {
+			if (nfct.protoinfo.tcp.state == TCP_CONNTRACK_TIME_WAIT)
+				return propagate_ct(pi, &nfct, ct);
+		}
+		break;
+		
+	case NFCT_MSG_DESTROY:
+		if ((ct = tcache_find(pi, &nfct.tuple[ORIG])) != NULL)
+			return propagate_ct(pi, &nfct, ct);
+		break;
+		
+	default:
+		break;
+	}
+
 	return 0;
 }
 
-static int get_ctr_zero(struct ulogd_pluginstance *upi)
+
+static int
+read_cb_nfct(int fd, unsigned what, void *param)
 {
-	struct nfct_pluginstance *cpi = 
-			(struct nfct_pluginstance *)upi->private;
+	struct ulogd_pluginstance *pi = param;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+
+	if (!(what & ULOGD_FD_READ))
+		return 0;
 
-	return nfct_dump_conntrack_table_reset_counters(cpi->cth, AF_INET);
+	return nfnl_recv_msgs(nfct_nfnlh(priv->cth), do_nfct_msg, pi);
 }
 
-static void getctr_timer_cb(void *data)
-{
-	struct ulogd_pluginstance *upi = data;
+/*
+  nfct_timer_cb()
 
-	get_ctr_zero(upi);
+  This is a synchronous timer, do whatever you want.
+*/
+static void
+nfct_timer_cb(struct ulogd_timer *t)
+{
+	struct ulogd_pluginstance *pi = t->data;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	unsigned sc_start, sc_end, tc_start, tc_end;
+
+	sc_start = priv->scache->c_curr_head;
+	tc_start = priv->tcache->c_curr_head;
+
+	scache_cleanup(pi);
+	tcache_cleanup(pi);
+
+	sc_end = priv->scache->c_curr_head;
+	tc_end = priv->tcache->c_curr_head;
+
+	ulogd_log(ULOGD_DEBUG, "%s: ct=%u t=%u [%u,%u[ s=%u [%u,%u[\n",
+			  pi->id, num_conntrack,
+			  priv->tcache->c_cnt, tc_start, tc_end,
+			  priv->scache->c_cnt, sc_start, sc_end);
 }
 
-static int configure_nfct(struct ulogd_pluginstance *upi,
-			  struct ulogd_pluginstance_stack *stack)
+static int
+nfct_configure(struct ulogd_pluginstance *upi,
+			   struct ulogd_pluginstance_stack *stack)
 {
-	struct nfct_pluginstance *cpi = 
-			(struct nfct_pluginstance *)upi->private;
+	struct nfct_pluginstance *priv = (void *)upi->private;
 	int ret;
+
+	memset(priv, 0, sizeof(struct nfct_pluginstance));
 	
 	ret = config_parse_file(upi->id, upi->config_kset);
 	if (ret < 0)
 		return ret;
-	
-	/* initialize getctrzero timer structure */
-	memset(&cpi->timer, 0, sizeof(cpi->timer));
-	cpi->timer.cb = &getctr_timer_cb;
-	cpi->timer.data = cpi;
-
-	if (pollint_ce(upi->config_kset).u.value != 0) {
-		cpi->timer.expires.tv_sec = 
-			pollint_ce(upi->config_kset).u.value;
-		ulogd_register_timer(&cpi->timer);
-	}
 
 	return 0;
 }
 
-static int constructor_nfct(struct ulogd_pluginstance *upi)
+static int
+init_caches(struct ulogd_pluginstance *pi)
 {
-	struct nfct_pluginstance *cpi = 
-			(struct nfct_pluginstance *)upi->private;
-	int prealloc;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+	struct cache *c;
 
-	memset(cpi, 0, sizeof(*cpi));
+	assert(priv->tcache == NULL && priv->scache == NULL);
 
-	/* FIXME: make eventmask configurable */
-	cpi->cth = nfct_open(NFNL_SUBSYS_CTNETLINK, NF_NETLINK_CONNTRACK_NEW|
-			     NF_NETLINK_CONNTRACK_DESTROY);
-	if (!cpi->cth) {
-		ulogd_log(ULOGD_FATAL, "error opening ctnetlink\n");
+	/* tuple cache */
+	c = priv->tcache = cache_alloc(buckets_ce(pi));
+	if (priv->tcache == NULL) {
+		ulogd_log(ULOGD_FATAL, "%s: out of memory\n", pi->id);
 		return -1;
 	}
 
-	nfct_register_callback(cpi->cth, &event_handler, upi);
+	c->c_hash = tcache_hash;
+	c->c_add = tcache_add;
+	c->c_del = tcache_del;
+
+	/* sequence cache */
+	c = priv->scache = cache_alloc(SCACHE_SIZE);
+	if (priv->scache == NULL) {
+		ulogd_log(ULOGD_FATAL, "%s: out of memory\n", pi->id);
 
-	cpi->nfct_fd.fd = nfct_fd(cpi->cth);
-	cpi->nfct_fd.cb = &read_cb_nfct;
-	cpi->nfct_fd.data = cpi;
-	cpi->nfct_fd.when = ULOGD_FD_READ;
-
-	ulogd_register_fd(&cpi->nfct_fd);
-
-	if (prealloc_ce(upi->config_kset).u.value != 0)
-		prealloc = maxentries_ce(upi->config_kset).u.value / 
-				buckets_ce(upi->config_kset).u.value;
-	else
-		prealloc = 0;
+		cache_free(priv->tcache);
+		priv->tcache = NULL;
 
-	if (usehash_ce(upi->config_kset).u.value != 0) {
-		cpi->ct_active = htable_alloc(buckets_ce(upi->config_kset).u.value,
-					      prealloc);
-		if (!cpi->ct_active) {
-			ulogd_log(ULOGD_FATAL, "error allocating hash\n");
-			nfct_close(cpi->cth);
-			return -1;
-		}
+		return -1;
 	}
-	
+
+	c->c_hash = scache_hash;
+	c->c_add = scache_add;
+	c->c_del = scache_del;
+
 	return 0;
 }
 
-static int destructor_nfct(struct ulogd_pluginstance *pi)
+static int
+nfct_start(struct ulogd_pluginstance *upi)
 {
-	struct nfct_pluginstance *cpi = (void *) pi;
-	int rc;
-	
-	htable_free(cpi->ct_active);
+	struct nfct_pluginstance *priv = (void *)upi->private;
+
+	pr_debug("%s: pi=%p\n", __func__, upi);
+
+	if (disable_ce(upi) != 0) {
+		ulogd_log(ULOGD_INFO, "%s: disabled\n", upi->id);
+		return 0;
+	}
+
+	if (priv->nfct_opaque == NULL) {
+		if ((priv->nfct_opaque = nfct_new()) == NULL) {
+			ulogd_log(ULOGD_FATAL, "%s: out of memory\n", upi->id);
+			return -1;
+		}
+	}
+
+	if (init_caches(upi) < 0)
+		goto err_free;
+
+	priv->cth = nfct_open(NFNL_SUBSYS_CTNETLINK, NFCT_ALL_CT_GROUPS);
+	if (priv->cth == NULL) {
+		ulogd_log(ULOGD_FATAL, "error opening ctnetlink\n");
+		goto err_free;
+	}
+
+	ulogd_log(ULOGD_DEBUG, "%s: ctnetlink connection opened\n", upi->id);
 
-	rc = nfct_close(cpi->cth);
-	if (rc < 0)
-		return rc;
+	if (nfnl_rcvbufsiz(nfct_nfnlh(priv->cth), RCVBUF_LEN) < 0)
+		goto err_nfct_close;
+
+	priv->nfct_fd.fd = nfct_fd(priv->cth);
+	priv->nfct_fd.cb = &read_cb_nfct;
+	priv->nfct_fd.data = upi;
+	priv->nfct_fd.when = ULOGD_FD_READ;
+
+	if (ulogd_register_fd(&priv->nfct_fd) < 0)
+		goto err_nfct_close;
+
+	priv->timer.cb = nfct_timer_cb;
+	priv->timer.ival = 1 SEC;
+	priv->timer.flags = TIMER_F_PERIODIC;
+	priv->timer.data = upi;
+
+	if (ulogd_register_timer(&priv->timer) < 0)
+		goto err_unreg_fd;
+
+	ulogd_log(ULOGD_INFO, "%s: started (tcache %u, scache %u)\n", upi->id,
+			  priv->tcache->c_num_heads, priv->scache->c_num_heads);
 
 	return 0;
+
+ err_unreg_fd:
+	ulogd_unregister_fd(&priv->nfct_fd);
+ err_nfct_close:
+	nfct_close(priv->cth);
+	priv->cth = NULL;
+ err_free:
+	free(priv->nfct_opaque);
+	priv->nfct_opaque = NULL;
+	cache_free(priv->tcache);
+	priv->tcache = NULL;
+	cache_free(priv->scache);
+	priv->tcache = NULL;
+
+	return -1;
 }
 
-static void signal_nfct(struct ulogd_pluginstance *pi, int signal)
+static int
+nfct_stop(struct ulogd_pluginstance *pi)
 {
-	switch (signal) {
-	case SIGUSR2:
-		get_ctr_zero(pi);
-		break;
+	struct nfct_pluginstance *priv = (void *)pi->private;
+
+	pr_debug("%s: pi=%p\n", __func__, pi);
+
+	if (disable_ce(pi) != 0)
+		return 0;				/* wasn't started */
+
+	if (priv->tcache == NULL)
+		return 0;				/* already stopped */
+
+	ulogd_unregister_timer(&priv->timer);
+
+	ulogd_unregister_fd(&priv->nfct_fd);
+
+	if (priv->cth != NULL) {
+		nfct_close(priv->cth);
+		priv->cth = NULL;
 	}
+
+	ulogd_log(ULOGD_DEBUG, "%s: ctnetlink connection closed\n", pi->id);
+
+	cache_free(priv->tcache);
+	priv->tcache = NULL;
+	cache_free(priv->scache);
+	priv->scache = NULL;
+
+	free(priv->nfct_opaque);
+	priv->nfct_opaque = NULL;
+
+	return 0;
 }
 
 static struct ulogd_plugin nfct_plugin = {
 	.name = "NFCT",
+	.flags = ULOGD_PF_RECONF,
 	.input = {
 		.type = ULOGD_DTYPE_SOURCE,
 	},
@@ -604,17 +1229,17 @@ static struct ulogd_plugin nfct_plugin =
 	},
 	.config_kset 	= &nfct_kset,
 	.interp 	= NULL,
-	.configure	= &configure_nfct,
-	.start		= &constructor_nfct,
-	.stop		= &destructor_nfct,
-	.signal		= &signal_nfct,
+	.configure	= nfct_configure,
+	.start		= nfct_start,
+	.stop		= nfct_stop,
 	.priv_size	= sizeof(struct nfct_pluginstance),
 	.version	= ULOGD_VERSION,
 };
 
 void __attribute__ ((constructor)) init(void);
 
-void init(void)
+void
+init(void)
 {
 	ulogd_register_plugin(&nfct_plugin);
 }

-- 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 01/15] Add NACCT output plugin
  2008-02-02 20:48 ` [ULOGD 01/15] Add NACCT output plugin heitzenberger
@ 2008-02-02 21:24   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 21:24 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

Applied. Thanks Holger.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 02/15] common.h: added
  2008-02-02 20:48 ` [ULOGD 02/15] common.h: added heitzenberger
@ 2008-02-02 21:30   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 21:30 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

Applied with minor glitches. Please, make sure your lines split at 80
chars. Added minor missing change in include/ulogd/Makefile.am. Thanks.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 04/15] Add IFI list
  2008-02-02 20:48 ` [ULOGD 04/15] Add IFI list heitzenberger
@ 2008-02-02 21:36   ` Pablo Neira Ayuso
  2008-02-02 21:50     ` Holger Eitzenberger
  0 siblings, 1 reply; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 21:36 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

heitzenberger@astaro.com wrote:
> Index: ulogd-netfilter/src/ifi.c
> ===================================================================
> --- /dev/null
> +++ ulogd-netfilter/src/ifi.c

AFAICS, this file implements something similar to the ifname2index API
which lives in libnfnetlink. Why don't you that instead? See the
utils/iftest.c file under libnfnetlink for an example.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 06/15] Conffile cleanup, use common pr_debug()
  2008-02-02 20:48 ` [ULOGD 06/15] Conffile cleanup, use common pr_debug() heitzenberger
@ 2008-02-02 21:43   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 21:43 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

Applied. Thanks Holger.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 07/15] Renice to -1 on startup
  2008-02-02 20:48 ` [ULOGD 07/15] Renice to -1 on startup heitzenberger
@ 2008-02-02 21:47   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 21:47 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

heitzenberger@astaro.com wrote:
> Thus possibly preventing e.g. ctnetlink from overruns on busy sites.
> 
> TODO: make this conditional

Indeed, a patch for that would be nice. Applied. Thanks.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 04/15] Add IFI list
  2008-02-02 21:36   ` Pablo Neira Ayuso
@ 2008-02-02 21:50     ` Holger Eitzenberger
  2008-02-02 22:56       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 31+ messages in thread
From: Holger Eitzenberger @ 2008-02-02 21:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: heitzenberger, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> writes:

> AFAICS, this file implements something similar to the ifname2index API
> which lives in libnfnetlink. Why don't you that instead? See the
> utils/iftest.c file under libnfnetlink for an example.

The version of libnfnetlink I used initially did not have that, nor
would I ever expect for libnfnetlink to provide that functionality.

So I would leave it as-is in order to be able to build ulogd against
older librarires.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 03/15] Replace timer code by working version
  2008-02-02 20:48 ` [ULOGD 03/15] Replace timer code by working version heitzenberger
@ 2008-02-02 22:45   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 22:45 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

heitzenberger@astaro.com wrote:
> Replace existing timer code by simple and more importantly working
> version.  Current resolution is one second, which may be easily
> extended if need be.

I want to apply this patch since it fixes the current timer code.
Although I don't like it since setitimer() wakes up the daemon every 1
seconds even if there is nothing to do but OK, we can fix this later.

> @@ -59,6 +60,7 @@
>  {
>  	struct ulogd_fd *ufd;
>  	fd_set readset, writeset, exceptset;
> +	struct timeval tv = { .tv_sec = 1, };
>  	int i;
>  
>  	FD_ZERO(&readset);
> @@ -77,7 +79,13 @@
>  			FD_SET(ufd->fd, &exceptset);
>  	}
>  
> -	i = select(maxfd+1, &readset, &writeset, &exceptset, NULL);
> + again:
> +	i = select(maxfd+1, &readset, &writeset, &exceptset, &tv);
> +	if (i < 0) {
> +		if (errno == EINTR)
> +			goto again;
> +	}
> +

Hm, but why select also wakes up every 1 second? I'll continue applying
your patches when I get a reply on this (I cannot apply anyway since the
others depend on this).

BTW, this patch depends on 5/15. Please, your patches must compile
without dependencies.

$ make
[...]
/home/pablo/SVN-netfilter/branches/ulog/ulogd2/src/ulogd.c:835:
undefined reference to `ulogd_timer_check_n_run'

Holger Heitzenberger wrote:
> One of the changes herein is the usage of the pthread library.  This
> is strictly necessary because some plugins might (and SQLITE3 *does*)
> use pthreads.

I have grep the source code looking for pthread stuff but I didn't find
it, you mean that sqlite3 would need it in the future?

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 00/15] ulogd V2 improvements, round 2
  2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
                   ` (14 preceding siblings ...)
  2008-02-02 20:48 ` [ULOGD 15/15] NFCT: rework and let it scale heitzenberger
@ 2008-02-02 22:52 ` Pablo Neira Ayuso
  15 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 22:52 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

heitzenberger@astaro.com wrote:
> Because the NFCT-related changes depend on a patchlet to
> libnetfilter-conntrack which might actually not be applied, I'll
> suggest to convert the code base to libnl sooner.  Any objections?

No objections with using libnl but, as I told you in one of my previous
emails, you can remove your small patch dependency by using libnfnetlink
and libnetfilter_conntrack correctly.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 04/15] Add IFI list
  2008-02-02 21:50     ` Holger Eitzenberger
@ 2008-02-02 22:56       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-02 22:56 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: heitzenberger, netfilter-devel

Holger Eitzenberger wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> writes:
> 
>> AFAICS, this file implements something similar to the ifname2index API
>> which lives in libnfnetlink. Why don't you that instead? See the
>> utils/iftest.c file under libnfnetlink for an example.
> 
> The version of libnfnetlink I used initially did not have that, nor
> would I ever expect for libnfnetlink to provide that functionality.

libnfnetlink provides that since July 2007 because other libraries, such
as libnetfilter_queue need such feature. There was a discussion way back
and I decided to it there.

> So I would leave it as-is in order to be able to build ulogd against
> older librarires.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 05/15] Add signalling subsystem
  2008-02-02 20:48 ` [ULOGD 05/15] Add signalling subsystem heitzenberger
@ 2008-02-19 19:38   ` Pablo Neira Ayuso
  2008-02-20  8:43     ` Holger Eitzenberger
  0 siblings, 1 reply; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-19 19:38 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

heitzenberger@astaro.com wrote:
> This patch adds the concept of synchronous and asynchronous signal
> handlers to ulogd, where 'synchronous' just means to be synchronous to
> the underlying IO multiplexer.

I just committed a patch that reworks the timer framework to make it
synchronous with select(). Since we don't have asynchronous timers
anymore, I think that we can just block signaling during file
descriptors handling since ulogd would receive not often. AFAICS, this
infrastructure is nice but it was mainly targeted to the annoying SIGALRM.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 11/15] Add set_sockbuf_len()
  2008-02-02 20:48 ` [ULOGD 11/15] Add set_sockbuf_len() heitzenberger
@ 2008-02-19 19:57   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-19 19:57 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

heitzenberger@astaro.com wrote:
> +	if (snd_len > 0) {
> +		ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd_len,
> +						 sizeof(snd_len));
> +		if (ret < 0) {
> +			ulogd_log(ULOGD_ERROR, "setsockopt: SO_SNDBUF: %m\n");
> +			return -1;
> +		}
> +	}
> +
> +	if (rcv_len > 0) {
> +		ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv_len,
> +						 sizeof(rcv_len));
> +		if (ret < 0) {
> +			ulogd_log(ULOGD_ERROR, "setsockopt: SO_RCVBUF: %m\n");
> +			return -1;
> +		}
> +	}

I guess that this is meant to increase the size of the netlink buffers,
isn't this? However, where's the client of this code?

Please, next time put the new code with its client so I can easily
notice what you want to do by looking at a single patch, otherwise I'll
have to look at the entire patchset to search for its client. A short
rationale on the benefits of the patch would also help.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 10/15] Improve select performance
  2008-02-02 20:48 ` [ULOGD 10/15] Improve select performance heitzenberger
@ 2008-02-19 19:58   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-19 19:58 UTC (permalink / raw)
  To: heitzenberger; +Cc: netfilter-devel, holger

[-- Attachment #1: Type: text/plain, Size: 503 bytes --]

heitzenberger@astaro.com wrote:
> The previous code consumed quite lots of CPU cycles because of 
> inefficiently handling fd_sets.

I have applied the following patch which is based on your improvement.

I'll check the other patches that are targeted to make plugins
reconfigurable as soon as I have some spare time, whatever help in the
merge process with current SVN would be appreciated and would surely
speed up the development. Thanks.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 2295 bytes --]

Index: src/select.c
===================================================================
--- src/select.c	(revisión: 7377)
+++ src/select.c	(copia de trabajo)
@@ -26,6 +26,7 @@
 #include <ulogd/linuxlist.h>
 
 static int maxfd = 0;
+static fd_set readset, writeset, exceptset;
 static LLIST_HEAD(ulogd_fds);
 
 int ulogd_register_fd(struct ulogd_fd *fd)
@@ -41,6 +42,15 @@
 	if (flags < 0)
 		return -1;
 
+	if (fd->when & ULOGD_FD_READ)
+		FD_SET(fd->fd, &readset);
+
+	if (fd->when & ULOGD_FD_WRITE)
+		FD_SET(fd->fd, &writeset);
+
+	if (fd->when & ULOGD_FD_EXCEPT)
+		FD_SET(fd->fd, &exceptset);
+
 	/* Register FD */
 	if (fd->fd > maxfd)
 		maxfd = fd->fd;
@@ -52,44 +62,48 @@
 
 void ulogd_unregister_fd(struct ulogd_fd *fd)
 {
+	if (fd->when & ULOGD_FD_READ)
+		FD_CLR(fd->fd, &readset);
+
+	if (fd->when & ULOGD_FD_WRITE)
+		FD_CLR(fd->fd, &writeset);
+
+	if (fd->when & ULOGD_FD_EXCEPT)
+		FD_CLR(fd->fd, &exceptset);
+
 	llist_del(&fd->list);
+
+	/* Improvement: recalculate maxfd iif fd->fd == maxfd */
+	maxfd = -1;
+	llist_for_each_entry(fd, &ulogd_fds, list) {
+		if (fd->fd > maxfd)
+			maxfd = fd->fd;
+	}
 }
 
 int ulogd_select_main(struct timeval *tv)
 {
 	struct ulogd_fd *ufd;
-	fd_set readset, writeset, exceptset;
+	fd_set rds_tmp, wrs_tmp, exs_tmp;
 	int i;
 
-	FD_ZERO(&readset);
-	FD_ZERO(&writeset);
-	FD_ZERO(&exceptset);
+	rds_tmp = readset;
+	wrs_tmp = writeset;
+	exs_tmp = exceptset;
 
-	/* prepare read and write fdsets */
-	llist_for_each_entry(ufd, &ulogd_fds, list) {
-		if (ufd->when & ULOGD_FD_READ)
-			FD_SET(ufd->fd, &readset);
-
-		if (ufd->when & ULOGD_FD_WRITE)
-			FD_SET(ufd->fd, &writeset);
-
-		if (ufd->when & ULOGD_FD_EXCEPT)
-			FD_SET(ufd->fd, &exceptset);
-	}
-
-	i = select(maxfd+1, &readset, &writeset, &exceptset, tv);
+	i = select(maxfd+1, &rds_tmp, &wrs_tmp, &exs_tmp, tv);
 	if (i > 0) {
 		/* call registered callback functions */
 		llist_for_each_entry(ufd, &ulogd_fds, list) {
 			int flags = 0;
 
-			if (FD_ISSET(ufd->fd, &readset))
+			if (FD_ISSET(ufd->fd, &rds_tmp))
 				flags |= ULOGD_FD_READ;
 
-			if (FD_ISSET(ufd->fd, &writeset))
+			if (FD_ISSET(ufd->fd, &wrs_tmp))
 				flags |= ULOGD_FD_WRITE;
 
-			if (FD_ISSET(ufd->fd, &exceptset))
+			if (FD_ISSET(ufd->fd, &exs_tmp))
 				flags |= ULOGD_FD_EXCEPT;
 
 			if (flags)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 05/15] Add signalling subsystem
  2008-02-19 19:38   ` Pablo Neira Ayuso
@ 2008-02-20  8:43     ` Holger Eitzenberger
  2008-02-20 12:20       ` Patrick McHardy
  2008-02-20 12:23       ` Pablo Neira Ayuso
  0 siblings, 2 replies; 31+ messages in thread
From: Holger Eitzenberger @ 2008-02-20  8:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, holger

Pablo Neira Ayuso wrote:

>> This patch adds the concept of synchronous and asynchronous signal
>> handlers to ulogd, where 'synchronous' just means to be synchronous to
>> the underlying IO multiplexer.
> 
> I just committed a patch that reworks the timer framework to make it
> synchronous with select(). Since we don't have asynchronous timers
> anymore, I think that we can just block signaling during file
> descriptors handling since ulogd would receive not often. AFAICS, this
> infrastructure is nice but it was mainly targeted to the annoying SIGALRM.

Hi Pablo,

comparing your patch with the one I provided in my last patch collection 
I can say that your patch is quite larger due to the usage of the 
red-black trees in the timer code.  The number of timers in ulogd is 
depending of your configuration of course, but with my current 
configuration (NFLOG, NFCT, SQLITE3) I have currently three timers, wow.

Note that with your patch you basically remove the possibility for 
plugins to have timers which are asynchronous.  It's therefore less 
flexible for future users.

Also note that libraries such as libevent do it quite similar than 
provided in my patch.

My patch has the intention of providing a flexible infrastructure for 
plugins, which room for future improvements (such as red-black trees if 
there are hundreds of timers).   Some of my later patches I have 
enqueued locally base on those changes, but that's my problem.

You commented on some of those patches, with a quite positive statement 
to my initial post of this patch.  Also, Eric gave a GO on all patches 
despite the last NFCT patch, which I promised to rework for 
compatibility reason and based on your suggestions.

I accept the fact that you apparently like red-black trees and they 
definitely have their use-cases, but looking at typical ulogd 
configurations and numbers of timers in ulogd I can say that red-black 
trees just for timer usage seem to me like overkill.

Since you are in the habit of favoring your own patches against the ones 
I provide, even without giving a chance to modify my patch, I simply 
consider doing a fork of ulogd.  Otherwise I'll loose many of the work 
I've enqueued locally.

Patrick?

  /holger


-- 
Holger Eitzenberger <heitzenberger@astaro.com> | Software Developer
Astaro AG | www.astaro.de | Phone +49-721-25516-246 | Fax -200

Download your free 30 day trial version of the award-winning Astaro
Security Gateway solution here: https://my.astaro.com/download/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 05/15] Add signalling subsystem
  2008-02-20  8:43     ` Holger Eitzenberger
@ 2008-02-20 12:20       ` Patrick McHardy
  2008-02-20 12:23       ` Pablo Neira Ayuso
  1 sibling, 0 replies; 31+ messages in thread
From: Patrick McHardy @ 2008-02-20 12:20 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: Pablo Neira Ayuso, netfilter-devel, holger

Holger Eitzenberger wrote:
> Pablo Neira Ayuso wrote:
> 
>>> This patch adds the concept of synchronous and asynchronous signal
>>> handlers to ulogd, where 'synchronous' just means to be synchronous to
>>> the underlying IO multiplexer.
>>
>> I just committed a patch that reworks the timer framework to make it
>> synchronous with select(). Since we don't have asynchronous timers
>> anymore, I think that we can just block signaling during file
>> descriptors handling since ulogd would receive not often. AFAICS, this
>> infrastructure is nice but it was mainly targeted to the annoying 
>> SIGALRM.
> 
> Hi Pablo,
> 
> comparing your patch with the one I provided in my last patch collection 
> I can say that your patch is quite larger due to the usage of the 
> red-black trees in the timer code.  The number of timers in ulogd is 
> depending of your configuration of course, but with my current 
> configuration (NFLOG, NFCT, SQLITE3) I have currently three timers, wow.
> 
> Note that with your patch you basically remove the possibility for 
> plugins to have timers which are asynchronous.  It's therefore less 
> flexible for future users.
> 
> Also note that libraries such as libevent do it quite similar than 
> provided in my patch.
> 
> My patch has the intention of providing a flexible infrastructure for 
> plugins, which room for future improvements (such as red-black trees if 
> there are hundreds of timers).   Some of my later patches I have 
> enqueued locally base on those changes, but that's my problem.
> 
> You commented on some of those patches, with a quite positive statement 
> to my initial post of this patch.  Also, Eric gave a GO on all patches 
> despite the last NFCT patch, which I promised to rework for 
> compatibility reason and based on your suggestions.
> 
> I accept the fact that you apparently like red-black trees and they 
> definitely have their use-cases, but looking at typical ulogd 
> configurations and numbers of timers in ulogd I can say that red-black 
> trees just for timer usage seem to me like overkill.
> 
> Since you are in the habit of favoring your own patches against the ones 
> I provide, even without giving a chance to modify my patch, I simply 
> consider doing a fork of ulogd.  Otherwise I'll loose many of the work 
> I've enqueued locally.
> 
> Patrick?

I'm not sure I understand the problem fully, looking at both patches
it seems to me that Pablo's patch could be extended to provide similar
functionality to yours without much effort.

I agree that simply committing replacement patches without discussion
when its known that further work depends on a patch is not a nice way
of working together. I did not see any discussion related to this
patch except the minor complaint about polling once per second in
the first posting. Could someone explain the problem here please?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [ULOGD 05/15] Add signalling subsystem
  2008-02-20  8:43     ` Holger Eitzenberger
  2008-02-20 12:20       ` Patrick McHardy
@ 2008-02-20 12:23       ` Pablo Neira Ayuso
  1 sibling, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2008-02-20 12:23 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: netfilter-devel, holger

Holger Eitzenberger wrote:
> comparing your patch with the one I provided in my last patch collection
> I can say that your patch is quite larger due to the usage of the
> red-black trees in the timer code.  The number of timers in ulogd is
> depending of your configuration of course, but with my current
> configuration (NFLOG, NFCT, SQLITE3) I have currently three timers, wow.
> 
> Note that with your patch you basically remove the possibility for
> plugins to have timers which are asynchronous.  It's therefore less
> flexible for future users.

Please, could you tell me why my timer approach is less flexible?

> Also note that libraries such as libevent do it quite similar than
> provided in my patch.

Indeed. This was the intention of my patch, to make ulogd event-driven
and so timers synchronous using select() which simplifies the timer
infrastructure instead of using SIGALRM.

> My patch has the intention of providing a flexible infrastructure for
> plugins, which room for future improvements (such as red-black trees if
> there are hundreds of timers).   Some of my later patches I have
> enqueued locally base on those changes, but that's my problem.
> 
> You commented on some of those patches, with a quite positive statement
> to my initial post of this patch.  Also, Eric gave a GO on all patches
> despite the last NFCT patch, which I promised to rework for
> compatibility reason and based on your suggestions.

I have barely touched NFCT. I'm still waiting for your NFCT patch.

> I accept the fact that you apparently like red-black trees and they
> definitely have their use-cases, but looking at typical ulogd
> configurations and numbers of timers in ulogd I can say that red-black
> trees just for timer usage seem to me like overkill.

We can easily implement list-based timers using the same API. You only
have to use the current timer API in your patches.

> Since you are in the habit of favoring your own patches against the ones
> I provide, even without giving a chance to modify my patch, I simply
> consider doing a fork of ulogd.

You still have the chance to modify your patches. I'm not disregarding
your work.

> Otherwise I'll loose many of the work I've enqueued locally.

I have kept lots of patches locally in my whole life that I had rework
lots of times to get them into mainline...

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2008-02-20 12:23 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-02 20:48 [ULOGD 00/15] ulogd V2 improvements, round 2 heitzenberger
2008-02-02 20:48 ` [ULOGD 01/15] Add NACCT output plugin heitzenberger
2008-02-02 21:24   ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 02/15] common.h: added heitzenberger
2008-02-02 21:30   ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 03/15] Replace timer code by working version heitzenberger
2008-02-02 22:45   ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 04/15] Add IFI list heitzenberger
2008-02-02 21:36   ` Pablo Neira Ayuso
2008-02-02 21:50     ` Holger Eitzenberger
2008-02-02 22:56       ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 05/15] Add signalling subsystem heitzenberger
2008-02-19 19:38   ` Pablo Neira Ayuso
2008-02-20  8:43     ` Holger Eitzenberger
2008-02-20 12:20       ` Patrick McHardy
2008-02-20 12:23       ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 06/15] Conffile cleanup, use common pr_debug() heitzenberger
2008-02-02 21:43   ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 07/15] Renice to -1 on startup heitzenberger
2008-02-02 21:47   ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 08/15] Initial round to make plugins reconfigurable heitzenberger
2008-02-02 20:48 ` [ULOGD 09/15] llist: add llist_for_each_prev_safe() heitzenberger
2008-02-02 20:48 ` [ULOGD 10/15] Improve select performance heitzenberger
2008-02-19 19:58   ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 11/15] Add set_sockbuf_len() heitzenberger
2008-02-19 19:57   ` Pablo Neira Ayuso
2008-02-02 20:48 ` [ULOGD 12/15] Introduce global state, skip some stacks during reconfiguration heitzenberger
2008-02-02 20:48 ` [ULOGD 13/15] llist: turn poisoning off by default heitzenberger
2008-02-02 20:48 ` [ULOGD 14/15] SQLITE3: port to ulogd 2.00, mostly a rewrite heitzenberger
2008-02-02 20:48 ` [ULOGD 15/15] NFCT: rework and let it scale heitzenberger
2008-02-02 22:52 ` [ULOGD 00/15] ulogd V2 improvements, round 2 Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).