* [PATCH net-2.6.26] netlink: make socket filters work on netlink
@ 2008-03-21 18:05 Stephen Hemminger
2008-03-21 22:47 ` David Miller
2008-03-26 20:19 ` Patrick McHardy
0 siblings, 2 replies; 25+ messages in thread
From: Stephen Hemminger @ 2008-03-21 18:05 UTC (permalink / raw)
To: David Miller, Jamal; +Cc: netdev
Make socket filters work for netlink unicast and notifications.
This is useful for applications like Zebra that get overrun with
messages that are then ignored.
Note: netlink messages are in host byte order, but packet filter
state machine operations are done as network byte order.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/netlink/af_netlink.c 2008-03-21 09:35:54.000000000 -0700
+++ b/net/netlink/af_netlink.c 2008-03-21 11:00:51.000000000 -0700
@@ -886,6 +886,13 @@ retry:
if (netlink_is_kernel(sk))
return netlink_unicast_kernel(sk, skb);
+ if (sk_filter(sk, skb)) {
+ int err = skb->len;
+ kfree_skb(skb);
+ sock_put(sk);
+ return err;
+ }
+
err = netlink_attachskb(sk, skb, nonblock, &timeo, ssk);
if (err == 1)
goto retry;
@@ -980,6 +987,9 @@ static inline int do_one_broadcast(struc
netlink_overrun(sk);
/* Clone failed. Notify ALL listeners. */
p->failure = 1;
+ } else if (sk_filter(sk, p->skb2)) {
+ kfree_skb(p->skb2);
+ p->skb2 = NULL;
} else if ((val = netlink_broadcast_deliver(sk, p->skb2)) < 0) {
netlink_overrun(sk);
} else {
@@ -1533,8 +1543,13 @@ static int netlink_dump(struct sock *sk)
if (len > 0) {
mutex_unlock(nlk->cb_mutex);
- skb_queue_tail(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, len);
+
+ if (sk_filter(sk, skb))
+ kfree_skb(skb);
+ else {
+ skb_queue_tail(&sk->sk_receive_queue, skb);
+ sk->sk_data_ready(sk, skb->len);
+ }
return 0;
}
@@ -1544,8 +1559,12 @@ static int netlink_dump(struct sock *sk)
memcpy(nlmsg_data(nlh), &len, sizeof(len));
- skb_queue_tail(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
+ if (sk_filter(sk, skb))
+ kfree_skb(skb);
+ else {
+ skb_queue_tail(&sk->sk_receive_queue, skb);
+ sk->sk_data_ready(sk, skb->len);
+ }
if (cb->done)
cb->done(cb);
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-21 18:05 [PATCH net-2.6.26] netlink: make socket filters work on netlink Stephen Hemminger
@ 2008-03-21 22:47 ` David Miller
2008-03-26 20:19 ` Patrick McHardy
1 sibling, 0 replies; 25+ messages in thread
From: David Miller @ 2008-03-21 22:47 UTC (permalink / raw)
To: shemminger; +Cc: hadi, netdev
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 21 Mar 2008 11:05:15 -0700
> Make socket filters work for netlink unicast and notifications.
> This is useful for applications like Zebra that get overrun with
> messages that are then ignored.
>
> Note: netlink messages are in host byte order, but packet filter
> state machine operations are done as network byte order.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Applied, thanks.
Although we will have to, at some point, address the
issue of how to transfer netlink on-the-wire portably
and how that interacts with the existing userland
interfaces.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-21 18:05 [PATCH net-2.6.26] netlink: make socket filters work on netlink Stephen Hemminger
2008-03-21 22:47 ` David Miller
@ 2008-03-26 20:19 ` Patrick McHardy
2008-03-31 19:33 ` Stephen Hemminger
1 sibling, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2008-03-26 20:19 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, Jamal, netdev
[-- Attachment #1: Type: text/plain, Size: 617 bytes --]
Stephen Hemminger wrote:
> Make socket filters work for netlink unicast and notifications.
> This is useful for applications like Zebra that get overrun with
> messages that are then ignored.
>
> Note: netlink messages are in host byte order, but packet filter
> state machine operations are done as network byte order.
Do you have an example for a filter for this? I have a similar
patch that adds a new filter instruction for parsing netlink
attributes, which seemed necessary for getting at nested
attributes without too much trouble.
Attached for reference together with a libnl testing
patch for ctnetlink.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1908 bytes --]
diff --git a/include/linux/filter.h b/include/linux/filter.h
index ddfa037..0e39016 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -136,7 +136,8 @@ static inline unsigned int sk_filter_len(struct sk_filter *fp)
#define SKF_AD_PROTOCOL 0
#define SKF_AD_PKTTYPE 4
#define SKF_AD_IFINDEX 8
-#define SKF_AD_MAX 12
+#define SKF_AD_NLATTR 12
+#define SKF_AD_MAX 16
#define SKF_NET_OFF (-0x100000)
#define SKF_LL_OFF (-0x200000)
diff --git a/net/core/filter.c b/net/core/filter.c
index e0a0694..20ed056 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -27,6 +27,7 @@
#include <linux/if_packet.h>
#include <net/ip.h>
#include <net/protocol.h>
+#include <net/netlink.h>
#include <linux/skbuff.h>
#include <net/sock.h>
#include <linux/errno.h>
@@ -268,6 +269,22 @@ load_b:
case SKF_AD_IFINDEX:
A = skb->dev->ifindex;
continue;
+ case SKF_AD_NLATTR: {
+ struct nlattr *nla;
+
+ if (skb_is_nonlinear(skb))
+ return 0;
+ if (A > skb->len - sizeof(struct nlattr))
+ return 0;
+
+ nla = nla_find((struct nlattr *)&skb->data[A],
+ skb->len - A, X);
+ if (nla)
+ A = (void *)nla - (void *)skb->data;
+ else
+ A = 0;
+ continue;
+ }
default:
return 0;
}
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 524e826..6f68f2b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -919,6 +919,17 @@ static inline int netlink_broadcast_deliver(struct sock *sk,
struct sk_buff *skb)
{
struct netlink_sock *nlk = nlk_sk(sk);
+ struct sk_filter *filter;
+ unsigned int len = skb->len;
+
+ rcu_read_lock_bh();
+ filter = rcu_dereference(sk->sk_filter);
+ if (filter)
+ len = sk_run_filter(skb, filter->insns, filter->len);
+ rcu_read_unlock_bh();
+
+ if (len == 0)
+ return 0;
if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
!test_bit(0, &nlk->state)) {
[-- Attachment #3: x2 --]
[-- Type: text/plain, Size: 3021 bytes --]
diff --git a/src/nf-monitor.c b/src/nf-monitor.c
index 2bc58c9..fad5100 100644
--- a/src/nf-monitor.c
+++ b/src/nf-monitor.c
@@ -13,6 +13,9 @@
#include "utils.h"
#include <netlink/netfilter/nfnl.h>
+#include <linux/netfilter/nfnetlink_conntrack.h>
+#include <linux/netfilter/nf_conntrack_tcp.h>
+#include <linux/filter.h>
static void obj_input(struct nl_object *obj, void *arg)
{
@@ -34,6 +37,113 @@ static int event_input(struct nl_msg *msg, void *arg)
return NL_STOP;
}
+#define SKF_AD_NLATTR 12
+
+static int sk_set_filter(int fd)
+{
+ struct sock_filter filter[] = {
+ {
+ /* A = sizeof(struct nlmsghdr) + sizeof(struct nfgenmsg) */
+ .code = BPF_LD|BPF_IMM,
+ .k = sizeof(struct nlmsghdr) + sizeof(struct nfgenmsg),
+ },
+ {
+ /* X = CTA_PROTOINFO */
+ .code = BPF_LDX|BPF_IMM,
+ .k = CTA_PROTOINFO,
+ },
+ {
+ /* A = netlink attribute offset */
+ .code = BPF_LD|BPF_B|BPF_ABS,
+ .k = SKF_AD_OFF + SKF_AD_NLATTR,
+ },
+ {
+ /* Reject if not found (A == 0) */
+ .code = BPF_JMP|BPF_JEQ|BPF_K,
+ .k = 0,
+ .jt = 20 - 3 - 1,
+ },
+
+ {
+ /* A += sizeof(struct nlattr) */
+ .code = BPF_ALU|BPF_ADD|BPF_K,
+ .k = sizeof(struct nlattr),
+ },
+ {
+ /* X = CTA_PROTOINFO_TCP */
+ .code = BPF_LDX|BPF_IMM,
+ .k = CTA_PROTOINFO_TCP,
+ },
+ {
+ /* A = netlink attribute offset */
+ .code = BPF_LD|BPF_B|BPF_ABS,
+ .k = SKF_AD_OFF + SKF_AD_NLATTR,
+ },
+ {
+ /* Reject if not found (A == 0) */
+ .code = BPF_JMP|BPF_JEQ|BPF_K,
+ .k = 0,
+ .jt = 20 - 7 - 1,
+ },
+
+ {
+ /* A += sizeof(struct nlattr) */
+ .code = BPF_ALU|BPF_ADD|BPF_K,
+ .k = sizeof(struct nlattr),
+ },
+ {
+ /* X = CTA_PROTOINFO_TCP_STATE */
+ .code = BPF_LDX|BPF_IMM,
+ .k = CTA_PROTOINFO_TCP_STATE,
+ },
+ {
+ /* A = netlink attribute offset */
+ .code = BPF_LD|BPF_B|BPF_ABS,
+ .k = SKF_AD_OFF + SKF_AD_NLATTR,
+ },
+ {
+ /* Reject if not found (A == 0) */
+ .code = BPF_JMP|BPF_JEQ|BPF_K,
+ .k = 0,
+ .jt = 20 - 11 - 1,
+ },
+
+ {
+ /* X = A */
+ .code = BPF_MISC|BPF_TAX,
+ },
+ {
+ /* A = skb->data[X + k] */
+ .code = BPF_LD|BPF_B|BPF_IND,
+ .k = sizeof(struct nlattr),
+ },
+ {
+ /* Reject if A != TCA_CONNTRACK_ESTABLISHED */
+ .code = BPF_JMP|BPF_JEQ|BPF_K,
+ .k = TCP_CONNTRACK_ESTABLISHED,
+ .jf = 20 - 14 - 1,
+ },
+
+ {
+ /* Accept */
+ .code = BPF_RET|BPF_K,
+ .k = 1,
+ },
+ [20] = {
+ /* Reject */
+ .code = BPF_RET|BPF_K,
+ .k = 0,
+ },
+ };
+ struct sock_fprog fprog = {
+ .len = sizeof(filter) / sizeof(filter[0]),
+ .filter = filter,
+ };
+
+ return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
+ &fprog, sizeof(fprog));
+}
+
int main(int argc, char *argv[])
{
struct nl_handle *nlh;
@@ -92,6 +202,11 @@ int main(int argc, char *argv[])
fprintf(stderr, "Warning: Unknown group: %s\n", argv[idx]);
}
+ if (sk_set_filter(nl_socket_get_fd(nlh)) < 0) {
+ perror("setsockopt(SO_ATTACH_FILTER)");
+ goto errout;
+ }
+
while (1) {
fd_set rfds;
int fd, retval;
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-26 20:19 ` Patrick McHardy
@ 2008-03-31 19:33 ` Stephen Hemminger
2008-03-31 19:40 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: Stephen Hemminger @ 2008-03-31 19:33 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, Jamal, netdev
[-- Attachment #1: Type: text/plain, Size: 972 bytes --]
On Wed, 26 Mar 2008 21:19:56 +0100
Patrick McHardy <kaber@trash.net> wrote:
> Stephen Hemminger wrote:
> > Make socket filters work for netlink unicast and notifications.
> > This is useful for applications like Zebra that get overrun with
> > messages that are then ignored.
> >
> > Note: netlink messages are in host byte order, but packet filter
> > state machine operations are done as network byte order.
>
>
> Do you have an example for a filter for this? I have a similar
> patch that adds a new filter instruction for parsing netlink
> attributes, which seemed necessary for getting at nested
> attributes without too much trouble.
>
> Attached for reference together with a libnl testing
> patch for ctnetlink.
>
Here is the example program:
it uses netlink IPC and has one thread send route notifications
and the other filters.
to test the mulitcast path used a hacked version of ip_monitor from iproute
see attachment for the quagga patch.
[-- Attachment #2: netlink-ipc.c --]
[-- Type: text/x-csrc, Size: 4831 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/filter.h>
#include <asm/byteorder.h>
static inline int nlmsg_msg_size(int payload)
{
return NLMSG_HDRLEN + payload;
}
static inline void *nlmsg_data(const struct nlmsghdr *nlh)
{
return (unsigned char *) nlh + NLMSG_HDRLEN;
}
static inline int nlmsg_total_size(int payload)
{
return NLMSG_ALIGN(nlmsg_msg_size(payload));
}
static inline int nlmsg_padlen(int payload)
{
return nlmsg_total_size(payload) - nlmsg_msg_size(payload);
}
static struct nlmsghdr *nlmsg_put(void *buf, uint32_t pid, uint32_t seq,
int type, int payload, int flags)
{
struct nlmsghdr *nlh;
nlh = (struct nlmsghdr *) buf;
nlh->nlmsg_type = type;
nlh->nlmsg_len = nlmsg_msg_size(payload);
nlh->nlmsg_flags = flags;
nlh->nlmsg_pid = pid;
nlh->nlmsg_seq = seq;
memset((unsigned char *) nlmsg_data(nlh) + payload, 0,
nlmsg_padlen(payload));
return nlh;
}
static int fill_info(unsigned char *buf, uint32_t pid, uint32_t seq, int event,
int flags, unsigned proto)
{
struct rtmsg *r;
struct nlmsghdr *nlh;
nlh = nlmsg_put(buf, pid, seq, event, sizeof(*r), 0);
r = nlmsg_data(nlh);
r->rtm_family = AF_INET;
r->rtm_dst_len = 32;
r->rtm_src_len = 0;
r->rtm_tos = 0;
r->rtm_table = RT_TABLE_MAIN;
r->rtm_type = 0;
r->rtm_scope = RT_SCOPE_UNIVERSE;
r->rtm_protocol = RTPROT_UNSPEC;
r->rtm_flags = flags;
return (void *) (r+1) - (void *) buf;
}
static const char *nltypes[] = {
[RTM_NEWROUTE] = "RTM_NEWROUTE",
[RTM_DELROUTE] = "RTM_DELROUTE",
[RTM_GETROUTE] = "RTM_GETROUTE",
[RTM_NEWLINK] = "RTM_NEWLINK",
[RTM_DELLINK] = "RTM_DELLINK",
[RTM_GETLINK] = "RTM_GETLINK",
[RTM_NEWADDR] = "RTM_NEWADDR",
[RTM_DELADDR] = "RTM_DELADDR",
[RTM_GETADDR] = "RTM_GETADDR",
};
static struct sock_filter filter[] = {
BPF_STMT(BPF_LD|BPF_ABS|BPF_H, 4), /* 0: ldh [4] */
BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __constant_htons(RTM_NEWROUTE), 0, 3),
/* 1: jeq 0x18 jt 2 jf 5 */
BPF_STMT(BPF_LD|BPF_ABS|BPF_B, 23), /* 2: ldb [23] */
BPF_JUMP(BPF_JMP+ BPF_B, RTPROT_ZEBRA, 2, 0),
/* 3: jeq 0xb jt 4 jf 5 */
BPF_STMT(BPF_RET|BPF_K, 0), /* 4: ret 0 */
BPF_STMT(BPF_RET|BPF_K, 0xffff), /* 5: ret 0xffff */
};
int main(int ac, char **av)
{
int i, sk, cc;
struct sockaddr_nl snl = {
.nl_family = AF_NETLINK,
};
socklen_t snl_len = sizeof snl;
pid_t pid;
unsigned char buf[4096];
struct sock_fprog prog = {
.len = sizeof(filter) / sizeof(filter[0]),
.filter = filter,
};
for (i = 0; i < prog.len; i++ )
printf("%d: %#04x %d %d %#08x\n", i,
filter[i].code, filter[i].jt, filter[i].jf, filter[i].k);
printf("\n");
fflush(stdout);
sk = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_USERSOCK);
if (sk < 0)
perror("socket");
if (bind(sk, (struct sockaddr *) &snl, sizeof snl) < 0)
perror("bind");
if (getsockname(sk, (struct sockaddr *) &snl, &snl_len) < 0)
perror("getsockname");
if (setsockopt(sk, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0)
perror("setsockopt attach filte");
pid = fork();
if (pid == 0) {
struct nlmsghdr *nlh;
do {
struct rtmsg *r;
cc = recv(sk, &buf, sizeof buf, 0);
if (cc < 0)
perror("recv");
for (i = 0; i < 64 && i < cc; i++)
printf("%02x%c", buf[i],
(i % 16) == 15 ? '\n' : ' ');
nlh = (struct nlmsghdr *) buf;
printf("\n\tlen=%d type=%s flags=%#x seq=%#x pid=%d\n",
nlh->nlmsg_len, nltypes[nlh->nlmsg_type],
nlh->nlmsg_flags, nlh->nlmsg_seq, nlh->nlmsg_pid);
r = NLMSG_DATA(nlh);
printf("\tfamily=%d table=%d proto=%d type=%#x flags=%#x\n\n",
r->rtm_family, r->rtm_table, r->rtm_protocol,
r->rtm_type, r->rtm_flags);
} while (nlh->nlmsg_type != RTM_DELLINK);
exit(0);
} else {
cc = fill_info(buf, 0, 1, RTM_NEWLINK, 0, RTPROT_UNSPEC);
if (sendto(sk, buf, cc, 0,
(struct sockaddr *) &snl, snl_len) < 0)
perror("sendto");
cc = fill_info(buf, 0, 1, RTM_NEWROUTE, RTM_F_CLONED, RTPROT_UNSPEC);
if (sendto(sk, buf, cc, 0,
(struct sockaddr *) &snl, snl_len) < 0)
perror("sendto");
cc = fill_info(buf, 0, 1, RTM_NEWROUTE, 0, RTPROT_ZEBRA);
if (sendto(sk, buf, cc, 0,
(struct sockaddr *) &snl, snl_len) < 0)
perror("sendto");
cc = fill_info(buf, 0, 1, RTM_NEWROUTE, 0, RTPROT_UNSPEC);
if (sendto(sk, buf, cc, 0,
(struct sockaddr *) &snl, snl_len) < 0)
perror("sendto");
cc = fill_info(buf, 0, 1, RTM_DELLINK, 0, RTPROT_UNSPEC);
if (sendto(sk, buf, cc, 0,
(struct sockaddr *) &snl, snl_len) < 0)
perror("sendto");
waitpid(pid, NULL, 0);
}
return 0;
}
[-- Attachment #3: zebra-filter.patch --]
[-- Type: text/x-patch, Size: 2767 bytes --]
>From 2bc693173112ea436884aca19352624504e2246a Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <stephen.hemminger@vyatta.com>
Date: Mon, 31 Mar 2008 12:23:25 -0700
Subject: [PATCH] Filter unwanted netlink messages
Use socket filter to drop unwanted messages on the netlink listen socket.
This prevents problems where the listener socket buffer gets overrruns
with echos of the new route update that occurs when link changes.
---
lib/zebra.h | 1 +
zebra/rt_netlink.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 33 insertions(+), 1 deletions(-)
diff --git a/lib/zebra.h b/lib/zebra.h
index 150aa2c..d4f68cf 100644
--- a/lib/zebra.h
+++ b/lib/zebra.h
@@ -162,6 +162,7 @@ typedef int socklen_t;
#ifdef HAVE_NETLINK
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
+#include <linux/filter.h>
#else
#define RT_TABLE_MAIN 0
#endif /* HAVE_NETLINK */
diff --git a/zebra/rt_netlink.c b/zebra/rt_netlink.c
index f071eb2..07e473b 100644
--- a/zebra/rt_netlink.c
+++ b/zebra/rt_netlink.c
@@ -41,6 +41,7 @@
#include "zebra/redistribute.h"
#include "zebra/interface.h"
#include "zebra/debug.h"
+#include <stddef.h>
/* Socket interface to kernel */
struct nlsock
@@ -1938,6 +1939,33 @@ kernel_read (struct thread *thread)
return 0;
}
+/* Filter out messages from self that occur on listener socket */
+static void netlink_install_filter (int sock)
+{
+ /* BPF code to exclude all RTM_NEWROUTE messages from ZEBRA */
+ struct sock_filter filter[] = {
+ BPF_STMT(BPF_LD|BPF_ABS|BPF_H, offsetof(struct nlmsghdr, nlmsg_type)),
+ /* 0: ldh [4] */
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, htons(RTM_NEWROUTE), 0, 3),
+ /* 1: jeq 0x18 jt 2 jf 5 */
+ BPF_STMT(BPF_LD|BPF_ABS|BPF_B,
+ sizeof(struct nlmsghdr) + offsetof(struct rtmsg, rtm_protocol)),
+ /* 2: ldb [23] */
+ BPF_JUMP(BPF_JMP+ BPF_B, RTPROT_ZEBRA, 2, 0),
+ /* 3: jeq 0xb jt 4 jf 5 */
+ BPF_STMT(BPF_RET|BPF_K, 0), /* 4: ret 0 */
+ BPF_STMT(BPF_RET|BPF_K, 0xffff), /* 5: ret 0xffff */
+ };
+
+ struct sock_fprog prog = {
+ .len = sizeof(filter) / sizeof(filter[0]),
+ .filter = filter,
+ };
+
+ if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0)
+ zlog_warn ("Can't install socket filter: %s\n", safe_strerror(errno));
+}
+
/* Exported interface function. This function simply calls
netlink_socket (). */
void
@@ -1954,5 +1982,8 @@ kernel_init (void)
/* Register kernel socket. */
if (netlink.sock > 0)
- thread_add_read (zebrad.master, kernel_read, NULL, netlink.sock);
+ {
+ netlink_install_filter (netlink.sock);
+ thread_add_read (zebrad.master, kernel_read, NULL, netlink.sock);
+ }
}
--
1.5.4.3
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-31 19:33 ` Stephen Hemminger
@ 2008-03-31 19:40 ` Patrick McHardy
2008-03-31 19:46 ` Stephen Hemminger
2008-03-31 20:07 ` David Miller
0 siblings, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2008-03-31 19:40 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, Jamal, netdev
Stephen Hemminger wrote:
> On Wed, 26 Mar 2008 21:19:56 +0100
> Patrick McHardy <kaber@trash.net> wrote:
>
>> Stephen Hemminger wrote:
>>> Make socket filters work for netlink unicast and notifications.
>>> This is useful for applications like Zebra that get overrun with
>>> messages that are then ignored.
>>>
>>> Note: netlink messages are in host byte order, but packet filter
>>> state machine operations are done as network byte order.
>>
>> Do you have an example for a filter for this? I have a similar
>> patch that adds a new filter instruction for parsing netlink
>> attributes, which seemed necessary for getting at nested
>> attributes without too much trouble.
>>
>> Attached for reference together with a libnl testing
>> patch for ctnetlink.
>>
>
> Here is the example program:
> it uses netlink IPC and has one thread send route notifications
> and the other filters.
>
> to test the mulitcast path used a hacked version of ip_monitor from iproute
>
> see attachment for the quagga patch.
Thanks. It seems it parses only top-level attributes, which
is probably why you didn't need the nlattr_find command I
used in my patch. The problem with this is that finding and
parsing nested attributes using the existing BPF commands is
complicated since you need to fully parse netlink headers
and walk through them. You can't even reuse that part for
multiple nested attributes since you can't jump backwards.
So I think it would be preferrable to have a simpler method
for this.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-31 19:40 ` Patrick McHardy
@ 2008-03-31 19:46 ` Stephen Hemminger
2008-03-31 20:07 ` David Miller
1 sibling, 0 replies; 25+ messages in thread
From: Stephen Hemminger @ 2008-03-31 19:46 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, Jamal, netdev
On Mon, 31 Mar 2008 21:40:51 +0200
Patrick McHardy <kaber@trash.net> wrote:
> Stephen Hemminger wrote:
> > On Wed, 26 Mar 2008 21:19:56 +0100
> > Patrick McHardy <kaber@trash.net> wrote:
> >
> >> Stephen Hemminger wrote:
> >>> Make socket filters work for netlink unicast and notifications.
> >>> This is useful for applications like Zebra that get overrun with
> >>> messages that are then ignored.
> >>>
> >>> Note: netlink messages are in host byte order, but packet filter
> >>> state machine operations are done as network byte order.
> >>
> >> Do you have an example for a filter for this? I have a similar
> >> patch that adds a new filter instruction for parsing netlink
> >> attributes, which seemed necessary for getting at nested
> >> attributes without too much trouble.
> >>
> >> Attached for reference together with a libnl testing
> >> patch for ctnetlink.
> >>
> >
> > Here is the example program:
> > it uses netlink IPC and has one thread send route notifications
> > and the other filters.
> >
> > to test the mulitcast path used a hacked version of ip_monitor from iproute
> >
> > see attachment for the quagga patch.
>
>
> Thanks. It seems it parses only top-level attributes, which
> is probably why you didn't need the nlattr_find command I
> used in my patch. The problem with this is that finding and
> parsing nested attributes using the existing BPF commands is
> complicated since you need to fully parse netlink headers
> and walk through them. You can't even reuse that part for
> multiple nested attributes since you can't jump backwards.
> So I think it would be preferrable to have a simpler method
> for this.
Agreed, it isn't a general solution but it is useful as is
to filter out the cruft.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-31 19:40 ` Patrick McHardy
2008-03-31 19:46 ` Stephen Hemminger
@ 2008-03-31 20:07 ` David Miller
2008-03-31 20:15 ` Patrick McHardy
1 sibling, 1 reply; 25+ messages in thread
From: David Miller @ 2008-03-31 20:07 UTC (permalink / raw)
To: kaber; +Cc: shemminger, hadi, netdev
From: Patrick McHardy <kaber@trash.net>
Date: Mon, 31 Mar 2008 21:40:51 +0200
> Thanks. It seems it parses only top-level attributes, which
> is probably why you didn't need the nlattr_find command I
> used in my patch. The problem with this is that finding and
> parsing nested attributes using the existing BPF commands is
> complicated since you need to fully parse netlink headers
> and walk through them. You can't even reuse that part for
> multiple nested attributes since you can't jump backwards.
> So I think it would be preferrable to have a simpler method
> for this.
Agreed.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-31 20:07 ` David Miller
@ 2008-03-31 20:15 ` Patrick McHardy
2008-03-31 21:49 ` jamal
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2008-03-31 20:15 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, hadi, netdev
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Mon, 31 Mar 2008 21:40:51 +0200
>
>> Thanks. It seems it parses only top-level attributes, which
>> is probably why you didn't need the nlattr_find command I
>> used in my patch. The problem with this is that finding and
>> parsing nested attributes using the existing BPF commands is
>> complicated since you need to fully parse netlink headers
>> and walk through them. You can't even reuse that part for
>> multiple nested attributes since you can't jump backwards.
>> So I think it would be preferrable to have a simpler method
>> for this.
>
> Agreed.
I'll cook something up based on my and Stephen's patches.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-31 20:15 ` Patrick McHardy
@ 2008-03-31 21:49 ` jamal
2008-04-01 11:52 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: jamal @ 2008-03-31 21:49 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, shemminger, netdev
On Mon, 2008-31-03 at 22:15 +0200, Patrick McHardy wrote:
> I'll cook something up based on my and Stephen's patches.
Nice - now we can move all the filtering off iproute2;->
My view is that the nature of most stuff to filter on _events_ will
depend on something along a "whodunnit" field. Route messages have a
rudimentary form of such a field (the "protocol" field on the quagga
patch Stephen posted for example uses it). It would be really nice to be
able to generate such a field on all events to emit say the processid of
the process which caused the event. Ive been thinking of adding a
permanent field like that on the actions/filters. What are peoples
thoughts on that?
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-03-31 21:49 ` jamal
@ 2008-04-01 11:52 ` Patrick McHardy
2008-04-01 14:04 ` jamal
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2008-04-01 11:52 UTC (permalink / raw)
To: hadi; +Cc: David Miller, shemminger, netdev
jamal wrote:
> On Mon, 2008-31-03 at 22:15 +0200, Patrick McHardy wrote:
>
>> I'll cook something up based on my and Stephen's patches.
>
> Nice - now we can move all the filtering off iproute2;->
>
> My view is that the nature of most stuff to filter on _events_ will
> depend on something along a "whodunnit" field. Route messages have a
> rudimentary form of such a field (the "protocol" field on the quagga
> patch Stephen posted for example uses it). It would be really nice to be
> able to generate such a field on all events to emit say the processid of
> the process which caused the event. Ive been thinking of adding a
> permanent field like that on the actions/filters. What are peoples
> thoughts on that?
Isn't that what nlmsg_pid already contains?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-01 11:52 ` Patrick McHardy
@ 2008-04-01 14:04 ` jamal
2008-04-02 10:00 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: jamal @ 2008-04-01 14:04 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, shemminger, netdev
On Tue, 2008-01-04 at 13:52 +0200, Patrick McHardy wrote:
> Isn't that what nlmsg_pid already contains?
would work only on the case of user<->user.
On kernel->user it is supposed to be owned by the kernel and set to 0.
OTOH, one could use the nlmsg seq as a "cookie" (since that is untouched
by the kernel) with the hope that no other process will use that same
cookie - then you can filter events based on the cookie. Note this is
only useful if you can guarantee that all processes running on a system
guarantee such uniqueness of such cookies amongst each other; which is
typically a bad assumption for a generic solution but would work.
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-01 14:04 ` jamal
@ 2008-04-02 10:00 ` Patrick McHardy
2008-04-02 11:21 ` Thomas Graf
2008-04-02 11:42 ` jamal
0 siblings, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2008-04-02 10:00 UTC (permalink / raw)
To: hadi; +Cc: David Miller, shemminger, netdev
jamal wrote:
> On Tue, 2008-01-04 at 13:52 +0200, Patrick McHardy wrote:
>
>> Isn't that what nlmsg_pid already contains?
>
> would work only on the case of user<->user.
> On kernel->user it is supposed to be owned by the kernel and set to 0.
No, in the case of events its supposed to be set to the pid of the
socket that caused the event. Check out qdisc_notify() or rtmsg_ifa()
for example.
> OTOH, one could use the nlmsg seq as a "cookie" (since that is untouched
> by the kernel) with the hope that no other process will use that same
> cookie - then you can filter events based on the cookie. Note this is
> only useful if you can guarantee that all processes running on a system
> guarantee such uniqueness of such cookies amongst each other; which is
> typically a bad assumption for a generic solution but would work.
nlmsg_seq is already used by userspace to match responses to requests,
so that probably wouldn't work very well.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 10:00 ` Patrick McHardy
@ 2008-04-02 11:21 ` Thomas Graf
2008-04-02 12:01 ` jamal
2008-04-02 12:03 ` Patrick McHardy
2008-04-02 11:42 ` jamal
1 sibling, 2 replies; 25+ messages in thread
From: Thomas Graf @ 2008-04-02 11:21 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, David Miller, shemminger, netdev
* Patrick McHardy <kaber@trash.net> 2008-04-02 12:00
> jamal wrote:
> >On Tue, 2008-01-04 at 13:52 +0200, Patrick McHardy wrote:
> >
> >>Isn't that what nlmsg_pid already contains?
> >
> >would work only on the case of user<->user.
> >On kernel->user it is supposed to be owned by the kernel and set to 0.
>
>
> No, in the case of events its supposed to be set to the pid of the
> socket that caused the event. Check out qdisc_notify() or rtmsg_ifa()
> for example.
Unfortunately in many cases the pid is also set to 0 because the
information is lost when carrying the event using the notifier
interface. Probably fixable though.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 10:00 ` Patrick McHardy
2008-04-02 11:21 ` Thomas Graf
@ 2008-04-02 11:42 ` jamal
2008-04-02 12:07 ` Patrick McHardy
1 sibling, 1 reply; 25+ messages in thread
From: jamal @ 2008-04-02 11:42 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, shemminger, netdev
On Wed, 2008-02-04 at 12:00 +0200, Patrick McHardy wrote:
> No, in the case of events its supposed to be set to the pid of the
> socket that caused the event. Check out qdisc_notify() or rtmsg_ifa()
> for example.
nod - however, there is inconsistency;
for example if rtmsg_ifa() was in the code path of something invoked by
ioctl, the pid of the event will be 0. Alexey almost chewed my head off
when i tried to change that. His (valid) explanation was, if iirc:
a) that the pid belonged to the source - and 0 means "kernel".
b) The pid could also be some negative number (even in qdisc_notify) if
you have multiple netlink sockets in one process (resolvable if you
getname())
c) what happens when processid goes to > 32-bit?
> nlmsg_seq is already used by userspace to match responses to requests,
> so that probably wouldn't work very well.
True, and thats what made me suggest earlier that if i had two sockets
in my app, one listening and the other sending (the way quagga does for
example), and that the app has a setting which says "all my netlink
messages to the kernel would have sequence 0x1234" - then the listening
socket could be told to set bpf to filter out any events with sequence
0x1234. This overcomes all of #a, #b and #c above but i admit it is
equally weak as using pids because it relies on the fact that only one
app is targeting something in the kernel with sequence 0x1234. You could
do it if you owned the box.
As one step further - route messages "proto" field on the other hand
overcomes that challenge of multiple applications ambiguity. of course
this has other challenges as well (ex: several apps claiming to be zebra
or if you intentionaly want to run multiple quaggas for VRs but wanted
to distinguish who added the route).
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 11:21 ` Thomas Graf
@ 2008-04-02 12:01 ` jamal
2008-04-02 12:09 ` Patrick McHardy
2008-04-02 12:03 ` Patrick McHardy
1 sibling, 1 reply; 25+ messages in thread
From: jamal @ 2008-04-02 12:01 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, David Miller, shemminger, netdev
On Wed, 2008-02-04 at 13:21 +0200, Thomas Graf wrote:
> Probably fixable though.
One way "i fixed it" was removed by Alexey in commit:
28633514afd68afa77ed2fa34fa53626837bf2d5
Note: Alexey is right of course.
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 11:21 ` Thomas Graf
2008-04-02 12:01 ` jamal
@ 2008-04-02 12:03 ` Patrick McHardy
2008-04-02 14:09 ` Thomas Graf
1 sibling, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2008-04-02 12:03 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, David Miller, shemminger, netdev
Thomas Graf wrote:
> * Patrick McHardy <kaber@trash.net> 2008-04-02 12:00
>> jamal wrote:
>>> On Tue, 2008-01-04 at 13:52 +0200, Patrick McHardy wrote:
>>>
>>>> Isn't that what nlmsg_pid already contains?
>>> would work only on the case of user<->user.
>>> On kernel->user it is supposed to be owned by the kernel and set to 0.
>>
>> No, in the case of events its supposed to be set to the pid of the
>> socket that caused the event. Check out qdisc_notify() or rtmsg_ifa()
>> for example.
>
> Unfortunately in many cases the pid is also set to 0 because the
> information is lost when carrying the event using the notifier
> interface. Probably fixable though.
Yes, I remember some cases but couldn't find them anymore.
Tt should be fixed IMO since there's not much value in this
if its used inconsistently.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 11:42 ` jamal
@ 2008-04-02 12:07 ` Patrick McHardy
2008-04-02 14:05 ` Thomas Graf
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2008-04-02 12:07 UTC (permalink / raw)
To: hadi; +Cc: David Miller, shemminger, netdev
jamal wrote:
> On Wed, 2008-02-04 at 12:00 +0200, Patrick McHardy wrote:
>
>> No, in the case of events its supposed to be set to the pid of the
>> socket that caused the event. Check out qdisc_notify() or rtmsg_ifa()
>> for example.
>
> nod - however, there is inconsistency;
> for example if rtmsg_ifa() was in the code path of something invoked by
> ioctl, the pid of the event will be 0. Alexey almost chewed my head off
> when i tried to change that. His (valid) explanation was, if iirc:
> a) that the pid belonged to the source - and 0 means "kernel".
Yes, for ioctl it can't carry anything but zero because it
should contains the netlink socket pid, not the process ID,
which obviously doesn't exist in that case.
> b) The pid could also be some negative number (even in qdisc_notify) if
> you have multiple netlink sockets in one process (resolvable if you
> getname())
True, but that doesn't really matter. You also can't assume that
a positive number matches the process ID.
> c) what happens when processid goes to > 32-bit?
Wouldn't be a problem, its the netlink socket pids, which are 32
bit no matter what.
>
>> nlmsg_seq is already used by userspace to match responses to requests,
>> so that probably wouldn't work very well.
>
> True, and thats what made me suggest earlier that if i had two sockets
> in my app, one listening and the other sending (the way quagga does for
> example), and that the app has a setting which says "all my netlink
> messages to the kernel would have sequence 0x1234" - then the listening
> socket could be told to set bpf to filter out any events with sequence
> 0x1234. This overcomes all of #a, #b and #c above but i admit it is
> equally weak as using pids because it relies on the fact that only one
> app is targeting something in the kernel with sequence 0x1234. You could
> do it if you owned the box.
> As one step further - route messages "proto" field on the other hand
> overcomes that challenge of multiple applications ambiguity. of course
> this has other challenges as well (ex: several apps claiming to be zebra
> or if you intentionaly want to run multiple quaggas for VRs but wanted
> to distinguish who added the route).
In my opinion we should try to set nlmsg_pid properly since thats
whats defined as unique identifier for netlink sockets.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 12:01 ` jamal
@ 2008-04-02 12:09 ` Patrick McHardy
2008-04-02 12:25 ` jamal
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2008-04-02 12:09 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, David Miller, shemminger, netdev
jamal wrote:
> On Wed, 2008-02-04 at 13:21 +0200, Thomas Graf wrote:
>
>> Probably fixable though.
>
> One way "i fixed it" was removed by Alexey in commit:
> 28633514afd68afa77ed2fa34fa53626837bf2d5
>
> Note: Alexey is right of course.
Yes, but it was the use of current->pid that was wrong.
If one of those calls are in a path invoked through netlink
it should set nlmsg_pid.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 12:09 ` Patrick McHardy
@ 2008-04-02 12:25 ` jamal
2008-04-02 12:45 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: jamal @ 2008-04-02 12:25 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Thomas Graf, David Miller, shemminger, netdev
On Wed, 2008-02-04 at 14:09 +0200, Patrick McHardy wrote:
> Yes, but it was the use of current->pid that was wrong.
There are many many apps out there which still use ioctls - hence the
ambiguity of "is it the kernel that generated the command that caused
the event or was it merely a proxy for some app".
You need to resolve that.
> If one of those calls are in a path invoked through netlink
> it should set nlmsg_pid.
Nod - I think thats mostly taken care of; havent looked lately. I know
Alexey didnt object to any patches i submitted that did change how
nlmsg_pid was set on events to match this thought and I cant think of a
reason it would violate any netlink ettiquette.
Note, I find the whoddunit field (not the pid) to be also useful for
aesthetics and debugging other than for the non-ambiguity in the
filtering.
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 12:25 ` jamal
@ 2008-04-02 12:45 ` Patrick McHardy
2008-04-02 13:10 ` jamal
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2008-04-02 12:45 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, David Miller, shemminger, netdev
jamal wrote:
> On Wed, 2008-02-04 at 14:09 +0200, Patrick McHardy wrote:
>
>> Yes, but it was the use of current->pid that was wrong.
>
> There are many many apps out there which still use ioctls - hence the
> ambiguity of "is it the kernel that generated the command that caused
> the event or was it merely a proxy for some app".
> You need to resolve that.
Mhh .. we could use a magic nlmsg_pid value (just like zero)
to indicate it was done on behalf of a process using ioctls or
some other, non-netlink means. I'm wondering how useful this
(or any other "whodunit" identifier) would be for filtering
though, I think you're usually more interested in certain
objects than certain processes, like all routes to 192.168.0.0/16,
no matter who changes them.
>> If one of those calls are in a path invoked through netlink
>> it should set nlmsg_pid.
>
> Nod - I think thats mostly taken care of; havent looked lately. I know
> Alexey didnt object to any patches i submitted that did change how
> nlmsg_pid was set on events to match this thought and I cant think of a
> reason it would violate any netlink ettiquette.
>
> Note, I find the whoddunit field (not the pid) to be also useful for
> aesthetics and debugging other than for the non-ambiguity in the
> filtering.
Unfortunately we can't add a new field to the existing headers
without breaking things, so anything new would likely be subsystem
specific.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 12:45 ` Patrick McHardy
@ 2008-04-02 13:10 ` jamal
2008-04-02 14:28 ` Thomas Graf
0 siblings, 1 reply; 25+ messages in thread
From: jamal @ 2008-04-02 13:10 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Thomas Graf, David Miller, shemminger, netdev
On Wed, 2008-02-04 at 14:45 +0200, Patrick McHardy wrote:
> Mhh .. we could use a magic nlmsg_pid value (just like zero)
> to indicate it was done on behalf of a process using ioctls or
> some other, non-netlink means.
In my experience attempting to write user space apps which worry about
whodunnit and filtering in user space (essentially solving similar
problem to what Stephen is attempting to), this non-nl paths have been
the most headache. Every book written on linux network config
demonstrates ifconfig and route utilities. Of course you could put
restrictions with caveats like "my app works only if you used ip or used
ifconfig version 3 which uses netlink" - but that puts a damp in the
whole concept.
> I'm wondering how useful this
> (or any other "whodunit" identifier) would be for filtering
> though, I think you're usually more interested in certain
> objects than certain processes, like all routes to 192.168.0.0/16,
> no matter who changes them.
I think both are of interest.
Historically in routes for example, the interest would be apps/processes
(not direct mapping to processids) that are of interest (eg a route
added by OSPF vs one added by RIP etc). Filtering is done in user space
so you dont repropagate routes added by the "OSPF process" to RIP if
policy says so etc.
That of course has evolved to application name (gated vs zebra vs bird
etc) which implements all of OSPF/RIP/BGP etc and can keep track of
things. Theres further evolution which desires to run multiple instances
of quagga on the same machine etc.
I dont think the processid is the right thing, but some 32 bit id which
maps to a name would be useful or even a static field defined in some
header file. The name serving could be out of the kernel for simplicity
if the process dies and is re-incarnated it knows what to read(some of
these ideas are already in genetlink).
> Unfortunately we can't add a new field to the existing headers
> without breaking things, so anything new would likely be subsystem
> specific.
Agreed, it is a bit too late to change netlink without breaking apps
(and reason i was suggesting i was going to add it to actions/cls).
Essentially even in subsystem specifics it would be an attribute
(transported via TLV).
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 12:07 ` Patrick McHardy
@ 2008-04-02 14:05 ` Thomas Graf
0 siblings, 0 replies; 25+ messages in thread
From: Thomas Graf @ 2008-04-02 14:05 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, David Miller, shemminger, netdev
* Patrick McHardy <kaber@trash.net> 2008-04-02 14:07
> jamal wrote:
> >a) that the pid belonged to the source - and 0 means "kernel".
I'm using the term netlink port for now, it avoids the confusion
and makes it clear what nlmsg_pid really is.
> Yes, for ioctl it can't carry anything but zero because it
> should contains the netlink socket pid, not the process ID,
> which obviously doesn't exist in that case.
I like the idea of a magic number although I would define it as
originator is userspace but we do not know the netlink port number.
> >b) The pid could also be some negative number (even in qdisc_notify) if
> >you have multiple netlink sockets in one process (resolvable if you
> >getname())
>
> True, but that doesn't really matter. You also can't assume that
> a positive number matches the process ID.
The netlink port is a unsigned 32 bit value, period.
> >c) what happens when processid goes to > 32-bit?
>
> Wouldn't be a problem, its the netlink socket pids, which are 32
> bit no matter what.
Also the process id is not even 32 bit so far which gives libnl the
ability to generate unique netlink ports based on the real process id
plus some iterator value. Applications using libnl which do not
generate the port number themselves could theoretiaclly resolve the real
process id back even if there are multiple sockets per application.
> In my opinion we should try to set nlmsg_pid properly since thats
> whats defined as unique identifier for netlink sockets.
Agreed.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 12:03 ` Patrick McHardy
@ 2008-04-02 14:09 ` Thomas Graf
0 siblings, 0 replies; 25+ messages in thread
From: Thomas Graf @ 2008-04-02 14:09 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, David Miller, shemminger, netdev
* Patrick McHardy <kaber@trash.net> 2008-04-02 14:03
> Yes, I remember some cases but couldn't find them anymore.
> Tt should be fixed IMO since there's not much value in this
> if its used inconsistently.
Absolutely. The most prominent case is link notifications. Although I
think it's only a matter of passing the netlink pid along.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 13:10 ` jamal
@ 2008-04-02 14:28 ` Thomas Graf
2008-04-02 18:12 ` jamal
0 siblings, 1 reply; 25+ messages in thread
From: Thomas Graf @ 2008-04-02 14:28 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, David Miller, shemminger, netdev
* jamal <hadi@cyberus.ca> 2008-04-02 09:10
> I think both are of interest.
> Historically in routes for example, the interest would be apps/processes
> (not direct mapping to processids) that are of interest (eg a route
> added by OSPF vs one added by RIP etc). Filtering is done in user space
> so you dont repropagate routes added by the "OSPF process" to RIP if
> policy says so etc.
> That of course has evolved to application name (gated vs zebra vs bird
> etc) which implements all of OSPF/RIP/BGP etc and can keep track of
> things. Theres further evolution which desires to run multiple instances
> of quagga on the same machine etc.
> I dont think the processid is the right thing, but some 32 bit id which
> maps to a name would be useful or even a static field defined in some
> header file. The name serving could be out of the kernel for simplicity
> if the process dies and is re-incarnated it knows what to read(some of
> these ideas are already in genetlink).
I think this is over the top. The netlink port already provides good
means to retrieve information about the originator of a netlink message.
An additional genl interface which would resolve a netlink port to a
a process id / process name would be very helpful though.
If you want to keep track of netlink processes which have already closed
the socket then a userspace daemon which subscribes to all groups and
keeps a log of netlink port plus eventually message sequence numbers
would be a better solution IMHO. The daemon could be used to log netlink
events in general and even have triggers to invoke commands upon certain
events.
However, I'm positive to the idea of storing the netlink port on a per
object basis. It would be nice to know which application is responsible
for the creation of certain qdiscs without having to listen to events.
We could define the netlink port as follows to cover both netlink and
non netlink applications:
port identifier (10 bits) + process id (22 bits)
A special value for the port identifier is defined (e.g. all zeros) which
specifies that the netlink port is really a ioctl based application. This
way we can encode the process id for both netlink and ioctl, distinguish
between them and support 1023 netlink sockets per application.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-2.6.26] netlink: make socket filters work on netlink
2008-04-02 14:28 ` Thomas Graf
@ 2008-04-02 18:12 ` jamal
0 siblings, 0 replies; 25+ messages in thread
From: jamal @ 2008-04-02 18:12 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, David Miller, shemminger, netdev
On Wed, 2008-02-04 at 16:28 +0200, Thomas Graf wrote:
> I think this is over the top. The netlink port already provides good
> means to retrieve information about the originator of a netlink message.
netlink is easier on the events.
What needs resolving imo is the ioctl or /proc, sysfs etc.
For both cases one needs to worry about the how such fields are set.
> An additional genl interface which would resolve a netlink port to a
> a process id / process name would be very helpful though.
An out of band genl could be used for the name serving. I think it may
make it easier if such name serving was centralized as opposed to per
subsystem - but the object setting would be per subsystem.
> If you want to keep track of netlink processes which have already closed
> the socket then a userspace daemon which subscribes to all groups and
> keeps a log of netlink port plus eventually message sequence numbers
> would be a better solution IMHO. The daemon could be used to log netlink
> events in general and even have triggers to invoke commands upon certain
> events.
Keeping track of all events will pose a scaling issue imo.
But it's not so much keeping track of apps which have closed their
sockets as it is to reduce the cost of recovery.
The names are not supposed to change over re-incarnations and there
arent that many of them. There may be multiple instances of the same
name (multiple quaggas for example) - but thats a separate discussion.
If you look at routing and defined names, i think there about 14 defined
todate in the kernel and hardly a handful are used by any router at a
time. For this reason I suggest storing the names in the kernel.
> However, I'm positive to the idea of storing the netlink port on a per
> object basis. It would be nice to know which application is responsible
> for the creation of certain qdiscs without having to listen to events.
indeed. A useful list for the field is:
1) aesthetics - example: ive always toasted to the output of ip route ls
because i can see nicely who added the route
2) debugging - example: i can debug easier routing daemons by seeing who
added the route
3) filtering - example: Stephen trying to get quagga to scale better by
filtering unnecessary events. But i should also be able to do a GET to a
subsystem with whodunnit=ZEBRA and get only those etc.
There are other peripheral benefits like the cleanup/recovery/graceful
restart comment i made earlier.
> We could define the netlink port as follows to cover both netlink and
> non netlink applications:
>
> port identifier (10 bits) + process id (22 bits)
>
> A special value for the port identifier is defined (e.g. all zeros) which
> specifies that the netlink port is really a ioctl based application. This
> way we can encode the process id for both netlink and ioctl, distinguish
> between them and support 1023 netlink sockets per application.
If all you are going to tell me is that "this was caused by ioctl", then
thats insufficient. If all you are going to tell me is "this was caused
by an ioctl from a process id 12345" then thats also not good enough
because the process maybe transient(example ifconfig setting of an IP
address). What i need to know eventually is "this was caused by
ifconfig". That information may be reached by just giving me what you
call a "port" above and i do a name resilution which tells me
"ifconfig". nlmsgpid is just too ambigous to be the "port".
my 1.9cents.
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2008-04-02 18:12 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-21 18:05 [PATCH net-2.6.26] netlink: make socket filters work on netlink Stephen Hemminger
2008-03-21 22:47 ` David Miller
2008-03-26 20:19 ` Patrick McHardy
2008-03-31 19:33 ` Stephen Hemminger
2008-03-31 19:40 ` Patrick McHardy
2008-03-31 19:46 ` Stephen Hemminger
2008-03-31 20:07 ` David Miller
2008-03-31 20:15 ` Patrick McHardy
2008-03-31 21:49 ` jamal
2008-04-01 11:52 ` Patrick McHardy
2008-04-01 14:04 ` jamal
2008-04-02 10:00 ` Patrick McHardy
2008-04-02 11:21 ` Thomas Graf
2008-04-02 12:01 ` jamal
2008-04-02 12:09 ` Patrick McHardy
2008-04-02 12:25 ` jamal
2008-04-02 12:45 ` Patrick McHardy
2008-04-02 13:10 ` jamal
2008-04-02 14:28 ` Thomas Graf
2008-04-02 18:12 ` jamal
2008-04-02 12:03 ` Patrick McHardy
2008-04-02 14:09 ` Thomas Graf
2008-04-02 11:42 ` jamal
2008-04-02 12:07 ` Patrick McHardy
2008-04-02 14:05 ` Thomas Graf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).