* [PATCHSET] neighbour tables access via rtnetlink
@ 2005-05-26 18:53 Thomas Graf
2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 18:53 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
Dave,
This patchset adds neighbour table configuration and
statistics access via rtnetlink. Patches 1-2 add some
new macros to make netlink message construction simpler
and more readable, patch 3 adds the actual neighbour
table code and patch 4 removes a few unused fields
discovered while writing patch 3 ;->
include/linux/netlink.h | 17 ++
include/linux/rtnetlink.h | 45 ++++++
include/net/neighbour.h | 3
include/linux/rtnetlink.h | 107 ++++++++++++++
include/net/neighbour.h | 4
net/core/neighbour.c | 317 +++++++++++++++++++++++++++++++++++++++++++-
net/core/rtnetlink.c | 20 +-
security/selinux/nlmsgtab.c | 2
8 files changed, 499 insertions(+), 16 deletions(-)
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/4] [NETLINK] New message building macros
2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf
@ 2005-05-26 18:54 ` Thomas Graf
2005-05-26 18:54 ` [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts Thomas Graf
` (2 subsequent siblings)
3 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 18:54 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
NLMSG_PUT_ANSWER(skb, nlcb, type, length)
Start a new netlink message as answer to a request,
returns the message header.
NLMSG_END(skb, nlh)
End a netlink message, fixes total message length,
returns skb->len.
NLMSG_CANCEL(skb, nlh)
Cancel the building process and trim whole message
from skb again, returns -1.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
commit 88e04e26d887e806372014a2f7b33c6a7ea64741
tree 031c0e0d63096ca2dff2203940b90460f1db5c68
parent d1faeaeb95a05275cf0c5b51b88f2fa833434625
author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:21:36 +0200
committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:21:36 +0200
include/linux/netlink.h | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
Index: include/linux/netlink.h
===================================================================
--- c6f827347b3c6c28bc55a8eb1e62ea2659202cab/include/linux/netlink.h (mode:100644)
+++ 031c0e0d63096ca2dff2203940b90460f1db5c68/include/linux/netlink.h (mode:100644)
@@ -171,8 +171,21 @@
}
#define NLMSG_PUT(skb, pid, seq, type, len) \
-({ if (skb_tailroom(skb) < (int)NLMSG_SPACE(len)) goto nlmsg_failure; \
- __nlmsg_put(skb, pid, seq, type, len); })
+({ if (skb_tailroom(skb) < (int)NLMSG_SPACE(len)) \
+ goto nlmsg_failure; \
+ __nlmsg_put(skb, pid, seq, type, len); })
+
+#define NLMSG_PUT_ANSWER(skb, cb, type, len) \
+ NLMSG_PUT(skb, NETLINK_CB((cb)->skb).pid, \
+ (cb)->nlh->nlmsg_seq, type, len)
+
+#define NLMSG_END(skb, nlh) \
+({ (nlh)->nlmsg_len = (skb)->tail - (unsigned char *) (nlh); \
+ (skb)->len; })
+
+#define NLMSG_CANCEL(skb, nlh) \
+({ skb_trim(skb, (unsigned char *) (nlh) - (skb)->data); \
+ -1; })
extern int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
struct nlmsghdr *nlh,
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts
2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf
2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf
@ 2005-05-26 18:54 ` Thomas Graf
2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf
2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf
3 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 18:54 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
RTA_GET_U(32|64)(tlv)
Assumes TLV is a u32/u64 field and returns its value.
RTA_GET_[M]SECS(tlv)
Assumes TLV is a u64 and transports jiffies converted
to seconds or milliseconds and returns its value.
RTA_PUT_U(32|64)(skb, type, value)
Appends %value as fixed u32/u64 to %skb as TLV %type.
RTA_PUT_[M]SECS(skb, type, jiffies)
Converts %jiffies to secs/msecs and appends it as u64
to %skb as TLV %type.
RTA_PUT_STRING(skb, type, string)
Appends %NUL terminated %string to %skb as TLV %type.
RTA_NEST(skb, type)
Starts a nested TLV %type and returns the nesting handle.
RTA_NEST_END(skb, nesting_handle)
Finishes the nested TLV %nesting_handle, must be called
symmetric to RTA_NEST(). Returns skb->len
RTA_NEST_CANCEL(skb, nesting_handle)
Cancel the nested TLV %nesting_handle and trim nested TLV
from skb again, returns -1.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
commit d24c37db15b6cd790459d9d91ee6331ce17264b4
tree 52e7990f6dee710624f823014c40c14e0e5ccf39
parent 88e04e26d887e806372014a2f7b33c6a7ea64741
author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:22:23 +0200
committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:22:23 +0200
include/linux/rtnetlink.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 45 insertions(+)
Index: include/linux/rtnetlink.h
===================================================================
--- 031c0e0d63096ca2dff2203940b90460f1db5c68/include/linux/rtnetlink.h (mode:100644)
+++ 52e7990f6dee710624f823014c40c14e0e5ccf39/include/linux/rtnetlink.h (mode:100644)
@@ -789,6 +789,51 @@
({ if (unlikely(skb_tailroom(skb) < (int)(attrlen))) \
goto rtattr_failure; \
memcpy(skb_put(skb, RTA_ALIGN(attrlen)), data, attrlen); })
+
+#define RTA_PUT_U32(skb, attrtype, value) \
+({ u32 _tmp = (value); \
+ RTA_PUT(skb, attrtype, sizeof(u32), &_tmp); })
+
+#define RTA_PUT_U64(skb, attrtype, value) \
+({ u64 _tmp = (value); \
+ RTA_PUT(skb, attrtype, sizeof(u64), &_tmp); })
+
+#define RTA_PUT_SECS(skb, attrtype, value) \
+ RTA_PUT_U64(skb, attrtype, (value) / HZ)
+
+#define RTA_PUT_MSECS(skb, attrtype, value) \
+ RTA_PUT_U64(skb, attrtype, jiffies_to_msecs(value))
+
+#define RTA_PUT_STRING(skb, attrtype, value) \
+ RTA_PUT(skb, attrtype, strlen(value) + 1, value)
+
+#define RTA_NEST(skb, type) \
+({ struct rtattr *__start = (struct rtattr *) (skb)->tail; \
+ RTA_PUT(skb, type, 0, NULL); \
+ __start; })
+
+#define RTA_NEST_END(skb, start) \
+({ (start)->rta_len = ((skb)->tail - (unsigned char *) (start)); \
+ (skb)->len; })
+
+#define RTA_NEST_CANCEL(skb, start) \
+({ skb_trim(skb, (unsigned char *) (start) - (skb)->data); \
+ -1; })
+
+#define RTA_GET_U32(rta) \
+({ if (!rta || RTA_PAYLOAD(rta) < sizeof(u32)) \
+ goto rtattr_failure; \
+ *(u32 *) RTA_DATA(rta); })
+
+#define RTA_GET_U64(rta) \
+({ u64 _tmp; \
+ if (!rta || RTA_PAYLOAD(rta) < sizeof(u64)) \
+ goto rtattr_failure; \
+ memcpy(&_tmp, RTA_DATA(rta), sizeof(_tmp)); \
+ _tmp; })
+
+#define RTA_GET_SECS(rta) ((unsigned long) RTA_GET_U64(rta) * HZ)
+#define RTA_GET_MSECS(rta) (msecs_to_jiffies((unsigned long) RTA_GET_U64(rta)))
static inline struct rtattr *
__rta_reserve(struct sk_buff *skb, int attrtype, int attrlen)
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf
2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf
2005-05-26 18:54 ` [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts Thomas Graf
@ 2005-05-26 18:55 ` Thomas Graf
2005-05-26 22:17 ` David S. Miller
2005-05-27 11:14 ` jamal
2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf
3 siblings, 2 replies; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 18:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence transports all not device specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets. They all share the same value for
Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
no contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.
To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
commit 6769b3b0603c632bda062e427178de8986900a52
tree 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b
parent 98c3bc2e6320127cd96ef46d33f89ae127ce48fe
author Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200
committer Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200
include/linux/rtnetlink.h | 107 ++++++++++++++
include/net/neighbour.h | 4
net/core/neighbour.c | 317 +++++++++++++++++++++++++++++++++++++++++++-
net/core/rtnetlink.c | 20 +-
security/selinux/nlmsgtab.c | 2
5 files changed, 439 insertions(+), 11 deletions(-)
Index: include/linux/rtnetlink.h
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/linux/rtnetlink.h (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/linux/rtnetlink.h (mode:100644)
@@ -89,6 +89,13 @@
RTM_GETANYCAST = 62,
#define RTM_GETANYCAST RTM_GETANYCAST
+ RTM_NEWNEIGHTBL = 64,
+#define RTM_NEWNEIGHTBL RTM_NEWNEIGHTBL
+ RTM_GETNEIGHTBL = 66,
+#define RTM_GETNEIGHTBL RTM_GETNEIGHTBL
+ RTM_SETNEIGHTBL,
+#define RTM_SETNEIGHTBL RTM_SETNEIGHTBL
+
__RTM_MAX,
#define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
};
@@ -493,6 +500,106 @@
__u32 ndm_refcnt;
};
+
+/*****************************************************************
+ * Neighbour tables specific messages.
+ *
+ * To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
+ * NLM_F_DUMP flag set. Every neighbour table configuration is
+ * spread over multiple messages to avoid running into message
+ * size limits on systems with many interfaces. The first message
+ * in the sequence transports all not device specific data such as
+ * statistics, configuration, and the default parameter set.
+ * This message is followed by 0..n messages carrying device
+ * specific parameter sets. They all share the same value for
+ * Although the ordering should be sufficient, NDTA_NAME can be
+ * used to identify sequences. The initial message can be identified
+ * by checking for NDTA_CONFIG. The device specific messages do
+ * no contain this TLV but have NDTPA_IFINDEX set to the
+ * corresponding interface index.
+ *
+ * To change neighbour table attributes, send RTM_SETNEIGHTBL
+ * with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
+ * NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
+ * otherwise. Device specific parameter sets can be changed by
+ * setting NDTPA_IFINDEX to the interface index of the corresponding
+ * device.
+ ****/
+
+struct ndt_stats
+{
+ __u64 ndts_allocs;
+ __u64 ndts_destroys;
+ __u64 ndts_hash_grows;
+ __u64 ndts_res_failed;
+ __u64 ndts_lookups;
+ __u64 ndts_hits;
+ __u64 ndts_rcv_probes_mcast;
+ __u64 ndts_rcv_probes_ucast;
+ __u64 ndts_periodic_gc_runs;
+ __u64 ndts_forced_gc_runs;
+};
+
+enum {
+ NDTPA_UNSPEC,
+ NDTPA_IFINDEX, /* u32, unchangeable */
+ NDTPA_REFCNT, /* u32, read-only */
+ NDTPA_REACHABLE_TIME, /* u64, read-only, msecs */
+ NDTPA_BASE_REACHABLE_TIME, /* u64, msecs */
+ NDTPA_RETRANS_TIME, /* u64, msecs */
+ NDTPA_GC_STALETIME, /* u64, msecs */
+ NDTPA_DELAY_PROBE_TIME, /* u64, msecs */
+ NDTPA_QUEUE_LEN, /* u32 */
+ NDTPA_APP_PROBES, /* u32 */
+ NDTPA_UCAST_PROBES, /* u32 */
+ NDTPA_MCAST_PROBES, /* u32 */
+ NDTPA_ANYCAST_DELAY, /* u64, msecs */
+ NDTPA_PROXY_DELAY, /* u64, msecs */
+ NDTPA_PROXY_QLEN, /* u32 */
+ NDTPA_LOCKTIME, /* u64, msecs */
+ __NDTPA_MAX
+};
+#define NDTPA_MAX (__NDTPA_MAX - 1)
+
+struct ndtmsg
+{
+ __u8 ndtm_family;
+ __u8 ndtm_pad1;
+ __u16 ndtm_pad2;
+};
+
+struct ndt_config
+{
+ __u16 ndtc_key_len;
+ __u16 ndtc_entry_size;
+ __u32 ndtc_entries;
+ __u32 ndtc_last_flush; /* delta to now in msecs */
+ __u32 ndtc_last_rand; /* delta to now in msecs */
+ __u32 ndtc_hash_rnd;
+ __u32 ndtc_hash_mask;
+ __u32 ndtc_hash_chain_gc;
+ __u32 ndtc_proxy_qlen;
+};
+
+enum {
+ NDTA_UNSPEC,
+ NDTA_NAME, /* char *, unchangeable */
+ NDTA_THRESH1, /* u32 */
+ NDTA_THRESH2, /* u32 */
+ NDTA_THRESH3, /* u32 */
+ NDTA_CONFIG, /* struct ndt_config, read-only */
+ NDTA_PARMS, /* nested TLV NDTPA_* */
+ NDTA_STATS, /* struct ndt_stats, read-only */
+ NDTA_GC_INTERVAL, /* u64, msecs */
+ __NDTA_MAX
+};
+#define NDTA_MAX (__NDTA_MAX - 1)
+
+#define NDTA_RTA(r) ((struct rtattr*)(((char*)(r)) + \
+ NLMSG_ALIGN(sizeof(struct ndtmsg))))
+#define NDTA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ndtmsg))
+
+
/****
* General form of address family dependent message.
****/
Index: include/net/neighbour.h
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/net/neighbour.h (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/net/neighbour.h (mode:100644)
@@ -65,6 +65,7 @@
struct neigh_parms
{
+ struct net_device *dev;
struct neigh_parms *next;
int (*neigh_setup)(struct neighbour *);
struct neigh_table *tbl;
@@ -252,6 +253,9 @@
extern int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
extern void neigh_app_ns(struct neighbour *n);
+extern int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb);
+extern int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
+
extern void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct neighbour *, void *), void *cookie);
extern void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct neighbour *));
extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct pneigh_entry *));
Index: net/core/neighbour.c
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/neighbour.c (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/neighbour.c (mode:100644)
@@ -1276,9 +1276,14 @@
INIT_RCU_HEAD(&p->rcu_head);
p->reachable_time =
neigh_rand_reach_time(p->base_reachable_time);
- if (dev && dev->neigh_setup && dev->neigh_setup(dev, p)) {
- kfree(p);
- return NULL;
+ if (dev) {
+ if (dev->neigh_setup && dev->neigh_setup(dev, p)) {
+ kfree(p);
+ return NULL;
+ }
+
+ dev_hold(dev);
+ p->dev = dev;
}
p->sysctl_table = NULL;
write_lock_bh(&tbl->lock);
@@ -1309,6 +1314,8 @@
*p = parms->next;
parms->dead = 1;
write_unlock_bh(&tbl->lock);
+ if (parms->dev)
+ dev_put(parms->dev);
call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
return;
}
@@ -1546,6 +1553,308 @@
return err;
}
+static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms)
+{
+ struct rtattr *nest = RTA_NEST(skb, NDTA_PARMS);
+
+ if (parms->dev)
+ RTA_PUT_U32(skb, NDTPA_IFINDEX, parms->dev->ifindex);
+
+ RTA_PUT_U32(skb, NDTPA_REFCNT, atomic_read(&parms->refcnt));
+ RTA_PUT_U32(skb, NDTPA_QUEUE_LEN, parms->queue_len);
+ RTA_PUT_U32(skb, NDTPA_PROXY_QLEN, parms->proxy_qlen);
+ RTA_PUT_U32(skb, NDTPA_APP_PROBES, parms->app_probes);
+ RTA_PUT_U32(skb, NDTPA_UCAST_PROBES, parms->ucast_probes);
+ RTA_PUT_U32(skb, NDTPA_MCAST_PROBES, parms->mcast_probes);
+ RTA_PUT_MSECS(skb, NDTPA_REACHABLE_TIME, parms->reachable_time);
+ RTA_PUT_MSECS(skb, NDTPA_BASE_REACHABLE_TIME,
+ parms->base_reachable_time);
+ RTA_PUT_MSECS(skb, NDTPA_GC_STALETIME, parms->gc_staletime);
+ RTA_PUT_MSECS(skb, NDTPA_DELAY_PROBE_TIME, parms->delay_probe_time);
+ RTA_PUT_MSECS(skb, NDTPA_RETRANS_TIME, parms->retrans_time);
+ RTA_PUT_MSECS(skb, NDTPA_ANYCAST_DELAY, parms->anycast_delay);
+ RTA_PUT_MSECS(skb, NDTPA_PROXY_DELAY, parms->proxy_delay);
+ RTA_PUT_MSECS(skb, NDTPA_LOCKTIME, parms->locktime);
+
+ return RTA_NEST_END(skb, nest);
+
+rtattr_failure:
+ return RTA_NEST_CANCEL(skb, nest);
+}
+
+static int neightbl_fill_info(struct neigh_table *tbl, struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ struct nlmsghdr *nlh;
+ struct ndtmsg *ndtmsg;
+
+ nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg));
+ ndtmsg = NLMSG_DATA(nlh);
+
+ NLMSG_SET_MULTIPART(nlh);
+
+ read_lock_bh(&tbl->lock);
+ ndtmsg->ndtm_family = tbl->family;
+
+ RTA_PUT_STRING(skb, NDTA_NAME, tbl->id);
+ RTA_PUT_MSECS(skb, NDTA_GC_INTERVAL, tbl->gc_interval);
+ RTA_PUT_U32(skb, NDTA_THRESH1, tbl->gc_thresh1);
+ RTA_PUT_U32(skb, NDTA_THRESH2, tbl->gc_thresh2);
+ RTA_PUT_U32(skb, NDTA_THRESH3, tbl->gc_thresh3);
+
+ {
+ unsigned long now = jiffies;
+ unsigned int flush_delta = now - tbl->last_flush;
+ unsigned int rand_delta = now - tbl->last_rand;
+
+ struct ndt_config ndc = {
+ .ndtc_key_len = tbl->key_len,
+ .ndtc_entry_size = tbl->entry_size,
+ .ndtc_entries = atomic_read(&tbl->entries),
+ .ndtc_last_flush = jiffies_to_msecs(flush_delta),
+ .ndtc_last_rand = jiffies_to_msecs(rand_delta),
+ .ndtc_hash_rnd = tbl->hash_rnd,
+ .ndtc_hash_mask = tbl->hash_mask,
+ .ndtc_hash_chain_gc = tbl->hash_chain_gc,
+ .ndtc_proxy_qlen = tbl->proxy_queue.qlen,
+ };
+
+ RTA_PUT(skb, NDTA_CONFIG, sizeof(ndc), &ndc);
+ }
+
+ {
+ int cpu;
+ struct ndt_stats ndst;
+
+ memset(&ndst, 0, sizeof(ndst));
+
+ for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ struct neigh_statistics *st;
+
+ if (!cpu_possible(cpu))
+ continue;
+
+ st = per_cpu_ptr(tbl->stats, cpu);
+ ndst.ndts_allocs += st->allocs;
+ ndst.ndts_destroys += st->destroys;
+ ndst.ndts_hash_grows += st->hash_grows;
+ ndst.ndts_res_failed += st->res_failed;
+ ndst.ndts_lookups += st->lookups;
+ ndst.ndts_hits += st->hits;
+ ndst.ndts_rcv_probes_mcast += st->rcv_probes_mcast;
+ ndst.ndts_rcv_probes_ucast += st->rcv_probes_ucast;
+ ndst.ndts_periodic_gc_runs += st->periodic_gc_runs;
+ ndst.ndts_forced_gc_runs += st->forced_gc_runs;
+ }
+
+ RTA_PUT(skb, NDTA_STATS, sizeof(ndst), &ndst);
+ }
+
+ BUG_ON(tbl->parms.dev);
+ if (neightbl_fill_parms(skb, &tbl->parms) < 0)
+ goto rtattr_failure;
+
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_END(skb, nlh);
+
+rtattr_failure:
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_CANCEL(skb, nlh);
+
+nlmsg_failure:
+ return -1;
+}
+
+static int neightbl_fill_param_info(struct neigh_table *tbl,
+ struct neigh_parms *parms,
+ struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ struct ndtmsg *ndtmsg;
+ struct nlmsghdr *nlh;
+
+ nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg));
+ ndtmsg = NLMSG_DATA(nlh);
+
+ NLMSG_SET_MULTIPART(nlh);
+
+ read_lock_bh(&tbl->lock);
+ ndtmsg->ndtm_family = tbl->family;
+ RTA_PUT_STRING(skb, NDTA_NAME, tbl->id);
+
+ if (neightbl_fill_parms(skb, parms) < 0)
+ goto rtattr_failure;
+
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_END(skb, nlh);
+
+rtattr_failure:
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_CANCEL(skb, nlh);
+
+nlmsg_failure:
+ return -1;
+}
+
+static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl,
+ int ifindex)
+{
+ struct neigh_parms *p;
+
+ for (p = &tbl->parms; p; p = p->next)
+ if ((p->dev && p->dev->ifindex == ifindex) ||
+ (!p->dev && !ifindex))
+ return p;
+
+ return NULL;
+}
+
+int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+ struct neigh_table *tbl;
+ struct ndtmsg *ndtmsg = NLMSG_DATA(nlh);
+ struct rtattr **tb = arg;
+ int err = -EINVAL;
+
+ if (!tb[NDTA_NAME - 1] || !RTA_PAYLOAD(tb[NDTA_NAME - 1]))
+ return -EINVAL;
+
+ read_lock(&neigh_tbl_lock);
+ for (tbl = neigh_tables; tbl; tbl = tbl->next) {
+ if (ndtmsg->ndtm_family && tbl->family != ndtmsg->ndtm_family)
+ continue;
+
+ if (!rtattr_strcmp(tb[NDTA_NAME - 1], tbl->id))
+ break;
+ }
+
+ if (tbl == NULL) {
+ err = -ENOENT;
+ goto errout;
+ }
+
+ /*
+ * We acquire tbl->lock to be nice to the periodic timers and
+ * make sure they always see a consistent set of values.
+ */
+ write_lock_bh(&tbl->lock);
+
+ if (tb[NDTA_THRESH1 - 1])
+ tbl->gc_thresh1 = RTA_GET_U32(tb[NDTA_THRESH1 - 1]);
+
+ if (tb[NDTA_THRESH2 - 1])
+ tbl->gc_thresh2 = RTA_GET_U32(tb[NDTA_THRESH2 - 1]);
+
+ if (tb[NDTA_THRESH3 - 1])
+ tbl->gc_thresh3 = RTA_GET_U32(tb[NDTA_THRESH3 - 1]);
+
+ if (tb[NDTA_GC_INTERVAL - 1])
+ tbl->gc_interval = RTA_GET_MSECS(tb[NDTA_GC_INTERVAL - 1]);
+
+ if (tb[NDTA_PARMS - 1]) {
+ struct rtattr *tbp[NDTPA_MAX];
+ struct neigh_parms *p;
+ u32 ifindex = 0;
+
+ if (rtattr_parse_nested(tbp, NDTPA_MAX, tb[NDTA_PARMS - 1]) < 0)
+ goto rtattr_failure;
+
+ if (tbp[NDTPA_IFINDEX - 1])
+ ifindex = RTA_GET_U32(tbp[NDTPA_IFINDEX - 1]);
+
+ p = lookup_neigh_params(tbl, ifindex);
+ if (p == NULL) {
+ err = -ENOENT;
+ goto rtattr_failure;
+ }
+
+ if (tbp[NDTPA_QUEUE_LEN - 1])
+ p->queue_len = RTA_GET_U32(tbp[NDTPA_QUEUE_LEN - 1]);
+
+ if (tbp[NDTPA_PROXY_QLEN - 1])
+ p->proxy_qlen = RTA_GET_U32(tbp[NDTPA_PROXY_QLEN - 1]);
+
+ if (tbp[NDTPA_APP_PROBES - 1])
+ p->app_probes = RTA_GET_U32(tbp[NDTPA_APP_PROBES - 1]);
+
+ if (tbp[NDTPA_UCAST_PROBES - 1])
+ p->ucast_probes =
+ RTA_GET_U32(tbp[NDTPA_UCAST_PROBES - 1]);
+
+ if (tbp[NDTPA_MCAST_PROBES - 1])
+ p->mcast_probes =
+ RTA_GET_U32(tbp[NDTPA_MCAST_PROBES - 1]);
+
+ if (tbp[NDTPA_BASE_REACHABLE_TIME - 1])
+ p->base_reachable_time =
+ RTA_GET_MSECS(tbp[NDTPA_BASE_REACHABLE_TIME - 1]);
+
+ if (tbp[NDTPA_GC_STALETIME - 1])
+ p->gc_staletime =
+ RTA_GET_MSECS(tbp[NDTPA_GC_STALETIME - 1]);
+
+ if (tbp[NDTPA_DELAY_PROBE_TIME - 1])
+ p->delay_probe_time =
+ RTA_GET_MSECS(tbp[NDTPA_DELAY_PROBE_TIME - 1]);
+
+ if (tbp[NDTPA_RETRANS_TIME - 1])
+ p->retrans_time =
+ RTA_GET_MSECS(tbp[NDTPA_RETRANS_TIME - 1]);
+
+ if (tbp[NDTPA_ANYCAST_DELAY - 1])
+ p->anycast_delay =
+ RTA_GET_MSECS(tbp[NDTPA_ANYCAST_DELAY - 1]);
+
+ if (tbp[NDTPA_PROXY_DELAY - 1])
+ p->proxy_delay =
+ RTA_GET_MSECS(tbp[NDTPA_PROXY_DELAY - 1]);
+
+ if (tbp[NDTPA_LOCKTIME - 1])
+ p->locktime = RTA_GET_MSECS(tbp[NDTPA_LOCKTIME - 1]);
+ }
+
+ err = 0;
+
+rtattr_failure:
+ write_unlock_bh(&tbl->lock);
+errout:
+ read_unlock(&neigh_tbl_lock);
+ return err;
+}
+
+int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ int idx, family;
+ int s_idx = cb->args[0];
+ struct neigh_table *tbl;
+
+ family = ((struct rtgenmsg *)NLMSG_DATA(cb->nlh))->rtgen_family;
+
+ read_lock(&neigh_tbl_lock);
+ for (tbl = neigh_tables, idx = 0; tbl; tbl = tbl->next) {
+ struct neigh_parms *p;
+
+ if (idx < s_idx || (family && tbl->family != family))
+ continue;
+
+ if (neightbl_fill_info(tbl, skb, cb) <= 0)
+ break;
+
+ for (++idx, p = tbl->parms.next; p; p = p->next, idx++) {
+ if (idx < s_idx)
+ continue;
+
+ if (neightbl_fill_param_info(tbl, p, skb, cb) <= 0)
+ goto out;
+ }
+
+ }
+out:
+ read_unlock(&neigh_tbl_lock);
+ cb->args[0] = idx;
+
+ return skb->len;
+}
static int neigh_fill_info(struct sk_buff *skb, struct neighbour *n,
u32 pid, u32 seq, int event)
@@ -2352,6 +2661,8 @@
EXPORT_SYMBOL(neigh_update_hhs);
EXPORT_SYMBOL(pneigh_enqueue);
EXPORT_SYMBOL(pneigh_lookup);
+EXPORT_SYMBOL(neightbl_dump_info);
+EXPORT_SYMBOL(neightbl_set);
#ifdef CONFIG_ARPD
EXPORT_SYMBOL(neigh_app_ns);
Index: net/core/rtnetlink.c
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/rtnetlink.c (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/rtnetlink.c (mode:100644)
@@ -100,6 +100,7 @@
[RTM_FAM(RTM_NEWPREFIX)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
[RTM_FAM(RTM_GETMULTICAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
[RTM_FAM(RTM_GETANYCAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
+ [RTM_FAM(RTM_NEWNEIGHTBL)] = NLMSG_LENGTH(sizeof(struct ndtmsg)),
};
static const int rta_max[RTM_NR_FAMILIES] =
@@ -113,6 +114,7 @@
[RTM_FAM(RTM_NEWTCLASS)] = TCA_MAX,
[RTM_FAM(RTM_NEWTFILTER)] = TCA_MAX,
[RTM_FAM(RTM_NEWACTION)] = TCAA_MAX,
+ [RTM_FAM(RTM_NEWNEIGHTBL)] = NDTA_MAX,
};
void __rta_fill(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
@@ -649,14 +651,16 @@
static struct rtnetlink_link link_rtnetlink_table[RTM_NR_MSGTYPES] =
{
- [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo },
- [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink },
- [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
- [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
- [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add },
- [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete },
- [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info },
- [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo },
+ [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink },
+ [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add },
+ [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete },
+ [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info },
+ [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_GETNEIGHTBL - RTM_BASE] = { .dumpit = neightbl_dump_info },
+ [RTM_SETNEIGHTBL - RTM_BASE] = { .doit = neightbl_set },
};
static int rtnetlink_event(struct notifier_block *this, unsigned long event, void *ptr)
Index: security/selinux/nlmsgtab.c
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/security/selinux/nlmsgtab.c (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/security/selinux/nlmsgtab.c (mode:100644)
@@ -63,6 +63,8 @@
{ RTM_GETPREFIX, NETLINK_ROUTE_SOCKET__NLMSG_READ },
{ RTM_GETMULTICAST, NETLINK_ROUTE_SOCKET__NLMSG_READ },
{ RTM_GETANYCAST, NETLINK_ROUTE_SOCKET__NLMSG_READ },
+ { RTM_GETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_READ },
+ { RTM_SETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
};
static struct nlmsg_perm nlmsg_firewall_perms[] =
^ permalink raw reply [flat|nested] 26+ messages in thread
* [NEIGH] Remove unused fields in struct neigh_parms and neigh_table
2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf
` (2 preceding siblings ...)
2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf
@ 2005-05-26 18:55 ` Thomas Graf
2005-05-26 19:00 ` Thomas Graf
3 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 18:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
commit 45b866033b439e510f3638ce9ac0990446b8b3cd
tree 72baef0bb9b97ac2990bcca05b8f831b30303008
parent 3688d656f38445cc120b0970cd437cad1bb929dd
author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200
committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200
include/net/neighbour.h | 3 ---
1 files changed, 3 deletions(-)
Index: include/net/neighbour.h
===================================================================
--- 2fbcbf92bed82aaef81b2bcc9baecafd61dfb0ce/include/net/neighbour.h (mode:100644)
+++ 72baef0bb9b97ac2990bcca05b8f831b30303008/include/net/neighbour.h (mode:100644)
@@ -69,8 +69,6 @@
struct neigh_parms *next;
int (*neigh_setup)(struct neighbour *);
struct neigh_table *tbl;
- int entries;
- void *priv;
void *sysctl_table;
@@ -193,7 +191,6 @@
atomic_t entries;
rwlock_t lock;
unsigned long last_rand;
- struct neigh_parms *parms_list;
kmem_cache_t *kmem_cachep;
struct neigh_statistics *stats;
struct neighbour **hash_buckets;
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [NEIGH] Remove unused fields in struct neigh_parms and neigh_table
2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf
@ 2005-05-26 19:00 ` Thomas Graf
0 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 19:00 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
* Thomas Graf <20050526185556.GA15391@postel.suug.ch> 2005-05-26 20:55
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
>
> ---
> commit 45b866033b439e510f3638ce9ac0990446b8b3cd
> tree 72baef0bb9b97ac2990bcca05b8f831b30303008
> parent 3688d656f38445cc120b0970cd437cad1bb929dd
> author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200
> committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200
This is patch 4 if it's not clear. I took the wrong patch and my script
couldn't find the parent sha1 and ommited the [PATCH 4/4]. Sorry.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf
@ 2005-05-26 22:17 ` David S. Miller
2005-05-26 22:24 ` Thomas Graf
2005-05-26 22:26 ` Thomas Graf
2005-05-27 11:14 ` jamal
1 sibling, 2 replies; 26+ messages in thread
From: David S. Miller @ 2005-05-26 22:17 UTC (permalink / raw)
To: tgraf; +Cc: netdev
From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 26 May 2005 20:55:26 +0200
> This message is followed by 0..n messages carrying device
> specific parameter sets. They all share the same value for
> Although the ordering should be sufficient, NDTA_NAME can be
It seems a sentence is chopped off here mid-stream,
please provide a fixed changelog so I can apply your
patches.
Thanks.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-26 22:17 ` David S. Miller
@ 2005-05-26 22:24 ` Thomas Graf
2005-05-26 22:26 ` Thomas Graf
1 sibling, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 22:24 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence transports all not device specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets.
Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
not contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.
To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
commit 6769b3b0603c632bda062e427178de8986900a52
tree 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b
parent 98c3bc2e6320127cd96ef46d33f89ae127ce48fe
author Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200
committer Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200
include/linux/rtnetlink.h | 107 ++++++++++++++
include/net/neighbour.h | 4
net/core/neighbour.c | 317 +++++++++++++++++++++++++++++++++++++++++++-
net/core/rtnetlink.c | 20 +-
security/selinux/nlmsgtab.c | 2
5 files changed, 439 insertions(+), 11 deletions(-)
Index: include/linux/rtnetlink.h
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/linux/rtnetlink.h (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/linux/rtnetlink.h (mode:100644)
@@ -89,6 +89,13 @@
RTM_GETANYCAST = 62,
#define RTM_GETANYCAST RTM_GETANYCAST
+ RTM_NEWNEIGHTBL = 64,
+#define RTM_NEWNEIGHTBL RTM_NEWNEIGHTBL
+ RTM_GETNEIGHTBL = 66,
+#define RTM_GETNEIGHTBL RTM_GETNEIGHTBL
+ RTM_SETNEIGHTBL,
+#define RTM_SETNEIGHTBL RTM_SETNEIGHTBL
+
__RTM_MAX,
#define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
};
@@ -493,6 +500,106 @@
__u32 ndm_refcnt;
};
+
+/*****************************************************************
+ * Neighbour tables specific messages.
+ *
+ * To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
+ * NLM_F_DUMP flag set. Every neighbour table configuration is
+ * spread over multiple messages to avoid running into message
+ * size limits on systems with many interfaces. The first message
+ * in the sequence transports all not device specific data such as
+ * statistics, configuration, and the default parameter set.
+ * This message is followed by 0..n messages carrying device
+ * specific parameter sets.
+ * Although the ordering should be sufficient, NDTA_NAME can be
+ * used to identify sequences. The initial message can be identified
+ * by checking for NDTA_CONFIG. The device specific messages do
+ * not contain this TLV but have NDTPA_IFINDEX set to the
+ * corresponding interface index.
+ *
+ * To change neighbour table attributes, send RTM_SETNEIGHTBL
+ * with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
+ * NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
+ * otherwise. Device specific parameter sets can be changed by
+ * setting NDTPA_IFINDEX to the interface index of the corresponding
+ * device.
+ ****/
+
+struct ndt_stats
+{
+ __u64 ndts_allocs;
+ __u64 ndts_destroys;
+ __u64 ndts_hash_grows;
+ __u64 ndts_res_failed;
+ __u64 ndts_lookups;
+ __u64 ndts_hits;
+ __u64 ndts_rcv_probes_mcast;
+ __u64 ndts_rcv_probes_ucast;
+ __u64 ndts_periodic_gc_runs;
+ __u64 ndts_forced_gc_runs;
+};
+
+enum {
+ NDTPA_UNSPEC,
+ NDTPA_IFINDEX, /* u32, unchangeable */
+ NDTPA_REFCNT, /* u32, read-only */
+ NDTPA_REACHABLE_TIME, /* u64, read-only, msecs */
+ NDTPA_BASE_REACHABLE_TIME, /* u64, msecs */
+ NDTPA_RETRANS_TIME, /* u64, msecs */
+ NDTPA_GC_STALETIME, /* u64, msecs */
+ NDTPA_DELAY_PROBE_TIME, /* u64, msecs */
+ NDTPA_QUEUE_LEN, /* u32 */
+ NDTPA_APP_PROBES, /* u32 */
+ NDTPA_UCAST_PROBES, /* u32 */
+ NDTPA_MCAST_PROBES, /* u32 */
+ NDTPA_ANYCAST_DELAY, /* u64, msecs */
+ NDTPA_PROXY_DELAY, /* u64, msecs */
+ NDTPA_PROXY_QLEN, /* u32 */
+ NDTPA_LOCKTIME, /* u64, msecs */
+ __NDTPA_MAX
+};
+#define NDTPA_MAX (__NDTPA_MAX - 1)
+
+struct ndtmsg
+{
+ __u8 ndtm_family;
+ __u8 ndtm_pad1;
+ __u16 ndtm_pad2;
+};
+
+struct ndt_config
+{
+ __u16 ndtc_key_len;
+ __u16 ndtc_entry_size;
+ __u32 ndtc_entries;
+ __u32 ndtc_last_flush; /* delta to now in msecs */
+ __u32 ndtc_last_rand; /* delta to now in msecs */
+ __u32 ndtc_hash_rnd;
+ __u32 ndtc_hash_mask;
+ __u32 ndtc_hash_chain_gc;
+ __u32 ndtc_proxy_qlen;
+};
+
+enum {
+ NDTA_UNSPEC,
+ NDTA_NAME, /* char *, unchangeable */
+ NDTA_THRESH1, /* u32 */
+ NDTA_THRESH2, /* u32 */
+ NDTA_THRESH3, /* u32 */
+ NDTA_CONFIG, /* struct ndt_config, read-only */
+ NDTA_PARMS, /* nested TLV NDTPA_* */
+ NDTA_STATS, /* struct ndt_stats, read-only */
+ NDTA_GC_INTERVAL, /* u64, msecs */
+ __NDTA_MAX
+};
+#define NDTA_MAX (__NDTA_MAX - 1)
+
+#define NDTA_RTA(r) ((struct rtattr*)(((char*)(r)) + \
+ NLMSG_ALIGN(sizeof(struct ndtmsg))))
+#define NDTA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ndtmsg))
+
+
/****
* General form of address family dependent message.
****/
Index: include/net/neighbour.h
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/net/neighbour.h (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/net/neighbour.h (mode:100644)
@@ -65,6 +65,7 @@
struct neigh_parms
{
+ struct net_device *dev;
struct neigh_parms *next;
int (*neigh_setup)(struct neighbour *);
struct neigh_table *tbl;
@@ -252,6 +253,9 @@
extern int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
extern void neigh_app_ns(struct neighbour *n);
+extern int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb);
+extern int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
+
extern void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct neighbour *, void *), void *cookie);
extern void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct neighbour *));
extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct pneigh_entry *));
Index: net/core/neighbour.c
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/neighbour.c (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/neighbour.c (mode:100644)
@@ -1276,9 +1276,14 @@
INIT_RCU_HEAD(&p->rcu_head);
p->reachable_time =
neigh_rand_reach_time(p->base_reachable_time);
- if (dev && dev->neigh_setup && dev->neigh_setup(dev, p)) {
- kfree(p);
- return NULL;
+ if (dev) {
+ if (dev->neigh_setup && dev->neigh_setup(dev, p)) {
+ kfree(p);
+ return NULL;
+ }
+
+ dev_hold(dev);
+ p->dev = dev;
}
p->sysctl_table = NULL;
write_lock_bh(&tbl->lock);
@@ -1309,6 +1314,8 @@
*p = parms->next;
parms->dead = 1;
write_unlock_bh(&tbl->lock);
+ if (parms->dev)
+ dev_put(parms->dev);
call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
return;
}
@@ -1546,6 +1553,308 @@
return err;
}
+static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms)
+{
+ struct rtattr *nest = RTA_NEST(skb, NDTA_PARMS);
+
+ if (parms->dev)
+ RTA_PUT_U32(skb, NDTPA_IFINDEX, parms->dev->ifindex);
+
+ RTA_PUT_U32(skb, NDTPA_REFCNT, atomic_read(&parms->refcnt));
+ RTA_PUT_U32(skb, NDTPA_QUEUE_LEN, parms->queue_len);
+ RTA_PUT_U32(skb, NDTPA_PROXY_QLEN, parms->proxy_qlen);
+ RTA_PUT_U32(skb, NDTPA_APP_PROBES, parms->app_probes);
+ RTA_PUT_U32(skb, NDTPA_UCAST_PROBES, parms->ucast_probes);
+ RTA_PUT_U32(skb, NDTPA_MCAST_PROBES, parms->mcast_probes);
+ RTA_PUT_MSECS(skb, NDTPA_REACHABLE_TIME, parms->reachable_time);
+ RTA_PUT_MSECS(skb, NDTPA_BASE_REACHABLE_TIME,
+ parms->base_reachable_time);
+ RTA_PUT_MSECS(skb, NDTPA_GC_STALETIME, parms->gc_staletime);
+ RTA_PUT_MSECS(skb, NDTPA_DELAY_PROBE_TIME, parms->delay_probe_time);
+ RTA_PUT_MSECS(skb, NDTPA_RETRANS_TIME, parms->retrans_time);
+ RTA_PUT_MSECS(skb, NDTPA_ANYCAST_DELAY, parms->anycast_delay);
+ RTA_PUT_MSECS(skb, NDTPA_PROXY_DELAY, parms->proxy_delay);
+ RTA_PUT_MSECS(skb, NDTPA_LOCKTIME, parms->locktime);
+
+ return RTA_NEST_END(skb, nest);
+
+rtattr_failure:
+ return RTA_NEST_CANCEL(skb, nest);
+}
+
+static int neightbl_fill_info(struct neigh_table *tbl, struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ struct nlmsghdr *nlh;
+ struct ndtmsg *ndtmsg;
+
+ nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg));
+ ndtmsg = NLMSG_DATA(nlh);
+
+ NLMSG_SET_MULTIPART(nlh);
+
+ read_lock_bh(&tbl->lock);
+ ndtmsg->ndtm_family = tbl->family;
+
+ RTA_PUT_STRING(skb, NDTA_NAME, tbl->id);
+ RTA_PUT_MSECS(skb, NDTA_GC_INTERVAL, tbl->gc_interval);
+ RTA_PUT_U32(skb, NDTA_THRESH1, tbl->gc_thresh1);
+ RTA_PUT_U32(skb, NDTA_THRESH2, tbl->gc_thresh2);
+ RTA_PUT_U32(skb, NDTA_THRESH3, tbl->gc_thresh3);
+
+ {
+ unsigned long now = jiffies;
+ unsigned int flush_delta = now - tbl->last_flush;
+ unsigned int rand_delta = now - tbl->last_rand;
+
+ struct ndt_config ndc = {
+ .ndtc_key_len = tbl->key_len,
+ .ndtc_entry_size = tbl->entry_size,
+ .ndtc_entries = atomic_read(&tbl->entries),
+ .ndtc_last_flush = jiffies_to_msecs(flush_delta),
+ .ndtc_last_rand = jiffies_to_msecs(rand_delta),
+ .ndtc_hash_rnd = tbl->hash_rnd,
+ .ndtc_hash_mask = tbl->hash_mask,
+ .ndtc_hash_chain_gc = tbl->hash_chain_gc,
+ .ndtc_proxy_qlen = tbl->proxy_queue.qlen,
+ };
+
+ RTA_PUT(skb, NDTA_CONFIG, sizeof(ndc), &ndc);
+ }
+
+ {
+ int cpu;
+ struct ndt_stats ndst;
+
+ memset(&ndst, 0, sizeof(ndst));
+
+ for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ struct neigh_statistics *st;
+
+ if (!cpu_possible(cpu))
+ continue;
+
+ st = per_cpu_ptr(tbl->stats, cpu);
+ ndst.ndts_allocs += st->allocs;
+ ndst.ndts_destroys += st->destroys;
+ ndst.ndts_hash_grows += st->hash_grows;
+ ndst.ndts_res_failed += st->res_failed;
+ ndst.ndts_lookups += st->lookups;
+ ndst.ndts_hits += st->hits;
+ ndst.ndts_rcv_probes_mcast += st->rcv_probes_mcast;
+ ndst.ndts_rcv_probes_ucast += st->rcv_probes_ucast;
+ ndst.ndts_periodic_gc_runs += st->periodic_gc_runs;
+ ndst.ndts_forced_gc_runs += st->forced_gc_runs;
+ }
+
+ RTA_PUT(skb, NDTA_STATS, sizeof(ndst), &ndst);
+ }
+
+ BUG_ON(tbl->parms.dev);
+ if (neightbl_fill_parms(skb, &tbl->parms) < 0)
+ goto rtattr_failure;
+
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_END(skb, nlh);
+
+rtattr_failure:
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_CANCEL(skb, nlh);
+
+nlmsg_failure:
+ return -1;
+}
+
+static int neightbl_fill_param_info(struct neigh_table *tbl,
+ struct neigh_parms *parms,
+ struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ struct ndtmsg *ndtmsg;
+ struct nlmsghdr *nlh;
+
+ nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg));
+ ndtmsg = NLMSG_DATA(nlh);
+
+ NLMSG_SET_MULTIPART(nlh);
+
+ read_lock_bh(&tbl->lock);
+ ndtmsg->ndtm_family = tbl->family;
+ RTA_PUT_STRING(skb, NDTA_NAME, tbl->id);
+
+ if (neightbl_fill_parms(skb, parms) < 0)
+ goto rtattr_failure;
+
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_END(skb, nlh);
+
+rtattr_failure:
+ read_unlock_bh(&tbl->lock);
+ return NLMSG_CANCEL(skb, nlh);
+
+nlmsg_failure:
+ return -1;
+}
+
+static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl,
+ int ifindex)
+{
+ struct neigh_parms *p;
+
+ for (p = &tbl->parms; p; p = p->next)
+ if ((p->dev && p->dev->ifindex == ifindex) ||
+ (!p->dev && !ifindex))
+ return p;
+
+ return NULL;
+}
+
+int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+ struct neigh_table *tbl;
+ struct ndtmsg *ndtmsg = NLMSG_DATA(nlh);
+ struct rtattr **tb = arg;
+ int err = -EINVAL;
+
+ if (!tb[NDTA_NAME - 1] || !RTA_PAYLOAD(tb[NDTA_NAME - 1]))
+ return -EINVAL;
+
+ read_lock(&neigh_tbl_lock);
+ for (tbl = neigh_tables; tbl; tbl = tbl->next) {
+ if (ndtmsg->ndtm_family && tbl->family != ndtmsg->ndtm_family)
+ continue;
+
+ if (!rtattr_strcmp(tb[NDTA_NAME - 1], tbl->id))
+ break;
+ }
+
+ if (tbl == NULL) {
+ err = -ENOENT;
+ goto errout;
+ }
+
+ /*
+ * We acquire tbl->lock to be nice to the periodic timers and
+ * make sure they always see a consistent set of values.
+ */
+ write_lock_bh(&tbl->lock);
+
+ if (tb[NDTA_THRESH1 - 1])
+ tbl->gc_thresh1 = RTA_GET_U32(tb[NDTA_THRESH1 - 1]);
+
+ if (tb[NDTA_THRESH2 - 1])
+ tbl->gc_thresh2 = RTA_GET_U32(tb[NDTA_THRESH2 - 1]);
+
+ if (tb[NDTA_THRESH3 - 1])
+ tbl->gc_thresh3 = RTA_GET_U32(tb[NDTA_THRESH3 - 1]);
+
+ if (tb[NDTA_GC_INTERVAL - 1])
+ tbl->gc_interval = RTA_GET_MSECS(tb[NDTA_GC_INTERVAL - 1]);
+
+ if (tb[NDTA_PARMS - 1]) {
+ struct rtattr *tbp[NDTPA_MAX];
+ struct neigh_parms *p;
+ u32 ifindex = 0;
+
+ if (rtattr_parse_nested(tbp, NDTPA_MAX, tb[NDTA_PARMS - 1]) < 0)
+ goto rtattr_failure;
+
+ if (tbp[NDTPA_IFINDEX - 1])
+ ifindex = RTA_GET_U32(tbp[NDTPA_IFINDEX - 1]);
+
+ p = lookup_neigh_params(tbl, ifindex);
+ if (p == NULL) {
+ err = -ENOENT;
+ goto rtattr_failure;
+ }
+
+ if (tbp[NDTPA_QUEUE_LEN - 1])
+ p->queue_len = RTA_GET_U32(tbp[NDTPA_QUEUE_LEN - 1]);
+
+ if (tbp[NDTPA_PROXY_QLEN - 1])
+ p->proxy_qlen = RTA_GET_U32(tbp[NDTPA_PROXY_QLEN - 1]);
+
+ if (tbp[NDTPA_APP_PROBES - 1])
+ p->app_probes = RTA_GET_U32(tbp[NDTPA_APP_PROBES - 1]);
+
+ if (tbp[NDTPA_UCAST_PROBES - 1])
+ p->ucast_probes =
+ RTA_GET_U32(tbp[NDTPA_UCAST_PROBES - 1]);
+
+ if (tbp[NDTPA_MCAST_PROBES - 1])
+ p->mcast_probes =
+ RTA_GET_U32(tbp[NDTPA_MCAST_PROBES - 1]);
+
+ if (tbp[NDTPA_BASE_REACHABLE_TIME - 1])
+ p->base_reachable_time =
+ RTA_GET_MSECS(tbp[NDTPA_BASE_REACHABLE_TIME - 1]);
+
+ if (tbp[NDTPA_GC_STALETIME - 1])
+ p->gc_staletime =
+ RTA_GET_MSECS(tbp[NDTPA_GC_STALETIME - 1]);
+
+ if (tbp[NDTPA_DELAY_PROBE_TIME - 1])
+ p->delay_probe_time =
+ RTA_GET_MSECS(tbp[NDTPA_DELAY_PROBE_TIME - 1]);
+
+ if (tbp[NDTPA_RETRANS_TIME - 1])
+ p->retrans_time =
+ RTA_GET_MSECS(tbp[NDTPA_RETRANS_TIME - 1]);
+
+ if (tbp[NDTPA_ANYCAST_DELAY - 1])
+ p->anycast_delay =
+ RTA_GET_MSECS(tbp[NDTPA_ANYCAST_DELAY - 1]);
+
+ if (tbp[NDTPA_PROXY_DELAY - 1])
+ p->proxy_delay =
+ RTA_GET_MSECS(tbp[NDTPA_PROXY_DELAY - 1]);
+
+ if (tbp[NDTPA_LOCKTIME - 1])
+ p->locktime = RTA_GET_MSECS(tbp[NDTPA_LOCKTIME - 1]);
+ }
+
+ err = 0;
+
+rtattr_failure:
+ write_unlock_bh(&tbl->lock);
+errout:
+ read_unlock(&neigh_tbl_lock);
+ return err;
+}
+
+int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ int idx, family;
+ int s_idx = cb->args[0];
+ struct neigh_table *tbl;
+
+ family = ((struct rtgenmsg *)NLMSG_DATA(cb->nlh))->rtgen_family;
+
+ read_lock(&neigh_tbl_lock);
+ for (tbl = neigh_tables, idx = 0; tbl; tbl = tbl->next) {
+ struct neigh_parms *p;
+
+ if (idx < s_idx || (family && tbl->family != family))
+ continue;
+
+ if (neightbl_fill_info(tbl, skb, cb) <= 0)
+ break;
+
+ for (++idx, p = tbl->parms.next; p; p = p->next, idx++) {
+ if (idx < s_idx)
+ continue;
+
+ if (neightbl_fill_param_info(tbl, p, skb, cb) <= 0)
+ goto out;
+ }
+
+ }
+out:
+ read_unlock(&neigh_tbl_lock);
+ cb->args[0] = idx;
+
+ return skb->len;
+}
static int neigh_fill_info(struct sk_buff *skb, struct neighbour *n,
u32 pid, u32 seq, int event)
@@ -2352,6 +2661,8 @@
EXPORT_SYMBOL(neigh_update_hhs);
EXPORT_SYMBOL(pneigh_enqueue);
EXPORT_SYMBOL(pneigh_lookup);
+EXPORT_SYMBOL(neightbl_dump_info);
+EXPORT_SYMBOL(neightbl_set);
#ifdef CONFIG_ARPD
EXPORT_SYMBOL(neigh_app_ns);
Index: net/core/rtnetlink.c
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/rtnetlink.c (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/rtnetlink.c (mode:100644)
@@ -100,6 +100,7 @@
[RTM_FAM(RTM_NEWPREFIX)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
[RTM_FAM(RTM_GETMULTICAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
[RTM_FAM(RTM_GETANYCAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
+ [RTM_FAM(RTM_NEWNEIGHTBL)] = NLMSG_LENGTH(sizeof(struct ndtmsg)),
};
static const int rta_max[RTM_NR_FAMILIES] =
@@ -113,6 +114,7 @@
[RTM_FAM(RTM_NEWTCLASS)] = TCA_MAX,
[RTM_FAM(RTM_NEWTFILTER)] = TCA_MAX,
[RTM_FAM(RTM_NEWACTION)] = TCAA_MAX,
+ [RTM_FAM(RTM_NEWNEIGHTBL)] = NDTA_MAX,
};
void __rta_fill(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
@@ -649,14 +651,16 @@
static struct rtnetlink_link link_rtnetlink_table[RTM_NR_MSGTYPES] =
{
- [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo },
- [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink },
- [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
- [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
- [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add },
- [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete },
- [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info },
- [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo },
+ [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink },
+ [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add },
+ [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete },
+ [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info },
+ [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all },
+ [RTM_GETNEIGHTBL - RTM_BASE] = { .dumpit = neightbl_dump_info },
+ [RTM_SETNEIGHTBL - RTM_BASE] = { .doit = neightbl_set },
};
static int rtnetlink_event(struct notifier_block *this, unsigned long event, void *ptr)
Index: security/selinux/nlmsgtab.c
===================================================================
--- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/security/selinux/nlmsgtab.c (mode:100644)
+++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/security/selinux/nlmsgtab.c (mode:100644)
@@ -63,6 +63,8 @@
{ RTM_GETPREFIX, NETLINK_ROUTE_SOCKET__NLMSG_READ },
{ RTM_GETMULTICAST, NETLINK_ROUTE_SOCKET__NLMSG_READ },
{ RTM_GETANYCAST, NETLINK_ROUTE_SOCKET__NLMSG_READ },
+ { RTM_GETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_READ },
+ { RTM_SETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
};
static struct nlmsg_perm nlmsg_firewall_perms[] =
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-26 22:17 ` David S. Miller
2005-05-26 22:24 ` Thomas Graf
@ 2005-05-26 22:26 ` Thomas Graf
2005-05-26 22:37 ` David S. Miller
1 sibling, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 22:26 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
* David S. Miller <20050526.151726.85409972.davem@davemloft.net> 2005-05-26 15:17
> From: Thomas Graf <tgraf@suug.ch>
> Date: Thu, 26 May 2005 20:55:26 +0200
>
> > This message is followed by 0..n messages carrying device
> > specific parameter sets. They all share the same value for
> > Although the ordering should be sufficient, NDTA_NAME can be
>
> It seems a sentence is chopped off here mid-stream,
> please provide a fixed changelog so I can apply your
> patches.
Sorry, it's what happens if one gets interrupted in the middle
of writing a longer comment. ;-> Patch edited by hand but
it should still apply.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-26 22:26 ` Thomas Graf
@ 2005-05-26 22:37 ` David S. Miller
0 siblings, 0 replies; 26+ messages in thread
From: David S. Miller @ 2005-05-26 22:37 UTC (permalink / raw)
To: tgraf; +Cc: netdev
From: Thomas Graf <tgraf@suug.ch>
Date: Fri, 27 May 2005 00:26:56 +0200
> Sorry, it's what happens if one gets interrupted in the middle
> of writing a longer comment. ;-> Patch edited by hand but
> it should still apply.
Thanks.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf
2005-05-26 22:17 ` David S. Miller
@ 2005-05-27 11:14 ` jamal
2005-05-27 12:15 ` Thomas Graf
1 sibling, 1 reply; 26+ messages in thread
From: jamal @ 2005-05-27 11:14 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
On Thu, 2005-26-05 at 20:55 +0200, Thomas Graf wrote:
> To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
> NLM_F_DUMP flag set. Every neighbour table configuration is
> spread over multiple messages to avoid running into message
> size limits on systems with many interfaces. The first message
> in the sequence transports all not device specific data such as
> statistics, configuration, and the default parameter set.
> This message is followed by 0..n messages carrying device
> specific parameter sets. They all share the same value for
> Although the ordering should be sufficient, NDTA_NAME can be
> used to identify sequences. The initial message can be identified
> by checking for NDTA_CONFIG. The device specific messages do
> no contain this TLV but have NDTPA_IFINDEX set to the
> corresponding interface index.
>
> To change neighbour table attributes, send RTM_SETNEIGHTBL
> with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
> NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
> otherwise. Device specific parameter sets can be changed by
> setting NDTPA_IFINDEX to the interface index of the corresponding
> device.
It would have been better, IMO, to just work on a generic idev/devinet
retrieval and setting instead of one component (as in ARP in this
case).
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-27 11:14 ` jamal
@ 2005-05-27 12:15 ` Thomas Graf
2005-05-27 13:50 ` jamal
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-27 12:15 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, netdev
* jamal <1117192464.6688.3.camel@localhost.localdomain> 2005-05-27 07:14
> It would have been better, IMO, to just work on a generic idev/devinet
> retrieval and setting instead of one component (as in ARP in this
> case).
I thought about adding a devinet component to allow retrieving/setting
device specific parameters (along with other settings) but dropped the
idea because I did not want to implement the data structures twice.
I think this is cleaner, it allows one request to be sent to see
the complete arp/ndisc configuration rather than sending one to
retrieve the global configuration/statistics and another one
to retrieve device specific parameters.
Nevertheless, I'm open for changes so once I've written the devinet
component to change rp_filter, forwarding, arp_filter, etc. we can
still move the device specific NDTA_PARMS to the devinet component
if we still think it would be better. For now, the neightbl component
is clean, small, well defined and probably never needs to be touched
again.
Is that acceptable for you?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-27 12:15 ` Thomas Graf
@ 2005-05-27 13:50 ` jamal
[not found] ` <20050527141023.GP15391@postel.suug.ch>
0 siblings, 1 reply; 26+ messages in thread
From: jamal @ 2005-05-27 13:50 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
On Fri, 2005-27-05 at 14:15 +0200, Thomas Graf wrote:
> * jamal <1117192464.6688.3.camel@localhost.localdomain> 2005-05-27 07:14
> > It would have been better, IMO, to just work on a generic idev/devinet
> > retrieval and setting instead of one component (as in ARP in this
> > case).
>
> I thought about adding a devinet component to allow retrieving/setting
> device specific parameters (along with other settings) but dropped the
> idea because I did not want to implement the data structures twice.
>
> I think this is cleaner, it allows one request to be sent to see
> the complete arp/ndisc configuration rather than sending one to
> retrieve the global configuration/statistics and another one
> to retrieve device specific parameters.
>
> Nevertheless, I'm open for changes so once I've written the devinet
> component to change rp_filter, forwarding, arp_filter, etc. we can
> still move the device specific NDTA_PARMS to the devinet component
> if we still think it would be better. For now, the neightbl component
> is clean, small, well defined and probably never needs to be touched
> again.
>
> Is that acceptable for you?
>
I think depends on where we are going. My view is that the current
abstraction model is fine for this specific bit.
i.e from an abstraction point of view these parameters are per device
per protocol (v4/v6/etc).
In other words, if i was to draw a model diagram it would show:
--> netdevice
----> v4 addresses
----> v4 netdevice configuration
----> v6 addresses
----> v6 netdevice configuration
i.e the best way to do it imo, especially since you are doing this from
scratch, is to maintain that model. Your patch breaks away from this.
The only thing that is not devconfig queriable or settable at the moment
is the stats. That i can certainly see as being part of the standard
neighbor details for example. Infact this would apply elsewhere as well,
for example in the FIB where some stats are being displayed via /proc at
the moment.
We actually did discuss this a while back and i had an incomplete patch
for doing it the way i suggest above which i unfortunately cant seem to
locate at the moment.
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
[not found] ` <20050527141023.GP15391@postel.suug.ch>
@ 2005-05-27 14:57 ` jamal
2005-05-27 15:16 ` Thomas Graf
0 siblings, 1 reply; 26+ messages in thread
From: jamal @ 2005-05-27 14:57 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
On Fri, 2005-27-05 at 16:10 +0200, Thomas Graf wrote:
> * jamal <1117201853.6383.29.camel@localhost.localdomain> 2005-05-27 09:50
> > In other words, if i was to draw a model diagram it would show:
> > --> netdevice
> > ----> v4 addresses
> > ----> v4 netdevice configuration
> > ----> v6 addresses
> > ----> v6 netdevice configuration
> >
> > i.e the best way to do it imo, especially since you are doing this from
> > scratch, is to maintain that model. Your patch breaks away from this.
>
> I think this model is only valid for neighbour table parameter set
> and not for neighbour tables themselves.
>
Yes, that's true. In other words, neighbor tables details are out of
scope of that model and are more appropriate to the v4/6 neighbor "box".
Anything else that is netdevice related or connected as in the above
model (as opposed to say the ARP "box" related detail) is better to
leave or add it to be part of devconfig if it doesnt exist.
To be precise there are things in your patch that are fine to leave
there; but the most it seems to me belong to the devconfig netlink path
(which doesnt exist today).
> > The only thing that is not devconfig queriable or settable at the moment
> > is the stats. That i can certainly see as being part of the standard
> > neighbor details for example. Infact this would apply elsewhere as well,
> > for example in the FIB where some stats are being displayed via /proc at
> > the moment.
>
> This is not correct, struct ndt_config (read-only configuration, sort
> of like statistics), gc thresholds, gc interval, and the default parameter
> set are also not devconfig queriable. From a user point of view the
> most commonly changed values will be the gc thresholds, gc interval
> and parameters in the default set and if you forget about the device
> specific parameters for a second it would be pretty obvious to not put
> them into devconfig. So my point is that I don't want to put the most
> obvious and common use into something where it doesn't really fit.
The thresholds etc are there - just not netlink accessible. Example, the
default values for V4 are settable or queriable via:
/proc/sys/net/ipv4/neigh/default/gc_thresh3
/proc/sys/net/ipv4/neigh/default/gc_thresh2
/proc/sys/net/ipv4/neigh/default/gc_thresh1
/proc/sys/net/ipv4/neigh/default/gc_interval
> I do agree that the device specific parameters fit into a devconfig
> or whathever we'll call it but not the whole neighbour table
> configuration bits.
>
Agreed that some things are specific to the "ARP" or "NDISC" block
but not all that you have belongs to those boxes.
> > We actually did discuss this a while back and i had an incomplete patch
> > for doing it the way i suggest above which i unfortunately cant seem to
> > locate at the moment.
>
> Can't remember to have seen such as mail or patch, do you remember
> the thread you posted this? I posted the patch 3 times and I just
> read through the archives again and can't find any such commments.
I never posted the patch unfortunately but mentioned it in the
discussions. What counts is we are having this discussion now;->
I apologize i didnt comment on the patch earlier - sometimes when i am
away from the list for sometime i skip threads which may require me to
spend time looking at them to make any useful comments. This maybe the
reason i didnt respond. To catchup i normally read emails addressed to
me first then go over the list in a LIFO mode and stop once i get
engaged in a discussion.
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-27 14:57 ` jamal
@ 2005-05-27 15:16 ` Thomas Graf
2005-05-27 15:56 ` jamal
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-27 15:16 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, netdev
* jamal <1117205822.6383.68.camel@localhost.localdomain> 2005-05-27 10:57
> The thresholds etc are there - just not netlink accessible. Example, the
> default values for V4 are settable or queriable via:
>
> /proc/sys/net/ipv4/neigh/default/gc_thresh3
> /proc/sys/net/ipv4/neigh/default/gc_thresh2
> /proc/sys/net/ipv4/neigh/default/gc_thresh1
> /proc/sys/net/ipv4/neigh/default/gc_interval
Sorry but this is just one big hack:
a) they do not belong there, the first implication a user does when
he sees default/, dev1/, dev2/, devN is that default/ covers
the same set of parameters as devX/. Therefore he expects all
parameters in default/ to be also in devX/.
b) struct neigh_parms parms;
/* HACK. gc_* shoul follow parms without a gap! */
int gc_interval;
int gc_thresh1;
int gc_thresh2;
int gc_thresh3;
...
if (dev) {
dev_name_source = dev->name;
t->neigh_dev[0].ctl_name = dev->ifindex;
t->neigh_vars[12].procname = NULL; <- gc_interval
t->neigh_vars[13].procname = NULL; <- gc_thresh1
t->neigh_vars[14].procname = NULL; <- gc_thresh2
t->neigh_vars[15].procname = NULL; <- gc_thresh3
} else {
dev_name_source = t->neigh_dev[0].procname;
t->neigh_vars[12].data = (int *)(p + 1);
t->neigh_vars[13].data = (int *)(p + 1) + 1;
t->neigh_vars[14].data = (int *)(p + 1) + 2;
t->neigh_vars[15].data = (int *)(p + 1) + 3;
}
I will not push along these hacks into my code ;->
Some more notes on the devconfig idea: I'm not sure if you are still
implying devconfig as devinet but if so: although we only have neighbour
tables on devinets so far there might be neighbour table on other
interfaces for whatever reason. A protocol might allocate a neighbour
parameter set on any netdevice which makes me think that the netlink
neighbour code should be open for this as well. It depends on how we
do the devconfig thing, so what I suggest it so leave this as-is for
the moment and make a decision once we know more about how the devconfig
thing will look like. Reasonable?
> I apologize i didnt comment on the patch earlier - sometimes when i am
> away from the list for sometime i skip threads which may require me to
> spend time looking at them to make any useful comments. This maybe the
> reason i didnt respond. To catchup i normally read emails addressed to
> me first then go over the list in a LIFO mode and stop once i get
> engaged in a discussion.
I should have CCed you, my fault.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-27 15:16 ` Thomas Graf
@ 2005-05-27 15:56 ` jamal
2005-05-27 16:35 ` Thomas Graf
0 siblings, 1 reply; 26+ messages in thread
From: jamal @ 2005-05-27 15:56 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
On Fri, 2005-27-05 at 17:16 +0200, Thomas Graf wrote:
> * jamal <1117205822.6383.68.camel@localhost.localdomain> 2005-05-27 10:57
> > The thresholds etc are there - just not netlink accessible. Example, the
> > default values for V4 are settable or queriable via:
> >
> > /proc/sys/net/ipv4/neigh/default/gc_thresh3
> > /proc/sys/net/ipv4/neigh/default/gc_thresh2
> > /proc/sys/net/ipv4/neigh/default/gc_thresh1
> > /proc/sys/net/ipv4/neigh/default/gc_interval
>
> Sorry but this is just one big hack:
It maybe in the user space representation, but not in the kernel
abstraction which is what i was refering to.
In other words look at struct in_device;
> a) they do not belong there, the first implication a user does when
> he sees default/, dev1/, dev2/, devN is that default/ covers
> the same set of parameters as devX/. Therefore he expects all
> parameters in default/ to be also in devX/.
>
The deafult can be overriden by devX. So they dont need to sync.
But this is a separate topic
> b) struct neigh_parms parms;
> /* HACK. gc_* shoul follow parms without a gap! */
> int gc_interval;
> int gc_thresh1;
> int gc_thresh2;
> int gc_thresh3;
>
> ...
>
> if (dev) {
> dev_name_source = dev->name;
> t->neigh_dev[0].ctl_name = dev->ifindex;
> t->neigh_vars[12].procname = NULL; <- gc_interval
> t->neigh_vars[13].procname = NULL; <- gc_thresh1
> t->neigh_vars[14].procname = NULL; <- gc_thresh2
> t->neigh_vars[15].procname = NULL; <- gc_thresh3
> } else {
> dev_name_source = t->neigh_dev[0].procname;
> t->neigh_vars[12].data = (int *)(p + 1);
> t->neigh_vars[13].data = (int *)(p + 1) + 1;
> t->neigh_vars[14].data = (int *)(p + 1) + 2;
> t->neigh_vars[15].data = (int *)(p + 1) + 3;
> }
>
> I will not push along these hacks into my code ;->
>
I am afraid you are looking at this from the wrong angle (user space),
Thomas ;->
The abstraction in the kernel is proper.
To redraw that model again with the exact structure names:
netdevice ->
L2 config stuff
config stuff
more config stuff
..
..
netdevice protocol config:
-> v4 specific (struct in_device)
----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc
-> v6 specific (struct inet6_dev)
----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc
There are a few more items but i am leaving them out for brevity.
I hope it makes more sense now.
> Some more notes on the devconfig idea: I'm not sure if you are still
> implying devconfig as devinet but if so: although we only have neighbour
> tables on devinets so far there might be neighbour table on other
> interfaces for whatever reason.
in_devxxx types; sorry, I created the confusion by mentioning devconf in
the sweep.
> A protocol might allocate a neighbour
> parameter set on any netdevice which makes me think that the netlink
> neighbour code should be open for this as well. It depends on how we
> do the devconfig thing, so what I suggest it so leave this as-is for
> the moment and make a decision once we know more about how the devconfig
> thing will look like. Reasonable?
Take a look at what i said above and see if it makes sense.
Note, I am not objecting to the patch like i said - I think some of
those items are must put in that patch. Whatever simplifies life for
you. I am not exactly a purist but at times i like getting it right from
the beggining.
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-27 15:56 ` jamal
@ 2005-05-27 16:35 ` Thomas Graf
2005-05-28 1:42 ` jamal
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-27 16:35 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, netdev
* jamal <1117209411.6383.104.camel@localhost.localdomain> 2005-05-27 11:56
> On Fri, 2005-27-05 at 17:16 +0200, Thomas Graf wrote:
> > Sorry but this is just one big hack:
>
> It maybe in the user space representation, but not in the kernel
> abstraction which is what i was refering to.
> In other words look at struct in_device;
Yes, it references the arp device specific parameters and so
does inet6_dev for the ndisc cache. But this repsents only the
current state, we might have neighbour table parameter sets
which are not bound to inetdevs in the future. However, I
regard this as minor and I can agree on moving the device
specific parameters into a inetdev based architecture but
I do NOT agree on moving gc_ into this architecture as well,
it doesn't belong there.
Nitpicking for a bit, although inet6_dev and in_device hold
reference to the their arp respectively ndisc parameter set,
the sysctl interface does not use this reference but stores
the ifindex of the netdevice, _which_ is correct I think
because parameter sets are _not_ limited to inetdevs in terms
of architecture but only in terms of current use.
> The deafult can be overriden by devX. So they dont need to sync.
> But this is a separate topic
I was not talking about in-sync but rather that gc_* only
appears in default/ but not in devX/. What I expect is that
every default parameter can be overwritten in devX which
is not true for gc_*. Which is the reason why I implemented
them outside of the NDTPA_PARMS nested TLV.
> I am afraid you are looking at this from the wrong angle (user space),
> Thomas ;->
> The abstraction in the kernel is proper.
>
> To redraw that model again with the exact structure names:
>
> netdevice ->
> L2 config stuff
> config stuff
> more config stuff
> ..
> ..
> netdevice protocol config:
> -> v4 specific (struct in_device)
> ----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc
> -> v6 specific (struct inet6_dev)
> ----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc
>
> There are a few more items but i am leaving them out for brevity.
> I hope it makes more sense now.
I understand your architecture and if we follow this thought
we'd have a "default" netdevice which repesents all default
settings. I do agree with this architecture but the problematic
question remains: Do we want parameters in "default" which are
not available in devX? I think this question is what it gets
down to in the end. If we say, yes we do want this, then we
can implement all generic settings, such as tcp_, using this
scheme as well. I don't disagree with this completely but I
find it not very intuitive from a user perspective.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-27 16:35 ` Thomas Graf
@ 2005-05-28 1:42 ` jamal
2005-05-28 12:07 ` Thomas Graf
0 siblings, 1 reply; 26+ messages in thread
From: jamal @ 2005-05-28 1:42 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
On Fri, 2005-27-05 at 18:35 +0200, Thomas Graf wrote:
> I do NOT agree on moving gc_ into this architecture as well,
> it doesn't belong there.
Well, you do realize they are part of the in_dev ? ;->
> Nitpicking for a bit, although inet6_dev and in_device hold
> reference to the their arp respectively ndisc parameter set,
> the sysctl interface does not use this reference but stores
> the ifindex of the netdevice, _which_ is correct I think
> because parameter sets are _not_ limited to inetdevs in terms
> of architecture but only in terms of current use.
>
I see a little main service header with ifindex always no different than
IFA or IFLINK etc followed by appropriate TLVs which are nested.
Unfortunately i still cant find the patch - i started with a different
approach; my immediate interest was to get events when someone made
in_device changes. BTW, this is going to be one of the main challenges
since there are many paths to configure these things.
> > The deafult can be overriden by devX. So they dont need to sync.
> > But this is a separate topic
>
> I was not talking about in-sync but rather that gc_* only
> appears in default/ but not in devX/. What I expect is that
> every default parameter can be overwritten in devX which
> is not true for gc_*. Which is the reason why I implemented
> them outside of the NDTPA_PARMS nested TLV.
>
Well, if you look at the structure there is no reason they should really
be separate; infact theres a comment:
-----------
struct neigh_parms parms;
/* HACK. gc_* shoul follow parms without a gap! */
int gc_interval;
int gc_thresh1;
int gc_thresh2;
----------
To me the abstraction is pretty clear. I would agree that the way
parameters configurable from user space and some of the methods may not
be the best in terms of neighbor tables organization.
> I understand your architecture and if we follow this thought
> we'd have a "default" netdevice which repesents all default
> settings.
>From looking at the code, the default stuff seems to be "hardcoded".
Example in the definition arp_tbl.
> I do agree with this architecture but the problematic
> question remains: Do we want parameters in "default" which are
> not available in devX? I think this question is what it gets
> down to in the end. If we say, yes we do want this, then we
> can implement all generic settings, such as tcp_, using this
> scheme as well. I don't disagree with this completely but I
> find it not very intuitive from a user perspective.
The model like i said is clean. There are some issues i have qualms with
- such as IP address arrangements and tight integration with netdevices
- but those can addressed at a later time.
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-28 1:42 ` jamal
@ 2005-05-28 12:07 ` Thomas Graf
2005-05-31 10:04 ` jamal
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-28 12:07 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, netdev
* jamal <1117244567.6251.34.camel@localhost.localdomain> 2005-05-27 21:42
> On Fri, 2005-27-05 at 18:35 +0200, Thomas Graf wrote:
>
> > I do NOT agree on moving gc_ into this architecture as well,
> > it doesn't belong there.
>
> Well, you do realize they are part of the in_dev ? ;->
Huh? They cannot be device specific, there is only one gc
per neighbour table. So even _iff_ there was a device specific
setting it wouldn't make much sense ;->
> I see a little main service header with ifindex always no different than
> IFA or IFLINK etc followed by appropriate TLVs which are nested.
I guess we could simply move NDTPA_PARMS into the devconfig family.
>
> Unfortunately i still cant find the patch - i started with a different
> approach; my immediate interest was to get events when someone made
> in_device changes. BTW, this is going to be one of the main challenges
> since there are many paths to configure these things.
Right.
> Well, if you look at the structure there is no reason they should really
> be separate; infact theres a comment:
> -----------
> struct neigh_parms parms;
> /* HACK. gc_* shoul follow parms without a gap! */
> int gc_interval;
> int gc_thresh1;
> int gc_thresh2;
> ----------
You do realize the reason for that comment? ;-> The author of
the neighbour procfs code was simply too lazy to put these
into a separate place so he introduced a nasty hack.
> > I understand your architecture and if we follow this thought
> > we'd have a "default" netdevice which repesents all default
> > settings.
>
> >From looking at the code, the default stuff seems to be "hardcoded".
> Example in the definition arp_tbl.
These are just the default values for the default parameter set.
> The model like i said is clean. There are some issues i have qualms with
> - such as IP address arrangements and tight integration with netdevices
> - but those can addressed at a later time.
What about this, we move the device specific part into the new devconfig
layer but leave the main configuration, statistics, and gc parameters
in RTM_NEIGHTBL? i.e.
arp_cache entries 2 reachable-time 23s 993msec retransmit-time 1s
key-len 4 entry-size 152 last-flush 55m 44s 572msec
gc threshold 128/512/1024 interval 30s chain-position 3
hash-rand 0x69650257/0x00000003 last-rand 1m 44s 571msec
? refcnt 1 pending-queue-limit 3 proxy-delayed-queue-limit 64
? num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3
? min-age 1s base-reachable-time 30s stale-check-interval 1m
? initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec
lookups 195 hits 190 failed 0 allocations 3 destroys 1
hash-grows 1 forced-gc-runs 0 periodic-gc-runs 910
rcv-unicast-probes 0 rcv-multicast-probes 0
Xarp_cache<eth0> reachable-time 35s 967msec retransmit-time 1s
X refcnt 3 pending-queue-limit 3 proxy-delayed-queue-limit 64
X num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3
X min-age 1s base-reachable-time 30s stale-check-interval 1m
X initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec
Everything marked with X is clearly device specific so it will move.
Everything marked with '?' is the default parameter set, we can either
leave it or move it as well and put it under into a 'default' ifindex.
I don't care about the latter, either is fine with me.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-28 12:07 ` Thomas Graf
@ 2005-05-31 10:04 ` jamal
2005-05-31 11:42 ` Thomas Graf
0 siblings, 1 reply; 26+ messages in thread
From: jamal @ 2005-05-31 10:04 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
Doesnt seem like i responded to this.
On Sat, 2005-28-05 at 14:07 +0200, Thomas Graf wrote:
> * jamal <1117244567.6251.34.camel@localhost.localdomain> 2005-05-27 21:42
> > On Fri, 2005-27-05 at 18:35 +0200, Thomas Graf wrote:
> >
> > > I do NOT agree on moving gc_ into this architecture as well,
> > > it doesn't belong there.
> >
> > Well, you do realize they are part of the in_dev ? ;->
>
> Huh? They cannot be device specific, there is only one gc
> per neighbour table. So even _iff_ there was a device specific
> setting it wouldn't make much sense ;->
>
I keep forgeting some of these tables are system wide.
But i think it doesnt matter as long as we comply to the abstration
thats in the kernel; in other words, config access is still via the
in_devxx. i.e if you supply any ifindex (since all devices _at the
moment_ point to the same ARP/ndisc table), it can be accessed and
configured.
> > I see a little main service header with ifindex always no different than
> > IFA or IFLINK etc followed by appropriate TLVs which are nested.
>
> I guess we could simply move NDTPA_PARMS into the devconfig family.
Probably a good start. This would still be missing the gc thresholds
etc, no?
> > Well, if you look at the structure there is no reason they should really
> > be separate; infact theres a comment:
> > -----------
> > struct neigh_parms parms;
> > /* HACK. gc_* shoul follow parms without a gap! */
> > int gc_interval;
> > int gc_thresh1;
> > int gc_thresh2;
> > ----------
>
> You do realize the reason for that comment? ;-> The author of
> the neighbour procfs code was simply too lazy to put these
> into a separate place so he introduced a nasty hack.
>
I think you are correct.
> What about this, we move the device specific part into the new devconfig
> layer but leave the main configuration, statistics, and gc parameters
> in RTM_NEIGHTBL? i.e.
>
> arp_cache entries 2 reachable-time 23s 993msec retransmit-time 1s
> key-len 4 entry-size 152 last-flush 55m 44s 572msec
> gc threshold 128/512/1024 interval 30s chain-position 3
> hash-rand 0x69650257/0x00000003 last-rand 1m 44s 571msec
> ? refcnt 1 pending-queue-limit 3 proxy-delayed-queue-limit 64
> ? num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3
> ? min-age 1s base-reachable-time 30s stale-check-interval 1m
> ? initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec
> lookups 195 hits 190 failed 0 allocations 3 destroys 1
> hash-grows 1 forced-gc-runs 0 periodic-gc-runs 910
> rcv-unicast-probes 0 rcv-multicast-probes 0
>
> Xarp_cache<eth0> reachable-time 35s 967msec retransmit-time 1s
> X refcnt 3 pending-queue-limit 3 proxy-delayed-queue-limit 64
> X num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3
> X min-age 1s base-reachable-time 30s stale-check-interval 1m
> X initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec
>
>
> Everything marked with X is clearly device specific so it will move.
> Everything marked with '?' is the default parameter set, we can either
> leave it or move it as well and put it under into a 'default' ifindex.
> I don't care about the latter, either is fine with me.
>
All "stuffs" (your bases is mine) accessible via in_devxx abstraction
should be devconfig configurable. I do concur that some of it may not
make sense - but that should be the exception rules...
Ok, lets say these tables are one such exception, but note:
In the future there could be multiple ARP tables for example (very
likely given all the virtualization approaches happening). In other
words different devices may point to different tables...
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-31 10:04 ` jamal
@ 2005-05-31 11:42 ` Thomas Graf
2005-05-31 12:48 ` jamal
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-31 11:42 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, netdev
* jamal <1117533847.6134.32.camel@localhost.localdomain> 2005-05-31 06:04
> Probably a good start. This would still be missing the gc thresholds
> etc, no?
Yes, this only includes the parts that I marked with 'X' and '?' below.
> > What about this, we move the device specific part into the new devconfig
> > layer but leave the main configuration, statistics, and gc parameters
> > in RTM_NEIGHTBL? i.e.
> >
> > arp_cache entries 2 reachable-time 23s 993msec retransmit-time 1s
> > key-len 4 entry-size 152 last-flush 55m 44s 572msec
> > gc threshold 128/512/1024 interval 30s chain-position 3
> > hash-rand 0x69650257/0x00000003 last-rand 1m 44s 571msec
> > ? refcnt 1 pending-queue-limit 3 proxy-delayed-queue-limit 64
> > ? num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3
> > ? min-age 1s base-reachable-time 30s stale-check-interval 1m
> > ? initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec
> > lookups 195 hits 190 failed 0 allocations 3 destroys 1
> > hash-grows 1 forced-gc-runs 0 periodic-gc-runs 910
> > rcv-unicast-probes 0 rcv-multicast-probes 0
> >
> > Xarp_cache<eth0> reachable-time 35s 967msec retransmit-time 1s
> > X refcnt 3 pending-queue-limit 3 proxy-delayed-queue-limit 64
> > X num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3
> > X min-age 1s base-reachable-time 30s stale-check-interval 1m
> > X initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec
> >
> >
> > Everything marked with X is clearly device specific so it will move.
> > Everything marked with '?' is the default parameter set, we can either
> > leave it or move it as well and put it under into a 'default' ifindex.
> > I don't care about the latter, either is fine with me.
> >
>
> All "stuffs" (your bases is mine) accessible via in_devxx abstraction
> should be devconfig configurable. I do concur that some of it may not
> make sense - but that should be the exception rules...
> Ok, lets say these tables are one such exception, but note:
> In the future there could be multiple ARP tables for example (very
> likely given all the virtualization approaches happening). In other
> words different devices may point to different tables...
Let's assume for a moment that there are no device specific parameters,
just the default parameter set, e.g everything in the "arp_cache" block.
According do your idea, everything would be part of a faked in_dev entry
with ifindex==0 (or alike). We'd have the exact same issues as with
RTM_NEIGHTBL, the requirement of unique table ids stays the same. There
is one big difference though, what if we ever have neighbour tables which
do not care about inet devices at all? Maybe some kind of new virtual
devices for in-kernel communication? ;-> Sounds scary and not realistic,
but my point is basically that neighbour tables itself have a global scope,
the assignment to the device and device specific parameters is on the
level of the neighbours itself. The connection to in_dev/... is one
level higher with a scope of the relevant neighbour table. A neighbour
table implementation is not required to support device specific
parameters, the code neighbour code will always assign the default
parameter set and leaves it up to the constructor of the implementation
level to reassign a device specific parameter set. Which means that what
the procfs and now also neightbl code does it kind of a hack anyway
since the core neighbour code is not aware of device specific stuff
anyway, all it does is provide an easy method to manage them in a
efficient matter. I would like to stay consistent to this architecture
if possible to be able to follow every new additions.
I hope my point is now clear, I wasn't so sure until now ;-> However,
I think you have a good point, parameter sets have a strong relation
to inet devices at the moment. If you still think that we should move
_everything_ into a devconfig layer then I'll do it but I still support
my initial prospoal.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-31 11:42 ` Thomas Graf
@ 2005-05-31 12:48 ` jamal
2005-05-31 13:17 ` Thomas Graf
0 siblings, 1 reply; 26+ messages in thread
From: jamal @ 2005-05-31 12:48 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
On Tue, 2005-31-05 at 13:42 +0200, Thomas Graf wrote:
> > All "stuffs" (your bases is mine) accessible via in_devxx abstraction
> > should be devconfig configurable. I do concur that some of it may not
> > make sense - but that should be the exception rules...
> > Ok, lets say these tables are one such exception, but note:
> > In the future there could be multiple ARP tables for example (very
> > likely given all the virtualization approaches happening). In other
> > words different devices may point to different tables...
>
> Let's assume for a moment that there are no device specific parameters,
> just the default parameter set, e.g everything in the "arp_cache" block.
> According do your idea, everything would be part of a faked in_dev entry
> with ifindex==0 (or alike). We'd have the exact same issues as with
> RTM_NEIGHTBL, the requirement of unique table ids stays the same. There
> is one big difference though, what if we ever have neighbour tables which
> do not care about inet devices at all? Maybe some kind of new virtual
> devices for in-kernel communication? ;-> Sounds scary and not realistic,
An extra virtual device just to store the defaults will be overkill.
OTOH, a global allocated structure for defaults is ok. I think thats the
way it is done today.
> but my point is basically that neighbour tables itself have a global scope,
> the assignment to the device and device specific parameters is on the
> level of the neighbours itself. The connection to in_dev/... is one
> level higher with a scope of the relevant neighbour table. A neighbour
> table implementation is not required to support device specific
> parameters, the code neighbour code will always assign the default
> parameter set and leaves it up to the constructor of the implementation
> level to reassign a device specific parameter set. Which means that what
> the procfs and now also neightbl code does it kind of a hack anyway
> since the core neighbour code is not aware of device specific stuff
> anyway, all it does is provide an easy method to manage them in a
> efficient matter.
The neighbor code ought not be aware of devices; but whoever is
configuring must be. By configuring i mean "stiching together" the
different blocks (as in some netlink based program in user space). the
idevxxx is a "stiching" construct - and therefore it may be aware of
many other things that a standalone block doesnt. One could argue that
the ifindex in a netdevice is also under the same class. The ARP, NDISC,
netdevice, filters etc are the blocks being sown together.
Most of the times these blocks are built together in a topology that a
packet flows through.
> I would like to stay consistent to this architecture
> if possible to be able to follow every new additions.
>
> I hope my point is now clear, I wasn't so sure until now ;-> However,
> I think you have a good point, parameter sets have a strong relation
> to inet devices at the moment. If you still think that we should move
> _everything_ into a devconfig layer then I'll do it but I still support
> my initial prospoal.
>
I agree that we should leave the neighbor table-specific knobs out of
devconfig.
It will help, though, for devconfig to have a single reference to the
"instance" of the table(perhaps a name or a even a 32 bit pointer
address) that a device uses (thefore maintaining the abstraction); but
since there is only one table for v4 and v6 anyways i suppose it is
implicitly assumed to be that one table.
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-31 12:48 ` jamal
@ 2005-05-31 13:17 ` Thomas Graf
2005-05-31 14:59 ` Jamal Hadi Salim
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-31 13:17 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, netdev
* jamal <1117543711.6134.48.camel@localhost.localdomain> 2005-05-31 08:48
> The neighbor code ought not be aware of devices; but whoever is
> configuring must be. By configuring i mean "stiching together" the
> different blocks (as in some netlink based program in user space). the
> idevxxx is a "stiching" construct - and therefore it may be aware of
> many other things that a standalone block doesnt. One could argue that
> the ifindex in a netdevice is also under the same class. The ARP, NDISC,
> netdevice, filters etc are the blocks being sown together.
> Most of the times these blocks are built together in a topology that a
> packet flows through.
OK, so your idevxxx is a kind of container for various configurable
blocks that logically belong to a inetdevice and a device specific
parameter set for a neighbour table would one of these blocks?
I think this is appropriate for NDTA_PARMS, e.g. there might be
multiple of such neighbour table parameter blocks per inetdevice
identified by the name of the neighbour table. You certainly have
a good point there, I see two minor drawbacks though: a) where
do we put these parameter blocks if it doesn't belong to a
inetdevice? and b) it makes collecting the complete configuration
of a neighbour table complicated. Nevertheless, I think I can
agree on this.
> I agree that we should leave the neighbor table-specific knobs out of
> devconfig.
OK, so we leave RTM_NEIGHTBL_* as-is but move the device specific
parameter sets into devconfig? i.e. RTM_NEIGHTBL_* will cover:
* the core neighbour table configuration (key-len, entry-size,
last-flush, gc thresholds, gc interval, hash settings, etc.)
* the default parameter set
* neighbour table statistics
> It will help, though, for devconfig to have a single reference to the
> "instance" of the table(perhaps a name or a even a 32 bit pointer
> address) that a device uses (thefore maintaining the abstraction); but
> since there is only one table for v4 and v6 anyways i suppose it is
> implicitly assumed to be that one table.
This kind of breaks the architecture, i'd rather have it the other
way around, i.e. we hold a netdevice reference in each parameter set
and when lookup a parameter set for a specific idevxxx we iterate
through all neighbour tables and their paremter sets and compare
ifindex.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-31 13:17 ` Thomas Graf
@ 2005-05-31 14:59 ` Jamal Hadi Salim
2005-05-31 16:13 ` Thomas Graf
0 siblings, 1 reply; 26+ messages in thread
From: Jamal Hadi Salim @ 2005-05-31 14:59 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, David S. Miller
On Tue, 2005-31-05 at 15:17 +0200, Thomas Graf wrote:
>
> OK, so your idevxxx is a kind of container for various configurable
> blocks that logically belong to a inetdevice and a device specific
> parameter set for a neighbour table would one of these blocks?
>
Essentialy just in_device and inet6_dev
> I think this is appropriate for NDTA_PARMS, e.g. there might be
> multiple of such neighbour table parameter blocks per inetdevice
> identified by the name of the neighbour table. You certainly have
> a good point there, I see two minor drawbacks though: a) where
> do we put these parameter blocks if it doesn't belong to a
> inetdevice? and b) it makes collecting the complete configuration
> of a neighbour table complicated. Nevertheless, I think I can
> agree on this.
>
To recap:
netdevice ->
L2 config stuff
config stuff
more config stuff
even more config stuff
..
..
netdevice protocol config:
-> v4 specific (struct in_device)
----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc
-> v6 specific (struct inet6_dev)
----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc
I think we are agreeing _not_ to configure ifa_list, arp_parms etc
via devconfig. Leave these to their already existing config paths.
In the case of devconfig:
What i am suggesting is to put some identifier that can be used
to access lets say one or more arp_parms. This way when i dump
the devconfig i should be able to see "connected to arp table 1".
I should then be able to say dump arp table 1.
If it is too complex an idea we can drop it.
The defaults should stay as you say (and as they exist today) in the
arp block.
> > I agree that we should leave the neighbor table-specific knobs out of
> > devconfig.
>
> OK, so we leave RTM_NEIGHTBL_* as-is but move the device specific
> parameter sets into devconfig? i.e. RTM_NEIGHTBL_* will cover:
>
> * the core neighbour table configuration (key-len, entry-size,
> last-flush, gc thresholds, gc interval, hash settings, etc.)
> * the default parameter set
> * neighbour table statistics
>
Sounds reasonable.
And per device arp/ndisc is of course one of many
configurable parameters via devconfig.
>
> > It will help, though, for devconfig to have a single reference to the
> > "instance" of the table(perhaps a name or a even a 32 bit pointer
> > address) that a device uses (thefore maintaining the abstraction); but
> > since there is only one table for v4 and v6 anyways i suppose it is
> > implicitly assumed to be that one table.
>
> This kind of breaks the architecture, i'd rather have it the other
> way around, i.e. we hold a netdevice reference in each parameter set
> and when lookup a parameter set for a specific idevxxx we iterate
> through all neighbour tables and their paremter sets and compare
> ifindex.
The directionality as i see it is:
netdevice --> arp/ndisc table(s)
[And the binding/stiching of those tables is via inxx_devyy].
I dont have issues with arp pointing to netdevice as well, but I
thought ARP/ndisc already have ifindex reference?
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-31 14:59 ` Jamal Hadi Salim
@ 2005-05-31 16:13 ` Thomas Graf
2005-06-02 13:33 ` jamal
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2005-05-31 16:13 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, David S. Miller
* Jamal Hadi Salim <1117551561.6279.2.camel@localhost.localdomain> 2005-05-31 10:59
> netdevice ->
> L2 config stuff
> config stuff
> more config stuff
> even more config stuff
> ..
> ..
> netdevice protocol config:
> -> v4 specific (struct in_device)
> ----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc
> -> v6 specific (struct inet6_dev)
> ----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc
Absolutely.
> I think we are agreeing _not_ to configure ifa_list, arp_parms etc
> via devconfig. Leave these to their already existing config paths.
arp_parms is just a reference to the corresponding device specific
parameter set of the "arp_cache" neighbour table. Maybe I've been
misunderstanding you throughout all your posts or maybe I've a
wrong understanding of the neighbour code in general. As far as I
understand it:
1) The actual neighbour implementation registers a new neighbour
table by calling neigh_table_init() with the proposed default
configuration for this table. This also includes the default
parameter set used to derive device specific parameter sets
later on, e.g. arp_tbl.
2) Upon the creation of a inet_device/... a new parameter set
is allocated that derives the initial parameter values from
the default parameter set. This parameter set is stored
in in_dev->arp_parms and linked in a list of parameter sets
in the neighbour table (neightbl->parms).
3) When a new neighbour is created the core neighbour code
will assign the default parameter set of the table to
the neighbour and increments its refcnt. It then calls
the constructor of arp,ndisc,... which replaces the
parameter set just assigned with the one in
in_dev->arp_parms which is the parameter set.
So what I propose is to have the neighbour table parameters,
e.g. everything in arp_tbl be distributed over RTM_NEIGHTBL
and put the device specific parameters into devconfig,
e.g. in_dev->arp_parms.
> In the case of devconfig:
> What i am suggesting is to put some identifier that can be used
> to access lets say one or more arp_parms. This way when i dump
> the devconfig i should be able to see "connected to arp table 1".
> I should then be able to say dump arp table 1.
OK, this is possible with either way. I don't worry about this.
> The directionality as i see it is:
>
> netdevice --> arp/ndisc table(s)
> [And the binding/stiching of those tables is via inxx_devyy].
Absolutely, more specific:
netdevice -> inet_device -> parameter set -> neighbour table
or:
neighbour table -> list of parameter sets -> netdevice
both ways are possible right now.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink
2005-05-31 16:13 ` Thomas Graf
@ 2005-06-02 13:33 ` jamal
0 siblings, 0 replies; 26+ messages in thread
From: jamal @ 2005-06-02 13:33 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, netdev
On Tue, 2005-31-05 at 18:13 +0200, Thomas Graf wrote:
[..]
> So what I propose is to have the neighbour table parameters,
> e.g. everything in arp_tbl be distributed over RTM_NEIGHTBL
> and put the device specific parameters into devconfig,
> e.g. in_dev->arp_parms.
>
Right, this is what i am saying a well.
The only caveat i was pointing out is that the devconfig piece is more
than just the neighbor stuff - and of course it hasnt been written,
yet;->
The major challenge will be events - some change via /proc, sysfs etc
should generate event.
I suggest something along usage of notifier_block with something like
NETDEV_CONFIG to transport these things around. Damn, if only i can find
my patch ....
I had already started doing events based on changes from /proc or sysctl
etc.
> Absolutely, more specific:
>
> netdevice -> inet_device -> parameter set -> neighbour table
> or:
> neighbour table -> list of parameter sets -> netdevice
>
> both ways are possible right now.
Sounds good to me.
cheers,
jamal
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2005-06-02 13:33 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf
2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf
2005-05-26 18:54 ` [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts Thomas Graf
2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf
2005-05-26 22:17 ` David S. Miller
2005-05-26 22:24 ` Thomas Graf
2005-05-26 22:26 ` Thomas Graf
2005-05-26 22:37 ` David S. Miller
2005-05-27 11:14 ` jamal
2005-05-27 12:15 ` Thomas Graf
2005-05-27 13:50 ` jamal
[not found] ` <20050527141023.GP15391@postel.suug.ch>
2005-05-27 14:57 ` jamal
2005-05-27 15:16 ` Thomas Graf
2005-05-27 15:56 ` jamal
2005-05-27 16:35 ` Thomas Graf
2005-05-28 1:42 ` jamal
2005-05-28 12:07 ` Thomas Graf
2005-05-31 10:04 ` jamal
2005-05-31 11:42 ` Thomas Graf
2005-05-31 12:48 ` jamal
2005-05-31 13:17 ` Thomas Graf
2005-05-31 14:59 ` Jamal Hadi Salim
2005-05-31 16:13 ` Thomas Graf
2005-06-02 13:33 ` jamal
2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf
2005-05-26 19:00 ` Thomas Graf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).