* [PATCHSET] neighbour tables access via rtnetlink
@ 2005-05-26 18:53 Thomas Graf
2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Thomas Graf @ 2005-05-26 18:53 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
Dave,
This patchset adds neighbour table configuration and
statistics access via rtnetlink. Patches 1-2 add some
new macros to make netlink message construction simpler
and more readable, patch 3 adds the actual neighbour
table code and patch 4 removes a few unused fields
discovered while writing patch 3 ;->
include/linux/netlink.h | 17 ++
include/linux/rtnetlink.h | 45 ++++++
include/net/neighbour.h | 3
include/linux/rtnetlink.h | 107 ++++++++++++++
include/net/neighbour.h | 4
net/core/neighbour.c | 317 +++++++++++++++++++++++++++++++++++++++++++-
net/core/rtnetlink.c | 20 +-
security/selinux/nlmsgtab.c | 2
8 files changed, 499 insertions(+), 16 deletions(-)
^ permalink raw reply [flat|nested] 26+ messages in thread* [PATCH 1/4] [NETLINK] New message building macros 2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf @ 2005-05-26 18:54 ` Thomas Graf 2005-05-26 18:54 ` [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts Thomas Graf ` (2 subsequent siblings) 3 siblings, 0 replies; 26+ messages in thread From: Thomas Graf @ 2005-05-26 18:54 UTC (permalink / raw) To: David S. Miller; +Cc: netdev NLMSG_PUT_ANSWER(skb, nlcb, type, length) Start a new netlink message as answer to a request, returns the message header. NLMSG_END(skb, nlh) End a netlink message, fixes total message length, returns skb->len. NLMSG_CANCEL(skb, nlh) Cancel the building process and trim whole message from skb again, returns -1. Signed-off-by: Thomas Graf <tgraf@suug.ch> --- commit 88e04e26d887e806372014a2f7b33c6a7ea64741 tree 031c0e0d63096ca2dff2203940b90460f1db5c68 parent d1faeaeb95a05275cf0c5b51b88f2fa833434625 author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:21:36 +0200 committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:21:36 +0200 include/linux/netlink.h | 17 +++++++++++++++-- 1 files changed, 15 insertions(+), 2 deletions(-) Index: include/linux/netlink.h =================================================================== --- c6f827347b3c6c28bc55a8eb1e62ea2659202cab/include/linux/netlink.h (mode:100644) +++ 031c0e0d63096ca2dff2203940b90460f1db5c68/include/linux/netlink.h (mode:100644) @@ -171,8 +171,21 @@ } #define NLMSG_PUT(skb, pid, seq, type, len) \ -({ if (skb_tailroom(skb) < (int)NLMSG_SPACE(len)) goto nlmsg_failure; \ - __nlmsg_put(skb, pid, seq, type, len); }) +({ if (skb_tailroom(skb) < (int)NLMSG_SPACE(len)) \ + goto nlmsg_failure; \ + __nlmsg_put(skb, pid, seq, type, len); }) + +#define NLMSG_PUT_ANSWER(skb, cb, type, len) \ + NLMSG_PUT(skb, NETLINK_CB((cb)->skb).pid, \ + (cb)->nlh->nlmsg_seq, type, len) + +#define NLMSG_END(skb, nlh) \ +({ (nlh)->nlmsg_len = (skb)->tail - (unsigned char *) (nlh); \ + (skb)->len; }) + +#define NLMSG_CANCEL(skb, nlh) \ +({ skb_trim(skb, (unsigned char *) (nlh) - (skb)->data); \ + -1; }) extern int netlink_dump_start(struct sock *ssk, struct sk_buff *skb, struct nlmsghdr *nlh, ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts 2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf 2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf @ 2005-05-26 18:54 ` Thomas Graf 2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf 2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf 3 siblings, 0 replies; 26+ messages in thread From: Thomas Graf @ 2005-05-26 18:54 UTC (permalink / raw) To: David S. Miller; +Cc: netdev RTA_GET_U(32|64)(tlv) Assumes TLV is a u32/u64 field and returns its value. RTA_GET_[M]SECS(tlv) Assumes TLV is a u64 and transports jiffies converted to seconds or milliseconds and returns its value. RTA_PUT_U(32|64)(skb, type, value) Appends %value as fixed u32/u64 to %skb as TLV %type. RTA_PUT_[M]SECS(skb, type, jiffies) Converts %jiffies to secs/msecs and appends it as u64 to %skb as TLV %type. RTA_PUT_STRING(skb, type, string) Appends %NUL terminated %string to %skb as TLV %type. RTA_NEST(skb, type) Starts a nested TLV %type and returns the nesting handle. RTA_NEST_END(skb, nesting_handle) Finishes the nested TLV %nesting_handle, must be called symmetric to RTA_NEST(). Returns skb->len RTA_NEST_CANCEL(skb, nesting_handle) Cancel the nested TLV %nesting_handle and trim nested TLV from skb again, returns -1. Signed-off-by: Thomas Graf <tgraf@suug.ch> --- commit d24c37db15b6cd790459d9d91ee6331ce17264b4 tree 52e7990f6dee710624f823014c40c14e0e5ccf39 parent 88e04e26d887e806372014a2f7b33c6a7ea64741 author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:22:23 +0200 committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 18:22:23 +0200 include/linux/rtnetlink.h | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 45 insertions(+) Index: include/linux/rtnetlink.h =================================================================== --- 031c0e0d63096ca2dff2203940b90460f1db5c68/include/linux/rtnetlink.h (mode:100644) +++ 52e7990f6dee710624f823014c40c14e0e5ccf39/include/linux/rtnetlink.h (mode:100644) @@ -789,6 +789,51 @@ ({ if (unlikely(skb_tailroom(skb) < (int)(attrlen))) \ goto rtattr_failure; \ memcpy(skb_put(skb, RTA_ALIGN(attrlen)), data, attrlen); }) + +#define RTA_PUT_U32(skb, attrtype, value) \ +({ u32 _tmp = (value); \ + RTA_PUT(skb, attrtype, sizeof(u32), &_tmp); }) + +#define RTA_PUT_U64(skb, attrtype, value) \ +({ u64 _tmp = (value); \ + RTA_PUT(skb, attrtype, sizeof(u64), &_tmp); }) + +#define RTA_PUT_SECS(skb, attrtype, value) \ + RTA_PUT_U64(skb, attrtype, (value) / HZ) + +#define RTA_PUT_MSECS(skb, attrtype, value) \ + RTA_PUT_U64(skb, attrtype, jiffies_to_msecs(value)) + +#define RTA_PUT_STRING(skb, attrtype, value) \ + RTA_PUT(skb, attrtype, strlen(value) + 1, value) + +#define RTA_NEST(skb, type) \ +({ struct rtattr *__start = (struct rtattr *) (skb)->tail; \ + RTA_PUT(skb, type, 0, NULL); \ + __start; }) + +#define RTA_NEST_END(skb, start) \ +({ (start)->rta_len = ((skb)->tail - (unsigned char *) (start)); \ + (skb)->len; }) + +#define RTA_NEST_CANCEL(skb, start) \ +({ skb_trim(skb, (unsigned char *) (start) - (skb)->data); \ + -1; }) + +#define RTA_GET_U32(rta) \ +({ if (!rta || RTA_PAYLOAD(rta) < sizeof(u32)) \ + goto rtattr_failure; \ + *(u32 *) RTA_DATA(rta); }) + +#define RTA_GET_U64(rta) \ +({ u64 _tmp; \ + if (!rta || RTA_PAYLOAD(rta) < sizeof(u64)) \ + goto rtattr_failure; \ + memcpy(&_tmp, RTA_DATA(rta), sizeof(_tmp)); \ + _tmp; }) + +#define RTA_GET_SECS(rta) ((unsigned long) RTA_GET_U64(rta) * HZ) +#define RTA_GET_MSECS(rta) (msecs_to_jiffies((unsigned long) RTA_GET_U64(rta))) static inline struct rtattr * __rta_reserve(struct sk_buff *skb, int attrtype, int attrlen) ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf 2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf 2005-05-26 18:54 ` [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts Thomas Graf @ 2005-05-26 18:55 ` Thomas Graf 2005-05-26 22:17 ` David S. Miller 2005-05-27 11:14 ` jamal 2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf 3 siblings, 2 replies; 26+ messages in thread From: Thomas Graf @ 2005-05-26 18:55 UTC (permalink / raw) To: David S. Miller; +Cc: netdev To retrieve the neighbour tables send RTM_GETNEIGHTBL with the NLM_F_DUMP flag set. Every neighbour table configuration is spread over multiple messages to avoid running into message size limits on systems with many interfaces. The first message in the sequence transports all not device specific data such as statistics, configuration, and the default parameter set. This message is followed by 0..n messages carrying device specific parameter sets. They all share the same value for Although the ordering should be sufficient, NDTA_NAME can be used to identify sequences. The initial message can be identified by checking for NDTA_CONFIG. The device specific messages do no contain this TLV but have NDTPA_IFINDEX set to the corresponding interface index. To change neighbour table attributes, send RTM_SETNEIGHTBL with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3], NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked otherwise. Device specific parameter sets can be changed by setting NDTPA_IFINDEX to the interface index of the corresponding device. Signed-off-by: Thomas Graf <tgraf@suug.ch> --- commit 6769b3b0603c632bda062e427178de8986900a52 tree 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b parent 98c3bc2e6320127cd96ef46d33f89ae127ce48fe author Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200 committer Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200 include/linux/rtnetlink.h | 107 ++++++++++++++ include/net/neighbour.h | 4 net/core/neighbour.c | 317 +++++++++++++++++++++++++++++++++++++++++++- net/core/rtnetlink.c | 20 +- security/selinux/nlmsgtab.c | 2 5 files changed, 439 insertions(+), 11 deletions(-) Index: include/linux/rtnetlink.h =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/linux/rtnetlink.h (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/linux/rtnetlink.h (mode:100644) @@ -89,6 +89,13 @@ RTM_GETANYCAST = 62, #define RTM_GETANYCAST RTM_GETANYCAST + RTM_NEWNEIGHTBL = 64, +#define RTM_NEWNEIGHTBL RTM_NEWNEIGHTBL + RTM_GETNEIGHTBL = 66, +#define RTM_GETNEIGHTBL RTM_GETNEIGHTBL + RTM_SETNEIGHTBL, +#define RTM_SETNEIGHTBL RTM_SETNEIGHTBL + __RTM_MAX, #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) }; @@ -493,6 +500,106 @@ __u32 ndm_refcnt; }; + +/***************************************************************** + * Neighbour tables specific messages. + * + * To retrieve the neighbour tables send RTM_GETNEIGHTBL with the + * NLM_F_DUMP flag set. Every neighbour table configuration is + * spread over multiple messages to avoid running into message + * size limits on systems with many interfaces. The first message + * in the sequence transports all not device specific data such as + * statistics, configuration, and the default parameter set. + * This message is followed by 0..n messages carrying device + * specific parameter sets. They all share the same value for + * Although the ordering should be sufficient, NDTA_NAME can be + * used to identify sequences. The initial message can be identified + * by checking for NDTA_CONFIG. The device specific messages do + * no contain this TLV but have NDTPA_IFINDEX set to the + * corresponding interface index. + * + * To change neighbour table attributes, send RTM_SETNEIGHTBL + * with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3], + * NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked + * otherwise. Device specific parameter sets can be changed by + * setting NDTPA_IFINDEX to the interface index of the corresponding + * device. + ****/ + +struct ndt_stats +{ + __u64 ndts_allocs; + __u64 ndts_destroys; + __u64 ndts_hash_grows; + __u64 ndts_res_failed; + __u64 ndts_lookups; + __u64 ndts_hits; + __u64 ndts_rcv_probes_mcast; + __u64 ndts_rcv_probes_ucast; + __u64 ndts_periodic_gc_runs; + __u64 ndts_forced_gc_runs; +}; + +enum { + NDTPA_UNSPEC, + NDTPA_IFINDEX, /* u32, unchangeable */ + NDTPA_REFCNT, /* u32, read-only */ + NDTPA_REACHABLE_TIME, /* u64, read-only, msecs */ + NDTPA_BASE_REACHABLE_TIME, /* u64, msecs */ + NDTPA_RETRANS_TIME, /* u64, msecs */ + NDTPA_GC_STALETIME, /* u64, msecs */ + NDTPA_DELAY_PROBE_TIME, /* u64, msecs */ + NDTPA_QUEUE_LEN, /* u32 */ + NDTPA_APP_PROBES, /* u32 */ + NDTPA_UCAST_PROBES, /* u32 */ + NDTPA_MCAST_PROBES, /* u32 */ + NDTPA_ANYCAST_DELAY, /* u64, msecs */ + NDTPA_PROXY_DELAY, /* u64, msecs */ + NDTPA_PROXY_QLEN, /* u32 */ + NDTPA_LOCKTIME, /* u64, msecs */ + __NDTPA_MAX +}; +#define NDTPA_MAX (__NDTPA_MAX - 1) + +struct ndtmsg +{ + __u8 ndtm_family; + __u8 ndtm_pad1; + __u16 ndtm_pad2; +}; + +struct ndt_config +{ + __u16 ndtc_key_len; + __u16 ndtc_entry_size; + __u32 ndtc_entries; + __u32 ndtc_last_flush; /* delta to now in msecs */ + __u32 ndtc_last_rand; /* delta to now in msecs */ + __u32 ndtc_hash_rnd; + __u32 ndtc_hash_mask; + __u32 ndtc_hash_chain_gc; + __u32 ndtc_proxy_qlen; +}; + +enum { + NDTA_UNSPEC, + NDTA_NAME, /* char *, unchangeable */ + NDTA_THRESH1, /* u32 */ + NDTA_THRESH2, /* u32 */ + NDTA_THRESH3, /* u32 */ + NDTA_CONFIG, /* struct ndt_config, read-only */ + NDTA_PARMS, /* nested TLV NDTPA_* */ + NDTA_STATS, /* struct ndt_stats, read-only */ + NDTA_GC_INTERVAL, /* u64, msecs */ + __NDTA_MAX +}; +#define NDTA_MAX (__NDTA_MAX - 1) + +#define NDTA_RTA(r) ((struct rtattr*)(((char*)(r)) + \ + NLMSG_ALIGN(sizeof(struct ndtmsg)))) +#define NDTA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ndtmsg)) + + /**** * General form of address family dependent message. ****/ Index: include/net/neighbour.h =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/net/neighbour.h (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/net/neighbour.h (mode:100644) @@ -65,6 +65,7 @@ struct neigh_parms { + struct net_device *dev; struct neigh_parms *next; int (*neigh_setup)(struct neighbour *); struct neigh_table *tbl; @@ -252,6 +253,9 @@ extern int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg); extern void neigh_app_ns(struct neighbour *n); +extern int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb); +extern int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg); + extern void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct neighbour *, void *), void *cookie); extern void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct neighbour *)); extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct pneigh_entry *)); Index: net/core/neighbour.c =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/neighbour.c (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/neighbour.c (mode:100644) @@ -1276,9 +1276,14 @@ INIT_RCU_HEAD(&p->rcu_head); p->reachable_time = neigh_rand_reach_time(p->base_reachable_time); - if (dev && dev->neigh_setup && dev->neigh_setup(dev, p)) { - kfree(p); - return NULL; + if (dev) { + if (dev->neigh_setup && dev->neigh_setup(dev, p)) { + kfree(p); + return NULL; + } + + dev_hold(dev); + p->dev = dev; } p->sysctl_table = NULL; write_lock_bh(&tbl->lock); @@ -1309,6 +1314,8 @@ *p = parms->next; parms->dead = 1; write_unlock_bh(&tbl->lock); + if (parms->dev) + dev_put(parms->dev); call_rcu(&parms->rcu_head, neigh_rcu_free_parms); return; } @@ -1546,6 +1553,308 @@ return err; } +static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms) +{ + struct rtattr *nest = RTA_NEST(skb, NDTA_PARMS); + + if (parms->dev) + RTA_PUT_U32(skb, NDTPA_IFINDEX, parms->dev->ifindex); + + RTA_PUT_U32(skb, NDTPA_REFCNT, atomic_read(&parms->refcnt)); + RTA_PUT_U32(skb, NDTPA_QUEUE_LEN, parms->queue_len); + RTA_PUT_U32(skb, NDTPA_PROXY_QLEN, parms->proxy_qlen); + RTA_PUT_U32(skb, NDTPA_APP_PROBES, parms->app_probes); + RTA_PUT_U32(skb, NDTPA_UCAST_PROBES, parms->ucast_probes); + RTA_PUT_U32(skb, NDTPA_MCAST_PROBES, parms->mcast_probes); + RTA_PUT_MSECS(skb, NDTPA_REACHABLE_TIME, parms->reachable_time); + RTA_PUT_MSECS(skb, NDTPA_BASE_REACHABLE_TIME, + parms->base_reachable_time); + RTA_PUT_MSECS(skb, NDTPA_GC_STALETIME, parms->gc_staletime); + RTA_PUT_MSECS(skb, NDTPA_DELAY_PROBE_TIME, parms->delay_probe_time); + RTA_PUT_MSECS(skb, NDTPA_RETRANS_TIME, parms->retrans_time); + RTA_PUT_MSECS(skb, NDTPA_ANYCAST_DELAY, parms->anycast_delay); + RTA_PUT_MSECS(skb, NDTPA_PROXY_DELAY, parms->proxy_delay); + RTA_PUT_MSECS(skb, NDTPA_LOCKTIME, parms->locktime); + + return RTA_NEST_END(skb, nest); + +rtattr_failure: + return RTA_NEST_CANCEL(skb, nest); +} + +static int neightbl_fill_info(struct neigh_table *tbl, struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct nlmsghdr *nlh; + struct ndtmsg *ndtmsg; + + nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg)); + ndtmsg = NLMSG_DATA(nlh); + + NLMSG_SET_MULTIPART(nlh); + + read_lock_bh(&tbl->lock); + ndtmsg->ndtm_family = tbl->family; + + RTA_PUT_STRING(skb, NDTA_NAME, tbl->id); + RTA_PUT_MSECS(skb, NDTA_GC_INTERVAL, tbl->gc_interval); + RTA_PUT_U32(skb, NDTA_THRESH1, tbl->gc_thresh1); + RTA_PUT_U32(skb, NDTA_THRESH2, tbl->gc_thresh2); + RTA_PUT_U32(skb, NDTA_THRESH3, tbl->gc_thresh3); + + { + unsigned long now = jiffies; + unsigned int flush_delta = now - tbl->last_flush; + unsigned int rand_delta = now - tbl->last_rand; + + struct ndt_config ndc = { + .ndtc_key_len = tbl->key_len, + .ndtc_entry_size = tbl->entry_size, + .ndtc_entries = atomic_read(&tbl->entries), + .ndtc_last_flush = jiffies_to_msecs(flush_delta), + .ndtc_last_rand = jiffies_to_msecs(rand_delta), + .ndtc_hash_rnd = tbl->hash_rnd, + .ndtc_hash_mask = tbl->hash_mask, + .ndtc_hash_chain_gc = tbl->hash_chain_gc, + .ndtc_proxy_qlen = tbl->proxy_queue.qlen, + }; + + RTA_PUT(skb, NDTA_CONFIG, sizeof(ndc), &ndc); + } + + { + int cpu; + struct ndt_stats ndst; + + memset(&ndst, 0, sizeof(ndst)); + + for (cpu = 0; cpu < NR_CPUS; cpu++) { + struct neigh_statistics *st; + + if (!cpu_possible(cpu)) + continue; + + st = per_cpu_ptr(tbl->stats, cpu); + ndst.ndts_allocs += st->allocs; + ndst.ndts_destroys += st->destroys; + ndst.ndts_hash_grows += st->hash_grows; + ndst.ndts_res_failed += st->res_failed; + ndst.ndts_lookups += st->lookups; + ndst.ndts_hits += st->hits; + ndst.ndts_rcv_probes_mcast += st->rcv_probes_mcast; + ndst.ndts_rcv_probes_ucast += st->rcv_probes_ucast; + ndst.ndts_periodic_gc_runs += st->periodic_gc_runs; + ndst.ndts_forced_gc_runs += st->forced_gc_runs; + } + + RTA_PUT(skb, NDTA_STATS, sizeof(ndst), &ndst); + } + + BUG_ON(tbl->parms.dev); + if (neightbl_fill_parms(skb, &tbl->parms) < 0) + goto rtattr_failure; + + read_unlock_bh(&tbl->lock); + return NLMSG_END(skb, nlh); + +rtattr_failure: + read_unlock_bh(&tbl->lock); + return NLMSG_CANCEL(skb, nlh); + +nlmsg_failure: + return -1; +} + +static int neightbl_fill_param_info(struct neigh_table *tbl, + struct neigh_parms *parms, + struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct ndtmsg *ndtmsg; + struct nlmsghdr *nlh; + + nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg)); + ndtmsg = NLMSG_DATA(nlh); + + NLMSG_SET_MULTIPART(nlh); + + read_lock_bh(&tbl->lock); + ndtmsg->ndtm_family = tbl->family; + RTA_PUT_STRING(skb, NDTA_NAME, tbl->id); + + if (neightbl_fill_parms(skb, parms) < 0) + goto rtattr_failure; + + read_unlock_bh(&tbl->lock); + return NLMSG_END(skb, nlh); + +rtattr_failure: + read_unlock_bh(&tbl->lock); + return NLMSG_CANCEL(skb, nlh); + +nlmsg_failure: + return -1; +} + +static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, + int ifindex) +{ + struct neigh_parms *p; + + for (p = &tbl->parms; p; p = p->next) + if ((p->dev && p->dev->ifindex == ifindex) || + (!p->dev && !ifindex)) + return p; + + return NULL; +} + +int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) +{ + struct neigh_table *tbl; + struct ndtmsg *ndtmsg = NLMSG_DATA(nlh); + struct rtattr **tb = arg; + int err = -EINVAL; + + if (!tb[NDTA_NAME - 1] || !RTA_PAYLOAD(tb[NDTA_NAME - 1])) + return -EINVAL; + + read_lock(&neigh_tbl_lock); + for (tbl = neigh_tables; tbl; tbl = tbl->next) { + if (ndtmsg->ndtm_family && tbl->family != ndtmsg->ndtm_family) + continue; + + if (!rtattr_strcmp(tb[NDTA_NAME - 1], tbl->id)) + break; + } + + if (tbl == NULL) { + err = -ENOENT; + goto errout; + } + + /* + * We acquire tbl->lock to be nice to the periodic timers and + * make sure they always see a consistent set of values. + */ + write_lock_bh(&tbl->lock); + + if (tb[NDTA_THRESH1 - 1]) + tbl->gc_thresh1 = RTA_GET_U32(tb[NDTA_THRESH1 - 1]); + + if (tb[NDTA_THRESH2 - 1]) + tbl->gc_thresh2 = RTA_GET_U32(tb[NDTA_THRESH2 - 1]); + + if (tb[NDTA_THRESH3 - 1]) + tbl->gc_thresh3 = RTA_GET_U32(tb[NDTA_THRESH3 - 1]); + + if (tb[NDTA_GC_INTERVAL - 1]) + tbl->gc_interval = RTA_GET_MSECS(tb[NDTA_GC_INTERVAL - 1]); + + if (tb[NDTA_PARMS - 1]) { + struct rtattr *tbp[NDTPA_MAX]; + struct neigh_parms *p; + u32 ifindex = 0; + + if (rtattr_parse_nested(tbp, NDTPA_MAX, tb[NDTA_PARMS - 1]) < 0) + goto rtattr_failure; + + if (tbp[NDTPA_IFINDEX - 1]) + ifindex = RTA_GET_U32(tbp[NDTPA_IFINDEX - 1]); + + p = lookup_neigh_params(tbl, ifindex); + if (p == NULL) { + err = -ENOENT; + goto rtattr_failure; + } + + if (tbp[NDTPA_QUEUE_LEN - 1]) + p->queue_len = RTA_GET_U32(tbp[NDTPA_QUEUE_LEN - 1]); + + if (tbp[NDTPA_PROXY_QLEN - 1]) + p->proxy_qlen = RTA_GET_U32(tbp[NDTPA_PROXY_QLEN - 1]); + + if (tbp[NDTPA_APP_PROBES - 1]) + p->app_probes = RTA_GET_U32(tbp[NDTPA_APP_PROBES - 1]); + + if (tbp[NDTPA_UCAST_PROBES - 1]) + p->ucast_probes = + RTA_GET_U32(tbp[NDTPA_UCAST_PROBES - 1]); + + if (tbp[NDTPA_MCAST_PROBES - 1]) + p->mcast_probes = + RTA_GET_U32(tbp[NDTPA_MCAST_PROBES - 1]); + + if (tbp[NDTPA_BASE_REACHABLE_TIME - 1]) + p->base_reachable_time = + RTA_GET_MSECS(tbp[NDTPA_BASE_REACHABLE_TIME - 1]); + + if (tbp[NDTPA_GC_STALETIME - 1]) + p->gc_staletime = + RTA_GET_MSECS(tbp[NDTPA_GC_STALETIME - 1]); + + if (tbp[NDTPA_DELAY_PROBE_TIME - 1]) + p->delay_probe_time = + RTA_GET_MSECS(tbp[NDTPA_DELAY_PROBE_TIME - 1]); + + if (tbp[NDTPA_RETRANS_TIME - 1]) + p->retrans_time = + RTA_GET_MSECS(tbp[NDTPA_RETRANS_TIME - 1]); + + if (tbp[NDTPA_ANYCAST_DELAY - 1]) + p->anycast_delay = + RTA_GET_MSECS(tbp[NDTPA_ANYCAST_DELAY - 1]); + + if (tbp[NDTPA_PROXY_DELAY - 1]) + p->proxy_delay = + RTA_GET_MSECS(tbp[NDTPA_PROXY_DELAY - 1]); + + if (tbp[NDTPA_LOCKTIME - 1]) + p->locktime = RTA_GET_MSECS(tbp[NDTPA_LOCKTIME - 1]); + } + + err = 0; + +rtattr_failure: + write_unlock_bh(&tbl->lock); +errout: + read_unlock(&neigh_tbl_lock); + return err; +} + +int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb) +{ + int idx, family; + int s_idx = cb->args[0]; + struct neigh_table *tbl; + + family = ((struct rtgenmsg *)NLMSG_DATA(cb->nlh))->rtgen_family; + + read_lock(&neigh_tbl_lock); + for (tbl = neigh_tables, idx = 0; tbl; tbl = tbl->next) { + struct neigh_parms *p; + + if (idx < s_idx || (family && tbl->family != family)) + continue; + + if (neightbl_fill_info(tbl, skb, cb) <= 0) + break; + + for (++idx, p = tbl->parms.next; p; p = p->next, idx++) { + if (idx < s_idx) + continue; + + if (neightbl_fill_param_info(tbl, p, skb, cb) <= 0) + goto out; + } + + } +out: + read_unlock(&neigh_tbl_lock); + cb->args[0] = idx; + + return skb->len; +} static int neigh_fill_info(struct sk_buff *skb, struct neighbour *n, u32 pid, u32 seq, int event) @@ -2352,6 +2661,8 @@ EXPORT_SYMBOL(neigh_update_hhs); EXPORT_SYMBOL(pneigh_enqueue); EXPORT_SYMBOL(pneigh_lookup); +EXPORT_SYMBOL(neightbl_dump_info); +EXPORT_SYMBOL(neightbl_set); #ifdef CONFIG_ARPD EXPORT_SYMBOL(neigh_app_ns); Index: net/core/rtnetlink.c =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/rtnetlink.c (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/rtnetlink.c (mode:100644) @@ -100,6 +100,7 @@ [RTM_FAM(RTM_NEWPREFIX)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)), [RTM_FAM(RTM_GETMULTICAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)), [RTM_FAM(RTM_GETANYCAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)), + [RTM_FAM(RTM_NEWNEIGHTBL)] = NLMSG_LENGTH(sizeof(struct ndtmsg)), }; static const int rta_max[RTM_NR_FAMILIES] = @@ -113,6 +114,7 @@ [RTM_FAM(RTM_NEWTCLASS)] = TCA_MAX, [RTM_FAM(RTM_NEWTFILTER)] = TCA_MAX, [RTM_FAM(RTM_NEWACTION)] = TCAA_MAX, + [RTM_FAM(RTM_NEWNEIGHTBL)] = NDTA_MAX, }; void __rta_fill(struct sk_buff *skb, int attrtype, int attrlen, const void *data) @@ -649,14 +651,16 @@ static struct rtnetlink_link link_rtnetlink_table[RTM_NR_MSGTYPES] = { - [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo }, - [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink }, - [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, - [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, - [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add }, - [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete }, - [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info }, - [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo }, + [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink }, + [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add }, + [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete }, + [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info }, + [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETNEIGHTBL - RTM_BASE] = { .dumpit = neightbl_dump_info }, + [RTM_SETNEIGHTBL - RTM_BASE] = { .doit = neightbl_set }, }; static int rtnetlink_event(struct notifier_block *this, unsigned long event, void *ptr) Index: security/selinux/nlmsgtab.c =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/security/selinux/nlmsgtab.c (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/security/selinux/nlmsgtab.c (mode:100644) @@ -63,6 +63,8 @@ { RTM_GETPREFIX, NETLINK_ROUTE_SOCKET__NLMSG_READ }, { RTM_GETMULTICAST, NETLINK_ROUTE_SOCKET__NLMSG_READ }, { RTM_GETANYCAST, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_GETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_SETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, }; static struct nlmsg_perm nlmsg_firewall_perms[] = ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf @ 2005-05-26 22:17 ` David S. Miller 2005-05-26 22:24 ` Thomas Graf 2005-05-26 22:26 ` Thomas Graf 2005-05-27 11:14 ` jamal 1 sibling, 2 replies; 26+ messages in thread From: David S. Miller @ 2005-05-26 22:17 UTC (permalink / raw) To: tgraf; +Cc: netdev From: Thomas Graf <tgraf@suug.ch> Date: Thu, 26 May 2005 20:55:26 +0200 > This message is followed by 0..n messages carrying device > specific parameter sets. They all share the same value for > Although the ordering should be sufficient, NDTA_NAME can be It seems a sentence is chopped off here mid-stream, please provide a fixed changelog so I can apply your patches. Thanks. ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-26 22:17 ` David S. Miller @ 2005-05-26 22:24 ` Thomas Graf 2005-05-26 22:26 ` Thomas Graf 1 sibling, 0 replies; 26+ messages in thread From: Thomas Graf @ 2005-05-26 22:24 UTC (permalink / raw) To: David S. Miller; +Cc: netdev To retrieve the neighbour tables send RTM_GETNEIGHTBL with the NLM_F_DUMP flag set. Every neighbour table configuration is spread over multiple messages to avoid running into message size limits on systems with many interfaces. The first message in the sequence transports all not device specific data such as statistics, configuration, and the default parameter set. This message is followed by 0..n messages carrying device specific parameter sets. Although the ordering should be sufficient, NDTA_NAME can be used to identify sequences. The initial message can be identified by checking for NDTA_CONFIG. The device specific messages do not contain this TLV but have NDTPA_IFINDEX set to the corresponding interface index. To change neighbour table attributes, send RTM_SETNEIGHTBL with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3], NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked otherwise. Device specific parameter sets can be changed by setting NDTPA_IFINDEX to the interface index of the corresponding device. Signed-off-by: Thomas Graf <tgraf@suug.ch> --- commit 6769b3b0603c632bda062e427178de8986900a52 tree 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b parent 98c3bc2e6320127cd96ef46d33f89ae127ce48fe author Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200 committer Thomas Graf <tgraf@suug.ch> Wed, 25 May 2005 20:30:09 +0200 include/linux/rtnetlink.h | 107 ++++++++++++++ include/net/neighbour.h | 4 net/core/neighbour.c | 317 +++++++++++++++++++++++++++++++++++++++++++- net/core/rtnetlink.c | 20 +- security/selinux/nlmsgtab.c | 2 5 files changed, 439 insertions(+), 11 deletions(-) Index: include/linux/rtnetlink.h =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/linux/rtnetlink.h (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/linux/rtnetlink.h (mode:100644) @@ -89,6 +89,13 @@ RTM_GETANYCAST = 62, #define RTM_GETANYCAST RTM_GETANYCAST + RTM_NEWNEIGHTBL = 64, +#define RTM_NEWNEIGHTBL RTM_NEWNEIGHTBL + RTM_GETNEIGHTBL = 66, +#define RTM_GETNEIGHTBL RTM_GETNEIGHTBL + RTM_SETNEIGHTBL, +#define RTM_SETNEIGHTBL RTM_SETNEIGHTBL + __RTM_MAX, #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) }; @@ -493,6 +500,106 @@ __u32 ndm_refcnt; }; + +/***************************************************************** + * Neighbour tables specific messages. + * + * To retrieve the neighbour tables send RTM_GETNEIGHTBL with the + * NLM_F_DUMP flag set. Every neighbour table configuration is + * spread over multiple messages to avoid running into message + * size limits on systems with many interfaces. The first message + * in the sequence transports all not device specific data such as + * statistics, configuration, and the default parameter set. + * This message is followed by 0..n messages carrying device + * specific parameter sets. + * Although the ordering should be sufficient, NDTA_NAME can be + * used to identify sequences. The initial message can be identified + * by checking for NDTA_CONFIG. The device specific messages do + * not contain this TLV but have NDTPA_IFINDEX set to the + * corresponding interface index. + * + * To change neighbour table attributes, send RTM_SETNEIGHTBL + * with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3], + * NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked + * otherwise. Device specific parameter sets can be changed by + * setting NDTPA_IFINDEX to the interface index of the corresponding + * device. + ****/ + +struct ndt_stats +{ + __u64 ndts_allocs; + __u64 ndts_destroys; + __u64 ndts_hash_grows; + __u64 ndts_res_failed; + __u64 ndts_lookups; + __u64 ndts_hits; + __u64 ndts_rcv_probes_mcast; + __u64 ndts_rcv_probes_ucast; + __u64 ndts_periodic_gc_runs; + __u64 ndts_forced_gc_runs; +}; + +enum { + NDTPA_UNSPEC, + NDTPA_IFINDEX, /* u32, unchangeable */ + NDTPA_REFCNT, /* u32, read-only */ + NDTPA_REACHABLE_TIME, /* u64, read-only, msecs */ + NDTPA_BASE_REACHABLE_TIME, /* u64, msecs */ + NDTPA_RETRANS_TIME, /* u64, msecs */ + NDTPA_GC_STALETIME, /* u64, msecs */ + NDTPA_DELAY_PROBE_TIME, /* u64, msecs */ + NDTPA_QUEUE_LEN, /* u32 */ + NDTPA_APP_PROBES, /* u32 */ + NDTPA_UCAST_PROBES, /* u32 */ + NDTPA_MCAST_PROBES, /* u32 */ + NDTPA_ANYCAST_DELAY, /* u64, msecs */ + NDTPA_PROXY_DELAY, /* u64, msecs */ + NDTPA_PROXY_QLEN, /* u32 */ + NDTPA_LOCKTIME, /* u64, msecs */ + __NDTPA_MAX +}; +#define NDTPA_MAX (__NDTPA_MAX - 1) + +struct ndtmsg +{ + __u8 ndtm_family; + __u8 ndtm_pad1; + __u16 ndtm_pad2; +}; + +struct ndt_config +{ + __u16 ndtc_key_len; + __u16 ndtc_entry_size; + __u32 ndtc_entries; + __u32 ndtc_last_flush; /* delta to now in msecs */ + __u32 ndtc_last_rand; /* delta to now in msecs */ + __u32 ndtc_hash_rnd; + __u32 ndtc_hash_mask; + __u32 ndtc_hash_chain_gc; + __u32 ndtc_proxy_qlen; +}; + +enum { + NDTA_UNSPEC, + NDTA_NAME, /* char *, unchangeable */ + NDTA_THRESH1, /* u32 */ + NDTA_THRESH2, /* u32 */ + NDTA_THRESH3, /* u32 */ + NDTA_CONFIG, /* struct ndt_config, read-only */ + NDTA_PARMS, /* nested TLV NDTPA_* */ + NDTA_STATS, /* struct ndt_stats, read-only */ + NDTA_GC_INTERVAL, /* u64, msecs */ + __NDTA_MAX +}; +#define NDTA_MAX (__NDTA_MAX - 1) + +#define NDTA_RTA(r) ((struct rtattr*)(((char*)(r)) + \ + NLMSG_ALIGN(sizeof(struct ndtmsg)))) +#define NDTA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ndtmsg)) + + /**** * General form of address family dependent message. ****/ Index: include/net/neighbour.h =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/include/net/neighbour.h (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/include/net/neighbour.h (mode:100644) @@ -65,6 +65,7 @@ struct neigh_parms { + struct net_device *dev; struct neigh_parms *next; int (*neigh_setup)(struct neighbour *); struct neigh_table *tbl; @@ -252,6 +253,9 @@ extern int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg); extern void neigh_app_ns(struct neighbour *n); +extern int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb); +extern int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg); + extern void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct neighbour *, void *), void *cookie); extern void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct neighbour *)); extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct pneigh_entry *)); Index: net/core/neighbour.c =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/neighbour.c (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/neighbour.c (mode:100644) @@ -1276,9 +1276,14 @@ INIT_RCU_HEAD(&p->rcu_head); p->reachable_time = neigh_rand_reach_time(p->base_reachable_time); - if (dev && dev->neigh_setup && dev->neigh_setup(dev, p)) { - kfree(p); - return NULL; + if (dev) { + if (dev->neigh_setup && dev->neigh_setup(dev, p)) { + kfree(p); + return NULL; + } + + dev_hold(dev); + p->dev = dev; } p->sysctl_table = NULL; write_lock_bh(&tbl->lock); @@ -1309,6 +1314,8 @@ *p = parms->next; parms->dead = 1; write_unlock_bh(&tbl->lock); + if (parms->dev) + dev_put(parms->dev); call_rcu(&parms->rcu_head, neigh_rcu_free_parms); return; } @@ -1546,6 +1553,308 @@ return err; } +static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms) +{ + struct rtattr *nest = RTA_NEST(skb, NDTA_PARMS); + + if (parms->dev) + RTA_PUT_U32(skb, NDTPA_IFINDEX, parms->dev->ifindex); + + RTA_PUT_U32(skb, NDTPA_REFCNT, atomic_read(&parms->refcnt)); + RTA_PUT_U32(skb, NDTPA_QUEUE_LEN, parms->queue_len); + RTA_PUT_U32(skb, NDTPA_PROXY_QLEN, parms->proxy_qlen); + RTA_PUT_U32(skb, NDTPA_APP_PROBES, parms->app_probes); + RTA_PUT_U32(skb, NDTPA_UCAST_PROBES, parms->ucast_probes); + RTA_PUT_U32(skb, NDTPA_MCAST_PROBES, parms->mcast_probes); + RTA_PUT_MSECS(skb, NDTPA_REACHABLE_TIME, parms->reachable_time); + RTA_PUT_MSECS(skb, NDTPA_BASE_REACHABLE_TIME, + parms->base_reachable_time); + RTA_PUT_MSECS(skb, NDTPA_GC_STALETIME, parms->gc_staletime); + RTA_PUT_MSECS(skb, NDTPA_DELAY_PROBE_TIME, parms->delay_probe_time); + RTA_PUT_MSECS(skb, NDTPA_RETRANS_TIME, parms->retrans_time); + RTA_PUT_MSECS(skb, NDTPA_ANYCAST_DELAY, parms->anycast_delay); + RTA_PUT_MSECS(skb, NDTPA_PROXY_DELAY, parms->proxy_delay); + RTA_PUT_MSECS(skb, NDTPA_LOCKTIME, parms->locktime); + + return RTA_NEST_END(skb, nest); + +rtattr_failure: + return RTA_NEST_CANCEL(skb, nest); +} + +static int neightbl_fill_info(struct neigh_table *tbl, struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct nlmsghdr *nlh; + struct ndtmsg *ndtmsg; + + nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg)); + ndtmsg = NLMSG_DATA(nlh); + + NLMSG_SET_MULTIPART(nlh); + + read_lock_bh(&tbl->lock); + ndtmsg->ndtm_family = tbl->family; + + RTA_PUT_STRING(skb, NDTA_NAME, tbl->id); + RTA_PUT_MSECS(skb, NDTA_GC_INTERVAL, tbl->gc_interval); + RTA_PUT_U32(skb, NDTA_THRESH1, tbl->gc_thresh1); + RTA_PUT_U32(skb, NDTA_THRESH2, tbl->gc_thresh2); + RTA_PUT_U32(skb, NDTA_THRESH3, tbl->gc_thresh3); + + { + unsigned long now = jiffies; + unsigned int flush_delta = now - tbl->last_flush; + unsigned int rand_delta = now - tbl->last_rand; + + struct ndt_config ndc = { + .ndtc_key_len = tbl->key_len, + .ndtc_entry_size = tbl->entry_size, + .ndtc_entries = atomic_read(&tbl->entries), + .ndtc_last_flush = jiffies_to_msecs(flush_delta), + .ndtc_last_rand = jiffies_to_msecs(rand_delta), + .ndtc_hash_rnd = tbl->hash_rnd, + .ndtc_hash_mask = tbl->hash_mask, + .ndtc_hash_chain_gc = tbl->hash_chain_gc, + .ndtc_proxy_qlen = tbl->proxy_queue.qlen, + }; + + RTA_PUT(skb, NDTA_CONFIG, sizeof(ndc), &ndc); + } + + { + int cpu; + struct ndt_stats ndst; + + memset(&ndst, 0, sizeof(ndst)); + + for (cpu = 0; cpu < NR_CPUS; cpu++) { + struct neigh_statistics *st; + + if (!cpu_possible(cpu)) + continue; + + st = per_cpu_ptr(tbl->stats, cpu); + ndst.ndts_allocs += st->allocs; + ndst.ndts_destroys += st->destroys; + ndst.ndts_hash_grows += st->hash_grows; + ndst.ndts_res_failed += st->res_failed; + ndst.ndts_lookups += st->lookups; + ndst.ndts_hits += st->hits; + ndst.ndts_rcv_probes_mcast += st->rcv_probes_mcast; + ndst.ndts_rcv_probes_ucast += st->rcv_probes_ucast; + ndst.ndts_periodic_gc_runs += st->periodic_gc_runs; + ndst.ndts_forced_gc_runs += st->forced_gc_runs; + } + + RTA_PUT(skb, NDTA_STATS, sizeof(ndst), &ndst); + } + + BUG_ON(tbl->parms.dev); + if (neightbl_fill_parms(skb, &tbl->parms) < 0) + goto rtattr_failure; + + read_unlock_bh(&tbl->lock); + return NLMSG_END(skb, nlh); + +rtattr_failure: + read_unlock_bh(&tbl->lock); + return NLMSG_CANCEL(skb, nlh); + +nlmsg_failure: + return -1; +} + +static int neightbl_fill_param_info(struct neigh_table *tbl, + struct neigh_parms *parms, + struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct ndtmsg *ndtmsg; + struct nlmsghdr *nlh; + + nlh = NLMSG_PUT_ANSWER(skb, cb, RTM_NEWNEIGHTBL, sizeof(struct ndtmsg)); + ndtmsg = NLMSG_DATA(nlh); + + NLMSG_SET_MULTIPART(nlh); + + read_lock_bh(&tbl->lock); + ndtmsg->ndtm_family = tbl->family; + RTA_PUT_STRING(skb, NDTA_NAME, tbl->id); + + if (neightbl_fill_parms(skb, parms) < 0) + goto rtattr_failure; + + read_unlock_bh(&tbl->lock); + return NLMSG_END(skb, nlh); + +rtattr_failure: + read_unlock_bh(&tbl->lock); + return NLMSG_CANCEL(skb, nlh); + +nlmsg_failure: + return -1; +} + +static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, + int ifindex) +{ + struct neigh_parms *p; + + for (p = &tbl->parms; p; p = p->next) + if ((p->dev && p->dev->ifindex == ifindex) || + (!p->dev && !ifindex)) + return p; + + return NULL; +} + +int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) +{ + struct neigh_table *tbl; + struct ndtmsg *ndtmsg = NLMSG_DATA(nlh); + struct rtattr **tb = arg; + int err = -EINVAL; + + if (!tb[NDTA_NAME - 1] || !RTA_PAYLOAD(tb[NDTA_NAME - 1])) + return -EINVAL; + + read_lock(&neigh_tbl_lock); + for (tbl = neigh_tables; tbl; tbl = tbl->next) { + if (ndtmsg->ndtm_family && tbl->family != ndtmsg->ndtm_family) + continue; + + if (!rtattr_strcmp(tb[NDTA_NAME - 1], tbl->id)) + break; + } + + if (tbl == NULL) { + err = -ENOENT; + goto errout; + } + + /* + * We acquire tbl->lock to be nice to the periodic timers and + * make sure they always see a consistent set of values. + */ + write_lock_bh(&tbl->lock); + + if (tb[NDTA_THRESH1 - 1]) + tbl->gc_thresh1 = RTA_GET_U32(tb[NDTA_THRESH1 - 1]); + + if (tb[NDTA_THRESH2 - 1]) + tbl->gc_thresh2 = RTA_GET_U32(tb[NDTA_THRESH2 - 1]); + + if (tb[NDTA_THRESH3 - 1]) + tbl->gc_thresh3 = RTA_GET_U32(tb[NDTA_THRESH3 - 1]); + + if (tb[NDTA_GC_INTERVAL - 1]) + tbl->gc_interval = RTA_GET_MSECS(tb[NDTA_GC_INTERVAL - 1]); + + if (tb[NDTA_PARMS - 1]) { + struct rtattr *tbp[NDTPA_MAX]; + struct neigh_parms *p; + u32 ifindex = 0; + + if (rtattr_parse_nested(tbp, NDTPA_MAX, tb[NDTA_PARMS - 1]) < 0) + goto rtattr_failure; + + if (tbp[NDTPA_IFINDEX - 1]) + ifindex = RTA_GET_U32(tbp[NDTPA_IFINDEX - 1]); + + p = lookup_neigh_params(tbl, ifindex); + if (p == NULL) { + err = -ENOENT; + goto rtattr_failure; + } + + if (tbp[NDTPA_QUEUE_LEN - 1]) + p->queue_len = RTA_GET_U32(tbp[NDTPA_QUEUE_LEN - 1]); + + if (tbp[NDTPA_PROXY_QLEN - 1]) + p->proxy_qlen = RTA_GET_U32(tbp[NDTPA_PROXY_QLEN - 1]); + + if (tbp[NDTPA_APP_PROBES - 1]) + p->app_probes = RTA_GET_U32(tbp[NDTPA_APP_PROBES - 1]); + + if (tbp[NDTPA_UCAST_PROBES - 1]) + p->ucast_probes = + RTA_GET_U32(tbp[NDTPA_UCAST_PROBES - 1]); + + if (tbp[NDTPA_MCAST_PROBES - 1]) + p->mcast_probes = + RTA_GET_U32(tbp[NDTPA_MCAST_PROBES - 1]); + + if (tbp[NDTPA_BASE_REACHABLE_TIME - 1]) + p->base_reachable_time = + RTA_GET_MSECS(tbp[NDTPA_BASE_REACHABLE_TIME - 1]); + + if (tbp[NDTPA_GC_STALETIME - 1]) + p->gc_staletime = + RTA_GET_MSECS(tbp[NDTPA_GC_STALETIME - 1]); + + if (tbp[NDTPA_DELAY_PROBE_TIME - 1]) + p->delay_probe_time = + RTA_GET_MSECS(tbp[NDTPA_DELAY_PROBE_TIME - 1]); + + if (tbp[NDTPA_RETRANS_TIME - 1]) + p->retrans_time = + RTA_GET_MSECS(tbp[NDTPA_RETRANS_TIME - 1]); + + if (tbp[NDTPA_ANYCAST_DELAY - 1]) + p->anycast_delay = + RTA_GET_MSECS(tbp[NDTPA_ANYCAST_DELAY - 1]); + + if (tbp[NDTPA_PROXY_DELAY - 1]) + p->proxy_delay = + RTA_GET_MSECS(tbp[NDTPA_PROXY_DELAY - 1]); + + if (tbp[NDTPA_LOCKTIME - 1]) + p->locktime = RTA_GET_MSECS(tbp[NDTPA_LOCKTIME - 1]); + } + + err = 0; + +rtattr_failure: + write_unlock_bh(&tbl->lock); +errout: + read_unlock(&neigh_tbl_lock); + return err; +} + +int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb) +{ + int idx, family; + int s_idx = cb->args[0]; + struct neigh_table *tbl; + + family = ((struct rtgenmsg *)NLMSG_DATA(cb->nlh))->rtgen_family; + + read_lock(&neigh_tbl_lock); + for (tbl = neigh_tables, idx = 0; tbl; tbl = tbl->next) { + struct neigh_parms *p; + + if (idx < s_idx || (family && tbl->family != family)) + continue; + + if (neightbl_fill_info(tbl, skb, cb) <= 0) + break; + + for (++idx, p = tbl->parms.next; p; p = p->next, idx++) { + if (idx < s_idx) + continue; + + if (neightbl_fill_param_info(tbl, p, skb, cb) <= 0) + goto out; + } + + } +out: + read_unlock(&neigh_tbl_lock); + cb->args[0] = idx; + + return skb->len; +} static int neigh_fill_info(struct sk_buff *skb, struct neighbour *n, u32 pid, u32 seq, int event) @@ -2352,6 +2661,8 @@ EXPORT_SYMBOL(neigh_update_hhs); EXPORT_SYMBOL(pneigh_enqueue); EXPORT_SYMBOL(pneigh_lookup); +EXPORT_SYMBOL(neightbl_dump_info); +EXPORT_SYMBOL(neightbl_set); #ifdef CONFIG_ARPD EXPORT_SYMBOL(neigh_app_ns); Index: net/core/rtnetlink.c =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/net/core/rtnetlink.c (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/net/core/rtnetlink.c (mode:100644) @@ -100,6 +100,7 @@ [RTM_FAM(RTM_NEWPREFIX)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)), [RTM_FAM(RTM_GETMULTICAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)), [RTM_FAM(RTM_GETANYCAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)), + [RTM_FAM(RTM_NEWNEIGHTBL)] = NLMSG_LENGTH(sizeof(struct ndtmsg)), }; static const int rta_max[RTM_NR_FAMILIES] = @@ -113,6 +114,7 @@ [RTM_FAM(RTM_NEWTCLASS)] = TCA_MAX, [RTM_FAM(RTM_NEWTFILTER)] = TCA_MAX, [RTM_FAM(RTM_NEWACTION)] = TCAA_MAX, + [RTM_FAM(RTM_NEWNEIGHTBL)] = NDTA_MAX, }; void __rta_fill(struct sk_buff *skb, int attrtype, int attrlen, const void *data) @@ -649,14 +651,16 @@ static struct rtnetlink_link link_rtnetlink_table[RTM_NR_MSGTYPES] = { - [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo }, - [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink }, - [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, - [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, - [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add }, - [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete }, - [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info }, - [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETLINK - RTM_BASE] = { .dumpit = rtnetlink_dump_ifinfo }, + [RTM_SETLINK - RTM_BASE] = { .doit = do_setlink }, + [RTM_GETADDR - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETROUTE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_NEWNEIGH - RTM_BASE] = { .doit = neigh_add }, + [RTM_DELNEIGH - RTM_BASE] = { .doit = neigh_delete }, + [RTM_GETNEIGH - RTM_BASE] = { .dumpit = neigh_dump_info }, + [RTM_GETRULE - RTM_BASE] = { .dumpit = rtnetlink_dump_all }, + [RTM_GETNEIGHTBL - RTM_BASE] = { .dumpit = neightbl_dump_info }, + [RTM_SETNEIGHTBL - RTM_BASE] = { .doit = neightbl_set }, }; static int rtnetlink_event(struct notifier_block *this, unsigned long event, void *ptr) Index: security/selinux/nlmsgtab.c =================================================================== --- 0ad944fa24c79b3eeaf06f85f5dd6d91ae1adbc9/security/selinux/nlmsgtab.c (mode:100644) +++ 7e955d0e3c8fdef048336d0d7ba24a0539f28f9b/security/selinux/nlmsgtab.c (mode:100644) @@ -63,6 +63,8 @@ { RTM_GETPREFIX, NETLINK_ROUTE_SOCKET__NLMSG_READ }, { RTM_GETMULTICAST, NETLINK_ROUTE_SOCKET__NLMSG_READ }, { RTM_GETANYCAST, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_GETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_SETNEIGHTBL, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, }; static struct nlmsg_perm nlmsg_firewall_perms[] = ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-26 22:17 ` David S. Miller 2005-05-26 22:24 ` Thomas Graf @ 2005-05-26 22:26 ` Thomas Graf 2005-05-26 22:37 ` David S. Miller 1 sibling, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-26 22:26 UTC (permalink / raw) To: David S. Miller; +Cc: netdev * David S. Miller <20050526.151726.85409972.davem@davemloft.net> 2005-05-26 15:17 > From: Thomas Graf <tgraf@suug.ch> > Date: Thu, 26 May 2005 20:55:26 +0200 > > > This message is followed by 0..n messages carrying device > > specific parameter sets. They all share the same value for > > Although the ordering should be sufficient, NDTA_NAME can be > > It seems a sentence is chopped off here mid-stream, > please provide a fixed changelog so I can apply your > patches. Sorry, it's what happens if one gets interrupted in the middle of writing a longer comment. ;-> Patch edited by hand but it should still apply. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-26 22:26 ` Thomas Graf @ 2005-05-26 22:37 ` David S. Miller 0 siblings, 0 replies; 26+ messages in thread From: David S. Miller @ 2005-05-26 22:37 UTC (permalink / raw) To: tgraf; +Cc: netdev From: Thomas Graf <tgraf@suug.ch> Date: Fri, 27 May 2005 00:26:56 +0200 > Sorry, it's what happens if one gets interrupted in the middle > of writing a longer comment. ;-> Patch edited by hand but > it should still apply. Thanks. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf 2005-05-26 22:17 ` David S. Miller @ 2005-05-27 11:14 ` jamal 2005-05-27 12:15 ` Thomas Graf 1 sibling, 1 reply; 26+ messages in thread From: jamal @ 2005-05-27 11:14 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev On Thu, 2005-26-05 at 20:55 +0200, Thomas Graf wrote: > To retrieve the neighbour tables send RTM_GETNEIGHTBL with the > NLM_F_DUMP flag set. Every neighbour table configuration is > spread over multiple messages to avoid running into message > size limits on systems with many interfaces. The first message > in the sequence transports all not device specific data such as > statistics, configuration, and the default parameter set. > This message is followed by 0..n messages carrying device > specific parameter sets. They all share the same value for > Although the ordering should be sufficient, NDTA_NAME can be > used to identify sequences. The initial message can be identified > by checking for NDTA_CONFIG. The device specific messages do > no contain this TLV but have NDTPA_IFINDEX set to the > corresponding interface index. > > To change neighbour table attributes, send RTM_SETNEIGHTBL > with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3], > NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked > otherwise. Device specific parameter sets can be changed by > setting NDTPA_IFINDEX to the interface index of the corresponding > device. It would have been better, IMO, to just work on a generic idev/devinet retrieval and setting instead of one component (as in ARP in this case). cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-27 11:14 ` jamal @ 2005-05-27 12:15 ` Thomas Graf 2005-05-27 13:50 ` jamal 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-27 12:15 UTC (permalink / raw) To: jamal; +Cc: David S. Miller, netdev * jamal <1117192464.6688.3.camel@localhost.localdomain> 2005-05-27 07:14 > It would have been better, IMO, to just work on a generic idev/devinet > retrieval and setting instead of one component (as in ARP in this > case). I thought about adding a devinet component to allow retrieving/setting device specific parameters (along with other settings) but dropped the idea because I did not want to implement the data structures twice. I think this is cleaner, it allows one request to be sent to see the complete arp/ndisc configuration rather than sending one to retrieve the global configuration/statistics and another one to retrieve device specific parameters. Nevertheless, I'm open for changes so once I've written the devinet component to change rp_filter, forwarding, arp_filter, etc. we can still move the device specific NDTA_PARMS to the devinet component if we still think it would be better. For now, the neightbl component is clean, small, well defined and probably never needs to be touched again. Is that acceptable for you? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-27 12:15 ` Thomas Graf @ 2005-05-27 13:50 ` jamal [not found] ` <20050527141023.GP15391@postel.suug.ch> 0 siblings, 1 reply; 26+ messages in thread From: jamal @ 2005-05-27 13:50 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev On Fri, 2005-27-05 at 14:15 +0200, Thomas Graf wrote: > * jamal <1117192464.6688.3.camel@localhost.localdomain> 2005-05-27 07:14 > > It would have been better, IMO, to just work on a generic idev/devinet > > retrieval and setting instead of one component (as in ARP in this > > case). > > I thought about adding a devinet component to allow retrieving/setting > device specific parameters (along with other settings) but dropped the > idea because I did not want to implement the data structures twice. > > I think this is cleaner, it allows one request to be sent to see > the complete arp/ndisc configuration rather than sending one to > retrieve the global configuration/statistics and another one > to retrieve device specific parameters. > > Nevertheless, I'm open for changes so once I've written the devinet > component to change rp_filter, forwarding, arp_filter, etc. we can > still move the device specific NDTA_PARMS to the devinet component > if we still think it would be better. For now, the neightbl component > is clean, small, well defined and probably never needs to be touched > again. > > Is that acceptable for you? > I think depends on where we are going. My view is that the current abstraction model is fine for this specific bit. i.e from an abstraction point of view these parameters are per device per protocol (v4/v6/etc). In other words, if i was to draw a model diagram it would show: --> netdevice ----> v4 addresses ----> v4 netdevice configuration ----> v6 addresses ----> v6 netdevice configuration i.e the best way to do it imo, especially since you are doing this from scratch, is to maintain that model. Your patch breaks away from this. The only thing that is not devconfig queriable or settable at the moment is the stats. That i can certainly see as being part of the standard neighbor details for example. Infact this would apply elsewhere as well, for example in the FIB where some stats are being displayed via /proc at the moment. We actually did discuss this a while back and i had an incomplete patch for doing it the way i suggest above which i unfortunately cant seem to locate at the moment. cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <20050527141023.GP15391@postel.suug.ch>]
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink [not found] ` <20050527141023.GP15391@postel.suug.ch> @ 2005-05-27 14:57 ` jamal 2005-05-27 15:16 ` Thomas Graf 0 siblings, 1 reply; 26+ messages in thread From: jamal @ 2005-05-27 14:57 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev On Fri, 2005-27-05 at 16:10 +0200, Thomas Graf wrote: > * jamal <1117201853.6383.29.camel@localhost.localdomain> 2005-05-27 09:50 > > In other words, if i was to draw a model diagram it would show: > > --> netdevice > > ----> v4 addresses > > ----> v4 netdevice configuration > > ----> v6 addresses > > ----> v6 netdevice configuration > > > > i.e the best way to do it imo, especially since you are doing this from > > scratch, is to maintain that model. Your patch breaks away from this. > > I think this model is only valid for neighbour table parameter set > and not for neighbour tables themselves. > Yes, that's true. In other words, neighbor tables details are out of scope of that model and are more appropriate to the v4/6 neighbor "box". Anything else that is netdevice related or connected as in the above model (as opposed to say the ARP "box" related detail) is better to leave or add it to be part of devconfig if it doesnt exist. To be precise there are things in your patch that are fine to leave there; but the most it seems to me belong to the devconfig netlink path (which doesnt exist today). > > The only thing that is not devconfig queriable or settable at the moment > > is the stats. That i can certainly see as being part of the standard > > neighbor details for example. Infact this would apply elsewhere as well, > > for example in the FIB where some stats are being displayed via /proc at > > the moment. > > This is not correct, struct ndt_config (read-only configuration, sort > of like statistics), gc thresholds, gc interval, and the default parameter > set are also not devconfig queriable. From a user point of view the > most commonly changed values will be the gc thresholds, gc interval > and parameters in the default set and if you forget about the device > specific parameters for a second it would be pretty obvious to not put > them into devconfig. So my point is that I don't want to put the most > obvious and common use into something where it doesn't really fit. The thresholds etc are there - just not netlink accessible. Example, the default values for V4 are settable or queriable via: /proc/sys/net/ipv4/neigh/default/gc_thresh3 /proc/sys/net/ipv4/neigh/default/gc_thresh2 /proc/sys/net/ipv4/neigh/default/gc_thresh1 /proc/sys/net/ipv4/neigh/default/gc_interval > I do agree that the device specific parameters fit into a devconfig > or whathever we'll call it but not the whole neighbour table > configuration bits. > Agreed that some things are specific to the "ARP" or "NDISC" block but not all that you have belongs to those boxes. > > We actually did discuss this a while back and i had an incomplete patch > > for doing it the way i suggest above which i unfortunately cant seem to > > locate at the moment. > > Can't remember to have seen such as mail or patch, do you remember > the thread you posted this? I posted the patch 3 times and I just > read through the archives again and can't find any such commments. I never posted the patch unfortunately but mentioned it in the discussions. What counts is we are having this discussion now;-> I apologize i didnt comment on the patch earlier - sometimes when i am away from the list for sometime i skip threads which may require me to spend time looking at them to make any useful comments. This maybe the reason i didnt respond. To catchup i normally read emails addressed to me first then go over the list in a LIFO mode and stop once i get engaged in a discussion. cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-27 14:57 ` jamal @ 2005-05-27 15:16 ` Thomas Graf 2005-05-27 15:56 ` jamal 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-27 15:16 UTC (permalink / raw) To: jamal; +Cc: David S. Miller, netdev * jamal <1117205822.6383.68.camel@localhost.localdomain> 2005-05-27 10:57 > The thresholds etc are there - just not netlink accessible. Example, the > default values for V4 are settable or queriable via: > > /proc/sys/net/ipv4/neigh/default/gc_thresh3 > /proc/sys/net/ipv4/neigh/default/gc_thresh2 > /proc/sys/net/ipv4/neigh/default/gc_thresh1 > /proc/sys/net/ipv4/neigh/default/gc_interval Sorry but this is just one big hack: a) they do not belong there, the first implication a user does when he sees default/, dev1/, dev2/, devN is that default/ covers the same set of parameters as devX/. Therefore he expects all parameters in default/ to be also in devX/. b) struct neigh_parms parms; /* HACK. gc_* shoul follow parms without a gap! */ int gc_interval; int gc_thresh1; int gc_thresh2; int gc_thresh3; ... if (dev) { dev_name_source = dev->name; t->neigh_dev[0].ctl_name = dev->ifindex; t->neigh_vars[12].procname = NULL; <- gc_interval t->neigh_vars[13].procname = NULL; <- gc_thresh1 t->neigh_vars[14].procname = NULL; <- gc_thresh2 t->neigh_vars[15].procname = NULL; <- gc_thresh3 } else { dev_name_source = t->neigh_dev[0].procname; t->neigh_vars[12].data = (int *)(p + 1); t->neigh_vars[13].data = (int *)(p + 1) + 1; t->neigh_vars[14].data = (int *)(p + 1) + 2; t->neigh_vars[15].data = (int *)(p + 1) + 3; } I will not push along these hacks into my code ;-> Some more notes on the devconfig idea: I'm not sure if you are still implying devconfig as devinet but if so: although we only have neighbour tables on devinets so far there might be neighbour table on other interfaces for whatever reason. A protocol might allocate a neighbour parameter set on any netdevice which makes me think that the netlink neighbour code should be open for this as well. It depends on how we do the devconfig thing, so what I suggest it so leave this as-is for the moment and make a decision once we know more about how the devconfig thing will look like. Reasonable? > I apologize i didnt comment on the patch earlier - sometimes when i am > away from the list for sometime i skip threads which may require me to > spend time looking at them to make any useful comments. This maybe the > reason i didnt respond. To catchup i normally read emails addressed to > me first then go over the list in a LIFO mode and stop once i get > engaged in a discussion. I should have CCed you, my fault. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-27 15:16 ` Thomas Graf @ 2005-05-27 15:56 ` jamal 2005-05-27 16:35 ` Thomas Graf 0 siblings, 1 reply; 26+ messages in thread From: jamal @ 2005-05-27 15:56 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev On Fri, 2005-27-05 at 17:16 +0200, Thomas Graf wrote: > * jamal <1117205822.6383.68.camel@localhost.localdomain> 2005-05-27 10:57 > > The thresholds etc are there - just not netlink accessible. Example, the > > default values for V4 are settable or queriable via: > > > > /proc/sys/net/ipv4/neigh/default/gc_thresh3 > > /proc/sys/net/ipv4/neigh/default/gc_thresh2 > > /proc/sys/net/ipv4/neigh/default/gc_thresh1 > > /proc/sys/net/ipv4/neigh/default/gc_interval > > Sorry but this is just one big hack: It maybe in the user space representation, but not in the kernel abstraction which is what i was refering to. In other words look at struct in_device; > a) they do not belong there, the first implication a user does when > he sees default/, dev1/, dev2/, devN is that default/ covers > the same set of parameters as devX/. Therefore he expects all > parameters in default/ to be also in devX/. > The deafult can be overriden by devX. So they dont need to sync. But this is a separate topic > b) struct neigh_parms parms; > /* HACK. gc_* shoul follow parms without a gap! */ > int gc_interval; > int gc_thresh1; > int gc_thresh2; > int gc_thresh3; > > ... > > if (dev) { > dev_name_source = dev->name; > t->neigh_dev[0].ctl_name = dev->ifindex; > t->neigh_vars[12].procname = NULL; <- gc_interval > t->neigh_vars[13].procname = NULL; <- gc_thresh1 > t->neigh_vars[14].procname = NULL; <- gc_thresh2 > t->neigh_vars[15].procname = NULL; <- gc_thresh3 > } else { > dev_name_source = t->neigh_dev[0].procname; > t->neigh_vars[12].data = (int *)(p + 1); > t->neigh_vars[13].data = (int *)(p + 1) + 1; > t->neigh_vars[14].data = (int *)(p + 1) + 2; > t->neigh_vars[15].data = (int *)(p + 1) + 3; > } > > I will not push along these hacks into my code ;-> > I am afraid you are looking at this from the wrong angle (user space), Thomas ;-> The abstraction in the kernel is proper. To redraw that model again with the exact structure names: netdevice -> L2 config stuff config stuff more config stuff .. .. netdevice protocol config: -> v4 specific (struct in_device) ----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc -> v6 specific (struct inet6_dev) ----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc There are a few more items but i am leaving them out for brevity. I hope it makes more sense now. > Some more notes on the devconfig idea: I'm not sure if you are still > implying devconfig as devinet but if so: although we only have neighbour > tables on devinets so far there might be neighbour table on other > interfaces for whatever reason. in_devxxx types; sorry, I created the confusion by mentioning devconf in the sweep. > A protocol might allocate a neighbour > parameter set on any netdevice which makes me think that the netlink > neighbour code should be open for this as well. It depends on how we > do the devconfig thing, so what I suggest it so leave this as-is for > the moment and make a decision once we know more about how the devconfig > thing will look like. Reasonable? Take a look at what i said above and see if it makes sense. Note, I am not objecting to the patch like i said - I think some of those items are must put in that patch. Whatever simplifies life for you. I am not exactly a purist but at times i like getting it right from the beggining. cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-27 15:56 ` jamal @ 2005-05-27 16:35 ` Thomas Graf 2005-05-28 1:42 ` jamal 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-27 16:35 UTC (permalink / raw) To: jamal; +Cc: David S. Miller, netdev * jamal <1117209411.6383.104.camel@localhost.localdomain> 2005-05-27 11:56 > On Fri, 2005-27-05 at 17:16 +0200, Thomas Graf wrote: > > Sorry but this is just one big hack: > > It maybe in the user space representation, but not in the kernel > abstraction which is what i was refering to. > In other words look at struct in_device; Yes, it references the arp device specific parameters and so does inet6_dev for the ndisc cache. But this repsents only the current state, we might have neighbour table parameter sets which are not bound to inetdevs in the future. However, I regard this as minor and I can agree on moving the device specific parameters into a inetdev based architecture but I do NOT agree on moving gc_ into this architecture as well, it doesn't belong there. Nitpicking for a bit, although inet6_dev and in_device hold reference to the their arp respectively ndisc parameter set, the sysctl interface does not use this reference but stores the ifindex of the netdevice, _which_ is correct I think because parameter sets are _not_ limited to inetdevs in terms of architecture but only in terms of current use. > The deafult can be overriden by devX. So they dont need to sync. > But this is a separate topic I was not talking about in-sync but rather that gc_* only appears in default/ but not in devX/. What I expect is that every default parameter can be overwritten in devX which is not true for gc_*. Which is the reason why I implemented them outside of the NDTPA_PARMS nested TLV. > I am afraid you are looking at this from the wrong angle (user space), > Thomas ;-> > The abstraction in the kernel is proper. > > To redraw that model again with the exact structure names: > > netdevice -> > L2 config stuff > config stuff > more config stuff > .. > .. > netdevice protocol config: > -> v4 specific (struct in_device) > ----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc > -> v6 specific (struct inet6_dev) > ----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc > > There are a few more items but i am leaving them out for brevity. > I hope it makes more sense now. I understand your architecture and if we follow this thought we'd have a "default" netdevice which repesents all default settings. I do agree with this architecture but the problematic question remains: Do we want parameters in "default" which are not available in devX? I think this question is what it gets down to in the end. If we say, yes we do want this, then we can implement all generic settings, such as tcp_, using this scheme as well. I don't disagree with this completely but I find it not very intuitive from a user perspective. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-27 16:35 ` Thomas Graf @ 2005-05-28 1:42 ` jamal 2005-05-28 12:07 ` Thomas Graf 0 siblings, 1 reply; 26+ messages in thread From: jamal @ 2005-05-28 1:42 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev On Fri, 2005-27-05 at 18:35 +0200, Thomas Graf wrote: > I do NOT agree on moving gc_ into this architecture as well, > it doesn't belong there. Well, you do realize they are part of the in_dev ? ;-> > Nitpicking for a bit, although inet6_dev and in_device hold > reference to the their arp respectively ndisc parameter set, > the sysctl interface does not use this reference but stores > the ifindex of the netdevice, _which_ is correct I think > because parameter sets are _not_ limited to inetdevs in terms > of architecture but only in terms of current use. > I see a little main service header with ifindex always no different than IFA or IFLINK etc followed by appropriate TLVs which are nested. Unfortunately i still cant find the patch - i started with a different approach; my immediate interest was to get events when someone made in_device changes. BTW, this is going to be one of the main challenges since there are many paths to configure these things. > > The deafult can be overriden by devX. So they dont need to sync. > > But this is a separate topic > > I was not talking about in-sync but rather that gc_* only > appears in default/ but not in devX/. What I expect is that > every default parameter can be overwritten in devX which > is not true for gc_*. Which is the reason why I implemented > them outside of the NDTPA_PARMS nested TLV. > Well, if you look at the structure there is no reason they should really be separate; infact theres a comment: ----------- struct neigh_parms parms; /* HACK. gc_* shoul follow parms without a gap! */ int gc_interval; int gc_thresh1; int gc_thresh2; ---------- To me the abstraction is pretty clear. I would agree that the way parameters configurable from user space and some of the methods may not be the best in terms of neighbor tables organization. > I understand your architecture and if we follow this thought > we'd have a "default" netdevice which repesents all default > settings. >From looking at the code, the default stuff seems to be "hardcoded". Example in the definition arp_tbl. > I do agree with this architecture but the problematic > question remains: Do we want parameters in "default" which are > not available in devX? I think this question is what it gets > down to in the end. If we say, yes we do want this, then we > can implement all generic settings, such as tcp_, using this > scheme as well. I don't disagree with this completely but I > find it not very intuitive from a user perspective. The model like i said is clean. There are some issues i have qualms with - such as IP address arrangements and tight integration with netdevices - but those can addressed at a later time. cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-28 1:42 ` jamal @ 2005-05-28 12:07 ` Thomas Graf 2005-05-31 10:04 ` jamal 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-28 12:07 UTC (permalink / raw) To: jamal; +Cc: David S. Miller, netdev * jamal <1117244567.6251.34.camel@localhost.localdomain> 2005-05-27 21:42 > On Fri, 2005-27-05 at 18:35 +0200, Thomas Graf wrote: > > > I do NOT agree on moving gc_ into this architecture as well, > > it doesn't belong there. > > Well, you do realize they are part of the in_dev ? ;-> Huh? They cannot be device specific, there is only one gc per neighbour table. So even _iff_ there was a device specific setting it wouldn't make much sense ;-> > I see a little main service header with ifindex always no different than > IFA or IFLINK etc followed by appropriate TLVs which are nested. I guess we could simply move NDTPA_PARMS into the devconfig family. > > Unfortunately i still cant find the patch - i started with a different > approach; my immediate interest was to get events when someone made > in_device changes. BTW, this is going to be one of the main challenges > since there are many paths to configure these things. Right. > Well, if you look at the structure there is no reason they should really > be separate; infact theres a comment: > ----------- > struct neigh_parms parms; > /* HACK. gc_* shoul follow parms without a gap! */ > int gc_interval; > int gc_thresh1; > int gc_thresh2; > ---------- You do realize the reason for that comment? ;-> The author of the neighbour procfs code was simply too lazy to put these into a separate place so he introduced a nasty hack. > > I understand your architecture and if we follow this thought > > we'd have a "default" netdevice which repesents all default > > settings. > > >From looking at the code, the default stuff seems to be "hardcoded". > Example in the definition arp_tbl. These are just the default values for the default parameter set. > The model like i said is clean. There are some issues i have qualms with > - such as IP address arrangements and tight integration with netdevices > - but those can addressed at a later time. What about this, we move the device specific part into the new devconfig layer but leave the main configuration, statistics, and gc parameters in RTM_NEIGHTBL? i.e. arp_cache entries 2 reachable-time 23s 993msec retransmit-time 1s key-len 4 entry-size 152 last-flush 55m 44s 572msec gc threshold 128/512/1024 interval 30s chain-position 3 hash-rand 0x69650257/0x00000003 last-rand 1m 44s 571msec ? refcnt 1 pending-queue-limit 3 proxy-delayed-queue-limit 64 ? num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3 ? min-age 1s base-reachable-time 30s stale-check-interval 1m ? initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec lookups 195 hits 190 failed 0 allocations 3 destroys 1 hash-grows 1 forced-gc-runs 0 periodic-gc-runs 910 rcv-unicast-probes 0 rcv-multicast-probes 0 Xarp_cache<eth0> reachable-time 35s 967msec retransmit-time 1s X refcnt 3 pending-queue-limit 3 proxy-delayed-queue-limit 64 X num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3 X min-age 1s base-reachable-time 30s stale-check-interval 1m X initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec Everything marked with X is clearly device specific so it will move. Everything marked with '?' is the default parameter set, we can either leave it or move it as well and put it under into a 'default' ifindex. I don't care about the latter, either is fine with me. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-28 12:07 ` Thomas Graf @ 2005-05-31 10:04 ` jamal 2005-05-31 11:42 ` Thomas Graf 0 siblings, 1 reply; 26+ messages in thread From: jamal @ 2005-05-31 10:04 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev Doesnt seem like i responded to this. On Sat, 2005-28-05 at 14:07 +0200, Thomas Graf wrote: > * jamal <1117244567.6251.34.camel@localhost.localdomain> 2005-05-27 21:42 > > On Fri, 2005-27-05 at 18:35 +0200, Thomas Graf wrote: > > > > > I do NOT agree on moving gc_ into this architecture as well, > > > it doesn't belong there. > > > > Well, you do realize they are part of the in_dev ? ;-> > > Huh? They cannot be device specific, there is only one gc > per neighbour table. So even _iff_ there was a device specific > setting it wouldn't make much sense ;-> > I keep forgeting some of these tables are system wide. But i think it doesnt matter as long as we comply to the abstration thats in the kernel; in other words, config access is still via the in_devxx. i.e if you supply any ifindex (since all devices _at the moment_ point to the same ARP/ndisc table), it can be accessed and configured. > > I see a little main service header with ifindex always no different than > > IFA or IFLINK etc followed by appropriate TLVs which are nested. > > I guess we could simply move NDTPA_PARMS into the devconfig family. Probably a good start. This would still be missing the gc thresholds etc, no? > > Well, if you look at the structure there is no reason they should really > > be separate; infact theres a comment: > > ----------- > > struct neigh_parms parms; > > /* HACK. gc_* shoul follow parms without a gap! */ > > int gc_interval; > > int gc_thresh1; > > int gc_thresh2; > > ---------- > > You do realize the reason for that comment? ;-> The author of > the neighbour procfs code was simply too lazy to put these > into a separate place so he introduced a nasty hack. > I think you are correct. > What about this, we move the device specific part into the new devconfig > layer but leave the main configuration, statistics, and gc parameters > in RTM_NEIGHTBL? i.e. > > arp_cache entries 2 reachable-time 23s 993msec retransmit-time 1s > key-len 4 entry-size 152 last-flush 55m 44s 572msec > gc threshold 128/512/1024 interval 30s chain-position 3 > hash-rand 0x69650257/0x00000003 last-rand 1m 44s 571msec > ? refcnt 1 pending-queue-limit 3 proxy-delayed-queue-limit 64 > ? num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3 > ? min-age 1s base-reachable-time 30s stale-check-interval 1m > ? initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec > lookups 195 hits 190 failed 0 allocations 3 destroys 1 > hash-grows 1 forced-gc-runs 0 periodic-gc-runs 910 > rcv-unicast-probes 0 rcv-multicast-probes 0 > > Xarp_cache<eth0> reachable-time 35s 967msec retransmit-time 1s > X refcnt 3 pending-queue-limit 3 proxy-delayed-queue-limit 64 > X num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3 > X min-age 1s base-reachable-time 30s stale-check-interval 1m > X initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec > > > Everything marked with X is clearly device specific so it will move. > Everything marked with '?' is the default parameter set, we can either > leave it or move it as well and put it under into a 'default' ifindex. > I don't care about the latter, either is fine with me. > All "stuffs" (your bases is mine) accessible via in_devxx abstraction should be devconfig configurable. I do concur that some of it may not make sense - but that should be the exception rules... Ok, lets say these tables are one such exception, but note: In the future there could be multiple ARP tables for example (very likely given all the virtualization approaches happening). In other words different devices may point to different tables... cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-31 10:04 ` jamal @ 2005-05-31 11:42 ` Thomas Graf 2005-05-31 12:48 ` jamal 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-31 11:42 UTC (permalink / raw) To: jamal; +Cc: David S. Miller, netdev * jamal <1117533847.6134.32.camel@localhost.localdomain> 2005-05-31 06:04 > Probably a good start. This would still be missing the gc thresholds > etc, no? Yes, this only includes the parts that I marked with 'X' and '?' below. > > What about this, we move the device specific part into the new devconfig > > layer but leave the main configuration, statistics, and gc parameters > > in RTM_NEIGHTBL? i.e. > > > > arp_cache entries 2 reachable-time 23s 993msec retransmit-time 1s > > key-len 4 entry-size 152 last-flush 55m 44s 572msec > > gc threshold 128/512/1024 interval 30s chain-position 3 > > hash-rand 0x69650257/0x00000003 last-rand 1m 44s 571msec > > ? refcnt 1 pending-queue-limit 3 proxy-delayed-queue-limit 64 > > ? num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3 > > ? min-age 1s base-reachable-time 30s stale-check-interval 1m > > ? initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec > > lookups 195 hits 190 failed 0 allocations 3 destroys 1 > > hash-grows 1 forced-gc-runs 0 periodic-gc-runs 910 > > rcv-unicast-probes 0 rcv-multicast-probes 0 > > > > Xarp_cache<eth0> reachable-time 35s 967msec retransmit-time 1s > > X refcnt 3 pending-queue-limit 3 proxy-delayed-queue-limit 64 > > X num-userspace-probes 0 num-unicast-probes 3 num-multicast-probes 3 > > X min-age 1s base-reachable-time 30s stale-check-interval 1m > > X initial-probe-delay 5s answer-delay 1s proxy-answer-delay 800msec > > > > > > Everything marked with X is clearly device specific so it will move. > > Everything marked with '?' is the default parameter set, we can either > > leave it or move it as well and put it under into a 'default' ifindex. > > I don't care about the latter, either is fine with me. > > > > All "stuffs" (your bases is mine) accessible via in_devxx abstraction > should be devconfig configurable. I do concur that some of it may not > make sense - but that should be the exception rules... > Ok, lets say these tables are one such exception, but note: > In the future there could be multiple ARP tables for example (very > likely given all the virtualization approaches happening). In other > words different devices may point to different tables... Let's assume for a moment that there are no device specific parameters, just the default parameter set, e.g everything in the "arp_cache" block. According do your idea, everything would be part of a faked in_dev entry with ifindex==0 (or alike). We'd have the exact same issues as with RTM_NEIGHTBL, the requirement of unique table ids stays the same. There is one big difference though, what if we ever have neighbour tables which do not care about inet devices at all? Maybe some kind of new virtual devices for in-kernel communication? ;-> Sounds scary and not realistic, but my point is basically that neighbour tables itself have a global scope, the assignment to the device and device specific parameters is on the level of the neighbours itself. The connection to in_dev/... is one level higher with a scope of the relevant neighbour table. A neighbour table implementation is not required to support device specific parameters, the code neighbour code will always assign the default parameter set and leaves it up to the constructor of the implementation level to reassign a device specific parameter set. Which means that what the procfs and now also neightbl code does it kind of a hack anyway since the core neighbour code is not aware of device specific stuff anyway, all it does is provide an easy method to manage them in a efficient matter. I would like to stay consistent to this architecture if possible to be able to follow every new additions. I hope my point is now clear, I wasn't so sure until now ;-> However, I think you have a good point, parameter sets have a strong relation to inet devices at the moment. If you still think that we should move _everything_ into a devconfig layer then I'll do it but I still support my initial prospoal. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-31 11:42 ` Thomas Graf @ 2005-05-31 12:48 ` jamal 2005-05-31 13:17 ` Thomas Graf 0 siblings, 1 reply; 26+ messages in thread From: jamal @ 2005-05-31 12:48 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev On Tue, 2005-31-05 at 13:42 +0200, Thomas Graf wrote: > > All "stuffs" (your bases is mine) accessible via in_devxx abstraction > > should be devconfig configurable. I do concur that some of it may not > > make sense - but that should be the exception rules... > > Ok, lets say these tables are one such exception, but note: > > In the future there could be multiple ARP tables for example (very > > likely given all the virtualization approaches happening). In other > > words different devices may point to different tables... > > Let's assume for a moment that there are no device specific parameters, > just the default parameter set, e.g everything in the "arp_cache" block. > According do your idea, everything would be part of a faked in_dev entry > with ifindex==0 (or alike). We'd have the exact same issues as with > RTM_NEIGHTBL, the requirement of unique table ids stays the same. There > is one big difference though, what if we ever have neighbour tables which > do not care about inet devices at all? Maybe some kind of new virtual > devices for in-kernel communication? ;-> Sounds scary and not realistic, An extra virtual device just to store the defaults will be overkill. OTOH, a global allocated structure for defaults is ok. I think thats the way it is done today. > but my point is basically that neighbour tables itself have a global scope, > the assignment to the device and device specific parameters is on the > level of the neighbours itself. The connection to in_dev/... is one > level higher with a scope of the relevant neighbour table. A neighbour > table implementation is not required to support device specific > parameters, the code neighbour code will always assign the default > parameter set and leaves it up to the constructor of the implementation > level to reassign a device specific parameter set. Which means that what > the procfs and now also neightbl code does it kind of a hack anyway > since the core neighbour code is not aware of device specific stuff > anyway, all it does is provide an easy method to manage them in a > efficient matter. The neighbor code ought not be aware of devices; but whoever is configuring must be. By configuring i mean "stiching together" the different blocks (as in some netlink based program in user space). the idevxxx is a "stiching" construct - and therefore it may be aware of many other things that a standalone block doesnt. One could argue that the ifindex in a netdevice is also under the same class. The ARP, NDISC, netdevice, filters etc are the blocks being sown together. Most of the times these blocks are built together in a topology that a packet flows through. > I would like to stay consistent to this architecture > if possible to be able to follow every new additions. > > I hope my point is now clear, I wasn't so sure until now ;-> However, > I think you have a good point, parameter sets have a strong relation > to inet devices at the moment. If you still think that we should move > _everything_ into a devconfig layer then I'll do it but I still support > my initial prospoal. > I agree that we should leave the neighbor table-specific knobs out of devconfig. It will help, though, for devconfig to have a single reference to the "instance" of the table(perhaps a name or a even a 32 bit pointer address) that a device uses (thefore maintaining the abstraction); but since there is only one table for v4 and v6 anyways i suppose it is implicitly assumed to be that one table. cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-31 12:48 ` jamal @ 2005-05-31 13:17 ` Thomas Graf 2005-05-31 14:59 ` Jamal Hadi Salim 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-31 13:17 UTC (permalink / raw) To: jamal; +Cc: David S. Miller, netdev * jamal <1117543711.6134.48.camel@localhost.localdomain> 2005-05-31 08:48 > The neighbor code ought not be aware of devices; but whoever is > configuring must be. By configuring i mean "stiching together" the > different blocks (as in some netlink based program in user space). the > idevxxx is a "stiching" construct - and therefore it may be aware of > many other things that a standalone block doesnt. One could argue that > the ifindex in a netdevice is also under the same class. The ARP, NDISC, > netdevice, filters etc are the blocks being sown together. > Most of the times these blocks are built together in a topology that a > packet flows through. OK, so your idevxxx is a kind of container for various configurable blocks that logically belong to a inetdevice and a device specific parameter set for a neighbour table would one of these blocks? I think this is appropriate for NDTA_PARMS, e.g. there might be multiple of such neighbour table parameter blocks per inetdevice identified by the name of the neighbour table. You certainly have a good point there, I see two minor drawbacks though: a) where do we put these parameter blocks if it doesn't belong to a inetdevice? and b) it makes collecting the complete configuration of a neighbour table complicated. Nevertheless, I think I can agree on this. > I agree that we should leave the neighbor table-specific knobs out of > devconfig. OK, so we leave RTM_NEIGHTBL_* as-is but move the device specific parameter sets into devconfig? i.e. RTM_NEIGHTBL_* will cover: * the core neighbour table configuration (key-len, entry-size, last-flush, gc thresholds, gc interval, hash settings, etc.) * the default parameter set * neighbour table statistics > It will help, though, for devconfig to have a single reference to the > "instance" of the table(perhaps a name or a even a 32 bit pointer > address) that a device uses (thefore maintaining the abstraction); but > since there is only one table for v4 and v6 anyways i suppose it is > implicitly assumed to be that one table. This kind of breaks the architecture, i'd rather have it the other way around, i.e. we hold a netdevice reference in each parameter set and when lookup a parameter set for a specific idevxxx we iterate through all neighbour tables and their paremter sets and compare ifindex. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-31 13:17 ` Thomas Graf @ 2005-05-31 14:59 ` Jamal Hadi Salim 2005-05-31 16:13 ` Thomas Graf 0 siblings, 1 reply; 26+ messages in thread From: Jamal Hadi Salim @ 2005-05-31 14:59 UTC (permalink / raw) To: Thomas Graf; +Cc: netdev, David S. Miller On Tue, 2005-31-05 at 15:17 +0200, Thomas Graf wrote: > > OK, so your idevxxx is a kind of container for various configurable > blocks that logically belong to a inetdevice and a device specific > parameter set for a neighbour table would one of these blocks? > Essentialy just in_device and inet6_dev > I think this is appropriate for NDTA_PARMS, e.g. there might be > multiple of such neighbour table parameter blocks per inetdevice > identified by the name of the neighbour table. You certainly have > a good point there, I see two minor drawbacks though: a) where > do we put these parameter blocks if it doesn't belong to a > inetdevice? and b) it makes collecting the complete configuration > of a neighbour table complicated. Nevertheless, I think I can > agree on this. > To recap: netdevice -> L2 config stuff config stuff more config stuff even more config stuff .. .. netdevice protocol config: -> v4 specific (struct in_device) ----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc -> v6 specific (struct inet6_dev) ----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc I think we are agreeing _not_ to configure ifa_list, arp_parms etc via devconfig. Leave these to their already existing config paths. In the case of devconfig: What i am suggesting is to put some identifier that can be used to access lets say one or more arp_parms. This way when i dump the devconfig i should be able to see "connected to arp table 1". I should then be able to say dump arp table 1. If it is too complex an idea we can drop it. The defaults should stay as you say (and as they exist today) in the arp block. > > I agree that we should leave the neighbor table-specific knobs out of > > devconfig. > > OK, so we leave RTM_NEIGHTBL_* as-is but move the device specific > parameter sets into devconfig? i.e. RTM_NEIGHTBL_* will cover: > > * the core neighbour table configuration (key-len, entry-size, > last-flush, gc thresholds, gc interval, hash settings, etc.) > * the default parameter set > * neighbour table statistics > Sounds reasonable. And per device arp/ndisc is of course one of many configurable parameters via devconfig. > > > It will help, though, for devconfig to have a single reference to the > > "instance" of the table(perhaps a name or a even a 32 bit pointer > > address) that a device uses (thefore maintaining the abstraction); but > > since there is only one table for v4 and v6 anyways i suppose it is > > implicitly assumed to be that one table. > > This kind of breaks the architecture, i'd rather have it the other > way around, i.e. we hold a netdevice reference in each parameter set > and when lookup a parameter set for a specific idevxxx we iterate > through all neighbour tables and their paremter sets and compare > ifindex. The directionality as i see it is: netdevice --> arp/ndisc table(s) [And the binding/stiching of those tables is via inxx_devyy]. I dont have issues with arp pointing to netdevice as well, but I thought ARP/ndisc already have ifindex reference? cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-31 14:59 ` Jamal Hadi Salim @ 2005-05-31 16:13 ` Thomas Graf 2005-06-02 13:33 ` jamal 0 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-31 16:13 UTC (permalink / raw) To: Jamal Hadi Salim; +Cc: netdev, David S. Miller * Jamal Hadi Salim <1117551561.6279.2.camel@localhost.localdomain> 2005-05-31 10:59 > netdevice -> > L2 config stuff > config stuff > more config stuff > even more config stuff > .. > .. > netdevice protocol config: > -> v4 specific (struct in_device) > ----> IPV4 addresses (ifa_list), ARP params(arp_parms),etc > -> v6 specific (struct inet6_dev) > ----> IPV6 addresses(addr_list), ndisc params (nd_parms), etc Absolutely. > I think we are agreeing _not_ to configure ifa_list, arp_parms etc > via devconfig. Leave these to their already existing config paths. arp_parms is just a reference to the corresponding device specific parameter set of the "arp_cache" neighbour table. Maybe I've been misunderstanding you throughout all your posts or maybe I've a wrong understanding of the neighbour code in general. As far as I understand it: 1) The actual neighbour implementation registers a new neighbour table by calling neigh_table_init() with the proposed default configuration for this table. This also includes the default parameter set used to derive device specific parameter sets later on, e.g. arp_tbl. 2) Upon the creation of a inet_device/... a new parameter set is allocated that derives the initial parameter values from the default parameter set. This parameter set is stored in in_dev->arp_parms and linked in a list of parameter sets in the neighbour table (neightbl->parms). 3) When a new neighbour is created the core neighbour code will assign the default parameter set of the table to the neighbour and increments its refcnt. It then calls the constructor of arp,ndisc,... which replaces the parameter set just assigned with the one in in_dev->arp_parms which is the parameter set. So what I propose is to have the neighbour table parameters, e.g. everything in arp_tbl be distributed over RTM_NEIGHTBL and put the device specific parameters into devconfig, e.g. in_dev->arp_parms. > In the case of devconfig: > What i am suggesting is to put some identifier that can be used > to access lets say one or more arp_parms. This way when i dump > the devconfig i should be able to see "connected to arp table 1". > I should then be able to say dump arp table 1. OK, this is possible with either way. I don't worry about this. > The directionality as i see it is: > > netdevice --> arp/ndisc table(s) > [And the binding/stiching of those tables is via inxx_devyy]. Absolutely, more specific: netdevice -> inet_device -> parameter set -> neighbour table or: neighbour table -> list of parameter sets -> netdevice both ways are possible right now. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink 2005-05-31 16:13 ` Thomas Graf @ 2005-06-02 13:33 ` jamal 0 siblings, 0 replies; 26+ messages in thread From: jamal @ 2005-06-02 13:33 UTC (permalink / raw) To: Thomas Graf; +Cc: David S. Miller, netdev On Tue, 2005-31-05 at 18:13 +0200, Thomas Graf wrote: [..] > So what I propose is to have the neighbour table parameters, > e.g. everything in arp_tbl be distributed over RTM_NEIGHTBL > and put the device specific parameters into devconfig, > e.g. in_dev->arp_parms. > Right, this is what i am saying a well. The only caveat i was pointing out is that the devconfig piece is more than just the neighbor stuff - and of course it hasnt been written, yet;-> The major challenge will be events - some change via /proc, sysfs etc should generate event. I suggest something along usage of notifier_block with something like NETDEV_CONFIG to transport these things around. Damn, if only i can find my patch .... I had already started doing events based on changes from /proc or sysctl etc. > Absolutely, more specific: > > netdevice -> inet_device -> parameter set -> neighbour table > or: > neighbour table -> list of parameter sets -> netdevice > > both ways are possible right now. Sounds good to me. cheers, jamal ^ permalink raw reply [flat|nested] 26+ messages in thread
* [NEIGH] Remove unused fields in struct neigh_parms and neigh_table 2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf ` (2 preceding siblings ...) 2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf @ 2005-05-26 18:55 ` Thomas Graf 2005-05-26 19:00 ` Thomas Graf 3 siblings, 1 reply; 26+ messages in thread From: Thomas Graf @ 2005-05-26 18:55 UTC (permalink / raw) To: David S. Miller; +Cc: netdev Signed-off-by: Thomas Graf <tgraf@suug.ch> --- commit 45b866033b439e510f3638ce9ac0990446b8b3cd tree 72baef0bb9b97ac2990bcca05b8f831b30303008 parent 3688d656f38445cc120b0970cd437cad1bb929dd author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200 committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200 include/net/neighbour.h | 3 --- 1 files changed, 3 deletions(-) Index: include/net/neighbour.h =================================================================== --- 2fbcbf92bed82aaef81b2bcc9baecafd61dfb0ce/include/net/neighbour.h (mode:100644) +++ 72baef0bb9b97ac2990bcca05b8f831b30303008/include/net/neighbour.h (mode:100644) @@ -69,8 +69,6 @@ struct neigh_parms *next; int (*neigh_setup)(struct neighbour *); struct neigh_table *tbl; - int entries; - void *priv; void *sysctl_table; @@ -193,7 +191,6 @@ atomic_t entries; rwlock_t lock; unsigned long last_rand; - struct neigh_parms *parms_list; kmem_cache_t *kmem_cachep; struct neigh_statistics *stats; struct neighbour **hash_buckets; ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [NEIGH] Remove unused fields in struct neigh_parms and neigh_table 2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf @ 2005-05-26 19:00 ` Thomas Graf 0 siblings, 0 replies; 26+ messages in thread From: Thomas Graf @ 2005-05-26 19:00 UTC (permalink / raw) To: David S. Miller; +Cc: netdev * Thomas Graf <20050526185556.GA15391@postel.suug.ch> 2005-05-26 20:55 > > Signed-off-by: Thomas Graf <tgraf@suug.ch> > > --- > commit 45b866033b439e510f3638ce9ac0990446b8b3cd > tree 72baef0bb9b97ac2990bcca05b8f831b30303008 > parent 3688d656f38445cc120b0970cd437cad1bb929dd > author Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200 > committer Thomas Graf <tgraf@suug.ch> Thu, 26 May 2005 20:42:10 +0200 This is patch 4 if it's not clear. I took the wrong patch and my script couldn't find the parent sha1 and ommited the [PATCH 4/4]. Sorry. ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2005-06-02 13:33 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-26 18:53 [PATCHSET] neighbour tables access via rtnetlink Thomas Graf
2005-05-26 18:54 ` [PATCH 1/4] [NETLINK] New message building macros Thomas Graf
2005-05-26 18:54 ` [PATCH 2/4] [RTNETLINK] Routing attribute related shortcuts Thomas Graf
2005-05-26 18:55 ` [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink Thomas Graf
2005-05-26 22:17 ` David S. Miller
2005-05-26 22:24 ` Thomas Graf
2005-05-26 22:26 ` Thomas Graf
2005-05-26 22:37 ` David S. Miller
2005-05-27 11:14 ` jamal
2005-05-27 12:15 ` Thomas Graf
2005-05-27 13:50 ` jamal
[not found] ` <20050527141023.GP15391@postel.suug.ch>
2005-05-27 14:57 ` jamal
2005-05-27 15:16 ` Thomas Graf
2005-05-27 15:56 ` jamal
2005-05-27 16:35 ` Thomas Graf
2005-05-28 1:42 ` jamal
2005-05-28 12:07 ` Thomas Graf
2005-05-31 10:04 ` jamal
2005-05-31 11:42 ` Thomas Graf
2005-05-31 12:48 ` jamal
2005-05-31 13:17 ` Thomas Graf
2005-05-31 14:59 ` Jamal Hadi Salim
2005-05-31 16:13 ` Thomas Graf
2005-06-02 13:33 ` jamal
2005-05-26 18:55 ` [NEIGH] Remove unused fields in struct neigh_parms and neigh_table Thomas Graf
2005-05-26 19:00 ` Thomas Graf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).