* [RFC V2 PATCH 16/25] net/netpolicy: introduce per socket netpolicy
From: kan.liang @ 2015-01-01 1:39 UTC
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
The network socket is the most basic unit that controls network
traffic. This patch introduces a new socket option, SO_NETPOLICY, to
set and get the net policy of a socket, so that an application can set
its own policy on a socket to improve network performance.
The per-socket net policy is also inherited by new sockets cloned from it.
The SO_NETPOLICY socket option is used as follows:
setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
getsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,&len)
where len is a socklen_t initialized to sizeof(int).
The policy set by the SO_NETPOLICY socket option must be valid and
compatible with the current device policy. Otherwise, the call fails
and the socket policy is set to NET_POLICY_INVALID.
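The usage above can be sketched from userspace as follows. This is only an illustration under stated assumptions: the value 54 for SO_NETPOLICY is taken from the asm-generic hunk of this patch (parisc and sparc use other values, and a kernel without this series will reject the option), the enum mirrors the one defined elsewhere in the series, and the helper names netpolicy_set() and netpolicy_get() are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>

/* Assumption: value from the asm-generic hunk of this patch. It is not
 * part of any released kernel header, so define it locally if missing. */
#ifndef SO_NETPOLICY
#define SO_NETPOLICY 54
#endif

/* Mirrors enum netpolicy_name as defined in this series. */
enum netpolicy_name {
    NET_POLICY_INVALID = -1,
    NET_POLICY_NONE = 0,
    NET_POLICY_CPU,
    NET_POLICY_BULK,
    NET_POLICY_LATENCY,
    NET_POLICY_MAX,
};

/* Hypothetical helper: request a policy for sockfd.
 * Returns 0 on success or a negative errno value. */
int netpolicy_set(int sockfd, int policy)
{
    if (setsockopt(sockfd, SOL_SOCKET, SO_NETPOLICY,
                   &policy, sizeof(policy)) < 0)
        return -errno;
    return 0;
}

/* Hypothetical helper: read the policy back.
 * getsockopt() takes a pointer to the length, not the length itself. */
int netpolicy_get(int sockfd, int *policy)
{
    socklen_t len = sizeof(*policy);

    if (getsockopt(sockfd, SOL_SOCKET, SO_NETPOLICY, policy, &len) < 0)
        return -errno;
    return 0;
}
```

A caller would typically invoke netpolicy_set(fd, NET_POLICY_LATENCY) right after socket() and treat a negative return as "policy not applied", falling back to default behaviour.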
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
arch/alpha/include/uapi/asm/socket.h | 2 ++
arch/avr32/include/uapi/asm/socket.h | 2 ++
arch/frv/include/uapi/asm/socket.h | 2 ++
arch/ia64/include/uapi/asm/socket.h | 2 ++
arch/m32r/include/uapi/asm/socket.h | 2 ++
arch/mips/include/uapi/asm/socket.h | 2 ++
arch/mn10300/include/uapi/asm/socket.h | 2 ++
arch/parisc/include/uapi/asm/socket.h | 2 ++
arch/powerpc/include/uapi/asm/socket.h | 2 ++
arch/s390/include/uapi/asm/socket.h | 2 ++
arch/sparc/include/uapi/asm/socket.h | 2 ++
arch/xtensa/include/uapi/asm/socket.h | 2 ++
include/net/request_sock.h | 4 +++-
include/net/sock.h | 9 +++++++++
include/uapi/asm-generic/socket.h | 2 ++
net/core/sock.c | 28 ++++++++++++++++++++++++++++
16 files changed, 66 insertions(+), 1 deletion(-)
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..06b2ef9 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..24f85f0 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..82c8d44 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..b99c1df 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..71a43ed 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 2027240a..ce8b9ba 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..c041265 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..2639dcd 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
#define SO_CNX_ADVICE 0x402E
+#define SO_NETPOLICY 0x402F
+
#endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..e04e3b6 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 41b51c2..d43b854 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 31aede3..94a2cdf 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -86,6 +86,8 @@
#define SO_CNX_ADVICE 0x0037
+#define SO_NETPOLICY 0x0038
+
/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 0x5001
#define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 81435d9..97f1691 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -101,4 +101,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* _XTENSA_SOCKET_H */
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 6ebe13e..1fa2d0e 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -101,7 +101,9 @@ reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
sk_tx_queue_clear(req_to_sk(req));
req->saved_syn = NULL;
atomic_set(&req->rsk_refcnt, 0);
-
+#ifdef CONFIG_NETPOLICY
+ memcpy(&req_to_sk(req)->sk_netpolicy, &sk_listener->sk_netpolicy, sizeof(sk_listener->sk_netpolicy));
+#endif
return req;
}
diff --git a/include/net/sock.h b/include/net/sock.h
index ff5be7e..fd4132f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -70,6 +70,7 @@
#include <net/checksum.h>
#include <net/tcp_states.h>
#include <linux/net_tstamp.h>
+#include <linux/netpolicy.h>
/*
* This structure really needs to be cleaned up.
@@ -141,6 +142,7 @@ typedef __u64 __bitwise __addrpair;
* %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
* @skc_incoming_cpu: record/match cpu processing incoming packets
* @skc_refcnt: reference count
+ * @skc_netpolicy: per socket net policy
*
* This is the minimal network layer representation of sockets, the header
* for struct sock and struct inet_timewait_sock.
@@ -200,6 +202,10 @@ struct sock_common {
struct sock *skc_listener; /* request_sock */
struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
};
+
+#ifdef CONFIG_NETPOLICY
+ struct netpolicy_instance skc_netpolicy;
+#endif
/*
* fields between dontcopy_begin/dontcopy_end
* are not copied in sock_copy()
@@ -339,6 +345,9 @@ struct sock {
#define sk_incoming_cpu __sk_common.skc_incoming_cpu
#define sk_flags __sk_common.skc_flags
#define sk_rxhash __sk_common.skc_rxhash
+#ifdef CONFIG_NETPOLICY
+#define sk_netpolicy __sk_common.skc_netpolicy
+#endif
socket_lock_t sk_lock;
struct sk_buff_head sk_receive_queue;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 67d632f..d2a5aeb 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -92,4 +92,6 @@
#define SO_CNX_ADVICE 53
+#define SO_NETPOLICY 54
+
#endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 25dab8b..77f226b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1003,6 +1003,12 @@ set_rcvbuf:
if (val == 1)
dst_negative_advice(sk);
break;
+
+#ifdef CONFIG_NETPOLICY
+ case SO_NETPOLICY:
+ ret = netpolicy_register(&sk->sk_netpolicy, val);
+ break;
+#endif
default:
ret = -ENOPROTOOPT;
break;
@@ -1263,6 +1269,11 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
v.val = sk->sk_incoming_cpu;
break;
+#ifdef CONFIG_NETPOLICY
+ case SO_NETPOLICY:
+ v.val = sk->sk_netpolicy.policy;
+ break;
+#endif
default:
/* We implement the SO_SNDLOWAT etc to not be settable
* (1003.1g 7).
@@ -1424,6 +1435,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
sock_update_classid(&sk->sk_cgrp_data);
sock_update_netprioidx(&sk->sk_cgrp_data);
+
+#ifdef CONFIG_NETPOLICY
+ sk->sk_netpolicy.dev = NULL;
+ sk->sk_netpolicy.ptr = (void *)sk;
+ sk->sk_netpolicy.policy = NET_POLICY_INVALID;
+#endif
}
return sk;
@@ -1461,6 +1478,10 @@ static void __sk_destruct(struct rcu_head *head)
put_pid(sk->sk_peer_pid);
if (likely(sk->sk_net_refcnt))
put_net(sock_net(sk));
+#ifdef CONFIG_NETPOLICY
+ if (is_net_policy_valid(sk->sk_netpolicy.policy))
+ netpolicy_unregister(&sk->sk_netpolicy);
+#endif
sk_prot_free(sk->sk_prot_creator, sk);
}
@@ -1597,6 +1618,13 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
if (sock_needs_netstamp(sk) &&
newsk->sk_flags & SK_FLAGS_TIMESTAMP)
net_enable_timestamp();
+
+#ifdef CONFIG_NETPOLICY
+ newsk->sk_netpolicy.ptr = (void *)newsk;
+ if (is_net_policy_valid(newsk->sk_netpolicy.policy))
+ netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
+
+#endif
}
out:
return newsk;
--
2.5.5
* [RFC V2 PATCH 15/25] net/netpolicy: implement netpolicy register
From: kan.liang @ 2015-01-01 1:39 UTC
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
A socket/task can only benefit from NET policy once it has registered
itself with a specific policy. On first registration, a record is
created and inserted into an RCU hash table. The record includes ptr,
policy and object information. ptr is the socket/task's pointer, which is
used as the key to look up the record in the hash table. The object is
assigned later.
This patch also introduces a new type, NET_POLICY_INVALID, which
indicates that the task/socket is not registered.
np_hashtable_lock is introduced to protect the hash table.
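The register/unregister flow described above can be modelled in plain userspace C. This is a simplified sketch, not the kernel code: it uses a small chained hash table instead of the kernel's RCU hashtable, omits locking (the patch uses np_hashtable_lock) and object bookkeeping, and the names np_register(), np_unregister(), np_lookup() and np_table are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define NP_HASH_SIZE 16u /* must be a power of two */

enum netpolicy_name {
    NET_POLICY_INVALID = -1,
    NET_POLICY_NONE = 0,
    NET_POLICY_CPU,
    NET_POLICY_BULK,
    NET_POLICY_LATENCY,
    NET_POLICY_MAX,
};

struct np_record {
    struct np_record *next;
    uintptr_t ptr_id;            /* owner pointer, used as the key */
    enum netpolicy_name policy;
};

static struct np_record *np_table[NP_HASH_SIZE];

/* Return the link that points at the record for id, or the
 * terminating NULL link of its bucket when there is none. */
static struct np_record **np_find_slot(uintptr_t id)
{
    struct np_record **pp = &np_table[(id >> 4) & (NP_HASH_SIZE - 1)];

    while (*pp && (*pp)->ptr_id != id)
        pp = &(*pp)->next;
    return pp;
}

/* First registration creates a record; later ones only update it. */
int np_register(const void *owner, enum netpolicy_name policy)
{
    struct np_record **pp, *rec;

    if (policy <= NET_POLICY_INVALID || policy >= NET_POLICY_MAX)
        return -1;

    pp = np_find_slot((uintptr_t)owner);
    if (*pp) {
        (*pp)->policy = policy;
        return 0;
    }

    rec = calloc(1, sizeof(*rec));
    if (!rec)
        return -1;
    rec->ptr_id = (uintptr_t)owner;
    rec->policy = policy;
    *pp = rec; /* append to the end of the bucket chain */
    return 0;
}

/* Remove the owner's record, if any. */
void np_unregister(const void *owner)
{
    struct np_record **pp = np_find_slot((uintptr_t)owner);
    struct np_record *rec = *pp;

    if (rec) {
        *pp = rec->next;
        free(rec);
    }
}

/* NET_POLICY_INVALID means "not registered". */
enum netpolicy_name np_lookup(const void *owner)
{
    struct np_record *rec = *np_find_slot((uintptr_t)owner);

    return rec ? rec->policy : NET_POLICY_INVALID;
}
```

Keying on the owner's address keeps the lookup independent of the socket or task type, which is the same reason the patch stores ptr as an unsigned long.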
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netpolicy.h | 26 ++++++++
net/core/netpolicy.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 179 insertions(+)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index cc75e3c..5900252 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -17,6 +17,7 @@
#define __LINUX_NETPOLICY_H
enum netpolicy_name {
+ NET_POLICY_INVALID = -1,
NET_POLICY_NONE = 0,
NET_POLICY_CPU,
NET_POLICY_BULK,
@@ -79,12 +80,37 @@ struct netpolicy_info {
struct list_head obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
};
+struct netpolicy_instance {
+ struct net_device *dev;
+ enum netpolicy_name policy; /* required policy */
+ void *ptr; /* pointers */
+};
+
+/* check if policy is valid */
+static inline int is_net_policy_valid(enum netpolicy_name policy)
+{
+ return ((policy < NET_POLICY_MAX) && (policy > NET_POLICY_INVALID));
+}
+
#ifdef CONFIG_NETPOLICY
extern void update_netpolicy_sys_map(void);
+extern int netpolicy_register(struct netpolicy_instance *instance,
+ enum netpolicy_name policy);
+extern void netpolicy_unregister(struct netpolicy_instance *instance);
#else
static inline void update_netpolicy_sys_map(void)
{
}
+
+static inline int netpolicy_register(struct netpolicy_instance *instance,
+ enum netpolicy_name policy)
+{ return 0;
+}
+
+static inline void netpolicy_unregister(struct netpolicy_instance *instance)
+{
+}
+
#endif
#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7579685..3605761 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -38,6 +38,19 @@
#include <linux/sort.h>
#include <linux/ctype.h>
#include <linux/cpu.h>
+#include <linux/hashtable.h>
+
+struct netpolicy_record {
+ struct hlist_node hash_node;
+ unsigned long ptr_id;
+ enum netpolicy_name policy;
+ struct net_device *dev;
+ struct netpolicy_object *rx_obj;
+ struct netpolicy_object *tx_obj;
+};
+
+static DEFINE_HASHTABLE(np_record_hash, 10);
+static DEFINE_SPINLOCK(np_hashtable_lock);
static int netpolicy_get_dev_info(struct net_device *dev,
struct netpolicy_dev_info *d_info)
@@ -223,6 +236,143 @@ static int netpolicy_enable(struct net_device *dev)
return 0;
}
+static struct netpolicy_record *netpolicy_record_search(unsigned long ptr_id)
+{
+ struct netpolicy_record *rec = NULL;
+
+ hash_for_each_possible_rcu(np_record_hash, rec, hash_node, ptr_id) {
+ if (rec->ptr_id == ptr_id)
+ break;
+ }
+
+ return rec;
+}
+
+static void put_queue(struct net_device *dev,
+ struct netpolicy_object *rx_obj,
+ struct netpolicy_object *tx_obj)
+{
+ if (!dev || !dev->netpolicy)
+ return;
+
+ if (rx_obj)
+ atomic_dec(&rx_obj->refcnt);
+ if (tx_obj)
+ atomic_dec(&tx_obj->refcnt);
+}
+
+static void netpolicy_record_clear_obj(void)
+{
+ struct netpolicy_record *rec;
+ int i;
+
+ spin_lock_bh(&np_hashtable_lock);
+ hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+ put_queue(rec->dev, rec->rx_obj, rec->tx_obj);
+ rec->rx_obj = NULL;
+ rec->tx_obj = NULL;
+ }
+ spin_unlock_bh(&np_hashtable_lock);
+}
+
+static void netpolicy_record_clear_dev_node(struct net_device *dev)
+{
+ struct netpolicy_record *rec;
+ int i;
+
+ spin_lock_bh(&np_hashtable_lock);
+ hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+ if (rec->dev == dev) {
+ hash_del_rcu(&rec->hash_node);
+ kfree(rec);
+ }
+ }
+ spin_unlock_bh(&np_hashtable_lock);
+}
+
+/**
+ * netpolicy_register() - Register per socket/task policy request
+ * @instance: NET policy per socket/task instance info
+ * @policy: request NET policy
+ *
+ * This function intends to register per socket/task policy request.
+ * If it's the first time to register, a record will be created and
+ * inserted into RCU hash table.
+ *
+ * The record includes ptr, policy and object info. ptr of the socket/task
+ * is the key to search the record in hash table. Object will be assigned
+ * until the first packet is received/transmitted.
+ *
+ * Return: 0 on success, others on failure
+ */
+int netpolicy_register(struct netpolicy_instance *instance,
+ enum netpolicy_name policy)
+{
+ unsigned long ptr_id = (uintptr_t)instance->ptr;
+ struct netpolicy_record *new, *old;
+
+ if (!is_net_policy_valid(policy)) {
+ instance->policy = NET_POLICY_INVALID;
+ return -EINVAL;
+ }
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new) {
+ instance->policy = NET_POLICY_INVALID;
+ return -ENOMEM;
+ }
+
+ spin_lock_bh(&np_hashtable_lock);
+ /* Check it in mapping table */
+ old = netpolicy_record_search(ptr_id);
+ if (old) {
+ if (old->policy != policy) {
+ put_queue(old->dev, old->rx_obj, old->tx_obj);
+ old->rx_obj = NULL;
+ old->tx_obj = NULL;
+ old->policy = policy;
+ }
+ kfree(new);
+ } else {
+ new->ptr_id = ptr_id;
+ new->dev = instance->dev;
+ new->policy = policy;
+ hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+ }
+ instance->policy = policy;
+ spin_unlock_bh(&np_hashtable_lock);
+
+ return 0;
+}
+EXPORT_SYMBOL(netpolicy_register);
+
+/**
+ * netpolicy_unregister() - Unregister per socket/task policy request
+ * @instance: NET policy per socket/task instance info
+ *
+ * This function intends to unregister policy request by del related record
+ * from hash table.
+ *
+ */
+void netpolicy_unregister(struct netpolicy_instance *instance)
+{
+ struct netpolicy_record *record;
+ unsigned long ptr_id = (uintptr_t)instance->ptr;
+
+ spin_lock_bh(&np_hashtable_lock);
+ /* del from hash table */
+ record = netpolicy_record_search(ptr_id);
+ if (record) {
+ hash_del_rcu(&record->hash_node);
+ /* The record cannot be shared, so it can be freed safely. */
+ put_queue(record->dev, record->rx_obj, record->tx_obj);
+ kfree(record);
+ }
+ instance->policy = NET_POLICY_INVALID;
+ spin_unlock_bh(&np_hashtable_lock);
+}
+EXPORT_SYMBOL(netpolicy_unregister);
+
const char *policy_name[NET_POLICY_MAX] = {
"NONE",
"CPU",
@@ -825,6 +975,7 @@ static int netpolicy_notify(struct notifier_block *this,
break;
case NETDEV_GOING_DOWN:
uninit_netpolicy(dev);
+ netpolicy_record_clear_dev_node(dev);
#ifdef CONFIG_PROC_FS
proc_remove(dev->proc_dev);
dev->proc_dev = NULL;
@@ -863,6 +1014,8 @@ void update_netpolicy_sys_map(void)
dev->netpolicy->cur_policy = NET_POLICY_NONE;
+ /* clear mapping table */
+ netpolicy_record_clear_obj();
/* rebuild everything */
netpolicy_disable(dev);
netpolicy_enable(dev);
--
2.5.5
* [RFC V2 PATCH 14/25] net/netpolicy: handle channel changes
From: kan.liang @ 2015-01-01 1:39 UTC
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
Users can use ethtool to set the number of channels. This patch handles
channel changes by rebuilding the object list.
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netpolicy.h | 8 ++++++++
net/core/ethtool.c | 8 +++++++-
net/core/netpolicy.c | 1 +
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 579ff98..cc75e3c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -79,4 +79,12 @@ struct netpolicy_info {
struct list_head obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
};
+#ifdef CONFIG_NETPOLICY
+extern void update_netpolicy_sys_map(void);
+#else
+static inline void update_netpolicy_sys_map(void)
+{
+}
+#endif
+
#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..e1f8bd0 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1703,6 +1703,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
{
struct ethtool_channels channels, max;
u32 max_rx_in_use = 0;
+ int ret;
if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
return -EOPNOTSUPP;
@@ -1726,7 +1727,12 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
(channels.combined_count + channels.rx_count) <= max_rx_in_use)
return -EINVAL;
- return dev->ethtool_ops->set_channels(dev, &channels);
+ ret = dev->ethtool_ops->set_channels(dev, &channels);
+#ifdef CONFIG_NETPOLICY
+ if (!ret)
+ update_netpolicy_sys_map();
+#endif
+ return ret;
}
static int ethtool_get_pauseparam(struct net_device *dev, void __user *useraddr)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 3b523fc..7579685 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -885,6 +885,7 @@ unlock:
}
}
}
+EXPORT_SYMBOL(update_netpolicy_sys_map);
static int netpolicy_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
--
2.5.5
* [RFC V2 PATCH 13/25] net/netpolicy: support CPU hotplug
From: kan.liang @ 2015-01-01 1:39 UTC
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
For CPU hotplug, the NET policy subsystem will rebuild the sys map and
object list.
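The rebuild step can be sketched in userspace as below. It is a model under assumptions, not the kernel implementation: struct np_dev and its rebuild hook are hypothetical stand-ins for the net_device state and for the disable/enable/regenerate sequence, and locking is left out. It shows the one property worth noting, namely that a device whose rebuild fails is left at NET_POLICY_NONE rather than at its previous policy.

```c
#include <stdbool.h>
#include <stddef.h>

enum netpolicy_name {
    NET_POLICY_INVALID = -1,
    NET_POLICY_NONE = 0,
    NET_POLICY_CPU,
    NET_POLICY_BULK,
    NET_POLICY_LATENCY,
    NET_POLICY_MAX,
};

/* Hypothetical stand-in for the per-device netpolicy state. */
struct np_dev {
    bool supported;                 /* models dev->netpolicy != NULL */
    enum netpolicy_name cur_policy;
    /* Models disable + enable + object list generation + driver call.
     * Returns 0 on success, non-zero on failure. */
    int (*rebuild)(struct np_dev *dev, enum netpolicy_name policy);
};

/* Walk all devices and rebuild those with an active policy.
 * Returns the number of devices that were rebuilt successfully. */
int np_update_sys_map(struct np_dev *devs, size_t n)
{
    int rebuilt = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        struct np_dev *dev = &devs[i];
        enum netpolicy_name saved;

        if (!dev->supported || dev->cur_policy == NET_POLICY_NONE)
            continue;

        saved = dev->cur_policy;
        dev->cur_policy = NET_POLICY_NONE; /* stays NONE on failure */
        if (dev->rebuild(dev, saved))
            continue;

        dev->cur_policy = saved;
        rebuilt++;
    }
    return rebuilt;
}

/* Two trivial hooks for exercising the sketch. */
int np_rebuild_ok(struct np_dev *dev, enum netpolicy_name policy)
{
    (void)dev;
    (void)policy;
    return 0;
}

int np_rebuild_fail(struct np_dev *dev, enum netpolicy_name policy)
{
    (void)dev;
    (void)policy;
    return -1;
}
```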
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
net/core/netpolicy.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 2a04fcf..3b523fc 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -37,6 +37,7 @@
#include <net/net_namespace.h>
#include <linux/sort.h>
#include <linux/ctype.h>
+#include <linux/cpu.h>
static int netpolicy_get_dev_info(struct net_device *dev,
struct netpolicy_dev_info *d_info)
@@ -838,6 +839,73 @@ static struct notifier_block netpolicy_dev_notf = {
.notifier_call = netpolicy_notify,
};
+/**
+ * update_netpolicy_sys_map() - rebuild the sys map and object list
+ *
+ * This function goes through all available devices that support net
+ * policy and rebuilds the sys map and object list.
+ *
+ */
+void update_netpolicy_sys_map(void)
+{
+ struct net *net;
+ struct net_device *dev, *aux;
+ enum netpolicy_name cur_policy;
+
+ for_each_net(net) {
+ for_each_netdev_safe(net, dev, aux) {
+ spin_lock(&dev->np_lock);
+ if (!dev->netpolicy)
+ goto unlock;
+ cur_policy = dev->netpolicy->cur_policy;
+ if (cur_policy == NET_POLICY_NONE)
+ goto unlock;
+
+ dev->netpolicy->cur_policy = NET_POLICY_NONE;
+
+ /* rebuild everything */
+ netpolicy_disable(dev);
+ netpolicy_enable(dev);
+ if (netpolicy_gen_obj_list(dev, cur_policy)) {
+ pr_warn("NETPOLICY: Failed to generate netpolicy object list for dev %s\n",
+ dev->name);
+ netpolicy_disable(dev);
+ goto unlock;
+ }
+ if (dev->netdev_ops->ndo_set_net_policy(dev, cur_policy)) {
+ pr_warn("NETPOLICY: Failed to set netpolicy for dev %s\n",
+ dev->name);
+ netpolicy_disable(dev);
+ goto unlock;
+ }
+
+ dev->netpolicy->cur_policy = cur_policy;
+unlock:
+ spin_unlock(&dev->np_lock);
+ }
+ }
+}
+
+static int netpolicy_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_ONLINE:
+ update_netpolicy_sys_map();
+ break;
+ case CPU_DYING:
+ update_netpolicy_sys_map();
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_cpu_notifier = {
+ &netpolicy_cpu_callback,
+ NULL,
+ 0
+};
+
static int __init netpolicy_init(void)
{
int ret;
@@ -846,6 +914,10 @@ static int __init netpolicy_init(void)
if (!ret)
register_netdevice_notifier(&netpolicy_dev_notf);
+ cpu_notifier_register_begin();
+ __register_cpu_notifier(&netpolicy_cpu_notifier);
+ cpu_notifier_register_done();
+
return ret;
}
@@ -853,6 +925,10 @@ static void __exit netpolicy_exit(void)
{
unregister_netdevice_notifier(&netpolicy_dev_notf);
unregister_pernet_subsys(&netpolicy_net_ops);
+
+ cpu_notifier_register_begin();
+ __unregister_cpu_notifier(&netpolicy_cpu_notifier);
+ cpu_notifier_register_done();
}
subsys_initcall(netpolicy_init);
--
2.5.5
* [RFC V2 PATCH 10/25] net/netpolicy: add three new NET policies
From: kan.liang @ 2015-01-01 1:38 UTC
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
Introduce three NET policies
CPU policy: configure for higher throughput and lower CPU% (power
saving).
BULK policy: configure for highest throughput.
LATENCY policy: configure for lowest latency.
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netpolicy.h | 3 +++
net/core/netpolicy.c | 5 ++++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index b1d9277..3d348a7 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -18,6 +18,9 @@
enum netpolicy_name {
NET_POLICY_NONE = 0,
+ NET_POLICY_CPU,
+ NET_POLICY_BULK,
+ NET_POLICY_LATENCY,
NET_POLICY_MAX,
};
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8112839..71e9163 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -223,7 +223,10 @@ static int netpolicy_enable(struct net_device *dev)
}
const char *policy_name[NET_POLICY_MAX] = {
- "NONE"
+ "NONE",
+ "CPU",
+ "BULK",
+ "LATENCY"
};
static u32 cpu_to_queue(struct net_device *dev,
--
2.5.5
* [RFC V2 PATCH 09/25] net/netpolicy: set NET policy by policy name
From: kan.liang @ 2015-01-01 1:38 UTC
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
Users can write a policy name to /proc/net/netpolicy/$DEV/policy to
enable a net policy for a specific device.
When the policy is enabled, the subsystem automatically disables IRQ
balancing and sets IRQ affinity. The object list is also generated
accordingly.
It is the device driver's responsibility to apply driver-specific
configuration for the given policy.
np_lock is used to protect the state.
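The name handling for such a write can be sketched in userspace as follows. The helper np_parse_policy() is hypothetical, and the enum values CPU, BULK, LATENCY and INVALID are borrowed from later patches in this series. The sketch deliberately differs from the patch in one respect: it requires an exact, case-insensitive match after trimming trailing whitespace, whereas the patch compares only a prefix of the written string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define POLICY_NAME_LEN_MAX 64

enum netpolicy_name {
    NET_POLICY_INVALID = -1,
    NET_POLICY_NONE = 0,
    NET_POLICY_CPU,
    NET_POLICY_BULK,
    NET_POLICY_LATENCY,
    NET_POLICY_MAX,
};

static const char *const policy_name[NET_POLICY_MAX] = {
    "NONE",
    "CPU",
    "BULK",
    "LATENCY",
};

/* Map a user-written buffer such as "latency\n" to a policy.
 * buf does not need to be NUL-terminated; count is its length. */
enum netpolicy_name np_parse_policy(const char *buf, size_t count)
{
    char name[POLICY_NAME_LEN_MAX];
    size_t i, len = count;
    int p;

    /* Drop the trailing newline or spaces that "echo" adds. */
    while (len > 0 && isspace((unsigned char)buf[len - 1]))
        len--;
    if (len == 0 || len >= sizeof(name))
        return NET_POLICY_INVALID;

    /* Upper-case a private, NUL-terminated copy. */
    for (i = 0; i < len; i++)
        name[i] = (char)toupper((unsigned char)buf[i]);
    name[len] = '\0';

    for (p = NET_POLICY_NONE; p < NET_POLICY_MAX; p++) {
        if (!strcmp(name, policy_name[p]))
            return (enum netpolicy_name)p;
    }
    return NET_POLICY_INVALID;
}
```

Returning NET_POLICY_INVALID for an unknown name lets the caller reject the write before indexing any per-policy table with an out-of-range value.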
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netdevice.h | 5 +++
include/linux/netpolicy.h | 1 +
net/core/netpolicy.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 101 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1eda870..aa3ef38 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1127,6 +1127,9 @@ struct netdev_xdp {
* int (*ndo_get_irq_info)(struct net_device *dev,
* struct netpolicy_dev_info *info);
* This function is used to get irq information of rx and tx queues
+ * int (*ndo_set_net_policy)(struct net_device *dev,
+ * enum netpolicy_name name);
+ * This function is used to set per device net policy by name
*
*/
struct net_device_ops {
@@ -1318,6 +1321,8 @@ struct net_device_ops {
struct netpolicy_info *info);
int (*ndo_get_irq_info)(struct net_device *dev,
struct netpolicy_dev_info *info);
+ int (*ndo_set_net_policy)(struct net_device *dev,
+ enum netpolicy_name name);
#endif /* CONFIG_NETPOLICY */
};
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 73a5fa6..b1d9277 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -27,6 +27,7 @@ enum netpolicy_traffic {
NETPOLICY_RXTX,
};
+#define POLICY_NAME_LEN_MAX 64
extern const char *policy_name[];
struct netpolicy_dev_info {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0f8ff16..8112839 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -36,6 +36,7 @@
#include <linux/netdevice.h>
#include <net/net_namespace.h>
#include <linux/sort.h>
+#include <linux/ctype.h>
static int netpolicy_get_dev_info(struct net_device *dev,
struct netpolicy_dev_info *d_info)
@@ -430,6 +431,69 @@ err:
return ret;
}
+static int net_policy_set_by_name(char *name, struct net_device *dev)
+{
+ int i, ret;
+
+ spin_lock(&dev->np_lock);
+ ret = 0;
+
+ if (!dev->netpolicy ||
+ !dev->netdev_ops->ndo_set_net_policy) {
+ ret = -ENOTSUPP;
+ goto unlock;
+ }
+
+ for (i = 0; i < NET_POLICY_MAX; i++) {
+ if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+ break;
+ }
+
+ if (!test_bit(i, dev->netpolicy->avail_policy)) {
+ ret = -ENOTSUPP;
+ goto unlock;
+ }
+
+ if (i == dev->netpolicy->cur_policy)
+ goto unlock;
+
+ /* If no policy is applied yet, enable netpolicy first. */
+ if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+ ret = netpolicy_enable(dev);
+ if (ret)
+ goto unlock;
+ }
+
+ netpolicy_free_obj_list(dev);
+
+ /* Generate object list according to policy name */
+ ret = netpolicy_gen_obj_list(dev, i);
+ if (ret)
+ goto err;
+
+ /* set policy */
+ ret = dev->netdev_ops->ndo_set_net_policy(dev, i);
+ if (ret)
+ goto err;
+
+ /* If removing policy, need to do disable. */
+ if (i == NET_POLICY_NONE)
+ netpolicy_disable(dev);
+
+ dev->netpolicy->cur_policy = i;
+
+ spin_unlock(&dev->np_lock);
+ return 0;
+
+err:
+ netpolicy_free_obj_list(dev);
+ if (dev->netpolicy->cur_policy == NET_POLICY_NONE)
+ netpolicy_disable(dev);
+unlock:
+ spin_unlock(&dev->np_lock);
+ return ret;
+}
+
#ifdef CONFIG_PROC_FS
static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -459,11 +523,40 @@ static int net_policy_proc_open(struct inode *inode, struct file *file)
return single_open(file, net_policy_proc_show, PDE_DATA(inode));
}
+static ssize_t net_policy_proc_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *pos)
+{
+ struct seq_file *m = file->private_data;
+ struct net_device *dev = (struct net_device *)m->private;
+ char name[POLICY_NAME_LEN_MAX];
+ int i, ret;
+
+ if (!dev->netpolicy)
+ return -ENOTSUPP;
+
+ if (!count || count >= POLICY_NAME_LEN_MAX)
+ return -EINVAL;
+
+ if (copy_from_user(name, buf, count))
+ return -EFAULT;
+
+ for (i = 0; i < count; i++)
+ name[i] = toupper(name[i]);
+ name[count] = 0;
+
+ ret = net_policy_set_by_name(name, dev);
+ if (ret)
+ return ret;
+
+ return count;
+}
+
static const struct file_operations proc_net_policy_operations = {
.open = net_policy_proc_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
+ .write = net_policy_proc_write,
.owner = THIS_MODULE,
};
@@ -527,6 +620,8 @@ void uninit_netpolicy(struct net_device *dev)
{
spin_lock(&dev->np_lock);
if (dev->netpolicy) {
+ if (dev->netpolicy->cur_policy > NET_POLICY_NONE)
+ netpolicy_disable(dev);
kfree(dev->netpolicy);
dev->netpolicy = NULL;
}
--
2.5.5
* [RFC V2 PATCH 08/25] net/netpolicy: introduce NET policy object
From: kan.liang @ 2015-01-01 1:38 UTC
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
This patch introduces the concept of NET policy object and policy object
list.
The NET policy object is the instance of CPU/queue mapping. The object
can be shared between different tasks/sockets. So besides CPU and queue
information, the object also maintains a reference counter.
Each policy has a dedicated object list. If the policy is set as the
device policy, all objects are inserted into the related policy
object list. Users later search the list and pick up available objects
from it.
The network performance of objects can differ because of the
queue and CPU topology. To generate a proper object list, device location,
HT and CPU topology have to be considered. The highest-performance objects
are at the front of the list.
The object lists will be regenerated if sys mapping changes or device
net policy changes.
Lock np_ob_list_lock is used to protect the object list.
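The object list idea can be illustrated with a small userspace model. Everything here is an assumption made for the sake of the sketch: an array stands in for the kernel list, the local_node flag is a hypothetical proxy for "close to the device", the ordering rule (local objects first, then by CPU number) is a simplification of the topology-aware sort, and the selection rule (least-referenced object, earliest on ties) is only one plausible way to pick up an available object. Locking (np_ob_list_lock in the patch) is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Model of a policy object: one CPU/queue mapping plus a use count. */
struct np_object {
    unsigned int cpu;
    unsigned int queue;
    bool local_node; /* hypothetical: CPU is on the device's node */
    int refcnt;
};

/* Order objects so that the preferred ones come first. */
static int np_obj_cmp(const void *a, const void *b)
{
    const struct np_object *x = a, *y = b;

    if (x->local_node != y->local_node)
        return x->local_node ? -1 : 1;
    return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

void np_sort_objects(struct np_object *objs, size_t n)
{
    qsort(objs, n, sizeof(*objs), np_obj_cmp);
}

/* Take a reference on the least-used object; ties go to the object
 * nearest the front of the list. Returns NULL for an empty list. */
struct np_object *np_get_object(struct np_object *objs, size_t n)
{
    struct np_object *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!best || objs[i].refcnt < best->refcnt)
            best = &objs[i];
    }
    if (best)
        best->refcnt++;
    return best;
}

/* Drop a reference taken by np_get_object(). */
void np_put_object(struct np_object *obj)
{
    if (obj && obj->refcnt > 0)
        obj->refcnt--;
}
```

Because the list is sorted once and selection only scans reference counts, sharing an object between several sockets or tasks costs no more than an increment.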
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netdevice.h | 2 +
include/linux/netpolicy.h | 15 +++
net/core/netpolicy.c | 237 +++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 253 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0e55ccd..1eda870 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1635,6 +1635,7 @@ enum netdev_priv_flags {
* @proc_dev: device node in proc to configure device net policy
* @netpolicy: NET policy related information of net device
* @np_lock: protect the state of NET policy
+ * @np_ob_list_lock: protect the net policy object list
*
* FIXME: cleanup struct net_device such that network protocol info
* moves out.
@@ -1909,6 +1910,7 @@ struct net_device {
#endif /* CONFIG_PROC_FS */
struct netpolicy_info *netpolicy;
spinlock_t np_lock;
+ spinlock_t np_ob_list_lock;
#endif /* CONFIG_NETPOLICY */
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index a946b75c..73a5fa6 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -21,6 +21,12 @@ enum netpolicy_name {
NET_POLICY_MAX,
};
+enum netpolicy_traffic {
+ NETPOLICY_RX = 0,
+ NETPOLICY_TX,
+ NETPOLICY_RXTX,
+};
+
extern const char *policy_name[];
struct netpolicy_dev_info {
@@ -46,11 +52,20 @@ struct netpolicy_sys_info {
struct netpolicy_sys_map *tx;
};
+struct netpolicy_object {
+ struct list_head list;
+ u32 cpu;
+ u32 queue;
+ atomic_t refcnt;
+};
+
struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
/* cpu and queue mapping information */
struct netpolicy_sys_info sys_info;
+ /* List of policy objects 0 rx 1 tx */
+ struct list_head obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
};
#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7d4a49d..0f8ff16 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,7 @@
#include <linux/uaccess.h>
#include <linux/netdevice.h>
#include <net/net_namespace.h>
+#include <linux/sort.h>
static int netpolicy_get_dev_info(struct net_device *dev,
struct netpolicy_dev_info *d_info)
@@ -161,10 +162,30 @@ static void netpolicy_set_affinity(struct net_device *dev)
}
}
+static void netpolicy_free_obj_list(struct net_device *dev)
+{
+ int i, j;
+ struct netpolicy_object *obj, *tmp;
+
+ spin_lock(&dev->np_ob_list_lock);
+ for (i = 0; i < NETPOLICY_RXTX; i++) {
+ for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++) {
+ if (list_empty(&dev->netpolicy->obj_list[i][j]))
+ continue;
+ list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[i][j], list) {
+ list_del(&obj->list);
+ kfree(obj);
+ }
+ }
+ }
+ spin_unlock(&dev->np_ob_list_lock);
+}
+
static int netpolicy_disable(struct net_device *dev)
{
netpolicy_clear_affinity(dev);
netpolicy_free_sys_map(dev);
+ netpolicy_free_obj_list(dev);
return 0;
}
@@ -203,6 +224,212 @@ static int netpolicy_enable(struct net_device *dev)
const char *policy_name[NET_POLICY_MAX] = {
"NONE"
};
+
+static u32 cpu_to_queue(struct net_device *dev,
+ u32 cpu, bool is_rx)
+{
+ struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+ int i;
+
+ if (is_rx) {
+ for (i = 0; i < s_info->avail_rx_num; i++) {
+ if (s_info->rx[i].cpu == cpu)
+ return s_info->rx[i].queue;
+ }
+ } else {
+ for (i = 0; i < s_info->avail_tx_num; i++) {
+ if (s_info->tx[i].cpu == cpu)
+ return s_info->tx[i].queue;
+ }
+ }
+
+ return ~0;
+}
+
+static int netpolicy_add_obj(struct net_device *dev,
+ u32 cpu, bool is_rx,
+ enum netpolicy_name policy)
+{
+ struct netpolicy_object *obj;
+ int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+
+ obj = kzalloc(sizeof(*obj), GFP_ATOMIC);
+ if (!obj)
+ return -ENOMEM;
+ obj->cpu = cpu;
+ obj->queue = cpu_to_queue(dev, cpu, is_rx);
+ list_add_tail(&obj->list, &dev->netpolicy->obj_list[dir][policy]);
+
+ return 0;
+}
+
+struct sort_node {
+ int node;
+ int distance;
+};
+
+static inline int node_distance_cmp(const void *a, const void *b)
+{
+ const struct sort_node *_a = a;
+ const struct sort_node *_b = b;
+
+ return _a->distance - _b->distance;
+}
+
+static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
+ enum netpolicy_name policy,
+ struct sort_node *nodes, int num_node,
+ struct cpumask *node_avail_cpumask)
+{
+ cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
+ struct cpumask *node_assigned_cpumask;
+ int i, ret = -ENOMEM;
+ u32 cpu;
+
+ if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
+ return ret;
+ if (!alloc_cpumask_var(&sibling_tmp_cpumask, GFP_ATOMIC))
+ goto alloc_fail1;
+
+ node_assigned_cpumask = kcalloc(num_node, sizeof(struct cpumask), GFP_ATOMIC);
+ if (!node_assigned_cpumask)
+ goto alloc_fail2;
+
+ /* Don't share physical core */
+ for (i = 0; i < num_node; i++) {
+ if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
+ continue;
+ spin_lock(&dev->np_ob_list_lock);
+ cpumask_copy(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node]);
+ while (cpumask_weight(node_tmp_cpumask)) {
+ cpu = cpumask_first(node_tmp_cpumask);
+
+ /* push to obj list */
+ ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+ if (ret) {
+ spin_unlock(&dev->np_ob_list_lock);
+ goto err;
+ }
+
+ cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+ cpumask_and(sibling_tmp_cpumask, node_tmp_cpumask, topology_sibling_cpumask(cpu));
+ cpumask_xor(node_tmp_cpumask, node_tmp_cpumask, sibling_tmp_cpumask);
+ }
+ spin_unlock(&dev->np_ob_list_lock);
+ }
+
+ for (i = 0; i < num_node; i++) {
+ cpumask_xor(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node], &node_assigned_cpumask[nodes[i].node]);
+ if (cpumask_weight(node_tmp_cpumask) == 0)
+ continue;
+ spin_lock(&dev->np_ob_list_lock);
+ for_each_cpu(cpu, node_tmp_cpumask) {
+ /* push to obj list */
+ ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+ if (ret) {
+ spin_unlock(&dev->np_ob_list_lock);
+ goto err;
+ }
+ cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+ }
+ spin_unlock(&dev->np_ob_list_lock);
+ }
+
+err:
+ kfree(node_assigned_cpumask);
+alloc_fail2:
+ free_cpumask_var(sibling_tmp_cpumask);
+alloc_fail1:
+ free_cpumask_var(node_tmp_cpumask);
+
+ return ret;
+}
+
+static int netpolicy_gen_obj_list(struct net_device *dev,
+ enum netpolicy_name policy)
+{
+ struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+ struct cpumask *node_avail_cpumask;
+ int dev_node = 0, num_nodes = 1;
+ struct sort_node *nodes;
+ int i, ret, node = 0;
+ u32 cpu;
+#ifdef CONFIG_NUMA
+ int val;
+#endif
+ /* The network performance for objects could be different
+ * because of the queue and cpu topology.
+ * The objects will be ordered accordingly,
+ * and put high performance object in the front.
+ *
+ * The priority rules as below,
+ * - The local object. (Local means cpu and queue are in the same node.)
+ * - The cpu in the object is the only logical core in physical core.
+ * The sibling core's object has not been added to the object list yet.
+ * - The rest of objects
+ *
+ * So the order of object list is as below:
+ * 1. Local core + the only logical core
+ * 2. Remote core + the only logical core
+ * 3. Local core + the core's sibling is already in the object list
+ * 4. Remote core + the core's sibling is already in the object list
+ */
+#ifdef CONFIG_NUMA
+ dev_node = dev_to_node(dev->dev.parent);
+ num_nodes = num_online_nodes();
+#endif
+
+ nodes = kcalloc(num_nodes, sizeof(*nodes), GFP_ATOMIC);
+ if (!nodes)
+ return -ENOMEM;
+
+ node_avail_cpumask = kcalloc(num_nodes, sizeof(struct cpumask), GFP_ATOMIC);
+ if (!node_avail_cpumask) {
+ kfree(nodes);
+ return -ENOMEM;
+ }
+
+#ifdef CONFIG_NUMA
+ /* order the node from near to far */
+ for_each_node_mask(i, node_online_map) {
+ val = node_distance(dev_node, i);
+ nodes[node].node = i;
+ nodes[node].distance = val;
+ node++;
+ }
+ sort(nodes, num_nodes, sizeof(*nodes),
+ node_distance_cmp, NULL);
+#else
+ nodes[0].node = 0;
+#endif
+
+ for (i = 0; i < s_info->avail_rx_num; i++) {
+ cpu = s_info->rx[i].cpu;
+ cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+ }
+ ret = _netpolicy_gen_obj_list(dev, true, policy, nodes,
+ node, node_avail_cpumask);
+ if (ret)
+ goto err;
+
+ for (i = 0; i < node; i++)
+ cpumask_clear(&node_avail_cpumask[nodes[i].node]);
+
+ for (i = 0; i < s_info->avail_tx_num; i++) {
+ cpu = s_info->tx[i].cpu;
+ cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+ }
+ ret = _netpolicy_gen_obj_list(dev, false, policy, nodes,
+ node, node_avail_cpumask);
+ if (ret)
+ goto err;
+
+err:
+ kfree(nodes);
+ kfree(node_avail_cpumask);
+ return ret;
+}
+
#ifdef CONFIG_PROC_FS
static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -258,7 +485,7 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
int init_netpolicy(struct net_device *dev)
{
- int ret;
+ int ret, i, j;
spin_lock(&dev->np_lock);
ret = 0;
@@ -281,7 +508,15 @@ int init_netpolicy(struct net_device *dev)
if (ret) {
kfree(dev->netpolicy);
dev->netpolicy = NULL;
+ goto unlock;
+ }
+
+ spin_lock(&dev->np_ob_list_lock);
+ for (i = 0; i < NETPOLICY_RXTX; i++) {
+ for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++)
+ INIT_LIST_HEAD(&dev->netpolicy->obj_list[i][j]);
}
+ spin_unlock(&dev->np_ob_list_lock);
unlock:
spin_unlock(&dev->np_lock);
--
2.5.5
* [RFC V2 PATCH 07/25] net/netpolicy: enable and disable NET policy
From: kan.liang @ 2015-01-01 1:38 UTC (permalink / raw)
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
This patch introduces functions to enable and disable NET policy.
For enabling, it collects device and CPU information, sets up the CPU/queue
mapping, and sets the IRQ affinity accordingly.
For disabling, it removes the IRQ affinity and mapping information.
np_lock should protect the enable and disable state. It will be done
later in this series.
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
net/core/netpolicy.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index c44818d..7d4a49d 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -161,6 +161,45 @@ static void netpolicy_set_affinity(struct net_device *dev)
}
}
+static int netpolicy_disable(struct net_device *dev)
+{
+ netpolicy_clear_affinity(dev);
+ netpolicy_free_sys_map(dev);
+
+ return 0;
+}
+
+static int netpolicy_enable(struct net_device *dev)
+{
+ int ret;
+ struct netpolicy_dev_info d_info;
+ u32 cpu;
+
+ if (WARN_ON(!dev->netpolicy))
+ return -EINVAL;
+
+ /* get driver information */
+ ret = netpolicy_get_dev_info(dev, &d_info);
+ if (ret)
+ return ret;
+
+ /* get cpu information */
+ cpu = netpolicy_get_cpu_information();
+
+ /* create sys map */
+ ret = netpolicy_update_sys_map(dev, &d_info, cpu);
+ if (ret) {
+ netpolicy_free_dev_info(&d_info);
+ return ret;
+ }
+
+ /* set irq affinity */
+ netpolicy_set_affinity(dev);
+
+ netpolicy_free_dev_info(&d_info);
+ return 0;
+}
+
const char *policy_name[NET_POLICY_MAX] = {
"NONE"
};
--
2.5.5
* [RFC V2 PATCH 05/25] net/netpolicy: create CPU and queue mapping
From: kan.liang @ 2015-01-01 1:38 UTC (permalink / raw)
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
The current implementation forces a 1:1 mapping between CPUs and queues.
This patch introduces the function netpolicy_update_sys_map to create this
mapping. The result is stored in netpolicy_sys_info.
If the CPU count and queue count are different, the remaining
CPUs/queues are not used for now.
CPU hotplug, device hotplug or ethtool may change the CPU count or
queue count. For these cases, this function can also be called to
reconstruct the mapping. These cases will be handled later in this
series.
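To make the 1:1 rule concrete, here is a minimal user-space sketch of the
pairing. It is not the kernel code from this patch. The names struct sys_map
and build_map() are hypothetical, and the online CPUs are assumed to be given
as a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified counterpart of one CPU/queue pair. */
struct sys_map {
	unsigned int cpu;
	unsigned int queue;
};

/*
 * Pair the i-th online CPU with queue i. Only min(num_cpus, num_queues)
 * pairs are created; the remaining CPUs or queues stay unused.
 * Returns the number of pairs written to map[].
 */
size_t build_map(const unsigned int *online_cpus, size_t num_cpus,
		 size_t num_queues, struct sys_map *map)
{
	size_t num = num_cpus < num_queues ? num_cpus : num_queues;

	assert(online_cpus && map);

	for (size_t i = 0; i < num; i++) {
		map[i].cpu = online_cpus[i];
		map[i].queue = (unsigned int)i;
	}
	return num;
}
```

With six online CPUs and four queues, only four pairs are created and the
last two CPUs are left out of the mapping.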
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netpolicy.h | 18 ++++++++++++
net/core/netpolicy.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 92 insertions(+)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fc87d9b..a946b75c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -30,9 +30,27 @@ struct netpolicy_dev_info {
u32 *tx_irq;
};
+struct netpolicy_sys_map {
+ u32 cpu;
+ u32 queue;
+ u32 irq;
+};
+
+struct netpolicy_sys_info {
+ /*
+ * Record the cpu and queue 1:1 mapping
+ */
+ u32 avail_rx_num;
+ struct netpolicy_sys_map *rx;
+ u32 avail_tx_num;
+ struct netpolicy_sys_map *tx;
+};
+
struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+ /* cpu and queue mapping information */
+ struct netpolicy_sys_info sys_info;
};
#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 075aaca..ff7fc04 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -54,6 +54,80 @@ static u32 netpolicy_get_cpu_information(void)
return num_online_cpus();
}
+static void netpolicy_free_sys_map(struct net_device *dev)
+{
+ struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+
+ kfree(s_info->rx);
+ s_info->rx = NULL;
+ s_info->avail_rx_num = 0;
+ kfree(s_info->tx);
+ s_info->tx = NULL;
+ s_info->avail_tx_num = 0;
+}
+
+static int netpolicy_update_sys_map(struct net_device *dev,
+ struct netpolicy_dev_info *d_info,
+ u32 cpu)
+{
+ struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+ u32 num, i, online_cpu;
+ cpumask_var_t cpumask;
+
+ if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+ return -ENOMEM;
+
+ /* update rx cpu map */
+ if (cpu > d_info->rx_num)
+ num = d_info->rx_num;
+ else
+ num = cpu;
+
+ s_info->avail_rx_num = num;
+ s_info->rx = kcalloc(num, sizeof(*s_info->rx), GFP_ATOMIC);
+ if (!s_info->rx)
+ goto err;
+ cpumask_copy(cpumask, cpu_online_mask);
+
+ i = 0;
+ for_each_cpu(online_cpu, cpumask) {
+ if (i == num)
+ break;
+ s_info->rx[i].cpu = online_cpu;
+ s_info->rx[i].queue = i;
+ s_info->rx[i].irq = d_info->rx_irq[i];
+ i++;
+ }
+
+ /* update tx cpu map */
+ if (cpu >= d_info->tx_num)
+ num = d_info->tx_num;
+ else
+ num = cpu;
+
+ s_info->avail_tx_num = num;
+ s_info->tx = kcalloc(num, sizeof(*s_info->tx), GFP_ATOMIC);
+ if (!s_info->tx)
+ goto err;
+
+ i = 0;
+ for_each_cpu(online_cpu, cpumask) {
+ if (i == num)
+ break;
+ s_info->tx[i].cpu = online_cpu;
+ s_info->tx[i].queue = i;
+ s_info->tx[i].irq = d_info->tx_irq[i];
+ i++;
+ }
+
+ free_cpumask_var(cpumask);
+ return 0;
+err:
+ netpolicy_free_sys_map(dev);
+ free_cpumask_var(cpumask);
+ return -ENOMEM;
+}
+
const char *policy_name[NET_POLICY_MAX] = {
"NONE"
};
--
2.5.5
* [RFC V2 PATCH 03/25] net/netpolicy: get device queue irq information
From: kan.liang @ 2015-01-01 1:38 UTC (permalink / raw)
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
Net policy needs to know device information. Currently, it is enough to
get only the irq information of the rx and tx queues.
This patch introduces an ndo op rather than an ethtool op to do so, because
there are already several ways to get irq information in userspace, so it
is not necessary to extend ethtool.
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netdevice.h | 5 +++++
include/linux/netpolicy.h | 7 +++++++
net/core/netpolicy.c | 14 ++++++++++++++
3 files changed, 26 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e0a7e7..0e55ccd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1124,6 +1124,9 @@ struct netdev_xdp {
* int(*ndo_netpolicy_init)(struct net_device *dev,
* struct netpolicy_info *info);
* This function is used to init and get supported policy.
+ * int (*ndo_get_irq_info)(struct net_device *dev,
+ * struct netpolicy_dev_info *info);
+ * This function is used to get irq information of rx and tx queues
*
*/
struct net_device_ops {
@@ -1313,6 +1316,8 @@ struct net_device_ops {
#ifdef CONFIG_NETPOLICY
int (*ndo_netpolicy_init)(struct net_device *dev,
struct netpolicy_info *info);
+ int (*ndo_get_irq_info)(struct net_device *dev,
+ struct netpolicy_dev_info *info);
#endif /* CONFIG_NETPOLICY */
};
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index ca1f131..fc87d9b 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -23,6 +23,13 @@ enum netpolicy_name {
extern const char *policy_name[];
+struct netpolicy_dev_info {
+ u32 rx_num;
+ u32 tx_num;
+ u32 *rx_irq;
+ u32 *tx_irq;
+};
+
struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 5f304d5..7c34c8a 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,20 @@
#include <linux/netdevice.h>
#include <net/net_namespace.h>
+static int netpolicy_get_dev_info(struct net_device *dev,
+ struct netpolicy_dev_info *d_info)
+{
+ if (!dev->netdev_ops->ndo_get_irq_info)
+ return -ENOTSUPP;
+ return dev->netdev_ops->ndo_get_irq_info(dev, d_info);
+}
+
+static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
+{
+ kfree(d_info->rx_irq);
+ kfree(d_info->tx_irq);
+}
+
const char *policy_name[NET_POLICY_MAX] = {
"NONE"
};
--
2.5.5
* [RFC V2 PATCH 01/25] net: introduce NET policy
From: kan.liang @ 2015-01-01 1:38 UTC (permalink / raw)
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
In-Reply-To: <1420076354-4861-1-git-send-email-kan.liang@intel.com>
From: Kan Liang <kan.liang@intel.com>
This patch introduces the NET policy subsystem. If proc is supported in the
system, it creates a netpolicy node in the proc filesystem.
Signed-off-by: Kan Liang <kan.liang@intel.com>
---
include/linux/netdevice.h | 7 +++
include/net/net_namespace.h | 3 ++
net/Kconfig | 7 +++
net/core/Makefile | 1 +
net/core/netpolicy.c | 128 ++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 146 insertions(+)
create mode 100644 net/core/netpolicy.c
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 076df53..19638d6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1619,6 +1619,8 @@ enum netdev_priv_flags {
* switch driver and used to set the phys state of the
* switch port.
*
+ * @proc_dev: device node in proc to configure device net policy
+ *
* FIXME: cleanup struct net_device such that network protocol info
* moves out.
*/
@@ -1886,6 +1888,11 @@ struct net_device {
struct lock_class_key *qdisc_tx_busylock;
struct lock_class_key *qdisc_running_key;
bool proto_down;
+#ifdef CONFIG_NETPOLICY
+#ifdef CONFIG_PROC_FS
+ struct proc_dir_entry *proc_dev;
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_NETPOLICY */
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..d2ff6c4 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -142,6 +142,9 @@ struct net {
#endif
struct sock *diag_nlsk;
atomic_t fnhe_genid;
+#ifdef CONFIG_NETPOLICY
+ struct proc_dir_entry *proc_netpolicy;
+#endif /* CONFIG_NETPOLICY */
};
#include <linux/seq_file_net.h>
diff --git a/net/Kconfig b/net/Kconfig
index c2cdbce..00552ba 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -205,6 +205,13 @@ source "net/bridge/netfilter/Kconfig"
endif
+config NETPOLICY
+ depends on NET
+ bool "Net policy support"
+ default y
+ ---help---
+ Net policy support
+
source "net/dccp/Kconfig"
source "net/sctp/Kconfig"
source "net/rds/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..0be7092 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
obj-$(CONFIG_DST_CACHE) += dst_cache.o
obj-$(CONFIG_HWBM) += hwbm.o
obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NETPOLICY) += netpolicy.o
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
new file mode 100644
index 0000000..faabfe7
--- /dev/null
+++ b/net/core/netpolicy.c
@@ -0,0 +1,128 @@
+/*
+ * netpolicy.c: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.liang@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * NET policy intends to simplify the network configuration and get a good
+ * network performance according to the hints(policy) which is applied by user.
+ *
+ * Motivation
+ * - The network performance is not good with default system settings.
+ * - It is too difficult to do automatic tuning for all possible
+ * workloads, since workloads have different requirements. Some
+ * workloads may want high throughput. Some may need low latency.
+ * - There are lots of manual configurations. Fine grained configuration
+ * is too difficult for users.
+ * So, it is a big challenge to get good network performance.
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
+#include <linux/uaccess.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+
+#ifdef CONFIG_PROC_FS
+
+static int net_policy_proc_show(struct seq_file *m, void *v)
+{
+ struct net_device *dev = (struct net_device *)m->private;
+
+ seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+
+ return 0;
+}
+
+static int net_policy_proc_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, net_policy_proc_show, PDE_DATA(inode));
+}
+
+static const struct file_operations proc_net_policy_operations = {
+ .open = net_policy_proc_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+ .owner = THIS_MODULE,
+};
+
+static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
+{
+ dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
+ if (!dev->proc_dev)
+ return -ENOMEM;
+
+ if (!proc_create_data("policy", S_IWUSR | S_IRUGO,
+ dev->proc_dev, &proc_net_policy_operations,
+ (void *)dev)) {
+ remove_proc_subtree(dev->name, net->proc_netpolicy);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+ struct net_device *dev, *aux;
+
+ net->proc_netpolicy = proc_net_mkdir(net, "netpolicy",
+ net->proc_net);
+ if (!net->proc_netpolicy)
+ return -ENOMEM;
+
+ for_each_netdev_safe(net, dev, aux) {
+ netpolicy_proc_dev_init(net, dev);
+ }
+
+ return 0;
+}
+
+#else /* CONFIG_PROC_FS */
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+ return 0;
+}
+#endif /* CONFIG_PROC_FS */
+
+static void __net_exit netpolicy_net_exit(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+ remove_proc_subtree("netpolicy", net->proc_net);
+#endif /* CONFIG_PROC_FS */
+}
+
+static struct pernet_operations netpolicy_net_ops = {
+ .init = netpolicy_net_init,
+ .exit = netpolicy_net_exit,
+};
+
+static int __init netpolicy_init(void)
+{
+ int ret;
+
+ ret = register_pernet_subsys(&netpolicy_net_ops);
+
+ return ret;
+}
+
+static void __exit netpolicy_exit(void)
+{
+ unregister_pernet_subsys(&netpolicy_net_ops);
+}
+
+subsys_initcall(netpolicy_init);
+module_exit(netpolicy_exit);
--
2.5.5
* [RFC V2 PATCH 00/25] Kernel NET policy
From: kan.liang @ 2015-01-01 1:38 UTC (permalink / raw)
To: davem, linux-kernel, netdev
Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
hannes, jesse.brandeburg, andi, Kan Liang
From: Kan Liang <kan.liang@intel.com>
It is a big challenge to get good network performance. First, the network
performance is not good with default system settings. Second, it is too
difficult to do automatic tuning for all possible workloads, since workloads
have different requirements. Some workloads may want high throughput. Some may
need low latency. Last but not least, there are lots of manual configurations.
Fine grained configuration is too difficult for users.
NET policy intends to simplify the network configuration and get a good network
performance according to the hints (policies) which are applied by the user. It
provides some typical "policies" for the user which can be set per-socket,
per-task or per-device. The kernel will automatically figure out how to merge
different requests to get good network performance.
NET policy is designed for multiqueue network devices. This implementation is
only for Intel NICs using the i40e driver. But the concepts and generic code
should apply to other multiqueue NICs too.
NET policy is also a combination of generic policy manager code and some
ethtool callbacks (per queue coalesce setting, flow classification rules) to
configure the driver.
This series also supports CPU hotplug and device hotplug.
Here are some common questions about NET policy.
1. Why can't a userspace tool do the same thing?
A: Kernel is more suitable for NET policy.
- User space code would be far more complicated to get right and perform
well. It always needs to work with state that is out of date compared to
the latest, because it cannot do any locking with the kernel state.
- User space code is less efficient than kernel code, because of the
additional context switches needed.
- Kernel is in the right position to coordinate requests from multiple
users.
2. Is NET policy looking for optimal settings?
A: No. The NET policy intends to get a good network performance according
to user's specific request. Our target for good performance is ~90% of
the optimal settings.
3. How does the configuration impact the connection rates?
A: There are two places that acquire the rtnl mutex to configure the device.
- One is to do device policy setting. It happens at the initialization stage,
on hotplug or when the queue number changes. The device policy will be set to
NET_POLICY_NONE. In that case, it "falls back" to the system default way to
direct the packets. It doesn't block the connection.
- The other is to set Rx network flow classification options or rules.
It uses a work queue to apply the setting asynchronously, which avoids
hurting the connection rates.
4. Why not use existing mechanisms for NET policy?
For example, cgroup tc or existing SOCKET options.
A: The NET policy already uses existing mechanisms as much as it can.
For example, it uses the existing ethtool interface to configure the device.
However, the NET policy still needs to introduce new interfaces to meet
its special requirements.
For resource usage, current cgroup tc is not suitable for per-socket
setting. Also, current tc can only set rate limit. The NET policy wants
to change interrupt moderation per device queue. So in this series, it
will not use cgroup tc. But in some places, cgroup and NET policy are
similar. For example, both of them isolate resource usage. Both of
them do traffic control. So it is on the NET policy TODO list to
work well with cgroup.
For socket options, SO_MARK or maybe SO_PRIORITY is close to NET policy's
requirement. But they cannot be reused for NET policy. SO_MARK can be
used for routing and packet filtering. But the NET policy doesn't intend to
change the routing. It only redirects the packet to the specific device
queue. Also, the target queue is assigned by NET policy subsystem at run
time. It should not be set in advance. SO_PRIORITY can set protocol-defined
priority for all packets on the socket. But the policies don't have priority.
5. Why disable IRQ balance?
A: Disabling IRQ balance is a common way (the recommended way for some
devices) to tune network performance.
Here are some key Interfaces/APIs for NET policy.
Interfaces exported to user space
/proc/net/netpolicy/$DEV/policy
User can set/get per device policy from /proc
/proc/$PID/net_policy
User can set/get per task policy from /proc
prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
An alternative way to set/get per task policy is from prctl.
setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
User can set/get per socket policy by setsockopt
New ndo ops
int (*ndo_netpolicy_init)(struct net_device *dev,
struct netpolicy_info *info);
Initialize device driver for NET policy
int (*ndo_get_irq_info)(struct net_device *dev,
struct netpolicy_dev_info *info);
Collect device information. Currently, only collecting IRQ
information should be enough.
int (*ndo_set_net_policy)(struct net_device *dev,
enum netpolicy_name name);
This interface is used to set device NET policy by name. It is device driver's
responsibility to set driver specific configuration for the given policy.
NET policy subsystem APIs
netpolicy_register(struct netpolicy_instance *instance,
enum netpolicy_name policy)
netpolicy_unregister(struct netpolicy_instance *instance)
Register/unregister per task/socket NET policy.
The socket/task can only benefit when it registers itself with a
specific policy. After registration, a record will be created and inserted
into an RCU hash table, which includes all the NET policy related information
for the socket/task.
netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx);
Find the proper queue according to policy for packet receiving and
transmitting
netpolicy_set_rules(struct netpolicy_instance *instance);
Configure Rx network flow classification rules
To use NET policy, the per-device policy must be set in advance. It will
automatically configure the system and re-organize the resources of the system
accordingly. For system configuration, in this series, it will disable irq
balance, set device queue irq affinity, and modify interrupt moderation. For
re-organizing the resources, the current implementation forces a 1:1 mapping
between CPU and queue irq. A 1:1 mapping group is also called a NET policy
object.
For each device policy, it maintains a policy list. Once the device policy is
applied, the objects will be inserted and tracked in that device policy list.
The policy list is only updated on CPU/device hotplug, queue number changes or
device policy changes.
The user can use /proc, prctl and setsockopt to set per-task and per-socket
NET policy. Once the policy is set, a related record will be inserted into an
RCU hash table. The record includes ptr, policy and NET policy object. The ptr
is the pointer address of the task/socket. The object will not be assigned
until the first packet is received/transmitted. The object is picked by
round-robin from the object list. Once the object is determined, the following
packets will be redirected to the queue (object).
The object can be shared. The per-task or per-socket policy can be inherited.
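The round-robin pick and the sharing of objects can be sketched in user space
as follows. This is only an illustration under simplifying assumptions, not
the code of this series: struct np_object, struct np_obj_list and
np_pick_object() are hypothetical names, the list is a plain array, and the
reference counter is a plain int without any locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified NET policy object: one CPU/queue pair. */
struct np_object {
	unsigned int cpu;
	unsigned int queue;
	int refcnt;		/* number of tasks/sockets sharing the object */
};

/* Hypothetical object list with a round-robin cursor. */
struct np_obj_list {
	struct np_object *objs;
	size_t len;
	size_t next;		/* index of the object to hand out next */
};

/* Pick the next object in round-robin order and take a reference. */
struct np_object *np_pick_object(struct np_obj_list *list)
{
	struct np_object *obj;

	assert(list && list->len > 0);

	obj = &list->objs[list->next];
	list->next = (list->next + 1) % list->len;
	obj->refcnt++;
	return obj;
}
```

With three objects in the list, four consecutive picks return queues 0, 1, 2
and 0 again, so the first object ends up shared by two users.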
Now NET policy supports four per device policies and three per task/socket
policies.
- BULK policy: This policy is designed for high throughput. It can be
applied to either per device policy or per task/socket policy.
- CPU policy: This policy is designed for high throughput but lower CPU
utilization (power saving). It can be applied to either per device policy
or per task/socket policy.
- LATENCY policy: This policy is designed for low latency. It can be
applied to either per device policy or per task/socket policy.
- MIX policy: This policy can only be applied to per device policy. This
is designed for the case in which miscellaneous types of workloads are
running on the device.
Lots of tests are done for NET policy on platforms with Intel Xeon E5 V2
and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
Netperf is used to evaluate the throughput and latency performance.
- "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
-b burst -D" is used to evaluate throughput performance, which is
called throughput-first workload.
- "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
used to evaluate latency performance, which is called latency-first
workload.
- Different loads are also evaluated by running 1, 12, 24, 48 or 96
throughput-first or latency-first workloads simultaneously.
For "BULK" policy, the throughput is on average ~1.27X the baseline.
For "CPU" policy, the throughput is on average ~1.25X the baseline, with
lower CPU% (on average ~5% lower than "BULK" policy).
For "LATENCY" policy, the latency is on average 51.5% less than the baseline.
For "MIX" policy, the performance of mixed workloads is evaluated.
The mixed workloads are combinations of throughput-first workloads and
latency-first workloads. Five different combinations are evaluated
(pure throughput-first workloads, pure latency-first workloads,
2/3 throughput-first workloads + 1/3 latency-first workloads,
1/3 throughput-first workloads + 2/3 latency-first workloads and
1/2 throughput-first workloads + 1/2 latency-first workloads).
For calculating the performance of mixed workloads, a weighted sum system
is introduced.
Score = normalized_latency * Weight + normalized_throughput * (1 - Weight).
If we assume that the user has an equal interest in latency and throughput
performance, the Score for "MIX" policy is on average ~1.83X the baseline.
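The weighted sum above is simple enough to write down directly. In this
sketch the function name np_mix_score() is hypothetical, and the inputs are
assumed to be already normalized against the baseline; how the
normalization itself is done is not specified here.

```c
/* Score = normalized_latency * Weight + normalized_throughput * (1 - Weight)
 *
 * weight is the user's interest in latency, in the range [0, 1];
 * 0.5 means an equal interest in latency and throughput. */
static double np_mix_score(double normalized_latency,
			   double normalized_throughput,
			   double weight)
{
	return normalized_latency * weight +
	       normalized_throughput * (1.0 - weight);
}
```

For example, with made-up inputs (not measured results) of 2.0 for
normalized latency and 1.0 for normalized throughput, an equal interest
(Weight = 0.5) gives a Score of 1.5.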
Changes since V1:
- Use a work queue to set Rx network flow classification rules and search
for an available NET policy object asynchronously.
- Use RCU lock to replace the read-write lock.
- Redo the performance tests and update the performance results.
- Minor modifications to code and documentation.
- Remove i40e related patches, which will be submitted in a separate thread.
Kan Liang (25):
net: introduce NET policy
net/netpolicy: init NET policy
net/netpolicy: get device queue irq information
net/netpolicy: get CPU information
net/netpolicy: create CPU and queue mapping
net/netpolicy: set and remove IRQ affinity
net/netpolicy: enable and disable NET policy
net/netpolicy: introduce NET policy object
net/netpolicy: set NET policy by policy name
net/netpolicy: add three new NET policies
net/netpolicy: add MIX policy
net/netpolicy: NET device hotplug
net/netpolicy: support CPU hotplug
net/netpolicy: handle channel changes
net/netpolicy: implement netpolicy register
net/netpolicy: introduce per socket netpolicy
net/netpolicy: introduce netpolicy_pick_queue
net/netpolicy: set tx queues according to policy
net/netpolicy: set Rx queues according to policy
net/netpolicy: introduce per task net policy
net/netpolicy: set per task policy by proc
net/netpolicy: fast path for finding the queues
net/netpolicy: optimize for queue pair
net/netpolicy: limit the total record number
Documentation/networking: Document NET policy
Documentation/networking/netpolicy.txt | 157 ++++
arch/alpha/include/uapi/asm/socket.h | 2 +
arch/avr32/include/uapi/asm/socket.h | 2 +
arch/frv/include/uapi/asm/socket.h | 2 +
arch/ia64/include/uapi/asm/socket.h | 2 +
arch/m32r/include/uapi/asm/socket.h | 2 +
arch/mips/include/uapi/asm/socket.h | 2 +
arch/mn10300/include/uapi/asm/socket.h | 2 +
arch/parisc/include/uapi/asm/socket.h | 2 +
arch/powerpc/include/uapi/asm/socket.h | 2 +
arch/s390/include/uapi/asm/socket.h | 2 +
arch/sparc/include/uapi/asm/socket.h | 2 +
arch/xtensa/include/uapi/asm/socket.h | 2 +
fs/proc/base.c | 64 ++
include/linux/init_task.h | 9 +
include/linux/netdevice.h | 31 +
include/linux/netpolicy.h | 163 ++++
include/linux/sched.h | 5 +
include/net/net_namespace.h | 3 +
include/net/request_sock.h | 4 +-
include/net/sock.h | 28 +
include/uapi/asm-generic/socket.h | 2 +
include/uapi/linux/prctl.h | 4 +
kernel/exit.c | 4 +
kernel/fork.c | 6 +
kernel/sys.c | 31 +
net/Kconfig | 7 +
net/core/Makefile | 1 +
net/core/dev.c | 20 +-
net/core/ethtool.c | 8 +-
net/core/netpolicy.c | 1511 ++++++++++++++++++++++++++++++++
net/core/sock.c | 36 +
net/ipv4/af_inet.c | 71 ++
net/ipv4/udp.c | 4 +
34 files changed, 2189 insertions(+), 4 deletions(-)
create mode 100644 Documentation/networking/netpolicy.txt
create mode 100644 include/linux/netpolicy.h
create mode 100644 net/core/netpolicy.c
--
2.5.5
^ permalink raw reply
* Re: [PATCH] qlcnic: Fix return value in qlcnic_probe()
From: David Miller @ 2015-01-01 0:21 UTC (permalink / raw)
To: xuyongjiande
Cc: shahed.shaikh, Dept-GELinuxNICDev, netdev, linux-kernel,
xuyongjiande
In-Reply-To: <1419926626-26624-1-git-send-email-xuyongjiande@163.com>
From: xuyongjiande@163.com
Date: Tue, 30 Dec 2014 16:03:46 +0800
> From: Yongjian Xu <xuyongjiande@gmail.com>
>
> If the check of adapter fails and goes into the 'else' branch, the
> return value 'err' should not still be zero.
>
> Signed-off-by: Yongjian Xu <xuyongjiande@gmail.com>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH 2/8] myri10ge: fix error return code
From: David Miller @ 2015-01-01 0:20 UTC (permalink / raw)
To: Julia.Lawall; +Cc: hykim, kernel-janitors, netdev, linux-kernel
In-Reply-To: <1419872683-32709-3-git-send-email-Julia.Lawall@lip6.fr>
From: Julia Lawall <Julia.Lawall@lip6.fr>
Date: Mon, 29 Dec 2014 18:04:37 +0100
> Return a negative error code on failure.
>
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
...
The patch also modifies the test of mgp->cmd to satisfy checkpatch.
>
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Applied.
^ permalink raw reply
* Re: [PATCH 1/8] net: Xilinx: fix error return code
From: David Miller @ 2015-01-01 0:20 UTC (permalink / raw)
To: Julia.Lawall
Cc: michal.simek, kernel-janitors, soren.brinkmann, netdev,
linux-arm-kernel, linux-kernel
In-Reply-To: <1419872683-32709-2-git-send-email-Julia.Lawall@lip6.fr>
From: Julia Lawall <Julia.Lawall@lip6.fr>
Date: Mon, 29 Dec 2014 18:04:36 +0100
> Return a negative error code on failure.
>
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
...
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Applied.
^ permalink raw reply
* Re: [PATCH 5/8] net: sun4i-emac: fix error return code
From: David Miller @ 2015-01-01 0:20 UTC (permalink / raw)
To: Julia.Lawall
Cc: maxime.ripard, kernel-janitors, netdev, linux-arm-kernel,
linux-kernel
In-Reply-To: <1419872683-32709-6-git-send-email-Julia.Lawall@lip6.fr>
From: Julia Lawall <Julia.Lawall@lip6.fr>
Date: Mon, 29 Dec 2014 18:04:40 +0100
> Return a negative error code on failure.
>
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
...
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Applied.
^ permalink raw reply
* Re: [PATCH 7/8] net: axienet: fix error return code
From: David Miller @ 2015-01-01 0:19 UTC (permalink / raw)
To: Julia.Lawall
Cc: anirudh, kernel-janitors, John.Linn, michal.simek,
soren.brinkmann, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <1419872683-32709-8-git-send-email-Julia.Lawall@lip6.fr>
From: Julia Lawall <Julia.Lawall@lip6.fr>
Date: Mon, 29 Dec 2014 18:04:42 +0100
> Return a negative error code on failure.
>
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
...
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Applied.
^ permalink raw reply
* Re: [net 0/3][pull request] Intel Wired LAN Driver Updates 2014-12-31
From: David Miller @ 2015-01-01 0:17 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <1420070655-28453-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 31 Dec 2014 16:04:12 -0800
> This series contains updates to fixes for e100, igb and i40e.
>
> John Linville fixes a typo in e100 that has been around for some time,
> where an attempted revert actually inverted the test for eeprom_mdix_enabled.
>
> Todd fixes up a code comment that should have been removed back in 2007.
>
> Joe Perches fixes a possible memory leak in i40e which was reported by
> Dan Carpenter using smatch.
>
> The following are changes since commit 2c90331cf5ed1d648a711b9483e173aaaf2c4a9b:
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [PATCH v3 0/6] support GMAC driver for RK3288
From: David Miller @ 2015-01-01 0:15 UTC (permalink / raw)
To: roger.chen
Cc: heiko, peppe.cavallaro, netdev, linux-kernel, linux-rockchip,
kever.yang, eddie.cai
In-Reply-To: <1419846152-14531-1-git-send-email-roger.chen@rock-chips.com>
From: Roger Chen <roger.chen@rock-chips.com>
Date: Mon, 29 Dec 2014 17:42:32 +0800
> Roger Chen (6):
> patch1: add driver for Rockchip RK3288 SoCs integrated GMAC
> patch2: define clock ID used for GMAC
> patch3: modify CRU config for Rockchip RK3288 SoCs integrated GMAC
> patch4: dts: rockchip: add gmac info for rk3288
> patch5: dts: rockchip: enable gmac on RK3288 evb board
> patch6: add document for Rockchip RK3288 GMAC
>
> Tested on rk3288 evb board:
> Execute the following command to enable ethernet,
> set local IP and ping a remote host.
>
> busybox ifconfig eth0 up
> busybox ifconfig eth0 192.168.1.111
> ping 192.168.1.1
Series applied to net-next, thanks.
^ permalink raw reply
* [net 3/3] i40e: Fix possible memory leak in i40e_dbg_dump_desc
From: Jeff Kirsher @ 2015-01-01 0:04 UTC (permalink / raw)
To: davem; +Cc: Joe Perches, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1420070655-28453-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Joe Perches <joe@perches.com>
I didn't notice that return in the code, fix it by
adding a goto out instead to free the memory.
Fixes:
> New smatch warnings:
> drivers/net/ethernet/intel/i40e/i40e_debugfs.c:832 i40e_dbg_dump_desc() warn: possible memory leak of 'ring'
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 433a558..cb0de45 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -829,7 +829,7 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n,
if (desc_n >= ring->count || desc_n < 0) {
dev_info(&pf->pdev->dev,
"descriptor %d not found\n", desc_n);
- return;
+ goto out;
}
if (!is_rx_ring) {
txd = I40E_TX_DESC(ring, desc_n);
@@ -855,6 +855,8 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n,
} else {
dev_info(&pf->pdev->dev, "dump desc rx/tx <vsi_seid> <ring_id> [<desc_n>]\n");
}
+
+out:
kfree(ring);
}
--
1.9.3
^ permalink raw reply related
* [net 2/3] igb: Remove unneeded FIXME
From: Jeff Kirsher @ 2015-01-01 0:04 UTC (permalink / raw)
To: davem; +Cc: Todd Fujinaka, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1420070655-28453-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Todd Fujinaka <todd.fujinaka@intel.com>
Remove a FIXME comment that was missed in a commit on 1/2007.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Reported-by: nick <xerofoify@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/e1000_82575.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index 051ea94..0f69ef8 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -1125,7 +1125,7 @@ static s32 igb_acquire_swfw_sync_82575(struct e1000_hw *hw, u16 mask)
u32 swmask = mask;
u32 fwmask = mask << 16;
s32 ret_val = 0;
- s32 i = 0, timeout = 200; /* FIXME: find real value to use here */
+ s32 i = 0, timeout = 200;
while (i < timeout) {
if (igb_get_hw_semaphore(hw)) {
--
1.9.3
^ permalink raw reply related
* [net 1/3] e100: fix typo in MDI/MDI-X eeprom check in e100_phy_init
From: Jeff Kirsher @ 2015-01-01 0:04 UTC (permalink / raw)
To: davem; +Cc: John W. Linville, netdev, nhorman, sassmann, jogreene,
Jeff Kirsher
In-Reply-To: <1420070655-28453-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: "John W. Linville" <linville@tuxdriver.com>
Although it doesn't explicitly say so, commit 60ffa478759f39a2 ("e100:
Fix MDIO/MDIO-X") appears to be intended to revert the earlier commit
648951451e6d2d53 ("e100: fixed e100 MDI/MDI-X issues"). However,
careful examination reveals that the attempted revert actually
_inverted_ the test for eeprom_mdix_enabled. That is bound to program
a few PHYs incorrectly...
https://bugzilla.redhat.com/show_bug.cgi?id=1156417
Signed-off-by: "John W. Linville" <linville@tuxdriver.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/e100.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/e100.c b/drivers/net/ethernet/intel/e100.c
index 781065e..e9c3a87 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -1543,7 +1543,7 @@ static int e100_phy_init(struct nic *nic)
mdio_write(netdev, nic->mii.phy_id, MII_BMCR, bmcr);
} else if ((nic->mac >= mac_82550_D102) || ((nic->flags & ich) &&
(mdio_read(netdev, nic->mii.phy_id, MII_TPISTATUS) & 0x8000) &&
- !(nic->eeprom[eeprom_cnfg_mdix] & eeprom_mdix_enabled))) {
+ (nic->eeprom[eeprom_cnfg_mdix] & eeprom_mdix_enabled))) {
/* enable/disable MDI/MDI-X auto-switching. */
mdio_write(netdev, nic->mii.phy_id, MII_NCONFIG,
nic->mii.force_media ? 0 : NCONFIG_AUTO_SWITCH);
--
1.9.3
^ permalink raw reply related
* [net 0/3][pull request] Intel Wired LAN Driver Updates 2014-12-31
From: Jeff Kirsher @ 2015-01-01 0:04 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene
This series contains fixes for e100, igb and i40e.
John Linville fixes a typo in e100 that has been around for some time,
where an attempted revert actually inverted the test for eeprom_mdix_enabled.
Todd fixes up a code comment that should have been removed back in 2007.
Joe Perches fixes a possible memory leak in i40e which was reported by
Dan Carpenter using smatch.
The following are changes since commit 2c90331cf5ed1d648a711b9483e173aaaf2c4a9b:
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master
Joe Perches (1):
i40e: Fix possible memory leak in i40e_dbg_dump_desc
John W. Linville (1):
e100: fix typo in MDI/MDI-X eeprom check in e100_phy_init
Todd Fujinaka (1):
igb: Remove unneeded FIXME
drivers/net/ethernet/intel/e100.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 4 +++-
drivers/net/ethernet/intel/igb/e1000_82575.c | 2 +-
3 files changed, 5 insertions(+), 3 deletions(-)
--
1.9.3
^ permalink raw reply
* Re: [net-next PATCH 00/17] fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%
From: David Miller @ 2014-12-31 23:46 UTC (permalink / raw)
To: alexander.h.duyck; +Cc: netdev
In-Reply-To: <20141231184649.3006.29958.stgit@ahduyck-vm-fedora20>
From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Wed, 31 Dec 2014 10:55:23 -0800
> These patches are meant to address several performance issues I have seen
> in the fib_trie implementation, and fib_table_lookup specifically. With
> these changes in place I have seen a reduction of up to 35 to 75% for the
> total time spent in fib_table_lookup depending on the type of search being
> performed.
...
> Changes since RFC:
> Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
> Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
As before, this looks awesome.
All applied to net-next, thanks!
This knocks about 35 CPU cycles off a lookup that ends up using the
default route on sparc64, from ~438 cycles to ~403.
^ permalink raw reply
* Re: [PATCH 2/2] igb_ptp: Include clocksource.h to get CLOCKSOURCE_MASK.
From: Jeff Kirsher @ 2014-12-31 23:43 UTC (permalink / raw)
To: David Miller; +Cc: Richard Cochran, netdev
In-Reply-To: <20141231.183359.681102444156146233.davem@davemloft.net>
On Wed, Dec 31, 2014 at 3:33 PM, David Miller <davem@davemloft.net> wrote:
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> drivers/net/ethernet/intel/igb/igb_ptp.c | 1 +
> 1 file changed, 1 insertion(+)
>
^ permalink raw reply