Netdev List
* Re: [PATCH] xfrm: Report user triggered expirations against the users socket
From: Jamal Hadi Salim @ 2012-09-08 11:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev, Jamal Hadi Salim
In-Reply-To: <87pq5xhtky.fsf_-_@xmission.com>

On 12-09-08 03:17 AM, Eric W. Biederman wrote:
> When a policy expiration is triggered from user space the request
> travels through km_policy_expired and ultimately into
> xfrm_exp_policy_notify which calls build_polexpire.  build_polexpire
> uses the netlink port passed to km_policy_expired as the source port for
> the netlink message it builds.
>
> When a state expiration is triggered from user space the request travels
> through km_state_expired and ultimately into xfrm_exp_state_notify which
> calls build_expire.  build_expire uses the netlink port passed to
> km_state_expired as the source port for the netlink message it builds.
>
> Pass nlh->nlmsg_pid from the user generated netlink message that
> requested the expiration to km_policy_expired and km_state_expired
> instead of current->pid which is not a netlink port number.
>
> Cc: Jamal Hadi Salim <hadi@cyberus.ca>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>

I suppose.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>


cheers,
jamal
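
The distinction the patch draws can be sketched in user-space C. This is
only a model under stated assumptions, not the kernel code: the type
nlmsghdr_model and the functions build_expire_model and
handle_expire_request are hypothetical stand-ins. The point it shows is
that the port recorded in the notification comes from the requesting
message's nlmsg_pid, not from a process ID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct nlmsghdr. */
struct nlmsghdr_model {
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;	/* netlink port ID, not a process ID */
};

/* Models build_expire()/build_polexpire(): the port handed in becomes
 * the port recorded in the notification that is built. */
static void build_expire_model(struct nlmsghdr_model *out, uint32_t portid)
{
	out->nlmsg_seq = 0;
	out->nlmsg_pid = portid;
}

/* Models the user-triggered path: forward the requester's port ID taken
 * from the request header instead of the caller's process ID. */
static void handle_expire_request(const struct nlmsghdr_model *req,
				  struct nlmsghdr_model *notify)
{
	assert(req && notify);
	build_expire_model(notify, req->nlmsg_pid);
}
```

A caller would fill in a request header, pass it to
handle_expire_request(), and read the port back from the notification.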


* [PATCH 0/2] [v3] netlink_kernel_create updates
From: pablo @ 2012-09-08 12:53 UTC (permalink / raw)
  To: netdev; +Cc: davem

From: Pablo Neira Ayuso <pablo@netfilter.org>

Hi David,

Fixed the infiniband issue. New round of these patches.

Please, apply.

Thanks!

Pablo Neira Ayuso (2):
  netlink: kill netlink_set_nonroot
  netlink: hide struct module parameter in netlink_kernel_create

 crypto/crypto_user.c                |    3 +--
 drivers/connector/connector.c       |    3 +--
 drivers/infiniband/core/netlink.c   |    2 +-
 drivers/scsi/scsi_netlink.c         |    2 +-
 drivers/scsi/scsi_transport_iscsi.c |    3 +--
 drivers/staging/gdm72xx/netlink_k.c |    2 +-
 include/linux/netlink.h             |   22 +++++++++++++--------
 kernel/audit.c                      |    3 +--
 lib/kobject_uevent.c                |    5 ++---
 net/bridge/netfilter/ebt_ulog.c     |    3 +--
 net/core/rtnetlink.c                |    4 ++--
 net/core/sock_diag.c                |    3 +--
 net/decnet/netfilter/dn_rtmsg.c     |    3 +--
 net/ipv4/fib_frontend.c             |    2 +-
 net/ipv4/netfilter/ipt_ULOG.c       |    3 +--
 net/netfilter/nfnetlink.c           |    2 +-
 net/netlink/af_netlink.c            |   36 ++++++++++++++++-------------------
 net/netlink/genetlink.c             |    6 ++----
 net/xfrm/xfrm_user.c                |    2 +-
 security/selinux/netlink.c          |    5 ++---
 20 files changed, 52 insertions(+), 62 deletions(-)

-- 
1.7.10.4


* [PATCH 1/2] netlink: kill netlink_set_nonroot
From: pablo @ 2012-09-08 12:53 UTC (permalink / raw)
  To: netdev; +Cc: davem
In-Reply-To: <1347108834-15429-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

Replace netlink_set_nonroot with a new `flags' field in
struct netlink_kernel_cfg, which is passed to netlink_kernel_create.

This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).

Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.
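
As a rough illustration of the idea, here is a user-space model of the
per-protocol flags. It is a sketch, not the kernel implementation: the
names cfg_model, table_model, create_model, release_model and
capable_model are hypothetical, and the CAP_NET_ADMIN check is reduced
to a boolean argument. Only the two flag names and their values come
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NL_CFG_F_NONROOT_RECV	(1 << 0)
#define NL_CFG_F_NONROOT_SEND	(1 << 1)
#define MAX_LINKS_MODEL		32	/* hypothetical table size */

/* Hypothetical stand-ins for netlink_kernel_cfg and netlink_table. */
struct cfg_model { unsigned int groups; unsigned int flags; };
struct table_model { unsigned int flags; bool registered; };

static struct table_model nl_table_model[MAX_LINKS_MODEL];

/* Models netlink_kernel_create(): the flags travel with the cfg. */
static void create_model(int unit, const struct cfg_model *cfg)
{
	assert(unit >= 0 && unit < MAX_LINKS_MODEL);
	if (cfg)
		nl_table_model[unit].flags = cfg->flags;
	nl_table_model[unit].registered = true;
}

/* Models the release path: the flags are cleared with the entry. */
static void release_model(int unit)
{
	assert(unit >= 0 && unit < MAX_LINKS_MODEL);
	nl_table_model[unit].flags = 0;
	nl_table_model[unit].registered = false;
}

/* Models netlink_capable(): allowed if the flag is set for the
 * protocol or the caller is privileged. */
static bool capable_model(int unit, unsigned int flag, bool has_cap_net_admin)
{
	return (nl_table_model[unit].flags & flag) || has_cap_net_admin;
}
```

With this shape there is no window between creating the socket and
setting the permission, since both happen in one call.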

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netlink.h    |    9 ++++-----
 lib/kobject_uevent.c       |    2 +-
 net/core/rtnetlink.c       |    2 +-
 net/netlink/af_netlink.c   |   28 +++++++++++++---------------
 net/netlink/genetlink.c    |    3 +--
 security/selinux/netlink.c |    2 +-
 6 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index c9fdde2..d30ee743 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -175,12 +175,16 @@ struct netlink_skb_parms {
 extern void netlink_table_grab(void);
 extern void netlink_table_ungrab(void);
 
+#define NL_CFG_F_NONROOT_RECV	(1 << 0)
+#define NL_CFG_F_NONROOT_SEND	(1 << 1)
+
 /* optional Netlink kernel configuration parameters */
 struct netlink_kernel_cfg {
 	unsigned int	groups;
 	void		(*input)(struct sk_buff *skb);
 	struct mutex	*cb_mutex;
 	void		(*bind)(int group);
+	unsigned int	flags;
 };
 
 extern struct sock *netlink_kernel_create(struct net *net, int unit,
@@ -259,11 +263,6 @@ extern int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 			      const struct nlmsghdr *nlh,
 			      struct netlink_dump_control *control);
 
-
-#define NL_NONROOT_RECV 0x1
-#define NL_NONROOT_SEND 0x2
-extern void netlink_set_nonroot(int protocol, unsigned flag);
-
 #endif /* __KERNEL__ */
 
 #endif	/* __LINUX_NETLINK_H */
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 0401d29..c2e9778 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -375,6 +375,7 @@ static int uevent_net_init(struct net *net)
 	struct uevent_sock *ue_sk;
 	struct netlink_kernel_cfg cfg = {
 		.groups	= 1,
+		.flags	= NL_CFG_F_NONROOT_RECV,
 	};
 
 	ue_sk = kzalloc(sizeof(*ue_sk), GFP_KERNEL);
@@ -422,7 +423,6 @@ static struct pernet_operations uevent_net_ops = {
 
 static int __init kobject_uevent_init(void)
 {
-	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
 	return register_pernet_subsys(&uevent_net_ops);
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c64efcf..a71806e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2381,6 +2381,7 @@ static int __net_init rtnetlink_net_init(struct net *net)
 		.groups		= RTNLGRP_MAX,
 		.input		= rtnetlink_rcv,
 		.cb_mutex	= &rtnl_mutex,
+		.flags		= NL_CFG_F_NONROOT_RECV,
 	};
 
 	sk = netlink_kernel_create(net, NETLINK_ROUTE, THIS_MODULE, &cfg);
@@ -2416,7 +2417,6 @@ void __init rtnetlink_init(void)
 	if (register_pernet_subsys(&rtnetlink_net_ops))
 		panic("rtnetlink_init: cannot initialize rtnetlink\n");
 
-	netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV);
 	register_netdevice_notifier(&rtnetlink_dev_notifier);
 
 	rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 3821199..1543a66 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -121,7 +121,7 @@ struct netlink_table {
 	struct nl_pid_hash	hash;
 	struct hlist_head	mc_list;
 	struct listeners __rcu	*listeners;
-	unsigned int		nl_nonroot;
+	unsigned int		flags;
 	unsigned int		groups;
 	struct mutex		*cb_mutex;
 	struct module		*module;
@@ -536,6 +536,8 @@ static int netlink_release(struct socket *sock)
 		if (--nl_table[sk->sk_protocol].registered == 0) {
 			kfree(nl_table[sk->sk_protocol].listeners);
 			nl_table[sk->sk_protocol].module = NULL;
+			nl_table[sk->sk_protocol].bind = NULL;
+			nl_table[sk->sk_protocol].flags = 0;
 			nl_table[sk->sk_protocol].registered = 0;
 		}
 	} else if (nlk->subscriptions) {
@@ -596,7 +598,7 @@ retry:
 
 static inline int netlink_capable(const struct socket *sock, unsigned int flag)
 {
-	return (nl_table[sock->sk->sk_protocol].nl_nonroot & flag) ||
+	return (nl_table[sock->sk->sk_protocol].flags & flag) ||
 	       capable(CAP_NET_ADMIN);
 }
 
@@ -659,7 +661,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 
 	/* Only superuser is allowed to listen multicasts */
 	if (nladdr->nl_groups) {
-		if (!netlink_capable(sock, NL_NONROOT_RECV))
+		if (!netlink_capable(sock, NL_CFG_F_NONROOT_RECV))
 			return -EPERM;
 		err = netlink_realloc_groups(sk);
 		if (err)
@@ -721,7 +723,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
 		return -EINVAL;
 
 	/* Only superuser is allowed to send multicasts */
-	if (nladdr->nl_groups && !netlink_capable(sock, NL_NONROOT_SEND))
+	if (nladdr->nl_groups && !netlink_capable(sock, NL_CFG_F_NONROOT_SEND))
 		return -EPERM;
 
 	if (!nlk->pid)
@@ -1244,7 +1246,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		break;
 	case NETLINK_ADD_MEMBERSHIP:
 	case NETLINK_DROP_MEMBERSHIP: {
-		if (!netlink_capable(sock, NL_NONROOT_RECV))
+		if (!netlink_capable(sock, NL_CFG_F_NONROOT_RECV))
 			return -EPERM;
 		err = netlink_realloc_groups(sk);
 		if (err)
@@ -1376,7 +1378,7 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		dst_group = ffs(addr->nl_groups);
 		err =  -EPERM;
 		if ((dst_group || dst_pid) &&
-		    !netlink_capable(sock, NL_NONROOT_SEND))
+		    !netlink_capable(sock, NL_CFG_F_NONROOT_SEND))
 			goto out;
 	} else {
 		dst_pid = nlk->dst_pid;
@@ -1580,7 +1582,10 @@ netlink_kernel_create(struct net *net, int unit,
 		rcu_assign_pointer(nl_table[unit].listeners, listeners);
 		nl_table[unit].cb_mutex = cb_mutex;
 		nl_table[unit].module = module;
-		nl_table[unit].bind = cfg ? cfg->bind : NULL;
+		if (cfg) {
+			nl_table[unit].bind = cfg->bind;
+			nl_table[unit].flags = cfg->flags;
+		}
 		nl_table[unit].registered = 1;
 	} else {
 		kfree(listeners);
@@ -1679,13 +1684,6 @@ void netlink_clear_multicast_users(struct sock *ksk, unsigned int group)
 	netlink_table_ungrab();
 }
 
-void netlink_set_nonroot(int protocol, unsigned int flags)
-{
-	if ((unsigned int)protocol < MAX_LINKS)
-		nl_table[protocol].nl_nonroot = flags;
-}
-EXPORT_SYMBOL(netlink_set_nonroot);
-
 struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, int type, int len, int flags)
 {
@@ -2150,7 +2148,7 @@ static void __init netlink_add_usersock_entry(void)
 	rcu_assign_pointer(nl_table[NETLINK_USERSOCK].listeners, listeners);
 	nl_table[NETLINK_USERSOCK].module = THIS_MODULE;
 	nl_table[NETLINK_USERSOCK].registered = 1;
-	nl_table[NETLINK_USERSOCK].nl_nonroot = NL_NONROOT_SEND;
+	nl_table[NETLINK_USERSOCK].flags = NL_CFG_F_NONROOT_SEND;
 
 	netlink_table_ungrab();
 }
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index fda4974..c1b71ae 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -918,6 +918,7 @@ static int __net_init genl_pernet_init(struct net *net)
 	struct netlink_kernel_cfg cfg = {
 		.input		= genl_rcv,
 		.cb_mutex	= &genl_mutex,
+		.flags		= NL_CFG_F_NONROOT_RECV,
 	};
 
 	/* we'll bump the group number right afterwards */
@@ -955,8 +956,6 @@ static int __init genl_init(void)
 	if (err < 0)
 		goto problem;
 
-	netlink_set_nonroot(NETLINK_GENERIC, NL_NONROOT_RECV);
-
 	err = register_pernet_subsys(&genl_pernet_ops);
 	if (err)
 		goto problem;
diff --git a/security/selinux/netlink.c b/security/selinux/netlink.c
index 8a77725..0d2cd11 100644
--- a/security/selinux/netlink.c
+++ b/security/selinux/netlink.c
@@ -113,13 +113,13 @@ static int __init selnl_init(void)
 {
 	struct netlink_kernel_cfg cfg = {
 		.groups	= SELNLGRP_MAX,
+		.flags	= NL_CFG_F_NONROOT_RECV,
 	};
 
 	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX,
 				      THIS_MODULE, &cfg);
 	if (selnl == NULL)
 		panic("SELinux:  Cannot create netlink socket.");
-	netlink_set_nonroot(NETLINK_SELINUX, NL_NONROOT_RECV);
 	return 0;
 }
 
-- 
1.7.10.4


* [PATCH 2/2] netlink: hide struct module parameter in netlink_kernel_create
From: pablo @ 2012-09-08 12:53 UTC (permalink / raw)
  To: netdev; +Cc: davem
In-Reply-To: <1347108834-15429-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).
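
The wrapper pattern can be sketched in plain C as follows. This is a
user-space model with hypothetical names (module_model, sock_model,
THIS_MODULE_MODEL, __create_model, create_model). It only shows how a
static inline wrapper can supply the module argument so that callers no
longer spell it out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for struct module and struct sock. */
struct module_model { const char *name; };
struct sock_model { const struct module_model *owner; int unit; };

/* Stand-in for THIS_MODULE: resolved where the wrapper is compiled. */
static struct module_model this_module_model = { "demo_subsys" };
#define THIS_MODULE_MODEL (&this_module_model)

static struct sock_model the_sock;

/* The full-argument function, as exported by the core. */
static struct sock_model *__create_model(int unit, struct module_model *module)
{
	the_sock.owner = module;
	the_sock.unit = unit;
	return &the_sock;
}

/* The wrapper callers use: the module argument is filled in for them. */
static inline struct sock_model *create_model(int unit)
{
	return __create_model(unit, THIS_MODULE_MODEL);
}
```

Because the wrapper is static inline in a header, the macro expands in
the including file, so each subsystem still passes its own module.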

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 crypto/crypto_user.c                |    3 +--
 drivers/connector/connector.c       |    3 +--
 drivers/infiniband/core/netlink.c   |    2 +-
 drivers/scsi/scsi_netlink.c         |    2 +-
 drivers/scsi/scsi_transport_iscsi.c |    3 +--
 drivers/staging/gdm72xx/netlink_k.c |    2 +-
 include/linux/netlink.h             |   13 ++++++++++---
 kernel/audit.c                      |    3 +--
 lib/kobject_uevent.c                |    3 +--
 net/bridge/netfilter/ebt_ulog.c     |    3 +--
 net/core/rtnetlink.c                |    2 +-
 net/core/sock_diag.c                |    3 +--
 net/decnet/netfilter/dn_rtmsg.c     |    3 +--
 net/ipv4/fib_frontend.c             |    2 +-
 net/ipv4/netfilter/ipt_ULOG.c       |    3 +--
 net/netfilter/nfnetlink.c           |    2 +-
 net/netlink/af_netlink.c            |    8 +++-----
 net/netlink/genetlink.c             |    3 +--
 net/xfrm/xfrm_user.c                |    2 +-
 security/selinux/netlink.c          |    3 +--
 20 files changed, 31 insertions(+), 37 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index ba2c611..165914e 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -500,8 +500,7 @@ static int __init crypto_user_init(void)
 		.input	= crypto_netlink_rcv,
 	};
 
-	crypto_nlsk = netlink_kernel_create(&init_net, NETLINK_CRYPTO,
-					    THIS_MODULE, &cfg);
+	crypto_nlsk = netlink_kernel_create(&init_net, NETLINK_CRYPTO, &cfg);
 	if (!crypto_nlsk)
 		return -ENOMEM;
 
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 82fa4f0..965b781 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -264,8 +264,7 @@ static int __devinit cn_init(void)
 		.input	= dev->input,
 	};
 
-	dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR,
-					 THIS_MODULE, &cfg);
+	dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR, &cfg);
 	if (!dev->nls)
 		return -EIO;
 
diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
index 3ae2bfd..fe10a94 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -177,7 +177,7 @@ int __init ibnl_init(void)
 		.input	= ibnl_rcv,
 	};
 
-	nls = netlink_kernel_create(&init_net, NETLINK_RDMA, THIS_MODULE, &cfg);
+	nls = netlink_kernel_create(&init_net, NETLINK_RDMA, &cfg);
 	if (!nls) {
 		pr_warn("Failed to create netlink socket\n");
 		return -ENOMEM;
diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 8818dd6..3252bc9 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -501,7 +501,7 @@ scsi_netlink_init(void)
 	}
 
 	scsi_nl_sock = netlink_kernel_create(&init_net, NETLINK_SCSITRANSPORT,
-					     THIS_MODULE, &cfg);
+					     &cfg);
 	if (!scsi_nl_sock) {
 		printk(KERN_ERR "%s: register of receive handler failed\n",
 				__func__);
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index fa1dfaa..519bd53 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2969,8 +2969,7 @@ static __init int iscsi_transport_init(void)
 	if (err)
 		goto unregister_conn_class;
 
-	nls = netlink_kernel_create(&init_net, NETLINK_ISCSI,
-				    THIS_MODULE, &cfg);
+	nls = netlink_kernel_create(&init_net, NETLINK_ISCSI, &cfg);
 	if (!nls) {
 		err = -ENOBUFS;
 		goto unregister_session_class;
diff --git a/drivers/staging/gdm72xx/netlink_k.c b/drivers/staging/gdm72xx/netlink_k.c
index 3abb31d..2109cab 100644
--- a/drivers/staging/gdm72xx/netlink_k.c
+++ b/drivers/staging/gdm72xx/netlink_k.c
@@ -95,7 +95,7 @@ struct sock *netlink_init(int unit, void (*cb)(struct net_device *dev, u16 type,
 	init_MUTEX(&netlink_mutex);
 #endif
 
-	sock = netlink_kernel_create(&init_net, unit, THIS_MODULE, &cfg);
+	sock = netlink_kernel_create(&init_net, unit, &cfg);
 
 	if (sock)
 		rcv_cb = cb;
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index d30ee743..628e799 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -153,6 +153,7 @@ struct nlattr {
 
 #include <linux/capability.h>
 #include <linux/skbuff.h>
+#include <linux/module.h>
 
 struct net;
 
@@ -187,9 +188,15 @@ struct netlink_kernel_cfg {
 	unsigned int	flags;
 };
 
-extern struct sock *netlink_kernel_create(struct net *net, int unit,
-					  struct module *module,
-					  struct netlink_kernel_cfg *cfg);
+extern struct sock *__netlink_kernel_create(struct net *net, int unit,
+					    struct module *module,
+					    struct netlink_kernel_cfg *cfg);
+static inline struct sock *
+netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg)
+{
+	return __netlink_kernel_create(net, unit, THIS_MODULE, cfg);
+}
+
 extern void netlink_kernel_release(struct sock *sk);
 extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern int netlink_change_ngroups(struct sock *sk, unsigned int groups);
diff --git a/kernel/audit.c b/kernel/audit.c
index ea3b7b6..a24aafa 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -971,8 +971,7 @@ static int __init audit_init(void)
 
 	printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
 	       audit_default ? "enabled" : "disabled");
-	audit_sock = netlink_kernel_create(&init_net, NETLINK_AUDIT,
-					   THIS_MODULE, &cfg);
+	audit_sock = netlink_kernel_create(&init_net, NETLINK_AUDIT, &cfg);
 	if (!audit_sock)
 		audit_panic("cannot initialize netlink socket");
 	else
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c2e9778..52e5abb 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -382,8 +382,7 @@ static int uevent_net_init(struct net *net)
 	if (!ue_sk)
 		return -ENOMEM;
 
-	ue_sk->sk = netlink_kernel_create(net, NETLINK_KOBJECT_UEVENT,
-					  THIS_MODULE, &cfg);
+	ue_sk->sk = netlink_kernel_create(net, NETLINK_KOBJECT_UEVENT, &cfg);
 	if (!ue_sk->sk) {
 		printk(KERN_ERR
 		       "kobject_uevent: unable to create netlink socket!\n");
diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
index 1906347..3476ec4 100644
--- a/net/bridge/netfilter/ebt_ulog.c
+++ b/net/bridge/netfilter/ebt_ulog.c
@@ -298,8 +298,7 @@ static int __init ebt_ulog_init(void)
 		spin_lock_init(&ulog_buffers[i].lock);
 	}
 
-	ebtulognl = netlink_kernel_create(&init_net, NETLINK_NFLOG,
-					  THIS_MODULE, &cfg);
+	ebtulognl = netlink_kernel_create(&init_net, NETLINK_NFLOG, &cfg);
 	if (!ebtulognl)
 		ret = -ENOMEM;
 	else if ((ret = xt_register_target(&ebt_ulog_tg_reg)) != 0)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a71806e..508c5df 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2384,7 +2384,7 @@ static int __net_init rtnetlink_net_init(struct net *net)
 		.flags		= NL_CFG_F_NONROOT_RECV,
 	};
 
-	sk = netlink_kernel_create(net, NETLINK_ROUTE, THIS_MODULE, &cfg);
+	sk = netlink_kernel_create(net, NETLINK_ROUTE, &cfg);
 	if (!sk)
 		return -ENOMEM;
 	net->rtnl = sk;
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 9d8755e..602cd63 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -172,8 +172,7 @@ static int __net_init diag_net_init(struct net *net)
 		.input	= sock_diag_rcv,
 	};
 
-	net->diag_nlsk = netlink_kernel_create(net, NETLINK_SOCK_DIAG,
-					       THIS_MODULE, &cfg);
+	net->diag_nlsk = netlink_kernel_create(net, NETLINK_SOCK_DIAG, &cfg);
 	return net->diag_nlsk == NULL ? -ENOMEM : 0;
 }
 
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 11db0ec..dfe4201 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -130,8 +130,7 @@ static int __init dn_rtmsg_init(void)
 		.input	= dnrmg_receive_user_skb,
 	};
 
-	dnrmg = netlink_kernel_create(&init_net,
-				      NETLINK_DNRTMSG, THIS_MODULE, &cfg);
+	dnrmg = netlink_kernel_create(&init_net, NETLINK_DNRTMSG, &cfg);
 	if (dnrmg == NULL) {
 		printk(KERN_ERR "dn_rtmsg: Cannot create netlink socket");
 		return -ENOMEM;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index acdee32..21bf521 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -986,7 +986,7 @@ static int __net_init nl_fib_lookup_init(struct net *net)
 		.input	= nl_fib_input,
 	};
 
-	sk = netlink_kernel_create(net, NETLINK_FIB_LOOKUP, THIS_MODULE, &cfg);
+	sk = netlink_kernel_create(net, NETLINK_FIB_LOOKUP, &cfg);
 	if (sk == NULL)
 		return -EAFNOSUPPORT;
 	net->ipv4.fibnl = sk;
diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c
index 1109f7f..b5ef3cb 100644
--- a/net/ipv4/netfilter/ipt_ULOG.c
+++ b/net/ipv4/netfilter/ipt_ULOG.c
@@ -396,8 +396,7 @@ static int __init ulog_tg_init(void)
 	for (i = 0; i < ULOG_MAXNLGROUPS; i++)
 		setup_timer(&ulog_buffers[i].timer, ulog_timer, i);
 
-	nflognl = netlink_kernel_create(&init_net, NETLINK_NFLOG,
-					THIS_MODULE, &cfg);
+	nflognl = netlink_kernel_create(&init_net, NETLINK_NFLOG, &cfg);
 	if (!nflognl)
 		return -ENOMEM;
 
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index a265033..ffb92c0 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -241,7 +241,7 @@ static int __net_init nfnetlink_net_init(struct net *net)
 #endif
 	};
 
-	nfnl = netlink_kernel_create(net, NETLINK_NETFILTER, THIS_MODULE, &cfg);
+	nfnl = netlink_kernel_create(net, NETLINK_NETFILTER, &cfg);
 	if (!nfnl)
 		return -ENOMEM;
 	net->nfnl_stash = nfnl;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 1543a66..93768db 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1526,9 +1526,8 @@ static void netlink_data_ready(struct sock *sk, int len)
  */
 
 struct sock *
-netlink_kernel_create(struct net *net, int unit,
-		      struct module *module,
-		      struct netlink_kernel_cfg *cfg)
+__netlink_kernel_create(struct net *net, int unit, struct module *module,
+			struct netlink_kernel_cfg *cfg)
 {
 	struct socket *sock;
 	struct sock *sk;
@@ -1603,8 +1602,7 @@ out_sock_release_nosk:
 	sock_release(sock);
 	return NULL;
 }
-EXPORT_SYMBOL(netlink_kernel_create);
-
+EXPORT_SYMBOL(__netlink_kernel_create);
 
 void
 netlink_kernel_release(struct sock *sk)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index c1b71ae..19288b7 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -922,8 +922,7 @@ static int __net_init genl_pernet_init(struct net *net)
 	};
 
 	/* we'll bump the group number right afterwards */
-	net->genl_sock = netlink_kernel_create(net, NETLINK_GENERIC,
-					       THIS_MODULE, &cfg);
+	net->genl_sock = netlink_kernel_create(net, NETLINK_GENERIC, &cfg);
 
 	if (!net->genl_sock && net_eq(net, &init_net))
 		panic("GENL: Cannot initialize generic netlink\n");
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index ab58034..354070a 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2963,7 +2963,7 @@ static int __net_init xfrm_user_net_init(struct net *net)
 		.input	= xfrm_netlink_rcv,
 	};
 
-	nlsk = netlink_kernel_create(net, NETLINK_XFRM, THIS_MODULE, &cfg);
+	nlsk = netlink_kernel_create(net, NETLINK_XFRM, &cfg);
 	if (nlsk == NULL)
 		return -ENOMEM;
 	net->xfrm.nlsk_stash = nlsk; /* Don't set to NULL */
diff --git a/security/selinux/netlink.c b/security/selinux/netlink.c
index 0d2cd11..14d810e 100644
--- a/security/selinux/netlink.c
+++ b/security/selinux/netlink.c
@@ -116,8 +116,7 @@ static int __init selnl_init(void)
 		.flags	= NL_CFG_F_NONROOT_RECV,
 	};
 
-	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX,
-				      THIS_MODULE, &cfg);
+	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX, &cfg);
 	if (selnl == NULL)
 		panic("SELinux:  Cannot create netlink socket.");
 	return 0;
-- 
1.7.10.4


* (unknown), 
From: ranjith kumar @ 2012-09-08 14:13 UTC (permalink / raw)
  To: netdev

Hi,

In TCP socket programming, accept() is a blocking call.
Is there a way to make a non-blocking accept() call?

I want to do this because I am unable to kill the thread that called
accept().
Thanks.


* Re:
From: Rémi Denis-Courmont @ 2012-09-08 14:35 UTC (permalink / raw)
  To: ranjith kumar; +Cc: netdev
In-Reply-To: <CAG9fbzbQg9vWgt6ZcaRYKUQJNzQEziFp6_3Q_cJtSthsjhbN2Q@mail.gmail.com>

On Saturday 8 September 2012 17:13:02, ranjith kumar wrote:
> In TCP socket programming, accept() is a blocking call.
> Is there a way to make a non-blocking accept() call?

Yes, and there is ample, easy-to-find documentation on the Internet.
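
For reference, one common approach is to set O_NONBLOCK on the listening
socket with fcntl(). The sketch below is a minimal example rather than a
complete server: the helper names make_nonblocking_listener and
try_accept are made up for illustration, and it binds to a loopback
port chosen by the kernel.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listener on 127.0.0.1 with a kernel-chosen port and put
 * it into non-blocking mode. Returns the fd, or -1 on error. */
static int make_nonblocking_listener(void)
{
	struct sockaddr_in addr;
	int fd, fl;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0)
		goto fail;

	fl = fcntl(fd, F_GETFL, 0);
	if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
		goto fail;

	return fd;
fail:
	close(fd);
	return -1;
}

/* Returns a connected fd, or -1 with errno set. With O_NONBLOCK and no
 * pending connection, errno is EAGAIN or EWOULDBLOCK. */
static int try_accept(int listen_fd)
{
	assert(listen_fd >= 0);
	return accept(listen_fd, NULL, NULL);
}
```

A thread would typically wait in poll() or select() with a timeout,
check a shutdown flag between waits, and call try_accept() only when
the listener is readable.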


-- 
Rémi Denis-Courmont
http://www.remlab.net/


* [PATCH net-next 4/5] cnic: Allocate kcq resource only on devices that support FCoE.
From: Michael Chan @ 2012-09-08 16:01 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1347120065-26492-3-git-send-email-mchan@broadcom.com>

Allocate the second KCQ only on devices that support FCoE.  This saves
memory and lets the IRQ loop exit sooner on devices that don't support
FCoE.
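
The combined check can be modelled in user space like this. It is a
sketch only: the struct names, the boolean chip_is_e2_plus field and
the value of DRV_STATE_NO_FCOE_MODEL are hypothetical and are not taken
from the driver headers. The helper kcq_count_model is also made up,
to show how one predicate can gate the allocation.

```c
#include <assert.h>
#include <stdbool.h>

#define DRV_STATE_NO_FCOE_MODEL	0x1u	/* hypothetical flag value */

/* Hypothetical stand-ins for the ethdev and cnic_local structures. */
struct ethdev_model { unsigned int drv_state; };
struct cnic_local_model {
	bool chip_is_e2_plus;		/* models BNX2X_CHIP_IS_E2_PLUS() */
	struct ethdev_model *ethdev;
};

/* FCoE needs both a capable chip and a driver that has not opted out. */
#define SUPPORTS_FCOE_MODEL(cp)					\
	((cp)->chip_is_e2_plus &&				\
	 !((cp)->ethdev->drv_state & DRV_STATE_NO_FCOE_MODEL))

/* Number of kernel completion queues to allocate in this model. */
static int kcq_count_model(const struct cnic_local_model *cp)
{
	assert(cp && cp->ethdev);
	return SUPPORTS_FCOE_MODEL(cp) ? 2 : 1;
}
```

Keeping the two conditions in one macro means the allocation path and
the connection-limit setup cannot drift apart.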

Reviewed-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/cnic.c |    7 +++----
 drivers/net/ethernet/broadcom/cnic.h |    4 ++++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index ac08b8e..c223314 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1288,7 +1288,7 @@ static int cnic_alloc_bnx2x_resc(struct cnic_dev *dev)
 	if (ret)
 		goto error;
 
-	if (BNX2X_CHIP_IS_E2_PLUS(cp->chip_id)) {
+	if (CNIC_SUPPORTS_FCOE(cp)) {
 		ret = cnic_alloc_kcq(dev, &cp->kcq2, true);
 		if (ret)
 			goto error;
@@ -3130,7 +3130,7 @@ static void cnic_service_bnx2x_bh(unsigned long data)
 		CNIC_WR16(dev, cp->kcq1.io_addr,
 			  cp->kcq1.sw_prod_idx + MAX_KCQ_IDX);
 
-		if (!BNX2X_CHIP_IS_E2_PLUS(cp->chip_id)) {
+		if (cp->ethdev->drv_state & CNIC_DRV_STATE_NO_FCOE) {
 			cp->arm_int(dev, status_idx);
 			break;
 		}
@@ -5516,8 +5516,7 @@ static struct cnic_dev *init_bnx2x_cnic(struct net_device *dev)
 
 	if (!(ethdev->drv_state & CNIC_DRV_STATE_NO_ISCSI))
 		cdev->max_iscsi_conn = ethdev->max_iscsi_conn;
-	if (BNX2X_CHIP_IS_E2_PLUS(cp->chip_id) &&
-	    !(ethdev->drv_state & CNIC_DRV_STATE_NO_FCOE))
+	if (CNIC_SUPPORTS_FCOE(cp))
 		cdev->max_fcoe_conn = ethdev->max_fcoe_conn;
 
 	if (cdev->max_fcoe_conn > BNX2X_FCOE_NUM_CONNECTIONS)
diff --git a/drivers/net/ethernet/broadcom/cnic.h b/drivers/net/ethernet/broadcom/cnic.h
index 9643e3a..148604c 100644
--- a/drivers/net/ethernet/broadcom/cnic.h
+++ b/drivers/net/ethernet/broadcom/cnic.h
@@ -475,6 +475,10 @@ struct bnx2x_bd_chain_next {
 	  MAX_STAT_COUNTER_ID_E1))
 #endif
 
+#define CNIC_SUPPORTS_FCOE(cp)					\
+	(BNX2X_CHIP_IS_E2_PLUS((cp)->chip_id) &&		\
+	 !((cp)->ethdev->drv_state & CNIC_DRV_STATE_NO_FCOE))
+
 #define CNIC_RAMROD_TMO			(HZ / 4)
 
 #endif
-- 
1.7.1


* [PATCH net-next 5/5] cnic: Allocate UIO resources only on devices that support iSCSI.
From: Michael Chan @ 2012-09-08 16:01 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1347120065-26492-4-git-send-email-mchan@broadcom.com>

Update version to 2.5.13.

Reviewed-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/cnic.c    |    5 ++++-
 drivers/net/ethernet/broadcom/cnic_if.h |    4 ++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index c223314..2107d79 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1303,6 +1303,9 @@ static int cnic_alloc_bnx2x_resc(struct cnic_dev *dev)
 	if (ret)
 		goto error;
 
+	if (cp->ethdev->drv_state & CNIC_DRV_STATE_NO_ISCSI)
+		return 0;
+
 	cp->bnx2x_def_status_blk = cp->ethdev->irq_arr[1].status_blk;
 
 	cp->l2_rx_ring_size = 15;
@@ -5351,7 +5354,7 @@ static void cnic_stop_hw(struct cnic_dev *dev)
 		/* Need to wait for the ring shutdown event to complete
 		 * before clearing the CNIC_UP flag.
 		 */
-		while (cp->udev->uio_dev != -1 && i < 15) {
+		while (cp->udev && cp->udev->uio_dev != -1 && i < 15) {
 			msleep(100);
 			i++;
 		}
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 5cb8888..2e92c34 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -14,8 +14,8 @@
 
 #include "bnx2x/bnx2x_mfw_req.h"
 
-#define CNIC_MODULE_VERSION	"2.5.12"
-#define CNIC_MODULE_RELDATE	"June 29, 2012"
+#define CNIC_MODULE_VERSION	"2.5.13"
+#define CNIC_MODULE_RELDATE	"Sep 07, 2012"
 
 #define CNIC_ULP_RDMA		0
 #define CNIC_ULP_ISCSI		1
-- 
1.7.1


* [PATCH net-next 1/5] cnic: Add functions to allocate and free UIO rings
From: Michael Chan @ 2012-09-08 16:01 UTC (permalink / raw)
  To: davem; +Cc: netdev

These functions are needed to free up memory when the rings are no longer
needed.
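
The shape of the split can be sketched in user space, with calloc() and
free() standing in for the DMA allocator. The names uio_dev_model,
alloc_rings_model and free_rings_model are hypothetical. The sketch
shows the two properties visible in the patch: allocation is a no-op
when the ring already exists, and a partial failure is unwound through
the free helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ring fields of struct cnic_uio_dev. */
struct uio_dev_model {
	void *l2_ring;
	size_t l2_ring_size;
	void *l2_buf;
	size_t l2_buf_size;
};

/* Free whatever is allocated and clear the pointers, so the helper is
 * safe to call on a partially set up device. */
static void free_rings_model(struct uio_dev_model *udev)
{
	free(udev->l2_buf);
	udev->l2_buf = NULL;
	free(udev->l2_ring);
	udev->l2_ring = NULL;
}

/* Allocate ring and buffer; returns 0 on success or -ENOMEM. */
static int alloc_rings_model(struct uio_dev_model *udev,
			     size_t ring_size, size_t buf_size)
{
	if (udev->l2_ring)
		return 0;	/* already allocated, nothing to do */

	udev->l2_ring = calloc(1, ring_size);
	if (!udev->l2_ring)
		return -ENOMEM;
	udev->l2_ring_size = ring_size;

	udev->l2_buf = calloc(1, buf_size);
	if (!udev->l2_buf) {
		free_rings_model(udev);
		return -ENOMEM;
	}
	udev->l2_buf_size = buf_size;

	return 0;
}
```

Separating the ring helpers from device registration lets the rings be
released and reallocated without tearing down the device itself.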

Reviewed-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/cnic.c |   59 +++++++++++++++++++++++-----------
 1 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 3b4fc61..ff35894 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -823,10 +823,8 @@ static void cnic_free_context(struct cnic_dev *dev)
 	}
 }
 
-static void __cnic_free_uio(struct cnic_uio_dev *udev)
+static void __cnic_free_uio_rings(struct cnic_uio_dev *udev)
 {
-	uio_unregister_device(&udev->cnic_uinfo);
-
 	if (udev->l2_buf) {
 		dma_free_coherent(&udev->pdev->dev, udev->l2_buf_size,
 				  udev->l2_buf, udev->l2_buf_map);
@@ -839,6 +837,14 @@ static void __cnic_free_uio(struct cnic_uio_dev *udev)
 		udev->l2_ring = NULL;
 	}
 
+}
+
+static void __cnic_free_uio(struct cnic_uio_dev *udev)
+{
+	uio_unregister_device(&udev->cnic_uinfo);
+
+	__cnic_free_uio_rings(udev);
+
 	pci_dev_put(udev->pdev);
 	kfree(udev);
 }
@@ -996,6 +1002,34 @@ static int cnic_alloc_kcq(struct cnic_dev *dev, struct kcq_info *info,
 	return 0;
 }
 
+static int __cnic_alloc_uio_rings(struct cnic_uio_dev *udev, int pages)
+{
+	struct cnic_local *cp = udev->dev->cnic_priv;
+
+	if (udev->l2_ring)
+		return 0;
+
+	udev->l2_ring_size = pages * BCM_PAGE_SIZE;
+	udev->l2_ring = dma_alloc_coherent(&udev->pdev->dev, udev->l2_ring_size,
+					   &udev->l2_ring_map,
+					   GFP_KERNEL | __GFP_COMP);
+	if (!udev->l2_ring)
+		return -ENOMEM;
+
+	udev->l2_buf_size = (cp->l2_rx_ring_size + 1) * cp->l2_single_buf_size;
+	udev->l2_buf_size = PAGE_ALIGN(udev->l2_buf_size);
+	udev->l2_buf = dma_alloc_coherent(&udev->pdev->dev, udev->l2_buf_size,
+					  &udev->l2_buf_map,
+					  GFP_KERNEL | __GFP_COMP);
+	if (!udev->l2_buf) {
+		__cnic_free_uio_rings(udev);
+		return -ENOMEM;
+	}
+
+	return 0;
+
+}
+
 static int cnic_alloc_uio_rings(struct cnic_dev *dev, int pages)
 {
 	struct cnic_local *cp = dev->cnic_priv;
@@ -1020,20 +1054,9 @@ static int cnic_alloc_uio_rings(struct cnic_dev *dev, int pages)
 
 	udev->dev = dev;
 	udev->pdev = dev->pcidev;
-	udev->l2_ring_size = pages * BCM_PAGE_SIZE;
-	udev->l2_ring = dma_alloc_coherent(&udev->pdev->dev, udev->l2_ring_size,
-					   &udev->l2_ring_map,
-					   GFP_KERNEL | __GFP_COMP);
-	if (!udev->l2_ring)
-		goto err_udev;
 
-	udev->l2_buf_size = (cp->l2_rx_ring_size + 1) * cp->l2_single_buf_size;
-	udev->l2_buf_size = PAGE_ALIGN(udev->l2_buf_size);
-	udev->l2_buf = dma_alloc_coherent(&udev->pdev->dev, udev->l2_buf_size,
-					  &udev->l2_buf_map,
-					  GFP_KERNEL | __GFP_COMP);
-	if (!udev->l2_buf)
-		goto err_dma;
+	if (__cnic_alloc_uio_rings(udev, pages))
+		goto err_udev;
 
 	write_lock(&cnic_dev_lock);
 	list_add(&udev->list, &cnic_udev_list);
@@ -1044,9 +1067,7 @@ static int cnic_alloc_uio_rings(struct cnic_dev *dev, int pages)
 	cp->udev = udev;
 
 	return 0;
- err_dma:
-	dma_free_coherent(&udev->pdev->dev, udev->l2_ring_size,
-			  udev->l2_ring, udev->l2_ring_map);
+
  err_udev:
 	kfree(udev);
 	return -ENOMEM;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 3/5] cnic: Add function pointers to arm IRQ for different devices.
From: Michael Chan @ 2012-09-08 16:01 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1347120065-26492-2-git-send-email-mchan@broadcom.com>

This will make it easier to exit the IRQ loop and re-arm the IRQ on
devices that don't support FCoE.

Reviewed-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/cnic.c |   26 ++++++++++++++++++++++----
 drivers/net/ethernet/broadcom/cnic.h |    1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 38be4d9..ac08b8e 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -3078,6 +3078,22 @@ static void cnic_ack_bnx2x_e2_msix(struct cnic_dev *dev)
 			IGU_INT_DISABLE, 0);
 }
 
+static void cnic_arm_bnx2x_msix(struct cnic_dev *dev, u32 idx)
+{
+	struct cnic_local *cp = dev->cnic_priv;
+
+	cnic_ack_bnx2x_int(dev, cp->bnx2x_igu_sb_id, CSTORM_ID, idx,
+			   IGU_INT_ENABLE, 1);
+}
+
+static void cnic_arm_bnx2x_e2_msix(struct cnic_dev *dev, u32 idx)
+{
+	struct cnic_local *cp = dev->cnic_priv;
+
+	cnic_ack_igu_sb(dev, cp->bnx2x_igu_sb_id, IGU_SEG_ACCESS_DEF, idx,
+			IGU_INT_ENABLE, 1);
+}
+
 static u32 cnic_service_bnx2x_kcq(struct cnic_dev *dev, struct kcq_info *info)
 {
 	u32 last_status = *info->status_idx_ptr;
@@ -3115,8 +3131,7 @@ static void cnic_service_bnx2x_bh(unsigned long data)
 			  cp->kcq1.sw_prod_idx + MAX_KCQ_IDX);
 
 		if (!BNX2X_CHIP_IS_E2_PLUS(cp->chip_id)) {
-			cnic_ack_bnx2x_int(dev, cp->bnx2x_igu_sb_id, USTORM_ID,
-					   status_idx, IGU_INT_ENABLE, 1);
+			cp->arm_int(dev, status_idx);
 			break;
 		}
 
@@ -5520,10 +5535,13 @@ static struct cnic_dev *init_bnx2x_cnic(struct net_device *dev)
 	cp->stop_cm = cnic_cm_stop_bnx2x_hw;
 	cp->enable_int = cnic_enable_bnx2x_int;
 	cp->disable_int_sync = cnic_disable_bnx2x_int_sync;
-	if (BNX2X_CHIP_IS_E2_PLUS(cp->chip_id))
+	if (BNX2X_CHIP_IS_E2_PLUS(cp->chip_id)) {
 		cp->ack_int = cnic_ack_bnx2x_e2_msix;
-	else
+		cp->arm_int = cnic_arm_bnx2x_e2_msix;
+	} else {
 		cp->ack_int = cnic_ack_bnx2x_msix;
+		cp->arm_int = cnic_arm_bnx2x_msix;
+	}
 	cp->close_conn = cnic_close_bnx2x_conn;
 	return cdev;
 }
diff --git a/drivers/net/ethernet/broadcom/cnic.h b/drivers/net/ethernet/broadcom/cnic.h
index 3032809..9643e3a 100644
--- a/drivers/net/ethernet/broadcom/cnic.h
+++ b/drivers/net/ethernet/broadcom/cnic.h
@@ -334,6 +334,7 @@ struct cnic_local {
 	void			(*enable_int)(struct cnic_dev *);
 	void			(*disable_int_sync)(struct cnic_dev *);
 	void			(*ack_int)(struct cnic_dev *);
+	void			(*arm_int)(struct cnic_dev *, u32 index);
 	void			(*close_conn)(struct cnic_sock *, u32 opcode);
 };
 
-- 
1.7.1
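
The per-chip dispatch this patch introduces can be illustrated outside the
driver. The following is a minimal userspace sketch, not driver code: struct
demo_dev and every demo_* name are made up for illustration, and the handlers
only record which one ran instead of touching hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct cnic_dev/cnic_local; not driver code. */
struct demo_dev {
	int is_e2_plus;                                   /* chip family flag */
	void (*arm_int)(struct demo_dev *, uint32_t idx); /* chosen once at init */
	char last_call[8];                                /* records which handler ran */
	uint32_t last_idx;                                /* records the index passed in */
};

static void demo_arm_e1(struct demo_dev *dev, uint32_t idx)
{
	strcpy(dev->last_call, "e1");
	dev->last_idx = idx;
}

static void demo_arm_e2(struct demo_dev *dev, uint32_t idx)
{
	strcpy(dev->last_call, "e2");
	dev->last_idx = idx;
}

/* Select the per-chip handler once, in the spirit of the init_bnx2x_cnic() hunk. */
static void demo_init(struct demo_dev *dev, int is_e2_plus)
{
	memset(dev, 0, sizeof(*dev));
	dev->is_e2_plus = is_e2_plus;
	dev->arm_int = is_e2_plus ? demo_arm_e2 : demo_arm_e1;
}

/* The caller does not test the chip type; it calls through the pointer. */
static void demo_service(struct demo_dev *dev, uint32_t status_idx)
{
	assert(dev->arm_int != NULL);
	dev->arm_int(dev, status_idx);
}
```

The point of the design is that the chip test happens once at init time, so
call sites that need to re-arm the interrupt stay free of per-chip branches.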

^ permalink raw reply related

* [PATCH net-next 2/5] cnic: Free UIO rings when the device is closed.
From: Michael Chan @ 2012-09-08 16:01 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1347120065-26492-1-git-send-email-mchan@broadcom.com>

This will free up unneeded memory.

Reviewed-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/cnic.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index ff35894..38be4d9 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -868,6 +868,8 @@ static void cnic_free_resc(struct cnic_dev *dev)
 	if (udev) {
 		udev->dev = NULL;
 		cp->udev = NULL;
+		if (udev->uio_dev == -1)
+			__cnic_free_uio_rings(udev);
 	}
 
 	cnic_free_context(dev);
@@ -1039,6 +1041,11 @@ static int cnic_alloc_uio_rings(struct cnic_dev *dev, int pages)
 	list_for_each_entry(udev, &cnic_udev_list, list) {
 		if (udev->pdev == dev->pcidev) {
 			udev->dev = dev;
+			if (__cnic_alloc_uio_rings(udev, pages)) {
+				udev->dev = NULL;
+				read_unlock(&cnic_dev_lock);
+				return -ENOMEM;
+			}
 			cp->udev = udev;
 			read_unlock(&cnic_dev_lock);
 			return 0;
-- 
1.7.1
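
The allocate-if-missing, roll-back-on-failure and free-on-close pattern that
__cnic_alloc_uio_rings() and __cnic_free_uio_rings() rely on can be sketched
in userspace. This is only an illustration under assumptions: malloc() stands
in for dma_alloc_coherent(), and struct demo_udev and the demo_* functions
are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cnic_uio_dev. */
struct demo_udev {
	void *ring;
	void *buf;
	size_t ring_size;
	size_t buf_size;
};

/* Free whatever is allocated and clear the pointers so a later alloc starts clean. */
static void demo_free_rings(struct demo_udev *u)
{
	assert(u != NULL);
	free(u->buf);
	u->buf = NULL;
	free(u->ring);
	u->ring = NULL;
}

/* Returns 0 on success, -1 on failure. Calling it again while allocated is a no-op. */
static int demo_alloc_rings(struct demo_udev *u, size_t ring_size, size_t buf_size)
{
	assert(u != NULL);
	if (u->ring)
		return 0;		/* already allocated: nothing to do */

	u->ring_size = ring_size;
	u->ring = malloc(ring_size);
	if (!u->ring)
		return -1;

	u->buf_size = buf_size;
	u->buf = malloc(buf_size);
	if (!u->buf) {
		demo_free_rings(u);	/* roll back the first allocation */
		return -1;
	}
	return 0;
}
```

Because the allocator returns early when the ring already exists, the same
call is safe both for a brand-new device and for one that is being reopened
after its rings were freed.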

^ permalink raw reply related

* Re: [PATCH] netlink: Rename pid to portid to avoid confusion
From: Stephen Hemminger @ 2012-09-08 16:54 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev
In-Reply-To: <87fw6tjb4p.fsf@xmission.com>

On Fri, 07 Sep 2012 23:12:54 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> It is a frequent mistake to confuse the netlink port identifier with a
> process identifier.  Try to reduce this confusion by renaming fields
> that hold port identifiers portid instead of pid.
> 
> I have carefully avoided changing the structures exported to
> userspace to avoid changing the userspace API.
> 
> I have successfully built an allyesconfig kernel with this change.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---

Ok. I validated that no header file used by iproute2 is affected.

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

^ permalink raw reply

* Re: kernel 3.5.2/amd64: iwlwifi 0000:03:00.0: failed to allocate pci memory
From: Marc MERLIN @ 2012-09-08 17:01 UTC (permalink / raw)
  To: johannes.berg-ral2JQCrhuEAvxtiuMwx3w,
	wey-yi.w.guy-ral2JQCrhuEAvxtiuMwx3w, ilw-VuQAYsv1563Yd54FQh9/CA
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-q7rQbLoQdy39qxiX1TGQuw
In-Reply-To: <20120904003014.GB6287-xnduUnryOU1AfugRpC6u6w@public.gmane.org>

Howdy,

I currently rmmod iwlwifi before putting my laptop to sleep and reload it 
when coming back. Arguably, it's maybe not needed, but from time to time
I hit this memory allocation failure below.

I realize it's likely a memory fragmentation problem, but I have 8GB and
plenty of 'free' space, so I'm hoping that somehow it can be defragmented
enough for module loading to work?

My kernel config options are here:
http://marc.merlins.org/tmp/config-3.5.2-amd64-preempt-noide-20120731

This happens on a Lenovo T530.

When it works:
[   13.494270] iwlwifi 0000:03:00.0: loaded firmware version 9.221.4.1 build 25532
[   13.494440] iwlwifi 0000:03:00.0: Detected Intel(R) Centrino(R) Ultimate-N 6300 AGN, REV=0x74

When it doesn't:
[856806.443647] cfg80211: Calling CRDA to update world regulatory domain
[856806.448428] iwlwifi: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:d
[856806.448431] iwlwifi: Copyright(c) 2003-2012 Intel Corporation
[856806.448929] cfg80211: World regulatory domain updated:
[856806.448931] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[856806.448933] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[856806.448941] cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[856806.448942] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[856806.448943] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[856806.448945] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[856806.483929] iwlwifi 0000:03:00.0: pci_resource_len = 0x00002000
[856806.483932] iwlwifi 0000:03:00.0: pci_resource_base = ffffc900057bc000
[856806.483933] iwlwifi 0000:03:00.0: HW Revision ID = 0x3E
[856806.484004] iwlwifi 0000:03:00.0: irq 46 for MSI/MSI-X
[856806.497476] iwlwifi 0000:03:00.0: loaded firmware version 9.221.4.1 build 25532
[856806.497944] kworker/3:0: page allocation failure: order:5, mode:0xd0
[856806.497948] Pid: 17936, comm: kworker/3:0 Tainted: G        W  O 3.5.2-amd64-preempt-noide-20120731 #1
[856806.497949] Call Trace:
[856806.497959]  [<ffffffff810cf54c>] warn_alloc_failed+0x117/0x12c
[856806.497963]  [<ffffffff810d23af>] __alloc_pages_nodemask+0x6e3/0x792
[856806.497969]  [<ffffffff812b7f41>] ? pfn_to_dma_pte+0x116/0x15e
[856806.497976]  [<ffffffff810ff58b>] alloc_pages_current+0xcd/0xee
[856806.497979]  [<ffffffff810cecca>] __get_free_pages+0x9/0x45
[856806.497982]  [<ffffffff812ba67d>] intel_alloc_coherent+0x84/0xe7
[856806.497986]  [<ffffffff81085cf8>] ? arch_local_irq_save+0x15/0x1b
[856806.497999]  [<ffffffffa0b84afc>] iwl_ucode_callback+0xa49/0xc0d [iwlwifi]
[856806.498006]  [<ffffffff8128f100>] ? _request_firmware_prepare.isra.5+0x1bf/0x1bf
[856806.498010]  [<ffffffff8128f181>] request_firmware_work_func+0x81/0xb1
[856806.498014]  [<ffffffff81054b13>] process_one_work+0x16f/0x28e
[856806.498018]  [<ffffffff810555d5>] worker_thread+0xce/0x152
[856806.498021]  [<ffffffff81055507>] ? manage_workers.isra.24+0x16c/0x16c
[856806.498024]  [<ffffffff81058e3c>] kthread+0x86/0x8e
[856806.498029]  [<ffffffff813a0aa4>] kernel_thread_helper+0x4/0x10
[856806.498032]  [<ffffffff81058db6>] ? kthread_freezable_should_stop+0x3e/0x3e
[856806.498034]  [<ffffffff813a0aa0>] ? gs_change+0x13/0x13
[856806.498036] Mem-Info:
[856806.498037] Node 0 DMA per-cpu:
[856806.498039] CPU    0: hi:    0, btch:   1 usd:   0
[856806.498041] CPU    1: hi:    0, btch:   1 usd:   0
[856806.498042] CPU    2: hi:    0, btch:   1 usd:   0
[856806.498044] CPU    3: hi:    0, btch:   1 usd:   0
[856806.498045] Node 0 DMA32 per-cpu:
[856806.498047] CPU    0: hi:  186, btch:  31 usd:   0
[856806.498048] CPU    1: hi:  186, btch:  31 usd:   0
[856806.498050] CPU    2: hi:  186, btch:  31 usd:   0
[856806.498051] CPU    3: hi:  186, btch:  31 usd:   0
[856806.498052] Node 0 Normal per-cpu:
[856806.498054] CPU    0: hi:  186, btch:  31 usd:   0
[856806.498055] CPU    1: hi:  186, btch:  31 usd:   0
[856806.498057] CPU    2: hi:  186, btch:  31 usd:   0
[856806.498058] CPU    3: hi:  186, btch:  31 usd:   0
[856806.498062] active_anon:880341 inactive_anon:274439 isolated_anon:0
[856806.498062]  active_file:222778 inactive_file:228271 isolated_file:0
[856806.498062]  unevictable:1436 dirty:369 writeback:0 unstable:0
[856806.498062]  free:137592 slab_reclaimable:167927 slab_unreclaimable:17618
[856806.498062]  mapped:27053 shmem:29116 pagetables:23832 bounce:0
[856806.498065] Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15676kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[856806.498071] lowmem_reserve[]: 0 3257 7777 7777
[856806.498074] Node 0 DMA32 free:454560kB min:28252kB low:35312kB high:42376kB active_anon:1412512kB inactive_anon:513084kB active_file:333072kB inactive_file:323580kB unevictable:196kB isolated(anon):0kB isolated(file):0kB present:3335900kB mlocked:196kB dirty:108kB writeback:0kB mapped:43284kB shmem:27472kB slab_reclaimable:281024kB slab_unreclaimable:20952kB kernel_stack:4200kB pagetables:27640kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[856806.498079] lowmem_reserve[]: 0 0 4519 4519
[856806.498083] Node 0 Normal free:79908kB min:39196kB low:48992kB high:58792kB active_anon:2108852kB inactive_anon:584672kB active_file:558040kB inactive_file:589504kB unevictable:5548kB isolated(anon):0kB isolated(file):0kB present:4627820kB mlocked:5548kB dirty:1368kB writeback:0kB mapped:64928kB shmem:88992kB slab_reclaimable:390684kB slab_unreclaimable:49520kB kernel_stack:3496kB pagetables:67688kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[856806.498088] lowmem_reserve[]: 0 0 0 0
[856806.498090] Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15900kB
[856806.498098] Node 0 DMA32: 60562*4kB 25568*8kB 264*16kB 4*32kB 1*64kB 1*128kB 0*256kB 2*512kB 2*1024kB 0*2048kB 0*4096kB = 454408kB
[856806.498106] Node 0 Normal: 15497*4kB 861*8kB 162*16kB 79*32kB 22*64kB 4*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 80012kB
[856806.498115] 660922 total pagecache pages
[856806.498116] 180029 pages in swap cache
[856806.498118] Swap cache stats: add 7904087, delete 7724058, find 2010810/2392461
[856806.498119] Free swap  = 5831292kB
[856806.498121] Total swap = 10485756kB
[856806.520762] 2057712 pages RAM
[856806.520764] 63301 pages reserved
[856806.520765] 543630 pages shared
[856806.520766] 1544767 pages non-shared
[856806.520771] iwlwifi 0000:03:00.0: failed to allocate pci memory

It will typically fix itself and work again later.

Any ideas?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] netfilter: x_tables: xt_init() should run earlier
From: Patrick McHardy @ 2012-09-08 17:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Pablo Neira Ayuso, netfilter-devel,
	Linux Kernel Network Developers
In-Reply-To: <1346863073.13121.155.camel@edumazet-glaptop>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 877 bytes --]

On Wed, 5 Sep 2012, Eric Dumazet wrote:

> From: Eric Dumazet <edumazet@google.com>
>
> Cong Wang reported a NULL dereference in xt_register_target()
>
> It turns out xt_nat_init() was called before xt_init(), so xt array
> was not yet setup.
>
> xt_init() should be marked core_initcall() to solve this problem.
>
> Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> net/netfilter/x_tables.c |    2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index 8d987c3..afcea11 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1390,6 +1390,6 @@ static void __exit xt_fini(void)
> 	kfree(xt);
> }
>
> -module_init(xt_init);
> +core_initcall(xt_init);
> module_exit(xt_fini);

Shouldn't we simply change the Makefile order?

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1448 bytes --]

commit ecc4508e476e4325e747dad5d86c03248ed16271
Author: Patrick McHardy <kaber@trash.net>
Date:   Sat Sep 8 19:45:12 2012 +0200

    netfilter: fix xt_nat link order
    
    Cong Wang reported a NULL dereference in xt_register_target()
    
    It turns out xt_nat_init() was called before xt_init(), so xt array
    was not yet setup.
    
    Move xt_nat down in the Makefile to avoid initialization before
    x_tables is initialized.
    
    Based on patch from Eric Dumazet.
    
    Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 98244d4..6ad6616 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -47,7 +47,6 @@ nf_nat-y	:= nf_nat_core.o nf_nat_proto_unknown.o nf_nat_proto_common.o \
 		   nf_nat_proto_udp.o nf_nat_proto_tcp.o nf_nat_helper.o
 
 obj-$(CONFIG_NF_NAT) += nf_nat.o
-obj-$(CONFIG_NF_NAT) += xt_nat.o
 
 # NAT protocols (nf_nat)
 obj-$(CONFIG_NF_NAT_PROTO_DCCP) += nf_nat_proto_dccp.o
@@ -93,6 +92,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
+obj-$(CONFIG_NF_NAT) += xt_nat.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o

^ permalink raw reply related

* Re: kernel 3.5.2/amd64: iwlwifi 0000:03:00.0: failed to allocate pci memory
From: Johannes Berg @ 2012-09-08 18:57 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: wey-yi.w.guy-ral2JQCrhuEAvxtiuMwx3w, ilw-VuQAYsv1563Yd54FQh9/CA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-q7rQbLoQdy39qxiX1TGQuw
In-Reply-To: <20120908170128.GK3347-xnduUnryOU1AfugRpC6u6w@public.gmane.org>

On Sat, 2012-09-08 at 10:01 -0700, Marc MERLIN wrote:

> I realize it's likely a memory fragmentation problem, but I have 8GB and
> plenty of 'free' space, so I'm hoping that somehow it can be defragmented
> enough for module loading to work?

> [856806.443647] cfg80211: Calling CRDA to update world regulatory domain
> [856806.448428] iwlwifi: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:d
> [856806.448431] iwlwifi: Copyright(c) 2003-2012 Intel Corporation
> [856806.448929] cfg80211: World regulatory domain updated:
> [856806.448931] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> [856806.448933] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
> [856806.448941] cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
> [856806.448942] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
> [856806.448943] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
> [856806.448945] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
> [856806.483929] iwlwifi 0000:03:00.0: pci_resource_len = 0x00002000
> [856806.483932] iwlwifi 0000:03:00.0: pci_resource_base = ffffc900057bc000
> [856806.483933] iwlwifi 0000:03:00.0: HW Revision ID = 0x3E
> [856806.484004] iwlwifi 0000:03:00.0: irq 46 for MSI/MSI-X
> [856806.497476] iwlwifi 0000:03:00.0: loaded firmware version 9.221.4.1 build 25532
> [856806.497944] kworker/3:0: page allocation failure: order:5, mode:0xd0
> [856806.497948] Pid: 17936, comm: kworker/3:0 Tainted: G        W  O 3.5.2-amd64-preempt-noide-20120731 #1
> [856806.497949] Call Trace:
> [856806.497959]  [<ffffffff810cf54c>] warn_alloc_failed+0x117/0x12c
> [856806.497963]  [<ffffffff810d23af>] __alloc_pages_nodemask+0x6e3/0x792
> [856806.497969]  [<ffffffff812b7f41>] ? pfn_to_dma_pte+0x116/0x15e
> [856806.497976]  [<ffffffff810ff58b>] alloc_pages_current+0xcd/0xee
> [856806.497979]  [<ffffffff810cecca>] __get_free_pages+0x9/0x45
> [856806.497982]  [<ffffffff812ba67d>] intel_alloc_coherent+0x84/0xe7
> [856806.497986]  [<ffffffff81085cf8>] ? arch_local_irq_save+0x15/0x1b
> [856806.497999]  [<ffffffffa0b84afc>] iwl_ucode_callback+0xa49/0xc0d [iwlwifi]

Yes, unfortunately we need a whole bunch of contiguous memory to load
the firmware.

> Any ideas?

Nothing we can do from the driver side, I'm afraid.

johannes
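
For readers decoding the log: "order:5" in the failure line means the
allocator was asked for 2^5 physically contiguous pages, which is 128kB with
4kB pages. The free lists in the report are dominated by 4kB and 8kB blocks,
which is why plenty of free memory does not help. The arithmetic can be
sketched as below; this is a userspace illustration with made-up demo_*
names, not the kernel's own get_order() helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order n such that (page_size << n) >= size.
 * Userspace sketch only; the kernel has its own helper for this.
 */
static unsigned int demo_order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(page_size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Bytes of contiguous memory covered by one block of the given order. */
static size_t demo_bytes_for_order(unsigned int order, size_t page_size)
{
	return page_size << order;
}
```

With 4096-byte pages, a 128kB request maps to order 5, and a request of
4097 bytes already needs order 1, so each extra order doubles the size of
the contiguous block that must be found.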


^ permalink raw reply

* Re: [PATCH net-next] netfilter: x_tables: xt_init() should run earlier
From: Eric Dumazet @ 2012-09-08 19:50 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Cong Wang, Pablo Neira Ayuso, netfilter-devel,
	Linux Kernel Network Developers
In-Reply-To: <Pine.GSO.4.63.1209081949500.2030@stinky-local.trash.net>

On Sat, 2012-09-08 at 19:50 +0200, Patrick McHardy wrote:

> Shouldn't we simply change the Makefile order?

Yes, this is what Pablo did.
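
The two fixes discussed in this thread address the same ordering rule from
different ends: for built-in code, initcalls run level by level, and within
one level they run in link order, which follows the Makefile. The sketch
below is a toy model of that rule, not kernel code; struct demo_initcall,
the numeric levels and the demo_* functions are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a built-in initcall. */
struct demo_initcall {
	const char *name;
	int level;	/* lower runs first; 1 ~ core_initcall, 6 ~ module_init when built in */
	int link_pos;	/* position in the final link, which follows Makefile order */
};

/* Order by level first, then by link position within the same level. */
static int demo_cmp(const void *a, const void *b)
{
	const struct demo_initcall *x = a, *y = b;

	if (x->level != y->level)
		return x->level - y->level;
	return x->link_pos - y->link_pos;
}

/* Sort the calls into execution order and return the slot in which 'name' runs. */
static int demo_run_index(struct demo_initcall *calls, size_t n, const char *name)
{
	size_t i;

	assert(calls != NULL);
	qsort(calls, n, sizeof(*calls), demo_cmp);
	for (i = 0; i < n; i++)
		if (strcmp(calls[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

In this model, raising xt_init() to an earlier level and moving xt_nat.o
later in the link order both make xt_init() run before xt_nat_init(); the
Makefile change does so without altering initcall levels.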



^ permalink raw reply

* Re: [PATCH net-next] filter: add MOD operation
From: George Bakos @ 2012-09-08 20:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jay Schulist, Andi Kleen, tcpdump-workers
In-Reply-To: <1347091415.1234.317.camel@edumazet-glaptop>

[-- Attachment #1: Type: text/plain, Size: 4516 bytes --]

Here's a patch to libpcap-1.3 to test against. I still need to
include changes to man pages.

g

On Sat, 08 Sep 2012 10:03:35 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> On Fri, 2012-09-07 at 20:03 -0700, Andi Kleen wrote:
> > On Fri, Sep 07, 2012 at 07:49:10AM +0000, George Bakos wrote:
> > > Gents,
> > > Any fundamental reason why the following (, etc.) shouldn't be
> > > included in net/core/filter.c?
> > > 
> > >                 case BPF_S_ALU_MOD_X:
> > >                         if (X == 0)
> > >                                 return 0;
> > >                         A %= X;
> > >                         continue;
> > 
> > Copying netdev.
> > 
> > In principle no reason against it, but you may need to update
> > the various BPF JITs too that Linux now has too.
> 
> Hi Andi, thanks for the forward
> 
> In recent commit ffe06c17afbb was added ALU_XOR_X,
> so we could add ALU_MOD_X as well.
> 
> ALU_MOD_K is a bit more complex as we can't use an ancillary, and must
> instead use a new BPF_OP code :
> 
> /* alu/jmp fields */
> #define BPF_OP(code)    ((code) & 0xf0)
> #define         BPF_ADD         0x00
> #define         BPF_SUB         0x10
> #define         BPF_MUL         0x20
> #define         BPF_DIV         0x30
> #define         BPF_OR          0x40
> #define         BPF_AND         0x50
> #define         BPF_LSH         0x60
> #define         BPF_RSH         0x70
> #define         BPF_NEG         0x80
> 
> So I guess we could use
> 
> #define         BPF_MOD         0x90
> 
> About the various arches JIT, there is no hurry :
> We can update them later.
> 
> At JIT 'compile' time, if we find a not yet handled instruction, we fall
> back to the net/core/filter.c interpreter.
> 
> If the following patch is accepted, I'll do the x86 part as a followup.
> 
> Thanks !
> 
> [PATCH net-next] filter: add MOD operation
> 
> Add a new ALU opcode, to compute a modulus.
> 
> Commit ffe06c17afbbb used an ancillary to implement XOR_X,
> but here we reserve one of the available ALU opcode to implement both
> MOD_X and MOD_K
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Suggested-by: George Bakos <gbakos@alpinista.org>
> Cc: Jay Schulist <jschlst@samba.org>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> ---
>  include/linux/filter.h |    4 ++++
>  net/core/filter.c      |   15 +++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 82b0135..3cf5fd5 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -74,6 +74,8 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
>  #define         BPF_LSH         0x60
>  #define         BPF_RSH         0x70
>  #define         BPF_NEG         0x80
> +#define		BPF_MOD		0x90
> +
>  #define         BPF_JA          0x00
>  #define         BPF_JEQ         0x10
>  #define         BPF_JGT         0x20
> @@ -196,6 +198,8 @@ enum {
>  	BPF_S_ALU_MUL_K,
>  	BPF_S_ALU_MUL_X,
>  	BPF_S_ALU_DIV_X,
> +	BPF_S_ALU_MOD_K,
> +	BPF_S_ALU_MOD_X,
>  	BPF_S_ALU_AND_K,
>  	BPF_S_ALU_AND_X,
>  	BPF_S_ALU_OR_K,
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 907efd2..fbe3a8d 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -167,6 +167,14 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
>  		case BPF_S_ALU_DIV_K:
>  			A = reciprocal_divide(A, K);
>  			continue;
> +		case BPF_S_ALU_MOD_X:
> +			if (X == 0)
> +				return 0;
> +			A %= X;
> +			continue;
> +		case BPF_S_ALU_MOD_K:
> +			A %= K;
> +			continue;
>  		case BPF_S_ALU_AND_X:
>  			A &= X;
>  			continue;
> @@ -469,6 +477,8 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>  		[BPF_ALU|BPF_MUL|BPF_K]  = BPF_S_ALU_MUL_K,
>  		[BPF_ALU|BPF_MUL|BPF_X]  = BPF_S_ALU_MUL_X,
>  		[BPF_ALU|BPF_DIV|BPF_X]  = BPF_S_ALU_DIV_X,
> +		[BPF_ALU|BPF_MOD|BPF_K]  = BPF_S_ALU_MOD_K,
> +		[BPF_ALU|BPF_MOD|BPF_X]  = BPF_S_ALU_MOD_X,
>  		[BPF_ALU|BPF_AND|BPF_K]  = BPF_S_ALU_AND_K,
>  		[BPF_ALU|BPF_AND|BPF_X]  = BPF_S_ALU_AND_X,
>  		[BPF_ALU|BPF_OR|BPF_K]   = BPF_S_ALU_OR_K,
> @@ -531,6 +541,11 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>  				return -EINVAL;
>  			ftest->k = reciprocal_value(ftest->k);
>  			break;
> +		case BPF_S_ALU_MOD_K:
> +			/* check for division by zero */
> +			if (ftest->k == 0)
> +				return -EINVAL;
> +			break;
>  		case BPF_S_LD_MEM:
>  		case BPF_S_LDX_MEM:
>  		case BPF_S_ST:
> 
> 


-- 

[-- Attachment #2: libpcap-1.3.0-with-modulus.patch --]
[-- Type: text/x-patch, Size: 4452 bytes --]

diff -Naur libpcap-1.3.0/bpf/net/bpf_filter.c libpcap-1.3.0-with-modulus/bpf/net/bpf_filter.c
--- libpcap-1.3.0/bpf/net/bpf_filter.c	2012-03-29 12:57:32.000000000 +0000
+++ libpcap-1.3.0-with-modulus/bpf/net/bpf_filter.c	2012-08-31 01:36:53.206825554 +0000
@@ -469,6 +469,12 @@
 			A /= X;
 			continue;
 
+		case BPF_ALU|BPF_MOD|BPF_X:
+			if (X == 0)
+				return 0;
+			A %= X;
+			continue;
+
 		case BPF_ALU|BPF_AND|BPF_X:
 			A &= X;
 			continue;
@@ -501,6 +507,10 @@
 			A /= pc->k;
 			continue;
 
+		case BPF_ALU|BPF_MOD|BPF_K:
+			A %= pc->k;
+			continue;
+
 		case BPF_ALU|BPF_AND|BPF_K:
 			A &= pc->k;
 			continue;
@@ -621,6 +631,13 @@
 				 */
 				if (BPF_SRC(p->code) == BPF_K && p->k == 0)
 					return 0;
+				break;
+			case BPF_MOD:
+				/*
+				 * Check for illegal modulus 0.
+				 */
+				if (BPF_SRC(p->code) == BPF_K && p->k == 0)
+					return 0;
 				break;
 			default:
 				return 0;
diff -Naur libpcap-1.3.0/bpf_image.c libpcap-1.3.0-with-modulus/bpf_image.c
--- libpcap-1.3.0/bpf_image.c	2012-03-29 12:57:32.000000000 +0000
+++ libpcap-1.3.0-with-modulus/bpf_image.c	2012-08-31 01:36:53.225825770 +0000
@@ -216,6 +216,11 @@
 		fmt = "x";
 		break;
 
+	case BPF_ALU|BPF_MOD|BPF_X:
+		op = "mod";
+		fmt = "x";
+		break;
+
 	case BPF_ALU|BPF_AND|BPF_X:
 		op = "and";
 		fmt = "x";
@@ -256,6 +261,11 @@
 		fmt = "#%d";
 		break;
 
+	case BPF_ALU|BPF_MOD|BPF_K:
+		op = "mod";
+		fmt = "#%d";
+		break;
+
 	case BPF_ALU|BPF_AND|BPF_K:
 		op = "and";
 		fmt = "#0x%x";
diff -Naur libpcap-1.3.0/grammar.y libpcap-1.3.0-with-modulus/grammar.y
--- libpcap-1.3.0/grammar.y	2012-03-29 12:57:32.000000000 +0000
+++ libpcap-1.3.0-with-modulus/grammar.y	2012-08-31 01:36:53.196825439 +0000
@@ -617,6 +617,7 @@
 	| arth '*' arth			{ $$ = gen_arth(BPF_MUL, $1, $3); }
 	| arth '/' arth			{ $$ = gen_arth(BPF_DIV, $1, $3); }
 	| arth '&' arth			{ $$ = gen_arth(BPF_AND, $1, $3); }
+	| arth '%' arth			{ $$ = gen_arth(BPF_MOD, $1, $3); }
 	| arth '|' arth			{ $$ = gen_arth(BPF_OR, $1, $3); }
 	| arth LSH arth			{ $$ = gen_arth(BPF_LSH, $1, $3); }
 	| arth RSH arth			{ $$ = gen_arth(BPF_RSH, $1, $3); }
diff -Naur libpcap-1.3.0/optimize.c libpcap-1.3.0-with-modulus/optimize.c
--- libpcap-1.3.0/optimize.c	2012-03-29 12:57:32.000000000 +0000
+++ libpcap-1.3.0-with-modulus/optimize.c	2012-08-31 01:36:53.188825347 +0000
@@ -666,6 +666,12 @@
 		a /= b;
 		break;
 
+	case BPF_MOD:
+		if (b == 0)
+			bpf_error("illegal modulus 0");
+		a %= b;
+		break;
+
 	case BPF_AND:
 		a &= b;
 		break;
@@ -1044,6 +1050,7 @@
 	case BPF_ALU|BPF_SUB|BPF_K:
 	case BPF_ALU|BPF_MUL|BPF_K:
 	case BPF_ALU|BPF_DIV|BPF_K:
+	case BPF_ALU|BPF_MOD|BPF_K:
 	case BPF_ALU|BPF_AND|BPF_K:
 	case BPF_ALU|BPF_OR|BPF_K:
 	case BPF_ALU|BPF_LSH|BPF_K:
@@ -1079,6 +1086,7 @@
 	case BPF_ALU|BPF_SUB|BPF_X:
 	case BPF_ALU|BPF_MUL|BPF_X:
 	case BPF_ALU|BPF_DIV|BPF_X:
+	case BPF_ALU|BPF_MOD|BPF_X:
 	case BPF_ALU|BPF_AND|BPF_X:
 	case BPF_ALU|BPF_OR|BPF_X:
 	case BPF_ALU|BPF_LSH|BPF_X:
@@ -1112,7 +1120,7 @@
 				vstore(s, &val[A_ATOM], val[X_ATOM], alter);
 				break;
 			}
-			else if (op == BPF_MUL || op == BPF_DIV ||
+			else if (op == BPF_MUL || op == BPF_DIV || op == BPF_MOD ||
 				 op == BPF_AND || op == BPF_LSH || op == BPF_RSH) {
 				s->code = BPF_LD|BPF_IMM;
 				s->k = 0;
diff -Naur libpcap-1.3.0/pcap/bpf.h libpcap-1.3.0-with-modulus/pcap/bpf.h
--- libpcap-1.3.0/pcap/bpf.h	2012-06-12 16:55:36.000000000 +0000
+++ libpcap-1.3.0-with-modulus/pcap/bpf.h	2012-08-31 01:36:53.199825471 +0000
@@ -1235,6 +1235,7 @@
 #define		BPF_LSH		0x60
 #define		BPF_RSH		0x70
 #define		BPF_NEG		0x80
+#define		BPF_MOD		0x90
 #define		BPF_JA		0x00
 #define		BPF_JEQ		0x10
 #define		BPF_JGT		0x20
diff -Naur libpcap-1.3.0/scanner.l libpcap-1.3.0-with-modulus/scanner.l
--- libpcap-1.3.0/scanner.l	2012-03-29 12:57:32.000000000 +0000
+++ libpcap-1.3.0-with-modulus/scanner.l	2012-08-31 01:36:53.225825770 +0000
@@ -329,7 +329,7 @@
 sls		return SLS;
 
 [ \r\n\t]		;
-[+\-*/:\[\]!<>()&|=]	return yytext[0];
+[+\-*/:\[\]!<>()&|=%]	return yytext[0];
 ">="			return GEQ;
 "<="			return LEQ;
 "!="			return NEQ;
@@ -387,7 +387,7 @@
 [A-Za-z0-9]([-_.A-Za-z0-9]*[.A-Za-z0-9])? {
 			 yylval.s = sdup((char *)yytext); return ID; }
 "\\"[^ !()\n\t]+	{ yylval.s = sdup((char *)yytext + 1); return ID; }
-[^ \[\]\t\n\-_.A-Za-z0-9!<>()&|=]+ {
+[^ \[\]\t\n\-_.A-Za-z0-9!<>()&|=%]+ {
 			bpf_error("illegal token: %s", yytext); }
 .			{ bpf_error("illegal char '%c'", *yytext); }
 %%

[-- Attachment #3: Type: text/plain, Size: 171 bytes --]

_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers

^ permalink raw reply

* Re: [PATCH] net: small bug on rxhash calculation
From: David Miller @ 2012-09-08 22:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: chema, edumazet, netdev, chema
In-Reply-To: <1347092320.1234.335.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 08 Sep 2012 10:18:40 +0200

> On Fri, 2012-09-07 at 16:40 -0700, Chema Gonzalez wrote:
>> In the current rxhash calculation function, while the
>> sorting of the ports/addrs is coherent (you get the
>> same rxhash for packets sharing the same 4-tuple, in
>> both directions), ports and addrs are sorted
>> independently. This implies packets from a connection
>> between the same addresses but crossed ports hash to
>> the same rxhash.
>> 
>> For example, traffic between A=S:l and B=L:s is hashed
>> (in both directions) from {L, S, {s, l}}. The same
>> rxhash is obtained for packets between C=S:s and D=L:l.
>> 
>> This patch ensures that you either swap both addrs and ports,
>> or you swap none. Traffic between A and B, and traffic
>> between C and D, get their rxhash from different sources
>> ({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)
>> 
>> The patch is co-written with Eric Dumazet <edumazet@google.com>
>> 
>> Signed-off-by: Chema Gonzalez <chema@google.com>
>> ---
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks.
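
The collision described in the commit message is easy to reproduce with a
toy flow key. The sketch below is not the kernel code: struct demo_flow and
the demo_* functions are hypothetical, and it compares the canonicalized
tuples directly instead of hashing them, since equal tuples necessarily
produce equal hashes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; the kernel uses its own flow-key structure. */
struct demo_flow {
	uint32_t src, dst;
	uint16_t sport, dport;
};

/* Old behaviour as described above: addresses and ports are sorted independently. */
static struct demo_flow demo_canon_old(struct demo_flow f)
{
	if (f.dst < f.src) {
		uint32_t t = f.src;

		f.src = f.dst;
		f.dst = t;
	}
	if (f.dport < f.sport) {
		uint16_t t = f.sport;

		f.sport = f.dport;
		f.dport = t;
	}
	return f;
}

/* New behaviour: swap addresses and ports together, or swap neither. */
static struct demo_flow demo_canon_new(struct demo_flow f)
{
	if (f.dst < f.src || (f.dst == f.src && f.dport < f.sport)) {
		uint32_t ta = f.src;
		uint16_t tp = f.sport;

		f.src = f.dst;
		f.dst = ta;
		f.sport = f.dport;
		f.dport = tp;
	}
	return f;
}

/* Returns 1 if both keys are field-for-field identical. */
static int demo_same(struct demo_flow a, struct demo_flow b)
{
	return a.src == b.src && a.dst == b.dst &&
	       a.sport == b.sport && a.dport == b.dport;
}
```

Taking S < L and s < l as in the message, the old scheme maps both S:l<->L:s
and S:s<->L:l to the same key, while the new one keeps them distinct and
still yields one key for both directions of a single connection.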

^ permalink raw reply

* Re: [PATCH] scsi_netlink: Remove dead and buggy code
From: David Miller @ 2012-09-08 22:51 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, James.Bottomley, James.Smart
In-Reply-To: <87pq5xjw4m.fsf@xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Fri, 07 Sep 2012 15:39:21 -0700

> 
> The scsi netlink code confuses the netlink port id with a process id,
> going so far as to read NETLINK_CREDS(skb)->pid instead of the correct
> NETLINK_CB(skb).pid.  Fortunately it does not matter because nothing
> registers to respond to scsi netlink requests.
> 
> The only interesting use of the scsi_netlink interface is
> fc_host_post_vendor_event which sends a netlink multicast message.
> 
> Since nothing registers to handle scsi netlink messages kill all of the
> registration logic, while retaining the same error handling behavior
> preserving the userspace visible behavior and removing all of the
> confused code that thought a netlink port id was a process id.
> 
> This was tested with a kernel allyesconfig build which had no problems.
> 
> Cc: James Bottomley <James.Bottomley@parallels.com>
> Cc: James Smart <James.Smart@Emulex.Com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Yeah I can't see anyone, anywhere, using these scsi_send_nl_*()
interfaces at all.

When I get an ACK from the scsi folks I'll add this to net-next,
thanks Eric.

^ permalink raw reply

* Re: [PATCH 0/2] [v3] netlink_kernel_create updates
From: David Miller @ 2012-09-08 23:16 UTC (permalink / raw)
  To: pablo; +Cc: netdev
In-Reply-To: <1347108834-15429-1-git-send-email-pablo@netfilter.org>

From: pablo@netfilter.org
Date: Sat,  8 Sep 2012 14:53:52 +0200

> Fixed the infiniband issue. New round of these patches.
 ...
> Pablo Neira Ayuso (2):
>   netlink: kill netlink_set_nonroot
>   netlink: hide struct module parameter in netlink_kernel_create

All applied to net-next, thanks.

^ permalink raw reply

* [PATCH net-next] etherdevice: introduce help function eth_zero_addr()
From: Duan Jiong @ 2012-09-09  2:32 UTC (permalink / raw)
  To: davem; +Cc: netdev

A lot of code uses either memset() or an inefficient copy
from a static array that contains the all-zeros Ethernet address.
Introduce the helper function eth_zero_addr() to fill an address with
all zeros, making the code clearer and allowing us to get rid of
some constant arrays.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
---
 include/linux/etherdevice.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index d426336..b006ba0 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -151,6 +151,17 @@ static inline void eth_broadcast_addr(u8 *addr)
 }
 
 /**
+ * eth_zero_addr - Assign zero address
+ * @addr: Pointer to a six-byte array containing the Ethernet address
+ *
+ * Assign the zero address to the given address array.
+ */
+static inline void eth_zero_addr(u8 *addr)
+{
+	memset(addr, 0x00, ETH_ALEN);
+}
+
+/**
  * eth_hw_addr_random - Generate software assigned random Ethernet and
  * set device flag
  * @dev: pointer to net_device structure
-- 
1.7.11.4
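
For illustration only, a minimal userspace sketch of the conversion the
commit message describes. The eth_zero_addr() body matches the patch; the
u8 typedef, the zero_mac array and the clear_bssid_*() and
addr_is_all_zero() names are hypothetical stand-ins, not taken from any
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6		/* octets in one Ethernet address */
typedef uint8_t u8;		/* userspace stand-in for the kernel type */

/* Same body as the helper added by the patch. */
static inline void eth_zero_addr(u8 *addr)
{
	memset(addr, 0x00, ETH_ALEN);
}

/* Hypothetical "before" pattern: copy from a static all-zeros array. */
static const u8 zero_mac[ETH_ALEN];

void clear_bssid_old(u8 *bssid)
{
	memcpy(bssid, zero_mac, ETH_ALEN);
}

/* "After" pattern: the constant array is no longer needed. */
void clear_bssid_new(u8 *bssid)
{
	eth_zero_addr(bssid);
}

/* Hypothetical check helper: returns 1 if all six octets are zero. */
int addr_is_all_zero(const u8 *addr)
{
	int i;

	assert(addr != NULL);
	for (i = 0; i < ETH_ALEN; i++)
		if (addr[i])
			return 0;
	return 1;
}
```

Both variants leave the buffer in the same state; the second just drops
the extra array and states the intent in the function name.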

^ permalink raw reply related

* Re: [PATCHv3] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-09-09 12:40 UTC (permalink / raw)
  To: Sasha Levin; +Cc: netdev, kvm, virtualization
In-Reply-To: <50493D04.1090408@gmail.com>

On Fri, Sep 07, 2012 at 02:17:08AM +0200, Sasha Levin wrote:
> Hi Michael,
> 
> On 09/06/2012 02:08 PM, Michael S. Tsirkin wrote:
> > Add multiqueue support to virtio network device.
> > Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new
> > configuration field max_virtqueue_pairs to detect supported number of
> > virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program
> > packet steering.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Some comments about the change:
> 
>  - "The following four read-only fields only exists if VIRTIO_NET_F_MULTIQUEUE
> is set." => Should be "exist" (I think).
> 
>  - "When rule is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are
> steered by driver to the first (param+1) multiqueue virtqueues
> transmitq1...transmitqN;" - Why param+1?  I thought we ignore the default
> transmit/receive in this case.
> 
>  - "As selecting a specific steering ais n optimization feature" - "is an".
> 
>  - It's mentioned several times that the ability to read the steering rule from
> the virtio-net config is there for debug reasons. Is it really necessary? I
> think it's the first time I see debug features go in as part of the spec.

Yes, less features -> less stuff to debug. I'll drop it.

>  - I'm slightly confused, why are there both receive and transmit steering? I
> can't find a difference in the way to configure the rule for transmit and
> receive.

This paragraph is there to address this:
	Driver selects an active steering rule using VIRTIO_NET_CTRL_STEERING
	command (this controls both which virtqueue is selected for a given
	packet for receive and notifies the device which virtqueues are about to
	be used for transmit).

How can I clarify this better?

> Is it a plan for the future to allow different rules for tx and rx? If
> so, shouldn't we use different ctrl commands (
> VIRTIO_NET_CTRL_TX_STEERING/VIRTIO_NET_CTRL_RX_STEERING)?

I don't see separate steering as very useful:
it does not work for RX follows TX or for TX follows
RX, and separate commands immediately create lots of
options with behaviour which is hard to define.
For example, if you configure SINGLE on TX but RX_FOLLOWS_TX
on RX, what does it mean?

>  - "When rule is set to VIRTIO_NET_CTRL_STEERING_SINGLE all packets are steered
> to the default virtqueue receveq (0);" - "receiveq (0)"
> 
> 
> 
> Thanks,
> Sasha

^ permalink raw reply

* [PATCHv4] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-09-09 13:03 UTC (permalink / raw)
  To: kvm, virtualization, netdev; +Cc: rick.jones2, pbonzini, levinsasha928

Add multiqueue support to virtio network device.  Add a new feature flag
VIRTIO_NET_F_MULTIQUEUE for this feature, a new configuration field
max_virtqueue_pairs to detect supported number of virtqueues as well as
a new command VIRTIO_NET_CTRL_STEERING to program packet steering.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Changes from v3:
Address Sasha's comments
- drop debug fields - less fields less to debug :)
- clarify max_virtqueue_pairs field and steering param field
- misc typos
Address Paolo's comments
- Fixed old rule name left over from v2
Address Rick's comment
- Tweaked wording

Changes from v2:
Address Jason's comments on v2:
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
  this is both clearer and easier to support.
  It does not look like we need a separate steering command
  since host can just watch tx packets as they go.
- Moved RX and TX steering sections near each other.
- Add motivation for other changes in v2

Changes from Jason's rfc:
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
  looks nicer to me.
- documented packet steering, added a generalized steering programming
  command. Current modes are single queue and host driven multiqueue,
  but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
  but makes it more efficient to switch between modes as
  we can avoid this causing packet reordering.

If this looks OK to everyone, we can proceed with finalizing the
implementation.  This patch is against
eb9fc84d0d3c46438aaab190e2401a9e5409a052 in virtio-spec git tree.
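
As a rough illustration of the virtqueue numbering in the patch below
(3 reserved, 4: receiveq1, 5: transmitq1, ..., 2N+2: receiveqN,
2N+3: transmitqN), the index arithmetic can be sketched as follows. The
enum and function names are hypothetical; indices 0..2 are assumed to be
the existing receiveq, transmitq and controlq slots.

```c
#include <assert.h>

/* Fixed slots: the existing layout plus the newly reserved index 3. */
enum {
	VQ_RECEIVEQ0  = 0,	/* default receiveq */
	VQ_TRANSMITQ0 = 1,	/* default transmitq */
	VQ_CONTROLQ   = 2,	/* only if VIRTIO_NET_F_CTRL_VQ */
	VQ_RESERVED   = 3,	/* keeps rx even and tx odd */
};

/* Index of receiveqN for a multiqueue pair N >= 1: 2N + 2. */
unsigned int mq_receiveq_index(unsigned int n)
{
	assert(n >= 1);
	return 2 * n + 2;
}

/* Index of transmitqN for a multiqueue pair N >= 1: 2N + 3. */
unsigned int mq_transmitq_index(unsigned int n)
{
	assert(n >= 1);
	return 2 * n + 3;
}
```

With vq 3 reserved, every receive queue lands on an even index and every
transmit queue on the odd index right after it.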

---
 virtio-spec.lyx | 453 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 446 insertions(+), 7 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index fb6a4e3..2c2490e 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -58,6 +58,7 @@
 \html_be_strict false
 \author -608949062 "Rusty Russell,,," 
 \author 1531152142 "Paolo Bonzini,,," 
+\author 1986246365 "Michael S. Tsirkin" 
 \end_header
 
 \begin_body
@@ -3896,6 +3897,61 @@ Only if VIRTIO_NET_F_CTRL_VQ set
 \end_inset
 
 
+\change_inserted 1986246365 1346663522
+ 3: reserved
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1346663550
+4: receiveq1.
+ 5: transmitq1.
+ 6: receiveq2.
+ 7.
+ transmitq2.
+ ...
+ 2
+\emph on
+N
+\emph default
++2:receiveq
+\emph on
+N
+\emph default
+, 2
+\emph on
+N
+\emph default
++3:transmitq
+\emph on
+N
+\emph default
+
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346663558
+Only if VIRTIO_NET_F_CTRL_VQ set.
+ 
+\emph on
+N
+\emph default
+ is indicated by 
+\emph on
+max_virtqueue_pairs
+\emph default
+ field.
+\change_unchanged
+
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
 \end_layout
 
 \begin_layout Description
@@ -4056,6 +4112,17 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1346617842
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1346618103
+VIRTIO_NET_F_MULTIQUEUE(22) Device has multiple receive and transmission
+ queues.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4068,11 +4135,45 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout 
+\change_deleted 1986246365 1346671560
+Two
+\change_inserted 1986246365 1346671647
+Six
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1347194909
+ The following read-only field, 
+\emph on
+max_virtqueue_pairs
+\emph default
+ only exists if VIRTIO_NET_F_MULTIQUEUE is set.
+ This field specifies the maximum number of each of transmit and receive
+ virtqueues (receiveq1..receiveq
+\emph on
+N
+\emph default
+ and transmitq1..transmitq
+\emph on
+N
+\emph default
+ respectively; 
+\emph on
+N
+\emph default
+=
+\emph on
+max_virtqueue_pairs
+\emph default
+) that can be used for multiqueue operation, excluding the default receiveq(0)
+ and transmitq(1) virtqueues.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4105,6 +4206,15 @@ struct virtio_net_config {
 \begin_layout Plain Layout
 
     u16 status;
+\change_inserted 1986246365 1346671221
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346671532
+
+    u16 max_virtqueue_pairs;
 \end_layout
 
 \begin_layout Plain Layout
@@ -4151,6 +4261,18 @@ physical
 \begin_layout Enumerate
 If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated, identify the control
  virtqueue.
+\change_inserted 1986246365 1346618052
+
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 1986246365 1346618175
+If VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, identify the receive
+ and transmission queues that are going to be used in multiqueue mode.
+ Only queues that are going to be used need to be initialized.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Enumerate
@@ -4168,7 +4290,11 @@ status
 \end_layout
 
 \begin_layout Enumerate
-The receive virtqueue should be filled with receive buffers.
+The receive virtqueue
+\change_inserted 1986246365 1346618180
+(s)
+\change_unchanged
+ should be filled with receive buffers.
  This is described in detail below in 
 \begin_inset Quotes eld
 \end_inset
@@ -4513,6 +4639,8 @@ Note that the header will be two bytes longer for the VIRTIO_NET_F_MRG_RXBUF
 \end_inset
 
 
+\change_deleted 1986246365 1346932640
+
 \end_layout
 
 \begin_layout Subsection*
@@ -4988,8 +5116,24 @@ status open
 The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status field when
  it notices the changes of device configuration.
  The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that driver
- has recevied the notification and device would clear the VIRTIO_NET_S_ANNOUNCE
- bit in the status filed after it received this command.
+ has rece
+\change_inserted 1986246365 1346663932
+i
+\change_unchanged
+v
+\change_deleted 1986246365 1346663934
+i
+\change_unchanged
+ed the notification and device would clear the VIRTIO_NET_S_ANNOUNCE bit
+ in the status fi
+\change_inserted 1986246365 1346663942
+e
+\change_unchanged
+l
+\change_deleted 1986246365 1346663943
+e
+\change_unchanged
+d after it received this command.
 \end_layout
 
 \begin_layout Standard
@@ -5004,10 +5148,306 @@ Sending the gratuitous packets or marking there are pending gratuitous packets
 \begin_layout Enumerate
 Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
  
+\change_deleted 1986246365 1346662247
+
 \end_layout
 
-\begin_layout Enumerate
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1346932658
+\begin_inset CommandInset label
+LatexCommand label
+name "sub:Transmit-Packet-Steering"
+
+\end_inset
+
+Transmit Packet Steering
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+When VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, guest can use any
+ of multiple configured transmit queues to transmit a given packet.
+ To avoid packet reordering by device (which generally leads to performance
+ degradation) driver should attempt to utilize the same transmit virtqueue
+ for all packets of a given transmit flow.
+ For bi-directional protocols (in practice, TCP), a given network connection
+ can utilize both transmit and receive queues.
+ For best performance, packets from a single connection should utilize the
+ paired transmit and receive queues from the same virtqueue pair; for example
+ both transmitqN and receiveqN.
+ This rule makes it possible to optimize processing on the device side,
+ but this is not a hard requirement: devices should function correctly even
+ when this rule is not followed.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+Driver selects an active steering rule using VIRTIO_NET_CTRL_STEERING command
+ (this controls both which virtqueue is selected for a given packet for
+ receive and notifies the device which virtqueues are about to be used for
+ transmit).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+This command accepts a single out argument in the following format:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1347192845
+
+#define VIRTIO_NET_CTRL_STEERING               4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+struct virtio_net_ctrl_steering {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+	u8 current_steering_rule;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+    u8 reserved;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+	u16 current_steering_param;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1346932658
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1347192841
+
+#define VIRTIO_NET_CTRL_STEERING_SINGLE        0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1347192840
+
+#define VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX 1
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1347193028
+The field 
+\emph on
+rule
+\emph default
+ specifies the function used to select transmit virtqueue for a given packet;
+ the field 
+\emph on
+param
+\emph default
+ makes it possible to pass an extra parameter if appropriate.
+ When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_SINGLE (this is the default) all packets
+ are steered to the default virtqueue transmitq (1); param is unused; this
+ is the default.
+ With any other rule, When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are steered by
+ driver to the first 
+\emph on
+N
+\emph default
+=(
+\emph on
+param
+\emph default
++1) multiqueue virtqueues transmitq1...transmitqN; the default transmitq is
+ unused.
+ Driver must have configured all these (
+\emph on
+param
+\emph default
++1) virtqueues beforehand.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1347193114
+Supported steering rules can be added and removed in the future.
+ Driver should check that the request to change the steering rule was successful
+ by checking ack values of the command.
+ As selecting a specific steering is an optimization feature, drivers should
+ avoid hard failure and fall back on using a supported steering rule if
+ this command fails.
+ The default steering rule is VIRTIO_NET_CTRL_STEERING_SINGLE.
+ It will not be removed.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+When the steering rule is modified, some packets can still be outstanding
+ in one or more of the transmit virtqueues.
+ Since drivers might choose to modify the current steering rule at a high
+ rate (e.g.
+ adaptively in response to changes in the workload) to avoid reordering
+ packets, device is recommended to complete processing of the transmit queue(s)
+ utilized by the original steering before processing any packets delivered
+ by the modified steering rule.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932658
+For debugging, the current steering rule can also be read from the configuration
+ space.
+\end_layout
+
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1346670357
+\begin_inset CommandInset label
+LatexCommand label
+name "sub:Receive-Packet-Steering"
+
+\end_inset
+
+Receive Packet Steering
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346671046
+When VIRTIO_NET_F_MULTIQUEUE feature bit is negotiated, device can use any
+ of multiple configured receive queues to pass a given packet to driver.
+ Driver controls which virtqueue is selected in practice by configuring
+ packet steering rule using VIRTIO_NET_CTRL_STEERING command, as described
+ above
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "sub:Transmit-Packet-Steering"
+
+\end_inset
+
 .
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1347193175
+The field 
+\emph on
+rule
+\emph default
+ specifies the function used to select receive virtqueue for a given packet;
+ the field 
+\emph on
+param
+\emph default
+ makes it possible to pass an extra parameter if appropriate.
+ When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_SINGLE all packets are steered to the
+ default virtqueue receiveq (0); param is unused; this is the default.
+ When 
+\emph on
+rule
+\emph default
+ is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are steered by
+ host to the first 
+\emph on
+N
+\emph default
+=(
+\emph on
+param
+\emph default
++1) multiqueue virtqueues receiveq1...receiveqN; the default receiveq is unused.
+ Driver must have configured all these (
+\emph on
+param
+\emph default
++1) virtqueues beforehand.
+ For best performance for bi-directional flows (such as TCP) device should
+ detect the flow to virtqueue pair mapping on transmit and select the receive
+ virtqueue from the same virtqueue pair.
+ For uni-directional flows, or when this mapping information is missing,
+ a device-specific steering function is used.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346669564
+Supported steering rules can be added and removed in the future.
+ Driver should probe for supported rules by checking ack values of the command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1346932135
+When the steering rule is modified, some packets can still be outstanding
+ in one or more of the virtqueues.
+ Device is not required to wait for these packets to be consumed before
+ delivering packets using the new steering rule.
+ Drivers modifying the steering rule at a high rate (e.g.
+ adaptively in response to changes in the workload) are recommended to complete
+ processing of the receive queue(s) utilized by the original steering before
+ processing any packets delivered by the modified steering rule.
+\end_layout
+
+\begin_layout Standard
+
+\change_deleted 1986246365 1346664095
+.
+
+\change_unchanged
  
 \end_layout
 
@@ -5973,8 +6413,7 @@ If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the driver can
  spawn multiple ports, not all of which may be attached to a console.
  Some could be generic ports.
  In this case, the control virtqueues are enabled and according to the max_nr_po
-rts configuration-space value, an appropriate number of virtqueues are
- created.
+rts configuration-space value, an appropriate number of virtqueues are created.
  A control message indicating the driver is ready is sent to the host.
  The host can then send control messages for adding new ports to the device.
  After creating and initializing each port, a VIRTIO_CONSOLE_PORT_READY
-- 
MST
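
A minimal sketch, not driver code, of how the argument for the
VIRTIO_NET_CTRL_STEERING command described in the patch above might be
filled in. The struct and the three constants are copied from the patch;
steering_fill(), its "pairs" argument and the convention that pairs == 0
selects the single-queue rule are assumptions made for this example.
Byte order and the actual control virtqueue submission are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_CTRL_STEERING               4
#define VIRTIO_NET_CTRL_STEERING_SINGLE        0
#define VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX 1

struct virtio_net_ctrl_steering {
	uint8_t  current_steering_rule;
	uint8_t  reserved;
	uint16_t current_steering_param;
};

/*
 * Hypothetical helper: prepare the command argument for "pairs" multiqueue
 * virtqueue pairs. pairs == 0 selects the default single-queue rule.
 * Returns 0 on success, or -1 if pairs exceeds max_virtqueue_pairs; in
 * that case the argument is left at the SINGLE rule, in line with the
 * advice to fall back rather than fail hard.
 */
int steering_fill(struct virtio_net_ctrl_steering *s, unsigned int pairs,
		  uint16_t max_virtqueue_pairs)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));	/* SINGLE is 0, param unused */

	if (pairs == 0)
		return 0;
	if (pairs > max_virtqueue_pairs)
		return -1;

	s->current_steering_rule = VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX;
	/* The spec text defines N = param + 1, so param = N - 1. */
	s->current_steering_param = (uint16_t)(pairs - 1);
	return 0;
}
```

A driver would still have to check the ack value of the command, as the
spec text recommends, before switching to the new queues.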

^ permalink raw reply related

* ndo_get_stats and rtnl_netlink
From: Shlomo Pongartz @ 2012-09-09 15:23 UTC (permalink / raw)
  To: netdev

Hi,

I just realized that dev_get_stats, which calls into a netdevice's
ndo_get_stats64/ndo_get_stats, can be called with or without RTNL lock
protection. If it is called from rtnl_fill_ifinfo, e.g. on invocation of
"ip link show <interface>", there IS locking. However, if it is called
from dev_seq_printf_stats, e.g. when reading the
/sys/class/net/<interface>/statistics/ entries, and in some other cases,
there is no locking.

This turned out to be problematic when implementing the ethtool
"set_channels" directive, which changes the number of **rings**: we hit
a bug where the rings data structure was changed by the ethtool flow at
the same time as a statistics call was made into the driver.

What would be the way to continue here? A per-driver lock sounds
non-generic...

Regards
Shlomo Pongratz

^ permalink raw reply

* Re: [PATCH v2] net-tcp: TCP/IP stack bypass for loopback connections
From: Jan Engelhardt @ 2012-09-09 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Pádraig Brady, Bruce "Brutus" Curtis,
	David S. Miller, Eric Dumazet, netdev
In-Reply-To: <1345722015.5904.675.camel@edumazet-glaptop>


On Thursday 2012-08-23 13:40, Eric Dumazet wrote:
>On Thu, 2012-08-23 at 11:57 +0100, Pádraig Brady wrote:
>
>> Just to quantify the loopback testing compat issue.
>> I often do stuff like the following to test latency.
>> Will that be impacted?
>> 
>>   tc qdisc add dev lo root handle 1:0 netem delay 20msec
>> 
>
>Yes this will. At least for tcp traffic this wont "work".
>
>TCP friends bypass layers, by directly queuing skbs to sockets.
>
>-> no iptables, 

If it amounts to that, you will have upset users rather soon.

^ permalink raw reply

