Netdev List
* [PATCH 3/8] netfilter: nfnetlink_queue: provide rcu enabled callbacks
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>

From: Eric Dumazet <eric.dumazet@gmail.com>

nfnetlink_queue operations on SMP are not efficient if several queues are
used, because of nfnl_mutex contention when applications give packet
verdicts.

Use the new call_rcu field in struct nfnl_callback to advertise a callback
that is called under rcu_read_lock instead of nfnl_mutex.

On my 2x4x2 machine, I was able to reach 2.000.000 pps going through
user land returning NF_ACCEPT verdicts without losses, instead of less
than 500.000 pps before patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nfnetlink_queue.c |   41 +++++++++++---------------------------
 1 files changed, 12 insertions(+), 29 deletions(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index b83123f..c645b87 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -619,39 +619,26 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 	struct nfqnl_instance *queue;
 	unsigned int verdict;
 	struct nf_queue_entry *entry;
-	int err;
 
-	rcu_read_lock();
 	queue = instance_lookup(queue_num);
-	if (!queue) {
-		err = -ENODEV;
-		goto err_out_unlock;
-	}
+	if (!queue)
+		return -ENODEV;
 
-	if (queue->peer_pid != NETLINK_CB(skb).pid) {
-		err = -EPERM;
-		goto err_out_unlock;
-	}
+	if (queue->peer_pid != NETLINK_CB(skb).pid)
+		return -EPERM;
 
-	if (!nfqa[NFQA_VERDICT_HDR]) {
-		err = -EINVAL;
-		goto err_out_unlock;
-	}
+	if (!nfqa[NFQA_VERDICT_HDR])
+		return -EINVAL;
 
 	vhdr = nla_data(nfqa[NFQA_VERDICT_HDR]);
 	verdict = ntohl(vhdr->verdict);
 
-	if ((verdict & NF_VERDICT_MASK) > NF_MAX_VERDICT) {
-		err = -EINVAL;
-		goto err_out_unlock;
-	}
+	if ((verdict & NF_VERDICT_MASK) > NF_MAX_VERDICT)
+		return -EINVAL;
 
 	entry = find_dequeue_entry(queue, ntohl(vhdr->id));
-	if (entry == NULL) {
-		err = -ENOENT;
-		goto err_out_unlock;
-	}
-	rcu_read_unlock();
+	if (entry == NULL)
+		return -ENOENT;
 
 	if (nfqa[NFQA_PAYLOAD]) {
 		if (nfqnl_mangle(nla_data(nfqa[NFQA_PAYLOAD]),
@@ -664,10 +651,6 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 
 	nf_reinject(entry, verdict);
 	return 0;
-
-err_out_unlock:
-	rcu_read_unlock();
-	return err;
 }
 
 static int
@@ -780,9 +763,9 @@ err_out_unlock:
 }
 
 static const struct nfnl_callback nfqnl_cb[NFQNL_MSG_MAX] = {
-	[NFQNL_MSG_PACKET]	= { .call = nfqnl_recv_unsupp,
+	[NFQNL_MSG_PACKET]	= { .call_rcu = nfqnl_recv_unsupp,
 				    .attr_count = NFQA_MAX, },
-	[NFQNL_MSG_VERDICT]	= { .call = nfqnl_recv_verdict,
+	[NFQNL_MSG_VERDICT]	= { .call_rcu = nfqnl_recv_verdict,
 				    .attr_count = NFQA_MAX,
 				    .policy = nfqa_verdict_policy },
 	[NFQNL_MSG_CONFIG]	= { .call = nfqnl_recv_config,
-- 
1.7.2.3


^ permalink raw reply related

* [PATCH 1/8] netfilter: add SELinux context support to AUDIT target
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>

From: Mr Dash Four <mr.dash.four@googlemail.com>

In this revision the conversion of secid to SELinux context and adding it
to the audit log is moved from xt_AUDIT.c to audit.c with the aid of a
separate helper function - audit_log_secctx - which does both the conversion
and logging of SELinux context, thus also preventing the internal secid number
from being leaked to userspace. If conversion is not successful an error is raised.

With the introduction of this helper function the work done in xt_AUDIT.c is
much simpler. It also opens the possibility of this helper function
being used by other modules (including auditd itself), if desired. With this
addition, typical (raw auditd) output after applying the patch would be:

type=NETFILTER_PKT msg=audit(1305852240.082:31012): action=0 hook=1 len=52 inif=? outif=eth0 saddr=10.1.1.7 daddr=10.1.2.1 ipid=16312 proto=6 sport=56150 dport=22 obj=system_u:object_r:ssh_client_packet_t:s0
type=NETFILTER_PKT msg=audit(1306772064.079:56): action=0 hook=3 len=48 inif=eth0 outif=? smac=00:05:5d:7c:27:0b dmac=00:02:b3:0a:7f:81 macproto=0x0800 saddr=10.1.2.1 daddr=10.1.1.7 ipid=462 proto=6 sport=22 dport=3561 obj=system_u:object_r:ssh_server_packet_t:s0

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Mr Dash Four <mr.dash.four@googlemail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/audit.h    |    7 +++++++
 kernel/audit.c           |   29 +++++++++++++++++++++++++++++
 net/netfilter/xt_AUDIT.c |    5 +++++
 3 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 9d339eb..0c80061 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -613,6 +613,12 @@ extern void		    audit_log_d_path(struct audit_buffer *ab,
 extern void		    audit_log_key(struct audit_buffer *ab,
 					  char *key);
 extern void		    audit_log_lost(const char *message);
+#ifdef CONFIG_SECURITY
+extern void 		    audit_log_secctx(struct audit_buffer *ab, u32 secid);
+#else
+#define audit_log_secctx(b,s) do { ; } while (0)
+#endif
+
 extern int		    audit_update_lsm_rules(void);
 
 				/* Private API (for audit.c only) */
@@ -635,6 +641,7 @@ extern int audit_enabled;
 #define audit_log_untrustedstring(a,s) do { ; } while (0)
 #define audit_log_d_path(b, p, d) do { ; } while (0)
 #define audit_log_key(b, k) do { ; } while (0)
+#define audit_log_secctx(b,s) do { ; } while (0)
 #define audit_enabled 0
 #endif
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 9395003..52501b5 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -55,6 +55,9 @@
 #include <net/sock.h>
 #include <net/netlink.h>
 #include <linux/skbuff.h>
+#ifdef CONFIG_SECURITY
+#include <linux/security.h>
+#endif
 #include <linux/netlink.h>
 #include <linux/freezer.h>
 #include <linux/tty.h>
@@ -1502,6 +1505,32 @@ void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
 	}
 }
 
+#ifdef CONFIG_SECURITY
+/**
+ * audit_log_secctx - Converts and logs SELinux context
+ * @ab: audit_buffer
+ * @secid: security number
+ *
+ * This is a helper function that calls security_secid_to_secctx to convert
+ * secid to secctx and then adds the (converted) SELinux context to the audit
+ * log by calling audit_log_format, thus also preventing leak of internal secid
+ * to userspace. If secid cannot be converted audit_panic is called.
+ */
+void audit_log_secctx(struct audit_buffer *ab, u32 secid)
+{
+	u32 len;
+	char *secctx;
+
+	if (security_secid_to_secctx(secid, &secctx, &len)) {
+		audit_panic("Cannot convert secid to context");
+	} else {
+		audit_log_format(ab, " obj=%s", secctx);
+		security_release_secctx(secctx, len);
+	}
+}
+EXPORT_SYMBOL(audit_log_secctx);
+#endif
+
 EXPORT_SYMBOL(audit_log_start);
 EXPORT_SYMBOL(audit_log_end);
 EXPORT_SYMBOL(audit_log_format);
diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
index 363a99e..4bca15a 100644
--- a/net/netfilter/xt_AUDIT.c
+++ b/net/netfilter/xt_AUDIT.c
@@ -163,6 +163,11 @@ audit_tg(struct sk_buff *skb, const struct xt_action_param *par)
 		break;
 	}
 
+#ifdef CONFIG_NETWORK_SECMARK
+	if (skb->secmark)
+		audit_log_secctx(ab, skb->secmark);
+#endif
+
 	audit_log_end(ab);
 
 errout:
-- 
1.7.2.3
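
The convert-log-release pattern of audit_log_secctx() from patch 1/8 can be
sketched in userspace roughly as follows. This is only an illustration:
secid_to_ctx_model() and log_secctx_model() are hypothetical stand-ins (not
kernel APIs), the lookup table is made up, and a string buffer plays the role
of the audit buffer. The point is that only the converted context, never the
numeric secid, reaches the log, and that a failed conversion is reported
instead of logged.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for security_secid_to_secctx(): returns 0 and a
 * heap-allocated context string on success, -1 for an unknown secid. */
static int secid_to_ctx_model(unsigned int secid, char **ctx, size_t *len)
{
	const char *label;

	if (secid == 1)
		label = "system_u:object_r:ssh_client_packet_t:s0";
	else
		return -1;

	*len = strlen(label);
	*ctx = malloc(*len + 1);
	if (!*ctx)
		return -1;
	memcpy(*ctx, label, *len + 1);
	return 0;
}

/* Appends " obj=<context>" to buf. Returns 0 on success and -1 if the
 * secid cannot be converted; the raw secid is never written to buf. */
static int log_secctx_model(char *buf, size_t bufsz, unsigned int secid)
{
	char *ctx;
	size_t len;
	size_t used = strlen(buf);

	if (secid_to_ctx_model(secid, &ctx, &len))
		return -1;

	snprintf(buf + used, bufsz - used, " obj=%s", ctx);
	free(ctx);	/* mirrors security_release_secctx() */
	return 0;
}
```

The kernel helper calls audit_panic() on failure rather than returning an
error code; the return value here just makes the failure path easy to observe.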


^ permalink raw reply related

* [PATCH 2/8] netfilter: nfnetlink: add RCU in nfnetlink_rcv_msg()
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>

From: Eric Dumazet <eric.dumazet@gmail.com>

The goal of this patch is to allow nfnetlink providers to be called from
nfnetlink_rcv_msg() without nfnl_mutex being held.

If struct nfnl_callback contains a non-NULL call_rcu(), then
nfnetlink_rcv_msg() will use it instead of the call() field, holding
rcu_read_lock instead of nfnl_mutex.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/nfnetlink.h |    3 ++
 net/netfilter/nfnetlink.c           |   40 ++++++++++++++++++++++++++--------
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index 2b11fc1..74d3386 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -60,6 +60,9 @@ struct nfnl_callback {
 	int (*call)(struct sock *nl, struct sk_buff *skb, 
 		    const struct nlmsghdr *nlh,
 		    const struct nlattr * const cda[]);
+	int (*call_rcu)(struct sock *nl, struct sk_buff *skb, 
+		    const struct nlmsghdr *nlh,
+		    const struct nlattr * const cda[]);
 	const struct nla_policy *policy;	/* netlink attribute policy */
 	const u_int16_t attr_count;		/* number of nlattr's */
 };
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index b4a4532..1905976 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -37,7 +37,7 @@ MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_NETFILTER);
 
 static char __initdata nfversion[] = "0.30";
 
-static const struct nfnetlink_subsystem *subsys_table[NFNL_SUBSYS_COUNT];
+static const struct nfnetlink_subsystem __rcu *subsys_table[NFNL_SUBSYS_COUNT];
 static DEFINE_MUTEX(nfnl_mutex);
 
 void nfnl_lock(void)
@@ -59,7 +59,7 @@ int nfnetlink_subsys_register(const struct nfnetlink_subsystem *n)
 		nfnl_unlock();
 		return -EBUSY;
 	}
-	subsys_table[n->subsys_id] = n;
+	rcu_assign_pointer(subsys_table[n->subsys_id], n);
 	nfnl_unlock();
 
 	return 0;
@@ -71,7 +71,7 @@ int nfnetlink_subsys_unregister(const struct nfnetlink_subsystem *n)
 	nfnl_lock();
 	subsys_table[n->subsys_id] = NULL;
 	nfnl_unlock();
-
+	synchronize_rcu();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nfnetlink_subsys_unregister);
@@ -83,7 +83,7 @@ static inline const struct nfnetlink_subsystem *nfnetlink_get_subsys(u_int16_t t
 	if (subsys_id >= NFNL_SUBSYS_COUNT)
 		return NULL;
 
-	return subsys_table[subsys_id];
+	return rcu_dereference(subsys_table[subsys_id]);
 }
 
 static inline const struct nfnl_callback *
@@ -139,21 +139,27 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 	type = nlh->nlmsg_type;
 replay:
+	rcu_read_lock();
 	ss = nfnetlink_get_subsys(type);
 	if (!ss) {
 #ifdef CONFIG_MODULES
-		nfnl_unlock();
+		rcu_read_unlock();
 		request_module("nfnetlink-subsys-%d", NFNL_SUBSYS_ID(type));
-		nfnl_lock();
+		rcu_read_lock();
 		ss = nfnetlink_get_subsys(type);
 		if (!ss)
 #endif
+		{
+			rcu_read_unlock();
 			return -EINVAL;
+		}
 	}
 
 	nc = nfnetlink_find_client(type, ss);
-	if (!nc)
+	if (!nc) {
+		rcu_read_unlock();
 		return -EINVAL;
+	}
 
 	{
 		int min_len = NLMSG_SPACE(sizeof(struct nfgenmsg));
@@ -167,7 +173,23 @@ replay:
 		if (err < 0)
 			return err;
 
-		err = nc->call(net->nfnl, skb, nlh, (const struct nlattr **)cda);
+		if (nc->call_rcu) {
+			err = nc->call_rcu(net->nfnl, skb, nlh,
+					   (const struct nlattr **)cda);
+			rcu_read_unlock();
+		} else {
+			rcu_read_unlock();
+			nfnl_lock();
+			if (rcu_dereference_protected(
+					subsys_table[NFNL_SUBSYS_ID(type)],
+					lockdep_is_held(&nfnl_mutex)) != ss ||
+			    nfnetlink_find_client(type, ss) != nc)
+				err = -EAGAIN;
+			else
+				err = nc->call(net->nfnl, skb, nlh,
+						   (const struct nlattr **)cda);
+			nfnl_unlock();
+		}
 		if (err == -EAGAIN)
 			goto replay;
 		return err;
@@ -176,9 +198,7 @@ replay:
 
 static void nfnetlink_rcv(struct sk_buff *skb)
 {
-	nfnl_lock();
 	netlink_rcv_skb(skb, &nfnetlink_rcv_msg);
-	nfnl_unlock();
 }
 
 static int __net_init nfnetlink_net_init(struct net *net)
-- 
1.7.2.3
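
The dispatch rule from the commit message of patch 2/8 (prefer call_rcu when
it is set, otherwise fall back to call under the mutex) can be modelled in a
few lines of userspace C. This is a single-threaded sketch under stated
assumptions: the two flags are hypothetical stand-ins for nfnl_mutex and the
RCU read-side section, cb_model is not the kernel structure, and the re-check
that makes the real code return -EAGAIN after taking the mutex is left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for nfnl_mutex and the RCU read-side section.
 * They only record which protection is active while a handler runs. */
static int mutex_held;
static int rcu_read_held;

struct cb_model {
	int (*call)(int msg);		/* expects the mutex to be held */
	int (*call_rcu)(int msg);	/* expects only the RCU read side */
};

/* Prefer call_rcu when it is non-NULL; otherwise leave the read-side
 * section, take the mutex, and use call. */
static int dispatch(const struct cb_model *cb, int msg)
{
	int err;

	rcu_read_held = 1;
	if (cb->call_rcu) {
		err = cb->call_rcu(msg);
		rcu_read_held = 0;
	} else {
		rcu_read_held = 0;
		mutex_held = 1;
		err = cb->call(msg);
		mutex_held = 0;
	}
	return err;
}

/* Example handlers: return a positive code if they ran under the
 * protection they expect, -1 otherwise. */
static int verdict_handler(int msg)
{
	(void)msg;
	return (rcu_read_held && !mutex_held) ? 1 : -1;
}

static int config_handler(int msg)
{
	(void)msg;
	return (mutex_held && !rcu_read_held) ? 2 : -1;
}
```

A callback table entry that sets only .call_rcu (as patch 3/8 does for the
verdict message) never touches the mutex in this model, which is where the
reduced contention would come from.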


^ permalink raw reply related

* [PATCH 8/8] netfilter: ipset: fix compiler warnings "'hash_ip4_data_next' declared inline after being called"
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>

From: Chris Friesen <chris.friesen@genband.com>

Some gcc versions warn about prototypes without "inline" when the later
definition includes the "inline" keyword. The fix generates a false error message
"marked inline, but without a definition" with sparse below 0.4.2.

Signed-off-by: Chris Friesen <chris.friesen@genband.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/ipset/ip_set_ahash.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set_ahash.h b/include/linux/netfilter/ipset/ip_set_ahash.h
index 1e7f759..b89fb79 100644
--- a/include/linux/netfilter/ipset/ip_set_ahash.h
+++ b/include/linux/netfilter/ipset/ip_set_ahash.h
@@ -392,7 +392,7 @@ retry:
 	return 0;
 }
 
-static void
+static inline void
 type_pf_data_next(struct ip_set_hash *h, const struct type_pf_elem *d);
 
 /* Add an element to a hash and update the internal counters when succeeded,
-- 
1.7.2.3
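
The shape of the fix in patch 8/8 is simply that the forward declaration
carries the same "static inline" as the later definition. A minimal sketch,
with hypothetical function names rather than the ipset ones:

```c
/* Forward declaration uses the same "static inline" as the definition. */
static inline int data_next(int value);

static int add_element(int value)
{
	/* Called before the definition is seen, as in the ipset header. */
	return data_next(value) + 1;
}

/* The definition matches the prototype above, so the compiler has no
 * "declared inline after being called" mismatch to warn about. */
static inline int data_next(int value)
{
	return value * 2;
}
```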


^ permalink raw reply related

* [PATCH 4/8] netfilter: nfnetlink_queue: assert monotonic packet ids
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>

From: Eric Dumazet <eric.dumazet@gmail.com>

The packet identifier is currently set up in nfqnl_build_packet_message(),
using one atomic_inc_return().

The problem is that since several cpus might concurrently call
nfqnl_enqueue_packet() for the same queue, we can deliver packets to the
consumer in a non-monotonic way (packet N+1 being delivered before packet
N).

This patch moves the packet id setup from nfqnl_build_packet_message()
to nfqnl_enqueue_packet() to guarantee correct delivery order.

This also removes one atomic operation.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nfnetlink_queue.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c645b87..3b2af8c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -58,7 +58,7 @@ struct nfqnl_instance {
  */
 	spinlock_t	lock;
 	unsigned int	queue_total;
-	atomic_t	id_sequence;		/* 'sequence' of pkt ids */
+	unsigned int	id_sequence;		/* 'sequence' of pkt ids */
 	struct list_head queue_list;		/* packets in queue */
 };
 
@@ -213,13 +213,15 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
 
 static struct sk_buff *
 nfqnl_build_packet_message(struct nfqnl_instance *queue,
-			   struct nf_queue_entry *entry)
+			   struct nf_queue_entry *entry,
+			   __be32 **packet_id_ptr)
 {
 	sk_buff_data_t old_tail;
 	size_t size;
 	size_t data_len = 0;
 	struct sk_buff *skb;
-	struct nfqnl_msg_packet_hdr pmsg;
+	struct nlattr *nla;
+	struct nfqnl_msg_packet_hdr *pmsg;
 	struct nlmsghdr *nlh;
 	struct nfgenmsg *nfmsg;
 	struct sk_buff *entskb = entry->skb;
@@ -272,12 +274,11 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 	nfmsg->version = NFNETLINK_V0;
 	nfmsg->res_id = htons(queue->queue_num);
 
-	entry->id = atomic_inc_return(&queue->id_sequence);
-	pmsg.packet_id 		= htonl(entry->id);
-	pmsg.hw_protocol	= entskb->protocol;
-	pmsg.hook		= entry->hook;
-
-	NLA_PUT(skb, NFQA_PACKET_HDR, sizeof(pmsg), &pmsg);
+	nla = __nla_reserve(skb, NFQA_PACKET_HDR, sizeof(*pmsg));
+	pmsg = nla_data(nla);
+	pmsg->hw_protocol	= entskb->protocol;
+	pmsg->hook		= entry->hook;
+	*packet_id_ptr		= &pmsg->packet_id;
 
 	indev = entry->indev;
 	if (indev) {
@@ -388,6 +389,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 	struct sk_buff *nskb;
 	struct nfqnl_instance *queue;
 	int err = -ENOBUFS;
+	__be32 *packet_id_ptr;
 
 	/* rcu_read_lock()ed by nf_hook_slow() */
 	queue = instance_lookup(queuenum);
@@ -401,7 +403,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 		goto err_out;
 	}
 
-	nskb = nfqnl_build_packet_message(queue, entry);
+	nskb = nfqnl_build_packet_message(queue, entry, &packet_id_ptr);
 	if (nskb == NULL) {
 		err = -ENOMEM;
 		goto err_out;
@@ -420,6 +422,8 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 				 queue->queue_total);
 		goto err_out_free_nskb;
 	}
+	entry->id = ++queue->id_sequence;
+	*packet_id_ptr = htonl(entry->id);
 
 	/* nfnetlink_unicast will either free the nskb or add it to a socket */
 	err = nfnetlink_unicast(nskb, &init_net, queue->peer_pid, MSG_DONTWAIT);
@@ -852,7 +856,7 @@ static int seq_show(struct seq_file *s, void *v)
 			  inst->peer_pid, inst->queue_total,
 			  inst->copy_mode, inst->copy_range,
 			  inst->queue_dropped, inst->queue_user_dropped,
-			  atomic_read(&inst->id_sequence), 1);
+			  inst->id_sequence, 1);
 }
 
 static const struct seq_operations nfqnl_seq_ops = {
-- 
1.7.2.3
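
The idea of patch 4/8 (build the message first, reserve the id field, and only
fill it in at enqueue time) can be sketched as below. This is a single-threaded
userspace model with hypothetical types; it assumes, as the switch from
atomic_t to a plain unsigned int suggests, that the id assignment in the
kernel is serialized by the queue lock.

```c
#include <arpa/inet.h>
#include <stdint.h>

struct msg_model {
	uint32_t packet_id;	/* network byte order, filled in late */
	uint32_t hook;
};

struct queue_model {
	unsigned int id_sequence;	/* only touched at enqueue time */
};

/* Build step: fill in everything except the id and hand back a pointer
 * to the reserved id field (the role packet_id_ptr plays in the patch). */
static void build_message(struct msg_model *m, uint32_t hook,
			  uint32_t **id_ptr)
{
	m->hook = hook;
	*id_ptr = &m->packet_id;
}

/* Enqueue step: the id is taken from the sequence at the point where the
 * message is queued, so ids follow the queueing order. */
static uint32_t enqueue(struct queue_model *q, struct msg_model *m,
			uint32_t hook)
{
	uint32_t *id_ptr;
	uint32_t id;

	build_message(m, hook, &id_ptr);	/* may run concurrently */
	/* the queue lock would be taken here */
	id = ++q->id_sequence;
	*id_ptr = htonl(id);
	/* the message would be sent and the lock released here */
	return id;
}
```

Because the slow build step no longer decides the id, two CPUs that finish
building in a different order than they enqueue can no longer hand out ids
that disagree with delivery order.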


^ permalink raw reply related

* [PATCH 0/8] netfilter: netfilter update for net-next
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev

Hi Dave,

following is a netfilter update for net-next, containing:

- changes to the AUDIT target to log the security context, from Mr Dash Four

- RCUification of nfnetlink and nfnetlink_queue from Eric, raising performance
  fourfold in his tests

- a fix for nfnetlink_queue to make sure packet IDs are monotonically increasing,
  from Eric

- nfnetlink_queue batch verdict support from Florian, increasing performance
  even further

- ipset updates from Jozsef

Please apply or pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git master

Thanks!

^ permalink raw reply

* Re: ipvs oops in 3.0-rc7
From: Julian Anastasov @ 2011-07-21  9:04 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: netdev, lvs-devel, Simon Horman, Wensong Zhang
In-Reply-To: <20110720205019.9dfa30c3.rdunlap@xenotime.net>


	Hello,

On Wed, 20 Jul 2011, Randy Dunlap wrote:

> I'm seeing the following Oops in 3.0-rc7 on x86_64, just loading and unloading
> modules.  Any chance this is already fixed?  I can test current git, but I
> wanted to ask first.
> 
> Looks like it is on the second module load of ip_vs (i.e.,
> modprobe ip_vs; rmmod ip_vs; modprobe ip_vs).

	I think this problem was fixed by this patch:

http://www.spinics.net/lists/lvs-devel/msg02051.html

	But it seems it was lost somewhere ...

> Jul 20 17:15:05 chimera kernel: [ 3323.505527] IPVS: ipvs unloaded.
> Jul 20 17:15:06 chimera kernel: [ 3324.554297] BUG: unable to handle kernel paging request at ffffffffa1543820
> Jul 20 17:15:06 chimera kernel: [ 3324.554382] IP: [<ffffffff810a8d4f>] raw_notifier_chain_register+0x1f/0x4a
> Jul 20 17:15:06 chimera kernel: [ 3324.554445] PGD 1872067 PUD 1876063 PMD b653f067 PTE 0
> Jul 20 17:15:06 chimera kernel: [ 3324.554505] Oops: 0000 [#1] SMP 
> Jul 20 17:15:06 chimera kernel: [ 3324.554551] CPU 1 
> Jul 20 17:15:06 chimera kernel: [ 3324.554574] Modules linked in: ip_vs(+) nf_conntrack_sip nf_tproxy_core xt_RATEEST nf_conntrack_proto_gre nfnetlink_log nfnetlink nf_conntrack_broadcast l2tp_core can rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr atm kernelcapi fcrypt pcbc af_rxrpc xp gru macvtap tun isdnhdlc mISDNipac mISDN_core chipreg map_funcs macvlan ptp pps_core mdio_bitbang hdlcdrv ax25 mdio pppox gre inet_lro cycx_drv wanrouter hdlc lapb uio ppp_generic xenbus_probe_frontend configfs ecb rtl8192c_common ath9k_common ath9k_hw ath libertas atmel rt2x00pci rt2x00usb rt2x00lib rng_core orinoco wl12xx crc7 p54common arc4 hostap rndis_host eeprom_93cx6 libipw lib80211 mac80211 cfg80211 fddi crc32c libcrc32c dca com20020 arcnet psnap cdc_ether phonet usbnet sja1000 can_dev sir_dev irda crc_ccitt mtd zlib_deflate slhc virtio_ring virtio tr i2400m wimax mii usbserial leds_net5501 fuse af_packet ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss stp llc bnep bluetooth rfkill crc16 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput mousedev sr_mod cdrom ppdev snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ide_pci_generic snd_pcm_oss usbmouse ide_core snd_mixer_oss firewire_ohci usbhid snd_pcm firewire_core usb_storage hid usblp ata_generic i2c_i801 sg pcspkr snd_timer pata_acpi usb_libusual iTCO_wdt iTCO_vendor_support uas snd crc_itu_t soundcore pata_marvell snd_page_alloc parport_pc evdev processor parport mac_hid rtc_cmos unix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore i915 drm_kms_helper intel_agp button intel_gtt video thermal_sys hwmon [last unloaded: ip_vs]
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] 
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] Pid: 20884, comm: modprobe Not tainted 3.0.0-rc7 #6 Gateway GT5636E/DG965OT
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] RIP: 0010:[<ffffffff810a8d4f>]  [<ffffffff810a8d4f>] raw_notifier_chain_register+0x1f/0x4a
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] RSP: 0018:ffff8800b5169e88  EFLAGS: 00010202
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] RAX: ffffffffa1543810 RBX: ffffffffa18f3810 RCX: 0000000000000000
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] RDX: 0000000000000000 RSI: ffffffffa18f3810 RDI: ffffffffa125f9b8
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] RBP: ffff8800b5169e88 R08: ffffffff810aa3ee R09: 0000000000000000
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] R10: 0000000000000088 R11: ffffffff81b24258 R12: ffffffffa1908155
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] R13: 0000000000000000 R14: 000003060ede68d6 R15: 0000000000000000
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] FS:  00007f6e2d1856f0(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] CR2: ffffffffa1543820 CR3: 00000000b5241000 CR4: 00000000000006e0
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] Process modprobe (pid: 20884, threadinfo ffff8800b5168000, task ffff8800b818b000)
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] Stack:
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  ffff8800b5169ed8 ffffffff814932e7 ffff8800b5169ed8 ffffffff814c5248
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  ffff8800b5169eb8 0000000000000000 ffffffffa1908155 0000000000000000
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  000003060ede68d6 0000000000000000 ffff8800b5169ef8 ffffffffa190843f
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] Call Trace:
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff814932e7>] register_netdevice_notifier+0x3b/0x24b
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff814c5248>] ? genl_register_family_with_ops+0x50/0x9e
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908155>] ? ip_vs_conn_init+0x155/0x155 [ip_vs]
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa190843f>] ip_vs_control_init+0xeb/0x132 [ip_vs]
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908155>] ? ip_vs_conn_init+0x155/0x155 [ip_vs]
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908176>] ip_vs_init+0x21/0x1ff [ip_vs]
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908155>] ? ip_vs_conn_init+0x155/0x155 [ip_vs]
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff81002094>] do_one_initcall+0x6c/0x1c5
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff810cd572>] sys_init_module+0xe1/0x2b0
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff8157da02>] system_call_fastpath+0x16/0x1b
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] Code: 89 e5 e8 85 e3 04 00 c9 c3 90 90 90 55 48 89 e5 66 66 66 66 90 48 ff 05 f8 5b 00 01 48 8b 07 eb 1e 48 ff 05 fc 5b 00 01 8b 56 10 <3b> 50 10 7f 14 48 ff 05 e5 5b 00 01 48 8d 78 08 48 8b 40 08 48 
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] RIP  [<ffffffff810a8d4f>] raw_notifier_chain_register+0x1f/0x4a
> Jul 20 17:15:06 chimera kernel: [ 3324.556037]  RSP <ffff8800b5169e88>
> Jul 20 17:15:06 chimera kernel: [ 3324.556037] CR2: ffffffffa1543820
> Jul 20 17:15:06 chimera kernel: [ 3324.583800] ---[ end trace 1df4eeece34268d5 ]---
> 
> 
> 
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: ipvs oops in 3.0-rc7
From: Huajun Li @ 2011-07-21  8:42 UTC (permalink / raw)
  To: Simon Horman
  Cc: Randy Dunlap, netdev, lvs-devel, Wensong Zhang, Julian Anastasov,
	huajun li
In-Reply-To: <20110721054033.GA8299@verge.net.au>

Hi Randy and Simon,
    I happened to hit this issue too: loading and unloading the ip_vs
module, then loading it again, causes an Oops. The root cause may be that
ip_vs_dst_notifier is not unregistered. Please try the following patch; it
works for me.


Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 699c79a..a178cb3 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3771,6 +3771,7 @@ err_sock:
 void ip_vs_control_cleanup(void)
 {
 	EnterFunction(2);
+	unregister_netdevice_notifier(&ip_vs_dst_notifier);
 	ip_vs_genl_unregister();
 	nf_unregister_sockopt(&ip_vs_sockopts);
 	LeaveFunction(2);
-- 
1.7.4.1
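
Why a missing unregister can bite on the second modprobe: the notifier chain
keeps a pointer to the notifier block, and that block lives in module memory
that goes away on rmmod. The next registration then walks the chain and
touches the stale entry. A minimal userspace model of the register/unregister
pairing is sketched below; the list code and all names are hypothetical, not
the kernel's notifier implementation.

```c
#include <stddef.h>

struct notifier_model {
	struct notifier_model *next;
};

static struct notifier_model *chain;	/* hypothetical global chain head */

static void chain_register(struct notifier_model *n)
{
	n->next = chain;
	chain = n;
}

static void chain_unregister(struct notifier_model *n)
{
	struct notifier_model **p;

	/* Unlink n if it is on the chain; do nothing otherwise. */
	for (p = &chain; *p; p = &(*p)->next) {
		if (*p == n) {
			*p = n->next;
			return;
		}
	}
}

static int chain_length(void)
{
	const struct notifier_model *n;
	int len = 0;

	for (n = chain; n; n = n->next)
		len++;
	return len;
}

/* Module-like init/cleanup pair. Cleanup must undo the registration,
 * otherwise the chain keeps pointing at memory that is freed when the
 * module is unloaded. */
static struct notifier_model module_notifier;

static void module_init_model(void)
{
	chain_register(&module_notifier);
}

static void module_cleanup_model(void)
{
	chain_unregister(&module_notifier);
}
```

With the cleanup in place, an init/cleanup/init cycle leaves exactly one
entry on the chain, which is the behaviour the one-line patch above restores
for ip_vs_dst_notifier.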


2011/7/21 Simon Horman <horms@verge.net.au>:
> On Wed, Jul 20, 2011 at 08:50:19PM -0700, Randy Dunlap wrote:
>> I'm seeing the following Oops in 3.0-rc7 on x86_64, just loading and unloading
>> modules.  Any chance this is already fixed?  I can test current git, but I
>> wanted to ask first.
>>
>> Looks like it is on the second module load of ip_vs (i.e.,
>> modprobe ip_vs; rmmod ip_vs; modprobe ip_vs).
>
> Hi Randy,
>
> I don't believe that this problem has been resolved (or observed before).
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply related

* (unknown), 
From: Victor Chernika @ 2011-07-21  8:21 UTC (permalink / raw)
  To: netdev

unsubscribe netdev

^ permalink raw reply

* Re: [PATCH V2]  vhost: fix check for # of outstanding buffers
From: Michael S. Tsirkin @ 2011-07-21  8:06 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, netdev, jasowang
In-Reply-To: <1311182592.8573.45.camel@localhost.localdomain>

On Wed, Jul 20, 2011 at 10:23:12AM -0700, Shirley Ma wrote:
> Fix the check for number of outstanding buffers returns incorrect
> results due to vq->pend_idx wrap around;
> 
> Signed-off-by: Shirley Ma <xma@us.ibm.com>

OK, the logic's right now, and it's not worse
than what we had, so I applied this after
fixing up the comment (it's upend_idx and English
sentences don't need to end with a semicolon ;)

However, I would like to see the effect of the bug
noted in the log in the future.

And the reason I mention this here, is that
I think that the whole VHOST_MAX_PEND thing
does not work as advertised: this logic only
triggers when the ring is empty, so we will happily push
more than VHOST_MAX_PEND packets if the guest manages
to give them to us.

I'm not sure why we have the limit, either: the wmem
limit in the socket still applies and seems more
effective to prevent denial of service by a malicious guest.

> ---
> 
>  drivers/vhost/net.c |   12 +++++++++---
>  1 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 70ac604..946a71e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -182,15 +182,21 @@ static void handle_tx(struct vhost_net *net)
>  			break;
>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
> +			int num_pends;
> +
>  			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
>  			if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
>  				tx_poll_start(net, sock);
>  				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
>  				break;
>  			}
> -			/* If more outstanding DMAs, queue the work */
> -			if (unlikely(vq->upend_idx - vq->done_idx >
> -				     VHOST_MAX_PEND)) {
> +			/* If more outstanding DMAs, queue the work
> +			 * handle upend_idx wrap around
> +			 */
> +			num_pends = (vq->upend_idx >= vq->done_idx) ?
> +				    (vq->upend_idx - vq->done_idx) :
> +				    (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
> +			if (unlikely(num_pends > VHOST_MAX_PEND)) {
>  				tx_poll_start(net, sock);
>  				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
>  				break;
> 
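
The wrap-around handling in the quoted patch can be tried out in isolation
with a small helper. This is a sketch: pending_count() is a hypothetical name,
and RING_SIZE_MODEL stands in for UIO_MAXIOV with an assumed value of 1024.

```c
#define RING_SIZE_MODEL 1024u	/* stand-in for UIO_MAXIOV (assumed value) */

/* Number of outstanding entries between done_idx and upend_idx when both
 * indices wrap around at RING_SIZE_MODEL. */
static unsigned int pending_count(unsigned int upend_idx,
				  unsigned int done_idx)
{
	return upend_idx >= done_idx ?
	       upend_idx - done_idx :
	       upend_idx + RING_SIZE_MODEL - done_idx;
}
```

Once upend_idx has wrapped past the end of the ring while done_idx has not, a
plain difference of the two indices no longer gives the number of outstanding
buffers; the second branch corrects for that.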

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: Fernando Gont @ 2011-07-21  1:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, security, Eugene Teo, netdev, Matt Mackall
In-Reply-To: <1311157648.2338.22.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On 07/20/2011 07:27 AM, Eric Dumazet wrote:
> Le mercredi 20 juillet 2011 à 10:25 +0200, Eric Dumazet a écrit :
> 
>> Please hold on, I'll make a different patch series to ease stable teams
>> job. It appears inetpeer & ipv6 are really not an option for old
>> kernels.
>>
>> Common patch for all kernels :
>> 1) Fix the problem without inetpeer help
>> --- 
>> Patches for next kernels
>> 2) random split as suggested by Matt Mackall
>> 3) Use inetpeer cache to scale identification generation
> 
> Here is the first patch, applicable on net-2.6 / linux-2.6 and stable
> kernels.

Does it make sense to go in this direction rather than simply randomize
the IPv6 Fragment Identification?

Keep in mind that IPv6 routers don't perform fragmentation, and that
the IPv6 Identification is 32 bits long.

Thanks,
-- 
Fernando Gont
e-mail: fernando@gont.com.ar || fgont@acm.org
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1




^ permalink raw reply

* Re: [patch net-next-2.6 37/47] igb: do vlan cleanup
From: Jiri Pirko @ 2011-07-21  6:57 UTC (permalink / raw)
  To: Jesse Gross
  Cc: netdev, davem, shemminger, eric.dumazet, greearb, mirqus,
	jeffrey.t.kirsher, jesse.brandeburg, peter.p.waskiewicz.jr,
	bruce.w.allan, carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	alexander.h.duyck, john.ronciak, e1000-devel
In-Reply-To: <CAEP_g=9j3=s74_VQ6RQxVRGOOs-mVR94s31ETXJ1S5P7aQV4iQ@mail.gmail.com>

Thu, Jul 21, 2011 at 01:58:10AM CEST, jesse@nicira.com wrote:
>On Wed, Jul 20, 2011 at 12:10 PM, Jiri Pirko <jpirko@redhat.com> wrote:
>> Wed, Jul 20, 2011 at 07:35:33PM CEST, jesse@nicira.com wrote:
>>>On Wed, Jul 20, 2011 at 7:54 AM, Jiri Pirko <jpirko@redhat.com> wrote:
>>>> @@ -2943,7 +2944,7 @@ static void igb_rlpml_set(struct igb_adapter *adapter)
>>>>        struct e1000_hw *hw = &adapter->hw;
>>>>        u16 pf_id = adapter->vfs_allocated_count;
>>>>
>>>> -       if (adapter->vlgrp)
>>>> +       if (igb_vlan_used(adapter))
>>>>                max_frame_size += VLAN_TAG_SIZE;
>>>
>>>There are similar issues here as with the VF driver.  I think you're
>>>also confusing vlan acceleration with vlan filtering.  If no vlan
>>>filters are in use but the card is in promiscuous mode, the buffer
>>>will be undersized and we lose tagged packets.
>>
>> I'm certainly not confusing vlan accel and filtering. Here the
>> intention is that the behaviour remains intact as well. I believe it's true.
>
>I believe the underlying issue for all three of these threads is the
>same, so I'll just respond to them all here.
>
>I agree that this doesn't change the behavior of the driver but I
>don't think that should be the goal.  When I originally designed this
>new vlan model my intention was to eliminate a whole class of driver
>bugs that I was repeatedly hitting in various forms.  In the example
>above, if you run tcpdump on this device without configuring a vlan
>group on it then you will see that MTU sized packets are missing
>because the receive buffer was undersized.
>
>The common theme for these problems is that they all occur in
>situations where vlans are not configured on the device and the driver
>does something different as a result of this.  The solution was to
>prevent drivers from changing their behavior in such situations by
>completely removing the concept of a vlan group from them and letting
>the networking core tell them when to make the changes instead of
>doing it implicitly.  That's why I don't see the fact that this change
>essentially emulates the knowledge of configuring a group to be a
>plus.  By the way, plenty of your other patches change the behavior of
>the drivers - on any of the NICs that always enable stripping, try
>running tcpdump on the interface without configuring a vlan group.
>Before the change you will see that tags have disappeared and
>afterwards the tags are intact.  So I think that changing the behavior
>of drivers in this regard is a positive thing.
>
>As an aside, thank you for taking the time to work on all of these
>drivers.  The only reason why I'm complaining about these few drivers
>is because I'd like to close the door on this class of problems, which
>is finally in reach thanks to your work.


Okay, now it's clear to me. I tried to keep the code as similar to the
unpatched version as possible, but I see your arguments. I will review
and repost the patches that enable/disable vlan accel in
add_vid/kill_vid and convert them to set_features.

Thanks, Jesse.

Jirka

^ permalink raw reply

* Re: [PATCH 1/2] igb: Allow extra 4 bytes on RX for vlan tags.
From: Alexander Duyck @ 2011-07-21  6:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: Ben Greear, Jesse Gross, netdev@vger.kernel.org,
	Duyck, Alexander H
In-Reply-To: <1311211304.2401.9.camel@jtkirshe-mobl>

On Wed, Jul 20, 2011 at 6:21 PM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> On Wed, 2011-07-20 at 17:27 -0700, Ben Greear wrote:
>> On 07/20/2011 05:18 PM, Jesse Gross wrote:
>> > On Thu, Feb 17, 2011 at 9:28 AM, Ben Greear<greearb@candelatech.com>  wrote:
>> >> On 02/17/2011 03:04 AM, Jeff Kirsher wrote:
>> >>>
>> >>> On Thu, Feb 10, 2011 at 13:59,<greearb@candelatech.com>    wrote:
>> >>>>
>> >>>> From: Ben Greear<greearb@candelatech.com>
>> >>>>
>> >>>> This allows the NIC to receive 1518 byte (not counting
>> >>>> FCS) packets when MTU is 1500, thus allowing 1500 MTU
>> >>>> VLAN frames to be received.  Please note that no VLANs
>> >>>> were actually configured on the NIC...it was just acting
>> >>>> as pass-through device.
>> >>>>
>> >>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>> >>>> ---
>> >>>> :100644 100644 58c665b... 30c9cc6... M  drivers/net/igb/igb_main.c
>> >>>>   drivers/net/igb/igb_main.c |    5 +++--
>> >>>>   1 files changed, 3 insertions(+), 2 deletions(-)
>> >>>>
>> >>>> diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
>> >>>> index 58c665b..30c9cc6 100644
>> >>>> --- a/drivers/net/igb/igb_main.c
>> >>>> +++ b/drivers/net/igb/igb_main.c
>> >>>> @@ -2281,7 +2281,8 @@ static int __devinit igb_sw_init(struct igb_adapter
>> >>>> *adapter)
>> >>>>         adapter->rx_itr_setting = IGB_DEFAULT_ITR;
>> >>>>         adapter->tx_itr_setting = IGB_DEFAULT_ITR;
>> >>>>
>> >>>> -       adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
>> >>>> +       adapter->max_frame_size = (netdev->mtu + ETH_HLEN + ETH_FCS_LEN
>> >>>> +                                  + VLAN_HLEN);
>> >>>>         adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
>> >>>>
>> >>>>         spin_lock_init(&adapter->stats64_lock);
>> >>>> @@ -4303,7 +4304,7 @@ static int igb_change_mtu(struct net_device
>> >>>> *netdev, int new_mtu)
>> >>>>   {
>> >>>>         struct igb_adapter *adapter = netdev_priv(netdev);
>> >>>>         struct pci_dev *pdev = adapter->pdev;
>> >>>> -       int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
>> >>>> +       int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>> >>>>         u32 rx_buffer_len, i;
>> >>>>
>> >>>>         if ((new_mtu<    68) || (max_frame>    MAX_JUMBO_FRAME_SIZE)) {
>> >>>
>> >>> While testing this patch, validation found that the patch reduces the
>> >>> maximum mtu size
>> >>> by 4 bytes (reduces it from 9216 to 9212).  This is not a desired side
>> >>> effect of this patch.
>> >>
>> >> You could add handling for that case and have it act as it used to when
>> >> new_mtu is greater than 9212?
>> >>
>> >> I tested e1000e and it worked w/out hacking at 1500 MTU, so maybe
>> >> check how it does it?
>> >
>> > I just wanted to bring this up again to see if any progress had been
>> > made.  We were looking at this driver and trying to figure out the
>> > best way to convert it to use the new vlan model but I'm not familiar
>>
>> I've been watching :)
>>
>> > enough with the hardware to know.  It seems that all of the other
>> > Intel drivers unconditionally add space for the vlan tag to the
>> > receive buffer (and would therefore have similar effects as this
>> > patch), is there something different about this card?
>> >
>> > I believe that Alex was working on something in this area (in the
>> > context of one of my patches from a long time ago) but I'm not sure
>> > what came of that.
>>
>> Truth is, I don't really see why it's a problem to decrease the
>> maximum MTU slightly in order to make it work with VLANs.
>>
>> I'm not sure if there is some way to make it work with VLANs
>> and not decrease the maximum MTU.
>
> This was the reason this did not get accepted.  I was looking into what
> could be done so that we did not decrease the maximum MTU, but I got
> side-tracked and have not done anything on it in several months.
>

I can take a look at fixing this most likely tomorrow.  I have some
work planned for igb anyway over the next few days.

Odds are it is just a matter of where the VLAN_HLEN is added.  As I
recall for our drivers the correct spot is in the setting of
rx_buffer_len since that is the area more concerned with maximum
receive frame size versus the mtu section which is more concerned with
the transmit side of things.
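
A minimal userspace sketch of that idea, not the eventual igb patch: the
MTU check uses the frame size without the VLAN tag, while the receive
buffer length always adds VLAN_HLEN. The 9234-byte limit is an assumption
derived from the 9216-byte maximum MTU quoted above, and the sketch_*
names are hypothetical.

```c
#include <assert.h>

#define ETH_HLEN     14   /* Ethernet header */
#define ETH_FCS_LEN   4   /* frame check sequence */
#define VLAN_HLEN     4   /* one 802.1Q tag */

/* Assumed limit: the 9216-byte maximum MTU from this thread plus header and FCS. */
#define SKETCH_MAX_FRAME (9216 + ETH_HLEN + ETH_FCS_LEN)

/* MTU validation: the VLAN tag is deliberately left out of the check. */
static int sketch_mtu_ok(int new_mtu)
{
	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;

	return new_mtu >= 68 && max_frame <= SKETCH_MAX_FRAME;
}

/* Receive sizing: room for one VLAN tag is always added. */
static int sketch_rx_buffer_len(int mtu)
{
	assert(sketch_mtu_ok(mtu));
	return mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
}
```

With this split, an MTU of 9216 still passes the check, and a 1500-byte
MTU gets a 1522-byte receive size, enough for a tagged full-size frame.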

Thanks,

Alex

^ permalink raw reply

* [PATCH] net: Kobj and queues_kset should be used when CONFIG_XPS is enabled
From: jhbird.choi @ 2011-07-21  6:33 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: David S. Miller, Choi, Jong-Hwan

From: Choi, Jong-Hwan <jhbird.choi@samsung.com>

Kobj and queues_kset are used with CONFIG_XPS=y.

Signed-off-by: Choi, Jong-Hwan <jhbird.choi@samsung.com>
---
 include/linux/netdevice.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9e19477..8eb2f11 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -556,7 +556,7 @@ struct netdev_queue {
 	struct Qdisc		*qdisc;
 	unsigned long		state;
 	struct Qdisc		*qdisc_sleeping;
-#ifdef CONFIG_RPS
+#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
 	struct kobject		kobj;
 #endif
 #if defined(CONFIG_XPS) && defined(CONFIG_NUMA)
@@ -1214,7 +1214,7 @@ struct net_device {
 
 	unsigned char		broadcast[MAX_ADDR_LEN];	/* hw bcast add	*/
 
-#ifdef CONFIG_RPS
+#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
 	struct kset		*queues_kset;
 
 	struct netdev_rx_queue	*_rx;
-- 
1.7.1

^ permalink raw reply related

* Re: ipvs oops in 3.0-rc7
From: Simon Horman @ 2011-07-21  5:40 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: netdev, lvs-devel, Wensong Zhang, Julian Anastasov
In-Reply-To: <20110720205019.9dfa30c3.rdunlap@xenotime.net>

On Wed, Jul 20, 2011 at 08:50:19PM -0700, Randy Dunlap wrote:
> I'm seeing the following Oops in 3.0-rc7 on x86_64, just loading and unloading
> modules.  Any chance this is already fixed?  I can test current git, but I
> wanted to ask first.
> 
> Looks like it is on the second module load of ip_vs (i.e.,
> modprobe ip_vs; rmmod ip_vs; modprobe ip_vs).

Hi Randy,

I don't believe that this problem has been resolved (or observed before).

^ permalink raw reply

* Re: IPv6: autoconfiguration and suspend/resume or link down/up
From: Dan Williams @ 2011-07-21  5:30 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: netdev, Herbert Xu, David S. Miller, stephen hemminger
In-Reply-To: <20110720163656.GD11692@midget.suse.cz>

On Wed, 2011-07-20 at 18:36 +0200, Jiri Bohac wrote:
> On Wed, Jul 20, 2011 at 11:21:43AM -0500, Dan Williams wrote:
> > ... and in the resume handler use that value to age anything
> > that needs to know about time spent in suspend, and then do what needs
> > to be done with that.  So something like that may work for IPv6
> > addrconf; on suspend save current time, and on resume check the current
> > time, subtract the time you saved on suspend, and magically add that to
> > the lifetime counts and then run any expiry stuff.
> 
> IPv6 (by specification) does not send any RS when an IP address
> or route expires. So only subtracting the supend time from the
> lifetimes and possibly expiring the routes/IP addresses won't fix
> the problem.

Well, the prefix option of the RA includes the Valid Lifetime (in
seconds, no less) so I'd assume the kernel starts a timer when it
receives the RA and updates any addresses configured as a result of
receiving that RA+prefix, such that when the timer expires, the
autoconfigured address is deleted.  That timer can be used as a base for
the expiry mechanism that I've noted above, no?  This fixes problem #1
from your first mail.
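
A rough userspace sketch of the aging arithmetic I have in mind, under
the assumption that the clock used for 'now' does not advance during
suspend. The struct and function names are hypothetical and are not the
kernel's addrconf code; an infinite lifetime (all ones) is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-address state; not the kernel's struct inet6_ifaddr. */
struct sketch_addr {
	uint32_t valid_lft;   /* Valid Lifetime from the RA prefix option, seconds */
	uint64_t tstamp;      /* time the RA was processed, seconds */
};

/*
 * Seconds of validity left at time 'now'. 'suspended' is the time spent
 * in suspend that the clock behind 'now' did not count.
 */
static uint32_t sketch_remaining_lft(const struct sketch_addr *a,
				     uint64_t now, uint64_t suspended)
{
	uint64_t age;

	assert(now >= a->tstamp);
	age = (now - a->tstamp) + suspended;
	return age >= a->valid_lft ? 0 : (uint32_t)(a->valid_lft - age);
}
```

A result of 0 on resume would mean the address has expired and the
normal expiry handling should run.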

For problem #2, shouldn't a new RS be sent whenever the interface
changes its IFF_LOWER_UP bit?  IFF_LOWER_UP indicates a carrier on/off
event and thus indicates possible disconnect/reconnect to a new network.
I don't specifically know how it works now, but if RS isn't triggered
from IFF_LOWER_UP, I'd imagine that either (a) something didn't get
updated when IFF_LOWER_UP became how carrier was indicated in 2.6.17
(commit b00055aacdb172c05067612278ba27265fcd05ce) or (b) there's a
reason IFF_LOWER_UP isn't used as the trigger for sending an RS and I'm
not qualified to say why.

Dan

> When I move to a new network, I need to restart the
> autoconfiguration. This does not currently happen - neither for
> an alive system where the ethernet link goes down/up, nor for a
> system that gets suspended, moved and then resumed.
> 



^ permalink raw reply

* Re: [PATCH v3] net: filter: BPF 'JIT' compiler for PPC64
From: Eric Dumazet @ 2011-07-21  5:00 UTC (permalink / raw)
  To: Matt Evans; +Cc: netdev, linuxppc-dev
In-Reply-To: <4E278604.5080605@ozlabs.org>

On Thursday 21 July 2011 at 11:51 +1000, Matt Evans wrote:
> An implementation of a code generator for BPF programs to speed up packet
> filtering on PPC64, inspired by Eric Dumazet's x86-64 version.
> 
> Filter code is generated as an ABI-compliant function in module_alloc()'d mem
> with stackframe & prologue/epilogue generated if required (simple filters don't
> need anything more than an li/blr).  The filter's local variables, M[], live in
> registers.  Supports all BPF opcodes, although "complicated" loads from negative
> packet offsets (e.g. SKF_LL_OFF) are not yet supported.
> 
> There are a couple of further optimisations left for future work; many-pass
> assembly with branch-reach reduction and a register allocator to push M[]
> variables into volatile registers would improve the code quality further.
> 
> This currently supports big-endian 64-bit PowerPC only (but is fairly simple
> to port to PPC32 or LE!).
> 
> Enabled in the same way as x86-64:
> 
> 	echo 1 > /proc/sys/net/core/bpf_jit_enable
> 
> Or, enabled with extra debug output:
> 
> 	echo 2 > /proc/sys/net/core/bpf_jit_enable
> 
> Signed-off-by: Matt Evans <matt@ozlabs.org>
> ---
> 
> V3: Added BUILD_BUG_ON to assert PACA CPU ID is 16bits, made a comment (in
>     LD_MSH) a bit clearer, ratelimited "Unknown opcode" error and moved
>     bpf_jit.S to bpf_jit_64.S (it doesn't make sense to rename bpf_jit_comp.c as
>     small portions will eventually get split out into _32/_64.c files when we do
>     32bit support).
> 
>  arch/powerpc/Kconfig                  |    1 +
>  arch/powerpc/Makefile                 |    3 +-
>  arch/powerpc/include/asm/ppc-opcode.h |   40 ++
>  arch/powerpc/net/Makefile             |    4 +
>  arch/powerpc/net/bpf_jit.h            |  227 +++++++++++
>  arch/powerpc/net/bpf_jit_64.S         |  138 +++++++
>  arch/powerpc/net/bpf_jit_comp.c       |  694 +++++++++++++++++++++++++++++++++
>  7 files changed, 1106 insertions(+), 1 deletions(-)

Nice work Matt ;)

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks



^ permalink raw reply

* RE: Bridging behavior apparently changed around the Fedora 14 time
From: Greg Scott @ 2011-07-21  4:40 UTC (permalink / raw)
  To: Greg Scott, David Lamparter
  Cc: netdev, Lynn Hanson, Joe Whalen, Graham Parenteau
In-Reply-To: <925A849792280C4E80C5461017A4B8A2A040FB@mail733.InfraSupportEtc.com>

Aw nuts, nothing is ever straightforward.  

When I do:

ip link set br0 promisc on

My internal users can see the internally hosted websites using the
public IP Addresses.  The router on a stick rules I put in work just
fine.  (In on br0/eth1, DNATed in PREROUTING, MASQUERADEd in
POSTROUTING, back out br0/eth1 to the correct internal host.)

However, I just learned tonight, this breaks both inbound and outbound
PPTP VPNs.  And when I do:

ip link set br0 promisc off

now my PPTP VPNs work, but this breaks my above router on a stick rules.


My PPTP VPN stuff uses the GRE iptables conntrack modules,
ip_conntrack_pptp and ip_nat_pptp, and some PREROUTING and POSTROUTING
rules to DNAT TCP 1723 and all GRE packets to an internal Windows RRAS
server.  But when I turn promisc on for br0, I see a storm of packets
looping over and over again, until the remote client finally times out
after what seems like an eternity.  

I'll bet that bridge forwards my packets out the wrong physical ethnn
interface when it's in promisc mode and that's why my NATed PPTP VPN
breaks.  Which makes me wonder if putting br0 in promisc mode breaks any
of my other NATed services.

- Greg Scott



-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
On Behalf Of Greg Scott
Sent: Tuesday, July 12, 2011 11:29 AM
To: David Lamparter
Cc: netdev@vger.kernel.org; Lynn Hanson; Joe Whalen
Subject: RE: Bridging behavior apparently changed around the Fedora 14
time


> P.S.: you blissfully ignored my "ip neigh add proxy 1.2.3.4" note :)

Sorry - didn't ignore it, just didn't reply back to it.  I'll look into
it. What I've read about this before has all been kind of vague.  Does
this mean I proxy ARP only for IP Address 1.2.3.4?  So somebody sends an
ARP whois 1.2.3.4, I'll answer with 1.2.3.4 is at {My MAC Address}?  If
so, then I agree, not nearly as evil as just setting proxy_arp.  

> Whoa. And here I was almost ashamed of running 2.6.38. I'm sorry, but I
> think you need to go bug RedHat.

Yeah, maybe.  OK, probably.  This was such a bizarre problem - I started
with Netfilter and those guys suggested I try here.  At least now I
understand the problem lots better than before. And it's not like I can
just go and update dozens of kernels at dozens of sites all the time
when a new kernel comes out.  

> You totally misunderstood me. I'm suggesting the separate VLAN for your
> servers which have private IPs but which have services exposed to the
> internet (and your clients) on public IPs through NAT.

Ahh - OK.  The challenge with many small sites is, economic reality.
That same server that hosts the public ftp and websites also hosts all
the internal Windows file/print services.  It's the only server at this
site, so it has several roles.  I would love to build a real DMZ network
and put all the public facing stuff in there, but I don't have money for
multiple servers.  This will become even more difficult to separate when
we go to virtual servers and clustered hosts.  

> Your H323 stuff is totally unrelated.

Agreed.  Wholeheartedly.  

> Yes. Your problem seems to be between the private-IP clients in your
> network and your private-IP servers if I understand correctly.

Yes.  Dead-bang, right on target.  

> Yes. And because it is a router, it has an IP from the private subnet
> your clients are in. My question was: what device is that IP on?

Ahh - eth1 is the private LAN side, 192.168.10.1.  All the NATed LAN
stuff and all the workstations are in the 192.168.10.0/24 subnet and
connected to eth1.  Eth0 is the Internet side.  The Internet side has
the firewall NIC, a cable, and the Internet router.  That's it.
Everything is connected to the LAN side.  

> No. You're jumping to conclusions. You're affecting the "top" bridge
> device's promiscuity. I would say that the effect you're seeing is in
> the IP stack above it, caused by it now promiscuously handling packets
> that are dropped otherwise.

Well they were sure dropped before I set it to PROMISC mode, that's for
sure. And it all worked with the earlier version.  That's why this feels
like a layer 2 issue.  If it was an IP issue, why didn't it break
several years ago when I first set it up?

Does bridging make everything a little more complex and delicate to set
up?  Well, yeah.  And some of the netfilter stuff has been a moving
target over the years.  

I don't see how ICMP redirects matter.  Comparing
/proc/sys/net/ipv4/conf/*/accept_redirects with this version and an
older one at another site - all identical.  ../all/accept_redirects is
0, the rest are all 1.  Shared media and ARP settings -
/proc/sys/net/ipv4/conf/*/shared_media - all 1 for all interfaces.
There are a zillion arp settings.  Looking at
/proc/sys/net/ipv4/conf/*/*arp* - all are 0 in both the other older site
and this newer site.  

Curiously - at one of my other older sites, apparently br0 is not in
promisc mode.  But I don't think these guys do any of the stick routing
stuff.  I wonder if these guys have the problem but we don't see it
because they never try it?

[root@NSSSS-fw1 ~]# more /sys/class/net/br0/flags
0x1003
[root@NSSSS-fw1 ~]#
[root@NSSSS-fw1 ~]# more /proc/version
Linux version 2.6.32.11-99.fc12.i686.PAE
(mockbuild@x86-05.phx2.fedoraproject.org) (gcc version 4.4.3 20100127
(Red Hat 4.4.3-4) (GCC) )
#1 SMP Mon Apr 5 16:15:03 EDT 2010
[root@NSSSS-fw1 ~]#
[root@NSSSS-fw1 ~]# uname -a
Linux NSSSS-fw1 2.6.32.11-99.fc12.i686.PAE #1 SMP Mon Apr 5 16:15:03 EDT
2010 i686 i686 i386 GNU/Linux
[root@NSSSS-fw1 ~]#


Here is a much older bridged site based on Fedora 9 and I'm sure these
guys use my stick routing stuff.  Look at the difference in ..br0/flags.


[root@lme-fw2 ~]#  more /sys/class/net/br0/flags
0x1103
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# more /proc/version
Linux version 2.6.25-14.fc9.i686 (mockbuild@) (gcc version 4.3.0
20080428 (Red Hat 4.3.0-8) (GCC) ) #1 SMP Thu May 1 06:28:41 EDT 2008
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# uname -a
Linux lme-fw2 2.6.25-14.fc9.i686 #1 SMP Thu May 1 06:28:41 EDT 2008 i686
i686 i386 GNU/Linux

I can still get my hands on the old box at the site in question.  I
guess it couldn't hurt to fire it up and look at its br0 flags.  
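
For reference, here is a small sketch that decodes the two flags values
above. The bit values are the standard Linux interface flags; the SK_
prefixed names and the helper functions are my own, just for
illustration.

```c
#include <stdio.h>

/* Linux interface flag bits (same values as IFF_UP etc. in <linux/if.h>). */
#define SK_IFF_UP        0x1
#define SK_IFF_BROADCAST 0x2
#define SK_IFF_PROMISC   0x100
#define SK_IFF_MULTICAST 0x1000

/* Nonzero when the promiscuous bit is set in a /sys/class/net/<if>/flags value. */
static int sketch_is_promisc(unsigned int flags)
{
	return (flags & SK_IFF_PROMISC) != 0;
}

/* Print the subset of flags that matters for this comparison. */
static void sketch_print_flags(const char *ifname, unsigned int flags)
{
	printf("%s 0x%x:%s%s%s%s\n", ifname, flags,
	       (flags & SK_IFF_UP) ? " UP" : "",
	       (flags & SK_IFF_BROADCAST) ? " BROADCAST" : "",
	       (flags & SK_IFF_PROMISC) ? " PROMISC" : "",
	       (flags & SK_IFF_MULTICAST) ? " MULTICAST" : "");
}
```

So 0x1003 on the Fedora 12 box is UP, BROADCAST and MULTICAST, and the
only difference in 0x1103 on the Fedora 9 box is the PROMISC bit (0x100).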

- Greg


^ permalink raw reply

* RE: [Pv-drivers] [PATCH net-next] vmxnet3: set netdev parant device before calling netdev_info
From: Shreyas Bhatewara @ 2011-07-21  4:05 UTC (permalink / raw)
  To: Joe Perches, Scott Goldman; +Cc: netdev@vger.kernel.org, pv-drivers@vmware.com
In-Reply-To: <1311220141.1663.24.camel@Joe-Laptop>

> -----Original Message-----
> From: Joe Perches [mailto:joe@perches.com]
> Sent: Wednesday, July 20, 2011 8:49 PM
> To: Scott Goldman
> Cc: Shreyas Bhatewara; netdev@vger.kernel.org; pv-drivers@vmware.com
> Subject: RE: [Pv-drivers] [PATCH net-next] vmxnet3: set netdev parant
> device before calling netdev_info
> 
> On Wed, 2011-07-20 at 20:06 -0700, Scott Goldman wrote:
> > > Parent device for netdev should be set before netdev_info() can be
> called
> > > otherwise there is a NULL pointer dereference and probe() fails.
> 
> I believe this is not true.
> I don't see any NULL pointer dereference here.
> functions and macros reordered top to bottom.
> 


Thanks for looking Joe.
This happened in 2.6.36. I saw the panic in 2.6.34 and assumed that it would be the same
in the latest kernel.
It would not panic in 3.0, but it is good to have the parent device set early.

^ permalink raw reply

* ipvs oops in 3.0-rc7
From: Randy Dunlap @ 2011-07-21  3:50 UTC (permalink / raw)
  To: netdev, lvs-devel; +Cc: Simon Horman, Wensong Zhang, Julian Anastasov

I'm seeing the following Oops in 3.0-rc7 on x86_64, just loading and unloading
modules.  Any chance this is already fixed?  I can test current git, but I
wanted to ask first.

Looks like it is on the second module load of ip_vs (i.e.,
modprobe ip_vs; rmmod ip_vs; modprobe ip_vs).


Jul 20 17:15:05 chimera kernel: [ 3323.505527] IPVS: ipvs unloaded.
Jul 20 17:15:06 chimera kernel: [ 3324.554297] BUG: unable to handle kernel paging request at ffffffffa1543820
Jul 20 17:15:06 chimera kernel: [ 3324.554382] IP: [<ffffffff810a8d4f>] raw_notifier_chain_register+0x1f/0x4a
Jul 20 17:15:06 chimera kernel: [ 3324.554445] PGD 1872067 PUD 1876063 PMD b653f067 PTE 0
Jul 20 17:15:06 chimera kernel: [ 3324.554505] Oops: 0000 [#1] SMP 
Jul 20 17:15:06 chimera kernel: [ 3324.554551] CPU 1 
Jul 20 17:15:06 chimera kernel: [ 3324.554574] Modules linked in: ip_vs(+) nf_conntrack_sip nf_tproxy_core xt_RATEEST nf_conntrack_proto_gre nfnetlink_log nfnetlink nf_conntrack_broadcast l2tp_core can rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr atm kernelcapi fcrypt pcbc af_rxrpc xp gru macvtap tun isdnhdlc mISDNipac mISDN_core chipreg map_funcs macvlan ptp pps_core mdio_bitbang hdlcdrv ax25 mdio pppox gre inet_lro cycx_drv wanrouter hdlc lapb uio ppp_generic xenbus_probe_frontend configfs ecb rtl8192c_common ath9k_common ath9k_hw ath libertas atmel rt2x00pci rt2x00usb rt2x00lib rng_core orinoco wl12xx crc7 p54common arc4 hostap rndis_host eeprom_93cx6 libipw lib80211 mac80211 cfg80211 fddi crc32c libcrc32c dca com20020 arcnet psnap cdc_ether phonet usbnet sja1000 can_dev sir_dev ird
 a crc_ccitt mtd zlib_deflate slhc virtio_ring virtio tr i2400m wimax mii usbserial leds_net5501 fuse af_packet ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss stp llc bnep bluetooth rfkill crc16 sunrpc ipt_REJEC
Jul 20 17:15:06 chimera kernel: T nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput mousedev sr_mod cdrom ppdev snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ide_pci_generic snd_pcm_oss usbmouse ide_core snd_mixer_oss firewire_ohci usbhid snd_pcm firewire_core usb_storage hid usblp ata_generic i2c_i801 sg pcspkr snd_timer pata_acpi usb_libusual iTCO_wdt iTCO_vendor_support uas snd crc_itu_t soundcore pata_marvell snd_page_alloc parport_pc evdev process
 or parport mac_hid rtc_cmos unix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore i915 drm_kms_helper intel_agp button intel_gtt video thermal_sys hwmon [last unlo
Jul 20 17:15:06 chimera kernel: aded: ip_vs]
Jul 20 17:15:06 chimera kernel: [ 3324.556037] 
Jul 20 17:15:06 chimera kernel: [ 3324.556037] Pid: 20884, comm: modprobe Not tainted 3.0.0-rc7 #6 Gateway GT5636E/DG965OT
Jul 20 17:15:06 chimera kernel: [ 3324.556037] RIP: 0010:[<ffffffff810a8d4f>]  [<ffffffff810a8d4f>] raw_notifier_chain_register+0x1f/0x4a
Jul 20 17:15:06 chimera kernel: [ 3324.556037] RSP: 0018:ffff8800b5169e88  EFLAGS: 00010202
Jul 20 17:15:06 chimera kernel: [ 3324.556037] RAX: ffffffffa1543810 RBX: ffffffffa18f3810 RCX: 0000000000000000
Jul 20 17:15:06 chimera kernel: [ 3324.556037] RDX: 0000000000000000 RSI: ffffffffa18f3810 RDI: ffffffffa125f9b8
Jul 20 17:15:06 chimera kernel: [ 3324.556037] RBP: ffff8800b5169e88 R08: ffffffff810aa3ee R09: 0000000000000000
Jul 20 17:15:06 chimera kernel: [ 3324.556037] R10: 0000000000000088 R11: ffffffff81b24258 R12: ffffffffa1908155
Jul 20 17:15:06 chimera kernel: [ 3324.556037] R13: 0000000000000000 R14: 000003060ede68d6 R15: 0000000000000000
Jul 20 17:15:06 chimera kernel: [ 3324.556037] FS:  00007f6e2d1856f0(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
Jul 20 17:15:06 chimera kernel: [ 3324.556037] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 17:15:06 chimera kernel: [ 3324.556037] CR2: ffffffffa1543820 CR3: 00000000b5241000 CR4: 00000000000006e0
Jul 20 17:15:06 chimera kernel: [ 3324.556037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 20 17:15:06 chimera kernel: [ 3324.556037] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 20 17:15:06 chimera kernel: [ 3324.556037] Process modprobe (pid: 20884, threadinfo ffff8800b5168000, task ffff8800b818b000)
Jul 20 17:15:06 chimera kernel: [ 3324.556037] Stack:
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  ffff8800b5169ed8 ffffffff814932e7 ffff8800b5169ed8 ffffffff814c5248
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  ffff8800b5169eb8 0000000000000000 ffffffffa1908155 0000000000000000
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  000003060ede68d6 0000000000000000 ffff8800b5169ef8 ffffffffa190843f
Jul 20 17:15:06 chimera kernel: [ 3324.556037] Call Trace:
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff814932e7>] register_netdevice_notifier+0x3b/0x24b
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff814c5248>] ? genl_register_family_with_ops+0x50/0x9e
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908155>] ? ip_vs_conn_init+0x155/0x155 [ip_vs]
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa190843f>] ip_vs_control_init+0xeb/0x132 [ip_vs]
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908155>] ? ip_vs_conn_init+0x155/0x155 [ip_vs]
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908176>] ip_vs_init+0x21/0x1ff [ip_vs]
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffffa1908155>] ? ip_vs_conn_init+0x155/0x155 [ip_vs]
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff81002094>] do_one_initcall+0x6c/0x1c5
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff810cd572>] sys_init_module+0xe1/0x2b0
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  [<ffffffff8157da02>] system_call_fastpath+0x16/0x1b
Jul 20 17:15:06 chimera kernel: [ 3324.556037] Code: 89 e5 e8 85 e3 04 00 c9 c3 90 90 90 55 48 89 e5 66 66 66 66 90 48 ff 05 f8 5b 00 01 48 8b 07 eb 1e 48 ff 05 fc 5b 00 01 8b 56 10 <3b> 50 10 7f 14 48 ff 05 e5 5b 00 01 48 8d 78 08 48 8b 40 08 48 
Jul 20 17:15:06 chimera kernel: [ 3324.556037] RIP  [<ffffffff810a8d4f>] raw_notifier_chain_register+0x1f/0x4a
Jul 20 17:15:06 chimera kernel: [ 3324.556037]  RSP <ffff8800b5169e88>
Jul 20 17:15:06 chimera kernel: [ 3324.556037] CR2: ffffffffa1543820
Jul 20 17:15:06 chimera kernel: [ 3324.583800] ---[ end trace 1df4eeece34268d5 ]---



---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* RE: [Pv-drivers] [PATCH net-next] vmxnet3: set netdev parant device before calling netdev_info
From: Joe Perches @ 2011-07-21  3:49 UTC (permalink / raw)
  To: Scott Goldman
  Cc: Shreyas Bhatewara, netdev@vger.kernel.org, pv-drivers@vmware.com
In-Reply-To: <03E840D17E263A48A5766AD576E0423A03C5836359@exch-mbx-111.vmware.com>

On Wed, 2011-07-20 at 20:06 -0700, Scott Goldman wrote:
> > Parent device for netdev should be set before netdev_info() can be called
> > otherwise there is a NULL pointer dereference and probe() fails.

I believe this is not true.
I don't see any NULL pointer dereference here.
The functions and macros involved are listed below, reordered top to bottom.

define_netdev_printk_level(netdev_info, KERN_INFO);

#define define_netdev_printk_level(func, level)			\
int func(const struct net_device *dev, const char *fmt, ...)	\
{								\
	int r;							\
	struct va_format vaf;					\
	va_list args;						\
								\
	va_start(args, fmt);					\
								\
	vaf.fmt = fmt;						\
	vaf.va = &args;						\
								\
	r = __netdev_printk(level, dev, &vaf);			\
	va_end(args);						\
								\
	return r;						\
}								\
EXPORT_SYMBOL(func);

static int __netdev_printk(const char *level, const struct net_device *dev,
			   struct va_format *vaf)
{
	int r;

	if (dev && dev->dev.parent)
		r = dev_printk(level, dev->dev.parent, "%s: %pV",
			       netdev_name(dev), vaf);
	else if (dev)
		r = printk("%s%s: %pV", level, netdev_name(dev), vaf);
	else
		r = printk("%s(NULL net_device): %pV", level, vaf);

	return r;
}

static inline const char *netdev_name(const struct net_device *dev)
{
	if (dev->reg_state != NETREG_REGISTERED)
		return "(unregistered net_device)";
	return dev->name;
}
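
To make the point concrete, here is a userspace mock-up of just the
branch selection in __netdev_printk(). The sketch_* types and the enum
are stand-ins I made up for illustration; only the if/else structure is
taken from the code above.

```c
#include <stddef.h>

/* Stand-ins for struct device and struct net_device; only 'parent' matters here. */
struct sketch_device {
	struct sketch_device *parent;
};

struct sketch_net_device {
	struct sketch_device dev;
};

enum sketch_path {
	SKETCH_DEV_PRINTK,    /* dev && dev->dev.parent: dev_printk() on the parent */
	SKETCH_PRINTK_NAME,   /* dev set, parent NULL: plain printk() with the name */
	SKETCH_PRINTK_NULL    /* dev NULL: printk() with "(NULL net_device)" */
};

/* Mirrors the if / else if / else chain of __netdev_printk(). */
static enum sketch_path sketch_netdev_printk_path(const struct sketch_net_device *dev)
{
	if (dev && dev->dev.parent)
		return SKETCH_DEV_PRINTK;
	else if (dev)
		return SKETCH_PRINTK_NAME;
	return SKETCH_PRINTK_NULL;
}
```

A net_device whose parent has not been set yet takes the second branch,
so the parent pointer is tested but never dereferenced.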



^ permalink raw reply

* [PATCH] Fix panic in virtnet_remove
From: Krishna Kumar @ 2011-07-20  7:43 UTC (permalink / raw)
  To: mst; +Cc: netdev, shemminger, davem, Krishna Kumar

Fix a panic in virtnet_remove. unregister_netdev has already
freed up the netdev (and virtnet_info) due to dev->destructor
being set, while virtnet_info is still required. Remove
virtnet_free altogether, and move the freeing of the per-cpu
statistics from virtnet_free to virtnet_remove.

Tested patch below.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 drivers/net/virtio_net.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c	2011-07-18 09:14:02.000000000 +0530
+++ new/drivers/net/virtio_net.c	2011-07-18 09:16:35.000000000 +0530
@@ -705,14 +705,6 @@ static void virtnet_netpoll(struct net_d
 }
 #endif
 
-static void virtnet_free(struct net_device *dev)
-{
-	struct virtnet_info *vi = netdev_priv(dev);
-
-	free_percpu(vi->stats);
-	free_netdev(dev);
-}
-
 static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -959,7 +951,6 @@ static int virtnet_probe(struct virtio_d
 	/* Set up network device as normal. */
 	dev->netdev_ops = &virtnet_netdev;
 	dev->features = NETIF_F_HIGHDMA;
-	dev->destructor = virtnet_free;
 
 	SET_ETHTOOL_OPS(dev, &virtnet_ethtool_ops);
 	SET_NETDEV_DEV(dev, &vdev->dev);
@@ -1122,6 +1113,7 @@ static void __devexit virtnet_remove(str
 	while (vi->pages)
 		__free_pages(get_a_page(vi, GFP_KERNEL), 0);
 
+	free_percpu(vi->stats);
 	free_netdev(vi->dev);
 }
 

^ permalink raw reply

* [PATCH net-next] vmxnet3: fix publicity of NETIF_F_HIGHDMA
From: Shreyas Bhatewara @ 2011-07-21  3:21 UTC (permalink / raw)
  To: netdev; +Cc: pv-drivers

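NETIF_F_HIGHDMA is being disabled even when dma64 is true, because the
bit is set in netdev->features and then lost when netdev->features is
reassigned from netdev->hw_features. Set it in netdev->hw_features instead.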

CC: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 009277e..b46d101 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2647,7 +2647,7 @@ vmxnet3_declare_features(struct vmxnet3_adapter *adapter, bool dma64)
 		NETIF_F_HW_VLAN_RX | NETIF_F_TSO | NETIF_F_TSO6 |
 		NETIF_F_LRO;
 	if (dma64)
-		netdev->features |= NETIF_F_HIGHDMA;
+		netdev->hw_features |= NETIF_F_HIGHDMA;
 	netdev->vlan_features = netdev->hw_features &
 				~(NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
 	netdev->features = netdev->hw_features | NETIF_F_HW_VLAN_FILTER;
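
The effect of this change can be modeled with plain bitmasks. This is a
sketch under stated assumptions: the F_* flag values, struct mock_features
and both declare_* functions are hypothetical and only mirror the order of
assignments visible in the hunk above.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for illustration. */
#define F_TSO		(1u << 0)
#define F_HIGHDMA	(1u << 1)
#define F_VLAN_FILTER	(1u << 2)

struct mock_features {
	unsigned int hw_features;
	unsigned int features;
};

/* Order of operations before the patch. */
static void declare_before(struct mock_features *f, int dma64)
{
	f->hw_features = F_TSO;
	if (dma64)
		f->features |= F_HIGHDMA;	/* overwritten two lines below */
	f->features = f->hw_features | F_VLAN_FILTER;
}

/* Order of operations after the patch. */
static void declare_after(struct mock_features *f, int dma64)
{
	f->hw_features = F_TSO;
	if (dma64)
		f->hw_features |= F_HIGHDMA;	/* survives the copy below */
	f->features = f->hw_features | F_VLAN_FILTER;
}
```

With dma64 set, declare_before() leaves F_HIGHDMA cleared in features, while
declare_after() leaves it set in both hw_features and features.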

^ permalink raw reply related

* RE: [Pv-drivers] [PATCH net-next] vmxnet3: set netdev parent device before calling netdev_info
From: Scott Goldman @ 2011-07-21  3:06 UTC (permalink / raw)
  To: Shreyas Bhatewara, netdev@vger.kernel.org; +Cc: pv-drivers@vmware.com
In-Reply-To: <alpine.LRH.2.00.1107201834220.19334@sbhatewara-dev1.eng.vmware.com>

> Parent device for netdev should be set before netdev_info() can be called
> otherwise there is a NULL pointer dereference and probe() fails.

> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: Scott J. Goldman <scottjg@vmware.com>

^ permalink raw reply

* [PATCH net-2.6] jme: Fix unmap error (Causing system freeze)
From: cooldavid @ 2011-07-21  2:57 UTC (permalink / raw)
  To: David Miller
  Cc: Jason Lamb, linux-netdev, Guo-Fu Tseng, Jason Long, Marcus Becker,
	Aries Lee, Devinchiu, Marc Schiffbauer, stable

From: Guo-Fu Tseng <cooldavid@cooldavid.org>


This patch adds the missing dma_unmap(), which resolves the critical
issue of system freezes under heavy load.

Michal Miroslaw's rejected patch,
[PATCH v2 10/46] net: jme: convert to generic DMA API,
also pointed out the issue; thank you, Michal.
However, that fix was incorrect: it would unmap an address that is
still needed when memory is low.

Got lots of feedback from end users and the Gentoo Bugzilla:
https://bugs.gentoo.org/show_bug.cgi?id=373109
Thank you all. :)

Cc: stable@kernel.org
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
---
 drivers/net/jme.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/jme.c b/drivers/net/jme.c
index b5b174a..1973814 100644
--- a/drivers/net/jme.c
+++ b/drivers/net/jme.c
@@ -753,20 +753,28 @@ jme_make_new_rx_buf(struct jme_adapter *jme, int i)
 	struct jme_ring *rxring = &(jme->rxring[0]);
 	struct jme_buffer_info *rxbi = rxring->bufinf + i;
 	struct sk_buff *skb;
+	dma_addr_t mapping;
 
 	skb = netdev_alloc_skb(jme->dev,
 		jme->dev->mtu + RX_EXTRA_LEN);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
+	mapping = pci_map_page(jme->pdev, virt_to_page(skb->data),
+			       offset_in_page(skb->data), skb_tailroom(skb),
+			       PCI_DMA_FROMDEVICE);
+	if (unlikely(pci_dma_mapping_error(jme->pdev, mapping))) {
+		dev_kfree_skb(skb);
+		return -ENOMEM;
+	}
+
+	if (likely(rxbi->mapping))
+		pci_unmap_page(jme->pdev, rxbi->mapping,
+			       rxbi->len, PCI_DMA_FROMDEVICE);
+
 	rxbi->skb = skb;
 	rxbi->len = skb_tailroom(skb);
-	rxbi->mapping = pci_map_page(jme->pdev,
-					virt_to_page(skb->data),
-					offset_in_page(skb->data),
-					rxbi->len,
-					PCI_DMA_FROMDEVICE);
-
+	rxbi->mapping = mapping;
 	return 0;
 }
 
-- 
1.7.3.4
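
The ordering used in the hunk above (map the replacement first, release the
old mapping only once the new one is known to be good) can be sketched in
userspace as follows. This is an illustration, not driver code: the mock_*
names are hypothetical, and treating 0 as "no mapping" is an assumption of
the mock rather than a property of the DMA API.

```c
#include <assert.h>

/* Hypothetical stand-in for one RX ring slot. */
struct mock_rx_slot {
	int mapping;		/* 0 means "nothing mapped" in this mock */
};

static int live_mappings;	/* mappings currently outstanding */
static int next_handle = 1;	/* next fake mapping handle */
static int fail_next_map;	/* set to 1 to simulate one mapping failure */

/* Returns a fake non-zero handle, or 0 on simulated failure. */
static int mock_map(void)
{
	if (fail_next_map) {
		fail_next_map = 0;
		return 0;
	}
	live_mappings++;
	return next_handle++;
}

static void mock_unmap(int mapping)
{
	assert(mapping != 0);
	live_mappings--;
}

/* Refill a slot: on failure the slot keeps its old, still-mapped buffer. */
static int mock_refill(struct mock_rx_slot *slot)
{
	int mapping = mock_map();

	if (!mapping)
		return -1;		/* slot left untouched */

	if (slot->mapping)
		mock_unmap(slot->mapping);	/* old mapping no longer needed */

	slot->mapping = mapping;
	return 0;
}
```

Repeated successful refills keep exactly one mapping outstanding per slot,
and a failed refill leaves the previous mapping in place, so the slot never
ends up pointing at an unmapped buffer.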

^ permalink raw reply related

