Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v5 0/2] ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
From: Thomas Haller @ 2014-01-15 14:58 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: David Miller, jiri, netdev, stephen, dcbw
In-Reply-To: <20140115144331.GF19945@order.stressinduktion.org>

[-- Attachment #1: Type: text/plain, Size: 670 bytes --]

On Wed, 2014-01-15 at 15:43 +0100, Hannes Frederic Sowa wrote:
> On Wed, Jan 15, 2014 at 03:36:57PM +0100, Thomas Haller wrote:
> > v1 -> v2: add a second commit, handling NOPREFIXROUTE in ip6_del_addr.
> > v2 -> v3: reword commit messages, code comments and some refactoring.
> > v3 -> v4: refactor, rename variables, add enum
> > v4 -> v5: rebase, so that patch applies cleanly to current net-next/master
> 
> Huh? I thought those were already applied?
> 
> Hmm, they seem to be in no repository? I hope that no other patches are
> affected, too.
> 
> Greetings,
> 
>   Hannes
> 

I thought so too, but I cannot find it, so I re-posted v5.

Thomas

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] xen-netback: Rework rx_work_todo
From: Zoltan Kiss @ 2014-01-15 14:52 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies
In-Reply-To: <20140115144519.GO5698@zion.uk.xensource.com>

On 15/01/14 14:45, Wei Liu wrote:
>>>> The recent patch to fix receive side flow control (11b57f) solved the spinning
>>>> > >>thread problem, however caused an another one. The receive side can stall, if:
>>>> > >>- xenvif_rx_action sets rx_queue_stopped to false
>>>> > >>- interrupt happens, and sets rx_event to true
>>>> > >>- then xenvif_kthread sets rx_event to false
>>>> > >>
>>> > >
>>> > >If you mean "rx_work_todo" returns false.
>>> > >
>>> > >In this case
>>> > >
>>> > >(!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) || vif->rx_event;
>>> > >
>>> > >can still be true, can't it?
>> >Sorry, I should wrote rx_queue_stopped to true
>> >
> In this case, if rx_queue_stopped is true, then we're expecting frontend
> to notify us, right?
>
> rx_queue_stopped is set to true if we cannot make any progress to queue
> packet into the ring. In that situation we can expect frontend will send
> notification to backend after it goes through the backlog in the ring.
> That means rx_event is set to true, and rx_work_todo is true again. So
> the ring is actually not stalled in this case as well. Did I miss
> something?
>

Yes, we expect the guest to notify us, and it does, and we set rx_event 
to true (see second point), but then the thread set it to false (see 
third point). Talking with Paul, another solution could be to set 
rx_event false before calling xenvif_rx_action. But using 
rx_last_skb_slots makes it quicker for the thread to see if it doesn't 
have to do anything.

Zoli

^ permalink raw reply

* [PATCH] net/ipv4: don't use module_init in non-modular gre_offload
From: Paul Gortmaker @ 2014-01-15 14:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Paul Gortmaker, Eric Dumazet

Recent commit 438e38fadca2f6e57eeecc08326c8a95758594d4
("gre_offload: statically build GRE offloading support") added
new module_init/module_exit calls to the gre_offload.c file.

The file is obj-y and can't be anything other than built-in.
Currently it can never be built modular, so using module_init
as an alias for __initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.  We also make the inclusion explicit.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- it will remain at level 6 in initcall ordering.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 29512e3e7e7c..ed968daf2380 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -11,6 +11,7 @@
  */

 #include <linux/skbuff.h>
+#include <linux/init.h>
 #include <net/protocol.h>
 #include <net/gre.h>

@@ -283,11 +284,10 @@ static int __init gre_offload_init(void)
 {
 	return inet_add_offload(&gre_offload, IPPROTO_GRE);
 }
+device_initcall(gre_offload_init);

 static void __exit gre_offload_exit(void)
 {
 	inet_del_offload(&gre_offload, IPPROTO_GRE);
 }
-
-module_init(gre_offload_init);
-module_exit(gre_offload_exit);
+__exitcall(gre_offload_exit);
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH v2 net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 14:50 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Martin Schwidefsky, Hannes Frederic Sowa, netdev, dborkman,
	darkjames-ws, Mircea Gherzan, Russell King, Matt Evans
In-Reply-To: <1389795926.31367.334.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
---
v2: Fixed sparc code as David kindly suggested
    Added tests on K being 1 (divide by 1 is a nop
                              modulo by 1 clears A),
    as Martin Schwidefsky seems concerned by this case.

Please review arch code to make sure I made no mistake, thanks !

 arch/arm/net/bpf_jit_32.c       |    6 +++---
 arch/powerpc/net/bpf_jit_comp.c |    7 ++++---
 arch/s390/net/bpf_jit_comp.c    |   17 ++++++++++++-----
 arch/sparc/net/bpf_jit_comp.c   |   17 ++++++++++++++---
 arch/x86/net/bpf_jit_comp.c     |   14 ++++++++++----
 net/core/filter.c               |   30 ++----------------------------
 6 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 9ed155ad0f97..271b5e971568 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -641,10 +641,10 @@ load_ind:
 			emit(ARM_MUL(r_A, r_A, r_X), ctx);
 			break;
 		case BPF_S_ALU_DIV_K:
-			/* current k == reciprocal_value(userspace k) */
+			if (k == 1)
+				break;
 			emit_mov_i(r_scratch, k, ctx);
-			/* A = top 32 bits of the product */
-			emit(ARM_UMULL(r_scratch, r_A, r_A, r_scratch), ctx);
+			emit_udiv(r_A, r_A, r_scratch, ctx);
 			break;
 		case BPF_S_ALU_DIV_X:
 			update_on_xread(ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index ac3c2a10dafd..555034f8505e 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -223,10 +223,11 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
 			}
 			PPC_DIVWU(r_A, r_A, r_X);
 			break;
-		case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K); */
+		case BPF_S_ALU_DIV_K: /* A /= K */
+			if (K == 1)
+				break;
 			PPC_LI32(r_scratch1, K);
-			/* Top 32 bits of 64bit result -> A */
-			PPC_MULHWU(r_A, r_A, r_scratch1);
+			PPC_DIVWU(r_A, r_A, r_scratch1);
 			break;
 		case BPF_S_ALU_AND_X:
 			ctx->seen |= SEEN_XREG;
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 16871da37371..fc0fa77728e1 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -371,11 +371,13 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
 		/* dr %r4,%r12 */
 		EMIT2(0x1d4c);
 		break;
-	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
-		/* m %r4,<d(K)>(%r13) */
-		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
-		/* lr %r5,%r4 */
-		EMIT2(0x1854);
+	case BPF_S_ALU_DIV_K: /* A /= K */
+		if (K == 1)
+			break;
+		/* lhi %r4,0 */
+		EMIT4(0xa7480000);
+		/* d %r4,<d(K)>(%r13) */
+		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
 		break;
 	case BPF_S_ALU_MOD_X: /* A %= X */
 		jit->seen |= SEEN_XREG | SEEN_RET0;
@@ -391,6 +393,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
 		EMIT2(0x1854);
 		break;
 	case BPF_S_ALU_MOD_K: /* A %= K */
+		if (K == 1) {
+			/* lhi %r5,0 */
+			EMIT4(0xa7580000);
+			break;
+		}
 		/* lhi %r4,0 */
 		EMIT4(0xa7480000);
 		/* d %r4,<d(K)>(%r13) */
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 218b6b23c378..01fe9946d388 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -497,9 +497,20 @@ void bpf_jit_compile(struct sk_filter *fp)
 			case BPF_S_ALU_MUL_K:	/* A *= K */
 				emit_alu_K(MUL, K);
 				break;
-			case BPF_S_ALU_DIV_K:	/* A /= K */
-				emit_alu_K(MUL, K);
-				emit_read_y(r_A);
+			case BPF_S_ALU_DIV_K:	/* A /= K with K != 0*/
+				if (K == 1)
+					break;
+				emit_write_y(G0);
+#ifdef CONFIG_SPARC32
+				/* The Sparc v8 architecture requires
+				 * three instructions between a %y
+				 * register write and the first use.
+				 */
+				emit_nop();
+				emit_nop();
+				emit_nop();
+#endif
+				emit_alu_K(DIV, K);
 				break;
 			case BPF_S_ALU_DIV_X:	/* A /= X; */
 				emit_cmpi(r_X, 0);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 26328e800869..4ed75dd81d05 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -359,15 +359,21 @@ void bpf_jit_compile(struct sk_filter *fp)
 				EMIT2(0x89, 0xd0);	/* mov %edx,%eax */
 				break;
 			case BPF_S_ALU_MOD_K: /* A %= K; */
+				if (K == 1) {
+					CLEAR_A();
+					break;
+				}
 				EMIT2(0x31, 0xd2);	/* xor %edx,%edx */
 				EMIT1(0xb9);EMIT(K, 4);	/* mov imm32,%ecx */
 				EMIT2(0xf7, 0xf1);	/* div %ecx */
 				EMIT2(0x89, 0xd0);	/* mov %edx,%eax */
 				break;
-			case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K); */
-				EMIT3(0x48, 0x69, 0xc0); /* imul imm32,%rax,%rax */
-				EMIT(K, 4);
-				EMIT4(0x48, 0xc1, 0xe8, 0x20); /* shr $0x20,%rax */
+			case BPF_S_ALU_DIV_K: /* A /= K */
+				if (K == 1)
+					break;
+				EMIT2(0x31, 0xd2);	/* xor %edx,%edx */
+				EMIT1(0xb9);EMIT(K, 4);	/* mov imm32,%ecx */
+				EMIT2(0xf7, 0xf1);	/* div %ecx */
 				break;
 			case BPF_S_ALU_AND_X:
 				seen |= SEEN_XREG;
diff --git a/net/core/filter.c b/net/core/filter.c
index 01b780856db2..ad30d626a5bd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -36,7 +36,6 @@
 #include <asm/uaccess.h>
 #include <asm/unaligned.h>
 #include <linux/filter.h>
-#include <linux/reciprocal_div.h>
 #include <linux/ratelimit.h>
 #include <linux/seccomp.h>
 #include <linux/if_vlan.h>
@@ -166,7 +165,7 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
 			A /= X;
 			continue;
 		case BPF_S_ALU_DIV_K:
-			A = reciprocal_divide(A, K);
+			A /= K;
 			continue;
 		case BPF_S_ALU_MOD_X:
 			if (X == 0)
@@ -553,11 +552,6 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
 		/* Some instructions need special checks */
 		switch (code) {
 		case BPF_S_ALU_DIV_K:
-			/* check for division by zero */
-			if (ftest->k == 0)
-				return -EINVAL;
-			ftest->k = reciprocal_value(ftest->k);
-			break;
 		case BPF_S_ALU_MOD_K:
 			/* check for division by zero */
 			if (ftest->k == 0)
@@ -853,27 +847,7 @@ void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to)
 	to->code = decodes[code];
 	to->jt = filt->jt;
 	to->jf = filt->jf;
-
-	if (code == BPF_S_ALU_DIV_K) {
-		/*
-		 * When loaded this rule user gave us X, which was
-		 * translated into R = r(X). Now we calculate the
-		 * RR = r(R) and report it back. If next time this
-		 * value is loaded and RRR = r(RR) is calculated
-		 * then the R == RRR will be true.
-		 *
-		 * One exception. X == 1 translates into R == 0 and
-		 * we can't calculate RR out of it with r().
-		 */
-
-		if (filt->k == 0)
-			to->k = 1;
-		else
-			to->k = reciprocal_value(filt->k);
-
-		BUG_ON(reciprocal_value(to->k) != filt->k);
-	} else
-		to->k = filt->k;
+	to->k = filt->k;
 }
 
 int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf, unsigned int len)

^ permalink raw reply related

* Re: [PATCH net-next] xen-netback: Rework rx_work_todo
From: Wei Liu @ 2014-01-15 14:45 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <52D67536.4030106@citrix.com>

On Wed, Jan 15, 2014 at 11:47:02AM +0000, Zoltan Kiss wrote:
> On 15/01/14 10:37, Wei Liu wrote:
> >On Tue, Jan 14, 2014 at 07:28:39PM +0000, Zoltan Kiss wrote:
> >>The recent patch to fix receive side flow control (11b57f) solved the spinning
> >>thread problem, however caused an another one. The receive side can stall, if:
> >>- xenvif_rx_action sets rx_queue_stopped to false
> >>- interrupt happens, and sets rx_event to true
> >>- then xenvif_kthread sets rx_event to false
> >>
> >
> >If you mean "rx_work_todo" returns false.
> >
> >In this case
> >
> >(!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) || vif->rx_event;
> >
> >can still be true, can't it?
> Sorry, I should wrote rx_queue_stopped to true
> 

In this case, if rx_queue_stopped is true, then we're expecting frontend
to notify us, right?

rx_queue_stopped is set to true if we cannot make any progress to queue
packet into the ring. In that situation we can expect frontend will send
notification to backend after it goes through the backlog in the ring.
That means rx_event is set to true, and rx_work_todo is true again. So
the ring is actually not stalled in this case as well. Did I miss
something?

> >
> >>Also, through rx_event a malicious guest can force the RX thread to spin. This
> >>patch ditch that two variable, and rework rx_work_todo. If the thread finds it
> >
> >This seems to be a bigger problem. Can you elaborate?
> My mistake too. I forgot that rx_action set it to false, so it's not
> really a spinning. However the thread should still run
> xenvif_rx_action to figure out there is no space in the ring before
> it sets rx_event to false. In my patch we can quit earlier.
> 
> Zoli

^ permalink raw reply

* Re: [PATCH v5 0/2] ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
From: Hannes Frederic Sowa @ 2014-01-15 14:43 UTC (permalink / raw)
  To: Thomas Haller; +Cc: David Miller, jiri, netdev, stephen, dcbw
In-Reply-To: <1389796619-13440-1-git-send-email-thaller@redhat.com>

On Wed, Jan 15, 2014 at 03:36:57PM +0100, Thomas Haller wrote:
> v1 -> v2: add a second commit, handling NOPREFIXROUTE in ip6_del_addr.
> v2 -> v3: reword commit messages, code comments and some refactoring.
> v3 -> v4: refactor, rename variables, add enum
> v4 -> v5: rebase, so that patch applies cleanly to current net-next/master

Huh? I thought those were already applied?

Hmm, they seem to be in no repository? I hope that no other patches are
affected, too.

Greetings,

  Hannes

^ permalink raw reply

* [PATCH v5 2/2] ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE
From: Thomas Haller @ 2014-01-15 14:36 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, jiri, netdev, stephen, dcbw, Thomas Haller
In-Reply-To: <1389796619-13440-1-git-send-email-thaller@redhat.com>

Refactor the deletion/update of prefix routes when removing an
address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address
present with this flag, to not cleanup the route. Instead, assume
that userspace is taking care of this route.

Also perform the same cleanup, when userspace changes an existing address
to add NOPREFIXROUTE (to an address that didn't have this flag). This is
done because when the address was added, a prefix route was created for it.
Since the user now wants to handle this route by himself, we cleanup this
route.

This cleanup of the route is not totally robust. There is no guarantee,
that the route we are about to delete was really the one added by the
kernel. This behavior does not change by the patch, and in practice it
should work just fine.

Signed-off-by: Thomas Haller <thaller@redhat.com>
---
 net/ipv6/addrconf.c | 184 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 109 insertions(+), 75 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 85100c1..6913a82 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -900,15 +900,95 @@ out:
 	goto out2;
 }
 
+enum cleanup_prefix_rt_t {
+	CLEANUP_PREFIX_RT_NOP,    /* no cleanup action for prefix route */
+	CLEANUP_PREFIX_RT_DEL,    /* delete the prefix route */
+	CLEANUP_PREFIX_RT_EXPIRE, /* update the lifetime of the prefix route */
+};
+
+/*
+ * Check, whether the prefix for ifp would still need a prefix route
+ * after deleting ifp. The function returns one of the CLEANUP_PREFIX_RT_*
+ * constants.
+ *
+ * 1) we don't purge prefix if address was not permanent.
+ *    prefix is managed by its own lifetime.
+ * 2) we also don't purge, if the address was IFA_F_NOPREFIXROUTE.
+ * 3) if there are no addresses, delete prefix.
+ * 4) if there are still other permanent address(es),
+ *    corresponding prefix is still permanent.
+ * 5) if there are still other addresses with IFA_F_NOPREFIXROUTE,
+ *    don't purge the prefix, assume user space is managing it.
+ * 6) otherwise, update prefix lifetime to the
+ *    longest valid lifetime among the corresponding
+ *    addresses on the device.
+ *    Note: subsequent RA will update lifetime.
+ **/
+static enum cleanup_prefix_rt_t
+check_cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long *expires)
+{
+	struct inet6_ifaddr *ifa;
+	struct inet6_dev *idev = ifp->idev;
+	unsigned long lifetime;
+	enum cleanup_prefix_rt_t action = CLEANUP_PREFIX_RT_DEL;
+
+	*expires = jiffies;
+
+	list_for_each_entry(ifa, &idev->addr_list, if_list) {
+		if (ifa == ifp)
+			continue;
+		if (!ipv6_prefix_equal(&ifa->addr, &ifp->addr,
+				       ifp->prefix_len))
+			continue;
+		if (ifa->flags & (IFA_F_PERMANENT | IFA_F_NOPREFIXROUTE))
+			return CLEANUP_PREFIX_RT_NOP;
+
+		action = CLEANUP_PREFIX_RT_EXPIRE;
+
+		spin_lock(&ifa->lock);
+
+		lifetime = addrconf_timeout_fixup(ifa->valid_lft, HZ);
+		/*
+		 * Note: Because this address is
+		 * not permanent, lifetime <
+		 * LONG_MAX / HZ here.
+		 */
+		if (time_before(*expires, ifa->tstamp + lifetime * HZ))
+			*expires = ifa->tstamp + lifetime * HZ;
+		spin_unlock(&ifa->lock);
+	}
+
+	return action;
+}
+
+static void
+cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires, bool del_rt)
+{
+	struct rt6_info *rt;
+
+	rt = addrconf_get_prefix_route(&ifp->addr,
+				       ifp->prefix_len,
+				       ifp->idev->dev,
+				       0, RTF_GATEWAY | RTF_DEFAULT);
+	if (rt) {
+		if (del_rt)
+			ip6_del_rt(rt);
+		else {
+			if (!(rt->rt6i_flags & RTF_EXPIRES))
+				rt6_set_expires(rt, expires);
+			ip6_rt_put(rt);
+		}
+	}
+}
+
+
 /* This function wants to get referenced ifp and releases it before return */
 
 static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 {
-	struct inet6_ifaddr *ifa, *ifn;
-	struct inet6_dev *idev = ifp->idev;
 	int state;
-	int deleted = 0, onlink = 0;
-	unsigned long expires = jiffies;
+	enum cleanup_prefix_rt_t action = CLEANUP_PREFIX_RT_NOP;
+	unsigned long expires;
 
 	spin_lock_bh(&ifp->state_lock);
 	state = ifp->state;
@@ -922,7 +1002,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 	hlist_del_init_rcu(&ifp->addr_lst);
 	spin_unlock_bh(&addrconf_hash_lock);
 
-	write_lock_bh(&idev->lock);
+	write_lock_bh(&ifp->idev->lock);
 
 	if (ifp->flags&IFA_F_TEMPORARY) {
 		list_del(&ifp->tmp_list);
@@ -933,45 +1013,13 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 		__in6_ifa_put(ifp);
 	}
 
-	list_for_each_entry_safe(ifa, ifn, &idev->addr_list, if_list) {
-		if (ifa == ifp) {
-			list_del_init(&ifp->if_list);
-			__in6_ifa_put(ifp);
+	if (ifp->flags & IFA_F_PERMANENT && !(ifp->flags & IFA_F_NOPREFIXROUTE))
+		action = check_cleanup_prefix_route(ifp, &expires);
 
-			if (!(ifp->flags & IFA_F_PERMANENT) || onlink > 0)
-				break;
-			deleted = 1;
-			continue;
-		} else if (ifp->flags & IFA_F_PERMANENT) {
-			if (ipv6_prefix_equal(&ifa->addr, &ifp->addr,
-					      ifp->prefix_len)) {
-				if (ifa->flags & IFA_F_PERMANENT) {
-					onlink = 1;
-					if (deleted)
-						break;
-				} else {
-					unsigned long lifetime;
-
-					if (!onlink)
-						onlink = -1;
-
-					spin_lock(&ifa->lock);
-
-					lifetime = addrconf_timeout_fixup(ifa->valid_lft, HZ);
-					/*
-					 * Note: Because this address is
-					 * not permanent, lifetime <
-					 * LONG_MAX / HZ here.
-					 */
-					if (time_before(expires,
-							ifa->tstamp + lifetime * HZ))
-						expires = ifa->tstamp + lifetime * HZ;
-					spin_unlock(&ifa->lock);
-				}
-			}
-		}
-	}
-	write_unlock_bh(&idev->lock);
+	list_del_init(&ifp->if_list);
+	__in6_ifa_put(ifp);
+
+	write_unlock_bh(&ifp->idev->lock);
 
 	addrconf_del_dad_timer(ifp);
 
@@ -979,38 +1027,9 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 
 	inet6addr_notifier_call_chain(NETDEV_DOWN, ifp);
 
-	/*
-	 * Purge or update corresponding prefix
-	 *
-	 * 1) we don't purge prefix here if address was not permanent.
-	 *    prefix is managed by its own lifetime.
-	 * 2) if there are no addresses, delete prefix.
-	 * 3) if there are still other permanent address(es),
-	 *    corresponding prefix is still permanent.
-	 * 4) otherwise, update prefix lifetime to the
-	 *    longest valid lifetime among the corresponding
-	 *    addresses on the device.
-	 *    Note: subsequent RA will update lifetime.
-	 *
-	 * --yoshfuji
-	 */
-	if ((ifp->flags & IFA_F_PERMANENT) && onlink < 1) {
-		struct rt6_info *rt;
-
-		rt = addrconf_get_prefix_route(&ifp->addr,
-					       ifp->prefix_len,
-					       ifp->idev->dev,
-					       0, RTF_GATEWAY | RTF_DEFAULT);
-
-		if (rt) {
-			if (onlink == 0) {
-				ip6_del_rt(rt);
-				rt = NULL;
-			} else if (!(rt->rt6i_flags & RTF_EXPIRES)) {
-				rt6_set_expires(rt, expires);
-			}
-		}
-		ip6_rt_put(rt);
+	if (action != CLEANUP_PREFIX_RT_NOP) {
+		cleanup_prefix_route(ifp, expires,
+			action == CLEANUP_PREFIX_RT_DEL);
 	}
 
 	/* clean up prefsrc entries */
@@ -3636,6 +3655,7 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	clock_t expires;
 	unsigned long timeout;
 	bool was_managetempaddr;
+	bool had_prefixroute;
 
 	if (!valid_lft || (prefered_lft > valid_lft))
 		return -EINVAL;
@@ -3664,6 +3684,8 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 
 	spin_lock_bh(&ifp->lock);
 	was_managetempaddr = ifp->flags & IFA_F_MANAGETEMPADDR;
+	had_prefixroute = ifp->flags & IFA_F_PERMANENT &&
+			  !(ifp->flags & IFA_F_NOPREFIXROUTE);
 	ifp->flags &= ~(IFA_F_DEPRECATED | IFA_F_PERMANENT | IFA_F_NODAD |
 			IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
 			IFA_F_NOPREFIXROUTE);
@@ -3679,6 +3701,18 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
 		addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
 				      expires, flags);
+	} else if (had_prefixroute) {
+		enum cleanup_prefix_rt_t action;
+		unsigned long rt_expires;
+
+		write_lock_bh(&ifp->idev->lock);
+		action = check_cleanup_prefix_route(ifp, &rt_expires);
+		write_unlock_bh(&ifp->idev->lock);
+
+		if (action != CLEANUP_PREFIX_RT_NOP) {
+			cleanup_prefix_route(ifp, rt_expires,
+				action == CLEANUP_PREFIX_RT_DEL);
+		}
 	}
 
 	if (was_managetempaddr || ifp->flags & IFA_F_MANAGETEMPADDR) {
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH v5 0/2] ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
From: Thomas Haller @ 2014-01-15 14:36 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, jiri, netdev, stephen, dcbw, Thomas Haller
In-Reply-To: <20140110.181030.2081631538801450145.davem@davemloft.net>

v1 -> v2: add a second commit, handling NOPREFIXROUTE in ip6_del_addr.
v2 -> v3: reword commit messages, code comments and some refactoring.
v3 -> v4: refactor, rename variables, add enum
v4 -> v5: rebase, so that patch applies cleanly to current net-next/master

Thomas Haller (2):
  ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of
    IP6 routes
  ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE

 include/uapi/linux/if_addr.h |   1 +
 net/ipv6/addrconf.c          | 203 ++++++++++++++++++++++++++-----------------
 2 files changed, 123 insertions(+), 81 deletions(-)

-- 
1.8.4.2

^ permalink raw reply

* [PATCH v5 1/2] ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
From: Thomas Haller @ 2014-01-15 14:36 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, jiri, netdev, stephen, dcbw, Thomas Haller
In-Reply-To: <1389796619-13440-1-git-send-email-thaller@redhat.com>

When adding/modifying an IPv6 address, the userspace application needs
a way to suppress adding a prefix route. This is for example relevant
together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf
generated addresses, but depending on on-link, no route for the
prefix should be added.

Signed-off-by: Thomas Haller <thaller@redhat.com>
---
 include/uapi/linux/if_addr.h |  1 +
 net/ipv6/addrconf.c          | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/if_addr.h b/include/uapi/linux/if_addr.h
index cfed10b..dea10a8 100644
--- a/include/uapi/linux/if_addr.h
+++ b/include/uapi/linux/if_addr.h
@@ -49,6 +49,7 @@ enum {
 #define IFA_F_TENTATIVE		0x40
 #define IFA_F_PERMANENT		0x80
 #define IFA_F_MANAGETEMPADDR	0x100
+#define IFA_F_NOPREFIXROUTE	0x200
 
 struct ifa_cacheinfo {
 	__u32	ifa_prefered;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1b2e4ee..85100c1 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2434,8 +2434,11 @@ static int inet6_addr_add(struct net *net, int ifindex,
 			    valid_lft, prefered_lft);
 
 	if (!IS_ERR(ifp)) {
-		addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev,
-				      expires, flags);
+		if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
+			addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev,
+					      expires, flags);
+		}
+
 		/*
 		 * Note that section 3.1 of RFC 4429 indicates
 		 * that the Optimistic flag should not be set for
@@ -3662,7 +3665,8 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	spin_lock_bh(&ifp->lock);
 	was_managetempaddr = ifp->flags & IFA_F_MANAGETEMPADDR;
 	ifp->flags &= ~(IFA_F_DEPRECATED | IFA_F_PERMANENT | IFA_F_NODAD |
-			IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR);
+			IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
+			IFA_F_NOPREFIXROUTE);
 	ifp->flags |= ifa_flags;
 	ifp->tstamp = jiffies;
 	ifp->valid_lft = valid_lft;
@@ -3672,8 +3676,10 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u32 ifa_flags,
 	if (!(ifp->flags&IFA_F_TENTATIVE))
 		ipv6_ifa_notify(0, ifp);
 
-	addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
-			      expires, flags);
+	if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
+		addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
+				      expires, flags);
+	}
 
 	if (was_managetempaddr || ifp->flags & IFA_F_MANAGETEMPADDR) {
 		if (was_managetempaddr && !(ifp->flags & IFA_F_MANAGETEMPADDR))
@@ -3727,7 +3733,8 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 	ifa_flags = tb[IFA_FLAGS] ? nla_get_u32(tb[IFA_FLAGS]) : ifm->ifa_flags;
 
 	/* We ignore other flags so far. */
-	ifa_flags &= IFA_F_NODAD | IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR;
+	ifa_flags &= IFA_F_NODAD | IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
+		     IFA_F_NOPREFIXROUTE;
 
 	ifa = ipv6_get_ifaddr(net, pfx, dev, 1);
 	if (ifa == NULL) {
-- 
1.8.4.2

^ permalink raw reply related

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: Wei Liu @ 2014-01-15 14:32 UTC (permalink / raw)
  To: annie li; +Cc: David Vrabel, Wei Liu, xen-devel, netdev, ian.campbell
In-Reply-To: <52D69864.9030207@oracle.com>

On Wed, Jan 15, 2014 at 10:17:08PM +0800, annie li wrote:
[...]
> >Having gnttab_end_foreign_access() do a free just looks odd to me, the
> >free isn't paired with any alloc in the grant table code.
> >
> >It seems more logical to me that granting access takes an additional
> >page ref, and then ending access releases that ref.
> 
> I am thinking of two ways, and they can be implemented in new patches.
> 1. If gnttab_end_foreign_access_ref succeeds, then kfree_skb is
> called to free skb. Otherwise, using gnttab_end_foreign_access to
> release ref and pages.

This is probably not a very good idea as skb_free does a lot more things
than simply freeing pages. You're still leaking something (skb: state,
head etc.) with this approach.

> 2. Add a similar deferred way of gnttab_end_foreign_access in
> gnttab_end_foreign_access_ref.
> 
> Thanks
> Annie
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next] net: stmmac: fix NULL pointer dereference in stmmac_get_tx_hwtstamp
From: Bruce Liu @ 2014-01-15 14:26 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev, linux-kernel, Bruce Liu

When timestamping is enabled, stmmac_tx_clean will call
stmmac_get_tx_hwtstamp to get tx TS.
But the skb can be NULL because the last of its tx_skbuff is NULL
if this packet frame is filled in more than one descriptors.

Signed-off-by: Bruce Liu <damuzi000@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 797b56a..47f2287 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -332,7 +332,7 @@ static void stmmac_get_tx_hwtstamp(struct stmmac_priv *priv,
 		return;
 
 	/* exit if skb doesn't support hw tstamp */
-	if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)))
+	if (likely(!skb || !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)))
 		return;
 
 	if (priv->adv_ts)
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 14:25 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Martin Schwidefsky, Hannes Frederic Sowa, netdev, dborkman,
	darkjames-ws, Mircea Gherzan, Russell King, Matt Evans
In-Reply-To: <1389795716.31367.333.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, 2014-01-15 at 06:21 -0800, Eric Dumazet wrote:
> On Wed, 2014-01-15 at 11:51 +0100, Heiko Carstens wrote:
> > On Wed, Jan 15, 2014 at 09:13:22AM +0100, Martin Schwidefsky wrote:
> > > On Wed, 15 Jan 2014 09:00:07 +0100
> > > Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > 
> > > > On Tue, Jan 14, 2014 at 11:02:41PM -0800, Eric Dumazet wrote:
> > > > > diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> > > > > index 16871da37371..e349dc7d0992 100644
> > > > > --- a/arch/s390/net/bpf_jit_comp.c
> > > > > +++ b/arch/s390/net/bpf_jit_comp.c
> > > > > @@ -371,11 +371,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
> > > > >  		/* dr %r4,%r12 */
> > > > >  		EMIT2(0x1d4c);
> > > > >  		break;
> > > > > -	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
> > > > > -		/* m %r4,<d(K)>(%r13) */
> > > > > -		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
> > > > > -		/* lr %r5,%r4 */
> > > > > -		EMIT2(0x1854);
> > > > > +	case BPF_S_ALU_DIV_K: /* A /= K */
> > > > > +		/* lhi %r4,0 */
> > > > > +		EMIT4(0xa7480000);
> > > > > +		/* d %r4,<d(K)>(%r13) */
> > > > > +		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
> > > > >  		break;
> > > > 
> > > > The s390 part looks good.
> > > 
> > > Does it? The divide instruction is signed, for the special
> > > case of K==1 this can now cause an exception if the quotient
> > > gets too large. We should add a check for K==1 and do nothing
> > > in this case. With a divisor of at least 2 the result will
> > > stay in the limit.
> > 
> > Indeed. That's quite subtle.
> 
> net/core/filter.c does :
> 
> A /= K;
> 
> Why is this working in generic code (if K == 1), not in s390 one ?

Note that I copied code found in BPF_S_ALU_MOD_K, so this one would need
a fix as well.

^ permalink raw reply

* Re: [PATCH net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 14:21 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Martin Schwidefsky, Hannes Frederic Sowa, netdev, dborkman,
	darkjames-ws, Mircea Gherzan, Russell King, Matt Evans
In-Reply-To: <20140115105114.GB6638@osiris>

On Wed, 2014-01-15 at 11:51 +0100, Heiko Carstens wrote:
> On Wed, Jan 15, 2014 at 09:13:22AM +0100, Martin Schwidefsky wrote:
> > On Wed, 15 Jan 2014 09:00:07 +0100
> > Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > 
> > > On Tue, Jan 14, 2014 at 11:02:41PM -0800, Eric Dumazet wrote:
> > > > diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> > > > index 16871da37371..e349dc7d0992 100644
> > > > --- a/arch/s390/net/bpf_jit_comp.c
> > > > +++ b/arch/s390/net/bpf_jit_comp.c
> > > > @@ -371,11 +371,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
> > > >  		/* dr %r4,%r12 */
> > > >  		EMIT2(0x1d4c);
> > > >  		break;
> > > > -	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
> > > > -		/* m %r4,<d(K)>(%r13) */
> > > > -		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
> > > > -		/* lr %r5,%r4 */
> > > > -		EMIT2(0x1854);
> > > > +	case BPF_S_ALU_DIV_K: /* A /= K */
> > > > +		/* lhi %r4,0 */
> > > > +		EMIT4(0xa7480000);
> > > > +		/* d %r4,<d(K)>(%r13) */
> > > > +		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
> > > >  		break;
> > > 
> > > The s390 part looks good.
> > 
> > Does it? The divide instruction is signed, for the special
> > case of K==1 this can now cause an exception if the quotient
> > gets too large. We should add a check for K==1 and do nothing
> > in this case. With a divisor of at least 2 the result will
> > stay in the limit.
> 
> Indeed. That's quite subtle.

net/core/filter.c does :

A /= K;

Why is this working in generic code (if K == 1), not in s390 one ?

^ permalink raw reply

* [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-1-git-send-email-zwu.kernel@gmail.com>

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

Accelerated RFS is used to select the RX queue.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 drivers/net/virtio_net.c |   56 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 046421c..9f4cfa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
 #include <linux/cpu.h>
+#include <linux/cpu_rmap.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -1554,6 +1555,49 @@ err:
 	return ret;
 }
 
+static void virtnet_free_irq_cpu_rmap(struct virtnet_info *vi)
+{
+#ifdef CONFIG_RFS_ACCEL
+	free_irq_cpu_rmap(vi->dev->rx_cpu_rmap);
+	vi->dev->rx_cpu_rmap = NULL;
+#endif
+}
+
+static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
+{
+	int rc = 0;
+
+#ifdef CONFIG_RFS_ACCEL
+	struct virtio_device *vdev = vi->vdev;
+	unsigned int irq;
+	int i;
+
+	if (!vi->affinity_hint_set)
+		goto out;
+
+	vi->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(vi->max_queue_pairs);
+	if (!vi->dev->rx_cpu_rmap) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		irq = virtqueue_get_vq_irq(vdev, vi->rq[i].vq);
+		if (irq == -1)
+			goto failed;
+
+		rc = irq_cpu_rmap_add(vi->dev->rx_cpu_rmap, irq);
+		if (rc) {
+failed:
+			virtnet_free_irq_cpu_rmap(vi);
+			goto out;
+		}
+	}
+out:
+#endif
+	return rc;
+}
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int i, err;
@@ -1671,10 +1715,16 @@ static int virtnet_probe(struct virtio_device *vdev)
 	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
 	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
 
+	err = virtnet_init_rx_cpu_rmap(vi);
+	if (err) {
+		pr_debug("virtio_net: initializing rx cpu rmap failed\n");
+		goto free_vqs;
+	}
+
 	err = register_netdev(dev);
 	if (err) {
 		pr_debug("virtio_net: registering device failed\n");
-		goto free_vqs;
+		goto free_cpu_rmap;
 	}
 
 	/* Last of all, set up some receive buffers. */
@@ -1714,6 +1764,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 free_recv_bufs:
 	free_receive_bufs(vi);
 	unregister_netdev(dev);
+free_cpu_rmap:
+	virtnet_free_irq_cpu_rmap(vi);
 free_vqs:
 	cancel_delayed_work_sync(&vi->refill);
 	virtnet_del_vqs(vi);
@@ -1751,6 +1803,8 @@ static void virtnet_remove(struct virtio_device *vdev)
 
 	unregister_netdev(vi->dev);
 
+	virtnet_free_irq_cpu_rmap(vi);
+
 	remove_vq_common(vi);
 	if (vi->alloc_frag.page)
 		put_page(vi->alloc_frag.page);
-- 
1.7.6.5

^ permalink raw reply related

* [RFC PATCH net-next 2/3] virtio_net: Introduce one dummy function virtnet_filter_rfs()
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-1-git-send-email-zwu.kernel@gmail.com>

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 drivers/net/virtio_net.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7b17240..046421c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1295,6 +1295,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+static int virtnet_filter_rfs(struct net_device *net_dev,
+		const struct sk_buff *skb, u16 rxq_index, u32 flow_id)
+{
+	return 0;
+}
+#endif /* CONFIG_RFS_ACCEL */
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -1309,6 +1317,9 @@ static const struct net_device_ops virtnet_netdev = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer   = virtnet_filter_rfs,
+#endif
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
-- 
1.7.6.5

^ permalink raw reply related

* [RFC PATCH net-next 1/3] virtio_pci: Introduce one new config api vp_get_vq_irq()
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-1-git-send-email-zwu.kernel@gmail.com>

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

This config api is used to get irq number of the given virtqueue.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 drivers/virtio/virtio_pci.c   |   11 +++++++++++
 include/linux/virtio_config.h |   12 ++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a37c699..85f1aa6 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -654,6 +654,16 @@ static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
 	return 0;
 }
 
+static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	if (vp_dev->msix_enabled)
+		return vp_dev->msix_entries[info->msix_vector].vector;
+	return -1;
+}
+
 static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get		= vp_get,
 	.set		= vp_set,
@@ -666,6 +676,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
+	.get_vq_irq	= vp_get_vq_irq,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index e8f8f71..b70fc47 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -51,6 +51,9 @@
  *      This returns a pointer to the bus name a la pci_name from which
  *      the caller can then copy.
  * @set_vq_affinity: set the affinity for a virtqueue.
+ * @get_vq_irq: get irq of the given virtqueue
+ *	vdev: the virtio_device
+ *	vq: the virtqueue
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -70,6 +73,7 @@ struct virtio_config_ops {
 	void (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
 	int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
+	int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
@@ -135,6 +139,14 @@ int virtqueue_set_affinity(struct virtqueue *vq, int cpu)
 	return 0;
 }
 
+static inline
+int virtqueue_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	if (vdev->config->get_vq_irq)
+		return vdev->config->get_vq_irq(vdev, vq);
+	return -1;
+}
+
 /* Config space accessors. */
 #define virtio_cread(vdev, structname, member, ptr)			\
 	do {								\
-- 
1.7.6.5

^ permalink raw reply related

* [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Zhi Yong Wu @ 2014-01-15 14:20 UTC (permalink / raw)
  To: netdev; +Cc: therbert, edumazet, davem, Zhi Yong Wu

From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

HI, folks

The patchset is trying to integrate aRFS support to virtio_net. In this case,
aRFS will be used to select the RX queue. To make sure that it's going ahead
in the correct direction, although it is still one RFC and isn't tested, it's
post out ASAP. Any comment are appreciated, thanks.

If anyone is interested in playing with it, you can get this patchset from my
dev git on github:
  git://github.com/wuzhy/kernel.git virtnet_rfs

Zhi Yong Wu (3):
  virtio_pci: Introduce one new config api vp_get_vq_irq()
  virtio_net: Introduce one dummy function virtnet_filter_rfs()
  virtio-net: Add accelerated RFS support

 drivers/net/virtio_net.c      |   67 ++++++++++++++++++++++++++++++++++++++++-
 drivers/virtio/virtio_pci.c   |   11 +++++++
 include/linux/virtio_config.h |   12 +++++++
 3 files changed, 89 insertions(+), 1 deletions(-)

-- 
1.7.6.5

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: annie li @ 2014-01-15 14:17 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, xen-devel, netdev, ian.campbell
In-Reply-To: <52D67677.4050407@citrix.com>


On 2014-1-15 19:52, David Vrabel wrote:
> On 15/01/14 11:42, Wei Liu wrote:
>> On Wed, Jan 15, 2014 at 11:20:49AM +0000, David Vrabel wrote:
>>> On 09/01/14 22:48, Annie Li wrote:
>>>> Current netfront only grants pages for grant copy, not for grant transfer, so
>>>> remove corresponding transfer code and add receiving copy code in
>>>> xennet_release_rx_bufs.
>>> While netfront only supports a copying backend, I don't see anything
>>> preventing the backend from retaining mappings to netfront's Rx buffers...
>>>
>> Correct.
>>
>>>> Signed-off-by: Annie Li <Annie.li@oracle.com>
>>>> ---
>>>>   drivers/net/xen-netfront.c |   60 ++-----------------------------------------
>>>>   1 files changed, 3 insertions(+), 57 deletions(-)
>>>>
>>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>>>> index e59acb1..692589e 100644
>>>> --- a/drivers/net/xen-netfront.c
>>>> +++ b/drivers/net/xen-netfront.c
>>>> @@ -1134,78 +1134,24 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
>>>>   
>>>>   static void xennet_release_rx_bufs(struct netfront_info *np)
>>>>   {
>>> [...]
>>>> -		mfn = gnttab_end_foreign_transfer_ref(ref);
>>>> +		gnttab_end_foreign_access_ref(ref, 0);
>>> ... the gnttab_end_foreign_access_ref() may then fail and...
>>>
>> Oh, I see. Andrew was actually referencing this function. Yes, it can
>> fail. Since he omitted "_ref" I looked at the other function when I
>> replied to him...
>>
>>>>   		gnttab_release_grant_reference(&np->gref_rx_head, ref);
>>>>   		np->grant_rx_ref[id] = GRANT_INVALID_REF;
>>> [...]
>>>> +		kfree_skb(skb);
>>> ... this could then potentially free pages that the backend still has
>>> mapped.  If the pages are then reused, this would leak information to
>>> the backend.
>>>
>>> Since only a buggy backend would result in this, leaking the skbs and
>>> grant refs would be acceptable here.  I would also print an error.
>>>
>> How about using gnttab_end_foreign_access. The deferred queue looks like
>> a right solution -- pending page won't get freed until gref is
>> quiescent.
> This is more like the correct approach but I don't think it still quite
> right.  The skb owns the pages so we don't want
> gnttab_end_foreign_access() to free them as freeing the skb will attempt
> to free them again.
>
> Having gnttab_end_foreign_access() do a free just looks odd to me, the
> free isn't paired with any alloc in the grant table code.
>
> It seems more logical to me that granting access takes an additional
> page ref, and then ending access releases that ref.

I am thinking of two ways, and they can be implemented in new patches.
1. If gnttab_end_foreign_access_ref succeeds, then kfree_skb is called 
to free skb. Otherwise, using gnttab_end_foreign_access to release ref 
and pages.
2. Add a similar deferred way of gnttab_end_foreign_access in 
gnttab_end_foreign_access_ref.

Thanks
Annie

^ permalink raw reply

* Re: [PATCH net] bpf: do not use reciprocal divide
From: Eric Dumazet @ 2014-01-15 14:16 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Hannes Frederic Sowa, netdev, dborkman, darkjames-ws,
	Mircea Gherzan, Russell King, Matt Evans, Martin Schwidefsky
In-Reply-To: <20140115080007.GA6638@osiris>

On Wed, 2014-01-15 at 09:00 +0100, Heiko Carstens wrote:
> On Tue, Jan 14, 2014 at 11:02:41PM -0800, Eric Dumazet wrote:
> > diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> > index 16871da37371..e349dc7d0992 100644
> > --- a/arch/s390/net/bpf_jit_comp.c
> > +++ b/arch/s390/net/bpf_jit_comp.c
> > @@ -371,11 +371,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
> >  		/* dr %r4,%r12 */
> >  		EMIT2(0x1d4c);
> >  		break;
> > -	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
> > -		/* m %r4,<d(K)>(%r13) */
> > -		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
> > -		/* lr %r5,%r4 */
> > -		EMIT2(0x1854);
> > +	case BPF_S_ALU_DIV_K: /* A /= K */
> > +		/* lhi %r4,0 */
> > +		EMIT4(0xa7480000);
> > +		/* d %r4,<d(K)>(%r13) */
> > +		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
> >  		break;
> 
> The s390 part looks good.
> 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 01b780856db2..ad30d626a5bd 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -166,7 +165,7 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
> >  			A /= X;
> >  			continue;
> >  		case BPF_S_ALU_DIV_K:
> > -			A = reciprocal_divide(A, K);
> > +			A /= K;
> >  			continue;
> >  		case BPF_S_ALU_MOD_X:
> >  			if (X == 0)
> > @@ -553,11 +552,6 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
> >  		/* Some instructions need special checks */
> >  		switch (code) {
> >  		case BPF_S_ALU_DIV_K:
> > -			/* check for division by zero */
> > -			if (ftest->k == 0)
> > -				return -EINVAL;
> > -			ftest->k = reciprocal_value(ftest->k);
> > -			break;
> 
> Are you sure you want to remove the k == 0 check? Is there something
> else that would prevent a division by zero?

This is done by factoring the two cases, modulo and divide :

 vi +553 net/core/filter.c 

                /* Some instructions need special checks */
                switch (code) {
                case BPF_S_ALU_DIV_K:
                case BPF_S_ALU_MOD_K:
                        /* check for division by zero */
                        if (ftest->k == 0)
                                return -EINVAL;
                        break;

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: annie li @ 2014-01-15 14:16 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, netdev, wei.liu2, ian.campbell
In-Reply-To: <52D66F11.204@citrix.com>


On 2014-1-15 19:20, David Vrabel wrote:
> On 09/01/14 22:48, Annie Li wrote:
>> Current netfront only grants pages for grant copy, not for grant transfer, so
>> remove corresponding transfer code and add receiving copy code in
>> xennet_release_rx_bufs.
> While netfront only supports a copying backend, I don't see anything
> preventing the backend from retaining mappings to netfront's Rx buffers...

Right. This does not prevent backend from mappings.
Maybe my description is not clear. What I mean here is based on old 
2.6.18 netfront which uses "copying_receiver" to tell netback whether rx 
requires grant copy. Probably changing "grant copy" above into "grant 
access" vs "grant transfer" is more precise. And the 
"gnttab_end_foreign_transfer_ref" is the unnecessary code kept from old 
netfront.

>
>> Signed-off-by: Annie Li <Annie.li@oracle.com>
>> ---
>>   drivers/net/xen-netfront.c |   60 ++-----------------------------------------
>>   1 files changed, 3 insertions(+), 57 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index e59acb1..692589e 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -1134,78 +1134,24 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
>>   
>>   static void xennet_release_rx_bufs(struct netfront_info *np)
>>   {
> [...]
>> -		mfn = gnttab_end_foreign_transfer_ref(ref);
>> +		gnttab_end_foreign_access_ref(ref, 0);
> ... the gnttab_end_foreign_access_ref() may then fail and...
>
>>   		gnttab_release_grant_reference(&np->gref_rx_head, ref);
>>   		np->grant_rx_ref[id] = GRANT_INVALID_REF;
> [...]
>> +		kfree_skb(skb);
> ... this could then potentially free pages that the backend still has
> mapped.  If the pages are then reused, this would leak information to
> the backend.

Yes, it is possible. But doing kfree_skb is right thing from netfront 
point of view.

>
> Since only a buggy backend would result in this, leaking the skbs and
> grant refs would be acceptable here.

This is the same thing for tx side which uses similar process.

>    I would also print an error.

You mean add some print log here? Is it necessary?

Thanks
Annie
>
> While checking blkfront for how it handles this, it also doesn't appear
> to do the right thing either.
>
> David
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC] sysfs_rename_link() and its usage
From: Tejun Heo @ 2014-01-15 14:16 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Veaceslav Falico, Greg KH, linux-kernel, netdev
In-Reply-To: <87iotmc9ic.fsf@xmission.com>

Hey, Veaceslav, Eric.

On Tue, Jan 14, 2014 at 05:35:23PM -0800, Eric W. Biederman wrote:
> >>> >>This works like a charm. However, if I want to use (obviously, with the
> >>> >>symlink present):
> >>> >>
> >>> >>sysfs_rename_link(&(a->dev.kobj), &(b->dev.kobj), oldname, newname);
> >>> >
> >>> >You forgot the namespace option to this call, what kernel version are
> >>> >you using here?
> >>>
> >>> It's git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ,
> >>> 3.13-rc6 with some networking patches on top of it.

Does this work on 3.12?  How about Greg's driver-core-next?  Do you
have a minimal test case that I can use to reproduce the issue?

> sysfs_rename_link was originally written to infer figure everything out
> based on the target device.  Because symlinks only make sense being in
> the same namespace of their devices.  Tejun broke the inference and now
> you are having hard time using the code.

I wouldn't be surprised if I broke it but

> The commit that broke things was:
> 
> commit 4b30ee58ee64c64f59fd876e4afa6ed82caef3a4
> Author: Tejun Heo <tj@kernel.org>
> Date:   Wed Sep 11 22:29:06 2013 -0400
> 
>     sysfs: remove ktype->namespace() invocations in symlink code

I'm not so sure about the above commit.  The warning is from rename
failing to locate the existing node while the above patch updates the
target ns to rename to.  The old_ns part stays the same before and
after the commit.

Veaceslav, please confirm whether the issue is reproducible w/ v3.12.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next] xen-netfront: clean up code in xennet_release_rx_bufs
From: annie li @ 2014-01-15 14:15 UTC (permalink / raw)
  To: Andrew Bennieston; +Cc: Wei Liu, netdev, ian.campbell, xen-devel
In-Reply-To: <52D66ADF.9070401@citrix.com>


On 2014-1-15 19:02, Andrew Bennieston wrote:
> On 15/01/14 10:07, Wei Liu wrote:
>> On Fri, Jan 10, 2014 at 06:48:38AM +0800, Annie Li wrote:
>>> Current netfront only grants pages for grant copy, not for grant 
>>> transfer, so
>>> remove corresponding transfer code and add receiving copy code in
>>> xennet_release_rx_bufs.
>>>
>>
>> This path seldom gets call -- not that many people unload xen-netfront
>> driver. If Annie has tested this patch and it works as expected I think
>> it's fine.
>>
> In XenServer we have seen a number of cases where unplugging and 
> replugging VIFs results in leakage of grant references, eventually 
> leading to a case where you cannot plug a VIF (after ~ 400 such 
> cycles)...
>
> It's worth pointing out, as far as this patch is concerned, that 
> gnttab_end_foreign_access() can fail, 

Just like what Wei mentioned, it is gnttab_end_foreign_access_ref here, 
right?

> which is not taken into account here.

Good point, gnttab_end_foreign_access_ref fails for grant which is in use.

Thanks
Annie
>
> Andrew.
>
>> I'm not netfront maintainer but I'm happy to add
>> Acked-by: Wei Liu <wei.liu2@citrix.com>
>> if Annie confirms she's tested this patch.
>>
>> Wei.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>>
>

^ permalink raw reply

* [PATCH 2/2] ixgbe: set driver_max_VFs should be done before enabling SRIOV
From: Ethan Zhao @ 2014-01-15 14:15 UTC (permalink / raw)
  To: aaron.f.brown, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, davem
  Cc: e1000-devel, netdev, Ethan Zhao, linux-kernel

commit 43dc4e01 Limit number of reported VFs to device specific value
It doesn't work and always returns -EBUSY because VFs ware already enabled.

ixgbe_enable_sriov()
        pci_enable_sriov()
                sriov_enable()
                {
                ... ..
                iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
                pci_cfg_access_lock(dev);
                ... ...
                }

pci_sriov_set_totalvfs()
{
... ...
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
                return -EBUSY;
...
}

So should set driver_max_VFs with pci_sriov_set_totalvfs() before
enable VFs with ixgbe_enable_sriov().

Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 47e9b44..bfc8e94 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7639,8 +7639,8 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Mailbox */
 	ixgbe_init_mbx_params_pf(hw);
 	memcpy(&hw->mbx.ops, ii->mbx_ops, sizeof(hw->mbx.ops));
-	ixgbe_enable_sriov(adapter);
 	pci_sriov_set_totalvfs(pdev, IXGBE_MAX_VFS_DRV_LIMIT);
+	ixgbe_enable_sriov(adapter);
 skip_sriov:
 
 #endif
-- 
1.8.3.4 (Apple Git-47)


------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply related

* [PATCH 1/2 v3] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
From: Ethan Zhao @ 2014-01-15 14:12 UTC (permalink / raw)
  To: aaron.f.brown, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, davem
  Cc: e1000-devel, netdev, linux-kernel, Ethan Zhao

Because ixgbe driver limit the max number of VF functions could be enabled
to 63, so define one macro IXGBE_MAX_VFS_DRV_LIMIT and cleanup the const 63
in code.

v2: fix a typo.
v3: fix a encoding issue.

Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 5 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h | 5 +++++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..47e9b44 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4818,7 +4818,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 #ifdef CONFIG_PCI_IOV
 	/* assign number of SR-IOV VFs */
 	if (hw->mac.type != ixgbe_mac_82598EB)
-		adapter->num_vfs = (max_vfs > 63) ? 0 : max_vfs;
+		adapter->num_vfs = (max_vfs > IXGBE_MAX_VFS_DRV_LIMIT) ? 0 : max_vfs;
 
 #endif
 	/* enable itr by default in dynamic mode */
@@ -7640,7 +7640,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	ixgbe_init_mbx_params_pf(hw);
 	memcpy(&hw->mbx.ops, ii->mbx_ops, sizeof(hw->mbx.ops));
 	ixgbe_enable_sriov(adapter);
-	pci_sriov_set_totalvfs(pdev, 63);
+	pci_sriov_set_totalvfs(pdev, IXGBE_MAX_VFS_DRV_LIMIT);
 skip_sriov:
 
 #endif
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 276d7b1..6925979 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -152,7 +152,8 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
 		 * physical function.  If the user requests greater thn
 		 * 63 VFs then it is an error - reset to default of zero.
 		 */
-		adapter->num_vfs = min_t(unsigned int, adapter->num_vfs, 63);
+		adapter->num_vfs = min_t(unsigned int,
+					adapter->num_vfs, IXGBE_MAX_VFS_DRV_LIMIT);
 
 		err = pci_enable_sriov(adapter->pdev, adapter->num_vfs);
 		if (err) {
@@ -259,7 +260,7 @@ static int ixgbe_pci_sriov_enable(struct pci_dev *dev, int num_vfs)
 	 * PF.  The PCI bus driver already checks for other values out of
 	 * range.
 	 */
-	if (num_vfs > 63) {
+	if (num_vfs > IXGBE_MAX_VFS_DRV_LIMIT) {
 		err = -EPERM;
 		goto err_out;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index 4713f9f..2593666 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -28,6 +28,11 @@
 #ifndef _IXGBE_SRIOV_H_
 #define _IXGBE_SRIOV_H_
 
+/*  ixgbe driver limit the max number of VFs could be enabled to
+ *  63 (IXGBE_MAX_VF_FUNCTIONS - 1)
+ */
+#define IXGBE_MAX_VFS_DRV_LIMIT  (IXGBE_MAX_VF_FUNCTIONS - 1)
+
 void ixgbe_restore_vf_multicasts(struct ixgbe_adapter *adapter);
 void ixgbe_msg_task(struct ixgbe_adapter *adapter);
 int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask);
-- 
1.8.3.4 (Apple Git-47)

^ permalink raw reply related

* Re: [PATCH net-next] sctp: create helper function to enable|disable sackdelay
From: Neil Horman @ 2014-01-15 14:10 UTC (permalink / raw)
  To: Wang Weidong; +Cc: Vlad Yasevich, David Miller, linux-sctp, netdev
In-Reply-To: <52D653B1.4040708@huawei.com>

On Wed, Jan 15, 2014 at 05:24:01PM +0800, Wang Weidong wrote:
> add sctp_spp_sackdelay_{enable|disable} helper function for
> avoiding code duplication. 
> 
> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox