Netdev List
* Re: [PATCH 1/1 net-next] ipx: replace __inline__ by inline
From: David Miller @ 2014-10-27 19:46 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, acme, netdev
In-Reply-To: <1414433352-29210-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 19:09:12 +0100

> Signed-off-by: Fabian Frederick <fabf@skynet.be>

If it's in a foo.c file, just kill the inline completely and let the
compiler decide.

Thanks.


* [PATCH V2 net-next] ipx: remove unnecessary casting on ntohl
From: Fabian Frederick @ 2014-10-27 19:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: joe, Fabian Frederick, Arnaldo Carvalho de Melo, David S. Miller,
	netdev

use %08X instead of %08lX and remove casting.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
---
V2: remove casting instead of long unsigned int -> unsigned long
(suggested by Joe Perches)

 net/ipx/ipx_proc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipx/ipx_proc.c b/net/ipx/ipx_proc.c
index e15c16a..8391191 100644
--- a/net/ipx/ipx_proc.c
+++ b/net/ipx/ipx_proc.c
@@ -89,8 +89,8 @@ static int ipx_seq_route_show(struct seq_file *seq, void *v)
 
 	seq_printf(seq, "%08lX   ", (unsigned long int)ntohl(rt->ir_net));
 	if (rt->ir_routed)
-		seq_printf(seq, "%08lX     %02X%02X%02X%02X%02X%02X\n",
-			   (long unsigned int)ntohl(rt->ir_intrfc->if_netnum),
+		seq_printf(seq, "%08X     %02X%02X%02X%02X%02X%02X\n",
+			   ntohl(rt->ir_intrfc->if_netnum),
 			   rt->ir_router_node[0], rt->ir_router_node[1],
 			   rt->ir_router_node[2], rt->ir_router_node[3],
 			   rt->ir_router_node[4], rt->ir_router_node[5]);
-- 
1.9.1
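
The format change in this patch can be tried out in userspace. This is only a sketch: format_netnum() and the sample value are made up for illustration and are not part of net/ipx. ntohl() returns a 32-bit value, so it can be printed with a 32-bit hex conversion and no cast to unsigned long. The kernel patch spells that "%08X" because u32 is unsigned int there; portable userspace code would more likely use PRIX32 from <inttypes.h>, as below.

```c
#include <arpa/inet.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a network-order 32-bit number as 8 hex digits. */
static int format_netnum(char *buf, size_t len, uint32_t net_be)
{
	/* ntohl() yields a uint32_t, so no cast to unsigned long is needed. */
	return snprintf(buf, len, "%08" PRIX32, ntohl(net_be));
}
```

Calling format_netnum(buf, sizeof(buf), htonl(0xABCD)) on a 16-byte buffer fills it with an 8-digit, zero-padded, upper-case hex string.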


* Re: [PATCH 1/1 net-next] net/irda: include linux/uaccess.h instead of asm/uaccess.h
From: David Miller @ 2014-10-27 20:04 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, samuel, netdev
In-Reply-To: <1414432809-29054-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 19:00:08 +0100

> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied.


* Re: [PATCH 1/1] ipv4: remove set but unused variable sha
From: David Miller @ 2014-10-27 20:04 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1414433002-29106-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 19:03:22 +0100

> unsigned char *sha (source) was already in original git version
>  but was never used.
> 
> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied.


* Re: [PATCH 1/1 net-next] ipv6: replace min/casting by min_t
From: David Miller @ 2014-10-27 20:04 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1414433516-29278-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 19:11:56 +0100

> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied.


* Re: [PATCH 1/1 net-next] ipv6: include linux/uaccess.h instead of asm/uaccess.h
From: David Miller @ 2014-10-27 20:04 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1414433578-29321-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 19:12:58 +0100

> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied.


* Re: [PATCH 1/1 net-next] ipx: move extern sysctl_ipx_pprop_broadcasting to header file
From: David Miller @ 2014-10-27 20:04 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, acme, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1414436441-4784-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 20:00:41 +0100

> include ipx.h from sysctl_net_ipx.c
> 
> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied.


* Re: [PATCH V2 net-next] ipx: remove unnecessary casting on ntohl
From: David Miller @ 2014-10-27 20:04 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, joe, acme, netdev
In-Reply-To: <1414439709-5870-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 20:55:08 +0100

> use %08X instead of %08lX and remove casting.
> 
> Suggested-by: Joe Perches <joe@perches.com>
> Signed-off-by: Fabian Frederick <fabf@skynet.be>
> ---
> V2: remove casting instead of long unsigned int -> unsigned long
> (suggested by Joe Perches)

Applied.


* [PATCH V2 net-next] ipx: remove __inline__ in c file on static
From: Fabian Frederick @ 2014-10-27 20:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Fabian Frederick, Arnaldo Carvalho de Melo, David S. Miller,
	netdev

Let compiler decide what to do with static void __ipxitf_put()

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
---
V2: remove __inline__ instead of replacing it by standard inline
(suggested by David S. Miller)

 net/ipx/af_ipx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index 91729b8..313ef46 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -306,7 +306,7 @@ void ipxitf_down(struct ipx_interface *intrfc)
 	spin_unlock_bh(&ipx_interfaces_lock);
 }
 
-static __inline__ void __ipxitf_put(struct ipx_interface *intrfc)
+static void __ipxitf_put(struct ipx_interface *intrfc)
 {
 	if (atomic_dec_and_test(&intrfc->refcnt))
 		__ipxitf_down(intrfc);
-- 
1.9.1
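
A userspace model of the helper touched by this patch, assuming C11 atomics in place of the kernel's atomic_dec_and_test(). struct iface, iface_down() and iface_put() are hypothetical stand-ins, not the real net/ipx code. It shows a file-local "put" function declared plain static, with the inlining decision left to the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ipx_interface; only the refcount matters. */
struct iface {
	atomic_int refcnt;
	bool torn_down;
};

/* Stand-in for __ipxitf_down(): records that teardown ran. */
static void iface_down(struct iface *intrfc)
{
	intrfc->torn_down = true;
}

/*
 * Plain "static", no inline keyword: the function is file-local, so the
 * compiler sees every caller and can decide for itself whether to inline.
 */
static void iface_put(struct iface *intrfc)
{
	/* atomic_fetch_sub() returns the old value; 1 means last reference. */
	if (atomic_fetch_sub(&intrfc->refcnt, 1) == 1)
		iface_down(intrfc);
}
```

With a starting count of 2, the first iface_put() leaves the object alone and the second one runs the teardown.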


* Re: [PATCH V2 net-next] ipx: remove __inline__ in c file on static
From: David Miller @ 2014-10-27 20:25 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, acme, netdev
In-Reply-To: <1414440728-6144-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Mon, 27 Oct 2014 21:12:08 +0100

> Let compiler decide what to do with static void __ipxitf_put()
> 
> Suggested-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Fabian Frederick <fabf@skynet.be>
> ---
> V2: remove __inline__ instead of replacing it by standard inline
> (suggested by David S. Miller)

Applied, thanks Fabian.


* [PATCH] net: smc91x: Fix gpios for device tree based booting
From: Tony Lindgren @ 2014-10-27 20:25 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, devicetree, linux-omap, Kevin Hilman

With legacy booting, the platform init code took care of configuring
the GPIOs. With device tree based booting, things may or may not work
depending on what the bootloader has configured and on whether the
legacy platform code gets called.

Let's add support for the pwrdwn and reset GPIOs to the smc91x
driver so that smc91x works properly when booted in device tree
mode.

And let's change n900 to use these settings, as some versions
of the bootloader do not configure things properly, causing
errors.

Reported-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
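
The optional-GPIO handling described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: struct mock_gpio, mock_gpio_get(), mock_gpio_set() and try_toggle_optional() are hypothetical mocks, not the kernel gpiod API. The point it illustrates is that a control line missing from the device tree is not an error, while a line that is present gets driven to the opposite level first and then to its final level.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mock of a GPIO line; not the kernel's struct gpio_desc. */
struct mock_gpio {
	int present;	/* 0 models "property missing from the device tree" */
	int value;	/* last level driven on the line */
	int writes;	/* number of times the level was set */
};

/* Mock lookup: returns 0 and the line, or -ENOENT when it is not described. */
static int mock_gpio_get(struct mock_gpio *line, struct mock_gpio **out)
{
	if (!line->present)
		return -ENOENT;
	*out = line;
	return 0;
}

static void mock_gpio_set(struct mock_gpio *gpio, int value)
{
	gpio->value = value;
	gpio->writes++;
}

/*
 * Pulse an optional control line: drive it to !value, then settle on value.
 * A missing line is not an error; *desc is set to NULL and 0 is returned.
 */
static int try_toggle_optional(struct mock_gpio *line, struct mock_gpio **desc,
			       int value)
{
	struct mock_gpio *gpio = NULL;
	int res = mock_gpio_get(line, &gpio);

	if (res == -ENOENT) {
		*desc = NULL;
		return 0;
	}
	if (res)
		return res;

	mock_gpio_set(gpio, !value);	/* drive the opposite level first */
	/* a real driver would sleep here for the datasheet's minimum pulse */
	mock_gpio_set(gpio, value);	/* then leave the line at its final level */
	*desc = gpio;
	return 0;
}
```

Callers can then test the returned descriptor for NULL to decide whether any follow-up delay is needed, which is what the probe path in the patch does for the reset line.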

--- a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
+++ b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
@@ -11,3 +11,5 @@ Optional properties:
   are supported on the device.  Valid value for SMSC LAN91c111 are
   1, 2 or 4.  If it's omitted or invalid, the size would be 2 meaning
   16-bit access only.
+- power-gpios: GPIO to control the PWRDWN pin
+- reset-gpios: GPIO to control the RESET pin
--- a/arch/arm/boot/dts/omap3-n900.dts
+++ b/arch/arm/boot/dts/omap3-n900.dts
@@ -697,6 +697,8 @@
 		bank-width = <2>;
 		pinctrl-names = "default";
 		pinctrl-0 = <&ethernet_pins>;
+		power-gpios = <&gpio3 22 GPIO_ACTIVE_HIGH>;	/* gpio86 */
+		reset-gpios = <&gpio6 4 GPIO_ACTIVE_HIGH>;	/* gpio164 */
 		gpmc,device-width = <2>;
 		gpmc,sync-clk-ps = <0>;
 		gpmc,cs-on-ns = <0>;
--- a/arch/arm/mach-omap2/pdata-quirks.c
+++ b/arch/arm/mach-omap2/pdata-quirks.c
@@ -253,9 +253,6 @@ static void __init nokia_n900_legacy_init(void)
 		platform_device_register(&omap3_rom_rng_device);
 
 	}
-
-	/* Only on some development boards */
-	gpio_request_one(164, GPIOF_OUT_INIT_LOW, "smc91x reset");
 }
 
 static void __init omap3_tao3530_legacy_init(void)
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -81,6 +81,7 @@ static const char version[] =
 #include <linux/workqueue.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
+#include <linux/of_gpio.h>
 
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
@@ -2190,6 +2191,41 @@ static const struct of_device_id smc91x_match[] = {
 MODULE_DEVICE_TABLE(of, smc91x_match);
 #endif
 
+/**
+ * try_toggle_control_gpio - configure a gpio if it exists
+ */
+static int try_toggle_control_gpio(struct device *dev,
+				   struct gpio_desc **desc,
+				   const char *name, int index,
+				   int value, unsigned int nsdelay)
+{
+	struct gpio_desc *gpio = *desc;
+	int res;
+
+	gpio = devm_gpiod_get_index(dev, name, index);
+	if (IS_ERR(gpio)) {
+		if (PTR_ERR(gpio) == -ENOENT) {
+			*desc = NULL;
+			return 0;
+		}
+
+		return PTR_ERR(gpio);
+	}
+	res = gpiod_direction_output(gpio, !value);
+	if (res) {
+		dev_err(dev, "unable to toggle gpio %s: %i\n", name, res);
+		devm_gpiod_put(dev, gpio);
+		gpio = NULL;
+		return res;
+	}
+	if (nsdelay)
+		usleep_range(nsdelay, 2 * nsdelay);
+	gpiod_set_value_cansleep(gpio, value);
+	*desc = gpio;
+
+	return 0;
+}
+
 /*
  * smc_init(void)
  *   Input parameters:
@@ -2237,6 +2273,28 @@ static int smc_drv_probe(struct platform_device *pdev)
 		struct device_node *np = pdev->dev.of_node;
 		u32 val;
 
+		/* Optional pwrdwn GPIO configured? */
+		ret = try_toggle_control_gpio(&pdev->dev, &lp->power_gpio,
+					      "power", 0, 0, 100);
+		if (ret)
+			return ret;
+
+		/*
+		 * Optional reset GPIO configured? Minimum 100 ns reset needed
+		 * according to LAN91C96 datasheet page 14.
+		 */
+		ret = try_toggle_control_gpio(&pdev->dev, &lp->reset_gpio,
+					      "reset", 0, 0, 100);
+		if (ret)
+			return ret;
+
+		/*
+		 * Need to wait for optional EEPROM to load, max 750 us according
+		 * to LAN91C96 datasheet page 55.
+		 */
+		if (lp->reset_gpio)
+			usleep_range(750, 1000);
+
 		/* Combination of IO widths supported, default to 16-bit */
 		if (!of_property_read_u32(np, "reg-io-width", &val)) {
 			if (val & 1)
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -298,6 +298,9 @@ struct smc_local {
 	struct sk_buff *pending_tx_skb;
 	struct tasklet_struct tx_task;
 
+	struct gpio_desc *power_gpio;
+	struct gpio_desc *reset_gpio;
+
 	/* version/revision of the SMC91x chip */
 	int	version;
 


* Re: irq disable in __netdev_alloc_frag() ?
From: Jesper Dangaard Brouer @ 2014-10-27 20:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: brouer, Alexander Duyck, Alexei Starovoitov, Eric Dumazet,
	Network Development, Christoph Lameter
In-Reply-To: <1414036276.2094.18.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, 22 Oct 2014 20:51:16 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On my hosts, this hard irq masking is pure noise.

On my hosts I can measure a significant difference between using
local_irq_disable() and local_irq_save(flags):

 *  2.860 ns cost for local_irq_{disable,enable} 
 * 14.840 ns cost for local_irq_save()+local_irq_restore() 

This is quite significant in my nanosec world ;-)

 
> What CPU are you using Alexander ?

I'm using an E5-2695 (Ivy Bridge).

You can easily reproduce my results on your own system with my
time_bench_sample module here:
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c#L173

> Same could be done with some kmem_cache_alloc() : SLAB uses hard irq
> masking while some caches are never used from hard irq context.

Sounds interesting.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Jay Vosburgh @ 2014-10-27 20:43 UTC (permalink / raw)
  To: paulmck
  Cc: Yanko Kaneti, Josh Boyer, Eric W. Biederman, Cong Wang,
	Kevin Fenzi, netdev, Linux-Kernel@Vger. Kernel. Org, mroos, tj
In-Reply-To: <20141027174539.GC27568@linux.vnet.ibm.com>

Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

>On Sat, Oct 25, 2014 at 11:18:27AM -0700, Paul E. McKenney wrote:
>> On Sat, Oct 25, 2014 at 09:38:16AM -0700, Jay Vosburgh wrote:
>> > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>> > 
>> > >On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
>> > >> 	Looking at the dmesg, the early boot messages seem to be
>> > >> confused as to how many CPUs there are, e.g.,
>> > >> 
>> > >> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> > >> [    0.000000] Hierarchical RCU implementation.
>> > >> [    0.000000]  RCU debugfs-based tracing is enabled.
>> > >> [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
>> > >> [    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
>> > >> [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
>> > >> [    0.000000] NR_IRQS:16640 nr_irqs:456 0
>> > >> [    0.000000]  Offload RCU callbacks from all CPUs
>> > >> [    0.000000]  Offload RCU callbacks from CPUs: 0-3.
>> > >> 
>> > >> 	but later shows 2:
>> > >> 
>> > >> [    0.233703] x86: Booting SMP configuration:
>> > >> [    0.236003] .... node  #0, CPUs:      #1
>> > >> [    0.255528] x86: Booted up 1 node, 2 CPUs
>> > >> 
>> > >> 	In any event, the E8400 is a 2 core CPU with no hyperthreading.
>> > >
>> > >Well, this might explain some of the difficulties.  If RCU decides to wait
>> > >on CPUs that don't exist, we will of course get a hang.  And rcu_barrier()
>> > >was definitely expecting four CPUs.
>> > >
>> > >So what happens if you boot with maxcpus=2?  (Or build with
>> > >CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang.  If so,
>> > >I might have some ideas for a real fix.
>> > 
>> > 	Booting with maxcpus=2 makes no difference (the dmesg output is
>> > the same).
>> > 
>> > 	Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
>> > dmesg has different CPU information at boot:
>> > 
>> > [    0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
>> > [    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
>> >  [...]
>> > [    0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
>> >  [...]
>> > [    0.000000] Hierarchical RCU implementation.
>> > [    0.000000] 	RCU debugfs-based tracing is enabled.
>> > [    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
>> > [    0.000000] NR_IRQS:4352 nr_irqs:440 0
>> > [    0.000000] 	Offload RCU callbacks from all CPUs
>> > [    0.000000] 	Offload RCU callbacks from CPUs: 0-1.
>> 
>> Thank you -- this confirms my suspicions on the fix, though I must admit
>> to being surprised that maxcpus made no difference.
>
>And here is an alleged fix, lightly tested at this end.  Does this patch
>help?

	This patch appears to make the problem go away; I've run about
10 iterations.  I applied this patch to the same -net tree I was using
previously (-net as of Oct 22), with all other test patches removed.

	FWIW, dmesg is unchanged, and still shows messages like:

[    0.000000]  Offload RCU callbacks from CPUs: 0-3.

Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>

	-J

>							Thanx, Paul
>
>------------------------------------------------------------------------
>
>rcu: Make rcu_barrier() understand about missing rcuo kthreads
>
>Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
>avoids creating rcuo kthreads for CPUs that never come online.  This
>fixes a bug in many instances of firmware: Instead of lying about their
>age, these systems instead lie about the number of CPUs that they have.
>Before commit 35ce7f29a44a, this could result in huge numbers of useless
>rcuo kthreads being created.
>
>It appears that experience indicates that I should have told the
>people suffering from this problem to fix their broken firmware, but
>I instead produced what turned out to be a partial fix.   The missing
>piece supplied by this commit makes sure that rcu_barrier() knows not to
>post callbacks for no-CBs CPUs that have not yet come online, because
>otherwise rcu_barrier() will hang on systems having firmware that lies
>about the number of CPUs.
>
>It is tempting to simply have rcu_barrier() refuse to post a callback on
>any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
>does not work because rcu_barrier() is required to wait for all pending
>callbacks.  It is therefore required to wait even for those callbacks
>that cannot possibly be invoked.  Even if doing so hangs the system.
>
>Given that posting a callback to a no-CBs CPU that does not yet have an
>rcuo kthread can hang rcu_barrier(), it is tempting to report an error
>in this case.  Unfortunately, this will result in false positives at
>boot time, when it is perfectly legal to post callbacks to the boot CPU
>before the scheduler has started, in other words, before it is legal
>to invoke rcu_barrier().
>
>So this commit instead has rcu_barrier() avoid posting callbacks to
>CPUs having neither rcuo kthread nor pending callbacks, and has it
>complain bitterly if it finds CPUs having no rcuo kthread but some
>pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
>kthread but pending callbacks, as noted earlier, it has no choice but
>to hang indefinitely.
>
>Reported-by: Yanko Kaneti <yaneti@declera.com>
>Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>Reported-by: Meelis Roos <mroos@linux.ee>
>Reported-by: Eric B Munson <emunson@akamai.com>
>Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
>diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
>index aa8e5eea3ab4..c78e88ce5ea3 100644
>--- a/include/trace/events/rcu.h
>+++ b/include/trace/events/rcu.h
>@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
> /*
>  * Tracepoint for _rcu_barrier() execution.  The string "s" describes
>  * the _rcu_barrier phase:
>- *	"Begin": rcu_barrier_callback() started.
>- *	"Check": rcu_barrier_callback() checking for piggybacking.
>- *	"EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
>- *	"Inc1": rcu_barrier_callback() piggyback check counter incremented.
>- *	"Offline": rcu_barrier_callback() found offline CPU
>- *	"OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
>- *	"OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
>- *	"OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
>+ *	"Begin": _rcu_barrier() started.
>+ *	"Check": _rcu_barrier() checking for piggybacking.
>+ *	"EarlyExit": _rcu_barrier() piggybacked, thus early exit.
>+ *	"Inc1": _rcu_barrier() piggyback check counter incremented.
>+ *	"OfflineNoCB": _rcu_barrier() found callback on never-online CPU
>+ *	"OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
>+ *	"OnlineQ": _rcu_barrier() found online CPU with callbacks.
>+ *	"OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
>  *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
>  *	"CB": An rcu_barrier_callback() invoked a callback, not the last.
>  *	"LastCB": An rcu_barrier_callback() invoked the last callback.
>- *	"Inc2": rcu_barrier_callback() piggyback check counter incremented.
>+ *	"Inc2": _rcu_barrier() piggyback check counter incremented.
>  * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
>  * is the count of remaining callbacks, and "done" is the piggybacking count.
>  */
>diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>index f6880052b917..7680fc275036 100644
>--- a/kernel/rcu/tree.c
>+++ b/kernel/rcu/tree.c
>@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
> 			continue;
> 		rdp = per_cpu_ptr(rsp->rda, cpu);
> 		if (rcu_is_nocb_cpu(cpu)) {
>-			_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
>-					   rsp->n_barrier_done);
>-			atomic_inc(&rsp->barrier_cpu_count);
>-			__call_rcu(&rdp->barrier_head, rcu_barrier_callback,
>-				   rsp, cpu, 0);
>+			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
>+				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
>+						   rsp->n_barrier_done);
>+			} else {
>+				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
>+						   rsp->n_barrier_done);
>+				atomic_inc(&rsp->barrier_cpu_count);
>+				__call_rcu(&rdp->barrier_head,
>+					   rcu_barrier_callback, rsp, cpu, 0);
>+			}
> 		} else if (ACCESS_ONCE(rdp->qlen)) {
> 			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
> 					   rsp->n_barrier_done);
>diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
>index 4beab3d2328c..8e7b1843896e 100644
>--- a/kernel/rcu/tree.h
>+++ b/kernel/rcu/tree.h
>@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
> static void print_cpu_stall_info_end(void);
> static void zero_cpu_stall_ticks(struct rcu_data *rdp);
> static void increment_cpu_stall_ticks(void);
>+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
> static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
> static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
> static void rcu_init_one_nocb(struct rcu_node *rnp);
>diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>index 927c17b081c7..68c5b23b7173 100644
>--- a/kernel/rcu/tree_plugin.h
>+++ b/kernel/rcu/tree_plugin.h
>@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
> }
> 
> /*
>+ * Does the specified CPU need an RCU callback for the specified flavor
>+ * of rcu_barrier()?
>+ */
>+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
>+{
>+	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
>+	struct rcu_head *rhp;
>+
>+	/* No-CBs CPUs might have callbacks on any of three lists. */
>+	rhp = ACCESS_ONCE(rdp->nocb_head);
>+	if (!rhp)
>+		rhp = ACCESS_ONCE(rdp->nocb_gp_head);
>+	if (!rhp)
>+		rhp = ACCESS_ONCE(rdp->nocb_follower_head);
>+
>+	/* Having no rcuo kthread but CBs after scheduler starts is bad! */
>+	if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
>+		/* RCU callback enqueued before CPU first came online??? */
>+		pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
>+		       cpu, rhp->func);
>+		WARN_ON_ONCE(1);
>+	}
>+
>+	return !!rhp;
>+}
>+
>+/*
>  * Enqueue the specified string of rcu_head structures onto the specified
>  * CPU's no-CBs lists.  The CPU is specified by rdp, the head of the
>  * string by rhp, and the tail of the string by rhtp.  The non-lazy/lazy
>@@ -2646,6 +2673,11 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
> 
> #else /* #ifdef CONFIG_RCU_NOCB_CPU */
> 
>+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
>+{
>+	return false; /* Without CONFIG_RCU_NOCB_CPU there are no no-CBs CPUs. */
>+}
>+
> static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> {
> }
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
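
For readers following the quoted patch, the decision made by rcu_nocb_cpu_needs_barrier() can be modelled in plain C as below. This is a simplified sketch, not the kernel code: struct cpu_model and needs_barrier() are hypothetical names, the three list heads are reduced to opaque pointers, and the warning goes to stderr instead of pr_err()/WARN_ON_ONCE(). It shows that a no-CBs CPU only gets a barrier callback posted when one of its three callback lists is non-empty, and that a non-empty list without an rcuo kthread is reported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model of the per-CPU state the patch inspects. */
struct cpu_model {
	const void *nocb_head;		/* first of three callback lists */
	const void *nocb_gp_head;	/* second callback list */
	const void *nocb_follower_head;	/* third callback list */
	bool has_kthread;		/* does an rcuo kthread exist for this CPU? */
};

/* Mirrors the patch's check: true if any of the three lists is non-empty. */
static bool needs_barrier(const struct cpu_model *c, int cpu)
{
	const void *rhp = c->nocb_head;

	if (!rhp)
		rhp = c->nocb_gp_head;
	if (!rhp)
		rhp = c->nocb_follower_head;

	/* Callbacks but no kthread to invoke them: complain loudly. */
	if (!c->has_kthread && rhp)
		fprintf(stderr, "never-onlined no-CBs CPU %d has a callback\n",
			cpu);

	return rhp != NULL;
}
```

A CPU with all three lists empty returns false, so the barrier skips it. That is the case that applies to CPUs the firmware reported but that never came online.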


* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Paul E. McKenney @ 2014-10-27 21:07 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Yanko Kaneti, Josh Boyer, Eric W. Biederman, Cong Wang,
	Kevin Fenzi, netdev, Linux-Kernel@Vger. Kernel. Org, mroos, tj
In-Reply-To: <25166.1414442601@famine>

On Mon, Oct 27, 2014 at 01:43:21PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> >On Sat, Oct 25, 2014 at 11:18:27AM -0700, Paul E. McKenney wrote:
> >> On Sat, Oct 25, 2014 at 09:38:16AM -0700, Jay Vosburgh wrote:
> >> > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >> > 
> >> > >On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
> >> > >> 	Looking at the dmesg, the early boot messages seem to be
> >> > >> confused as to how many CPUs there are, e.g.,
> >> > >> 
> >> > >> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> > >> [    0.000000] Hierarchical RCU implementation.
> >> > >> [    0.000000]  RCU debugfs-based tracing is enabled.
> >> > >> [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
> >> > >> [    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
> >> > >> [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> >> > >> [    0.000000] NR_IRQS:16640 nr_irqs:456 0
> >> > >> [    0.000000]  Offload RCU callbacks from all CPUs
> >> > >> [    0.000000]  Offload RCU callbacks from CPUs: 0-3.
> >> > >> 
> >> > >> 	but later shows 2:
> >> > >> 
> >> > >> [    0.233703] x86: Booting SMP configuration:
> >> > >> [    0.236003] .... node  #0, CPUs:      #1
> >> > >> [    0.255528] x86: Booted up 1 node, 2 CPUs
> >> > >> 
> >> > >> 	In any event, the E8400 is a 2 core CPU with no hyperthreading.
> >> > >
> >> > >Well, this might explain some of the difficulties.  If RCU decides to wait
> >> > >on CPUs that don't exist, we will of course get a hang.  And rcu_barrier()
> >> > >was definitely expecting four CPUs.
> >> > >
> >> > >So what happens if you boot with maxcpus=2?  (Or build with
> >> > >CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang.  If so,
> >> > >I might have some ideas for a real fix.
> >> > 
> >> > 	Booting with maxcpus=2 makes no difference (the dmesg output is
> >> > the same).
> >> > 
> >> > 	Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
> >> > dmesg has different CPU information at boot:
> >> > 
> >> > [    0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
> >> > [    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> >> >  [...]
> >> > [    0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
> >> >  [...]
> >> > [    0.000000] Hierarchical RCU implementation.
> >> > [    0.000000] 	RCU debugfs-based tracing is enabled.
> >> > [    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
> >> > [    0.000000] NR_IRQS:4352 nr_irqs:440 0
> >> > [    0.000000] 	Offload RCU callbacks from all CPUs
> >> > [    0.000000] 	Offload RCU callbacks from CPUs: 0-1.
> >> 
> >> Thank you -- this confirms my suspicions on the fix, though I must admit
> >> to being surprised that maxcpus made no difference.
> >
> >And here is an alleged fix, lightly tested at this end.  Does this patch
> >help?
> 
> 	This patch appears to make the problem go away; I've run about
> 10 iterations.  I applied this patch to the same -net tree I was using
> previously (-net as of Oct 22), with all other test patches removed.

So I finally produced a patch that helps!  It was bound to happen sooner
or later, I guess.  ;-)

> 	FWIW, dmesg is unchanged, and still shows messages like:
> 
> [    0.000000]  Offload RCU callbacks from CPUs: 0-3.

Yep, at that point in boot, RCU has no way of knowing that the firmware
is lying to it about the number of CPUs.  ;-)

> Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>

Thank you for your testing efforts!!!

							Thanx, Paul

> 	-J
> >
> >------------------------------------------------------------------------
> >
> >rcu: Make rcu_barrier() understand about missing rcuo kthreads
> >
> >Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
> >avoids creating rcuo kthreads for CPUs that never come online.  This
> >fixes a bug in many instances of firmware: Instead of lying about their
> >age, these systems instead lie about the number of CPUs that they have.
> >Before commit 35ce7f29a44a, this could result in huge numbers of useless
> >rcuo kthreads being created.
> >
> >It appears that experience indicates that I should have told the
> >people suffering from this problem to fix their broken firmware, but
> >I instead produced what turned out to be a partial fix.   The missing
> >piece supplied by this commit makes sure that rcu_barrier() knows not to
> >post callbacks for no-CBs CPUs that have not yet come online, because
> >otherwise rcu_barrier() will hang on systems having firmware that lies
> >about the number of CPUs.
> >
> >It is tempting to simply have rcu_barrier() refuse to post a callback on
> >any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
> >does not work because rcu_barrier() is required to wait for all pending
> >callbacks.  It is therefore required to wait even for those callbacks
> >that cannot possibly be invoked.  Even if doing so hangs the system.
> >
> >Given that posting a callback to a no-CBs CPU that does not yet have an
> >rcuo kthread can hang rcu_barrier(), it is tempting to report an error
> >in this case.  Unfortunately, this will result in false positives at
> >boot time, when it is perfectly legal to post callbacks to the boot CPU
> >before the scheduler has started, in other words, before it is legal
> >to invoke rcu_barrier().
> >
> >So this commit instead has rcu_barrier() avoid posting callbacks to
> >CPUs having neither rcuo kthread nor pending callbacks, and has it
> >complain bitterly if it finds CPUs having no rcuo kthread but some
> >pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
> >kthread but pending callbacks, as noted earlier, it has no choice but
> >to hang indefinitely.
> >
> >Reported-by: Yanko Kaneti <yaneti@declera.com>
> >Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
> >Reported-by: Meelis Roos <mroos@linux.ee>
> >Reported-by: Eric B Munson <emunson@akamai.com>
> >Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >
> >diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> >index aa8e5eea3ab4..c78e88ce5ea3 100644
> >--- a/include/trace/events/rcu.h
> >+++ b/include/trace/events/rcu.h
> >@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
> > /*
> >  * Tracepoint for _rcu_barrier() execution.  The string "s" describes
> >  * the _rcu_barrier phase:
> >- *	"Begin": rcu_barrier_callback() started.
> >- *	"Check": rcu_barrier_callback() checking for piggybacking.
> >- *	"EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
> >- *	"Inc1": rcu_barrier_callback() piggyback check counter incremented.
> >- *	"Offline": rcu_barrier_callback() found offline CPU
> >- *	"OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
> >- *	"OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
> >- *	"OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
> >+ *	"Begin": _rcu_barrier() started.
> >+ *	"Check": _rcu_barrier() checking for piggybacking.
> >+ *	"EarlyExit": _rcu_barrier() piggybacked, thus early exit.
> >+ *	"Inc1": _rcu_barrier() piggyback check counter incremented.
> >+ *	"OfflineNoCB": _rcu_barrier() found callback on never-online CPU
> >+ *	"OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
> >+ *	"OnlineQ": _rcu_barrier() found online CPU with callbacks.
> >+ *	"OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
> >  *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
> >  *	"CB": An rcu_barrier_callback() invoked a callback, not the last.
> >  *	"LastCB": An rcu_barrier_callback() invoked the last callback.
> >- *	"Inc2": rcu_barrier_callback() piggyback check counter incremented.
> >+ *	"Inc2": _rcu_barrier() piggyback check counter incremented.
> >  * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
> >  * is the count of remaining callbacks, and "done" is the piggybacking count.
> >  */
> >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >index f6880052b917..7680fc275036 100644
> >--- a/kernel/rcu/tree.c
> >+++ b/kernel/rcu/tree.c
> >@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
> > 			continue;
> > 		rdp = per_cpu_ptr(rsp->rda, cpu);
> > 		if (rcu_is_nocb_cpu(cpu)) {
> >-			_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
> >-					   rsp->n_barrier_done);
> >-			atomic_inc(&rsp->barrier_cpu_count);
> >-			__call_rcu(&rdp->barrier_head, rcu_barrier_callback,
> >-				   rsp, cpu, 0);
> >+			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
> >+				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
> >+						   rsp->n_barrier_done);
> >+			} else {
> >+				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
> >+						   rsp->n_barrier_done);
> >+				atomic_inc(&rsp->barrier_cpu_count);
> >+				__call_rcu(&rdp->barrier_head,
> >+					   rcu_barrier_callback, rsp, cpu, 0);
> >+			}
> > 		} else if (ACCESS_ONCE(rdp->qlen)) {
> > 			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
> > 					   rsp->n_barrier_done);
> >diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> >index 4beab3d2328c..8e7b1843896e 100644
> >--- a/kernel/rcu/tree.h
> >+++ b/kernel/rcu/tree.h
> >@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
> > static void print_cpu_stall_info_end(void);
> > static void zero_cpu_stall_ticks(struct rcu_data *rdp);
> > static void increment_cpu_stall_ticks(void);
> >+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
> > static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
> > static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
> > static void rcu_init_one_nocb(struct rcu_node *rnp);
> >diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >index 927c17b081c7..68c5b23b7173 100644
> >--- a/kernel/rcu/tree_plugin.h
> >+++ b/kernel/rcu/tree_plugin.h
> >@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
> > }
> > 
> > /*
> >+ * Does the specified CPU need an RCU callback for the specified flavor
> >+ * of rcu_barrier()?
> >+ */
> >+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
> >+{
> >+	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
> >+	struct rcu_head *rhp;
> >+
> >+	/* No-CBs CPUs might have callbacks on any of three lists. */
> >+	rhp = ACCESS_ONCE(rdp->nocb_head);
> >+	if (!rhp)
> >+		rhp = ACCESS_ONCE(rdp->nocb_gp_head);
> >+	if (!rhp)
> >+		rhp = ACCESS_ONCE(rdp->nocb_follower_head);
> >+
> >+	/* Having no rcuo kthread but CBs after scheduler starts is bad! */
> >+	if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
> >+		/* RCU callback enqueued before CPU first came online??? */
> >+		pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
> >+		       cpu, rhp->func);
> >+		WARN_ON_ONCE(1);
> >+	}
> >+
> >+	return !!rhp;
> >+}
> >+
> >+/*
> >  * Enqueue the specified string of rcu_head structures onto the specified
> >  * CPU's no-CBs lists.  The CPU is specified by rdp, the head of the
> >  * string by rhp, and the tail of the string by rhtp.  The non-lazy/lazy
> >@@ -2646,6 +2673,10 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
> > 
> > #else /* #ifdef CONFIG_RCU_NOCB_CPU */
> > 
> >+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
> >+{
> >+}
> >+
> > static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> > {
> > }
> >
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
> 

^ permalink raw reply

* [PATCH 0/8] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2014-10-27 21:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Allow a TCP port to be recycled in conntrack when it changes role
   from server to client, from Marcelo Leitner.

2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch
   from Dan Carpenter.

3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables
   chain statistic updates. From Sabrina Dubroca.

4) Don't compile ip options in bridge netfilter, this mangles the packet
   and bridge should not alter layer >= 3 headers when forwarding packets.
   Patch from Herbert Xu and tested by Florian Westphal.

5) Account the final NLMSG_DONE message when calculating the size of the
   nflog netlink batches. Patch from Florian Westphal.

6) Fix a possible netlink attribute length overflow with large packets.
   Again from Florian Westphal.

7) Release the skbuff if nfnetlink_log fails to put the final
   NLMSG_DONE message. This fixes a leak on error. This should never
   happen, since it would mean we miscalculated the netlink batch
   size, so emit a warning if it does so we can track down the
   problem. This patch is from Houcheng Lin.

8) Look at the right list when recycling targets in the nft_compat,
   patch from Arturo Borrero.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit 7c1c97d54f9bfc810908d3903cb8bcacf734df18:

  net: sched: initialize bstats syncp (2014-10-21 21:45:21 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to 7965ee93719921ea5978f331da653dfa2d7b99f5:

  netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() (2014-10-27 22:17:46 +0100)

----------------------------------------------------------------
Arturo Borrero (1):
      netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()

Dan Carpenter (1):
      netfilter: ipset: off by one in ip_set_nfnl_get_byindex()

Florian Westphal (2):
      netfilter: nf_log: account for size of NLMSG_DONE attribute
      netfilter: nfnetlink_log: fix maximum packet length logged to userspace

Herbert Xu (1):
      bridge: Do not compile options in br_parse_ip_options

Houcheng Lin (1):
      netfilter: nf_log: release skbuff on nlmsg put failure

Marcelo Leitner (1):
      netfilter: nf_conntrack: allow server to become a client in TW handling

Sabrina Dubroca (1):
      netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation

 net/bridge/br_netfilter.c              |   24 +++++-------------------
 net/netfilter/ipset/ip_set_core.c      |    2 +-
 net/netfilter/nf_conntrack_proto_tcp.c |    4 ++--
 net/netfilter/nf_tables_api.c          |    4 ++--
 net/netfilter/nfnetlink_log.c          |   31 ++++++++++++++++---------------
 net/netfilter/nft_compat.c             |    2 +-
 6 files changed, 27 insertions(+), 40 deletions(-)

^ permalink raw reply

* [PATCH 1/8] netfilter: nf_conntrack: allow server to become a client in TW handling
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Marcelo Leitner <mleitner@redhat.com>

When a port that was used to listen for inbound connections gets closed
and reused for outgoing connections (like rsh ends up doing for stderr
flow), currently we may reject the SYN/ACK packet for the new connection
because the tcp_conntracks state table forbids a port from becoming a
client while there is still a TIME_WAIT entry in there for it.

As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
for it is 120s, there is a ~60s window that the application can end up
opening a port that conntrack will end up blocking.

This patch fixes this by simply allowing such a state transition: if we
see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
that the rest of the code already handles this situation, more
specifically in tcp_packet(), first switch clause.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_proto_tcp.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 44d1ea3..d87b642 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -213,7 +213,7 @@ static const u8 tcp_conntracks[2][6][TCP_CONNTRACK_MAX] = {
 	{
 /* REPLY */
 /* 	     sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2	*/
-/*syn*/	   { sIV, sS2, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sS2 },
+/*syn*/	   { sIV, sS2, sIV, sIV, sIV, sIV, sIV, sSS, sIV, sS2 },
 /*
  *	sNO -> sIV	Never reached.
  *	sSS -> sS2	Simultaneous open
@@ -223,7 +223,7 @@ static const u8 tcp_conntracks[2][6][TCP_CONNTRACK_MAX] = {
  *	sFW -> sIV
  *	sCW -> sIV
  *	sLA -> sIV
- *	sTW -> sIV	Reopened connection, but server may not do it.
+ *	sTW -> sSS	Reopened connection, but server may have switched role
  *	sCL -> sIV
  */
 /* 	     sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2	*/
-- 
1.7.10.4
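
The one-cell change to the REPLY "syn" row can be illustrated outside
the kernel. The following is only a sketch: the enum and the two row
arrays are copied from the column order and values shown in the diff,
while next_state() is a hypothetical helper and not a conntrack
function.

```c
#include <assert.h>
#include <stdint.h>

/* Column order as in the tcp_conntracks comment; sIV marks "invalid". */
enum ct_state { sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sS2, sIV };
#define CT_NUM_STATES sIV

/* REPLY-direction "syn" row before and after the patch. */
static const uint8_t reply_syn_before[CT_NUM_STATES] =
	{ sIV, sS2, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sS2 };
static const uint8_t reply_syn_after[CT_NUM_STATES] =
	{ sIV, sS2, sIV, sIV, sIV, sIV, sIV, sSS, sIV, sS2 };

/* Hypothetical helper: look up the next state for a given old state. */
static enum ct_state next_state(const uint8_t *row, enum ct_state old)
{
	assert(old < CT_NUM_STATES);
	return (enum ct_state)row[old];
}
```

With the old row, next_state(reply_syn_before, sTW) yields sIV, so the
SYN is treated as invalid; with the new row the same lookup yields sSS.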

^ permalink raw reply related

* [PATCH 2/8] netfilter: ipset: off by one in ip_set_nfnl_get_byindex()
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Dan Carpenter <dan.carpenter@oracle.com>

The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 912e5a0..86f9d76 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -659,7 +659,7 @@ ip_set_nfnl_get_byindex(struct net *net, ip_set_id_t index)
 	struct ip_set *set;
 	struct ip_set_net *inst = ip_set_pernet(net);
 
-	if (index > inst->ip_set_max)
+	if (index >= inst->ip_set_max)
 		return IPSET_INVALID_ID;
 
 	nfnl_lock(NFNL_SUBSYS_IPSET);
-- 
1.7.10.4
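
A minimal userspace sketch of the off-by-one, assuming a table with
`max` valid slots (indices 0 .. max - 1). The struct and both helpers
are made up for illustration; only the two comparisons come from the
diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-net ipset instance. */
struct set_table {
	size_t max;		/* number of slots in the list array */
};

/* Check as written before the patch: lets index == max through. */
static bool index_ok_old(const struct set_table *t, size_t index)
{
	assert(t != NULL);
	return !(index > t->max);
}

/* Check as written after the patch: rejects index == max. */
static bool index_ok_new(const struct set_table *t, size_t index)
{
	assert(t != NULL);
	return !(index >= t->max);
}
```

For a table with max = 4, the old check accepts index 4, which is one
past the last element, while the new check rejects it.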

^ permalink raw reply related

* [PATCH 3/8] netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Sabrina Dubroca <sd@queasysnail.net>

alloc_percpu returns NULL on failure, not a negative error code.

Fixes: ff3cd7b3c922 ("netfilter: nf_tables: refactor chain statistic routines")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 65eb2a1..11ab4b0 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1328,10 +1328,10 @@ static int nf_tables_newchain(struct sock *nlsk, struct sk_buff *skb,
 			basechain->stats = stats;
 		} else {
 			stats = netdev_alloc_pcpu_stats(struct nft_stats);
-			if (IS_ERR(stats)) {
+			if (stats == NULL) {
 				module_put(type->owner);
 				kfree(basechain);
-				return PTR_ERR(stats);
+				return -ENOMEM;
 			}
 			rcu_assign_pointer(basechain->stats, stats);
 		}
-- 
1.7.10.4
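
Why IS_ERR() cannot catch this failure can be shown with a userspace
imitation. This is a sketch only: is_err() and err_ptr() below are
hypothetical stand-ins modelled on the kernel convention that the top
4095 addresses encode negative errno values; they are not the kernel
macros themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* True only for pointers in the top MAX_ERRNO addresses. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}
```

Here is_err(NULL) is false, so an allocator that reports failure by
returning NULL slips past an is_err() test, whereas
is_err(err_ptr(-ENOMEM)) is true.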

^ permalink raw reply related

* [PATCH 4/8] bridge: Do not compile options in br_parse_ip_options
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Herbert Xu <herbert@gondor.apana.org.au>

Commit 462fb2af9788a82a534f8184abfde31574e1cfa0

	bridge : Sanitize skb before it enters the IP stack

broke when IP options are actually used because it mangles the
skb as if it entered the IP stack which is wrong because the
bridge is supposed to operate below the IP stack.

Since nobody has actually requested parsing of IP options,
this patch fixes it by simply reverting to the previous approach
of ignoring all IP options, i.e., zeroing the IPCB.

If and when somebody who uses IP options and actually needs them
to be parsed by the bridge complains then we can revisit this.

Reported-by: David Newall <davidn@davidnewall.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/br_netfilter.c |   24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 1bada53..1a4f32c 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -192,7 +192,6 @@ static inline void nf_bridge_save_header(struct sk_buff *skb)
 
 static int br_parse_ip_options(struct sk_buff *skb)
 {
-	struct ip_options *opt;
 	const struct iphdr *iph;
 	struct net_device *dev = skb->dev;
 	u32 len;
@@ -201,7 +200,6 @@ static int br_parse_ip_options(struct sk_buff *skb)
 		goto inhdr_error;
 
 	iph = ip_hdr(skb);
-	opt = &(IPCB(skb)->opt);
 
 	/* Basic sanity checks */
 	if (iph->ihl < 5 || iph->version != 4)
@@ -227,23 +225,11 @@ static int br_parse_ip_options(struct sk_buff *skb)
 	}
 
 	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
-	if (iph->ihl == 5)
-		return 0;
-
-	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
-	if (ip_options_compile(dev_net(dev), opt, skb))
-		goto inhdr_error;
-
-	/* Check correct handling of SRR option */
-	if (unlikely(opt->srr)) {
-		struct in_device *in_dev = __in_dev_get_rcu(dev);
-		if (in_dev && !IN_DEV_SOURCE_ROUTE(in_dev))
-			goto drop;
-
-		if (ip_options_rcv_srr(skb))
-			goto drop;
-	}
-
+	/* We should really parse IP options here but until
+	 * somebody who actually uses IP options complains to
+	 * us we'll just silently ignore the options because
+	 * we're lazy!
+	 */
 	return 0;
 
 inhdr_error:
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 7/8] netfilter: nf_log: release skbuff on nlmsg put failure
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Houcheng Lin <houcheng@gmail.com>

The kernel should reserve enough room in the skb so that the DONE
message can always be appended.  However, in case of e.g. a new attribute
erroneously not being size-accounted for, __nfulnl_send() will still
try to put the next nlmsg into this full skbuf, causing the skb to be
stuck forever and blocking delivery of further messages.

Fix the issue by releasing the skb immediately after the nlmsg_put error
and WARN() so we can track down the cause of such a size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]

Signed-off-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink_log.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 2d02eac3..5f1be5b 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -346,26 +346,25 @@ nfulnl_alloc_skb(struct net *net, u32 peer_portid, unsigned int inst_size,
 	return skb;
 }
 
-static int
+static void
 __nfulnl_send(struct nfulnl_instance *inst)
 {
-	int status = -1;
-
 	if (inst->qlen > 1) {
 		struct nlmsghdr *nlh = nlmsg_put(inst->skb, 0, 0,
 						 NLMSG_DONE,
 						 sizeof(struct nfgenmsg),
 						 0);
-		if (!nlh)
+		if (WARN_ONCE(!nlh, "bad nlskb size: %u, tailroom %d\n",
+			      inst->skb->len, skb_tailroom(inst->skb))) {
+			kfree_skb(inst->skb);
 			goto out;
+		}
 	}
-	status = nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid,
-				   MSG_DONTWAIT);
-
+	nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid,
+			  MSG_DONTWAIT);
+out:
 	inst->qlen = 0;
 	inst->skb = NULL;
-out:
-	return status;
 }
 
 static void
-- 
1.7.10.4
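
The control flow after this change can be sketched in userspace. All
names below are hypothetical; free() stands in for both the unicast
hand-off and kfree_skb(). The point is only that the pending batch is
reset on every path, so a trailer that does not fit can no longer
leave the instance stuck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an nfulnl instance and its pending skb. */
struct batch {
	char *buf;		/* NULL when no batch is pending */
	size_t len;		/* bytes used */
	size_t cap;		/* bytes allocated */
	unsigned int qlen;	/* messages queued in this batch */
};

/* Reserve n bytes at the tail; fails if there is not enough room. */
static bool batch_reserve(struct batch *b, size_t n)
{
	if (b->cap - b->len < n)
		return false;
	b->len += n;
	return true;
}

/* Returns true if the batch was handed off, false if it was dropped. */
static bool batch_send(struct batch *b, size_t trailer_len)
{
	bool sent = true;

	assert(b->buf != NULL);
	if (b->qlen > 1 && !batch_reserve(b, trailer_len))
		sent = false;	/* no room for the trailer: drop the batch */

	free(b->buf);		/* hand-off or release, see lead-in */
	b->buf = NULL;
	b->len = 0;
	b->cap = 0;
	b->qlen = 0;
	return sent;
}
```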

^ permalink raw reply related

* [PATCH 5/8] netfilter: nf_log: account for size of NLMSG_DONE attribute
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can cause nflog to stop working, as __nfulnl_send() retries
sending forever if it failed to append NLMSG_DONE (which will never
work if the buffer is not large enough).

Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink_log.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index b1e3a05..8117fba 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -649,7 +649,8 @@ nfulnl_log_packet(struct net *net,
 		+ nla_total_size(sizeof(u_int32_t))	/* gid */
 		+ nla_total_size(plen)			/* prefix */
 		+ nla_total_size(sizeof(struct nfulnl_msg_packet_hw))
-		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp));
+		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp))
+		+ nla_total_size(sizeof(struct nfgenmsg));	/* NLMSG_DONE */
 
 	if (in && skb_mac_header_was_set(skb)) {
 		size +=   nla_total_size(skb->dev->hard_header_len)
@@ -692,8 +693,7 @@ nfulnl_log_packet(struct net *net,
 		goto unlock_and_release;
 	}
 
-	if (inst->skb &&
-	    size > skb_tailroom(inst->skb) - sizeof(struct nfgenmsg)) {
+	if (inst->skb && size > skb_tailroom(inst->skb)) {
 		/* either the queue len is too high or we don't have
 		 * enough room in the skb left. flush to userspace. */
 		__nfulnl_flush(inst);
-- 
1.7.10.4


^ permalink raw reply related

* [PATCH 6/8] netfilter: nfnetlink_log: fix maximum packet length logged to userspace
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

Don't try to queue payloads > 0xffff - NLA_HDRLEN; it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.

This patch is similar to
9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink_log.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 8117fba..2d02eac3 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -43,7 +43,8 @@
 #define NFULNL_NLBUFSIZ_DEFAULT	NLMSG_GOODSIZE
 #define NFULNL_TIMEOUT_DEFAULT 	100	/* every second */
 #define NFULNL_QTHRESH_DEFAULT 	100	/* 100 packets */
-#define NFULNL_COPY_RANGE_MAX	0xFFFF	/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+#define NFULNL_COPY_RANGE_MAX	(0xFFFF - NLA_HDRLEN)
 
 #define PRINTR(x, args...)	do { if (net_ratelimit()) \
 				     printk(x, ## args); } while (0);
@@ -252,6 +253,8 @@ nfulnl_set_mode(struct nfulnl_instance *inst, u_int8_t mode,
 
 	case NFULNL_COPY_PACKET:
 		inst->copy_mode = mode;
+		if (range == 0)
+			range = NFULNL_COPY_RANGE_MAX;
 		inst->copy_range = min_t(unsigned int,
 					 range, NFULNL_COPY_RANGE_MAX);
 		break;
@@ -679,8 +682,7 @@ nfulnl_log_packet(struct net *net,
 		break;
 
 	case NFULNL_COPY_PACKET:
-		if (inst->copy_range == 0
-		    || inst->copy_range > skb->len)
+		if (inst->copy_range > skb->len)
 			data_len = skb->len;
 		else
 			data_len = inst->copy_range;
-- 
1.7.10.4
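
The overflow is easy to reproduce with plain integer arithmetic. The
sketch below assumes a 4-byte attribute header (HDRLEN here, intended
to match NLA_HDRLEN); stored_attr_len() and clamp_copy_range() are
hypothetical helpers that mimic the 16-bit length field and the
patched range handling, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define HDRLEN		4u			/* assumed attribute header size */
#define COPY_RANGE_MAX	(0xFFFFu - HDRLEN)	/* largest payload that fits */

/* Attribute length as stored in a 16-bit field: header plus payload. */
static uint16_t stored_attr_len(uint32_t payload_len)
{
	return (uint16_t)(HDRLEN + payload_len);
}

/* Clamp a requested range; 0 means "no limit requested". */
static uint32_t clamp_copy_range(uint32_t range)
{
	if (range == 0 || range > COPY_RANGE_MAX)
		range = COPY_RANGE_MAX;
	assert(range <= COPY_RANGE_MAX);
	return range;
}
```

A payload of 0xFFFF bytes would be stored with length 3, because
0xFFFF + 4 wraps in 16 bits. After clamping, the largest payload is
0xFFFB and the stored length is exactly 0xFFFF.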


^ permalink raw reply related

* [PATCH 8/8] netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()
From: Pablo Neira Ayuso @ 2014-10-27 21:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1414445887-5108-1-git-send-email-pablo@netfilter.org>

From: Arturo Borrero <arturo.borrero.glez@gmail.com>

The code looks for an already loaded target, and the correct list to search
is nft_target_list, not nft_match_list.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 0480f57..9d6d6f6 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -672,7 +672,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	family = ctx->afi->family;
 
 	/* Re-use the existing target if it's already loaded. */
-	list_for_each_entry(nft_target, &nft_match_list, head) {
+	list_for_each_entry(nft_target, &nft_target_list, head) {
 		struct xt_target *target = nft_target->ops.data;
 
 		if (strcmp(target->name, tg_name) == 0 &&
-- 
1.7.10.4


^ permalink raw reply related

* Re: [PATCH] ovs: Turn vports with dependencies into separate modules
From: Thomas Graf @ 2014-10-27 21:47 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: dev@openvswitch.org, netdev
In-Reply-To: <CALnjE+opCLA9KJ5RHaUs1vbx41p6=iUi9B59Q7G0TeSTmWm7_w@mail.gmail.com>

On 10/27/14 at 10:14am, Pravin Shelar wrote:
> On Fri, Oct 24, 2014 at 2:57 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > I was referring to how many other kernel APIs have been designed, a
> > registration API allowing a vport to be implemented exclusively in the
> > scope of a single file tends to be cleaner than having to touch multiple
> > files and maintaining an init list.
> >
> This has never been an issue in openvswitch. Plus we do not need a
> loadable vport module to fix this issue.
> 
> > It also allows for OVS to be built into vmlinuz while vports can
> > remain as modules even if vxlan itself is built as a module.
> >
> 
> What is the problem with the current OVS built into the kernel?

What I mean specifically is the following dependency logic which will
no longer be required:

depends on NET_IPGRE_DEMUX && !(OPENVSWITCH=y && NET_IPGRE_DEMUX=m)

The patch also brings additional flexibility to users of
distributions. Distros typically ship something like an allmodconfig
so a user can either run openvswitch.ko with all encaps compiled in
or not run openvswitch.ko. With vports as modules, a user can blacklist
a certain encap type.

Another advantage is obviously that users can run additional vport
types on top of their distribution kernels.

Is there anything specific that you are concerned with in regard
to this proposed change?

^ permalink raw reply

* Re: [PATCH v3] ipv6: notify userspace when we added or changed an ipv6 token
From: Daniel Borkmann @ 2014-10-27 22:25 UTC (permalink / raw)
  To: Lubomir Rintel; +Cc: netdev, David S. Miller, Hannes Frederic Sowa
In-Reply-To: <1414427956-20056-1-git-send-email-lkundrak@v3.sk>

On 10/27/2014 05:39 PM, Lubomir Rintel wrote:
> NetworkManager might want to know that it changed when the router advertisement
> arrives.
>
> Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Daniel Borkmann <dborkman@redhat.com>

Looks better, thanks!

Acked-by: Daniel Borkmann <dborkman@redhat.com>

^ permalink raw reply

