Netdev List
* Re: [PATCH] net: qmi_wwan: add ZTE MF821D
From: David Miller @ 2012-07-18 16:41 UTC (permalink / raw)
  To: bjorn; +Cc: netdev, linux-usb, tschaefer
In-Reply-To: <1342091906-30045-1-git-send-email-bjorn@mork.no>

From: Bjørn Mork <bjorn@mork.no>
Date: Thu, 12 Jul 2012 13:18:26 +0200

> Sold by O2 (telefonica germany) under the name "LTE4G"
> 
> Tested-by: Thomas Schäfer <tschaefer@t-online.de>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>

Applied to net-next, thanks.


* Re: [PATCH net-next] net: ftgmac100/ftmac100: dont pull too much data
From: David Miller @ 2012-07-18 16:42 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ratbert
In-Reply-To: <1342102778.3265.8272.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 12 Jul 2012 16:19:38 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> Drivers should pull only ethernet header from page frag
> to skb->head.
> 
> Pulling 64 bytes is too much for TCP (without options) on IPv4.
> 
> However, it makes sense to pull all the frame if it fits the
> 128 bytes block allocated for skb->head, to free one page per
> small incoming frame.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
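
As a rough illustration of the rule in the commit message above, the decision can be sketched as below. This is a minimal model, not the driver code: the function name and the exact boundary test (`<=`) are assumptions, and only the 128-byte head size and the "Ethernet header only" rule come from the message. It also shows why 64 bytes is too much for plain TCP over IPv4: the headers add up to 54 bytes, so a 64-byte pull would drag 10 bytes of payload into skb->head.

```python
ETH_HLEN = 14          # Ethernet header length in bytes
IPV4_HLEN = 20         # IPv4 header without options
TCP_HLEN = 20          # TCP header without options
SKB_HEAD_BYTES = 128   # size of the skb->head block named in the commit message

# Total header bytes for TCP (no options) over IPv4 over Ethernet.
TCP_IPV4_HEADERS = ETH_HLEN + IPV4_HLEN + TCP_HLEN


def bytes_to_pull(frame_len: int) -> int:
    """Return how many bytes would be copied from the page frag to skb->head.

    Hypothetical helper modelling the rule stated in the commit message.
    """
    if frame_len <= SKB_HEAD_BYTES:
        # Small frame: copy all of it so the page can be freed.
        return frame_len
    # Larger frame: pull only the Ethernet header, leave the rest in the frag.
    return ETH_HLEN
```

With this model a 60-byte frame is copied whole, while a 1514-byte frame has only its 14-byte Ethernet header pulled.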


* Re: [patch] qlge: fix an "&&" vs "||" bug
From: David Miller @ 2012-07-18 16:42 UTC (permalink / raw)
  To: jitendra.kalsaria
  Cc: dan.carpenter, anirban.chakraborty, ron.mercer, Linux-Driver,
	netdev, kernel-janitors
In-Reply-To: <5E4F49720D0BAD499EE1F01232234BA8774378FA29@AVEXMB1.qlogic.org>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Date: Thu, 12 Jul 2012 11:15:56 -0700

> -----Original Message-----
>>From: Dan Carpenter [mailto:dan.carpenter@oracle.com] 
>>Sent: Thursday, July 12, 2012 7:47 AM
>>To: Anirban Chakraborty
>>Cc: Jitendra Kalsaria; Ron Mercer; Dept-Eng Linux Driver; netdev; kernel-janitors@vger.kernel.org
>>Subject: [patch] qlge: fix an "&&" vs "||" bug
>>
>>The condition is always true so WOL will never work.
>>
>>Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
...
> Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Applied.
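
The patch body is elided in the quoted mail, so the following is only a guess at the general shape of an "always true" condition of this kind. The device IDs and function names are hypothetical. The point is that "x differs from A or x differs from B" holds for every x when A and B are distinct, whereas the "and" form rejects only values that match neither.

```python
# Hypothetical IDs of the two supported devices, for illustration only.
ID_A = 0x0001
ID_B = 0x0002


def rejected_with_or(dev_id: int) -> bool:
    """Buggy form: true for every dev_id, because no value equals both IDs."""
    return dev_id != ID_A or dev_id != ID_B


def rejected_with_and(dev_id: int) -> bool:
    """Intended form: true only when dev_id matches neither supported ID."""
    return dev_id != ID_A and dev_id != ID_B
```

With the "or" form even a supported device is rejected, which matches the symptom described (WOL never works).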


* Re: [PATCH] ISDN:Add check for return value of pnp_activate_dev()
From: David Miller @ 2012-07-18 16:42 UTC (permalink / raw)
  To: kkeil; +Cc: netdev, alan, rucsoftsec
In-Reply-To: <1342110562-7774-1-git-send-email-kkeil@linux-pingi.de>

From: Karsten Keil <kkeil@linux-pingi.de>
Date: Thu, 12 Jul 2012 18:29:22 +0200

> pnp_activate_dev() return value needs to be checked to make sure that
> following calls to the PNP functions do work correctly.
> Fix for report #44491 on bugzilla.kernel.org.
> 
> Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>

Applied.


* Re: [PATCH ISDN] Add check for usb_alloc_urb() result
From: David Miller @ 2012-07-18 16:44 UTC (permalink / raw)
  To: kkeil; +Cc: netdev, rucsoftsec, m.bachem
In-Reply-To: <1342169986-24268-1-git-send-email-kkeil@linux-pingi.de>

From: Karsten Keil <kkeil@linux-pingi.de>
Date: Fri, 13 Jul 2012 10:59:46 +0200

> usb_alloc_urb() return value needs to be checked to avoid
> later NULL pointer access.
> Reported by rucsoftsec@gmail.com via bugzilla.kernel.org #44601.
> 
> Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>

Applied.

Please use consistent subject line formatting.  In your previous
patch you provided:

	[PATCH] ISDN:foo bar baz

which I corrected to:

	[PATCH] ISDN: foo bar baz

And for this patch you provided:

	[PATCH ISDN] foo bar baz

which I corrected to:

	[PATCH] ISDN: foo bar baz

Anything in those initial brackets will be removed by the
automated GIT tools; it's a place for text strings you don't
want to end up in the final commit message.

But you want that "ISDN: " prefix there in the end, so please
do not put it in brackets, and please do put a space after
that ":".

Thanks.
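
A simplified model of the bracket stripping described above, for illustration. The real git tooling does more than this (and its exact rules are not reproduced here), so treat the regular expression and the function name as assumptions. It only shows why a subsystem prefix placed inside the brackets disappears from the commit subject.

```python
import re

# One or more leading "[...]" groups, with optional surrounding whitespace.
_LEADING_BRACKETS = re.compile(r"^(?:\s*\[[^\]]*\])+\s*")


def strip_bracket_prefix(subject: str) -> str:
    """Drop leading bracketed tags from a patch subject (simplified model)."""
    return _LEADING_BRACKETS.sub("", subject)
```

Given "[PATCH ISDN] Add check for usb_alloc_urb() result", this leaves no "ISDN" in the result, while "[PATCH] ISDN: Add check for usb_alloc_urb() result" keeps the "ISDN: " prefix.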


* Re: [net-next PATCH v7] net: ethernet: davinci_emac: add OF support
From: David Miller @ 2012-07-18 16:45 UTC (permalink / raw)
  To: agust
  Cc: netdev, hs, davinci-linux-open-source, linux-arm-kernel,
	devicetree-discuss, grant.likely, nsekhar, wd, mm05
In-Reply-To: <1342521264-18466-1-git-send-email-agust@denx.de>

From: Anatolij Gustschin <agust@denx.de>
Date: Tue, 17 Jul 2012 12:34:24 +0200

> From: Heiko Schocher <hs@denx.de>
> 
> add OF support for the davinci_emac driver.
> 
> Signed-off-by: Heiko Schocher <hs@denx.de>
> Acked-by: Sekhar Nori <nsekhar@ti.com>
> Signed-off-by: Anatolij Gustschin <agust@denx.de>

Applied.


* Re: [PATCH] skbuff: Use correct allocation in skb_copy_ubufs
From: David Miller @ 2012-07-18 16:45 UTC (permalink / raw)
  To: krkumar2; +Cc: xma, netdev
In-Reply-To: <20120717120529.16840.51108.sendpatchset@localhost.localdomain>

From: Krishna Kumar <krkumar2@in.ibm.com>
Date: Tue, 17 Jul 2012 17:35:29 +0530

> Use correct allocation flags during copy of user space fragments
> to the kernel. Also "improve" a couple of for loops.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>

Applied to net-next


* Re: [PATCH] jme: netpoll support
From: David Miller @ 2012-07-18 16:45 UTC (permalink / raw)
  To: lekensteyn; +Cc: cooldavid, netdev
In-Reply-To: <18143736.brt1iGhlQ1@al>

From: Lekensteyn <lekensteyn@gmail.com>
Date: Tue, 17 Jul 2012 18:29:34 +0200

> From: Peter Wu <lekensteyn@gmail.com>
> 
> This patch adds the netpoll function to support netconsole. Tested and works
> fine on my "JMC250 PCI Express Gigabit Ethernet Controller" (PCI ID 0250).
> 
> Signed-off-by: Peter Wu <lekensteyn@gmail.com>

Applied to net-next

I really wonder if this driver works on SMP systems at all.


* Re: [PATCH 0/2] runtime PM support for cpsw and davinci mdio drivers
From: David Miller @ 2012-07-18 16:45 UTC (permalink / raw)
  To: mugunthanvnm; +Cc: netdev
In-Reply-To: <1342548590-12502-1-git-send-email-mugunthanvnm@ti.com>

From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Tue, 17 Jul 2012 23:39:48 +0530

> This patch set adds runtime PM support for the CPSW and Davinci MDIO
> drivers.
> 
> Mugunthan V N (2):
>   driver: net: ethernet: davinci_mdio: runtime PM support
>   driver: net: ethernet: cpsw: runtime PM support

All applied to net-next, thanks.


* Re: [PATCH 0/3] net: various tilegx networking fixes
From: David Miller @ 2012-07-18 16:47 UTC (permalink / raw)
  To: cmetcalf; +Cc: netdev, linux-kernel
In-Reply-To: <201207181640.q6IGet7P007227@lab-41.internal.tilera.com>


Please don't post patches like you did here.

The big problem is that you use the dates of your commits in
the email, and that breaks everything.

It makes the patches appear out of order in patchwork, which makes
more work for me.

Please repost these patches, and tell git-am or whatever tool you use
to not use the commit date in the outgoing emails.

Thanks.


* Re: [RFC PATCH] net: cgroup: null ptr dereference in netprio cgroup during init
From: Neil Horman @ 2012-07-18 16:50 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, gaofeng, mark.d.rustad, netdev, eric.dumazet
In-Reply-To: <5006D2E0.1070404@intel.com>

On Wed, Jul 18, 2012 at 08:14:40AM -0700, John Fastabend wrote:
> On 7/18/2012 7:21 AM, John Fastabend wrote:
> >On 7/18/2012 5:45 AM, Neil Horman wrote:
> >>On Tue, Jul 17, 2012 at 05:33:16PM -0700, John Fastabend wrote:
> >>>When the netprio cgroup is built in the kernel cgroup_init will call
> >>>cgrp_create which eventually calls update_netdev_tables. This is
> >>>being called before do_initcalls() so a null ptr dereference occurs
> >>>on init_net.
> >>>
> >>>This patch adds a check on init_net.count to verify the structure
> >>>has been initialized. The failure was introduced here,
> >>>
> >>>commit ef209f15980360f6945873df3cd710c5f62f2a3e
> >>>Author: Gao feng <gaofeng@cn.fujitsu.com>
> >>>Date:   Wed Jul 11 21:50:15 2012 +0000
> >>>
> >>>     net: cgroup: fix access the unallocated memory in netprio cgroup
> >>>
> >>>Tested with ping with netprio_cgroup as a module and built in.
> >>>
> >>>Marked RFC for now I think DaveM might have a reason why this needs
> >>>some improvement.
> >>>
> >>>Reported-by: Mark Rustad <mark.d.rustad@intel.com>
> >>>Cc: Neil Horman <nhorman@tuxdriver.com>
> >>>Cc: Eric Dumazet <edumazet@google.com>
> >>>Cc: Gao feng <gaofeng@cn.fujitsu.com>
> >>>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>>---
> >>>
> >>>  net/core/netprio_cgroup.c |    3 +++
> >>>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>>
> >>>diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> >>>index b2e9caa..e9fd7fd 100644
> >>>--- a/net/core/netprio_cgroup.c
> >>>+++ b/net/core/netprio_cgroup.c
> >>>@@ -116,6 +116,9 @@ static int update_netdev_tables(void)
> >>>      u32 max_len;
> >>>      struct netprio_map *map;
> >>>
> >>>+    if (!atomic_read(&init_net.count))
> >>>+        return ret;
> >>>+
> >>>      rtnl_lock();
> >>>      max_len = atomic_read(&max_prioidx) + 1;
> >>>      for_each_netdev(&init_net, dev) {
> >>>
> >>>
> >>
> >>John, do you have a stack trace of this.  I'm having a hard time
> >>seeing how we
> >>get into this path prior to the network stack being initalized.
> >
> >Mark had a partial trace
> >
> >[    0.003455] Dentry cache hash table entries: 262144 (order: 9,
> >2097152 bytes)
> >[    0.005550] Inode-cache hash table entries: 131072 (order: 8, 1048576
> >bytes)
> >[    0.007165] Mount-cache hash table entries: 256
> >[    0.010289] Initializing cgroup subsys net_cls
> >[    0.010947] Initializing cgroup subsys net_prio
> >[    0.011039] BUG: unable to handle kernel NULL pointer dereference at
> >0000000000000828
> >[    0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> >
> >
> >>
> >>It also brings up another point.  If this is happening, and we're
> >>creating the
> >>root cgroup from start_kernel, Then we're actually initalizing some
> >>cgroups
> >>twice, because a few cgroups register themselves via
> >>cgroup_load_subsys in
> >>module_init specified routines.  So if you're building netprio_cgroup or
> >>net_cls_cgroup as part of the monolithic kernel, you'll get
> >>cgroup_create called
> >>prior to your module_init() call.  Thats not good.
> >
> >Well your module_init() wouldn't be called in this case right? I think
> >netprio has a bug where we only register a netdevice notifier when
> >its built as a module.
> >
> >same issue with cls_cgroup and register_tcf_proto_ops?
> >
> 
> Neil, I was very unclear in the above. What I meant here was
> cgroup_load_subsys() checks ss->module so you should _not_
> get two create calls. And returns 0 so the register calls for
> netdev notifiers should get setup.
> 
Ok, that's a fair point.  So cgroup_load_subsys becomes a no-op if you build
monolithically, that's good.  I'm still worried though that there's a very
non-intuitive order to boot here.  If I write a module and set a module_init()
call in it, I expect that to get called before any other code does.  It appears
that you've found that the netprio_cgroup's cgrp_create routine can be called
prior to the module initialization code.  Even if that happens to work out in
some cases, it seems like a bad idea, calling code that may not have properly
initialized data.

> I missed the return 0 part and so I thought we might abort before
> this occurs but it looks ok to me on second glance.
> 
Yeah, you're right, we don't get double initialization, but we do still seem to
have this situation in which we call code before its init routine has run, which
I really don't like.

Regardless, your patch looks like it will fix this problem, and since, as Dave
pointed out, we're late in -rc, my issues can take a back seat.  I've acked
your patch.

Thanks!
Neil
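
For anyone following along, the ordering problem and the guard in the quoted patch can be modelled with a toy sketch. Nothing here is kernel code: the `Net` class, `net_init()` and the `guarded` flag are hypothetical stand-ins, with `count == 0` playing the role of the `init_net.count` check and a `None` device list playing the role of the uninitialized structure.

```python
class Net:
    """Toy stand-in for init_net; count == 0 means "not initialized yet"."""

    def __init__(self):
        self.count = 0
        self.devices = None  # only becomes a list once net_init() has run


def net_init(net: Net) -> None:
    """Stand-in for the network stack's own initialization."""
    net.devices = []
    net.count = 1


def update_netdev_tables(net: Net, guarded: bool = True) -> int:
    """Walk the device list; with the guard, bail out early if net is not ready."""
    if guarded and net.count == 0:
        return 0  # called before net init: nothing to update yet
    for _dev in net.devices:  # raises TypeError if devices is still None
        pass
    return 0
```

Calling `update_netdev_tables(Net(), guarded=False)` before `net_init()` fails, which is the toy analogue of the NULL dereference; with the guard it returns 0. The guard hides the symptom but does not change the fact that the create callback runs before the init routine.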


* [PATCH 1/3] net: tilegx driver bugfix (be explicit about percpu queue number)
From: Chris Metcalf @ 2012-07-18 16:52 UTC (permalink / raw)
  To: David S. Miller, netdev, linux-kernel
In-Reply-To: <201207181650.q6IGodZ7007565@lab-41.internal.tilera.com>

Avoid packets belonging to queue/cpu A trying to transmit on cpu B.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 drivers/net/ethernet/tile/tilegx.c |   23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index 83b4b38..c7bde28 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -123,6 +123,7 @@ struct tile_net_comps {
 
 /* The transmit wake timer for a given cpu and echannel. */
 struct tile_net_tx_wake {
+	int tx_queue_idx;
 	struct hrtimer timer;
 	struct net_device *dev;
 };
@@ -573,12 +574,14 @@ static void add_comp(gxio_mpipe_equeue_t *equeue,
 	comps->comp_next++;
 }
 
-static void tile_net_schedule_tx_wake_timer(struct net_device *dev)
+static void tile_net_schedule_tx_wake_timer(struct net_device *dev,
+                                            int tx_queue_idx)
 {
-	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+	struct tile_net_info *info = &per_cpu(per_cpu_info, tx_queue_idx);
 	struct tile_net_priv *priv = netdev_priv(dev);
+	struct tile_net_tx_wake *tx_wake = &info->tx_wake[priv->echannel];
 
-	hrtimer_start(&info->tx_wake[priv->echannel].timer,
+	hrtimer_start(&tx_wake->timer,
 		      ktime_set(0, TX_TIMER_DELAY_USEC * 1000UL),
 		      HRTIMER_MODE_REL_PINNED);
 }
@@ -587,7 +590,7 @@ static enum hrtimer_restart tile_net_handle_tx_wake_timer(struct hrtimer *t)
 {
 	struct tile_net_tx_wake *tx_wake =
 		container_of(t, struct tile_net_tx_wake, timer);
-	netif_wake_subqueue(tx_wake->dev, smp_processor_id());
+	netif_wake_subqueue(tx_wake->dev, tx_wake->tx_queue_idx);
 	return HRTIMER_NORESTART;
 }
 
@@ -1218,6 +1221,7 @@ static int tile_net_open(struct net_device *dev)
 
 		hrtimer_init(&tx_wake->timer, CLOCK_MONOTONIC,
 			     HRTIMER_MODE_REL);
+		tx_wake->tx_queue_idx = cpu;
 		tx_wake->timer.function = tile_net_handle_tx_wake_timer;
 		tx_wake->dev = dev;
 	}
@@ -1291,6 +1295,7 @@ static inline void *tile_net_frag_buf(skb_frag_t *f)
  * stop the queue and schedule the tx_wake timer.
  */
 static s64 tile_net_equeue_try_reserve(struct net_device *dev,
+				       int tx_queue_idx,
 				       struct tile_net_comps *comps,
 				       gxio_mpipe_equeue_t *equeue,
 				       int num_edescs)
@@ -1313,8 +1318,8 @@ static s64 tile_net_equeue_try_reserve(struct net_device *dev,
 	}
 
 	/* Still nothing; give up and stop the queue for a short while. */
-	netif_stop_subqueue(dev, smp_processor_id());
-	tile_net_schedule_tx_wake_timer(dev);
+	netif_stop_subqueue(dev, tx_queue_idx);
+	tile_net_schedule_tx_wake_timer(dev, tx_queue_idx);
 	return -1;
 }
 
@@ -1580,7 +1585,8 @@ static int tile_net_tx_tso(struct sk_buff *skb, struct net_device *dev)
 	local_irq_save(irqflags);
 
 	/* Try to acquire a completion entry and an egress slot. */
-	slot = tile_net_equeue_try_reserve(dev, comps, equeue, num_edescs);
+	slot = tile_net_equeue_try_reserve(dev, skb->queue_mapping, comps,
+					   equeue, num_edescs);
 	if (slot < 0) {
 		local_irq_restore(irqflags);
 		return NETDEV_TX_BUSY;
@@ -1674,7 +1680,8 @@ static int tile_net_tx(struct sk_buff *skb, struct net_device *dev)
 	local_irq_save(irqflags);
 
 	/* Try to acquire a completion entry and an egress slot. */
-	slot = tile_net_equeue_try_reserve(dev, comps, equeue, num_edescs);
+	slot = tile_net_equeue_try_reserve(dev, skb->queue_mapping, comps,
+					   equeue, num_edescs);
 	if (slot < 0) {
 		local_irq_restore(irqflags);
 		return NETDEV_TX_BUSY;
-- 
1.7.10.3


* [PATCH 2/3] tilegx net driver: handle payload data not in frags
From: Chris Metcalf @ 2012-07-18 16:52 UTC (permalink / raw)
  To: David S. Miller, netdev, linux-kernel
In-Reply-To: <201207181650.q6IGodZ7007565@lab-41.internal.tilera.com>

The original driver implementation assumed that for TSO, all the
payload data would be in the frags.  This isn't always true; change
the driver to support payload data at skb->data between
"skb_transport_offset(skb) + tcp_hdrlen(skb)" and "skb->hdr_len",
followed by the data in the frags.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 drivers/net/ethernet/tile/tilegx.c |   36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index c7bde28..f78effc 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -1333,11 +1333,12 @@ static s64 tile_net_equeue_try_reserve(struct net_device *dev,
 static int tso_count_edescs(struct sk_buff *skb)
 {
 	struct skb_shared_info *sh = skb_shinfo(skb);
-	unsigned int data_len = skb->data_len;
+	unsigned int sh_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
+	unsigned int data_len = skb->data_len + skb->hdr_len - sh_len;
 	unsigned int p_len = sh->gso_size;
 	long f_id = -1;    /* id of the current fragment */
-	long f_size = -1;  /* size of the current fragment */
-	long f_used = -1;  /* bytes used from the current fragment */
+	long f_size = skb->hdr_len;  /* size of the current fragment */
+	long f_used = sh_len;  /* bytes used from the current fragment */
 	long n;            /* size of the current piece of payload */
 	int num_edescs = 0;
 	int segment;
@@ -1382,13 +1383,14 @@ static void tso_headers_prepare(struct sk_buff *skb, unsigned char *headers,
 	struct skb_shared_info *sh = skb_shinfo(skb);
 	struct iphdr *ih;
 	struct tcphdr *th;
-	unsigned int data_len = skb->data_len;
+	unsigned int sh_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
+	unsigned int data_len = skb->data_len + skb->hdr_len - sh_len;
 	unsigned char *data = skb->data;
-	unsigned int ih_off, th_off, sh_len, p_len;
+	unsigned int ih_off, th_off, p_len;
 	unsigned int isum_seed, tsum_seed, id, seq;
 	long f_id = -1;    /* id of the current fragment */
-	long f_size = -1;  /* size of the current fragment */
-	long f_used = -1;  /* bytes used from the current fragment */
+	long f_size = skb->hdr_len;  /* size of the current fragment */
+	long f_used = sh_len;  /* bytes used from the current fragment */
 	long n;            /* size of the current piece of payload */
 	int segment;
 
@@ -1397,14 +1399,13 @@ static void tso_headers_prepare(struct sk_buff *skb, unsigned char *headers,
 	th = tcp_hdr(skb);
 	ih_off = skb_network_offset(skb);
 	th_off = skb_transport_offset(skb);
-	sh_len = th_off + tcp_hdrlen(skb);
 	p_len = sh->gso_size;
 
 	/* Set up seed values for IP and TCP csum and initialize id and seq. */
 	isum_seed = ((0xFFFF - ih->check) +
 		     (0xFFFF - ih->tot_len) +
 		     (0xFFFF - ih->id));
-	tsum_seed = th->check + (0xFFFF ^ htons(skb->len));
+	tsum_seed = th->check + (0xFFFF ^ htons(sh_len + data_len));
 	id = ntohs(ih->id);
 	seq = ntohl(th->seq);
 
@@ -1476,21 +1477,22 @@ static void tso_egress(struct net_device *dev, gxio_mpipe_equeue_t *equeue,
 {
 	struct tile_net_priv *priv = netdev_priv(dev);
 	struct skb_shared_info *sh = skb_shinfo(skb);
-	unsigned int data_len = skb->data_len;
+	unsigned int sh_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
+	unsigned int data_len = skb->data_len + skb->hdr_len - sh_len;
 	unsigned int p_len = sh->gso_size;
 	gxio_mpipe_edesc_t edesc_head = { { 0 } };
 	gxio_mpipe_edesc_t edesc_body = { { 0 } };
 	long f_id = -1;    /* id of the current fragment */
-	long f_size = -1;  /* size of the current fragment */
-	long f_used = -1;  /* bytes used from the current fragment */
+	long f_size = skb->hdr_len;  /* size of the current fragment */
+	long f_used = sh_len;  /* bytes used from the current fragment */
+	void *f_data = skb->data;
 	long n;            /* size of the current piece of payload */
 	unsigned long tx_packets = 0, tx_bytes = 0;
-	unsigned int csum_start, sh_len;
+	unsigned int csum_start;
 	int segment;
 
 	/* Prepare to egress the headers: set up header edesc. */
 	csum_start = skb_checksum_start_offset(skb);
-	sh_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
 	edesc_head.csum = 1;
 	edesc_head.csum_start = csum_start;
 	edesc_head.csum_dest = csum_start + skb->csum_offset;
@@ -1502,7 +1504,6 @@ static void tso_egress(struct net_device *dev, gxio_mpipe_equeue_t *equeue,
 
 	/* Egress all the edescs. */
 	for (segment = 0; segment < sh->gso_segs; segment++) {
-		void *va;
 		unsigned char *buf;
 		unsigned int p_used = 0;
 
@@ -1521,10 +1522,9 @@ static void tso_egress(struct net_device *dev, gxio_mpipe_equeue_t *equeue,
 				f_id++;
 				f_size = sh->frags[f_id].size;
 				f_used = 0;
+				f_data = tile_net_frag_buf(&sh->frags[f_id]);
 			}
 
-			va = tile_net_frag_buf(&sh->frags[f_id]) + f_used;
-
 			/* Use bytes from the current fragment. */
 			n = p_len - p_used;
 			if (n > f_size - f_used)
@@ -1533,7 +1533,7 @@ static void tso_egress(struct net_device *dev, gxio_mpipe_equeue_t *equeue,
 			p_used += n;
 
 			/* Egress a piece of the payload. */
-			edesc_body.va = va_to_tile_io_addr(va);
+			edesc_body.va = va_to_tile_io_addr(f_data) + f_used;
 			edesc_body.xfer_size = n;
 			edesc_body.bound = !(p_used < p_len);
 			gxio_mpipe_equeue_put_at(equeue, edesc_body, slot);
-- 
1.7.10.3
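
A sketch of the bookkeeping this patch changes, for illustration only. It is a simplified Python model based on the commit description and the visible hunks, not the driver code: the function names, the tuple layout and the overall loop structure are assumptions. The linear area at skb->data is treated as "fragment -1", whose size is what the patch reads from skb->hdr_len and whose first sh_len bytes (the headers) count as already used.

```python
def tso_pieces(linear_len, sh_len, frag_sizes, gso_size):
    """Split TSO payload into per-segment pieces.

    Returns a list of segments; each segment is a list of
    (fragment_id, offset, length) tuples, where fragment_id -1 means
    the linear area at skb->data.
    """
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")

    # Payload = linear bytes after the headers + everything in the frags.
    remaining = (linear_len - sh_len) + sum(frag_sizes)

    f_id = -1            # start in the linear area
    f_size = linear_len  # size of the current "fragment"
    f_used = sh_len      # header bytes are not payload

    segments = []
    while remaining > 0:
        p_len = min(gso_size, remaining)  # last segment may be short
        p_used = 0
        pieces = []
        while p_used < p_len:
            # Advance to the next fragment once the current one is exhausted.
            while f_used >= f_size:
                f_id += 1
                f_size = frag_sizes[f_id]
                f_used = 0
            # Use bytes from the current fragment.
            n = min(p_len - p_used, f_size - f_used)
            pieces.append((f_id, f_used, n))
            f_used += n
            p_used += n
        segments.append(pieces)
        remaining -= p_len
    return segments


def count_edescs(segments):
    """One descriptor for the headers of each segment plus one per piece."""
    return sum(1 + len(pieces) for pieces in segments)
```

For example, with 154 linear bytes of which 54 are headers, frags of 1000 and 500 bytes, and a gso_size of 600, the first segment takes 100 bytes from the linear area and 500 from frag 0. When the linear area holds only headers (linear_len equal to sh_len), the model starts directly in frag 0, as the old code did.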


* [PATCH 3/3] tilegx net: use eth_hw_addr_random(), not random_ether_addr()
From: Chris Metcalf @ 2012-07-18 16:53 UTC (permalink / raw)
  To: David S. Miller, netdev, linux-kernel
In-Reply-To: <201207181650.q6IGodZ7007565@lab-41.internal.tilera.com>

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
 drivers/net/ethernet/tile/tilegx.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index f78effc..4e2a162 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -1851,7 +1851,7 @@ static void tile_net_dev_init(const char *name, const uint8_t *mac)
 		memcpy(dev->dev_addr, mac, 6);
 		dev->addr_len = 6;
 	} else {
-		random_ether_addr(dev->dev_addr);
+		eth_hw_addr_random(dev);
 	}
 
 	/* Register the network device. */
-- 
1.7.10.3


* Re: [RFC PATCH] net: cgroup: null ptr dereference in netprio cgroup during init
From: Neil Horman @ 2012-07-18 17:14 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, gaofeng, mark.d.rustad, netdev, eric.dumazet
In-Reply-To: <5006D2E0.1070404@intel.com>

On Wed, Jul 18, 2012 at 08:14:40AM -0700, John Fastabend wrote:
> On 7/18/2012 7:21 AM, John Fastabend wrote:
> >On 7/18/2012 5:45 AM, Neil Horman wrote:
> >>On Tue, Jul 17, 2012 at 05:33:16PM -0700, John Fastabend wrote:
> >>>When the netprio cgroup is built in the kernel cgroup_init will call
> >>>cgrp_create which eventually calls update_netdev_tables. This is
> >>>being called before do_initcalls() so a null ptr dereference occurs
> >>>on init_net.
> >>>
> >>>This patch adds a check on init_net.count to verify the structure
> >>>has been initialized. The failure was introduced here,
> >>>
> >>>commit ef209f15980360f6945873df3cd710c5f62f2a3e
> >>>Author: Gao feng <gaofeng@cn.fujitsu.com>
> >>>Date:   Wed Jul 11 21:50:15 2012 +0000
> >>>
> >>>     net: cgroup: fix access the unallocated memory in netprio cgroup
> >>>
> >>>Tested with ping with netprio_cgroup as a module and built in.
> >>>
> >>>Marked RFC for now I think DaveM might have a reason why this needs
> >>>some improvement.
> >>>
> >>>Reported-by: Mark Rustad <mark.d.rustad@intel.com>
> >>>Cc: Neil Horman <nhorman@tuxdriver.com>
> >>>Cc: Eric Dumazet <edumazet@google.com>
> >>>Cc: Gao feng <gaofeng@cn.fujitsu.com>
> >>>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>>---
> >>>
> >>>  net/core/netprio_cgroup.c |    3 +++
> >>>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>>
> >>>diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> >>>index b2e9caa..e9fd7fd 100644
> >>>--- a/net/core/netprio_cgroup.c
> >>>+++ b/net/core/netprio_cgroup.c
> >>>@@ -116,6 +116,9 @@ static int update_netdev_tables(void)
> >>>      u32 max_len;
> >>>      struct netprio_map *map;
> >>>
> >>>+    if (!atomic_read(&init_net.count))
> >>>+        return ret;
> >>>+
> >>>      rtnl_lock();
> >>>      max_len = atomic_read(&max_prioidx) + 1;
> >>>      for_each_netdev(&init_net, dev) {
> >>>
> >>>
> >>
> >>John, do you have a stack trace of this.  I'm having a hard time
> >>seeing how we
> >>get into this path prior to the network stack being initalized.
> >
> >Mark had a partial trace
> >
> >[    0.003455] Dentry cache hash table entries: 262144 (order: 9,
> >2097152 bytes)
> >[    0.005550] Inode-cache hash table entries: 131072 (order: 8, 1048576
> >bytes)
> >[    0.007165] Mount-cache hash table entries: 256
> >[    0.010289] Initializing cgroup subsys net_cls
> >[    0.010947] Initializing cgroup subsys net_prio
> >[    0.011039] BUG: unable to handle kernel NULL pointer dereference at
> >0000000000000828
> >[    0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> >
> >
> >>
> >>It also brings up another point.  If this is happening, and we're
> >>creating the
> >>root cgroup from start_kernel, Then we're actually initalizing some
> >>cgroups
> >>twice, because a few cgroups register themselves via
> >>cgroup_load_subsys in
> >>module_init specified routines.  So if you're building netprio_cgroup or
> >>net_cls_cgroup as part of the monolithic kernel, you'll get
> >>cgroup_create called
> >>prior to your module_init() call.  Thats not good.
> >
> >Well your module_init() wouldn't be called in this case right? I think
> >netprio has a bug where we only register a netdevice notifier when
> >its built as a module.
> >
> >same issue with cls_cgroup and register_tcf_proto_ops?
> >
> 
> Neil, I was very unclear in the above. What I meant here was
> cgroup_load_subsys() checks ss->module so you should _not_
> get two create calls. And returns 0 so the register calls for
> netdev notifiers should get setup.
> 
> I missed the return 0 part and so I thought we might abort before
> this occurs but it looks ok to me on second glance.
> 

John, et al.

Just so we all have it, I've got the problem reproduced here, and it gives me
this backtrace:

[    0.149924] Mount-cache hash table entries: 256
[    0.163754] Initializing cgroup subsys cpuacct
[    0.176991] Initializing cgroup subsys memory
[    0.190012] Initializing cgroup subsys devices
[    0.203249] Initializing cgroup subsys freezer
[    0.216484] Initializing cgroup subsys net_cls
[    0.229719] Initializing cgroup subsys blkio
[    0.242436] Initializing cgroup subsys perf_event
[    0.256451] Initializing cgroup subsys net_prio
[    0.269948] BUG: unable to handle kernel NULL pointer dereference at
0000000000000698
[    0.293303] IP: [<ffffffff81512e37>] cgrp_create+0x107/0x1c0
[    0.310175] PGD 0 
[    0.316157] Oops: 0000 [#1] SMP 
[    0.325775] CPU 0 
[    0.331227] Modules linked in:
[    0.340846] 
[    0.345264] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7+ #1 AMD Dinar/Dinar
[    0.366555] RIP: 0010:[<ffffffff81512e37>]  [<ffffffff81512e37>]
cgrp_create+0x107/0x1c0
[    0.390681] RSP: 0000:ffffffff81c01ea8  EFLAGS: 00010213
[    0.406501] RAX: 0000000000000000 RBX: ffffffffffffff10 RCX: 0000000000000000
[    0.427764] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff81c9d840
[    0.449026] RBP: ffffffff81c01ed8 R08: 00000000000164e0 R09: 0000000000000000
[    0.470289] R10: ffff8804278303c0 R11: 0000000000000000 R12: 0000000000000001
[    0.491553] R13: ffff8804278303c0 R14: ffff881036fd0700 R15: 0000000000000000
[    0.512819] FS:  0000000000000000(0000) GS:ffff880427c00000(0000)
knlGS:0000000000000000
[    0.536932] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.554049] CR2: 0000000000000698 CR3: 0000000001c0b000 CR4: 00000000000406b0
[    0.575311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.596574] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.617838] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task
ffffffff81c13420)
[    0.642471] Stack:
[    0.648442]  ffffffff81c01eb8 ffffffff81c9f320 ffffffff81c9f320
ffffffff81c9f320
[    0.670522]  ffffffff81c9f320 ffffffff81d482c0 ffffffff81c01ef8
ffffffff81d10397
[    0.692604]  ffffffff81e99790 0000000000000048 ffffffff81c01f18
ffffffff81d1062e
[    0.714687] Call Trace:
[    0.721960]  [<ffffffff81d10397>] cgroup_init_subsys+0x51/0xdf
[    0.739337]  [<ffffffff81d1062e>] cgroup_init+0x36/0x119
[    0.755160]  [<ffffffff81cf5c02>] start_kernel+0x38f/0x3c4
[    0.771501]  [<ffffffff81cf5672>] ? repair_env_string+0x5e/0x5e
[    0.789138]  [<ffffffff81cf5356>] x86_64_start_reservations+0x131/0x135
[    0.808849]  [<ffffffff81cf545a>] x86_64_start_kernel+0x100/0x10f
[    0.827003] Code: 10 ff ff ff 75 25 e9 89 00 00 00 66 0f 1f 84 00 00 00 00 00
48 8b 93 f0 00 00 00 48 81 fa 38 39 f9 81 48 8d 9a 10 ff ff ff 74 69 <48> 8b 93
88 07 00 00 48 85 d2 74 dd 44 3b 62 10 76 d7 48 8d bb 
[    0.883860] RIP  [<ffffffff81512e37>] cgrp_create+0x107/0x1c0
[    0.900988]  RSP <ffffffff81c01ea8>
[    0.911366] CR2: 0000000000000698
[    0.921235] ---[ end trace a7919e7f17c0a725 ]---


So yes, it appears to me that we're calling cgrp_create from cgroup_init_subsys
prior to having the module_init routine called for netprio_cgroup.  It seems to
me that (given that we have a cgroup_early_init patch), we can move the
cgroup_init call to later in the boot process.  I'll spend some time in
the next few weeks tinkering with that.
Best
Neil


* Re: getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behaviour
From: Rick Jones @ 2012-07-18 17:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eugen Dedu, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <1342627875.2626.3070.camel@edumazet-glaptop>

On 07/18/2012 09:11 AM, Eric Dumazet wrote:
>
> That's the way it's done on linux since day 0
>
> You can probably find a lot of pages on the web explaining the
> rationale.
>
> If your application handles UDP frames, what should SO_RCVBUF count?
>
> If it's the amount of payload bytes, you could have a pathological
> situation where an attacker sends 1-byte UDP frames fast enough and
> could consume a lot of kernel memory.
>
> Each frame consumes a fair amount of kernel memory (between 512 bytes
> and 8 Kbytes depending on the driver).
>
> So linux says: if the user expects to receive XXXX bytes, set a limit of
> _kernel_ memory used to store these bytes, and use an estimation of 100%
> of overhead. That is: allow 2*XXXX bytes to be allocated for socket
> receive buffers.
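
To put rough numbers on the 1-byte-frame scenario quoted above, here is a small arithmetic sketch. The function name and the 64 KiB limit are assumptions chosen for illustration; the per-frame costs of 512 bytes and 8 Kbytes are the bounds given in the quoted text.

```python
def kernel_bytes_if_counting_payload(payload_limit: int,
                                     payload_per_frame: int,
                                     kernel_bytes_per_frame: int) -> int:
    """Kernel memory consumed if the limit were applied to payload bytes only."""
    frames = payload_limit // payload_per_frame
    return frames * kernel_bytes_per_frame


LIMIT = 64 * 1024  # hypothetical SO_RCVBUF request of 64 KiB

# 1-byte frames queued until 64 KiB of payload is reached.
low = kernel_bytes_if_counting_payload(LIMIT, 1, 512)     # cheapest frames
high = kernel_bytes_if_counting_payload(LIMIT, 1, 8192)   # most expensive frames

# With the limit applied to kernel memory instead, the cap is 2 * LIMIT.
kernel_cap = 2 * LIMIT
```

Under these assumptions a payload-based limit would let 32 MiB to 512 MiB of kernel memory pile up on one socket, against a fixed 128 KiB when kernel memory itself is what gets counted.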

Expanding on/rewording that, in a setsockopt() call SO_RCVBUF specifies 
the data bytes and gets doubled to become the kernel/overhead byte 
limit.  Unless the doubling would be greater than net.core.rmem_max, in 
which case the limit becomes net.core.rmem_max.

But on getsockopt() SO_RCVBUF is always the kernel/overhead byte limit.

In one call it is fish.  In the other it is fowl.
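
The asymmetry is easy to observe from user space. The sketch below uses Python's standard socket module; the helper name is made up for illustration. What it reports depends on the platform and on net.core.rmem_max, so no particular output is claimed here.

```python
import socket


def rcvbuf_set_then_get(requested: int):
    """Set SO_RCVBUF on a fresh UDP socket and read it back.

    Returns (requested, reported) so the two values can be compared.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        reported = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return requested, reported
```

On a Linux box with a request well below net.core.rmem_max, one would expect the reported value to be twice the requested one, per the description above; stacks that keep the overhead limit to themselves should hand back the requested value.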

Other stacks appear to keep their kernel/overhead limit quiet, keeping 
SO_RCVBUF an expression of a data limit in both setsockopt() and 
getsockopt().  With those stacks, there is I suppose the possible source 
of confusion when/if someone tests the queuing to a socket, sends "high 
overhead" packets and doesn't get to SO_RCVBUF worth of data though I 
don't recall encountering that in my "pre-linux" time.

The sometimes fish, sometimes fowl version (along with the auto tuning 
when one doesn't make setsockopt() calls) gave me fits in netperf for 
years until I finally relented and split the socket buffer size 
variables into three - what netperf's user requested via the command 
line, what it was right after the socket was created, and what it was at 
the end of the data phase of the test.

rick jones


* Re: That's pretty much it for 3.5.0
From: Rustad, Mark D @ 2012-07-18 17:36 UTC (permalink / raw)
  To: Neil Horman
  Cc: Fastabend, John R, <h@hmsreliant.think-freely.org>,
	David Miller, <netdev@vger.kernel.org>,
	<linux-wireless@vger.kernel.org>,
	<netfilter-devel@vger.kernel.org>
In-Reply-To: <20120718130430.GE25563@hmsreliant.think-freely.org>

On Jul 18, 2012, at 6:04 AM, Neil Horman wrote:

> John, can you post the backtrace you got for this?  I replied to the patch that
> you posted for this fix.  The cgroup subsystem has an early_init flag that's
> supposed to prevent the initialization of cgroups that don't need initialization
> until later (like via module_init() calls).

Here is the backtrace that I get and below a patch that fixes it:

[    0.010958] Initializing cgroup subsys net_prio
[    0.011040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000828
[    0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
[    0.011998] PGD 0 
[    0.011998] Oops: 0000 [#1] SMP 
[    0.011998] CPU 0 
[    0.011998] Modules linked in:
[    0.011998] 
[    0.011998] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7-mdrlinux+ #10 Bochs Bochs
[    0.011998] RIP: 0010:[<ffffffff814202c8>]  [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
[    0.011998] RSP: 0000:ffffffff81a01e68  EFLAGS: 00010246
[    0.011998] RAX: 0000000000000000 RBX: fffffffffffffed0 RCX: 0000000000000000
[    0.011998] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 2222222222222222
[    0.011998] RBP: ffffffff81a01e88 R08: 2222222222222222 R09: 2222222222222222
[    0.011998] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[    0.011998] R13: 0000000000000000 R14: ffff88007ff608c0 R15: 00000000000143d0
[    0.011998] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    0.011998] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.011998] CR2: 0000000000000828 CR3: 0000000001a0b000 CR4: 00000000000006b0
[    0.011998] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.011998] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.011998] Process swapper/0 (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a13420)
[    0.011998] Stack:
[    0.011998]  ffffffff81a88020 0000000000000000 ffff88007d3a38f0 fffffffffffffff4
[    0.011998]  ffffffff81a01ec8 ffffffff814203cd ffff88007ff608c0 ffffffff817d8e9a
[    0.011998]  ffffffff81a88cd8 ffffffff81a88020 ffffffff81a88020 ffffffff81b010a0
[    0.011998] Call Trace:
[    0.011998]  [<ffffffff814203cd>] cgrp_create+0x8d/0xc0
[    0.011998]  [<ffffffff81ade14a>] cgroup_init_subsys+0x80/0x126
[    0.011998]  [<ffffffff81ade380>] cgroup_init+0x36/0x117
[    0.011998]  [<ffffffff81acab71>] start_kernel+0x32e/0x34f
[    0.011998]  [<ffffffff81aca6d5>] ? repair_env_string+0x5a/0x5a
[    0.011998]  [<ffffffff81aca346>] x86_64_start_reservations+0x101/0x105
[    0.011998]  [<ffffffff81aca120>] ? early_idt_handlers+0x120/0x120
[    0.011998]  [<ffffffff81aca417>] x86_64_start_kernel+0xcd/0xdc
[    0.011998] Code: 0f 1f 00 48 8b 83 30 01 00 00 48 8d 98 d0 fe ff ff 48 3d a8 e8 52 82 74 3a e8 25 db c3 ff 85 c0 74 09 80 3d bb 3d 68 00 00 74 40 <48> 8b 83 58 09 00 00 48 85 c0 74 cc 44 3b 60 10 76 c6 44 89 e6 
[    0.011998] RIP  [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
[    0.011998]  RSP <ffffffff81a01e68>
[    0.011998] CR2: 0000000000000828
[    0.012009] ---[ end trace a7919e7f17c0a725 ]---
[    0.012601] Kernel panic - not syncing: Attempted to kill the idle task!

The following change simply statically initializes init_net.dev_base_head. I copied and pasted it into the email, so this rendering may not work, but I can send it if this approach looks reasonable. I have verified that it resolves the issue above.

diff --git a/net/core/dev.c b/net/core/dev.c
index 0f28a9e..db1ba61 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6283,8 +6283,6 @@ static struct hlist_head *netdev_create_hash(void)
 /* Initialize per network namespace state */
 static int __net_init netdev_init(struct net *net)
 {
-       INIT_LIST_HEAD(&net->dev_base_head);
-
        net->dev_name_head = netdev_create_hash();
        if (net->dev_name_head == NULL)
                goto err_name;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index dddbacb..42f1e1c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
 LIST_HEAD(net_namespace_list);
 EXPORT_SYMBOL_GPL(net_namespace_list);
 
-struct net init_net;
+struct net init_net = {
+       .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
+};
 EXPORT_SYMBOL(init_net);
 
 #define INITIAL_NET_GEN_PTRS   13 /* +1 for len +2 for rcu_head */
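The difference between run-time and static list-head initialization that the change above relies on can be sketched in user space as follows. This is only an illustration: struct fake_net, lazy_net, early_net and list_is_usable() are hypothetical names, and the list helpers are small stand-ins for the kernel's <linux/list.h>.

```c
#include <stddef.h>

/* Stand-ins for the kernel list primitives (not the real header). */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static inline void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical miniature of struct net with just the device list. */
struct fake_net {
	struct list_head dev_base_head;
};

/* Zero-initialized: next/prev stay NULL until init code runs. */
static struct fake_net lazy_net;

/* Statically initialized: a valid empty list before any code runs. */
static struct fake_net early_net = {
	.dev_base_head = LIST_HEAD_INIT(early_net.dev_base_head),
};

/* A walker dereferences next/prev, so NULL pointers here would oops. */
static int list_is_usable(const struct list_head *h)
{
	return h->next != NULL && h->prev != NULL;
}
```

An early caller that walks lazy_net.dev_base_head before init_list_head() has run follows a NULL pointer, which is the kind of failure in the backtrace; early_net is safe to walk at any time.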

-- 
Mark Rustad, LAN Access Division, Intel Corporation

^ permalink raw reply related

* Re: [PATCH] SUNRPC: Prevent kernel stack corruption on long values of flush
From: J. Bruce Fields @ 2012-07-18 17:39 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Trond.Myklebust, davem, davej, linux-nfs, netdev, linux-kernel
In-Reply-To: <1342476086-21638-1-git-send-email-levinsasha928@gmail.com>

On Tue, Jul 17, 2012 at 12:01:26AM +0200, Sasha Levin wrote:
> The buffer size in read_flush() is too small for the longest possible values
> for it. This can lead to a kernel stack corruption:

Thanks!

> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 2afd2a8..f86d95e 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -1409,11 +1409,11 @@ static ssize_t read_flush(struct file *file, char __user *buf,
>  			  size_t count, loff_t *ppos,
>  			  struct cache_detail *cd)
>  {
> -	char tbuf[20];
> +	char tbuf[22];

I wonder how common this sort of calculation is in the kernel?  It might
provide some peace of mind to be able to write this something like

	char tbuf[MAXLEN_BASE10_UL + 2];  /* + 2 for final "\n\0" */

--b.

>  	unsigned long p = *ppos;
>  	size_t len;
>  
> -	sprintf(tbuf, "%lu\n", convert_to_wallclock(cd->flush_time));
> +	snprintf(tbuf, sizeof(tbuf), "%lu\n", convert_to_wallclock(cd->flush_time));
>  	len = strlen(tbuf);
>  	if (p >= len)
>  		return 0;
> -- 
> 1.7.8.6
> 
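The sizing idea suggested in the reply above could look roughly like this in user space. MAXLEN_BASE10_UL does not exist in the kernel as far as this thread shows; the macro definitions and format_flush_time() below are assumptions made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical macro: upper bound on the decimal digits needed for an
 * unsigned integer type.  An N-bit value needs floor(N * log10(2)) + 1
 * digits, and log10(2) is approximated here as 30103/100000.
 */
#define MAXLEN_BASE10(type) \
	(sizeof(type) * CHAR_BIT * 30103UL / 100000UL + 1)

#define MAXLEN_BASE10_UL MAXLEN_BASE10(unsigned long)

/* Format "<value>\n" into buf; returns what snprintf() returns. */
static int format_flush_time(char *buf, size_t size, unsigned long t)
{
	return snprintf(buf, size, "%lu\n", t);
}
```

With a buffer declared as `char tbuf[MAXLEN_BASE10_UL + 2]`, even ULONG_MAX plus the trailing newline and NUL should fit on both 32-bit and 64-bit builds, without a hand-counted constant.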

^ permalink raw reply

* [patch net-next] team: refine IFF_XMIT_DST_RELEASE capability
From: Jiri Pirko @ 2012-07-18 17:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, edumazet

Cloned from Eric Dumazet's patch for bonding.

Some workloads benefit greatly from the IFF_XMIT_DST_RELEASE capability
on the output net device, which avoids dirtying the dst refcount.

team currently disables IFF_XMIT_DST_RELEASE unconditionally.

If all ports have the IFF_XMIT_DST_RELEASE bit set, then
team dev can also have it in its priv_flags.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/team/team.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 1a13470..813e131 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -733,12 +733,14 @@ static void __team_compute_features(struct team *team)
 	struct team_port *port;
 	u32 vlan_features = TEAM_VLAN_FEATURES;
 	unsigned short max_hard_header_len = ETH_HLEN;
+	unsigned int flags, dst_release_flag = IFF_XMIT_DST_RELEASE;
 
 	list_for_each_entry(port, &team->port_list, list) {
 		vlan_features = netdev_increment_features(vlan_features,
 					port->dev->vlan_features,
 					TEAM_VLAN_FEATURES);
 
+		dst_release_flag &= port->dev->priv_flags;
 		if (port->dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = port->dev->hard_header_len;
 	}
@@ -746,6 +748,9 @@ static void __team_compute_features(struct team *team)
 	team->dev->vlan_features = vlan_features;
 	team->dev->hard_header_len = max_hard_header_len;
 
+	flags = team->dev->priv_flags & ~IFF_XMIT_DST_RELEASE;
+	team->dev->priv_flags = flags | dst_release_flag;
+
 	netdev_change_features(team->dev);
 }
 
-- 
1.7.10.4
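The flag computation in the patch above amounts to AND-ing one bit across all ports and then splicing the result into the master's flags. A stand-alone sketch, with a made-up bit value (SKETCH_XMIT_DST_RELEASE) and a hypothetical helper name rather than the kernel's definitions:

```c
#include <stddef.h>

/* Hypothetical bit value, chosen only for this sketch. */
#define SKETCH_XMIT_DST_RELEASE 0x400u

/*
 * Return the master's new priv_flags: the release bit survives only if
 * every port has it set; all other master bits are left untouched.
 */
static unsigned int compute_master_flags(unsigned int master_flags,
					 const unsigned int *port_flags,
					 size_t nports)
{
	unsigned int dst_release = SKETCH_XMIT_DST_RELEASE;
	size_t i;

	for (i = 0; i < nports; i++)
		dst_release &= port_flags[i];

	return (master_flags & ~SKETCH_XMIT_DST_RELEASE) | dst_release;
}
```

Note that with no ports at all the bit stays set, which matches how the loop in the patch behaves on an empty port list.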

^ permalink raw reply related

* Re: r8169: link up, link down
From: Francois Romieu @ 2012-07-18 17:39 UTC (permalink / raw)
  To: J. Christopher Pereira; +Cc: netdev
In-Reply-To: <02b401cd64f9$115acc90$341065b0$@cl>

J. Christopher Pereira <kripper@imatronix.cl> :
[...]
> Is there any solution or workaround?

If it's an XID 98000000 - i.e. old new hardware - you may try to remove the
device then rescan the PCI bus through sysfs.

Building a modern kernel is strongly suggested if the hardware includes
a recent 816x chipset.

-- 
Ueimor

^ permalink raw reply

* Re: That's pretty much it for 3.5.0
From: Eric Dumazet @ 2012-07-18 17:55 UTC (permalink / raw)
  To: Rustad, Mark D
  Cc: Neil Horman, Fastabend, John R,
	<h@hmsreliant.think-freely.org>, David Miller,
	<netdev@vger.kernel.org>,
	<linux-wireless@vger.kernel.org>,
	<netfilter-devel@vger.kernel.org>
In-Reply-To: <205259E8-A99F-4573-96C9-7A394235B338@intel.com>

On Wed, 2012-07-18 at 17:36 +0000, Rustad, Mark D wrote:
> On Jul 18, 2012, at 6:04 AM, Neil Horman wrote:
> 
> > John, can you post the backtrace you got for this?  I replied to the patch that
> > you posted for this fix.  The cgroup subsystem has an early_init flag that's
> > supposed to prevent the initialization of cgroups that don't need initialization
> > until later (like via module_init() calls).
> 
> Here is the backtrace that I get and below a patch that fixes it:
> 
> [    0.010958] Initializing cgroup subsys net_prio
> [    0.011040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000828
> [    0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> [    0.011998] PGD 0 
> [    0.011998] Oops: 0000 [#1] SMP 
> [    0.011998] CPU 0 
> [    0.011998] Modules linked in:
> [    0.011998] 
> [    0.011998] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7-mdrlinux+ #10 Bochs Bochs
> [    0.011998] RIP: 0010:[<ffffffff814202c8>]  [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> [    0.011998] RSP: 0000:ffffffff81a01e68  EFLAGS: 00010246
> [    0.011998] RAX: 0000000000000000 RBX: fffffffffffffed0 RCX: 0000000000000000
> [    0.011998] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 2222222222222222
> [    0.011998] RBP: ffffffff81a01e88 R08: 2222222222222222 R09: 2222222222222222
> [    0.011998] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
> [    0.011998] R13: 0000000000000000 R14: ffff88007ff608c0 R15: 00000000000143d0
> [    0.011998] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> [    0.011998] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    0.011998] CR2: 0000000000000828 CR3: 0000000001a0b000 CR4: 00000000000006b0
> [    0.011998] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.011998] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    0.011998] Process swapper/0 (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a13420)
> [    0.011998] Stack:
> [    0.011998]  ffffffff81a88020 0000000000000000 ffff88007d3a38f0 fffffffffffffff4
> [    0.011998]  ffffffff81a01ec8 ffffffff814203cd ffff88007ff608c0 ffffffff817d8e9a
> [    0.011998]  ffffffff81a88cd8 ffffffff81a88020 ffffffff81a88020 ffffffff81b010a0
> [    0.011998] Call Trace:
> [    0.011998]  [<ffffffff814203cd>] cgrp_create+0x8d/0xc0
> [    0.011998]  [<ffffffff81ade14a>] cgroup_init_subsys+0x80/0x126
> [    0.011998]  [<ffffffff81ade380>] cgroup_init+0x36/0x117
> [    0.011998]  [<ffffffff81acab71>] start_kernel+0x32e/0x34f
> [    0.011998]  [<ffffffff81aca6d5>] ? repair_env_string+0x5a/0x5a
> [    0.011998]  [<ffffffff81aca346>] x86_64_start_reservations+0x101/0x105
> [    0.011998]  [<ffffffff81aca120>] ? early_idt_handlers+0x120/0x120
> [    0.011998]  [<ffffffff81aca417>] x86_64_start_kernel+0xcd/0xdc
> [    0.011998] Code: 0f 1f 00 48 8b 83 30 01 00 00 48 8d 98 d0 fe ff ff 48 3d a8 e8 52 82 74 3a e8 25 db c3 ff 85 c0 74 09 80 3d bb 3d 68 00 00 74 40 <48> 8b 83 58 09 00 00 48 85 c0 74 cc 44 3b 60 10 76 c6 44 89 e6 
> [    0.011998] RIP  [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> [    0.011998]  RSP <ffffffff81a01e68>
> [    0.011998] CR2: 0000000000000828
> [    0.012009] ---[ end trace a7919e7f17c0a725 ]---
> [    0.012601] Kernel panic - not syncing: Attempted to kill the idle task!
> 
> The following change simply statically initializes init_net.dev_base_head. I copied and pasted it into the email, so this rendering may not work, but I can send it if this approach looks reasonable. I have verified that it resolves the issue above.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0f28a9e..db1ba61 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6283,8 +6283,6 @@ static struct hlist_head *netdev_create_hash(void)
>  /* Initialize per network namespace state */
>  static int __net_init netdev_init(struct net *net)
>  {
> -       INIT_LIST_HEAD(&net->dev_base_head);
> -

	if (net != &init_net)
		INIT_LIST_HEAD(&net->dev_base_head);

>         net->dev_name_head = netdev_create_hash();
>         if (net->dev_name_head == NULL)
>                 goto err_name;
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index dddbacb..42f1e1c 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
>  LIST_HEAD(net_namespace_list);
>  EXPORT_SYMBOL_GPL(net_namespace_list);
>  
> -struct net init_net;
> +struct net init_net = {
> +       .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
> +};
>  EXPORT_SYMBOL(init_net);
>  
>  #define INITIAL_NET_GEN_PTRS   13 /* +1 for len +2 for rcu_head */
> 



^ permalink raw reply

* [PATCH v2] sctp: Implement quick failover draft from tsvwg
From: Neil Horman @ 2012-07-18 18:01 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
	linux-sctp
In-Reply-To: <1342203998-24037-1-git-send-email-nhorman@tuxdriver.com>

I've seen several recent attempts to do quick failover of sctp transports
by reducing various retransmit timers and counters.  While it's possible to
implement a faster failover on multihomed sctp associations that way, it's not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, let's implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and quickly stop using them until their reliability can be
re-established.  I've tested this out on two virt guests connected via multiple
isolated virt networks and believe it's in compliance with the above draft and
works well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org

---
Change notes:

V2)
- Added socket option API from section 6.1 of the specification, as per
request from Vlad. Adding this socket option allows us to alter both the path
maximum retransmit value and the path partial failure threshold for each
transport and the association as a whole.

- Added a per transport pf_retrans value, and initialized it from the
association value.  This makes each transport independently configurable as per
the socket option above, and prevents changes in the sysctl from bleeding into
an already created association.
---
 Documentation/networking/ip-sysctl.txt |   14 +++++
 include/net/sctp/constants.h           |    1 +
 include/net/sctp/structs.h             |   11 +++-
 include/net/sctp/user.h                |   11 ++++
 net/sctp/associola.c                   |   36 ++++++++++--
 net/sctp/outqueue.c                    |    6 +-
 net/sctp/sm_sideeffect.c               |   33 ++++++++++-
 net/sctp/socket.c                      |   96 ++++++++++++++++++++++++++++++++
 net/sctp/sysctl.c                      |    9 +++
 net/sctp/transport.c                   |    4 +-
 10 files changed, 206 insertions(+), 15 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79..c636f9c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
 
 	Default: 5
 
+pf_retrans - INTEGER
+	The number of retransmissions that will be attempted on a given path
+	before traffic is redirected to an alternate transport (should one
+	exist).  Note this is distinct from path_max_retrans, as a path that
+	passes the pf_retrans threshold can still be used.  It's only
+	deprioritized when a transmission path is selected by the stack.  This
+	setting is primarily used to enable fast failover mechanisms without
+	having to reduce path_max_retrans to a very low value.  See:
+	http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+	for details.  Note also that a value of pf_retrans > path_max_retrans
+	disables this feature.
+
+	Default: 0
+
 rto_initial - INTEGER
 	The initial round trip timeout value in milliseconds that will be used
 	in calculating round trip times.  This is the initial time interval
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 942b864..d053d2e 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -334,6 +334,7 @@ typedef enum {
 typedef enum {
 	SCTP_TRANSPORT_UP,
 	SCTP_TRANSPORT_DOWN,
+	SCTP_TRANSPORT_PF,
 } sctp_transport_cmd_t;
 
 /* These are the address scopes defined mainly for IPv4 addresses
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index e4652fe..f70726c 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -160,6 +160,7 @@ extern struct sctp_globals {
 	int max_retrans_association;
 	int max_retrans_path;
 	int max_retrans_init;
+	int pf_retrans;
 
 	/*
 	 * Policy for preforming sctp/socket accounting
@@ -258,6 +259,7 @@ extern struct sctp_globals {
 #define sctp_sndbuf_policy	 	(sctp_globals.sndbuf_policy)
 #define sctp_rcvbuf_policy	 	(sctp_globals.rcvbuf_policy)
 #define sctp_max_retrans_path		(sctp_globals.max_retrans_path)
+#define sctp_pf_retrans			(sctp_globals.pf_retrans)
 #define sctp_max_retrans_init		(sctp_globals.max_retrans_init)
 #define sctp_sack_timeout		(sctp_globals.sack_timeout)
 #define sctp_hb_interval		(sctp_globals.hb_interval)
@@ -987,10 +989,15 @@ struct sctp_transport {
 
 	/* This is the max_retrans value for the transport and will
 	 * be initialized from the assocs value.  This can be changed
-	 * using SCTP_SET_PEER_ADDR_PARAMS socket option.
+	 * using the SCTP_SET_PEER_ADDR_PARAMS socket option.
 	 */
 	__u16 pathmaxrxt;
 
+	/* This is the partially failed retrans value for the transport
+	 * and will be initialized from the assocs value.  This can be changed
+	 * using the SCTP_PEER_ADDR_THLDS socket option.
+	 */
+	int pf_retrans;
 	/* PMTU	      : The current known path MTU.  */
 	__u32 pathmtu;
 
@@ -1660,6 +1667,8 @@ struct sctp_association {
 	 */
 	int max_retrans;
 
+	int pf_retrans;
+
 	/* Maximum number of times the endpoint will retransmit INIT  */
 	__u16 max_init_attempts;
 
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 0842ef0..1b02d7a 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -93,6 +93,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_GET_ASSOC_NUMBER	28	/* Read only */
 #define SCTP_GET_ASSOC_ID_LIST	29	/* Read only */
 #define SCTP_AUTO_ASCONF       30
+#define SCTP_PEER_ADDR_THLDS	31
 
 /* Internal Socket Options. Some of the sctp library functions are
  * implemented using these socket options.
@@ -649,6 +650,7 @@ struct sctp_paddrinfo {
  */
 enum sctp_spinfo_state {
 	SCTP_INACTIVE,
+	SCTP_PF,
 	SCTP_ACTIVE,
 	SCTP_UNCONFIRMED,
 	SCTP_UNKNOWN = 0xffff  /* Value used for transport state unknown */
@@ -741,4 +743,13 @@ typedef struct {
 	int sd;
 } sctp_peeloff_arg_t;
 
+/*
+ *  Peer Address Thresholds socket option
+ */
+struct sctp_paddrthlds {
+	sctp_assoc_t spt_assoc_id;
+	struct sockaddr_storage spt_address;
+	__u16 spt_pathmaxrxt;
+	__u16 spt_pathpfthld;
+};
 #endif /* __net_sctp_user_h__ */
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 5bc9ab1..b357195 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -124,6 +124,8 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 	 * socket values.
 	 */
 	asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
+	asoc->pf_retrans  = sctp_pf_retrans;
+
 	asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
 	asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
 	asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
@@ -685,6 +687,9 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 	/* Set the path max_retrans.  */
 	peer->pathmaxrxt = asoc->pathmaxrxt;
 
+	/* And the partial failure retrans threshold */
+	peer->pf_retrans = asoc->pf_retrans;
+
 	/* Initialize the peer's SACK delay timeout based on the
 	 * association configured value.
 	 */
@@ -840,6 +845,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 	struct sctp_ulpevent *event;
 	struct sockaddr_storage addr;
 	int spc_state = 0;
+	bool ulp_notify = true;
 
 	/* Record the transition on the transport.  */
 	switch (command) {
@@ -853,6 +859,14 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 			spc_state = SCTP_ADDR_CONFIRMED;
 		else
 			spc_state = SCTP_ADDR_AVAILABLE;
+		/* Don't inform ULP about transition from PF to
+		 * active state and set cwnd to 1, see SCTP
+		 * Quick failover draft section 5.1, point 5
+		 */
+		if (transport->state == SCTP_PF) {
+			ulp_notify = false;
+			transport->cwnd = 1;
+		}
 		transport->state = SCTP_ACTIVE;
 		break;
 
@@ -871,6 +885,10 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 		spc_state = SCTP_ADDR_UNREACHABLE;
 		break;
 
+	case SCTP_TRANSPORT_PF:
+		transport->state = SCTP_PF;
+		ulp_notify = false;
+		break;
 	default:
 		return;
 	}
@@ -878,12 +896,15 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 	/* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
 	 * user.
 	 */
-	memset(&addr, 0, sizeof(struct sockaddr_storage));
-	memcpy(&addr, &transport->ipaddr, transport->af_specific->sockaddr_len);
-	event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
-				0, spc_state, error, GFP_ATOMIC);
-	if (event)
-		sctp_ulpq_tail_event(&asoc->ulpq, event);
+	if (ulp_notify) {
+		memset(&addr, 0, sizeof(struct sockaddr_storage));
+		memcpy(&addr, &transport->ipaddr,
+		       transport->af_specific->sockaddr_len);
+		event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
+					0, spc_state, error, GFP_ATOMIC);
+		if (event)
+			sctp_ulpq_tail_event(&asoc->ulpq, event);
+	}
 
 	/* Select new active and retran paths. */
 
@@ -899,7 +920,8 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 			transports) {
 
 		if ((t->state == SCTP_INACTIVE) ||
-		    (t->state == SCTP_UNCONFIRMED))
+		    (t->state == SCTP_UNCONFIRMED) ||
+		    (t->state == SCTP_PF))
 			continue;
 		if (!first || t->last_time_heard > first->last_time_heard) {
 			second = first;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index a0fa19f..e7aa177c 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 			if (!new_transport)
 				new_transport = asoc->peer.active_path;
 		} else if ((new_transport->state == SCTP_INACTIVE) ||
-			   (new_transport->state == SCTP_UNCONFIRMED)) {
+			   (new_transport->state == SCTP_UNCONFIRMED) ||
+			   (new_transport->state == SCTP_PF)) {
 			/* If the chunk is Heartbeat or Heartbeat Ack,
 			 * send it to chunk->transport, even if it's
 			 * inactive.
@@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 			new_transport = chunk->transport;
 			if (!new_transport ||
 			    ((new_transport->state == SCTP_INACTIVE) ||
-			     (new_transport->state == SCTP_UNCONFIRMED)))
+			     (new_transport->state == SCTP_UNCONFIRMED) ||
+			     (new_transport->state == SCTP_PF)))
 				new_transport = asoc->peer.active_path;
 			if (new_transport->state == SCTP_UNCONFIRMED)
 				continue;
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c96d1a8..285e26a 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type, sctp_subtype_t subtype,
 			     sctp_cmd_seq_t *commands,
 			     gfp_t gfp);
 
+static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
+				     struct sctp_transport *t);
 /********************************************************************
  * Helper functions
  ********************************************************************/
@@ -470,7 +472,8 @@ sctp_timer_event_t *sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
  * notification SHOULD be sent to the upper layer.
  *
  */
-static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
+static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
+					 struct sctp_association *asoc,
 					 struct sctp_transport *transport,
 					 int is_hb)
 {
@@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
 			transport->error_count++;
 	}
 
+	/* If the transport error count is greater than the pf_retrans
+	 * threshold, and less than pathmaxrxt, then mark this transport
+	 * as Partially Failed, see SCTP Quick Failover Draft, section 5.1,
+	 * point 1
+	 */
+	if ((transport->state != SCTP_PF) &&
+	   (asoc->pf_retrans < transport->pathmaxrxt) &&
+	   (transport->error_count > asoc->pf_retrans)) {
+
+		sctp_assoc_control_transport(asoc, transport,
+					     SCTP_TRANSPORT_PF,
+					     0);
+
+		/* Update the hb timer to resend a heartbeat every rto */
+		sctp_cmd_hb_timer_update(commands, transport);
+	}
+
 	if (transport->state != SCTP_INACTIVE &&
 	    (transport->error_count > transport->pathmaxrxt)) {
 		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
@@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 					     SCTP_HEARTBEAT_SUCCESS);
 	}
 
+	if (t->state == SCTP_PF)
+		sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
+					     SCTP_HEARTBEAT_SUCCESS);
+
 	/* The receiver of the HEARTBEAT ACK should also perform an
 	 * RTT measurement for that destination transport address
 	 * using the time value carried in the HEARTBEAT ACK chunk.
@@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 
 		case SCTP_CMD_STRIKE:
 			/* Mark one strike against a transport.  */
-			sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
-						    0);
+			sctp_do_8_2_transport_strike(commands, asoc,
+						    cmd->obj.transport, 0);
 			break;
 
 		case SCTP_CMD_TRANSPORT_IDLE:
@@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 
 		case SCTP_CMD_TRANSPORT_HB_SENT:
 			t = cmd->obj.transport;
-			sctp_do_8_2_transport_strike(asoc, t, 1);
+			sctp_do_8_2_transport_strike(commands, asoc,
+						     t, 1);
 			t->hb_sent = 1;
 			break;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index b3b8a8d..dfffece 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3470,6 +3470,52 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
 }
 
 
+/*
+ * SCTP_PEER_ADDR_THLDS
+ *
+ * This option allows us to alter the partially failed threshold for one or all
+ * transports in an association.  See Section 6.1 of:
+ * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+ */
+static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
+					    char __user *optval,
+					    unsigned int optlen)
+{
+	struct sctp_paddrthlds val;
+	struct sctp_transport *trans;
+	struct sctp_association *asoc;
+
+	if (optlen < sizeof(struct sctp_paddrthlds))
+		return -EINVAL;
+	if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
+			   sizeof(struct sctp_paddrthlds)))
+		return -EFAULT;
+
+	if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
+		asoc = sctp_id2assoc(sk, val.spt_assoc_id);
+		if (!asoc)
+			return -ENOENT;
+		list_for_each_entry(trans, &asoc->peer.transport_addr_list,
+				    transports) {
+			trans->pathmaxrxt = val.spt_pathmaxrxt;
+			trans->pf_retrans = val.spt_pathpfthld;
+		}
+
+		asoc->pf_retrans = val.spt_pathpfthld;
+		asoc->pathmaxrxt = val.spt_pathmaxrxt;
+	} else {
+		trans = sctp_addr_id2transport(sk, &val.spt_address,
+					       val.spt_assoc_id);
+		if (!trans)
+			return -ENOENT;
+
+		trans->pathmaxrxt = val.spt_pathmaxrxt;
+		trans->pf_retrans = val.spt_pathpfthld;
+	}
+
+	return 0;
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -3619,6 +3665,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
 	case SCTP_AUTO_ASCONF:
 		retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
 		break;
+	case SCTP_PEER_ADDR_THLDS:
+		retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
@@ -5490,6 +5539,50 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len,
 	return 0;
 }
 
+/*
+ * SCTP_PEER_ADDR_THLDS
+ *
+ * This option allows us to fetch the partially failed threshold for one or all
+ * transports in an association.  See Section 6.1 of:
+ * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+ */
+static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
+					    char __user *optval,
+					    int optlen)
+{
+	struct sctp_paddrthlds val;
+	struct sctp_transport *trans;
+	struct sctp_association *asoc;
+
+	if (optlen < sizeof(struct sctp_paddrthlds))
+		return -EINVAL;
+	if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, sizeof(val)))
+		return -EFAULT;
+
+	if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
+		/* Wildcard address: report the association-wide values */
+		asoc = sctp_id2assoc(sk, val.spt_assoc_id);
+		if (!asoc)
+			return -ENOENT;
+
+		val.spt_pathpfthld = asoc->pf_retrans;
+		val.spt_pathmaxrxt = asoc->pathmaxrxt;
+	} else {
+		trans = sctp_addr_id2transport(sk, &val.spt_address,
+					       val.spt_assoc_id);
+		if (!trans)
+			return -ENOENT;
+
+		val.spt_pathmaxrxt = trans->pathmaxrxt;
+		val.spt_pathpfthld = trans->pf_retrans;
+	}
+
+	if (copy_to_user(optval, &val, sizeof(val)))
+		return -EFAULT;
+
+	return 0;
+}
+
 SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 				char __user *optval, int __user *optlen)
 {
@@ -5628,6 +5721,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 	case SCTP_AUTO_ASCONF:
 		retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
 		break;
+	case SCTP_PEER_ADDR_THLDS:
+		retval = sctp_getsockopt_paddr_thresholds(sk, optval, len);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index e5fe639..2b2bfe9 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
 		.extra2		= &int_max
 	},
 	{
+		.procname	= "pf_retrans",
+		.data		= &sctp_pf_retrans,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &int_max
+	},
+	{
 		.procname	= "max_init_retransmits",
 		.data		= &sctp_max_retrans_init,
 		.maxlen		= sizeof(int),
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index b026ba0..194d0f3 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -85,6 +85,7 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
 
 	/* Initialize the default path max_retrans.  */
 	peer->pathmaxrxt  = sctp_max_retrans_path;
+	peer->pf_retrans  = sctp_pf_retrans;
 
 	INIT_LIST_HEAD(&peer->transmitted);
 	INIT_LIST_HEAD(&peer->send_ready);
@@ -585,7 +586,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t)
 {
 	unsigned long timeout;
 	timeout = t->rto + sctp_jitter(t->rto);
-	if (t->state != SCTP_UNCONFIRMED)
+	if ((t->state != SCTP_UNCONFIRMED) &&
+	    (t->state != SCTP_PF))
 		timeout += t->hbinterval;
 	timeout += jiffies;
 	return timeout;
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH net-next v4] ipv6: add ipv6_addr_hash() helper
From: Eric Dumazet @ 2012-07-18 18:11 UTC (permalink / raw)
  To: David Miller; +Cc: Joe Perches, netdev, Andrew McGregor, Dave Taht, Tom Herbert
In-Reply-To: <1342621670.2626.2818.camel@edumazet-glaptop>

From: Eric Dumazet <edumazet@google.com>

Introduce ipv6_addr_hash() helper doing an XOR on all bits
of an IPv6 address, with an optimized version for 64-bit
architectures with efficient unaligned access (e.g. x86_64).

Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector).

Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.

Use it in sunrpc and use more bit shuffling, using hash_32().

Use it in net/ipv6/addrconf.c, using hash_32() as well.

As a cleanup, use it in net/ipv4/tcp_metrics.c.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Joe Perches <joe@perches.com>
---
v4: net/ipv6/addrconf.c part, sorry again David
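
For readers following along outside the kernel tree, here is a rough
user-space sketch of the folding described above. It is not the kernel
code: struct addr128, fold_words32(), fold_words64() and bucket_of() are
hypothetical names, and the multiplier in bucket_of() is assumed to match
the 32-bit constant that hash_32() used at the time. It only illustrates
that the two XOR variants give the same 32-bit value, and how that value
can then be reduced to a table index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-in for struct in6_addr: 16 raw address bytes. */
struct addr128 {
	uint8_t b[16];
};

/* Portable variant: XOR the four 32-bit words of the address. */
static uint32_t fold_words32(const struct addr128 *a)
{
	uint32_t w[4];

	memcpy(w, a->b, sizeof(w));	/* memcpy sidesteps alignment issues */
	return w[0] ^ w[1] ^ w[2] ^ w[3];
}

/* 64-bit variant: XOR the two halves, then fold the result to 32 bits. */
static uint32_t fold_words64(const struct addr128 *a)
{
	uint64_t q[2];
	uint64_t x;

	memcpy(q, a->b, sizeof(q));
	x = q[0] ^ q[1];
	return (uint32_t)(x ^ (x >> 32));
}

/*
 * Multiplicative reduction to a table index of 'bits' bits, in the
 * style of hash_32(). The constant is an assumption, not taken from
 * this patch.
 */
static uint32_t bucket_of(uint32_t hash, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(hash * 0x9e370001u) >> (32 - bits);
}
```

Both fold variants XOR the same four 32-bit words, only grouped
differently, so they agree regardless of host byte order. A caller
with a 16-entry table would use bucket_of(fold_words32(&a), 4).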

 include/net/addrconf.h    |    3 ++-
 include/net/ipv6.h        |   13 +++++++++++++
 net/core/flow_dissector.c |    5 +++--
 net/ipv4/tcp_metrics.c    |   15 +++------------
 net/ipv6/addrconf.c       |   21 ++++++++-------------
 net/ipv6/ip6_tunnel.c     |   20 ++++++++++++--------
 net/sunrpc/svcauth_unix.c |   22 ++++------------------
 7 files changed, 45 insertions(+), 54 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index f2b801c..089a09d 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -46,7 +46,8 @@ struct prefix_info {
 #include <net/if_inet6.h>
 #include <net/ipv6.h>
 
-#define IN6_ADDR_HSIZE		16
+#define IN6_ADDR_HSIZE_SHIFT	4
+#define IN6_ADDR_HSIZE		(1 << IN6_ADDR_HSIZE_SHIFT)
 
 extern int			addrconf_init(void);
 extern void			addrconf_cleanup(void);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index f695f39..01c34b3 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -419,6 +419,19 @@ static inline bool ipv6_addr_any(const struct in6_addr *a)
 #endif
 }
 
+static inline u32 ipv6_addr_hash(const struct in6_addr *a)
+{
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
+	const unsigned long *ul = (const unsigned long *)a;
+	unsigned long x = ul[0] ^ ul[1];
+
+	return (u32)(x ^ (x >> 32));
+#else
+	return (__force u32)(a->s6_addr32[0] ^ a->s6_addr32[1] ^
+			     a->s6_addr32[2] ^ a->s6_addr32[3]);
+#endif
+}
+
 static inline bool ipv6_addr_loopback(const struct in6_addr *a)
 {
 	return (a->s6_addr32[0] | a->s6_addr32[1] |
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index a225089..466820b 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -4,6 +4,7 @@
 #include <linux/ipv6.h>
 #include <linux/if_vlan.h>
 #include <net/ip.h>
+#include <net/ipv6.h>
 #include <linux/if_tunnel.h>
 #include <linux/if_pppox.h>
 #include <linux/ppp_defs.h>
@@ -55,8 +56,8 @@ ipv6:
 			return false;
 
 		ip_proto = iph->nexthdr;
-		flow->src = iph->saddr.s6_addr32[3];
-		flow->dst = iph->daddr.s6_addr32[3];
+		flow->src = (__force __be32)ipv6_addr_hash(&iph->saddr);
+		flow->dst = (__force __be32)ipv6_addr_hash(&iph->daddr);
 		nhoff += sizeof(struct ipv6hdr);
 		break;
 	}
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 5a38a2d..1a115b6 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -211,10 +211,7 @@ static struct tcp_metrics_block *__tcp_get_metrics_req(struct request_sock *req,
 		break;
 	case AF_INET6:
 		*(struct in6_addr *)addr.addr.a6 = inet6_rsk(req)->rmt_addr;
-		hash = ((__force unsigned int) addr.addr.a6[0] ^
-			(__force unsigned int) addr.addr.a6[1] ^
-			(__force unsigned int) addr.addr.a6[2] ^
-			(__force unsigned int) addr.addr.a6[3]);
+		hash = ipv6_addr_hash(&inet6_rsk(req)->rmt_addr);
 		break;
 	default:
 		return NULL;
@@ -251,10 +248,7 @@ static struct tcp_metrics_block *__tcp_get_metrics_tw(struct inet_timewait_sock
 	case AF_INET6:
 		tw6 = inet6_twsk((struct sock *)tw);
 		*(struct in6_addr *)addr.addr.a6 = tw6->tw_v6_daddr;
-		hash = ((__force unsigned int) addr.addr.a6[0] ^
-			(__force unsigned int) addr.addr.a6[1] ^
-			(__force unsigned int) addr.addr.a6[2] ^
-			(__force unsigned int) addr.addr.a6[3]);
+		hash = ipv6_addr_hash(&tw6->tw_v6_daddr);
 		break;
 	default:
 		return NULL;
@@ -291,10 +285,7 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk,
 		break;
 	case AF_INET6:
 		*(struct in6_addr *)addr.addr.a6 = inet6_sk(sk)->daddr;
-		hash = ((__force unsigned int) addr.addr.a6[0] ^
-			(__force unsigned int) addr.addr.a6[1] ^
-			(__force unsigned int) addr.addr.a6[2] ^
-			(__force unsigned int) addr.addr.a6[3]);
+		hash = ipv6_addr_hash(&inet6_sk(sk)->daddr);
 		break;
 	default:
 		return NULL;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8f6411c..7918181 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -63,6 +63,7 @@
 #include <linux/delay.h>
 #include <linux/notifier.h>
 #include <linux/string.h>
+#include <linux/hash.h>
 
 #include <net/net_namespace.h>
 #include <net/sock.h>
@@ -579,15 +580,9 @@ ipv6_link_dev_addr(struct inet6_dev *idev, struct inet6_ifaddr *ifp)
 	list_add_tail(&ifp->if_list, p);
 }
 
-static u32 ipv6_addr_hash(const struct in6_addr *addr)
+static u32 inet6_addr_hash(const struct in6_addr *addr)
 {
-	/*
-	 * We perform the hash function over the last 64 bits of the address
-	 * This will include the IEEE address token on links that support it.
-	 */
-	return jhash_2words((__force u32)addr->s6_addr32[2],
-			    (__force u32)addr->s6_addr32[3], 0)
-		& (IN6_ADDR_HSIZE - 1);
+	return hash_32(ipv6_addr_hash(addr), IN6_ADDR_HSIZE_SHIFT);
 }
 
 /* On success it returns ifp with increased reference count */
@@ -662,7 +657,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen,
 	in6_ifa_hold(ifa);
 
 	/* Add to big hash table */
-	hash = ipv6_addr_hash(addr);
+	hash = inet6_addr_hash(addr);
 
 	hlist_add_head_rcu(&ifa->addr_lst, &inet6_addr_lst[hash]);
 	spin_unlock(&addrconf_hash_lock);
@@ -1270,7 +1265,7 @@ int ipv6_chk_addr(struct net *net, const struct in6_addr *addr,
 {
 	struct inet6_ifaddr *ifp;
 	struct hlist_node *node;
-	unsigned int hash = ipv6_addr_hash(addr);
+	unsigned int hash = inet6_addr_hash(addr);
 
 	rcu_read_lock_bh();
 	hlist_for_each_entry_rcu(ifp, node, &inet6_addr_lst[hash], addr_lst) {
@@ -1293,7 +1288,7 @@ EXPORT_SYMBOL(ipv6_chk_addr);
 static bool ipv6_chk_same_addr(struct net *net, const struct in6_addr *addr,
 			       struct net_device *dev)
 {
-	unsigned int hash = ipv6_addr_hash(addr);
+	unsigned int hash = inet6_addr_hash(addr);
 	struct inet6_ifaddr *ifp;
 	struct hlist_node *node;
 
@@ -1336,7 +1331,7 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add
 				     struct net_device *dev, int strict)
 {
 	struct inet6_ifaddr *ifp, *result = NULL;
-	unsigned int hash = ipv6_addr_hash(addr);
+	unsigned int hash = inet6_addr_hash(addr);
 	struct hlist_node *node;
 
 	rcu_read_lock_bh();
@@ -3223,7 +3218,7 @@ int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr)
 	int ret = 0;
 	struct inet6_ifaddr *ifp = NULL;
 	struct hlist_node *n;
-	unsigned int hash = ipv6_addr_hash(addr);
+	unsigned int hash = inet6_addr_hash(addr);
 
 	rcu_read_lock_bh();
 	hlist_for_each_entry_rcu_bh(ifp, n, &inet6_addr_lst[hash], addr_lst) {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index db32846..9a1d5fe 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -40,6 +40,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/netfilter_ipv6.h>
 #include <linux/slab.h>
+#include <linux/hash.h>
 
 #include <asm/uaccess.h>
 #include <linux/atomic.h>
@@ -70,11 +71,15 @@ MODULE_ALIAS_NETDEV("ip6tnl0");
 #define IPV6_TCLASS_MASK (IPV6_FLOWINFO_MASK & ~IPV6_FLOWLABEL_MASK)
 #define IPV6_TCLASS_SHIFT 20
 
-#define HASH_SIZE  32
+#define HASH_SIZE_SHIFT  5
+#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
 
-#define HASH(addr) ((__force u32)((addr)->s6_addr32[0] ^ (addr)->s6_addr32[1] ^ \
-		     (addr)->s6_addr32[2] ^ (addr)->s6_addr32[3]) & \
-		    (HASH_SIZE - 1))
+static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
+{
+	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
+
+	return hash_32(hash, HASH_SIZE_SHIFT);
+}
 
 static int ip6_tnl_dev_init(struct net_device *dev);
 static void ip6_tnl_dev_setup(struct net_device *dev);
@@ -166,12 +171,11 @@ static inline void ip6_tnl_dst_store(struct ip6_tnl *t, struct dst_entry *dst)
 static struct ip6_tnl *
 ip6_tnl_lookup(struct net *net, const struct in6_addr *remote, const struct in6_addr *local)
 {
-	unsigned int h0 = HASH(remote);
-	unsigned int h1 = HASH(local);
+	unsigned int hash = HASH(remote, local);
 	struct ip6_tnl *t;
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 
-	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[h0 ^ h1]) {
+	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
 		    ipv6_addr_equal(remote, &t->parms.raddr) &&
 		    (t->dev->flags & IFF_UP))
@@ -205,7 +209,7 @@ ip6_tnl_bucket(struct ip6_tnl_net *ip6n, const struct ip6_tnl_parm *p)
 
 	if (!ipv6_addr_any(remote) || !ipv6_addr_any(local)) {
 		prio = 1;
-		h = HASH(remote) ^ HASH(local);
+		h = HASH(remote, local);
 	}
 	return &ip6n->tnls[prio][h];
 }
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 2777fa8..4d01292 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -104,23 +104,9 @@ static void ip_map_put(struct kref *kref)
 	kfree(im);
 }
 
-#if IP_HASHBITS == 8
-/* hash_long on a 64 bit machine is currently REALLY BAD for
- * IP addresses in reverse-endian (i.e. on a little-endian machine).
- * So use a trivial but reliable hash instead
- */
-static inline int hash_ip(__be32 ip)
-{
-	int hash = (__force u32)ip ^ ((__force u32)ip>>16);
-	return (hash ^ (hash>>8)) & 0xff;
-}
-#endif
-static inline int hash_ip6(struct in6_addr ip)
+static inline int hash_ip6(const struct in6_addr *ip)
 {
-	return (hash_ip(ip.s6_addr32[0]) ^
-		hash_ip(ip.s6_addr32[1]) ^
-		hash_ip(ip.s6_addr32[2]) ^
-		hash_ip(ip.s6_addr32[3]));
+	return hash_32(ipv6_addr_hash(ip), IP_HASHBITS);
 }
 static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
 {
@@ -301,7 +287,7 @@ static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char *class,
 	ip.m_addr = *addr;
 	ch = sunrpc_cache_lookup(cd, &ip.h,
 				 hash_str(class, IP_HASHBITS) ^
-				 hash_ip6(*addr));
+				 hash_ip6(addr));
 
 	if (ch)
 		return container_of(ch, struct ip_map, h);
@@ -331,7 +317,7 @@ static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm,
 	ip.h.expiry_time = expiry;
 	ch = sunrpc_cache_update(cd, &ip.h, &ipm->h,
 				 hash_str(ipm->m_class, IP_HASHBITS) ^
-				 hash_ip6(ipm->m_addr));
+				 hash_ip6(&ipm->m_addr));
 	if (!ch)
 		return -ENOMEM;
 	cache_put(ch, cd);

^ permalink raw reply related

* pull request: sfc-next 2012-07-18
From: Ben Hutchings @ 2012-07-18 18:16 UTC (permalink / raw)
  To: David Miller; +Cc: linux-net-drivers, netdev, Andrew Jackson, Richard Cochran


The following changes since commit f150bd7f8cf742c4cdd0c929aa494ef72f7f5b13:

  driver: net: ethernet: cpsw: runtime PM support (2012-07-18 09:40:54 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next.git for-davem

(commit 2482aab313c3657557f1dad4c1b20c0934b84919)

This adds PTP hardware timestamping support for SFC9000-family
controllers with the appropriate peripheral (currently SFN5322F and
SFN6322F boards).

Ben.

Ben Hutchings (3):
      sfc: Support variable-length response to MCDI GET_BOARD_CFG
      sfc: Expose FPGA bitfile partition through MTD
      sfc: Bump version to 3.2

Stuart Hodgson (4):
      sfc: Add explicit RX queue flag to channel
      sfc: Add channel specific receive_skb handler and post_remove callback
      sfc: Allow efx_mcdi_rpc to be called in two parts
      sfc: Add support for IEEE-1588 PTP

 drivers/net/ethernet/sfc/Kconfig      |    7 +
 drivers/net/ethernet/sfc/Makefile     |    1 +
 drivers/net/ethernet/sfc/efx.c        |   17 +-
 drivers/net/ethernet/sfc/efx.h        |    1 +
 drivers/net/ethernet/sfc/ethtool.c    |    1 +
 drivers/net/ethernet/sfc/mcdi.c       |   44 +-
 drivers/net/ethernet/sfc/mcdi.h       |    6 +
 drivers/net/ethernet/sfc/mcdi_pcol.h  |    1 +
 drivers/net/ethernet/sfc/mtd.c        |    7 +-
 drivers/net/ethernet/sfc/net_driver.h |   29 +-
 drivers/net/ethernet/sfc/nic.h        |   31 +
 drivers/net/ethernet/sfc/ptp.c        | 1519 +++++++++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/rx.c         |   20 +-
 drivers/net/ethernet/sfc/siena.c      |    1 +
 drivers/net/ethernet/sfc/tx.c         |    6 +
 15 files changed, 1673 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/sfc/ptp.c

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH] sunrpc: clnt: Add missing braces
From: Joe Perches @ 2012-07-18 18:17 UTC (permalink / raw)
  To: Trond Myklebust, J. Bruce Fields, Chuck Lever
  Cc: David S. Miller, linux-nfs, netdev, linux-kernel

Add a missing set of braces that commit 4e0038b6b24
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
---
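
A small user-space sketch of why the braces matter, for anyone skimming.
The struct and function names below are hypothetical; the counters merely
stand in for rcu_read_lock(), printk() and rcu_read_unlock(). Without
braces only the first statement is guarded by the condition, so in the
non-chatty case the message and the unlock would still run, with no
matching lock.

```c
/* Hypothetical counters standing in for the three calls in the hunk. */
struct trace {
	int locks;
	int messages;
	int unlocks;
};

/* What the compiler sees without braces: only the lock is conditional. */
static void timeout_unbraced(struct trace *t, int chatty)
{
	if (chatty)
		t->locks++;
	/* These two were indented as if guarded, but always execute. */
	t->messages++;
	t->unlocks++;
}

/* With braces, all three statements are guarded together. */
static void timeout_braced(struct trace *t, int chatty)
{
	if (chatty) {
		t->locks++;
		t->messages++;
		t->unlocks++;
	}
}
```

With chatty == 0 the unbraced version ends with locks == 0 and
unlocks == 1, which is the imbalance the patch removes.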
 net/sunrpc/clnt.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index f56f045..aaf70aa 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1844,12 +1844,13 @@ call_timeout(struct rpc_task *task)
 		return;
 	}
 	if (RPC_IS_SOFT(task)) {
-		if (clnt->cl_chatty)
+		if (clnt->cl_chatty) {
 			rcu_read_lock();
 			printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
 				clnt->cl_protname,
 				rcu_dereference(clnt->cl_xprt)->servername);
 			rcu_read_unlock();
+		}
 		if (task->tk_flags & RPC_TASK_TIMEOUT)
 			rpc_exit(task, -ETIMEDOUT);
 		else

^ permalink raw reply related