Netdev List


* RE: [PATCH net-next] bnx2x: dont use netdev_alloc_skb()
From: Vladislav Zolotarov @ 2010-10-12  9:12 UTC
  To: Eric Dumazet, Tom Herbert
  Cc: David Miller, netdev, Michael Chan, Eilon Greenstein
In-Reply-To: <1286860560.30423.186.camel@edumazet-laptop>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Eric Dumazet
> Sent: Tuesday, October 12, 2010 7:16 AM
> To: Tom Herbert
> Cc: David Miller; netdev; Michael Chan; Eilon Greenstein
> Subject: Re: [PATCH net-next] bnx2x: dont use netdev_alloc_skb()
> 
> On Monday, 11 October 2010 at 22:03 -0700, Tom Herbert wrote:
> > On Mon, Oct 11, 2010 at 4:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Tuesday, 12 October 2010 at 01:03 +0200, Eric Dumazet wrote:
> > >> netdev_alloc_skb() is a very wrong interface, really.
> > >>
> > >> We should remove/deprecate it.
> > >>
> > >> For multi queue devices, it makes more sense to allocate skb on local
> > >> node of the cpu handling RX interrupts. This allows each cpu to
> > >> manipulate its own slub/slab queues/structures without doing expensive
> > >> cross-node business.
> > >>
> > >> For non multi queue devices, IRQ affinity should be set so that a cpu
> > >> close to the device services interrupts. Even if not set, using
> > >> dev_alloc_skb() is faster.
> > >>
> > >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > >
> > > Or maybe revert :
> > >
> > > commit b30973f877fea1a3fb84e05599890fcc082a88e5
> > > Author: Christoph Hellwig <hch@lst.de>
> > > Date:   Wed Dec 6 20:32:36 2006 -0800
> > >
> >
> > I second this revert.  Node aware allocation by device's node makes
> > little sense on a multi-queue device and leads to mediocre
> > performance.
> 
> Yes, I said this several times in the past, I believe the time has come to
> get rid of it.
> 
> I posted a patch some minutes ago, so you can review it and ack it ;)
> 
> Thanks !

Eric, very nice patch, thanks. However, Eilon is on vacation until tomorrow,
and he is the responsible maintainer of bnx2x, so the final ACK should
be his.

Dave, if it can wait until tomorrow, I'd like to ask you to wait for Eilon's
final ACK.

Thanks,
vlad




* Re: tbf/htb qdisc limitations
From: Jarek Poplawski @ 2010-10-12 10:10 UTC
  To: Steven Brudenell; +Cc: netdev
In-Reply-To: <AANLkTin_jPp+WEkyJkD_KZyrJ4WX_6dM5wtFNw_ixuzv@mail.gmail.com>

On Mon, Oct 11, 2010 at 06:27:25PM -0400, Steven Brudenell wrote:
> > I'm not sure you checked how the "burst" works, and doubt it could
> > help you here. Anyway, do you think: rate 2KB/s with burst 5GB
> > config would be useful for you?
> 
> i actually really do want something like 2KB/s with 5GB burst
> (modifying parameters such that burst + rate * 30 days <= 5GB, but you
> get the idea). but this isn't possible given the implementation:
> 
> i see that overall, virtual "tokens" map to "scheduler ticks", where a
> "scheduler tick" is 64ns. (net/sched/sch_{tbf,htb}.c,
> include/net/pkt_sched.h -- these 64ns units are called "ticks" despite
> being unrelated to HZ). the "burst" parameter is also stored and
> passed from userspace as a u32. so, the maximum configurable burst in
> both cases is rate * 275s, since we can only track 275s worth of
> "scheduler ticks" in a u32 ( (1<<32) / NSEC_PER_SEC * 64 =~ 275s ).

Right. It was a compromise to allow higher rates for "common" use
without u64 changes. It can still be tuned with PSCHED_SHIFT to give you
more burst, but I doubt the tbf/htb authors expected monthly values here.
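
As a rough check of the numbers quoted above, the following standalone C
sketch redoes the arithmetic in userspace. It assumes, as stated in the
thread, a 64 ns scheduler tick and a u32 tick counter. The names
max_burst_seconds() and max_burst_bytes() are made up for illustration
and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions taken from the thread: 64 ns per tick, u32 tick counter. */
#define TICK_NS        64ULL
#define NSEC_PER_SEC   1000000000ULL
#define TICKS_PER_SEC  (NSEC_PER_SEC / TICK_NS)   /* 15625000, exact */
#define MAX_TICKS      (1ULL << 32)               /* range of a u32 */

/* Longest interval, in whole seconds, that fits in a u32 tick count. */
static uint64_t max_burst_seconds(void)
{
	return MAX_TICKS / TICKS_PER_SEC;
}

/* Largest configurable burst, in bytes, at the given rate (bytes/s). */
static uint64_t max_burst_bytes(uint64_t rate)
{
	/* rate * 2^32 must fit in 64 bits. */
	assert(rate <= UINT32_MAX);
	return rate * MAX_TICKS / TICKS_PER_SEC;
}
```

Calling max_burst_bytes(2048) for the 2KB/s case gives a ceiling of
roughly 550 KB, several orders of magnitude short of the 5GB burst
discussed in this thread.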

> 
> > My proposal is you don't bother with 1) and 2), but first do the
> > hack in tbf or htb directly, using or omitting rate tables, how
> > you like, and test this idea.
> 
> i'll give it a shot, though given that i hate writing the same code
> twice, i would prefer to know the right way to change netlink before i
> write a functional test.
> 
> due to the implementation coupling i don't see any way to make any
> permanent change *without* changing the netlink interface -- even
> changing that u32 to a u64, which would only need to be a u64 in
> userspace because userspace does the munging today!
> 
> (what's worse, today userspace has to specify the full rate table over
> netlink, instead of just specifying the rate and having the kernel
> driver compute the table or whatever other data structure it deems
> necessary. i think decoupling interface from implementation is a
> worthy goal by itself. if they were decoupled, i could have just coded
> a patch and not bothered y'all in the first place....)
> 
> > But it seems the right way is to collect monthly stats with some
> > userspace tool and change qdisc config dynamically. You might
> > look at network admins' lists for example scripts from small ISPs
> > doing such nasty things to their users, or have a look at ppp
> > accounting tools.
> 
> <non technical sidetrack>
> i disagree outright that a userspace tool is the "right" way to solve
> my constraints.
> 
> my constraints are:
> 1) i need to guarantee i never ever go over the monthly transfer limit
> (bad experiences with Comcast... you can check out of Red Tape Hotel
> any time you like, but you can never leave).
> 2) i want to be able to transfer short bursts at top speed whenever
> possible (that's what i'm paying for in the first place).
> 3) i need to ration transfer usage so i am never stuck in a situation
> of being limited to snail speeds until the end of the month (on a
> Comcast connection in my area, i can reasonably sustain 2MB/sec
> downstream, which eats 250GB in ~36 hours, so this constraint becomes
> important).
> 
> tbf with a large burst size seems ideal for my constraints. i can't
> quantify this, but it seems like no simpler strategy satisfies the
> constraints well and no more complex strategy is necessary. i think
> any userspace solution i could write would end up trying to emulate
> tbf with large burst.
> 
> a userspace tool updating qdisc parameters, even if run in an infinite
> loop, would always have a big chunky time resolution compared to an
> inline packet shaper (which is important for #2, and for #1 to a
> degree). i could write a packet shaper in userspace, but this does not
> make sense given that kernel qos already exists, and already has a
> tbf implementation that just needs a little love.
> </non technical sidetrack>
> 
> given all that, i'd just like to know
> 
> 1) whether it's forbidden or bad to do floating point math in a packet
> scheduler, and

Yes, it's not allowed according to Documentation/HOWTO. Btw, as you
can see e.g. in sch_hfsc comments, 64-bit division is avoided too.
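
To illustrate the kind of integer-only arithmetic this implies, here is
a userspace sketch that converts a packet length to a transmission time
without floating point and without a division on the per-packet path.
It is not how sch_tbf or sch_htb do it (they use the rate tables
mentioned earlier in the thread). The struct, the function names and
the fixed shift of 8 are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define RATE_SHIFT   8   /* hypothetical fixed-point precision */

/* Hypothetical precomputed rate: ns per byte, scaled by 2^RATE_SHIFT. */
struct rate_cfg {
	uint64_t mult;
};

/* Slow path, run once at configuration time; the only division is here. */
static struct rate_cfg rate_cfg_init(uint64_t rate_bytes_per_sec)
{
	struct rate_cfg cfg;

	assert(rate_bytes_per_sec > 0);
	cfg.mult = (NSEC_PER_SEC << RATE_SHIFT) / rate_bytes_per_sec;
	return cfg;
}

/* Fast path, run per packet: one multiply and one shift. */
static uint64_t len_to_ns(const struct rate_cfg *cfg, uint32_t len)
{
	/* mult is below 2^38, so len below 2^26 cannot overflow 64 bits. */
	assert(len < (1U << 26));
	return ((uint64_t)len * cfg->mult) >> RATE_SHIFT;
}
```

At 2048 bytes/s, a 1500-byte packet costs 1500/2048 of a second, and
len_to_ns() returns that value in nanoseconds. Results are truncated,
so with such a small shift very high rates lose precision; a real
implementation would pick the shift per rate.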

> 
> 2) the best way to go about making breaking changes to netlink.

I can only say there is no versioning, but backward compatibility
is crucial, so you need to do some tricks or data duplication.
You could probably try to get opinions about it with an RFC on
moving tbf and htb schedulers to 64 bits if you're interested
(decoupling it from your specific burst problem).

Jarek P.


* Re: [PATCH net-next] net:  allocate skbs on local node
From: Pekka Enberg @ 2010-10-12 11:08 UTC
  To: Andrew Morton
  Cc: Eric Dumazet, David Miller, netdev, Michael Chan,
	Eilon Greenstein, Christoph Hellwig, Christoph Lameter,
	David Rientjes, LKML, Nick Piggin
In-Reply-To: <20101012005856.994bea6d.akpm@linux-foundation.org>

On 10/12/10 10:58 AM, Andrew Morton wrote:
> On Tue, 12 Oct 2010 09:49:53 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> On Tuesday, 12 October 2010 at 00:24 -0700, Andrew Morton wrote:
>>
>>> I'd love to forget it, but it's faster for some things (I forget
>>> which).  Which is why it's still around.
>>
>> Yes, two years ago it was true on pathological/obscure cases.
>> Every time I did the comparison, SLUB won.
>> You asked me, I did yet another test this morning, and 40% is pretty
>> serious, I believe.
>>
>>> And the ghastly thing about this is that you're forced to care about it
>>> too because some people are, apparently, still using it.
>>
>> Yes, some people (in my company) still use linux 2.6.9 32bit on HP G6/G7
>> machines, I know...
>>
>> I am not saying we should not care, but for any serious network workload
>> on NUMA arches, SLUB is the best, and seeing Christoph recent work, it
>> might even get better.
>>
>> BTW, I believe all modern distros ship SLUB, don't they?
>
> Dunno.
>
> Pekka, why haven't we deleted slab yet??

To make a long story short, we still have relevant performance 
regressions that need to be taken care of. The most interesting one is a 
regression in netperf TCP_RR that's been reported by David Rientjes a 
while back. There's a bunch of SLUB cleanups queued for 2.6.37 that pave
the way for Christoph's SLUB queueing work that should hopefully fix 
that particular issue for 2.6.38.

There's little point in discussing the removal of SLAB as long as there 
are performance regressions for real workloads from people who are 
willing to share results and test patches. I'm optimistic that we'll be 
able to try removing SLAB some time next year unless something 
interesting pops up...

			Pekka


* [RFC arm-kernel 0/4] caif-u5500: CAIF shared memory/mailbox driver
From: Sjur Braendeland @ 2010-10-12 11:10 UTC
  To: davem, Russell King
  Cc: stefan.xk.nilsson, linus.walleij, linux-arm-kernel, netdev,
	kim.lilliestierna, daniel.martensson, Sjur Brændeland

From: Sjur Brændeland <sjur.brandeland@stericsson.com>

This patch-set introduces the CAIF protocol driver for shared memory
over mailbox for u5500.

It implements the driver for transporting CAIF frames between the DB5500
modem portion and host side for the u5500 architecture.

These patches depend on the patch-set "U5500 mailbox driver and
modem irq handler" submitted on the linux-arm-kernel list by Stefan Nilsson:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6417/1

They apply cleanly on the latest linux-next git, but depend on
the U5500 mailbox driver for compilation.

Please note that the size and address of the shared memory are specified as
module parameters: shm_total_size, shm_start.

Regards,
Sjur Braendeland

Amarnath Revanna (4):
  caif-u5500: Adding shared memory include
  caif-u5500: CAIF shared memory transport protocol
  caif-u5500: CAIF shared memory mailbox interface
  caif-u5500: Build config for CAIF shared mem driver

 drivers/net/caif/Kconfig          |    7 +
 drivers/net/caif/Makefile         |    4 +
 drivers/net/caif/caif_shm_u5500.c |  129 ++++++
 drivers/net/caif/caif_shmcore.c   |  792 +++++++++++++++++++++++++++++++++++++
 include/net/caif/caif_shm.h       |   26 ++
 5 files changed, 958 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/caif/caif_shm_u5500.c
 create mode 100644 drivers/net/caif/caif_shmcore.c
 create mode 100644 include/net/caif/caif_shm.h


* [RFC arm-kernel 1/4] caif-u5500: Adding shared memory include
From: Sjur Braendeland @ 2010-10-12 11:10 UTC
  To: davem, Russell King
  Cc: stefan.xk.nilsson, linus.walleij, linux-arm-kernel, netdev,
	kim.lilliestierna, daniel.martensson, Amarnath Revanna,
	Sjur Brændeland
In-Reply-To: <1286881821-2916-1-git-send-email-sjur.brandeland@stericsson.com>

From: Amarnath Revanna <amarnath.bangalore.revanna@stericsson.com>

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
 include/net/caif/caif_shm.h |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)
 create mode 100644 include/net/caif/caif_shm.h

diff --git a/include/net/caif/caif_shm.h b/include/net/caif/caif_shm.h
new file mode 100644
index 0000000..5bcce55
--- /dev/null
+++ b/include/net/caif/caif_shm.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2010
+ * Contact: Sjur Brendeland / sjur.brandeland@stericsson.com
+ * Author: Amarnath Revanna / amarnath.bangalore.revanna@stericsson.com
+ * License terms: GNU General Public License (GPL) version 2
+ */
+
+#ifndef CAIF_SHM_H_
+#define CAIF_SHM_H_
+
+struct shmdev_layer {
+	u32 shm_base_addr;
+	u32 shm_total_sz;
+	u32 shm_id;
+	u32 shm_loopback;
+	void *hmbx;
+	int (*pshmdev_mbxsend) (u32 shm_id, u32 mbx_msg);
+	int (*pshmdev_mbxsetup) (void *pshmdrv_cb,
+				struct shmdev_layer *pshm_dev, void *pshm_drv);
+	struct net_device *pshm_netdev;
+};
+
+extern int caif_shmcore_probe(struct shmdev_layer *pshm_dev);
+extern void caif_shmcore_remove(struct net_device *pshm_netdev);
+
+#endif
-- 
1.7.0.4
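
The mbx_msg word passed to pshmdev_mbxsend() above appears, judging
from the SHM_SET_FULL/SHM_SET_EMPTY macros in patch 2/4, to carry two
4-bit fields: the index of a buffer that has just been filled in bits
0-3 and the index of a buffer that has just been emptied in bits 4-7,
each stored as index + 1 so that zero means "nothing signalled". The
userspace sketch below restates that encoding as plain functions. The
function names and the -1 return value are illustrative assumptions,
not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a "buffer is full" index into bits 0-3 (stored as idx + 1). */
static uint32_t shm_set_full(uint32_t idx)
{
	assert(idx < 15);	/* idx + 1 must fit in four bits */
	return ((idx + 1) & 0x0F) << 0;
}

/* Encode a "buffer is empty" index into bits 4-7 (stored as idx + 1). */
static uint32_t shm_set_empty(uint32_t idx)
{
	assert(idx < 15);
	return ((idx + 1) & 0x0F) << 4;
}

/* Decode the "full" index, or -1 if the field is zero. */
static int shm_get_full(uint32_t msg)
{
	return (int)((msg >> 0) & 0x0F) - 1;
}

/* Decode the "empty" index, or -1 if the field is zero. */
static int shm_get_empty(uint32_t msg)
{
	return (int)((msg >> 4) & 0x0F) - 1;
}
```

A single message can carry both fields at once, for example
shm_set_full(2) | shm_set_empty(5), which is how the TX work function
in patch 2/4 combines them before calling the mailbox send hook.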



* [RFC arm-kernel 2/4] caif-u5500: CAIF shared memory transport protocol
From: Sjur Braendeland @ 2010-10-12 11:10 UTC
  To: davem, Russell King
  Cc: stefan.xk.nilsson, linus.walleij, linux-arm-kernel, netdev,
	kim.lilliestierna, daniel.martensson, Amarnath Revanna,
	Sjur Brændeland
In-Reply-To: <1286881821-2916-1-git-send-email-sjur.brandeland@stericsson.com>

From: Amarnath Revanna <amarnath.bangalore.revanna@stericsson.com>

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
 drivers/net/caif/caif_shmcore.c |  792 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 792 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/caif/caif_shmcore.c

diff --git a/drivers/net/caif/caif_shmcore.c b/drivers/net/caif/caif_shmcore.c
new file mode 100644
index 0000000..f20cbd3
--- /dev/null
+++ b/drivers/net/caif/caif_shmcore.c
@@ -0,0 +1,792 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2010
+ * Contact: Sjur Brendeland / sjur.brandeland@stericsson.com
+ * Authors:  Amarnath Revanna / amarnath.bangalore.revanna@stericsson.com,
+ *           Daniel Martensson / daniel.martensson@stericsson.com
+ * License terms: GNU General Public License (GPL) version 2
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ":%s(): " fmt, __func__
+
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+
+#include <net/caif/caif_device.h>
+#include <net/caif/caif_shm.h>
+
+#define NR_TX_BUF		6
+#define NR_RX_BUF		6
+#define TX_BUF_SZ		0x2000
+#define RX_BUF_SZ		0x2000
+
+#define CAIF_NEEDED_HEADROOM	32
+
+#define CAIF_FLOW_ON		1
+#define CAIF_FLOW_OFF		0
+
+#define LOW_WATERMARK		3
+#define HIGH_WATERMARK		4
+
+/* Maximum number of CAIF buffers per shared memory buffer. */
+#define SHM_MAX_FRMS_PER_BUF	10
+
+/*
+ * Size in bytes of the descriptor area
+ * (With end of descriptor signalling)
+ */
+#define SHM_CAIF_DESC_SIZE	((SHM_MAX_FRMS_PER_BUF + 1) * \
+					sizeof(struct shm_pck_desc))
+
+/*
+ * Offset to the first CAIF frame within a shared memory buffer.
+ * Aligned on 32 bytes.
+ */
+#define SHM_CAIF_FRM_OFS	(SHM_CAIF_DESC_SIZE + (SHM_CAIF_DESC_SIZE % 32))
+
+/* Number of bytes for CAIF shared memory header. */
+#define SHM_HDR_LEN		1
+
+/* Number of padding bytes for the complete CAIF frame. */
+#define SHM_FRM_PAD_LEN		4
+
+#define CAIF_MAX_MTU		4096
+
+#define SHM_SET_FULL(x)	(((x+1) & 0x0F) << 0)
+#define SHM_GET_FULL(x)	(((x >> 0) & 0x0F) - 1)
+
+#define SHM_SET_EMPTY(x)	(((x+1) & 0x0F) << 4)
+#define SHM_GET_EMPTY(x)	(((x >> 4) & 0x0F) - 1)
+
+#define SHM_FULL_MASK		(0x0F << 0)
+#define SHM_EMPTY_MASK		(0x0F << 4)
+
+struct shm_pck_desc {
+	/*
+	 * Offset from start of shared memory area to start of
+	 * shared memory CAIF frame.
+	 */
+	u32 frm_ofs;
+	u32 frm_len;
+};
+
+struct buf_list {
+	unsigned char *desc_vptr;
+	u32 phy_addr;
+	u32 index;
+	u32 len;
+	u32 frames;
+	u32 frm_ofs;
+	struct list_head list;
+};
+
+struct shm_caif_frm {
+	/* Number of bytes of padding before the CAIF frame. */
+	u8 hdr_ofs;
+};
+
+struct shmdrv_layer {
+	/* caif_dev_common must always be first in the structure*/
+	struct caif_dev_common cfdev;
+
+	u32 shm_tx_addr;
+	u32 shm_rx_addr;
+	u32 shm_base_addr;
+	u32 tx_empty_available;
+	spinlock_t lock;
+
+	struct list_head tx_empty_list;
+	struct list_head tx_pend_list;
+	struct list_head tx_full_list;
+	struct list_head rx_empty_list;
+	struct list_head rx_pend_list;
+	struct list_head rx_full_list;
+
+	struct workqueue_struct *pshm_tx_workqueue;
+	struct workqueue_struct *pshm_rx_workqueue;
+
+	struct work_struct shm_tx_work;
+	struct work_struct shm_rx_work;
+
+	struct sk_buff_head sk_qhead;
+	struct shmdev_layer *pshm_dev;
+};
+
+static int shm_netdev_open(struct net_device *shm_netdev)
+{
+	netif_wake_queue(shm_netdev);
+	return 0;
+}
+
+static int shm_netdev_close(struct net_device *shm_netdev)
+{
+	netif_stop_queue(shm_netdev);
+	return 0;
+}
+
+int caif_shmdrv_rx_cb(u32 mbx_msg, void *priv)
+{
+	struct buf_list *pbuf;
+	struct shmdrv_layer *pshm_drv;
+	struct list_head *pos;
+	u32 avail_emptybuff = 0;
+	unsigned long flags = 0;
+
+	pshm_drv = (struct shmdrv_layer *)priv;
+
+	/* Check for received buffers. */
+	if (mbx_msg & SHM_FULL_MASK) {
+		int idx;
+
+		if (!in_irq())
+			spin_lock_bh(&pshm_drv->lock);
+		else
+			spin_lock_irqsave(&pshm_drv->lock, flags);
+
+		/* Check whether we have any outstanding buffers. */
+		if (list_empty(&pshm_drv->rx_empty_list)) {
+
+			/* Release spin lock. */
+			if (!in_irq())
+				spin_unlock_bh(&pshm_drv->lock);
+			else
+				spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+			/* We print even in IRQ context... */
+			pr_warn("No empty Rx buffers to fill: "
+					"mbx_msg:%x\n", mbx_msg);
+
+			/* Bail out. */
+			goto err_sync;
+		}
+
+		pbuf =
+			list_entry(pshm_drv->rx_empty_list.next,
+					struct buf_list, list);
+		idx = pbuf->index;
+
+		/* Check buffer synchronization. */
+		if (idx != SHM_GET_FULL(mbx_msg)) {
+
+			/* We print even in IRQ context... */
+			pr_warn(
+			"phyif_shm_mbx_msg_cb: RX full out of sync:"
+			" idx:%d, msg:%x SHM_GET_FULL(mbx_msg):%x\n",
+				idx, mbx_msg, SHM_GET_FULL(mbx_msg));
+
+			if (!in_irq())
+				spin_unlock_bh(&pshm_drv->lock);
+			else
+				spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+			/* Bail out. */
+			goto err_sync;
+		}
+
+		list_del_init(&pbuf->list);
+		list_add_tail(&pbuf->list, &pshm_drv->rx_full_list);
+
+		if (!in_irq())
+			spin_unlock_bh(&pshm_drv->lock);
+		else
+			spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+		/* Schedule RX work queue. */
+		if (!work_pending(&pshm_drv->shm_rx_work))
+			queue_work(pshm_drv->pshm_rx_workqueue,
+						&pshm_drv->shm_rx_work);
+	}
+
+	/* Check for emptied buffers. */
+	if (mbx_msg & SHM_EMPTY_MASK) {
+		int idx;
+
+		if (!in_irq())
+			spin_lock_bh(&pshm_drv->lock);
+		else
+			spin_lock_irqsave(&pshm_drv->lock, flags);
+
+		/* Check whether we have any outstanding buffers. */
+		if (list_empty(&pshm_drv->tx_full_list)) {
+
+			/* We print even in IRQ context... */
+			pr_warn("No TX to empty: msg:%x\n", mbx_msg);
+
+			if (!in_irq())
+				spin_unlock_bh(&pshm_drv->lock);
+			else
+				spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+			/* Bail out. */
+			goto err_sync;
+		}
+
+		pbuf =
+			list_entry(pshm_drv->tx_full_list.next,
+					struct buf_list, list);
+		idx = pbuf->index;
+
+		/* Check buffer synchronization. */
+		if (idx != SHM_GET_EMPTY(mbx_msg)) {
+
+			if (!in_irq())
+				spin_unlock_bh(&pshm_drv->lock);
+			else
+				spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+			/* We print even in IRQ context... */
+			pr_warn("TX empty "
+				"out of sync:idx:%d, msg:%x\n", idx, mbx_msg);
+
+			/* Bail out. */
+			goto err_sync;
+		}
+		list_del_init(&pbuf->list);
+
+		/* Reset buffer parameters. */
+		pbuf->frames = 0;
+		pbuf->frm_ofs = SHM_CAIF_FRM_OFS;
+
+		list_add_tail(&pbuf->list, &pshm_drv->tx_empty_list);
+
+		/* Check the available no. of buffers in the empty list */
+		list_for_each(pos, &pshm_drv->tx_empty_list)
+			avail_emptybuff++;
+
+		/* Check whether we have to wake up the transmitter. */
+		if ((avail_emptybuff > HIGH_WATERMARK) &&
+					(!pshm_drv->tx_empty_available)) {
+			pshm_drv->tx_empty_available = 1;
+			pshm_drv->cfdev.flowctrl
+					(pshm_drv->pshm_dev->pshm_netdev,
+								CAIF_FLOW_ON);
+
+			if (!in_irq())
+				spin_unlock_bh(&pshm_drv->lock);
+			else
+				spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+			/* Schedule the work queue. if required */
+			if (!work_pending(&pshm_drv->shm_tx_work))
+				queue_work(pshm_drv->pshm_tx_workqueue,
+							&pshm_drv->shm_tx_work);
+		} else
+			if (!in_irq())
+				spin_unlock_bh(&pshm_drv->lock);
+			else
+				spin_unlock_irqrestore(&pshm_drv->lock, flags);
+	}
+
+	return 0;
+
+err_sync:
+	return -EIO;
+}
+
+static void shm_rx_work_func(struct work_struct *rx_work)
+{
+	struct shmdrv_layer *pshm_drv;
+	struct buf_list *pbuf;
+	unsigned long flags = 0;
+	struct sk_buff *skb;
+	char *p;
+	int ret;
+
+	pshm_drv = container_of(rx_work, struct shmdrv_layer, shm_rx_work);
+
+	while (1) {
+
+		struct shm_pck_desc *pck_desc;
+
+		if (!in_irq())
+			spin_lock_bh(&pshm_drv->lock);
+		else
+			spin_lock_irqsave(&pshm_drv->lock, flags);
+
+		/* Check for received buffers. */
+		if (list_empty(&pshm_drv->rx_full_list)) {
+			if (!in_irq())
+				spin_unlock_bh(&pshm_drv->lock);
+			else
+				spin_unlock_irqrestore(&pshm_drv->lock, flags);
+			break;
+		}
+
+		pbuf =
+			list_entry(pshm_drv->rx_full_list.next, struct buf_list,
+					list);
+		list_del_init(&pbuf->list);
+
+		/* Retrieve pointer to start of the packet descriptor area. */
+		pck_desc = (struct shm_pck_desc *) pbuf->desc_vptr;
+
+		/*
+		 * Check whether descriptor contains a CAIF shared memory
+		 * frame.
+		 */
+		while (pck_desc->frm_ofs) {
+			unsigned int frm_buf_ofs;
+			unsigned int frm_pck_ofs;
+			unsigned int frm_pck_len;
+			/*
+			 * Check whether offset is within buffer limits
+			 * (lower).
+			 */
+			if (pck_desc->frm_ofs <
+				(pbuf->phy_addr - pshm_drv->shm_base_addr))
+				break;
+			/*
+			 * Check whether offset is within buffer limits
+			 * (higher).
+			 */
+			if (pck_desc->frm_ofs >
+				((pbuf->phy_addr - pshm_drv->shm_base_addr) +
+					pbuf->len))
+				break;
+
+			/* Calculate offset from start of buffer. */
+			frm_buf_ofs =
+				pck_desc->frm_ofs - (pbuf->phy_addr -
+						pshm_drv->shm_base_addr);
+
+			/*
+			 * Calculate offset and length of CAIF packet while
+			 * taking care of the shared memory header.
+			 */
+			frm_pck_ofs =
+				frm_buf_ofs + SHM_HDR_LEN +
+				(*(pbuf->desc_vptr + frm_buf_ofs));
+			frm_pck_len =
+				(pck_desc->frm_len - SHM_HDR_LEN -
+				(*(pbuf->desc_vptr + frm_buf_ofs)));
+
+			/* Check whether CAIF packet is within buffer limits */
+			if ((frm_pck_ofs + pck_desc->frm_len) > pbuf->len)
+				break;
+
+			/* Get a suitable CAIF packet and copy in data. */
+			skb = netdev_alloc_skb(pshm_drv->pshm_dev->pshm_netdev,
+							frm_pck_len + 1);
+			BUG_ON(skb == NULL);
+
+			p = skb_put(skb, frm_pck_len);
+			memcpy(p, pbuf->desc_vptr + frm_pck_ofs, frm_pck_len);
+
+			skb->protocol = htons(ETH_P_CAIF);
+			skb_reset_mac_header(skb);
+			skb->dev = pshm_drv->pshm_dev->pshm_netdev;
+
+			/* Push received packet up the stack. */
+			ret = netif_rx_ni(skb);
+
+			if (!ret) {
+				pshm_drv->pshm_dev->pshm_netdev->stats.
+								rx_packets++;
+				pshm_drv->pshm_dev->pshm_netdev->stats.
+						rx_bytes += pck_desc->frm_len;
+			} else
+				++pshm_drv->pshm_dev->pshm_netdev->stats.
+								rx_dropped;
+			/* Move to next packet descriptor. */
+			pck_desc++;
+		}
+
+		list_add_tail(&pbuf->list, &pshm_drv->rx_pend_list);
+
+		if (!in_irq())
+			spin_unlock_bh(&pshm_drv->lock);
+		else
+			spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+	}
+
+	/* Schedule the work queue. if required */
+	if (!work_pending(&pshm_drv->shm_tx_work))
+		queue_work(pshm_drv->pshm_tx_workqueue, &pshm_drv->shm_tx_work);
+
+}
+
+static void shm_tx_work_func(struct work_struct *tx_work)
+{
+	u32 mbox_msg;
+	unsigned int frmlen, avail_emptybuff, append = 0;
+	unsigned long flags = 0;
+	struct buf_list *pbuf = NULL;
+	struct shmdrv_layer *pshm_drv;
+	struct shm_caif_frm *frm;
+	struct sk_buff *skb;
+	struct shm_pck_desc *pck_desc;
+	struct list_head *pos;
+
+	pshm_drv = container_of(tx_work, struct shmdrv_layer, shm_tx_work);
+
+	do {
+		/* Initialize mailbox message. */
+		mbox_msg = 0x00;
+		avail_emptybuff = 0;
+
+		if (!in_irq())
+			spin_lock_bh(&pshm_drv->lock);
+		else
+			spin_lock_irqsave(&pshm_drv->lock, flags);
+
+		/* Check for pending receive buffers. */
+		if (!list_empty(&pshm_drv->rx_pend_list)) {
+
+			pbuf = list_entry(pshm_drv->rx_pend_list.next,
+						struct buf_list, list);
+
+			list_del_init(&pbuf->list);
+			list_add_tail(&pbuf->list, &pshm_drv->rx_empty_list);
+			/*
+			 * Value index is never changed,
+			 * so read access should be safe.
+			 */
+			mbox_msg |= SHM_SET_EMPTY(pbuf->index);
+		}
+
+		skb = skb_peek(&pshm_drv->sk_qhead);
+
+		if (skb == NULL)
+			goto send_msg;
+
+		/* Check the available no. of buffers in the empty list */
+		list_for_each(pos, &pshm_drv->tx_empty_list)
+			avail_emptybuff++;
+
+		if ((avail_emptybuff < LOW_WATERMARK) &&
+					pshm_drv->tx_empty_available) {
+			/* Update blocking condition. */
+			pshm_drv->tx_empty_available = 0;
+			pshm_drv->cfdev.flowctrl
+					(pshm_drv->pshm_dev->pshm_netdev,
+					CAIF_FLOW_OFF);
+		}
+		/*
+		 * We simply return back to the caller if we do not have space
+		 * either in Tx pending list or Tx empty list. In this case,
+		 * we hold the received skb in the skb list, waiting to
+		 * be transmitted once Tx buffers become available
+		 */
+		if (list_empty(&pshm_drv->tx_empty_list))
+			goto send_msg;
+
+		/* Get the first free Tx buffer. */
+		pbuf = list_entry(pshm_drv->tx_empty_list.next,
+						struct buf_list, list);
+		do {
+			if (append) {
+				skb = skb_peek(&pshm_drv->sk_qhead);
+				if (skb == NULL)
+					break;
+			}
+
+			frm = (struct shm_caif_frm *)
+					(pbuf->desc_vptr + pbuf->frm_ofs);
+
+			frm->hdr_ofs = 0;
+			frmlen = 0;
+			frmlen += SHM_HDR_LEN + frm->hdr_ofs + skb->len;
+
+			/* Add tail padding if needed. */
+			if (frmlen % SHM_FRM_PAD_LEN)
+				frmlen += SHM_FRM_PAD_LEN -
+						(frmlen % SHM_FRM_PAD_LEN);
+
+			/*
+			 * Verify that packet, header and additional padding
+			 * can fit within the buffer frame area.
+			 */
+			if (frmlen >= (pbuf->len - pbuf->frm_ofs))
+				break;
+
+			if (!append) {
+				list_del_init(&pbuf->list);
+				append = 1;
+			}
+
+			skb = skb_dequeue(&pshm_drv->sk_qhead);
+			/* Copy in CAIF frame. */
+			skb_copy_bits(skb, 0, pbuf->desc_vptr +
+					pbuf->frm_ofs + SHM_HDR_LEN +
+						frm->hdr_ofs, skb->len);
+
+			pshm_drv->pshm_dev->pshm_netdev->stats.tx_packets++;
+			pshm_drv->pshm_dev->pshm_netdev->stats.tx_bytes +=
+									frmlen;
+			dev_kfree_skb(skb);
+
+			/* Fill in the shared memory packet descriptor area. */
+			pck_desc = (struct shm_pck_desc *) (pbuf->desc_vptr);
+			/* Forward to current frame. */
+			pck_desc += pbuf->frames;
+			pck_desc->frm_ofs = (pbuf->phy_addr -
+						pshm_drv->shm_base_addr) +
+								pbuf->frm_ofs;
+			pck_desc->frm_len = frmlen;
+			/* Terminate packet descriptor area. */
+			pck_desc++;
+			pck_desc->frm_ofs = 0;
+			/* Update buffer parameters. */
+			pbuf->frames++;
+			pbuf->frm_ofs += frmlen + (frmlen % 32);
+
+		} while (pbuf->frames < SHM_MAX_FRMS_PER_BUF);
+
+		/* Assign buffer as full. */
+		list_add_tail(&pbuf->list, &pshm_drv->tx_full_list);
+		append = 0;
+		mbox_msg |= SHM_SET_FULL(pbuf->index);
+send_msg:
+		if (!in_irq())
+			spin_unlock_bh(&pshm_drv->lock);
+		else
+			spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+		if (mbox_msg)
+			pshm_drv->pshm_dev->pshmdev_mbxsend
+					(pshm_drv->pshm_dev->shm_id, mbox_msg);
+	} while (mbox_msg);
+}
+
+static int shm_netdev_tx(struct sk_buff *skb, struct net_device *shm_netdev)
+{
+	struct shmdrv_layer *pshm_drv;
+	unsigned long flags = 0;
+
+	pshm_drv = netdev_priv(shm_netdev);
+
+	if (!in_irq())
+		spin_lock_bh(&pshm_drv->lock);
+	else
+		spin_lock_irqsave(&pshm_drv->lock, flags);
+
+	skb_queue_tail(&pshm_drv->sk_qhead, skb);
+
+	if (!in_irq())
+		spin_unlock_bh(&pshm_drv->lock);
+	else
+		spin_unlock_irqrestore(&pshm_drv->lock, flags);
+
+	/* Schedule Tx work queue. for deferred processing of skbs*/
+	if (!work_pending(&pshm_drv->shm_tx_work))
+		queue_work(pshm_drv->pshm_tx_workqueue, &pshm_drv->shm_tx_work);
+
+	return 0;
+}
+
+static const struct net_device_ops netdev_ops = {
+	.ndo_open = shm_netdev_open,
+	.ndo_stop = shm_netdev_close,
+	.ndo_start_xmit = shm_netdev_tx,
+};
+
+static void shm_netdev_setup(struct net_device *pshm_netdev)
+{
+	struct shmdrv_layer *pshm_drv;
+	pshm_netdev->netdev_ops = &netdev_ops;
+
+	pshm_netdev->mtu = CAIF_MAX_MTU;
+	pshm_netdev->type = ARPHRD_CAIF;
+	pshm_netdev->hard_header_len = CAIF_NEEDED_HEADROOM;
+	pshm_netdev->tx_queue_len = 0;
+	pshm_netdev->destructor = free_netdev;
+
+	pshm_drv = netdev_priv(pshm_netdev);
+
+	/* Initialize structures in a clean state. */
+	memset(pshm_drv, 0, sizeof(struct shmdrv_layer));
+
+	pshm_drv->cfdev.link_select = CAIF_LINK_LOW_LATENCY;
+}
+
+int caif_shmcore_probe(struct shmdev_layer *pshm_dev)
+{
+	int result, j;
+	struct shmdrv_layer *pshm_drv = NULL;
+
+	pshm_dev->pshm_netdev = alloc_netdev(sizeof(struct shmdrv_layer),
+						"cfshm%d", shm_netdev_setup);
+	if (!pshm_dev->pshm_netdev)
+		return -ENOMEM;
+
+	pshm_drv = netdev_priv(pshm_dev->pshm_netdev);
+	pshm_drv->pshm_dev = pshm_dev;
+
+	/*
+	 * Initialization starts with the verification of the
+	 * availability of MBX driver by calling its setup function.
+	 * MBX driver must be available by this time for proper
+	 * functioning of SHM driver.
+	 */
+	if ((pshm_dev->pshmdev_mbxsetup
+				(caif_shmdrv_rx_cb, pshm_dev, pshm_drv)) != 0) {
+		pr_warn("Could not config. SHM Mailbox,"
+				" Bailing out.....\n");
+		free_netdev(pshm_dev->pshm_netdev);
+		return -ENODEV;
+	}
+
+	skb_queue_head_init(&pshm_drv->sk_qhead);
+
+	pr_info("SHM DEVICE[%d] PROBED BY DRIVER, NEW SHM DRIVER"
+			" INSTANCE AT pshm_drv =0x%p\n",
+			pshm_drv->pshm_dev->shm_id, pshm_drv);
+
+	if (pshm_dev->shm_total_sz <
+			(NR_TX_BUF * TX_BUF_SZ + NR_RX_BUF * RX_BUF_SZ)) {
+
+		pr_warn("ERROR, Amount of available"
+				" Phys. SHM cannot accommodate current SHM "
+				"driver configuration, Bailing out ...\n");
+		free_netdev(pshm_dev->pshm_netdev);
+		return -ENOMEM;
+	}
+
+	pshm_drv->shm_base_addr = pshm_dev->shm_base_addr;
+	pshm_drv->shm_tx_addr = pshm_drv->shm_base_addr;
+
+	if (pshm_dev->shm_loopback)
+		pshm_drv->shm_rx_addr = pshm_drv->shm_tx_addr;
+	else
+		pshm_drv->shm_rx_addr = pshm_dev->shm_base_addr +
+						(NR_TX_BUF * TX_BUF_SZ);
+
+	INIT_LIST_HEAD(&pshm_drv->tx_empty_list);
+	INIT_LIST_HEAD(&pshm_drv->tx_pend_list);
+	INIT_LIST_HEAD(&pshm_drv->tx_full_list);
+
+	INIT_LIST_HEAD(&pshm_drv->rx_empty_list);
+	INIT_LIST_HEAD(&pshm_drv->rx_pend_list);
+	INIT_LIST_HEAD(&pshm_drv->rx_full_list);
+
+	INIT_WORK(&pshm_drv->shm_tx_work, shm_tx_work_func);
+	INIT_WORK(&pshm_drv->shm_rx_work, shm_rx_work_func);
+
+	pshm_drv->pshm_tx_workqueue =
+				create_singlethread_workqueue("shm_tx_work");
+	pshm_drv->pshm_rx_workqueue =
+				create_singlethread_workqueue("shm_rx_work");
+
+	for (j = 0; j < NR_TX_BUF; j++) {
+		struct buf_list *tx_buf =
+				kmalloc(sizeof(struct buf_list), GFP_KERNEL);
+
+		if (tx_buf == NULL) {
+			pr_warn("ERROR, Could not"
+					" allocate dynamic mem. for tx_buf,"
+					" Bailing out ...\n");
+			free_netdev(pshm_dev->pshm_netdev);
+			return -ENOMEM;
+		}
+		tx_buf->index = j;
+		tx_buf->phy_addr = pshm_drv->shm_tx_addr + (TX_BUF_SZ * j);
+		tx_buf->len = TX_BUF_SZ;
+		tx_buf->frames = 0;
+		tx_buf->frm_ofs = SHM_CAIF_FRM_OFS;
+
+		if (pshm_dev->shm_loopback)
+			tx_buf->desc_vptr = (char *)tx_buf->phy_addr;
+		else
+			tx_buf->desc_vptr =
+					ioremap(tx_buf->phy_addr, TX_BUF_SZ);
+
+		list_add_tail(&tx_buf->list, &pshm_drv->tx_empty_list);
+	}
+
+	for (j = 0; j < NR_RX_BUF; j++) {
+		struct buf_list *rx_buf =
+				kmalloc(sizeof(struct buf_list), GFP_KERNEL);
+
+		if (rx_buf == NULL) {
+			pr_warn("ERROR, Could not"
+					" allocate dynamic mem. for rx_buf,"
+					" Bailing out ...\n");
+			free_netdev(pshm_dev->pshm_netdev);
+			return -ENOMEM;
+		}
+		rx_buf->index = j;
+		rx_buf->phy_addr = pshm_drv->shm_rx_addr + (RX_BUF_SZ * j);
+		rx_buf->len = RX_BUF_SZ;
+
+		if (pshm_dev->shm_loopback)
+			rx_buf->desc_vptr = (char *)rx_buf->phy_addr;
+		else
+			rx_buf->desc_vptr =
+					ioremap(rx_buf->phy_addr, RX_BUF_SZ);
+		list_add_tail(&rx_buf->list, &pshm_drv->rx_empty_list);
+	}
+
+	pshm_drv->tx_empty_available = 1;
+	result = register_netdev(pshm_dev->pshm_netdev);
+	if (result)
+		pr_warn("ERROR[%d], SHM could not"
+			" register with NW FRMWK, Bailing out ...\n", result);
+
+	return result;
+}
+
+void caif_shmcore_remove(struct net_device *pshm_netdev)
+{
+	struct buf_list *pbuf;
+	struct shmdrv_layer *pshm_drv = NULL;
+
+	pshm_drv = netdev_priv(pshm_netdev);
+
+	while (!(list_empty(&pshm_drv->tx_pend_list))) {
+		pbuf =
+			list_entry(pshm_drv->tx_pend_list.next,
+					struct buf_list, list);
+
+		list_del(&pbuf->list);
+		kfree(pbuf);
+	}
+
+	while (!(list_empty(&pshm_drv->tx_full_list))) {
+		pbuf =
+			list_entry(pshm_drv->tx_full_list.next,
+					struct buf_list, list);
+		list_del(&pbuf->list);
+		kfree(pbuf);
+	}
+
+	while (!(list_empty(&pshm_drv->tx_empty_list))) {
+		pbuf =
+			list_entry(pshm_drv->tx_empty_list.next,
+					struct buf_list, list);
+		list_del(&pbuf->list);
+		kfree(pbuf);
+	}
+
+	while (!(list_empty(&pshm_drv->rx_full_list))) {
+		pbuf =
+			list_entry(pshm_drv->rx_full_list.next,
+				struct buf_list, list);
+		list_del(&pbuf->list);
+		kfree(pbuf);
+	}
+
+	while (!(list_empty(&pshm_drv->rx_pend_list))) {
+		pbuf =
+			list_entry(pshm_drv->rx_pend_list.next,
+				struct buf_list, list);
+		list_del(&pbuf->list);
+		kfree(pbuf);
+	}
+
+	while (!(list_empty(&pshm_drv->rx_empty_list))) {
+		pbuf =
+			list_entry(pshm_drv->rx_empty_list.next,
+				struct buf_list, list);
+		list_del(&pbuf->list);
+		kfree(pbuf);
+	}
+
+	/* Destroy work queues. */
+	destroy_workqueue(pshm_drv->pshm_tx_workqueue);
+	destroy_workqueue(pshm_drv->pshm_rx_workqueue);
+
+	unregister_netdev(pshm_netdev);
+}
-- 
1.7.0.4


^ permalink raw reply related

* [RFC arm-kernel 3/4] caif-u5500: CAIF shared memory mailbox interface
From: Sjur Braendeland @ 2010-10-12 11:10 UTC (permalink / raw)
  To: davem, Russell King
  Cc: stefan.xk.nilsson, linus.walleij, linux-arm-kernel, netdev,
	kim.lilliestierna, daniel.martensson, Amarnath Revanna,
	Sjur Brændeland
In-Reply-To: <1286881821-2916-1-git-send-email-sjur.brandeland@stericsson.com>

From: Amarnath Revanna <amarnath.bangalore.revanna@stericsson.com>

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
 drivers/net/caif/caif_shm_u5500.c |  129 +++++++++++++++++++++++++++++++++++++
 1 files changed, 129 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/caif/caif_shm_u5500.c

diff --git a/drivers/net/caif/caif_shm_u5500.c b/drivers/net/caif/caif_shm_u5500.c
new file mode 100644
index 0000000..7c1f7b9
--- /dev/null
+++ b/drivers/net/caif/caif_shm_u5500.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2010
+ * Contact: Sjur Brendeland / sjur.brandeland@stericsson.com
+ * Author:  Amarnath Revanna / amarnath.bangalore.revanna@stericsson.com
+ * License terms: GNU General Public License (GPL) version 2
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ":%s(): " fmt, __func__
+
+#include <linux/version.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <mach/mbox.h>
+#include <net/caif/caif_shm.h>
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("CAIF Shared Memory protocol driver");
+
+#define MAX_SHM_INSTANCES	1
+
+enum {
+	MBX_ACC0,
+	MBX_ACC1,
+	MBX_DSP
+};
+
+static struct shmdev_layer shmdev_lyr[MAX_SHM_INSTANCES];
+
+static unsigned int shm_start;
+static unsigned int shm_size;
+
+module_param(shm_size, uint, 0440);
+MODULE_PARM_DESC(shm_size, "Total size of SHM shared memory");
+
+module_param(shm_start, uint, 0440);
+MODULE_PARM_DESC(shm_start, "Start of SHM shared memory");
+
+static int shmdev_send_msg(u32 dev_id, u32 mbx_msg)
+{
+	/* Always block until msg is written successfully */
+	mbox_send(shmdev_lyr[dev_id].hmbx, mbx_msg, true);
+	return 0;
+}
+
+static int shmdev_mbx_setup(void *pshmdrv_cb, struct shmdev_layer *pshm_dev,
+							 void *pshm_drv)
+{
+	/*
+	 * For UX5500, we have only 1 SHM instance which uses MBX0
+	 * for communication with the peer modem
+	 */
+	pshm_dev->hmbx = mbox_setup(MBX_ACC0, pshmdrv_cb, pshm_drv);
+
+	if (!pshm_dev->hmbx)
+		return -ENODEV;
+	else
+		return 0;
+}
+
+static int __init caif_shmdev_init(void)
+{
+	int i, result;
+
+	/* Loop is currently overkill, there is only one instance */
+	for (i = 0; i < MAX_SHM_INSTANCES; i++) {
+
+		shmdev_lyr[i].shm_base_addr = shm_start;
+		shmdev_lyr[i].shm_total_sz = shm_size;
+
+		if (((char *)shmdev_lyr[i].shm_base_addr == NULL)
+			       || (shmdev_lyr[i].shm_total_sz <= 0))	{
+			pr_warn("ERROR, "
+				"Shared memory Address and/or Size incorrect"
+				", Bailing out ...\n");
+			result = -EINVAL;
+			goto clean;
+		}
+
+		pr_info("SHM AREA (instance %d) STARTS"
+			" AT %p\n", i, (char *)shmdev_lyr[i].shm_base_addr);
+
+		shmdev_lyr[i].shm_id = i;
+		shmdev_lyr[i].pshmdev_mbxsend = shmdev_send_msg;
+		shmdev_lyr[i].pshmdev_mbxsetup = shmdev_mbx_setup;
+
+		/*
+		 * Finally, CAIF core module is called with details in place:
+		 * 1. SHM base address
+		 * 2. SHM size
+		 * 3. MBX handle
+		 */
+		result = caif_shmcore_probe(&shmdev_lyr[i]);
+		if (result) {
+			pr_warn("ERROR[%d], "
+				"Could not probe SHM core (instance %d),"
+				" Bailing out ...\n", result, i);
+			goto clean;
+		}
+	}
+
+	return 0;
+
+clean:
+	/*
+	 * For now, we assume that even if one instance of SHM fails, we bail
+	 * out of the driver support completely. For this, we need to release
+	 * any memory allocated and unregister any instance of SHM net device.
+	 */
+	for (i = 0; i < MAX_SHM_INSTANCES; i++) {
+		if (shmdev_lyr[i].pshm_netdev)
+			unregister_netdev(shmdev_lyr[i].pshm_netdev);
+	}
+	return result;
+}
+
+static void __exit caif_shmdev_exit(void)
+{
+	int i;
+
+	for (i = 0; i < MAX_SHM_INSTANCES; i++) {
+		caif_shmcore_remove(shmdev_lyr[i].pshm_netdev);
+		kfree((void *)shmdev_lyr[i].shm_base_addr);
+	}
+
+}
+
+module_init(caif_shmdev_init);
+module_exit(caif_shmdev_exit);
-- 
1.7.0.4


^ permalink raw reply related

* [RFC arm-kernel 4/4] caif-u5500: Build config for CAIF shared mem driver
From: Sjur Braendeland @ 2010-10-12 11:10 UTC (permalink / raw)
  To: davem, Russell King
  Cc: stefan.xk.nilsson, linus.walleij, linux-arm-kernel, netdev,
	kim.lilliestierna, daniel.martensson, Amarnath Revanna,
	Sjur Brændeland
In-Reply-To: <1286881821-2916-1-git-send-email-sjur.brandeland@stericsson.com>

From: Amarnath Revanna <amarnath.bangalore.revanna@stericsson.com>

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
 drivers/net/caif/Kconfig  |    7 +++++++
 drivers/net/caif/Makefile |    4 ++++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
index 75bfc3a..09ed3f4 100644
--- a/drivers/net/caif/Kconfig
+++ b/drivers/net/caif/Kconfig
@@ -31,3 +31,10 @@ config CAIF_SPI_SYNC
 	Putting the next command and length in the start of the frame can
 	help to synchronize to the next transfer in case of over or under-runs.
 	This option also needs to be enabled on the modem.
+
+config CAIF_SHM
+	tristate "CAIF shared memory protocol driver"
+	depends on CAIF && U5500_MBOX
+	default n
+	---help---
+	The CAIF shared memory protocol driver for the STE UX5500 platform.
diff --git a/drivers/net/caif/Makefile b/drivers/net/caif/Makefile
index 3a11d61..b38d987 100644
--- a/drivers/net/caif/Makefile
+++ b/drivers/net/caif/Makefile
@@ -8,3 +8,7 @@ obj-$(CONFIG_CAIF_TTY) += caif_serial.o
 # SPI slave physical interfaces module
 cfspi_slave-objs := caif_spi.o caif_spi_slave.o
 obj-$(CONFIG_CAIF_SPI_SLAVE) += cfspi_slave.o
+
+# Shared memory
+caif_shm-objs := caif_shmcore.o caif_shm_u5500.o
+obj-$(CONFIG_CAIF_SHM) += caif_shm.o
-- 
1.7.0.4


^ permalink raw reply related

* Re: [patch 2/2] vhost: fix return code for log_access_ok()
From: Michael S. Tsirkin @ 2010-10-12 12:28 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Juan Quintela, David S. Miller, Rusty Russell, kvm,
	virtualization, netdev, kernel-janitors
In-Reply-To: <20101011172419.GG5851@bicker>

On Mon, Oct 11, 2010 at 07:24:19PM +0200, Dan Carpenter wrote:
> access_ok() returns 1 if it's OK otherwise it should return 0.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Good catch, thanks!
Acked-by: Michael S. Tsirkin <mst@redhat.com>

> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c2aa12c..f82fe57 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -371,7 +371,7 @@ static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
>  	/* Make sure 64 bit math will not overflow. */
>  	if (a > ULONG_MAX - (unsigned long)log_base ||
>  	    a + (unsigned long)log_base > ULONG_MAX)
> -		return -EFAULT;
> +		return 0;
>  
>  	return access_ok(VERIFY_WRITE, log_base + a,
>  			 (sz + VHOST_PAGE_SIZE * 8 - 1) / VHOST_PAGE_SIZE / 8);
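
To illustrate why the return value matters, here is a minimal user-space
sketch. It is not the vhost code: `range_ok_buggy`, `range_ok_fixed` and
`set_log_addr` are hypothetical names, and the overflow test is reduced to a
single comparison. The caller tests the predicate with `!`, so a negative
errno reads as "access is OK".

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical predicate with the pre-fix behaviour. */
static int range_ok_buggy(unsigned long base, unsigned long long off)
{
	if (off > ULONG_MAX - base)
		return -EFAULT;	/* non-zero, so "!ok" callers see success */
	return 1;
}

/* Hypothetical predicate with the post-fix behaviour: 0 means "not OK". */
static int range_ok_fixed(unsigned long base, unsigned long long off)
{
	if (off > ULONG_MAX - base)
		return 0;
	return 1;
}

/* Caller pattern assumed here: reject the request when the check fails. */
static int set_log_addr(int (*ok)(unsigned long, unsigned long long),
			unsigned long base, unsigned long long off)
{
	if (!ok(base, off))
		return -EINVAL;
	return 0;
}
```

With an overflowing range, the buggy predicate lets the request through
while the fixed one rejects it with -EINVAL.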

^ permalink raw reply

* Re: [patch 1/2] vhost: potential integer overflows
From: Michael S. Tsirkin @ 2010-10-12 12:25 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Juan Quintela, David S. Miller, Rusty Russell, kvm,
	virtualization, netdev, kernel-janitors
In-Reply-To: <20101011172256.GF5851@bicker>

On Mon, Oct 11, 2010 at 07:22:57PM +0200, Dan Carpenter wrote:
> I did an audit for potential integer overflows of values which get passed
> to access_ok() and here are the results.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index dd3d6f7..c2aa12c 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -429,6 +429,14 @@ static int vq_access_ok(unsigned int num,
>  			struct vring_avail __user *avail,
>  			struct vring_used __user *used)
>  {
> +
> +	if (num > UINT_MAX / sizeof *desc)
> +		return 0;
> +	if (num > UINT_MAX / sizeof *avail->ring - sizeof *avail)
> +		return 0;
> +	if (num > UINT_MAX / sizeof *used->ring - sizeof *used)
> +		return 0;
> +
>  	return access_ok(VERIFY_READ, desc, num * sizeof *desc) &&
>  	       access_ok(VERIFY_READ, avail,
>  			 sizeof *avail + num * sizeof *avail->ring) &&
> @@ -447,6 +455,9 @@ int vhost_log_access_ok(struct vhost_dev *dev)
>  /* Caller should have vq mutex and device mutex */
>  static int vq_log_access_ok(struct vhost_virtqueue *vq, void __user *log_base)
>  {
> +	if (vq->num > UINT_MAX / sizeof *vq->used->ring - sizeof *vq->used)
> +		return 0;
> +
>  	return vq_memory_access_ok(log_base, vq->dev->memory,
>  			    vhost_has_feature(vq->dev, VHOST_F_LOG_ALL)) &&
>  		(!vq->log_used || log_access_ok(log_base, vq->log_addr,
> @@ -606,12 +617,17 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  			}
>  
>  			/* Also validate log access for used ring if enabled. */
> -			if ((a.flags & (0x1 << VHOST_VRING_F_LOG)) &&
> -			    !log_access_ok(vq->log_base, a.log_guest_addr,
> +			if (a.flags & (0x1 << VHOST_VRING_F_LOG)) {
> +				if (vq->num > UINT_MAX / sizeof *vq->used->ring - sizeof *vq->used) {
> +					r = -EINVAL;
> +					break;
> +				}
> +				if (!log_access_ok(vq->log_base, a.log_guest_addr,
>  					   sizeof *vq->used +
>  					   vq->num * sizeof *vq->used->ring)) {
> -				r = -EINVAL;
> -				break;
> +					r = -EINVAL;
> +					break;
> +				}
>  			}
>  		}
>  


As far as I can see, the maximum value for num is 64K - 1:

                if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
                        r = -EINVAL;
                        break;
                }

How can any of the checks above trigger?
It seems easier to sanity-check the value at the single place where it's
passed from userspace to the kernel.
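
A minimal user-space sketch of that argument follows. It is not the vhost
code: `ring_num_ok` and `used_ring_bytes` are hypothetical helpers, and the
8-byte element and 4-byte header sizes are assumptions picked for
illustration. Once num is bounded where it enters the kernel, the later size
arithmetic stays far below UINT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ring layout; sizes are illustrative only. */
#define ELEM_SIZE	8u	/* assumed size of one ring element */
#define HEADER_SIZE	4u	/* assumed size of the ring header */

/* Mirrors the check quoted above: non-zero, at most 0xffff, power of two. */
static int ring_num_ok(unsigned int num)
{
	return num != 0 && num <= 0xffff && (num & (num - 1)) == 0;
}

/* Header plus num elements; only meaningful after ring_num_ok(num). */
static size_t used_ring_bytes(unsigned int num)
{
	return (size_t)HEADER_SIZE + (size_t)num * ELEM_SIZE;
}
```

Because of the power-of-two test, the largest value that passes is 0x8000,
so under these assumed sizes the byte count never exceeds a few hundred
kilobytes.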

-- 
MST

^ permalink raw reply

* SCTP AUTO-ASCONF patch
From: Michio Honda @ 2010-10-12 12:27 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, hadi

Hi, 

I'm resubmitting a patch to enable AUTO_ASCONF for Linux SCTP (this version is against 2.6.36-rc7).

Thanks,
- Michio

Only in linux-2.6/include: asm-arm
Only in linux-2.6/include: asm-mn10300
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/include/linux/sysctl.h linux-2.6/include/linux/sysctl.h
--- linux-2.6.orig/include/linux/sysctl.h	2010-10-11 08:24:33.000000000 +0900
+++ linux-2.6/include/linux/sysctl.h	2010-10-11 07:21:40.000000000 +0900
@@ -767,6 +767,7 @@ enum {
 	NET_SCTP_SNDBUF_POLICY		 = 15,
 	NET_SCTP_SACK_TIMEOUT		 = 16,
 	NET_SCTP_RCVBUF_POLICY		 = 17,
+	NET_SCTP_AUTO_ASCONF_ENABLE	 = 18,
 };
 
 /* /proc/sys/net/bridge */
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/include/net/sctp/sctp.h linux-2.6/include/net/sctp/sctp.h
--- linux-2.6.orig/include/net/sctp/sctp.h	2010-10-11 08:24:33.000000000 +0900
+++ linux-2.6/include/net/sctp/sctp.h	2010-10-11 07:21:40.000000000 +0900
@@ -121,6 +121,8 @@ extern int sctp_copy_local_addr_list(str
 				     int flags);
 extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
 extern int sctp_register_pf(struct sctp_pf *, sa_family_t);
+void sctp_addr_wq_mgmt(union sctp_addr *, int);
+void sctp_path_check_and_react(struct sctp_association *, struct sockaddr *);
 
 /*
  * sctp/socket.c
@@ -135,6 +137,9 @@ void sctp_sock_rfree(struct sk_buff *skb
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 		    struct sctp_association *asoc);
 extern struct percpu_counter sctp_sockets_allocated;
+int sctp_asconf_mgmt(struct sctp_endpoint *, struct sock *sk);
+void sctp_add_addr_to_laddr(struct sockaddr *, struct sctp_association *);
+void sctp_trans_immediate_retrans(struct sctp_transport *);
 
 /*
  * sctp/primitive.c
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/include/net/sctp/sm.h linux-2.6/include/net/sctp/sm.h
--- linux-2.6.orig/include/net/sctp/sm.h	2010-10-11 08:24:33.000000000 +0900
+++ linux-2.6/include/net/sctp/sm.h	2010-10-11 07:21:40.000000000 +0900
@@ -295,6 +295,8 @@ int sctp_addip_addr_config(struct sctp_a
 __u32 sctp_generate_tag(const struct sctp_endpoint *);
 __u32 sctp_generate_tsn(const struct sctp_endpoint *);
 
+void sctp_path_check_and_react(struct sctp_association *, struct sockaddr *);
+
 /* Extern declarations for major data structures.  */
 extern sctp_timer_event_t *sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES];
 
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/include/net/sctp/structs.h linux-2.6/include/net/sctp/structs.h
--- linux-2.6.orig/include/net/sctp/structs.h	2010-10-11 08:24:33.000000000 +0900
+++ linux-2.6/include/net/sctp/structs.h	2010-10-11 07:21:40.000000000 +0900
@@ -205,6 +205,10 @@ extern struct sctp_globals {
 	 * It is a list of sctp_sockaddr_entry.
 	 */
 	struct list_head local_addr_list;
+	int auto_asconf_enable;
+	struct list_head addr_waitq;
+	struct timer_list addr_wq_timer;
+	spinlock_t addr_wq_lock;
 
 	/* Lock that protects the local_addr_list writers */
 	spinlock_t addr_list_lock;
@@ -265,6 +269,10 @@ extern struct sctp_globals {
 #define sctp_port_alloc_lock		(sctp_globals.port_alloc_lock)
 #define sctp_port_hashtable		(sctp_globals.port_hashtable)
 #define sctp_local_addr_list		(sctp_globals.local_addr_list)
+#define sctp_addr_waitq			(sctp_globals.addr_waitq)
+#define sctp_addr_wq_timer		(sctp_globals.addr_wq_timer)
+#define sctp_addr_wq_lock		(sctp_globals.addr_wq_lock)
+#define sctp_auto_asconf_enable		(sctp_globals.auto_asconf_enable)
 #define sctp_local_addr_lock		(sctp_globals.addr_list_lock)
 #define sctp_scope_policy		(sctp_globals.ipv4_scope_policy)
 #define sctp_addip_enable		(sctp_globals.addip_enable)
@@ -798,6 +806,16 @@ struct sctp_sockaddr_entry {
 	__u8 valid;
 };
 
+#define SCTP_NEWADDR	1
+#define SCTP_DELADDR	2
+#define SCTP_ADDRESS_TICK_DELAY	500
+struct sctp_addr_wait {
+	struct list_head list;
+	struct rcu_head rcu;
+	union sctp_addr a;
+	int	cmd;
+};
+
 typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
 
 /* This structure holds lists of chunks as we are assembling for
@@ -1241,6 +1259,7 @@ sctp_scope_t sctp_scope(const union sctp
 int sctp_in_scope(const union sctp_addr *addr, const sctp_scope_t scope);
 int sctp_is_any(struct sock *sk, const union sctp_addr *addr);
 int sctp_addr_is_valid(const union sctp_addr *addr);
+int sctp_is_ep_boundall(struct sock *sk);
 
 
 /* What type of endpoint?  */
@@ -1903,6 +1922,11 @@ struct sctp_association {
 	 * after reaching 4294967295.
 	 */
 	__u32 addip_serial;
+	/* List of local addresses valid only within this association */
+	struct list_head asoc_laddr_list;
+	union sctp_addr *asconf_addr_del_pending;
+	__u32 asconf_del_pending_cid;
+	int src_out_of_asoc_ok;
 
 	/* SCTP AUTH: list of the endpoint shared keys.  These
 	 * keys are provided out of band by the user applicaton
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/kernel/sysctl_check.c linux-2.6/kernel/sysctl_check.c
--- linux-2.6.orig/kernel/sysctl_check.c	2010-04-22 17:55:41.000000000 +0900
+++ linux-2.6/kernel/sysctl_check.c	2010-04-21 18:40:22.000000000 +0900
@@ -5,7 +5,6 @@
 #include <linux/string.h>
 #include <net/ip_vs.h>
 
-
 static int sysctl_depth(struct ctl_table *table)
 {
 	struct ctl_table *tmp;
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/associola.c linux-2.6/net/sctp/associola.c
--- linux-2.6.orig/net/sctp/associola.c	2010-10-11 08:24:34.000000000 +0900
+++ linux-2.6/net/sctp/associola.c	2010-10-11 07:21:40.000000000 +0900
@@ -278,6 +278,10 @@ static struct sctp_association *sctp_ass
 	if (sctp_addip_noauth)
 		asoc->peer.asconf_capable = 1;
 
+	asoc->asconf_addr_del_pending = NULL;
+	asoc->asconf_del_pending_cid = 0;
+	asoc->src_out_of_asoc_ok = 0;
+	INIT_LIST_HEAD(&asoc->asoc_laddr_list);
 	/* Create an input queue.  */
 	sctp_inq_init(&asoc->base.inqueue);
 	sctp_inq_set_th_handler(&asoc->base.inqueue, sctp_assoc_bh_rcv);
@@ -444,6 +448,17 @@ void sctp_association_free(struct sctp_a
 	/* Free any cached ASCONF_ACK chunk. */
 	sctp_assoc_free_asconf_acks(asoc);
 
+	/* Free pending address space being deleted */
+	if (asoc->asconf_addr_del_pending != NULL) 
+		kfree(asoc->asconf_addr_del_pending);
+	if (!list_empty(&asoc->asoc_laddr_list)) {
+		struct sctp_sockaddr_entry *laddr, *tmp;
+		list_for_each_entry_safe(laddr, tmp, &asoc->asoc_laddr_list, list) {
+			list_del(&laddr->list);
+			kfree(laddr);
+		}
+	}
+
 	/* Free any cached ASCONF chunk. */
 	if (asoc->addip_last_asconf)
 		sctp_chunk_free(asoc->addip_last_asconf);
@@ -618,6 +633,7 @@ void sctp_assoc_rm_peer(struct sctp_asso
 			if (!mod_timer(&active->T3_rtx_timer,
 					jiffies + active->rto))
 				sctp_transport_hold(active);
+		active->flight_size += peer->flight_size;
 	}
 
 	asoc->peer.transport_count--;
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/bind_addr.c linux-2.6/net/sctp/bind_addr.c
--- linux-2.6.orig/net/sctp/bind_addr.c	2010-04-22 17:55:41.000000000 +0900
+++ linux-2.6/net/sctp/bind_addr.c	2010-06-24 02:33:20.000000000 +0900
@@ -536,6 +536,24 @@ int sctp_in_scope(const union sctp_addr 
 	return 0;
 }
 
+int sctp_is_ep_boundall(struct sock *sk)
+{
+	struct sctp_bind_addr *bp;
+	struct sctp_sockaddr_entry *addr;
+
+	if (!sk) 
+		return 0;
+       
+	bp = &sctp_sk(sk)->ep->base.bind_addr;
+	if (sctp_list_single_entry(&bp->address_list)) {
+		addr = list_entry(bp->address_list.next,
+				  struct sctp_sockaddr_entry, list);
+		if (sctp_is_any(sk, &addr->a)) 
+			return 1;
+	}
+	return 0;
+}
+
 /********************************************************************
  * 3rd Level Abstractions
  ********************************************************************/
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/ipv6.c linux-2.6/net/sctp/ipv6.c
--- linux-2.6.orig/net/sctp/ipv6.c	2010-10-11 08:24:34.000000000 +0900
+++ linux-2.6/net/sctp/ipv6.c	2010-10-11 07:21:40.000000000 +0900
@@ -103,6 +103,7 @@ static int sctp_inet6addr_event(struct n
 			addr->valid = 1;
 			spin_lock_bh(&sctp_local_addr_lock);
 			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			sctp_addr_wq_mgmt(&addr->a, SCTP_NEWADDR);
 			spin_unlock_bh(&sctp_local_addr_lock);
 		}
 		break;
@@ -113,6 +114,7 @@ static int sctp_inet6addr_event(struct n
 			if (addr->a.sa.sa_family == AF_INET6 &&
 					ipv6_addr_equal(&addr->a.v6.sin6_addr,
 						&ifa->addr)) {
+				sctp_addr_wq_mgmt(&addr->a, SCTP_DELADDR);
 				found = 1;
 				addr->valid = 0;
 				list_del_rcu(&addr->list);
@@ -330,6 +332,25 @@ static void sctp_v6_get_saddr(struct sct
 				matchlen = bmatchlen;
 			}
 		}
+		if ((laddr->state == SCTP_ADDR_NEW) && asoc->src_out_of_asoc_ok) {
+			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
+			if (!baddr || (matchlen < bmatchlen)) {
+				baddr = &laddr->a;
+				matchlen = bmatchlen;
+			}
+		}
+	}
+	if (baddr == NULL) {
+		/* No valid endpoint-wide source address was found.
+		 * Fall back to the association-local list of valid addresses.
+		 */
+		list_for_each_entry(laddr, &asoc->asoc_laddr_list, list) {
+			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
+			if (!baddr || (matchlen < bmatchlen)) {
+				baddr = &laddr->a;
+				matchlen = bmatchlen;
+			}
+		}
 	}
 
 	if (baddr) {
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/outqueue.c linux-2.6/net/sctp/outqueue.c
--- linux-2.6.orig/net/sctp/outqueue.c	2010-10-11 08:24:34.000000000 +0900
+++ linux-2.6/net/sctp/outqueue.c	2010-10-11 07:21:40.000000000 +0900
@@ -342,7 +342,13 @@ int sctp_outq_tail(struct sctp_outq *q, 
 			break;
 		}
 	} else {
-		list_add_tail(&chunk->list, &q->control_chunk_list);
+		/* An ASCONF for a newly added address that is the only usable
+		 * source goes to the front of the queue
+		 */
+		if (q->asoc->src_out_of_asoc_ok && (chunk->chunk_hdr->type == SCTP_CID_ASCONF))
+			list_add(&chunk->list, &q->control_chunk_list);
+		else
+			list_add_tail(&chunk->list, &q->control_chunk_list);
 		SCTP_INC_STATS(SCTP_MIB_OUTCTRLCHUNKS);
 	}
 
@@ -850,6 +856,24 @@ static int sctp_outq_flush(struct sctp_o
 		case SCTP_CID_SHUTDOWN:
 		case SCTP_CID_ECN_ECNE:
 		case SCTP_CID_ASCONF:
+			/* RFC 5061, 5.3
+			 * F1) This means that until such time as the ASCONF 
+			 * containing the add is acknowledged, the sender MUST 
+			 * NOT use the new IP address as a source for ANY SCTP 
+			 * packet except on carrying an ASCONF Chunk.
+			 */
+			if (asoc->src_out_of_asoc_ok) {
+				SCTP_DEBUG_PRINTK("outq_flush: out_of_asoc_ok, transmit chunk type %d\n", chunk->chunk_hdr->type);
+				packet = &transport->packet;
+				sctp_packet_config(packet, vtag, 
+						asoc->peer.ecn_capable);
+				sctp_packet_append_chunk(packet, chunk);
+				error = sctp_packet_transmit(packet);
+				if (error < 0) {
+					return error;
+				}
+				goto sctp_flush_out;
+			}
 		case SCTP_CID_FWD_TSN:
 			status = sctp_packet_transmit_chunk(packet, chunk,
 							    one_packet);
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/protocol.c linux-2.6/net/sctp/protocol.c
--- linux-2.6.orig/net/sctp/protocol.c	2010-10-11 08:24:34.000000000 +0900
+++ linux-2.6/net/sctp/protocol.c	2010-10-11 07:21:40.000000000 +0900
@@ -508,12 +508,19 @@ static struct dst_entry *sctp_v4_get_dst
 		sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
 		rcu_read_lock();
 		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
-			if (!laddr->valid || (laddr->state != SCTP_ADDR_SRC))
+			if (!laddr->valid || ((laddr->state != SCTP_ADDR_SRC) && (asoc->src_out_of_asoc_ok == 0)))
 				continue;
 			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
 				goto out_unlock;
 		}
 		rcu_read_unlock();
+		/* No valid endpoint-wide source address was found.
+		 * Fall back to the association-local list of valid addresses.
+		 */
+		list_for_each_entry(laddr, &asoc->asoc_laddr_list, list) {
+			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
+				goto out_unlock;
+		}
 
 		/* None of the bound addresses match the source address of the
 		 * dst. So release it.
@@ -633,6 +640,184 @@ static void sctp_v4_ecn_capable(struct s
 	INET_ECN_xmit(sk);
 }
 
+void sctp_addr_wq_timeout_handler(unsigned long arg)
+{
+	struct sctp_addr_wait *addrw = NULL;
+	union sctp_addr *addr = NULL;
+	struct sctp_ep_common *epb = NULL;
+	struct sctp_endpoint *ep = NULL;
+	struct hlist_node *node = NULL;
+	struct sctp_hashbucket *head = NULL;
+	int cnt = 0;
+	int i;
+
+	spin_lock_bh(&sctp_addr_wq_lock);
+retry_wq:
+	if (list_empty(&sctp_addr_waitq)) {
+		SCTP_DEBUG_PRINTK("sctp_addrwq_timo_handler: nothing in addr waitq\n");
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	addrw = list_first_entry(&sctp_addr_waitq, struct sctp_addr_wait, list);
+	if ((addrw->cmd != SCTP_NEWADDR) && (addrw->cmd != SCTP_DELADDR)) {
+		SCTP_DEBUG_PRINTK("sctp_addrwq_timo_handler: Huh, cmd is neither NEWADDR nor DELADDR\n");
+		list_del(&addrw->list);
+		kfree(addrw);
+		goto retry_wq;
+	}
+
+	addr = &addrw->a;
+	SCTP_DEBUG_PRINTK_IPADDR("sctp_addrwq_timo_handler: the first ent in wq %p is "," for cmd %d at entry %p\n", &sctp_addr_waitq, addr, addrw->cmd, addrw);
+
+	/* Now we send an ASCONF for each association */
+	/* Note: we currently don't handle link-local IPv6 addresses */
+	if (addr->sa.sa_family == AF_INET6) {
+		struct in6_addr *in6 = (struct in6_addr *)&addr->v6.sin6_addr;
+
+		if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+			SCTP_DEBUG_PRINTK("sctp_timo_handler: link local, hence don't tell eps\n");
+			list_del(&addrw->list);
+			kfree(addrw);
+			goto retry_wq;
+		}
+		if ((ipv6_chk_addr(&init_net, in6, NULL, 0) == 0) && (addrw->cmd == SCTP_NEWADDR)) {
+			unsigned long timeo_val;
+
+			SCTP_DEBUG_PRINTK("sctp_timo_handler: this is on DAD, trying %d msec later\n", SCTP_ADDRESS_TICK_DELAY);
+			timeo_val = jiffies;
+			timeo_val += msecs_to_jiffies(SCTP_ADDRESS_TICK_DELAY);
+			(void)mod_timer(&sctp_addr_wq_timer, timeo_val);
+			spin_unlock_bh(&sctp_addr_wq_lock);
+			return;
+		}
+	}
+	for (i = 0; i < sctp_ep_hashsize; ++i) {
+		head = &sctp_ep_hashtable[i];
+		if (head == NULL) {
+			SCTP_DEBUG_PRINTK("addrwq_timo_handler: no head in hash\n");
+			continue;
+		}
+		write_lock(&head->lock);
+		epb = NULL;
+		sctp_for_each_hentry(epb, node, &head->chain) {
+
+			if (epb == NULL) {
+				SCTP_DEBUG_PRINTK("addrwq_timo_handler: no epb\n");
+				continue;
+			}
+			if (!sctp_is_ep_boundall(epb->sk)) {
+				/* ignore bound-specific endpoints */
+				continue;
+			}
+			ep = sctp_ep(epb);
+			if (sctp_asconf_mgmt(ep, epb->sk) < 0) {
+				SCTP_DEBUG_PRINTK("sctp_addrwq_timo_handler: sctp_asconf_mgmt failed\n");
+				continue;
+			}
+			++cnt;
+		}
+		write_unlock(&head->lock);
+	}
+
+	list_del(&addrw->list);
+	kfree(addrw);
+
+	if (list_empty(&sctp_addr_waitq)) {
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	} else {
+		goto retry_wq;
+	}
+	spin_unlock_bh(&sctp_addr_wq_lock);
+}
+
+void sctp_addr_wq_mgmt(union sctp_addr *reqaddr, int cmd)
+{
+	struct sctp_addr_wait *addrw = NULL;
+	struct sctp_addr_wait *addrw_new = NULL;
+	unsigned long timeo_val;
+	union sctp_addr *tmpaddr; /* for debugging */
+
+	/* First, check whether an opposite request for the same address is
+	 * already queued. If so, remove it and return: the two cancel out.
+	 * This looks wasteful, but a DHCP client typically adds and deletes
+	 * an address a couple of times before the new address settles.
+	 */
+
+	if (reqaddr == NULL) {
+		SCTP_DEBUG_PRINTK("sctp_addr_wq_mgmt: no address message?\n");
+		return;
+	}
+	
+	spin_lock_bh(&sctp_addr_wq_lock);
+	/* Offsets existing events in addr_wq */
+	list_for_each_entry(addrw, &sctp_addr_waitq, list) {
+		if (addrw->a.sa.sa_family != reqaddr->sa.sa_family) {
+			continue;
+		}
+		if (reqaddr->sa.sa_family == AF_INET) {
+			if (reqaddr->v4.sin_addr.s_addr == addrw->a.v4.sin_addr.s_addr) {
+				if (cmd != addrw->cmd) {
+					tmpaddr = &addrw->a;
+					SCTP_DEBUG_PRINTK_IPADDR("sctp_addr_wq_mgmt offsets existing entry for %d "," in waitq %p\n", addrw->cmd, tmpaddr, &sctp_addr_waitq);
+					list_del(&addrw->list);
+					kfree(addrw);
+					/* nothing to do anymore */
+					spin_unlock_bh(&sctp_addr_wq_lock);
+					return;
+				}
+			}
+		}
+		else if (reqaddr->sa.sa_family == AF_INET6) {
+			if (memcmp(&reqaddr->v6.sin6_addr, &addrw->a.v6.sin6_addr, sizeof(struct in6_addr)) == 0) {
+				if (cmd != addrw->cmd) {
+					tmpaddr = &addrw->a;
+					SCTP_DEBUG_PRINTK_IPADDR("sctp_addr_wq_mgmt: offsets existing entry for %d "," in waitq %p\n", addrw->cmd, tmpaddr, &sctp_addr_waitq);
+					list_del(&addrw->list);
+					kfree(addrw);
+					spin_unlock_bh(&sctp_addr_wq_lock);
+					return;
+				}
+			}
+		}
+	}
+				
+	/* OK, we have to add the new address to the wait queue */
+	addrw_new = kmalloc(sizeof(struct sctp_addr_wait), GFP_ATOMIC);
+	if (addrw_new == NULL) {
+		SCTP_DEBUG_PRINTK("sctp_addr_wq_mgmt: no memory, return\n");
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	memset(addrw_new, 0, sizeof(struct sctp_addr_wait));
+	if (reqaddr->sa.sa_family == AF_INET) {
+		addrw_new->a.v4.sin_family = AF_INET;
+		memcpy(&addrw_new->a.v4.sin_addr, &reqaddr->v4.sin_addr, sizeof(struct in_addr));
+	} else if (reqaddr->sa.sa_family == AF_INET6) {
+		addrw_new->a.v6.sin6_family = AF_INET6;
+		memcpy(&addrw_new->a.v6.sin6_addr, &reqaddr->v6.sin6_addr, sizeof(struct in6_addr));
+	} else {
+		SCTP_DEBUG_PRINTK("sctp_addr_wq_mgmt: Unknown family of request addr, return\n");
+		kfree(addrw_new);
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	addrw_new->cmd = cmd;
+	list_add_tail(&addrw_new->list, &sctp_addr_waitq);
+	tmpaddr = &addrw_new->a;
+	SCTP_DEBUG_PRINTK_IPADDR("sctp_addr_wq_mgmt add new entry for cmd:%d "," in waitq %p, start a timer\n", addrw_new->cmd, tmpaddr, &sctp_addr_waitq);
+
+	if (timer_pending(&sctp_addr_wq_timer)) {
+		SCTP_DEBUG_PRINTK("sctp_addr_wq_mgmt: addr_wq timer is already running\n");
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	timeo_val = jiffies;
+	timeo_val += msecs_to_jiffies(SCTP_ADDRESS_TICK_DELAY);
+	(void)mod_timer(&sctp_addr_wq_timer, timeo_val);
+	spin_unlock_bh(&sctp_addr_wq_lock);
+}
+
 /* Event handler for inet address addition/deletion events.
  * The sctp_local_addr_list needs to be protocted by a spin lock since
  * multiple notifiers (say IPv4 and IPv6) may be running at the same
@@ -660,6 +845,7 @@ static int sctp_inetaddr_event(struct no
 			addr->valid = 1;
 			spin_lock_bh(&sctp_local_addr_lock);
 			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			sctp_addr_wq_mgmt(&addr->a, SCTP_NEWADDR);
 			spin_unlock_bh(&sctp_local_addr_lock);
 		}
 		break;
@@ -670,6 +856,7 @@ static int sctp_inetaddr_event(struct no
 			if (addr->a.sa.sa_family == AF_INET &&
 					addr->a.v4.sin_addr.s_addr ==
 					ifa->ifa_local) {
+				sctp_addr_wq_mgmt(&addr->a, SCTP_DELADDR);
 				found = 1;
 				addr->valid = 0;
 				list_del_rcu(&addr->list);
@@ -1276,6 +1463,10 @@ SCTP_STATIC __init int sctp_init(void)
 
 	/* Initialize the local address list. */
 	INIT_LIST_HEAD(&sctp_local_addr_list);
+	INIT_LIST_HEAD(&sctp_addr_waitq);
+	spin_lock_init(&sctp_addr_wq_lock);
+	sctp_addr_wq_timer.expires = 0;
+	setup_timer(&sctp_addr_wq_timer, sctp_addr_wq_timeout_handler, (unsigned long)NULL); 
 	spin_lock_init(&sctp_local_addr_lock);
 	sctp_get_local_addr_list();
 
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/sm_make_chunk.c linux-2.6/net/sctp/sm_make_chunk.c
--- linux-2.6.orig/net/sctp/sm_make_chunk.c	2010-10-11 08:24:34.000000000 +0900
+++ linux-2.6/net/sctp/sm_make_chunk.c	2010-10-11 07:21:40.000000000 +0900
@@ -2649,6 +2649,78 @@ __u32 sctp_generate_tsn(const struct sct
 	return retval;
 }
 
+void
+sctp_trans_immediate_retrans(struct sctp_transport *trans)
+{
+	struct sctp_association *asoc = trans->asoc;
+
+	/* Stop pending T3_rtx_timer on this transport */
+	if (timer_pending(&trans->T3_rtx_timer)) {
+		(void)del_timer(&trans->T3_rtx_timer);
+		sctp_transport_put(trans);
+	} 
+
+	/* We consider the event as if the T3RTX timer expires */
+	sctp_retransmit(&asoc->outqueue, trans, SCTP_RTXR_T3_RTX);
+	if (!timer_pending(&trans->T3_rtx_timer)) {
+		if (!mod_timer(&trans->T3_rtx_timer, jiffies + trans->rto)) 
+			sctp_transport_hold(trans);
+	}
+
+	return;
+}
+
+void
+sctp_path_check_and_react(struct sctp_association *asoc, struct sockaddr *sa)
+{
+	struct sctp_transport *trans;
+	int addrnum, family;
+	struct sctp_sockaddr_entry *saddr;
+	struct sctp_bind_addr *bp;
+	union sctp_addr *tmpaddr;
+
+	family = sa->sa_family;
+	bp = &asoc->base.bind_addr;
+	addrnum = 0;
+	/* count up the number of local addresses in the same family */
+	list_for_each_entry(saddr, &bp->address_list, list) {
+		if (saddr->a.sa.sa_family == family) {
+			tmpaddr = &saddr->a;
+			if (family == AF_INET6 && 
+			    ipv6_addr_type(&tmpaddr->v6.sin6_addr) & 
+			    IPV6_ADDR_LINKLOCAL) {
+				continue;
+			}
+			addrnum++;
+		}
+	}
+	if (addrnum == 1) {
+		union sctp_addr *tmpaddr;
+		tmpaddr = (union sctp_addr *)sa;
+		SCTP_DEBUG_PRINTK_IPADDR("pcheck_react: only 1 local addr in asoc %p "," family %d\n", asoc, tmpaddr, family);
+		list_for_each_entry(trans, &asoc->peer.transport_addr_list, transports) {
+			/* reset path information and release refcount to the 
+			 * dst_entry  based on the src change */
+			sctp_transport_hold(trans);
+			trans->cwnd = min(4*asoc->pathmtu, max_t(__u32, 2*asoc->pathmtu, 4380));
+			trans->ssthresh = asoc->peer.i.a_rwnd;
+			trans->rtt = 0;
+			trans->srtt = 0;
+			trans->rttvar = 0;
+			trans->rto = asoc->rto_initial;
+			dst_release(trans->dst);
+			trans->dst = NULL;
+			memset(&trans->saddr, 0, sizeof(union sctp_addr));
+			sctp_transport_route(trans, NULL, sctp_sk(asoc->base.sk));
+			SCTP_DEBUG_PRINTK_IPADDR("we freed dst_entry (asoc: %p dst: "," trans: %p)\n", asoc, (&trans->ipaddr), trans);
+			trans->rto_pending = 1;
+			sctp_trans_immediate_retrans(trans);
+			sctp_transport_put(trans);
+		}
+	}
+	return;
+}
+
 /*
  * ADDIP 3.1.1 Address Configuration Change Chunk (ASCONF)
  *      0                   1                   2                   3
@@ -2742,11 +2814,30 @@ struct sctp_chunk *sctp_make_asconf_upda
 	int			addr_param_len = 0;
 	int 			totallen = 0;
 	int 			i;
+	sctp_addip_param_t del_param; // 8 Bytes (Type(0xC002), Len and CrrID)
+	sctp_addip_param_t spr_param;
+	struct sctp_af *del_af;
+	struct sctp_af *spr_af;
+	int del_addr_param_len = 0;
+	int spr_addr_param_len = 0;
+	int del_paramlen = sizeof(sctp_addip_param_t);
+	int spr_paramlen = sizeof(sctp_addip_param_t);
+	union sctp_addr_param del_addr_param; // (v4) 8 Bytes, (v6) 20 Bytes
+	union sctp_addr_param spr_addr_param;
+	int			v4 = 0;
+	int			v6 = 0;
 
 	/* Get total length of all the address parameters. */
 	addr_buf = addrs;
 	for (i = 0; i < addrcnt; i++) {
 		addr = (union sctp_addr *)addr_buf;
+		if (addr != NULL) {
+			if (addr->sa.sa_family == AF_INET) {
+				v4 = 1;
+			} else if (addr->sa.sa_family == AF_INET6) {
+				v6 = 1;
+			}
+		}
 		af = sctp_get_af_specific(addr->v4.sin_family);
 		addr_param_len = af->to_addr_param(addr, &addr_param);
 
@@ -2755,6 +2846,35 @@ struct sctp_chunk *sctp_make_asconf_upda
 
 		addr_buf += af->sockaddr_len;
 	}
+	/* Add the length of a pending address being deleted */
+	if ((flags == SCTP_PARAM_ADD_IP) && 
+	    (asoc->asconf_addr_del_pending != NULL)) {
+		if (((asoc->asconf_addr_del_pending->sa.sa_family == AF_INET) 
+		    && v4) || 
+		    ((asoc->asconf_addr_del_pending->sa.sa_family == AF_INET6)
+		    && v6)) {
+			del_af = sctp_get_af_specific(asoc->asconf_addr_del_pending->sa.sa_family);
+			del_addr_param_len = del_af->to_addr_param(asoc->asconf_addr_del_pending, &del_addr_param);
+			totallen += del_paramlen;
+			totallen += del_addr_param_len;
+			SCTP_DEBUG_PRINTK("mkasconf_update_ip: now we picked del_pending addr, totallen for all addresses is %d\n", totallen);
+			/* for Set Primary (same size as the del parameters) */
+			totallen += del_paramlen;
+			totallen += del_addr_param_len;
+		}
+		if (v4) {
+			if ((totallen != 32) && (totallen != 48)) {
+				SCTP_DEBUG_PRINTK("mkasconf_update_ip: incorrect total length of ASCONF parameters, del + add MUST be 32 bytes, but %d bytes\n", totallen);
+				return NULL;
+			}
+		} else if (v6) {
+			if ((totallen != 56) && (totallen != 84)) {
+				SCTP_DEBUG_PRINTK("mkasconf_update_ip: incorrect total length of ASCONF parameters, del + add MUST be 56 bytes, but %d bytes\n", totallen);
+				return NULL;
+			}
+		}
+	}
+	SCTP_DEBUG_PRINTK("mkasconf_update_ip: call mkasconf() for %d bytes\n", totallen);
 
 	/* Create an asconf chunk with the required length. */
 	retval = sctp_make_asconf(asoc, laddr, totallen);
@@ -2776,6 +2896,29 @@ struct sctp_chunk *sctp_make_asconf_upda
 
 		addr_buf += af->sockaddr_len;
 	}
+	if ((flags == SCTP_PARAM_ADD_IP) && 
+	    (asoc->asconf_addr_del_pending != NULL)) {
+		addr = asoc->asconf_addr_del_pending;
+		del_af = sctp_get_af_specific(addr->v4.sin_family);
+		del_addr_param_len = del_af->to_addr_param(addr, &del_addr_param);
+		del_param.param_hdr.type = SCTP_PARAM_DEL_IP;
+		del_param.param_hdr.length = htons(del_paramlen + del_addr_param_len);
+		del_param.crr_id = i;
+		asoc->asconf_del_pending_cid = i;
+
+		sctp_addto_chunk(retval, del_paramlen, &del_param);
+		sctp_addto_chunk(retval, del_addr_param_len, &del_addr_param);
+		/* For SET_PRIMARY */
+		addr_buf = addrs;
+		addr = (union sctp_addr *)addr_buf;
+		spr_af = sctp_get_af_specific(addr->v4.sin_family);
+		spr_addr_param_len = spr_af->to_addr_param(addr, &spr_addr_param);
+		spr_param.param_hdr.type = SCTP_PARAM_SET_PRIMARY;
+		spr_param.param_hdr.length = htons(spr_paramlen + spr_addr_param_len);
+		spr_param.crr_id = (i+1);
+		sctp_addto_chunk(retval, spr_paramlen, &spr_param);
+		sctp_addto_chunk(retval, spr_addr_param_len, &spr_addr_param);
+	}
 	return retval;
 }
 
@@ -2988,7 +3131,7 @@ static __be16 sctp_process_asconf_param(
 		 * an Error Cause TLV set to the new error code 'Request to
 		 * Delete Source IP Address'
 		 */
-		if (sctp_cmp_addr_exact(sctp_source(asconf), &addr))
+		if (sctp_cmp_addr_exact(&asconf->source, &addr))
 			return SCTP_ERROR_DEL_SRC_IP;
 
 		/* Section 4.2.2
@@ -3169,7 +3312,6 @@ static void sctp_asconf_param_success(st
 	struct sctp_bind_addr *bp = &asoc->base.bind_addr;
 	union sctp_addr_param *addr_param;
 	struct sctp_transport *transport;
-	struct sctp_sockaddr_entry *saddr;
 
 	addr_param = (union sctp_addr_param *)
 			((void *)asconf_param + sizeof(sctp_addip_param_t));
@@ -3184,9 +3326,16 @@ static void sctp_asconf_param_success(st
 		 * held, so the list can not change.
 		 */
 		local_bh_disable();
-		list_for_each_entry(saddr, &bp->address_list, list) {
-			if (sctp_cmp_addr_exact(&saddr->a, &addr))
-				saddr->state = SCTP_ADDR_SRC;
+		/* Until this ASCONF is acked on all associations, we cannot 
+		 * consider this address as ADDR_SRC
+		 */
+		asoc->src_out_of_asoc_ok = 0;
+		sctp_add_addr_to_laddr(&addr.sa, asoc);
+		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
+				transports) {
+			dst_release(transport->dst);
+			sctp_transport_route(transport, NULL,
+					     sctp_sk(asoc->base.sk));
 		}
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
@@ -3201,6 +3350,26 @@ static void sctp_asconf_param_success(st
 	case SCTP_PARAM_DEL_IP:
 		local_bh_disable();
 		sctp_del_bind_addr(bp, &addr);
+		if (asoc->asconf_addr_del_pending != NULL) {
+			if ((addr.sa.sa_family == AF_INET) && 
+			    (asoc->asconf_addr_del_pending->sa.sa_family == 
+			     AF_INET)) {
+				if (asoc->asconf_addr_del_pending->v4.sin_addr.s_addr == addr.v4.sin_addr.s_addr) {
+					kfree(asoc->asconf_addr_del_pending);
+					asoc->asconf_del_pending_cid = 0;
+					asoc->asconf_addr_del_pending = NULL;
+				}
+			} 
+			else if ((addr.sa.sa_family == AF_INET6) && 
+				(asoc->asconf_addr_del_pending->sa.sa_family == 
+				 AF_INET6)) {
+				if (memcmp(&asoc->asconf_addr_del_pending->v6.sin6_addr, &addr.v6.sin6_addr, sizeof(struct in6_addr)) == 0) {
+					kfree(asoc->asconf_addr_del_pending);
+					asoc->asconf_del_pending_cid = 0;
+					asoc->asconf_addr_del_pending = NULL;
+				}
+			}
+		}
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
@@ -3291,6 +3460,8 @@ int sctp_process_asconf_ack(struct sctp_
 	int	no_err = 1;
 	int	retval = 0;
 	__be16	err_code = SCTP_ERROR_NO_ERROR;
+	sctp_addip_param_t *first_asconf_param = NULL;
+	int first_asconf_paramlen;
 
 	/* Skip the chunkhdr and addiphdr from the last asconf sent and store
 	 * a pointer to address parameter.
@@ -3305,6 +3476,8 @@ int sctp_process_asconf_ack(struct sctp_
 	length = ntohs(addr_param->v4.param_hdr.length);
 	asconf_param = (sctp_addip_param_t *)((void *)addr_param + length);
 	asconf_len -= length;
+	first_asconf_param = asconf_param;
+	first_asconf_paramlen = ntohs(first_asconf_param->param_hdr.length);
 
 	/* ADDIP 4.1
 	 * A8) If there is no response(s) to specific TLV parameter(s), and no
@@ -3359,6 +3532,34 @@ int sctp_process_asconf_ack(struct sctp_
 		asconf_len -= length;
 	}
 
+	/* When the source address obviously changes to newly added one, we 
+	   reset the cwnd to re-probe the path condition
+	*/
+	if (no_err && (first_asconf_param->param_hdr.type == SCTP_PARAM_ADD_IP)) {
+		if (first_asconf_paramlen == 16) {
+			struct sockaddr_in sin;
+
+			memset(&sin, 0, sizeof(struct sockaddr_in));
+			sin.sin_family = AF_INET;
+			memcpy(&sin.sin_addr.s_addr, first_asconf_param + 1, 
+					sizeof(struct in_addr));
+			sctp_path_check_and_react(asoc, 
+					(struct sockaddr *)&sin);
+
+		} else if (first_asconf_paramlen == 28) {
+			struct sockaddr_in6 sin6;
+
+			memset(&sin6, 0, sizeof(struct sockaddr_in6));
+			sin6.sin6_family = AF_INET6;
+			memcpy(&sin6.sin6_addr, first_asconf_param + 1, 
+					sizeof(struct in6_addr));
+			sctp_path_check_and_react(asoc, 
+					(struct sockaddr *)&sin6);
+		} else {
+			SCTP_DEBUG_PRINTK("funny asconf_paramlen? (%d)\n", first_asconf_paramlen);
+		}
+	}
+
 	/* Free the cached last sent asconf chunk. */
 	list_del_init(&asconf->transmitted_list);
 	sctp_chunk_free(asconf);
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/socket.c linux-2.6/net/sctp/socket.c
--- linux-2.6.orig/net/sctp/socket.c	2010-10-11 08:24:34.000000000 +0900
+++ linux-2.6/net/sctp/socket.c	2010-10-11 07:21:40.000000000 +0900
@@ -525,6 +525,7 @@ static int sctp_send_asconf_add_ip(struc
 	struct list_head		*p;
 	int 				i;
 	int 				retval = 0;
+	struct sctp_transport 		*trans = NULL;
 
 	if (!sctp_addip_enable)
 		return retval;
@@ -581,13 +582,11 @@ static int sctp_send_asconf_add_ip(struc
 			goto out;
 		}
 
-		retval = sctp_send_asconf(asoc, chunk);
-		if (retval)
-			goto out;
 
 		/* Add the new addresses to the bind address list with
 		 * use_as_src set to 0.
 		 */
+		SCTP_DEBUG_PRINTK("snd_asconf_addip: next, add_bind_addr with ADDR_NEW flag\n");
 		addr_buf = addrs;
 		for (i = 0; i < addrcnt; i++) {
 			addr = (union sctp_addr *)addr_buf;
@@ -597,6 +596,26 @@ static int sctp_send_asconf_add_ip(struc
 						    SCTP_ADDR_NEW, GFP_ATOMIC);
 			addr_buf += af->sockaddr_len;
 		}
+		list_for_each_entry(trans, &asoc->peer.transport_addr_list, transports) {
+			if (asoc->asconf_addr_del_pending != NULL) {
+				/* This ADDIP ASCONF piggybacks DELIP for the 
+				 * last address, so need to select src addr 
+				 * from the out_of_asoc addrs 
+				 */
+				asoc->src_out_of_asoc_ok = 1;
+			}
+			/* Clear the source and route cache in the path */
+			memset(&trans->saddr, 0, sizeof(union sctp_addr));
+			dst_release(trans->dst);
+			trans->cwnd = min(4*asoc->pathmtu, max_t(__u32, 2*asoc->pathmtu, 4380));
+			trans->ssthresh = asoc->peer.i.a_rwnd;
+			trans->rto = asoc->rto_initial;
+			trans->rtt = 0;
+			trans->srtt = 0;
+			trans->rttvar = 0;
+			sctp_transport_route(trans, NULL, sctp_sk(asoc->base.sk));
+		}
+		retval = sctp_send_asconf(asoc, chunk);
 	}
 
 out:
@@ -638,6 +657,7 @@ static int sctp_bindx_rem(struct sock *s
 		 * bind address, there is nothing more to be removed (we need
 		 * at least one address here).
 		 */
+		
 		if (list_empty(&bp->address_list) ||
 		    (sctp_list_single_entry(&bp->address_list))) {
 			retval = -EBUSY;
@@ -709,7 +729,9 @@ static int sctp_send_asconf_del_ip(struc
 	struct sctp_sockaddr_entry *saddr;
 	int 			i;
 	int 			retval = 0;
+	int			stored = 0;
 
+	chunk = NULL;
 	if (!sctp_addip_enable)
 		return retval;
 
@@ -760,8 +782,36 @@ static int sctp_send_asconf_del_ip(struc
 		bp = &asoc->base.bind_addr;
 		laddr = sctp_find_unmatch_addr(bp, (union sctp_addr *)addrs,
 					       addrcnt, sp);
-		if (!laddr)
-			continue;
+		if ((laddr == NULL) && (addrcnt == 1)) {
+			union sctp_addr *sa_addr = NULL;
+
+			if (asoc->asconf_addr_del_pending == NULL) {
+				asoc->asconf_addr_del_pending = kmalloc(sizeof(union sctp_addr), GFP_ATOMIC);
+				memset(asoc->asconf_addr_del_pending, 0, 
+						sizeof(union sctp_addr));
+				if (addrs->sa_family == AF_INET) {
+					struct sockaddr_in *sin;
+
+					sin = (struct sockaddr_in *)addrs;
+					asoc->asconf_addr_del_pending->v4.sin_family = AF_INET;
+					memcpy(&asoc->asconf_addr_del_pending->v4.sin_addr, &sin->sin_addr, sizeof(struct in_addr));
+				} else if (addrs->sa_family == AF_INET6) {
+					struct sockaddr_in6 *sin6;
+
+					sin6 = (struct sockaddr_in6 *)addrs;
+					asoc->asconf_addr_del_pending->v6.sin6_family = AF_INET6;
+					memcpy(&asoc->asconf_addr_del_pending->v6.sin6_addr, &sin6->sin6_addr, sizeof(struct in6_addr));
+				}
+				sa_addr = (union sctp_addr *)addrs;
+				SCTP_DEBUG_PRINTK_IPADDR("send_asconf_del_ip: keep the last address asoc: %p "," at %p\n", asoc, sa_addr, asoc->asconf_addr_del_pending);
+				stored = 1;
+				goto skip_mkasconf;
+			} else {
+				SCTP_DEBUG_PRINTK_IPADDR("send_asconf_del_ip: asoc %p, deleting last address "," is already stored at %p\n", asoc, asoc->asconf_addr_del_pending, asoc->asconf_addr_del_pending);
+				continue;
+			}
+		}
+
 
 		/* We do not need RCU protection throughout this loop
 		 * because this is done under a socket lock from the
@@ -774,6 +824,7 @@ static int sctp_send_asconf_del_ip(struc
 			goto out;
 		}
 
+skip_mkasconf:
 		/* Reset use_as_src flag for the addresses in the bind address
 		 * list that are to be deleted.
 		 */
@@ -795,16 +846,222 @@ static int sctp_send_asconf_del_ip(struc
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 					transports) {
 			dst_release(transport->dst);
+			/* Clear source address cache */
+			memset(&transport->saddr, 0, sizeof(union sctp_addr));
 			sctp_transport_route(transport, NULL,
 					     sctp_sk(asoc->base.sk));
 		}
 
+		if (stored) {
+			/* We don't need to transmit ASCONF */
+			continue;
+		}
 		retval = sctp_send_asconf(asoc, chunk);
 	}
 out:
 	return retval;
 }
 
+/* Add a new address to the list of addresses that are available only in this
+ * association.  If the new address is also available on the other associations
+ * on the endpoint, it is marked as SCTP_ADDR_SRC in the bind address list on
+ * the endpoint.  This situation is possible when only some of the associations
+ * at the endpoint have received the ASCONF-ACK for ADD_IP.
+ */
+void
+sctp_add_addr_to_laddr(struct sockaddr *sa, struct sctp_association *asoc)
+{
+	struct sctp_endpoint *ep = asoc->ep;
+	struct sctp_association *tmp = NULL;
+	struct sctp_bind_addr *bp;
+	struct sctp_sockaddr_entry *addr;
+	struct sockaddr_in *sin = NULL;
+	struct sockaddr_in6 *sin6 = NULL;
+	int local;
+	int found;
+
+	union sctp_addr *tmpaddr = NULL;
+	tmpaddr = (union sctp_addr *)sa;
+	SCTP_DEBUG_PRINTK_IPADDR("add_addr_to_laddr: asoc: %p "," ep: %p", asoc, tmpaddr, ep);
+	if (sa->sa_family == AF_INET) {
+		sin = (struct sockaddr_in *)sa;
+	} else if (sa->sa_family == AF_INET6) {
+		sin6 = (struct sockaddr_in6 *)sa;
+	}
+
+	/* Check if this address is locally available in the other asocs */
+	local = 0;
+	list_for_each_entry(tmp, &ep->asocs, asocs) {
+		if (tmp == asoc) {
+			continue;
+		}
+		found = 0;
+		list_for_each_entry(addr, &tmp->asoc_laddr_list, list) {
+			tmpaddr = &addr->a;
+			if (sa->sa_family != addr->a.sa.sa_family) {
+				continue;
+			}
+			if (sa->sa_family == AF_INET) {
+				if (sin->sin_addr.s_addr == addr->a.v4.sin_addr.s_addr) {
+					found = 1;
+				}
+			} else if (sa->sa_family == AF_INET6) {
+				if (memcmp(&sin6->sin6_addr, &addr->a.v6.sin6_addr, sizeof(struct in6_addr)) == 0) {
+					found = 1;
+
+				}
+			}
+		}
+		if (!found) {
+			SCTP_DEBUG_PRINTK("add_addr_to_laddr: not found in asoc %p\n", tmp);
+			local = 1;
+			break;
+		}
+	}
+	addr = NULL;
+
+	if (local) {
+		/* this address is not available in some of the other 
+		 * associations.  So add as locally-available in this 
+		 * association
+		 */
+		addr = kmalloc(sizeof(struct sctp_sockaddr_entry), GFP_ATOMIC);
+		if  (addr == NULL) {
+			SCTP_DEBUG_PRINTK("add_addr_to_laddr: failed to allocate memory for this address\n");
+			return;
+		}
+		memset(addr, 0, sizeof(struct sctp_sockaddr_entry));
+		if (sa->sa_family == AF_INET) {
+			addr->a.sa.sa_family = AF_INET;
+			addr->a.v4.sin_port = sin->sin_port;
+			addr->a.v4.sin_addr.s_addr = sin->sin_addr.s_addr;
+		} else if (sa->sa_family == AF_INET6) {
+			addr->a.sa.sa_family = AF_INET6;
+			addr->a.v6.sin6_port = sin6->sin6_port;
+			memcpy(&addr->a.v6.sin6_addr, &sin6->sin6_addr, sizeof(struct in6_addr));
+		}
+		list_add_tail(&addr->list, &asoc->asoc_laddr_list);
+		SCTP_DEBUG_PRINTK("add_addr_to_laddr: now we added this address to the local list on asoc %p\n", asoc);
+	} else {
+		/* this address is also available in all other asocs.  So set 
+		 * it as ADDR_SRC in the bind-addr list in the endpoint, then 
+		 * remove from the asoc_laddr_list on the associations.  
+		 */
+		SCTP_DEBUG_PRINTK("add_addr_to_laddr: this address is available in all other asocs\n");
+		bp = &asoc->base.bind_addr;
+
+		/* change state of the new address in the bind list */
+		list_for_each_entry(addr, &bp->address_list, list) {
+			if (addr->state != SCTP_ADDR_NEW) {
+				continue;
+			}
+			if (addr->a.sa.sa_family != sa->sa_family) {
+				continue;
+			}
+			if (addr->a.sa.sa_family == AF_INET) {
+				if (sin->sin_port != addr->a.v4.sin_port) {
+					continue;
+				}
+				if (sin->sin_addr.s_addr != 
+				    addr->a.v4.sin_addr.s_addr) {
+					continue;
+				}
+			} else if (addr->a.sa.sa_family == AF_INET6) {
+				if (sin6->sin6_port != addr->a.v6.sin6_port) {
+					continue;
+				}
+				if (memcmp(&sin6->sin6_addr, 
+				    &addr->a.v6.sin6_addr, 
+				    sizeof(struct in6_addr)) != 0) {
+					continue;
+				}
+			}
+			SCTP_DEBUG_PRINTK("add_addr_to_laddr: found the entry for this address with ADDR_NEW flag, set to ADDR_SRC\n");
+			addr->state = SCTP_ADDR_SRC;
+		}
+
+		/* remove the entry of this address from the asoc-local list */
+		list_for_each_entry(tmp, &ep->asocs, asocs) {
+			if (tmp == asoc) {
+				continue;
+			}
+			addr = NULL;
+			list_for_each_entry(addr, &tmp->asoc_laddr_list, list) {
+				if (sa->sa_family != addr->a.sa.sa_family) {
+					continue;
+				}
+				if (sa->sa_family == AF_INET) {
+					if (sin->sin_addr.s_addr != addr->a.v4.sin_addr.s_addr) {
+						continue;
+					}
+				} else if (sa->sa_family == AF_INET6) {
+					if (memcmp(&sin6->sin6_addr, &addr->a.v6.sin6_addr, sizeof(struct in6_addr)) != 0) {
+						continue;
+					}
+				}
+				break;
+			}
+			if (addr == NULL) {
+				SCTP_DEBUG_PRINTK("add_addr_to_laddr: Huh, asoc %p doesn't have the entry for this address?\n", tmp);
+				continue;
+			}
+			list_del(&addr->list);
+			kfree(addr);
+		}
+	}
+}
+
+/* set address events to associations in the given endpoint.  We assume the ep 
+ * is write-locked, and addr_wq is read-locked.  
+ */
+int
+sctp_asconf_mgmt(struct sctp_endpoint *ep, struct sock *sk)
+{
+	struct sctp_addr_wait *addrw = NULL;
+	union sctp_addr *addr = NULL;
+	int cmd;
+	int error = 0;
+
+	if (!sctp_auto_asconf_enable) {
+		return (0);
+	}
+	if ((ep == NULL) || (sk == NULL)) {
+		return(-EINVAL);
+	}
+	if (list_empty(&sctp_addr_waitq)) {
+		SCTP_DEBUG_PRINTK("asconf_mgmt: nothing in the wq\n");
+		return(-EINVAL);
+	}
+	addrw = list_first_entry(&sctp_addr_waitq, struct sctp_addr_wait, list);
+	if (addrw->cmd != SCTP_NEWADDR && addrw->cmd != SCTP_DELADDR) {
+		return(-EINVAL);
+	}
+	addr = &addrw->a;
+	cmd = addrw->cmd;
+
+	if (addr->sa.sa_family == AF_INET) {
+		addr->v4.sin_port = htons(ep->base.bind_addr.port);
+	} else if (addr->sa.sa_family == AF_INET6) {
+		addr->v6.sin6_port = htons(ep->base.bind_addr.port);
+	}
+
+	if (cmd == SCTP_NEWADDR) {
+		error = sctp_send_asconf_add_ip(sk, (struct sockaddr *)addr, 1);
+		if (error) {
+			SCTP_DEBUG_PRINTK("asconf_mgmt: send_asconf_add_ip returns %d\n", error);
+			return(error);
+		}
+	} else if (cmd == SCTP_DELADDR) {
+		error = sctp_send_asconf_del_ip(sk, (struct sockaddr *)addr, 1);
+		if (error) {
+			SCTP_DEBUG_PRINTK("asconf_mgmt: send_asconf_del_ip returns %d\n", error);
+			return(error);
+		}
+	}
+
+	return(0);
+}
+
 /* Helper for tunneling sctp_bindx() requests through sctp_setsockopt()
  *
  * API 8.1
@@ -1146,6 +1403,7 @@ static int __sctp_connect(struct sock* s
 	if ((err == 0 || err == -EINPROGRESS) && assoc_id)
 		*assoc_id = asoc->assoc_id;
 
+	sctp_hash_endpoint(ep);
 	/* Don't free association on exit. */
 	asoc = NULL;
 
@@ -3559,6 +3817,8 @@ SCTP_STATIC struct sock *sctp_accept(str
 	struct sctp_association *asoc;
 	long timeo;
 	int error = 0;
+	struct sctp_sock *newsp = NULL;
+	struct sctp_endpoint *newep = NULL;
 
 	sctp_lock_sock(sk);
 
@@ -3596,6 +3856,9 @@ SCTP_STATIC struct sock *sctp_accept(str
 	 * asoc to the newsk.
 	 */
 	sctp_sock_migrate(sk, newsk, asoc, SCTP_SOCKET_TCP);
+	newsp = sctp_sk(newsk);
+	newep = newsp->ep;
+	sctp_hash_endpoint(newep);
 
 out:
 	sctp_release_sock(sk);
diff -ru -x '\.git' -x arch -x drivers -x fs -x asm -x Documentation -p linux-2.6.orig/net/sctp/sysctl.c linux-2.6/net/sctp/sysctl.c
--- linux-2.6.orig/net/sctp/sysctl.c	2010-04-22 17:55:41.000000000 +0900
+++ linux-2.6/net/sctp/sysctl.c	2010-06-23 09:11:02.000000000 +0900
@@ -183,6 +183,13 @@ static ctl_table sctp_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
+		.procname	= "auto_asconf_enable",
+		.data		= &sctp_auto_asconf_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
 		.procname	= "prsctp_enable",
 		.data		= &sctp_prsctp_enable,
 		.maxlen		= sizeof(int),



* Re: [PATCH net-next] net:  allocate skbs on local node
From: Christoph Lameter @ 2010-10-12 12:50 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Eric Dumazet, David Miller, netdev, Michael Chan,
	Eilon Greenstein, Christoph Hellwig, David Rientjes, LKML,
	Nick Piggin
In-Reply-To: <4CB441CB.2000708@cs.helsinki.fi>

On Tue, 12 Oct 2010, Pekka Enberg wrote:

> There's little point in discussing the removal of SLAB as long as there are
> performance regressions for real workloads from people who are willing to
> share results and test patches. I'm optimistic that we'll be able to try
> removing SLAB some time next year unless something interesting pops up...

Hmmm. Given these effects, I think we should be more cautious regarding the
unification work. Maybe the "unified allocator" should replace SLAB
instead, and SLUB can stay unchanged? The unification patches go back to
the one-lock-per-node SLAB approach because the queue maintenance overhead
otherwise causes large regressions in hackbench due to lots of atomic
ops. The per-node lock seems to be causing problems here in the network
stack. Take the unified allocator as a SLAB cleanup instead? Then at least
we have a large common code base and just differentiate through the locking
mechanism?


* Re: [PATCH] ethtool: add the stmmac support
From: Peppe CAVALLARO @ 2010-10-12 13:26 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1286457211.2271.7.camel@achroite.uk.solarflarecom.com>

On 10/7/2010 3:13 PM, Ben Hutchings wrote:
>
> On Wed, 2010-10-06 at 07:36 +0200, Peppe CAVALLARO wrote:
> > Hello,
> >
> > On 09/28/2010 11:51 AM, Giuseppe CAVALLARO wrote:
> > > Add the stmmac support into the ethtool to
> > > dump both the Mac Core and Dma registers.
> >
> > Any news for this patch?
> >
> > The stmmac is now working on several platforms (not only on STM ST40
> > based boxes). I think it's worth having the ethtool support for the
> > driver.
> >
> > Welcome review and advice as usual.
> [...]
>
> You need to send ethtool patches to the ethtool maintainer, Jeff Garzik
> <jgarzik@pobox.com>.  I expect he'll make an ethtool release shortly
> after Linux 2.6.36, and will apply patches then.
>
Hi Ben,
many thanks.
I'll send the patch again, copying Jeff Garzik.

Regards
Peppe

> Ben.
>
> --
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>


* Re: [RFC PATCH 2/9] ipvs network name space aware
From: Daniel Lezcano @ 2010-10-12 14:30 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: lvs-devel, netdev, netfilter-devel, horms, ja, wensong
In-Reply-To: <201010081316.54676.hans.schillstrom@ericsson.com>

On 10/08/2010 01:16 PM, Hans Schillstrom wrote:
> This part contains the include files
> where include/net/netns/ip_vs.h is new and contains all moved vars.
>
> SUMMARY
>
>   include/net/ip_vs.h                     |  136 ++++---
>   include/net/net_namespace.h             |    2 +
>   include/net/netns/ip_vs.h               |  112 +++++
>
> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> ---
>    

Why are patches 1 and 2 the same?



* Re: [RFC PATCH 2/9] ipvs network name space aware
From: Hans Schillstrom @ 2010-10-12 14:36 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, horms@verge.net.au, ja@ssi.bg,
	wensong@linux-vs.org
In-Reply-To: <4CB470FC.7070500@free.fr>

On Tuesday 12 October 2010 16:30:20 Daniel Lezcano wrote:
> On 10/08/2010 01:16 PM, Hans Schillstrom wrote:
> > This part contains the include files
> > where include/net/netns/ip_vs.h is new and contains all moved vars.
> >
> > SUMMARY
> >
> >   include/net/ip_vs.h                     |  136 ++++---
> >   include/net/net_namespace.h             |    2 +
> >   include/net/netns/ip_vs.h               |  112 +++++
> >
> > Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> > ---
> >    
> 
> Why are patches 1 and 2 the same?
> 
> 
Cut & Paste error :-)

I'll resend patch 2


-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>


* [RFC PATCH 2/9 v1.1] ipvs network name space aware (resend with right patch)
From: Hans Schillstrom @ 2010-10-12 14:44 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, horms@verge.net.au, ja@ssi.bg,
	wensong@linux-vs.org
In-Reply-To: <4CB470FC.7070500@free.fr>

This is patch 2/9; the previous posting was just a copy of patch 1/9.

This patch just contains ip_vs_app.c.
There is nothing special with this file,
just the usual:
 - moving vars to struct ipvs
 - adding per-netns init and exit

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
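
The "moving vars to struct ipvs" step described above can be pictured with a
small user-space sketch. Everything in it is an assumption made for
illustration: struct net_sketch, struct netns_ipvs_sketch, struct app_node
and the sketch_* functions are hypothetical stand-ins, not the kernel's
struct net or the real ip_vs API. Locking is left out here, whereas the
patch takes net->ipvs->app_mutex around every list access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical application entry; stands in for struct ip_vs_app. */
struct app_node {
	struct app_node *next;
	const char *name;
};

/* State that used to be file-scope globals now lives per namespace. */
struct netns_ipvs_sketch {
	struct app_node *app_list;	/* was: one global application list */
};

/* Hypothetical stand-in for struct net. */
struct net_sketch {
	struct netns_ipvs_sketch *ipvs;
};

/* Per-namespace init: allocate and initialise the private state. */
int sketch_ipvs_net_init(struct net_sketch *net)
{
	net->ipvs = calloc(1, sizeof(*net->ipvs));
	return net->ipvs ? 0 : -1;
}

/* Per-namespace exit: release the private state. */
void sketch_ipvs_net_exit(struct net_sketch *net)
{
	free(net->ipvs);
	net->ipvs = NULL;
}

/* Registration takes the namespace explicitly instead of using a global. */
void sketch_register_app(struct net_sketch *net, struct app_node *app)
{
	assert(net->ipvs != NULL);
	app->next = net->ipvs->app_list;
	net->ipvs->app_list = app;
}

/* Count the applications registered in one namespace. */
size_t sketch_count_apps(const struct net_sketch *net)
{
	const struct app_node *app;
	size_t n = 0;

	for (app = net->ipvs->app_list; app != NULL; app = app->next)
		n++;
	return n;
}
```

Registering an application in one namespace leaves the list in any other
namespace untouched, which is the isolation the extra struct net argument
is meant to provide.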

diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
index e76f87f..87b7609 100644
--- a/net/netfilter/ipvs/ip_vs_app.c
+++ b/net/netfilter/ipvs/ip_vs_app.c
@@ -43,11 +43,6 @@ EXPORT_SYMBOL(register_ip_vs_app);
 EXPORT_SYMBOL(unregister_ip_vs_app);
 EXPORT_SYMBOL(register_ip_vs_app_inc);

-/* ipvs application list head */
-static LIST_HEAD(ip_vs_app_list);
-static DEFINE_MUTEX(__ip_vs_app_mutex);
-
-
 /*
  *	Get an ip_vs_app object
  */
@@ -67,7 +62,8 @@ static inline void ip_vs_app_put(struct ip_vs_app *app)
  *	Allocate/initialize app incarnation and register it in proto apps.
  */
 static int
-ip_vs_app_inc_new(struct ip_vs_app *app, __u16 proto, __u16 port)
+ip_vs_app_inc_new(struct net *net, struct ip_vs_app *app, __u16 proto,
+		  __u16 port)
 {
 	struct ip_vs_protocol *pp;
 	struct ip_vs_app *inc;
@@ -98,7 +94,7 @@ ip_vs_app_inc_new(struct ip_vs_app *app, __u16 proto, __u16 port)
 		}
 	}

-	ret = pp->register_app(inc);
+	ret = pp->register_app(net, inc);
 	if (ret)
 		goto out;

@@ -119,7 +115,7 @@ ip_vs_app_inc_new(struct ip_vs_app *app, __u16 proto, __u16 port)
  *	Release app incarnation
  */
 static void
-ip_vs_app_inc_release(struct ip_vs_app *inc)
+ip_vs_app_inc_release(struct net *net, struct ip_vs_app *inc)
 {
 	struct ip_vs_protocol *pp;

@@ -127,7 +123,7 @@ ip_vs_app_inc_release(struct ip_vs_app *inc)
 		return;

 	if (pp->unregister_app)
-		pp->unregister_app(inc);
+		pp->unregister_app(net, inc);

 	IP_VS_DBG(9, "%s App %s:%u unregistered\n",
 		  pp->name, inc->name, inc->port);
@@ -167,16 +163,16 @@ void ip_vs_app_inc_put(struct ip_vs_app *inc)
 /*
  *	Register an application incarnation in protocol applications
  */
-int
-register_ip_vs_app_inc(struct ip_vs_app *app, __u16 proto, __u16 port)
+int register_ip_vs_app_inc(struct net *net, struct ip_vs_app *app, __u16 proto,
+		           __u16 port)
 {
 	int result;

-	mutex_lock(&__ip_vs_app_mutex);
+	mutex_lock(&net->ipvs->app_mutex);

-	result = ip_vs_app_inc_new(app, proto, port);
+	result = ip_vs_app_inc_new(net, app, proto, port);

-	mutex_unlock(&__ip_vs_app_mutex);
+	mutex_unlock(&net->ipvs->app_mutex);

 	return result;
 }
@@ -185,16 +181,16 @@ register_ip_vs_app_inc(struct ip_vs_app *app, __u16 proto, __u16 port)
 /*
  *	ip_vs_app registration routine
  */
-int register_ip_vs_app(struct ip_vs_app *app)
+int register_ip_vs_app(struct net *net, struct ip_vs_app *app)
 {
 	/* increase the module use count */
 	ip_vs_use_count_inc();

-	mutex_lock(&__ip_vs_app_mutex);
+	mutex_lock(&net->ipvs->app_mutex);

-	list_add(&app->a_list, &ip_vs_app_list);
+	list_add(&app->a_list, &net->ipvs->app_list);

-	mutex_unlock(&__ip_vs_app_mutex);
+	mutex_unlock(&net->ipvs->app_mutex);

 	return 0;
 }
@@ -204,19 +200,19 @@ int register_ip_vs_app(struct ip_vs_app *app)
  *	ip_vs_app unregistration routine
  *	We are sure there are no app incarnations attached to services
  */
-void unregister_ip_vs_app(struct ip_vs_app *app)
+void unregister_ip_vs_app(struct net *net, struct ip_vs_app *app)
 {
 	struct ip_vs_app *inc, *nxt;

-	mutex_lock(&__ip_vs_app_mutex);
+	mutex_lock(&net->ipvs->app_mutex);

 	list_for_each_entry_safe(inc, nxt, &app->incs_list, a_list) {
-		ip_vs_app_inc_release(inc);
+		ip_vs_app_inc_release(net, inc);
 	}

 	list_del(&app->a_list);

-	mutex_unlock(&__ip_vs_app_mutex);
+	mutex_unlock(&net->ipvs->app_mutex);

 	/* decrease the module use count */
 	ip_vs_use_count_dec();
@@ -226,9 +222,9 @@ void unregister_ip_vs_app(struct ip_vs_app *app)
 /*
  *	Bind ip_vs_conn to its ip_vs_app (called by cp constructor)
  */
-int ip_vs_bind_app(struct ip_vs_conn *cp, struct ip_vs_protocol *pp)
+int ip_vs_bind_app(struct net *net, struct ip_vs_conn *cp, struct ip_vs_protocol *pp)
 {
-	return pp->app_conn_bind(cp);
+	return pp->app_conn_bind(net, cp);
 }


@@ -481,11 +477,12 @@ int ip_vs_app_pkt_in(struct ip_vs_conn *cp, struct sk_buff *skb)
  *	/proc/net/ip_vs_app entry function
  */

-static struct ip_vs_app *ip_vs_app_idx(loff_t pos)
+static struct ip_vs_app *ip_vs_app_idx(struct net *net, loff_t pos)
 {
 	struct ip_vs_app *app, *inc;
+	struct netns_ipvs *ipvs = net->ipvs;

-	list_for_each_entry(app, &ip_vs_app_list, a_list) {
+	list_for_each_entry(app, &ipvs->app_list, a_list) {
 		list_for_each_entry(inc, &app->incs_list, a_list) {
 			if (pos-- == 0)
 				return inc;
@@ -497,19 +494,22 @@ static struct ip_vs_app *ip_vs_app_idx(loff_t pos)

 static void *ip_vs_app_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	mutex_lock(&__ip_vs_app_mutex);
+	struct net *net = seq_file_net(seq);
+	mutex_lock(&net->ipvs->app_mutex);

-	return *pos ? ip_vs_app_idx(*pos - 1) : SEQ_START_TOKEN;
+	return *pos ? ip_vs_app_idx(net, *pos - 1) : SEQ_START_TOKEN;
 }

 static void *ip_vs_app_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct ip_vs_app *inc, *app;
 	struct list_head *e;
+	struct net *net = seq_file_net(seq);
+	struct netns_ipvs *ipvs = net->ipvs;

 	++*pos;
 	if (v == SEQ_START_TOKEN)
-		return ip_vs_app_idx(0);
+		return ip_vs_app_idx(net, 0);

 	inc = v;
 	app = inc->app;
@@ -518,7 +518,7 @@ static void *ip_vs_app_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 		return list_entry(e, struct ip_vs_app, a_list);

 	/* go on to next application */
-	for (e = app->a_list.next; e != &ip_vs_app_list; e = e->next) {
+	for (e = app->a_list.next; e != &ipvs->app_list; e = e->next) {
 		app = list_entry(e, struct ip_vs_app, a_list);
 		list_for_each_entry(inc, &app->incs_list, a_list) {
 			return inc;
@@ -529,7 +529,9 @@ static void *ip_vs_app_seq_next(struct seq_file *seq, void *v, loff_t *pos)

 static void ip_vs_app_seq_stop(struct seq_file *seq, void *v)
 {
-	mutex_unlock(&__ip_vs_app_mutex);
+	struct net *net = seq_file_net(seq);
+
+	mutex_unlock(&net->ipvs->app_mutex);
 }

 static int ip_vs_app_seq_show(struct seq_file *seq, void *v)
@@ -557,7 +559,8 @@ static const struct seq_operations ip_vs_app_seq_ops = {

 static int ip_vs_app_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &ip_vs_app_seq_ops);
+
+	return seq_open_net(inode,file, &ip_vs_app_seq_ops, sizeof(struct seq_net_private));
 }

 static const struct file_operations ip_vs_app_fops = {
@@ -565,19 +568,38 @@ static const struct file_operations ip_vs_app_fops = {
 	.open	 = ip_vs_app_open,
 	.read	 = seq_read,
 	.llseek  = seq_lseek,
-	.release = seq_release,
+	.release = seq_release_private,
 };
 #endif

-int __init ip_vs_app_init(void)
+static int __net_init __ip_vs_app_init(struct net *net)
 {
-	/* we will replace it with proc_net_ipvs_create() soon */
-	proc_net_fops_create(&init_net, "ip_vs_app", 0, &ip_vs_app_fops);
+	INIT_LIST_HEAD(&net->ipvs->app_list);
+	__mutex_init(&net->ipvs->app_mutex,"ipvs->app_mutex", &net->ipvs->app_key);
+	proc_net_fops_create(net, "ip_vs_app", 0, &ip_vs_app_fops);
 	return 0;
 }

+static void __net_exit __ip_vs_app_cleanup(struct net *net)
+{
+	proc_net_remove(net, "ip_vs_app");
+}
+
+static struct pernet_operations ip_vs_app_ops = {
+	.init = __ip_vs_app_init,
+	.exit = __ip_vs_app_cleanup,
+};
+
+int __init ip_vs_app_init(void)
+{
+	int rv;
+
+	rv = register_pernet_subsys(&ip_vs_app_ops);
+	return rv;
+}
+

 void ip_vs_app_cleanup(void)
 {
-	proc_net_remove(&init_net, "ip_vs_app");
+	unregister_pernet_subsys(&ip_vs_app_ops);
 }

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>


* Re: [patch 1/2] vhost: potential integer overflows
From: Dan Carpenter @ 2010-10-12 14:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Juan Quintela, David S. Miller, Rusty Russell, kvm,
	virtualization, netdev, kernel-janitors
In-Reply-To: <20101012122548.GA25446@redhat.com>

On Tue, Oct 12, 2010 at 02:25:48PM +0200, Michael S. Tsirkin wrote:
> 
> As far as I can see, maximum value for num is 64K - 1:
> 
>                 if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
>                         r = -EINVAL;
>                         break;
>                 }
> 
> How can any of the above two trigger?
> It seems easier to check value for sanity at a single place where it's
> passed from userspace to kernel.
> 

Gar.  Sorry for that.  My mistake.

regards,
dan carpenter
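
For reference, the quoted check can be exercised in isolation. The following is a minimal user-space sketch, not the vhost code itself; the function names `vring_num_is_valid` and `vring_num_max` are made up for illustration. It applies the same three conditions as the quoted hunk. Because of the power-of-two test, the largest value that passes is 0x8000, so the 0xffff bound is never actually reached.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the quoted check: reject zero,
 * anything above 0xffff, and anything that is not a power of two.
 */
bool vring_num_is_valid(unsigned int num)
{
	if (!num || num > 0xffff || (num & (num - 1)))
		return false;
	return true;
}

/* Largest accepted value, found by scanning downwards from 0xffff. */
unsigned int vring_num_max(void)
{
	unsigned int n;

	for (n = 0xffff; n > 0; n--)
		if (vring_num_is_valid(n))
			return n;
	return 0;
}
```

Any size computed as num times a small per-entry size therefore stays far below the 32-bit limit, which is the point made in the quoted reply.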



* [PATCH] Phonet: 'connect' socket implementation for Pipe controller
From: Kumar A Sanghvi @ 2010-10-12 15:26 UTC (permalink / raw)
  To: remi.denis-courmont, davem, netdev
  Cc: linus.walleij, gulshan.karmani, sudeep.divakaran, Kumar Sanghvi

From: Kumar Sanghvi <kumar.sanghvi@stericsson.com>

Following Rémi Denis-Courmont's suggestion, this patch implements the
'connect' socket call for the Pipe controller logic.

The patch does the following:
- Removes the PNPIPE_CREATE and PNPIPE_DESTROY socket options
- Adds a socket option for setting the pipe handle value
- Implements the connect socket call
- Updates the Pipe controller logic

User space should now follow this sequence with the Pipe controller:
- socket
- bind
- setsockopt for PNPIPE_PIPE_HANDLE
- connect
- setsockopt for PNPIPE_ENCAP_IP
- setsockopt for PNPIPE_ENABLE

GPRS/3G data has been tested and works fine with this patch.
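
The sequence above could look roughly like this from user space. This is only a sketch under stated assumptions: the helper name `pn_pipe_open` is hypothetical, the option numbers for PNPIPE_PIPE_HANDLE and PNPIPE_ENABLE are the ones from this patch and may differ or be absent in other kernels, "setsockopt for PNPIPE_ENCAP_IP" is read as option PNPIPE_ENCAP with value PNPIPE_ENCAP_IP, and a value of 1 for PNPIPE_ENABLE is assumed to mean "enable".

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/phonet.h>

/*
 * Option numbers as defined by this patch.  They are assumptions for
 * any other kernel; check include/linux/phonet.h on the target.
 */
#ifndef PNPIPE_PIPE_HANDLE
#define PNPIPE_PIPE_HANDLE	3
#endif
#ifndef PNPIPE_ENABLE
#define PNPIPE_ENABLE		4
#endif

/*
 * Hypothetical helper: walk through the sequence from the commit
 * message.  Returns an enabled pipe socket, or -1 with errno set by
 * the call that failed.
 */
int pn_pipe_open(const struct sockaddr_pn *local,
		 const struct sockaddr_pn *remote, int pipe_handle)
{
	int encap = PNPIPE_ENCAP_IP;
	int enable = 1;
	int saved;
	int fd;

	fd = socket(AF_PHONET, SOCK_SEQPACKET, 0);
	if (fd < 0)
		return -1;

	/* bind, set handle, connect, set encapsulation, enable */
	if (bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0 ||
	    setsockopt(fd, SOL_PNPIPE, PNPIPE_PIPE_HANDLE,
		       &pipe_handle, sizeof(pipe_handle)) < 0 ||
	    connect(fd, (const struct sockaddr *)remote, sizeof(*remote)) < 0 ||
	    setsockopt(fd, SOL_PNPIPE, PNPIPE_ENCAP,
		       &encap, sizeof(encap)) < 0 ||
	    setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE,
		       &enable, sizeof(enable)) < 0) {
		saved = errno;		/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

On a kernel without Phonet support the first call fails and the helper simply returns -1, so the sketch is safe to try on an ordinary machine.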

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
---
 include/linux/phonet.h   |    3 +-
 include/net/phonet/pep.h |    4 +-
 net/phonet/pep.c         |  300 +++++++++++++++++----------------------------
 net/phonet/socket.c      |  100 +++++++++++++++
 4 files changed, 216 insertions(+), 191 deletions(-)

diff --git a/include/linux/phonet.h b/include/linux/phonet.h
index e27cbf9..26c8df7 100644
--- a/include/linux/phonet.h
+++ b/include/linux/phonet.h
@@ -36,10 +36,9 @@
 /* Socket options for SOL_PNPIPE level */
 #define PNPIPE_ENCAP		1
 #define PNPIPE_IFINDEX		2
-#define PNPIPE_CREATE           3
+#define PNPIPE_PIPE_HANDLE	3
 #define PNPIPE_ENABLE           4
 /* unused slot */
-#define PNPIPE_DESTROY          6
 
 #define PNADDR_ANY		0
 #define PNADDR_BROADCAST	0xFC
diff --git a/include/net/phonet/pep.h b/include/net/phonet/pep.h
index def6cfa..b60b28c 100644
--- a/include/net/phonet/pep.h
+++ b/include/net/phonet/pep.h
@@ -46,8 +46,8 @@ struct pep_sock {
 	u8			init_enable;	/* auto-enable at creation */
 	u8			aligned;
 #ifdef CONFIG_PHONET_PIPECTRLR
-	u16                     remote_pep;
-	u8                      pipe_state;
+	u8			pipe_state;
+	struct sockaddr_pn	remote_pep;
 #endif
 };
 
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index f818f76..9221729 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -88,15 +88,6 @@ static int pep_reply(struct sock *sk, struct sk_buff *oskb,
 	const struct pnpipehdr *oph = pnp_hdr(oskb);
 	struct pnpipehdr *ph;
 	struct sk_buff *skb;
-#ifdef CONFIG_PHONET_PIPECTRLR
-	const struct phonethdr *hdr = pn_hdr(oskb);
-	struct sockaddr_pn spn = {
-		.spn_family = AF_PHONET,
-		.spn_resource = 0xD9,
-		.spn_dev = hdr->pn_sdev,
-		.spn_obj = hdr->pn_sobj,
-	};
-#endif
 
 	skb = alloc_skb(MAX_PNPIPE_HEADER + len, priority);
 	if (!skb)
@@ -114,11 +105,7 @@ static int pep_reply(struct sock *sk, struct sk_buff *oskb,
 	ph->pipe_handle = oph->pipe_handle;
 	ph->error_code = code;
 
-#ifdef CONFIG_PHONET_PIPECTRLR
-	return pn_skb_send(sk, skb, &spn);
-#else
 	return pn_skb_send(sk, skb, &pipe_srv);
-#endif
 }
 
 #define PAD 0x00
@@ -188,18 +175,13 @@ static int pipe_get_flow_info(struct sock *sk, struct sk_buff *skb,
 	return 0;
 }
 
-static int pipe_handler_send_req(struct sock *sk, u16 dobj, u8 utid,
-		u8 msg_id, u8 p_handle, gfp_t priority)
+static int pipe_handler_send_req(struct sock *sk, u8 utid,
+		u8 msg_id, gfp_t priority)
 {
 	int len;
 	struct pnpipehdr *ph;
 	struct sk_buff *skb;
-	struct sockaddr_pn spn = {
-		.spn_family = AF_PHONET,
-		.spn_resource = 0xD9,
-		.spn_dev = pn_dev(dobj),
-		.spn_obj = pn_obj(dobj),
-	};
+	struct pep_sock *pn = pep_sk(sk);
 
 	static const u8 data[4] = {
 		PAD, PAD, PAD, PAD,
@@ -235,30 +217,25 @@ static int pipe_handler_send_req(struct sock *sk, u16 dobj, u8 utid,
 	ph = pnp_hdr(skb);
 	ph->utid = utid;
 	ph->message_id = msg_id;
-	ph->pipe_handle = p_handle;
+	ph->pipe_handle = pn->pipe_handle;
 	ph->error_code = PN_PIPE_NO_ERROR;
 
-	return pn_skb_send(sk, skb, &spn);
+	return pn_skb_send(sk, skb, &pn->remote_pep);
 }
 
-static int pipe_handler_send_created_ind(struct sock *sk, u16 dobj,
-		u8 utid, u8 p_handle, u8 msg_id, u8 tx_fc, u8 rx_fc)
+static int pipe_handler_send_created_ind(struct sock *sk,
+		u8 utid, u8 msg_id)
 {
 	int err_code;
 	struct pnpipehdr *ph;
 	struct sk_buff *skb;
-	struct sockaddr_pn spn = {
-		.spn_family = AF_PHONET,
-		.spn_resource = 0xD9,
-		.spn_dev = pn_dev(dobj),
-		.spn_obj = pn_obj(dobj),
-	};
 
+	struct pep_sock *pn = pep_sk(sk);
 	static u8 data[4] = {
 		0x03, 0x04,
 	};
-	data[2] = tx_fc;
-	data[3] = rx_fc;
+	data[2] = pn->tx_fc;
+	data[3] = pn->rx_fc;
 
 	/*
 	 * actually, below is number of sub-blocks and not error code.
@@ -282,24 +259,18 @@ static int pipe_handler_send_created_ind(struct sock *sk, u16 dobj,
 	ph = pnp_hdr(skb);
 	ph->utid = utid;
 	ph->message_id = msg_id;
-	ph->pipe_handle = p_handle;
+	ph->pipe_handle = pn->pipe_handle;
 	ph->error_code = err_code;
 
-	return pn_skb_send(sk, skb, &spn);
+	return pn_skb_send(sk, skb, &pn->remote_pep);
 }
 
-static int pipe_handler_send_ind(struct sock *sk, u16 dobj, u8 utid,
-		u8 p_handle, u8 msg_id)
+static int pipe_handler_send_ind(struct sock *sk, u8 utid, u8 msg_id)
 {
 	int err_code;
 	struct pnpipehdr *ph;
 	struct sk_buff *skb;
-	struct sockaddr_pn spn = {
-		.spn_family = AF_PHONET,
-		.spn_resource = 0xD9,
-		.spn_dev = pn_dev(dobj),
-		.spn_obj = pn_obj(dobj),
-	};
+	struct pep_sock *pn = pep_sk(sk);
 
 	/*
 	 * actually, below is a filler.
@@ -321,10 +292,10 @@ static int pipe_handler_send_ind(struct sock *sk, u16 dobj, u8 utid,
 	ph = pnp_hdr(skb);
 	ph->utid = utid;
 	ph->message_id = msg_id;
-	ph->pipe_handle = p_handle;
+	ph->pipe_handle = pn->pipe_handle;
 	ph->error_code = err_code;
 
-	return pn_skb_send(sk, skb, &spn);
+	return pn_skb_send(sk, skb, &pn->remote_pep);
 }
 
 static int pipe_handler_enable_pipe(struct sock *sk, int enable)
@@ -339,34 +310,7 @@ static int pipe_handler_enable_pipe(struct sock *sk, int enable)
 		utid = PNS_PIPE_DISABLE_UTID;
 		req = PNS_PEP_DISABLE_REQ;
 	}
-	return pipe_handler_send_req(sk, pn->pn_sk.sobject, utid, req,
-			pn->pipe_handle, GFP_ATOMIC);
-}
-
-static int pipe_handler_create_pipe(struct sock *sk, int pipe_handle, int cmd)
-{
-	int ret;
-	struct pep_sock *pn = pep_sk(sk);
-
-	switch (cmd) {
-	case PNPIPE_CREATE:
-		ret = pipe_handler_send_req(sk, pn->pn_sk.sobject,
-				PNS_PEP_CONNECT_UTID, PNS_PEP_CONNECT_REQ,
-				pipe_handle, GFP_ATOMIC);
-		break;
-
-	case PNPIPE_DESTROY:
-		ret = pipe_handler_send_req(sk, pn->remote_pep,
-				PNS_PEP_DISCONNECT_UTID,
-				PNS_PEP_DISCONNECT_REQ,
-				pn->pipe_handle, GFP_ATOMIC);
-		break;
-
-	default:
-		ret = -EINVAL;
-	}
-
-	return ret;
+	return pipe_handler_send_req(sk, utid, req, GFP_ATOMIC);
 }
 #endif
 
@@ -434,14 +378,6 @@ static int pipe_snd_status(struct sock *sk, u8 type, u8 status, gfp_t priority)
 	struct pep_sock *pn = pep_sk(sk);
 	struct pnpipehdr *ph;
 	struct sk_buff *skb;
-#ifdef CONFIG_PHONET_PIPECTRLR
-	struct sockaddr_pn spn = {
-		.spn_family = AF_PHONET,
-		.spn_resource = 0xD9,
-		.spn_dev = pn_dev(pn->remote_pep),
-		.spn_obj = pn_obj(pn->remote_pep),
-	};
-#endif
 
 	skb = alloc_skb(MAX_PNPIPE_HEADER + 4, priority);
 	if (!skb)
@@ -462,7 +398,7 @@ static int pipe_snd_status(struct sock *sk, u8 type, u8 status, gfp_t priority)
 	ph->data[4] = status;
 
 #ifdef CONFIG_PHONET_PIPECTRLR
-	return pn_skb_send(sk, skb, &spn);
+	return pn_skb_send(sk, skb, &pn->remote_pep);
 #else
 	return pn_skb_send(sk, skb, &pipe_srv);
 #endif
@@ -582,12 +518,6 @@ static int pipe_do_rcv(struct sock *sk, struct sk_buff *skb)
 	struct pnpipehdr *hdr = pnp_hdr(skb);
 	struct sk_buff_head *queue;
 	int err = 0;
-#ifdef CONFIG_PHONET_PIPECTRLR
-	struct phonethdr *ph = pn_hdr(skb);
-	static u8 host_pref_rx_fc[3], host_req_tx_fc[3];
-	u8 remote_pref_rx_fc[3], remote_req_tx_fc[3];
-	u8 negotiated_rx_fc, negotiated_tx_fc;
-#endif
 
 	BUG_ON(sk->sk_state == TCP_CLOSE_WAIT);
 
@@ -596,40 +526,6 @@ static int pipe_do_rcv(struct sock *sk, struct sk_buff *skb)
 		pep_reject_conn(sk, skb, PN_PIPE_ERR_PEP_IN_USE);
 		break;
 
-#ifdef CONFIG_PHONET_PIPECTRLR
-	case PNS_PEP_CONNECT_RESP:
-		if ((ph->pn_sdev == pn_dev(pn->remote_pep)) &&
-				(ph->pn_sobj == pn_obj(pn->remote_pep))) {
-			pipe_get_flow_info(sk, skb, remote_pref_rx_fc,
-					remote_req_tx_fc);
-
-			 negotiated_tx_fc = pipe_negotiate_fc(remote_req_tx_fc,
-					 host_pref_rx_fc,
-					 sizeof(host_pref_rx_fc));
-			 negotiated_rx_fc = pipe_negotiate_fc(host_req_tx_fc,
-					 remote_pref_rx_fc,
-					 sizeof(host_pref_rx_fc));
-
-			pn->pipe_state = PIPE_DISABLED;
-			pipe_handler_send_created_ind(sk, pn->remote_pep,
-					PNS_PIPE_CREATED_IND_UTID,
-					pn->pipe_handle, PNS_PIPE_CREATED_IND,
-					negotiated_tx_fc, negotiated_rx_fc);
-			pipe_handler_send_created_ind(sk, pn->pn_sk.sobject,
-					PNS_PIPE_CREATED_IND_UTID,
-					pn->pipe_handle, PNS_PIPE_CREATED_IND,
-					negotiated_tx_fc, negotiated_rx_fc);
-		} else {
-			pipe_handler_send_req(sk, pn->remote_pep,
-					PNS_PEP_CONNECT_UTID,
-					PNS_PEP_CONNECT_REQ, pn->pipe_handle,
-					GFP_ATOMIC);
-			pipe_get_flow_info(sk, skb, host_pref_rx_fc,
-					host_req_tx_fc);
-		}
-		break;
-#endif
-
 	case PNS_PEP_DISCONNECT_REQ:
 		pep_reply(sk, skb, PN_PIPE_NO_ERROR, NULL, 0, GFP_ATOMIC);
 		sk->sk_state = TCP_CLOSE_WAIT;
@@ -640,10 +536,7 @@ static int pipe_do_rcv(struct sock *sk, struct sk_buff *skb)
 #ifdef CONFIG_PHONET_PIPECTRLR
 	case PNS_PEP_DISCONNECT_RESP:
 		pn->pipe_state = PIPE_IDLE;
-		pipe_handler_send_req(sk, pn->pn_sk.sobject,
-				PNS_PEP_DISCONNECT_UTID,
-				PNS_PEP_DISCONNECT_REQ, pn->pipe_handle,
-				GFP_KERNEL);
+		sk->sk_state = TCP_CLOSE;
 		break;
 #endif
 
@@ -654,21 +547,18 @@ static int pipe_do_rcv(struct sock *sk, struct sk_buff *skb)
 
 #ifdef CONFIG_PHONET_PIPECTRLR
 	case PNS_PEP_ENABLE_RESP:
-		if ((ph->pn_sdev == pn_dev(pn->remote_pep)) &&
-				(ph->pn_sobj == pn_obj(pn->remote_pep))) {
-			pn->pipe_state = PIPE_ENABLED;
-			pipe_handler_send_ind(sk, pn->remote_pep,
-					PNS_PIPE_ENABLED_IND_UTID,
-					pn->pipe_handle, PNS_PIPE_ENABLED_IND);
-			pipe_handler_send_ind(sk, pn->pn_sk.sobject,
-					PNS_PIPE_ENABLED_IND_UTID,
-					pn->pipe_handle, PNS_PIPE_ENABLED_IND);
-		} else
-			pipe_handler_send_req(sk, pn->remote_pep,
-					PNS_PIPE_ENABLE_UTID,
-					PNS_PEP_ENABLE_REQ, pn->pipe_handle,
-					GFP_KERNEL);
+		pn->pipe_state = PIPE_ENABLED;
+		pipe_handler_send_ind(sk, PNS_PIPE_ENABLED_IND_UTID,
+				PNS_PIPE_ENABLED_IND);
 
+		if (!pn_flow_safe(pn->tx_fc)) {
+			atomic_set(&pn->tx_credits, 1);
+			sk->sk_write_space(sk);
+		}
+		if (sk->sk_state == TCP_ESTABLISHED)
+			break; /* Nothing to do */
+		sk->sk_state = TCP_ESTABLISHED;
+		pipe_grant_credits(sk);
 		break;
 #endif
 
@@ -692,22 +582,12 @@ static int pipe_do_rcv(struct sock *sk, struct sk_buff *skb)
 
 #ifdef CONFIG_PHONET_PIPECTRLR
 	case PNS_PEP_DISABLE_RESP:
-		if ((ph->pn_sdev == pn_dev(pn->remote_pep)) &&
-				(ph->pn_sobj == pn_obj(pn->remote_pep))) {
-			pn->pipe_state = PIPE_DISABLED;
-			pipe_handler_send_ind(sk, pn->remote_pep,
-					PNS_PIPE_DISABLED_IND_UTID,
-					pn->pipe_handle,
-					PNS_PIPE_DISABLED_IND);
-			pipe_handler_send_ind(sk, pn->pn_sk.sobject,
-					PNS_PIPE_DISABLED_IND_UTID,
-					pn->pipe_handle,
-					PNS_PIPE_DISABLED_IND);
-		} else
-			pipe_handler_send_req(sk, pn->remote_pep,
-					PNS_PIPE_DISABLE_UTID,
-					PNS_PEP_DISABLE_REQ, pn->pipe_handle,
-					GFP_KERNEL);
+		pn->pipe_state = PIPE_DISABLED;
+		atomic_set(&pn->tx_credits, 0);
+		pipe_handler_send_ind(sk, PNS_PIPE_DISABLED_IND_UTID,
+				PNS_PIPE_DISABLED_IND);
+		sk->sk_state = TCP_SYN_RECV;
+		pn->rx_credits = 0;
 		break;
 #endif
 
@@ -802,6 +682,42 @@ static void pipe_destruct(struct sock *sk)
 	skb_queue_purge(&pn->ctrlreq_queue);
 }
 
+#ifdef CONFIG_PHONET_PIPECTRLR
+static int pep_connresp_rcv(struct sock *sk, struct sk_buff *skb)
+{
+	struct pep_sock *pn = pep_sk(sk);
+	static u8 host_pref_rx_fc[3] = {3, 2, 1}, host_req_tx_fc[3] = {3, 2, 1};
+	u8 remote_pref_rx_fc[3], remote_req_tx_fc[3];
+	u8 negotiated_rx_fc, negotiated_tx_fc;
+	int ret;
+
+	pipe_get_flow_info(sk, skb, remote_pref_rx_fc,
+			remote_req_tx_fc);
+	negotiated_tx_fc = pipe_negotiate_fc(remote_req_tx_fc,
+			host_pref_rx_fc,
+			sizeof(host_pref_rx_fc));
+	negotiated_rx_fc = pipe_negotiate_fc(host_req_tx_fc,
+			remote_pref_rx_fc,
+			sizeof(host_pref_rx_fc));
+
+	pn->pipe_state = PIPE_DISABLED;
+	sk->sk_state = TCP_SYN_RECV;
+	sk->sk_backlog_rcv = pipe_do_rcv;
+	sk->sk_destruct = pipe_destruct;
+	pn->rx_credits = 0;
+	pn->rx_fc = negotiated_rx_fc;
+	pn->tx_fc = negotiated_tx_fc;
+	sk->sk_state_change(sk);
+
+	ret = pipe_handler_send_created_ind(sk,
+			PNS_PIPE_CREATED_IND_UTID,
+			PNS_PIPE_CREATED_IND
+			);
+
+	return ret;
+}
+#endif
+
 static int pep_connreq_rcv(struct sock *sk, struct sk_buff *skb)
 {
 	struct sock *newsk;
@@ -884,9 +800,6 @@ static int pep_connreq_rcv(struct sock *sk, struct sk_buff *skb)
 	newpn->rx_fc = newpn->tx_fc = PN_LEGACY_FLOW_CONTROL;
 	newpn->init_enable = enabled;
 	newpn->aligned = aligned;
-#ifdef CONFIG_PHONET_PIPECTRLR
-	newpn->remote_pep = pn->remote_pep;
-#endif
 
 	BUG_ON(!skb_queue_empty(&newsk->sk_receive_queue));
 	skb_queue_head(&newsk->sk_receive_queue, skb);
@@ -968,6 +881,12 @@ static int pep_do_rcv(struct sock *sk, struct sk_buff *skb)
 		err = pep_connreq_rcv(sk, skb);
 		break;
 
+#ifdef CONFIG_PHONET_PIPECTRLR
+	case PNS_PEP_CONNECT_RESP:
+		err = pep_connresp_rcv(sk, skb);
+		break;
+#endif
+
 	case PNS_PEP_DISCONNECT_REQ:
 		pep_reply(sk, skb, PN_PIPE_NO_ERROR, NULL, 0, GFP_ATOMIC);
 		break;
@@ -1032,6 +951,18 @@ static void pep_sock_close(struct sock *sk, long timeout)
 		/* Forcefully remove dangling Phonet pipe */
 		pipe_do_remove(sk);
 
+#ifdef CONFIG_PHONET_PIPECTRLR
+	if (pn->pipe_state != PIPE_IDLE) {
+		/* send pep disconnect request */
+		pipe_handler_send_req(sk,
+				PNS_PEP_DISCONNECT_UTID, PNS_PEP_DISCONNECT_REQ,
+				GFP_KERNEL);
+
+		pn->pipe_state = PIPE_IDLE;
+		sk->sk_state = TCP_CLOSE;
+	}
+#endif
+
 	ifindex = pn->ifindex;
 	pn->ifindex = 0;
 	release_sock(sk);
@@ -1108,6 +1039,20 @@ out:
 	return newsk;
 }
 
+#ifdef CONFIG_PHONET_PIPECTRLR
+static int pep_sock_connect(struct sock *sk, struct sockaddr *addr, int len)
+{
+	struct pep_sock *pn = pep_sk(sk);
+	struct sockaddr_pn *spn =  (struct sockaddr_pn *)addr;
+
+	memcpy(&pn->remote_pep, spn, sizeof(struct sockaddr_pn));
+
+	return pipe_handler_send_req(sk,
+			PNS_PEP_CONNECT_UTID, PNS_PEP_CONNECT_REQ,
+			GFP_ATOMIC);
+}
+#endif
+
 static int pep_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
 	struct pep_sock *pn = pep_sk(sk);
@@ -1149,10 +1094,6 @@ static int pep_setsockopt(struct sock *sk, int level, int optname,
 {
 	struct pep_sock *pn = pep_sk(sk);
 	int val = 0, err = 0;
-#ifdef CONFIG_PHONET_PIPECTRLR
-	int remote_pep;
-	int pipe_handle;
-#endif
 
 	if (level != SOL_PNPIPE)
 		return -ENOPROTOOPT;
@@ -1164,28 +1105,15 @@ static int pep_setsockopt(struct sock *sk, int level, int optname,
 	lock_sock(sk);
 	switch (optname) {
 #ifdef CONFIG_PHONET_PIPECTRLR
-	case PNPIPE_CREATE:
+	case PNPIPE_PIPE_HANDLE:
 		if (val) {
 			if (pn->pipe_state > PIPE_IDLE) {
 				err = -EFAULT;
 				break;
 			}
-			remote_pep = val & 0xFFFF;
-			pipe_handle =  (val >> 16) & 0xFF;
-			pn->remote_pep = remote_pep;
-			err = pipe_handler_create_pipe(sk, pipe_handle,
-					PNPIPE_CREATE);
-			break;
-		}
-
-	case PNPIPE_DESTROY:
-		if (pn->pipe_state < PIPE_DISABLED) {
-			err = -EFAULT;
+			pn->pipe_handle = val;
 			break;
 		}
-
-		err = pipe_handler_create_pipe(sk, 0x0, PNPIPE_DESTROY);
-		break;
 #endif
 
 	case PNPIPE_ENCAP:
@@ -1278,14 +1206,6 @@ static int pipe_skb_send(struct sock *sk, struct sk_buff *skb)
 	struct pep_sock *pn = pep_sk(sk);
 	struct pnpipehdr *ph;
 	int err;
-#ifdef CONFIG_PHONET_PIPECTRLR
-	struct sockaddr_pn spn = {
-		.spn_family = AF_PHONET,
-		.spn_resource = 0xD9,
-		.spn_dev = pn_dev(pn->remote_pep),
-		.spn_obj = pn_obj(pn->remote_pep),
-	};
-#endif
 
 	if (pn_flow_safe(pn->tx_fc) &&
 	    !atomic_add_unless(&pn->tx_credits, -1, 0)) {
@@ -1304,7 +1224,7 @@ static int pipe_skb_send(struct sock *sk, struct sk_buff *skb)
 		ph->message_id = PNS_PIPE_DATA;
 	ph->pipe_handle = pn->pipe_handle;
 #ifdef CONFIG_PHONET_PIPECTRLR
-	err = pn_skb_send(sk, skb, &spn);
+	err = pn_skb_send(sk, skb, &pn->remote_pep);
 #else
 	err = pn_skb_send(sk, skb, &pipe_srv);
 #endif
@@ -1504,6 +1424,8 @@ static void pep_sock_unhash(struct sock *sk)
 	struct sock *skparent = NULL;
 
 	lock_sock(sk);
+
+#ifndef CONFIG_PHONET_PIPECTRLR
 	if ((1 << sk->sk_state) & ~(TCPF_CLOSE|TCPF_LISTEN)) {
 		skparent = pn->listener;
 		release_sock(sk);
@@ -1513,6 +1435,7 @@ static void pep_sock_unhash(struct sock *sk)
 		sk_del_node_init(sk);
 		sk = skparent;
 	}
+#endif
 	/* Unhash a listening sock only when it is closed
 	 * and all of its active connected pipes are closed. */
 	if (hlist_empty(&pn->hlist))
@@ -1526,6 +1449,9 @@ static void pep_sock_unhash(struct sock *sk)
 static struct proto pep_proto = {
 	.close		= pep_sock_close,
 	.accept		= pep_sock_accept,
+#ifdef CONFIG_PHONET_PIPECTRLR
+	.connect	= pep_sock_connect,
+#endif
 	.ioctl		= pep_ioctl,
 	.init		= pep_init,
 	.setsockopt	= pep_setsockopt,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index aca8fba..123a374 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -225,6 +225,102 @@ static int pn_socket_autobind(struct socket *sock)
 	return 0; /* socket was already bound */
 }
 
+static int pn_socket_connect(struct socket *sock, struct sockaddr *addr,
+		int len, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct sockaddr_pn *spn = (struct sockaddr_pn *)addr;
+	long timeo;
+	int err;
+
+	lock_sock(sk);
+
+	if (len < sizeof(struct sockaddr_pn))
+		return -EINVAL;
+	if (spn->spn_family != AF_PHONET)
+		return -EAFNOSUPPORT;
+
+	switch (sock->state) {
+	case SS_UNCONNECTED:
+		sk->sk_state = TCP_CLOSE;
+		break;
+	case SS_CONNECTING:
+		switch (sk->sk_state) {
+		case TCP_SYN_RECV:
+			sock->state = SS_CONNECTED;
+			err = -EISCONN;
+			goto out;
+		case TCP_CLOSE:
+			err = -EALREADY;
+			if (flags & O_NONBLOCK)
+				goto out;
+			goto wait_connect;
+			break;
+		}
+		break;
+	case SS_CONNECTED:
+		switch (sk->sk_state) {
+		case TCP_SYN_RECV:
+			err = -EISCONN;
+			goto out;
+		case TCP_CLOSE:
+			sock->state = SS_UNCONNECTED;
+			break;
+		}
+		break;
+	case SS_DISCONNECTING:
+	case SS_FREE:
+		break;
+	}
+	sk->sk_state = TCP_CLOSE;
+	sock->state = SS_UNCONNECTED;
+	sk_stream_kill_queues(sk);
+
+
+	sock->state = SS_CONNECTING;
+	err = sk->sk_prot->connect(sk, addr, len);
+	if (err < 0) {
+		sock->state = SS_UNCONNECTED;
+		sk->sk_state = TCP_CLOSE;
+		goto out;
+	}
+
+	err = -EINPROGRESS;
+wait_connect:
+	if (sk->sk_state != TCP_SYN_RECV && (flags & O_NONBLOCK))
+		goto out;
+
+	timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
+	release_sock(sk);
+
+	err = -ERESTARTSYS;
+	timeo = wait_event_interruptible_timeout(*sk_sleep(sk),
+			sk->sk_state != TCP_CLOSE,
+			timeo);
+
+	lock_sock(sk);
+	if (timeo < 0)
+		goto out; /* -ERESTARTSYS */
+
+	err = -ETIMEDOUT;
+	if (timeo == 0 && sk->sk_state != TCP_SYN_RECV)
+		goto out;
+
+	if (sk->sk_state != TCP_SYN_RECV) {
+		sock->state = SS_UNCONNECTED;
+		err = sock_error(sk);
+		if (!err)
+			err = -ECONNREFUSED;
+		goto out;
+	}
+	sock->state = SS_CONNECTED;
+	err = 0;
+
+out:
+	release_sock(sk);
+	return err;
+}
+
 static int pn_socket_accept(struct socket *sock, struct socket *newsock,
 				int flags)
 {
@@ -393,7 +489,11 @@ const struct proto_ops phonet_stream_ops = {
 	.owner		= THIS_MODULE,
 	.release	= pn_socket_release,
 	.bind		= pn_socket_bind,
+#ifdef CONFIG_PHONET_PIPECTRLR
+	.connect	= pn_socket_connect,
+#else
 	.connect	= sock_no_connect,
+#endif
 	.socketpair	= sock_no_socketpair,
 	.accept		= pn_socket_accept,
 	.getname	= pn_socket_getname,
-- 
1.7.2.dirty



* Re: [RFC PATCH 4/9] ipvs network name space aware
From: Daniel Lezcano @ 2010-10-12 16:02 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: lvs-devel, netdev, netfilter-devel, horms, ja, wensong
In-Reply-To: <201010081317.01120.hans.schillstrom@ericsson.com>

On 10/08/2010 01:16 PM, Hans Schillstrom wrote:
> This patch just contains ip_vs_core.c
>
> Signed-off-by:Hans Schillstrom<hans.schillstrom@ericsson.com>
>
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 0c043b6..4fdc5cb 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -52,7 +52,6 @@
>
>   #include<net/ip_vs.h>
>
> -
>   EXPORT_SYMBOL(register_ip_vs_scheduler);
>   EXPORT_SYMBOL(unregister_ip_vs_scheduler);
>   EXPORT_SYMBOL(ip_vs_proto_name);
> @@ -67,6 +66,8 @@ EXPORT_SYMBOL(ip_vs_conn_put);
>   EXPORT_SYMBOL(ip_vs_get_debug_level);
>   #endif
>
> +/* netns cnt used for uniqueness */
> +static atomic_t ipvs_netns_cnt = ATOMIC_INIT(0);
>    

Why is this counter needed?

[ cut ]

> + *	Initialize IP Virtual Server netns mem.
> + */
> +static int __net_init __ip_vs_init(struct net *net)
> +{
> +	struct netns_ipvs *ipvs = 0;
>
> +	ipvs = kzalloc(sizeof(struct netns_ipvs), GFP_ATOMIC);
> +	if( ipvs == NULL ) {
> +		pr_err("%s(): no memory.\n", __func__);
> +		return -ENOMEM;
> +	}
> +	ipvs->inc = atomic_read(&ipvs_netns_cnt);
>    

AFAICS, this counter is never used. Is it really needed?

> +	atomic_inc(&ipvs_netns_cnt);
> +	IP_VS_DBG(10, "Creating new netns *net=%p *ipvs=%p size=%lu\n",
> +		     net, ipvs, sizeof(struct netns_ipvs));
> +	net->ipvs = ipvs;
> +
> +	return 0;
> +}
>    



* [BUG net-next] bnx2x: all traffic comes to RX queue 0
From: Eric Dumazet @ 2010-10-12 16:07 UTC (permalink / raw)
  To: David Miller, Dmitry Kravkov, Vladislav Zolotarov, Yaniv Rosner
  Cc: netdev, Michael Chan, Eilon Greenstein
In-Reply-To: <1286838210.30423.128.camel@edumazet-laptop>

Hmm, while running tests for the netdev_alloc_skb() problem, I found that
the current net-next tree is not really multiqueue enabled: all RX traffic
lands on queue 0.

ethtool -S eth1|grep _ucast
     [0]: rx_ucast_packets: 3507786
     [0]: tx_ucast_packets: 416925
     [1]: rx_ucast_packets: 0
     [1]: tx_ucast_packets: 4
     [2]: rx_ucast_packets: 0
     [2]: tx_ucast_packets: 397467
     [3]: rx_ucast_packets: 0
     [3]: tx_ucast_packets: 75832
     [4]: rx_ucast_packets: 0
     [4]: tx_ucast_packets: 171025
     [5]: rx_ucast_packets: 0
     [5]: tx_ucast_packets: 233025
     [6]: rx_ucast_packets: 0
     [6]: tx_ucast_packets: 250358
     [7]: rx_ucast_packets: 0
     [7]: tx_ucast_packets: 240792
     [8]: rx_ucast_packets: 0
     [8]: tx_ucast_packets: 216366
     [9]: rx_ucast_packets: 0
     [9]: tx_ucast_packets: 1
     [10]: rx_ucast_packets: 0
     [10]: tx_ucast_packets: 350324
     [11]: rx_ucast_packets: 0
     [11]: tx_ucast_packets: 92403
     [12]: rx_ucast_packets: 0
     [12]: tx_ucast_packets: 307678
     [13]: rx_ucast_packets: 0
     [13]: tx_ucast_packets: 314315
     [14]: rx_ucast_packets: 0
     [14]: tx_ucast_packets: 256767
     [15]: rx_ucast_packets: 0
     [15]: tx_ucast_packets: 185105
     rx_ucast_packets: 3507786
     tx_ucast_packets: 3508387

# ethtool -i eth1
driver: bnx2x
version: 1.60.00-1
firmware-version: bc 4.8.0 phy baa0.105
bus-info: 0000:02:00.1

02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E
10Gigabit PCIe
	Subsystem: Hewlett-Packard Company NC532i Dual Port 10GbE Multifunction
BL-C Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 47
	Memory at fa000000 (64-bit, non-prefetchable) [size=8M]
	Memory at f9800000 (64-bit, non-prefetchable) [size=8M]
	[virtual] Expansion ROM at e7010000 [disabled] [size=64K]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
	Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
	Capabilities: [ac] Express Endpoint, MSI 00
	Capabilities: [100] Device Serial Number f4-ce-46-ff-fe-bb-32-d4
	Capabilities: [110] Advanced Error Reporting
	Capabilities: [150] Power Budgeting <?>
	Capabilities: [160] Virtual Channel <?>
	Kernel driver in use: bnx2x
	Kernel modules: bnx2x


Any ideas before I start a bisection?

Thanks !




* RE: [BUG net-next] bnx2x: all traffic comes to RX queue 0
From: Dmitry Kravkov @ 2010-10-12 16:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Michael Chan, Eilon Greenstein, David Miller,
	Vladislav Zolotarov, Yaniv Rosner
In-Reply-To: <1286899657.2732.93.camel@edumazet-laptop>

Eric,

I will take a look

Thanks

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
Sent: Tuesday, October 12, 2010 6:08 PM
To: David Miller; Dmitry Kravkov; Vladislav Zolotarov; Yaniv Rosner
Cc: netdev; Michael Chan; Eilon Greenstein
Subject: [BUG net-next] bnx2x: all traffic comes to RX queue 0

Hmm, while running tests for the netdev_alloc_skb() problem, I found that
the current net-next tree is not really multiqueue enabled: all RX traffic
lands on queue 0.

ethtool -S eth1|grep _ucast
     [0]: rx_ucast_packets: 3507786
     [0]: tx_ucast_packets: 416925
     [1]: rx_ucast_packets: 0
     [1]: tx_ucast_packets: 4
     [2]: rx_ucast_packets: 0
     [2]: tx_ucast_packets: 397467
     [3]: rx_ucast_packets: 0
     [3]: tx_ucast_packets: 75832
     [4]: rx_ucast_packets: 0
     [4]: tx_ucast_packets: 171025
     [5]: rx_ucast_packets: 0
     [5]: tx_ucast_packets: 233025
     [6]: rx_ucast_packets: 0
     [6]: tx_ucast_packets: 250358
     [7]: rx_ucast_packets: 0
     [7]: tx_ucast_packets: 240792
     [8]: rx_ucast_packets: 0
     [8]: tx_ucast_packets: 216366
     [9]: rx_ucast_packets: 0
     [9]: tx_ucast_packets: 1
     [10]: rx_ucast_packets: 0
     [10]: tx_ucast_packets: 350324
     [11]: rx_ucast_packets: 0
     [11]: tx_ucast_packets: 92403
     [12]: rx_ucast_packets: 0
     [12]: tx_ucast_packets: 307678
     [13]: rx_ucast_packets: 0
     [13]: tx_ucast_packets: 314315
     [14]: rx_ucast_packets: 0
     [14]: tx_ucast_packets: 256767
     [15]: rx_ucast_packets: 0
     [15]: tx_ucast_packets: 185105
     rx_ucast_packets: 3507786
     tx_ucast_packets: 3508387

# ethtool -i eth1
driver: bnx2x
version: 1.60.00-1
firmware-version: bc 4.8.0 phy baa0.105
bus-info: 0000:02:00.1

02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E
10Gigabit PCIe
	Subsystem: Hewlett-Packard Company NC532i Dual Port 10GbE Multifunction
BL-C Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 47
	Memory at fa000000 (64-bit, non-prefetchable) [size=8M]
	Memory at f9800000 (64-bit, non-prefetchable) [size=8M]
	[virtual] Expansion ROM at e7010000 [disabled] [size=64K]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
	Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
	Capabilities: [ac] Express Endpoint, MSI 00
	Capabilities: [100] Device Serial Number f4-ce-46-ff-fe-bb-32-d4
	Capabilities: [110] Advanced Error Reporting
	Capabilities: [150] Power Budgeting <?>
	Capabilities: [160] Virtual Channel <?>
	Kernel driver in use: bnx2x
	Kernel modules: bnx2x


Any ideas before I start a bisection?

Thanks !





* Re: [PATCH] Phonet: 'connect' socket implementation for Pipe controller
From: Rémi Denis-Courmont @ 2010-10-12 16:30 UTC (permalink / raw)
  To: Kumar A Sanghvi
  Cc: remi.denis-courmont, davem, netdev, linus.walleij,
	gulshan.karmani, sudeep.divakaran
In-Reply-To: <1286897211-3198-1-git-send-email-kumar.sanghvi@stericsson.com>

	Hello,

Just a few comments...

On Tuesday 12 October 2010 18:26:51 Kumar A Sanghvi, you wrote:
> diff --git a/include/net/phonet/pep.h b/include/net/phonet/pep.h
> index def6cfa..b60b28c 100644
> --- a/include/net/phonet/pep.h
> +++ b/include/net/phonet/pep.h
> @@ -802,6 +682,42 @@ static void pipe_destruct(struct sock *sk)
>  	skb_queue_purge(&pn->ctrlreq_queue);
>  }
> 
> +#ifdef CONFIG_PHONET_PIPECTRLR
> +static int pep_connresp_rcv(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct pep_sock *pn = pep_sk(sk);
> +	static u8 host_pref_rx_fc[3] = {3, 2, 1}, host_req_tx_fc[3] = {3, 2, 1};

Why is this 'static'? Doesn't that break concurrent uses?

> +	u8 remote_pref_rx_fc[3], remote_req_tx_fc[3];
> +	u8 negotiated_rx_fc, negotiated_tx_fc;
> +	int ret;
> +
> +	pipe_get_flow_info(sk, skb, remote_pref_rx_fc,
> +			remote_req_tx_fc);
> +	negotiated_tx_fc = pipe_negotiate_fc(remote_req_tx_fc,
> +			host_pref_rx_fc,
> +			sizeof(host_pref_rx_fc));
> +	negotiated_rx_fc = pipe_negotiate_fc(host_req_tx_fc,
> +			remote_pref_rx_fc,
> +			sizeof(host_pref_rx_fc));
> +
> +	pn->pipe_state = PIPE_DISABLED;
> +	sk->sk_state = TCP_SYN_RECV;
> +	sk->sk_backlog_rcv = pipe_do_rcv;
> +	sk->sk_destruct = pipe_destruct;
> +	pn->rx_credits = 0;
> +	pn->rx_fc = negotiated_rx_fc;
> +	pn->tx_fc = negotiated_tx_fc;
> +	sk->sk_state_change(sk);
> +
> +	ret = pipe_handler_send_created_ind(sk,
> +			PNS_PIPE_CREATED_IND_UTID,
> +			PNS_PIPE_CREATED_IND
> +			);
> +
> +	return ret;
> +}
> +#endif
> +
>  static int pep_connreq_rcv(struct sock *sk, struct sk_buff *skb)
>  {
>  	struct sock *newsk;

> diff --git a/net/phonet/socket.c b/net/phonet/socket.c
> index aca8fba..123a374 100644
> --- a/net/phonet/socket.c
> +++ b/net/phonet/socket.c
> @@ -225,6 +225,102 @@ static int pn_socket_autobind(struct socket *sock)
>  	return 0; /* socket was already bound */
>  }
> 
> +static int pn_socket_connect(struct socket *sock, struct sockaddr *addr,
> +		int len, int flags)
> +{
> +	struct sock *sk = sock->sk;
> +	struct sockaddr_pn *spn = (struct sockaddr_pn *)addr;
> +	long timeo;
> +	int err;
> +
> +	lock_sock(sk);
> +
> +	if (len < sizeof(struct sockaddr_pn))
> +		return -EINVAL;
> +	if (spn->spn_family != AF_PHONET)
> +		return -EAFNOSUPPORT;

You should move lock_sock(sk); here, I think.
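To illustrate the comment above: in the quoted hunk, the two early returns happen after lock_sock(sk), so the lock would be left held on those paths. The following is a minimal userspace sketch, not the Phonet code. struct fake_sock, struct fake_addr, fake_lock(), fake_unlock(), fake_connect() and FAKE_AF_PHONET are all hypothetical stand-ins, and a plain flag replaces the real socket lock.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; the flag models the socket lock. */
struct fake_sock {
	int locked;
	int connecting;
};

/* Hypothetical stand-in for struct sockaddr_pn. */
struct fake_addr {
	unsigned short family;
};

#define FAKE_AF_PHONET 35 /* illustrative value for this sketch */

static void fake_lock(struct fake_sock *sk)   { sk->locked = 1; }
static void fake_unlock(struct fake_sock *sk) { sk->locked = 0; }

static int fake_connect(struct fake_sock *sk, const struct fake_addr *addr,
			size_t len)
{
	/* Validate the arguments first: no lock is held yet, so a plain
	 * return cannot leave the socket locked. */
	if (len < sizeof(*addr))
		return -EINVAL;
	if (addr->family != FAKE_AF_PHONET)
		return -EAFNOSUPPORT;

	/* Only take the lock once the cheap checks have passed. */
	fake_lock(sk);
	sk->connecting = 1;
	fake_unlock(sk);
	return 0;
}
```

The other option would be to keep the lock where it is and turn both early returns into "err = ...; goto out;", at the cost of taking the lock for requests that are rejected anyway.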

> +
> +	switch (sock->state) {
> +	case SS_UNCONNECTED:
> +		sk->sk_state = TCP_CLOSE;
> +		break;
> +	case SS_CONNECTING:
> +		switch (sk->sk_state) {
> +		case TCP_SYN_RECV:
> +			sock->state = SS_CONNECTED;
> +			err = -EISCONN;
> +			goto out;
> +		case TCP_CLOSE:
> +			err = -EALREADY;
> +			if (flags & O_NONBLOCK)
> +				goto out;
> +			goto wait_connect;
> +			break;

I think the kernel policy is against redundant break statements.

> +		}
> +		break;
> +	case SS_CONNECTED:
> +		switch (sk->sk_state) {
> +		case TCP_SYN_RECV:
> +			err = -EISCONN;
> +			goto out;
> +		case TCP_CLOSE:
> +			sock->state = SS_UNCONNECTED;
> +			break;
> +		}
> +		break;
> +	case SS_DISCONNECTING:
> +	case SS_FREE:
> +		break;
> +	}
> +	sk->sk_state = TCP_CLOSE;
> +	sock->state = SS_UNCONNECTED;

This is dead code...

> +	sk_stream_kill_queues(sk);
> +
> +
> +	sock->state = SS_CONNECTING;

...because of this ^ .

> +	err = sk->sk_prot->connect(sk, addr, len);
> +	if (err < 0) {
> +		sock->state = SS_UNCONNECTED;
> +		sk->sk_state = TCP_CLOSE;
> +		goto out;
> +	}
> +
> +	err = -EINPROGRESS;
> +wait_connect:
> +	if (sk->sk_state != TCP_SYN_RECV && (flags & O_NONBLOCK))
> +		goto out;
> +
> +	timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
> +	release_sock(sk);
> +
> +	err = -ERESTARTSYS;
> +	timeo = wait_event_interruptible_timeout(*sk_sleep(sk),
> +			sk->sk_state != TCP_CLOSE,
> +			timeo);
> +
> +	lock_sock(sk);
> +	if (timeo < 0)
> +		goto out; /* -ERESTARTSYS */
> +
> +	err = -ETIMEDOUT;
> +	if (timeo == 0 && sk->sk_state != TCP_SYN_RECV)
> +		goto out;
> +
> +	if (sk->sk_state != TCP_SYN_RECV) {
> +		sock->state = SS_UNCONNECTED;
> +		err = sock_error(sk);
> +		if (!err)
> +			err = -ECONNREFUSED;
> +		goto out;
> +	}
> +	sock->state = SS_CONNECTED;
> +	err = 0;
> +
> +out:
> +	release_sock(sk);
> +	return err;
> +}
> +
>  static int pn_socket_accept(struct socket *sock, struct socket *newsock,
>  				int flags)
>  {


-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis

^ permalink raw reply

* Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2010-10-12 17:09 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: anthony, arnd, avi, davem, kvm, netdev, rusty
In-Reply-To: <OF6C412DBC.EA03FC24-ON652577B9.00228DCA-652577B9.0028232D@in.ibm.com>

On Mon, Oct 11, 2010 at 12:51:27PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 10/06/2010 07:04:31 PM:
> 
> > On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote:
> > > For 1 TCP netperf, I ran 7 iterations and summed it. Explanation
> > > for degradation for 1 stream case:
> >
> > I thought about possible RX/TX contention reasons, and I realized that
> > we get/put the mm counter all the time.  So I wrote the following: I
> > haven't seen any performance gain from this in a single queue case, but
> > maybe this will help multiqueue?
> 
> Sorry for the delay, I was sick for the last couple of days. The results
> with your patch are (%'s over original code):
> 
> Code               BW%       CPU%       RemoteCPU
> MQ     (#txq=16)   31.4%     38.42%     6.41%
> MQ+MST (#txq=16)   28.3%     18.9%      -10.77%
> 
> The patch helps CPU utilization but did not help the single-stream
> drop.
> 
> Thanks,

What other shared TX/RX locks are there?  In your setup, is the same
macvtap socket structure used for RX and TX?  If yes, this will create
cacheline bounces, as sk_wmem_alloc/sk_rmem_alloc share a cache line;
there might also be contention on the lock in the sk_sleep waitqueue.
Anything else?
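
A minimal sketch of the cache-line concern above, assuming a 64-byte line (the real size depends on the CPU). The two structs and same_line() are hypothetical illustrations and do not reflect the actual struct sock layout: one keeps a TX-side and an RX-side counter adjacent, the other pads them onto separate lines.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed line size; varies by CPU */

/* Both counters fall into the same line, so a writer on the TX CPU and
 * a writer on the RX CPU keep pulling that line away from each other. */
struct counters_shared {
	int wmem_alloc; /* updated on the TX path */
	int rmem_alloc; /* updated on the RX path */
};

/* Each counter starts on its own line, at the cost of padding. */
struct counters_split {
	alignas(CACHE_LINE) int wmem_alloc;
	alignas(CACHE_LINE) int rmem_alloc;
};

/* Do two member offsets fall into the same line? This assumes the
 * struct itself starts on a line boundary. */
static int same_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}
```

Comparing offsetof() for the two members with same_line() shows them sharing a line in the first layout and not in the second; the padded struct grows to two full lines.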

-- 
MST

^ permalink raw reply

* kernel panic in fib_rules_lookup [2.6.27.7 vendor-patched]
From: Joe Buehler @ 2010-10-12 17:14 UTC (permalink / raw)
  To: netdev

I am seeing a kernel panic (NULL pointer) in fib_rules_lookup.  There
were some other reports for 2.6.32 back in March and May.  It looks to
me as though "rules_list" is not in a good state when fib_rules_lookup
traverses it.

My application is bringing TAP interfaces up and down and making
changes to associated routing tables at a fairly good clip (say, a few
times a second).  That use case seems to be similar to a previously
reported crash case.

This is a MIPS kernel (Cavium Octeon) running two CPUs SMP.  I am
using 2.6.27.7 patched by Cavium for hardware support reasons.  I
cannot upgrade because the vendor patches are non-trivial to
forward-port.

Here is one stack trace:

[<ffffffff814671ec>] fib_rules_lookup+0x11c/0x1a8
[<ffffffff814bd144>] fib_lookup+0x2c/0x48
[<ffffffff814788d8>] __ip_route_output_key+0x918/0xf38
[<ffffffff81478f30>] ip_route_output_flow+0x38/0x2e8
[<ffffffff8149fd1c>] tcp_v4_connect+0x134/0x498
[<ffffffff814aef80>] inet_stream_connect+0xf8/0x2f0
[<ffffffff81442680>] sys_connect+0xe0/0xf8
[<ffffffff8114528c>] handle_sys+0x12c/0x148

Here is another:

[<ffffffff814671ec>] fib_rules_lookup+0x11c/0x1a8
[<ffffffff814bd144>] fib_lookup+0x2c/0x48
[<ffffffff814b6550>] fib_validate_source+0xb0/0x4c0
[<ffffffff8147a524>] ip_route_input+0x11a4/0x13c0
[<ffffffff8147c304>] ip_rcv_finish+0x2f4/0x4c0
[<ffffffff81454220>] process_backlog+0xa8/0x160
[<ffffffff81451ea8>] net_rx_action+0x190/0x2e0
[<ffffffff81166978>] __do_softirq+0xf0/0x218
[<ffffffff81166b18>] do_softirq+0x78/0x80
[<ffffffff81100e30>] plat_irq_dispatch+0x130/0x1e0
[<ffffffff81130948>] ret_from_irq+0x0/0x4
[<ffffffff8151167c>] _cond_resched+0x34/0x50
[<ffffffff81148b60>] fpu_emulator_cop1Handler+0x90/0x1c80
[<ffffffff81136f4c>] do_cpu+0x24c/0x360
[<ffffffff81130940>] ret_from_exception+0x0/0x8

*IF* my reading of the disassembled code at point of panic is correct,
 the "pos" pointer in list_for_each_entry_rcu appears to be NULL.

Looking at the code in net/core/fib_rules.c, I see some uses of
"rules_list" that go through RCU and some that apparently do not.  Has
something simple been overlooked?

I need this fixed, so I will try adding a spinlock to protect rules_list
if necessary.

Joe Buehler

^ permalink raw reply

* Re: [PATCH] af_packet: account for VLAN when checking packet size
From: Michael S. Tsirkin @ 2010-10-12 17:19 UTC (permalink / raw)
  To: David Miller, eric.dumazet, netdev, johann.baudy
In-Reply-To: <20101011172932.GB12342@orbit.nwl.cc>

On Mon, Oct 11, 2010 at 07:29:32PM +0200, Phil Sutter wrote:
> On Mon, Oct 11, 2010 at 09:01:53AM -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Mon, 11 Oct 2010 16:03:02 +0200
> > 
> > > If we don't test the ETH_P_8021Q protocol here, we allow sending
> > > 1504-byte frames for MTU=1500
> > > 
> > > Should we really care ?
> > > 
> > > If not, just do
> > > 
> > > reserve = dev->hard_header_len + VLAN_HLEN;
> > 
> > That's a good point, I think we need to validate the SKB protocol
> > field.
> 
> Which is set to the value of the passed struct sockaddr_ll field
> sll_protocol. At least in the two userspace code samples I have here,
> the latter field is set to htons(ETH_P_ALL). So unless one changes the
> API, the only way to find out the packet type is to actually parse the
> given ethernet header.

Yes, like eth_type_trans does, I guess.  I think we had a similar
discussion already:

http://lists.openwall.net/netdev/2010/01/06/38

Summary: if we want the protocol field to have the correct value
for this case, we need to make it work for other transports,
not just for Ethernet.
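
For the Ethernet-only case, parsing the header as discussed above could look roughly like the sketch below. It is a userspace illustration, not af_packet.c code: the EX_* constants and both functions are hypothetical names, and it handles a single 802.1Q tag only, so it does not address the other-transports point.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_ETH_HLEN    14     /* dst MAC + src MAC + EtherType */
#define EX_VLAN_HLEN   4      /* one 802.1Q tag */
#define EX_ETH_P_8021Q 0x8100 /* 802.1Q TPID */

/* The EtherType sits at bytes 12..13 of the frame, in network byte order. */
static int ex_frame_is_vlan(const uint8_t *frame, size_t len)
{
	if (len < EX_ETH_HLEN)
		return 0;
	return ((frame[12] << 8) | frame[13]) == EX_ETH_P_8021Q;
}

/* Largest frame to accept for a given MTU: link header plus MTU, with
 * room for one VLAN tag only when the frame actually carries one. */
static size_t ex_max_frame_len(const uint8_t *frame, size_t len, size_t mtu)
{
	size_t reserve = EX_ETH_HLEN;

	if (ex_frame_is_vlan(frame, len))
		reserve += EX_VLAN_HLEN;
	return mtu + reserve;
}
```

With MTU=1500 this allows 1514 bytes for an untagged frame and 1518 for a tagged one, instead of granting the extra 4 bytes unconditionally.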

> Since tpacket_rcv() just interprets the vlan_tci skb field, such
> detailed packet inspection is otherwise not done in af_packet.c.
> 
> Greetings, Phil

^ permalink raw reply
