Netdev List
* Re: [PATCH 5/5 v4] net: add old_queue_mapping into skb->cb
From: jamal @ 2010-12-23 13:00 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Stephen Hemminger, Eric Dumazet, Tom Herbert,
	Jiri Pirko, netdev, netem
In-Reply-To: <AANLkTimqemuhxCKq-PJu+FD-MDgKaHnYKnP_2ch30wxE@mail.gmail.com>

On Tue, 2010-12-21 at 22:03 +0800, Changli Gao wrote:

> When I tested it, my OS got frozen.

I will look into it at the next opportunity I get. The example I showed
is on egress, btw. A ping from outside that matches the filter
would be a good test.

> Currently, you can only change the rx queue mapping, because for tx,
> dev_pick_tx() doesn't use skb->queue_mapping to choose tx queue.

If skbedit is on egress, it will happen after dev_pick_tx() (and override
whatever it chose), no? That's the whole point of skbedit's queue_mapping
editing.

> However, I don't think change the rx queue mapping is a good idea.

I agree with that as a default policy. But it is a
policy that skbedit can and should be able to override.

> When the skbs returned from ifb enter netif_receive_skb() again,
> get_rps_cpu() may warn about the wrong rx queue, and my this patch is
> used to solve this problem. Even though the rx queue is legal, a
> different rps_cpus settings will be used, and the skbs may be
> redirected to different CPUs. Is it expected?

I am not sure without analyzing what the performance impact would be, i.e. I
think the only reason I wouldn't do it is that it may have a crazy
effect on performance. But:
If I wanted to override the choice made by rps through some policy, why
shouldn't I be able to do it? Same thing if I wanted to bypass rps. The tc
level seems appropriate.
I may be misreading the code: a quick glance indicates that users
have no choice on ingress. rps happens first, then we can do tc level,
so whatever changes we make to the queue map will not
take effect in any case. Am I mistaken?

cheers,
jamal



* Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?
From: Eric Dumazet @ 2010-12-23 12:12 UTC (permalink / raw)
  To: Joel Soete; +Cc: Jarek Poplawski, Andrew Morton, Linux Kernel, netdev
In-Reply-To: <4D132C5F.8090404@scarlet.be>

On Thursday, 23 December 2010 at 11:02 +0000, Joel Soete wrote:
> Hello Eric,
> 
> 
> On 12/22/2010 04:25 PM, Eric Dumazet wrote:
> [snip]
> >
> > Something overwrites nr_frags in skb_shinfo(skb)
> >
> > As skb_shinfo follows head portion of an skb, something overflows skb
> > head
> >
> > Please try adding some room like in following patch ?
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index e6ba898..adf2834 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -187,6 +187,7 @@ enum {
> >    * the end of the header data, ie. at skb->end.
> >    */
> >   struct skb_shared_info {
> > +	char		filler[64];
> >   	unsigned short	nr_frags;
> >   	unsigned short	gso_size;
> >   	/* Warning: this field is not always filled in (UFO)! */
> >
> Sorry for the delay, but I have good news: I am sending this answer from:
> $ uname -a
> Linux sidh2 2.6.37-rc7-amd64-t1 #1 SMP Thu Dec 23 10:30:27 GMT 2010 x86_64 GNU/Linux
> 
> with your tip applied ;<) (without it, the kernel would already have died)
> 
> That said, how can I find what is overflowing the skb head? (All I can say is that this issue started with 2.6.34-git6?)
> 
> Thanks a lot,

You're welcome. At least we know where to search. Thanks!

I am taking holidays right now for about 5 days, I guess someone else
might find the bug before me ;)





* Re: Help: major pppoe regression since 2.6.35 (panic on first ppp conection)?
From: Joel Soete @ 2010-12-23 11:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jarek Poplawski, Andrew Morton, Linux Kernel, netdev
In-Reply-To: <1293035100.3027.247.camel@edumazet-laptop>

Hello Eric,


On 12/22/2010 04:25 PM, Eric Dumazet wrote:
[snip]
>
> Something overwrites nr_frags in skb_shinfo(skb)
>
> As skb_shinfo follows head portion of an skb, something overflows skb
> head
>
> Please try adding some room like in following patch ?
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index e6ba898..adf2834 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -187,6 +187,7 @@ enum {
>    * the end of the header data, ie. at skb->end.
>    */
>   struct skb_shared_info {
> +	char		filler[64];
>   	unsigned short	nr_frags;
>   	unsigned short	gso_size;
>   	/* Warning: this field is not always filled in (UFO)! */
>
Sorry for the delay, but I have good news: I am sending this answer from:
$ uname -a
Linux sidh2 2.6.37-rc7-amd64-t1 #1 SMP Thu Dec 23 10:30:27 GMT 2010 x86_64 GNU/Linux

with your tip applied ;<) (without it, the kernel would already have died)

That said, how can I find what is overflowing the skb head? (All I can say is that this issue started with 2.6.34-git6?)

Thanks a lot,
	J.


* RE: Using ethernet device as efficient small packet generator
From: juice @ 2010-12-23 10:50 UTC (permalink / raw)
  To: Jon Zhou, Eric Dumazet, Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F249D5F39C2@MILEXCH2.ds.jdsu.net>

>
> Ethtool -S "My intel x520 10G nic" will show there are 8 rx/tx queues
>
> I just made 5M pps with 64 bytes packet according to link given by eric
> Dumazet.
> (connect the 2 ports with each other of the NIC, XEON E5540,kernel
> 2.6.32,set irq affinity, Noted that I have an abnormal ksoftirqd/2 which
> occupy 30%cpu even at idle state, so the result still has space to
> improve)
>
> At another old kernel(2.6.16) with tg3 and bnx2 1G NIC,XEON E5450, I only
> got 490K pps(it is about 300Mbps,30% GE), I think the reason is multiqueue
> unsupported in this kernel.
>
> I will do a test with 1Gb nic on the new kernel later.
>

Which do you suppose is the reason for the poor performance on my setup:
is it the lack of multiqueue HW in the GE NICs I am using, or is it the lack
of multiqueue support in the kernel (2.6.32) that I am using?

Is multiqueue really necessary to achieve full 1GE saturation, or
is it only needed on 10GE NICs?

As I understand it, multiqueue is useful only if there are lots of CPU cores
available, each handling one queue.

The application I am thinking of, preloading a packet sequence into the
kernel from a userland application and then sending it from the buffer,
probably does not benefit much from many cores. It would be enough
that one CPU handles the sending while the other core(s) handle
other tasks.

Yours, Jussi Ohenoja




* [PATCH] smsc911x: add disable and re-enable Rx int to de-assert interrupt pin
From: Jason Wang @ 2010-12-23 10:43 UTC (permalink / raw)
  To: davem; +Cc: netdev, steve.glendinning, linux-omap

When the kernel enters the irq handler, it checks the Rx interrupt status
bit. If the Rx status is set but napi_schedule() can't be called, the
handler does nothing and returns directly. This situation is easy to
produce when the irq handler is called repeatedly through the netpoll
interface (e.g. when connecting kgdboe).

This is a potential risk on level-triggered platforms (e.g.
ti_omap3evm): if we don't handle the Rx int and just return from the
irq handler, the irq pin stays asserted, and a level-triggered
platform has no chance to get out of the Rx irq. The whole
system will hang in the irq subsystem.

To solve it, we add a disable/re-enable Rx int operation for this
situation. This operation de-asserts the interrupt pin for this time
and leaves the received data and status in the FIFO for later
interrupts to handle.

Signed-off-by: Jason Wang <jason77.wang@gmail.com>
---
 drivers/net/smsc911x.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/smsc911x.c b/drivers/net/smsc911x.c
index 64bfdae..dd6312f 100644
--- a/drivers/net/smsc911x.c
+++ b/drivers/net/smsc911x.c
@@ -1492,14 +1492,20 @@ static irqreturn_t smsc911x_irqhandler(int irq, void *dev_id)
 	}
 
 	if (likely(intsts & inten & INT_STS_RSFL_)) {
-		if (likely(napi_schedule_prep(&pdata->napi))) {
-			/* Disable Rx interrupts */
-			temp = smsc911x_reg_read(pdata, INT_EN);
-			temp &= (~INT_EN_RSFL_EN_);
-			smsc911x_reg_write(pdata, INT_EN, temp);
+		/* Disable Rx interrupts first; if napi_schedule_prep()
+		 * fails, we re-enable them. This disable/re-enable pair
+		 * de-asserts the interrupt line, which is safer on
+		 * level-triggered platforms. */
+		temp = smsc911x_reg_read(pdata, INT_EN);
+		temp &= (~INT_EN_RSFL_EN_);
+		smsc911x_reg_write(pdata, INT_EN, temp);
+
+		if (likely(napi_schedule_prep(&pdata->napi)))
 			/* Schedule a NAPI poll */
 			__napi_schedule(&pdata->napi);
-		} else {
+		else {
+			temp |= INT_EN_RSFL_EN_;
+			smsc911x_reg_write(pdata, INT_EN, temp);
 			SMSC_WARNING(RX_ERR,
 				"napi_schedule_prep failed");
 		}
-- 
1.5.6.5



* [PATCH net-next-2.6 2/2] can: add driver for Softing card
From: Kurt Van Dijck @ 2010-12-23  9:47 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w
In-Reply-To: <20101223093627.GA325-MxZ6Iy/zr/UdbCeoMzGj59i2O/JbrIOy@public.gmane.org>

This patch adds the driver that creates a platform:softing device
from a pcmcia_device.
Note: the Kconfig indicates a dependency on the softing.ko driver,
but this is purely to make configuration intuitive. This driver will
work independently, but no CAN network devices appear until softing.ko is
loaded too.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck-/BeEPy95v10@public.gmane.org>

---
 drivers/net/can/softing/Kconfig      |   11 +
 drivers/net/can/softing/Makefile     |    1 +
 drivers/net/can/softing/softing_cs.c |  362 ++++++++++++++++++++++++++++++++++
 3 files changed, 374 insertions(+), 0 deletions(-)

diff --git a/drivers/net/can/softing/Kconfig b/drivers/net/can/softing/Kconfig
index 072f337..d1a6ba9 100644
--- a/drivers/net/can/softing/Kconfig
+++ b/drivers/net/can/softing/Kconfig
@@ -14,3 +14,14 @@ config CAN_SOFTING
 	  controls the 2 busses on the card together.
 	  As such, some actions (start/stop/busoff recovery) on 1 bus
 	  must bring down the other bus too temporarily.
+
+config CAN_SOFTING_CS
+	tristate "Softing CAN pcmcia cards"
+	depends on CAN_SOFTING && PCMCIA
+	---help---
+	  Support for PCMCIA cards from Softing Gmbh & some cards
+	  from Vector Gmbh.
+	  You need firmware for these, which you can get at
+	  http://developer.berlios.de/projects/socketcan/
+	  This version of the driver is written against
+	  firmware version 4.6 (softing-fw-4.6-binaries.tar.gz)
diff --git a/drivers/net/can/softing/Makefile b/drivers/net/can/softing/Makefile
index 7878b7b..5f0f527 100644
--- a/drivers/net/can/softing/Makefile
+++ b/drivers/net/can/softing/Makefile
@@ -1,5 +1,6 @@
 
 softing-y := softing_main.o softing_fw.o
 obj-$(CONFIG_CAN_SOFTING)        += softing.o
+obj-$(CONFIG_CAN_SOFTING_CS)     += softing_cs.o
 
 ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/softing/softing_cs.c b/drivers/net/can/softing/softing_cs.c
new file mode 100644
index 0000000..74d6b0e
--- /dev/null
+++ b/drivers/net/can/softing/softing_cs.c
@@ -0,0 +1,362 @@
+/*
+ * drivers/net/can/softing/softing_cs.c
+ *
+ * Copyright (C) 2008-2010
+ *
+ * - Kurt Van Dijck, EIA Electronics
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+
+#include <pcmcia/cistpl.h>
+#include <pcmcia/ds.h>
+
+#include "softing_platform.h"
+
+static int softingcs_index;
+static spinlock_t softingcs_index_lock;
+
+static int softingcs_reset(struct platform_device *pdev, int v);
+static int softingcs_enable_irq(struct platform_device *pdev, int v);
+
+/*
+ * platform_data descriptions
+ */
+static const struct softing_platform_data softingcs_platform_data[] = {
+{
+	.name = "CANcard",
+	.manf = 0x0168, .prod = 0x001,
+	.generation = 1,
+	.nbus = 2,
+	.freq = 16, .max_brp = 32, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cancard.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = softingcs_enable_irq,
+}, {
+	.name = "CANcard-NEC",
+	.manf = 0x0168, .prod = 0x002,
+	.generation = 1,
+	.nbus = 2,
+	.freq = 16, .max_brp = 32, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cancard.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = softingcs_enable_irq,
+}, {
+	.name = "CANcard-SJA",
+	.manf = 0x0168, .prod = 0x004,
+	.generation = 1,
+	.nbus = 2,
+	.freq = 20, .max_brp = 32, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cansja.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = softingcs_enable_irq,
+}, {
+	.name = "CANcard-2",
+	.manf = 0x0168, .prod = 0x005,
+	.generation = 2,
+	.nbus = 2,
+	.freq = 24, .max_brp = 64, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard2.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard2.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cancrd2.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = 0,
+}, {
+	.name = "Vector-CANcard",
+	.manf = 0x0168, .prod = 0x081,
+	.generation = 1,
+	.nbus = 2,
+	.freq = 16, .max_brp = 64, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cancard.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = softingcs_enable_irq,
+}, {
+	.name = "Vector-CANcard-SJA",
+	.manf = 0x0168, .prod = 0x084,
+	.generation = 1,
+	.nbus = 2,
+	.freq = 20, .max_brp = 32, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cansja.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = softingcs_enable_irq,
+}, {
+	.name = "Vector-CANcard-2",
+	.manf = 0x0168, .prod = 0x085,
+	.generation = 2,
+	.nbus = 2,
+	.freq = 24, .max_brp = 64, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard2.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard2.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cancrd2.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = 0,
+}, {
+	.name = "EDICcard-NEC",
+	.manf = 0x0168, .prod = 0x102,
+	.generation = 1,
+	.nbus = 2,
+	.freq = 16, .max_brp = 64, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cancard.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = softingcs_enable_irq,
+}, {
+	.name = "EDICcard-2",
+	.manf = 0x0168, .prod = 0x105,
+	.generation = 2,
+	.nbus = 2,
+	.freq = 24, .max_brp = 64, .max_sjw = 4,
+	.dpram_size = 0x0800,
+	.boot = {0x0000, 0x000000, fw_dir "bcard2.bin",},
+	.load = {0x0120, 0x00f600, fw_dir "ldcard2.bin",},
+	.app = {0x0010, 0x0d0000, fw_dir "cancrd2.bin",},
+	.reset = softingcs_reset,
+	.enable_irq = 0,
+}, {
+	0, 0,
+},
+};
+
+MODULE_FIRMWARE(fw_dir "bcard.bin");
+MODULE_FIRMWARE(fw_dir "ldcard.bin");
+MODULE_FIRMWARE(fw_dir "cancard.bin");
+MODULE_FIRMWARE(fw_dir "cansja.bin");
+
+MODULE_FIRMWARE(fw_dir "bcard2.bin");
+MODULE_FIRMWARE(fw_dir "ldcard2.bin");
+MODULE_FIRMWARE(fw_dir "cancrd2.bin");
+
+static const struct softing_platform_data *softingcs_find_platform_data(
+		unsigned int manf, unsigned int prod)
+{
+	const struct softing_platform_data *lp;
+
+	for (lp = softingcs_platform_data; lp->manf; ++lp) {
+		if ((lp->manf == manf) && (lp->prod == prod))
+			return lp;
+	}
+	return NULL;
+}
+
+/*
+ * platformdata callbacks
+ */
+static int softingcs_reset(struct platform_device *pdev, int v)
+{
+	struct pcmcia_device *pcmcia = to_pcmcia_dev(pdev->dev.parent);
+
+	dev_dbg(&pdev->dev, "pcmcia config [2] %02x\n", v ? 0 : 0x20);
+	return pcmcia_write_config_byte(pcmcia, 2, v ? 0 : 0x20);
+}
+
+static int softingcs_enable_irq(struct platform_device *pdev, int v)
+{
+	struct pcmcia_device *pcmcia = to_pcmcia_dev(pdev->dev.parent);
+
+	dev_dbg(&pdev->dev, "pcmcia config [0] %02x\n", v ? 0x60 : 0);
+	return pcmcia_write_config_byte(pcmcia, 0, v ? 0x60 : 0);
+}
+
+/*
+ * pcmcia check
+ */
+static int softingcs_probe_config(struct pcmcia_device *pcmcia,
+		void *priv_data)
+{
+	struct softing_platform_data *pdat = priv_data;
+	struct resource *pres;
+	int memspeed = 0;
+
+	WARN_ON(!pdat);
+	pres = pcmcia->resource[PCMCIA_IOMEM_0];
+	if (resource_size(pres) < 0x1000)
+		return -ERANGE;
+
+	pres->flags |= WIN_MEMORY_TYPE_CM | WIN_ENABLE;
+	if (pdat->generation < 2) {
+		pres->flags |= WIN_USE_WAIT | WIN_DATA_WIDTH_8;
+		memspeed = 3;
+	} else {
+		pres->flags |= WIN_DATA_WIDTH_16;
+	}
+	return pcmcia_request_window(pcmcia, pres, memspeed);
+}
+
+static void softingcs_remove(struct pcmcia_device *pcmcia)
+{
+	struct platform_device *pdev = pcmcia->priv;
+
+	/* free bits */
+	platform_device_unregister(pdev);
+	/* release pcmcia stuff */
+	pcmcia_disable_device(pcmcia);
+}
+
+/*
+ * platform_device wrapper
+ * pdev->resource has 2 entries: io & irq
+ */
+static void softingcs_pdev_release(struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	kfree(pdev);
+}
+
+static int softingcs_probe(struct pcmcia_device *pcmcia)
+{
+	int ret;
+	struct platform_device *pdev;
+	const struct softing_platform_data *pdat;
+	struct resource *pres;
+	struct dev {
+		struct platform_device pdev;
+		struct resource res[2];
+	} *dev;
+
+	/* find matching platform_data */
+	pdat = softingcs_find_platform_data(pcmcia->manf_id, pcmcia->card_id);
+	if (!pdat)
+		return -ENOTTY;
+
+	/* setup pcmcia device */
+	pcmcia->config_flags |= CONF_ENABLE_IRQ | CONF_AUTO_SET_IOMEM |
+		CONF_AUTO_SET_VPP | CONF_AUTO_CHECK_VCC;
+	ret = pcmcia_loop_config(pcmcia, softingcs_probe_config, (void *)pdat);
+	if (ret)
+		goto pcmcia_failed;
+
+	ret = pcmcia_enable_device(pcmcia);
+	if (ret < 0)
+		goto pcmcia_failed;
+
+	pres = pcmcia->resource[PCMCIA_IOMEM_0];
+	if (!pres) {
+		ret = -EBADF;
+		goto pcmcia_bad;
+	}
+
+	/* create softing platform device */
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	if (!dev) {
+		ret = -ENOMEM;
+		goto mem_failed;
+	}
+	dev->pdev.resource = dev->res;
+	dev->pdev.num_resources = ARRAY_SIZE(dev->res);
+	dev->pdev.dev.release = softingcs_pdev_release;
+
+	pdev = &dev->pdev;
+	pdev->dev.platform_data = (void *)pdat;
+	pdev->dev.parent = &pcmcia->dev;
+	pcmcia->priv = pdev;
+
+	/* platform device resources */
+	pdev->resource[0].flags = IORESOURCE_MEM;
+	pdev->resource[0].start = pres->start;
+	pdev->resource[0].end = pres->end;
+
+	pdev->resource[1].flags = IORESOURCE_IRQ;
+	pdev->resource[1].start = pcmcia->irq;
+	pdev->resource[1].end = pdev->resource[1].start;
+
+	/* platform device setup */
+	spin_lock(&softingcs_index_lock);
+	pdev->id = softingcs_index++;
+	spin_unlock(&softingcs_index_lock);
+	pdev->name = "softing";
+	dev_set_name(&pdev->dev, "softingcs.%i", pdev->id);
+	ret = platform_device_register(pdev);
+	if (ret < 0)
+		goto platform_failed;
+
+	dev_info(&pcmcia->dev, "created %s\n", dev_name(&pdev->dev));
+	return 0;
+
+platform_failed:
+	kfree(dev);
+mem_failed:
+pcmcia_bad:
+pcmcia_failed:
+	pcmcia_disable_device(pcmcia);
+	pcmcia->priv = NULL;
+	return ret ?: -ENODEV;
+}
+
+static /*const*/ struct pcmcia_device_id softingcs_ids[] = {
+	/* softing */
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0001),
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0002),
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0004),
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0005),
+	/* vector, manufacturer? */
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0081),
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0084),
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0085),
+	/* EDIC */
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0102),
+	PCMCIA_DEVICE_MANF_CARD(0x0168, 0x0105),
+	PCMCIA_DEVICE_NULL,
+};
+
+MODULE_DEVICE_TABLE(pcmcia, softingcs_ids);
+
+static struct pcmcia_driver softingcs_driver = {
+	.owner		= THIS_MODULE,
+	.name		= "softingcs",
+	.id_table	= softingcs_ids,
+	.probe		= softingcs_probe,
+	.remove		= softingcs_remove,
+};
+
+static int __init softingcs_start(void)
+{
+	spin_lock_init(&softingcs_index_lock);
+	return pcmcia_register_driver(&softingcs_driver);
+}
+
+static void __exit softingcs_stop(void)
+{
+	pcmcia_unregister_driver(&softingcs_driver);
+}
+
+module_init(softingcs_start);
+module_exit(softingcs_stop);
+
+MODULE_DESCRIPTION("softing CANcard driver"
+		", links PCMCIA card to softing driver");
+MODULE_LICENSE("GPL");
+MODULE_SUPPORTED_DEVICE("softing CANcard2");
+


* [PATCH net-next-2.6 1/2] can: add driver for Softing card
From: Kurt Van Dijck @ 2010-12-23  9:43 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w
In-Reply-To: <20101223093627.GA325-MxZ6Iy/zr/UdbCeoMzGj59i2O/JbrIOy@public.gmane.org>

This patch adds a driver for the platform:softing device.
It creates up to 2 CAN network devices from 1
platform:softing device.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck-/BeEPy95v10@public.gmane.org>

---
 drivers/net/can/Kconfig                    |    2 +
 drivers/net/can/Makefile                   |    1 +
 drivers/net/can/softing/Kconfig            |   16 +
 drivers/net/can/softing/Makefile           |    5 +
 drivers/net/can/softing/softing.h          |  216 +++++++
 drivers/net/can/softing/softing_fw.c       |  664 ++++++++++++++++++++
 drivers/net/can/softing/softing_main.c     |  935 ++++++++++++++++++++++++++++
 drivers/net/can/softing/softing_platform.h |   38 ++
 8 files changed, 1877 insertions(+), 0 deletions(-)

diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
index d5a9db6..986195e 100644
--- a/drivers/net/can/Kconfig
+++ b/drivers/net/can/Kconfig
@@ -117,6 +117,8 @@ source "drivers/net/can/sja1000/Kconfig"
 
 source "drivers/net/can/usb/Kconfig"
 
+source "drivers/net/can/softing/Kconfig"
+
 config CAN_DEBUG_DEVICES
 	bool "CAN devices debugging messages"
 	depends on CAN
diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile
index 07ca159..53c82a7 100644
--- a/drivers/net/can/Makefile
+++ b/drivers/net/can/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_CAN_DEV)		+= can-dev.o
 can-dev-y			:= dev.o
 
 obj-y				+= usb/
+obj-y				+= softing/
 
 obj-$(CONFIG_CAN_SJA1000)	+= sja1000/
 obj-$(CONFIG_CAN_MSCAN)		+= mscan/
diff --git a/drivers/net/can/softing/Kconfig b/drivers/net/can/softing/Kconfig
new file mode 100644
index 0000000..072f337
--- /dev/null
+++ b/drivers/net/can/softing/Kconfig
@@ -0,0 +1,16 @@
+config CAN_SOFTING
+	tristate "Softing Gmbh CAN generic support"
+	depends on CAN_DEV
+	---help---
+	  Support for CAN cards from Softing Gmbh & some cards
+	  from Vector Gmbh.
+	  Softing Gmbh CAN cards come with 1 or 2 physical busses.
+	  Those cards typically use Dual Port RAM to communicate
+	  with the host CPU. The interface is then identical for PCI
+	  and PCMCIA cards. This driver operates on a platform device,
+	  which has been created by softing_cs or softing_pci driver.
+	  Warning:
+	  The API of the card does not allow fine control per bus, but
+	  controls the 2 busses on the card together.
+	  As such, some actions (start/stop/busoff recovery) on 1 bus
+	  must bring down the other bus too temporarily.
diff --git a/drivers/net/can/softing/Makefile b/drivers/net/can/softing/Makefile
new file mode 100644
index 0000000..7878b7b
--- /dev/null
+++ b/drivers/net/can/softing/Makefile
@@ -0,0 +1,5 @@
+
+softing-y := softing_main.o softing_fw.o
+obj-$(CONFIG_CAN_SOFTING)        += softing.o
+
+ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/softing/softing.h b/drivers/net/can/softing/softing.h
new file mode 100644
index 0000000..99046a7
--- /dev/null
+++ b/drivers/net/can/softing/softing.h
@@ -0,0 +1,216 @@
+/*
+ * softing common interfaces
+ *
+ * by Kurt Van Dijck, 06-2008
+ */
+
+#include <linux/netdevice.h>
+#include <linux/ktime.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/can.h>
+#include <linux/can/dev.h>
+
+#include "softing_platform.h"
+
+#ifndef CAN_CTRLMODE_BERR_REPORTING
+#define CAN_CTRLMODE_BERR_REPORTING 0
+#endif
+
+struct softing;
+
+struct softing_priv {
+	struct can_priv can;	/* must be the first member! */
+	struct net_device *netdev;
+	struct softing *card;
+	struct {
+		int pending;
+		/* variables which hold the circular buffer */
+		int echo_put;
+		int echo_get;
+	} tx;
+	struct can_bittiming_const btr_const;
+	int index;
+	u8 output;
+	u16 chip;
+};
+#define netdev2softing(netdev)	((struct softing_priv *)netdev_priv(netdev))
+
+struct softing {
+	const struct softing_platform_data *pdat;
+	struct platform_device *pdev;
+	struct net_device *net[2];
+	spinlock_t	 spin; /* protect this structure & DPRAM access */
+	ktime_t ts_ref;
+	ktime_t ts_overflow; /* timestamp overflow value, in ktime */
+
+	struct {
+		/* indication of firmware status */
+		int up;
+		/* protection of the 'up' variable */
+		struct mutex lock;
+	} fw;
+	struct {
+		int nr;
+		int requested;
+		struct tasklet_struct bh;
+		int svc_count;
+	} irq;
+	struct {
+		int pending;
+		int last_bus;
+		/* keep the bus that last tx'd a message,
+		 * in order to let every netdev queue resume
+		 */
+	} tx;
+	struct {
+		unsigned long phys;
+		unsigned long size;
+		unsigned char *virt;
+		unsigned char *end;
+		struct softing_fct  *fct;
+		struct softing_info *info;
+		struct softing_rx  *rx;
+		struct softing_tx  *tx;
+		struct softing_irq *irq;
+		unsigned short *command;
+		unsigned short *receipt;
+	} dpram;
+	struct {
+		u32  serial, fw, hw, lic;
+		u16  chip[2];
+		u32  freq;
+	} id;
+};
+
+extern int softing_default_output(struct net_device *netdev);
+
+extern ktime_t softing_raw2ktime(struct softing *card, u32 raw);
+
+extern int softing_fct_cmd(struct softing *card
+			, int cmd, int vector, const char *msg);
+
+extern int softing_bootloader_command(struct softing *card
+			, int command, const char *msg);
+
+/* reset DPRAM */
+static inline void softing_set_reset_dpram(struct softing *card)
+{
+	if (card->pdat->generation >= 2) {
+		spin_lock_bh(&card->spin);
+		card->dpram.virt[0xe00] &= ~1;
+		spin_unlock_bh(&card->spin);
+	}
+}
+
+static inline void softing_clr_reset_dpram(struct softing *card)
+{
+	if (card->pdat->generation >= 2) {
+		spin_lock_bh(&card->spin);
+		card->dpram.virt[0xe00] |= 1;
+		spin_unlock_bh(&card->spin);
+	}
+}
+
+/* Load firmware after reset */
+extern int softing_load_fw(const char *file, struct softing *card,
+			unsigned char *virt, unsigned int size, int offset);
+
+/* Load final application firmware after bootloader */
+extern int softing_load_app_fw(const char *file, struct softing *card);
+
+extern int softing_reset_chip(struct softing *card);
+
+/*
+ * enable or disable irq
+ * only called with fw.lock locked
+ */
+extern int softing_enable_irq(struct softing *card, int enable);
+
+/* start/stop 1 bus on card */
+extern int softing_startstop(struct net_device *netdev, int up);
+
+/* netif_rx() */
+extern int softing_netdev_rx(struct net_device *netdev,
+		const struct can_frame *msg, ktime_t ktime);
+
+/* SOFTING DPRAM mappings */
+struct softing_rx {
+	u8  fifo[16][32];
+	u8  dummy1;
+	u16 rd;
+	u16 dummy2;
+	u16 wr;
+	u16  dummy3;
+	u16 lost_msg;
+} __attribute__((packed));
+
+#define TXMAX	31
+struct softing_tx {
+	u8  fifo[32][16];
+	u8  dummy1;
+	u16 rd;
+	u16 dummy2;
+	u16 wr;
+	u8  dummy3;
+} __attribute__((packed));
+
+struct softing_irq {
+	u8 to_host;
+	u8 to_card;
+} __attribute__((packed));
+
+struct softing_fct {
+	s16 param[20]; /* 0 is index */
+	s16 returned;
+	u8  dummy;
+	u16 host_access;
+} __attribute__((packed));
+
+struct softing_info {
+	u8  dummy1;
+	u16 bus_state;
+	u16 dummy2;
+	u16 bus_state2;
+	u16 dummy3;
+	u16 error_state;
+	u16 dummy4;
+	u16 error_state2;
+	u16 dummy5;
+	u16 reset;
+	u16 dummy6;
+	u16 clear_rcv_fifo;
+	u16 dummy7;
+	u16 dummyxx;
+	u16 dummy8;
+	u16 time_reset;
+	u8  dummy9;
+	u32 time;
+	u32 time_wrap;
+	u8  wr_start;
+	u8  wr_end;
+	u8  dummy10;
+	u16 dummy12;
+	u16 dummy12x;
+	u16 dummy13;
+	u16 reset_rcv_fifo;
+	u8  dummy14;
+	u8  reset_xmt_fifo;
+	u8  read_fifo_levels;
+	u16 rcv_fifo_level;
+	u16 xmt_fifo_level;
+} __attribute__((packed));
+
+/* DPRAM return codes */
+#define RES_NONE	0
+#define RES_OK		1
+#define RES_NOK		2
+#define RES_UNKNOWN	3
+/* DPRAM flags */
+#define CMD_TX		0x01
+#define CMD_ACK		0x02
+#define CMD_XTD		0x04
+#define CMD_RTR		0x08
+#define CMD_ERR		0x10
+#define CMD_BUS2	0x80
+
diff --git a/drivers/net/can/softing/softing_fw.c b/drivers/net/can/softing/softing_fw.c
new file mode 100644
index 0000000..f61299c
--- /dev/null
+++ b/drivers/net/can/softing/softing_fw.c
@@ -0,0 +1,664 @@
+/*
+* drivers/net/can/softing/softing_fw.c
+*
+* Copyright (C) 2008-2010
+*
+* - Kurt Van Dijck, EIA Electronics
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the version 2 of the GNU General Public License
+* as published by the Free Software Foundation
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+*/
+
+#include <linux/firmware.h>
+#include <linux/sched.h>
+#include <asm/div64.h>
+
+#include "softing.h"
+
+int softing_fct_cmd(struct softing *card, int cmd, int vector, const char *msg)
+{
+	int ret;
+	unsigned long stamp;
+	if (vector == RES_OK)
+		vector = RES_NONE;
+	card->dpram.fct->param[0] = cmd;
+	card->dpram.fct->host_access = vector;
+	/* be sure to flush this to the card */
+	wmb();
+	stamp = jiffies;
+	/*wait for card */
+	do {
+		ret = card->dpram.fct->host_access;
+		/* don't have any cached variables */
+		rmb();
+		if (ret == RES_OK) {
+			/*don't read return-value now */
+			ret = card->dpram.fct->returned;
+			if (ret)
+				dev_alert(&card->pdev->dev,
+					"%s returned %u\n", msg, ret);
+			return 0;
+		}
+		if ((jiffies - stamp) >= 1 * HZ)
+			break;
+		if (in_interrupt())
+			/* go as fast as possible */
+			continue;
+		/* process context => relax */
+		schedule();
+	} while (!signal_pending(current));
+
+	if (ret == RES_NONE) {
+		dev_alert(&card->pdev->dev,
+			"%s, no response from card on %u/0x%02x\n",
+			msg, cmd, vector);
+		return 1;
+	} else {
+		dev_alert(&card->pdev->dev,
+			"%s, bad response from card on %u/0x%02x, 0x%04x\n",
+			msg, cmd, vector, ret);
+		/*make sure to return something not 0 */
+		return ret ? ret : 1;
+	}
+}
+
+int softing_bootloader_command(struct softing *card
+		, int command, const char *msg)
+{
+	int ret;
+	unsigned long stamp;
+	card->dpram.receipt[0] = RES_NONE;
+	card->dpram.command[0] = command;
+	/* be sure to flush this to the card */
+	wmb();
+	stamp = jiffies;
+	/*wait for card */
+	do {
+		ret = card->dpram.receipt[0];
+		/* don't have any cached variables */
+		rmb();
+		if (ret == RES_OK)
+			return 0;
+		if ((jiffies - stamp) >= (3 * HZ))
+			break;
+		schedule();
+	} while (!signal_pending(current));
+
+	switch (ret) {
+	case RES_NONE:
+		dev_alert(&card->pdev->dev, "%s: no response from card\n", msg);
+		break;
+	case RES_NOK:
+		dev_alert(&card->pdev->dev, "%s: response from card nok\n",
+				msg);
+		break;
+	case RES_UNKNOWN:
+		dev_alert(&card->pdev->dev, "%s: command 0x%04x unknown\n",
+			msg, command);
+		break;
+	default:
+		dev_alert(&card->pdev->dev, "%s: bad response from card: %i\n",
+			msg, ret);
+		break;
+	}
+	return ret ? ret : 1;
+}
+
+struct fw_hdr {
+	u16 type;
+	u32 addr;
+	u16 len;
+	u16 checksum;
+	const unsigned char *base;
+} __attribute__ ((packed));
+
+static int fw_parse(const unsigned char **pmem, struct fw_hdr *hdr)
+{
+	u16 tmp;
+	const unsigned char *mem;
+	const unsigned char *end;
+	mem = *pmem;
+	hdr->type = (mem[0] << 0) | (mem[1] << 8);
+	hdr->addr = (mem[2] << 0) | (mem[3] << 8)
+		 | (mem[4] << 16) | (mem[5] << 24);
+	hdr->len = (mem[6] << 0) | (mem[7] << 8);
+	hdr->base = &mem[8];
+	hdr->checksum =
+		 (hdr->base[hdr->len] << 0) | (hdr->base[hdr->len + 1] << 8);
+	for (tmp = 0, mem = *pmem, end = &hdr->base[hdr->len]; mem < end; ++mem)
+		tmp += *mem;
+	if (tmp != hdr->checksum)
+		return -EINVAL;
+	*pmem += 10 + hdr->len;
+	return 0;
+}
+
+int softing_load_fw(const char *file, struct softing *card,
+			unsigned char *virt, unsigned int size, int offset)
+{
+	const struct firmware *fw;
+	const unsigned char *mem;
+	const unsigned char *end;
+	int ret = 0;
+	u32 start_addr;
+	struct fw_hdr rec;
+	int ok = 0;
+	unsigned char buf[1024];
+
+	ret = request_firmware(&fw, file, &card->pdev->dev);
+	if (ret) {
+		dev_alert(&card->pdev->dev, "request_firmware(%s) got %i\n",
+			file, ret);
+		return ret;
+	}
+	dev_dbg(&card->pdev->dev, "%s, firmware(%s) got %u bytes"
+		", offset %c0x%04x\n",
+		card->pdat->name, file, (unsigned int)fw->size,
+		(offset >= 0) ? '+' : '-', (unsigned int)abs(offset));
+	/* parse the firmware */
+	mem = fw->data;
+	end = &mem[fw->size];
+	/* look for header record */
+	ret = fw_parse(&mem, &rec);
+	if (ret < 0)
+		goto fw_end;
+	if (rec.type != 0xffff) {
+		dev_alert(&card->pdev->dev, "firmware starts with type 0x%04x\n",
+			rec.type);
+		goto fw_end;
+	}
+	if (strncmp("Structured Binary Format, Softing GmbH"
+			, rec.base, rec.len)) {
+		dev_info(&card->pdev->dev, "firmware string '%.*s'\n",
+			rec.len, rec.base);
+		goto fw_end;
+	}
+	ok |= 1;
+	/* ok, we had a header */
+	while (mem < end) {
+		ret = fw_parse(&mem, &rec);
+		if (ret)
+			break;
+		if (rec.type == 3) {
+			/*start address */
+			start_addr = rec.addr;
+			ok |= 2;
+			continue;
+		} else if (rec.type == 1) {
+			/*eof */
+			ok |= 4;
+			goto fw_end;
+		} else if (rec.type != 0) {
+			dev_alert(&card->pdev->dev, "unknown record type 0x%04x\n",
+				rec.type);
+			break;
+		}
+
+		if ((rec.addr + rec.len + offset) > size) {
+			dev_alert(&card->pdev->dev,
+				"firmware out of range (0x%08x / 0x%08x)\n",
+				(rec.addr + rec.len + offset), size);
+			goto fw_end;
+		}
+		memcpy_toio(&virt[rec.addr + offset],
+				 rec.base, rec.len);
+		/* be sure to flush caches from IO space */
+		mb();
+		if (rec.len > sizeof(buf)) {
+			dev_info(&card->pdev->dev,
+				"record is big (%u bytes), not verifying\n",
+				rec.len);
+			continue;
+		}
+		/* verify record data */
+		memcpy_fromio(buf, &virt[rec.addr + offset], rec.len);
+		if (!memcmp(buf, rec.base, rec.len))
+			/* is ok */
+			continue;
+		dev_alert(&card->pdev->dev, "0x%08x:0x%03x at 0x%p failed\n",
+			rec.addr, rec.len, &virt[rec.addr + offset]);
+		goto fw_end;
+	}
+fw_end:
+	release_firmware(fw);
+	if (0x5 == (ok & 0x5))
+		/* got header & eof */
+		return 0;
+	dev_info(&card->pdev->dev, "firmware %s failed\n", file);
+	return ret ?: -EINVAL;
+}
+
+int softing_load_app_fw(const char *file, struct softing *card)
+{
+	const struct firmware *fw;
+	const unsigned char *mem;
+	const unsigned char *end;
+	int ret;
+	struct fw_hdr rec;
+	int ok = 0;
+	u32 start_addr = 0;
+	u16 rx_sum;
+	unsigned int sum;
+	const unsigned char *mem_lp;
+	const unsigned char *mem_end;
+	struct cpy {
+		u32 src;
+		u32 dst;
+		u16 len;
+		u8 do_cs;
+	} __attribute__((packed)) *pcpy =
+		 (struct cpy *)&card->dpram.command[1];
+	struct cmd {
+		u32 start;
+		u8 autorestart;
+	} __attribute__((packed)) *pcmdstart =
+		(struct cmd *)&card->dpram.command[1];
+
+	ret = request_firmware(&fw, file, &card->pdev->dev);
+	if (ret) {
+		dev_alert(&card->pdev->dev, "request_firmware(%s) got %i\n",
+			file, ret);
+		return ret;
+	}
+	dev_dbg(&card->pdev->dev, "firmware(%s) got %lu bytes\n",
+		file, (unsigned long)fw->size);
+	/* parse the firmware */
+	mem = fw->data;
+	end = &mem[fw->size];
+	/* look for header record */
+	ret = fw_parse(&mem, &rec);
+	if (ret)
+		goto fw_end;
+	if (rec.type != 0xffff) {
+		dev_alert(&card->pdev->dev, "firmware starts with type 0x%04x\n",
+			rec.type);
+		goto fw_end;
+	}
+	if (strncmp("Structured Binary Format, Softing GmbH"
+		, rec.base, rec.len)) {
+		dev_alert(&card->pdev->dev, "firmware string '%.*s' fault\n",
+			rec.len, rec.base);
+		goto fw_end;
+	}
+	ok |= 1;
+	/* ok, we had a header */
+	while (mem < end) {
+		ret = fw_parse(&mem, &rec);
+		if (ret)
+			break;
+
+		if (rec.type == 3) {
+			/*start address */
+			start_addr = rec.addr;
+			ok |= 2;
+			continue;
+		} else if (rec.type == 1) {
+			/*eof */
+			ok |= 4;
+			goto fw_end;
+		} else if (rec.type != 0) {
+			dev_alert(&card->pdev->dev, "unknown record type 0x%04x\n",
+				rec.type);
+			break;
+		}
+		/* regular data */
+		for (sum = 0, mem_lp = rec.base, mem_end = &mem_lp[rec.len];
+			mem_lp < mem_end; ++mem_lp)
+			sum += *mem_lp;
+
+		memcpy_toio(&card->dpram.virt[card->pdat->app.offs],
+				 rec.base, rec.len);
+		pcpy->src = card->pdat->app.offs + card->pdat->app.addr;
+		pcpy->dst = rec.addr;
+		pcpy->len = rec.len;
+		pcpy->do_cs = 1;
+		if (softing_bootloader_command(card, 1, "loading app."))
+			goto fw_end;
+		/*verify checksum */
+		rx_sum = card->dpram.receipt[1];
+		if (rx_sum != (sum & 0xffff)) {
+			dev_alert(&card->pdev->dev, "SRAM seems to be damaged"
+				", wanted 0x%04x, got 0x%04x\n", sum, rx_sum);
+			goto fw_end;
+		}
+	}
+fw_end:
+	release_firmware(fw);
+	if (ok != 7)
+		goto fw_failed;
+	/* got header, start_addr, & eof */
+	pcmdstart->start = start_addr;
+	pcmdstart->autorestart = 1;
+	if (softing_bootloader_command(card, 3, "start app."))
+		goto fw_failed;
+	dev_info(&card->pdev->dev, "firmware %s up\n", file);
+	return 0;
+fw_failed:
+	dev_info(&card->pdev->dev, "firmware %s failed\n", file);
+	return ret ?: -EINVAL;
+}
+
+int softing_reset_chip(struct softing *card)
+{
+	do {
+		/*reset chip */
+		card->dpram.info->reset_rcv_fifo = 0;
+		card->dpram.info->reset = 1;
+		if (!softing_fct_cmd(card, 0, 0, "reset_chip"))
+			break;
+		if (signal_pending(current))
+			goto failed;
+		/*sync */
+		if (softing_fct_cmd(card, 99, 0x55, "sync-a"))
+			goto failed;
+		if (softing_fct_cmd(card, 99, 0xaa, "sync-b"))
+			goto failed;
+	} while (1);
+	card->tx.pending = 0;
+	return 0;
+failed:
+	return -EIO;
+}
+
+static void softing_initialize_timestamp(struct softing *card)
+{
+	uint64_t ovf;
+
+	card->ts_ref = ktime_get();
+
+	/* 16MHz is the reference */
+	ovf = 0x100000000ULL * 16;
+	do_div(ovf, card->pdat->freq ?: 16);
+
+	card->ts_overflow = ktime_add_us(ktime_set(0, 0), ovf);
+}
+
+ktime_t softing_raw2ktime(struct softing *card, u32 raw)
+{
+	uint64_t rawl;
+	ktime_t now, real_offset;
+	ktime_t target;
+	ktime_t tmp;
+
+	now = ktime_get();
+	real_offset = ktime_sub(ktime_get_real(), now);
+
+	/* find usec from card; widen first so raw * 16 cannot wrap at 32 bit */
+	rawl = (uint64_t)raw * 16;
+	do_div(rawl, card->pdat->freq ?: 16);
+	target = ktime_add_us(card->ts_ref, rawl);
+	/* test for overflows */
+	tmp = ktime_add(target, card->ts_overflow);
+	while (unlikely(ktime_to_ns(tmp) > ktime_to_ns(now))) {
+		card->ts_ref = ktime_add(card->ts_ref, card->ts_overflow);
+		target = tmp;
+		tmp = ktime_add(target, card->ts_overflow);
+	}
+	return ktime_add(target, real_offset);
+}
+
+static inline int softing_error_reporting(struct net_device *netdev)
+{
+	struct softing_priv *priv = netdev_priv(netdev);
+
+	return (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
+		? 1 : 0;
+}
+
+int softing_startstop(struct net_device *dev, int up)
+{
+	int ret;
+	struct softing *card;
+	struct softing_priv *priv;
+	struct net_device *netdev;
+	int mask_start;
+	int j, error_reporting;
+	struct can_frame msg;
+
+	priv = netdev_priv(dev);
+	card = priv->card;
+
+	if (!card->fw.up)
+		return -EIO;
+
+	ret = mutex_lock_interruptible(&card->fw.lock);
+	if (ret)
+		return ret;
+
+	mask_start = 0;
+	if (dev && up)
+		/* prepare to start this bus as well */
+		mask_start |= (1 << priv->index);
+	/* bring netdevs down */
+	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+		netdev = card->net[j];
+		if (!netdev)
+			continue;
+		priv = netdev_priv(netdev);
+
+		if (dev != netdev)
+			netif_stop_queue(netdev);
+
+		if (netif_running(netdev)) {
+			if (dev != netdev)
+				mask_start |= (1 << j);
+			priv->tx.pending = 0;
+			priv->tx.echo_put = 0;
+			priv->tx.echo_get = 0;
+			/* this bus may just have called open_candev(),
+			 * in which case calling close_candev() right away
+			 * is rather pointless.
+			 * But we may also come here from busoff recovery,
+			 * in which case the echo_skb _needs_ flushing.
+			 * Just be sure to call open_candev() again.
+			 */
+			close_candev(netdev);
+		}
+		priv->can.state = CAN_STATE_STOPPED;
+	}
+	card->tx.pending = 0;
+
+	softing_enable_irq(card, 0);
+	ret = softing_reset_chip(card);
+	if (ret)
+		goto failed;
+	if (!mask_start)
+		/* no busses to be brought up */
+		goto card_done;
+
+	if ((mask_start & 1) && (mask_start & 2)
+			&& (softing_error_reporting(card->net[0])
+				!= softing_error_reporting(card->net[1]))) {
+		dev_alert(&card->pdev->dev,
+				"err_reporting flag differs for busses\n");
+		goto invalid;
+	}
+	error_reporting = 0;
+	if (mask_start & 1) {
+		netdev = card->net[0];
+		priv = netdev_priv(netdev);
+		error_reporting += softing_error_reporting(netdev);
+		/*init chip 1 */
+		card->dpram.fct->param[1] = priv->can.bittiming.brp;
+		card->dpram.fct->param[2] = priv->can.bittiming.sjw;
+		card->dpram.fct->param[3] =
+			priv->can.bittiming.phase_seg1 +
+			priv->can.bittiming.prop_seg;
+		card->dpram.fct->param[4] =
+			priv->can.bittiming.phase_seg2;
+		card->dpram.fct->param[5] = (priv->can.ctrlmode &
+			CAN_CTRLMODE_3_SAMPLES) ? 1 : 0;
+		if (softing_fct_cmd(card, 1, 0, "initialize_chip[0]"))
+			goto failed;
+		/*set mode */
+		card->dpram.fct->param[1] = 0;
+		card->dpram.fct->param[2] = 0;
+		if (softing_fct_cmd(card, 3, 0, "set_mode[0]"))
+			goto failed;
+		/*set filter */
+		card->dpram.fct->param[1] = 0x0000;/*card->bus[0].s.msg; */
+		card->dpram.fct->param[2] = 0x07ff;/*card->bus[0].s.msk; */
+		card->dpram.fct->param[3] = 0x0000;/*card->bus[0].l.msg; */
+		card->dpram.fct->param[4] = 0xffff;/*card->bus[0].l.msk; */
+		card->dpram.fct->param[5] = 0x0000;/*card->bus[0].l.msg >> 16;*/
+		card->dpram.fct->param[6] = 0x1fff;/*card->bus[0].l.msk >> 16;*/
+		if (softing_fct_cmd(card, 7, 0, "set_filter[0]"))
+			goto failed;
+		/*set output control */
+		card->dpram.fct->param[1] = priv->output;
+		if (softing_fct_cmd(card, 5, 0, "set_output[0]"))
+			goto failed;
+	}
+	if (mask_start & 2) {
+		netdev = card->net[1];
+		priv = netdev_priv(netdev);
+		error_reporting += softing_error_reporting(netdev);
+		/*init chip2 */
+		card->dpram.fct->param[1] = priv->can.bittiming.brp;
+		card->dpram.fct->param[2] = priv->can.bittiming.sjw;
+		card->dpram.fct->param[3] =
+			priv->can.bittiming.phase_seg1 +
+			priv->can.bittiming.prop_seg;
+		card->dpram.fct->param[4] =
+			priv->can.bittiming.phase_seg2;
+		card->dpram.fct->param[5] = (priv->can.ctrlmode &
+			CAN_CTRLMODE_3_SAMPLES) ? 1 : 0;
+		if (softing_fct_cmd(card, 2, 0, "initialize_chip[1]"))
+			goto failed;
+		/*set mode2 */
+		card->dpram.fct->param[1] = 0;
+		card->dpram.fct->param[2] = 0;
+		if (softing_fct_cmd(card, 4, 0, "set_mode[1]"))
+			goto failed;
+		/*set filter2 */
+		card->dpram.fct->param[1] = 0x0000;/*card->bus[1].s.msg; */
+		card->dpram.fct->param[2] = 0x07ff;/*card->bus[1].s.msk; */
+		card->dpram.fct->param[3] = 0x0000;/*card->bus[1].l.msg; */
+		card->dpram.fct->param[4] = 0xffff;/*card->bus[1].l.msk; */
+		card->dpram.fct->param[5] = 0x0000;/*card->bus[1].l.msg >> 16;*/
+		card->dpram.fct->param[6] = 0x1fff;/*card->bus[1].l.msk >> 16;*/
+		if (softing_fct_cmd(card, 8, 0, "set_filter[1]"))
+			goto failed;
+		/*set output control2 */
+		card->dpram.fct->param[1] = priv->output;
+		if (softing_fct_cmd(card, 6, 0, "set_output[1]"))
+			goto failed;
+	}
+	/*enable_error_frame */
+	if (error_reporting) {
+		if (softing_fct_cmd(card, 51, 0, "enable_error_frame"))
+			goto failed;
+	}
+	/*initialize interface */
+	card->dpram.fct->param[1] = 1;
+	card->dpram.fct->param[2] = 1;
+	card->dpram.fct->param[3] = 1;
+	card->dpram.fct->param[4] = 1;
+	card->dpram.fct->param[5] = 1;
+	card->dpram.fct->param[6] = 1;
+	card->dpram.fct->param[7] = 1;
+	card->dpram.fct->param[8] = 1;
+	card->dpram.fct->param[9] = 1;
+	card->dpram.fct->param[10] = 1;
+	if (softing_fct_cmd(card, 17, 0, "initialize_interface"))
+		goto failed;
+	/*enable_fifo */
+	if (softing_fct_cmd(card, 36, 0, "enable_fifo"))
+		goto failed;
+	/*enable fifo tx ack */
+	if (softing_fct_cmd(card, 13, 0, "fifo_tx_ack[0]"))
+		goto failed;
+	/*enable fifo tx ack2 */
+	if (softing_fct_cmd(card, 14, 0, "fifo_tx_ack[1]"))
+		goto failed;
+	/*enable timestamps */
+	/*is default, no code found */
+	/*start_chip */
+	if (softing_fct_cmd(card, 11, 0, "start_chip"))
+		goto failed;
+	card->dpram.info->bus_state = 0;
+	card->dpram.info->bus_state2 = 0;
+	dev_info(&card->pdev->dev, "%s up\n", __func__);
+	if (card->pdat->generation < 2) {
+		card->dpram.irq->to_host = 0;
+		/* flush the DPRAM caches */
+		wmb();
+	}
+
+	softing_initialize_timestamp(card);
+
+	/*
+	 * do socketcan notifications/status changes
+	 * from here, no errors should occur, or the failed: part
+	 * must be reviewed
+	 */
+	memset(&msg, 0, sizeof(msg));
+	msg.can_id = CAN_ERR_FLAG | CAN_ERR_RESTARTED;
+	msg.can_dlc = CAN_ERR_DLC;
+	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+		if (!(mask_start & (1 << j)))
+			continue;
+		netdev = card->net[j];
+		if (!netdev)
+			continue;
+		priv = netdev_priv(netdev);
+		priv->can.state = CAN_STATE_ERROR_ACTIVE;
+		open_candev(netdev);
+		if (dev != netdev) {
+			/* notify other busses on the restart */
+			softing_netdev_rx(netdev, &msg, ktime_set(0, 0));
+			++priv->can.can_stats.restarts;
+		}
+		netif_wake_queue(netdev);
+	}
+
+	/* enable interrupts */
+	ret = softing_enable_irq(card, 1);
+	if (ret)
+		goto failed;
+card_done:
+	mutex_unlock(&card->fw.lock);
+	return 0;
+failed:
+	dev_alert(&card->pdev->dev, "firmware failed, going idle\n");
+invalid:
+	softing_enable_irq(card, 0);
+	softing_reset_chip(card);
+	mutex_unlock(&card->fw.lock);
+	/* bring all other interfaces down */
+	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+		netdev = card->net[j];
+		if (!netdev)
+			continue;
+		dev_close(netdev);
+	}
+	return -EIO;
+}
+
+int softing_default_output(struct net_device *netdev)
+{
+	struct softing_priv *priv = netdev_priv(netdev);
+	struct softing *card = priv->card;
+
+	switch (priv->chip) {
+	case 1000:
+		if (card->pdat->generation < 2)
+			return 0xfb;
+		return 0xfa;
+	case 5:
+		return 0x60;
+	default:
+		return 0x40;
+	}
+}
+
diff --git a/drivers/net/can/softing/softing_main.c b/drivers/net/can/softing/softing_main.c
new file mode 100644
index 0000000..a3d94d4
--- /dev/null
+++ b/drivers/net/can/softing/softing_main.c
@@ -0,0 +1,935 @@
+/*
+* drivers/net/can/softing/softing_main.c
+*
+* Copyright (C) 2008-2010
+*
+* - Kurt Van Dijck, EIA Electronics
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the version 2 of the GNU General Public License
+* as published by the Free Software Foundation
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+*/
+
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+
+#include "softing.h"
+
+#define TX_ECHO_SKB_MAX (TXMAX/2)
+
+/*
+ * test if a specific CAN netdev
+ * is online (i.e. up 'n running, not sleeping, not busoff)
+ */
+static inline int canif_is_active(struct net_device *netdev)
+{
+	struct can_priv *can = netdev_priv(netdev);
+	if (!netif_running(netdev))
+		return 0;
+	return (can->state <= CAN_STATE_ERROR_PASSIVE);
+}
+
+/* trigger the tx queue-ing */
+static netdev_tx_t
+softing_netdev_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct softing_priv *priv = netdev_priv(dev);
+	struct softing *card = priv->card;
+	int ret;
+	int bhlock;
+	u8 *ptr;
+	u8 cmd;
+	unsigned int fifo_wr;
+	struct can_frame msg;
+
+	if (can_dropped_invalid_skb(dev, skb))
+		return NETDEV_TX_OK;
+
+	if (in_interrupt()) {
+		bhlock = 0;
+		spin_lock(&card->spin);
+	} else {
+		bhlock = 1;
+		spin_lock_bh(&card->spin);
+	}
+	ret = NETDEV_TX_BUSY;
+	if (!card->fw.up)
+		goto xmit_done;
+	if (card->tx.pending >= TXMAX)
+		goto xmit_done;
+	if (priv->tx.pending >= TX_ECHO_SKB_MAX)
+		goto xmit_done;
+	fifo_wr = card->dpram.tx->wr;
+	if (fifo_wr == card->dpram.tx->rd)
+		/*fifo full */
+		goto xmit_done;
+	memcpy(&msg, skb->data, sizeof(msg));
+	ptr = &card->dpram.tx->fifo[fifo_wr][0];
+	cmd = CMD_TX;
+	if (msg.can_id & CAN_RTR_FLAG)
+		cmd |= CMD_RTR;
+	if (msg.can_id & CAN_EFF_FLAG)
+		cmd |= CMD_XTD;
+	if (priv->index)
+		cmd |= CMD_BUS2;
+	*ptr++ = cmd;
+	*ptr++ = msg.can_dlc;
+	*ptr++ = (msg.can_id >> 0);
+	*ptr++ = (msg.can_id >> 8);
+	if (msg.can_id & CAN_EFF_FLAG) {
+		*ptr++ = (msg.can_id >> 16);
+		*ptr++ = (msg.can_id >> 24);
+	} else {
+		/*increment 1, not 2 as you might think */
+		ptr += 1;
+	}
+	if (!(msg.can_id & CAN_RTR_FLAG))
+		memcpy_toio(ptr, &msg.data[0], msg.can_dlc);
+	if (++fifo_wr >=
+		 sizeof(card->dpram.tx->fifo) /
+		 sizeof(card->dpram.tx->fifo[0]))
+		fifo_wr = 0;
+	card->dpram.tx->wr = fifo_wr;
+	card->tx.last_bus = priv->index;
+	++card->tx.pending;
+	++priv->tx.pending;
+	can_put_echo_skb(skb, dev, priv->tx.echo_put);
+	++priv->tx.echo_put;
+	if (priv->tx.echo_put >= TX_ECHO_SKB_MAX)
+		priv->tx.echo_put = 0;
+	/* can_put_echo_skb() saves the skb, safe to return TX_OK */
+	ret = NETDEV_TX_OK;
+xmit_done:
+	if (bhlock)
+		spin_unlock_bh(&card->spin);
+	else
+		spin_unlock(&card->spin);
+	if (card->tx.pending >= TXMAX) {
+		int j;
+		for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+			if (card->net[j])
+				netif_stop_queue(card->net[j]);
+		}
+	}
+	if (ret != NETDEV_TX_OK)
+		netif_stop_queue(dev);
+
+	return ret;
+}
+
+/*
+ * shortcut for skb delivery
+ */
+int softing_netdev_rx(struct net_device *netdev,
+		const struct can_frame *msg, ktime_t ktime)
+{
+	struct sk_buff *skb;
+	struct can_frame *cf;
+	int ret;
+
+	skb = alloc_can_skb(netdev, &cf);
+	if (!skb)
+		return -ENOMEM;
+	memcpy(cf, msg, sizeof(*msg));
+	skb->tstamp = ktime;
+	ret = netif_rx(skb);
+	if (ret == NET_RX_DROP)
+		++netdev->stats.rx_dropped;
+	return ret;
+}
+
+/*
+ * softing_handle_1
+ * pop 1 entry from the DPRAM queue, and process
+ */
+static int softing_handle_1(struct softing *card)
+{
+	int j;
+	struct net_device *netdev;
+	struct softing_priv *priv;
+	ktime_t ktime;
+	struct can_frame msg;
+
+	unsigned int fifo_rd;
+	unsigned int cnt = 0;
+	u8 *ptr;
+	u32 tmp;
+	u8 cmd;
+
+	memset(&msg, 0, sizeof(msg));
+	if (card->dpram.rx->lost_msg) {
+		/*reset condition */
+		card->dpram.rx->lost_msg = 0;
+		/* prepare msg */
+		msg.can_id = CAN_ERR_FLAG | CAN_ERR_CRTL;
+		msg.can_dlc = CAN_ERR_DLC;
+		msg.data[1] = CAN_ERR_CRTL_RX_OVERFLOW;
+		/*
+		 * notify all busses, since we don't know which one it applies
+		 * to, but only notify busses that are online
+		 */
+		for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+			netdev = card->net[j];
+			if (!netdev)
+				continue;
+			if (!canif_is_active(netdev))
+				/* a dead bus has no overflows */
+				continue;
+			++netdev->stats.rx_over_errors;
+			softing_netdev_rx(netdev, &msg, ktime_set(0, 0));
+		}
+		/* prepare for other use */
+		memset(&msg, 0, sizeof(msg));
+		++cnt;
+	}
+
+	fifo_rd = card->dpram.rx->rd;
+	if (++fifo_rd >= ARRAY_SIZE(card->dpram.rx->fifo))
+		fifo_rd = 0;
+
+	if (card->dpram.rx->wr == fifo_rd)
+		return cnt;
+
+	ptr = &card->dpram.rx->fifo[fifo_rd][0];
+
+	cmd = *ptr++;
+	if (cmd == 0xff) {
+		/* not quite useful, the card has probably gone away */
+		dev_alert(&card->pdev->dev, "got cmd 0x%02x,"
+			" I suspect the card is lost\n", cmd);
+	}
+	/*mod_trace("0x%02x", cmd);*/
+	netdev = card->net[0];
+	if (cmd & CMD_BUS2)
+		netdev = card->net[1];
+	priv = netdev_priv(netdev);
+
+	if (cmd & CMD_ERR) {
+		u8 can_state;
+		u8 state;
+		state = *ptr++;
+
+		msg.can_id = CAN_ERR_FLAG;
+		msg.can_dlc = CAN_ERR_DLC;
+
+		if (state & 0x80) {
+			can_state = CAN_STATE_BUS_OFF;
+			msg.can_id |= CAN_ERR_BUSOFF;
+			state = 2;
+		} else if (state & 0x60) {
+			can_state = CAN_STATE_ERROR_PASSIVE;
+			msg.can_id |= CAN_ERR_BUSERROR;
+			msg.data[1] = CAN_ERR_CRTL_TX_PASSIVE;
+			state = 1;
+		} else {
+			can_state = CAN_STATE_ERROR_ACTIVE;
+			state = 0;
+			msg.can_id |= CAN_ERR_BUSERROR;
+		}
+		/*update DPRAM */
+		if (!priv->index)
+			card->dpram.info->bus_state = state;
+		else
+			card->dpram.info->bus_state2 = state;
+		/*timestamp */
+		tmp = (ptr[0] <<  0) | (ptr[1] <<  8)
+		    | (ptr[2] << 16) | (ptr[3] << 24);
+		ptr += 4;
+		ktime = softing_raw2ktime(card, tmp);
+		/*trigger dual port RAM */
+		mb();
+		card->dpram.rx->rd = fifo_rd;
+
+		++priv->can.can_stats.bus_error;
+		++netdev->stats.rx_errors;
+		/*update internal status */
+		if (can_state != priv->can.state) {
+			priv->can.state = can_state;
+			if (can_state == CAN_STATE_ERROR_PASSIVE)
+				++priv->can.can_stats.error_passive;
+			if (can_state == CAN_STATE_BUS_OFF) {
+				/* this calls can_close_cleanup() */
+				can_bus_off(netdev);
+				netif_stop_queue(netdev);
+			}
+			/*trigger socketcan */
+			softing_netdev_rx(netdev, &msg, ktime);
+		}
+
+	} else {
+		if (cmd & CMD_RTR)
+			msg.can_id |= CAN_RTR_FLAG;
+		/* acknowledge, was tx msg
+		 * no real tx flag to set
+		if (cmd & CMD_ACK) {
+		}
+		 */
+		msg.can_dlc = get_can_dlc(*ptr++);
+		if (cmd & CMD_XTD) {
+			msg.can_id |= CAN_EFF_FLAG;
+			msg.can_id |= (ptr[0] <<  0) | (ptr[1] <<  8)
+				    | (ptr[2] << 16) | (ptr[3] << 24);
+			ptr += 4;
+		} else {
+			msg.can_id |= (ptr[0] << 0) | (ptr[1] << 8);
+			ptr += 2;
+		}
+		tmp = (ptr[0] <<  0) | (ptr[1] <<  8)
+		    | (ptr[2] << 16) | (ptr[3] << 24);
+		ptr += 4;
+		ktime = softing_raw2ktime(card, tmp);
+		memcpy_fromio(&msg.data[0], ptr, 8);
+		ptr += 8;
+		/*trigger dual port RAM */
+		mb();
+		card->dpram.rx->rd = fifo_rd;
+		/*update socket */
+		if (cmd & CMD_ACK) {
+			struct sk_buff *skb;
+			skb = priv->can.echo_skb[priv->tx.echo_get];
+			if (skb)
+				skb->tstamp = ktime;
+			can_get_echo_skb(netdev, priv->tx.echo_get);
+			++priv->tx.echo_get;
+			if (priv->tx.echo_get >= TX_ECHO_SKB_MAX)
+				priv->tx.echo_get = 0;
+			if (priv->tx.pending)
+				--priv->tx.pending;
+			if (card->tx.pending)
+				--card->tx.pending;
+			++netdev->stats.tx_packets;
+			netdev->stats.tx_bytes += msg.can_dlc;
+		} else {
+			++netdev->stats.rx_packets;
+			netdev->stats.rx_bytes += msg.can_dlc;
+			softing_netdev_rx(netdev, &msg, ktime);
+		}
+	}
+	++cnt;
+	return cnt;
+}
+
+/*
+ * real interrupt handler
+ */
+static void softing_handler(unsigned long param)
+{
+	struct softing *card = (struct softing *)param;
+	struct net_device *netdev;
+	struct softing_priv *priv;
+	int j;
+	int offset;
+
+	spin_lock(&card->spin);
+	while (softing_handle_1(card) > 0)
+		++card->irq.svc_count;
+	spin_unlock(&card->spin);
+	/* resume tx queues */
+	offset = card->tx.last_bus;
+	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+		if (card->tx.pending >= TXMAX)
+			break;
+		netdev = card->net[(j + offset + 1) % card->pdat->nbus];
+		if (!netdev)
+			continue;
+		priv = netdev_priv(netdev);
+		if (!canif_is_active(netdev))
+			/* it makes no sense to wake dead busses */
+			continue;
+		if (priv->tx.pending >= TX_ECHO_SKB_MAX)
+			continue;
+		netif_wake_queue(netdev);
+	}
+}
+
+/*
+ * interrupt routines:
+ * schedule the 'real interrupt handler'
+ */
+static
+irqreturn_t softing_irq_new(int irq, void *dev_id)
+{
+	struct softing *card = (struct softing *)dev_id;
+	unsigned char ir;
+	ir = card->dpram.virt[0xe02];
+	card->dpram.virt[0xe02] = 0;
+	if (card->dpram.rx->rd == 0xffff) {
+		dev_alert(&card->pdev->dev, "I think the card is gone\n");
+		return IRQ_NONE;
+	}
+	if (ir == 1) {
+		tasklet_schedule(&card->irq.bh);
+		return IRQ_HANDLED;
+	} else if (ir == 0x10) {
+		return IRQ_NONE;
+	} else {
+		return IRQ_NONE;
+	}
+}
+
+static
+irqreturn_t softing_irq_old(int irq, void *dev_id)
+{
+	struct softing *card = (struct softing *)dev_id;
+	unsigned char irq_host;
+	irq_host = card->dpram.irq->to_host;
+	/* make sure we have a copy, before clearing the variable in DPRAM */
+	rmb();
+	card->dpram.irq->to_host = 0;
+	/* make sure we cleared it */
+	wmb();
+	if (card->dpram.rx->rd == 0xffff) {
+		dev_alert(&card->pdev->dev, "I think the card is gone\n");
+		return IRQ_NONE;
+	}
+	tasklet_schedule(&card->irq.bh);
+	return IRQ_HANDLED;
+}
+
+/*
+ * netdev/candev inter-operability
+ */
+static int softing_netdev_open(struct net_device *ndev)
+{
+	int ret;
+
+	/* check or determine and set bittime */
+	ret = open_candev(ndev);
+	if (ret)
+		goto failed;
+	ret = softing_startstop(ndev, 1);
+	if (ret)
+		goto failed;
+	return 0;
+failed:
+	return ret;
+}
+
+static int softing_netdev_stop(struct net_device *ndev)
+{
+	int ret;
+
+	netif_stop_queue(ndev);
+
+	/* softing cycle does close_candev() */
+	ret = softing_startstop(ndev, 0);
+	return ret;
+}
+
+static int softing_candev_set_mode(struct net_device *ndev, enum can_mode mode)
+{
+	int ret;
+
+	switch (mode) {
+	case CAN_MODE_START:
+		/* softing cycle does close_candev() */
+		ret = softing_startstop(ndev, 1);
+		return ret;
+	case CAN_MODE_STOP:
+	case CAN_MODE_SLEEP:
+		return -EOPNOTSUPP;
+	}
+	return 0;
+}
+
+/*
+ * Softing device management helpers
+ */
+int softing_enable_irq(struct softing *card, int enable)
+{
+	int ret;
+	if (!enable) {
+		if (card->irq.requested && card->irq.nr) {
+			free_irq(card->irq.nr, card);
+			card->irq.requested = 0;
+		}
+		return 0;
+	}
+	if (!card->irq.requested && (card->irq.nr)) {
+		ret = request_irq(card->irq.nr,
+				(card->pdat->generation >= 2)
+					? softing_irq_new : softing_irq_old,
+				IRQF_SHARED, dev_name(&card->pdev->dev), card);
+		if (ret) {
+			dev_alert(&card->pdev->dev, "%s, request_irq(%u) failed\n",
+				card->pdat->name, card->irq.nr);
+			return ret;
+		}
+		card->irq.requested = 1;
+	}
+	return 0;
+}
+
+static void softing_card_shutdown(struct softing *card)
+{
+	int fw_up = 0;
+	dev_dbg(&card->pdev->dev, "%s()\n", __func__);
+	if (mutex_lock_interruptible(&card->fw.lock))
+		/* return -ERESTARTSYS*/;
+	fw_up = card->fw.up;
+	card->fw.up = 0;
+
+	if (card->irq.requested && card->irq.nr) {
+		free_irq(card->irq.nr, card);
+		card->irq.requested = 0;
+	}
+	if (fw_up) {
+		if (card->pdat->enable_irq)
+			card->pdat->enable_irq(card->pdev, 0);
+		softing_set_reset_dpram(card);
+		if (card->pdat->reset)
+			card->pdat->reset(card->pdev, 1);
+	}
+	mutex_unlock(&card->fw.lock);
+	tasklet_kill(&card->irq.bh);
+}
+
+static int softing_card_boot(struct softing *card)
+{
+	unsigned char *lp;
+	static const unsigned char stream[] = {
+		0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, };
+	unsigned char back[sizeof(stream)];
+	dev_dbg(&card->pdev->dev, "%s()\n", __func__);
+
+	if (mutex_lock_interruptible(&card->fw.lock))
+		return -ERESTARTSYS;
+	if (card->fw.up) {
+		mutex_unlock(&card->fw.lock);
+		return 0;
+	}
+	/* reset board */
+	if (card->pdat->enable_irq)
+		card->pdat->enable_irq(card->pdev, 1);
+	/* boot card */
+	softing_set_reset_dpram(card);
+	if (card->pdat->reset)
+		card->pdat->reset(card->pdev, 1);
+	for (lp = card->dpram.virt; &lp[sizeof(stream)] <= card->dpram.end;
+		lp += sizeof(stream)) {
+
+		memcpy_toio(lp, stream, sizeof(stream));
+		/* flush IO cache */
+		mb();
+		memcpy_fromio(back, lp, sizeof(stream));
+
+		if (!memcmp(back, stream, sizeof(stream)))
+			continue;
+		/* memory is not equal */
+		dev_alert(&card->pdev->dev, "write to dpram failed at 0x%04lx\n",
+			(unsigned long)(lp - card->dpram.virt));
+		goto open_failed;
+	}
+	wmb();
+	/*load boot firmware */
+	if (softing_load_fw(card->pdat->boot.fw, card, card->dpram.virt,
+				 card->dpram.size,
+				 card->pdat->boot.offs -
+				 card->pdat->boot.addr))
+		goto open_failed;
+	/*load load firmware */
+	if (softing_load_fw(card->pdat->load.fw, card, card->dpram.virt,
+				 card->dpram.size,
+				 card->pdat->load.offs -
+				 card->pdat->load.addr))
+		goto open_failed;
+
+	if (card->pdat->reset)
+		card->pdat->reset(card->pdev, 0);
+	softing_clr_reset_dpram(card);
+	if (softing_bootloader_command(card, 0, "card boot"))
+		goto open_failed;
+	if (softing_load_app_fw(card->pdat->app.fw, card))
+		goto open_failed;
+	/*reset chip */
+	card->dpram.info->reset_rcv_fifo = 0;
+	card->dpram.info->reset = 1;
+	/*sync */
+	if (softing_fct_cmd(card, 99, 0x55, "sync-a"))
+		goto open_failed;
+	if (softing_fct_cmd(card, 99, 0xaa, "sync-b"))
+		goto open_failed;
+	/*reset chip */
+	if (softing_fct_cmd(card, 0, 0, "reset_chip"))
+		goto open_failed;
+	/*get_serial */
+	if (softing_fct_cmd(card, 43, 0, "get_serial_number"))
+		goto open_failed;
+	card->id.serial =
+		 (u16) card->dpram.fct->param[1] +
+		 (((u16) card->dpram.fct->param[2]) << 16);
+	/*get_version */
+	if (softing_fct_cmd(card, 12, 0, "get_version"))
+		goto open_failed;
+	card->id.fw = (u16) card->dpram.fct->param[1];
+	card->id.hw = (u16) card->dpram.fct->param[2];
+	card->id.lic = (u16) card->dpram.fct->param[3];
+	card->id.chip[0] = (u16) card->dpram.fct->param[4];
+	card->id.chip[1] = (u16) card->dpram.fct->param[5];
+
+	dev_info(&card->pdev->dev, "card booted, type %s, "
+			"serial %u, fw %u, hw %u, lic %u, chip (%u,%u)\n",
+		  card->pdat->name, card->id.serial, card->id.fw, card->id.hw,
+		  card->id.lic, card->id.chip[0], card->id.chip[1]);
+
+	card->fw.up = 1;
+	mutex_unlock(&card->fw.lock);
+	return 0;
+open_failed:
+	card->fw.up = 0;
+	if (card->pdat->enable_irq)
+		card->pdat->enable_irq(card->pdev, 0);
+	softing_set_reset_dpram(card);
+	if (card->pdat->reset)
+		card->pdat->reset(card->pdev, 1);
+	mutex_unlock(&card->fw.lock);
+	return -EIO;
+}
+
+/*
+ * netdev sysfs
+ */
+static ssize_t show_channel(struct device *dev
+		, struct device_attribute *attr, char *buf)
+{
+	struct net_device *ndev = to_net_dev(dev);
+	struct softing_priv *priv = netdev2softing(ndev);
+	return sprintf(buf, "%i\n", priv->index);
+}
+
+static ssize_t show_chip(struct device *dev
+		, struct device_attribute *attr, char *buf)
+{
+	struct net_device *ndev = to_net_dev(dev);
+	struct softing_priv *priv = netdev2softing(ndev);
+	return sprintf(buf, "%i\n", priv->chip);
+}
+
+static ssize_t show_output(struct device *dev
+		, struct device_attribute *attr, char *buf)
+{
+	struct net_device *ndev = to_net_dev(dev);
+	struct softing_priv *priv = netdev2softing(ndev);
+	return sprintf(buf, "0x%02x\n", priv->output);
+}
+
+static ssize_t store_output(struct device *dev
+		, struct device_attribute *attr
+		, const char *buf, size_t count)
+{
+	struct net_device *ndev = to_net_dev(dev);
+	struct softing_priv *priv = netdev2softing(ndev);
+	struct softing *card = priv->card;
+	unsigned long val;
+	int ret;
+
+	ret = strict_strtoul(buf, 0, &val);
+	if (ret < 0)
+		return ret;
+	val &= 0xFF;
+
+	ret = mutex_lock_interruptible(&card->fw.lock);
+	if (ret)
+		return -ERESTARTSYS;
+	if (netif_running(ndev)) {
+		mutex_unlock(&card->fw.lock);
+		return -EBUSY;
+	}
+	priv->output = val;
+	mutex_unlock(&card->fw.lock);
+	return count;
+}
+
+static const DEVICE_ATTR(channel, S_IRUGO, show_channel, 0);
+static const DEVICE_ATTR(chip, S_IRUGO, show_chip, 0);
+static const DEVICE_ATTR(output, S_IRUGO | S_IWUSR, show_output, store_output);
+
+static const struct attribute *const netdev_sysfs_attrs[] = {
+	&dev_attr_channel.attr,
+	&dev_attr_chip.attr,
+	&dev_attr_output.attr,
+	0,
+};
+static const struct attribute_group netdev_sysfs_group = {
+	.name  = 0,
+	.attrs = (struct attribute **)netdev_sysfs_attrs,
+};
+
+static const struct net_device_ops softing_netdev_ops = {
+	.ndo_open = softing_netdev_open,
+	.ndo_stop = softing_netdev_stop,
+	.ndo_start_xmit	= softing_netdev_start_xmit,
+};
+
+static const struct can_bittiming_const softing_btr_const = {
+	.tseg1_min = 1,
+	.tseg1_max = 16,
+	.tseg2_min = 1,
+	.tseg2_max = 8,
+	.sjw_max = 4, /* overruled */
+	.brp_min = 1,
+	.brp_max = 32, /* overruled */
+	.brp_inc = 1,
+};
+
+
+static struct net_device *softing_netdev_create(
+		struct softing *card, u16 chip_id)
+{
+	struct net_device *netdev;
+	struct softing_priv *priv;
+
+	netdev = alloc_candev(sizeof(*priv), TX_ECHO_SKB_MAX);
+	if (!netdev) {
+		dev_alert(&card->pdev->dev, "alloc_candev failed\n");
+		return 0;
+	}
+	priv = netdev_priv(netdev);
+	priv->netdev = netdev;
+	priv->card = card;
+	memcpy(&priv->btr_const, &softing_btr_const, sizeof(priv->btr_const));
+	priv->btr_const.brp_max = card->pdat->max_brp;
+	priv->btr_const.sjw_max = card->pdat->max_sjw;
+	priv->can.bittiming_const = &priv->btr_const;
+	priv->can.clock.freq = 8000000;
+	priv->chip = chip_id;
+	priv->output = softing_default_output(netdev);
+	SET_NETDEV_DEV(netdev, &card->pdev->dev);
+
+	netdev->flags |= IFF_ECHO;
+	netdev->netdev_ops	= &softing_netdev_ops;
+	priv->can.do_set_mode	= softing_candev_set_mode;
+	priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES |
+		CAN_CTRLMODE_BERR_REPORTING;
+
+	return netdev;
+}
+
+static int softing_netdev_register(struct net_device *netdev)
+{
+	int ret;
+
+	/*
+	 * provide bus-specific sysfs attributes _during_ the uevent
+	 */
+	netdev->sysfs_groups[0] = &netdev_sysfs_group;
+	ret = register_candev(netdev);
+	if (ret) {
+		dev_alert(&netdev->dev, "register failed\n");
+		return ret;
+	}
+	return 0;
+}
+
+static void softing_netdev_cleanup(struct net_device *netdev)
+{
+	unregister_candev(netdev);
+	free_candev(netdev);
+}
+
+/*
+ * sysfs for Platform device
+ */
+#define DEV_ATTR_RO(name, member) \
+static ssize_t show_##name(struct device *dev, \
+		struct device_attribute *attr, char *buf) \
+{ \
+	struct softing *card = platform_get_drvdata(to_platform_device(dev)); \
+	return sprintf(buf, "%u\n", card->member); \
+} \
+static DEVICE_ATTR(name, 0444, show_##name, 0)
+
+DEV_ATTR_RO(serial	, id.serial);
+DEV_ATTR_RO(firmware	, id.fw);
+DEV_ATTR_RO(hardware	, id.hw);
+DEV_ATTR_RO(license	, id.lic);
+DEV_ATTR_RO(freq	, id.freq);
+DEV_ATTR_RO(txpending	, tx.pending);
+
+static struct attribute *softing_pdev_attrs[] = {
+	&dev_attr_serial.attr,
+	&dev_attr_firmware.attr,
+	&dev_attr_hardware.attr,
+	&dev_attr_license.attr,
+	&dev_attr_freq.attr,
+	&dev_attr_txpending.attr,
+	0,
+};
+
+static const struct attribute_group softing_pdev_group = {
+	.attrs = softing_pdev_attrs,
+};
+
+/*
+ * platform driver
+ */
+static int softing_pdev_remove(struct platform_device *pdev)
+{
+	struct softing *card = platform_get_drvdata(pdev);
+	int j;
+
+	/*first, disable card*/
+	softing_card_shutdown(card);
+	tasklet_kill(&card->irq.bh);
+
+	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+		if (!card->net[j])
+			continue;
+		softing_netdev_cleanup(card->net[j]);
+		card->net[j] = 0;
+	}
+	sysfs_remove_group(&pdev->dev.kobj, &softing_pdev_group);
+
+	iounmap(card->dpram.virt);
+	kfree(card);
+	return 0;
+}
+
+static int softing_pdev_probe(struct platform_device *pdev)
+{
+	const struct softing_platform_data *pdat = pdev->dev.platform_data;
+	struct softing *card;
+	struct net_device *netdev;
+	struct softing_priv *priv;
+	struct resource *pres;
+	int ret;
+	int j;
+
+	if (!pdat) {
+		dev_warn(&pdev->dev, "no platform data\n");
+		return -EINVAL;
+	}
+	if (pdat->nbus > ARRAY_SIZE(card->net)) {
+		dev_warn(&pdev->dev, "%u nets??\n", pdat->nbus);
+		return -EINVAL;
+	}
+
+	card = kzalloc(sizeof(*card), GFP_KERNEL);
+	if (!card)
+		return -ENOMEM;
+	card->pdat = pdat;
+	card->pdev = pdev;
+	platform_set_drvdata(pdev, card);
+	/* try_module_get(THIS_MODULE); */
+	mutex_init(&card->fw.lock);
+	spin_lock_init(&card->spin);
+	tasklet_init(&card->irq.bh, softing_handler, (unsigned long)card);
+
+	ret = -EINVAL;
+	pres = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!pres)
+		goto ioremap_failed;
+	card->dpram.phys = pres->start;
+	card->dpram.size = pres->end - pres->start + 1;
+	card->dpram.virt = ioremap_nocache(card->dpram.phys, card->dpram.size);
+	if (!card->dpram.virt) {
+		dev_alert(&card->pdev->dev, "dpram ioremap failed\n");
+		goto ioremap_failed;
+	}
+	card->dpram.end = &card->dpram.virt[card->dpram.size];
+	/*initialize_board */
+	card->dpram.rx = (struct softing_rx *)&card->dpram.virt[0x0000];
+	card->dpram.tx = (struct softing_tx *)&card->dpram.virt[0x0400];
+	card->dpram.fct = (struct softing_fct *)&card->dpram.virt[0x0300];
+	card->dpram.info = (struct softing_info *)&card->dpram.virt[0x0330];
+	card->dpram.command = (unsigned short *)&card->dpram.virt[0x07e0];
+	card->dpram.receipt = (unsigned short *)&card->dpram.virt[0x07f0];
+	card->dpram.irq = (struct softing_irq *)&card->dpram.virt[0x07fe];
+
+	pres = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
+	if (pres)
+		card->irq.nr = pres->start;
+
+	/*reset card */
+	ret = -EIO;
+	if (softing_card_boot(card)) {
+		dev_alert(&pdev->dev, "failed to boot\n");
+		goto boot_failed;
+	}
+
+	/*only now, the chips are known */
+	card->id.freq = card->pdat->freq * 1000000UL;
+
+	ret = sysfs_create_group(&pdev->dev.kobj, &softing_pdev_group);
+	if (ret < 0) {
+		dev_alert(&card->pdev->dev, "sysfs failed\n");
+		goto sysfs_failed;
+	}
+
+	ret = -ENOMEM;
+	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+		card->net[j] = netdev =
+			softing_netdev_create(card, card->id.chip[j]);
+		if (!netdev) {
+			dev_alert(&pdev->dev, "failed to make can[%i]\n", j);
+			goto netdev_failed;
+		}
+		priv = netdev_priv(card->net[j]);
+		priv->index = j;
+		ret = softing_netdev_register(netdev);
+		if (ret) {
+			free_candev(netdev);
+			card->net[j] = 0;
+			dev_alert(&card->pdev->dev,
+				"failed to register can[%i]\n", j);
+			goto netdev_failed;
+		}
+	}
+	dev_info(&card->pdev->dev, "card initialised\n");
+	return 0;
+
+netdev_failed:
+	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
+		if (!card->net[j])
+			continue;
+		softing_netdev_cleanup(card->net[j]);
+	}
+	sysfs_remove_group(&pdev->dev.kobj, &softing_pdev_group);
+sysfs_failed:
+	softing_card_shutdown(card);
+boot_failed:
+	iounmap(card->dpram.virt);
+ioremap_failed:
+	tasklet_kill(&card->irq.bh);
+	kfree(card);
+	return ret;
+}
+
+static struct platform_driver softing_driver = {
+	.driver = {
+		.name = "softing",
+		.owner = THIS_MODULE,
+	},
+	.probe = softing_pdev_probe,
+	.remove = softing_pdev_remove,
+};
+
+MODULE_ALIAS("platform:softing");
+
+static int __init softing_start(void)
+{
+	return platform_driver_register(&softing_driver);
+}
+
+static void __exit softing_stop(void)
+{
+	platform_driver_unregister(&softing_driver);
+}
+
+module_init(softing_start);
+module_exit(softing_stop);
+
+MODULE_DESCRIPTION("socketcan softing driver");
+MODULE_AUTHOR("Kurt Van Dijck <kurt.van.dijck-/BeEPy95v10@public.gmane.org>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/can/softing/softing_platform.h b/drivers/net/can/softing/softing_platform.h
new file mode 100644
index 0000000..9ff69a1
--- /dev/null
+++ b/drivers/net/can/softing/softing_platform.h
@@ -0,0 +1,38 @@
+
+#include <linux/platform_device.h>
+
+#ifndef _SOFTING_DEVICE_H_
+#define _SOFTING_DEVICE_H_
+
+/* softing firmware directory prefix */
+#define fw_dir "softing-4.6/"
+
+struct softing_platform_data {
+	unsigned int manf;
+	unsigned int prod;
+	/* generation
+	 * 1st with NEC or SJA1000
+	 * 8bit, exclusive interrupt, ...
+	 * 2nd only SJA1000
+	 * 16bit, shared interrupt
+	 */
+	int generation;
+	int nbus; /* # busses on device */
+	unsigned int freq; /* crystal in MHz */
+	unsigned int max_brp;
+	unsigned int max_sjw;
+	unsigned long dpram_size;
+	char name[32];
+	struct {
+		unsigned long offs;
+		unsigned long addr;
+		const char *fw;
+	} boot, load, app;
+	/* reset() function, bring pdev in or out of reset, depending on
+	   value */
+	int (*reset)(struct platform_device *pdev, int value);
+	int (*enable_irq)(struct platform_device *pdev, int value);
+};
+
+#endif
+

^ permalink raw reply related

* [PATCH net-next-2.6 0/2] can: add driver for Softing card
From: Kurt Van Dijck @ 2010-12-23  9:36 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w

This series adds a driver for the Softing PCMCIA CAN card.

The core CAN networking code has existed for a few years in the
socketCAN repository.
The updates since the latest socketCAN version:
* PCMCIA interfacing changed
* separation between the two drivers via a platform:softing device
* added conditional bus-error reporting

About the platform_device ...
Softing GmbH has PCMCIA & PCI cards. Both share the same
Dual Port RAM (DPRAM) interface. Therefore, the driver is split into 2 stages:

[1/2] softing.ko: Generic platform bus device driver
It expects a platform:softing device with an IO range that contains
the DPRAM, and an IRQ line.

[2/2] softing_cs.ko: PCMCIA driver
This driver will create a platform:softing device on top of the
pcmcia device.

The 2 drivers are not linked in a way that makes softing.ko depend
on softing_cs.ko or vice versa. The reason for this split is that
the DPRAM interface takes quite some code, and building it directly
on the PCMCIA or PCI device was difficult to follow.
The present design eliminates the need for exotic sysfs APIs since
all sysfs attributes know they are attached to a platform_device.

Kurt

^ permalink raw reply

* Re: [PATCH] tcp: cleanup of cwnd initialization in tcp_init_metrics()
From: Jiri Kosina @ 2010-12-23  9:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, linux-kernel, Vojtech Pavlik,
	Ilpo Järvinen
In-Reply-To: <1293095663.7789.3.camel@edumazet-laptop>

On Thu, 23 Dec 2010, Eric Dumazet wrote:

> > > > Commit 86bcebafc5e7f5 ("tcp: fix >2 iw selection") fixed a case when 
> > > > congestion window initialization has been mistakenly omitted by 
> > > > introducing cwnd label and putting backwards jump from the end of the 
> > > > function.
> > > > 
> > > > This makes the code unnecessarily tricky to read and understand on a first 
> > > > sight.
> > > > 
> > > > Shuffle the code around a little bit to make it more obvious.
> > > 
> > > Well in fine you have
> > > 
> > > 	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp)
> > > 		goto reset;
> > > 	goto out;
> > > reset:
> > > 
> > > Is that really more obvious ? ;)
> > 
> > To me it seems much more obvious than goto from the very end of the 
> > function somewhere into the middle and returning from there, but 
> > definitely a matter of personal taste.
> > 
> 
> You dont understand what I said. Please read again.
> 
> To me I prefer you _finish_ the cleanup so that we have :
> 
> 	if (some condition) {
> reset:
> 	}
> 
> out:
> 
> You remove two "goto" in the process.
> 
> Is that clear now ?

Right, that's even better. Updated patch below.


From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] tcp: cleanup of cwnd initialization in tcp_init_metrics()

Commit 86bcebafc5e7f5 ("tcp: fix >2 iw selection") fixed a case
where congestion window initialization had been mistakenly omitted,
by introducing a cwnd label and a backwards goto from the
end of the function.

This makes the code unnecessarily tricky to read and understand
at first sight.

Shuffle the code around a little bit to make it more obvious.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 net/ipv4/tcp_input.c |   29 ++++++++++++-----------------
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6d8ab1c..3d1e015 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -912,25 +912,20 @@ static void tcp_init_metrics(struct sock *sk)
 		tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
 	}
 	tcp_set_rto(sk);
-	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp)
-		goto reset;
-
-cwnd:
-	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
-	tp->snd_cwnd_stamp = tcp_time_stamp;
-	return;
-
+	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) {
 reset:
-	/* Play conservative. If timestamps are not
-	 * supported, TCP will fail to recalculate correct
-	 * rtt, if initial rto is too small. FORGET ALL AND RESET!
-	 */
-	if (!tp->rx_opt.saw_tstamp && tp->srtt) {
-		tp->srtt = 0;
-		tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
-		inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+		/* Play conservative. If timestamps are not
+		 * supported, TCP will fail to recalculate correct
+		 * rtt, if initial rto is too small. FORGET ALL AND RESET!
+		 */
+		if (!tp->rx_opt.saw_tstamp && tp->srtt) {
+			tp->srtt = 0;
+			tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
+			inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+		}
 	}
-	goto cwnd;
+	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
+	tp->snd_cwnd_stamp = tcp_time_stamp;
 }
 
 static void tcp_update_reordering(struct sock *sk, const int metric,

-- 
Jiri Kosina
SUSE Labs, Novell Inc.
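
For readers who want to convince themselves that the reshuffle above does not
change behaviour, here is a minimal userland sketch. It is not the kernel
function: struct model, TIMEOUT_INIT, INIT_CWND and the have_dst flag are all
hypothetical stand-ins, and have_dst only models the assumption that some
earlier path in the function also jumps to the reset label (which is why the
label survives inside the if block).

```c
#include <stdbool.h>

#define TIMEOUT_INIT 3000 /* stand-in for TCP_TIMEOUT_INIT (arbitrary units) */
#define INIT_CWND    10   /* stand-in for the value tcp_init_cwnd() returns */

/* Hypothetical model of the few fields the control flow touches. */
struct model {
	bool have_dst;   /* false models an earlier "goto reset" in the function */
	bool saw_tstamp;
	int srtt;
	int rto;
	int cwnd;        /* 0 means "not initialised yet" */
};

/* Conservative reset shared by both layouts. */
static void reset_rtt(struct model *m)
{
	if (!m->saw_tstamp && m->srtt) {
		m->srtt = 0;
		m->rto = TIMEOUT_INIT;
	}
}

/* Old layout: the reset block sits at the end and jumps backwards. */
void init_backward_goto(struct model *m)
{
	if (!m->have_dst)
		goto reset;
	if (m->rto < TIMEOUT_INIT && !m->saw_tstamp)
		goto reset;
cwnd:
	m->cwnd = INIT_CWND;
	return;
reset:
	reset_rtt(m);
	goto cwnd;
}

/* New layout: the reset block is the body of the if, cwnd init falls through. */
void init_structured(struct model *m)
{
	if (!m->have_dst)
		goto reset;
	if (m->rto < TIMEOUT_INIT && !m->saw_tstamp) {
reset:
		reset_rtt(m);
	}
	m->cwnd = INIT_CWND;
}

/* Run both layouts over every input combination; return the mismatch count. */
int count_mismatches(void)
{
	int bad = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct model a = {
			.have_dst   = bits & 1,
			.saw_tstamp = bits & 2,
			.srtt       = (bits & 4) ? 50 : 0,
			.rto        = (bits & 8) ? 200 : 2 * TIMEOUT_INIT,
			.cwnd       = 0,
		};
		struct model b = a;

		init_backward_goto(&a);
		init_structured(&b);
		if (a.srtt != b.srtt || a.rto != b.rto ||
		    a.cwnd != b.cwnd || a.cwnd != INIT_CWND)
			bad++;
	}
	return bad;
}
```

Calling count_mismatches() from a small main() should return 0 if the two
layouts agree for all 16 input combinations and always initialise cwnd.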

^ permalink raw reply related

* Re: ip rule and/or route problem in 2.6.37-rc5+
From: Maciej Żenczykowski @ 2010-12-23  9:22 UTC (permalink / raw)
  To: David Miller; +Cc: Tom Herbert, greearb, netdev
In-Reply-To: <AANLkTikNk8Ltei+jxpJuVqNxbhc-hhMpjc89YDvbirQK@mail.gmail.com>

On Mon, Dec 20, 2010 at 18:22, Tom Herbert <therbert@google.com> wrote:
>> Tom, please acknowledge this regression you've added to the tree.
>
> Acknowledged.  Looking at it.
>
> Tom

This is definitely a regression, and the fault is definitely with this code.

Indeed, the code is fundamentally flawed.  I've been trying to come up
with a fix, but I'm beginning to think we're barking up the wrong tree
here.
Currently my preference is leaning towards using setsockopt(SOL_IP,
IP_TRANSPARENT) [possibly with some sk->transparent tcp inheritance
fixes] instead.
I definitely want this functionality though, since it's wonderfully
useful for testing (and for serving as well).

Could we please revert this ( 4465b469008bc03b98a1b8df4e9ae501b6c69d4b
) and make sure the revert makes it into 2.6.37?  We definitely don't
want to ship 2.6.37 with this patch in its current state.

-- Maciej

^ permalink raw reply

* Re: [PATCH] tcp: cleanup of cwnd initialization in tcp_init_metrics()
From: Eric Dumazet @ 2010-12-23  9:14 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: David Miller, netdev, linux-kernel, Vojtech Pavlik,
	Ilpo Järvinen
In-Reply-To: <alpine.LNX.2.00.1012231002310.16569@pobox.suse.cz>

Le jeudi 23 décembre 2010 à 10:03 +0100, Jiri Kosina a écrit :
> On Thu, 23 Dec 2010, Eric Dumazet wrote:
> 
> > Le mercredi 22 décembre 2010 à 19:39 +0100, Jiri Kosina a écrit :
> > > Commit 86bcebafc5e7f5 ("tcp: fix >2 iw selection") fixed a case when 
> > > congestion window initialization has been mistakenly omitted by 
> > > introducing cwnd label and putting backwards jump from the end of the 
> > > function.
> > > 
> > > This makes the code unnecessarily tricky to read and understand on a first 
> > > sight.
> > > 
> > > Shuffle the code around a little bit to make it more obvious.
> > 
> > Well in fine you have
> > 
> > 	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp)
> > 		goto reset;
> > 	goto out;
> > reset:
> > 
> > Is that really more obvious ? ;)
> 
> To me it seems much more obvious than goto from the very end of the 
> function somewhere into the middle and returning from there, but 
> definitely a matter of personal taste.
> 

You dont understand what I said. Please read again.

To me I prefer you _finish_ the cleanup so that we have :

	if (some condition) {
reset:
	}

out:

You remove two "goto" in the process.

Is that clear now ?

^ permalink raw reply

* RE: Using ethernet device as efficient small packet generator
From: Jon Zhou @ 2010-12-23  8:57 UTC (permalink / raw)
  To: juice@swagman.org, Eric Dumazet, Stephen Hemminger,
	netdev@vger.kernel.org
In-Reply-To: <e1a7dc28b8b95e93d38edc418d59e89a.squirrel@www.liukuma.net>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of juice
> Sent: Thursday, December 23, 2010 1:16 PM
> To: Eric Dumazet; Stephen Hemminger; netdev@vger.kernel.org
> Subject: Re: Using ethernet device as efficient small packet generator
> 
> > Reaching 1Gbs should not be a problem (I was speaking about 10Gbps)
> > I reach link speed with my tg3 card and one single cpu :)
> > (Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3))
> >
> > Please provide : ethtool -S eth0
> >
> 
> This is from the e1000 interface:
> 03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
> Controller (Copper) (rev 01)
> 
> root@a2labralinux:/home/juice# ethtool -S eth1
> NIC statistics:
>      rx_packets: 192069
>      tx_packets: 60000313
>      rx_bytes: 33850492
>      tx_bytes: 3840026215
>      rx_broadcast: 192069
>      tx_broadcast: 3
>      rx_multicast: 0
>      tx_multicast: 310
>      rx_errors: 0
>      tx_errors: 0
>      tx_dropped: 0
>      multicast: 0
>      collisions: 0
>      rx_length_errors: 0
>      rx_over_errors: 0
>      rx_crc_errors: 0
>      rx_frame_errors: 0
>      rx_no_buffer_count: 0
>      rx_missed_errors: 0
>      tx_aborted_errors: 0
>      tx_carrier_errors: 0
>      tx_fifo_errors: 0
>      tx_heartbeat_errors: 0
>      tx_window_errors: 0
>      tx_abort_late_coll: 0
>      tx_deferred_ok: 0
>      tx_single_coll_ok: 0
>      tx_multi_coll_ok: 0
>      tx_timeout_count: 0
>      tx_restart_queue: 1806437
>      rx_long_length_errors: 0
>      rx_short_length_errors: 0
>      rx_align_errors: 0
>      tx_tcp_seg_good: 0
>      tx_tcp_seg_failed: 0
>      rx_flow_control_xon: 0
>      rx_flow_control_xoff: 0
>      tx_flow_control_xon: 0
>      tx_flow_control_xoff: 0
>      rx_long_byte_count: 33850492
>      rx_csum_offload_good: 8978
>      rx_csum_offload_errors: 0
>      rx_header_split: 0
>      alloc_rx_buff_failed: 0
>      tx_smbus: 0
>      rx_smbus: 0
>      dropped_smbus: 0
> 
> 
> This is from the tg3 interface:
> 05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5761
> Gigabit Ethernet PCIe (rev 10)
> 
> root@d8labralinux:/home/juice# ethtool -S eth2
> NIC statistics:
>      rx_octets: 10814
>      rx_fragments: 0
>      rx_ucast_packets: 20
>      rx_mcast_packets: 0
>      rx_bcast_packets: 26
>      rx_fcs_errors: 0
>      rx_align_errors: 0
>      rx_xon_pause_rcvd: 0
>      rx_xoff_pause_rcvd: 0
>      rx_mac_ctrl_rcvd: 0
>      rx_xoff_entered: 0
>      rx_frame_too_long_errors: 0
>      rx_jabbers: 0
>      rx_undersize_packets: 0
>      rx_in_length_errors: 0
>      rx_out_length_errors: 0
>      rx_64_or_less_octet_packets: 0
>      rx_65_to_127_octet_packets: 0
>      rx_128_to_255_octet_packets: 0
>      rx_256_to_511_octet_packets: 0
>      rx_512_to_1023_octet_packets: 0
>      rx_1024_to_1522_octet_packets: 0
>      rx_1523_to_2047_octet_packets: 0
>      rx_2048_to_4095_octet_packets: 0
>      rx_4096_to_8191_octet_packets: 0
>      rx_8192_to_9022_octet_packets: 0
>      tx_octets: 5120013863
>      tx_collisions: 0
>      tx_xon_sent: 0
>      tx_xoff_sent: 0
>      tx_flow_control: 0
>      tx_mac_errors: 0
>      tx_single_collisions: 0
>      tx_mult_collisions: 0
>      tx_deferred: 0
>      tx_excessive_collisions: 0
>      tx_late_collisions: 0
>      tx_collide_2times: 0
>      tx_collide_3times: 0
>      tx_collide_4times: 0
>      tx_collide_5times: 0
>      tx_collide_6times: 0
>      tx_collide_7times: 0
>      tx_collide_8times: 0
>      tx_collide_9times: 0
>      tx_collide_10times: 0
>      tx_collide_11times: 0
>      tx_collide_12times: 0
>      tx_collide_13times: 0
>      tx_collide_14times: 0
>      tx_collide_15times: 0
>      tx_ucast_packets: 80000034
>      tx_mcast_packets: 42
>      tx_bcast_packets: 40
>      tx_carrier_sense_errors: 0
>      tx_discards: 0
>      tx_errors: 0
>      dma_writeq_full: 0
>      dma_write_prioq_full: 0
>      rxbds_empty: 0
>      rx_discards: 0
>      rx_errors: 0
>      rx_threshold_hit: 0
>      dma_readq_full: 0
>      dma_read_prioq_full: 0
>      tx_comp_queue_full: 0
>      ring_set_send_prod_index: 0
>      ring_status_update: 0
>      nic_irqs: 0
>      nic_avoided_irqs: 0
>      nic_tx_threshold_hit: 0

ethtool -S on my Intel X520 10G NIC shows that there are 8 rx/tx queues.

I just reached 5M pps with 64-byte packets by following the link given by Eric Dumazet.
(Setup: the two ports of the NIC connected to each other, Xeon E5540, kernel 2.6.32, IRQ affinity set. Note that I have an abnormal ksoftirqd/2 which occupies 30% CPU even when idle, so the result still has room to improve.)

On another, older kernel (2.6.16) with tg3 and bnx2 1G NICs and a Xeon E5450, I only got 490K pps (about 300 Mbps, 30% of GbE). I think the reason is that multiqueue is unsupported in this kernel.

I will do a test with a 1Gb NIC on the new kernel later.
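
As a rough cross-check of the figures above, the packet rates can be converted
to wire bandwidth with a few lines of C. This is only a sketch under stated
assumptions: the 64-byte frame size is taken to include the FCS, and each frame
is charged 20 extra bytes on the wire (preamble, start-of-frame delimiter and
inter-frame gap). The function names are made up for illustration.

```c
#include <stdint.h>

/* Per-frame wire overhead: 7 preamble + 1 SFD + 12 inter-frame gap bytes. */
#define ETH_OVERHEAD_BYTES 20

/* Bits per second consumed on the wire at a given packet rate. */
uint64_t wire_bps(uint64_t pps, unsigned frame_bytes)
{
	return pps * (frame_bytes + ETH_OVERHEAD_BYTES) * 8;
}

/* Maximum packets per second a link can carry for a given frame size. */
uint64_t line_rate_pps(uint64_t link_bps, unsigned frame_bytes)
{
	return link_bps / ((uint64_t)(frame_bytes + ETH_OVERHEAD_BYTES) * 8);
}

/* Average bytes per packet from a pair of ethtool-style counters. */
uint64_t avg_frame_bytes(uint64_t bytes, uint64_t packets)
{
	return packets ? bytes / packets : 0;
}
```

With these assumptions, 490K pps of 64-byte frames is 329,280,000 bit/s on the
wire, against a GbE ceiling of 1,488,095 pps, so roughly a third of line rate,
which is in line with the "about 30%" estimate. The quoted e1000 counters
(tx_bytes 3840026215 over tx_packets 60000313) average out to 64 bytes per
packet.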

> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] tcp: cleanup of cwnd initialization in tcp_init_metrics()
From: Jiri Kosina @ 2010-12-23  9:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, linux-kernel, Vojtech Pavlik,
	Ilpo Järvinen
In-Reply-To: <1293092671.2679.44.camel@edumazet-laptop>

On Thu, 23 Dec 2010, Eric Dumazet wrote:

> Le mercredi 22 décembre 2010 à 19:39 +0100, Jiri Kosina a écrit :
> > Commit 86bcebafc5e7f5 ("tcp: fix >2 iw selection") fixed a case when 
> > congestion window initialization has been mistakenly omitted by 
> > introducing cwnd label and putting backwards jump from the end of the 
> > function.
> > 
> > This makes the code unnecessarily tricky to read and understand on a first 
> > sight.
> > 
> > Shuffle the code around a little bit to make it more obvious.
> 
> Well in fine you have
> 
> 	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp)
> 		goto reset;
> 	goto out;
> reset:
> 
> Is that really more obvious ? ;)

To me it seems much more obvious than goto from the very end of the 
function somewhere into the middle and returning from there, but 
definitely a matter of personal taste.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH] ipv4: dont create routes on down devices
From: Octavian Purdila @ 2010-12-23  8:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: nicolas.dichtel, David Miller, netdev
In-Reply-To: <1293028779.3027.133.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wednesday 22 December 2010, 16:39:39

> [PATCH] ipv4: dont create routes on down devices
> 
> In ip_route_output_slow(), instead of allowing a route to be created on
> a not UPed device, report -ENETUNREACH immediately.
> 
> # ip tunnel add mode ipip remote 10.16.0.164 local
> 10.16.0.72 dev eth0
> # (Note : tunl1 is down)
> # ping -I tunl1 10.1.2.3
> PING 10.1.2.3 (10.1.2.3) from 192.168.18.5 tunl1: 56(84) bytes of data.
> (nothing)
> # ./a.out tunl1
> # ip tunnel del tunl1
> Message from syslogd@shelby at Dec 22 10:12:08 ...
>   kernel: unregister_netdevice: waiting for tunl1 to become free.
> Usage count = 3
> 
> After patch:
> # ping -I tunl1 10.1.2.3
> connect: Network is unreachable
> 
> 
> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Cc: Octavian Purdila <opurdila@ixiacom.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks Eric !

Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>

^ permalink raw reply

* Re: [PATCH] tcp: cleanup of cwnd initialization in tcp_init_metrics()
From: Eric Dumazet @ 2010-12-23  8:24 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: David Miller, netdev, linux-kernel, Vojtech Pavlik,
	Ilpo Järvinen
In-Reply-To: <alpine.LNX.2.00.1012221938030.16569@pobox.suse.cz>

Le mercredi 22 décembre 2010 à 19:39 +0100, Jiri Kosina a écrit :
> Commit 86bcebafc5e7f5 ("tcp: fix >2 iw selection") fixed a case when 
> congestion window initialization has been mistakenly omitted by 
> introducing cwnd label and putting backwards jump from the end of the 
> function.
> 
> This makes the code unnecessarily tricky to read and understand on a first 
> sight.
> 
> Shuffle the code around a little bit to make it more obvious.

Well in fine you have

	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp)
		goto reset;
	goto out;
reset:

Is that really more obvious ? ;)

> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
>  net/ipv4/tcp_input.c |   10 ++++------
>  1 files changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 6d8ab1c..dddff6d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -915,11 +915,7 @@ static void tcp_init_metrics(struct sock *sk)
>  	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp)
>  		goto reset;
>  
> -cwnd:
> -	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
> -	tp->snd_cwnd_stamp = tcp_time_stamp;
> -	return;
> -
> +	goto out;
>  reset:
>  	/* Play conservative. If timestamps are not
>  	 * supported, TCP will fail to recalculate correct
> @@ -930,7 +926,9 @@ reset:
>  		tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
>  		inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
>  	}
> -	goto cwnd;
> +out:
> +	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
> +	tp->snd_cwnd_stamp = tcp_time_stamp;
>  }
>  
>  static void tcp_update_reordering(struct sock *sk, const int metric,





^ permalink raw reply

* [PATCH] bna: bnad_udelay macro cleanup
From: Rasesh Mody @ 2010-12-23  7:05 UTC (permalink / raw)
  To: netdev; +Cc: Rasesh Mody, Debashis Dutt

Change details:
	- Removed unnecessary bnad_udelay macro for ia64

Signed-off-by: Debashis Dutt <ddutt@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/bna/bnad.c |    4 ++--
 drivers/net/bna/bnad.h |    6 ------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bna/bnad.c b/drivers/net/bna/bnad.c
index 277dbe0..12d1d7a 100644
--- a/drivers/net/bna/bnad.c
+++ b/drivers/net/bna/bnad.c
@@ -925,7 +925,7 @@ bnad_cb_tx_cleanup(struct bnad *bnad, struct bna_tcb *tcb)
 {
 	/* Delay only once for the whole Tx Path Shutdown */
 	if (!test_and_set_bit(BNAD_RF_TX_SHUTDOWN_DELAYED, &bnad->run_flags))
-		bnad_udelay(BNAD_TXRX_SYNC_UDELAY);
+		udelay(BNAD_TXRX_SYNC_UDELAY);
 }
 
 static void
@@ -938,7 +938,7 @@ bnad_cb_rx_cleanup(struct bnad *bnad,
 		clear_bit(BNAD_RXQ_STARTED, &ccb->rcb[1]->flags);
 
 	if (!test_and_set_bit(BNAD_RF_RX_SHUTDOWN_DELAYED, &bnad->run_flags))
-		bnad_udelay(BNAD_TXRX_SYNC_UDELAY);
+		udelay(BNAD_TXRX_SYNC_UDELAY);
 }
 
 static void
diff --git a/drivers/net/bna/bnad.h b/drivers/net/bna/bnad.h
index 24fd983..8b1d515 100644
--- a/drivers/net/bna/bnad.h
+++ b/drivers/net/bna/bnad.h
@@ -336,10 +336,4 @@ extern void bnad_netdev_hwstats_fill(struct bnad *bnad,
 	(((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED) && 		\
 	(test_bit(BNAD_RF_DIM_TIMER_RUNNING, &((_bnad)->run_flags))))
 
-#if defined(__ia64__)
-#define bnad_udelay	udelay
-#else
-#define bnad_udelay	__udelay
-#endif
-
 #endif /* __BNAD_H__ */
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH 00/12] make rpc_pipefs be mountable multiple times
From: Kirill A. Shutemov @ 2010-12-23  6:50 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Kirill A. Shutemov, Trond Myklebust, Neil Brown, Pavel Emelyanov,
	linux-nfs, David S. Miller, netdev, linux-kernel
In-Reply-To: <20101221234520.GA30525@fieldses.org>

On Tue, Dec 21, 2010 at 06:45:21PM -0500, J. Bruce Fields wrote:
> On Wed, Dec 22, 2010 at 01:32:15AM +0200, Kirill A. Shutemov wrote:
> > On Mon, Dec 20, 2010 at 09:46:44AM -0500, J. Bruce Fields wrote:
> > > By the way, was there ever a resolution to Trond's question?:
> > > 
> > > 	http://marc.info/?l=linux-nfs&m=128655758712817&w=2
> > > 
> > > 	"The keyring upcalls are currently initiated through the same
> > > 	mechanism as module_request and therefore get started with the
> > > 	init_nsproxy namespace. We'd really like them to run inside the
> > > 	same container as the process.  As part of the same problem,
> > > 	there is the issue of what to do with the dns resolver and
> > > 	Bryan's new keyring based idmapper code."
> > 
> > I'm not sure that I understand the problem correctly.
> > 
> > Currently, idmap uses dentry taken from client's cl_rpcclient->cl_path
> > (see nfs_idmap_new()). cl_rpcclient (and cl_path) is initialized with
> > rpcmount resolved against mount namespace of mount process (see
> > nfs_create_rpc_client()).
> > I assume it's correct.
> 
> There's actually two separate sets of idmapper code; look at
> fs/nfs/idmapper.c, the first part of the file (between #ifdef
> CONFIG_NFS_USE_NEW_IDMAPPER and #else) is idmapping code that uses
> request_key().  The code you're looking at (including nfs_idmap_new())
> is later in the file, and deprecated.

IIUC, we need to save nsproxy of mount process in struct nfs_client and
pass it down to request_key(). I think it's outside of this patchset.

-- 
 Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH V7 1/8] ntp: add ADJ_SETOFFSET mode bit
From: Richard Cochran @ 2010-12-23  6:13 UTC (permalink / raw)
  To: Kuwahara,T.
  Cc: john stultz, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <AANLkTimmTzH8+fSYmbajqZ+hU5Ps-UZaTp_1TYzjHB6P-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thu, Dec 23, 2010 at 05:27:58AM +0900, Kuwahara,T. wrote:
> On Wed, Dec 22, 2010 at 7:25 AM, john stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote:
> > I don't see why that would be better than adding a
> > clear new mode flag?
> 
> In short, time step is a special case of time slew.  Those are the same,
> only different in one parameter, as is shown in my previous post.
> That's why I said there's no need for adding a new mode.

Well, in addition to the objections raised by John, your suggested
implementation is also shortsighted. The field timex.constant is
copied into time_constant in some code paths. Obviously, this would be
a bad thing when timex.constant==-huge.

So, you need to clarify the interaction between ADJ_OFFSET,
ADJ_TIMECONST, ADJ_TAI, timex.constant, time_constant, and MAXTC.

If you would fully implement your idea, I expect it would become
obvious that it is a bit of a hack, both in the kernel code and in the
user space interface. But, if you disagree, please just post a patch
with the complete implementation...

Thanks,
Richard

^ permalink raw reply

* Re: [net-next-2.6 PATCH v2 1/3] net: implement mechanism for HW based QOS
From: John Fastabend @ 2010-12-23  5:29 UTC (permalink / raw)
  To: Johannes Berg
  Cc: davem@davemloft.net, netdev@vger.kernel.org, hadi@cyberus.ca,
	shemminger@vyatta.com, tgraf@infradead.org,
	eric.dumazet@gmail.com, bhutchings@solarflare.com,
	nhorman@tuxdriver.com
In-Reply-To: <1293009149.3531.12.camel@jlt3.sipsolutions.net>

On 12/22/2010 1:12 AM, Johannes Berg wrote:
> On Tue, 2010-12-21 at 11:28 -0800, John Fastabend wrote:
>> This patch provides a mechanism for lower layer devices to
>> steer traffic using skb->priority to tx queues. This allows
>> for hardware based QOS schemes to use the default qdisc without
>> incurring the penalties related to global state and the qdisc
>> lock. While reliably receiving skbs on the correct tx ring
>> to avoid head of line blocking resulting from shuffling in
>> the LLD. Finally, all the goodness from txq caching and xps/rps
>> can still be leveraged.
> 
> Is there any chance this might be applicable to the 802.11 layer as
> well? We will definitely still need an ndo_select_queue handler to reset
> in the case where the peer doesn't support QoS, but it seems the part
> that depends on the frame itself could be pushed out to the generic
> framework instead of having net/wireless/util.c:cfg80211_classify8021d?
> 
> johannes
> 

Johannes,

I took a quick look at this and I believe it should be doable. It would be nice to remove ndo_select_queue entirely if possible, though.

I probably won't have a chance to look any further into this for at least a week, maybe two, so I'll think about it a bit more later unless someone else gets there first.
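
As a rough illustration of the steering described in the quoted patch text, here is a minimal user-space sketch, not the patch itself. It assumes a per-device table mapping skb->priority to a traffic class, and a per-class range of tx queues within which a flow hash picks one queue. All names here (qos_map, txq_range, pick_txq, the SKETCH_* limits) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PRIO 16	/* assumed number of priority values mapped */
#define SKETCH_MAX_TC    8	/* assumed upper bound on traffic classes */

/* A contiguous range of tx queues owned by one traffic class. */
struct txq_range {
	uint16_t count;
	uint16_t offset;
};

/* Hypothetical per-device QOS state. */
struct qos_map {
	uint8_t num_tc;
	uint8_t prio_to_tc[SKETCH_MAX_PRIO];
	struct txq_range tc_to_txq[SKETCH_MAX_TC];
};

/*
 * Pick a tx queue: priority selects the traffic class, the flow hash
 * spreads flows over the queues of that class. Falls back to queue 0
 * when the class is not configured.
 */
static uint16_t pick_txq(const struct qos_map *map, uint32_t priority,
			 uint32_t flow_hash)
{
	const struct txq_range *range;
	uint8_t tc;

	assert(map != NULL);
	tc = map->prio_to_tc[priority % SKETCH_MAX_PRIO];
	if (tc >= map->num_tc || tc >= SKETCH_MAX_TC)
		return 0;
	range = &map->tc_to_txq[tc];
	if (range->count == 0)
		return 0;
	return (uint16_t)(range->offset + flow_hash % range->count);
}
```

Because each class owns a disjoint queue range, packets of different priorities never share a ring in this sketch, which is the property that avoids head-of-line blocking.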

Thanks,
John.



* Re: Using ethernet device as efficient small packet generator
From: juice @ 2010-12-23  5:15 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger, netdev

> Reaching 1Gbs should not be a problem (I was speaking about 10Gbps)
> I reach link speed with my tg3 card and one single cpu :)
> (Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3))
>
> Please provide : ethtool -S eth0
>

This is from the e1000 interface:
03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
Controller (Copper) (rev 01)

root@a2labralinux:/home/juice# ethtool -S eth1
NIC statistics:
     rx_packets: 192069
     tx_packets: 60000313
     rx_bytes: 33850492
     tx_bytes: 3840026215
     rx_broadcast: 192069
     tx_broadcast: 3
     rx_multicast: 0
     tx_multicast: 310
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 1806437
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 33850492
     rx_csum_offload_good: 8978
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0


This is from the tg3 interface:
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5761
Gigabit Ethernet PCIe (rev 10)

root@d8labralinux:/home/juice# ethtool -S eth2
NIC statistics:
     rx_octets: 10814
     rx_fragments: 0
     rx_ucast_packets: 20
     rx_mcast_packets: 0
     rx_bcast_packets: 26
     rx_fcs_errors: 0
     rx_align_errors: 0
     rx_xon_pause_rcvd: 0
     rx_xoff_pause_rcvd: 0
     rx_mac_ctrl_rcvd: 0
     rx_xoff_entered: 0
     rx_frame_too_long_errors: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_in_length_errors: 0
     rx_out_length_errors: 0
     rx_64_or_less_octet_packets: 0
     rx_65_to_127_octet_packets: 0
     rx_128_to_255_octet_packets: 0
     rx_256_to_511_octet_packets: 0
     rx_512_to_1023_octet_packets: 0
     rx_1024_to_1522_octet_packets: 0
     rx_1523_to_2047_octet_packets: 0
     rx_2048_to_4095_octet_packets: 0
     rx_4096_to_8191_octet_packets: 0
     rx_8192_to_9022_octet_packets: 0
     tx_octets: 5120013863
     tx_collisions: 0
     tx_xon_sent: 0
     tx_xoff_sent: 0
     tx_flow_control: 0
     tx_mac_errors: 0
     tx_single_collisions: 0
     tx_mult_collisions: 0
     tx_deferred: 0
     tx_excessive_collisions: 0
     tx_late_collisions: 0
     tx_collide_2times: 0
     tx_collide_3times: 0
     tx_collide_4times: 0
     tx_collide_5times: 0
     tx_collide_6times: 0
     tx_collide_7times: 0
     tx_collide_8times: 0
     tx_collide_9times: 0
     tx_collide_10times: 0
     tx_collide_11times: 0
     tx_collide_12times: 0
     tx_collide_13times: 0
     tx_collide_14times: 0
     tx_collide_15times: 0
     tx_ucast_packets: 80000034
     tx_mcast_packets: 42
     tx_bcast_packets: 40
     tx_carrier_sense_errors: 0
     tx_discards: 0
     tx_errors: 0
     dma_writeq_full: 0
     dma_write_prioq_full: 0
     rxbds_empty: 0
     rx_discards: 0
     rx_errors: 0
     rx_threshold_hit: 0
     dma_readq_full: 0
     dma_read_prioq_full: 0
     tx_comp_queue_full: 0
     ring_set_send_prod_index: 0
     ring_status_update: 0
     nic_irqs: 0
     nic_avoided_irqs: 0
     nic_tx_threshold_hit: 0





* Re: 2.6.36.2 - loop on read /proc/net/tcp
From: Eric Dumazet @ 2010-12-23  5:07 UTC (permalink / raw)
  To: Alexey Vlasov, David Miller; +Cc: linux-kernel, netdev
In-Reply-To: <20101222134343.GC19998@beaver.vrungel.ru>

On Wednesday, 22 December 2010 at 16:43 +0300, Alexey Vlasov wrote:
> Hi.
> 
> Has anyone seen such a bug at 2.6.36.2?
> # netstat -ntl
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp        0      0 81.176.228.2:60608      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:8099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8101       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8101       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:20037      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8102       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8102       0.0.0.0:*               LISTEN
> tcp        0      0 127.0.0.1:3399          0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20040      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:38985      0.0.0.0:*               LISTEN
> tcp        0      0 0.0.0.0:873             0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20041      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20042      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:3306       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:3306       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:3306       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:9099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:9099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20043      0.0.0.0:*               LISTEN
> tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:9100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:9100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20044      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:33549      0.0.0.0:*               LISTEN
> ...
> First 30 lines are ok
> 
> but then go lines repeating in "eternal" loop:
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> 
> # cat /proc/net/tcp
> ...
> It can hang an hour or so. but not always actually.
> 
> # i=0; while [ "$i" -lt "10" ]; do time wc -l /proc/net/tcp; let "i = $i + 1"; done
> 614782727 /proc/net/tcp
> 
> real    18m42.066s
> user    0m12.620s
> sys     18m25.890s
> 19443 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 19503 /proc/net/tcp
> 
> real    0m0.040s
> sys     0m0.030s
> 19502 /proc/net/tcp
> 
> real    0m0.041s
> user    0m0.000s
> sys     0m0.040s
> 28525 /proc/net/tcp
> 
> real    0m0.059s
> user    0m0.000s
> sys     0m0.050s
> 19463 /proc/net/tcp
> 
> real    0m0.048s
> user    0m0.000s
> sys     0m0.040s
> 19521 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 54394 /proc/net/tcp
> 
> real    0m0.104s
> user    0m0.000s
> sys     0m0.100s
> 19479 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 19481 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 

Hi Alexey

Thanks a lot for your report.

Here is a fix.

(Incidentally, this means accesses to 0x40000000 addresses don't trigger
faults, since we never BUG() at this point.)

David, this is a stable candidate (2.6.29+).

Thanks !

[PATCH] tcp: fix listening_get_next()

Alexey Vlasov found /proc/net/tcp could sometimes loop and display
millions of sockets in LISTEN state.

In 2.6.29, when we converted TCP hash tables to RCU, we left two
sk_next() calls in listening_get_next().

We must instead use sk_nulls_next() to properly detect an end of chain.

Reported-by: Alexey Vlasov <renton@renton.name>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp_ipv4.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e13da6d..d978bb2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2030,7 +2030,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
 get_req:
 			req = icsk->icsk_accept_queue.listen_opt->syn_table[st->sbucket];
 		}
-		sk	  = sk_next(st->syn_wait_sk);
+		sk	  = sk_nulls_next(st->syn_wait_sk);
 		st->state = TCP_SEQ_STATE_LISTENING;
 		read_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
 	} else {
@@ -2039,7 +2039,7 @@ get_req:
 		if (reqsk_queue_len(&icsk->icsk_accept_queue))
 			goto start_req;
 		read_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
-		sk = sk_next(sk);
+		sk = sk_nulls_next(sk);
 	}
 get_sk:
 	sk_nulls_for_each_from(sk, node) {
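
For readers unfamiliar with nulls lists, the following is a minimal user-space sketch (not kernel code) of why a NULL-based step cannot detect the end of such a chain. The names node, make_nulls, plain_next and nulls_next are illustrative stand-ins for the kernel's hlist_nulls helpers, sk_next() and sk_nulls_next(); the marker encoding (low bit set, bucket number above it) is assumed to mirror the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative node type; a stand-in for a socket on a hash chain. */
struct node {
	struct node *next;
	int id;
};

/* End-of-chain marker: low bit set, bucket number in the remaining bits. */
static struct node *make_nulls(uintptr_t bucket)
{
	return (struct node *)((bucket << 1) | 1u);
}

/* Real node pointers are aligned, so their low bit is always clear. */
static int is_a_nulls(const struct node *p)
{
	return ((uintptr_t)p & 1u) != 0;
}

/* NULL-terminated step: hands back the marker as if it were a node. */
static struct node *plain_next(const struct node *n)
{
	return n->next;
}

/* Nulls-aware step: converts the marker into NULL for the caller. */
static struct node *nulls_next(const struct node *n)
{
	return is_a_nulls(n->next) ? NULL : n->next;
}

/* Count the nodes from 'n' to the end of a nulls-terminated chain. */
static int chain_len(const struct node *n)
{
	int len = 0;

	assert(n != NULL && !is_a_nulls(n));
	for (; n != NULL; n = nulls_next(n))
		len++;
	return len;
}
```

On the last node of a chain, plain_next() returns a non-NULL marker value that a caller would go on to treat as a socket, whereas nulls_next() returns NULL and the walk stops.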




* Re: [PATCH v7] kptr_restrict for hiding kernel pointers
From: Joe Perches @ 2010-12-23  4:10 UTC (permalink / raw)
  To: Dan Rosenberg
  Cc: linux-kernel, netdev, linux-security-module, jmorris,
	eric.dumazet, tgraf, eugeneteo, kees.cook, mingo, davem,
	a.p.zijlstra, akpm, eparis
In-Reply-To: <1293069792.9820.342.camel@dan>

On Wed, 2010-12-22 at 21:03 -0500, Dan Rosenberg wrote:
> Add the %pK printk format specifier and
> the /proc/sys/kernel/kptr_restrict sysctl.

Another trivial style comment:

> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
[]
> +	case 'K':
> +		/*
> +		 * %pK cannot be used in IRQ context because its test
> +		 * for CAP_SYSLOG would be meaningless.
> +		 */
> +		if (in_irq() || in_serving_softirq() || in_nmi()) {
> +			if (spec.field_width == -1)
> +				spec.field_width = 2 * sizeof(void *);
> +			return string(buf, end, "pK-error", spec);
> +		}
> +
> +		else if ((kptr_restrict == 0) ||
> +			 (kptr_restrict == 1 &&
> +			  has_capability_noaudit(current, CAP_SYSLOG)))
> +			break;
> +

---

> +		if (spec.field_width == -1) {
> +			spec.field_width = 2 * sizeof(void *);
> +			spec.flags |= ZEROPAD;
> +		}
> +		return number(buf, end, 0, spec);

It'd be slightly smaller code to use:

 		   ptr = 0;
		   break;

and delete the if block and return number.
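
A minimal user-space sketch of that control flow, assuming the common tail after the switch already zero-pads to 2 * sizeof(void *) hex digits. kptr_filter and format_kptr are hypothetical names, and the has_cap_syslog flag stands in for the has_capability_noaudit() check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Decide which value reaches the shared formatting code: the real
 * pointer when unrestricted or privileged, otherwise 0 ("ptr = 0; break;").
 */
static const void *kptr_filter(const void *ptr, int kptr_restrict,
			       bool has_cap_syslog)
{
	if (kptr_restrict == 0 || (kptr_restrict == 1 && has_cap_syslog))
		return ptr;
	return NULL;
}

/* Shared tail: zero-padded lowercase hex, width 2 * sizeof(void *). */
static int format_kptr(char *buf, size_t len, const void *ptr,
		       int kptr_restrict, bool has_cap_syslog)
{
	const void *shown = kptr_filter(ptr, kptr_restrict, has_cap_syslog);

	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%0*llx", (int)(2 * sizeof(void *)),
			(unsigned long long)(uintptr_t)shown);
}
```

With a single formatting path, the restricted case needs no width or ZEROPAD handling of its own.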

>  	}
>  	spec.flags |= SMALL;
>  	if (spec.field_width == -1) {


* [PATCH] Convert net %p usage %pK
From: Dan Rosenberg @ 2010-12-23  3:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-sctp
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Pekka Savola (ipv6), Remi Denis-Courmont, Vlad Yasevich,
	Patrick McHardy, Sridhar Samudrala, Hideaki YOSHIFUJI, Tejun Heo,
	Eric Dumazet, Li Zefan, Joe Perches, Stephen Hemminger,
	Jamal Hadi Salim, Eric W. Biederman, Alexey Dobriyan, Jiri Pirko,
	Oliver Hartkopp, Urs Thuermann, Greg Kroah-Hartman,
	Daniel Lezcano, Pavel 

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have
CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are
printed as 0's.  If kptr_restrict is set to 2, kernel pointers using %pK
are printed as 0's regardless of privileges.  Replacing with 0's was
chosen over the default "(null)", which cannot be parsed by userland %p,
which expects "(nil)".

The supporting code for kptr_restrict and %pK is currently in the -mm
tree.  This patch converts users of %p in net/ to %pK.  Cases of
printing pointers to the syslog are not covered, since this would
eliminate useful information for postmortem debugging and the reading of
the syslog is already optionally protected by the dmesg_restrict sysctl.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
---
 net/atm/proc.c           |    4 ++--
 net/can/bcm.c            |    6 +++---
 net/ipv4/raw.c           |    2 +-
 net/ipv4/tcp_ipv4.c      |    6 +++---
 net/ipv4/udp.c           |    2 +-
 net/ipv6/raw.c           |    2 +-
 net/ipv6/tcp_ipv6.c      |    6 +++---
 net/ipv6/udp.c           |    2 +-
 net/key/af_key.c         |    2 +-
 net/netlink/af_netlink.c |    2 +-
 net/packet/af_packet.c   |    2 +-
 net/phonet/socket.c      |    2 +-
 net/sctp/proc.c          |    4 ++--
 net/unix/af_unix.c       |    2 +-
 14 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/net/atm/proc.c b/net/atm/proc.c
index f85da07..be3afde 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -191,7 +191,7 @@ static void vcc_info(struct seq_file *seq, struct atm_vcc *vcc)
 {
 	struct sock *sk = sk_atm(vcc);
 
-	seq_printf(seq, "%p ", vcc);
+	seq_printf(seq, "%pK ", vcc);
 	if (!vcc->dev)
 		seq_printf(seq, "Unassigned    ");
 	else
@@ -218,7 +218,7 @@ static void svc_info(struct seq_file *seq, struct atm_vcc *vcc)
 {
 	if (!vcc->dev)
 		seq_printf(seq, sizeof(void *) == 4 ?
-			   "N/A@%p%10s" : "N/A@%p%2s", vcc, "");
+			   "N/A@%pK%10s" : "N/A@%pK%2s", vcc, "");
 	else
 		seq_printf(seq, "%3d %3d %5d         ",
 			   vcc->dev->number, vcc->vpi, vcc->vci);
diff --git a/net/can/bcm.c b/net/can/bcm.c
index 6faa825..1d2a0d6 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -165,9 +165,9 @@ static int bcm_proc_show(struct seq_file *m, void *v)
 	struct bcm_sock *bo = bcm_sk(sk);
 	struct bcm_op *op;
 
-	seq_printf(m, ">>> socket %p", sk->sk_socket);
-	seq_printf(m, " / sk %p", sk);
-	seq_printf(m, " / bo %p", bo);
+	seq_printf(m, ">>> socket %pK", sk->sk_socket);
+	seq_printf(m, " / sk %pK", sk);
+	seq_printf(m, " / bo %pK", bo);
 	seq_printf(m, " / dropped %lu", bo->dropped_usr_msgs);
 	seq_printf(m, " / bound %s", bcm_proc_getifname(ifname, bo->ifindex));
 	seq_printf(m, " <<<\n");
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 1f85ef2..6cb9d20 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -949,7 +949,7 @@ static void raw_sock_seq_show(struct seq_file *seq, struct sock *sp, int i)
 	      srcp  = inet->inet_num;
 
 	seq_printf(seq, "%4d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n",
 		i, src, srcp, dest, destp, sp->sk_state,
 		sk_wmem_alloc_get(sp),
 		sk_rmem_alloc_get(sp),
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e13da6d..86e46a0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2389,7 +2389,7 @@ static void get_openreq4(struct sock *sk, struct request_sock *req,
 	int ttd = req->expires - jiffies;
 
 	seq_printf(f, "%4d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %p%n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %pK%n",
 		i,
 		ireq->loc_addr,
 		ntohs(inet_sk(sk)->inet_sport),
@@ -2444,7 +2444,7 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i, int *len)
 		rx_queue = max_t(int, tp->rcv_nxt - tp->copied_seq, 0);
 
 	seq_printf(f, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX "
-			"%08X %5d %8d %lu %d %p %lu %lu %u %u %d%n",
+			"%08X %5d %8d %lu %d %pK %lu %lu %u %u %d%n",
 		i, src, srcp, dest, destp, sk->sk_state,
 		tp->write_seq - tp->snd_una,
 		rx_queue,
@@ -2479,7 +2479,7 @@ static void get_timewait4_sock(struct inet_timewait_sock *tw,
 	srcp  = ntohs(tw->tw_sport);
 
 	seq_printf(f, "%4d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p%n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK%n",
 		i, src, srcp, dest, destp, tw->tw_substate, 0, 0,
 		3, jiffies_to_clock_t(ttd), 0, 0, 0, 0,
 		atomic_read(&tw->tw_refcnt), tw, len);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5e0a3a5..55dcf56 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2046,7 +2046,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
 	__u16 srcp	  = ntohs(inet->inet_sport);
 
 	seq_printf(f, "%5d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d%n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d%n",
 		bucket, src, srcp, dest, destp, sp->sk_state,
 		sk_wmem_alloc_get(sp),
 		sk_rmem_alloc_get(sp),
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 86c3952..9c883b7 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1231,7 +1231,7 @@ static void raw6_sock_seq_show(struct seq_file *seq, struct sock *sp, int i)
 	srcp  = inet_sk(sp)->inet_num;
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7e41e2c..60a2d56 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1975,7 +1975,7 @@ static void get_openreq6(struct seq_file *seq,
 
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3],
@@ -2026,7 +2026,7 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
 
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %lu %lu %u %u %d\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %lu %lu %u %u %d\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
@@ -2068,7 +2068,7 @@ static void get_timewait6_sock(struct seq_file *seq,
 
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 91def93..ba25da4 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1395,7 +1395,7 @@ static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket
 	srcp  = ntohs(inet->inet_sport);
 	seq_printf(seq,
 		   "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n",
 		   bucket,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index d87c22d..cf00ddf 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3643,7 +3643,7 @@ static int pfkey_seq_show(struct seq_file *f, void *v)
 	if (v == SEQ_START_TOKEN)
 		seq_printf(f ,"sk       RefCnt Rmem   Wmem   User   Inode\n");
 	else
-		seq_printf(f ,"%p %-6d %-6u %-6u %-6u %-6lu\n",
+		seq_printf(f, "%pK %-6d %-6u %-6u %-6u %-6lu\n",
 			       s,
 			       atomic_read(&s->sk_refcnt),
 			       sk_rmem_alloc_get(s),
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 478181d..31425fa 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1990,7 +1990,7 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
 		struct sock *s = v;
 		struct netlink_sock *nlk = nlk_sk(s);
 
-		seq_printf(seq, "%p %-3d %-6d %08x %-8d %-8d %p %-8d %-8d %-8lu\n",
+		seq_printf(seq, "%pK %-3d %-6d %08x %-8d %-8d %pK %-8d %-8d %-8lu\n",
 			   s,
 			   s->sk_protocol,
 			   nlk->pid,
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8298e67..02b1b58 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2639,7 +2639,7 @@ static int packet_seq_show(struct seq_file *seq, void *v)
 		const struct packet_sock *po = pkt_sk(s);
 
 		seq_printf(seq,
-			   "%p %-6d %-4d %04x   %-5d %1d %-6u %-6u %-6lu\n",
+			   "%pK %-6d %-4d %04x   %-5d %1d %-6u %-6u %-6lu\n",
 			   s,
 			   atomic_read(&s->sk_refcnt),
 			   s->sk_type,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 25f746d..2c8f9d9 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -632,7 +632,7 @@ static int pn_sock_seq_show(struct seq_file *seq, void *v)
 		struct pn_sock *pn = pn_sk(sk);
 
 		seq_printf(seq, "%2d %04X:%04X:%02X %02X %08X:%08X %5d %lu "
-			"%d %p %d%n",
+			"%d %pK %d%n",
 			sk->sk_protocol, pn->sobject, 0, pn->resource,
 			sk->sk_state,
 			sk_wmem_alloc_get(sk), sk_rmem_alloc_get(sk),
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 61aacfb..05a6ce2 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -212,7 +212,7 @@ static int sctp_eps_seq_show(struct seq_file *seq, void *v)
 	sctp_for_each_hentry(epb, node, &head->chain) {
 		ep = sctp_ep(epb);
 		sk = epb->sk;
-		seq_printf(seq, "%8p %8p %-3d %-3d %-4d %-5d %5d %5lu ", ep, sk,
+		seq_printf(seq, "%8pK %8pK %-3d %-3d %-4d %-5d %5d %5lu ", ep, sk,
 			   sctp_sk(sk)->type, sk->sk_state, hash,
 			   epb->bind_addr.port,
 			   sock_i_uid(sk), sock_i_ino(sk));
@@ -316,7 +316,7 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 		assoc = sctp_assoc(epb);
 		sk = epb->sk;
 		seq_printf(seq,
-			   "%8p %8p %-3d %-3d %-2d %-4d "
+			   "%8pK %8pK %-3d %-3d %-2d %-4d "
 			   "%4d %8d %8d %7d %5lu %-5d %5d ",
 			   assoc, sk, sctp_sk(sk)->type, sk->sk_state,
 			   assoc->state, hash,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2268e67..e6d7d04 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2225,7 +2225,7 @@ static int unix_seq_show(struct seq_file *seq, void *v)
 		struct unix_sock *u = unix_sk(s);
 		unix_state_lock(s);
 
-		seq_printf(seq, "%p: %08X %08X %08X %04X %02X %5lu",
+		seq_printf(seq, "%pK: %08X %08X %08X %04X %02X %5lu",
 			s,
 			atomic_read(&s->sk_refcnt),
 			0,


* [PATCH v7] kptr_restrict for hiding kernel pointers
From: Dan Rosenberg @ 2010-12-23  2:03 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-security-module
  Cc: jmorris, eric.dumazet, tgraf, eugeneteo, kees.cook, mingo, davem,
	a.p.zijlstra, akpm, eparis

Add the %pK printk format specifier and
the /proc/sys/kernel/kptr_restrict sysctl.

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have
CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are
printed as 0's.  If kptr_restrict is set to 2, kernel pointers using %pK
are printed as 0's regardless of privileges.  Replacing with 0's was
chosen over the default "(null)", which cannot be parsed by userland %p,
which expects "(nil)".
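
As a small illustration of the three settings, a user-space helper could read the sysctl and report what it implies. This is only a sketch: read_kptr_restrict and kptr_restrict_meaning are hypothetical names, and the descriptions restate the semantics given above for this patch.

```c
#include <stdio.h>

/* Map a kptr_restrict value to the behaviour described in this patch. */
static const char *kptr_restrict_meaning(int value)
{
	switch (value) {
	case 0:
		return "%pK behaves like %p";
	case 1:
		return "%pK prints 0's unless the reader has CAP_SYSLOG";
	case 2:
		return "%pK always prints 0's";
	default:
		return "unknown setting";
	}
}

/* Read the sysctl file; returns -1 if it is absent or unreadable. */
static int read_kptr_restrict(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

A caller would pass "/proc/sys/kernel/kptr_restrict" as the path and treat -1 as "kernel without this sysctl".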


v7 moves the extern to printk.h and cleans up the conditional statements
based on the suggestions of Joe Perches.

v6 removes the WARN_ONCE in favor of returning "pK-error" to avoid
breaking in certain cases, thanks to Ingo Molnar.

v5 sets kptr_restrict to a default value of 1, and properly handles the
case where it's incorrectly used in IRQ context. 

v4 incorporates Eric Paris' suggestion of using
has_capability_noaudit(), since failing this capability check is not a
policy violation but rather a code path choice and shouldn't generate
potentially excessive log noise.  Adjusted IRQ comment for clarity.

v3 adds the "2" setting, cleans up documentation, removes the CONFIG,
and incorporates changes and suggestions from Andrew Morton.

v2 improves checking for inappropriate context, on suggestion by Peter
Zijlstra.  Thanks to Thomas Graf for suggesting use of a centralized
format specifier.


Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Paris <eparis@parisplace.org>
---
 Documentation/sysctl/kernel.txt |   14 ++++++++++++++
 include/linux/printk.h          |    1 +
 kernel/sysctl.c                 |    9 +++++++++
 lib/vsprintf.c                  |   24 ++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 209e158..8ace8c4 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
 - hotplug
 - java-appletviewer           [ binfmt_java, obsolete ]
 - java-interpreter            [ binfmt_java, obsolete ]
+- kptr_restrict
 - kstack_depth_to_print       [ X86 only ]
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
@@ -261,6 +262,19 @@ This flag controls the L2 cache of G3 processor boards. If
 
 ==============================================================
 
+kptr_restrict:
+
+This toggle indicates whether restrictions are placed on
+exposing kernel addresses via /proc and other interfaces.  When
+kptr_restrict is set to (0), there are no restrictions.  When
+kptr_restrict is set to (1), the default, kernel pointers
+printed using the %pK format specifier will be replaced with 0's
+unless the user has CAP_SYSLOG.  When kptr_restrict is set to
+(2), kernel pointers printed using %pK will be replaced with 0's
+regardless of privileges.
+
+==============================================================
+
 kstack_depth_to_print: (X86 only)
 
 Controls the number of words to print when dumping the raw
diff --git a/include/linux/printk.h b/include/linux/printk.h
index b772ca5..9adfba6 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -83,6 +83,7 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 
 extern int printk_delay_msec;
 extern int dmesg_restrict;
+extern int kptr_restrict;
 
 /*
  * Print a one-time message (analogous to WARN_ONCE() et al):
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5abfa15..236fa91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -713,6 +713,15 @@ static struct ctl_table kern_table[] = {
 	},
 #endif
 	{
+		.procname	= "kptr_restrict",
+		.data		= &kptr_restrict,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
 		.procname	= "ngroups_max",
 		.data		= &ngroups_max,
 		.maxlen		= sizeof (int),
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c150d3d..ea556da 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -936,6 +936,8 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
 	return string(buf, end, uuid, spec);
 }
 
+int kptr_restrict = 1;
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -979,6 +981,7 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
  *       Implements a "recursive vsnprintf".
  *       Do not use this feature without some mechanism to verify the
  *       correctness of the format string and va_list arguments.
+ * - 'K' For a kernel pointer that should be hidden from unprivileged users
  *
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
@@ -1035,6 +1038,27 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 		return buf + vsnprintf(buf, end - buf,
 				       ((struct va_format *)ptr)->fmt,
 				       *(((struct va_format *)ptr)->va));
+	case 'K':
+		/*
+		 * %pK cannot be used in IRQ context because its test
+		 * for CAP_SYSLOG would be meaningless.
+		 */
+		if (in_irq() || in_serving_softirq() || in_nmi()) {
+			if (spec.field_width == -1)
+				spec.field_width = 2 * sizeof(void *);
+			return string(buf, end, "pK-error", spec);
+		}
+
+		else if ((kptr_restrict == 0) ||
+			 (kptr_restrict == 1 &&
+			  has_capability_noaudit(current, CAP_SYSLOG)))
+			break;
+
+		if (spec.field_width == -1) {
+			spec.field_width = 2 * sizeof(void *);
+			spec.flags |= ZEROPAD;
+		}
+		return number(buf, end, 0, spec);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {




* Re: pull request: wireless-2.6 2010-12-22
From: David Miller @ 2010-12-23  1:35 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20101222195209.GD10046@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 22 Dec 2010 14:52:09 -0500

> Dave,
> 
> Three more fixes intended for 2.6.37...
> 
> The one from Johannes Berg avoids a NULL pointer dereference in the mesh
> code.  The one from Johannes Stezenbach fixes bug 24892, which is a
> lock-up caused by rt2x00.  Finally, you probably recognize the one from
> Meelis Roos which removes some log spam coming from hostap.  These have
> all spent several days in linux-next without any adverse effects.
> 
> Please let me know if there are problems!

Pulled, thanks John.


