Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: hackbench regression due to commit 9dfc6e68bfe6e
From: Eric Dumazet @ 2010-04-06 22:10 UTC (permalink / raw)
  To: Christoph Lameter, netdev
  Cc: Zhang, Yanmin, Tejun Heo, Pekka Enberg, alex.shi,
	linux-kernel@vger.kernel.org, Ma, Ling, Chen, Tim C,
	Andrew Morton
In-Reply-To: <alpine.DEB.2.00.1004061552500.19151@router.home>

Le mardi 06 avril 2010 à 15:55 -0500, Christoph Lameter a écrit :
> We cannot reproduce the issue here. Our tests here (dual quad dell) show a
> performance increase in hackbench instead.
> 
> Linux 2.6.33.2 #2 SMP Mon Apr 5 11:30:56 CDT 2010 x86_64 GNU/Linux
> ./hackbench 100 process 200000
> Running with 100*40 (== 4000) tasks.
> Time: 3102.142
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 308.731
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 311.591
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 310.200
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 38.048
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 44.711
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 39.407
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 9.411
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.765
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.822
> 
> Linux 2.6.34-rc3 #1 SMP Tue Apr 6 13:30:34 CDT 2010 x86_64 GNU/Linux
> ./hackbench 100 process 200000
> Running with 100*40 (== 4000) tasks.
> Time: 3003.578
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 300.289
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 301.462
> ./hackbench 100 process 20000
> Running with 100*40 (== 4000) tasks.
> Time: 301.173
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 41.191
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 41.964
> ./hackbench 10 process 20000
> Running with 10*40 (== 400) tasks.
> Time: 41.470
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.829
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 9.166
> ./hackbench 1 process 20000
> Running with 1*40 (== 40) tasks.
> Time: 8.681
> 
> 


Well, your config might be very different... and hackbench results can
vary by 10% on same machine, same kernel.

This is not a reliable bench, because af_unix is not prepared to get
such a lazy workload.

We really should warn people about this.



# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 12.922
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 12.696
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 13.060
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 14.108
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 13.165
# hackbench 25 process 3000
Running with 25*40 (== 1000) tasks.
Time: 13.310
# hackbench 25 process 3000 
Running with 25*40 (== 1000) tasks.
Time: 12.530


booting with slub_min_order=3 do change hackbench results for example ;)

All writers can compete on spinlock for a target UNIX socket, we spend _lot_ of time spinning.

If we _really_ want to speedup hackbench, we would have to change unix_state_lock()
to use a non spinning locking primitive (aka lock_sock()), and slowdown normal path.


# perf record -f hackbench 25 process 3000 
Running with 25*40 (== 1000) tasks.
Time: 13.330
[ perf record: Woken up 289 times to write data ]
[ perf record: Captured and wrote 54.312 MB perf.data (~2372928 samples) ]
# perf report
# Samples: 2370135
#
# Overhead    Command                 Shared Object  Symbol
# ........  .........  ............................  ......
#
     9.68%  hackbench  [kernel]                      [k] do_raw_spin_lock
     6.50%  hackbench  [kernel]                      [k] schedule
     4.38%  hackbench  [kernel]                      [k] __kmalloc_track_caller
     3.95%  hackbench  [kernel]                      [k] copy_to_user
     3.86%  hackbench  [kernel]                      [k] __alloc_skb
     3.77%  hackbench  [kernel]                      [k] unix_stream_recvmsg
     3.12%  hackbench  [kernel]                      [k] sock_alloc_send_pskb
     2.75%  hackbench  [vdso]                        [.] 0x000000ffffe425
     2.28%  hackbench  [kernel]                      [k] sysenter_past_esp
     2.03%  hackbench  [kernel]                      [k] __mutex_lock_common
     2.00%  hackbench  [kernel]                      [k] kfree
     2.00%  hackbench  [kernel]                      [k] delay_tsc
     1.75%  hackbench  [kernel]                      [k] update_curr
     1.70%  hackbench  [kernel]                      [k] kmem_cache_alloc
     1.69%  hackbench  [kernel]                      [k] do_raw_spin_unlock
     1.60%  hackbench  [kernel]                      [k] unix_stream_sendmsg
     1.54%  hackbench  [kernel]                      [k] sched_clock_local
     1.46%  hackbench  [kernel]                      [k] __slab_free
     1.37%  hackbench  [kernel]                      [k] do_raw_read_lock
     1.34%  hackbench  [kernel]                      [k] __switch_to
     1.24%  hackbench  [kernel]                      [k] select_task_rq_fair
     1.23%  hackbench  [kernel]                      [k] sock_wfree
     1.21%  hackbench  [kernel]                      [k] _raw_spin_unlock_irqrestore
     1.19%  hackbench  [kernel]                      [k] __mutex_unlock_slowpath
     1.05%  hackbench  [kernel]                      [k] trace_hardirqs_off
     0.99%  hackbench  [kernel]                      [k] __might_sleep
     0.93%  hackbench  [kernel]                      [k] do_raw_read_unlock
     0.93%  hackbench  [kernel]                      [k] _raw_spin_lock
     0.91%  hackbench  [kernel]                      [k] try_to_wake_up
     0.81%  hackbench  [kernel]                      [k] sched_clock
     0.80%  hackbench  [kernel]                      [k] trace_hardirqs_on



^ permalink raw reply

* [PATCH] [V4] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 21:43 UTC (permalink / raw)
  To: netdev, linuxppc-dev, grant.likely, jwboyer, eric.dumazet
  Cc: john.williams, michal.simek, John Linn, John Tyner

This patch adds support for using the LL TEMAC Ethernet driver on
non-Virtex 5 platforms by adding support for accessing the Soft DMA
registers as if they were memory mapped instead of solely through the
DCR's (available on the Virtex 5).

The patch also updates the driver so that it runs on the MicroBlaze.
The changes were tested on the PowerPC 440, PowerPC 405, and the
MicroBlaze platforms.

Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
Signed-off-by: John Linn <john.linn@xilinx.com>

---

V2 - Incorporated comments from Grant and added more logic to allow the driver
to work on MicroBlaze.

V3 - Only updated it to apply to head, minor change to include slab.h. Also
verified that it now builds for MicroBlaze. Retested on PowerPC and MicroBlaze.

V4 - Removed buffer alignment for skb and called the network functions that
already do the alignment for cache line and word alignment. Added constants
to MicroBlaze system to make sure network alignment is maintained. Also updated
the Kconfig so it depends on Microblaze or PPC based on Grant's comment.
---
 arch/microblaze/include/asm/system.h |   11 +++
 drivers/net/Kconfig                  |    2 +-
 drivers/net/ll_temac.h               |   14 +++-
 drivers/net/ll_temac_main.c          |  137 +++++++++++++++++++++++++--------
 4 files changed, 126 insertions(+), 38 deletions(-)

diff --git a/arch/microblaze/include/asm/system.h b/arch/microblaze/include/asm/system.h
index 59efb3f..48c4f03 100644
--- a/arch/microblaze/include/asm/system.h
+++ b/arch/microblaze/include/asm/system.h
@@ -12,6 +12,7 @@
 #include <asm/registers.h>
 #include <asm/setup.h>
 #include <asm/irqflags.h>
+#include <asm/cache.h>
 
 #include <asm-generic/cmpxchg.h>
 #include <asm-generic/cmpxchg-local.h>
@@ -96,4 +97,14 @@ extern struct dentry *of_debugfs_root;
 
 #define arch_align_stack(x) (x)
 
+/*
+ * MicroBlaze doesn't handle unaligned accesses in hardware.
+ *
+ * Based on this we force the IP header alignment in network drivers.
+ * We also modify NET_SKB_PAD to be a cacheline in size, thus maintaining
+ * cacheline alignment of buffers.
+ */
+#define NET_IP_ALIGN	2
+#define NET_SKB_PAD	L1_CACHE_BYTES
+
 #endif /* _ASM_MICROBLAZE_SYSTEM_H */
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0ba5b8e..207a57d 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2434,8 +2434,8 @@ config MV643XX_ETH
 
 config XILINX_LL_TEMAC
 	tristate "Xilinx LL TEMAC (LocalLink Tri-mode Ethernet MAC) driver"
+	depends on PPC || MICROBLAZE
 	select PHYLIB
-	depends on PPC_DCR_NATIVE
 	help
 	  This driver supports the Xilinx 10/100/1000 LocalLink TEMAC
 	  core used in Xilinx Spartan and Virtex FPGAs
diff --git a/drivers/net/ll_temac.h b/drivers/net/ll_temac.h
index 1af66a1..c033584 100644
--- a/drivers/net/ll_temac.h
+++ b/drivers/net/ll_temac.h
@@ -5,8 +5,11 @@
 #include <linux/netdevice.h>
 #include <linux/of.h>
 #include <linux/spinlock.h>
+
+#ifdef CONFIG_PPC_DCR
 #include <asm/dcr.h>
 #include <asm/dcr-regs.h>
+#endif
 
 /* packet size info */
 #define XTE_HDR_SIZE			14      /* size of Ethernet header */
@@ -290,9 +293,6 @@ This option defaults to enabled (set) */
 
 #define TX_CONTROL_CALC_CSUM_MASK   1
 
-#define XTE_ALIGN       32
-#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
-
 #define MULTICAST_CAM_TABLE_NUM 4
 
 /* TX/RX CURDESC_PTR points to first descriptor */
@@ -335,9 +335,15 @@ struct temac_local {
 	struct mii_bus *mii_bus;	/* MII bus reference */
 	int mdio_irqs[PHY_MAX_ADDR];	/* IRQs table for MDIO bus */
 
-	/* IO registers and IRQs */
+	/* IO registers, dma functions and IRQs */
 	void __iomem *regs;
+	void __iomem *sdma_regs;
+#ifdef CONFIG_PPC_DCR
 	dcr_host_t sdma_dcrs;
+#endif
+	u32 (*dma_in)(struct temac_local *, int);
+	void (*dma_out)(struct temac_local *, int, u32);
+
 	int tx_irq;
 	int rx_irq;
 	int emac_num;
diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
index ba617e3..6270bb0 100644
--- a/drivers/net/ll_temac_main.c
+++ b/drivers/net/ll_temac_main.c
@@ -20,9 +20,6 @@
  *   or rx, so this should be okay.
  *
  * TODO:
- * - Fix driver to work on more than just Virtex5.  Right now the driver
- *   assumes that the locallink DMA registers are accessed via DCR
- *   instructions.
  * - Factor out locallink DMA code into separate driver
  * - Fix multicast assignment.
  * - Fix support for hardware checksumming.
@@ -116,17 +113,86 @@ void temac_indirect_out32(struct temac_local *lp, int reg, u32 value)
 	temac_iow(lp, XTE_CTL0_OFFSET, CNTLREG_WRITE_ENABLE_MASK | reg);
 }
 
+/**
+ * temac_dma_in32 - Memory mapped DMA read, this function expects a
+ * register input that is based on DCR word addresses which
+ * are then converted to memory mapped byte addresses
+ */
 static u32 temac_dma_in32(struct temac_local *lp, int reg)
 {
-	return dcr_read(lp->sdma_dcrs, reg);
+	return in_be32((u32 *)(lp->sdma_regs + (reg << 2)));
 }
 
+/**
+ * temac_dma_out32 - Memory mapped DMA read, this function expects a
+ * register input that is based on DCR word addresses which
+ * are then converted to memory mapped byte addresses
+ */
 static void temac_dma_out32(struct temac_local *lp, int reg, u32 value)
 {
+	out_be32((u32 *)(lp->sdma_regs + (reg << 2)), value);
+}
+
+/* DMA register access functions can be DCR based or memory mapped.
+ * The PowerPC 440 is DCR based, the PowerPC 405 and MicroBlaze are both
+ * memory mapped.
+ */
+#ifdef CONFIG_PPC_DCR
+
+/**
+ * temac_dma_dcr_in32 - DCR based DMA read
+ */
+static u32 temac_dma_dcr_in(struct temac_local *lp, int reg)
+{
+	return dcr_read(lp->sdma_dcrs, reg);
+}
+
+/**
+ * temac_dma_dcr_out32 - DCR based DMA write
+ */
+static void temac_dma_dcr_out(struct temac_local *lp, int reg, u32 value)
+{
 	dcr_write(lp->sdma_dcrs, reg, value);
 }
 
 /**
+ * temac_dcr_setup - If the DMA is DCR based, then setup the address and
+ * I/O  functions
+ */
+static int temac_dcr_setup(struct temac_local *lp, struct of_device *op,
+				struct device_node *np)
+{
+	unsigned int dcrs;
+
+	/* setup the dcr address mapping if it's in the device tree */
+
+	dcrs = dcr_resource_start(np, 0);
+	if (dcrs != 0) {
+		lp->sdma_dcrs = dcr_map(np, dcrs, dcr_resource_len(np, 0));
+		lp->dma_in = temac_dma_dcr_in;
+		lp->dma_out = temac_dma_dcr_out;
+		dev_dbg(&op->dev, "DCR base: %x\n", dcrs);
+		return 0;
+	}
+	/* no DCR in the device tree, indicate a failure */
+	return -1;
+}
+
+#else
+
+/*
+ * temac_dcr_setup - This is a stub for when DCR is not supported,
+ * such as with MicroBlaze
+ */
+static int temac_dcr_setup(struct temac_local *lp, struct of_device *op,
+				struct device_node *np)
+{
+	return -1;
+}
+
+#endif
+
+/**
  * temac_dma_bd_init - Setup buffer descriptor rings
  */
 static int temac_dma_bd_init(struct net_device *ndev)
@@ -156,14 +222,14 @@ static int temac_dma_bd_init(struct net_device *ndev)
 		lp->rx_bd_v[i].next = lp->rx_bd_p +
 				sizeof(*lp->rx_bd_v) * ((i + 1) % RX_BD_NUM);
 
-		skb = alloc_skb(XTE_MAX_JUMBO_FRAME_SIZE
-				+ XTE_ALIGN, GFP_ATOMIC);
+		skb = netdev_alloc_skb_ip_align(ndev,
+						XTE_MAX_JUMBO_FRAME_SIZE);
+
 		if (skb == 0) {
 			dev_err(&ndev->dev, "alloc_skb error %d\n", i);
 			return -1;
 		}
 		lp->rx_skb[i] = skb;
-		skb_reserve(skb,  BUFFER_ALIGN(skb->data));
 		/* returns physical address of skb->data */
 		lp->rx_bd_v[i].phys = dma_map_single(ndev->dev.parent,
 						     skb->data,
@@ -173,23 +239,23 @@ static int temac_dma_bd_init(struct net_device *ndev)
 		lp->rx_bd_v[i].app0 = STS_CTRL_APP0_IRQONEND;
 	}
 
-	temac_dma_out32(lp, TX_CHNL_CTRL, 0x10220400 |
+	lp->dma_out(lp, TX_CHNL_CTRL, 0x10220400 |
 					  CHNL_CTRL_IRQ_EN |
 					  CHNL_CTRL_IRQ_DLY_EN |
 					  CHNL_CTRL_IRQ_COAL_EN);
 	/* 0x10220483 */
 	/* 0x00100483 */
-	temac_dma_out32(lp, RX_CHNL_CTRL, 0xff010000 |
+	lp->dma_out(lp, RX_CHNL_CTRL, 0xff010000 |
 					  CHNL_CTRL_IRQ_EN |
 					  CHNL_CTRL_IRQ_DLY_EN |
 					  CHNL_CTRL_IRQ_COAL_EN |
 					  CHNL_CTRL_IRQ_IOE);
 	/* 0xff010283 */
 
-	temac_dma_out32(lp, RX_CURDESC_PTR,  lp->rx_bd_p);
-	temac_dma_out32(lp, RX_TAILDESC_PTR,
+	lp->dma_out(lp, RX_CURDESC_PTR,  lp->rx_bd_p);
+	lp->dma_out(lp, RX_TAILDESC_PTR,
 		       lp->rx_bd_p + (sizeof(*lp->rx_bd_v) * (RX_BD_NUM - 1)));
-	temac_dma_out32(lp, TX_CURDESC_PTR, lp->tx_bd_p);
+	lp->dma_out(lp, TX_CURDESC_PTR, lp->tx_bd_p);
 
 	return 0;
 }
@@ -427,9 +493,9 @@ static void temac_device_reset(struct net_device *ndev)
 	temac_indirect_out32(lp, XTE_RXC1_OFFSET, val & ~XTE_RXC1_RXEN_MASK);
 
 	/* Reset Local Link (DMA) */
-	temac_dma_out32(lp, DMA_CONTROL_REG, DMA_CONTROL_RST);
+	lp->dma_out(lp, DMA_CONTROL_REG, DMA_CONTROL_RST);
 	timeout = 1000;
-	while (temac_dma_in32(lp, DMA_CONTROL_REG) & DMA_CONTROL_RST) {
+	while (lp->dma_in(lp, DMA_CONTROL_REG) & DMA_CONTROL_RST) {
 		udelay(1);
 		if (--timeout == 0) {
 			dev_err(&ndev->dev,
@@ -437,7 +503,7 @@ static void temac_device_reset(struct net_device *ndev)
 			break;
 		}
 	}
-	temac_dma_out32(lp, DMA_CONTROL_REG, DMA_TAIL_ENABLE);
+	lp->dma_out(lp, DMA_CONTROL_REG, DMA_TAIL_ENABLE);
 
 	temac_dma_bd_init(ndev);
 
@@ -598,7 +664,7 @@ static int temac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 		lp->tx_bd_tail = 0;
 
 	/* Kick off the transfer */
-	temac_dma_out32(lp, TX_TAILDESC_PTR, tail_p); /* DMA start */
+	lp->dma_out(lp, TX_TAILDESC_PTR, tail_p); /* DMA start */
 
 	return NETDEV_TX_OK;
 }
@@ -640,16 +706,15 @@ static void ll_temac_recv(struct net_device *ndev)
 		ndev->stats.rx_packets++;
 		ndev->stats.rx_bytes += length;
 
-		new_skb = alloc_skb(XTE_MAX_JUMBO_FRAME_SIZE + XTE_ALIGN,
-				GFP_ATOMIC);
+		new_skb = netdev_alloc_skb_ip_align(ndev,
+						XTE_MAX_JUMBO_FRAME_SIZE);
+
 		if (new_skb == 0) {
 			dev_err(&ndev->dev, "no memory for new sk_buff\n");
 			spin_unlock_irqrestore(&lp->rx_lock, flags);
 			return;
 		}
 
-		skb_reserve(new_skb, BUFFER_ALIGN(new_skb->data));
-
 		cur_p->app0 = STS_CTRL_APP0_IRQONEND;
 		cur_p->phys = dma_map_single(ndev->dev.parent, new_skb->data,
 					     XTE_MAX_JUMBO_FRAME_SIZE,
@@ -664,7 +729,7 @@ static void ll_temac_recv(struct net_device *ndev)
 		cur_p = &lp->rx_bd_v[lp->rx_bd_ci];
 		bdstat = cur_p->app0;
 	}
-	temac_dma_out32(lp, RX_TAILDESC_PTR, tail_p);
+	lp->dma_out(lp, RX_TAILDESC_PTR, tail_p);
 
 	spin_unlock_irqrestore(&lp->rx_lock, flags);
 }
@@ -675,8 +740,8 @@ static irqreturn_t ll_temac_tx_irq(int irq, void *_ndev)
 	struct temac_local *lp = netdev_priv(ndev);
 	unsigned int status;
 
-	status = temac_dma_in32(lp, TX_IRQ_REG);
-	temac_dma_out32(lp, TX_IRQ_REG, status);
+	status = lp->dma_in(lp, TX_IRQ_REG);
+	lp->dma_out(lp, TX_IRQ_REG, status);
 
 	if (status & (IRQ_COAL | IRQ_DLY))
 		temac_start_xmit_done(lp->ndev);
@@ -693,8 +758,8 @@ static irqreturn_t ll_temac_rx_irq(int irq, void *_ndev)
 	unsigned int status;
 
 	/* Read and clear the status registers */
-	status = temac_dma_in32(lp, RX_IRQ_REG);
-	temac_dma_out32(lp, RX_IRQ_REG, status);
+	status = lp->dma_in(lp, RX_IRQ_REG);
+	lp->dma_out(lp, RX_IRQ_REG, status);
 
 	if (status & (IRQ_COAL | IRQ_DLY))
 		ll_temac_recv(lp->ndev);
@@ -795,7 +860,7 @@ static ssize_t temac_show_llink_regs(struct device *dev,
 	int i, len = 0;
 
 	for (i = 0; i < 0x11; i++)
-		len += sprintf(buf + len, "%.8x%s", temac_dma_in32(lp, i),
+		len += sprintf(buf + len, "%.8x%s", lp->dma_in(lp, i),
 			       (i % 8) == 7 ? "\n" : " ");
 	len += sprintf(buf + len, "\n");
 
@@ -821,7 +886,6 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
 	struct net_device *ndev;
 	const void *addr;
 	int size, rc = 0;
-	unsigned int dcrs;
 
 	/* Init network device structure */
 	ndev = alloc_etherdev(sizeof(*lp));
@@ -871,13 +935,20 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
 		goto nodev;
 	}
 
-	dcrs = dcr_resource_start(np, 0);
-	if (dcrs == 0) {
-		dev_err(&op->dev, "could not get DMA register address\n");
-		goto nodev;
+	/* Setup the DMA register accesses, could be DCR or memory mapped */
+	if (temac_dcr_setup(lp, op, np)) {
+
+		/* no DCR in the device tree, try non-DCR */
+		lp->sdma_regs = of_iomap(np, 0);
+		if (lp->sdma_regs) {
+			lp->dma_in = temac_dma_in32;
+			lp->dma_out = temac_dma_out32;
+			dev_dbg(&op->dev, "MEM base: %p\n", lp->sdma_regs);
+		} else {
+			dev_err(&op->dev, "unable to map DMA registers\n");
+			goto nodev;
+		}
 	}
-	lp->sdma_dcrs = dcr_map(np, dcrs, dcr_resource_len(np, 0));
-	dev_dbg(&op->dev, "DCR base: %x\n", dcrs);
 
 	lp->rx_irq = irq_of_parse_and_map(np, 0);
 	lp->tx_irq = irq_of_parse_and_map(np, 1);
-- 
1.6.2.1



This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.



^ permalink raw reply related

* [PATCH 1/1] TIPC: Updated topology subscription protocol according to latest spec
From: Jon Maloy @ 2010-04-06 21:40 UTC (permalink / raw)
  To: David Miller; +Cc: Maloy, netdev, tipc-discussion, Jon

This patch makes it explicit in the API that all fields in subscriptions and events exchanged with the Topology Server must be in
network byte order.
It also ensures that all fields of a subscription are compared when cancelling a subscription, in order to avoid inadvertent
cancelling of the wrong subscription.
Finally, the tipc module version is updated to 2.0.0, to reflect the API change.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 include/linux/tipc.h |   30 ++++++++++++------------------
 net/tipc/core.c      |    2 +-
 net/tipc/subscr.c    |   15 ++++++++++-----
 3 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/include/linux/tipc.h b/include/linux/tipc.h
index 3d92396..9536d8a 100644
--- a/include/linux/tipc.h
+++ b/include/linux/tipc.h
@@ -127,23 +127,17 @@ static inline unsigned int tipc_node(__u32 addr)
  * TIPC topology subscription service definitions
  */
 
-#define TIPC_SUB_PORTS     	0x01  	/* filter for port availability */
-#define TIPC_SUB_SERVICE     	0x02  	/* filter for service availability */
-#define TIPC_SUB_CANCEL         0x04    /* cancel a subscription */
-#if 0
-/* The following filter options are not currently implemented */
-#define TIPC_SUB_NO_BIND_EVTS	0x04	/* filter out "publish" events */
-#define TIPC_SUB_NO_UNBIND_EVTS	0x08	/* filter out "withdraw" events */
-#define TIPC_SUB_SINGLE_EVT	0x10	/* expire after first event */
-#endif
+#define TIPC_SUB_SERVICE     	0x00  	/* Filter for service availability    */
+#define TIPC_SUB_PORTS     	0x01  	/* Filter for port availability  */
+#define TIPC_SUB_CANCEL         0x04    /* Cancel a subscription         */
 
 #define TIPC_WAIT_FOREVER	~0	/* timeout for permanent subscription */
 
 struct tipc_subscr {
-	struct tipc_name_seq seq;	/* name sequence of interest */
-	__u32 timeout;			/* subscription duration (in ms) */
-        __u32 filter;   		/* bitmask of filter options */
-	char usr_handle[8];		/* available for subscriber use */
+	struct tipc_name_seq seq;	/* NBO. Name sequence of interest */
+	__u32 timeout;			/* NBO. Subscription duration (in ms) */
+        __u32 filter;   		/* NBO. Bitmask of filter options */
+	char usr_handle[8];		/* Opaque. Available for subscriber use */
 };
 
 #define TIPC_PUBLISHED		1	/* publication event */
@@ -151,11 +145,11 @@ struct tipc_subscr {
 #define TIPC_SUBSCR_TIMEOUT	3	/* subscription timeout event */
 
 struct tipc_event {
-	__u32 event;			/* event type */
-	__u32 found_lower;		/* matching name seq instances */
-	__u32 found_upper;		/*    "      "    "     "      */
-	struct tipc_portid port;	/* associated port */
-	struct tipc_subscr s;		/* associated subscription */
+	__u32 event;			/* NBO. Event type, as defined above */
+	__u32 found_lower;		/* NBO. Matching name seq instances  */
+	__u32 found_upper;		/*  "      "       "   "    "        */
+	struct tipc_portid port;	/* NBO. Associated port              */
+	struct tipc_subscr s;		/* Original, associated subscription */
 };
 
 /*
diff --git a/net/tipc/core.c b/net/tipc/core.c
index 52c571f..4e84c84 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -49,7 +49,7 @@
 #include "config.h"
 
 
-#define TIPC_MOD_VER "1.6.4"
+#define TIPC_MOD_VER "2.0.0"
 
 #ifndef CONFIG_TIPC_ZONES
 #define CONFIG_TIPC_ZONES 3
diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
index ff123e5..ab6eab4 100644
--- a/net/tipc/subscr.c
+++ b/net/tipc/subscr.c
@@ -274,7 +274,7 @@ static void subscr_cancel(struct tipc_subscr *s,
 {
 	struct subscription *sub;
 	struct subscription *sub_temp;
-	__u32 type, lower, upper;
+	__u32 type, lower, upper, timeout, filter;
 	int found = 0;
 
 	/* Find first matching subscription, exit if not found */
@@ -282,12 +282,18 @@ static void subscr_cancel(struct tipc_subscr *s,
 	type = ntohl(s->seq.type);
 	lower = ntohl(s->seq.lower);
 	upper = ntohl(s->seq.upper);
+	timeout = ntohl(s->timeout);
+	filter = ntohl(s->filter) & ~TIPC_SUB_CANCEL;
 
 	list_for_each_entry_safe(sub, sub_temp, &subscriber->subscription_list,
 				 subscription_list) {
 			if ((type == sub->seq.type) &&
 			    (lower == sub->seq.lower) &&
-			    (upper == sub->seq.upper)) {
+			    (upper == sub->seq.upper) &&
+			    (timeout == sub->timeout) &&
+                            (filter == sub->filter) &&
+                             !memcmp(s->usr_handle,sub->evt.s.usr_handle,
+				     sizeof(s->usr_handle)) ){
 				found = 1;
 				break;
 			}
@@ -304,7 +310,7 @@ static void subscr_cancel(struct tipc_subscr *s,
 		k_term_timer(&sub->timer);
 		spin_lock_bh(subscriber->lock);
 	}
-	dbg("Cancel: removing sub %u,%u,%u from subscriber %x list\n",
+	dbg("Cancel: removing sub %u,%u,%u from subscriber %p list\n",
 	    sub->seq.type, sub->seq.lower, sub->seq.upper, subscriber);
 	subscr_del(sub);
 }
@@ -352,8 +358,7 @@ static struct subscription *subscr_subscribe(struct tipc_subscr *s,
 	sub->seq.upper = ntohl(s->seq.upper);
 	sub->timeout = ntohl(s->timeout);
 	sub->filter = ntohl(s->filter);
-	if ((!(sub->filter & TIPC_SUB_PORTS) ==
-	     !(sub->filter & TIPC_SUB_SERVICE)) ||
+	if ((sub->filter && (sub->filter != TIPC_SUB_PORTS)) ||
 	    (sub->seq.lower > sub->seq.upper)) {
 		warn("Subscription rejected, illegal request\n");
 		kfree(sub);
-- 
1.5.4.3


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev

^ permalink raw reply related

* Re: [PATCH net-next] via-velocity: remove private #define
From: David Miller @ 2010-04-06 21:25 UTC (permalink / raw)
  To: romieu; +Cc: netdev, simon.kagstrom, jan.ceuleers, rseguier
In-Reply-To: <20100406205810.GA8077@electric-eye.fr.zoreil.com>

From: François Romieu <romieu@fr.zoreil.com>
Date: Tue, 6 Apr 2010 22:58:10 +0200

> Registers and their bits from mii.h. Courtesy from ed.
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

Applied, thanks a lot for doing this.

^ permalink raw reply

* [PATCH net-next] via-velocity: remove private #define
From: François Romieu @ 2010-04-06 20:58 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Simon Kagstrom, Jan Ceuleers, Séguier Régis

Registers and their bits from mii.h. Courtesy from ed.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/via-velocity.c |  114 ++++++++++++++++++++++----------------------
 drivers/net/via-velocity.h |   77 +----------------------------
 2 files changed, 60 insertions(+), 131 deletions(-)

diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index 91f3b84..078903f 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -719,30 +719,30 @@ static u32 mii_check_media_mode(struct mac_regs __iomem *regs)
 	u32 status = 0;
 	u16 ANAR;
 
-	if (!MII_REG_BITS_IS_ON(BMSR_LNK, MII_REG_BMSR, regs))
+	if (!MII_REG_BITS_IS_ON(BMSR_LSTATUS, MII_BMSR, regs))
 		status |= VELOCITY_LINK_FAIL;
 
-	if (MII_REG_BITS_IS_ON(G1000CR_1000FD, MII_REG_G1000CR, regs))
+	if (MII_REG_BITS_IS_ON(ADVERTISE_1000FULL, MII_CTRL1000, regs))
 		status |= VELOCITY_SPEED_1000 | VELOCITY_DUPLEX_FULL;
-	else if (MII_REG_BITS_IS_ON(G1000CR_1000, MII_REG_G1000CR, regs))
+	else if (MII_REG_BITS_IS_ON(ADVERTISE_1000HALF, MII_CTRL1000, regs))
 		status |= (VELOCITY_SPEED_1000);
 	else {
-		velocity_mii_read(regs, MII_REG_ANAR, &ANAR);
-		if (ANAR & ANAR_TXFD)
+		velocity_mii_read(regs, MII_ADVERTISE, &ANAR);
+		if (ANAR & ADVERTISE_100FULL)
 			status |= (VELOCITY_SPEED_100 | VELOCITY_DUPLEX_FULL);
-		else if (ANAR & ANAR_TX)
+		else if (ANAR & ADVERTISE_100HALF)
 			status |= VELOCITY_SPEED_100;
-		else if (ANAR & ANAR_10FD)
+		else if (ANAR & ADVERTISE_10FULL)
 			status |= (VELOCITY_SPEED_10 | VELOCITY_DUPLEX_FULL);
 		else
 			status |= (VELOCITY_SPEED_10);
 	}
 
-	if (MII_REG_BITS_IS_ON(BMCR_AUTO, MII_REG_BMCR, regs)) {
-		velocity_mii_read(regs, MII_REG_ANAR, &ANAR);
-		if ((ANAR & (ANAR_TXFD | ANAR_TX | ANAR_10FD | ANAR_10))
-		    == (ANAR_TXFD | ANAR_TX | ANAR_10FD | ANAR_10)) {
-			if (MII_REG_BITS_IS_ON(G1000CR_1000 | G1000CR_1000FD, MII_REG_G1000CR, regs))
+	if (MII_REG_BITS_IS_ON(BMCR_ANENABLE, MII_BMCR, regs)) {
+		velocity_mii_read(regs, MII_ADVERTISE, &ANAR);
+		if ((ANAR & (ADVERTISE_100FULL | ADVERTISE_100HALF | ADVERTISE_10FULL | ADVERTISE_10HALF))
+		    == (ADVERTISE_100FULL | ADVERTISE_100HALF | ADVERTISE_10FULL | ADVERTISE_10HALF)) {
+			if (MII_REG_BITS_IS_ON(ADVERTISE_1000HALF | ADVERTISE_1000FULL, MII_CTRL1000, regs))
 				status |= VELOCITY_AUTONEG_ENABLE;
 		}
 	}
@@ -801,23 +801,23 @@ static void set_mii_flow_control(struct velocity_info *vptr)
 	/*Enable or Disable PAUSE in ANAR */
 	switch (vptr->options.flow_cntl) {
 	case FLOW_CNTL_TX:
-		MII_REG_BITS_OFF(ANAR_PAUSE, MII_REG_ANAR, vptr->mac_regs);
-		MII_REG_BITS_ON(ANAR_ASMDIR, MII_REG_ANAR, vptr->mac_regs);
+		MII_REG_BITS_OFF(ADVERTISE_PAUSE_CAP, MII_ADVERTISE, vptr->mac_regs);
+		MII_REG_BITS_ON(ADVERTISE_PAUSE_ASYM, MII_ADVERTISE, vptr->mac_regs);
 		break;
 
 	case FLOW_CNTL_RX:
-		MII_REG_BITS_ON(ANAR_PAUSE, MII_REG_ANAR, vptr->mac_regs);
-		MII_REG_BITS_ON(ANAR_ASMDIR, MII_REG_ANAR, vptr->mac_regs);
+		MII_REG_BITS_ON(ADVERTISE_PAUSE_CAP, MII_ADVERTISE, vptr->mac_regs);
+		MII_REG_BITS_ON(ADVERTISE_PAUSE_ASYM, MII_ADVERTISE, vptr->mac_regs);
 		break;
 
 	case FLOW_CNTL_TX_RX:
-		MII_REG_BITS_ON(ANAR_PAUSE, MII_REG_ANAR, vptr->mac_regs);
-		MII_REG_BITS_ON(ANAR_ASMDIR, MII_REG_ANAR, vptr->mac_regs);
+		MII_REG_BITS_ON(ADVERTISE_PAUSE_CAP, MII_ADVERTISE, vptr->mac_regs);
+		MII_REG_BITS_ON(ADVERTISE_PAUSE_ASYM, MII_ADVERTISE, vptr->mac_regs);
 		break;
 
 	case FLOW_CNTL_DISABLE:
-		MII_REG_BITS_OFF(ANAR_PAUSE, MII_REG_ANAR, vptr->mac_regs);
-		MII_REG_BITS_OFF(ANAR_ASMDIR, MII_REG_ANAR, vptr->mac_regs);
+		MII_REG_BITS_OFF(ADVERTISE_PAUSE_CAP, MII_ADVERTISE, vptr->mac_regs);
+		MII_REG_BITS_OFF(ADVERTISE_PAUSE_ASYM, MII_ADVERTISE, vptr->mac_regs);
 		break;
 	default:
 		break;
@@ -832,10 +832,10 @@ static void set_mii_flow_control(struct velocity_info *vptr)
  */
 static void mii_set_auto_on(struct velocity_info *vptr)
 {
-	if (MII_REG_BITS_IS_ON(BMCR_AUTO, MII_REG_BMCR, vptr->mac_regs))
-		MII_REG_BITS_ON(BMCR_REAUTO, MII_REG_BMCR, vptr->mac_regs);
+	if (MII_REG_BITS_IS_ON(BMCR_ANENABLE, MII_BMCR, vptr->mac_regs))
+		MII_REG_BITS_ON(BMCR_ANRESTART, MII_BMCR, vptr->mac_regs);
 	else
-		MII_REG_BITS_ON(BMCR_AUTO, MII_REG_BMCR, vptr->mac_regs);
+		MII_REG_BITS_ON(BMCR_ANENABLE, MII_BMCR, vptr->mac_regs);
 }
 
 static u32 check_connection_type(struct mac_regs __iomem *regs)
@@ -860,11 +860,11 @@ static u32 check_connection_type(struct mac_regs __iomem *regs)
 	else
 		status |= VELOCITY_SPEED_100;
 
-	if (MII_REG_BITS_IS_ON(BMCR_AUTO, MII_REG_BMCR, regs)) {
-		velocity_mii_read(regs, MII_REG_ANAR, &ANAR);
-		if ((ANAR & (ANAR_TXFD | ANAR_TX | ANAR_10FD | ANAR_10))
-		    == (ANAR_TXFD | ANAR_TX | ANAR_10FD | ANAR_10)) {
-			if (MII_REG_BITS_IS_ON(G1000CR_1000 | G1000CR_1000FD, MII_REG_G1000CR, regs))
+	if (MII_REG_BITS_IS_ON(BMCR_ANENABLE, MII_BMCR, regs)) {
+		velocity_mii_read(regs, MII_ADVERTISE, &ANAR);
+		if ((ANAR & (ADVERTISE_100FULL | ADVERTISE_100HALF | ADVERTISE_10FULL | ADVERTISE_10HALF))
+		    == (ADVERTISE_100FULL | ADVERTISE_100HALF | ADVERTISE_10FULL | ADVERTISE_10HALF)) {
+			if (MII_REG_BITS_IS_ON(ADVERTISE_1000HALF | ADVERTISE_1000FULL, MII_CTRL1000, regs))
 				status |= VELOCITY_AUTONEG_ENABLE;
 		}
 	}
@@ -905,7 +905,7 @@ static int velocity_set_media_mode(struct velocity_info *vptr, u32 mii_status)
 	 */
 
 	if (PHYID_GET_PHY_ID(vptr->phy_id) == PHYID_CICADA_CS8201)
-		MII_REG_BITS_ON(AUXCR_MDPPS, MII_REG_AUXCR, vptr->mac_regs);
+		MII_REG_BITS_ON(AUXCR_MDPPS, MII_NCONFIG, vptr->mac_regs);
 
 	/*
 	 *	If connection type is AUTO
@@ -915,9 +915,9 @@ static int velocity_set_media_mode(struct velocity_info *vptr, u32 mii_status)
 		/* clear force MAC mode bit */
 		BYTE_REG_BITS_OFF(CHIPGCR_FCMODE, &regs->CHIPGCR);
 		/* set duplex mode of MAC according to duplex mode of MII */
-		MII_REG_BITS_ON(ANAR_TXFD | ANAR_TX | ANAR_10FD | ANAR_10, MII_REG_ANAR, vptr->mac_regs);
-		MII_REG_BITS_ON(G1000CR_1000FD | G1000CR_1000, MII_REG_G1000CR, vptr->mac_regs);
-		MII_REG_BITS_ON(BMCR_SPEED1G, MII_REG_BMCR, vptr->mac_regs);
+		MII_REG_BITS_ON(ADVERTISE_100FULL | ADVERTISE_100HALF | ADVERTISE_10FULL | ADVERTISE_10HALF, MII_ADVERTISE, vptr->mac_regs);
+		MII_REG_BITS_ON(ADVERTISE_1000FULL | ADVERTISE_1000HALF, MII_CTRL1000, vptr->mac_regs);
+		MII_REG_BITS_ON(BMCR_SPEED1000, MII_BMCR, vptr->mac_regs);
 
 		/* enable AUTO-NEGO mode */
 		mii_set_auto_on(vptr);
@@ -952,31 +952,31 @@ static int velocity_set_media_mode(struct velocity_info *vptr, u32 mii_status)
 				BYTE_REG_BITS_ON(TCR_TB2BDIS, &regs->TCR);
 		}
 
-		MII_REG_BITS_OFF(G1000CR_1000FD | G1000CR_1000, MII_REG_G1000CR, vptr->mac_regs);
+		MII_REG_BITS_OFF(ADVERTISE_1000FULL | ADVERTISE_1000HALF, MII_CTRL1000, vptr->mac_regs);
 
 		if (!(mii_status & VELOCITY_DUPLEX_FULL) && (mii_status & VELOCITY_SPEED_10))
 			BYTE_REG_BITS_OFF(TESTCFG_HBDIS, &regs->TESTCFG);
 		else
 			BYTE_REG_BITS_ON(TESTCFG_HBDIS, &regs->TESTCFG);
 
-		/* MII_REG_BITS_OFF(BMCR_SPEED1G, MII_REG_BMCR, vptr->mac_regs); */
-		velocity_mii_read(vptr->mac_regs, MII_REG_ANAR, &ANAR);
-		ANAR &= (~(ANAR_TXFD | ANAR_TX | ANAR_10FD | ANAR_10));
+		/* MII_REG_BITS_OFF(BMCR_SPEED1000, MII_BMCR, vptr->mac_regs); */
+		velocity_mii_read(vptr->mac_regs, MII_ADVERTISE, &ANAR);
+		ANAR &= (~(ADVERTISE_100FULL | ADVERTISE_100HALF | ADVERTISE_10FULL | ADVERTISE_10HALF));
 		if (mii_status & VELOCITY_SPEED_100) {
 			if (mii_status & VELOCITY_DUPLEX_FULL)
-				ANAR |= ANAR_TXFD;
+				ANAR |= ADVERTISE_100FULL;
 			else
-				ANAR |= ANAR_TX;
+				ANAR |= ADVERTISE_100HALF;
 		} else {
 			if (mii_status & VELOCITY_DUPLEX_FULL)
-				ANAR |= ANAR_10FD;
+				ANAR |= ADVERTISE_10FULL;
 			else
-				ANAR |= ANAR_10;
+				ANAR |= ADVERTISE_10HALF;
 		}
-		velocity_mii_write(vptr->mac_regs, MII_REG_ANAR, ANAR);
+		velocity_mii_write(vptr->mac_regs, MII_ADVERTISE, ANAR);
 		/* enable AUTO-NEGO mode */
 		mii_set_auto_on(vptr);
-		/* MII_REG_BITS_ON(BMCR_AUTO, MII_REG_BMCR, vptr->mac_regs); */
+		/* MII_REG_BITS_ON(BMCR_ANENABLE, MII_BMCR, vptr->mac_regs); */
 	}
 	/* vptr->mii_status=mii_check_media_mode(vptr->mac_regs); */
 	/* vptr->mii_status=check_connection_type(vptr->mac_regs); */
@@ -1178,36 +1178,36 @@ static void mii_init(struct velocity_info *vptr, u32 mii_status)
 		/*
 		 *	Reset to hardware default
 		 */
-		MII_REG_BITS_OFF((ANAR_ASMDIR | ANAR_PAUSE), MII_REG_ANAR, vptr->mac_regs);
+		MII_REG_BITS_OFF((ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP), MII_ADVERTISE, vptr->mac_regs);
 		/*
 		 *	Turn on ECHODIS bit in NWay-forced full mode and turn it
 		 *	off it in NWay-forced half mode for NWay-forced v.s.
 		 *	legacy-forced issue.
 		 */
 		if (vptr->mii_status & VELOCITY_DUPLEX_FULL)
-			MII_REG_BITS_ON(TCSR_ECHODIS, MII_REG_TCSR, vptr->mac_regs);
+			MII_REG_BITS_ON(TCSR_ECHODIS, MII_SREVISION, vptr->mac_regs);
 		else
-			MII_REG_BITS_OFF(TCSR_ECHODIS, MII_REG_TCSR, vptr->mac_regs);
+			MII_REG_BITS_OFF(TCSR_ECHODIS, MII_SREVISION, vptr->mac_regs);
 		/*
 		 *	Turn on Link/Activity LED enable bit for CIS8201
 		 */
-		MII_REG_BITS_ON(PLED_LALBE, MII_REG_PLED, vptr->mac_regs);
+		MII_REG_BITS_ON(PLED_LALBE, MII_TPISTATUS, vptr->mac_regs);
 		break;
 	case PHYID_VT3216_32BIT:
 	case PHYID_VT3216_64BIT:
 		/*
 		 *	Reset to hardware default
 		 */
-		MII_REG_BITS_ON((ANAR_ASMDIR | ANAR_PAUSE), MII_REG_ANAR, vptr->mac_regs);
+		MII_REG_BITS_ON((ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP), MII_ADVERTISE, vptr->mac_regs);
 		/*
 		 *	Turn on ECHODIS bit in NWay-forced full mode and turn it
 		 *	off it in NWay-forced half mode for NWay-forced v.s.
 		 *	legacy-forced issue
 		 */
 		if (vptr->mii_status & VELOCITY_DUPLEX_FULL)
-			MII_REG_BITS_ON(TCSR_ECHODIS, MII_REG_TCSR, vptr->mac_regs);
+			MII_REG_BITS_ON(TCSR_ECHODIS, MII_SREVISION, vptr->mac_regs);
 		else
-			MII_REG_BITS_OFF(TCSR_ECHODIS, MII_REG_TCSR, vptr->mac_regs);
+			MII_REG_BITS_OFF(TCSR_ECHODIS, MII_SREVISION, vptr->mac_regs);
 		break;
 
 	case PHYID_MARVELL_1000:
@@ -1219,15 +1219,15 @@ static void mii_init(struct velocity_info *vptr, u32 mii_status)
 		/*
 		 *	Reset to hardware default
 		 */
-		MII_REG_BITS_ON((ANAR_ASMDIR | ANAR_PAUSE), MII_REG_ANAR, vptr->mac_regs);
+		MII_REG_BITS_ON((ADVERTISE_PAUSE_ASYM | ADVERTISE_PAUSE_CAP), MII_ADVERTISE, vptr->mac_regs);
 		break;
 	default:
 		;
 	}
-	velocity_mii_read(vptr->mac_regs, MII_REG_BMCR, &BMCR);
-	if (BMCR & BMCR_ISO) {
-		BMCR &= ~BMCR_ISO;
-		velocity_mii_write(vptr->mac_regs, MII_REG_BMCR, BMCR);
+	velocity_mii_read(vptr->mac_regs, MII_BMCR, &BMCR);
+	if (BMCR & BMCR_ISOLATE) {
+		BMCR &= ~BMCR_ISOLATE;
+		velocity_mii_write(vptr->mac_regs, MII_BMCR, BMCR);
 	}
 }
 
@@ -2953,13 +2953,13 @@ static int velocity_set_wol(struct velocity_info *vptr)
 
 	if (vptr->mii_status & VELOCITY_AUTONEG_ENABLE) {
 		if (PHYID_GET_PHY_ID(vptr->phy_id) == PHYID_CICADA_CS8201)
-			MII_REG_BITS_ON(AUXCR_MDPPS, MII_REG_AUXCR, vptr->mac_regs);
+			MII_REG_BITS_ON(AUXCR_MDPPS, MII_NCONFIG, vptr->mac_regs);
 
-		MII_REG_BITS_OFF(G1000CR_1000FD | G1000CR_1000, MII_REG_G1000CR, vptr->mac_regs);
+		MII_REG_BITS_OFF(ADVERTISE_1000FULL | ADVERTISE_1000HALF, MII_CTRL1000, vptr->mac_regs);
 	}
 
 	if (vptr->mii_status & VELOCITY_SPEED_1000)
-		MII_REG_BITS_ON(BMCR_REAUTO, MII_REG_BMCR, vptr->mac_regs);
+		MII_REG_BITS_ON(BMCR_ANRESTART, MII_BMCR, vptr->mac_regs);
 
 	BYTE_REG_BITS_ON(CHIPGCR_FCMODE, &regs->CHIPGCR);
 
diff --git a/drivers/net/via-velocity.h b/drivers/net/via-velocity.h
index ef4a0f6..c381911 100644
--- a/drivers/net/via-velocity.h
+++ b/drivers/net/via-velocity.h
@@ -1240,86 +1240,16 @@ struct velocity_context {
 	u32 pattern[8];
 };
 
-
-/*
- *	MII registers.
- */
-
-
 /*
  *	Registers in the MII (offset unit is WORD)
  */
 
-#define MII_REG_BMCR        0x00	// physical address
-#define MII_REG_BMSR        0x01	//
-#define MII_REG_PHYID1      0x02	// OUI
-#define MII_REG_PHYID2      0x03	// OUI + Module ID + REV ID
-#define MII_REG_ANAR        0x04	//
-#define MII_REG_ANLPAR      0x05	//
-#define MII_REG_G1000CR     0x09	//
-#define MII_REG_G1000SR     0x0A	//
-#define MII_REG_MODCFG      0x10	//
-#define MII_REG_TCSR        0x16	//
-#define MII_REG_PLED        0x1B	//
-// NS, MYSON only
-#define MII_REG_PCR         0x17	//
-// ESI only
-#define MII_REG_PCSR        0x17	//
-#define MII_REG_AUXCR       0x1C	//
-
 // Marvell 88E1000/88E1000S
 #define MII_REG_PSCR        0x10	// PHY specific control register
 
 //
-// Bits in the BMCR register
-//
-#define BMCR_RESET          0x8000	//
-#define BMCR_LBK            0x4000	//
-#define BMCR_SPEED100       0x2000	//
-#define BMCR_AUTO           0x1000	//
-#define BMCR_PD             0x0800	//
-#define BMCR_ISO            0x0400	//
-#define BMCR_REAUTO         0x0200	//
-#define BMCR_FDX            0x0100	//
-#define BMCR_SPEED1G        0x0040	//
-//
-// Bits in the BMSR register
-//
-#define BMSR_AUTOCM         0x0020	//
-#define BMSR_LNK            0x0004	//
-
-//
-// Bits in the ANAR register
-//
-#define ANAR_ASMDIR         0x0800	// Asymmetric PAUSE support
-#define ANAR_PAUSE          0x0400	// Symmetric PAUSE Support
-#define ANAR_T4             0x0200	//
-#define ANAR_TXFD           0x0100	//
-#define ANAR_TX             0x0080	//
-#define ANAR_10FD           0x0040	//
-#define ANAR_10             0x0020	//
-//
-// Bits in the ANLPAR register
-//
-#define ANLPAR_ASMDIR       0x0800	// Asymmetric PAUSE support
-#define ANLPAR_PAUSE        0x0400	// Symmetric PAUSE Support
-#define ANLPAR_T4           0x0200	//
-#define ANLPAR_TXFD         0x0100	//
-#define ANLPAR_TX           0x0080	//
-#define ANLPAR_10FD         0x0040	//
-#define ANLPAR_10           0x0020	//
-
-//
-// Bits in the G1000CR register
-//
-#define G1000CR_1000FD      0x0200	// PHY is 1000-T Full-duplex capable
-#define G1000CR_1000        0x0100	// PHY is 1000-T Half-duplex capable
-
-//
-// Bits in the G1000SR register
+// Bits in the Silicon revision register
 //
-#define G1000SR_1000FD      0x0800	// LP PHY is 1000-T Full-duplex capable
-#define G1000SR_1000        0x0400	// LP PHY is 1000-T Half-duplex capable
 
 #define TCSR_ECHODIS        0x2000	//
 #define AUXCR_MDPPS         0x0004	//
@@ -1338,7 +1268,6 @@ struct velocity_context {
 
 #define PHYID_REV_ID_MASK   0x0000000FUL
 
-#define PHYID_GET_PHY_REV_ID(i)     ((i) & PHYID_REV_ID_MASK)
 #define PHYID_GET_PHY_ID(i)         ((i) & ~PHYID_REV_ID_MASK)
 
 #define MII_REG_BITS_ON(x,i,p) do {\
@@ -1362,8 +1291,8 @@ struct velocity_context {
 
 #define MII_GET_PHY_ID(p) ({\
     u32 id;\
-    velocity_mii_read((p),MII_REG_PHYID2,(u16 *) &id);\
-    velocity_mii_read((p),MII_REG_PHYID1,((u16 *) &id)+1);\
+    velocity_mii_read((p),MII_PHYSID2,(u16 *) &id);\
+    velocity_mii_read((p),MII_PHYSID1,((u16 *) &id)+1);\
     (id);})
 
 /*
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH v3] Add Mergeable receive buffer support to vhost_net
From: David L Stevens @ 2010-04-06 20:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, netdev, rusty, virtualization


This patch adds support for the Mergeable Receive Buffers feature to
vhost_net.

						+-DLS

Changes from previous revision:
1) renamed:
	vhost_discard_vq_desc -> vhost_discard_desc
	vhost_get_heads -> vhost_get_desc_n
	vhost_get_vq_desc -> vhost_get_desc
2) added heads as argument to ghost_get_desc_n
3) changed "vq->heads" from iovec to vring_used_elem, removed casts
4) changed vhost_add_used to do multiple elements in a single
copy_to_user,
	or two when we wrap the ring.
5) removed rxmaxheadcount and available buffer checks in favor of
running until
	an allocation failure, but making sure we break the loop if we get
	two in a row, indicating we have at least 1 buffer, but not enough
	for the current receive packet
6) restore non-vnet header handling

Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>

diff -ruNp net-next-p0/drivers/vhost/net.c
net-next-v3/drivers/vhost/net.c
--- net-next-p0/drivers/vhost/net.c	2010-03-22 12:04:38.000000000 -0700
+++ net-next-v3/drivers/vhost/net.c	2010-04-06 12:54:56.000000000 -0700
@@ -130,9 +130,8 @@ static void handle_tx(struct vhost_net *
 	hdr_size = vq->hdr_size;
 
 	for (;;) {
-		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
+		head = vhost_get_desc(&net->dev, vq, vq->iov,
+					 ARRAY_SIZE(vq->iov), &out, &in,
 					 NULL, NULL);
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
@@ -167,8 +166,15 @@ static void handle_tx(struct vhost_net *
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(NULL, sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq);
-			tx_poll_start(net, sock);
+			if (err == -EAGAIN) {
+				vhost_discard_desc(vq, 1);
+				tx_poll_start(net, sock);
+			} else {
+				vq_err(vq, "sendmsg: errno %d\n", -err);
+				/* drop packet; do not discard/resend */
+				vhost_add_used_and_signal(&net->dev, vq, head,
+							  0);
+			}
 			break;
 		}
 		if (err != len)
@@ -186,12 +192,25 @@ static void handle_tx(struct vhost_net *
 	unuse_mm(net->dev.mm);
 }
 
+static int vhost_head_len(struct sock *sk)
+{
+	struct sk_buff *head;
+	int len = 0;
+
+	lock_sock(sk);
+	head = skb_peek(&sk->sk_receive_queue);
+	if (head)
+		len = head->len;
+	release_sock(sk);
+	return len;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-	unsigned head, out, in, log, s;
+	unsigned in, log, s;
 	struct vhost_log *vq_log;
 	struct msghdr msg = {
 		.msg_name = NULL,
@@ -202,13 +221,14 @@ static void handle_rx(struct vhost_net *
 		.msg_flags = MSG_DONTWAIT,
 	};
 
-	struct virtio_net_hdr hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
+	struct virtio_net_hdr_mrg_rxbuf hdr = {
+		.hdr.flags = 0,
+		.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
 	};
 
+	int retries = 0;
 	size_t len, total_len = 0;
-	int err;
+	int err, headcount, datalen;
 	size_t hdr_size;
 	struct socket *sock = rcu_dereference(vq->private_data);
 	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
@@ -222,31 +242,25 @@ static void handle_rx(struct vhost_net *
 	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 
-	for (;;) {
-		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 vq_log, &log);
+	while ((datalen = vhost_head_len(sock->sk))) {
+		headcount = vhost_get_desc_n(vq, vq->heads, datalen, &in,
+					     vq_log, &log);
 		/* OK, now we need to know about added descriptors. */
-		if (head == vq->num) {
-			if (unlikely(vhost_enable_notify(vq))) {
+		if (!headcount) {
+			if (retries == 0 && unlikely(vhost_enable_notify(vq))) {
 				/* They have slipped one in as we were
 				 * doing that: check again. */
 				vhost_disable_notify(vq);
+				retries++;
 				continue;
 			}
+			retries = 0;
 			/* Nothing new?  Wait for eventfd to tell us
 			 * they refilled. */
 			break;
 		}
 		/* We don't need to be notified again. */
-		if (out) {
-			vq_err(vq, "Unexpected descriptor format for RX: "
-			       "out %d, int %d\n",
-			       out, in);
-			break;
-		}
-		/* Skip header. TODO: support TSO/mergeable rx buffers. */
+		/* Skip header. TODO: support TSO. */
 		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
 		msg.msg_iovlen = in;
 		len = iov_length(vq->iov, in);
@@ -261,14 +275,33 @@ static void handle_rx(struct vhost_net *
 					 len, MSG_DONTWAIT | MSG_TRUNC);
 		/* TODO: Check specific error and bomb out unless EAGAIN? */
 		if (err < 0) {
-			vhost_discard_vq_desc(vq);
+			vhost_discard_desc(vq, headcount);
 			break;
 		}
 		/* TODO: Should check and handle checksum. */
+		if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF)) {
+			struct virtio_net_hdr_mrg_rxbuf *vhdr =
+				(struct virtio_net_hdr_mrg_rxbuf *)
+				vq->iov[0].iov_base;
+			/* add num_buffers */
+			if (vhost_has_feature(&net->dev,
+					      VHOST_NET_F_VIRTIO_NET_HDR))
+				hdr.num_buffers = headcount;
+			else if (vq->iov[0].iov_len < sizeof(*vhdr)) {
+				vq_err(vq, "tiny buffers < %d unsupported",
+					vq->iov[0].iov_len);
+				vhost_discard_desc(vq, headcount);
+				break;
+			} else if (put_user(headcount, &vhdr->num_buffers)) {
+				vq_err(vq, "Failed num_buffers write");
+				vhost_discard_desc(vq, headcount);
+				break;
+			}
+		}
 		if (err > len) {
 			pr_err("Discarded truncated rx packet: "
 			       " len %d > %zd\n", err, len);
-			vhost_discard_vq_desc(vq);
+			vhost_discard_desc(vq, headcount);
 			continue;
 		}
 		len = err;
@@ -279,7 +312,7 @@ static void handle_rx(struct vhost_net *
 			break;
 		}
 		len += hdr_size;
-		vhost_add_used_and_signal(&net->dev, vq, head, len);
+		vhost_add_used_and_signal_n(&net->dev, vq, vq->heads, headcount);
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, len);
 		total_len += len;
@@ -560,9 +593,14 @@ done:
 
 static int vhost_net_set_features(struct vhost_net *n, u64 features)
 {
-	size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
-		sizeof(struct virtio_net_hdr) : 0;
+	size_t hdr_size = 0;
 	int i;
+
+	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
+		hdr_size = sizeof(struct virtio_net_hdr);
+		if (features & (1 << VIRTIO_NET_F_MRG_RXBUF))
+			hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	}
 	mutex_lock(&n->dev.mutex);
 	if ((features & (1 << VHOST_F_LOG_ALL)) &&
 	    !vhost_log_access_ok(&n->dev)) {
diff -ruNp net-next-p0/drivers/vhost/vhost.c
net-next-v3/drivers/vhost/vhost.c
--- net-next-p0/drivers/vhost/vhost.c	2010-03-22 12:04:38.000000000
-0700
+++ net-next-v3/drivers/vhost/vhost.c	2010-04-06 12:57:51.000000000
-0700
@@ -856,6 +856,47 @@ static unsigned get_indirect(struct vhos
 	return 0;
 }
 
+/* This is a multi-buffer version of vhost_get_vq_desc
+ * @vq		- the relevant virtqueue
+ * datalen	- data length we'll be reading
+ * @iovcount	- returned count of io vectors we fill
+ * @log		- vhost log
+ * @log_num	- log offset
+ *	returns number of buffer heads allocated, 0 on error
+ */
+int vhost_get_desc_n(struct vhost_virtqueue *vq, struct vring_used_elem
*heads,
+		     int datalen, int *iovcount, struct vhost_log *log,
+		     unsigned int *log_num)
+{
+	int out, in;
+	int seg = 0;		/* iov index */
+	int hc = 0;		/* head count */
+
+	while (datalen > 0) {
+		if (hc >= VHOST_NET_MAX_SG)
+			goto err;
+		heads[hc].id = vhost_get_desc(vq->dev, vq, vq->iov+seg,
+					      ARRAY_SIZE(vq->iov)-seg, &out,
+					      &in, log, log_num);
+		if (heads[hc].id == vq->num)
+			goto err;
+		if (out || in <= 0) {
+			vq_err(vq, "unexpected descriptor format for RX: "
+				"out %d, in %d\n", out, in);
+			goto err;
+		}
+		heads[hc].len = iov_length(vq->iov+seg, in);
+		datalen -= heads[hc].len;
+		hc++;
+		seg += in;
+	}
+	*iovcount = seg;
+	return hc;
+err:
+	vhost_discard_desc(vq, hc);
+	return 0;
+}
+
 /* This looks in the virtqueue and for the first available buffer, and
converts
  * it to an iovec for convenient access.  Since descriptors consist of
some
  * number of output then some number of input descriptors, it's
actually two
@@ -863,7 +904,7 @@ static unsigned get_indirect(struct vhos
  *
  * This function returns the descriptor number found, or vq->num (which
  * is never a valid descriptor number) if none was found. */
-unsigned vhost_get_vq_desc(struct vhost_dev *dev, struct
vhost_virtqueue *vq,
+unsigned vhost_get_desc(struct vhost_dev *dev, struct vhost_virtqueue
*vq,
 			   struct iovec iov[], unsigned int iov_size,
 			   unsigned int *out_num, unsigned int *in_num,
 			   struct vhost_log *log, unsigned int *log_num)
@@ -981,31 +1022,42 @@ unsigned vhost_get_vq_desc(struct vhost_
 }
 
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling.
*/
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq)
+void vhost_discard_desc(struct vhost_virtqueue *vq, int n)
 {
-	vq->last_avail_idx--;
+	vq->last_avail_idx -= n;
 }
 
 /* After we've used one of their buffers, we tell them about it.  We'll
then
  * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int
len)
+int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem
*heads,
+		   int count)
 {
 	struct vring_used_elem *used;
+	int start, n;
+
+	if (count <= 0)
+		return -EINVAL;
 
-	/* The virtqueue contains a ring of used buffers.  Get a pointer to
the
-	 * next entry in that used ring. */
-	used = &vq->used->ring[vq->last_used_idx % vq->num];
-	if (put_user(head, &used->id)) {
-		vq_err(vq, "Failed to write used id");
+	start = vq->last_used_idx % vq->num;
+	if (vq->num - start < count)
+		n = vq->num - start;
+	else
+		n = count;
+	used = vq->used->ring + start;
+	if (copy_to_user(used, heads, sizeof(heads[0])*n)) {
+		vq_err(vq, "Failed to write used");
 		return -EFAULT;
 	}
-	if (put_user(len, &used->len)) {
-		vq_err(vq, "Failed to write used len");
-		return -EFAULT;
+	if (n < count) {	/* wrapped the ring */
+		used = vq->used->ring;
+		if (copy_to_user(used, heads+n, sizeof(heads[0])*(count-n))) {
+			vq_err(vq, "Failed to write used");
+			return -EFAULT;
+		}
 	}
 	/* Make sure buffer is written before we update index. */
 	smp_wmb();
-	if (put_user(vq->last_used_idx + 1, &vq->used->idx)) {
+	if (put_user(vq->last_used_idx+count, &vq->used->idx)) {
 		vq_err(vq, "Failed to increment used idx");
 		return -EFAULT;
 	}
@@ -1023,7 +1075,7 @@ int vhost_add_used(struct vhost_virtqueu
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
-	vq->last_used_idx++;
+	vq->last_used_idx += count;
 	return 0;
 }
 
@@ -1049,10 +1101,23 @@ void vhost_signal(struct vhost_dev *dev,
 
 /* And here's the combo meal deal.  Supersize me! */
 void vhost_add_used_and_signal(struct vhost_dev *dev,
-			       struct vhost_virtqueue *vq,
-			       unsigned int head, int len)
+			       struct vhost_virtqueue *vq, unsigned int id,
+			       int len)
+{
+	struct vring_used_elem head;
+
+	head.id = id;
+	head.len = len;
+	vhost_add_used(vq, &head, 1);
+	vhost_signal(dev, vq);
+}
+
+/* multi-buffer version of vhost_add_used_and_signal */
+void vhost_add_used_and_signal_n(struct vhost_dev *dev,
+				 struct vhost_virtqueue *vq,
+				 struct vring_used_elem *heads, int count)
 {
-	vhost_add_used(vq, head, len);
+	vhost_add_used(vq, heads, count);
 	vhost_signal(dev, vq);
 }
 
diff -ruNp net-next-p0/drivers/vhost/vhost.h
net-next-v3/drivers/vhost/vhost.h
--- net-next-p0/drivers/vhost/vhost.h	2010-03-22 12:04:38.000000000
-0700
+++ net-next-v3/drivers/vhost/vhost.h	2010-04-05 20:33:57.000000000
-0700
@@ -85,6 +85,7 @@ struct vhost_virtqueue {
 	struct iovec iov[VHOST_NET_MAX_SG];
 	struct iovec hdr[VHOST_NET_MAX_SG];
 	size_t hdr_size;
+	struct vring_used_elem heads[VHOST_NET_MAX_SG];
 	/* We use a kind of RCU to access private pointer.
 	 * All readers access it from workqueue, which makes it possible to
 	 * flush the workqueue instead of synchronize_rcu. Therefore readers
do
@@ -120,16 +121,22 @@ long vhost_dev_ioctl(struct vhost_dev *,
 int vhost_vq_access_ok(struct vhost_virtqueue *vq);
 int vhost_log_access_ok(struct vhost_dev *);
 
-unsigned vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue
*,
+int vhost_get_desc_n(struct vhost_virtqueue *, struct vring_used_elem
*heads,
+		     int datalen, int *iovcount, struct vhost_log *log,
+		     unsigned int *log_num);
+unsigned vhost_get_desc(struct vhost_dev *, struct vhost_virtqueue *,
 			   struct iovec iov[], unsigned int iov_count,
 			   unsigned int *out_num, unsigned int *in_num,
 			   struct vhost_log *log, unsigned int *log_num);
-void vhost_discard_vq_desc(struct vhost_virtqueue *);
+void vhost_discard_desc(struct vhost_virtqueue *, int);
 
-int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int
len);
-void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
+int vhost_add_used(struct vhost_virtqueue *, struct vring_used_elem
*heads,
+		    int count);
 void vhost_add_used_and_signal(struct vhost_dev *, struct
vhost_virtqueue *,
-			       unsigned int head, int len);
+			       unsigned int id, int len);
+void vhost_add_used_and_signal_n(struct vhost_dev *, struct
vhost_virtqueue *,
+			       struct vring_used_elem *heads, int count);
+void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_virtqueue *);
 
@@ -149,7 +156,8 @@ enum {
 	VHOST_FEATURES = (1 << VIRTIO_F_NOTIFY_ON_EMPTY) |
 			 (1 << VIRTIO_RING_F_INDIRECT_DESC) |
 			 (1 << VHOST_F_LOG_ALL) |
-			 (1 << VHOST_NET_F_VIRTIO_NET_HDR),
+			 (1 << VHOST_NET_F_VIRTIO_NET_HDR) |
+			 (1 << VIRTIO_NET_F_MRG_RXBUF),
 };
 
 static inline int vhost_has_feature(struct vhost_dev *dev, int bit)



^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 20:32 UTC (permalink / raw)
  To: Eric Dumazet, grant.likely
  Cc: netdev, linuxppc-dev, jwboyer, john.williams, michal.simek,
	John Tyner
In-Reply-To: <1270585441.2091.65.camel@edumazet-laptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, April 06, 2010 2:24 PM
> To: grant.likely@secretlab.ca
> Cc: John Linn; netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; jwboyer@linux.vnet.ibm.com;
> john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> 
> Le mardi 06 avril 2010 à 12:53 -0600, Grant Likely a écrit :
> 
> > Hold on.... BUFFER_ALIGN is being used to align the DMA buffer on a
> > cache line boundary.  I don't think netdev_alloc_skb() makes any
> > guarantees about how the start of the IP header lines up against cache
> > line boundaries.  The amount of padding needed is not known until an
> > skbuff is obtained from netdev_alloc_skb(), and
> > netdev_alloc_skb_ip_align() can only handle a fixed size padding,
> >
> > It doesn't look like netdev_alloc_skb_ip_align() is the right thing in
> > this regard.
> >
> 
> OK, time to have a long explanation :
> 
> netdev_alloc_skb_ip_align() is doing the right thing, but it seems you
> guys insist to invent a new private stuff.
> 
> I am only wondering if you really know why you do this.
> 
> Many drivers do have same requirements, so every driver should reinvent
> the wheel ? Really this is beyond me.
> 
> Original code was aligning the buffer on a 32 bytes boundary (because of
> a hardware requirement on NIC, I only trust original code on this).
> 
> Then you want to change this to align buffer on this 32 bytes boundary
> PLUS 2. What is this kind of new requirement ?
> 
> 1) Hardware requirement really changed that much. (firmware changed on
> NIC). If not using this new alignement, NIC doesnt work at all.
> 
> 2) or Microblaze arch requires that IP header is aligned on a word
> boundary to avoid unaligned traps in IP stack ? (like many arches)

Yes.

> 
> If this is the latest requirement, then use standard mechanism.
> skb data is naturally aligned on L1_CACHE_SIZE + SKB_PAD boundaries.
> (32 bytes alignment)
> 
> netdev_alloc_skb_ip_align()() then skips 2 bytes to make skb data
> aligned so that 2 + 14 (sizeof eth header) = 16 : IP header is aligned
> (modulo 16)
> 
> It just works. If not, we should correct it, please fill a bug report.
> 

Trying it now, thanks for the help and patience :)

-- John

> Thanks
> 
> 


This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

^ permalink raw reply

* Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Eric Dumazet @ 2010-04-06 20:24 UTC (permalink / raw)
  To: Grant Likely
  Cc: John Linn, netdev, linuxppc-dev, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <k2mfa686aa41004061153q3d065924o9172b1cf6038d917@mail.gmail.com>

Le mardi 06 avril 2010 à 12:53 -0600, Grant Likely a écrit :

> Hold on.... BUFFER_ALIGN is being used to align the DMA buffer on a
> cache line boundary.  I don't think netdev_alloc_skb() makes any
> guarantees about how the start of the IP header lines up against cache
> line boundaries.  The amount of padding needed is not known until an
> skbuff is obtained from netdev_alloc_skb(), and
> netdev_alloc_skb_ip_align() can only handle a fixed size padding,
> 
> It doesn't look like netdev_alloc_skb_ip_align() is the right thing in
> this regard.
> 

OK, time to have a long explanation :

netdev_alloc_skb_ip_align() is doing the right thing, but it seems you
guys insist to invent a new private stuff.

I am only wondering if you really know why you do this. 

Many drivers do have same requirements, so every driver should reinvent
the wheel ? Really this is beyond me.

Original code was aligning the buffer on a 32 bytes boundary (because of
a hardware requirement on NIC, I only trust original code on this).

Then you want to change this to align buffer on this 32 bytes boundary
PLUS 2. What is this kind of new requirement ? 

1) Hardware requirement really changed that much. (firmware changed on
NIC). If not using this new alignement, NIC doesnt work at all.

2) or Microblaze arch requires that IP header is aligned on a word
boundary to avoid unaligned traps in IP stack ? (like many arches)

If this is the latest requirement, then use standard mechanism.
skb data is naturally aligned on L1_CACHE_SIZE + SKB_PAD boundaries.
(32 bytes alignment)

netdev_alloc_skb_ip_align()() then skips 2 bytes to make skb data
aligned so that 2 + 14 (sizeof eth header) = 16 : IP header is aligned
(modulo 16)

It just works. If not, we should correct it, please fill a bug report.

Thanks

^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 20:12 UTC (permalink / raw)
  To: steve, grant.likely
  Cc: Eric Dumazet, netdev, linuxppc-dev, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <1270584223.3949.57.camel@iscandar.digidescorp.com>

> -----Original Message-----
> From: Steven J. Magnani [mailto:steve@digidescorp.com]
> Sent: Tuesday, April 06, 2010 2:04 PM
> To: grant.likely@secretlab.ca
> Cc: John Linn; Eric Dumazet; netdev@vger.kernel.org;
linuxppc-dev@ozlabs.org;
> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com;
michal.simek@petalogix.com; John Tyner
> Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> 
> On Tue, 2010-04-06 at 12:53 -0600, Grant Likely wrote:
> 
> > Hold on.... BUFFER_ALIGN is being used to align the DMA buffer on a
> > cache line boundary.  I don't think netdev_alloc_skb() makes any
> > guarantees about how the start of the IP header lines up against
cache
> > line boundaries.  The amount of padding needed is not known until an
> > skbuff is obtained from netdev_alloc_skb(), and
> > netdev_alloc_skb_ip_align() can only handle a fixed size padding,
> >
> > It doesn't look like netdev_alloc_skb_ip_align() is the right thing
in
> > this regard.
> 
> __netdev_alloc_skb reserves NET_SKB_PAD bytes which gets us cacheline
> alignment on Microblaze. From include/linux/skbuff.h:
> 

Good find. I'll give it a test on MicroBlaze and PowerPC.

> /*
>  * The networking layer reserves some headroom in skb data (via
>  * dev_alloc_skb). This is used to avoid having to reallocate skb data
> when
>  * the header has to grow. In the default case, if the header has to
> grow
>  * 32 bytes or less we avoid the reallocation.
>  *
>  * Unfortunately this headroom changes the DMA alignment of the
> resulting
>  * network packet. As for NET_IP_ALIGN, this unaligned DMA is
expensive
>  * on some architectures. An architecture can override this value,
>  * perhaps setting it to a cacheline in size (since that will maintain
>  * cacheline alignment of the DMA). It must be a power of 2.
>  *
>  * Various parts of the networking layer expect at least 32 bytes of
>  * headroom, you should not reduce this.
>  */
> #ifndef NET_SKB_PAD
> #define NET_SKB_PAD     32
> #endif
> 
> If this doesn't work for some of the PPC variants with larger cache
> lines, maybe one of the PPC header files needs to define NET_SKB_PAD?

Looks like it is defined in system.h in powerpc so that it works.

> And if we want to guard against possible future changes to the default
> NET_SKB_PAD breaking Microblaze operation, maybe one of its headers
> should define NET_SKB_PAD as well?

Good idea, we can add that to system.h for MicroBlaze also.

Thanks,
John

> 
>
------------------------------------------------------------------------
>  Steven J. Magnani               "I claim this network for MARS!
>  www.digidescorp.com              Earthling, return my space
modulator!"
> 
>  #include <standard.disclaimer>
> 
> 
> 


This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.



^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 20:03 UTC (permalink / raw)
  To: Grant Likely
  Cc: Eric Dumazet, netdev, linuxppc-dev, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <k2mfa686aa41004061153q3d065924o9172b1cf6038d917@mail.gmail.com>

> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Grant Likely
> Sent: Tuesday, April 06, 2010 12:54 PM
> To: John Linn
> Cc: Eric Dumazet; netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; jwboyer@linux.vnet.ibm.com;
> john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> 
> On Tue, Apr 6, 2010 at 11:11 AM, John Linn <John.Linn@xilinx.com> wrote:
> >> -----Original Message-----
> >> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> >> Sent: Tuesday, April 06, 2010 11:01 AM
> >> To: John Linn
> >> Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> >> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> >> Subject: RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> >>
> >> Le mardi 06 avril 2010 à 10:12 -0600, John Linn a écrit :
> >> > > -----Original Message-----
> >> > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> >> > > Sent: Monday, April 05, 2010 3:30 PM
> >> > > To: John Linn
> >> > > Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> >> > > jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> >> > > Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> >> > >
> >> > > Le lundi 05 avril 2010 à 15:11 -0600, John Linn a écrit :
> >> > > > This patch adds support for using the LL TEMAC Ethernet driver on
> >> > > > non-Virtex 5 platforms by adding support for accessing the Soft DMA
> >> > > > registers as if they were memory mapped instead of solely through the
> >> > > > DCR's (available on the Virtex 5).
> >> > > >
> >> > > > The patch also updates the driver so that it runs on the MicroBlaze.
> >> > > > The changes were tested on the PowerPC 440, PowerPC 405, and the
> >> > > > MicroBlaze platforms.
> >> > > >
> >> > > > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> >> > > > Signed-off-by: John Linn <john.linn@xilinx.com>
> >> > > >
> >> > > > ---
> >> > >
> >> > > > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> >> > > > + * needs it.
> >> > > > + */
> >> > > > +
> >> > > >  #define XTE_ALIGN       32
> >> > > > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> >> > > > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> >> > > >
> >> > >
> >> > > Very interesting way of doing this, but why such convoluted thing ?
> >> >
> >> > This is trying to align for a cache line (32 bytes) before my change.
> >> >
> >> > My change was then also making it align the IP data on a word boundary.
> >> >
> >> > >
> >> > > Because of the % 32, this is equivalent to :
> >> > >
> >> > > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> >> > >
> >> >
> >> > Yes, but I'm not sure that's clearer IMHO.
> >> >
> >> > > But wait, dont we recognise the magic constant NET_IP_ALIGN ?
> >> >
> >> > Yes it could be used.  I'm struggling with how to make this all be clearer.
> >> >
> >>
> >> I am not saying its clearer, I am saying we have a standard way to
> >> handle this exact problem (aligning rcvs buffer so that IP header is
> >> aligned)
> >>
> >> There is no need to invent new ones, this makes reviewing of this driver
> >> more difficult.
> 
> Hold on.... BUFFER_ALIGN is being used to align the DMA buffer on a
> cache line boundary.  I don't think netdev_alloc_skb() makes any
> guarantees about how the start of the IP header lines up against cache
> line boundaries.  The amount of padding needed is not known until an
> skbuff is obtained from netdev_alloc_skb(), and
> netdev_alloc_skb_ip_align() can only handle a fixed size padding,
> 
> It doesn't look like netdev_alloc_skb_ip_align() is the right thing in
> this regard.
> 
> >> > How about this?
> >> > #define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)
> >> >
> >>
> >> Sorry, I still dont understand why you need XTE_ALIGN + ...
> >>
> >> ((A + B) - C) % A   is equal to (B - C) % A
> >>
> >> Which one is more readable ?
> >
> > I'm fine with your suggestion.
> >
> > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> >
> >>
> >> Please take a look at existing and clean code, no magic macro, and we
> >> can understand the intention.
> >>
> >> find drivers/net | xargs grep -n netdev_alloc_skb_ip_align
> >>
> >>
> >
> > Yes I see how it's used, but it only allows you to reserve 2 bytes in the skb with no options.
> 
> Eric is here.  The mod operation means that BUFFER_ALIGN using either
> 2 or 34 is equivalent.
> 
> g.

I can spin another patch with the following and with Grant's Kconfig changes, just looking for confirmation that's acceptable.

#define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)

Thanks,
John




This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.



^ permalink raw reply

* Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Steven J. Magnani @ 2010-04-06 20:03 UTC (permalink / raw)
  To: Grant Likely
  Cc: John Linn, Eric Dumazet, netdev, linuxppc-dev, jwboyer,
	john.williams, michal.simek, John Tyner
In-Reply-To: <k2mfa686aa41004061153q3d065924o9172b1cf6038d917@mail.gmail.com>

On Tue, 2010-04-06 at 12:53 -0600, Grant Likely wrote:

> Hold on.... BUFFER_ALIGN is being used to align the DMA buffer on a
> cache line boundary.  I don't think netdev_alloc_skb() makes any
> guarantees about how the start of the IP header lines up against cache
> line boundaries.  The amount of padding needed is not known until an
> skbuff is obtained from netdev_alloc_skb(), and
> netdev_alloc_skb_ip_align() can only handle a fixed size padding,
> 
> It doesn't look like netdev_alloc_skb_ip_align() is the right thing in
> this regard.

__netdev_alloc_skb reserves NET_SKB_PAD bytes which gets us cacheline
alignment on Microblaze. From include/linux/skbuff.h:

/*
 * The networking layer reserves some headroom in skb data (via
 * dev_alloc_skb). This is used to avoid having to reallocate skb data
when
 * the header has to grow. In the default case, if the header has to
grow
 * 32 bytes or less we avoid the reallocation.
 *
 * Unfortunately this headroom changes the DMA alignment of the
resulting
 * network packet. As for NET_IP_ALIGN, this unaligned DMA is expensive
 * on some architectures. An architecture can override this value,
 * perhaps setting it to a cacheline in size (since that will maintain
 * cacheline alignment of the DMA). It must be a power of 2.
 *
 * Various parts of the networking layer expect at least 32 bytes of
 * headroom, you should not reduce this.
 */
#ifndef NET_SKB_PAD
#define NET_SKB_PAD     32
#endif

If this doesn't work for some of the PPC variants with larger cache
lines, maybe one of the PPC header files needs to define NET_SKB_PAD?
And if we want to guard against possible future changes to the default
NET_SKB_PAD breaking Microblaze operation, maybe one of its headers
should define NET_SKB_PAD as well?

------------------------------------------------------------------------
 Steven J. Magnani               "I claim this network for MARS!
 www.digidescorp.com              Earthling, return my space modulator!"

 #include <standard.disclaimer>

^ permalink raw reply

* Re: [Bugme-new] [Bug 15682] New: XFRM is not updating RTAX_ADVMSS metric
From: Eduardo Panisset @ 2010-04-06 19:02 UTC (permalink / raw)
  To: hadi
  Cc: Andrew Morton, Herbert Xu, netdev, bugzilla-daemon, bugme-daemon,
	David S. Miller
In-Reply-To: <1270561240.7198.3.camel@bigi>

Hi,

My intention is only to report a problem that I have faced. The
solution proposed isn't (I know that) probably the best one to adopt
as I'm not a kernel specialist, it is more illustrative to allow you
guys understanding what I'm meaning and solve that on the better way
(hence I haven't submited a patch).

Regards,
Eduardo Panisset.

On Tue, Apr 6, 2010 at 10:40 AM, jamal <hadi@cyberus.ca> wrote:
>
> Herbert would give better answers. I dont think what Eduardo is
> doing is correct. You cant just start factoring in tcp headers
> at the xfrm level - and besides, the mtu calculation
> already takes care tunnel headers - so tcp should be able to
> compute correct MSS.
>
> cheers,
> jamal
>
> On Mon, 2010-04-05 at 12:50 -0700, Andrew Morton wrote:
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Fri, 2 Apr 2010 17:34:35 GMT
>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>
>> > https://bugzilla.kernel.org/show_bug.cgi?id=15682
>> >
>> >            Summary: XFRM is not updating RTAX_ADVMSS metric
>> >            Product: Networking
>> >            Version: 2.5
>> >     Kernel Version: 2.6.28-2
>> >           Platform: All
>> >         OS/Version: Linux
>> >               Tree: Mainline
>> >             Status: NEW
>> >           Severity: normal
>> >           Priority: P1
>> >          Component: Other
>> >         AssignedTo: acme@ghostprotocols.net
>> >         ReportedBy: eduardo.panisset@gmail.com
>> >         Regression: No
>> >
>> >
>> > I have been testing DSMIPv6 code which uses all kind of advanced
>> > features of XFRM framework and I believe I have found a bug related to
>> > update RTAX_ADVMSS route metric.
>> > The XFRM code on net/xfrm/xfrm_policy.c by its functions
>> > xfrm_init_pmtu and xfrm_bundle_ok updates RTAX_MTU route caching
>> > metric however I believe it must update RTAX_ADVMSS as this later is
>> > used by tcp connect function for adverting the MSS value on SYN
>> > messages.
>> >
>> > As MSS is not being updated by XFRM the TCP SYN messages (e.g.
>> > originated from a internet browser)  is erroneously informing its MSS
>> > (without taking into account the overhead added to IP packet size by
>> > XFRM transformations).  One result of that is the browser gets
>> > "frozen" after starts a TCP connection because TCP messages sent by
>> > TCP server will never get to it (TCP server is sending too large
>> > segments to browser).
>> >
>> > Below I describe the changes I have done (on xfrm_init_pmtu and
>> > xfrm_bundle_ok) and that seem to fix this problem:
>> >
>> > xfrm_init_pmtu:
>> >                  .
>> >                  .
>> >                  .
>> >
>> >         dst->metrics[RTAX_MTU-1] = pmtu; // original code, below my changes
>> >
>> >         if (dst->xfrm->props.mode == XFRM_MODE_TUNNEL)
>> >                  switch (dst->xfrm->props.family)
>> >                  {
>> >                  case AF_INET:
>> >                  dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned int,
>> > pmtu - sizeof(struct iphdr) - sizeof(struct tcphdr), 256);
>> >                  break;
>> >
>> >                  case AF_INET6:
>> >                  dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned int,
>> > pmtu - sizeof(struct ipv6hdr) - sizeof(struct tcphdr),
>> >                             dev_net(dst->dev)->ipv6.
>> > sysctl.ip6_rt_min_advmss);
>> >                  break;
>> >                  }
>> >
>> > xfrm_bundle_ok:
>> >
>> >                .
>> >                .
>> >                .
>> >
>> >         dst->metrics[RTAX_MTU-1] = mtu; // original code, below my changes
>> >
>> >         if (dst->xfrm->props.mode == XFRM_MODE_TUNNEL)
>> >                 switch (dst->xfrm->props.family)
>> >                 {
>> >                 case AF_INET:
>> >                         dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned
>> > int, mtu - sizeof(struct iphdr) - sizeof(struct tcphdr), 256);
>> >                 break;
>> >
>> >                 case AF_INET6:
>> >                         dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned
>> > int, mtu - sizeof(struct ipv6hdr) - sizeof(struct tcphdr),
>> >
>> > dev_net(dst->dev)->ipv6.sysctl.ip6_rt_min_advmss);
>> >                 break;
>> >                 }
>> >
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply

* Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Grant Likely @ 2010-04-06 18:53 UTC (permalink / raw)
  To: John Linn
  Cc: Eric Dumazet, netdev, linuxppc-dev, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <bf8eac92-5008-4c1f-9b99-0199e62436fa@SG2EHSMHS010.ehs.local>

On Tue, Apr 6, 2010 at 11:11 AM, John Linn <John.Linn@xilinx.com> wrote:
>> -----Original Message-----
>> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>> Sent: Tuesday, April 06, 2010 11:01 AM
>> To: John Linn
>> Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
>> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
>> Subject: RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
>>
>> Le mardi 06 avril 2010 à 10:12 -0600, John Linn a écrit :
>> > > -----Original Message-----
>> > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>> > > Sent: Monday, April 05, 2010 3:30 PM
>> > > To: John Linn
>> > > Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
>> > > jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
>> > > Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
>> > >
>> > > Le lundi 05 avril 2010 à 15:11 -0600, John Linn a écrit :
>> > > > This patch adds support for using the LL TEMAC Ethernet driver on
>> > > > non-Virtex 5 platforms by adding support for accessing the Soft DMA
>> > > > registers as if they were memory mapped instead of solely through the
>> > > > DCR's (available on the Virtex 5).
>> > > >
>> > > > The patch also updates the driver so that it runs on the MicroBlaze.
>> > > > The changes were tested on the PowerPC 440, PowerPC 405, and the
>> > > > MicroBlaze platforms.
>> > > >
>> > > > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
>> > > > Signed-off-by: John Linn <john.linn@xilinx.com>
>> > > >
>> > > > ---
>> > >
>> > > > +/* Align the IP data in the packet on word boundaries as MicroBlaze
>> > > > + * needs it.
>> > > > + */
>> > > > +
>> > > >  #define XTE_ALIGN       32
>> > > > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
>> > > > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
>> > > >
>> > >
>> > > Very interesting way of doing this, but why such convoluted thing ?
>> >
>> > This is trying to align for a cache line (32 bytes) before my change.
>> >
>> > My change was then also making it align the IP data on a word boundary.
>> >
>> > >
>> > > Because of the % 32, this is equivalent to :
>> > >
>> > > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
>> > >
>> >
>> > Yes, but I'm not sure that's clearer IMHO.
>> >
>> > > But wait, dont we recognise the magic constant NET_IP_ALIGN ?
>> >
>> > Yes it could be used.  I'm struggling with how to make this all be clearer.
>> >
>>
>> I am not saying its clearer, I am saying we have a standard way to
>> handle this exact problem (aligning rcvs buffer so that IP header is
>> aligned)
>>
>> There is no need to invent new ones, this makes reviewing of this driver
>> more difficult.

Hold on.... BUFFER_ALIGN is being used to align the DMA buffer on a
cache line boundary.  I don't think netdev_alloc_skb() makes any
guarantees about how the start of the IP header lines up against cache
line boundaries.  The amount of padding needed is not known until an
skbuff is obtained from netdev_alloc_skb(), and
netdev_alloc_skb_ip_align() can only handle a fixed size padding,

It doesn't look like netdev_alloc_skb_ip_align() is the right thing in
this regard.

>> > How about this?
>> > #define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)
>> >
>>
>> Sorry, I still dont understand why you need XTE_ALIGN + ...
>>
>> ((A + B) - C) % A   is equal to (B - C) % A
>>
>> Which one is more readable ?
>
> I'm fine with your suggestion.
>
> #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
>
>>
>> Please take a look at existing and clean code, no magic macro, and we
>> can understand the intention.
>>
>> find drivers/net | xargs grep -n netdev_alloc_skb_ip_align
>>
>>
>
> Yes I see how it's used, but it only allows you to reserve 2 bytes in the skb with no options.

Eric is here.  The mod operation means that BUFFER_ALIGN using either
2 or 34 is equivalent.

g.

^ permalink raw reply

* Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.
From: Avi Kivity @ 2010-04-06 18:49 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Michael S. Tsirkin, Tom Lendacky, netdev, kvm@vger.kernel.org
In-Reply-To: <1270488911.27874.43.camel@w-sridhar.beaverton.ibm.com>

On 04/05/2010 08:35 PM, Sridhar Samudrala wrote:
> On Sun, 2010-04-04 at 14:14 +0300, Michael S. Tsirkin wrote:
>    
>> On Fri, Apr 02, 2010 at 10:31:20AM -0700, Sridhar Samudrala wrote:
>>      
>>> Make vhost scalable by creating a separate vhost thread per vhost
>>> device. This provides better scaling across multiple guests and with
>>> multiple interfaces in a guest.
>>>        
>> Thanks for looking into this. An alternative approach is
>> to simply replace create_singlethread_workqueue with
>> create_workqueue which would get us a thread per host CPU.
>>
>> It seems that in theory this should be the optimal approach
>> wrt CPU locality, however, in practice a single thread
>> seems to get better numbers. I have a TODO to investigate this.
>> Could you try looking into this?
>>      
> Yes. I tried using create_workqueue(), but the results were not good
> atleast when the number of guest interfaces is less than the number
> of CPUs. I didn't try more than 8 guests.
> Creating a separate thread per guest interface seems to be more
> scalable based on the testing i have done so far.
>    

Thread per guest is also easier to account.  I'm worried about guests 
impacting other guests' performance outside scheduler control by 
extensive use of vhost.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply

* Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Grant Likely @ 2010-04-06 17:54 UTC (permalink / raw)
  To: John Linn
  Cc: Eric Dumazet, linuxppc-dev, netdev, John Tyner, michal.simek,
	john.williams
In-Reply-To: <7a42d507-bf50-40f3-a2a0-8de682b314d7@SG2EHSMHS017.ehs.local>

On Mon, Apr 5, 2010 at 3:33 PM, John Linn <John.Linn@xilinx.com> wrote:
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>> > +/* Align the IP data in the packet on word boundaries as MicroBlaze
>> > + * needs it.
>> > + */
>> > +
>> >  #define XTE_ALIGN       32
>> > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
>> > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
>> >
>>
>> Very interesting way of doing this, but why such convoluted thing ?
>
> Grant might have insight into why this started this way, I just updated to help with MicroBlaze alignment.

It was that way in the code I received from Yoshio Kashiwagi and David
H. Lynch.  I have no problem with it being changed.

Personally I would probably write a followup patch to change this to
netdev_alloc_skb_ip_align(), just to keep changes logically separated.
 But I'm okay with it rolled into this patch also.

g.

^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Eric Dumazet @ 2010-04-06 17:47 UTC (permalink / raw)
  To: John Linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <bf8eac92-5008-4c1f-9b99-0199e62436fa@SG2EHSMHS010.ehs.local>

> 
> Yes I see how it's used, but it only allows you to reserve 2 bytes in the skb with no options.

Really ? This would be a bug !

__alloc_skb() uses kmalloc(XXXX), this gives you the guarantee you want,
or maybe comment you wrote is not what is _really_ necessary ?

/* Align the IP data in the packet on word boundaries as MicroBlaze
 * needs it.
 */

^ permalink raw reply

* Re: [RFC][PATCH] ipmr:  Fix struct mfcctl to be independent of MAXVIFS v2
From: Ben Greear @ 2010-04-06 16:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, David S. Miller, Eric Dumazet, Patrick McHardy, Ilia K,
	Tom Goff
In-Reply-To: <m1eiis8uc6.fsf@fess.ebiederm.org>

On 04/06/2010 08:38 AM, Eric W. Biederman wrote:
>
> Right now if you recompile the kernel increasing MAXVIFS
> to support more VIFS users of the MRT_ADD_VIF and MRT_DEL_VIF
> will break because the ABI changed.
>
> My goal is an API that works with just a recompile of existing
> applications, and an ABI that continues to work for old
> applications.
>
> The unused/dead fields at the end of struct mfcctl make this
> exercise more difficult than it should be.
>
> - Rename the existing struct mfcctl mfcctl_old.
> - Define a new and larger struct mfcctl that we can detect
>    by size.
>
>    The new and larger struct mfcctl won't have trailing garbage
>    fields so we can accept anything of that size or larger,
>    and simply ignore the entries that are above MAXVIFS.
>
> My new struct mfcctl is now 128 bytes which is noticeable on
> the stack but should still be small enough not to cause problems.
>
> v2:  Rework the support larger arrays so that most/all? existing
>     applications can simply be recompiled and work with a larger
>     maximum number of VIFS.

If we're going to change the ABI, can we not support an arbitrary
number of VIFS instead of just a larger fixed maximum?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* e1000 TX unit hang and a possibly kernel oops
From: Breno Leitao @ 2010-04-06 17:23 UTC (permalink / raw)
  To: e1000-devel@lists.sourceforge.net, netdev; +Cc: kamaleshb

Hi, 

During a test workload on kernel 2.6.27.19-5-ppc64 (SLES11), we are facing
a "TX Unit Hang" issue, and then a kernel oops.

We also tried the same workload with the sourceforge device driver and the
problem is also reproducible. When  TSO is off, the problem don't appear on
both driver. 

I am not sure if the TX hang issue could be related to oops that happen in the
virtual memory subsystem, that is why I decided to ask you guys.

Thanks, 

Mar 17 02:34:34 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:34 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:34 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:34 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:34 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:34 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:34 c862f3sq03 kernel:   jiffies              <10053bcbc>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:36 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:36 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:36 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:36 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:36 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:36 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:36 c862f3sq03 kernel:   jiffies              <10053bd84>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:38 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:38 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:38 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:38 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:38 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:38 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:38 c862f3sq03 kernel:   jiffies              <10053be4c>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:40 c862f3sq03 kernel: ------------[ cut here ]------------
Mar 17 02:34:40 c862f3sq03 kernel: Badness at net/sched/sch_generic.c:219
Mar 17 02:34:40 c862f3sq03 kernel: NIP: c000000000498eac LR: c000000000498d8c CTR: 0000000000000001
Mar 17 02:34:40 c862f3sq03 kernel: REGS: c00000000f68fa80 TRAP: 0700   Tainted: G           (2.6.27.19-5-ppc64)
Mar 17 02:34:40 c862f3sq03 kernel: MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 88000022  XER: 00000010
Mar 17 02:34:40 c862f3sq03 kernel: TASK = c000000000918340[0] 'swapper' THREAD: c0000000009d0000 CPU: 0
Mar 17 02:34:40 c862f3sq03 kernel: GPR00: 0000000000000000 c00000000f68fd00 c0000000009cbc80 0000000000000080 
Mar 17 02:34:40 c862f3sq03 kernel: GPR04: 0000000000000000 c000000000498ce4 c0000000009bb550 c0000003be260980 
Mar 17 02:34:40 c862f3sq03 kernel: GPR08: 0000000000000002 c000000000c91e68 0000000000000080 ffffffffffffff01 
Mar 17 02:34:40 c862f3sq03 kernel: GPR12: 0000000000000000 c000000000a92c80 0000000000051bc3 0000000000051aa1 
Mar 17 02:34:40 c862f3sq03 kernel: GPR16: 0000000000051bbb c000000000a51280 0000000000000000 0000000000000000 
Mar 17 02:34:40 c862f3sq03 kernel: GPR20: c000000000bda198 c000000000bda598 c000000000bda998 0000000000000002 
Mar 17 02:34:40 c862f3sq03 kernel: GPR24: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000 
Mar 17 02:34:40 c862f3sq03 kernel: GPR28: c0000003be2c9400 0000000000000001 c000000000961118 c0000003be260980 
Mar 17 02:34:40 c862f3sq03 kernel: NIP [c000000000498eac] .dev_watchdog+0x190/0x2cc
Mar 17 02:34:40 c862f3sq03 kernel: LR [c000000000498d8c] .dev_watchdog+0x70/0x2cc
Mar 17 02:34:40 c862f3sq03 kernel: Call Trace:
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68fd00] [c000000000498d8c] .dev_watchdog+0x70/0x2cc (unreliable)
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68fdc0] [c00000000009cec4] .run_timer_softirq+0x1e8/0x2c4
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68fec0] [c00000000009705c] .__do_softirq+0x13c/0x284
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68ff90] [c00000000002ab4c] .call_do_softirq+0x14/0x24
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3880] [c00000000000db5c] .do_softirq+0x88/0xf0
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3920] [c000000000096e38] .irq_exit+0x5c/0xb4
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d39a0] [c000000000027c1c] .timer_interrupt+0xd8/0x104
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3a30] [c000000000003718] decrementer_common+0x118/0x180
Mar 17 02:34:40 c862f3sq03 kernel: --- Exception: 901 at .pseries_dedicated_idle_sleep+0xf0/0x1b8
Mar 17 02:34:40 c862f3sq03 kernel:     LR = .pseries_dedicated_idle_sleep+0xe0/0x1b8
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3d20] [c0000000000501b8] .pseries_dedicated_idle_sleep+0x80/0x1b8 (unreliable)
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3dd0] [c000000000013114] .cpu_idle+0xfc/0x1a4
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3e60] [c00000000051b124] .rest_init+0x7c/0x94
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3ee0] [c000000000760d18] .start_kernel+0x52c/0x554
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3f90] [c000000000008568] .start_here_common+0x3c/0x54
Mar 17 02:34:40 c862f3sq03 kernel: Instruction dump:
Mar 17 02:34:40 c862f3sq03 kernel: 48000064 e97e8018 e81c0330 e93c033a 7d290214 e80b0000 7d604851 40800048 
Mar 17 02:34:40 c862f3sq03 kernel: e93e8028 80090000 2f800000 40fe0014 <0fe00000> 38000001 e93e8028 90090000 
Mar 17 02:34:40 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:40 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:40 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:40 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:40 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:40 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:40 c862f3sq03 kernel:   jiffies              <10053bf14>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:44 c862f3sq03 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
--snip--
Mar 17 15:31:05 c862f3sq04 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:31:05 c862f3sq04 kernel:   Tx Queue             <0>
Mar 17 15:31:05 c862f3sq04 kernel:   TDH                  <98>
Mar 17 15:31:05 c862f3sq04 kernel:   TDT                  <93>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_use          <93>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_clean        <97>
Mar 17 15:31:05 c862f3sq04 kernel: buffer_info[next_to_clean]
Mar 17 15:31:05 c862f3sq04 kernel:   time_stamp           <10080baf1>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_watch        <9a>
Mar 17 15:31:05 c862f3sq04 kernel:   jiffies              <10080bca8>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_watch.status <0>
Mar 17 15:31:06 c862f3sq04 kernel: ------------[ cut here ]------------
Mar 17 15:31:06 c862f3sq04 kernel: Badness at net/sched/sch_generic.c:219
Mar 17 15:31:06 c862f3sq04 kernel: NIP: c000000000498eac LR: c000000000498d8c CTR: 0000000000000001
Mar 17 15:31:06 c862f3sq04 kernel: REGS: c00000000f68fa80 TRAP: 0700   Tainted: G           (2.6.27.19-5-ppc64)
Mar 17 15:31:06 c862f3sq04 kernel: MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 88000022  XER: 00000010
Mar 17 15:31:06 c862f3sq04 kernel: TASK = c000000000918340[0] 'swapper' THREAD: c0000000009d0000 CPU: 0
Mar 17 15:31:06 c862f3sq04 kernel: GPR00: 0000000000000000 c00000000f68fd00 c0000000009cbc80 0000000000000080 
Mar 17 15:31:06 c862f3sq04 kernel: GPR04: 0000000000000000 c000000000498ce4 c0000000009bb550 c0000001dcf6d180 
Mar 17 15:31:06 c862f3sq04 kernel: GPR08: 0000000000000002 c000000000c91e68 0000000000000080 ffffffffffffffd9 
Mar 17 15:31:06 c862f3sq04 kernel: GPR12: 0000000000000000 c000000000a92c80 0000000000051bc3 0000000000051aa1 
Mar 17 15:31:06 c862f3sq04 kernel: GPR16: 0000000000051bbb c000000000a51280 0000000000000000 0000000000000000 
Mar 17 15:31:06 c862f3sq04 kernel: GPR20: c000000000bda198 c000000000bda598 c000000000bda998 0000000000000002 
Mar 17 15:31:06 c862f3sq04 kernel: GPR24: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000 
Mar 17 15:31:06 c862f3sq04 kernel: GPR28: c0000001dcb60380 0000000000000001 c000000000961118 c0000001dcf6d180 
Mar 17 15:31:06 c862f3sq04 kernel: NIP [c000000000498eac] .dev_watchdog+0x190/0x2cc
Mar 17 15:31:06 c862f3sq04 kernel: LR [c000000000498d8c] .dev_watchdog+0x70/0x2cc
Mar 17 15:31:06 c862f3sq04 kernel: Call Trace:
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68fd00] [c000000000498d8c] .dev_watchdog+0x70/0x2cc (unreliable)
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68fdc0] [c00000000009cec4] .run_timer_softirq+0x1e8/0x2c4
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68fec0] [c00000000009705c] .__do_softirq+0x13c/0x284
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68ff90] [c00000000002ab4c] .call_do_softirq+0x14/0x24
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3880] [c00000000000db5c] .do_softirq+0x88/0xf0
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3920] [c000000000096e38] .irq_exit+0x5c/0xb4
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d39a0] [c000000000027c1c] .timer_interrupt+0xd8/0x104
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3a30] [c000000000003718] decrementer_common+0x118/0x180
Mar 17 15:31:06 c862f3sq04 kernel: --- Exception: 901 at .pseries_dedicated_idle_sleep+0xe8/0x1b8
Mar 17 15:31:06 c862f3sq04 kernel:     LR = .pseries_dedicated_idle_sleep+0xe0/0x1b8
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3d20] [c0000000000501b8] .pseries_dedicated_idle_sleep+0x80/0x1b8 (unreliable)
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3dd0] [c000000000013114] .cpu_idle+0xfc/0x1a4
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3e60] [c00000000051b124] .rest_init+0x7c/0x94
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3ee0] [c000000000760d18] .start_kernel+0x52c/0x554
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3f90] [c000000000008568] .start_here_common+0x3c/0x54
Mar 17 15:31:06 c862f3sq04 kernel: Instruction dump:
Mar 17 15:31:06 c862f3sq04 kernel: 48000064 e97e8018 e81c0330 e93c033a 7d290214 e80b0000 7d604851 40800048 
Mar 17 15:31:06 c862f3sq04 kernel: e93e8028 80090000 2f800000 40fe0014 <0fe00000> 38000001 e93e8028 90090000 
Mar 17 15:31:09 c862f3sq04 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
--snip--
Mar 17 15:59:54 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:59:54 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 15:59:54 c862f3sq02 kernel:   TDH                  <3d>
Mar 17 15:59:54 c862f3sq02 kernel:   TDT                  <1f>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_use          <1f>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_clean        <3c>
Mar 17 15:59:54 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 15:59:54 c862f3sq02 kernel:   time_stamp           <1004d6039>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_watch        <41>
Mar 17 15:59:54 c862f3sq02 kernel:   jiffies              <1004d6164>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 15:59:54 c862f3sq02 kernel: klogd 1.4.1, ---------- state change ----------
Mar 17 15:59:56 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:59:56 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 15:59:56 c862f3sq02 kernel:   TDH                  <3d>
Mar 17 15:59:56 c862f3sq02 kernel:   TDT                  <1f>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_use          <1f>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_clean        <3c>
Mar 17 15:59:56 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 15:59:56 c862f3sq02 kernel:   time_stamp           <1004d6039>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_watch        <41>
Mar 17 15:59:56 c862f3sq02 kernel:   jiffies              <1004d622c>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 15:59:58 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:59:58 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 15:59:58 c862f3sq02 kernel:   TDH                  <3d>
Mar 17 15:59:58 c862f3sq02 kernel:   TDT                  <1f>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_use          <1f>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_clean        <3c>
Mar 17 15:59:58 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 15:59:58 c862f3sq02 kernel:   time_stamp           <1004d6039>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_watch        <41>
Mar 17 15:59:58 c862f3sq02 kernel:   jiffies              <1004d62f4>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 16:00:03 c862f3sq02 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX

Mar 17 17:42:59 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 17:42:59 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 17:42:59 c862f3sq02 kernel:   TDH                  <ec>
Mar 17 17:42:59 c862f3sq02 kernel:   TDT                  <e8>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_use          <e8>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_clean        <ec>
Mar 17 17:42:59 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 17:42:59 c862f3sq02 kernel:   time_stamp           <10056d08d>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_watch        <f0>
Mar 17 17:42:59 c862f3sq02 kernel:   jiffies              <10056d168>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 17:43:01 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 17:43:01 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 17:43:01 c862f3sq02 kernel:   TDH                  <ec>
Mar 17 17:43:01 c862f3sq02 kernel:   TDT                  <e8>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_use          <e8>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_clean        <ec>
Mar 17 17:43:01 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 17:43:01 c862f3sq02 kernel:   time_stamp           <10056d08d>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_watch        <f0>
Mar 17 17:43:01 c862f3sq02 kernel:   jiffies              <10056d230>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 17:43:06 c862f3sq02 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: R
--snip--

Mar 17 18:43:32 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:32 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:32 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:32 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:32 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:32 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:32 c862f3sq03 kernel:   jiffies              <100ac72e4>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 18:43:32 c862f3sq03 kernel: klogd 1.4.1, ---------- state change ----------
Mar 17 18:43:34 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:34 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:34 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:34 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:34 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:34 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:34 c862f3sq03 kernel:   jiffies              <100ac73ac>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 18:43:36 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:36 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:36 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:36 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:36 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:36 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:36 c862f3sq03 kernel:   jiffies              <100ac7474>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 18:43:38 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:38 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:38 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:38 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:38 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:38 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:38 c862f3sq03 kernel:   jiffies              <100ac753c>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_watch.status <0>
--snip--
Mar 17 19:21:15 c862f3sq02 kernel: Unable to handle kernel paging request for data at address 0x00000000
Mar 17 19:21:15 c862f3sq02 kernel: Faulting instruction address: 0xc00000000010776c
Mar 17 19:21:15 c862f3sq02 kernel: Oops: Kernel access of bad area, sig: 11 [#1]
Mar 17 19:21:15 c862f3sq02 kernel: SMP NR_CPUS=1024 NUMA pSeries
Mar 17 19:21:15 c862f3sq02 kernel: Modules linked in: mmfs26(X) mmfslinux(X) tracedev(X) nfs lockd nfs_acl sunrpc dm_round_robin scsi_dh_rdac dm_multipath scsi_dh ipv6 af_packet fuse loop dm_mod lpfc sr_mod cdrom ses scsi_transport_fc sg enclosure e100 e1000 scsi_tgt ib_ehca mii ib_core sd_mod crc_t10dif ipr(X) pata_pdc2027x libata scsi_mod
Mar 17 19:21:15 c862f3sq02 kernel: Supported: Yes, External
Mar 17 19:21:15 c862f3sq02 kernel: NIP: c00000000010776c LR: c000000000107734 CTR: 0000000000000000
Mar 17 19:21:15 c862f3sq02 kernel: REGS: c0000001d0f6b560 TRAP: 0300   Tainted: G        W  (2.6.27.19-5-ppc64)
Mar 17 19:21:15 c862f3sq02 kernel: MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44000002  XER: 00000010
Mar 17 19:21:15 c862f3sq02 kernel: DAR: 0000000000000000, DSISR: 0000000040000000
Mar 17 19:21:15 c862f3sq02 kernel: TASK = c0000001db8bada0[11171] 'bash' THREAD: c0000001d0f68000 CPU: 2
Mar 17 19:21:15 c862f3sq02 kernel: GPR00: 0000000000000000 c0000001d0f6b7e0 c0000000009cbc80 0000000000000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR04: c0000001da3c2300 00000000000000d0 0000000000000000 c0000001a7310000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR08: 000000000001a731 00000000000001a7 0000000000000000 0000000000000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR12: 0000000000000005 c000000000a93080 c000000000094818 0000000000000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000002000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR20: 00006b3600000093 c0000001da3c2300 000004000003dfd0 c0000001da55c5b0 
Mar 17 19:21:15 c862f3sq02 kernel: GPR24: c0000001db890000 c0000001a7310018 c000000001f50c50 c000000001fa3600 
Mar 17 19:21:15 c862f3sq02 kernel: GPR28: c0000000020d2c00 0000000000000000 c00000000094b070 00006b3600000093 
Mar 17 19:21:15 c862f3sq02 kernel: NIP [c00000000010776c] .do_wp_page+0x520/0x830
Mar 17 19:21:15 c862f3sq02 kernel: LR [c000000000107734] .do_wp_page+0x4e8/0x830
Mar 17 19:21:15 c862f3sq02 kernel: Call Trace:
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6b7e0] [c0000000001076f4] .do_wp_page+0x4a8/0x830 (unreliable)
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6b8b0] [c000000000109be8] .handle_mm_fault+0x36c/0x478
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6b990] [c000000000518b10] .do_page_fault+0x3c4/0x5c0
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6bac0] [c00000000000567c] handle_page_fault+0x20/0x5c
Mar 17 19:21:15 c862f3sq02 kernel: --- Exception: 301 at .schedule_tail+0x78/0x94
Mar 17 19:21:15 c862f3sq02 kernel:     LR = .schedule_tail+0x70/0x94
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6bdb0] [c00000000008bdd8] .schedule_tail+0x2c/0x94 (unreliable)
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6be30] [c000000000008910] .ret_from_fork+0x4/0x74
Mar 17 19:21:15 c862f3sq02 kernel: Instruction dump:
Mar 17 19:21:15 c862f3sq02 kernel: 409e02c8 e8f80000 3c004000 e97e8008 39400000 780007c6 78e905a4 7d290214 
Mar 17 19:21:15 c862f3sq02 kernel: 7920e120 79288402 78001f24 79294602 <7d6b002a> 2fab0000 419e000c 79202428 
Mar 17 19:21:15 c862f3sq02 kernel: ---[ end trace e6c07b27e638f8ff ]---


^ permalink raw reply

* Re: [RFC][PATCH] ipmr:  Fix struct mfcctl to be independent of MAXVIFS v2
From: Eric W. Biederman @ 2010-04-06 17:23 UTC (permalink / raw)
  To: Ben Greear
  Cc: netdev, David S. Miller, Eric Dumazet, Patrick McHardy, Ilia K,
	Tom Goff
In-Reply-To: <4BBB67FE.6020209@candelatech.com>

Ben Greear <greearb@candelatech.com> writes:

> On 04/06/2010 08:38 AM, Eric W. Biederman wrote:
>>
>> Right now if you recompile the kernel increasing MAXVIFS
>> to support more VIFS users of the MRT_ADD_VIF and MRT_DEL_VIF
>> will break because the ABI changed.
>>
>> My goal is an API that works with just a recompile of existing
>> applications, and an ABI that continues to work for old
>> applications.
>>
>> The unused/dead fields at the end of struct mfcctl make this
>> exercise more difficult than it should be.
>>
>> - Rename the existing struct mfcctl mfcctl_old.
>> - Define a new and larger struct mfcctl that we can detect
>>    by size.
>>
>>    The new and larger struct mfcctl won't have trailing garbage
>>    fields so we can accept anything of that size or larger,
>>    and simply ignore the entries that are above MAXVIFS.
>>
>> My new struct mfcctl is now 128 bytes which is noticeable on
>> the stack but should still be small enough not to cause problems.
>>
>> v2:  Rework the support larger arrays so that most/all? existing
>>     applications can simply be recompiled and work with a larger
>>     maximum number of VIFS.
>
> If we're going to change the ABI, can we not support an arbitrary
> number of VIFS instead of just a larger fixed maximum?

The ABI as I have specified should work for any larger structure than
I have specified.  But like select many applications will limit themselves
to use the definition of struct mfcctl that is passed to them.

Eric

^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 17:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <1270573233.2081.47.camel@edumazet-laptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, April 06, 2010 11:01 AM
> To: John Linn
> Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> Subject: RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> 
> Le mardi 06 avril 2010 à 10:12 -0600, John Linn a écrit :
> > > -----Original Message-----
> > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > > Sent: Monday, April 05, 2010 3:30 PM
> > > To: John Linn
> > > Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> > > jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> > > Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> > >
> > > Le lundi 05 avril 2010 à 15:11 -0600, John Linn a écrit :
> > > > This patch adds support for using the LL TEMAC Ethernet driver on
> > > > non-Virtex 5 platforms by adding support for accessing the Soft DMA
> > > > registers as if they were memory mapped instead of solely through the
> > > > DCR's (available on the Virtex 5).
> > > >
> > > > The patch also updates the driver so that it runs on the MicroBlaze.
> > > > The changes were tested on the PowerPC 440, PowerPC 405, and the
> > > > MicroBlaze platforms.
> > > >
> > > > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> > > > Signed-off-by: John Linn <john.linn@xilinx.com>
> > > >
> > > > ---
> > >
> > > > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> > > > + * needs it.
> > > > + */
> > > > +
> > > >  #define XTE_ALIGN       32
> > > > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> > > > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> > > >
> > >
> > > Very interesting way of doing this, but why such convoluted thing ?
> >
> > This is trying to align for a cache line (32 bytes) before my change.
> >
> > My change was then also making it align the IP data on a word boundary.
> >
> > >
> > > Because of the % 32, this is equivalent to :
> > >
> > > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> > >
> >
> > Yes, but I'm not sure that's clearer IMHO.
> >
> > > But wait, dont we recognise the magic constant NET_IP_ALIGN ?
> >
> > Yes it could be used.  I'm struggling with how to make this all be clearer.
> >
> 
> I am not saying its clearer, I am saying we have a standard way to
> handle this exact problem (aligning rcvs buffer so that IP header is
> aligned)
> 
> There is no need to invent new ones, this makes reviewing of this driver
> more difficult.
> 
> 
> > How about this?
> > #define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)
> >
> 
> Sorry, I still dont understand why you need XTE_ALIGN + ...
> 
> ((A + B) - C) % A   is equal to (B - C) % A
> 
> Which one is more readable ?

I'm fine with your suggestion.

#define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)

> 
> Please take a look at existing and clean code, no magic macro, and we
> can understand the intention.
> 
> find drivers/net | xargs grep -n netdev_alloc_skb_ip_align
> 
> 

Yes I see how it's used, but it only allows you to reserve 2 bytes in the skb with no options.




This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Eric Dumazet @ 2010-04-06 17:00 UTC (permalink / raw)
  To: John Linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <2fefb2a2-d0dc-461d-ac8c-3e7d177b7cf8@VA3EHSMHS032.ehs.local>

Le mardi 06 avril 2010 à 10:12 -0600, John Linn a écrit :
> > -----Original Message-----
> > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > Sent: Monday, April 05, 2010 3:30 PM
> > To: John Linn
> > Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> > jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> > Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> > 
> > Le lundi 05 avril 2010 à 15:11 -0600, John Linn a écrit :
> > > This patch adds support for using the LL TEMAC Ethernet driver on
> > > non-Virtex 5 platforms by adding support for accessing the Soft DMA
> > > registers as if they were memory mapped instead of solely through the
> > > DCR's (available on the Virtex 5).
> > >
> > > The patch also updates the driver so that it runs on the MicroBlaze.
> > > The changes were tested on the PowerPC 440, PowerPC 405, and the
> > > MicroBlaze platforms.
> > >
> > > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> > > Signed-off-by: John Linn <john.linn@xilinx.com>
> > >
> > > ---
> > 
> > > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> > > + * needs it.
> > > + */
> > > +
> > >  #define XTE_ALIGN       32
> > > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> > > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> > >
> > 
> > Very interesting way of doing this, but why such convoluted thing ?
> 
> This is trying to align for a cache line (32 bytes) before my change.
> 
> My change was then also making it align the IP data on a word boundary. 
> 
> > 
> > Because of the % 32, this is equivalent to :
> > 
> > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> > 
> 
> Yes, but I'm not sure that's clearer IMHO.
> 
> > But wait, dont we recognise the magic constant NET_IP_ALIGN ?
> 
> Yes it could be used.  I'm struggling with how to make this all be clearer.
> 

I am not saying its clearer, I am saying we have a standard way to
handle this exact problem (aligning rcvs buffer so that IP header is
aligned)

There is no need to invent new ones, this makes reviewing of this driver
more difficult.


> How about this?
> #define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)
> 

Sorry, I still dont understand why you need XTE_ALIGN + ...

((A + B) - C) % A   is equal to (B - C) % A

Which one is more readable ?

Please take a look at existing and clean code, no magic macro, and we
can understand the intention.

find drivers/net | xargs grep -n netdev_alloc_skb_ip_align



^ permalink raw reply

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Patrick McHardy @ 2010-04-06 16:37 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1004061814120.13186@obet.zrqbmnf.qr>

Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>> Or should we be using skb_alloc and copying the data portion over, like 
>>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>> I guess pskb_copy() would be most optimal since we can modify
>> the header, but the non-linear area could be shared
> 
> Trying to improve my understanding: when doing skb_pull,
> does the skb->head that is relevant for pskb_copy move?

skb_pull() only changes skb->data.

^ permalink raw reply

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Jan Engelhardt @ 2010-04-06 16:14 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev
In-Reply-To: <4BB49E10.8080608@trash.net>


On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>> Or should we be using skb_alloc and copying the data portion over, like 
>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>
>I guess pskb_copy() would be most optimal since we can modify
>the header, but the non-linear area could be shared

Trying to improve my understanding: when doing skb_pull,
does the skb->head that is relevant for pskb_copy move?

^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 16:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <1270502993.9013.36.camel@edumazet-laptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Monday, April 05, 2010 3:30 PM
> To: John Linn
> Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> 
> Le lundi 05 avril 2010 à 15:11 -0600, John Linn a écrit :
> > This patch adds support for using the LL TEMAC Ethernet driver on
> > non-Virtex 5 platforms by adding support for accessing the Soft DMA
> > registers as if they were memory mapped instead of solely through the
> > DCR's (available on the Virtex 5).
> >
> > The patch also updates the driver so that it runs on the MicroBlaze.
> > The changes were tested on the PowerPC 440, PowerPC 405, and the
> > MicroBlaze platforms.
> >
> > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> > Signed-off-by: John Linn <john.linn@xilinx.com>
> >
> > ---
> 
> > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> > + * needs it.
> > + */
> > +
> >  #define XTE_ALIGN       32
> > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> >
> 
> Very interesting way of doing this, but why such convoluted thing ?

This is trying to align for a cache line (32 bytes) before my change.

My change was then also making it align the IP data on a word boundary. 

> 
> Because of the % 32, this is equivalent to :
> 
> #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> 

Yes, but I'm not sure that's clearer IMHO.

> But wait, dont we recognise the magic constant NET_IP_ALIGN ?

Yes it could be used.  I'm struggling with how to make this all be clearer.

How about this?

#define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)

> 
> So, I ask, cant you use netdev_alloc_skb_ip_align() in this driver ?

From what I can tell, this wouldn't work as it only reserves the 2 bytes to align with 
a word boundary.

Thanks,
John

> 
> 


This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

^ permalink raw reply

* Re: HFSC classes going out of bounds, regression in recent kernels?
From: Denys Fedorysychenko @ 2010-04-06 15:45 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Jeff Garzik, Eric Dumazet
In-Reply-To: <4BBB5543.40607@trash.net>

On Tuesday 06 April 2010 18:37:39 Patrick McHardy wrote:
> Denys Fedorysychenko wrote:
> > I notice on one of my QoS machines that HFSC start going out of bandwidth
> > limits. The most terrible thing - it happens suddenly, and if i just
> > relaunch QoS script - everything will work fine.
> 
> That sounds like there's an overflow somewhere.
> 
> > I'm not sure it is not my mistake, but most probably it is a bug.
> > I can't tell for sure when it is happened, last kernel was on this
> > machine 2.6.28 i guess, or maybe even older.
> 
> Looking through the recent patches in this area, my prime suspect
> is the attached patch. Does reverting it make any difference?
> 
I will try to upgrade soon, it is critical router, so probably i will do this 
tonight. 
I guess with reverting this patch also it will hurt shaper resolution on high 
speeds... not a case for me, but for other people.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox