Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
From: David Miller @ 2014-10-07 16:08 UTC (permalink / raw)
  To: David.Laight
  Cc: willemb, john.fastabend, dborkman, fw, gerlitz.or, hannes, netdev,
	john.ronciak, amirv, eric.dumazet, danny.zhou
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D174C621F@AcuExch.aculab.com>

From: David Laight <David.Laight@ACULAB.COM>
Date: Tue, 7 Oct 2014 15:59:35 +0000

> From: David 
>> From: David Laight <David.Laight@ACULAB.COM>
>> Date: Tue, 7 Oct 2014 09:27:03 +0000
>> 
>> > That is (probably) the only scheme that stops the application
>> > accessing random parts of physical memory.
>> 
>> I don't know where this claim keeps coming from, it's false.
>> 
>> The application has to attach memory to the ring, and then the
>> ring can only refer to that memory for the duration of the
>> session.
>> 
>> There is no way that the user can program the address field of the
>> descriptors to point at arbitrary physical memory locations.
>> 
>> There is protection and control.
> 
> I got the impression that the application was directly writing the ring
> structure that the ethernet mac hardware uses to describe tx and rx buffers.
> (ie they are mapped read-write into userspace).
> Unless you have a system where you can limit the physical memory
> ranges accessible to the mac hardware, I don't see how you can stop
> the application putting rogue values into the ring.
> 
> Clearly I'm missing something in my quick read of the change.

No, I think I misunderstood, and apparently the Mellanox driver allows
the user to crap into arbitrary physical memory too.

All of this garbage must get fixed and this feature is a non-starter
until there is control over the memory the rings can point ti.

^ permalink raw reply

* Re: [PATCH net-next] net: bcmgenet: fix Tx ring priority programming
From: Petri Gynther @ 2014-10-07 16:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Florian Fainelli
In-Reply-To: <20141006.235926.30933585433924137.davem@davemloft.net>

Hi David,

On Mon, Oct 6, 2014 at 8:59 PM, David Miller <davem@davemloft.net> wrote:
> From: Petri Gynther <pgynther@google.com>
> Date: Mon,  6 Oct 2014 17:50:01 -0700 (PDT)
>
>> @@ -1731,11 +1744,12 @@ static void bcmgenet_init_multiq(struct net_device *dev)
>>       reg |= ring_cfg;
>>       bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
>>
>> -     /* Use configured rings priority and set ring #16 priority */
>> -     reg = bcmgenet_tdma_readl(priv, DMA_RING_PRIORITY);
>> -     reg |= ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
>> -     reg |= dma_priority;
>> -     bcmgenet_tdma_writel(priv, reg, DMA_PRIORITY);
>> +     /* Set ring 16 priority and program the hardware registers */
>> +     dma_priority[2] |=
>> +             ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
>
> Please use "<< (16 - 12) * DMA_RING_BUF_PRIORITY_SHIFT" otherwise this
> constant is magic.
>
> You might, optionally, add macros for the subtraction adjustment each
> priority register uses (0, 6, 12, respectively).

Thanks for the comments. I'm going to simplify this with a few macros.

^ permalink raw reply

* [PATCH net-next 1/1] tipc: fix bug in multicast congestion handling
From: Jon Maloy @ 2014-10-07 16:18 UTC (permalink / raw)
  To: davem
  Cc: netdev, Paul Gortmaker, erik.hugne, ying.xue, maloy,
	tipc-discussion, Jon Maloy

One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc:
use pseudo message to wake up sockets after link congestion") was
to handle link congestion abatement in a uniform way for both unicast
and multicast transmit. However, the latter doesn't work correctly,
and has been broken since the referenced commit was applied.

If a user now sends a burst of multicast messages that is big
enough to cause broadcast link congestion, it will be put to sleep,
and not be waked up when the congestion abates as it should be.

This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
is set correctly, but in the wrong field. Instead of setting it in the
'action_flags' field of the arrival node struct, it is by mistake set
in the dummy node struct that is owned by the broadcast link, where it
will never tested for. Second, we cannot use the same flag for waking
up unicast and multicast users, since the function tipc_node_unlock()
needs to pick the wakeup pseudo messages to deliver from different
queues. It must hence be able to distinguish between the two cases.

This commit solves this problem by adding a new flag
TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
The latter is to be called by tipc_node_unlock() when the named flag,
now set in the correct field, is encountered.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/bcast.c | 14 +++++++++++++-
 net/tipc/bcast.h |  2 +-
 net/tipc/node.c  |  5 +++++
 net/tipc/node.h  |  3 ++-
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index b2bbe69..b8670bf 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -226,6 +226,17 @@ static void bclink_retransmit_pkt(u32 after, u32 to)
 }
 
 /**
+ * tipc_bclink_wakeup_users - wake up pending users
+ *
+ * Called with no locks taken
+ */
+void tipc_bclink_wakeup_users(void)
+{
+	while (skb_queue_len(&bclink->link.waiting_sks))
+		tipc_sk_rcv(skb_dequeue(&bclink->link.waiting_sks));
+}
+
+/**
  * tipc_bclink_acknowledge - handle acknowledgement of broadcast packets
  * @n_ptr: node that sent acknowledgement info
  * @acked: broadcast sequence # that has been acknowledged
@@ -300,7 +311,8 @@ void tipc_bclink_acknowledge(struct tipc_node *n_ptr, u32 acked)
 		bclink_set_last_sent();
 	}
 	if (unlikely(released && !skb_queue_empty(&bcl->waiting_sks)))
-		bclink->node.action_flags |= TIPC_WAKEUP_USERS;
+		n_ptr->action_flags |= TIPC_WAKEUP_BCAST_USERS;
+
 exit:
 	tipc_bclink_unlock();
 }
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 4875d95..e7b0f85 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -99,5 +99,5 @@ int  tipc_bclink_set_queue_limits(u32 limit);
 void tipc_bcbearer_sort(struct tipc_node_map *nm_ptr, u32 node, bool action);
 uint  tipc_bclink_get_mtu(void);
 int tipc_bclink_xmit(struct sk_buff *buf);
-
+void tipc_bclink_wakeup_users(void);
 #endif
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 17e6378..54950ac 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -552,6 +552,7 @@ void tipc_node_unlock(struct tipc_node *node)
 	LIST_HEAD(conn_sks);
 	struct sk_buff_head waiting_sks;
 	u32 addr = 0;
+	uint flags = node->action_flags;
 
 	if (likely(!node->action_flags)) {
 		spin_unlock_bh(&node->lock);
@@ -572,6 +573,7 @@ void tipc_node_unlock(struct tipc_node *node)
 		node->action_flags &= ~TIPC_NOTIFY_NODE_UP;
 		addr = node->addr;
 	}
+	node->action_flags &= ~TIPC_WAKEUP_BCAST_USERS;
 	spin_unlock_bh(&node->lock);
 
 	while (!skb_queue_empty(&waiting_sks))
@@ -583,6 +585,9 @@ void tipc_node_unlock(struct tipc_node *node)
 	if (!list_empty(&nsub_list))
 		tipc_nodesub_notify(&nsub_list);
 
+	if (flags & TIPC_WAKEUP_BCAST_USERS)
+		tipc_bclink_wakeup_users();
+
 	if (addr)
 		tipc_named_node_up(addr);
 }
diff --git a/net/tipc/node.h b/net/tipc/node.h
index 522d6f3..67513c3 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -59,7 +59,8 @@ enum {
 	TIPC_WAIT_OWN_LINKS_DOWN	= (1 << 2),
 	TIPC_NOTIFY_NODE_DOWN		= (1 << 3),
 	TIPC_NOTIFY_NODE_UP		= (1 << 4),
-	TIPC_WAKEUP_USERS		= (1 << 5)
+	TIPC_WAKEUP_USERS		= (1 << 5),
+	TIPC_WAKEUP_BCAST_USERS		= (1 << 6)
 };
 
 /**
-- 
1.9.1

^ permalink raw reply related

* [PATCH v2 net-next] net: bcmgenet: fix Tx ring priority programming
From: Petri Gynther @ 2014-10-07 16:30 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli

GENET MAC has three Tx ring priority registers:
- GENET_x_TDMA_PRIORITY0 for queues 0-5
- GENET_x_TDMA_PRIORITY1 for queues 6-11
- GENET_x_TDMA_PRIORITY2 for queues 12-16

Fix bcmgenet_init_multiq() to program them correctly.

Signed-off-by: Petri Gynther <pgynther@google.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 42 +++++++++++++++-----------
 drivers/net/ethernet/broadcom/genet/bcmgenet.h |  2 ++
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index e0a6238..fff2634 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -191,8 +191,9 @@ enum dma_reg {
 	DMA_STATUS,
 	DMA_SCB_BURST_SIZE,
 	DMA_ARB_CTRL,
-	DMA_PRIORITY,
-	DMA_RING_PRIORITY,
+	DMA_PRIORITY_0,
+	DMA_PRIORITY_1,
+	DMA_PRIORITY_2,
 };
 
 static const u8 bcmgenet_dma_regs_v3plus[] = {
@@ -201,8 +202,9 @@ static const u8 bcmgenet_dma_regs_v3plus[] = {
 	[DMA_STATUS]		= 0x08,
 	[DMA_SCB_BURST_SIZE]	= 0x0C,
 	[DMA_ARB_CTRL]		= 0x2C,
-	[DMA_PRIORITY]		= 0x30,
-	[DMA_RING_PRIORITY]	= 0x38,
+	[DMA_PRIORITY_0]	= 0x30,
+	[DMA_PRIORITY_1]	= 0x34,
+	[DMA_PRIORITY_2]	= 0x38,
 };
 
 static const u8 bcmgenet_dma_regs_v2[] = {
@@ -211,8 +213,9 @@ static const u8 bcmgenet_dma_regs_v2[] = {
 	[DMA_STATUS]		= 0x08,
 	[DMA_SCB_BURST_SIZE]	= 0x0C,
 	[DMA_ARB_CTRL]		= 0x30,
-	[DMA_PRIORITY]		= 0x34,
-	[DMA_RING_PRIORITY]	= 0x3C,
+	[DMA_PRIORITY_0]	= 0x34,
+	[DMA_PRIORITY_1]	= 0x38,
+	[DMA_PRIORITY_2]	= 0x3C,
 };
 
 static const u8 bcmgenet_dma_regs_v1[] = {
@@ -220,8 +223,9 @@ static const u8 bcmgenet_dma_regs_v1[] = {
 	[DMA_STATUS]		= 0x04,
 	[DMA_SCB_BURST_SIZE]	= 0x0C,
 	[DMA_ARB_CTRL]		= 0x30,
-	[DMA_PRIORITY]		= 0x34,
-	[DMA_RING_PRIORITY]	= 0x3C,
+	[DMA_PRIORITY_0]	= 0x34,
+	[DMA_PRIORITY_1]	= 0x38,
+	[DMA_PRIORITY_2]	= 0x3C,
 };
 
 /* Set at runtime once bcmgenet version is known */
@@ -1696,7 +1700,8 @@ static void bcmgenet_init_multiq(struct net_device *dev)
 {
 	struct bcmgenet_priv *priv = netdev_priv(dev);
 	unsigned int i, dma_enable;
-	u32 reg, dma_ctrl, ring_cfg = 0, dma_priority = 0;
+	u32 reg, dma_ctrl, ring_cfg = 0;
+	u32 dma_priority[3] = {0, 0, 0};
 
 	if (!netif_is_multiqueue(dev)) {
 		netdev_warn(dev, "called with non multi queue aware HW\n");
@@ -1721,22 +1726,25 @@ static void bcmgenet_init_multiq(struct net_device *dev)
 
 		/* Configure ring as descriptor ring and setup priority */
 		ring_cfg |= 1 << i;
-		dma_priority |= ((GENET_Q0_PRIORITY + i) <<
-				(GENET_MAX_MQ_CNT + 1) * i);
 		dma_ctrl |= 1 << (i + DMA_RING_BUF_EN_SHIFT);
+
+		dma_priority[DMA_PRIO_REG_INDEX(i)] |=
+			((GENET_Q0_PRIORITY + i) << DMA_PRIO_REG_SHIFT(i));
 	}
 
+	/* Set ring 16 priority and program the hardware registers */
+	dma_priority[DMA_PRIO_REG_INDEX(DESC_INDEX)] |=
+		((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) <<
+		 DMA_PRIO_REG_SHIFT(DESC_INDEX));
+	bcmgenet_tdma_writel(priv, dma_priority[0], DMA_PRIORITY_0);
+	bcmgenet_tdma_writel(priv, dma_priority[1], DMA_PRIORITY_1);
+	bcmgenet_tdma_writel(priv, dma_priority[2], DMA_PRIORITY_2);
+
 	/* Enable rings */
 	reg = bcmgenet_tdma_readl(priv, DMA_RING_CFG);
 	reg |= ring_cfg;
 	bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
 
-	/* Use configured rings priority and set ring #16 priority */
-	reg = bcmgenet_tdma_readl(priv, DMA_RING_PRIORITY);
-	reg |= ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
-	reg |= dma_priority;
-	bcmgenet_tdma_writel(priv, reg, DMA_PRIORITY);
-
 	/* Configure ring as descriptor ring and re-enable DMA if enabled */
 	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
 	reg |= dma_ctrl;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 321b1db..dbf524e 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -401,6 +401,8 @@ struct bcmgenet_mib_counters {
 #define DMA_ARBITER_MODE_MASK		0x03
 #define DMA_RING_BUF_PRIORITY_MASK	0x1F
 #define DMA_RING_BUF_PRIORITY_SHIFT	5
+#define DMA_PRIO_REG_INDEX(q)		((q) / 6)
+#define DMA_PRIO_REG_SHIFT(q)		(((q) % 6) * DMA_RING_BUF_PRIORITY_SHIFT)
 #define DMA_RATE_ADJ_MASK		0xFF
 
 /* Tx/Rx Dma Descriptor common bits*/
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
From: Alexei Starovoitov @ 2014-10-07 16:33 UTC (permalink / raw)
  To: Zhou, Danny
  Cc: Willem de Bruijn, John Fastabend, Daniel Borkmann,
	Florian Westphal, gerlitz.or@gmail.com, Hannes Frederic Sowa,
	Network Development, Ronciak, John, Amir Vadai, Eric Dumazet,
	David S. Miller

On Tue, Oct 7, 2014 at 8:21 AM, Zhou, Danny <danny.zhou@intel.com> wrote:
>
> As a master driver, the NIC kernel driver still takes control of flow director as a ethtool backend. Generally,
> not all queues are initialized and used by NIC kernel driver, which reports actually used rx/tx numbers to stacks.
> Before splitting off certain queues, if you want use ethtool to direct traffics to those unused queues,
> ethtool reports invalid argument. Once certain stack-unaware queues are allocated for user space slave driver,
> ethtool allows directing packets to them as the NIC driver maintains a data struct about which queues are visible
> and used by kernel, which are used by user space.

this whole thing sounds like it's a way to let DPDK apps share physical
interfaces with kernel, so that tcp-based control plane traffic
can reach DPDK process via kernel while DPDK user space
data path does the rest of packet processing?
One still needs pcie register i/o and all of user space driver support
to make it work, right?
I guess that's great for DPDK users, but I don't think it's good for linux.

^ permalink raw reply

* RE: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
From: Zhou, Danny @ 2014-10-07 16:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Willem de Bruijn, John Fastabend, Daniel Borkmann,
	Florian Westphal, gerlitz.or@gmail.com, Hannes Frederic Sowa,
	Network Development, Ronciak, John, Amir Vadai, Eric Dumazet,
	David S. Miller
In-Reply-To: <CAADnVQJf8jtRr3xBdSOt_X188Xkkg0EnxVd+gh6ruTMjc9jOHg@mail.gmail.com>


> -----Original Message-----
> From: Alexei Starovoitov [mailto:alexei.starovoitov@gmail.com]
> Sent: Wednesday, October 08, 2014 12:33 AM
> To: Zhou, Danny
> Cc: Willem de Bruijn; John Fastabend; Daniel Borkmann; Florian Westphal; gerlitz.or@gmail.com; Hannes Frederic Sowa; Network
> Development; Ronciak, John; Amir Vadai; Eric Dumazet; David S. Miller
> Subject: Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
> 
> On Tue, Oct 7, 2014 at 8:21 AM, Zhou, Danny <danny.zhou@intel.com> wrote:
> >
> > As a master driver, the NIC kernel driver still takes control of flow director as a ethtool backend. Generally,
> > not all queues are initialized and used by NIC kernel driver, which reports actually used rx/tx numbers to stacks.
> > Before splitting off certain queues, if you want use ethtool to direct traffics to those unused queues,
> > ethtool reports invalid argument. Once certain stack-unaware queues are allocated for user space slave driver,
> > ethtool allows directing packets to them as the NIC driver maintains a data struct about which queues are visible
> > and used by kernel, which are used by user space.
> 
> this whole thing sounds like it's a way to let DPDK apps share physical
> interfaces with kernel, so that tcp-based control plane traffic
> can reach DPDK process via kernel while DPDK user space
> data path does the rest of packet processing?
> One still needs pcie register i/o and all of user space driver support
> to make it work, right?
> I guess that's great for DPDK users, but I don't think it's good for linux.

No, it is a generic NIC resource(e.g. rx/tx queue pairs) portioning mechanism among NIC kernel and user space drivers with 
different performance/latency/jitter characteristics. One only needs write efficient packet rx/tx routines by taking advantage
of whatever software optimizations, to manipulate limited rx/tx queue relevant registers on PCIe I/O space. In other words, no 
need to port the entire NIC driver from kernel to user space. 


^ permalink raw reply

* Re: [PATCH v2 net-next] net: phy: adjust fixed_phy_register() return value
From: Petri Gynther @ 2014-10-07 16:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Florian Fainelli, Thomas Petazzoni
In-Reply-To: <20141007.000754.1011567266866832634.davem@davemloft.net>

Hi David,

On Mon, Oct 6, 2014 at 9:07 PM, David Miller <davem@davemloft.net> wrote:
> From: Petri Gynther <pgynther@google.com>
> Date: Mon,  6 Oct 2014 11:38:30 -0700 (PDT)
>
>> Adjust fixed_phy_register() to return struct phy_device *, so that
>> it becomes easy to use fixed PHYs without device tree support:
>>
>>   phydev = fixed_phy_register(PHY_POLL, &fixed_phy_status, NULL);
>>   fixed_phy_set_link_update(phydev, fixed_phy_link_update);
>>   phy_connect_direct(netdev, phydev, handler_fn, phy_interface);
>>
>> This change is a prerequisite for modifying bcmgenet driver to work
>> without a device tree on Broadcom's MIPS-based 7xxx platforms.
>>
>> Signed-off-by: Petri Gynther <pgynther@google.com>
>
> If the caller gets this 'phy' pointer and does something with it,
> something seems amiss.
>
> We don't hold an extra reference to the 'phy' object for the caller,
> so another thread of control can unregister it and kill that last
> reference and therefore free it up.
>
> I think to be legitimate, you have to hold an extra reference on
> 'phy' for the caller.  And now that means that code paths that
> don't need to do anything with 'phy' now will need to release
> that reference.

I'm not sure if I understand your comment. The caller of
fixed_phy_register() now gets the pointer to the phydev created by
get_phy_device(). What other thread is aware of this pointer and how
could they free it? Isn't the caller of fixed_phy_register()
exclusively in charge of the created phydev?

I'm trying to make fixed_phy_device() more convenient to use, so that
drivers don't need to make separate calls to fixed_phy_add() +
get_phy_device() + phy_device_register().

-- Petri

^ permalink raw reply

* Re: [PATCH] net: Add ndo_gso_check
From: Alexei Starovoitov @ 2014-10-07 16:50 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jesse Gross, Or Gerlitz, Alexander Duyck, John Fastabend,
	Jeff Kirsher, David Miller, Linux Netdev List, Thomas Graf,
	Pravin Shelar, Andy Zhou

On Mon, Oct 6, 2014 at 5:17 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Oct 6, 2014 at 3:33 PM, Jesse Gross <jesse@nicira.com> wrote:
>
>> I have no disagreement with trying to be generic across protocols. I'm
>> just not convinced that it is a realistic plan. It's obvious that it
>> is not doable today nor will be it be in the next generation of NICs
>> (which are guaranteed to add support for new protocols). Furthermore,
>> there will be more advanced stuff coming in the future that I think
>> will be difficult or impossible to make protocol agnostic. Rather than
>> pretending that this doesn't exist or will never happen, it's better
>> focus on how to integrating it cleanly.
>
> Sorry, but I don't understand how supporting a new protocols in a
> device for the purposes of returning CHECKSUM_UNNECESSARY is better or
> easier to implement than just returning CHECKSUM_COMPLETE. Same thing
> for trying to use NETIF_F_IP_CSUM with encapsulation rather than
> NETIF_F_HW_CSUM. I'm not a hardware guy, so it's possible I'm missing
> something obvious...

it's definitely more difficult to properly implement
CHECKSUM_UNNECESSARY in HW, but it's worth it.
CHECKSUM_COMPLETE is a burden on software. Old NICs
used to do that, but overhead of recomputing csum for every
step of packet parsing and header modifications is too high.
sw routers, bridges and < L4 networking devices are
simpler and faster with CHECKSUM_UNNECESSARY.

^ permalink raw reply

* Re: linux-next: Tree for Oct 7 (openvswitch)
From: Randy Dunlap @ 2014-10-07 16:59 UTC (permalink / raw)
  To: Stephen Rothwell, linux-next-u79uwXL29TY76Z2rM5mHXA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20141007194454.10502188-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>

On 10/07/14 01:44, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20141003:
> 

on i386 or x86_64:

In file included from ../include/net/geneve.h:4:0,
                 from ../net/openvswitch/flow_netlink.c:45:
../include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
../include/net/udp_tunnel.h:100:2: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
  return iptunnel_handle_offloads(skb, udp_csum, type);
  ^
../include/net/udp_tunnel.h:100:2: warning: return makes pointer from integer without a cast [enabled by default]



-- 
~Randy

^ permalink raw reply

* Re: [PATCH] net: bcmgenet: fix increase rx_read_ptr
From: Florian Fainelli @ 2014-10-07 17:01 UTC (permalink / raw)
  To: Jaedon Shin; +Cc: netdev
In-Reply-To: <A2B48C43-2DFE-4D2F-9950-B834AE469F23@me.com>

On 10/06/2014 11:40 PM, Jaedon Shin wrote:
> This patch fixes the previous commit b629be5c8399d7c423b92135eb43a86c924d1cbc ("net: bcmgenet: check harder for out of memory conditions").
> 
> The previous commit has a problem that gets invalid dma_length_status by increased rx_read_ptr. And it should be increased after all goto refill.

Good catch, I have some other GENET and SYSTEMPORT fixes for net that I
will submit shortly. Thank you!

>  
> 
>> On Oct 7, 2014, at 1:45 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> On 10/05/2014 08:05 PM, Jaedon Shin wrote:
>>> The rx_read_ptr must increase after using it.
>>
>> Your commit message is too terse, you need to explain why you think the
>> current code is bad, and how your patch is fixing it.
>>
>> One possible thing that I see is that we might be off by one in how we
>> use the enet_cb versus how we read the HW packet descriptor.
>>
>>>
>>> Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
>>> ---
>>> drivers/net/ethernet/broadcom/genet/bcmgenet.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>> index 5cc9cae..b47db5e 100644
>>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>> @@ -1282,9 +1282,6 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
>>>
>>> 		rxpktprocessed++;
>>>
>>> -		priv->rx_read_ptr++;
>>> -		priv->rx_read_ptr &= (priv->num_rx_bds - 1);
>>> -
>>> 		/* We do not have a backing SKB, so we do not have a
>>> 		 * corresponding DMA mapping for this incoming packet since
>>> 		 * bcmgenet_rx_refill always either has both skb and mapping or
>>> @@ -1399,6 +1396,9 @@ refill:
>>> 		err = bcmgenet_rx_refill(priv, cb);
>>> 		if (err)
>>> 			netif_err(priv, rx_err, dev, "Rx refill failed\n");
>>> +
>>> +		priv->rx_read_ptr++;
>>> +		priv->rx_read_ptr &= (priv->num_rx_bds - 1);
>>> 	}
>>>
>>> 	return rxpktprocessed;
>>>
>>
> 

^ permalink raw reply

* Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
From: David Miller @ 2014-10-07 17:01 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: danny.zhou, willemb, john.fastabend, dborkman, fw, gerlitz.or,
	hannes, netdev, john.ronciak, amirv, eric.dumazet
In-Reply-To: <CAADnVQJf8jtRr3xBdSOt_X188Xkkg0EnxVd+gh6ruTMjc9jOHg@mail.gmail.com>

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Tue, 7 Oct 2014 09:33:04 -0700

> I guess that's great for DPDK users, but I don't think it's good for
> linux.

Any use of a piece of hardware is fine with me, personally, as long
as adequate protections are in place.

If it's just a descriptor ring in software and a doorbell to trigger
a refetch of the head and tail pointers, with appropriate protection
and control of the memory attached to the ring, I don't see how I
could object to such a facility.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: phy: adjust fixed_phy_register() return value
From: David Miller @ 2014-10-07 17:02 UTC (permalink / raw)
  To: pgynther; +Cc: netdev, f.fainelli, thomas.petazzoni
In-Reply-To: <CAGXr9JF8fx6YH3szY=nwxTJeJgopBCW9Kzhcs33q4oFG5GNOBQ@mail.gmail.com>

From: Petri Gynther <pgynther@google.com>
Date: Tue, 7 Oct 2014 09:47:37 -0700

> I'm not sure if I understand your comment. The caller of
> fixed_phy_register() now gets the pointer to the phydev created by
> get_phy_device(). What other thread is aware of this pointer and how could
> they free it? Isn't the caller of fixed_phy_register() exclusively in
> charge of the created phydev?

If this is the case then my concerns are unfounded.

Thanks for clearing that up and I'll apply your patch, thanks!

^ permalink raw reply

* Re: randconfig build error with next-20141001, in drivers/i2c/algos/i2c-algo-bit.c
From: Randy Dunlap @ 2014-10-07 17:03 UTC (permalink / raw)
  To: Oliver Hartkopp, Stephane Grosjean
  Cc: Jim Davis, Stephen Rothwell, linux-next,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-can-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <5433AB31.9090603-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>

On 10/07/14 01:58, Oliver Hartkopp wrote:
> On 10/06/2014 08:09 PM, Randy Dunlap wrote:
>> On 10/06/14 10:39, Oliver Hartkopp wrote:
> 
>>> AFAICS there is 'just' a style problem as 'configs should not enable entire
>>> subsystems'. But it finally is a correct and valid Kconfig, right?
>>
>> Yes, right.
> 
> (..)
> 
>> In the unlikely case that I2C is not enabled, the user should have to enable
>> it instead of a solitary driver enabling it.  IOW, if a subsystem is disabled,
>> the user probably wanted it that way and a single driver should not override
>> that setting.
> 
> Due to the fact that a change to 'depends on I2C' would make the config option
> invisible (and therefore not selectable) in the case I2C was (unlikely)
> disabled I would finally vote to leave it as-is.
> 
> The current Kconfig entry already contains a description that points to the
> requirement to have I2C and I2C_ALGOBIT to be enabled to compile this driver:
> 
> config CAN_PEAK_PCIEC
> 	bool "PEAK PCAN-ExpressCard Cards"
> 	depends on CAN_PEAK_PCI
> 	select I2C
> 	select I2C_ALGOBIT
> 	default y
> 	---help---
> 	  Say Y here if you want to use a PCAN-ExpressCard from PEAK-System
> 	  Technik. This will also automatically select I2C and I2C_ALGO
> 	  configuration options.
> 
> AFAIK the PEAK PCAN-ExpressCard is usually used in x86 architecture Laptops,
> so it's near to an academic discussion as x86 usually selects I2C ;-)

Pray tell where does x86 usually select I2C?
Thanks.

> @Stephane: When updating the help text to introduce the PCAN-ExpressCard 34
> support anyway you might probably add some more information *why* the I2C
> support is needed (for CAN transceiver settings and status LED).
> 
> And /s/I2C_ALGO/I2C_ALGOBIT/ :-)


-- 
~Randy

^ permalink raw reply

* Re: [PATCH] net: Add ndo_gso_check
From: David Miller @ 2014-10-07 17:05 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: therbert, jesse, gerlitz.or, alexander.h.duyck, john.r.fastabend,
	jeffrey.t.kirsher, netdev, tgraf, pshelar, azhou
In-Reply-To: <CAADnVQLkWrjN=MgtU_8zBrGFW1eJd01hd6BM_m9iOq=ypHvSLQ@mail.gmail.com>

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Tue, 7 Oct 2014 09:50:50 -0700

> CHECKSUM_COMPLETE is a burden on software.

I totally disagree, it's the most software friendly checksumming
offload mechanism possible.  I wish every card did it.

CHECKSUM_COMPLETE means that any sub-protocol or tunneling mechanism
can be trivially supported without any modifications to hardware, and
it therefore makes checksum offloading of new protocols require no
hardware changes whatsoever.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: bcmgenet: fix Tx ring priority programming
From: Florian Fainelli @ 2014-10-07 17:07 UTC (permalink / raw)
  To: Petri Gynther, netdev; +Cc: davem
In-Reply-To: <20141007163001.85125100761@puck.mtv.corp.google.com>

On 10/07/2014 09:30 AM, Petri Gynther wrote:
> GENET MAC has three Tx ring priority registers:
> - GENET_x_TDMA_PRIORITY0 for queues 0-5
> - GENET_x_TDMA_PRIORITY1 for queues 6-11
> - GENET_x_TDMA_PRIORITY2 for queues 12-16
> 
> Fix bcmgenet_init_multiq() to program them correctly.

Looks good to me, the register layout is correct for GENETv1 to v4, thanks!

> 
> Signed-off-by: Petri Gynther <pgynther@google.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

> ---
>  drivers/net/ethernet/broadcom/genet/bcmgenet.c | 42 +++++++++++++++-----------
>  drivers/net/ethernet/broadcom/genet/bcmgenet.h |  2 ++
>  2 files changed, 27 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index e0a6238..fff2634 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -191,8 +191,9 @@ enum dma_reg {
>  	DMA_STATUS,
>  	DMA_SCB_BURST_SIZE,
>  	DMA_ARB_CTRL,
> -	DMA_PRIORITY,
> -	DMA_RING_PRIORITY,
> +	DMA_PRIORITY_0,
> +	DMA_PRIORITY_1,
> +	DMA_PRIORITY_2,
>  };
>  
>  static const u8 bcmgenet_dma_regs_v3plus[] = {
> @@ -201,8 +202,9 @@ static const u8 bcmgenet_dma_regs_v3plus[] = {
>  	[DMA_STATUS]		= 0x08,
>  	[DMA_SCB_BURST_SIZE]	= 0x0C,
>  	[DMA_ARB_CTRL]		= 0x2C,
> -	[DMA_PRIORITY]		= 0x30,
> -	[DMA_RING_PRIORITY]	= 0x38,
> +	[DMA_PRIORITY_0]	= 0x30,
> +	[DMA_PRIORITY_1]	= 0x34,
> +	[DMA_PRIORITY_2]	= 0x38,
>  };
>  
>  static const u8 bcmgenet_dma_regs_v2[] = {
> @@ -211,8 +213,9 @@ static const u8 bcmgenet_dma_regs_v2[] = {
>  	[DMA_STATUS]		= 0x08,
>  	[DMA_SCB_BURST_SIZE]	= 0x0C,
>  	[DMA_ARB_CTRL]		= 0x30,
> -	[DMA_PRIORITY]		= 0x34,
> -	[DMA_RING_PRIORITY]	= 0x3C,
> +	[DMA_PRIORITY_0]	= 0x34,
> +	[DMA_PRIORITY_1]	= 0x38,
> +	[DMA_PRIORITY_2]	= 0x3C,
>  };
>  
>  static const u8 bcmgenet_dma_regs_v1[] = {
> @@ -220,8 +223,9 @@ static const u8 bcmgenet_dma_regs_v1[] = {
>  	[DMA_STATUS]		= 0x04,
>  	[DMA_SCB_BURST_SIZE]	= 0x0C,
>  	[DMA_ARB_CTRL]		= 0x30,
> -	[DMA_PRIORITY]		= 0x34,
> -	[DMA_RING_PRIORITY]	= 0x3C,
> +	[DMA_PRIORITY_0]	= 0x34,
> +	[DMA_PRIORITY_1]	= 0x38,
> +	[DMA_PRIORITY_2]	= 0x3C,
>  };
>  
>  /* Set at runtime once bcmgenet version is known */
> @@ -1696,7 +1700,8 @@ static void bcmgenet_init_multiq(struct net_device *dev)
>  {
>  	struct bcmgenet_priv *priv = netdev_priv(dev);
>  	unsigned int i, dma_enable;
> -	u32 reg, dma_ctrl, ring_cfg = 0, dma_priority = 0;
> +	u32 reg, dma_ctrl, ring_cfg = 0;
> +	u32 dma_priority[3] = {0, 0, 0};
>  
>  	if (!netif_is_multiqueue(dev)) {
>  		netdev_warn(dev, "called with non multi queue aware HW\n");
> @@ -1721,22 +1726,25 @@ static void bcmgenet_init_multiq(struct net_device *dev)
>  
>  		/* Configure ring as descriptor ring and setup priority */
>  		ring_cfg |= 1 << i;
> -		dma_priority |= ((GENET_Q0_PRIORITY + i) <<
> -				(GENET_MAX_MQ_CNT + 1) * i);
>  		dma_ctrl |= 1 << (i + DMA_RING_BUF_EN_SHIFT);
> +
> +		dma_priority[DMA_PRIO_REG_INDEX(i)] |=
> +			((GENET_Q0_PRIORITY + i) << DMA_PRIO_REG_SHIFT(i));
>  	}
>  
> +	/* Set ring 16 priority and program the hardware registers */
> +	dma_priority[DMA_PRIO_REG_INDEX(DESC_INDEX)] |=
> +		((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) <<
> +		 DMA_PRIO_REG_SHIFT(DESC_INDEX));
> +	bcmgenet_tdma_writel(priv, dma_priority[0], DMA_PRIORITY_0);
> +	bcmgenet_tdma_writel(priv, dma_priority[1], DMA_PRIORITY_1);
> +	bcmgenet_tdma_writel(priv, dma_priority[2], DMA_PRIORITY_2);
> +
>  	/* Enable rings */
>  	reg = bcmgenet_tdma_readl(priv, DMA_RING_CFG);
>  	reg |= ring_cfg;
>  	bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
>  
> -	/* Use configured rings priority and set ring #16 priority */
> -	reg = bcmgenet_tdma_readl(priv, DMA_RING_PRIORITY);
> -	reg |= ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
> -	reg |= dma_priority;
> -	bcmgenet_tdma_writel(priv, reg, DMA_PRIORITY);
> -
>  	/* Configure ring as descriptor ring and re-enable DMA if enabled */
>  	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
>  	reg |= dma_ctrl;
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> index 321b1db..dbf524e 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> @@ -401,6 +401,8 @@ struct bcmgenet_mib_counters {
>  #define DMA_ARBITER_MODE_MASK		0x03
>  #define DMA_RING_BUF_PRIORITY_MASK	0x1F
>  #define DMA_RING_BUF_PRIORITY_SHIFT	5
> +#define DMA_PRIO_REG_INDEX(q)		((q) / 6)
> +#define DMA_PRIO_REG_SHIFT(q)		(((q) % 6) * DMA_RING_BUF_PRIORITY_SHIFT)
>  #define DMA_RATE_ADJ_MASK		0xFF
>  
>  /* Tx/Rx Dma Descriptor common bits*/
> 

^ permalink raw reply

* Re: [PATCH net-next 1/1] tipc: fix bug in multicast congestion handling
From: David Miller @ 2014-10-07 17:07 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, paul.gortmaker, erik.hugne, ying.xue, maloy,
	tipc-discussion
In-Reply-To: <1412698715-9838-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Tue,  7 Oct 2014 12:18:35 -0400

> +	uint flags = node->action_flags;

Please use explicit "unsigned int"

^ permalink raw reply

* Re: [PATCH v2 net-next] net: bcmgenet: fix Tx ring priority programming
From: David Miller @ 2014-10-07 17:09 UTC (permalink / raw)
  To: pgynther; +Cc: netdev, f.fainelli
In-Reply-To: <20141007163001.85125100761@puck.mtv.corp.google.com>

From: Petri Gynther <pgynther@google.com>
Date: Tue,  7 Oct 2014 09:30:01 -0700 (PDT)

> GENET MAC has three Tx ring priority registers:
> - GENET_x_TDMA_PRIORITY0 for queues 0-5
> - GENET_x_TDMA_PRIORITY1 for queues 6-11
> - GENET_x_TDMA_PRIORITY2 for queues 12-16
> 
> Fix bcmgenet_init_multiq() to program them correctly.
> 
> Signed-off-by: Petri Gynther <pgynther@google.com>

Yeah this looks a lot nicer, applied, thanks!

^ permalink raw reply

* Re: [PATCH] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: David Miller @ 2014-10-07 17:15 UTC (permalink / raw)
  To: LW; +Cc: netdev, rmk+kernel, Frank.Li, fabio.estevam, linux-kernel
In-Reply-To: <1412687977-11742-1-git-send-email-LW@KARO-electronics.de>

From: Lothar Waßmann <LW@KARO-electronics.de>
Date: Tue,  7 Oct 2014 15:19:37 +0200

> commit 1b7bde6d659d ("net: fec: implement rx_copybreak to improve rx performance")
> introduced a regression for i.MX28. The swap_buffer() function doing
> the endian conversion of the received data on i.MX28 may access memory
> beyond the actual packet size in the DMA buffer. fec_enet_copybreak()
> does not copy those bytes, so that the last bytes of a packet may be
> filled with invalid data after swapping.
> This will likely lead to checksum errors on received packets.
> E.g. when trying to mount an NFS rootfs:
> UDP: bad checksum. From 192.168.1.225:111 to 192.168.100.73:44662 ulen 36
> 
> Do the byte swapping and copying to the new skb in one go if
> necessary.
> 
> Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>

Why don't you just round up the length fec_enet_copybreak() uses when
need_swap is true?  Then you will end up mimicking the original behavior
and not require this new helper function.

And in any case I agree with Sergei that if you do retain your approach,
the new 'swap' argument to fec_enet_copybreak() should be a 'bool'.

I'm really surprised there isn't a control register bit to adjust the
endianness of the data DMA'd to/from the network.

^ permalink raw reply

* Re: [PATCH RESEND v3] 3c59x: fix bad split of cpu_to_le32(pci_map_single())
From: David Miller @ 2014-10-07 17:16 UTC (permalink / raw)
  To: sylvain.hitier; +Cc: linux-kernel, nhorman, mroos, netdev, klassert
In-Reply-To: <20141007134034.GA8838@erable>

From: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
Date: Tue, 7 Oct 2014 13:40:34 +0000

> From: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com>
> 
> In commit 6f2b6a3005b2c34c39f207a87667564f64f2f91a,
>   # 3c59x: Add dma error checking and recovery
> the intent is to split out the mapping from the byte-swapping in order to
> insert a dma_mapping_error() check.
> 
> Kinda this semantic patch:
> 
>     // See http://coccinelle.lip6.fr/
>     //
>     // Beware, grouik-and-dirty!
>     @@
>     expression DEV, X, Y, Z;
>     @@
>     -   cpu_to_le32(pci_map_single(DEV, X, Y, Z))
>     +   dma_addr_t addr = pci_map_single(DEV, X, Y, Z);
>     +   if (dma_mapping_error(&DEV->dev, addr))
>     +       /* snip */;
>     +   cpu_to_le32(addr)
> 
> However, the #else part (of the #if DO_ZEROCOPY test) is changed this way:
> 
>     -   cpu_to_le32(pci_map_single(DEV, X, Y, Z))
>     +   dma_addr_t addr = cpu_to_le32(pci_map_single(DEV, X, Y, Z));
>     //                    ^^^^^^^^^^^
>     //                    That mismatches the 3 other changes!
>     +   if (dma_mapping_error(&DEV->dev, addr))
>     +       /* snip */;
>     +   cpu_to_le32(addr)
> 
> Let's remove the leftover cpu_to_le32() for coherency.
> 
> v2: Better changelog.
> v3: Add Acked-by
> 
> Fixes: 6f2b6a3005b2c34c39f207a87667564f64f2f91a
>   # 3c59x: Add dma error checking and recovery
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply

* Re: [PATCH] net: Add ndo_gso_check
From: Alexei Starovoitov @ 2014-10-07 17:18 UTC (permalink / raw)
  To: David Miller
  Cc: Tom Herbert, Jesse Gross, gerlitz.or@gmail.com, Alexander Duyck,
	John Fastabend, Jeff Kirsher, netdev@vger.kernel.org, Thomas Graf,
	Pravin Shelar, Andy Zhou
In-Reply-To: <20141007.130504.1805673129740661479.davem@davemloft.net>

On Tue, Oct 7, 2014 at 10:05 AM, David Miller <davem@davemloft.net> wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Tue, 7 Oct 2014 09:50:50 -0700
>
>> CHECKSUM_COMPLETE is a burden on software.
>
> I totally disagree, it's the most software friendly checksumming
> offload mechanism possible.  I wish every card did it.
>
> CHECKSUM_COMPLETE means that any sub-protocol or tunneling mechanism
> can be trivially supported without any modifications to hardware, and
> it therefore makes checksum offloading of new protocols require no
> hardware changes whatsoever.

yes, of course. My point is that if HW can parse the packet and validate
csum it should do that, since it's faster for the stack on top.
HW can fall back to CHECKSUM_COMPLETE if it fails to parse, for example.
I think some NICs do exactly that.

^ permalink raw reply

* Re: Quota in __qdisc_run()
From: David Miller @ 2014-10-07 17:19 UTC (permalink / raw)
  To: eric.dumazet
  Cc: hannes, brouer, netdev, therbert, fw, dborkman, jhs,
	alexander.duyck, john.r.fastabend, dave.taht, toke
In-Reply-To: <1412694080.11091.131.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Oct 2014 08:01:20 -0700

> On Tue, 2014-10-07 at 16:43 +0200, Hannes Frederic Sowa wrote:
> 
>> This needs to be:
>> 
>> do
>>    ...
>> while ((iskb = iskb->next))
> 
> I do not feel needed to break the bulk dequeue at precise quota
> boundary. These quotas are advisory, and bql prefers to get its full
> budget for appropriate feedback from TX completion.
> 
> Quota was a packet quota, which was quite irrelevant if segmentation had
> to be done, so I would just let the dequeue be done so that we benefit
> from optimal xmit_more.

Yes, this makes sense, do a full qdisc_restart() cycle without boundaries,
then check how much quota was used afterwards to guard the outermost loop.

^ permalink raw reply

* Re: [PATCH net-next] net/mlx4_en: remove NETDEV_TX_BUSY
From: David Miller @ 2014-10-07 17:21 UTC (permalink / raw)
  To: amirv, amirv.mellanox
  Cc: eric.dumazet, edumazet, netdev, yevgenyp, ogerlitz, idos
In-Reply-To: <5433A48E.1040505@gmail.com>

From: Amir Vadai <amirv.mellanox@gmail.com>
Date: Tue, 07 Oct 2014 11:30:06 +0300

> On 10/6/2014 7:30 PM, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>> 
>> Drivers should avoid NETDEV_TX_BUSY as much as possible.
>> 
>> They should stop the tx queue before qdisc even tries to push another
>> packet, to avoid requeues.
>> 
>> For a driver supporting skb->xmit_more, this is likely to be a prereq
>> anyway, otherwise we could have a tx deadlock : We need to force a
>> doorbell if TX ring is full.
>> 
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> ---
> 
> Reviewed. Also verified that it fixes the deadlock (by sending a large
> burst - larger than ring size). Before this fix, last packet of the
> burst wasn't sent, therefore no doorbell was rang, and the queue was
> stalled.
> 
> BTW, another nice optimization that we hope to send soon, is not to arm
> the CQ unless ringing the doorbell.
> 
> Acked-by: Amir Vadai <amirv@mellanox.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: better IFF_XMIT_DST_RELEASE support
From: David Miller @ 2014-10-07 17:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ja
In-Reply-To: <1412633126.11091.89.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 06 Oct 2014 15:05:26 -0700

> We have a loop over team/bonding members, where we do :
> 
> dst_release_flag &= slave->dev->priv_flags;
> 
> So at the end of the loop, we check if any one of the member had one of
> the bit cleared.
> 
> if dst_release_flags has both bits set, then we are set and we allow the
> IFF_XMIT_DST_RELEASE being set on the master.

Oh I see, I mis-read the patch thinking dst_release_flags was a new variable
when in fact it's an existing one.

Patch applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH] net: Add ndo_gso_check
From: Eric Dumazet @ 2014-10-07 17:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, Jesse Gross, Or Gerlitz, Alexander Duyck,
	John Fastabend, Jeff Kirsher, David Miller, Linux Netdev List,
	Thomas Graf, Pravin Shelar, Andy Zhou
In-Reply-To: <CAADnVQLkWrjN=MgtU_8zBrGFW1eJd01hd6BM_m9iOq=ypHvSLQ@mail.gmail.com>

On Tue, 2014-10-07 at 09:50 -0700, Alexei Starovoitov wrote:

> it's definitely more difficult to properly implement
> CHECKSUM_UNNECESSARY in HW, but it's worth it.
> CHECKSUM_COMPLETE is a burden on software. Old NICs
> used to do that, but overhead of recomputing csum for every
> step of packet parsing and header modifications is too high.
> sw routers, bridges and < L4 networking devices are
> simpler and faster with CHECKSUM_UNNECESSARY.

Really this is wrong. Once you validated/pulled a header,
adjusting the complete checksum is in the order of 10 cycles or so,
ie less than 1% of the other costs.

UNNECESSARY usually requests complex NIC firmware, and usually the NIC
has fewer cores than the host.

It mostly worked for basic IP+TCP kind of traffic, but once you want
complex cloud models, it is a major pain.

If we need to optimize csum_partial() for short lengths, lets do it,
instead of pushing hardware vendors adding more and more schemes.

^ permalink raw reply

* Re: Quota in __qdisc_run()
From: Eric Dumazet @ 2014-10-07 17:32 UTC (permalink / raw)
  To: David Miller
  Cc: hannes, brouer, netdev, therbert, fw, dborkman, jhs,
	alexander.duyck, john.r.fastabend, dave.taht, toke
In-Reply-To: <20141007.131938.1410434352331637585.davem@davemloft.net>

On Tue, 2014-10-07 at 13:19 -0400, David Miller wrote:

> Yes, this makes sense, do a full qdisc_restart() cycle without boundaries,
> then check how much quota was used afterwards to guard the outermost loop.

I am testing this, and also am testing the xmit_more patch for I40E.

Will send patches today.

Thanks

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox