Netdev List
 help / color / mirror / Atom feed
* Re: IPv6 routing type - not at par with IPv4 one?
From: Markus @ 2012-09-06 10:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Markus Stenberg, netdev, Nicolas Dichtel
In-Reply-To: <1346927705.2484.22.camel@edumazet-glaptop>

On 6.9.2012, at 13.35, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Well, it  seems you missed this : 
> 
> http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=ef2c7d7b59708d54213c7556a82d14de9a7e4475
> 
> At least the blackhole is now supported on IPv6, so you probably have to
> add the 'throw' bit, if it makes any sense.


Ah, cool, teaches me to refresh my git trees before posting ;-)
Nicolas, want to update for that too or will I? 

Throw is useful in various cases (it is RTN_THROW / -EAGAIN). 

Cheers,

-Markus

^ permalink raw reply

* |PATCH] seeq: Add missing spinlock init
From: Jean Delvare @ 2012-09-06 10:47 UTC (permalink / raw)
  To: netdev

It doesn't seem this spinlock was properly initialized.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
---
I can't even build-test this.

 drivers/net/ethernet/seeq/sgiseeq.c |    1 +
 1 file changed, 1 insertion(+)

--- linux-3.6-rc4.orig/drivers/net/ethernet/seeq/sgiseeq.c	2012-07-21 22:58:29.000000000 +0200
+++ linux-3.6-rc4/drivers/net/ethernet/seeq/sgiseeq.c	2012-09-06 12:40:30.092144722 +0200
@@ -751,6 +751,7 @@ static int __devinit sgiseeq_probe(struc
 	sp->srings = sr;
 	sp->rx_desc = sp->srings->rxvector;
 	sp->tx_desc = sp->srings->txvector;
+	spin_lock_init(&sp->tx_lock);
 
 	/* A couple calculations now, saves many cycles later. */
 	setup_rx_ring(dev, sp->rx_desc, SEEQ_RX_BUFFERS);


-- 
Jean Delvare

^ permalink raw reply

* Re: IPv6 routing type - not at par with IPv4 one?
From: Eric Dumazet @ 2012-09-06 10:35 UTC (permalink / raw)
  To: Markus Stenberg; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <loom.20120906T115737-277@post.gmane.org>

On Thu, 2012-09-06 at 10:02 +0000, Markus Stenberg wrote:
> ~ # ip route add throw 1.2.3.4
> ~ # ip -6 route add throw ::1.2.3.4
> RTNETLINK answers: No such device
> ~ # ip route add blackhole 1.2.3.5
> ~ # ip -6 route add blackhole ::1.2.3.5
> RTNETLINK answers: No such device
> ~ #
> 
> The reason for this is in net/ipv6/route.c - ipv6_route_add:
> 
> Eventually code winds up at this (1423 line in 3.5.3):
> 
> 	err = -ENODEV;
> 	if (!dev)
> 		goto out;
> 
> and poof, ENODEV.
> 
> Is there some reason for this? Or should I write a patch? 
> Or does someone else want to? Support for dev=NULL 
> elsewhere in the code seems to be ok.
> 
> Based on quick googling I'm not the first one to have encountered this.

Well, it  seems you missed this : 

http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=ef2c7d7b59708d54213c7556a82d14de9a7e4475

At least the blackhole is now supported on IPv6, so you probably have to
add the 'throw' bit, if it makes any sense.

^ permalink raw reply

* IPv6 routing type - not at par with IPv4 one?
From: Markus Stenberg @ 2012-09-06 10:02 UTC (permalink / raw)
  To: netdev

~ # ip route add throw 1.2.3.4
~ # ip -6 route add throw ::1.2.3.4
RTNETLINK answers: No such device
~ # ip route add blackhole 1.2.3.5
~ # ip -6 route add blackhole ::1.2.3.5
RTNETLINK answers: No such device
~ #

The reason for this is in net/ipv6/route.c - ipv6_route_add:

Eventually code winds up at this (1423 line in 3.5.3):

	err = -ENODEV;
	if (!dev)
		goto out;

and poof, ENODEV.

Is there some reason for this? Or should I write a patch? 
Or does someone else want to? Support for dev=NULL 
elsewhere in the code seems to be ok.

Based on quick googling I'm not the first one to have encountered this.

Cheers,

-Markus

^ permalink raw reply

* Re: [PATCH] mac80211: use list_move instead of list_del/list_add
From: Johannes Berg @ 2012-09-06  9:56 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: linville, yongjun_wei, linux-wireless, netdev
In-Reply-To: <CAPgLHd_VrjzzfPDsd3KKzgX2bKJ47zUCJmsQO+Q49ajNVLnTVg@mail.gmail.com>

On Thu, 2012-09-06 at 13:20 +0800, Wei Yongjun wrote:
> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> 
> Using list_move() instead of list_del() + list_add().
> 
> spatch with a semantic match is used to found this problem.
> (http://coccinelle.lip6.fr/)
> 
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Applied. FWIW, I don't think it's really a "problem" rather than a
simplification or something like that, but anyway.

johannes

^ permalink raw reply

* Re: [ethtool 1/2] ethtool.h: implement new MDI-X set defines
From: Jeff Kirsher @ 2012-09-06  9:44 UTC (permalink / raw)
  To: bhutchings@solarflare.com
  Cc: Brandeburg, Jesse, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <1345538236-1636-1-git-send-email-jeffrey.t.kirsher@intel.com>

[-- Attachment #1: Type: text/plain, Size: 887 bytes --]

On Tue, 2012-08-21 at 01:37 -0700, Kirsher, Jeffrey T wrote:
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> These changes implement the kernel side of the interface
> for allowing drivers to set MDI-X state on twisted pair.
> 
> Changes implemented as suggested by Ben Hutchings, thanks Ben!
> 
> see ethtool patches titled:
> ethtool: allow setting MDI-X state
> 
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> CC: Ben Hutchings <bhutchings@solarflare.com>
> Tested-by: Aaron Brown aaron.f.brown@intel.com
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  ethtool-copy.h | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-) 

Ben - ping?  I did not see a response that you queued up this 2 patch
series for Ethtool yet.  Dave has already pulled the corresponding
kernel patches into net-next.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH] udp: increment UDP_MIB_INERRORS if copy failed
From: Eric Dumazet @ 2012-09-06  9:34 UTC (permalink / raw)
  To: David Miller; +Cc: Shawn Bohrer, netdev

From: Eric Dumazet <edumazet@google.com>

In UDP recvmsg(), we miss an increase of UDP_MIB_INERRORS if the copy
of skb to userspace failed for whatever reason.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/udp.c |    5 +++++
 net/ipv6/udp.c |   11 +++++++++++
 2 files changed, 16 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6f6d1ac..2814f66 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1226,6 +1226,11 @@ try_again:
 
 	if (unlikely(err)) {
 		trace_kfree_skb(skb, udp_recvmsg);
+		if (!peeked) {
+			atomic_inc(&sk->sk_drops);
+			UDP_INC_STATS_USER(sock_net(sk),
+					   UDP_MIB_INERRORS, is_udplite);
+		}
 		goto out_free;
 	}
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 99d0077..07e2bfe 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -394,6 +394,17 @@ try_again:
 	}
 	if (unlikely(err)) {
 		trace_kfree_skb(skb, udpv6_recvmsg);
+		if (!peeked) {
+			atomic_inc(&sk->sk_drops);
+			if (is_udp4)
+				UDP_INC_STATS_USER(sock_net(sk),
+						   UDP_MIB_INERRORS,
+						   is_udplite);
+			else
+				UDP6_INC_STATS_USER(sock_net(sk),
+						    UDP_MIB_INERRORS,
+						    is_udplite);
+		}
 		goto out_free;
 	}
 	if (!peeked) {

^ permalink raw reply related

* Re: changing usbnet's API to better deal with cdc-ncm
From: Eric Dumazet @ 2012-09-06  9:22 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alexey ORISHKO, bjorn@mork.no, Ming Lei, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, Alexey Orishko
In-Reply-To: <1346922697.2484.5.camel@edumazet-glaptop>

On Thu, 2012-09-06 at 11:11 +0200, Eric Dumazet wrote:

> Really skb_clone() use should be removed from cdc_ncm_rx_fixup()
> 
> Unless you expect 10Gbit speed from this driver, skb_clone() is the
> worst possible strategy.
> 
> Allocating fresh skbs of the right size permits better memory use and
> allows TCP coalescing as well.
> 
> The use of skb_clone() forces some parts of the stack to perform a full
> copy anyway.
> 
> It's still unclear to me with we use up to 32KB blocks in USB drivers...
> 

So I advise testing this patch :

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 4cd582a..c0821f7 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -1052,12 +1052,11 @@ static int cdc_ncm_rx_fixup(struct usbnet *dev, struct sk_buff *skb_in)
 			break;
 
 		} else {
-			skb = skb_clone(skb_in, GFP_ATOMIC);
+			skb = netdev_alloc_skb_ip_align(dev->net, len);
 			if (!skb)
 				goto error;
-			skb->len = len;
-			skb->data = ((u8 *)skb_in->data) + offset;
-			skb_set_tail_pointer(skb, len);
+			skb_put(skb, len);
+			memcpy(skb->data, skb_in->data + offset, len);
 			usbnet_skb_return(dev, skb);
 		}
 	}

^ permalink raw reply related

* Re: changing usbnet's API to better deal with cdc-ncm
From: Eric Dumazet @ 2012-09-06  9:11 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alexey ORISHKO, bjorn-yOkvZcmFvRU@public.gmane.org, Ming Lei,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Alexey Orishko
In-Reply-To: <6556435.5o3fR8ZcBa-ugxBuEnWX9yG/4A2pS7c2Q@public.gmane.org>

On Thu, 2012-09-06 at 10:50 +0200, Oliver Neukum wrote: 
> On Thursday 06 September 2012 10:13:01 Alexey ORISHKO wrote:

> > Rx path:
> > 4. IP packets are cloned to separate skb and length of actual data
> set to eth packet size, while skb size is still the same as skb
> containing full NTB frame. This causes problems with TCP stack when
> throughput is high because of flow control in the stack, if too much
> data allocated.
> > Someone suggested a patch for it, which was rejected. Anyway, I
> don't think copy data to a new skb would be a nice solution, but
> keeping several clones with size equal to incoming skb in not good
> either. 
> 
> Perhaps the problem is using an skb for aggregate reception at all.
> Possibly enough buffers of fixed size should be allocated on open and
> reused,
> not freed.

Really skb_clone() use should be removed from cdc_ncm_rx_fixup()

Unless you expect 10Gbit speed from this driver, skb_clone() is the
worst possible strategy.

Allocating fresh skbs of the right size permits better memory use and
allows TCP coalescing as well.

The use of skb_clone() forces some parts of the stack to perform a full
copy anyway.

It's still unclear to me with we use up to 32KB blocks in USB drivers...


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [net-next 1/6] igb: Tidy up wrapping for CONFIG_IGB_PTP.
From: Jeff Kirsher @ 2012-09-06  9:04 UTC (permalink / raw)
  To: Richard Cochran; +Cc: davem, Matthew Vick, netdev, gospo, sassmann
In-Reply-To: <20120906084954.GH2550@netboy.at.omicron.at>

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]

On Thu, 2012-09-06 at 10:49 +0200, Richard Cochran wrote:
> On Thu, Sep 06, 2012 at 01:44:27AM -0700, Jeff Kirsher wrote:
> > 
> > As I recall, he stated he was going to integrate your suggestions in a
> > new patch set that would follow this one.
> 
> Okay, but won't that mean moving the time stamping functions back, and
> removing the #ifdefs again?

That maybe, I do not know what Matthew has in mind to integrate your
suggestions.

> > Also, IMHO, while I look at pre-processor tags all day and get used to
> > matching up #ifdef to #endif's, I still see #endif /* CONFIG_IGB_PTP */
> > useful.  While it is not currently consistent, I see it more useful to
> > fix as we go, rather than create a single patch to add a comment for
> > every #endif.
> 
> Okay, fine.
> 
> Thanks,
> Richard



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH net 5/5] net/mlx4_core: Return the error value in case of command initialization failure
From: Yevgeny Petrilin @ 2012-09-06  8:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, Eugenia Emantayev, Yevgeny Petrilin
In-Reply-To: <1346921452-2035-1-git-send-email-yevgenyp@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.com>

If mlx4_cmd_init() failed, the init_one function returned
success, although no resources were opened.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 0fadac5..2f816c6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1997,7 +1997,8 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 
 slave_start:
-	if (mlx4_cmd_init(dev)) {
+	err = mlx4_cmd_init(dev);
+	if (err) {
 		mlx4_err(dev, "Failed to init command interface, aborting.\n");
 		goto err_sriov;
 	}
-- 
1.7.7

^ permalink raw reply related

* Re: changing usbnet's API to better deal with cdc-ncm
From: Oliver Neukum @ 2012-09-06  8:50 UTC (permalink / raw)
  To: Alexey ORISHKO
  Cc: bjorn@mork.no, Ming Lei, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, Alexey Orishko
In-Reply-To: <2AC7D4AD8BA1C640B4C60C61C8E520154A6AA56BF9@EXDCVYMBSTM006.EQ1STM.local>

On Thursday 06 September 2012 10:13:01 Alexey ORISHKO wrote:
> > -----Original Message-----
> > From: Oliver Neukum [mailto:oneukum@suse.de]
> 
> 
> > looking at cdc-ncm it seeems to me that cdc-ncm is forced to play very
> > dirty games because usbnet doesn't have a notion about aggregating
> > packets for a single transfer.
> 
> Several issues need to be improved:
> Tx path:
> 1. IP packets must be accumulated in one NTB. Currently it's done via data copy.
> Preferred way would be a possibility to have a list of skb-s in resulting frame sent down.

I am afraid much HCD hardware is just not capable of doing DMA on a list of skbs.
I guess one copy will be necessary. We might consider not freeing the buffer used
to transfer over the bus.

> 2. Timer is needed to accumulate enough packets to get acceptable throughput and reducing amount of HW INTs on device side.
> If we get access to Tx queue size and can read skb from it, time can be removed. But Tx queue is above usbnet, so doesn't look possible in current framework.

I think we can use the number of transfers in flight to make that decision.
So if the choice is to not transfer data at all or to send transfers filled to the
max, we should send, otherwise we ought to accumulate.
That way we get rid of the timer.

> 3. While trying to get best throughput we also need to keep latency in mind, because in some setups it is quite important.
> As a simple solution configuration parameters could be used to address this issue.

I think we should just send the first (two) packages and use the time
to aggregate later packets.

> Rx path:
> 4. IP packets are cloned to separate skb and length of actual data set to eth packet size, while skb size is still the same as skb containing full NTB frame. This causes problems with TCP stack when throughput is high because of flow control in the stack, if too much data allocated.
> Someone suggested a patch for it, which was rejected. Anyway, I don't think copy data to a new skb would be a nice solution, but keeping several clones with size equal to incoming skb in not good either. 

Perhaps the problem is using an skb for aggregate reception at all.
Possibly enough buffers of fixed size should be allocated on open and reused,
not freed.

	Regards
		Oliver

^ permalink raw reply

* [PATCH net 4/5] net/mlx4_core: Fixing error flow in case of QUERY_FW failure
From: Yevgeny Petrilin @ 2012-09-06  8:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, Aviad Yehezkel, Yevgeny Petrilin
In-Reply-To: <1346921452-2035-1-git-send-email-yevgenyp@mellanox.com>

From: Aviad Yehezkel <aviadye@mellanox.com>

The order of operations was wrong on the teardown flow.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 827b72d..0fadac5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1234,13 +1234,13 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 				mlx4_info(dev, "non-primary physical function, skipping.\n");
 			else
 				mlx4_err(dev, "QUERY_FW command failed, aborting.\n");
-			goto unmap_bf;
+			return err;
 		}
 
 		err = mlx4_load_fw(dev);
 		if (err) {
 			mlx4_err(dev, "Failed to start FW, aborting.\n");
-			goto unmap_bf;
+			return err;
 		}
 
 		mlx4_cfg.log_pg_sz_m = 1;
@@ -1304,7 +1304,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 		err = mlx4_init_slave(dev);
 		if (err) {
 			mlx4_err(dev, "Failed to initialize slave\n");
-			goto unmap_bf;
+			return err;
 		}
 
 		err = mlx4_slave_cap(dev);
@@ -1324,7 +1324,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 	err = mlx4_QUERY_ADAPTER(dev, &adapter);
 	if (err) {
 		mlx4_err(dev, "QUERY_ADAPTER command failed, aborting.\n");
-		goto err_close;
+		goto unmap_bf;
 	}
 
 	priv->eq_table.inta_pin = adapter.inta_pin;
@@ -1332,6 +1332,9 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 
 	return 0;
 
+unmap_bf:
+	unmap_bf_area(dev);
+
 err_close:
 	mlx4_close_hca(dev);
 
@@ -1344,8 +1347,6 @@ err_stop_fw:
 		mlx4_UNMAP_FA(dev);
 		mlx4_free_icm(dev, priv->fw.fw_icm, 0);
 	}
-unmap_bf:
-	unmap_bf_area(dev);
 	return err;
 }
 
-- 
1.7.7

^ permalink raw reply related

* [PATCH net 2/5] net/mlx4_core: Add security check / enforcement for flow steering rules set for VMs
From: Yevgeny Petrilin @ 2012-09-06  8:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, Hadar Hen Zion, Or Gerlitz
In-Reply-To: <1346921452-2035-1-git-send-email-yevgenyp@mellanox.com>

From: Hadar Hen Zion <hadarh@mellanox.com>

Since VFs may be mapped to VMs which aren't trusted entities,  flow
steering rules attached through the wrapper on behalf of VFs must be
checked to make sure that their L2 specification relate to MAC address
assigned to that VF, and add L2 specification if its missing.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  116 ++++++++++++++++++++
 include/linux/mlx4/device.h                        |   11 ++
 2 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 94ceddd..293c9e8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -42,6 +42,7 @@
 #include <linux/mlx4/cmd.h>
 #include <linux/mlx4/qp.h>
 #include <linux/if_ether.h>
+#include <linux/etherdevice.h>
 
 #include "mlx4.h"
 #include "fw.h"
@@ -2776,18 +2777,133 @@ ex_put:
 	return err;
 }
 
+/*
+ * MAC validation for Flow Steering rules.
+ * VF can attach rules only with a mac address which is assigned to it.
+ */
+static int validate_eth_header_mac(int slave, struct _rule_hw *eth_header,
+				   struct list_head *rlist)
+{
+	struct mac_res *res, *tmp;
+	__be64 be_mac;
+
+	/* make sure it isn't multicast or broadcast mac*/
+	if (!is_multicast_ether_addr(eth_header->eth.dst_mac) &&
+	    !is_broadcast_ether_addr(eth_header->eth.dst_mac)) {
+		list_for_each_entry_safe(res, tmp, rlist, list) {
+			be_mac = cpu_to_be64(res->mac << 16);
+			if (!memcmp(&be_mac, eth_header->eth.dst_mac, ETH_ALEN))
+				return 0;
+		}
+		pr_err("MAC %pM doesn't belong to VF %d, Steering rule rejected\n",
+		       eth_header->eth.dst_mac, slave);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * In case of missing eth header, append eth header with a MAC address
+ * assigned to the VF.
+ */
+static int add_eth_header(struct mlx4_dev *dev, int slave,
+			  struct mlx4_cmd_mailbox *inbox,
+			  struct list_head *rlist, int header_id)
+{
+	struct mac_res *res, *tmp;
+	u8 port;
+	struct mlx4_net_trans_rule_hw_ctrl *ctrl;
+	struct mlx4_net_trans_rule_hw_eth *eth_header;
+	struct mlx4_net_trans_rule_hw_ipv4 *ip_header;
+	struct mlx4_net_trans_rule_hw_tcp_udp *l4_header;
+	__be64 be_mac = 0;
+	__be64 mac_msk = cpu_to_be64(MLX4_MAC_MASK << 16);
+
+	ctrl = (struct mlx4_net_trans_rule_hw_ctrl *)inbox->buf;
+	port = be32_to_cpu(ctrl->vf_vep_port) & 0xff;
+	eth_header = (struct mlx4_net_trans_rule_hw_eth *)(ctrl + 1);
+
+	/* Clear a space in the inbox for eth header */
+	switch (header_id) {
+	case MLX4_NET_TRANS_RULE_ID_IPV4:
+		ip_header =
+			(struct mlx4_net_trans_rule_hw_ipv4 *)(eth_header + 1);
+		memmove(ip_header, eth_header,
+			sizeof(*ip_header) + sizeof(*l4_header));
+		break;
+	case MLX4_NET_TRANS_RULE_ID_TCP:
+	case MLX4_NET_TRANS_RULE_ID_UDP:
+		l4_header = (struct mlx4_net_trans_rule_hw_tcp_udp *)
+			    (eth_header + 1);
+		memmove(l4_header, eth_header, sizeof(*l4_header));
+		break;
+	default:
+		return -EINVAL;
+	}
+	list_for_each_entry_safe(res, tmp, rlist, list) {
+		if (port == res->port) {
+			be_mac = cpu_to_be64(res->mac << 16);
+			break;
+		}
+	}
+	if (!be_mac) {
+		pr_err("Failed adding eth header to FS rule, Can't find matching MAC for port %d .\n",
+		       port);
+		return -EINVAL;
+	}
+
+	memset(eth_header, 0, sizeof(*eth_header));
+	eth_header->size = sizeof(*eth_header) >> 2;
+	eth_header->id = cpu_to_be16(__sw_id_hw[MLX4_NET_TRANS_RULE_ID_ETH]);
+	memcpy(eth_header->dst_mac, &be_mac, ETH_ALEN);
+	memcpy(eth_header->dst_mac_msk, &mac_msk, ETH_ALEN);
+
+	return 0;
+
+}
+
 int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
 					 struct mlx4_vhcr *vhcr,
 					 struct mlx4_cmd_mailbox *inbox,
 					 struct mlx4_cmd_mailbox *outbox,
 					 struct mlx4_cmd_info *cmd)
 {
+
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
+	struct list_head *rlist = &tracker->slave_list[slave].res_list[RES_MAC];
 	int err;
+	struct mlx4_net_trans_rule_hw_ctrl *ctrl;
+	struct _rule_hw  *rule_header;
+	int header_id;
 
 	if (dev->caps.steering_mode !=
 	    MLX4_STEERING_MODE_DEVICE_MANAGED)
 		return -EOPNOTSUPP;
 
+	ctrl = (struct mlx4_net_trans_rule_hw_ctrl *)inbox->buf;
+	rule_header = (struct _rule_hw *)(ctrl + 1);
+	header_id = map_hw_to_sw_id(be16_to_cpu(rule_header->id));
+
+	switch (header_id) {
+	case MLX4_NET_TRANS_RULE_ID_ETH:
+		if (validate_eth_header_mac(slave, rule_header, rlist))
+			return -EINVAL;
+		break;
+	case MLX4_NET_TRANS_RULE_ID_IPV4:
+	case MLX4_NET_TRANS_RULE_ID_TCP:
+	case MLX4_NET_TRANS_RULE_ID_UDP:
+		pr_warn("Can't attach FS rule without L2 headers, adding L2 header.\n");
+		if (add_eth_header(dev, slave, inbox, rlist, header_id))
+			return -EINVAL;
+		vhcr->in_modifier +=
+			sizeof(struct mlx4_net_trans_rule_hw_eth) >> 2;
+		break;
+	default:
+		pr_err("Corrupted mailbox.\n");
+		return -EINVAL;
+	}
+
 	err = mlx4_cmd_imm(dev, inbox->dma, &vhcr->out_param,
 			   vhcr->in_modifier, 0,
 			   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 244ba90..6e1b0f9 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -798,6 +798,17 @@ enum mlx4_net_trans_rule_id {
 
 extern const u16 __sw_id_hw[];
 
+static inline int map_hw_to_sw_id(u16 header_id)
+{
+
+	int i;
+	for (i = 0; i < MLX4_NET_TRANS_RULE_NUM; i++) {
+		if (header_id == __sw_id_hw[i])
+			return i;
+	}
+	return -EINVAL;
+}
+
 enum mlx4_net_trans_promisc_mode {
 	MLX4_FS_PROMISC_NONE = 0,
 	MLX4_FS_PROMISC_UPLINK,
-- 
1.7.7

^ permalink raw reply related

* [PATCH net 0/5] net/mlx4: Fixes to mlx4_core driver.
From: Yevgeny Petrilin @ 2012-09-06  8:50 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hello Dave,

Add security check / enforcement for flow steering rules set for VMs:
Since VFs may be mapped to VMs which aren't trusted entities, flow
steering rules attached through the wrapper on behalf of VFs must be
checked to make sure that their L2 specification relate to MAC address
assigned to that VF, and add L2 specification if its missing.

Fixing bad handling of command failures, can lead to crash.

Fixing wrong promiscuous mode entries management.
---
Hadar Hen Zion (2):
  net/mlx4_core: Put Firmware flow steering structures in common header files
  net/mlx4_core: Add security check / enforcement for flow steering rules set for VMs
Aviad Yehezkel (2):
  net/mlx4_core: Looking for promiscuous entries on the correct port
  net/mlx4_core: Fixing error flow in case of QUERY_FW failure
Eugenia Emantayev (1):
  net/mlx4_core: Return the error value in case of command initialization failure

 drivers/net/ethernet/mellanox/mlx4/main.c             |   16 +-
 drivers/net/ethernet/mellanox/mlx4/mcg.c              |  106 ++--------------
 drivers/net/ethernet/mellanox/mlx4/mlx4.h             |   76 +++++++++++
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c |  116 ++++++++++++++++++
 include/linux/mlx4/device.h                           |   13 ++
 5 files changed, 229 insertions(+), 98 deletions(-)

-- 
1.7.8.2

^ permalink raw reply

* [PATCH net 3/5] net/mlx4_core: Looking for promiscuous entries on the correct port
From: Yevgeny Petrilin @ 2012-09-06  8:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, Aviad Yehezkel, Yevgeny Petrilin
In-Reply-To: <1346921452-2035-1-git-send-email-yevgenyp@mellanox.com>

From: Aviad Yehezkel <aviadye@mellanox.com>

The search for promisc entries was always done on the first port,
While the addition is done on the correct port.
This lead to resource leackage of promisc entries on the second
port and brought to a state where we could no longer enter to
promiscuous mode after enough iterations of "ifconfig promisc"
on the second port.
Fix that by using the correct port when searching.

Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mcg.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 2673f3b..e151c21 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -137,11 +137,11 @@ static int mlx4_GID_HASH(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox,
 	return err;
 }
 
-static struct mlx4_promisc_qp *get_promisc_qp(struct mlx4_dev *dev, u8 pf_num,
+static struct mlx4_promisc_qp *get_promisc_qp(struct mlx4_dev *dev, u8 port,
 					      enum mlx4_steer_type steer,
 					      u32 qpn)
 {
-	struct mlx4_steer *s_steer = &mlx4_priv(dev)->steer[pf_num];
+	struct mlx4_steer *s_steer = &mlx4_priv(dev)->steer[port - 1];
 	struct mlx4_promisc_qp *pqp;
 
 	list_for_each_entry(pqp, &s_steer->promisc_qps[steer], list) {
@@ -182,7 +182,7 @@ static int new_steering_entry(struct mlx4_dev *dev, u8 port,
 	/* If the given qpn is also a promisc qp,
 	 * it should be inserted to duplicates list
 	 */
-	pqp = get_promisc_qp(dev, 0, steer, qpn);
+	pqp = get_promisc_qp(dev, port, steer, qpn);
 	if (pqp) {
 		dqp = kmalloc(sizeof *dqp, GFP_KERNEL);
 		if (!dqp) {
@@ -256,7 +256,7 @@ static int existing_steering_entry(struct mlx4_dev *dev, u8 port,
 
 	s_steer = &mlx4_priv(dev)->steer[port - 1];
 
-	pqp = get_promisc_qp(dev, 0, steer, qpn);
+	pqp = get_promisc_qp(dev, port, steer, qpn);
 	if (!pqp)
 		return 0; /* nothing to do */
 
@@ -302,7 +302,7 @@ static bool check_duplicate_entry(struct mlx4_dev *dev, u8 port,
 	s_steer = &mlx4_priv(dev)->steer[port - 1];
 
 	/* if qp is not promisc, it cannot be duplicated */
-	if (!get_promisc_qp(dev, 0, steer, qpn))
+	if (!get_promisc_qp(dev, port, steer, qpn))
 		return false;
 
 	/* The qp is promisc qp so it is a duplicate on this index
@@ -352,7 +352,7 @@ static bool can_remove_steering_entry(struct mlx4_dev *dev, u8 port,
 	members_count = be32_to_cpu(mgm->members_count) & 0xffffff;
 	for (i = 0;  i < members_count; i++) {
 		qpn = be32_to_cpu(mgm->qp[i]) & MGM_QPN_MASK;
-		if (!get_promisc_qp(dev, 0, steer, qpn) && qpn != tqpn) {
+		if (!get_promisc_qp(dev, port, steer, qpn) && qpn != tqpn) {
 			/* the qp is not promisc, the entry can't be removed */
 			goto out;
 		}
@@ -398,7 +398,7 @@ static int add_promisc_qp(struct mlx4_dev *dev, u8 port,
 
 	mutex_lock(&priv->mcg_table.mutex);
 
-	if (get_promisc_qp(dev, 0, steer, qpn)) {
+	if (get_promisc_qp(dev, port, steer, qpn)) {
 		err = 0;  /* Noting to do, already exists */
 		goto out_mutex;
 	}
@@ -503,7 +503,7 @@ static int remove_promisc_qp(struct mlx4_dev *dev, u8 port,
 	s_steer = &mlx4_priv(dev)->steer[port - 1];
 	mutex_lock(&priv->mcg_table.mutex);
 
-	pqp = get_promisc_qp(dev, 0, steer, qpn);
+	pqp = get_promisc_qp(dev, port, steer, qpn);
 	if (unlikely(!pqp)) {
 		mlx4_warn(dev, "QP %x is not promiscuous QP\n", qpn);
 		/* nothing to do */
-- 
1.7.7

^ permalink raw reply related

* [PATCH net 1/5] net/mlx4_core: Put Firmware flow steering structures in common header files
From: Yevgeny Petrilin @ 2012-09-06  8:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, Hadar Hen Zion, Or Gerlitz
In-Reply-To: <1346921452-2035-1-git-send-email-yevgenyp@mellanox.com>

From: Hadar Hen Zion <hadarh@mellanox.com>

To allow for usage of the flow steering Firmware structures in more locations over the driver,
such as the resource tracker, move them from mcg.c to common header files.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mcg.c  |   90 ++--------------------------
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |   76 ++++++++++++++++++++++++
 include/linux/mlx4/device.h               |    2 +
 3 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index a018ea2..2673f3b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -650,13 +650,6 @@ static int find_entry(struct mlx4_dev *dev, u8 port,
 	return err;
 }
 
-struct mlx4_net_trans_rule_hw_ctrl {
-	__be32 ctrl;
-	__be32 vf_vep_port;
-	__be32 qpn;
-	__be32 reserved;
-};
-
 static void trans_rule_ctrl_to_hw(struct mlx4_net_trans_rule *ctrl,
 				  struct mlx4_net_trans_rule_hw_ctrl *hw)
 {
@@ -680,87 +673,18 @@ static void trans_rule_ctrl_to_hw(struct mlx4_net_trans_rule *ctrl,
 	hw->qpn = cpu_to_be32(ctrl->qpn);
 }
 
-struct mlx4_net_trans_rule_hw_ib {
-	u8	size;
-	u8	rsvd1;
-	__be16	id;
-	u32	rsvd2;
-	__be32	qpn;
-	__be32	qpn_mask;
-	u8	dst_gid[16];
-	u8	dst_gid_msk[16];
-} __packed;
-
-struct mlx4_net_trans_rule_hw_eth {
-	u8	size;
-	u8	rsvd;
-	__be16	id;
-	u8	rsvd1[6];
-	u8	dst_mac[6];
-	u16	rsvd2;
-	u8	dst_mac_msk[6];
-	u16	rsvd3;
-	u8	src_mac[6];
-	u16	rsvd4;
-	u8	src_mac_msk[6];
-	u8      rsvd5;
-	u8      ether_type_enable;
-	__be16  ether_type;
-	__be16  vlan_id_msk;
-	__be16  vlan_id;
-} __packed;
-
-struct mlx4_net_trans_rule_hw_tcp_udp {
-	u8	size;
-	u8	rsvd;
-	__be16	id;
-	__be16	rsvd1[3];
-	__be16	dst_port;
-	__be16	rsvd2;
-	__be16	dst_port_msk;
-	__be16	rsvd3;
-	__be16	src_port;
-	__be16	rsvd4;
-	__be16	src_port_msk;
-} __packed;
-
-struct mlx4_net_trans_rule_hw_ipv4 {
-	u8	size;
-	u8	rsvd;
-	__be16	id;
-	__be32	rsvd1;
-	__be32	dst_ip;
-	__be32	dst_ip_msk;
-	__be32	src_ip;
-	__be32	src_ip_msk;
-} __packed;
-
-struct _rule_hw {
-	union {
-		struct {
-			u8 size;
-			u8 rsvd;
-			__be16 id;
-		};
-		struct mlx4_net_trans_rule_hw_eth eth;
-		struct mlx4_net_trans_rule_hw_ib ib;
-		struct mlx4_net_trans_rule_hw_ipv4 ipv4;
-		struct mlx4_net_trans_rule_hw_tcp_udp tcp_udp;
-	};
+const u16 __sw_id_hw[] = {
+	[MLX4_NET_TRANS_RULE_ID_ETH]     = 0xE001,
+	[MLX4_NET_TRANS_RULE_ID_IB]      = 0xE005,
+	[MLX4_NET_TRANS_RULE_ID_IPV6]    = 0xE003,
+	[MLX4_NET_TRANS_RULE_ID_IPV4]    = 0xE002,
+	[MLX4_NET_TRANS_RULE_ID_TCP]     = 0xE004,
+	[MLX4_NET_TRANS_RULE_ID_UDP]     = 0xE006
 };
 
 static int parse_trans_rule(struct mlx4_dev *dev, struct mlx4_spec_list *spec,
 			    struct _rule_hw *rule_hw)
 {
-	static const u16 __sw_id_hw[] = {
-		[MLX4_NET_TRANS_RULE_ID_ETH]     = 0xE001,
-		[MLX4_NET_TRANS_RULE_ID_IB]      = 0xE005,
-		[MLX4_NET_TRANS_RULE_ID_IPV6]    = 0xE003,
-		[MLX4_NET_TRANS_RULE_ID_IPV4]    = 0xE002,
-		[MLX4_NET_TRANS_RULE_ID_TCP]     = 0xE004,
-		[MLX4_NET_TRANS_RULE_ID_UDP]     = 0xE006
-	};
-
 	static const size_t __rule_hw_sz[] = {
 		[MLX4_NET_TRANS_RULE_ID_ETH] =
 			sizeof(struct mlx4_net_trans_rule_hw_eth),
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 4d9df8f..dba69d9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -690,6 +690,82 @@ struct mlx4_steer {
 	struct list_head steer_entries[MLX4_NUM_STEERS];
 };
 
+struct mlx4_net_trans_rule_hw_ctrl {
+	__be32 ctrl;
+	__be32 vf_vep_port;
+	__be32 qpn;
+	__be32 reserved;
+};
+
+struct mlx4_net_trans_rule_hw_ib {
+	u8 size;
+	u8 rsvd1;
+	__be16 id;
+	u32 rsvd2;
+	__be32 qpn;
+	__be32 qpn_mask;
+	u8 dst_gid[16];
+	u8 dst_gid_msk[16];
+} __packed;
+
+struct mlx4_net_trans_rule_hw_eth {
+	u8	size;
+	u8	rsvd;
+	__be16	id;
+	u8	rsvd1[6];
+	u8	dst_mac[6];
+	u16	rsvd2;
+	u8	dst_mac_msk[6];
+	u16	rsvd3;
+	u8	src_mac[6];
+	u16	rsvd4;
+	u8	src_mac_msk[6];
+	u8      rsvd5;
+	u8      ether_type_enable;
+	__be16  ether_type;
+	__be16  vlan_id_msk;
+	__be16  vlan_id;
+} __packed;
+
+struct mlx4_net_trans_rule_hw_tcp_udp {
+	u8	size;
+	u8	rsvd;
+	__be16	id;
+	__be16	rsvd1[3];
+	__be16	dst_port;
+	__be16	rsvd2;
+	__be16	dst_port_msk;
+	__be16	rsvd3;
+	__be16	src_port;
+	__be16	rsvd4;
+	__be16	src_port_msk;
+} __packed;
+
+struct mlx4_net_trans_rule_hw_ipv4 {
+	u8	size;
+	u8	rsvd;
+	__be16	id;
+	__be32	rsvd1;
+	__be32	dst_ip;
+	__be32	dst_ip_msk;
+	__be32	src_ip;
+	__be32	src_ip_msk;
+} __packed;
+
+struct _rule_hw {
+	union {
+		struct {
+			u8 size;
+			u8 rsvd;
+			__be16 id;
+		};
+		struct mlx4_net_trans_rule_hw_eth eth;
+		struct mlx4_net_trans_rule_hw_ib ib;
+		struct mlx4_net_trans_rule_hw_ipv4 ipv4;
+		struct mlx4_net_trans_rule_hw_tcp_udp tcp_udp;
+	};
+};
+
 struct mlx4_priv {
 	struct mlx4_dev		dev;
 
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index bd6c9fc..244ba90 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -796,6 +796,8 @@ enum mlx4_net_trans_rule_id {
 	MLX4_NET_TRANS_RULE_NUM, /* should be last */
 };
 
+extern const u16 __sw_id_hw[];
+
 enum mlx4_net_trans_promisc_mode {
 	MLX4_FS_PROMISC_NONE = 0,
 	MLX4_FS_PROMISC_UPLINK,
-- 
1.7.7

^ permalink raw reply related

* Re: [PATCH net-next] ipv6: Export nd_tbl to allow modules to support IPv6
From: Thomas Graf @ 2012-09-06  8:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120905.130636.2295422323329670307.davem@davemloft.net>

On Wed, Sep 05, 2012 at 01:06:36PM -0400, David Miller wrote:
> So if one of our goals is to move towards a situation where all neigh
> accesses are refcount'less, having those external users makes that
> nearly impossible.
> 
> Instead, I'd rather see patches that mark arp_tbl as being exported
> only for internal usage inside of the tree, so that we can reach that
> goal.
> 
> I'm not applying this, sorry.

Fair enough

Does that mean you dismiss neighbour lookups by external users in
general in order to get rid of the refcnt?

Assuming that lookups would still be ok, would an ipv6 version of
__ipv4_neigh_lookup() be an acceptable API for external users?
(Yes there is __ipv6_neigh_lookup() already but unlike the ipv4 version
it takes the table as first argument)

^ permalink raw reply

* Re: [net-next 1/6] igb: Tidy up wrapping for CONFIG_IGB_PTP.
From: Richard Cochran @ 2012-09-06  8:49 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Matthew Vick, netdev, gospo, sassmann
In-Reply-To: <1346921067.4228.12.camel@jtkirshe-mobl>

On Thu, Sep 06, 2012 at 01:44:27AM -0700, Jeff Kirsher wrote:
> 
> As I recall, he stated he was going to integrate your suggestions in a
> new patch set that would follow this one.

Okay, but won't that mean moving the time stamping functions back, and
removing the #ifdefs again?

> Also, IMHO, while I look at pre-processor tags all day and get used to
> matching up #ifdef to #endif's, I still see #endif /* CONFIG_IGB_PTP */
> useful.  While it is not currently consistent, I see it more useful to
> fix as we go, rather than create a single patch to add a comment for
> every #endif.

Okay, fine.

Thanks,
Richard

^ permalink raw reply

* Re: [net-next 1/6] igb: Tidy up wrapping for CONFIG_IGB_PTP.
From: Jeff Kirsher @ 2012-09-06  8:44 UTC (permalink / raw)
  To: Richard Cochran; +Cc: davem, Matthew Vick, netdev, gospo, sassmann
In-Reply-To: <20120906080938.GE2550@netboy.at.omicron.at>

[-- Attachment #1: Type: text/plain, Size: 1043 bytes --]

On Thu, 2012-09-06 at 10:09 +0200, Richard Cochran wrote:
> On Wed, Sep 05, 2012 at 04:35:01PM -0700, Jeff Kirsher wrote:
> > From: Matthew Vick <matthew.vick@intel.com>
> > 
> > For users without CONFIG_IGB_PTP=y, we should not be compiling any
> PTP
> > code into the driver. Tidy up the wrapping in igb to support this.
> 
> So, you have ignored everything I wrote about time stamping
> applications.
> 
> (no)Thanks,
> Richard 

As I recall, he stated he was going to integrate your suggestions in a
new patch set that would follow this one.

In regards to your previous comments about the earlier patch I submitted
to wrap the code.  It was to fix a bug when the driver and time stamping
support are both modules.

Also, IMHO, while I look at pre-processor tags all day and get used to
matching up #ifdef to #endif's, I still see #endif /* CONFIG_IGB_PTP */
useful.  While it is not currently consistent, I see it more useful to
fix as we go, rather than create a single patch to add a comment for
every #endif.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: CBQ(but probably u32 filter bug), kernel "freeze", at least from 3.2.0
From: Eric Dumazet @ 2012-09-06  8:40 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev
In-Reply-To: <1346151568.3001.20.camel@edumazet-glaptop>

On Tue, 2012-08-28 at 03:59 -0700, Eric Dumazet wrote:
> On Tue, 2012-08-28 at 07:50 +0300, Denys Fedoryshchenko wrote:
> > Hi
> > 
> > Got information from friend, confirmed that it crashed at least two my 
> > boxes also :)
> > 3.0.5-rc1 is working fine, 3.4.1 , 3.2.0 from ubuntu  - crashing
> > No watchdog fired, and didn't got yet significant debugging 
> > information.
> > 
> > Very easy to reproduce:
> > 1)run the script
> > 2)ping 192.168.3.234
> > 
> > script:
> > DEV_OUT=eth0
> > ICMP="match ip protocol 1 0xff"
> > U32="protocol ip u32"
> > DST="match ip dst"
> > tc qdisc add dev $DEV_OUT root handle 1: cbq avpkt 1000 bandwidth 
> > 100mbit
> > tc class add dev $DEV_OUT parent 1: classid 1:1 cbq rate 512kbit allot 
> > 1500 prio 5 bounded isolated
> > tc filter add dev $DEV_OUT parent 1:              prio 3 $U32 $ICMP 
> > $DST 192.168.3.234 flowid 1:
> > tc qdisc add dev $DEV_OUT parent 1:1 sfq perturb 10
> 
> Not sure what your friend expected from this buggy configuration.
> 
> It probably never worked at all.
> 
> CBQ needs at least one child class and one leaf class.
> 
> This scripts creates a loop inside CBQ, so cpu is probably looping in
> cbq_enqueue() (or more exactly cbq_classify()), as instructed by the
> sysadmin ;)
> 
> u32 (or sfq) seems ok.
> 
> Could you try the following patch ?
> 
> Thanks !
> 
> diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
> index 6aabd77..564b9fc 100644
> --- a/net/sched/sch_cbq.c
> +++ b/net/sched/sch_cbq.c
> @@ -250,10 +250,11 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
>  			else if ((cl = defmap[res.classid & TC_PRIO_MAX]) == NULL)
>  				cl = defmap[TC_PRIO_BESTEFFORT];
>  
> -			if (cl == NULL || cl->level >= head->level)
> +			if (cl == NULL)
>  				goto fallback;
>  		}
> -
> +		if (cl->level >= head->level)
> +			goto fallback;
>  #ifdef CONFIG_NET_CLS_ACT
>  		switch (result) {
>  		case TC_ACT_QUEUED:
> 

Hi Denys

Any feedback on the suggested patch ?

Thanks !

^ permalink raw reply

* Re: changing usbnet's API to better deal with cdc-ncm
From: Bjørn Mork @ 2012-09-06  8:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CACVXFVMXz1mqANBYR56KZsh3RgN5M_og_OggDEHT02w0Pe-UiA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> On Thu, Sep 6, 2012 at 4:12 AM, Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org> wrote:
>> Hi,
>>
>> looking at cdc-ncm it seeems to me that cdc-ncm is forced to play
>> very dirty games because usbnet doesn't have a notion about aggregating
>> packets for a single transfer.
>
> The Ethernet API we are using does not support transmitting multiple Ethernet
> frames in a single call, so the aggregation things should be done inside low
> level driver, in fact it is just what some wlan(802.11n) drivers have
> been doing.
>
> IMO, the current .tx_fixup is intelligent enough to handle aggregation:
>
>       - return a skb_out for sending if low level drivers think it is
> ready to send
>         the aggregated frames
>      -  return NULL if  the low level drivers think they need to wait
> for more frames
>
> Of course, the low level drivers need to start a timer to trigger sending
> remainder frames in case of timeout and no further frames come from
> upper protocol stack.
>
> Looks the introduced .tx_bundle is not necessary since .tx_fixup is OK.

The minidriver does not have any information about tx in progress.  The
strategy above, which is what cdc_ncm uses today, is fine as long as you
always want to wait as long as you always want to fill as many frames as
possible in each packet.  But what if the queue is empty and the device
is just sitting there doing nothing while you are waiting for more
frames?  Then you are just adding unnecessary latency.

There are also several usbnet minidrivers for protocols with frame
aggregation support.  Most of them do not currently implement tx
aggregation, possibly due to the complexity.  This makes a common
aggregation framework interesting in any case.  Reimplementing similar
loginc in multiple minidrivers is meaningless.

I believe Oliver is adding very useful functionality to usbnet here.
Functionality which at least cdc_ncm and possibly other existing
minidrivers can take advantage of immediately.  There also seems to be a
trend towards more complex aggregating protocols as the devices get
smarter and speeds go up.


BJørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: changing usbnet's API to better deal with cdc-ncm
From: Bjørn Mork @ 2012-09-06  8:17 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: alexey.orishko, netdev, linux-usb
In-Reply-To: <2791550.LhGu6po6Xy@linux-lqwf.site>

Oliver Neukum <oneukum@suse.de> writes:

> looking at cdc-ncm it seeems to me that cdc-ncm is forced to play
> very dirty games because usbnet doesn't have a notion about aggregating
> packets for a single transfer.
>
> It seems to me that this can be fixed introducing a method for bundling,
> which tells usbnet how packets have been aggregated. To have performance
> usbnet strives to always keep at least two transfers in flight.
>
> The code isn't complete and I need to get a device for testing, but to get
> your opinion, I ask you to comment on what I have now.

I haven't tested this on any device either, but it looks like the right
direction to me.  Note that there are several other existing minidrivers
which could take advantage of this code. A quick look found that both
sierra_net and rndis_host have some unbundling support in their
rx_fixups, but both ignore tx bundling. rndis_host has the following
explanation in tx_fixup:

        /* fill out the RNDIS header.  we won't bother trying to batch
         * packets; Linux minimizes wasted bandwidth through tx queues.
         */

Although that may be true for the host, I am not sure it will be true
for every device?


> @@ -874,71 +840,37 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
>  	/* return skb */
>  	ctx->tx_curr_skb = NULL;
>  	ctx->netdev->stats.tx_packets += ctx->tx_curr_frame_num;


Maybe all the statistics could be moved back to usbnet now that it will
see all packets again?


> index f87cf62..bb2f622 100644
> --- a/include/linux/usb/usbnet.h
> +++ b/include/linux/usb/usbnet.h
> @@ -33,6 +33,7 @@ struct usbnet {
>  	wait_queue_head_t	*wait;
>  	struct mutex		phy_mutex;
>  	unsigned char		suspend_count;
> +	atomic_t		tx_in_flight;
>  
>  	/* i/o info: pipes etc */
>  	unsigned		in, out;


So you don't keep any timer here?  The idea is that you always will
transmit immediately unless there are two transmit's in flight?  I
believe that is a change which has to be tested against real devices.
They may be optimized for receiving bundles.


Not really related, but I am still worrying how MBIM is going to fit
into the usbnet model where you have a strict relation between one
netdev and one USB interface.  Do you see any way usbnet could be
extended to manage a list of netdevs?

All the netdev related functions doing

	struct usbnet		*dev = netdev_priv(net);

would still work.  But all the USB related functions using dev->net to
get _the_ netdev would fail.  Maybe this will be too difficult to
implement within the usbnet framework at all?


Bjørn

^ permalink raw reply

* Re: [net-next 0/6][pull request] Intel Wired LAN Driver Updates
From: Richard Cochran @ 2012-09-06  8:13 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, netdev, gospo, sassmann
In-Reply-To: <1346888106-25638-1-git-send-email-jeffrey.t.kirsher@intel.com>

On Wed, Sep 05, 2012 at 04:35:00PM -0700, Jeff Kirsher wrote:
> 
> Matthew Vick (6):
>   igb: Tidy up wrapping for CONFIG_IGB_PTP.
>   igb: Update PTP function names/variables and locations.
>   igb: Correct PTP support query from ethtool.
>   igb: Store the MAC address in the name in the PTP struct.

These four still look like unnecessary churn to me, and possibly the
wrong direction WRT time stamping in the 82580.

>   igb: Prevent dropped Tx timestamps via work items and interrupts.
>   igb: Add 1588 support to I210/I211.

But these two look important, so I don't know what to say.

Sorry,
Richard

^ permalink raw reply

* RE: changing usbnet's API to better deal with cdc-ncm
From: Alexey ORISHKO @ 2012-09-06  8:13 UTC (permalink / raw)
  To: Oliver Neukum, bjorn-yOkvZcmFvRU@public.gmane.org, Ming Lei
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Alexey Orishko
In-Reply-To: <2791550.LhGu6po6Xy-ugxBuEnWX9yG/4A2pS7c2Q@public.gmane.org>

> -----Original Message-----
> From: Oliver Neukum [mailto:oneukum-l3A5Bk7waGM@public.gmane.org]


> looking at cdc-ncm it seeems to me that cdc-ncm is forced to play very
> dirty games because usbnet doesn't have a notion about aggregating
> packets for a single transfer.

Several issues need to be improved:
Tx path:
1. IP packets must be accumulated in one NTB. Currently it's done via data copy.
Preferred way would be a possibility to have a list of skb-s in resulting frame sent down.

2. Timer is needed to accumulate enough packets to get acceptable throughput and reducing amount of HW INTs on device side.
If we get access to Tx queue size and can read skb from it, time can be removed. But Tx queue is above usbnet, so doesn't look possible in current framework.

3. While trying to get best throughput we also need to keep latency in mind, because in some setups it is quite important.
As a simple solution configuration parameters could be used to address this issue.

Rx path:
4. IP packets are cloned to separate skb and length of actual data set to eth packet size, while skb size is still the same as skb containing full NTB frame. This causes problems with TCP stack when throughput is high because of flow control in the stack, if too much data allocated.
Someone suggested a patch for it, which was rejected. Anyway, I don't think copy data to a new skb would be a nice solution, but keeping several clones with size equal to incoming skb in not good either. 

/alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox