Netdev List


* [PATCH] tc: fix parallel build file with lex/yacc
From: Mike Frysinger @ 2011-10-18 21:38 UTC (permalink / raw)
  To: stephen.hemminger, netdev

Building iproute2 in parallel might hit the race failure:
	emp_ematch.l:2:30: fatal error: emp_ematch.yacc.h:
		No such file or directory
	make[1]: *** [emp_ematch.lex.o] Error 1

This is because we currently allow the yacc/lex files to generate and
compile in parallel.  So add a simple dependency to make sure yacc has
finished before we attempt to compile the lex output.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 tc/Makefile |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/tc/Makefile b/tc/Makefile
index 08aa4ce..b2ca165 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -136,6 +136,11 @@ m_xt_old.so: m_xt_old.c
 %.lex.c: %.l
 	$(LEX) $(LEXFLAGS) -o$@ $<
 
+# our lexer includes the header from yacc, so make sure
+# we don't attempt to compile it before the header has
+# been generated as part of the yacc step.
+emp_ematch.lex.o: emp_ematch.yacc.c
+
 ifneq ($(SHARED_LIBS),y)
 
 tc: static-syms.o
-- 
1.7.6.1


* Re: Comment on nf_queue NF_STOLEN patch
From: Eric Dumazet @ 2011-10-18 21:23 UTC (permalink / raw)
  To: Jim Sansing; +Cc: Linux Network Development list
In-Reply-To: <4E9DCEAD.7070603@verizon.net>

Le mardi 18 octobre 2011 à 15:08 -0400, Jim Sansing a écrit :
> I have been working on a kernel module that registers with netfilter,
> and I noticed that a patch was added to nf_queue that changed the
> handling of return code NF_STOLEN from 'do nothing' to 'free the skb'.
> I'm not sure which kernel version this went in, but the date of the
> patch is Feb 19, 2010.
> 
> Everything I have read about netfilter states that it is up to the
> netfilter hook to free the skb if NF_STOLEN is returned.  The
> implications of this patch from a hook programming perspective are:
> 
> 1) If the skb is used after the return from the hook, it must be cloned.
> 2) The original skb must not be freed.
> 
> I suggest that a comment be added to include/linux/netfilter.h that says
> explicitly the skb will be freed if NF_STOLEN is returned.

But it's not true. Just read the code.

If you are working on this stuff I recommend you take a look at
commits :

c6675233f9015d3c0460c8aab53ed9b99d915c64
(netfilter: nf_queue: reject NF_STOLEN verdicts from userspace)

fad54440438a7c231a6ae347738423cbabc936d9
(netfilter: avoid double free in nf_reinject)

64507fdbc29c3a622180378210ecea8659b14e40
(netfilter: nf_queue: fix NF_STOLEN skb leak)

3bc38712e3a6e0596ccb6f8299043a826f983701
([NETFILTER]: nf_queue: handle NF_STOP and unknown verdicts in
nf_reinject)


* [PATCH 2/2] batman-adv: correctly set the data field in the TT_REPONSE packet
From: Marek Lindner @ 2011-10-18 21:01 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli, Marek Lindner
In-Reply-To: <1318971669-941-1-git-send-email-lindner_marek@yahoo.de>

From: Antonio Quartulli <ordex@autistici.org>

In the TT_RESPONSE packet, the number of carried entries is not correctly set.
This leads to a wrong interpretation of the packet payload on the receiver side,
causing random entries to be added to the global translation table. As a result,
the table always gets corrupted, triggering a table recovery every time.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
---
 net/batman-adv/translation-table.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index f599db9..ef1acfd 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -999,7 +999,6 @@ static struct sk_buff *tt_response_fill_table(uint16_t tt_len, uint8_t ttvn,
 	tt_response = (struct tt_query_packet *)skb_put(skb,
 						     tt_query_size + tt_len);
 	tt_response->ttvn = ttvn;
-	tt_response->tt_data = htons(tt_tot);
 
 	tt_change = (struct tt_change *)(skb->data + tt_query_size);
 	tt_count = 0;
@@ -1025,6 +1024,10 @@ static struct sk_buff *tt_response_fill_table(uint16_t tt_len, uint8_t ttvn,
 	}
 	rcu_read_unlock();
 
+	/* store in the message the number of entries we have successfully
+	 * copied */
+	tt_response->tt_data = htons(tt_count);
+
 out:
 	return skb;
 }
-- 
1.7.5.4
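
The effect of the fix can be illustrated outside the kernel. What follows is
a minimal userspace sketch, not the batman-adv code: struct demo_entry,
struct demo_response and fill_response() are hypothetical names, and the
"skip" marker is an assumed stand-in for whatever makes the kernel loop leave
an entry out. It only shows why the header field should carry the number of
entries actually copied (tt_count) rather than the total announced up front
(tt_tot).

```c
#include <arpa/inet.h>	/* htons(), ntohs() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the on-wire structures. */
struct demo_entry {
	uint8_t addr[6];
	uint8_t skip;		/* assumed: non-zero means "do not copy" */
};

struct demo_response {
	uint16_t tt_data;	/* number of entries, network byte order */
	struct demo_entry entries[8];
};

/*
 * Copy up to tt_tot entries from src into resp, skipping flagged ones,
 * and record how many were really copied.
 */
static uint16_t fill_response(struct demo_response *resp,
			      const struct demo_entry *src, size_t n_src,
			      uint16_t tt_tot)
{
	uint16_t tt_count = 0;
	size_t i;

	assert(tt_tot <= 8);	/* capacity of the demo buffer */
	for (i = 0; i < n_src && tt_count < tt_tot; i++) {
		if (src[i].skip)
			continue;
		resp->entries[tt_count++] = src[i];
	}

	/* Store the count only after the loop, as the patch does. */
	resp->tt_data = htons(tt_count);
	return tt_count;
}
```

Had the field been set to htons(tt_tot) before the loop, a receiver would
read tt_tot entries from a buffer in which only tt_count were filled in.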


* [PATCH 1/2] batman-adv: fix tt_local_reset_flags() function
From: Marek Lindner @ 2011-10-18 21:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Marek Lindner
In-Reply-To: <1318971669-941-1-git-send-email-lindner_marek@yahoo.de>

From: Antonio Quartulli <ordex@autistici.org>

Currently the counter of tt_local_entry structures (num_local_tt) is incremented
for every entry each time tt_local_reset_flags() is invoked, even for entries
that do not carry the given flags. This causes the node to send wrong
TT_RESPONSE packets containing a copy of non-initialised memory, thus corrupting
other nodes' global translation tables and making higher level communication
impossible.

Reported-by: Junkeun Song <jun361-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Acked-by: Junkeun Song <jun361-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
---
 net/batman-adv/translation-table.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index fb6931d..f599db9 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -1668,6 +1668,8 @@ static void tt_local_reset_flags(struct bat_priv *bat_priv, uint16_t flags)
 		rcu_read_lock();
 		hlist_for_each_entry_rcu(tt_local_entry, node,
 					 head, hash_entry) {
+			if (!(tt_local_entry->flags & flags))
+				continue;
 			tt_local_entry->flags &= ~flags;
 			atomic_inc(&bat_priv->num_local_tt);
 		}
-- 
1.7.5.4
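
The counting rule introduced by the two added lines can be sketched in
userspace. This is an illustration only: struct demo_entry, DEMO_FLAG_NEW
and reset_flags() are hypothetical names, a plain array replaces the RCU
hash list, and a plain counter replaces the atomic one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a local translation-table entry. */
struct demo_entry {
	uint16_t flags;
};

#define DEMO_FLAG_NEW 0x0001	/* assumed flag value, for illustration only */

/*
 * Clear `flags` on every entry that has at least one of them set and
 * bump *num_local once per entry actually changed. Entries without the
 * flag are skipped, which is what the two added lines in the patch do.
 */
static void reset_flags(struct demo_entry *tbl, size_t n, uint16_t flags,
			unsigned int *num_local)
{
	size_t i;

	assert(num_local != NULL);
	for (i = 0; i < n; i++) {
		if (!(tbl[i].flags & flags))
			continue;
		tbl[i].flags &= ~flags;
		(*num_local)++;
	}
}
```

With the skip in place, calling reset_flags() a second time on the same
table leaves the counter unchanged; without it, every call would add the
full table size again.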


* pull request: batman-adv 2011-10-18 (more regression fixes)
From: Marek Lindner @ 2011-10-18 21:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n

Hi David,

we have identified and fixed 2 more critical bugs in the linux-3.1
code base. We are somewhat late in the 3.1 release cycle but hoped
to take advantage of the delayed release to get these patches 
included before the final version is out. If not I will have to
send these patches to stable@ and would like to see these patches
applied to net-next-2.6/3.2. No merge conflicts are to be expected.
Let us know what works best for you.

Thanks,
Marek

The following changes since commit 8b267b312df9343fea3bd679c509b36214b5a854:

  batman-adv: do_bcast has to be true for broadcast packets only (2011-09-22 20:27:10 +0200)

are available in the git repository at:
  git://git.open-mesh.org/linux-merge.git batman-adv/maint

Antonio Quartulli (2):
      batman-adv: fix tt_local_reset_flags() function
      batman-adv: correctly set the data field in the TT_REPONSE packet

 net/batman-adv/translation-table.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)


* Comment on nf_queue NF_STOLEN patch
From: Jim Sansing @ 2011-10-18 19:08 UTC (permalink / raw)
  To: Linux Network Development list
In-Reply-To: <CAEuXFEy89F_=J_PJMSbQ1fDcOEkkpz2vucq3LcToi7rTcZHcDA@mail.gmail.com>

I have been working on a kernel module that registers with netfilter,
and I noticed that a patch was added to nf_queue that changed the
handling of return code NF_STOLEN from 'do nothing' to 'free the skb'.
I'm not sure which kernel version this went in, but the date of the
patch is Feb 19, 2010.

Everything I have read about netfilter states that it is up to the
netfilter hook to free the skb if NF_STOLEN is returned.  The
implications of this patch from a hook programming perspective are:

1) If the skb is used after the return from the hook, it must be cloned.
2) The original skb must not be freed.

I suggest that a comment be added to include/linux/netfilter.h that says
explicitly the skb will be freed if NF_STOLEN is returned.

Later . . .   Jim


* Re: [PATCH] tproxy: copy transparent flag when creating a time wait
From: David Miller @ 2011-10-18 20:48 UTC (permalink / raw)
  To: hidden; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <1318969055.2959.7.camel@nessa.odu>

From: KOVACS Krisztian <hidden@balabit.hu>
Date: Tue, 18 Oct 2011 22:17:35 +0200

> The transparent socket option setting was not copied to the time wait
> socket when an inet socket was being replaced by a time wait socket. This
> broke the --transparent option of the socket match and may have caused
> FIN packets belonging to sockets in FIN_WAIT2 or TIME_WAIT state to be
> dropped by the packet filter.
> 
> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

I can't believe such a fundamental bug went unspotted for so long :-)

I'll apply this, thanks.


* [PATCH] tproxy: copy transparent flag when creating a time wait
From: KOVACS Krisztian @ 2011-10-18 20:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel, netdev

The transparent socket option setting was not copied to the time wait
socket when an inet socket was being replaced by a time wait socket. This
broke the --transparent option of the socket match and may have caused
FIN packets belonging to sockets in FIN_WAIT2 or TIME_WAIT state to be
dropped by the packet filter.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
---
 net/ipv4/tcp_minisocks.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index d2fe4e0..0ce3d06 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -328,6 +328,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
                struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
                const int rto = (icsk->icsk_rto << 2) - (icsk->icsk_rto >> 1);

+               tw->tw_transparent      = inet_sk(sk)->transparent;
                tw->tw_rcv_wscale       = tp->rx_opt.rcv_wscale;
                tcptw->tw_rcv_nxt       = tp->rcv_nxt;
                tcptw->tw_snd_nxt       = tp->snd_nxt;
-- 
1.7.7





* [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Krzysztof Wilczynski @ 2011-10-18 19:59 UTC (permalink / raw)
  To: Simon Horman; +Cc: Patrick McHardy, netdev

This is to address the following warning during compilation time:

  net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_leave’:
  net/netfilter/ipvs/ip_vs_core.c:532: warning: unused variable ‘cs’

This variable is indeed no longer in use.

Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>
---
 net/netfilter/ipvs/ip_vs_core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 00ea1ad..4f7d89d 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -529,7 +529,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 	   a cache_bypass connection entry */
 	ipvs = net_ipvs(net);
 	if (ipvs->sysctl_cache_bypass && svc->fwmark && unicast) {
-		int ret, cs;
+		int ret;
 		struct ip_vs_conn *cp;
 		unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
 				      iph.protocol == IPPROTO_UDP)?
-- 
1.7.7


* Re: [PATCH 6/7] mlx4_en: Adding rxhash support
From: Ben Hutchings @ 2011-10-18 19:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Yevgeny Petrilin, Eric Dumazet, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <20111018083606.6367ff50@nehalam.linuxnetplumber.net>

On Tue, 2011-10-18 at 08:36 -0700, Stephen Hemminger wrote:
> On Tue, 18 Oct 2011 08:59:44 +0000
> Yevgeny Petrilin <yevgenyp@mellanox.co.il> wrote:
> 
> > > 
> > > What is the gain using random values ?
> > > 
> > > Usually, we tend to have same hardware in a single machine, or we use
> > > active-backup bonding mode, and an active slave flip can change rxhash
> > > values with little effect, since this does not happen often.
> > > 
> > > I really prefer non-random values, because they allow replayable
> > > configurations: for a given tcp flow, the same rxhash value is given
> > > and the same cpu target in RPS. It's way easier to tune your machine for some workloads.
> > > 
> > 
> > There is no gain in random values, 
> > I'll make the change to have static value for RSS function.
> > 
> > We might consider how to ensure consistency across the different drivers in this aspect.
> 
> The key should be part of the network device core. Almost all hardware just
> implements the Microsoft standard, and if all drivers used same key they should
> come up with the same hash.

It should, but that's not enough.  The core also needs to be responsible
for initialising the hash indirection table, determining how many RX
queues to create, and IRQ affinity hints.

> Using the same key all the time makes testing easier, but the risk is
> that it makes it easier for an attacker to create a set of addresses
> that all map to the same CPU, which would make a DoS attack work
> better.  Therefore the key should be randomly generated at boot time.

If I understand correctly, the core of the Toeplitz hash functions is
(pretending we have a wide enough type called bigint):

u32 toeplitz_hash(bigint input, bigint key, unsigned width)
{
	u32 hash = 0;
	unsigned i;

	for (i = 0; i < width; i++)
		if (input & ((bigint)1 << i))
			hash ^= key >> (1 + i);
	return hash;
}

This is hardly a cryptographic hash!  And while the key probably should
be random it should not just be random *bits*.  For example, if any 32
consecutive bits of the key are zero then 1 bit of input will have no
effect on the hash at all.

There was also a proposal a while back that we should try to make the
hash symmetric w.r.t. RX and TX addresses, so that both directions of a
flow through a router/bridge are aligned.  I think this was to be done
by repeating a 16-bit pattern across the key.  Not sure whether that's
worthwhile.
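
The two properties discussed above can be checked with a small userspace
sketch. It is written MSB-first over byte arrays, which should be equivalent
to the bigint pseudocode when the key is 32 bits longer than the input; the
function signature and the key length requirement are assumptions of this
sketch, not any driver's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over byte arrays, most significant bit first.
 * The key must be at least len + 4 bytes long so that a full 32-bit
 * window exists for every input bit.
 */
static uint32_t toeplitz_hash(const uint8_t *input, size_t len,
			      const uint8_t *key, size_t key_len)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int bit;

	assert(key_len >= len + 4);

	/* The window starts as the first 32 bits of the key. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (input[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one key bit to the left. */
			window = (window << 1) | ((key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}
```

With the key { 0xff, 0, 0, 0, 0, 0xff }, key bits 8 to 39 are all zero, so
the first bit of the second input byte never changes the result. With a key
built by repeating one 16-bit pattern, every window depends only on the bit
position modulo 16, so swapping the two 32-bit addresses and the two 16-bit
ports of a flow yields the same hash.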

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH] Fix guest memory leak and panic
From: David Miller @ 2011-10-18 19:12 UTC (permalink / raw)
  To: krkumar2; +Cc: rusty, mst, Ian.Campbell, netdev, linux-kernel, virtualization
In-Reply-To: <20111018080523.16861.55402.sendpatchset@krkumar2.in.ibm.com>

From: Krishna Kumar <krkumar2@in.ibm.com>
Date: Tue, 18 Oct 2011 13:35:23 +0530

> Commit 86ee8130 ("virtionet: convert to SKB paged frag API")
> introduced a bug in guest. During RX testing, guest runs out
> of memory within seconds, causing oom-killer; which then
> panics the system: "Kernel panic - not syncing: Out of memory
> and no killable processes...". /proc/meminfo just before the
> panic shows MemFree is a few MB's:
> 
> 	MemFree:         1928544 kB	(starts here)
> 	...
> 	...
> 	MemFree:           27488 kB
> 	MemFree:           26248 kB
> 	MemFree:           24636 kB
> 	MemFree:           22632 kB
> 	MemFree:           19580 kB
> 	MemFree:           17928 kB
> 	MemFree:           15548 kB
> 		(Panic)
> 
> The extra reference to the fragment pages causes those pages to
> not get freed in skb_release_data(). The following patch fixes
> the bug. I have not checked if any other converted driver has
> the same issue.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>

I'll wait for Ian's full audit, but Krishna please use more appropriate
subject lines in future patch submissions.

This patch is fixing a problem in the virtio_net driver, so please
mention that: "Subject: [PATCH] virtio_net: Fix guest memory leak and panic"


* Re: [PATCH 6/7] mlx4_en: Adding rxhash support
From: Eric Dumazet @ 2011-10-18 19:05 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Stephen Hemminger, Yevgeny Petrilin, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <CAEuXFEwzoC-L7Agr3Ssq9M-QN2w+t=rxjVvt9oJwQ2pQw7To8w@mail.gmail.com>

Le mardi 18 octobre 2011 à 11:49 -0700, Jesse Brandeburg a écrit :
> On Tue, Oct 18, 2011 at 8:36 AM, Stephen Hemminger
> <shemminger@vyatta.com> wrote:
> > On Tue, 18 Oct 2011 08:59:44 +0000
> > Yevgeny Petrilin <yevgenyp@mellanox.co.il> wrote:
> >> There is no gain in random values,
> >> I'll make the change to have static value for RSS function.
> >>
> >> We might consider how to ensure consistency across the different drivers in this aspect.
> >
> > The key should be part of the network device core. Almost all hardware just
> > implements the Microsoft standard, and if all drivers used same key they should
> > come up with the same hash.
> >
> > Using the same key all the time makes testing easier, but the risk is
> > that it makes it easier for an attacker to create a set of addresses
> > that all map to the same CPU, which would make a DoS attack work
> > better.  Therefore the key should be randomly generated at boot time.
> 
> Stephen, I respectfully disagree with your position here.  The risk of
> using the same key is that a malicious user could target a particular
> queue with a DoS attack, but how is that different than any single
> queue device?  NAPI protects a single queue against (a network
> interrupt based) DoS.  I do not think we should be generating a random
> key at boot time, and because of the way NAPI mitigates load, we are
> okay.  The gain from the far simpler setup (and reproducibility)
> outweighs the risk until someone can show damage due to this
> theoretical DoS attack.

Note: this policy could be left to the admin:

1) We could let the admin choose a known hash for reproducibility

   ethtool .... rss_hash xxxxxxxx:yyyyyyyy:zzzzzzzz:....

2) We could have a 'rss_perturb N' ethtool option, to randomly reshuffle
things every N seconds, for people really afraid ;)


* Re: Problem with ixgbe and TX locked on one cpu
From: Jesse Brandeburg @ 2011-10-18 18:57 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Network Development list, e1000-devel
In-Reply-To: <4E988B1E.5000606@itcare.pl>

CC: e1000-devel

2011/10/14 Paweł Staszewski <pstaszewski@itcare.pl>:
> Hello
>
> I have weird problem with ixgbe and irq affinity / rx-tx queue assignment

what application are you running, how are you using ixgbe? looks like
a router.  is something changing the skb->rx_queue entry (like
netfilter) or is there a layered device above ixgbe (bonding or ...) ?

why do your interrupts move after a long period? did you do it by
hand? we recommend disabling irqbalance and hand tuning interrupts
possibly with the set_irq_affinity.sh script.

> Statistics for my ethernet - ixgbe driver:
> ethtool -S eth4
> NIC statistics:
>     rx_packets: 5815535848808
>     tx_packets: 5811202378421
>     rx_bytes: 4791001750842200
>     tx_bytes: 4781190419358301
>     rx_pkts_nic: 5815535848827
>     tx_pkts_nic: 5811202378510
>     rx_bytes_nic: 4837563124411799
>     tx_bytes_nic: 4829987507084013
>     lsc_int: 8
>     tx_busy: 0
>     non_eop_descs: 0
>     rx_errors: 0
>     tx_errors: 0
>     rx_dropped: 0
>     tx_dropped: 0
>     multicast: 92494273
>     broadcast: 268718206
>     rx_no_buffer_count: 28829
>     collisions: 0
>     rx_over_errors: 0
>     rx_crc_errors: 0
>     rx_frame_errors: 0
>     hw_rsc_aggregated: 0
>     hw_rsc_flushed: 0
>     fdir_match: 0
>     fdir_miss: 0
>     rx_fifo_errors: 0
>     rx_missed_errors: 307051074
>     tx_aborted_errors: 0
>     tx_carrier_errors: 0
>     tx_fifo_errors: 0
>     tx_heartbeat_errors: 0
>     tx_timeout_count: 0
>     tx_restart_queue: 15926219
>     rx_long_length_errors: 298
>     rx_short_length_errors: 0
>     tx_flow_control_xon: 0
>     rx_flow_control_xon: 0
>     tx_flow_control_xoff: 0
>     rx_flow_control_xoff: 0
>     rx_csum_offload_errors: 54173917
>     alloc_rx_page_failed: 0
>     alloc_rx_buff_failed: 0
>     rx_no_dma_resources: 0
>     tx_queue_0_packets: 68694825
>     tx_queue_0_bytes: 9443750332
>     tx_queue_1_packets: 8410961
>     tx_queue_1_bytes: 2527763233
>     tx_queue_2_packets: 14411252
>     tx_queue_2_bytes: 1317132394
>     tx_queue_3_packets: 15013508147
>     tx_queue_3_bytes: 17364767277348
>     tx_queue_4_packets: 62779891
>     tx_queue_4_bytes: 63476596221
>     tx_queue_5_packets: 11176001
>     tx_queue_5_bytes: 2763600253
>     tx_queue_6_packets: 4416357
>     tx_queue_6_bytes: 611874984
>     tx_queue_7_packets: 8933405
>     tx_queue_7_bytes: 1837198524
>     tx_queue_8_packets: 13292669
>     tx_queue_8_bytes: 3241333510
>     tx_queue_9_packets: 10747236
>     tx_queue_9_bytes: 1805109931
>     tx_queue_10_packets: 5795935258380
>     tx_queue_10_bytes: 4763725304722245
>     tx_queue_11_packets: 12073934
>     tx_queue_11_bytes: 2982743045
>     tx_queue_12_packets: 10523764
>     tx_queue_12_bytes: 2637451199
>     tx_queue_13_packets: 12480552
>     tx_queue_13_bytes: 2434827407
>     tx_queue_14_packets: 7401777
>     tx_queue_14_bytes: 2413618099
>     tx_queue_15_packets: 8269270
>     tx_queue_15_bytes: 2854359576
>     rx_queue_0_packets: 361373769507
>     rx_queue_0_bytes: 298565751248279
>     rx_queue_1_packets: 369901571908
>     rx_queue_1_bytes: 303414679798160
>     rx_queue_2_packets: 362508961738
>     rx_queue_2_bytes: 299852439447157
>     rx_queue_3_packets: 363449272013
>     rx_queue_3_bytes: 299738390792515
>     rx_queue_4_packets: 361876234461
>     rx_queue_4_bytes: 297483366939732
>     rx_queue_5_packets: 361402926316
>     rx_queue_5_bytes: 297633876486533
>     rx_queue_6_packets: 362261522767
>     rx_queue_6_bytes: 298026696344647
>     rx_queue_7_packets: 361248593301
>     rx_queue_7_bytes: 296756459279986
>     rx_queue_8_packets: 361654143416
>     rx_queue_8_bytes: 298272433659520
>     rx_queue_9_packets: 362781764710
>     rx_queue_9_bytes: 298804803191595
>     rx_queue_10_packets: 361386593064
>     rx_queue_10_bytes: 297434987797644
>     rx_queue_11_packets: 369886597895
>     rx_queue_11_bytes: 302353350171712
>     rx_queue_12_packets: 361582732276
>     rx_queue_12_bytes: 298670408005971
>     rx_queue_13_packets: 365248093536
>     rx_queue_13_bytes: 302573023878287
>     rx_queue_14_packets: 366571142073
>     rx_queue_14_bytes: 302396739276514
>     rx_queue_15_packets: 362401929830
>     rx_queue_15_bytes: 299024344526029
>
> The problem is with queue 10
>     tx_queue_10_packets: 5795935258380
>     tx_queue_10_bytes: 4763725304722245
>
> as you can see most of the queue processing is used in queue 10
> Average difference is 1,854271229903958e-6  - compared to other queues
>
> and the problem is that almost all TX packet processing is on one CPU
> cat /proc/interrupts - in attached file
>
> Is this driver or kernel problem ?
>
> Kernel is: 2.6.38.2
>
> ixgbe driver is:
> ethtool -i eth4
> driver: ixgbe
> version: 3.2.9-k2
> firmware-version: 1.12-2
> bus-info: 0000:04:00.0


* Re: [PATCH 6/7] mlx4_en: Adding rxhash support
From: Jesse Brandeburg @ 2011-10-18 18:49 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Yevgeny Petrilin, Eric Dumazet, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <20111018083606.6367ff50@nehalam.linuxnetplumber.net>

On Tue, Oct 18, 2011 at 8:36 AM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Tue, 18 Oct 2011 08:59:44 +0000
> Yevgeny Petrilin <yevgenyp@mellanox.co.il> wrote:
>> There is no gain in random values,
>> I'll make the change to have static value for RSS function.
>>
>> We might consider how to ensure consistency across the different drivers in this aspect.
>
> The key should be part of the network device core. Almost all hardware just
> implements the Microsoft standard, and if all drivers used same key they should
> come up with the same hash.
>
> Using the same key all the time makes testing easier, but the risk is
> that it makes it easier for an attacker to create a set of addresses
> that all map to the same CPU, which would make a DoS attack work
> better.  Therefore the key should be randomly generated at boot time.

Stephen, I respectfully disagree with your position here.  The risk of
using the same key is that a malicious user could target a particular
queue with a DoS attack, but how is that different than any single
queue device?  NAPI protects a single queue against (a network
interrupt based) DoS.  I do not think we should be generating a random
key at boot time, and because of the way NAPI mitigates load, we are
okay.  The gain from the far simpler setup (and reproducibility)
outweighs the risk until someone can show damage due to this
theoretical DoS attack.


* Re: Bug#645589: linux-image-3.0.0-2-amd64: sky2 rx errors on 3.0, 2.6.32 works
From: Stephen Hemminger @ 2011-10-18 18:13 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: 645589, Antti Salmela, netdev
In-Reply-To: <1318909386.3340.91.camel@deadeye>

On Tue, 18 Oct 2011 04:43:06 +0100
Ben Hutchings <ben@decadent.org.uk> wrote:

> On Mon, 2011-10-17 at 10:40 +0300, Antti Salmela wrote:
> > Package: linux-2.6
> > Version: 3.0.0-5
> > Severity: normal
> > 
> > 
> > sky2 loses packets on 3.0 (-3 and -5) and 3.1-rc7, 2.6.32-38 and
> > setting interface to promiscuous works.
> > 
> > [   60.118244] sky2 0000:02:00.0: eth0: rx error, status 0xb92100 length 185
> > [   62.664370] sky2 0000:02:00.0: eth0: rx error, status 0x602100 length 96
> > [   63.370051] sky2 0000:02:00.0: eth0: rx error, status 0x422100 length 66
> > [   63.714672] sky2 0000:02:00.0: eth0: rx error, status 0x722100 length 114
> > [   64.513458] device eth0 entered promiscuous mode
> 
> It looks like this is a bug in accounting of VLAN tags, though I don't
> see what difference promiscuous mode should make.
> 
> The log messages show that status has the VLAN flag (bit 13) set and the
> length field (bits 16:28) equals the length passed into sky2_receive(),
> but that function expects the length field to be greater by VLAN_HLEN.
> 
> This device is:
> 
> [...]
> > 02:00.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller [11ab:4362] (rev 19)
> > 	Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus) [1043:8142]
> > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 	Latency: 0, Cache Line Size: 16 bytes
> > 	Interrupt: pin A routed to IRQ 43
> > 	Region 0: Memory at cdefc000 (64-bit, non-prefetchable) [size=16K]
> > 	Region 2: I/O ports at c800 [size=256]
> > 	Expansion ROM at cdec0000 [disabled] [size=128K]
> > 	Capabilities: <access denied>
> > 	Kernel driver in use: sky2
> [...]

The accounting is supposed to be:
   MAC = total length of packet (including vlan)
   DMA = bytes dma'd to buffer (does not include vlan)
Looks like the code is incorrect for the case where hardware
VLAN stripping is disabled.  What happens is that status bit
still has the VLAN flag, but DMA engine leaves the VLAN tag
in the DMA buffer so the check fails.

Proper accounting would involve more state machine mechanics
about whether VLAN tag has already been seen in current receive
status ring.

For now probably best to do something like:

--- net-next.orig/drivers/net/ethernet/marvell/sky2.c	2011-10-18 11:09:04.108683763 -0700
+++ net-next/drivers/net/ethernet/marvell/sky2.c	2011-10-18 11:09:53.661264323 -0700
@@ -2543,7 +2543,8 @@ static struct sk_buff *sky2_receive(stru
 	struct sk_buff *skb = NULL;
 	u16 count = (status & GMR_FS_LEN) >> 16;
 
-	if (status & GMR_FS_VLAN)
+	if ((dev->features & NETIF_F_HW_VLAN_RX) &&
+	    (status & GMR_FS_VLAN))
 		count -= VLAN_HLEN;	/* Account for vlan tag */
 
 	netif_printk(sky2, rx_status, KERN_DEBUG, dev,
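
The log lines quoted earlier in this thread can be replayed against the old
and the proposed check in userspace. This is a sketch under assumptions: the
DEMO_* masks only encode the layout described in this thread (VLAN flag in
bit 13, length in bits 16:28), rx_len_ok() is a hypothetical helper, and it
assumes the driver reports an rx error whenever the two lengths differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, taken from the description in this thread only. */
#define DEMO_FS_VLAN	(1u << 13)		/* VLAN flag */
#define DEMO_FS_LEN	(0x1fffu << 16)		/* length field, bits 16:28 */
#define DEMO_VLAN_HLEN	4			/* bytes in an 802.1Q tag */

/*
 * Return true when the length taken from the status word matches the
 * length reported for the buffer. hw_vlan_rx models whether hardware
 * VLAN stripping is enabled; fixed selects the proposed check.
 */
static bool rx_len_ok(uint32_t status, unsigned int length,
		      bool hw_vlan_rx, bool fixed)
{
	unsigned int count = (status & DEMO_FS_LEN) >> 16;

	assert(length <= (DEMO_FS_LEN >> 16));	/* must fit the field */

	if ((status & DEMO_FS_VLAN) && (!fixed || hw_vlan_rx))
		count -= DEMO_VLAN_HLEN;	/* account for stripped tag */

	return count == length;
}
```

For "status 0xb92100 length 185" the length field is 0xb9 = 185 and the VLAN
flag is set, so the old check compares 181 against 185 and fails. With the
extra feature test and stripping disabled, the subtraction is skipped and
the lengths agree.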


* [RESEND] [PATCH] ll_temac: Add support for ethtool
From: Ricardo Ribalda Delgado @ 2011-10-18 17:55 UTC (permalink / raw)
  To: davem, grant.likely, sfr, u.kleine-koenig, netdev, linux-kernel
  Cc: Ricardo Ribalda Delgado

This patch enables the ethtool interface. The implementation is done
using the libphy helper functions.

Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
---
 drivers/net/ll_temac_main.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
index 728fe41..91a9804 100644
--- a/drivers/net/ll_temac_main.c
+++ b/drivers/net/ll_temac_main.c
@@ -957,6 +957,32 @@ static const struct attribute_group temac_attr_group = {
 	.attrs = temac_device_attrs,
 };
 
+/* ethtool support */
+static int temac_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+{
+	struct temac_local *lp = netdev_priv(ndev);
+	return phy_ethtool_gset(lp->phy_dev, cmd);
+}
+
+static int temac_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+{
+	struct temac_local *lp = netdev_priv(ndev);
+	return phy_ethtool_sset(lp->phy_dev, cmd);
+}
+
+static int temac_nway_reset(struct net_device *ndev)
+{
+	struct temac_local *lp = netdev_priv(ndev);
+	return phy_start_aneg(lp->phy_dev);
+}
+
+static const struct ethtool_ops temac_ethtool_ops = {
+	.get_settings = temac_get_settings,
+	.set_settings = temac_set_settings,
+	.nway_reset = temac_nway_reset,
+	.get_link = ethtool_op_get_link,
+};
+
 static int __devinit temac_of_probe(struct platform_device *op)
 {
 	struct device_node *np;
@@ -978,6 +1004,7 @@ static int __devinit temac_of_probe(struct platform_device *op)
 	ndev->flags &= ~IFF_MULTICAST;  /* clear multicast */
 	ndev->features = NETIF_F_SG | NETIF_F_FRAGLIST;
 	ndev->netdev_ops = &temac_netdev_ops;
+	ndev->ethtool_ops = &temac_ethtool_ops;
 #if 0
 	ndev->features |= NETIF_F_IP_CSUM; /* Can checksum TCP/UDP over IPv4. */
 	ndev->features |= NETIF_F_HW_CSUM; /* Can checksum all the packets. */
-- 
1.7.7


* Re: [PATCH 6/7] mlx4_en: Adding rxhash support
From: Stephen Hemminger @ 2011-10-18 15:36 UTC (permalink / raw)
  To: Yevgeny Petrilin
  Cc: Eric Dumazet, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <953B660C027164448AE903364AC447D2235EEC80@MTLDAG01.mtl.com>

On Tue, 18 Oct 2011 08:59:44 +0000
Yevgeny Petrilin <yevgenyp@mellanox.co.il> wrote:

> > 
> > What is the gain using random values ?
> > 
> > Usually, we tend to have same hardware in a single machine, or we use
> > active-backup bonding mode, and an active slave flip can change rxhash
> > values with little effect, since this does not happen often.
> > 
> > I really prefer non-random values, because they allow replayable
> > configurations: for a given tcp flow, the same rxhash value is given
> > and the same cpu target in RPS. It's way easier to tune your machine for some workloads.
> > 
> 
> There is no gain in random values, 
> I'll make the change to have static value for RSS function.
> 
> We might consider how to ensure consistency across the different drivers in this aspect.

The key should be part of the network device core. Almost all hardware just
implements the Microsoft standard, and if all drivers used same key they should
come up with the same hash.

Using the same key all the time makes testing easier, but the risk is
that it makes it easier for an attacker to create a set of addresses
that all map to the same CPU, which would make a DoS attack work
better.  Therefore the key should be randomly generated at boot time.

^ permalink raw reply
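
The point about a shared key can be illustrated with a small user-space
sketch of the Toeplitz hash that RSS hardware commonly implements. This is
not code from any driver in this thread; the function name toeplitz_hash and
its signature are assumptions made for illustration, and key bits past the
end of the key are treated as zero. The hash depends only on the key and the
input bytes, so two implementations given the same key agree on the result.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Toeplitz hash (hypothetical helper, not a kernel API).
 * For every set bit of the input, XOR in the 32-bit window of the key
 * that starts at that bit position. key_len must be at least 4.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
		       const uint8_t *data, size_t len)
{
	/* The window starts as the first 32 bits of the key. */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	size_t next_bit = 32;	/* index of the next key bit to shift in */
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit along the key. */
			window <<= 1;
			if (next_bit < key_len * 8 &&
			    (key[next_bit / 8] & (0x80u >> (next_bit % 8))))
				window |= 1;
			next_bit++;
		}
	}
	return hash;
}
```

Because the function is linear over XOR, anyone who knows a fixed key can
search for inputs that collide, which is the DoS concern raised above.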

* Re: [patch] pktgen: bug when calling ndelay in x86 architectures
From: Eric Dumazet @ 2011-10-18 14:47 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Daniel Turull, David Miller, netdev, Robert Olsson,
	Voravit Tanyingyong, Jens Laas
In-Reply-To: <1318946401.23980.6.camel@deadeye>

On Tuesday 18 October 2011 at 15:00 +0100, Ben Hutchings wrote:

> AIUI, the reason for limits on delays is not that it's bad practice to
> spin for so long, but that the delay calculations may overflow or
> otherwise become inaccurate.

OK, I can understand that. Then a more appropriate patch would be:


diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 796044a..28bbf5b 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2145,9 +2145,12 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 	}
 
 	start_time = ktime_now();
-	if (remaining < 100000)
-		ndelay(remaining);	/* really small just spin */
-	else {
+	if (remaining < 100000) {
+		if (remaining >= 10000)
+			udelay(remaining/1000);
+		else
+			ndelay(remaining);
+	} else {
 		/* see do_nanosleep */
 		hrtimer_init_sleeper(&t, current);
 		do {

^ permalink raw reply related
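
The three-way choice in the patch above can be sketched outside the kernel
as a pure decision function. The names plan_delay, delay_plan and delay_kind
are hypothetical; only the 100000 ns and 10000 ns thresholds and the
division by 1000 come from the patch. Note that the integer division
truncates, so a 10999 ns wait becomes a 10 us udelay.

```c
#include <stdint.h>

/* Hypothetical names, used only to illustrate the branch structure. */
enum delay_kind { DELAY_NDELAY, DELAY_UDELAY, DELAY_SLEEP };

struct delay_plan {
	enum delay_kind kind;
	int64_t arg;	/* ns for NDELAY and SLEEP, us for UDELAY */
};

struct delay_plan plan_delay(int64_t remaining_ns)
{
	struct delay_plan p;

	if (remaining_ns >= 100000) {
		/* Long wait: the patch falls through to the hrtimer sleep. */
		p.kind = DELAY_SLEEP;
		p.arg = remaining_ns;
	} else if (remaining_ns >= 10000) {
		/* Medium wait: spin in microseconds to stay within ndelay limits. */
		p.kind = DELAY_UDELAY;
		p.arg = remaining_ns / 1000;
	} else {
		/* Really small: spin in nanoseconds. */
		p.kind = DELAY_NDELAY;
		p.arg = remaining_ns;
	}
	return p;
}
```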

* Re: [patch] pktgen: bug when calling ndelay in x86 architectures
From: Ben Hutchings @ 2011-10-18 14:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Turull, David Miller, netdev, Robert Olsson,
	Voravit Tanyingyong, Jens Laas
In-Reply-To: <1318939007.2657.57.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, 2011-10-18 at 13:56 +0200, Eric Dumazet wrote:
> On Tuesday 18 October 2011 at 13:08 +0200, Daniel Turull wrote:
> > The value selected to delay the transmission in pktgen with the ndelay function should be lower.
> > In Linux/arch/x86/include/asm/delay.h and Linux/arch/sh/include/asm/delay.h
> > the maximal expected value for a constant is 20000 ns.
> > 
> > Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
> > ---
> > diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> > index 796044a..e17bd41 100644
> > --- a/net/core/pktgen.c
> > +++ b/net/core/pktgen.c
> > @@ -2145,7 +2145,7 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
> >  	}
> >  
> >  	start_time = ktime_now();
> > -	if (remaining < 100000)
> > +	if (remaining < 20000)
> >  		ndelay(remaining);	/* really small just spin */
> >  	else {
> >  		/* see do_nanosleep */
> 
> But 'remaining' is not a constant.
> 
> If we want a rate of exactly 40,000 packets per second (25 us between
> packets), your patch makes this not quite possible without
> CONFIG_HIGH_RES_TIMERS, and probably adds high jitter because of scheduler
> effects.
> 
> pktgen is kind of special, we _want_ a cpu for our exclusive use.

AIUI, the reason for limits on delays is not that it's bad practice to
spin for so long, but that the delay calculations may overflow or
otherwise become inaccurate.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH 6/6 V2] mlx4_en: Updating driver version
From: Yevgeny Petrilin @ 2011-10-18 11:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Driver version updated to 1.5.4.2

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index a6b5cf6..fca6616 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -51,8 +51,8 @@
 #include "en_port.h"
 
 #define DRV_NAME	"mlx4_en"
-#define DRV_VERSION	"1.5.4.1"
-#define DRV_RELDATE	"March 2011"
+#define DRV_VERSION	"1.5.4.2"
+#define DRV_RELDATE	"October 2011"
 
 #define MLX4_EN_MSG_LEVEL	(NETIF_MSG_LINK | NETIF_MSG_IFDOWN)
 
-- 
1.7.7

^ permalink raw reply related

* [PATCH 5/6 V2] mlx4_en: Adding rxhash support
From: Yevgeny Petrilin @ 2011-10-18 11:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Move to the Toeplitz hash function for the RSS calculation.
Report rxhash in the skb.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |   12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c4c4be4..78d776b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1084,7 +1084,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 
 	dev->vlan_features = dev->hw_features;
 
-	dev->hw_features |= NETIF_F_RXCSUM;
+	dev->hw_features |= NETIF_F_RXCSUM | NETIF_F_RXHASH;
 	dev->features = dev->hw_features | NETIF_F_HIGHDMA |
 			NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
 			NETIF_F_HW_VLAN_FILTER;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index eed2a0a..0e368af 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -618,6 +618,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						__vlan_hwaccel_put_tag(gro_skb, vid);
 					}
 
+					if (dev->features & NETIF_F_RXHASH)
+						gro_skb->rxhash = be32_to_cpu(cqe->immed_rss_invalid);
+
 					skb_record_rx_queue(gro_skb, cq->ring);
 					napi_gro_frags(&cq->napi);
 
@@ -651,6 +654,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 		skb->protocol = eth_type_trans(skb, dev);
 		skb_record_rx_queue(skb, cq->ring);
 
+		if (dev->features & NETIF_F_RXHASH)
+			skb->rxhash = be32_to_cpu(cqe->immed_rss_invalid);
+
 		if (be32_to_cpu(cqe->vlan_my_qpn) &
 		    MLX4_CQE_VLAN_PRESENT_MASK)
 			__vlan_hwaccel_put_tag(skb, be16_to_cpu(cqe->sl_vid));
@@ -834,6 +840,9 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 	int i, qpn;
 	int err = 0;
 	int good_qps = 0;
+	static const u32 rsskey[10] = { 0xD181C62C, 0xF7F4DB5B, 0x1983A2FC,
+				0x943E1ADB, 0xD9389E6B, 0xD1039C2C, 0xA74499AD,
+				0x593D56D9, 0xF3253C06, 0x2ADC1FFC};
 
 	en_dbg(DRV, priv, "Configuring rss steering\n");
 	err = mlx4_qp_reserve_range(mdev->dev, priv->rx_ring_num,
@@ -871,6 +880,9 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 					    (rss_map->base_qpn));
 	rss_context->default_qpn = cpu_to_be32(rss_map->base_qpn);
 	rss_context->flags = rss_mask;
+	rss_context->hash_fn = 1;
+	for (i = 0; i < 10; i++)
+		rss_context->rss_key[i] = rsskey[i];
 
 	if (priv->mdev->profile.udp_rss)
 		rss_context->base_qpn_udp = rss_context->default_qpn;
-- 
1.7.7

^ permalink raw reply related

* [PATCH 4/6 V2] mlx4_en: Recording rx queue for gro packets
From: Yevgeny Petrilin @ 2011-10-18 11:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 1231b21..eed2a0a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -618,6 +618,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						__vlan_hwaccel_put_tag(gro_skb, vid);
 					}
 
+					skb_record_rx_queue(gro_skb, cq->ring);
 					napi_gro_frags(&cq->napi);
 
 					goto next;
-- 
1.7.7

^ permalink raw reply related

* [PATCH 3/6 V2] mlx4_en: Checksum counters per ring
From: Yevgeny Petrilin @ 2011-10-18 11:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Stop updating the common counters from the data path.
The checksum counters are now per ring and are summed when statistics are collected.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_port.c |    6 ++++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    6 +++---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |    2 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    3 +++
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 9d27555..03c84cd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -214,15 +214,21 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 
 	stats->rx_packets = 0;
 	stats->rx_bytes = 0;
+	priv->port_stats.rx_chksum_good = 0;
+	priv->port_stats.rx_chksum_none = 0;
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		stats->rx_packets += priv->rx_ring[i].packets;
 		stats->rx_bytes += priv->rx_ring[i].bytes;
+		priv->port_stats.rx_chksum_good += priv->rx_ring[i].csum_ok;
+		priv->port_stats.rx_chksum_none += priv->rx_ring[i].csum_none;
 	}
 	stats->tx_packets = 0;
 	stats->tx_bytes = 0;
+	priv->port_stats.tx_chksum_offload = 0;
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		stats->tx_packets += priv->tx_ring[i].packets;
 		stats->tx_bytes += priv->tx_ring[i].bytes;
+		priv->port_stats.tx_chksum_offload += priv->tx_ring[i].tx_csum;
 	}
 
 	stats->rx_errors = be64_to_cpu(mlx4_en_stats->PCS) +
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index fbf1dcf..1231b21 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -587,7 +587,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 		if (likely(dev->features & NETIF_F_RXCSUM)) {
 			if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
 			    (cqe->checksum == cpu_to_be16(0xffff))) {
-				priv->port_stats.rx_chksum_good++;
+				ring->csum_ok++;
 				/* This packet is eligible for LRO if it is:
 				 * - DIX Ethernet (type interpretation)
 				 * - TCP/IP (v4)
@@ -627,11 +627,11 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 				ip_summed = CHECKSUM_UNNECESSARY;
 			} else {
 				ip_summed = CHECKSUM_NONE;
-				priv->port_stats.rx_chksum_none++;
+				ring->csum_none++;
 			}
 		} else {
 			ip_summed = CHECKSUM_NONE;
-			priv->port_stats.rx_chksum_none++;
+			ring->csum_none++;
 		}
 
 		skb = mlx4_en_rx_skb(priv, rx_desc, skb_frags,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6e03de0..f199460 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -695,7 +695,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
 		tx_desc->ctrl.srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM |
 							 MLX4_WQE_CTRL_TCP_UDP_CSUM);
-		priv->port_stats.tx_chksum_offload++;
+		ring->tx_csum++;
 	}
 
 	if (unlikely(priv->validate_loopback)) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 3b753f7..a6b5cf6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -249,6 +249,7 @@ struct mlx4_en_tx_ring {
 	struct mlx4_srq dummy;
 	unsigned long bytes;
 	unsigned long packets;
+	unsigned long tx_csum;
 	spinlock_t comp_lock;
 	struct mlx4_bf bf;
 	bool bf_enabled;
@@ -275,6 +276,8 @@ struct mlx4_en_rx_ring {
 	void *rx_info;
 	unsigned long bytes;
 	unsigned long packets;
+	unsigned long csum_ok;
+	unsigned long csum_none;
 };
 
 
-- 
1.7.7

^ permalink raw reply related
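
The per-ring scheme in the patch above can be reduced to a small user-space
sketch. The struct and function names here (rx_ring_stats, port_stats,
sum_rx_csum) are simplified stand-ins, not the driver's real types: the data
path only touches its own ring's counters, and the port totals are reset and
recomputed whenever statistics are collected.

```c
#include <stddef.h>

/* Simplified stand-ins for the per-ring and per-port counters. */
struct rx_ring_stats {
	unsigned long csum_ok;
	unsigned long csum_none;
};

struct port_stats {
	unsigned long rx_chksum_good;
	unsigned long rx_chksum_none;
};

/* Recompute the port totals from scratch, as the stats path does. */
void sum_rx_csum(const struct rx_ring_stats *rings, size_t n,
		 struct port_stats *out)
{
	out->rx_chksum_good = 0;
	out->rx_chksum_none = 0;
	for (size_t i = 0; i < n; i++) {
		out->rx_chksum_good += rings[i].csum_ok;
		out->rx_chksum_none += rings[i].csum_none;
	}
}
```

Resetting the totals before the loop keeps repeated collection idempotent,
and the hot path no longer writes to a counter shared by all rings.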

* [PATCH 2/6 V2] mlx4_en: Controlling FCS header removal
From: Yevgeny Petrilin @ 2011-10-18 11:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Cancel FCS removal, where the FW allows it, for better alignment
of incoming data.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |    4 ++++
 drivers/net/ethernet/mellanox/mlx4/fw.c    |    1 +
 include/linux/mlx4/device.h                |    1 +
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 37cc9e5..fbf1dcf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -806,6 +806,10 @@ static int mlx4_en_config_rss_qp(struct mlx4_en_priv *priv, int qpn,
 				qpn, ring->cqn, context);
 	context->db_rec_addr = cpu_to_be64(ring->wqres.db.dma);
 
+	/* Cancel FCS removal if FW allows */
+	if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_FCS_KEEP)
+		context->param3 |= cpu_to_be32(1 << 29);
+
 	err = mlx4_qp_to_ready(mdev->dev, &ring->wqres.mtt, context, qp, state);
 	if (err) {
 		mlx4_qp_remove(mdev->dev, qp);
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 7eb8ba8..ed452dd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -101,6 +101,7 @@ static void dump_dev_cap_flags(struct mlx4_dev *dev, u64 flags)
 		[25] = "Router support",
 		[30] = "IBoE support",
 		[32] = "Unicast loopback support",
+		[34] = "FCS header control",
 		[38] = "Wake On LAN support",
 		[40] = "UDP RSS support",
 		[41] = "Unicast VEP steering support",
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 53ef894..2366f94 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -75,6 +75,7 @@ enum {
 	MLX4_DEV_CAP_FLAG_UD_MCAST	= 1LL << 21,
 	MLX4_DEV_CAP_FLAG_IBOE		= 1LL << 30,
 	MLX4_DEV_CAP_FLAG_UC_LOOPBACK	= 1LL << 32,
+	MLX4_DEV_CAP_FLAG_FCS_KEEP	= 1LL << 34,
 	MLX4_DEV_CAP_FLAG_WOL		= 1LL << 38,
 	MLX4_DEV_CAP_FLAG_UDP_RSS	= 1LL << 40,
 	MLX4_DEV_CAP_FLAG_VEP_UC_STEER	= 1LL << 41,
-- 
1.7.7

^ permalink raw reply related
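
As a rough user-space sketch of the check in the patch above: the capability
flag lives at bit 34, so it must be tested against a 64-bit value, while the
bit that is set sits in a big-endian 32-bit field. The macro names and
apply_fcs_keep are hypothetical, and htonl stands in for the kernel's
cpu_to_be32.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical names; the bit positions 34 and 29 are taken from the patch. */
#define CAP_FLAG_FCS_KEEP	(1ULL << 34)
#define QP_PARAM3_KEEP_FCS	(1u << 29)

/*
 * Return the big-endian param3 word with the keep-FCS bit set when the
 * device capability flags advertise support, and unchanged otherwise.
 */
uint32_t apply_fcs_keep(uint64_t caps_flags, uint32_t param3_be)
{
	if (caps_flags & CAP_FLAG_FCS_KEEP)
		param3_be |= htonl(QP_PARAM3_KEEP_FCS);
	return param3_be;
}
```

Truncating the flags to 32 bits before the test would lose bit 34, which is
why the enum in device.h uses 1LL shifts.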

* [PATCH 1/6 V2] mlx4: Fix vlan table overflow
From: Yevgeny Petrilin @ 2011-10-18 11:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Prevent overflow when trying to register more VLANs than the VLAN table in
HW is configured to hold.
Take into account that the first 2 entries are reserved.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/port.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 609e0ec..163a314 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -65,7 +65,7 @@ void mlx4_init_vlan_table(struct mlx4_dev *dev, struct mlx4_vlan_table *table)
 		table->entries[i] = 0;
 		table->refs[i]	 = 0;
 	}
-	table->max   = 1 << dev->caps.log_num_vlans;
+	table->max   = (1 << dev->caps.log_num_vlans) - MLX4_VLAN_REGULAR;
 	table->total = 0;
 }
 
@@ -354,6 +354,13 @@ int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index)
 	int free = -1;
 
 	mutex_lock(&table->mutex);
+
+	if (table->total == table->max) {
+		/* No free vlan entries */
+		err = -ENOSPC;
+		goto out;
+	}
+
 	for (i = MLX4_VLAN_REGULAR; i < MLX4_MAX_VLAN_NUM; i++) {
 		if (free < 0 && (table->refs[i] == 0)) {
 			free = i;
@@ -375,12 +382,6 @@ int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index)
 		goto out;
 	}
 
-	if (table->total == table->max) {
-		/* No free vlan entries */
-		err = -ENOSPC;
-		goto out;
-	}
-
 	/* Register new MAC */
 	table->refs[free] = 1;
 	table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID);
-- 
1.7.7

^ permalink raw reply related
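
The accounting in the patch above can be sketched in user space as follows.
This is a simplified model, not the driver code: the names vlan_table_init
and vlan_register, the fixed VLAN_SLOTS size and the absence of locking are
assumptions for illustration. The usable capacity excludes the reserved
entries, and, as in the patch, the capacity check runs before the search.

```c
#include <errno.h>
#include <stdint.h>

#define VLAN_RESERVED	2	/* first entries are reserved, as in the patch */
#define VLAN_SLOTS	128	/* hypothetical size of the software table */

struct vlan_table {
	uint16_t entries[VLAN_SLOTS];
	int refs[VLAN_SLOTS];
	int max;	/* usable entries, excluding the reserved ones */
	int total;	/* entries currently in use */
};

void vlan_table_init(struct vlan_table *t, int log_num_vlans)
{
	for (int i = 0; i < VLAN_SLOTS; i++) {
		t->entries[i] = 0;
		t->refs[i] = 0;
	}
	t->max = (1 << log_num_vlans) - VLAN_RESERVED;
	t->total = 0;
}

/* Returns 0 and stores the slot in *index, or -ENOSPC when the table is full. */
int vlan_register(struct vlan_table *t, uint16_t vlan, int *index)
{
	int free_slot = -1;

	if (t->total == t->max)
		return -ENOSPC;	/* no free vlan entries */

	for (int i = VLAN_RESERVED; i < VLAN_SLOTS; i++) {
		if (t->refs[i] == 0) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (t->entries[i] == vlan) {
			/* Already registered: take another reference. */
			t->refs[i]++;
			*index = i;
			return 0;
		}
	}
	if (free_slot < 0)
		return -ENOSPC;

	t->refs[free_slot] = 1;
	t->entries[free_slot] = vlan;
	t->total++;
	*index = free_slot;
	return 0;
}
```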
