Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 15/17] sfc: Add Siena PHY BIST and cable diagnostic support
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483022.2549.31.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:22 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 14/17] sfc: Clean up efx_nic::irq_zero_count
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483000.2549.30.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:00 +0100

> There is no need for this to be unsigned long; make it unsigned int.
> It does need a line in kernel-doc, so add that.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 13/17] sfc: Add necessary parentheses to macro definitions in net_driver.h
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482990.2549.29.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:50 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 12/17] sfc: Break NAPI processing after one ring-full of TX completions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482982.2549.28.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:42 +0100

> Currently TX completions do not count towards the NAPI budget.  This
> means a continuous stream of TX completions can cause the polling
> function to loop indefinitely with scheduling disabled.  To avoid
> this, follow the common practice of reporting the budget spent after
> processing one ring-full of TX completions.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 11/17] sfc: Set PERIODIC_NOEVENT flag for MC_CMD_MAC_STATS
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482972.2549.27.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:32 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> When set, an event is not sent whenever periodic MAC statistics are
> raised.  This avoids unnecessary wake-ups.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 10/17] sfc: Update MCDI protocol definitions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482954.2549.26.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:14 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 09/17] sfc: Enable IPv6 RSS using random key for Toeplitz hash
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482942.2549.25.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:02 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 08/17] sfc: Read MEM_STAT for SRM_PERR as well as MEM_PERR errors
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482932.2549.24.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:52 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Parity errors in different blocks of SRAM may set one of two different
> interrupt flags.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 07/17] sfc: Log specific message for failure of NVRAM self-test
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482916.2549.23.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:36 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 06/17] sfc: Extend the legacy interrupt workarounds
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482907.2549.22.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:27 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Siena has two problems with legacy interrupts:
>   1. There is no synchronisation between the ISR read completion,
>      and the interrupt deassert message.
>   2. A downstream read at the "wrong" moment can return 0, and
>      suppress generating the next interrupt.
> 
> Falcon should suffer from both of these, and it appears it does.
> Enable EFX_WORKAROUND_15783 on Falcon as well.
> 
> Also, when we see queues == 0, ensure we always schedule or rearm
> every event queue.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 05/17] sfc: Reconfigure the XAUI serdes after an EM reset
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482890.2549.21.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:10 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Fix a regression introduced in d3245b28ef2a45ec4e115062a38100bd06229289
> "sfc: Refactor link configuration".
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 04/17] sfc: Stop masking out XGMII faults over reconfigures
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482874.2549.20.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:27:54 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> The aim of this code was to avoid a spurious XGMII fault over a MAC
> reconfigure. It's less relevant now that the PHY reconfigure isn't
> called from the MAC reconfigure.
> 
> After applying this patch, our link stress test passed 48 hours of
> testing without ever resetting the PHY.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 03/17] sfc: Handle serious errors in exactly one interrupt handler
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482856.2549.19.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:27:36 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> 'Fatal' errors set an interrupt flag associated with a specific event
> queue; only read the syndrome vector if we see that queue's flag set
> (legacy interrupts) or in the interrupt handler for that queue (MSI).
> 
> Do not ignore an interrupt if the fatal error flag is set but specific
> error flags are all zero.  Even if we don't schedule a reset, we must
> respect the queue mask and rearm the appropriate event queues.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 02/17] sfc: Consistently report short MCDI responses as EIO
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482834.2549.18.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:27:14 +0100

> In some cases failing functions were returning 0 which is obviously wrong.
> In other cases they were returning inappropriate error codes.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 01/17] sfc: Ignore parity errors in the other port's SRAM
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482722.2549.17.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:25:22 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Siena has a separate SRAM bank for each port.  On single-port boards
> these can be merged together, so each port has an interrupt flag for
> parity errors in the other port's SRAM.  Currently we do not enable
> such merging and should mask this interrupt source.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Eric Dumazet @ 2010-04-28 19:38 UTC (permalink / raw)
  To: paulmck
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <20100428175426.GK2540@linux.vnet.ibm.com>

Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > This one occurred during the wakeup from suspend to RAM.
> > 
> > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > [  984.724700] ---------------------------------------------------
> > [  984.724703] include/linux/fdtable.h:88 invoked
> > rcu_dereference_check() without protection!
> > [  984.724706]
> > [  984.724707] other info that might help us debug this:
> > [  984.724708]
> > [  984.724711]
> > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > [  984.724714] no locks held by dbus-daemon/4680.
> > [  984.724717]
> > [  984.724717] stack backtrace:
> > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > [  984.724724] Call Trace:
> > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> 
> Hmmm...  I am not convinced that this is a false positive.  Couldn't
> there be a multi-threaded process where one thread is invoking poll()
> on a UNIX socket just as another thread is calling close() on it?
> 
> The current fcheck_files() logic requires that the caller either (1) be in
> an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> in a files_struct with ->count equal to 1 (initialization or cleanup).
> 
> So I don't feel comfortable just slapping an RCU read-side critical
> section around this one, at least not unless someone who understands
> the locking says that doing so is OK.
> 
> 		

Its a single threaded program.

So fget_light() calls fcheck_files(files, fd); without rcu lock,
but some /proc/pid/fd/... user temporarly raised files->count just
before we perform the condition check.

^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v3)
From: Neil Horman @ 2010-04-28 19:37 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428174711.GC4818@hmsreliant.think-freely.org>

Ok, version 3.

Change notes:

1) As per our discussion, uses a fixed size alloation stragetgy, allocating only
a mtu sized chunk (up to 1500 bytes).

2) add sctp_init_cause_fixed and sctp_addto_chunk_fixed helpers that, instead of
oopsing on insufficient data in the skb, instead simple fail to preform the
copy, and return an appropriate error code.

Summary:
	Recently, it was reported to me that the kernel could oops in the
following way:

<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU:    0
<5> EIP:    0060:[<c02bff27>]    Not tainted VLI
<5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
<5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
<5> ds: 007b   es: 007b   ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d 
<5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490 
<5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004 
<5> Call Trace:
<5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5>  [<c01555a4>] cache_grow+0x140/0x233
<5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5>  [<c02d005e>] nf_iterate+0x40/0x81
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
<5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e103e>] ip_rcv+0x334/0x3b4
<5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5>  [<c02c67a4>] process_backlog+0x6c/0xd9
<5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
<5>  [<c012a7b1>] __do_softirq+0x35/0x79
<5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5>  [<c01094de>] do_softirq+0x46/0x4d

Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.

The problem is in sctp_process_unk_param:
if (NULL == *errp)
	*errp = sctp_make_op_error_space(asoc, chunk,
					 ntohs(chunk->chunk_hdr->length));

	if (*errp) {
		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
				 WORD_ROUND(ntohs(param.p->length)));
		sctp_addto_chunk(*errp,
			WORD_ROUND(ntohs(param.p->length)),
				  param.v);

When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

This fix solves the problem by allowing our implementation to only report a
fixed number of errors.  When we encounter an error in parameter processing we
allocate a chunk that is min(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT), limiting
our error reporting to a single mtu sized chunk.  Parameter errors that grow
beyond that value are discarded.

I've tested this fix using the reproducer that I was provided (send an init chunk to a
listening sctp socket with a few dozen parameters of type 0xc001 and length 4),
and it solves the problem nicely.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/sctp/structs.h |    1 
 net/sctp/sm_make_chunk.c   |   70 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 62 insertions(+), 9 deletions(-)


diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ff30177..597f8e2 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
 			  struct iovec *data);
 void sctp_chunk_free(struct sctp_chunk *);
 void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
+void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
 struct sctp_chunk *sctp_chunkify(struct sk_buff *,
 				 const struct sctp_association *,
 				 struct sock *);
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index f592163..9623112 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
 	cpu_to_be16(sizeof(struct sctp_paramhdr)),
 };
 
-/* A helper to initialize to initialize an op error inside a
+/* A helper to initialize an op error inside a
  * provided chunk, as most cause codes will be embedded inside an
  * abort chunk.
  */
@@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
 	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
 }
 
+/* A helper to initialize an op error inside a
+ * provided chunk, as most cause codes will be embedded inside an
+ * abort chunk.  Differs from sctp_init_cause in that it won't oops
+ * if there isn't enough space in the op error chunk
+ */
+int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
+		      size_t paylen)
+{
+	sctp_errhdr_t err;
+	__u16 len;
+
+	/* Cause code constants are now defined in network order.  */
+	err.cause = cause_code;
+	len = sizeof(sctp_errhdr_t) + paylen;
+	err.length  = htons(len);
+
+	if (skb_tailroom(chunk->skb) >  len)
+		return -ENOSPC;
+	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
+						     sizeof(sctp_errhdr_t),
+						     &err);
+	return 0;
+}
 /* 3.3.2 Initiation (INIT) (1)
  *
  * This chunk is used to initiate a SCTP association between two
@@ -1131,6 +1154,24 @@ nodata:
 	return retval;
 }
 
+/* Create an Operation Error chunk of a fixed size,
+ * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
+ * This is a helper function to allocate an error chunk for
+ * for those invalid parameter codes in which we may not want
+ * to report all the errors, if the incomming chunk is large
+ */
+static inline struct sctp_chunk *sctp_make_op_error_fixed(
+	const struct sctp_association *asoc,
+	const struct sctp_chunk *chunk)
+{
+	size_t size = asoc ? asoc->pathmtu : 0;
+
+	if (size > SCTP_DEFAULT_MAXSEGMENT)
+		size = SCTP_DEFAULT_MAXSEGMENT;
+
+	return sctp_make_op_error_space(asoc, chunk, size);
+}
+
 /* Create an Operation Error chunk.  */
 struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
 				 const struct sctp_chunk *chunk,
@@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
 	return target;
 }
 
+/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
+ * space in the chunk
+ */
+void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
+			     int len, const void *data)
+{
+	if (skb_tailroom(chunk->skb) > len)
+		return sctp_addto_chunk(chunk, len, data);
+	else
+		return NULL;
+}
+
 /* Append bytes from user space to the end of a chunk.  Will panic if
  * chunk is not big enough.
  * Returns a kernel err value.
@@ -1793,9 +1846,9 @@ static int sctp_process_missing_param(const struct sctp_association *asoc,
 	if (*errp) {
 		report.num_missing = htonl(1);
 		report.type = paramtype;
-		sctp_init_cause(*errp, SCTP_ERROR_MISS_PARAM,
-				sizeof(report));
-		sctp_addto_chunk(*errp, sizeof(report), &report);
+		sctp_init_cause_fixed(*errp, SCTP_ERROR_MISS_PARAM,
+				      sizeof(report));
+		sctp_addto_chunk_fixed(*errp, sizeof(report), &report);
 	}
 
 	/* Stop processing this chunk. */
@@ -1813,7 +1866,7 @@ static int sctp_process_inv_mandatory(const struct sctp_association *asoc,
 		*errp = sctp_make_op_error_space(asoc, chunk, 0);
 
 	if (*errp)
-		sctp_init_cause(*errp, SCTP_ERROR_INV_PARAM, 0);
+		sctp_init_cause_fixed(*errp, SCTP_ERROR_INV_PARAM, 0);
 
 	/* Stop processing this chunk. */
 	return 0;
@@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
 		 * returning multiple unknown parameters.
 		 */
 		if (NULL == *errp)
-			*errp = sctp_make_op_error_space(asoc, chunk,
-					ntohs(chunk->chunk_hdr->length));
+			*errp = sctp_make_op_error_fixed(asoc, chunk);
 
 		if (*errp) {
-			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
+			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
 					WORD_ROUND(ntohs(param.p->length)));
-			sctp_addto_chunk(*errp,
+			sctp_addto_chunk_fixed(*errp,
 					WORD_ROUND(ntohs(param.p->length)),
 					param.v);
 		} else {

^ permalink raw reply related

* Re: [net-next-2.6 PATCH 1/2] Add netdev port-profile support (take III, was iovnl)
From: Arnd Bergmann @ 2010-04-28 19:37 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw, Jens Osterkamp
In-Reply-To: <C7FDD272.2C86D%scofeldm@cisco.com>

On Wednesday 28 April 2010, Scott Feldman wrote:
> On 4/28/10 6:13 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> > What I could imagine to unify this is something like
> > 
> >    ip port_profile set DEVICE [ { pre_associate | pre_associate_rr } ]
> >                               { name PORT-PROFILE | vsi MGR:VTID:VER }
> >                                     mac LLADDR
> >    [ vlan VID ]
> >                                     [ host_uuid HOST_UUID ]
> >                                     [ client_uuid CLIENT_UUID ]
> >                                     [ client_name CLIENT_NAME ]
> >    ip port_profile del DEVICE [ mac LLADDR [ vlan VID ] ]
> >    ip port_profile show DEVICE
> 
> Arnd, can someone test this with VDP today?  I don't have access to that
> equipment so it's difficult for me to fully test the unified patch.  I can
> test the previous patch with enic easily because I have access to production
> systems.  I'd like to make sure someone can test this with VDP before I
> respin the patch one more time.

Sorry, but I don't have access to production hardware at this time. Jens wants to
implement both sides so we can test this in simulation mode, but it's not done
yet.

	Arnd

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2010-04-28 19:37 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Mainly I'm pushing this to revert the TCP bind() bug fix, it's
causing too much fallout to try and sort out this late in the
-rc series.

Bunch of OOPS'ers for SCTP, Intel chip fixes (lack of PHY powerdown in
ixgbe lets crap into chip FIFO wedging the chip completely, etc.), a
regression fix for tg3 on systems that do run out of MSI vectors, bnx2
driver fixes (lost MSI-X vectors, 'scheduling while atomic' fix) and
hopefully r8169 MAC address programming is solid now.

Please pull, thanks a lot!

The following changes since commit b91ce4d14a21fc04d165be30319541e0f9204f15:
  Linus Torvalds (1):
        Merge git://git.kernel.org/.../davem/net-2.6

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Andre Detsch (2):
      tg3: Fix INTx fallback when MSI fails
      cxgb3: Wait longer for control packets on initialization

Andreas Hartmann (1):
      drivers/usb/net/kaweth.c: add device "Allied Telesyn AT-USB10 USB Ethernet Adapter"

Andy Fleming (1):
      gianfar: Wait for both RX and TX to stop

Ben Hutchings (3):
      sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
      sfc: Always close net device at the end of a disabling reset
      sfc: Change falcon_probe_board() to fail for unsupported boards

Bruce Allan (1):
      e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata

Dan Carpenter (2):
      ipheth: potential null dereferences on error path
      bluetooth: handle l2cap_create_connless_pdu() errors

David S. Miller (1):
      Revert "tcp: bind() fix when many ports are bound"

Elina Pasheva (1):
      net/usb: add sierra_net.c driver

Ken Kawasaki (1):
      smc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()

Michael Chan (3):
      bnx2: Fix lost MSI-X problem on 5709 NICs.
      bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
      bnx2: Update version to 2.0.9.

Peter Waskiewicz (1):
      ixgbe: Power down PHY during driver resets

Stefan Schmidt (1):
      ieee802154: Fix oops during ieee802154_sock_ioctl

Torgny Johansson (1):
      cdc_ether: fix autosuspend for mbm devices

Vlad Yasevich (3):
      sctp: fix potential reference of a freed pointer
      sctp: per_cpu variables should be in bh_disabled section
      sctp: Fix oops when sending queued ASCONF chunks

Wei Yongjun (2):
      sctp: avoid irq lock inversion while call sk->sk_data_ready()
      sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set

YOSHIFUJI Hideaki / 吉藤英明 (1):
      bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.

françois romieu (2):
      r8169: failure to enable mwi should not be fatal
      r8169: more broken register writes workaround

 drivers/net/bnx2.c               |   48 +-
 drivers/net/cxgb3/cxgb3_main.c   |    2 +-
 drivers/net/e1000e/82571.c       |   20 +-
 drivers/net/e1000e/e1000.h       |    5 +-
 drivers/net/e1000e/netdev.c      |   70 ++-
 drivers/net/gianfar.c            |    6 +-
 drivers/net/ixgbe/ixgbe_82599.c  |   62 ++-
 drivers/net/ixgbe/ixgbe_main.c   |   22 +-
 drivers/net/ixgbe/ixgbe_type.h   |    2 +
 drivers/net/pcmcia/smc91c92_cs.c |   31 +-
 drivers/net/r8169.c              |   32 +-
 drivers/net/sfc/efx.c            |    4 +-
 drivers/net/sfc/falcon.c         |    4 +-
 drivers/net/sfc/falcon_boards.c  |   13 +-
 drivers/net/sfc/nic.h            |    2 +-
 drivers/net/sfc/siena.c          |   13 +-
 drivers/net/tg3.c                |    1 +
 drivers/net/usb/Kconfig          |   10 +
 drivers/net/usb/Makefile         |    1 +
 drivers/net/usb/cdc_ether.c      |    1 +
 drivers/net/usb/ipheth.c         |   15 +-
 drivers/net/usb/kaweth.c         |    1 +
 drivers/net/usb/sierra_net.c     | 1001 ++++++++++++++++++++++++++++++++++++++
 include/net/sctp/command.h       |    1 +
 include/net/sctp/sctp.h          |    1 +
 net/bluetooth/l2cap.c            |    5 +-
 net/bridge/br_multicast.c        |    6 +-
 net/ieee802154/af_ieee802154.c   |    3 +
 net/ipv4/inet_connection_sock.c  |   16 +-
 net/ipv6/inet6_connection_sock.c |   15 +-
 net/sctp/associola.c             |    6 +-
 net/sctp/endpointola.c           |    1 +
 net/sctp/sm_make_chunk.c         |   32 +-
 net/sctp/sm_sideeffect.c         |   26 +
 net/sctp/sm_statefuns.c          |    8 +-
 net/sctp/socket.c                |   14 +-
 36 files changed, 1312 insertions(+), 188 deletions(-)
 create mode 100644 drivers/net/usb/sierra_net.c

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/2] Add netdev port-profile support (take III, was iovnl)
From: Arnd Bergmann @ 2010-04-28 19:33 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw
In-Reply-To: <C7FDC3B8.2C7EB%scofeldm@cisco.com>

On Wednesday 28 April 2010, Scott Feldman wrote:
> On 4/28/10 6:13 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > 
> > We will need a few more options to cover draft VDP in addition to the protocol
> > your NIC is using. I still think it's possible to use the same interface
> > for both, but the differences are obviously showing.
> > 
> > The missing bits that I can see so far are:
> > 
> > - You only have 'get' and 'set'. We will also need a 'unset' or 'delete'
> >   option in order to get rid of a port profile association.
> 
> That's there in my patch.  If you don't specify anything after port-profile
> keyword, it's an unset.  See extra "[" and "]" above.  Will that work for
> you?

Well, this won't work if you want to have multiple slave interfaces connected
to a single master and record the port profile in the master.

> > - VDP has three different ways to 'set' a port profile: 'associate',
> >   'pre-associate with resource reservation' and 'pre-associate without
> >   resource reservation'. This could become an extra option flag.
> 
> Ok, let's add an option flag bit field.  I'm not sure how that looks from
> the iproute2 cmd line.  Would you take a stab at defining these?
> 
> > - Instead of a port profile name, VDP specifies a tuple like
> >    struct vsi_associate {
> > unsigned char VSI_Mgr_ID;       /* VSI manager ID */
> > unsigned char VSI_Type_ID[3];   /* 24 bit VSI Type ID */
> > unsigned char VSI_Type_Version; /* VSI Type version */
> >    };
> >    I'm not sure how to deal with that best, but there needs to be
> >    some parsing of these numbers.
> 
> PORT-PROFILE above is a u8* array.  You could decide to encode the tuple in
> a string, e.g. "1.12345.1", and let the receiver parse it?  Or pack it in as
> binary.  PORT-PROFILE for us is just a string identifier, e.g. "corp-net-10"
> or "joes-garage".

I guess either one would work, but I'd prefer to do the parsing at
the front-end and passing it as binary to avoid libvirt encoding
it as ascii and lldpad (or the kernel) parsing the data again, which
would be more error-prone.

Another alternative would be to make this a distinct argument (or even
three of them) and only allow passing one or the other. That would
also implicitly choose the protocol (VDP or port extender).
 
> > - VDP requires a vlan ID to be part of the association, in addition to
> >   the MAC address. In theory, we could have multiple tuples of MAC+VLAN
> >   addresses, but we can probably just associate each tuple separately
> >   and ignore that part of the standard.
> 
> I don't think I have enough information to spec this out for this item.
> Would you take a stab at how this would look in the struct and how it would
> look from the iproute2 cmd line?  (Note I'm using iproute2 cmd line for
> illustrative purposes, but the sender of the msg could be something like
> libvirt).

Just adding the VID would be done I wrote at the end of my last mail.
Adding multiple VLAN/MAC pairs is probably not necessary if we can
do multiple associations.

> > - we have a set of possible error conditions that can be returned by
> >   the switch (invalid format, insufficient resources, unknown VTID,
> >   VTID violation, VTID verison violation, out of sync). It should be
> >   possible to return each of these to the user with 'get'.
> 
> There is a status code in the get cmd as defined in my patch.  I have it as
> a u8 with some enum codes.  Can we add to the enum code list?  Or do you
> want to return a full string?  Our requirements are we return one of:
> {success, error, in-progress}.

enum is fine, but I think it would be good to use the same numbers
as the VDP standard where possible. Maybe we could use two bytes,
the first one for the overall status (success, error, in progress)
and the second one for the specific error (as above)?

> >> Since we're using netlink sockets, the receiver of the RTM_SETLINK msg can
> >> be in kernel- or user-space.  For kernel-space recipient, rtnetlink.c, the
> >> new ndo_set_port_profile netdev op is called to set the port-profile.
> >> User-space recipients can decide how they propagate the msg to the switch.
> >> There is also a RTM_GETLINK cmd to to return port-profile setting of an
> >> interface and to also return the status of the last port-profile.
> > 
> > More on a stylistic note, I'm not convinced that using RTM_SETLINK/GETLINK
> > is the right interface for this, unlike the 'ip link set DEV vf ...' stuff,
> > because it seems to suggest that this is an option of the adapter itself.
> > I actually liked the iovnl family better in this regard, because it kept
> > the protocols separate.
> 
> Wait a second...I abandoned iovnl and moved to if_link based on suggestions
> from you and others.  On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de>
> wrote:
>
>    > Right. My preference would probably be make these a subcategory of
>    > the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
> 
> My latest with if_link fits best with what we're trying to do with enic.

Sorry for the misunderstanding, I was probably not clear enough. This
was a side-discussion about how we should create and destroy virtual
interfaces on IOV capable adapters. My point was that this should be
_separate_ from the port profile association. For creating a
macvlan/macvtap device, we already use RTM_NEWLINK, which does not
associate the device with the switch, so I suggested that for creating
a virtual interface with hardware support we should do the same thing,
but leave the association somewhere else. iovnl sounds like a good
place for that.

> Note the current interface is per netdev interface (a link), where the
> netdev interface could be the adapter itself or any other netdev such as
> macvlan, bond, etc.

We certainly missed each others arguments here, see my other mail.
In order to do the port profile association from software, we
definitely need the master device (nic, PF, bond, bridge, ...) so
we have a way to communicate to the switch, not the slave device
(VF, tap, macvtap, ...) that we are trying to associate.

> > What I could imagine to unify this is something like
> > 
> >    ip port_profile set DEVICE [ { pre_associate | pre_associate_rr } ]
> >                               { name PORT-PROFILE | vsi MGR:VTID:VER }
> >                                     mac LLADDR
> >    [ vlan VID ]
> >                                     [ host_uuid HOST_UUID ]
> >                                     [ client_uuid CLIENT_UUID ]
> >                                     [ client_name CLIENT_NAME ]
> >    ip port_profile del DEVICE [ mac LLADDR [ vlan VID ] ]
> >    ip port_profile show DEVICE
> 
> If we want to break port_profile out into it's own ip cmd, I'm ok with that.
> What you have above would work for enic.  The netdev would have these ops:
> 
>     ndo_set_port_profile
>     ndo_get_port_profile
>     ndo_del_port_profile
> 
> Sounds OK?

That sounds good.

	Arnd

^ permalink raw reply

* [PATCH 17/17] sfc: Create multiple TX queues
From: Ben Hutchings @ 2010-04-28 19:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482722.2549.17.camel@achroite.uk.solarflarecom.com>

Create a core TX queue and 2 hardware TX queues for each channel.
If separate_tx_channels is set, create equal numbers of RX and TX
channels instead.

Rewrite the channel and queue iteration macros accordingly.
Eliminate efx_channel::used_flags as redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c        |  113 ++++++++++++++++++++++-------------------
 drivers/net/sfc/efx.h        |    4 +-
 drivers/net/sfc/ethtool.c    |    4 +-
 drivers/net/sfc/net_driver.h |   61 ++++++++++------------
 drivers/net/sfc/nic.c        |   12 ++--
 drivers/net/sfc/selftest.c   |    4 +-
 drivers/net/sfc/selftest.h   |    4 +-
 drivers/net/sfc/tx.c         |   61 ++++++++++++++---------
 8 files changed, 140 insertions(+), 123 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 5e3f944..bc75ef6 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -288,7 +288,7 @@ static int efx_poll(struct napi_struct *napi, int budget)
 	if (spent < budget) {
 		struct efx_nic *efx = channel->efx;
 
-		if (channel->used_flags & EFX_USED_BY_RX &&
+		if (channel->channel < efx->n_rx_channels &&
 		    efx->irq_rx_adaptive &&
 		    unlikely(++channel->irq_count == 1000)) {
 			if (unlikely(channel->irq_mod_score <
@@ -333,7 +333,6 @@ void efx_process_channel_now(struct efx_channel *channel)
 {
 	struct efx_nic *efx = channel->efx;
 
-	BUG_ON(!channel->used_flags);
 	BUG_ON(!channel->enabled);
 
 	/* Disable interrupts and wait for ISRs to complete */
@@ -446,12 +445,12 @@ static void efx_set_channel_names(struct efx_nic *efx)
 
 	efx_for_each_channel(channel, efx) {
 		number = channel->channel;
-		if (efx->n_channels > efx->n_rx_queues) {
-			if (channel->channel < efx->n_rx_queues) {
+		if (efx->n_channels > efx->n_rx_channels) {
+			if (channel->channel < efx->n_rx_channels) {
 				type = "-rx";
 			} else {
 				type = "-tx";
-				number -= efx->n_rx_queues;
+				number -= efx->n_rx_channels;
 			}
 		}
 		snprintf(channel->name, sizeof(channel->name),
@@ -585,8 +584,6 @@ static void efx_remove_channel(struct efx_channel *channel)
 	efx_for_each_channel_tx_queue(tx_queue, channel)
 		efx_remove_tx_queue(tx_queue);
 	efx_remove_eventq(channel);
-
-	channel->used_flags = 0;
 }
 
 void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue, int delay)
@@ -956,10 +953,9 @@ static void efx_fini_io(struct efx_nic *efx)
 	pci_disable_device(efx->pci_dev);
 }
 
-/* Get number of RX queues wanted.  Return number of online CPU
- * packages in the expectation that an IRQ balancer will spread
- * interrupts across them. */
-static int efx_wanted_rx_queues(void)
+/* Get number of channels wanted.  Each channel will have its own IRQ,
+ * 1 RX queue and/or 2 TX queues. */
+static int efx_wanted_channels(void)
 {
 	cpumask_var_t core_mask;
 	int count;
@@ -995,34 +991,39 @@ static void efx_probe_interrupts(struct efx_nic *efx)
 
 	if (efx->interrupt_mode == EFX_INT_MODE_MSIX) {
 		struct msix_entry xentries[EFX_MAX_CHANNELS];
-		int wanted_ints;
-		int rx_queues;
+		int n_channels;
 
-		/* We want one RX queue and interrupt per CPU package
-		 * (or as specified by the rss_cpus module parameter).
-		 * We will need one channel per interrupt.
-		 */
-		rx_queues = rss_cpus ? rss_cpus : efx_wanted_rx_queues();
-		wanted_ints = rx_queues + (separate_tx_channels ? 1 : 0);
-		wanted_ints = min(wanted_ints, max_channels);
+		n_channels = efx_wanted_channels();
+		if (separate_tx_channels)
+			n_channels *= 2;
+		n_channels = min(n_channels, max_channels);
 
-		for (i = 0; i < wanted_ints; i++)
+		for (i = 0; i < n_channels; i++)
 			xentries[i].entry = i;
-		rc = pci_enable_msix(efx->pci_dev, xentries, wanted_ints);
+		rc = pci_enable_msix(efx->pci_dev, xentries, n_channels);
 		if (rc > 0) {
 			EFX_ERR(efx, "WARNING: Insufficient MSI-X vectors"
-				" available (%d < %d).\n", rc, wanted_ints);
+				" available (%d < %d).\n", rc, n_channels);
 			EFX_ERR(efx, "WARNING: Performance may be reduced.\n");
-			EFX_BUG_ON_PARANOID(rc >= wanted_ints);
-			wanted_ints = rc;
+			EFX_BUG_ON_PARANOID(rc >= n_channels);
+			n_channels = rc;
 			rc = pci_enable_msix(efx->pci_dev, xentries,
-					     wanted_ints);
+					     n_channels);
 		}
 
 		if (rc == 0) {
-			efx->n_rx_queues = min(rx_queues, wanted_ints);
-			efx->n_channels = wanted_ints;
-			for (i = 0; i < wanted_ints; i++)
+			efx->n_channels = n_channels;
+			if (separate_tx_channels) {
+				efx->n_tx_channels =
+					max(efx->n_channels / 2, 1U);
+				efx->n_rx_channels =
+					max(efx->n_channels -
+					    efx->n_tx_channels, 1U);
+			} else {
+				efx->n_tx_channels = efx->n_channels;
+				efx->n_rx_channels = efx->n_channels;
+			}
+			for (i = 0; i < n_channels; i++)
 				efx->channel[i].irq = xentries[i].vector;
 		} else {
 			/* Fall back to single channel MSI */
@@ -1033,8 +1034,9 @@ static void efx_probe_interrupts(struct efx_nic *efx)
 
 	/* Try single interrupt MSI */
 	if (efx->interrupt_mode == EFX_INT_MODE_MSI) {
-		efx->n_rx_queues = 1;
 		efx->n_channels = 1;
+		efx->n_rx_channels = 1;
+		efx->n_tx_channels = 1;
 		rc = pci_enable_msi(efx->pci_dev);
 		if (rc == 0) {
 			efx->channel[0].irq = efx->pci_dev->irq;
@@ -1046,8 +1048,9 @@ static void efx_probe_interrupts(struct efx_nic *efx)
 
 	/* Assume legacy interrupts */
 	if (efx->interrupt_mode == EFX_INT_MODE_LEGACY) {
-		efx->n_rx_queues = 1;
 		efx->n_channels = 1 + (separate_tx_channels ? 1 : 0);
+		efx->n_rx_channels = 1;
+		efx->n_tx_channels = 1;
 		efx->legacy_irq = efx->pci_dev->irq;
 	}
 }
@@ -1068,21 +1071,24 @@ static void efx_remove_interrupts(struct efx_nic *efx)
 
 static void efx_set_channels(struct efx_nic *efx)
 {
+	struct efx_channel *channel;
 	struct efx_tx_queue *tx_queue;
 	struct efx_rx_queue *rx_queue;
+	unsigned tx_channel_offset =
+		separate_tx_channels ? efx->n_channels - efx->n_tx_channels : 0;
 
-	efx_for_each_tx_queue(tx_queue, efx) {
-		if (separate_tx_channels)
-			tx_queue->channel = &efx->channel[efx->n_channels-1];
-		else
-			tx_queue->channel = &efx->channel[0];
-		tx_queue->channel->used_flags |= EFX_USED_BY_TX;
+	efx_for_each_channel(channel, efx) {
+		if (channel->channel - tx_channel_offset < efx->n_tx_channels) {
+			channel->tx_queue = &efx->tx_queue[
+				(channel->channel - tx_channel_offset) *
+				EFX_TXQ_TYPES];
+			efx_for_each_channel_tx_queue(tx_queue, channel)
+				tx_queue->channel = channel;
+		}
 	}
 
-	efx_for_each_rx_queue(rx_queue, efx) {
+	efx_for_each_rx_queue(rx_queue, efx)
 		rx_queue->channel = &efx->channel[rx_queue->queue];
-		rx_queue->channel->used_flags |= EFX_USED_BY_RX;
-	}
 }
 
 static int efx_probe_nic(struct efx_nic *efx)
@@ -1096,11 +1102,12 @@ static int efx_probe_nic(struct efx_nic *efx)
 	if (rc)
 		return rc;
 
-	/* Determine the number of channels and RX queues by trying to hook
+	/* Determine the number of channels and queues by trying to hook
 	 * in MSI-X interrupts. */
 	efx_probe_interrupts(efx);
 
 	efx_set_channels(efx);
+	efx->net_dev->real_num_tx_queues = efx->n_tx_channels;
 
 	/* Initialise the interrupt moderation settings */
 	efx_init_irq_moderation(efx, tx_irq_mod_usec, rx_irq_mod_usec, true);
@@ -1187,11 +1194,12 @@ static void efx_start_all(struct efx_nic *efx)
 	/* Mark the port as enabled so port reconfigurations can start, then
 	 * restart the transmit interface early so the watchdog timer stops */
 	efx_start_port(efx);
-	if (efx_dev_registered(efx))
-		efx_wake_queue(efx);
 
-	efx_for_each_channel(channel, efx)
+	efx_for_each_channel(channel, efx) {
+		if (efx_dev_registered(efx))
+			efx_wake_queue(channel);
 		efx_start_channel(channel);
+	}
 
 	efx_nic_enable_interrupts(efx);
 
@@ -1282,7 +1290,9 @@ static void efx_stop_all(struct efx_nic *efx)
 	/* Stop the kernel transmit interface late, so the watchdog
 	 * timer isn't ticking over the flush */
 	if (efx_dev_registered(efx)) {
-		efx_stop_queue(efx);
+		struct efx_channel *channel;
+		efx_for_each_channel(channel, efx)
+			efx_stop_queue(channel);
 		netif_tx_lock_bh(efx->net_dev);
 		netif_tx_unlock_bh(efx->net_dev);
 	}
@@ -1537,9 +1547,8 @@ static void efx_watchdog(struct net_device *net_dev)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 
-	EFX_ERR(efx, "TX stuck with stop_count=%d port_enabled=%d:"
-		" resetting channels\n",
-		atomic_read(&efx->netif_stop_count), efx->port_enabled);
+	EFX_ERR(efx, "TX stuck with port_enabled=%d: resetting channels\n",
+		efx->port_enabled);
 
 	efx_schedule_reset(efx, RESET_TYPE_TX_WATCHDOG);
 }
@@ -2014,22 +2023,22 @@ static int efx_init_struct(struct efx_nic *efx, struct efx_nic_type *type,
 
 	efx->net_dev = net_dev;
 	efx->rx_checksum_enabled = true;
-	spin_lock_init(&efx->netif_stop_lock);
 	spin_lock_init(&efx->stats_lock);
 	mutex_init(&efx->mac_lock);
 	efx->mac_op = type->default_mac_ops;
 	efx->phy_op = &efx_dummy_phy_operations;
 	efx->mdio.dev = net_dev;
 	INIT_WORK(&efx->mac_work, efx_mac_work);
-	atomic_set(&efx->netif_stop_count, 1);
 
 	for (i = 0; i < EFX_MAX_CHANNELS; i++) {
 		channel = &efx->channel[i];
 		channel->efx = efx;
 		channel->channel = i;
 		channel->work_pending = false;
+		spin_lock_init(&channel->tx_stop_lock);
+		atomic_set(&channel->tx_stop_count, 1);
 	}
-	for (i = 0; i < EFX_TX_QUEUE_COUNT; i++) {
+	for (i = 0; i < EFX_MAX_TX_QUEUES; i++) {
 		tx_queue = &efx->tx_queue[i];
 		tx_queue->efx = efx;
 		tx_queue->queue = i;
@@ -2201,7 +2210,7 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
 	int i, rc;
 
 	/* Allocate and initialise a struct net_device and struct efx_nic */
-	net_dev = alloc_etherdev(sizeof(*efx));
+	net_dev = alloc_etherdev_mq(sizeof(*efx), EFX_MAX_CORE_TX_QUEUES);
 	if (!net_dev)
 		return -ENOMEM;
 	net_dev->features |= (type->offload_features | NETIF_F_SG |
diff --git a/drivers/net/sfc/efx.h b/drivers/net/sfc/efx.h
index 7eff0a6..ffd708c 100644
--- a/drivers/net/sfc/efx.h
+++ b/drivers/net/sfc/efx.h
@@ -35,8 +35,8 @@ efx_hard_start_xmit(struct sk_buff *skb, struct net_device *net_dev);
 extern netdev_tx_t
 efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 extern void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
-extern void efx_stop_queue(struct efx_nic *efx);
-extern void efx_wake_queue(struct efx_nic *efx);
+extern void efx_stop_queue(struct efx_channel *channel);
+extern void efx_wake_queue(struct efx_channel *channel);
 #define EFX_TXQ_SIZE 1024
 #define EFX_TXQ_MASK (EFX_TXQ_SIZE - 1)
 
diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
index cbe9319..22026bf 100644
--- a/drivers/net/sfc/ethtool.c
+++ b/drivers/net/sfc/ethtool.c
@@ -647,7 +647,7 @@ static int efx_ethtool_get_coalesce(struct net_device *net_dev,
 	efx_for_each_tx_queue(tx_queue, efx) {
 		channel = tx_queue->channel;
 		if (channel->irq_moderation < coalesce->tx_coalesce_usecs_irq) {
-			if (channel->used_flags != EFX_USED_BY_RX_TX)
+			if (channel->channel < efx->n_rx_channels)
 				coalesce->tx_coalesce_usecs_irq =
 					channel->irq_moderation;
 			else
@@ -690,7 +690,7 @@ static int efx_ethtool_set_coalesce(struct net_device *net_dev,
 
 	/* If the channel is shared only allow RX parameters to be set */
 	efx_for_each_tx_queue(tx_queue, efx) {
-		if ((tx_queue->channel->used_flags == EFX_USED_BY_RX_TX) &&
+		if ((tx_queue->channel->channel < efx->n_rx_channels) &&
 		    tx_usecs) {
 			EFX_ERR(efx, "Channel is shared. "
 				"Only RX coalescing may be set\n");
diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index d68331c..2e6fd89 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -85,9 +85,13 @@ do {if (net_ratelimit()) EFX_LOG(efx, fmt, ##args); } while (0)
 #define EFX_MAX_CHANNELS 32
 #define EFX_MAX_RX_QUEUES EFX_MAX_CHANNELS
 
-#define EFX_TX_QUEUE_OFFLOAD_CSUM	0
-#define EFX_TX_QUEUE_NO_CSUM		1
-#define EFX_TX_QUEUE_COUNT		2
+/* Checksum generation is a per-queue option in hardware, so each
+ * queue visible to the networking core is backed by two hardware TX
+ * queues. */
+#define EFX_MAX_CORE_TX_QUEUES	EFX_MAX_CHANNELS
+#define EFX_TXQ_TYPE_OFFLOAD	1
+#define EFX_TXQ_TYPES		2
+#define EFX_MAX_TX_QUEUES	(EFX_TXQ_TYPES * EFX_MAX_CORE_TX_QUEUES)
 
 /**
  * struct efx_special_buffer - An Efx special buffer
@@ -187,7 +191,7 @@ struct efx_tx_buffer {
 struct efx_tx_queue {
 	/* Members which don't change on the fast path */
 	struct efx_nic *efx ____cacheline_aligned_in_smp;
-	int queue;
+	unsigned queue;
 	struct efx_channel *channel;
 	struct efx_nic *nic;
 	struct efx_tx_buffer *buffer;
@@ -306,11 +310,6 @@ struct efx_buffer {
 };
 

-/* Flags for channel->used_flags */
-#define EFX_USED_BY_RX 1
-#define EFX_USED_BY_TX 2
-#define EFX_USED_BY_RX_TX (EFX_USED_BY_RX | EFX_USED_BY_TX)
-
 enum efx_rx_alloc_method {
 	RX_ALLOC_METHOD_AUTO = 0,
 	RX_ALLOC_METHOD_SKB = 1,
@@ -327,7 +326,6 @@ enum efx_rx_alloc_method {
  * @efx: Associated Efx NIC
  * @channel: Channel instance number
  * @name: Name for channel and IRQ
- * @used_flags: Channel is used by net driver
  * @enabled: Channel enabled indicator
  * @irq: IRQ number (MSI and MSI-X only)
  * @irq_moderation: IRQ moderation value (in hardware ticks)
@@ -352,12 +350,14 @@ enum efx_rx_alloc_method {
  * @n_rx_frm_trunc: Count of RX_FRM_TRUNC errors
  * @n_rx_overlength: Count of RX_OVERLENGTH errors
  * @n_skbuff_leaks: Count of skbuffs leaked due to RX overrun
+ * @tx_queue: Pointer to first TX queue, or %NULL if not used for TX
+ * @tx_stop_count: Core TX queue stop count
+ * @tx_stop_lock: Core TX queue stop lock
  */
 struct efx_channel {
 	struct efx_nic *efx;
 	int channel;
 	char name[IFNAMSIZ + 6];
-	int used_flags;
 	bool enabled;
 	int irq;
 	unsigned int irq_moderation;
@@ -389,6 +389,9 @@ struct efx_channel {
 	struct efx_rx_buffer *rx_pkt;
 	bool rx_pkt_csummed;
 
+	struct efx_tx_queue *tx_queue;
+	atomic_t tx_stop_count;
+	spinlock_t tx_stop_lock;
 };
 
 enum efx_led_mode {
@@ -661,8 +664,9 @@ union efx_multicast_hash {
  * @rx_queue: RX DMA queues
  * @channel: Channels
  * @next_buffer_table: First available buffer table id
- * @n_rx_queues: Number of RX queues
  * @n_channels: Number of channels in use
+ * @n_rx_channels: Number of channels used for RX (= number of RX queues)
+ * @n_tx_channels: Number of channels used for TX
  * @rx_buffer_len: RX buffer length
  * @rx_buffer_order: Order (log2) of number of pages for each RX buffer
  * @int_error_count: Number of internal errors seen recently
@@ -693,8 +697,6 @@ union efx_multicast_hash {
  * @port_initialized: Port initialized?
  * @net_dev: Operating system network device. Consider holding the rtnl lock
  * @rx_checksum_enabled: RX checksumming enabled
- * @netif_stop_count: Port stop count
- * @netif_stop_lock: Port stop lock
  * @mac_stats: MAC statistics. These include all statistics the MACs
  *	can provide.  Generic code converts these into a standard
  *	&struct net_device_stats.
@@ -742,13 +744,14 @@ struct efx_nic {
 	enum nic_state state;
 	enum reset_type reset_pending;
 
-	struct efx_tx_queue tx_queue[EFX_TX_QUEUE_COUNT];
+	struct efx_tx_queue tx_queue[EFX_MAX_TX_QUEUES];
 	struct efx_rx_queue rx_queue[EFX_MAX_RX_QUEUES];
 	struct efx_channel channel[EFX_MAX_CHANNELS];
 
 	unsigned next_buffer_table;
-	int n_rx_queues;
-	int n_channels;
+	unsigned n_channels;
+	unsigned n_rx_channels;
+	unsigned n_tx_channels;
 	unsigned int rx_buffer_len;
 	unsigned int rx_buffer_order;
 
@@ -780,9 +783,6 @@ struct efx_nic {
 	struct net_device *net_dev;
 	bool rx_checksum_enabled;
 
-	atomic_t netif_stop_count;
-	spinlock_t netif_stop_lock;
-
 	struct efx_mac_stats mac_stats;
 	struct efx_buffer stats_buffer;
 	spinlock_t stats_lock;
@@ -928,31 +928,26 @@ struct efx_nic_type {
 /* Iterate over all used channels */
 #define efx_for_each_channel(_channel, _efx)				\
 	for (_channel = &((_efx)->channel[0]);				\
-	     _channel < &((_efx)->channel[EFX_MAX_CHANNELS]);		\
-	     _channel++)						\
-		if (!_channel->used_flags)				\
-			continue;					\
-		else
+	     _channel < &((_efx)->channel[(efx)->n_channels]);		\
+	     _channel++)
 
 /* Iterate over all used TX queues */
 #define efx_for_each_tx_queue(_tx_queue, _efx)				\
 	for (_tx_queue = &((_efx)->tx_queue[0]);			\
-	     _tx_queue < &((_efx)->tx_queue[EFX_TX_QUEUE_COUNT]);	\
+	     _tx_queue < &((_efx)->tx_queue[EFX_TXQ_TYPES *		\
+					    (_efx)->n_tx_channels]);	\
 	     _tx_queue++)
 
 /* Iterate over all TX queues belonging to a channel */
 #define efx_for_each_channel_tx_queue(_tx_queue, _channel)		\
-	for (_tx_queue = &((_channel)->efx->tx_queue[0]);		\
-	     _tx_queue < &((_channel)->efx->tx_queue[EFX_TX_QUEUE_COUNT]); \
-	     _tx_queue++)						\
-		if (_tx_queue->channel != (_channel))			\
-			continue;					\
-		else
+	for (_tx_queue = (_channel)->tx_queue;				\
+	     _tx_queue && _tx_queue < (_channel)->tx_queue + EFX_TXQ_TYPES; \
+	     _tx_queue++)
 
 /* Iterate over all used RX queues */
 #define efx_for_each_rx_queue(_rx_queue, _efx)				\
 	for (_rx_queue = &((_efx)->rx_queue[0]);			\
-	     _rx_queue < &((_efx)->rx_queue[(_efx)->n_rx_queues]);	\
+	     _rx_queue < &((_efx)->rx_queue[(_efx)->n_rx_channels]);	\
 	     _rx_queue++)
 
 /* Iterate over all RX queues belonging to a channel */
diff --git a/drivers/net/sfc/nic.c b/drivers/net/sfc/nic.c
index f3226bb..5d3aaec 100644
--- a/drivers/net/sfc/nic.c
+++ b/drivers/net/sfc/nic.c
@@ -418,7 +418,7 @@ void efx_nic_init_tx(struct efx_tx_queue *tx_queue)
 			      FRF_BZ_TX_NON_IP_DROP_DIS, 1);
 
 	if (efx_nic_rev(efx) >= EFX_REV_FALCON_B0) {
-		int csum = tx_queue->queue == EFX_TX_QUEUE_OFFLOAD_CSUM;
+		int csum = tx_queue->queue & EFX_TXQ_TYPE_OFFLOAD;
 		EFX_SET_OWORD_FIELD(tx_desc_ptr, FRF_BZ_TX_IP_CHKSM_DIS, !csum);
 		EFX_SET_OWORD_FIELD(tx_desc_ptr, FRF_BZ_TX_TCP_CHKSM_DIS,
 				    !csum);
@@ -431,10 +431,10 @@ void efx_nic_init_tx(struct efx_tx_queue *tx_queue)
 		efx_oword_t reg;
 
 		/* Only 128 bits in this register */
-		BUILD_BUG_ON(EFX_TX_QUEUE_COUNT >= 128);
+		BUILD_BUG_ON(EFX_MAX_TX_QUEUES > 128);
 
 		efx_reado(efx, &reg, FR_AA_TX_CHKSM_CFG);
-		if (tx_queue->queue == EFX_TX_QUEUE_OFFLOAD_CSUM)
+		if (tx_queue->queue & EFX_TXQ_TYPE_OFFLOAD)
 			clear_bit_le(tx_queue->queue, (void *)&reg);
 		else
 			set_bit_le(tx_queue->queue, (void *)&reg);
@@ -1132,7 +1132,7 @@ static void efx_poll_flush_events(struct efx_nic *efx)
 		    ev_sub_code == FSE_AZ_TX_DESCQ_FLS_DONE_EV) {
 			ev_queue = EFX_QWORD_FIELD(*event,
 						   FSF_AZ_DRIVER_EV_SUBDATA);
-			if (ev_queue < EFX_TX_QUEUE_COUNT) {
+			if (ev_queue < EFX_TXQ_TYPES * efx->n_tx_channels) {
 				tx_queue = efx->tx_queue + ev_queue;
 				tx_queue->flushed = FLUSH_DONE;
 			}
@@ -1142,7 +1142,7 @@ static void efx_poll_flush_events(struct efx_nic *efx)
 				*event, FSF_AZ_DRIVER_EV_RX_DESCQ_ID);
 			ev_failed = EFX_QWORD_FIELD(
 				*event, FSF_AZ_DRIVER_EV_RX_FLUSH_FAIL);
-			if (ev_queue < efx->n_rx_queues) {
+			if (ev_queue < efx->n_rx_channels) {
 				rx_queue = efx->rx_queue + ev_queue;
 				rx_queue->flushed =
 					ev_failed ? FLUSH_FAILED : FLUSH_DONE;
@@ -1441,7 +1441,7 @@ static void efx_setup_rss_indir_table(struct efx_nic *efx)
 	     offset < FR_BZ_RX_INDIRECTION_TBL + 0x800;
 	     offset += 0x10) {
 		EFX_POPULATE_DWORD_1(dword, FRF_BZ_IT_QUEUE,
-				     i % efx->n_rx_queues);
+				     i % efx->n_rx_channels);
 		efx_writed(efx, &dword, offset);
 		i++;
 	}
diff --git a/drivers/net/sfc/selftest.c b/drivers/net/sfc/selftest.c
index 3a16e06..371e86c 100644
--- a/drivers/net/sfc/selftest.c
+++ b/drivers/net/sfc/selftest.c
@@ -618,8 +618,8 @@ static int efx_test_loopbacks(struct efx_nic *efx, struct efx_self_tests *tests,
 
 		/* Test both types of TX queue */
 		efx_for_each_channel_tx_queue(tx_queue, &efx->channel[0]) {
-			state->offload_csum = (tx_queue->queue ==
-					       EFX_TX_QUEUE_OFFLOAD_CSUM);
+			state->offload_csum = (tx_queue->queue &
+					       EFX_TXQ_TYPE_OFFLOAD);
 			rc = efx_test_loopback(tx_queue,
 					       &tests->loopback[mode]);
 			if (rc)
diff --git a/drivers/net/sfc/selftest.h b/drivers/net/sfc/selftest.h
index 643bef7..aed495a 100644
--- a/drivers/net/sfc/selftest.h
+++ b/drivers/net/sfc/selftest.h
@@ -18,8 +18,8 @@
  */
 
 struct efx_loopback_self_tests {
-	int tx_sent[EFX_TX_QUEUE_COUNT];
-	int tx_done[EFX_TX_QUEUE_COUNT];
+	int tx_sent[EFX_TXQ_TYPES];
+	int tx_done[EFX_TXQ_TYPES];
 	int rx_good;
 	int rx_bad;
 };
diff --git a/drivers/net/sfc/tx.c b/drivers/net/sfc/tx.c
index be0e110..6bb12a8 100644
--- a/drivers/net/sfc/tx.c
+++ b/drivers/net/sfc/tx.c
@@ -30,32 +30,46 @@
  */
 #define EFX_TXQ_THRESHOLD (EFX_TXQ_MASK / 2u)
 
-/* We want to be able to nest calls to netif_stop_queue(), since each
- * channel can have an individual stop on the queue.
- */
-void efx_stop_queue(struct efx_nic *efx)
+/* We need to be able to nest calls to netif_tx_stop_queue(), partly
+ * because of the 2 hardware queues associated with each core queue,
+ * but also so that we can inhibit TX for reasons other than a full
+ * hardware queue. */
+void efx_stop_queue(struct efx_channel *channel)
 {
-	spin_lock_bh(&efx->netif_stop_lock);
+	struct efx_nic *efx = channel->efx;
+
+	if (!channel->tx_queue)
+		return;
+
+	spin_lock_bh(&channel->tx_stop_lock);
 	EFX_TRACE(efx, "stop TX queue\n");
 
-	atomic_inc(&efx->netif_stop_count);
-	netif_stop_queue(efx->net_dev);
+	atomic_inc(&channel->tx_stop_count);
+	netif_tx_stop_queue(
+		netdev_get_tx_queue(
+			efx->net_dev,
+			channel->tx_queue->queue / EFX_TXQ_TYPES));
 
-	spin_unlock_bh(&efx->netif_stop_lock);
+	spin_unlock_bh(&channel->tx_stop_lock);
 }
 
-/* Wake netif's TX queue
- * We want to be able to nest calls to netif_stop_queue(), since each
- * channel can have an individual stop on the queue.
- */
-void efx_wake_queue(struct efx_nic *efx)
+/* Decrement core TX queue stop count and wake it if the count is 0 */
+void efx_wake_queue(struct efx_channel *channel)
 {
+	struct efx_nic *efx = channel->efx;
+
+	if (!channel->tx_queue)
+		return;
+
 	local_bh_disable();
-	if (atomic_dec_and_lock(&efx->netif_stop_count,
-				&efx->netif_stop_lock)) {
+	if (atomic_dec_and_lock(&channel->tx_stop_count,
+				&channel->tx_stop_lock)) {
 		EFX_TRACE(efx, "waking TX queue\n");
-		netif_wake_queue(efx->net_dev);
-		spin_unlock(&efx->netif_stop_lock);
+		netif_tx_wake_queue(
+			netdev_get_tx_queue(
+				efx->net_dev,
+				channel->tx_queue->queue / EFX_TXQ_TYPES));
+		spin_unlock(&channel->tx_stop_lock);
 	}
 	local_bh_enable();
 }
@@ -298,7 +312,7 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	rc = NETDEV_TX_BUSY;
 
 	if (tx_queue->stopped == 1)
-		efx_stop_queue(efx);
+		efx_stop_queue(tx_queue->channel);
 
  unwind:
 	/* Work backwards until we hit the original insert pointer value */
@@ -374,10 +388,9 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 	if (unlikely(efx->port_inhibited))
 		return NETDEV_TX_BUSY;
 
+	tx_queue = &efx->tx_queue[EFX_TXQ_TYPES * skb_get_queue_mapping(skb)];
 	if (likely(skb->ip_summed == CHECKSUM_PARTIAL))
-		tx_queue = &efx->tx_queue[EFX_TX_QUEUE_OFFLOAD_CSUM];
-	else
-		tx_queue = &efx->tx_queue[EFX_TX_QUEUE_NO_CSUM];
+		tx_queue += EFX_TXQ_TYPE_OFFLOAD;
 
 	return efx_enqueue_skb(tx_queue, skb);
 }
@@ -405,7 +418,7 @@ void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index)
 			netif_tx_lock(efx->net_dev);
 			if (tx_queue->stopped) {
 				tx_queue->stopped = 0;
-				efx_wake_queue(efx);
+				efx_wake_queue(tx_queue->channel);
 			}
 			netif_tx_unlock(efx->net_dev);
 		}
@@ -488,7 +501,7 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue)
 	/* Release queue's stop on port, if any */
 	if (tx_queue->stopped) {
 		tx_queue->stopped = 0;
-		efx_wake_queue(tx_queue->efx);
+		efx_wake_queue(tx_queue->channel);
 	}
 }
 
@@ -1120,7 +1133,7 @@ static int efx_enqueue_skb_tso(struct efx_tx_queue *tx_queue,
 
 	/* Stop the queue if it wasn't stopped before. */
 	if (tx_queue->stopped == 1)
-		efx_stop_queue(efx);
+		efx_stop_queue(tx_queue->channel);
 
  unwind:
 	/* Free the DMA mapping we were in the process of writing out */
-- 
1.6.2.5

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH 16/17] sfc: Test only the first pair of TX queues
From: Ben Hutchings @ 2010-04-28 19:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482722.2549.17.camel@achroite.uk.solarflarecom.com>

This makes no immediate difference, but we definitely do not want
to test all TX queues once we allocate a pair of TX queues to each
channel.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/ethtool.c  |    2 +-
 drivers/net/sfc/selftest.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
index d9f9c02..cbe9319 100644
--- a/drivers/net/sfc/ethtool.c
+++ b/drivers/net/sfc/ethtool.c
@@ -304,7 +304,7 @@ static int efx_fill_loopback_test(struct efx_nic *efx,
 {
 	struct efx_tx_queue *tx_queue;
 
-	efx_for_each_tx_queue(tx_queue, efx) {
+	efx_for_each_channel_tx_queue(tx_queue, &efx->channel[0]) {
 		efx_fill_test(test_index++, strings, data,
 			      &lb_tests->tx_sent[tx_queue->queue],
 			      EFX_TX_QUEUE_NAME(tx_queue),
diff --git a/drivers/net/sfc/selftest.c b/drivers/net/sfc/selftest.c
index 0106b1d..3a16e06 100644
--- a/drivers/net/sfc/selftest.c
+++ b/drivers/net/sfc/selftest.c
@@ -616,8 +616,8 @@ static int efx_test_loopbacks(struct efx_nic *efx, struct efx_self_tests *tests,
 			goto out;
 		}
 
-		/* Test every TX queue */
-		efx_for_each_tx_queue(tx_queue, efx) {
+		/* Test both types of TX queue */
+		efx_for_each_channel_tx_queue(tx_queue, &efx->channel[0]) {
 			state->offload_csum = (tx_queue->queue ==
 					       EFX_TX_QUEUE_OFFLOAD_CSUM);
 			rc = efx_test_loopback(tx_queue,
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH 15/17] sfc: Add Siena PHY BIST and cable diagnostic support
From: Ben Hutchings @ 2010-04-28 19:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482722.2549.17.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/mcdi_phy.c |  146 +++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/mcdi_phy.c b/drivers/net/sfc/mcdi_phy.c
index 5d34487..6032c0e 100644
--- a/drivers/net/sfc/mcdi_phy.c
+++ b/drivers/net/sfc/mcdi_phy.c
@@ -17,6 +17,8 @@
 #include "mcdi.h"
 #include "mcdi_pcol.h"
 #include "mdio_10g.h"
+#include "nic.h"
+#include "selftest.h"
 
 struct efx_mcdi_phy_cfg {
 	u32 flags;
@@ -594,6 +596,146 @@ static int efx_mcdi_phy_test_alive(struct efx_nic *efx)
 	return 0;
 }
 
+static const char *const mcdi_sft9001_cable_diag_names[] = {
+	"cable.pairA.length",
+	"cable.pairB.length",
+	"cable.pairC.length",
+	"cable.pairD.length",
+	"cable.pairA.status",
+	"cable.pairB.status",
+	"cable.pairC.status",
+	"cable.pairD.status",
+};
+
+static int efx_mcdi_bist(struct efx_nic *efx, unsigned int bist_mode,
+			 int *results)
+{
+	unsigned int retry, i, count = 0;
+	size_t outlen;
+	u32 status;
+	u8 *buf, *ptr;
+	int rc;
+
+	buf = kzalloc(0x100, GFP_KERNEL);
+	if (buf == NULL)
+		return -ENOMEM;
+
+	BUILD_BUG_ON(MC_CMD_START_BIST_OUT_LEN != 0);
+	MCDI_SET_DWORD(buf, START_BIST_IN_TYPE, bist_mode);
+	rc = efx_mcdi_rpc(efx, MC_CMD_START_BIST, buf, MC_CMD_START_BIST_IN_LEN,
+			  NULL, 0, NULL);
+	if (rc)
+		goto out;
+
+	/* Wait up to 10s for BIST to finish */
+	for (retry = 0; retry < 100; ++retry) {
+		BUILD_BUG_ON(MC_CMD_POLL_BIST_IN_LEN != 0);
+		rc = efx_mcdi_rpc(efx, MC_CMD_POLL_BIST, NULL, 0,
+				  buf, 0x100, &outlen);
+		if (rc)
+			goto out;
+
+		status = MCDI_DWORD(buf, POLL_BIST_OUT_RESULT);
+		if (status != MC_CMD_POLL_BIST_RUNNING)
+			goto finished;
+
+		msleep(100);
+	}
+
+	rc = -ETIMEDOUT;
+	goto out;
+
+finished:
+	results[count++] = (status == MC_CMD_POLL_BIST_PASSED) ? 1 : -1;
+
+	/* SFT9001 specific cable diagnostics output */
+	if (efx->phy_type == PHY_TYPE_SFT9001B &&
+	    (bist_mode == MC_CMD_PHY_BIST_CABLE_SHORT ||
+	     bist_mode == MC_CMD_PHY_BIST_CABLE_LONG)) {
+		ptr = MCDI_PTR(buf, POLL_BIST_OUT_SFT9001_CABLE_LENGTH_A);
+		if (status == MC_CMD_POLL_BIST_PASSED &&
+		    outlen >= MC_CMD_POLL_BIST_OUT_SFT9001_LEN) {
+			for (i = 0; i < 8; i++) {
+				results[count + i] =
+					EFX_DWORD_FIELD(((efx_dword_t *)ptr)[i],
+							EFX_DWORD_0);
+			}
+		}
+		count += 8;
+	}
+	rc = count;
+
+out:
+	kfree(buf);
+
+	return rc;
+}
+
+static int efx_mcdi_phy_run_tests(struct efx_nic *efx, int *results,
+				  unsigned flags)
+{
+	struct efx_mcdi_phy_cfg *phy_cfg = efx->phy_data;
+	u32 mode;
+	int rc;
+
+	if (phy_cfg->flags & (1 << MC_CMD_GET_PHY_CFG_BIST_LBN)) {
+		rc = efx_mcdi_bist(efx, MC_CMD_PHY_BIST, results);
+		if (rc < 0)
+			return rc;
+
+		results += rc;
+	}
+
+	/* If we support both LONG and SHORT, then run each in response to
+	 * break or not. Otherwise, run the one we support */
+	mode = 0;
+	if (phy_cfg->flags & (1 << MC_CMD_GET_PHY_CFG_BIST_CABLE_SHORT_LBN)) {
+		if ((flags & ETH_TEST_FL_OFFLINE) &&
+		    (phy_cfg->flags &
+		     (1 << MC_CMD_GET_PHY_CFG_BIST_CABLE_LONG_LBN)))
+			mode = MC_CMD_PHY_BIST_CABLE_LONG;
+		else
+			mode = MC_CMD_PHY_BIST_CABLE_SHORT;
+	} else if (phy_cfg->flags &
+		   (1 << MC_CMD_GET_PHY_CFG_BIST_CABLE_LONG_LBN))
+		mode = MC_CMD_PHY_BIST_CABLE_LONG;
+
+	if (mode != 0) {
+		rc = efx_mcdi_bist(efx, mode, results);
+		if (rc < 0)
+			return rc;
+		results += rc;
+	}
+
+	return 0;
+}
+
+const char *efx_mcdi_phy_test_name(struct efx_nic *efx, unsigned int index)
+{
+	struct efx_mcdi_phy_cfg *phy_cfg = efx->phy_data;
+
+	if (phy_cfg->flags & (1 << MC_CMD_GET_PHY_CFG_BIST_LBN)) {
+		if (index == 0)
+			return "bist";
+		--index;
+	}
+
+	if (phy_cfg->flags & ((1 << MC_CMD_GET_PHY_CFG_BIST_CABLE_SHORT_LBN) |
+			      (1 << MC_CMD_GET_PHY_CFG_BIST_CABLE_LONG_LBN))) {
+		if (index == 0)
+			return "cable";
+		--index;
+
+		if (efx->phy_type == PHY_TYPE_SFT9001B) {
+			if (index < ARRAY_SIZE(mcdi_sft9001_cable_diag_names))
+				return mcdi_sft9001_cable_diag_names[index];
+			index -= ARRAY_SIZE(mcdi_sft9001_cable_diag_names);
+		}
+	}
+
+	return NULL;
+}
+
 struct efx_phy_operations efx_mcdi_phy_ops = {
 	.probe		= efx_mcdi_phy_probe,
 	.init 	 	= efx_port_dummy_op_int,
@@ -604,6 +746,6 @@ struct efx_phy_operations efx_mcdi_phy_ops = {
 	.get_settings	= efx_mcdi_phy_get_settings,
 	.set_settings	= efx_mcdi_phy_set_settings,
 	.test_alive	= efx_mcdi_phy_test_alive,
-	.run_tests	= NULL,
-	.test_name	= NULL,
+	.run_tests	= efx_mcdi_phy_run_tests,
+	.test_name	= efx_mcdi_phy_test_name,
 };
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH 14/17] sfc: Clean up efx_nic::irq_zero_count
From: Ben Hutchings @ 2010-04-28 19:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482722.2549.17.camel@achroite.uk.solarflarecom.com>

There is no need for this to be unsigned long; make it unsigned int.
It does need a line in kernel-doc, so add that.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/net_driver.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index bdea03c..d68331c 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -672,6 +672,7 @@ union efx_multicast_hash {
  *	This register is written with the SMP processor ID whenever an
  *	interrupt is handled.  It is used by efx_nic_test_interrupt()
  *	to verify that an interrupt has occurred.
+ * @irq_zero_count: Number of legacy IRQs seen with queue flags == 0
  * @fatal_irq_level: IRQ level (bit number) used for serious errors
  * @spi_flash: SPI flash device
  *	This field will be %NULL if no flash device is present (or for Siena).
@@ -756,7 +757,7 @@ struct efx_nic {
 
 	struct efx_buffer irq_status;
 	volatile signed int last_irq_cpu;
-	unsigned long irq_zero_count;
+	unsigned irq_zero_count;
 	unsigned fatal_irq_level;
 
 	struct efx_spi_device *spi_flash;
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH 13/17] sfc: Add necessary parentheses to macro definitions in net_driver.h
From: Ben Hutchings @ 2010-04-28 19:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482722.2549.17.camel@achroite.uk.solarflarecom.com>

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/net_driver.h |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index 70aea3a..bdea03c 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -926,8 +926,8 @@ struct efx_nic_type {
 
 /* Iterate over all used channels */
 #define efx_for_each_channel(_channel, _efx)				\
-	for (_channel = &_efx->channel[0];				\
-	     _channel < &_efx->channel[EFX_MAX_CHANNELS];		\
+	for (_channel = &((_efx)->channel[0]);				\
+	     _channel < &((_efx)->channel[EFX_MAX_CHANNELS]);		\
 	     _channel++)						\
 		if (!_channel->used_flags)				\
 			continue;					\
@@ -935,31 +935,31 @@ struct efx_nic_type {
 
 /* Iterate over all used TX queues */
 #define efx_for_each_tx_queue(_tx_queue, _efx)				\
-	for (_tx_queue = &_efx->tx_queue[0];				\
-	     _tx_queue < &_efx->tx_queue[EFX_TX_QUEUE_COUNT];		\
+	for (_tx_queue = &((_efx)->tx_queue[0]);			\
+	     _tx_queue < &((_efx)->tx_queue[EFX_TX_QUEUE_COUNT]);	\
 	     _tx_queue++)
 
 /* Iterate over all TX queues belonging to a channel */
 #define efx_for_each_channel_tx_queue(_tx_queue, _channel)		\
-	for (_tx_queue = &_channel->efx->tx_queue[0];			\
-	     _tx_queue < &_channel->efx->tx_queue[EFX_TX_QUEUE_COUNT];	\
+	for (_tx_queue = &((_channel)->efx->tx_queue[0]);		\
+	     _tx_queue < &((_channel)->efx->tx_queue[EFX_TX_QUEUE_COUNT]); \
 	     _tx_queue++)						\
-		if (_tx_queue->channel != _channel)			\
+		if (_tx_queue->channel != (_channel))			\
 			continue;					\
 		else
 
 /* Iterate over all used RX queues */
 #define efx_for_each_rx_queue(_rx_queue, _efx)				\
-	for (_rx_queue = &_efx->rx_queue[0];				\
-	     _rx_queue < &_efx->rx_queue[_efx->n_rx_queues];		\
+	for (_rx_queue = &((_efx)->rx_queue[0]);			\
+	     _rx_queue < &((_efx)->rx_queue[(_efx)->n_rx_queues]);	\
 	     _rx_queue++)
 
 /* Iterate over all RX queues belonging to a channel */
 #define efx_for_each_channel_rx_queue(_rx_queue, _channel)		\
-	for (_rx_queue = &_channel->efx->rx_queue[_channel->channel];	\
+	for (_rx_queue = &((_channel)->efx->rx_queue[(_channel)->channel]); \
 	     _rx_queue;							\
 	     _rx_queue = NULL)						\
-		if (_rx_queue->channel != _channel)			\
+		if (_rx_queue->channel != (_channel))			\
 			continue;					\
 		else
 
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox