Netdev List
 help / color / mirror / Atom feed
* Re: [PATCHv2 1/2] bonding: fix netdev event NULL pointer dereference
From: David Miller @ 2013-04-11 17:28 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, andy, fubar
In-Reply-To: <1365687905-1053-1-git-send-email-nikolay@redhat.com>


When you repost a patch that's part of a series, you must resubmit all
patches in that series, not just the one that changed.

You should also provide a header posting "[PATCH 0/N] ..." that explains
the high level purpose and intent of the patch set.

^ permalink raw reply

* Re: [PATCH] tcp: incoming connections might use wrong route under synflood
From: David Miller @ 2013-04-11 17:18 UTC (permalink / raw)
  To: dp; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <20130411114600.14cb8a5d5a5dee9e2f1306b5@highloadlab.com>

From: Dmitry Popov <dp@highloadlab.com>
Date: Thu, 11 Apr 2013 11:46:00 +0400

> There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
> 	flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
> 			   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
> 			   inet_sk_flowi_flags(sk),
> 			   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
> 			   ireq->loc_addr, th->source, th->dest);
> 
> Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be
> taken. This dst_entry is used by new socket (get_cookie_sock -> 
> tcp_v4_syn_recv_sock), so its packets may take the wrong path.
> 
> Signed-off-by: Dmitry Popov <dp@highloadlab.com>

Do not top post, especially with patches!  Because you top posted
the new version of the patch, my reply to you sits at the end of
the new patch.

Make a fresh, completely new, mailing list posting to post new versions
of patches.  Never do so using replies.

Thanks.

^ permalink raw reply

* Re: [PATCH v2] net: mv643xx_eth: use managed devm_kzalloc
From: David Miller @ 2013-04-11 17:15 UTC (permalink / raw)
  To: sebastian.hesselbarth
  Cc: buytenh, andrew, jason, florian, sergei.shtylyov, netdev,
	linux-kernel
In-Reply-To: <51665DDF.4060205@gmail.com>

From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Date: Thu, 11 Apr 2013 08:53:19 +0200

> On 04/11/2013 05:39 AM, David Miller wrote:
>> From: Sebastian Hesselbarth<sebastian.hesselbarth@gmail.com>
>> Date: Wed, 10 Apr 2013 22:42:07 +0200
>>
>>> This patch moves shared private data kzalloc to managed devm_kzalloc
>>> and
>>> cleans now unneccessary kfree and error handling.
>>>
>>> Signed-off-by: Sebastian Hesselbarth<sebastian.hesselbarth@gmail.com>
>>
>> This doesn't apply cleanly to the net-next tree.
> 
> Yeah. I sent two single patches for mv643xx_eth, while they should
> have been sent together in one patch set. I'll prepare a cover letter
> and resend both in one patch set.

If you don't number the patches or mention the dependency, there is no way
for people to know.

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Willy Tarreau @ 2013-04-11 17:13 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Andrew Lunn, Jason Cooper, Benjamin Herrenschmidt, linux-kernel,
	Florian Fainelli, Soeren Moch, Paul Mackerras, Lennert Buytenhek,
	Dale Farnsworth, netdev, linuxppc-dev, David S. Miller,
	linux-arm-kernel
In-Reply-To: <CABJ1b_QaEJbuBsqx+J1Pkc7o66S_Rboa7Trnko1mXSK6-UZhfg@mail.gmail.com>

On Thu, Apr 11, 2013 at 06:59:11PM +0200, Sebastian Hesselbarth wrote:
> On Thu, Apr 11, 2013 at 5:32 PM, Willy Tarreau <w@1wt.eu> wrote:
> > On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> >> I don't have a strong opinion on whether Soeren's or your proposal should
> >> be submitted. But I insist on having one of them in, as GRO significantly
> >> improves the common use case, is enabled by default, and not as
> >> constrained as LRO.
> >
> > I agree, use yours first, but we should keep an eye on this. Since you have
> > everything to run a test, please try to see if you can get netperf to run
> > over IPv6, I'm sure the NIC doesn't handle it.
> 
> Willy,
> 
> out of curiosity I replayed all tests using netperf/netserver with -6 which
> enables ipv6. The overall results remain quite the same here:
> enabling support for GRO gives a huge improvement in achievable
> throughput, and the difference between Soeren's and your patch is
> neglectible.

Perfect, thank you for testing this !

Best regards,
Willy

^ permalink raw reply

* Re: [PATCH 0/5 v2] mv643xx_eth: device tree bindings
From: Sebastian Hesselbarth @ 2013-04-11 17:09 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Thomas Petazzoni, Jean-François Moine, Andrew Lunn, netdev,
	devicetree-discuss, Rob Herring, David S. Miller, Grant Likely,
	jogo, linux-arm-kernel, Jamie Lentin, Florian Fainelli,
	Lennert Buytenhek
In-Reply-To: <20130411165303.GI28693@titan.lakedaemon.net>

On Thu, Apr 11, 2013 at 6:53 PM, Jason Cooper <jason@lakedaemon.net> wrote:
> On Thu, Apr 04, 2013 at 12:27:10PM +0200, Florian Fainelli wrote:
>> This patch serie implements mv643xx_eth device tree bindings. I opted for
>> the reuse of the bindings already defined in
>> Documentation/devicetree/bindings/marvell.txt so that we do not create
>> any confusion.
>>
>
> How is the new version of this looking?  Do you think you'll be able to
> get it up in time for DaveM to take it into v3.10?
>
> With this and Thomas' pci series, we will have Kirkwood fully converted
> to devicetree, can begin removing board files, and finally begin
> migrating everything over to mach-mvebu/.  This will lead to the removal
> of five directories under arch/arm/ (plat-orion, mach-kirkwood,
> mach-orion5x, mach-dove, and mach-mv78xx0).

Jason,

I sent Florian an update of some of his patches earlier, based on
cleanup patches I submitted for mv643xx_eth while testing his patch set.
I think, the patch set will play nicely on ARM but without PPC guys involved
in testing it, I guess it will break something there.

Florian is reusing the DT bindings from PPC and they are compatible, but
there parsing is also done within arch/ppc/sysdev/mv64360-dev.c and
maybe somewhere else. The current DT patches for mv643xx_eth will
reparse some properties again (and may break eth on ppc platforms
using mv643xx_eth).

Sebastian

^ permalink raw reply

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
From: Neil Horman @ 2013-04-11 17:04 UTC (permalink / raw)
  To: g; +Cc: Bart Van Assche, David Miller, netdev
In-Reply-To: <2032636.RnfJ0d1BWA@cpaasch-mac>

On Thu, Apr 11, 2013 at 05:54:28PM +0200, Christoph Paasch wrote:
> On Thursday 11 April 2013 11:18:06 Neil Horman wrote:
> > Bart, this patch should fix your problem.  Could you please test it and
> > confirm?
> > 
> > Bart Van Assche recently reported a warning to me:
> > 
> > <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
> >  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
> >  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
> >  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
> >  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
> >  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
> >  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
> >  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
> >  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
> >  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
> >  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
> >  [<ffffffff8146f9f6>] printk+0x4d/0x4f
> >  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> > 
> > This resulted from my commit ca99ca14c which introduced a mutex_trylock
> > operation in a path that could execute in interrupt context.  When mutex
> > debugging is enabled, the above warns the user when we are in fact
> > exectuting in interrupt context.
> > 
> > I think this is a false positive however.  The check is intended to catch
> > users who might be issuing sleeping calls in irq context, but the use of
> > mutex_trylock here is guaranteed not to sleep.
> 
> Even if he does not sleep, may we still hit a deadlock like the following:
> 
> netpoll_rx_disable() calls mutex_lock(), who ends up in __mutex_lock_common, 
> calling spin_lock_mutex().
> 
> Immediatly after that, on the same CPU, the interrupt comes and 
> netpoll_poll_dev calls mutex_trylock and ends up also calling 
> spin_lock_mutex(). Now, it seems to me that we are deadlocked - the interrupt 
> is spinning on the lock, because netpoll_rx_disable() already took it.
> 
> Or maybe I am missing something?
> 
I dont believe the deadlock you describe can happen.  The spin_lock_mutex
operation disables irqs on the local cpu with local_irq_save, so we won't loose
the cpu while we're holding the spinlock.  Likewise we don't restore the irq
flags until after we release said lock.  Once we have the mutex, if we're
preempted by another path that goes through the netpoll_poll_dev path, then we
hit the trylock api call.  The spinlock is either released or held on another
cpu (read: no deadlock), and if the mutex is held, then the trylock simply
fails.

Thanks
Neil

> 
> Cheers,
> Christoph
> 
> -- 
> IP Networking Lab --- http://inl.info.ucl.ac.be
> MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
> UCLouvain
> --
> 

^ permalink raw reply

* Re: this cpu documentation
From: Paul E. McKenney @ 2013-04-11 17:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Randy Dunlap, Eric Dumazet, RongQing Li, Shan Wei, netdev,
	Tejun Heo
In-Reply-To: <0000013dd622cebd-a7fee90b-b297-4e92-9143-87f4771718a4-000000@email.amazonses.com>

On Thu, Apr 04, 2013 at 05:40:38PM +0000, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Randy Dunlap wrote:
> 
> > Thanks.  I have a few more corrections to V2 (please see below).
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V3
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2/V3: Improved after feedback from Randy Dunlap
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Very good to see this!!!

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
> @@ -0,0 +1,197 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add a per cpu variable offset to the processor
> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This means there are no atomicity issues between the calculation
> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
> +instructions like inc/dec/cmpxchg without the lock prefix and the
> +associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency
> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the kernel from
> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local
> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +		of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Let's say we have a percpu structure
> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--;
> +
> +	z = pp->n++;
> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interrupt safe. Some architecture do not support these per
> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. For example.
> +
> +	__this_cpu_inc(x);
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +a Read-Modify-Write operation in the same instruction.
> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with (). The
> +second form also is consistent with the way this_cpu_read() and friends
> +are used.
> +
> +
> +Christoph Lameter, April 4th, 2013
> +
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Sebastian Hesselbarth @ 2013-04-11 16:59 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andrew Lunn, Jason Cooper, Benjamin Herrenschmidt, linux-kernel,
	Florian Fainelli, Soeren Moch, Paul Mackerras, Lennert Buytenhek,
	Dale Farnsworth, netdev, linuxppc-dev, David S. Miller,
	linux-arm-kernel
In-Reply-To: <20130411153256.GH1910@1wt.eu>

On Thu, Apr 11, 2013 at 5:32 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
>> I don't have a strong opinion on whether Soeren's or your proposal should
>> be submitted. But I insist on having one of them in, as GRO significantly
>> improves the common use case, is enabled by default, and not as
>> constrained as LRO.
>
> I agree, use yours first, but we should keep an eye on this. Since you have
> everything to run a test, please try to see if you can get netperf to run
> over IPv6, I'm sure the NIC doesn't handle it.

Willy,

out of curiosity I replayed all tests using netperf/netserver with -6 which
enables ipv6. The overall results remain quite the same here:
enabling support for GRO gives a huge improvement in achievable
throughput, and the difference between Soeren's and your patch is
neglectible.

Sebastian

^ permalink raw reply

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
From: Neil Horman @ 2013-04-11 16:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Bart Van Assche, David Miller, netdev
In-Reply-To: <1365695821.3887.167.camel@edumazet-glaptop>

On Thu, Apr 11, 2013 at 08:57:01AM -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 11:18 -0400, Neil Horman wrote:
> > Bart, this patch should fix your problem.  Could you please test it and confirm?
> > 
> > Bart Van Assche recently reported a warning to me:
> > 
> > <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
> >  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
> >  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
> >  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
> >  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
> >  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
> >  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
> >  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
> >  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
> >  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
> >  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
> >  [<ffffffff8146f9f6>] printk+0x4d/0x4f
> >  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> > 
> > This resulted from my commit ca99ca14c which introduced a mutex_trylock
> > operation in a path that could execute in interrupt context.  When mutex
> > debugging is enabled, the above warns the user when we are in fact exectuting in
> > interrupt context.
> > 
> > I think this is a false positive however.  The check is intended to catch users
> > who might be issuing sleeping calls in irq context, but the use of mutex_trylock
> > here is guaranteed not to sleep.
> > 
> > We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
> > with a __might_sleep call in the appropriate parent mutex operations, but for
> > the sake of effiency (which It seems is why the check was put in the spin lock
> > code only when debug is enabled), lets split the spin_lock_mutex call into two
> > components, where the outer component does the debug checking.  Then
> > mutex_trylock can just call the inner part as its callable from irq context
> > safely.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > Reported-by: Bart Van Assche <bvanassche@acm.org>
> > CC: Bart Van Assche <bvanassche@acm.org>
> > CC: David Miller <davem@davemloft.net>
> > CC: netdev@vger.kernel.org
> 
> Please post your patch to lkml ?
> 
I will, I just wanted Bart to test it first, and CCed everyone that he sent his
origional note too.

Thanks
Neil

> Thanks !
> 
> 
> 

^ permalink raw reply

* Re: [PATCH 0/5 v2] mv643xx_eth: device tree bindings
From: Jason Cooper @ 2013-04-11 16:53 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: davem, thomas.petazzoni, moinejf, andrew, netdev,
	devicetree-discuss, rob.herring, grant.likely, jogo, buytenh, jm,
	linux-arm-kernel, sebastian.hesselbarth
In-Reply-To: <1365071235-11611-1-git-send-email-florian@openwrt.org>

Florian,

On Thu, Apr 04, 2013 at 12:27:10PM +0200, Florian Fainelli wrote:
> Hi all,
> 
> This patch serie implements mv643xx_eth device tree bindings. I opted for
> the reuse of the bindings already defined in
> Documentation/devicetree/bindings/marvell.txt so that we do not create
> any confusion.
> 
> For reference, I have included the mv643xx-eth related nodes in the
> corresponding Kirkwood, Dove and Orion5x .dtsi files so you can more
> easily test on your own platforms.
> 
> I tested these on a custom 88F6181-based boards.
> 
> Florian Fainelli (5):
>   mv643xx_eth: add Device Tree bindings
>   mv643xx_eth: update Device Tree bindings documentation
>   ARM: kirkwood: add device node entries for the gigabit interfaces
>   ARM: orion5x: add gigabit ethernet device tree node
>   ARM: dove: add gigabit device tree nodes to dove.dtsi
> 
>  Documentation/devicetree/bindings/marvell.txt |   25 +++++-
>  arch/arm/boot/dts/dove.dtsi                   |   25 ++++++
>  arch/arm/boot/dts/kirkwood.dtsi               |   46 ++++++++++
>  arch/arm/boot/dts/orion5x.dtsi                |   23 +++++
>  drivers/net/ethernet/marvell/mv643xx_eth.c    |  120 ++++++++++++++++++++++++-
>  5 files changed, 234 insertions(+), 5 deletions(-)

How is the new version of this looking?  Do you think you'll be able to
get it up in time for DaveM to take it into v3.10?

With this and Thomas' pci series, we will have Kirkwood fully converted
to devicetree, can begin removing board files, and finally begin
migrating everything over to mach-mvebu/.  This will lead to the removal
of five directories under arch/arm/ (plat-orion, mach-kirkwood,
mach-orion5x, mach-dove, and mach-mv78xx0).

thx,

Jason.

^ permalink raw reply

* Re: [PATCH] net: usb: active URB submitted multiple times
From: Dan Williams @ 2013-04-11 16:38 UTC (permalink / raw)
  To: Petko Manolov; +Cc: David Miller, netdev, linux-kernel, Sarah Sharp
In-Reply-To: <alpine.DEB.2.02.1304110956110.4237@fry>

On Thu, 2013-04-11 at 10:09 +0300, Petko Manolov wrote:
> From: Petko Manolov <petkan@nucleusys.com>
> 
> (For inclusion in 3.10, diff against latest net-next.)

Your mail client replaced tabs with spaces.  Make sure when adding the
patch to the mail, you choose the "preformatted" option or whatever the
client has to ensure that the patch text itself is not modified when
pasting it into the mail.

Dan

> Pegasus driver used single callback for sync and async control URBs. 
> Special flags were employed to distinguish between both, but due to flawed 
> logic (as Sarah Sharp spotted) it didn't always work.  As a result of this 
> change [get|set]_registers() are now much simpler.  Async write is also 
> leaner and does not use single, statically allocated memory for 
> usb_ctrlrequest, which is another potential race when asynchronously 
> submitting URBs.
> 
> The socket buffer pool for the receive path is now gone.  It's existence 
> didn't make much difference (performance-wise) and the code is better off 
> without the spinlocks protecting it.
> 
> Largely duplicated code in routines reading and writing MII registers is 
> now packed in __mii_op().
> 
> Adding URL for the public pegasus git repository.
> 
> Signed-off-by: Petko Manolov <petkan@nucleusys.com>
> ---
>   MAINTAINERS               |    6 +-
>   drivers/net/usb/pegasus.c |  601 +++++++++++++--------------------------------
>   drivers/net/usb/pegasus.h |   10 +-
>   3 files changed, 181 insertions(+), 436 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c39bdc3..863d8cb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8358,9 +8358,10 @@ S:       Maintained
>   F:    drivers/usb/serial/option.c
> 
>   USB PEGASUS DRIVER
> -M:     Petko Manolov <petkan@users.sourceforge.net>
> +M:     Petko Manolov <petkan@nucleusys.com>
>   L:    linux-usb@vger.kernel.org
>   L:    netdev@vger.kernel.org
> +T:     git git://git.code.sf.net/p/pegasus2/git
>   W:    http://pegasus2.sourceforge.net/
>   S:    Maintained
>   F:    drivers/net/usb/pegasus.*
> @@ -8380,9 +8381,10 @@ S:       Supported
>   F:    drivers/usb/class/usblp.c
> 
>   USB RTL8150 DRIVER
> -M:     Petko Manolov <petkan@users.sourceforge.net>
> +M:     Petko Manolov <petkan@nucleusys.com>
>   L:    linux-usb@vger.kernel.org
>   L:    netdev@vger.kernel.org
> +T:     git git://git.code.sf.net/p/pegasus2/git
>   W:    http://pegasus2.sourceforge.net/
>   S:    Maintained
>   F:    drivers/net/usb/rtl8150.c
> diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
> index 73051d1..501db20 100644
> --- a/drivers/net/usb/pegasus.c
> +++ b/drivers/net/usb/pegasus.c
> @@ -1,5 +1,5 @@
>   /*
> - *  Copyright (c) 1999-2005 Petko Manolov (petkan@users.sourceforge.net)
> + *  Copyright (c) 1999-2013 Petko Manolov (petkan@nucleusys.com)
>    *
>    * This program is free software; you can redistribute it and/or modify
>    * it under the terms of the GNU General Public License version 2 as
> @@ -26,6 +26,9 @@
>    *            v0.5.1  ethtool support added
>    *            v0.5.5  rx socket buffers are in a pool and the their allocation
>    *                    is out of the interrupt routine.
> + *             ...
> + *             v0.9.1  simplified [get|set]_register(s), async update registers
> + *                     logic revisited, receive skb_pool removed.
>    */
> 
>   #include <linux/sched.h>
> @@ -45,8 +48,8 @@
>   /*
>    * Version Information
>    */
> -#define DRIVER_VERSION "v0.6.14 (2006/09/27)"
> -#define DRIVER_AUTHOR "Petko Manolov <petkan@users.sourceforge.net>"
> +#define DRIVER_VERSION "v0.9.3 (2013/04/09)"
> +#define DRIVER_AUTHOR "Petko Manolov <petkan@nucleusys.com>"
>   #define DRIVER_DESC "Pegasus/Pegasus II USB Ethernet driver"
> 
>   static const char driver_name[] = "pegasus";
> @@ -108,299 +111,154 @@ MODULE_PARM_DESC(msg_level, "Override default message level");
>   MODULE_DEVICE_TABLE(usb, pegasus_ids);
>   static const struct net_device_ops pegasus_netdev_ops;
> 
> -static int update_eth_regs_async(pegasus_t *);
> -/* Aargh!!! I _really_ hate such tweaks */
> -static void ctrl_callback(struct urb *urb)
> +static void async_ctrl_callback(struct urb *urb)
>   {
> -       pegasus_t *pegasus = urb->context;
> +       struct usb_ctrlrequest *req = (struct usb_ctrlrequest *)urb->context;
>         int status = urb->status;
> 
> -       if (!pegasus)
> -               return;
> -
> -       switch (status) {
> -       case 0:
> -               if (pegasus->flags & ETH_REGS_CHANGE) {
> -                       pegasus->flags &= ~ETH_REGS_CHANGE;
> -                       pegasus->flags |= ETH_REGS_CHANGED;
> -                       update_eth_regs_async(pegasus);
> -                       return;
> -               }
> -               break;
> -       case -EINPROGRESS:
> -               return;
> -       case -ENOENT:
> -               break;
> -       default:
> -               if (net_ratelimit())
> -                       netif_dbg(pegasus, drv, pegasus->net,
> -                                 "%s, status %d\n", __func__, status);
> -               break;
> -       }
> -       pegasus->flags &= ~ETH_REGS_CHANGED;
> -       wake_up(&pegasus->ctrl_wait);
> +       if (status < 0)
> +               dev_dbg(&urb->dev->dev, "%s failed with %d", __func__, status);
> +       kfree(req);
> +       usb_free_urb(urb);
>   }
> 
> -static int get_registers(pegasus_t *pegasus, __u16 indx, __u16 size,
> +static int get_registers(pegasus_t * pegasus, __u16 indx, __u16 size,
>                          void *data)
>   {
>         int ret;
> -       char *buffer;
> -       DECLARE_WAITQUEUE(wait, current);
> -
> -       buffer = kmalloc(size, GFP_KERNEL);
> -       if (!buffer)
> -               return -ENOMEM;
> -
> -       add_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_UNINTERRUPTIBLE);
> -       while (pegasus->flags & ETH_REGS_CHANGED)
> -               schedule();
> -       remove_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_RUNNING);
> -
> -       pegasus->dr.bRequestType = PEGASUS_REQT_READ;
> -       pegasus->dr.bRequest = PEGASUS_REQ_GET_REGS;
> -       pegasus->dr.wValue = cpu_to_le16(0);
> -       pegasus->dr.wIndex = cpu_to_le16(indx);
> -       pegasus->dr.wLength = cpu_to_le16(size);
> -       pegasus->ctrl_urb->transfer_buffer_length = size;
> -
> -       usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
> -                            usb_rcvctrlpipe(pegasus->usb, 0),
> -                            (char *) &pegasus->dr,
> -                            buffer, size, ctrl_callback, pegasus);
> -
> -       add_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_UNINTERRUPTIBLE);
> -
> -       /* using ATOMIC, we'd never wake up if we slept */
> -       if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
> -               set_current_state(TASK_RUNNING);
> -               if (ret == -ENODEV)
> -                       netif_device_detach(pegasus->net);
> -               if (net_ratelimit())
> -                       netif_err(pegasus, drv, pegasus->net,
> -                                 "%s, status %d\n", __func__, ret);
> -               goto out;
> -       }
> -
> -       schedule();
> -out:
> -       remove_wait_queue(&pegasus->ctrl_wait, &wait);
> -       memcpy(data, buffer, size);
> -       kfree(buffer);
> 
> +       ret = usb_control_msg(pegasus->usb, usb_rcvctrlpipe(pegasus->usb, 0),
> +                             PEGASUS_REQ_GET_REGS, PEGASUS_REQT_READ, 0,
> +                             indx, data, size, 1000);
> +       if (ret < 0)
> +               netif_dbg(pegasus, drv, pegasus->net,
> +                         "%s returned %d\n", __func__, ret);
>         return ret;
>   }
> 
> -static int set_registers(pegasus_t *pegasus, __u16 indx, __u16 size,
> +static int set_registers(pegasus_t * pegasus, __u16 indx, __u16 size,
>                          void *data)
>   {
>         int ret;
> -       char *buffer;
> -       DECLARE_WAITQUEUE(wait, current);
> -
> -       buffer = kmemdup(data, size, GFP_KERNEL);
> -       if (!buffer) {
> -               netif_warn(pegasus, drv, pegasus->net,
> -                          "out of memory in %s\n", __func__);
> -               return -ENOMEM;
> -       }
> -
> -       add_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_UNINTERRUPTIBLE);
> -       while (pegasus->flags & ETH_REGS_CHANGED)
> -               schedule();
> -       remove_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_RUNNING);
> -
> -       pegasus->dr.bRequestType = PEGASUS_REQT_WRITE;
> -       pegasus->dr.bRequest = PEGASUS_REQ_SET_REGS;
> -       pegasus->dr.wValue = cpu_to_le16(0);
> -       pegasus->dr.wIndex = cpu_to_le16(indx);
> -       pegasus->dr.wLength = cpu_to_le16(size);
> -       pegasus->ctrl_urb->transfer_buffer_length = size;
> -
> -       usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
> -                            usb_sndctrlpipe(pegasus->usb, 0),
> -                            (char *) &pegasus->dr,
> -                            buffer, size, ctrl_callback, pegasus);
> -
> -       add_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_UNINTERRUPTIBLE);
> -
> -       if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
> -               if (ret == -ENODEV)
> -                       netif_device_detach(pegasus->net);
> -               netif_err(pegasus, drv, pegasus->net,
> -                         "%s, status %d\n", __func__, ret);
> -               goto out;
> -       }
> -
> -       schedule();
> -out:
> -       remove_wait_queue(&pegasus->ctrl_wait, &wait);
> -       kfree(buffer);
> 
> +       ret = usb_control_msg(pegasus->usb, usb_sndctrlpipe(pegasus->usb, 0),
> +                             PEGASUS_REQ_SET_REGS, PEGASUS_REQT_WRITE, 0,
> +                             indx, data, size, 100);
> +       if (ret < 0)
> +               netif_dbg(pegasus, drv, pegasus->net,
> +                         "%s returned %d\n", __func__, ret);
>         return ret;
>   }
> 
> -static int set_register(pegasus_t *pegasus, __u16 indx, __u8 data)
> +static int set_register(pegasus_t * pegasus, __u16 indx, __u8 data)
>   {
>         int ret;
> -       char *tmp;
> -       DECLARE_WAITQUEUE(wait, current);
> -
> -       tmp = kmemdup(&data, 1, GFP_KERNEL);
> -       if (!tmp) {
> -               netif_warn(pegasus, drv, pegasus->net,
> -                          "out of memory in %s\n", __func__);
> -               return -ENOMEM;
> -       }
> -       add_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_UNINTERRUPTIBLE);
> -       while (pegasus->flags & ETH_REGS_CHANGED)
> -               schedule();
> -       remove_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_RUNNING);
> -
> -       pegasus->dr.bRequestType = PEGASUS_REQT_WRITE;
> -       pegasus->dr.bRequest = PEGASUS_REQ_SET_REG;
> -       pegasus->dr.wValue = cpu_to_le16(data);
> -       pegasus->dr.wIndex = cpu_to_le16(indx);
> -       pegasus->dr.wLength = cpu_to_le16(1);
> -       pegasus->ctrl_urb->transfer_buffer_length = 1;
> -
> -       usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
> -                            usb_sndctrlpipe(pegasus->usb, 0),
> -                            (char *) &pegasus->dr,
> -                            tmp, 1, ctrl_callback, pegasus);
> -
> -       add_wait_queue(&pegasus->ctrl_wait, &wait);
> -       set_current_state(TASK_UNINTERRUPTIBLE);
> -
> -       if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
> -               if (ret == -ENODEV)
> -                       netif_device_detach(pegasus->net);
> -               if (net_ratelimit())
> -                       netif_err(pegasus, drv, pegasus->net,
> -                                 "%s, status %d\n", __func__, ret);
> -               goto out;
> -       }
> -
> -       schedule();
> -out:
> -       remove_wait_queue(&pegasus->ctrl_wait, &wait);
> -       kfree(tmp);
> 
> +       ret = usb_control_msg(pegasus->usb, usb_sndctrlpipe(pegasus->usb, 0),
> +                             PEGASUS_REQ_SET_REG, PEGASUS_REQT_WRITE, data,
> +                             indx, &data, 1, 1000);
> +       if (ret < 0)
> +               netif_dbg(pegasus, drv, pegasus->net,
> +                         "%s returned %d\n", __func__, ret);
>         return ret;
>   }
> 
> -static int update_eth_regs_async(pegasus_t *pegasus)
> +static int update_eth_regs_async(pegasus_t * pegasus)
>   {
> -       int ret;
> +       int ret = -ENOMEM;
> +       struct urb *async_urb;
> +       struct usb_ctrlrequest *req;
> 
> -       pegasus->dr.bRequestType = PEGASUS_REQT_WRITE;
> -       pegasus->dr.bRequest = PEGASUS_REQ_SET_REGS;
> -       pegasus->dr.wValue = cpu_to_le16(0);
> -       pegasus->dr.wIndex = cpu_to_le16(EthCtrl0);
> -       pegasus->dr.wLength = cpu_to_le16(3);
> -       pegasus->ctrl_urb->transfer_buffer_length = 3;
> +       req = kmalloc(sizeof(struct usb_ctrlrequest), GFP_ATOMIC);
> +       if (req == NULL)
> +               return ret;
> 
> -       usb_fill_control_urb(pegasus->ctrl_urb, pegasus->usb,
> -                            usb_sndctrlpipe(pegasus->usb, 0),
> -                            (char *) &pegasus->dr,
> -                            pegasus->eth_regs, 3, ctrl_callback, pegasus);
> +       if ((async_urb = usb_alloc_urb(0, GFP_ATOMIC)) == NULL) {
> +               kfree(req);
> +               return ret;
> +       }
> +       req->bRequestType = PEGASUS_REQT_WRITE;
> +       req->bRequest = PEGASUS_REQ_SET_REGS;
> +       req->wValue = cpu_to_le16(0);
> +       req->wIndex = cpu_to_le16(EthCtrl0);
> +       req->wLength = cpu_to_le16(3);
> +
> +       usb_fill_control_urb(async_urb, pegasus->usb,
> +                            usb_sndctrlpipe(pegasus->usb, 0), (void *)req,
> +                            pegasus->eth_regs, 3, async_ctrl_callback, req);
> 
> -       if ((ret = usb_submit_urb(pegasus->ctrl_urb, GFP_ATOMIC))) {
> +       if ((ret = usb_submit_urb(async_urb, GFP_ATOMIC))) {
>                 if (ret == -ENODEV)
>                         netif_device_detach(pegasus->net);
>                 netif_err(pegasus, drv, pegasus->net,
> -                         "%s, status %d\n", __func__, ret);
> +                         "%s returned %d\n", __func__, ret);
>         }
> -
>         return ret;
>   }
> 
> -/* Returns 0 on success, error on failure */
> -static int read_mii_word(pegasus_t *pegasus, __u8 phy, __u8 indx, __u16 *regd)
> +static int __mii_op(pegasus_t * p, __u8 phy, __u8 indx, __u16 * regd, __u8 cmd)
>   {
>         int i;
>         __u8 data[4] = { phy, 0, 0, indx };
>         __le16 regdi;
> -       int ret;
> +       int ret = -ETIMEDOUT;
> 
> -       set_register(pegasus, PhyCtrl, 0);
> -       set_registers(pegasus, PhyAddr, sizeof(data), data);
> -       set_register(pegasus, PhyCtrl, (indx | PHY_READ));
> +       if (cmd & PHY_WRITE) {
> +               __le16 *t = (__le16 *) & data[1];
> +               *t = cpu_to_le16(*regd);
> +       }
> +       set_register(p, PhyCtrl, 0);
> +       set_registers(p, PhyAddr, sizeof(data), data);
> +       set_register(p, PhyCtrl, (indx | cmd));
>         for (i = 0; i < REG_TIMEOUT; i++) {
> -               ret = get_registers(pegasus, PhyCtrl, 1, data);
> -               if (ret == -ESHUTDOWN)
> +               ret = get_registers(p, PhyCtrl, 1, data);
> +               if (ret < 0)
>                         goto fail;
>                 if (data[0] & PHY_DONE)
>                         break;
>         }
> -
>         if (i >= REG_TIMEOUT)
>                 goto fail;
> -
> -       ret = get_registers(pegasus, PhyData, 2, &regdi);
> -       *regd = le16_to_cpu(regdi);
> +       if (cmd & PHY_READ) {
> +               ret = get_registers(p, PhyData, 2, &regdi);
> +               *regd = le16_to_cpu(regdi);
> +               return ret;
> +       }
> +       return 0;
> +fail:
> +       netif_dbg(p, drv, p->net, "%s failed\n", __func__);
>         return ret;
> +}
> 
> -fail:
> -       netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
> +/* Returns non-negative int on success, error on failure */
> +static int read_mii_word(pegasus_t * pegasus, __u8 phy, __u8 indx, __u16 * regd)
> +{
> +       return __mii_op(pegasus, phy, indx, regd, PHY_READ);
> +}
> 
> -       return ret;
> +/* Returns zero on success, error on failure */
> +static int write_mii_word(pegasus_t * pegasus, __u8 phy, __u8 indx, __u16 * regd)
> +{
> +       return __mii_op(pegasus, phy, indx, regd, PHY_WRITE);
>   }
> 
>   static int mdio_read(struct net_device *dev, int phy_id, int loc)
>   {
>         pegasus_t *pegasus = netdev_priv(dev);
> -       u16 res;
> +       __u16 res;
> 
>         read_mii_word(pegasus, phy_id, loc, &res);
>         return (int)res;
>   }
> 
> -static int write_mii_word(pegasus_t *pegasus, __u8 phy, __u8 indx, __u16 regd)
> -{
> -       int i;
> -       __u8 data[4] = { phy, 0, 0, indx };
> -       int ret;
> -
> -       data[1] = (u8) regd;
> -       data[2] = (u8) (regd >> 8);
> -       set_register(pegasus, PhyCtrl, 0);
> -       set_registers(pegasus, PhyAddr, sizeof(data), data);
> -       set_register(pegasus, PhyCtrl, (indx | PHY_WRITE));
> -       for (i = 0; i < REG_TIMEOUT; i++) {
> -               ret = get_registers(pegasus, PhyCtrl, 1, data);
> -               if (ret == -ESHUTDOWN)
> -                       goto fail;
> -               if (data[0] & PHY_DONE)
> -                       break;
> -       }
> -
> -       if (i >= REG_TIMEOUT)
> -               goto fail;
> -
> -       return ret;
> -
> -fail:
> -       netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
> -       return -ETIMEDOUT;
> -}
> -
>   static void mdio_write(struct net_device *dev, int phy_id, int loc, int val)
>   {
>         pegasus_t *pegasus = netdev_priv(dev);
> 
> -       write_mii_word(pegasus, phy_id, loc, val);
> +       write_mii_word(pegasus, phy_id, loc, (__u16 *) & val);
>   }
> 
> -static int read_eprom_word(pegasus_t *pegasus, __u8 index, __u16 *retdata)
> +static int read_eprom_word(pegasus_t * pegasus, __u8 index, __u16 * retdata)
>   {
>         int i;
>         __u8 tmp;
> @@ -420,18 +278,16 @@ static int read_eprom_word(pegasus_t *pegasus, __u8 index, __u16 *retdata)
>         }
>         if (i >= REG_TIMEOUT)
>                 goto fail;
> -
>         ret = get_registers(pegasus, EpromData, 2, &retdatai);
>         *retdata = le16_to_cpu(retdatai);
>         return ret;
> -
>   fail:
>         netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
>         return -ETIMEDOUT;
>   }
> 
>   #ifdef        PEGASUS_WRITE_EEPROM
> -static inline void enable_eprom_write(pegasus_t *pegasus)
> +static inline void enable_eprom_write(pegasus_t * pegasus)
>   {
>         __u8 tmp;
>         int ret;
> @@ -440,7 +296,7 @@ static inline void enable_eprom_write(pegasus_t *pegasus)
>         set_register(pegasus, EthCtrl2, tmp | EPROM_WR_ENABLE);
>   }
> 
> -static inline void disable_eprom_write(pegasus_t *pegasus)
> +static inline void disable_eprom_write(pegasus_t * pegasus)
>   {
>         __u8 tmp;
>         int ret;
> @@ -450,7 +306,7 @@ static inline void disable_eprom_write(pegasus_t *pegasus)
>         set_register(pegasus, EthCtrl2, tmp & ~EPROM_WR_ENABLE);
>   }
> 
> -static int write_eprom_word(pegasus_t *pegasus, __u8 index, __u16 data)
> +static int write_eprom_word(pegasus_t * pegasus, __u8 index, __u16 data)
>   {
>         int i;
>         __u8 tmp, d[4] = { 0x3f, 0, 0, EPROM_WRITE };
> @@ -473,16 +329,14 @@ static int write_eprom_word(pegasus_t *pegasus, __u8 index, __u16 data)
>         disable_eprom_write(pegasus);
>         if (i >= REG_TIMEOUT)
>                 goto fail;
> -
>         return ret;
> -
>   fail:
>         netif_warn(pegasus, drv, pegasus->net, "%s failed\n", __func__);
>         return -ETIMEDOUT;
>   }
> -#endif                         /* PEGASUS_WRITE_EEPROM */
> +#endif /* PEGASUS_WRITE_EEPROM */
> 
> -static inline void get_node_id(pegasus_t *pegasus, __u8 *id)
> +static inline void get_node_id(pegasus_t * pegasus, __u8 * id)
>   {
>         int i;
>         __u16 w16;
> @@ -493,7 +347,7 @@ static inline void get_node_id(pegasus_t *pegasus, __u8 *id)
>         }
>   }
> 
> -static void set_ethernet_addr(pegasus_t *pegasus)
> +static void set_ethernet_addr(pegasus_t * pegasus)
>   {
>         __u8 node_id[6];
> 
> @@ -506,7 +360,7 @@ static void set_ethernet_addr(pegasus_t *pegasus)
>         memcpy(pegasus->net->dev_addr, node_id, sizeof(node_id));
>   }
> 
> -static inline int reset_mac(pegasus_t *pegasus)
> +static inline int reset_mac(pegasus_t * pegasus)
>   {
>         __u8 data = 0x8;
>         int i;
> @@ -528,7 +382,6 @@ static inline int reset_mac(pegasus_t *pegasus)
>         }
>         if (i == REG_TIMEOUT)
>                 return -ETIMEDOUT;
> -
>         if (usb_dev_id[pegasus->dev_index].vendor == VENDOR_LINKSYS ||
>             usb_dev_id[pegasus->dev_index].vendor == VENDOR_DLINK) {
>                 set_register(pegasus, Gpio0, 0x24);
> @@ -537,18 +390,17 @@ static inline int reset_mac(pegasus_t *pegasus)
>         if (usb_dev_id[pegasus->dev_index].vendor == VENDOR_ELCON) {
>                 __u16 auxmode;
>                 read_mii_word(pegasus, 3, 0x1b, &auxmode);
> -               write_mii_word(pegasus, 3, 0x1b, auxmode | 4);
> +               auxmode |= 4;
> +               write_mii_word(pegasus, 3, 0x1b, &auxmode);
>         }
> -
>         return 0;
>   }
> 
> -static int enable_net_traffic(struct net_device *dev, struct usb_device *usb)
> +static void enable_net_traffic(struct net_device *dev, struct usb_device *usb)
>   {
>         __u16 linkpart;
>         __u8 data[4];
>         pegasus_t *pegasus = netdev_priv(dev);
> -       int ret;
> 
>         read_mii_word(pegasus, pegasus->phy, MII_LPA, &linkpart);
>         data[0] = 0xc9;
> @@ -562,62 +414,16 @@ static int enable_net_traffic(struct net_device *dev, struct usb_device *usb)
>         data[2] = loopback ? 0x09 : 0x01;
> 
>         memcpy(pegasus->eth_regs, data, sizeof(data));
> -       ret = set_registers(pegasus, EthCtrl0, 3, data);
> +       set_registers(pegasus, EthCtrl0, 3, data);
> 
>         if (usb_dev_id[pegasus->dev_index].vendor == VENDOR_LINKSYS ||
>             usb_dev_id[pegasus->dev_index].vendor == VENDOR_LINKSYS2 ||
>             usb_dev_id[pegasus->dev_index].vendor == VENDOR_DLINK) {
> -               u16 auxmode;
> +               __u16 auxmode;
>                 read_mii_word(pegasus, 0, 0x1b, &auxmode);
> -               write_mii_word(pegasus, 0, 0x1b, auxmode | 4);
> +               auxmode |= 4;
> +               write_mii_word(pegasus, 0, 0x1b, &auxmode);
>         }
> -
> -       return ret;
> -}
> -
> -static void fill_skb_pool(pegasus_t *pegasus)
> -{
> -       int i;
> -
> -       for (i = 0; i < RX_SKBS; i++) {
> -               if (pegasus->rx_pool[i])
> -                       continue;
> -               pegasus->rx_pool[i] = dev_alloc_skb(PEGASUS_MTU + 2);
> -               /*
> -                ** we give up if the allocation fail. the tasklet will be
> -                ** rescheduled again anyway...
> -                */
> -               if (pegasus->rx_pool[i] == NULL)
> -                       return;
> -               skb_reserve(pegasus->rx_pool[i], 2);
> -       }
> -}
> -
> -static void free_skb_pool(pegasus_t *pegasus)
> -{
> -       int i;
> -
> -       for (i = 0; i < RX_SKBS; i++) {
> -               if (pegasus->rx_pool[i]) {
> -                       dev_kfree_skb(pegasus->rx_pool[i]);
> -                       pegasus->rx_pool[i] = NULL;
> -               }
> -       }
> -}
> -
> -static inline struct sk_buff *pull_skb(pegasus_t * pegasus)
> -{
> -       int i;
> -       struct sk_buff *skb;
> -
> -       for (i = 0; i < RX_SKBS; i++) {
> -               if (likely(pegasus->rx_pool[i] != NULL)) {
> -                       skb = pegasus->rx_pool[i];
> -                       pegasus->rx_pool[i] = NULL;
> -                       return skb;
> -               }
> -       }
> -       return NULL;
>   }
> 
>   static void read_bulk_callback(struct urb *urb)
> @@ -643,7 +449,7 @@ static void read_bulk_callback(struct urb *urb)
>                 netif_dbg(pegasus, rx_err, net, "reset MAC\n");
>                 pegasus->flags &= ~PEGASUS_RX_BUSY;
>                 break;
> -       case -EPIPE:            /* stall, or disconnect from TT */
> +       case -EPIPE:    /* stall, or disconnect from TT */
>                 /* FIXME schedule work to clear the halt */
>                 netif_warn(pegasus, rx_err, net, "no rx stall recovery\n");
>                 return;
> @@ -665,16 +471,16 @@ static void read_bulk_callback(struct urb *urb)
>                 netif_dbg(pegasus, rx_err, net,
>                           "RX packet error %x\n", rx_status);
>                 pegasus->stats.rx_errors++;
> -               if (rx_status & 0x06)   /* long or runt */
> +               if (rx_status & 0x06)   /* long or runt */
>                         pegasus->stats.rx_length_errors++;
>                 if (rx_status & 0x08)
>                         pegasus->stats.rx_crc_errors++;
> -               if (rx_status & 0x10)   /* extra bits   */
> +               if (rx_status & 0x10)   /* extra bits */
>                         pegasus->stats.rx_frame_errors++;
>                 goto goon;
>         }
>         if (pegasus->chip == 0x8513) {
> -               pkt_len = le32_to_cpu(*(__le32 *)urb->transfer_buffer);
> +               pkt_len = le32_to_cpu(*(__le32 *) urb->transfer_buffer);
>                 pkt_len &= 0x0fff;
>                 pegasus->rx_skb->data += 2;
>         } else {
> @@ -683,14 +489,12 @@ static void read_bulk_callback(struct urb *urb)
>                 pkt_len &= 0xfff;
>                 pkt_len -= 8;
>         }
> -
>         /*
>          * If the packet is unreasonably long, quietly drop it rather than
>          * kernel panicing by calling skb_put.
>          */
>         if (pkt_len > PEGASUS_MTU)
>                 goto goon;
> -
>         /*
>          * at this point we are sure pegasus->rx_skb != NULL
>          * so we go ahead and pass up the packet.
> @@ -704,10 +508,8 @@ static void read_bulk_callback(struct urb *urb)
>         if (pegasus->flags & PEGASUS_UNPLUG)
>                 return;
> 
> -       spin_lock(&pegasus->rx_pool_lock);
> -       pegasus->rx_skb = pull_skb(pegasus);
> -       spin_unlock(&pegasus->rx_pool_lock);
> -
> +       pegasus->rx_skb = __netdev_alloc_skb_ip_align(pegasus->net,
> +                                                     PEGASUS_MTU, GFP_ATOMIC);
>         if (pegasus->rx_skb == NULL)
>                 goto tl_sched;
>   goon:
> @@ -724,9 +526,7 @@ goon:
>         } else {
>                 pegasus->flags &= ~PEGASUS_RX_URB_FAIL;
>         }
> -
>         return;
> -
>   tl_sched:
>         tasklet_schedule(&pegasus->rx_tl);
>   }
> @@ -734,24 +534,22 @@ tl_sched:
>   static void rx_fixup(unsigned long data)
>   {
>         pegasus_t *pegasus;
> -       unsigned long flags;
>         int status;
> 
>         pegasus = (pegasus_t *) data;
>         if (pegasus->flags & PEGASUS_UNPLUG)
>                 return;
> -
> -       spin_lock_irqsave(&pegasus->rx_pool_lock, flags);
> -       fill_skb_pool(pegasus);
>         if (pegasus->flags & PEGASUS_RX_URB_FAIL)
>                 if (pegasus->rx_skb)
>                         goto try_again;
>         if (pegasus->rx_skb == NULL)
> -               pegasus->rx_skb = pull_skb(pegasus);
> +               pegasus->rx_skb = __netdev_alloc_skb_ip_align(pegasus->net,
> +                                                             PEGASUS_MTU,
> +                                                             GFP_ATOMIC);
>         if (pegasus->rx_skb == NULL) {
>                 netif_warn(pegasus, rx_err, pegasus->net, "low on memory\n");
>                 tasklet_schedule(&pegasus->rx_tl);
> -               goto done;
> +               return;
>         }
>         usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
>                           usb_rcvbulkpipe(pegasus->usb, 1),
> @@ -767,8 +565,6 @@ try_again:
>         } else {
>                 pegasus->flags &= ~PEGASUS_RX_URB_FAIL;
>         }
> -done:
> -       spin_unlock_irqrestore(&pegasus->rx_pool_lock, flags);
>   }
> 
>   static void write_bulk_callback(struct urb *urb)
> @@ -779,12 +575,9 @@ static void write_bulk_callback(struct urb *urb)
> 
>         if (!pegasus)
>                 return;
> -
>         net = pegasus->net;
> -
>         if (!netif_device_present(net) || !netif_running(net))
>                 return;
> -
>         switch (status) {
>         case -EPIPE:
>                 /* FIXME schedule_work() to clear the tx halt */
> @@ -802,8 +595,7 @@ static void write_bulk_callback(struct urb *urb)
>         case 0:
>                 break;
>         }
> -
> -       net->trans_start = jiffies; /* prevent tx timeout */
> +       net->trans_start = jiffies;     /* prevent tx timeout */
>         netif_wake_queue(net);
>   }
> 
> @@ -816,7 +608,6 @@ static void intr_callback(struct urb *urb)
>         if (!pegasus)
>                 return;
>         net = pegasus->net;
> -
>         switch (status) {
>         case 0:
>                 break;
> @@ -830,13 +621,12 @@ static void intr_callback(struct urb *urb)
>                  */
>                 netif_dbg(pegasus, timer, net, "intr status %d\n", status);
>         }
> -
>         if (urb->actual_length >= 6) {
>                 u8 *d = urb->transfer_buffer;
> 
>                 /* byte 0 == tx_status1, reg 2B */
> -               if (d[0] & (TX_UNDERRUN|EXCESSIVE_COL
> -                                       |LATE_COL|JABBER_TIMEOUT)) {
> +               if (d[0] & (TX_UNDERRUN | EXCESSIVE_COL
> +                           | LATE_COL | JABBER_TIMEOUT)) {
>                         pegasus->stats.tx_errors++;
>                         if (d[0] & TX_UNDERRUN)
>                                 pegasus->stats.tx_fifo_errors++;
> @@ -854,7 +644,6 @@ static void intr_callback(struct urb *urb)
>                 /* bytes 3-4 == rx_lostpkt, reg 2E/2F */
>                 pegasus->stats.rx_missed_errors += ((d[3] & 0x7f) << 8) | d[4];
>         }
> -
>         res = usb_submit_urb(urb, GFP_ATOMIC);
>         if (res == -ENODEV)
>                 netif_device_detach(pegasus->net);
> @@ -872,7 +661,7 @@ static void pegasus_tx_timeout(struct net_device *net)
>   }
> 
>   static netdev_tx_t pegasus_start_xmit(struct sk_buff *skb,
> -                                           struct net_device *net)
> +                                     struct net_device *net)
>   {
>         pegasus_t *pegasus = netdev_priv(net);
>         int count = ((skb->len + 2) & 0x3f) ? skb->len + 2 : skb->len + 3;
> @@ -890,10 +679,10 @@ static netdev_tx_t pegasus_start_xmit(struct sk_buff *skb,
>         if ((res = usb_submit_urb(pegasus->tx_urb, GFP_ATOMIC))) {
>                 netif_warn(pegasus, tx_err, net, "fail tx, %d\n", res);
>                 switch (res) {
> -               case -EPIPE:            /* stall, or disconnect from TT */
> +               case -EPIPE:    /* stall, or disconnect from TT */
>                         /* cleanup should already have been scheduled */
>                         break;
> -               case -ENODEV:           /* disconnect() upcoming */
> +               case -ENODEV:   /* disconnect() upcoming */
>                 case -EPERM:
>                         netif_device_detach(pegasus->net);
>                         break;
> @@ -915,14 +704,14 @@ static struct net_device_stats *pegasus_netdev_stats(struct net_device *dev)
>         return &((pegasus_t *) netdev_priv(dev))->stats;
>   }
> 
> -static inline void disable_net_traffic(pegasus_t *pegasus)
> +static inline void disable_net_traffic(pegasus_t * pegasus)
>   {
>         __le16 tmp = cpu_to_le16(0);
> 
>         set_registers(pegasus, EthCtrl0, sizeof(tmp), &tmp);
>   }
> 
> -static inline void get_interrupt_interval(pegasus_t *pegasus)
> +static inline void get_interrupt_interval(pegasus_t * pegasus)
>   {
>         u16 data;
>         u8 interval;
> @@ -935,7 +724,7 @@ static inline void get_interrupt_interval(pegasus_t *pegasus)
>                                    "intr interval changed from %ums to %ums\n",
>                                    interval, 0x80);
>                         interval = 0x80;
> -                       data = (data & 0x00FF) | ((u16)interval << 8);
> +                       data = (data & 0x00FF) | ((u16) interval << 8);
>   #ifdef PEGASUS_WRITE_EEPROM
>                         write_eprom_word(pegasus, 4, data);
>   #endif
> @@ -951,71 +740,58 @@ static void set_carrier(struct net_device *net)
> 
>         if (read_mii_word(pegasus, pegasus->phy, MII_BMSR, &tmp))
>                 return;
> -
>         if (tmp & BMSR_LSTATUS)
>                 netif_carrier_on(net);
>         else
>                 netif_carrier_off(net);
>   }
> 
> -static void free_all_urbs(pegasus_t *pegasus)
> +static void free_all_urbs(pegasus_t * pegasus)
>   {
>         usb_free_urb(pegasus->intr_urb);
>         usb_free_urb(pegasus->tx_urb);
>         usb_free_urb(pegasus->rx_urb);
> -       usb_free_urb(pegasus->ctrl_urb);
>   }
> 
> -static void unlink_all_urbs(pegasus_t *pegasus)
> +static void unlink_all_urbs(pegasus_t * pegasus)
>   {
>         usb_kill_urb(pegasus->intr_urb);
>         usb_kill_urb(pegasus->tx_urb);
>         usb_kill_urb(pegasus->rx_urb);
> -       usb_kill_urb(pegasus->ctrl_urb);
>   }
> 
> -static int alloc_urbs(pegasus_t *pegasus)
> +static int alloc_urbs(pegasus_t * pegasus)
>   {
> -       pegasus->ctrl_urb = usb_alloc_urb(0, GFP_KERNEL);
> -       if (!pegasus->ctrl_urb)
> -               return 0;
>         pegasus->rx_urb = usb_alloc_urb(0, GFP_KERNEL);
>         if (!pegasus->rx_urb) {
> -               usb_free_urb(pegasus->ctrl_urb);
>                 return 0;
>         }
>         pegasus->tx_urb = usb_alloc_urb(0, GFP_KERNEL);
>         if (!pegasus->tx_urb) {
>                 usb_free_urb(pegasus->rx_urb);
> -               usb_free_urb(pegasus->ctrl_urb);
>                 return 0;
>         }
>         pegasus->intr_urb = usb_alloc_urb(0, GFP_KERNEL);
>         if (!pegasus->intr_urb) {
>                 usb_free_urb(pegasus->tx_urb);
>                 usb_free_urb(pegasus->rx_urb);
> -               usb_free_urb(pegasus->ctrl_urb);
>                 return 0;
>         }
> -
>         return 1;
>   }
> 
>   static int pegasus_open(struct net_device *net)
>   {
>         pegasus_t *pegasus = netdev_priv(net);
> -       int res;
> +       int res = -ENOMEM;
> 
>         if (pegasus->rx_skb == NULL)
> -               pegasus->rx_skb = pull_skb(pegasus);
> -       /*
> -        ** Note: no point to free the pool.  it is empty :-)
> -        */
> +               pegasus->rx_skb = __netdev_alloc_skb_ip_align(pegasus->net,
> +                                                             PEGASUS_MTU,
> +                                                             GFP_KERNEL);
>         if (!pegasus->rx_skb)
> -               return -ENOMEM;
> -
> +               goto exit;
>         res = set_registers(pegasus, EthID, 6, net->dev_addr);
> -
>         usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
>                           usb_rcvbulkpipe(pegasus->usb, 1),
>                           pegasus->rx_skb->data, PEGASUS_MTU + 8,
> @@ -1026,7 +802,6 @@ static int pegasus_open(struct net_device *net)
>                 netif_dbg(pegasus, ifup, net, "failed rx_urb, %d\n", res);
>                 goto exit;
>         }
> -
>         usb_fill_int_urb(pegasus->intr_urb, pegasus->usb,
>                          usb_rcvintpipe(pegasus->usb, 3),
>                          pegasus->intr_buff, sizeof(pegasus->intr_buff),
> @@ -1038,18 +813,9 @@ static int pegasus_open(struct net_device *net)
>                 usb_kill_urb(pegasus->rx_urb);
>                 goto exit;
>         }
> -       if ((res = enable_net_traffic(net, pegasus->usb))) {
> -               netif_dbg(pegasus, ifup, net,
> -                         "can't enable_net_traffic() - %d\n", res);
> -               res = -EIO;
> -               usb_kill_urb(pegasus->rx_urb);
> -               usb_kill_urb(pegasus->intr_urb);
> -               free_skb_pool(pegasus);
> -               goto exit;
> -       }
> +       enable_net_traffic(net, pegasus->usb);
>         set_carrier(net);
>         netif_start_queue(net);
> -       netif_dbg(pegasus, ifup, net, "open\n");
>         res = 0;
>   exit:
>         return res;
> @@ -1081,25 +847,22 @@ static void pegasus_get_drvinfo(struct net_device *dev,
>   /* also handles three patterns of some kind in hardware */
>   #define       WOL_SUPPORTED   (WAKE_MAGIC|WAKE_PHY)
> 
> -static void
> -pegasus_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +static void pegasus_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
>   {
> -       pegasus_t       *pegasus = netdev_priv(dev);
> +       pegasus_t *pegasus = netdev_priv(dev);
> 
>         wol->supported = WAKE_MAGIC | WAKE_PHY;
>         wol->wolopts = pegasus->wolopts;
>   }
> 
> -static int
> -pegasus_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +static int pegasus_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
>   {
> -       pegasus_t       *pegasus = netdev_priv(dev);
> -       u8              reg78 = 0x04;
> -       int             ret;
> +       pegasus_t *pegasus = netdev_priv(dev);
> +       u8 reg78 = 0x04;
> +       int r;
> 
>         if (wol->wolopts & ~WOL_SUPPORTED)
>                 return -EINVAL;
> -
>         if (wol->wolopts & WAKE_MAGIC)
>                 reg78 |= 0x80;
>         if (wol->wolopts & WAKE_PHY)
> @@ -1111,11 +874,10 @@ pegasus_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
>                 pegasus->eth_regs[0] &= ~0x10;
>         pegasus->wolopts = wol->wolopts;
> 
> -       ret = set_register(pegasus, WakeupControl, reg78);
> -       if (!ret)
> -               ret = device_set_wakeup_enable(&pegasus->usb->dev,
> -                                               wol->wolopts);
> -       return ret;
> +       r = set_register(pegasus, WakeupControl, reg78);
> +       if (!r)
> +               r = device_set_wakeup_enable(&pegasus->usb->dev, wol->wolopts);
> +       return r;
>   }
> 
>   static inline void pegasus_reset_wol(struct net_device *dev)
> @@ -1123,7 +885,7 @@ static inline void pegasus_reset_wol(struct net_device *dev)
>         struct ethtool_wolinfo wol;
> 
>         memset(&wol, 0, sizeof wol);
> -       (void) pegasus_set_wol(dev, &wol);
> +       (void)pegasus_set_wol(dev, &wol);
>   }
> 
>   static int
> @@ -1133,6 +895,7 @@ pegasus_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
> 
>         pegasus = netdev_priv(dev);
>         mii_ethtool_gset(&pegasus->mii, ecmd);
> +
>         return 0;
>   }
> 
> @@ -1140,30 +903,35 @@ static int
>   pegasus_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
>   {
>         pegasus_t *pegasus = netdev_priv(dev);
> +
>         return mii_ethtool_sset(&pegasus->mii, ecmd);
>   }
> 
>   static int pegasus_nway_reset(struct net_device *dev)
>   {
>         pegasus_t *pegasus = netdev_priv(dev);
> +
>         return mii_nway_restart(&pegasus->mii);
>   }
> 
>   static u32 pegasus_get_link(struct net_device *dev)
>   {
>         pegasus_t *pegasus = netdev_priv(dev);
> +
>         return mii_link_ok(&pegasus->mii);
>   }
> 
>   static u32 pegasus_get_msglevel(struct net_device *dev)
>   {
>         pegasus_t *pegasus = netdev_priv(dev);
> +
>         return pegasus->msg_enable;
>   }
> 
>   static void pegasus_set_msglevel(struct net_device *dev, u32 v)
>   {
>         pegasus_t *pegasus = netdev_priv(dev);
> +
>         pegasus->msg_enable = v;
>   }
> 
> @@ -1181,7 +949,7 @@ static const struct ethtool_ops ops = {
> 
>   static int pegasus_ioctl(struct net_device *net, struct ifreq *rq, int cmd)
>   {
> -       __u16 *data = (__u16 *) &rq->ifr_ifru;
> +       __u16 *data = (__u16 *) & rq->ifr_ifru;
>         pegasus_t *pegasus = netdev_priv(net);
>         int res;
> 
> @@ -1195,7 +963,7 @@ static int pegasus_ioctl(struct net_device *net, struct ifreq *rq, int cmd)
>         case SIOCDEVPRIVATE + 2:
>                 if (!capable(CAP_NET_ADMIN))
>                         return -EPERM;
> -               write_mii_word(pegasus, pegasus->phy, data[1] & 0x1f, data[2]);
> +               write_mii_word(pegasus, pegasus->phy, data[1] & 0x1f, &data[2]);
>                 res = 0;
>                 break;
>         default:
> @@ -1218,15 +986,12 @@ static void pegasus_set_multicast(struct net_device *net)
>         } else {
>                 pegasus->eth_regs[EthCtrl0] &= ~RX_MULTICAST;
>                 pegasus->eth_regs[EthCtrl2] &= ~RX_PROMISCUOUS;
> +               netif_dbg(pegasus, link, net, "general mode\n");
>         }
> -
> -       pegasus->ctrl_urb->status = 0;
> -
> -       pegasus->flags |= ETH_REGS_CHANGE;
> -       ctrl_callback(pegasus->ctrl_urb);
> +       update_eth_regs_async(pegasus);
>   }
> 
> -static __u8 mii_phy_probe(pegasus_t *pegasus)
> +static __u8 mii_phy_probe(pegasus_t * pegasus)
>   {
>         int i;
>         __u16 tmp;
> @@ -1238,11 +1003,10 @@ static __u8 mii_phy_probe(pegasus_t *pegasus)
>                 else
>                         return i;
>         }
> -
>         return 0xff;
>   }
> 
> -static inline void setup_pegasus_II(pegasus_t *pegasus)
> +static inline void setup_pegasus_II(pegasus_t * pegasus)
>   {
>         __u8 data = 0xa5;
> 
> @@ -1253,26 +1017,21 @@ static inline void setup_pegasus_II(pegasus_t *pegasus)
>                 set_register(pegasus, Reg7b, 0);
>         else
>                 set_register(pegasus, Reg7b, 2);
> -
>         set_register(pegasus, 0x83, data);
>         get_registers(pegasus, 0x83, 1, &data);
> -
>         if (data == 0xa5)
>                 pegasus->chip = 0x8513;
>         else
>                 pegasus->chip = 0;
> -
>         set_register(pegasus, 0x80, 0xc0);
>         set_register(pegasus, 0x83, 0xff);
>         set_register(pegasus, 0x84, 0x01);
> -
>         if (pegasus->features & HAS_HOME_PNA && mii_mode)
>                 set_register(pegasus, Reg81, 6);
>         else
>                 set_register(pegasus, Reg81, 2);
>   }
> 
> -
>   static int pegasus_count;
>   static struct workqueue_struct *pegasus_workqueue;
>   #define CARRIER_CHECK_DELAY (2 * HZ)
> @@ -1281,10 +1040,9 @@ static void check_carrier(struct work_struct *work)
>   {
>         pegasus_t *pegasus = container_of(work, pegasus_t, carrier_check.work);
>         set_carrier(pegasus->net);
> -       if (!(pegasus->flags & PEGASUS_UNPLUG)) {
> +       if (!(pegasus->flags & PEGASUS_UNPLUG))
>                 queue_delayed_work(pegasus_workqueue, &pegasus->carrier_check,
> -                       CARRIER_CHECK_DELAY);
> -       }
> +                                  CARRIER_CHECK_DELAY);
>   }
> 
>   static int pegasus_blacklisted(struct usb_device *udev)
> @@ -1299,11 +1057,11 @@ static int pegasus_blacklisted(struct usb_device *udev)
>             (udd->bDeviceClass == USB_CLASS_WIRELESS_CONTROLLER) &&
>             (udd->bDeviceProtocol == 1))
>                 return 1;
> -
>         return 0;
>   }
> 
> -/* we rely on probe() and remove() being serialized so we
> +/*
> + * we rely on probe() and remove() being serialized so we
>    * don't need extra locking on pegasus_count.
>    */
>   static void pegasus_dec_workqueue(void)
> @@ -1340,14 +1098,13 @@ static int pegasus_probe(struct usb_interface *intf,
> 
>         pegasus = netdev_priv(net);
>         pegasus->dev_index = dev_index;
> -       init_waitqueue_head(&pegasus->ctrl_wait);
> 
>         if (!alloc_urbs(pegasus)) {
>                 dev_err(&intf->dev, "can't allocate %s\n", "urbs");
>                 goto out1;
>         }
> 
> -       tasklet_init(&pegasus->rx_tl, rx_fixup, (unsigned long) pegasus);
> +       tasklet_init(&pegasus->rx_tl, rx_fixup, (unsigned long)pegasus);
> 
>         INIT_DELAYED_WORK(&pegasus->carrier_check, check_carrier);
> 
> @@ -1355,7 +1112,6 @@ static int pegasus_probe(struct usb_interface *intf,
>         pegasus->usb = dev;
>         pegasus->net = net;
> 
> -
>         net->watchdog_timeo = PEGASUS_TX_TIMEOUT;
>         net->netdev_ops = &pegasus_netdev_ops;
>         SET_ETHTOOL_OPS(net, &ops);
> @@ -1364,9 +1120,9 @@ static int pegasus_probe(struct usb_interface *intf,
>         pegasus->mii.mdio_write = mdio_write;
>         pegasus->mii.phy_id_mask = 0x1f;
>         pegasus->mii.reg_num_mask = 0x1f;
> -       spin_lock_init(&pegasus->rx_pool_lock);
>         pegasus->msg_enable = netif_msg_init(msg_level, NETIF_MSG_DRV
> -                               | NETIF_MSG_PROBE | NETIF_MSG_LINK);
> +                                            | NETIF_MSG_PROBE
> +                                            | NETIF_MSG_LINK);
> 
>         pegasus->features = usb_dev_id[dev_index].private;
>         get_interrupt_interval(pegasus);
> @@ -1376,7 +1132,6 @@ static int pegasus_probe(struct usb_interface *intf,
>                 goto out2;
>         }
>         set_ethernet_addr(pegasus);
> -       fill_skb_pool(pegasus);
>         if (pegasus->features & PEGASUS_II) {
>                 dev_info(&intf->dev, "setup Pegasus II specific registers\n");
>                 setup_pegasus_II(pegasus);
> @@ -1394,17 +1149,12 @@ static int pegasus_probe(struct usb_interface *intf,
>         if (res)
>                 goto out3;
>         queue_delayed_work(pegasus_workqueue, &pegasus->carrier_check,
> -                               CARRIER_CHECK_DELAY);
> -
> -       dev_info(&intf->dev, "%s, %s, %pM\n",
> -                net->name,
> -                usb_dev_id[dev_index].name,
> -                net->dev_addr);
> +                          CARRIER_CHECK_DELAY);
> +       dev_info(&intf->dev, "%s, %s: %pM\n", net->name,
> +                usb_dev_id[dev_index].name, net->dev_addr);
>         return 0;
> -
>   out3:
>         usb_set_intfdata(intf, NULL);
> -       free_skb_pool(pegasus);
>   out2:
>         free_all_urbs(pegasus);
>   out1:
> @@ -1429,7 +1179,6 @@ static void pegasus_disconnect(struct usb_interface *intf)
>         unregister_netdev(pegasus->net);
>         unlink_all_urbs(pegasus);
>         free_all_urbs(pegasus);
> -       free_skb_pool(pegasus);
>         if (pegasus->rx_skb != NULL) {
>                 dev_kfree_skb(pegasus->rx_skb);
>                 pegasus->rx_skb = NULL;
> @@ -1466,21 +1215,21 @@ static int pegasus_resume(struct usb_interface *intf)
>                 intr_callback(pegasus->intr_urb);
>         }
>         queue_delayed_work(pegasus_workqueue, &pegasus->carrier_check,
> -                               CARRIER_CHECK_DELAY);
> +                          CARRIER_CHECK_DELAY);
>         return 0;
>   }
> 
>   static const struct net_device_ops pegasus_netdev_ops = {
> -       .ndo_open =                     pegasus_open,
> -       .ndo_stop =                     pegasus_close,
> -       .ndo_do_ioctl =                 pegasus_ioctl,
> -       .ndo_start_xmit =               pegasus_start_xmit,
> -       .ndo_set_rx_mode =              pegasus_set_multicast,
> -       .ndo_get_stats =                pegasus_netdev_stats,
> -       .ndo_tx_timeout =               pegasus_tx_timeout,
> -       .ndo_change_mtu =               eth_change_mtu,
> -       .ndo_set_mac_address =          eth_mac_addr,
> -       .ndo_validate_addr =            eth_validate_addr,
> +       .ndo_open = pegasus_open,
> +       .ndo_stop = pegasus_close,
> +       .ndo_do_ioctl = pegasus_ioctl,
> +       .ndo_start_xmit = pegasus_start_xmit,
> +       .ndo_set_rx_mode = pegasus_set_multicast,
> +       .ndo_get_stats = pegasus_netdev_stats,
> +       .ndo_tx_timeout = pegasus_tx_timeout,
> +       .ndo_change_mtu = eth_change_mtu,
> +       .ndo_set_mac_address = eth_mac_addr,
> +       .ndo_validate_addr = eth_validate_addr,
>   };
> 
>   static struct usb_driver pegasus_driver = {
> @@ -1500,7 +1249,7 @@ static void __init parse_id(char *id)
> 
>         if ((token = strsep(&id, ":")) != NULL)
>                 name = token;
> -       /* name now points to a null terminated string*/
> +       /* name now points to a null terminated string */
>         if ((token = strsep(&id, ":")) != NULL)
>                 vendor_id = simple_strtoul(token, NULL, 16);
>         if ((token = strsep(&id, ":")) != NULL)
> @@ -1514,7 +1263,7 @@ static void __init parse_id(char *id)
>         if (device_id > 0x10000 || device_id == 0)
>                 return;
> 
> -       for (i = 0; usb_dev_id[i].name; i++);
> +       for (i = 0; usb_dev_id[i].name; i++) ;
>         usb_dev_id[i].name = name;
>         usb_dev_id[i].vendor = vendor_id;
>         usb_dev_id[i].device = device_id;
> diff --git a/drivers/net/usb/pegasus.h b/drivers/net/usb/pegasus.h
> index 65b78b3..809e560 100644
> --- a/drivers/net/usb/pegasus.h
> +++ b/drivers/net/usb/pegasus.h
> @@ -1,5 +1,5 @@
>   /*
> - * Copyright (c) 1999-2003 Petko Manolov - Petkan (petkan@users.sourceforge.net)
> + * Copyright (c) 1999-2013 Petko Manolov - Petkan (petkan@nucleusys.com)
>    *
>    * This program is free software; you can redistribute it and/or modify
>    * it under the terms of the GNU General Public License version 2 as published
> @@ -34,8 +34,6 @@
>   #define       CTRL_URB_SLEEP          0x00000020
>   #define       PEGASUS_UNPLUG          0x00000040
>   #define       PEGASUS_RX_URB_FAIL     0x00000080
> -#define        ETH_REGS_CHANGE         0x40000000
> -#define        ETH_REGS_CHANGED        0x80000000
> 
>   #define       RX_MULTICAST            2
>   #define       RX_PROMISCUOUS          4
> @@ -96,12 +94,8 @@ typedef struct pegasus {
>         int                     intr_interval;
>         struct tasklet_struct   rx_tl;
>         struct delayed_work     carrier_check;
> -       struct urb              *ctrl_urb, *rx_urb, *tx_urb, *intr_urb;
> -       struct sk_buff          *rx_pool[RX_SKBS];
> +       struct urb              *rx_urb, *tx_urb, *intr_urb;
>         struct sk_buff          *rx_skb;
> -       struct usb_ctrlrequest  dr;
> -       wait_queue_head_t       ctrl_wait;
> -       spinlock_t              rx_pool_lock;
>         int                     chip;
>         unsigned char           intr_buff[8];
>         __u8                    tx_buff[PEGASUS_MTU];
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
From: Eric Dumazet @ 2013-04-11 16:32 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev
In-Reply-To: <1363274946.29475.24.camel@edumazet-glaptop>

On Thu, 2013-03-14 at 16:29 +0100, Eric Dumazet wrote:
> On Thu, 2013-03-14 at 16:05 +0100, Eric Dumazet wrote:
> > On Thu, 2013-03-14 at 14:45 +0000, Tom Parkin wrote:
> > > On Thu, Mar 14, 2013 at 02:18:04AM +0100, Eric Dumazet wrote:
> > > > Ah thanks for this, as this definitely makes more sense ;)
> > > > 
> > > > Could you try the following fix ?
> > > > 
> > > > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> > > > index b6d30ac..87f4ecb 100644
> > > > --- a/net/ipv4/ip_fragment.c
> > > > +++ b/net/ipv4/ip_fragment.c
> > > > @@ -529,6 +529,7 @@ found:
> > > >  	    qp->q.meat == qp->q.len)
> > > >  		return ip_frag_reasm(qp, prev, dev);
> > > >  
> > > > +	skb_dst_force(skb);
> > > >  	inet_frag_lru_move(&qp->q);
> > > >  	return -EINPROGRESS;
> > > >
> > > 
> > > Thanks Eric, with this patch I can no longer reproduce the oops :-)
> > 
> > Thanks for testing.
> > 
> > I am considering an alternative patch :
> > 
> > We can drop the reference instead, and use the dst of the last skb.
> > 
> > This would help to not dirty the dst refcount.
> > 
> > I'll send an updated version.
> > 
> 
> OK, I tested the following one instead, please test it so that I can send an official patch.
> 
> Thanks
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index b6d30ac..2d23830 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -529,6 +529,7 @@ found:
>  	    qp->q.meat == qp->q.len)
>  		return ip_frag_reasm(qp, prev, dev);
>  
> +	skb_dst_drop(skb);
>  	inet_frag_lru_move(&qp->q);
>  	return -EINPROGRESS;
>  
> 

Short update : I do not understand yet why this patch is not working.

Normally, the reassembled packet should get the dst from the last skb
(the one completing the packet)...

I have to make more experiments.

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Eric Dumazet @ 2013-04-11 16:10 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Sebastian Hesselbarth, Andrew Lunn, Jason Cooper,
	Benjamin Herrenschmidt, linux-kernel, Florian Fainelli,
	Soeren Moch, Paul Mackerras, Lennert Buytenhek, Dale Farnsworth,
	netdev, linuxppc-dev, David S. Miller, linux-arm-kernel
In-Reply-To: <20130411160258.GI1910@1wt.eu>

On Thu, 2013-04-11 at 18:02 +0200, Willy Tarreau wrote:

> OK, that makes sense indeed, I didn't think about this case. All
> I remember was that the old call achieved a higher packet rate
> than napi_gro_receive, but it was on an older kernel and I can't
> be more specifics after several months :-/

Its probably true that the GRO handler consumes more cpu for packets
that cant be aggregated in the end.

Thats a trade off, and maybe we could add in the core stack a device
feature to instruct gro handler to do a short cut for packets with no
checksum. Or better a sysctl so that a static_branch can be used.

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Willy Tarreau @ 2013-04-11 16:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Lunn, Jason Cooper, Benjamin Herrenschmidt, linux-kernel,
	David S. Miller, Soeren Moch, Paul Mackerras, linux-arm-kernel,
	Dale Farnsworth, netdev, linuxppc-dev, Florian Fainelli,
	Lennert Buytenhek, Sebastian Hesselbarth
In-Reply-To: <1365695675.3887.165.camel@edumazet-glaptop>

On Thu, Apr 11, 2013 at 08:54:35AM -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 17:32 +0200, Willy Tarreau wrote:
> > On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> > > I don't have a strong opinion on whether Soeren's or your proposal should
> > > be submitted. But I insist on having one of them in, as GRO significantly
> > > improves the common use case, is enabled by default, and not as
> > > constrained as LRO.
> > 
> > I agree, use yours first, but we should keep an eye on this. Since you have
> > everything to run a test, please try to see if you can get netperf to run
> > over IPv6, I'm sure the NIC doesn't handle it.
> 
> Willy, testing the checksum in the NIC driver itself prevents the stack
> doing GRO even if the NIC could not checksum the packet, as in GRE
> tunneling for example.
> 
> So Sebastien patch is better IMHO : Just call the napi gro handler and
> let core stack handles the details ;)

OK, that makes sense indeed, I didn't think about this case. All
I remember was that the old call achieved a higher packet rate
than napi_gro_receive, but it was on an older kernel and I can't
be more specifics after several months :-/

Cheers,
Willy

^ permalink raw reply

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
From: Eric Dumazet @ 2013-04-11 15:57 UTC (permalink / raw)
  To: Neil Horman; +Cc: Bart Van Assche, David Miller, netdev
In-Reply-To: <1365693486-6315-1-git-send-email-nhorman@tuxdriver.com>

On Thu, 2013-04-11 at 11:18 -0400, Neil Horman wrote:
> Bart, this patch should fix your problem.  Could you please test it and confirm?
> 
> Bart Van Assche recently reported a warning to me:
> 
> <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
>  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
>  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
>  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
>  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
>  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
>  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
>  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
>  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
>  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
>  [<ffffffff8146f9f6>] printk+0x4d/0x4f
>  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> 
> This resulted from my commit ca99ca14c which introduced a mutex_trylock
> operation in a path that could execute in interrupt context.  When mutex
> debugging is enabled, the above warns the user when we are in fact exectuting in
> interrupt context.
> 
> I think this is a false positive however.  The check is intended to catch users
> who might be issuing sleeping calls in irq context, but the use of mutex_trylock
> here is guaranteed not to sleep.
> 
> We could fix this by replacing the DEBUG_LOCK_WARN_ON check in spin_lock_mutex
> with a __might_sleep call in the appropriate parent mutex operations, but for
> the sake of effiency (which It seems is why the check was put in the spin lock
> code only when debug is enabled), lets split the spin_lock_mutex call into two
> components, where the outer component does the debug checking.  Then
> mutex_trylock can just call the inner part as its callable from irq context
> safely.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: Bart Van Assche <bvanassche@acm.org>
> CC: Bart Van Assche <bvanassche@acm.org>
> CC: David Miller <davem@davemloft.net>
> CC: netdev@vger.kernel.org

Please post your patch to lkml ?

Thanks !

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Eric Dumazet @ 2013-04-11 15:54 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Soeren Moch, David S. Miller, Lennert Buytenhek, Andrew Lunn,
	Jason Cooper, Florian Fainelli, Benjamin Herrenschmidt,
	Paul Mackerras, Dale Farnsworth, netdev, linux-arm-kernel,
	linuxppc-dev, linux-kernel
In-Reply-To: <1365684023-9967-1-git-send-email-sebastian.hesselbarth@gmail.com>

On Thu, 2013-04-11 at 14:40 +0200, Sebastian Hesselbarth wrote:
> This patch adds GRO support to mv643xx_eth by making it invoke
> napi_gro_receive instead of netif_receive_skb.
> 
> Signed-off-by: Soeren Moch <smoch@web.de>
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: Florian Fainelli <florian@openwrt.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Dale Farnsworth <dale@farnsworth.org>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index 305038f..c850d04 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -604,7 +604,7 @@ static int rxq_process(struct rx_queue *rxq, int budget)
>  			lro_receive_skb(&rxq->lro_mgr, skb, (void *)cmd_sts);
>  			lro_flush_needed = 1;
>  		} else
> -			netif_receive_skb(skb);
> +			napi_gro_receive(&mp->napi, skb);
>  
>  		continue;
>  

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Eric Dumazet @ 2013-04-11 15:54 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Sebastian Hesselbarth, Andrew Lunn, Jason Cooper,
	Benjamin Herrenschmidt, linux-kernel, Florian Fainelli,
	Soeren Moch, Paul Mackerras, Lennert Buytenhek, Dale Farnsworth,
	netdev, linuxppc-dev, David S. Miller, linux-arm-kernel
In-Reply-To: <20130411153256.GH1910@1wt.eu>

On Thu, 2013-04-11 at 17:32 +0200, Willy Tarreau wrote:
> On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> > I don't have a strong opinion on whether Soeren's or your proposal should
> > be submitted. But I insist on having one of them in, as GRO significantly
> > improves the common use case, is enabled by default, and not as
> > constrained as LRO.
> 
> I agree, use yours first, but we should keep an eye on this. Since you have
> everything to run a test, please try to see if you can get netperf to run
> over IPv6, I'm sure the NIC doesn't handle it.

Willy, testing the checksum in the NIC driver itself prevents the stack
doing GRO even if the NIC could not checksum the packet, as in GRE
tunneling for example.

So Sebastien patch is better IMHO : Just call the napi gro handler and
let core stack handles the details ;)

^ permalink raw reply

* Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
From: Christoph Paasch @ 2013-04-11 15:54 UTC (permalink / raw)
  To: Neil Horman; +Cc: Bart Van Assche, David Miller, netdev
In-Reply-To: <1365693486-6315-1-git-send-email-nhorman@tuxdriver.com>

On Thursday 11 April 2013 11:18:06 Neil Horman wrote:
> Bart, this patch should fix your problem.  Could you please test it and
> confirm?
> 
> Bart Van Assche recently reported a warning to me:
> 
> <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
>  [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
>  [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
>  [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
>  [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
>  [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
>  [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
>  [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
>  [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
>  [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
>  [<ffffffff8146f9f6>] printk+0x4d/0x4f
>  [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
> 
> This resulted from my commit ca99ca14c which introduced a mutex_trylock
> operation in a path that could execute in interrupt context.  When mutex
> debugging is enabled, the above warns the user when we are in fact
> exectuting in interrupt context.
> 
> I think this is a false positive however.  The check is intended to catch
> users who might be issuing sleeping calls in irq context, but the use of
> mutex_trylock here is guaranteed not to sleep.

Even if he does not sleep, may we still hit a deadlock like the following:

netpoll_rx_disable() calls mutex_lock(), who ends up in __mutex_lock_common, 
calling spin_lock_mutex().

Immediatly after that, on the same CPU, the interrupt comes and 
netpoll_poll_dev calls mutex_trylock and ends up also calling 
spin_lock_mutex(). Now, it seems to me that we are deadlocked - the interrupt 
is spinning on the lock, because netpoll_rx_disable() already took it.

Or maybe I am missing something?


Cheers,
Christoph

-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
UCLouvain
--

^ permalink raw reply

* Re: [PATCH] tcp: Reallocate headroom if it would overflow csum_start
From: Eric Dumazet @ 2013-04-11 15:49 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev
In-Reply-To: <c35c262084e8907098dc2db5ea9690d2119b4916.1365678820.git.tgraf@suug.ch>

On Thu, 2013-04-11 at 13:19 +0200, Thomas Graf wrote:
> If a TCP retransmission gets partially ACKed and collapsed multiple
> times it is possible for the headroom to grow beyond 64K which will
> overflow the 16bit skb->csum_start which is based on the start of
> the headroom. It has been observed rarely in the wild with IPoIB due
> to the 64K MTU.
> 
> Verify if the acking and collapsing resulted in a headroom exceeding
> what csum_start can cover and reallocate the headroom if so.
> 
> LLNL has been running the patch for a while and has not seen the
> problem occur since.
> 
> A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
> LLNL for helping out with the investigation and testing.
> 
> Reported-by: Jim Foraker <foraker1@llnl.gov>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2: reallocate headroom instead of preventing further collapsing
> 
>  net/ipv4/tcp_output.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b44cf81..bf6ceb7 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2388,8 +2388,11 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
>  	 */
>  	TCP_SKB_CB(skb)->when = tcp_time_stamp;
>  
> -	/* make sure skb->data is aligned on arches that require it */
> -	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
> +	/* make sure skb->data is aligned on arches that require it
> +	 * and check if ack-trimming & collapsing extended the headroom
> +	 * beyond what csum_start can cover. */
> +	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3) ||
> +		     skb_headroom(skb) >= 0xFFFF)) {
>  		struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
>  						   GFP_ATOMIC);
>  		return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :

Strange... It was tested on an arch with NET_IP_ALIGN == 2 I presume ?

This fix should also be done for other arches (x86 for example)

I would code the condition like that instead

if ((NET_IP_ALIGN && ((unsigned long)skb->data & 3)) ||
    skb_headroom(skb) >= 0xFFFF)

^ permalink raw reply

* linville limited availability for merging after tomorrow
From: John W. Linville @ 2013-04-11 15:46 UTC (permalink / raw)
  To: linux-wireless-u79uwXL29TY76Z2rM5mHXA; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

Greetings,

This is just a heads-up that I will be doing some traveling (collab
summit and personal) over the next couple of weeks.  I'm not entirely
sure what my schedule will be or how much energy (or clear thinking)
I'll have left for merging. :-)

I'll try to get most pending things merged into my trees tomorrow.
After that, it will be more hit-and-miss -- I'll do what I can when
I can.  I won't really be back to my regular schedule until May 1...

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org			might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Willy Tarreau @ 2013-04-11 15:32 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Andrew Lunn, Jason Cooper, Benjamin Herrenschmidt, linux-kernel,
	David S. Miller, Soeren Moch, Paul Mackerras, linux-arm-kernel,
	Dale Farnsworth, netdev, linuxppc-dev, Florian Fainelli,
	Lennert Buytenhek
In-Reply-To: <CABJ1b_TRxm97DKJLP_h0VwMJvcYmSpjoJvtg1KvBNGah=VzqMQ@mail.gmail.com>

On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> I don't have a strong opinion on whether Soeren's or your proposal should
> be submitted. But I insist on having one of them in, as GRO significantly
> improves the common use case, is enabled by default, and not as
> constrained as LRO.

I agree, use yours first, but we should keep an eye on this. Since you have
everything to run a test, please try to see if you can get netperf to run
over IPv6, I'm sure the NIC doesn't handle it.

Willy

^ permalink raw reply

* [PATCH repost for-3.9] pci: avoid work_on_cpu for nested SRIOV probes
From: Michael S. Tsirkin @ 2013-04-11 15:30 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Ming Lei, Greg Kroah-Hartman, David Miller, Roland Dreier, netdev,
	Yan Burman, Jack Morgenstein, Bjorn Helgaas, linux-pci, Tejun Heo

The following lockdep warning is reported to trigger since 3.9-rc1:

=============================================
[ INFO: possible recursive locking detected ]
3.9.0-rc1 #96 Not tainted
---------------------------------------------
kworker/0:1/734 is trying to acquire lock:
 ((&wfc.work)){+.+.+.}, at: [<ffffffff81066cb0>] flush_work+0x0/0x250

but task is already holding lock:
 ((&wfc.work)){+.+.+.}, at: [<ffffffff81064352>]
process_one_work+0x162/0x4c0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock((&wfc.work));
  lock((&wfc.work));

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/0:1/734:
 #0:  (events){.+.+.+}, at: [<ffffffff81064352>]
process_one_work+0x162/0x4c0
 #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff81064352>]
process_one_work+0x162/0x4c0
 #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff812db225>]
device_attach+0x25/0xb0

stack backtrace:
Pid: 734, comm: kworker/0:1 Not tainted 3.9.0-rc1 #96
Call Trace:
 [<ffffffff810948ec>] validate_chain+0xdcc/0x11f0
 [<ffffffff81095150>] __lock_acquire+0x440/0xc70
 [<ffffffff81095150>] ? __lock_acquire+0x440/0xc70
 [<ffffffff810959da>] lock_acquire+0x5a/0x70
 [<ffffffff81066cb0>] ? wq_worker_waking_up+0x60/0x60
 [<ffffffff81066cf5>] flush_work+0x45/0x250
 [<ffffffff81066cb0>] ? wq_worker_waking_up+0x60/0x60
 [<ffffffff810922be>] ? mark_held_locks+0x9e/0x130
 [<ffffffff81066a96>] ? queue_work_on+0x46/0x90
 [<ffffffff810925dd>] ? trace_hardirqs_on_caller+0xfd/0x190
 [<ffffffff8109267d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81066f74>] work_on_cpu+0x74/0x90
 [<ffffffff81063820>] ? keventd_up+0x20/0x20
 [<ffffffff8121fd30>] ? pci_pm_prepare+0x60/0x60
 [<ffffffff811f9293>] ? cpumask_next_and+0x23/0x40
 [<ffffffff81220a1a>] pci_device_probe+0xba/0x110
 [<ffffffff812dadca>] ? driver_sysfs_add+0x7a/0xb0
 [<ffffffff812daf1f>] driver_probe_device+0x8f/0x230
 [<ffffffff812db170>] ? __driver_attach+0xb0/0xb0
 [<ffffffff812db1bb>] __device_attach+0x4b/0x60
 [<ffffffff812d9314>] bus_for_each_drv+0x64/0x90
 [<ffffffff812db298>] device_attach+0x98/0xb0
 [<ffffffff81218474>] pci_bus_add_device+0x24/0x50
 [<ffffffff81232e80>] virtfn_add+0x240/0x3e0
 [<ffffffff8146ce3d>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
 [<ffffffff812333be>] pci_enable_sriov+0x23e/0x500
 [<ffffffffa011fa1a>] __mlx4_init_one+0x5da/0xce0 [mlx4_core]
 [<ffffffffa012016d>] mlx4_init_one+0x2d/0x60 [mlx4_core]
 [<ffffffff8121fd79>] local_pci_probe+0x49/0x80
 [<ffffffff81063833>] work_for_cpu_fn+0x13/0x20
 [<ffffffff810643b8>] process_one_work+0x1c8/0x4c0
 [<ffffffff81064352>] ? process_one_work+0x162/0x4c0
 [<ffffffff81064cfb>] worker_thread+0x30b/0x430
 [<ffffffff810649f0>] ? manage_workers+0x340/0x340
 [<ffffffff8106cea6>] kthread+0xd6/0xe0
 [<ffffffff8106cdd0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8146daac>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106cdd0>] ? __init_kthread_worker+0x70/0x70

Reference: http://marc.info/?l=linux-netdev&m=136249690901892&w=2

The issue is that a driver, in it's probe function, calls
pci_sriov_enable so a PF device probe causes VF probe (AKA nested
probe).  Each probe in pci_device_probe is (normally) run through
work_on_cpu (this is to get the right numa node for memory allocated by
the driver).  In turn work_on_cpu does this internally:

        schedule_work_on(cpu, &wfc.work);
        flush_work(&wfc.work);

So if you are running probe on CPU1, and cause another
probe on the same CPU, this will try to flush
workqueue from inside same workqueue which of course
deadlocks.

Nested probing might be tricky to get right generally.

But for pci_sriov_enable, the situation is actually very simple: all VFs
naturally have same affinity as the PF, and cpumask_any_and is actually
same as cpumask_first_and, so it always gives us the same CPU.
So let's just detect that, and run the probing for VFs locally without a
workqueue.

This is hardly elegant, but looks to me like an appropriate quick fix
for 3.9.

Tested-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

Reposting due to missed Cc's. Sorry about the noise.

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 1fa1e48..6eeb5ec 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -286,8 +286,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 		int cpu;
 
 		get_online_cpus();
 		cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
-		if (cpu < nr_cpu_ids)
+		if (cpu != raw_smp_processor_id() && cpu < nr_cpu_ids)
 			error = work_on_cpu(cpu, local_pci_probe, &ddi);
 		else
 			error = local_pci_probe(&ddi);

^ permalink raw reply related

* Re: [PATCH 1/2 v5] usbnet: allow status interrupt URB to always be active
From: Dan Williams @ 2013-04-11 15:28 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Elina Pasheva, Network Development, linux-usb,
	Rory Filer, Phil Sutter
In-Reply-To: <CACVXFVNKSszw+3VnM3hwepbTHh3gvjC+pG2SL_OsVm+1SqrNzQ@mail.gmail.com>

On Thu, 2013-04-11 at 10:31 +0800, Ming Lei wrote:
> On Thu, Apr 11, 2013 at 4:30 AM, Dan Williams <dcbw@redhat.com> wrote:
> > Some drivers (sierra_net) need the status interrupt URB
> > active even when the device is closed, because they receive
> > custom indications from firmware.  Add functions to refcount
> > the status interrupt URB submit/kill operation so that
> > sub-drivers and the generic driver don't fight over whether
> > the status interrupt URB is active or not.
> >
> > A sub-driver can call usbnet_status_start() at any time, but
> > the URB is only submitted the first time the function is
> > called.  Likewise, when the sub-driver is done with the URB,
> > it calls usbnet_status_stop() but the URB is only killed when
> > all users have stopped it.  The URB is still killed and
> > re-submitted for suspend/resume, as before, with the same
> > refcount it had at suspend.
> >
> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> > ---
> >  drivers/net/usb/usbnet.c   | 79 ++++++++++++++++++++++++++++++++++++++++++----
> >  include/linux/usb/usbnet.h |  5 +++
> >  2 files changed, 77 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> > index 51f3192..b71ce36 100644
> > --- a/drivers/net/usb/usbnet.c
> > +++ b/drivers/net/usb/usbnet.c
> > @@ -252,6 +252,70 @@ static int init_status (struct usbnet *dev, struct usb_interface *intf)
> >         return 0;
> >  }
> >
> > +/* Submit the interrupt URB if not previously submitted, increasing refcount */
> > +int usbnet_status_start(struct usbnet *dev, gfp_t mem_flags)
> 
> 'mem_flags' isn't needed any more since we can apply allocation
> of GFP_NOIO automatically in resume path now, and you can always
> use GFP_KERNEL safely. Considered that it is a API, please don't
> introduce it.
> 
> After removing it, you can add
> 
>           Acked-by: Ming Lei <ming.lei@canonical.com>
> 
> > +{
> > +       int ret = 0;
> > +
> > +       WARN_ON_ONCE(dev->interrupt == NULL);
> > +       if (dev->interrupt) {
> > +               mutex_lock(&dev->interrupt_mutex);
> > +
> > +               if (++dev->interrupt_count == 1)
> > +                       ret = usb_submit_urb(dev->interrupt, mem_flags);
> > +
> > +               dev_dbg(&dev->udev->dev, "incremented interrupt URB count to %d\n",
> > +                       dev->interrupt_count);
> > +               mutex_unlock(&dev->interrupt_mutex);
> > +       }
> > +       return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(usbnet_status_start);
> > +
> > +/* For resume; submit interrupt URB if previously submitted */
> > +static int __usbnet_status_start_force(struct usbnet *dev, gfp_t mem_flags)
> > +{
> > +       int ret = 0;
> > +
> > +       mutex_lock(&dev->interrupt_mutex);
> > +       if (dev->interrupt_count) {
> > +               ret = usb_submit_urb(dev->interrupt, mem_flags);
> > +               dev_dbg(&dev->udev->dev,
> > +                       "submitted interrupt URB for resume\n");
> > +       }
> > +       mutex_unlock(&dev->interrupt_mutex);
> > +       return ret;
> > +}
> > +
> > +/* Kill the interrupt URB if all submitters want it killed */
> > +void usbnet_status_stop(struct usbnet *dev)
> > +{
> > +       if (dev->interrupt) {
> > +               mutex_lock(&dev->interrupt_mutex);
> > +               WARN_ON(dev->interrupt_count == 0);
> > +
> > +               if (dev->interrupt_count && --dev->interrupt_count == 0)
> > +                       usb_kill_urb(dev->interrupt);
> > +
> > +               dev_dbg(&dev->udev->dev,
> > +                       "decremented interrupt URB count to %d\n",
> > +                       dev->interrupt_count);
> > +               mutex_unlock(&dev->interrupt_mutex);
> > +       }
> > +}
> > +EXPORT_SYMBOL_GPL(usbnet_status_stop);
> > +
> > +/* For suspend; always kill interrupt URB */
> > +static void __usbnet_status_stop_force(struct usbnet *dev)
> > +{
> > +       if (dev->interrupt) {
> > +               mutex_lock(&dev->interrupt_mutex);
> > +               usb_kill_urb(dev->interrupt);
> > +               dev_dbg(&dev->udev->dev, "killed interrupt URB for suspend\n");
> > +               mutex_unlock(&dev->interrupt_mutex);
> > +       }
> > +}
> 
> Looks it isn't a good practice to duplicate code in above four functions, but
> it should be OK to merge first.

Oliver requested this approach.  I'd originally taken your suggestion to
merge them, but this proved somewhat more complicated and Oliver
rejected that approach.

Dan

> > +
> >  /* Passes this packet up the stack, updating its accounting.
> >   * Some link protocols batch packets, so their rx_fixup paths
> >   * can return clones as well as just modify the original skb.
> > @@ -725,7 +789,7 @@ int usbnet_stop (struct net_device *net)
> >         if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
> >                 usbnet_terminate_urbs(dev);
> >
> > -       usb_kill_urb(dev->interrupt);
> > +       usbnet_status_stop(dev);
> >
> >         usbnet_purge_paused_rxq(dev);
> >
> > @@ -787,7 +851,7 @@ int usbnet_open (struct net_device *net)
> >
> >         /* start any status interrupt transfer */
> >         if (dev->interrupt) {
> > -               retval = usb_submit_urb (dev->interrupt, GFP_KERNEL);
> > +               retval = usbnet_status_start(dev, GFP_KERNEL);
> >                 if (retval < 0) {
> >                         netif_err(dev, ifup, dev->net,
> >                                   "intr submit %d\n", retval);
> > @@ -1430,6 +1494,8 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
> >         dev->delay.data = (unsigned long) dev;
> >         init_timer (&dev->delay);
> >         mutex_init (&dev->phy_mutex);
> > +       mutex_init(&dev->interrupt_mutex);
> > +       dev->interrupt_count = 0;
> >
> >         dev->net = net;
> >         strcpy (net->name, "usb%d");
> > @@ -1565,7 +1631,7 @@ int usbnet_suspend (struct usb_interface *intf, pm_message_t message)
> >                  */
> >                 netif_device_detach (dev->net);
> >                 usbnet_terminate_urbs(dev);
> > -               usb_kill_urb(dev->interrupt);
> > +               __usbnet_status_stop_force(dev);
> >
> >                 /*
> >                  * reattach so runtime management can use and
> > @@ -1585,9 +1651,8 @@ int usbnet_resume (struct usb_interface *intf)
> >         int                     retval;
> >
> >         if (!--dev->suspend_count) {
> > -               /* resume interrupt URBs */
> > -               if (dev->interrupt && test_bit(EVENT_DEV_OPEN, &dev->flags))
> > -                       usb_submit_urb(dev->interrupt, GFP_NOIO);
> > +               /* resume interrupt URB if it was previously submitted */
> > +               __usbnet_status_start_force(dev, GFP_NOIO);
> >
> >                 spin_lock_irq(&dev->txq.lock);
> >                 while ((res = usb_get_from_anchor(&dev->deferred))) {
> > diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
> > index 0e5ac93..d71f44c 100644
> > --- a/include/linux/usb/usbnet.h
> > +++ b/include/linux/usb/usbnet.h
> > @@ -56,6 +56,8 @@ struct usbnet {
> >         struct sk_buff_head     done;
> >         struct sk_buff_head     rxq_pause;
> >         struct urb              *interrupt;
> > +       unsigned                interrupt_count;
> > +       struct mutex            interrupt_mutex;
> >         struct usb_anchor       deferred;
> >         struct tasklet_struct   bh;
> >
> > @@ -246,4 +248,7 @@ extern int usbnet_nway_reset(struct net_device *net);
> >
> >  extern int usbnet_manage_power(struct usbnet *, int);
> >
> > +int usbnet_status_start(struct usbnet *dev, gfp_t mem_flags);
> > +void usbnet_status_stop(struct usbnet *dev);
> > +
> >  #endif /* __LINUX_USB_USBNET_H */
> > --
> > 1.8.1.4
> >
> >
> 
> 
> Thanks,

^ permalink raw reply

* Re: [PATCH 00/11] usbnet: usbnet: handle link change
From: Ming Lei @ 2013-04-11 15:27 UTC (permalink / raw)
  To: Jussi Kivilinna
  Cc: David S. Miller, Greg Kroah-Hartman, Oliver Neukum,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <5166D431.4080400-X3B1VOXEql0@public.gmane.org>

On Thu, Apr 11, 2013 at 11:18 PM, Jussi Kivilinna
<jussi.kivilinna-X3B1VOXEql0@public.gmane.org> wrote:
> On 11.04.2013 17:40, Ming Lei wrote:
>> Hi,
>>
>> This patch set introduces usbnet_link_change() API and applies
>> it on all usbnet drivers, then handle the link change centrally
>> to stop bulk transfer when link becomes off and restart bulk
>> transfer when link becomes on.
>
> Should 'rndis_wlan' be changed to use this too?

If link detection of 'rndis_wlan' doesn't depend on
bulk transfer, it can benefit from the change. Otherwise,
it needn't the change, but the patch won't have side-effect
on 'rndis_wlan'.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: Add GRO support
From: Sebastian Hesselbarth @ 2013-04-11 15:27 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andrew Lunn, Jason Cooper, Benjamin Herrenschmidt, linux-kernel,
	Florian Fainelli, Soeren Moch, Paul Mackerras, Lennert Buytenhek,
	Dale Farnsworth, netdev, linuxppc-dev, David S. Miller,
	linux-arm-kernel
In-Reply-To: <20130411150326.GA19978@1wt.eu>

On Thu, Apr 11, 2013 at 5:03 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Apr 11, 2013 at 04:47:49PM +0200, Sebastian Hesselbarth wrote:
>> I tried todays net-next on top of 3.9-rc6 without any gro patch, with
>> the initial
>> patch (Soeren) and your proposed patch (Willy). The results show that
>> both patches
>> allow a significant increase in throughput compared to
>> netif_receive_skb (!gro, !lro)
>> alone. Having gro with lro disabled gives some 2% more throughput
>> compared to lro only.
>
> Indeed this is consistent with my memories, since Eric improved the
> GRO path, it became faster than LRO on this chip.

I don't have a strong opinion on whether Soeren's or your proposal should
be submitted. But I insist on having one of them in, as GRO significantly
improves the common use case, is enabled by default, and not as
constrained as LRO.

Sebastian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox