Netdev List


* RE: nfs client hang
From: Andy Chittenden @ 2010-07-28  7:08 UTC
  To: Chuck Lever, Eric Dumazet
  Cc: Linux Kernel Mailing List (linux-kernel@vger.kernel.org),
	Trond Myklebust, netdev, Linux NFS Mailing List
In-Reply-To: <4C4F174C.2000308@oracle.com>

> I don't see an xs_error_report() call here, which would confirm that the socket took a trip through tcp_disconnect().

From my reading of tcp_disconnect(), it calls sk->sk_error_report(sk) unconditionally so as there's no xs_error_report(), that surely means the exact opposite: tcp_disconnect() wasn't called. If it's not called, sk_shutdown is not cleared. And my revised tracing confirmed that it was set to SEND_SHUTDOWN.

-- 
Andy, BlueArc Engineering


-----Original Message-----
From: Chuck Lever [mailto:chuck.lever@oracle.com] 
Sent: 27 July 2010 18:29
To: Eric Dumazet
Cc: Andy Chittenden; Linux Kernel Mailing List (linux-kernel@vger.kernel.org); Trond Myklebust; netdev; Linux NFS Mailing List
Subject: Re: nfs client hang

Add CC: linux-nfs@vger.kernel.org

On 07/27/10 08:21 AM, Eric Dumazet wrote:
> Le mardi 27 juillet 2010 à 11:53 +0100, Andy Chittenden a écrit :
>>>>> IE the client starts a connection and then closes it again without sending data.
>>>> Once this happens, here's some rpcdebug info for the rpc module using 2.6.34.1 kernel:
>>>>
>>>> ... lots of the following nfsv3 WRITE requests:
>>>> [ 7670.026741] 57793 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026759] 57794 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026778] 57795 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026797] 57796 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026815] 57797 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026834] 57798 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026853] 57799 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026871] 57800 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026890] 57801 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7670.026909] 57802 0001    -11 ffff88012e32b000   (null)        0 ffffffffa03beb10 nfsv3 WRITE a:call_reserveresult q:xprt_backlog
>>>> [ 7680.520042] RPC:       worker connecting xprt ffff88013e62d800 via tcp to 10.1.6.102 (port 2049)
>>>> [ 7680.520066] RPC:       ffff88013e62d800 connect status 99 connected 0 sock state 7
>>>> [ 7680.520074] RPC: 33550 __rpc_wake_up_task (now 4296812426)
>>>> [ 7680.520079] RPC: 33550 disabling timer
>>>> [ 7680.520084] RPC: 33550 removed from queue ffff88013e62db20 "xprt_pending"
>>>> [ 7680.520089] RPC:       __rpc_wake_up_task done
>>>> [ 7680.520094] RPC: 33550 __rpc_execute flags=0x1
>>>> [ 7680.520098] RPC: 33550 xprt_connect_status: retrying
>>>> [ 7680.520103] RPC: 33550 call_connect_status (status -11)
>>>> [ 7680.520108] RPC: 33550 call_transmit (status 0)
>>>> [ 7680.520112] RPC: 33550 xprt_prepare_transmit
>>>> [ 7680.520118] RPC: 33550 rpc_xdr_encode (status 0)
>>>> [ 7680.520123] RPC: 33550 marshaling UNIX cred ffff88012e002300
>>>> [ 7680.520130] RPC: 33550 using AUTH_UNIX cred ffff88012e002300 to wrap rpc data
>>>> [ 7680.520136] RPC: 33550 xprt_transmit(32920)
>>>> [ 7680.520145] RPC:       xs_tcp_send_request(32920) = -32
>>>> [ 7680.520151] RPC:       xs_tcp_state_change client ffff88013e62d800...
>>>> [ 7680.520156] RPC:       state 7 conn 0 dead 0 zapped 1
>>
>>> I changed that debug to output sk_shutdown too. That has a value of 2
>>> (IE SEND_SHUTDOWN). Looking at tcp_sendmsg(), I see this:
>>
>>>           err = -EPIPE;
>>>           if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
>>>                   goto out_err;
>>
>>> which correlates with the trace "xs_tcp_send_request(32920) = -32". So,
>>> this looks like a problem in the sockets/tcp layer. The rpc layer issues
>>> a shutdown and then reconnects using the same socket. So either
>>> sk_shutdown needs zeroing once the shutdown completes or should be
>>> zeroed on subsequent connect. The latter sounds safer.

>> This patch for 2.6.34.1 fixes the issue:
>>
>> --- /home/company/software/src/linux-2.6.34.1/net/ipv4/tcp_output.c     2010-07-27 08:46:46.917000000 +0100
>> +++ net/ipv4/tcp_output.c       2010-07-27 09:19:16.000000000 +0100
>> @@ -2522,6 +2522,13 @@
>>          struct tcp_sock *tp = tcp_sk(sk);
>>          __u8 rcv_wscale;
>>
>> +       /* clear down any previous shutdown attempts so that
>> +        * reconnects on a socket that's been shutdown leave the
>> +        * socket in a usable state (otherwise tcp_sendmsg() returns
>> +        * -EPIPE).
>> +        */
>> +       sk->sk_shutdown = 0;
>> +
>>          /* We'll fix this up when we get a response from the other end.
>>           * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
>>           */
>>
>> As I mentioned in my first message, we first saw this issue in 2.6.32 as supplied by debian (linux-image-2.6.32-5-amd64 Version: 2.6.32-17). It looks like the same patch would fix the problem there too.
>>
>
> CC netdev
>
> This reminds me of a similar problem we had in the past, fixed with commit
> 1fdf475a (tcp: tcp_disconnect() should clear window_clamp)
>
> But tcp_disconnect() already clears sk->sk_shutdown
>
> If NFS calls tcp_disconnect(), then shutdown(), there is a problem.

If tcp_disconnect() was called at some point, I would expect to see a 
message from xs_error_report() in the debugging output.  Perhaps 
tcp_disconnect() is not being invoked at all?

> Maybe xs_tcp_shutdown() should make some sanity tests ?
>
> Following sequence is legal, and your patch might break it.
>
> fd = socket(...);
> shutdown(fd, SHUT_WR);
> ...
> connect(fd, ...);


I looked closely at some of Andy's debugging output from the 
linux-kernel mailing list archive.  I basically agree that the network 
layer is returning -EPIPE from tcp_sendmsg(), which the RPC client logic 
does not expect. But it's not clear to me how it gets into this state.

> [ 7728.520042] RPC:       worker connecting xprt ffff88013e62d800 via tcp to 10.1.6.102 (port 2049)
> [ 7728.520093] RPC:       ffff88013e62d800 connect status 115 connected 0 sock state 2

"sock state 2" => sk->sk_state == TCP_SYN_SENT

> [ 7728.520884] RPC:       xs_tcp_state_change client ffff88013e62d800...
> [ 7728.520889] RPC:       state 1 conn 0 dead 0 zapped 1

"state 1" => sk->sk_state == TCP_ESTABLISHED

RPC client wakes up this RPC task now that the connection is established.

> [ 7728.520896] RPC: 33550 __rpc_wake_up_task (now 4296824426)
> [ 7728.520900] RPC: 33550 disabling timer
> [ 7728.520906] RPC: 33550 removed from queue ffff88013e62db20 "xprt_pending"
> [ 7728.520912] RPC:       __rpc_wake_up_task done
> [ 7728.520932] RPC: 33550 __rpc_execute flags=0x1
> [ 7728.520937] RPC: 33550 xprt_connect_status: retrying
> [ 7728.520942] RPC: 33550 call_connect_status (status -11)

The awoken RPC task's status is -EAGAIN, which prevents a reconnection 
attempt.

> [ 7728.520947] RPC: 33550 call_transmit (status 0)
> [ 7728.520951] RPC: 33550 xprt_prepare_transmit
> [ 7728.520957] RPC: 33550 rpc_xdr_encode (status 0)
> [ 7728.520962] RPC: 33550 marshaling UNIX cred ffff88012e002300
> [ 7728.520969] RPC: 33550 using AUTH_UNIX cred ffff88012e002300 to wrap rpc data
> [ 7728.520976] RPC: 33550 xprt_transmit(32920)

RPC client encodes the request and attempts to send it.

> [ 7728.520984] RPC:       xs_tcp_send_request(32920) = -32

Network layer says -EPIPE, for some reason.  RPC client calls 
kernel_sock_shutdown(SHUT_WR).

> [ 7728.520997] RPC:       xs_tcp_state_change client ffff88013e62d800...
> [ 7728.521007] RPC:       state 4 conn 1 dead 0 zapped 1

"state 4" => sk->sk_state == TCP_FIN_WAIT1

The RPC client sets up a linger timeout.

> [ 7728.521013] RPC: 33550 call_status (status -32)
> [ 7728.521018] RPC: 33550 call_bind (status 0)
> [ 7728.521023] RPC: 33550 call_connect xprt ffff88013e62d800 is not connected
> [ 7728.521028] RPC: 33550 xprt_connect xprt ffff88013e62d800 is not connected
> [ 7728.521035] RPC: 33550 sleep_on(queue "xprt_pending" time 4296824426)
> [ 7728.521040] RPC: 33550 added to queue ffff88013e62db20 "xprt_pending"
> [ 7728.521045] RPC: 33550 setting alarm for 60000 ms

RPC client puts this RPC task to sleep.

> [ 7728.521439] RPC:       xs_tcp_state_change client ffff88013e62d800...
> [ 7728.521444] RPC:       state 5 conn 0 dead 0 zapped 1

"state 5" => sk->sk_state == TCP_FIN_WAIT2

> [ 7728.521602] RPC:       xs_tcp_state_change client ffff88013e62d800...
> [ 7728.521608] RPC:       state 7 conn 0 dead 0 zapped 1
> [ 7728.521612] RPC:       disconnected transport ffff88013e62d800

"state 7" => sk->sk_state == TCP_CLOSE

The RPC client marks the socket closed, and the linger timeout is
cancelled.  At this point, sk_shutdown should be set to zero, correct? 
I don't see an xs_error_report() call here, which would confirm that the 
socket took a trip through tcp_disconnect().

> [ 7728.521617] RPC: 33550 __rpc_wake_up_task (now 4296824426)
> [ 7728.521621] RPC: 33550 disabling timer
> [ 7728.521626] RPC: 33550 removed from queue ffff88013e62db20 "xprt_pending"
> [ 7728.521631] RPC:       __rpc_wake_up_task done

RPC client wakes up the RPC task.  Meanwhile...

> [ 7728.521636] RPC:       xs_tcp_state_change client ffff88013e62d800...
> [ 7728.521641] RPC:       state 7 conn 0 dead 0 zapped 1
> [ 7728.521645] RPC:       disconnected transport ffff88013e62d800
> [ 7728.521649] RPC:       xs_tcp_data_ready...

... network layer calls closed socket's data_ready method... while the 
awoken RPC task gets underway.

> [ 7728.521662] RPC: 33550 __rpc_execute flags=0x1
> [ 7728.521666] RPC: 33550 xprt_connect_status: retrying
> [ 7728.521671] RPC: 33550 call_connect_status (status -11)

The awoken RPC task's status is -EAGAIN, which prevents a reconnection 
attempt, even though there is no established connection.

RPC client barrels on to send the request again.

> [ 7728.521675] RPC: 33550 call_transmit (status 0)
> [ 7728.521679] RPC: 33550 xprt_prepare_transmit
> [ 7728.521683] RPC: 33550 rpc_xdr_encode (status 0)
> [ 7728.521688] RPC: 33550 marshaling UNIX cred ffff88012e002300
> [ 7728.521694] RPC: 33550 using AUTH_UNIX cred ffff88012e002300 to wrap rpc data
> [ 7728.521699] RPC: 33550 xprt_transmit(32920)
> [ 7728.521704] RPC:       xs_tcp_send_request(32920) = -32

RPC client attempts to send again, gets -EPIPE, and calls 
kernel_sock_shutdown(SHUT_WR).  If there is no connection established, 
the RPC client expects -ENOTCONN, in which case it will attempt to 
reconnect here.
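
A minimal user-space sketch of the condition under discussion, assuming only what the thread states: tcp_sendmsg() returns -EPIPE when SEND_SHUTDOWN is set in sk_shutdown, and tcp_disconnect() both clears sk_shutdown and calls the error-report callback. The struct and the model_* helpers are hypothetical stand-ins, not kernel code.

```c
#include <errno.h>

/* Same bit values as the kernel's RCV_SHUTDOWN / SEND_SHUTDOWN. */
#define MODEL_RCV_SHUTDOWN  1
#define MODEL_SEND_SHUTDOWN 2

/* Hypothetical stand-in for the few struct sock fields discussed here. */
struct model_sock {
	int sk_err;
	unsigned char sk_shutdown;
	int error_reports;	/* how often the error-report callback ran */
};

/* Mirrors the check quoted from tcp_sendmsg() earlier in the thread. */
static int model_sendmsg(const struct model_sock *sk, int len)
{
	if (sk->sk_err || (sk->sk_shutdown & MODEL_SEND_SHUTDOWN))
		return -EPIPE;
	return len;
}

/* What a SHUT_WR shutdown leaves behind in sk_shutdown. */
static void model_shutdown_wr(struct model_sock *sk)
{
	sk->sk_shutdown |= MODEL_SEND_SHUTDOWN;
}

/* Per the thread: clears sk_shutdown and always reports an error. */
static void model_disconnect(struct model_sock *sk)
{
	sk->sk_shutdown = 0;
	sk->error_reports++;
}

/* A reconnect that never goes through model_disconnect(): flags survive. */
static void model_reconnect_without_disconnect(struct model_sock *sk)
{
	(void)sk;	/* nothing is cleared on this path */
}
```

In this model, a shutdown followed by model_reconnect_without_disconnect() leaves model_sendmsg() returning -EPIPE with error_reports still at zero, which is the combination seen in the trace; only model_disconnect() makes sending possible again.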

> [ 7728.521709] RPC:       xs_tcp_state_change client ffff88013e62d800...
> [ 7728.521714] RPC:       state 7 conn 0 dead 0 zapped 1
> [ 7728.521718] RPC:       disconnected transport ffff88013e62d800

"state 7" => sk->sk_state == TCP_CLOSE

Following this, the RPC client attempts to retransmit the request 
repeatedly, but the socket remains in the TCP_CLOSE state.
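
For reading the "state N" and "sock state N" values in traces like the one above, a small lookup sketch may help. The numeric values follow the kernel's TCP state enumeration; the function name model_tcp_state_name() is made up for this illustration.

```c
#include <stddef.h>

/* Maps the sk_state numbers printed by the RPC debug output to names. */
static const char *model_tcp_state_name(int state)
{
	static const char *const names[] = {
		[1]  = "TCP_ESTABLISHED",
		[2]  = "TCP_SYN_SENT",
		[3]  = "TCP_SYN_RECV",
		[4]  = "TCP_FIN_WAIT1",
		[5]  = "TCP_FIN_WAIT2",
		[6]  = "TCP_TIME_WAIT",
		[7]  = "TCP_CLOSE",
		[8]  = "TCP_CLOSE_WAIT",
		[9]  = "TCP_LAST_ACK",
		[10] = "TCP_LISTEN",
		[11] = "TCP_CLOSING",
	};

	/* Index 0 is unused, so anything outside 1..11 is rejected. */
	if (state < 1 || state >= (int)(sizeof(names) / sizeof(names[0])))
		return "unknown";
	return names[state];
}
```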


* RE: nfs client hang
From: Andy Chittenden @ 2010-07-28  7:24 UTC
  To: Andy Chittenden, Chuck Lever, Eric Dumazet
  Cc: Linux Kernel Mailing List (linux-kernel@vger.kernel.org),
	Trond Myklebust, netdev, Linux NFS Mailing List
In-Reply-To: <4C4F174C.2000308@oracle.com>

resending as it seems to have been corrupted on LKML!

> The RPC client marks the socket closed, and the linger timeout is
> cancelled.  At this point, sk_shutdown should be set to zero, correct? 
> I don't see an xs_error_report() call here, which would confirm that the 
> socket took a trip through tcp_disconnect().

From my reading of tcp_disconnect(), it calls sk->sk_error_report(sk) unconditionally so as there's no xs_error_report(), that surely means the exact opposite: tcp_disconnect() wasn't called. If it's not called, sk_shutdown is not cleared. And my revised tracing confirmed that it was set to SEND_SHUTDOWN.

-- 
Andy, BlueArc Engineering


* Re: br_forward.c - rcu dereference warning
From: Johannes Berg @ 2010-07-28  7:33 UTC
  To: Stephen Hemminger; +Cc: paulmck, netdev
In-Reply-To: <20100727134223.5bb0be30@nehalam>

On Tue, 2010-07-27 at 13:42 -0700, Stephen Hemminger wrote:

> > 
> > Did you want me to test the patch?
> 
> Yes please, I can make sure it works, but not that it gets rid
> of your error

Yes, it fixed it, thanks.

johannes




* Re: BUG: net-next netns teardown bridge bug "waiting for lo to become free."
From: David Lamparter @ 2010-07-28  7:44 UTC
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20100727100249.51ffc893@nehalam>

On Tue, Jul 27, 2010 at 10:02:49AM -0700, Stephen Hemminger wrote:
> David Lamparter <equinox@diac24.net> wrote:
> > unregister_netdevice: waiting for lo to become free. Usage count = 4
> > 
> > unshare -n -- /bin/bash -c 'ip link add link lo veth0 type veth peer
> > name veth3 netns 1; brctl addbr br0; ip l s br0 up; ip a l; sleep 8'
> > 
> > (It happens when the netns is going down.)
> > 
> > Without the "ip l s br0 up" it does not happen;
[...]
> > I'm looking into this, but i'm not versed in Linux kernel code, so my
> > luck in fixing this might be limited. Some reference taken in bridge
> > "up"? Should the bridge be set "down" on netns teardown?
> 
> Loopback device is special and is referenced by routes etc.
> You need to manually remove bridge, and all routes, neighbors and cache
> before removing namespace.

That doesn't do. I have namespaces which are only network namespaces and
they have only one or two processes like openvpn. openvpn can have bugs,
so it might just terminate. If it is the last process in the namespace,
the namespace needs to go, and correctly. Even if I had some "master
daemon" to tear down the namespace at the "end", who tells me that that
will work perfectly and not die? It shouldn't be possible to break the
network stack by just killing the wrong process...

I'll try reading the code today, maybe I can come up with the right
dereferences in the right places.

-David


* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-07-28  7:48 UTC
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100727191911.GA16350@redhat.com>

On 07/27/2010 09:19 PM, Michael S. Tsirkin wrote:
>> Thinking a bit more about it, it kind of sucks that queueing to
>> another worker from worker->func() breaks flush.  Maybe the right
>> thing to do there is using atomic_t for done_seq?
> 
> I don't believe it will help: we might have:
> 
> worker1 runs work
> work requeues itself queued index = 1
> worker1 reads queued index = 1
> worker2 runs work
> work requeues itself queued index = 2
> worker2 runs work
> worker2 reads queued index = 2
> worker2 writes done index = 2
> worker1 writes done index = 1
> 
> As you see, done index got moved back.

Yeah, I think the flushing logic should be moved to the worker.  Are
you interested in doing it w/ your change?
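
The quoted interleaving can be replayed in a few lines. This is a single-threaded sketch under stated assumptions: the struct, its two fields, and model_replay_interleaving() are hypothetical names for the "queued index" and "done index" in the discussion, and each step is executed in the order listed above.

```c
/* Hypothetical model of the two counters discussed in the thread. */
struct model_work {
	int queued_seq;	/* bumped each time the work is queued */
	int done_seq;	/* written by a worker after it ran the work */
};

/* Replays the quoted steps in order and returns the final done_seq. */
static int model_replay_interleaving(struct model_work *w)
{
	int seen_by_worker1, seen_by_worker2;

	w->queued_seq = 0;
	w->done_seq = 0;

	/* worker1 runs work; work requeues itself: queued index = 1 */
	w->queued_seq = 1;
	seen_by_worker1 = w->queued_seq;	/* worker1 reads 1 */

	/* worker2 runs work; work requeues itself: queued index = 2 */
	w->queued_seq = 2;
	seen_by_worker2 = w->queued_seq;	/* worker2 reads 2 */

	w->done_seq = seen_by_worker2;		/* worker2 writes done = 2 */
	w->done_seq = seen_by_worker1;		/* worker1 writes done = 1 */

	return w->done_seq;
}
```

Every store in the replay is already a single indivisible step, yet done_seq ends at 1 while queued_seq is 2. That matches the argument that switching to atomic_t alone would not stop the done index from moving back.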

Thanks.

-- 
tejun


* Re: [Uclinux-dist-devel] [PATCH net-next] drivers/net/bfin_mac.c: Use pr_fmt, netdev_<level>
From: Mike Frysinger @ 2010-07-28  7:50 UTC
  To: Joe Perches; +Cc: Michael Hennerich, uclinux-dist-devel, LKML, netdev
In-Reply-To: <1280258531.24054.10.camel@Joe-Laptop.home>

On Tue, Jul 27, 2010 at 15:22, Joe Perches wrote:
> And some assorted neatening for checkpatch:
>
>        80 column reformatting (mostly comments)
>        argument alignment
>        couple of spelling/grammar typos corrected
>
> Added bfin_alloc_skb to centralize allocation/dcache invalidation
> Added get_mac_addr for symmetry
>
> $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
> total: 2 errors, 25 warnings, 1723 lines checked
> $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
> total: 0 errors, 0 warnings, 1743 lines checked
>
> Uncompiled, untested.

i dislike the mixing of whitespace and useful changes

if they were split, and they worked, then i wouldnt have a problem with them
-mike


* can: expected receive behavior broken
From: Matthias Fuchs @ 2010-07-28  7:56 UTC
  To: Linux Netdev List; +Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w

Hi,

I just noticed that the receive behavior of CAN sockets is broken
in current net-next-2.6.
I wrote some simple code that receives messages and echos them back to
the bus. When I now trigger one single message on the bus, I get
this message received and echoed back in an endless loop.

I do not touch the sockopts CAN_RAW_LOOPBACK or CAN_RAW_RECV_OWN_MSGS in my code.
Only (!) setting CAN_RAW_LOOPBACK to 0 helps at the moment. But this behavior
actually has nothing to do with LOOPBACK but more with RECV_OWN_MSGS.
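
As a reference point for the expected behavior, here is a small sketch of how the two socket options are documented to interact for frames sent from the local host. It is a model only: the struct and function names are hypothetical, and the real options are set with setsockopt() at level SOL_CAN_RAW. It assumes the documented defaults (loopback on, receive-own-messages off).

```c
#include <stdbool.h>

/* Hypothetical per-socket view of the two CAN_RAW options. */
struct model_can_opts {
	bool loopback;		/* CAN_RAW_LOOPBACK, documented default: on */
	bool recv_own_msgs;	/* CAN_RAW_RECV_OWN_MSGS, documented default: off */
};

static const struct model_can_opts model_can_defaults = { true, false };

/*
 * Should a frame sent by a local socket be delivered to a local receiver?
 * sender:      options of the socket that sent the frame
 * receiver:    options of the socket that might receive it
 * same_socket: true when sender and receiver are the same socket
 */
static bool model_local_frame_delivered(struct model_can_opts sender,
					struct model_can_opts receiver,
					bool same_socket)
{
	if (!sender.loopback)
		return false;	/* nothing is looped back at all */
	if (same_socket && !receiver.recv_own_msgs)
		return false;	/* a socket does not see its own frames */
	return true;
}
```

With the defaults, an echo program's own transmissions should not come back to the same socket, so the endless loop described above would not be expected. Turning loopback off hides the symptom in this model too, but only because it suppresses all local delivery.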

Matthias


* Re: [PATCH] Driver-core: Fix bluetooth network device rename  regression
From: Eric W. Biederman @ 2010-07-28  7:57 UTC
  To: Kay Sievers
  Cc: Greg KH, Greg KH, Johannes Berg, Andrew Morton, Rafael J. Wysocki,
	Maciej W. Rozycki, netdev
In-Reply-To: <AANLkTikoLQMH5LZCgKmQivTrtvtCf8oksk-HwBQQ+Tte@mail.gmail.com>

Kay Sievers <kay.sievers@vrfy.org> writes:

> Yeah, but most of these things we should have fixed over the last
> years. There is no single WAIT_FOR instruction left in udev rules. :)

Last time I looked there were quite a few attributes that were still
getting created late.  I would not be surprised if the common case
works fine, but I know of at least one and I think a couple of weird
cases that still have to do unpleasant things.

Still that is a project for another time.


>> At the subsystem level bus devices look better.
>> At the individual device level bus devices stacked on bus devices
>> appear to be a namespace disaster.
>
> They are usually created by the same code, in many cases by the same
> drivers, and have not been a real problem so far. As you said, network
> devices are special here, because of the ability to rename them from
> userspace.
>
> At some time in the future, when buses and classes are merged, I
> expect stuff can just set a flag to have a 'glue dir' created or not.
>
> For now 'glue dirs' are limited to be created between a bus and a
> class device. It could possibly be extended to be created between
> classes of different types to handle issues like this.

Sounds like a plan.  And now I'm off on vacation.

Have a good one.

Eric


* local_bh_enable_ip warning with conntrack/forcedeth
From: Johannes Berg @ 2010-07-28  8:12 UTC
  To: netdev

This is a bit weird ... forcedeth uses dev_kfree_skb_any(), which gets
to dev_kfree_skb_irq(), but then why consume_skb() which can't be called
in that context?

Or is the destructor check in dev_kfree_skb_irq() wrong?
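
A sketch of the control flow being questioned, written as a user-space model. It assumes, based only on this message and the backtrace below, that dev_kfree_skb_any() picks the IRQ variant in hard-IRQ or IRQs-off context, and that dev_kfree_skb_irq() frees immediately when the skb has no destructor. All names here are hypothetical, and the real functions may differ in detail.

```c
#include <stdbool.h>

enum model_free_path { MODEL_FREE_NOW, MODEL_FREE_DEFERRED };

/* Hypothetical summary of the skb properties that matter here. */
struct model_skb {
	bool has_destructor;	/* skb->destructor is set */
	bool has_conntrack;	/* skb still holds a conntrack reference */
};

/* Assumed decision chain: dev_kfree_skb_any() -> dev_kfree_skb_irq(). */
static enum model_free_path
model_kfree_skb_any(const struct model_skb *skb, bool hardirq_or_irqs_off)
{
	if (!hardirq_or_irqs_off)
		return MODEL_FREE_NOW;		/* plain dev_kfree_skb() */
	if (!skb->has_destructor)
		return MODEL_FREE_NOW;		/* the destructor check in question */
	return MODEL_FREE_DEFERRED;		/* queued for later freeing */
}

/*
 * True when an immediate free would tear down conntrack state in a
 * context where re-enabling bottom halves triggers the warning.
 */
static bool model_free_is_suspect(const struct model_skb *skb,
				  bool hardirq_or_irqs_off)
{
	return hardirq_or_irqs_off &&
	       skb->has_conntrack &&
	       model_kfree_skb_any(skb, hardirq_or_irqs_off) == MODEL_FREE_NOW;
}
```

In this model, an skb with a conntrack reference but no destructor is the case that slips through: the destructor check does not look at the other state released by skb_release_head_state().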

[  446.375065] WARNING: at /home/johannes/sys/wireless-testing/kernel/softirq.c:143 local_bh_enable_ip+0xba/0x110()
[  446.375069] Hardware name: MacBook5,1
[  446.375172] Pid: 0, comm: swapper Not tainted 2.6.35-rc6-wl-48390-gb63b887-dirty #176
[  446.375175] Call Trace:
[  446.375178]  <IRQ>  [<ffffffff81046b5f>] warn_slowpath_common+0x7f/0xc0
[  446.375197]  [<ffffffff81046bba>] warn_slowpath_null+0x1a/0x20
[  446.375201]  [<ffffffff8104eb0a>] local_bh_enable_ip+0xba/0x110
[  446.375207]  [<ffffffff8149072e>] _raw_spin_unlock_bh+0x3e/0x50
[  446.375214]  [<ffffffffa08c249c>] destroy_conntrack+0xfc/0x150 [nf_conntrack]
[  446.375227]  [<ffffffff813db358>] nf_conntrack_destroy+0x68/0x100
[  446.375236]  [<ffffffff813a71b5>] skb_release_head_state+0xe5/0x120
[  446.375240]  [<ffffffff813a6de6>] __kfree_skb+0x16/0xa0
[  446.375244]  [<ffffffff813a6e8e>] consume_skb+0x1e/0x40
[  446.375249]  [<ffffffff813b4405>] dev_kfree_skb_irq+0xa5/0xb0
[  446.375258]  [<ffffffff813b443c>] dev_kfree_skb_any+0x2c/0x50
[  446.375262]  [<ffffffff81364be9>] nv_tx_done_optimized+0x59/0x220
[  446.375271]  [<ffffffff81367ffa>] nv_napi_poll+0x6a/0x330
[  446.375276]  [<ffffffff813b228b>] net_rx_action+0x12b/0x300
[  446.375280]  [<ffffffff8104ee34>] __do_softirq+0x114/0x3d0
[  446.375290]  [<ffffffff8100360c>] call_softirq+0x1c/0x50
[  446.375294]  [<ffffffff8100500d>] do_softirq+0x7d/0xb0
[  446.375298]  [<ffffffff8104e8d5>] irq_exit+0x95/0xa0
[  446.375302]  [<ffffffff8100450c>] do_IRQ+0x7c/0xf0
[  446.375306]  [<ffffffff81490d13>] ret_from_intr+0x0/0xf





* Re: can: expected receive behavior broken
From: Wolfgang Grandegger @ 2010-07-28  8:17 UTC
  To: Matthias Fuchs; +Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w, Linux Netdev List
In-Reply-To: <201007280956.38957.matthias.fuchs-iOnpLzIbIdM@public.gmane.org>

On 07/28/2010 09:56 AM, Matthias Fuchs wrote:
> Hi,
> 
> I just noticed that the receive behavior of CAN sockets is broken
> in current net-next-2.6.
> I wrote some simple code that receives messages and echos them back to
> the bus. When I now trigger one single message on the bus, I get
> this message received and echoed back in an endless loop.
> 
> I do not touch the sockopts CAN_RAW_LOOPBACK or CAN_RAW_RECV_OWN_MSGS in my code.
> Only (!) setting CAN_RAW_LOOPBACK to 0 helps at the moment. But this behavior
> actually has nothing to do with LOOPBACK but more with RECV_OWN_MSGS.

Sounds weird! What driver are you using?

Wolfgang.


* Re: can: expected receive behavior broken
From: Matthias Fuchs @ 2010-07-28  8:23 UTC
  To: Wolfgang Grandegger
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w, Linux Netdev List
In-Reply-To: <4C4FE7AC.4010806-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

plx_pci/sja1000 + esd_usb2

Should be easy to reproduce.

Matthias

On Wednesday 28 July 2010 10:17, Wolfgang Grandegger wrote:
> On 07/28/2010 09:56 AM, Matthias Fuchs wrote:
> > Hi,
> > 
> > I just noticed that the receive behavior of CAN sockets is broken
> > in current net-next-2.6.
> > I wrote some simple code that receives messages and echos them back to
> > the bus. When I now trigger one single message on the bus, I get
> > this message received and echoed back in an endless loop.
> > 
> > I do not touch the sockopts CAN_RAW_LOOPBACK or CAN_RAW_RECV_OWN_MSGS in my code.
> > Only (!) setting CAN_RAW_LOOPBACK to 0 helps at the moment. But this behavior
> > actually has nothing to do with LOOPBACK but more with RECV_OWN_MSGS.
> 
> Sounds weird! What driver are you using?
> 
> Wolfgang.
> 
> 
> 


* noqueue on bonding devices
From: Simon Horman @ 2010-07-28  8:32 UTC
  To: Jay Vosburgh; +Cc: netdev

Hi Jay, Hi All,

I would just like to wonder out loud if it is intentional that bonding
devices default to noqueue, whereas for instance ethernet devices
default to a pfifo_fast with qlen 1000.

The reason that I ask, is that when setting up some bandwidth
control using tc I encountered some strange behaviour which
I eventually tracked down to the queue-length of the qdiscs being 1p -
inherited from noqueue, as opposed to 1000p which would occur
on an ethernet device.

It's trivial to work around, by either altering the txqueuelen on
the bonding device before adding the qdisc or by manually setting
the qlen of the qdisc. But it did take us a while to determine the
cause of the problem we were seeing. And as it seems inconsistent
I'm interested to know why this is the case.
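
To make the 1p versus 1000p observation concrete, here is a sketch of the inheritance rule that would explain it. It assumes, without confirmation in this thread, that a FIFO-style qdisc created without an explicit limit takes the device's txqueuelen and falls back to 1 when that is 0, and that a noqueue bonding device has a txqueuelen of 0. The function name is hypothetical.

```c
/*
 * Assumed default: use the device's txqueuelen as the packet limit,
 * but never go below one packet.
 */
static unsigned int model_default_fifo_limit(unsigned int tx_queue_len)
{
	return tx_queue_len ? tx_queue_len : 1;
}
```

Under that assumption, model_default_fifo_limit(0) gives the 1p seen on the bonding device and model_default_fifo_limit(1000) gives the 1000p seen on an ethernet device, which is also why raising txqueuelen before adding the qdisc works around it.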

On an unrelated note, MAINTAINERS lists bonding-devel@lists.sourceforge.net
but the (recent) archives seem to be entirely spam.  Is the MAINTAINERS
file correct?


* [PATCH] vlan: fix u64_stats
From: Changli Gao @ 2010-07-28  5:10 UTC
  To: Patrick McHardy; +Cc: David S. Miller, netdev, Changli Gao

rx_stats->rx_errors should be under the protection of rx_stats->syncp.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/8021q/vlan_dev.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 3d59c9b..d70d31a 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -211,7 +211,6 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		default:
 			break;
 		}
-		u64_stats_update_end(&rx_stats->syncp);
 	}
 
 	skb_pull_rcsum(skb, VLAN_HLEN);
@@ -221,8 +220,10 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		skb = vlan_check_reorder_header(skb);
 		if (!skb) {
 			rx_stats->rx_errors++;
+			u64_stats_update_end(&rx_stats->syncp);
 			goto err_unlock;
 		}
+		u64_stats_update_end(&rx_stats->syncp);
 	}
 
 	netif_rx(skb);


* Re: [PATCH] vlan: fix u64_stats
From: Changli Gao @ 2010-07-28  8:41 UTC
  To: Patrick McHardy; +Cc: David S. Miller, netdev, Changli Gao
In-Reply-To: <1280293833-8723-1-git-send-email-xiaosuo@gmail.com>

On Wed, Jul 28, 2010 at 1:10 PM, Changli Gao <xiaosuo@gmail.com> wrote:
> rx_stats->rx_errors should be under the protection of rx_stats->syncp.
>

I checked the code again. rx_errors is unsigned long, so it needs no
protection. Sorry for the noise.
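
The distinction can be sketched with a toy sequence counter. The syncp sequence exists so that readers on 32-bit hosts do not see a half-updated 64-bit counter, while a word-sized unsigned long is read in one load. The model below is single-threaded and has no memory barriers; the struct and the model_* helpers are hypothetical and only imitate the begin/end and fetch/retry pattern.

```c
#include <stdint.h>
#include <stdbool.h>

struct model_stats {
	unsigned int seq;		/* even: stable, odd: update in progress */
	uint64_t rx_bytes;		/* 64-bit: needs the sequence on 32-bit hosts */
	unsigned long rx_errors;	/* word-sized: one load, no sequence needed */
};

/* Writer side: bracket every update of the 64-bit counters. */
static void model_update_begin(struct model_stats *s) { s->seq++; }
static void model_update_end(struct model_stats *s)   { s->seq++; }

/* Reader side: sample the sequence, read the counters, then check. */
static unsigned int model_fetch_begin(const struct model_stats *s)
{
	return s->seq;
}

static bool model_fetch_retry(const struct model_stats *s, unsigned int start)
{
	/* Retry if an update was in progress or has happened since. */
	return (start & 1u) || s->seq != start;
}
```

A reader that samples the sequence while an update is open, or before one completes, is told to retry; a reader of rx_errors alone has no such window to worry about in this model.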

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* [BUG] bridge leaks 3 references on lo per up&down
From: David Lamparter @ 2010-07-28  9:58 UTC
  To: netdev; +Cc: Stephen Hemminger, bridge
In-Reply-To: <20100728074428.GB13413@jupiter.n2.diac24.net>

On Wed, Jul 28, 2010 at 09:44:28AM +0200, David Lamparter wrote:
> On Tue, Jul 27, 2010 at 10:02:49AM -0700, Stephen Hemminger wrote:
> > David Lamparter <equinox@diac24.net> wrote:
> > > unregister_netdevice: waiting for lo to become free. Usage count = 4
> > > 
> > > unshare -n -- /bin/bash -c 'ip link add link lo veth0 type veth peer
> > > name veth3 netns 1; brctl addbr br0; ip l s br0 up; ip a l; sleep 8'

When I do

unshare -n -- /bin/bash -c 'ip link add link lo veth0 type veth peer
	name veth3 netns 1; brctl addbr br0; \
	ip l s br0 up; sleep 8; ip l s br0 down; sleep 1;
	ip l s br0 up; sleep 8;'

I get "usage count = 7" on lo; when I add another up & down, I get
"usage count = 10". 4 times makes it a merry 13 refs...
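
The reported numbers fit a simple linear pattern, sketched here. The baseline of 1 is inferred from the figures above (4, 7, 10, 13) and is not stated anywhere in the thread; the function name is made up.

```c
/*
 * Usage count on lo at netns teardown as a function of how many
 * times the bridge was brought up: 3 leaked references per "up",
 * on top of an inferred baseline of 1.
 */
static int model_lo_usage_count(int bridge_ups)
{
	return 1 + 3 * bridge_ups;
}
```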


-David



* Re: [Uclinux-dist-devel] [PATCH net-next] drivers/net/bfin_mac.c: Use  pr_fmt, netdev_<level>
From: Joe Perches @ 2010-07-28 10:00 UTC
  To: Mike Frysinger; +Cc: Michael Hennerich, uclinux-dist-devel, LKML, netdev
In-Reply-To: <AANLkTi=p=-ovBaqyqBdEW=W4E1ArWsiyUBwy37GtXOyu@mail.gmail.com>

On Wed, 2010-07-28 at 03:50 -0400, Mike Frysinger wrote:
> On Tue, Jul 27, 2010 at 15:22, Joe Perches wrote:
> > $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
> > total: 2 errors, 25 warnings, 1723 lines checked
> > $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
> > total: 0 errors, 0 warnings, 1743 lines checked
> i dislike the mixing of whitespace and useful changes
> if they were split, and they worked, then i wouldnt have a problem with them

Is that a nak or a dislike?

Also, is Michael Hennerich actually the maintainer for bfin_mac?
He had been  listed as MAINTAINER for over a year but he hasn't
ever written or acked a patch for this file.

commit making Michael Hennerich bfin_mac MAINTAINER
$ git log -1 6c83429a
commit 6c83429a1c32c914dfb81939cc2ddece97e48294
Author: Mike Frysinger <vapier@gentoo.org>
Date:   Sun May 24 02:13:15 2009 -0400

    MAINTAINERS: update Blackfin items
    
    With Bryan Wu having moved on to another job, push the slack onto some
    other ADI lackeys.
    
    Signed-off-by: Mike Frysinger <vapier@gentoo.org>

$ ./scripts/get_maintainer.pl -f drivers/net/bfin_mac.c \
	--rolestats --git-since=5-years-ago --nol
Michael Hennerich <michael.hennerich@analog.com> (supporter:BLACKFIN EMAC DRIVER)
"David S. Miller" <davem@davemloft.net> (commit_signer:41/65=63%)
Jeff Garzik <jgarzik@redhat.com> (commit_signer:31/65=48%)
Bryan Wu <cooloney@kernel.org> (commit_signer:26/65=40%)
Mike Frysinger <vapier@gentoo.org> (commit_signer:20/65=31%)
Sonic Zhang <sonic.zhang@analog.com> (commit_signer:8/65=12%)





* RE: [Uclinux-dist-devel] [PATCH net-next] drivers/net/bfin_mac.c: Use  pr_fmt, netdev_<level>
From: Hennerich, Michael @ 2010-07-28 10:41 UTC
  To: Joe Perches, Mike Frysinger; +Cc: uclinux-dist-devel, LKML, netdev
In-Reply-To: <1280311240.24054.103.camel@Joe-Laptop.home>

Joe Perches wrote on 2010-07-28:
> On Wed, 2010-07-28 at 03:50 -0400, Mike Frysinger wrote:
>> On Tue, Jul 27, 2010 at 15:22, Joe Perches wrote:
>>> $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
>>> total: 2 errors, 25 warnings, 1723 lines checked
>>> $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
>>> total: 0 errors, 0 warnings, 1743 lines checked
>> i dislike the mixing of whitespace and useful changes
>> if they were split, and they worked, then i wouldnt have a problem with them
>
> Is that a nak or a dislike?

I think what Mike meant was - two patches would be better.
I'm currently cloning the tree this patch applies to.
I'll ACK once I verified the patch.

>
> Also, is Michael Hennerich actually the maintainer for bfin_mac?
> He had been  listed as MAINTAINER for over a year but he hasn't ever
> written or acked a patch for this file.
>
> commit making Michael Hennerich bfin_mac MAINTAINER
> $ git log -1 6c83429a
> commit 6c83429a1c32c914dfb81939cc2ddece97e48294
> Author: Mike Frysinger <vapier@gentoo.org>
> Date:   Sun May 24 02:13:15 2009 -0400
>
>     MAINTAINERS: update Blackfin items
>
>     With Bryan Wu having moved on to another job, push the slack onto some
>     other ADI lackeys.
>
>     Signed-off-by: Mike Frysinger <vapier@gentoo.org>
>
> $ ./scripts/get_maintainer.pl -f drivers/net/bfin_mac.c \
>       --rolestats --git-since=5-years-ago --nol
> Michael Hennerich <michael.hennerich@analog.com> (supporter:BLACKFIN EMAC DRIVER)
> "David S. Miller" <davem@davemloft.net> (commit_signer:41/65=63%)
> Jeff Garzik <jgarzik@redhat.com> (commit_signer:31/65=48%)
> Bryan Wu <cooloney@kernel.org> (commit_signer:26/65=40%)
> Mike Frysinger <vapier@gentoo.org> (commit_signer:20/65=31%)
> Sonic Zhang <sonic.zhang@analog.com> (commit_signer:8/65=12%)
>
>

Greetings,
Michael

Analog Devices GmbH      Wilhelm-Wagenfeld-Str. 6      80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 4036 Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif




* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-28 10:48 UTC
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4FE0CF.3070506@kernel.org>

On Wed, Jul 28, 2010 at 09:48:31AM +0200, Tejun Heo wrote:
> On 07/27/2010 09:19 PM, Michael S. Tsirkin wrote:
> >> Thinking a bit more about it, it kind of sucks that queueing to
> >> another worker from worker->func() breaks flush.  Maybe the right
> >> thing to do there is using atomic_t for done_seq?
> > 
> > I don't believe it will help: we might have:
> > 
> > worker1 runs work
> > work requeues itself queued index = 1
> > worker1 reads queued index = 1
> > worker2 runs work
> > work requeues itself queued index = 2
> > worker2 runs work
> > worker2 reads queued index = 2
> > worker2 writes done index = 2
> > worker1 writes done index = 1
> > 
> > As you see, done index got moved back.
> 
> Yeah, I think the flushing logic should be moved to the worker.
> Are you interested in doing it w/ your change?
> 
> Thanks.

I'm unsure how flush_work operates under these conditions.  E.g. in
workqueue.c, this seems to work by keeping a pointer to current
workqueue in the work.  But what prevents us from destroying the
workqueue when work might not be running?

Is this currently broken if you use multiple workqueues
for the same work? If yes, I propose we do as I did,
making flush_work get worker pointer, and only flushing
on that worker.
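
The interleaving quoted above can be replayed deterministically in user
space. This is only a sketch: struct work_model and the helper names are
made up for illustration (they are not vhost or workqueue code), and the
two "workers" are simply steps executed in a fixed order.

```c
#include <assert.h>

/* Hypothetical model of one work item shared by two workers. */
struct work_model {
	int queued_index;	/* bumped each time the work requeues itself */
	int done_index;		/* written by a worker after it ran the work */
};

/* The work function requeues itself, advancing the queued index. */
static void work_requeue(struct work_model *w)
{
	w->queued_index++;
}

/* A worker samples the queued index it will later publish as done. */
static int worker_read(const struct work_model *w)
{
	return w->queued_index;
}

/* A worker publishes its sample; nothing rejects a stale value. */
static void worker_write_done(struct work_model *w, int sampled)
{
	w->done_index = sampled;
}

/* Replay the quoted schedule and return the final done index. */
int replay_race(void)
{
	struct work_model w = { 0, 0 };
	int seen1, seen2;

	work_requeue(&w);		/* worker1 runs work, queued index = 1 */
	seen1 = worker_read(&w);	/* worker1 reads queued index = 1 */
	work_requeue(&w);		/* worker2 runs work, queued index = 2 */
	seen2 = worker_read(&w);	/* worker2 reads queued index = 2 */
	worker_write_done(&w, seen2);	/* worker2 writes done index = 2 */
	assert(w.done_index == 2);
	worker_write_done(&w, seen1);	/* worker1 writes done index = 1 */
	return w.done_index;
}
```

Each store in the replay is a single write, so making the counter atomic
would not change the outcome: the last writer still publishes the older
sample.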

> -- 
> tejun


* [PATCH RHEL6 RESEND] kernel performance optimization with CONFIG_DEBUG_RODATA
From: Xiaotian Feng @ 2010-07-28 11:00 UTC (permalink / raw)
  To: linux-mm, linux-nfs, netdev
  Cc: cl, a.p.zijlstra, Xiaotian Feng, lwang, penberg, davem

backport of following commits to improve x86_64 kernel performance with
CONFIG_DEBUG_RODATA:

straightforward backport of:
commit b9af7c0d (x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA)
commit 74e08179 (x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA)
commit d6cc1c3a (x86-64: add comment for RODATA large page retainment)

Resolves bz557364

We still have CONFIG_DEBUG_RODATA set for regular rhel6 kernel, so this fix is still needed.

There's no kabi breakage with latest rhel6 code (don't know why...)
Brew build is available at:
https://brewweb.devel.redhat.com/taskinfo?taskID=2627856

Test has been done for RHTS kernel tier1:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=168225

No regressions are introduced by this patch. Reviews and comments are welcome.
---
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -2,7 +2,13 @@
 #define _ASM_X86_SECTIONS_H
 
 #include <asm-generic/sections.h>
+#include <asm/uaccess.h>
 
 extern char __brk_base[], __brk_limit[];
+extern struct exception_table_entry __stop___ex_table[];
+
+#if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA)
+extern char __end_rodata_hpage_align[];
+#endif
 
 #endif	/* _ASM_X86_SECTIONS_H */
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -262,11 +262,11 @@ ENTRY(secondary_startup_64)
 	.quad	x86_64_start_kernel
 	ENTRY(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
-	__FINITDATA
 
 	ENTRY(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
+	__FINITDATA
 
 bad_address:
 	jmp bad_address
@@ -340,6 +340,7 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
+	.data
 	/*
 	 * This default setting generates an ident mapping at address 0x100000
 	 * and a mapping for the kernel that precisely maps virtual address
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -41,6 +41,32 @@ ENTRY(phys_startup_64)
 jiffies_64 = jiffies;
 #endif
 
+#if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA)
+/*
+ * On 64-bit, align RODATA to 2MB so that even with CONFIG_DEBUG_RODATA
+ * we retain large page mappings for boundaries spanning kernel text, rodata
+ * and data sections.
+ *
+ * However, kernel identity mappings will have different RWX permissions
+ * to the pages mapping to text and to the pages padding (which are freed) the
+ * text section. Hence kernel identity mappings will be broken to smaller
+ * pages. For 64-bit, kernel text and kernel identity mappings are different,
+ * so we can enable protection checks that come with CONFIG_DEBUG_RODATA,
+ * as well as retain 2MB large page mappings for kernel text.
+ */
+#define X64_ALIGN_DEBUG_RODATA_BEGIN   . = ALIGN(HPAGE_SIZE);
+
+#define X64_ALIGN_DEBUG_RODATA_END				\
+		. = ALIGN(HPAGE_SIZE);				\
+		__end_rodata_hpage_align = .;
+
+#else
+
+#define X64_ALIGN_DEBUG_RODATA_BEGIN
+#define X64_ALIGN_DEBUG_RODATA_END
+
+#endif
+
 PHDRS {
 	text PT_LOAD FLAGS(5);          /* R_E */
 	data PT_LOAD FLAGS(7);          /* RWE */
@@ -90,7 +116,9 @@ SECTIONS
 
 	EXCEPTION_TABLE(16) :text = 0x9090
 
+	X64_ALIGN_DEBUG_RODATA_BEGIN
 	RO_DATA(PAGE_SIZE)
+	X64_ALIGN_DEBUG_RODATA_END
 
 	/* Data */
 	.data : AT(ADDR(.data) - LOAD_OFFSET) {
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -761,7 +761,7 @@ static int kernel_set_to_readonly;
 
 void set_kernel_text_rw(void)
 {
-	unsigned long start = PFN_ALIGN(_stext);
+	unsigned long start = PFN_ALIGN(_text);
 	unsigned long end = PFN_ALIGN(__start_rodata);
 
 	if (!kernel_set_to_readonly)
@@ -775,7 +775,7 @@ void set_kernel_text_rw(void)
 
 void set_kernel_text_ro(void)
 {
-	unsigned long start = PFN_ALIGN(_stext);
+	unsigned long start = PFN_ALIGN(_text);
 	unsigned long end = PFN_ALIGN(__start_rodata);
 
 	if (!kernel_set_to_readonly)
@@ -789,9 +789,13 @@ void set_kernel_text_ro(void)
 
 void mark_rodata_ro(void)
 {
-	unsigned long start = PFN_ALIGN(_stext), end = PFN_ALIGN(__end_rodata);
+	unsigned long start = PFN_ALIGN(_text);
 	unsigned long rodata_start =
 		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
+	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
+	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
+	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
+	unsigned long data_start = (unsigned long) &_sdata;
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -814,6 +818,14 @@ void mark_rodata_ro(void)
 	printk(KERN_INFO "Testing CPA: again\n");
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
+
+	free_init_pages("unused kernel memory",
+			(unsigned long) page_address(virt_to_page(text_end)),
+			(unsigned long)
+				 page_address(virt_to_page(rodata_start)));
+	free_init_pages("unused kernel memory",
+			(unsigned long) page_address(virt_to_page(rodata_end)),
+			(unsigned long) page_address(virt_to_page(data_start)));
 }
 
 #endif
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -279,6 +279,20 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
 		   __pa((unsigned long)__end_rodata) >> PAGE_SHIFT))
 		pgprot_val(forbidden) |= _PAGE_RW;
 
+#if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA)
+	/*
+	 * Kernel text mappings for the large page aligned .rodata section
+	 * will be read-only. For the kernel identity mappings covering
+	 * the holes caused by this alignment can be anything.
+	 *
+	 * This will preserve the large page mappings for kernel text/data
+	 * at no extra cost.
+	 */
+	if (within(address, (unsigned long)_text,
+		   (unsigned long)__end_rodata_hpage_align))
+		pgprot_val(forbidden) |= _PAGE_RW;
+#endif
+
 	prot = __pgprot(pgprot_val(prot) & ~pgprot_val(forbidden));
 
 	return prot;
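
For readers following the arithmetic in the patch above: the
`. = ALIGN(HPAGE_SIZE)` statements in the linker script and the
rodata_start computation in mark_rodata_ro() are both the usual
power-of-two round-up. A minimal user-space sketch, assuming 4 KiB pages
and 2 MiB large pages; the SKETCH_* constants and both helper names are
made up for illustration and are not kernel symbols:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE	4096UL			/* assumed 4 KiB base page */
#define SKETCH_HPAGE_SIZE	(2UL * 1024 * 1024)	/* assumed 2 MiB large page */

/* Round @addr up to the next multiple of @align (must be a power of two). */
unsigned long align_up(unsigned long addr, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Bytes of padding created when a section ending at @end is aligned. */
unsigned long align_padding(unsigned long end, unsigned long align)
{
	return align_up(end, align) - end;
}
```

For example, a section ending at 0x2ff000 is padded by 0x101000 bytes to
reach the next 2 MiB boundary at 0x400000. Padding of this kind is what
the two free_init_pages() calls in the patch hand back.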

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH RHEL6 RESEND] kernel performance optimization with CONFIG_DEBUG_RODATA
From: Xiaotian Feng @ 2010-07-28 11:03 UTC (permalink / raw)
  Cc: Xiaotian Feng, linux-mm, linux-nfs, netdev, cl, a.p.zijlstra,
	lwang, penberg, davem
In-Reply-To: <20100728110043.27677.13908.sendpatchset@dhcp-65-180.nay.redhat.com>

Oops, my script got something wrong. Sorry for the noise; please ignore
this mail.


* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-07-28 12:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100728104858.GB30643@redhat.com>

Hello,

On 07/28/2010 12:48 PM, Michael S. Tsirkin wrote:
> I'm unsure how flush_work operates under these conditions.  E.g. in
> workqueue.c, this seems to work by keeping a pointer to current
> workqueue in the work.  But what prevents us from destroying the
> workqueue when work might not be running?

In cmwq, work points to the gcwq it was on, which keeps track of all
the works in progress, so flushing work which is on a destroyed
workqueue should be fine, but in the original implementation, it would
end up accessing freed memory.

> Is this currently broken if you use multiple workqueues
> for the same work? If yes, I propose we do as I did,
> making flush_work get worker pointer, and only flushing
> on that worker.

The original semantics of workqueue is that flush_work() guarantees
that the work has finished executing on the workqueue it was last
queued on.  Adding @worker to flush_work() is okay, I think.
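
One way to picture flushing against a specific worker is to keep the
sequence counters in the worker rather than in the work. What follows is
a user-space sketch under that assumption, not the workqueue or vhost
API; struct worker_model and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-worker bookkeeping for one work item. */
struct worker_model {
	unsigned int queue_seq;	/* incremented when work is queued here */
	unsigned int done_seq;	/* sequence of the last run that finished */
};

/* Queue on this worker; returns the sequence a flusher should wait for. */
unsigned int worker_queue(struct worker_model *wk)
{
	return ++wk->queue_seq;
}

/* The worker finished the run that carried sequence @seq. */
void worker_complete(struct worker_model *wk, unsigned int seq)
{
	/* A run cannot complete before it was queued. */
	assert((int)(wk->queue_seq - seq) >= 0);
	wk->done_seq = seq;
}

/* True once this worker completed @seq; signed difference tolerates wrap. */
bool worker_flushed(const struct worker_model *wk, unsigned int seq)
{
	return (int)(wk->done_seq - seq) >= 0;
}
```

Because only the owning worker writes its own done_seq in this model, a
second worker finishing late cannot move the counter backwards.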

Thanks.

-- 
tejun


* Re: [BUG] bridge leaks 3 references on lo per up&down
From: David Lamparter @ 2010-07-28 12:03 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, Stephen Hemminger, bridge
In-Reply-To: <20100728095825.GC13413@jupiter.n2.diac24.net>

On Wed, Jul 28, 2010 at 11:58:25AM +0200, David Lamparter wrote:
> unshare -n -- /bin/bash -c 'ip link add link lo veth0 type veth peer
>       name veth3 netns 1; brctl addbr br0; \
>       ip l s br0 up; sleep 8; ip l s br0 down; sleep 1;
>       ip l s br0 up; sleep 8;'
>
> I get "usage count = 7" on lo; when I add another up & down, I get
> "usage count = 10". 4 times makes it a merry 13 refs...

It only happens with BRIDGE_IGMP_SNOOPING. As we don't need that, our
"fix" is just to run with it disabled... works pretty fine.

Herbert, you implemented bridge multicast support, maybe you can figure
out where these references are taken?


-David



* RE: [PATCH net-next] drivers/net/bfin_mac.c: Use pr_fmt, netdev_<level>
From: Hennerich, Michael @ 2010-07-28 13:23 UTC (permalink / raw)
  To: Joe Perches; +Cc: uclinux-dist-devel, netdev, LKML
In-Reply-To: <1280258531.24054.10.camel@Joe-Laptop.home>

Joe Perches wrote on 2010-07-27:
> And some assorted neatening for checkpatch:
>
>       80 column reformatting (mostly comments)
>       argument alignment
>       couple of spelling/grammar typos corrected
>
> Added bfin_alloc_skb to centralize allocation/dcache invalidation
> Added get_mac_addr for symmetry
>
> $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
> total: 2 errors, 25 warnings, 1723 lines checked
> $ ./scripts/checkpatch.pl -f drivers/net/bfin_mac.c | grep "^total:"
> total: 0 errors, 0 warnings, 1743 lines checked
>
> Uncompiled, untested.

One little thing -

drivers/net/bfin_mac.c: In function ‘bfin_alloc_skb’:
drivers/net/bfin_mac.c:89: error: ‘PKT_BUF_SIZE’ undeclared (first use in this function)

If you rename the PKT_BUF_SZ define to match the name used in this patch,
everything compiles and works cleanly.
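
To illustrate the point about the define, here is a user-space model of
the new allocation helper with one macro name used consistently. It is a
sketch only: the value 1580, NET_IP_ALIGN = 2, struct buf_model and the
model_* helpers are assumptions for illustration, with malloc() standing
in for dev_alloc_skb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PKT_BUF_SIZE	1580	/* assumed value; the point is one consistent name */
#define NET_IP_ALIGN	2	/* assumed value for this sketch */

/* Hypothetical stand-in for struct sk_buff. */
struct buf_model {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of packet data after the headroom */
	size_t size;		/* total bytes allocated */
};

/* Allocate a buffer and reserve NET_IP_ALIGN bytes of headroom. */
struct buf_model *model_alloc_buf(void)
{
	struct buf_model *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->size = PKT_BUF_SIZE + NET_IP_ALIGN;
	b->head = malloc(b->size);
	if (!b->head) {
		free(b);
		return NULL;
	}
	b->data = b->head + NET_IP_ALIGN;	/* models skb_reserve() */
	return b;
}

/* Bytes reserved in front of the packet data. */
size_t model_headroom(const struct buf_model *b)
{
	assert(b != NULL);
	return (size_t)(b->data - b->head);
}

/* Release a buffer obtained from model_alloc_buf(). */
void model_free_buf(struct buf_model *b)
{
	if (b) {
		free(b->head);
		free(b);
	}
}
```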

Acked-by: Michael Hennerich <michael.hennerich@analog.com>

>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>  drivers/net/bfin_mac.c |  284 ++++++++++++++++++++++++++----------------------
>  1 files changed, 152 insertions(+), 132 deletions(-)
> diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
> index 012613f..46a4576 100644
> --- a/drivers/net/bfin_mac.c
> +++ b/drivers/net/bfin_mac.c
> @@ -8,6 +8,8 @@
>   * Licensed under the GPL-2 or later.
>   */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/kernel.h>
> @@ -81,6 +83,24 @@ static u16 pin_req[] = P_RMII0;
>  static u16 pin_req[] = P_MII0;
>  #endif
>
> +static struct sk_buff *bfin_alloc_skb(void)
> +{
> +     /* allocate a new skb */
> +     struct sk_buff *new_skb = dev_alloc_skb(PKT_BUF_SIZE + NET_IP_ALIGN);
> +
> +     if (!new_skb)
> +             return NULL;
> +
> +     skb_reserve(new_skb, NET_IP_ALIGN);
> +     /* Invalidate the data cache of skb->data range when it is write back
> +      * cache.  It will prevent overwriting the new data from DMA
> +      */
> +     blackfin_dcache_invalidate_range((unsigned long)new_skb->head,
> +                                      (unsigned long)new_skb->end);
> +
> +     return new_skb;
> +}
> +
>  static void desc_list_free(void)
>  {
>       struct net_dma_desc_rx *r;
> @@ -132,14 +152,14 @@ static int desc_list_init(void)
>  #endif
>
>       tx_desc = bfin_mac_alloc(&dma_handle,
> -                             sizeof(struct net_dma_desc_tx) *
> -                             CONFIG_BFIN_TX_DESC_NUM);
> +                              sizeof(struct net_dma_desc_tx) *
> +                              CONFIG_BFIN_TX_DESC_NUM);
>       if (tx_desc == NULL)
>               goto init_error;
>
>       rx_desc = bfin_mac_alloc(&dma_handle,
> -                             sizeof(struct net_dma_desc_rx) *
> -                             CONFIG_BFIN_RX_DESC_NUM);
> +                              sizeof(struct net_dma_desc_rx) *
> +                              CONFIG_BFIN_RX_DESC_NUM);
>       if (rx_desc == NULL)
>               goto init_error;
> @@ -192,19 +212,11 @@ static int desc_list_init(void)
>               struct dma_descriptor *a = &(r->desc_a);
>               struct dma_descriptor *b = &(r->desc_b);
> -             /* allocate a new skb for next time receive */
> -             new_skb = dev_alloc_skb(PKT_BUF_SZ + NET_IP_ALIGN);
> +             new_skb = bfin_alloc_skb();
>               if (!new_skb) {
> -                     printk(KERN_NOTICE DRV_NAME
> -                            ": init: low on mem - packet dropped\n");
> +                     pr_notice("init: low on mem - packet dropped\n");
>                       goto init_error;
>               }
> -             skb_reserve(new_skb, NET_IP_ALIGN);
> -             /* Invidate the data cache of skb->data range when it is
> write back
> -              * cache. It will prevent overwritting the new data from
> DMA
> -              */
> -             blackfin_dcache_invalidate_range((unsigned long)new_skb-
>  head, -                                       (unsigned long)new_skb->end);          r->skb = new_skb;
>>
>               /* @@ -229,8 +241,8 @@ static int desc_list_init(void)           * 6 half
>  words is desc size            * large desc flow               */
> -             b->config = DMAEN | WNR | WDSIZE_32 | DI_EN |
> -                             NDSIZE_6 | DMAFLOW_LARGE;
> +             b->config = (DMAEN | WNR | WDSIZE_32 | DI_EN |
> +                          NDSIZE_6 | DMAFLOW_LARGE);
>               b->start_addr = (unsigned long)(&(r->status));
>               b->x_count = 0;
> @@ -246,7 +258,7 @@ static int desc_list_init(void)
>
>  init_error:
>       desc_list_free();
> -     printk(KERN_ERR DRV_NAME ": kmalloc failed\n");
> +     pr_err("kmalloc failed\n");
>       return -ENOMEM;
>  }
> @@ -263,12 +275,11 @@ static int bfin_mdio_poll(void)
>
>       /* poll the STABUSY bit */
>       while ((bfin_read_EMAC_STAADD()) & STABUSY) {
> -             udelay(1);
>               if (timeout_cnt-- < 0) {
> -                     printk(KERN_ERR DRV_NAME
> -                     ": wait MDC/MDIO transaction to complete timeout\n");
> +                     pr_err("wait MDC/MDIO transaction to complete timeout\n");
>                       return -ETIMEDOUT;
>               }
> +             udelay(1);
>       }
>
>       return 0;
> @@ -284,15 +295,15 @@ static int bfin_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
>               return ret;
>
>       /* read mode */
> -     bfin_write_EMAC_STAADD(SET_PHYAD((u16) phy_addr) |
> +     bfin_write_EMAC_STAADD((SET_PHYAD((u16) phy_addr) |
>                               SET_REGAD((u16) regnum) |
> -                             STABUSY);
> +                             STABUSY));
>
>       ret = bfin_mdio_poll();
>       if (ret)
>               return ret;
> -     return (int) bfin_read_EMAC_STADAT();
> +     return (int)bfin_read_EMAC_STADAT();
>  }
>
>  /* Write an off-chip register in a PHY through the MDC/MDIO port */
> @@ -308,10 +319,10 @@ static int bfin_mdiobus_write(struct mii_bus *bus, int phy_addr, int regnum,
>       bfin_write_EMAC_STADAT((u32) value);
>
>       /* write mode */
> -     bfin_write_EMAC_STAADD(SET_PHYAD((u16) phy_addr) |
> +     bfin_write_EMAC_STAADD((SET_PHYAD((u16) phy_addr) |
>                               SET_REGAD((u16) regnum) |
>                               STAOP |
> -                             STABUSY);
> +                             STABUSY));
>
>       return bfin_mdio_poll();
>  }
> @@ -356,9 +367,9 @@ static void bfin_mac_adjust_link(struct net_device *dev)
>                               opmode &= ~(RMII_10);
>                               break;
>                       default:
> -                             printk(KERN_WARNING
> -                                     "%s: Ack!  Speed (%d) is not 10/100!\n",
> -                                     DRV_NAME, phydev->speed);
> +                             netdev_warn(dev,
> +                                         "Ack!  Speed (%d) is not 10/100!\n",
> +                                         phydev->speed);
>                               break;
>                       }
>                       bfin_write_EMAC_OPMODE(opmode);
> @@ -382,7 +393,7 @@ static void bfin_mac_adjust_link(struct net_device *dev)
>       if (new_state) {
>               u32 opmode = bfin_read_EMAC_OPMODE();
>               phy_print_status(phydev);
> -             pr_debug("EMAC_OPMODE = 0x%08x\n", opmode);
> +             netdev_dbg(dev, "EMAC_OPMODE = 0x%08x\n", opmode);
>       }
>
>       spin_unlock_irqrestore(&lp->lock, flags);
> @@ -421,35 +432,35 @@ static int mii_probe(struct net_device *dev)
>               break; /* found it */
>       }
> -     /* now we are supposed to have a proper phydev, to attach to... */
> +     /* now we are supposed to have a proper phydev to attach to... */
>       if (!phydev) {
> -             printk(KERN_INFO "%s: Don't found any phy device at all\n",
> -                     dev->name);
> +             netdev_info(dev, "No PHY device found\n");
>               return -ENODEV;
>       }
>
>  #if defined(CONFIG_BFIN_MAC_RMII)
>       phydev = phy_connect(dev, dev_name(&phydev->dev), &bfin_mac_adjust_link,
> -                     0, PHY_INTERFACE_MODE_RMII);
> +                          0, PHY_INTERFACE_MODE_RMII);
>  #else
>       phydev = phy_connect(dev, dev_name(&phydev->dev), &bfin_mac_adjust_link,
> -                     0, PHY_INTERFACE_MODE_MII);
> +                          0, PHY_INTERFACE_MODE_MII);
>  #endif
>
>       if (IS_ERR(phydev)) {
> -             printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
> +             netdev_err(dev, "Could not attach to PHY\n");
>               return PTR_ERR(phydev);
>       }
>
>       /* mask with MAC supported features */
> -     phydev->supported &= (SUPPORTED_10baseT_Half
> -                           | SUPPORTED_10baseT_Full
> -                           | SUPPORTED_100baseT_Half
> -                           | SUPPORTED_100baseT_Full
> -                           | SUPPORTED_Autoneg
> -                           | SUPPORTED_Pause | SUPPORTED_Asym_Pause
> -                           | SUPPORTED_MII
> -                           | SUPPORTED_TP);
> +     phydev->supported &= (SUPPORTED_10baseT_Half |
> +                           SUPPORTED_10baseT_Full |
> +                           SUPPORTED_100baseT_Half |
> +                           SUPPORTED_100baseT_Full |
> +                           SUPPORTED_Autoneg |
> +                           SUPPORTED_Pause |
> +                           SUPPORTED_Asym_Pause |
> +                           SUPPORTED_MII |
> +                           SUPPORTED_TP);
>
>       phydev->advertising = phydev->supported;
> @@ -458,11 +469,11 @@ static int mii_probe(struct net_device *dev)
>       lp->old_duplex = -1;
>       lp->phydev = phydev;
> -     printk(KERN_INFO "%s: attached PHY driver [%s] "
> -            "(mii_bus:phy_addr=%s, irq=%d, mdc_clk=%dHz(mdc_div=%d)"
> -            "@sclk=%dMHz)\n",
> -            DRV_NAME, phydev->drv->name, dev_name(&phydev->dev), phydev->irq,
> -            MDC_CLK, mdc_div, sclk/1000000);
> +     netdev_info(dev, "attached PHY driver [%s] "
> +                 "(mii_bus:phy_addr=%s, irq=%d, mdc_clk=%dHz(mdc_div=%d)"
> +                 "@sclk=%dMHz)\n",
> +                 phydev->drv->name, dev_name(&phydev->dev), phydev->irq,
> +                 MDC_CLK, mdc_div, sclk/1000000);
>
>       return 0;
>  }
> @@ -514,7 +525,7 @@ static void bfin_mac_ethtool_getdrvinfo(struct net_device *dev,
>  }
>
>  static void bfin_mac_ethtool_getwol(struct net_device *dev,
> -     struct ethtool_wolinfo *wolinfo)
> +                                 struct ethtool_wolinfo *wolinfo)
>  {
>       struct bfin_mac_local *lp = netdev_priv(dev);
> @@ -523,7 +534,7 @@ static void bfin_mac_ethtool_getwol(struct net_device *dev,
>  }
>
>  static int bfin_mac_ethtool_setwol(struct net_device *dev,
> -     struct ethtool_wolinfo *wolinfo)
> +                                struct ethtool_wolinfo *wolinfo)
>  {
>       struct bfin_mac_local *lp = netdev_priv(dev);
>       int rc;
> @@ -599,10 +610,20 @@ void setup_system_regs(struct net_device *dev)
>       bfin_write_DMA1_Y_MODIFY(0);
>  }
>
> +/* Grab the MAC address in the MAC */
> +static void get_mac_addr(u8 *mac_addr)
> +{
> +     __le32 addr_hi = cpu_to_le32(bfin_read_EMAC_ADDRLO());
> +     __le16 addr_low = cpu_to_le16((u16)bfin_read_EMAC_ADDRHI());
> +
> +     memcpy(mac_addr, &addr_hi, 4);
> +     memcpy(mac_addr + 4, &addr_low, 2);
> +}
> +
>  static void setup_mac_addr(u8 *mac_addr)
>  {
> -     u32 addr_low = le32_to_cpu(*(__le32 *) & mac_addr[0]);
> -     u16 addr_hi = le16_to_cpu(*(__le16 *) & mac_addr[4]);
> +     u32 addr_low = le32_to_cpu(*(__le32 *)&mac_addr[0]);
> +     u16 addr_hi = le16_to_cpu(*(__le16 *)&mac_addr[4]);
>
>       /* this depends on a little-endian machine */
>       bfin_write_EMAC_ADDRLO(addr_low);
> @@ -612,6 +633,7 @@ static void setup_mac_addr(u8 *mac_addr)
>  static int bfin_mac_set_mac_address(struct net_device *dev, void *p)
>  {
>       struct sockaddr *addr = p;
> +
>       if (netif_running(dev))
>               return -EBUSY;
>       memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
> @@ -623,7 +645,7 @@ static int bfin_mac_set_mac_address(struct net_device *dev, void *p)
>  #define bfin_mac_hwtstamp_is_none(cfg) ((cfg) == HWTSTAMP_FILTER_NONE)
>
>  static int bfin_mac_hwtstamp_ioctl(struct net_device *netdev,
> -             struct ifreq *ifr, int cmd)
> +                                struct ifreq *ifr, int cmd)
>  {
>       struct hwtstamp_config config;
>       struct bfin_mac_local *lp = netdev_priv(netdev);
> @@ -633,15 +655,15 @@ static int bfin_mac_hwtstamp_ioctl(struct net_device *netdev,
>       if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
>               return -EFAULT;
> -     pr_debug("%s config flag:0x%x, tx_type:0x%x, rx_filter:0x%x\n",
> -                     __func__, config.flags, config.tx_type, config.rx_filter);
> +     netdev_dbg("%s config flag:0x%x, tx_type:0x%x, rx_filter:0x%x\n",
> +                __func__, config.flags, config.tx_type, config.rx_filter);
>
>       /* reserved for future extensions */
>       if (config.flags)
>               return -EINVAL;
>
>       if ((config.tx_type != HWTSTAMP_TX_OFF) &&
> -                     (config.tx_type != HWTSTAMP_TX_ON))
> +         (config.tx_type != HWTSTAMP_TX_ON))
>               return -ERANGE;
>
>       ptpctl = bfin_read_EMAC_PTP_CTL();
> @@ -658,7 +680,8 @@ static int bfin_mac_hwtstamp_ioctl(struct net_device *netdev,
>       case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
>       case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
>               /*
> -              * Clear the five comparison mask bits (bits[12:8]) in EMAC_PTP_CTL)
> +              * Clear the five comparison mask bits
> +              * (bits[12:8] in EMAC_PTP_CTL)
>                * to enable all the field matches.
>                */
>               ptpctl &= ~0x1F00;
> @@ -694,8 +717,8 @@ static int bfin_mac_hwtstamp_ioctl(struct net_device *netdev,
>               ptpctl &= ~0x1F00;
>               bfin_write_EMAC_PTP_CTL(ptpctl);
>               /*
> -              * Keep the default values of the EMAC_PTP_FOFF register, except set
> -              * the PTPCOF field to 0x2A.
> +              * Keep the default values of the EMAC_PTP_FOFF register,
> +              * except set the PTPCOF field to 0x2A.
>                */
>               ptpfoff = 0x2A24170C;
>               bfin_write_EMAC_PTP_FOFF(ptpfoff);
> @@ -720,20 +743,20 @@ static int bfin_mac_hwtstamp_ioctl(struct net_device *netdev,
>       case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
>       case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
>               /*
> -              * Clear bits 8 and 12 of the EMAC_PTP_CTL register to enable only the
> -              * EFTM and PTPCM field comparison.
> +              * Clear bits 8 and 12 of the EMAC_PTP_CTL register to enable
> +              * only the EFTM and PTPCM field comparison.
>                */
>               ptpctl &= ~0x1100;
>               bfin_write_EMAC_PTP_CTL(ptpctl);
>               /*
> -              * Keep the default values of all the fields of the EMAC_PTP_FOFF
> -              * register, except set the PTPCOF field to 0x0E.
> +              * Keep the default values of all the fields of the
> +              * EMAC_PTP_FOFF register, except set the PTPCOF field to 0x0E.
>                */
>               ptpfoff = 0x0E24170C;
>               bfin_write_EMAC_PTP_FOFF(ptpfoff);
>               /*
> -              * Program bits [15:0] of the EMAC_PTP_FV1 register to 0x88F7, which
> -              * corresponds to PTP messages on the MAC layer.
> +              * Program bits [15:0] of the EMAC_PTP_FV1 register to 0x88F7,
> +              * which corresponds to PTP messages on the MAC layer.
>                */
>               ptpfv1 = 0x110488F7;
>               bfin_write_EMAC_PTP_FV1(ptpfv1);
> @@ -791,13 +814,17 @@ static int bfin_mac_hwtstamp_ioctl(struct net_device *netdev,
>               -EFAULT : 0;
>  }
> -static void bfin_dump_hwtamp(char *s, ktime_t *hw, ktime_t *ts, struct timecompare *cmp)
> +static void bfin_dump_hwtamp(char *s, ktime_t *hw, ktime_t *ts,
> +                          struct timecompare *cmp)
>  {
>       ktime_t sys = ktime_get_real();
>
>       pr_debug("%s %s hardware:%d,%d transform system:%d,%d system:%d,%d, cmp:%lld, %lld\n",
> -                     __func__, s, hw->tv.sec, hw->tv.nsec, ts->tv.sec, ts->tv.nsec, sys.tv.sec,
> -                     sys.tv.nsec, cmp->offset, cmp->skew);
> +              __func__, s,
> +              hw->tv.sec, hw->tv.nsec,
> +              ts->tv.sec, ts->tv.nsec,
> +              sys.tv.sec, sys.tv.nsec,
> +              cmp->offset, cmp->skew);
>  }
>
>  static void bfin_tx_hwtstamp(struct net_device *netdev, struct sk_buff *skb)
> @@ -814,16 +841,17 @@ static void bfin_tx_hwtstamp(struct net_device *netdev, struct sk_buff *skb)
>               shtx->in_progress = 1;
>
>               /*
> -              * The timestamping is done at the EMAC module's MII/RMII interface
> -              * when the module sees the Start of Frame of an event message packet. This
> -              * interface is the closest possible place to the physical Ethernet transmission
> +              * The timestamping is done at the EMAC module's MII/RMII
> +              * interface when the module sees the Start of Frame of an
> +              * event message packet. This interface is the closest
> +              * possible place to the physical Ethernet transmission
>                * medium, providing the best timing accuracy.
>                */
> -             while ((!(bfin_read_EMAC_PTP_ISTAT() & TXTL)) && (-- timeout_cnt))
> +             while ((!(bfin_read_EMAC_PTP_ISTAT() & TXTL)) &&
> +                    (--timeout_cnt))
>                       udelay(1);
>               if (timeout_cnt == 0)
> -                     printk(KERN_ERR DRV_NAME
> -                                     ": fails to timestamp the TX packet\n");
> +                     netdev_err(dev, "failed to timestamp the TX packet\n");
>               else {
>                       struct skb_shared_hwtstamps shhwtstamps;
>                       u64 ns;
> @@ -832,15 +860,15 @@ static void bfin_tx_hwtstamp(struct net_device *netdev, struct sk_buff *skb)
>                       regval = bfin_read_EMAC_PTP_TXSNAPLO();
>                       regval |= (u64)bfin_read_EMAC_PTP_TXSNAPHI() << 32;
>                       memset(&shhwtstamps, 0, sizeof(shhwtstamps));
> -                     ns = timecounter_cyc2time(&lp->clock,
> -                                     regval);
> +                     ns = timecounter_cyc2time(&lp->clock, regval);
>                       timecompare_update(&lp->compare, ns);
>                       shhwtstamps.hwtstamp = ns_to_ktime(ns);
>                       shhwtstamps.syststamp =
>                               timecompare_transform(&lp->compare, ns);
>                       skb_tstamp_tx(skb, &shhwtstamps);
> -                     bfin_dump_hwtamp("TX", &shhwtstamps.hwtstamp, &shhwtstamps.syststamp, &lp->compare);
> +                     bfin_dump_hwtamp("TX", &shhwtstamps.hwtstamp,
> +                                      &shhwtstamps.syststamp, &lp->compare);
>               }
>       }
>  }
> @@ -869,7 +897,8 @@ static void bfin_rx_hwtstamp(struct net_device *netdev, struct sk_buff *skb)
>       shhwtstamps->hwtstamp = ns_to_ktime(ns);
>       shhwtstamps->syststamp = timecompare_transform(&lp->compare, ns);
> -     bfin_dump_hwtamp("RX", &shhwtstamps->hwtstamp, &shhwtstamps->syststamp, &lp->compare);
> +     bfin_dump_hwtamp("RX", &shhwtstamps->hwtstamp,
> +                      &shhwtstamps->syststamp, &lp->compare);
>  }
>
>  /*
> @@ -879,8 +908,8 @@ static cycle_t bfin_read_clock(const struct cyclecounter *tc)
>  {
>       u64 stamp;
> -     stamp =  bfin_read_EMAC_PTP_TIMELO();
> -     stamp |= (u64)bfin_read_EMAC_PTP_TIMEHI() << 32ULL;
> +     stamp = bfin_read_EMAC_PTP_TIMELO();
> +     stamp |= ((u64)bfin_read_EMAC_PTP_TIMEHI()) << 32;
>
>       return stamp;
>  }
> @@ -961,7 +990,7 @@ static void tx_reclaim_skb(struct bfin_mac_local *lp)
>       }
>
>       if (current_tx_ptr->next != tx_list_head &&
> -             netif_queue_stopped(lp->ndev))
> +         netif_queue_stopped(lp->ndev))
>               netif_wake_queue(lp->ndev);
>
>       if (tx_list_head != current_tx_ptr) {
> @@ -974,7 +1003,7 @@ static void tx_reclaim_skb(struct bfin_mac_local *lp)
>                               jiffies + TX_RECLAIM_JIFFIES;
>
>               mod_timer(&lp->tx_reclaim_timer,
> -                     lp->tx_reclaim_timer.expires);
> +                       lp->tx_reclaim_timer.expires);
>       }
>
>       return;
> @@ -985,8 +1014,7 @@ static void tx_reclaim_skb_timeout(unsigned long lp)
>       tx_reclaim_skb((struct bfin_mac_local *)lp);
>  }
>
> -static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
> -                             struct net_device *dev)
> +static int bfin_mac_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>       struct bfin_mac_local *lp = netdev_priv(dev);
>       u16 *data;
> @@ -1000,10 +1028,11 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
>               data = (u16 *)(skb->data) - 1;
>               *data = (u16)(skb->len);
>               /*
> -              * When transmitting an Ethernet packet, the PTP_TSYNC module requires
> -              * a DMA_Length_Word field associated with the packet. The lower 12 bits
> -              * of this field are the length of the packet payload in bytes and the higher
> -              * 4 bits are the timestamping enable field.
> +              * When transmitting an Ethernet packet, the PTP_TSYNC module
> +              * requires a DMA_Length_Word field associated with the packet.
> +              * The lower 12 bits of this field are the length of the packet
> +              * payload in bytes and the higher 4 bits are the timestamping
> +              * enable field.
>                */
>               if (shtx->hardware)
>                       *data |= 0x1000;
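
The DMA_Length_Word layout described in that comment can be written out as a small helper. This is only a sketch: the helper name and the two macros are made up here, and the use of bit 12 (0x1000) as the enable bit is taken from the `*data |= 0x1000` line in the hunk, not from the hardware manual.

```c
#include <stdint.h>

#define DMA_LEN_MASK	0x0fffu	/* lower 12 bits: payload length in bytes */
#define DMA_TS_ENABLE	0x1000u	/* bit 12, the value OR-ed in by the hunk above */

/* Hypothetical helper: pack a length and the timestamp-enable flag. */
static uint16_t dma_length_word(uint16_t len, int hw_tstamp)
{
	uint16_t word = len & DMA_LEN_MASK;

	if (hw_tstamp)
		word |= DMA_TS_ENABLE;

	return word;
}
```

Note that the driver stores `(u16)(skb->len)` without masking; the 12-bit mask is an addition of this sketch to make the field boundary visible.
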
> @@ -1011,7 +1040,7 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
>               current_tx_ptr->desc_a.start_addr = (u32)data;
>               /* this is important! */
>               blackfin_dcache_flush_range((u32)data,
> -                             (u32)((u8 *)data + skb->len + 4));
> +                                         (u32)((u8 *)data + skb->len + 4));
>       } else {
>               *((u16 *)(current_tx_ptr->packet)) = (u16)(skb->len);
>               /* enable timestamping for the sent packet */
> @@ -1063,7 +1092,8 @@ out:
>
>  #define IP_HEADER_OFF  0
>  #define RX_ERROR_MASK (RX_LONG | RX_ALIGN | RX_CRC | RX_LEN | \
> -     RX_FRAG | RX_ADDR | RX_DMAO | RX_PHY | RX_LATE | RX_RANGE)
> +                    RX_FRAG | RX_ADDR | RX_DMAO | RX_PHY | \
> +                    RX_LATE | RX_RANGE)
>
>  static void bfin_mac_rx(struct net_device *dev)
>  {
> @@ -1079,8 +1109,7 @@ static void bfin_mac_rx(struct net_device *dev)
>        * we which case we simply drop the packet
>        */
>       if (current_rx_ptr->status.status_word & RX_ERROR_MASK) {
> -             printk(KERN_NOTICE DRV_NAME
> -                    ": rx: receive error - packet dropped\n");
> +             netdev_notice(dev, "rx: receive error - packet dropped\n");
>               dev->stats.rx_dropped++;
>               goto out;
>       }
> @@ -1088,20 +1117,12 @@ static void bfin_mac_rx(struct net_device *dev)
>       /* allocate a new skb for next time receive */
>       skb = current_rx_ptr->skb;
> -     new_skb = dev_alloc_skb(PKT_BUF_SZ + NET_IP_ALIGN);
> +     new_skb = bfin_alloc_skb();
>       if (!new_skb) {
> -             printk(KERN_NOTICE DRV_NAME
> -                    ": rx: low on mem - packet dropped\n");
> +             netdev_notice(dev, "rx: low on mem - packet dropped\n");
>               dev->stats.rx_dropped++;
>               goto out;
>       }
> -     /* reserve 2 bytes for RXDWA padding */
> -     skb_reserve(new_skb, NET_IP_ALIGN);
> -     /* Invidate the data cache of skb->data range when it is write back
> -      * cache. It will prevent overwritting the new data from DMA
> -      */
> -     blackfin_dcache_invalidate_range((unsigned long)new_skb->head,
> -                                      (unsigned long)new_skb->end);
>
>       current_rx_ptr->skb = new_skb;
>       current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;
> @@ -1116,25 +1137,28 @@ static void bfin_mac_rx(struct net_device *dev)
>       bfin_rx_hwtstamp(dev, skb);
>
>  #if defined(BFIN_MAC_CSUM_OFFLOAD)
> -     /* Checksum offloading only works for IPv4 packets with the standard IP header
> -      * length of 20 bytes, because the blackfin MAC checksum calculation is
> -      * based on that assumption. We must NOT use the calculated checksum if our
> -      * IP version or header break that assumption.
> +     /* Checksum offloading only works for IPv4 packets with the standard
> +      * IP header length of 20 bytes, because the blackfin MAC checksum
> +      * calculation is based on that assumption. We must NOT use the
> +      * calculated checksum if our IP version or header break that
> +      * assumption.
>        */
>       if (skb->data[IP_HEADER_OFF] == 0x45) {
>               skb->csum = current_rx_ptr->status.ip_payload_csum;
>               /*
> -              * Deduce Ethernet FCS from hardware generated IP payload checksum.
> -              * IP checksum is based on 16-bit one's complement algorithm.
> -              * To deduce a value from checksum is equal to add its inversion.
> -              * If the IP payload len is odd, the inversed FCS should also
> -              * begin from odd address and leave first byte zero.
> +              * Deduce Ethernet FCS from hardware generated IP payload
> +              * checksum.  IP checksum is based on 16-bit one's complement
> +              * algorithm.  To deduce a value from checksum is equal to
> +              * add its inversion.  If the IP payload len is odd, the
> +              * inversed FCS should also begin from odd address and leave
> +              * first byte zero.
>                */
>               if (skb->len % 2) {
>                       fcs[0] = 0;
>                       for (i = 0; i < ETH_FCS_LEN; i++)
>                               fcs[i + 1] = ~skb->data[skb->len + i];
> -                     skb->csum = csum_partial(fcs, ETH_FCS_LEN + 1, skb->csum);
> +                     skb->csum = csum_partial(fcs, ETH_FCS_LEN + 1,
> +                                              skb->csum);
>               } else {
>                       for (i = 0; i < ETH_FCS_LEN; i++)
>                               fcs[i] = ~skb->data[skb->len + i];
> @@ -1209,7 +1233,7 @@ static int bfin_mac_enable(void)
>       int ret;
>       u32 opmode;
>
> -     pr_debug("%s: %s\n", DRV_NAME, __func__);
> +     pr_debug("%s\n", __func__);
>
>       /* Set RX DMA */
>       bfin_write_DMA1_NEXT_DESC_PTR(&(rx_list_head->desc_a));
> @@ -1251,7 +1275,7 @@ static void bfin_mac_timeout(struct net_device *dev)
>  {
>       struct bfin_mac_local *lp = netdev_priv(dev);
> -     pr_debug("%s: %s\n", dev->name, __func__);
> +     netdev_dbg(dev, "%s\n", __func__);
>
>       bfin_mac_disable();
> @@ -1318,7 +1342,7 @@ static void bfin_mac_set_multicast_list(struct net_device *dev)
>       u32 sysctl;
>
>       if (dev->flags & IFF_PROMISC) {
> -             printk(KERN_INFO "%s: set to promisc mode\n", dev->name);
> +             netdev_info(dev, "set to promisc mode\n");
>               sysctl = bfin_read_EMAC_OPMODE();
>               sysctl |= PR;
>               bfin_write_EMAC_OPMODE(sysctl);
> @@ -1372,7 +1396,7 @@ static int bfin_mac_open(struct net_device *dev)
> {
>       struct bfin_mac_local *lp = netdev_priv(dev);
>       int ret;
> -     pr_debug("%s: %s\n", dev->name, __func__);
> +     netdev_dbg(dev, "%s\n", __func__);
>
>       /*
>        * Check that the address is valid.  If its not, refuse
> @@ -1380,7 +1404,7 @@ static int bfin_mac_open(struct net_device *dev)
>        * address using ifconfig eth0 hw ether xx:xx:xx:xx:xx:xx
>        */
>       if (!is_valid_ether_addr(dev->dev_addr)) {
> -             printk(KERN_WARNING DRV_NAME ": no valid ethernet hw addr\n");
> +             netdev_warn(dev, "no valid ethernet hw addr\n");
>               return -EINVAL;
>       }
> @@ -1398,7 +1422,7 @@ static int bfin_mac_open(struct net_device *dev)
>       ret = bfin_mac_enable();
>       if (ret)
>               return ret;
> -     pr_debug("hardware init finished\n");
> +     netdev_dbg(dev, "hardware init finished\n");
>
>       netif_start_queue(dev);
>       netif_carrier_on(dev);
> @@ -1414,7 +1438,7 @@ static int bfin_mac_open(struct net_device *dev)
>  static int bfin_mac_close(struct net_device *dev)
>  {
>       struct bfin_mac_local *lp = netdev_priv(dev);
> -     pr_debug("%s: %s\n", dev->name, __func__);
> +     netdev_dbg(dev, "%s\n", __func__);
>
>       netif_stop_queue(dev);
>       netif_carrier_off(dev);
> @@ -1464,9 +1488,7 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
>       lp = netdev_priv(ndev);
>       lp->ndev = ndev;
> -     /* Grab the MAC address in the MAC */
> -     *(__le32 *) (&(ndev->dev_addr[0])) = cpu_to_le32(bfin_read_EMAC_ADDRLO());
> -     *(__le16 *) (&(ndev->dev_addr[4])) = cpu_to_le16((u16) bfin_read_EMAC_ADDRHI());
> +     get_mac_addr(ndev->dev_addr);
>
>       /* probe mac */
>       /*todo: how to proble? which is revision_register */
> @@ -1526,8 +1548,8 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
>
>       /* now, enable interrupts */
>       /* register irq handler */
> -     rc = request_irq(IRQ_MAC_RX, bfin_mac_interrupt,
> -                     IRQF_DISABLED, "EMAC_RX", ndev);
> +     rc = request_irq(IRQ_MAC_RX, bfin_mac_interrupt, IRQF_DISABLED,
> +                      "EMAC_RX", ndev);
>       if (rc) {
>               dev_err(&pdev->dev, "Cannot request Blackfin MAC RX IRQ!\n");
>               rc = -EBUSY;
> @@ -1647,7 +1669,7 @@ static int __devinit bfin_mii_bus_probe(struct platform_device *pdev)
>       miibus->parent = &pdev->dev;
>       miibus->name = "bfin_mii_bus";
>       snprintf(miibus->id, MII_BUS_ID_SIZE, "0");
> -     miibus->irq = kmalloc(sizeof(int)*PHY_MAX_ADDR, GFP_KERNEL);
> +     miibus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
>       if (miibus->irq == NULL)
>               goto out_err_alloc;
>       for (i = 0; i < PHY_MAX_ADDR; ++i)
> @@ -1674,6 +1696,7 @@ out_err_alloc:
>  static int __devexit bfin_mii_bus_remove(struct platform_device *pdev)
>  {
>       struct mii_bus *miibus = platform_get_drvdata(pdev);
> +
>       platform_set_drvdata(pdev, NULL);
>       mdiobus_unregister(miibus);
>       kfree(miibus->irq);
> @@ -1698,17 +1721,15 @@ static struct platform_driver bfin_mac_driver = {
>       .suspend = bfin_mac_suspend,
>       .driver = {
>               .name = DRV_NAME,
> -             .owner  = THIS_MODULE,
> +             .owner = THIS_MODULE,
>       },
>  };
>
>  static int __init bfin_mac_init(void)
>  {
> -     int ret;
> -     ret = platform_driver_register(&bfin_mii_bus_driver);
> -     if (!ret)
> -             return platform_driver_register(&bfin_mac_driver);
> -     return -ENODEV;
> +     if (platform_driver_register(&bfin_mii_bus_driver))
> +             return -ENODEV;
> +     return platform_driver_register(&bfin_mac_driver);
>  }
>
>  module_init(bfin_mac_init);
> @@ -1720,4 +1741,3 @@ static void __exit bfin_mac_cleanup(void)
>  }
>
>  module_exit(bfin_mac_cleanup);
> -
>
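
For readers following the FCS comment in bfin_mac_rx(), the "add its inversion" step can be checked with a standalone sketch. Everything here is an assumption for illustration: the function names are made up, the bytes are paired big-endian, and only the even-length case is covered (the odd-length case in the driver shifts the inverted FCS by one byte). It does not reproduce the kernel's `csum_partial()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold carries back in (end-around carry) to get a 16-bit sum. */
static uint16_t fold16(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffffu) + (acc >> 16);
	return (uint16_t)acc;
}

/* One's-complement sum over an even number of bytes, starting from seed. */
static uint16_t oc_sum(const uint8_t *buf, size_t len, uint16_t seed)
{
	uint32_t acc = seed;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];

	return fold16(acc);
}

/* Remove a 4-byte FCS from a sum by adding the bitwise inversion of the FCS. */
static uint16_t strip_fcs(uint16_t sum_with_fcs, const uint8_t fcs[4])
{
	uint8_t inv[4];
	size_t i;

	for (i = 0; i < 4; i++)
		inv[i] = (uint8_t)~fcs[i];

	return oc_sum(inv, 4, sum_with_fcs);
}
```

With a sum taken over payload plus FCS, `strip_fcs()` gives back the sum over the payload alone, which is the effect the driver is after when it feeds the inverted FCS bytes to `csum_partial()`.
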

Greetings,
Michael

Analog Devices GmbH      Wilhelm-Wagenfeld-Str. 6      80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 4036 Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif



^ permalink raw reply

* [GIT PULL net-next-2.6] vhost-net patchset for 2.6.36
From: Michael S. Tsirkin @ 2010-07-28 13:32 UTC (permalink / raw)
  To: David Miller; +Cc: kvm, virtualization, netdev, linux-kernel, krkumar2
In-Reply-To: <20100701164127.GA3199@redhat.com>

David,
The following tree includes the current vhost-net patchset.
Please merge it for 2.6.36.
Thanks!

The following changes since commit 4cfa580e7eebb8694b875d2caff3b989ada2efac:

  r6040: Fix args to phy_mii_ioctl(). (2010-07-21 21:10:49 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next

David Stevens (1):
      vhost-net: mergeable buffers support

Michael S. Tsirkin (1):
      vhost: apply cgroup to vhost workers

Sridhar Samudrala (1):
      cgroups: Add an API to attach a task to current task's cgroup

Tejun Heo (1):
      vhost: replace vhost_workqueue with per-vhost kthread

 drivers/vhost/net.c    |  293 +++++++++++++++++++++++++++++++++++++++++-------
 drivers/vhost/vhost.c  |  228 ++++++++++++++++++++++++++++++++-----
 drivers/vhost/vhost.h  |   55 ++++++---
 include/linux/cgroup.h |    7 +
 kernel/cgroup.c        |   23 ++++
 5 files changed, 515 insertions(+), 91 deletions(-)

^ permalink raw reply

* Problem with rtl8192su staging driver
From: Jose Alberto Reguero @ 2010-07-28 15:02 UTC (permalink / raw)
  To: netdev

I have a problem with the rtl8192su driver. When transmitting, the rate is about 10 
Mbytes, but when receiving, the rate is only about 2 Mbytes. I also noticed that the 
rx stats don't reflect the correct data. This is with kernel 2.6.34 and a 
snapshot of the rtl8192su driver taken from linux-next.
How can I debug this to find where the problem is?

Jose Alberto

^ permalink raw reply
