Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 0/4] ipv4: small set of fixes
From: David Miller @ 2014-01-23  1:54 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <cover.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:47 +0200

> In this patch series I present my attempt to address
> some issues in IPv4 implementation.
> 
> Please take time to review my patches.

You're going to have to repost these with the adjustments I've
asked for.

^ permalink raw reply

* Re: [PATCH 4/4] ipv4: mark nexthop as dead when it's subnet becomes unreachable
From: David Miller @ 2014-01-23  1:53 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <40044b636dbf7ae5bba5fe2873451e14438ec170.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:51 +0200

> Also optimize loop a bit:
>   move handling of force > 1 out of the change_nexthops() loop.

Do not mix bug fixes with optimizations.

^ permalink raw reply

* Re: [PATCH 3/4] ipv4: use SNMP macro assuming softirq context in ip_forward().
From: David Miller @ 2014-01-23  1:52 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <3c324aaf986e6d5a0678cac6d292dd0b96bd9766.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:50 +0200

> ip_forward() runs entirely in softirq context, we could use
> version of SNMP macro which updates stats counters assuming that.
> 
> Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>

This is not a bug fix, it's an optimization.

^ permalink raw reply

* Re: [PATCH 1/4] ipv4: don't disable interface if last ipv4 address is removed
From: David Miller @ 2014-01-23  1:52 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <2748ead3b814852075f954ea157fe5d7288531e2.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:48 +0200

> Since
> 
>   commit 876fd05ddbae03166e7037fca957b55bb3be6594
>   Author: Hannes Frederic Sowa <>
>   Date:   Mon Jun 24 00:22:20 2013 +0200
> 
>     ipv6: don't disable interface if last ipv6 address is removed
> 
> we have disabled this behavior for IPv6. Adjust behavior for IPv4.

I do not agree with this change.

IPV4 is a lot different from ipv6, for example ipv4 lacks things like
temporary address assignments and other notifications that links listen
to even if they don't have explicit static addressed assigned.

The ipv4 behavior also has two decades of precedence, and I think you'll
have a very hard time convincing me that nobody depends upon it.

So I'm not applying this patch, sorry.

^ permalink raw reply

* Re: [PATCH net-next v5 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
From: David Miller @ 2014-01-23  1:50 UTC (permalink / raw)
  To: zoltan.kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <1390253069-25507-1-git-send-email-zoltan.kiss@citrix.com>

From: Zoltan Kiss <zoltan.kiss@citrix.com>
Date: Mon, 20 Jan 2014 21:24:20 +0000

> A long known problem of the upstream netback implementation that on the TX
> path (from guest to Dom0) it copies the whole packet from guest memory into
> Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
> huge perfomance penalty. The classic kernel version of netback used grant
> mapping, and to get notified when the page can be unmapped, it used page
> destructors. Unfortunately that destructor is not an upstreamable solution.
> Ian Campbell's skb fragment destructor patch series [1] tried to solve this
> problem, however it seems to be very invasive on the network stack's code,
> and therefore haven't progressed very well.
> This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
> know when the skb is freed up.

This series does not apply to net-next due to some other recent changes.

Please respin, thanks.

^ permalink raw reply

* RE: Freescale FEC packet loss
From: fugang.duan @ 2014-01-23  1:49 UTC (permalink / raw)
  To: Marek Vasut, netdev@vger.kernel.org
  Cc: Frank.Li@freescale.com, Fabio.Estevam@freescale.com,
	Hector Palacios, linux-arm-kernel@lists.infradead.org,
	Detlev Zundel, Eric Nelson
In-Reply-To: <201401222255.29467.marex@denx.de>

Hi, Marek,


From: Marek Vasut <marex@denx.de>
Data: Thursday, January 23, 2014 5:55 AM
>To: netdev@vger.kernel.org
>Cc: Li Frank-B20596; Duan Fugang-B38611; Estevam Fabio-R49496; Hector Palacios;
>linux-arm-kernel@lists.infradead.org; Detlev Zundel; Eric Nelson
>Subject: Freescale FEC packet loss
>
>Hi guys,
>
>I am running stock Linux 3.13 on i.MX6Q SabreLite board. The CPU is i.MX6Q TO
>1.0 .
>
>I am hitting a WARNING when I use the FEC ethernet to transfer data, thus I
>started investigating this problem. TL;DR I am not able to figure this problem
>out, so I am not attaching a patch :-(
>
>Steps to reproduce:
>-------------------
>1) Boot stock Linux 3.13 on i.MX6Q SabreLite board
>2) Plug in an SD card into one of the SD slots (I use the full-size one)
>3) Plug in an USB stick into one of the USB ports (I use the upper one)
>4) Plug in an ethernet cable into the board
>   -> Connect the other side into a gigabit-capable PC
>   -> Let us assume the PC has IP address 192.168.1.1/24 and
>      the board 192.168.1.2/24
>5) Install iperf on both boards
>6) Bring up the ethernet link on both ends:
>   $ ifconfig ...
>7) On the PC, run:
>   $ iperf -s -i 1
>8) Start producing load on the in-CPU busses. Run this on the board:
>   $ sha1sum /dev/mmcblk0 &
>   $ sha1sum /dev/sda &
>9) Now let the board generate ethernet traffic:
>   $ iperf -c 192.168.1.1 -t 1000 -i 1
>
>Now wait about 10 minutes and the system will produce a warning and you will
>observe dips in the transmission speed. You can see the output at the end of
>the email.
>
>I observe that this happens more often when there is a severe load on the
>busses, which in this case I simulate by running the sha1sum on /dev/mmcblk0
>and sha1sum /dev/sda in the background in step 8) . I can also well simulate
>this by running 'sha1sum /dev/mmcblk0 & sha1sum /dev/mmcblk1 &' when I have
>both SD cards populated on the MX6Q sabrelite with the same result (WARNING and
>dips in speed).
>
>There was apparently a thread about this problem already [1] where the person
>used SATA to produce high bus load and had exactly the same result.
>
>One more observation is that I managed to replicate this problem all the way
>back to Linux 3.3.8 , which is one of the first kernel versions where sabrelite
>was supported. I also see this one the Freescale's 3.0.35-4.1.0 .
>
>I have trouble figuring out what this problem is all about. Can you please help
>me? Thank you!
>
>[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-
>October/202519.html
>
>root@(none):~# iperf -c 192.168.1.1 -t 1000 -i 1
>------------------------------------------------------------
>Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 43.8 KByte
>(default)
>------------------------------------------------------------
>[  3] local 192.168.1.128 port 36760 connected with 192.168.1.1 port 5001
>[ ID] Interval       Transfer     Bandwidth
>[  3]  0.0- 1.0 sec  25.5 MBytes   214 Mbits/sec
>[  3]  1.0- 2.0 sec  28.9 MBytes   242 Mbits/sec
>[  3]  2.0- 3.0 sec  24.1 MBytes   202 Mbits/sec
>[  3]  3.0- 4.0 sec  21.1 MBytes   177 Mbits/sec
>[  3]  4.0- 5.0 sec  20.2 MBytes   170 Mbits/sec
>[  3]  5.0- 6.0 sec  20.0 MBytes   168 Mbits/sec
>[  3]  6.0- 7.0 sec  20.0 MBytes   168 Mbits/sec
>[  3]  7.0- 8.0 sec  20.1 MBytes   169 Mbits/sec
>[  3]  8.0- 9.0 sec  20.5 MBytes   172 Mbits/sec
>[  3]  9.0-10.0 sec  21.5 MBytes   180 Mbits/sec
>[  3] 10.0-11.0 sec  20.0 MBytes   168 Mbits/sec
>[  3] 11.0-12.0 sec  19.4 MBytes   163 Mbits/sec
>[  3] 12.0-13.0 sec  19.6 MBytes   165 Mbits/sec
>[  3] 13.0-14.0 sec  19.8 MBytes   166 Mbits/sec
>[  3] 14.0-15.0 sec  19.4 MBytes   163 Mbits/sec
>[  275.945247] ------------[ cut here ]------------ [  275.949920] WARNING: CPU:
>3 PID: 1155 at net/sched/sch_generic.c:264
>dev_watchdog+0x280/0x2a4()
>[  275.958679] NETDEV WATCHDOG: eth3 (fec): transmit queue 0 timed out
>[  275.964985] Modules linked in:
>[  275.968096] CPU: 3 PID: 1155 Comm: sha1sum Not tainted 3.13.0 #18
>[  275.974217] Backtrace:
>[  275.976787] [<80012410>] (dump_backtrace+0x0/0x10c) from [<800125ac>]
>(show_stack+0x18/0x1c)
>[  275.985271]  r6:00000108 r5:00000009 r4:00000000 r3:00000000
>[  275.991050] [<80012594>] (show_stack+0x0/0x1c) from [<8060f9c8>]
>(dump_stack+0x80/0x9c)
>[  275.999103] [<8060f948>] (dump_stack+0x0/0x9c) from [<8002703c>]
>(warn_slowpath_common+0x6c/0x90)
>[  276.008017]  r4:bef43e10 r3:bef96c00
>[  276.011663] [<80026fd0>] (warn_slowpath_common+0x0/0x90) from [<80027104>]
>(warn_slowpath_fmt+0x3)
>[  276.021170]  r8:bf098240 r7:bef42000 r6:bf098200 r5:bf068000 r4:00000000
>[  276.028083] [<800270cc>] (warn_slowpath_fmt+0x0/0x40) from [<804e2754>]
>(dev_watchdog+0x280/0x2a4)
>[  276.037095]  r3:bf068000 r2:807d2fb4
>[  276.040734] [<804e24d4>] (dev_watchdog+0x0/0x2a4) from [<80030e1c>]
>(call_timer_fn+0x70/0xec)
>[  276.049313] [<80030dac>] (call_timer_fn+0x0/0xec) from [<80031a90>]
>(run_timer_softirq+0x198/0x23)
>[  276.058407]  r8:00200200 r7:00000000 r6:bef43ec8 r5:bf836000 r4:bf068284
>[  276.065257] [<800318f8>] (run_timer_softirq+0x0/0x230) from [<8002b480>]
>(__do_softirq+0x100/0x25)
>[  276.074338] [<8002b380>] (__do_softirq+0x0/0x254) from [<8002b9a8>]
>(irq_exit+0xb0/0x108)
>[  276.082570] [<8002b8f8>] (irq_exit+0x0/0x108) from [<8000f3dc>]
>(handle_IRQ+0x58/0xb8)
>[  276.090528]  r4:8086ccc8 r3:00000184
>[  276.094162] [<8000f384>] (handle_IRQ+0x0/0xb8) from [<80008640>]
>(gic_handle_irq+0x30/0x64)
>[  276.102552]  r8:5590d9fa r7:f4000100 r6:bef43fb0 r5:8086ce14 r4:f400010c
>r3:000000a0
>[  276.110575] [<80008610>] (gic_handle_irq+0x0/0x64) from [<800133a0>]
>(__irq_usr+0x40/0x60)
>[  276.118871] Exception stack(0xbef43fb0 to 0xbef43ff8)
>[  276.123942] 3fa0:                                     fbe9d585 13e98d33
>6ed9eba1 21e67a57
>[  276.132173] 3fc0: 170dd9fc bc58d89e 5d1b7878 c19f2bf4 5590d9fa 2577bcc4
>7b2ac1ea 6cee44dd [  276.140398] 3fe0: cf486f09 7ea92a58 0000ba1f 0000ab72
>a0000030 ffffffff [  276.147041]  r7:c19f2bf4 r6:ffffffff r5:a0000030
>r4:0000ab72 [  276.152846] ---[ end trace 054500acb8edb763 ]---
>[  3] 15.0-16.0 sec  18.8 MBytes   157 Mbits/sec
>[  3] 16.0-17.0 sec  0.00 Bytes  0.00 bits/sec [  3] 17.0-18.0 sec  0.00 Bytes
>0.00 bits/sec [  3] 18.0-19.0 sec  0.00 Bytes  0.00 bits/sec [  3] 19.0-20.0
>sec  4.25 MBytes  35.7 Mbits/sec
>[  3] 20.0-21.0 sec  19.8 MBytes   166 Mbits/sec
>[  3] 21.0-22.0 sec  19.8 MBytes   166 Mbits/sec
>[  3] 22.0-23.0 sec  19.5 MBytes   164 Mbits/sec
>[  3] 23.0-24.0 sec  8.38 MBytes  70.3 Mbits/sec [  3] 24.0-25.0 sec  0.00
>Bytes  0.00 bits/sec [  3] 25.0-26.0 sec  7.88 MBytes  66.1 Mbits/sec
>[  3] 26.0-27.0 sec  20.1 MBytes   169 Mbits/sec
>[...]
>[  3] 71.0-72.0 sec  23.4 MBytes   196 Mbits/sec
>[  3] 72.0-73.0 sec  12.2 MBytes   103 Mbits/sec
>[  3] 73.0-74.0 sec  0.00 Bytes  0.00 bits/sec [  3] 74.0-75.0 sec  0.00 Bytes
>0.00 bits/sec [  3] 75.0-76.0 sec  10.9 MBytes  91.2 Mbits/sec
>[  3] 76.0-77.0 sec  22.4 MBytes   188 Mbits/sec
>[  3] 77.0-78.0 sec  23.0 MBytes   193 Mbits/sec
>

I will debug the issue when I am free, and then report the result to you.
Thanks for your reporting the issue.

Thanks,
Andy

^ permalink raw reply

* Re: [PATCH net-next v3 0/2] bonding: fix primary problem for bonding
From: David Miller @ 2014-01-23  1:46 UTC (permalink / raw)
  To: dingtianhong; +Cc: fubar, vfalico, netdev
In-Reply-To: <52DA3B3F.5090004@huawei.com>

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Sat, 18 Jan 2014 16:28:47 +0800

> If the slave's name changed, and the bond params primary is exist,
> the bond should deal with the situation in two ways:
> 
> 1) If the slave was the primary slave yet, clean the primary slave
>    and reselect active slave.
> 2) If the slave's new name is as same as bond primary, set the slave
>    as primary slave and reselect active slave.
> 
> If the new primary is not matching any slave in the bond, the bond should
> record it to params, clean the primary slave and select a new active slave.
> 
> Update bonding.txt for primary description.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: vxlan: convert to act as a pernet subsystem
From: David Miller @ 2014-01-23  1:44 UTC (permalink / raw)
  To: dborkman; +Cc: netdev, jesse.brandeburg, ebiederm
In-Reply-To: <1390421273-20152-1-git-send-email-dborkman@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Wed, 22 Jan 2014 21:07:53 +0100

> As per suggestion from Eric W. Biederman, vxlan should be using
> {un,}register_pernet_subsys() instead of {un,}register_pernet_device()
> to ensure the vxlan_net structure is initialized before and cleaned
> up after all network devices in a given network namespace i.e. when
> dealing with network notifiers. This is similarly handeled already in
> commit 91e2ff3528ac ("net: Teach vlans to cleanup as a pernet subsystem")
> and, thus, improves upon fd27e0d44a89 ("net: vxlan: do not use vxlan_net
> before checking event type"). Just as in 91e2ff3528ac, we do not need
> to explicitly handle deletion of vxlan devices as network namespace
> exit calls dellink on all remaining virtual devices, and
> rtnl_link_unregister() calls dellink on all outstanding devices in that
> network namespace, so we can entirely drop the pernet exit operation
> as well. Moreover, on vxlan module exit, rcu_barrier() is called by
> netns since commit 3a765edadb28 ("netns: Add an explicit rcu_barrier
> to unregister_pernet_{device|subsys}"), so this may be omitted. Tested
> with various scenarios and works well on my side.
> 
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
From: shaobingqing @ 2014-01-23  1:42 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Trond Myklebust, Dr Fields James Bruce, Miller David S., linuxnfs,
	netdev, Linux Kernel Mailing List
In-Reply-To: <20140122235233.GA17155@fieldses.org>

2014/1/23 J. Bruce Fields <bfields@fieldses.org>:
> On Tue, Jan 21, 2014 at 08:35:36AM -0700, Trond Myklebust wrote:
>>
>> On Jan 21, 2014, at 3:08, shaobingqing <shaobingqing@bwstor.com.cn> wrote:
>>
>> > 2014/1/21 Trond Myklebust <trond.myklebust@primarydata.com>:
>> >> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
>> >>> In current code, there only one struct rpc_rqst is prealloced. If one
>> >>> callback request is received from two sk_buff, the xprt_alloc_bc_request
>> >>> would be execute two times with the same transport->xid. The first time
>> >>> xprt_alloc_bc_request will alloc one struct rpc_rqst and the TCP_RCV_COPY_DATA
>> >>> bit of transport->tcp_flags will not be cleared. The second time
>> >>> xprt_alloc_bc_request could not alloc struct rpc_rqst any more and NULL
>> >>> pointer will be returned, then xprt_force_disconnect occur. I think one
>> >>> callback request can be allowed to be received from two sk_buff.
>> >>>
>> >>> Signed-off-by: shaobingqing <shaobingqing@bwstor.com.cn>
>> >>> ---
>> >>> net/sunrpc/xprtsock.c |   11 +++++++++--
>> >>> 1 files changed, 9 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> >>> index ee03d35..606950d 100644
>> >>> --- a/net/sunrpc/xprtsock.c
>> >>> +++ b/net/sunrpc/xprtsock.c
>> >>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>> >>>      struct sock_xprt *transport =
>> >>>                              container_of(xprt, struct sock_xprt, xprt);
>> >>>      struct rpc_rqst *req;
>> >>> +     static struct rpc_rqst *req_partial;
>> >>> +
>> >>> +     if (req_partial == NULL)
>> >>> +             req = xprt_alloc_bc_request(xprt);
>> >>> +     else if (req_partial->rq_xid == transport->tcp_xid)
>> >>> +             req = req_partial;
>> >>
>> >> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
>> >> req will be undefined. Either way, you cannot use a static variable for
>> >> storage here: that isn't re-entrant.
>> >
>> > Because metadata sever only have one slot for backchannel request,
>> > req_partial->rq_xid == transport->tcp_xid always happens, if the callback
>> > request just being splited in two sk_buffs. But req_partial->rq_xid !=
>> > transport->tcp_xid may also happens in some special cases, such as
>> > retransmission occurs?
>>
>> If the server retransmits, then it is broken. The NFSv4.1 protocol does not allow it to retransmit unless the connection breaks.
>
> shaobingqing, are you actually seeing retransmission?  (If so, are we
> setting up the callback client wrong?)
No, not actually. Here I just see that one client can receive two
callback requests with the same xid.
>
> --b.
>
>>
>> > If one callback request is splited in two sk_buffs, xs_tcp_read_callback
>> > will be execute two times. The req_partial should be a static variable,
>> > because  the second execution of xs_tcp_read_callback should use
>> > the rpc_rqst allocated for the first execution, which saves information
>> > copies from the first sk_buff.
>>
>> No! This is a multi-threaded/process environment which can support multiple connection. It is a bug to use a static variable.
>>
>> --
>> Trond Myklebust
>> Linux NFS client maintainer
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3] sch_htb: let skb->priority refer to non-leaf class
From: David Miller @ 2014-01-23  1:40 UTC (permalink / raw)
  To: harry.mason; +Cc: sergei.shtylyov, hadi, eric.dumazet, netdev
In-Reply-To: <1389964952.4698.20.camel@azathoth.dev.smoothwall.net>

From: Harry Mason <harry.mason@smoothwall.net>
Date: Fri, 17 Jan 2014 13:22:32 +0000

> If the class in skb->priority is not a leaf, apply filters from the
> selected class, not the qdisc. This lets netfilter or user space
> partially classify the packet.
> 
> Signed-off-by: Harry Mason <harry.mason@smoothwall.net>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH] af_packet: Add Queue mapping mode to af_packet fanout operation
From: David Miller @ 2014-01-23  1:37 UTC (permalink / raw)
  To: nhorman; +Cc: netdev
In-Reply-To: <1390424504-18543-1-git-send-email-nhorman@tuxdriver.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 22 Jan 2014 16:01:44 -0500

> This patch adds a queue mapping mode to the fanout operation of af_packet
> sockets.  This allows user space af_packet users to better filter on flows
> ingressing and egressing via a specific hardware queue, and avoids the potential
> packet reordering that can occur when FANOUT_CPU is being used and irq affinity
> varies.
> 
> Tested successfully by myself.  applies to net-next
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

This looks fine, applied, thanks.

^ permalink raw reply

* Re: [PATCH] af_packet: Add Queue mapping mode to af_packet fanout operation
From: Eric Dumazet @ 2014-01-23  1:08 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Neil Horman, netdev, David S. Miller
In-Reply-To: <52E03F80.8020803@redhat.com>

On Wed, 2014-01-22 at 23:00 +0100, Daniel Borkmann wrote:
> On 01/22/2014 10:01 PM, Neil Horman wrote:
> > This patch adds a queue mapping mode to the fanout operation of af_packet
> > sockets.  This allows user space af_packet users to better filter on flows
> > ingressing and egressing via a specific hardware queue, and avoids the potential
> 
> Maybe I'm missing something, but I currently cannot find where this is
> being filled out for ingress path? Egress, ok, this gets filled out
> somewhere in protocol layers or elsewhere and is being locally pushed
> back through dev_queue_xmit_nit(), but I think main use case is ingress
> through packet fanout. In driver layer I can find skb->rxhash filled out
> which would then be PACKET_FANOUT_HASH.
> (Otherwise patch looks good.)


Check for various multiqueue drivers calling skb_record_rx_queue()

^ permalink raw reply

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
From: Hannes Frederic Sowa @ 2014-01-23  1:01 UTC (permalink / raw)
  To: Gao feng; +Cc: Sabrina Dubroca, netdev
In-Reply-To: <52E068C5.2050405@cn.fujitsu.com>

On Thu, Jan 23, 2014 at 08:56:37AM +0800, Gao feng wrote:
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 1a341f7..4dca886 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2610,8 +2610,16 @@ static void init_loopback(struct net_device *dev)
>                         if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>                                 continue;
> 
> -                       if (sp_ifa->rt)
> -                               continue;
> +                       if (sp_ifa->rt) {
> +                               /* This dst has been added to garbage list when
> +                                * lo device down, delete this obsolete dst and
> +                                * reallocate new router for ifa. */
> +                               if (sp_ifa->rt->dst.obsolete > 0) {
> +                                       ip6_del_rt(sp_ifa->rt);
> +                                       sp_ifa->rt = NULL;
> +                               } else
> +                                       continue;
> +                       }
> 
>                         sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);

I agree, this seems a lot simpler. In the end I would like to replace this
conditional loopback up/down thing with something like below. I haven't done
the correct hookups into the relevant places, but I hope you get the idea:

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 017badb..1648a59a 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -114,7 +114,8 @@ struct rt6_rtnl_dump_arg {
 };
 
 int rt6_dump_route(struct rt6_info *rt, void *p_arg);
-void rt6_ifdown(struct net *net, struct net_device *dev);
+void rt6_ifdown(struct net *net, struct net_device *dev, bool unregister);
+void rt6_ifup(struct net_device *dev);
 void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
 void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 
diff --git a/include/uapi/linux/ipv6_route.h b/include/uapi/linux/ipv6_route.h
index 2be7bd1..5dd40ed 100644
--- a/include/uapi/linux/ipv6_route.h
+++ b/include/uapi/linux/ipv6_route.h
@@ -15,6 +15,7 @@
 
 #include <linux/types.h>
 
+#define RTF_DEAD	0x00008000	/* route dead bcs interface down */
 #define RTF_DEFAULT	0x00010000	/* default - learned via ND	*/
 #define RTF_ALLONLINK	0x00020000	/* (deprecated and will be removed)
 					   fallback, no routers on link */
diff --git a/include/uapi/linux/route.h b/include/uapi/linux/route.h
index 6600708..9099a5f 100644
--- a/include/uapi/linux/route.h
+++ b/include/uapi/linux/route.h
@@ -60,7 +60,7 @@ struct rtentry {
 #define RTF_REJECT	0x0200		/* Reject route			*/
 
 /*
- *	<linux/ipv6_route.h> uses RTF values >= 64k
+ *	<linux/ipv6_route.h> uses RTF values >= 32k
  */
 
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6913a82..e2df0f9 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -143,7 +143,7 @@ static void addrconf_leave_anycast(struct inet6_ifaddr *ifp);
 
 static void addrconf_type_change(struct net_device *dev,
 				 unsigned long event);
-static int addrconf_ifdown(struct net_device *dev, int how);
+static int addrconf_ifdown(struct net_device *dev, bool unregister);
 
 static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
 						  int plen,
@@ -2882,7 +2882,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			 * IPV6_MIN_MTU stop IPv6 on this interface.
 			 */
 			if (dev->mtu < IPV6_MIN_MTU)
-				addrconf_ifdown(dev, 1);
+				addrconf_ifdown(dev, true);
 		}
 		break;
 
@@ -2952,7 +2952,7 @@ static void addrconf_type_change(struct net_device *dev, unsigned long event)
 		ipv6_mc_unmap(idev);
 }
 
-static int addrconf_ifdown(struct net_device *dev, int how)
+static int addrconf_ifdown(struct net_device *dev, bool unregister)
 {
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
@@ -2961,7 +2961,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	ASSERT_RTNL();
 
-	rt6_ifdown(net, dev);
+	rt6_ifdown(net, dev, unregister);
 	neigh_ifdown(&nd_tbl, dev);
 
 	idev = __in6_dev_get(dev);
@@ -2972,7 +2972,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	 * Step 1: remove reference to ipv6 device from parent device.
 	 *	   Do not dev_put!
 	 */
-	if (how) {
+	if (unregister) {
 		idev->dead = 1;
 
 		/* protected by rtnl_lock */
@@ -3004,10 +3004,10 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	addrconf_del_rs_timer(idev);
 
 	/* Step 2: clear flags for stateless addrconf */
-	if (!how)
+	if (!unregister)
 		idev->if_flags &= ~(IF_RS_SENT|IF_RA_RCVD|IF_READY);
 
-	if (how && del_timer(&idev->regen_timer))
+	if (unregister && del_timer(&idev->regen_timer))
 		in6_dev_put(idev);
 
 	/* Step 3: clear tempaddr list */
@@ -3053,7 +3053,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	write_unlock_bh(&idev->lock);
 
 	/* Step 5: Discard multicast list */
-	if (how)
+	if (unregister)
 		ipv6_mc_destroy_dev(idev);
 	else
 		ipv6_mc_down(idev);
@@ -3061,7 +3061,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	idev->tstamp = jiffies;
 
 	/* Last: Shot the device (if unregistered) */
-	if (how) {
+	if (unregister) {
 		addrconf_sysctl_unregister(idev);
 		neigh_parms_release(&nd_tbl, idev->nd_parms);
 		neigh_ifdown(&nd_tbl, dev);
@@ -5309,9 +5309,9 @@ void addrconf_cleanup(void)
 	for_each_netdev(&init_net, dev) {
 		if (__in6_dev_get(dev) == NULL)
 			continue;
-		addrconf_ifdown(dev, 1);
+		addrconf_ifdown(dev, true);
 	}
-	addrconf_ifdown(init_net.loopback_dev, 2);
+	addrconf_ifdown(init_net.loopback_dev, true);
 
 	/*
 	 *	Check hash table.
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..1132cfb 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1,3 +1,4 @@
+
 /*
  *	Linux INET6 implementation
  *	Forwarding Information Database
@@ -1711,7 +1712,7 @@ out_timer:
 
 static void fib6_net_exit(struct net *net)
 {
-	rt6_ifdown(net, NULL);
+	rt6_ifdown(net, NULL, true);
 	del_timer_sync(&net->ipv6.ip6_fib_timer);
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 11dac21..fb69e8b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2259,32 +2259,70 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp)
 	fib6_clean_all(net, fib6_remove_prefsrc, &adni);
 }
 
+enum arg_dev_net_action {
+	ARG_DEV_NET_REMOVE = 0,
+	ARG_DEV_NET_DISABLE,
+	ARG_DEV_NET_ENABLE,
+};
+
 struct arg_dev_net {
 	struct net_device *dev;
 	struct net *net;
+	enum arg_dev_net_action action;
 };
 
-static int fib6_ifdown(struct rt6_info *rt, void *arg)
+static int __fib6_match_or_update_if(struct rt6_info *rt, void *arg)
 {
 	const struct arg_dev_net *adn = arg;
 	const struct net_device *dev = adn->dev;
 
 	if ((rt->dst.dev == dev || !dev) &&
-	    rt != adn->net->ipv6.ip6_null_entry)
-		return -1;
+	    rt != adn->net->ipv6.ip6_null_entry) {
+		switch (adn->action) {
+		case ARG_DEV_NET_REMOVE:
+			/* remove rt */
+			return -1;
+		case ARG_DEV_NET_DISABLE:
+			WARN_ON(rt->rt6i_flags & RTF_DEAD);
+			rt->rt6i_flags |= RTF_DEAD;
+			return 0;
+		case ARG_DEV_NET_ENABLE:
+			WARN_ON(!(rt->rt6i_flags & RTF_DEAD));
+			rt->rt6i_flags &= ~RTF_DEAD;
+			return 0;
+		}
+	}
 
 	return 0;
 }
 
-void rt6_ifdown(struct net *net, struct net_device *dev)
+
+static void __rt6_fib_action(struct net *net, struct net_device *dev,
+			     enum arg_dev_net_action action)
 {
 	struct arg_dev_net adn = {
 		.dev = dev,
 		.net = net,
+		.action = action,
 	};
 
-	fib6_clean_all(net, fib6_ifdown, &adn);
-	icmp6_clean_all(fib6_ifdown, &adn);
+	fib6_clean_all(net, __fib6_match_or_update_if, &adn);
+	if (action == ARG_DEV_NET_REMOVE ||
+	    action == ARG_DEV_NET_DISABLE) {
+		adn.action = ARG_DEV_NET_REMOVE;
+		icmp6_clean_all(__fib6_match_or_update_if, &adn);
+	}
+}
+
+void rt6_ifdown(struct net *net, struct net_device *dev, bool unregister)
+{
+	__rt6_fib_action(net, dev, unregister ? ARG_DEV_NET_REMOVE :
+				       ARG_DEV_NET_DISABLE);
+}
+
+void rt6_ifup(struct net_device *dev)
+{
+	__rt6_fib_action(dev_net(dev), dev, ARG_DEV_NET_ENABLE);
 }
 
 struct rt6_mtu_change_arg {

^ permalink raw reply related

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
From: Gao feng @ 2014-01-23  0:56 UTC (permalink / raw)
  To: Sabrina Dubroca, netdev, Hannes Frederic Sowa
In-Reply-To: <20140122213446.GD7269@order.stressinduktion.org>

On 01/23/2014 05:34 AM, Hannes Frederic Sowa wrote:
> On Wed, Jan 22, 2014 at 04:35:08PM +0100, Sabrina Dubroca wrote:
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 4b6b720..b2eb653 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -2578,6 +2578,23 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
>>  }
>>  #endif
>>  
>> +struct rt6_info *addrconf_dst_alloc_once(struct inet6_dev *idev,
>> +					 const struct in6_addr *addr,
>> +					 bool anycast)
>> +{
>> +	struct net *net = dev_net(idev->dev);
>> +	struct rt6_info *rt = rt6_lookup(net, addr, NULL, 0, 0);
> 
> We should use addrconf_get_prefix_route here. Eric's comment applies
> here, too, we do a dst_hold in addrconf_get_prefix_route and thus must
> decrease the reference counter with ip6_rt_put then, too.
> 
>> +	if (rt == NULL)
>> +		return addrconf_dst_alloc(idev, addr, anycast);
>> +	else if (!(rt->rt6i_flags & RTF_NONEXTHOP &&
>> +		   ((anycast && rt->rt6i_flags & RTF_ANYCAST) ||
>> +		    (!anycast && rt->rt6i_flags & RTF_LOCAL))))
>> +		return addrconf_dst_alloc(idev, addr, anycast);
> 
> The necessary flags can be given to addrconf_get_prefix_route, then.
> 
>> +	return ERR_PTR(-EEXIST);
>> +}
>> +
>>  static void init_loopback(struct net_device *dev)
>>  {
>>  	struct inet6_dev  *idev;
>> @@ -2611,10 +2628,7 @@ static void init_loopback(struct net_device *dev)
>>  			if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>>  				continue;
>>  
>> -			if (sp_ifa->rt)
>> -				continue;
> 
> We should try to clean sp_ifa->rt on ifdown so we don't have dangling
> pointer to it if it is already in dst gc queue and obsolete. So even if
> we leave this code in there we would never hit the continue (it seems we
> have to decrease the reference counter in addrconf_ifdown before setting
> sp_ifa->rt to NULL).
> 
>> -
>> -			sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);
>> +			sp_rt = addrconf_dst_alloc_once(idev, &sp_ifa->addr, false);
>>  
>>  			/* Failure cases are ignored */
>>  			if (!IS_ERR(sp_rt)) {
> 
> I am fine if we get this fix in for 3.14 because it solves a real problem.
> I'll already work on the full route preserving logic on ifdown/up and
> would build this on this patch then.
> 
> I currently wonder if we need the relookup logic at all as we currently
> remove the prefix routes in all cases (in rt6_ifdown, which iterates
> over all routing tables). So we always have a obsolete dst which we
> just need to clean up in addrconf_ifdown and just allocate the prefix
> route in init_loopback in all cases. We just have to steal some code
> from inet6_rtm_newaddr to calculate the appropriate flags from the
> inet6_ifaddr's ifa_flags (e.g. RTF_EXPIRES).
> 

I still prefer the patch I send before. I don't like the rt6i_flags,
it is not clear and easy to misunderstand.

But if you all think this one is good, I'm ok, but I don't have enough
time to review it deeply.

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1a341f7..4dca886 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2610,8 +2610,16 @@ static void init_loopback(struct net_device *dev)
                        if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
                                continue;

-                       if (sp_ifa->rt)
-                               continue;
+                       if (sp_ifa->rt) {
+                               /* This dst has been added to garbage list when
+                                * lo device down, delete this obsolete dst and
+                                * reallocate new router for ifa. */
+                               if (sp_ifa->rt->dst.obsolete > 0) {
+                                       ip6_del_rt(sp_ifa->rt);
+                                       sp_ifa->rt = NULL;
+                               } else
+                                       continue;
+                       }

                        sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);

^ permalink raw reply related

* Re: Fwd: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Zhi Yong Wu @ 2014-01-23  0:40 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Ben Hutchings, Stefan Hajnoczi, Linux Netdev List, Eric Dumazet,
	David S. Miller, Zhi Yong Wu, Michael S. Tsirkin, Rusty Russell,
	Jason Wang
In-Reply-To: <CA+mtBx-Xv6xp3VCq=PKMyboAmDjUWzVrbUhr4WdKQRi1PCansg@mail.gmail.com>

On Thu, Jan 23, 2014 at 2:00 AM, Tom Herbert <therbert@google.com> wrote:
>>> 1. The aRFS interface for the guest to specify which virtual queue to
>>> receive a packet on is fairly straight forward.
>>> 2. To hook into RFS, we need to match the virtual queue to the real
>>> CPU it will processed on, and then program the RFS table for that flow
>>> and CPU.
>>> 3. NIC aRFS keys off the RFS tables so it can program the HW with the
>>> correct queue for the CPU.
>> Does anyone have time to make one conclusion for this discussion? in
>> particular, how will rx packet be steered up the stack from guest
>> virtio_net driver, virtio_net NIC, vhost_net, tun driver, host network
>> stack, to physical NIC with more details?
>> What is the role of each path units? otherwise this discussion wont
>> get any meanful result, which is not what we expect.
>>
> Working code outweighs theoretical discussion :-). I think you started
> on a good track with original patches, and I believe the tun path
> should work pretty well (some performance numbers would still be good
I planed to run netperf in one kvm guest with the path "vhost_net +
tun + OVS bridge". But it seems to be a bit difficult for me to get
some x86 hardwares. The boxes which i have all are Power arch.
> to validate). Seems like there's enough hooks in the virtio_net path
> to implement something meaningful and maybe get some numbers (maybe
Can you say what something meaningful is with more details? What is
the roadmap of virt_net aRFS support which you expect?
> ignore NIC aRFS in the first pass).
>
> Tom
>
>>>
>>>> Stefan
>>
>>
>>
>> --
>> Regards,
>>
>> Zhi Yong Wu



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply

* Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include <linux/init.h>
From: Paul Gortmaker @ 2014-01-23  0:38 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-kernel, linux-arch, linux-alpha, linux-arm-kernel,
	linux-ia64, linux-m68k, linux-mips, linuxppc-dev, linux-s390,
	sparclinux, x86, netdev, kvm, rusty, gregkh, akpm, torvalds
In-Reply-To: <20140122180023.dd90d34cba38d9f9ac516349@canb.auug.org.au>

[Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include <linux/init.h>] On 22/01/2014 (Wed 18:00) Stephen Rothwell wrote:

> Hi Paul,
> 
> On Tue, 21 Jan 2014 16:22:03 -0500 Paul Gortmaker <paul.gortmaker@windriver.com> wrote:
> >
> > Where: This work exists as a queue of patches that I apply to
> > linux-next; since the changes are fixing some things that currently
> > can only be found there.  The patch series can be found at:
> > 
> >    http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git
> >    git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
> > 
> > I've avoided annoying Stephen with another queue of patches for
> > linux-next while the development content was in flux, but now that
> > the merge window has opened, and new additions are fewer, perhaps he
> > wouldn't mind tacking it on the end...  Stephen?
> 
> OK, I have added this to the end of linux-next today - we will see how we
> go.  It is called "init".

Thanks, it was a great help as it uncovered a few issues in fringe arch
that I didn't have toolchains for, and I've fixed all of those up.

I've noticed that powerpc has been un-buildable for a while now; I have
used this hack patch locally so I could run the ppc defconfigs to check
that I didn't break anything.  Maybe useful for linux-next in the
interim?  It is a hack patch -- Not-Signed-off-by: Paul Gortmaker.  :)

Paul.
--

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c89a71..d0f070a2b395 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
-					 spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+					 struct spinlock *old_pmd_ptl)
 {
 	/*
 	 * Archs like ppc64 use pgtable to store per pmd

^ permalink raw reply related

* Re: [PATCH net-next v2 00/25] bonding: introduce new option API
From: David Miller @ 2014-01-23  0:29 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, andy, fubar, vfalico, sfeldma
In-Reply-To: <1390398820-5355-1-git-send-email-nikolay@redhat.com>

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Wed, 22 Jan 2014 14:53:15 +0100

> This patchset's goal is to introduce a new option API which should be used
> to properly describe the bonding options with their mode dependcies and
> requirements. With this patchset applied we get centralized option
> manipulation, automatic RTNL acquire per option setting, automatic option
> range checking, mode dependcy checking and other various flags which are
> described in detail in patch 01's commit message and comments.

This looks great, all applied, thanks!

^ permalink raw reply

* Re: [BUG] null pointer dereference in tcp_gso_segment()
From: Willy Tarreau @ 2014-01-22 23:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arnaud Ebalard, David Miller, Eric Dumazet, Daniel Borkmann,
	Herbert Xu, netdev
In-Reply-To: <1390429125.27806.40.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, Jan 22, 2014 at 02:18:45PM -0800, Eric Dumazet wrote:
> On Wed, 2014-01-22 at 23:02 +0100, Arnaud Ebalard wrote:
> > Hi Eric,
> > 
> > Eric Dumazet <eric.dumazet@gmail.com> writes:
> > 
> > >> Unless there is an assumption I missed somewhere in the function, the
> > >> problem may occur during the first round of the loop, because (unlike
> > >> the 'while' condition does at line 21) skb->next is not checked against
> > >> null at lines 17 above before it is passed to tcp_hdr() at line 18.
> > >> 
> > >> To be honest, I am asking because I am not familiar w/ the code and it
> > >> is somewhat old so I wonder why noone got hit before. AFAICT,
> > >> f4c50d990dcf ([NET]: Add software TSOv4) added TSOv4 support in 2006 via
> > >> introduction of tcp_tso_segmen() (with the same kind of deref but
> > >> possibly different assumptions) which was more recently modified via
> > >> 28850dc7c7 (net: tcp: move GRO/GSO functions to tcp_offload) to become
> > >> tcp_gso_segment().
> > >> 
> > >> David, can you confirm the analysis and possibly comment on the
> > >> conditions needed for the bug to manifest?
> > >
> > > A gso packet contains at least 2 segments.
> > 
> > By whom / where is it enforced?
> 
> For example, tcp_gso_segment() does the following check :
> 
> if (unlikely(skb->len <= mss))
> 	goto out;
> 
> If there was one segment, then skb->len should also be smaller than mss
> 
> Since TCP stack seemed to be the provider of the packet in your stack
> trace, check tcp_set_skb_tso_segs()

Thanks Eric for the explanation. From Arnaud's trace, I suspect that he's
received an ACK which has released some pending data, so it's very likely
indeed that at least two segments were released at once given that the
receiver is likely to ACK every two segments.

Also we can expect that the received ACK was copy-breaked. I don't know
if some sort of skb recycling may happen at this stage and reveal some
bad corner cases (eg: improperly initialized skb during the rx path that
causes everything to break when it's recycled for the tx path), but Arnaud
you can easily disable the rx_copybreak feature by setting the rx_copybreak
module argument to zero. You can change it at run time in /sys/module. At
least it will tell us if it could be related or not.

Regards,
Willy

^ permalink raw reply

* Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
From: J. Bruce Fields @ 2014-01-22 23:52 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: shaobingqing, Dr Fields James Bruce, Miller David S., linuxnfs,
	netdev-u79uwXL29TY76Z2rM5mHXA, Linux Kernel Mailing List
In-Reply-To: <1BFACA51-087E-4945-851A-FBF0F108604C-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>

On Tue, Jan 21, 2014 at 08:35:36AM -0700, Trond Myklebust wrote:
> 
> On Jan 21, 2014, at 3:08, shaobingqing <shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org> wrote:
> 
> > 2014/1/21 Trond Myklebust <trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>:
> >> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
> >>> In current code, there only one struct rpc_rqst is prealloced. If one
> >>> callback request is received from two sk_buff, the xprt_alloc_bc_request
> >>> would be execute two times with the same transport->xid. The first time
> >>> xprt_alloc_bc_request will alloc one struct rpc_rqst and the TCP_RCV_COPY_DATA
> >>> bit of transport->tcp_flags will not be cleared. The second time
> >>> xprt_alloc_bc_request could not alloc struct rpc_rqst any more and NULL
> >>> pointer will be returned, then xprt_force_disconnect occur. I think one
> >>> callback request can be allowed to be received from two sk_buff.
> >>> 
> >>> Signed-off-by: shaobingqing <shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org>
> >>> ---
> >>> net/sunrpc/xprtsock.c |   11 +++++++++--
> >>> 1 files changed, 9 insertions(+), 2 deletions(-)
> >>> 
> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> >>> index ee03d35..606950d 100644
> >>> --- a/net/sunrpc/xprtsock.c
> >>> +++ b/net/sunrpc/xprtsock.c
> >>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
> >>>      struct sock_xprt *transport =
> >>>                              container_of(xprt, struct sock_xprt, xprt);
> >>>      struct rpc_rqst *req;
> >>> +     static struct rpc_rqst *req_partial;
> >>> +
> >>> +     if (req_partial == NULL)
> >>> +             req = xprt_alloc_bc_request(xprt);
> >>> +     else if (req_partial->rq_xid == transport->tcp_xid)
> >>> +             req = req_partial;
> >> 
> >> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
> >> req will be undefined. Either way, you cannot use a static variable for
> >> storage here: that isn't re-entrant.
> > 
> > Because metadata sever only have one slot for backchannel request,
> > req_partial->rq_xid == transport->tcp_xid always happens, if the callback
> > request just being splited in two sk_buffs. But req_partial->rq_xid !=
> > transport->tcp_xid may also happens in some special cases, such as
> > retransmission occurs?
> 
> If the server retransmits, then it is broken. The NFSv4.1 protocol does not allow it to retransmit unless the connection breaks.

shaobingqing, are you actually seeing retransmission?  (If so, are we
setting up the callback client wrong?)

--b.

> 
> > If one callback request is splited in two sk_buffs, xs_tcp_read_callback
> > will be execute two times. The req_partial should be a static variable,
> > because  the second execution of xs_tcp_read_callback should use
> > the rpc_rqst allocated for the first execution, which saves information
> > copies from the first sk_buff.
> 
> No! This is a multi-threaded/process environment which can support multiple connection. It is a bug to use a static variable.
> 
> --
> Trond Myklebust
> Linux NFS client maintainer
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* VLAN ID 0 with priority tagging
From: Sridhar Samudrala @ 2014-01-22 23:49 UTC (permalink / raw)
  To: netdev@vger.kernel.org

I am trying to send a packet with VLAN ID 0 and non-zero priority.

The VLAN interfaces are created on 2 hosts using
     ip link add link eth1 eth1.0 type vlan id 0 egress-qos-map 0:2

When i try to send a packet using ping/arping, on the sender side tcpdump
shows that the VLAN tag is added with ID 0 and priority 2.

However, the receiver is receiving the packet with vlan tag stripped.

I am seeing the same behavior with multiple NICs and also with a switch in
between the 2 hosts or with the 2 hosts connected over loopback cable.

It looks as if the driver on the send side is stripping the tag if vlan 
id is 0.
Is this correct behavior or a bug?
Any clues on how to  get priority tagging to work with vlan id 0?

Thanks
Sridhar

^ permalink raw reply

* Fw: [Bug 69231] New: tc class overhead is displayed incorrectly
From: Stephen Hemminger @ 2014-01-22 23:46 UTC (permalink / raw)
  To: netdev


Begin forwarded message:

Date: Wed, 22 Jan 2014 06:57:49 -0800
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 69231] New: tc class overhead is displayed incorrectly


https://bugzilla.kernel.org/show_bug.cgi?id=69231

            Bug ID: 69231
           Summary: tc class overhead is displayed incorrectly
           Product: Networking
           Version: 2.5
    Kernel Version: 3.13
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: shemminger@linux-foundation.org
          Reporter: gabriel.zigelboim.work@gmail.com
        Regression: No

Configuring HTB to do shaping for 3000kb/s with 4 bytes overhead. Test shows
shaping is done correctly (exact amount of bytes are received including CFS)
however when checking TC configuration $overhead$ value is displayed
incorrectly showing configuration of 0 bytes
. 
Enter the following commands:
sudo  tc qdisc add dev eth1 root handle 1: htb default 10
sudo  tc class add dev eth1 parent 1: classid 1:10 htb rate 3000kbit burst
10000 cburst 10000 overhead 4
sudo  tc qdisc add dev eth1 parent 1:10 handle 10:0 pfifo limit 10
tc -s -d class show dev eth1

Result:
class htb 1:10 root leaf 10: prio 0 quantum 37500 rate 3000Kbit ceil 3000Kbit
burst 9999b/1 mpu 0b overhead 0b cburst 9999b/1 mpu 0b overhead 0b level 0
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 416656 ctokens: 416656

uname -a
Linux cto-linux-11 3.13.0-031300-generic #201401192235 SMP Mon Jan 20 03:36:48
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

^ permalink raw reply

* Re: r8169 won't transmit with 3.12
From: Francois Romieu @ 2014-01-22 23:31 UTC (permalink / raw)
  To: Craig Small; +Cc: Realtek linux nic maintainers, netdev
In-Reply-To: <20140122004119.GA28174@enc.com.au>

Craig Small <csmall@enc.com.au> :
> On Wed, Jan 22, 2014 at 12:36:30AM +0100, Francois Romieu wrote:
> > You may check that the onboard nic device does not even work when the
> > extra PCI-e card aren't plugged.
> It definitely doesn't. I've tried with only the onboard device
> and no other NIC device (r8169 or not) installed and its the same
> problem.

Should I understand that the LOM fails on Linux with a naked motherboard ?

> > A complete dmesg including the XID lines that the r8169 driver prints
> > would be welcome.
> See attached text file. I'm not sure why I get so many interface up
> messages.

Please check if runtime resume is disabled (/sys/bus/pci/devices/.../power/control
should be 'on'). If so and you do not see many netdev watchdog messages either,
these could be real link status change events.

> This is for the two NIC cards. I can pull them out and go back to the
> onboard one if you like.

Please do (and send a complete dmesg).

Your previous posting suggests that the LOM board is a 8168b but I haven't
seen its XID.

It could help to check the add-on 8168c in a different computer whose
motherboard does not include any networking chipset from Realtek (nor
any bios that could confuse different revisions of those).

Please don't switch between the r8168 and the r8169 modules at runtime.
I saw it in your dmesg and I won't claim it's supported.

I'll be out for work during two days.

Thanks.

-- 
Ueimor

^ permalink raw reply

* Re: [patch net-next 0/5] fix bonding slave info API (sysfs and netlink)
From: Scott Feldman @ 2014-01-22 23:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, Stephen Hemminger, vyasevic, nicolas.dichtel,
	john.r.fastabend
In-Reply-To: <1390377957-31466-1-git-send-email-jiri@resnulli.us>


On Jan 22, 2014, at 12:05 AM, Jiri Pirko <jiri@resnulli.us> wrote:

> The main part of this patchset is the introduction of a generic way how
> to get/set info of slaves unsing rtnl_link_ops.
> 
> Jiri Pirko (5):
>  bonding: change name of sysfs dir for bonding slaves
>  rtnetlink: put "BOND" into nl attribute names which are related to
>    bonding
>  rtnetlink: provide api for getting and setting slave info
>  bonding: convert netlink to use slave data info api
>  rtnetlink: remove ndo_get_slave

Looks good.  You’ll need to update iproute2 patch I posted to Stephen to reflect the IFLA_BOND_SLAVE_xx renames and nesting.

(patch 6/6 was part of this, correct)?

Acked-by: Scott Feldman <sfeldma@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH] net: Fix some fallout from the etner_addr_copy() changes.
From: Joe Perches @ 2014-01-22 22:56 UTC (permalink / raw)
  To: David Miller; +Cc: billfink, netdev
In-Reply-To: <20140121.225431.117378955801559296.davem@davemloft.net>

On Tue, 2014-01-21 at 22:54 -0800, David Miller wrote:
> From: Bill Fink <billfink@mindspring.com>
> Date: Tue, 21 Jan 2014 23:23:31 -0500
> 
> > The commit message indicates problems with appletalk/aarp.c,
> > atm/lec.c, and caif/caif_usb.c, but the diffstat and patch only
> > address the first two and not caif/caif_usb.c.  Is that intended
> > or am I missing something.
> 
> Thanks for catching that, I just pushed the missing change.

My apologies for my stupid carelessness in the
patches I sent here.

I thought #define ETH_ALEN was in the same #include
as ether_addr_copy and I didn't bother compiling.

^ permalink raw reply

* Re: [PATCH] 6lowpan: add a license to 6lowpan_iphc module
From: Marcel Holtmann @ 2014-01-22 22:49 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, Gustavo F. Padovan,
	Johan Hedberg, David S. Miller, linux-zigbee-devel,
	BlueZ development, Network Development, linux-kernel,
	Jukka Rissanen, Alexander Aring
In-Reply-To: <1390418724-9804-1-git-send-email-ydroneaud@opteya.com>

Hi Yann,

> Since commit 8df8c56a5abc, 6lowpan_iphc is a module of its own.
> 
> Unfortunately, it lacks some infrastructure to behave like a
> good kernel citizen:
> 
>  kernel: 6lowpan_iphc: module license 'unspecified' taints kernel.
>  kernel: Disabling lock debugging due to kernel taint
> 
> This patch adds the basic MODULE_LICENSE(); with GPL license:
> the code was copied from net/ieee802154/6lowpan.c which is GPL
> and the module exports symbol with EXPORT_SYMBOL_GPL();.
> 
> Cc: Jukka Rissanen <jukka.rissanen@linux.intel.com>
> Cc: Alexander Aring <alex.aring@gmail.com>
> Cc: Marcel Holtmann <marcel@holtmann.org>
> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
> ---
> net/ieee802154/6lowpan_iphc.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/net/ieee802154/6lowpan_iphc.c b/net/ieee802154/6lowpan_iphc.c
> index e14fe8b2c054..860aa2d445ba 100644
> --- a/net/ieee802154/6lowpan_iphc.c
> +++ b/net/ieee802154/6lowpan_iphc.c
> @@ -52,6 +52,7 @@
> 
> #include <linux/bitops.h>
> #include <linux/if_arp.h>
> +#include <linux/module.h>
> #include <linux/netdevice.h>
> #include <net/ipv6.h>
> #include <net/af_ieee802154.h>
> @@ -797,3 +798,5 @@ int lowpan_header_compress(struct sk_buff *skb, struct net_device *dev,
> 	return 0;
> }
> EXPORT_SYMBOL_GPL(lowpan_header_compress);
> +
> +MODULE_LICENSE("GPL”);

looks good to me.

Acked-by: Marcel Holtmann <marcel@holtmann.org>

Regards

Marcel

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox