Netdev List

* davinci-mdio: failing to connect to PHY
From: Petr Kulhavy @ 2016-04-04  8:18 UTC (permalink / raw)
  To: netdev

Hi,

I'm experiencing a peculiar problem with PHY communication in the 
current davinci-mdio.c driver.
After upgrading from kernel 3.17 to 4.5 my DT based AM1808 board started 
having issues with the PHY communication.
The MAC is detected, the MDIO is detected, and the PHY is detected
(twice?!), but no data is sent or received, and after
issuing "ifdown -a" the MDIO starts reporting that it cannot
connect to the PHY:

net eth0: could not connect to phy davinci_mdio.0:00
davinci_mdio davinci_mdio.0: resetting idled controller

I'm using a single Micrel KSZ8081 PHY connected via RMII using the 
default PHY address 0x01.
Here is the dmesg excerpt related to mdio:

davinci_mdio davinci_mdio.0: Runtime PM disabled, clock forced on.
davinci_mdio davinci_mdio.0: davinci mdio revision 1.5
davinci_mdio davinci_mdio.0: detected phy mask fffffffc
libphy: davinci_mdio.0: probed
davinci_mdio davinci_mdio.0: phy[0]: device davinci_mdio.0:00, driver Micrel KSZ8081 or KSZ8091
davinci_mdio davinci_mdio.0: phy[1]: device davinci_mdio.0:01, driver Micrel KSZ8081 or KSZ8091
davinci_mdio davinci_mdio.0: resetting idled controller
Micrel KSZ8081 or KSZ8091 davinci_mdio.0:00: failed to disable NAND tree mode
Micrel KSZ8081 or KSZ8091 davinci_mdio.0:00: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=davinci_mdio.0:00, irq=-1)

After a soft-reboot the MDIO uses a different PHY mask fffffffd, detects 
correctly only one PHY at address 1 (this is the default address) and 
the networking works:

davinci_mdio davinci_mdio.0: Runtime PM disabled, clock forced on.
davinci_mdio davinci_mdio.0: davinci mdio revision 1.5
davinci_mdio davinci_mdio.0: detected phy mask fffffffd
libphy: davinci_mdio.0: probed
davinci_mdio davinci_mdio.0: phy[1]: device davinci_mdio.0:01, driver Micrel KSZ8081 or KSZ8091
davinci_mdio davinci_mdio.0: resetting idled controller
Micrel KSZ8081 or KSZ8091 davinci_mdio.0:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=davinci_mdio.0:01, irq=-1)

I'm wondering what the problem is and why the PHY mask is different 
after power-up and after a soft reboot.
Also it's not clear to me why this set-up worked with kernel 3.17 even
though it detected the PHY twice in exactly the same way.
How does the mask relate to the PHY address and how is it calculated?

Thanks
Petr
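
On the last question, a minimal sketch of how such a mask is read. In
libphy, a set bit in the bus phy_mask means "do not probe this address".
The assumption here (not confirmed in this message) is that davinci_mdio
builds the mask by inverting the controller's ALIVE register, so a clear
bit marks an address that answered. The helper name phy_is_probed is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the PHY at address `addr` (0..31) would be probed,
 * i.e. its bit in the mask is clear. A set bit means "skip this address".
 */
static bool phy_is_probed(uint32_t phy_mask, unsigned int addr)
{
	assert(addr < 32);
	return !(phy_mask & (UINT32_C(1) << addr));
}
```

Read this way, fffffffc leaves addresses 0 and 1 open (matching the
phy[0] and phy[1] lines after power-up), while fffffffd leaves only
address 1 open (matching the log after the soft reboot).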

* [RFC] ipv6: allow bypassing cross-intf routing limits
From: Michal Kazior @ 2016-04-04  8:15 UTC (permalink / raw)
  To: netdev; +Cc: Michal Kazior
In-Reply-To: <CA+BoTQnC-OKZ8eRohBYetfyW6-xo31kJtS8Lh+svxC=fkVsrXw@mail.gmail.com>

There are some use cases where allowing link-local
routing is useful for bridging purposes.

One of these is transparent 802.11 bridging. Due
to 802.11 framing limitations, many Access Points
make it impossible to create bridges on Client
endpoints because they can't maintain the
Destination/Source/Transmitter/Receiver address
distinction with only 3 addresses in the frame header.

The default behavior, i.e. link-local traffic
being non-routable, remains. The user has to
explicitly enable the bypass when defining a given
route.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
For more background see:

  http://www.spinics.net/lists/netdev/msg371022.html



 include/uapi/linux/rtnetlink.h |  8 ++++++--
 net/ipv6/ip6_output.c          | 11 +++++++++--
 net/ipv6/route.c               |  4 ++++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5da86d..a577eec0e56e 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -424,9 +424,13 @@ enum {
 #define RTAX_FEATURE_SACK	(1 << 1)
 #define RTAX_FEATURE_TIMESTAMP	(1 << 2)
 #define RTAX_FEATURE_ALLFRAG	(1 << 3)
+#define RTAX_FEATURE_XFACE	(1 << 4)
 
-#define RTAX_FEATURE_MASK	(RTAX_FEATURE_ECN | RTAX_FEATURE_SACK | \
-				 RTAX_FEATURE_TIMESTAMP | RTAX_FEATURE_ALLFRAG)
+#define RTAX_FEATURE_MASK	(RTAX_FEATURE_ECN | \
+				 RTAX_FEATURE_SACK | \
+				 RTAX_FEATURE_TIMESTAMP | \
+				 RTAX_FEATURE_ALLFRAG | \
+				 RTAX_FEATURE_XFACE)
 
 struct rta_session {
 	__u8	proto;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9428345d3a07..9abb42acb6ad 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -283,6 +283,7 @@ static int ip6_forward_proxy_check(struct sk_buff *skb)
 	u8 nexthdr = hdr->nexthdr;
 	__be16 frag_off;
 	int offset;
+	int feat = dst_metric_raw(skb_dst(skb), RTAX_FEATURES);
 
 	if (ipv6_ext_hdr(nexthdr)) {
 		offset = ipv6_skip_exthdr(skb, sizeof(*hdr), &nexthdr, &frag_off);
@@ -320,8 +321,11 @@ static int ip6_forward_proxy_check(struct sk_buff *skb)
 	 * The proxying router can't forward traffic sent to a link-local
 	 * address, so signal the sender and discard the packet. This
 	 * behavior is clarified by the MIPv6 specification.
+	 *
+	 * It's useful to allow an override for transparent traffic relay.
 	 */
-	if (ipv6_addr_type(&hdr->daddr) & IPV6_ADDR_LINKLOCAL) {
+	if ((ipv6_addr_type(&hdr->daddr) & IPV6_ADDR_LINKLOCAL) &&
+	    !(feat & RTAX_FEATURE_XFACE)) {
 		dst_link_failure(skb);
 		return -1;
 	}
@@ -485,12 +489,15 @@ int ip6_forward(struct sk_buff *skb)
 			inet_putpeer(peer);
 	} else {
 		int addrtype = ipv6_addr_type(&hdr->saddr);
+		int feat = dst_metric_raw(dst, RTAX_FEATURES);
 
 		/* This check is security critical. */
 		if (addrtype == IPV6_ADDR_ANY ||
 		    addrtype & (IPV6_ADDR_MULTICAST | IPV6_ADDR_LOOPBACK))
 			goto error;
-		if (addrtype & IPV6_ADDR_LINKLOCAL) {
+
+		if ((addrtype & IPV6_ADDR_LINKLOCAL) &&
+		    !(feat & RTAX_FEATURE_XFACE)) {
 			icmpv6_send(skb, ICMPV6_DEST_UNREACH,
 				    ICMPV6_NOT_NEIGHBOUR, 0);
 			goto error;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ed446639219c..560c99853907 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -629,8 +629,12 @@ static inline enum rt6_nud_state rt6_check_neigh(struct rt6_info *rt)
 static int rt6_score_route(struct rt6_info *rt, int oif,
 			   int strict)
 {
+	int feat = dst_metric_raw(&rt->dst, RTAX_FEATURES);
 	int m;
 
+	if (feat & RTAX_FEATURE_XFACE)
+		strict &= ~RT6_LOOKUP_F_IFACE;
+
 	m = rt6_check_dev(rt, oif);
 	if (!m && (strict & RT6_LOOKUP_F_IFACE))
 		return RT6_NUD_FAIL_HARD;
-- 
2.1.4
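
The check that the patch changes in both ip6_forward_proxy_check() and
ip6_forward() can be sketched in isolation as below. This is a
userspace model only: RTAX_FEATURE_XFACE takes the value proposed in
the diff, ADDR_LINKLOCAL is a stand-in for the kernel's
IPV6_ADDR_LINKLOCAL, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define RTAX_FEATURE_XFACE	(1 << 4)	/* value proposed by the patch */
#define ADDR_LINKLOCAL		0x0020U		/* stand-in for IPV6_ADDR_LINKLOCAL */

/*
 * Link-local traffic is still refused by default; it is only let
 * through when the route carries the XFACE feature bit.
 */
static bool linklocal_forward_blocked(unsigned int addrtype, unsigned int feat)
{
	return (addrtype & ADDR_LINKLOCAL) && !(feat & RTAX_FEATURE_XFACE);
}
```

The default (feat == 0) keeps the existing behaviour, which is the
point made in the commit message: the bypass is opt-in per route.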

* Re: net: memory leak due to CLONE_NEWNET
From: Dmitry Vyukov @ 2016-04-04  8:13 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Nicolas Dichtel, Thomas Graf, netdev, LKML,
	Eric Dumazet, syzkaller, Kostya Serebryany, Alexander Potapenko,
	Sasha Levin
In-Reply-To: <CAM_iQpU2_7dFR22xqHm3R3Vh7jbbe1=j3CBBnjhQj3G3AnYvYg@mail.gmail.com>

On Sun, Apr 3, 2016 at 12:31 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sat, Apr 2, 2016 at 6:55 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> Hello,
>>
>> The following program leads to memory leaks in:
>>
>> unreferenced object 0xffff88005c10d208 (size 96):
>>   comm "a.out", pid 10753, jiffies 4296778619 (age 43.118s)
>>   hex dump (first 32 bytes):
>>     e8 31 85 2d 00 88 ff ff 0f 00 00 00 00 00 00 00  .1.-............
>>     00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
>>   backtrace:
>>     [<ffffffff8679bb23>] kmemleak_alloc+0x63/0xa0 mm/kmemleak.c:915
>>     [<     inline     >] kmemleak_alloc_recursive include/linux/kmemleak.h:47
>>     [<     inline     >] slab_post_alloc_hook mm/slab.h:406
>>     [<     inline     >] slab_alloc_node mm/slub.c:2602
>>     [<     inline     >] slab_alloc mm/slub.c:2610
>>     [<ffffffff8179b4c0>] kmem_cache_alloc_trace+0x160/0x3d0 mm/slub.c:2627
>>     [<     inline     >] kmalloc include/linux/slab.h:478
>>     [<     inline     >] tc_action_net_init include/net/act_api.h:122
>>     [<ffffffff8574e62e>] csum_init_net+0x15e/0x450 net/sched/act_csum.c:593
>>     [<ffffffff8564ffc9>] ops_init+0xa9/0x3a0 net/core/net_namespace.c:109
>>     [<ffffffff85650474>] setup_net+0x1b4/0x3e0 net/core/net_namespace.c:287
>>     [<ffffffff85651a56>] copy_net_ns+0xd6/0x1a0 net/core/net_namespace.c:367
>>     [<ffffffff813d01bf>] create_new_namespaces+0x37f/0x740 kernel/nsproxy.c:106
>>     [<ffffffff813d0b69>] unshare_nsproxy_namespaces+0xa9/0x1d0
>
> The following patch should fix it.
>
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index 2a19fe1..03e322b 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -135,6 +135,7 @@ void tcf_hashinfo_destroy(const struct tc_action_ops *ops,
>  static inline void tc_action_net_exit(struct tc_action_net *tn)
>  {
>         tcf_hashinfo_destroy(tn->ops, tn->hinfo);
> +       kfree(tn->hinfo);
>  }
>
>  int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,


Fixes the leak for me.

Tested-by: Dmitry Vyukov <dvyukov@google.com>

Thanks
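
The shape of the leak and of the fix can be modelled in userspace as
below. This is only a sketch: the struct and function names are
hypothetical stand-ins for tc_action_net, tcf_hashinfo and the
init/exit helpers, and malloc/free stand in for kzalloc/kfree. The
live_allocs counter exists purely to make the imbalance visible.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;	/* number of outstanding allocations */

struct hashinfo {		/* stand-in for struct tcf_hashinfo */
	int *buckets;
};

struct action_net {		/* stand-in for struct tc_action_net */
	struct hashinfo *hinfo;
};

/* Allocates the container and its table: two allocations in total. */
static int action_net_init(struct action_net *tn, size_t nbuckets)
{
	tn->hinfo = malloc(sizeof(*tn->hinfo));
	if (!tn->hinfo)
		return -1;
	live_allocs++;

	tn->hinfo->buckets = calloc(nbuckets, sizeof(int));
	if (!tn->hinfo->buckets) {
		free(tn->hinfo);
		tn->hinfo = NULL;
		live_allocs--;
		return -1;
	}
	live_allocs++;
	return 0;
}

/* Frees only the table, like the destroy helper called from exit. */
static void hashinfo_destroy(struct hashinfo *h)
{
	free(h->buckets);
	h->buckets = NULL;
	live_allocs--;
}

static void action_net_exit(struct action_net *tn)
{
	hashinfo_destroy(tn->hinfo);
	free(tn->hinfo);	/* the step the quoted patch adds */
	tn->hinfo = NULL;
	live_allocs--;
}
```

Without the second free in action_net_exit(), one allocation per
namespace would stay outstanding, which is consistent with the
96-byte object kmemleak reports for every CLONE_NEWNET.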

* System hangs (unable to handle kernel paging request)
From: Oleksii Berezhniak @ 2016-04-04  7:59 UTC (permalink / raw)
  To: netdev

Good day.

We have a PPPoE server running CentOS 7 (kernel 3.10.0-327.10.1.el7.dsip.x86_64).

We applied some PPPoE-related patches to this kernel:

ppp: don't override sk->sk_state in pppoe_flush_dev()
ppp: fix pppoe_dev deletion condition in pppoe_release()
pppoe: fix memory corruption in padt work structure
pppoe: fix reference counting in PPPoE proxy

We also built the latest version of the ixgbe driver from Intel.

Now we see crashes after approximately one week of uptime:

[545444.673270] BUG: unable to handle kernel paging request at ffff88a005040200
[545444.673306] IP: [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
[545444.673335] PGD 0
[545444.673348] Oops: 0000 [#1] SMP
[545444.673367] Modules linked in: arc4 ppp_mppe act_police cls_u32 sch_ingress sch_tbf pptp gre pppoe pppox ppp_generic slhc 8021q garp stp mrp llc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_filter xt_TCPMSS iptable_mangle xt_CT nf_conntrack iptable_raw w83793 hwmon_vid snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec coretemp snd_hda_core iTCO_wdt kvm iTCO_vendor_support snd_hwdep snd_seq snd_seq_device ipmi_ssif ppdev lpc_ich snd_pcm pcspkr mfd_core sg ipmi_si snd_timer snd i2c_i801 ipmi_msghandler ioatdma parport_pc parport shpchp soundcore i7core_edac tpm_infineon edac_core ip_tables ext4 mbcache jbd2 sd_mod crct10dif_generic crc_t10dif crct10dif_common syscopyarea sysfillrect firewire_ohci sysimgblt i2c_algo_bit drm_kms_helper ata_generic pata_acpi
[545444.674383]  ttm firewire_core crc_itu_t serio_raw drm ata_piix libata crc32c_intel i2c_core ixgbe(OE) vxlan e1000e ip6_udp_tunnel udp_tunnel aacraid dca ptp pps_core
[545444.674783] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           OE ------------   3.10.0-327.10.1.el7.dsip.x86_64 #1
[545444.675032] Hardware name: empty empty/S7010, BIOS 'V2.06  ' 03/31/2010
[545444.675162] task: ffff880139c55c00 ti: ffff880139c84000 task.ti: ffff880139c84000
[545444.675400] RIP: 0010:[<ffffffff811c0e95>]  [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
[545444.675641] RSP: 0018:ffff88023fc23ce8  EFLAGS: 00010286
[545444.675766] RAX: 0000000000000000 RBX: ffff8802302eab00 RCX: 000000010eb8edbe
[545444.676002] RDX: 000000010eb8edbd RSI: 0000000000000020 RDI: ffff88013b803700
[545444.676237] RBP: ffff88023fc23d18 R08: 00000000000175a0 R09: ffffffff81517e70
[545444.676472] R10: 000000000000006b R11: 0000000000000000 R12: ffff88a005040200
[545444.676706] R13: 0000000000000020 R14: ffff88013b803700 R15: ffff88013b803700
[545444.676942] FS:  0000000000000000(0000) GS:ffff88023fc20000(0000) knlGS:0000000000000000
[545444.677180] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[545444.677307] CR2: ffff88a005040200 CR3: 0000000237e63000 CR4: 00000000000007e0
[545444.677543] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[545444.677779] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[545444.678014] Stack:
[545444.678127]  ffff880237ea2040 ffff8802302eab00 0000000000000280 0000000000000280
[545444.678370]  0000000000000006 ffff880236bb1b60 ffff88023fc23d40 ffffffff81517e70
[545444.678614]  0000000000000280 ffff8802302eab00 0000000000000000 ffff88023fc23d60
[545444.678857] Call Trace:
[545444.678973]  <IRQ>
[545444.679100]  [<ffffffff81517e70>] build_skb+0x30/0x1d0
[545444.679222]  [<ffffffff8151a973>] __alloc_rx_skb+0x63/0xb0
[545444.679349]  [<ffffffff8151a9db>] __netdev_alloc_skb+0x1b/0x40
[545444.679492]  [<ffffffffa0104d8e>] ixgbe_clean_rx_irq+0xee/0xa50 [ixgbe]
[545444.679624]  [<ffffffff8152862f>] ? __napi_complete+0x1f/0x30
[545444.679756]  [<ffffffffa0106738>] ixgbe_poll+0x2d8/0x6d0 [ixgbe]
[545444.679886]  [<ffffffff8152b092>] net_rx_action+0x152/0x240
[545444.680015]  [<ffffffff81084aef>] __do_softirq+0xef/0x280
[545444.680144]  [<ffffffff8164735c>] call_softirq+0x1c/0x30
[545444.680277]  [<ffffffff81016fc5>] do_softirq+0x65/0xa0
[545444.680402]  [<ffffffff81084e85>] irq_exit+0x115/0x120
[545444.680529]  [<ffffffff81647ef8>] do_IRQ+0x58/0xf0
[545444.680660]  [<ffffffff8163d1ad>] common_interrupt+0x6d/0x6d
[545444.680786]  <EOI>
[545444.680914]  [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
[545444.681041]  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
[545444.681168]  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
[545444.681297]  [<ffffffff810d62c5>] cpu_startup_entry+0x245/0x290
[545444.681427]  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
[545444.681554] Code: ce 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 1f 01 00 00 48 85 c0 0f 84 16 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
[545444.682056] RIP  [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
[545444.682186]  RSP <ffff88023fc23ce8>
[545444.682305] CR2: ffff88a005040200


The description and call stack are the same every time.

What could be causing these crashes?

Thanks.

-- 
WBR

* Re: [PATCH] ip6_tunnel: set rtnl_link_ops before calling register_netdevice
From: Nicolas Dichtel @ 2016-04-04  7:51 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, netdev
In-Reply-To: <1459541870-26938-1-git-send-email-cascardo@redhat.com>

On 01/04/2016 22:17, Thadeu Lima de Souza Cascardo wrote:
> When creating an ip6tnl tunnel with ip tunnel, rtnl_link_ops is not set
> before ip6_tnl_create2 is called. When register_netdevice is called, there
> is no linkinfo attribute in the NEWLINK message because of that.
>
> Setting rtnl_link_ops before calling register_netdevice fixes that.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Fixes: 0b112457229d ("ip6tnl: add support of link creation via rtnl")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

* Re: [RFC PATCH 0/5] Add driver bpf hook for early packet drop
From: Jesper Dangaard Brouer @ 2016-04-04  7:48 UTC (permalink / raw)
  To: Brenden Blanco
  Cc: Tom Herbert, David S. Miller, Linux Kernel Network Developers,
	Alexei Starovoitov, gerlitz, Daniel Borkmann, john fastabend,
	brouer, Alexander Duyck
In-Reply-To: <20160403054103.GB21980@gmail.com>

On Sat, 2 Apr 2016 22:41:04 -0700
Brenden Blanco <bblanco@plumgrid.com> wrote:

> On Sat, Apr 02, 2016 at 12:47:16PM -0400, Tom Herbert wrote:
>
> > Very nice! Do you think this hook will be sufficient to implement a
> > fast forward patch also?

(DMA experts please verify and correct me!)

One of the gotchas is how DMA sync/unmap works.  For forwarding you
need to modify the headers.  The DMA sync API (DMA_FROM_DEVICE) specifies
that the data is to be _considered_ read-only.  AFAIK you can write into
the data, BUT on DMA_unmap the API/DMA-engine is allowed to overwrite
the data... note that on most archs DMA_unmap does not overwrite.

This DMA issue should not block the work on a hook for early packet drop.
Maybe we should add a flag option that tells the hook whether the
packet is read-only? (e.g. if the driver uses page-fragments and DMA_sync)

We should have another track/thread on how to solve the DMA issue:
I see two solutions.

Solution 1: Simply use a "full" page per packet and do the DMA_unmap.
This results in a slowdown on archs with expensive DMA-map/unmap.  And
we stress the page allocator more (can be solved with a page-pool-cache).
Eric will not like this due to memory usage, but we can just add a
"copy-break" step for normal stack hand-off.

Solution 2: (Due credit to Alex Duyck, this idea came up while
discussing the issue with him).  Remember DMA_sync'ed data is only
considered read-only because the DMA_unmap can be destructive.  In many
cases DMA_unmap is not.  Thus, we could take advantage of this and
allow modifying DMA sync'ed data on those DMA setups.

> That is the goal, but more work needs to be done of course. It won't be
> possible with just a single pseudo skb, the driver will need a fast
> way to get batches of pseudo skbs (per core?) through from rx to tx.
> In mlx4 for instance, either the skb needs to be much more complete
> to be handled from the start of mlx4_en_xmit(), or that function
> would need to be split so that the fast tx could start midway through.
> 
> Or, skb allocation just gets much faster. Then it should be pretty
> straightforward.

With the bulking SLUB API, we can reduce the bare kmem_cache_alloc+free
cost per SKB from 90 cycles to 27 cycles.  It is good, but for really
fast forwarding it would be good to avoid allocating any extra data
structures.  We just want to move a RX packet-page to a TX ring queue.

Maybe the 27 cycles kmem_cache/slab cost is considered "fast-enough"
for what we gain in ease of implementation.  The really expensive part of
the SKB process is memset/clearing the SKB, which the fast forward
use-case could avoid.  Splitting the SKB alloc and clearing part would
be a needed first step.
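
To put the 90 and 27 cycle figures in context, here is a rough
per-packet budget calculation. The 10 Gbit/s link speed and the 3 GHz
clock used in the note below are assumptions chosen for illustration,
not numbers from this thread; the function names are hypothetical.

```c
#include <assert.h>

/*
 * Bits on the wire for one minimum-size Ethernet frame:
 * 64-byte frame + 8-byte preamble + 12-byte inter-frame gap.
 */
#define MIN_FRAME_WIRE_BITS	((64 + 8 + 12) * 8)

/* Maximum rate of minimum-size frames on a link of the given speed. */
static double packets_per_second(double link_bps)
{
	return link_bps / MIN_FRAME_WIRE_BITS;
}

/* CPU cycles available per packet on one core at that packet rate. */
static double cycles_per_packet(double cpu_hz, double pps)
{
	assert(pps > 0.0);
	return cpu_hz / pps;
}
```

Under those assumptions, packets_per_second(10e9) is about 14.88
million and cycles_per_packet(3e9, ...) is about 201 cycles, so 90
cycles of alloc+free would take roughly 45% of the budget and 27
cycles roughly 13%.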

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

* Has the net-next tree been open now?
From: Dexuan Cui @ 2016-04-04  7:44 UTC (permalink / raw)
  To: David Miller, netdev@vger.kernel.org

Hi David,
I saw the v4.6-rc1 tag had been in net-next.git and a bunch of stmmac patches
appeared on the tree's master branch yesterday.

Thanks,
-- Dexuan

* Re: [PATCH v3 00/16] add Intel X722 iWARP driver
From: Christoph Hellwig @ 2016-04-04  7:39 UTC (permalink / raw)
  To: Faisal Latif
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1453318816-21672-1-git-send-email-faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Wed, Jan 20, 2016 at 01:40:00PM -0600, Faisal Latif wrote:
> This driver provides iWARP RDMA functionality for the Intel(R) X722 Ethernet
> controller for PCI Physical Functions. It is in early product cycle
> and having the driver in the kernel will allow users to have hardware support
> when available for purchase.

Just curious: how is this driver supposed to work?  It doesn't seem to
support FRWRs despite the iWarp spec requiring support for them.  It also
sets IB_DEVICE_MEM_MGT_EXTENSIONS despite lacking these methods,
which will lead to instant crashes when using any of the usual drivers.

* Re: [RFC PATCH 0/5] Add driver bpf hook for early packet drop
From: Johannes Berg @ 2016-04-04  7:37 UTC (permalink / raw)
  To: Lorenzo Colitti, Tom Herbert
  Cc: Brenden Blanco, David S. Miller, Linux Kernel Network Developers,
	Alexei Starovoitov, gerlitz, Daniel Borkmann, john fastabend,
	Jesper Dangaard Brouer
In-Reply-To: <CAKD1Yr351bEXwBOj8e8Hq=_u7J4Zi2-r=w3k9Z3XFe0AP4m5aw@mail.gmail.com>

On Sun, 2016-04-03 at 11:28 +0900, Lorenzo Colitti wrote:

> That said, getting BPF to the driver is part of the picture. On the
> chipsets we're targeting for APF, we're only seeing 2k-4k of memory
> (that's 256-512 BPF instructions) available for filtering code, which
> means that BPF might be too large.

That's true, but I think that as far as the userspace API is concerned
that shouldn't really be an issue. I think we can compile the BPF into
APF, similar to how BPF can be compiled into machine code today.
Additionally, I'm not sure we can realistically expect all devices to
really implement APF "natively", I think there's a good chance but
there's also a possibility of compiling to the native firmware
environment, for example.
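
The "2k-4k of memory is 256-512 BPF instructions" figure quoted above
follows from the 8-byte size of a classic BPF instruction. A small
sketch of that arithmetic, where struct cbpf_insn is a local copy of
the classic instruction layout and max_insns is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as a classic BPF instruction (struct sock_filter). */
struct cbpf_insn {
	uint16_t code;	/* opcode */
	uint8_t  jt;	/* jump offset if true */
	uint8_t  jf;	/* jump offset if false */
	uint32_t k;	/* generic immediate field */
};

/* How many instructions fit in a filter memory of `mem_bytes` bytes. */
static size_t max_insns(size_t mem_bytes)
{
	return mem_bytes / sizeof(struct cbpf_insn);
}
```
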

johannes

* Re: [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program
From: Johannes Berg @ 2016-04-04  7:35 UTC (permalink / raw)
  To: Brenden Blanco
  Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel,
	john.fastabend, brouer
In-Reply-To: <20160403063834.GE21980@gmail.com>

On Sat, 2016-04-02 at 23:38 -0700, Brenden Blanco wrote:
> 
> Having a common check makes sense. The tricky thing is that the type can
> only be checked after taking the reference, and I wanted to keep the
> scope of the prog brief in the case of errors. I would have to move the
> bpf_prog_get logic into dev_change_bpf_fd and pass a bpf_prog * into the
> ndo instead. Would that API look fine to you?

I can't really comment, I wasn't planning on using the API right now :)

However, what else is there that the driver could possibly do with the
FD, other than getting the bpf_prog?

> A possible extension of this is just to keep the bpf_prog * in the
> netdev itself and expose a feature flag from the driver rather than
> an ndo. But that would mean another 8 bytes in the netdev.

That also misses the signal to the driver when the program is
set/removed, so I don't think that works. I'd argue it's not really
desirable anyway though since I wouldn't expect a majority of drivers
to start supporting this.

johannes

* [PATCH] net: socket: return a proper error code when source address becomes nonlocal
From: Liping Zhang @ 2016-04-04  7:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, Liping Zhang

From: Liping Zhang <liping.zhang@spreadtrum.com>

1. A socket can bind to a local IP address directly via bind() or
   indirectly via connect(). If the network later goes down and the
   source address becomes nonlocal, send() fails with EINVAL. This
   error code is confusing, since we did not actually pass any invalid
   arguments. Furthermore, send() may have succeeded at first and now
   fails only because of a temporary network problem, i.e. once the
   network recovers, send() will succeed again. Returning EADDRNOTAVAIL
   instead of EINVAL is better in this situation.
2. We can use IPV6_PKTINFO to specify the IPv6 source address when
   calling sendmsg(), but if the address is not available, the call
   fails with EINVAL. This error code is not very appropriate, since
   the failure may be due only to a temporary network problem. Also,
   RFC 3542, section 6.6 gives an example that returns EADDRNOTAVAIL:
   "ipi6_ifindex specifies an interface but the address ipi6_addr is not
   available for use on that interface.". So return EADDRNOTAVAIL instead
   of EINVAL here.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
---
 net/ipv4/route.c    |    6 ++++--
 net/ipv6/datagram.c |    2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 02c6229..857f7b3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2149,11 +2149,13 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,

 	rcu_read_lock();
 	if (fl4->saddr) {
-		rth = ERR_PTR(-EINVAL);
+		rth = ERR_PTR(-EADDRNOTAVAIL);
 		if (ipv4_is_multicast(fl4->saddr) ||
 		    ipv4_is_lbcast(fl4->saddr) ||
-		    ipv4_is_zeronet(fl4->saddr))
+		    ipv4_is_zeronet(fl4->saddr)) {
+			rth = ERR_PTR(-EINVAL);
 			goto out;
+		}

 		/* I removed check for oif == dev_out->oif here.
 		   It was wrong for two reasons:
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 4281621..04d62e8 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -746,7 +746,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 						   strict ? dev : NULL, 0) &&
 				    !ipv6_chk_acast_addr_src(net, dev,
 							     &src_info->ipi6_addr))
-					err = -EINVAL;
+					err = -EADDRNOTAVAIL;
 				else
 					fl6->saddr = src_info->ipi6_addr;
 			}
-- 
1.7.9.5
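
From the application side, the distinction the patch draws could be
used as in the sketch below. It assumes the kernel behaves as the
patch proposes; the helper name and the retry policy are hypothetical
and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a failed send()/sendmsg() is worth retrying later.
 * With the proposed change, EADDRNOTAVAIL means "the source address is
 * not usable right now", while EINVAL keeps meaning "bad arguments".
 */
static bool send_error_is_transient(int err)
{
	switch (err) {
	case EADDRNOTAVAIL:
		return true;	/* address may come back when the link recovers */
	default:
		return false;	/* includes EINVAL: retrying will not help */
	}
}
```

A caller would typically check errno after send() returns -1 and keep
the socket open when this helper returns true.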

* Re: Section 4 No. 9,10 Failed was occurred by IPv6 Ready Logo Conformance Test
From: Yuki Machida @ 2016-04-04  6:43 UTC (permalink / raw)
  To: Rongqing Li, netdev
In-Reply-To: <56FE2A90.8080300@jp.fujitsu.com>

Hi Roy,

On 2016-04-01 17:00, Yuki Machida wrote:
> Hi Roy,
> 
> Thank you for your advice.
> I am very glad.
> 
> Further comment below.
> 
> On 2016-04-01 16:43, Rongqing Li wrote:
>>
>>
>> On 2016-04-01 15:31, Yuki Machida wrote:
>>> Hi all,
>>>
>>> I tested 4.6-rc1 by IPv6 Ready Logo Core Conformance Test.
>>> 4.6-rc1 has some FAILs in Section 4 (RFC 1981: Path MTU Discovery for IP version 6).
>>> I confirmed that it PASSed in 3.14.28 and FAILed in 4.1.17.
>>> I will find a patch between 3.14 and 4.1.
>>>
>>> IPv6 Ready Logo
>>> https://www.ipv6ready.org/
>>> TAHI Project
>>> http://www.tahi.org/
>>>
>>> I ran the IPv6 Ready Logo Core Conformance Test on Intel D510MO (Atom D510).
>>> It is using userland build with yocto project.
>>>
>>> Test Environment
>>> Test Specification          : 4.0.6
>>> Tool Version                : REL_3_3_2
>>> Test Program Version        : V6LC_5_0_0
>>> Target Device               : Intel D510MO (Atom D510)
>>>
>>> List of FAILs
>>>
>>> Section 4: RFC 1981 - Path MTU Discovery for IPv6
>>> - Test v6LC.4.1.6: Receiving MTU Below IPv6 Minimum Link MTU
>>>      - No. 9 Part A: MTU equal to 56
>>>      - No.10 Part B: MTU equal to 1279
>>>
>>
>> apply this one
>>
>> commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4
>> Author: Hangbin Liu <liuhangbin@gmail.com>
>> Date:   Thu Jul 30 14:28:42 2015 +0800
>>
>>       net/ipv6: add sysctl option accept_ra_min_hop_limit
>>
>>       Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
>>       disabled accept hop limit from RA if it is smaller than the current hop
>>       limit for security stuff. But this behavior kind of break the RFC definition.
>>
>>       RFC 4861, 6.3.4.  Processing Received Router Advertisements
>>          A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
>>          and Retrans Timer) may contain a value denoting that it is
>>          unspecified.  In such cases, the parameter should be ignored and the
>>          host should continue using whatever value it is already using.
>>
>>          If the received Cur Hop Limit value is non-zero, the host SHOULD set
>>          its CurHopLimit variable to the received value.
>>
>>       So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
>>       hop limit value they can accept from RA. And set default to 1 to meet RFC
>>       standards.
>>
>>       Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>>       Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
>>       Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I confirmed that the above patch was applied in v4.3 in linux.git.
> 
> % git tag --contains=8013d1d7eafb0589ca766db6b74026f76b7f5cb4 | head
> v4.3
> v4.3-rc1
> v4.3-rc2
> v4.3-rc3
> v4.3-rc4
> v4.3-rc5
> v4.3-rc6
> v4.3-rc7
> v4.4
> v4.4-rc1
> 
>>
>>
>>
>>
>>
>> and revert the below one, the TAHI should be updated
>>
>> commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2
>> Author: Hagen Paul Pfeifer <hagen@jauu.net>
>> Date: Thu Jan 15 22:34:25 2015 +0100
>>
>>       ipv6: stop sending PTB packets for MTU < 1280
>>
>>       Reduce the attack vector and stop generating IPv6 Fragment Header for
>>       paths with an MTU smaller than the minimum required IPv6 MTU
>>       size (1280 byte) - called atomic fragments.
>>
>>       See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
>>       for more information and how this "feature" can be misused.
>>
>>       [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00
>>
>>       Signed-off-by: Fernando Gont <fgont@si6networks.com>
>>       Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
>>       Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>       Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I will try.

I confirmed that v4.1.20 with the above patch reverted passes the Section 4
No. 9 and 10 test cases of the IPv6 Ready Logo Conformance Test.
I can't simply revert the above patch on v4.6-rc1 because the implementation has changed.

I am considering how to fix this problem for mainline.
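
For reference, a sketch of the host behaviour that the two failing
cases (MTU 56 and MTU 1279) appear to exercise, as I read RFC 1981 and
RFC 2460 section 5: the path MTU estimate is never reduced below 1280,
and a Fragment header is added to later packets instead. The struct
and function names are hypothetical, and this is a model of the RFC
text, not of the Linux code that the reverted commit changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPV6_MIN_MTU	1280	/* IPv6 minimum link MTU */

struct ptb_result {
	uint32_t path_mtu;	/* path MTU estimate after the message */
	bool atomic_frag;	/* add a Fragment header to later packets */
};

/* Process a Packet Too Big message reporting `reported_mtu`. */
static struct ptb_result handle_ptb(uint32_t current_pmtu, uint32_t reported_mtu)
{
	struct ptb_result r = { current_pmtu, false };

	assert(current_pmtu >= IPV6_MIN_MTU);

	if (reported_mtu < IPV6_MIN_MTU) {
		/* Never go below the minimum; mark packets as atomic fragments. */
		r.path_mtu = IPV6_MIN_MTU;
		r.atomic_frag = true;
	} else if (reported_mtu < current_pmtu) {
		r.path_mtu = reported_mtu;
	}
	return r;
}
```

Both test inputs, 56 and 1279, take the first branch; the quoted
commit removed the atomic-fragment part of that branch, which would
explain why these two cases started to fail.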

> 
>>
>>
>>
>> -Roy
>>
>>
>>
>>
>>> Regards,
>>> Yuki Machida
>>>
>>

* Re: [PATCH v5 net-next] net: ipv4: Consider failed nexthops in multipath routes
From: Julian Anastasov @ 2016-04-04  6:29 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <1459728547-36371-1-git-send-email-dsa@cumulusnetworks.com>


	Hello,

On Sun, 3 Apr 2016, David Ahern wrote:

> Multipath route lookups should consider knowledge about next hops and not
> select a hop that is known to be failed.
> 
> Example:
> 
>                      [h2]                   [h3]   15.0.0.5
>                       |                      |
>                      3|                     3|
>                     [SP1]                  [SP2]--+
>                      1  2                   1     2
>                      |  |     /-------------+     |
>                      |   \   /                    |
>                      |     X                      |
>                      |    / \                     |
>                      |   /   \---------------\    |
>                      1  2                     1   2
>          12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
>                      4                         4
>                       \                       /
>                         \                    /
>                          \                  /
>                           -------|   |-----/
>                                  1   2
>                                 [TOR3]
>                                   3|
>                                    |
>                                   [h1]  12.0.0.1
> 
> host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:
> 
>     root@h1:~# ip ro ls
>     ...
>     12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
>     15.0.0.0/16
>             nexthop via 12.0.0.2  dev swp1 weight 1
>             nexthop via 12.0.0.3  dev swp1 weight 1
>     ...
> 
> If the link between tor3 and tor1 is down and the link between tor1
> and tor2 is also down, then tor1 is effectively cut off from h1. Yet the route lookups
> in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
> ssh 15.0.0.5 gets the other. Connections that attempt to use the
> 12.0.0.2 nexthop fail since that neighbor is not reachable:
> 
>     root@h1:~# ip neigh show
>     ...
>     12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
>     12.0.0.2 dev swp1  FAILED
>     ...
> 
> The failed path can be avoided by considering known neighbor information
> when selecting next hops. If the neighbor lookup fails we have no
> knowledge about the nexthop, so give it a shot. If there is an entry
> then only select the nexthop if the state is sane. This is similar to
> what fib_detect_death does.
> 
> To maintain backward compatibility use of the neighbor information is
> based on a new sysctl, fib_multipath_use_neigh.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Reviewed-by: Julian Anastasov <ja@ssi.bg>

	With one comment: the fallback strategy is simplified,
we do not fall back to all possible reachable nexthops.
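
	To illustrate the simplified fallback, here is a minimal
userspace sketch of the selection loop. It is not the kernel code:
sim_nh, nh_neigh, sim_good_nh and sim_select are hypothetical names,
the three-value state stands in for the NUD_VALID check, and
returning index 0 when no nexthop matches the hash is an assumption
of the sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for what the neighbour table knows (hypothetical). */
enum nh_neigh { NEIGH_UNKNOWN, NEIGH_VALID, NEIGH_FAILED };

struct sim_nh {
	const char *gw;       /* gateway address, informational only */
	int upper_bound;      /* hash values <= this may map to the nexthop */
	enum nh_neigh neigh;  /* state of the neighbour entry, if any */
};

/* No entry means no knowledge about the nexthop, so it is still tried. */
static bool sim_good_nh(const struct sim_nh *nh)
{
	return nh->neigh != NEIGH_FAILED;
}

/*
 * Return the index of the selected nexthop. With use_neigh set, a
 * candidate whose neighbour is known to be failed is skipped; if every
 * candidate is skipped, the first candidate is used as the fallback.
 */
static int sim_select(const struct sim_nh *nhs, size_t n, int hash,
		      bool use_neigh)
{
	bool first = false;
	int sel = 0;	/* assumption: index 0 if nothing matches the hash */
	size_t i;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		if (hash > nhs[i].upper_bound)
			continue;

		if (!use_neigh || sim_good_nh(&nhs[i]))
			return (int)i;

		if (!first) {
			sel = (int)i;
			first = true;
		}
	}

	return sel;
}
```

	With the example above (12.0.0.2 FAILED, 12.0.0.3 REACHABLE), a
hash that maps to the first nexthop still selects it when use_neigh
is off, and moves to the second one when it is on. If both
neighbours are failed, the first candidate is returned; there is no
second pass over the other nexthops.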

> ---
> v5
> - returned comma that got lost in the ether and removed resetting of
>   nhsel at end of loop - again comments from Julian
> 
> v4
> - remove NULL initializer and logic for fallback per Julian's comment
> 
> v3
> - Julian comments: changed use of dead in documentation to failed,
>   init state to NUD_REACHABLE which simplifies fib_good_nh, use of
>   nh_dev for neighbor lookup, fallback to first entry which is what
>   current logic does
> 
> v2
> - use rcu locking to avoid refcnts per Eric's suggestion
> - only consider neighbor info for nh_scope == RT_SCOPE_LINK per Julian's
>   comment
> - drop the 'state == NUD_REACHABLE' from the state check since it is
>   part of NUD_VALID (comment from Julian)
> - wrapped the use of the neigh in a sysctl
> 
>  Documentation/networking/ip-sysctl.txt | 10 ++++++++++
>  include/net/netns/ipv4.h               |  3 +++
>  net/ipv4/fib_semantics.c               | 34 +++++++++++++++++++++++++++++-----
>  net/ipv4/sysctl_net_ipv4.c             | 11 +++++++++++
>  4 files changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index b183e2b606c8..6c7f365b1515 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
>  	fwmark of the packet they are replying to.
>  	Default: 0
>  
> +fib_multipath_use_neigh - BOOLEAN
> +	Use status of existing neighbor entry when determining nexthop for
> +	multipath routes. If disabled, neighbor information is not used and
> +	packets could be directed to a failed nexthop. Only valid for kernels
> +	built with CONFIG_IP_ROUTE_MULTIPATH enabled.
> +	Default: 0 (disabled)
> +	Possible values:
> +	0 - disabled
> +	1 - enabled
> +
>  route/max_size - INTEGER
>  	Maximum number of routes allowed in the kernel.  Increase
>  	this when using large numbers of interfaces and/or routes.
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index a69cde3ce460..d061ffeb1e71 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -133,6 +133,9 @@ struct netns_ipv4 {
>  	struct fib_rules_ops	*mr_rules_ops;
>  #endif
>  #endif
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +	int sysctl_fib_multipath_use_neigh;
> +#endif
>  	atomic_t	rt_genid;
>  };
>  #endif
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index d97268e8ff10..5016676c9186 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -1559,21 +1559,45 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
>  }
>  
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
> +static bool fib_good_nh(const struct fib_nh *nh)
> +{
> +	int state = NUD_REACHABLE;
> +
> +	if (nh->nh_scope == RT_SCOPE_LINK) {
> +		struct neighbour *n;
> +
> +		rcu_read_lock_bh();
> +
> +		n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, nh->nh_dev);
> +		if (n)
> +			state = n->nud_state;
> +
> +		rcu_read_unlock_bh();
> +	}
> +
> +	return !!(state & NUD_VALID);
> +}
>  
>  void fib_select_multipath(struct fib_result *res, int hash)
>  {
>  	struct fib_info *fi = res->fi;
> +	struct net *net = fi->fib_net;
> +	bool first = false;
>  
>  	for_nexthops(fi) {
>  		if (hash > atomic_read(&nh->nh_upper_bound))
>  			continue;
>  
> -		res->nh_sel = nhsel;
> -		return;
> +		if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
> +		    fib_good_nh(nh)) {
> +			res->nh_sel = nhsel;
> +			return;
> +		}
> +		if (!first) {
> +			res->nh_sel = nhsel;
> +			first = true;
> +		}
>  	} endfor_nexthops(fi);
> -
> -	/* Race condition: route has just become dead. */
> -	res->nh_sel = 0;
>  }
>  #endif
>  
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 1e1fe6086dd9..bb0419582b8d 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -960,6 +960,17 @@ static struct ctl_table ipv4_net_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +	{
> +		.procname	= "fib_multipath_use_neigh",
> +		.data		= &init_net.ipv4.sysctl_fib_multipath_use_neigh,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
> +#endif
>  	{ }
>  };
>  
> -- 
> 1.9.1

Regards

* Re: [v7, 3/5] dt: move guts devicetree doc out of powerpc directory
From: Rob Herring @ 2016-04-04  5:15 UTC (permalink / raw)
  To: Yangbo Lu
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	ulf.hansson-QSEj5FYQhm4dnm+yROfE0A, Zhao Qiang, Russell King,
	Claudiu Manoil, Bhupesh Sharma, netdev-u79uwXL29TY76Z2rM5mHXA,
	Santosh Shilimkar, linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, scott.wood-3arQi8VN3Tc,
	xiaobo.xie-3arQi8VN3Tc,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA, Jochen Friedrich, Kumar Gala,
	leoyang.li-3arQi8VN3Tc, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-clk-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <1459480051-3701-4-git-send-email-yangbo.lu-3arQi8VN3Tc@public.gmane.org>

On Fri, Apr 01, 2016 at 11:07:29AM +0800, Yangbo Lu wrote:
> Move guts devicetree doc to Documentation/devicetree/bindings/soc/fsl/
> since it's used by not only PowerPC but also ARM. And add a specification
> for 'little-endian' property.
> 
> Signed-off-by: Yangbo Lu <yangbo.lu-3arQi8VN3Tc@public.gmane.org>
> ---
> Changes for v2:
> 	- None
> Changes for v3:
> 	- None
> Changes for v4:
> 	- Added this patch
> Changes for v5:
> 	- Modified the description for little-endian property
> Changes for v6:
> 	- None
> Changes for v7:
> 	- None
> ---
>  Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt | 3 +++
>  1 file changed, 3 insertions(+)
>  rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%)

Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

* [PATCH net-next] cxgb4/cxgb4vf:  Deprecate module parameter dflt_msg_enable
From: Hariprasad Shenai @ 2016-04-04  4:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, leedom, nirranjan, Hariprasad Shenai

The message level can be set through ethtool, so deprecate the module
parameter that is used to set it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c     | 3 ++-
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index d1e3f0997d6b..acefa35b7250 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -168,7 +168,8 @@ MODULE_PARM_DESC(force_init, "Forcibly become Master PF and initialize adapter,"
 static int dflt_msg_enable = DFLT_MSG_ENABLE;
 
 module_param(dflt_msg_enable, int, 0644);
-MODULE_PARM_DESC(dflt_msg_enable, "Chelsio T4 default message enable bitmap");
+MODULE_PARM_DESC(dflt_msg_enable, "Chelsio T4 default message enable bitmap, "
+		 "deprecated parameter");
 
 /*
  * The driver uses the best interrupt scheme available on a platform in the
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index 1cc8a7a69457..730fec73d5a6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -74,7 +74,8 @@ static int dflt_msg_enable = DFLT_MSG_ENABLE;
 
 module_param(dflt_msg_enable, int, 0644);
 MODULE_PARM_DESC(dflt_msg_enable,
-		 "default adapter ethtool message level bitmap");
+		 "default adapter ethtool message level bitmap, "
+		 "deprecated parameter");
 
 /*
  * The driver uses the best interrupt scheme available on a platform in the
-- 
2.3.4

* [PATCH net] cxgb4: Add pci device id for chelsio t520-cr adapter
From: Hariprasad Shenai @ 2016-04-04  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, leedom, nirranjan, Hariprasad Shenai

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index 06bc2d2e7a73..a2cdfc1261dc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -166,6 +166,7 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
 	CH_PCI_ID_TABLE_FENTRY(0x5099),	/* Custom 2x40G QSFP */
 	CH_PCI_ID_TABLE_FENTRY(0x509a),	/* Custom T520-CR */
 	CH_PCI_ID_TABLE_FENTRY(0x509b),	/* Custom T540-CR LOM */
+	CH_PCI_ID_TABLE_FENTRY(0x509c),	/* Custom T520-CR */
 
 	/* T6 adapters:
 	 */
-- 
2.3.4

* Re: [PATCH net-next v2 0/6] net: dsa: mv88e6131: HW bridging support for 6185
From: Andrew Lunn @ 2016-04-04  2:13 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, linux-kernel, kernel, David S. Miller
In-Reply-To: <1459457626-30082-1-git-send-email-vivien.didelot@savoirfairelinux.com>

On Thu, Mar 31, 2016 at 04:53:40PM -0400, Vivien Didelot wrote:
> All packets passing through a switch of the 6185 family are currently
> directed to the CPU port. This means that port bridging is software driven.
> 
> To enable hardware bridging for this switch family, we need to implement the
> port mapping operations, the FDB operations, and optionally the VLAN operations
> (for 802.1Q and VLAN filtering aware systems).
> 
> However, this family only has 256 FDBs indexed by 8-bit identifiers, as opposed
> to 4096 FDBs with 12-bit identifiers for other families such as 6352. It also
> doesn't have dedicated FID registers for ATU and VTU operations.
> 
> This patchset fixes these differences, and enables hardware bridging for 6185.

Hi Vivien

I added a test for in-chip 6185 bridging, and it worked as expected.

Tested-by: Andrew Lunn <andrew@lunn.ch>

	   Andrew

* [PATCH net-next] irda: sh_irda: remove driver
From: Simon Horman @ 2016-04-04  1:44 UTC (permalink / raw)
  To: Samuel Ortiz
  Cc: David Miller, Magnus Damm, netdev, linux-renesas-soc,
	Simon Horman

Remove the sh-irda driver as it appears to be unused since
c0bb9b302769 ("ARCH: ARM: shmobile: Remove ag5evm board support").

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
---
 drivers/net/irda/Kconfig   |   7 -
 drivers/net/irda/Makefile  |   1 -
 drivers/net/irda/sh_irda.c | 875 ---------------------------------------------
 3 files changed, 883 deletions(-)
 delete mode 100644 drivers/net/irda/sh_irda.c

diff --git a/drivers/net/irda/Kconfig b/drivers/net/irda/Kconfig
index a2c227bfb687..e070e1222733 100644
--- a/drivers/net/irda/Kconfig
+++ b/drivers/net/irda/Kconfig
@@ -394,12 +394,5 @@ config MCS_FIR
 	  To compile it as a module, choose M here: the module will be called
 	  mcs7780.
 
-config SH_IRDA
-	tristate "SuperH IrDA driver"
-	depends on IRDA
-	depends on (ARCH_SHMOBILE || COMPILE_TEST) && HAS_IOMEM
-	help
-	  Say Y here if your want to enable SuperH IrDA devices.
-
 endmenu
 
diff --git a/drivers/net/irda/Makefile b/drivers/net/irda/Makefile
index be8ab5b9a4a2..4c344433dae5 100644
--- a/drivers/net/irda/Makefile
+++ b/drivers/net/irda/Makefile
@@ -19,7 +19,6 @@ obj-$(CONFIG_VIA_FIR)		+= via-ircc.o
 obj-$(CONFIG_PXA_FICP)	        += pxaficp_ir.o
 obj-$(CONFIG_MCS_FIR)	        += mcs7780.o
 obj-$(CONFIG_AU1000_FIR)	+= au1k_ir.o
-obj-$(CONFIG_SH_IRDA)		+= sh_irda.o
 # SIR drivers
 obj-$(CONFIG_IRTTY_SIR)		+= irtty-sir.o	sir-dev.o
 obj-$(CONFIG_BFIN_SIR)		+= bfin_sir.o
diff --git a/drivers/net/irda/sh_irda.c b/drivers/net/irda/sh_irda.c
deleted file mode 100644
index c96b46b2c3a8..000000000000
--- a/drivers/net/irda/sh_irda.c
+++ /dev/null
@@ -1,875 +0,0 @@
-/*
- * SuperH IrDA Driver
- *
- * Copyright (C) 2010 Renesas Solutions Corp.
- * Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
- *
- * Based on sh_sir.c
- * Copyright (C) 2009 Renesas Solutions Corp.
- * Copyright 2006-2009 Analog Devices Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-/*
- * CAUTION
- *
- * This driver is very simple.
- * So, it doesn't have below support now
- *  - MIR/FIR support
- *  - DMA transfer support
- *  - FIFO mode support
- */
-#include <linux/io.h>
-#include <linux/interrupt.h>
-#include <linux/module.h>
-#include <linux/platform_device.h>
-#include <linux/pm_runtime.h>
-#include <linux/clk.h>
-#include <net/irda/wrapper.h>
-#include <net/irda/irda_device.h>
-
-#define DRIVER_NAME "sh_irda"
-
-#define __IRDARAM_LEN	0x1039
-
-#define IRTMR		0x1F00 /* Transfer mode */
-#define IRCFR		0x1F02 /* Configuration */
-#define IRCTR		0x1F04 /* IR control */
-#define IRTFLR		0x1F20 /* Transmit frame length */
-#define IRTCTR		0x1F22 /* Transmit control */
-#define IRRFLR		0x1F40 /* Receive frame length */
-#define IRRCTR		0x1F42 /* Receive control */
-#define SIRISR		0x1F60 /* SIR-UART mode interrupt source */
-#define SIRIMR		0x1F62 /* SIR-UART mode interrupt mask */
-#define SIRICR		0x1F64 /* SIR-UART mode interrupt clear */
-#define SIRBCR		0x1F68 /* SIR-UART mode baud rate count */
-#define MFIRISR		0x1F70 /* MIR/FIR mode interrupt source */
-#define MFIRIMR		0x1F72 /* MIR/FIR mode interrupt mask */
-#define MFIRICR		0x1F74 /* MIR/FIR mode interrupt clear */
-#define CRCCTR		0x1F80 /* CRC engine control */
-#define CRCIR		0x1F86 /* CRC engine input data */
-#define CRCCR		0x1F8A /* CRC engine calculation */
-#define CRCOR		0x1F8E /* CRC engine output data */
-#define FIFOCP		0x1FC0 /* FIFO current pointer */
-#define FIFOFP		0x1FC2 /* FIFO follow pointer */
-#define FIFORSMSK	0x1FC4 /* FIFO receive status mask */
-#define FIFORSOR	0x1FC6 /* FIFO receive status OR */
-#define FIFOSEL		0x1FC8 /* FIFO select */
-#define FIFORS		0x1FCA /* FIFO receive status */
-#define FIFORFL		0x1FCC /* FIFO receive frame length */
-#define FIFORAMCP	0x1FCE /* FIFO RAM current pointer */
-#define FIFORAMFP	0x1FD0 /* FIFO RAM follow pointer */
-#define BIFCTL		0x1FD2 /* BUS interface control */
-#define IRDARAM		0x0000 /* IrDA buffer RAM */
-#define IRDARAM_LEN	__IRDARAM_LEN /* - 8/16/32 (read-only for 32) */
-
-/* IRTMR */
-#define TMD_MASK	(0x3 << 14) /* Transfer Mode */
-#define TMD_SIR		(0x0 << 14)
-#define TMD_MIR		(0x3 << 14)
-#define TMD_FIR		(0x2 << 14)
-
-#define FIFORIM		(1 << 8) /* FIFO receive interrupt mask */
-#define MIM		(1 << 4) /* MIR/FIR Interrupt Mask */
-#define SIM		(1 << 0) /* SIR Interrupt Mask */
-#define xIM_MASK	(FIFORIM | MIM | SIM)
-
-/* IRCFR */
-#define RTO_SHIFT	8 /* shift for Receive Timeout */
-#define RTO		(0x3 << RTO_SHIFT)
-
-/* IRTCTR */
-#define ARMOD		(1 << 15) /* Auto-Receive Mode */
-#define TE		(1 <<  0) /* Transmit Enable */
-
-/* IRRFLR */
-#define RFL_MASK	(0x1FFF) /* mask for Receive Frame Length */
-
-/* IRRCTR */
-#define RE		(1 <<  0) /* Receive Enable */
-
-/*
- * SIRISR,  SIRIMR,  SIRICR,
- * MFIRISR, MFIRIMR, MFIRICR
- */
-#define FRE		(1 << 15) /* Frame Receive End */
-#define TROV		(1 << 11) /* Transfer Area Overflow */
-#define xIR_9		(1 << 9)
-#define TOT		xIR_9     /* for SIR     Timeout */
-#define ABTD		xIR_9     /* for MIR/FIR Abort Detection */
-#define xIR_8		(1 << 8)
-#define FER		xIR_8     /* for SIR     Framing Error */
-#define CRCER		xIR_8     /* for MIR/FIR CRC error */
-#define FTE		(1 << 7)  /* Frame Transmit End */
-#define xIR_MASK	(FRE | TROV | xIR_9 | xIR_8 | FTE)
-
-/* SIRBCR */
-#define BRC_MASK	(0x3F) /* mask for Baud Rate Count */
-
-/* CRCCTR */
-#define CRC_RST		(1 << 15) /* CRC Engine Reset */
-#define CRC_CT_MASK	0x0FFF    /* mask for CRC Engine Input Data Count */
-
-/* CRCIR */
-#define CRC_IN_MASK	0x0FFF    /* mask for CRC Engine Input Data */
-
-/************************************************************************
-
-
-			enum / structure
-
-
-************************************************************************/
-enum sh_irda_mode {
-	SH_IRDA_NONE = 0,
-	SH_IRDA_SIR,
-	SH_IRDA_MIR,
-	SH_IRDA_FIR,
-};
-
-struct sh_irda_self;
-struct sh_irda_xir_func {
-	int (*xir_fre)	(struct sh_irda_self *self);
-	int (*xir_trov)	(struct sh_irda_self *self);
-	int (*xir_9)	(struct sh_irda_self *self);
-	int (*xir_8)	(struct sh_irda_self *self);
-	int (*xir_fte)	(struct sh_irda_self *self);
-};
-
-struct sh_irda_self {
-	void __iomem		*membase;
-	unsigned int		irq;
-	struct platform_device	*pdev;
-
-	struct net_device	*ndev;
-
-	struct irlap_cb		*irlap;
-	struct qos_info		qos;
-
-	iobuff_t		tx_buff;
-	iobuff_t		rx_buff;
-
-	enum sh_irda_mode	mode;
-	spinlock_t		lock;
-
-	struct sh_irda_xir_func	*xir_func;
-};
-
-/************************************************************************
-
-
-			common function
-
-
-************************************************************************/
-static void sh_irda_write(struct sh_irda_self *self, u32 offset, u16 data)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&self->lock, flags);
-	iowrite16(data, self->membase + offset);
-	spin_unlock_irqrestore(&self->lock, flags);
-}
-
-static u16 sh_irda_read(struct sh_irda_self *self, u32 offset)
-{
-	unsigned long flags;
-	u16 ret;
-
-	spin_lock_irqsave(&self->lock, flags);
-	ret = ioread16(self->membase + offset);
-	spin_unlock_irqrestore(&self->lock, flags);
-
-	return ret;
-}
-
-static void sh_irda_update_bits(struct sh_irda_self *self, u32 offset,
-			       u16 mask, u16 data)
-{
-	unsigned long flags;
-	u16 old, new;
-
-	spin_lock_irqsave(&self->lock, flags);
-	old = ioread16(self->membase + offset);
-	new = (old & ~mask) | data;
-	if (old != new)
-		iowrite16(data, self->membase + offset);
-	spin_unlock_irqrestore(&self->lock, flags);
-}
-
-/************************************************************************
-
-
-			mode function
-
-
-************************************************************************/
-/*=====================================
- *
- *		common
- *
- *=====================================*/
-static void sh_irda_rcv_ctrl(struct sh_irda_self *self, int enable)
-{
-	struct device *dev = &self->ndev->dev;
-
-	sh_irda_update_bits(self, IRRCTR, RE, enable ? RE : 0);
-	dev_dbg(dev, "recv %s\n", enable ? "enable" : "disable");
-}
-
-static int sh_irda_set_timeout(struct sh_irda_self *self, int interval)
-{
-	struct device *dev = &self->ndev->dev;
-
-	if (SH_IRDA_SIR != self->mode)
-		interval = 0;
-
-	if (interval < 0 || interval > 2) {
-		dev_err(dev, "unsupported timeout interval\n");
-		return -EINVAL;
-	}
-
-	sh_irda_update_bits(self, IRCFR, RTO, interval << RTO_SHIFT);
-	return 0;
-}
-
-static int sh_irda_set_baudrate(struct sh_irda_self *self, int baudrate)
-{
-	struct device *dev = &self->ndev->dev;
-	u16 val;
-
-	if (baudrate < 0)
-		return 0;
-
-	if (SH_IRDA_SIR != self->mode) {
-		dev_err(dev, "it is not SIR mode\n");
-		return -EINVAL;
-	}
-
-	/*
-	 * Baud rate (bits/s) =
-	 *   (48 MHz / 26) / (baud rate counter value + 1) x 16
-	 */
-	val = (48000000 / 26 / 16 / baudrate) - 1;
-	dev_dbg(dev, "baudrate = %d,  val = 0x%02x\n", baudrate, val);
-
-	sh_irda_update_bits(self, SIRBCR, BRC_MASK, val);
-
-	return 0;
-}
-
-static int sh_irda_get_rcv_length(struct sh_irda_self *self)
-{
-	return RFL_MASK & sh_irda_read(self, IRRFLR);
-}
-
-/*=====================================
- *
- *		NONE MODE
- *
- *=====================================*/
-static int sh_irda_xir_fre(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-	dev_err(dev, "none mode: frame recv\n");
-	return 0;
-}
-
-static int sh_irda_xir_trov(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-	dev_err(dev, "none mode: buffer ram over\n");
-	return 0;
-}
-
-static int sh_irda_xir_9(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-	dev_err(dev, "none mode: time over\n");
-	return 0;
-}
-
-static int sh_irda_xir_8(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-	dev_err(dev, "none mode: framing error\n");
-	return 0;
-}
-
-static int sh_irda_xir_fte(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-	dev_err(dev, "none mode: frame transmit end\n");
-	return 0;
-}
-
-static struct sh_irda_xir_func sh_irda_xir_func = {
-	.xir_fre	= sh_irda_xir_fre,
-	.xir_trov	= sh_irda_xir_trov,
-	.xir_9		= sh_irda_xir_9,
-	.xir_8		= sh_irda_xir_8,
-	.xir_fte	= sh_irda_xir_fte,
-};
-
-/*=====================================
- *
- *		MIR/FIR MODE
- *
- * MIR/FIR are not supported now
- *=====================================*/
-static struct sh_irda_xir_func sh_irda_mfir_func = {
-	.xir_fre	= sh_irda_xir_fre,
-	.xir_trov	= sh_irda_xir_trov,
-	.xir_9		= sh_irda_xir_9,
-	.xir_8		= sh_irda_xir_8,
-	.xir_fte	= sh_irda_xir_fte,
-};
-
-/*=====================================
- *
- *		SIR MODE
- *
- *=====================================*/
-static int sh_irda_sir_fre(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-	u16 data16;
-	u8  *data = (u8 *)&data16;
-	int len = sh_irda_get_rcv_length(self);
-	int i, j;
-
-	if (len > IRDARAM_LEN)
-		len = IRDARAM_LEN;
-
-	dev_dbg(dev, "frame recv length = %d\n", len);
-
-	for (i = 0; i < len; i++) {
-		j = i % 2;
-		if (!j)
-			data16 = sh_irda_read(self, IRDARAM + i);
-
-		async_unwrap_char(self->ndev, &self->ndev->stats,
-				  &self->rx_buff, data[j]);
-	}
-	self->ndev->last_rx = jiffies;
-
-	sh_irda_rcv_ctrl(self, 1);
-
-	return 0;
-}
-
-static int sh_irda_sir_trov(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-
-	dev_err(dev, "buffer ram over\n");
-	sh_irda_rcv_ctrl(self, 1);
-	return 0;
-}
-
-static int sh_irda_sir_tot(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-
-	dev_err(dev, "time over\n");
-	sh_irda_set_baudrate(self, 9600);
-	sh_irda_rcv_ctrl(self, 1);
-	return 0;
-}
-
-static int sh_irda_sir_fer(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-
-	dev_err(dev, "framing error\n");
-	sh_irda_rcv_ctrl(self, 1);
-	return 0;
-}
-
-static int sh_irda_sir_fte(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-
-	dev_dbg(dev, "frame transmit end\n");
-	netif_wake_queue(self->ndev);
-
-	return 0;
-}
-
-static struct sh_irda_xir_func sh_irda_sir_func = {
-	.xir_fre	= sh_irda_sir_fre,
-	.xir_trov	= sh_irda_sir_trov,
-	.xir_9		= sh_irda_sir_tot,
-	.xir_8		= sh_irda_sir_fer,
-	.xir_fte	= sh_irda_sir_fte,
-};
-
-static void sh_irda_set_mode(struct sh_irda_self *self, enum sh_irda_mode mode)
-{
-	struct device *dev = &self->ndev->dev;
-	struct sh_irda_xir_func	*func;
-	const char *name;
-	u16 data;
-
-	switch (mode) {
-	case SH_IRDA_SIR:
-		name	= "SIR";
-		data	= TMD_SIR;
-		func	= &sh_irda_sir_func;
-		break;
-	case SH_IRDA_MIR:
-		name	= "MIR";
-		data	= TMD_MIR;
-		func	= &sh_irda_mfir_func;
-		break;
-	case SH_IRDA_FIR:
-		name	= "FIR";
-		data	= TMD_FIR;
-		func	= &sh_irda_mfir_func;
-		break;
-	default:
-		name	= "NONE";
-		data	= 0;
-		func	= &sh_irda_xir_func;
-		break;
-	}
-
-	self->mode = mode;
-	self->xir_func = func;
-	sh_irda_update_bits(self, IRTMR, TMD_MASK, data);
-
-	dev_dbg(dev, "switch to %s mode", name);
-}
-
-/************************************************************************
-
-
-			irq function
-
-
-************************************************************************/
-static void sh_irda_set_irq_mask(struct sh_irda_self *self)
-{
-	u16 tmr_hole;
-	u16 xir_reg;
-
-	/* set all mask */
-	sh_irda_update_bits(self, IRTMR,   xIM_MASK, xIM_MASK);
-	sh_irda_update_bits(self, SIRIMR,  xIR_MASK, xIR_MASK);
-	sh_irda_update_bits(self, MFIRIMR, xIR_MASK, xIR_MASK);
-
-	/* clear irq */
-	sh_irda_update_bits(self, SIRICR,  xIR_MASK, xIR_MASK);
-	sh_irda_update_bits(self, MFIRICR, xIR_MASK, xIR_MASK);
-
-	switch (self->mode) {
-	case SH_IRDA_SIR:
-		tmr_hole	= SIM;
-		xir_reg		= SIRIMR;
-		break;
-	case SH_IRDA_MIR:
-	case SH_IRDA_FIR:
-		tmr_hole	= MIM;
-		xir_reg		= MFIRIMR;
-		break;
-	default:
-		tmr_hole	= 0;
-		xir_reg		= 0;
-		break;
-	}
-
-	/* open mask */
-	if (xir_reg) {
-		sh_irda_update_bits(self, IRTMR, tmr_hole, 0);
-		sh_irda_update_bits(self, xir_reg, xIR_MASK, 0);
-	}
-}
-
-static irqreturn_t sh_irda_irq(int irq, void *dev_id)
-{
-	struct sh_irda_self *self = dev_id;
-	struct sh_irda_xir_func	*func = self->xir_func;
-	u16 isr = sh_irda_read(self, SIRISR);
-
-	/* clear irq */
-	sh_irda_write(self, SIRICR, isr);
-
-	if (isr & FRE)
-		func->xir_fre(self);
-	if (isr & TROV)
-		func->xir_trov(self);
-	if (isr & xIR_9)
-		func->xir_9(self);
-	if (isr & xIR_8)
-		func->xir_8(self);
-	if (isr & FTE)
-		func->xir_fte(self);
-
-	return IRQ_HANDLED;
-}
-
-/************************************************************************
-
-
-			CRC function
-
-
-************************************************************************/
-static void sh_irda_crc_reset(struct sh_irda_self *self)
-{
-	sh_irda_write(self, CRCCTR, CRC_RST);
-}
-
-static void sh_irda_crc_add(struct sh_irda_self *self, u16 data)
-{
-	sh_irda_write(self, CRCIR, data & CRC_IN_MASK);
-}
-
-static u16 sh_irda_crc_cnt(struct sh_irda_self *self)
-{
-	return CRC_CT_MASK & sh_irda_read(self, CRCCTR);
-}
-
-static u16 sh_irda_crc_out(struct sh_irda_self *self)
-{
-	return sh_irda_read(self, CRCOR);
-}
-
-static int sh_irda_crc_init(struct sh_irda_self *self)
-{
-	struct device *dev = &self->ndev->dev;
-	int ret = -EIO;
-	u16 val;
-
-	sh_irda_crc_reset(self);
-
-	sh_irda_crc_add(self, 0xCC);
-	sh_irda_crc_add(self, 0xF5);
-	sh_irda_crc_add(self, 0xF1);
-	sh_irda_crc_add(self, 0xA7);
-
-	val = sh_irda_crc_cnt(self);
-	if (4 != val) {
-		dev_err(dev, "CRC count error %x\n", val);
-		goto crc_init_out;
-	}
-
-	val = sh_irda_crc_out(self);
-	if (0x51DF != val) {
-		dev_err(dev, "CRC result error%x\n", val);
-		goto crc_init_out;
-	}
-
-	ret = 0;
-
-crc_init_out:
-
-	sh_irda_crc_reset(self);
-	return ret;
-}
-
-/************************************************************************
-
-
-			iobuf function
-
-
-************************************************************************/
-static void sh_irda_remove_iobuf(struct sh_irda_self *self)
-{
-	kfree(self->rx_buff.head);
-
-	self->tx_buff.head = NULL;
-	self->tx_buff.data = NULL;
-	self->rx_buff.head = NULL;
-	self->rx_buff.data = NULL;
-}
-
-static int sh_irda_init_iobuf(struct sh_irda_self *self, int rxsize, int txsize)
-{
-	if (self->rx_buff.head ||
-	    self->tx_buff.head) {
-		dev_err(&self->ndev->dev, "iobuff has already existed.");
-		return -EINVAL;
-	}
-
-	/* rx_buff */
-	self->rx_buff.head = kmalloc(rxsize, GFP_KERNEL);
-	if (!self->rx_buff.head)
-		return -ENOMEM;
-
-	self->rx_buff.truesize	= rxsize;
-	self->rx_buff.in_frame	= FALSE;
-	self->rx_buff.state	= OUTSIDE_FRAME;
-	self->rx_buff.data	= self->rx_buff.head;
-
-	/* tx_buff */
-	self->tx_buff.head	= self->membase + IRDARAM;
-	self->tx_buff.truesize	= IRDARAM_LEN;
-
-	return 0;
-}
-
-/************************************************************************
-
-
-			net_device_ops function
-
-
-************************************************************************/
-static int sh_irda_hard_xmit(struct sk_buff *skb, struct net_device *ndev)
-{
-	struct sh_irda_self *self = netdev_priv(ndev);
-	struct device *dev = &self->ndev->dev;
-	int speed = irda_get_next_speed(skb);
-	int ret;
-
-	dev_dbg(dev, "hard xmit\n");
-
-	netif_stop_queue(ndev);
-	sh_irda_rcv_ctrl(self, 0);
-
-	ret = sh_irda_set_baudrate(self, speed);
-	if (ret < 0)
-		goto sh_irda_hard_xmit_end;
-
-	self->tx_buff.len = 0;
-	if (skb->len) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&self->lock, flags);
-		self->tx_buff.len = async_wrap_skb(skb,
-						   self->tx_buff.head,
-						   self->tx_buff.truesize);
-		spin_unlock_irqrestore(&self->lock, flags);
-
-		if (self->tx_buff.len > self->tx_buff.truesize)
-			self->tx_buff.len = self->tx_buff.truesize;
-
-		sh_irda_write(self, IRTFLR, self->tx_buff.len);
-		sh_irda_write(self, IRTCTR, ARMOD | TE);
-	} else
-		goto sh_irda_hard_xmit_end;
-
-	dev_kfree_skb(skb);
-
-	return 0;
-
-sh_irda_hard_xmit_end:
-	sh_irda_set_baudrate(self, 9600);
-	netif_wake_queue(self->ndev);
-	sh_irda_rcv_ctrl(self, 1);
-	dev_kfree_skb(skb);
-
-	return ret;
-
-}
-
-static int sh_irda_ioctl(struct net_device *ndev, struct ifreq *ifreq, int cmd)
-{
-	/*
-	 * FIXME
-	 *
-	 * This function is needed for irda framework.
-	 * But nothing to do now
-	 */
-	return 0;
-}
-
-static struct net_device_stats *sh_irda_stats(struct net_device *ndev)
-{
-	struct sh_irda_self *self = netdev_priv(ndev);
-
-	return &self->ndev->stats;
-}
-
-static int sh_irda_open(struct net_device *ndev)
-{
-	struct sh_irda_self *self = netdev_priv(ndev);
-	int err;
-
-	pm_runtime_get_sync(&self->pdev->dev);
-	err = sh_irda_crc_init(self);
-	if (err)
-		goto open_err;
-
-	sh_irda_set_mode(self, SH_IRDA_SIR);
-	sh_irda_set_timeout(self, 2);
-	sh_irda_set_baudrate(self, 9600);
-
-	self->irlap = irlap_open(ndev, &self->qos, DRIVER_NAME);
-	if (!self->irlap) {
-		err = -ENODEV;
-		goto open_err;
-	}
-
-	netif_start_queue(ndev);
-	sh_irda_rcv_ctrl(self, 1);
-	sh_irda_set_irq_mask(self);
-
-	dev_info(&ndev->dev, "opened\n");
-
-	return 0;
-
-open_err:
-	pm_runtime_put_sync(&self->pdev->dev);
-
-	return err;
-}
-
-static int sh_irda_stop(struct net_device *ndev)
-{
-	struct sh_irda_self *self = netdev_priv(ndev);
-
-	/* Stop IrLAP */
-	if (self->irlap) {
-		irlap_close(self->irlap);
-		self->irlap = NULL;
-	}
-
-	netif_stop_queue(ndev);
-	pm_runtime_put_sync(&self->pdev->dev);
-
-	dev_info(&ndev->dev, "stopped\n");
-
-	return 0;
-}
-
-static const struct net_device_ops sh_irda_ndo = {
-	.ndo_open		= sh_irda_open,
-	.ndo_stop		= sh_irda_stop,
-	.ndo_start_xmit		= sh_irda_hard_xmit,
-	.ndo_do_ioctl		= sh_irda_ioctl,
-	.ndo_get_stats		= sh_irda_stats,
-};
-
-/************************************************************************
-
-
-			platform_driver function
-
-
-************************************************************************/
-static int sh_irda_probe(struct platform_device *pdev)
-{
-	struct net_device *ndev;
-	struct sh_irda_self *self;
-	struct resource *res;
-	int irq;
-	int err = -ENOMEM;
-
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	irq = platform_get_irq(pdev, 0);
-	if (!res || irq < 0) {
-		dev_err(&pdev->dev, "Not enough platform resources.\n");
-		goto exit;
-	}
-
-	ndev = alloc_irdadev(sizeof(*self));
-	if (!ndev)
-		goto exit;
-
-	self = netdev_priv(ndev);
-	self->membase = ioremap_nocache(res->start, resource_size(res));
-	if (!self->membase) {
-		err = -ENXIO;
-		dev_err(&pdev->dev, "Unable to ioremap.\n");
-		goto err_mem_1;
-	}
-
-	err = sh_irda_init_iobuf(self, IRDA_SKB_MAX_MTU, IRDA_SIR_MAX_FRAME);
-	if (err)
-		goto err_mem_2;
-
-	self->pdev = pdev;
-	pm_runtime_enable(&pdev->dev);
-
-	irda_init_max_qos_capabilies(&self->qos);
-
-	ndev->netdev_ops	= &sh_irda_ndo;
-	ndev->irq		= irq;
-
-	self->ndev			= ndev;
-	self->qos.baud_rate.bits	&= IR_9600; /* FIXME */
-	self->qos.min_turn_time.bits	= 1; /* 10 ms or more */
-	spin_lock_init(&self->lock);
-
-	irda_qos_bits_to_value(&self->qos);
-
-	err = register_netdev(ndev);
-	if (err)
-		goto err_mem_4;
-
-	platform_set_drvdata(pdev, ndev);
-	err = devm_request_irq(&pdev->dev, irq, sh_irda_irq, 0, "sh_irda", self);
-	if (err) {
-		dev_warn(&pdev->dev, "Unable to attach sh_irda interrupt\n");
-		goto err_mem_4;
-	}
-
-	dev_info(&pdev->dev, "SuperH IrDA probed\n");
-
-	goto exit;
-
-err_mem_4:
-	pm_runtime_disable(&pdev->dev);
-	sh_irda_remove_iobuf(self);
-err_mem_2:
-	iounmap(self->membase);
-err_mem_1:
-	free_netdev(ndev);
-exit:
-	return err;
-}
-
-static int sh_irda_remove(struct platform_device *pdev)
-{
-	struct net_device *ndev = platform_get_drvdata(pdev);
-	struct sh_irda_self *self = netdev_priv(ndev);
-
-	if (!self)
-		return 0;
-
-	unregister_netdev(ndev);
-	pm_runtime_disable(&pdev->dev);
-	sh_irda_remove_iobuf(self);
-	iounmap(self->membase);
-	free_netdev(ndev);
-
-	return 0;
-}
-
-static int sh_irda_runtime_nop(struct device *dev)
-{
-	/* Runtime PM callback shared between ->runtime_suspend()
-	 * and ->runtime_resume(). Simply returns success.
-	 *
-	 * This driver re-initializes all registers after
-	 * pm_runtime_get_sync() anyway so there is no need
-	 * to save and restore registers here.
-	 */
-	return 0;
-}
-
-static const struct dev_pm_ops sh_irda_pm_ops = {
-	.runtime_suspend	= sh_irda_runtime_nop,
-	.runtime_resume		= sh_irda_runtime_nop,
-};
-
-static struct platform_driver sh_irda_driver = {
-	.probe	= sh_irda_probe,
-	.remove	= sh_irda_remove,
-	.driver	= {
-		.name	= DRIVER_NAME,
-		.pm	= &sh_irda_pm_ops,
-	},
-};
-
-module_platform_driver(sh_irda_driver);
-
-MODULE_AUTHOR("Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>");
-MODULE_DESCRIPTION("SuperH IrDA driver");
-MODULE_LICENSE("GPL");
-- 
2.1.4


* [PATCH v5 net-next] net: ipv4: Consider failed nexthops in multipath routes
From: David Ahern @ 2016-04-04  0:09 UTC (permalink / raw)
  To: netdev; +Cc: ja, David Ahern

Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

                     [h2]                   [h3]   15.0.0.5
                      |                      |
                     3|                     3|
                    [SP1]                  [SP2]--+
                     1  2                   1     2
                     |  |     /-------------+     |
                     |   \   /                    |
                     |     X                      |
                     |    / \                     |
                     |   /   \---------------\    |
                     1  2                     1   2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                  [h1]  12.0.0.1

Host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2  dev swp1 weight 1
            nexthop via 12.0.0.3  dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1
and tor2 is also down, then tor1 is effectively cut off from h1. Yet the
route lookups in h1 alternate between the 2 routes: ping 15.0.0.5 gets
one and ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1  FAILED
    ...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility, use of the neighbor information is
controlled by a new sysctl, fib_multipath_use_neigh, which is disabled
by default.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5
- returned comma that got lost in the ether and removed resetting of
  nh_sel after the loop - again comments from Julian

v4
- remove NULL initializer and logic for fallback per Julian's comment

v3
- Julian comments: changed use of dead in documentation to failed,
  init state to NUD_REACHABLE which simplifies fib_good_nh, use of
  nh_dev for neighbor lookup, fallback to first entry which is what
  current logic does

v2
- use rcu locking to avoid refcnts per Eric's suggestion
- only consider neighbor info for nh_scope == RT_SCOPE_LINK per Julian's
  comment
- drop the 'state == NUD_REACHABLE' from the state check since it is
  part of NUD_VALID (comment from Julian)
- wrapped the use of the neigh in a sysctl
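
The selection rule described in the commit message can be tried outside
the kernel. What follows is a minimal userspace sketch, not the patch
itself: struct nexthop, enum nh_state, nh_is_good() and select_nexthop()
are hypothetical names, the three-value state is a simplification of the
kernel's NUD_* neighbor states, and returning 0 when no nexthop covers
the hash is an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum nh_state { NH_UNKNOWN, NH_REACHABLE, NH_FAILED };

struct nexthop {
	int upper_bound;	/* hashes <= upper_bound may use this hop */
	enum nh_state state;	/* what the neighbor table knows, if anything */
};

/* No neighbor entry means no knowledge, so the hop is still tried. */
static bool nh_is_good(const struct nexthop *nh)
{
	return nh->state != NH_FAILED;
}

/*
 * Return the index of the chosen nexthop. If every eligible hop is
 * known to be failed, fall back to the first eligible one. If no hop
 * covers the hash, return 0 (an assumption of this model).
 */
static int select_nexthop(const struct nexthop *nhs, size_t n,
			  int hash, bool use_neigh)
{
	bool first = false;
	int sel = 0;

	for (size_t i = 0; i < n; i++) {
		if (hash > nhs[i].upper_bound)
			continue;

		if (!use_neigh || nh_is_good(&nhs[i]))
			return (int)i;

		if (!first) {
			sel = (int)i;
			first = true;
		}
	}
	return sel;
}
```

With use_neigh set to false the model picks the first nexthop whose
upper bound covers the hash, as before the patch. With it set to true,
a nexthop known to be failed is passed over when a later one qualifies.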

 Documentation/networking/ip-sysctl.txt | 10 ++++++++++
 include/net/netns/ipv4.h               |  3 +++
 net/ipv4/fib_semantics.c               | 34 +++++++++++++++++++++++++++++-----
 net/ipv4/sysctl_net_ipv4.c             | 11 +++++++++++
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b183e2b606c8..6c7f365b1515 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
 	fwmark of the packet they are replying to.
 	Default: 0
 
+fib_multipath_use_neigh - BOOLEAN
+	Use status of existing neighbor entry when determining nexthop for
+	multipath routes. If disabled, neighbor information is not used and
+	packets could be directed to a failed nexthop. Only valid for kernels
+	built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+	Default: 0 (disabled)
+	Possible values:
+	0 - disabled
+	1 - enabled
+
 route/max_size - INTEGER
 	Maximum number of routes allowed in the kernel.  Increase
 	this when using large numbers of interfaces and/or routes.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index a69cde3ce460..d061ffeb1e71 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -133,6 +133,9 @@ struct netns_ipv4 {
 	struct fib_rules_ops	*mr_rules_ops;
 #endif
 #endif
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+	int sysctl_fib_multipath_use_neigh;
+#endif
 	atomic_t	rt_genid;
 };
 #endif
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index d97268e8ff10..5016676c9186 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1559,21 +1559,45 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
 }
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
+static bool fib_good_nh(const struct fib_nh *nh)
+{
+	int state = NUD_REACHABLE;
+
+	if (nh->nh_scope == RT_SCOPE_LINK) {
+		struct neighbour *n;
+
+		rcu_read_lock_bh();
+
+		n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, nh->nh_dev);
+		if (n)
+			state = n->nud_state;
+
+		rcu_read_unlock_bh();
+	}
+
+	return !!(state & NUD_VALID);
+}
 
 void fib_select_multipath(struct fib_result *res, int hash)
 {
 	struct fib_info *fi = res->fi;
+	struct net *net = fi->fib_net;
+	bool first = false;
 
 	for_nexthops(fi) {
 		if (hash > atomic_read(&nh->nh_upper_bound))
 			continue;
 
-		res->nh_sel = nhsel;
-		return;
+		if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
+		    fib_good_nh(nh)) {
+			res->nh_sel = nhsel;
+			return;
+		}
+		if (!first) {
+			res->nh_sel = nhsel;
+			first = true;
+		}
 	} endfor_nexthops(fi);
-
-	/* Race condition: route has just become dead. */
-	res->nh_sel = 0;
 }
 #endif
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1e1fe6086dd9..bb0419582b8d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -960,6 +960,17 @@ static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+	{
+		.procname	= "fib_multipath_use_neigh",
+		.data		= &init_net.ipv4.sysctl_fib_multipath_use_neigh,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 	{ }
 };
 
-- 
1.9.1


* Re: [PATCH v4 net-next] net: ipv4: Consider failed nexthops in multipath routes
From: David Ahern @ 2016-04-03 23:41 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.11.1604040039020.1534@ja.home.ssi.bg>

On 4/3/16 3:57 PM, Julian Anastasov wrote:
>
> 	Hello,
>
> On Sun, 3 Apr 2016, David Ahern wrote:
>
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
>>   	fwmark of the packet they are replying to.
>>   	Default: 0
>>
>> +fib_multipath_use_neigh - BOOLEAN
>> +	Use status of existing neighbor entry when determining nexthop for
>> +	multipath routes. If disabled neighbor information is not used and
>
> 	Comma from v3 is removed?
>
>> +	packets could be directed to a failed nexthop. Only valid for kernels
>
>> --- a/net/ipv4/fib_semantics.c
>> +++ b/net/ipv4/fib_semantics.c
>>   void fib_select_multipath(struct fib_result *res, int hash)
>>   {
>>   	struct fib_info *fi = res->fi;
>> +	struct net *net = fi->fib_net;
>> +	bool first = false;
>>
>>   	for_nexthops(fi) {
>>   		if (hash > atomic_read(&nh->nh_upper_bound))
>>   			continue;
>>
>> -		res->nh_sel = nhsel;
>> -		return;
>> +		if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
>> +		    fib_good_nh(nh)) {
>> +			res->nh_sel = nhsel;
>> +			return;
>> +		}
>> +		if (!first) {
>> +			res->nh_sel = nhsel;
>> +			first = true;
>> +		}
>>   	} endfor_nexthops(fi);
>>
>>   	/* Race condition: route has just become dead. */
>
> 	The 'res->nh_sel = 0;' that is here should be
> removed because it invalidates the above assignment.
>

right. will send a v5


* [PATCH net-next 4/4] udp: move peek offset on read and peek
From: Willem de Bruijn @ 2016-04-03 23:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, samanthakumar, edumazet, willemb
In-Reply-To: <1459726193-20863-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

For UDP sockets, implement the SO_PEEK_OFF semantics introduced in
commit ef64a54f6e55 ("sock: Introduce the SO_PEEK_OFF sock option").

Move the offset forward on peek by the size of the data peeked,
move it backwards on regular reads.

The socket lock is not held for the duration of udp_recvmsg, so
peek and read operations can run concurrently. Only the last store
to sk_peek_off is preserved.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h | 7 ++++++-
 net/core/datagram.c    | 9 ++++++---
 net/ipv4/udp.c         | 9 ++++-----
 net/ipv6/udp.c         | 9 ++++-----
 4 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 15d0df9..0073812 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2949,7 +2949,12 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 				 struct iov_iter *from, int len);
 int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *frm);
 void skb_free_datagram(struct sock *sk, struct sk_buff *skb);
-void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb);
+void __skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb, int len);
+static inline void skb_free_datagram_locked(struct sock *sk,
+					    struct sk_buff *skb)
+{
+	__skb_free_datagram_locked(sk, skb, 0);
+}
 int skb_kill_datagram(struct sock *sk, struct sk_buff *skb, unsigned int flags);
 int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len);
 int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len);
diff --git a/net/core/datagram.c b/net/core/datagram.c
index fa9dc64..b7de71f 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -301,16 +301,19 @@ void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(skb_free_datagram);
 
-void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
+void __skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb, int len)
 {
 	bool slow;
 
 	if (likely(atomic_read(&skb->users) == 1))
 		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
+	else if (likely(!atomic_dec_and_test(&skb->users))) {
+		sk_peek_offset_bwd(sk, len);
 		return;
+	}
 
 	slow = lock_sock_fast(sk);
+	sk_peek_offset_bwd(sk, len);
 	skb_orphan(skb);
 	sk_mem_reclaim_partial(sk);
 	unlock_sock_fast(sk, slow);
@@ -318,7 +321,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 	/* skb is now orphaned, can be freed outside of locked section */
 	__kfree_skb(skb);
 }
-EXPORT_SYMBOL(skb_free_datagram_locked);
+EXPORT_SYMBOL(__skb_free_datagram_locked);
 
 /**
  *	skb_kill_datagram - Free a datagram skbuff forcibly
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 016d13c..075c874 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1356,7 +1356,7 @@ try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
 				  &peeked, &off, &err);
 	if (!skb)
-		goto out;
+		return err;
 
 	ulen = skb->len;
 	copied = len;
@@ -1393,7 +1393,8 @@ try_again:
 			UDP_INC_STATS_USER(sock_net(sk),
 					   UDP_MIB_INERRORS, is_udplite);
 		}
-		goto out_free;
+		skb_free_datagram_locked(sk, skb);
+		return err;
 	}
 
 	if (!peeked)
@@ -1417,9 +1418,7 @@ try_again:
 	if (flags & MSG_TRUNC)
 		err = ulen;
 
-out_free:
-	skb_free_datagram_locked(sk, skb);
-out:
+	__skb_free_datagram_locked(sk, skb, peeking ? -err : err);
 	return err;
 
 csum_copy_err:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d107810..323282a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -419,7 +419,7 @@ try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
 				  &peeked, &off, &err);
 	if (!skb)
-		goto out;
+		return err;
 
 	ulen = skb->len;
 	copied = len;
@@ -462,7 +462,8 @@ try_again:
 						    UDP_MIB_INERRORS,
 						    is_udplite);
 		}
-		goto out_free;
+		skb_free_datagram_locked(sk, skb);
+		return err;
 	}
 	if (!peeked) {
 		if (is_udp4)
@@ -510,9 +511,7 @@ try_again:
 	if (flags & MSG_TRUNC)
 		err = ulen;
 
-out_free:
-	skb_free_datagram_locked(sk, skb);
-out:
+	__skb_free_datagram_locked(sk, skb, peeking ? -err : err);
 	return err;
 
 csum_copy_err:
-- 
2.8.0.rc3.226.g39d4020


* [PATCH net-next 3/4] udp: enable MSG_PEEK at non-zero offset
From: Willem de Bruijn @ 2016-04-03 23:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, samanthakumar, edumazet, willemb
In-Reply-To: <1459726193-20863-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: samanthakumar <samanthakumar@google.com>

Enable peeking at UDP datagrams at the offset specified with socket
option SOL_SOCKET/SO_PEEK_OFF. Peek at any datagram in the queue, up
to the end of the given datagram.

When peeking, always checksum the packet immediately, to avoid
recomputation on subsequent peeks and final read.

This implementation does not move the peek offset. A follow-up patch
adds that.

Signed-off-by: Sam Kumar <samanthakumar@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
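
As an illustration of peeking at a non-zero offset, here is a small
userspace sketch of the length clamp. It is a model under stated
assumptions rather than the kernel code: peek_at() is a hypothetical
helper, the payload is a plain byte array instead of an skb, and the
early return for an offset at or past the end of the datagram is an
assumption of the model.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to len bytes of a ulen-byte payload into dst, starting at
 * byte offset off. Returns the number of bytes copied, which never
 * runs past the end of the datagram.
 */
static size_t peek_at(const char *payload, size_t ulen, size_t off,
		      char *dst, size_t len)
{
	size_t copied = len;

	if (off >= ulen)	/* nothing left at this offset (model assumption) */
		return 0;

	if (copied > ulen - off)	/* clamp to the bytes remaining after off */
		copied = ulen - off;

	memcpy(dst, payload + off, copied);
	return copied;
}
```

For an 8-byte payload, a 4-byte peek at offset 3 copies bytes 3 to 6,
while the same peek at offset 6 copies only the final 2 bytes.
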
 include/net/sock.h  |  2 ++
 net/core/sock.c     |  9 +++++++++
 net/ipv4/af_inet.c  |  1 +
 net/ipv4/udp.c      | 13 +++++++------
 net/ipv6/af_inet6.c |  1 +
 net/ipv6/udp.c      | 13 +++++++------
 6 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b30c2b3..5978bcf 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -456,6 +456,8 @@ struct sock {
 #define SK_CAN_REUSE	1
 #define SK_FORCE_REUSE	2
 
+int sk_set_peek_off(struct sock *sk, int val);
+
 static inline int sk_peek_offset(struct sock *sk, int flags)
 {
 	if (unlikely(flags & MSG_PEEK)) {
diff --git a/net/core/sock.c b/net/core/sock.c
index a33f494..3739381 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2149,6 +2149,15 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
 }
 EXPORT_SYMBOL(__sk_mem_reclaim);
 
+int sk_set_peek_off(struct sock *sk, int val)
+{
+	if (val < 0)
+		return -EINVAL;
+
+	sk->sk_peek_off = val;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sk_set_peek_off);
 
 /*
  * Set of default routines for initialising struct proto_ops when
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9e48199..a38b991 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -948,6 +948,7 @@ const struct proto_ops inet_dgram_ops = {
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = inet_sendpage,
+	.set_peek_off	   = sk_set_peek_off,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_sock_common_setsockopt,
 	.compat_getsockopt = compat_sock_common_getsockopt,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ebc7de..016d13c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1342,7 +1342,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 	DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
 	struct sk_buff *skb;
 	unsigned int ulen, copied;
-	int peeked, off = 0;
+	int peeked, peeking, off;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	bool checksum_valid = false;
@@ -1352,6 +1352,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 		return ip_recv_error(sk, msg, len, addr_len);
 
 try_again:
+	peeking = off = sk_peek_offset(sk, flags);
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
 				  &peeked, &off, &err);
 	if (!skb)
@@ -1359,8 +1360,8 @@ try_again:
 
 	ulen = skb->len;
 	copied = len;
-	if (copied > ulen)
-		copied = ulen;
+	if (copied > ulen - off)
+		copied = ulen - off;
 	else if (copied < ulen)
 		msg->msg_flags |= MSG_TRUNC;
 
@@ -1370,16 +1371,16 @@ try_again:
 	 * coverage checksum (UDP-Lite), do it before the copy.
 	 */
 
-	if (copied < ulen || UDP_SKB_CB(skb)->partial_cov) {
+	if (copied < ulen || UDP_SKB_CB(skb)->partial_cov || peeking) {
 		checksum_valid = !udp_lib_checksum_complete(skb);
 		if (!checksum_valid)
 			goto csum_copy_err;
 	}
 
 	if (checksum_valid || skb_csum_unnecessary(skb))
-		err = skb_copy_datagram_msg(skb, 0, msg, copied);
+		err = skb_copy_datagram_msg(skb, off, msg, copied);
 	else {
-		err = skb_copy_and_csum_datagram_msg(skb, 0, msg);
+		err = skb_copy_and_csum_datagram_msg(skb, off, msg);
 
 		if (err == -EINVAL)
 			goto csum_copy_err;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index b11c37c..2b78aad 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -561,6 +561,7 @@ const struct proto_ops inet6_dgram_ops = {
 	.recvmsg	   = inet_recvmsg,		/* ok		*/
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
+	.set_peek_off	   = sk_set_peek_off,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_sock_common_setsockopt,
 	.compat_getsockopt = compat_sock_common_getsockopt,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index ebcf05f..d107810 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -401,7 +401,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	struct inet_sock *inet = inet_sk(sk);
 	struct sk_buff *skb;
 	unsigned int ulen, copied;
-	int peeked, off = 0;
+	int peeked, peeking, off;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	bool checksum_valid = false;
@@ -415,6 +415,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		return ipv6_recv_rxpmtu(sk, msg, len, addr_len);
 
 try_again:
+	peeking = off = sk_peek_offset(sk, flags);
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
 				  &peeked, &off, &err);
 	if (!skb)
@@ -422,8 +423,8 @@ try_again:
 
 	ulen = skb->len;
 	copied = len;
-	if (copied > ulen)
-		copied = ulen;
+	if (copied > ulen - off)
+		copied = ulen - off;
 	else if (copied < ulen)
 		msg->msg_flags |= MSG_TRUNC;
 
@@ -435,16 +436,16 @@ try_again:
 	 * coverage checksum (UDP-Lite), do it before the copy.
 	 */
 
-	if (copied < ulen || UDP_SKB_CB(skb)->partial_cov) {
+	if (copied < ulen || UDP_SKB_CB(skb)->partial_cov || peeking) {
 		checksum_valid = !udp_lib_checksum_complete(skb);
 		if (!checksum_valid)
 			goto csum_copy_err;
 	}
 
 	if (checksum_valid || skb_csum_unnecessary(skb))
-		err = skb_copy_datagram_msg(skb, 0, msg, copied);
+		err = skb_copy_datagram_msg(skb, off, msg, copied);
 	else {
-		err = skb_copy_and_csum_datagram_msg(skb, 0, msg);
+		err = skb_copy_and_csum_datagram_msg(skb, off, msg);
 		if (err == -EINVAL)
 			goto csum_copy_err;
 	}
-- 
2.8.0.rc3.226.g39d4020


* [PATCH net-next 2/4] udp: remove headers from UDP packets before queueing
From: Willem de Bruijn @ 2016-04-03 23:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, samanthakumar, edumazet, willemb
In-Reply-To: <1459726193-20863-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: samanthakumar <samanthakumar@google.com>

Remove UDP transport headers before queueing packets for reception.
This change simplifies a follow-up patch to add MSG_PEEK support.

Signed-off-by: Sam Kumar <samanthakumar@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
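
The idea of stripping the header once, before queueing, can be shown
with a toy buffer. This is only a sketch: struct pkt and
pkt_pull_udp_header() are hypothetical names, and the checksum
adjustment that the real helper performs is left out entirely.

```c
#include <stddef.h>

#define UDP_HDR_LEN 8	/* a UDP header is 8 bytes long */

/* Hypothetical stand-in for a queued packet; not struct sk_buff. */
struct pkt {
	const unsigned char *data;	/* first byte still in the packet */
	size_t len;			/* bytes remaining from data */
};

/*
 * Drop the UDP header so that later readers see payload at offset 0.
 * Returns 0 on success, -1 if the packet is shorter than a header.
 */
static int pkt_pull_udp_header(struct pkt *p)
{
	if (p->len < UDP_HDR_LEN)
		return -1;

	p->data += UDP_HDR_LEN;
	p->len -= UDP_HDR_LEN;
	return 0;
}
```

After the pull, the receive path no longer has to subtract the header
size from the length or add it to every copy offset.
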
 include/net/sock.h |  1 +
 include/net/udp.h  |  9 +++++++++
 net/core/sock.c    | 19 +++++++++++++------
 net/ipv4/udp.c     | 20 +++++++++++---------
 net/ipv6/udp.c     | 12 +++++++-----
 5 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6435f6d..b30c2b3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1859,6 +1859,7 @@ void sk_reset_timer(struct sock *sk, struct timer_list *timer,
 
 void sk_stop_timer(struct sock *sk, struct timer_list *timer);
 
+int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 
 int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb);
diff --git a/include/net/udp.h b/include/net/udp.h
index 92927f7..baa2ec1 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -158,6 +158,15 @@ static inline __sum16 udp_v4_check(int len, __be32 saddr,
 void udp_set_csum(bool nocheck, struct sk_buff *skb,
 		  __be32 saddr, __be32 daddr, int len);
 
+static inline void udp_csum_pull_header(struct sk_buff *skb)
+{
+	if (skb->ip_summed == CHECKSUM_NONE)
+		skb->csum = csum_partial(udp_hdr(skb), sizeof(struct udphdr),
+					 skb->csum);
+	skb_pull_rcsum(skb, sizeof(struct udphdr));
+	UDP_SKB_CB(skb)->cscov -= sizeof(struct udphdr);
+}
+
 struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 				 struct udphdr *uh);
 int udp_gro_complete(struct sk_buff *skb, int nhoff);
diff --git a/net/core/sock.c b/net/core/sock.c
index b67b9ae..a33f494 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -402,9 +402,8 @@ static void sock_disable_timestamp(struct sock *sk, unsigned long flags)
 }
 
 
-int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
-	int err;
 	unsigned long flags;
 	struct sk_buff_head *list = &sk->sk_receive_queue;
 
@@ -414,10 +413,6 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		return -ENOMEM;
 	}
 
-	err = sk_filter(sk, skb);
-	if (err)
-		return err;
-
 	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
 		atomic_inc(&sk->sk_drops);
 		return -ENOBUFS;
@@ -440,6 +435,18 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sk->sk_data_ready(sk);
 	return 0;
 }
+EXPORT_SYMBOL(__sock_queue_rcv_skb);
+
+int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	int err;
+
+	err = sk_filter(sk, skb);
+	if (err)
+		return err;
+
+	return __sock_queue_rcv_skb(sk, skb);
+}
 EXPORT_SYMBOL(sock_queue_rcv_skb);
 
 int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 08eed5e..6ebc7de 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1357,7 +1357,7 @@ try_again:
 	if (!skb)
 		goto out;
 
-	ulen = skb->len - sizeof(struct udphdr);
+	ulen = skb->len;
 	copied = len;
 	if (copied > ulen)
 		copied = ulen;
@@ -1377,11 +1377,9 @@ try_again:
 	}
 
 	if (checksum_valid || skb_csum_unnecessary(skb))
-		err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
-					    msg, copied);
+		err = skb_copy_datagram_msg(skb, 0, msg, copied);
 	else {
-		err = skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr),
-						     msg);
+		err = skb_copy_and_csum_datagram_msg(skb, 0, msg);
 
 		if (err == -EINVAL)
 			goto csum_copy_err;
@@ -1548,7 +1546,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sk_incoming_cpu_update(sk);
 	}
 
-	rc = sock_queue_rcv_skb(sk, skb);
+	rc = __sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
 		int is_udplite = IS_UDPLITE(sk);
 
@@ -1664,10 +1662,14 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
-	if (rcu_access_pointer(sk->sk_filter) &&
-	    udp_lib_checksum_complete(skb))
-		goto csum_error;
+	if (rcu_access_pointer(sk->sk_filter)) {
+		if (udp_lib_checksum_complete(skb))
+			goto csum_error;
+		if (sk_filter(sk, skb))
+			goto drop;
+	}
 
+	udp_csum_pull_header(skb);
 	if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
 		UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_RCVBUFERRORS,
 				 is_udplite);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8125931..ebcf05f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -420,7 +420,7 @@ try_again:
 	if (!skb)
 		goto out;
 
-	ulen = skb->len - sizeof(struct udphdr);
+	ulen = skb->len;
 	copied = len;
 	if (copied > ulen)
 		copied = ulen;
@@ -442,10 +442,9 @@ try_again:
 	}
 
 	if (checksum_valid || skb_csum_unnecessary(skb))
-		err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
-					    msg, copied);
+		err = skb_copy_datagram_msg(skb, 0, msg, copied);
 	else {
-		err = skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
+		err = skb_copy_and_csum_datagram_msg(skb, 0, msg);
 		if (err == -EINVAL)
 			goto csum_copy_err;
 	}
@@ -598,7 +597,7 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sk_incoming_cpu_update(sk);
 	}
 
-	rc = sock_queue_rcv_skb(sk, skb);
+	rc = __sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
 		int is_udplite = IS_UDPLITE(sk);
 
@@ -692,8 +691,11 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (rcu_access_pointer(sk->sk_filter)) {
 		if (udp_lib_checksum_complete(skb))
 			goto csum_error;
+		if (sk_filter(sk, skb))
+			goto drop;
 	}
 
+	udp_csum_pull_header(skb);
 	if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
 		UDP6_INC_STATS_BH(sock_net(sk),
 				  UDP_MIB_RCVBUFERRORS, is_udplite);
-- 
2.8.0.rc3.226.g39d4020


* [PATCH net-next 1/4] sock: convert sk_peek_offset functions to WRITE_ONCE
From: Willem de Bruijn @ 2016-04-03 23:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, samanthakumar, edumazet, willemb
In-Reply-To: <1459726193-20863-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Make the peek offset interface safe to use in lockless environments.
Use READ_ONCE and WRITE_ONCE to avoid race conditions between testing
and updating the peek offset.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
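
The read-once, compute, write-once pattern can be modelled in userspace
as below. This is a sketch under assumptions: struct peek_state and the
MODEL_* macros are hypothetical, and the volatile casts only approximate
what the kernel's READ_ONCE and WRITE_ONCE provide.

```c
#include <stdint.h>

/* Approximations of READ_ONCE/WRITE_ONCE for a 32-bit field. */
#define MODEL_READ_ONCE(x)	(*(const volatile int32_t *)&(x))
#define MODEL_WRITE_ONCE(x, v)	(*(volatile int32_t *)&(x) = (v))

/* Hypothetical model of the socket field; a negative value means disabled. */
struct peek_state {
	int32_t peek_off;
};

/* Move the offset back by val, clamping at 0, with one load and one store. */
static void peek_offset_bwd(struct peek_state *s, int32_t val)
{
	int32_t off = MODEL_READ_ONCE(s->peek_off);

	if (off >= 0) {
		off = (off - val > 0) ? off - val : 0;
		MODEL_WRITE_ONCE(s->peek_off, off);
	}
}

/* Moving forward is moving backward by a negative amount. */
static void peek_offset_fwd(struct peek_state *s, int32_t val)
{
	peek_offset_bwd(s, -val);
}
```

Because the field is loaded into a local exactly once, the test and the
update operate on the same value even if another thread stores to the
field in between.
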
 include/net/sock.h | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e0..6435f6d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -458,26 +458,28 @@ struct sock {
 
 static inline int sk_peek_offset(struct sock *sk, int flags)
 {
-	if ((flags & MSG_PEEK) && (sk->sk_peek_off >= 0))
-		return sk->sk_peek_off;
-	else
-		return 0;
+	if (unlikely(flags & MSG_PEEK)) {
+		s32 off = READ_ONCE(sk->sk_peek_off);
+		if (off >= 0)
+			return off;
+	}
+
+	return 0;
 }
 
 static inline void sk_peek_offset_bwd(struct sock *sk, int val)
 {
-	if (sk->sk_peek_off >= 0) {
-		if (sk->sk_peek_off >= val)
-			sk->sk_peek_off -= val;
-		else
-			sk->sk_peek_off = 0;
+	s32 off = READ_ONCE(sk->sk_peek_off);
+
+	if (unlikely(off >= 0)) {
+		off = max_t(s32, off - val, 0);
+		WRITE_ONCE(sk->sk_peek_off, off);
 	}
 }
 
 static inline void sk_peek_offset_fwd(struct sock *sk, int val)
 {
-	if (sk->sk_peek_off >= 0)
-		sk->sk_peek_off += val;
+	sk_peek_offset_bwd(sk, -val);
 }
 
 /*
-- 
2.8.0.rc3.226.g39d4020

