Netdev List

* Re: [PATCH] ip: add udp_csum, udp6_csum_tx, udp6_csum_rx control flags to ip l2tp add tunnel
From: Wang Shanker @ 2016-04-27 17:06 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev
In-Reply-To: <FD67CA88-0731-4DD7-AC78-5A4B922810D6@gmail.com>



> On 27 April 2016, at 23:33, Wang Shanker <shankerwangmiao@gmail.com> wrote:
> 
> 
> 
>> On 27 April 2016, at 20:21, James Chapman <jchapman@katalix.com> wrote:
>> 
>> On 26 April 2016 at 15:15, Wang Shanker <shankerwangmiao@gmail.com> wrote:
>>> Hi, all
>>> 
>>> This is my first contribution to such an important open source project. It began when I upgraded my server, "Server A", from Ubuntu 14.04 to 16.04, which ships with a newer kernel, 4.4. After the upgrade, I soon found that an L2TP tunnel over IPv6 between this server and another Linux server, "Server B", had broken. Here is the network topology:
>>> 
>>> +----------+                    +----------+
>>> | Server A | -- IPV6 Network -- | Server B |
>>> +----------+                    +----------+
>>> 
>>> The L2TP tunnel is encapsulated in UDP datagrams. The configuration is unremarkable, and the tunnel worked again after I reverted the kernel on Server A to the original version.
>>> 
>>> Here is what I did to create the tunnel:
>>> 
>>> ```
>>> on Server A:
>>> 
>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 remote 2001:db8::aaaa local 2001:db8::bbbb udp_sport 1086 udp_dport 1086
>>> ip l2tp add session name l2tpeth0 tunnel_id 86 session_id 86 peer_session_id 86
>>> ip l s l2tpeth0 up
>>> 
>>> on Server B:
>>> 
>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 local 2001:db8::aaaa remote 2001:db8::bbbb udp_sport 1086 udp_dport 1086
>>> ip l2tp add session name l2tpeth0 tunnel_id 86 session_id 86 peer_session_id 86
>>> ip l s l2tpeth0 up
>>> 
>>> ```
>>> 
>>> When I used tcpdump to diagnose the problem, I got this result:
>>> 
>>> ```
>>> on Server A:
>>> 
>>> arping -i l2tpeth0 -0 1.2.3.4
>>> 
>>> on Server B:
>>> 
>>> tcpdump -i eth0 -n port 1086  -v
>>> 
>>> 21:35:57.818810 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>> 21:35:58.820572 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>> 21:35:59.822216 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [bad udp cksum 0x0000 -> 0x1140!] UDP, length 54
>>> 
>>> ```
>>> 
>>> After looking into the kernel source, I found that commit 6b649fe introduced a feature to set the UDP6 checksum to zero, adding `L2TP_ATTR_UDP_ZERO_CSUM6_TX` and `L2TP_ATTR_UDP_ZERO_CSUM6_RX`.
>>> 
>>> As a result, I added `udp_csum`, `udp6_csum_tx`, `udp6_csum_rx` control flags to `ip l2tp add tunnel` to control these checksum attributes.
>>> 
>>> Using this to create the tunnel instead on Server A:
>>> 
>>> ```
>>> ip l2tp add tunnel tunnel_id 86 peer_tunnel_id 86 remote 2001:db8::aaaa local 2001:db8::bbbb udp_sport 1086 udp_dport 1086 udp6_csum_tx on udp6_csum_rx on
>>> ```
>>> 
>>> I finally got:
>>> 
>>> ```
>>> on Server A:
>>> 
>>> arping -i l2tpeth0 -0 1.2.3.4
>>> 
>>> on Server B:
>>> 
>>> tcpdump -i eth0 -n port 1086  -v
>>> 
>>> 22:07:03.844297 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>> 22:07:04.845717 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>> 22:07:05.847965 IP6 (flowlabel 0x8f028, hlim 64, next-header UDP (17) payload length: 62) 2001:db8::aaaa.1086 > 2001:db8::bbbb.1086: [udp sum ok] UDP, length 54
>>> 
>>> tcpdump -i l2tpeth0 -v
>>> 
>>> 22:10:35.691326 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>> 22:10:36.693627 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>> 22:10:37.695010 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>> 22:10:38.697121 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.4 tell 0.0.0.0, length 28
>>> 
>>> ```
>>> 
>>> It seems to work. However, is this the right place to fix the problem, and does my patch fix it well? I would appreciate your suggestions.
>> 
>> This seems reasonable to me. It is good to provide user control of
>> L2TP checksum options.
>> 
>> However, there is a problem with your patch. The netlink attributes
>> you are accessing to control checksums are flags, not u8 values.
> 
> I’m not so familiar with kernel code. However, in `<linux/l2tp.h>` : 
> 
> ```
> 
> /*
> * ATTR types defined for L2TP
> */
> enum {
> 	L2TP_ATTR_NONE,			/* no data */
> // ...
> 	L2TP_ATTR_IP6_DADDR,		/* struct in6_addr */
> 	L2TP_ATTR_UDP_ZERO_CSUM6_TX,	/* u8 */
> 	L2TP_ATTR_UDP_ZERO_CSUM6_RX,	/* u8 */
> // ...
> 
> }
> ```
> 
> Isn’t `L2TP_ATTR_UDP_ZERO_CSUM6_TX` a u8 value? Or should I use `addattr` instead of `addattr8`?
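
For illustration, the difference between a netlink flag attribute and a u8 attribute can be sketched in plain C. This is a minimal sketch that only mirrors the 4-byte attribute header and 4-byte alignment used by netlink; `struct attr_hdr`, `put_attr`, `put_flag` and `put_u8` are hypothetical names, not iproute2 or kernel APIs. A flag attribute carries no payload, so its presence alone is the value, whereas a u8 attribute carries one byte plus padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of a netlink attribute header (assumed: 4 bytes). */
struct attr_hdr {
	uint16_t len;	/* header + payload, excluding padding */
	uint16_t type;
};

#define ATTR_HDRLEN	sizeof(struct attr_hdr)
#define ATTR_ALIGN(n)	(((n) + 3u) & ~(size_t)3u)

/* Append one attribute; returns padded bytes used, or 0 if it does not fit. */
size_t put_attr(uint8_t *buf, size_t room, uint16_t type,
		const void *data, size_t dlen)
{
	struct attr_hdr h;
	size_t total = ATTR_ALIGN(ATTR_HDRLEN + dlen);

	if (total > room)
		return 0;
	memset(buf, 0, total);		/* zero the padding bytes too */
	h.len = (uint16_t)(ATTR_HDRLEN + dlen);
	h.type = type;
	memcpy(buf, &h, sizeof(h));
	if (dlen)
		memcpy(buf + ATTR_HDRLEN, data, dlen);
	return total;
}

/* Flag: header only; being present means "set". */
size_t put_flag(uint8_t *buf, size_t room, uint16_t type)
{
	return put_attr(buf, room, type, NULL, 0);
}

/* u8: header plus one payload byte, padded to a 4-byte boundary. */
size_t put_u8(uint8_t *buf, size_t room, uint16_t type, uint8_t value)
{
	return put_attr(buf, room, type, &value, 1);
}
```

With this layout a flag occupies 4 bytes and a u8 occupies 8 bytes on the wire. If the kernel treats an attribute as a flag, "off" would be expressed by omitting the attribute rather than by sending a zero byte.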
> 
>> 
>> Maybe the default checksum setting for such l2tp tunnels should be
>> changed in the l2tp kernel code to match the previous behaviour where
>> IPv6 checksums were disabled?
> 
> I think so. However, that code confuses me.
> 
> From the names of the attributes `L2TP_ATTR_UDP_ZERO_CSUM6_TX` and `L2TP_ATTR_UDP_ZERO_CSUM6_RX`, I
> can tell that the checksum will be zero when those flags are set. Also, according to the
> message of commit 6b649fe in the kernel source, “Added new L2TP configuration options to allow
> TX and RX of zero checksums in IPv6. Default is not to use them.”, checksums should not
> be zero by default. In fact, however, they are. I think there may be a bug in the kernel
> source.
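
As background for the `bad udp cksum 0x0000` lines in the capture quoted earlier, here is a minimal sketch of how a UDP checksum over IPv6 is computed: a one's-complement sum over the IPv6 pseudo-header (source address, destination address, upper-layer length, next header 17) followed by the UDP header and payload. The function names are hypothetical and the code is an illustration, not the kernel's implementation. A computed value of zero is sent as 0xffff, so a zero checksum field on the wire means that no checksum was computed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum as big-endian 16-bit words. */
static uint32_t sum16(uint32_t acc, const uint8_t *p, size_t len)
{
	while (len > 1) {
		acc += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte, padded with zero */
		acc += (uint32_t)p[0] << 8;
	return acc;
}

/* Folded one's-complement sum over pseudo-header + UDP header + payload. */
uint16_t udp6_folded_sum(const uint8_t src[16], const uint8_t dst[16],
			 const uint8_t *udp, size_t len)
{
	uint32_t acc = 0;

	acc = sum16(acc, src, 16);
	acc = sum16(acc, dst, 16);
	acc += (uint32_t)(len >> 16) + (uint32_t)(len & 0xffff);
	acc += 17;			/* next header: UDP */
	acc = sum16(acc, udp, len);
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Value for the checksum field; call with the field set to zero. */
uint16_t udp6_checksum(const uint8_t src[16], const uint8_t dst[16],
		       const uint8_t *udp, size_t len)
{
	uint16_t c = (uint16_t)~udp6_folded_sum(src, dst, udp, len);

	return c ? c : 0xffff;		/* zero is reserved for "no checksum" */
}

/* Receiver check: a valid datagram sums to 0xffff including its checksum. */
bool udp6_checksum_ok(const uint8_t src[16], const uint8_t dst[16],
		      const uint8_t *udp, size_t len)
{
	return udp6_folded_sum(src, dst, udp, len) == 0xffff;
}
```

A sender zeroes the checksum field, calls `udp6_checksum()`, and stores the result in network byte order; a receiver that requires checksums would reject a datagram for which `udp6_checksum_ok()` is false.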

I think I’ve found the bug. Here is the patch for the kernel:

---
 net/l2tp/l2tp_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index afca2eb..6edfa99 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1376,9 +1376,9 @@ static int l2tp_tunnel_sock_create(struct net *net,
                        memcpy(&udp_conf.peer_ip6, cfg->peer_ip6,
                               sizeof(udp_conf.peer_ip6));
                        udp_conf.use_udp6_tx_checksums =
-                           cfg->udp6_zero_tx_checksums;
+                         ! cfg->udp6_zero_tx_checksums;
                        udp_conf.use_udp6_rx_checksums =
-                           cfg->udp6_zero_rx_checksums;
+                         ! cfg->udp6_zero_rx_checksums;
                } else
 #endif
                {
-- 
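
The effect of the two negations in this patch can be sketched in isolation. The structs below are hypothetical stand-ins that reuse only the field names visible in the diff; the point is that a "zero checksum" option and a "use checksum" option have opposite senses, so one has to be the negation of the other.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the L2TP tunnel configuration. */
struct tunnel_cfg_sketch {
	bool udp6_zero_tx_checksums;
	bool udp6_zero_rx_checksums;
};

/* Hypothetical stand-in for the UDP socket configuration. */
struct sock_cfg_sketch {
	bool use_udp6_tx_checksums;
	bool use_udp6_rx_checksums;
};

/* "Zero checksum" requested means "do not use checksums", and vice versa. */
void apply_udp6_csum_cfg(const struct tunnel_cfg_sketch *cfg,
			 struct sock_cfg_sketch *out)
{
	out->use_udp6_tx_checksums = !cfg->udp6_zero_tx_checksums;
	out->use_udp6_rx_checksums = !cfg->udp6_zero_rx_checksums;
}
```

With a zero-initialized configuration, meaning that no zero-checksum option was requested, both `use_*` fields come out true, which matches the "Default is not to use them" wording quoted from the commit message.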


>> 
>>> 
>>> 
>>> ---
>>> ip/ipl2tp.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 45 insertions(+)
>>> 
>>> diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
>>> index 3c8ee93..67a6482 100644
>>> --- a/ip/ipl2tp.c
>>> +++ b/ip/ipl2tp.c
>>> @@ -56,6 +56,8 @@ struct l2tp_parm {
>>> 
>>>       uint16_t pw_type;
>>>       uint16_t mtu;
>>> +       int udp6_csum_tx:1;
>>> +       int udp6_csum_rx:1;
>>>       int udp_csum:1;
>>>       int recv_seq:1;
>>>       int send_seq:1;
>>> @@ -117,6 +119,9 @@ static int create_tunnel(struct l2tp_parm *p)
>>>       if (p->encap == L2TP_ENCAPTYPE_UDP) {
>>>               addattr16(&req.n, 1024, L2TP_ATTR_UDP_SPORT, p->local_udp_port);
>>>               addattr16(&req.n, 1024, L2TP_ATTR_UDP_DPORT, p->peer_udp_port);
>>> +               addattr8 (&req.n, 1024, L2TP_ATTR_UDP_CSUM, p->udp_csum);
>>> +               addattr8 (&req.n, 1024, L2TP_ATTR_UDP_ZERO_CSUM6_TX, p->udp6_csum_tx);
>>> +               addattr8 (&req.n, 1024, L2TP_ATTR_UDP_ZERO_CSUM6_RX, p->udp6_csum_rx);
>>>       }
>>> 
>>>       if (rtnl_talk(&genl_rth, &req.n, NULL, 0) < 0)
>>> @@ -282,6 +287,14 @@ static int get_response(struct nlmsghdr *n, void *arg)
>>>               p->l2spec_len = rta_getattr_u8(attrs[L2TP_ATTR_L2SPEC_LEN]);
>>> 
>>>       p->udp_csum = !!attrs[L2TP_ATTR_UDP_CSUM];
>>> +       if (attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX])
>>> +         p->udp6_csum_tx = rta_getattr_u8(attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]);
>>> +       else
>>> +         p->udp6_csum_tx = 1;
>>> +       if (attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX])
>>> +         p->udp6_csum_rx = rta_getattr_u8(attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]);
>>> +       else
>>> +         p->udp6_csum_rx = 1;
>>>       if (attrs[L2TP_ATTR_COOKIE])
>>>               memcpy(p->cookie, RTA_DATA(attrs[L2TP_ATTR_COOKIE]),
>>>                      p->cookie_len = RTA_PAYLOAD(attrs[L2TP_ATTR_COOKIE]));
>>> @@ -470,6 +483,9 @@ static void usage(void)
>>>       fprintf(stderr, "          tunnel_id ID peer_tunnel_id ID\n");
>>>       fprintf(stderr, "          [ encap { ip | udp } ]\n");
>>>       fprintf(stderr, "          [ udp_sport PORT ] [ udp_dport PORT ]\n");
>>> +       fprintf(stderr, "          [ udp_csum { on | off } ]\n");
>>> +       fprintf(stderr, "          [ udp6_csum_tx { on | off } ]\n");
>>> +       fprintf(stderr, "          [ udp6_csum_rx { on | off } ]\n");
>>>       fprintf(stderr, "Usage: ip l2tp add session [ name NAME ]\n");
>>>       fprintf(stderr, "          tunnel_id ID\n");
>>>       fprintf(stderr, "          session_id ID peer_session_id ID\n");
>>> @@ -500,6 +516,8 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
>>>       /* Defaults */
>>>       p->l2spec_type = L2TP_L2SPECTYPE_DEFAULT;
>>>       p->l2spec_len = 4;
>>> +       p->udp6_csum_rx = 1;
>>> +       p->udp6_csum_tx = 1;
>>> 
>>>       while (argc > 0) {
>>>               if (strcmp(*argv, "encap") == 0) {
>>> @@ -569,6 +587,33 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
>>>                       if (get_u16(&uval, *argv, 0))
>>>                               invarg("invalid port\n", *argv);
>>>                       p->peer_udp_port = uval;
>>> +               } else if (strcmp(*argv, "udp_csum") == 0) {
>>> +                       NEXT_ARG();
>>> +                       if (strcmp(*argv, "on") == 0) {
>>> +                               p->udp_csum = 1;
>>> +                       } else if (strcmp(*argv, "off") == 0) {
>>> +                               p->udp_csum = 0;
>>> +                       } else {
>>> +                               invarg("invalid option for udp_csum\n", *argv);
>>> +                       }
>>> +               } else if (strcmp(*argv, "udp6_csum_rx") == 0) {
>>> +                       NEXT_ARG();
>>> +                       if (strcmp(*argv, "on") == 0) {
>>> +                               p->udp6_csum_rx = 1;
>>> +                       } else if (strcmp(*argv, "off") == 0) {
>>> +                               p->udp6_csum_rx = 0;
>>> +                       } else {
>>> +                               invarg("invalid option for udp6_csum_rx\n", *argv);
>>> +                       }
>>> +               } else if (strcmp(*argv, "udp6_csum_tx") == 0) {
>>> +                       NEXT_ARG();
>>> +                       if (strcmp(*argv, "on") == 0) {
>>> +                               p->udp6_csum_tx = 1;
>>> +                       } else if (strcmp(*argv, "off") == 0) {
>>> +                               p->udp6_csum_tx = 0;
>>> +                       } else {
>>> +                               invarg("invalid option for udp6_csum_tx\n", *argv);
>>> +                       }
>>>               } else if (strcmp(*argv, "offset") == 0) {
>>>                       __u8 uval;
>>> 
>>> --
>>> 2.5.2
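
The three `on`/`off` branches added to `parse_args()` repeat the same logic, which could be factored into one helper. `parse_on_off` below is a hypothetical name used for illustration, not an existing iproute2 function, and the sketch leaves error reporting to the caller.

```c
#include <string.h>

/*
 * Parse an "on"/"off" argument.
 * Returns 0 and stores 1 or 0 in *value on success, -1 on any other input.
 */
int parse_on_off(const char *arg, int *value)
{
	if (strcmp(arg, "on") == 0) {
		*value = 1;
		return 0;
	}
	if (strcmp(arg, "off") == 0) {
		*value = 0;
		return 0;
	}
	return -1;	/* caller reports the invalid argument */
}
```

Each option branch would then reduce to one call followed by the existing `invarg()` error path when the helper returns -1.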

^ permalink raw reply related

* Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support
From: Stephen Hemminger @ 2016-04-27 17:07 UTC (permalink / raw)
  To: David Miller; +Cc: jiri, fw, nicolas.dichtel, netdev
In-Reply-To: <20160427.125506.1088576804354580011.davem@davemloft.net>

On Wed, 27 Apr 2016 12:55:06 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Jiri Pirko <jiri@resnulli.us>
> Date: Wed, 27 Apr 2016 17:14:04 +0200
> 
> > Wed, Apr 27, 2016 at 11:56:15AM CEST, fw@strlen.de wrote:
> >>Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >>> This patch adds the support of the 6WIND SHULTI switch. It is a software
> >>> switch doing L2 forwarding.
> >>> 
> >>> This first version implements the minimum needed to get the device working.
> >>> It also implements, via switchdev and rtnetlink, bridge forwarding offload,
> >>> including FDB static entries, FDB learning and FDB ageing.
> >>
> >>How is this different from net/bridge?
> >>How is this different from openvswitch?
> > 
> > The difference is that this tries to allow userspace crap to mirror
> > setting user does for bridge/ovs. Basically this looks to me like an
> > attempt to enable userspace SDKs and such.
> 
> +1
> 
> There is no way I'm applying this.

Also it has a bunch of device specific generic netlink which was
a red flag for me.

^ permalink raw reply

* [PATCH net-next] tcp: give prequeue mode some care
From: Eric Dumazet @ 2016-04-27 17:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

TCP prequeue goal is to defer processing of incoming packets
to user space thread currently blocked in a recvmsg() system call.

Intent is to spend less time processing these packets on behalf
of softirq handler, as softirq handler is unfair to normal process
scheduler decisions, as it might interrupt threads that do not
even use networking.

Current prequeue implementation has following issues :

1) It only checks size of the prequeue against sk_rcvbuf

   It was fine 15 years ago when sk_rcvbuf was in the 64KB vicinity.
   But we now have ~8MB values to cope with modern networking needs.
   We have to add sk_rmem_alloc in the equation, since out of order
   packets can definitely use up to sk_rcvbuf memory themselves.

2) Even with a fixed memory truesize check, prequeue can be filled
   by thousands of packets. When prequeue needs to be flushed, either
   from softirq context (in tcp_prequeue() or timer code), or process
   context (in tcp_prequeue_process()), this adds a latency spike
   which is often not desirable.
   I added a fixed limit of 32 packets, as this translated to a max
   flush time of 60 us on my test hosts.

   Also note that all packets in prequeue are not accounted for tcp_mem,
   since they are not charged against sk_forward_alloc at this point.
   This is probably not a big deal.

Note that this might increase LINUX_MIB_TCPPREQUEUEDROPPED counts,
which is misnamed, as packets are not dropped at all, but rather pushed
to the stack (where they can be either consumed or dropped)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_ipv4.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d2a5763e5abc..58bcf5e001e7 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1506,16 +1506,16 @@ bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)

 	__skb_queue_tail(&tp->ucopy.prequeue, skb);
 	tp->ucopy.memory += skb->truesize;
-	if (tp->ucopy.memory > sk->sk_rcvbuf) {
+	if (skb_queue_len(&tp->ucopy.prequeue) >= 32 ||
+	    tp->ucopy.memory + atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) {
 		struct sk_buff *skb1;

 		BUG_ON(sock_owned_by_user(sk));
+		NET_ADD_STATS_BH(sock_net(sk), LINUX_MIB_TCPPREQUEUEDROPPED,
+				 skb_queue_len(&tp->ucopy.prequeue));

-		while ((skb1 = __skb_dequeue(&tp->ucopy.prequeue)) != NULL) {
+		while ((skb1 = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
 			sk_backlog_rcv(sk, skb1);
-			NET_INC_STATS_BH(sock_net(sk),
-					 LINUX_MIB_TCPPREQUEUEDROPPED);
-		}

 		tp->ucopy.memory = 0;
 	} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
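
The new flush condition can be read as a standalone predicate. This is a sketch with hypothetical names and plain integers in place of the kernel's socket fields; the limit of 32 packets and the added `sk_rmem_alloc` term are taken from the patch above.

```c
#include <stdbool.h>
#include <stddef.h>

#define PREQUEUE_MAX_PKTS 32	/* fixed packet limit from the patch */

/*
 * Decide whether the prequeue should be flushed to the stack.
 * prequeue_mem models tp->ucopy.memory, rmem_alloc models sk_rmem_alloc,
 * rcvbuf models sk_rcvbuf.
 */
bool prequeue_should_flush(size_t queued_pkts, size_t prequeue_mem,
			   size_t rmem_alloc, size_t rcvbuf)
{
	return queued_pkts >= PREQUEUE_MAX_PKTS ||
	       prequeue_mem + rmem_alloc > rcvbuf;
}
```

Under the old check, which compared only `tp->ucopy.memory` with `sk_rcvbuf`, a socket whose receive memory is already full of out-of-order data would keep queueing; with the combined check even a single small packet triggers the flush.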

^ permalink raw reply related

* Re: [PATCH net-next 0/7] bridge: per-vlan stats
From: Nikolay Aleksandrov @ 2016-04-27 17:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, davem, jhs
In-Reply-To: <20160427100601.6bbf9d28@samsung9>

On 04/27/2016 07:06 PM, Stephen Hemminger wrote:
> On Wed, 27 Apr 2016 18:18:15 +0200
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> 
>> Hi,
>> This set adds support for bridge per-vlan statistics.
>> In order to be able to dump statistics we need a way to continue
>> dumping after reaching maximum size, thus patches 01-03 extend the new
>> stats API with a per-device extended link stats attribute and callback
>> which can save its local state and continue where it left off afterwards.
>> I considered using the already existing "fill_xstats" callback but it gets
>> confusing since we need to separate the linkinfo dump from the new stats
>> api dump and adding a flag/argument to do that just looks messy. I don't
>> think the rtnl_link_ops size is an issue, so adding these seemed like the
>> cleaner approach.
>>
>> Patch 05 converts the pvid to a pointer so we can consolidate the vlan
>> stats accounting paths later; it also allows simplifying the pvid code.
>> Patches 06 and 07 add the stats support and netlink dump support
>> respectively.
>> I've tested this set with both old and modified iproute2, kmemleak on and
>> some traffic stress tests while adding/removing vlans and ports.
>>
>> Thank you,
>>  Nik
>>
>> Note: Jamal I haven't forgotten about the per-port per-vlan stats, I've got
>> a follow-up patch that adds it. You can easily see that the infrastructure
>> for private port/vlan stats is in place after this set. Though the stats
>> api will need some more changes to support that.
>>
>>
>> Nikolay Aleksandrov (7):
>>   net: rtnetlink: allow rtnl_fill_statsinfo to save private state
>>     counter
>>   net: rtnetlink: allow only one idx saving stats attribute
>>   net: rtnetlink: add linkxstats callbacks and attribute
>>   net: constify is_skb_forwardable's arguments
>>   bridge: vlan: RCUify pvid
>>   bridge: vlan: learn to count
>>   bridge: netlink: export per-vlan stats
>>
>>  include/linux/netdevice.h      |   3 +-
>>  include/net/rtnetlink.h        |  10 +++
>>  include/uapi/linux/if_bridge.h |   8 +++
>>  include/uapi/linux/if_link.h   |   9 +++
>>  net/bridge/br_netlink.c        |  80 ++++++++++++++++++++----
>>  net/bridge/br_private.h        |  32 +++++-----
>>  net/bridge/br_vlan.c           | 134 +++++++++++++++++++++++++++++------------
>>  net/core/dev.c                 |   2 +-
>>  net/core/rtnetlink.c           |  64 +++++++++++++++++---
>>  9 files changed, 266 insertions(+), 76 deletions(-)
>>
> 
> I am concerned this adds unnecessary complexity (more bugs)
IMO the whole point in moving to a per-vlan structure from a bitmap was to
have per-vlan context and flexibility to implement functions like this.
The fast path code that is added is minimal (fetch & stats adds).
In any case I'm sure there're more per-vlan options coming, and not only from
me or Cumulus. If we won't make use of the per-vlan context, then I'd suggest
we revert to bitmaps as that's faster and more compact.

> and overhead (slower performance). Statistics are not free, and having
> them in a convenient place may be unnecessary duplication.
> 
The performance impact is minimal as we're using per-cpu counters. If you're
concerned that even that is too much I can make this conditional on a per-vlan
flag that can be requested by the user to enable the stats, but that is overkill
for something as basic as stats in my opinion.

Thanks,
 Nik
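
The per-cpu counter argument above can be illustrated with a small user-space sketch. All names here are hypothetical, and the fixed CPU count is an assumption; the kernel would use its per-cpu allocator and synchronization helpers instead of a plain array. Each CPU updates only its own slot, so the fast path is a couple of additions with no shared cache line, and a reader pays the cost of summing the slots.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed CPU count */

struct vlan_stats_sketch {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

struct vlan_sketch {
	struct vlan_stats_sketch pcpu[SKETCH_NR_CPUS];	/* one slot per CPU */
};

void vlan_stats_init(struct vlan_sketch *v)
{
	for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		v->pcpu[cpu].rx_packets = 0;
		v->pcpu[cpu].rx_bytes = 0;
	}
}

/* Fast path: touch only the slot of the CPU handling the packet. */
void vlan_count_rx(struct vlan_sketch *v, unsigned int cpu, size_t bytes)
{
	struct vlan_stats_sketch *s = &v->pcpu[cpu % SKETCH_NR_CPUS];

	s->rx_packets++;
	s->rx_bytes += bytes;
}

/* Slow path (e.g. a stats dump): sum all per-CPU slots. */
struct vlan_stats_sketch vlan_read_stats(const struct vlan_sketch *v)
{
	struct vlan_stats_sketch total = { 0, 0 };

	for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		total.rx_packets += v->pcpu[cpu].rx_packets;
		total.rx_bytes += v->pcpu[cpu].rx_bytes;
	}
	return total;
}
```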

^ permalink raw reply

* Re: [PATCH v2] net: Add Qualcomm IPC router
From: Bjorn Andersson @ 2016-04-27 17:14 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, linux-arm-msm, courtney.cavin,
	bjorn.andersson
In-Reply-To: <20160427.122242.1614940676503935894.davem@davemloft.net>

On Wed 27 Apr 09:22 PDT 2016, David Miller wrote:

> From: Bjorn Andersson <bjorn.andersson@linaro.org>
> Date: Tue, 26 Apr 2016 22:48:05 -0700
> 
> > +	rc = qcom_smd_send(qdev->channel, skb->data, skb->len);
> 
> I truly dislike adding networking protocols that depend upon some
> piece of infrastructure that only some platforms can enable, it's even
> worse when that set of platforms doesn't intersect with x86-64.
> 
> When you do things like this, it's quite hard to make protocol wide
> changes to APIs because build testing becomes an issue.
> 

That's a very valid concern.

> This code can now only be build tested on ARCH_QCOM architectures, and
> that's a serious negative downside.

For normal usage the QRTR_SMD doesn't make much sense to be selectable
unless QCOM_SMD is compiled in, but I can fix up the QCOM_SMD exports
and slap a COMPILE_TEST on it.


Looking at it again, we already have the conditional for QRTR and the OF
code in the driver went away a while back, so we're down to something
like:

	depends on QCOM_SMD || COMPILE_TEST

Regards,
Bjorn

^ permalink raw reply

* Re: [net-next PATCH 6/8] mlx4: Add support for inner IPv6 checksum offloads and TSO
From: Alexander Duyck @ 2016-04-27 18:05 UTC (permalink / raw)
  To: Tariq Toukan, Saeed Mahameed
  Cc: Alex Duyck, Tal Alon, Linux Kernel Network Developers,
	David Miller, Gal Pressman, Or Gerlitz, Eran Ben Elisha
In-Reply-To: <8ed986cd-10f4-bcf4-1433-45e6594849a8@gmail.com>

On 04/27/2016 08:39 AM, Tariq Toukan wrote:
>
>
> On 27/04/2016 12:01 AM, Alexander Duyck wrote:
>> On Tue, Apr 26, 2016 at 1:23 PM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Tue, Apr 26, 2016 at 6:50 PM, Alex Duyck <aduyck@mirantis.com> wrote:
>>>> The setup is pretty straightforward.  Basically I left the first port
>>>> in the default namespace and moved the second into a secondary
>>>> namespace referred to below as $netns.  I then assigned the IPv6
>>>> addresses fec0::10:1 and fec0::10:2. After that I ran the following:
>>>>
>>>>          VXLAN=vx$net
>>>>          echo $VXLAN ${test_options[$i]}
>>>>          ip link add $VXLAN type vxlan id $net \
>>>>                  local fec0::10:1 remote $addr6 dev $PF0 \
>>>>                  ${test_options[$i]} dstport `expr 8800 + $net`
>>>>          ip netns exec $netns ip link add $VXLAN type vxlan id $net \
>>>>                  local $addr6 remote fec0::10:1 dev $port \
>>>>                  ${test_options[$i]} dstport `expr 8800 + $net`
>>>>          ifconfig $VXLAN 192.168.${net}.1/24
>>>>          ip netns exec $netns ifconfig $VXLAN 192.168.${net}.2/24
>>>>
>>> Thanks, indeed i see that GSO is not working with vxlan over IPv6 over
>>> mlx5 device.
>>> We will test out those patches on both mlx4 and mlx5, and debug mlx4
>>> IPv6 issue you see.
>>>
>>>>> Anyway, I suspect it might be related to a driver bug most likely in
>>>>> get_real_size function @en_tx.c
>>>>> specifically in : *lso_header_size =
>>>>> (skb_inner_transport_header(skb) -
>>>>> skb->data) + inner_tcp_hdrlen(skb);
>>>>>
>>>>> will check this and get back to you.
>>>> I'm not entirely convinced.  What I was seeing is that the hardware
>>>> itself was performing Rx checksum offload only on tunnels with an
>>>> outer IPv4 header and ignoring tunnels with an outer IPv6 header.
>>> I don't get it, are you trying to say that the issue is in the RX side ?
>>> what do you mean by ignoring ? Dropping ? or just not validating
>>> checksum ?
>>> if so why would you disable GSO and IPv6 checksumming on TX ?
>> I'm suspecting that whatever parsing logic exists in either the
>> hardware or firmware may not be configured to parse tunnels with outer
>> IPv6 headers.  The tell-tale sign is what occurs with an IPv6 based
>> tunnel with no outer checksum.  The hardware is not performing a
>> checksum on the inner headers so it reports it as a UDP frame with no
>> checksum to the stack which ends up preventing us from doing GRO.
>> That tells me that the hardware is not parsing IPv6 based tunnels on
>> Rx.  I am assuming that if the Rx side doesn't work then there is a
>> good chance that the Tx won't.
>>
>>>>>>    @@ -2431,7 +2435,18 @@ static netdev_features_t mlx4_en_features_check(struct sk_buff *skb,
>>>>>>                                                  netdev_features_t features)
>>>>>>    {
>>>>>>          features = vlan_features_check(skb, features);
>>>>>> -       return vxlan_features_check(skb, features);
>>>>>> +       features = vxlan_features_check(skb, features);
>>>>>> +
>>>>>> +       /* The ConnectX-3 doesn't support outer IPv6 checksums but it does
>>>>>> +        * support inner IPv6 checksums and segmentation so we need to
>>>>>> +        * strip that feature if this is an IPv6 encapsulated frame.
>>>>>> +        */
>>>>>> +       if (skb->encapsulation &&
>>>>>> +           (skb->ip_summed == CHECKSUM_PARTIAL) &&
>>>>>> +           (ip_hdr(skb)->version != 4))
>>>>>> +               features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
>>>>> Dejavu, didn't you fix this already in harmonize_features, in
>>>>> i.e, it is enough to do here:
>>>>>
>>>>> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL))
>>>>>              features &= ~NETIF_F_IPV6_CSUM;
>>>>>
>>>> So what this patch is doing is enabling an inner IPv6 header offloads.
>>>> Up above we set the NETIF_F_IPV6_CSUM bit and we want it to stay set
>>>> unless we have an outer IPv6 header because the inner headers may
>>>> still need that bit set.  If I did what you suggest it strips IPv6
>>>> checksum support for inner headers and if we have to use GSO partial I
>>>> ended up encountering some of the other bugs that I have fixed for GSO
>>>> partial where either sg or csum are not defined.
>>>>
>>> I see, you mean that you want to disable checksumming and GSO only for
>>> packets with Outer(IPv6):Inner(X) and keep it in case for
>>> Outer(IPv4):Inner(IPv6)
>>> but i think it is weird that the driver decides to disable features it
>>> didn't declare in first place (NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK)
>>>
>>> Retry:
>>>
>>> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL) &&
>>>      (ip_hdr(skb)->version != 4))
>>>              features &= ~NETIF_F_IPV6_CSUM;
>>>
>>> will this work ?
>> Sort of.  All that would happen is that you would fall through to
>> harmonize_features where NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK gets
>> cleared.  I just figured I would short-cut things since we cannot
>> support inner checksum or any GSO offloads if the tunnel has an outer
>> IPv6 header.  In addition this happens to effectively be the same code
>> I am using in vxlan_features_check to disable things if we cannot
>> checksum a protocol so it should help to keep the code size smaller
>> for the function if the compiler is smart enough to coalesce similar
>> code.
>>
>>> Anyway i prefer to debug the mlx4 issue first before we discuss the
>>> best approach to disable checksumming & GSO for outer IPv6 in mlx4.
>> The current code as-is already has it disabled.  All I am doing is
>> enabling IPv6 checksums for inner headers as it seems like it doesn't
>> work for outer headers.
>
> Hi Alex,
> I will be working on the mlx4 issue next week after the holidays.
> I will check this offline in-house, without blocking the series.
>
> Regards,
> Tariq
>

If it helps below are the kind of results I am seeing with the patch 
series applied versus what is currently there.  The big problem areas 
are IPv6 over IPv4 tunnels, and IPv4 over IPv6 tunnels without checksums.

The breakdown below is bare-ip-version without tunnel, outer-inner with 
tunnel, or outer-csum-inner if a checksum is present on the outer UDP 
header.

After the series is applied you can see the v6 over v4 issues are 
addressed, and GSO partial has improved the performance for traffic over 
v4 tunnels with outer UDP checksums.

Throughput Throughput  Local Local   Result
            Units       CPU   Service Tag
                        Util  Demand
                        %
mlx4 - Before
-------------------------------------------------
26616.45   10^6bits/s  3.62  0.357   "v4"
26101.18   10^6bits/s  6.72  0.675   "v6"
22289.41   10^6bits/s  6.49  0.764   "v4-v4"
N/A - could not connect              "v4-v6"
12743.91   10^6bits/s  4.25  0.874   "v4-csum-v4"
N/A - could not connect              "v4-csum-v6"
0.69       10^6bits/s  0.66  2519.1  "v6-v4"
5924.47    10^6bits/s  4.23  1.871   "v6-v6"
10369.95   10^6bits/s  4.33  1.096   "v6-csum-v4"
10648.51   10^6bits/s  4.10  1.010   "v6-csum-v6"

mlx4 - After
-------------------------------------------------
26585.36   10^6bits/s  3.60  0.355   "v4"
26342.86   10^6bits/s  6.67  0.664   "v6"
22295.93   10^6bits/s  6.34  0.746   "v4-v4"
19977.24   10^6bits/s  6.04  0.793   "v4-v6"
22164.71   10^6bits/s  6.46  0.763   "v4-csum-v4"
19685.22   10^6bits/s  6.12  0.815   "v4-csum-v6"
6126.80    10^6bits/s  4.29  1.835   "v6-v4"
5793.53    10^6bits/s  4.24  1.917   "v6-v6"
10278.52   10^6bits/s  4.07  1.039   "v6-csum-v4"
10526.68   10^6bits/s  4.11  1.024   "v6-csum-v6"
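
The feature-stripping check discussed in the quoted thread can be sketched outside the kernel as follows. The bit values and struct are hypothetical and are NOT the kernel's `NETIF_F_*` constants or `struct sk_buff`; the sketch only shows the shape of the decision: offloads are cleared when an encapsulated frame that needs checksum offload has an outer header that is not IPv4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the kernel's NETIF_F_* constants. */
#define F_IP_CSUM	(1u << 0)
#define F_IPV6_CSUM	(1u << 1)
#define F_TSO		(1u << 2)
#define F_TSO6		(1u << 3)
#define F_SG		(1u << 4)
#define F_CSUM_MASK	(F_IP_CSUM | F_IPV6_CSUM)
#define F_GSO_MASK	(F_TSO | F_TSO6)

/* Hypothetical summary of the packet properties the check looks at. */
struct pkt_sketch {
	bool encapsulation;		/* frame is tunnelled */
	bool csum_partial;		/* checksum offload requested */
	uint8_t outer_ip_version;	/* 4 or 6 */
};

/* Strip checksum and GSO offloads for tunnels with a non-IPv4 outer header. */
uint32_t features_check_sketch(const struct pkt_sketch *p, uint32_t features)
{
	if (p->encapsulation && p->csum_partial && p->outer_ip_version != 4)
		features &= ~(uint32_t)(F_CSUM_MASK | F_GSO_MASK);
	return features;
}
```

Because the test looks only at the outer header, an IPv6 payload inside an IPv4 tunnel keeps its offload bits, which is the case the series sets out to enable.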

^ permalink raw reply

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
From: Ben Hutchings @ 2016-04-27 18:07 UTC (permalink / raw)
  To: Ben Greear, linux-kernel, stable
  Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
	Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita
In-Reply-To: <5720E1F0.9010203@candelatech.com>

[-- Attachment #1: Type: text/plain, Size: 631 bytes --]

On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > 
> > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> I would be careful about this.  It causes regressions when sending
> PACKET_SOCKET buffers from user-space to veth devices.
> 
> There was a proposed upstream fix for the regression, but it has not gone
> into the tree as far as I know.
> 
> http://www.spinics.net/lists/netdev/msg370436.html
[...]

OK, I'll drop this for now.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 2/2] tcp: remove a redundant check for SKBTX_ACK_TSTAMP
From: Martin KaFai Lau @ 2016-04-27 18:19 UTC (permalink / raw)
  To: Soheil Hassas Yeganeh
  Cc: davem, netdev, willemb, edumazet, ycheng, ncardwell,
	Soheil Hassas Yeganeh
In-Reply-To: <1461617473-11349-2-git-send-email-soheil.kdev@gmail.com>

On Mon, Apr 25, 2016 at 04:51:13PM -0400, Soheil Hassas Yeganeh wrote:
> From: Soheil Hassas Yeganeh <soheil@google.com>
>
> txstamp_ack in tcp_skb_cb is set iff the SKBTX_ACK_TSTAMP
> flag is set for an skb. Thus, it is not required to check
> shinfo->tx_flags if the txstamp_ack bit is checked.
>
> Remove the check on shinfo->tx_flags & SKBTX_ACK_TSTAMP, since
> it has already been checked using the txstamp_ack bit.
>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---
>  net/ipv4/tcp_input.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 967520d..2f3fd92 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3087,8 +3087,7 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
>  		return;
>
>  	shinfo = skb_shinfo(skb);
> -	if ((shinfo->tx_flags & SKBTX_ACK_TSTAMP) &&
> -	    !before(shinfo->tskey, prior_snd_una) &&
> +	if (!before(shinfo->tskey, prior_snd_una) &&
>  	    before(shinfo->tskey, tcp_sk(sk)->snd_una))
>  		__skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
>  }
> --
> 2.8.0.rc3.226.g39d4020
>
Acked-by: Martin KaFai Lau <kafai@fb.com>

Can this go one step further and remove SKBTX_ACK_TSTAMP completely,
as Willem also suggested here:
http://www.spinics.net/lists/netdev/msg374231.html

It seems no one else is using the SKBTX_ACK_TSTAMP except TCP.

^ permalink raw reply

* [PATCH next] ipvlan: Fix failure path in dev registration during link creation
From: Mahesh Bandewar @ 2016-04-27 18:37 UTC (permalink / raw)
  To: David Miller; +Cc: Mahesh Bandewar, Eric Dumazet, netdev, Eric W . Biederman

From: Mahesh Bandewar <maheshb@google.com>

When newlink creation fails at device registration, port->count
is decremented twice. Francesco Ruggeri (fruggeri@arista.com) found
this issue in macvlan, and the same issue exists in the ipvlan driver.

While fixing this issue I noticed another issue of missing unregister
in case of failure, so adding it to the fix which is similar to the
macvlan fix by Francesco in SHA1:308379607548524b8d86dbf20134681024935e0b

Reported-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
---
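The double decrement described above can be modelled in userspace. This
is a toy sketch, not driver code: every toy_* name is hypothetical, and
it assumes that a registration failing after the init hook has run
causes the core to call the uninit hook, which already drops the count.

```c
#include <assert.h>
#include <stdio.h>

struct toy_port {
	int count;
};

/* Stand-in for the driver's uninit hook: drops the reference once. */
static void toy_uninit(struct toy_port *port)
{
	port->count -= 1;
}

/* Stand-in for a registration that fails after init has run. The core is
 * assumed to call the uninit hook before returning the error.
 */
static int toy_register_fails(struct toy_port *port)
{
	toy_uninit(port);
	return -1;
}

/* Old flow: count taken in link_new and dropped again on the error path. */
static int old_link_new(struct toy_port *port)
{
	int err;

	port->count += 1;
	err = toy_register_fails(port);
	if (err < 0) {
		port->count -= 1;	/* second decrement */
		return err;
	}
	return 0;
}

/* New flow: count taken where the init hook would take it, so only the
 * uninit hook drops it.
 */
static int new_link_new(struct toy_port *port)
{
	port->count += 1;
	return toy_register_fails(port);
}

int main(void)
{
	struct toy_port a = { .count = 1 };	/* one device already on the port */
	struct toy_port b = { .count = 1 };

	assert(old_link_new(&a) < 0 && a.count == 0);	/* existing user lost */
	assert(new_link_new(&b) < 0 && b.count == 1);	/* count restored */
	printf("old=%d new=%d\n", a.count, b.count);
	return 0;
}
```

With one device already on the port, the old flow ends at a count of
zero, which in the driver would make the port look unused.
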
 drivers/net/ipvlan/ipvlan_main.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 57941d3f4227..1c4d395fbd49 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -113,6 +113,7 @@ static int ipvlan_init(struct net_device *dev)
 {
 	struct ipvl_dev *ipvlan = netdev_priv(dev);
 	const struct net_device *phy_dev = ipvlan->phy_dev;
+	struct ipvl_port *port = ipvlan->port;
 
 	dev->state = (dev->state & ~IPVLAN_STATE_MASK) |
 		     (phy_dev->state & IPVLAN_STATE_MASK);
@@ -128,6 +129,8 @@ static int ipvlan_init(struct net_device *dev)
 	if (!ipvlan->pcpu_stats)
 		return -ENOMEM;
 
+	port->count += 1;
+
 	return 0;
 }
 
@@ -481,27 +484,21 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
 
 	dev->priv_flags |= IFF_IPVLAN_SLAVE;
 
-	port->count += 1;
 	err = register_netdevice(dev);
 	if (err < 0)
-		goto ipvlan_destroy_port;
+		return err;
 
 	err = netdev_upper_dev_link(phy_dev, dev);
-	if (err)
-		goto ipvlan_destroy_port;
+	if (err) {
+		unregister_netdevice(dev);
+		return err;
+	}
 
 	list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
 	ipvlan_set_port_mode(port, mode);
 
 	netif_stacked_transfer_operstate(phy_dev, dev);
 	return 0;
-
-ipvlan_destroy_port:
-	port->count -= 1;
-	if (!port->count)
-		ipvlan_port_destroy(phy_dev);
-
-	return err;
 }
 
 static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
-- 
2.8.0.rc3.226.g39d4020

* [PATCH net-next] net: dsa: Provide CPU port statistics to master netdev
From: Florian Fainelli @ 2016-04-27 18:45 UTC (permalink / raw)
  To: netdev; +Cc: davem, andrew, vivien.didelot, Florian Fainelli

This patch overloads the DSA master netdev, aka CPU Ethernet MAC to also
include switch-side statistics, which is useful for debugging purposes,
when the switch is not properly connected to the Ethernet MAC (duplex
mismatch, (RG)MII electrical issues etc.).

We accomplish this by retaining the original copy of the master netdev's
ethtool_ops, and just overload the 3 operations we care about:
get_sset_count, get_strings and get_ethtool_stats so as to intercept
these calls and call into the original master_netdev ethtool_ops, plus
our own.

We take this approach as opposed to providing a set of DSA helper
functions that would retrieve the CPU port's statistics, because the
entire purpose of DSA is to allow unmodified Ethernet MAC drivers to be
used as CPU conduit interfaces, therefore, statistics overlay in such
drivers would simply not scale.

The new ethtool -S <iface> output would therefore look like this now:
<iface> statistics
p<2 digits cpu port number>_<switch MIB counter names>

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes from RFC:

- prepend the CPU port as a prefix to make it clear what the stats are
  about, master netdev interface stats are unchanged

 include/net/dsa.h |  5 ++++
 net/dsa/slave.c   | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 2d280aba97e2..8e86af87c84f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -111,6 +111,11 @@ struct dsa_switch_tree {
 	enum dsa_tag_protocol	tag_protocol;
 
 	/*
+	 * Original copy of the master netdev ethtool_ops
+	 */
+	struct ethtool_ops	master_ethtool_ops;
+
+	/*
 	 * The switch and port to which the CPU is attached.
 	 */
 	s8			cpu_switch;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3b6750f5e68b..5ea8a40c8d33 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -666,6 +666,78 @@ static void dsa_slave_get_strings(struct net_device *dev,
 	}
 }
 
+static void dsa_cpu_port_get_ethtool_stats(struct net_device *dev,
+					   struct ethtool_stats *stats,
+					   uint64_t *data)
+{
+	struct dsa_switch_tree *dst = dev->dsa_ptr;
+	struct dsa_switch *ds = dst->ds[0];
+	s8 cpu_port = dst->cpu_port;
+	int count = 0;
+
+	if (dst->master_ethtool_ops.get_sset_count) {
+		count = dst->master_ethtool_ops.get_sset_count(dev,
+							       ETH_SS_STATS);
+		dst->master_ethtool_ops.get_ethtool_stats(dev, stats, data);
+	}
+
+	if (ds->drv->get_ethtool_stats)
+		ds->drv->get_ethtool_stats(ds, cpu_port, data + count);
+}
+
+static int dsa_cpu_port_get_sset_count(struct net_device *dev, int sset)
+{
+	struct dsa_switch_tree *dst = dev->dsa_ptr;
+	struct dsa_switch *ds = dst->ds[0];
+	int count = 0;
+
+	if (dst->master_ethtool_ops.get_sset_count)
+		count += dst->master_ethtool_ops.get_sset_count(dev, sset);
+
+	if (sset == ETH_SS_STATS && ds->drv->get_sset_count)
+		count += ds->drv->get_sset_count(ds);
+
+	return count;
+}
+
+static void dsa_cpu_port_get_strings(struct net_device *dev,
+				     uint32_t stringset, uint8_t *data)
+{
+	struct dsa_switch_tree *dst = dev->dsa_ptr;
+	struct dsa_switch *ds = dst->ds[0];
+	s8 cpu_port = dst->cpu_port;
+	int len = ETH_GSTRING_LEN;
+	int mcount = 0, count;
+	unsigned int i;
+	uint8_t pfx[4];
+	uint8_t *ndata;
+
+	snprintf(pfx, sizeof(pfx), "p%.2d", cpu_port);
+	/* We do not want to be NULL-terminated, since this is a prefix */
+	pfx[sizeof(pfx) - 1] = '_';
+
+	if (dst->master_ethtool_ops.get_sset_count) {
+		mcount = dst->master_ethtool_ops.get_sset_count(dev,
+								ETH_SS_STATS);
+		dst->master_ethtool_ops.get_strings(dev, stringset, data);
+	}
+
+	if (stringset == ETH_SS_STATS && ds->drv->get_strings) {
+		ndata = data + mcount * len;
+		/* This function copies ETH_GSTRINGS_LEN bytes, we will mangle
+		 * the output after to prepend our CPU port prefix we
+		 * constructed earlier
+		 */
+		ds->drv->get_strings(ds, cpu_port, ndata);
+		count = ds->drv->get_sset_count(ds);
+		for (i = 0; i < count; i++) {
+			memmove(ndata + (i * len + sizeof(pfx)),
+				ndata + i * len, len - sizeof(pfx));
+			memcpy(ndata + i * len, pfx, sizeof(pfx));
+		}
+	}
+}
+
 static void dsa_slave_get_ethtool_stats(struct net_device *dev,
 					struct ethtool_stats *stats,
 					uint64_t *data)
@@ -821,6 +893,8 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = {
 	.get_eee		= dsa_slave_get_eee,
 };
 
+static struct ethtool_ops dsa_cpu_port_ethtool_ops;
+
 static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_open	 	= dsa_slave_open,
 	.ndo_stop		= dsa_slave_close,
@@ -1038,6 +1112,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 		     int port, char *name)
 {
 	struct net_device *master = ds->dst->master_netdev;
+	struct dsa_switch_tree *dst = ds->dst;
 	struct net_device *slave_dev;
 	struct dsa_slave_priv *p;
 	int ret;
@@ -1049,6 +1124,19 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 
 	slave_dev->features = master->vlan_features;
 	slave_dev->ethtool_ops = &dsa_slave_ethtool_ops;
+	if (master->ethtool_ops != &dsa_cpu_port_ethtool_ops) {
+		memcpy(&dst->master_ethtool_ops, master->ethtool_ops,
+		       sizeof(struct ethtool_ops));
+		memcpy(&dsa_cpu_port_ethtool_ops, &dst->master_ethtool_ops,
+		       sizeof(struct ethtool_ops));
+		dsa_cpu_port_ethtool_ops.get_sset_count =
+					dsa_cpu_port_get_sset_count;
+		dsa_cpu_port_ethtool_ops.get_ethtool_stats =
+					dsa_cpu_port_get_ethtool_stats;
+		dsa_cpu_port_ethtool_ops.get_strings =
+					dsa_cpu_port_get_strings;
+		master->ethtool_ops = &dsa_cpu_port_ethtool_ops;
+	}
 	eth_hw_addr_inherit(slave_dev, master);
 	slave_dev->priv_flags |= IFF_NO_QUEUE;
 	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
-- 
2.1.0

* Re: [PATCH v2] net: Add Qualcomm IPC router
From: David Miller @ 2016-04-27 18:55 UTC (permalink / raw)
  To: bjorn.andersson
  Cc: linux-kernel, netdev, linux-arm-msm, courtney.cavin,
	bjorn.andersson
In-Reply-To: <20160427171445.GO3202@tuxbot>

From: Bjorn Andersson <bjorn.andersson@linaro.org>
Date: Wed, 27 Apr 2016 10:14:45 -0700

> On Wed 27 Apr 09:22 PDT 2016, David Miller wrote:
> 
>> This code can now only be build tested on ARCH_QCOM architectures, and
>> that's a serious negative downside.
> 
> For normal usage the QRTR_SMD doesn't make much sense to be selectable
> unless QCOM_SMD is compiled in, but I can fix up the QCOM_SMD exports
> and slap a COMPILE_TEST on it.
> 
> 
> Looking at it again, we already have the conditional for QRTR and the OF
> code in the driver went away a while back, so we're down to something
> like:
> 
> 	depends on QCOM_SMD || COMPILE_TEST

If that's enough to make it work, feel free to respin this as a series:
one patch that does the Kconfig bits, then the patch that adds the IPC
protocol.

* Re: [PATCH next] ipvlan: Fix failure path in dev registration during link creation
From: David Miller @ 2016-04-27 18:57 UTC (permalink / raw)
  To: mahesh; +Cc: maheshb, edumazet, netdev, ebiederm
In-Reply-To: <1461782259-10806-1-git-send-email-mahesh@bandewar.net>

From: Mahesh Bandewar <mahesh@bandewar.net>
Date: Wed, 27 Apr 2016 11:37:39 -0700

> While fixing this issue I noticed another issue of missing unregister
> in case of failure, so adding it to the fix which is similar to the
> macvlan fix by Francesco in SHA1:308379607548524b8d86dbf20134681024935e0b

This is not the correct way to refer to commits.

You should specify exactly 12 digits of the SHA1 value, followed by
a space, followed by the header line text of that commit contained in
parentheses and double quotes, the way Fixes: tags specify commits.

* [PATCH net-next] tcp: do not block bh during prequeue processing
From: Eric Dumazet @ 2016-04-27 18:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

AFAIK, nothing in the current TCP stack absolutely needs BH
to be disabled once the socket is owned by a thread running in
process context.

As mentioned in my prior patch ("tcp: give prequeue mode some care"),
processing a batch of packets might take time, so it is better not to
block BH at all.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c       |    4 ----
 net/ipv4/tcp_input.c |   30 ++----------------------------
 2 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4d73858991af..7a0f6f738155 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1445,12 +1445,8 @@ static void tcp_prequeue_process(struct sock *sk)
 
 	NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPPREQUEUED);
 
-	/* RX process wants to run with disabled BHs, though it is not
-	 * necessary */
-	local_bh_disable();
 	while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
 		sk_backlog_rcv(sk, skb);
-	local_bh_enable();
 
 	/* Clear memory counter. */
 	tp->ucopy.memory = 0;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 967520dbe0bf..2a0d19e0044f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4608,14 +4608,12 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 
 			__set_current_state(TASK_RUNNING);
 
-			local_bh_enable();
 			if (!skb_copy_datagram_msg(skb, 0, tp->ucopy.msg, chunk)) {
 				tp->ucopy.len -= chunk;
 				tp->copied_seq += chunk;
 				eaten = (chunk == skb->len);
 				tcp_rcv_space_adjust(sk);
 			}
-			local_bh_disable();
 		}
 
 		if (eaten <= 0) {
@@ -5131,7 +5129,6 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 	int chunk = skb->len - hlen;
 	int err;
 
-	local_bh_enable();
 	if (skb_csum_unnecessary(skb))
 		err = skb_copy_datagram_msg(skb, hlen, tp->ucopy.msg, chunk);
 	else
@@ -5143,32 +5140,9 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 		tcp_rcv_space_adjust(sk);
 	}
 
-	local_bh_disable();
 	return err;
 }
 
-static __sum16 __tcp_checksum_complete_user(struct sock *sk,
-					    struct sk_buff *skb)
-{
-	__sum16 result;
-
-	if (sock_owned_by_user(sk)) {
-		local_bh_enable();
-		result = __tcp_checksum_complete(skb);
-		local_bh_disable();
-	} else {
-		result = __tcp_checksum_complete(skb);
-	}
-	return result;
-}
-
-static inline bool tcp_checksum_complete_user(struct sock *sk,
-					     struct sk_buff *skb)
-{
-	return !skb_csum_unnecessary(skb) &&
-	       __tcp_checksum_complete_user(sk, skb);
-}
-
 /* Does PAWS and seqno based validation of an incoming segment, flags will
  * play significant role here.
  */
@@ -5382,7 +5356,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 				}
 			}
 			if (!eaten) {
-				if (tcp_checksum_complete_user(sk, skb))
+				if (tcp_checksum_complete(skb))
 					goto csum_error;
 
 				if ((int)skb->truesize > sk->sk_forward_alloc)
@@ -5426,7 +5400,7 @@ no_ack:
 	}
 
 slow_path:
-	if (len < (th->doff << 2) || tcp_checksum_complete_user(sk, skb))
+	if (len < (th->doff << 2) || tcp_checksum_complete(skb))
 		goto csum_error;
 
 	if (!th->ack && !th->rst && !th->syn)

* Re: [PATCH net-next] net: dsa: Provide CPU port statistics to master netdev
From: Andrew Lunn @ 2016-04-27 19:03 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem, vivien.didelot
In-Reply-To: <1461782714-13471-1-git-send-email-f.fainelli@gmail.com>

> +	if (stringset == ETH_SS_STATS && ds->drv->get_strings) {
> +		ndata = data + mcount * len;
> +		/* This function copies ETH_GSTRINGS_LEN bytes, we will mangle
> +		 * the output after to prepend our CPU port prefix we
> +		 * constructed earlier
> +		 */
> +		ds->drv->get_strings(ds, cpu_port, ndata);
> +		count = ds->drv->get_sset_count(ds);
> +		for (i = 0; i < count; i++) {
> +			memmove(ndata + (i * len + sizeof(pfx)),
> +				ndata + i * len, len - sizeof(pfx));
> +			memcpy(ndata + i * len, pfx, sizeof(pfx));

Hi Florian

Did you check what happens if this causes the NULL terminator to be
discarded? Does ethtool handle that? As I said before, it is unclear
if one is required.

	   Andrew

* [PATCH v3 1/2] soc: qcom: smd: Introduce compile stubs
From: Bjorn Andersson @ 2016-04-27 19:13 UTC (permalink / raw)
  To: David S. Miller, Andy Gross; +Cc: linux-kernel, netdev, linux-arm-msm

Introduce compile stubs for the SMD API, allowing consumers to be
compile tested.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---

Changes since v2:
- Introduce this patch, to allow compile testing of QRTR_SMD

 include/linux/soc/qcom/smd.h | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/soc/qcom/smd.h b/include/linux/soc/qcom/smd.h
index d0cb6d189a0a..46a984f5e3a3 100644
--- a/include/linux/soc/qcom/smd.h
+++ b/include/linux/soc/qcom/smd.h
@@ -45,13 +45,39 @@ struct qcom_smd_driver {
 	int (*callback)(struct qcom_smd_device *, const void *, size_t);
 };
 
+#if IS_ENABLED(CONFIG_QCOM_SMD)
+
 int qcom_smd_driver_register(struct qcom_smd_driver *drv);
 void qcom_smd_driver_unregister(struct qcom_smd_driver *drv);
 
+int qcom_smd_send(struct qcom_smd_channel *channel, const void *data, int len);
+
+#else
+
+static inline int qcom_smd_driver_register(struct qcom_smd_driver *drv)
+{
+	return -ENXIO;
+}
+
+static inline void qcom_smd_driver_unregister(struct qcom_smd_driver *drv)
+{
+	/* This shouldn't be possible */
+	WARN_ON(1);
+}
+
+static inline int qcom_smd_send(struct qcom_smd_channel *channel,
+				const void *data, int len)
+{
+	/* This shouldn't be possible */
+	WARN_ON(1);
+	return -ENXIO;
+}
+
+#endif
+
 #define module_qcom_smd_driver(__smd_driver) \
 	module_driver(__smd_driver, qcom_smd_driver_register, \
 		      qcom_smd_driver_unregister)
 
-int qcom_smd_send(struct qcom_smd_channel *channel, const void *data, int len);
 
 #endif
-- 
2.5.0

* [PATCH v3 2/2] net: Add Qualcomm IPC router
From: Bjorn Andersson @ 2016-04-27 19:13 UTC (permalink / raw)
  To: David S. Miller, Andy Gross
  Cc: linux-kernel, netdev, linux-arm-msm, Courtney Cavin,
	Bjorn Andersson
In-Reply-To: <1461784383-2978-1-git-send-email-bjorn.andersson@linaro.org>

From: Courtney Cavin <courtney.cavin@sonymobile.com>

Add an implementation of Qualcomm's IPC router protocol, used to
communicate with service providing remote processors.

Signed-off-by: Courtney Cavin <courtney.cavin@sonymobile.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
[bjorn: Cope with 0 being a valid node id and implement RTM_NEWADDR]
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---

Changes since v2:
- Altered Kconfig dependency for QRTR_SMD to be compile testable

Changes since v1:
- Made node 0 (normally the Qualcomm modem) a valid node
- Implemented RTM_NEWADDR for specifying the local node id

 include/linux/socket.h    |    4 +-
 include/uapi/linux/qrtr.h |   12 +
 net/Kconfig               |    1 +
 net/Makefile              |    1 +
 net/qrtr/Kconfig          |   24 ++
 net/qrtr/Makefile         |    2 +
 net/qrtr/qrtr.c           | 1007 +++++++++++++++++++++++++++++++++++++++++++++
 net/qrtr/qrtr.h           |   31 ++
 net/qrtr/smd.c            |  117 ++++++
 9 files changed, 1198 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/qrtr.h
 create mode 100644 net/qrtr/Kconfig
 create mode 100644 net/qrtr/Makefile
 create mode 100644 net/qrtr/qrtr.c
 create mode 100644 net/qrtr/qrtr.h
 create mode 100644 net/qrtr/smd.c

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 73bf6c6a833b..b5cc5a6d7011 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -201,8 +201,9 @@ struct ucred {
 #define AF_NFC		39	/* NFC sockets			*/
 #define AF_VSOCK	40	/* vSockets			*/
 #define AF_KCM		41	/* Kernel Connection Multiplexor*/
+#define AF_QIPCRTR	42	/* Qualcomm IPC Router          */
 
-#define AF_MAX		42	/* For now.. */
+#define AF_MAX		43	/* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC	AF_UNSPEC
@@ -249,6 +250,7 @@ struct ucred {
 #define PF_NFC		AF_NFC
 #define PF_VSOCK	AF_VSOCK
 #define PF_KCM		AF_KCM
+#define PF_QIPCRTR	AF_QIPCRTR
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/uapi/linux/qrtr.h b/include/uapi/linux/qrtr.h
new file mode 100644
index 000000000000..66c0748d26e2
--- /dev/null
+++ b/include/uapi/linux/qrtr.h
@@ -0,0 +1,12 @@
+#ifndef _LINUX_QRTR_H
+#define _LINUX_QRTR_H
+
+#include <linux/socket.h>
+
+struct sockaddr_qrtr {
+	__kernel_sa_family_t sq_family;
+	__u32 sq_node;
+	__u32 sq_port;
+};
+
+#endif /* _LINUX_QRTR_H */
diff --git a/net/Kconfig b/net/Kconfig
index a8934d8c8fda..b841c42e5c9b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -236,6 +236,7 @@ source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
 source "net/switchdev/Kconfig"
 source "net/l3mdev/Kconfig"
+source "net/qrtr/Kconfig"
 
 config RPS
 	bool
diff --git a/net/Makefile b/net/Makefile
index 81d14119eab5..bdd14553a774 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -78,3 +78,4 @@ endif
 ifneq ($(CONFIG_NET_L3_MASTER_DEV),)
 obj-y				+= l3mdev/
 endif
+obj-$(CONFIG_QRTR)		+= qrtr/
diff --git a/net/qrtr/Kconfig b/net/qrtr/Kconfig
new file mode 100644
index 000000000000..0c2619d068bd
--- /dev/null
+++ b/net/qrtr/Kconfig
@@ -0,0 +1,24 @@
+# Qualcomm IPC Router configuration
+#
+
+config QRTR
+	bool "Qualcomm IPC Router support"
+	depends on ARCH_QCOM || COMPILE_TEST
+	---help---
+	  Say Y if you intend to use Qualcomm IPC router protocol.  The
+	  protocol is used to communicate with services provided by other
+	  hardware blocks in the system.
+
+	  In order to do service lookups, a userspace daemon is required to
+	  maintain a service listing.
+
+if QRTR
+
+config QRTR_SMD
+	tristate "SMD IPC Router channels"
+	depends on QCOM_SMD || COMPILE_TEST
+	---help---
+	  Say Y here to support SMD based ipcrouter channels.  SMD is the
+	  most common transport for IPC Router.
+
+endif # QRTR
diff --git a/net/qrtr/Makefile b/net/qrtr/Makefile
new file mode 100644
index 000000000000..e282a84ffc5c
--- /dev/null
+++ b/net/qrtr/Makefile
@@ -0,0 +1,2 @@
+obj-y := qrtr.o
+obj-$(CONFIG_QRTR_SMD) += smd.o
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
new file mode 100644
index 000000000000..c985ecbe9bd6
--- /dev/null
+++ b/net/qrtr/qrtr.c
@@ -0,0 +1,1007 @@
+/*
+ * Copyright (c) 2015, Sony Mobile Communications Inc.
+ * Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/qrtr.h>
+#include <linux/termios.h>	/* For TIOCINQ/OUTQ */
+
+#include <net/sock.h>
+
+#include "qrtr.h"
+
+#define QRTR_PROTO_VER 1
+
+/* auto-bind range */
+#define QRTR_MIN_EPH_SOCKET 0x4000
+#define QRTR_MAX_EPH_SOCKET 0x7fff
+
+enum qrtr_pkt_type {
+	QRTR_TYPE_DATA		= 1,
+	QRTR_TYPE_HELLO		= 2,
+	QRTR_TYPE_BYE		= 3,
+	QRTR_TYPE_NEW_SERVER	= 4,
+	QRTR_TYPE_DEL_SERVER	= 5,
+	QRTR_TYPE_DEL_CLIENT	= 6,
+	QRTR_TYPE_RESUME_TX	= 7,
+	QRTR_TYPE_EXIT		= 8,
+	QRTR_TYPE_PING		= 9,
+};
+
+/**
+ * struct qrtr_hdr - (I|R)PCrouter packet header
+ * @version: protocol version
+ * @type: packet type; one of QRTR_TYPE_*
+ * @src_node_id: source node
+ * @src_port_id: source port
+ * @confirm_rx: boolean; whether a resume-tx packet should be sent in reply
+ * @size: length of packet, excluding this header
+ * @dst_node_id: destination node
+ * @dst_port_id: destination port
+ */
+struct qrtr_hdr {
+	__le32 version;
+	__le32 type;
+	__le32 src_node_id;
+	__le32 src_port_id;
+	__le32 confirm_rx;
+	__le32 size;
+	__le32 dst_node_id;
+	__le32 dst_port_id;
+} __packed;
+
+#define QRTR_HDR_SIZE sizeof(struct qrtr_hdr)
+#define QRTR_NODE_BCAST ((unsigned int)-1)
+#define QRTR_PORT_CTRL ((unsigned int)-2)
+
+struct qrtr_sock {
+	/* WARNING: sk must be the first member */
+	struct sock sk;
+	struct sockaddr_qrtr us;
+	struct sockaddr_qrtr peer;
+};
+
+static inline struct qrtr_sock *qrtr_sk(struct sock *sk)
+{
+	BUILD_BUG_ON(offsetof(struct qrtr_sock, sk) != 0);
+	return container_of(sk, struct qrtr_sock, sk);
+}
+
+static unsigned int qrtr_local_nid = -1;
+
+/* for node ids */
+static RADIX_TREE(qrtr_nodes, GFP_KERNEL);
+/* broadcast list */
+static LIST_HEAD(qrtr_all_nodes);
+/* lock for qrtr_nodes, qrtr_all_nodes and node reference */
+static DEFINE_MUTEX(qrtr_node_lock);
+
+/* local port allocation management */
+static DEFINE_IDR(qrtr_ports);
+static DEFINE_MUTEX(qrtr_port_lock);
+
+/**
+ * struct qrtr_node - endpoint node
+ * @ep_lock: lock for endpoint management and callbacks
+ * @ep: endpoint
+ * @ref: reference count for node
+ * @nid: node id
+ * @rx_queue: receive queue
+ * @work: scheduled work struct for recv work
+ * @item: list item for broadcast list
+ */
+struct qrtr_node {
+	struct mutex ep_lock;
+	struct qrtr_endpoint *ep;
+	struct kref ref;
+	unsigned int nid;
+
+	struct sk_buff_head rx_queue;
+	struct work_struct work;
+	struct list_head item;
+};
+
+/* Release node resources and free the node.
+ *
+ * Do not call directly, use qrtr_node_release.  To be used with
+ * kref_put_mutex.  As such, the node mutex is expected to be locked on call.
+ */
+static void __qrtr_node_release(struct kref *kref)
+{
+	struct qrtr_node *node = container_of(kref, struct qrtr_node, ref);
+
+	if (node->nid != QRTR_EP_NID_AUTO)
+		radix_tree_delete(&qrtr_nodes, node->nid);
+
+	list_del(&node->item);
+	mutex_unlock(&qrtr_node_lock);
+
+	skb_queue_purge(&node->rx_queue);
+	kfree(node);
+}
+
+/* Increment reference to node. */
+static struct qrtr_node *qrtr_node_acquire(struct qrtr_node *node)
+{
+	if (node)
+		kref_get(&node->ref);
+	return node;
+}
+
+/* Decrement reference to node and release as necessary. */
+static void qrtr_node_release(struct qrtr_node *node)
+{
+	if (!node)
+		return;
+	kref_put_mutex(&node->ref, __qrtr_node_release, &qrtr_node_lock);
+}
+
+/* Pass an outgoing packet socket buffer to the endpoint driver. */
+static int qrtr_node_enqueue(struct qrtr_node *node, struct sk_buff *skb)
+{
+	int rc = -ENODEV;
+
+	mutex_lock(&node->ep_lock);
+	if (node->ep)
+		rc = node->ep->xmit(node->ep, skb);
+	else
+		kfree_skb(skb);
+	mutex_unlock(&node->ep_lock);
+
+	return rc;
+}
+
+/* Lookup node by id.
+ *
+ * callers must release with qrtr_node_release()
+ */
+static struct qrtr_node *qrtr_node_lookup(unsigned int nid)
+{
+	struct qrtr_node *node;
+
+	mutex_lock(&qrtr_node_lock);
+	node = radix_tree_lookup(&qrtr_nodes, nid);
+	node = qrtr_node_acquire(node);
+	mutex_unlock(&qrtr_node_lock);
+
+	return node;
+}
+
+/* Assign node id to node.
+ *
+ * This is mostly useful for automatic node id assignment, based on
+ * the source id in the incoming packet.
+ */
+static void qrtr_node_assign(struct qrtr_node *node, unsigned int nid)
+{
+	if (node->nid != QRTR_EP_NID_AUTO || nid == QRTR_EP_NID_AUTO)
+		return;
+
+	mutex_lock(&qrtr_node_lock);
+	radix_tree_insert(&qrtr_nodes, nid, node);
+	node->nid = nid;
+	mutex_unlock(&qrtr_node_lock);
+}
+
+/**
+ * qrtr_endpoint_post() - post incoming data
+ * @ep: endpoint handle
+ * @data: data pointer
+ * @len: size of data in bytes
+ *
+ * Return: 0 on success; negative error code on failure
+ */
+int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len)
+{
+	struct qrtr_node *node = ep->node;
+	const struct qrtr_hdr *phdr = data;
+	struct sk_buff *skb;
+	unsigned int psize;
+	unsigned int size;
+	unsigned int type;
+	unsigned int ver;
+	unsigned int dst;
+
+	if (len < QRTR_HDR_SIZE || len & 3)
+		return -EINVAL;
+
+	ver = le32_to_cpu(phdr->version);
+	size = le32_to_cpu(phdr->size);
+	type = le32_to_cpu(phdr->type);
+	dst = le32_to_cpu(phdr->dst_port_id);
+
+	psize = (size + 3) & ~3;
+
+	if (ver != QRTR_PROTO_VER)
+		return -EINVAL;
+
+	if (len != psize + QRTR_HDR_SIZE)
+		return -EINVAL;
+
+	if (dst != QRTR_PORT_CTRL && type != QRTR_TYPE_DATA)
+		return -EINVAL;
+
+	skb = netdev_alloc_skb(NULL, len);
+	if (!skb)
+		return -ENOMEM;
+
+	skb_reset_transport_header(skb);
+	memcpy(skb_put(skb, len), data, len);
+
+	skb_queue_tail(&node->rx_queue, skb);
+	schedule_work(&node->work);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(qrtr_endpoint_post);
+
+/* Allocate and construct a resume-tx packet. */
+static struct sk_buff *qrtr_alloc_resume_tx(u32 src_node,
+					    u32 dst_node, u32 port)
+{
+	const int pkt_len = 20;
+	struct qrtr_hdr *hdr;
+	struct sk_buff *skb;
+	u32 *buf;
+
+	skb = alloc_skb(QRTR_HDR_SIZE + pkt_len, GFP_KERNEL);
+	if (!skb)
+		return NULL;
+	skb_reset_transport_header(skb);
+
+	hdr = (struct qrtr_hdr *)skb_put(skb, QRTR_HDR_SIZE);
+	hdr->version = cpu_to_le32(QRTR_PROTO_VER);
+	hdr->type = cpu_to_le32(QRTR_TYPE_RESUME_TX);
+	hdr->src_node_id = cpu_to_le32(src_node);
+	hdr->src_port_id = cpu_to_le32(QRTR_PORT_CTRL);
+	hdr->confirm_rx = cpu_to_le32(0);
+	hdr->size = cpu_to_le32(pkt_len);
+	hdr->dst_node_id = cpu_to_le32(dst_node);
+	hdr->dst_port_id = cpu_to_le32(QRTR_PORT_CTRL);
+
+	buf = (u32 *)skb_put(skb, pkt_len);
+	memset(buf, 0, pkt_len);
+	buf[0] = cpu_to_le32(QRTR_TYPE_RESUME_TX);
+	buf[1] = cpu_to_le32(src_node);
+	buf[2] = cpu_to_le32(port);
+
+	return skb;
+}
+
+static struct qrtr_sock *qrtr_port_lookup(int port);
+static void qrtr_port_put(struct qrtr_sock *ipc);
+
+/* Handle and route a received packet.
+ *
+ * This will auto-reply with resume-tx packet as necessary.
+ */
+static void qrtr_node_rx_work(struct work_struct *work)
+{
+	struct qrtr_node *node = container_of(work, struct qrtr_node, work);
+	struct sk_buff *skb;
+
+	while ((skb = skb_dequeue(&node->rx_queue)) != NULL) {
+		const struct qrtr_hdr *phdr;
+		u32 dst_node, dst_port;
+		struct qrtr_sock *ipc;
+		u32 src_node;
+		int confirm;
+
+		phdr = (const struct qrtr_hdr *)skb_transport_header(skb);
+		src_node = le32_to_cpu(phdr->src_node_id);
+		dst_node = le32_to_cpu(phdr->dst_node_id);
+		dst_port = le32_to_cpu(phdr->dst_port_id);
+		confirm = !!phdr->confirm_rx;
+
+		qrtr_node_assign(node, src_node);
+
+		ipc = qrtr_port_lookup(dst_port);
+		if (!ipc) {
+			kfree_skb(skb);
+		} else {
+			if (sock_queue_rcv_skb(&ipc->sk, skb))
+				kfree_skb(skb);
+
+			qrtr_port_put(ipc);
+		}
+
+		if (confirm) {
+			skb = qrtr_alloc_resume_tx(dst_node, node->nid, dst_port);
+			if (!skb)
+				break;
+			if (qrtr_node_enqueue(node, skb))
+				break;
+		}
+	}
+}
+
+/**
+ * qrtr_endpoint_register() - register a new endpoint
+ * @ep: endpoint to register
+ * @nid: desired node id; may be QRTR_EP_NID_AUTO for auto-assignment
+ * Return: 0 on success; negative error code on failure
+ *
+ * The specified endpoint must have the xmit function pointer set on call.
+ */
+int qrtr_endpoint_register(struct qrtr_endpoint *ep, unsigned int nid)
+{
+	struct qrtr_node *node;
+
+	if (!ep || !ep->xmit)
+		return -EINVAL;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	INIT_WORK(&node->work, qrtr_node_rx_work);
+	kref_init(&node->ref);
+	mutex_init(&node->ep_lock);
+	skb_queue_head_init(&node->rx_queue);
+	node->nid = QRTR_EP_NID_AUTO;
+	node->ep = ep;
+
+	qrtr_node_assign(node, nid);
+
+	mutex_lock(&qrtr_node_lock);
+	list_add(&node->item, &qrtr_all_nodes);
+	mutex_unlock(&qrtr_node_lock);
+	ep->node = node;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(qrtr_endpoint_register);
+
+/**
+ * qrtr_endpoint_unregister - unregister endpoint
+ * @ep: endpoint to unregister
+ */
+void qrtr_endpoint_unregister(struct qrtr_endpoint *ep)
+{
+	struct qrtr_node *node = ep->node;
+
+	mutex_lock(&node->ep_lock);
+	node->ep = NULL;
+	mutex_unlock(&node->ep_lock);
+
+	qrtr_node_release(node);
+	ep->node = NULL;
+}
+EXPORT_SYMBOL_GPL(qrtr_endpoint_unregister);
+
+/* Lookup socket by port.
+ *
+ * Callers must release with qrtr_port_put()
+ */
+static struct qrtr_sock *qrtr_port_lookup(int port)
+{
+	struct qrtr_sock *ipc;
+
+	if (port == QRTR_PORT_CTRL)
+		port = 0;
+
+	mutex_lock(&qrtr_port_lock);
+	ipc = idr_find(&qrtr_ports, port);
+	if (ipc)
+		sock_hold(&ipc->sk);
+	mutex_unlock(&qrtr_port_lock);
+
+	return ipc;
+}
+
+/* Release acquired socket. */
+static void qrtr_port_put(struct qrtr_sock *ipc)
+{
+	sock_put(&ipc->sk);
+}
+
+/* Remove port assignment. */
+static void qrtr_port_remove(struct qrtr_sock *ipc)
+{
+	int port = ipc->us.sq_port;
+
+	if (port == QRTR_PORT_CTRL)
+		port = 0;
+
+	__sock_put(&ipc->sk);
+
+	mutex_lock(&qrtr_port_lock);
+	idr_remove(&qrtr_ports, port);
+	mutex_unlock(&qrtr_port_lock);
+}
+
+/* Assign port number to socket.
+ *
+ * Specify port in the integer pointed to by port, and it will be adjusted
+ * on return as necessary.
+ *
+ * Port may be:
+ *   0: Assign ephemeral port in [QRTR_MIN_EPH_SOCKET, QRTR_MAX_EPH_SOCKET]
+ *   <QRTR_MIN_EPH_SOCKET: Specified; requires CAP_NET_ADMIN
+ *   >=QRTR_MIN_EPH_SOCKET: Specified; available to all
+ */
+static int qrtr_port_assign(struct qrtr_sock *ipc, int *port)
+{
+	int rc;
+
+	mutex_lock(&qrtr_port_lock);
+	if (!*port) {
+		rc = idr_alloc(&qrtr_ports, ipc,
+			       QRTR_MIN_EPH_SOCKET, QRTR_MAX_EPH_SOCKET + 1,
+			       GFP_ATOMIC);
+		if (rc >= 0)
+			*port = rc;
+	} else if (*port < QRTR_MIN_EPH_SOCKET && !capable(CAP_NET_ADMIN)) {
+		rc = -EACCES;
+	} else if (*port == QRTR_PORT_CTRL) {
+		rc = idr_alloc(&qrtr_ports, ipc, 0, 1, GFP_ATOMIC);
+	} else {
+		rc = idr_alloc(&qrtr_ports, ipc, *port, *port + 1, GFP_ATOMIC);
+		if (rc >= 0)
+			*port = rc;
+	}
+	mutex_unlock(&qrtr_port_lock);
+
+	if (rc == -ENOSPC)
+		return -EADDRINUSE;
+	else if (rc < 0)
+		return rc;
+
+	sock_hold(&ipc->sk);
+
+	return 0;
+}
+
+/* Bind socket to address.
+ *
+ * Socket should be locked upon call.
+ */
+static int __qrtr_bind(struct socket *sock,
+		       const struct sockaddr_qrtr *addr, int zapped)
+{
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	int port;
+	int rc;
+
+	/* rebinding ok */
+	if (!zapped && addr->sq_port == ipc->us.sq_port)
+		return 0;
+
+	port = addr->sq_port;
+	rc = qrtr_port_assign(ipc, &port);
+	if (rc)
+		return rc;
+
+	/* unbind previous, if any */
+	if (!zapped)
+		qrtr_port_remove(ipc);
+	ipc->us.sq_port = port;
+
+	sock_reset_flag(sk, SOCK_ZAPPED);
+
+	return 0;
+}
+
+/* Auto bind to an ephemeral port. */
+static int qrtr_autobind(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct sockaddr_qrtr addr;
+
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		return 0;
+
+	addr.sq_family = AF_QIPCRTR;
+	addr.sq_node = qrtr_local_nid;
+	addr.sq_port = 0;
+
+	return __qrtr_bind(sock, &addr, 1);
+}
+
+/* Bind socket to specified sockaddr. */
+static int qrtr_bind(struct socket *sock, struct sockaddr *saddr, int len)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, saddr);
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	int rc;
+
+	if (len < sizeof(*addr) || addr->sq_family != AF_QIPCRTR)
+		return -EINVAL;
+
+	if (addr->sq_node != ipc->us.sq_node)
+		return -EINVAL;
+
+	lock_sock(sk);
+	rc = __qrtr_bind(sock, addr, sock_flag(sk, SOCK_ZAPPED));
+	release_sock(sk);
+
+	return rc;
+}
+
+/* Queue packet to local peer socket. */
+static int qrtr_local_enqueue(struct qrtr_node *node, struct sk_buff *skb)
+{
+	const struct qrtr_hdr *phdr;
+	struct qrtr_sock *ipc;
+
+	phdr = (const struct qrtr_hdr *)skb_transport_header(skb);
+
+	ipc = qrtr_port_lookup(le32_to_cpu(phdr->dst_port_id));
+	if (!ipc || &ipc->sk == skb->sk) { /* do not send to self */
+		kfree_skb(skb);
+		return -ENODEV;
+	}
+
+	if (sock_queue_rcv_skb(&ipc->sk, skb)) {
+		qrtr_port_put(ipc);
+		kfree_skb(skb);
+		return -ENOSPC;
+	}
+
+	qrtr_port_put(ipc);
+
+	return 0;
+}
+
+/* Queue packet for broadcast. */
+static int qrtr_bcast_enqueue(struct qrtr_node *node, struct sk_buff *skb)
+{
+	struct sk_buff *skbn;
+
+	mutex_lock(&qrtr_node_lock);
+	list_for_each_entry(node, &qrtr_all_nodes, item) {
+		skbn = skb_clone(skb, GFP_KERNEL);
+		if (!skbn)
+			break;
+		skb_set_owner_w(skbn, skb->sk);
+		qrtr_node_enqueue(node, skbn);
+	}
+	mutex_unlock(&qrtr_node_lock);
+
+	qrtr_local_enqueue(node, skb);
+
+	return 0;
+}
+
+static int qrtr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, msg->msg_name);
+	int (*enqueue_fn)(struct qrtr_node *, struct sk_buff *);
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	struct qrtr_node *node;
+	struct qrtr_hdr *hdr;
+	struct sk_buff *skb;
+	size_t plen;
+	int rc;
+
+	if (msg->msg_flags & ~(MSG_DONTWAIT))
+		return -EINVAL;
+
+	if (len > 65535)
+		return -EMSGSIZE;
+
+	lock_sock(sk);
+
+	if (addr) {
+		if (msg->msg_namelen < sizeof(*addr)) {
+			release_sock(sk);
+			return -EINVAL;
+		}
+
+		if (addr->sq_family != AF_QIPCRTR) {
+			release_sock(sk);
+			return -EINVAL;
+		}
+
+		rc = qrtr_autobind(sock);
+		if (rc) {
+			release_sock(sk);
+			return rc;
+		}
+	} else if (sk->sk_state == TCP_ESTABLISHED) {
+		addr = &ipc->peer;
+	} else {
+		release_sock(sk);
+		return -ENOTCONN;
+	}
+
+	node = NULL;
+	if (addr->sq_node == QRTR_NODE_BCAST) {
+		enqueue_fn = qrtr_bcast_enqueue;
+	} else if (addr->sq_node == ipc->us.sq_node) {
+		enqueue_fn = qrtr_local_enqueue;
+	} else {
+		enqueue_fn = qrtr_node_enqueue;
+		node = qrtr_node_lookup(addr->sq_node);
+		if (!node) {
+			release_sock(sk);
+			return -ECONNRESET;
+		}
+	}
+
+	plen = (len + 3) & ~3;
+	skb = sock_alloc_send_skb(sk, plen + QRTR_HDR_SIZE,
+				  msg->msg_flags & MSG_DONTWAIT, &rc);
+	if (!skb)
+		goto out_node;
+
+	skb_reset_transport_header(skb);
+	skb_put(skb, len + QRTR_HDR_SIZE);
+
+	hdr = (struct qrtr_hdr *)skb_transport_header(skb);
+	hdr->version = cpu_to_le32(QRTR_PROTO_VER);
+	hdr->src_node_id = cpu_to_le32(ipc->us.sq_node);
+	hdr->src_port_id = cpu_to_le32(ipc->us.sq_port);
+	hdr->confirm_rx = cpu_to_le32(0);
+	hdr->size = cpu_to_le32(len);
+	hdr->dst_node_id = cpu_to_le32(addr->sq_node);
+	hdr->dst_port_id = cpu_to_le32(addr->sq_port);
+
+	rc = skb_copy_datagram_from_iter(skb, QRTR_HDR_SIZE,
+					 &msg->msg_iter, len);
+	if (rc) {
+		kfree_skb(skb);
+		goto out_node;
+	}
+
+	if (plen != len) {
+		skb_pad(skb, plen - len);
+		skb_put(skb, plen - len);
+	}
+
+	if (ipc->us.sq_port == QRTR_PORT_CTRL) {
+		if (len < 4) {
+			rc = -EINVAL;
+			kfree_skb(skb);
+			goto out_node;
+		}
+
+		/* control messages already require the type as 'command' */
+		skb_copy_bits(skb, QRTR_HDR_SIZE, &hdr->type, 4);
+	} else {
+		hdr->type = cpu_to_le32(QRTR_TYPE_DATA);
+	}
+
+	rc = enqueue_fn(node, skb);
+	if (rc >= 0)
+		rc = len;
+
+out_node:
+	qrtr_node_release(node);
+	release_sock(sk);
+
+	return rc;
+}
+
+static int qrtr_recvmsg(struct socket *sock, struct msghdr *msg,
+			size_t size, int flags)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, msg->msg_name);
+	const struct qrtr_hdr *phdr;
+	struct sock *sk = sock->sk;
+	struct sk_buff *skb;
+	int copied, rc;
+
+	lock_sock(sk);
+
+	if (sock_flag(sk, SOCK_ZAPPED)) {
+		release_sock(sk);
+		return -EADDRNOTAVAIL;
+	}
+
+	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
+				flags & MSG_DONTWAIT, &rc);
+	if (!skb) {
+		release_sock(sk);
+		return rc;
+	}
+
+	phdr = (const struct qrtr_hdr *)skb_transport_header(skb);
+	copied = le32_to_cpu(phdr->size);
+	if (copied > size) {
+		copied = size;
+		msg->msg_flags |= MSG_TRUNC;
+	}
+
+	rc = skb_copy_datagram_msg(skb, QRTR_HDR_SIZE, msg, copied);
+	if (rc < 0)
+		goto out;
+	rc = copied;
+
+	if (addr) {
+		addr->sq_family = AF_QIPCRTR;
+		addr->sq_node = le32_to_cpu(phdr->src_node_id);
+		addr->sq_port = le32_to_cpu(phdr->src_port_id);
+		msg->msg_namelen = sizeof(*addr);
+	}
+
+out:
+	skb_free_datagram(sk, skb);
+	release_sock(sk);
+
+	return rc;
+}
+
+static int qrtr_connect(struct socket *sock, struct sockaddr *saddr,
+			int len, int flags)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, saddr);
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	int rc;
+
+	if (len < sizeof(*addr) || addr->sq_family != AF_QIPCRTR)
+		return -EINVAL;
+
+	lock_sock(sk);
+
+	sk->sk_state = TCP_CLOSE;
+	sock->state = SS_UNCONNECTED;
+
+	rc = qrtr_autobind(sock);
+	if (rc) {
+		release_sock(sk);
+		return rc;
+	}
+
+	ipc->peer = *addr;
+	sock->state = SS_CONNECTED;
+	sk->sk_state = TCP_ESTABLISHED;
+
+	release_sock(sk);
+
+	return 0;
+}
+
+static int qrtr_getname(struct socket *sock, struct sockaddr *saddr,
+			int *len, int peer)
+{
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sockaddr_qrtr qaddr;
+	struct sock *sk = sock->sk;
+
+	lock_sock(sk);
+	if (peer) {
+		if (sk->sk_state != TCP_ESTABLISHED) {
+			release_sock(sk);
+			return -ENOTCONN;
+		}
+
+		qaddr = ipc->peer;
+	} else {
+		qaddr = ipc->us;
+	}
+	release_sock(sk);
+
+	*len = sizeof(qaddr);
+	qaddr.sq_family = AF_QIPCRTR;
+
+	memcpy(saddr, &qaddr, sizeof(qaddr));
+
+	return 0;
+}
+
+static int qrtr_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	struct sockaddr_qrtr *sq;
+	struct sk_buff *skb;
+	struct ifreq ifr;
+	long len = 0;
+	int rc = 0;
+
+	lock_sock(sk);
+
+	switch (cmd) {
+	case TIOCOUTQ:
+		len = sk->sk_sndbuf - sk_wmem_alloc_get(sk);
+		if (len < 0)
+			len = 0;
+		rc = put_user(len, (int __user *)argp);
+		break;
+	case TIOCINQ:
+		skb = skb_peek(&sk->sk_receive_queue);
+		if (skb)
+			len = skb->len - QRTR_HDR_SIZE;
+		rc = put_user(len, (int __user *)argp);
+		break;
+	case SIOCGIFADDR:
+		if (copy_from_user(&ifr, argp, sizeof(ifr))) {
+			rc = -EFAULT;
+			break;
+		}
+
+		sq = (struct sockaddr_qrtr *)&ifr.ifr_addr;
+		*sq = ipc->us;
+		if (copy_to_user(argp, &ifr, sizeof(ifr))) {
+			rc = -EFAULT;
+			break;
+		}
+		break;
+	case SIOCGSTAMP:
+		rc = sock_get_timestamp(sk, argp);
+		break;
+	case SIOCADDRT:
+	case SIOCDELRT:
+	case SIOCSIFADDR:
+	case SIOCGIFDSTADDR:
+	case SIOCSIFDSTADDR:
+	case SIOCGIFBRDADDR:
+	case SIOCSIFBRDADDR:
+	case SIOCGIFNETMASK:
+	case SIOCSIFNETMASK:
+		rc = -EINVAL;
+		break;
+	default:
+		rc = -ENOIOCTLCMD;
+		break;
+	}
+
+	release_sock(sk);
+
+	return rc;
+}
+
+static int qrtr_release(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct qrtr_sock *ipc;
+
+	if (!sk)
+		return 0;
+
+	lock_sock(sk);
+
+	ipc = qrtr_sk(sk);
+	sk->sk_shutdown = SHUTDOWN_MASK;
+	if (!sock_flag(sk, SOCK_DEAD))
+		sk->sk_state_change(sk);
+
+	sock_set_flag(sk, SOCK_DEAD);
+	sock->sk = NULL;
+
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		qrtr_port_remove(ipc);
+
+	skb_queue_purge(&sk->sk_receive_queue);
+
+	release_sock(sk);
+	sock_put(sk);
+
+	return 0;
+}
+
+static const struct proto_ops qrtr_proto_ops = {
+	.owner		= THIS_MODULE,
+	.family		= AF_QIPCRTR,
+	.bind		= qrtr_bind,
+	.connect	= qrtr_connect,
+	.socketpair	= sock_no_socketpair,
+	.accept		= sock_no_accept,
+	.listen		= sock_no_listen,
+	.sendmsg	= qrtr_sendmsg,
+	.recvmsg	= qrtr_recvmsg,
+	.getname	= qrtr_getname,
+	.ioctl		= qrtr_ioctl,
+	.poll		= datagram_poll,
+	.shutdown	= sock_no_shutdown,
+	.setsockopt	= sock_no_setsockopt,
+	.getsockopt	= sock_no_getsockopt,
+	.release	= qrtr_release,
+	.mmap		= sock_no_mmap,
+	.sendpage	= sock_no_sendpage,
+};
+
+static struct proto qrtr_proto = {
+	.name		= "QIPCRTR",
+	.owner		= THIS_MODULE,
+	.obj_size	= sizeof(struct qrtr_sock),
+};
+
+static int qrtr_create(struct net *net, struct socket *sock,
+		       int protocol, int kern)
+{
+	struct qrtr_sock *ipc;
+	struct sock *sk;
+
+	if (sock->type != SOCK_DGRAM)
+		return -EPROTOTYPE;
+
+	sk = sk_alloc(net, AF_QIPCRTR, GFP_KERNEL, &qrtr_proto, kern);
+	if (!sk)
+		return -ENOMEM;
+
+	sock_set_flag(sk, SOCK_ZAPPED);
+
+	sock_init_data(sock, sk);
+	sock->ops = &qrtr_proto_ops;
+
+	ipc = qrtr_sk(sk);
+	ipc->us.sq_family = AF_QIPCRTR;
+	ipc->us.sq_node = qrtr_local_nid;
+	ipc->us.sq_port = 0;
+
+	return 0;
+}
+
+static const struct nla_policy qrtr_policy[IFA_MAX + 1] = {
+	[IFA_LOCAL] = { .type = NLA_U32 },
+};
+
+static int qrtr_addr_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct nlattr *tb[IFA_MAX + 1];
+	struct ifaddrmsg *ifm;
+	int rc;
+
+	if (!netlink_capable(skb, CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!netlink_capable(skb, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	ASSERT_RTNL();
+
+	rc = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, qrtr_policy);
+	if (rc < 0)
+		return rc;
+
+	ifm = nlmsg_data(nlh);
+	if (!tb[IFA_LOCAL])
+		return -EINVAL;
+
+	qrtr_local_nid = nla_get_u32(tb[IFA_LOCAL]);
+	return 0;
+}
+
+static const struct net_proto_family qrtr_family = {
+	.owner	= THIS_MODULE,
+	.family	= AF_QIPCRTR,
+	.create	= qrtr_create,
+};
+
+static int __init qrtr_proto_init(void)
+{
+	int rc;
+
+	rc = proto_register(&qrtr_proto, 1);
+	if (rc)
+		return rc;
+
+	rc = sock_register(&qrtr_family);
+	if (rc) {
+		proto_unregister(&qrtr_proto);
+		return rc;
+	}
+
+	rtnl_register(PF_QIPCRTR, RTM_NEWADDR, qrtr_addr_doit, NULL, NULL);
+
+	return 0;
+}
+module_init(qrtr_proto_init);
+
+static void __exit qrtr_proto_fini(void)
+{
+	rtnl_unregister(PF_QIPCRTR, RTM_NEWADDR);
+	sock_unregister(qrtr_family.family);
+	proto_unregister(&qrtr_proto);
+}
+module_exit(qrtr_proto_fini);
+
+MODULE_DESCRIPTION("Qualcomm IPC-router driver");
+MODULE_LICENSE("GPL v2");
diff --git a/net/qrtr/qrtr.h b/net/qrtr/qrtr.h
new file mode 100644
index 000000000000..2b848718f8fe
--- /dev/null
+++ b/net/qrtr/qrtr.h
@@ -0,0 +1,31 @@
+#ifndef __QRTR_H_
+#define __QRTR_H_
+
+#include <linux/types.h>
+
+struct sk_buff;
+
+/* endpoint node id auto assignment */
+#define QRTR_EP_NID_AUTO (-1)
+
+/**
+ * struct qrtr_endpoint - endpoint handle
+ * @xmit: Callback for outgoing packets
+ *
+ * The socket buffer passed to the xmit function becomes owned by the endpoint
+ * driver.  As such, when the driver is done with the buffer, it should
+ * call kfree_skb() on failure, or consume_skb() on success.
+ */
+struct qrtr_endpoint {
+	int (*xmit)(struct qrtr_endpoint *ep, struct sk_buff *skb);
+	/* private: not for endpoint use */
+	struct qrtr_node *node;
+};
+
+int qrtr_endpoint_register(struct qrtr_endpoint *ep, unsigned int nid);
+
+void qrtr_endpoint_unregister(struct qrtr_endpoint *ep);
+
+int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len);
+
+#endif
diff --git a/net/qrtr/smd.c b/net/qrtr/smd.c
new file mode 100644
index 000000000000..84ebce73aa23
--- /dev/null
+++ b/net/qrtr/smd.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2015, Sony Mobile Communications Inc.
+ * Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/soc/qcom/smd.h>
+
+#include "qrtr.h"
+
+struct qrtr_smd_dev {
+	struct qrtr_endpoint ep;
+	struct qcom_smd_channel *channel;
+};
+
+/* from smd to qrtr */
+static int qcom_smd_qrtr_callback(struct qcom_smd_device *sdev,
+				  const void *data, size_t len)
+{
+	struct qrtr_smd_dev *qdev = dev_get_drvdata(&sdev->dev);
+	int rc;
+
+	if (!qdev)
+		return -EAGAIN;
+
+	rc = qrtr_endpoint_post(&qdev->ep, data, len);
+	if (rc == -EINVAL) {
+		dev_err(&sdev->dev, "invalid ipcrouter packet\n");
+		/* return 0 to let smd drop the packet */
+		rc = 0;
+	}
+
+	return rc;
+}
+
+/* from qrtr to smd */
+static int qcom_smd_qrtr_send(struct qrtr_endpoint *ep, struct sk_buff *skb)
+{
+	struct qrtr_smd_dev *qdev = container_of(ep, struct qrtr_smd_dev, ep);
+	int rc;
+
+	rc = skb_linearize(skb);
+	if (rc)
+		goto out;
+
+	rc = qcom_smd_send(qdev->channel, skb->data, skb->len);
+
+out:
+	if (rc)
+		kfree_skb(skb);
+	else
+		consume_skb(skb);
+	return rc;
+}
+
+static int qcom_smd_qrtr_probe(struct qcom_smd_device *sdev)
+{
+	struct qrtr_smd_dev *qdev;
+	int rc;
+
+	qdev = devm_kzalloc(&sdev->dev, sizeof(*qdev), GFP_KERNEL);
+	if (!qdev)
+		return -ENOMEM;
+
+	qdev->channel = sdev->channel;
+	qdev->ep.xmit = qcom_smd_qrtr_send;
+
+	rc = qrtr_endpoint_register(&qdev->ep, QRTR_EP_NID_AUTO);
+	if (rc)
+		return rc;
+
+	dev_set_drvdata(&sdev->dev, qdev);
+
+	dev_dbg(&sdev->dev, "Qualcomm SMD QRTR driver probed\n");
+
+	return 0;
+}
+
+static void qcom_smd_qrtr_remove(struct qcom_smd_device *sdev)
+{
+	struct qrtr_smd_dev *qdev = dev_get_drvdata(&sdev->dev);
+
+	qrtr_endpoint_unregister(&qdev->ep);
+
+	dev_set_drvdata(&sdev->dev, NULL);
+}
+
+static const struct qcom_smd_id qcom_smd_qrtr_smd_match[] = {
+	{ "IPCRTR" },
+	{}
+};
+
+static struct qcom_smd_driver qcom_smd_qrtr_driver = {
+	.probe = qcom_smd_qrtr_probe,
+	.remove = qcom_smd_qrtr_remove,
+	.callback = qcom_smd_qrtr_callback,
+	.smd_match_table = qcom_smd_qrtr_smd_match,
+	.driver = {
+		.name = "qcom_smd_qrtr",
+		.owner = THIS_MODULE,
+	},
+};
+
+module_qcom_smd_driver(qcom_smd_qrtr_driver);
+
+MODULE_DESCRIPTION("Qualcomm IPC-Router SMD interface driver");
+MODULE_LICENSE("GPL v2");
-- 
2.5.0
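
For readers following the port handling in qrtr_port_assign() and the length padding in qrtr_sendmsg(), here is a minimal userspace sketch; it is not part of the patch. The two range constants are defined earlier in qrtr.c, outside this excerpt, so the values 0x4000 and 0x7fff below are assumptions, and port_class() and padded_len() are hypothetical helper names used only for illustration.

```c
#include <stddef.h>

/* Assumed values; the real definitions live earlier in qrtr.c. */
#define QRTR_MIN_EPH_SOCKET 0x4000
#define QRTR_MAX_EPH_SOCKET 0x7fff

enum port_class {
	PORT_EPHEMERAL,		/* 0: a port is picked from the ephemeral range */
	PORT_PRIVILEGED,	/* below the ephemeral range: needs CAP_NET_ADMIN */
	PORT_OPEN,		/* everything else: available to all */
};

/* Hypothetical helper mirroring the branch order in qrtr_port_assign(). */
static enum port_class port_class(int port)
{
	if (port == 0)
		return PORT_EPHEMERAL;
	if (port < QRTR_MIN_EPH_SOCKET)
		return PORT_PRIVILEGED;
	return PORT_OPEN;
}

/* Round a payload length up to a multiple of 4, as qrtr_sendmsg() does. */
static size_t padded_len(size_t len)
{
	return (len + 3) & ~(size_t)3;
}
```

With these assumed constants, a 5-byte payload is padded to 8 bytes on the wire while hdr->size still carries 5, and a request for port 0x3fff would need CAP_NET_ADMIN whereas 0x4000 would not.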

* Re: [RFC PATCH net] ipv6/ila: fix nlsize calculation for lwtunnel
From: David Miller @ 2016-04-27 19:20 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, tom
In-Reply-To: <1461340682-31568-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 22 Apr 2016 17:58:02 +0200

> The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.
> 
> Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
> CC: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> Tom, when I read the comment, I feel I'm missing something, but what?

Tom, seriously, please look at this.

And with recent changes in net-next the csum attribute size needs to
be specified as well, plus the locator needs to use the 64-bit
alignment sizing helper.
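
To make the sizing concrete, a rough userspace sketch follows. It re-implements the arithmetic of the kernel's nla_total_size() and nla_total_size_64bit() helpers as I understand them; every name prefixed with sketch_ or SKETCH_ is hypothetical, and the u8 payload assumed for the csum attribute is an assumption, not something stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NLA_ALIGNTO 4
#define SKETCH_NLA_ALIGN(len) \
	(((len) + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1))
/* Attribute header: 2 bytes length + 2 bytes type, already aligned. */
#define SKETCH_NLA_HDRLEN SKETCH_NLA_ALIGN((size_t)4)

/* Header plus payload, rounded up to the netlink alignment. */
static size_t sketch_nla_total_size(size_t payload)
{
	return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload);
}

/*
 * 64-bit attributes may need an extra empty padding attribute in front.
 * needs_pad models architectures without efficient unaligned access.
 */
static size_t sketch_nla_total_size_64bit(size_t payload, int needs_pad)
{
	size_t size = sketch_nla_total_size(payload);

	if (needs_pad)
		size += sketch_nla_total_size(0);
	return size;
}

/* Space to reserve for the encap info: locator plus csum mode. */
static size_t sketch_ila_encap_nlsize(int needs_pad)
{
	return sketch_nla_total_size_64bit(sizeof(uint64_t), needs_pad) + /* locator */
	       sketch_nla_total_size(sizeof(uint8_t));                    /* csum mode (assumed u8) */
}
```

Under those assumptions the reservation is 20 bytes where no padding attribute is needed and 24 bytes where it is, so sizing for the locator alone would come up short.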

* Re: [PATCH net-next 2/2] tcp: remove a redundant check for SKBTX_ACK_TSTAMP
From: Soheil Hassas Yeganeh @ 2016-04-27 19:25 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Soheil Hassas Yeganeh, David Miller, netdev, Willem de Bruijn,
	Eric Dumazet, Yuchung Cheng, Neal Cardwell
In-Reply-To: <20160427181938.GA54673@kafai-mba.local>

On Wed, Apr 27, 2016 at 2:19 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Mon, Apr 25, 2016 at 04:51:13PM -0400, Soheil Hassas Yeganeh wrote:
>> From: Soheil Hassas Yeganeh <soheil@google.com>
>>
>> txstamp_ack in tcp_skb_cb is set iff the SKBTX_ACK_TSTAMP
>> flag is set for an skb. Thus, it is not required to check
>> shinfo->tx_flags if the txstamp_ack bit is checked.
>>
>> Remove the check on shinfo->tx_flags & SKBTX_ACK_TSTAMP, since
>> it has already been checked using the txstamp_ack bit.
>>
>> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
>> ---
>>  net/ipv4/tcp_input.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 967520d..2f3fd92 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3087,8 +3087,7 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
>>               return;
>>
>>       shinfo = skb_shinfo(skb);
>> -     if ((shinfo->tx_flags & SKBTX_ACK_TSTAMP) &&
>> -         !before(shinfo->tskey, prior_snd_una) &&
>> +     if (!before(shinfo->tskey, prior_snd_una) &&
>>           before(shinfo->tskey, tcp_sk(sk)->snd_una))
>>               __skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
>>  }
>> --
>> 2.8.0.rc3.226.g39d4020
>>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
>
> Can it be one step further and completely remove SKBTX_ACK_TSTAMP?
> like what Willem has also suggested here:
> http://www.spinics.net/lists/netdev/msg374231.html
>
> It seems no one else is using the SKBTX_ACK_TSTAMP except TCP.

Ah, good point. Will update the patch then. Thanks!

* Re: [PATCH v3 1/2] soc: qcom: smd: Introduce compile stubs
From: Andy Gross @ 2016-04-27 19:32 UTC (permalink / raw)
  To: Bjorn Andersson; +Cc: David S. Miller, linux-kernel, netdev, linux-arm-msm
In-Reply-To: <1461784383-2978-1-git-send-email-bjorn.andersson@linaro.org>

On 27 April 2016 at 14:13, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> Introduce compile stubs for the SMD API, allowing consumers to be
> compile tested.
>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
>
> Changes since v2:
> - Introduce this patch, to allow compile testing of QRTR_SMD
>
>  include/linux/soc/qcom/smd.h | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>

Looks ok.

Acked-by: Andy Gross <andy.gross@linaro.org>
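
For anyone unfamiliar with the pattern, a compile stub looks roughly like the sketch below. This is a userspace illustration only, not the contents of the patch: SKETCH_HAVE_SMD stands in for the real Kconfig test, the sketch_ names are hypothetical, and the -ENXIO return value is an assumption.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_channel;	/* opaque to consumers */

#ifdef SKETCH_HAVE_SMD
/* Real implementation is provided by the transport driver. */
int sketch_smd_send(struct sketch_channel *ch, const void *data, size_t len);
#else
/* Stub: lets consumers build when the transport is not configured. */
static inline int sketch_smd_send(struct sketch_channel *ch,
				  const void *data, size_t len)
{
	(void)ch;
	(void)data;
	(void)len;
	return -ENXIO;	/* should never be reached at run time */
}
#endif

/* A consumer compiles unchanged against either variant. */
static int sketch_consumer_xmit(struct sketch_channel *ch,
				const void *buf, size_t len)
{
	return sketch_smd_send(ch, buf, len);
}
```

The consumer code is identical in both configurations; only the header decides whether the call resolves to the real function or to the inline stub.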

* Re: [PATCH v2 net-next] net: ipv6: Use passed in table for nexthop lookups
From: David Miller @ 2016-04-27 19:37 UTC (permalink / raw)
  To: dsa; +Cc: netdev
In-Reply-To: <1461558364-17970-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Sun, 24 Apr 2016 21:26:04 -0700

> Similar to 3bfd847203c6 ("net: Use passed in table for nexthop lookups")
> for IPv4, if the route spec contains a table id use that to lookup the
> next hop first and fall back to a full lookup if it fails (per the fix
> 4c9bcd117918b ("net: Fix nexthop lookups")).
 ...
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Applied, thanks David.

* Re: [PATCH 1/5] phylib: don't return NULL from get_phy_device()
From: Andrew Lunn @ 2016-04-27 19:49 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: David S. Miller, Sergei Shtylyov, f.fainelli, arnd, netdev,
	kernel
In-Reply-To: <874mamzw5q.fsf@ketchup.mtl.sfl>

On Wed, Apr 27, 2016 at 03:30:57PM -0400, Vivien Didelot wrote:
> Hi David, All,
> 
> Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes:
> 
> > Arnd Bergmann asked that get_phy_device() returns either NULL or the error
> > value,  not both on error.  Do as he said, return ERR_PTR(-ENODEV) instead
> > of NULL when the PHY ID registers read as  all ones.
> >
> > Suggested-by: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> >
> > ---
> >  drivers/net/phy/phy_device.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: net-next/drivers/net/phy/phy_device.c
> > ===================================================================
> > --- net-next.orig/drivers/net/phy/phy_device.c
> > +++ net-next/drivers/net/phy/phy_device.c
> > @@ -529,7 +529,7 @@ struct phy_device *get_phy_device(struct
> >  
> >  	/* If the phy_id is mostly Fs, there is no device there */
> >  	if ((phy_id & 0x1fffffff) == 0x1fffffff)
> > -		return NULL;
> > +		return ERR_PTR(-ENODEV);
> >  
> >  	return phy_device_create(bus, addr, phy_id, is_c45, &c45_ids);
> >  }

This change is wrong, it needs reverting, or the call sites need
fixing to expect ENODEV.

The point is, the device not being there is not an error, with respect
to the code calling this function.

It gets called by mdiobus_scan()

struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr)
{
        struct phy_device *phydev;
        int err;

        phydev = get_phy_device(bus, addr, false);
        if (IS_ERR(phydev) || phydev == NULL)
                return phydev;

So before, we return NULL, if the device was not there. Now we return
ERR_PTR(-ENODEV).

This is being called by:

int __mdiobus_register(struct mii_bus *bus, struct module *owner)
{
        struct mdio_device *mdiodev;
...
        for (i = 0; i < PHY_MAX_ADDR; i++) {
                if ((bus->phy_mask & (1 << i)) == 0) {
                        struct phy_device *phydev;

                        phydev = mdiobus_scan(bus, i);
                        if (IS_ERR(phydev)) {
                                err = PTR_ERR(phydev);
                                goto error;
                        }
                }
        }

This is treating ERR_PTR(-ENODEV) as a fatal error, whereas before
IS_ERR(NULL) would be false and it would continue scanning other
addresses on the bus.

Please revert this, or fix all the callsites such that ENODEV is not a
fatal error.
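
A minimal userspace sketch of the second option, assuming the scan loop simply skips -ENODEV and aborts on anything else. The sketch_ helpers imitate ERR_PTR(), PTR_ERR() and IS_ERR(); the masks and function names are hypothetical and only simulate which addresses answer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095
#define SKETCH_MAX_ADDR 32

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_phy { int addr; };
static struct sketch_phy sketch_phys[SKETCH_MAX_ADDR];

/*
 * Simulated scan of one address: bits in fail_mask model a bus read
 * error, bits in present_mask model a PHY that answers.
 */
static struct sketch_phy *sketch_scan(uint32_t present_mask,
				      uint32_t fail_mask, int addr)
{
	if (fail_mask & (1u << addr))
		return sketch_err_ptr(-EIO);
	if (!(present_mask & (1u << addr)))
		return sketch_err_ptr(-ENODEV);	/* nothing at this address */
	sketch_phys[addr].addr = addr;
	return &sketch_phys[addr];
}

/* Returns the number of PHYs found, or a negative errno on a real error. */
static int sketch_register(uint32_t present_mask, uint32_t fail_mask)
{
	int found = 0;

	for (int i = 0; i < SKETCH_MAX_ADDR; i++) {
		struct sketch_phy *phy = sketch_scan(present_mask, fail_mask, i);

		if (sketch_is_err(phy)) {
			if (sketch_ptr_err(phy) == -ENODEV)
				continue;	/* empty address is not fatal */
			return (int)sketch_ptr_err(phy);
		}
		found++;
	}
	return found;
}
```

With that check in place, an empty address no longer aborts registration, while a genuine bus error still does.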

	     Andrew

* Re: [PATCH net-next] tcp: do not block bh during prequeue processing
From: Eric Dumazet @ 2016-04-27 19:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1461783594.5535.84.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 2016-04-27 at 11:59 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> AFAIK, nothing in current TCP stack absolutely wants BH
> being disabled once socket is owned by a thread running in
> process context.
> 
> As mentioned in my prior patch ("tcp: give prequeue mode some care"),
> processing a batch of packets might take time, better not block BH
> at all.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Scratch that. We have some TCP_INC_STATS_BH() to take care of before.

* Re: [PATCH 1/5] phylib: don't return NULL from get_phy_device()
From: Sergei Shtylyov @ 2016-04-27 20:09 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot
  Cc: David S. Miller, f.fainelli, arnd, netdev, kernel
In-Reply-To: <20160427194932.GF29024@lunn.ch>

Hello.

On 04/27/2016 10:49 PM, Andrew Lunn wrote:

>> Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes:
>>
>>> Arnd Bergmann asked that get_phy_device() returns either NULL or the error
>>> value,  not both on error.  Do as he said, return ERR_PTR(-ENODEV) instead
>>> of NULL when the PHY ID registers read as  all ones.
>>>
>>> Suggested-by: Arnd Bergmann <arnd@arndb.de>
>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>>
>>> ---
>>>   drivers/net/phy/phy_device.c |    2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Index: net-next/drivers/net/phy/phy_device.c
>>> ===================================================================
>>> --- net-next.orig/drivers/net/phy/phy_device.c
>>> +++ net-next/drivers/net/phy/phy_device.c
>>> @@ -529,7 +529,7 @@ struct phy_device *get_phy_device(struct
>>>
>>>   	/* If the phy_id is mostly Fs, there is no device there */
>>>   	if ((phy_id & 0x1fffffff) == 0x1fffffff)
>>> -		return NULL;
>>> +		return ERR_PTR(-ENODEV);
>>>
>>>   	return phy_device_create(bus, addr, phy_id, is_c45, &c45_ids);
>>>   }
>
> This change is wrong, it needs reverting, or the call sites need
> fixing to expect ENODEV.

    So this function had a good reason to return NULL, as it turned out... :-(

> The point is, the device not being there is not an error, with respect
> to the code calling this function.
>
> It gets called by mdiobus_scan()
>
> struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr)
> {
>          struct phy_device *phydev;
>          int err;
>
>          phydev = get_phy_device(bus, addr, false);
>          if (IS_ERR(phydev) || phydev == NULL)
>                  return phydev;
>
> So before, we return NULL, if the device was not there. Now we return
> ERR_PTR(-ENODEV).
>
> This is being called by:
>
> int __mdiobus_register(struct mii_bus *bus, struct module *owner)
> {
>          struct mdio_device *mdiodev;
> ...
>          for (i = 0; i < PHY_MAX_ADDR; i++) {
>                  if ((bus->phy_mask & (1 << i)) == 0) {
>                          struct phy_device *phydev;
>
>                          phydev = mdiobus_scan(bus, i);
>                          if (IS_ERR(phydev)) {
>                                  err = PTR_ERR(phydev);
>                                  goto error;
>                          }
>                  }
>          }
>
> This is treating ERR_PTR(-ENODEV) as a fatal error, whereas before
> IS_ERR(NULL) would be false and it would continue scanning other
> addresses on the bus.

    Thank you for the detailed analysis! (And shame on me for the lack of it.)

> Please revert this, or fix all the callsites such that ENODEV is not a
> fatal error.

    OK, I'll do what DaveM decides.

> 	     Andrew

MBR, Sergei

* Re: [PATCH 1/5] phylib: don't return NULL from get_phy_device()
From: Sergei Shtylyov @ 2016-04-27 20:12 UTC (permalink / raw)
  To: Vivien Didelot, David S. Miller, f.fainelli
  Cc: arnd, Andrew Lunn, netdev, kernel
In-Reply-To: <874mamzw5q.fsf@ketchup.mtl.sfl>

Hello.

On 04/27/2016 10:30 PM, Vivien Didelot wrote:

>> Arnd Bergmann asked that get_phy_device() returns either NULL or the error
>> value,  not both on error.  Do as he said, return ERR_PTR(-ENODEV) instead
>> of NULL when the PHY ID registers read as  all ones.
>>
>> Suggested-by: Arnd Bergmann <arnd@arndb.de>
>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>
>> ---
>>   drivers/net/phy/phy_device.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Index: net-next/drivers/net/phy/phy_device.c
>> ===================================================================
>> --- net-next.orig/drivers/net/phy/phy_device.c
>> +++ net-next/drivers/net/phy/phy_device.c
>> @@ -529,7 +529,7 @@ struct phy_device *get_phy_device(struct
>>
>>   	/* If the phy_id is mostly Fs, there is no device there */
>>   	if ((phy_id & 0x1fffffff) == 0x1fffffff)
>> -		return NULL;
>> +		return ERR_PTR(-ENODEV);
>>
>>   	return phy_device_create(bus, addr, phy_id, is_c45, &c45_ids);
>>   }
>
> This particular commit, merged as:
>
>      b74766a0a0fe ("phylib: don't return NULL from get_phy_device()")
>
> breaks my 3-switch DSA setup with the following error:
>
>      fec: probe of 400d1000.ethernet failed with error -22
>
> Reverting c971c0e580a6 ("Merge branch 'get_phy_device-retval'") restores
> a working setup.

    I think I was able to follow this to the get_phy_device() call in 
fixed_phy_register() but I'm unable to see why it fails now and didn't before. 
Are you using fixed_phy.c at all?

> Thanks,
>
>          Vivien

MBR, Sergei
