Netdev List
* [PATCH 0/3] fix reuseaddr regression
From: Josef Bacik @ 2017-09-23  0:20 UTC (permalink / raw)
  To: davem, netdev, kernel-team, linux-kernel

I introduced a regression when reworking the fastreuse port stuff that allows
bind conflicts to occur once a reuseaddr socket successfully opens on an
existing tb.  The root cause is that I reversed an if statement, which caused
us to set up the tb as if it had no owners when it actually did, which is
obviously not correct.

Dave, could you please queue these changes up for -stable?  I've run them
through the net tests and added another test to check for this problem
specifically.
Thanks,

Josef

^ permalink raw reply

* [PATCH 1/3] net: set tb->fast_sk_family
From: Josef Bacik @ 2017-09-23  0:20 UTC (permalink / raw)
  To: davem, netdev, kernel-team, linux-kernel; +Cc: Josef Bacik
In-Reply-To: <1506126008-9148-1-git-send-email-josef@toxicpanda.com>

From: Josef Bacik <jbacik@fb.com>

We need to set the tb->fast_sk_family properly so we can use the proper
comparison function for all subsequent reuseport bind requests.

Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk")
Reported-and-tested-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b9c64b40a83a..f87f4805e244 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -328,6 +328,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			tb->fastuid = uid;
 			tb->fast_rcv_saddr = sk->sk_rcv_saddr;
 			tb->fast_ipv6_only = ipv6_only_sock(sk);
+			tb->fast_sk_family = sk->sk_family;
 #if IS_ENABLED(CONFIG_IPV6)
 			tb->fast_v6_rcv_saddr = sk->sk_v6_rcv_saddr;
 #endif
@@ -354,6 +355,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 				tb->fastuid = uid;
 				tb->fast_rcv_saddr = sk->sk_rcv_saddr;
 				tb->fast_ipv6_only = ipv6_only_sock(sk);
+				tb->fast_sk_family = sk->sk_family;
 #if IS_ENABLED(CONFIG_IPV6)
 				tb->fast_v6_rcv_saddr = sk->sk_v6_rcv_saddr;
 #endif
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/3] net: use inet6_rcv_saddr to compare sockets
From: Josef Bacik @ 2017-09-23  0:20 UTC (permalink / raw)
  To: davem, netdev, kernel-team, linux-kernel; +Cc: Josef Bacik
In-Reply-To: <1506126008-9148-1-git-send-email-josef@toxicpanda.com>

From: Josef Bacik <jbacik@fb.com>

In ipv6_rcv_saddr_equal() we need to use inet6_rcv_saddr(sk) for the
ipv6 compare with the fast socket information to make sure we're doing
the proper comparisons.

Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk")
Reported-and-tested-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f87f4805e244..a1bf30438bc5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -266,7 +266,7 @@ static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
 #if IS_ENABLED(CONFIG_IPV6)
 	if (tb->fast_sk_family == AF_INET6)
 		return ipv6_rcv_saddr_equal(&tb->fast_v6_rcv_saddr,
-					    &sk->sk_v6_rcv_saddr,
+					    inet6_rcv_saddr(sk),
 					    tb->fast_rcv_saddr,
 					    sk->sk_rcv_saddr,
 					    tb->fast_ipv6_only,
-- 
2.7.4

^ permalink raw reply related

* [PATCH 3/3] inet: fix improper empty comparison
From: Josef Bacik @ 2017-09-23  0:20 UTC (permalink / raw)
  To: davem, netdev, kernel-team, linux-kernel; +Cc: Josef Bacik
In-Reply-To: <1506126008-9148-1-git-send-email-josef@toxicpanda.com>

From: Josef Bacik <jbacik@fb.com>

When doing my reuseport rework I screwed up and changed a

if (hlist_empty(&tb->owners))

to

if (!hlist_empty(&tb->owners))

This is obviously bad as all of the reuseport/reuse logic was reversed,
which caused weird problems like allowing an ipv4 bind conflict if we
opened an ipv4 only socket on a port followed by an ipv6 only socket on
the same port.

Fixes: b9470c27607b ("inet: kill smallest_size and smallest_port")
Reported-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a1bf30438bc5..c039c937ba90 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -321,7 +321,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			goto fail_unlock;
 	}
 success:
-	if (!hlist_empty(&tb->owners)) {
+	if (hlist_empty(&tb->owners)) {
 		tb->fastreuse = reuse;
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = FASTREUSEPORT_ANY;
-- 
2.7.4

^ permalink raw reply related
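
The effect of the inverted check in patch 3/3 can be sketched with a small
userspace model.  This is only an illustration under stated assumptions:
struct bucket_model and bucket_add_owner() are hypothetical names, not kernel
code, and the real inet_csk_get_port() tracks more state (reuseport, uid,
addresses).  The point is that the cached hint may mirror the new socket only
when the bucket has no owners yet; with the condition reversed, a later
reuseaddr socket would overwrite the hint left by an earlier socket that did
not allow reuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct inet_bind_bucket. */
struct bucket_model {
	size_t owners;	/* number of sockets already bound to the port */
	bool fastreuse;	/* cached hint: every owner allows address reuse */
};

/*
 * Update the cached hint when a socket with the given reuse setting
 * joins the bucket.  Simplified model of the logic after "success:".
 */
void bucket_add_owner(struct bucket_model *tb, bool reuse)
{
	assert(tb != NULL);

	if (tb->owners == 0) {
		/* First owner: the hint simply mirrors this socket. */
		tb->fastreuse = reuse;
	} else if (!reuse) {
		/* A later owner without reuse clears the hint. */
		tb->fastreuse = false;
	}
	tb->owners++;
}
```

In this model, adding a non-reuse socket followed by a reuseaddr socket leaves
fastreuse false, which is what keeps later binds on the slow conflict-checking
path.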

* Re: [PATCH 0/3] fix reuseaddr regression
From: Josef Bacik @ 2017-09-23  0:28 UTC (permalink / raw)
  To: David Miller; +Cc: josef, netdev, linux-kernel, crobinso, labbott, kernel-team
In-Reply-To: <20170919.135056.44228457394918392.davem@davemloft.net>

On Tue, Sep 19, 2017 at 01:50:56PM -0700, David Miller wrote:
> From: josef@toxicpanda.com
> Date: Mon, 18 Sep 2017 12:28:54 -0400
> 
> > I introduced a regression when reworking the fastreuse port stuff that allows
> > bind conflicts to occur once a reuseaddr socket successfully opens on an
> > existing tb.  The root cause is I reversed an if statement which caused us to
> > set the tb as if there were no owners on the socket if there were, which
> > obviously is not correct.
> > 
> > Dave I have follow up patches that will add a selftest for this case and I ran
> > the other reuseport related tests as well.  These need to go in pretty quickly
> > as it breaks kvm, I've marked them for stable.  Sorry for the regression,
> 
> First, please fix your "From: " field so that it actually has your full
> name rather than just your email address.  This matters when I apply
> your patches.
> 
> Second, remove the stable CC:.  For networking changes, you simply ask
> me to queue the changes up for -stable.
> 

Sorry Dave, I've fixed my git email settings, dropped the stable cc, and sent
a new round.  I didn't see this until just now, my bad.

Josef

^ permalink raw reply

* [PATCH net-next] liquidio: pass date and time info to NIC firmware
From: Felix Manlunas @ 2017-09-23  0:35 UTC (permalink / raw)
  To: davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	manish.awasthi, veerasenareddy.burru

From: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>

Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: Manish Awasthi <manish.awasthi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
 .../net/ethernet/cavium/liquidio/octeon_console.c  | 28 +++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index ec3dd69..eda799b 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -803,15 +803,19 @@ static int octeon_console_read(struct octeon_device *oct, u32 console_num,
 }
 
 #define FBUF_SIZE	(4 * 1024 * 1024)
+#define MAX_DATE_SIZE    30
 
 int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 			     size_t size)
 {
-	int ret = 0;
+	struct octeon_firmware_file_header *h;
+	char date[MAX_DATE_SIZE];
+	struct timeval time;
 	u32 crc32_result;
+	struct tm tm_val;
 	u64 load_addr;
 	u32 image_len;
-	struct octeon_firmware_file_header *h;
+	int ret = 0;
 	u32 i, rem;
 
 	if (size < sizeof(struct octeon_firmware_file_header)) {
@@ -890,11 +894,29 @@ int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 			load_addr += size;
 		}
 	}
+
+	/* Get time of the day */
+	do_gettimeofday(&time);
+	time_to_tm(time.tv_sec, (-sys_tz.tz_minuteswest) * 60,  &tm_val);
+	ret = snprintf(date, MAX_DATE_SIZE,
+		       " date=%04ld.%02d.%02d-%02d:%02d:%02d",
+		       tm_val.tm_year + 1900, tm_val.tm_mon + 1, tm_val.tm_mday,
+		       tm_val.tm_hour, tm_val.tm_min, tm_val.tm_sec);
+	if ((sizeof(h->bootcmd) - strnlen(h->bootcmd, sizeof(h->bootcmd))) <
+		ret) {
+		dev_err(&oct->pci_dev->dev, "Boot command buffer too small\n");
+		return -EINVAL;
+	}
+	strncat(h->bootcmd, date,
+		sizeof(h->bootcmd) - strnlen(h->bootcmd, sizeof(h->bootcmd)));
+
 	dev_info(&oct->pci_dev->dev, "Writing boot command: %s\n",
 		 h->bootcmd);
 
 	/* Invoke the bootcmd */
 	ret = octeon_console_send_cmd(oct, h->bootcmd, 50);
+	if (ret)
+		dev_info(&oct->pci_dev->dev, "Boot command send failed\n");
 
-	return 0;
+	return ret;
 }

^ permalink raw reply related
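
A note on the bounds check in the patch above: strncat() writes up to n
characters plus a terminating NUL, so the destination needs room for the
formatted length plus one byte.  The following is a minimal userspace sketch of
that rule, not the driver's code.  append_date() is a hypothetical helper, the
boot command string used with it is made up, and the userspace struct tm has an
int tm_year (hence %04d rather than the %04ld used in the kernel patch).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_DATE_SIZE 30

/*
 * Append " date=YYYY.MM.DD-hh:mm:ss" to cmd, a NUL-terminated string held
 * in a buffer of cmd_size bytes.  Returns 0 on success, -1 if it does not
 * fit; cmd is left unchanged on failure.
 */
int append_date(char *cmd, size_t cmd_size, const struct tm *tm_val)
{
	char date[MAX_DATE_SIZE];
	const char *end;
	size_t used;
	int len;

	assert(cmd != NULL && tm_val != NULL);

	len = snprintf(date, sizeof(date),
		       " date=%04d.%02d.%02d-%02d:%02d:%02d",
		       tm_val->tm_year + 1900, tm_val->tm_mon + 1,
		       tm_val->tm_mday, tm_val->tm_hour, tm_val->tm_min,
		       tm_val->tm_sec);
	if (len < 0 || (size_t)len >= sizeof(date))
		return -1;	/* formatting failed or was truncated */

	end = memchr(cmd, '\0', cmd_size);
	if (!end)
		return -1;	/* cmd is not NUL-terminated in its buffer */
	used = (size_t)(end - cmd);

	if (cmd_size - used <= (size_t)len)
		return -1;	/* need len bytes plus one for the NUL */

	memcpy(cmd + used, date, (size_t)len + 1);
	return 0;
}
```

The comparison uses "<=" rather than "<" so that a buffer with exactly len
bytes free is rejected, since the NUL would land one byte past the end.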

* Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
From: Yunsheng Lin @ 2017-09-23  0:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: davem@davemloft.net, huangdaode, xuwei (O), Liguozhu (Kenneth),
	Zhuangyuzeng (Yisen), Gabriele Paoloni, John Garry, Linuxarm,
	Salil Mehta, lipeng (Y), netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20170922160322.GB2005@nanopsycho.orion>

Hi, Jiri

On 2017/9/23 0:03, Jiri Pirko wrote:
> Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsheng@huawei.com wrote:
>> Hi, Jiri
>>
>>>> - if (!tc) {
>>>> + if (if_running) {
>>>> + (void)hns3_nic_net_stop(netdev);
>>>> + msleep(100);
>>>> + }
>>>> +
>>>> + ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
>>>> + kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
>>
>>> This is most odd. Why do you call dcb_ops from ndo_setup_tc callback?
>>> Why are you mixing this together? prio->tc mapping can be done
>>> directly in dcbnl
>>
>> Here is what we do in dcb_ops->setup_tc:
>> Firstly, if the current tc num is different from the tc num
>> that the user provides, then we set up the queues for each
>> tc.
>>
>> Secondly, we tell hardware the pri to tc mapping that
>> the stack is using. In rx direction, our hardware needs
>> that mapping to put different packets into different tc
>> queues according to the priority of the packet, then
>> rss decides which specific queue in the tc the
>> packet should go to.
>>
>> By mixing, I suppose you meant why we need the
>> pri to tc information?
> 
> by mixing, I mean what I wrote. You are calling dcb_ops callback from
> ndo_setup_tc callback. So you are mixing DCBNL subsystem and TC
> subsystem. Why? Why do you need sch_mqprio? Why DCBNL is not enough for
> all?

When using lldptool, dcbnl is involved.

But when using tc qdisc, dcbnl is not involved. Below is the key
call graph in the kernel when the tc qdisc cmd is executed.

cmd:
tc qdisc add dev eth0 root handle 1:0 mqprio num_tc 4 map 1 2 3 3 1 3 1 1 hw 1

call graph:
rtnetlink_rcv_msg -> tc_modify_qdisc -> qdisc_create -> mqprio_init ->
hns3_nic_setup_tc

When hns3_nic_setup_tc is called, we need to know the tc num and the
prio_tc mapping from the tc_mqprio_qopt, which is provided as a parameter
of the ndo_setup_tc function. dcb_ops is our hardware-specific
method to set up the tc related parameters in the hardware, so this is why
we call the dcb_ops callback in the ndo_setup_tc callback.

I hope this will answer your question, thanks for your time.

> 
> 
> 
>> I hope I did not misunderstand your question, thanks
>> for your time reviewing.
> 
> .
> 

^ permalink raw reply
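
To make the prio_tc mapping from the tc command in the message above concrete,
here is a minimal userspace sketch.  It is not hns3 or mqprio code, and
count_prio_per_tc() is a hypothetical helper.  It checks that every priority
names a traffic class below num_tc and counts how many priorities land in each
class, which is roughly the information a driver needs before programming a
pri to tc table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many priorities map to each traffic class.
 * prio_tc[i] is the class for priority i; per_tc must have room for
 * num_tc counters.  Returns 0 on success, -1 if any entry names a
 * class >= num_tc.
 */
int count_prio_per_tc(const uint8_t *prio_tc, size_t nprio,
		      uint8_t num_tc, unsigned int *per_tc)
{
	size_t i;

	assert(prio_tc != NULL && per_tc != NULL);

	for (i = 0; i < num_tc; i++)
		per_tc[i] = 0;

	for (i = 0; i < nprio; i++) {
		if (prio_tc[i] >= num_tc)
			return -1;	/* priority mapped to a missing class */
		per_tc[prio_tc[i]]++;
	}
	return 0;
}
```

With "num_tc 4 map 1 2 3 3 1 3 1 1", the first eight priorities map to classes
1, 2 and 3 only, so class 0 receives none of them in this model.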

* Re: [PATCH net-next] virtio-net: correctly set xdp_xmit for mergeable buffer
From: David Miller @ 2017-09-23  1:16 UTC (permalink / raw)
  To: jasowang; +Cc: mst, virtualization, netdev, linux-kernel, john.fastabend
In-Reply-To: <1506062338-3617-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 22 Sep 2017 14:38:58 +0800

> We should set xdp_xmit only when xdp_do_redirect() succeeds.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied, thanks Jason.

^ permalink raw reply

* Re: tools: selftests: psock_tpacket: skip un-supported tpacket_v3 test
From: David Miller @ 2017-09-23  1:20 UTC (permalink / raw)
  To: orson.zhai
  Cc: shuah, milosz.wasilewski, sumit.semwal, netdev, linux-kselftest
In-Reply-To: <20170922101717.11933-1-orson.zhai@linaro.org>

From: Orson Zhai <orson.zhai@linaro.org>
Date: Fri, 22 Sep 2017 18:17:17 +0800

> The TPACKET_V3 test of PACKET_TX_RING will fail with kernel version
> lower than v4.11. Support code for tx ring was added with commit id
> <7f953ab2ba46: af_packet: TX_RING support for TPACKET_V3> at Jan. 3
> of 2017.
> 
> So skip this test item instead of reporting failure for old kernels.
> 
> Signed-off-by: Orson Zhai <orson.zhai@linaro.org>

The whole point is to make sure the kernel in which the selftest
code is present functions properly.

There are many tests in selftests that only work on recent kernels.

I'm not applying this, sorry.

^ permalink raw reply

* Re: [PATCH 0/5] use setup_timer() helper function.
From: David Miller @ 2017-09-23  1:22 UTC (permalink / raw)
  To: allen.lkml; +Cc: netdev, sameo
In-Reply-To: <1506077902-1796-1-git-send-email-allen.lkml@gmail.com>

From: Allen Pais <allen.lkml@gmail.com>
Date: Fri, 22 Sep 2017 16:28:17 +0530

> This series uses setup_timer() helper function. The series
> addresses the files under net/*.

There was a recent change to the nfc code in net-next which causes
your patches to not apply.

Please respin against net-next, thanks.

^ permalink raw reply

* Re: [PATCH] net: stmmac: Meet alignment requirements for DMA
From: David Miller @ 2017-09-23  1:26 UTC (permalink / raw)
  To: matt.redfearn; +Cc: netdev, alexandre.torgue, peppe.cavallaro, linux-kernel
In-Reply-To: <1506078833-14002-1-git-send-email-matt.redfearn@imgtec.com>

From: Matt Redfearn <matt.redfearn@imgtec.com>
Date: Fri, 22 Sep 2017 12:13:53 +0100

> According to Documentation/DMA-API.txt:
>  Warnings:  Memory coherency operates at a granularity called the cache
>  line width.  In order for memory mapped by this API to operate
>  correctly, the mapped region must begin exactly on a cache line
>  boundary and end exactly on one (to prevent two separately mapped
>  regions from sharing a single cache line).  Since the cache line size
>  may not be known at compile time, the API will not enforce this
>  requirement.  Therefore, it is recommended that driver writers who
>  don't take special care to determine the cache line size at run time
>  only map virtual regions that begin and end on page boundaries (which
>  are guaranteed also to be cache line boundaries).

This is ridiculous.  You're misreading what this document is trying
to explain.

As long as you use the dma_{map,unmap}_single() and sync to/from
device interfaces properly, the cacheline issues will be handled properly
and the cpu and the device will see proper up-to-date memory contents.

It is completely ridiculous to require every driver to stash away two
sets of pointers for every packet, and to DMA map the headroom of the SKB
which is wasteful.

I'm not applying this, fix this problem properly, thanks.

^ permalink raw reply

* Re: [PATCH] MAINTAINERS: update git tree locations for ieee802154 subsystem
From: David Miller @ 2017-09-23  1:29 UTC (permalink / raw)
  To: stefan; +Cc: linux-wpan, aring, netdev
In-Reply-To: <20170922122846.32377-1-stefan@osg.samsung.com>

From: Stefan Schmidt <stefan@osg.samsung.com>
Date: Fri, 22 Sep 2017 14:28:46 +0200

> Patches for ieee802154 will go through my new trees towards netdev from
> now on. The 6LoWPAN subsystem will stay as is (shared between ieee802154
> and bluetooth) and go through the bluetooth tree as usual.
> 
> Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>

Applied.

^ permalink raw reply

* Re: [PATCH net v2] net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
From: David Miller @ 2017-09-23  3:32 UTC (permalink / raw)
  To: willemb; +Cc: netdev
In-Reply-To: <20170922234237.43174-1-willemb@google.com>

From: Willem de Bruijn <willemb@google.com>
Date: Fri, 22 Sep 2017 19:42:37 -0400

> Zerocopy skbs frags are copied when the skb is looped to a local sock.
> Commit 1080e512d44d ("net: orphan frags on receive") introduced calls
> to skb_orphan_frags to deliver_skb and __netif_receive_skb for this.
> 
> With msg_zerocopy, these skbs can also exist in the tx path and thus
> loop from dev_queue_xmit_nit. This already calls deliver_skb in its
> loop. But it does not orphan before a separate pt_prev->func().
> 
> Add the missing skb_orphan_frags_rx.
> 
> Changes
>   v1->v2: handle skb_orphan_frags_rx failure
> 
> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH 0/3] fix reuseaddr regression
From: David Miller @ 2017-09-23  3:40 UTC (permalink / raw)
  To: josef; +Cc: netdev, kernel-team, linux-kernel
In-Reply-To: <1506126008-9148-1-git-send-email-josef@toxicpanda.com>

From: Josef Bacik <josef@toxicpanda.com>
Date: Fri, 22 Sep 2017 20:20:05 -0400

> I introduced a regression when reworking the fastreuse port stuff that allows
> bind conflicts to occur once a reuseaddr successfully opens on an existing tb.
> The root cause is I reversed an if statement which caused us to set the tb as if
> there were no owners on the socket if there were, which obviously is not
> correct.
> 
> Dave could you please queue these changes up for -stable, I've run them through
> the net tests and added another test to check for this problem specifically.

Series applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next v2] net: Remove useless function skb_header_release
From: David Miller @ 2017-09-23  3:41 UTC (permalink / raw)
  To: gfree.wind; +Cc: netdev
In-Reply-To: <1506047122-71116-1-git-send-email-gfree.wind@vip.163.com>

From: gfree.wind@vip.163.com
Date: Fri, 22 Sep 2017 10:25:22 +0800

> From: Gao Feng <gfree.wind@vip.163.com>
> 
> There is no one who would invoke the function skb_header_release.
> So just remove it now.
> 
> Signed-off-by: Gao Feng <gfree.wind@vip.163.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: use 32-bit arithmetic while allocating net device
From: David Miller @ 2017-09-23  4:26 UTC (permalink / raw)
  To: adobriyan; +Cc: netdev
In-Reply-To: <20170921203329.GA13550@avx2>

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Thu, 21 Sep 2017 23:33:29 +0300

> Private part of allocation is never big enough to warrant size_t.
> 
> Space savings:
> 
> 	add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
> 	function                                     old     new   delta
> 	alloc_netdev_mqs                            1120    1110     -10
> 
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net-next v2 0/4] cxgb4: add support to offload tc flower
From: David Miller @ 2017-09-23  4:28 UTC (permalink / raw)
  To: rahul.lakkireddy; +Cc: netdev, kumaras, ganeshgr, nirranjan, indranil
In-Reply-To: <cover.1506015856.git.rahul.lakkireddy@chelsio.com>

From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Date: Thu, 21 Sep 2017 23:41:12 +0530

> This series of patches add support to offload tc flower onto Chelsio
> NICs.
> 
> Patch 1 adds basic skeleton to prepare for offloading tc flower flows.
> 
> Patch 2 adds support to add/remove flows for offload.  Flows can have
> accompanying masks.  Following match and action are currently supported
> for offload:
> Match:  ether-protocol, IPv4/IPv6 addresses, L4 ports (TCP/UDP)
> Action: drop, redirect to another port on the device.
> 
> Patch 3 adds support to offload tc-flower flows having
> vlan actions: pop, push, and modify.
> 
> Patch 4 adds support to fetch stats for the offloaded tc flower flows
> from hardware.
> 
> Support for offloading more match and action types is to follow
> in subsequent series.

Series applied, thank you.

^ permalink raw reply

* Re: pull-request: ieee802154 2017-09-20
From: David Miller @ 2017-09-23  4:29 UTC (permalink / raw)
  To: stefan; +Cc: linux-wpan, alex.aring, marcel, netdev
In-Reply-To: <20170921205606.GA14244@work.Speedport_W_724V_09011603_05_010>

From: Stefan Schmidt <stefan@osg.samsung.com>
Date: Thu, 21 Sep 2017 22:56:07 +0200

> Here comes a pull request for ieee802154 changes I have queued up for
> this merge window.
> 
> Normally these have been coming through the bluetooth tree but as these
> three have been falling through the cracks so far and I have to review
> and ack all of them anyway I think it makes sense if I save the
> bluetooth people some work and handle them directly.
> 
> It's the first pull request I send to you so please let me know if I did
> something wrong or if you prefer a different format.

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Y Song @ 2017-09-23  4:49 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, netdev,
	Jiong Wang, Jakub Kicinski
In-Reply-To: <7c1ab2b8-e65d-3b09-f9f0-9fd13c1ceccf@solarflare.com>

[-- Attachment #1: Type: text/plain, Size: 1876 bytes --]

On Fri, Sep 22, 2017 at 9:23 AM, Edward Cree <ecree@solarflare.com> wrote:
> On 22/09/17 16:16, Alexei Starovoitov wrote:
>> looks like we're converging on
>> "be16/be32/be64/le16/le32/le64 #register" for BPF_END.
>> I guess I can live with that. I would prefer more C like syntax
>> to match the rest, but llvm parsing point is a strong one.
> Yep, agreed.  I'll post a v2 once we've settled BPF_NEG.
>> For BPF_NEG I prefer to do it in C syntax like interpreter does:
>>         ALU_NEG:
>>                 DST = (u32) -DST;
>>         ALU64_NEG:
>>                 DST = -DST;
>> Yonghong, does it mean that asmparser will equally suffer?
> Correction to my earlier statements: verifier will currently disassemble
>  neg as:
> (87) r0 neg 0
> (84) (u32) r0 neg (u32) 0
>  because it pretends 'neg' is a compound-assignment operator like +=.
> The analogy with be16 and friends would be to use
>     neg64 r0
>     neg32 r0
>  whereas the analogy with everything else would be
>     r0 = -r0
>     r0 = (u32) -r0
>  as Alexei says.
> I'm happy to go with Alexei's version if it doesn't cause problems for llvm.

I got some time to do some prototyping in llvm, and it looks like I am
able to resolve the issue so that we can use more C-like syntax. That is:
for bswap:
     r1 = (be16) (u16) r1
     or
     r1 = (be16) r1
     or
     r1 = be16 r1
for neg:
     r0 = -r0
     (for 32bit support, llvm may output "w0 = -w0" in the future. But
      since it is not enabled yet, you can continue to output
      "r0 = (u32) -r0".)

Not sure which syntax is best for bswap. The "r1 = (be16) (u16) r1" is most
explicit in its intention.

Attaching my llvm patch as well and cc'ing Jiong and Jakub so they can see my
implementation and the related discussion here. (In this patch, I did not
implement bswap for little endian yet.) Maybe they can provide additional
comments.
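
For readers comparing the proposed spellings, the semantics being discussed can
be sketched in plain C.  This is only a userspace illustration, not verifier,
interpreter or llvm code; bpf_neg64(), bpf_neg32() and bpf_to_be16() are
hypothetical names, and registers are modelled as uint64_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "r0 = -r0": 64-bit negate (unsigned arithmetic wraps modulo 2^64). */
uint64_t bpf_neg64(uint64_t dst)
{
	return -dst;
}

/* "r0 = (u32) -r0": negate, keep the low 32 bits, zero the upper 32. */
uint64_t bpf_neg32(uint64_t dst)
{
	return (uint32_t)-dst;
}

/*
 * "r1 = (be16) r1": keep the low 16 bits and lay them out in memory in
 * big-endian byte order; the upper 48 bits of the result are zero.
 */
uint64_t bpf_to_be16(uint64_t dst)
{
	unsigned char bytes[2];
	uint16_t out;

	assert(sizeof(out) == sizeof(bytes));

	bytes[0] = (unsigned char)((dst >> 8) & 0xff);	/* high byte first */
	bytes[1] = (unsigned char)(dst & 0xff);
	memcpy(&out, bytes, sizeof(out));
	return out;
}
```

Building the result byte by byte keeps the sketch independent of the host's
endianness: on a big-endian host the conversion is the identity on the low
16 bits, on a little-endian host it is a byte swap.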

[-- Attachment #2: 0001-bpf-add-support-for-neg-insn-and-change-format-of-bs.patch --]
[-- Type: application/octet-stream, Size: 8464 bytes --]

From a3f1a282ffb7e9d459ba3d1135536504bd89f597 Mon Sep 17 00:00:00 2001
From: Yonghong Song <yhs@fb.com>
Date: Fri, 22 Sep 2017 13:52:55 -0700
Subject: [PATCH] bpf: add support for neg insn and change format of bswap insn

[Alexei,

 This patch demonstrates that we can support
 bswap/neg like:
   reg1 = (be16) (u16) reg1
   reg1 = -reg1
 At IR level, we already have constraints to ensure
 that src reg is the same as dst reg. The only issue
 is the assembler (from .s to .o) where the constraint
 check in BPFInstrInfo.td is not effective.
 I added an additional check in BPFAsmParser.cpp which
 can help warn the user if the src/dst are not the same
 for the above insns. Without this check, wrong code
 will be generated.

 My previous experiment tried to use the same dst
 register in two different places to enforce the
 same src/dst register. Now with additional check
 in BPFAsmParser, we can use both src and dst registers
 in asmstring and still guarantee they point to the
 same register.

 From llvm point of view, the following format is the
 simplest, and requires no special precheck insn matching
 for assembler:
   be16 <reg>
   neg  <reg>
 But this syntax is not consistent with all the other arith
 syntax we have now.

 Therefore, let us go back to the C-style syntax.
 Not sure we want
   "reg1 = (be16) (u16) reg1" or
   "reg1 = (be16) reg1".
 Maybe the second choice is good enough?

 Regarding 32bit syntax, with Jiong's patch, 32bit ALU
 operations will have syntax with registers "w0, w1, ..., w10"
 instead of "(u32)r0". I guess we can deal with this
 in verifier later when 32bit support matures.

 Once we have consensus, I can send email out to Edward
 about the proposal and the reason.

]

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 lib/Target/BPF/AsmParser/BPFAsmParser.cpp | 51 ++++++++++++++++++++++++++-----
 lib/Target/BPF/BPFInstrFormats.td         |  1 +
 lib/Target/BPF/BPFInstrInfo.td            | 28 ++++++++++++++---
 3 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/lib/Target/BPF/AsmParser/BPFAsmParser.cpp b/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
index d00200c..683f7fd 100644
--- a/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
+++ b/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
@@ -30,6 +30,8 @@ struct BPFOperand;
 class BPFAsmParser : public MCTargetAsmParser {
   SMLoc getLoc() const { return getParser().getTok().getLoc(); }
 
+  bool PreMatchCheck(OperandVector &Operands);
+
   bool MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
                                OperandVector &Operands, MCStreamer &Out,
                                uint64_t &ErrorInfo,
@@ -225,9 +227,6 @@ public:
         .Case("*", true)
         .Case("exit", true)
         .Case("lock", true)
-        .Case("bswap64", true)
-        .Case("bswap32", true)
-        .Case("bswap16", true)
         .Case("ld_pseudo", true)
         .Default(false);
   }
@@ -239,6 +238,9 @@ public:
         .Case("u32", true)
         .Case("u16", true)
         .Case("u8", true)
+        .Case("be64", true)
+        .Case("be32", true)
+        .Case("be16", true)
         .Case("goto", true)
         .Case("ll", true)
         .Case("skb", true)
@@ -252,6 +254,41 @@ public:
 #define GET_MATCHER_IMPLEMENTATION
 #include "BPFGenAsmMatcher.inc"
 
+bool BPFAsmParser::PreMatchCheck(OperandVector &Operands) {
+
+  if (Operands.size() == 4) {
+    // check reg1 = -reg2, reg1 must be the same as reg2
+    BPFOperand &Op0 = (BPFOperand &)*Operands[0];
+    BPFOperand &Op1 = (BPFOperand &)*Operands[1];
+    BPFOperand &Op2 = (BPFOperand &)*Operands[2];
+    BPFOperand &Op3 = (BPFOperand &)*Operands[3];
+    if (Op0.isReg() && Op1.isToken() && Op2.isToken() && Op3.isReg()
+        && Op1.getToken() == "=" && Op2.getToken() == "-"
+        && Op0.getReg() != Op3.getReg())
+      return true;
+  } else if (Operands.size() == 9) {
+    // check reg1 = (be16) (u16) reg2, reg1 must be the same as reg2
+    BPFOperand &Op0 = (BPFOperand &)*Operands[0];
+    BPFOperand &Op1 = (BPFOperand &)*Operands[1];
+    BPFOperand &Op2 = (BPFOperand &)*Operands[2];
+    BPFOperand &Op3 = (BPFOperand &)*Operands[3];
+    BPFOperand &Op4 = (BPFOperand &)*Operands[4];
+    BPFOperand &Op5 = (BPFOperand &)*Operands[5];
+    BPFOperand &Op6 = (BPFOperand &)*Operands[6];
+    BPFOperand &Op7 = (BPFOperand &)*Operands[7];
+    BPFOperand &Op8 = (BPFOperand &)*Operands[8];
+    if (Op0.isReg() && Op1.isToken() && Op2.isToken() && Op3.isToken()
+        && Op4.isToken() &&  Op5.isToken() && Op6.isToken()
+        && Op7.isToken() && Op8.isReg()
+        && Op1.getToken() == "=" && Op2.getToken() == "("
+        && Op4.getToken() == ")" && Op5.getToken() == "("
+        && Op7.getToken() == ")" && Op0.getReg() != Op8.getReg())
+      return true;
+  }
+
+  return false;
+}
+
 bool BPFAsmParser::MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
                                            OperandVector &Operands,
                                            MCStreamer &Out, uint64_t &ErrorInfo,
@@ -259,6 +296,9 @@ bool BPFAsmParser::MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
   MCInst Inst;
   SMLoc ErrorLoc;
 
+  if (PreMatchCheck(Operands))
+    return Error(IDLoc, "additional inst constraint not met");
+
   switch (MatchInstructionImpl(Operands, Inst, ErrorInfo, MatchingInlineAsm)) {
   default:
     break;
@@ -324,13 +364,8 @@ BPFAsmParser::parseOperandAsOperator(OperandVector &Operands) {
   switch (getLexer().getKind()) {
   case AsmToken::Minus:
   case AsmToken::Plus: {
-    StringRef Name = getLexer().getTok().getString();
-
     if (getLexer().peekTok().is(AsmToken::Integer))
       return MatchOperand_NoMatch;
-
-    getLexer().Lex();
-    Operands.push_back(BPFOperand::createToken(Name, S));
   }
   // Fall through.
 
diff --git a/lib/Target/BPF/BPFInstrFormats.td b/lib/Target/BPF/BPFInstrFormats.td
index 1e3bc3b..92d4a62 100644
--- a/lib/Target/BPF/BPFInstrFormats.td
+++ b/lib/Target/BPF/BPFInstrFormats.td
@@ -38,6 +38,7 @@ def BPF_OR   : BPFArithOp<0x4>;
 def BPF_AND  : BPFArithOp<0x5>;
 def BPF_LSH  : BPFArithOp<0x6>;
 def BPF_RSH  : BPFArithOp<0x7>;
+def BPF_NEG  : BPFArithOp<0x8>;
 def BPF_XOR  : BPFArithOp<0xa>;
 def BPF_MOV  : BPFArithOp<0xb>;
 def BPF_ARSH : BPFArithOp<0xc>;
diff --git a/lib/Target/BPF/BPFInstrInfo.td b/lib/Target/BPF/BPFInstrInfo.td
index e1f233e..fc7979b 100644
--- a/lib/Target/BPF/BPFInstrInfo.td
+++ b/lib/Target/BPF/BPFInstrInfo.td
@@ -232,6 +232,26 @@ let isAsCheapAsAMove = 1 in {
   defm DIV : ALU<BPF_DIV, "/=", udiv>;
 }
 
+class NEG_RR<BPFOpClass Class, BPFArithOp Opc,
+             dag outs, dag ins, string asmstr, list<dag> pattern>
+    : TYPE_ALU_JMP<Opc.Value, 0, outs, ins, asmstr, pattern> {
+  bits<4> dst;
+  bits<4> src;
+
+  let Inst{55-52} = src;
+  let Inst{51-48} = dst;
+  let BPFClass = Class;
+}
+
+let Constraints = "$dst = $src", isAsCheapAsAMove = 1 in {
+  def NEG_64: NEG_RR<BPF_ALU64, BPF_NEG, (outs GPR:$dst), (ins GPR:$src),
+                     "$dst = -$src",
+                     [(set GPR:$dst, (ineg i64:$src))]>;
+  def NEG_32: NEG_RR<BPF_ALU, BPF_NEG, (outs GPR32:$dst), (ins GPR32:$src),
+                     "$dst = -$src",
+                     [(set GPR32:$dst, (ineg i32:$src))]>;
+}
+
 class LD_IMM64<bits<4> Pseudo, string OpcodeStr>
     : TYPE_LD_ST<BPF_IMM.Value, BPF_DW.Value,
                  (outs GPR:$dst),
@@ -488,7 +508,7 @@ class BSWAP<bits<32> SizeOp, string OpcodeStr, list<dag> Pattern>
     : TYPE_ALU_JMP<BPF_END.Value, BPF_TO_BE.Value,
                    (outs GPR:$dst),
                    (ins GPR:$src),
-                   !strconcat(OpcodeStr, "\t$dst"),
+                   "$dst = (be"#OpcodeStr#") (u"#OpcodeStr#") $src",
                    Pattern> {
   bits<4> dst;
 
@@ -498,9 +518,9 @@ class BSWAP<bits<32> SizeOp, string OpcodeStr, list<dag> Pattern>
 }
 
 let Constraints = "$dst = $src" in {
-def BSWAP16 : BSWAP<16, "bswap16", [(set GPR:$dst, (srl (bswap GPR:$src), (i64 48)))]>;
-def BSWAP32 : BSWAP<32, "bswap32", [(set GPR:$dst, (srl (bswap GPR:$src), (i64 32)))]>;
-def BSWAP64 : BSWAP<64, "bswap64", [(set GPR:$dst, (bswap GPR:$src))]>;
+def BSWAP16 : BSWAP<16, "16", [(set GPR:$dst, (srl (bswap GPR:$src), (i64 48)))]>;
+def BSWAP32 : BSWAP<32, "32", [(set GPR:$dst, (srl (bswap GPR:$src), (i64 32)))]>;
+def BSWAP64 : BSWAP<64, "64", [(set GPR:$dst, (bswap GPR:$src))]>;
 }
 
 let Defs = [R0, R1, R2, R3, R4, R5], Uses = [R6], hasSideEffects = 1,
-- 
2.9.5


^ permalink raw reply related

* [GIT] Networking
From: David Miller @ 2017-09-23  5:03 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Fix NAPI poll list corruption in emac driver, from Christian
   Lamparter.

2) Fix route use after free, from Eric Dumazet.

3) Fix regression in reuseaddr handling, from Josef Bacik.

4) Assert the size of control messages in compat handling since we
   copy it in from userspace twice.  From Meng Xu.

5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.)
   from Ursula Braun.

6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn.

7) Don't use ARRAY_SIZE on spinlock array which might have zero
   entries, from Geert Uytterhoeven.

8) Fix miscomputation of checksum in ipv6 udp code, from Subash
   Abhinov Kasiviswanathan.

9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from
   Xin Long.

Please pull, thanks a lot.

The following changes since commit 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e:

  Linux 4.14-rc1 (2017-09-16 15:47:51 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to 4e683f499a15cd777d3cb51aaebe48d72334c852:

  Merge branch 'net-fix-reuseaddr-regression' (2017-09-22 20:33:18 -0700)

----------------------------------------------------------------
Alex Ng (1):
      hv_netvsc: fix send buffer failure on MTU change

Andreas Gruenbacher (1):
      rhashtable: Documentation tweak

Ariel Elior (1):
      MAINTAINERS: Remove Yuval Mintz from maintainers list

Christian Lamparter (1):
      net: emac: Fix napi poll list corruption

Cong Wang (1):
      net_sched: remove cls_flower idr on failure

Daniel Borkmann (1):
      bpf: fix ri->map_owner pointer on bpf_prog_realloc

David S. Miller (8):
      Merge tag 'mac80211-for-davem-2017-11-19' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'hns3-bug-fixes'
      Merge git://git.kernel.org/.../pablo/nf
      Merge branch 'hns3-tm-fixes'
      Merge branch 'phylib-xcvr-type'
      Merge branch 'lan78xx-fixes'
      Merge branch 'smc-bug-fixes'
      Merge branch 'net-fix-reuseaddr-regression'

Davide Caratti (1):
      net/sched: cls_matchall: fix crash when used with classful qdisc

Edward Cree (1):
      net: change skb->mac_header when Generic XDP calls adjust_head

Eric Dumazet (4):
      8139too: revisit napi_complete_done() usage
      bpf: do not disable/enable BH in bpf_map_free_id()
      tcp: fastopen: fix on syn-data transmit failure
      net: prevent dst uses after free

Fahad Kunnathadi (1):
      net: phy: Fix mask value write on gmii2rgmii converter speed register

Florian Fainelli (3):
      net: systemport: Fix 64-bit statistics dependency
      net: ethtool: Add back transceiver type
      net: phy: Keep reporting transceiver type

Geert Uytterhoeven (2):
      netfilter: nat: Do not use ARRAY_SIZE() on spinlocks to fix zero div
      net: phy: Fix truncation of large IRQ numbers in phy_attached_print()

Hans Wippel (2):
      net/smc: add missing dev_put
      net/smc: add receive timeout check

Jerome Brunet (1):
      net: phy: Kconfig: Fix PHY infrastructure menu in menuconfig

Johannes Berg (1):
      nl80211: fix null-ptr dereference on invalid mesh configuration

Josef Bacik (3):
      net: set tb->fast_sk_family
      net: use inet6_rcv_saddr to compare sockets
      inet: fix improper empty comparison

Konstantin Khlebnikov (2):
      net_sched: always reset qdisc backlog in qdisc_reset()
      net_sched/hfsc: fix curve activation in hfsc_change_class()

Lipeng (6):
      net: hns3: Fixes initialization of phy address from firmware
      net: hns3: Fixes the command used to unmap ring from vector
      net: hns3: Fixes ring-to-vector map-and-unmap command
      net: hns3: Fixes the initialization of MAC address in hardware
      net: hns3: Fixes the default VLAN-id of PF
      net: hns3: Fixes the premature exit of loop when matching clients

Matteo Croce (1):
      ipv6: fix net.ipv6.conf.all interface DAD handlers

Meng Xu (2):
      net: compat: assert the size of cmsg copied in is as expected
      isdn/i4l: fetch the ppp_write buffer in one shot

Mike Manning (1):
      net: ipv6: fix regression of no RTM_DELADDR sent after DAD failure

Nisar Sayed (3):
      lan78xx: Fix for eeprom read/write when device auto suspend
      lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
      lan78xx: Use default values loaded from EEPROM/OTP after reset

Randy Dunlap (1):
      Documentation: networking: fix ASCII art in switchdev.txt

Salil Mehta (1):
      net: hns3: Fixes the ether address copy with appropriate API

Sathya Perla (1):
      bnxt_en: check for ingress qdisc in flower offload

Stefan Schmidt (1):
      MAINTAINERS: update git tree locations for ieee802154 subsystem

Subash Abhinov Kasiviswanathan (1):
      udpv6: Fix the checksum computation when HW checksum does not apply

Thomas Meyer (1):
      net: stmmac: Cocci spatch "of_table"

Timur Tabi (1):
      net: qcom/emac: add software control for pause frame mode

Tobias Klauser (1):
      bpf: devmap: pass on return value of bpf_map_precharge_memlock

Troy Kisky (3):
      net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set
      net: fec: remove unused interrupt FEC_ENET_TS_TIMER
      net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it

Ursula Braun (7):
      net/smc: take RCU read lock for routing cache lookup
      net/smc: adjust net_device refcount
      net/smc: adapt send request completion notification
      net/smc: longer delay for client link group removal
      net/smc: terminate link group if out-of-sync is received
      net/smc: introduce a delay
      net/smc: no close wait in case of process shut down

Vishwanath Pai (1):
      netfilter: ipset: ipset list may return wrong member count for set with timeout

Vladis Dronov (1):
      nl80211: check for the required netlink attributes presence

Willem de Bruijn (2):
      packet: hold bind lock when rebinding to fanout hook
      net: orphan frags on stand-alone ptype in dev_queue_xmit_nit

Xin Long (2):
      ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
      ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline

Yonghong Song (1):
      bpf: one perf event close won't free bpf program attached by another perf event

Yuchung Cheng (1):
      tcp: remove two unused functions

Yunsheng Lin (9):
      net: hns3: Cleanup for ROCE capability flag in ae_dev
      net: hns3: Fix initialization when cmd is not supported
      net: hns3: Fix for DEFAULT_DV when dev doesn't support DCB
      net: hns3: Fix for not setting rx private buffer size to zero
      net: hns3: Fix for rx_priv_buf_alloc not setting rx shared buffer
      net: hns3: Fix for rx priv buf allocation when DCB is not supported
      net: hns3: Fix typo error for feild in hclge_tm
      net: hns3: Fix for setting rss_size incorrectly
      net: hns3: Fix for pri to tc mapping in TM

 Documentation/networking/ip-sysctl.txt                  |  18 +++++++---
 Documentation/networking/switchdev.txt                  |  68 ++++++++++++++++++-------------------
 MAINTAINERS                                             |   6 ++--
 drivers/isdn/i4l/isdn_ppp.c                             |  37 +++++++++++++-------
 drivers/net/ethernet/broadcom/bcmsysport.c              |  52 +++++++++++++++++-----------
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c            |   4 +++
 drivers/net/ethernet/freescale/fec.h                    |   4 +--
 drivers/net/ethernet/freescale/fec_main.c               |   8 ++---
 drivers/net/ethernet/hisilicon/hns3/hnae3.c             |  43 +++++------------------
 drivers/net/ethernet/hisilicon/hns3/hnae3.h             |  15 ++++++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h  |  12 +++++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h |   3 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c   |  41 ++++++++++++----------
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h   |   4 +--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c  |  23 ++++++++-----
 drivers/net/ethernet/ibm/emac/mal.c                     |   3 +-
 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c       |  30 ++++++++++++++++
 drivers/net/ethernet/qualcomm/emac/emac-mac.c           |  22 ++++++++++++
 drivers/net/ethernet/qualcomm/emac/emac.c               |   3 ++
 drivers/net/ethernet/qualcomm/emac/emac.h               |   3 ++
 drivers/net/ethernet/realtek/8139too.c                  |   5 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c   |   1 +
 drivers/net/hyperv/hyperv_net.h                         |   2 ++
 drivers/net/hyperv/netvsc.c                             |   7 ++--
 drivers/net/hyperv/netvsc_drv.c                         |   8 +++++
 drivers/net/phy/Kconfig                                 |  18 +++++-----
 drivers/net/phy/phy.c                                   |   3 +-
 drivers/net/phy/phy_device.c                            |   2 +-
 drivers/net/phy/xilinx_gmii2rgmii.c                     |   2 +-
 drivers/net/usb/lan78xx.c                               |  34 +++++++++++++------
 include/linux/trace_events.h                            |   1 +
 include/net/dst.h                                       |  22 +++---------
 include/net/route.h                                     |   2 +-
 include/net/sock.h                                      |   2 +-
 include/net/tcp.h                                       |   1 -
 include/uapi/linux/ethtool.h                            |   6 +++-
 kernel/bpf/devmap.c                                     |   6 ++--
 kernel/bpf/syscall.c                                    |   6 ++--
 kernel/bpf/verifier.c                                   |   7 +++-
 kernel/events/core.c                                    |   3 +-
 lib/rhashtable.c                                        |   9 ++---
 net/compat.c                                            |   7 ++++
 net/core/dev.c                                          |   9 +++--
 net/core/ethtool.c                                      |   2 ++
 net/core/filter.c                                       |  24 ++++++++-----
 net/ipv4/inet_connection_sock.c                         |   6 ++--
 net/ipv4/tcp_output.c                                   |  43 +++++------------------
 net/ipv6/addrconf.c                                     |  32 ++++++++++++-----
 net/ipv6/ip6_gre.c                                      |  21 ++++++------
 net/ipv6/ip6_tunnel.c                                   |   3 ++
 net/ipv6/udp.c                                          |   1 +
 net/netfilter/ipset/ip_set_hash_gen.h                   |  14 +++++++-
 net/netfilter/nf_nat_core.c                             |  12 +++----
 net/packet/af_packet.c                                  |  16 ++++++---
 net/sched/cls_flower.c                                  |  15 ++++----
 net/sched/cls_matchall.c                                |   1 +
 net/sched/sch_generic.c                                 |   1 +
 net/sched/sch_hfsc.c                                    |  23 ++++++++++---
 net/smc/af_smc.c                                        |  16 +++++----
 net/smc/smc.h                                           |   2 +-
 net/smc/smc_clc.c                                       |  10 +++---
 net/smc/smc_clc.h                                       |   3 +-
 net/smc/smc_close.c                                     |  27 ++++++++-------
 net/smc/smc_core.c                                      |  16 ++++++---
 net/smc/smc_ib.c                                        |   1 +
 net/smc/smc_pnet.c                                      |   4 ++-
 net/smc/smc_rx.c                                        |   2 ++
 net/smc/smc_tx.c                                        |  12 ++++---
 net/smc/smc_wr.c                                        |   2 +-
 net/wireless/nl80211.c                                  |   6 ++++
 71 files changed, 646 insertions(+), 414 deletions(-)

^ permalink raw reply

* Re: [patch net-next 07/12] mlxsw: spectrum: Add the multicast routing offloading logic
From: Yotam Gigi @ 2017-09-23  9:32 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Jiri Pirko, netdev, davem, idosch, mlxsw
In-Reply-To: <20170922132152.GB31634@lunn.ch>

On 09/22/2017 04:21 PM, Andrew Lunn wrote:
> On Fri, Sep 22, 2017 at 11:36:59AM +0300, Yotam Gigi wrote:
>> On 09/21/2017 06:26 PM, Andrew Lunn wrote:
>>>> +static void mlxsw_sp_mr_route_stats_update(struct mlxsw_sp *mlxsw_sp,
>>>> +					   struct mlxsw_sp_mr_route *mr_route)
>>>> +{
>>>> +	struct mlxsw_sp_mr *mr = mlxsw_sp->mr;
>>>> +	u64 packets, bytes;
>>>> +
>>>> +	if (mr_route->route_action == MLXSW_SP_MR_ROUTE_ACTION_TRAP)
>>>> +		return;
>>>> +
>>>> +	mr->mr_ops->route_stats(mlxsw_sp, mr_route->route_priv, &packets,
>>>> +				&bytes);
>>>> +
>>>> +	switch (mr_route->mr_table->proto) {
>>>> +	case MLXSW_SP_L3_PROTO_IPV4:
>>>> +		mr_route->mfc4->mfc_un.res.pkt = packets;
>>>> +		mr_route->mfc4->mfc_un.res.bytes = bytes;
>>> What about wrong_if and lastuse? 
> Hi Yotam
>
>> wrong_if is updated by ipmr as it is trapped to the CPU.
> Great.
>
>> We did not address lastuse currently, though it can be easily
>> addressed here.
> Please do. I've written multicast routing daemons, where i use it to
> flush out MFCs which are no longer in use. Having it always 0 is going
> to break daemons.

I will. Thanks for the feedback!

>  
>>> Is an mfc with iif on the host, not the switch, not offloaded?
>>
>> I am not sure I followed. What do you mean by an MFC with iif on the host? Do
>> you mean an MFC whose iif is an external NIC that is not part of the Spectrum ASIC?
> Yes. We probably have different perspectives on the world. To
> Mellanox, everything is a switch in a box. In the DSA world, we tend
> to think of having a general purpose machine which also has a switch
> connected. Think of a wireless access point, set top box, passenger
> entertainment system. We have applications on the general purpose
> computer, we have wifi interfaces, cable modems, etc. Think about all
> the different packages in LEDE. We might have a multicast video
> stream, coming from the cable modem being sent over ports of the
> switch to clients.
>
> So when i look at these patches, i try to make sure the general use
> cases works, not just the plain boring Ethernet switch box use cases
> :-)

So when doing it, we did think about multi-ASIC situations, so I think it should
fit :)

>
>> in this case, the route will not be offloaded and all traffic will
>> pass in slowpath.
> O.K. I was just thinking if those counters need to be +=, not =.  But
> either the iif is on the host, or it is in the switch. It cannot be
> both. So = is O.K.
>
> Thanks
> 	Andrew

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic
From: Egil Hjelmeland @ 2017-09-23  9:58 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <20170922200810.GJ3470@lunn.ch>

Den 22. sep. 2017 22:08, skrev Andrew Lunn:
>>> I'm wondering how this is supposed to work. Please add a good comment
>>> here, since the hardware is forcing you to do something odd.
>>>
>>> Maybe it would be a good idea to save the STP state in chip.  And then
>>> when chip->is_bridged is set true, change the state in the hardware to
>>> the saved value?
>>>
>>> What happens when port 0 is added to the bridge, there is then a
>>> minute pause and then port 1 is added? I would expect that as soon as
>>> port 0 is added, the STP state machine for port 0 will start and move
>>> into listening and then forwarding. Due to hardware limitations it
>>> looks like you cannot do this. So what state is the hardware
>>> effectively in? Blocking? Forwarding?
>>>
>>> Then port 1 is added. You can then respect the states. Port 1 will
>>> do blocking->listening->forwarding, but what about port 0? The calls
>>> won't get repeated? How does it transition to forwarding?
>>>
>>>    Andrew
>>>
>>
>> I see your point with the "minute pause" argument. Although it is a
>> somewhat contrived use case, it is easy to fix by caching the STP
>> state, as you suggest. So I can do that.
> 
> I don't think it is contrived. I've done bridge configuration by hand
> for testing purposes. I've also set the forwarding delay to very small
> values, so there is a clear race condition here.
> 
>> How do other DSA HW chips handle port separation? Knowing that
>> could perhaps help me know what to look for.
> 
> They have better hardware :-)
> 
> Generally each port is totally independent. You can change the STP
> state per port without restrictions.
> 
We can indeed change the STP state per lan9303 port "without
restrictions".

The point is: Once both external ports are in "forwarding", I see no way
to prevent traffic flowing directly between the external ports.
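
For reference, the caching approach suggested earlier in this thread (save
the STP state in the chip structure and apply it once is_bridged becomes
true) could be sketched roughly as below. This is only an illustration
under stated assumptions: every name here (sketch_chip, sketch_set_stp_state,
sketch_set_bridged, the SKETCH_* constants) is hypothetical and is not taken
from the lan9303 driver or the DSA core, and the hw_state array merely stands
in for real register writes.

```c
#include <stdbool.h>

#define SKETCH_NUM_PORTS 2	/* assumption: two external ports */

/* Hypothetical STP states; not the kernel's BR_STATE_* values. */
enum sketch_stp_state {
	SKETCH_STP_BLOCKING,
	SKETCH_STP_LISTENING,
	SKETCH_STP_FORWARDING,
};

/* Hypothetical per-chip bookkeeping; "hw_state" stands in for registers. */
struct sketch_chip {
	bool is_bridged;
	enum sketch_stp_state cached[SKETCH_NUM_PORTS];
	enum sketch_stp_state hw_state[SKETCH_NUM_PORTS];
};

/* Always remember the requested state; only touch hardware when bridged. */
static void sketch_set_stp_state(struct sketch_chip *chip, int port,
				 enum sketch_stp_state state)
{
	chip->cached[port] = state;
	if (chip->is_bridged)
		chip->hw_state[port] = state;
}

/* When bridging becomes active, replay every cached state to hardware. */
static void sketch_set_bridged(struct sketch_chip *chip, bool bridged)
{
	int port;

	chip->is_bridged = bridged;
	if (!bridged)
		return;
	for (port = 0; port < SKETCH_NUM_PORTS; port++)
		chip->hw_state[port] = chip->cached[port];
}
```

With this shape, a state requested for port 0 before port 1 joins is not
lost: it is replayed when the bridged flag is set. It does not address the
separate question of traffic flowing directly between the external ports.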


>        Andrew
> 

Egil

^ permalink raw reply

* [PATCH net] sctp: Fix a big endian bug in sctp_for_each_transport()
From: Dan Carpenter @ 2017-09-23 10:25 UTC (permalink / raw)
  To: Vlad Yasevich, Xin Long
  Cc: Neil Horman, David S. Miller, linux-sctp, netdev, kernel-janitors

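Fundamentally, the "pos" pointer points to "cb->args[2]" which is a long.
In the current code, we cast it to an int pointer and so only access the
first 32 bits of the long in memory.  On little endian systems those are
the low 32 bits, so that works, but on 64-bit big endian systems they are
the high 32 bits and the code will fail.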
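
As a hedged illustration of the problem (a standalone userspace sketch, not
code from the kernel tree), the helper below copies the leading sizeof(int)
bytes of a long, which is what dereferencing the cast pointer effectively
reads. The names first_int_of_long and host_is_little_endian are made up
for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
	unsigned int probe = 1;
	unsigned char first_byte;

	memcpy(&first_byte, &probe, 1);
	return first_byte == 1;
}

/*
 * Hypothetical helper: read the leading sizeof(int) bytes of a long, i.e.
 * the bytes an "(int *)&value" cast would access.  memcpy() is used so the
 * sketch stays well-defined C.
 */
static int first_int_of_long(long value)
{
	int out;

	assert(sizeof(int) <= sizeof(long));
	memcpy(&out, &value, sizeof(out));
	return out;
}
```

On a little endian host first_int_of_long(5) yields 5.  On a big endian host
where long is wider than int, the leading bytes hold the high-order half of
the value, so the same call yields 0 and a small position is lost.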

Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d7d8cba01469..7d87439f299a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -121,14 +121,14 @@ void sctp_transport_walk_stop(struct rhashtable_iter *iter);
 struct sctp_transport *sctp_transport_get_next(struct net *net,
 			struct rhashtable_iter *iter);
 struct sctp_transport *sctp_transport_get_idx(struct net *net,
-			struct rhashtable_iter *iter, int pos);
+			struct rhashtable_iter *iter, long pos);
 int sctp_transport_lookup_process(int (*cb)(struct sctp_transport *, void *),
 				  struct net *net,
 				  const union sctp_addr *laddr,
 				  const union sctp_addr *paddr, void *p);
 int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
 			    int (*cb_done)(struct sctp_transport *, void *),
-			    struct net *net, int *pos, void *p);
+			    struct net *net, long *pos, void *p);
 int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p);
 int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
 		       struct sctp_info *info);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index d4730ada7f32..0222743b3aa8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4603,7 +4603,7 @@ struct sctp_transport *sctp_transport_get_next(struct net *net,
 
 struct sctp_transport *sctp_transport_get_idx(struct net *net,
 					      struct rhashtable_iter *iter,
-					      int pos)
+					      long pos)
 {
 	void *obj = SEQ_START_TOKEN;
 
@@ -4659,7 +4659,7 @@ EXPORT_SYMBOL_GPL(sctp_transport_lookup_process);
 
 int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
 			    int (*cb_done)(struct sctp_transport *, void *),
-			    struct net *net, int *pos, void *p) {
+			    struct net *net, long *pos, void *p) {
 	struct rhashtable_iter hti;
 	struct sctp_transport *tsp;
 	int ret;
diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c
index 22ed01a76b19..e9d5405aa6ac 100644
--- a/net/sctp/sctp_diag.c
+++ b/net/sctp/sctp_diag.c
@@ -493,7 +493,7 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 		goto done;
 
 	sctp_for_each_transport(sctp_sock_filter, sctp_sock_dump,
-				net, (int *)&cb->args[2], &commp);
+				net, &cb->args[2], &commp);
 
 done:
 	cb->args[1] = cb->args[4];

^ permalink raw reply related

* [PATCH net-next] cxgb4: do DCB state reset in couple of places
From: Ganesh Goudar @ 2017-09-23 10:37 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar, Casey Leedom

Reset the driver's DCB state in a couple of places
where it was missing.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c  | 15 +++++++++++----
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 10 ++++++++--
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
index 6ee2ed3..4e7f72b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
@@ -40,8 +40,7 @@ static inline bool cxgb4_dcb_state_synced(enum cxgb4_dcb_state state)
 		return false;
 }
 
-/* Initialize a port's Data Center Bridging state.  Typically used after a
- * Link Down event.
+/* Initialize a port's Data Center Bridging state.
  */
 void cxgb4_dcb_state_init(struct net_device *dev)
 {
@@ -106,6 +105,15 @@ static void cxgb4_dcb_cleanup_apps(struct net_device *dev)
 	}
 }
 
+/* Reset a port's Data Center Bridging state.  Typically used after a
+ * Link Down event.
+ */
+void cxgb4_dcb_reset(struct net_device *dev)
+{
+	cxgb4_dcb_cleanup_apps(dev);
+	cxgb4_dcb_state_init(dev);
+}
+
 /* Finite State machine for Data Center Bridging.
  */
 void cxgb4_dcb_state_fsm(struct net_device *dev,
@@ -194,8 +202,7 @@ void cxgb4_dcb_state_fsm(struct net_device *dev,
 			 * state.  We need to reset back to a ground state
 			 * of incomplete.
 			 */
-			cxgb4_dcb_cleanup_apps(dev);
-			cxgb4_dcb_state_init(dev);
+			cxgb4_dcb_reset(dev);
 			dcb->state = CXGB4_DCB_STATE_FW_INCOMPLETE;
 			dcb->supported = CXGB4_DCBX_FW_SUPPORT;
 			linkwatch_fire_event(dev);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h
index ccf24d3..02040b9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h
@@ -131,6 +131,7 @@ struct port_dcb_info {
 
 void cxgb4_dcb_state_init(struct net_device *);
 void cxgb4_dcb_version_init(struct net_device *);
+void cxgb4_dcb_reset(struct net_device *dev);
 void cxgb4_dcb_state_fsm(struct net_device *, enum cxgb4_dcb_state_input);
 void cxgb4_dcb_handle_fw_update(struct adapter *, const struct fw_port_cmd *);
 void cxgb4_dcb_set_caps(struct adapter *, const struct fw_port_cmd *);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index aa93ae9..13b636b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -281,7 +281,7 @@ void t4_os_link_changed(struct adapter *adapter, int port_id, int link_stat)
 		else {
 #ifdef CONFIG_CHELSIO_T4_DCB
 			if (cxgb4_dcb_enabled(dev)) {
-				cxgb4_dcb_state_init(dev);
+				cxgb4_dcb_reset(dev);
 				dcb_tx_queue_prio_enable(dev, false);
 			}
 #endif /* CONFIG_CHELSIO_T4_DCB */
@@ -2304,10 +2304,16 @@ static int cxgb_close(struct net_device *dev)
 {
 	struct port_info *pi = netdev_priv(dev);
 	struct adapter *adapter = pi->adapter;
+	int ret;
 
 	netif_tx_stop_all_queues(dev);
 	netif_carrier_off(dev);
-	return t4_enable_vi(adapter, adapter->pf, pi->viid, false, false);
+	ret = t4_enable_vi(adapter, adapter->pf, pi->viid, false, false);
+#ifdef CONFIG_CHELSIO_T4_DCB
+	cxgb4_dcb_reset(dev);
+	dcb_tx_queue_prio_enable(dev, false);
+#endif
+	return ret;
 }
 
 int cxgb4_create_server_filter(const struct net_device *dev, unsigned int stid,
-- 
2.1.0

^ permalink raw reply related

* Re: tools: selftests: psock_tpacket: skip un-supported tpacket_v3 test
From: Fathi Boudra @ 2017-09-23 11:27 UTC (permalink / raw)
  To: David Miller
  Cc: orson.zhai, Shuah Khan, Milosz Wasilewski,
	sumit.semwal@linaro.org, netdev, linux-kselftest
In-Reply-To: <20170922.182017.17747017677768533.davem@davemloft.net>

On 23 September 2017 at 04:20, David Miller <davem@davemloft.net> wrote:
> From: Orson Zhai <orson.zhai@linaro.org>
> Date: Fri, 22 Sep 2017 18:17:17 +0800
>
>> The TPACKET_V3 test of PACKET_TX_RING will fail on kernel versions
>> lower than v4.11. Support for the TX ring was added by commit
>> <7f953ab2ba46: af_packet: TX_RING support for TPACKET_V3> on Jan. 3,
>> 2017.
>>
>> So skip this test item instead of reporting a failure on old kernels.
>>
>> Signed-off-by: Orson Zhai <orson.zhai@linaro.org>
>
> The whole point is to make sure the kernel in which the selftest
> code is present functions properly.
>
> There are many tests in selftests that only work on recent kernels.

For the background, a similar discussion happened on this thread:
https://lkml.org/lkml/2017/6/22/802

There are cases where we'd like to run the latest selftests on stable kernels.
You're right, there are many tests in selftests that only work on
recent kernels, and we intend to fix that.
Gracefully skipping a test because the feature is missing from the
kernel under test is preferable to failing.
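
A minimal sketch of what "skipping gracefully" could look like, under stated
assumptions: the helper and the SKETCH_* names below are hypothetical and not
part of psock_tpacket or kselftest.h; the value 4 mirrors the skip exit code
that kselftest uses, to my understanding; and it is only an assumption that a
kernel lacking the feature rejects the probe with EINVAL or ENOPROTOOPT.

```c
#include <errno.h>

/* Hypothetical result codes; 4 mirrors the skip code used by kselftest. */
enum sketch_result {
	SKETCH_PASS = 0,
	SKETCH_FAIL = 1,
	SKETCH_SKIP = 4,
};

/*
 * Hypothetical helper: classify the outcome of a feature probe (for
 * example a setsockopt() call that sets up the ring).  "rc" is the call's
 * return value and "err" the errno observed afterwards.
 *
 * Assumption: a kernel lacking the feature rejects the request with
 * EINVAL or ENOPROTOOPT; any other error is treated as a real failure.
 */
static enum sketch_result sketch_classify_probe(int rc, int err)
{
	if (rc == 0)
		return SKETCH_PASS;
	if (err == EINVAL || err == ENOPROTOOPT)
		return SKETCH_SKIP;
	return SKETCH_FAIL;
}
```

The trade-off is the one raised in the quoted reply: mapping an error to
"skip" can hide a genuine regression on a kernel that is supposed to support
the feature, so the set of errors treated as "unsupported" should be kept as
narrow as possible.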

> I'm not applying this, sorry.

^ permalink raw reply

