Netdev List
* Re: [PATCH net-next 2/2] selftests/net: add ip_defrag selftest
From: David Miller @ 2018-08-30  2:50 UTC (permalink / raw)
  To: posk; +Cc: netdev
In-Reply-To: <20180828183620.101597-2-posk@google.com>

From: Peter Oskolkov <posk@google.com>
Date: Tue, 28 Aug 2018 11:36:20 -0700

> This test creates a raw IPv4 socket, fragments a largish UDP
> datagram and sends the fragments out of order.
> 
> Then repeats in a loop with different message and fragment lengths.
> 
> Then does the same with overlapping fragments (with overlapping
> fragments the expectation is that the recv times out).
> 
> Tested:
> 
> root@<host># time ./ip_defrag.sh
> ipv4 defrag
> PASS
> ipv4 defrag with overlaps
> PASS
> 
> real    1m7.679s
> user    0m0.628s
> sys     0m2.242s
> 
> A similar test for IPv6 is to follow.
> 
> Signed-off-by: Peter Oskolkov <posk@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>

Applied.
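
To picture the send side described above without reading the selftest, here is a minimal Python sketch of splitting a payload into fragments and shuffling the send order. It is not the selftest code: fragment(), reassemble() and the sizes are made up for illustration, and no IP headers or raw sockets are involved.

```python
import random


def fragment(payload: bytes, frag_len: int):
    """Split payload into (offset, more_fragments, data) tuples.

    Every fragment except the last carries a multiple of 8 bytes,
    because the IPv4 fragment offset field counts 8-byte units.
    """
    if frag_len <= 0 or frag_len % 8:
        raise ValueError("frag_len must be a positive multiple of 8")
    frags = []
    for off in range(0, len(payload), frag_len):
        data = payload[off:off + frag_len]
        more = off + frag_len < len(payload)
        frags.append((off, more, data))
    return frags


def reassemble(frags):
    """Reference reassembly: order by offset and concatenate the data."""
    ordered = sorted(frags, key=lambda f: f[0])
    return b"".join(data for _, _, data in ordered)


payload = bytes(range(256)) * 8      # 2048-byte stand-in for the UDP datagram
frags = fragment(payload, 512)
random.Random(0).shuffle(frags)      # send order differs from offset order
```

Ordering by offset recovers the original payload, which is what a working defrag path has to do on the receive side.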


* Re: [PATCH net-next 1/2] ip: fail fast on IP defrag errors
From: David Miller @ 2018-08-30  2:49 UTC (permalink / raw)
  To: posk; +Cc: netdev
In-Reply-To: <20180828183620.101597-1-posk@google.com>

From: Peter Oskolkov <posk@google.com>
Date: Tue, 28 Aug 2018 11:36:19 -0700

> The current behavior of IP defragmentation is inconsistent:
> - some overlapping/wrong length fragments are dropped without
>   affecting the queue;
> - most overlapping fragments cause the whole frag queue to be dropped.
> 
> This patch brings consistency: if a bad fragment is detected,
> the whole frag queue is dropped. Two major benefits:
> - fail fast: corrupted frag queues are cleared immediately, instead of
>   by timeout;
> - testing of overlapping fragments is now much easier: any kind of
>   random fragment length mutation now leads to the frag queue being
>   discarded (IP packet dropped); before this patch, some overlaps were
>   "corrected", with tests not seeing expected packet drops.
> 
> Note that in one case (see "if (end&7)" conditional) the current
> behavior is preserved as there are concerns that this could be
> legitimate padding.
> 
> Signed-off-by: Peter Oskolkov <posk@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>

Applied.
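
The "drop the whole queue on a bad fragment" rule can be modelled in a few lines of Python. This is only a toy model under stated assumptions: FragQueue and its fields are hypothetical names, fragments are plain (start, end) byte ranges, and the "end&7" padding case that the patch deliberately leaves alone is not modelled.

```python
class FragQueue:
    """Toy fragment queue that fails fast on overlapping or empty fragments."""

    def __init__(self):
        self.ranges = []       # accepted (start, end) byte ranges, end exclusive
        self.dropped = False   # set once a bad fragment has been seen

    def add(self, start, end):
        """Return True if the fragment was queued, False if the queue is gone."""
        if self.dropped:
            return False
        overlaps = any(start < e and s < end for s, e in self.ranges)
        if end <= start or overlaps:
            # Fail fast: one bad fragment discards everything queued so far,
            # instead of leaving the queue around until the timeout fires.
            self.ranges.clear()
            self.dropped = True
            return False
        self.ranges.append((start, end))
        return True


q = FragQueue()
results = [q.add(0, 8), q.add(16, 24), q.add(20, 32)]  # third one overlaps
```

In this model a test only has to send one overlapping fragment and then check that nothing is delivered, which is the testing benefit the commit message mentions.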


* Re: [PATCH net-next] liquidio: fix race condition in instruction completion processing
From: David Miller @ 2018-08-30  2:48 UTC (permalink / raw)
  To: felix.manlunas
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	ricardo.farrington
In-Reply-To: <20180828183255.GA7536@felix-thinkpad.cavium.com>

From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Tue, 28 Aug 2018 11:32:55 -0700

> From: Rick Farrington <ricardo.farrington@cavium.com>
> 
> In lio_enable_irq, the pkt_in_done count register was being cleared to
> zero.  However, there could be some completed instructions which were not
> yet processed due to budget and limit constraints.
> So, only write this register with the number of actual completions
> that were processed.
> 
> Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>

Applied.


* Re: [PATCH net-next] liquidio: remove unnecessary delay when processing IQ responses
From: David Miller @ 2018-08-30  2:48 UTC (permalink / raw)
  To: felix.manlunas
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	ricardo.farrington
In-Reply-To: <20180828181954.GA7521@felix-thinkpad.cavium.com>

From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Tue, 28 Aug 2018 11:19:54 -0700

> From: Rick Farrington <ricardo.farrington@cavium.com>
> 
> Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>

Applied.


* Re: [PATCH net-next] net: thunderbolt: Convert to use SPDX identifier
From: David Miller @ 2018-08-30  2:43 UTC (permalink / raw)
  To: mika.westerberg; +Cc: michael.jamet, YehezkelShB, netdev
In-Reply-To: <20180828165843.55580-1-mika.westerberg@linux.intel.com>

From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Tue, 28 Aug 2018 19:58:43 +0300

> This gets rid of the license boilerplate in favor of an SPDX identifier,
> which only takes a single-line comment.
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Applied.


* Re: [PATCH v2] net: ethernet: Convert to using %pOFn instead of device_node.name
From: David Miller @ 2018-08-30  2:41 UTC (permalink / raw)
  To: robh
  Cc: linux-kernel, yisen.zhuang, salil.mehta, sebastian.hesselbarth,
	nbd, john, sean.wang, nelson.chang, matthias.bgg, w-kwok2,
	m-karicheri2, netdev
In-Reply-To: <20180828154433.5693-4-robh@kernel.org>

From: Rob Herring <robh@kernel.org>
Date: Tue, 28 Aug 2018 10:44:30 -0500

> In preparation to remove the node name pointer from struct device_node,
> convert printf users to use the %pOFn format specifier.
> 
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
> v2:
> - fix missing brackets in netcp

Applied to net-next.


* Re: [PATCH net 0/3] ipv6: fix error path of inet6_init()
From: David Miller @ 2018-08-30  2:29 UTC (permalink / raw)
  To: sd; +Cc: netdev, lxin
In-Reply-To: <cover.1535451234.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 28 Aug 2018 13:40:50 +0200

> The error path of inet6_init() can trigger multiple kernel panics,
> mostly due to wrong ordering of cleanups. This series fixes those
> issues.

Series applied, thank you.
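
As a generic illustration of why cleanup ordering matters on an init error path, here is a small Python sketch that undoes completed steps in reverse order. It does not reflect the actual steps in inet6_init(); run_init() and the step names are hypothetical.

```python
def run_init(steps):
    """Run (name, succeeds) steps in order; on failure undo in reverse order.

    Returns (ok, log) where log records every init and undo action taken.
    """
    done = []
    log = []
    for name, succeeds in steps:
        if not succeeds:
            # Unwind only what was actually set up, newest first, so nothing
            # is torn down while a later step still depends on it.
            for prev in reversed(done):
                log.append(f"undo {prev}")
            return False, log
        log.append(f"init {name}")
        done.append(name)
    return True, log


# Hypothetical three-step init where the last step fails.
ok, log = run_init([("a", True), ("b", True), ("c", False)])
```

Undoing a step that never ran, or undoing steps in the wrong order, is the kind of mistake that turns a failed init into a crash.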


* Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification
From: Alexei Starovoitov @ 2018-08-30  2:18 UTC (permalink / raw)
  To: Edward Cree; +Cc: ast, daniel, netdev
In-Reply-To: <d16ea072-61a0-8f8a-aca1-13cac09d3d14@solarflare.com>

On Wed, Aug 22, 2018 at 08:00:46PM +0100, Edward Cree wrote:
> The first patch is a simplification of register liveness tracking by using
>  a separate parentage chain for each register and stack slot, thus avoiding
>  the need for logic to handle callee-saved registers when applying read
>  marks.  In the future this idea may be extended to form use-def chains.
> The second patch adds information about misc/zero data on the stack to the
>  state dumps emitted to the log at various points; this information was
>  found essential in debugging the first patch, and may be useful elsewhere.

I think this set is a great improvement in liveness tracking,
so despite seeing the issues I applied it to bpf-next.

I think it's a better base to continue debugging.
In particular:
1. we have an instability issue in the verifier.
 from time to time the verifier processes 7 extra instructions on one
 of the cilium tests. This was happening before and after this set.
2. there is a nice improvement in the number of processed insns with this set,
 but I cannot explain the difference, hence it has to be debugged.
 In theory the liveness rewrite shouldn't change the number of processed insns.

If not for the issue 1 I would argue that the issue 2 means that the set has to
be debugged before going in, but since the verifier is unstable it's better
to debug from this base with this patch set applied (because it greatly
simplifies liveness and adds additional debug in patch2)
and once we figure out the issue 1, I hope, the issue 2 will be resolved automatically.

The numbers on cilium bpf programs:
                      before1    before2   after1  after2
bpf_lb-DLB_L3.o 	2003      2003      2003    2003
bpf_lb-DLB_L4.o 	3173      3173      3173    3173
bpf_lb-DUNKNOWN.o 	1080      1080      1080    1080
bpf_lxc-DDROP_ALL.o	29587     29587     29587   29587
bpf_lxc-DUNKNOWN.o	37204     37211     36926   36933
bpf_netdev.o		11283     11283     11188   11188
bpf_overlay.o		6679      6679      6679    6679
bpf_lcx_jit.o		39657     39657     39561   39561

notice how bpf_lxc-DUNKNOWN.o fluctuates with +7 before and after. That is issue 1.
bpf_lxc-DUNKNOWN.o, bpf_netdev.o, and bpf_lcx_jit.o have small improvements.
That is issue 2.
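
The two effects can be separated with a bit of arithmetic on the table above. The Python below is just a sketch of that calculation: the numbers are copied from the table, while summarize() and the key names are made up here.

```python
# Processed-insn counts from the table: (before1, before2, after1, after2).
counts = {
    "bpf_lb-DLB_L3.o": (2003, 2003, 2003, 2003),
    "bpf_lb-DLB_L4.o": (3173, 3173, 3173, 3173),
    "bpf_lb-DUNKNOWN.o": (1080, 1080, 1080, 1080),
    "bpf_lxc-DDROP_ALL.o": (29587, 29587, 29587, 29587),
    "bpf_lxc-DUNKNOWN.o": (37204, 37211, 36926, 36933),
    "bpf_netdev.o": (11283, 11283, 11188, 11188),
    "bpf_overlay.o": (6679, 6679, 6679, 6679),
    "bpf_lcx_jit.o": (39657, 39657, 39561, 39561),
}


def summarize(runs):
    """Split one row into run-to-run jitter (issue 1) and the delta (issue 2)."""
    before1, before2, after1, after2 = runs
    return {
        "jitter_before": before2 - before1,
        "jitter_after": after2 - after1,
        "delta": after1 - before1,
    }


# Keep only the programs where something changed at all.
changed = {
    name: summarize(runs)
    for name, runs in counts.items()
    if len(set(runs)) > 1
}
```

With these numbers, bpf_lxc-DUNKNOWN.o shows a jitter of 7 both before and after and a delta of -278, bpf_netdev.o a delta of -95, and bpf_lcx_jit.o a delta of -96.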

To reproduce above numbers clone this repo: https://github.com/4ast/bpf_cilium_test
and run .sh. The .o files in there are pretty old cilium bpf programs.
I kept them frozen and didn't recompile for more than a year to keep stable
base line and track the progress of the verifier in 'processed insns'.

Thanks


* [PATCH v2] hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()
From: Dexuan Cui @ 2018-08-30  5:42 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	'David S. Miller', 'netdev@vger.kernel.org'
  Cc: Josh Poulson, 'olaf@aepfle.de',
	'jasowang@redhat.com',
	'linux-kernel@vger.kernel.org',
	'marcelo.cerri@canonical.com',
	'apw@canonical.com',
	'devel@linuxdriverproject.org', vkuznets


This patch fixes the race between netvsc_probe() and
rndis_set_subchannel(), which can cause a deadlock.

These are the related 3 paths which show the deadlock:

path #1:
    Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
    Call Trace:
     schedule
     schedule_preempt_disabled
     __mutex_lock
     __device_attach
     bus_probe_device
     device_add
     vmbus_device_register
     vmbus_onoffer
     vmbus_onmessage_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #2:
    schedule
     schedule_preempt_disabled
     __mutex_lock
     netvsc_probe
     vmbus_probe
     really_probe
     __driver_attach
     bus_for_each_dev
     driver_attach_async
     async_run_entry_fn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #3:
    Workqueue: events netvsc_subchan_work [hv_netvsc]
    Call Trace:
     schedule
     rndis_set_subchannel
     netvsc_subchan_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

Before path #1 finishes, path #2 can start to run, because just before
the "bus_probe_device(dev);" in device_add() in path #1, there is a line
"object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
immediately try to load hv_netvsc and hence path #2 can start to run.

Next, path #2 offloads the subchannel's initialization to a workqueue,
i.e. path #3, so we can end up in a deadlock situation like this:

Path #2 gets the device lock, and is trying to get the rtnl lock;
Path #3 gets the rtnl lock and is waiting for all the subchannel messages
to be processed;
Path #1 is trying to get the device lock, but since #2 is not releasing
the device lock, path #1 has to sleep; since the VMBus messages are
processed one by one, this means the sub-channel messages can't be
processed, so #3 has to sleep with the rtnl lock held, and finally #2
has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.

With the patch, we can make sure #2 gets both the device lock and the
rtnl lock together, gets its job done, and releases the locks, so #1
and #3 will not be blocked for ever.
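
The same argument can be written down as a wait-for graph. The Python below is only a model of the description above, not of the driver: the node names and find_cycle() are made up, and each path is assumed to wait on at most one other path.

```python
def find_cycle(waits_for):
    """Return the nodes forming a cycle in a wait-for graph, or None.

    waits_for maps each blocked path to the single path it is waiting on.
    """
    for start in waits_for:
        seen = [start]
        node = waits_for.get(start)
        while node is not None and node not in seen:
            seen.append(node)
            node = waits_for.get(node)
        if node is not None:
            return seen[seen.index(node):]
    return None


# Without the patch: #1 waits for the device lock held by #2, #2 waits for
# the rtnl lock held by #3, and #3 waits for sub-channel messages that only
# #1's message work can process.
before = {"path1": "path2", "path2": "path3", "path3": "path1"}

# With the patch: #2 already holds both locks when #3 is scheduled, so #1
# and #3 both wait on #2, and #2 waits on nobody.
after = {"path1": "path2", "path3": "path2"}

cycle_before = find_cycle(before)
cycle_after = find_cycle(after)
```

A cycle in the first graph is the deadlock; the second graph has none, so #2 can finish and release both locks.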

Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
---

This v2 is a resend of v1, but the commit log is updated:
1. moved the text after the --- to before the ---;
2. add 3 paragraphs to elaborate the deadlock.

 drivers/net/hyperv/netvsc_drv.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1121a1ec..70921bb 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2206,6 +2206,16 @@ static int netvsc_probe(struct hv_device *dev,
 
 	memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
 
+	/* We must get rtnl lock before scheduling nvdev->subchan_work,
+	 * otherwise netvsc_subchan_work() can get rtnl lock first and wait
+	 * all subchannels to show up, but that may not happen because
+	 * netvsc_probe() can't get rtnl lock and as a result vmbus_onoffer()
+	 * -> ... -> device_add() -> ... -> __device_attach() can't get
+	 * the device lock, so all the subchannels can't be processed --
+	 * finally netvsc_subchan_work() hangs for ever.
+	 */
+	rtnl_lock();
+
 	if (nvdev->num_chn > 1)
 		schedule_work(&nvdev->subchan_work);
 
@@ -2224,7 +2234,6 @@ static int netvsc_probe(struct hv_device *dev,
 	else
 		net->max_mtu = ETH_DATA_LEN;
 
-	rtnl_lock();
 	ret = register_netdevice(net);
 	if (ret != 0) {
 		pr_err("Unable to register netdev.\n");
-- 
2.7.4


* [PATCH net-next] net/ncsi: remove duplicated include from ncsi-netlink.c
From: YueHaibing @ 2018-08-30  1:29 UTC (permalink / raw)
  To: Samuel Mendoza-Jonas, David S. Miller; +Cc: YueHaibing, netdev, kernel-janitors

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 net/ncsi/ncsi-netlink.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 45f33d6..32cb775 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -12,7 +12,6 @@
 #include <linux/if_arp.h>
 #include <linux/rtnetlink.h>
 #include <linux/etherdevice.h>
-#include <linux/module.h>
 #include <net/genetlink.h>
 #include <net/ncsi.h>
 #include <linux/skbuff.h>


* Re: [PATCH 1/2] dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle
From: Andrew Lunn @ 2018-08-30  1:18 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Grygorii Strashko, David Miller, netdev, linux-omap, devicetree,
	Ivan Khoronzhuk, Mark Rutland, Murali Karicheri, Rob Herring
In-Reply-To: <20180830004745.GU7523@atomide.com>

> In the long run cpsw should be really treated as an
> interconnect instance with its control module providing
> standard Linux framework services such as clock /
> regulator / phy / pinctrl / iio whatever for the other
> modules.

Some of us have been applying pressure for a new driver. This sounds
like another argument for such a re-write.

	   Andrew


* Re: [PATCH net] net/sched: act_pedit: fix dump of extended layered op
From: David Miller @ 2018-08-30  1:12 UTC (permalink / raw)
  To: dcaratti; +Cc: jhs, xiyou.wangcong, netdev, amir
In-Reply-To: <69b7819408d62dc49aa242c239fec558fa9acd8d.1535403029.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Mon, 27 Aug 2018 22:56:22 +0200

> in the (rare) case of failure in nla_nest_start(), missing NULL checks in
> tcf_pedit_key_ex_dump() can make the following command
> 
>  # tc action add action pedit ex munge ip ttl set 64
> 
> dereference a NULL pointer:
 ...
> Like it's done for other TC actions, give up dumping pedit rules and return
> an error if nla_nest_start() returns NULL.
> 
> Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied and queued up for -stable, thanks.


* Re: [PATCH v2] sh_eth: Add R7S9210 support
From: David Miller @ 2018-08-30  1:10 UTC (permalink / raw)
  To: chris.brandt
  Cc: sergei.shtylyov, robh+dt, mark.rutland, netdev, devicetree,
	linux-renesas-soc, horms+renesas
In-Reply-To: <20180827174202.80750-1-chris.brandt@renesas.com>

From: Chris Brandt <chris.brandt@renesas.com>
Date: Mon, 27 Aug 2018 12:42:02 -0500

> Add support for the R7S9210 which is part of the RZ/A2 series.
> 
> Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
> ---
> v2:
>  * Use sh_eth_offset_fast_sh4 instead of sh_eth_offset_fast_rza2
>  * Use sh_eth_set_rate_rcar instead of sh_eth_set_rate_r7s9210()
>  * Removed enum SH_ETH_REG_FAST_RZA2

Applied.


* Re: [PATCH] r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices
From: David Miller @ 2018-08-30  1:07 UTC (permalink / raw)
  To: a3at.mail; +Cc: netdev, hkallweit1, nic_swsd
In-Reply-To: <20180826140309.32310-1-a3at.mail@gmail.com>

From: Azat Khuzhin <a3at.mail@gmail.com>
Date: Sun, 26 Aug 2018 17:03:09 +0300

> I have two Ethernet adapters:
>   r8169 0000:03:01.0 eth0: RTL8169sb/8110sb, 00:14:d1:14:2d:49, XID 10000000, IRQ 18
>   r8169 0000:01:00.0 eth0: RTL8168e/8111e, 64:66:b3:11:14:5d, XID 2c200000, IRQ 30
> And after upgrading from linux 4.15 [1] to linux 4.18+ [2] RTL8169sb failed to
> receive any packets. tcpdump shows a lot of checksum mismatch.
> 
>   [1]: a0f79386a4968b4925da6db2d1daffd0605a4402
>   [2]: 0519359784328bfa92bf0931bf0cff3b58c16932 (4.19 merge window opened)
> 
> I started bisecting and found that [3] breaks it. According to [4]:
>   "For 8110S, 8110SB, and 8110SC series, the initial value of RxConfig
>   needs to be set after the tx/rx is enabled."
> So I moved rtl_init_rxcfg() after enabling tx/rx and now my adapter works
> (RTL8168e works too).
> 
>   [3]: 3559d81e76bfe3803e89f2e04cf6ef7ab4f3aace
>   [4]: e542a2269f232d61270ceddd42b73a4348dee2bb ("r8169: adjust the RxConfig
> settings.")
> 
> Also drop "rx" from rtl_set_rx_tx_config_registers(), since it does nothing
> with it already.
> 
> Fixes: 3559d81e76bfe3803e89f2e04cf6ef7ab4f3aace ("r8169: simplify
> rtl_hw_start_8169")
> 
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
> Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>

Applied and queued up for -stable.


* Re: [Patch net] tipc: switch to rhashtable iterator
From: David Miller @ 2018-08-30  1:05 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, tipc-discussion, jon.maloy, ying.xue
In-Reply-To: <20180824192806.32005-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Fri, 24 Aug 2018 12:28:06 -0700

> syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
> where tipc_group_fill_sock_diag() still reads tsk->group while
> tipc_group_delete() deletes it in tipc_release().
> 
> tipc_nl_sk_walk() aims to lock this sock when walking each sock
> in the hash table to close race conditions with sock changes like
> this one, by acquiring the tsk->sk.sk_lock.slock spinlock. Unfortunately
> this doesn't work at all. All non-BH call paths should take
> lock_sock() instead to make it work.
> 
> tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
> where the RCU read lock is required; this is the reason why lock_sock()
> can't be taken on this path. This could be resolved by switching to
> the rhashtable iterator APIs, where taking a sleepable lock is possible.
> Also, the iterator APIs are friendly for restartable calls like
> diag dump: the last position is remembered behind the scenes,
> and all we need to do here is save the iterator into cb->args[].
> 
> I tested this with parallel tipc diag dump and thousands of tipc
> socket creation and release, no crash or memory leak.
> 
> Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
> Cc: Jon Maloy <jon.maloy@ericsson.com>
> Cc: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable, thanks Cong.
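
The "restartable dump" idea can be sketched in Python by keeping an iterator in a per-dump state dict, loosely analogous to saving the iterator into cb->args[]. This is not the rhashtable API: dump_batch(), cb_args and the budget parameter are made up for illustration, and locking is not modelled.

```python
from itertools import islice


def dump_batch(sockets, cb_args, budget):
    """Return up to `budget` entries, resuming where the last call stopped.

    The iterator lives in cb_args, so the position survives between calls
    without the caller tracking an index.
    """
    it = cb_args.get("iter")
    if it is None:
        it = cb_args["iter"] = iter(sockets)
    return list(islice(it, budget))


socks = ["sk0", "sk1", "sk2", "sk3", "sk4"]  # stand-ins for tipc sockets
cb = {}
batches = [dump_batch(socks, cb, 2) for _ in range(4)]
```

Each call picks up where the previous one stopped, and an empty batch signals that the walk is complete.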


* Re: [Patch net] tipc: fix a missing rhashtable_walk_exit()
From: David Miller @ 2018-08-30  0:59 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, herbert, ying.xue
In-Reply-To: <20180823231944.4959-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Thu, 23 Aug 2018 16:19:44 -0700

> rhashtable_walk_exit() must be paired with rhashtable_walk_enter().
> 
> Fixes: 40f9f4397060 ("tipc: Fix tipc_sk_reinit race conditions")
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable, thanks Cong.


* Re: [PATCH net] vti6: remove !skb->ignore_df check from vti6_xmit()
From: David Miller @ 2018-08-30  0:52 UTC (permalink / raw)
  To: alexey.kodanev; +Cc: netdev, steffen.klassert
In-Reply-To: <1535042994-27225-1-git-send-email-alexey.kodanev@oracle.com>

From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Thu, 23 Aug 2018 19:49:54 +0300

> Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting
> on xmit") '!skb->ignore_df' check was always true because the function
> skb_scrub_packet() was called before it, resetting ignore_df to zero.
> 
> In the commit, skb_scrub_packet() was moved below, and now this check
> can be false for the packet, e.g. when sending it in the two fragments,
> this prevents successful PMTU updates in such case. The next attempts
> to send the packet lead to the same tx error. Moreover, vti6 initial
> MTU value relies on PMTU adjustments.
> 
> This issue can be reproduced with the following LTP test script:
>     udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000
> 
> Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>

Applied and queued up for -stable, thank you.


* Re: [PATCH 1/2] dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle
From: Tony Lindgren @ 2018-08-30  0:47 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David Miller, netdev, linux-omap, devicetree, Andrew Lunn,
	Ivan Khoronzhuk, Mark Rutland, Murali Karicheri, Rob Herring
In-Reply-To: <90e0f25f-45f5-b2c0-59d9-cdf25eb06c0c@ti.com>

* Grygorii Strashko <grygorii.strashko@ti.com> [180830 00:12]:
> Hi Tony,
> 
> On 08/29/2018 10:00 AM, Tony Lindgren wrote:
> > The current cpsw usage for cpsw-phy-sel is undocumented but is used for
> > all the boards using cpsw. And cpsw-phy-sel is not really a child of
> > the cpsw device, it lives in the system control module instead.
> > 
> > Let's document the existing usage, and improve it a bit where we prefer
> > to use a phandle instead of a child device for it. That way we can
> > properly describe the hardware in dts files for things like genpd.
> 
> I'm ok with this series, but I really don't like cpsw-phy-sel in general.

Yeah, this binding predates any standards. This series
only fixes the nasty issue of cpsw claiming a module as a
child that's outside its IO range.

> It was introduced long time back and now I'm thinking about possibility to replace it with
> one of current generic interfaces - for example mux-controller. 
> Each port will control up to 3 muxes (port mode, idmode and rmii_ext_clk) and  
> transform phy-mode => mux states.
> What do you think?

Sure a mux-controller here makes sense.

> Another option is to use phy, but it'd be complicated.

For the port muxes, how about a phy driver just using
a pinctrl driver?

In general, it seems cpsw is just an interconnect instance
(L4_FAST) with a control module (CPSW_WR) and a pile of
independent other modules. That's described nicely in
am437x TRM chapter "2.1.4 L4 Fast Peripheral Memory Map".
So from that point of view the binding reg entries right
now are all wrong :)

In the long run cpsw should be really treated as an
interconnect instance with its control module providing
standard Linux framework services such as clock /
regulator / phy / pinctrl / iio whatever for the other
modules.

Just my 2c based on looking at the interconnect, I'm
not too familiar with cpsw otherwise.

Regards,

Tony


* Re: [PATCH net] ebpf: fix bpf_msg_pull_data
From: Tushar Dave @ 2018-08-30  0:21 UTC (permalink / raw)
  To: john.fastabend, ast, daniel, davem, netdev; +Cc: sowmini.varadhan
In-Reply-To: <1535587649-11742-1-git-send-email-tushar.n.dave@oracle.com>



On 08/29/2018 05:07 PM, Tushar Dave wrote:
> While doing some preliminary testing it is found that bpf helper
> bpf_msg_pull_data does not calculate the data and data_end offset
> correctly. Fix it!
> 
> Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
> Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
>   net/core/filter.c | 38 +++++++++++++++++++++++++-------------
>   1 file changed, 25 insertions(+), 13 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c25eb36..3eeb3d6 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2285,7 +2285,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>   BPF_CALL_4(bpf_msg_pull_data,
>   	   struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
>   {
> -	unsigned int len = 0, offset = 0, copy = 0;
> +	unsigned int len = 0, offset = 0, copy = 0, off = 0;
>   	struct scatterlist *sg = msg->sg_data;
>   	int first_sg, last_sg, i, shift;
>   	unsigned char *p, *to, *from;
> @@ -2299,22 +2299,30 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>   	i = msg->sg_start;
>   	do {
>   		len = sg[i].length;
> -		offset += len;
>   		if (start < offset + len)
>   			break;
> +		offset += len;
>   		i++;
>   		if (i == MAX_SKB_FRAGS)
>   			i = 0;
> -	} while (i != msg->sg_end);
> +	} while (i <= msg->sg_end);
>   
> +	/* return error if start is out of range */
>   	if (unlikely(start >= offset + len))
>   		return -EINVAL;
>   
> -	if (!msg->sg_copy[i] && bytes <= len)
> -		goto out;
> +	/* return error if i is last entry in sglist and end is out of range */
> +	if (msg->sg_copy[i] && end > offset + len)
> +		return -EINVAL;
>   
>   	first_sg = i;
>   
> +	/* if i is not last entry in sg list and end (i.e start + bytes) is
> +	 * within this sg[i] then goto out and calculate data and data_end
> +	 */
> +	if (!msg->sg_copy[i] && end <= offset + len)
> +		goto out;
> +
>   	/* At this point we need to linearize multiple scatterlist
>   	 * elements or a single shared page. Either way we need to
>   	 * copy into a linear buffer exclusively owned by BPF. Then
> @@ -2330,9 +2338,14 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>   		i++;
>   		if (i == MAX_SKB_FRAGS)
>   			i = 0;
> -		if (bytes < copy)
> +		if (end < copy)
>   			break;
> -	} while (i != msg->sg_end);
> +	} while (i <= msg->sg_end);
> +
> +	/* return error if i is last entry in sglist and end is out of range */
> +	if (i > msg->sg_end && end > offset + copy)
> +		return -EINVAL;
> +
>   	last_sg = i;
>   
>   	if (unlikely(copy < end - start))
> @@ -2342,23 +2355,22 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>   	if (unlikely(!page))
>   		return -ENOMEM;
>   	p = page_address(page);
> -	offset = 0;
>   
>   	i = first_sg;
>   	do {
>   		from = sg_virt(&sg[i]);
>   		len = sg[i].length;
> -		to = p + offset;
> +		to = p + off;
>   
>   		memcpy(to, from, len);
> -		offset += len;
> +		off += len;
>   		sg[i].length = 0;
>   		put_page(sg_page(&sg[i]));
>   
>   		i++;
>   		if (i == MAX_SKB_FRAGS)
>   			i = 0;
> -	} while (i != last_sg);
> +	} while (i < last_sg);
>   
>   	sg[first_sg].length = copy;
>   	sg_set_page(&sg[first_sg], page, copy, 0);
> @@ -2380,7 +2392,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>   		else
>   			move_from = i + shift;
>   
> -		if (move_from == msg->sg_end)
> +		if (move_from > msg->sg_end)
>   			break;
>   
>   		sg[i] = sg[move_from];
> @@ -2396,7 +2408,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>   	if (msg->sg_end < 0)
>   		msg->sg_end += MAX_SKB_FRAGS;
>   out:
> -	msg->data = sg_virt(&sg[i]) + start - offset;
> +	msg->data = sg_virt(&sg[first_sg]) + start - offset;
>   	msg->data_end = msg->data + bytes;
>   
>   	return 0;
> 

Please discard this patch. I just noticed that Daniel Borkmann sent 
some similar fixes for bpf_msg_pull_data.


-Tushar
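
For reference, the offset bookkeeping the quoted diff is after (check first, then advance the running offset) looks like this in a Python model. It is a sketch only: locate() is a made-up name, the scatterlist is reduced to a list of lengths, and the ring wrap at MAX_SKB_FRAGS is ignored.

```python
def locate(lengths, start):
    """Find the element that contains byte `start`.

    Returns (index, offset), where offset is the number of bytes that come
    before that element, or None if start is past the end of the data.
    """
    offset = 0
    for i, length in enumerate(lengths):
        # Compare against the end of this element before advancing, so
        # offset always means "bytes before element i" when we stop here.
        if start < offset + length:
            return i, offset
        offset += length
    return None


lengths = [100, 50, 200]          # three elements, 350 bytes in total
hit = locate(lengths, 120)        # byte 120 sits inside the second element
data_off = 120 - hit[1]           # position of the byte within that element
```

The value start - offset is what the diff adds to sg_virt() of the first element to form msg->data.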


* [PATCH net-next] net: dsa: mv88e6xxx: Share main switch IRQ
From: Marek Behún @ 2018-08-30  0:13 UTC (permalink / raw)
  To: netdev; +Cc: Marek Behún

On some boards the interrupt can be shared between multiple devices.
For example on Turris Mox the interrupt is shared between all switches.

Signed-off-by: Marek Behun <marek.behun@nic.cz>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8da3d39e3218..b57f5403982a 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -434,7 +434,7 @@ static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 
 	err = request_threaded_irq(chip->irq, NULL,
 				   mv88e6xxx_g1_irq_thread_fn,
-				   IRQF_ONESHOT,
+				   IRQF_ONESHOT | IRQF_SHARED,
 				   dev_name(chip->dev), chip);
 	if (err)
 		mv88e6xxx_g1_irq_free_common(chip);
-- 
2.16.4


* Re: KASAN: stack-out-of-bounds Read in __schedule
From: Dmitry Vyukov @ 2018-08-30  4:11 UTC (permalink / raw)
  To: Alexander Potapenko, Alexei Starovoitov, Daniel Borkmann, netdev
  Cc: Jan Kara, syzbot+45a34334c61a8ecf661d, Jan Kara, linux-ext4, LKML,
	syzkaller-bugs, Theodore Ts'o
In-Reply-To: <CAG_fn=VMcHaGrFs_YvtBttmfuz8Rr1C_dfmKRibvPrbqeKFGCA@mail.gmail.com>

On Wed, Aug 29, 2018 at 7:03 AM, 'Alexander Potapenko' via
syzkaller-bugs <syzkaller-bugs@googlegroups.com> wrote:
> On Wed, Aug 29, 2018 at 3:46 PM Jan Kara <jack@suse.cz> wrote:
>>
>> On Tue 28-08-18 08:30:02, syzbot wrote:
>> > Hello,
>> >
>> > syzbot found the following crash on:
>> >
>> > HEAD commit:    5b394b2ddf03 Linux 4.19-rc1
>> > git tree:       upstream
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=14f4d8e1400000
>> > kernel config:  https://syzkaller.appspot.com/x/.config?x=49927b422dcf0b29
>> > dashboard link: https://syzkaller.appspot.com/bug?extid=45a34334c61a8ecf661d
>> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13127e5a400000
>> >
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: syzbot+45a34334c61a8ecf661d@syzkaller.appspotmail.com
>> >
>> > IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
>> > IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
>> > IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
>> > 8021q: adding VLAN 0 to HW filter on device team0
>> > ==================================================================
>> > BUG: KASAN: stack-out-of-bounds in schedule_debug kernel/sched/core.c:3285
>> > [inline]
>> > BUG: KASAN: stack-out-of-bounds in __schedule+0x1977/0x1df0
>> > kernel/sched/core.c:3395
>> > Read of size 8 at addr ffff8801ad090000 by task syz-executor0/4718
>>
>> Weird, can you please help me decipher this? So here KASAN complains about
>> wrong memory access in the scheduler.


This looks like a result of a previous bad silent memory corruption.

The KASAN report says there is a stack out-of-bounds in the scheduler. And
that is followed by a slab corruption report in another task.

fs/jbd2/transaction.c happens to be the first meaningful file in this
crash, and so that's where it is attributed to.

Rerunning the reproducer several times can maybe give some better
clues, or maybe not, maybe they all will look equally puzzling.

This part of the repro looks familiar:

r1 = bpf$MAP_CREATE(0x0, &(0x7f0000002e40)={0x12, 0x0, 0x4, 0x6e, 0x0,
0x1}, 0x68)
bpf$MAP_UPDATE_ELEM(0x2, &(0x7f0000000180)={r1, &(0x7f0000000000),
&(0x7f0000000140)}, 0x20)

We had exactly such consequences of a bug in bpf map very recently,
but that was claimed to be fixed. Maybe not completely?
+bpf maintainers



> Most certainly the following code:
>
>   #ifdef CONFIG_SCHED_STACK_END_CHECK
>     if (task_stack_end_corrupted(prev))
>       panic("corrupted stack end detected inside scheduler\n");
>   #endif
>
> in schedule_debug() triggers the KASAN report.
> I guess we must disable CONFIG_SCHED_STACK_END_CHECK for KASAN builds.
>
>> However the stacktrace below shows a
>> problem in find_stack() function called by KASAN?
> For some reason the stackdepot hash table is corrupted. Looks like a
> separate issue.
>> And this does not seem to
>> be fs related at all? Also the reproducer has no sign of any filesystem
>> related activity...
>>
>>                                                                 Honza
>>
>> > CPU: 0 PID: 4718 Comm: syz-executor0 Not tainted 4.19.0-rc1+ #211
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > Call Trace:
>> >
>> > The buggy address belongs to the page:
>> > page:ffffea0006b42400 count:1 mapcount:-512 mapping:0000000000000000
>> > index:0x0
>> > flags: 0x2fffc0000000000()
>> > raw: 02fffc0000000000 dead000000000100 dead000000000200 0000000000000000
>> > raw: 0000000000000000 0000000000000000 00000001fffffdff ffff8801d29544c0
>> > page dumped because: kasan: bad access detected
>> > page->mem_cgroup:ffff8801d29544c0
>> >
>> > Memory state around the buggy address:
>> >  ffff8801ad08ff00: f2 f2 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00 00 00
>> >  ffff8801ad08ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1
>> > > ffff8801ad090000: f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 f2
>> >                    ^
>> >  ffff8801ad090080: f2 f2 f2 00 f2 f2 f2 00 00 00 00 00 00 00 00 00
>> >  ffff8801ad090100: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
>> > ==================================================================
>> > Kernel panic - not syncing: panic_on_warn set ...
>> >
>> > BUG: unable to handle kernel paging request at 0000000100000007
>> > PGD 1b34a2067 P4D 1b34a2067 PUD 0
>> > Oops: 0000 [#1] SMP KASAN
>> > CPU: 1 PID: 4325 Comm: rs:main Q:Reg Tainted: G    B             4.19.0-rc1+
>> > #211
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > RIP: 0010:find_stack lib/stackdepot.c:188 [inline]
>> > RIP: 0010:depot_save_stack+0x120/0x470 lib/stackdepot.c:238
>> > Code: 0f 00 4e 8b 24 f5 e0 db ae 89 4d 85 e4 0f 84 d4 00 00 00 44 8d 47 ff
>> > 49 c1 e0 03 eb 0d 4d 8b 24 24 4d 85 e4 0f 84 bd 00 00 00 <41> 39 5c 24 08 75
>> > ec 41 3b 7c 24 0c 75 e5 48 8b 01 49 39 44 24 18
>> > RSP: 0018:ffff8801b2636f40 EFLAGS: 00010006
>> > RAX: 0000000084727a0d RBX: 00000000222ca320 RCX: ffff8801b2636fa0
>> > RDX: 000000004e510a9d RSI: 0000000000400000 RDI: 0000000000000012
>> > RBP: ffff8801b2636f78 R08: 0000000000000088 R09: 00000000dcf06c78
>> > R10: 00000000ecfd654a R11: ffff8801db1236f3 R12: 00000000ffffffff
>> > R13: ffff8801b2636f88 R14: 00000000000ca320 R15: ffff8801b2a72680
>> > FS:  00007ff2eb061700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
>> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: 0000000100000007 CR3: 00000001b4fdd000 CR4: 00000000001406e0
>> > Call Trace:
>> >  save_stack+0xa9/0xd0 mm/kasan/kasan.c:454
>> >  set_track mm/kasan/kasan.c:460 [inline]
>> >  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
>> >  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
>> >  __cache_free mm/slab.c:3498 [inline]
>> >  kmem_cache_free+0x86/0x280 mm/slab.c:3756
>> >  jbd2_free_handle include/linux/jbd2.h:1426 [inline]
>> >  jbd2_journal_stop+0x443/0x1600 fs/jbd2/transaction.c:1787
>> >  __ext4_journal_stop+0xde/0x1f0 fs/ext4/ext4_jbd2.c:103
>> >  ext4_dirty_inode+0xab/0xc0 fs/ext4/inode.c:6027
>> >  __mark_inode_dirty+0x760/0x1300 fs/fs-writeback.c:2129
>> >  generic_update_time+0x26a/0x450 fs/inode.c:1651
>> >  update_time fs/inode.c:1667 [inline]
>> >  file_update_time+0x390/0x640 fs/inode.c:1877
>> >  __generic_file_write_iter+0x1dc/0x630 mm/filemap.c:3214
>> >  ext4_file_write_iter+0x390/0x1450 fs/ext4/file.c:266
>> >  call_write_iter include/linux/fs.h:1807 [inline]
>> >  new_sync_write fs/read_write.c:474 [inline]
>> >  __vfs_write+0x6af/0x9d0 fs/read_write.c:487
>> >  vfs_write+0x1fc/0x560 fs/read_write.c:549
>> >  ksys_write+0x101/0x260 fs/read_write.c:598
>> >  __do_sys_write fs/read_write.c:610 [inline]
>> >  __se_sys_write fs/read_write.c:607 [inline]
>> >  __x64_sys_write+0x73/0xb0 fs/read_write.c:607
>> >  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> > RIP: 0033:0x7ff2ecabf19d
>> > Code: d1 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48
>> > 83 ec 08 e8 be fa ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89
>> > c2 e8 07 fb ff ff 48 89 d0 48 83 c4 08 48 3d 01
>> > RSP: 002b:00007ff2eb05ff90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
>> > RAX: ffffffffffffffda RBX: 0000000000000400 RCX: 00007ff2ecabf19d
>> > RDX: 0000000000000400 RSI: 0000000002089a90 RDI: 0000000000000005
>> > RBP: 0000000002089a90 R08: 00000000020d9e00 R09: 656c6c616b7a7973
>> > R10: 6c656e72656b2072 R11: 0000000000000293 R12: 0000000000000000
>> > R13: 00007ff2eb060410 R14: 00000000020d9e00 R15: 0000000002089890
>> > Modules linked in:
>> > Dumping ftrace buffer:
>> >    (ftrace buffer empty)
>> > CR2: 0000000100000007
>> > ---[ end trace fbf1ba842de6c894 ]---
>> > RIP: 0010:find_stack lib/stackdepot.c:188 [inline]
>> > RIP: 0010:depot_save_stack+0x120/0x470 lib/stackdepot.c:238
>> > Code: 0f 00 4e 8b 24 f5 e0 db ae 89 4d 85 e4 0f 84 d4 00 00 00 44 8d 47 ff
>> > 49 c1 e0 03 eb 0d 4d 8b 24 24 4d 85 e4 0f 84 bd 00 00 00 <41> 39 5c 24 08 75
>> > ec 41 3b 7c 24 0c 75 e5 48 8b 01 49 39 44 24 18
>> > RSP: 0018:ffff8801b2636f40 EFLAGS: 00010006
>> > RAX: 0000000084727a0d RBX: 00000000222ca320 RCX: ffff8801b2636fa0
>> > RDX: 000000004e510a9d RSI: 0000000000400000 RDI: 0000000000000012
>> > RBP: ffff8801b2636f78 R08: 0000000000000088 R09: 00000000dcf06c78
>> > R10: 00000000ecfd654a R11: ffff8801db1236f3 R12: 00000000ffffffff
>> > R13: ffff8801b2636f88 R14: 00000000000ca320 R15: ffff8801b2a72680
>> > FS:  00007ff2eb061700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
>> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: 0000000100000007 CR3: 00000001b4fdd000 CR4: 00000000001406e0
>> > Shutting down cpus with NMI
>> > Dumping ftrace buffer:
>> >    (ftrace buffer empty)
>> > Kernel Offset: disabled
>> > Rebooting in 86400 seconds..
>> >
>> >
>> > ---
>> > This bug is generated by a bot. It may contain errors.
>> > See https://goo.gl/tpsmEJ for more information about syzbot.
>> > syzbot engineers can be reached at syzkaller@googlegroups.com.
>> >
>> > syzbot will keep track of this bug report. See:
>> > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
>> > syzbot.
>> > syzbot can test patches for this bug, for details see:
>> > https://goo.gl/tpsmEJ#testing-patches
>> >
>> --
>> Jan Kara <jack@suse.com>
>> SUSE Labs, CR
>>
>> --
>> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180829134620.GD7369%40quack2.suse.cz.
>> For more options, visit https://groups.google.com/d/optout.
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>

^ permalink raw reply

* Re: [PATCH bpf-next 00/11] AF_XDP zero-copy support for i40e
From: William Tu @ 2018-08-30  0:10 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Björn Töpel, Karlsson, Magnus, magnus.karlsson,
	Alexander Duyck, Alexander Duyck, Alexei Starovoitov,
	Jesper Dangaard Brouer, Linux Kernel Network Developers,
	Brandeburg, Jesse, Anjali Singhai Jain, peter.waskiewicz.jr,
	Björn Töpel, michael.lundkvist, Willem de Bruijn,
	John Fastabend, Jakub Kicinski, neerav.parikh, mykyta.iziumtsev
In-Reply-To: <1c26d5c6-cd37-b022-fe34-1ca48ada598f@iogearbox.net>

> Thanks for working on this, LGTM! Are you also planning to get ixgbe
> out after that?
>

I currently don't have an i40e NIC to test with, so
I'm also looking forward to the ixgbe patch!

Thank you
William

^ permalink raw reply

* Re: [PATCH 1/2] dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle
From: Grygorii Strashko @ 2018-08-30  0:08 UTC (permalink / raw)
  To: Tony Lindgren, David Miller
  Cc: netdev, linux-omap, devicetree, Andrew Lunn, Ivan Khoronzhuk,
	Mark Rutland, Murali Karicheri, Rob Herring
In-Reply-To: <20180829150024.43210-1-tony@atomide.com>

Hi Tony,

On 08/29/2018 10:00 AM, Tony Lindgren wrote:
> The current cpsw usage for cpsw-phy-sel is undocumented but is used for
> all the boards using cpsw. And cpsw-phy-sel is not really a child of
> the cpsw device, it lives in the system control module instead.
> 
> Let's document the existing usage, and improve it a bit where we prefer
> to use a phandle instead of a child device for it. That way we can
> properly describe the hardware in dts files for things like genpd.

I'm ok with this series, but I really don't like cpsw-phy-sel in general.

It was introduced a long time ago, and I'm now thinking about replacing it
with one of the current generic interfaces, for example mux-controller.
Each port would control up to 3 muxes (port mode, idmode and rmii_ext_clk) and
translate phy-mode => mux states.
What do you think?

Another option is to use phy, but it'd be complicated.

> 
> Cc: devicetree@vger.kernel.org
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Murali Karicheri <m-karicheri2@ti.com>
> Cc: Rob Herring <robh+dt@kernel.org>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
>   Documentation/devicetree/bindings/net/cpsw.txt | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
> --- a/Documentation/devicetree/bindings/net/cpsw.txt
> +++ b/Documentation/devicetree/bindings/net/cpsw.txt
> @@ -19,6 +19,10 @@ Required properties:
>   - slaves		: Specifies number for slaves
>   - active_slave		: Specifies the slave to use for time stamping,
>   			  ethtool and SIOCGMIIPHY
> +- cpsw-phy-sel		: Specifies the phandle to the CPSW phy mode selection
> +			  device. See also cpsw-phy-sel.txt for it's binding.
> +			  Note that in legacy cases cpsw-phy-sel may be
> +			  a child device instead of a phandle.
>   
>   Optional properties:
>   - ti,hwmods		: Must be "cpgmac0"
> @@ -75,6 +79,7 @@ Examples:
>   		cpts_clock_mult = <0x80000000>;
>   		cpts_clock_shift = <29>;
>   		syscon = <&cm>;
> +		cpsw-phy-sel = <&phy_sel>;
>   		cpsw_emac0: slave@0 {
>   			phy_id = <&davinci_mdio>, <0>;
>   			phy-mode = "rgmii-txid";
> @@ -103,6 +108,7 @@ Examples:
>   		cpts_clock_mult = <0x80000000>;
>   		cpts_clock_shift = <29>;
>   		syscon = <&cm>;
> +		cpsw-phy-sel = <&phy_sel>;
>   		cpsw_emac0: slave@0 {
>   			phy_id = <&davinci_mdio>, <0>;
>   			phy-mode = "rgmii-txid";
> 

-- 
regards,
-grygorii

^ permalink raw reply

* [PATCH net] ebpf: fix bpf_msg_pull_data
From: Tushar Dave @ 2018-08-30  0:07 UTC (permalink / raw)
  To: john.fastabend, ast, daniel, davem, netdev; +Cc: sowmini.varadhan

During preliminary testing it was found that the bpf helper
bpf_msg_pull_data does not calculate the data and data_end offsets
correctly. Fix it.

Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/core/filter.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index c25eb36..3eeb3d6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2285,7 +2285,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 BPF_CALL_4(bpf_msg_pull_data,
 	   struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
 {
-	unsigned int len = 0, offset = 0, copy = 0;
+	unsigned int len = 0, offset = 0, copy = 0, off = 0;
 	struct scatterlist *sg = msg->sg_data;
 	int first_sg, last_sg, i, shift;
 	unsigned char *p, *to, *from;
@@ -2299,22 +2299,30 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 	i = msg->sg_start;
 	do {
 		len = sg[i].length;
-		offset += len;
 		if (start < offset + len)
 			break;
+		offset += len;
 		i++;
 		if (i == MAX_SKB_FRAGS)
 			i = 0;
-	} while (i != msg->sg_end);
+	} while (i <= msg->sg_end);
 
+	/* return error if start is out of range */
 	if (unlikely(start >= offset + len))
 		return -EINVAL;
 
-	if (!msg->sg_copy[i] && bytes <= len)
-		goto out;
+	/* return error if i is last entry in sglist and end is out of range */
+	if (msg->sg_copy[i] && end > offset + len)
+		return -EINVAL;
 
 	first_sg = i;
 
+	/* if i is not last entry in sg list and end (i.e start + bytes) is
+	 * within this sg[i] then goto out and calculate data and data_end
+	 */
+	if (!msg->sg_copy[i] && end <= offset + len)
+		goto out;
+
 	/* At this point we need to linearize multiple scatterlist
 	 * elements or a single shared page. Either way we need to
 	 * copy into a linear buffer exclusively owned by BPF. Then
@@ -2330,9 +2338,14 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 		i++;
 		if (i == MAX_SKB_FRAGS)
 			i = 0;
-		if (bytes < copy)
+		if (end < copy)
 			break;
-	} while (i != msg->sg_end);
+	} while (i <= msg->sg_end);
+
+	/* return error if i is last entry in sglist and end is out of range */
+	if (i > msg->sg_end && end > offset + copy)
+		return -EINVAL;
+
 	last_sg = i;
 
 	if (unlikely(copy < end - start))
@@ -2342,23 +2355,22 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 	if (unlikely(!page))
 		return -ENOMEM;
 	p = page_address(page);
-	offset = 0;
 
 	i = first_sg;
 	do {
 		from = sg_virt(&sg[i]);
 		len = sg[i].length;
-		to = p + offset;
+		to = p + off;
 
 		memcpy(to, from, len);
-		offset += len;
+		off += len;
 		sg[i].length = 0;
 		put_page(sg_page(&sg[i]));
 
 		i++;
 		if (i == MAX_SKB_FRAGS)
 			i = 0;
-	} while (i != last_sg);
+	} while (i < last_sg);
 
 	sg[first_sg].length = copy;
 	sg_set_page(&sg[first_sg], page, copy, 0);
@@ -2380,7 +2392,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 		else
 			move_from = i + shift;
 
-		if (move_from == msg->sg_end)
+		if (move_from > msg->sg_end)
 			break;
 
 		sg[i] = sg[move_from];
@@ -2396,7 +2408,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 	if (msg->sg_end < 0)
 		msg->sg_end += MAX_SKB_FRAGS;
 out:
-	msg->data = sg_virt(&sg[i]) + start - offset;
+	msg->data = sg_virt(&sg[first_sg]) + start - offset;
 	msg->data_end = msg->data + bytes;
 
 	return 0;
-- 
1.8.3.1
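
For readers following the first hunk, here is a minimal userspace sketch of
the corrected search for the scatterlist element that contains `start`. It
is not the kernel code: find_sg_index() and the plain array of lengths are
hypothetical stand-ins for the scatterlist, and the ring wraparound at
MAX_SKB_FRAGS is left out. The point it illustrates is the reordering in
the patch: the range check runs before `offset += len`, so on exit `offset`
holds the number of bytes that precede the matching element.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * lens[i] plays the role of sg[i].length.
 * Returns the index of the element containing byte 'start', or -1 if
 * 'start' is out of range. On success *offset_out is the number of
 * bytes in all elements before the returned one.
 */
static int find_sg_index(const unsigned int *lens, int n,
			 unsigned int start, unsigned int *offset_out)
{
	unsigned int offset = 0;
	int i;

	for (i = 0; i < n; i++) {
		/* Check [offset, offset + len) first ... */
		if (start < offset + lens[i]) {
			*offset_out = offset;
			return i;
		}
		/* ... and only then account for this element. */
		offset += lens[i];
	}
	return -1;
}
```

With lengths {100, 200, 50} and start = 150 this returns index 1 with an
offset of 100, so the requested byte sits 50 bytes into that element, which
corresponds to the `start - offset` term used when msg->data is computed.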

^ permalink raw reply related

* Re: [PATCH 0/5] net: mvneta: some bug fix and trivial improvement
From: Jisheng Zhang @ 2018-08-30  3:53 UTC (permalink / raw)
  To: thomas.petazzoni, David S. Miller
  Cc: netdev, Gregory CLEMENT, linux-kernel, linux-arm-kernel
In-Reply-To: <20180829165131.52798cd6@xhacker.debian>

On Wed, 29 Aug 2018 16:51:31 +0800 Jisheng Zhang wrote:

> On Wed, 29 Aug 2018 16:40:24 +0800 Jisheng Zhang wrote:
> 
> > On Wed, 29 Aug 2018 16:25:57 +0800
> > Jisheng Zhang wrote:
> >   
> > > patch1 fixes rx_offset_correction set and usage. Because the
> > > rx_offset_correction is RX packet offset correction for platforms,
> > > it's not related with SW BM, instead, it's only related with the
> > > platform's NET_SKB_PAD.
> > > 
> > > patch2 fixes the wrong function to unmap rx buf    
> > 
> > I have question about the following two commits:
> > 
> > 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor"), it cause
> > a waste, for normal 1500 MTU, before this patch we allocate 1920Bytes for rx
> > after this patch, we always allocate PAGE_SIZE bytes, if PAGE_SIZE=4096, we
> > waste 53% memory for each rx buf. I'm not sure whether the performance
> > improvement deserve the pay.
> > 
> > 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
> > mentions that "With system having a small memory (around 256MB), the state
> > "cannot allocate memory to refill with new buffer" is reach pretty quickly"
> > is it due to the memory waste as said above? Anyway, by this commit, we
> > want to improve the situation on a small memory system, so should we firstly
> > revert commit 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")?

Any comments? 

I now believe the situation is due to the memory waste introduced by
7e47fd84b56b. With Linux 4.18, I limited the available memory on berlin
platforms to 256MB and didn't see "cannot allocate memory to refill with new buffer".
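
As a quick check of the 53% figure quoted below, here is a small sketch of
the arithmetic. waste_percent() is a hypothetical helper written for this
mail, not something in the driver; the inputs are the 1920 bytes allocated
per rx buffer before 7e47fd84b56b and a PAGE_SIZE of 4096.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Percentage of an allocation of alloc_size bytes that goes unused
 * when only used_size bytes are actually needed.
 */
static double waste_percent(size_t used_size, size_t alloc_size)
{
	if (alloc_size == 0 || used_size > alloc_size)
		return 0.0;
	return 100.0 * (double)(alloc_size - used_size) / (double)alloc_size;
}
```

waste_percent(1920, 4096) gives (4096 - 1920) / 4096 = 53.125%, which
matches the figure in the quoted text. With a larger PAGE_SIZE the share
of unused memory per rx buffer would be higher still.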

Thanks

> >   
> 
> If maintainers decide to revert the two commits: 7e47fd84b56b and 562e2f467e71
> then, patch1,2,3 are useless, we can drop them. Only patch4 and patch5 are
> still useful.
> 
> Thanks
> 
> > Any comments are welcome!
> > 
> > Thanks
> > 
> >   
> > > 
> > > patch3 removes the NETIF_F_GRO check ourself, because the net subsystem
> > > will handle it for us.
> > > 
> > > patch4 enables NETIF_F_RXCSUM by default, since the driver and HW
> > > supports the feature.
> > > 
> > > patch5 is a trivial optimization, to reduce smp_processor_id() calling
> > > in mvneta_tx_done_gbe.
> > > 
> > > Jisheng Zhang (5):
> > >   net: mvneta: fix rx_offset_correction set and usage
> > >   net: mvneta: fix the wrong function to unmap rx buf
> > >   net: mvneta: Don't check NETIF_F_GRO ourself
> > >   net: mvneta: enable NETIF_F_RXCSUM by default
> > >   net: mvneta: reduce smp_processor_id() calling in mvneta_tx_done_gbe
> > > 
> > >  drivers/net/ethernet/marvell/mvneta.c | 49 ++++++++++++---------------
> > >  1 file changed, 22 insertions(+), 27 deletions(-)
> > >     
> >   
> 

^ permalink raw reply

