Netdev List
 help / color / mirror / Atom feed
* Re: general protection fault in dev_map_hash_update_elem
From: Jesper Dangaard Brouer @ 2019-09-06 12:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: syzbot, bpf, Daniel Borkmann, Jesper Dangaard Brouer, LKML,
	Network Development, syzkaller-bugs,
	Toke Høiland-Jørgensen
In-Reply-To: <CAADnVQK94boXD8Y=g1LsBtNG4wrYQ0Jnjxhq7hdxvyBKZuPwXw@mail.gmail.com>

On Thu, 5 Sep 2019 14:44:37 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Sep 5, 2019 at 1:08 PM syzbot
> <syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    6d028043 Add linux-next specific files for 20190830
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=135c1a92600000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=82a6bec43ab0cb69
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4e7a85b1432052e8d6f8
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=109124e1600000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com
> >
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 10235 Comm: syz-executor.0 Not tainted 5.3.0-rc6-next-20190830
> > #75
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
> > RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
> > RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
> > RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
> > RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691
> > Code: 48 89 f1 48 89 75 c8 48 c1 e9 03 80 3c 11 00 0f 85 d3 02 00 00 48 b9
> > 00 00 00 00 00 fc ff df 48 8b 53 10 48 89 d6 48 c1 ee 03 <80> 3c 0e 00 0f
> > 85 97 02 00 00 48 85 c0 48 89 02 74 38 48 89 55 b8
> > RSP: 0018:ffff88808d607c30 EFLAGS: 00010046
> > RAX: 0000000000000000 RBX: ffff8880a7f14580 RCX: dffffc0000000000
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a7f14588
> > RBP: ffff88808d607c78 R08: 0000000000000004 R09: ffffed1011ac0f73
> > R10: ffffed1011ac0f72 R11: 0000000000000003 R12: ffff88809f4e9400
> > R13: ffff88809b06ba00 R14: 0000000000000000 R15: ffff88809f4e9528
> > FS:  00007f3a3d50c700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007feb3fcd0000 CR3: 00000000986b9000 CR4: 00000000001406e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >   map_update_elem+0xc82/0x10b0 kernel/bpf/syscall.c:966
> >   __do_sys_bpf+0x8b5/0x3350 kernel/bpf/syscall.c:2854
> >   __se_sys_bpf kernel/bpf/syscall.c:2825 [inline]
> >   __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:2825
> >   do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x459879
> > Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> > 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> > ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007f3a3d50bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
> > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459879
> > RDX: 0000000000000020 RSI: 0000000020000040 RDI: 0000000000000002
> > RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a3d50c6d4
> > R13: 00000000004bfc86 R14: 00000000004d1960 R15: 00000000ffffffff
> > Modules linked in:
> > ---[ end trace 083223e21dbd0ae5 ]---
> > RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
> > RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
> > RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
> > RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
> > RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691  
> 
> Toke,
> please take a look.
> Thanks!

Hi Toke,

I think the problem is that you read:
 old_dev = __dev_map_hash_lookup_elem(map, idx);

Before holding the lock dtab->index_lock... 

I'm not sure this is the correct fix, but I think below change should
solve the issue (not even compile tested):

[bpf-next]$ git diff

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 9af048a932b5..c41854a68e9e 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -664,6 +664,9 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
 
        spin_lock_irqsave(&dtab->index_lock, flags);
 
+       /* Re-read old_dev while holding lock*/
+       old_dev = __dev_map_hash_lookup_elem(map, idx);
+
        if (old_dev) {
                hlist_del_rcu(&old_dev->index_hlist);
        } else {


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply related

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
From: Edward Cree @ 2019-09-06 12:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu
In-Reply-To: <20190906105638.hylw6quhk7t3wff2@salvia>

On 06/09/2019 11:56, Pablo Neira Ayuso wrote:
> On Fri, Sep 06, 2019 at 11:02:18AM +0100, Edward Cree wrote:
>> Still NAK for the same reasons as three versions ago (when it was called
>>  "netfilter: payload mangling offload support"), you've never managed to
>>  explain why this extra API complexity is useful.  (Reducing LOC does not
>>  mean you've reduced complexity.)
> Oh well...
>
> Patch 1) Mask is inverted for no reason, just because tc pedit needs
> this in this way. All drivers reverse this mask.
>
> Patch 2) All drivers mask out meaningless fields in the value field.
To be clear: I have no issue with these two patches; they look fine to me.
(Though I'd like to see some comments on struct flow_action_entry specifying
 the semantics of these fields, especially if they're going to differ from
 the corresponding fields in struct tc_pedit_key.)

> Patch 3) Without this patchset, offsets are on the 32-bit boundary.
> Drivers need to play with the 32-bit mask to infer what field they are
> supposed to mangle... eg. with 32-bit offset alignment, checking if
> the use want to alter the ttl/hop_limit need for helper structures to
> check the 32-bit mask. Mangling a IPv6 address comes with one single
> action...
Drivers are still going to need to handle multiple pedit actions, in
 case the original rule wanted to mangle two non-consecutive fields.
And you can't just coalesce all consecutive mangles, because if you
 mangle two consecutive fields (e.g. UDP sport and dport) the driver
 still needs to disentangle that if it works on a 'fields' (rather
 than 'u32s') level.
So either have the core convert things into named protocol fields
 (i.e. "set src IPv6 to 1234::5 and add 1 to UDP sport"), or leave
 the current sequence-of-u32-mangles as it is.  This in-between "we'll
 coalesce things together despite not knowing what fields they are" is
 neither fish nor fowl.

-Ed

^ permalink raw reply

* Re: [PATCH 2/3] nfp: Drop unnecessary continue in nfp_net_pf_alloc_vnics
From: David Miller @ 2019-09-06 12:58 UTC (permalink / raw)
  To: zhongjiang; +Cc: kvalo, pkshih, netdev, linux-kernel
In-Reply-To: <1567568784-9669-3-git-send-email-zhongjiang@huawei.com>

From: zhong jiang <zhongjiang@huawei.com>
Date: Wed, 4 Sep 2019 11:46:23 +0800

> Continue is not needed at the bottom of a loop.
> 
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net-next v4 1/1] net: openvswitch: Set OvS recirc_id from tc chain index
From: David Miller @ 2019-09-06 12:59 UTC (permalink / raw)
  To: paulb
  Cc: pshelar, netdev, jpettit, simon.horman, marcelo.leitner, vladbu,
	jiri, roid, yossiku, ronye, ozsh
In-Reply-To: <1567605397-14060-2-git-send-email-paulb@mellanox.com>

From: Paul Blakey <paulb@mellanox.com>
Date: Wed,  4 Sep 2019 16:56:37 +0300

> Offloaded OvS datapath rules are translated one to one to tc rules,
> for example the following simplified OvS rule:
> 
> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
> 
> Will be translated to the following tc rule:
> 
> $ tc filter add dev dev1 ingress \
> 	    prio 1 chain 0 proto ip \
> 		flower tcp ct_state -trk \
> 		action ct pipe \
> 		action goto chain 2
> 
> Received packets will first travel though tc, and if they aren't stolen
> by it, like in the above rule, they will continue to OvS datapath.
> Since we already did some actions (action ct in this case) which might
> modify the packets, and updated action stats, we would like to continue
> the proccessing with the correct recirc_id in OvS (here recirc_id(2))
> where we left off.
> 
> To support this, introduce a new skb extension for tc, which
> will be used for translating tc chain to ovs recirc_id to
> handle these miss cases. Last tc chain index will be set
> by tc goto chain action and read by OvS datapath.
> 
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> ---
> Changelog:
> V3->V4:
> 	Removed changes to tcf_result, instead us action return value to get chain index

Applied to net-next.

^ permalink raw reply

* Re: [v3] net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate
From: David Miller @ 2019-09-06 13:02 UTC (permalink / raw)
  To: zdai; +Cc: jhs, xiyou.wangcong, jiri, netdev, linux-kernel, zdai
In-Reply-To: <1567609423-26826-1-git-send-email-zdai@linux.vnet.ibm.com>

From: David Dai <zdai@linux.vnet.ibm.com>
Date: Wed,  4 Sep 2019 10:03:43 -0500

> For high speed adapter like Mellanox CX-5 card, it can reach upto
> 100 Gbits per second bandwidth. Currently htb already supports 64bit rate
> in tc utility. However police action rate and peakrate are still limited
> to 32bit value (upto 32 Gbits per second). Add 2 new attributes
> TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support
> so that tc utility can use them for 64bit rate and peakrate value to
> break the 32bit limit, and still keep the backward binary compatibility.
> 
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
> ---
> Changelog:
> v1->v2:
>  - Move 2 attributes TCA_POLICE_RATE64 TCA_POLICE_PEAKRATE64 after
>    TCA_POLICE_PAD in pkt_cls.h header.
> v2->v3:
>  - Use TCA_POLICE_PAD instead of __TCA_POLICE_MAX as padding attr
>    in last parameter in nla_put_u64_64bit() routine.

Applied to net-next.

^ permalink raw reply

* [PATCH 1/2] net: stmmac: implement support for passive mode converters via dt
From: Alexandru Ardelean @ 2019-09-06 13:02 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel
  Cc: davem, robh+dt, peppe.cavallaro, alexandre.torgue, --cc=andrew,
	Alexandru Ardelean

In-between the MAC & PHY there can be a mode converter, which converts one
mode to another (e.g. GMII-to-RGMII).

The converter, can be passive (i.e. no driver or OS/SW information
required), so the MAC & PHY need to be configured differently.

For the `stmmac` driver, this is implemented via a `mac-mode` property in
the device-tree, which configures the MAC into a certain mode, and for the
PHY a `phy_interface` field will hold the mode of the PHY. The mode of the
PHY will be passed to the PHY and from there-on it work in a different
mode. If unspecified, the default `phy-mode` will be used for both.

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
---
 .../net/ethernet/stmicro/stmmac/stmmac_main.c |  2 +-
 .../ethernet/stmicro/stmmac/stmmac_platform.c | 34 ++++++++++++++++++-
 include/linux/stmmac.h                        |  1 +
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c3baca9f587b..ec7aa42128a5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1029,7 +1029,7 @@ static int stmmac_init_phy(struct net_device *dev)
 static int stmmac_phy_setup(struct stmmac_priv *priv)
 {
 	struct fwnode_handle *fwnode = of_fwnode_handle(priv->plat->phylink_node);
-	int mode = priv->plat->interface;
+	int mode = priv->plat->phy_interface;
 	struct phylink *phylink;
 
 	priv->phylink_config.dev = &priv->dev->dev;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index eaf8f08f2e91..401cbbfc06f0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -355,6 +355,32 @@ static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
 	return 0;
 }
 
+/**
+ * stmmac_of_get_mac_mode - retrieves the interface of the MAC
+ * @np - device-tree node
+ * Description:
+ * Similar to `of_get_phy_mode()`, this function will retrieve (from
+ * the device-tree) the interface mode on the MAC side. This assumes
+ * that there is mode converter in-between the MAC & PHY
+ * (e.g. GMII-to-RGMII).
+ */
+static int stmmac_of_get_mac_mode(struct device_node *np)
+{
+	const char *pm;
+	int err, i;
+
+	err = of_property_read_string(np, "mac-mode", &pm);
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < PHY_INTERFACE_MODE_MAX; i++) {
+		if (!strcasecmp(pm, phy_modes(i)))
+			return i;
+	}
+
+	return -ENODEV;
+}
+
 /**
  * stmmac_probe_config_dt - parse device-tree driver parameters
  * @pdev: platform_device structure
@@ -383,7 +409,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 		*mac = NULL;
 	}
 
-	plat->interface = of_get_phy_mode(np);
+	plat->phy_interface = of_get_phy_mode(np);
+	if (plat->phy_interface < 0)
+		return ERR_PTR(plat->phy_interface);
+
+	plat->interface = stmmac_of_get_mac_mode(np);
+	if (plat->interface < 0)
+		plat->interface = plat->phy_interface;
 
 	/* Some wrapper drivers still rely on phy_node. Let's save it while
 	 * they are not converted to phylink. */
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 7ad7ae35cf88..dc60d03c4b60 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -131,6 +131,7 @@ struct plat_stmmacenet_data {
 	int bus_id;
 	int phy_addr;
 	int interface;
+	int phy_interface;
 	struct stmmac_mdio_bus_data *mdio_bus_data;
 	struct device_node *phy_node;
 	struct device_node *phylink_node;
-- 
2.20.1


^ permalink raw reply related

* [PATCH 2/2] dt-bindings: net: dwmac: document 'mac-mode' property
From: Alexandru Ardelean @ 2019-09-06 13:02 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel
  Cc: davem, robh+dt, peppe.cavallaro, alexandre.torgue, --cc=andrew,
	Alexandru Ardelean
In-Reply-To: <20190906130256.10321-1-alexandru.ardelean@analog.com>

This change documents the 'mac-mode' property that was introduced in the
'stmmac' driver to support passive mode converters that can sit in-between
the MAC & PHY.

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
---
 Documentation/devicetree/bindings/net/snps,dwmac.yaml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
index c78be15704b9..ebe4537a7cce 100644
--- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
@@ -112,6 +112,14 @@ properties:
   reset-names:
     const: stmmaceth
 
+  mac-mode:
+    maxItems: 1
+    description:
+      The property is identical to 'phy-mode', and assumes that there is mode
+      converter in-between the MAC & PHY (e.g. GMII-to-RGMII). This converter
+      can be passive (no SW requirement), and requires that the MAC operate
+      in a different mode than the PHY in order to function.
+
   snps,axi-config:
     $ref: /schemas/types.yaml#definitions/phandle
     description:
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 net] net: sonic: return NETDEV_TX_OK if failed to map buffer
From: David Miller @ 2019-09-06 13:05 UTC (permalink / raw)
  To: maowenan; +Cc: eric.dumazet, tsbogend, netdev, linux-kernel, kernel-janitors
In-Reply-To: <20190905015712.107173-1-maowenan@huawei.com>

From: Mao Wenan <maowenan@huawei.com>
Date: Thu, 5 Sep 2019 09:57:12 +0800

> NETDEV_TX_BUSY really should only be used by drivers that call
> netif_tx_stop_queue() at the wrong moment. If dma_map_single() is
> failed to map tx DMA buffer, it might trigger an infinite loop.
> This patch use NETDEV_TX_OK instead of NETDEV_TX_BUSY, and change
> printk to pr_err_ratelimited.
> 
> Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
> Signed-off-by: Mao Wenan <maowenan@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCHv3 1/1] forcedeth: use per cpu to collect xmit/recv statistics
From: David Miller @ 2019-09-06 13:07 UTC (permalink / raw)
  To: yanjun.zhu; +Cc: eric.dumazet, netdev
In-Reply-To: <1567674942-5132-2-git-send-email-yanjun.zhu@oracle.com>

From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Thu,  5 Sep 2019 05:15:42 -0400

> When testing with a background iperf pushing 1Gbit/sec traffic and running
> both ifconfig and netstat to collect statistics, some deadlocks occurred.
> 
> Ifconfig and netstat will call nv_get_stats64 to get software xmit/recv
> statistics. In the commit f5d827aece36 ("forcedeth: implement
> ndo_get_stats64() API"), the normal tx/rx variables is to collect tx/rx
> statistics. The fix is to replace normal tx/rx variables with per
> cpu 64-bit variable to collect xmit/recv statistics. The per cpu variable
> will avoid deadlocks and provide fast efficient statistics updates.
> 
> In nv_probe, the per cpu variable is initialized. In nv_remove, this
> per cpu variable is freed.
> 
> In xmit/recv process, this per cpu variable will be updated.
> 
> In nv_get_stats64, this per cpu variable on each cpu is added up. Then
> the driver can get xmit/recv packets statistics.
> 
> A test runs for several days with this commit, the deadlocks disappear
> and the performance is better.
> 
> Tested:
 ...
> Fixes: f5d827aece36 ("forcedeth: implement ndo_get_stats64() API")
> CC: Joe Jin <joe.jin@oracle.com>
> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
> Reported-and-tested-by: Nan san <nan.1986san@gmail.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Applied.

^ permalink raw reply

* Re: pull request (net): ipsec 2019-09-05
From: David Miller @ 2019-09-06 13:09 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, netdev
In-Reply-To: <20190905102201.1636-1-steffen.klassert@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 5 Sep 2019 12:21:56 +0200

> 1) Several xfrm interface fixes from Nicolas Dichtel:
>    - Avoid an interface ID corruption on changelink.
>    - Fix wrong intterface names in the logs.
>    - Fix a list corruption when changing network namespaces.
>    - Fix unregistation of the underying phydev.
> 
> 2) Fix a potential warning when merging xfrm_plocy nodes.
>    From Florian Westphal.
> 
> Please pull or let me know if there are problems.

Pulled, thanks Steffen.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: Do not check Link status when loopback is enabled
From: David Miller @ 2019-09-06 13:11 UTC (permalink / raw)
  To: Jose.Abreu
  Cc: netdev, Joao.Pinto, andrew, f.fainelli, hkallweit1, linux-kernel
In-Reply-To: <7db46f6b1318ec22d45f7e6f6f907eda015a9df6.1567683751.git.joabreu@synopsys.com>

From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Thu,  5 Sep 2019 13:43:10 +0200

> While running stmmac selftests I found that in my 1G setup some tests
> were failling when running with PHY loopback enabled.
> 
> It looks like when loopback is enabled the PHY will report that Link is
> down even though there is a valid connection.
> 
> As in loopback mode the data will not be sent anywhere we can bypass the
> logic of checking if Link is valid thus saving unecessary reads.
> 
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net] net: sched: fix reordering issues
From: David Miller @ 2019-09-06 13:13 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, john.fastabend
In-Reply-To: <20190905122022.106680-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Thu,  5 Sep 2019 05:20:22 -0700

> Whenever MQ is not used on a multiqueue device, we experience
> serious reordering problems. Bisection found the cited
> commit.
> 
> The issue can be described this way :
> 
> - A single qdisc hierarchy is shared by all transmit queues.
>   (eg : tc qdisc replace dev eth0 root fq_codel)
> 
> - When/if try_bulk_dequeue_skb_slow() dequeues a packet targetting
>   a different transmit queue than the one used to build a packet train,
>   we stop building the current list and save the 'bad' skb (P1) in a
>   special queue. (bad_txq)
> 
> - When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
>   skb (P1), it checks if the associated transmit queues is still in frozen
>   state. If the queue is still blocked (by BQL or NIC tx ring full),
>   we leave the skb in bad_txq and return NULL.
> 
> - dequeue_skb() calls q->dequeue() to get another packet (P2)
> 
>   The other packet can target the problematic queue (that we found
>   in frozen state for the bad_txq packet), but another cpu just ran
>   TX completion and made room in the txq that is now ready to accept
>   new packets.
> 
> - Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
>   at next round. In practice P2 is the lead of a big packet train
>   (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/
> 
> To solve this problem, we have to block the dequeue process as long
> as the first packet in bad_txq can not be sent. Reordering issues
> disappear and no side effects have been seen.
> 
> Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
From: Pablo Neira Ayuso @ 2019-09-06 13:14 UTC (permalink / raw)
  To: Edward Cree
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu
In-Reply-To: <b8baf681-b808-4b83-d521-0353c3136516@solarflare.com>

On Fri, Sep 06, 2019 at 01:55:29PM +0100, Edward Cree wrote:
> On 06/09/2019 11:56, Pablo Neira Ayuso wrote:
> > On Fri, Sep 06, 2019 at 11:02:18AM +0100, Edward Cree wrote:
> >> Still NAK for the same reasons as three versions ago (when it was called
> >>  "netfilter: payload mangling offload support"), you've never managed to
> >>  explain why this extra API complexity is useful.  (Reducing LOC does not
> >>  mean you've reduced complexity.)
> > Oh well...
> >
> > Patch 1) Mask is inverted for no reason, just because tc pedit needs
> > this in this way. All drivers reverse this mask.
> >
> > Patch 2) All drivers mask out meaningless fields in the value field.
>
> To be clear: I have no issue with these two patches; they look fine to me.
> (Though I'd like to see some comments on struct flow_action_entry specifying
>  the semantics of these fields, especially if they're going to differ from
>  the corresponding fields in struct tc_pedit_key.)

OK, I can document this semantics, I need just _time_ to write that
documentation. I was expecting this patch description is enough by now
until I can get to finish that documentation.

> > Patch 3) Without this patchset, offsets are on the 32-bit boundary.
> > Drivers need to play with the 32-bit mask to infer what field they are
> > supposed to mangle... eg. with 32-bit offset alignment, checking if
> > the use want to alter the ttl/hop_limit need for helper structures to
> > check the 32-bit mask. Mangling a IPv6 address comes with one single
> > action...
>
> Drivers are still going to need to handle multiple pedit actions, in
>  case the original rule wanted to mangle two non-consecutive fields.
> And you can't just coalesce all consecutive mangles, because if you
>  mangle two consecutive fields (e.g. UDP sport and dport) the driver
>  still needs to disentangle that if it works on a 'fields' (rather
>  than 'u32s') level.

This infrastructure is _not_ coalescing two consecutive field, e.g.
UDP sport and dport is _not_ coalesced. The coalesce routine does
_not_ handle multiple tc pedit ex actions.

I'll clarify this in the cover letter in the next patchset round.

>  So either have the core convert things into named protocol fields
>  (i.e. "set src IPv6 to 1234::5 and add 1 to UDP sport"), or leave
>  the current sequence-of-u32-mangles as it is. This in-between "we'll
>  coalesce things together despite not knowing what fields they are" is
>  neither fish nor fowl.

The model you propose would still need this code for tc pedit to
adjust offset/length and coalesce u32 fields.

^ permalink raw reply

* Re: [PATCH 0/2] Revert and rework on the metadata accelreation
From: David Miller @ 2019-09-06 13:15 UTC (permalink / raw)
  To: jasowang
  Cc: jgg, mst, kvm, virtualization, netdev, linux-kernel, aarcange,
	jglisse, linux-mm
In-Reply-To: <7785d39b-b4e7-8165-516c-ee6a08ac9c4e@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 6 Sep 2019 18:02:35 +0800

> On 2019/9/5 下午9:59, Jason Gunthorpe wrote:
>> I think you should apply the revert this cycle and rebase the other
>> patch for next..
>>
>> Jason
> 
> Yes, the plan is to revert in this release cycle.

Then you should reset patch #1 all by itself targetting 'net'.

^ permalink raw reply

* Re: [PATCH net-next] MAINTAINERS: add myself as maintainer for xilinx axiethernet driver
From: David Miller @ 2019-09-06 13:17 UTC (permalink / raw)
  To: radhey.shyam.pandey
  Cc: netdev, michal.simek, anirudha.sarangi, john.linn,
	mchehab+samsung, gregkh, nicolas.ferre, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1567688168-20607-1-git-send-email-radhey.shyam.pandey@xilinx.com>

From: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Date: Thu,  5 Sep 2019 18:26:08 +0530

> I am maintaining xilinx axiethernet driver in xilinx tree and would like
> to maintain it in the mainline kernel as well. Hence adding myself as a
> maintainer. Also Anirudha and John has moved to new roles, so based on
> request removing them from the maintainer list.
> 
> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
> Acked-by: John Linn <john.linn@xilinx.com>
> Acked-by: Michal Simek <michal.simek@xilinx.com>
> ---
> I am resending this patch as earlier version didn't go through mailing
> list due to some config restriction on the external email addresses.
> Also adding Michal's acked-by tag.

Applied to 'net'.

^ permalink raw reply

* Re: [PATCH v1] stmmac: platform: adjust messages and move to dev level
From: David Miller @ 2019-09-06 13:18 UTC (permalink / raw)
  To: andriy.shevchenko
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, netdev,
	mcoquelin.stm32, linux-stm32
In-Reply-To: <20190905130053.84703-1-andriy.shevchenko@linux.intel.com>

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Thu,  5 Sep 2019 16:00:53 +0300

> This patch amends the error and warning messages across the platform driver.
> It includes the following changes:
>  - append \n to the end of messages
>  - change pr_* macros to dev_*
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH V2 net-next 0/7] net: hns3: add some bugfixes and cleanups
From: David Miller @ 2019-09-06 13:21 UTC (permalink / raw)
  To: tanhuazhong
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1567690302-16648-1-git-send-email-tanhuazhong@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>
Date: Thu, 5 Sep 2019 21:31:35 +0800

> This patch-set includes bugfixes and cleanups for the HNS3
> ethernet controller driver.
 ...

Series applied to net-next

^ permalink raw reply

* Re: pull-request: wireless-drivers 2019-09-05
From: David Miller @ 2019-09-06 13:22 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <87k1amluae.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Thu, 05 Sep 2019 16:58:01 +0300

> here's a pull request to net tree for v5.3, more info below. Please let
> me know if there are any problems.

Pulled, thanks Kalle.

^ permalink raw reply

* RE: [RFC net-next v2 4/5] net: phy: introducing support for DWC xPCS logics for EHL & TGL
From: Jose Abreu @ 2019-09-06 13:28 UTC (permalink / raw)
  To: Ong Boon Leong, davem@davemloft.net, linux@armlinux.org.uk,
	mcoquelin.stm32@gmail.com, f.fainelli@gmail.com, andrew@lunn.ch
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	peppe.cavallaro@st.com, alexandre.torgue@st.com,
	weifeng.voon@intel.com
In-Reply-To: <20190828174722.6726-5-boon.leong.ong@intel.com>

From: Ong Boon Leong <boon.leong.ong@intel.com>
Date: Aug/28/2019, 18:47:21 (UTC+00:00)

> xPCS is DWC Ethernet Physical Coding Sublayer that can be integrated with
> Ethernet MAC controller and acts as converter between GMII and SGMII. 

You have to be careful here because xPCS supports much more than these 
interfaces and speeds.

I would implement this thinking about the future integrations and 
leaving margin for more speeds.

As for this being a converter I'm not sure about the implementation 
logic because sometimes you can have a setup like:

MAC <-> xPCS <-> SERDES <-> SFP

SERDES and SFP may not require configuration nor even have MDIO access 
so in my view the "PHY" would then be the xPCS, no ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* RE: [PATCH v3 net-next] net: stmmac: Add support for MDIO interrupts
From: Jose Abreu @ 2019-09-06 13:31 UTC (permalink / raw)
  To: Voon Weifeng, David S. Miller, Maxime Coquelin
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Giuseppe Cavallaro, Andrew Lunn, Alexandre Torgue, Ong Boon Leong
In-Reply-To: <1567685130-8153-1-git-send-email-weifeng.voon@intel.com>

From: Voon Weifeng <weifeng.voon@intel.com>
Date: Sep/05/2019, 13:05:30 (UTC+00:00)

> DW EQoS v5.xx controllers added capability for interrupt generation
> when MDIO interface is done (GMII Busy bit is cleared).
> This patch adds support for this interrupt on supported HW to avoid
> polling on GMII Busy bit.

Better leave the enabling of this optional because the support for it is 
also optional depending on the IP HW configuration.

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
From: Edward Cree @ 2019-09-06 13:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu
In-Reply-To: <20190906131457.7olkal45kkdtbevo@salvia>

On 06/09/2019 14:14, Pablo Neira Ayuso wrote:
> OK, I can document this semantics, I need just _time_ to write that
> documentation. I was expecting this patch description is enough by now
> until I can get to finish that documentation.
I think for two structs with apparently the same contents but different
 semantics (one has the mask bitwise complemented) it's best to hold up
 the code change until the comment is ready to come with it, because
 until then it's a dangerously confusing situation.

>> And you can't just coalesce all consecutive mangles, because if you
>>  mangle two consecutive fields (e.g. UDP sport and dport) the driver
>>  still needs to disentangle that if it works on a 'fields' (rather
>>  than 'u32s') level.
> This infrastructure is _not_ coalescing two consecutive field, e.g.
> UDP sport and dport is _not_ coalesced. The coalesce routine does
> _not_ handle multiple tc pedit ex actions.
So an IPv6 address mangle only comes as a single action if it's from
 netfilter, not if it's coming from TC pedit.  Therefore drivers still
 need to handle an IPv6 or MAC address mangle coming in multiple
 actions, therefore your driver simplifications are invalid.  No?

> The model you propose would still need this code for tc pedit to
> adjust offset/length and coalesce u32 fields.
Yes, but we don't add code/features to the kernel based on what we
 *could* use it for later; every submission has to be self-contained
 in providing something of demonstrable value.  So either implement
 "the model I propose" (which to be clear I'm *not* proposing, I want
 the u32 pedit left as it is; it's just that it's a better model than
 what you're implementing here), or leave well alone.

-Ed

^ permalink raw reply

* [PULL] vhost, virtio: last minute fixes
From: Michael S. Tsirkin @ 2019-09-06 13:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kvm, virtualization, netdev, linux-kernel, jasowang, jiangkidd,
	linyunsheng, mst, namit, tiwei.bie

Hope this can still make it.
I was not sure about virtio-net change but it seems that it prevents
livelocks for some people.

The following changes since commit 089cf7f6ecb266b6a4164919a2e69bd2f938374a:

  Linux 5.3-rc7 (2019-09-02 09:57:40 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 02fa5d7b17a761f53ef1eedfc254e1f33bd226b0:

  mm/balloon_compaction: suppress allocation warnings (2019-09-04 07:42:01 -0400)

----------------------------------------------------------------
virtio, vhost, balloon: bugfixes

A couple of last minute bugfixes. And a revert of a failed attempt at
metadata access optimization - we'll try again in the next cycle.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
? jiang (1):
      virtio-net: lower min ring num_free for efficiency

Michael S. Tsirkin (1):
      Revert "vhost: access vq metadata through kernel virtual address"

Nadav Amit (1):
      mm/balloon_compaction: suppress allocation warnings

Tiwei Bie (2):
      vhost/test: fix build for vhost test
      vhost/test: fix build for vhost test

Yunsheng Lin (1):
      vhost: Remove unnecessary variable

 drivers/net/virtio_net.c |   2 +-
 drivers/vhost/test.c     |  13 +-
 drivers/vhost/vhost.c    | 520 +----------------------------------------------
 drivers/vhost/vhost.h    |  41 ----
 mm/balloon_compaction.c  |   3 +-
 5 files changed, 17 insertions(+), 562 deletions(-)

^ permalink raw reply

* Re: [PATCH 1/2] Revert "vhost: access vq metadata through kernel virtual address"
From: Michael S. Tsirkin @ 2019-09-06 13:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, virtualization, netdev, linux-kernel, jgg, aarcange, jglisse,
	linux-mm
In-Reply-To: <20190905122736.19768-2-jasowang@redhat.com>

On Thu, Sep 05, 2019 at 08:27:35PM +0800, Jason Wang wrote:
> It was reported that metadata acceleration introduces several issues,
> so this patch reverts commit ff466032dc9e5a61217f22ea34b2df932786bbfc,
> 73f628ec9e6bcc45b77c53fe6d0c0ec55eaf82af and
> 0b4a7092ffe568a55bf8f3cefdf79ff666586d91.
> 
> We will rework it on the next version.
> 
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>


I am confused by the above.
What I see upstream is 7f466032dc.

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang <jasowang@redhat.com>
Date:   Fri May 24 04:12:18 2019 -0400

    vhost: access vq metadata through kernel virtual address

so this is what I reverted.

Pls take a look, and let me know if you see issues.

Thanks!

> ---
>  drivers/vhost/vhost.c | 515 +-----------------------------------------
>  drivers/vhost/vhost.h |  41 ----
>  2 files changed, 3 insertions(+), 553 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 0536f8526359..791562e03fe0 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -298,160 +298,6 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
>  		__vhost_vq_meta_reset(d->vqs[i]);
>  }
>  
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -static void vhost_map_unprefetch(struct vhost_map *map)
> -{
> -	kfree(map->pages);
> -	map->pages = NULL;
> -	map->npages = 0;
> -	map->addr = NULL;
> -}
> -
> -static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
> -{
> -	struct vhost_map *map[VHOST_NUM_ADDRS];
> -	int i;
> -
> -	spin_lock(&vq->mmu_lock);
> -	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> -		map[i] = rcu_dereference_protected(vq->maps[i],
> -				  lockdep_is_held(&vq->mmu_lock));
> -		if (map[i])
> -			rcu_assign_pointer(vq->maps[i], NULL);
> -	}
> -	spin_unlock(&vq->mmu_lock);
> -
> -	synchronize_rcu();
> -
> -	for (i = 0; i < VHOST_NUM_ADDRS; i++)
> -		if (map[i])
> -			vhost_map_unprefetch(map[i]);
> -
> -}
> -
> -static void vhost_reset_vq_maps(struct vhost_virtqueue *vq)
> -{
> -	int i;
> -
> -	vhost_uninit_vq_maps(vq);
> -	for (i = 0; i < VHOST_NUM_ADDRS; i++)
> -		vq->uaddrs[i].size = 0;
> -}
> -
> -static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr,
> -				     unsigned long start,
> -				     unsigned long end)
> -{
> -	if (unlikely(!uaddr->size))
> -		return false;
> -
> -	return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size);
> -}
> -
> -static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
> -				      int index,
> -				      unsigned long start,
> -				      unsigned long end)
> -{
> -	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
> -	struct vhost_map *map;
> -	int i;
> -
> -	if (!vhost_map_range_overlap(uaddr, start, end))
> -		return;
> -
> -	spin_lock(&vq->mmu_lock);
> -	++vq->invalidate_count;
> -
> -	map = rcu_dereference_protected(vq->maps[index],
> -					lockdep_is_held(&vq->mmu_lock));
> -	if (map) {
> -		if (uaddr->write) {
> -			for (i = 0; i < map->npages; i++)
> -				set_page_dirty(map->pages[i]);
> -		}
> -		rcu_assign_pointer(vq->maps[index], NULL);
> -	}
> -	spin_unlock(&vq->mmu_lock);
> -
> -	if (map) {
> -		synchronize_rcu();
> -		vhost_map_unprefetch(map);
> -	}
> -}
> -
> -static void vhost_invalidate_vq_end(struct vhost_virtqueue *vq,
> -				    int index,
> -				    unsigned long start,
> -				    unsigned long end)
> -{
> -	if (!vhost_map_range_overlap(&vq->uaddrs[index], start, end))
> -		return;
> -
> -	spin_lock(&vq->mmu_lock);
> -	--vq->invalidate_count;
> -	spin_unlock(&vq->mmu_lock);
> -}
> -
> -static int vhost_invalidate_range_start(struct mmu_notifier *mn,
> -					const struct mmu_notifier_range *range)
> -{
> -	struct vhost_dev *dev = container_of(mn, struct vhost_dev,
> -					     mmu_notifier);
> -	int i, j;
> -
> -	if (!mmu_notifier_range_blockable(range))
> -		return -EAGAIN;
> -
> -	for (i = 0; i < dev->nvqs; i++) {
> -		struct vhost_virtqueue *vq = dev->vqs[i];
> -
> -		for (j = 0; j < VHOST_NUM_ADDRS; j++)
> -			vhost_invalidate_vq_start(vq, j,
> -						  range->start,
> -						  range->end);
> -	}
> -
> -	return 0;
> -}
> -
> -static void vhost_invalidate_range_end(struct mmu_notifier *mn,
> -				       const struct mmu_notifier_range *range)
> -{
> -	struct vhost_dev *dev = container_of(mn, struct vhost_dev,
> -					     mmu_notifier);
> -	int i, j;
> -
> -	for (i = 0; i < dev->nvqs; i++) {
> -		struct vhost_virtqueue *vq = dev->vqs[i];
> -
> -		for (j = 0; j < VHOST_NUM_ADDRS; j++)
> -			vhost_invalidate_vq_end(vq, j,
> -						range->start,
> -						range->end);
> -	}
> -}
> -
> -static const struct mmu_notifier_ops vhost_mmu_notifier_ops = {
> -	.invalidate_range_start = vhost_invalidate_range_start,
> -	.invalidate_range_end = vhost_invalidate_range_end,
> -};
> -
> -static void vhost_init_maps(struct vhost_dev *dev)
> -{
> -	struct vhost_virtqueue *vq;
> -	int i, j;
> -
> -	dev->mmu_notifier.ops = &vhost_mmu_notifier_ops;
> -
> -	for (i = 0; i < dev->nvqs; ++i) {
> -		vq = dev->vqs[i];
> -		for (j = 0; j < VHOST_NUM_ADDRS; j++)
> -			RCU_INIT_POINTER(vq->maps[j], NULL);
> -	}
> -}
> -#endif
> -
>  static void vhost_vq_reset(struct vhost_dev *dev,
>  			   struct vhost_virtqueue *vq)
>  {
> @@ -480,11 +326,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>  	vq->busyloop_timeout = 0;
>  	vq->umem = NULL;
>  	vq->iotlb = NULL;
> -	vq->invalidate_count = 0;
>  	__vhost_vq_meta_reset(vq);
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	vhost_reset_vq_maps(vq);
> -#endif
>  }
>  
>  static int vhost_worker(void *data)
> @@ -634,9 +476,7 @@ void vhost_dev_init(struct vhost_dev *dev,
>  	INIT_LIST_HEAD(&dev->read_list);
>  	INIT_LIST_HEAD(&dev->pending_list);
>  	spin_lock_init(&dev->iotlb_lock);
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	vhost_init_maps(dev);
> -#endif
> +
>  
>  	for (i = 0; i < dev->nvqs; ++i) {
>  		vq = dev->vqs[i];
> @@ -645,7 +485,6 @@ void vhost_dev_init(struct vhost_dev *dev,
>  		vq->heads = NULL;
>  		vq->dev = dev;
>  		mutex_init(&vq->mutex);
> -		spin_lock_init(&vq->mmu_lock);
>  		vhost_vq_reset(dev, vq);
>  		if (vq->handle_kick)
>  			vhost_poll_init(&vq->poll, vq->handle_kick,
> @@ -725,18 +564,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  	if (err)
>  		goto err_cgroup;
>  
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	err = mmu_notifier_register(&dev->mmu_notifier, dev->mm);
> -	if (err)
> -		goto err_mmu_notifier;
> -#endif
> -
>  	return 0;
> -
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -err_mmu_notifier:
> -	vhost_dev_free_iovecs(dev);
> -#endif
>  err_cgroup:
>  	kthread_stop(worker);
>  	dev->worker = NULL;
> @@ -827,107 +655,6 @@ static void vhost_clear_msg(struct vhost_dev *dev)
>  	spin_unlock(&dev->iotlb_lock);
>  }
>  
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -static void vhost_setup_uaddr(struct vhost_virtqueue *vq,
> -			      int index, unsigned long uaddr,
> -			      size_t size, bool write)
> -{
> -	struct vhost_uaddr *addr = &vq->uaddrs[index];
> -
> -	addr->uaddr = uaddr;
> -	addr->size = size;
> -	addr->write = write;
> -}
> -
> -static void vhost_setup_vq_uaddr(struct vhost_virtqueue *vq)
> -{
> -	vhost_setup_uaddr(vq, VHOST_ADDR_DESC,
> -			  (unsigned long)vq->desc,
> -			  vhost_get_desc_size(vq, vq->num),
> -			  false);
> -	vhost_setup_uaddr(vq, VHOST_ADDR_AVAIL,
> -			  (unsigned long)vq->avail,
> -			  vhost_get_avail_size(vq, vq->num),
> -			  false);
> -	vhost_setup_uaddr(vq, VHOST_ADDR_USED,
> -			  (unsigned long)vq->used,
> -			  vhost_get_used_size(vq, vq->num),
> -			  true);
> -}
> -
> -static int vhost_map_prefetch(struct vhost_virtqueue *vq,
> -			       int index)
> -{
> -	struct vhost_map *map;
> -	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
> -	struct page **pages;
> -	int npages = DIV_ROUND_UP(uaddr->size, PAGE_SIZE);
> -	int npinned;
> -	void *vaddr, *v;
> -	int err;
> -	int i;
> -
> -	spin_lock(&vq->mmu_lock);
> -
> -	err = -EFAULT;
> -	if (vq->invalidate_count)
> -		goto err;
> -
> -	err = -ENOMEM;
> -	map = kmalloc(sizeof(*map), GFP_ATOMIC);
> -	if (!map)
> -		goto err;
> -
> -	pages = kmalloc_array(npages, sizeof(struct page *), GFP_ATOMIC);
> -	if (!pages)
> -		goto err_pages;
> -
> -	err = EFAULT;
> -	npinned = __get_user_pages_fast(uaddr->uaddr, npages,
> -					uaddr->write, pages);
> -	if (npinned > 0)
> -		release_pages(pages, npinned);
> -	if (npinned != npages)
> -		goto err_gup;
> -
> -	for (i = 0; i < npinned; i++)
> -		if (PageHighMem(pages[i]))
> -			goto err_gup;
> -
> -	vaddr = v = page_address(pages[0]);
> -
> -	/* For simplicity, fallback to userspace address if VA is not
> -	 * contigious.
> -	 */
> -	for (i = 1; i < npinned; i++) {
> -		v += PAGE_SIZE;
> -		if (v != page_address(pages[i]))
> -			goto err_gup;
> -	}
> -
> -	map->addr = vaddr + (uaddr->uaddr & (PAGE_SIZE - 1));
> -	map->npages = npages;
> -	map->pages = pages;
> -
> -	rcu_assign_pointer(vq->maps[index], map);
> -	/* No need for a synchronize_rcu(). This function should be
> -	 * called by dev->worker so we are serialized with all
> -	 * readers.
> -	 */
> -	spin_unlock(&vq->mmu_lock);
> -
> -	return 0;
> -
> -err_gup:
> -	kfree(pages);
> -err_pages:
> -	kfree(map);
> -err:
> -	spin_unlock(&vq->mmu_lock);
> -	return err;
> -}
> -#endif
> -
>  void vhost_dev_cleanup(struct vhost_dev *dev)
>  {
>  	int i;
> @@ -957,16 +684,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  		kthread_stop(dev->worker);
>  		dev->worker = NULL;
>  	}
> -	if (dev->mm) {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -		mmu_notifier_unregister(&dev->mmu_notifier, dev->mm);
> -#endif
> +	if (dev->mm)
>  		mmput(dev->mm);
> -	}
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	for (i = 0; i < dev->nvqs; i++)
> -		vhost_uninit_vq_maps(dev->vqs[i]);
> -#endif
>  	dev->mm = NULL;
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
> @@ -1195,26 +914,6 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
>  
>  static inline int vhost_put_avail_event(struct vhost_virtqueue *vq)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_used *used;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> -		if (likely(map)) {
> -			used = map->addr;
> -			*((__virtio16 *)&used->ring[vq->num]) =
> -				cpu_to_vhost16(vq, vq->avail_idx);
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
>  			      vhost_avail_event(vq));
>  }
> @@ -1223,27 +922,6 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
>  				 struct vring_used_elem *head, int idx,
>  				 int count)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_used *used;
> -	size_t size;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> -		if (likely(map)) {
> -			used = map->addr;
> -			size = count * sizeof(*head);
> -			memcpy(used->ring + idx, head, size);
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_copy_to_user(vq, vq->used->ring + idx, head,
>  				  count * sizeof(*head));
>  }
> @@ -1251,25 +929,6 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
>  static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
>  
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_used *used;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> -		if (likely(map)) {
> -			used = map->addr;
> -			used->flags = cpu_to_vhost16(vq, vq->used_flags);
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
>  			      &vq->used->flags);
>  }
> @@ -1277,25 +936,6 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
>  static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
>  
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_used *used;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> -		if (likely(map)) {
> -			used = map->addr;
> -			used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
>  			      &vq->used->idx);
>  }
> @@ -1341,50 +981,12 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
>  static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
>  				      __virtio16 *idx)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_avail *avail;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> -		if (likely(map)) {
> -			avail = map->addr;
> -			*idx = avail->idx;
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_get_avail(vq, *idx, &vq->avail->idx);
>  }
>  
>  static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
>  				       __virtio16 *head, int idx)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_avail *avail;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> -		if (likely(map)) {
> -			avail = map->addr;
> -			*head = avail->ring[idx & (vq->num - 1)];
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_get_avail(vq, *head,
>  			       &vq->avail->ring[idx & (vq->num - 1)]);
>  }
> @@ -1392,98 +994,24 @@ static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
>  static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq,
>  					__virtio16 *flags)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_avail *avail;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> -		if (likely(map)) {
> -			avail = map->addr;
> -			*flags = avail->flags;
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_get_avail(vq, *flags, &vq->avail->flags);
>  }
>  
>  static inline int vhost_get_used_event(struct vhost_virtqueue *vq,
>  				       __virtio16 *event)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_avail *avail;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> -		if (likely(map)) {
> -			avail = map->addr;
> -			*event = (__virtio16)avail->ring[vq->num];
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_get_avail(vq, *event, vhost_used_event(vq));
>  }
>  
>  static inline int vhost_get_used_idx(struct vhost_virtqueue *vq,
>  				     __virtio16 *idx)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_used *used;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> -		if (likely(map)) {
> -			used = map->addr;
> -			*idx = used->idx;
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_get_used(vq, *idx, &vq->used->idx);
>  }
>  
>  static inline int vhost_get_desc(struct vhost_virtqueue *vq,
>  				 struct vring_desc *desc, int idx)
>  {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	struct vhost_map *map;
> -	struct vring_desc *d;
> -
> -	if (!vq->iotlb) {
> -		rcu_read_lock();
> -
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_DESC]);
> -		if (likely(map)) {
> -			d = map->addr;
> -			*desc = *(d + idx);
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -
> -		rcu_read_unlock();
> -	}
> -#endif
> -
>  	return vhost_copy_from_user(vq, desc, vq->desc + idx, sizeof(*desc));
>  }
>  
> @@ -1824,32 +1352,12 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
>  	return true;
>  }
>  
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -static void vhost_vq_map_prefetch(struct vhost_virtqueue *vq)
> -{
> -	struct vhost_map __rcu *map;
> -	int i;
> -
> -	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> -		rcu_read_lock();
> -		map = rcu_dereference(vq->maps[i]);
> -		rcu_read_unlock();
> -		if (unlikely(!map))
> -			vhost_map_prefetch(vq, i);
> -	}
> -}
> -#endif
> -
>  int vq_meta_prefetch(struct vhost_virtqueue *vq)
>  {
>  	unsigned int num = vq->num;
>  
> -	if (!vq->iotlb) {
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -		vhost_vq_map_prefetch(vq);
> -#endif
> +	if (!vq->iotlb)
>  		return 1;
> -	}
>  
>  	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
>  			       vhost_get_desc_size(vq, num), VHOST_ADDR_DESC) &&
> @@ -2060,16 +1568,6 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>  
>  	mutex_lock(&vq->mutex);
>  
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	/* Unregister MMU notifer to allow invalidation callback
> -	 * can access vq->uaddrs[] without holding a lock.
> -	 */
> -	if (d->mm)
> -		mmu_notifier_unregister(&d->mmu_notifier, d->mm);
> -
> -	vhost_uninit_vq_maps(vq);
> -#endif
> -
>  	switch (ioctl) {
>  	case VHOST_SET_VRING_NUM:
>  		r = vhost_vring_set_num(d, vq, argp);
> @@ -2081,13 +1579,6 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>  		BUG();
>  	}
>  
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	vhost_setup_vq_uaddr(vq);
> -
> -	if (d->mm)
> -		mmu_notifier_register(&d->mmu_notifier, d->mm);
> -#endif
> -
>  	mutex_unlock(&vq->mutex);
>  
>  	return r;
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 42a8c2a13ab1..e9ed2722b633 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -12,9 +12,6 @@
>  #include <linux/virtio_config.h>
>  #include <linux/virtio_ring.h>
>  #include <linux/atomic.h>
> -#include <linux/pagemap.h>
> -#include <linux/mmu_notifier.h>
> -#include <asm/cacheflush.h>
>  
>  struct vhost_work;
>  typedef void (*vhost_work_fn_t)(struct vhost_work *work);
> @@ -83,24 +80,6 @@ enum vhost_uaddr_type {
>  	VHOST_NUM_ADDRS = 3,
>  };
>  
> -struct vhost_map {
> -	int npages;
> -	void *addr;
> -	struct page **pages;
> -};
> -
> -struct vhost_uaddr {
> -	unsigned long uaddr;
> -	size_t size;
> -	bool write;
> -};
> -
> -#if defined(CONFIG_MMU_NOTIFIER) && ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 0
> -#define VHOST_ARCH_CAN_ACCEL_UACCESS 0
> -#else
> -#define VHOST_ARCH_CAN_ACCEL_UACCESS 0
> -#endif
> -
>  /* The virtqueue structure describes a queue attached to a device. */
>  struct vhost_virtqueue {
>  	struct vhost_dev *dev;
> @@ -111,22 +90,7 @@ struct vhost_virtqueue {
>  	struct vring_desc __user *desc;
>  	struct vring_avail __user *avail;
>  	struct vring_used __user *used;
> -
> -#if VHOST_ARCH_CAN_ACCEL_UACCESS
> -	/* Read by memory accessors, modified by meta data
> -	 * prefetching, MMU notifier and vring ioctl().
> -	 * Synchonrized through mmu_lock (writers) and RCU (writers
> -	 * and readers).
> -	 */
> -	struct vhost_map __rcu *maps[VHOST_NUM_ADDRS];
> -	/* Read by MMU notifier, modified by vring ioctl(),
> -	 * synchronized through MMU notifier
> -	 * registering/unregistering.
> -	 */
> -	struct vhost_uaddr uaddrs[VHOST_NUM_ADDRS];
> -#endif
>  	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
> -
>  	struct file *kick;
>  	struct eventfd_ctx *call_ctx;
>  	struct eventfd_ctx *error_ctx;
> @@ -181,8 +145,6 @@ struct vhost_virtqueue {
>  	bool user_be;
>  #endif
>  	u32 busyloop_timeout;
> -	spinlock_t mmu_lock;
> -	int invalidate_count;
>  };
>  
>  struct vhost_msg_node {
> @@ -196,9 +158,6 @@ struct vhost_msg_node {
>  
>  struct vhost_dev {
>  	struct mm_struct *mm;
> -#ifdef CONFIG_MMU_NOTIFIER
> -	struct mmu_notifier mmu_notifier;
> -#endif
>  	struct mutex mutex;
>  	struct vhost_virtqueue **vqs;
>  	int nvqs;
> -- 
> 2.19.1

^ permalink raw reply

* Re: [PATCH] Bluetooth: hidp: Fix error checks in hidp_get/set_raw_report
From: Marcel Holtmann @ 2019-09-06 13:56 UTC (permalink / raw)
  To: Dan Elkouby
  Cc: Johan Hedberg, David S. Miller, Dan Carpenter, Fabian Henneke,
	Brian Norris, Al Viro, Andrea Parri, linux-bluetooth, netdev,
	linux-kernel
In-Reply-To: <20190906094158.8854-1-streetwalkermc@gmail.com>

Hi Dan,

> Commit 48d9cc9d85dd ("Bluetooth: hidp: Let hidp_send_message return
> number of queued bytes") changed hidp_send_message to return non-zero
> values on success, which some other bits did not expect. This caused
> spurious errors to be propagated through the stack, breaking some (all?)
> drivers, such as hid-sony for the Dualshock 4 in Bluetooth mode.
> 
> Signed-off-by: Dan Elkouby <streetwalkermc@gmail.com>
> ---
> net/bluetooth/hidp/core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel


^ permalink raw reply

* Re: [PATCH] Bluetooth: hidp: Fix assumptions on the return value of hidp_send_message
From: Marcel Holtmann @ 2019-09-06 13:57 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Dan Elkouby, Dan Carpenter, Benjamin Tissoires, Johan Hedberg,
	David S. Miller, Brian Norris, Fabian Henneke, Al Viro,
	Andrea Parri, linux-input, linux-kernel, linux-bluetooth, netdev
In-Reply-To: <nycvar.YFH.7.76.1909061330390.31470@cbobk.fhfr.pm>

Hi Jiri,

>> hidp_send_message was changed to return non-zero values on success,
>> which some other bits did not expect. This caused spurious errors to be
>> propagated through the stack, breaking some drivers, such as hid-sony
>> for the Dualshock 4 in Bluetooth mode.
>> 
>> As pointed out by Dan Carpenter, hid-microsoft directly relied on that
>> assumption as well.
>> 
>> Fixes: 48d9cc9d85dd ("Bluetooth: hidp: Let hidp_send_message return number of queued bytes")
>> 
>> Signed-off-by: Dan Elkouby <streetwalkermc@gmail.com>
> 
> Reviewed-by: Jiri Kosina <jkosina@suse.cz>
> 
> Marcel, are you taking this through your tree?

I am taking this through my tree. And yes, I applied the updated patch, but answered the other ;)

Regards

Marcel


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox