Netdev List
 help / color / mirror / Atom feed
* Re: stmmac driver...
From: Jie Deng @ 2016-12-14  4:05 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, David Miller, alexandre.torgue; +Cc: netdev
In-Reply-To: <35edb551-518e-f59f-7a9e-db108d8f42a7@st.com>

Hi Peppe,

On 2016/12/12 22:17, Giuseppe CAVALLARO wrote:
> Hi David
>
> On 12/7/2016 7:06 PM, David Miller wrote:
>>
>> Giuseppe and Alexandre,
>>
>> There are a lot of patches and discussions happening around the stammc
>> driver lately and both of you are listed as the maintainers.
>>
>> I really need prompt and conclusive reviews of these patch submissions
>> from you, and participation in all discussions about the driver.
>
> yes we are trying to do the best.
>
>> Otherwise I have only three things I can do: 1) let the patches rot in
>> patchwork for days 2) trust that the patches are sane and fit your
>> desires and goals and just apply them or 3) reject them since they
>> aren't being reviewed properly.
>
> at this stage, I think the best is: (3).
I think the patches David mentioned also included XLGMAC. He sent this email
before I explained QoS and XLGMAC were different IPs. Do you mind we do XLGMAC
development under drivers/net/ethernet/synopsys/ ? I think we don't have
conflict since we will keep QoS development in stmmac.
>
>>
>> Thanks in advance.
>>
> you are welcome
>
>
> Peppe

^ permalink raw reply

* Re: netlink: GPF in sock_sndtimeo
From: Richard Guy Briggs @ 2016-12-14  4:17 UTC (permalink / raw)
  To: Cong Wang
  Cc: Herbert Xu, Johannes Berg, netdev, Florian Westphal, LKML,
	Eric Dumazet, linux-audit, syzkaller, David Miller, Dmitry Vyukov
In-Reply-To: <CAM_iQpVPEJ2t29ENpT4qcBznwE83w_PEBOxStwyzDH27Si2Ppw@mail.gmail.com>

On 2016-12-13 16:17, Cong Wang wrote:
> On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > It is actually the audit_pid and audit_nlk_portid that I care about
> > more.  The audit daemon could vanish or close the socket while the
> > kernel sock to which it was attached is still quite valid.  Accessing
> > the set of three atomically is the urge.  I wonder if it makes more
> > sense to test for the presence of auditd using audit_sock rather than
> > audit_pid, but still keep audit_pid for our reporting and replacement
> > strategy.  Another idea would be to put the three in one struct.
> 
> Note, the process has audit_pid should hold a refcnt to the netns too,
> so the netns can't be gone until that process is gone.

I noted that.  I did wonder if there might be a problem if all the
processes were moved to another netns with the struct sock stuck in the
now process-void netns.

This is alluded-to in 6f285b19d09f ("audit: Send replies in the proper
network namespace.").

> > Can someone explain how they think the original test was able to trigger
> > this GPF?  Network namespace shutdown while something pretended to set
> > up a new auditd?  That's impressive for a fuzzer if that's the case...
> > Is there an strace?  I guess it is all in test().
> 
> I am surprised you still don't get the race condition even when you
> are now working on v2...
> 
> The race happens in this scenarios :
> 
> 1) Create a new netns
> 
> 2) In the new netns, communicate with kauditd to set audit_sock
> 
> 3) Generate some audit messages, so kauditd will keep sending them
> via audit_sock
> 
> 4) exit the netns
> 
> 5) the previous audit_sock is now going away, but kaudit_sock could still
> access it in this small window.

Ah ok that fits...

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Wei Xu @ 2016-12-14  4:24 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: jasowang, netdev, mst, nhorman, davem
In-Reply-To: <20161213194431.42vtozn6bs24vwda@thunk.org>

On 2016年12月14日 03:44, Theodore Ts'o wrote:
> Jason's patch fixed the issue, so I think we have the proper fix, but
> to answer your questions:
>
> On Wed, Dec 14, 2016 at 01:46:44AM +0800, Wei Xu wrote:
>>
>> Q1:
>> Which distribution are you using for the GCE instance?
>
> The test appliance is based on Debian Jessie.
>
>> Q2:
>> Are you running xfs test as an embedded VM case, which means XFS test
>> appliance is also a VM inside the GCE instance? Or the kernel is built
>> for the instance itself?
>
> No, GCE currently doesn't support running nested VM's (e.g., running
> VM's inside GCE).  So the kernel is built for the instance itself.
> The way the test appliance works is that it initially boots using the
> Debian Jessie default kernel and then we kexec into the kernel under
> test.
>
>> Q3:
>> Can this bug be reproduced for kvm-xfstests case? I'm trying to set up
>> a local test bed if it makes sense.
>
> You definitely can't do it out of the box -- you need to build the
> image using "gen-image --networking", and then run "kvm-xfstests -N
> shell" as root.  But the bug doesn't reproduce on kvm-xfstests, using
> a 4.9 host kernel and linux-next guest kernel.
>

OK, thanks a lot.

BTW, although this is a guest issue, is there anyway to view the GCE
host kernel or qemu(if it is) version?

>
> Cheers,
>
> 					- Ted
>

^ permalink raw reply

* Re: [PATCH v3 net-next 1/3] openvswitch: Add a missing break statement.
From: Pravin Shelar @ 2016-12-14  5:07 UTC (permalink / raw)
  To: Jarno Rajahalme; +Cc: Linux Kernel Network Developers, Jiri Benc, Eric Garver
In-Reply-To: <1480462253-114713-1-git-send-email-jarno@ovn.org>

On Tue, Nov 29, 2016 at 3:30 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
> Add a break statement to prevent fall-through from
> OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
> actions setting ethernet addresses fail to validate with log messages
> complaining about invalid tunnel attributes.
>
> Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
> Acked-by: Pravin B Shelar <pshelar@ovn.org>
> Acked-by: Jiri Benc <jbenc@redhat.com>

Hi Jarno,
Since this is straight forward patch. can you send it separately so
that we can get it merged soon?

Thanks,
Pravin.

^ permalink raw reply

* Re: [RFC PATCH v3] audit: use proper refcount locking on audit_sock
From: Cong Wang @ 2016-12-14  5:36 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Linux Kernel Network Developers, LKML, Eric Dumazet, linux-audit,
	Dmitry Vyukov
In-Reply-To: <20161214040005.GL22660@madcap2.tricolour.ca>

On Tue, Dec 13, 2016 at 8:00 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2016-12-13 16:19, Cong Wang wrote:
>> On Tue, Dec 13, 2016 at 7:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > @@ -1283,8 +1299,10 @@ static void __net_exit audit_net_exit(struct net *net)
>> >  {
>> >         struct audit_net *aunet = net_generic(net, audit_net_id);
>> >         struct sock *sock = aunet->nlsk;
>> > +       mutex_lock(&audit_cmd_mutex);
>> >         if (sock == audit_sock)
>> >                 auditd_reset();
>> > +       mutex_unlock(&audit_cmd_mutex);
>>
>> This still doesn't look correct to me, b/c here we release the audit_sock
>> refcnt twice:
>>
>> 1) inside audit_reset()
>
> The audit_reset() refcount decrement corresponds to a setting of
> audit_sock only if audit_sock is still non-NULL.
>

Hmm, thinking about it again, looks like the sock == audit_sock
and audit_sock != NULL checks can guarantee we are safe. So,

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>

^ permalink raw reply

* [PATCH v3] net: macb: Added PCI wrapper for Platform Driver.
From: Bartosz Folta @ 2016-12-14  6:39 UTC (permalink / raw)
  To: Nicolas Ferre, David S. Miller, Niklas Cassel, Alexandre Torgue,
	Satanand Burla, Raghu Vatsavayi, Simon Horman,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
  Cc: Bartosz Folta, Rafal Ozieblo
In-Reply-To: <1481648560-25927-1-git-send-email-bfolta@cadence.com>

There are hardware PCI implementations of Cadence GEM network
controller. This patch will allow to use such hardware with reuse of
existing Platform Driver.

Signed-off-by: Bartosz Folta <bfolta@cadence.com>
---
Changed in v3:
Fixed dependencies in Kconfig.
---
Changed in v2:
Respin to net-next. Changed patch formatting.
---
 drivers/net/ethernet/cadence/Kconfig    |   9 ++
 drivers/net/ethernet/cadence/Makefile   |   1 +
 drivers/net/ethernet/cadence/macb.c     |  31 +++++--
 drivers/net/ethernet/cadence/macb_pci.c | 153 ++++++++++++++++++++++++++++++++
 include/linux/platform_data/macb.h      |   6 ++
 5 files changed, 195 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index f0bcb15..608bea1 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -31,4 +31,13 @@ config MACB
 	  To compile this driver as a module, choose M here: the module
 	  will be called macb.
 
+config MACB_PCI
+	tristate "Cadence PCI MACB/GEM support"
+	depends on MACB && PCI && COMMON_CLK
+	---help---
+	  This is PCI wrapper for MACB driver.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called macb_pci.
+
 endif # NET_CADENCE
diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
index 91f79b1..4ba7559 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_MACB) += macb.o
+obj-$(CONFIG_MACB_PCI) += macb_pci.o
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 538544a..c0fb80a 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -404,6 +404,8 @@ static int macb_mii_probe(struct net_device *dev)
 			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
 			phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
 		}
+	} else {
+		phydev->irq = PHY_POLL;
 	}
 
 	/* attach the mac to the phy */
@@ -482,6 +484,9 @@ static int macb_mii_init(struct macb *bp)
 				goto err_out_unregister_bus;
 		}
 	} else {
+		for (i = 0; i < PHY_MAX_ADDR; i++)
+			bp->mii_bus->irq[i] = PHY_POLL;
+
 		if (pdata)
 			bp->mii_bus->phy_mask = pdata->phy_mask;
 
@@ -2523,16 +2528,24 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 			 struct clk **hclk, struct clk **tx_clk,
 			 struct clk **rx_clk)
 {
+	struct macb_platform_data *pdata;
 	int err;
 
-	*pclk = devm_clk_get(&pdev->dev, "pclk");
+	pdata = dev_get_platdata(&pdev->dev);
+	if (pdata) {
+		*pclk = pdata->pclk;
+		*hclk = pdata->hclk;
+	} else {
+		*pclk = devm_clk_get(&pdev->dev, "pclk");
+		*hclk = devm_clk_get(&pdev->dev, "hclk");
+	}
+
 	if (IS_ERR(*pclk)) {
 		err = PTR_ERR(*pclk);
 		dev_err(&pdev->dev, "failed to get macb_clk (%u)\n", err);
 		return err;
 	}
 
-	*hclk = devm_clk_get(&pdev->dev, "hclk");
 	if (IS_ERR(*hclk)) {
 		err = PTR_ERR(*hclk);
 		dev_err(&pdev->dev, "failed to get hclk (%u)\n", err);
@@ -3107,15 +3120,23 @@ static int at91ether_init(struct platform_device *pdev)
 MODULE_DEVICE_TABLE(of, macb_dt_ids);
 #endif /* CONFIG_OF */
 
+static const struct macb_config default_gem_config = {
+	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
+	.dma_burst_length = 16,
+	.clk_init = macb_clk_init,
+	.init = macb_init,
+	.jumbo_max_len = 10240,
+};
+
 static int macb_probe(struct platform_device *pdev)
 {
+	const struct macb_config *macb_config = &default_gem_config;
 	int (*clk_init)(struct platform_device *, struct clk **,
 			struct clk **, struct clk **,  struct clk **)
-					      = macb_clk_init;
-	int (*init)(struct platform_device *) = macb_init;
+					      = macb_config->clk_init;
+	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
 	struct device_node *phy_node;
-	const struct macb_config *macb_config = NULL;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
diff --git a/drivers/net/ethernet/cadence/macb_pci.c b/drivers/net/ethernet/cadence/macb_pci.c
new file mode 100644
index 0000000..92be2cd
--- /dev/null
+++ b/drivers/net/ethernet/cadence/macb_pci.c
@@ -0,0 +1,153 @@
+/**
+ * macb_pci.c - Cadence GEM PCI wrapper.
+ *
+ * Copyright (C) 2016 Cadence Design Systems - http://www.cadence.com
+ *
+ * Authors: Rafal Ozieblo <rafalo@cadence.com>
+ *	    Bartosz Folta <bfolta@cadence.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2  of
+ * the License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+#include <linux/etherdevice.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/platform_data/macb.h>
+#include <linux/platform_device.h>
+#include "macb.h"
+
+#define PCI_DRIVER_NAME "macb_pci"
+#define PLAT_DRIVER_NAME "macb"
+
+#define CDNS_VENDOR_ID 0x17cd
+#define CDNS_DEVICE_ID 0xe007
+
+#define GEM_PCLK_RATE 50000000
+#define GEM_HCLK_RATE 50000000
+
+static int macb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	int err;
+	struct platform_device *plat_dev;
+	struct platform_device_info plat_info;
+	struct macb_platform_data plat_data;
+	struct resource res[2];
+
+	/* sanity check */
+	if (!id)
+		return -EINVAL;
+
+	/* enable pci device */
+	err = pci_enable_device(pdev);
+	if (err < 0) {
+		dev_err(&pdev->dev, "Enabling PCI device has failed: 0x%04X",
+			err);
+		return -EACCES;
+	}
+
+	pci_set_master(pdev);
+
+	/* set up resources */
+	memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));
+	res[0].start = pdev->resource[0].start;
+	res[0].end = pdev->resource[0].end;
+	res[0].name = PCI_DRIVER_NAME;
+	res[0].flags = IORESOURCE_MEM;
+	res[1].start = pdev->irq;
+	res[1].name = PCI_DRIVER_NAME;
+	res[1].flags = IORESOURCE_IRQ;
+
+	dev_info(&pdev->dev, "EMAC physical base addr = 0x%p\n",
+		 (void *)(uintptr_t)pci_resource_start(pdev, 0));
+
+	/* set up macb platform data */
+	memset(&plat_data, 0, sizeof(plat_data));
+
+	/* initialize clocks */
+	plat_data.pclk = clk_register_fixed_rate(&pdev->dev, "pclk", NULL, 0,
+						 GEM_PCLK_RATE);
+	if (IS_ERR(plat_data.pclk)) {
+		err = PTR_ERR(plat_data.pclk);
+		goto err_pclk_register;
+	}
+
+	plat_data.hclk = clk_register_fixed_rate(&pdev->dev, "hclk", NULL, 0,
+						 GEM_HCLK_RATE);
+	if (IS_ERR(plat_data.hclk)) {
+		err = PTR_ERR(plat_data.hclk);
+		goto err_hclk_register;
+	}
+
+	/* set up platform device info */
+	memset(&plat_info, 0, sizeof(plat_info));
+	plat_info.parent = &pdev->dev;
+	plat_info.fwnode = pdev->dev.fwnode;
+	plat_info.name = PLAT_DRIVER_NAME;
+	plat_info.id = pdev->devfn;
+	plat_info.res = res;
+	plat_info.num_res = ARRAY_SIZE(res);
+	plat_info.data = &plat_data;
+	plat_info.size_data = sizeof(plat_data);
+	plat_info.dma_mask = DMA_BIT_MASK(32);
+
+	/* register platform device */
+	plat_dev = platform_device_register_full(&plat_info);
+	if (IS_ERR(plat_dev)) {
+		err = PTR_ERR(plat_dev);
+		goto err_plat_dev_register;
+	}
+
+	pci_set_drvdata(pdev, plat_dev);
+
+	return 0;
+
+err_plat_dev_register:
+	clk_unregister(plat_data.hclk);
+
+err_hclk_register:
+	clk_unregister(plat_data.pclk);
+
+err_pclk_register:
+	pci_disable_device(pdev);
+	return err;
+}
+
+static void macb_remove(struct pci_dev *pdev)
+{
+	struct platform_device *plat_dev = pci_get_drvdata(pdev);
+	struct macb_platform_data *plat_data = dev_get_platdata(&plat_dev->dev);
+
+	platform_device_unregister(plat_dev);
+	pci_disable_device(pdev);
+	clk_unregister(plat_data->pclk);
+	clk_unregister(plat_data->hclk);
+}
+
+static struct pci_device_id dev_id_table[] = {
+	{ PCI_DEVICE(CDNS_VENDOR_ID, CDNS_DEVICE_ID), },
+	{ 0, }
+};
+
+static struct pci_driver macb_pci_driver = {
+	.name     = PCI_DRIVER_NAME,
+	.id_table = dev_id_table,
+	.probe    = macb_probe,
+	.remove	  = macb_remove,
+};
+
+module_pci_driver(macb_pci_driver);
+MODULE_DEVICE_TABLE(pci, dev_id_table);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Cadence NIC PCI wrapper");
diff --git a/include/linux/platform_data/macb.h b/include/linux/platform_data/macb.h
index 21b15f6..7815d50 100644
--- a/include/linux/platform_data/macb.h
+++ b/include/linux/platform_data/macb.h
@@ -8,6 +8,8 @@
 #ifndef __MACB_PDATA_H__
 #define __MACB_PDATA_H__
 
+#include <linux/clk.h>
+
 /**
  * struct macb_platform_data - platform data for MACB Ethernet
  * @phy_mask:		phy mask passed when register the MDIO bus
@@ -15,12 +17,16 @@
  * @phy_irq_pin:	PHY IRQ
  * @is_rmii:		using RMII interface?
  * @rev_eth_addr:	reverse Ethernet address byte order
+ * @pclk:		platform clock
+ * @hclk:		AHB clock
  */
 struct macb_platform_data {
 	u32		phy_mask;
 	int		phy_irq_pin;
 	u8		is_rmii;
 	u8		rev_eth_addr;
+	struct clk	*pclk;
+	struct clk	*hclk;
 };
 
 #endif /* __MACB_PDATA_H__ */
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH iproute2 -net-next] lwt: BPF support for LWT
From: Thomas Graf @ 2016-12-14  7:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20161212154134.51638dae@xeon-e3>

On 13 December 2016 at 00:41, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> I went ahead and fixed these.

Thanks for fixing it up Stephen.

^ permalink raw reply

* [Query] Delayed vxlan socket creation?
From: Du, Fan @ 2016-12-14  7:49 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: mrjana@gmail.com, Du, Fan

Hi

I'm interested to one Docker issue[1] which looks like related to kernel vxlan socket creation
as described in the thread. From my limited knowledge here, socket creation is synchronous ,
and after the *socket* syscall, the sock handle will be valid and ready to linkup.

Somehow I'm not sure the detailed scenario here, and which/how possible commit fix?
Thanks!

Quoted analysis:
--------------------------------------------------------------------------
(Found in kernel 3.13)
The issue happens because in older kernels when a vxlan interface is created, 
the socket creation is queued up in a worker thread which actually creates 
the socket. But this needs to happen before we bring up the link on the vxlan interface. 
If for some chance, the worker thread hasn't completed the creation of the socket 
before we did link up then when we do link up the kernel checks if the socket was 
created and if not it will return ENOTCONN. This was a bug in the kernel which got fixed
in later kernels. That is why retrying with a timer fixes the issue.

[1]: https://github.com/docker/libnetwork/issues/1247

^ permalink raw reply

* [PATCH] vhost: introduce O(1) vq metadata cache
From: Jason Wang @ 2016-12-14  7:56 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
  Cc: vkaplans, maxime.coquelin, wexu, peterx

When device IOTLB is enabled, all address translations were stored in
interval tree. O(lgN) searching time could be slow for virtqueue
metadata (avail, used and descriptors) since they were accessed much
often than other addresses. So this patch introduces an O(1) array
which points to the interval tree nodes that store the translations of
vq metadata. Those array were update during vq IOTLB prefetching and
were reset during each invalidation and tlb update. Each time we want
to access vq metadata, this small array were queried before interval
tree. This would be sufficient for static mappings but not dynamic
mappings, we could do optimizations on top.

Test were done with l2fwd in guest (2M hugepage):

   noiommu  | before        | after
tx 1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%)
rx 2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%)

We can almost reach the same performance as noiommu mode.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 136 ++++++++++++++++++++++++++++++++++++++++----------
 drivers/vhost/vhost.h |   8 +++
 2 files changed, 118 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c6f2d89..89e40b6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -282,6 +282,22 @@ void vhost_poll_queue(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
+static void __vhost_vq_meta_reset(struct vhost_virtqueue *vq)
+{
+	int j;
+
+	for (j = 0; j < VHOST_NUM_ADDRS; j++)
+		vq->meta_iotlb[j] = NULL;
+}
+
+static void vhost_vq_meta_reset(struct vhost_dev *d)
+{
+	int i;
+
+	for (i = 0; i < d->nvqs; ++i)
+		__vhost_vq_meta_reset(d->vqs[i]);
+}
+
 static void vhost_vq_reset(struct vhost_dev *dev,
 			   struct vhost_virtqueue *vq)
 {
@@ -311,6 +327,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	__vhost_vq_meta_reset(vq);
 }
 
 static int vhost_worker(void *data)
@@ -690,6 +707,18 @@ static int vq_memory_access_ok(void __user *log_base, struct vhost_umem *umem,
 	return 1;
 }
 
+static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
+					       u64 addr, unsigned int size,
+					       int type)
+{
+	const struct vhost_umem_node *node = vq->meta_iotlb[type];
+
+	if (!node)
+		return NULL;
+
+	return (void *)(node->userspace_addr + (u64)addr - node->start);
+}
+
 /* Can we switch to this memory table? */
 /* Caller should have device mutex but not vq mutex */
 static int memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
@@ -732,8 +761,14 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void *to,
 		 * could be access through iotlb. So -EAGAIN should
 		 * not happen in this case.
 		 */
-		/* TODO: more fast path */
 		struct iov_iter t;
+		void __user *uaddr = vhost_vq_meta_fetch(vq,
+				     (u64)(uintptr_t)to, size,
+				     VHOST_ADDR_DESC);
+
+		if (uaddr)
+			return __copy_to_user(uaddr, from, size);
+
 		ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov,
 				     ARRAY_SIZE(vq->iotlb_iov),
 				     VHOST_ACCESS_WO);
@@ -761,8 +796,14 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
 		 * could be access through iotlb. So -EAGAIN should
 		 * not happen in this case.
 		 */
-		/* TODO: more fast path */
+		void __user *uaddr = vhost_vq_meta_fetch(vq,
+				     (u64)(uintptr_t)from, size,
+				     VHOST_ADDR_DESC);
 		struct iov_iter f;
+
+		if (uaddr)
+			return __copy_from_user(to, uaddr, size);
+
 		ret = translate_desc(vq, (u64)(uintptr_t)from, size, vq->iotlb_iov,
 				     ARRAY_SIZE(vq->iotlb_iov),
 				     VHOST_ACCESS_RO);
@@ -782,17 +823,12 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
 	return ret;
 }
 
-static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
-				     void *addr, unsigned size)
+static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
+					  void *addr, unsigned int size,
+					  int type)
 {
 	int ret;
 
-	/* This function should be called after iotlb
-	 * prefetch, which means we're sure that vq
-	 * could be access through iotlb. So -EAGAIN should
-	 * not happen in this case.
-	 */
-	/* TODO: more fast path */
 	ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
 			     ARRAY_SIZE(vq->iotlb_iov),
 			     VHOST_ACCESS_RO);
@@ -813,14 +849,32 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	return vq->iotlb_iov[0].iov_base;
 }
 
-#define vhost_put_user(vq, x, ptr) \
+/* This function should be called after iotlb
+ * prefetch, which means we're sure that vq
+ * could be access through iotlb. So -EAGAIN should
+ * not happen in this case.
+ */
+static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
+					    void *addr, unsigned int size,
+					    int type)
+{
+	void __user *uaddr = vhost_vq_meta_fetch(vq,
+			     (u64)(uintptr_t)addr, size, type);
+	if (uaddr)
+		return uaddr;
+
+	return __vhost_get_user_slow(vq, addr, size, type);
+}
+
+#define vhost_put_user(vq, x, ptr)		\
 ({ \
 	int ret = -EFAULT; \
 	if (!vq->iotlb) { \
 		ret = __put_user(x, ptr); \
 	} else { \
 		__typeof__(ptr) to = \
-			(__typeof__(ptr)) __vhost_get_user(vq, ptr, sizeof(*ptr)); \
+			(__typeof__(ptr)) __vhost_get_user(vq, ptr,	\
+					  sizeof(*ptr), VHOST_ADDR_USED); \
 		if (to != NULL) \
 			ret = __put_user(x, to); \
 		else \
@@ -829,14 +883,16 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	ret; \
 })
 
-#define vhost_get_user(vq, x, ptr) \
+#define vhost_get_user(vq, x, ptr, type)		\
 ({ \
 	int ret; \
 	if (!vq->iotlb) { \
 		ret = __get_user(x, ptr); \
 	} else { \
 		__typeof__(ptr) from = \
-			(__typeof__(ptr)) __vhost_get_user(vq, ptr, sizeof(*ptr)); \
+			(__typeof__(ptr)) __vhost_get_user(vq, ptr, \
+							   sizeof(*ptr), \
+							   type); \
 		if (from != NULL) \
 			ret = __get_user(x, from); \
 		else \
@@ -845,6 +901,12 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	ret; \
 })
 
+#define vhost_get_avail(vq, x, ptr) \
+	vhost_get_user(vq, x, ptr, VHOST_ADDR_AVAIL)
+
+#define vhost_get_used(vq, x, ptr) \
+	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
+
 static void vhost_dev_lock_vqs(struct vhost_dev *d)
 {
 	int i = 0;
@@ -950,6 +1012,7 @@ int vhost_process_iotlb_msg(struct vhost_dev *dev,
 			ret = -EFAULT;
 			break;
 		}
+		vhost_vq_meta_reset(dev);
 		if (vhost_new_umem_range(dev->iotlb, msg->iova, msg->size,
 					 msg->iova + msg->size - 1,
 					 msg->uaddr, msg->perm)) {
@@ -959,6 +1022,7 @@ int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		vhost_iotlb_notify_vq(dev, msg);
 		break;
 	case VHOST_IOTLB_INVALIDATE:
+		vhost_vq_meta_reset(dev);
 		vhost_del_umem_range(dev->iotlb, msg->iova,
 				     msg->iova + msg->size - 1);
 		break;
@@ -1102,12 +1166,26 @@ static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
 			sizeof *used + num * sizeof *used->ring + s);
 }
 
+static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
+				 const struct vhost_umem_node *node,
+				 int type)
+{
+	int access = (type == VHOST_ADDR_USED) ?
+		     VHOST_ACCESS_WO : VHOST_ACCESS_RO;
+
+	if (likely(node->perm & access))
+		vq->meta_iotlb[type] = node;
+}
+
 static int iotlb_access_ok(struct vhost_virtqueue *vq,
-			   int access, u64 addr, u64 len)
+			   int access, u64 addr, u64 len, int type)
 {
 	const struct vhost_umem_node *node;
 	struct vhost_umem *umem = vq->iotlb;
-	u64 s = 0, size;
+	u64 s = 0, size, orig_addr = addr;
+
+	if (vhost_vq_meta_fetch(vq, addr, len, type))
+		return true;
 
 	while (len > s) {
 		node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
@@ -1124,6 +1202,10 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
 		}
 
 		size = node->size - addr + node->start;
+
+		if (orig_addr == addr && size >= len)
+			vhost_vq_meta_update(vq, node, type);
+
 		s += size;
 		addr += size;
 	}
@@ -1140,13 +1222,15 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
 		return 1;
 
 	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
-			       num * sizeof *vq->desc) &&
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
 	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
 			       sizeof *vq->avail +
-			       num * sizeof *vq->avail->ring + s) &&
+			       num * sizeof(*vq->avail->ring) + s,
+			       VHOST_ADDR_AVAIL) &&
 	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->used,
 			       sizeof *vq->used +
-			       num * sizeof *vq->used->ring + s);
+			       num * sizeof(*vq->used->ring) + s,
+			       VHOST_ADDR_USED);
 }
 EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
 
@@ -1729,7 +1813,7 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 		r = -EFAULT;
 		goto err;
 	}
-	r = vhost_get_user(vq, last_used_idx, &vq->used->idx);
+	r = vhost_get_used(vq, last_used_idx, &vq->used->idx);
 	if (r) {
 		vq_err(vq, "Can't access used idx at %p\n",
 		       &vq->used->idx);
@@ -1932,7 +2016,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	/* Check it isn't doing very strange things with descriptor numbers. */
 	last_avail_idx = vq->last_avail_idx;
-	if (unlikely(vhost_get_user(vq, avail_idx, &vq->avail->idx))) {
+	if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
 		vq_err(vq, "Failed to access avail idx at %p\n",
 		       &vq->avail->idx);
 		return -EFAULT;
@@ -1954,7 +2038,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	/* Grab the next descriptor number they're advertising, and increment
 	 * the index we've seen. */
-	if (unlikely(vhost_get_user(vq, ring_head,
+	if (unlikely(vhost_get_avail(vq, ring_head,
 		     &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
 		vq_err(vq, "Failed to read head: idx %d address %p\n",
 		       last_avail_idx,
@@ -2170,7 +2254,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
 		__virtio16 flags;
-		if (vhost_get_user(vq, flags, &vq->avail->flags)) {
+		if (vhost_get_avail(vq, flags, &vq->avail->flags)) {
 			vq_err(vq, "Failed to get flags");
 			return true;
 		}
@@ -2184,7 +2268,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	if (unlikely(!v))
 		return true;
 
-	if (vhost_get_user(vq, event, vhost_used_event(vq))) {
+	if (vhost_get_avail(vq, event, vhost_used_event(vq))) {
 		vq_err(vq, "Failed to get used event idx");
 		return true;
 	}
@@ -2226,7 +2310,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__virtio16 avail_idx;
 	int r;
 
-	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
+	r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
 	if (r)
 		return false;
 
@@ -2261,7 +2345,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	/* They could have slipped one in as we were doing that: make
 	 * sure it's written, then check again. */
 	smp_mb();
-	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
+	r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
 	if (r) {
 		vq_err(vq, "Failed to check avail idx at %p: %d\n",
 		       &vq->avail->idx, r);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 78f3c5f..034ea18 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -76,6 +76,13 @@ struct vhost_umem {
 	int numem;
 };
 
+enum vhost_uaddr_type {
+	VHOST_ADDR_DESC = 0,
+	VHOST_ADDR_AVAIL = 1,
+	VHOST_ADDR_USED = 2,
+	VHOST_NUM_ADDRS = 3,
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -86,6 +93,7 @@ struct vhost_virtqueue {
 	struct vring_desc __user *desc;
 	struct vring_avail __user *avail;
 	struct vring_used __user *used;
+	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
 	struct file *kick;
 	struct file *call;
 	struct file *error;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] vhost: introduce O(1) vq metadata cache
From: kbuild test robot @ 2016-12-14  8:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, mst, netdev, linux-kernel, peterx, virtualization,
	maxime.coquelin, kbuild-all, vkaplans, wexu
In-Reply-To: <1481702183-16088-1-git-send-email-jasowang@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

Hi Jason,

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on v4.9 next-20161214]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vhost-introduce-O-1-vq-metadata-cache/20161214-160153
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-x005-201650 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/vhost/vhost.c: In function 'vhost_vq_meta_fetch':
>> drivers/vhost/vhost.c:719:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     return (void *)(node->userspace_addr + (u64)addr - node->start);
            ^

vim +719 drivers/vhost/vhost.c

   703							   node->start,
   704							   node->size))
   705				return 0;
   706		}
   707		return 1;
   708	}
   709	
   710	static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
   711						       u64 addr, unsigned int size,
   712						       int type)
   713	{
   714		const struct vhost_umem_node *node = vq->meta_iotlb[type];
   715	
   716		if (!node)
   717			return NULL;
   718	
 > 719		return (void *)(node->userspace_addr + (u64)addr - node->start);
   720	}
   721	
   722	/* Can we switch to this memory table? */
   723	/* Caller should have device mutex but not vq mutex */
   724	static int memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
   725				    int log_all)
   726	{
   727		int i;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27829 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: stmmac driver...
From: Jie Deng @ 2016-12-14  8:26 UTC (permalink / raw)
  To: Jie Deng, David Miller, alexandre.torgue
  Cc: CARLOS.PALMINHA, netdev, Giuseppe CAVALLARO
In-Reply-To: <d8d00654-a07e-2992-a911-3b88f1d3d3ac@st.com>

Hi David,

>>>> Giuseppe and Alexandre,
>>>>
>>>> There are a lot of patches and discussions happening around the stammc
>>>> driver lately and both of you are listed as the maintainers.
>>>>
>>>> I really need prompt and conclusive reviews of these patch submissions
>>>> from you, and participation in all discussions about the driver.
>>>
>>> yes we are trying to do the best.
>>>
>>>> Otherwise I have only three things I can do: 1) let the patches rot in
>>>> patchwork for days 2) trust that the patches are sane and fit your
>>>> desires and goals and just apply them or 3) reject them since they
>>>> aren't being reviewed properly.
>>>
>>> at this stage, I think the best is: (3).
>> I think the patches David mentioned also included XLGMAC. He sent this email
>> before I explained QoS and XLGMAC were different IPs. Do you mind we do XLGMAC
>> development under drivers/net/ethernet/synopsys/ ? I think we don't have
>> conflict since we will keep QoS development in stmmac.
>
> Great. Many thanks for the clarification :-)
>
> Regards
> Peppe
>
Do you agree that we do XLGMAC  development under drivers/net/ethernet/synopsys/
in the future ?
There is no conflict of interest since this is a new IP without driver. As you
see, there are several drivers for QoS (GMAC) and several drivers for XGMAC. We
want to avoid this situation for the new IP XLGMAC.

Regards,
Jie

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] phy: add phy fixup unregister functions
From: Dongpo Li @ 2016-12-14  8:39 UTC (permalink / raw)
  To: Woojung.Huh, davem, f.fainelli; +Cc: andrew, netdev, UNGLinuxDriver
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D4097999C@CHN-SV-EXMX02.mchp-main.com>

Hi all,

On 2016/12/8 4:26, Woojung.Huh@microchip.com wrote:
>>From : Woojung Huh <woojung.huh@microchip.com>
> 
> Add functions to unregister phy fixup for modules.
> 
> int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
> 	Unregister phy fixup from phy_fixup_list per bus_id, phy_uid &
> 	phy_uid_mask
> 
> int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
> 	Unregister phy fixup from phy_fixup_list.
> 	Use it for fixup registered by phy_register_fixup_for_uid()
> 
> int phy_unregister_fixup_for_id(const char *bus_id)
> 	Unregister phy fixup from phy_fixup_list.
> 	Use it for fixup registered by phy_register_fixup_for_id()
> 
> Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
> ---
>  Documentation/networking/phy.txt |  9 ++++++++
>  drivers/net/phy/phy_device.c     | 47 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/phy.h              |  4 ++++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
> index e017d93..16f90d8 100644
> --- a/Documentation/networking/phy.txt
> +++ b/Documentation/networking/phy.txt
> @@ -407,6 +407,15 @@ Board Fixups
>   The stubs set one of the two matching criteria, and set the other one to
>   match anything.
>  
> + When phy_register_fixup() or *_for_uid()/*_for_id() is called at module,
> + unregister fixup and free allocate memory are required.
> +
> + Call one of following function before unloading module.
> +
> + int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
> + int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
> + int phy_register_fixup_for_id(const char *phy_id);
> +
>  Standards
>  
>   IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index aeaf1bc..32fa7c7 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -235,6 +235,53 @@ int phy_register_fixup_for_id(const char *bus_id,
>  }
>  EXPORT_SYMBOL(phy_register_fixup_for_id);
>  
> +/**
> + * phy_unregister_fixup - remove a phy_fixup from the list
> + * @bus_id: A string matches fixup->bus_id (or PHY_ANY_ID) in phy_fixup_list
> + * @phy_uid: A phy id matches fixup->phy_id (or PHY_ANY_UID) in phy_fixup_list
> + * @phy_uid_mask: Applied to phy_uid and fixup->phy_uid before comparison
> + */
> +int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
> +{
> +	struct list_head *pos, *n;
> +	struct phy_fixup *fixup;
> +	int ret;
> +
> +	ret = -ENODEV;
> +
> +	mutex_lock(&phy_fixup_lock);
> +	list_for_each_safe(pos, n, &phy_fixup_list) {
> +		fixup = list_entry(pos, struct phy_fixup, list);
> +
> +		if ((!strcmp(fixup->bus_id, bus_id)) &&
> +		    ((fixup->phy_uid & phy_uid_mask) ==
> +		     (phy_uid & phy_uid_mask))) {
> +			list_del(&fixup->list);
> +			kfree(fixup);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&phy_fixup_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(phy_unregister_fixup);
> +
I just want to commit the unregister patch and found this patch. Good job!
But I consider this patch may miss something.
If one SoC has 2 MAC ports and each port uses the different network driver,
the 2 drivers may register fixup for the same PHY chip with different
"run" function because the PHY chip works in different mode.
In such a case, this patch doesn't consider "run" function and may cause problem.
When removing the driver which register fixup at last, it will remove another
driver's fixup.
Should this condition be considered and fixed?

> +/* Unregisters a fixup of any PHY with the UID in phy_uid */
> +int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
> +{
> +	return phy_unregister_fixup(PHY_ANY_ID, phy_uid, phy_uid_mask);
> +}
> +EXPORT_SYMBOL(phy_unregister_fixup_for_uid);
> +
> +/* Unregisters a fixup of the PHY with id string bus_id */
> +int phy_unregister_fixup_for_id(const char *bus_id)
> +{
> +	return phy_unregister_fixup(bus_id, PHY_ANY_UID, 0xffffffff);
> +}
> +EXPORT_SYMBOL(phy_unregister_fixup_for_id);
> +
>  /* Returns 1 if fixup matches phydev in bus_id and phy_uid.
>   * Fixups can be set to match any in one or more fields.
>   */
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index feb8a98..f7d95f6 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -860,6 +860,10 @@ int phy_register_fixup_for_id(const char *bus_id,
>  int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
>  			       int (*run)(struct phy_device *));
>  
> +int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask);
> +int phy_unregister_fixup_for_id(const char *bus_id);
> +int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
> +
>  int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable);
>  int phy_get_eee_err(struct phy_device *phydev);
>  int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data);
> 


    Regards,
    Dongpo

.

^ permalink raw reply

* Re: stmmac driver...
From: Giuseppe CAVALLARO @ 2016-12-14  7:33 UTC (permalink / raw)
  To: Jie Deng, David Miller, alexandre.torgue; +Cc: netdev
In-Reply-To: <8d624fd3-8440-5b8a-ee8d-558a671eec60@synopsys.com>

Hello Jie

On 12/14/2016 5:05 AM, Jie Deng wrote:
> Hi Peppe,
>
> On 2016/12/12 22:17, Giuseppe CAVALLARO wrote:
>> Hi David
>>
>> On 12/7/2016 7:06 PM, David Miller wrote:
>>>
>>> Giuseppe and Alexandre,
>>>
>>> There are a lot of patches and discussions happening around the stammc
>>> driver lately and both of you are listed as the maintainers.
>>>
>>> I really need prompt and conclusive reviews of these patch submissions
>>> from you, and participation in all discussions about the driver.
>>
>> yes we are trying to do the best.
>>
>>> Otherwise I have only three things I can do: 1) let the patches rot in
>>> patchwork for days 2) trust that the patches are sane and fit your
>>> desires and goals and just apply them or 3) reject them since they
>>> aren't being reviewed properly.
>>
>> at this stage, I think the best is: (3).
> I think the patches David mentioned also included XLGMAC. He sent this email
> before I explained QoS and XLGMAC were different IPs. Do you mind we do XLGMAC
> development under drivers/net/ethernet/synopsys/ ? I think we don't have
> conflict since we will keep QoS development in stmmac.

Great. Many thanks for the clarification :-)

Regards
Peppe

>>
>>>
>>> Thanks in advance.
>>>
>> you are welcome
>>
>>
>> Peppe
>
>

^ permalink raw reply

* [PATCH] net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes @ 2016-12-14  9:01 UTC (permalink / raw)
  To: davem, robert.jarzmik, mugunthanvnm, marcel, jarod, s.nawrocki,
	fw, harvey.hunt
  Cc: netdev, Philippe Reynes

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
---
 drivers/net/ethernet/davicom/dm9000.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index f1a81c5..008dc81 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -570,19 +570,21 @@ static void dm9000_set_msglevel(struct net_device *dev, u32 value)
 	dm->msg_enable = value;
 }
 
-static int dm9000_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int dm9000_get_link_ksettings(struct net_device *dev,
+				     struct ethtool_link_ksettings *cmd)
 {
 	struct board_info *dm = to_dm9000_board(dev);
 
-	mii_ethtool_gset(&dm->mii, cmd);
+	mii_ethtool_get_link_ksettings(&dm->mii, cmd);
 	return 0;
 }
 
-static int dm9000_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int dm9000_set_link_ksettings(struct net_device *dev,
+				     const struct ethtool_link_ksettings *cmd)
 {
 	struct board_info *dm = to_dm9000_board(dev);
 
-	return mii_ethtool_sset(&dm->mii, cmd);
+	return mii_ethtool_set_link_ksettings(&dm->mii, cmd);
 }
 
 static int dm9000_nway_reset(struct net_device *dev)
@@ -741,8 +743,6 @@ static int dm9000_set_wol(struct net_device *dev, struct ethtool_wolinfo *w)
 
 static const struct ethtool_ops dm9000_ethtool_ops = {
 	.get_drvinfo		= dm9000_get_drvinfo,
-	.get_settings		= dm9000_get_settings,
-	.set_settings		= dm9000_set_settings,
 	.get_msglevel		= dm9000_get_msglevel,
 	.set_msglevel		= dm9000_set_msglevel,
 	.nway_reset		= dm9000_nway_reset,
@@ -752,6 +752,8 @@ static int dm9000_set_wol(struct net_device *dev, struct ethtool_wolinfo *w)
 	.get_eeprom_len		= dm9000_get_eeprom_len,
 	.get_eeprom		= dm9000_get_eeprom,
 	.set_eeprom		= dm9000_set_eeprom,
+	.get_link_ksettings	= dm9000_get_link_ksettings,
+	.set_link_ksettings	= dm9000_set_link_ksettings,
 };
 
 static void dm9000_show_carrier(struct board_info *db,
-- 
1.7.4.4

^ permalink raw reply related

* Re: [Query] Delayed vxlan socket creation?
From: Jiri Benc @ 2016-12-14  9:29 UTC (permalink / raw)
  To: Du, Fan; +Cc: netdev@vger.kernel.org, mrjana@gmail.com
In-Reply-To: <5A90DA2E42F8AE43BC4A093BF06788481A9457F1@SHSMSX103.ccr.corp.intel.com>

On Wed, 14 Dec 2016 07:49:24 +0000, Du, Fan wrote:
> I'm interested to one Docker issue[1] which looks like related to kernel vxlan socket creation
> as described in the thread. From my limited knowledge here, socket creation is synchronous ,
> and after the *socket* syscall, the sock handle will be valid and ready to linkup.
> 
> Somehow I'm not sure the detailed scenario here, and which/how possible commit fix?

baf606d9c9b1^..56ef9c909b40

 Jiri

^ permalink raw reply

* Re: [PATCH] vhost: introduce O(1) vq metadata cache
From: Jason Wang @ 2016-12-14  9:33 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kvm, mst, netdev, linux-kernel, peterx, virtualization,
	maxime.coquelin, kbuild-all, vkaplans, wexu
In-Reply-To: <201612141605.QJowNf4i%fengguang.wu@intel.com>



On 2016年12月14日 16:14, kbuild test robot wrote:
> Hi Jason,
>
> [auto build test WARNING on vhost/linux-next]
> [also build test WARNING on v4.9 next-20161214]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vhost-introduce-O-1-vq-metadata-cache/20161214-160153
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> config: i386-randconfig-x005-201650 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
>          # save the attached .config to linux build tree

Thanks, V2 will be posted soon.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Jesper Dangaard Brouer @ 2016-12-14  9:39 UTC (permalink / raw)
  To: John Fastabend
  Cc: David Miller, cl, rppt, netdev, linux-mm, willemdebruijn.kernel,
	bjorn.topel, magnus.karlsson, alexander.duyck, mgorman, tom,
	bblanco, tariqt, saeedm, jesse.brandeburg, METH, vyasevich,
	brouer
In-Reply-To: <58505535.1080908@gmail.com>

On Tue, 13 Dec 2016 12:08:21 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

> On 16-12-13 11:53 AM, David Miller wrote:
> > From: John Fastabend <john.fastabend@gmail.com>
> > Date: Tue, 13 Dec 2016 09:43:59 -0800
> >   
> >> What does "zero-copy send packet-pages to the application/socket that
> >> requested this" mean? At the moment on x86 page-flipping appears to be
> >> more expensive than memcpy (I can post some data shortly) and shared
> >> memory was proposed and rejected for security reasons when we were
> >> working on bifurcated driver.  
> > 
> > The whole idea is that we map all the active RX ring pages into
> > userspace from the start.
> > 
> > And just how Jesper's page pool work will avoid DMA map/unmap,
> > it will also avoid changing the userspace mapping of the pages
> > as well.
> > 
> > Thus avoiding the TLB/VM overhead altogether.
> >   

Exactly.  It is worth mentioning that pages entering the page pool need
to be cleared (measured cost 143 cycles), in order to not leak any
kernel info.  The primary focus of this design is to make sure not to
leak kernel info to userspace, but with an "exclusive" mode also
support isolation between applications.


> I get this but it requires applications to be isolated. The pages from
> a queue can not be shared between multiple applications in different
> trust domains. And the application has to be cooperative meaning it
> can't "look" at data that has not been marked by the stack as OK. In
> these schemes we tend to end up with something like virtio/vhost or
> af_packet.

I expect 3 modes, when enabling RX-zero-copy on a page_pool. The first
two would require CAP_NET_ADMIN privileges.  All modes have a trust
domain id, that need to match e.g. when page reach the socket.

Mode-1 "Shared": Application choose lowest isolation level, allowing
 multiple application to mmap VMA area.

Mode-2 "Single-user": Application request it want to be the only user
 of the RX queue.  This blocks other application to mmap VMA area.

Mode-3 "Exclusive": Application request to own RX queue.  Packets are
 no longer allowed for normal netstack delivery.

Notice mode-2 still requires CAP_NET_ADMIN, because packets/pages are
still allowed to travel netstack and thus can contain packet data from
other normal applications.  This is part of the design, to share the
NIC between netstack and an accelerated userspace application using RX
zero-copy delivery.


> Any ACLs/filtering/switching/headers need to be done in hardware or
> the application trust boundaries are broken.

The software solution outlined allow the application to make the choice
of what trust boundary it wants.

The "exclusive" mode-3 make most sense together with HW filters.
Already today, we support creating a new RX queue based on ethtool
ntuple HW filter and then you simply attach your application that queue
in mode-3, and have full isolation.

 
> If the above can not be met then a copy is needed. What I am trying
> to tease out is the above comment along with other statements like
> this "can be done with out HW filter features".

Does this address your concerns?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* RE: [PATCH 3/3] secure_seq: use fast&secure siphash instead of slow&insecure md5
From: David Laight @ 2016-12-14  9:51 UTC (permalink / raw)
  To: 'Jason A. Donenfeld', Netdev, David Miller,
	Linus Torvalds, kernel-hardening@lists.openwall.com, LKML,
	George Spelvin, Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH,
	Eric Biggers
In-Reply-To: <20161214001656.19388-3-Jason@zx2c4.com>

From: Jason A. Donenfeld
> Sent: 14 December 2016 00:17
> This gives a clear speed and security improvement. Rather than manually
> filling MD5 buffers, we simply create a layout by a simple anonymous
> struct, for which gcc generates rather efficient code.
...
> +	const struct {
> +		struct in6_addr saddr;
> +		struct in6_addr daddr;
> +		__be16 sport;
> +		__be16 dport;
> +	} __packed combined = {
> +		.saddr = *(struct in6_addr *)saddr,
> +		.daddr = *(struct in6_addr *)daddr,
> +		.sport = sport,
> +		.dport = dport
> +	};

You need to look at the effect of marking this (and the other)
structures 'packed' on architectures like sparc64.

	David

^ permalink raw reply

* [PATCH V2] vhost: introduce O(1) vq metadata cache
From: Jason Wang @ 2016-12-14  9:53 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
  Cc: vkaplans, maxime.coquelin, wexu, peterx

When device IOTLB is enabled, all address translations were stored in
interval tree. O(lgN) searching time could be slow for virtqueue
metadata (avail, used and descriptors) since they were accessed much
often than other addresses. So this patch introduces an O(1) array
which points to the interval tree nodes that store the translations of
vq metadata. Those array were update during vq IOTLB prefetching and
were reset during each invalidation and tlb update. Each time we want
to access vq metadata, this small array were queried before interval
tree. This would be sufficient for static mappings but not dynamic
mappings, we could do optimizations on top.

Test were done with l2fwd in guest (2M hugepage):

   noiommu  | before        | after
tx 1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%)
rx 2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%)

We can almost reach the same performance as noiommu mode.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from V1:
- silent 32bit build warning
---
 drivers/vhost/vhost.c | 136 ++++++++++++++++++++++++++++++++++++++++----------
 drivers/vhost/vhost.h |   8 +++
 2 files changed, 118 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c6f2d89..50ed625 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -282,6 +282,22 @@ void vhost_poll_queue(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
+static void __vhost_vq_meta_reset(struct vhost_virtqueue *vq)
+{
+	int j;
+
+	for (j = 0; j < VHOST_NUM_ADDRS; j++)
+		vq->meta_iotlb[j] = NULL;
+}
+
+static void vhost_vq_meta_reset(struct vhost_dev *d)
+{
+	int i;
+
+	for (i = 0; i < d->nvqs; ++i)
+		__vhost_vq_meta_reset(d->vqs[i]);
+}
+
 static void vhost_vq_reset(struct vhost_dev *dev,
 			   struct vhost_virtqueue *vq)
 {
@@ -311,6 +327,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	__vhost_vq_meta_reset(vq);
 }
 
 static int vhost_worker(void *data)
@@ -690,6 +707,18 @@ static int vq_memory_access_ok(void __user *log_base, struct vhost_umem *umem,
 	return 1;
 }
 
+static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
+					       u64 addr, unsigned int size,
+					       int type)
+{
+	const struct vhost_umem_node *node = vq->meta_iotlb[type];
+
+	if (!node)
+		return NULL;
+
+	return (void *)(uintptr_t)(node->userspace_addr + addr - node->start);
+}
+
 /* Can we switch to this memory table? */
 /* Caller should have device mutex but not vq mutex */
 static int memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
@@ -732,8 +761,14 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void *to,
 		 * could be access through iotlb. So -EAGAIN should
 		 * not happen in this case.
 		 */
-		/* TODO: more fast path */
 		struct iov_iter t;
+		void __user *uaddr = vhost_vq_meta_fetch(vq,
+				     (u64)(uintptr_t)to, size,
+				     VHOST_ADDR_DESC);
+
+		if (uaddr)
+			return __copy_to_user(uaddr, from, size);
+
 		ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov,
 				     ARRAY_SIZE(vq->iotlb_iov),
 				     VHOST_ACCESS_WO);
@@ -761,8 +796,14 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
 		 * could be access through iotlb. So -EAGAIN should
 		 * not happen in this case.
 		 */
-		/* TODO: more fast path */
+		void __user *uaddr = vhost_vq_meta_fetch(vq,
+				     (u64)(uintptr_t)from, size,
+				     VHOST_ADDR_DESC);
 		struct iov_iter f;
+
+		if (uaddr)
+			return __copy_from_user(to, uaddr, size);
+
 		ret = translate_desc(vq, (u64)(uintptr_t)from, size, vq->iotlb_iov,
 				     ARRAY_SIZE(vq->iotlb_iov),
 				     VHOST_ACCESS_RO);
@@ -782,17 +823,12 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
 	return ret;
 }
 
-static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
-				     void *addr, unsigned size)
+static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
+					  void *addr, unsigned int size,
+					  int type)
 {
 	int ret;
 
-	/* This function should be called after iotlb
-	 * prefetch, which means we're sure that vq
-	 * could be access through iotlb. So -EAGAIN should
-	 * not happen in this case.
-	 */
-	/* TODO: more fast path */
 	ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
 			     ARRAY_SIZE(vq->iotlb_iov),
 			     VHOST_ACCESS_RO);
@@ -813,14 +849,32 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	return vq->iotlb_iov[0].iov_base;
 }
 
-#define vhost_put_user(vq, x, ptr) \
+/* This function should be called after iotlb
+ * prefetch, which means we're sure that vq
+ * could be access through iotlb. So -EAGAIN should
+ * not happen in this case.
+ */
+static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
+					    void *addr, unsigned int size,
+					    int type)
+{
+	void __user *uaddr = vhost_vq_meta_fetch(vq,
+			     (u64)(uintptr_t)addr, size, type);
+	if (uaddr)
+		return uaddr;
+
+	return __vhost_get_user_slow(vq, addr, size, type);
+}
+
+#define vhost_put_user(vq, x, ptr)		\
 ({ \
 	int ret = -EFAULT; \
 	if (!vq->iotlb) { \
 		ret = __put_user(x, ptr); \
 	} else { \
 		__typeof__(ptr) to = \
-			(__typeof__(ptr)) __vhost_get_user(vq, ptr, sizeof(*ptr)); \
+			(__typeof__(ptr)) __vhost_get_user(vq, ptr,	\
+					  sizeof(*ptr), VHOST_ADDR_USED); \
 		if (to != NULL) \
 			ret = __put_user(x, to); \
 		else \
@@ -829,14 +883,16 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	ret; \
 })
 
-#define vhost_get_user(vq, x, ptr) \
+#define vhost_get_user(vq, x, ptr, type)		\
 ({ \
 	int ret; \
 	if (!vq->iotlb) { \
 		ret = __get_user(x, ptr); \
 	} else { \
 		__typeof__(ptr) from = \
-			(__typeof__(ptr)) __vhost_get_user(vq, ptr, sizeof(*ptr)); \
+			(__typeof__(ptr)) __vhost_get_user(vq, ptr, \
+							   sizeof(*ptr), \
+							   type); \
 		if (from != NULL) \
 			ret = __get_user(x, from); \
 		else \
@@ -845,6 +901,12 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	ret; \
 })
 
+#define vhost_get_avail(vq, x, ptr) \
+	vhost_get_user(vq, x, ptr, VHOST_ADDR_AVAIL)
+
+#define vhost_get_used(vq, x, ptr) \
+	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
+
 static void vhost_dev_lock_vqs(struct vhost_dev *d)
 {
 	int i = 0;
@@ -950,6 +1012,7 @@ int vhost_process_iotlb_msg(struct vhost_dev *dev,
 			ret = -EFAULT;
 			break;
 		}
+		vhost_vq_meta_reset(dev);
 		if (vhost_new_umem_range(dev->iotlb, msg->iova, msg->size,
 					 msg->iova + msg->size - 1,
 					 msg->uaddr, msg->perm)) {
@@ -959,6 +1022,7 @@ int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		vhost_iotlb_notify_vq(dev, msg);
 		break;
 	case VHOST_IOTLB_INVALIDATE:
+		vhost_vq_meta_reset(dev);
 		vhost_del_umem_range(dev->iotlb, msg->iova,
 				     msg->iova + msg->size - 1);
 		break;
@@ -1102,12 +1166,26 @@ static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
 			sizeof *used + num * sizeof *used->ring + s);
 }
 
+static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
+				 const struct vhost_umem_node *node,
+				 int type)
+{
+	int access = (type == VHOST_ADDR_USED) ?
+		     VHOST_ACCESS_WO : VHOST_ACCESS_RO;
+
+	if (likely(node->perm & access))
+		vq->meta_iotlb[type] = node;
+}
+
 static int iotlb_access_ok(struct vhost_virtqueue *vq,
-			   int access, u64 addr, u64 len)
+			   int access, u64 addr, u64 len, int type)
 {
 	const struct vhost_umem_node *node;
 	struct vhost_umem *umem = vq->iotlb;
-	u64 s = 0, size;
+	u64 s = 0, size, orig_addr = addr;
+
+	if (vhost_vq_meta_fetch(vq, addr, len, type))
+		return true;
 
 	while (len > s) {
 		node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
@@ -1124,6 +1202,10 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
 		}
 
 		size = node->size - addr + node->start;
+
+		if (orig_addr == addr && size >= len)
+			vhost_vq_meta_update(vq, node, type);
+
 		s += size;
 		addr += size;
 	}
@@ -1140,13 +1222,15 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
 		return 1;
 
 	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
-			       num * sizeof *vq->desc) &&
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
 	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
 			       sizeof *vq->avail +
-			       num * sizeof *vq->avail->ring + s) &&
+			       num * sizeof(*vq->avail->ring) + s,
+			       VHOST_ADDR_AVAIL) &&
 	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->used,
 			       sizeof *vq->used +
-			       num * sizeof *vq->used->ring + s);
+			       num * sizeof(*vq->used->ring) + s,
+			       VHOST_ADDR_USED);
 }
 EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
 
@@ -1729,7 +1813,7 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 		r = -EFAULT;
 		goto err;
 	}
-	r = vhost_get_user(vq, last_used_idx, &vq->used->idx);
+	r = vhost_get_used(vq, last_used_idx, &vq->used->idx);
 	if (r) {
 		vq_err(vq, "Can't access used idx at %p\n",
 		       &vq->used->idx);
@@ -1932,7 +2016,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	/* Check it isn't doing very strange things with descriptor numbers. */
 	last_avail_idx = vq->last_avail_idx;
-	if (unlikely(vhost_get_user(vq, avail_idx, &vq->avail->idx))) {
+	if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
 		vq_err(vq, "Failed to access avail idx at %p\n",
 		       &vq->avail->idx);
 		return -EFAULT;
@@ -1954,7 +2038,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	/* Grab the next descriptor number they're advertising, and increment
 	 * the index we've seen. */
-	if (unlikely(vhost_get_user(vq, ring_head,
+	if (unlikely(vhost_get_avail(vq, ring_head,
 		     &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
 		vq_err(vq, "Failed to read head: idx %d address %p\n",
 		       last_avail_idx,
@@ -2170,7 +2254,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
 		__virtio16 flags;
-		if (vhost_get_user(vq, flags, &vq->avail->flags)) {
+		if (vhost_get_avail(vq, flags, &vq->avail->flags)) {
 			vq_err(vq, "Failed to get flags");
 			return true;
 		}
@@ -2184,7 +2268,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	if (unlikely(!v))
 		return true;
 
-	if (vhost_get_user(vq, event, vhost_used_event(vq))) {
+	if (vhost_get_avail(vq, event, vhost_used_event(vq))) {
 		vq_err(vq, "Failed to get used event idx");
 		return true;
 	}
@@ -2226,7 +2310,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__virtio16 avail_idx;
 	int r;
 
-	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
+	r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
 	if (r)
 		return false;
 
@@ -2261,7 +2345,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	/* They could have slipped one in as we were doing that: make
 	 * sure it's written, then check again. */
 	smp_mb();
-	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
+	r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
 	if (r) {
 		vq_err(vq, "Failed to check avail idx at %p: %d\n",
 		       &vq->avail->idx, r);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 78f3c5f..034ea18 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -76,6 +76,13 @@ struct vhost_umem {
 	int numem;
 };
 
+enum vhost_uaddr_type {
+	VHOST_ADDR_DESC = 0,
+	VHOST_ADDR_AVAIL = 1,
+	VHOST_ADDR_USED = 2,
+	VHOST_NUM_ADDRS = 3,
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -86,6 +93,7 @@ struct vhost_virtqueue {
 	struct vring_desc __user *desc;
 	struct vring_avail __user *avail;
 	struct vring_used __user *used;
+	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
 	struct file *kick;
 	struct file *call;
 	struct file *error;
-- 
2.7.4

^ permalink raw reply related

* RE: [PATCH 1/3] siphash: add cryptographically secure hashtable function
From: David Laight @ 2016-12-14  9:56 UTC (permalink / raw)
  To: 'Jason A. Donenfeld', Netdev, David Miller,
	Linus Torvalds, kernel-hardening@lists.openwall.com, LKML,
	George Spelvin, Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH,
	Eric Biggers
  Cc: Jean-Philippe Aumasson, Daniel J . Bernstein
In-Reply-To: <20161214001656.19388-1-Jason@zx2c4.com>

From: Jason A. Donenfeld
> Sent: 14 December 2016 00:17
> SipHash is a 64-bit keyed hash function that is actually a
> cryptographically secure PRF, like HMAC. Except SipHash is super fast,
> and is meant to be used as a hashtable keyed lookup function.
> 
> SipHash isn't just some new trendy hash function. It's been around for a
> while, and there really isn't anything that comes remotely close to
> being useful in the way SipHash is. With that said, why do we need this?
> 
> There are a variety of attacks known as "hashtable poisoning" in which an
> attacker forms some data such that the hash of that data will be the
> same, and then preceeds to fill up all entries of a hashbucket. This is
> a realistic and well-known denial-of-service vector.
> 
> Linux developers already seem to be aware that this is an issue, and
> various places that use hash tables in, say, a network context, use a
> non-cryptographically secure function (usually jhash) and then try to
> twiddle with the key on a time basis (or in many cases just do nothing
> and hope that nobody notices). While this is an admirable attempt at
> solving the problem, it doesn't actually fix it. SipHash fixes it.
> 
> (It fixes it in such a sound way that you could even build a stream
> cipher out of SipHash that would resist the modern cryptanalysis.)
> 
> There are a modicum of places in the kernel that are vulnerable to
> hashtable poisoning attacks, either via userspace vectors or network
> vectors, and there's not a reliable mechanism inside the kernel at the
> moment to fix it. The first step toward fixing these issues is actually
> getting a secure primitive into the kernel for developers to use. Then
> we can, bit by bit, port things over to it as deemed appropriate.
> 
> Dozens of languages are already using this internally for their hash
> tables. Some of the BSDs already use this in their kernels. SipHash is
> a widely known high-speed solution to a widely known problem, and it's
> time we catch-up.
...
> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
...
> +	u64 k0 = get_unaligned_le64(key);
> +	u64 k1 = get_unaligned_le64(key + sizeof(u64));
...
> +		m = get_unaligned_le64(data);

All these unaligned accesses are going to get expensive on architectures
like sparc64.

	David

^ permalink raw reply

* dsa: handling more than 1 cpu port
From: John Crispin @ 2016-12-14 10:01 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev@vger.kernel.org

Hi Andrew,

switches supported by qca8k have 2 gmacs that we can wire an external
mii interface to. Usually this is used to wire 2 of the SoCs MACs to the
switch. Thw switch itself is aware of one of the MACs being the cpu port
and expects this to be port/mac0. Using the other will break the
hardware offloading features. The original series from Matthieu added a
feature to configure the switch in a pass-through mode [1]. This
resulted in us having to define the pass-through inside the dts which
means that we loose userland configurability. Assume the setup as
follows ...

port0 - cpu port wired to SoCs MAC0
port1-4 - the lan ports
port5 - the wan port
port6 - wired to the SoCs MAC1

What i have done now is bring up one bridge for port1-4 and a second one
for port5-6. Once setup I can pass traffic on the SoCs MAC1 and it will
flow via port6 and egress on port5. So far so good, however due to the
way the port based vlans are setup and how the bridge_join/leave() logic
works, port5/6 will also fwd traffic to the cpu port. the driver has now
to tell that we are trunking traffic on eth1 via port6. also the MII
mode is not known to the driver. Adding some hackish register writes
will make this work nicely. My proposed way of fixing this cleanly in an
upstream friendly way would be as follows

1) add an extra dsa,ethernet property to the 2nd MII port

dsa@0 {
        compatible = "qca,ar8xxx";

        dsa,ethernet = <&gmac1>;

	[...]

        switch@0 {
		[...]

                port@5 {
                        reg = <5>;
                        label = "wan";
                        phy-handle = <&phy_port5>;
                };

                port@6 {
                        reg = <6>;
                        label = "gmac2";

                        dsa,ethernet = <&gmac2>;
                        fixed-link {
                                speed = <1000>;
                                full-duplex;
                        };
                };
        };
};

2) fix up the port_bridge_join/leave() logic such that if a port is
present in the bridge that has a reference to a ethernet interface it
will remove all ports in the bridge from the port based vlan of the
actual cpu port.

3) in case of this switch we also need to fiddle with the bcast/mcast
flooding registers

would this be ok and would it be better to probe the extra dsa_ethernet
inside the subsystem or the driver ? i was considering to do add a
dsa_is_trunk_port() or similar to achieve this.

	John




[1] https://patchwork.ozlabs.org/patch/477525/

^ permalink raw reply

* Hi
From: mrsnicole123 @ 2016-12-14  9:49 UTC (permalink / raw)


I am Mrs Nicole Marois, i have a pending project of fulfillment to put
in your hand, i will need your support to make this dream come through,
could you let me know your interest to enable me give you further
information, and I hereby advice that you send the below mentioned
information

I decided to will/donate the sum of $4.5 Million US Dollars to you for 
the
good work of God, and also to help the motherless and less privilege
and also forth assistance of the widows. At the moment I cannot take
any telephone calls right now due to the fact that my relatives (that
have squandered the funds agave them for this purpose before) are
around me and my health status also. I have adjusted my will and my
lawyer is aware.

I have willed those properties to you by quoting my personal file
routing and account information. And I have also notified the bank
that I am willing that properties to you for a good, effective and
prudent work. I know I don't know you but I have been directed to do
this by God.ok Please contact this woman for more details you might
not get me on line in time contact this email if you need more 
information 
ok

Email: mrsnicole2marios@gmail.com


Yours fairly friend,
Mrs Nicole Benoite Marois. 

^ permalink raw reply

* Re: sctp: suspicious rcu_dereference_check() usage in sctp_epaddr_lookup_transport
From: Xin Long @ 2016-12-14 10:04 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Dmitry Vyukov, Vladislav Yasevich, Neil Horman, David Miller,
	linux-sctp, netdev, LKML, Eric Dumazet, syzkaller
In-Reply-To: <20161213213730.GA4731@localhost.localdomain>

On Wed, Dec 14, 2016 at 5:37 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Tue, Dec 13, 2016 at 07:07:01PM +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> I am getting the following reports while running syzkaller fuzzer:
>>
>> [ INFO: suspicious RCU usage. ]
>> 4.9.0+ #85 Not tainted
>> -------------------------------
>> ./include/linux/rhashtable.h:572 suspicious rcu_dereference_check() usage!
>>
>> other info that might help us debug this:
>>
>> rcu_scheduler_active = 1, debug_locks = 0
>> 1 lock held by syz-executor1/18023:
>>  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<     inline     >] lock_sock
>> include/net/sock.h:1454
>>  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff87bb3ccf>]
>> sctp_getsockopt+0x45f/0x6800 net/sctp/socket.c:6432
>>
>> stack backtrace:
>> CPU: 2 PID: 18023 Comm: syz-executor1 Not tainted 4.9.0+ #85
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>> [<     inline     >] __dump_stack lib/dump_stack.c:15
>> [<        none        >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>> [<        none        >] lockdep_rcu_suspicious+0x139/0x180
>> kernel/locking/lockdep.c:4448
>> [<     inline     >] __rhashtable_lookup ./include/linux/rhashtable.h:572
>> [<     inline     >] rhltable_lookup ./include/linux/rhashtable.h:660
>> [<        none        >] sctp_epaddr_lookup_transport+0x641/0x930
>> net/sctp/input.c:946
>
> I think this was introduced in the rhlist converstion. We had removed
> some rcu_read_lock() calls on sctp stack because rhashtable was already
> calling it, but then we didn't add them back when moving to rhlist.
>
> This code:
> +/* return a transport without holding it, as it's only used under sock lock */
>  struct sctp_transport *sctp_epaddr_lookup_transport(
>                                 const struct sctp_endpoint *ep,
>                                 const union sctp_addr *paddr)
>  {
>         struct net *net = sock_net(ep->base.sk);
> +       struct rhlist_head *tmp, *list;
> +       struct sctp_transport *t;
>         struct sctp_hash_cmp_arg arg = {
> -               .ep    = ep,
>                 .paddr = paddr,
>                 .net   = net,
> +               .lport = htons(ep->base.bind_addr.port),
>         };
>
> -       return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
> -                                     sctp_hash_params);
> +       list = rhltable_lookup(&sctp_transport_hashtable, &arg,
> +                              sctp_hash_params);
>
> Had an implicit rcu_read_lock() on rhashtable_lookup_fast, but it
> doesn't on rhltable_lookup and rhltable_lookup uses _rcu calls which
> assumes we have rcu read protection.

You're right, we need to call rcu_read_lock before using rhltable_lookup.
will post a fix for it, thanks.

^ permalink raw reply

* Re: dsa: handling more than 1 cpu port
From: Andrew Lunn @ 2016-12-14 10:31 UTC (permalink / raw)
  To: John Crispin; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <1671ae82-c213-7611-584f-02a3b1d5dff9@phrozen.org>

On Wed, Dec 14, 2016 at 11:01:54AM +0100, John Crispin wrote:
> Hi Andrew,
> 
> switches supported by qca8k have 2 gmacs that we can wire an external
> mii interface to. Usually this is used to wire 2 of the SoCs MACs to the
> switch. Thw switch itself is aware of one of the MACs being the cpu port
> and expects this to be port/mac0. Using the other will break the
> hardware offloading features.

Just to be sure here. There is no way to use the second port connected
to the CPU as a CPU port?

The Marvell chips do allow this. So i developed a proof of concept
which had a mapping between cpu ports and slave ports. slave port X
should you cpu port y for its traffic. This never got past proof of
concept. 

If this can be made to work for qca8k, i would prefer having this
general concept, than specific hacks for pass through.

	Andrew

^ permalink raw reply

* Re: dsa: handling more than 1 cpu port
From: John Crispin @ 2016-12-14 10:35 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <20161214103153.GC27370@lunn.ch>



On 14/12/2016 11:31, Andrew Lunn wrote:
> On Wed, Dec 14, 2016 at 11:01:54AM +0100, John Crispin wrote:
>> Hi Andrew,
>>
>> switches supported by qca8k have 2 gmacs that we can wire an external
>> mii interface to. Usually this is used to wire 2 of the SoCs MACs to the
>> switch. Thw switch itself is aware of one of the MACs being the cpu port
>> and expects this to be port/mac0. Using the other will break the
>> hardware offloading features.
> 
> Just to be sure here. There is no way to use the second port connected
> to the CPU as a CPU port?

both macs are considered cpu ports and both allow for the tag to be
injected. for HW NAT/routing/... offloading to work, the lan ports neet
to trunk via port0 and not port6 however.

> 
> The Marvell chips do allow this. So i developed a proof of concept
> which had a mapping between cpu ports and slave ports. slave port X
> should you cpu port y for its traffic. This never got past proof of
> concept. 
> 
> If this can be made to work for qca8k, i would prefer having this
> general concept, than specific hacks for pass through.

oh cool, can you send those patches my way please ? how do you configure
this from userland ? does the cpu port get its on swdev which i just add
to my lan bridge and then add the 2nd cpu port to the wan bridge ?

	John

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox