Netdev List
* Re: [PATCH v2 06/29] mtd: Add support for reading MTD devices via the nvmem API
From: Boris Brezillon @ 2018-08-19 16:46 UTC
  To: Alban, Srinivas Kandagatla
  Cc: Bartosz Golaszewski, Jonathan Corbet, Sekhar Nori, Kevin Hilman,
	Russell King, Arnd Bergmann, Greg Kroah-Hartman, David Woodhouse,
	Brian Norris, Marek Vasut, Richard Weinberger, Grygorii Strashko,
	David S . Miller, Naren, Mauro Carvalho Chehab, Andrew Morton,
	Lukas Wunner, Dan Carpenter, Florian Fainelli, Ivan 
In-Reply-To: <20180819133106.0420df5f@tock>

On Sun, 19 Aug 2018 13:31:06 +0200
Alban <albeu@free.fr> wrote:

> On Fri, 17 Aug 2018 18:27:20 +0200
> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> 
> > Hi Bartosz,
> > 
> > On Fri, 10 Aug 2018 10:05:03 +0200
> > Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >   
> > > From: Alban Bedel <albeu@free.fr>
> > > 
> > > Allow drivers that use the nvmem API to read data stored on MTD devices.
> > > For this the mtd devices are registered as read-only NVMEM providers.
> > > 
> > > Signed-off-by: Alban Bedel <albeu@free.fr>
> > > [Bartosz:
> > >   - use the managed variant of nvmem_register(),
> > >   - set the nvmem name]
> > > Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>    
> > 
> > What happened to the 2 other patches of Alban's series? I'd really
> > like the DT case to be handled/agreed on in the same patchset, but
> > IIRC, Alban and Srinivas disagreed on how this should be represented.
> > I hope this time we'll come to an agreement, because the MTD <-> NVMEM
> > glue has been floating around for quite some time...  
> 
> These other patches were meant to fix what I consider a fundamental
> flaw in the generic NVMEM bindings, however we couldn't agree on this
> point. Bartosz later contacted me to take over this series and I
> suggested just changing the MTD NVMEM binding to use a compatible
> string on the NVMEM cells, as an alternative way to fix the clash with
> the old-style MTD partition bindings.
> 
> However all this has no impact on the code needed to add NVMEM support
> to MTD, so the above patch didn't change at all.

It does have an impact on the supported binding though.
nvmem->dev.of_node is automatically assigned to mtd->dev.of_node, which
means people will be able to define their NVMEM cells directly under
the MTD device and reference them from other nodes (even if it's not
documented), and as you said, it conflicts with the old MTD partition
bindings. So we'd better agree on this binding before merging this
patch.

I see several options:

1/ provide a way to tell the NVMEM framework not to use parent->of_node
   even if it's != NULL. This way we really don't support defining
   NVMEM cells in the DT, and also don't support referencing the nvmem
   device using a phandle.

2/ define a new binding where all nvmem-cells are placed in an
   "nvmem" subnode (just like we have this "partitions" subnode for
   partitions), and then add a config->of_node field so that the
   nvmem provider can explicitly specify the DT node representing the
   nvmem device. We'll also need to set this field to ERR_PTR(-ENOENT)
   in case this node does not exist so that the nvmem framework knows
   that it should not assign nvmem->dev.of_node to parent->of_node.

3/ only declare partitions as nvmem providers. This would solve the
   problem we have with partitions defined in the DT since
   defining sub-partitions in the DT is not (yet?) supported and
   partition nodes are supposed to be leaf nodes. Still, I'm not a big
   fan of this solution because it will prevent us from supporting
   sub-partitions if we ever want/need to.

4/ Add a ->of_xlate() hook that would be called if present by the
   framework instead of using the default parsing we have right now.

5/ Tell the nvmem framework the name of the subnode containing nvmem
   cell definitions (if NULL that means cells are directly defined
   under the nvmem provider node). We would set it to "nvmem-cells" (or
   whatever you like) for the MTD case.
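
Option 5 can be sketched in a few lines of C. This is a minimal, hypothetical model, not the nvmem API: `struct dt_node`, `nvmem_cells_parent()` and the `cells_subnode` parameter are invented for illustration, and a real implementation would walk device-tree nodes with the of_* helpers rather than a plain array. The idea is that a NULL subnode name keeps the current behaviour, while a non-NULL name that is not found yields no cells instead of falling back to the provider node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a device-tree node. */
struct dt_node {
	const char *name;
	const struct dt_node *children;
	size_t nr_children;
};

/*
 * Return the node under which NVMEM cells should be parsed.
 * cells_subnode == NULL: cells sit directly under the provider node.
 * Otherwise: look for a child with that name; NULL means "no cells
 * defined", so the provider node itself is never used as a fallback.
 */
static const struct dt_node *
nvmem_cells_parent(const struct dt_node *provider, const char *cells_subnode)
{
	size_t i;

	if (!provider)
		return NULL;
	if (!cells_subnode)
		return provider;

	for (i = 0; i < provider->nr_children; i++)
		if (strcmp(provider->children[i].name, cells_subnode) == 0)
			return &provider->children[i];

	return NULL;
}
```

With `cells_subnode` set to "nvmem-cells" for MTD, a sibling "partitions" node is never mistaken for cell definitions.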

There are probably other options (some were proposed by Alban and
Srinivas already), but I'd like to get this sorted out before we merge
this patch.

Alban, Srinivas, any opinion?

* [PATCH net-next v8 7/7] net: vhost: make busyloop_intr more accurate
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The patch uses vhost_has_work_pending() to check whether the
specified handler is scheduled, because in most cases
vhost_has_work() returns true when the other side's handler is
added to the worker list. Use vhost_has_work_pending() instead of
vhost_has_work() when deciding whether to set busyloop_intr.
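
The difference between the two checks can be modelled in user space. This is a simplified sketch under stated assumptions: `struct model_dev` and the `model_*` helpers are hypothetical, a single `unsigned long` replaces the kernel bitmap, and the loop that still breaks on vhost_has_work() is left out; only the decision to set busyloop_intr is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical poll ids: bit 0 = RX handler queued, bit 1 = TX handler queued. */
enum { MODEL_VQ_RX = 0, MODEL_VQ_TX = 1 };

struct model_dev {
	unsigned long work_pending; /* one bit per poll id */
};

/* Like vhost_has_work(): any queued work at all. */
static bool model_has_work(const struct model_dev *d)
{
	return d->work_pending != 0;
}

/* Like vhost_has_work_pending(): work queued for this poll id. */
static bool model_has_work_pending(const struct model_dev *d, int id)
{
	return (d->work_pending >> id) & 1UL;
}

/* Before the patch: any queued work counts as an interruption. */
static bool busyloop_intr_old(const struct model_dev *d)
{
	return model_has_work(d);
}

/* After the patch: only work for the handler being polled for counts. */
static bool busyloop_intr_new(const struct model_dev *d, bool poll_rx)
{
	return model_has_work_pending(d, poll_rx ? MODEL_VQ_RX : MODEL_VQ_TX);
}
```

With only the TX handler queued, the old check reports an interruption on the RX path as well, while the new one does not.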

Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]

TCP_STREAM (netperf):
* Without the patch:  38035.39 Mbps, 3.37 us mean latency
* With the patch:     38409.44 Mbps, 3.34 us mean latency

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 drivers/vhost/net.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index db63ae2..b6939ef 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -487,10 +487,8 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	endtime = busy_clock() + busyloop_timeout;
 
 	while (vhost_can_busy_poll(endtime)) {
-		if (vhost_has_work(&net->dev)) {
-			*busyloop_intr = true;
+		if (vhost_has_work(&net->dev))
 			break;
-		}
 
 		if ((sock_has_rx_data(sock) &&
 		     !vhost_vq_avail_empty(&net->dev, rvq)) ||
@@ -513,6 +511,11 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	    !vhost_has_work_pending(&net->dev, VHOST_NET_VQ_RX))
 		vhost_net_enable_vq(net, rvq);
 
+	if (vhost_has_work_pending(&net->dev,
+				   poll_rx ?
+				   VHOST_NET_VQ_RX: VHOST_NET_VQ_TX))
+		*busyloop_intr = true;
+
 	mutex_unlock(&vq->mutex);
 }
 
-- 
1.8.3.1

* [PATCH net-next v8 6/7] net: vhost: disable rx wakeup during tx busypoll
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

In handle_tx, the busy poll now disables the rx vq with
vhost_net_disable_vq() and re-enables it with vhost_net_enable_vq(),
because the sock is already being polled. This improves performance.

This was suggested by Toshiaki Makita and Jason Wang.

If the rx handler is already scheduled, we do not re-enable the vq,
because it is not necessary. We do not do this in the last 'else'
branch because, if we receive data but cannot queue the rx handler
(the rx vring is full), we still need to enable the vq. Otherwise the
following can happen: the guest receives the data, the vring is no
longer full and the guest could take more data, but the vq is
disabled, so the rx vq cannot be woken up to receive more data.
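
The enable/disable decision can be sketched as a small state model. The names are hypothetical stand-ins: `wakeup_enabled` plays the role of vhost_net_enable_vq()/vhost_net_disable_vq() on the rx vq, and `handler_pending` the role of vhost_has_work_pending(&net->dev, VHOST_NET_VQ_RX). The sketch ignores the vq mutex and the socket polling itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the RX wakeup state during a TX busy poll. */
struct model_rx {
	bool wakeup_enabled;  /* stands in for vhost_net_enable/disable_vq() */
	bool handler_pending; /* stands in for vhost_has_work_pending(RX) */
};

static void tx_busy_poll_begin(struct model_rx *rx)
{
	/* The TX path polls the socket itself, so socket wakeups are redundant. */
	rx->wakeup_enabled = false;
}

static void tx_busy_poll_end(struct model_rx *rx)
{
	/*
	 * If the RX handler is already queued, re-enabling is not needed.
	 * Otherwise re-enable wakeups so that data arriving later (e.g. once
	 * a full RX vring drains) is not missed.
	 */
	if (!rx->handler_pending)
		rx->wakeup_enabled = true;
}
```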

Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]

TCP_STREAM (netperf):
* Without the patch:  37598.20 Mbps, 3.43 us mean latency
* With the patch:     38035.39 Mbps, 3.37 us mean latency

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 drivers/vhost/net.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 23d7ffc..db63ae2 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -480,6 +480,9 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
 				     tvq->busyloop_timeout;
 
+	if (!poll_rx)
+		vhost_net_disable_vq(net, rvq);
+
 	preempt_disable();
 	endtime = busy_clock() + busyloop_timeout;
 
@@ -506,6 +509,10 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	else /* On tx here, sock has no rx data. */
 		vhost_enable_notify(&net->dev, rvq);
 
+	if (!poll_rx &&
+	    !vhost_has_work_pending(&net->dev, VHOST_NET_VQ_RX))
+		vhost_net_enable_vq(net, rvq);
+
 	mutex_unlock(&vq->mutex);
 }
 
-- 
1.8.3.1

* [PATCH net-next v8 5/7] net: vhost: introduce bitmap for vhost_poll
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The bitmap in vhost_dev lets us check whether the
specified poll is scheduled. It will be used by the
next two patches.
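
A user-space sketch of the bookkeeping, assuming a single thread: the patch uses the kernel's DECLARE_BITMAP(), set_bit(), test_bit() and bitmap_zero() together with an llist, which are replaced here by a plain `unsigned long` array and a counter. All `model_*` names are hypothetical, and the sketch says nothing about the memory ordering the real primitives provide.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_VQ 128 /* mirrors VHOST_DEV_MAX_VQ in the patch */
#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)
#define UL_COUNT ((MODEL_MAX_VQ + BITS_PER_UL - 1) / BITS_PER_UL)

struct model_dev {
	unsigned long work_pending[UL_COUNT]; /* like DECLARE_BITMAP() */
	int queued;                           /* stands in for dev->work_list */
};

static void model_set_bit(int nr, unsigned long *map)
{
	map[nr / BITS_PER_UL] |= 1UL << (nr % BITS_PER_UL);
}

static bool model_test_bit(int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_UL] >> (nr % BITS_PER_UL)) & 1UL;
}

/* Like vhost_poll_queue(): mark the poll id, then queue the work. */
static void model_poll_queue(struct model_dev *d, int poll_id)
{
	model_set_bit(poll_id, d->work_pending);
	d->queued++;
}

/* Like vhost_has_work_pending(): list non-empty and this id was queued. */
static bool model_has_work_pending(const struct model_dev *d, int poll_id)
{
	return d->queued > 0 && model_test_bit(poll_id, d->work_pending);
}

/* Like vhost_worker(): take the whole list and clear every bit at once. */
static void model_worker_run(struct model_dev *d)
{
	memset(d->work_pending, 0, sizeof(d->work_pending));
	d->queued = 0;
}
```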

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 drivers/vhost/net.c   | 11 +++++++++--
 drivers/vhost/vhost.c | 17 +++++++++++++++--
 drivers/vhost/vhost.h |  7 ++++++-
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1eff72d..23d7ffc 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1135,8 +1135,15 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_TX,
+			handle_tx_net,
+			VHOST_NET_VQ_TX,
+			EPOLLOUT, dev);
+
+	vhost_poll_init(n->poll + VHOST_NET_VQ_RX,
+			handle_rx_net,
+			VHOST_NET_VQ_RX,
+			EPOLLIN, dev);
 
 	f->private_data = n;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a1c06e7..dc88a60 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -186,7 +186,7 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 
 /* Init poll structure */
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev)
+		     __u8 poll_id, __poll_t mask, struct vhost_dev *dev)
 {
 	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
 	init_poll_funcptr(&poll->table, vhost_poll_func);
@@ -194,6 +194,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 	poll->dev = dev;
 	poll->wqh = NULL;
 
+	poll->poll_id = poll_id;
 	vhost_work_init(&poll->work, fn);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_init);
@@ -276,8 +277,16 @@ bool vhost_has_work(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
+bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id)
+{
+	return !llist_empty(&dev->work_list) &&
+		test_bit(poll_id, dev->work_pending);
+}
+EXPORT_SYMBOL_GPL(vhost_has_work_pending);
+
 void vhost_poll_queue(struct vhost_poll *poll)
 {
+	set_bit(poll->poll_id, poll->dev->work_pending);
 	vhost_work_queue(poll->dev, &poll->work);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
@@ -354,6 +363,7 @@ static int vhost_worker(void *data)
 		if (!node)
 			schedule();
 
+		bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
 		node = llist_reverse_order(node);
 		/* make sure flag is seen after deletion */
 		smp_wmb();
@@ -420,6 +430,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 	struct vhost_virtqueue *vq;
 	int i;
 
+	BUG_ON(nvqs > VHOST_DEV_MAX_VQ);
+
 	dev->vqs = vqs;
 	dev->nvqs = nvqs;
 	mutex_init(&dev->mutex);
@@ -428,6 +440,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->iotlb = NULL;
 	dev->mm = NULL;
 	dev->worker = NULL;
+	bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
 	init_llist_head(&dev->work_list);
 	init_waitqueue_head(&dev->wait);
 	INIT_LIST_HEAD(&dev->read_list);
@@ -445,7 +458,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 		vhost_vq_reset(dev, vq);
 		if (vq->handle_kick)
 			vhost_poll_init(&vq->poll, vq->handle_kick,
-					EPOLLIN, dev);
+					i, EPOLLIN, dev);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 6c844b9..60b6f6d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -30,6 +30,7 @@ struct vhost_poll {
 	wait_queue_head_t        *wqh;
 	wait_queue_entry_t              wait;
 	struct vhost_work	  work;
+	__u8			  poll_id;
 	__poll_t		  mask;
 	struct vhost_dev	 *dev;
 };
@@ -37,9 +38,10 @@ struct vhost_poll {
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 bool vhost_has_work(struct vhost_dev *dev);
+bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev);
+		     __u8 id, __poll_t mask, struct vhost_dev *dev);
 int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
@@ -152,6 +154,8 @@ struct vhost_msg_node {
   struct list_head node;
 };
 
+#define VHOST_DEV_MAX_VQ	128
+
 struct vhost_dev {
 	struct mm_struct *mm;
 	struct mutex mutex;
@@ -159,6 +163,7 @@ struct vhost_dev {
 	int nvqs;
 	struct eventfd_ctx *log_ctx;
 	struct llist_head work_list;
+	DECLARE_BITMAP(work_pending, VHOST_DEV_MAX_VQ);
 	struct task_struct *worker;
 	struct vhost_umem *umem;
 	struct vhost_umem *iotlb;
-- 
1.8.3.1

* [PATCH net-next v8 4/7] net: vhost: add rx busy polling in tx path
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch improves guest receive performance.
On the handle_tx side, we now poll the sock receive queue at the
same time; handle_rx already does the same for the tx vq.

We set poll-us=100us and used netperf to measure throughput
and mean latency. While the tests run, the vhost-net kthread
of that VM is always at 100% CPU. The command is shown below.

Rx performance is greatly improved by this patch. There is no
notable performance change on tx with this series, though. This
patch is useful for bi-directional traffic.

netperf -H IP -t TCP_STREAM -l 20 -- -O "THROUGHPUT, THROUGHPUT_UNITS, MEAN_LATENCY"

Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]

TCP_STREAM:
* Without the patch:  19842.95 Mbps, 6.50 us mean latency
* With the patch:     37598.20 Mbps, 3.43 us mean latency
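
The core of the tx-side change is a time-bounded spin that checks both directions. A minimal user-space sketch, assuming POSIX clock_gettime(): `busy_poll_both()`, the `ready_fn` type and the `countdown` example are hypothetical, the callbacks stand in for the sock_has_rx_data()/vhost_vq_avail_empty() checks, and the vhost_has_work() interruption check is omitted.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type for "is there something to do now?". */
typedef bool (*ready_fn)(void *ctx);

static unsigned long long now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (unsigned long long)ts.tv_sec * 1000000ULL + ts.tv_nsec / 1000;
}

/*
 * Spin for at most budget_us, stopping early when either the RX side
 * or the TX side reports work. Returns true if something became ready
 * before the budget ran out.
 */
static bool busy_poll_both(unsigned long long budget_us,
			   ready_fn rx_ready, ready_fn tx_ready, void *ctx)
{
	unsigned long long end = now_us() + budget_us;

	while (now_us() < end) {
		if (rx_ready(ctx) || tx_ready(ctx))
			return true;
		/* the kernel loop calls cpu_relax() here */
	}
	return false;
}

/* Example context: becomes ready after a fixed number of checks. */
struct countdown { int left; };

static bool countdown_ready(void *ctx)
{
	struct countdown *c = ctx;

	return c->left-- <= 0;
}

static bool never_ready(void *ctx)
{
	(void)ctx;
	return false;
}
```

A zero budget returns immediately without calling either callback, which matches the kernel code only polling when busyloop_timeout is non-zero.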

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 drivers/vhost/net.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 453c061..1eff72d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -510,31 +510,24 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 }
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
-				    struct vhost_net_virtqueue *nvq,
+				    struct vhost_net_virtqueue *tnvq,
 				    unsigned int *out_num, unsigned int *in_num,
 				    bool *busyloop_intr)
 {
-	struct vhost_virtqueue *vq = &nvq->vq;
-	unsigned long uninitialized_var(endtime);
-	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
+	struct vhost_virtqueue *rvq = &rnvq->vq;
+	struct vhost_virtqueue *tvq = &tnvq->vq;
+
+	int r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
 				  out_num, in_num, NULL, NULL);
 
-	if (r == vq->num && vq->busyloop_timeout) {
-		if (!vhost_sock_zcopy(vq->private_data))
-			vhost_net_signal_used(nvq);
-		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-		while (vhost_can_busy_poll(endtime)) {
-			if (vhost_has_work(vq->dev)) {
-				*busyloop_intr = true;
-				break;
-			}
-			if (!vhost_vq_avail_empty(vq->dev, vq))
-				break;
-			cpu_relax();
-		}
-		preempt_enable();
-		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+	if (r == tvq->num && tvq->busyloop_timeout) {
+		if (!vhost_sock_zcopy(tvq->private_data))
+			vhost_net_signal_used(tnvq);
+
+		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false);
+
+		r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
 				      out_num, in_num, NULL, NULL);
 	}
 
-- 
1.8.3.1

* [PATCH net-next v8 3/7] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Factor out the generic busy polling logic so that it can be
used in the tx path by the next patch. With this patch, qemu
can also set a different busyloop_timeout for the rx queue.

To avoid duplicate code, introduce the helper functions:
* sock_has_rx_data (changed from sk_has_rx_data)
* vhost_net_busy_poll_try_queue
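
vhost_net_busy_poll_try_queue() follows the usual "enable notifications, then re-check" pattern, which covers the case where the guest posts buffers just as notifications are re-enabled. A hypothetical single-threaded model: the `model_*` names and the `late_arrivals` field are invented, and `model_enable_notify()` only mimics the return convention of vhost_enable_notify(), which reports whether new buffers are available.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a virtqueue as seen by the busy-poll helper. */
struct model_vq {
	int avail;           /* buffers posted by the guest, not yet used */
	bool notify_enabled; /* guest kicks enabled? */
	bool queued;         /* handler queued on the worker? */
	int late_arrivals;   /* buffers that show up while re-enabling notify */
};

/* Mimics vhost_enable_notify(): returns true if buffers are now available. */
static bool model_enable_notify(struct model_vq *vq)
{
	vq->notify_enabled = true;
	vq->avail += vq->late_arrivals;
	vq->late_arrivals = 0;
	return vq->avail > 0;
}

static void model_try_queue(struct model_vq *vq)
{
	if (vq->avail > 0) {
		vq->queued = true;          /* work already waiting */
	} else if (model_enable_notify(vq)) {
		vq->notify_enabled = false; /* raced: buffers arrived anyway */
		vq->queued = true;
	}
	/* else: notifications stay on and the guest will kick us */
}
```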

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 drivers/vhost/net.c | 111 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 71 insertions(+), 40 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 32c1b52..453c061 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -440,6 +440,75 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
 	nvq->done_idx = 0;
 }
 
+static int sock_has_rx_data(struct socket *sock)
+{
+	if (unlikely(!sock))
+		return 0;
+
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
+	return skb_queue_empty(&sock->sk->sk_receive_queue);
+}
+
+static void vhost_net_busy_poll_try_queue(struct vhost_net *net,
+					  struct vhost_virtqueue *vq)
+{
+	if (!vhost_vq_avail_empty(&net->dev, vq)) {
+		vhost_poll_queue(&vq->poll);
+	} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+		vhost_disable_notify(&net->dev, vq);
+		vhost_poll_queue(&vq->poll);
+	}
+}
+
+static void vhost_net_busy_poll(struct vhost_net *net,
+				struct vhost_virtqueue *rvq,
+				struct vhost_virtqueue *tvq,
+				bool *busyloop_intr,
+				bool poll_rx)
+{
+	unsigned long busyloop_timeout;
+	unsigned long endtime;
+	struct socket *sock;
+	struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
+
+	mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
+	vhost_disable_notify(&net->dev, vq);
+	sock = rvq->private_data;
+
+	busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
+				     tvq->busyloop_timeout;
+
+	preempt_disable();
+	endtime = busy_clock() + busyloop_timeout;
+
+	while (vhost_can_busy_poll(endtime)) {
+		if (vhost_has_work(&net->dev)) {
+			*busyloop_intr = true;
+			break;
+		}
+
+		if ((sock_has_rx_data(sock) &&
+		     !vhost_vq_avail_empty(&net->dev, rvq)) ||
+		    !vhost_vq_avail_empty(&net->dev, tvq))
+			break;
+
+		cpu_relax();
+	}
+
+	preempt_enable();
+
+	if (poll_rx)
+		vhost_net_busy_poll_try_queue(net, tvq);
+	else if (sock_has_rx_data(sock))
+		vhost_net_busy_poll_try_queue(net, rvq);
+	else /* On tx here, sock has no rx data. */
+		vhost_enable_notify(&net->dev, rvq);
+
+	mutex_unlock(&vq->mutex);
+}
+
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_net_virtqueue *nvq,
 				    unsigned int *out_num, unsigned int *in_num,
@@ -753,16 +822,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	return len;
 }
 
-static int sk_has_rx_data(struct sock *sk)
-{
-	struct socket *sock = sk->sk_socket;
-
-	if (sock->ops->peek_len)
-		return sock->ops->peek_len(sock);
-
-	return skb_queue_empty(&sk->sk_receive_queue);
-}
-
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
 				      bool *busyloop_intr)
 {
@@ -770,41 +829,13 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
 	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *rvq = &rnvq->vq;
 	struct vhost_virtqueue *tvq = &tnvq->vq;
-	unsigned long uninitialized_var(endtime);
 	int len = peek_head_len(rnvq, sk);
 
-	if (!len && tvq->busyloop_timeout) {
+	if (!len && rvq->busyloop_timeout) {
 		/* Flush batched heads first */
 		vhost_net_signal_used(rnvq);
 		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&tvq->mutex, VHOST_NET_VQ_TX);
-		vhost_disable_notify(&net->dev, tvq);
-
-		preempt_disable();
-		endtime = busy_clock() + tvq->busyloop_timeout;
-
-		while (vhost_can_busy_poll(endtime)) {
-			if (vhost_has_work(&net->dev)) {
-				*busyloop_intr = true;
-				break;
-			}
-			if ((sk_has_rx_data(sk) &&
-			     !vhost_vq_avail_empty(&net->dev, rvq)) ||
-			    !vhost_vq_avail_empty(&net->dev, tvq))
-				break;
-			cpu_relax();
-		}
-
-		preempt_enable();
-
-		if (!vhost_vq_avail_empty(&net->dev, tvq)) {
-			vhost_poll_queue(&tvq->poll);
-		} else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
-			vhost_disable_notify(&net->dev, tvq);
-			vhost_poll_queue(&tvq->poll);
-		}
-
-		mutex_unlock(&tvq->mutex);
+		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
 
 		len = peek_head_len(rnvq, sk);
 	}
-- 
1.8.3.1

* [PATCH net-next v8 2/7] net: vhost: replace magic number of lock annotation
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested.
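
For reference, the enum in drivers/vhost/net.c defines VHOST_NET_VQ_RX as 0 and VHOST_NET_VQ_TX as 1 (restated below as an assumption about the tree this series applies to), so the replaced literals keep their values. Note that the handle_tx() hunk is not purely cosmetic: plain mutex_lock() uses lockdep subclass 0, so that lock moves from subclass 0 to subclass 1.

```c
#include <assert.h>

/* Assumed to match the enum in drivers/vhost/net.c. */
enum {
	VHOST_NET_VQ_RX = 0,
	VHOST_NET_VQ_TX = 1,
	VHOST_NET_VQ_MAX = 2,
};

/* Lockdep subclasses written as literals before the patch. */
#define OLD_RX_SUBCLASS 0        /* mutex_lock_nested(&vq->mutex, 0) in handle_rx() */
#define OLD_NESTED_TX_SUBCLASS 1 /* mutex_lock_nested(&tvq->mutex, 1) on the rx path */

_Static_assert(VHOST_NET_VQ_RX == OLD_RX_SUBCLASS,
	       "rx subclass unchanged");
_Static_assert(VHOST_NET_VQ_TX == OLD_NESTED_TX_SUBCLASS,
	       "nested tx subclass unchanged");
```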

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 367d802..32c1b52 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -712,7 +712,7 @@ static void handle_tx(struct vhost_net *net)
 	struct vhost_virtqueue *vq = &nvq->vq;
 	struct socket *sock;
 
-	mutex_lock(&vq->mutex);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
@@ -777,7 +777,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
 		/* Flush batched heads first */
 		vhost_net_signal_used(rnvq);
 		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&tvq->mutex, 1);
+		mutex_lock_nested(&tvq->mutex, VHOST_NET_VQ_TX);
 		vhost_disable_notify(&net->dev, tvq);
 
 		preempt_disable();
@@ -919,7 +919,7 @@ static void handle_rx(struct vhost_net *net)
 	__virtio16 num_buffers;
 	int recv_pkts = 0;
 
-	mutex_lock_nested(&vq->mutex, 0);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
-- 
1.8.3.1

* [PATCH net-next v8 1/7] net: vhost: lock the vqs one by one
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Instead of locking all vqs at the same time, lock them
one by one. This will be used by the next patch to avoid
a deadlock.
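
The locking shape after this patch can be sketched with POSIX threads. The names (`struct model_dev`, `model_vq_meta_reset()`) are hypothetical; the sketch only shows that at most one vq mutex is held at any time. The commit message does not spell out the deadlock it prepares for, so none is modelled here.

```c
#include <assert.h>
#include <pthread.h>

#define MODEL_NVQS 2

/* Hypothetical stand-ins for struct vhost_virtqueue / struct vhost_dev. */
struct model_vq {
	pthread_mutex_t mutex;
	int meta_resets;
};

struct model_dev {
	struct model_vq vqs[MODEL_NVQS];
	int nvqs;
};

static void model_dev_init(struct model_dev *d)
{
	d->nvqs = MODEL_NVQS;
	for (int i = 0; i < d->nvqs; i++) {
		pthread_mutex_init(&d->vqs[i].mutex, NULL);
		d->vqs[i].meta_resets = 0;
	}
}

/*
 * Per-vq locking as in the patched vhost_vq_meta_reset(): each vq mutex
 * is taken and released in turn, instead of taking all of them up front
 * as the removed vhost_dev_lock_vqs() did.
 */
static void model_vq_meta_reset(struct model_dev *d)
{
	for (int i = 0; i < d->nvqs; i++) {
		pthread_mutex_lock(&d->vqs[i].mutex);
		d->vqs[i].meta_resets++;
		pthread_mutex_unlock(&d->vqs[i].mutex);
	}
}
```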

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a502f1a..a1c06e7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -294,8 +294,11 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
 		__vhost_vq_meta_reset(d->vqs[i]);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 }
 
 static void vhost_vq_reset(struct vhost_dev *dev,
@@ -890,20 +893,6 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 #define vhost_get_used(vq, x, ptr) \
 	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
 
-static void vhost_dev_lock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_lock_nested(&d->vqs[i]->mutex, i);
-}
-
-static void vhost_dev_unlock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_unlock(&d->vqs[i]->mutex);
-}
-
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
 				u64 userspace_addr, int perm)
@@ -953,7 +942,10 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
 		if (msg->iova <= vq_msg->iova &&
 		    msg->iova + msg->size - 1 > vq_msg->iova &&
 		    vq_msg->type == VHOST_IOTLB_MISS) {
+			mutex_lock(&node->vq->mutex);
 			vhost_poll_queue(&node->vq->poll);
+			mutex_unlock(&node->vq->mutex);
+
 			list_del(&node->node);
 			kfree(node);
 		}
@@ -985,7 +977,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 	int ret = 0;
 
 	mutex_lock(&dev->mutex);
-	vhost_dev_lock_vqs(dev);
 	switch (msg->type) {
 	case VHOST_IOTLB_UPDATE:
 		if (!dev->iotlb) {
@@ -1019,7 +1010,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		break;
 	}
 
-	vhost_dev_unlock_vqs(dev);
 	mutex_unlock(&dev->mutex);
 
 	return ret;
-- 
1.8.3.1

* Re: [PATCH v2 06/29] mtd: Add support for reading MTD devices via the nvmem API
From: Alban @ 2018-08-19 11:31 UTC
  To: Boris Brezillon
  Cc: Aban Bedel, Bartosz Golaszewski, Jonathan Corbet, Sekhar Nori,
	Kevin Hilman, Russell King, Arnd Bergmann, Greg Kroah-Hartman,
	David Woodhouse, Brian Norris, Marek Vasut, Richard Weinberger,
	Grygorii Strashko, David S . Miller, Srinivas Kandagatla, Naren,
	Mauro Carvalho Chehab, Andrew Morton, Lukas Wunner,
	Dan Carpenter <dan.c
In-Reply-To: <20180817182720.6a6e5e8e@bbrezillon>

On Fri, 17 Aug 2018 18:27:20 +0200
Boris Brezillon <boris.brezillon@bootlin.com> wrote:

> Hi Bartosz,
> 
> On Fri, 10 Aug 2018 10:05:03 +0200
> Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> 
> > From: Alban Bedel <albeu@free.fr>
> > 
> > Allow drivers that use the nvmem API to read data stored on MTD devices.
> > For this the mtd devices are registered as read-only NVMEM providers.
> > 
> > Signed-off-by: Alban Bedel <albeu@free.fr>
> > [Bartosz:
> >   - use the managed variant of nvmem_register(),
> >   - set the nvmem name]
> > Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>  
> 
> What happened to the 2 other patches of Alban's series? I'd really
> like the DT case to be handled/agreed on in the same patchset, but
> IIRC, Alban and Srinivas disagreed on how this should be represented.
> I hope this time we'll come to an agreement, because the MTD <-> NVMEM
> glue has been floating around for quite some time...

These other patches were meant to fix what I consider a fundamental
flaw in the generic NVMEM bindings, however we couldn't agree on this
point. Bartosz later contacted me to take over this series and I
suggested just changing the MTD NVMEM binding to use a compatible
string on the NVMEM cells, as an alternative way to fix the clash with
the old-style MTD partition bindings.

However all this has no impact on the code needed to add NVMEM support
to MTD, so the above patch didn't change at all.

Alban

* Hi sweetie!!!
From: Wesley @ 2018-08-19 10:25 UTC


How are you feeling today? I hope everything is fine; I am glad to meet you. Anyway, I am Wesley from the United States of America, currently in Syria on a peacekeeping mission. I would like to get to know you better, if I may be so bold. I consider myself an easygoing man, and I am currently looking for a relationship in which I feel loved. Please forgive my manners; I am not good with the Internet, because it is not my field. In Syria we are not allowed to go out, which makes it very boring for me, so I think I need a friend to talk to and keep me company!

* [PATCH net-next v8 0/7] net: vhost: improve performance when enable busyloop
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC
  To: jasowang, mst, makita.toshiaki; +Cc: netdev, virtualization

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

These patches improve guest receive performance.
On the handle_tx side, we poll the sock receive queue
at the same time; handle_rx already does the same.

For more performance reports, see patches 4, 6 and 7.

Tonghao Zhang (7):
  net: vhost: lock the vqs one by one
  net: vhost: replace magic number of lock annotation
  net: vhost: factor out busy polling logic to vhost_net_busy_poll()
  net: vhost: add rx busy polling in tx path
  net: vhost: introduce bitmap for vhost_poll
  net: vhost: disable rx wakeup during tx busypoll
  net: vhost: make busyloop_intr more accurate

 drivers/vhost/net.c   | 169 +++++++++++++++++++++++++++++++-------------------
 drivers/vhost/vhost.c |  41 ++++++------
 drivers/vhost/vhost.h |   7 ++-
 3 files changed, 133 insertions(+), 84 deletions(-)

-- 
1.8.3.1

* Re: how to (cross)connect two (physical) eth ports for ping test?
From: Robert P. J. Day @ 2018-08-19  8:29 UTC
  To: Willy Tarreau; +Cc: Andrew Lunn, Linux kernel netdev mailing list
In-Reply-To: <20180818204520.GC8729@1wt.eu>

On Sat, 18 Aug 2018, Willy Tarreau wrote:

> On Sat, Aug 18, 2018 at 09:10:25PM +0200, Andrew Lunn wrote:
> > On Sat, Aug 18, 2018 at 01:39:50PM -0400, Robert P. J. Day wrote:
> > >
> > >   (i'm sure this has been explained many times before, so a link
> > > covering this will almost certainly do just fine.)
> > >
> > >   i want to loop one physical ethernet port into another, and just
> > > ping the daylights from one to the other for stress testing. my fedora
> > > laptop doesn't actually have two unused ethernet ports, so i just want
> > > to emulate this by slapping a couple startech USB/net adapters into
> > > two empty USB ports, setting this up, then doing it all over again
> > > monday morning on the actual target system, which does have multiple
> > > ethernet ports.
> > >
> > >   so if someone can point me to the recipe, that would be great and
> > > you can stop reading.
> > >
> > >   as far as my tentative solution goes, i assume i need to put at
> > > least one of the physical ports in a network namespace via "ip netns",
> > > then ping from the netns to the root namespace. or, going one step
> > > further, perhaps putting both interfaces into two new namespaces, and
> > > setting up forwarding.
> >
> > Namespaces is a good solution. Something like this should work:
> >
> > ip netns add namespace1
> > ip netns add namespace2
> >
> > ip link set eth1 netns namespace1
> > ip link set eth2 netns namespace2
> >
> > ip netns exec namespace1 \
> >         ip addr add 10.42.42.42/24 dev eth1
> >
> > ip netns exec namespace1 \
> >         ip link set eth1 up
> >
> > ip netns exec namespace2 \
> >         ip addr add 10.42.42.24/24 dev eth2
> >
> > ip netns exec namespace2 \
> >         ip link set eth2 up
> >
> > ip netns exec namespace1 \
> >         ping 10.42.42.24
> >
> > You might also want to consider iperf3 for stress testing, depending
> > on the sort of stress you need.
>
> FWIW I have a setup somewhere involving ip rule + ip route which
> achieves the same without involving namespaces. It's a bit hackish
> but sometimes convenient. I can dig if someone is interested.

  sure, i'm interested ... always educational to see different
solutions.

rday

-- 

========================================================================
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                  http://crashcourse.ca/dokuwiki

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================

^ permalink raw reply

* [PATCH 2/2] ip6_vti: fix creating fallback tunnel device for vti6
From: Haishuang Yan @ 2018-08-19  7:05 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller, Alexey Kuznetsov
  Cc: netdev, linux-kernel, Haishuang Yan
In-Reply-To: <1534662305-16734-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

When fb_tunnels_only_for_init_net is set to 1, don't create the fallback
tunnel device for vti6 when a new namespace is created.

Tested:
[root@builder2 ~]# modprobe ip6_tunnel
[root@builder2 ~]# modprobe ip6_vti
[root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
[root@builder2 ~]# unshare -n
[root@builder2 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv6/ip6_vti.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index c72ae3a..3b9f39f 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -1114,6 +1114,8 @@ static int __net_init vti6_init_net(struct net *net)
 	ip6n->tnls[0] = ip6n->tnls_wc;
 	ip6n->tnls[1] = ip6n->tnls_r_l;
 
+	if (!net_has_fallback_tunnels(net))
+		return 0;
 	err = -ENOMEM;
 	ip6n->fb_tnl_dev = alloc_netdev(sizeof(struct ip6_tnl), "ip6_vti0",
 					NET_NAME_UNKNOWN, vti6_dev_setup);
-- 
1.8.3.1
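
The following is a userspace model of the check this patch adds: the
kernel's net_has_fallback_tunnels() helper and the early return in
vti6_init_net(). struct net, init_net and the sysctl variable are
simplified stand-ins, and vti6_init_net_model() is a hypothetical name.
The real helper also returns true when CONFIG_SYSCTL is disabled, and
this sketch leaves that case out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for struct net and the
 * net.core.fb_tunnels_only_for_init_net sysctl. */
struct net {
	const char *name;
};

static struct net init_net = { "init_net" };
static int sysctl_fb_tunnels_only_for_init_net;

/* Models the helper: init_net always gets fallback tunnels; other
 * namespaces get them only while the sysctl is 0. */
static bool net_has_fallback_tunnels(const struct net *net)
{
	return net == &init_net || !sysctl_fb_tunnels_only_for_init_net;
}

/* Hypothetical model of vti6_init_net(): returns 1 if it would go on
 * to allocate the "ip6_vti0" fallback device, 0 if it returns early. */
static int vti6_init_net_model(const struct net *net)
{
	if (!net_has_fallback_tunnels(net))
		return 0;	/* skip alloc_netdev() entirely */
	return 1;
}
```

With the sysctl at 0, every namespace gets an ip6_vti0 fallback device.
With the sysctl at 1, only init_net gets one, which matches the "Tested:"
output above, where a fresh namespace contains only lo.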

^ permalink raw reply related

* [PATCH 1/2] ip_vti: fix a null pointer dereference when creating the vti fallback tunnel
From: Haishuang Yan @ 2018-08-19  7:05 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller, Alexey Kuznetsov
  Cc: netdev, linux-kernel, Haishuang Yan, Eric Dumazet

After fb_tunnels_only_for_init_net is set to 1, itn->fb_tunnel_dev is
NULL in newly created namespaces, which causes the following crash:

[ 2742.849298] BUG: unable to handle kernel NULL pointer dereference at 0000000000000941
[ 2742.851380] PGD 800000042c21a067 P4D 800000042c21a067 PUD 42aaed067 PMD 0
[ 2742.852818] Oops: 0002 [#1] SMP PTI
[ 2742.853570] CPU: 7 PID: 2484 Comm: unshare Kdump: loaded Not tainted 4.18.0-rc8+ #2
[ 2742.855163] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2742.856970] RIP: 0010:vti_init_net+0x3a/0x50 [ip_vti]
[ 2742.858034] Code: 90 83 c0 48 c7 c2 20 a1 83 c0 48 89 fb e8 6e 3b f6 ff 85 c0 75 22 8b 0d f4 19 00 00 48 8b 93 00 14 00 00 48 8b 14 ca 48 8b 12 <c6> 82 41 09 00 00 04 c6 82 38 09 00 00 45 5b c3 66 0f 1f 44 00 00
[ 2742.861940] RSP: 0018:ffff9be28207fde0 EFLAGS: 00010246
[ 2742.863044] RAX: 0000000000000000 RBX: ffff8a71ebed4980 RCX: 0000000000000013
[ 2742.864540] RDX: 0000000000000000 RSI: 0000000000000013 RDI: ffff8a71ebed4980
[ 2742.866020] RBP: ffff8a71ea717000 R08: ffffffffc083903c R09: ffff8a71ea717000
[ 2742.867505] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a71ebed4980
[ 2742.868987] R13: 0000000000000013 R14: ffff8a71ea5b49c0 R15: 0000000000000000
[ 2742.870473] FS:  00007f02266c9740(0000) GS:ffff8a71ffdc0000(0000) knlGS:0000000000000000
[ 2742.872143] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2742.873340] CR2: 0000000000000941 CR3: 000000042bc20006 CR4: 00000000001606e0
[ 2742.874821] Call Trace:
[ 2742.875358]  ops_init+0x38/0xf0
[ 2742.876078]  setup_net+0xd9/0x1f0
[ 2742.876789]  copy_net_ns+0xb7/0x130
[ 2742.877538]  create_new_namespaces+0x11a/0x1d0
[ 2742.878525]  unshare_nsproxy_namespaces+0x55/0xa0
[ 2742.879526]  ksys_unshare+0x1a7/0x330
[ 2742.880313]  __x64_sys_unshare+0xe/0x20
[ 2742.881131]  do_syscall_64+0x5b/0x180
[ 2742.881933]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reproduce:
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
modprobe ip_vti
unshare -n

Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv4/ip_vti.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 3f091cc..f38cb21 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -438,7 +438,8 @@ static int __net_init vti_init_net(struct net *net)
 	if (err)
 		return err;
 	itn = net_generic(net, vti_net_id);
-	vti_fb_tunnel_init(itn->fb_tunnel_dev);
+	if (itn->fb_tunnel_dev)
+		vti_fb_tunnel_init(itn->fb_tunnel_dev);
 	return 0;
 }
 
-- 
1.8.3.1
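
The failure mode and the fix can be modelled in userspace as follows.
The structs are simplified stand-ins, since the real vti_net and
net_device carry far more state. vti_fb_tunnel_init_model() only mimics
the fact that vti_fb_tunnel_init() writes through its argument without a
NULL check, and the field it sets is hypothetical. The oops above is a
write to a small address (CR2 0000000000000941), which is consistent
with writing through a NULL device pointer plus a field offset.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures involved. */
struct net_device {
	int fb_initialized;	/* hypothetical: marks the writes init does */
};

struct vti_net {
	/* NULL in namespaces created while
	 * fb_tunnels_only_for_init_net = 1 */
	struct net_device *fb_tunnel_dev;
};

static int fb_init_calls;

/* Like vti_fb_tunnel_init(), writes through dev with no NULL check,
 * so calling it with NULL would crash. */
static void vti_fb_tunnel_init_model(struct net_device *dev)
{
	dev->fb_initialized = 1;
	fb_init_calls++;
}

/* Patched logic: initialise the fallback device only if it exists. */
static int vti_init_net_model(struct vti_net *itn)
{
	if (itn->fb_tunnel_dev)
		vti_fb_tunnel_init_model(itn->fb_tunnel_dev);
	return 0;
}
```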

^ permalink raw reply related

* [PATCH][net-next] vxlan: reduce dirty cache line in vxlan_find_mac
From: Li RongQing @ 2018-08-19  3:36 UTC (permalink / raw)
  To: netdev

vxlan_find_mac() unconditionally sets f->used for every packet. This
causes a cache miss for every packet, since the remote, hlist and used
fields of struct vxlan_fdb share the same cache line.

With this change, f->used is written only if it is not already equal
to jiffies. This gives up to a 5% speed-up with small packets.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 drivers/net/vxlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ababba37d735..e5d236595206 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -464,7 +464,7 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 	struct vxlan_fdb *f;
 
 	f = __vxlan_find_mac(vxlan, mac, vni);
-	if (f)
+	if (f && f->used != jiffies)
 		f->used = jiffies;
 
 	return f;
-- 
2.16.2
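
The following userspace C sketch shows the idiom this patch uses: a read
before the write, so that redundant stores are skipped. struct fdb_entry
and touch_used() are hypothetical simplifications of struct vxlan_fdb and
the update in vxlan_find_mac(). The store counter exists only to make the
effect observable. Because jiffies advances once per timer tick, at high
packet rates most lookups within a tick find the field already current
and skip the store.

```c
#include <assert.h>

/* Hypothetical stand-in for struct vxlan_fdb: only the timestamp
 * field matters for this illustration. */
struct fdb_entry {
	unsigned long used;
};

static unsigned long stores;	/* counts actual writes to e->used */

/* Check before writing: a load can leave the cache line shared, while
 * a store dirties it, so store only when the value actually changes. */
static void touch_used(struct fdb_entry *e, unsigned long now)
{
	if (e->used != now) {
		e->used = now;
		stores++;
	}
}
```

The trade-off is an extra load and branch on each lookup. In return, the
code avoids a store that would otherwise invalidate other CPUs' copies of
the cache line on every packet.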

^ permalink raw reply related

* Re: [RFC 0/1] Appletalk AARP probe broken by receipt of own broadcasts.
From: Craig McGeachie @ 2018-08-19  2:09 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David S. Miller, netdev, Craig McGeachie
In-Reply-To: <20180819013244.GA8950@lunn.ch>

On 19/08/18 13:32, Andrew Lunn wrote:
> On Sun, Aug 19, 2018 at 01:07:38PM +1200, Craig McGeachie wrote:
>> I'm hoping I can find someone able and willing to test this patch. That
>> requires someone still using netatalk 2.2.x with DDP, or some other DDP
>> userspace application. This feels like a longshot.
>>
>> When netatalk 2.2.x starts up with DDP and sets the Appletalk node
>> address, the kernel AARP code sends a probe packet for the address. It
>> then receives its own probe packet and interprets that as some other
>> node also trying to claim the address. It increments the address, tries
>> again, and fails again ad nauseam. Eventually the kernel module gives up
>> and returns to netatalk which terminates with an error that it cannot
>> get a node address.
> 
> Hi Craig
> 
> What Ethernet device are you seeing this problem with?
> 
> I'm not sure an Ethernet device should receive its own broadcasts.
> This might be a driver bug, not an AARP bug.
> 
>       Andrew
> 

I run inside VirtualBox with the Realtek PCIe GBE Family Controller.

Assuming I'm reading /sys/class/net/enp0s3/driver correctly, it's using 
the e1000 driver.

However, it might not be the Ethernet driver's fault. I've been a bit
loose with terminology. Appletalk AARP probe packets aren't Ethernet
broadcasts as such; they're multicast packets, sent via the psnap driver
to hardware address 09:00:07:ff:ff:ff.
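
The broadcast/multicast distinction works as follows. In an Ethernet
address, the least significant bit of the first octet (the I/G bit)
marks a group (multicast) address. Broadcast is simply the all-ones group
address. The helpers below are local reimplementations of that rule, in
the spirit of the kernel's is_multicast_ether_addr() and
is_broadcast_ether_addr(), not the kernel functions themselves. 0x09 has
its low bit set, so 09:00:07:ff:ff:ff is multicast but not broadcast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Group (multicast) addresses have the I/G bit, bit 0 of octet 0, set. */
static bool mac_is_multicast(const uint8_t a[6])
{
	return (a[0] & 0x01) != 0;
}

/* Broadcast is the special all-ones group address. */
static bool mac_is_broadcast(const uint8_t a[6])
{
	for (int i = 0; i < 6; i++)
		if (a[i] != 0xff)
			return false;
	return true;
}
```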

Cheers,
Craig.

^ permalink raw reply

* Re: [RFC 0/1] Appletalk AARP probe broken by receipt of own broadcasts.
From: Andrew Lunn @ 2018-08-19  1:32 UTC (permalink / raw)
  To: Craig McGeachie; +Cc: David S. Miller, netdev, Craig McGeachie
In-Reply-To: <20180819010739.26975-1-slapdau@gmail.com>

On Sun, Aug 19, 2018 at 01:07:38PM +1200, Craig McGeachie wrote:
> I'm hoping I can find someone able and willing to test this patch. That
> requires someone still using netatalk 2.2.x with DDP, or some other DDP
> userspace application. This feels like a longshot.
> 
> When netatalk 2.2.x starts up with DDP and sets the Appletalk node
> address, the kernel AARP code sends a probe packet for the address. It
> then receives its own probe packet and interprets that as some other
> node also trying to claim the address. It increments the address, tries
> again, and fails again ad nauseam. Eventually the kernel module gives up
> and returns to netatalk which terminates with an error that it cannot
> get a node address.

Hi Craig

What Ethernet device are you seeing this problem with?

I'm not sure an Ethernet device should receive its own broadcasts.
This might be a driver bug, not an AARP bug.

     Andrew

^ permalink raw reply

* [RFC 1/1] appletalk: ignore aarp probe broadcasts that loopback.
From: Craig McGeachie @ 2018-08-19  1:07 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Craig McGeachie, Craig McGeachie
In-Reply-To: <20180819010739.26975-1-slapdau@gmail.com>

AARP probe packets are broadcast to dynamically discover whether an
Appletalk node address is in use. AARP requires a node that receives a
probe for the address it is testing to treat that address as in use and
try the next one. However, the node also receives its own Ethertalk
broadcast and incorrectly interprets it as another node trying to claim
the address.

Signed-off-by: Craig McGeachie <slapdau@gmail.com>
---
 net/appletalk/aarp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/appletalk/aarp.c b/net/appletalk/aarp.c
index 49a16cee..f966cc01 100644
--- a/net/appletalk/aarp.c
+++ b/net/appletalk/aarp.c
@@ -757,6 +757,10 @@ static int aarp_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (!ifa)
 		goto out1;
 
+	/* Ignore packets from myself. */
+	if (ether_addr_equal(ea->hw_src, dev->dev_addr))
+		goto out1;
+
 	if (ifa->status & ATIF_PROBE &&
 	    ifa->address.s_node == ea->pa_dst_node &&
 	    ifa->address.s_net == ea->pa_dst_net) {
-- 
2.17.1

^ permalink raw reply related

* [RFC 0/1] Appletalk AARP probe broken by receipt of own broadcasts.
From: Craig McGeachie @ 2018-08-19  1:07 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: Craig McGeachie, Craig McGeachie

I'm hoping I can find someone able and willing to test this patch. That
requires someone still using netatalk 2.2.x with DDP, or some other DDP
userspace application. This feels like a longshot.

When netatalk 2.2.x starts up with DDP and sets the Appletalk node
address, the kernel AARP code sends a probe packet for the address. It
then receives its own probe packet and interprets that as some other
node also trying to claim the address. It increments the address, tries
again, and fails again ad nauseam. Eventually the kernel module gives up
and returns to netatalk which terminates with an error that it cannot
get a node address.

Well, most of the time. There seems to be some sort of race condition
where occasionally a self-collision doesn't happen. Restart netatalk
enough times and it will probably work.

The device's Ethernet MAC address is copied into the AARP packet, so the
fix is to disregard all received packets whose sender hardware address
matches the device hardware address. This filters more than just probe
packets, but there is no legitimate situation where an Appletalk node
sends AARP packets to itself.
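
Below is a userspace model of the probe loop and the proposed filter. It
makes several simplifying assumptions: every probe the host sends is
echoed back to it (as observed here), one other host defends a single
address, and a hypothetical MAX_TRIES bounds the retries. struct
aarp_pkt, probe_conflicts() and claim_address() are illustrative names,
not the kernel's. The self check mirrors the patch's
ether_addr_equal(ea->hw_src, dev->dev_addr) test, here done with memcmp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRIES 10	/* hypothetical retry budget */

/* Hypothetical, simplified view of a received AARP packet. */
struct aarp_pkt {
	uint8_t hw_src[6];	/* sender MAC, copied in by the sender */
	uint8_t dst_node;	/* node address the packet is about */
};

/* While probing `tentative`, any packet naming that address is a
 * collision, unless it is our own packet echoed back to us. */
static bool probe_conflicts(const struct aarp_pkt *p,
			    const uint8_t dev_addr[6], uint8_t tentative,
			    bool ignore_self)
{
	if (ignore_self && memcmp(p->hw_src, dev_addr, 6) == 0)
		return false;
	return p->dst_node == tentative;
}

/* Try start, start+1, ... on a segment where our own probe is always
 * echoed back and another host defends `taken`.
 * Returns the claimed node address, or -1 if every try collided. */
static int claim_address(const uint8_t dev_addr[6], uint8_t start,
			 uint8_t taken, bool ignore_self)
{
	static const uint8_t other_mac[6] = { 0x02, 0, 0, 0, 0, 0x01 };

	for (int i = 0; i < MAX_TRIES; i++) {
		uint8_t node = (uint8_t)(start + i);
		struct aarp_pkt echo = { .dst_node = node };
		struct aarp_pkt defense = { .dst_node = taken };

		memcpy(echo.hw_src, dev_addr, 6);
		memcpy(defense.hw_src, other_mac, 6);

		if (probe_conflicts(&echo, dev_addr, node, ignore_self) ||
		    probe_conflicts(&defense, dev_addr, node, ignore_self))
			continue;	/* collision: move to the next address */
		return node;
	}
	return -1;
}
```

Without the self check, every candidate collides with its own echo and
the loop gives up, which matches the reported netatalk failure. With the
check, only the address that another host actually owns is skipped.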

Craig McGeachie (1):
  appletalk: ignore aarp probe broadcasts that loopback.

 net/appletalk/aarp.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.17.1

^ permalink raw reply

* Re: [PATCH ethtool] ethtool: document WoL filters option also in help message
From: John W. Linville @ 2018-08-18 21:35 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Michal Kubecek, netdev
In-Reply-To: <430765a1-adde-a5bc-5c96-213e85d5c093@gmail.com>

On Fri, Aug 17, 2018 at 10:51:31AM -0700, Florian Fainelli wrote:
> On 08/17/2018 06:21 AM, Michal Kubecek wrote:
> > Commit eff0bb337223 ("ethtool: Add support for WAKE_FILTER (WoL using
> > filters)") added option "f" for Wake-on-LAN and documented it in the man
> > page but not in the output of "ethtool --help".
> > 
> > Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> 
> Acked-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Thanks Michal!
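
For reference, each Wake-on-LAN letter in ethtool's wol argument selects
one bit of the wolopts mask. WAKE_FILTER, the bit behind "f", is 1 << 7 in
<linux/ethtool.h>. The parser below is a hypothetical sketch of that
mapping, not ethtool's own code. It covers the letters p, u, m, b, a, g,
s and f, plus d for disable. The bits are redefined locally under WOL_*
names so the sketch builds against headers that predate WAKE_FILTER.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the wolopts bit values from <linux/ethtool.h>. */
#define WOL_PHY		(1u << 0)	/* p */
#define WOL_UCAST	(1u << 1)	/* u */
#define WOL_MCAST	(1u << 2)	/* m */
#define WOL_BCAST	(1u << 3)	/* b */
#define WOL_ARP		(1u << 4)	/* a */
#define WOL_MAGIC	(1u << 5)	/* g */
#define WOL_MAGICSECURE	(1u << 6)	/* s */
#define WOL_FILTER	(1u << 7)	/* f */

/* Hypothetical parser: returns the mask, or -1 on an unknown letter. */
static int64_t parse_wol(const char *s)
{
	uint32_t mask = 0;

	for (; *s; s++) {
		switch (*s) {
		case 'p': mask |= WOL_PHY; break;
		case 'u': mask |= WOL_UCAST; break;
		case 'm': mask |= WOL_MCAST; break;
		case 'b': mask |= WOL_BCAST; break;
		case 'a': mask |= WOL_ARP; break;
		case 'g': mask |= WOL_MAGIC; break;
		case 's': mask |= WOL_MAGICSECURE; break;
		case 'f': mask |= WOL_FILTER; break;
		case 'd': mask = 0; break;	/* disable: clear everything */
		default: return -1;
		}
	}
	return mask;
}
```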

Yes, good eye!

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH v2] net: phy: add tracepoints
From: David Miller @ 2018-08-18 21:01 UTC (permalink / raw)
  To: tobias; +Cc: netdev, andrew, f.fainelli, rostedt, mingo, linux-kernel
In-Reply-To: <20180816152452.7834-1-tobias@waldekranz.com>

From: Tobias Waldekranz <tobias@waldekranz.com>
Date: Thu, 16 Aug 2018 17:24:48 +0200

> Two tracepoints for now:
> 
>    * `phy_interrupt` Pretty self-explanatory.
> 
>    * `phy_state_change` Whenever the PHY's state machine is run, trace
>      the old and the new state.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> v1 -> v2:
>    * Actually include the entire patch, based on current net-next.
>    * Pretty print the PHY's state according to Steven's suggestion.

Please resubmit this when the net-next tree opens back up.

Thank you.
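
The v2 change to pretty-print the PHY state amounts to mapping the state
enum to names before emitting the old and new state. The sketch below
models that in plain C, using a hypothetical subset of states and a
hypothetical record format. Inside an actual TRACE_EVENT, such a mapping
would typically be written with __print_symbolic() in TP_printk. The
patch itself is not quoted here, so treat that as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of PHY states, for illustration only. */
enum phy_state_model { ST_DOWN, ST_READY, ST_UP, ST_AN, ST_RUNNING, ST_NOLINK };

static const char *const state_names[] = {
	[ST_DOWN] = "DOWN", [ST_READY] = "READY", [ST_UP] = "UP",
	[ST_AN] = "AN", [ST_RUNNING] = "RUNNING", [ST_NOLINK] = "NOLINK",
};

/* Formats one record per state-machine run, showing the old and new
 * state by name rather than as raw integers. */
static int format_state_change(char *buf, size_t len,
			       enum phy_state_model old_state,
			       enum phy_state_model new_state)
{
	return snprintf(buf, len, "state=%s -> %s",
			state_names[old_state], state_names[new_state]);
}
```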

^ permalink raw reply

* KASAN: use-after-free Read in tipc_group_fill_sock_diag
From: syzbot @ 2018-08-19  0:10 UTC (permalink / raw)
  To: davem, jon.maloy, linux-kernel, netdev, syzkaller-bugs,
	tipc-discussion, ying.xue

Hello,

syzbot found the following crash on:

HEAD commit:    a18d783fedfe Merge tag 'driver-core-4.19-rc1' of git://git..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14bcc936400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7eeabb17332898c0
dashboard link: https://syzkaller.appspot.com/bug?extid=b9c8f3ab2994b7cd1625
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1374ec4a400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
==================================================================
BUG: KASAN: use-after-free in tipc_group_fill_sock_diag+0x7b9/0x84b net/tipc/group.c:921
Read of size 4 at addr ffff8801ceb5565c by task syz-executor0/4838

CPU: 1 PID: 4838 Comm: syz-executor0 Not tainted 4.18.0+ #196
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
  tipc_group_fill_sock_diag+0x7b9/0x84b net/tipc/group.c:921
  tipc_sk_fill_sock_diag+0x9f8/0xdb0 net/tipc/socket.c:3322
  __tipc_add_sock_diag+0x22f/0x360 net/tipc/diag.c:62
  tipc_nl_sk_walk+0x68d/0xd30 net/tipc/socket.c:3249
  tipc_diag_dump+0x24/0x30 net/tipc/diag.c:73
  netlink_dump+0x519/0xd50 net/netlink/af_netlink.c:2233
  __netlink_dump_start+0x4f1/0x6f0 net/netlink/af_netlink.c:2329
  netlink_dump_start include/linux/netlink.h:213 [inline]
  tipc_sock_diag_handler_dump+0x234/0x340 net/tipc/diag.c:89
  __sock_diag_cmd net/core/sock_diag.c:232 [inline]
  sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
  netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
  sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
  netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
  netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
  netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:631
  ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
  __sys_sendmsg+0x11d/0x290 net/socket.c:2152
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg net/socket.c:2159 [inline]
  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457089
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3a5d506c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f3a5d5076d4 RCX: 0000000000457089
RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000006
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d4088 R14: 00000000004c8ab0 R15: 0000000000000000

Allocated by task 4838:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
  kmalloc include/linux/slab.h:513 [inline]
  kzalloc include/linux/slab.h:707 [inline]
  tipc_group_create+0x155/0xa70 net/tipc/group.c:171
  tipc_sk_join net/tipc/socket.c:2766 [inline]
  tipc_setsockopt+0x2d1/0xd70 net/tipc/socket.c:2881
  __sys_setsockopt+0x1c5/0x3b0 net/socket.c:1900
  __do_sys_setsockopt net/socket.c:1911 [inline]
  __se_sys_setsockopt net/socket.c:1908 [inline]
  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1908
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4837:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x260 mm/slab.c:3813
  tipc_group_delete+0x2e5/0x3f0 net/tipc/group.c:227
  tipc_sk_leave+0x113/0x220 net/tipc/socket.c:2800
  tipc_release+0x14e/0x12b0 net/tipc/socket.c:574
  __sock_release+0xd7/0x250 net/socket.c:579
  sock_close+0x19/0x20 net/socket.c:1139
  __fput+0x39b/0x860 fs/file_table.c:251
  ____fput+0x15/0x20 fs/file_table.c:282
  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:193 [inline]
  exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ceb55600
  which belongs to the cache kmalloc-192 of size 192
The buggy address is located 92 bytes inside of
  192-byte region [ffff8801ceb55600, ffff8801ceb556c0)
The buggy address belongs to the page:
page:ffffea00073ad540 count:1 mapcount:0 mapping:ffff8801dac00040 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea00073a5088 ffffea00073ae5c8 ffff8801dac00040
raw: 0000000000000000 ffff8801ceb55000 0000000100000010 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ceb55500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801ceb55580: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801ceb55600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                     ^
  ffff8801ceb55680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  ffff8801ceb55700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
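
The location lines in this report can be cross-checked with a little
arithmetic, sketched below. The sketch assumes generic KASAN's shadow
scale of one shadow byte per 8 bytes and dump rows of 16 shadow bytes
(128 bytes of memory), and that 0xfb and 0xfc are the shadow markers for
freed kmalloc memory and kmalloc redzones. The faulting address is
0x5c = 92 bytes past the object start, which falls in 8-byte granule 11
(counting from 0). The ffff8801ceb55600 row and the first eight shadow
bytes of the next row are all fb, and (16 + 8) * 8 = 192 bytes is the
whole kmalloc-192 object. That object was freed via tipc_group_delete()
before tipc_group_fill_sock_diag() read it.

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3	/* one shadow byte per 8 bytes */
#define SHADOW_BYTES_PER_ROW	16	/* shadow bytes per dump row */
#define KASAN_KMALLOC_FREE	0xFB	/* shadow value: freed kmalloc memory */
#define KASAN_KMALLOC_REDZONE	0xFC	/* shadow value: kmalloc redzone */

struct kasan_decode {
	uint64_t offset;	/* bytes from the object start */
	uint64_t granule;	/* 8-byte granule index within the object */
	uint64_t row_base;	/* first address covered by the dump row */
	unsigned column;	/* shadow byte position within that row */
};

static struct kasan_decode decode(uint64_t bad, uint64_t obj)
{
	const uint64_t row_bytes =
		(uint64_t)SHADOW_BYTES_PER_ROW << KASAN_SHADOW_SCALE_SHIFT;
	struct kasan_decode d;

	d.offset = bad - obj;
	d.granule = d.offset >> KASAN_SHADOW_SCALE_SHIFT;
	d.row_base = bad & ~(row_bytes - 1);	/* rows are 128-byte aligned */
	d.column = (unsigned)((bad & (row_bytes - 1)) >> KASAN_SHADOW_SCALE_SHIFT);
	return d;
}
```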


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH]ipv6: multicast: In mld_send_cr function moving read lock to second for loop
From: David Miller @ 2018-08-18 20:58 UTC (permalink / raw)
  To: guru2018; +Cc: netdev, kuznet, yoshfuji
In-Reply-To: <CAHSpA5-ivVaQnQU7_ikt=BV-EAwFJFaA6ErpQ1Jn1HmPmXZapg@mail.gmail.com>

From: Guruswamy Basavaiah <guru2018@gmail.com>
Date: Fri, 17 Aug 2018 18:01:41 +0530

> @@ -1860,7 +1860,6 @@ static void mld_send_cr(struct inet6_dev *idev)
>      struct sk_buff *skb = NULL;
>      int type, dtype;
> 
> -    read_lock_bh(&idev->lock);
>      spin_lock(&idev->mc_lock);
> 
>      /* deleted MCA's */

This will lead to deadlocks; idev->mc_lock must be taken with _bh().

I have zero confidence in this change. Did you do any stress testing
with lockdep enabled? It would have caught this quickly.

^ permalink raw reply

