Netdev List
* Re: Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Michael S. Tsirkin @ 2018-04-19  5:07 UTC
  To: Samudrala, Sridhar
  Cc: Siwei Liu, David Miller, David Ahern, Jiri Pirko, si-wei liu,
	Stephen Hemminger, Alexander Duyck, Brandeburg, Jesse,
	Jakub Kicinski, Jason Wang, Netdev, virtualization, virtio-dev
In-Reply-To: <4e029524-d542-0f12-bfdb-7c8a2f198e28@intel.com>

On Wed, Apr 18, 2018 at 10:00:51PM -0700, Samudrala, Sridhar wrote:
> On 4/18/2018 9:41 PM, Michael S. Tsirkin wrote:
> > On Wed, Apr 18, 2018 at 04:33:34PM -0700, Samudrala, Sridhar wrote:
> > > On 4/17/2018 5:26 PM, Siwei Liu wrote:
> > > > I ran this by a few folks offline and gathered some good feedback
> > > > that I'd like to share, and thus revive the discussion.
> > > > 
> > > > First of all, as illustrated in the reply below, cloud service
> > > > providers require transparent live migration. Specifically, the main
> > > > target of our case is to support SR-IOV live migration via kernel
> > > > upgrade while keeping the userspace of old distros unmodified. If it's
> > > > because this use case is not appealing enough for the mainline to
> > > > adopt, I will shut up and not continue discussing, although
> > > > technically it's entirely possible (and there's precedent in other
> > > > implementation) to do so to benefit any cloud service providers.
> > > > 
> > > > If it's just that the implementation of hiding the netdev itself
> > > > needs to be improved, such as implementing it as an attribute flag
> > > > or adding a linkdump API, that's completely fine and we can look
> > > > into that. However, the specific issue that needs to be understood
> > > > beforehand is how to let transparent SR-IOV take over the name (and
> > > > so inherit all the configs) from the lower netdev, which needs some
> > > > games with uevents and name space reservation. So far I don't think
> > > > it's been well discussed.
> > > > 
> > > > One thing in particular I'd like to point out is that the 3-netdev
> > > > model currently fails to address the core problem of live migration:
> > > > migration of hardware-specific features/state, e.g. ethtool configs
> > > > and hardware offloading states. Only general network state (e.g. IP
> > > > address, gateway) associated with the bypass interface can be
> > > > migrated. As follow-up work, the bypass driver can/should be enhanced
> > > > to save and apply those hardware-specific configs before or after
> > > > migration as needed. The transparent 1-netdev model being proposed as
> > > > part of this patch series will be able to solve that problem naturally
> > > > by making all hardware-specific configurations go through the central
> > > > bypass driver, such that hardware configurations can be replayed when
> > > > a new VF or passthrough device gets plugged back in. Although the
> > > > corresponding function hasn't been implemented yet, I'd like to
> > > > remind everyone that this is the core problem any live migration
> > > > proposal should address.
> > > > 
> > > > If it would make things clearer to defer netdev hiding until all
> > > > functionality regarding centralizing and replay is implemented,
> > > > we'd take advice like that and move on to implementing those features
> > > > as follow-up patches. Once all needed features are done, we'd resume
> > > > the work on hiding the lower netdev at that point. I think it would be
> > > > best to make sure everyone understands the big picture in advance
> > > > before going too far.
> > > I think we should get the 3-netdev model integrated and add any additional
> > > ndo_ops/ethtool ops that we would like to support/migrate before looking into
> > > hiding the lower netdevs.
> > Once they are exposed, I don't think we'll be able to hide them -
> > they will be a kernel ABI.
> > 
> > Do you think everyone needs to hide the SRIOV device?
> > Or that only some users need this?
> 
> Hyper-V currently supports live migration without hiding the SR-IOV device. So I don't
> think it is a hard requirement.

OK, fine.

> Also, as we don't yet have a consensus on how to hide
> the lower netdevs, we could add another feature bit to hide lower netdevs once
> we have an acceptable solution.

Guest/host interface isn't more flexible than the userspace/kernel
interface.  The feature bit you propose would say what exactly?
Hypervisor has no idea what guest kernel shows guest userspace.
Note that the backup flag doesn't tell guest kernel what to do,
it just tells guest that there is or will be a faster main device
connected to the same backend, so the backup should only be used
when main device is not present.

-- 
MST
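
The semantics described for the backup flag can be written down as a
small decision function. This is a sketch under assumptions, not code
from any of the patches in this thread: the enum, struct and function
names are hypothetical. It encodes only what the message states, namely
that the backup device carries traffic only while the main device is
absent.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum datapath {
	DATAPATH_NONE,
	DATAPATH_BACKUP,	/* the device carrying the backup flag */
	DATAPATH_MAIN,		/* the faster main device, same backend */
};

struct bypass_state {
	bool main_present;
	bool backup_present;
};

/* Prefer the main device; fall back to the backup only when it is absent. */
enum datapath select_datapath(const struct bypass_state *s)
{
	if (s->main_present)
		return DATAPATH_MAIN;
	if (s->backup_present)
		return DATAPATH_BACKUP;
	return DATAPATH_NONE;
}
```

Note that nothing here says what the guest kernel shows to guest
userspace; the flag only drives which device is used.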


* [PATCH] net: qrtr: Expose tunneling endpoint to user space
From: Bjorn Andersson @ 2018-04-19  5:03 UTC
  To: David S. Miller; +Cc: linux-kernel, netdev, linux-arm-msm, Chris Lew

This implements a misc character device named "qrtr-tun" for the purpose
of allowing user space applications to implement endpoints in the qrtr
network.

This allows more advanced (and dynamic) testing of the qrtr code and
opens up the ability to tunnel qrtr over a network or USB link.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 net/qrtr/Kconfig  |   7 +++
 net/qrtr/Makefile |   2 +
 net/qrtr/tun.c    | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 net/qrtr/tun.c

diff --git a/net/qrtr/Kconfig b/net/qrtr/Kconfig
index 326fd97444f5..1944834d225c 100644
--- a/net/qrtr/Kconfig
+++ b/net/qrtr/Kconfig
@@ -21,4 +21,11 @@ config QRTR_SMD
 	  Say Y here to support SMD based ipcrouter channels.  SMD is the
 	  most common transport for IPC Router.
 
+config QRTR_TUN
+	tristate "TUN device for Qualcomm IPC Router"
+	---help---
+	  Say Y here to expose a character device that allows user space to
+	  implement QRTR endpoints, for the purpose of tunneling data to
+	  other hosts or for testing.
+
 endif # QRTR
diff --git a/net/qrtr/Makefile b/net/qrtr/Makefile
index ab09e40f7c74..be012bfd3e52 100644
--- a/net/qrtr/Makefile
+++ b/net/qrtr/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_QRTR) := qrtr.o
 
 obj-$(CONFIG_QRTR_SMD) += qrtr-smd.o
 qrtr-smd-y	:= smd.o
+obj-$(CONFIG_QRTR_TUN) += qrtr-tun.o
+qrtr-tun-y	:= tun.o
diff --git a/net/qrtr/tun.c b/net/qrtr/tun.c
new file mode 100644
index 000000000000..ab87a12cbb17
--- /dev/null
+++ b/net/qrtr/tun.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018, Linaro Ltd */
+
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/uaccess.h>
+
+#include "qrtr.h"
+
+struct qrtr_tun {
+	struct qrtr_endpoint ep;
+
+	struct mutex queue_lock;
+	struct sk_buff_head queue;
+	wait_queue_head_t readq;
+};
+
+static int qrtr_tun_send(struct qrtr_endpoint *ep, struct sk_buff *skb)
+{
+	struct qrtr_tun *tun = container_of(ep, struct qrtr_tun, ep);
+
+	mutex_lock(&tun->queue_lock);
+	skb_queue_tail(&tun->queue, skb);
+	mutex_unlock(&tun->queue_lock);
+
+	/* wake up any blocking processes, waiting for new data */
+	wake_up_interruptible(&tun->readq);
+
+	return 0;
+}
+
+static int qrtr_tun_open(struct inode *inode, struct file *filp)
+{
+	struct qrtr_tun *tun;
+
+	tun = kzalloc(sizeof(*tun), GFP_KERNEL);
+	if (!tun)
+		return -ENOMEM;
+
+	mutex_init(&tun->queue_lock);
+	skb_queue_head_init(&tun->queue);
+	init_waitqueue_head(&tun->readq);
+
+	tun->ep.xmit = qrtr_tun_send;
+
+	filp->private_data = tun;
+
+	return qrtr_endpoint_register(&tun->ep, QRTR_EP_NID_AUTO);
+}
+
+static ssize_t qrtr_tun_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *filp = iocb->ki_filp;
+	struct qrtr_tun *tun = filp->private_data;
+	struct sk_buff *skb;
+	int count;
+
+	mutex_lock(&tun->queue_lock);
+
+	/* Wait for data in the queue */
+	if (skb_queue_empty(&tun->queue)) {
+		mutex_unlock(&tun->queue_lock);
+
+		if (filp->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		/* Wait until we get data or the endpoint goes away */
+		if (wait_event_interruptible(tun->readq,
+					     !skb_queue_empty(&tun->queue)))
+			return -ERESTARTSYS;
+
+		mutex_lock(&tun->queue_lock);
+	}
+
+	skb = skb_dequeue(&tun->queue);
+	mutex_unlock(&tun->queue_lock);
+	if (!skb)
+		return -EFAULT;
+
+	count = min_t(size_t, iov_iter_count(to), skb->len);
+	if (copy_to_iter(skb->data, count, to) != count)
+		count = -EFAULT;
+
+	kfree_skb(skb);
+
+	return count;
+}
+
+static ssize_t qrtr_tun_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *filp = iocb->ki_filp;
+	struct qrtr_tun *tun = filp->private_data;
+	size_t len = iov_iter_count(from);
+	ssize_t ret;
+	void *kbuf;
+
+	kbuf = kzalloc(len, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	if (copy_from_iter_full(kbuf, len, from))
+		ret = qrtr_endpoint_post(&tun->ep, kbuf, len);
+	else
+		ret = -EFAULT;
+	kfree(kbuf);
+	return ret < 0 ? ret : len;
+}
+
+static int qrtr_tun_release(struct inode *inode, struct file *filp)
+{
+	struct qrtr_tun *tun = filp->private_data;
+	struct sk_buff *skb;
+
+	qrtr_endpoint_unregister(&tun->ep);
+
+	/* Discard all SKBs */
+	while (!skb_queue_empty(&tun->queue)) {
+		skb = skb_dequeue(&tun->queue);
+		kfree_skb(skb);
+	}
+
+	kfree(tun);
+
+	return 0;
+}
+
+static const struct file_operations qrtr_tun_ops = {
+	.owner = THIS_MODULE,
+	.open = qrtr_tun_open,
+	.read_iter = qrtr_tun_read_iter,
+	.write_iter = qrtr_tun_write_iter,
+	.release = qrtr_tun_release,
+};
+
+static struct miscdevice qrtr_tun_miscdev = {
+	MISC_DYNAMIC_MINOR,
+	"qrtr-tun",
+	&qrtr_tun_ops,
+};
+
+static int __init qrtr_tun_init(void)
+{
+	int ret;
+
+	ret = misc_register(&qrtr_tun_miscdev);
+	if (ret)
+		pr_err("failed to register Qualcomm IPC Router tun device\n");
+
+	return ret;
+}
+
+static void __exit qrtr_tun_exit(void)
+{
+	misc_deregister(&qrtr_tun_miscdev);
+}
+
+module_init(qrtr_tun_init);
+module_exit(qrtr_tun_exit);
+
+MODULE_DESCRIPTION("Qualcomm IPC Router TUN device");
+MODULE_LICENSE("GPL v2");
-- 
2.16.2
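
For illustration, a user-space program could drive the device roughly as
follows. This is a minimal sketch, not part of the patch. It assumes the
misc device shows up as /dev/qrtr-tun (the usual node name for a
miscdevice named "qrtr-tun"), that each read() returns one queued packet
as in qrtr_tun_read_iter(), and that each write() posts one packet. The
helper names forward_one() and tunnel_to_link() and the buffer size are
hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical upper bound on one qrtr packet; not taken from the patch. */
#define TUN_BUF_SIZE 4096

/*
 * Move one packet from from_fd to to_fd.
 * Returns the number of bytes forwarded, 0 on EOF, or -1 with errno set.
 */
ssize_t forward_one(int from_fd, int to_fd)
{
	char buf[TUN_BUF_SIZE];
	ssize_t n, w;

	do {
		n = read(from_fd, buf, sizeof(buf));
	} while (n < 0 && errno == EINTR);
	if (n <= 0)
		return n;

	/* One write() per packet, so packet boundaries are preserved. */
	do {
		w = write(to_fd, buf, n);
	} while (w < 0 && errno == EINTR);
	if (w < 0)
		return -1;
	if (w != n) {
		errno = EIO;
		return -1;
	}
	return n;
}

/*
 * Tunnel packets from the qrtr-tun device to link_fd until an error or
 * EOF. Assumes /dev/qrtr-tun exists; returns 0 on EOF, -1 on error.
 */
int tunnel_to_link(int link_fd)
{
	ssize_t n;
	int tun_fd = open("/dev/qrtr-tun", O_RDWR);

	if (tun_fd < 0) {
		perror("open /dev/qrtr-tun");
		return -1;
	}
	while ((n = forward_one(tun_fd, link_fd)) > 0)
		continue;
	close(tun_fd);
	return n < 0 ? -1 : 0;
}
```

The reverse direction would be forward_one(link_fd, tun_fd). Over a
stream transport such as TCP, some framing (for example a length prefix)
would be needed to recover packet boundaries; that is left out here.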


* [PATCH net] virtio_net: split out ctrl buffer
From: Michael S. Tsirkin @ 2018-04-19  5:01 UTC
  To: linux-kernel
  Cc: Mikulas Patocka, Eric Dumazet, David Miller, Jason Wang,
	virtualization, netdev

When sending control commands, virtio net sets up several buffers for
DMA. The buffers are all part of the net device, which is actually
allocated by kvmalloc, so in theory (under extreme memory pressure)
it's possible to get a vmalloc'ed buffer, which on some platforms means
we can't DMA there.

Fix up by moving the DMA buffers out into a separate structure.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Lightly tested.
Mikulas, does this address your problem?

 drivers/net/virtio_net.c | 68 +++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7b187ec..82f50e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -147,6 +147,17 @@ struct receive_queue {
 	struct xdp_rxq_info xdp_rxq;
 };
 
+/* Control VQ buffers: protected by the rtnl lock */
+struct control_buf {
+	struct virtio_net_ctrl_hdr hdr;
+	virtio_net_ctrl_ack status;
+	struct virtio_net_ctrl_mq mq;
+	u8 promisc;
+	u8 allmulti;
+	u16 vid;
+	u64 offloads;
+};
+
 struct virtnet_info {
 	struct virtio_device *vdev;
 	struct virtqueue *cvq;
@@ -192,14 +203,7 @@ struct virtnet_info {
 	struct hlist_node node;
 	struct hlist_node node_dead;
 
-	/* Control VQ buffers: protected by the rtnl lock */
-	struct virtio_net_ctrl_hdr ctrl_hdr;
-	virtio_net_ctrl_ack ctrl_status;
-	struct virtio_net_ctrl_mq ctrl_mq;
-	u8 ctrl_promisc;
-	u8 ctrl_allmulti;
-	u16 ctrl_vid;
-	u64 ctrl_offloads;
+	struct control_buf *ctrl;
 
 	/* Ethtool settings */
 	u8 duplex;
@@ -1454,25 +1458,25 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	/* Caller should know better */
 	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
 
-	vi->ctrl_status = ~0;
-	vi->ctrl_hdr.class = class;
-	vi->ctrl_hdr.cmd = cmd;
+	vi->ctrl->status = ~0;
+	vi->ctrl->hdr.class = class;
+	vi->ctrl->hdr.cmd = cmd;
 	/* Add header */
-	sg_init_one(&hdr, &vi->ctrl_hdr, sizeof(vi->ctrl_hdr));
+	sg_init_one(&hdr, &vi->ctrl->hdr, sizeof(vi->ctrl->hdr));
 	sgs[out_num++] = &hdr;
 
 	if (out)
 		sgs[out_num++] = out;
 
 	/* Add return status. */
-	sg_init_one(&stat, &vi->ctrl_status, sizeof(vi->ctrl_status));
+	sg_init_one(&stat, &vi->ctrl->status, sizeof(vi->ctrl->status));
 	sgs[out_num] = &stat;
 
 	BUG_ON(out_num + 1 > ARRAY_SIZE(sgs));
 	virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
 
 	if (unlikely(!virtqueue_kick(vi->cvq)))
-		return vi->ctrl_status == VIRTIO_NET_OK;
+		return vi->ctrl->status == VIRTIO_NET_OK;
 
 	/* Spin for a response, the kick causes an ioport write, trapping
 	 * into the hypervisor, so the request should be handled immediately.
@@ -1481,7 +1485,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	       !virtqueue_is_broken(vi->cvq))
 		cpu_relax();
 
-	return vi->ctrl_status == VIRTIO_NET_OK;
+	return vi->ctrl->status == VIRTIO_NET_OK;
 }
 
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
@@ -1593,8 +1597,8 @@ static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
 		return 0;
 
-	vi->ctrl_mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
-	sg_init_one(&sg, &vi->ctrl_mq, sizeof(vi->ctrl_mq));
+	vi->ctrl->mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
+	sg_init_one(&sg, &vi->ctrl->mq, sizeof(vi->ctrl->mq));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
 				  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
@@ -1653,22 +1657,22 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
 		return;
 
-	vi->ctrl_promisc = ((dev->flags & IFF_PROMISC) != 0);
-	vi->ctrl_allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
+	vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
+	vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
 
-	sg_init_one(sg, &vi->ctrl_promisc, sizeof(vi->ctrl_promisc));
+	sg_init_one(sg, &vi->ctrl->promisc, sizeof(vi->ctrl->promisc));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
 				  VIRTIO_NET_CTRL_RX_PROMISC, sg))
 		dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
-			 vi->ctrl_promisc ? "en" : "dis");
+			 vi->ctrl->promisc ? "en" : "dis");
 
-	sg_init_one(sg, &vi->ctrl_allmulti, sizeof(vi->ctrl_allmulti));
+	sg_init_one(sg, &vi->ctrl->allmulti, sizeof(vi->ctrl->allmulti));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
 				  VIRTIO_NET_CTRL_RX_ALLMULTI, sg))
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
-			 vi->ctrl_allmulti ? "en" : "dis");
+			 vi->ctrl->allmulti ? "en" : "dis");
 
 	uc_count = netdev_uc_count(dev);
 	mc_count = netdev_mc_count(dev);
@@ -1714,8 +1718,8 @@ static int virtnet_vlan_rx_add_vid(struct net_device *dev,
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct scatterlist sg;
 
-	vi->ctrl_vid = vid;
-	sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
+	vi->ctrl->vid = vid;
+	sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
 				  VIRTIO_NET_CTRL_VLAN_ADD, &sg))
@@ -1729,8 +1733,8 @@ static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct scatterlist sg;
 
-	vi->ctrl_vid = vid;
-	sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
+	vi->ctrl->vid = vid;
+	sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
 				  VIRTIO_NET_CTRL_VLAN_DEL, &sg))
@@ -2126,9 +2130,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
 {
 	struct scatterlist sg;
-	vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
+	vi->ctrl->offloads = cpu_to_virtio64(vi->vdev, offloads);
 
-	sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
+	sg_init_one(&sg, &vi->ctrl->offloads, sizeof(vi->ctrl->offloads));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
 				  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
@@ -2351,6 +2355,7 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 
 	kfree(vi->rq);
 	kfree(vi->sq);
+	kfree(vi->ctrl);
 }
 
 static void _free_receive_bufs(struct virtnet_info *vi)
@@ -2543,6 +2548,9 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 {
 	int i;
 
+	vi->ctrl = kzalloc(sizeof(*vi->ctrl), GFP_KERNEL);
+	if (!vi->ctrl)
+		goto err_ctrl;
 	vi->sq = kzalloc(sizeof(*vi->sq) * vi->max_queue_pairs, GFP_KERNEL);
 	if (!vi->sq)
 		goto err_sq;
@@ -2571,6 +2579,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 err_rq:
 	kfree(vi->sq);
 err_sq:
+	kfree(vi->ctrl);
+err_ctrl:
 	return -ENOMEM;
 }
 
-- 
MST
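
The allocation and unwind order in virtnet_alloc_queues() can be
sketched in user space. This is an illustration only: calloc()/free()
stand in for kzalloc()/kfree(), the struct names are simplified
stand-ins for the driver's, and the fail_at parameter is a hypothetical
hook for exercising each error path. The point is that each label frees
exactly what was allocated before the failing step, and that the control
buffer lives in its own allocation rather than inside the parent
structure.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the driver's struct control_buf. */
struct control_buf {
	uint8_t  status;
	uint8_t  promisc;
	uint8_t  allmulti;
	uint16_t vid;
	uint64_t offloads;
};

/* Simplified stand-in for struct virtnet_info. */
struct vinfo {
	struct control_buf *ctrl;	/* separately allocated buffers */
	int *sq;			/* placeholder for send queues */
	int *rq;			/* placeholder for receive queues */
	size_t max_queue_pairs;
};

/* Hypothetical test hook: returns NULL when step == fail_at. */
static void *alloc_step(int step, int fail_at, size_t n, size_t size)
{
	return step == fail_at ? NULL : calloc(n, size);
}

/* Returns 0 on success, -1 on failure with nothing left allocated. */
int vinfo_alloc_queues(struct vinfo *vi, int fail_at)
{
	vi->ctrl = alloc_step(0, fail_at, 1, sizeof(*vi->ctrl));
	if (!vi->ctrl)
		goto err_ctrl;
	vi->sq = alloc_step(1, fail_at, vi->max_queue_pairs, sizeof(*vi->sq));
	if (!vi->sq)
		goto err_sq;
	vi->rq = alloc_step(2, fail_at, vi->max_queue_pairs, sizeof(*vi->rq));
	if (!vi->rq)
		goto err_rq;
	return 0;

err_rq:
	free(vi->sq);
	vi->sq = NULL;
err_sq:
	free(vi->ctrl);
	vi->ctrl = NULL;
err_ctrl:
	return -1;
}

/* Release everything allocated by vinfo_alloc_queues(). */
void vinfo_free_queues(struct vinfo *vi)
{
	free(vi->rq);
	free(vi->sq);
	free(vi->ctrl);
	vi->rq = NULL;
	vi->sq = NULL;
	vi->ctrl = NULL;
}
```

Passing fail_at = -1 exercises the success path; 0, 1 and 2 exercise the
err_ctrl, err_sq and err_rq paths respectively.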


* Re: [virtio-dev] Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Samudrala, Sridhar @ 2018-04-19  5:00 UTC
  To: Michael S. Tsirkin
  Cc: Siwei Liu, David Miller, David Ahern, Jiri Pirko, si-wei liu,
	Stephen Hemminger, Alexander Duyck, Brandeburg, Jesse,
	Jakub Kicinski, Jason Wang, Netdev, virtualization, virtio-dev
In-Reply-To: <20180419072003-mutt-send-email-mst@kernel.org>

On 4/18/2018 9:41 PM, Michael S. Tsirkin wrote:
> On Wed, Apr 18, 2018 at 04:33:34PM -0700, Samudrala, Sridhar wrote:
>> On 4/17/2018 5:26 PM, Siwei Liu wrote:
>>> I ran this by a few folks offline and gathered some good feedback
>>> that I'd like to share, and thus revive the discussion.
>>>
>>> First of all, as illustrated in the reply below, cloud service
>>> providers require transparent live migration. Specifically, the main
>>> target of our case is to support SR-IOV live migration via kernel
>>> upgrade while keeping the userspace of old distros unmodified. If it's
>>> because this use case is not appealing enough for the mainline to
>>> adopt, I will shut up and not continue discussing, although
>>> technically it's entirely possible (and there's precedent in other
>>> implementation) to do so to benefit any cloud service providers.
>>>
>>> If it's just that the implementation of hiding the netdev itself
>>> needs to be improved, such as implementing it as an attribute flag
>>> or adding a linkdump API, that's completely fine and we can look
>>> into that. However, the specific issue that needs to be understood
>>> beforehand is how to let transparent SR-IOV take over the name (and
>>> so inherit all the configs) from the lower netdev, which needs some
>>> games with uevents and name space reservation. So far I don't think
>>> it's been well discussed.
>>>
>>> One thing in particular I'd like to point out is that the 3-netdev
>>> model currently fails to address the core problem of live migration:
>>> migration of hardware-specific features/state, e.g. ethtool configs
>>> and hardware offloading states. Only general network state (e.g. IP
>>> address, gateway) associated with the bypass interface can be
>>> migrated. As follow-up work, the bypass driver can/should be enhanced
>>> to save and apply those hardware-specific configs before or after
>>> migration as needed. The transparent 1-netdev model being proposed as
>>> part of this patch series will be able to solve that problem naturally
>>> by making all hardware-specific configurations go through the central
>>> bypass driver, such that hardware configurations can be replayed when
>>> a new VF or passthrough device gets plugged back in. Although the
>>> corresponding function hasn't been implemented yet, I'd like to
>>> remind everyone that this is the core problem any live migration
>>> proposal should address.
>>>
>>> If it would make things clearer to defer netdev hiding until all
>>> functionality regarding centralizing and replay is implemented,
>>> we'd take advice like that and move on to implementing those features
>>> as follow-up patches. Once all needed features are done, we'd resume
>>> the work on hiding the lower netdev at that point. I think it would be
>>> best to make sure everyone understands the big picture in advance
>>> before going too far.
>> I think we should get the 3-netdev model integrated and add any additional
>> ndo_ops/ethtool ops that we would like to support/migrate before looking into
>> hiding the lower netdevs.
> Once they are exposed, I don't think we'll be able to hide them -
> they will be a kernel ABI.
>
> Do you think everyone needs to hide the SRIOV device?
> Or that only some users need this?

Hyper-V currently supports live migration without hiding the SR-IOV device. So I don't
think it is a hard requirement. Also, as we don't yet have a consensus on how to hide
the lower netdevs, we could add another feature bit to hide lower netdevs once
we have an acceptable solution.


* [PATCH] net: net_cls: remove a NULL check for css_cls_state
From: Li RongQing @ 2018-04-19  4:59 UTC
  To: netdev

The input of css_cls_state() cannot be NULL except in
cgrp_css_online(), so move the NULL check there and simplify it.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/core/netclassid_cgroup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 5e4f04004a49..ee087cf793c2 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -19,7 +19,7 @@
 
 static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state *css)
 {
-	return css ? container_of(css, struct cgroup_cls_state, css) : NULL;
+	return container_of(css, struct cgroup_cls_state, css);
 }
 
 struct cgroup_cls_state *task_cls_state(struct task_struct *p)
@@ -44,10 +44,9 @@ cgrp_css_alloc(struct cgroup_subsys_state *parent_css)
 static int cgrp_css_online(struct cgroup_subsys_state *css)
 {
 	struct cgroup_cls_state *cs = css_cls_state(css);
-	struct cgroup_cls_state *parent = css_cls_state(css->parent);
 
-	if (parent)
-		cs->classid = parent->classid;
+	if (css->parent)
+		cs->classid = css_cls_state(css->parent)->classid;
 
 	return 0;
 }
-- 
2.11.0
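
The pattern used in this patch can be illustrated in user space. The
sketch below is an illustration under assumptions, not kernel code:
container_of() is re-implemented with offsetof(), and the two structs
are cut-down stand-ins carrying only the fields used in
cgrp_css_online(). It shows the rewritten flow: css->parent is tested
before it is converted, so the conversion helper never sees NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space re-implementation of the kernel macro, for illustration. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-ins for the kernel structures. */
struct cgroup_subsys_state {
	struct cgroup_subsys_state *parent;
};

struct cgroup_cls_state {
	struct cgroup_subsys_state css;
	uint32_t classid;
};

/* No NULL check: callers must pass a valid css. */
struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state *css)
{
	return container_of(css, struct cgroup_cls_state, css);
}

/* Inherit the parent's classid, checking the parent pointer first. */
int cgrp_css_online(struct cgroup_subsys_state *css)
{
	struct cgroup_cls_state *cs = css_cls_state(css);

	if (css->parent)
		cs->classid = css_cls_state(css->parent)->classid;

	return 0;
}
```

A root state (parent == NULL) keeps its own classid; a child picks up
the parent's value when it comes online.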


* Re: [PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip
From: Alexei Starovoitov @ 2018-04-19  4:47 UTC
  To: Leo Yan; +Cc: Alexei Starovoitov, Daniel Borkmann, netdev, linux-kernel
In-Reply-To: <1524101646-6544-5-git-send-email-leo.yan@linaro.org>

On Thu, Apr 19, 2018 at 09:34:05AM +0800, Leo Yan wrote:
> The code defines macro 'PAGE_OFFSET' and uses it to decide if the
> address is in kernel space or not.  But different architecture has
> different 'PAGE_OFFSET' so this program cannot be used for all
> platforms.
> 
> This commit changes to check returned pointer from ksym_search() to
> judge if the address falls into kernel space or not, and removes
> macro 'PAGE_OFFSET' as it isn't used anymore.  As result, this program
> has no architecture dependency.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  samples/bpf/sampleip_user.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
> index 4ed690b..0eea1b3 100644
> --- a/samples/bpf/sampleip_user.c
> +++ b/samples/bpf/sampleip_user.c
> @@ -26,7 +26,6 @@
>  #define DEFAULT_FREQ	99
>  #define DEFAULT_SECS	5
>  #define MAX_IPS		8192
> -#define PAGE_OFFSET	0xffff880000000000
>  
>  static int nr_cpus;
>  
> @@ -107,14 +106,13 @@ static void print_ip_map(int fd)
>  	/* sort and print */
>  	qsort(counts, max, sizeof(struct ipcount), count_cmp);
>  	for (i = 0; i < max; i++) {
> -		if (counts[i].ip > PAGE_OFFSET) {
> -			sym = ksym_search(counts[i].ip);

Yes, it is x64-specific, since it's sample code,
but simply removing it is not a fix.
It makes this sampleip code behave incorrectly.
To do such a 'cleanup of ksym', please refactor it in a truly generic way,
so these ksym* helpers can work on all archs, and put this new
functionality into selftests.
If you can convert these examples into proper self-checking tests
that run out of selftests, that would be awesome.
Thanks!
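
One way to read the request above is that the lookup helper itself
should report "not a kernel symbol" instead of the caller guessing from
an architecture-specific constant. The following is a hedged sketch of
that idea only. The struct ksym layout, the function name ksym_lookup()
and the use of the last table entry as an end marker are assumptions
made for illustration; this is not the existing ksym_search() from the
samples. It binary-searches a table sorted by address and returns NULL
for any address outside the table's range.

```c
#include <stddef.h>

/* Hypothetical symbol record: start address and name. */
struct ksym {
	unsigned long addr;
	const char *name;
};

/*
 * Find the symbol whose range contains key, in a table sorted by
 * ascending address. The last entry acts as an end marker: addresses
 * at or beyond it are out of range. Returns NULL when key does not
 * fall inside [syms[0].addr, syms[cnt - 1].addr).
 */
const struct ksym *ksym_lookup(const struct ksym *syms, size_t cnt,
			       unsigned long key)
{
	size_t lo = 0, hi = cnt;

	if (cnt < 2 || key < syms[0].addr || key >= syms[cnt - 1].addr)
		return NULL;

	/* Invariant: syms[lo].addr <= key, and key < syms[hi].addr if hi < cnt. */
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= key)
			lo = mid;
		else
			hi = mid;
	}
	return &syms[lo];
}
```

With a helper shaped like this, a caller could print the symbol name
when the result is non-NULL and fall back to the raw address otherwise,
with no per-architecture constant involved.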


* Re: Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Michael S. Tsirkin @ 2018-04-19  4:41 UTC
  To: Samudrala, Sridhar
  Cc: Siwei Liu, David Miller, David Ahern, Jiri Pirko, si-wei liu,
	Stephen Hemminger, Alexander Duyck, Brandeburg, Jesse,
	Jakub Kicinski, Jason Wang, Netdev, virtualization, virtio-dev
In-Reply-To: <1f3af59f-fd64-cc0d-f9eb-668636c52db4@intel.com>

On Wed, Apr 18, 2018 at 04:33:34PM -0700, Samudrala, Sridhar wrote:
> On 4/17/2018 5:26 PM, Siwei Liu wrote:
> > I ran this by a few folks offline and gathered some good feedback
> > that I'd like to share, and thus revive the discussion.
> > 
> > First of all, as illustrated in the reply below, cloud service
> > providers require transparent live migration. Specifically, the main
> > target of our case is to support SR-IOV live migration via kernel
> > upgrade while keeping the userspace of old distros unmodified. If it's
> > because this use case is not appealing enough for the mainline to
> > adopt, I will shut up and not continue discussing, although
> > technically it's entirely possible (and there's precedent in other
> > implementation) to do so to benefit any cloud service providers.
> > 
> > If it's just that the implementation of hiding the netdev itself
> > needs to be improved, such as implementing it as an attribute flag
> > or adding a linkdump API, that's completely fine and we can look
> > into that. However, the specific issue that needs to be understood
> > beforehand is how to let transparent SR-IOV take over the name (and
> > so inherit all the configs) from the lower netdev, which needs some
> > games with uevents and name space reservation. So far I don't think
> > it's been well discussed.
> > 
> > One thing in particular I'd like to point out is that the 3-netdev
> > model currently fails to address the core problem of live migration:
> > migration of hardware-specific features/state, e.g. ethtool configs
> > and hardware offloading states. Only general network state (e.g. IP
> > address, gateway) associated with the bypass interface can be
> > migrated. As follow-up work, the bypass driver can/should be enhanced
> > to save and apply those hardware-specific configs before or after
> > migration as needed. The transparent 1-netdev model being proposed as
> > part of this patch series will be able to solve that problem naturally
> > by making all hardware-specific configurations go through the central
> > bypass driver, such that hardware configurations can be replayed when
> > a new VF or passthrough device gets plugged back in. Although the
> > corresponding function hasn't been implemented yet, I'd like to
> > remind everyone that this is the core problem any live migration
> > proposal should address.
> > 
> > If it would make things clearer to defer netdev hiding until all
> > functionality regarding centralizing and replay is implemented,
> > we'd take advice like that and move on to implementing those features
> > as follow-up patches. Once all needed features are done, we'd resume
> > the work on hiding the lower netdev at that point. I think it would be
> > best to make sure everyone understands the big picture in advance
> > before going too far.
> 
> I think we should get the 3-netdev model integrated and add any additional
> ndo_ops/ethtool ops that we would like to support/migrate before looking into
> hiding the lower netdevs.

Once they are exposed, I don't think we'll be able to hide them -
they will be a kernel ABI.

Do you think everyone needs to hide the SRIOV device?
Or that only some users need this?

-- 
MST


* Re: [PATCH bpf-next v2 9/9] tools/bpf: add a test_progs test case for bpf_get_stack helper
From: Alexei Starovoitov @ 2018-04-19  4:39 UTC
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180418165444.2263237-10-yhs@fb.com>

On Wed, Apr 18, 2018 at 09:54:44AM -0700, Yonghong Song wrote:
> The test_stacktrace_map is enhanced to call bpf_get_stack
> in the helper to get the stack trace as well.
> The stack traces from bpf_get_stack and bpf_get_stackid
> are compared to ensure that for the same stack as
> represented as the same hash, their ip addresses
> must be the same.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>

Could you please add a test for bpf_get_stack() with buildid as well?
I think patch 2 implements it correctly, but it would be good to have a test for it.


* Re: [PATCH bpf-next v2 7/9] samples/bpf: add a test for bpf_get_stack helper
From: Alexei Starovoitov @ 2018-04-19  4:37 UTC
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180418165444.2263237-8-yhs@fb.com>

On Wed, Apr 18, 2018 at 09:54:42AM -0700, Yonghong Song wrote:
> The test attached a kprobe program to kernel function sys_write.
> It tested to get stack for user space, kernel space and user
> space with build_id request. It also tested to get user
> and kernel stack into the same buffer with back-to-back
> bpf_get_stack helper calls.
> 
> Whenever the kernel stack is available, the user space
> application will check to ensure that sys_write/SyS_write
> is part of the stack.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  samples/bpf/Makefile               |   4 +
>  samples/bpf/trace_get_stack_kern.c |  86 +++++++++++++++++++++
>  samples/bpf/trace_get_stack_user.c | 150 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 240 insertions(+)

Since perf_read is being refactored out of trace_output_user.c in the previous patch,
please move it to selftests (instead of bpf_load.c) and move
this whole test to selftests as well.

>  create mode 100644 samples/bpf/trace_get_stack_kern.c
>  create mode 100644 samples/bpf/trace_get_stack_user.c
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 4d6a6ed..94e7b10 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
>  hostprogs-y += xdp_rxq_info
>  hostprogs-y += syscall_tp
>  hostprogs-y += cpustat
> +hostprogs-y += trace_get_stack
>  
>  # Libbpf dependencies
>  LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
> @@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
>  xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
>  syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
>  cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
> +trace_get_stack-objs := bpf_load.o $(LIBBPF) trace_get_stack_user.o
>  
>  # Tell kbuild to always build the programs
>  always := $(hostprogs-y)
> @@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
>  always += xdp2skb_meta_kern.o
>  always += syscall_tp_kern.o
>  always += cpustat_kern.o
> +always += trace_get_stack_kern.o
>  
>  HOSTCFLAGS += -I$(objtree)/usr/include
>  HOSTCFLAGS += -I$(srctree)/tools/lib/
> @@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
>  HOSTLOADLIBES_xdp_rxq_info += -lelf
>  HOSTLOADLIBES_syscall_tp += -lelf
>  HOSTLOADLIBES_cpustat += -lelf
> +HOSTLOADLIBES_trace_get_stack += -lelf
>  
>  # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
>  #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> diff --git a/samples/bpf/trace_get_stack_kern.c b/samples/bpf/trace_get_stack_kern.c
> new file mode 100644
> index 0000000..665e4ad
> --- /dev/null
> +++ b/samples/bpf/trace_get_stack_kern.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/ptrace.h>
> +#include <linux/version.h>
> +#include <uapi/linux/bpf.h>
> +#include "bpf_helpers.h"
> +
> +/* Permit pretty deep stack traces */
> +#define MAX_STACK 100
> +struct stack_trace_t {
> +	int pid;
> +	int kern_stack_size;
> +	int user_stack_size;
> +	int user_stack_buildid_size;
> +	u64 kern_stack[MAX_STACK];
> +	u64 user_stack[MAX_STACK];
> +	struct bpf_stack_build_id user_stack_buildid[MAX_STACK];
> +};
> +
> +struct bpf_map_def SEC("maps") perfmap = {
> +	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> +	.key_size = sizeof(int),
> +	.value_size = sizeof(u32),
> +	.max_entries = 2,
> +};
> +
> +struct bpf_map_def SEC("maps") stackdata_map = {
> +	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
> +	.key_size = sizeof(u32),
> +	.value_size = sizeof(struct stack_trace_t),
> +	.max_entries = 1,
> +};
> +
> +struct bpf_map_def SEC("maps") rawdata_map = {
> +	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
> +	.key_size = sizeof(u32),
> +	.value_size = MAX_STACK * sizeof(u64) * 2,
> +	.max_entries = 1,
> +};
> +
> +SEC("kprobe/sys_write")
> +int bpf_prog1(struct pt_regs *ctx)
> +{
> +	int max_len, max_buildid_len, usize, ksize, total_size;
> +	struct stack_trace_t *data;
> +	void *raw_data;
> +	u32 key = 0;
> +
> +	data = bpf_map_lookup_elem(&stackdata_map, &key);
> +	if (!data)
> +		return 0;
> +
> +	max_len = MAX_STACK * sizeof(u64);
> +	max_buildid_len = MAX_STACK * sizeof(struct bpf_stack_build_id);
> +	data->pid = bpf_get_current_pid_tgid();
> +	data->kern_stack_size = bpf_get_stack(ctx, data->kern_stack,
> +					      max_len, 0);
> +	data->user_stack_size = bpf_get_stack(ctx, data->user_stack, max_len,
> +					    BPF_F_USER_STACK);
> +	data->user_stack_buildid_size = bpf_get_stack(
> +		ctx, data->user_stack_buildid, max_buildid_len,
> +		BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
> +	bpf_perf_event_output(ctx, &perfmap, 0, data, sizeof(*data));
> +
> +	/* write both kernel and user stacks to the same buffer */
> +	raw_data = bpf_map_lookup_elem(&rawdata_map, &key);
> +	if (!raw_data)
> +		return 0;
> +
> +	usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> +	if (usize < 0)
> +		return 0;
> +
> +	ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> +	if (ksize < 0)
> +		return 0;
> +
> +	total_size = usize + ksize;
> +	if (total_size > 0 && total_size <= max_len)
> +		bpf_perf_event_output(ctx, &perfmap, 0, raw_data, total_size);
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +u32 _version SEC("version") = LINUX_VERSION_CODE;
> diff --git a/samples/bpf/trace_get_stack_user.c b/samples/bpf/trace_get_stack_user.c
> new file mode 100644
> index 0000000..f64f5a5
> --- /dev/null
> +++ b/samples/bpf/trace_get_stack_user.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <poll.h>
> +#include <linux/perf_event.h>
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include <assert.h>
> +#include <sys/syscall.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <time.h>
> +#include <signal.h>
> +#include "libbpf.h"
> +#include "bpf_load.h"
> +#include "perf-sys.h"
> +
> +static int pmu_fd;
> +
> +#define MAX_CNT 10ull
> +#define MAX_STACK 100
> +struct stack_trace_t {
> +	int pid;
> +	int kern_stack_size;
> +	int user_stack_size;
> +	int user_stack_buildid_size;
> +	__u64 kern_stack[MAX_STACK];
> +	__u64 user_stack[MAX_STACK];
> +	struct bpf_stack_build_id user_stack_buildid[MAX_STACK];
> +};
> +
> +static void print_bpf_output(void *data, int size)
> +{
> +	struct stack_trace_t *e = data;
> +	int i, num_stack;
> +	static __u64 cnt;
> +	bool found = false;
> +
> +	cnt++;
> +
> +	if (size < sizeof(struct stack_trace_t)) {
> +		__u64 *raw_data = data;
> +
> +		num_stack = size / sizeof(__u64);
> +		printf("sample size = %d, raw stack\n\t", size);
> +		for (i = 0; i < num_stack; i++) {
> +			struct ksym *ks = ksym_search(raw_data[i]);
> +
> +			printf("0x%llx ", raw_data[i]);
> +			if (ks && (strcmp(ks->name, "sys_write") == 0 ||
> +				   strcmp(ks->name, "SyS_write") == 0))
> +				found = true;
> +		}
> +		printf("\n");
> +	} else {
> +		printf("sample size = %d, pid %d\n", size, e->pid);
> +		if (e->kern_stack_size > 0) {
> +			num_stack = e->kern_stack_size / sizeof(__u64);
> +			printf("\tkernel_stack(%d): ", num_stack);
> +			for (i = 0; i < num_stack; i++) {
> +				struct ksym *ks = ksym_search(e->kern_stack[i]);
> +
> +				printf("0x%llx ", e->kern_stack[i]);
> +				if (ks && (strcmp(ks->name, "sys_write") == 0 ||
> +					   strcmp(ks->name, "SyS_write") == 0))
> +					found = true;
> +			}
> +			printf("\n");
> +		}
> +		if (e->user_stack_size > 0) {
> +			num_stack = e->user_stack_size / sizeof(__u64);
> +			printf("\tuser_stack(%d): ", num_stack);
> +			for (i = 0; i < num_stack; i++)
> +				printf("0x%llx ", e->user_stack[i]);
> +			printf("\n");
> +		}
> +		if (e->user_stack_buildid_size > 0) {
> +			num_stack = e->user_stack_buildid_size /
> +				    sizeof(struct bpf_stack_build_id);
> +			printf("\tuser_stack_buildid(%d): ", num_stack);
> +			for (i = 0; i < num_stack; i++) {
> +				int j;
> +
> +				printf("(%d, 0x", e->user_stack_buildid[i].status);
> +				for (j = 0; j < BPF_BUILD_ID_SIZE; j++)
> +					printf("%02x", e->user_stack_buildid[i].build_id[j]);
> +				printf(", %llx) ", e->user_stack_buildid[i].offset);
> +			}
> +			printf("\n");
> +		}
> +	}
> +	if (!found) {
> +		printf("received %lld events, kern symbol not found, exiting ...\n", cnt);
> +		kill(0, SIGINT);
> +	}
> +
> +	if (cnt == MAX_CNT) {
> +		printf("received max %lld events, exiting ...\n", cnt);
> +		kill(0, SIGINT);
> +	}
> +}
> +
> +static void test_bpf_perf_event(void)
> +{
> +	struct perf_event_attr attr = {
> +		.sample_type = PERF_SAMPLE_RAW,
> +		.type = PERF_TYPE_SOFTWARE,
> +		.config = PERF_COUNT_SW_BPF_OUTPUT,
> +	};
> +	int key = 0;
> +
> +	pmu_fd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
> +
> +	assert(pmu_fd >= 0);
> +	assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
> +	ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +}
> +
> +static void action(void)
> +{
> +	FILE *f;
> +
> +	f = popen("taskset 1 dd if=/dev/zero of=/dev/null", "r");
> +	(void) f;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	char filename[256];
> +
> +	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
> +
> +	if (load_kallsyms()) {
> +		printf("failed to process /proc/kallsyms\n");
> +		return 2;
> +	}
> +
> +	if (load_bpf_file(filename)) {
> +		printf("%s", bpf_log_buf);
> +		return 1;
> +	}
> +
> +	test_bpf_perf_event();
> +	return perf_event_poller(pmu_fd, action, print_bpf_output);
> +}
> -- 
> 2.9.5
> 

^ permalink raw reply

* Re: [PATCH bpf-next v2 4/9] bpf/verifier: improve register value range tracking with ARSH
From: Alexei Starovoitov @ 2018-04-19  4:35 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180418165444.2263237-5-yhs@fb.com>

On Wed, Apr 18, 2018 at 09:54:39AM -0700, Yonghong Song wrote:
> When helpers like bpf_get_stack return an int value
> that is later used for arithmetic computation, the LSH and ARSH
> operations are often required to get proper sign extension into
> 64-bit. For example, without this patch:
>     54: R0=inv(id=0,umax_value=800)
>     54: (bf) r8 = r0
>     55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>     55: (67) r8 <<= 32
>     56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>     56: (c7) r8 s>>= 32
>     57: R8=inv(id=0)
> With this patch:
>     54: R0=inv(id=0,umax_value=800)
>     54: (bf) r8 = r0
>     55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>     55: (67) r8 <<= 32
>     56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>     56: (c7) r8 s>>= 32
>     57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
> With better range of "R8", later on when "R8" is added to other register,
> e.g., a map pointer or scalar-value register, the better register
> range can be derived and verifier failure may be avoided.
> 
> In our later example,
>     ......
>     usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>     if (usize < 0)
>         return 0;
>     ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>     ......
> Without improving ARSH value range tracking, the register representing
> "max_len - usize" will have smin_value equal to S64_MIN and will be
> rejected by verifier.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/verifier.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a8302c3..6148d31 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2944,6 +2944,7 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>  		__update_reg_bounds(dst_reg);
>  		break;
>  	case BPF_RSH:
> +	case BPF_ARSH:

I don't think that's correct.
The code further down is very RSH specific.
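
A minimal sketch of one way to see the concern, written as a plain user-space C model rather than verifier code. The helper names rsh64 and arsh64 are made up for illustration, and values are kept as raw 64-bit patterns so the result does not depend on how the compiler shifts signed integers:

```c
#include <stdint.h>

/* Logical right shift (BPF_RSH): vacated high bits are filled with zeros. */
static uint64_t rsh64(uint64_t v, unsigned int n)
{
	return v >> n;
}

/*
 * Arithmetic right shift (BPF_ARSH), modelled on the raw bit pattern:
 * vacated high bits are filled with copies of the sign bit.
 * Valid for 0 <= n < 64 in this sketch.
 */
static uint64_t arsh64(uint64_t v, unsigned int n)
{
	uint64_t out = v >> n;

	if (v >> 63)
		out |= ~(UINT64_MAX >> n);
	return out;
}
```

For a return value of 800 shifted left by 32, both helpers give 800 back. For an error such as -14 held in the low 32 bits (0xfffffff2), rsh64(v << 32, 32) yields 0xfffffff2 while arsh64(v << 32, 32) yields 0xfffffffffffffff2, so an unsigned maximum computed as "umax_value >> shift" is only a valid bound for the logical shift.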

>  		if (umax_val >= insn_bitness) {
>  			/* Shifts greater than 31 or 63 are undefined.
>  			 * This includes shifts by a negative number.
> -- 
> 2.9.5
> 

^ permalink raw reply

* Re: [PATCH bpf-next v2 3/9] bpf/verifier: refine retval R0 state for bpf_get_stack helper
From: Alexei Starovoitov @ 2018-04-19  4:33 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180418165444.2263237-4-yhs@fb.com>

On Wed, Apr 18, 2018 at 09:54:38AM -0700, Yonghong Song wrote:
> The special properties of the return values of the bpf_get_stack
> and bpf_probe_read_str helpers are captured in the verifier.
> Both helpers return a negative error code or
> a length, which is equal to or smaller than the buffer
> size argument. This additional information in the
> verifier can avoid the condition such as "retval > bufsize"
> in the bpf program. For example, for the code below,
>     usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>     if (usize < 0 || usize > max_len)
>         return 0;
> The verifier may have the following errors:
>     52: (85) call bpf_get_stack#65
>      R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
>      R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
>      R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>      R9_w=inv800 R10=fp0,call_-1
>     53: (bf) r8 = r0
>     54: (bf) r1 = r8
>     55: (67) r1 <<= 32
>     56: (bf) r2 = r1
>     57: (77) r2 >>= 32
>     58: (25) if r2 > 0x31f goto pc+33
>      R0=inv(id=0) R1=inv(id=0,smax_value=9223372032559808512,
>                          umax_value=18446744069414584320,
>                          var_off=(0x0; 0xffffffff00000000))
>      R2=inv(id=0,umax_value=799,var_off=(0x0; 0x3ff))
>      R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>      R8=inv(id=0) R9=inv800 R10=fp0,call_-1
>     59: (1f) r9 -= r8
>     60: (c7) r1 s>>= 32
>     61: (bf) r2 = r7
>     62: (0f) r2 += r1
>     math between map_value pointer and register with unbounded
>     min value is not allowed
> The failure is due to llvm compiler optimization where register "r2",
> which is a copy of "r1", is tested for condition while later on "r1"
> is used for map_ptr operation. The verifier is not able to track such
> inst sequence effectively.
> 
> Without the "usize > max_len" condition, there is no llvm optimization
> and the below generated code passed verifier:
>     52: (85) call bpf_get_stack#65
>      R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
>      R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
>      R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>      R9_w=inv800 R10=fp0,call_-1
>     53: (b7) r1 = 0
>     54: (bf) r8 = r0
>     55: (67) r8 <<= 32
>     56: (c7) r8 s>>= 32
>     57: (6d) if r1 s> r8 goto pc+24
>      R0=inv(id=0,umax_value=800) R1=inv0 R6=ctx(id=0,off=0,imm=0)
>      R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>      R8=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R9=inv800
>      R10=fp0,call_-1
>     58: (bf) r2 = r7
>     59: (0f) r2 += r8
>     60: (1f) r9 -= r8
>     61: (bf) r1 = r6
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/verifier.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index aba9425..a8302c3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2333,10 +2333,32 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
>  	return 0;
>  }
>  
> +static void do_refine_retval_range(struct bpf_reg_state *regs, int ret_type,
> +				   int func_id,
> +				   struct bpf_reg_state *retval_state,
> +				   bool is_check)
> +{
> +	struct bpf_reg_state *src_reg, *dst_reg;
> +
> +	if (ret_type != RET_INTEGER ||
> +	    (func_id != BPF_FUNC_get_stack &&
> +	     func_id != BPF_FUNC_probe_read_str))
> +		return;
> +
> +	dst_reg = is_check ? retval_state : &regs[BPF_REG_0];
> +	if (func_id == BPF_FUNC_get_stack)
> +		src_reg = is_check ? &regs[BPF_REG_3] : retval_state;
> +	else
> +		src_reg = is_check ? &regs[BPF_REG_2] : retval_state;
> +
> +	dst_reg->smax_value = src_reg->smax_value;
> +	dst_reg->umax_value = src_reg->umax_value;
> +}

I think this part can be made more generic, by using 'meta' logic.
check_func_arg(.. &meta);
can remember smax/umax into meta for arg_type_is_mem_size()
and later refine_retval_range() can be applied to r0.
This will help avoid mistakes with specifying reg by position (r2 or r3)
like above snippet is doing.
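
Roughly what I have in mind, as a stand-alone user-space model and not actual verifier code. The struct and function names (reg_bounds, call_meta, record_arg, the has_msize flag) are hypothetical stand-ins for bpf_reg_state, bpf_call_arg_meta and check_func_arg(); the bounds copied into r0 are the same two fields the patch above copies:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct bpf_reg_state. */
struct reg_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

/* Hypothetical stand-in for the per-call 'meta' state. */
struct call_meta {
	bool has_msize;
	int64_t msize_smax_value;
	uint64_t msize_umax_value;
};

enum arg_type { ARG_OTHER, ARG_MEM_SIZE };

/* While checking each argument, remember the bounds of the size argument. */
static void record_arg(struct call_meta *meta, enum arg_type type,
		       const struct reg_bounds *reg)
{
	if (type != ARG_MEM_SIZE)
		return;
	meta->has_msize = true;
	meta->msize_smax_value = reg->smax_value;
	meta->msize_umax_value = reg->umax_value;
}

/* After the call, cap r0 by the remembered size; no register index needed. */
static void refine_retval_range(struct reg_bounds *r0,
				const struct call_meta *meta)
{
	if (!meta->has_msize)
		return;
	r0->smax_value = meta->msize_smax_value;
	r0->umax_value = meta->msize_umax_value;
}
```

The size argument is found by its type rather than by position, so the same code would cover both helpers without knowing whether the size lives in r2 or r3.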

> +
>  static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
>  {
>  	const struct bpf_func_proto *fn = NULL;
> -	struct bpf_reg_state *regs;
> +	struct bpf_reg_state *regs, retval_state;
>  	struct bpf_call_arg_meta meta;
>  	bool changes_data;
>  	int i, err;
> @@ -2415,6 +2437,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>  	}
>  
>  	regs = cur_regs(env);
> +
> +	/* before reset caller saved regs, check special ret value */
> +	do_refine_retval_range(regs, fn->ret_type, func_id, &retval_state, 1);
> +
>  	/* reset caller saved regs */
>  	for (i = 0; i < CALLER_SAVED_REGS; i++) {
>  		mark_reg_not_init(env, regs, caller_saved[i]);
> @@ -2456,6 +2482,9 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>  		return -EINVAL;
>  	}
>  
> +	/* apply additional constraints to ret value */
> +	do_refine_retval_range(regs, fn->ret_type, func_id, &retval_state, 0);
> +
>  	err = check_map_func_compatibility(env, meta.map_ptr, func_id);
>  	if (err)
>  		return err;
> -- 
> 2.9.5
> 

^ permalink raw reply

* Re: [PATCH 02/12] storsvc: don't set a bounce limit
From: Martin K. Petersen @ 2018-04-19  4:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <20180416085032.7367-3-hch-jcswGhMUV9g@public.gmane.org>


Christoph,

> The default already is to never bounce, so the call is a no-op.

Applied to 4.18/scsi-queue. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 01/12] iscsi_tcp: don't set a bounce limit
From: Martin K. Petersen @ 2018-04-19  4:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <20180416085032.7367-2-hch-jcswGhMUV9g@public.gmane.org>


Christoph,

> The default already is to never bounce, so the call is a no-op.

Applied to 4.18/scsi-queue. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/9] bpf: add bpf_get_stack helper
From: Alexei Starovoitov @ 2018-04-19  4:26 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180418165444.2263237-3-yhs@fb.com>

On Wed, Apr 18, 2018 at 09:54:37AM -0700, Yonghong Song wrote:
> Currently, stackmap and bpf_get_stackid helper are provided
> for bpf program to get the stack trace. This approach has
> a limitation though. If two stack traces have the same hash,
> only one will get stored in the stackmap table,
> so some stack traces are missing from user perspective.
> 
> This patch implements a new helper, bpf_get_stack, which will
> send stack traces directly to the bpf program. The bpf program
> is able to see all stack traces, and then can do in-kernel
> processing or send stack traces to user space through
> shared map or bpf_perf_event_output.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>

lgtm
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply

* Re: tg3 crashes under high load, when using 100Mbits
From: Siva Reddy Kallam @ 2018-04-19  4:21 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Satish Baddipadige, Prashant Sreedharan, Michael Chan,
	Linux Netdev List, Linux Kernel Mailing List, Stanley Hsiao,
	Tim Chen

On Sat, Apr 14, 2018 at 9:17 PM, Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
> Hi Satish,
>
>> On 2018Mar21, at 00:57, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>
>> Satish Baddipadige <satish.baddipadige@broadcom.com> wrote:
>>
>>> On Thu, Feb 15, 2018 at 7:37 PM, Siva Reddy Kallam
>>> <siva.kallam@broadcom.com> wrote:
>>>> On Mon, Feb 12, 2018 at 10:59 AM, Siva Reddy Kallam
>>>> <siva.kallam@broadcom.com> wrote:
>>>>> On Fri, Feb 9, 2018 at 10:41 AM, Kai Heng Feng
>>>>> <kai.heng.feng@canonical.com> wrote:
>>>>>> Hi Broadcom folks,
>>>>>>
>>>>>> We are now enabling a new platform with tg3 nic, unfortunately we observed
>>>>>> the bug [1] that dated back to 2015.
>>>>>> I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762 MRRS to
>>>>>> 2048") but it doesn't work.
>>>>>>
>>>>>> Do you have any idea how to solve the issue?
>>>>>>
>>>>>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664
>>>>>>
>>>>>> Kai-Heng
>>>>> Thank you for reporting. We will check and update you.
>>>> With link aware mode, the clock speed could be slow and boot code does not
>>>> complete within the expected time with lower link speeds. We need to override
>>>> the clock in the driver. We are checking the feasibility of adding
>>>> this in driver or firmware.
>>>
>>> Hi Kai-Heng,
>>>
>>> Can you please test the attached patch?
>>
>> I built a kernel and asked affected users to try.
>
> Users reported that the crash still happens with the patch.
>
> Kai-Heng
>
Thanks for the feedback. We will rework the patch and provide you
an update soon.
>>
>> Thanks for your work.
>>
>> Kai-Heng
>>
>>>
>>> Thanks,
>>> Satish
>>> <tg3_5762_clock_override.patch>
>

^ permalink raw reply

* Re: [RFC PATCH net-next v6 2/4] net: Introduce generic bypass module
From: Michael S. Tsirkin @ 2018-04-19  4:08 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Samudrala, Sridhar, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh
In-Reply-To: <20180418203206.GC1922@nanopsycho>

On Wed, Apr 18, 2018 at 10:32:06PM +0200, Jiri Pirko wrote:
> >> >> > With regards to alternate names for 'active', you suggested 'stolen', but i
> >> >> > am not too happy with it.
> >> >> > netvsc uses vf_netdev, are you OK with this? Or another option is 'passthru'
> >> >> No. The netdev could be any netdevice. It does not have to be a "VF".
> >> >> I think "stolen" is quite appropriate since it describes the modus
> >> >> operandi. The bypass master steals some netdevice according to some
> >> >> match.
> >> >> 
> >> >> But I don't insist on "stolen". Just sounds right.
> >> >
> >> >We are adding VIRTIO_NET_F_BACKUP as a new feature bit to enable this feature, So i think
> >> >'backup' name is consistent.
> >> 
> >> It perhaps makes sense from the view of virtio device. However, as I
> >> described couple of times, for master/slave device the name "backup" is
> >> highly misleading.
> >
> >virtio is the backup. You are supposed to use another
> >(typically passthrough) device, if that fails use virtio.
> >It does seem appropriate to me. If you like, we can
> >change that to "standby".  Active I don't like either. "main"?
> 
> Sounds much better, yes.

Excuse me, which of the versions is better in your eyes?


> 
> >
> >In fact would failover be better than bypass?
> 
> Also, much better.
> 

^ permalink raw reply

* Re: [PATCH] net: don't use kvzalloc for DMA memory
From: Michael S. Tsirkin @ 2018-04-19  4:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mikulas Patocka, David S. Miller, Eric Dumazet, Joby Poriyath,
	Ben Hutchings, netdev, linux-kernel
In-Reply-To: <f821bf46-94e9-c794-bc77-600dc19edb9a@gmail.com>

On Wed, Apr 18, 2018 at 01:38:43PM -0700, Eric Dumazet wrote:
> 
> 
> On 04/18/2018 10:55 AM, Michael S. Tsirkin wrote:
> 
> > Imagine you want to pass some data to card.
> > Natural thing is to just put it in a variable and start DMA.
> > However DMA API disallows stack access nowadays,
> > so it's natural to put this within struct device.
> > 
> > See e.g.
> > 
> > 	commit a725ee3e44e39dab1ec82cc745899a785d2a555e
> > 	Author: Andy Lutomirski <luto@kernel.org>
> > 	Date:   Mon Jul 18 15:34:49 2016 -0700
> > 
> > 	    virtio-net: Remove more stack DMA
> >
> 
> Andy just moved the problem to another one, since at that time we already
> had vmalloc() fallback for at least 2 years.
> 
> Note that my original patch had :
> 
> p = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> if (!p)
>       p = vzalloc(alloc_size);
> 
> So really, normal (less than PAGE_SIZE) allocations would have almost-zero-chance to end up to vmalloc(one_page)

Thanks Eric, I'll fix virtio.


-- 
MST

^ permalink raw reply

* [PATCH] net: hns: Avoid action name truncation
From: dann frazier @ 2018-04-19  3:55 UTC (permalink / raw)
  To: Yisen Zhuang, Salil Mehta, David S. Miller
  Cc: netdev, linux-kernel, Lin Yun Sheng

When longer interface names are used, the action names exposed in
/proc/interrupts and /proc/irq/* may be truncated. For example, when
using the predictable name algorithm in systemd on a HiSilicon D05,
I see:

  ubuntu@d05-3:~$  grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
  enahisic2i0-tx0
  enahisic2i0-tx1
  [...]
  enahisic2i0-tx8
  enahisic2i0-tx9
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1

Increase the max ring name length to allow for an interface name
of IFNAMSIZ. After this change, I now see:

  $ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
  enahisic2i0-tx0
  enahisic2i0-tx1
  enahisic2i0-tx2
  [...]
  enahisic2i0-tx8
  enahisic2i0-tx9
  enahisic2i0-tx10
  enahisic2i0-tx11
  enahisic2i0-tx12
  enahisic2i0-tx13
  enahisic2i0-tx14
  enahisic2i0-tx15

Signed-off-by: dann frazier <dann.frazier@canonical.com>
---
 drivers/net/ethernet/hisilicon/hns/hnae.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
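
For illustration only, a small user-space sketch of the truncation. The "%s-tx%d" format and the helper name format_ring_name are assumptions modelled on the names shown above, not a copy of the driver code, and IFNAMSIZ_SKETCH stands in for the kernel's IFNAMSIZ (16):

```c
#include <stddef.h>
#include <stdio.h>

#define IFNAMSIZ_SKETCH 16	/* assumed equal to the kernel's IFNAMSIZ */

/*
 * Format "<ifname>-tx<index>" into buf. snprintf() writes at most
 * buflen - 1 characters plus a NUL, and returns the length the full
 * name would have needed.
 */
static int format_ring_name(char *buf, size_t buflen,
			    const char *ifname, int index)
{
	return snprintf(buf, buflen, "%s-tx%d", ifname, index);
}
```

With a 16-byte buffer, "enahisic2i0-tx10" needs 16 characters plus the terminator, so only "enahisic2i0-tx1" is stored. A buffer of IFNAMSIZ_SKETCH + 4 bytes holds the full name.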

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 3e62692af011..fa5b30f547f6 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -87,7 +87,7 @@ do { \
 
 #define HNAE_AE_REGISTER 0x1
 
-#define RCB_RING_NAME_LEN 16
+#define RCB_RING_NAME_LEN (IFNAMSIZ + 4)
 
 #define HNAE_LOWEST_LATENCY_COAL_PARAM	30
 #define HNAE_LOW_LATENCY_COAL_PARAM	80
-- 
2.17.0

^ permalink raw reply related

* Re: KASAN: slab-out-of-bounds Write in tcp_v6_syn_recv_sock
From: Eric Biggers @ 2018-04-19  3:33 UTC (permalink / raw)
  To: syzbot; +Cc: davem, kuznet, linux-kernel, netdev, syzkaller-bugs, yoshfuji
In-Reply-To: <001a113ed0fc75c8770561d3de8c@google.com>

On Tue, Jan 02, 2018 at 03:58:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 61233580f1f33c50e159c50e24d80ffd2ba2e06b
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+6dc95bddc6976b800b0b@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending
> cookies.  Check SNMP counters.
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:344 [inline]
> BUG: KASAN: slab-out-of-bounds in tcp_v6_syn_recv_sock+0x628/0x23a0
> net/ipv6/tcp_ipv6.c:1144
> Write of size 160 at addr ffff8801cbdd7460 by task syzkaller545407/3196
> 
> CPU: 1 PID: 3196 Comm: syzkaller545407 Not tainted 4.15.0-rc5+ #241
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
>  memcpy+0x37/0x50 mm/kasan/kasan.c:303
>  memcpy include/linux/string.h:344 [inline]
>  tcp_v6_syn_recv_sock+0x628/0x23a0 net/ipv6/tcp_ipv6.c:1144
>  tcp_get_cookie_sock+0x102/0x540 net/ipv4/syncookies.c:213
>  cookie_v6_check+0x177d/0x2160 net/ipv6/syncookies.c:255
>  tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1008 [inline]
>  tcp_v6_do_rcv+0xe4d/0x11c0 net/ipv6/tcp_ipv6.c:1316
>  tcp_v6_rcv+0x22ee/0x2b40 net/ipv6/tcp_ipv6.c:1510
>  ip6_input_finish+0x36f/0x1700 net/ipv6/ip6_input.c:284
>  NF_HOOK include/linux/netfilter.h:250 [inline]
>  ip6_input+0xe9/0x560 net/ipv6/ip6_input.c:327
>  dst_input include/net/dst.h:466 [inline]
>  ip6_rcv_finish+0x1a9/0x7a0 net/ipv6/ip6_input.c:71
>  NF_HOOK include/linux/netfilter.h:250 [inline]
>  ipv6_rcv+0xf1f/0x1f80 net/ipv6/ip6_input.c:208
>  __netif_receive_skb_core+0x1a3e/0x3450 net/core/dev.c:4461
>  __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4526
>  process_backlog+0x203/0x740 net/core/dev.c:5205
>  napi_poll net/core/dev.c:5603 [inline]
>  net_rx_action+0x792/0x1910 net/core/dev.c:5669
>  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
>  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1115
>  </IRQ>
>  do_softirq.part.21+0x14d/0x190 kernel/softirq.c:329
>  do_softirq kernel/softirq.c:177 [inline]
>  __local_bh_enable_ip+0x1ee/0x230 kernel/softirq.c:182
>  local_bh_enable include/linux/bottom_half.h:32 [inline]
>  rcu_read_unlock_bh include/linux/rcupdate.h:727 [inline]
>  ip6_finish_output2+0xba6/0x2390 net/ipv6/ip6_output.c:121
>  ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
>  NF_HOOK_COND include/linux/netfilter.h:239 [inline]
>  ip6_output+0x1eb/0x840 net/ipv6/ip6_output.c:163
>  dst_output include/net/dst.h:460 [inline]
>  NF_HOOK include/linux/netfilter.h:250 [inline]
>  ip6_xmit+0xd75/0x2080 net/ipv6/ip6_output.c:269
>  inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
>  tcp_transmit_skb+0x1b12/0x38b0 net/ipv4/tcp_output.c:1176
>  tcp_write_xmit+0x680/0x5190 net/ipv4/tcp_output.c:2367
>  __tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2543
>  tcp_send_fin+0x1b0/0xd20 net/ipv4/tcp_output.c:3087
>  tcp_close+0xbe0/0xfc0 net/ipv4/tcp.c:2234
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:426
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:600
>  sock_close+0x16/0x20 net/socket.c:1129
>  __fput+0x327/0x7e0 fs/file_table.c:210
>  ____fput+0x15/0x20 fs/file_table.c:244
>  task_work_run+0x199/0x270 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>  do_group_exit+0x149/0x400 kernel/exit.c:968
>  get_signal+0x73f/0x16c0 kernel/signal.c:2335
>  do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:809
>  exit_to_usermode_loop+0x214/0x310 arch/x86/entry/common.c:158
>  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
>  syscall_return_slowpath+0x490/0x550 arch/x86/entry/common.c:264
>  entry_SYSCALL_64_fastpath+0x94/0x96
> RIP: 0033:0x4456e9
> RSP: 002b:00007fb4de631da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 00000000006dac3c RCX: 00000000004456e9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000006dac3c
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dac38
> R13: 0100000000000000 R14: 00007fb4de6329c0 R15: 0000000000000009
> 
> Allocated by task 3196:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3544
>  sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1463
>  sk_clone_lock+0x152/0x1570 net/core/sock.c:1649
>  inet_csk_clone_lock+0x92/0x4f0 net/ipv4/inet_connection_sock.c:781
>  tcp_create_openreq_child+0x9b/0x1b70 net/ipv4/tcp_minisocks.c:449
>  tcp_v6_syn_recv_sock+0x22d/0x23a0 net/ipv6/tcp_ipv6.c:1123
>  tcp_get_cookie_sock+0x102/0x540 net/ipv4/syncookies.c:213
>  cookie_v6_check+0x177d/0x2160 net/ipv6/syncookies.c:255
>  tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1008 [inline]
>  tcp_v6_do_rcv+0xe4d/0x11c0 net/ipv6/tcp_ipv6.c:1316
>  tcp_v6_rcv+0x22ee/0x2b40 net/ipv6/tcp_ipv6.c:1510
>  ip6_input_finish+0x36f/0x1700 net/ipv6/ip6_input.c:284
>  NF_HOOK include/linux/netfilter.h:250 [inline]
>  ip6_input+0xe9/0x560 net/ipv6/ip6_input.c:327
>  dst_input include/net/dst.h:466 [inline]
>  ip6_rcv_finish+0x1a9/0x7a0 net/ipv6/ip6_input.c:71
>  NF_HOOK include/linux/netfilter.h:250 [inline]
>  ipv6_rcv+0xf1f/0x1f80 net/ipv6/ip6_input.c:208
>  __netif_receive_skb_core+0x1a3e/0x3450 net/core/dev.c:4461
>  __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4526
>  process_backlog+0x203/0x740 net/core/dev.c:5205
>  napi_poll net/core/dev.c:5603 [inline]
>  net_rx_action+0x792/0x1910 net/core/dev.c:5669
>  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
> 
> Freed by task 0:
> (stack is not available)
> 
> The buggy address belongs to the object at ffff8801cbdd6a80
>  which belongs to the cache TCP of size 2528
> The buggy address is located 0 bytes to the right of
>  2528-byte region [ffff8801cbdd6a80, ffff8801cbdd7460)
> The buggy address belongs to the page:
> page:000000006145927c count:1 mapcount:0 mapping:00000000d41dd7c1
> index:0xffff8801cbdd7ffd compound_mapcount: 0
> flags: 0x2fffc0000008100(slab|head)
> raw: 02fffc0000008100 ffff8801cbdd6000 ffff8801cbdd7ffd 0000000100000003
> raw: ffffea00074ef120 ffff8801d82b7248 ffff8801d798b640 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff8801cbdd7300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  ffff8801cbdd7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > ffff8801cbdd7400: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
>                                                        ^
>  ffff8801cbdd7480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff8801cbdd7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ==================================================================
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title

No longer occurring; last occurrence was on Jan 19 (commit dda3e15231b35).
The fix was commit d91c3e17f75f2:

#syz fix: net/tls: Only attach to sockets in ESTABLISHED state

- Eric

* Re: [PATCH 2/6] rhashtable: remove incorrect comment on r{hl, hash}table_walk_enter()
From: Herbert Xu @ 2018-04-19  3:22 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <87efjcqg2r.fsf@notabene.neil.brown.name>

On Thu, Apr 19, 2018 at 08:56:28AM +1000, NeilBrown wrote:
>
> I don't want to do that - I just want the documentation to be correct
> (or at least, not be blatantly incorrect).  The function does not sleep,
> and is safe to call with spin locks held.
> Do we need to spell out when it can be called?  If so, maybe:
> 
>    This function may be called from any process context, including
>    non-preemptable context, but cannot be called from interrupts.

Just to make it perfectly clear, how about "cannot be called from
softirq or hardirq context"? Previously, the part about sleeping
completely ruled out any ambiguity, but the new wording could mislead
people into thinking that this can be called from softirq context,
where it would be unsafe if mixed with process-context usage.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH net-next] ipv6: frags: fix a lockdep false positive
From: David Miller @ 2018-04-19  3:20 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <20180418011144.71697-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 17 Apr 2018 18:11:44 -0700

> lockdep does not know that the locks used by IPv4 defrag
> and IPv6 reassembly units are of different classes.
> 
> It complains because of the following chains:
> 
> 1) sch_direct_xmit()        (lock txq->_xmit_lock)
>     dev_hard_start_xmit()
>      xmit_one()
>       dev_queue_xmit_nit()
>        packet_rcv_fanout()
>         ip_check_defrag()
>          ip_defrag()
>           spin_lock()     (lock frag queue spinlock)
> 
> 2) ip6_input_finish()
>     ipv6_frag_rcv()       (lock frag queue spinlock)
>      ip6_frag_queue()
>       icmpv6_param_prob() (lock txq->_xmit_lock at some point)
> 
> We could add lockdep annotations, but we also can make sure IPv6
> calls icmpv6_param_prob() only after the release of the frag queue spinlock,
> since this naturally makes frag queue spinlock a leaf in lock hierarchy.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> Note to David: I chose net-next because of recent changes in net-next,
> and because it is a false positive, but I can respin for the net tree
> if you prefer. Thanks!

Yeah I think net-next is fine for this.

Applied, thanks.
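
For readers following the thread, the reordering described in the quoted commit message can be sketched in userspace roughly as follows. This is only an illustration, not the kernel change itself: the pthread mutexes stand in for the frag queue spinlock and txq->_xmit_lock, and every name below (frag_queue_add, send_param_problem, errors_sent) is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct frag_queue {
	pthread_mutex_t lock;	/* plays the role of the frag queue spinlock */
	int fragments;
};

/* Plays the role of txq->_xmit_lock. */
static pthread_mutex_t xmit_lock = PTHREAD_MUTEX_INITIALIZER;
static int errors_sent;

/* Stand-in for icmpv6_param_prob(): it ends up taking the transmit lock. */
static void send_param_problem(void)
{
	pthread_mutex_lock(&xmit_lock);
	errors_sent++;
	pthread_mutex_unlock(&xmit_lock);
}

/* Queue one fragment. Returns true if the fragment was accepted. */
static bool frag_queue_add(struct frag_queue *fq, bool malformed)
{
	bool report = false;

	pthread_mutex_lock(&fq->lock);
	if (malformed)
		report = true;	/* only note the error while locked */
	else
		fq->fragments++;
	pthread_mutex_unlock(&fq->lock);

	/* The queue lock is already released here, so it never nests
	 * around xmit_lock and stays a leaf in the lock hierarchy.
	 */
	if (report)
		send_param_problem();

	return !report;
}
```

The point of the sketch is only the ordering: the error is recorded under the queue lock and reported after the unlock, so no other lock is ever acquired while the queue lock is held.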

* Re: [PATCH v4 00/10] New network driver for Amiga X-Surf 100 (m68k)
From: Michael Schmitz @ 2018-04-19  3:17 UTC (permalink / raw)
  To: netdev; +Cc: andrew, fthain, geert, f.fainelli, linux-m68k, Michael.Karcher
In-Reply-To: <1524103526-12240-1-git-send-email-schmitzmic@gmail.com>

Hi,

messed up the subject there, sorry - this was meant to be

[PATCH v4 0/9] net-next: New network driver for Amiga X-Surf 100 (m68k)

Cheers,

	Michael


Am 19.04.2018 um 14:05 schrieb Michael Schmitz:
> [This is a resend of my v3 series which was based on the wrong version and
> tree. The only substantial change is to the Asix AX88796B PHY driver.]
> 
> This patch series adds support for the Individual Computers X-Surf 100
> network card for m68k Amiga, a network adapter based on the AX88796 chip set.
> 
> The driver was originally written for kernel version 3.19 by Michael Karcher
> (see CC:), and adapted to 4.16+ for submission to netdev by me. Questions
> regarding motivation for some of the changes are probably best directed at
> Michael Karcher.
> 
> The driver has been tested by Adrian <glaubitz@physik.fu-berlin.de> who will
> send his Tested-by tag separately.
> 
> A few changes to the ax88796 driver were required:
> - to read the MAC address, some setup of the ax88796 chip must be done,
> - attach to the MII bus only on device open to allow module unloading,
> - allow ax_block_input/ax_block_output to be superseded by card-specific
>   optimized code,
> - use an optional interrupt status callback to allow easier sharing of the
>   card interrupt,
> - set IRQF_SHARED if platform IRQ resource is marked shareable
> 
> The Asix Electronics PHY used on the X-Surf 100 is buggy, and causes the
> software reset to hang if the previous command sent to the PHY was also
> a soft reset. This bug requires addition of a PHY driver for Asix PHYs
> to provide a fixed .soft_reset function, included in this series.
> 
> Some additional cleanup:
> - do not attempt to free IRQ in ax_remove (complements 82533ad9a1c),
> - clear platform drvdata on probe fail and module remove.
> 
> Changes since v1:
> 
> Raised in review by Andrew Lunn:
> - move MII code around to avoid need for forward declaration,
> - combine patches 2 and 7 to add cleanup in error path
> 
> Changes since v2:
> 
> - corrected authorship attribution to Michael Karcher
> 
> Suggested by Geert Uytterhoeven:
> - use ei_local->reset_8390() instead of duplicating ax_reset_8390(),
> - use %pR to format struct resource pointers,
> - assign pdev and xs100 pointers in declaration,
> - don't split error messages,
> - change Kconfig logic to only require XSURF100 set on Amiga
> 
> Suggested by Andrew Lunn:
> - add COMPILE_TEST to ax88796 Kconfig options,
> - use new Asix PHY driver for X-Surf 100
> 
> Suggested by Andrew Lunn/Finn Thain:
> - declare struct sk_buff in ax88796.h,
> - correct whitespace error in ax88796.h
> 
> Changes since v3:
> 
> - various checkpatch cleanup
> 
> Andrew Lunn:
> - don't duplicate genphy_soft_reset in Asix PHY driver, just call 
>   genphy_soft_reset after writing zero to control register
> 
> This series' patches, in order:
> 
> 1/9 net-next: phy: new Asix Electronics PHY driver
> 2/9 net-next: ax88796: Fix MAC address reading
> 3/9 net-next: ax88796: Attach MII bus only when open
> 4/9 net-next: ax88796: Do not free IRQ in ax_remove() (already freed in ax_close()).
> 5/9 net-next: ax88796: Add block_input/output hooks to ax_plat_data
> 6/9 net-next: ax88796: add interrupt status callback to platform data
> 7/9 net-next: ax88796: set IRQF_SHARED flag when IRQ resource is marked as shareable
> 8/9 net-next: ax88796: release platform device drvdata on probe error and module remove
> 9/9 net-next: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)
> 
>  drivers/net/ethernet/8390/Kconfig    |   17 ++-
>  drivers/net/ethernet/8390/Makefile   |    1 +
>  drivers/net/ethernet/8390/ax88796.c  |  228 ++++++++++++--------
>  drivers/net/ethernet/8390/xsurf100.c |  382 ++++++++++++++++++++++++++++++++++
>  drivers/net/phy/Kconfig              |    6 +
>  drivers/net/phy/Makefile             |    1 +
>  drivers/net/phy/asix.c               |   63 ++++++
>  include/net/ax88796.h                |   14 ++
>  8 files changed, 617 insertions(+), 95 deletions(-)
> 
> Cheers,
> 
> 	Michael
> 

* Re: [Regression] net/phy/micrel.c v4.9.94
From: Chris Ruehl @ 2018-04-19  2:34 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: f.fainelli, netdev
In-Reply-To: <24c8e4cd-f2c1-6d50-9557-28a9958e74a5@gtsys.com.hk>

On Thursday, April 19, 2018 09:34 AM, Chris Ruehl wrote:
> 
> 
> On Thursday, April 19, 2018 09:21 AM, Chris Ruehl wrote:
>> On Wednesday, April 18, 2018 09:02 PM, Andrew Lunn wrote:
>>> On Wed, Apr 18, 2018 at 02:56:01PM +0200, Andrew Lunn wrote:
>>>> On Wed, Apr 18, 2018 at 09:34:16AM +0800, Chris Ruehl wrote:
>>>>> Hello,
>>>>>
>>>>> I'd like to give you a heads-up about a regression introduced in 4.9.94.
>>>>> The commit leads to a kernel oops and makes the network unusable on my
>>>>> MX6DL customized board.
>>>>>
>>>>> Race condition: resume is called on startup while the PHY is not yet
>>>>> initialized.
>>>>
>>>> Hi Chris
>>>>
>>>> Please could you try
>>>>
>>>> bfe72442578b ("net: phy: micrel: fix crash when statistic requested for 
>>>> KSZ9031 phy")
>>>
>>> I don't think it is a complete fix. I suspect "Micrel KSZ8795",
>>> "Micrel KSZ886X Switch", "Micrel KSZ8061", and "Micrel KS8737" will
>>> still have problems.
>>>
>>> Those four probably need a:
>>>
>>>          .probe          = kszphy_probe,
>>>
>>>     Andrew
>>>
>>
>> Indeed, I have the
>> [    7.385851] Micrel KSZ9031 Gigabit PHY 2188000.ethernet-1:05: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=2188000.ethernet-1:05, irq=-1)
>>
>> First I'll roll back to a non-crashing stable kernel.
>>
>> Since "bfe72442578b" is going to fix it, I'll check with the next update and
>> verify that it works for me.
>>
>> Thanks
>> Chris
> 
> Andrew,
> 
> I changed my mind. I found the patch you mentioned and will apply and test it.
> 
> Chris

Andrew,

I made only a micro patch and can confirm that adding the .probe fixes the oops.

But the imx serial is broken :-( ! I'll have to look into it.

--- linux-4.9/drivers/net/phy/micrel.c.orig     2018-04-19 09:37:45.648000000 +0800
+++ linux-4.9/drivers/net/phy/micrel.c  2018-04-19 09:44:54.356000000 +0800
@@ -974,6 +974,7 @@
         .features       = (PHY_GBIT_FEATURES | SUPPORTED_Pause),
         .flags          = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
         .driver_data    = &ksz9021_type,
+       .probe          = kszphy_probe,
         .config_init    = ksz9021_config_init,
         .config_aneg    = genphy_config_aneg,
         .read_status    = genphy_read_status,
@@ -993,6 +994,7 @@
         .features       = (PHY_GBIT_FEATURES | SUPPORTED_Pause),
         .flags          = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
         .driver_data    = &ksz9021_type,
+       .probe          = kszphy_probe,
         .config_init    = ksz9031_config_init,
         .config_aneg    = genphy_config_aneg,
         .read_status    = ksz9031_read_status,

[    7.219735] Micrel KSZ9031 Gigabit PHY 2188000.ethernet-1:04: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=2188000.ethernet-1:04, irq=-1)
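
As a rough illustration of why a missing .probe hook can matter, here is a small userspace model. It assumes the oops comes from a callback that dereferences per-device private data which only the probe hook allocates; that is an assumption about the failure mode, not something shown in the logs above. All types and names (phy_drv, phy_dev, ksz_priv, phy_attach) are hypothetical simplifications, not the phylib API.

```c
#include <stddef.h>
#include <stdlib.h>

struct phy_dev;

/* Hypothetical, heavily simplified driver descriptor. */
struct phy_drv {
	const char *name;
	int (*probe)(struct phy_dev *dev);	/* optional */
	int (*resume)(struct phy_dev *dev);
};

struct phy_dev {
	const struct phy_drv *drv;
	void *priv;			/* set up by probe, if there is one */
};

struct ksz_priv {
	int led_mode;
};

/* Allocates the private data that later callbacks rely on. */
static int ksz_probe(struct phy_dev *dev)
{
	dev->priv = calloc(1, sizeof(struct ksz_priv));
	return dev->priv ? 0 : -1;
}

static int ksz_resume(struct phy_dev *dev)
{
	struct ksz_priv *priv = dev->priv;

	/* A callback that dereferenced priv unconditionally would crash
	 * here; the check only turns that crash into an error code so
	 * the model can be exercised safely.
	 */
	if (!priv)
		return -1;
	return priv->led_mode;
}

/* Bind a driver to a device, calling probe only when it is provided. */
static int phy_attach(struct phy_dev *dev, const struct phy_drv *drv)
{
	dev->drv = drv;
	dev->priv = NULL;
	if (drv->probe)
		return drv->probe(dev);
	return 0;
}
```

In this model a driver entry without a probe hook attaches successfully but leaves priv as NULL, so the first callback that needs it fails. Adding the hook to the entry is enough to make the same callback work.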



Regards
Chris



-- 
GTSYS Limited RFID Technology
9/F, Unit E, R07, Kwai Shing Industrial Building Phase 2,
42-46 Tai Lin Pai Road, Kwai Chung, N.T., Hong Kong
Tel (852) 9079 9521

Disclaimer: https://www.gtsys.com.hk/email/classified.html

* Re: [RFC PATCH ghak32 V2 12/13] audit: NETFILTER_PKT: record each container ID associated with a netNS
From: Paul Moore @ 2018-04-19  2:10 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <66adde01c1dda792aff99a457eea576a0b08ca98.1521179281.git.rgb@redhat.com>

On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Add container ID auxiliary record(s) to NETFILTER_PKT event standalone
> records.  Iterate through all potential container IDs associated with a
> network namespace.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  kernel/audit.c           |  1 +
>  kernel/auditsc.c         |  2 ++
>  net/netfilter/xt_AUDIT.c | 15 ++++++++++++++-
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 08662b4..3c77e47 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2102,6 +2102,7 @@ int audit_log_container_info(struct audit_context *context,
>         audit_log_end(ab);
>         return 0;
>  }
> +EXPORT_SYMBOL(audit_log_container_info);
>
>  void audit_log_key(struct audit_buffer *ab, char *key)
>  {
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 208da962..af68d01 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -975,6 +975,7 @@ struct audit_context *audit_alloc_local(void)
>         context->in_syscall = 1;
>         return context;
>  }
> +EXPORT_SYMBOL(audit_alloc_local);
>
>  inline void audit_free_context(struct audit_context *context)
>  {
> @@ -989,6 +990,7 @@ inline void audit_free_context(struct audit_context *context)
>         audit_proctitle_free(context);
>         kfree(context);
>  }
> +EXPORT_SYMBOL(audit_free_context);
>
>  static int audit_log_pid_context(struct audit_context *context, pid_t pid,
>                                  kuid_t auid, kuid_t uid, unsigned int sessionid,
> diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
> index c502419..edaa456 100644
> --- a/net/netfilter/xt_AUDIT.c
> +++ b/net/netfilter/xt_AUDIT.c
> @@ -71,10 +71,14 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
>  {
>         struct audit_buffer *ab;
>         int fam = -1;
> +       struct audit_context *context = audit_alloc_local();
> +       struct audit_containerid *cont;
> +       int i = 0;
> +       struct net *net;
>
>         if (audit_enabled == 0)
>                 goto errout;

Do I need to say it?  I probably should ... the allocation should
happen after the audit_enabled check.
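
A minimal userspace sketch of the ordering meant here, with hypothetical stand-ins (logging_enabled for audit_enabled, context_alloc/context_free for the audit context helpers); it is not the audit API, only the check-then-allocate pattern:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for audit_enabled and the audit context. */
static bool logging_enabled;
static int total_allocs;
static int live_contexts;

struct log_context {
	int unused;
};

static struct log_context *context_alloc(void)
{
	struct log_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx) {
		total_allocs++;
		live_contexts++;
	}
	return ctx;
}

static void context_free(struct log_context *ctx)
{
	if (!ctx)
		return;
	live_contexts--;
	free(ctx);
}

/* Returns true if a record was emitted. */
static bool log_packet(void)
{
	struct log_context *ctx;

	/* Cheap check first: nothing is allocated when logging is off. */
	if (!logging_enabled)
		return false;

	ctx = context_alloc();
	if (!ctx)
		return false;

	/* ... build and emit the record here ... */

	context_free(ctx);
	return true;
}
```

With the check first, the disabled path neither allocates nor needs a matching free on the way out.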

> -       ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> +       ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
>         if (ab == NULL)
>                 goto errout;
>
> @@ -104,7 +108,16 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
>
>         audit_log_end(ab);
>
> +       net = sock_net(NETLINK_CB(skb).sk);
> +       list_for_each_entry(cont, &net->audit_containerid, list) {
> +               char buf[14];
> +
> +               sprintf(buf, "net%u", i++);
> +               audit_log_container_info(context, buf, cont->id);
> +       }

It seems like this could (should?) be hidden inside an audit function,
e.g. audit_log_net_containers() or something like that.
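
Something along these lines, shown as a userspace sketch rather than kernel code. The list type, the emit callback and the helper name log_net_containers are all hypothetical; the callback stands in for audit_log_container_info(), and the u64-style ID type is an assumption.

```c
#include <stdio.h>

/* Hypothetical list node holding one container ID. */
struct container_entry {
	unsigned long long id;
	struct container_entry *next;
};

/* Hypothetical record sink; stands in for audit_log_container_info(). */
typedef void (*emit_fn)(const char *op, unsigned long long id, void *arg);

/* Emit one record per container ID, labelled net0, net1, ...
 * Returns the number of records emitted.
 */
static unsigned int log_net_containers(const struct container_entry *head,
				       emit_fn emit, void *arg)
{
	const struct container_entry *cont;
	unsigned int i = 0;

	for (cont = head; cont; cont = cont->next) {
		char buf[14];	/* "net" + up to 10 digits + NUL */

		snprintf(buf, sizeof(buf), "net%u", i++);
		emit(buf, cont->id, arg);
	}
	return i;
}

/* Example sink that remembers the last record it saw. */
struct emit_log {
	unsigned int count;
	char last_op[14];
	unsigned long long last_id;
};

static void record_emit(const char *op, unsigned long long id, void *arg)
{
	struct emit_log *log = arg;

	log->count++;
	snprintf(log->last_op, sizeof(log->last_op), "%s", op);
	log->last_id = id;
}
```

The caller would then shrink to a single call, and the label format and buffer sizing live in one place.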

>  errout:
> +       audit_free_context(context);
>         return XT_CONTINUE;
>  }

-- 
paul moore
www.paul-moore.com

* [PATCH v4 9/9] net-next: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)
From: Michael Schmitz @ 2018-04-19  2:05 UTC (permalink / raw)
  To: netdev
  Cc: andrew, fthain, geert, f.fainelli, linux-m68k, Michael.Karcher,
	Michael Karcher, Michael Schmitz
In-Reply-To: <1524025616-3722-1-git-send-email-schmitzmic@gmail.com>

From: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>

Add platform device driver to populate the ax88796 platform data from
information provided by the XSurf100 zorro device driver. The ax88796
module will be loaded through this module's probe function.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>

---

Changes in v3:

Suggested by Geert Uytterhoeven:
- use ei_local->reset_8390() instead of duplicating ax_reset_8390()
- use %pR to format struct resource pointers
- assign pdev and xs100 pointers in declaration
- don't split error messages
- change Kconfig logic to only require XSURF100 set on Amiga

Suggested by Andrew Lunn:
- add COMPILE_TEST to ax88796 Kconfig options
- use new Asix PHY driver for X-Surf 100
---
 drivers/net/ethernet/8390/Kconfig    |   17 ++-
 drivers/net/ethernet/8390/Makefile   |    1 +
 drivers/net/ethernet/8390/xsurf100.c |  382 ++++++++++++++++++++++++++++++++++
 3 files changed, 398 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/8390/xsurf100.c

diff --git a/drivers/net/ethernet/8390/Kconfig b/drivers/net/ethernet/8390/Kconfig
index 9fee7c8..f2f0264 100644
--- a/drivers/net/ethernet/8390/Kconfig
+++ b/drivers/net/ethernet/8390/Kconfig
@@ -29,8 +29,8 @@ config PCMCIA_AXNET
 	  called axnet_cs.  If unsure, say N.
 
 config AX88796
-	tristate "ASIX AX88796 NE2000 clone support"
-	depends on (ARM || MIPS || SUPERH)
+	tristate "ASIX AX88796 NE2000 clone support" if !ZORRO
+	depends on (ARM || MIPS || SUPERH || ZORRO || COMPILE_TEST)
 	select CRC32
 	select PHYLIB
 	select MDIO_BITBANG
@@ -45,6 +45,19 @@ config AX88796_93CX6
 	---help---
 	  Select this if your platform comes with an external 93CX6 eeprom.
 
+config XSURF100
+	tristate "Amiga XSurf 100 AX88796/NE2000 clone support"
+	depends on ZORRO
+	select AX88796
+	select ASIX_PHY
+	help
+	  This driver is for the Individual Computers X-Surf 100 Ethernet
+	  card (based on the Asix AX88796 chip). If you have such a card,
+	  say Y. Otherwise, say N.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called xsurf100.
+
 config HYDRA
 	tristate "Hydra support"
 	depends on ZORRO
diff --git a/drivers/net/ethernet/8390/Makefile b/drivers/net/ethernet/8390/Makefile
index 1d650e6..85c83c5 100644
--- a/drivers/net/ethernet/8390/Makefile
+++ b/drivers/net/ethernet/8390/Makefile
@@ -16,4 +16,5 @@ obj-$(CONFIG_PCMCIA_PCNET) += pcnet_cs.o 8390.o
 obj-$(CONFIG_STNIC) += stnic.o 8390.o
 obj-$(CONFIG_ULTRA) += smc-ultra.o 8390.o
 obj-$(CONFIG_WD80x3) += wd.o 8390.o
+obj-$(CONFIG_XSURF100) += xsurf100.o
 obj-$(CONFIG_ZORRO8390) += zorro8390.o
diff --git a/drivers/net/ethernet/8390/xsurf100.c b/drivers/net/ethernet/8390/xsurf100.c
new file mode 100644
index 0000000..e2c9638
--- /dev/null
+++ b/drivers/net/ethernet/8390/xsurf100.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/zorro.h>
+#include <net/ax88796.h>
+#include <asm/amigaints.h>
+
+#define ZORRO_PROD_INDIVIDUAL_COMPUTERS_X_SURF100 \
+		ZORRO_ID(INDIVIDUAL_COMPUTERS, 0x64, 0)
+
+#define XS100_IRQSTATUS_BASE 0x40
+#define XS100_8390_BASE 0x800
+
+/* Longword-access area. Translated to 2 16-bit access cycles by the
+ * X-Surf 100 FPGA
+ */
+#define XS100_8390_DATA32_BASE 0x8000
+#define XS100_8390_DATA32_SIZE 0x2000
+/* Sub-Areas for fast data register access; addresses relative to area begin */
+#define XS100_8390_DATA_READ32_BASE 0x0880
+#define XS100_8390_DATA_WRITE32_BASE 0x0C80
+#define XS100_8390_DATA_AREA_SIZE 0x80
+
+#define __NS8390_init ax_NS8390_init
+
+/* force unsigned long back to 'void __iomem *' */
+#define ax_convert_addr(_a) ((void __force __iomem *)(_a))
+
+#define ei_inb(_a) z_readb(ax_convert_addr(_a))
+#define ei_outb(_v, _a) z_writeb(_v, ax_convert_addr(_a))
+
+#define ei_inw(_a) z_readw(ax_convert_addr(_a))
+#define ei_outw(_v, _a) z_writew(_v, ax_convert_addr(_a))
+
+#define ei_inb_p(_a) ei_inb(_a)
+#define ei_outb_p(_v, _a) ei_outb(_v, _a)
+
+/* define EI_SHIFT() to take into account our register offsets */
+#define EI_SHIFT(x) (ei_local->reg_offset[(x)])
+
+/* Ensure we have our RCR base value */
+#define AX88796_PLATFORM
+
+static unsigned char version[] =
+		"ax88796.c: Copyright 2005,2007 Simtec Electronics\n";
+
+#include "lib8390.c"
+
+/* from ne.c */
+#define NE_CMD		EI_SHIFT(0x00)
+#define NE_RESET	EI_SHIFT(0x1f)
+#define NE_DATAPORT	EI_SHIFT(0x10)
+
+struct xsurf100_ax_plat_data {
+	struct ax_plat_data ax;
+	void __iomem *base_regs;
+	void __iomem *data_area;
+};
+
+static int is_xsurf100_network_irq(struct platform_device *pdev)
+{
+	struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+	return (readw(xs100->base_regs + XS100_IRQSTATUS_BASE) & 0xaaaa) != 0;
+}
+
+/* These functions guarantee that the iomem is accessed with 32 bit
+ * cycles only. z_memcpy_fromio / z_memcpy_toio don't
+ */
+static void z_memcpy_fromio32(void *dst, const void __iomem *src, size_t bytes)
+{
+	while (bytes > 32) {
+		asm __volatile__
+		   ("movem.l (%0)+,%%d0-%%d7\n"
+		    "movem.l %%d0-%%d7,(%1)\n"
+		    "adda.l #32,%1" : "=a"(src), "=a"(dst)
+		    : "0"(src), "1"(dst) : "d0", "d1", "d2", "d3", "d4",
+					   "d5", "d6", "d7", "memory");
+		bytes -= 32;
+	}
+	while (bytes) {
+		*(uint32_t *)dst = z_readl(src);
+		src += 4;
+		dst += 4;
+		bytes -= 4;
+	}
+}
+
+static void z_memcpy_toio32(void __iomem *dst, const void *src, size_t bytes)
+{
+	while (bytes) {
+		z_writel(*(const uint32_t *)src, dst);
+		src += 4;
+		dst += 4;
+		bytes -= 4;
+	}
+}
+
+static void xs100_write(struct net_device *dev, const void *src,
+			unsigned int count)
+{
+	struct ei_device *ei_local = netdev_priv(dev);
+	struct platform_device *pdev = to_platform_device(dev->dev.parent);
+	struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+	/* copy whole blocks */
+	while (count > XS100_8390_DATA_AREA_SIZE) {
+		z_memcpy_toio32(xs100->data_area +
+				XS100_8390_DATA_WRITE32_BASE, src,
+				XS100_8390_DATA_AREA_SIZE);
+		src += XS100_8390_DATA_AREA_SIZE;
+		count -= XS100_8390_DATA_AREA_SIZE;
+	}
+	/* copy whole dwords */
+	z_memcpy_toio32(xs100->data_area + XS100_8390_DATA_WRITE32_BASE,
+			src, count & ~3);
+	src += count & ~3;
+	if (count & 2) {
+		ei_outw(*(uint16_t *)src, ei_local->mem + NE_DATAPORT);
+		src += 2;
+	}
+	if (count & 1)
+		ei_outb(*(uint8_t *)src, ei_local->mem + NE_DATAPORT);
+}
+
+static void xs100_read(struct net_device *dev, void *dst, unsigned int count)
+{
+	struct ei_device *ei_local = netdev_priv(dev);
+	struct platform_device *pdev = to_platform_device(dev->dev.parent);
+	struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+	/* copy whole blocks */
+	while (count > XS100_8390_DATA_AREA_SIZE) {
+		z_memcpy_fromio32(dst, xs100->data_area +
+				  XS100_8390_DATA_READ32_BASE,
+				  XS100_8390_DATA_AREA_SIZE);
+		dst += XS100_8390_DATA_AREA_SIZE;
+		count -= XS100_8390_DATA_AREA_SIZE;
+	}
+	/* copy whole dwords */
+	z_memcpy_fromio32(dst, xs100->data_area + XS100_8390_DATA_READ32_BASE,
+			  count & ~3);
+	dst += count & ~3;
+	if (count & 2) {
+		*(uint16_t *)dst = ei_inw(ei_local->mem + NE_DATAPORT);
+		dst += 2;
+	}
+	if (count & 1)
+		*(uint8_t *)dst = ei_inb(ei_local->mem + NE_DATAPORT);
+}
+
+/* Block input and output, similar to the Crynwr packet driver. If
+ * you are porting to a new ethercard, look at the packet driver
+ * source for hints. The NEx000 doesn't share the on-board packet
+ * memory -- you have to put the packet out through the "remote DMA"
+ * dataport using ei_outb.
+ */
+static void xs100_block_input(struct net_device *dev, int count,
+			      struct sk_buff *skb, int ring_offset)
+{
+	struct ei_device *ei_local = netdev_priv(dev);
+	void __iomem *nic_base = ei_local->mem;
+	char *buf = skb->data;
+
+	if (ei_local->dmaing) {
+		netdev_err(dev,
+			   "DMAing conflict in %s [DMAstat:%d][irqlock:%d]\n",
+			   __func__,
+			   ei_local->dmaing, ei_local->irqlock);
+		return;
+	}
+
+	ei_local->dmaing |= 0x01;
+
+	ei_outb(E8390_NODMA + E8390_PAGE0 + E8390_START, nic_base + NE_CMD);
+	ei_outb(count & 0xff, nic_base + EN0_RCNTLO);
+	ei_outb(count >> 8, nic_base + EN0_RCNTHI);
+	ei_outb(ring_offset & 0xff, nic_base + EN0_RSARLO);
+	ei_outb(ring_offset >> 8, nic_base + EN0_RSARHI);
+	ei_outb(E8390_RREAD + E8390_START, nic_base + NE_CMD);
+
+	xs100_read(dev, buf, count);
+
+	ei_local->dmaing &= ~1;
+}
+
+static void xs100_block_output(struct net_device *dev, int count,
+			       const unsigned char *buf, const int start_page)
+{
+	struct ei_device *ei_local = netdev_priv(dev);
+	void __iomem *nic_base = ei_local->mem;
+	unsigned long dma_start;
+
+	/* Round the count up for word writes. Do we need to do this?
+	 * What effect will an odd byte count have on the 8390?  I
+	 * should check someday.
+	 */
+	if (ei_local->word16 && (count & 0x01))
+		count++;
+
+	/* This *shouldn't* happen. If it does, it's the last thing
+	 * you'll see
+	 */
+	if (ei_local->dmaing) {
+		netdev_err(dev,
+			   "DMAing conflict in %s [DMAstat:%d][irqlock:%d]\n",
+			   __func__,
+			   ei_local->dmaing, ei_local->irqlock);
+		return;
+	}
+
+	ei_local->dmaing |= 0x01;
+	/* We should already be in page 0, but to be safe... */
+	ei_outb(E8390_PAGE0 + E8390_START + E8390_NODMA, nic_base + NE_CMD);
+
+	ei_outb(ENISR_RDC, nic_base + EN0_ISR);
+
+	/* Now the normal output. */
+	ei_outb(count & 0xff, nic_base + EN0_RCNTLO);
+	ei_outb(count >> 8, nic_base + EN0_RCNTHI);
+	ei_outb(0x00, nic_base + EN0_RSARLO);
+	ei_outb(start_page, nic_base + EN0_RSARHI);
+
+	ei_outb(E8390_RWRITE + E8390_START, nic_base + NE_CMD);
+
+	xs100_write(dev, buf, count);
+
+	dma_start = jiffies;
+
+	while ((ei_inb(nic_base + EN0_ISR) & ENISR_RDC) == 0) {
+		if (jiffies - dma_start > 2 * HZ / 100) {	/* 20ms */
+			netdev_warn(dev, "timeout waiting for Tx RDC.\n");
+			ei_local->reset_8390(dev);
+			ax_NS8390_init(dev, 1);
+			break;
+		}
+	}
+
+	ei_outb(ENISR_RDC, nic_base + EN0_ISR);	/* Ack intr. */
+	ei_local->dmaing &= ~0x01;
+}
+
+static int xsurf100_probe(struct zorro_dev *zdev,
+			  const struct zorro_device_id *ent)
+{
+	struct platform_device *pdev;
+	struct xsurf100_ax_plat_data ax88796_data;
+	struct resource res[2] = {
+		DEFINE_RES_NAMED(IRQ_AMIGA_PORTS, 1, NULL,
+				 IORESOURCE_IRQ | IORESOURCE_IRQ_SHAREABLE),
+		DEFINE_RES_MEM(zdev->resource.start + XS100_8390_BASE,
+			       4 * 0x20)
+	};
+	int reg;
+	/* This table is referenced in the device structure, so it must
+	 * outlive the scope of xsurf100_probe.
+	 */
+	static u32 reg_offsets[32];
+	int ret = 0;
+
+	/* X-Surf 100 control and 32 bit ring buffer data access areas.
+	 * These resources are not used by the ax88796 driver, so must
+	 * be requested here and passed via platform data.
+	 */
+
+	if (!request_mem_region(zdev->resource.start, 0x100, zdev->name)) {
+		dev_err(&zdev->dev, "cannot reserve X-Surf 100 control registers\n");
+		return -ENXIO;
+	}
+
+	if (!request_mem_region(zdev->resource.start +
+				XS100_8390_DATA32_BASE,
+				XS100_8390_DATA32_SIZE,
+				"X-Surf 100 32-bit data access")) {
+		dev_err(&zdev->dev, "cannot reserve 32-bit area\n");
+		ret = -ENXIO;
+		goto exit_req;
+	}
+
+	for (reg = 0; reg < 0x20; reg++)
+		reg_offsets[reg] = 4 * reg;
+
+	memset(&ax88796_data, 0, sizeof(ax88796_data));
+	ax88796_data.ax.flags = AXFLG_HAS_EEPROM;
+	ax88796_data.ax.wordlength = 2;
+	ax88796_data.ax.dcr_val = 0x48;
+	ax88796_data.ax.rcr_val = 0x40;
+	ax88796_data.ax.reg_offsets = reg_offsets;
+	ax88796_data.ax.check_irq = is_xsurf100_network_irq;
+	ax88796_data.base_regs = ioremap(zdev->resource.start, 0x100);
+
+	/* error handling for ioremap regs */
+	if (!ax88796_data.base_regs) {
+		dev_err(&zdev->dev, "Cannot ioremap area %pR (registers)\n",
+			&zdev->resource);
+
+		ret = -ENXIO;
+		goto exit_req2;
+	}
+
+	ax88796_data.data_area = ioremap(zdev->resource.start +
+			XS100_8390_DATA32_BASE, XS100_8390_DATA32_SIZE);
+
+	/* error handling for ioremap data */
+	if (!ax88796_data.data_area) {
+		dev_err(&zdev->dev,
+			"Cannot ioremap area %pR offset %x (32-bit access)\n",
+			&zdev->resource,  XS100_8390_DATA32_BASE);
+
+		ret = -ENXIO;
+		goto exit_mem;
+	}
+
+	ax88796_data.ax.block_output = xs100_block_output;
+	ax88796_data.ax.block_input = xs100_block_input;
+
+	pdev = platform_device_register_resndata(&zdev->dev, "ax88796",
+						 zdev->slotaddr, res, 2,
+						 &ax88796_data,
+						 sizeof(ax88796_data));
+
+	if (IS_ERR(pdev)) {
+		dev_err(&zdev->dev, "cannot register platform device\n");
+		ret = -ENXIO;
+		goto exit_mem2;
+	}
+
+	zorro_set_drvdata(zdev, pdev);
+
+	if (!ret)
+		return 0;
+
+ exit_mem2:
+	iounmap(ax88796_data.data_area);
+
+ exit_mem:
+	iounmap(ax88796_data.base_regs);
+
+ exit_req2:
+	release_mem_region(zdev->resource.start + XS100_8390_DATA32_BASE,
+			   XS100_8390_DATA32_SIZE);
+
+ exit_req:
+	release_mem_region(zdev->resource.start, 0x100);
+
+	return ret;
+}
+
+static void xsurf100_remove(struct zorro_dev *zdev)
+{
+	struct platform_device *pdev = zorro_get_drvdata(zdev);
+	struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+	platform_device_unregister(pdev);
+
+	iounmap(xs100->base_regs);
+	release_mem_region(zdev->resource.start, 0x100);
+	iounmap(xs100->data_area);
+	release_mem_region(zdev->resource.start + XS100_8390_DATA32_BASE,
+			   XS100_8390_DATA32_SIZE);
+}
+
+static const struct zorro_device_id xsurf100_zorro_tbl[] = {
+	{ ZORRO_PROD_INDIVIDUAL_COMPUTERS_X_SURF100, },
+	{ 0 }
+};
+
+MODULE_DEVICE_TABLE(zorro, xsurf100_zorro_tbl);
+
+static struct zorro_driver xsurf100_driver = {
+	.name           = "xsurf100",
+	.id_table       = xsurf100_zorro_tbl,
+	.probe          = xsurf100_probe,
+	.remove         = xsurf100_remove,
+};
+
+module_driver(xsurf100_driver, zorro_register_driver, zorro_unregister_driver);
+
+MODULE_DESCRIPTION("X-Surf 100 driver");
+MODULE_AUTHOR("Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>");
+MODULE_LICENSE("GPL v2");
-- 
1.7.0.4
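
For reviewers who want to check the chunking arithmetic in xs100_write() and xs100_read() without hardware, the following userspace sketch mirrors how a byte count is split. It is not part of the patch; plan_transfer and struct xfer_plan are hypothetical names, and AREA_SIZE simply mirrors XS100_8390_DATA_AREA_SIZE.

```c
#include <stddef.h>

#define AREA_SIZE 0x80	/* mirrors XS100_8390_DATA_AREA_SIZE */

/* How a transfer of 'count' bytes is broken up. */
struct xfer_plan {
	size_t full_blocks;	/* AREA_SIZE-byte copies through the 32-bit window */
	size_t dword_bytes;	/* remaining bytes moved as 32-bit words */
	size_t word_bytes;	/* 0 or 2, moved through the 16-bit data port */
	size_t byte_bytes;	/* 0 or 1, moved through the 8-bit data port */
};

static struct xfer_plan plan_transfer(size_t count)
{
	struct xfer_plan p = { 0, 0, 0, 0 };

	/* Same strict '>' as the driver loop: a count of exactly
	 * AREA_SIZE is handled by the dword step below.
	 */
	while (count > AREA_SIZE) {
		p.full_blocks++;
		count -= AREA_SIZE;
	}
	p.dword_bytes = count & ~(size_t)3;
	p.word_bytes = count & 2;
	p.byte_bytes = count & 1;
	return p;
}
```

For any count, full_blocks * AREA_SIZE plus the three remainders adds up to the original count, which is a quick sanity check on the loop bounds.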
