* [PATCH v8 32/50] vhost: switch to __get/__put_user exclusively
From: Michael S. Tsirkin @ 2014-12-01 16:05 UTC (permalink / raw)
To: linux-kernel
Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
kvm, virtualization, netdev
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
Most places in vhost can use __get/__put_user rather than
get/put_user since addresses are pre-validated.
This should be good for performance, but this also
will help make code sparse-clean: get/put_user macros
don't play well with __virtioXX bitwise tags.
Switch to get/put_user to __ variants everywhere in vhost.
There's one exception - for consistency switch that
as well, and add an explicit access_ok check.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/vhost/vhost.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c90f437..6a40837 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1038,6 +1038,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
int vhost_init_used(struct vhost_virtqueue *vq)
{
+ u16 last_used_idx;
int r;
if (!vq->private_data)
return 0;
@@ -1046,7 +1047,13 @@ int vhost_init_used(struct vhost_virtqueue *vq)
if (r)
return r;
vq->signalled_used_valid = false;
- return get_user(vq->last_used_idx, &vq->used->idx);
+ if (!access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx))
+ return -EFAULT;
+ r = __get_user(last_used_idx, &vq->used->idx);
+ if (r)
+ return r;
+ vq->last_used_idx = last_used_idx;
+ return 0;
}
EXPORT_SYMBOL_GPL(vhost_init_used);
@@ -1404,7 +1411,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
/* Make sure buffer is written before we update index. */
smp_wmb();
- if (put_user(vq->last_used_idx, &vq->used->idx)) {
+ if (__put_user(vq->last_used_idx, &vq->used->idx)) {
vq_err(vq, "Failed to increment used idx");
return -EFAULT;
}
@@ -1449,7 +1456,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
if (unlikely(!v))
return true;
- if (get_user(event, vhost_used_event(vq))) {
+ if (__get_user(event, vhost_used_event(vq))) {
vq_err(vq, "Failed to get used event idx");
return true;
}
--
MST
^ permalink raw reply related
* [PATCH v8 31/50] vhost/net: force len for TX to host endian
From: Michael S. Tsirkin @ 2014-12-01 16:05 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, kvm, rusty, netdev, virtualization, dahi, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
vhost/net keeps a copy of the used ring in host memory but (ab)uses
the length field for internal house-keeping. This works because the
length in the used ring for tx is always 0. In order to suppress sparse
warnings, we force native endianness here.
Note that these values are never exposed to guests.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8dae2f7..dce5c58 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
* status internally; used for zerocopy tx only.
*/
/* Lower device DMA failed */
-#define VHOST_DMA_FAILED_LEN 3
+#define VHOST_DMA_FAILED_LEN ((__force __virtio32)3)
/* Lower device DMA done */
-#define VHOST_DMA_DONE_LEN 2
+#define VHOST_DMA_DONE_LEN ((__force __virtio32)2)
/* Lower device DMA in progress */
-#define VHOST_DMA_IN_PROGRESS 1
+#define VHOST_DMA_IN_PROGRESS ((__force __virtio32)1)
/* Buffer unused */
-#define VHOST_DMA_CLEAR_LEN 0
+#define VHOST_DMA_CLEAR_LEN ((__force __virtio32)0)
-#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) >= (__force u32)VHOST_DMA_DONE_LEN)
enum {
VHOST_NET_FEATURES = VHOST_FEATURES |
--
MST
^ permalink raw reply related
* [PATCH v8 30/50] vhost: add memory access wrappers
From: Michael S. Tsirkin @ 2014-12-01 16:05 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, kvm, rusty, netdev, virtualization, dahi, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
Add guest memory access wrappers to handle virtio endianness
conversions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
drivers/vhost/vhost.h | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 55a95c9..a0a6c98 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -176,4 +176,35 @@ static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
{
return vq->acked_features & (1ULL << bit);
}
+
+/* Memory accessors */
+static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
+{
+ return __virtio16_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio16 cpu_to_vhost16(struct vhost_virtqueue *vq, u16 val)
+{
+ return __cpu_to_virtio16(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline u32 vhost32_to_cpu(struct vhost_virtqueue *vq, __virtio32 val)
+{
+ return __virtio32_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio32 cpu_to_vhost32(struct vhost_virtqueue *vq, u32 val)
+{
+ return __cpu_to_virtio32(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline u64 vhost64_to_cpu(struct vhost_virtqueue *vq, __virtio64 val)
+{
+ return __virtio64_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio64 cpu_to_vhost64(struct vhost_virtqueue *vq, u64 val)
+{
+ return __cpu_to_virtio64(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
#endif
--
MST
^ permalink raw reply related
* [PATCH v8 29/50] vhost: make features 64 bit
From: Michael S. Tsirkin @ 2014-12-01 16:05 UTC (permalink / raw)
To: linux-kernel
Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
Sergei Shtylyov, Ben Hutchings, Jason Wang, kvm, virtualization,
netdev
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
We need to use bit 32 for virtio 1.0.
Make vhost_has_feature bool to avoid discarding high bits.
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/vhost.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 3eda654..55a95c9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -106,7 +106,7 @@ struct vhost_virtqueue {
/* Protected by virtqueue mutex. */
struct vhost_memory *memory;
void *private_data;
- unsigned acked_features;
+ u64 acked_features;
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
@@ -172,8 +172,8 @@ enum {
(1ULL << VHOST_F_LOG_ALL),
};
-static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit)
+static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
{
- return vq->acked_features & (1 << bit);
+ return vq->acked_features & (1ULL << bit);
}
#endif
--
MST
^ permalink raw reply related
* [PATCH v8 28/50] virtio_net: enable v1.0 support
From: Michael S. Tsirkin @ 2014-12-01 16:05 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, rusty, netdev, virtualization, dahi, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
Now that we have completed 1.0 support, enable it in our driver.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/net/virtio_net.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b8bd719..9ab3c50 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2004,6 +2004,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
VIRTIO_NET_F_CTRL_MAC_ADDR,
VIRTIO_F_ANY_LAYOUT,
+ VIRTIO_F_VERSION_1,
};
static struct virtio_driver virtio_net_driver = {
--
MST
^ permalink raw reply related
* [PATCH v8 27/50] virtio_net: disable mac write for virtio 1.0
From: Michael S. Tsirkin @ 2014-12-01 16:05 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, rusty, netdev, virtualization, dahi, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
The spec states that mac in config space is only driver-writable in the
legacy case. Fence writing it in virtnet_set_mac_address() in the
virtio 1.0 case.
Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
drivers/net/virtio_net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a0e64cf..b8bd719 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1030,7 +1030,8 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
"Failed to set mac address by vq command.\n");
return -EINVAL;
}
- } else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
+ } else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
+ !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
unsigned int i;
/* Naturally, this has an atomicity problem. */
--
MST
^ permalink raw reply related
* [PATCH v8 26/50] virtio_net: bigger header when VERSION_1 is set
From: Michael S. Tsirkin @ 2014-12-01 16:04 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, rusty, netdev, virtualization, dahi, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
With VERSION_1 virtio_net uses same header size
whether mergeable buffers are enabled or not.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 098f443..a0e64cf 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1805,7 +1805,8 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
+ virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
vi->hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
vi->hdr_len = sizeof(struct virtio_net_hdr);
--
MST
^ permalink raw reply related
* [PATCH v8 25/50] virtio_net: stricter short buffer length checks
From: Michael S. Tsirkin @ 2014-12-01 16:04 UTC (permalink / raw)
To: linux-kernel
Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
Jason Wang, Rusty Russell, virtualization, netdev
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
Our buffer length check is not strict enough for mergeable
buffers: buffer can still be shorter that header + address
by 2 bytes.
Fix that up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 516f2cb..098f443 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -437,7 +437,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
struct sk_buff *skb;
struct virtio_net_hdr_mrg_rxbuf *hdr;
- if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
+ if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
if (vi->mergeable_rx_bufs) {
--
MST
^ permalink raw reply related
* [PATCH v8 24/50] virtio_net: get rid of virtio_net_hdr/skb_vnet_hdr
From: Michael S. Tsirkin @ 2014-12-01 16:04 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, rusty, netdev, virtualization, dahi, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
virtio 1.0 doesn't use virtio_net_hdr anymore, and in fact, it's not
really useful since virtio_net_hdr_mrg_rxbuf includes that as the first
field anyway.
Let's drop it, precalculate header len and store within vi instead.
This way we can also remove struct skb_vnet_hdr.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 90 ++++++++++++++++++++++--------------------------
1 file changed, 41 insertions(+), 49 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1630c21..516f2cb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -123,6 +123,9 @@ struct virtnet_info {
/* Host can handle any s/g split between our header and packet data */
bool any_header_sg;
+ /* Packet virtio header size */
+ u8 hdr_len;
+
/* Active statistics */
struct virtnet_stats __percpu *stats;
@@ -139,21 +142,14 @@ struct virtnet_info {
struct notifier_block nb;
};
-struct skb_vnet_hdr {
- union {
- struct virtio_net_hdr hdr;
- struct virtio_net_hdr_mrg_rxbuf mhdr;
- };
-};
-
struct padded_vnet_hdr {
- struct virtio_net_hdr hdr;
+ struct virtio_net_hdr_mrg_rxbuf hdr;
/*
- * virtio_net_hdr should be in a separated sg buffer because of a
- * QEMU bug, and data sg buffer shares same page with this header sg.
- * This padding makes next sg 16 byte aligned after virtio_net_hdr.
+ * hdr is in a separate sg buffer, and data sg buffer shares same page
+ * with this header sg. This padding makes next sg 16 byte aligned
+ * after the header.
*/
- char padding[6];
+ char padding[4];
};
/* Converting between virtqueue no. and kernel tx/rx queue no.
@@ -179,9 +175,9 @@ static int rxq2vq(int rxq)
return rxq * 2;
}
-static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
+static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
{
- return (struct skb_vnet_hdr *)skb->cb;
+ return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
}
/*
@@ -247,7 +243,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
unsigned int len, unsigned int truesize)
{
struct sk_buff *skb;
- struct skb_vnet_hdr *hdr;
+ struct virtio_net_hdr_mrg_rxbuf *hdr;
unsigned int copy, hdr_len, hdr_padded_len;
char *p;
@@ -260,13 +256,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
hdr = skb_vnet_hdr(skb);
- if (vi->mergeable_rx_bufs) {
- hdr_len = sizeof hdr->mhdr;
- hdr_padded_len = sizeof hdr->mhdr;
- } else {
- hdr_len = sizeof hdr->hdr;
+ hdr_len = vi->hdr_len;
+ if (vi->mergeable_rx_bufs)
+ hdr_padded_len = sizeof *hdr;
+ else
hdr_padded_len = sizeof(struct padded_vnet_hdr);
- }
memcpy(hdr, p, hdr_len);
@@ -317,11 +311,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
return skb;
}
-static struct sk_buff *receive_small(void *buf, unsigned int len)
+static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, unsigned int len)
{
struct sk_buff * skb = buf;
- len -= sizeof(struct virtio_net_hdr);
+ len -= vi->hdr_len;
skb_trim(skb, len);
return skb;
@@ -354,8 +348,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
unsigned int len)
{
void *buf = mergeable_ctx_to_buf_address(ctx);
- struct skb_vnet_hdr *hdr = buf;
- u16 num_buf = virtio16_to_cpu(rq->vq->vdev, hdr->mhdr.num_buffers);
+ struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+ u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
struct page *page = virt_to_head_page(buf);
int offset = buf - page_address(page);
unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
@@ -373,8 +367,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
if (unlikely(!ctx)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
dev->name, num_buf,
- virtio16_to_cpu(rq->vq->vdev,
- hdr->mhdr.num_buffers));
+ virtio16_to_cpu(vi->vdev,
+ hdr->num_buffers));
dev->stats.rx_length_errors++;
goto err_buf;
}
@@ -441,7 +435,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
struct net_device *dev = vi->dev;
struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
struct sk_buff *skb;
- struct skb_vnet_hdr *hdr;
+ struct virtio_net_hdr_mrg_rxbuf *hdr;
if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
@@ -463,7 +457,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
else if (vi->big_packets)
skb = receive_big(dev, vi, rq, buf, len);
else
- skb = receive_small(buf, len);
+ skb = receive_small(vi, buf, len);
if (unlikely(!skb))
return;
@@ -545,7 +539,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
gfp_t gfp)
{
struct sk_buff *skb;
- struct skb_vnet_hdr *hdr;
+ struct virtio_net_hdr_mrg_rxbuf *hdr;
int err;
skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
@@ -556,7 +550,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
hdr = skb_vnet_hdr(skb);
sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
- sg_set_buf(rq->sg, &hdr->hdr, sizeof hdr->hdr);
+ sg_set_buf(rq->sg, hdr, vi->hdr_len);
skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
@@ -566,7 +560,8 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
return err;
}
-static int add_recvbuf_big(struct receive_queue *rq, gfp_t gfp)
+static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
+ gfp_t gfp)
{
struct page *first, *list = NULL;
char *p;
@@ -597,8 +592,8 @@ static int add_recvbuf_big(struct receive_queue *rq, gfp_t gfp)
p = page_address(first);
/* rq->sg[0], rq->sg[1] share the same page */
- /* a separated rq->sg[0] for virtio_net_hdr only due to QEMU bug */
- sg_set_buf(&rq->sg[0], p, sizeof(struct virtio_net_hdr));
+ /* a separated rq->sg[0] for header - required in case !any_header_sg */
+ sg_set_buf(&rq->sg[0], p, vi->hdr_len);
/* rq->sg[1] for data packet, from offset */
offset = sizeof(struct padded_vnet_hdr);
@@ -677,7 +672,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
if (vi->mergeable_rx_bufs)
err = add_recvbuf_mergeable(rq, gfp);
else if (vi->big_packets)
- err = add_recvbuf_big(rq, gfp);
+ err = add_recvbuf_big(vi, rq, gfp);
else
err = add_recvbuf_small(vi, rq, gfp);
@@ -857,18 +852,14 @@ static void free_old_xmit_skbs(struct send_queue *sq)
static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
{
- struct skb_vnet_hdr *hdr;
+ struct virtio_net_hdr_mrg_rxbuf *hdr;
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
struct virtnet_info *vi = sq->vq->vdev->priv;
unsigned num_sg;
- unsigned hdr_len;
+ unsigned hdr_len = vi->hdr_len;
bool can_push;
pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
- if (vi->mergeable_rx_bufs)
- hdr_len = sizeof hdr->mhdr;
- else
- hdr_len = sizeof hdr->hdr;
can_push = vi->any_header_sg &&
!((unsigned long)skb->data & (__alignof__(*hdr) - 1)) &&
@@ -876,7 +867,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
/* Even if we can, don't push here yet as this would skew
* csum_start offset below. */
if (can_push)
- hdr = (struct skb_vnet_hdr *)(skb->data - hdr_len);
+ hdr = (struct virtio_net_hdr_mrg_rxbuf *)(skb->data - hdr_len);
else
hdr = skb_vnet_hdr(skb);
@@ -909,7 +900,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
}
if (vi->mergeable_rx_bufs)
- hdr->mhdr.num_buffers = 0;
+ hdr->num_buffers = 0;
sg_init_table(sq->sg, MAX_SKB_FRAGS + 2);
if (can_push) {
@@ -1814,18 +1805,19 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
+ vi->hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+ else
+ vi->hdr_len = sizeof(struct virtio_net_hdr);
+
if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT))
vi->any_header_sg = true;
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
vi->has_cvq = true;
- if (vi->any_header_sg) {
- if (vi->mergeable_rx_bufs)
- dev->needed_headroom = sizeof(struct virtio_net_hdr_mrg_rxbuf);
- else
- dev->needed_headroom = sizeof(struct virtio_net_hdr);
- }
+ if (vi->any_header_sg)
+ dev->needed_headroom = vi->hdr_len;
/* Use single tx/rx queue pair as default */
vi->curr_queue_pairs = 1;
--
MST
^ permalink raw reply related
* [PATCH v8 23/50] virtio_net: pass vi around
From: Michael S. Tsirkin @ 2014-12-01 16:04 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, rusty, netdev, virtualization, dahi, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
Too many places poke at [rs]q->vq->vdev->priv just to get
the vi structure. Let's just pass the pointer around: seems
cleaner, and might even be faster.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
drivers/net/virtio_net.c | 38 ++++++++++++++++++++------------------
1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c07e030..1630c21 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -241,11 +241,11 @@ static unsigned long mergeable_buf_to_ctx(void *buf, unsigned int truesize)
}
/* Called from bottom half context */
-static struct sk_buff *page_to_skb(struct receive_queue *rq,
+static struct sk_buff *page_to_skb(struct virtnet_info *vi,
+ struct receive_queue *rq,
struct page *page, unsigned int offset,
unsigned int len, unsigned int truesize)
{
- struct virtnet_info *vi = rq->vq->vdev->priv;
struct sk_buff *skb;
struct skb_vnet_hdr *hdr;
unsigned int copy, hdr_len, hdr_padded_len;
@@ -328,12 +328,13 @@ static struct sk_buff *receive_small(void *buf, unsigned int len)
}
static struct sk_buff *receive_big(struct net_device *dev,
+ struct virtnet_info *vi,
struct receive_queue *rq,
void *buf,
unsigned int len)
{
struct page *page = buf;
- struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE);
+ struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
if (unlikely(!skb))
goto err;
@@ -359,7 +360,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
int offset = buf - page_address(page);
unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
- struct sk_buff *head_skb = page_to_skb(rq, page, offset, len, truesize);
+ struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
+ truesize);
struct sk_buff *curr_skb = head_skb;
if (unlikely(!curr_skb))
@@ -433,9 +435,9 @@ err_buf:
return NULL;
}
-static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
+static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
+ void *buf, unsigned int len)
{
- struct virtnet_info *vi = rq->vq->vdev->priv;
struct net_device *dev = vi->dev;
struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
struct sk_buff *skb;
@@ -459,7 +461,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
if (vi->mergeable_rx_bufs)
skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
else if (vi->big_packets)
- skb = receive_big(dev, rq, buf, len);
+ skb = receive_big(dev, vi, rq, buf, len);
else
skb = receive_small(buf, len);
@@ -539,9 +541,9 @@ frame_err:
dev_kfree_skb(skb);
}
-static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp)
+static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
+ gfp_t gfp)
{
- struct virtnet_info *vi = rq->vq->vdev->priv;
struct sk_buff *skb;
struct skb_vnet_hdr *hdr;
int err;
@@ -664,9 +666,9 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
* before we're receiving packets, or from refill_work which is
* careful to disable receiving (using napi_disable).
*/
-static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
+static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
+ gfp_t gfp)
{
- struct virtnet_info *vi = rq->vq->vdev->priv;
int err;
bool oom;
@@ -677,7 +679,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
else if (vi->big_packets)
err = add_recvbuf_big(rq, gfp);
else
- err = add_recvbuf_small(rq, gfp);
+ err = add_recvbuf_small(vi, rq, gfp);
oom = err == -ENOMEM;
if (err)
@@ -726,7 +728,7 @@ static void refill_work(struct work_struct *work)
struct receive_queue *rq = &vi->rq[i];
napi_disable(&rq->napi);
- still_empty = !try_fill_recv(rq, GFP_KERNEL);
+ still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
virtnet_napi_enable(rq);
/* In theory, this can happen: if we don't get any buffers in
@@ -745,12 +747,12 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
while (received < budget &&
(buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
- receive_buf(rq, buf, len);
+ receive_buf(vi, rq, buf, len);
received++;
}
if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
- if (!try_fill_recv(rq, GFP_ATOMIC))
+ if (!try_fill_recv(vi, rq, GFP_ATOMIC))
schedule_delayed_work(&vi->refill, 0);
}
@@ -826,7 +828,7 @@ static int virtnet_open(struct net_device *dev)
for (i = 0; i < vi->max_queue_pairs; i++) {
if (i < vi->curr_queue_pairs)
/* Make sure we have some buffers: if oom use wq. */
- if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
+ if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
virtnet_napi_enable(&vi->rq[i]);
}
@@ -1851,7 +1853,7 @@ static int virtnet_probe(struct virtio_device *vdev)
/* Last of all, set up some receive buffers. */
for (i = 0; i < vi->curr_queue_pairs; i++) {
- try_fill_recv(&vi->rq[i], GFP_KERNEL);
+ try_fill_recv(vi, &vi->rq[i], GFP_KERNEL);
/* If we didn't even get one input buffer, we're useless. */
if (vi->rq[i].vq->num_free ==
@@ -1971,7 +1973,7 @@ static int virtnet_restore(struct virtio_device *vdev)
if (netif_running(vi->dev)) {
for (i = 0; i < vi->curr_queue_pairs; i++)
- if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
+ if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
for (i = 0; i < vi->max_queue_pairs; i++)
--
MST
^ permalink raw reply related
* [PATCH v8 15/50] virtio_net: v1.0 endianness
From: Michael S. Tsirkin @ 2014-12-01 16:04 UTC (permalink / raw)
To: linux-kernel
Cc: thuth, rusty, netdev, virtualization, dahi, linux-api, pbonzini,
David Miller
In-Reply-To: <1417449619-24896-1-git-send-email-mst@redhat.com>
Based on patches by Rusty Russell, Cornelia Huck.
Note: more code changes are needed for 1.0 support
(due to different header size).
So we don't advertize support for 1.0 yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/uapi/linux/virtio_net.h | 15 ++++++++-------
drivers/net/virtio_net.c | 33 ++++++++++++++++++++-------------
2 files changed, 28 insertions(+), 20 deletions(-)
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 172a7f0..b5f1677 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -28,6 +28,7 @@
#include <linux/types.h>
#include <linux/virtio_ids.h>
#include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
#include <linux/if_ether.h>
/* The feature bitmap for virtio net */
@@ -84,17 +85,17 @@ struct virtio_net_hdr {
#define VIRTIO_NET_HDR_GSO_TCPV6 4 // GSO frame, IPv6 TCP
#define VIRTIO_NET_HDR_GSO_ECN 0x80 // TCP has ECN set
__u8 gso_type;
- __u16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
- __u16 gso_size; /* Bytes to append to hdr_len per frame */
- __u16 csum_start; /* Position to start checksumming from */
- __u16 csum_offset; /* Offset after that to place checksum */
+ __virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
+ __virtio16 gso_size; /* Bytes to append to hdr_len per frame */
+ __virtio16 csum_start; /* Position to start checksumming from */
+ __virtio16 csum_offset; /* Offset after that to place checksum */
};
/* This is the version of the header to use when the MRG_RXBUF
* feature has been negotiated. */
struct virtio_net_hdr_mrg_rxbuf {
struct virtio_net_hdr hdr;
- __u16 num_buffers; /* Number of merged rx buffers */
+ __virtio16 num_buffers; /* Number of merged rx buffers */
};
/*
@@ -149,7 +150,7 @@ typedef __u8 virtio_net_ctrl_ack;
* VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
*/
struct virtio_net_ctrl_mac {
- __u32 entries;
+ __virtio32 entries;
__u8 macs[][ETH_ALEN];
} __attribute__((packed));
@@ -193,7 +194,7 @@ struct virtio_net_ctrl_mac {
* specified.
*/
struct virtio_net_ctrl_mq {
- __u16 virtqueue_pairs;
+ __virtio16 virtqueue_pairs;
};
#define VIRTIO_NET_CTRL_MQ 4
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b0bc8ea..c07e030 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -347,13 +347,14 @@ err:
}
static struct sk_buff *receive_mergeable(struct net_device *dev,
+ struct virtnet_info *vi,
struct receive_queue *rq,
unsigned long ctx,
unsigned int len)
{
void *buf = mergeable_ctx_to_buf_address(ctx);
struct skb_vnet_hdr *hdr = buf;
- int num_buf = hdr->mhdr.num_buffers;
+ u16 num_buf = virtio16_to_cpu(rq->vq->vdev, hdr->mhdr.num_buffers);
struct page *page = virt_to_head_page(buf);
int offset = buf - page_address(page);
unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
@@ -369,7 +370,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
ctx = (unsigned long)virtqueue_get_buf(rq->vq, &len);
if (unlikely(!ctx)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
- dev->name, num_buf, hdr->mhdr.num_buffers);
+ dev->name, num_buf,
+ virtio16_to_cpu(rq->vq->vdev,
+ hdr->mhdr.num_buffers));
dev->stats.rx_length_errors++;
goto err_buf;
}
@@ -454,7 +457,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
}
if (vi->mergeable_rx_bufs)
- skb = receive_mergeable(dev, rq, (unsigned long)buf, len);
+ skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
else if (vi->big_packets)
skb = receive_big(dev, rq, buf, len);
else
@@ -473,8 +476,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
pr_debug("Needs csum!\n");
if (!skb_partial_csum_set(skb,
- hdr->hdr.csum_start,
- hdr->hdr.csum_offset))
+ virtio16_to_cpu(vi->vdev, hdr->hdr.csum_start),
+ virtio16_to_cpu(vi->vdev, hdr->hdr.csum_offset)))
goto frame_err;
} else if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -514,7 +517,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
if (hdr->hdr.gso_type & VIRTIO_NET_HDR_GSO_ECN)
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
- skb_shinfo(skb)->gso_size = hdr->hdr.gso_size;
+ skb_shinfo(skb)->gso_size = virtio16_to_cpu(vi->vdev,
+ hdr->hdr.gso_size);
if (skb_shinfo(skb)->gso_size == 0) {
net_warn_ratelimited("%s: zero gso size.\n", dev->name);
goto frame_err;
@@ -876,16 +880,19 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
if (skb->ip_summed == CHECKSUM_PARTIAL) {
hdr->hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
- hdr->hdr.csum_start = skb_checksum_start_offset(skb);
- hdr->hdr.csum_offset = skb->csum_offset;
+ hdr->hdr.csum_start = cpu_to_virtio16(vi->vdev,
+ skb_checksum_start_offset(skb));
+ hdr->hdr.csum_offset = cpu_to_virtio16(vi->vdev,
+ skb->csum_offset);
} else {
hdr->hdr.flags = 0;
hdr->hdr.csum_offset = hdr->hdr.csum_start = 0;
}
if (skb_is_gso(skb)) {
- hdr->hdr.hdr_len = skb_headlen(skb);
- hdr->hdr.gso_size = skb_shinfo(skb)->gso_size;
+ hdr->hdr.hdr_len = cpu_to_virtio16(vi->vdev, skb_headlen(skb));
+ hdr->hdr.gso_size = cpu_to_virtio16(vi->vdev,
+ skb_shinfo(skb)->gso_size);
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
@@ -1112,7 +1119,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
return 0;
- s.virtqueue_pairs = queue_pairs;
+ s.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
sg_init_one(&sg, &s, sizeof(s));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
@@ -1189,7 +1196,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
sg_init_table(sg, 2);
/* Store the unicast list and count in the front of the buffer */
- mac_data->entries = uc_count;
+ mac_data->entries = cpu_to_virtio32(vi->vdev, uc_count);
i = 0;
netdev_for_each_uc_addr(ha, dev)
memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
@@ -1200,7 +1207,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
/* multicast list and count fill the end */
mac_data = (void *)&mac_data->macs[uc_count][0];
- mac_data->entries = mc_count;
+ mac_data->entries = cpu_to_virtio32(vi->vdev, mc_count);
i = 0;
netdev_for_each_mc_addr(ha, dev)
memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
--
MST
^ permalink raw reply related
* Re: [PATCH v2 02/19] kbuild: kselftest_install - add a new make target to install selftests
From: Michal Marek @ 2014-12-01 15:47 UTC (permalink / raw)
To: Shuah Khan, gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, keescook-F7+t8E8rja9g9hUCZPvPmw,
tranmanphong-Re5JQEeQqe8AvxtiuMwx3w,
dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w, hughd-hpIqsD4AKlfQT0dZR+AlfA,
bobby.prani-Re5JQEeQqe8AvxtiuMwx3w,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
Cc: linux-kbuild-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <a2344d4df903d673afe1631118f40917f773cc9a.1415735831.git.shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>
On 2014-11-11 21:27, Shuah Khan wrote:
> Add a new make target to install to install kernel selftests.
> This new target will build and install selftests. kselftest
> target now depends on kselftest_install and runs the generated
> kselftest script to reduce duplicate work and for common look
> and feel when running tests.
>
> Approach:
>
> make kselftest_target:
> -- exports kselftest INSTALL_KSFT_PATH
> default $(INSTALL_MOD_PATH)/lib/kselftest/$(KERNELRELEASE)
> -- exports path for ksefltest.sh
> -- runs selftests make install target:
>
> selftests make install target
> -- creates kselftest.sh script in install install dir
> -- runs install targets for all INSTALL_TARGETS
> (Note: ftrace and powerpc aren't included in INSTALL_TARGETS,
> to not add more content to patch v1 series. This work
> will happen soon. In this series these two targets are
> run after running the generated kselftest script, without
> any regression in the way these tests are run with
> "make kselftest" prior to this work.)
> -- install target can be run only from top level source dir.
>
> Individual test make install targets:
> -- install test programs and/or scripts in install dir
> -- append to the ksefltest.sh file to add commands to run test
> -- install target can be run only from top level source dir.
>
> Adds the following new ways to initiate selftests:
> -- Installing and running kselftest from install directory
> by running "make kselftest"
> -- Running kselftest script from install directory
>
> Maintains the following ways to run tests:
> -- make -C tools/testing/selftests run_tests
> -- make -C tools/testing/selftests TARGETS=target run_tests
> Ability specify targets: e.g TARGETS=net
> -- make run_tests from tools/testing/selftests
> -- make run_tests from individual test directories:
> e.g: make run_tests in tools/testing/selftests/breakpoints
>
> Signed-off-by: Shuah Khan <shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>
> ---
> Makefile | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 05d67af..ccbd2e1 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1071,12 +1071,26 @@ headers_check: headers_install
> $(Q)$(MAKE) $(hdr-inst)=arch/$(hdr-arch)/include/uapi/asm $(hdr-dst) HDRCHECK=1
>
> # ---------------------------------------------------------------------------
> -# Kernel selftest
> +# Kernel selftest targets
> +
> +PHONY += __kselftest_configure
> +INSTALL_KSFT_PATH=$(INSTALL_MOD_PATH)/lib/kselftest/$(KERNELRELEASE)
> +export INSTALL_KSFT_PATH
> +KSELFTEST=$(INSTALL_KSFT_PATH)/kselftest.sh
> +export KSELFTEST
Can this be moved to tools/testing/selftests/Makefile? It's only used in
this part of the tree.
> PHONY += kselftest
> -kselftest:
> +kselftest: kselftest_install
> $(Q)$(MAKE) -C tools/testing/selftests run_tests
>
> +# Kernel selftest install
> +
> +PHONY += kselftest_install
> +kselftest_install: __kselftest_configure
> + @rm -rf $(INSTALL_KSFT_PATH)
> + @mkdir -p $(INSTALL_KSFT_PATH)
Please use $(Q) insteaf od hardcoding the @.
> + $(Q)$(MAKE) -C tools/testing/selftests install
The install target is only added by the next patch, which in turn
depends on changes done by later patches. The order (in the current
numbering) should be 01/19, 04/19, ... 19/19, 03/19 and 02/19.
Michal
^ permalink raw reply
* Re: [PATCH v7 31/46] vhost: virtio 1.0 endian-ness support
From: Michael S. Tsirkin @ 2014-12-01 15:45 UTC (permalink / raw)
To: Cornelia Huck
Cc: thuth, kvm, rusty, netdev, linux-kernel, virtualization, dahi,
pbonzini, David Miller
In-Reply-To: <20141201133353.0bbaa2e1.cornelia.huck@de.ibm.com>
On Mon, Dec 01, 2014 at 01:33:53PM +0100, Cornelia Huck wrote:
> On Sun, 30 Nov 2014 17:11:49 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > drivers/vhost/vhost.c | 93 +++++++++++++++++++++++++++++++--------------------
> > 1 file changed, 56 insertions(+), 37 deletions(-)
> >
>
> > @@ -1113,18 +1120,19 @@ static int get_indirect(struct vhost_virtqueue *vq,
> > {
> > struct vring_desc desc;
> > unsigned int i = 0, count, found = 0;
> > + u32 len = vhost32_to_cpu(vq, indirect->len);
> > int ret;
> >
> > /* Sanity check */
> > - if (unlikely(indirect->len % sizeof desc)) {
> > + if (unlikely(len % sizeof desc)) {
> > vq_err(vq, "Invalid length in indirect descriptor: "
> > "len 0x%llx not multiple of 0x%zx\n",
> > - (unsigned long long)indirect->len,
> > + (unsigned long long)vhost32_to_cpu(vq, indirect->len),
>
> Can't you use len here?
>
> > sizeof desc);
> > return -EINVAL;
> > }
> >
> > - ret = translate_desc(vq, indirect->addr, indirect->len, vq->indirect,
> > + ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr), len, vq->indirect,
> > UIO_MAXIOV);
> > if (unlikely(ret < 0)) {
> > vq_err(vq, "Translation failure %d in indirect.\n", ret);
>
>
> > @@ -1404,7 +1422,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
> >
> > /* Make sure buffer is written before we update index. */
> > smp_wmb();
> > - if (put_user(vq->last_used_idx, &vq->used->idx)) {
> > + if (__put_user(cpu_to_vhost16(vq, vq->last_used_idx), &vq->used->idx)) {
>
> Why s/put_user/__put_user/ - I don't see how endianness conversions
> should have an influence there?
>
> > vq_err(vq, "Failed to increment used idx");
> > return -EFAULT;
> > }
>
> > @@ -1449,11 +1468,11 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> > if (unlikely(!v))
> > return true;
> >
> > - if (get_user(event, vhost_used_event(vq))) {
> > + if (__get_user(event, vhost_used_event(vq))) {
>
> Dito: why the change?
I remember now.
put_user/get_user macros don't work well
with __virtio tags.
As __ are a good idea anyway, I just switched to that
everywhere.
> > vq_err(vq, "Failed to get used event idx");
> > return true;
> > }
> > - return vring_need_event(event, new, old);
> > + return vring_need_event(vhost16_to_cpu(vq, event), new, old);
> > }
> >
> > /* This actually signals the guest, using eventfd. */
^ permalink raw reply
* Re: [PATCH 4/7] ARM: STi: DT: STiH410: Add usb2 picophy dt nodes
From: Arnd Bergmann @ 2014-12-01 15:40 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Peter Griffin, linux-kernel, srinivas.kandagatla, maxime.coquelin,
patrice.chotard, peppe.cavallaro, kishon, netdev, lee.jones,
alexandre.torgue, devicetree
In-Reply-To: <1416385632-5832-5-git-send-email-peter.griffin@linaro.org>
On Wednesday 19 November 2014 08:27:09 Peter Griffin wrote:
> + soc {
> + usb2_picophy1: phy@1 {
> + compatible = "st,stih407-usb2-phy";
> + #phy-cells = <0>;
> + st,syscfg = <&syscfg_core 0xf8 0xf4>;
> + resets = <&softreset STIH407_PICOPHY_SOFTRESET>,
> + <&picophyreset STIH407_PICOPHY0_RESET>;
> + reset-names = "global", "port";
> + };
>
> + usb2_picophy2: phy@2 {
> + compatible = "st,stih407-usb2-phy";
> + #phy-cells = <0>;
> + st,syscfg = <&syscfg_core 0xfc 0xf4>;
> + resets = <&softreset STIH407_PICOPHY_SOFTRESET>,
> + <&picophyreset STIH407_PICOPHY1_RESET>;
> + reset-names = "global", "port";
> + };
> + };
In theory the unit-address (the @1 and @2 part of the name) is supposed to
match the 'reg' property value, but of course that doesn't work any
more with the changed binding. The same problem keeps coming up, so
I wonder if anyone has an idea how this is supposed to be handled properly.
Should we just make up unit-address numbers? I guess a more elaborate
variant would be to have a parent node with #address-cells = <1> and
no ranges, to make up a new address space with arbitrarily assigned
reg values, like
phys {
#address-cells = <1>; /* just counting the nodes */
#size-cells = <0>;
usb2_picophy1: phy@0 {
compatible = "st,stih407-usb2-phy";
reg = <0>;
#phy-cells = <0>;
st,syscfg = <&syscfg_core 0xf8 0xf4>;
};
usb2_picophy2: phy@1 {
compatible = "st,stih407-usb2-phy";
reg = <0>;
#phy-cells = <0>;
st,syscfg = <&syscfg_core 0xf8 0xf4>;
};
}
Should we try to do it like this, or is that overcomplicating
things?
Arnd
^ permalink raw reply
* Re: [PATCH 0/7] Fix sti drivers whcih mix reg address spaces
From: Arnd Bergmann @ 2014-12-01 15:36 UTC (permalink / raw)
To: Peter Griffin
Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
srinivas.kandagatla-Re5JQEeQqe8AvxtiuMwx3w,
maxime.coquelin-qxv4g6HH51o, patrice.chotard-qxv4g6HH51o,
peppe.cavallaro-qxv4g6HH51o, kishon-l0cyMroinI0,
lee.jones-QSEj5FYQhm4dnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, alexandre.torgue-qxv4g6HH51o
In-Reply-To: <1416385632-5832-1-git-send-email-peter.griffin-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
On Wednesday 19 November 2014 08:27:05 Peter Griffin wrote:
> Hi,
>
> Following on from Arnds comments about the picophy driver here
> https://lkml.org/lkml/2014/11/13/161, this series fixes the
> remaining upstreamed drivers for STI, which are mixing address spaces
> in the reg property. We do this in a way similar to the keystone
> and bcm7445 platforms, by having sysconfig phandle/ offset pair (
> where only one register is required). Or phandle / integer array
> where multiple offsets in the same bank are needed).
>
> This series breaks DT compatability! But the platform support
> is WIP and only being used by the few developers who are upstreaming
> support for it. I've made each change to the driver / dt doc / dt
> file as a single atomic commit so the kernel will remain bisectable.
>
> This series then also enables the picophy driver, and adds back in
> the ehci/ohci dt nodes for stih410 which make use of the picophy.
This all looks very good to me, nice work!
Reviewed-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
I have one comment for patch 4, but I don't have a good solution for
that, hopefully someone on the devicetree-discuss list has some insight,
otherwise we just keep your version.
Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH v7 36/46] vhost/net: suppress compiler warning
From: Cornelia Huck @ 2014-12-01 15:18 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: thuth, kvm, rusty, netdev, linux-kernel, virtualization, dahi,
pbonzini, David Miller
In-Reply-To: <20141201151208.GA18919@redhat.com>
On Mon, 1 Dec 2014 17:12:08 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Dec 01, 2014 at 03:21:03PM +0100, Cornelia Huck wrote:
> > On Mon, 1 Dec 2014 15:48:39 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Mon, Dec 01, 2014 at 01:37:40PM +0100, Cornelia Huck wrote:
> > > > On Sun, 30 Nov 2014 17:12:13 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >
> > > > > len is always initialized since function is called with size > 0.
> > > > >
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > > drivers/vhost/net.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > > index 984242e..54ffbb0 100644
> > > > > --- a/drivers/vhost/net.c
> > > > > +++ b/drivers/vhost/net.c
> > > > > @@ -501,7 +501,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
> > > > > int headcount = 0;
> > > > > unsigned d;
> > > > > int r, nlogs = 0;
> > > > > - u32 len;
> > > > > + u32 uninitialized_var(len);
> > > > >
> > > > > while (datalen > 0 && headcount < quota) {
> > > > > if (unlikely(seg >= UIO_MAXIOV)) {
> > > >
> > > > Want to merge this with the patch introducing the variable and add a
> > > > comment there?
> > >
> > > Not really. Warnings in bisect are fine I think.
> >
> > I'm not sure what a separate patch buys us, though, as it should be
> > trivial to merge.
>
> Easier to document the reason if it's split out.
>
That's why I suggested a comment ;)
^ permalink raw reply
* Re: [PATCH v7 36/46] vhost/net: suppress compiler warning
From: Michael S. Tsirkin @ 2014-12-01 15:12 UTC (permalink / raw)
To: Cornelia Huck
Cc: thuth, kvm, rusty, netdev, linux-kernel, virtualization, dahi,
pbonzini, David Miller
In-Reply-To: <20141201152103.0f375544.cornelia.huck@de.ibm.com>
On Mon, Dec 01, 2014 at 03:21:03PM +0100, Cornelia Huck wrote:
> On Mon, 1 Dec 2014 15:48:39 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Mon, Dec 01, 2014 at 01:37:40PM +0100, Cornelia Huck wrote:
> > > On Sun, 30 Nov 2014 17:12:13 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > len is always initialized since function is called with size > 0.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > drivers/vhost/net.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > index 984242e..54ffbb0 100644
> > > > --- a/drivers/vhost/net.c
> > > > +++ b/drivers/vhost/net.c
> > > > @@ -501,7 +501,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
> > > > int headcount = 0;
> > > > unsigned d;
> > > > int r, nlogs = 0;
> > > > - u32 len;
> > > > + u32 uninitialized_var(len);
> > > >
> > > > while (datalen > 0 && headcount < quota) {
> > > > if (unlikely(seg >= UIO_MAXIOV)) {
> > >
> > > Want to merge this with the patch introducing the variable and add a
> > > comment there?
> >
> > Not really. Warnings in bisect are fine I think.
>
> I'm not sure what a separate patch buys us, though, as it should be
> trivial to merge.
Easier to document the reason if it's split out.
^ permalink raw reply
* Payment
From: Finance Department @ 2014-12-01 14:35 UTC (permalink / raw)
Dear Recipient,
You have been awarded the sum of 8,000,000.00 (Eight Million Pounds sterling) with reference number 77100146 by office of the ministry of finance UK.Send us your personal details to deliver your funds.
Gloria Peter
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Michael S. Tsirkin @ 2014-12-01 15:06 UTC (permalink / raw)
To: Thomas Graf
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
'Jason Wang', Du, Fan,
fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org
In-Reply-To: <20141201135225.GA16814-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
On Mon, Dec 01, 2014 at 01:52:25PM +0000, Thomas Graf wrote:
> On 11/30/14 at 10:08am, Du, Fan wrote:
> > >-----Original Message-----
> > >From: Jason Wang [mailto:jasowang@redhat.com]
> > >Sent: Friday, November 28, 2014 3:02 PM
> > >To: Du, Fan
> > >Cc: netdev@vger.kernel.org; davem@davemloft.net; fw@strlen.de; Du, Fan
> > >Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
> > >On Fri, Nov 28, 2014 at 2:33 PM, Fan Du <fan.du@intel.com> wrote:
> > >> Test scenario: two KVM guests sitting in different hosts communicate
> > >> to each other with a vxlan tunnel.
> > >>
> > >> All interface MTU is default 1500 Bytes, from guest point of view, its
> > >> skb gso_size could be as bigger as 1448Bytes, however after guest skb
> > >> goes through vxlan encapuslation, individual segments length of a gso
> > >> packet could exceed physical NIC MTU 1500, which will be lost at
> > >> recevier side.
> > >>
> > >> So it's possible in virtualized environment, locally created skb len
> > >> after encapslation could be bigger than underlayer MTU. In such case,
> > >> it's reasonable to do GSO first, then fragment any packet bigger than
> > >> MTU as possible.
> > >>
> > >> +---------------+ TX RX +---------------+
> > >> | KVM Guest | -> ... -> | KVM Guest |
> > >> +-+-----------+-+ +-+-----------+-+
> > >> |Qemu/VirtIO| |Qemu/VirtIO|
> > >> +-----------+ +-----------+
> > >> | |
> > >> v tap0 tap0 v
> > >> +-----------+ +-----------+
> > >> | ovs bridge| | ovs bridge|
> > >> +-----------+ +-----------+
> > >> | vxlan vxlan |
> > >> v v
> > >> +-----------+ +-----------+
> > >> | NIC | <------> | NIC |
> > >> +-----------+ +-----------+
> > >>
> > >> Steps to reproduce:
> > >> 1. Using kernel builtin openvswitch module to setup ovs bridge.
> > >> 2. Runing iperf without -M, communication will stuck.
> > >
> > >Is this issue specific to ovs or ipv4? Path MTU discovery should help in this case I
> > >believe.
> >
> > Problem here is host stack push local over-sized gso skb down to NIC, and perform GSO there
> > without any further ip segmentation.
> >
> > Reasonable behavior is do gso first at ip level, if gso-ed skb is bigger than MTU && df is set,
> > Then push ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED message back to sender to adjust mtu.
Sounds about right.
> Aside from this. I think Virtio should provide a MTU hint to the guest
> to adjust MTU in the vNIC to account for both overhead or support for
> jumbo frames in the underlay transparently without relying on PMTU or
> MSS hints. I remember we talked about this a while ago with at least
> Michael but haven't done actual code work on it yet.
This can be an optimization but can't replace fixing PMTU.
In particular this can't help legacy guests.
> > For PMTU to work, that's another issue I will try to address later on.
>
> PMTU discovery was explicitly removed from the OVS datapath. Maybe
> Pravin or Jesse can provide some background on that
PMTU was never working properly for OVS users. There was an ugly hack that kind
of worked part of the time. That was dropped, and good riddance.
Proper PMTU support for all kind of tunneling solutions, e.g. along the
lines you outline above, belongs in host core.
--
MST
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* Re: [PATCH 1/3] net-PPP: Deletion of unnecessary checks before the function call "kfree"
From: SF Markus Elfring @ 2014-12-01 15:00 UTC (permalink / raw)
To: Sergei Shtylyov, Paul Mackerras, linux-ppp, netdev
Cc: LKML, kernel-janitors, Julia Lawall
In-Reply-To: <547C5CBC.6060607@cogentembedded.com>
>> diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
>> index 911b216..7e44212 100644
>> --- a/drivers/net/ppp/ppp_mppe.c
>> +++ b/drivers/net/ppp/ppp_mppe.c
>> @@ -238,8 +238,7 @@ static void *mppe_alloc(unsigned char *options, int optlen)
>> return (void *)state;
>>
>> out_free:
>> - if (state->sha1_digest)
>> - kfree(state->sha1_digest);
>> + kfree(state->sha1_digest);
>
> Please keep this line aligned to the others.
Can it be that the previous source code contained unwanted space
characters at this place?
Do you want indentation fixes as a separate update step?
Regards,
Markus
^ permalink raw reply
* Re: [PATCH v7 36/46] vhost/net: suppress compiler warning
From: Cornelia Huck @ 2014-12-01 14:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: thuth, kvm, rusty, netdev, linux-kernel, virtualization, dahi,
pbonzini, David Miller
In-Reply-To: <20141201134839.GB18305@redhat.com>
On Mon, 1 Dec 2014 15:48:39 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Dec 01, 2014 at 01:37:40PM +0100, Cornelia Huck wrote:
> > On Sun, 30 Nov 2014 17:12:13 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > len is always initialized since function is called with size > 0.
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > drivers/vhost/net.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 984242e..54ffbb0 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -501,7 +501,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
> > > int headcount = 0;
> > > unsigned d;
> > > int r, nlogs = 0;
> > > - u32 len;
> > > + u32 uninitialized_var(len);
> > >
> > > while (datalen > 0 && headcount < quota) {
> > > if (unlikely(seg >= UIO_MAXIOV)) {
> >
> > Want to merge this with the patch introducing the variable and add a
> > comment there?
>
> Not really. Warnings in bisect are fine I think.
I'm not sure what a separate patch buys us, though, as it should be
trivial to merge.
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen-netfront: Fix handling packets on compound pages with skb_linearize
From: Stefan Bader @ 2014-12-01 14:13 UTC (permalink / raw)
To: Zoltan Kiss, David Vrabel, Zoltan Kiss, Konrad Rzeszutek Wilk,
Boris Ostrovsky
Cc: Wei Liu, Ian Campbell, netdev, linux-kernel, Paul Durrant,
xen-devel
In-Reply-To: <547C742E.6060801@linaro.org>
[-- Attachment #1: Type: text/plain, Size: 3372 bytes --]
On 01.12.2014 14:59, Zoltan Kiss wrote:
>
>
> On 01/12/14 13:36, David Vrabel wrote:
>> On 01/12/14 08:55, Stefan Bader wrote:
>>> On 11.08.2014 19:32, Zoltan Kiss wrote:
>>>> There is a long known problem with the netfront/netback interface: if the guest
>>>> tries to send a packet which constitues more than MAX_SKB_FRAGS + 1 ring slots,
>>>> it gets dropped. The reason is that netback maps these slots to a frag in the
>>>> frags array, which is limited by size. Having so many slots can occur since
>>>> compound pages were introduced, as the ring protocol slice them up into
>>>> individual (non-compound) page aligned slots. The theoretical worst case
>>>> scenario looks like this (note, skbs are limited to 64 Kb here):
>>>> linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping page boundary,
>>>> using 2 slots
>>>> first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
>>>> end and the beginning of a page, therefore they use 3 * 15 = 45 slots
>>>> last 2 frags: 1 + 1 bytes, overlapping page boundary, 2 * 2 = 4 slots
>>>> Although I don't think this 51 slots skb can really happen, we need a solution
>>>> which can deal with every scenario. In real life there is only a few slots
>>>> overdue, but usually it causes the TCP stream to be blocked, as the retry will
>>>> most likely have the same buffer layout.
>>>> This patch solves this problem by linearizing the packet. This is not the
>>>> fastest way, and it can fail much easier as it tries to allocate a big linear
>>>> area for the whole packet, but probably easier by an order of magnitude than
>>>> anything else. Probably this code path is not touched very frequently anyway.
>>>>
>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>>>> Cc: Wei Liu <wei.liu2@citrix.com>
>>>> Cc: Ian Campbell <Ian.Campbell@citrix.com>
>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>> Cc: netdev@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Cc: xen-devel@lists.xenproject.org
>>>
>>> This does not seem to be marked explicitly as stable. Has someone already asked
>>> David Miller to put it on his stable queue? IMO it qualifies quite well and the
>>> actual change should be simple to pick/backport.
>>
>> I think it's a candidate, yes.
>>
>> Can you expand on the user visible impact of the bug this patch fixes?
>> I think it results in certain types of traffic not working (because the
>> domU always generates skb's with the problematic frag layout), but I
>> can't remember the details.
>
> Yes, this line in the comment talks about it: "In real life there is only a few
> slots overdue, but usually it causes the TCP stream to be blocked, as the retry
> will most likely have the same buffer layout."
> Maybe we can add what kind of traffic triggered this so far, AFAIK NFS was one
> of them, and Stefan had an another use case. But my memories are blur about this.
We had some report about some web-app hitting packet losses. I suspect that also
was streaming something. For a easy trigger we found redis-benchmark (part of
the redis keyserver) with a larger (iirc 1kB) payload would trigger the
fragmentation/exceeding pages to happen. Though I think it did not fail but
showed a performance drop instead (from memory which also suffers from loosing
detail).
-Stefan
>
> Zoli
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen-netfront: Fix handling packets on compound pages with skb_linearize
From: Zoltan Kiss @ 2014-12-01 13:59 UTC (permalink / raw)
To: David Vrabel, Stefan Bader, Zoltan Kiss, Konrad Rzeszutek Wilk,
Boris Ostrovsky
Cc: Wei Liu, Ian Campbell, netdev, linux-kernel, Paul Durrant,
xen-devel
In-Reply-To: <547C6EF6.8020604@citrix.com>
On 01/12/14 13:36, David Vrabel wrote:
> On 01/12/14 08:55, Stefan Bader wrote:
>> On 11.08.2014 19:32, Zoltan Kiss wrote:
>>> There is a long known problem with the netfront/netback interface: if the guest
>>> tries to send a packet which constitues more than MAX_SKB_FRAGS + 1 ring slots,
>>> it gets dropped. The reason is that netback maps these slots to a frag in the
>>> frags array, which is limited by size. Having so many slots can occur since
>>> compound pages were introduced, as the ring protocol slice them up into
>>> individual (non-compound) page aligned slots. The theoretical worst case
>>> scenario looks like this (note, skbs are limited to 64 Kb here):
>>> linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping page boundary,
>>> using 2 slots
>>> first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
>>> end and the beginning of a page, therefore they use 3 * 15 = 45 slots
>>> last 2 frags: 1 + 1 bytes, overlapping page boundary, 2 * 2 = 4 slots
>>> Although I don't think this 51 slots skb can really happen, we need a solution
>>> which can deal with every scenario. In real life there is only a few slots
>>> overdue, but usually it causes the TCP stream to be blocked, as the retry will
>>> most likely have the same buffer layout.
>>> This patch solves this problem by linearizing the packet. This is not the
>>> fastest way, and it can fail much easier as it tries to allocate a big linear
>>> area for the whole packet, but probably easier by an order of magnitude than
>>> anything else. Probably this code path is not touched very frequently anyway.
>>>
>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>>> Cc: Wei Liu <wei.liu2@citrix.com>
>>> Cc: Ian Campbell <Ian.Campbell@citrix.com>
>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>> Cc: netdev@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-devel@lists.xenproject.org
>>
>> This does not seem to be marked explicitly as stable. Has someone already asked
>> David Miller to put it on his stable queue? IMO it qualifies quite well and the
>> actual change should be simple to pick/backport.
>
> I think it's a candidate, yes.
>
> Can you expand on the user visible impact of the bug this patch fixes?
> I think it results in certain types of traffic not working (because the
> domU always generates skb's with the problematic frag layout), but I
> can't remember the details.
Yes, this line in the comment talks about it: "In real life there is
only a few slots overdue, but usually it causes the TCP stream to be
blocked, as the retry will most likely have the same buffer layout."
Maybe we can add what kind of traffic triggered this so far, AFAIK NFS
was one of them, and Stefan had an another use case. But my memories are
blur about this.
Zoli
^ permalink raw reply
* Re: [PATCH] mips: bpf: Fix broken BPF_MOD
From: Denis Kirjanov @ 2014-12-01 13:55 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: netdev, markos.chandras
In-Reply-To: <547C7240.4010401@cogentembedded.com>
On 12/1/14, Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> wrote:
> Hello.
>
> On 12/1/2014 12:57 PM, Denis Kirjanov wrote:
>
> You should CC the 'linux-mips' ML.
>
>> Remove optimize_div() from BPF_MOD | BPF_K case
>> since we don't know the dividend and fix the
>> emit_mod() by reading the mod operation result from HI register
>
> Isn't this 2 unrelated fixes? They should be in 2 patches, not a single
>
> one in that case.
>
They do fix the BPD_MOD _single_ case. I don't think that it's a good
reason here to split the fin in 2 patches
>> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
>> ---
>> arch/mips/net/bpf_jit.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> WBR, Sergei
>
>
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Thomas Graf @ 2014-12-01 13:52 UTC (permalink / raw)
To: Du, Fan
Cc: 'Jason Wang', netdev@vger.kernel.org, davem@davemloft.net,
fw@strlen.de, dev, mst, jesse, pshelar
In-Reply-To: <5A90DA2E42F8AE43BC4A093BF0678848DED92B@SHSMSX104.ccr.corp.intel.com>
On 11/30/14 at 10:08am, Du, Fan wrote:
> >-----Original Message-----
> >From: Jason Wang [mailto:jasowang@redhat.com]
> >Sent: Friday, November 28, 2014 3:02 PM
> >To: Du, Fan
> >Cc: netdev@vger.kernel.org; davem@davemloft.net; fw@strlen.de; Du, Fan
> >Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
> >On Fri, Nov 28, 2014 at 2:33 PM, Fan Du <fan.du@intel.com> wrote:
> >> Test scenario: two KVM guests sitting in different hosts communicate
> >> to each other with a vxlan tunnel.
> >>
> >> All interface MTU is default 1500 Bytes, from guest point of view, its
> >> skb gso_size could be as bigger as 1448Bytes, however after guest skb
> >> goes through vxlan encapuslation, individual segments length of a gso
> >> packet could exceed physical NIC MTU 1500, which will be lost at
> >> recevier side.
> >>
> >> So it's possible in virtualized environment, locally created skb len
> >> after encapslation could be bigger than underlayer MTU. In such case,
> >> it's reasonable to do GSO first, then fragment any packet bigger than
> >> MTU as possible.
> >>
> >> +---------------+ TX RX +---------------+
> >> | KVM Guest | -> ... -> | KVM Guest |
> >> +-+-----------+-+ +-+-----------+-+
> >> |Qemu/VirtIO| |Qemu/VirtIO|
> >> +-----------+ +-----------+
> >> | |
> >> v tap0 tap0 v
> >> +-----------+ +-----------+
> >> | ovs bridge| | ovs bridge|
> >> +-----------+ +-----------+
> >> | vxlan vxlan |
> >> v v
> >> +-----------+ +-----------+
> >> | NIC | <------> | NIC |
> >> +-----------+ +-----------+
> >>
> >> Steps to reproduce:
> >> 1. Using kernel builtin openvswitch module to setup ovs bridge.
> >> 2. Runing iperf without -M, communication will stuck.
> >
> >Is this issue specific to ovs or ipv4? Path MTU discovery should help in this case I
> >believe.
>
> Problem here is host stack push local over-sized gso skb down to NIC, and perform GSO there
> without any further ip segmentation.
>
> Reasonable behavior is do gso first at ip level, if gso-ed skb is bigger than MTU && df is set,
> Then push ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED message back to sender to adjust mtu.
Aside from this. I think Virtio should provide a MTU hint to the guest
to adjust MTU in the vNIC to account for both overhead or support for
jumbo frames in the underlay transparently without relying on PMTU or
MSS hints. I remember we talked about this a while ago with at least
Michael but haven't done actual code work on it yet.
> For PMTU to work, that's another issue I will try to address later on.
PMTU discovery was explicitly removed from the OVS datapath. Maybe
Pravin or Jesse can provide some background on that
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox