From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [ 18/23] macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
Date: Thu, 12 Sep 2013 10:45:14 -0700 [thread overview]
Message-ID: <20130912174453.735312348@linuxfoundation.org> (raw)
In-Reply-To: <20130912174451.748805761@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang <jasowang@redhat.com>
commit ece793fcfc417b3925844be88a6a6dc82ae8f7c6 upstream.
We try to linearize part of the skb when the number of iov is greater than
MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
network.
Solve this problem by calculate the pages needed for iov before trying to do
zerocopy and switch to use copy instead of zerocopy if it needs more than
MAX_SKB_FRAGS.
This is done through introducing a new helper to count the pages for iov, and
call uarg->callback() manually when switching from zerocopy to copy to notify
vhost.
We can do further optimization on top.
This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73
(macvtap: zerocopy: validate vectors before building skb).
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/macvtap.c | 62 +++++++++++++++++++++++++++++---------------------
1 file changed, 37 insertions(+), 25 deletions(-)
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -642,6 +642,28 @@ static int macvtap_skb_to_vnet_hdr(const
return 0;
}
+static unsigned long iov_pages(const struct iovec *iv, int offset,
+ unsigned long nr_segs)
+{
+ unsigned long seg, base;
+ int pages = 0, len, size;
+
+ while (nr_segs && (offset >= iv->iov_len)) {
+ offset -= iv->iov_len;
+ ++iv;
+ --nr_segs;
+ }
+
+ for (seg = 0; seg < nr_segs; seg++) {
+ base = (unsigned long)iv[seg].iov_base + offset;
+ len = iv[seg].iov_len - offset;
+ size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+ pages += size;
+ offset = 0;
+ }
+
+ return pages;
+}
/* Get packet from user space buffer */
static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
@@ -688,31 +710,15 @@ static ssize_t macvtap_get_user(struct m
if (unlikely(count > UIO_MAXIOV))
goto err;
- if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY))
- zerocopy = true;
-
- if (zerocopy) {
- /* Userspace may produce vectors with count greater than
- * MAX_SKB_FRAGS, so we need to linearize parts of the skb
- * to let the rest of data to be fit in the frags.
- */
- if (count > MAX_SKB_FRAGS) {
- copylen = iov_length(iv, count - MAX_SKB_FRAGS);
- if (copylen < vnet_hdr_len)
- copylen = 0;
- else
- copylen -= vnet_hdr_len;
- }
- /* There are 256 bytes to be copied in skb, so there is enough
- * room for skb expand head in case it is used.
- * The rest buffer is mapped from userspace.
- */
- if (copylen < vnet_hdr.hdr_len)
- copylen = vnet_hdr.hdr_len;
- if (!copylen)
- copylen = GOODCOPY_LEN;
+ if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
+ copylen = vnet_hdr.hdr_len ? vnet_hdr.hdr_len : GOODCOPY_LEN;
linear = copylen;
- } else {
+ if (iov_pages(iv, vnet_hdr_len + copylen, count)
+ <= MAX_SKB_FRAGS)
+ zerocopy = true;
+ }
+
+ if (!zerocopy) {
copylen = len;
linear = vnet_hdr.hdr_len;
}
@@ -724,9 +730,15 @@ static ssize_t macvtap_get_user(struct m
if (zerocopy)
err = zerocopy_sg_from_iovec(skb, iv, vnet_hdr_len, count);
- else
+ else {
err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len,
len);
+ if (!err && m && m->msg_control) {
+ struct ubuf_info *uarg = m->msg_control;
+ uarg->callback(uarg);
+ }
+ }
+
if (err)
goto err_kfree;
next prev parent reply other threads:[~2013-09-12 17:45 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-12 17:44 [ 00/23] 3.4.62-stable review Greg Kroah-Hartman
2013-09-12 17:44 ` [ 01/23] htb: fix sign extension bug Greg Kroah-Hartman
2013-09-13 5:04 ` [00/23] 3.4.62-stable review Guenter Roeck
2013-09-13 12:35 ` Greg Kroah-Hartman
2013-09-12 17:44 ` [ 02/23] net: check net.core.somaxconn sysctl values Greg Kroah-Hartman
2013-09-12 17:44 ` [ 03/23] neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup Greg Kroah-Hartman
2013-09-12 17:45 ` [ 04/23] bonding: modify only neigh_parms owned by us Greg Kroah-Hartman
2013-09-12 17:45 ` [ 05/23] fib_trie: remove potential out of bound access Greg Kroah-Hartman
2013-09-12 17:45 ` [ 06/23] tcp: cubic: fix overflow error in bictcp_update() Greg Kroah-Hartman
2013-09-12 17:45 ` [ 07/23] tcp: cubic: fix bug in bictcp_acked() Greg Kroah-Hartman
2013-09-12 17:45 ` [ 08/23] ipv6: dont stop backtracking in fib6_lookup_1 if subtree does not match Greg Kroah-Hartman
2013-09-12 17:45 ` [ 09/23] 8139cp: Fix skb leak in rx_status_loop failure path Greg Kroah-Hartman
2013-09-12 17:45 ` [ 10/23] tun: signedness bug in tun_get_user() Greg Kroah-Hartman
2013-09-12 17:45 ` [ 11/23] ipv6: remove max_addresses check from ipv6_create_tempaddr Greg Kroah-Hartman
2013-09-12 17:45 ` [ 12/23] ipv6: drop packets with multiple fragmentation headers Greg Kroah-Hartman
2013-09-12 17:45 ` [ 13/23] ipv6: Dont depend on per socket memory for neighbour discovery messages Greg Kroah-Hartman
2013-09-12 17:45 ` [ 14/23] net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay Greg Kroah-Hartman
2013-09-12 17:45 ` [ 15/23] ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO Greg Kroah-Hartman
2013-09-12 17:45 ` [ 16/23] net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv Greg Kroah-Hartman
2013-09-12 17:45 ` [ 17/23] vhost: zerocopy: poll vq in zerocopy callback Greg Kroah-Hartman
2013-09-12 17:45 ` Greg Kroah-Hartman [this message]
2013-09-12 17:45 ` [ 19/23] tipc: fix lockdep warning during bearer initialization Greg Kroah-Hartman
2013-09-12 17:45 ` [ 20/23] m32r: consistently use "suffix-$(...)" Greg Kroah-Hartman
2013-09-12 17:45 ` [ 21/23] m32r: add memcpy() for CONFIG_KERNEL_GZIP=y Greg Kroah-Hartman
2013-09-12 17:45 ` [ 22/23] m32r: make memset() global for CONFIG_KERNEL_BZIP2=y Greg Kroah-Hartman
2013-09-12 17:45 ` [ 23/23] Revert "KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions" Greg Kroah-Hartman
2013-09-13 23:02 ` [ 00/23] 3.4.62-stable review Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130912174453.735312348@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=davem@davemloft.net \
--cc=jasowang@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).