From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ben Hutchings <bhutchings@solarflare.com>
Cc: xiaohui.xin@intel.com, netdev@vger.kernel.org,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@elte.hu,
davem@davemloft.net, herbert@gondor.hengli.com.au,
jdike@linux.intel.com, arnd@arndb.de
Subject: Re: [PATCH v11 13/17] Add mp(mediate passthru) device.
Date: Tue, 28 Sep 2010 15:06:54 +0200
Message-ID: <20100928130654.GD14385@redhat.com>
In-Reply-To: <1285622613.2263.550.camel@achroite.uk.solarflarecom.com>
On Mon, Sep 27, 2010 at 10:23:33PM +0100, Ben Hutchings wrote:
> > +/* The main function to transform the guest user space address
> > + * to host kernel address via get_user_pages(). Thus the hardware
> > + * can do DMA directly to the external buffer address.
> > + */
> > +static struct page_info *alloc_page_info(struct page_ctor *ctor,
> > + struct kiocb *iocb, struct iovec *iov,
> > + int count, struct frag *frags,
> > + int npages, int total)
> > +{
> > + int rc;
> > + int i, j, n = 0;
> > + int len;
> > + unsigned long base, lock_limit;
> > + struct page_info *info = NULL;
> > +
> > + lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> > + lock_limit >>= PAGE_SHIFT;
> > +
> > + if (ctor->lock_pages + count > lock_limit && npages) {
> > + printk(KERN_INFO "exceed the locked memory rlimit.");
> > + return NULL;
> > + }
>
> What if the process is locking pages with mlock() as well? Doesn't this
> allow it to lock twice as many pages as it should be able to?
No, since locked_vm is incremented by both mp and mlock.
Or at least, that's the idea :)
In any case, twice as many would not be a big deal: admin can control
which processes can do this by controlling access to the device.
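To make "locked_vm is incremented by both" concrete, here is a minimal user-space model. It assumes both the mp path and mlock() charge one shared per-process counter, which stands in for mm->locked_vm, and that the limit is RLIMIT_MEMLOCK already shifted to pages as in the quoted lock_limit code. struct lock_account, charge_locked() and uncharge_locked() are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of per-process locked-page accounting. Every
 * source of locked pages (mlock(), the mp device) charges the same
 * counter, so their combined total is bounded by one limit. */
struct lock_account {
	unsigned long locked_vm;   /* pages locked so far, from any source */
	unsigned long limit_pages; /* RLIMIT_MEMLOCK >> PAGE_SHIFT */
};

/* Charge npages to the shared counter; refuse if the limit would be
 * exceeded. Relies on the invariant locked_vm <= limit_pages, so the
 * subtraction below cannot wrap. */
static bool charge_locked(struct lock_account *acct, unsigned long npages)
{
	assert(acct->locked_vm <= acct->limit_pages);
	if (npages > acct->limit_pages - acct->locked_vm)
		return false;
	acct->locked_vm += npages;
	return true;
}

/* Release pages previously charged with charge_locked(). */
static void uncharge_locked(struct lock_account *acct, unsigned long npages)
{
	assert(npages <= acct->locked_vm);
	acct->locked_vm -= npages;
}
```

With a 16-page limit, locking 10 pages via mlock() leaves room for only 6 more via mp, rather than another 16. If one path kept a private counter instead, each path could reach the limit independently, which is the "twice as many" case Ben asks about.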
> > + info = kmem_cache_alloc(ext_page_info_cache, GFP_KERNEL);
> > +
> > + if (!info)
> > + return NULL;
> > + info->skb = NULL;
> > + info->next = info->prev = NULL;
> > +
> > + for (i = j = 0; i < count; i++) {
> > + base = (unsigned long)iov[i].iov_base;
> > + len = iov[i].iov_len;
> > +
> > + if (!len)
> > + continue;
> > + n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> > +
> > + rc = get_user_pages_fast(base, n, npages ? 1 : 0,
> > + &info->pages[j]);
> > + if (rc != n)
> > + goto failed;
> > +
> > + while (n--) {
> > + frags[j].offset = base & ~PAGE_MASK;
> > + frags[j].size = min_t(int, len,
> > + PAGE_SIZE - frags[j].offset);
> > + len -= frags[j].size;
> > + base += frags[j].size;
> > + j++;
> > + }
> > + }
> > +
> > +#ifdef CONFIG_HIGHMEM
> > + if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
> > + for (i = 0; i < j; i++) {
> > + if (PageHighMem(info->pages[i]))
> > + goto failed;
> > + }
> > + }
> > +#endif
>
> Shouldn't you try to allocate lowmem pages explicitly, rather than
> failing at this point?
We don't allocate pages, we lock given pages. Once this is
integrated in macvtap presumably we'll fall back on data copy
for such devices.
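The fallback described above amounts to a per-request decision: use zero-copy only when the device can DMA to every pinned page, and otherwise copy rather than fail. The sketch below assumes that policy (the reply only says macvtap "presumably" will do this). struct fake_page and choose_xfer_mode() are hypothetical stand-ins for struct page, PageHighMem() and the NETIF_F_HIGHDMA check, not the real kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the highmem bit matters. */
struct fake_page {
	bool highmem;
};

enum xfer_mode { XFER_ZEROCOPY, XFER_COPY };

/* Pick zero-copy if the device supports high-memory DMA or no pinned
 * page is in high memory; otherwise fall back to copying the data. */
static enum xfer_mode choose_xfer_mode(const struct fake_page *pages,
				       size_t npages, bool dev_highdma)
{
	assert(pages != NULL || npages == 0);
	if (dev_highdma)
		return XFER_ZEROCOPY;
	for (size_t i = 0; i < npages; i++)
		if (pages[i].highmem)
			return XFER_COPY;
	return XFER_ZEROCOPY;
}
```

In the kernel, pages already pinned by get_user_pages_fast() would have to be released with put_page() before taking the copy path.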
...
> > + skb_reserve(skb, NET_IP_ALIGN);
> > + skb_put(skb, len);
> > +
> > + if (skb_copy_datagram_from_iovec(skb, 0, iov, 0, len)) {
> > + kfree_skb(skb);
> > + return -EAGAIN;
> > + }
> > +
> > + skb->protocol = eth_type_trans(skb, mp->dev);
>
> Why are you calling eth_type_trans() on transmit?
So that GSO can work. BTW macvtap does:
skb_set_network_header(skb, ETH_HLEN);
skb_reset_mac_header(skb);
skb->protocol = eth_hdr(skb)->h_proto;
and I think this is broken for vlans. Arnd?
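One reading of the VLAN concern: for an 802.1Q-tagged frame, h_proto holds the TPID 0x8100, and the real ethertype sits 4 bytes later. A fixed ETH_HLEN network-header offset then lands on the VLAN tag instead of the IP header. The sketch below is a user-space parser that illustrates the offsets, not kernel code. frame_proto() and get_be16() are hypothetical helpers, and it handles a single tag only (no stacked tags). The constants mirror the kernel's ETH_HLEN, ETH_P_8021Q (<linux/if_ether.h>) and VLAN_HLEN (<linux/if_vlan.h>).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN     14      /* dst MAC + src MAC + ethertype */
#define VLAN_HLEN    4       /* TPID + TCI */
#define ETH_P_8021Q  0x8100  /* 802.1Q tag protocol identifier */

/* Read a big-endian 16-bit field. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return the payload ethertype and store the network header offset in
 * *nh_off, skipping one 802.1Q tag if present. Returns 0 if the frame
 * is too short to contain the headers. */
static uint16_t frame_proto(const uint8_t *frame, size_t len, size_t *nh_off)
{
	uint16_t proto;

	assert(frame != NULL && nh_off != NULL);
	if (len < ETH_HLEN)
		return 0;
	proto = get_be16(frame + 12);     /* h_proto field */
	*nh_off = ETH_HLEN;
	if (proto == ETH_P_8021Q) {
		if (len < ETH_HLEN + VLAN_HLEN)
			return 0;
		/* Inner ethertype follows the 2-byte TCI. */
		proto = get_be16(frame + ETH_HLEN + 2);
		*nh_off = ETH_HLEN + VLAN_HLEN;
	}
	return proto;
}
```

For an untagged IPv4 frame this yields 0x0800 at offset 14. For a tagged one it yields 0x0800 at offset 18, whereas reading h_proto directly would give 0x8100 at offset 14.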
--
MST