From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ben Hutchings <bhutchings@solarflare.com>
Cc: xiaohui.xin@intel.com, netdev@vger.kernel.org,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@elte.hu,
davem@davemloft.net, herbert@gondor.hengli.com.au,
jdike@linux.intel.com, arnd@arndb.de
Subject: Re: [PATCH v11 13/17] Add mp(mediate passthru) device.
Date: Tue, 28 Sep 2010 15:06:54 +0200
Message-ID: <20100928130654.GD14385@redhat.com>
In-Reply-To: <1285622613.2263.550.camel@achroite.uk.solarflarecom.com>
On Mon, Sep 27, 2010 at 10:23:33PM +0100, Ben Hutchings wrote:
> > +/* The main function to transform the guest user space address
> > + * to host kernel address via get_user_pages(). Thus the hardware
> > + * can do DMA directly to the external buffer address.
> > + */
> > +static struct page_info *alloc_page_info(struct page_ctor *ctor,
> > + struct kiocb *iocb, struct iovec *iov,
> > + int count, struct frag *frags,
> > + int npages, int total)
> > +{
> > + int rc;
> > + int i, j, n = 0;
> > + int len;
> > + unsigned long base, lock_limit;
> > + struct page_info *info = NULL;
> > +
> > + lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> > + lock_limit >>= PAGE_SHIFT;
> > +
> > + if (ctor->lock_pages + count > lock_limit && npages) {
> > + printk(KERN_INFO "exceed the locked memory rlimit.");
> > + return NULL;
> > + }
>
> What if the process is locking pages with mlock() as well? Doesn't this
> allow it to lock twice as many pages as it should be able to?
No, since locked_vm is incremented by both mp and mlock.
Or at least, that's the idea :)
In any case, twice as many would not be a big deal: admin can control
which processes can do this by controlling access to the device.
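
To illustrate the shared-counter idea in userspace terms, here is a minimal sketch, not the patch's code. It assumes every pinning path (mlock() or the device) charges one common per-process page count, so the limit is checked against the sum. The helper names memlock_limit_pages() and would_exceed() are hypothetical; only getrlimit() and sysconf() are real calls.

```c
#include <stdbool.h>
#include <sys/resource.h>
#include <unistd.h>

/* Soft RLIMIT_MEMLOCK expressed in pages.
 * RLIM_INFINITY is passed through unchanged; 0 is returned on error
 * so that callers fail closed. */
rlim_t memlock_limit_pages(void)
{
	struct rlimit rl;
	long page_size = sysconf(_SC_PAGESIZE);

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0 || page_size <= 0)
		return 0;
	if (rl.rlim_cur == RLIM_INFINITY)
		return RLIM_INFINITY;
	return rl.rlim_cur / (rlim_t)page_size;
}

/* Hypothetical helper: `locked` is the number of pages already charged
 * to the process by any path, `want` is the number about to be pinned.
 * Written to avoid unsigned overflow in locked + want. */
bool would_exceed(rlim_t locked, rlim_t want, rlim_t limit)
{
	if (limit == RLIM_INFINITY)
		return false;
	if (locked > limit)
		return true;
	return want > limit - locked;
}
```

With a single counter, a process that has already mlock()ed 9 pages out of a 10-page limit can pin one more page through the device but not two.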
> > + info = kmem_cache_alloc(ext_page_info_cache, GFP_KERNEL);
> > +
> > + if (!info)
> > + return NULL;
> > + info->skb = NULL;
> > + info->next = info->prev = NULL;
> > +
> > + for (i = j = 0; i < count; i++) {
> > + base = (unsigned long)iov[i].iov_base;
> > + len = iov[i].iov_len;
> > +
> > + if (!len)
> > + continue;
> > + n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> > +
> > + rc = get_user_pages_fast(base, n, npages ? 1 : 0,
> > + &info->pages[j]);
> > + if (rc != n)
> > + goto failed;
> > +
> > + while (n--) {
> > + frags[j].offset = base & ~PAGE_MASK;
> > + frags[j].size = min_t(int, len,
> > + PAGE_SIZE - frags[j].offset);
> > + len -= frags[j].size;
> > + base += frags[j].size;
> > + j++;
> > + }
> > + }
> > +
> > +#ifdef CONFIG_HIGHMEM
> > + if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
> > + for (i = 0; i < j; i++) {
> > + if (PageHighMem(info->pages[i]))
> > + goto failed;
> > + }
> > + }
> > +#endif
>
> Shouldn't you try to allocate lowmem pages explicitly, rather than
> failing at this point?
We don't allocate pages, we lock given pages. Once this is
integrated in macvtap presumably we'll fall back on data copy
for such devices.
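
For reference, since the pages are supplied by the caller, the only arithmetic in the quoted loop is working out how many pages an iovec entry spans and how it splits into per-page fragments. A userspace sketch of that arithmetic follows. It assumes 4 KiB pages, and the SK_* macros, struct sk_frag, span_pages() and split_frags() are names made up for this illustration.

```c
#include <stddef.h>

#define SK_PAGE_SHIFT 12UL                    /* assumption: 4 KiB pages */
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))   /* same sense as the kernel's PAGE_MASK */

struct sk_frag {
	unsigned long offset;   /* offset of the data within its page */
	unsigned long size;     /* bytes of the buffer that live in this page */
};

/* Number of pages touched by [base, base + len): in-page offset plus
 * length, rounded up to a whole number of pages. */
unsigned long span_pages(unsigned long base, unsigned long len)
{
	return ((base & ~SK_PAGE_MASK) + len + ~SK_PAGE_MASK) >> SK_PAGE_SHIFT;
}

/* Fill frags[] with one entry per page touched.
 * Returns the number of entries written, or 0 if `max` is too small. */
unsigned long split_frags(unsigned long base, unsigned long len,
			  struct sk_frag *frags, unsigned long max)
{
	unsigned long n = span_pages(base, len);
	unsigned long j;

	if (n > max)
		return 0;

	for (j = 0; j < n; j++) {
		unsigned long off = base & ~SK_PAGE_MASK;
		unsigned long room = SK_PAGE_SIZE - off;

		frags[j].offset = off;
		frags[j].size = len < room ? len : room;
		len -= frags[j].size;
		base += frags[j].size;
	}
	return n;
}
```

For example, a 0x20-byte buffer at address 0x1ff0 spans two pages: 0x10 bytes at offset 0xff0 of the first and 0x10 bytes at offset 0 of the second.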
...
> > + skb_reserve(skb, NET_IP_ALIGN);
> > + skb_put(skb, len);
> > +
> > + if (skb_copy_datagram_from_iovec(skb, 0, iov, 0, len)) {
> > + kfree_skb(skb);
> > + return -EAGAIN;
> > + }
> > +
> > + skb->protocol = eth_type_trans(skb, mp->dev);
>
> Why are you calling eth_type_trans() on transmit?
So that GSO can work. BTW macvtap does:
	skb_set_network_header(skb, ETH_HLEN);
	skb_reset_mac_header(skb);
	skb->protocol = eth_hdr(skb)->h_proto;
and I think this is broken for vlans. Arnd?
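
To make the vlan concern concrete, here is a small userspace sketch, not kernel code. On an 802.1Q-tagged frame the 16-bit field at the h_proto position reads 0x8100, and the payload's ethertype sits four bytes further in, after the tag. Whether that actually breaks GSO in macvtap is the open question above; the sketch only shows where the two values live. The SK_* constants and both function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_ALEN     6        /* bytes per MAC address */
#define SK_ETH_HLEN     14       /* dst MAC + src MAC + ethertype */
#define SK_VLAN_HLEN    4        /* TPID (2 bytes) + TCI (2 bytes) */
#define SK_ETH_P_8021Q  0x8100   /* 802.1Q tag protocol identifier */

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t be16_at(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* What reading h_proto straight from the Ethernet header yields.
 * Returns 0 if the frame is too short to hold a header. */
uint16_t outer_ethertype(const uint8_t *frame, size_t len)
{
	if (len < SK_ETH_HLEN)
		return 0;
	return be16_at(frame + 2 * SK_ETH_ALEN);
}

/* Skip any 0x8100 tags and return the encapsulated ethertype.
 * Returns 0 if the frame ends before one is found. */
uint16_t inner_ethertype(const uint8_t *frame, size_t len)
{
	size_t off = 2 * SK_ETH_ALEN;
	uint16_t proto;

	if (len < SK_ETH_HLEN)
		return 0;

	proto = be16_at(frame + off);
	while (proto == SK_ETH_P_8021Q) {
		off += SK_VLAN_HLEN;
		if (len < off + 2)
			return 0;
		proto = be16_at(frame + off);
	}
	return proto;
}
```

For an untagged frame both functions return the same value; for a tagged IPv4 frame the first returns 0x8100 and the second 0x0800.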
--
MST