All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v6 0/10] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
@ 2014-03-04 22:32 Zoltan Kiss
  2014-03-04 22:32 ` [PATCH net-next v6 1/10] xen-netback: Use skb->cb for pending_idx Zoltan Kiss
                   ` (15 more replies)
  0 siblings, 16 replies; 46+ messages in thread
From: Zoltan Kiss @ 2014-03-04 22:32 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel
  Cc: netdev, linux-kernel, jonathan.davies, Zoltan Kiss

A long known problem of the upstream netback implementation that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge perfomance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore haven't progressed very well.
This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower AMD
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch)
Based on my investigations the packet get only copied if it is delivered to
Dom0 IP stack through deliver_skb, which is due to this [2] patch. This affects
DomU->Dom0 IP traffic and when Dom0 does routing/NAT for the guest. That's a bit
unfortunate, but luckily it doesn't cause a major regression for this usecase.
In the future we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
  us avoiding TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I've tried to broke it down to smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Use skb->cb to store pending_idx
2: Some refactoring
3: Change RX path for mapped SKB fragments (moved here to keep bisectability,
review it after #4)
4: Introduce TX grant mapping
5: Remove old TX grant copy definitons and fix indentations
6: Add stat counters for zerocopy
7: Handle guests with too many frags
8: Add stat counters for frag_list skbs
9: Timeout packets in RX path
10: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling the important use case when an older guest
sends lots of slots. Instead of delayed copy now we timeout packets on the RX
path, based on the assumption that otherwise packets should get stucked
anywhere else. Finally some unmap batching to avoid too much TLB flush

v3: Apart from fixing a few things mentioned in responses the important change
is the use the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.

v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic changed also.

v5: Only minor fixes based on Wei's comments

v6: Important bugfixes for xenvif_poll exit path and zerocopy callback, see
first 2 patches. Also rework of handling packets with too many slots, and
reorder the series a bit.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>


^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: [PATCH net-next v6 4/10] xen-netback: Introduce TX grant mapping
@ 2014-03-05  2:19 Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-05  2:19 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: jonathan.davies, wei.liu2, ian.campbell, netdev, linux-kernel,
	xen-devel


On Mar 4, 2014 5:32 PM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>
> This patch introduces grant mapping on netback TX path. It replaces grant copy 
> operations, ditching grant copy coalescing along the way. Another solution for 
> copy coalescing is introduced in patch #7, older guests and Windows can broke

Don't say #7 patch as that information is stripped from the patch when it goes in git tree. Instead use the full name of the patch.

> before that patch applies. 
> There is a callback (xenvif_zerocopy_callback) from core stack to release the 
> slots back to the guests when kfree_skb or skb_orphan_frags called. It feeds a 
> separate dealloc thread, as scheduling NAPI instance from there is inefficient, 
> therefore we can't do dealloc from the instance. 
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> 
> --- 
> History of the original #1 patch 
> v2: 
> - move unmapping to separate thread. The NAPI instance has to be scheduled 
>   even from thread context, which can cause huge delays 
> - that causes unfortunately bigger struct xenvif 
> - store grant handle after checking validity 
>
> v3: 
> - fix comment in xenvif_tx_dealloc_action() 
> - call unmap hypercall directly instead of gnttab_unmap_refs(), which does 
>   unnecessary m2p_override. Also remove pages_to_[un]map members 
> - BUG() if grant_tx_handle corrupted 
>
> v4: 
> - fix indentations and comments 
> - use bool for tx_dealloc_work_todo 
> - BUG() if grant_tx_handle corrupted - now really :) 
> - go back to gnttab_unmap_refs, now we rely on API changes 
>
> v5: 
> - remove hypercall from xenvif_idx_unmap 
> - remove stray line in xenvif_tx_create_gop 
> - simplify tx_dealloc_work_todo 
> - BUG() in xenvif_idx_unmap 
>
> History of the original #2 patch: 
> v2: 
> - delete branch for handling fragmented packets fit PKT_PROT_LEN sized first 
>   request 
> - mark the effect of using ballooned pages in a comment 
> - place setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right 
>   before netif_receive_skb, and mark the importance of it 
> - grab dealloc_lock before __napi_complete to avoid contention with the 
>   callback's napi_schedule 
> - handle fragmented packets where first request < PKT_PROT_LEN 
> - fix up error path when checksum_setup failed 
> - check before teardown for pending grants, and start complain if they are 
>   there after 10 second 
>
> v3: 
> - delete a surplus checking from tx_action 
> - remove stray line 
> - squash xenvif_idx_unmap changes into the first patch 
> - init spinlocks 
> - call map hypercall directly instead of gnttab_map_refs() 
> - fix unmapping timeout in xenvif_free() 
>
> v4: 
> - fix indentations and comments 
> - handle errors of set_phys_to_machine 
> - go back to gnttab_map_refs instead of direct hypercall. Now we rely on the 
>   modified API 
>
> v5: 
> - BUG_ON(vif->dealloc_task) in xenvif_connect 
> - use 'task' in xenvif_connect for thread creation 
> - proper return value if alloc_xenballooned_pages fails 
> - BUG in xenvif_tx_check_gop if stale handle found 
>
> Joint history: 
>
> v6: 
> - I wanted to avoid napi_schedule in zerocopy_callback, as it has a performance 
>   penalty if called from another vif thread's context. That's why I moved 
>   dealloc to a separate thread. But there are certain situations where it's 
>   necessary, so do it. Fortunately it doesn't happen too often, I couldn't see 
>   performance regression in iperf tests 
> - tx_dealloc_work_todo missing ; 
> - introduce xenvif_tx_pending_slots_available() as that code is used often 
> - move some definitions to common.h due to previous 
> - new ubuf_to_vif and xenvif_grant_handle_[re]set helpers 
> - call xenvif_idx_release from xenvif_unmap insted of right after it 
> - improve comments 
> - add more BUG_ONs 
> - check xenvif_tx_pending_slots_available before napi_complete, otherwise NAPI 
>   will keep polling on that instance despite it can't do anything as there is no 
>   room for pending requests. It becomes even worse if the pending packets are 
>   waiting for an another instance on the same CPU. 
> - xenvif_fill_frags handles prev_pending_idx independently 
>
> drivers/net/xen-netback/common.h    |   33 ++- 
> drivers/net/xen-netback/interface.c |   67 +++++- 
> drivers/net/xen-netback/netback.c   |  424 ++++++++++++++++++++++------------- 
> 3 files changed, 362 insertions(+), 162 deletions(-) 
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h 
> index 2f6d772..de1b799 100644 
> --- a/drivers/net/xen-netback/common.h 
> +++ b/drivers/net/xen-netback/common.h 
> @@ -79,6 +79,11 @@ struct pending_tx_info { 
>   * if it is head of one or more tx 
>   * reqs 
>   */ 
> + /* callback data for released SKBs. The callback is always 
> + * xenvif_zerocopy_callback, ctx points to the next fragment, desc 
> + * contains the pending_idx 
> + */ 
> + struct ubuf_info callback_struct; 
> }; 
>
> #define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) 
> @@ -136,13 +141,31 @@ struct xenvif { 
> pending_ring_idx_t pending_cons; 
> u16 pending_ring[MAX_PENDING_REQS]; 
> struct pending_tx_info pending_tx_info[MAX_PENDING_REQS]; 
> + grant_handle_t grant_tx_handle[MAX_PENDING_REQS]; 
>
> /* Coalescing tx requests before copying makes number of grant 
> * copy ops greater or equal to number of slots required. In 
> * worst case a tx request consumes 2 gnttab_copy. 
> */ 
> struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS]; 
> - 
> + struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS]; 
> + struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS]; 
> + /* passed to gnttab_[un]map_refs with pages under (un)mapping */ 
> + struct page *pages_to_map[MAX_PENDING_REQS]; 
> + struct page *pages_to_unmap[MAX_PENDING_REQS]; 
> + 
> + /* This prevents zerocopy callbacks  to race over dealloc_ring */ 
> + spinlock_t callback_lock; 
> + /* This prevents dealloc thread and NAPI instance to race over response 
> + * creation and pending_ring in xenvif_idx_release. In xenvif_tx_err 
> + * it only protect response creation 
> + */ 
> + spinlock_t response_lock; 
> + pending_ring_idx_t dealloc_prod; 
> + pending_ring_idx_t dealloc_cons; 
> + u16 dealloc_ring[MAX_PENDING_REQS]; 
> + struct task_struct *dealloc_task; 
> + wait_queue_head_t dealloc_wq; 
>
> /* Use kthread for guest RX */ 
> struct task_struct *task; 
> @@ -229,6 +252,8 @@ int xenvif_tx_action(struct xenvif *vif, int budget); 
> int xenvif_kthread_guest_rx(void *data); 
> void xenvif_kick_thread(struct xenvif *vif); 
>
> +int xenvif_dealloc_kthread(void *data); 
> + 
> /* Determine whether the needed number of slots (req) are available, 
>   * and set req_event if not. 
>   */ 
> @@ -236,6 +261,12 @@ bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed); 
>
> void xenvif_stop_queue(struct xenvif *vif); 
>
> +/* Callback from stack when TX packet can be released */ 
> +void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success); 
> + 
> +/* Unmap a pending page and release it back to the guest */ 
> +void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx); 
> + 
> static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif) 
> { 
> return MAX_PENDING_REQS - 
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c 
> index bc32627..82af643 100644 
> --- a/drivers/net/xen-netback/interface.c 
> +++ b/drivers/net/xen-netback/interface.c 
> @@ -38,6 +38,7 @@ 
>
> #include <xen/events.h> 
> #include <asm/xen/hypercall.h> 
> +#include <xen/balloon.h> 
>
> #define XENVIF_QUEUE_LENGTH 32 
> #define XENVIF_NAPI_WEIGHT  64 
> @@ -87,7 +88,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget) 
> local_irq_save(flags); 
>
> RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do); 
> - if (!more_to_do) 
> + if (!(more_to_do && 
> +       xenvif_tx_pending_slots_available(vif))) 
> __napi_complete(napi); 
>
> local_irq_restore(flags); 
> @@ -121,7 +123,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev) 
> BUG_ON(skb->dev != dev); 
>
> /* Drop the packet if vif is not ready */ 
> - if (vif->task == NULL || !xenvif_schedulable(vif)) 
> + if (vif->task == NULL || 
> +     vif->dealloc_task == NULL || 
> +     !xenvif_schedulable(vif)) 
> goto drop; 
>
> /* At best we'll need one slot for the header and one for each 
> @@ -343,8 +347,26 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, 
> vif->pending_prod = MAX_PENDING_REQS; 
> for (i = 0; i < MAX_PENDING_REQS; i++) 
> vif->pending_ring[i] = i; 
> - for (i = 0; i < MAX_PENDING_REQS; i++) 
> - vif->mmap_pages[i] = NULL; 
> + spin_lock_init(&vif->callback_lock); 
> + spin_lock_init(&vif->response_lock); 
> + /* If ballooning is disabled, this will consume real memory, so you 
> + * better enable it. The long term solution would be to use just a 
> + * bunch of valid page descriptors, without dependency on ballooning 
> + */ 
> + err = alloc_xenballooned_pages(MAX_PENDING_REQS, 
> +        vif->mmap_pages, 
> +        false); 
> + if (err) { 
> + netdev_err(dev, "Could not reserve mmap_pages\n"); 
> + return ERR_PTR(-ENOMEM); 
> + } 
> + for (i = 0; i < MAX_PENDING_REQS; i++) { 
> + vif->pending_tx_info[i].callback_struct = (struct ubuf_info) 
> + { .callback = xenvif_zerocopy_callback, 
> +   .ctx = NULL, 
> +   .desc = i }; 
> + vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE; 
> + } 
>
> /* 
> * Initialise a dummy MAC address. We choose the numerically 
> @@ -382,12 +404,14 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, 
>
> BUG_ON(vif->tx_irq); 
> BUG_ON(vif->task); 
> + BUG_ON(vif->dealloc_task); 
>
> err = xenvif_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref); 
> if (err < 0) 
> goto err; 
>
> init_waitqueue_head(&vif->wq); 
> + init_waitqueue_head(&vif->dealloc_wq); 
>
> if (tx_evtchn == rx_evtchn) { 
> /* feature-split-event-channels == 0 */ 
> @@ -431,6 +455,18 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, 
>
> vif->task = task; 
>
> + task = kthread_create(xenvif_dealloc_kthread, 
> +    (void *)vif, 
> +    "%s-dealloc", 
> +    vif->dev->name); 
> + if (IS_ERR(task)) { 
> + pr_warn("Could not allocate kthread for %s\n", vif->dev->name); 
> + err = PTR_ERR(task); 
> + goto err_rx_unbind; 
> + } 
> + 
> + vif->dealloc_task = task; 
> + 
> rtnl_lock(); 
> if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN) 
> dev_set_mtu(vif->dev, ETH_DATA_LEN); 
> @@ -441,6 +477,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref, 
> rtnl_unlock(); 
>
> wake_up_process(vif->task); 
> + wake_up_process(vif->dealloc_task); 
>
> return 0; 
>
> @@ -478,6 +515,11 @@ void xenvif_disconnect(struct xenvif *vif) 
> vif->task = NULL; 
> } 
>
> + if (vif->dealloc_task) { 
> + kthread_stop(vif->dealloc_task); 
> + vif->dealloc_task = NULL; 
> + } 
> + 
> if (vif->tx_irq) { 
> if (vif->tx_irq == vif->rx_irq) 
> unbind_from_irqhandler(vif->tx_irq, vif); 
> @@ -493,6 +535,23 @@ void xenvif_disconnect(struct xenvif *vif) 
>
> void xenvif_free(struct xenvif *vif) 
> { 
> + int i, unmap_timeout = 0; 
> + 
> + for (i = 0; i < MAX_PENDING_REQS; ++i) { 
> + if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) { 
> + unmap_timeout++; 
> + schedule_timeout(msecs_to_jiffies(1000)); 
> + if (unmap_timeout > 9 && 
> +     net_ratelimit()) 
> + netdev_err(vif->dev, 
> +    "Page still granted! Index: %x\n", 
> +    i); 
> + i = -1; 
> + } 
> + } 
> + 
> + free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages); 
> + 
> netif_napi_del(&vif->napi); 
>
> unregister_netdev(vif->dev); 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c 
> index 91d7375..8a94338 100644 
> --- a/drivers/net/xen-netback/netback.c 
> +++ b/drivers/net/xen-netback/netback.c 
> @@ -101,10 +101,17 @@ static inline unsigned long idx_to_kaddr(struct xenvif *vif, 
> return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx)); 
> } 
>
> +/* Find the containing VIF's structure from a pointer in pending_tx_info */ 
> static inline struct xenvif * ubuf_to_vif(struct ubuf_info *ubuf) 
> { 
> - return NULL; 
> + u16 pending_idx = ubuf->desc; 
> + struct pending_tx_info *temp = 
> + container_of(ubuf, struct pending_tx_info, callback_struct); 
> + return container_of(temp - pending_idx, 
> +     struct xenvif, 
> +     pending_tx_info[0]); 
> } 
> + 
> /* This is a miniumum size for the linear area to avoid lots of 
>   * calls to __pskb_pull_tail() as we set up checksum offsets. The 
>   * value 128 was chosen as it covers all IPv4 and most likely 
> @@ -664,9 +671,12 @@ static void xenvif_tx_err(struct xenvif *vif, 
>   struct xen_netif_tx_request *txp, RING_IDX end) 
> { 
> RING_IDX cons = vif->tx.req_cons; 
> + unsigned long flags; 
>
> do { 
> + spin_lock_irqsave(&vif->response_lock, flags); 
> make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR); 
> + spin_unlock_irqrestore(&vif->response_lock, flags); 
> if (cons == end) 
> break; 
> txp = RING_GET_REQUEST(&vif->tx, cons++); 
> @@ -791,10 +801,24 @@ static struct page *xenvif_alloc_page(struct xenvif *vif, 
> return page; 
> } 
>
> -static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif, 
> -        struct sk_buff *skb, 
> -        struct xen_netif_tx_request *txp, 
> -        struct gnttab_copy *gop) 
> +static inline void xenvif_tx_create_gop(struct xenvif *vif, 
> + u16 pending_idx, 
> + struct xen_netif_tx_request *txp, 
> + struct gnttab_map_grant_ref *gop) 
> +{ 
> + vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx]; 
> + gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx), 
> +   GNTMAP_host_map | GNTMAP_readonly, 
> +   txp->gref, vif->domid); 
> + 
> + memcpy(&vif->pending_tx_info[pending_idx].req, txp, 
> +        sizeof(*txp)); 
> +} 
> + 
> +static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif, 
> + struct sk_buff *skb, 
> + struct xen_netif_tx_request *txp, 
> + struct gnttab_map_grant_ref *gop) 
> { 
> struct skb_shared_info *shinfo = skb_shinfo(skb); 
> skb_frag_t *frags = shinfo->frags; 
> @@ -815,83 +839,12 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif, 
> /* Skip first skb fragment if it is on same page as header fragment. */ 
> start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx); 
>
> - /* Coalesce tx requests, at this point the packet passed in 
> - * should be <= 64K. Any packets larger than 64K have been 
> - * handled in xenvif_count_requests(). 
> - */ 
> - for (shinfo->nr_frags = slot = start; slot < nr_slots; 
> -      shinfo->nr_frags++) { 
> - struct pending_tx_info *pending_tx_info = 
> - vif->pending_tx_info; 
> - 
> - page = alloc_page(GFP_ATOMIC|__GFP_COLD); 
> - if (!page) 
> - goto err; 
> - 
> - dst_offset = 0; 
> - first = NULL; 
> - while (dst_offset < PAGE_SIZE && slot < nr_slots) { 
> - gop->flags = GNTCOPY_source_gref; 
> - 
> - gop->source.u.ref = txp->gref; 
> - gop->source.domid = vif->domid; 
> - gop->source.offset = txp->offset; 
> - 
> - gop->dest.domid = DOMID_SELF; 
> - 
> - gop->dest.offset = dst_offset; 
> - gop->dest.u.gmfn = virt_to_mfn(page_address(page)); 
> - 
> - if (dst_offset + txp->size > PAGE_SIZE) { 
> - /* This page can only merge a portion 
> - * of tx request. Do not increment any 
> - * pointer / counter here. The txp 
> - * will be dealt with in future 
> - * rounds, eventually hitting the 
> - * `else` branch. 
> - */ 
> - gop->len = PAGE_SIZE - dst_offset; 
> - txp->offset += gop->len; 
> - txp->size -= gop->len; 
> - dst_offset += gop->len; /* quit loop */ 
> - } else { 
> - /* This tx request can be merged in the page */ 
> - gop->len = txp->size; 
> - dst_offset += gop->len; 
> - 
> + for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots; 
> +      shinfo->nr_frags++, txp++, gop++) { 
> index = pending_index(vif->pending_cons++); 
> - 
> pending_idx = vif->pending_ring[index]; 
> - 
> - memcpy(&pending_tx_info[pending_idx].req, txp, 
> -        sizeof(*txp)); 
> - 
> - /* Poison these fields, corresponding 
> - * fields for head tx req will be set 
> - * to correct values after the loop. 
> - */ 
> - vif->mmap_pages[pending_idx] = (void *)(~0UL); 
> - pending_tx_info[pending_idx].head = 
> - INVALID_PENDING_RING_IDX; 
> - 
> - if (!first) { 
> - first = &pending_tx_info[pending_idx]; 
> - start_idx = index; 
> - head_idx = pending_idx; 
> - } 
> - 
> - txp++; 
> - slot++; 
> - } 
> - 
> - gop++; 
> - } 
> - 
> - first->req.offset = 0; 
> - first->req.size = dst_offset; 
> - first->head = start_idx; 
> - vif->mmap_pages[head_idx] = page; 
> - frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx); 
> + xenvif_tx_create_gop(vif, pending_idx, txp, gop); 
> + frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx); 
> } 
>
> BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS); 
> @@ -911,11 +864,38 @@ err: 
> return NULL; 
> } 
>
> +static inline void xenvif_grant_handle_set(struct xenvif *vif, 
> +    u16 pending_idx, 
> +    grant_handle_t handle) 
> +{ 
> + if (unlikely(vif->grant_tx_handle[pending_idx] != 
> +      NETBACK_INVALID_HANDLE)) { 
> + netdev_err(vif->dev, 
> +    "Trying to unmap invalid handle! " 
> +    "pending_idx: %x\n", pending_idx); 
> + BUG(); 
> + } 
> + vif->grant_tx_handle[pending_idx] = handle; 
> +} 
> + 
> +static inline void xenvif_grant_handle_reset(struct xenvif *vif, 
> +      u16 pending_idx) 
> +{ 
> + if (unlikely(vif->grant_tx_handle[pending_idx] == 
> +      NETBACK_INVALID_HANDLE)) { 
> + netdev_err(vif->dev, 
> +    "Trying to unmap invalid handle! " 
> +    "pending_idx: %x\n", pending_idx); 
> + BUG(); 
> + } 
> + vif->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE; 
> +} 
> + 
> static int xenvif_tx_check_gop(struct xenvif *vif, 
>        struct sk_buff *skb, 
> -        struct gnttab_copy **gopp) 
> +        struct gnttab_map_grant_ref **gopp) 
> { 
> - struct gnttab_copy *gop = *gopp; 
> + struct gnttab_map_grant_ref *gop = *gopp; 
> u16 pending_idx = *((u16 *)skb->cb); 
> struct skb_shared_info *shinfo = skb_shinfo(skb); 
> struct pending_tx_info *tx_info; 
> @@ -927,6 +907,8 @@ static int xenvif_tx_check_gop(struct xenvif *vif, 
> err = gop->status; 
> if (unlikely(err)) 
> xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR); 
> + else 
> + xenvif_grant_handle_set(vif, pending_idx , gop->handle); 
>
> /* Skip first skb fragment if it is on same page as header fragment. */ 
> start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx); 
> @@ -940,18 +922,13 @@ static int xenvif_tx_check_gop(struct xenvif *vif, 
> head = tx_info->head; 
>
> /* Check error status: if okay then remember grant handle. */ 
> - do { 
> newerr = (++gop)->status; 
> - if (newerr) 
> - break; 
> - peek = vif->pending_ring[pending_index(++head)]; 
> - } while (!pending_tx_is_head(vif, peek)); 
>
> if (likely(!newerr)) { 
> + xenvif_grant_handle_set(vif, pending_idx , gop->handle); 
> /* Had a previous error? Invalidate this fragment. */ 
> if (unlikely(err)) 
> - xenvif_idx_release(vif, pending_idx, 
> -    XEN_NETIF_RSP_OKAY); 
> + xenvif_idx_unmap(vif, pending_idx); 
> continue; 
> } 
>
> @@ -964,11 +941,10 @@ static int xenvif_tx_check_gop(struct xenvif *vif, 
>
> /* First error: invalidate header and preceding fragments. */ 
> pending_idx = *((u16 *)skb->cb); 
> - xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); 
> + xenvif_idx_unmap(vif, pending_idx); 
> for (j = start; j < i; j++) { 
> pending_idx = frag_get_pending_idx(&shinfo->frags[j]); 
> - xenvif_idx_release(vif, pending_idx, 
> -    XEN_NETIF_RSP_OKAY); 
> + xenvif_idx_unmap(vif, pending_idx); 
> } 
>
> /* Remember the error: invalidate all subsequent fragments. */ 
> @@ -984,6 +960,10 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb) 
> struct skb_shared_info *shinfo = skb_shinfo(skb); 
> int nr_frags = shinfo->nr_frags; 
> int i; 
> + u16 prev_pending_idx = INVALID_PENDING_IDX; 
> + 
> + if (skb_shinfo(skb)->destructor_arg) 
> + prev_pending_idx = skb->cb; 
>
> for (i = 0; i < nr_frags; i++) { 
> skb_frag_t *frag = shinfo->frags + i; 
> @@ -993,6 +973,17 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb) 
>
> pending_idx = frag_get_pending_idx(frag); 
>
> + /* If this is not the first frag, chain it to the previous*/ 
> + if (unlikely(prev_pending_idx == INVALID_PENDING_IDX)) 
> + skb_shinfo(skb)->destructor_arg = 
> + &vif->pending_tx_info[pending_idx].callback_struct; 
> + else if (likely(pending_idx != prev_pending_idx)) 
> + vif->pending_tx_info[prev_pending_idx].callback_struct.ctx = 
> + &(vif->pending_tx_info[pending_idx].callback_struct); 
> + 
> + vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL; 
> + prev_pending_idx = pending_idx; 
> + 
> txp = &vif->pending_tx_info[pending_idx].req; 
> page = virt_to_page(idx_to_kaddr(vif, pending_idx)); 
> __skb_fill_page_desc(skb, i, page, txp->offset, txp->size); 
> @@ -1000,10 +991,15 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb) 
> skb->data_len += txp->size; 
> skb->truesize += txp->size; 
>
> - /* Take an extra reference to offset xenvif_idx_release */ 
> + /* Take an extra reference to offset network stack's put_page */ 
> get_page(vif->mmap_pages[pending_idx]); 
> - xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); 
> } 
> + /* FIXME: __skb_fill_page_desc set this to true because page->pfmemalloc 
> + * overlaps with "index", and "mapping" is not set. I think mapping 
> + * should be set. If delivered to local stack, it would drop this 
> + * skb in sk_filter unless the socket has the right to use it. 
> + */ 
> + skb->pfmemalloc = false; 
> } 
>
> static int xenvif_get_extras(struct xenvif *vif, 
> @@ -1123,7 +1119,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) 
>
> static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget) 
> { 
> - struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop; 
> + struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop; 
> struct sk_buff *skb; 
> int ret; 
>
> @@ -1230,30 +1226,10 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget) 
> } 
> } 
>
> - /* XXX could copy straight to head */ 
> - page = xenvif_alloc_page(vif, pending_idx); 
> - if (!page) { 
> - kfree_skb(skb); 
> - xenvif_tx_err(vif, &txreq, idx); 
> - break; 
> - } 
> - 
> - gop->source.u.ref = txreq.gref; 
> - gop->source.domid = vif->domid; 
> - gop->source.offset = txreq.offset; 
> - 
> - gop->dest.u.gmfn = virt_to_mfn(page_address(page)); 
> - gop->dest.domid = DOMID_SELF; 
> - gop->dest.offset = txreq.offset; 
> - 
> - gop->len = txreq.size; 
> - gop->flags = GNTCOPY_source_gref; 
> + xenvif_tx_create_gop(vif, pending_idx, &txreq, gop); 
>
> gop++; 
>
> - memcpy(&vif->pending_tx_info[pending_idx].req, 
> -        &txreq, sizeof(txreq)); 
> - vif->pending_tx_info[pending_idx].head = index; 
> *((u16 *)skb->cb) = pending_idx; 
>
> __skb_put(skb, data_len); 
> @@ -1282,17 +1258,17 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget) 
>
> vif->tx.req_cons = idx; 
>
> - if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops)) 
> + if ((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif->tx_map_ops)) 
> break; 
> } 
>
> - return gop - vif->tx_copy_ops; 
> + return gop - vif->tx_map_ops; 
> } 
>
>
> static int xenvif_tx_submit(struct xenvif *vif) 
> { 
> - struct gnttab_copy *gop = vif->tx_copy_ops; 
> + struct gnttab_map_grant_ref *gop = vif->tx_map_ops; 
> struct sk_buff *skb; 
> int work_done = 0; 
>
> @@ -1316,14 +1292,17 @@ static int xenvif_tx_submit(struct xenvif *vif) 
> memcpy(skb->data, 
>        (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset), 
>        data_len); 
> + vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL; 
> if (data_len < txp->size) { 
> /* Append the packet payload as a fragment. */ 
> txp->offset += data_len; 
> txp->size -= data_len; 
> + skb_shinfo(skb)->destructor_arg = 
> + &vif->pending_tx_info[pending_idx].callback_struct; 
> } else { 
> /* Schedule a response immediately. */ 
> - xenvif_idx_release(vif, pending_idx, 
> -    XEN_NETIF_RSP_OKAY); 
> + skb_shinfo(skb)->destructor_arg = NULL; 
> + xenvif_idx_unmap(vif, pending_idx); 
> } 
>
> if (txp->flags & XEN_NETTXF_csum_blank) 
> @@ -1345,6 +1324,9 @@ static int xenvif_tx_submit(struct xenvif *vif) 
> if (checksum_setup(vif, skb)) { 
> netdev_dbg(vif->dev, 
>    "Can't setup checksum in net_tx_action\n"); 
> + /* We have to set this flag to trigger the callback */ 
> + if (skb_shinfo(skb)->destructor_arg) 
> + skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; 
> kfree_skb(skb); 
> continue; 
> } 
> @@ -1370,6 +1352,14 @@ static int xenvif_tx_submit(struct xenvif *vif) 
>
> work_done++; 
>
> + /* Set this flag right before netif_receive_skb, otherwise 
> + * someone might think this packet already left netback, and 
> + * do a skb_copy_ubufs while we are still in control of the 
> + * skb. E.g. the __pskb_pull_tail earlier can do such thing. 
> + */ 
> + if (skb_shinfo(skb)->destructor_arg) 
> + skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; 
> + 
> netif_receive_skb(skb); 
> } 
>
> @@ -1378,14 +1368,111 @@ static int xenvif_tx_submit(struct xenvif *vif) 
>
> void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success) 
> { 
> - return; 
> + unsigned long flags; 
> + pending_ring_idx_t index; 
> + struct xenvif *vif = ubuf_to_vif(ubuf); 
> + 
> + /* This is the only place where we grab this lock, to protect callbacks 
> + * from each other. 
> + */ 
> + spin_lock_irqsave(&vif->callback_lock, flags); 
> + do { 
> + u16 pending_idx = ubuf->desc; 
> + ubuf = (struct ubuf_info *) ubuf->ctx; 
> + BUG_ON(vif->dealloc_prod - vif->dealloc_cons >= 
> + MAX_PENDING_REQS); 
> + index = pending_index(vif->dealloc_prod); 
> + vif->dealloc_ring[index] = pending_idx; 
> + /* Sync with xenvif_tx_dealloc_action: 
> + * insert idx then incr producer. 
> + */ 
> + smp_wmb(); 
> + vif->dealloc_prod++; 
> + } while (ubuf); 
> + wake_up(&vif->dealloc_wq); 
> + spin_unlock_irqrestore(&vif->callback_lock, flags); 
> + 
> + if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx) && 
> +     xenvif_tx_pending_slots_available(vif)) { 
> + local_bh_disable(); 
> + napi_schedule(&vif->napi); 
> + local_bh_enable(); 
> + } 
> +} 
> + 
> +static inline void xenvif_tx_dealloc_action(struct xenvif *vif) 
> +{ 
> + struct gnttab_unmap_grant_ref *gop; 
> + pending_ring_idx_t dc, dp; 
> + u16 pending_idx, pending_idx_release[MAX_PENDING_REQS]; 
> + unsigned int i = 0; 
> + 
> + dc = vif->dealloc_cons; 
> + gop = vif->tx_unmap_ops; 
> + 
> + /* Free up any grants we have finished using */ 
> + do { 
> + dp = vif->dealloc_prod; 
> + 
> + /* Ensure we see all indices enqueued by all 
> + * xenvif_zerocopy_callback(). 
> + */ 
> + smp_rmb(); 
> + 
> + while (dc != dp) { 
> + BUG_ON(gop - vif->tx_unmap_ops > MAX_PENDING_REQS); 
> + pending_idx = 
> + vif->dealloc_ring[pending_index(dc++)]; 
> + 
> + pending_idx_release[gop-vif->tx_unmap_ops] = 
> + pending_idx; 
> + vif->pages_to_unmap[gop-vif->tx_unmap_ops] = 
> + vif->mmap_pages[pending_idx]; 
> + gnttab_set_unmap_op(gop, 
> +     idx_to_kaddr(vif, pending_idx), 
> +     GNTMAP_host_map, 
> +     vif->grant_tx_handle[pending_idx]); 
> + /* Btw. already unmapped? */ 
> + xenvif_grant_handle_reset(vif, pending_idx); 
> + ++gop; 
> + } 
> + 
> + } while (dp != vif->dealloc_prod); 
> + 
> + vif->dealloc_cons = dc; 
> + 
> + if (gop - vif->tx_unmap_ops > 0) { 
> + int ret; 
> + ret = gnttab_unmap_refs(vif->tx_unmap_ops, 
> + NULL, 
> + vif->pages_to_unmap, 
> + gop - vif->tx_unmap_ops); 
> + if (ret) { 
> + netdev_err(vif->dev, "Unmap fail: nr_ops %x ret %d\n", 
> +    gop - vif->tx_unmap_ops, ret); 
> + for (i = 0; i < gop - vif->tx_unmap_ops; ++i) { 
> + if (gop[i].status != GNTST_okay) 
> + netdev_err(vif->dev, 
> +    " host_addr: %llx handle: %x status: %d\n", 
> +    gop[i].host_addr, 
> +    gop[i].handle, 
> +    gop[i].status); 
> + } 
> + BUG(); 
> + } 
> + } 
> + 
> + for (i = 0; i < gop - vif->tx_unmap_ops; ++i) 
> + xenvif_idx_release(vif, pending_idx_release[i], 
> +    XEN_NETIF_RSP_OKAY); 
> } 
>
> + 
> /* Called after netfront has transmitted */ 
> int xenvif_tx_action(struct xenvif *vif, int budget) 
> { 
> unsigned nr_gops; 
> - int work_done; 
> + int work_done, ret; 
>
> if (unlikely(!tx_work_todo(vif))) 
> return 0; 
> @@ -1395,7 +1482,11 @@ int xenvif_tx_action(struct xenvif *vif, int budget) 
> if (nr_gops == 0) 
> return 0; 
>
> - gnttab_batch_copy(vif->tx_copy_ops, nr_gops); 
> + ret = gnttab_map_refs(vif->tx_map_ops, 
> +       NULL, 
> +       vif->pages_to_map, 
> +       nr_gops); 
> + BUG_ON(ret); 
>
> work_done = xenvif_tx_submit(vif); 
>
> @@ -1406,48 +1497,40 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx, 
>        u8 status) 
> { 
> struct pending_tx_info *pending_tx_info; 
> - pending_ring_idx_t head; 
> + pending_ring_idx_t index; 
> u16 peek; /* peek into next tx request */ 
> + unsigned long flags; 
>
> - BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL)); 
> - 
> - /* Already complete? */ 
> - if (vif->mmap_pages[pending_idx] == NULL) 
> - return; 
> - 
> - pending_tx_info = &vif->pending_tx_info[pending_idx]; 
> - 
> - head = pending_tx_info->head; 
> - 
> - BUG_ON(!pending_tx_is_head(vif, head)); 
> - BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx); 
> - 
> - do { 
> - pending_ring_idx_t index; 
> - pending_ring_idx_t idx = pending_index(head); 
> - u16 info_idx = vif->pending_ring[idx]; 
> - 
> - pending_tx_info = &vif->pending_tx_info[info_idx]; 
> + pending_tx_info = &vif->pending_tx_info[pending_idx]; 
> + spin_lock_irqsave(&vif->response_lock, flags); 
> make_tx_response(vif, &pending_tx_info->req, status); 
> + index = pending_index(vif->pending_prod); 
> + vif->pending_ring[index] = pending_idx; 
> + /* TX shouldn't use the index before we give it back here */ 
> + mb(); 
> + vif->pending_prod++; 
> + spin_unlock_irqrestore(&vif->response_lock, flags); 
> +} 
>
> - /* Setting any number other than 
> - * INVALID_PENDING_RING_IDX indicates this slot is 
> - * starting a new packet / ending a previous packet. 
> - */ 
> - pending_tx_info->head = 0; 
> - 
> - index = pending_index(vif->pending_prod++); 
> - vif->pending_ring[index] = vif->pending_ring[info_idx]; 
> +void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx) 
> +{ 
> + int ret; 
> + struct gnttab_unmap_grant_ref tx_unmap_op; 
>
> - peek = vif->pending_ring[pending_index(++head)]; 
> + gnttab_set_unmap_op(&tx_unmap_op, 
> +     idx_to_kaddr(vif, pending_idx), 
> +     GNTMAP_host_map, 
> +     vif->grant_tx_handle[pending_idx]); 
> + /* Btw. already unmapped? */ 
> + xenvif_grant_handle_reset(vif, pending_idx); 
>
> - } while (!pending_tx_is_head(vif, peek)); 
> + ret = gnttab_unmap_refs(&tx_unmap_op, NULL, 
> + &vif->mmap_pages[pending_idx], 1); 
> + BUG_ON(ret); 
>
> - put_page(vif->mmap_pages[pending_idx]); 
> - vif->mmap_pages[pending_idx] = NULL; 
> + xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY); 
> } 
>
> - 
> static void make_tx_response(struct xenvif *vif, 
>      struct xen_netif_tx_request *txp, 
>      s8       st) 
> @@ -1508,6 +1591,11 @@ static inline int tx_work_todo(struct xenvif *vif) 
> return 0; 
> } 
>
> +static inline bool tx_dealloc_work_todo(struct xenvif *vif) 
> +{ 
> + return vif->dealloc_cons != vif->dealloc_prod; 
> +} 
> + 
> void xenvif_unmap_frontend_rings(struct xenvif *vif) 
> { 
> if (vif->tx.sring) 
> @@ -1594,6 +1682,28 @@ int xenvif_kthread_guest_rx(void *data) 
> return 0; 
> } 
>
> +int xenvif_dealloc_kthread(void *data) 
> +{ 
> + struct xenvif *vif = data; 
> + 
> + while (!kthread_should_stop()) { 
> + wait_event_interruptible(vif->dealloc_wq, 
> + tx_dealloc_work_todo(vif) || 
> + kthread_should_stop()); 
> + if (kthread_should_stop()) 
> + break; 
> + 
> + xenvif_tx_dealloc_action(vif); 
> + cond_resched(); 
> + } 
> + 
> + /* Unmap anything remaining*/ 
> + if (tx_dealloc_work_todo(vif)) 
> + xenvif_tx_dealloc_action(vif); 
> + 
> + return 0; 
> +} 
> + 
> static int __init netback_init(void) 
> { 
> int rc = 0; 
>
> _______________________________________________ 
> Xen-devel mailing list 
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2014-03-06 12:41 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-04 22:32 [PATCH net-next v6 0/10] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 1/10] xen-netback: Use skb->cb for pending_idx Zoltan Kiss
2014-03-04 22:32 ` Zoltan Kiss
2014-03-05 12:16   ` Wei Liu
2014-03-05 19:13     ` Zoltan Kiss
2014-03-05 19:13     ` Zoltan Kiss
2014-03-05 12:16   ` Wei Liu
2014-03-04 22:32 ` [PATCH net-next v6 2/10] xen-netback: Minor refactoring of netback code Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 3/10] xen-netback: Handle foreign mapped pages on the guest RX path Zoltan Kiss
2014-03-04 22:32 ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 4/10] xen-netback: Introduce TX grant mapping Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-05 12:28   ` Wei Liu
2014-03-05 12:28   ` Wei Liu
2014-03-05 21:33     ` Zoltan Kiss
2014-03-05 21:33     ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 5/10] xen-netback: Remove old TX grant copy definitons and fix indentations Zoltan Kiss
2014-03-04 22:32 ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 6/10] xen-netback: Add stat counters for zerocopy Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 7/10] xen-netback: Handle guests with too many frags Zoltan Kiss
2014-03-04 22:32 ` Zoltan Kiss
2014-03-05 12:35   ` Wei Liu
2014-03-05 12:35   ` Wei Liu
2014-03-05 22:56     ` Zoltan Kiss
2014-03-05 22:56     ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 8/10] xen-netback: Add stat counters for frag_list skbs Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-04 22:32   ` Zoltan Kiss
2014-03-05 12:35   ` Wei Liu
2014-03-05 12:35   ` Wei Liu
2014-03-06 12:41     ` Zoltan Kiss
2014-03-06 12:41     ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 9/10] xen-netback: Timeout packets in RX path Zoltan Kiss
2014-03-04 22:32 ` Zoltan Kiss
2014-03-04 22:32 ` [PATCH net-next v6 9/9] xen-netback: Aggregate TX unmap operations Zoltan Kiss
2014-03-05  0:45   ` Zoltan Kiss
2014-03-05  0:45   ` Zoltan Kiss
2014-03-05  1:07     ` David Miller
2014-03-05  1:07     ` David Miller
2014-03-04 22:32 ` Zoltan Kiss
  -- strict thread matches above, loose matches on Subject: below --
2014-03-05  2:19 [PATCH net-next v6 4/10] xen-netback: Introduce TX grant mapping Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.