Re: [PATCH 3 of 3] RFC: xenpaging: use waitqueue in ept

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 3 of 3] RFC: xenpaging: use waitqueue in ept_get
       [not found] <mailman.1126.1321998852.12970.xen-devel@lists.xensource.com>
@ 2011-11-23  4:52 ` Andres Lagar-Cavilla
  2011-11-23 16:37   ` Olaf Hering
  2011-11-23  4:58 ` RFC: mem_event: use wait queue when ring is full Andres Lagar-Cavilla
  1 sibling, 1 reply; 7+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-23  4:52 UTC (permalink / raw)
  To: Olaf Hering, xen-devel

Hi Olaf,

thanks for posting these RFC patches, great work!

I have several comments. Most importantly, I want to hash out the
interactions between these wait queues and the work I've been doing on p2m
synchronization. I've been runnning Xen with synchronized (i.e. locking)
p2m lookups for a few weeks now with little/no trouble. You can refer to a
patch I posted to the list previously, which I can resend. (I'm waiting on
feedback on some previous patches I sent to keep pushing on this work)

- I think the waitqueue should be part of struct arch_domain. It is an x86
construct that applies only to the base level p2m of the domain, not every
possible p2m in a nested setting.

- The same could be said of the p2m.pod struct, by the way, and that's a
patch I want to get upstreamed too.

- The right place to wait is not ept->get_entry, imho, but rather the
implementation-independent caller (get_gfn_type_access). More p2m
implementations could be added in the future (however unlikely that may
be) that can also perform paging.

- With locking p2m lookups, one needs to be careful about in_atomic. I
haven't performed a full audit of all callers, but this is what I can say:
1. there are no code paths that perform recursive p2m lookups, *on
different gfns*, with p2m_query_t != p2m_query
2. there are recursive lookups of different gfns, with p2m_query_t ==
p2m_query, most notably pod sweeps. These do not represent a problem here,
since those paths will skip over gfns that are paged.

Why is this important? Because we need to be in a position where a code
snippet such as

get_gfn_type_access(gfn) {
retry:
   p2m_lock()
   mfn = p2m->get_entry(gfn, &p2mt)
   if (p2mt == paging) {
       p2m_unlock()
       sleep_on_waitqueue()
       goto retry;
   }

works. This will get us a non-racy p2m that sends vcpu's to sleep without
holding locks. Makes sense?

- What is the plan for grant operations for pv-on-hvm drivers? Those will
enter get_gfn* holding the domain lock, and thus in an atomic context.

Andres
> Date: Tue, 22 Nov 2011 22:13:25 +0100
> From: Olaf Hering <olaf@aepfle.de>
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] [PATCH 3 of 3] RFC: xenpaging: use waitqueue in
> 	ept_get
> Message-ID: <9d63ecd3969bb7a2e398.1321996405@probook.site>
> Content-Type: text/plain; charset="us-ascii"
>
> # HG changeset patch
> # User Olaf Hering <olaf@aepfle.de>
> # Date 1321996199 -3600
> # Node ID 9d63ecd3969bb7a2e39841f6c859b4c23f750642
> # Parent  de6860cb9205b68d1287482288d1b7b9d0255609
> RFC: xenpaging: use waitqueue in ept_get
> %0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: mem_event: use wait queue when ring is full
       [not found] <mailman.1126.1321998852.12970.xen-devel@lists.xensource.com>
  2011-11-23  4:52 ` [PATCH 3 of 3] RFC: xenpaging: use waitqueue in ept_get Andres Lagar-Cavilla
@ 2011-11-23  4:58 ` Andres Lagar-Cavilla
  2011-11-23 16:49   ` Olaf Hering
  1 sibling, 1 reply; 7+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-23  4:58 UTC (permalink / raw)
  To: Olaf Hering, xen-devel; +Cc: adin

Olaf, two questions here

- do you have any insight for events caused by foreign mappings? Those
will be lost with a full ring, with or without wait queues

- we have posted a patch (twice) previously, with changes to ring
management, most importantly sending guest vcpus to sleep when space in
the ring is < d->max_vcpus. I see these two patches as complementary. What
is your take?

Thanks,
Andres

> Date: Tue, 22 Nov 2011 22:13:24 +0100
> From: Olaf Hering <olaf@aepfle.de>
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] [PATCH 2 of 3] RFC: mem_event: use wait queue
> 	when ring	is full
> Message-ID: <de6860cb9205b68d1287.1321996404@probook.site>
> Content-Type: text/plain; charset="us-ascii"
>
> # HG changeset patch
> # User Olaf Hering <olaf@aepfle.de>
> # Date 1321996145 -3600
> # Node ID de6860cb9205b68d1287482288d1b7b9d0255609
> # Parent  d347a8a36d2e7951f98a3d22866dce004484d95f
> RFC: mem_event: use wait queue when ring is full
>
> CAUTION: while the patch by itself is supposed to be complete,
> the added usage of waitqueues will lead to sudden host reboots!
>
>
> This change is based on an idea/patch from Adin Scannell.
>
> If the ring is full, put the current vcpu to sleep if it belongs to the
> target domain. The wakeup happens in the p2m_*_resume functions.
> A request from another domain will use the existing ->req_producers
> functionality because sleeping is not possible if the vcpu does not
> belong to the target domain.
>
> This change fixes also a bug in p2m_mem_paging_drop_page(). Up to now a
> full ring will lead to harmless inconsistency in the pager,
>
> v2:
>  - p2m_mem_paging_populate: move vcpu_pause after put_request, otherwise
> the
>    vcpu will not wake_up after a wait_event because the pause_count was
>    increased twice. Fixes guest hangs.
>  - update free space check in _mem_event_put_request()
>  - simplify mem_event_put_request()
>
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/hvm/hvm.c
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4099,7 +4099,7 @@ static int hvm_memory_event_traps(long p
>          return 1;
>
>      rc = mem_event_check_ring(d, &d->mem_event->mem_access);
> -    if ( rc )
> +    if ( rc < 0 )
>          return rc;
>
>      memset(&req, 0, sizeof(req));
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/mm/mem_event.c
> --- a/xen/arch/x86/mm/mem_event.c
> +++ b/xen/arch/x86/mm/mem_event.c
> @@ -23,6 +23,7 @@
>
>  #include <asm/domain.h>
>  #include <xen/event.h>
> +#include <xen/wait.h>
>  #include <asm/p2m.h>
>  #include <asm/mem_event.h>
>  #include <asm/mem_paging.h>
> @@ -39,6 +40,7 @@
>
>  static int mem_event_enable(struct domain *d,
>                              xen_domctl_mem_event_op_t *mec,
> +                            int mem_event_bit,
>                              struct mem_event_domain *med)
>  {
>      int rc;
> @@ -107,8 +109,12 @@ static int mem_event_enable(struct domai
>
>      mem_event_ring_lock_init(med);
>
> +    med->mem_event_bit = mem_event_bit;
> +
> +    init_waitqueue_head(&med->wq);
> +
>      /* Wake any VCPUs paused for memory events */
> -    mem_event_unpause_vcpus(d);
> +    mem_event_unpause_vcpus(d, med);
>
>      return 0;
>
> @@ -124,6 +130,9 @@ static int mem_event_enable(struct domai
>
>  static int mem_event_disable(struct mem_event_domain *med)
>  {
> +    if (!list_empty(&med->wq.list))
> +        return -EBUSY;
> +
>      unmap_domain_page(med->ring_page);
>      med->ring_page = NULL;
>
> @@ -133,13 +142,21 @@ static int mem_event_disable(struct mem_
>      return 0;
>  }
>
> -void mem_event_put_request(struct domain *d, struct mem_event_domain
> *med, mem_event_request_t *req)
> +static int _mem_event_put_request(struct domain *d, struct
> mem_event_domain *med, mem_event_request_t *req)
>  {
>      mem_event_front_ring_t *front_ring;
> +    int free_requests;
>      RING_IDX req_prod;
>
>      mem_event_ring_lock(med);
>
> +    free_requests = RING_FREE_REQUESTS(&med->front_ring);
> +    /* A request was eventually claimed in mem_event_check_ring() */
> +    if ((current->domain == d && free_requests <= med->req_producers) ||
> !free_requests) {
> +        mem_event_ring_unlock(med);
> +        return 0;
> +    }
> +
>      front_ring = &med->front_ring;
>      req_prod = front_ring->req_prod_pvt;
>
> @@ -155,6 +172,20 @@ void mem_event_put_request(struct domain
>      mem_event_ring_unlock(med);
>
>      notify_via_xen_event_channel(d, med->xen_port);
> +
> +    return 1;
> +}
> +
> +void mem_event_put_request(struct domain *d, struct mem_event_domain
> *med, mem_event_request_t *req)
> +{
> +    /* Go to sleep if request came from guest */
> +    if (current->domain == d) {
> +        wait_event(med->wq, _mem_event_put_request(d, med, req));
> +        return;
> +    }
> +    /* Ring was full anyway, unable to sleep in non-guest context */
> +    if (!_mem_event_put_request(d, med, req))
> +        printk("Failed to put memreq: d %u t %x f %x gfn %lx\n",
> d->domain_id, req->type, req->flags, (unsigned long)req->gfn);
>  }
>
>  void mem_event_get_response(struct mem_event_domain *med,
> mem_event_response_t *rsp)
> @@ -178,21 +209,27 @@ void mem_event_get_response(struct mem_e
>      mem_event_ring_unlock(med);
>  }
>
> -void mem_event_unpause_vcpus(struct domain *d)
> +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain
> *med)
>  {
>      struct vcpu *v;
>
>      for_each_vcpu ( d, v )
> -        if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) )
> +        if ( test_and_clear_bit(med->mem_event_bit, &v->pause_flags) )
>              vcpu_wake(v);
>  }
>
> -void mem_event_mark_and_pause(struct vcpu *v)
> +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain
> *med)
>  {
> -    set_bit(_VPF_mem_event, &v->pause_flags);
> +    set_bit(med->mem_event_bit, &v->pause_flags);
>      vcpu_sleep_nosync(v);
>  }
>
> +/**
> + * mem_event_put_req_producers - Release a claimed slot
> + * @med: mem_event ring
> + *
> + * mem_event_put_req_producers() releases a claimed slot in the mem_event
> ring.
> + */
>  void mem_event_put_req_producers(struct mem_event_domain *med)
>  {
>      mem_event_ring_lock(med);
> @@ -200,9 +237,26 @@ void mem_event_put_req_producers(struct
>      mem_event_ring_unlock(med);
>  }
>
> +/**
> + * mem_event_check_ring - Check state of a mem_event ring
> + * @d: guest domain
> + * @med: mem_event ring
> + *
> + * Return codes: < 0: the ring is not yet configured
> + *                 0: the ring has some room
> + *               > 0: the ring is full
> + *
> + * mem_event_check_ring() checks the state of the given mem_event ring.
> + * If the current vcpu belongs to the guest domain, the function assumes
> that
> + * mem_event_put_request() will sleep until the ring has room again.
> + *
> + * If the current vcpu does not belong to the target domain the caller
> must try
> + * again until there is room. A slot is claimed and the caller can place
> a
> + * request. If the caller does not need to send a request, the claimed
> slot has
> + * to be released with mem_event_put_req_producers().
> + */
>  int mem_event_check_ring(struct domain *d, struct mem_event_domain *med)
>  {
> -    struct vcpu *curr = current;
>      int free_requests;
>      int ring_full = 1;
>
> @@ -218,8 +272,8 @@ int mem_event_check_ring(struct domain *
>          ring_full = 0;
>      }
>
> -    if ( ring_full && (curr->domain == d) )
> -        mem_event_mark_and_pause(curr);
> +    if ( current->domain == d )
> +        ring_full = 0;
>
>      mem_event_ring_unlock(med);
>
> @@ -287,7 +341,7 @@ int mem_event_domctl(struct domain *d, x
>              if ( p2m->pod.entry_count )
>                  break;
>
> -            rc = mem_event_enable(d, mec, med);
> +            rc = mem_event_enable(d, mec, _VPF_me_mem_paging, med);
>          }
>          break;
>
> @@ -326,7 +380,7 @@ int mem_event_domctl(struct domain *d, x
>              if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
>                  break;
>
> -            rc = mem_event_enable(d, mec, med);
> +            rc = mem_event_enable(d, mec, _VPF_me_mem_access, med);
>          }
>          break;
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/mm/mem_sharing.c
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -253,19 +253,11 @@ static void mem_sharing_audit(void)
>  #endif
>
>
> -static struct page_info* mem_sharing_alloc_page(struct domain *d,
> -                                                unsigned long gfn)
> +static void mem_sharing_notify_helper(struct domain *d, unsigned long
> gfn)
>  {
> -    struct page_info* page;
>      struct vcpu *v = current;
>      mem_event_request_t req;
>
> -    page = alloc_domheap_page(d, 0);
> -    if(page != NULL) return page;
> -
> -    memset(&req, 0, sizeof(req));
> -    req.type = MEM_EVENT_TYPE_SHARED;
> -
>      if ( v->domain != d )
>      {
>          /* XXX This path needs some attention.  For now, just fail
> foreign
> @@ -275,20 +267,20 @@ static struct page_info* mem_sharing_all
>          gdprintk(XENLOG_ERR,
>                   "Failed alloc on unshare path for foreign (%d)
> lookup\n",
>                   d->domain_id);
> -        return page;
> +        return;
>      }
>
> -    vcpu_pause_nosync(v);
> -    req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> +    if (mem_event_check_ring(d, &d->mem_event->mem_share) < 0)
> +        return;
>
> -    if(mem_event_check_ring(d, &d->mem_event->mem_share)) return page;
> -
> +    memset(&req, 0, sizeof(req));
> +    req.type = MEM_EVENT_TYPE_SHARED;
> +    req.flags = MEM_EVENT_FLAG_VCPU_PAUSED;
>      req.gfn = gfn;
>      req.p2mt = p2m_ram_shared;
>      req.vcpu_id = v->vcpu_id;
>      mem_event_put_request(d, &d->mem_event->mem_share, &req);
> -
> -    return page;
> +    vcpu_pause_nosync(v);
>  }
>
>  unsigned int mem_sharing_get_nr_saved_mfns(void)
> @@ -653,7 +645,7 @@ gfn_found:
>      if(ret == 0) goto private_page_found;
>
>      old_page = page;
> -    page = mem_sharing_alloc_page(d, gfn);
> +    page = alloc_domheap_page(d, 0);
>      if(!page)
>      {
>          /* We've failed to obtain memory for private page. Need to re-add
> the
> @@ -661,6 +653,7 @@ gfn_found:
>          list_add(&gfn_info->list, &hash_entry->gfns);
>          put_gfn(d, gfn);
>          shr_unlock();
> +        mem_sharing_notify_helper(d, gfn);
>          return -ENOMEM;
>      }
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -884,17 +884,13 @@ void p2m_mem_paging_drop_page(struct dom
>      struct vcpu *v = current;
>      mem_event_request_t req;
>
> -    /* Check that there's space on the ring for this request */
> -    if ( mem_event_check_ring(d, &d->mem_event->mem_paging) == 0)
> -    {
> -        /* Send release notification to pager */
> -        memset(&req, 0, sizeof(req));
> -        req.flags |= MEM_EVENT_FLAG_DROP_PAGE;
> -        req.gfn = gfn;
> -        req.vcpu_id = v->vcpu_id;
> +    /* Send release notification to pager */
> +    memset(&req, 0, sizeof(req));
> +    req.flags = MEM_EVENT_FLAG_DROP_PAGE;
> +    req.gfn = gfn;
> +    req.vcpu_id = v->vcpu_id;
>
> -        mem_event_put_request(d, &d->mem_event->mem_paging, &req);
> -    }
> +    mem_event_put_request(d, &d->mem_event->mem_paging, &req);
>  }
>
>  /**
> @@ -952,7 +948,6 @@ void p2m_mem_paging_populate(struct doma
>      /* Pause domain if request came from guest and gfn has paging type */
>      if (  p2m_is_paging(p2mt) && v->domain == d )
>      {
> -        vcpu_pause_nosync(v);
>          req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
>      }
>      /* No need to inform pager if the gfn is not in the page-out path */
> @@ -969,6 +964,10 @@ void p2m_mem_paging_populate(struct doma
>      req.vcpu_id = v->vcpu_id;
>
>      mem_event_put_request(d, &d->mem_event->mem_paging, &req);
> +
> +    /* Pause guest after put_request to allow wake_up after wait_event */
> +    if ( req.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
> +        vcpu_pause_nosync(v);
>  }
>
>  /**
> @@ -1071,8 +1070,8 @@ void p2m_mem_paging_resume(struct domain
>      if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>          vcpu_unpause(d->vcpu[rsp.vcpu_id]);
>
> -    /* Unpause any domains that were paused because the ring was full */
> -    mem_event_unpause_vcpus(d);
> +    /* Unpause all vcpus that were paused because the ring was full */
> +    wake_up(&d->mem_event->mem_paging.wq);
>  }
>
>  void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned
> long gla,
> @@ -1111,7 +1110,7 @@ void p2m_mem_access_check(unsigned long
>                     "Memory access permissions failure, no mem_event
> listener: pausing VCPU %d, dom %d\n",
>                     v->vcpu_id, d->domain_id);
>
> -            mem_event_mark_and_pause(v);
> +            mem_event_mark_and_pause(v, &d->mem_event->mem_access);
>          }
>          else
>          {
> @@ -1131,8 +1130,7 @@ void p2m_mem_access_check(unsigned long
>      req.reason = MEM_EVENT_REASON_VIOLATION;
>
>      /* Pause the current VCPU unconditionally */
> -    vcpu_pause_nosync(v);
> -    req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> +    req.flags = MEM_EVENT_FLAG_VCPU_PAUSED;
>
>      /* Send request to mem event */
>      req.gfn = gfn;
> @@ -1148,6 +1146,7 @@ void p2m_mem_access_check(unsigned long
>      mem_event_put_request(d, &d->mem_event->mem_access, &req);
>
>      /* VCPU paused, mem event request sent */
> +    vcpu_pause_nosync(v);
>  }
>
>  void p2m_mem_access_resume(struct p2m_domain *p2m)
> @@ -1161,9 +1160,11 @@ void p2m_mem_access_resume(struct p2m_do
>      if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>          vcpu_unpause(d->vcpu[rsp.vcpu_id]);
>
> -    /* Unpause any domains that were paused because the ring was full or
> no listener
> -     * was available */
> -    mem_event_unpause_vcpus(d);
> +    /* Wakeup all vcpus waiting because the ring was full */
> +    wake_up(&d->mem_event->mem_access.wq);
> +
> +    /* Unpause all vcpus that were paused because no listener was
> available */
> +    mem_event_unpause_vcpus(d, &d->mem_event->mem_access);
>  }
>
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/include/asm-x86/mem_event.h
> --- a/xen/include/asm-x86/mem_event.h
> +++ b/xen/include/asm-x86/mem_event.h
> @@ -25,12 +25,12 @@
>  #define __MEM_EVENT_H__
>
>  /* Pauses VCPU while marking pause flag for mem event */
> -void mem_event_mark_and_pause(struct vcpu *v);
> +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain
> *med);
>  int mem_event_check_ring(struct domain *d, struct mem_event_domain *med);
>  void mem_event_put_req_producers(struct mem_event_domain *med);
>  void mem_event_put_request(struct domain *d, struct mem_event_domain
> *med, mem_event_request_t *req);
>  void mem_event_get_response(struct mem_event_domain *med,
> mem_event_response_t *rsp);
> -void mem_event_unpause_vcpus(struct domain *d);
> +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain
> *med);
>
>  int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
>                       XEN_GUEST_HANDLE(void) u_domctl);
> diff -r d347a8a36d2e -r de6860cb9205 xen/include/xen/sched.h
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -14,6 +14,7 @@
>  #include <xen/nodemask.h>
>  #include <xen/radix-tree.h>
>  #include <xen/multicall.h>
> +#include <xen/wait.h>
>  #include <public/xen.h>
>  #include <public/domctl.h>
>  #include <public/sysctl.h>
> @@ -192,6 +193,10 @@ struct mem_event_domain
>      mem_event_front_ring_t front_ring;
>      /* event channel port (vcpu0 only) */
>      int xen_port;
> +    /* mem_event bit for vcpu->pause_flags */
> +    int mem_event_bit;
> +    /* list of vcpus waiting for room in the ring */
> +    struct waitqueue_head wq;
>  };
>
>  struct mem_event_per_domain
> @@ -615,9 +620,12 @@ static inline struct domain *next_domain
>   /* VCPU affinity has changed: migrating to a new CPU. */
>  #define _VPF_migrating       3
>  #define VPF_migrating        (1UL<<_VPF_migrating)
> - /* VCPU is blocked on memory-event ring. */
> -#define _VPF_mem_event       4
> -#define VPF_mem_event        (1UL<<_VPF_mem_event)
> + /* VCPU is blocked on mem_paging ring. */
> +#define _VPF_me_mem_paging   4
> +#define VPF_me_mem_paging    (1UL<<_VPF_me_mem_paging)
> + /* VCPU is blocked on mem_access ring. */
> +#define _VPF_me_mem_access   5
> +#define VPF_me_mem_access    (1UL<<_VPF_me_mem_access)
>
>  static inline int vcpu_runnable(struct vcpu *v)
>  {
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 22 Nov 2011 22:15:19 +0100
> From: Olaf Hering <olaf@aepfle.de>
> To: Keir Fraser <keir.xen@gmail.com>
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue
> 	feature
> Message-ID: <20111122211519.GA1039@aepfle.de>
> Content-Type: text/plain; charset=utf-8
>
> On Tue, Nov 22, Olaf Hering wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>> > Could it have ended up on the waitqueue?
>>
>> Unlikely, but I will add checks for that as well.
>
> I posted three changes which make use of the wait queues.
> For some reason the code at the very end of p2m_mem_paging_populate()
> triggers when d is dom0, so its vcpu is put to sleep.
>
>
> Olaf
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 22 Nov 2011 21:53:35 +0000
> From: Keir Fraser <keir.xen@gmail.com>
> To: Olaf Hering <olaf@aepfle.de>
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue
> 	feature
> Message-ID: <CAF1CA5F.2560C%keir.xen@gmail.com>
> Content-Type: text/plain;	charset="US-ASCII"
>
> On 22/11/2011 21:15, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Tue, Nov 22, Olaf Hering wrote:
>>
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> Could it have ended up on the waitqueue?
>>>
>>> Unlikely, but I will add checks for that as well.
>>
>> I posted three changes which make use of the wait queues.
>> For some reason the code at the very end of p2m_mem_paging_populate()
>> triggers when d is dom0, so its vcpu is put to sleep.
>
> We obviously can't have dom0 going to sleep on paging work. This, at
> least,
> isn't a wait-queue bug.
>
>> Olaf
>
>
>
>
>
> ------------------------------
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>
> End of Xen-devel Digest, Vol 81, Issue 296
> ******************************************
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3 of 3] RFC: xenpaging: use waitqueue in ept_get
  2011-11-23  4:52 ` [PATCH 3 of 3] RFC: xenpaging: use waitqueue in ept_get Andres Lagar-Cavilla
@ 2011-11-23 16:37   ` Olaf Hering
  2011-11-23 17:36     ` Andres Lagar-Cavilla
  0 siblings, 1 reply; 7+ messages in thread
From: Olaf Hering @ 2011-11-23 16:37 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: xen-devel

On Tue, Nov 22, Andres Lagar-Cavilla wrote:

> Hi Olaf,
> 
> thanks for posting these RFC patches, great work!
> 
> I have several comments. Most importantly, I want to hash out the
> interactions between these wait queues and the work I've been doing on p2m
> synchronization. I've been runnning Xen with synchronized (i.e. locking)
> p2m lookups for a few weeks now with little/no trouble. You can refer to a
> patch I posted to the list previously, which I can resend. (I'm waiting on
> feedback on some previous patches I sent to keep pushing on this work)
> 
> - I think the waitqueue should be part of struct arch_domain. It is an x86
> construct that applies only to the base level p2m of the domain, not every
> possible p2m in a nested setting.

Good point, I will move it. On the other hand, its current location cant
be the final solution. A wake_up would start all waiting vcpus, not just
the ones waiting for a certain gfn (in case of paging). There needs to
be a better way for selective wake_up.

> - The right place to wait is not ept->get_entry, imho, but rather the
> implementation-independent caller (get_gfn_type_access). More p2m
> implementations could be added in the future (however unlikely that may
> be) that can also perform paging.

The wait could probably be moved one level up, yes.

> - With locking p2m lookups, one needs to be careful about in_atomic. I
> haven't performed a full audit of all callers, but this is what I can say:
> 1. there are no code paths that perform recursive p2m lookups, *on
> different gfns*, with p2m_query_t != p2m_query
> 2. there are recursive lookups of different gfns, with p2m_query_t ==
> p2m_query, most notably pod sweeps. These do not represent a problem here,
> since those paths will skip over gfns that are paged.
> 
> Why is this important? Because we need to be in a position where a code
> snippet such as
> 
> get_gfn_type_access(gfn) {
> retry:
>    p2m_lock()
>    mfn = p2m->get_entry(gfn, &p2mt)
>    if (p2mt == paging) {
>        p2m_unlock()
>        sleep_on_waitqueue()
>        goto retry;
>    }
> 
> works. This will get us a non-racy p2m that sends vcpu's to sleep without
> holding locks. Makes sense?

Something like that can be done if needed, yes.

> - What is the plan for grant operations for pv-on-hvm drivers? Those will
> enter get_gfn* holding the domain lock, and thus in an atomic context.

Is that a new thing? So far my PVonHVM guests did not encounter that
issue. I see do_grant_table_op() taking the domain_lock, but is this
code path really entered from the guest or rather from dom0?
__get_paged_frame() returns GNTST_eagain, and that needs to be handled
by callers of do_grant_table_op. linux-2.6.18-xen.hg has code to retry
the grant operation in that case.

Olaf

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: mem_event: use wait queue when ring is full
  2011-11-23  4:58 ` RFC: mem_event: use wait queue when ring is full Andres Lagar-Cavilla
@ 2011-11-23 16:49   ` Olaf Hering
  2011-11-23 17:17     ` Keir Fraser
  0 siblings, 1 reply; 7+ messages in thread
From: Olaf Hering @ 2011-11-23 16:49 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: xen-devel, adin

On Tue, Nov 22, Andres Lagar-Cavilla wrote:

> Olaf, two questions here
> 
> - do you have any insight for events caused by foreign mappings? Those
> will be lost with a full ring, with or without wait queues

The callers of mem_event_check_ring() have to retry if the ring is full.
Thats what happens with p2m_mem_paging_populate(), the callers return
-ENOENT and expect a retry at some later point.

> - we have posted a patch (twice) previously, with changes to ring
> management, most importantly sending guest vcpus to sleep when space in
> the ring is < d->max_vcpus. I see these two patches as complementary. What
> is your take?

I'm not proposing to include my patch as is, because it has one issue:
wake_up will start all waiting vcpus even if there is just a single slot
free in the ringbuffer. You patch is better in this respect because only
a few will be started again.

I will send comments for it later.

Olaf

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: mem_event: use wait queue when ring is full
  2011-11-23 16:49   ` Olaf Hering
@ 2011-11-23 17:17     ` Keir Fraser
  2011-11-23 18:23       ` Olaf Hering
  0 siblings, 1 reply; 7+ messages in thread
From: Keir Fraser @ 2011-11-23 17:17 UTC (permalink / raw)
  To: Olaf Hering, Andres Lagar-Cavilla; +Cc: xen-devel, adin

On 23/11/2011 16:49, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Andres Lagar-Cavilla wrote:
> 
>> Olaf, two questions here
>> 
>> - do you have any insight for events caused by foreign mappings? Those
>> will be lost with a full ring, with or without wait queues
> 
> The callers of mem_event_check_ring() have to retry if the ring is full.
> Thats what happens with p2m_mem_paging_populate(), the callers return
> -ENOENT and expect a retry at some later point.
> 
>> - we have posted a patch (twice) previously, with changes to ring
>> management, most importantly sending guest vcpus to sleep when space in
>> the ring is < d->max_vcpus. I see these two patches as complementary. What
>> is your take?
> 
> I'm not proposing to include my patch as is, because it has one issue:
> wake_up will start all waiting vcpus even if there is just a single slot
> free in the ringbuffer. You patch is better in this respect because only
> a few will be started again.

Do you need a wake_up_one() function?

 -- Keir

> I will send comments for it later.
> 
> Olaf
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3 of 3] RFC: xenpaging: use waitqueue in ept_get
  2011-11-23 16:37   ` Olaf Hering
@ 2011-11-23 17:36     ` Andres Lagar-Cavilla
  0 siblings, 0 replies; 7+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-23 17:36 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

Below:
> On Tue, Nov 22, Andres Lagar-Cavilla wrote:
>
>> Hi Olaf,
>>
>> thanks for posting these RFC patches, great work!
>>
>> I have several comments. Most importantly, I want to hash out the
>> interactions between these wait queues and the work I've been doing on
>> p2m
>> synchronization. I've been runnning Xen with synchronized (i.e. locking)
>> p2m lookups for a few weeks now with little/no trouble. You can refer to
>> a
>> patch I posted to the list previously, which I can resend. (I'm waiting
>> on
>> feedback on some previous patches I sent to keep pushing on this work)
>>
>> - I think the waitqueue should be part of struct arch_domain. It is an
>> x86
>> construct that applies only to the base level p2m of the domain, not
>> every
>> possible p2m in a nested setting.
>
> Good point, I will move it. On the other hand, its current location cant
> be the final solution. A wake_up would start all waiting vcpus, not just
> the ones waiting for a certain gfn (in case of paging). There needs to
> be a better way for selective wake_up.

As long as you wrap the wait queue go-to-sleep action in a while loop that
rechecks the sleep condition before exiting the loop, this should be fine.
It's a standard idiom. There is an argument against spurious wake-ups, but
imhois no biggie.

>
>> - The right place to wait is not ept->get_entry, imho, but rather the
>> implementation-independent caller (get_gfn_type_access). More p2m
>> implementations could be added in the future (however unlikely that may
>> be) that can also perform paging.
>
> The wait could probably be moved one level up, yes.
>
>> - With locking p2m lookups, one needs to be careful about in_atomic. I
>> haven't performed a full audit of all callers, but this is what I can
>> say:
>> 1. there are no code paths that perform recursive p2m lookups, *on
>> different gfns*, with p2m_query_t != p2m_query
>> 2. there are recursive lookups of different gfns, with p2m_query_t ==
>> p2m_query, most notably pod sweeps. These do not represent a problem
>> here,
>> since those paths will skip over gfns that are paged.
>>
>> Why is this important? Because we need to be in a position where a code
>> snippet such as
>>
>> get_gfn_type_access(gfn) {
>> retry:
>>    p2m_lock()
>>    mfn = p2m->get_entry(gfn, &p2mt)
>>    if (p2mt == paging) {
>>        p2m_unlock()
>>        sleep_on_waitqueue()
>>        goto retry;
>>    }
>>
>> works. This will get us a non-racy p2m that sends vcpu's to sleep
>> without
>> holding locks. Makes sense?
>
> Something like that can be done if needed, yes.
>
>> - What is the plan for grant operations for pv-on-hvm drivers? Those
>> will
>> enter get_gfn* holding the domain lock, and thus in an atomic context.
>
> Is that a new thing? So far my PVonHVM guests did not encounter that
> issue. I see do_grant_table_op() taking the domain_lock, but is this
> code path really entered from the guest or rather from dom0?
> __get_paged_frame() returns GNTST_eagain, and that needs to be handled
> by callers of do_grant_table_op. linux-2.6.18-xen.hg has code to retry
> the grant operation in that case.

I'm on your side here. I've seen posts in the list about putting dom0 into
an hvm container using ept, though.

Andres
>
> Olaf
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: mem_event: use wait queue when ring is full
  2011-11-23 17:17     ` Keir Fraser
@ 2011-11-23 18:23       ` Olaf Hering
  0 siblings, 0 replies; 7+ messages in thread
From: Olaf Hering @ 2011-11-23 18:23 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Andres Lagar-Cavilla, adin

On Wed, Nov 23, Keir Fraser wrote:

> On 23/11/2011 16:49, "Olaf Hering" <olaf@aepfle.de> wrote:
> > I'm not proposing to include my patch as is, because it has one issue:
> > wake_up will start all waiting vcpus even if there is just a single slot
> > free in the ringbuffer. You patch is better in this respect because only
> > a few will be started again.
> 
> Do you need a wake_up_one() function?

I'm not sure.

In case of paging, if gfn 1234 is missing, several vcpus may wait for it
to come back. Other vcpus may wait for other gfns. So if we had a struct
waitqueue_head for gfn 1234, and other individual gfns, that would help
for this case.
Since there cant be more gfns to wait for than the available guest
vcpus, a list of waitqueue_head's with a gfn attached to it would work.

If that sounds to crazy, or if I'm on the wrong track, how would you do that?
(Perhaps I should prepare a patch to demonstrate what I mean.)

Olaf

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-11-23 18:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <mailman.1126.1321998852.12970.xen-devel@lists.xensource.com>
2011-11-23  4:52 ` [PATCH 3 of 3] RFC: xenpaging: use waitqueue in ept_get Andres Lagar-Cavilla
2011-11-23 16:37   ` Olaf Hering
2011-11-23 17:36     ` Andres Lagar-Cavilla
2011-11-23  4:58 ` RFC: mem_event: use wait queue when ring is full Andres Lagar-Cavilla
2011-11-23 16:49   ` Olaf Hering
2011-11-23 17:17     ` Keir Fraser
2011-11-23 18:23       ` Olaf Hering

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.