From: Zhenyu Wang <zhenyuw@linux.intel.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, alex.williamson@redhat.com,
zhenyuw@linux.intel.com, pbonzini@redhat.com,
kevin.tian@intel.com, peterx@redhat.com
Subject: Re: [PATCH v4 7/7] drm/i915/gvt: rw more pages a time for shadow context
Date: Mon, 16 Mar 2020 11:23:20 +0800
Message-ID: <20200316032320.GA20491@zhen-hp.sh.intel.com>
In-Reply-To: <20200313031233.8094-1-yan.y.zhao@intel.com>
On 2020.03.12 23:12:33 -0400, Yan Zhao wrote:
> 1. As the shadow context is pinned in intel_vgpu_setup_submission() and
> unpinned in intel_vgpu_clean_submission(), its base virtual address can
> be safely obtained from lrc_reg_state, so there is no need to call
> kmap()/kunmap() repeatedly.
>
> 2. The IOVAs (GPAs) of the context pages are checked, and if they are
> consecutive, they are read/written together in a single
> intel_gvt_hypervisor_read_gpa() / intel_gvt_hypervisor_write_gpa() call.
>
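(For 1.: since lrc_reg_state points at page LRC_STATE_PN of the pinned
context image, page i of the image sits at

    lrc_reg_state - (LRC_STATE_PN << I915_GTT_PAGE_SHIFT)
                  + (i << I915_GTT_PAGE_SHIFT)

which is exactly how context_base and the per-page dst/src are computed
in the patch below.)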
> After the two changes in this patch, the
Better to split the kmap removal and the consecutive-copy change into
two patches, for easier bisecting.
> average cycles for populate_shadow_context() and update_guest_context()
> are reduced by ~10000-20000 cycles, depending on the average number of
> consecutive pages in each read/write.
>
> (1) comparison of cycles of
> populate_shadow_context() + update_guest_context() when executing
> different benchmarks
> ---------------------------------------------------------------
> |       cycles      |   glmark2   | lightsmark  |  openarena  |
> |-------------------------------------------------------------|
> | before this patch |    65968    |    97852    |    61373    |
> | after this patch  | 56017 (85%) | 73862 (75%) | 47463 (77%) |
> ---------------------------------------------------------------
>
> (2) average count of pages read/written at a time in
> populate_shadow_context() and update_guest_context()
> for each benchmark
>
> ---------------------------------------------------------------
> |      page cnt     |   glmark2   | lightsmark  |  openarena  |
> |-------------------------------------------------------------|
> | before this patch |      1      |      1      |      1      |
> | after this patch  |     5.25    |    19.99    |      20     |
> ---------------------------------------------------------------
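An average of ~20 pages per call for lightsmark and openarena means the
~20 single-page hypervisor accesses in each context copy collapse into
roughly one read or write, which lines up with the ~25% cycle reduction
in table (1).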
>
> (3) comparison of benchmark scores
> -----------------------------------------------------------------------
> |       score       |    glmark2    |   lightsmark   |   openarena    |
> |---------------------------------------------------------------------|
> | before this patch |      1244     |     222.18     |     114.4      |
> | after this patch  | 1248 (100.3%) | 225.8 (101.6%) | 115.0 (100.9%) |
> -----------------------------------------------------------------------
>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> drivers/gpu/drm/i915/gvt/scheduler.c | 95 ++++++++++++++++++++--------
> 1 file changed, 67 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index 1c95bf8cbed0..852d924f6abc 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -128,16 +128,21 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
> {
> struct intel_vgpu *vgpu = workload->vgpu;
> struct intel_gvt *gvt = vgpu->gvt;
> - struct drm_i915_gem_object *ctx_obj =
> - workload->req->context->state->obj;
> + struct intel_context *ctx = workload->req->context;
> struct execlist_ring_context *shadow_ring_context;
> - struct page *page;
> void *dst;
> + void *context_base;
> unsigned long context_gpa, context_page_num;
> + unsigned long gpa_base; /* first gpa of consecutive GPAs */
> + unsigned long gpa_size; /* size of consecutive GPAs */
> int i;
>
> - page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> - shadow_ring_context = kmap(page);
> + GEM_BUG_ON(!intel_context_is_pinned(ctx));
> +
> + context_base = (void *) ctx->lrc_reg_state -
> + (LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
> +
> + shadow_ring_context = (void *) ctx->lrc_reg_state;
>
> sr_oa_regs(workload, (u32 *)shadow_ring_context, true);
> #define COPY_REG(name) \
> @@ -169,7 +174,6 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
> I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
>
> sr_oa_regs(workload, (u32 *)shadow_ring_context, false);
> - kunmap(page);
>
> if (IS_RESTORE_INHIBIT(shadow_ring_context->ctx_ctrl.val))
> return 0;
> @@ -184,8 +188,12 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
> if (IS_BROADWELL(gvt->gt->i915) && workload->engine->id == RCS0)
> context_page_num = 19;
>
> - i = 2;
> - while (i < context_page_num) {
> +
> +	/* find runs of consecutive GPAs from the gma; read each run
> +	 * into the dst virtual address in a single call
> +	 */
> + gpa_size = 0;
> + for (i = 2; i < context_page_num; i++) {
> context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
> (u32)((workload->ctx_desc.lrca + i) <<
> I915_GTT_PAGE_SHIFT));
> @@ -194,12 +202,24 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
> return -EFAULT;
> }
>
> - page = i915_gem_object_get_page(ctx_obj, i);
> - dst = kmap(page);
> - intel_gvt_hypervisor_read_gpa(vgpu, context_gpa, dst,
> - I915_GTT_PAGE_SIZE);
> - kunmap(page);
> - i++;
> + if (gpa_size == 0) {
> + gpa_base = context_gpa;
> + dst = context_base + (i << I915_GTT_PAGE_SHIFT);
> + } else if (context_gpa != gpa_base + gpa_size)
> + goto read;
> +
> + gpa_size += I915_GTT_PAGE_SIZE;
> +
> + if (i == context_page_num - 1)
> + goto read;
> +
> + continue;
> +
> +read:
> + intel_gvt_hypervisor_read_gpa(vgpu, gpa_base, dst, gpa_size);
> + gpa_base = context_gpa;
> + gpa_size = I915_GTT_PAGE_SIZE;
> + dst = context_base + (i << I915_GTT_PAGE_SHIFT);
> }
> return 0;
> }
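The range-merging itself is easy to sanity check in isolation. Below is
a minimal userspace sketch of the same pattern -- read_gpa() is just a
stub standing in for intel_gvt_hypervisor_read_gpa(), and the names are
illustrative, not the driver's. Note that flushing the tail after the
loop also covers a final page that starts a new run:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* stub: just log the range that would be read from guest memory */
static void read_gpa(unsigned long gpa, unsigned long size)
{
	printf("read gpa 0x%lx..0x%lx (%lu pages)\n",
	       gpa, gpa + size - 1, size / PAGE_SIZE);
}

/*
 * Merge runs of physically consecutive guest pages into single reads.
 * gpas[] plays the role of the per-page intel_vgpu_gma_to_gpa() results.
 */
static void copy_pages(const unsigned long *gpas, int n)
{
	unsigned long gpa_base = 0, gpa_size = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (gpa_size == 0) {
			gpa_base = gpas[i];
		} else if (gpas[i] != gpa_base + gpa_size) {
			/* run broken: flush it, then start a new run */
			read_gpa(gpa_base, gpa_size);
			gpa_base = gpas[i];
			gpa_size = 0;
		}
		gpa_size += PAGE_SIZE;
	}
	if (gpa_size)	/* flush the final run, even a one-page one */
		read_gpa(gpa_base, gpa_size);
}

int main(void)
{
	/* three consecutive pages, a gap, then two consecutive pages */
	unsigned long gpas[] = {
		0x10000, 0x11000, 0x12000, 0x40000, 0x41000
	};

	copy_pages(gpas, sizeof(gpas) / sizeof(gpas[0]));
	return 0;
}

Compiled and run, this prints two reads -- one of 3 pages and one of
2 -- instead of the five single-page reads the old code would issue.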
> @@ -784,19 +804,23 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
> {
> struct i915_request *rq = workload->req;
> struct intel_vgpu *vgpu = workload->vgpu;
> - struct drm_i915_gem_object *ctx_obj = rq->context->state->obj;
> + struct intel_context *ctx = workload->req->context;
> struct execlist_ring_context *shadow_ring_context;
> - struct page *page;
> - void *src;
> unsigned long context_gpa, context_page_num;
> + unsigned long gpa_base; /* first gpa of consecutive GPAs */
> +	unsigned long gpa_size; /* size of consecutive GPAs */
> int i;
> u32 ring_base;
> u32 head, tail;
> u16 wrap_count;
> + void *src;
> + void *context_base;
>
> gvt_dbg_sched("ring id %d workload lrca %x\n", rq->engine->id,
> workload->ctx_desc.lrca);
>
> + GEM_BUG_ON(!intel_context_is_pinned(ctx));
> +
> head = workload->rb_head;
> tail = workload->rb_tail;
> wrap_count = workload->guest_rb_head >> RB_HEAD_WRAP_CNT_OFF;
> @@ -820,9 +844,14 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
> if (IS_BROADWELL(rq->i915) && rq->engine->id == RCS0)
> context_page_num = 19;
>
> - i = 2;
> + context_base = (void *) ctx->lrc_reg_state -
> + (LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
>
> - while (i < context_page_num) {
> +	/* find runs of consecutive GPAs from the gma; write each run
> +	 * from the src virtual address in a single call
> +	 */
> + gpa_size = 0;
> + for (i = 2; i < context_page_num; i++) {
> context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
> (u32)((workload->ctx_desc.lrca + i) <<
> I915_GTT_PAGE_SHIFT));
> @@ -831,19 +860,30 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
> return;
> }
>
> - page = i915_gem_object_get_page(ctx_obj, i);
> - src = kmap(page);
> - intel_gvt_hypervisor_write_gpa(vgpu, context_gpa, src,
> - I915_GTT_PAGE_SIZE);
> - kunmap(page);
> - i++;
> + if (gpa_size == 0) {
> + gpa_base = context_gpa;
> + src = context_base + (i << I915_GTT_PAGE_SHIFT);
> + } else if (context_gpa != gpa_base + gpa_size)
> + goto write;
> +
> + gpa_size += I915_GTT_PAGE_SIZE;
> +
> + if (i == context_page_num - 1)
> + goto write;
> +
> + continue;
> +
> +write:
> + intel_gvt_hypervisor_write_gpa(vgpu, gpa_base, src, gpa_size);
> + gpa_base = context_gpa;
> + gpa_size = I915_GTT_PAGE_SIZE;
> + src = context_base + (i << I915_GTT_PAGE_SHIFT);
> }
>
> intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa +
> RING_CTX_OFF(ring_header.val), &workload->rb_tail, 4);
>
> - page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> - shadow_ring_context = kmap(page);
> + shadow_ring_context = (void *) ctx->lrc_reg_state;
>
> #define COPY_REG(name) \
> intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa + \
> @@ -861,7 +901,6 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
> sizeof(*shadow_ring_context),
> I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
>
> - kunmap(page);
> }
>
> void intel_vgpu_clean_workloads(struct intel_vgpu *vgpu,
> --
> 2.17.1
>
--
Open Source Technology Center, Intel ltd.
$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827