* [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps @ 2015-08-31 16:27 Michał Winiarski 2015-08-31 16:40 ` Chris Wilson 2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski 0 siblings, 2 replies; 15+ messages in thread From: Michał Winiarski @ 2015-08-31 16:27 UTC (permalink / raw) To: intel-gfx On each call to gen8_alloc_va_range_3lvl we're allocating temporary bitmaps needed for error handling. Unfortunately, when we increase address space size (48b ppgtt) we do additional (512 - 4) calls to kcalloc, increasing latency between exec and actual start of execution on the GPU. Let's just do a single kcalloc and setup proper offsets in an array, we can also drop the size from free_gen8_temp_bitmaps since it's no longer needed. Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 35 +++++++++++++++-------------------- 1 file changed, 15 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 4a76807..fd5545a 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1126,13 +1126,9 @@ unwind_out: } static void -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, - uint32_t pdpes) +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts) { - int i; - - for (i = 0; i < pdpes; i++) - kfree(new_pts[i]); + kfree(*new_pts); kfree(new_pts); kfree(new_pds); } @@ -1154,17 +1150,16 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, return -ENOMEM; pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL); - if (!pts) { - kfree(pds); - return -ENOMEM; - } + if (!pts) + goto err_out; - for (i = 0; i < pdpes; i++) { - pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES), - sizeof(unsigned long), GFP_KERNEL); - if (!pts[i]) - goto err_out; - } + *pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), + sizeof(unsigned long), GFP_KERNEL); + if (!*pts) + goto err_out; + + for (i = 0; i < pdpes; i++) + pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES); *new_pds = pds; *new_pts = pts; @@ -1172,7 +1167,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, return 0; err_out: - free_gen8_temp_bitmaps(pds, pts, pdpes); + free_gen8_temp_bitmaps(pds, pts); return -ENOMEM; } @@ -1220,7 +1215,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs); if (ret) { - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); return ret; } @@ -1278,7 +1273,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, gen8_setup_page_directory(ppgtt, pdp, pd, pdpe); } - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return 0; @@ -1291,7 +1286,7 @@ err_out: for_each_set_bit(pdpe, new_page_dirs, pdpes) free_pd(dev, pdp->page_directory[pdpe]); - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return ret; } -- 2.4.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-08-31 16:27 [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps Michał Winiarski @ 2015-08-31 16:40 ` Chris Wilson 2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski 1 sibling, 0 replies; 15+ messages in thread From: Chris Wilson @ 2015-08-31 16:40 UTC (permalink / raw) To: Michał Winiarski; +Cc: intel-gfx On Mon, Aug 31, 2015 at 06:27:40PM +0200, Michał Winiarski wrote: > On each call to gen8_alloc_va_range_3lvl we're allocating temporary > bitmaps needed for error handling. Unfortunately, when we increase > address space size (48b ppgtt) we do additional (512 - 4) calls to > kcalloc, increasing latency between exec and actual start of execution > on the GPU. Let's just do a single kcalloc and setup proper offsets in > an array, we can also drop the size from free_gen8_temp_bitmaps since > it's no longer needed. > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > Cc: Michel Thierry <michel.thierry@intel.com> > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Whilst you are here, mind using GFP_TEMPORARY as well? -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-08-31 16:27 [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps Michał Winiarski 2015-08-31 16:40 ` Chris Wilson @ 2015-08-31 16:59 ` Michał Winiarski 2015-08-31 18:42 ` Chris Wilson 2015-09-01 9:06 ` [PATCH v3] " Michał Winiarski 1 sibling, 2 replies; 15+ messages in thread From: Michał Winiarski @ 2015-08-31 16:59 UTC (permalink / raw) To: intel-gfx On each call to gen8_alloc_va_range_3lvl we're allocating temporary bitmaps needed for error handling. Unfortunately, when we increase address space size (48b ppgtt) we do additional (512 - 4) calls to kcalloc, increasing latency between exec and actual start of execution on the GPU. Let's just do a single kcalloc and setup proper offsets in an array, we can also drop the size from free_gen8_temp_bitmaps since it's no longer needed. v2: Use GFP_TEMPORARY to make the allocations reclaimable. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 39 ++++++++++++++++--------------------- 1 file changed, 17 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 4a76807..9270c47 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1126,13 +1126,9 @@ unwind_out: } static void -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, - uint32_t pdpes) +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts) { - int i; - - for (i = 0; i < pdpes; i++) - kfree(new_pts[i]); + kfree(*new_pts); kfree(new_pts); kfree(new_pds); } @@ -1149,22 +1145,21 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, unsigned long *pds; unsigned long **pts; - pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL); + pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY); if (!pds) return -ENOMEM; - pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL); - if (!pts) { - kfree(pds); - return -ENOMEM; - } + pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_TEMPORARY); + if (!pts) + goto err_out; - for (i = 0; i < pdpes; i++) { - pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES), - sizeof(unsigned long), GFP_KERNEL); - if (!pts[i]) - goto err_out; - } + *pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), + sizeof(unsigned long), GFP_TEMPORARY); + if (!*pts) + goto err_out; + + for (i = 0; i < pdpes; i++) + pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES); *new_pds = pds; *new_pts = pts; @@ -1172,7 +1167,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, return 0; err_out: - free_gen8_temp_bitmaps(pds, pts, pdpes); + free_gen8_temp_bitmaps(pds, pts); return -ENOMEM; } @@ -1220,7 +1215,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs); if (ret) { - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); return ret; } @@ -1278,7 +1273,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, gen8_setup_page_directory(ppgtt, pdp, pd, pdpe); } - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return 0; @@ -1291,7 +1286,7 @@ err_out: for_each_set_bit(pdpe, new_page_dirs, pdpes) free_pd(dev, pdp->page_directory[pdpe]); - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return ret; } -- 2.4.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski @ 2015-08-31 18:42 ` Chris Wilson 2015-09-01 9:03 ` Michał Winiarski 2015-09-01 9:06 ` [PATCH v3] " Michał Winiarski 1 sibling, 1 reply; 15+ messages in thread From: Chris Wilson @ 2015-08-31 18:42 UTC (permalink / raw) To: Michał Winiarski; +Cc: intel-gfx On Mon, Aug 31, 2015 at 06:59:40PM +0200, Michał Winiarski wrote: > On each call to gen8_alloc_va_range_3lvl we're allocating temporary > bitmaps needed for error handling. Unfortunately, when we increase > address space size (48b ppgtt) we do additional (512 - 4) calls to > kcalloc, increasing latency between exec and actual start of execution > on the GPU. Let's just do a single kcalloc and setup proper offsets in > an array, we can also drop the size from free_gen8_temp_bitmaps since > it's no longer needed. > > v2: Use GFP_TEMPORARY to make the allocations reclaimable. > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > Cc: Michel Thierry <michel.thierry@intel.com> > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Is there any reason why it remains a 2D array though? Looks like it can be reduced to a single block. And why do we allocate a bitmap for the whole address space rather than just the range being allocated? While you may just answer the questions posed, this looks clumsy: > + *pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), > + sizeof(unsigned long), GFP_TEMPORARY); > + if (!*pts) > + goto err_out; > + > + for (i = 0; i < pdpes; i++) > + pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES); i..e for (i = 1; i < pdpes; i++) pts[i] = pts[0] + i* BITS_TO_LONGS(I915_PDES); raises fewer questions. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-08-31 18:42 ` Chris Wilson @ 2015-09-01 9:03 ` Michał Winiarski 0 siblings, 0 replies; 15+ messages in thread From: Michał Winiarski @ 2015-09-01 9:03 UTC (permalink / raw) To: Chris Wilson, intel-gfx, Mika Kuoppala, Michel Thierry On Mon, Aug 31, 2015 at 07:42:34PM +0100, Chris Wilson wrote: > On Mon, Aug 31, 2015 at 06:59:40PM +0200, Michał Winiarski wrote: > > On each call to gen8_alloc_va_range_3lvl we're allocating temporary > > bitmaps needed for error handling. Unfortunately, when we increase > > address space size (48b ppgtt) we do additional (512 - 4) calls to > > kcalloc, increasing latency between exec and actual start of execution > > on the GPU. Let's just do a single kcalloc and setup proper offsets in > > an array, we can also drop the size from free_gen8_temp_bitmaps since > > it's no longer needed. > > > > v2: Use GFP_TEMPORARY to make the allocations reclaimable. > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > Cc: Michel Thierry <michel.thierry@intel.com> > > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> > > Is there any reason why it remains a 2D array though? Looks like it can > be reduced to a single block. And why do we allocate a bitmap for the > whole address space rather than just the range being allocated? No particular reason, let's try that. It's not the whole address space... Quite possibly just 1/512 of the whole address space, as the PUD is handled in gen8_alloc_va_range_4lvl. I guess it was done to simplify... something, although I can't tell what at this point. > While you may just answer the questions posed, this looks clumsy: Yup - missed that, I was using temp variable before sending the patch. -Michał > > > + *pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), > > + sizeof(unsigned long), GFP_TEMPORARY); > > + if (!*pts) > > + goto err_out; > > + > > + for (i = 0; i < pdpes; i++) > > + pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES); > > i..e > > for (i = 1; i < pdpes; i++) > pts[i] = pts[0] + i* BITS_TO_LONGS(I915_PDES); > > raises fewer questions. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski 2015-08-31 18:42 ` Chris Wilson @ 2015-09-01 9:06 ` Michał Winiarski 2015-09-02 13:40 ` Michel Thierry 1 sibling, 1 reply; 15+ messages in thread From: Michał Winiarski @ 2015-09-01 9:06 UTC (permalink / raw) To: intel-gfx On each call to gen8_alloc_va_range_3lvl we're allocating temporary bitmaps needed for error handling. Unfortunately, when we increase address space size (48b ppgtt) we do additional (512 - 4) calls to kcalloc, increasing latency between exec and actual start of execution on the GPU. Let's just do a single kcalloc, we can also drop the size from free_gen8_temp_bitmaps since it's no longer used. v2: Use GFP_TEMPORARY to make the allocations reclaimable. v3: Drop the 2D array, just allocate a single block. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 45 +++++++++++++------------------------ 1 file changed, 16 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 4a76807..f810762 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1126,13 +1126,8 @@ unwind_out: } static void -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, - uint32_t pdpes) +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts) { - int i; - - for (i = 0; i < pdpes; i++) - kfree(new_pts[i]); kfree(new_pts); kfree(new_pds); } @@ -1142,29 +1137,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, */ static int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, - unsigned long ***new_pts, + unsigned long **new_pts, uint32_t pdpes) { - int i; unsigned long *pds; - unsigned long **pts; + unsigned long *pts; - pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL); + pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY); if (!pds) return -ENOMEM; - pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL); - if (!pts) { - kfree(pds); - return -ENOMEM; - } - - for (i = 0; i < pdpes; i++) { - pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES), - sizeof(unsigned long), GFP_KERNEL); - if (!pts[i]) - goto err_out; - } + pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), + sizeof(unsigned long), GFP_TEMPORARY); + if (!pts) + goto err_out; *new_pds = pds; *new_pts = pts; @@ -1172,7 +1158,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, return 0; err_out: - free_gen8_temp_bitmaps(pds, pts, pdpes); + free_gen8_temp_bitmaps(pds, pts); return -ENOMEM; } @@ -1193,7 +1179,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, { struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base); - unsigned long *new_page_dirs, **new_page_tables; + unsigned long *new_page_dirs, *new_page_tables; struct drm_device *dev = vm->dev; struct i915_page_directory *pd; const uint64_t orig_start = start; @@ -1220,14 +1206,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs); if (ret) { - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); return ret; } /* For every page directory referenced, allocate page tables */ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) { ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length, - new_page_tables[pdpe]); + new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES)); if (ret) goto err_out; } @@ -1278,20 +1264,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, gen8_setup_page_directory(ppgtt, pdp, pd, pdpe); } - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return 0; err_out: while (pdpe--) { - for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES) + for_each_set_bit(temp, new_page_tables + pdpe * + BITS_TO_LONGS(I915_PDES), I915_PDES) free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]); } for_each_set_bit(pdpe, new_page_dirs, pdpes) free_pd(dev, pdp->page_directory[pdpe]); - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return ret; } -- 2.4.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-01 9:06 ` [PATCH v3] " Michał Winiarski @ 2015-09-02 13:40 ` Michel Thierry 2015-09-02 13:46 ` Chris Wilson 0 siblings, 1 reply; 15+ messages in thread From: Michel Thierry @ 2015-09-02 13:40 UTC (permalink / raw) To: Michał Winiarski, intel-gfx On 9/1/2015 10:06 AM, Michał Winiarski wrote: > On each call to gen8_alloc_va_range_3lvl we're allocating temporary > bitmaps needed for error handling. Unfortunately, when we increase > address space size (48b ppgtt) we do additional (512 - 4) calls to > kcalloc, increasing latency between exec and actual start of execution > on the GPU. Let's just do a single kcalloc, we can also drop the size > from free_gen8_temp_bitmaps since it's no longer used. > > v2: Use GFP_TEMPORARY to make the allocations reclaimable. > v3: Drop the 2D array, just allocate a single block. > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > Cc: Michel Thierry <michel.thierry@intel.com> > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Unless Chris thinks otherwise, I see Michał already addressed his comments. Reviewed-by: Michel Thierry <michel.thierry@intel.com> > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 45 +++++++++++++------------------------ > 1 file changed, 16 insertions(+), 29 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 4a76807..f810762 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -1126,13 +1126,8 @@ unwind_out: > } > > static void > -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, > - uint32_t pdpes) > +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts) > { > - int i; > - > - for (i = 0; i < pdpes; i++) > - kfree(new_pts[i]); > kfree(new_pts); > kfree(new_pds); > } > @@ -1142,29 +1137,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, > */ > static > int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, > - unsigned long ***new_pts, > + unsigned long **new_pts, > uint32_t pdpes) > { > - int i; > unsigned long *pds; > - unsigned long **pts; > + unsigned long *pts; > > - pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL); > + pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY); > if (!pds) > return -ENOMEM; > > - pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL); > - if (!pts) { > - kfree(pds); > - return -ENOMEM; > - } > - > - for (i = 0; i < pdpes; i++) { > - pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES), > - sizeof(unsigned long), GFP_KERNEL); > - if (!pts[i]) > - goto err_out; > - } > + pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), > + sizeof(unsigned long), GFP_TEMPORARY); > + if (!pts) > + goto err_out; > > *new_pds = pds; > *new_pts = pts; > @@ -1172,7 +1158,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, > return 0; > > err_out: > - free_gen8_temp_bitmaps(pds, pts, pdpes); > + free_gen8_temp_bitmaps(pds, pts); > return -ENOMEM; > } > > @@ -1193,7 +1179,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, > { > struct i915_hw_ppgtt *ppgtt = > container_of(vm, struct i915_hw_ppgtt, base); > - unsigned long *new_page_dirs, **new_page_tables; > + unsigned long *new_page_dirs, *new_page_tables; > struct drm_device *dev = vm->dev; > struct i915_page_directory *pd; > const uint64_t orig_start = start; > @@ -1220,14 +1206,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, > ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, > new_page_dirs); > if (ret) { > - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); > + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); > return ret; > } > > /* For every page directory referenced, allocate page tables */ > gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) { > ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length, > - new_page_tables[pdpe]); > + new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES)); > if (ret) > goto err_out; > } > @@ -1278,20 +1264,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, > gen8_setup_page_directory(ppgtt, pdp, pd, pdpe); > } > > - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); > + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); > mark_tlbs_dirty(ppgtt); > return 0; > > err_out: > while (pdpe--) { > - for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES) > + for_each_set_bit(temp, new_page_tables + pdpe * > + BITS_TO_LONGS(I915_PDES), I915_PDES) > free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]); > } > > for_each_set_bit(pdpe, new_page_dirs, pdpes) > free_pd(dev, pdp->page_directory[pdpe]); > > - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); > + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); > mark_tlbs_dirty(ppgtt); > return ret; > } > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-02 13:40 ` Michel Thierry @ 2015-09-02 13:46 ` Chris Wilson 2015-09-02 15:13 ` Daniel Vetter 0 siblings, 1 reply; 15+ messages in thread From: Chris Wilson @ 2015-09-02 13:46 UTC (permalink / raw) To: Michel Thierry; +Cc: intel-gfx On Wed, Sep 02, 2015 at 02:40:03PM +0100, Michel Thierry wrote: > On 9/1/2015 10:06 AM, Michał Winiarski wrote: > >On each call to gen8_alloc_va_range_3lvl we're allocating temporary > >bitmaps needed for error handling. Unfortunately, when we increase > >address space size (48b ppgtt) we do additional (512 - 4) calls to > >kcalloc, increasing latency between exec and actual start of execution > >on the GPU. Let's just do a single kcalloc, we can also drop the size > >from free_gen8_temp_bitmaps since it's no longer used. > > > >v2: Use GFP_TEMPORARY to make the allocations reclaimable. > >v3: Drop the 2D array, just allocate a single block. > > > >Cc: Chris Wilson <chris@chris-wilson.co.uk> > >Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > >Cc: Michel Thierry <michel.thierry@intel.com> > >Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> > > Unless Chris thinks otherwise, I see Michał already addressed his comments. I think I spotted a misaligned bracket... ;) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chri -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-02 13:46 ` Chris Wilson @ 2015-09-02 15:13 ` Daniel Vetter 2015-09-02 15:26 ` Daniel Vetter 0 siblings, 1 reply; 15+ messages in thread From: Daniel Vetter @ 2015-09-02 15:13 UTC (permalink / raw) To: Chris Wilson, Michel Thierry, Michał Winiarski, intel-gfx, Mika Kuoppala On Wed, Sep 02, 2015 at 02:46:41PM +0100, Chris Wilson wrote: > On Wed, Sep 02, 2015 at 02:40:03PM +0100, Michel Thierry wrote: > > On 9/1/2015 10:06 AM, Michał Winiarski wrote: > > >On each call to gen8_alloc_va_range_3lvl we're allocating temporary > > >bitmaps needed for error handling. Unfortunately, when we increase > > >address space size (48b ppgtt) we do additional (512 - 4) calls to > > >kcalloc, increasing latency between exec and actual start of execution > > >on the GPU. Let's just do a single kcalloc, we can also drop the size > > >from free_gen8_temp_bitmaps since it's no longer used. > > > > > >v2: Use GFP_TEMPORARY to make the allocations reclaimable. > > >v3: Drop the 2D array, just allocate a single block. > > > > > >Cc: Chris Wilson <chris@chris-wilson.co.uk> > > >Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > >Cc: Michel Thierry <michel.thierry@intel.com> > > >Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> > > > > Unless Chris thinks otherwise, I see Michał already addressed his comments. > > I think I spotted a misaligned bracket... ;) checkpatch didn't spot it and neither me ... > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Queued for -next, thanks for the patch. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-02 15:13 ` Daniel Vetter @ 2015-09-02 15:26 ` Daniel Vetter 2015-09-02 15:46 ` [PATCH v4] " Michał Winiarski 0 siblings, 1 reply; 15+ messages in thread From: Daniel Vetter @ 2015-09-02 15:26 UTC (permalink / raw) To: Chris Wilson, Michel Thierry, Michał Winiarski, intel-gfx, Mika Kuoppala On Wed, Sep 02, 2015 at 05:13:29PM +0200, Daniel Vetter wrote: > On Wed, Sep 02, 2015 at 02:46:41PM +0100, Chris Wilson wrote: > > On Wed, Sep 02, 2015 at 02:40:03PM +0100, Michel Thierry wrote: > > > On 9/1/2015 10:06 AM, Michał Winiarski wrote: > > > >On each call to gen8_alloc_va_range_3lvl we're allocating temporary > > > >bitmaps needed for error handling. Unfortunately, when we increase > > > >address space size (48b ppgtt) we do additional (512 - 4) calls to > > > >kcalloc, increasing latency between exec and actual start of execution > > > >on the GPU. Let's just do a single kcalloc, we can also drop the size > > > >from free_gen8_temp_bitmaps since it's no longer used. > > > > > > > >v2: Use GFP_TEMPORARY to make the allocations reclaimable. > > > >v3: Drop the 2D array, just allocate a single block. > > > > > > > >Cc: Chris Wilson <chris@chris-wilson.co.uk> > > > >Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > > >Cc: Michel Thierry <michel.thierry@intel.com> > > > >Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> > > > > > > Unless Chris thinks otherwise, I see Michał already addressed his comments. > > > > I think I spotted a misaligned bracket... ;) > > checkpatch didn't spot it and neither me ... > > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > > Queued for -next, thanks for the patch. Strike that, patch doesn't compile too well on plain dinq. What's going on here? And there doesn't seem to be anything in -nightly which isn't in dinq ... -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-02 15:26 ` Daniel Vetter @ 2015-09-02 15:46 ` Michał Winiarski 2015-09-02 16:05 ` Chris Wilson 2015-09-03 17:22 ` [PATCH v5] " Michał Winiarski 0 siblings, 2 replies; 15+ messages in thread From: Michał Winiarski @ 2015-09-02 15:46 UTC (permalink / raw) To: intel-gfx, Daniel Vetter On each call to gen8_alloc_va_range_3lvl we're allocating temporary bitmaps needed for error handling. Unfortunately, when we increase address space size (48b ppgtt) we do additional (512 - 4) calls to kcalloc, increasing latency between exec and actual start of execution on the GPU. Let's just do a single kcalloc, we can also drop the size from free_gen8_temp_bitmaps since it's no longer used. v2: Use GFP_TEMPORARY to make the allocations reclaimable. v3: Drop the 2D array, just allocate a single block. v4: Rebase to handle gen8_preallocate_top_level_pdps. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 49 ++++++++++++++----------------------- 1 file changed, 18 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index bdb7adc..8885ceb 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1164,13 +1164,8 @@ unwind_out: } static void -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, - uint32_t pdpes) +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts) { - int i; - - for (i = 0; i < pdpes; i++) - kfree(new_pts[i]); kfree(new_pts); kfree(new_pds); } @@ -1180,29 +1175,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, */ static int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, - unsigned long ***new_pts, + unsigned long **new_pts, uint32_t pdpes) { - int i; unsigned long *pds; - unsigned long **pts; + unsigned long *pts; - pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL); + pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY); if (!pds) return -ENOMEM; - pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL); - if (!pts) { - kfree(pds); - return -ENOMEM; - } - - for (i = 0; i < pdpes; i++) { - pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES), - sizeof(unsigned long), GFP_KERNEL); - if (!pts[i]) - goto err_out; - } + pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), + sizeof(unsigned long), GFP_TEMPORARY); + if (!pts) + goto err_out; *new_pds = pds; *new_pts = pts; @@ -1210,7 +1196,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, return 0; err_out: - free_gen8_temp_bitmaps(pds, pts, pdpes); + free_gen8_temp_bitmaps(pds, pts); return -ENOMEM; } @@ -1231,7 +1217,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, { struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base); - unsigned long *new_page_dirs, **new_page_tables; + unsigned long *new_page_dirs, *new_page_tables; struct drm_device *dev = vm->dev; struct i915_page_directory *pd; const uint64_t orig_start = start; @@ -1258,14 +1244,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs); if (ret) { - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); return ret; } /* For every page directory referenced, allocate page tables */ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) { ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length, - new_page_tables[pdpe]); + new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES)); if (ret) goto err_out; } @@ -1316,20 +1302,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, gen8_setup_page_directory(ppgtt, pdp, pd, pdpe); } - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return 0; err_out: while (pdpe--) { - for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES) + for_each_set_bit(temp, new_page_tables + pdpe * + BITS_TO_LONGS(I915_PDES), I915_PDES) free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]); } for_each_set_bit(pdpe, new_page_dirs, pdpes) free_pd(dev, pdp->page_directory[pdpe]); - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return ret; } @@ -1481,7 +1468,7 @@ static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m) static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt) { - unsigned long *new_page_dirs, **new_page_tables; + unsigned long *new_page_dirs, *new_page_tables; uint32_t pdpes = I915_PDPES_PER_PDP(dev); int ret; @@ -1501,7 +1488,7 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt) if (!ret) *ppgtt->pdp.used_pdpes = *new_page_dirs; - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); return ret; } -- 2.4.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-02 15:46 ` [PATCH v4] " Michał Winiarski @ 2015-09-02 16:05 ` Chris Wilson 2015-09-03 17:22 ` [PATCH v5] " Michał Winiarski 1 sibling, 0 replies; 15+ messages in thread From: Chris Wilson @ 2015-09-02 16:05 UTC (permalink / raw) To: Michał Winiarski; +Cc: intel-gfx On Wed, Sep 02, 2015 at 05:46:38PM +0200, Michał Winiarski wrote: > + pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), > + sizeof(unsigned long), GFP_TEMPORARY); > + if (!pts) > + goto err_out; This is the oddly aligned bracket! -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v5] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-02 15:46 ` [PATCH v4] " Michał Winiarski 2015-09-02 16:05 ` Chris Wilson @ 2015-09-03 17:22 ` Michał Winiarski 2015-09-03 20:48 ` Chris Wilson 1 sibling, 1 reply; 15+ messages in thread From: Michał Winiarski @ 2015-09-03 17:22 UTC (permalink / raw) To: intel-gfx, Daniel Vetter On each call to gen8_alloc_va_range_3lvl we're allocating temporary bitmaps needed for error handling. Unfortunately, when we increase address space size (48b ppgtt) we do additional (512 - 4) calls to kcalloc, increasing latency between exec and actual start of execution on the GPU. Let's just do a single kcalloc, we can also drop the size from free_gen8_temp_bitmaps since it's no longer used. v2: Use GFP_TEMPORARY to make the allocations reclaimable. v3: Drop the 2D array, just allocate a single block. v4: Rebase to handle gen8_preallocate_top_level_pdps. v5: Align misaligned bracket. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 49 ++++++++++++++----------------------- 1 file changed, 18 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index bdb7adc..b40a9cb 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1164,13 +1164,8 @@ unwind_out: } static void -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, - uint32_t pdpes) +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts) { - int i; - - for (i = 0; i < pdpes; i++) - kfree(new_pts[i]); kfree(new_pts); kfree(new_pds); } @@ -1180,29 +1175,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts, */ static int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, - unsigned long ***new_pts, + unsigned long **new_pts, uint32_t pdpes) { - int i; unsigned long *pds; - unsigned long **pts; + unsigned long *pts; - pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL); + pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY); if (!pds) return -ENOMEM; - pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL); - if (!pts) { - kfree(pds); - return -ENOMEM; - } - - for (i = 0; i < pdpes; i++) { - pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES), - sizeof(unsigned long), GFP_KERNEL); - if (!pts[i]) - goto err_out; - } + pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), + sizeof(unsigned long), GFP_TEMPORARY); + if (!pts) + goto err_out; *new_pds = pds; *new_pts = pts; @@ -1210,7 +1196,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds, return 0; err_out: - free_gen8_temp_bitmaps(pds, pts, pdpes); + free_gen8_temp_bitmaps(pds, pts); return -ENOMEM; } @@ -1231,7 +1217,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, { struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base); - unsigned long *new_page_dirs, **new_page_tables; + unsigned long *new_page_dirs, *new_page_tables; struct drm_device *dev = vm->dev; struct i915_page_directory *pd; const uint64_t orig_start = start; @@ -1258,14 +1244,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs); if (ret) { - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); return ret; } /* For every page directory referenced, allocate page tables */ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) { ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length, - new_page_tables[pdpe]); + new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES)); if (ret) goto err_out; } @@ -1316,20 +1302,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm, gen8_setup_page_directory(ppgtt, pdp, pd, pdpe); } - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return 0; err_out: while (pdpe--) { - for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES) + for_each_set_bit(temp, new_page_tables + pdpe * + BITS_TO_LONGS(I915_PDES), I915_PDES) free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]); } for_each_set_bit(pdpe, new_page_dirs, pdpes) free_pd(dev, pdp->page_directory[pdpe]); - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); mark_tlbs_dirty(ppgtt); return ret; } @@ -1481,7 +1468,7 @@ static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m) static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt) { - unsigned long *new_page_dirs, **new_page_tables; + unsigned long *new_page_dirs, *new_page_tables; uint32_t pdpes = I915_PDPES_PER_PDP(dev); int ret; @@ -1501,7 +1488,7 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt) if (!ret) *ppgtt->pdp.used_pdpes = *new_page_dirs; - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes); + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables); return ret; } -- 2.4.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v5] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-03 17:22 ` [PATCH v5] " Michał Winiarski @ 2015-09-03 20:48 ` Chris Wilson 2015-09-04 7:53 ` Daniel Vetter 0 siblings, 1 reply; 15+ messages in thread From: Chris Wilson @ 2015-09-03 20:48 UTC (permalink / raw) To: Michał Winiarski; +Cc: intel-gfx On Thu, Sep 03, 2015 at 07:22:18PM +0200, Michał Winiarski wrote: > + pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), > + sizeof(unsigned long), GFP_TEMPORARY); Something to remember is that kcalloc is written presuming that the size argument (the second) is constant. pts = kcalloc(pdpes, BITS_TO_LONGS(I915_PDES) * sizeof(unsigned long), GFP_TEMPORARY); should be infinitesimally more efficient. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps 2015-09-03 20:48 ` Chris Wilson @ 2015-09-04 7:53 ` Daniel Vetter 0 siblings, 0 replies; 15+ messages in thread From: Daniel Vetter @ 2015-09-04 7:53 UTC (permalink / raw) To: Chris Wilson, Michał Winiarski, intel-gfx, Daniel Vetter, Mika Kuoppala, Michel Thierry On Thu, Sep 03, 2015 at 09:48:03PM +0100, Chris Wilson wrote: > On Thu, Sep 03, 2015 at 07:22:18PM +0200, Michał Winiarski wrote: > > + pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES), > > + sizeof(unsigned long), GFP_TEMPORARY); > > Something to remember is that kcalloc is written presuming that the size > argument (the second) is constant. > > pts = kcalloc(pdpes, > BITS_TO_LONGS(I915_PDES) * sizeof(unsigned long), > GFP_TEMPORARY); > > should be infinitesimally more efficient. That's also better style from a security pov sinc kcalloc checks for overflows. Which means if you multiply yourself things might overflow and go boom. Luckily pdpdes is guaranteed to be small enough here, but still secure coding best practices means you better not multiply the variable. Hence I changed this. > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Queued for -next, thanks for the patch. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-09-04 7:51 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-31 16:27 [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps Michał Winiarski 2015-08-31 16:40 ` Chris Wilson 2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski 2015-08-31 18:42 ` Chris Wilson 2015-09-01 9:03 ` Michał Winiarski 2015-09-01 9:06 ` [PATCH v3] " Michał Winiarski 2015-09-02 13:40 ` Michel Thierry 2015-09-02 13:46 ` Chris Wilson 2015-09-02 15:13 ` Daniel Vetter 2015-09-02 15:26 ` Daniel Vetter 2015-09-02 15:46 ` [PATCH v4] " Michał Winiarski 2015-09-02 16:05 ` Chris Wilson 2015-09-03 17:22 ` [PATCH v5] " Michał Winiarski 2015-09-03 20:48 ` Chris Wilson 2015-09-04 7:53 ` Daniel Vetter
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.