* [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
From: Felix Kuehling @ 2022-07-14 21:58 UTC
  To: amd-gfx; +Cc: christian.koenig

Backport of Christian's patch 81b0d0e4f811 to amd-staging-drm-next. This
branch may be nearly obsolete, but this patch may still be worth
applying as it can serve as a template for backports to some release
branches. It fixes intermittent kernel oopses when memory is severely
overcommitted.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index be24bb6cefd0..165a6cbb45d5 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -157,6 +157,9 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 			list_for_each_entry(bo, &man->lru[j], lru) {
 				uint32_t num_pages = PFN_UP(bo->base.size);
 
+				if (!bo->resource)
+					continue;
+
 				ret = ttm_bo_swapout(bo, ctx, gfp_flags);
 				/* ttm_bo_swapout has dropped the lru_lock */
 				if (!ret)
-- 
2.32.0
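The guard added by this backport can be illustrated outside the kernel. The following is a minimal userspace sketch, not the TTM code itself: `mock_bo`, `mock_resource`, and `count_swappable` are hypothetical stand-ins invented for illustration, and a plain singly linked list replaces the kernel's LRU list. It only shows the pattern of skipping entries whose resource pointer is NULL before doing any work on them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real TTM structs. */
struct mock_resource {
	int placement;
};

struct mock_bo {
	struct mock_resource *resource; /* NULL once the resource is detached */
	size_t size;                    /* buffer size in bytes */
	struct mock_bo *next;           /* simple singly linked "LRU" */
};

/*
 * Walk the list and count the buffers that are still eligible for swapout.
 * Entries without a resource are skipped, mirroring the
 * "if (!bo->resource) continue;" guard in the patch. The number of skipped
 * entries is reported through *skipped.
 */
static size_t count_swappable(const struct mock_bo *head, size_t *skipped)
{
	const struct mock_bo *bo;
	size_t eligible = 0;

	assert(skipped != NULL);
	*skipped = 0;

	for (bo = head; bo != NULL; bo = bo->next) {
		if (!bo->resource) {
			(*skipped)++;
			continue; /* nothing to swap out for this entry */
		}
		eligible++;
	}
	return eligible;
}
```

In the real function the loop body calls `ttm_bo_swapout()` for each eligible buffer; the sketch only counts entries so that the effect of the guard is easy to observe.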
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
From: Christian König @ 2022-07-15 6:46 UTC
  To: Felix Kuehling, amd-gfx

On 14.07.22 at 23:58, Felix Kuehling wrote:
> Backport of Christian's patch 81b0d0e4f811 to amd-staging-drm-next. This
> branch may be nearly obsolete, but this patch may still be worth
> applying as it can serve as a template for backports to some release
> branches. It fixes intermittent kernel oopses when memory is severely
> overcommitted.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>

I was hoping that Alex's rebase would land before anybody noticed this
problem.

Anyway, the patch is Reviewed-by: Christian König <christian.koenig@amd.com>.

Regards,
Christian.

> ---
>  drivers/gpu/drm/ttm/ttm_device.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index be24bb6cefd0..165a6cbb45d5 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -157,6 +157,9 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>  			list_for_each_entry(bo, &man->lru[j], lru) {
>  				uint32_t num_pages = PFN_UP(bo->base.size);
>  
> +				if (!bo->resource)
> +					continue;
> +
>  				ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>  				/* ttm_bo_swapout has dropped the lru_lock */
>  				if (!ret)
[parent not found: <20220603104604.456991-1-christian.koenig@amd.com>]
[parent not found: <c9b23cac-6bf0-e8ad-d6b1-f59c1ee1569f@amd.com>]
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
From: Felix Kuehling @ 2022-06-03 22:44 UTC
  To: Christian König, mike, dri-devel, amd-gfx@lists.freedesktop.org
  Cc: Christian König

[+amd-gfx]

On 2022-06-03 15:37, Felix Kuehling wrote:
>
> On 2022-06-03 06:46, Christian König wrote:
>> Resources about to be destructed are not tied to BOs any more.
> I've been seeing a backtrace in that area with a patch series I'm
> working on, but didn't have enough time to track it down yet. I'll try
> if this patch fixes it.

The patch doesn't apply on amd-staging-drm-next. I made the following
change instead, which fixes my problem (and I do see the pr_err being
triggered):

--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -157,6 +157,10 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 			list_for_each_entry(bo, &man->lru[j], lru) {
 				uint32_t num_pages = PFN_UP(bo->base.size);
 
+				if (!bo->resource) {
+					pr_err("### bo->resource is NULL\n");
+					continue;
+				}
 				ret = ttm_bo_swapout(bo, ctx, gfp_flags);
 				/* ttm_bo_swapout has dropped the lru_lock */
 				if (!ret)

>
> Regards,
>   Felix
>
>
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>> index a0562ab386f5..e7147e304637 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>>  
>>  		ttm_resource_manager_for_each_res(man, &cursor, res) {
>>  			struct ttm_buffer_object *bo = res->bo;
>> -			uint32_t num_pages = PFN_UP(bo->base.size);
>> +			uint32_t num_pages;
>>  
>> +			if (!bo)
>> +				continue;
>> +
>> +			num_pages = PFN_UP(bo->base.size);
>>  			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>>  			/* ttm_bo_swapout has dropped the lru_lock */
>>  			if (!ret)
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
From: Christian König @ 2022-06-06 10:20 UTC
  To: Felix Kuehling, mike, dri-devel, amd-gfx@lists.freedesktop.org
  Cc: Christian König

On 04.06.22 at 00:44, Felix Kuehling wrote:
> [+amd-gfx]
>
>
> On 2022-06-03 15:37, Felix Kuehling wrote:
>>
>> On 2022-06-03 06:46, Christian König wrote:
>>> Resources about to be destructed are not tied to BOs any more.
>> I've been seeing a backtrace in that area with a patch series I'm
>> working on, but didn't have enough time to track it down yet. I'll
>> try if this patch fixes it.
>
> The patch doesn't apply on amd-staging-drm-next. I made the following
> change instead, which fixes my problem (and I do see the pr_err being
> triggered):
>
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -157,6 +157,10 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>  			list_for_each_entry(bo, &man->lru[j], lru) {
>  				uint32_t num_pages = PFN_UP(bo->base.size);
>  
> +				if (!bo->resource) {
> +					pr_err("### bo->resource is NULL\n");
> +					continue;
> +				}

Yeah, that should be functionally identical.

Can I get an rb for that? Going to provide backports to older kernels as
well then.

Regards,
Christian.

>  				ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>  				/* ttm_bo_swapout has dropped the lru_lock */
>  				if (!ret)
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>  drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>>> index a0562ab386f5..e7147e304637 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>>> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>>>  
>>>  		ttm_resource_manager_for_each_res(man, &cursor, res) {
>>>  			struct ttm_buffer_object *bo = res->bo;
>>> -			uint32_t num_pages = PFN_UP(bo->base.size);
>>> +			uint32_t num_pages;
>>>  
>>> +			if (!bo)
>>> +				continue;
>>> +
>>> +			num_pages = PFN_UP(bo->base.size);
>>>  			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>>>  			/* ttm_bo_swapout has dropped the lru_lock */
>>>  			if (!ret)
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
From: Felix Kuehling @ 2022-06-06 13:15 UTC
  To: Christian König, mike, dri-devel, amd-gfx list; +Cc: Christian König

On 2022-06-03 at 06:46, Christian König wrote:
> Resources about to be destructed are not tied to BOs any more.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

> ---
>  drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index a0562ab386f5..e7147e304637 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>  
>  		ttm_resource_manager_for_each_res(man, &cursor, res) {
>  			struct ttm_buffer_object *bo = res->bo;
> -			uint32_t num_pages = PFN_UP(bo->base.size);
> +			uint32_t num_pages;
>  
> +			if (!bo)
> +				continue;
> +
> +			num_pages = PFN_UP(bo->base.size);
>  			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>  			/* ttm_bo_swapout has dropped the lru_lock */
>  			if (!ret)
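The original patch quoted above does more than add a check: it moves the `PFN_UP(bo->base.size)` computation out of the declaration so that `bo` is never dereferenced before it has been tested. A minimal userspace sketch of that ordering follows. The names `demo_bo`, `demo_res`, `pages_attached`, and the `DEMO_*` macros are hypothetical and exist only for illustration; a 4 KiB page size is assumed, and `DEMO_PFN_UP` mirrors the round-up-to-pages arithmetic of the kernel's `PFN_UP()` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE  ((uint64_t)1 << DEMO_PAGE_SHIFT)
/* Round a byte count up to whole pages, as the kernel's PFN_UP() does. */
#define DEMO_PFN_UP(x)  (((uint64_t)(x) + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT)

/* Hypothetical stand-ins; not the real TTM structs. */
struct demo_bo {
	uint64_t size; /* buffer size in bytes */
};

struct demo_res {
	struct demo_bo *bo; /* NULL when the resource is about to be destroyed */
};

/*
 * Sum the page counts of all resources that are still tied to a buffer.
 * num_pages is declared without an initializer and only computed after
 * the NULL check, so a detached resource never triggers a dereference.
 */
static uint64_t pages_attached(const struct demo_res *res, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(res != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct demo_bo *bo = res[i].bo;
		uint32_t num_pages;

		if (!bo)
			continue;

		num_pages = (uint32_t)DEMO_PFN_UP(bo->size);
		total += num_pages;
	}
	return total;
}
```

With the initializer left in place (`uint32_t num_pages = PFN_UP(bo->base.size);`), the dereference would happen before the check could run, which is why the patch splits the declaration from the assignment.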
Thread overview: 5+ messages (newest: 2022-07-15 6:46 UTC)
2022-07-14 21:58 [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout Felix Kuehling
2022-07-15 6:46 ` Christian König
[not found] <20220603104604.456991-1-christian.koenig@amd.com>
[not found] ` <c9b23cac-6bf0-e8ad-d6b1-f59c1ee1569f@amd.com>
2022-06-03 22:44 ` Felix Kuehling
2022-06-06 10:20 ` Christian König
2022-06-06 13:15 ` Felix Kuehling