Re: [Intel-gfx] LOOKS GOOD: Possible regression in drm/i915 driver: memleak

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>,
	srinivas pandruvada <srinivas.pandruvada@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	jani.nikula@linux.intel.com, joonas.lahtinen@linux.intel.com,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org,
	Thorsten Leemhuis <regressions@leemhuis.info>
Subject: Re: [Intel-gfx] LOOKS GOOD: Possible regression in drm/i915 driver: memleak
Date: Thu, 22 Dec 2022 08:04:28 +0000
Message-ID: <96661293-32d7-0bb4-fb0e-28086eaaecc3@linux.intel.com>
In-Reply-To: <8e080674-36ab-9260-046e-f4e3c931a3b9@alu.unizg.hr>


On 22/12/2022 00:12, Mirsad Goran Todorovac wrote:
> On 20. 12. 2022. 20:34, Mirsad Todorovac wrote:
>> On 12/20/22 16:52, Tvrtko Ursulin wrote:
>>
>>> On 20/12/2022 15:22, srinivas pandruvada wrote:
>>>> +Added DRM mailing list and maintainers
>>>>
>>>> On Tue, 2022-12-20 at 15:33 +0100, Mirsad Todorovac wrote:
>>>>> Hi all,
>>>>>
>>>>> I have been unable to find any particular Intel i915 maintainer
>>>>> emails, so my best bet is to post here, as you will most assuredly
>>>>> already know them.
>>>
>>> For future reference you can use 
>>> ${kernel_dir}/scripts/get_maintainer.pl -f ...
>>>
>>>>> The problem is a kernel memory leak that occurs repeatedly,
>>>>> triggered by running the Chrome browser under this morning's
>>>>> latest 6.1.0+ kernel and Almalinux 8.6 on a Lenovo desktop box
>>>>> with an Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz.
>>>>>
>>>>> The build has KMEMLEAK, KASAN and MGLRU turned on,
>>>>> on a vanilla mainline kernel from Mr. Torvalds' tree.
>>>>>
>>>>> The leaks look like this one:
>>>>>
>>>>> unreferenced object 0xffff888131754880 (size 64):
>>>>>     comm "chrome", pid 13058, jiffies 4298568878 (age 3708.084s)
>>>>>     hex dump (first 32 bytes):
>>>>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>       00 00 00 00 00 00 00 00 00 80 1e 3e 83 88 ff ff  ...........>....
>>>>>     backtrace:
>>>>>       [<ffffffff9e9b5542>] slab_post_alloc_hook+0xb2/0x340
>>>>>       [<ffffffff9e9bbf5f>] __kmem_cache_alloc_node+0x1bf/0x2c0
>>>>>       [<ffffffff9e8f767a>] kmalloc_trace+0x2a/0xb0
>>>>>       [<ffffffffc08dfde5>] drm_vma_node_allow+0x45/0x150 [drm]
>>>>>       [<ffffffffc0b33315>] __assign_mmap_offset_handle+0x615/0x820 [i915]
>>>>>       [<ffffffffc0b34057>] i915_gem_mmap_offset_ioctl+0x77/0x110 [i915]
>>>>>       [<ffffffffc08bc5e1>] drm_ioctl_kernel+0x181/0x280 [drm]
>>>>>       [<ffffffffc08bc9cd>] drm_ioctl+0x2dd/0x6a0 [drm]
>>>>>       [<ffffffff9ea54744>] __x64_sys_ioctl+0xc4/0x100
>>>>>       [<ffffffff9fbc0178>] do_syscall_64+0x58/0x80
>>>>>       [<ffffffff9fc000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>>
>>>>> The complete list of leaks is in the attachment, but they all seem
>>>>> similar or the same.
>>>>>
>>>>> Please find attached lshw and kernel build config file.
>>>>>
>>>>> I will probably check the same parameters on my laptop at home, which
>>>>> is also a Lenovo, but with a different hw config and Ubuntu 22.10.
>>>
>>> Could you try the below patch?
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index c3ea243d414d..0b07534c203a 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -679,9 +679,10 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>>>   insert:
>>>          mmo = insert_mmo(obj, mmo);
>>>          GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
>>> -out:
>>> +
>>>          if (file)
>>>                  drm_vma_node_allow(&mmo->vma_node, file);
>>> +out:
>>>          return mmo;
>>>
>>>   err:
>>>
>>> Maybe it is not the best fix, but I am curious to know if it will make
>>> the leak go away.
>>
>> Hi,
>>
>> After 27 minutes uptime with the patched kernel it looks promising.
>> It is much longer than it took for the buggy kernel to leak slabs.
>>
>> Here is the output:
>>
>> [root@pc-mtodorov marvin]# echo scan > /sys/kernel/debug/kmemleak
>> [root@pc-mtodorov marvin]# cat !$
>> cat /sys/kernel/debug/kmemleak
>> unreferenced object 0xffff888105028d80 (size 16):
>>    comm "kworker/u12:5", pid 359, jiffies 4294902898 (age 1620.144s)
>>    hex dump (first 16 bytes):
>>      6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00  memstick0.......
>>    backtrace:
>>      [<ffffffffb6bb5542>] slab_post_alloc_hook+0xb2/0x340
>>      [<ffffffffb6bbbf5f>] __kmem_cache_alloc_node+0x1bf/0x2c0
>>      [<ffffffffb6af8175>] __kmalloc_node_track_caller+0x55/0x160
>>      [<ffffffffb6ae34a6>] kstrdup+0x36/0x60
>>      [<ffffffffb6ae3508>] kstrdup_const+0x28/0x30
>>      [<ffffffffb70d0757>] kvasprintf_const+0x97/0xd0
>>      [<ffffffffb7c9cdf4>] kobject_set_name_vargs+0x34/0xc0
>>      [<ffffffffb750289b>] dev_set_name+0x9b/0xd0
>>      [<ffffffffc12d9201>] memstick_check+0x181/0x639 [memstick]
>>      [<ffffffffb676e1d6>] process_one_work+0x4e6/0x7e0
>>      [<ffffffffb676e556>] worker_thread+0x76/0x770
>>      [<ffffffffb677b468>] kthread+0x168/0x1a0
>>      [<ffffffffb6604c99>] ret_from_fork+0x29/0x50
>> [root@pc-mtodorov marvin]# w
>>   20:27:35 up 27 min,  2 users,  load average: 0.83, 1.15, 1.19
>> USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
>> marvin   tty2     tty2             20:01   27:10  10:12   2.09s /opt/google/chrome/chrome --type=utility --utility-sub-type=audio.m
>> marvin   pts/1    -                20:01    0.00s  2:00   0.38s sudo bash
>> [root@pc-mtodorov marvin]# uname -rms
>> Linux 6.1.0-b6bb9676f216-mglru-kmemlk-kasan+ x86_64
>> [root@pc-mtodorov marvin]#
> 
> As I have heard no reply from Tvrtko, here is an update: there is already
> 1d5h uptime with no i915 leaks. (The kworker leak in memstick_check
> remains; I couldn't bisect it on the only box that reproduced it, because
> something in the hw was not supported in pre-4.16 kernels on the
> Lenovo V530S-07ICB. Or I am doing something wrong.)
> 
> However, now I can find the memstick maintainers thanks to Tvrtko's hint.
> 
> If you no longer require my service, I would consider this closed on my side.
> 
> I hope I did not cause too much trouble. The knowledgeable knew that
> this was not a security risk, but only a bug. (30 leaks of 64 bytes each
> would hardly exhaust memory in any realistic time.)
> 
> However, having some experience with software development, I have always
> preferred bugs reported and fixed rather than concealed and lying in wait
> (or worse, found first by a motivated adversary). Forgive me this rant; I
> do not make a living writing kernel drivers, this is just a pet project
> for the time being ...

It is not forgotten - I was trying to reach out to the original author 
of the fixlet which worked for you. If that fails I will take it upon 
myself, but I need to set aside some time to get into the exact problem 
space before I can vouch for the fix and send it on my own.

In the meantime definitely thanks a lot for testing this quickly and 
reporting back!

What will happen next is that when either the original author or I am 
ready to send out the fix as a proper patch, you will be copied on it 
via the "Reported-by" and possibly "Tested-by" tags. The latter applies 
if the patch remains identical. If it changes we might kindly ask you to 
re-test if possible.

Regards,

Tvrtko
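
For readers following the patch quoted above, a minimal userspace model may
help show why one extra drm_vma_node_allow() call per ioctl could produce the
64-byte leaks in the report. It is only a sketch under stated assumptions:
that drm_vma_node_allow() is reference-counted per file, that the matching
revoke happens once when the handle is closed, and that the unpatched `out:`
label made every repeated lookup call allow again. All `model_*` names are
hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; this is NOT the kernel's drm_vma_manager code. */
struct model_entry {
	const void *tag;           /* stands in for the struct drm_file pointer */
	unsigned long count;       /* how many times access was granted */
	struct model_entry *next;
};

struct model_node {
	struct model_entry *entries;
};

static size_t live_entries;        /* entries currently allocated */

/* Grant access: the first call for a tag allocates, later calls only count up. */
static int model_allow(struct model_node *node, const void *tag)
{
	struct model_entry *e;

	for (e = node->entries; e; e = e->next) {
		if (e->tag == tag) {
			e->count++;
			return 0;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->tag = tag;
	e->count = 1;
	e->next = node->entries;
	node->entries = e;
	live_entries++;
	return 0;
}

/* Revoke access once: the entry is freed only when its count reaches zero. */
static void model_revoke(struct model_node *node, const void *tag)
{
	struct model_entry **pp;

	for (pp = &node->entries; *pp; pp = &(*pp)->next) {
		struct model_entry *e = *pp;

		if (e->tag != tag)
			continue;
		assert(e->count > 0);
		if (--e->count == 0) {
			*pp = e->next;
			free(e);
			live_entries--;
		}
		return;
	}
}

/*
 * Simulate `lookups` offset requests on one object from one file, followed by
 * a single revoke at close. Returns how many entries stay allocated; in the
 * leaking case the model deliberately leaves them unfreed.
 */
static size_t model_leaked_after(int lookups, int allow_on_every_lookup)
{
	struct model_node node = { NULL };
	size_t before = live_entries;
	int file_token;            /* its address serves as the unique tag */
	int i;

	if (model_allow(&node, &file_token))   /* first attach always allows */
		return 0;
	for (i = 1; i < lookups; i++) {
		if (allow_on_every_lookup)
			model_allow(&node, &file_token);
	}
	model_revoke(&node, &file_token);      /* one revoke at close */

	return live_entries - before;
}
```

In this model, model_leaked_after(3, 1) leaves one entry behind while
model_leaked_after(3, 0) leaves none, which matches the shape of the report:
one small object per affected mapping. Whether skipping the allow on the
lookup path is right for a second file mapping the same object is exactly
the kind of question that still needs checking before the fix is sent.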
