Intel-GFX Archive on lore.kernel.org
From: "Wysocki, Rafael J" <rafael.j.wysocki@intel.com>
To: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"Kurmi, Suresh Kumar" <suresh.kumar.kurmi@intel.com>
Subject: Re: [Intel-gfx] Regression in linux-next
Date: Wed, 11 Oct 2023 18:14:03 +0200
Message-ID: <ddb6d76b-829e-5c81-5459-61774ee79b1a@intel.com>
In-Reply-To: <SJ1PR11MB6129F78B98D448AF1D05EC07B9CCA@SJ1PR11MB6129.namprd11.prod.outlook.com>

Hi,

On 10/11/2023 6:00 AM, Borah, Chaitanya Kumar wrote:
> Hello Rafael,
>
>> -----Original Message-----
>> From: Wysocki, Rafael J <rafael.j.wysocki@intel.com>
>> Sent: Tuesday, October 10, 2023 12:54 AM
>> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
>> Cc: intel-gfx@lists.freedesktop.org; Kurmi, Suresh Kumar
>> <suresh.kumar.kurmi@intel.com>; Saarinen, Jani <jani.saarinen@intel.com>
>> Subject: Re: Regression in linux-next
>>
>> Hi,
>>
>> On 10/9/2023 7:10 AM, Borah, Chaitanya Kumar wrote:
>>> Hello Rafael
>>>
>>>> Thanks for the report, I think that this is a lockdep assertion failing.
>>>> If that is correct, it should be straightforward to fix.
>>>> I'll take care of this early next week.
>>>> Thanks!
>>> Thank you for your response.  Please let us know when a fix is available.
>> It should be fixed in linux-next from today, by this commit:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=b44444027ce7714f309e96b804b7fb088a40d708
>>
>> Thanks!
> Thanks a lot for the fix. It seems to have fixed the issue on most of the machines, but we are still seeing a similar problem on a few of them.

Thanks for reporting this!


> This has a different call stack but seems to be from the same thermal subsystem. Full logs in [1]
>
> <4>[    4.392015] WARNING: CPU: 1 PID: 306 at drivers/thermal/thermal_trip.c:178 thermal_zone_trip_id+0x61/0x70
> <4>[    4.392022] Modules linked in: x86_pkg_temp_thermal coretemp kvm_intel mei_pxp mei_hdcp wmi_bmof kvm e1000e irqbypass crct10dif_pclmul video ptp crc32_pclmul ghash_clmulni_intel i2c_i801 mei_me pps_core mei i2c_smbus wmi
> <4>[    4.392057] CPU: 1 PID: 306 Comm: thermald Not tainted 6.6.0-rc5-next-20231010-next-20231010-gc0a6edb636cb+ #1
> <4>[    4.392061] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
> <4>[    4.392063] RIP: 0010:thermal_zone_trip_id+0x61/0x70
> <4>[    4.392066] Code: 74 0c 83 c0 01 39 c8 75 f0 b8 c3 ff ff ff 5b 5d c3 cc cc cc cc 48 8d bf f0 05 00 00 be ff ff ff ff e8 63 a4 2d 00 85 c0 75 b5 <0f> 0b eb b1 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90
> <4>[    4.392069] RSP: 0018:ffffc9000156bda8 EFLAGS: 00010246
> <4>[    4.392073] RAX: 0000000000000000 RBX: ffff888103828ae8 RCX: 0000000000000001
> <4>[    4.392075] RDX: 0000000080000000 RSI: ffffffff823de5ab RDI: ffffffff823fdfba
> <4>[    4.392078] RBP: ffff888103a88800 R08: ffff888103828ae8 R09: 0000000000000001
> <4>[    4.392080] R10: 0000000000000001 R11: ffff88811494d3c0 R12: ffff888103a88818
> <4>[    4.392082] R13: ffff8881108bfa00 R14: ffff888103794408 R15: 0000000000000001
> <4>[    4.392084] FS:  00007f1f0d6d28c0(0000) GS:ffff88822e680000(0000) knlGS:0000000000000000
> <4>[    4.392087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[    4.392089] CR2: 000055857c50b750 CR3: 0000000111efa005 CR4: 00000000003706f0
> <4>[    4.392091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[    4.392093] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> <4>[    4.392095] Call Trace:
> <4>[    4.392097]  <TASK>
> <4>[    4.392100]  ? __warn+0x7f/0x170
> <4>[    4.392104]  ? thermal_zone_trip_id+0x61/0x70
> <4>[    4.392109]  ? report_bug+0x1f8/0x200
> <4>[    4.392116]  ? handle_bug+0x3c/0x70
> <4>[    4.392119]  ? exc_invalid_op+0x18/0x70
> <4>[    4.392123]  ? asm_exc_invalid_op+0x1a/0x20
> <4>[    4.392133]  ? thermal_zone_trip_id+0x61/0x70
> <4>[    4.392137]  ? thermal_zone_trip_id+0x5d/0x70
> <4>[    4.392141]  trip_point_show+0x18/0x40
> <4>[    4.392145]  dev_attr_show+0x15/0x60
> <4>[    4.392149]  sysfs_kf_seq_show+0xb5/0x100
> <4>[    4.392154]  seq_read_iter+0x111/0x450
> <4>[    4.392158]  ? check_object+0x133/0x320
> <4>[    4.392164]  vfs_read+0x20d/0x300
> <4>[    4.392175]  ksys_read+0x64/0xe0
> <4>[    4.392180]  do_syscall_64+0x3c/0x90
> <4>[    4.392183]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> <4>[    4.392187] RIP: 0033:0x7f1f0e193392
>
> Can you please check what could be the reason for this issue?

Well, one more unhelpful lockdep assertion has been added recently to the
thermal core, sorry about that.
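
For readers unfamiliar with this kind of check, here is a rough userspace model of a lock-held assertion, written in Python rather than kernel C. It is only a sketch: the Zone class, assert_held() and trip_id() are hypothetical names invented for illustration and are not the thermal core's code. It shows the general pattern the trace is consistent with: a helper asserts that its caller holds a lock, and a caller that arrives without the lock gets a warning while the call itself still completes.

```python
import threading
import warnings


class Zone:
    """Toy stand-in for a lock-protected structure (hypothetical, not kernel code)."""

    def __init__(self, trips):
        self.lock = threading.Lock()
        self._owner = None  # ident of the thread currently holding the lock
        self.trips = trips

    def __enter__(self):
        self.lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, *exc):
        self._owner = None
        self.lock.release()
        return False

    def assert_held(self):
        # Warn instead of failing when the calling thread does not hold the lock.
        if self._owner != threading.get_ident():
            warnings.warn("lock not held by caller", RuntimeWarning, stacklevel=2)

    def trip_id(self, trip):
        # The helper still returns a result even when the assertion warns.
        self.assert_held()
        return self.trips.index(trip)
```

Calling trip_id() inside a "with zone:" block stays silent, while calling it directly emits a RuntimeWarning and still returns the index. An assertion like this is only useful if every caller is really expected to hold the lock.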

This commit

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=108ffd12be24ba1d74b3314df8db32a0a6d55ba5

that will be merged into linux-next tomorrow if all goes well, should 
address this.

Thanks!
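
When comparing reports like the two in this thread, it can help to pull the source location and symbol out of the WARNING line mechanically. The following is a minimal sketch, assuming the line format seen in the quoted logs; the names WARNING_RE and parse_warning are made up for this example and are not part of any existing tool.

```python
import re

# Matches lines such as:
#   WARNING: CPU: 1 PID: 306 at drivers/thermal/thermal_trip.c:178 thermal_zone_trip_id+0x61/0x70
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_warning(log_line):
    """Return the fields of a kernel WARNING line as a dict, or None if no match."""
    match = WARNING_RE.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "cpu": int(fields["cpu"]),
        "pid": int(fields["pid"]),
        "file": fields["file"],
        "line": int(fields["line"]),
        "func": fields["func"],
        "offset": int(fields["offset"], 16),  # offset into the function
        "size": int(fields["size"], 16),      # total size of the function
    }


sample = (
    "<4>[    4.392015] WARNING: CPU: 1 PID: 306 at "
    "drivers/thermal/thermal_trip.c:178 thermal_zone_trip_id+0x61/0x70"
)
info = parse_warning(sample)
```

Grouping reports by the (file, line, func) triple is one way to tell whether a warning seen after a fix is the same assertion as before or a different one, as was the case here.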


> [1] https://intel-gfx-ci.01.org/tree/linux-next/next-20231010/fi-kbl-guc/boot0.txt
>
> Regards
>
> Chaitanya
>
>
>
>
>>
>>> From: Wysocki, Rafael J <rafael.j.wysocki@intel.com>
>>> Sent: Saturday, October 7, 2023 2:01 AM
>>> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
>>> Cc: intel-gfx@lists.freedesktop.org; Kurmi, Suresh Kumar
>>> <suresh.kumar.kurmi@intel.com>; Saarinen, Jani
>>> <jani.saarinen@intel.com>
>>> Subject: Re: Regression in linux-next
>>>
>>> Hi,
>>> On 10/5/2023 5:58 PM, Borah, Chaitanya Kumar wrote:
>>> Hello Rafael,
>>>
>>> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>>> This mail is regarding a regression we are seeing in our CI runs[1] on linux-next repository.
>>> Thanks for the report, I think that this is a lockdep assertion failing.
>>> If that is correct, it should be straightforward to fix.
>>> I'll take care of this early next week.
>>> Thanks!
>>>
>>> On next-20231003 [2], we are seeing the following error
>>>
>>> ``````````````````````````````````````````````````````````````````````
>>> <4>[   14.093075] ------------[ cut here ]------------
>>> <4>[   14.097664] WARNING: CPU: 0 PID: 1 at drivers/thermal/thermal_trip.c:18 for_each_thermal_trip+0x83/0x90
>>> <4>[   14.106977] Modules linked in:
>>> <4>[   14.110017] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W 6.6.0-rc4-next-20231003-next-20231003-gc9f2baaa18b5+ #1
>>> <4>[   14.121305] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPFWI1.R00.3323.D89.2309110529 09/11/2023
>>> <4>[   14.134478] RIP: 0010:for_each_thermal_trip+0x83/0x90
>>> <4>[   14.139496] Code: 5c 41 5d c3 cc cc cc cc 5b 31 c0 5d 41 5c 41 5d c3 cc cc cc cc 48 8d bf f0 05 00 00 be ff ff ff ff e8 21 a2 2d 00 85 c0 75 9a <0f> 0b eb 96 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
>>>
>>> Detailed logs can be found in [3].
>>>
>>> After bisecting the tree, the following patch [4] seems to be causing the regression.
>>> commit d5ea889246b112e228433a5f27f57af90ca0c1fb
>>> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> Date:   Thu Sep 21 20:02:59 2023 +0200
>>>
>>>       ACPI: thermal: Do not use trip indices for cooling device binding
>>>
>>>       Rearrange the ACPI thermal driver's callback functions used for cooling
>>>       device binding and unbinding, acpi_thermal_bind_cooling_device() and
>>>       acpi_thermal_unbind_cooling_device(), respectively, so that they use trip
>>>       pointers instead of trip indices which is more straightforward and allows
>>>       the driver to become independent of the ordering of trips in the thermal
>>>       zone structure.
>>>
>>>       The general functionality is not expected to be changed.
>>>
>>>       Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>       Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>
>>> We also verified by moving the head of the tree to the previous commit.
>>>
>>> Could you please check why this patch causes the regression and if we can find a solution for it soon?
>>>
>>> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231003
>>> [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20231003/bat-mtlp-6/boot0.txt
>>> [4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231003&id=d5ea889246b112e228433a5f27f57af90ca0c1fb

