* [git pull] drm fixes for 6.13-rc1
From: Dave Airlie @ 2024-11-28 20:42 UTC
To: Linus Torvalds, Sima Vetter; +Cc: dri-devel, LKML
Hi Linus,
Merge window fixes, mostly amdgpu and xe, with a few other minor ones,
all looks fairly normal,
Dave.
drm-next-2024-11-29:
drm fixes for v6.13-rc1
i915:
- hdcp: Fix when the first read and write are retried
xe:
- Wake up waiters after wait condition set to true
- Mark the preempt fence workqueue as reclaim
- Update xe2 graphics name string
- Fix a couple of guc submit races
- Fix pat index usage in migrate
- Ensure non-cached migrate pagetable bo mappings
- Take a PM ref in the delayed snapshot capture worker
amdgpu:
- SMU 13.0.6 fixes
- XGMI fixes
- SMU 13.0.7 fixes
- Misc code cleanups
- Plane refcount fixes
- DCN 4.0.1 fixes
- DC power fixes
- DTO fixes
- NBIO 7.11 fixes
- SMU 14.0.x fixes
- Reset fixes
- Enable DC on LoongArch
- Sysfs hotplug warning fix
- Misc small fixes
- VCN 4.0.3 fix
- Slab usage fix
- Jpeg delayed work fix
amdkfd:
- wptr handling fixes
radeon:
- Use ttm_bo_move_null()
- Constify struct pci_device_id
- Fix spurious hotplug
- HPD fix
rockchip:
- fix 32-bit build
The following changes since commit a163b895077861598be48c1cf7f4a88413c28b22:
Merge tag 'drm-xe-next-fixes-2024-11-15' of
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next (2024-11-18
13:38:46 +1000)
are available in the Git repository at:
https://gitlab.freedesktop.org/drm/kernel.git tags/drm-next-2024-11-29
for you to fetch changes up to 9794b89c50f7fc972c6b4ddc69693c9f9d1ae7d7:
Merge tag 'drm-xe-next-fixes-2024-11-28' of
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next (2024-11-29
04:59:28 +1000)
----------------------------------------------------------------
Alex Deucher (3):
drm/amdgpu/gmc7: fix wait_for_idle callers
drm/amdgpu/jpeg: cancel the jpeg worker
Revert "drm/radeon: Delay Connector detecting when HPD singals
is unstable"
Aric Cyr (1):
drm/amd/display: 3.2.310
Arnd Bergmann (1):
drm/rockchip: avoid 64-bit division
Asad Kamal (3):
drm/amd/pm: Update data types used for uapi i/f
drm/amd/pm: Add gpu_metrics_v1_7
drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0
Austin Zheng (1):
drm/amd/display: Populate Power Profile In Case of Early Return
Bhavin Sharma (2):
drm/amd/pm: remove redundant tools_size check
drm/amd/display: remove redundant is_dsc_possible check
Chris Park (1):
drm/amd/display: Ignore scalar validation failure if pipe is phantom
Christophe JAILLET (1):
drm/radeon: Constify struct pci_device_id
Dave Airlie (5):
Merge tag 'drm-intel-next-fixes-2024-11-21' of
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Merge tag 'drm-xe-next-fixes-2024-11-21' of
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Merge tag 'amd-drm-fixes-6.13-2024-11-22' of
https://gitlab.freedesktop.org/agd5f/linux into drm-next
Merge tag 'drm-misc-next-fixes-2024-11-28' of
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Merge tag 'drm-xe-next-fixes-2024-11-28' of
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Dillon Varone (1):
drm/amd/display: Enable Request rate limiter during C-State on dcn401
Huacai Chen (2):
drm/radeon: Use ttm_bo_move_null() in radeon_bo_move()
drm/amd/display: Allow building DC with clang on LoongArch
Jesse.zhang@amd.com (2):
drm/amdgpu: Add sysfs interface for vcn reset mask
drm/amdgpu: Fix sysfs warning when hotplugging
Joshua Aberback (1):
drm/amd/display: Fix handling of plane refcount
Kenneth Feng (3):
drm/amdgpu/pm: add gen5 display to the user on smu v14.0.2/3
drm/amd/pm: disable pcie speed switching on Intel platform for
smu v14.0.2/3
drm/amd/pm: skip setting the power source on smu v14.0.2/3
Lijo Lazar (4):
drm/amdgpu: Add init level for post reset reinit
drm/amdgpu: Use reset recovery state checks
drm/amdkfd: Use the correct wptr size
drm/amd/pm: Remove arcturus min power limit
Mario Limonciello (2):
drm/amd: Add some missing straps from NBIO 7.11.0
drm/amd: Fix initialization mistake for NBIO 7.11 devices
Matt Roper (1):
drm/xe: Update xe2_graphics name string
Matthew Auld (4):
drm/xe/guc_submit: fix race around pending_disable
drm/xe/guc_submit: fix race around suspend_pending
drm/xe/migrate: fix pat index usage
drm/xe/migrate: use XE_BO_FLAG_PAGETABLE
Matthew Brost (2):
drm/xe: Mark preempt fence workqueue as reclaim
drm/xe: Take PM ref in delayed snapshot capture worker
Nirmoy Das (1):
drm/xe/ufence: Wake up waiters after setting ufence->signalled
Ovidiu Bunea (1):
drm/amd/display: Remove PIPE_DTO_SRC_SEL programming from set_dtbclk_dto
Samson Tam (2):
drm/amd/display: add public taps API in SPL
drm/amd/display: allow chroma 1:1 scaling when sharpness is off
Steven 'Steve' Kendall (1):
drm/radeon: Fix spurious unplug event on radeon HDMI
Suraj Kandpal (1):
drm/i915/hdcp: Fix when the first read and write are retried
Umio Yasuno (1):
drm/amd/pm: update current_socclk and current_uclk in
gpu_metrics on smu v13.0.7
Victor Zhao (1):
drm/amdkfd: make sure ring buffer is flushed before update wptr
Vitaly Prosyak (1):
drm/amdgpu: fix usage slab after free
Xiang Liu (1):
drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3
Yihan Zhu (1):
drm/amd/display: update pipe selection policy to check head pipe
Zicheng Qu (2):
drm/amd/display: Fix null check for pipe_ctx->plane_state in
dcn20_program_pipe
drm/amd/display: Fix null check for pipe_ctx->plane_state in
hwss_setup_dpp
drivers/gpu/drm/amd/amdgpu/aldebaran.c | 4 +
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 ++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 37 +++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 4 +
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 41 +++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 2 +
drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 4 +-
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 18 +++-
drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 2 +-
drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 2 +-
drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c | 9 ++
drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 2 +
drivers/gpu/drm/amd/amdgpu/smu_v13_0_10.c | 2 +
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 9 ++
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 39 +++++--
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 10 ++
drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 7 +-
drivers/gpu/drm/amd/display/Kconfig | 15 +--
drivers/gpu/drm/amd/display/dc/core/dc.c | 7 +-
.../gpu/drm/amd/display/dc/core/dc_hw_sequencer.c | 3 +
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 8 ++
drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
.../gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c | 15 +--
.../dml21/src/dml2_core/dml2_core_dcn4_calcs.c | 6 ++
.../amd/display/dc/dml2/dml2_dc_resource_mgmt.c | 23 +++-
drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 13 +--
.../drm/amd/display/dc/hubbub/dcn10/dcn10_hubbub.h | 8 +-
.../drm/amd/display/dc/hubbub/dcn20/dcn20_hubbub.h | 1 +
.../amd/display/dc/hubbub/dcn401/dcn401_hubbub.c | 24 ++++-
.../amd/display/dc/hubbub/dcn401/dcn401_hubbub.h | 7 +-
.../drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c | 6 +-
.../drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c | 13 ++-
drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h | 2 +-
.../display/dc/resource/dcn401/dcn401_resource.h | 3 +-
drivers/gpu/drm/amd/display/dc/spl/dc_spl.c | 97 +++++++++++------
drivers/gpu/drm/amd/display/dc/spl/dc_spl.h | 2 +
.../amd/include/asic_reg/nbio/nbio_7_11_0_offset.h | 2 +
.../include/asic_reg/nbio/nbio_7_11_0_sh_mask.h | 13 +++
drivers/gpu/drm/amd/include/kgd_pp_interface.h | 118 ++++++++++++++++++++-
.../drm/amd/pm/powerplay/smumgr/vega12_smumgr.c | 24 ++---
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 8 +-
drivers/gpu/drm/amd/pm/swsmu/inc/smu_v14_0.h | 2 +-
drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 6 +-
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 12 ++-
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 2 +
drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c | 2 +-
.../gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c | 37 +++++--
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 3 +
drivers/gpu/drm/i915/display/intel_hdcp.c | 32 +++---
drivers/gpu/drm/radeon/radeon_audio.c | 12 ++-
drivers/gpu/drm/radeon/radeon_connectors.c | 10 --
drivers/gpu/drm/radeon/radeon_drv.c | 3 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 3 +-
drivers/gpu/drm/rockchip/dw_hdmi_qp-rockchip.c | 2 +-
drivers/gpu/drm/xe/xe_devcoredump.c | 6 ++
drivers/gpu/drm/xe/xe_device.c | 3 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 34 ++++--
drivers/gpu/drm/xe/xe_migrate.c | 6 +-
drivers/gpu/drm/xe/xe_pci.c | 2 +-
drivers/gpu/drm/xe/xe_sync.c | 6 +-
75 files changed, 692 insertions(+), 195 deletions(-)
* Re: [git pull] drm fixes for 6.13-rc1
From: Sasha Levin @ 2024-11-29 5:11 UTC
To: Dave Airlie
Cc: Linus Torvalds, Sima Vetter, dri-devel, LKML, alexander.deucher
[+cc Alex]
On Fri, Nov 29, 2024 at 06:42:18AM +1000, Dave Airlie wrote:
>Alex Deucher (3):
> drm/amdgpu/jpeg: cancel the jpeg worker
Hi folks,
When merging this PR into linus-next I've started seeing the following
warning, triggered by the commit above:
[ 4.356975] WARNING: CPU: 1 PID: 1 at kernel/workqueue.c:4192 __flush_work+0x29f/0x2c0
[ 4.364893] Modules linked in:
[ 4.367954] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0 #1 2142668e2e0d420da59d54376b9563eacab27615
[ 4.378035] Hardware name: HP Berknip/Berknip, BIOS Google_Berknip.13434.796.2023_03_03_1148 09/12/2022
[ 4.387422] RIP: 0010:__flush_work+0x29f/0x2c0
[ 4.391870] Code: 48 8b 15 24 53 70 65 48 89 54 24 58 48 8b 73 40 8b 4b 30 e9 b6 fe ff ff 40 30 f6 4c 8b 26 e9 f2 fd ff ff 0f 0b e9 33 ff ff ff <0f> 0b e9 2c ff ff ff 0f 0b e9 d2 fe ff ff e8 8e 62 26 01 66 66 2e
[ 4.410618] RSP: 0018:ffffa38540037b60 EFLAGS: 00010246
[ 4.415846] RAX: 0000000000000000 RBX: ffff974c87b3bd08 RCX: 0000000000000000
[ 4.422977] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffa38540037bc0
[ 4.430108] RBP: ffff974c87b3bd08 R08: ffff974c87b31fa0 R09: ffff974c818ad000
[ 4.437238] R10: 0000000000000040 R11: ffffffff9df2fd88 R12: ffff974c87b00000
[ 4.444371] R13: ffff974c87b34fe0 R14: ffffa38540037b68 R15: 0000000000000001
[ 4.451503] FS: 0000000000000000(0000) GS:ffff974da7080000(0000) knlGS:0000000000000000
[ 4.459587] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.465333] CR2: 0000000000000000 CR3: 0000000162c28000 CR4: 00000000003506f0
[ 4.472463] Call Trace:
[ 4.474916] <TASK>
[ 4.477022] ? __warn+0x85/0x130
[ 4.480256] ? __flush_work+0x29f/0x2c0
[ 4.484098] ? report_bug+0x160/0x190
[ 4.487767] ? handle_bug+0x53/0x90
[ 4.491263] ? exc_invalid_op+0x13/0x60
[ 4.495102] ? asm_exc_invalid_op+0x16/0x20
[ 4.499291] ? __flush_work+0x29f/0x2c0
[ 4.503132] cancel_delayed_work_sync+0x6e/0x90
[ 4.507667] jpeg_v1_0_ring_begin_use+0x1d/0xb0
[ 4.512204] amdgpu_ring_alloc+0x3f/0x60
[ 4.516135] amdgpu_jpeg_dec_ring_test_ring+0x31/0x180
[ 4.521274] amdgpu_ring_test_helper+0x1c/0x90
[ 4.525721] amdgpu_device_init+0x205f/0x26f0
[ 4.530083] amdgpu_driver_load_kms+0x15/0x80
[ 4.534445] amdgpu_pci_probe+0x17e/0x4f0
[ 4.538458] pci_device_probe+0x98/0x120
[ 4.542387] really_probe+0xd1/0x2b0
[ 4.545969] ? __device_attach_driver+0xc0/0xc0
[ 4.550503] __driver_probe_device+0x73/0x120
[ 4.554862] driver_probe_device+0x1f/0x90
[ 4.558962] __driver_attach+0x84/0x130
[ 4.562804] bus_for_each_dev+0x84/0xd0
[ 4.566645] bus_add_driver+0xe4/0x210
[ 4.570400] driver_register+0x55/0x100
[ 4.574240] ? drm_sched_fence_slab_init+0x90/0x90
[ 4.579033] do_one_initcall+0x57/0x300
[ 4.582868] kernel_init_freeable+0x1be/0x300
[ 4.587231] ? rest_init+0xc0/0xc0
[ 4.590638] kernel_init+0x16/0x1b0
[ 4.594129] ret_from_fork+0x30/0x50
[ 4.597711] ? rest_init+0xc0/0xc0
[ 4.601116] ret_from_fork_asm+0x11/0x20
[ 4.605046] </TASK>
--
Thanks,
Sasha
* Re: [git pull] drm fixes for 6.13-rc1
From: Sasha Levin @ 2024-11-29 15:51 UTC
To: Dave Airlie; +Cc: Linus Torvalds, Sima Vetter, dri-devel, LKML
On Fri, Nov 29, 2024 at 06:42:18AM +1000, Dave Airlie wrote:
>Hi Linus,
>
>Merge window fixes, mostly amdgpu and xe, with a few other minor ones,
>all looks fairly normal,
Hi folks,
I've also started seeing the following warning after the merge into
linus-next:
[ 4.495349] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 4.501876] shift exponent 32 is too large for 32-bit type 'long unsigned int'
[ 4.509101] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0 #1
[ 4.510096] Hardware name: Google Shuboz/Shuboz, BIOS Google_Shuboz.13434.780.2022_10_13_1418 09/12/2022
[ 4.510096] Call Trace:
[ 4.510096] dump_stack_lvl+0x94/0xa4
[ 4.510096] dump_stack+0x12/0x18
[ 4.510096] __ubsan_handle_shift_out_of_bounds+0x156/0x320
[ 4.510096] amdgpu_vm_adjust_size.cold+0x64/0x6c
[ 4.510096] ? __lock_release.isra.0+0x5d/0x170
[ 4.510096] ? amdgpu_device_skip_hw_access.part.0+0x6a/0x70
[ 4.510096] ? gmc_v9_0_init_mem_ranges+0x14c/0x14c
[ 4.510096] gmc_v9_0_sw_init+0x436/0x7c0
[ 4.510096] ? nbio_v7_0_vcn_doorbell_range+0x74/0x74
[ 4.510096] ? gmc_v9_0_init_mem_ranges+0x14c/0x14c
[ 4.510096] amdgpu_device_ip_init+0xd4/0xa74
[ 4.510096] amdgpu_device_init+0xc4a/0x1458
[ 4.510096] amdgpu_driver_load_kms+0x19/0x9c
[ 4.510096] amdgpu_pci_probe+0x153/0x570
[ 4.510096] ? _raw_spin_unlock_irqrestore+0x2f/0x58
[ 4.510096] pci_device_probe+0x8c/0x118
[ 4.510096] ? sysfs_create_link+0x1d/0x38
[ 4.510096] really_probe+0xc2/0x2ac
[ 4.510096] ? _raw_spin_unlock_irq+0x1d/0x38
[ 4.510096] ? pm_runtime_barrier+0x52/0x90
[ 4.510096] __driver_probe_device+0x7a/0x180
[ 4.510096] ? __driver_attach+0x8e/0x188
[ 4.510096] driver_probe_device+0x23/0x108
[ 4.510096] __driver_attach+0x97/0x188
[ 4.510096] ? __device_attach_driver+0x120/0x120
[ 4.510096] bus_for_each_dev+0x71/0xc0
[ 4.510096] driver_attach+0x19/0x20
[ 4.510096] ? __device_attach_driver+0x120/0x120
[ 4.510096] bus_add_driver+0xc9/0x208
[ 4.510096] driver_register+0x52/0x10c
[ 4.510096] ? drm_sched_fence_slab_init+0x80/0x80
[ 4.510096] __pci_register_driver+0x5f/0x68
[ 4.510096] amdgpu_init+0x62/0xb0
[ 4.510096] do_one_initcall+0x63/0x2a8
[ 4.510096] ? rdinit_setup+0x40/0x40
[ 4.510096] ? parse_args+0x14b/0x3f4
[ 4.510096] do_initcalls+0xbc/0x148
[ 4.510096] ? rdinit_setup+0x40/0x40
[ 4.510096] kernel_init_freeable+0x15b/0x1fc
[ 4.510096] ? kernel_init+0x18/0x1f4
[ 4.510096] ? rest_init+0x1cc/0x1cc
[ 4.510096] kernel_init+0x18/0x1f4
[ 4.510096] ? schedule_tail+0x50/0x60
[ 4.510096] ret_from_fork+0x38/0x44
[ 4.510096] ? rest_init+0x1cc/0x1cc
[ 4.510096] ret_from_fork_asm+0x12/0x18
[ 4.510096] entry_INT80_32+0x108/0x108
--
Thanks,
Sasha
* Re: [git pull] drm fixes for 6.13-rc1
From: Linus Torvalds @ 2024-11-29 21:31 UTC
To: Dave Airlie, Alex Deucher, Sasha Levin; +Cc: Sima Vetter, dri-devel, LKML
On Thu, 28 Nov 2024 at 12:42, Dave Airlie <airlied@gmail.com> wrote:
>
> Merge window fixes, mostly amdgpu and xe, with a few other minor ones,
> all looks fairly normal,
Hmm. I've pulled this, but do note the report by Sasha.
The
if (WARN_ON(!work->func))
return false;
from __flush_work() looks odd, and is fairly obviously triggered by
this one liner in commit 93df74873703 ("drm/amdgpu/jpeg: cancel the
jpeg worker")
- bool set_clocks = !cancel_delayed_work_sync(&adev->vcn.idle_work);
+ bool set_clocks = !cancel_delayed_work_sync(&adev->jpeg.idle_work);
where apparently the jpeg.idle_work isn't initialized at that point.
It looks like the initialization is done by amdgpu_jpeg_sw_init(), and
it looks like that cancel_delayed_work_sync() is just done too early.
But I don't know the code. Alex?
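The ordering problem described above can be illustrated with a small userspace sketch. It is not the kernel's workqueue code: every sketch_* name is hypothetical, and the only thing it models is the quoted check, where flushing a work item whose handler pointer is still NULL warns and bails out. It assumes the work item is zero-filled until its init routine runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a kernel work item. */
struct sketch_work {
	void (*func)(struct sketch_work *work);	/* NULL until initialised */
	bool pending;
};

static int sketch_warnings;	/* counts simulated WARN_ON() hits */

static void sketch_idle_handler(struct sketch_work *work)
{
	(void)work;		/* nothing to do in this sketch */
}

static void sketch_init_work(struct sketch_work *work,
			     void (*func)(struct sketch_work *work))
{
	assert(work != NULL);
	work->func = func;
	work->pending = false;
}

/* Models the quoted check: a work item without a handler cannot be flushed. */
static bool sketch_flush_work(struct sketch_work *work)
{
	assert(work != NULL);
	if (!work->func) {
		sketch_warnings++;
		fprintf(stderr, "WARNING: flush of uninitialised work\n");
		return false;
	}
	return true;
}

/* Returns true if the work was still pending when it was cancelled. */
static bool sketch_cancel_work_sync(struct sketch_work *work)
{
	bool was_pending = work->pending;

	work->pending = false;
	sketch_flush_work(work);
	return was_pending;
}

/* Runs one cancel, with or without the init step first; returns warnings seen. */
int sketch_count_warnings(bool init_first)
{
	struct sketch_work idle_work = { NULL, false };

	sketch_warnings = 0;
	if (init_first)
		sketch_init_work(&idle_work, sketch_idle_handler);
	(void)sketch_cancel_work_sync(&idle_work);
	return sketch_warnings;
}
```

Calling sketch_count_warnings(false) models cancelling before the init step has run and trips the warning once; sketch_count_warnings(true) models the intended order and stays quiet.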
The other report by Sasha seems to be a 32-bit issue, where something
calls roundup_pow_of_two() on a thing that would round up past the
32-bit limit. Presumably it works on 64-bit.
But I'm not seeing anything that looks like a likely *cause* of the new warning.
There's a couple possible cases, although this one looks suspicious:
adev->vm_manager.max_pfn = (uint64_t)vm_size << 18;
tmp = roundup_pow_of_two(adev->vm_manager.max_pfn);
because it explicitly uses 64-bit types for that max_pfn thing, but
then does that roundup_pow_of_two() that only works on "unsigned
long".
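As a rough illustration of how a 64-bit value passed to a helper that works on a 32-bit "unsigned long" could end up with a shift exponent of 32, here is a userspace sketch. It assumes the helper computes 1UL << fls_long(n - 1), and it uses a made-up vm_size of 262144 (in GB); the sketch_* names are hypothetical, and nothing here establishes which call site actually produced the report.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest set bit, 1-based; 0 when x == 0 (fls()-style). */
static unsigned int sketch_fls32(uint32_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Shift exponent that "1UL << fls_long(n - 1)" would use when unsigned long
 * is 32 bits wide and n has first been truncated to that width.
 */
unsigned int sketch_shift_exponent_32(uint64_t n)
{
	uint32_t truncated = (uint32_t)n;	/* what a 32-bit long keeps */

	return sketch_fls32((uint32_t)(truncated - 1u));
}

/* A 64-bit round-up that never shifts by the full width of its type. */
uint64_t sketch_roundup_pow_of_two_u64(uint64_t n)
{
	uint64_t result = 1;

	assert(n != 0 && n <= (UINT64_C(1) << 63));
	while (result < n)
		result <<= 1;
	return result;
}
```

With the assumed vm_size, (uint64_t)262144 << 18 is 2^36. Truncated to 32 bits that becomes 0, n - 1 wraps to 0xffffffff, and the exponent comes out as 32, which is undefined for a 32-bit shift. The 64-bit variant simply returns 2^36.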
Sasha - it would help if your warning stack dumps had line numbers
(using decode_stacktrace.sh, which you should be familiar with, since
you wrote it...)
I realize that requires some debug info, which might slow down builds
etc, but it would be really nice.
Linus
* Re: [git pull] drm fixes for 6.13-rc1
From: pr-tracker-bot @ 2024-11-29 21:33 UTC
To: Dave Airlie; +Cc: Linus Torvalds, Sima Vetter, dri-devel, LKML
The pull request you sent on Fri, 29 Nov 2024 06:42:18 +1000:
> https://gitlab.freedesktop.org/drm/kernel.git tags/drm-next-2024-11-29
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2ba9f676d0a2e408aef14d679984c26373bf37b7
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
* Re: [git pull] drm fixes for 6.13-rc1
From: Sasha Levin @ 2024-11-29 22:59 UTC
To: Linus Torvalds; +Cc: Dave Airlie, Alex Deucher, Sima Vetter, dri-devel, LKML
On Fri, Nov 29, 2024 at 01:31:37PM -0800, Linus Torvalds wrote:
>Sasha - it would help if your warning stack dumps had line numbers
>(using decode_stacktrace.sh, which you should be familiar with, since
>you wrote it...)
>
>I realize that requires some debug info, which might slow down builds
>etc, but it would be really nice.
I don't actually do any of the testing myself: my scripts just try to
race you, pulling trees and feeding them to KernelCI/LKFT for testing.
I'm constrained by what I get out of the testing infrastructure, and
from what I see in KernelCI the kernel builds are done without debug
info (a trade-off to allow more builds/tests?).
I should be able to reuse their config and just add debug info, no?
This is what I get as output for the 32-bit issue:
[ 4.495349] UBSAN: shift-out-of-bounds in include/linux/log2.h:57:13
[ 4.501876] shift exponent 32 is too large for 32-bit type 'long unsigned int'
[ 4.509101] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0 #1
[ 4.510096] Hardware name: Google Shuboz/Shuboz, BIOS Google_Shuboz.13434.780.2022_10_13_1418 09/12/2022
[ 4.510096] Call Trace:
[ 4.510096] dump_stack_lvl (lib/dump_stack.c:108)
[ 4.510096] dump_stack (lib/dump_stack.c:114)
[ 4.510096] __ubsan_handle_shift_out_of_bounds (lib/ubsan.c:133 lib/ubsan.c:373)
[ 4.510096] amdgpu_vm_adjust_size.cold (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2083 (discriminator 1))
[ 4.510096] ? __lock_release.isra.0 (kernel/locking/lockdep.c:5429)
[ 4.510096] ? amdgpu_device_skip_hw_access.part.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:380)
[ 4.510096] ? gmc_v9_0_init_mem_ranges (include/linux/device.h:861 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:1994)
[ 4.510096] gmc_v9_0_sw_init (drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2167)
[ 4.510096] ? nbio_v7_0_vcn_doorbell_range (drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c:103)
[ 4.510096] ? gmc_v9_0_init_mem_ranges (include/linux/device.h:861 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:1994)
[ 4.510096] amdgpu_device_ip_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2319 (discriminator 1))
[ 4.510096] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3906)
[ 4.510096] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:148)
[ 4.510096] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2151 (discriminator 1))
[ 4.510096] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 (discriminator 1) include/linux/spinlock_api_smp.h:152 (discriminator 1) kernel/locking/spinlock.c:194 (discriminator 1))
[ 4.510096] pci_device_probe (drivers/pci/pci-driver.c:324 drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
[ 4.510096] ? sysfs_create_link (fs/sysfs/symlink.c:93)
[ 4.510096] really_probe (drivers/base/dd.c:652 (discriminator 1))
[ 4.510096] ? _raw_spin_unlock_irq (include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
[ 4.510096] ? pm_runtime_barrier (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:444 include/linux/atomic/atomic-arch-fallback.h:2404 include/linux/atomic/atomic-arch-fallback.h:2433 include/linux/atomic/atomic-instrumented.h:1508 include/linux/pm_runtime.h:140 drivers/base/power/runtime.c:1424)
[ 4.510096] __driver_probe_device (drivers/base/dd.c:800)
[ 4.510096] ? __driver_attach (include/linux/device.h:992 drivers/base/dd.c:1095 drivers/base/dd.c:1215 drivers/base/dd.c:1156)
[ 4.510096] driver_probe_device (drivers/base/dd.c:831)
[ 4.510096] __driver_attach (include/linux/device.h:992 drivers/base/dd.c:1095 drivers/base/dd.c:1215 drivers/base/dd.c:1156)
[ 4.510096] ? __device_attach_driver (include/linux/list.h:154 include/linux/list.h:183 drivers/base/dd.c:140 drivers/base/dd.c:132 drivers/base/dd.c:935)
[ 4.510096] bus_for_each_dev (drivers/base/bus.c:326 drivers/base/bus.c:369)
[ 4.510096] driver_attach (drivers/base/dd.c:1234)
[ 4.510096] ? __device_attach_driver (include/linux/list.h:154 include/linux/list.h:183 drivers/base/dd.c:140 drivers/base/dd.c:132 drivers/base/dd.c:935)
[ 4.510096] bus_add_driver (drivers/base/bus.c:673)
[ 4.510096] driver_register (drivers/base/driver.c:240)
[ 4.510096] ? drm_sched_fence_slab_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2894)
[ 4.510096] __pci_register_driver (drivers/pci/pci-driver.c:1459)
[ 4.510096] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2905)
[ 4.510096] do_one_initcall (arch/x86/include/asm/bitops.h:228 arch/x86/include/asm/bitops.h:240 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/cpumask.h:504 include/linux/cpumask.h:1082 include/trace/events/initcall.h:27 init/main.c:1237)
[ 4.510096] ? rdinit_setup (init/main.c:601 (discriminator 1))
[ 4.510096] ? parse_args (kernel/params.c:191 (discriminator 1))
[ 4.510096] do_initcalls (init/main.c:1299 init/main.c:1316)
[ 4.510096] ? rdinit_setup (init/main.c:601 (discriminator 1))
[ 4.510096] kernel_init_freeable (init/main.c:1546)
[ 4.510096] ? kernel_init (init/main.c:1445)
[ 4.510096] ? rest_init (include/linux/rcupdate.h:787 (discriminator 5) init/main.c:703 (discriminator 5))
[ 4.510096] kernel_init (init/main.c:1445)
[ 4.510096] ? schedule_tail (kernel/sched/core.c:5317)
[ 4.510096] ret_from_fork (arch/x86/kernel/process.c:153)
[ 4.510096] ? rest_init (include/linux/rcupdate.h:787 (discriminator 5) init/main.c:703 (discriminator 5))
[ 4.510096] ret_from_fork_asm (arch/x86/entry/entry_64.S:291)
[ 4.510096] entry_INT80_32+0x108/0x108
...which looks reasonable?
--
Thanks,
Sasha
* Re: [git pull] drm fixes for 6.13-rc1
From: Linus Torvalds @ 2024-11-30 4:29 UTC
To: Sasha Levin; +Cc: Dave Airlie, Alex Deucher, Sima Vetter, dri-devel, LKML
On Fri, 29 Nov 2024 at 14:59, Sasha Levin <sashal@kernel.org> wrote:
>
> I should be able to reuse their config and just add debug info, no?
Sadly, no. Not unless you exactly match their compiler version. And it
looks like you don't, because the line numbers make no sense.
For example, this is the thing I would expect shows exactly *which* of
the roundup_pow_of_two()'s it is that causes it, but:
> [ 4.510096] amdgpu_vm_adjust_size.cold (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2083 (discriminator 1))
that amdgpu_vm.c:2083 line doesn't match anything, and isn't even
inside amdgpu_vm_adjust_size() - or something it would be inlining -
at all.
So I'm afraid it would have to be done at the KernelCI/LKFT side.
Linus
* Re: [git pull] drm fixes for 6.13-rc1
From: Alex Deucher @ 2024-12-02 14:10 UTC
To: Linus Torvalds
Cc: Dave Airlie, Alex Deucher, Sasha Levin, Sima Vetter, dri-devel,
LKML
On Fri, Nov 29, 2024 at 4:57 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, 28 Nov 2024 at 12:42, Dave Airlie <airlied@gmail.com> wrote:
> >
> > Merge window fixes, mostly amdgpu and xe, with a few other minor ones,
> > all looks fairly normal,
>
> Hmm. I've pulled this, but do note the report by Sasha.
>
> The
>
> if (WARN_ON(!work->func))
> return false;
>
> from __flush_work() looks odd, and is fairly obviously triggered by
> this one liner in commit 93df74873703 ("drm/amdgpu/jpeg: cancel the
> jpeg worker")
>
> - bool set_clocks = !cancel_delayed_work_sync(&adev->vcn.idle_work);
> + bool set_clocks = !cancel_delayed_work_sync(&adev->jpeg.idle_work);
>
> where apparently the jpeg.idle_work isn't initialized at that point.
>
> It looks like the initialization is done by amdgpu_jpeg_sw_init(), and
> it looks like that cancel_delayed_work_sync() is just done too early.
> But I don't know the code. Alex?
Already fixed with this patch:
https://patchwork.freedesktop.org/patch/625940/
Will be in my fixes PR this week.
Alex
>
> The other report by Sasha seems to be a 32-bit issue, where something
> calls roundup_pow_of_two() on a thing that would round up past the
> 32-bit limit. Presumably it works on 64-bit.
>
> But I'm not seeing anything that looks like a likely *cause* of the new warning.
>
> There's a couple possible cases, although this one looks suspicious:
>
> adev->vm_manager.max_pfn = (uint64_t)vm_size << 18;
>
> tmp = roundup_pow_of_two(adev->vm_manager.max_pfn);
>
> because it explicitly uses 64-bit types for that max_pfn thing, but
> then does that roundup_pow_of_two() that only works on "unsigned
> long".
>
> Sasha - it would help if your warning stack dumps had line numbers
> (using decode_stacktrace.sh, which you should be familiar with, since
> you wrote it...)
>
> I realize that requires some debug info, which might slow down builds
> etc, but it would be really nice.
>
> Linus