From: Philipp Stanner <phasta@kernel.org>
To: "Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Sabrina Dubroca" <sd@queasysnail.net>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
Philipp Stanner <phasta@kernel.org>
Subject: [PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done()
Date: Thu, 10 Apr 2025 11:24:16 +0200 [thread overview]
Message-ID: <20250410092418.135258-2-phasta@kernel.org> (raw)
Contains two patches improving nouveau_fence_done(), and one addressing
an actual bug (race):
[ 39.848463] WARNING: CPU: 21 PID: 1734 at drivers/gpu/drm/nouveau/nouveau_fence.c:509 nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848551] Modules linked in: snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_ine
t nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc snd_sof_pci_intel_
tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_sof snd_sof_utils snd
_soc_acpi_intel_match snd_soc_acpi snd_soc_acpi_intel_sdca_quirks snd_sof_intel_hda_mlink snd_soc_sdca snd_soc_avs snd_ctl_led snd_soc_hda_codec intel_rapl_msr snd_hda_
codec_realtek snd_hda_ext_core intel_rapl_common snd_hda_codec_generic snd_soc_core snd_hda_scodec_component intel_uncore_frequency intel_uncore_frequency_common snd_hd
a_codec_hdmi intel_ifs snd_compress i10nm_edac skx_edac_common nfit snd_hda_intel snd_intel_dspcfg libnvdimm snd_hda_codec binfmt_misc snd_hwdep snd_hda_core snd_seq sn
d_seq_device dell_wmi
[ 39.848575] dell_pc x86_pkg_temp_thermal spi_nor platform_profile sparse_keymap intel_powerclamp dax_hmem snd_pcm cxl_acpi coretemp cxl_port iTCO_wdt mtd rapl intel
_pmc_bxt pmt_telemetry cxl_core dell_wmi_sysman pmt_class iTCO_vendor_support snd_timer isst_if_mmio vfat intel_cstate dell_smbios dcdbas fat dell_wmi_ddv dell_smm_hwmo
n dell_wmi_descriptor firmware_attributes_class wmi_bmof intel_uncore einj pcspkr isst_if_mbox_pci atlantic snd isst_if_common intel_vsec e1000e macsec mei_me i2c_i801
spi_intel_pci soundcore i2c_smbus spi_intel mei joydev loop nfnetlink zram nouveau drm_ttm_helper ttm polyval_clmulni iaa_crypto gpu_sched polyval_generic rtsx_pci_sdmm
c ghash_clmulni_intel i2c_algo_bit mmc_core drm_gpuvm sha512_ssse3 nvme drm_exec drm_display_helper sha256_ssse3 idxd sha1_ssse3 cec nvme_core idxd_bus rtsx_pci nvme_au
th pinctrl_alderlake ip6_tables ip_tables fuse
[ 39.848603] CPU: 21 UID: 42 PID: 1734 Comm: gnome-shell Tainted: G W 6.14.0-rc4+ #11
[ 39.848605] Tainted: [W]=WARN
[ 39.848606] Hardware name: Dell Inc. Precision 7960 Tower/01G0M6, BIOS 2.7.0 12/17/2024
[ 39.848607] RIP: 0010:nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848688] Code: db 74 17 48 8d 7b 38 b8 ff ff ff ff f0 0f c1 43 38 83 f8 01 74 29 85 c0 7e 17 31 c0 5b 5d c3 cc cc cc cc e8 76 b2 c5 f0 eb 96 <0f> 0b e9 67 ff ff f
f be 03 00 00 00 e8 83 76 33 f1 31 c0 eb dd e8
[ 39.848690] RSP: 0018:ff1cc1ffc5c039f0 EFLAGS: 00010046
[ 39.848691] RAX: 0000000000000001 RBX: ff175a3b504da980 RCX: ff175a3b4801e008
[ 39.848692] RDX: ff175a3b43e7bad0 RSI: ffffffffc09d3fda RDI: ff175a3b504da980
[ 39.848693] RBP: ff175a3b504da9c0 R08: ffffffffc09e39df R09: 0000000000000001
[ 39.848694] R10: 0000000000000001 R11: 0000000000000000 R12: ff175a3b6d97de00
[ 39.848695] R13: 0000000000000246 R14: ff1cc1ffc5c03c60 R15: 0000000000000001
[ 39.848696] FS: 00007fc5477846c0(0000) GS:ff175a5a50280000(0000) knlGS:0000000000000000
[ 39.848698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 39.848699] CR2: 000055cb7613d1a8 CR3: 000000012e5ce004 CR4: 0000000000f71ef0
[ 39.848700] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 39.848701] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 39.848702] PKRU: 55555554
[ 39.848703] Call Trace:
[ 39.848704] <TASK>
[ 39.848705] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848782] ? __warn.cold+0x93/0xfa
[ 39.848785] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848861] ? report_bug+0xff/0x140
[ 39.848863] ? handle_bug+0x58/0x90
[ 39.848865] ? exc_invalid_op+0x17/0x70
[ 39.848866] ? asm_exc_invalid_op+0x1a/0x20
[ 39.848870] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848943] nouveau_fence_enable_signaling+0x32/0x80 [nouveau]
[ 39.849016] ? __pfx_nouveau_fence_cleanup_cb+0x10/0x10 [nouveau]
[ 39.849088] __dma_fence_enable_signaling+0x33/0xc0
[ 39.849090] dma_fence_add_callback+0x4b/0xd0
[ 39.849093] nouveau_fence_emit+0xa3/0x260 [nouveau]
[ 39.849166] nouveau_fence_new+0x7d/0xf0 [nouveau]
[ 39.849242] nouveau_gem_ioctl_pushbuf+0xe8f/0x1300 [nouveau]
[ 39.849338] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849431] drm_ioctl_kernel+0xad/0x100
[ 39.849433] drm_ioctl+0x288/0x550
[ 39.849435] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849526] nouveau_drm_ioctl+0x57/0xb0 [nouveau]
[ 39.849620] __x64_sys_ioctl+0x94/0xc0
[ 39.849621] do_syscall_64+0x82/0x160
[ 39.849623] ? drm_ioctl+0x2b7/0x550
[ 39.849625] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849719] ? ktime_get_mono_fast_ns+0x38/0xd0
[ 39.849721] ? __pm_runtime_suspend+0x69/0xc0
[ 39.849724] ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[ 39.849726] ? syscall_exit_to_user_mode+0x10/0x200
[ 39.849729] ? do_syscall_64+0x8e/0x160
[ 39.849730] ? exc_page_fault+0x7e/0x1a0
[ 39.849733] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 39.849735] RIP: 0033:0x7fc5576fe0ad
[ 39.849736] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 39.849737] RSP: 002b:00007ffc002688a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 39.849739] RAX: ffffffffffffffda RBX: 000055cb74e316c0 RCX: 00007fc5576fe0ad
[ 39.849740] RDX: 00007ffc00268960 RSI: 00000000c0406481 RDI: 000000000000000e
[ 39.849741] RBP: 00007ffc002688f0 R08: 0000000000000000 R09: 000055cb74e35560
[ 39.849742] R10: 0000000000000014 R11: 0000000000000246 R12: 00007ffc00268960
[ 39.849744] R13: 00000000c0406481 R14: 000000000000000e R15: 000055cb74e3cd10
[ 39.849746] </TASK>
[ 39.849746] ---[ end trace 0000000000000000 ]---
[ 39.849776] ------------[ cut here ]------------
This is the first WARN_ON() in dma_fence_set_error(), called by
nouveau_fence_context_kill().
It's rare, but it is a bug, or rather: the archetype of a race, since
(as Christian pointed out) nouveau_fence_update() later at some point
will remove the signaled fence (by signaling it again).
P.
Philipp Stanner (3):
drm/nouveau: Prevent signaled fences in pending list
drm/nouveau: Remove surplus if-branch
drm/nouveau: Add helper to check base fence
drivers/gpu/drm/nouveau/nouveau_fence.c | 32 ++++++++++++++-----------
1 file changed, 18 insertions(+), 14 deletions(-)
--
2.48.1
next reply other threads:[~2025-04-10 9:24 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-10 9:24 Philipp Stanner [this message]
2025-04-10 9:24 ` [PATCH 1/3] drm/nouveau: Prevent signaled fences in pending list Philipp Stanner
2025-04-10 12:13 ` Christian König
2025-04-10 12:21 ` Danilo Krummrich
2025-04-10 12:42 ` Christian König
2025-04-10 12:58 ` Christian König
2025-04-10 13:09 ` Philipp Stanner
2025-04-10 13:16 ` Christian König
2025-04-10 15:36 ` Philipp Stanner
2025-04-11 9:29 ` Philipp Stanner
[not found] ` <81a70ba6-94b1-4bb3-a0b2-9e8890f90b33@amd.com>
2025-04-11 12:44 ` Philipp Stanner
2025-04-11 13:06 ` Christian König
2025-04-11 14:10 ` Philipp Stanner
2025-04-14 8:54 ` Philipp Stanner
2025-04-14 14:27 ` Danilo Krummrich
2025-04-15 9:56 ` Christian König
2025-04-15 12:54 ` Philipp Stanner
2025-04-10 9:24 ` [PATCH 2/3] drm/nouveau: Remove surplus if-branch Philipp Stanner
2025-04-10 12:15 ` Christian König
2025-04-10 9:24 ` [PATCH 3/3] drm/nouveau: Add helper to check base fence Philipp Stanner
2025-04-10 9:51 ` [PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done() Philipp Stanner
2025-04-10 12:18 ` Christian König
2025-04-10 13:18 ` Philipp Stanner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250410092418.135258-2-phasta@kernel.org \
--to=phasta@kernel.org \
--cc=airlied@gmail.com \
--cc=christian.koenig@amd.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=linaro-mm-sig@lists.linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-media@vger.kernel.org \
--cc=lyude@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=nouveau@lists.freedesktop.org \
--cc=sd@queasysnail.net \
--cc=simona@ffwll.ch \
--cc=sumit.semwal@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).