From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Dave Airlie <airlied@redhat.com>,
Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH 5.4 049/107] nouveau: fix instmem race condition around ptr stores
Date: Tue, 30 Apr 2024 12:40:09 +0200 [thread overview]
Message-ID: <20240430103046.106249968@linuxfoundation.org> (raw)
In-Reply-To: <20240430103044.655968143@linuxfoundation.org>
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie <airlied@redhat.com>
commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream.
Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
...
? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
nvkm_vmm_iter+0x351/0xa20 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __lock_acquire+0x3ed/0x2170
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]
Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.
Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.
If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.
v2: use paired smp_rmb/smp_wmb.
Cc: <stable@vger.kernel.org>
Fixes: be55287aa5ba ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -221,8 +221,11 @@ nv50_instobj_acquire(struct nvkm_memory
void __iomem *map = NULL;
/* Already mapped? */
- if (refcount_inc_not_zero(&iobj->maps))
+ if (refcount_inc_not_zero(&iobj->maps)) {
+ /* read barrier match the wmb on refcount set */
+ smp_rmb();
return iobj->map;
+ }
/* Take the lock, and re-check that another thread hasn't
* already mapped the object in the meantime.
@@ -249,6 +252,8 @@ nv50_instobj_acquire(struct nvkm_memory
iobj->base.memory.ptrs = &nv50_instobj_fast;
else
iobj->base.memory.ptrs = &nv50_instobj_slow;
+ /* barrier to ensure the ptrs are written before refcount is set */
+ smp_wmb();
refcount_set(&iobj->maps, 1);
}
next prev parent reply other threads:[~2024-04-30 11:24 UTC|newest]
Thread overview: 114+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-30 10:39 [PATCH 5.4 000/107] 5.4.275-rc1 review Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 001/107] batman-adv: Avoid infinite loop trying to resize local TT Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 002/107] Bluetooth: Fix memory leak in hci_req_sync_complete() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 003/107] nouveau: fix function cast warning Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 004/107] net: openvswitch: fix unwanted error log on timeout policy probing Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 005/107] u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 006/107] geneve: fix header validation in geneve[6]_xmit_skb Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 007/107] ipv6: fib: hide unused pn variable Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 008/107] ipv4/route: avoid unused-but-set-variable warning Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 009/107] ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 010/107] net/mlx5: Properly link new fs rules into the tree Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 011/107] af_unix: Do not use atomic ops for unix_sk(sk)->inflight Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 012/107] af_unix: Fix garbage collector racing against connect() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 013/107] net: ena: Fix potential sign extension issue Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 014/107] btrfs: qgroup: correctly model root qgroup rsv in convert Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 015/107] drm/client: Fully protect modes[] with dev->mode_config.mutex Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 016/107] vhost: Add smp_rmb() in vhost_vq_avail_empty() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 017/107] selftests: timers: Fix abs() warning in posix_timers test Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 018/107] x86/apic: Force native_apic_mem_read() to use the MOV instruction Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 019/107] btrfs: record delayed inode root in transaction Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 020/107] selftests/ftrace: Limit length in subsystem-enable tests Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 021/107] kprobes: Fix possible use-after-free issue on kprobe registration Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 022/107] Revert "tracing/trigger: Fix to return error if failed to alloc snapshot" Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 023/107] netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 024/107] tun: limit printing rate when illegal packet received by tun dev Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 025/107] RDMA/rxe: Fix the problem "mutex_destroy missing" Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 026/107] RDMA/mlx5: Fix port number for counter query in multi-port configuration Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 027/107] drm: nv04: Fix out of bounds access Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 028/107] clk: Remove prepare_lock hold assertion in __clk_release() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 029/107] clk: Mark all_lists as const Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 030/107] clk: remove extra empty line Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 031/107] clk: Print an info line before disabling unused clocks Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 032/107] clk: Initialize struct clk_core kref earlier Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 033/107] clk: Get runtime PM before walking tree during disable_unused Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 034/107] x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 035/107] binder: check offset alignment in binder_get_object() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 036/107] comedi: vmk80xx: fix incomplete endpoint checking Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 037/107] serial/pmac_zilog: Remove flawed mitigation for rx irq flood Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 038/107] USB: serial: option: add Fibocom FM135-GL variants Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 039/107] USB: serial: option: add support for Fibocom FM650/FG650 Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 040/107] USB: serial: option: add Lonsung U8300/U9300 product Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 041/107] USB: serial: option: support Quectel EM060K sub-models Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 042/107] USB: serial: option: add Rolling RW101-GL and RW135-GL support Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 043/107] USB: serial: option: add Telit FN920C04 rmnet compositions Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 044/107] Revert "usb: cdc-wdm: close race between read and workqueue" Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 045/107] usb: dwc2: host: Fix dereference issue in DDMA completion flow Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 046/107] usb: Disable USB3 LPM at shutdown Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 047/107] speakup: Avoid crash on very long word Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 048/107] fs: sysfs: Fix reference leak in sysfs_break_active_protection() Greg Kroah-Hartman
2024-04-30 10:40 ` Greg Kroah-Hartman [this message]
2024-04-30 10:40 ` [PATCH 5.4 050/107] nilfs2: fix OOB in nilfs_set_de_type Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 051/107] KVM: async_pf: Cleanup kvm_setup_async_pf() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 052/107] arm64: dts: rockchip: fix alphabetical ordering RK3399 puma Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 053/107] arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for RK3399 Puma Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 054/107] arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 055/107] arm64: dts: mediatek: mt7622: fix IR nodename Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 056/107] arm64: dts: mediatek: mt7622: fix ethernet controller "compatible" Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 057/107] arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 058/107] arm64: dts: mt2712: add ethernet device node Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 059/107] arm64: dts: mediatek: mt2712: fix validation errors Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 060/107] ARC: [plat-hsdk]: Remove misplaced interrupt-cells property Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 061/107] vxlan: drop packets from invalid src-address Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 062/107] mlxsw: core: Unregister EMAD trap using FORWARD action Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 063/107] NFC: trf7970a: disable all regulators on removal Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 064/107] net: usb: ax88179_178a: stop lying about skb->truesize Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 065/107] net: gtp: Fix Use-After-Free in gtp_dellink Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 066/107] ipvs: Fix checksumming on GSO of SCTP packets Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 067/107] net: openvswitch: Fix Use-After-Free in ovs_ct_exit Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 068/107] mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 069/107] mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 070/107] mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 071/107] mlxsw: spectrum_acl_tcam: Rate limit error message Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 072/107] mlxsw: spectrum_acl_tcam: Fix memory leak during rehash Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 073/107] mlxsw: spectrum_acl_tcam: Fix warning " Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 074/107] mlxsw: spectrum_acl_tcam: Fix incorrect list API usage Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 075/107] mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 076/107] i40e: Do not use WQ_MEM_RECLAIM flag for workqueue Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 077/107] iavf: Fix TC config comparison with existing adapter TC config Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 078/107] af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 079/107] serial: core: Provide port lock wrappers Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 080/107] serial: mxs-auart: add spinlock around changing cts state Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 081/107] drm/amdgpu: restrict bo mapping within gpu address limits Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 082/107] amdgpu: validate offset_in_bo of drm_amdgpu_gem_va Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 083/107] drm/amdgpu: validate the parameters of bo mapping operations more clearly Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 084/107] Revert "crypto: api - Disallow identical driver names" Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 085/107] net/mlx5e: Fix a race in command alloc flow Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 086/107] tracing: Show size of requested perf buffer Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 087/107] tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 088/107] Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 089/107] Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853 Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 090/107] btrfs: fix information leak in btrfs_ioctl_logical_to_ino() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 091/107] arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 092/107] drm/amdgpu: Fix leak when GPU memory allocation fails Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 093/107] irqchip/gic-v3-its: Prevent double free on error Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 094/107] ethernet: Add helper for assigning packet type when dest address does not match device address Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 095/107] net: b44: set pause params only when interface is up Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 096/107] stackdepot: respect __GFP_NOLOCKDEP allocation flag Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 097/107] mtd: diskonchip: work around ubsan link failure Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 098/107] tcp: Clean up kernel listeners reqsk in inet_twsk_purge() Greg Kroah-Hartman
2024-05-06 1:34 ` shaozhengchao
2024-05-06 3:46 ` shaozhengchao
2024-04-30 10:40 ` [PATCH 5.4 099/107] tcp: Fix NEW_SYN_RECV handling " Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 100/107] dmaengine: owl: fix register access functions Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 101/107] idma64: Dont try to serve interrupts when device is powered off Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 102/107] i2c: smbus: fix NULL function pointer dereference Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 103/107] HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 104/107] bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 105/107] dm: limit the number of targets and parameter size area Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 106/107] udp: preserve the connected status if only UDP cmsg Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 107/107] serial: core: fix kernel-doc for uart_port_unlock_irqrestore() Greg Kroah-Hartman
2024-04-30 11:44 ` [PATCH 5.4 000/107] 5.4.275-rc1 review Jon Hunter
2024-04-30 12:10 ` Harshit Mogalapalli
2024-04-30 13:38 ` Greg Kroah-Hartman
2024-05-02 3:11 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240430103046.106249968@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=airlied@redhat.com \
--cc=dakr@redhat.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).