From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Dave Airlie <airlied@redhat.com>,
Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH 5.4 049/107] nouveau: fix instmem race condition around ptr stores
Date: Tue, 30 Apr 2024 12:40:09 +0200 [thread overview]
Message-ID: <20240430103046.106249968@linuxfoundation.org> (raw)
In-Reply-To: <20240430103044.655968143@linuxfoundation.org>
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie <airlied@redhat.com>
commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream.
Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
...
? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
nvkm_vmm_iter+0x351/0xa20 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __lock_acquire+0x3ed/0x2170
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]
Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.
Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.
If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.
v2: use paired smp_rmb/smp_wmb.
Cc: <stable@vger.kernel.org>
Fixes: be55287aa5ba ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -221,8 +221,11 @@ nv50_instobj_acquire(struct nvkm_memory
void __iomem *map = NULL;
/* Already mapped? */
- if (refcount_inc_not_zero(&iobj->maps))
+ if (refcount_inc_not_zero(&iobj->maps)) {
+ /* read barrier match the wmb on refcount set */
+ smp_rmb();
return iobj->map;
+ }
/* Take the lock, and re-check that another thread hasn't
* already mapped the object in the meantime.
@@ -249,6 +252,8 @@ nv50_instobj_acquire(struct nvkm_memory
iobj->base.memory.ptrs = &nv50_instobj_fast;
else
iobj->base.memory.ptrs = &nv50_instobj_slow;
+ /* barrier to ensure the ptrs are written before refcount is set */
+ smp_wmb();
refcount_set(&iobj->maps, 1);
}
next prev parent reply other threads:[~2024-04-30 11:24 UTC|newest]
Thread overview: 114+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-30 10:39 [PATCH 5.4 000/107] 5.4.275-rc1 review Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 001/107] batman-adv: Avoid infinite loop trying to resize local TT Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 002/107] Bluetooth: Fix memory leak in hci_req_sync_complete() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 003/107] nouveau: fix function cast warning Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 004/107] net: openvswitch: fix unwanted error log on timeout policy probing Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 005/107] u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 006/107] geneve: fix header validation in geneve[6]_xmit_skb Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 007/107] ipv6: fib: hide unused pn variable Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 008/107] ipv4/route: avoid unused-but-set-variable warning Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 009/107] ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 010/107] net/mlx5: Properly link new fs rules into the tree Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 011/107] af_unix: Do not use atomic ops for unix_sk(sk)->inflight Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 012/107] af_unix: Fix garbage collector racing against connect() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 013/107] net: ena: Fix potential sign extension issue Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 014/107] btrfs: qgroup: correctly model root qgroup rsv in convert Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 015/107] drm/client: Fully protect modes[] with dev->mode_config.mutex Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 016/107] vhost: Add smp_rmb() in vhost_vq_avail_empty() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 017/107] selftests: timers: Fix abs() warning in posix_timers test Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 018/107] x86/apic: Force native_apic_mem_read() to use the MOV instruction Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 019/107] btrfs: record delayed inode root in transaction Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 020/107] selftests/ftrace: Limit length in subsystem-enable tests Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 021/107] kprobes: Fix possible use-after-free issue on kprobe registration Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 022/107] Revert "tracing/trigger: Fix to return error if failed to alloc snapshot" Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 023/107] netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 024/107] tun: limit printing rate when illegal packet received by tun dev Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 025/107] RDMA/rxe: Fix the problem "mutex_destroy missing" Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 026/107] RDMA/mlx5: Fix port number for counter query in multi-port configuration Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 027/107] drm: nv04: Fix out of bounds access Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 028/107] clk: Remove prepare_lock hold assertion in __clk_release() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 029/107] clk: Mark all_lists as const Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 030/107] clk: remove extra empty line Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 031/107] clk: Print an info line before disabling unused clocks Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 032/107] clk: Initialize struct clk_core kref earlier Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 033/107] clk: Get runtime PM before walking tree during disable_unused Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 034/107] x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 035/107] binder: check offset alignment in binder_get_object() Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 036/107] comedi: vmk80xx: fix incomplete endpoint checking Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 037/107] serial/pmac_zilog: Remove flawed mitigation for rx irq flood Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 038/107] USB: serial: option: add Fibocom FM135-GL variants Greg Kroah-Hartman
2024-04-30 10:39 ` [PATCH 5.4 039/107] USB: serial: option: add support for Fibocom FM650/FG650 Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 040/107] USB: serial: option: add Lonsung U8300/U9300 product Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 041/107] USB: serial: option: support Quectel EM060K sub-models Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 042/107] USB: serial: option: add Rolling RW101-GL and RW135-GL support Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 043/107] USB: serial: option: add Telit FN920C04 rmnet compositions Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 044/107] Revert "usb: cdc-wdm: close race between read and workqueue" Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 045/107] usb: dwc2: host: Fix dereference issue in DDMA completion flow Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 046/107] usb: Disable USB3 LPM at shutdown Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 047/107] speakup: Avoid crash on very long word Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 048/107] fs: sysfs: Fix reference leak in sysfs_break_active_protection() Greg Kroah-Hartman
2024-04-30 10:40 ` Greg Kroah-Hartman [this message]
2024-04-30 10:40 ` [PATCH 5.4 050/107] nilfs2: fix OOB in nilfs_set_de_type Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 051/107] KVM: async_pf: Cleanup kvm_setup_async_pf() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 052/107] arm64: dts: rockchip: fix alphabetical ordering RK3399 puma Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 053/107] arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for RK3399 Puma Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 054/107] arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 055/107] arm64: dts: mediatek: mt7622: fix IR nodename Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 056/107] arm64: dts: mediatek: mt7622: fix ethernet controller "compatible" Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 057/107] arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 058/107] arm64: dts: mt2712: add ethernet device node Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 059/107] arm64: dts: mediatek: mt2712: fix validation errors Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 060/107] ARC: [plat-hsdk]: Remove misplaced interrupt-cells property Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 061/107] vxlan: drop packets from invalid src-address Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 062/107] mlxsw: core: Unregister EMAD trap using FORWARD action Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 063/107] NFC: trf7970a: disable all regulators on removal Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 064/107] net: usb: ax88179_178a: stop lying about skb->truesize Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 065/107] net: gtp: Fix Use-After-Free in gtp_dellink Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 066/107] ipvs: Fix checksumming on GSO of SCTP packets Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 067/107] net: openvswitch: Fix Use-After-Free in ovs_ct_exit Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 068/107] mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 069/107] mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 070/107] mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 071/107] mlxsw: spectrum_acl_tcam: Rate limit error message Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 072/107] mlxsw: spectrum_acl_tcam: Fix memory leak during rehash Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 073/107] mlxsw: spectrum_acl_tcam: Fix warning " Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 074/107] mlxsw: spectrum_acl_tcam: Fix incorrect list API usage Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 075/107] mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 076/107] i40e: Do not use WQ_MEM_RECLAIM flag for workqueue Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 077/107] iavf: Fix TC config comparison with existing adapter TC config Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 078/107] af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 079/107] serial: core: Provide port lock wrappers Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 080/107] serial: mxs-auart: add spinlock around changing cts state Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 081/107] drm/amdgpu: restrict bo mapping within gpu address limits Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 082/107] amdgpu: validate offset_in_bo of drm_amdgpu_gem_va Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 083/107] drm/amdgpu: validate the parameters of bo mapping operations more clearly Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 084/107] Revert "crypto: api - Disallow identical driver names" Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 085/107] net/mlx5e: Fix a race in command alloc flow Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 086/107] tracing: Show size of requested perf buffer Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 087/107] tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 088/107] Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 089/107] Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853 Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 090/107] btrfs: fix information leak in btrfs_ioctl_logical_to_ino() Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 091/107] arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 092/107] drm/amdgpu: Fix leak when GPU memory allocation fails Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 093/107] irqchip/gic-v3-its: Prevent double free on error Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 094/107] ethernet: Add helper for assigning packet type when dest address does not match device address Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 095/107] net: b44: set pause params only when interface is up Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 096/107] stackdepot: respect __GFP_NOLOCKDEP allocation flag Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 097/107] mtd: diskonchip: work around ubsan link failure Greg Kroah-Hartman
2024-04-30 10:40 ` [PATCH 5.4 098/107] tcp: Clean up kernel listeners reqsk in inet_twsk_purge() Greg Kroah-Hartman
2024-05-06 1:34 ` shaozhengchao
2024-05-06 3:46 ` shaozhengchao
2024-04-30 10:40 ` [PATCH 5.4 099/107] tcp: Fix NEW_SYN_RECV handling " Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 100/107] dmaengine: owl: fix register access functions Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 101/107] idma64: Dont try to serve interrupts when device is powered off Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 102/107] i2c: smbus: fix NULL function pointer dereference Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 103/107] HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 104/107] bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 105/107] dm: limit the number of targets and parameter size area Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 106/107] udp: preserve the connected status if only UDP cmsg Greg Kroah-Hartman
2024-04-30 10:41 ` [PATCH 5.4 107/107] serial: core: fix kernel-doc for uart_port_unlock_irqrestore() Greg Kroah-Hartman
2024-04-30 11:44 ` [PATCH 5.4 000/107] 5.4.275-rc1 review Jon Hunter
2024-04-30 12:10 ` Harshit Mogalapalli
2024-04-30 13:38 ` Greg Kroah-Hartman
2024-05-02 3:11 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240430103046.106249968@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=airlied@redhat.com \
--cc=dakr@redhat.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.