From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Dave Airlie <airlied@redhat.com>,
Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH 4.19 34/77] nouveau: fix instmem race condition around ptr stores
Date: Tue, 30 Apr 2024 12:39:13 +0200
Message-ID: <20240430103042.139562036@linuxfoundation.org>
In-Reply-To: <20240430103041.111219002@linuxfoundation.org>
4.19-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie <airlied@redhat.com>
commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream.
When running a lot of VK CTS in parallel against nouveau, once every
few hours you might see a crash like this:
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
...
? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
nvkm_vmm_iter+0x351/0xa20 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __lock_acquire+0x3ed/0x2170
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs pointer is set in
nv50_instobj_acquire, called from nvkm_kmap.

If thread A and thread B both get to nv50_instobj_acquire around
the same time, and thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored, since refcount_set
is unordered. Force a memory barrier here. I picked smp_mb, since
we want it on all CPUs and it's a write followed by a read.

v2: use paired smp_rmb/smp_wmb.

Cc: <stable@vger.kernel.org>
Fixes: be55287aa5ba ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -221,8 +221,11 @@ nv50_instobj_acquire(struct nvkm_memory
 	void __iomem *map = NULL;

 	/* Already mapped? */
-	if (refcount_inc_not_zero(&iobj->maps))
+	if (refcount_inc_not_zero(&iobj->maps)) {
+		/* read barrier match the wmb on refcount set */
+		smp_rmb();
 		return iobj->map;
+	}

 	/* Take the lock, and re-check that another thread hasn't
 	 * already mapped the object in the meantime.
@@ -249,6 +252,8 @@ nv50_instobj_acquire(struct nvkm_memory
 			iobj->base.memory.ptrs = &nv50_instobj_fast;
 		else
 			iobj->base.memory.ptrs = &nv50_instobj_slow;
+		/* barrier to ensure the ptrs are written before refcount is set */
+		smp_wmb();
 		refcount_set(&iobj->maps, 1);
 	}

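
The ordering requirement described in the commit message can be modelled in userspace. The following is only a rough sketch, not nouveau code: C11 fences stand in for the kernel's smp_wmb()/smp_rmb() (they are somewhat stronger than those barriers), a spin loop stands in for the refcount_inc_not_zero() fast path, and every demo_* name is hypothetical.

```c
/* Userspace analogy only; every demo_* name is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_ops { int id; };
static const struct demo_ops demo_fast = { 1 };

struct demo_obj {
	const struct demo_ops *ptrs;	/* plain field, like memory.ptrs */
	atomic_int maps;		/* stands in for the refcount */
};

/* Writer: store ptrs first, then publish by making maps non-zero. */
static void *demo_publish(void *arg)
{
	struct demo_obj *obj = arg;

	obj->ptrs = &demo_fast;
	atomic_thread_fence(memory_order_release);	/* roughly smp_wmb() */
	atomic_store_explicit(&obj->maps, 1, memory_order_relaxed);
	return NULL;
}

/* Reader: once maps is seen non-zero, ptrs must be visible too. */
static void *demo_consume(void *arg)
{
	struct demo_obj *obj = arg;

	while (atomic_load_explicit(&obj->maps, memory_order_relaxed) == 0)
		;	/* spin until the writer has published */
	atomic_thread_fence(memory_order_acquire);	/* roughly smp_rmb() */
	assert(obj->ptrs != NULL);	/* the invariant the fences protect */
	return (void *)obj->ptrs;
}

/* Returns 0 if the reader observed the published pointer, 1 otherwise. */
int demo_run(void)
{
	struct demo_obj obj = { .ptrs = NULL };
	pthread_t writer, reader;
	void *seen = NULL;

	atomic_init(&obj.maps, 0);
	if (pthread_create(&writer, NULL, demo_publish, &obj))
		return 1;
	if (pthread_create(&reader, NULL, demo_consume, &obj)) {
		pthread_join(writer, NULL);
		return 1;
	}
	pthread_join(writer, NULL);
	pthread_join(reader, NULL);
	return seen == NULL && obj.ptrs == &demo_fast ? 0 : 1;
}
```

Calling demo_run() from a small main() and building with -pthread should be enough to try it. Without the two fences, the C11 model would no longer guarantee that the reader sees ptrs once it sees maps != 0, which is the same class of problem the patch addresses.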