From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Dave Airlie <airlied@redhat.com>,
Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH 5.15 66/71] nouveau: fix instmem race condition around ptr stores
Date: Tue, 23 Apr 2024 14:40:19 -0700 [thread overview]
Message-ID: <20240423213846.480886974@linuxfoundation.org> (raw)
In-Reply-To: <20240423213844.122920086@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie <airlied@redhat.com>
commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream.
Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
...
? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
nvkm_vmm_iter+0x351/0xa20 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __lock_acquire+0x3ed/0x2170
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]
Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.
Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.
If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.
v2: use paired smp_rmb/smp_wmb.
Cc: <stable@vger.kernel.org>
Fixes: be55287aa5ba ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -221,8 +221,11 @@ nv50_instobj_acquire(struct nvkm_memory
void __iomem *map = NULL;
/* Already mapped? */
- if (refcount_inc_not_zero(&iobj->maps))
+ if (refcount_inc_not_zero(&iobj->maps)) {
+ /* read barrier match the wmb on refcount set */
+ smp_rmb();
return iobj->map;
+ }
/* Take the lock, and re-check that another thread hasn't
* already mapped the object in the meantime.
@@ -249,6 +252,8 @@ nv50_instobj_acquire(struct nvkm_memory
iobj->base.memory.ptrs = &nv50_instobj_fast;
else
iobj->base.memory.ptrs = &nv50_instobj_slow;
+ /* barrier to ensure the ptrs are written before refcount is set */
+ smp_wmb();
refcount_set(&iobj->maps, 1);
}
next prev parent reply other threads:[~2024-04-23 21:46 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-23 21:39 [PATCH 5.15 00/71] 5.15.157-rc1 review Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 01/71] ksmbd: dont send oplock break if rename fails Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 02/71] ksmbd: validate payload size in ipc response Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 03/71] ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1 Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 04/71] btrfs: record delayed inode root in transaction Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 05/71] SUNRPC: Fix rpcgss_context trace event acceptor field Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 06/71] selftests/ftrace: Limit length in subsystem-enable tests Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 07/71] bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 08/71] bpf: Generalize check_ctx_reg for reuse with other types Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 09/71] bpf: Generally fix helper register offset check Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 10/71] bpf: Fix out of bounds access for ringbuf helpers Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 11/71] bpf: Fix ringbuf memory type confusion when passing to helpers Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 12/71] kprobes: Fix possible use-after-free issue on kprobe registration Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 13/71] Revert "tracing/trigger: Fix to return error if failed to alloc snapshot" Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 14/71] Revert "lockd: introduce safe async lock op" Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 15/71] netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 16/71] netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 17/71] netfilter: br_netfilter: skip conntrack input hook for promisc packets Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 18/71] netfilter: nft_set_pipapo: do not free live element Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 19/71] netfilter: nf_flow_table: count pending offload workqueue tasks Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 20/71] netfilter: flowtable: validate pppoe header Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 21/71] netfilter: flowtable: incorrect pppoe tuple Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 22/71] af_unix: Call manage_oob() for every skb in unix_stream_read_generic() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 23/71] af_unix: Dont peek OOB data without MSG_OOB Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 24/71] tun: limit printing rate when illegal packet received by tun dev Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 25/71] net: dsa: mt7530: fix mirroring frames received on local port Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 26/71] net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 27/71] RDMA/rxe: Fix the problem "mutex_destroy missing" Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 28/71] RDMA/cm: Print the old state when cm_destroy_id gets timeout Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 29/71] RDMA/mlx5: Fix port number for counter query in multi-port configuration Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 30/71] s390/qdio: handle deferred cc1 Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 31/71] s390/cio: fix race condition during online processing Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 32/71] drm: nv04: Fix out of bounds access Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 33/71] drm/panel: visionox-rm69299: dont unregister DSI device Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 34/71] clk: Remove prepare_lock hold assertion in __clk_release() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 35/71] clk: Mark all_lists as const Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 36/71] clk: remove extra empty line Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 37/71] clk: Print an info line before disabling unused clocks Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 38/71] clk: Initialize struct clk_core kref earlier Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 39/71] clk: Get runtime PM before walking tree during disable_unused Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 40/71] x86/bugs: Fix BHI retpoline check Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 41/71] x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 42/71] thunderbolt: Avoid notify PM core about runtime PM resume Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 43/71] thunderbolt: Fix wake configurations after device unplug Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 44/71] comedi: vmk80xx: fix incomplete endpoint checking Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 45/71] serial/pmac_zilog: Remove flawed mitigation for rx irq flood Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 46/71] USB: serial: option: add Fibocom FM135-GL variants Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 47/71] USB: serial: option: add support for Fibocom FM650/FG650 Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 48/71] USB: serial: option: add Lonsung U8300/U9300 product Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 49/71] USB: serial: option: support Quectel EM060K sub-models Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 50/71] USB: serial: option: add Rolling RW101-GL and RW135-GL support Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 51/71] USB: serial: option: add Telit FN920C04 rmnet compositions Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 52/71] Revert "usb: cdc-wdm: close race between read and workqueue" Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 53/71] usb: dwc2: host: Fix dereference issue in DDMA completion flow Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 54/71] usb: Disable USB3 LPM at shutdown Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 55/71] usb: gadget: f_ncm: Fix UAF ncm object at re-bind after usb ep transport error Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 56/71] mei: me: disable RPL-S on SPS and IGN firmwares Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 57/71] speakup: Avoid crash on very long word Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 58/71] fs: sysfs: Fix reference leak in sysfs_break_active_protection() Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 59/71] KVM: x86: Snapshot if a vCPUs vendor model is AMD vs. Intel compatible Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 60/71] KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 61/71] arm64: hibernate: Fix level3 translation fault in swsusp_save() Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 62/71] init/main.c: Fix potential static_command_line memory overflow Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 63/71] binder: check offset alignment in binder_get_object() Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 64/71] drm/amdgpu: validate the parameters of bo mapping operations more clearly Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 65/71] drm/vmwgfx: Sort primary plane formats by order of preference Greg Kroah-Hartman
2024-04-23 21:40 ` Greg Kroah-Hartman [this message]
2024-04-23 21:40 ` [PATCH 5.15 67/71] nilfs2: fix OOB in nilfs_set_de_type Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 68/71] net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAP Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 69/71] net: dsa: introduce preferred_default_local_cpu_port and use on MT7530 Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 70/71] net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530 Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 71/71] net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards Greg Kroah-Hartman
2024-04-23 23:00 ` [PATCH 5.15 00/71] 5.15.157-rc1 review SeongJae Park
2024-04-23 23:32 ` Florian Fainelli
2024-04-24 7:25 ` Pavel Machek
2024-04-24 7:32 ` Pavel Machek
2024-04-24 7:57 ` Naresh Kamboju
2024-04-24 9:21 ` Peter Oberparleiter
2024-04-27 14:26 ` Greg Kroah-Hartman
2024-04-24 8:28 ` Ron Economos
2024-04-24 9:30 ` Harshit Mogalapalli
2024-04-25 8:59 ` Jon Hunter
2024-04-25 20:19 ` Shreeya Patel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240423213846.480886974@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=airlied@redhat.com \
--cc=dakr@redhat.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.