stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Dave Airlie <airlied@redhat.com>,
	Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH 5.15 66/71] nouveau: fix instmem race condition around ptr stores
Date: Tue, 23 Apr 2024 14:40:19 -0700	[thread overview]
Message-ID: <20240423213846.480886974@linuxfoundation.org> (raw)
In-Reply-To: <20240423213844.122920086@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Airlie <airlied@redhat.com>

commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream.

Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS:  00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:

...

 ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
 ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
 nvkm_vmm_iter+0x351/0xa20 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 ? __lock_acquire+0x3ed/0x2170
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
 ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
 ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
 nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.

If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.

v2: use paired smp_rmb/smp_wmb.

Cc: <stable@vger.kernel.org>
Fixes: be55287aa5ba ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -221,8 +221,11 @@ nv50_instobj_acquire(struct nvkm_memory
 	void __iomem *map = NULL;
 
 	/* Already mapped? */
-	if (refcount_inc_not_zero(&iobj->maps))
+	if (refcount_inc_not_zero(&iobj->maps)) {
+		/* read barrier match the wmb on refcount set */
+		smp_rmb();
 		return iobj->map;
+	}
 
 	/* Take the lock, and re-check that another thread hasn't
 	 * already mapped the object in the meantime.
@@ -249,6 +252,8 @@ nv50_instobj_acquire(struct nvkm_memory
 			iobj->base.memory.ptrs = &nv50_instobj_fast;
 		else
 			iobj->base.memory.ptrs = &nv50_instobj_slow;
+		/* barrier to ensure the ptrs are written before refcount is set */
+		smp_wmb();
 		refcount_set(&iobj->maps, 1);
 	}
 



  parent reply	other threads:[~2024-04-23 21:46 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-23 21:39 [PATCH 5.15 00/71] 5.15.157-rc1 review Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 01/71] ksmbd: dont send oplock break if rename fails Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 02/71] ksmbd: validate payload size in ipc response Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 03/71] ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1 Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 04/71] btrfs: record delayed inode root in transaction Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 05/71] SUNRPC: Fix rpcgss_context trace event acceptor field Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 06/71] selftests/ftrace: Limit length in subsystem-enable tests Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 07/71] bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 08/71] bpf: Generalize check_ctx_reg for reuse with other types Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 09/71] bpf: Generally fix helper register offset check Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 10/71] bpf: Fix out of bounds access for ringbuf helpers Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 11/71] bpf: Fix ringbuf memory type confusion when passing to helpers Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 12/71] kprobes: Fix possible use-after-free issue on kprobe registration Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 13/71] Revert "tracing/trigger: Fix to return error if failed to alloc snapshot" Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 14/71] Revert "lockd: introduce safe async lock op" Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 15/71] netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 16/71] netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 17/71] netfilter: br_netfilter: skip conntrack input hook for promisc packets Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 18/71] netfilter: nft_set_pipapo: do not free live element Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 19/71] netfilter: nf_flow_table: count pending offload workqueue tasks Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 20/71] netfilter: flowtable: validate pppoe header Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 21/71] netfilter: flowtable: incorrect pppoe tuple Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 22/71] af_unix: Call manage_oob() for every skb in unix_stream_read_generic() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 23/71] af_unix: Dont peek OOB data without MSG_OOB Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 24/71] tun: limit printing rate when illegal packet received by tun dev Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 25/71] net: dsa: mt7530: fix mirroring frames received on local port Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 26/71] net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 27/71] RDMA/rxe: Fix the problem "mutex_destroy missing" Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 28/71] RDMA/cm: Print the old state when cm_destroy_id gets timeout Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 29/71] RDMA/mlx5: Fix port number for counter query in multi-port configuration Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 30/71] s390/qdio: handle deferred cc1 Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 31/71] s390/cio: fix race condition during online processing Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 32/71] drm: nv04: Fix out of bounds access Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 33/71] drm/panel: visionox-rm69299: dont unregister DSI device Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 34/71] clk: Remove prepare_lock hold assertion in __clk_release() Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 35/71] clk: Mark all_lists as const Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 36/71] clk: remove extra empty line Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 37/71] clk: Print an info line before disabling unused clocks Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 38/71] clk: Initialize struct clk_core kref earlier Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 39/71] clk: Get runtime PM before walking tree during disable_unused Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 40/71] x86/bugs: Fix BHI retpoline check Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 41/71] x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 42/71] thunderbolt: Avoid notify PM core about runtime PM resume Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 43/71] thunderbolt: Fix wake configurations after device unplug Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 44/71] comedi: vmk80xx: fix incomplete endpoint checking Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 45/71] serial/pmac_zilog: Remove flawed mitigation for rx irq flood Greg Kroah-Hartman
2024-04-23 21:39 ` [PATCH 5.15 46/71] USB: serial: option: add Fibocom FM135-GL variants Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 47/71] USB: serial: option: add support for Fibocom FM650/FG650 Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 48/71] USB: serial: option: add Lonsung U8300/U9300 product Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 49/71] USB: serial: option: support Quectel EM060K sub-models Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 50/71] USB: serial: option: add Rolling RW101-GL and RW135-GL support Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 51/71] USB: serial: option: add Telit FN920C04 rmnet compositions Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 52/71] Revert "usb: cdc-wdm: close race between read and workqueue" Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 53/71] usb: dwc2: host: Fix dereference issue in DDMA completion flow Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 54/71] usb: Disable USB3 LPM at shutdown Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 55/71] usb: gadget: f_ncm: Fix UAF ncm object at re-bind after usb ep transport error Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 56/71] mei: me: disable RPL-S on SPS and IGN firmwares Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 57/71] speakup: Avoid crash on very long word Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 58/71] fs: sysfs: Fix reference leak in sysfs_break_active_protection() Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 59/71] KVM: x86: Snapshot if a vCPUs vendor model is AMD vs. Intel compatible Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 60/71] KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 61/71] arm64: hibernate: Fix level3 translation fault in swsusp_save() Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 62/71] init/main.c: Fix potential static_command_line memory overflow Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 63/71] binder: check offset alignment in binder_get_object() Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 64/71] drm/amdgpu: validate the parameters of bo mapping operations more clearly Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 65/71] drm/vmwgfx: Sort primary plane formats by order of preference Greg Kroah-Hartman
2024-04-23 21:40 ` Greg Kroah-Hartman [this message]
2024-04-23 21:40 ` [PATCH 5.15 67/71] nilfs2: fix OOB in nilfs_set_de_type Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 68/71] net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAP Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 69/71] net: dsa: introduce preferred_default_local_cpu_port and use on MT7530 Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 70/71] net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530 Greg Kroah-Hartman
2024-04-23 21:40 ` [PATCH 5.15 71/71] net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards Greg Kroah-Hartman
2024-04-23 23:00 ` [PATCH 5.15 00/71] 5.15.157-rc1 review SeongJae Park
2024-04-23 23:32 ` Florian Fainelli
2024-04-24  7:25 ` Pavel Machek
2024-04-24  7:32 ` Pavel Machek
2024-04-24  7:57 ` Naresh Kamboju
2024-04-24  9:21   ` Peter Oberparleiter
2024-04-27 14:26     ` Greg Kroah-Hartman
2024-04-24  8:28 ` Ron Economos
2024-04-24  9:30 ` Harshit Mogalapalli
2024-04-25  8:59 ` Jon Hunter
2024-04-25 20:19 ` Shreeya Patel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240423213846.480886974@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=airlied@redhat.com \
    --cc=dakr@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).