From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 62974481C4 for ; Mon, 2 Dec 2024 14:57:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733151448; cv=none; b=JfzBvWGnsPAJ89/CfBDj0lRsTzuxViY78fnJi+5V11kfXucSTKLitp03R3lxvRxOP6CW+1sbyy/hWVX2iqny3EmIEw5IN8hB5PKIPlz6gR7MQF0ycHizOo3SaSgphnC0F+HS7ut6bOhdfzSoT1zQJsYO0FVFIq5rgCmsY8XQDus= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733151448; c=relaxed/simple; bh=zi2Y8KnhH074XXMtquovnNIt4w/1S9hvg8VUmNuLcjM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=sYli9OeLYffweqrt/7zU9Rx/GD/wD1Ez7baYPflzHvumzu1EqRAOLfr1bd+DVQukphbYd+HlQ0E953XUDPv0cUdCsIf2Om9sM/gEHFYiLgg5F9BfMCLSlwn1A3ezhy1HNLu6kB66dv9DJic6gPXj137Q7P9rUxjY1BRVBb6Q32w= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jEiyTb7z; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jEiyTb7z" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 61204C4CED1; Mon, 2 Dec 2024 14:57:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1733151446; bh=zi2Y8KnhH074XXMtquovnNIt4w/1S9hvg8VUmNuLcjM=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=jEiyTb7zUHYBgPZJfCUwe25oW3QW+AzOkP2v3OzyIEdoWCTSP/cwe9lSnOg85ZnBB VFlLaaijABHJPyAP8pH45NKKnbPGdE9eD9nGsrMa2o8BPHiu2bdPwgYzXF67ms7GUj qPY92B7OF5eRyLsJfbwRYjwqvr+xcIFU3TH++IrAqD/bTrT16WYh5kOgRw5TwhXrlE RajxL6QBfd1Hrpj/FB39WqswQzPP5e6K7bdpRBXkw4ENfS8Ypgmgdy7T9LGCSBCtvU D/xAmR8P/cx74JKJBHW94Jy0TcZQDAkWQvLw5HyDnH4nskgya2kcgzkf3jAZ1+0sgX Gn0+Rw93a/Uug== Date: Mon, 2 Dec 2024 15:57:24 +0100 From: Maxime Ripard To: Chen-Yu Tsai Cc: Sui Jingfeng , Chen-Yu Tsai , Fei Shao , Andrzej Hajda , Neil Armstrong , Robert Foss , Laurent Pinchart , AngeloGioacchino Del Regno , David Airlie , Jernej Skrabec , Jonas Karlman , Maarten Lankhorst , Simona Vetter , Thomas Zimmermann , dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH] drm/bridge: panel: Use devm_drm_bridge_add() Message-ID: <20241202-rigorous-sensible-monkey-4b0f48@houat> References: <20241009052402.411978-1-fshao@chromium.org> <20241024-stalwart-bandicoot-of-music-bc6b29@houat> <20241114-gray-corgi-of-youth-f992ec@houat> <20241129-meticulous-pumpkin-echidna-dff6df@houat> <20241129-blazing-granite-beetle-e9fecd@houat> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha384; protocol="application/pgp-signature"; boundary="3pgxj6ts43wvawyl" Content-Disposition: inline In-Reply-To: --3pgxj6ts43wvawyl Content-Type: text/plain; protected-headers=v1; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Subject: Re: [RFC PATCH] drm/bridge: panel: Use devm_drm_bridge_add() MIME-Version: 1.0 On Fri, Nov 29, 2024 at 11:16:50PM +0800, Chen-Yu Tsai wrote: > On Fri, Nov 29, 2024 at 10:55=E2=80=AFPM Maxime Ripard wrote: > > > > On Fri, Nov 29, 2024 at 10:12:02PM +0800, Sui Jingfeng wrote: > > > Hi, > > > > > > On 2024/11/29 18:51, Maxime Ripard wrote: > > > > On Wed, Nov 27, 2024 at 05:58:31PM +0800, Chen-Yu Tsai wrote: > > > > > Revisiting this thread since I just stepped on the same problem o= n a > > > > > different device. > > > > > > > > > > On Thu, Nov 14, 2024 at 9:12=E2=80=AFPM Maxime Ripard wrote: > > > > > > On Tue, Oct 29, 2024 at 10:53:49PM +0800, Fei Shao wrote: > > > > > > > On Thu, Oct 24, 2024 at 8:36=E2=80=AFPM Maxime Ripard wrote: > > > > > > > > On Wed, Oct 09, 2024 at 01:23:31PM +0800, Fei Shao wrote: > > > > > > > > > In the mtk_dsi driver, its DSI host attach callback calls > > > > > > > > > devm_drm_of_get_bridge() to get the next bridge. If that = next bridge is > > > > > > > > > a panel bridge, a panel_bridge object is allocated and ma= naged by the > > > > > > > > > panel device. > > > > > > > > > > > > > > > > > > Later, if the attach callback fails with -EPROBE_DEFER fr= om subsequent > > > > > > > > > component_add(), the panel device invoking the callback a= t probe time > > > > > > > > > also fails, and all device-managed resources are freed ac= cordingly. > > > > > > > > > > > > > > > > > > This exposes a drm_bridge bridge_list corruption due to t= he unbalanced > > > > > > > > > lifecycle between the DSI host and the panel devices: the= panel_bridge > > > > > > > > > object managed by panel device is freed, while drm_bridge= _remove() is > > > > > > > > > bound to DSI host device and never gets called. > > > > > > > > > The next drm_bridge_add() will trigger UAF against the fr= eed bridge list > > > > > > > > > object and result in kernel panic. > > > > > > > > > > > > > > > > > > This bug is observed on a MediaTek MT8188-based Chromeboo= k with MIPI DSI > > > > > > > > > outputting to a DSI panel (DT is WIP for upstream). > > > > > > > > > > > > > > > > > > As a fix, using devm_drm_bridge_add() with the panel devi= ce in the panel > > > > > > > > > path seems reasonable. This also implies a chain of poten= tial cleanup > > > > > > > > > actions: > > > > > > > > > > > > > > > > > > 1. Removing drm_bridge_remove() means devm_drm_panel_brid= ge_release() > > > > > > > > > becomes hollow and can be removed. > > > > > > > > > > > > > > > > > > 2. devm_drm_panel_bridge_add_typed() is almost emptied ex= cept for the > > > > > > > > > `bridge->pre_enable_prev_first` line. Itself can be a= lso removed if > > > > > > > > > we move the line into drm_panel_bridge_add_typed(). (= maybe?) > > > > > > > > > > > > > > > > > > 3. drm_panel_bridge_add_typed() now calls all the needed = devm_* calls, > > > > > > > > > so it's essentially the new devm_drm_panel_bridge_add= _typed(). > > > > > > > > > > > > > > > > > > 4. drmm_panel_bridge_add() needs to be updated accordingl= y since it > > > > > > > > > calls drm_panel_bridge_add_typed(). But now there's o= nly one bridge > > > > > > > > > object to be freed, and it's already being managed by= panel device. > > > > > > > > > I wonder if we still need both drmm_ and devm_ versio= n in this case. > > > > > > > > > (maybe yes from DRM PoV, I don't know much about the = context) > > > > > > > > > > > > > > > > > > This is a RFC patch since I'm not sure if my understandin= g is correct > > > > > > > > > (for both the fix and the cleanup). It fixes the issue I = encountered, > > > > > > > > > but I don't expect it to be picked up directly due to the= redundant > > > > > > > > > commit message and the dangling devm_drm_panel_bridge_rel= ease(). > > > > > > > > > I plan to resend the official patch(es) once I know what = I supposed to > > > > > > > > > do next. > > > > > > > > > > > > > > > > > > For reference, here's the KASAN report from the device: > > > > > > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > > > > > > > > BUG: KASAN: slab-use-after-free in drm_bridge_add+0x98/= 0x230 > > > > > > > > > Read of size 8 at addr ffffff80c4e9e100 by task kworker= /u32:1/69 > > > > > > > > > > > > > > > > > > CPU: 1 UID: 0 PID: 69 Comm: kworker/u32:1 Not tainted 6= =2E12.0-rc1-next-20241004-kasan-00030-g062135fa4046 #1 > > > > > > > > > Hardware name: Google Ciri sku0/unprovisioned board (DT) > > > > > > > > > Workqueue: events_unbound deferred_probe_work_func > > > > > > > > > Call trace: > > > > > > > > > dump_backtrace+0xfc/0x140 > > > > > > > > > show_stack+0x24/0x38 > > > > > > > > > dump_stack_lvl+0x40/0xc8 > > > > > > > > > print_report+0x140/0x700 > > > > > > > > > kasan_report+0xcc/0x130 > > > > > > > > > __asan_report_load8_noabort+0x20/0x30 > > > > > > > > > drm_bridge_add+0x98/0x230 > > > > > > > > > devm_drm_panel_bridge_add_typed+0x174/0x298 > > > > > > > > > devm_drm_of_get_bridge+0xe8/0x190 > > > > > > > > > mtk_dsi_host_attach+0x130/0x2b0 > > > > > > > > > mipi_dsi_attach+0x8c/0xe8 > > > > > > > > > hx83102_probe+0x1a8/0x368 > > > > > > > > > mipi_dsi_drv_probe+0x6c/0x88 > > > > > > > > > really_probe+0x1c4/0x698 > > > > > > > > > __driver_probe_device+0x160/0x298 > > > > > > > > > driver_probe_device+0x7c/0x2a8 > > > > > > > > > __device_attach_driver+0x2a0/0x398 > > > > > > > > > bus_for_each_drv+0x198/0x200 > > > > > > > > > __device_attach+0x1c0/0x308 > > > > > > > > > device_initial_probe+0x20/0x38 > > > > > > > > > bus_probe_device+0x11c/0x1f8 > > > > > > > > > deferred_probe_work_func+0x80/0x250 > > > > > > > > > worker_thread+0x9b4/0x2780 > > > > > > > > > kthread+0x274/0x350 > > > > > > > > > ret_from_fork+0x10/0x20 > > > > > > > > > > > > > > > > > > Allocated by task 69: > > > > > > > > > kasan_save_track+0x40/0x78 > > > > > > > > > kasan_save_alloc_info+0x44/0x58 > > > > > > > > > __kasan_kmalloc+0x84/0xa0 > > > > > > > > > __kmalloc_node_track_caller_noprof+0x228/0x450 > > > > > > > > > devm_kmalloc+0x6c/0x288 > > > > > > > > > devm_drm_panel_bridge_add_typed+0xa0/0x298 > > > > > > > > > devm_drm_of_get_bridge+0xe8/0x190 > > > > > > > > > mtk_dsi_host_attach+0x130/0x2b0 > > > > > > > > > mipi_dsi_attach+0x8c/0xe8 > > > > > > > > > hx83102_probe+0x1a8/0x368 > > > > > > > > > mipi_dsi_drv_probe+0x6c/0x88 > > > > > > > > > really_probe+0x1c4/0x698 > > > > > > > > > __driver_probe_device+0x160/0x298 > > > > > > > > > driver_probe_device+0x7c/0x2a8 > > > > > > > > > __device_attach_driver+0x2a0/0x398 > > > > > > > > > bus_for_each_drv+0x198/0x200 > > > > > > > > > __device_attach+0x1c0/0x308 > > > > > > > > > device_initial_probe+0x20/0x38 > > > > > > > > > bus_probe_device+0x11c/0x1f8 > > > > > > > > > deferred_probe_work_func+0x80/0x250 > > > > > > > > > worker_thread+0x9b4/0x2780 > > > > > > > > > kthread+0x274/0x350 > > > > > > > > > ret_from_fork+0x10/0x20 > > > > > > > > > > > > > > > > > > Freed by task 69: > > > > > > > > > kasan_save_track+0x40/0x78 > > > > > > > > > kasan_save_free_info+0x58/0x78 > > > > > > > > > __kasan_slab_free+0x48/0x68 > > > > > > > > > kfree+0xd4/0x750 > > > > > > > > > devres_release_all+0x144/0x1e8 > > > > > > > > > really_probe+0x48c/0x698 > > > > > > > > > __driver_probe_device+0x160/0x298 > > > > > > > > > driver_probe_device+0x7c/0x2a8 > > > > > > > > > __device_attach_driver+0x2a0/0x398 > > > > > > > > > bus_for_each_drv+0x198/0x200 > > > > > > > > > __device_attach+0x1c0/0x308 > > > > > > > > > device_initial_probe+0x20/0x38 > > > > > > > > > bus_probe_device+0x11c/0x1f8 > > > > > > > > > deferred_probe_work_func+0x80/0x250 > > > > > > > > > worker_thread+0x9b4/0x2780 > > > > > > > > > kthread+0x274/0x350 > > > > > > > > > ret_from_fork+0x10/0x20 > > > > > > > > > > > > > > > > > > The buggy address belongs to the object at ffffff80c4e9= e000 > > > > > > > > > which belongs to the cache kmalloc-4k of size 4096 > > > > > > > > > The buggy address is located 256 bytes inside of > > > > > > > > > freed 4096-byte region [ffffff80c4e9e000, ffffff80c4e9= f000) > > > > > > > > > > > > > > > > > > The buggy address belongs to the physical page: > > > > > > > > > head: order:3 mapcount:0 entire_mapcount:0 nr_pages_map= ped:0 pincount:0 > > > > > > > > > flags: 0x8000000000000040(head|zone=3D2) > > > > > > > > > page_type: f5(slab) > > > > > > > > > page: refcount:1 mapcount:0 mapping:0000000000000000 > > > > > > > > > index:0x0 pfn:0x104e98 > > > > > > > > > raw: 8000000000000040 ffffff80c0003040 dead000000000122= 0000000000000000 > > > > > > > > > raw: 0000000000000000 0000000000040004 00000001f5000000= 0000000000000000 > > > > > > > > > head: 8000000000000040 ffffff80c0003040 dead00000000012= 2 0000000000000000 > > > > > > > > > head: 0000000000000000 0000000000040004 00000001f500000= 0 0000000000000000 > > > > > > > > > head: 8000000000000003 fffffffec313a601 fffffffffffffff= f 0000000000000000 > > > > > > > > > head: 0000000000000008 0000000000000000 00000000fffffff= f 0000000000000000 > > > > > > > > > page dumped because: kasan: bad access detected > > > > > > > > > > > > > > > > > > Memory state around the buggy address: > > > > > > > > > ffffff80c4e9e000: fa fb fb fb fb fb fb fb fb fb fb fb = fb fb fb fb > > > > > > > > > ffffff80c4e9e080: fb fb fb fb fb fb fb fb fb fb fb fb = fb fb fb fb > > > > > > > > > >ffffff80c4e9e100: fb fb fb fb fb fb fb fb fb fb fb fb = fb fb fb fb > > > > > > > > > ^ > > > > > > > > > ffffff80c4e9e180: fb fb fb fb fb fb fb fb fb fb fb fb = fb fb fb fb > > > > > > > > > ffffff80c4e9e200: fb fb fb fb fb fb fb fb fb fb fb fb = fb fb fb fb > > > > > > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > > > > > > > > > > > > > > > > > Signed-off-by: Fei Shao > > > > > > > > I was looking at the driver to try to follow your (awesome = btw, thanks) > > > > > > > > commit log, and it does have a quite different structure co= mpared to > > > > > > > > what we recommend. > > > > > > > > > > > > > > > > Would following > > > > > > > > https://docs.kernel.org/gpu/drm-kms-helpers.html#special-ca= re-with-mipi-dsi-bridges > > > > > > > > help? > > > > > > > Hi Maxime, > > > > > > > > > > > > > > Thank you for the pointer. > > > > > > > I read the suggested pattern in the doc and compared it with = the > > > > > > > drivers. If I understand correctly, both the MIPI-DSI host an= d panel > > > > > > > drivers follow the instructions: > > > > > > > > > > > > > > 1. The MIPI-DSI host driver must run mipi_dsi_host_register()= in its probe hook. > > > > > > > >> drm/mediatek/mtk_dsi.c runs mipi_dsi_host_register() i= n the probe hook. > > > > > > > 2. In its probe hook, the bridge driver must try to find its = MIPI-DSI > > > > > > > host, register as a MIPI-DSI device and attach the MIPI-DSI d= evice to > > > > > > > its host. > > > > > > > >> drm/panel/panel-himax-hx83102.c follows and runs > > > > > > > mipi_dsi_attach() at the end of probe hook. > > > > > > > 3. In its struct mipi_dsi_host_ops.attach hook, the MIPI-DSI = host can > > > > > > > now add its component. > > > > > > > >> drm/mediatek/mtk_dsi.c calls component_add() in the at= tach callback. > > > > > > > > > > > > > > Could you elaborate on the "different structures" you mention= ed? > > > > > > Yeah, you're right, sorry. > > > > > > > > > > > > > To clarify my point: the issue is that component_add() may re= turn > > > > > > > -EPROBE_DEFER if the component (e.g. DSI encoder) is not read= y, > > > > > > > causing the panel bridge to be removed. However, drm_bridge_r= emove() > > > > > > > is bound to MIPI-DSI host instead of panel bridge, which owns= the > > > > > > > actual list_head object. > > > > > > > > > > > > > > This might be reproducible with other MIPI-DSI host + panel > > > > > > > combinations by forcibly returning -EPROBE_DEFER in the host = attach > > > > > > > hook (verification with another device is needed), so the fix= may be > > > > > > > required in drm/bridge/panel.c. > > > > > > Yeah, I think you're just hitting another bridge lifetime issue= , and > > > > > > it's not the only one unfortunately. Tying the bridge structure= lifetime > > > > > > itself to the device is wrong, it should be tied to the DRM dev= ice > > > > > > lifetime instead. > > > > > I think the more immediate issue is that the bridge object's life= time > > > > > and drm_bridge_add/remove are inconsistent when devm_drm_of_get_b= ridge() > > > > > or drmm_of_get_bridge() are used. > > > > > > > > > > These helpers tie the bridge add/removal to the device or drm_dev= ice > > > > > passed in, but internally they call down to drm_panel_bridge_add_= typed() > > > > > which allocates the bridge object tied to the panel device. > > > > > > But then, the discussion becomes that bridges typically probe o= utside of > > > > > > the "main" DRM device probe path, so you don't have access to t= he DRM > > > > > > device structure until attach at best. > > > > > > > > > > > > That's why I'm a bit skeptical about your patch. It might worka= round > > > > > > your issue, but it doesn't actually solve the problem. I guess = the best > > > > > > way about it would be to convert bridges to reference counting,= with the > > > > > > device taking a reference at probe time when it allocates the s= tructure > > > > > > (and giving it back at remove time), and the DRM device taking = one when > > > > > > it's attached and one when it's detached. > > > > > Without going as far, it's probably better to align the lifecycle= of > > > > > the two parts. Most other bridge drivers in the kernel have |drm_= bridge| > > > > > lifecycle tied to their underlying |device|, either with explicit > > > > > drm_bridge_{add,remove}() calls in their probe/bind and remove/un= bind > > > > > callbacks respectively, or with devm_drm_bridge_add in the probe/= bind > > > > > path. The only ones with a narrower lifecycle are the DSI hosts, = which > > > > > add the bridge in during host attach and remove it during host de= tach. > > > > > > > > > > I'm thinking about fixing the panel_bridge lifecycle such that it= is > > > > > tied to the panel itself. Maybe that would involve making > > > > > devm_drm_of_get_bridge() correctly return bridges even if a panel= was > > > > > found, and then making the panels create and add panel bridges di= rectly, > > > > > possibly within drm_panel_add(). Would that make sense? > > > > Not really. > > > > > > > > > [...] > > > > > > > > > > Or rather, it doesn't fix the root cause that is that tieing > > > > the bridge lifetime to the device is wrong. >=20 > Well, yeah, but at least it aligns the panel_bridge case with every other > bridge driver: the bridge object and its duration existing in the list > of bridges are the same in every other bridge driver. The bridge driver > allocates the bridge object in its probe function (with devm or not), > and adds the bridge to the global list as probably the last step of > the probe function. In the remove function the reverse is done. >=20 > Right now for the panel bridge, the bridge object is allocated with its > lifetime tied to the panel, but the adding and removing of the bridge > to/from the global list is tied to the caller of the panel_bridge API, > likely the previous bridge or encoder in the chain. >=20 > This is what is blowing up for us, right now, with the Ciri Chromebooks. > If we fix this bit, then at least it reduces the problem to be only as > worse as the other bridge drivers. It also creates a harmful function in the framework, and thus technical debt that we will have to deal with eventually. It's pretty clear that you don't want to fix it properly, but at the very least we shouldn't pile on more technical debt in the framework. Anyway, you both seem to have decided I'm wrong, so go ahead anyway you see fit. Maxime --3pgxj6ts43wvawyl Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZ03KzwAKCRAnX84Zoj2+ drNvAX9l0tth5bOlbgSIbgs/z+FCBSnf8t1kpQdDZSwN1dtZv15jpsiyhHa85Kjl /jfZKcQBfiUkIb3ghT2HSqKKs2tkiKoTEyL790gyz0nNLcvcPb0vVb4zWODUrhfg G3dtkYCfEA== =4lMk -----END PGP SIGNATURE----- --3pgxj6ts43wvawyl--