From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Anholt Subject: Re: [BUG] drm: vc4: refcount_t: increment on 0; use-after-free. Date: Sun, 26 Nov 2017 17:40:11 -0800 Message-ID: <87shd0pkhw.fsf@anholt.net> References: <499324124.39558.1511369015110@email.1und1.de> <20171122185111.0ffa7fca@bbrezillon> <20171122192100.5eb8ad11@bbrezillon> <20171123075754.qwp4f4ampu2shzoi@phenom.ffwll.local> <20171126210259.2d6f3a6f@bbrezillon> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0953310838==" Return-path: Received: from anholt.net (anholt.net [50.246.234.109]) by gabe.freedesktop.org (Postfix) with ESMTP id BBE926E16C for ; Mon, 27 Nov 2017 01:40:18 +0000 (UTC) In-Reply-To: <20171126210259.2d6f3a6f@bbrezillon> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Boris Brezillon , Kees Cook Cc: Stefan Wahren , David Airlie , "Reshetova, Elena" , dri-devel , Daniel Vetter List-Id: dri-devel@lists.freedesktop.org --===============0953310838== Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Boris Brezillon writes: > Hi Kees, > > On Sat, 25 Nov 2017 12:57:04 -0800 > Kees Cook wrote: > >> On Wed, Nov 22, 2017 at 11:57 PM, Daniel Vetter wrote: >> > On Wed, Nov 22, 2017 at 07:21:00PM +0100, Boris Brezillon wrote:=20=20 >> >> On Wed, 22 Nov 2017 19:13:09 +0100 >> >> Daniel Vetter wrote: >> >>=20=20 >> >> > On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon >> >> > wrote:=20=20 >> >> > > Hi Stefan, >> >> > > >> >> > > On Wed, 22 Nov 2017 17:43:35 +0100 (CET) >> >> > > Stefan Wahren wrote: >> >> > >=20=20 >> >> > >> Hi Boris, >> >> > >> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122)= with sufficient CMA memory (32 MB), i'll get this warning during boot: >> >> > >> >> >> > >> [ 7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdm= i_ops [vc4]) >> >> > >> [ 7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_= ops [vc4]) >> >> > >> [ 7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_= ops [vc4]) >> >> > >> [ 7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops v= c4_crtc_ops [vc4]) >> >> > >> [ 7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops v= c4_crtc_ops [vc4]) >> >> > >> [ 7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops v= c4_crtc_ops [vc4]) >> >> > >> [ 7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_= ops [vc4]) >> >> > >> [ 7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu = on minor 0 >> >> > >> [ 7.750841] [drm] Supports vblank timestamp caching Rev 2 (21= .10.2013). >> >> > >> [ 7.757620] [drm] Driver supports precise vblank timestamp qu= ery. >> >> > >> [ 7.811775] ------------[ cut here ]------------ >> >> > >> [ 7.811780] refcount_t: increment on 0; use-after-free. >> >> > >> [ 7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 r= efcount_inc+0x44/0x50 >> >> > >> [ 7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_he= lper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma c= rc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6 >> >> > >> [ 7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted = 4.14.0-next-20171122 #1 >> >> > >> [ 7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT) >> >> > >> [ 7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f00= 00 >> >> > >> [ 7.811967] pstate: 20000005 (nzCv daif -PAN -UAO) >> >> > >> [ 7.811974] pc : refcount_inc+0x44/0x50 >> >> > >> [ 7.811981] lr : refcount_inc+0x44/0x50 >> >> > >> [ 7.811984] sp : ffff00000d6f3300 >> >> > >> [ 7.811988] x29: ffff00000d6f3300 x28: 0000000000000000 >> >> > >> [ 7.811996] x27: 0000000000000003 x26: ffff800037107800 >> >> > >> [ 7.812004] x25: 0000000000000001 x24: ffff800035afdc00 >> >> > >> [ 7.812012] x23: 0000000000000000 x22: ffff800035dfa600 >> >> > >> [ 7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4 >> >> > >> [ 7.812027] x19: 0000000000000000 x18: 0000000000000000 >> >> > >> [ 7.812034] x17: 0000000000000001 x16: 0000000000000019 >> >> > >> [ 7.812042] x15: 0000000000000001 x14: 00000000fffffff0 >> >> > >> [ 7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50 >> >> > >> [ 7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000 >> >> > >> [ 7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f >> >> > >> [ 7.812072] x7 : 0000000000000000 x6 : 000000000015ee00 >> >> > >> [ 7.812079] x5 : 0000000000000000 x4 : 0000000000000000 >> >> > >> [ 7.812087] x3 : ffffffffffffffff x2 : 0000800030047000 >> >> > >> [ 7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b >> >> > >> [ 7.812102] Call trace: >> >> > >> [ 7.812109] refcount_inc+0x44/0x50 >> >> > >> [ 7.812226] vc4_bo_inc_usecnt+0x84/0x88 [vc4] >> >> > >> [ 7.812310] vc4_prepare_fb+0x40/0xf0 [vc4] >> >> > >> [ 7.812460] drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_= kms_helper] >> >> > >> [ 7.812543] vc4_atomic_commit+0x88/0x130 [vc4] >> >> > >> [ 7.812868] drm_atomic_commit+0x48/0x68 [drm] >> >> > >> [ 7.813002] restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_h= elper] >> >> > >> [ 7.813113] restore_fbdev_mode+0x28/0x160 [drm_kms_helper] >> >> > >> [ 7.813223] drm_fb_helper_restore_fbdev_mode_unlocked.part.2= 4+0x28/0x90 [drm_kms_helper] >> >> > >> [ 7.813331] drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper] >> >> > >> [ 7.813346] fbcon_init+0x4e8/0x538 >> >> > >> [ 7.813357] visual_init+0xb4/0x108 >> >> > >> [ 7.813366] do_bind_con_driver+0x1b8/0x3a0 >> >> > >> [ 7.813373] do_take_over_console+0x150/0x1d0 >> >> > >> [ 7.813380] do_fbcon_takeover+0x6c/0xf0 >> >> > >> [ 7.813387] fbcon_event_notify+0x8fc/0x928 >> >> > >> [ 7.813399] notifier_call_chain+0x50/0x90 >> >> > >> [ 7.813406] __blocking_notifier_call_chain+0x4c/0x90 >> >> > >> [ 7.813413] blocking_notifier_call_chain+0x14/0x20 >> >> > >> [ 7.813420] fb_notifier_call_chain+0x1c/0x28 >> >> > >> [ 7.813426] register_framebuffer+0x1d0/0x2d8 >> >> > >> [ 7.813533] __drm_fb_helper_initial_config_and_unlock+0x1e8/= 0x350 [drm_kms_helper] >> >> > >> [ 7.813639] drm_fb_helper_initial_config+0x40/0x58 [drm_kms_= helper] >> >> > >> [ 7.813747] drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_km= s_helper] >> >> > >> [ 7.813855] drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper] >> >> > >> [ 7.813943] vc4_kms_load+0xa4/0xf0 [vc4] >> >> > >> [ 7.814026] vc4_drm_bind+0x100/0x168 [vc4] >> >> > >> [ 7.814038] try_to_bring_up_master+0x144/0x1a8 >> >> > >> [ 7.814044] component_master_add_with_match+0x9c/0xe0 >> >> > >> [ 7.814128] vc4_platform_drm_probe+0xb4/0xf0 [vc4] >> >> > >> [ 7.814137] platform_drv_probe+0x58/0xc0 >> >> > >> [ 7.814146] driver_probe_device+0x224/0x308 >> >> > >> [ 7.814153] __driver_attach+0xac/0xb0 >> >> > >> [ 7.814161] bus_for_each_dev+0x64/0xa0 >> >> > >> [ 7.814169] driver_attach+0x20/0x28 >> >> > >> [ 7.814177] bus_add_driver+0x108/0x228 >> >> > >> [ 7.814184] driver_register+0x60/0xf8 >> >> > >> [ 7.814190] __platform_driver_register+0x40/0x48 >> >> > >> [ 7.814274] vc4_drm_register+0x38/0x1000 [vc4] >> >> > >> [ 7.814283] do_one_initcall+0x38/0x120 >> >> > >> [ 7.814295] do_init_module+0x58/0x1b8 >> >> > >> [ 7.814304] load_module+0x1fa8/0x2280 >> >> > >> [ 7.814311] SyS_finit_module+0xc0/0xd0 >> >> > >> [ 7.814318] __sys_trace_return+0x0/0x4 >> >> > >> [ 7.814325] ---[ end trace 3348554eb91e19a1 ]---=20=20 >> >> > > >> >> > > Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enab= led :-/. >> >> > > >> >> > > Anyway, can you try to apply the following diff and let me know i= f it >> >> > > fixes the problem? >> >> > > >> >> > > Thanks, >> >> > > >> >> > > Boris >> >> > >=20=20 >> >> > > --->8---=20=20 >> >> > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/v= c4_bo.c >> >> > > index 4ae45d7dac42..2decc8e2c79f 100644 >> >> > > --- a/drivers/gpu/drm/vc4/vc4_bo.c >> >> > > +++ b/drivers/gpu/drm/vc4/vc4_bo.c >> >> > > @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo) >> >> > > mutex_lock(&bo->madv_lock); >> >> > > switch (bo->madv) { >> >> > > case VC4_MADV_WILLNEED: >> >> > > - refcount_inc(&bo->usecnt); >> >> > > + if (!refcount_inc_not_zero(&bo->usecnt)) >> >> > > + refcount_set(&bo->usecnt, 1);=20=20 >> >> > >> >> > This is racy.=20=20 >> >> >> >> Well, in this case it's not, because this section is protected by a >> >> mutex. >> >>=20=20 >> >> > You need a refcount_inc_allow_zero (if that exists).=20=20 >> >> >> >> I searched for something like that, but it seems that there's no such >> >> helper.=20=20 >> > >> > In that case I think the right thing is to switch to atomic_t. That me= ans >> > no overflow protection, but if we don't have a "this can be 0, I know = what >> > we're doing" style refcount, then I don't think it's suitable. >> > >> > Adding Kees to confirm or clarify or ...=20=20 >>=20 >> Adding Elena for more details. IIRC there have been two cases: >>=20 >> 1) refcount =3D=3D 0 doesn't mean the object has been released. The >> solution tends to be a +1 to the count everywhere. >>=20 >> 2) refcount =3D=3D 0 means the object was released, but the refcount gets >> reused for the next allocation. The solution tends to be to use >> refcount_set() in the new initialization instead of refcount_inc(). >>=20 >> It's not clear to me if either apply here, though. > > I'd say #1, since ->usecnt is not controlling the VC4-BO object > lifetime. This being said, I'm not sure the +1 trick is really > appropriate. Indeed, we're using the > refcount_inc_not_zero()/refcount_dec_not_one() helpers to remove/put > the object from/in a > "mem-pointed-by-obj-can-be-freed-under-mem-pressure" pool (note the > object itself is not freed, just the memory portion that is pointed by > this object). If I apply +1 everywhere then I have no way to handle the > 0 -> 1 and 1 -> 0 transitions differently (these become 1 -> 2 and 2 -> > 1 transitions, and there's no functions called refcount_dec_not_two() > and refcount_inc_not_one()). > > Anyway, if refcount_inc_allow_zero() is not option, maybe I should just > use an atomic_t object (as suggested by Daniel) or a plain unsigned > integer protected by the ->madv.lock (as suggested by Eric). Of course, > that means losing built-in counter-overflow checking and reworking a bit > the VC4-MADV logic, but that's not such a big problem.=20 I was profiling minetest last week, and the madv lock was at least 10% of the kernel overhead, since we need to grab it for each BO that goes idle->busy and also each BO that goes busy->idle in a job. Given that we don't reallocate out of the userspace BO cache until it goes idle, that's potentially a lot of BOs per job, and we'd probably be better off with the global bo_lock. --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEE/JuuFDWp9/ZkuCBXtdYpNtH8nugFAlobbPsACgkQtdYpNtH8 nuhWnw//Tubv0CPdamfRFNS1S+vTQdvoUYl16P/mStfL/aEKfScUTIqjs0PUnvde 9ai7llrXqAw5grp/CRWP42R55eAzpwQvDxiwY0V8PIokxpHeNDh41GP1uUMLp4Sg wRC35nLsp+8zy3BwmSo6d6epZM0Ym4r1rj8UVoqY10p3+I5/npbLn3XSx2avfhTU RomeBMzG0ONSx2nRJKxgZMhxxVFt5p6hyz1BsG9y1ynRccGdpvQd2ZRaape1EIel DKc/1JSAj3jYA2VAOE81W6snsqR1Xh0pQsciZfE0yE6iyc4fWtcb51nryB0/hp24 d8wsG5hFuxAoQtF/Tay8AcEqsdMsTnW7PUqyX3hI0/vpVoP4J9UezOQ/DAwNF7ZH u0Lp2riekMY5i1qGxAMWsBfZAC71rroUI16dFj5pnTMA5HuTPuZpY+UK1PKp2hpe S5YAwABFB57elNPhC9jXg6lEE3oF3seRNHchxzUUi4Qg0kCtbwx4DmiSAdA6A7HN PTlRv2Sj0Iar2tALPGvGAPEXh/UCbq8UrOTeHGLyHBwnNX2BeVVMNg96MHB2CAxb /YITONWeVGbi8jiT9Douj6j0hGT+VMDIKLsgctViFO8NglGCrbyF4gJcs4m4CSnj c3By1rpdJx27CqO3ZOSvOQ7f7VLBW2erFQsApf0RQtQ1DYCge5Y= =IbPY -----END PGP SIGNATURE----- --=-=-=-- --===============0953310838== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0953310838==--