From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Vetter Subject: Re: [PATCH] drm/prime: fixup the dma buf export cache locking Date: Fri, 24 Aug 2012 14:45:29 +0200 Message-ID: <20120824124529.GD5371@phenom.ffwll.local> References: <1343032047-5713-1-git-send-email-daniel.vetter@ffwll.ch> <1343035613-30104-1-git-send-email-daniel.vetter@ffwll.ch> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail-wi0-f171.google.com (mail-wi0-f171.google.com [209.85.212.171]) by gabe.freedesktop.org (Postfix) with ESMTP id ECFD09F593 for ; Fri, 24 Aug 2012 05:45:06 -0700 (PDT) Received: by wibhq4 with SMTP id hq4so662888wib.12 for ; Fri, 24 Aug 2012 05:45:05 -0700 (PDT) Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org To: Dave Airlie Cc: Daniel Vetter , DRI Development List-Id: dri-devel@lists.freedesktop.org On Fri, Aug 24, 2012 at 02:13:33PM +1000, Dave Airlie wrote: > On Mon, Jul 23, 2012 at 7:26 PM, Daniel Vetter wrote: > > Well, actually add some, because currently there's exactly none: > > - in handle_to_fd we only take the file_priv's prime lock, but the > > underlying gem object we're exporting isn't per-file. > > - in each driver's dma_buf->release callback we don't grab any locks > > at all. > > Finally got to this patch and applied it to my test box and it > explodes in the following > > start X, bind udl as an output slave, set a mode on the udl, then kill > the X server, > > I get an oops on i915/radeon/nouveau, like I have to admit that I don't see what's happening here :( But thinking some more about the locking issues and race my patch tried to fix I concluded that it's the wrong approach anyway - we still have ugly issues left. The fundamental problem is that for the exporter we have 2 refcounts, the gem refcount and the dma_buf refcount, but we should only destroy the gem object + it's dma_buf export if both are 0. And keep everything around otherwise to avoid nasty issues with re-export or reaping userspace-facing resources like the mmap_offset. But I haven't yet grown clue how to handle this any better. Slightly related: Jani discovered that we seem to have a leak, putting him on cc just in case he's the one with clue ;-) I've looked again at the other two patches inspired by this one, which fix some races between gem names & gem flink. And I think those two are still valid. Yours, Daniel > > [ 8046.363596] general protection fault: 0000 [#1] SMP > [ 8046.364012] Modules linked in: fuse lockd rfcomm sunrpc bnep > tpm_bios ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter > ip6_tables nf_conntrack_ > ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_conexant arc4 > iwldvm mac80211 coretemp kvm_intel snd_hda_intel snd_hda_codec kvm > btusb snd_hwdep bluet > ooth iwlwifi i915 snd_seq snd_seq_device microcode snd_pcm cfg80211 > snd_page_alloc i2c_i801 lpc_ich e1000e mfd_core snd_timer wmi > thinkpad_acpi snd soundcore > rfkill video uinput sdhci_pci sdhci udl syscopyarea sysfillrect > sysimgblt firewire_ohci mmc_core drm_usb firewire_core crc_itu_t > yenta_socket radeon i2c_algo_ > bit drm_kms_helper ttm drm i2c_core > [ 8046.364012] CPU 0 > [ 8046.364012] Pid: 6947, comm: Xorg Tainted: G W 3.6.0-rc3+ > #7 LENOVO 406334M/406334M > [ 8046.364012] RIP: 0010:[] [] > i915_gem_unmap_dma_buf+0x5c/0xb0 [i915] > [ 8046.364012] RSP: 0018:ffff88002c22dc28 EFLAGS: 00010296 > [ 8046.364012] RAX: 0000000000000500 RBX: ffff880007d5d6d8 RCX: 0000000000000000 > [ 8046.364012] RDX: 0000000000000500 RSI: ffff8800570ca000 RDI: ffff88005b2ea5a8 > [ 8046.364012] RBP: ffff88002c22dc68 R08: 0000000000000000 R09: 0000000000000000 > [ 8046.364012] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88005b2ea5a8 > [ 8046.364012] R13: 0000000000000000 R14: 6b6b6b6b6b6b6b6b R15: ffff8800570ca000 > [ 8046.364012] FS: 00007f79955759c0(0000) GS:ffff880078c00000(0000) > knlGS:0000000000000000 > [ 8046.364012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 8046.364012] CR2: 00000031808bab90 CR3: 0000000001c0b000 CR4: 00000000000007f0 > [ 8046.364012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 8046.364012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 8046.364012] Process Xorg (pid: 6947, threadinfo ffff88002c22c000, > task ffff88005d680000) > [ 8046.364012] Stack: > [ 8046.364012] ffff880058d167b0 ffff880000000500 ffff88002c22dc48 > ffff88005f6f0f50 > [ 8046.364012] ffff88005d9622f8 ffff88005d962290 0000000000000000 > ffffffffffffffff > [ 8046.364012] ffff88002c22dc78 ffffffff81409182 ffff88002c22dc98 > ffffffffa002cfeb > [ 8046.364012] Call Trace: > [ 8046.364012] [] dma_buf_unmap_attachment+0x22/0x40 > [ 8046.364012] [] drm_prime_gem_destroy+0x2b/0x50 [drm] > [ 8046.364012] [] udl_gem_free_object+0x39/0x70 [udl] > [ 8046.364012] [] drm_gem_object_free+0x2a/0x30 [drm] > [ 8046.364012] [] > drm_gem_object_release_handle+0xa8/0xd0 [drm] > [ 8046.364012] [] idr_for_each+0xe5/0x180 > [ 8046.364012] [] ? drm_gem_vm_open+0x70/0x70 [drm] > [ 8046.364012] [] ? trace_hardirqs_on+0xd/0x10 > [ 8046.364012] [] drm_gem_release+0x24/0x40 [drm] > [ 8046.364012] [] drm_release+0x55a/0x5f0 [drm] > [ 8046.364012] [] __fput+0xfa/0x2d0 > [ 8046.364012] [] ____fput+0xe/0x10 > [ 8046.364012] [] task_work_run+0x69/0xa0 > [ 8046.364012] [] do_exit+0xa0e/0xb20 > [ 8046.364012] [] ? retint_swapgs+0x13/0x1b > [ 8046.364012] [] do_group_exit+0x4c/0xc0 > > which implies some reference count is off or some object is freed in > the wrong order. > > Dave. -- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48