From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH] drm/prime: fixup the dma buf export cache locking
Date: Fri, 24 Aug 2012 14:45:29 +0200
Message-ID: <20120824124529.GD5371@phenom.ffwll.local>
References: <1343032047-5713-1-git-send-email-daniel.vetter@ffwll.ch>
	<1343035613-30104-1-git-send-email-daniel.vetter@ffwll.ch>
	<CAPM=9tw21u2bECK9++A8-kjX7n5RZBajVBmo82gd_9r+BgL-KA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
Received: from mail-wi0-f171.google.com (mail-wi0-f171.google.com
	[209.85.212.171])
	by gabe.freedesktop.org (Postfix) with ESMTP id ECFD09F593
	for <dri-devel@lists.freedesktop.org>;
	Fri, 24 Aug 2012 05:45:06 -0700 (PDT)
Received: by wibhq4 with SMTP id hq4so662888wib.12
	for <dri-devel@lists.freedesktop.org>;
	Fri, 24 Aug 2012 05:45:05 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <CAPM=9tw21u2bECK9++A8-kjX7n5RZBajVBmo82gd_9r+BgL-KA@mail.gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, DRI Development <dri-devel@lists.freedesktop.org>
List-Id: dri-devel@lists.freedesktop.org

On Fri, Aug 24, 2012 at 02:13:33PM +1000, Dave Airlie wrote:
> On Mon, Jul 23, 2012 at 7:26 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > Well, actually add some, because currently there's exactly none:
> > - in handle_to_fd we only take the file_priv's prime lock, but the
> >   underlying gem object we're exporting isn't per-file.
> > - in each driver's dma_buf->release callback we don't grab any locks
> >   at all.
> 
> Finally got to this patch and applied it to my test box and it
> explodes in the following
> 
> start X, bind udl as an output slave, set a mode on the udl, then kill
> the X server,
> 
> I get an oops on i915/radeon/nouveau, like

I have to admit that I don't see what's happening here :( But thinking
some more about the locking issues and race my patch tried to fix I
concluded that it's the wrong approach anyway - we still have ugly issues
left. The fundamental problem is that for the exporter we have 2
refcounts, the gem refcount and the dma_buf refcount, but we should only
destroy the gem object + it's dma_buf export if both are 0. And keep
everything around otherwise to avoid nasty issues with re-export or
reaping userspace-facing resources like the mmap_offset. But I haven't yet
grown clue how to handle this any better.

Slightly related: Jani discovered that we seem to have a leak, putting him
on cc just in case he's the one with clue ;-)

I've looked again at the other two patches inspired by this one, which fix
some races between gem names & gem flink. And I think those two are still
valid.

Yours, Daniel
> 
> [ 8046.363596] general protection fault: 0000 [#1] SMP
> [ 8046.364012] Modules linked in: fuse lockd rfcomm sunrpc bnep
> tpm_bios ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
> ip6_tables nf_conntrack_
> ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_conexant arc4
> iwldvm mac80211 coretemp kvm_intel snd_hda_intel snd_hda_codec kvm
> btusb snd_hwdep bluet
> ooth iwlwifi i915 snd_seq snd_seq_device microcode snd_pcm cfg80211
> snd_page_alloc i2c_i801 lpc_ich e1000e mfd_core snd_timer wmi
> thinkpad_acpi snd soundcore
> rfkill video uinput sdhci_pci sdhci udl syscopyarea sysfillrect
> sysimgblt firewire_ohci mmc_core drm_usb firewire_core crc_itu_t
> yenta_socket radeon i2c_algo_
> bit drm_kms_helper ttm drm i2c_core
> [ 8046.364012] CPU 0
> [ 8046.364012] Pid: 6947, comm: Xorg Tainted: G        W    3.6.0-rc3+
> #7 LENOVO 406334M/406334M
> [ 8046.364012] RIP: 0010:[<ffffffffa0388e0c>]  [<ffffffffa0388e0c>]
> i915_gem_unmap_dma_buf+0x5c/0xb0 [i915]
> [ 8046.364012] RSP: 0018:ffff88002c22dc28  EFLAGS: 00010296
> [ 8046.364012] RAX: 0000000000000500 RBX: ffff880007d5d6d8 RCX: 0000000000000000
> [ 8046.364012] RDX: 0000000000000500 RSI: ffff8800570ca000 RDI: ffff88005b2ea5a8
> [ 8046.364012] RBP: ffff88002c22dc68 R08: 0000000000000000 R09: 0000000000000000
> [ 8046.364012] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88005b2ea5a8
> [ 8046.364012] R13: 0000000000000000 R14: 6b6b6b6b6b6b6b6b R15: ffff8800570ca000
> [ 8046.364012] FS:  00007f79955759c0(0000) GS:ffff880078c00000(0000)
> knlGS:0000000000000000
> [ 8046.364012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8046.364012] CR2: 00000031808bab90 CR3: 0000000001c0b000 CR4: 00000000000007f0
> [ 8046.364012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 8046.364012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 8046.364012] Process Xorg (pid: 6947, threadinfo ffff88002c22c000,
> task ffff88005d680000)
> [ 8046.364012] Stack:
> [ 8046.364012]  ffff880058d167b0 ffff880000000500 ffff88002c22dc48
> ffff88005f6f0f50
> [ 8046.364012]  ffff88005d9622f8 ffff88005d962290 0000000000000000
> ffffffffffffffff
> [ 8046.364012]  ffff88002c22dc78 ffffffff81409182 ffff88002c22dc98
> ffffffffa002cfeb
> [ 8046.364012] Call Trace:
> [ 8046.364012]  [<ffffffff81409182>] dma_buf_unmap_attachment+0x22/0x40
> [ 8046.364012]  [<ffffffffa002cfeb>] drm_prime_gem_destroy+0x2b/0x50 [drm]
> [ 8046.364012]  [<ffffffffa01f9cb9>] udl_gem_free_object+0x39/0x70 [udl]
> [ 8046.364012]  [<ffffffffa001695a>] drm_gem_object_free+0x2a/0x30 [drm]
> [ 8046.364012]  [<ffffffffa00171a8>]
> drm_gem_object_release_handle+0xa8/0xd0 [drm]
> [ 8046.364012]  [<ffffffff81319b85>] idr_for_each+0xe5/0x180
> [ 8046.364012]  [<ffffffffa0017100>] ? drm_gem_vm_open+0x70/0x70 [drm]
> [ 8046.364012]  [<ffffffff810bea2d>] ? trace_hardirqs_on+0xd/0x10
> [ 8046.364012]  [<ffffffffa0017804>] drm_gem_release+0x24/0x40 [drm]
> [ 8046.364012]  [<ffffffffa0015e1a>] drm_release+0x55a/0x5f0 [drm]
> [ 8046.364012]  [<ffffffff811a908a>] __fput+0xfa/0x2d0
> [ 8046.364012]  [<ffffffff811a926e>] ____fput+0xe/0x10
> [ 8046.364012]  [<ffffffff81076c89>] task_work_run+0x69/0xa0
> [ 8046.364012]  [<ffffffff81056aee>] do_exit+0xa0e/0xb20
> [ 8046.364012]  [<ffffffff816887d5>] ? retint_swapgs+0x13/0x1b
> [ 8046.364012]  [<ffffffff81056f4c>] do_group_exit+0x4c/0xc0
> 
> which implies some reference count is off or some object is freed in
> the wrong order.
> 
> Dave.

-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48