Re: buggy/weird behavior in ttm

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
To: Thomas Hellstrom <thellstrom@vmware.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: buggy/weird behavior in ttm
Date: Thu, 11 Oct 2012 22:55:45 +0200	[thread overview]
Message-ID: <50773251.9060205@canonical.com> (raw)
In-Reply-To: <50771D50.5000104@vmware.com>

Op 11-10-12 21:26, Thomas Hellstrom schreef:
> On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
>
>>
>>> Anyway, if you plan to remove the fence lock and protect it with reserve, you must
>>> make sure that a waiting reserve is never done in a destruction path. I think this
>>> mostly concerns the nvidia driver.
>> Well I don't think any lock should ever be held during destruction time,
>
> What I mean is, that *something* needs to protect the fence pointer. Currently it's the
> fence lock, and I was assuming you'd protect it with reserve. And neither TTM nor
> Nvidia should, when a resource is about to be freed, be forced to *block* waiting for
> reserve just to access the fence pointer. When and if you have a solution that fulfills
> those requirements, I'm ready to review it.
It's not blocking, cleanup_refs_or_queue will toss it on the deferred list if reservation fails,
behavior doesn't change just because I changed the order around.
>>
>>>> - no_wait_reserve is ignored if no_wait_gpu is false
>>>>     ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but
>>>>     subsequently it will do a wait_unreserved if no_wait_gpu is false.
>>>> I'm planning on removing this argument and act like it is always true, since
>>>> nothing on the lru list should fail to reserve currently.
>>> Yes, since all buffers that are reserved are removed from the LRU list, there
>>> should never be a waiting reserve on them, so no_wait_reserve can be removed
>>> from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
>> I suppose there will stay a small race though,
>
> Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't reserved
away from under you.

>>>> - effectively unlimited callchain between some functions that all go through
>>>>     ttm_mem_evict_first:
>>>>
>>>>                                       /------------------------\
>>>> ttm_mem_evict_first - ttm_bo_evict -                          -ttm_bo_mem_space  - ttm_bo_mem_force_space - ttm_mem_evict_first
>>>>                                       \ ttm_bo_handle_move_mem /
>>>> I'm not surprised that there was a deadlock before, it seems to me it would
>>>> be pretty suicidal to ever do a blocking reserve on any of those lists,
>>>> lockdep would be all over you for this.
>>> Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth
>>> and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one.
>>> What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted
>>> to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be
>>> a BUG.
>>>
>>> But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock.
>>> But there should be no waiting reserves in the eviction path currently.
>> Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve.
>> Fixing this might mean that ttm_mem_evict_first may need to become more aggressive,
>> since it seems all the callers of this function assume that ttm_mem_evict_first can only fail
>> if there is really nothing more to free and blocking nested would really upset lockdep
>> and leave you open to the same deadlocks.
> I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a deadlock,
> because the buffer about to be reserved is always *last* in a reservation sequence, and the
> reservation is always released (or the buffer destroyed) before trying to reserve another buffer.
> Technically the buffer is not looked up from a LRU list but from the delayed delete list.
> Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not technically
be a deadlock, lockdep will flag you for blocking on this anyway, since the only reason it
would not be a deadlock is if you know the exact semantics of why.
>> An unexpected bonus from adding this skipping would be that the atomic lru_list
>> removal on unreserve guarantee becomes a lot easier to drop while keeping behavior
>> exactly the same. Only the swapout code would need to be adjusted for to try the whole
>> list. Maybe ttm_bo_delayed_delete with remove_all = false too would change behavior
>> slightly too, but this seems to be less likely, as it could only ever happen on delayed
>> destruction of imported dma-buf's.
>
>
> May I kindly remind you that the atomic lru_list removal stays in TTM
> until we've had a high level design discussion weighing it against an extended
> reservation object API.
I'm aware, but I still wanted to see if it was possible or not in a clean way for testing,
so I don't ask for things that would be impossible to do or too involved to do in a safe
manner.

Will you make it to UDS-r?

~Maarten

next prev parent reply	other threads:[~2012-10-11 20:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-11 14:50 buggy/weird behavior in ttm Maarten Lankhorst
2012-10-11 16:57 ` Thomas Hellstrom
2012-10-11 18:42   ` Maarten Lankhorst
2012-10-11 19:26     ` Thomas Hellstrom
2012-10-11 19:35       ` Thomas Hellstrom
2012-10-11 20:55       ` Maarten Lankhorst [this message]
2012-10-12  5:57         ` Thomas Hellstrom
2012-10-12  7:49           ` Maarten Lankhorst
2012-10-15 12:14             ` Thomas Hellstrom
2012-10-12 10:09           ` Maarten Lankhorst
2012-10-15 12:27             ` Thomas Hellstrom
2012-10-15 15:37               ` Maarten Lankhorst
2012-10-15 18:34                 ` Maarten Lankhorst
2012-10-15 19:44                   ` Jerome Glisse
2012-10-15 18:40                 ` Thomas Hellstrom
2012-10-15 19:32                   ` Maarten Lankhorst

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50773251.9060205@canonical.com \
    --to=maarten.lankhorst@canonical.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=thellstrom@vmware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.