From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Subject: Re: [PATCH 1/6] drm/ttm: remove lru_lock around ttm_bo_reserve
Date: Thu, 06 Dec 2012 10:55:53 +0100
Message-ID: <50C06BA9.9030604@canonical.com>
References: <50B8A254.904@canonical.com>
	<1354277580-17958-1-git-send-email-maarten.lankhorst@canonical.com>
	<CAH3drwbe9X-1Kd6rx60hDjkU=eGwQ6DpZ=zetDh9JWW0cxLW5w@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
Received: from youngberry.canonical.com (youngberry.canonical.com
	[91.189.89.112])
	by gabe.freedesktop.org (Postfix) with ESMTP id EAF8EE66C1
	for <dri-devel@lists.freedesktop.org>;
	Thu,  6 Dec 2012 01:55:56 -0800 (PST)
In-Reply-To: <CAH3drwbe9X-1Kd6rx60hDjkU=eGwQ6DpZ=zetDh9JWW0cxLW5w@mail.gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Jerome Glisse <j.glisse@gmail.com>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>, thellstrom@vmware.com, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

Op 06-12-12 02:19, Jerome Glisse schreef:
> On Fri, Nov 30, 2012 at 7:12 AM, Maarten Lankhorst
> <m.b.lankhorst@gmail.com> wrote:
>> There should no longer be assumptions that reserve will always succeed
>> with the lru lock held, so we can safely break the whole atomic
>> reserve/lru thing. As a bonus this fixes most lockdep annotations for
>> reservations.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>> ---
>>  drivers/gpu/drm/ttm/ttm_bo.c           | 54 ++++++++++++++++++++++------------
>>  drivers/gpu/drm/ttm/ttm_execbuf_util.c |  2 +-
>>  include/drm/ttm/ttm_bo_driver.h        | 19 +++---------
>>  3 files changed, 40 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 9028327..61b5cd0 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -213,14 +213,13 @@ int ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
>>         return put_count;
>>  }
>>
>> -int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
>> +int ttm_bo_reserve_nolru(struct ttm_buffer_object *bo,
>>                           bool interruptible,
>>                           bool no_wait, bool use_sequence, uint32_t sequence)
>>  {
>> -       struct ttm_bo_global *glob = bo->glob;
>>         int ret;
>>
>> -       while (unlikely(atomic_read(&bo->reserved) != 0)) {
>> +       while (unlikely(atomic_xchg(&bo->reserved, 1) != 0)) {
>>                 /**
>>                  * Deadlock avoidance for multi-bo reserving.
>>                  */
> Regarding memory barrier discussion we could add a smp_rdmb here or
> few line below before the read of sequence for -EAGAIN. But it's not
> really important, if a CPU doesn't see new sequence value the worst
> case is that the CPU will go to wait again before reaching back this
> section and returning EAGAIN. So it's just gonna waste some CPU cycle
> i can't see anything bad that could happen.
Ah indeed, no bad thing can happen from what I can tell.

I looked at Documentation/atomic_ops.txt again, and it says:

"If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, opsa set of interfaces are
defined which accomplish this:"

Since we check the value of atomic_xchg, I think that means memory barriers are implied,
strengthened by the fact that the arch specific xchg implementations I checked (x86 and sparc) clobber
memory, which definitely implies a compiler barrier at least, that should be good enough here. Parisc
has no native xchg op and falls back to using spin_lock_irqrestore, so it would definitely be good enough
there.

>> @@ -241,26 +240,36 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
>>                 if (no_wait)
>>                         return -EBUSY;
>>
>> -               spin_unlock(&glob->lru_lock);
>>                 ret = ttm_bo_wait_unreserved(bo, interruptible);
>> -               spin_lock(&glob->lru_lock);
>>
>>                 if (unlikely(ret))
>>                         return ret;
>>         }
>>
>> -       atomic_set(&bo->reserved, 1);
>>         if (use_sequence) {
>> +               bool wake_up = false;
>>                 /**
>>                  * Wake up waiters that may need to recheck for deadlock,
>>                  * if we decreased the sequence number.
>>                  */
>>                 if (unlikely((bo->val_seq - sequence < (1 << 31))
>>                              || !bo->seq_valid))
>> -                       wake_up_all(&bo->event_queue);
>> +                       wake_up = true;
>>
>> +               /*
>> +                * In the worst case with memory ordering these values can be
>> +                * seen in the wrong order. However since we call wake_up_all
>> +                * in that case, this will hopefully not pose a problem,
>> +                * and the worst case would only cause someone to accidentally
>> +                * hit -EAGAIN in ttm_bo_reserve when they see old value of
>> +                * val_seq. However this would only happen if seq_valid was
>> +                * written before val_seq was, and just means some slightly
>> +                * increased cpu usage
>> +                */
>>                 bo->val_seq = sequence;
>>                 bo->seq_valid = true;
> If we want we could add smp_wrmb here but see above comment on
> usefullness of this.
Yeah would be overkill. The wake_up_all takes an irqoff spinlock, which would implicitly
count as full memory barrier, and the lru lock will always be taken afterwards, which is
definitely a full memory barrier.

~Maarten