Re: [PATCH 2/4] drm/i915: Handle incomplete Z_FINISH for compressed error states

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/4] drm/i915: Handle incomplete Z_FINISH for compressed error states
Date: Tue, 2 Oct 2018 15:34:43 +0100	[thread overview]
Message-ID: <528f0db4-9d6f-dc89-d3bf-cf4cc6858d64@linux.intel.com> (raw)
In-Reply-To: <153848833858.23109.12163375002290599091@skylake-alporthouse-com>

On 02/10/2018 14:52, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-10-02 14:46:10)
>>
>> On 02/10/2018 14:22, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-10-02 14:13:14)
>>>>
>>>> On 02/10/2018 13:24, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2018-10-02 13:20:05)
>>>>>>
>>>>>> On 01/10/2018 20:44, Chris Wilson wrote:
>>>>>>> The final call to zlib_deflate(Z_FINISH) may require more output
>>>>>>> space to be allocated and so needs to re-invoked. Failure to do so in
>>>>>>> the current code leads to incomplete zlib streams (albeit intact due to
>>>>>>> the use of Z_SYNC_FLUSH) resulting in the occasional short object
>>>>>>> capture.
>>>>>>>
>>>>>>> Testcase: igt/i915-error-capture.js
>>>>>>> Fixes: 0a97015d45ee ("drm/i915: Compress GPU objects in error state")
>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>>>> Cc: <stable@vger.kernel.org> # v4.10+
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/i915/i915_gpu_error.c | 60 +++++++++++++++++++++------
>>>>>>>      1 file changed, 47 insertions(+), 13 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>>> index 3d5554f14dfd..ed8c16cbfaa4 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>>>>>> @@ -237,6 +237,7 @@ static int compress_page(struct compress *c,
>>>>>>>                           struct drm_i915_error_object *dst)
>>>>>>>      {
>>>>>>>          struct z_stream_s *zstream = &c->zstream;
>>>>>>> +     int flush = Z_NO_FLUSH;
>>>>>>>      
>>>>>>>          zstream->next_in = src;
>>>>>>>          if (c->tmp && i915_memcpy_from_wc(c->tmp, src, PAGE_SIZE))
>>>>>>> @@ -257,8 +258,11 @@ static int compress_page(struct compress *c,
>>>>>>>                          zstream->avail_out = PAGE_SIZE;
>>>>>>>                  }
>>>>>>
>>>>>> Looks like the block above, not shown in this diff, could use a check
>>>>>> and abort if the dst->page_count overgrows the pessimistic allocation of
>>>>>> the array.
>>>>>>
>>>>>>>      
>>>>>>> -             if (zlib_deflate(zstream, Z_SYNC_FLUSH) != Z_OK)
>>>>>>> +             if (zlib_deflate(zstream, flush) != Z_OK)
>>>>>>
>>>>>> So this (not always asking for a flush) actually not only fixes the
>>>>>> flush at the end but improves the API usage and potentially compression
>>>>>> ratio, correct?
>>>>>
>>>>> Yes.
>>>>>     
>>>>>>>                          return -EIO;
>>>>>>> +
>>>>>>> +             if (zstream->avail_out)
>>>>>>> +                     flush = Z_SYNC_FLUSH;
>>>>>>
>>>>>> Hm.. but why this? It will flush only occasionally, when one input page
>>>>>> did not fit in the available output - but that will depend on
>>>>>> compressibility so I don't think it has a fixed period. It is not for
>>>>>> instance for every 4k of compressed output, if that was maybe the goal.
>>>>>
>>>>> My thinking is that if the zlib_deflate() wants to defer to fill its
>>>>> window (or whatever) but we need to cross a page boundary, we have to
>>>>> push everything from the current page before we change the PTE.
>>>>
>>>> Hm but I am not sure this is achieving that, or that it is needed. I
>>>> read the docs many times and I am not sure how it is supposed to work.
>>>>
>>>> First thing I am not sure of is whether we need to ask for any flushes,
>>>> or it would be enough to loop until avail_in > 0 || avail_out == 0.
>>>> Since docs say that if avail_out == 0, zlib_deflate needs to be called
>>>> again in the same flush mode.
>>>
>>> Iirc, Z_SYNC_FLUSH talks about terminating the compression block earlier
>>> in order to consume next_avail, without the SYNC repeating a call is
>>> allowed to not progress.
>>>
>>>> Current code fails to ensure this. It would happen on the next call to
>>>> compress_page, but a) I think that is unclean and b) flush mode will not
>>>> match.
>>>
>>> Fails to ensure what exactly? I think it is obeying the rules on flush:
>>>
>>>     If the parameter flush is set to Z_SYNC_FLUSH, all pending output is
>>>     flushed to the output buffer and the output is aligned on a byte boundary, so
>>>     that the decompressor can get all input data available so far. (In particular
>>>     avail_in is zero after the call if enough output space has been provided
>>>     before the call.)  Flushing may degrade compression for some compression
>>>     algorithms and so it should be used only when necessary.
>>
>> I am not sure we need to flush - I don't see that it says in the docs
>> that we have to. AFAIU when avail_in drops to zero we are allowed to
>> move on. Anything outstanding in the compressor would be then flushed
>> either on processing the next page, or on stream close.
> 
> The case we want to handle is avail_in != 0. We cannot move on until
> avail_in is 0.

That's what I said! :) But I don't see in the docs that is says we have 
to flush after avail_in is zero, nor when avail_out is zero.

> 
>>>     ...
>>>
>>>     If deflate returns with avail_out == 0, this function must be called again
>>>     with the same value of the flush parameter and more output space (updated
>>>     avail_out), until the flush is complete (deflate returns with non-zero
>>>     avail_out).
>>
>> Yes, but the patch does not do that. You enable the flush mode when
>> avail_out is non zero, but then two options are possible. Either
>> avail_in is zero, in which case nothing happens, or it is not. In the
>> latter case there is another flushing call to zlib_deflate. If avail_out
>> is zero after that call, and avail_in is also zero, the documentation
>> paragraph above says you have to call zlib_deflate in the same flush
>> mode again.
> 
> Read the paragraph again, specifically when avail_out == 0 you should
> allocate more space and rerun the compressor. If avail_out != 0 and
> avail_in != 0 we want to SYNC in order to consume the remaining
> avail_in.

Sorry I don't see where it says that you have to SYNC_FLUSH in any 
circumstance apart from influencing the stream composition either for 
error recovery or interactive applications?

Furthermore, as I wrote above, when you start a flush in this patch you 
don't ensure the next zlib_deflate call is with the same flush mode 
(since it can be only on the next call to compress_page).

Z_OK && avail_out == 0 it says you have to run it again, so that too 
would be more intuitive being done in scope of a single compress_page 
call. (It doesn't say avail_in == 0 && avail_out == 0 is not possible.)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx