From: Peter Senna Tschudin <peter.senna@linux.intel.com>
To: "Zbigniew Kempczyński" <zbigniew.kempczynski@intel.com>
Cc: Kamil Konieczny <kamil.konieczny@linux.intel.com>,
igt-dev@lists.freedesktop.org, ryszard.knop@intel.com,
lucas.demarchi@intel.com, katarzyna.piecielska@intel.com,
Jonathan Cavitt <jonathan.cavitt@intel.com>
Subject: Re: [PATCH i-g-t v4 1/2] lib/igt_kmemleak: library to interact with kmemleak
Date: Wed, 12 Feb 2025 14:12:44 +0100 [thread overview]
Message-ID: <af2caad1-7daf-4ffb-8e04-fe6b69f9f233@linux.intel.com> (raw)
In-Reply-To: <20250212100157.zgzmorbk7rd3ogd3@zkempczy-mobl2>
On 12.02.2025 11:01, Zbigniew Kempczyński wrote:
> On Wed, Feb 12, 2025 at 10:18:17AM +0100, Peter Senna Tschudin wrote:
>>
>>
>> On 12.02.2025 09:36, Zbigniew Kempczyński wrote:
>>> On Tue, Feb 11, 2025 at 08:12:16PM +0100, Peter Senna Tschudin wrote:
>>>> Hi Zbigniew,
>>>>
>>>> Thank you for your comments, please see my answers below.
>>>>
>>>> On 11.02.2025 17:17, Zbigniew Kempczyński wrote:
>>>>> On Mon, Feb 03, 2025 at 10:23:00AM +0100, Peter Senna Tschudin wrote:
>>>>>> Hi Kamil,
>>>>>>
>>>>>> Thank you for your review. Please see below.
>>>>>>
>>>>>> On 31.01.2025 13:34, Kamil Konieczny wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>> +#include <stdio.h>
>>>>>>>> +#include <string.h>
>>>>>>>> +#include <unistd.h>
>>>>>>>> +#include <fcntl.h>
>>>>>>>> +#include <errno.h>
>>>>>>>
>>>>>>> Sort this alphabetically with exception of unistd.h - this
>>>>>>> header could be first.
>>>>>>
>>>>>> That is picky, but ok.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> +
>>>>>>>> +#include "igt_core.h"
>>>>>>>> +#include "igt_kmemleak.h"
>>>>>>>> +
>>>>>>>> +/* We can change the path for unit testing, see igt_kmemleak_init() */
>>>>>>>> +static char igt_kmemleak_file[256] = "/sys/kernel/debug/kmemleak";
>>>>>>>> +
>>>>>>>> +#define MAX_WRITE_RETRIES 5
>>>>>>>> +/**
>>>>>>>> + * igt_kmemleak_write - Writes the buffer to the file descriptor retrying when
>>>>>>>> + * possible.
>>>>>>>> + * @fd: The file descriptor to write to.
>>>>>>>> + * @buf: Pointer to the data to write.
>>>>>>>> + * @count: Total number of bytes to write.
>>>>>>>> + *
>>>>>>>> + * Returns the total number of bytes written on success, or -1 on failure.
>>>>>>>> + */
>>>>>>>> +static bool igt_kmemleak_write(int fd, const void *buf, size_t count)
>>>>>>>> +{
>>>>>>>> + const char *ptr = buf;
>>>>>>>> + size_t remaining = count;
>>>>>>>> + ssize_t written;
>>>>>>>> + int retries = 0;
>>>>>>>> +
>>>>>>>> + while (remaining > 0) {
>>>>>>>> + written = write(fd, ptr, remaining);
>>>>>>>> + if (written > 0) {
>>>>>>>> + ptr += written;
>>>>>>>> + remaining -= written;
>>>>>>>> + } else if (written == -1) {
>>>>>>>> + if (errno == EINTR || errno == EAGAIN) {
>>>>>>>> + /* Retry for recoverable errors */
>>>>>>>> + if (++retries > MAX_WRITE_RETRIES) {
>>>>>>>> + igt_warn("igt_write: Exceeded retry limit\n");
>>>>>>>
>>>>>>> Do not use test prints in lib, especially those used by runner.
>>>>>>
>>>>>> Oh, I will. kmemleak is about writing to disk. If that fails, I will tell the user.
>>>>>> Why does this restriction applies only to me? See:
>>>>>>
>>>>>> ---
>>>>>> psennats@friendship7-home:~/UPSTREAM/igt-gpu-tools/lib$ git grep igt_warn|wc -l
>>>>>> 164
>>>>>> psennats@friendship7-home:~/UPSTREAM/igt-gpu-tools/lib$ cd ..
>>>>>> psennats@friendship7-home:~/UPSTREAM/igt-gpu-tools$ git grep igt_warn|wc -l
>>>>>> 324
>>>>>> ---
>>>>>>
>>>>>> About 50% of all references to igt_warn are in lib, what is the problem with my code?
>>>>>
>>>>> I'm looking at the code and I have impression this should be part of
>>>>> the runner, not igt library. Do I understand correctly that your
>>>>> code will be used by the runner only and not by the tests?
>>>>
>>>> It is difficult to know for now. We have one test that will use the lib
>>>> as is: `tests/amdgpu/amd_mem_leak.c`. But that is not the main potential
>>>> use case.
>>>>
>>>> As I describe in the cover letter, transient leaks are kind of tricky,
>>>> and we may require to request a scan from a test itself for better
>>>> coverage. So in short I honestly do not know if we will have test
>>>> integration in the future or not.
>>>
>>> Fundamental question is - where do you want to scan/clear - on the runner
>>> level or on the test? What will happen if you start to clear-scan/clear
>>> both from runner (-k) and from the test itself. I've intentionally
>>> added clear first to limit catched leaks to period where test was
>>> executed.
>>
>> I do not know know. There may be uses for both. Currently I went for the
>> runner for two reasons: it is simple, and produced good results.
>>
>> However for transient leaks the key seems to be timing and then igt_runner
>> does not offer enough granularity. We may need clears and scans at very
>> specific places in the code.
>>
>> My goal is to have CI experiment with kmemleak on igt_runner, collect
>> developer feedback, and then decide if the next step of trying to figure
>> it out how to have a more granular test-level scan makes sense and if is
>> suitable for our use case.
>>
>> Detecting transient leaks is an open problem.
>>
>>>
>>> On the runner level we would collect for all tests, on the test level
>>> I imagine command line argument passed to the test like -k which would
>>> enable the scan on the very beginning. But this would require to rewrite
>>> all tests which are using pure main() instead of igt_main.*().
>>> If really there's leak and someone wants to catch it, he can always
>>> simply wrap around script execution by kmemleak clear-scan/clear.
>>
>> Currently we have -keach and -konce with -keach scanning between each test.
>> As I said, this produces results, but is not effective for transient leaks.
>>
>> Again, I do not know. If the initial evaluation indicates that there may be
>> a benefit on extending the tests to trigger kmemleak scans, I see no reason
>> for not doing it.
>>
>>
>>>
>>> Resume - I would stick to the runner level, I see no judgement to change
>>> igt_core / tests to use this library directly.
>>
>> Resume: I prefer to wait until the initial feedback from CI and developers.
>> I will be happy to move this to runner later if we decide to not walk the
>> unknown road of trying to catch as many transient leaks as we can.
>>
>> That being said, let me know how to move forward.
>
> Question is to CI folks, for me this scan should be executed as
> dedicated CI run with kmemleak on. And I repeat - I don't see reason
> we should add this support to the tests, as few line script which
> wraps around test by kmemleak clear/scan is enough. We already have
> scripts directory so it is good place for such wrapper.
So I am moving the code to the runner directory, and using printf() instead
of igt_warn(). Any other request for the series?
[...]
next prev parent reply other threads:[~2025-02-12 13:12 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-28 15:15 [PATCH i-g-t v4 0/2] Integrate kmemleak scans in igt_runner Peter Senna Tschudin
2025-01-28 15:15 ` [PATCH i-g-t v4 1/2] lib/igt_kmemleak: library to interact with kmemleak Peter Senna Tschudin
2025-01-31 12:34 ` Kamil Konieczny
2025-02-03 9:23 ` Peter Senna Tschudin
2025-02-11 16:17 ` Zbigniew Kempczyński
2025-02-11 19:12 ` Peter Senna Tschudin
2025-02-12 8:36 ` Zbigniew Kempczyński
2025-02-12 9:18 ` Peter Senna Tschudin
2025-02-12 10:01 ` Zbigniew Kempczyński
2025-02-12 13:12 ` Peter Senna Tschudin [this message]
2025-02-12 13:41 ` Kamil Konieczny
2025-02-12 14:57 ` Peter Senna Tschudin
2025-02-25 13:46 ` Peter Senna Tschudin
2025-02-25 18:06 ` Peter Senna Tschudin
2025-01-31 12:38 ` Kamil Konieczny
2025-02-03 9:14 ` Peter Senna Tschudin
2025-02-12 14:26 ` Kamil Konieczny
2025-01-28 15:15 ` [PATCH i-g-t v4 2/2] runner/executor: Integrate igt_kmemleak scans Peter Senna Tschudin
2025-01-31 12:50 ` Kamil Konieczny
2025-02-03 9:10 ` Peter Senna Tschudin
2025-02-12 14:24 ` Kamil Konieczny
2025-02-12 14:53 ` Peter Senna Tschudin
2025-01-28 20:48 ` ✓ i915.CI.BAT: success for Integrate kmemleak scans in igt_runner (rev2) Patchwork
2025-01-28 20:49 ` Patchwork
2025-01-28 20:52 ` ✓ Xe.CI.BAT: " Patchwork
2025-01-29 10:40 ` ✗ Xe.CI.Full: failure " Patchwork
2025-01-29 13:18 ` Peter Senna Tschudin
2025-01-29 11:25 ` ✗ i915.CI.Full: " Patchwork
2025-01-29 13:19 ` Peter Senna Tschudin
2025-01-30 6:18 ` Ravali, JupallyX
2025-01-30 5:26 ` ✓ i915.CI.Full: success " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=af2caad1-7daf-4ffb-8e04-fe6b69f9f233@linux.intel.com \
--to=peter.senna@linux.intel.com \
--cc=igt-dev@lists.freedesktop.org \
--cc=jonathan.cavitt@intel.com \
--cc=kamil.konieczny@linux.intel.com \
--cc=katarzyna.piecielska@intel.com \
--cc=lucas.demarchi@intel.com \
--cc=ryszard.knop@intel.com \
--cc=zbigniew.kempczynski@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox