public inbox for ltp@lists.linux.it
 help / color / mirror / Atom feed
* [LTP] [REGRESSION] lkft ltp for 6763a36
@ 2022-06-17  1:17 lkft
  2022-06-21  7:15 ` Joerg Vehlow
  0 siblings, 1 reply; 10+ messages in thread
From: lkft @ 2022-06-17  1:17 UTC (permalink / raw)
  To: ltp; +Cc: lkft-triage

## Build
* kernel: 5.17.15
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.17.y
* git commit: eed68052d2016d9f96d6656435099762608120e3
* git describe: 6763a36
* test details: https://qa-reports.linaro.org/lkft/ltp/build/6763a36

## Test Regressions (compared to 20220527-48-g47ebb84)
* qemu_arm, ltp-syscalls-tests
  - accept02


## Metric Regressions (compared to 20220527-48-g47ebb84)
No metric regressions found.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>


## Test Fixes (compared to 20220527-48-g47ebb84)
* qemu_arm, ltp-syscalls-tests
  - inotify12

* qemu_arm64, ltp-crypto-tests
  - af_alg07

* qemu_arm64, ltp-syscalls-tests
  - inotify12

* qemu_i386, ltp-fs-tests
  - read_all_proc

* qemu_i386, ltp-syscalls-tests
  - inotify12

* qemu_x86_64, ltp-syscalls-tests
  - inotify12


## Metric Fixes (compared to 20220527-48-g47ebb84)
No metric fixes found.

## Test result summary
total: 12654, pass: 10650, fail: 63, skip: 1941, xfail: 0

## Build Summary

## Test suites summary
* log-parser-boot
* log-parser-test
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests

--
Linaro LKFT
https://lkft.linaro.org

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-17  1:17 [LTP] [REGRESSION] lkft ltp for 6763a36 lkft
@ 2022-06-21  7:15 ` Joerg Vehlow
  2022-06-21  7:22   ` Jan Stancek
  2022-06-22  7:39   ` Martin Doucha
  0 siblings, 2 replies; 10+ messages in thread
From: Joerg Vehlow @ 2022-06-21  7:15 UTC (permalink / raw)
  To: ltp, Martin Doucha, Richard Palethorpe

Hi,

On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
> ## Build
> * kernel: 5.17.15
> * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
> * git branch: linux-5.17.y
> * git commit: eed68052d2016d9f96d6656435099762608120e3
> * git describe: 6763a36
> * test details: https://qa-reports.linaro.org/lkft/ltp/build/6763a36
> 
> ## Test Regressions (compared to 20220527-48-g47ebb84)
> * qemu_arm, ltp-syscalls-tests
>   - accept02
> 
> 
> ## Metric Regressions (compared to 20220527-48-g47ebb84)
> No metric regressions found.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> 
> 
> ## Test Fixes (compared to 20220527-48-g47ebb84)
> * qemu_arm, ltp-syscalls-tests
>   - inotify12
> 
> * qemu_arm64, ltp-crypto-tests
>   - af_alg07
@Martin
This test is very unstable; can we do anything about it?

> 
> * qemu_arm64, ltp-syscalls-tests
>   - inotify12
> 
> * qemu_i386, ltp-fs-tests
>   - read_all_proc
I've seen this test fail a lot; has anyone ever tried to analyze it? I
was unable to reproduce the problem when running the test in isolation.


> 
> * qemu_i386, ltp-syscalls-tests
>   - inotify12
> 
> * qemu_x86_64, ltp-syscalls-tests
>   - inotify12
> 

Joerg

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21  7:15 ` Joerg Vehlow
@ 2022-06-21  7:22   ` Jan Stancek
  2022-06-21  7:56     ` Joerg Vehlow
  2022-06-22  7:39   ` Martin Doucha
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Stancek @ 2022-06-21  7:22 UTC (permalink / raw)
  To: Joerg Vehlow; +Cc: LTP List, Richard Palethorpe

On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
>
> Hi,
>
> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
> > ## Build
> > * kernel: 5.17.15
> > * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
> > * git branch: linux-5.17.y
> > * git commit: eed68052d2016d9f96d6656435099762608120e3
> > * git describe: 6763a36
> > * test details: https://qa-reports.linaro.org/lkft/ltp/build/6763a36
> >
> > ## Test Regressions (compared to 20220527-48-g47ebb84)
> > * qemu_arm, ltp-syscalls-tests
> >   - accept02
> >
> >
> > ## Metric Regressions (compared to 20220527-48-g47ebb84)
> > No metric regressions found.
> >
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> >
> >
> > ## Test Fixes (compared to 20220527-48-g47ebb84)
> > * qemu_arm, ltp-syscalls-tests
> >   - inotify12
> >
> > * qemu_arm64, ltp-crypto-tests
> >   - af_alg07
> @Martin
> This test is very unstable, can we do anything about it?
>
> >
> > * qemu_arm64, ltp-syscalls-tests
> >   - inotify12
> >
> > * qemu_i386, ltp-fs-tests
> >   - read_all_proc
> I've seen this test fail a lot, has anyone ever tried to analyze it? I
> was unable to reproduce the problem when running the test in isolation.

I see it hit timeouts too (read_all_sys as well). I think it needs its
runtime restored to 5 minutes as well; at the moment it has 30s.

>
>
> >
> > * qemu_i386, ltp-syscalls-tests
> >   - inotify12
> >
> > * qemu_x86_64, ltp-syscalls-tests
> >   - inotify12
> >
>
> Joerg
>
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp
>


-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21  7:22   ` Jan Stancek
@ 2022-06-21  7:56     ` Joerg Vehlow
  2022-06-21  8:35       ` Richard Palethorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Joerg Vehlow @ 2022-06-21  7:56 UTC (permalink / raw)
  To: Jan Stancek; +Cc: LTP List, Richard Palethorpe

Hi Jan,

On 6/21/2022 at 9:22 AM, Jan Stancek wrote:
> On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
>>
>> Hi,
>>
>> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
>>> * qemu_i386, ltp-fs-tests
>>>   - read_all_proc
>> I've seen this test fail a lot, has anyone ever tried to analyze it? I
>> was unable to reproduce the problem when running the test in isolation.
> 
> I see it hit timeouts too (read_all_sys as well). I think it needs
> runtime restored to 5minutes as well, atm. it has 30s.
I hadn't thought about that, but at least for the failures I've seen,
this is not the reason. The message printed by the test is "Test
timeout 5 minutes exceeded."

Joerg

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21  7:56     ` Joerg Vehlow
@ 2022-06-21  8:35       ` Richard Palethorpe
  2022-06-21  9:45         ` Li Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Palethorpe @ 2022-06-21  8:35 UTC (permalink / raw)
  To: Joerg Vehlow; +Cc: LTP List

Hello,

Joerg Vehlow <lkml@jv-coder.de> writes:

> Hi Jan,
>
> On 6/21/2022 at 9:22 AM, Jan Stancek wrote:
>> On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
>>>
>>> Hi,
>>>
>>> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
>>>> * qemu_i386, ltp-fs-tests
>>>>   - read_all_proc
>>> I've seen this test fail a lot, has anyone ever tried to analyze it? I
>>> was unable to reproduce the problem when running the test in isolation.
>> 
>> I see it hit timeouts too (read_all_sys as well). I think it needs
>> runtime restored to 5minutes as well, atm. it has 30s.
> Didn't think about that, but at least for the failures I've seen, this
> is not the reason. The message printed by the test is "Test timeout 5
> minutes exceeded."
>
> Joerg

The main issue with read_all is that it also acts as a stress
test. Reading some files in proc and sys is very resource-intensive
(e.g. due to lock contention), and the cost varies depending on what
state the system is in. On some systems this test will take a long
time. Also, there are some files which have to be filtered out of the
test; this varies by system as well.
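
For illustration, a minimal sketch of how such a per-system filter can
look (the patterns below are invented examples, not the actual read_all
exclusion list):

#include <fnmatch.h>
#include <stddef.h>

/* Example exclusion patterns; a real list would be tuned per system. */
static const char *const denylist[] = {
    "/proc/kmsg",               /* blocks until kernel messages arrive */
    "/sys/kernel/debug/*",      /* hypothetical: slow or has side effects */
    NULL,
};

static int is_excluded(const char *path)
{
    for (size_t i = 0; denylist[i]; i++) {
        if (!fnmatch(denylist[i], path, 0))
            return 1;           /* skip this file */
    }
    return 0;
}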

-- 
Thank you,
Richard.

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21  8:35       ` Richard Palethorpe
@ 2022-06-21  9:45         ` Li Wang
  2022-06-21 11:38           ` Richard Palethorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Li Wang @ 2022-06-21  9:45 UTC (permalink / raw)
  To: Richard Palethorpe; +Cc: LTP List


On Tue, Jun 21, 2022 at 4:56 PM Richard Palethorpe <rpalethorpe@suse.de>
wrote:

> Hello,
>
> Joerg Vehlow <lkml@jv-coder.de> writes:
>
> > Hi Jan,
> >
> > On 6/21/2022 at 9:22 AM, Jan Stancek wrote:
> >> On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
> >>>> * qemu_i386, ltp-fs-tests
> >>>>   - read_all_proc
> >>> I've seen this test fail a lot, has anyone ever tried to analyze it? I
> >>> was unable to reproduce the problem when running the test in isolation.
> >>
> >> I see it hit timeouts too (read_all_sys as well). I think it needs
> >> runtime restored to 5minutes as well, atm. it has 30s.
> > Didn't think about that, but at least for the failures I've seen, this
> > is not the reason. The message printed by the test is "Test timeout 5
> > minutes exceeded."
> >
> > Joerg
>
> The main issue with read_all is that it also acts as a stress
> test. Reading some files in proc and sys is very resource intensive
> (e.g. due to lock contention) and varies depending on what state the
> system is in. On some systems this test will take a long time. Also
> there are some files which have to be filtered from the test. This
> varies by system as well.
>

Does it make sense to have a lite version of read_all_sys, one which
only goes through files sequentially or under slight stress?

With regard to this stressful read_all, I guess we could put it into a
dedicated set and run it separately as stress testing.

-- 
Regards,
Li Wang

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21  9:45         ` Li Wang
@ 2022-06-21 11:38           ` Richard Palethorpe
  2022-06-21 12:51             ` Joerg Vehlow
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Palethorpe @ 2022-06-21 11:38 UTC (permalink / raw)
  To: Li Wang; +Cc: LTP List

Hello Li,

Li Wang <liwang@redhat.com> writes:

> On Tue, Jun 21, 2022 at 4:56 PM Richard Palethorpe <rpalethorpe@suse.de> wrote:
>
>  Hello,
>
>  Joerg Vehlow <lkml@jv-coder.de> writes:
>
>  > Hi Jan,
>  >
>  > On 6/21/2022 at 9:22 AM, Jan Stancek wrote:
>  >> On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
>  >>>
>  >>> Hi,
>  >>>
>  >>> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
>  >>>> * qemu_i386, ltp-fs-tests
>  >>>>   - read_all_proc
>  >>> I've seen this test fail a lot, has anyone ever tried to analyze it? I
>  >>> was unable to reproduce the problem when running the test in isolation.
>  >> 
>  >> I see it hit timeouts too (read_all_sys as well). I think it needs
>  >> runtime restored to 5minutes as well, atm. it has 30s.
>  > Didn't think about that, but at least for the failures I've seen, this
>  > is not the reason. The message printed by the test is "Test timeout 5
>  > minutes exceeded."
>  >
>  > Joerg
>
>  The main issue with read_all is that it also acts as a stress
>  test. Reading some files in proc and sys is very resource intensive
>  (e.g. due to lock contention) and varies depending on what state the
>  system is in. On some systems this test will take a long time. Also
>  there are some files which have to be filtered from the test. This
>  varies by system as well.
>
> Does it make sense to have a lite version of read_all_sys?
> which may only go through files sequentially or under slight stress.

IIRC the reason I started doing it in parallel is that sequential
opens and reads are even slower and less reliable. Some level of
parallelism is required, but too much of it causes issues.

Thinking about it now, on a single- or two-core system only one worker
process will be spawned, which could get blocked for a long time on
some reads because of the way some sys/proc files are implemented.

The worker count can be overridden with -w if someone wants to try
increasing it to see whether that actually helps on systems with <3
CPUs. Also, the number of reads is set to 3 in the runtest file; that
can be reduced to 1 with -r.

>
> With regard to this stressful read_all, I guess we can put into a dedicated
> set and run separately in stress testing.

I don't think I'd want to run that. IMO just doing enough to test
parallel accesses is what's required; beyond that we run into
diminishing returns. However, I'm not against creating another runtest
file/entry for that.

On bigger systems I think the test is already quite limited even though
it does 3 reads. It only spawns a maximum of 15 workers, which should
prevent it from causing huge lock contention on machines with >16 CPUs.
At least I've not seen problems with that.
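
As a rough sketch of that sizing logic (not the actual read_all.c code;
the helper name and the exact formula are assumptions based on the
description above):

#include <unistd.h>

#define MAX_WORKERS 15              /* cap mentioned above */

/* requested comes from the -w option; 0 means "choose automatically" */
static long pick_worker_count(long requested)
{
    long cpus, workers;

    if (requested > 0)
        return requested;

    cpus = sysconf(_SC_NPROCESSORS_ONLN);
    workers = cpus - 1;             /* one or two CPUs -> a single worker */

    if (workers < 1)
        workers = 1;
    if (workers > MAX_WORKERS)
        workers = MAX_WORKERS;

    return workers;
}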

It looks like the log from lkft is for a smaller machine?

-- 
Thank you,
Richard.

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21 11:38           ` Richard Palethorpe
@ 2022-06-21 12:51             ` Joerg Vehlow
  2022-06-23 10:51               ` Richard Palethorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Joerg Vehlow @ 2022-06-21 12:51 UTC (permalink / raw)
  To: rpalethorpe, Li Wang; +Cc: LTP List

Hi,

On 6/21/2022 at 1:38 PM, Richard Palethorpe wrote:
> Hello Li,
> 
> Li Wang <liwang@redhat.com> writes:
> 
>> On Tue, Jun 21, 2022 at 4:56 PM Richard Palethorpe <rpalethorpe@suse.de> wrote:
>>
>>  Hello,
>>
>>  Joerg Vehlow <lkml@jv-coder.de> writes:
>>
>>  > Hi Jan,
>>  >
>>  > On 6/21/2022 at 9:22 AM, Jan Stancek wrote:
>>  >> On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
>>  >>>
>>  >>> Hi,
>>  >>>
>>  >>> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
>>  >>>> * qemu_i386, ltp-fs-tests
>>  >>>>   - read_all_proc
>>  >>> I've seen this test fail a lot, has anyone ever tried to analyze it? I
>>  >>> was unable to reproduce the problem when running the test in isolation.
>>  >> 
>>  >> I see it hit timeouts too (read_all_sys as well). I think it needs
>>  >> runtime restored to 5minutes as well, atm. it has 30s.
>>  > Didn't think about that, but at least for the failures I've seen, this
>>  > is not the reason. The message printed by the test is "Test timeout 5
>>  > minutes exceeded."
>>  >
>>  > Joerg
>>
>>  The main issue with read_all is that it also acts as a stress
>>  test. Reading some files in proc and sys is very resource intensive
>>  (e.g. due to lock contention) and varies depending on what state the
>>  system is in. On some systems this test will take a long time. Also
>>  there are some files which have to be filtered from the test. This
>>  varies by system as well.
>>
>> Does it make sense to have a lite version of read_all_sys?
>> which may only go through files sequentially or under slight stress.
> 
> IIRC the reason I started doing it in parallel is because sequential
> opens and reads are even slower and unreliable. Some level of parallism
> is required, but too much and it causes issues.
> 
> Thinking about it now, on a single or two core system only one worker
> process will be spawned. Which could get blocked for a long time on some
> reads because of the way some sys/proc files are implemented.
> 
> The worker count can be overridden with -w if someone wants to try
> increasing it to see if that actually helps on systems with <3
> cpus. Also the number of reads is set to 3 in the runtest file, that can
> be reduced to 1 with -r.
> 
>>
>> With regard to this stressful read_all, I guess we can put into a dedicated
>> set and run separately in stress testing.
> 
> I don't think I'd want to run that. IMO just doing enough to test
> parallel accesses is whats required. More than that we will run into
> diminishing returns . However I'm not against creating another runtest
> file/entry for that.
> 
> On bigger systems I think the test is already quite limited even though
> it does 3 reads. It only spwans a max of 15 workers which should prevent
> it from causing huge lock contention on machines with >16 CPUs. At least
> I've not seen problems with that.
> 
> It looks like the log from lkft is for a smaller machine?
I just used this regression report as an anchor point, because I am
seeing the same intermittent error on a 4-core and an 8-core aarch64
system. The system state at the time of the test execution is very
reproducible, and sometimes the 5 minutes are exceeded, while the test
only takes ~3s when it is successful. Maybe there is a very
time-sensitive kernel bug here? I am still not sure how to debug this,
because I was never able to reproduce it without executing all the LTP
tests that run before it in our setup.

Joerg

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21  7:15 ` Joerg Vehlow
  2022-06-21  7:22   ` Jan Stancek
@ 2022-06-22  7:39   ` Martin Doucha
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Doucha @ 2022-06-22  7:39 UTC (permalink / raw)
  To: Joerg Vehlow, ltp, Richard Palethorpe

On 21. 06. 22 9:15, Joerg Vehlow wrote:
>> * qemu_arm64, ltp-crypto-tests
>>   - af_alg07
> @Martin
> This test is very unstable, can we do anything about it?

The only thing that might improve stability is to increase the timeout
back to the default 5 minutes. The af_alg07 timeout is currently set to
150 seconds, and the fzsync library will exit after half that time.
Unfortunately, there is no reliable way to test the bug, or even the
presence of a supposed fix.
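
Purely to illustrate the arithmetic (this is not the fzsync API): with
a 150-second timeout and a cutoff at half that time, the race loop
gives up after roughly 75 seconds.

#include <time.h>

/* Stop attempting the race once half of the allotted timeout has elapsed. */
static int out_of_time(time_t start, int timeout_sec)
{
    return time(NULL) - start >= timeout_sec / 2;
}

/* Usage sketch: while (!out_of_time(start, 150)) attempt_the_race(); */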

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [LTP] [REGRESSION] lkft ltp for 6763a36
  2022-06-21 12:51             ` Joerg Vehlow
@ 2022-06-23 10:51               ` Richard Palethorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Palethorpe @ 2022-06-23 10:51 UTC (permalink / raw)
  To: Joerg Vehlow; +Cc: LTP List

Hello Joerg,

Joerg Vehlow <lkml@jv-coder.de> writes:

> Hi,
>
> On 6/21/2022 at 1:38 PM, Richard Palethorpe wrote:
>> Hello Li,
>> 
>> Li Wang <liwang@redhat.com> writes:
>> 
>>> On Tue, Jun 21, 2022 at 4:56 PM Richard Palethorpe <rpalethorpe@suse.de> wrote:
>>>
>>>  Hello,
>>>
>>>  Joerg Vehlow <lkml@jv-coder.de> writes:
>>>
>>>  > Hi Jan,
>>>  >
>>>  > On 6/21/2022 at 9:22 AM, Jan Stancek wrote:
>>>  >> On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
>>>  >>>
>>>  >>> Hi,
>>>  >>>
>>>  >>> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
>>>  >>>> * qemu_i386, ltp-fs-tests
>>>  >>>>   - read_all_proc
>>>  >>> I've seen this test fail a lot, has anyone ever tried to analyze it? I
>>>  >>> was unable to reproduce the problem when running the test in isolation.
>>>  >> 
>>>  >> I see it hit timeouts too (read_all_sys as well). I think it needs
>>>  >> runtime restored to 5minutes as well, atm. it has 30s.
>>>  > Didn't think about that, but at least for the failures I've seen, this
>>>  > is not the reason. The message printed by the test is "Test timeout 5
>>>  > minutes exceeded."
>>>  >
>>>  > Joerg
>>>
>>>  The main issue with read_all is that it also acts as a stress
>>>  test. Reading some files in proc and sys is very resource intensive
>>>  (e.g. due to lock contention) and varies depending on what state the
>>>  system is in. On some systems this test will take a long time. Also
>>>  there are some files which have to be filtered from the test. This
>>>  varies by system as well.
>>>
>>> Does it make sense to have a lite version of read_all_sys?
>>> which may only go through files sequentially or under slight stress.
>> 
>> IIRC the reason I started doing it in parallel is because sequential
>> opens and reads are even slower and unreliable. Some level of parallism
>> is required, but too much and it causes issues.
>> 
>> Thinking about it now, on a single or two core system only one worker
>> process will be spawned. Which could get blocked for a long time on some
>> reads because of the way some sys/proc files are implemented.
>> 
>> The worker count can be overridden with -w if someone wants to try
>> increasing it to see if that actually helps on systems with <3
>> cpus. Also the number of reads is set to 3 in the runtest file, that can
>> be reduced to 1 with -r.
>> 
>>>
>>> With regard to this stressful read_all, I guess we can put into a dedicated
>>> set and run separately in stress testing.
>> 
>> I don't think I'd want to run that. IMO just doing enough to test
>> parallel accesses is whats required. More than that we will run into
>> diminishing returns . However I'm not against creating another runtest
>> file/entry for that.
>> 
>> On bigger systems I think the test is already quite limited even though
>> it does 3 reads. It only spwans a max of 15 workers which should prevent
>> it from causing huge lock contention on machines with >16 CPUs. At least
>> I've not seen problems with that.
>> 
>> It looks like the log from lkft is for a smaller machine?
> I just used this regression report as an anchor point, because I am
> seeing the same intermittent error on a 4 and an 8 core aarch64 system.
> The system state at the time of the test execution is very reproducible
> and sometimes the 5 minutes are exceeded, while it only takes ~3s, when
> it is successful. Maybe there is a very time sensitive kernel bug here?
> I am still not sure how to debug this, because I was never able to
> reproduce it without executing all ltp tests, that run before in out
> setup.

Very interesting. Well, running tests can cause files to appear in proc
and sys, including ones which remain after testing has finished. The
most obvious example is when a module is loaded and it creates some
sys files.

Also, it could be that some resources are added which are probed by
existing files, which could be time-sensitive if they are cleaned up
asynchronously.

Anyway, it should be possible to profile the open and read syscalls
with ftrace or similar, or you can just set '-v' and inspect the log.
We should also have a per-read timeout; I just haven't got around to
implementing it. Probably it requires monitoring, killing, and
restarting stuck workers due to how read is implemented on some files.
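
A very rough sketch of what such a per-read timeout could look like
(the names, the polling interval and the overall structure are
assumptions, not a proposed patch):

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read one file in a child process; kill the child if it gets stuck. */
static int read_with_timeout(const char *path, int timeout_sec)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* worker: perform the read */
        char buf[4096];
        int fd = open(path, O_RDONLY | O_NONBLOCK);

        if (fd >= 0) {
            while (read(fd, buf, sizeof(buf)) > 0)
                ;
            close(fd);
        }
        _exit(0);
    }

    /* monitor: poll for completion, ten times per second */
    for (int i = 0; i < timeout_sec * 10; i++) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 0;               /* finished in time */
        usleep(100 * 1000);
    }

    fprintf(stderr, "read of %s stuck, killing worker\n", path);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;                      /* caller can restart or skip the file */
}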

>
> Joerg


-- 
Thank you,
Richard.

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-06-23 11:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-17  1:17 [LTP] [REGRESSION] lkft ltp for 6763a36 lkft
2022-06-21  7:15 ` Joerg Vehlow
2022-06-21  7:22   ` Jan Stancek
2022-06-21  7:56     ` Joerg Vehlow
2022-06-21  8:35       ` Richard Palethorpe
2022-06-21  9:45         ` Li Wang
2022-06-21 11:38           ` Richard Palethorpe
2022-06-21 12:51             ` Joerg Vehlow
2022-06-23 10:51               ` Richard Palethorpe
2022-06-22  7:39   ` Martin Doucha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox