[PATCH v1 2/2] Add selftests for pidfd polling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: joel at joelfernandes.org (Joel Fernandes)
Subject: [PATCH v1 2/2] Add selftests for pidfd polling
Date: Fri, 26 Apr 2019 16:31:14 -0400	[thread overview]
Message-ID: <20190426203114.GA185553@google.com> (raw)
In-Reply-To: <CAKOZuetFw9MnZCrXbcFTCFD9Jq8bZwzdcEi_OYHMMv2R_sgNNA@mail.gmail.com>

On Fri, Apr 26, 2019 at 12:35:40PM -0700, Daniel Colascione wrote:
> On Fri, Apr 26, 2019 at 10:26 AM Joel Fernandes <joel at joelfernandes.org> wrote:
> > On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote:
> > > On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian at brauner.io> wrote:
> > > > This timing-based testing seems kinda odd to be honest. Can't we do
> > > > something better than this?
> > >
> > > Agreed. Timing-based tests have a substantial risk of becoming flaky.
> > > We ought to be able to make these tests fully deterministic and not
> > > subject to breakage from odd scheduling outcomes. We don't have
> > > sleepable events for everything, granted, but sleep-waiting on a
> > > condition with exponential backoff is fine in test code. In general,
> > > if you start with a robust test, you can insert a sleep(100) anywhere
> > > and not break the logic. Violating this rule always causes pain sooner
> > > or later.
> >
> > I prefer if you can be more specific about how to redesign the test. Please
> > go through the code and make suggestions there. The tests have not been flaky
> > in my experience.
> 
> You've been running them in an ideal environment.

One would hope for a reliable test environment.

> > In this case, we want to make sure that the poll unblocks at the right "time"
> > that is when the non-leader thread exits, and not when the leader thread
> > exits (test 1), or when the non-leader thread exits and not when the same
> > non-leader previous did an execve (test 2).
> 
> Instead of sleeping, you want to wait for some condition. Right now,
> in a bunch of places, the test does something like this:
> 
> do_something()
> sleep(SOME_TIMEOUT)
> check(some_condition())

No. I don't have anything like "some_condition()". My some_condition() is
just the difference in time.

> 
> You can replace each of these clauses with something like this:
> 
> do_something()
> start_time = now()
> while(!some_condition() && now() - start_time < LONG_TIMEOUT)
>   sleep(SHORT_DELAY)
> check(some_condition())
> 
> This way, you're insensitive to timing, up to LONG_TIMEOUT (which can
> be something like a minute). Yes, you can always write
> sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT
> (which should be tens of seconds) would make the test take far too
> long to run in the happy case.

Yes, but try implementing some_condition()  :-). It is easy to talk in the
abstract, I think it would be more productive if you can come up with an
implementation/patchh of the test itself and send a patch for that. I know
you wrote some pseudocode below, but it is a complex reimplementation that I
don't think will make the test more robust. I mean reading /proc/pid stat?
yuck :) You are welcome to send a patch though if you have a better
implementation.

> Note that this code is fine:
> 
> check(!some_condition())
> sleep(SOME_REASONABLE_TIMEOUT)
> check(!some_condition())
> 
> It's okay to sleep for a little while and check that something did
> *not* happen, but it's not okay for the test to *fail* due to
> scheduling delays. The difference is that

As I said, multi-second scheduling delay are really unacceptable anyway. I
bet many kselftest may fail on a "bad" system like that way, that does not
mean the test is flaky. If there are any reports in the future that the test
fails or is flaky, I am happy to address them at that time. The tests work
and catch bugs reliably as I have seen. We could also make the test task as
RT if scheduling class is a concern.

I don't think its worth bikeshedding about hypothetical issues.

thanks,

 - Joel

WARNING: multiple messages have this Message-ID (diff)

From: joel@joelfernandes.org (Joel Fernandes)
Subject: [PATCH v1 2/2] Add selftests for pidfd polling
Date: Fri, 26 Apr 2019 16:31:14 -0400	[thread overview]
Message-ID: <20190426203114.GA185553@google.com> (raw)
Message-ID: <20190426203114.ty8Az934zdxRP4m-thYAlFjRd0ll6wM8EWkEhbcC8RA@z> (raw)
In-Reply-To: <CAKOZuetFw9MnZCrXbcFTCFD9Jq8bZwzdcEi_OYHMMv2R_sgNNA@mail.gmail.com>

On Fri, Apr 26, 2019@12:35:40PM -0700, Daniel Colascione wrote:
> On Fri, Apr 26, 2019@10:26 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Thu, Apr 25, 2019@03:07:48PM -0700, Daniel Colascione wrote:
> > > On Thu, Apr 25, 2019@2:29 PM Christian Brauner <christian@brauner.io> wrote:
> > > > This timing-based testing seems kinda odd to be honest. Can't we do
> > > > something better than this?
> > >
> > > Agreed. Timing-based tests have a substantial risk of becoming flaky.
> > > We ought to be able to make these tests fully deterministic and not
> > > subject to breakage from odd scheduling outcomes. We don't have
> > > sleepable events for everything, granted, but sleep-waiting on a
> > > condition with exponential backoff is fine in test code. In general,
> > > if you start with a robust test, you can insert a sleep(100) anywhere
> > > and not break the logic. Violating this rule always causes pain sooner
> > > or later.
> >
> > I prefer if you can be more specific about how to redesign the test. Please
> > go through the code and make suggestions there. The tests have not been flaky
> > in my experience.
> 
> You've been running them in an ideal environment.

One would hope for a reliable test environment.

> > In this case, we want to make sure that the poll unblocks at the right "time"
> > that is when the non-leader thread exits, and not when the leader thread
> > exits (test 1), or when the non-leader thread exits and not when the same
> > non-leader previous did an execve (test 2).
> 
> Instead of sleeping, you want to wait for some condition. Right now,
> in a bunch of places, the test does something like this:
> 
> do_something()
> sleep(SOME_TIMEOUT)
> check(some_condition())

No. I don't have anything like "some_condition()". My some_condition() is
just the difference in time.

> 
> You can replace each of these clauses with something like this:
> 
> do_something()
> start_time = now()
> while(!some_condition() && now() - start_time < LONG_TIMEOUT)
>   sleep(SHORT_DELAY)
> check(some_condition())
> 
> This way, you're insensitive to timing, up to LONG_TIMEOUT (which can
> be something like a minute). Yes, you can always write
> sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT
> (which should be tens of seconds) would make the test take far too
> long to run in the happy case.

Yes, but try implementing some_condition()  :-). It is easy to talk in the
abstract, I think it would be more productive if you can come up with an
implementation/patchh of the test itself and send a patch for that. I know
you wrote some pseudocode below, but it is a complex reimplementation that I
don't think will make the test more robust. I mean reading /proc/pid stat?
yuck :) You are welcome to send a patch though if you have a better
implementation.

> Note that this code is fine:
> 
> check(!some_condition())
> sleep(SOME_REASONABLE_TIMEOUT)
> check(!some_condition())
> 
> It's okay to sleep for a little while and check that something did
> *not* happen, but it's not okay for the test to *fail* due to
> scheduling delays. The difference is that

As I said, multi-second scheduling delay are really unacceptable anyway. I
bet many kselftest may fail on a "bad" system like that way, that does not
mean the test is flaky. If there are any reports in the future that the test
fails or is flaky, I am happy to address them at that time. The tests work
and catch bugs reliably as I have seen. We could also make the test task as
RT if scheduling class is a concern.

I don't think its worth bikeshedding about hypothetical issues.

thanks,

 - Joel

WARNING: multiple messages have this Message-ID (diff)

From: Joel Fernandes <joel@joelfernandes.org>
To: Daniel Colascione <dancol@google.com>
Cc: Christian Brauner <christian@brauner.io>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Ingo Molnar <mingo@kernel.org>, Jann Horn <jannh@google.com>,
	Jann Horn <jann@thejh.net>,
	Jonathan Kowalski <bl0pbl33p@gmail.com>,
	Android Kernel Team <kernel-team@android.com>,
	"open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Michal Hocko <mhocko@suse.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Serge Hallyn <serge@hallyn.com>, Shuah Khan <shuah@kernel.org>,
	Sandeep Patil <sspatil@google.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Suren Baghdasaryan <surenb@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Murray <timmurray@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tycho Andersen <tycho@tycho.ws>
Subject: Re: [PATCH v1 2/2] Add selftests for pidfd polling
Date: Fri, 26 Apr 2019 16:31:14 -0400	[thread overview]
Message-ID: <20190426203114.GA185553@google.com> (raw)
In-Reply-To: <CAKOZuetFw9MnZCrXbcFTCFD9Jq8bZwzdcEi_OYHMMv2R_sgNNA@mail.gmail.com>

On Fri, Apr 26, 2019 at 12:35:40PM -0700, Daniel Colascione wrote:
> On Fri, Apr 26, 2019 at 10:26 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote:
> > > On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian@brauner.io> wrote:
> > > > This timing-based testing seems kinda odd to be honest. Can't we do
> > > > something better than this?
> > >
> > > Agreed. Timing-based tests have a substantial risk of becoming flaky.
> > > We ought to be able to make these tests fully deterministic and not
> > > subject to breakage from odd scheduling outcomes. We don't have
> > > sleepable events for everything, granted, but sleep-waiting on a
> > > condition with exponential backoff is fine in test code. In general,
> > > if you start with a robust test, you can insert a sleep(100) anywhere
> > > and not break the logic. Violating this rule always causes pain sooner
> > > or later.
> >
> > I prefer if you can be more specific about how to redesign the test. Please
> > go through the code and make suggestions there. The tests have not been flaky
> > in my experience.
> 
> You've been running them in an ideal environment.

One would hope for a reliable test environment.

> > In this case, we want to make sure that the poll unblocks at the right "time"
> > that is when the non-leader thread exits, and not when the leader thread
> > exits (test 1), or when the non-leader thread exits and not when the same
> > non-leader previous did an execve (test 2).
> 
> Instead of sleeping, you want to wait for some condition. Right now,
> in a bunch of places, the test does something like this:
> 
> do_something()
> sleep(SOME_TIMEOUT)
> check(some_condition())

No. I don't have anything like "some_condition()". My some_condition() is
just the difference in time.

> 
> You can replace each of these clauses with something like this:
> 
> do_something()
> start_time = now()
> while(!some_condition() && now() - start_time < LONG_TIMEOUT)
>   sleep(SHORT_DELAY)
> check(some_condition())
> 
> This way, you're insensitive to timing, up to LONG_TIMEOUT (which can
> be something like a minute). Yes, you can always write
> sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT
> (which should be tens of seconds) would make the test take far too
> long to run in the happy case.

Yes, but try implementing some_condition()  :-). It is easy to talk in the
abstract, I think it would be more productive if you can come up with an
implementation/patchh of the test itself and send a patch for that. I know
you wrote some pseudocode below, but it is a complex reimplementation that I
don't think will make the test more robust. I mean reading /proc/pid stat?
yuck :) You are welcome to send a patch though if you have a better
implementation.

> Note that this code is fine:
> 
> check(!some_condition())
> sleep(SOME_REASONABLE_TIMEOUT)
> check(!some_condition())
> 
> It's okay to sleep for a little while and check that something did
> *not* happen, but it's not okay for the test to *fail* due to
> scheduling delays. The difference is that

As I said, multi-second scheduling delay are really unacceptable anyway. I
bet many kselftest may fail on a "bad" system like that way, that does not
mean the test is flaky. If there are any reports in the future that the test
fails or is flaky, I am happy to address them at that time. The tests work
and catch bugs reliably as I have seen. We could also make the test task as
RT if scheduling class is a concern.

I don't think its worth bikeshedding about hypothetical issues.

thanks,

 - Joel

next prev parent reply	other threads:[~2019-04-26 20:31 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-25 19:00 [PATCH v1 1/2] Add polling support to pidfd joel
2019-04-25 19:00 ` Joel Fernandes (Google)
2019-04-25 19:00 ` Joel Fernandes (Google)
2019-04-25 19:00 ` [PATCH v1 2/2] Add selftests for pidfd polling joel
2019-04-25 19:00   ` Joel Fernandes (Google)
2019-04-25 19:00   ` Joel Fernandes (Google)
2019-04-25 20:00   ` tycho
2019-04-25 20:00     ` Tycho Andersen
2019-04-25 20:00     ` Tycho Andersen
2019-04-26 13:47     ` joel
2019-04-26 13:47       ` Joel Fernandes
2019-04-26 13:47       ` Joel Fernandes
2019-04-25 21:29   ` christian
2019-04-25 21:29     ` Christian Brauner
2019-04-25 21:29     ` Christian Brauner
2019-04-25 22:07     ` dancol
2019-04-25 22:07       ` Daniel Colascione
2019-04-25 22:07       ` Daniel Colascione
2019-04-26 17:26       ` joel
2019-04-26 17:26         ` Joel Fernandes
2019-04-26 17:26         ` Joel Fernandes
2019-04-26 19:35         ` dancol
2019-04-26 19:35           ` Daniel Colascione
2019-04-26 19:35           ` Daniel Colascione
2019-04-26 20:31           ` joel [this message]
2019-04-26 20:31             ` Joel Fernandes
2019-04-26 20:31             ` Joel Fernandes
2019-04-26 13:42     ` joel
2019-04-26 13:42       ` Joel Fernandes
2019-04-26 13:42       ` Joel Fernandes
2019-04-25 22:24 ` [PATCH v1 1/2] Add polling support to pidfd Christian Brauner
2019-04-25 22:24   ` Christian Brauner
2019-04-25 22:24   ` Christian Brauner
2019-04-25 22:24   ` christian
2019-04-26 14:23   ` Joel Fernandes
2019-04-26 14:23     ` Joel Fernandes
2019-04-26 14:23     ` Joel Fernandes
2019-04-26 14:23     ` joel
2019-04-26 15:21     ` Christian Brauner
2019-04-26 15:21       ` Christian Brauner
2019-04-26 15:21       ` Christian Brauner
2019-04-26 15:21       ` christian
2019-04-26 15:31       ` Christian Brauner
2019-04-26 15:31         ` Christian Brauner
2019-04-26 15:31         ` Christian Brauner
2019-04-26 15:31         ` christian
2019-04-28 16:24   ` Oleg Nesterov
2019-04-28 16:24     ` Oleg Nesterov
2019-04-28 16:24     ` Oleg Nesterov
2019-04-28 16:24     ` oleg
2019-04-29 14:02     ` Joel Fernandes
2019-04-29 14:02       ` Joel Fernandes
2019-04-29 14:02       ` Joel Fernandes
2019-04-29 14:02       ` joel
2019-04-29 14:07       ` Joel Fernandes
2019-04-29 14:07         ` Joel Fernandes
2019-04-29 14:07         ` Joel Fernandes
2019-04-29 14:07         ` joel
2019-04-29 14:25         ` Oleg Nesterov
2019-04-29 14:25           ` Oleg Nesterov
2019-04-29 14:25           ` Oleg Nesterov
2019-04-29 14:25           ` oleg
2019-04-29 14:20       ` Oleg Nesterov
2019-04-29 14:20         ` Oleg Nesterov
2019-04-29 14:20         ` Oleg Nesterov
2019-04-29 14:20         ` oleg
2019-04-29 16:32         ` Joel Fernandes
2019-04-29 16:32           ` Joel Fernandes
2019-04-29 16:32           ` Joel Fernandes
2019-04-29 16:32           ` joel
2019-04-30 11:53           ` Oleg Nesterov
2019-04-30 11:53             ` Oleg Nesterov
2019-04-30 11:53             ` Oleg Nesterov
2019-04-30 11:53             ` oleg
2019-04-30 12:07             ` Oleg Nesterov
2019-04-30 12:07               ` Oleg Nesterov
2019-04-30 12:07               ` Oleg Nesterov
2019-04-30 12:07               ` oleg
2019-04-30 15:49             ` Joel Fernandes
2019-04-30 15:49               ` Joel Fernandes
2019-04-30 15:49               ` Joel Fernandes
2019-04-30 15:49               ` joel
2019-04-26 14:58 ` christian
2019-04-26 14:58   ` Christian Brauner
2019-04-26 14:58   ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190426203114.GA185553@google.com \
    --to=unknown@example.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.