* [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait
@ 2025-05-19 19:28 alison.schofield
2025-05-19 22:42 ` Marc Herbert
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: alison.schofield @ 2025-05-19 19:28 UTC (permalink / raw)
To: nvdimm, Marc Herbert, Li Zhijian; +Cc: Alison Schofield
From: Alison Schofield <alison.schofield@intel.com>
monitor.sh runs for 50 seconds and spends 48 of those seconds sleeping
after sync. It sleeps for 3 seconds each time it restarts the monitor,
and 3 seconds before checking for expected log entries.
Add a wait_for_logfile_update() helper that waits a max of 3 seconds
for an expected string to appear N times in the logfile using tail -F.
Add a "monitor ready" log message to the monitor executable and wait
for that message once after monitor start. Note that if no DIMM has an
event flag set, there will be no log entry at startup. Always look for
the "monitor ready" message.
Expand the check_result() function to handle both the sync and wait
that were previously duplicated in inject_smart() and call_notify().
It now waits for the expected N of new log entries.
Again, looking for Tested-by Tags. Thanks!
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
Changes in v3:
- Add and use a helper that uses tail -F
- Add ready message to monitor.c
- Update commit msg and log
Link to v2: https://lore.kernel.org/nvdimm/20250516044628.1532939-1-alison.schofield@intel.com/
Changes in v2:
- Poll for 3 seconds instead of removing sleep entirely (MarcH)
- Update commit msg & log
Link to v1: https://lore.kernel.org/nvdimm/20250514014133.1431846-1-alison.schofield@intel.com/
ndctl/monitor.c | 2 +-
test/monitor.sh | 24 +++++++++++++++++++++---
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index bd8a74863476..925b37f4184b 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -658,7 +658,7 @@ int cmd_monitor(int argc, const char **argv, struct ndctl_ctx *ctx)
rc = -ENXIO;
goto out;
}
-
+ info(&monitor, "monitor ready\n");
rc = monitor_event(ctx, &mfa);
out:
if (monitor.ctx.log_file)
diff --git a/test/monitor.sh b/test/monitor.sh
index be8e24d6f3aa..d0666392ab5b 100755
--- a/test/monitor.sh
+++ b/test/monitor.sh
@@ -21,12 +21,28 @@ trap 'err $LINENO' ERR
check_min_kver "4.15" || do_skip "kernel $KVER may not support monitor service"
+wait_for_logfile_update()
+{
+ local expect_string="$1"
+ local expect_count="$2"
+
+ # Wait up to 3s for $expect_count occurrences of $expect_string
+ # tail -n +1 -F: starts watching the logfile from the first line
+
+ if ! timeout 3s tail -n +1 -F "$logfile" | grep -m "$expect_count" -q "$expect_string"; then
+ echo "logfile not updated in 3 secs"
+ err "$LINENO"
+ fi
+}
+
start_monitor()
{
logfile=$(mktemp)
$NDCTL monitor -c "$monitor_conf" -l "$logfile" $1 &
monitor_pid=$!
- sync; sleep 3
+
+ sync
+ wait_for_logfile_update "monitor ready" 1
truncate --size 0 "$logfile" #remove startup log
}
@@ -49,17 +65,19 @@ get_monitor_dimm()
call_notify()
{
"$TEST_PATH"/smart-notify "$smart_supported_bus"
- sync; sleep 3
}
inject_smart()
{
$NDCTL inject-smart "$monitor_dimms" $1
- sync; sleep 3
}
check_result()
{
+ sync
+ expect_count=$(wc -w <<< "$1")
+ wait_for_logfile_update "timestamp" "$expect_count"
+
jlog=$(cat "$logfile")
notify_dimms=$(jq ."dimm"."dev" <<<"$jlog" | sort | uniq | xargs)
[[ "$1" == "$notify_dimms" ]]
--
2.37.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-19 19:28 [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait alison.schofield @ 2025-05-19 22:42 ` Marc Herbert 2025-05-22 4:20 ` Alison Schofield 2025-05-20 20:47 ` Verma, Vishal L 2025-05-21 9:00 ` Zhijian Li (Fujitsu) 2 siblings, 1 reply; 9+ messages in thread From: Marc Herbert @ 2025-05-19 22:42 UTC (permalink / raw) To: alison.schofield, nvdimm, Li Zhijian On 2025-05-19 12:28, alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > > Again, looking for Tested-by Tags. Thanks! Wow, it went so fast I genuinely thought something went terribly wrong and it did not test anything anymore. While reporting "Success!" - that has happened before. But I check the test logs and it does a lot of stuff - in less 2 seconds! What a difference. Tested-by: Marc Herbert <marc.herbert@linux.intel.com> I have only concern holding back my Reviewed-by + some minor nits after the code. I think the error message for the timeout pipeline is too limited, terse and generic; does not say anything about what happened, does not make the difference between a timeout and some other, unexpected failure, how many occurences were found etc. I think the error clause should do at least two things: - print the timeout $? exit code. - grep the log file again, either with -c if it's too long, or not. - Any other useful information that could be up for grabs at that point. > +wait_for_logfile_update() > +{ > + local expect_string="$1" > + local expect_count="$2" > + > + # Wait up to 3s for $expect_count occurrences of $expect_string > + # tail -n +1 -F: starts watching the logfile from the first line > + > + if ! timeout 3s tail -n +1 -F "$logfile" | grep -m "$expect_count" -q "$expect_string"; then > + echo "logfile not updated in 3 secs" > + err "$LINENO" > + fi > +} tail -F really does solve the problem with very little code, well spotted! Nit 1: I did not know that -F option: can you try to sneak the word "retry" somewhere in the comments? Codestyle Nit 2: to avoid negations I generally prefer: timeout 3s tail -n +1 -F "$logfile" | grep -q -m "$expect_count" -q "$expect_string" || { error ... } This also translates to plain English nicely as "wait or fail", "do or die", etc. Super minor nit 3: "$expect_string" looks like an argument of "-q", can you move -q first? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-19 22:42 ` Marc Herbert @ 2025-05-22 4:20 ` Alison Schofield 0 siblings, 0 replies; 9+ messages in thread From: Alison Schofield @ 2025-05-22 4:20 UTC (permalink / raw) To: Marc Herbert; +Cc: nvdimm, Li Zhijian On Mon, May 19, 2025 at 03:42:30PM -0700, Marc Herbert wrote: > On 2025-05-19 12:28, alison.schofield@intel.com wrote: > > From: Alison Schofield <alison.schofield@intel.com> > > > > > > Again, looking for Tested-by Tags. Thanks! > > Wow, it went so fast I genuinely thought something went terribly wrong > and it did not test anything anymore. While reporting "Success!" - that > has happened before. > > But I check the test logs and it does a lot of stuff - in less 2 > seconds! What a difference. > > Tested-by: Marc Herbert <marc.herbert@linux.intel.com> > > I have only concern holding back my Reviewed-by + some minor nits after > the code. I think the error message for the timeout pipeline is too > limited, terse and generic; does not say anything about what happened, > does not make the difference between a timeout and some other, unexpected > failure, how many occurences were found etc. I think the error clause > should do at least two things: > > - print the timeout $? exit code. OK. I'll combine that request with what Vishal suggested set -o pipefail do my 'do or die' cmd check the rc for timeout, grep no match, or other. > - grep the log file again, either with -c if it's too long, or not. > - Any other useful information that could be up for grabs at that point. > I can add dump the logfile to the test log before removing it, everytime. Each test case only puts in 1-4 entries. > > > +wait_for_logfile_update() > > +{ > > + local expect_string="$1" > > + local expect_count="$2" > > + > > + # Wait up to 3s for $expect_count occurrences of $expect_string > > + # tail -n +1 -F: starts watching the logfile from the first line > > + > > + if ! timeout 3s tail -n +1 -F "$logfile" | grep -m "$expect_count" -q "$expect_string"; then > > + echo "logfile not updated in 3 secs" > > + err "$LINENO" > > + fi > > +} > > tail -F really does solve the problem with very little code, well > spotted! > Nit 1: I did not know that -F option: can you try to sneak the word > "retry" somewhere in the comments? will do > > Codestyle Nit 2: to avoid negations I generally prefer: > > timeout 3s tail -n +1 -F "$logfile" | > grep -q -m "$expect_count" -q "$expect_string" || { > error ... > } > > This also translates to plain English nicely as "wait or fail", "do or > die", etc. > You prefer do or die style. I can do that. > Super minor nit 3: "$expect_string" looks like an argument of "-q", can > you move -q first? will do. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-19 19:28 [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait alison.schofield 2025-05-19 22:42 ` Marc Herbert @ 2025-05-20 20:47 ` Verma, Vishal L 2025-05-22 4:22 ` Alison Schofield 2025-05-21 9:00 ` Zhijian Li (Fujitsu) 2 siblings, 1 reply; 9+ messages in thread From: Verma, Vishal L @ 2025-05-20 20:47 UTC (permalink / raw) To: Schofield, Alison, marc.herbert@linux.intel.com, nvdimm@lists.linux.dev, lizhijian@fujitsu.com On Mon, 2025-05-19 at 12:28 -0700, alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > monitor.sh runs for 50 seconds and spends 48 of those seconds sleeping > after sync. It sleeps for 3 seconds each time it restarts the monitor, > and 3 seconds before checking for expected log entries. > > Add a wait_for_logfile_update() helper that waits a max of 3 seconds > for an expected string to appear N times in the logfile using tail -F. > > Add a "monitor ready" log message to the monitor executable and wait > for that message once after monitor start. Note that if no DIMM has an > event flag set, there will be no log entry at startup. Always look for > the "monitor ready" message. > > Expand the check_result() function to handle both the sync and wait > that were previously duplicated in inject_smart() and call_notify(). > It now waits for the expected N of new log entries. > > Again, looking for Tested-by Tags. Thanks! > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> > <snip> > + > + if ! timeout 3s tail -n +1 -F "$logfile" | grep -m "$expect_count" -q "$expect_string"; then > + echo "logfile not updated in 3 secs" > + err "$LINENO" > + fi I was tempted to say something about a `set -o pipefail` before this, and it might be nice to add one? It's not needed here since the grep will fail if the first part of the pipeline fails, and there's no chance of the pipeline eating your error in this case. But it might be good practice to do it for scenarios like if ! cmd1 | cmd2 ... etc. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-20 20:47 ` Verma, Vishal L @ 2025-05-22 4:22 ` Alison Schofield 0 siblings, 0 replies; 9+ messages in thread From: Alison Schofield @ 2025-05-22 4:22 UTC (permalink / raw) To: Verma, Vishal L Cc: marc.herbert@linux.intel.com, nvdimm@lists.linux.dev, lizhijian@fujitsu.com On Tue, May 20, 2025 at 01:47:22PM -0700, Vishal Verma wrote: > On Mon, 2025-05-19 at 12:28 -0700, alison.schofield@intel.com wrote: > > From: Alison Schofield <alison.schofield@intel.com> > > > > monitor.sh runs for 50 seconds and spends 48 of those seconds sleeping > > after sync. It sleeps for 3 seconds each time it restarts the monitor, > > and 3 seconds before checking for expected log entries. > > > > Add a wait_for_logfile_update() helper that waits a max of 3 seconds > > for an expected string to appear N times in the logfile using tail -F. > > > > Add a "monitor ready" log message to the monitor executable and wait > > for that message once after monitor start. Note that if no DIMM has an > > event flag set, there will be no log entry at startup. Always look for > > the "monitor ready" message. > > > > Expand the check_result() function to handle both the sync and wait > > that were previously duplicated in inject_smart() and call_notify(). > > It now waits for the expected N of new log entries. > > > > Again, looking for Tested-by Tags. Thanks! > > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > > Tested-by: Vishal Verma <vishal.l.verma@intel.com> > Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> > > > > <snip> > > + > > + if ! timeout 3s tail -n +1 -F "$logfile" | grep -m "$expect_count" -q "$expect_string"; then > > + echo "logfile not updated in 3 secs" > > + err "$LINENO" > > + fi > > I was tempted to say something about a `set -o pipefail` before this, > and it might be nice to add one? > > It's not needed here since the grep will fail if the first part of the > pipeline fails, and there's no chance of the pipeline eating your error > in this case. > > But it might be good practice to do it for scenarios like > if ! cmd1 | cmd2 ... etc. I looked at pipeline and it seems like it's be 1: grep no match 124: timeout expired 125+: anything else I'll experiment with it. > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-19 19:28 [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait alison.schofield 2025-05-19 22:42 ` Marc Herbert 2025-05-20 20:47 ` Verma, Vishal L @ 2025-05-21 9:00 ` Zhijian Li (Fujitsu) 2025-05-22 4:35 ` Alison Schofield 2 siblings, 1 reply; 9+ messages in thread From: Zhijian Li (Fujitsu) @ 2025-05-21 9:00 UTC (permalink / raw) To: alison.schofield@intel.com, nvdimm@lists.linux.dev, Marc Herbert On 20/05/2025 03:28, alison.schofield@intel.com wrote: > diff --git a/ndctl/monitor.c b/ndctl/monitor.c > index bd8a74863476..925b37f4184b 100644 > --- a/ndctl/monitor.c > +++ b/ndctl/monitor.c > @@ -658,7 +658,7 @@ int cmd_monitor(int argc, const char **argv, struct ndctl_ctx *ctx) > rc = -ENXIO; > goto out; > } > - > + info(&monitor, "monitor ready\n"); This brings to mind my initial contribution to ndctl, where it commented that monitor expects to output content in json format[1]? So this update could break it, does it matter now? [1] https://lore.kernel.org/linux-cxl/4c2341c8a4e579e9643b7daa3eb412b0ac0da98a.camel@intel.com/ Thanks Zhijian > rc = monitor_event(ctx, &mfa); > out: ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-21 9:00 ` Zhijian Li (Fujitsu) @ 2025-05-22 4:35 ` Alison Schofield 2025-05-22 18:38 ` Dan Williams 0 siblings, 1 reply; 9+ messages in thread From: Alison Schofield @ 2025-05-22 4:35 UTC (permalink / raw) To: Zhijian Li (Fujitsu); +Cc: nvdimm@lists.linux.dev, Marc Herbert On Wed, May 21, 2025 at 09:00:19AM +0000, Zhijian Li (Fujitsu) wrote: > > > On 20/05/2025 03:28, alison.schofield@intel.com wrote: > > diff --git a/ndctl/monitor.c b/ndctl/monitor.c > > index bd8a74863476..925b37f4184b 100644 > > --- a/ndctl/monitor.c > > +++ b/ndctl/monitor.c > > @@ -658,7 +658,7 @@ int cmd_monitor(int argc, const char **argv, struct ndctl_ctx *ctx) > > rc = -ENXIO; > > goto out; > > } > > - > > + info(&monitor, "monitor ready\n"); > > > This brings to mind my initial contribution to ndctl, where it commented that monitor expects to > output content in json format[1]? So this update could break it, does it matter now? > > [1] https://lore.kernel.org/linux-cxl/4c2341c8a4e579e9643b7daa3eb412b0ac0da98a.camel@intel.com/ That is odd because right above where you wanted to add that info[] to cxl/monitor.c another info[] was added to the log for the daemon starting ? ndctl/monitor.c has a few info[] going to the log. In the ndctl/monitor.c the presence of a log does not mean the monitor started. I'll poke around more about the need for json. I get that in theory, but I'm doubtful in practice that a json parser would die on those info entries. Thanks for trying this out. I'll be back with another flavor. > > Thanks > Zhijian > > > rc = monitor_event(ctx, &mfa); > > out: ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-22 4:35 ` Alison Schofield @ 2025-05-22 18:38 ` Dan Williams 2025-05-22 22:05 ` Alison Schofield 0 siblings, 1 reply; 9+ messages in thread From: Dan Williams @ 2025-05-22 18:38 UTC (permalink / raw) To: Alison Schofield, Zhijian Li (Fujitsu) Cc: nvdimm@lists.linux.dev, Marc Herbert Alison Schofield wrote: > On Wed, May 21, 2025 at 09:00:19AM +0000, Zhijian Li (Fujitsu) wrote: > > > > > > On 20/05/2025 03:28, alison.schofield@intel.com wrote: > > > diff --git a/ndctl/monitor.c b/ndctl/monitor.c > > > index bd8a74863476..925b37f4184b 100644 > > > --- a/ndctl/monitor.c > > > +++ b/ndctl/monitor.c > > > @@ -658,7 +658,7 @@ int cmd_monitor(int argc, const char **argv, struct ndctl_ctx *ctx) > > > rc = -ENXIO; > > > goto out; > > > } > > > - > > > + info(&monitor, "monitor ready\n"); > > > > > > This brings to mind my initial contribution to ndctl, where it commented that monitor expects to > > output content in json format[1]? So this update could break it, does it matter now? > > > > [1] https://lore.kernel.org/linux-cxl/4c2341c8a4e579e9643b7daa3eb412b0ac0da98a.camel@intel.com/ > > That is odd because right above where you wanted to add that info[] to > cxl/monitor.c another info[] was added to the log for the daemon starting ? > > ndctl/monitor.c has a few info[] going to the log. > > In the ndctl/monitor.c the presence of a log does not mean the monitor > started. I'll poke around more about the need for json. I get that in > theory, but I'm doubtful in practice that a json parser would die on > those info entries. It's not just monitor, the expectation for all binaries in the project is that all stdout is json and anything that is not json is only allowed on stderr. That way you can always use these utilities in tooling pipelines that do not need not to build a pile of grep and awk to extract useful data. It turns out, unfortunately, that LOG_INFO is the only log level in util/log.c that violates the expectation that all non-json goes to stderr. So I would support a change like this to remove that trap: diff --git a/util/log.c b/util/log.c index d4eef0a1fc5c..4ee85c7609c3 100644 --- a/util/log.c +++ b/util/log.c @@ -18,10 +18,7 @@ void log_syslog(struct log_ctx *ctx, int priority, const char *file, int line, void log_standard(struct log_ctx *ctx, int priority, const char *file, int line, const char *fn, const char *format, va_list args) { - if (priority == 6) - vfprintf(stdout, format, args); - else - vfprintf(stderr, format, args); + vfprintf(stderr, format, args); } void log_file(struct log_ctx *ctx, int priority, const char *file, int line, ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait 2025-05-22 18:38 ` Dan Williams @ 2025-05-22 22:05 ` Alison Schofield 0 siblings, 0 replies; 9+ messages in thread From: Alison Schofield @ 2025-05-22 22:05 UTC (permalink / raw) To: Dan Williams; +Cc: Zhijian Li (Fujitsu), nvdimm@lists.linux.dev, Marc Herbert On Thu, May 22, 2025 at 11:38:34AM -0700, Dan Williams wrote: > Alison Schofield wrote: > > On Wed, May 21, 2025 at 09:00:19AM +0000, Zhijian Li (Fujitsu) wrote: > > > > > > > > > On 20/05/2025 03:28, alison.schofield@intel.com wrote: > > > > diff --git a/ndctl/monitor.c b/ndctl/monitor.c > > > > index bd8a74863476..925b37f4184b 100644 > > > > --- a/ndctl/monitor.c > > > > +++ b/ndctl/monitor.c > > > > @@ -658,7 +658,7 @@ int cmd_monitor(int argc, const char **argv, struct ndctl_ctx *ctx) > > > > rc = -ENXIO; > > > > goto out; > > > > } > > > > - > > > > + info(&monitor, "monitor ready\n"); > > > > > > > > > This brings to mind my initial contribution to ndctl, where it commented that monitor expects to > > > output content in json format[1]? So this update could break it, does it matter now? > > > > > > [1] https://lore.kernel.org/linux-cxl/4c2341c8a4e579e9643b7daa3eb412b0ac0da98a.camel@intel.com/ > > > > That is odd because right above where you wanted to add that info[] to > > cxl/monitor.c another info[] was added to the log for the daemon starting ? > > > > ndctl/monitor.c has a few info[] going to the log. > > > > In the ndctl/monitor.c the presence of a log does not mean the monitor > > started. I'll poke around more about the need for json. I get that in > > theory, but I'm doubtful in practice that a json parser would die on > > those info entries. > > It's not just monitor, the expectation for all binaries in the project > is that all stdout is json and anything that is not json is only allowed > on stderr. That way you can always use these utilities in tooling > pipelines that do not need not to build a pile of grep and awk to > extract useful data. > > It turns out, unfortunately, that LOG_INFO is the only log level in > util/log.c that violates the expectation that all non-json goes to > stderr. So I would support a change like this to remove that trap: > Ah, understood (now). Thanks & will do. > diff --git a/util/log.c b/util/log.c > index d4eef0a1fc5c..4ee85c7609c3 100644 > --- a/util/log.c > +++ b/util/log.c > @@ -18,10 +18,7 @@ void log_syslog(struct log_ctx *ctx, int priority, const char *file, int line, > void log_standard(struct log_ctx *ctx, int priority, const char *file, int line, > const char *fn, const char *format, va_list args) > { > - if (priority == 6) > - vfprintf(stdout, format, args); > - else > - vfprintf(stderr, format, args); > + vfprintf(stderr, format, args); > } > > void log_file(struct log_ctx *ctx, int priority, const char *file, int line, ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-05-22 22:05 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-19 19:28 [ndctl PATCH v3] test/monitor.sh: replace sleep with event driven wait alison.schofield 2025-05-19 22:42 ` Marc Herbert 2025-05-22 4:20 ` Alison Schofield 2025-05-20 20:47 ` Verma, Vishal L 2025-05-22 4:22 ` Alison Schofield 2025-05-21 9:00 ` Zhijian Li (Fujitsu) 2025-05-22 4:35 ` Alison Schofield 2025-05-22 18:38 ` Dan Williams 2025-05-22 22:05 ` Alison Schofield
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.