* [igt-dev] [PATCH i-g-t] runner: check disk limit at dumping kmsg
@ 2023-02-24 19:27 Kamil Konieczny
2023-02-24 20:22 ` [igt-dev] ✗ Fi.CI.BAT: failure for " Patchwork
2023-02-25 13:09 ` [igt-dev] [PATCH i-g-t] " Petri Latvala
0 siblings, 2 replies; 3+ messages in thread
From: Kamil Konieczny @ 2023-02-24 19:27 UTC (permalink / raw)
To: igt-dev; +Cc: Karol Krol, Arkadiusz Hiler
It was reported that kernel dumps can grow beyond disk limit size
so add checks for it and report error if that happen.
Reported-by: Karol Krol <karol.krol@intel.com>
Ref: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/129
Cc: Petri Latvala <adrinael@adrinael.net>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Kamil Konieczny <kamil.konieczny@linux.intel.com>
---
runner/executor.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/runner/executor.c b/runner/executor.c
index 597cd7f5..17ebcdb8 100644
--- a/runner/executor.c
+++ b/runner/executor.c
@@ -584,7 +584,7 @@ void close_outputs(int *fds)
}
/* Returns the number of bytes written to disk, or a negative number on error */
-static long dump_dmesg(int kmsgfd, int outfd)
+static long dump_dmesg(int kmsgfd, int outfd, size_t disk_limit)
{
/*
* Write kernel messages to the log file until we reach
@@ -599,12 +599,18 @@ static long dump_dmesg(int kmsgfd, int outfd)
bool underflow_once = false;
char cont;
char buf[2048];
- ssize_t r;
+ ssize_t r, disk_written;
long written = 0;
if (kmsgfd < 0)
return 0;
+ disk_written = lseek(outfd, 0, SEEK_SET);
+ if (disk_written > disk_limit) {
+ errf("Error dumping kmsg: disk limit already exceeded\n");
+ return disk_written;
+ }
+
comparefd = open("/dev/kmsg", O_RDONLY | O_NONBLOCK);
if (comparefd < 0) {
errf("Error opening another fd for /dev/kmsg\n");
@@ -655,6 +661,13 @@ static long dump_dmesg(int kmsgfd, int outfd)
write(outfd, buf, r);
written += r;
+ disk_written += r;
+
+ if (disk_written > disk_limit) {
+ close(comparefd);
+ errf("Error dumping kmsg: disk limit exceeded\n");
+ return disk_written;
+ }
if (comparefd < 0 && sscanf(buf, "%u,%llu,%llu,%c;",
&flags, &seq, &usec, &cont) == 4) {
@@ -890,6 +903,7 @@ static int monitor_output(pid_t child,
unsigned long taints = 0;
bool aborting = false;
size_t disk_usage = 0;
+ const size_t mon_disk_limit = settings->disk_usage_limit ? : (~(size_t)0);
bool socket_comms_used = false; /* whether the test actually uses comms */
bool results_received = false; /* whether we already have test results that might need overriding if we detect an abort condition */
@@ -1219,7 +1233,7 @@ static int monitor_output(pid_t child,
time_last_activity = time_now;
- dmesgwritten = dump_dmesg(kmsgfd, outputs[_F_DMESG]);
+ dmesgwritten = dump_dmesg(kmsgfd, outputs[_F_DMESG], mon_disk_limit);
if (settings->sync)
fdatasync(outputs[_F_DMESG]);
@@ -1457,7 +1471,7 @@ static int monitor_output(pid_t child,
asprintf(abortreason, "Child refuses to die, tainted 0x%lx.", taints);
}
- dump_dmesg(kmsgfd, outputs[_F_DMESG]);
+ dump_dmesg(kmsgfd, outputs[_F_DMESG], mon_disk_limit);
if (settings->sync)
fdatasync(outputs[_F_DMESG]);
@@ -1483,7 +1497,7 @@ static int monitor_output(pid_t child,
}
}
- dump_dmesg(kmsgfd, outputs[_F_DMESG]);
+ dump_dmesg(kmsgfd, outputs[_F_DMESG], mon_disk_limit);
if (settings->sync)
fdatasync(outputs[_F_DMESG]);
--
2.37.2
^ permalink raw reply related [flat|nested] 3+ messages in thread* [igt-dev] ✗ Fi.CI.BAT: failure for runner: check disk limit at dumping kmsg
2023-02-24 19:27 [igt-dev] [PATCH i-g-t] runner: check disk limit at dumping kmsg Kamil Konieczny
@ 2023-02-24 20:22 ` Patchwork
2023-02-25 13:09 ` [igt-dev] [PATCH i-g-t] " Petri Latvala
1 sibling, 0 replies; 3+ messages in thread
From: Patchwork @ 2023-02-24 20:22 UTC (permalink / raw)
To: Kamil Konieczny; +Cc: igt-dev
[-- Attachment #1: Type: text/plain, Size: 5701 bytes --]
== Series Details ==
Series: runner: check disk limit at dumping kmsg
URL : https://patchwork.freedesktop.org/series/114353/
State : failure
== Summary ==
CI Bug Log - changes from CI_DRM_12776 -> IGTPW_8526
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with IGTPW_8526 absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in IGTPW_8526, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/index.html
Participating hosts (39 -> 38)
------------------------------
Missing (1): fi-snb-2520m
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in IGTPW_8526:
### IGT changes ###
#### Possible regressions ####
* igt@dmabuf@all-tests@dma_fence:
- fi-cfl-8109u: [PASS][1] -> [FAIL][2]
[1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/fi-cfl-8109u/igt@dmabuf@all-tests@dma_fence.html
[2]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/fi-cfl-8109u/igt@dmabuf@all-tests@dma_fence.html
* igt@dmabuf@all-tests@sanitycheck:
- fi-cfl-8109u: [PASS][3] -> [ABORT][4]
[3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/fi-cfl-8109u/igt@dmabuf@all-tests@sanitycheck.html
[4]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/fi-cfl-8109u/igt@dmabuf@all-tests@sanitycheck.html
* igt@i915_selftest@live@hangcheck:
- fi-kbl-soraka: [PASS][5] -> [INCOMPLETE][6]
[5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/fi-kbl-soraka/igt@i915_selftest@live@hangcheck.html
[6]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/fi-kbl-soraka/igt@i915_selftest@live@hangcheck.html
* igt@i915_selftest@live@requests:
- bat-rpls-2: [PASS][7] -> [ABORT][8]
[7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/bat-rpls-2/igt@i915_selftest@live@requests.html
[8]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/bat-rpls-2/igt@i915_selftest@live@requests.html
* igt@i915_selftest@live@reset:
- bat-rpls-1: [PASS][9] -> [ABORT][10]
[9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/bat-rpls-1/igt@i915_selftest@live@reset.html
[10]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/bat-rpls-1/igt@i915_selftest@live@reset.html
#### Warnings ####
* igt@i915_selftest@live@gt_pm:
- bat-rpls-2: [DMESG-FAIL][11] ([i915#4258]) -> [FAIL][12]
[11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/bat-rpls-2/igt@i915_selftest@live@gt_pm.html
[12]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/bat-rpls-2/igt@i915_selftest@live@gt_pm.html
- fi-kbl-soraka: [DMESG-FAIL][13] ([i915#1886]) -> [FAIL][14]
[13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html
[14]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html
* igt@i915_selftest@live@slpc:
- bat-adln-1: [DMESG-FAIL][15] ([i915#6997]) -> [FAIL][16]
[15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/bat-adln-1/igt@i915_selftest@live@slpc.html
[16]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/bat-adln-1/igt@i915_selftest@live@slpc.html
* igt@i915_suspend@basic-s3-without-i915:
- fi-tgl-1115g4: [INCOMPLETE][17] ([i915#7443]) -> [INCOMPLETE][18]
[17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/fi-tgl-1115g4/igt@i915_suspend@basic-s3-without-i915.html
[18]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/fi-tgl-1115g4/igt@i915_suspend@basic-s3-without-i915.html
Known issues
------------
Here are the changes found in IGTPW_8526 that come from known issues:
### IGT changes ###
#### Possible fixes ####
* igt@gem_exec_gttfill@basic:
- fi-pnv-d510: [FAIL][19] ([i915#7229]) -> [PASS][20]
[19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/fi-pnv-d510/igt@gem_exec_gttfill@basic.html
[20]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/fi-pnv-d510/igt@gem_exec_gttfill@basic.html
* igt@i915_selftest@live@gt_heartbeat:
- fi-kbl-soraka: [DMESG-FAIL][21] ([i915#5334] / [i915#7872]) -> [PASS][22]
[21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12776/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html
[22]: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html
[i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
[i915#4258]: https://gitlab.freedesktop.org/drm/intel/issues/4258
[i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
[i915#6997]: https://gitlab.freedesktop.org/drm/intel/issues/6997
[i915#7229]: https://gitlab.freedesktop.org/drm/intel/issues/7229
[i915#7443]: https://gitlab.freedesktop.org/drm/intel/issues/7443
[i915#7872]: https://gitlab.freedesktop.org/drm/intel/issues/7872
Build changes
-------------
* CI: CI-20190529 -> None
* IGT: IGT_7173 -> IGTPW_8526
CI-20190529: 20190529
CI_DRM_12776: f86cdb8690c1cd968b6763b613a4a0fde424895e @ git://anongit.freedesktop.org/gfx-ci/linux
IGTPW_8526: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/index.html
IGT_7173: deab4e0bdf5a9366b67d0a44f478f3da3c9a943b @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
Testlist changes
----------------
-igt@kms_plane_scaling@i915-max-source-size
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_8526/index.html
[-- Attachment #2: Type: text/html, Size: 6586 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [igt-dev] [PATCH i-g-t] runner: check disk limit at dumping kmsg
2023-02-24 19:27 [igt-dev] [PATCH i-g-t] runner: check disk limit at dumping kmsg Kamil Konieczny
2023-02-24 20:22 ` [igt-dev] ✗ Fi.CI.BAT: failure for " Patchwork
@ 2023-02-25 13:09 ` Petri Latvala
1 sibling, 0 replies; 3+ messages in thread
From: Petri Latvala @ 2023-02-25 13:09 UTC (permalink / raw)
To: Kamil Konieczny; +Cc: igt-dev, Arkadiusz Hiler, Karol Krol
On Fri, Feb 24, 2023 at 08:27:03PM +0100, Kamil Konieczny wrote:
> It was reported that kernel dumps can grow beyond disk limit size
> so add checks for it and report error if that happen.
>
> Reported-by: Karol Krol <karol.krol@intel.com>
> Ref: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/129
> Cc: Petri Latvala <adrinael@adrinael.net>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Signed-off-by: Kamil Konieczny <kamil.konieczny@linux.intel.com>
> ---
> runner/executor.c | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/runner/executor.c b/runner/executor.c
> index 597cd7f5..17ebcdb8 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -584,7 +584,7 @@ void close_outputs(int *fds)
> }
>
> /* Returns the number of bytes written to disk, or a negative number on error */
> -static long dump_dmesg(int kmsgfd, int outfd)
> +static long dump_dmesg(int kmsgfd, int outfd, size_t disk_limit)
> {
> /*
> * Write kernel messages to the log file until we reach
> @@ -599,12 +599,18 @@ static long dump_dmesg(int kmsgfd, int outfd)
> bool underflow_once = false;
> char cont;
> char buf[2048];
> - ssize_t r;
> + ssize_t r, disk_written;
> long written = 0;
>
> if (kmsgfd < 0)
> return 0;
>
> + disk_written = lseek(outfd, 0, SEEK_SET);
> + if (disk_written > disk_limit) {
> + errf("Error dumping kmsg: disk limit already exceeded\n");
> + return disk_written;
> + }
The return value is the amount written to disk by this call, return 0 here.
> +
> comparefd = open("/dev/kmsg", O_RDONLY | O_NONBLOCK);
> if (comparefd < 0) {
> errf("Error opening another fd for /dev/kmsg\n");
> @@ -655,6 +661,13 @@ static long dump_dmesg(int kmsgfd, int outfd)
>
> write(outfd, buf, r);
> written += r;
> + disk_written += r;
> +
> + if (disk_written > disk_limit) {
> + close(comparefd);
> + errf("Error dumping kmsg: disk limit exceeded\n");
> + return disk_written;
> + }
And same as above, return 'written' here instead of the current size.
All in all, this is a fine solution and it looks like I had a bit of a
brainfart originally when writing this code. When we're aborting and
killing the test, the runner lets the test (and kernel) dump out the
dying screams in hopes that those logs are useful with figuring out
why that condition happened, but disk limit being exceeded doesn't
need that additional logging. The damage is already done and what's in
the assumed-to-be already-humongous logs is the interesting bits.
With the return values changed,
Reviewed-by: Petri Latvala <adrinael@adrinael.net>
TODO for later:
1) Instead of letting dmesg log grow to an additional limit (the disk
usage limit is supposed to be _total_, stdout+stderr+dmesg), let dmesg
dumping only use what's left of the quota.
2) When disk limit is exceeded, add a message to dmesg that more
kernel logs might be available but we stopped collecting.
--
Petri Latvala
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-02-25 13:09 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-24 19:27 [igt-dev] [PATCH i-g-t] runner: check disk limit at dumping kmsg Kamil Konieczny
2023-02-24 20:22 ` [igt-dev] ✗ Fi.CI.BAT: failure for " Patchwork
2023-02-25 13:09 ` [igt-dev] [PATCH i-g-t] " Petri Latvala
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox