* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
@ 2019-02-22 6:10 Xu Yu
2019-04-04 9:10 ` Cyril Hrubis
0 siblings, 1 reply; 4+ messages in thread
From: Xu Yu @ 2019-02-22 6:10 UTC (permalink / raw)
To: ltp
Immediately `umount cgroup/` after `rmdir cgroup/0` is very likely to
make the corresponding num_cgroups not decrease, and causes the
following mount operation with overlapping subsys to fail.
A demo test script can be:
mount -t cgroup -o hugetlb,pids xxx cgroup/
mkdir cgroup/0
rmdir cgroup/0
umount cgroup/
mount -t cgroup -o pids xxx cgroup/ <-- FAIL
The root cause is that `rmdir cgroup/0` is asynchronous in the kernel
implementation, causing `umount cgroup/` to enter `cgroup_put` path,
instead of `percpu_ref_kill` path.
There is no good kernel solution yet[1]. Therefore, we temporarily add
`sleep` in the test script to ensure `umount cgroup/` is executed
after `rmdir cgroup/0` is completed. Note that we only add `sleep` in
the clean up phase of each test in the cgroup_regression_test.sh.
No `sleep` is added in the cgroup_regression_6_1.sh and
cgroup_regression_10_1.sh for the sake of pressure test.
[1] https://marc.info/?t=155021167400005&r=1&w=2
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
testcases/kernel/controllers/cgroup/cgroup_regression_test.sh | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh b/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
index 6868609..f78d80a 100755
--- a/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
+++ b/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
@@ -145,6 +145,7 @@ test2()
fi
rmdir cgroup/0 cgroup/1
+ sleep 1
umount cgroup/
}
@@ -193,6 +194,7 @@ test3()
wait $pid2 2>/dev/null
rmdir $cpu_subsys_path/0 2> /dev/null
+ sleep 1
umount cgroup/ 2> /dev/null
check_kernel_bug
}
@@ -222,6 +224,7 @@ test4()
mount -t cgroup -o none,name=foo cgroup cgroup/
mkdir cgroup/0
rmdir cgroup/0
+ sleep 1
umount cgroup/
if dmesg | grep -q "MAX_LOCKDEP_SUBCLASSES too low"; then
@@ -302,6 +305,7 @@ test5()
kill -TERM $! > /dev/null
wait $! 2>/dev/null
rmdir $mntpoint/0
+ sleep 1
# Do NOT unmount pre-existent mountpoints...
[ -z "$mounted" ] && umount $mntpoint/
check_kernel_bug
@@ -339,6 +343,7 @@ test6()
mount -t cgroup -o ns xxx cgroup/ > /dev/null 2>&1
rmdir cgroup/[1-9]* > /dev/null 2>&1
+ sleep 1
umount cgroup/
check_kernel_bug
}
@@ -373,6 +378,7 @@ test_7_1()
mkdir $subsys_path/0
sleep 100 < $subsys_path/0 & # add refcnt to this dir
rmdir $subsys_path/0
+ sleep 1
# remount with new subsystems added
# since 2.6.28, this remount will fail
@@ -398,6 +404,7 @@ test_7_2()
mkdir cgroup/0
sleep 100 < cgroup/0 & # add refcnt to this dir
rmdir cgroup/0
+ sleep 1
# remount with some subsystems removed
# since 2.6.28, this remount will fail
@@ -510,6 +517,7 @@ test10()
mount -t cgroup none cgroup 2> /dev/null
mkdir cgroup/0
rmdir cgroup/0
+ sleep 1
umount cgroup/ 2> /dev/null
check_kernel_bug
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
2019-02-22 6:10 [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error Xu Yu
@ 2019-04-04 9:10 ` Cyril Hrubis
2019-04-10 7:04 ` Yu Xu
0 siblings, 1 reply; 4+ messages in thread
From: Cyril Hrubis @ 2019-04-04 9:10 UTC (permalink / raw)
To: ltp
Hi!
First of all sorry for the delayed response.
> Immediately `umount cgroup/` after `rmdir cgroup/0` is very likely to
> make the corresponding num_cgroups not decrease, and causes the
> following mount operation with overlapping subsys to fail.
>
> A demo test script can be:
> mount -t cgroup -o hugetlb,pids xxx cgroup/
> mkdir cgroup/0
> rmdir cgroup/0
> umount cgroup/
> mount -t cgroup -o pids xxx cgroup/ <-- FAIL
>
> The root cause is that `rmdir cgroup/0` is asynchronous in the kernel
> implementation, causing `umount cgroup/` to enter `cgroup_put` path,
> instead of `percpu_ref_kill` path.
>
> There is no good kernel solution yet[1]. Therefore, we temporarily add
> `sleep` in the test script to ensure `umount cgroup/` is executed
> after `rmdir cgroup/0` is completed. Note that we only add `sleep` in
> the clean up phase of each test in the cgroup_regression_test.sh.
> No `sleep` is added in the cgroup_regression_6_1.sh and
> cgroup_regression_10_1.sh for the sake of pressure test.
There is always better solution than sprinking the code with sleeps,
here we can retry the mount instead, which would be faster and more
reliable. And we even have functions for this see
https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines#retry-a-function-in-limited-time
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 4+ messages in thread
* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
2019-04-04 9:10 ` Cyril Hrubis
@ 2019-04-10 7:04 ` Yu Xu
2019-10-08 14:19 ` Cyril Hrubis
0 siblings, 1 reply; 4+ messages in thread
From: Yu Xu @ 2019-04-10 7:04 UTC (permalink / raw)
To: ltp
On 4/4/19 5:10 PM, Cyril Hrubis wrote:
> Hi!
> First of all sorry for the delayed response.
>
>> Immediately `umount cgroup/` after `rmdir cgroup/0` is very likely to
>> make the corresponding num_cgroups not decrease, and causes the
>> following mount operation with overlapping subsys to fail.
>>
>> A demo test script can be:
>> mount -t cgroup -o hugetlb,pids xxx cgroup/
>> mkdir cgroup/0
>> rmdir cgroup/0
>> umount cgroup/
>> mount -t cgroup -o pids xxx cgroup/ <-- FAIL
>>
>> The root cause is that `rmdir cgroup/0` is asynchronous in the kernel
>> implementation, causing `umount cgroup/` to enter `cgroup_put` path,
>> instead of `percpu_ref_kill` path.
>>
>> There is no good kernel solution yet[1]. Therefore, we temporarily add
>> `sleep` in the test script to ensure `umount cgroup/` is executed
>> after `rmdir cgroup/0` is completed. Note that we only add `sleep` in
>> the clean up phase of each test in the cgroup_regression_test.sh.
>> No `sleep` is added in the cgroup_regression_6_1.sh and
>> cgroup_regression_10_1.sh for the sake of pressure test.
>
> There is always better solution than sprinking the code with sleeps,
> here we can retry the mount instead, which would be faster and more
thanks for your kind reply!
retry mount cannot help here(at least for the current kernel), because
`umount cgroup/` **immediately** after `rmdir cgroup/0` corrupts the
cgroup_root refcnt, and it will always cause the subsequent remount
failure(s) whenever remounting overlapping controllers later.
so I simply put a sleep between between `umount cgroup/` and `rmdir
cgroup/0` to synchronize them. <- any good idea to this point?
and attach the related kernel issue again:
https://marc.info/?t=155021167400005&r=1&w=2
thanks, Yu.
> reliable. And we even have functions for this see
> https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines#retry-a-function-in-limited-time
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
2019-04-10 7:04 ` Yu Xu
@ 2019-10-08 14:19 ` Cyril Hrubis
0 siblings, 0 replies; 4+ messages in thread
From: Cyril Hrubis @ 2019-10-08 14:19 UTC (permalink / raw)
To: ltp
Hi!
> > There is always better solution than sprinking the code with sleeps,
> > here we can retry the mount instead, which would be faster and more
>
> thanks for your kind reply!
>
> retry mount cannot help here(at least for the current kernel), because
> `umount cgroup/` **immediately** after `rmdir cgroup/0` corrupts the
> cgroup_root refcnt, and it will always cause the subsequent remount
> failure(s) whenever remounting overlapping controllers later.
>
> so I simply put a sleep between between `umount cgroup/` and `rmdir
> cgroup/0` to synchronize them. <- any good idea to this point?
>
> and attach the related kernel issue again:
> https://marc.info/?t=155021167400005&r=1&w=2
We are not working around kernel bugs in tests.
Too bad that it seems that nobody is interested in fixing the v1 cgroup
API. I guess this is a signal that we should stop using it as fast as
possible and migrate to v2.
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-10-08 14:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-02-22 6:10 [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error Xu Yu
2019-04-04 9:10 ` Cyril Hrubis
2019-04-10 7:04 ` Yu Xu
2019-10-08 14:19 ` Cyril Hrubis
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox