public inbox for ltp@lists.linux.it
 help / color / mirror / Atom feed
* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
@ 2019-02-22  6:10 Xu Yu
  2019-04-04  9:10 ` Cyril Hrubis
  0 siblings, 1 reply; 4+ messages in thread
From: Xu Yu @ 2019-02-22  6:10 UTC (permalink / raw)
  To: ltp

Immediately `umount cgroup/` after `rmdir cgroup/0` is very likely to
make the corresponding num_cgroups not decrease, and causes the
following mount operation with overlapping subsys to fail.

A demo test script can be:
mount -t cgroup -o hugetlb,pids xxx cgroup/
mkdir cgroup/0
rmdir cgroup/0
umount cgroup/
mount -t cgroup -o pids xxx cgroup/  <-- FAIL

The root cause is that `rmdir cgroup/0` is asynchronous in the kernel
implementation, causing `umount cgroup/` to enter `cgroup_put` path,
instead of `percpu_ref_kill` path.

There is no good kernel solution yet[1]. Therefore, we temporarily add
`sleep` in the test script to ensure `umount cgroup/` is executed
after `rmdir cgroup/0` is completed. Note that we only add `sleep` in
the clean up phase of each test in the cgroup_regression_test.sh.
No `sleep` is added in the cgroup_regression_6_1.sh and
cgroup_regression_10_1.sh for the sake of pressure test.

[1] https://marc.info/?t=155021167400005&r=1&w=2

Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
 testcases/kernel/controllers/cgroup/cgroup_regression_test.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh b/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
index 6868609..f78d80a 100755
--- a/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
+++ b/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
@@ -145,6 +145,7 @@ test2()
 	fi
 
 	rmdir cgroup/0 cgroup/1
+	sleep 1
 	umount cgroup/
 }
 
@@ -193,6 +194,7 @@ test3()
 	wait $pid2 2>/dev/null
 
 	rmdir $cpu_subsys_path/0 2> /dev/null
+	sleep 1
 	umount cgroup/ 2> /dev/null
 	check_kernel_bug
 }
@@ -222,6 +224,7 @@ test4()
 	mount -t cgroup -o none,name=foo cgroup cgroup/
 	mkdir cgroup/0
 	rmdir cgroup/0
+	sleep 1
 	umount cgroup/
 
 	if dmesg | grep -q "MAX_LOCKDEP_SUBCLASSES too low"; then
@@ -302,6 +305,7 @@ test5()
 	kill -TERM $! > /dev/null
 	wait $! 2>/dev/null
 	rmdir $mntpoint/0
+	sleep 1
 	# Do NOT unmount pre-existent mountpoints...
 	[ -z "$mounted" ] && umount $mntpoint/
 	check_kernel_bug
@@ -339,6 +343,7 @@ test6()
 
 	mount -t cgroup -o ns xxx cgroup/ > /dev/null 2>&1
 	rmdir cgroup/[1-9]* > /dev/null 2>&1
+	sleep 1
 	umount cgroup/
 	check_kernel_bug
 }
@@ -373,6 +378,7 @@ test_7_1()
 	mkdir $subsys_path/0
 	sleep 100 < $subsys_path/0 &	# add refcnt to this dir
 	rmdir $subsys_path/0
+	sleep 1
 
 	# remount with new subsystems added
 	# since 2.6.28, this remount will fail
@@ -398,6 +404,7 @@ test_7_2()
 	mkdir cgroup/0
 	sleep 100 < cgroup/0 &	# add refcnt to this dir
 	rmdir cgroup/0
+	sleep 1
 
 	# remount with some subsystems removed
 	# since 2.6.28, this remount will fail
@@ -510,6 +517,7 @@ test10()
 	mount -t cgroup none cgroup 2> /dev/null
 	mkdir cgroup/0
 	rmdir cgroup/0
+	sleep 1
 	umount cgroup/ 2> /dev/null
 	check_kernel_bug
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
  2019-02-22  6:10 [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error Xu Yu
@ 2019-04-04  9:10 ` Cyril Hrubis
  2019-04-10  7:04   ` Yu Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Cyril Hrubis @ 2019-04-04  9:10 UTC (permalink / raw)
  To: ltp

Hi!
First of all sorry for the delayed response.

> Immediately `umount cgroup/` after `rmdir cgroup/0` is very likely to
> make the corresponding num_cgroups not decrease, and causes the
> following mount operation with overlapping subsys to fail.
> 
> A demo test script can be:
> mount -t cgroup -o hugetlb,pids xxx cgroup/
> mkdir cgroup/0
> rmdir cgroup/0
> umount cgroup/
> mount -t cgroup -o pids xxx cgroup/  <-- FAIL
> 
> The root cause is that `rmdir cgroup/0` is asynchronous in the kernel
> implementation, causing `umount cgroup/` to enter `cgroup_put` path,
> instead of `percpu_ref_kill` path.
> 
> There is no good kernel solution yet[1]. Therefore, we temporarily add
> `sleep` in the test script to ensure `umount cgroup/` is executed
> after `rmdir cgroup/0` is completed. Note that we only add `sleep` in
> the clean up phase of each test in the cgroup_regression_test.sh.
> No `sleep` is added in the cgroup_regression_6_1.sh and
> cgroup_regression_10_1.sh for the sake of pressure test.

There is always better solution than sprinking the code with sleeps,
here we can retry the mount instead, which would be faster and more
reliable. And we even have functions for this see
https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines#retry-a-function-in-limited-time

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
  2019-04-04  9:10 ` Cyril Hrubis
@ 2019-04-10  7:04   ` Yu Xu
  2019-10-08 14:19     ` Cyril Hrubis
  0 siblings, 1 reply; 4+ messages in thread
From: Yu Xu @ 2019-04-10  7:04 UTC (permalink / raw)
  To: ltp

On 4/4/19 5:10 PM, Cyril Hrubis wrote:
> Hi!
> First of all sorry for the delayed response.
> 
>> Immediately `umount cgroup/` after `rmdir cgroup/0` is very likely to
>> make the corresponding num_cgroups not decrease, and causes the
>> following mount operation with overlapping subsys to fail.
>>
>> A demo test script can be:
>> mount -t cgroup -o hugetlb,pids xxx cgroup/
>> mkdir cgroup/0
>> rmdir cgroup/0
>> umount cgroup/
>> mount -t cgroup -o pids xxx cgroup/  <-- FAIL
>>
>> The root cause is that `rmdir cgroup/0` is asynchronous in the kernel
>> implementation, causing `umount cgroup/` to enter `cgroup_put` path,
>> instead of `percpu_ref_kill` path.
>>
>> There is no good kernel solution yet[1]. Therefore, we temporarily add
>> `sleep` in the test script to ensure `umount cgroup/` is executed
>> after `rmdir cgroup/0` is completed. Note that we only add `sleep` in
>> the clean up phase of each test in the cgroup_regression_test.sh.
>> No `sleep` is added in the cgroup_regression_6_1.sh and
>> cgroup_regression_10_1.sh for the sake of pressure test.
> 
> There is always better solution than sprinking the code with sleeps,
> here we can retry the mount instead, which would be faster and more

thanks for your kind reply!

retry mount cannot help here(at least for the current kernel), because 
`umount cgroup/` **immediately** after `rmdir cgroup/0` corrupts the 
cgroup_root refcnt, and it will always cause the subsequent remount 
failure(s) whenever remounting overlapping controllers later.

so I simply put a sleep between between `umount cgroup/` and `rmdir 
cgroup/0` to synchronize them.  <- any good idea to this point?

and attach the related kernel issue again:
https://marc.info/?t=155021167400005&r=1&w=2

thanks, Yu.

> reliable. And we even have functions for this see
> https://github.com/linux-test-project/ltp/wiki/Test-Writing-Guidelines#retry-a-function-in-limited-time
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error
  2019-04-10  7:04   ` Yu Xu
@ 2019-10-08 14:19     ` Cyril Hrubis
  0 siblings, 0 replies; 4+ messages in thread
From: Cyril Hrubis @ 2019-10-08 14:19 UTC (permalink / raw)
  To: ltp

Hi!
> > There is always better solution than sprinking the code with sleeps,
> > here we can retry the mount instead, which would be faster and more
> 
> thanks for your kind reply!
> 
> retry mount cannot help here(at least for the current kernel), because 
> `umount cgroup/` **immediately** after `rmdir cgroup/0` corrupts the 
> cgroup_root refcnt, and it will always cause the subsequent remount 
> failure(s) whenever remounting overlapping controllers later.
> 
> so I simply put a sleep between between `umount cgroup/` and `rmdir 
> cgroup/0` to synchronize them.  <- any good idea to this point?
> 
> and attach the related kernel issue again:
> https://marc.info/?t=155021167400005&r=1&w=2

We are not working around kernel bugs in tests.

Too bad that it seems that nobody is interested in fixing the v1 cgroup
API. I guess this is a signal that we should stop using it as fast as
possible and migrate to v2.

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-10-08 14:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-02-22  6:10 [LTP] [PATCH] controllers/cgroup_regression_test.sh: mitigate potential mount error Xu Yu
2019-04-04  9:10 ` Cyril Hrubis
2019-04-10  7:04   ` Yu Xu
2019-10-08 14:19     ` Cyril Hrubis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox