public inbox for cgroups@vger.kernel.org
From: "Jarkko Sakkinen" <jarkko@kernel.org>
To: "Haitao Huang" <haitao.huang@linux.intel.com>,
	<dave.hansen@linux.intel.com>, <kai.huang@intel.com>,
	<tj@kernel.org>, <mkoutny@suse.com>,
	<linux-kernel@vger.kernel.org>, <linux-sgx@vger.kernel.org>,
	<x86@kernel.org>, <cgroups@vger.kernel.org>, <tglx@linutronix.de>,
	<mingo@redhat.com>, <bp@alien8.de>, <hpa@zytor.com>,
	<sohil.mehta@intel.com>, <tim.c.chen@linux.intel.com>
Cc: <zhiquan1.li@intel.com>, <kristen@linux.intel.com>,
	<seanjc@google.com>, <zhanb@microsoft.com>,
	<anakrish@microsoft.com>, <mikko.ylinen@linux.intel.com>,
	<yangjie@microsoft.com>, <chrisyan@microsoft.com>
Subject: Re: [PATCH v11 14/14] selftests/sgx: Add scripts for EPC cgroup testing
Date: Sun, 14 Apr 2024 00:34:17 +0300
Message-ID: <D0JBFRTGWZV9.3TRHOTV0SCGV@kernel.org>
In-Reply-To: <20240410182558.41467-15-haitao.huang@linux.intel.com>

On Wed Apr 10, 2024 at 9:25 PM EEST, Haitao Huang wrote:
> To run selftests for EPC cgroup:
>
> sudo ./run_epc_cg_selftests.sh
>
> To watch misc cgroup 'current' changes during testing, run this in a
> separate terminal:
>
> ./watch_misc_for_tests.sh current
>
> With different cgroups, the script starts one or multiple concurrent SGX
> selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
> test case, which loads an enclave of EPC size equal to the EPC capacity
> available on the platform. The script checks results against the
> expectation set for each cgroup and reports success or failure.
>
> The script creates 3 different cgroups at the beginning with the following
> expectations:
>
> 1) SMALL - intentionally small enough to fail the test loading an
> enclave of size equal to the capacity.
> 2) LARGE - large enough to run up to 4 concurrent tests but fail some if
> more than 4 concurrent tests are run. The script starts 4 expecting at
> least one test to pass, and then starts 5 expecting at least one test
> to fail.
> 3) LARGER - limit is the same as the capacity, large enough to run lots of
> concurrent tests. The script starts 8 of them and expects all to pass.
> Then it reruns the same test with one process randomly killed and
> checks that usage is zero after all processes exit.
>
> The script also includes a test with a low mem_cg limit and a LARGE sgx_epc
> limit to verify that the RAM used for per-cgroup reclamation is charged
> to the proper mem_cg. For this test, it turns off swapping before starting
> and turns swapping back on afterwards.
>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> ---
> V11:
> - Remove cgroups-tools dependency and make scripts ash compatible. (Jarkko)
> - Drop support for cgroup v1 and simplify. (Michal, Jarkko)
> - Add documentation for functions. (Jarkko)
> - Turn off swapping before memcontrol tests and back on after
> - Format and style fixes, name for hard coded values
>
> V7:
> - Added memcontrol test.
>
> V5:
> - Added script with automatic results checking, remove the interactive
> script.
> - The script can run independent from the series below.
> ---
>  tools/testing/selftests/sgx/ash_cgexec.sh     |  16 +
>  .../selftests/sgx/run_epc_cg_selftests.sh     | 275 ++++++++++++++++++
>  .../selftests/sgx/watch_misc_for_tests.sh     |  11 +
>  3 files changed, 302 insertions(+)
>  create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh
>  create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
>  create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh
>
> diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh b/tools/testing/selftests/sgx/ash_cgexec.sh
> new file mode 100755
> index 000000000000..cfa5d2b0e795
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/ash_cgexec.sh
> @@ -0,0 +1,16 @@
> +#!/usr/bin/env sh
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2024 Intel Corporation.
> +
> +# Start a program in a given cgroup.
> +# Supports V2 cgroup paths, relative to /sys/fs/cgroup
> +if [ "$#" -lt 2 ]; then
> +    echo "Usage: $0 <v2 cgroup path> <command> [args...]"
> +    exit 1
> +fi
> +# Move this shell to the cgroup.
> +echo 0 >/sys/fs/cgroup/$1/cgroup.procs
> +shift
> +# Execute the command within the cgroup
> +exec "$@"
> +
> diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
> new file mode 100755
> index 000000000000..dd56273056fc
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
> @@ -0,0 +1,275 @@
> +#!/usr/bin/env sh
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2023, 2024 Intel Corporation.
> +
> +TEST_ROOT_CG=selftest
> +TEST_CG_SUB1=$TEST_ROOT_CG/test1
> +TEST_CG_SUB2=$TEST_ROOT_CG/test2
> +# We will only set limit in test1 and run tests in test3
> +TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
> +TEST_CG_SUB4=$TEST_ROOT_CG/test4
> +
> +# Cgroup v2 only
> +CG_ROOT=/sys/fs/cgroup
> +mkdir -p $CG_ROOT/$TEST_CG_SUB1
> +mkdir -p $CG_ROOT/$TEST_CG_SUB2
> +mkdir -p $CG_ROOT/$TEST_CG_SUB3
> +mkdir -p $CG_ROOT/$TEST_CG_SUB4
> +
> +# Turn on misc and memory controller in non-leaf nodes
> +echo "+misc" >  $CG_ROOT/cgroup.subtree_control && \
> +echo "+memory" > $CG_ROOT/cgroup.subtree_control && \
> +echo "+misc" >  $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
> +echo "+memory" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
> +echo "+misc" >  $CG_ROOT/$TEST_CG_SUB1/cgroup.subtree_control
> +if [ $? -ne 0 ]; then
> +    echo "# Failed setting up cgroups, make sure misc and memory cgroups are enabled."
> +    exit 1
> +fi
> +
> +CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}')
> +# This is below the number of VA pages needed for an enclave of capacity
> +# size, so the oversubscribed cases should fail.
> +SMALL=$(( CAPACITY / 512 ))
> +
> +# Load at least one enclave of capacity size successfully, maybe up to 4.
> +# But some may fail if we run more than 4 concurrent enclaves of capacity size.
> +LARGE=$(( SMALL * 4 ))
> +
> +# Load lots of enclaves
> +LARGER=$CAPACITY
> +echo "# Setting up limits."
> +echo "sgx_epc $SMALL" > $CG_ROOT/$TEST_CG_SUB1/misc.max && \
> +echo "sgx_epc $LARGE" >  $CG_ROOT/$TEST_CG_SUB2/misc.max && \
> +echo "sgx_epc $LARGER" > $CG_ROOT/$TEST_CG_SUB4/misc.max
> +if [ $? -ne 0 ]; then
> +    echo "# Failed setting up misc limits."
> +    exit 1
> +fi
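
To make the limit arithmetic above concrete, here is a minimal sketch that
derives the three limits from a capacity value. The function name epc_limits
and the 4 GiB capacity are hypothetical, chosen only for illustration; the
real script reads the capacity from misc.capacity.

```sh
#!/usr/bin/env sh
# Hypothetical helper (not part of the patch): derive the three limits
# the same way the script does, from a given capacity value.
epc_limits() {
    capacity=$1
    small=$(( capacity / 512 ))   # SMALL: CAPACITY / 512
    large=$(( small * 4 ))        # LARGE: 4 * SMALL
    larger=$capacity              # LARGER: same as the capacity
    echo "$small $large $larger"
}

# Assumed example value: 4 GiB expressed in bytes.
epc_limits 4294967296
```

With that assumed capacity, SMALL is 8388608, LARGE is 33554432 and LARGER
is 4294967296, so LARGE works out to 1/128 of the capacity.
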
> +
> +clean_up()
> +{
> +    sleep 2
> +    rmdir $CG_ROOT/$TEST_CG_SUB2
> +    rmdir $CG_ROOT/$TEST_CG_SUB3
> +    rmdir $CG_ROOT/$TEST_CG_SUB4
> +    rmdir $CG_ROOT/$TEST_CG_SUB1
> +    rmdir $CG_ROOT/$TEST_ROOT_CG
> +}
> +
> +timestamp=$(date +%Y%m%d_%H%M%S)
> +
> +test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed"
> +
> +PROCESS_SUCCESS=1
> +PROCESS_FAILURE=0
> +
> +# Wait for a process and check for expected exit status.
> +#
> +# Arguments:
> +#	$1 - the pid of the process to wait and check.
> +#	$2 - 1 if expecting success, 0 for failure.
> +#
> +# Return:
> +#	0 if the exit status of the process matches the expectation.
> +#	1 otherwise.
> +wait_check_process_status() {
> +    pid=$1
> +    check_for_success=$2
> +
> +    wait "$pid"
> +    status=$?
> +
> +    if [ $check_for_success -eq $PROCESS_SUCCESS ] && [ $status -eq 0 ]; then
> +        echo "# Process $pid succeeded."
> +        return 0
> +    elif [ $check_for_success -eq $PROCESS_FAILURE ] && [ $status -ne 0 ]; then
> +        echo "# Process $pid returned failure."
> +        return 0
> +    fi
> +    return 1
> +}
> +
> +# Wait for a set of processes and check for expected exit status
> +#
> +# Arguments:
> +#	$1 - 1 if expecting success, 0 for failure.
> +#	remaining args - The pids of the processes
> +#
> +# Return:
> +#	0 if exit status of any process matches the expectation.
> +#	1 otherwise.
> +wait_and_detect_for_any() {
> +    check_for_success=$1
> +
> +    shift
> +    detected=1 # 0 for success detection
> +
> +    for pid in $@; do
> +        if wait_check_process_status "$pid" "$check_for_success"; then
> +            detected=0
> +            # Wait for other processes to exit
> +        fi
> +    done
> +
> +    return $detected
> +}
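
The "any" semantics above can be tried outside of SGX with stand-in jobs.
This is only a sketch under assumptions: any_matches is a hypothetical,
condensed version of the two helpers, it takes the pids as separate
arguments, and "true" and "false" stand in for test_sgx runs.

```sh
#!/usr/bin/env sh
# Hypothetical condensed helper (not part of the patch).
# $1: 1 to look for a success, 0 to look for a failure.
# Remaining arguments: pids to wait for.
# Returns 0 if at least one process ended with the outcome asked for.
any_matches() {
    want_success=$1
    shift
    found=1
    for pid in "$@"; do
        # Always wait for every pid so no job is left running.
        wait "$pid"
        status=$?
        if [ "$want_success" -eq 1 ] && [ "$status" -eq 0 ]; then
            found=0
        elif [ "$want_success" -eq 0 ] && [ "$status" -ne 0 ]; then
            found=0
        fi
    done
    return $found
}

# Two stand-in jobs: one succeeds, one fails.
true & ok_pid=$!
false & bad_pid=$!
if any_matches 0 "$ok_pid" "$bad_pid"; then
    echo "# at least one job failed"
fi
```

Asking for a failure and treating a hit as an error is how the LARGER case
turns "any" into "all must pass".
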
> +
> +echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..."
> +# Always use leaf node of misc cgroups
> +# these may fail on OOM
> +./ash_cgexec.sh $TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1
> +if [ $? -eq 0 ]; then
> +    echo "# Fail on SMALL limit, not expecting any test passes."
> +    clean_up
> +    exit 1
> +else
> +    echo "# Test failed as expected."
> +fi
> +
> +echo "# PASSED SMALL limit."
> +
> +echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
> +        expecting at least one success...."
> +
> +pids=""
> +for i in 1 2 3 4; do
> +    (
> +        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1
> +    ) &
> +    pids="$pids $!"
> +done
> +
> +
> +if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
> +    echo "# PASSED LARGE limit positive testing."
> +else
> +    echo "# Failed on LARGE limit positive testing, no test passes."
> +    clean_up
> +    exit 1
> +fi
> +
> +echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
> +        expecting at least one failure...."
> +pids=""
> +for i in 1 2 3 4 5; do
> +    (
> +        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_negative_$timestamp.$i.log 2>&1
> +    ) &
> +    pids="$pids $!"
> +done
> +
> +if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
> +    echo "# PASSED LARGE limit negative testing."
> +else
> +    echo "# Failed on LARGE limit negative testing, no test fails."
> +    clean_up
> +    exit 1
> +fi
> +
> +echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
> +        expecting no failure...."
> +pids=""
> +for i in 1 2 3 4 5 6 7 8; do
> +    (
> +        ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1
> +    ) &
> +    pids="$pids $!"
> +done
> +
> +if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
> +    echo "# Failed on LARGER limit, at least one test fails."
> +    clean_up
> +    exit 1
> +else
> +    echo "# PASSED LARGER limit tests."
> +fi
> +
> +echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
> +      randomly kill one, expecting no failure...."
> +pids=""
> +for i in 1 2 3 4 5 6 7 8; do
> +    (
> +        ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_kill_$timestamp.$i.log 2>&1
> +    ) &
> +    pids="$pids $!"
> +done
> +random_number=$(awk 'BEGIN{srand();print int(rand()*5)}')
> +sleep $((random_number + 1))
> +
> +# Randomly select a process to kill
> +# Make sure usage counter not leaked at the end.
> +RANDOM_INDEX=$(awk 'BEGIN{srand();print int(rand()*8)}')
> +counter=0
> +for pid in $pids; do
> +    if [ "$counter" -eq "$RANDOM_INDEX" ]; then
> +        PID_TO_KILL=$pid
> +        break
> +    fi
> +    counter=$((counter + 1))
> +done
> +
> +kill $PID_TO_KILL
> +echo "# Killed process with PID: $PID_TO_KILL"
> +
> +any_failure=0
> +for pid in $pids; do
> +    wait "$pid"
> +    status=$?
> +    if [ "$pid" != "$PID_TO_KILL" ]; then
> +        if [ $status -ne 0 ]; then
> +            echo "# Process $pid returned failure."
> +            any_failure=1
> +        fi
> +    fi
> +done
> +
> +if [ $any_failure -ne 0 ]; then
> +    echo "# Failed on random killing, at least one test fails."
> +    clean_up
> +    exit 1
> +fi
> +echo "# PASSED LARGER limit test with a process randomly killed."
> +
> +MEM_LIMIT_TOO_SMALL=$((CAPACITY - 2 * LARGE))
> +
> +echo "$MEM_LIMIT_TOO_SMALL" > $CG_ROOT/$TEST_CG_SUB2/memory.max
> +if [ $? -ne 0 ]; then
> +    echo "# Failed setting memory limit."
> +    clean_up
> +    exit 1
> +fi
> +
> +echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE EPC limit,
> +        and too small RAM limit, expecting all failures...."
> +# Ensure swapping off so the OOM killer is activated when mem_cgroup limit is hit.
> +swapoff -a
> +pids=""
> +for i in 1 2 3 4; do
> +    (
> +        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_oom_$timestamp.$i.log 2>&1
> +    ) &
> +    pids="$pids $!"
> +done
> +
> +if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
> +    echo "# Failed on tests with memcontrol, some tests did not fail."
> +    clean_up
> +    swapon -a
> +    exit 1
> +else
> +    swapon -a
> +    echo "# PASSED LARGE limit tests with memcontrol."
> +fi
> +
> +sleep 2
> +
> +USAGE=$(grep '^sgx_epc' "$CG_ROOT/$TEST_ROOT_CG/misc.current" | awk '{print $2}')
> +if [ "$USAGE" -ne 0 ]; then
> +    echo "# Failed: Final usage is $USAGE, not 0."
> +else
> +    echo "# PASSED leakage check."
> +    echo "# PASSED ALL cgroup limit tests, cleanup cgroups..."
> +fi
> +clean_up
> +echo "# done."
> diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
> new file mode 100755
> index 000000000000..1c9985726ace
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
> @@ -0,0 +1,11 @@
> +#!/usr/bin/env sh
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2023, 2024 Intel Corporation.
> +
> +if [ -z "$1" ]; then
> +    echo "No argument supplied, please provide 'max', 'current', or 'events'"
> +    exit 1
> +fi
> +
> +watch -n 1 'find /sys/fs/cgroup -wholename "*/test*/misc.'$1'" -exec \
> +    sh -c '\''echo "$1:"; cat "$1"'\'' _ {} \;'

I'll compile the kernel with this and see what happens!

Have you tried running the test suite from the top level? This is just a
sanity check. I have forgotten to do this a few times myself, hence the
question :-)
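
For reference, a minimal sketch of a top-level run, assuming the standard
kselftest TARGETS mechanism. The wrapper name run_sgx_selftests and its
argument are hypothetical. Note that the diffstat above touches only the
three scripts, so whether they are picked up this way depends on the sgx
selftest Makefile.

```sh
#!/usr/bin/env sh
# Hypothetical wrapper (not part of the patch): run the SGX selftests
# through the top-level kselftest Makefile of a kernel source tree.
# $1: path to the kernel source tree (defaults to the current directory).
run_sgx_selftests() {
    ksrc=${1:-.}
    if [ ! -f "$ksrc/tools/testing/selftests/Makefile" ]; then
        echo "# $ksrc does not look like a kernel source tree" >&2
        return 1
    fi
    make -C "$ksrc/tools/testing/selftests" TARGETS=sgx run_tests
}

# Usage, from anywhere:
#   run_sgx_selftests /path/to/linux
```
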

BR, Jarkko
