public inbox for ltp@lists.linux.it
* [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test
@ 2012-07-17 11:00 preeti
  2012-07-18  3:02 ` Wanlong Gao
  0 siblings, 1 reply; 6+ messages in thread
From: preeti @ 2012-07-17 11:00 UTC (permalink / raw)
  To: ltp-list@lists.sourceforge.net

Hi

The included test case is a simple stress test for cpu hotplug. In a loop, it offlines the cpus that are online and onlines the cpus that are offline.

This stress test failed on certain distros when the loop was run indefinitely. The test is presented here for review of correctness and necessity, as this is our first attempt at contributing test cases to LTP.

The test is meant to be included under the testcases/kernel/hotplug/cpu_hotplug/functional directory.

Regards
Preeti
---

# File		:	stress_cpu_hotplug.sh
# Description	:	Switches the online state of all the cpus in  a
# 			loop to test the robustness of cpu hotplug
#		:	The loop iteration of 20 is a randomly chosen number

#! /bin/bash

# Includes:
LHCS_PATH=${LHCS_PATH:-$LTPROOT/testcases/bin/cpu_hotplug}
. $LHCS_PATH/include/hotplug.fns
. $LHCS_PATH/include/testsuite.fns

setup()
{
	export TST_TOTAL=1
	export TCID="setup"
	export TST_COUNT=0

	trap "cleanup" 0
	RC=0

	return $RC

}
cleanup()
{
	set_all_cpu_state "$STATE"
}

test01()
{

	TCID="stress_cpu_hotplug"
	TST_COUNT=1
	RC=0

	NUMBER_OF_CPUS=`ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l`

	cd /sys/devices/system/cpu

	for ARRAY_INDEX in `seq 20`
	do
		for ((i=1; i < NUMBER_OF_CPUS; i++ ))
		do
			#skip the boot cpu;cannot offline it
			if [ $i -eq 0 ]
			then
				continue
			fi

        		state=`cat cpu$i/online`
		        if [ $state -eq 0 ]
        		then
				RC=online_cpu $i
        		else
                		RC=offline_cpu $i
        		fi

			if [ $RC -ne 0 ]
			then
				test_brkm TBROK NULL "stress_cpu_hotplug:
							cpu$i failed to hotplug"
				return $RC
			fi
		done

		if [ `expr $ARRAY_INDEX % 10` == 0 ]
		then
			echo "stress test successfully completed
				"$ARRAY_INDEX" times">$LTPTMP/test_file.out
		fi
	done
	test_res TPASS $LTPTMP/test_file.out "stress_cpu_hotplug:SUCCESS"
	return $RC
}

#main

RC=0
LTPTMP=${TMP}

#create output file to dump test results
touch $LTPTMP/test_file.out || RC=$?

if [ $RC -ne 0 ]
then
	test_resm TFAIL "Failed to create output file under temp directory"
	exit $RC
fi

if ! get_all_cpus >/dev/null 2>$RC;
then
	tst_brkm TCONF "system does not have required cpu hotplug support"
	exit $RC
fi

setup || exit $RC

#capture the initial state of the cpus
state=`cd /sys/devices/system/cpu/ && grep '' */online | sed -e
's/\/online//g' -e 's/\ /\n/g'`

test01 || exit $RC


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Ltp-list mailing list
Ltp-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ltp-list


* Re: [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test
  2012-07-17 11:00 [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test preeti
@ 2012-07-18  3:02 ` Wanlong Gao
  2012-07-18  3:41   ` preeti
  0 siblings, 1 reply; 6+ messages in thread
From: Wanlong Gao @ 2012-07-18  3:02 UTC (permalink / raw)
  To: preeti; +Cc: ltp-list@lists.sourceforge.net

Hi Preeti,

> Hi
> 
> The test case included is a simple case for cpu hotplug.It does offlines the cpus that are online and does an online of the offlined cpus in a loop
> 
> This stress test had failed on certain distros when the loop was run infinite times.This test is presented here for review of correctness and necessity,as this is the first attempt at contributing test cases to LTP from this end.
> 
> The test is meant to be included under the testcases/kernel/hotplug/cpu_hotplug/functional directory.

Why didn't you send this in patch format?
Some comments below.

> 
> Regards
> Preeti
> ---
> 
> # File		:	stress_cpu_hotplug.sh
> # Description	:	Switches the online state of all the cpus in  a
> # 			loop to test the robustness of cpu hotplug
> #		:	The loop iteration of 20 is a randomly chosen number
> 
> #! /bin/bash
> 
> # Includes:
> LHCS_PATH=${LHCS_PATH:-$LTPROOT/testcases/bin/cpu_hotplug}
> . $LHCS_PATH/include/hotplug.fns
> . $LHCS_PATH/include/testsuite.fns
> 
> setup()
> {
> 	export TST_TOTAL=1
> 	export TCID="setup"
> 	export TST_COUNT=0
> 
> 	trap "cleanup" 0
> 	RC=0
> 
> 	return $RC
> 
> }
> cleanup()
> {
> 	set_all_cpu_state "$STATE"

I can't find the definition of "$STATE" in your test script.

> }
> 
> test01()
> {
> 
> 	TCID="stress_cpu_hotplug"
> 	TST_COUNT=1
> 	RC=0
> 
> 	NUMBER_OF_CPUS=`ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l`
> 
> 	cd /sys/devices/system/cpu
> 
> 	for ARRAY_INDEX in `seq 20`
> 	do
> 		for ((i=1; i < NUMBER_OF_CPUS; i++ ))
> 		do
> 			#skip the boot cpu;cannot offline it
> 			if [ $i -eq 0 ]
> 			then
> 				continue
> 			fi
> 
>         		state=`cat cpu$i/online`
> 		        if [ $state -eq 0 ]
>         		then
> 				RC=online_cpu $i
>         		else
>                 		RC=offline_cpu $i
>         		fi

Can it always succeed? I suppose it needs a short sleep to allow for the online/offline delay.
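
A related point on whether a failure would be detected at all: in bash, `RC=online_cpu $i` does not call `online_cpu`. It is read as a temporary assignment `RC=online_cpu` for a command named by the value of `$i`, so the helper is never invoked and its status is never stored. A minimal sketch of capturing a function's exit status, using a hypothetical stand-in `fake_hotplug` rather than the real helpers from hotplug.fns:

```shell
#!/bin/bash
# Hypothetical stand-in for online_cpu/offline_cpu: succeeds for even
# CPU numbers and fails for odd ones, so both paths can be shown.
fake_hotplug()
{
	[ $(( $1 % 2 )) -eq 0 ]
}

# Call the helper as a command, then read its exit status from $?.
toggle_and_report()
{
	local cpu=$1
	local rc=0

	fake_hotplug "$cpu" || rc=$?
	if [ "$rc" -ne 0 ]; then
		echo "cpu$cpu failed to hotplug (rc=$rc)"
	else
		echo "cpu$cpu toggled"
	fi
	return "$rc"
}

toggle_and_report 2
toggle_and_report 3 || true
```

In the test script the same pattern would presumably read `online_cpu $i || RC=$?`, assuming the helpers return non-zero on failure.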

> 
> 			if [ $RC -ne 0 ]
> 			then
> 				test_brkm TBROK NULL "stress_cpu_hotplug:
> 							cpu$i failed to hotplug"
> 				return $RC
> 			fi
> 		done
> 
> 		if [ `expr $ARRAY_INDEX % 10` == 0 ]
> 		then
> 			echo "stress test successfully completed
> 				"$ARRAY_INDEX" times">$LTPTMP/test_file.out

Every 10 times means a successful test?

> 		fi
> 	done
> 	test_res TPASS $LTPTMP/test_file.out "stress_cpu_hotplug:SUCCESS"
> 	return $RC
> }
> 
> #main
> 
> RC=0
> LTPTMP=${TMP}
> 
> #create output file to dump test results
> touch $LTPTMP/test_file.out || RC=$?
> 
> if [ $RC -ne 0 ]
> then
> 	test_resm TFAIL "Failed to create output file under temp directory"
> 	exit $RC
> fi
> 
> if ! get_all_cpus >/dev/null 2>$RC;
> then
> 	tst_brkm TCONF "system does not have required cpu hotplug support"
> 	exit $RC
> fi
> 
> setup || exit $RC
> 
> #capture the initial state of the cpus
> state=`cd /sys/devices/system/cpu/ && grep '' */online | sed -e
> 's/\/online//g' -e 's/\ /\n/g'`

This is the "STATE"?

Does the output of get_all_cpu_states() not suit set_all_cpu_states()?
If not, please fix it.

> 
> test01 || exit $RC

Don't you want to clean up and reset the cpu state after the test?

Thanks,
Wanlong Gao


* Re: [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test
  2012-07-18  3:02 ` Wanlong Gao
@ 2012-07-18  3:41   ` preeti
  2012-07-18  3:42     ` Wanlong Gao
  0 siblings, 1 reply; 6+ messages in thread
From: preeti @ 2012-07-18  3:41 UTC (permalink / raw)
  To: gaowanlong
  Cc: ltp-list@lists.sourceforge.net,
	Mailing list for the Energy Management India Team

On 07/18/2012 08:32 AM, Wanlong Gao wrote:
> Hi Preeti,
> 
>> Hi
>>
>> The test case included is a simple case for cpu hotplug.It does offlines the cpus that are online and does an online of the offlined cpus in a loop
>>
>> This stress test had failed on certain distros when the loop was run infinite times.This test is presented here for review of correctness and necessity,as this is the first attempt at contributing test cases to LTP from this end.
>>
>> The test is meant to be included under the testcases/kernel/hotplug/cpu_hotplug/functional directory.
> 
> Why didn't you send this as a patch format?

This was a first attempt at sending test cases to LTP, so I thought I would get it
reviewed as an RFC first.

> Some comments below.
> 
>>
>> Regards
>> Preeti
>> ---
>>
>> # File		:	stress_cpu_hotplug.sh
>> # Description	:	Switches the online state of all the cpus in  a
>> # 			loop to test the robustness of cpu hotplug
>> #		:	The loop iteration of 20 is a randomly chosen number
>>
>> #! /bin/bash
>>
>> # Includes:
>> LHCS_PATH=${LHCS_PATH:-$LTPROOT/testcases/bin/cpu_hotplug}
>> . $LHCS_PATH/include/hotplug.fns
>> . $LHCS_PATH/include/testsuite.fns
>>
>> setup()
>> {
>> 	export TST_TOTAL=1
>> 	export TCID="setup"
>> 	export TST_COUNT=0
>>
>> 	trap "cleanup" 0
>> 	RC=0
>>
>> 	return $RC
>>
>> }
>> cleanup()
>> {
>> 	set_all_cpu_state "$STATE"
> 
> I can't find the definition of "$STATE" in your test script.

I apologise for this typo. It needs to be $state, as you have pointed out
below.
> 
>> }
>>
>> test01()
>> {
>>
>> 	TCID="stress_cpu_hotplug"
>> 	TST_COUNT=1
>> 	RC=0
>>
>> 	NUMBER_OF_CPUS=`ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l`
>>
>> 	cd /sys/devices/system/cpu
>>
>> 	for ARRAY_INDEX in `seq 20`
>> 	do
>> 		for ((i=1; i < NUMBER_OF_CPUS; i++ ))
>> 		do
>> 			#skip the boot cpu;cannot offline it
>> 			if [ $i -eq 0 ]
>> 			then
>> 				continue
>> 			fi
>>
>>         		state=`cat cpu$i/online`
>> 		        if [ $state -eq 0 ]
>>         		then
>> 				RC=online_cpu $i
>>         		else
>>                 		RC=offline_cpu $i
>>         		fi
> 
> Can it always success? I suppose that it need a bit sleep for the online/offline time delay.

It does not need a sleep because each pass of the loop onlines or offlines
different cpus, for example: cpu1->1, cpu2->0, cpu3->1. It therefore takes one
complete pass before cpu1->0 occurs, which is enough time for an online or
offline operation on a cpu to finish.

Besides this, the test has been run on RHEL distros before and has
succeeded. Only snapshot 5 of RHEL 6.3 fails, after running for a few seconds,
which is equivalent to nearly two loops.
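
If loop timing ever proves insufficient, one alternative to a fixed sleep is to poll the `online` file until it shows the requested state. A minimal sketch, assuming a hypothetical helper `wait_for_cpu_state` (not part of hotplug.fns) and a `SYSFS_CPU` variable, introduced here only so the helper can be tried against a scratch directory:

```shell
#!/bin/bash
# SYSFS_CPU is an assumption for illustration; it defaults to the real sysfs path.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

# wait_for_cpu_state CPU WANT [TRIES]
# Polls cpuCPU/online until it reads WANT, giving up after TRIES attempts.
wait_for_cpu_state()
{
	local cpu=$1
	local want=$2
	local tries=${3:-50}
	local cur

	while [ "$tries" -gt 0 ]; do
		cur=$(cat "$SYSFS_CPU/cpu$cpu/online" 2>/dev/null)
		[ "$cur" = "$want" ] && return 0
		tries=$(( tries - 1 ))
		sleep 0.1	# fractional sleep as supported by GNU coreutils
	done
	return 1
}
```

A caller could then write the new state and run `wait_for_cpu_state $i 0 || RC=1` before moving on to the next cpu.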
> 
>>
>> 			if [ $RC -ne 0 ]
>> 			then
>> 				test_brkm TBROK NULL "stress_cpu_hotplug:
>> 							cpu$i failed to hotplug"
>> 				return $RC
>> 			fi
>> 		done
>>
>> 		if [ `expr $ARRAY_INDEX % 10` == 0 ]
>> 		then
>> 			echo "stress test successfully completed
>> 				"$ARRAY_INDEX" times">$LTPTMP/test_file.out
> 
> Every 10 times means a successful test?

Not really. This message is intended to tell us after how many runs of the cpu
hotplug operation across all the cpus the machine fails to withstand the
stress. It might fail after running the loop 100 times or fail within 50
iterations. Also, 20 is a very small number for this stress test. It should
typically run 100 times.

I have captured the state of the stress test every 10 iterations instead
of logging every iteration. So, for example, if the test is meant to run 100
times but fails on some distro after 30 loops, the logged message tells
us that the distro withstood the test for at least 30 loops, if not for the
entire duration.
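
The checkpoint logging described here can be sketched as follows. The names `run_iterations`, `ITERATIONS` and `LOG_EVERY` are assumptions for illustration, and the loop body is a placeholder where the per-cpu toggling would go. Shell arithmetic stands in for the `expr` call:

```shell
#!/bin/bash
ITERATIONS=${ITERATIONS:-100}	# assumed default, following the figure of 100 mentioned above
LOG_EVERY=${LOG_EVERY:-10}

# run_iterations LOGFILE
# Overwrites LOGFILE at every LOG_EVERY-th iteration, so after a failure
# or hang the file shows the last checkpoint that was reached.
run_iterations()
{
	local logfile=$1
	local n=1

	while [ "$n" -le "$ITERATIONS" ]; do
		: # placeholder: toggle every hotpluggable cpu here
		if [ $(( n % LOG_EVERY )) -eq 0 ]; then
			echo "$n iterations completed" > "$logfile"
		fi
		n=$(( n + 1 ))
	done
}
```

Making the iteration count a variable also lets the same script serve as a short sanity run or a longer stress run without editing it.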
> 
>> 		fi
>> 	done
>> 	test_res TPASS $LTPTMP/test_file.out "stress_cpu_hotplug:SUCCESS"
>> 	return $RC
>> }
>>
>> #main
>>
>> RC=0
>> LTPTMP=${TMP}
>>
>> #create output file to dump test results
>> touch $LTPTMP/test_file.out || RC=$?
>>
>> if [ $RC -ne 0 ]
>> then
>> 	test_resm TFAIL "Failed to create output file under temp directory"
>> 	exit $RC
>> fi
>>
>> if ! get_all_cpus >/dev/null 2>$RC;
>> then
>> 	tst_brkm TCONF "system does not have required cpu hotplug support"
>> 	exit $RC
>> fi
>>
>> setup || exit $RC
>>
>> #capture the initial state of the cpus
>> state=`cd /sys/devices/system/cpu/ && grep '' */online | sed -e
>> 's/\/online//g' -e 's/\ /\n/g'`
> 
> This is the "STATE"?
Yes this is the one.
> 
> Does the output of get_all_cpu_states() not suit the set_all_cpu_states()? 
> if not, please fix it.

No, get_all_cpu_states() simply echoes the states of the cpus to the screen on
a single line, while set_all_cpu_states() requires them in a variable with
the cpu states on separate lines. Sure, I will fix this up.
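
Until the two helpers agree on a format, the single-line output could be reshaped in the script itself. A minimal sketch, assuming the states are space-separated `cpuN:state` pairs (that form matches what the grep/sed capture above produces; the exact output of get_all_cpu_states() is not shown in this thread) and using a hypothetical helper name `states_to_lines`:

```shell
#!/bin/bash
# states_to_lines: hypothetical helper that turns space-separated
# "cpuN:state" pairs into one pair per line.
states_to_lines()
{
	printf '%s\n' "$1" | tr -s ' ' '\n'
}

states_to_lines "cpu1:1 cpu2:0 cpu3:1"
```

The `-s` option squeezes runs of separators, so doubled spaces in the input do not produce empty lines.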
> 
>>
>> test01 || exit $RC
> 
> Don't you want to cleanup and reset the cpu state after the test?

Yes, that is done in the cleanup function, except that it should be
set_all_cpu_states "$state"
> 
> Thanks,
> Wanlong Gao
> 
>>
Thank you,
Preeti

* Re: [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test
  2012-07-18  3:41   ` preeti
@ 2012-07-18  3:42     ` Wanlong Gao
  2012-07-18  5:41       ` preeti
  0 siblings, 1 reply; 6+ messages in thread
From: Wanlong Gao @ 2012-07-18  3:42 UTC (permalink / raw)
  To: preeti
  Cc: ltp-list@lists.sourceforge.net,
	Mailing list for the Energy Management India Team

On 07/18/2012 11:41 AM, preeti wrote:
> On 07/18/2012 08:32 AM, Wanlong Gao wrote:
>> Hi Preeti,
>>
>>> Hi
>>>
>>> The test case included is a simple case for cpu hotplug.It does offlines the cpus that are online and does an online of the offlined cpus in a loop
>>>
>>> This stress test had failed on certain distros when the loop was run infinite times.This test is presented here for review of correctness and necessity,as this is the first attempt at contributing test cases to LTP from this end.
>>>
>>> The test is meant to be included under the testcases/kernel/hotplug/cpu_hotplug/functional directory.
>>
>> Why didn't you send this as a patch format?
> 
> This was a frst attempt at sending test cases to LTP,so thought would get it
> reviewed as an RFC first.

Yeah, but you can also send a patch titled like [RFC PATCH] xxx.

> 
>> Some comments below.
>>
>>>
>>> Regards
>>> Preeti
>>> ---
>>>
>>> # File		:	stress_cpu_hotplug.sh
>>> # Description	:	Switches the online state of all the cpus in  a
>>> # 			loop to test the robustness of cpu hotplug
>>> #		:	The loop iteration of 20 is a randomly chosen number
>>>
>>> #! /bin/bash
>>>
>>> # Includes:
>>> LHCS_PATH=${LHCS_PATH:-$LTPROOT/testcases/bin/cpu_hotplug}
>>> . $LHCS_PATH/include/hotplug.fns
>>> . $LHCS_PATH/include/testsuite.fns
>>>
>>> setup()
>>> {
>>> 	export TST_TOTAL=1
>>> 	export TCID="setup"
>>> 	export TST_COUNT=0
>>>
>>> 	trap "cleanup" 0
>>> 	RC=0
>>>
>>> 	return $RC
>>>
>>> }
>>> cleanup()
>>> {
>>> 	set_all_cpu_state "$STATE"
>>
>> I can't find the definition of "$STATE" in your test script.
> 
> I apologise for this typo.It needs to be $state as you have pointed out
> below.
>>
>>> }
>>>
>>> test01()
>>> {
>>>
>>> 	TCID="stress_cpu_hotplug"
>>> 	TST_COUNT=1
>>> 	RC=0
>>>
>>> 	NUMBER_OF_CPUS=`ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l`
>>>
>>> 	cd /sys/devices/system/cpu
>>>
>>> 	for ARRAY_INDEX in `seq 20`
>>> 	do
>>> 		for ((i=1; i < NUMBER_OF_CPUS; i++ ))
>>> 		do
>>> 			#skip the boot cpu;cannot offline it
>>> 			if [ $i -eq 0 ]
>>> 			then
>>> 				continue
>>> 			fi
>>>
>>>         		state=`cat cpu$i/online`
>>> 		        if [ $state -eq 0 ]
>>>         		then
>>> 				RC=online_cpu $i
>>>         		else
>>>                 		RC=offline_cpu $i
>>>         		fi
>>
>> Can it always success? I suppose that it need a bit sleep for the online/offline time delay.
> 
> It does not need a sleep because we are doing an online and offline of
> different cpus in one loop.i.e.for example:cpu1->1,cpu2->0,cpu3->1. so it
> takes one complete loop for cpu1->0 to occur which is enough time for an
> online or an offline operation for a cpu.
> 
> Besides this,the test has been carried out on RHEL distros before and they have
> succeeded.Only the snapshot 5 of RHEL 6.3 is failing after running for a few seconds which
> is equivalent to nearly two loops.

Did you investigate this problem? Why does it fail? Kernel problem or any others?

>>
>>>
>>> 			if [ $RC -ne 0 ]
>>> 			then
>>> 				test_brkm TBROK NULL "stress_cpu_hotplug:
>>> 							cpu$i failed to hotplug"
>>> 				return $RC
>>> 			fi
>>> 		done
>>>
>>> 		if [ `expr $ARRAY_INDEX % 10` == 0 ]
>>> 		then
>>> 			echo "stress test successfully completed
>>> 				"$ARRAY_INDEX" times">$LTPTMP/test_file.out
>>
>> Every 10 times means a successful test?
> 
> Not really.This message is intended to tell us after how many runs of the cpu
> hotplug operation on all the cpus, is the machine failing to withstand the
> stress.It might fail after running the loop 100 times or fail within 50 times
> itself. Also 20 is a very small number for this stress test.It should
> typically run 100 times.
> 
> I have captured the state of the stress test for every 10 iterations,instead
> of logging for every iteration.So for example if the test is meant to run 100
> times,but fails on some distro after 30 loops,the above message logging tells
> us that the distro withstood the test for 30 loops atleast if not for the
> entire duration.

Yeah, so the message "stress test successfully completed" needs to be fixed?

>>
>>> 		fi
>>> 	done
>>> 	test_res TPASS $LTPTMP/test_file.out "stress_cpu_hotplug:SUCCESS"
>>> 	return $RC
>>> }
>>>
>>> #main
>>>
>>> RC=0
>>> LTPTMP=${TMP}
>>>
>>> #create output file to dump test results
>>> touch $LTPTMP/test_file.out || RC=$?
>>>
>>> if [ $RC -ne 0 ]
>>> then
>>> 	test_resm TFAIL "Failed to create output file under temp directory"
>>> 	exit $RC
>>> fi
>>>
>>> if ! get_all_cpus >/dev/null 2>$RC;
>>> then
>>> 	tst_brkm TCONF "system does not have required cpu hotplug support"
>>> 	exit $RC
>>> fi
>>>
>>> setup || exit $RC
>>>
>>> #capture the initial state of the cpus
>>> state=`cd /sys/devices/system/cpu/ && grep '' */online | sed -e
>>> 's/\/online//g' -e 's/\ /\n/g'`
>>
>> This is the "STATE"?
> Yes this is the one.
>>
>> Does the output of get_all_cpu_states() not suit the set_all_cpu_states()? 
>> if not, please fix it.
> 
> No, get_all_cpu_states(),simply echos the states of the cpus onto the screen in
> a single line.while the set_all_cpu_states() requires it as a variable with
> the cpu states printed on multiple lines.Sure will fix this up.

OK, please.

>>
>>>
>>> test01 || exit $RC
>>
>> Don't you want to cleanup and reset the cpu state after the test?
> 
> yes that is done in the cleanup function,except that it should be
> set_all_cpu_states "$state"

You didn't call cleanup() after the test.

Thanks,
Wanlong Gao

>>
>> Thanks,
>> Wanlong Gao
>>
>>>
> Thank you,
> Preeti

* Re: [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test
  2012-07-18  5:41       ` preeti
@ 2012-07-18  5:40         ` Wanlong Gao
  0 siblings, 0 replies; 6+ messages in thread
From: Wanlong Gao @ 2012-07-18  5:40 UTC (permalink / raw)
  To: preeti
  Cc: ltp-list@lists.sourceforge.net,
	Mailing list for the Energy Management India Team

On 07/18/2012 01:41 PM, preeti wrote:
> On 07/18/2012 09:12 AM, Wanlong Gao wrote:
>> On 07/18/2012 11:41 AM, preeti wrote:
>>> On 07/18/2012 08:32 AM, Wanlong Gao wrote:
>>>> Hi Preeti,
>>>>
>>>>> Hi
>>>>>
>>>>> The test case included is a simple case for cpu hotplug.It does offlines the cpus that are online and does an online of the offlined cpus in a loop
>>>>>
>>>>> This stress test had failed on certain distros when the loop was run infinite times.This test is presented here for review of correctness and necessity,as this is the first attempt at contributing test cases to LTP from this end.
>>>>>
>>>>> The test is meant to be included under the testcases/kernel/hotplug/cpu_hotplug/functional directory.
>>>>
>>>> Why didn't you send this as a patch format?
>>>
>>> This was a frst attempt at sending test cases to LTP,so thought would get it
>>> reviewed as an RFC first.
>>
>> Yeah, but you can also send a patch titled like [RFC PATCH] xxx.
> 
> Ok.
>>
>>>
>>>> Some comments below.
>>>>
>>>>>
>>>>> Regards
>>>>> Preeti
>>>>> ---
>>>>>
>>>>> # File		:	stress_cpu_hotplug.sh
>>>>> # Description	:	Switches the online state of all the cpus in  a
>>>>> # 			loop to test the robustness of cpu hotplug
>>>>> #		:	The loop iteration of 20 is a randomly chosen number
>>>>>
>>>>> #! /bin/bash
>>>>>
>>>>> # Includes:
>>>>> LHCS_PATH=${LHCS_PATH:-$LTPROOT/testcases/bin/cpu_hotplug}
>>>>> . $LHCS_PATH/include/hotplug.fns
>>>>> . $LHCS_PATH/include/testsuite.fns
>>>>>
>>>>> setup()
>>>>> {
>>>>> 	export TST_TOTAL=1
>>>>> 	export TCID="setup"
>>>>> 	export TST_COUNT=0
>>>>>
>>>>> 	trap "cleanup" 0
>>>>> 	RC=0
>>>>>
>>>>> 	return $RC
>>>>>
>>>>> }
>>>>> cleanup()
>>>>> {
>>>>> 	set_all_cpu_state "$STATE"
>>>>
>>>> I can't find the definition of "$STATE" in your test script.
>>>
>>> I apologise for this typo.It needs to be $state as you have pointed out
>>> below.
>>>>
>>>>> }
>>>>>
>>>>> test01()
>>>>> {
>>>>>
>>>>> 	TCID="stress_cpu_hotplug"
>>>>> 	TST_COUNT=1
>>>>> 	RC=0
>>>>>
>>>>> 	NUMBER_OF_CPUS=`ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l`
>>>>>
>>>>> 	cd /sys/devices/system/cpu
>>>>>
>>>>> 	for ARRAY_INDEX in `seq 20`
>>>>> 	do
>>>>> 		for ((i=1; i < NUMBER_OF_CPUS; i++ ))
>>>>> 		do
>>>>> 			#skip the boot cpu;cannot offline it
>>>>> 			if [ $i -eq 0 ]
>>>>> 			then
>>>>> 				continue
>>>>> 			fi
>>>>>
>>>>>         		state=`cat cpu$i/online`
>>>>> 		        if [ $state -eq 0 ]
>>>>>         		then
>>>>> 				RC=online_cpu $i
>>>>>         		else
>>>>>                 		RC=offline_cpu $i
>>>>>         		fi
>>>>
>>>> Can it always success? I suppose that it need a bit sleep for the online/offline time delay.
>>>
>>> It does not need a sleep because we are doing an online and offline of
>>> different cpus in one loop.i.e.for example:cpu1->1,cpu2->0,cpu3->1. so it
>>> takes one complete loop for cpu1->0 to occur which is enough time for an
>>> online or an offline operation for a cpu.
>>>
>>> Besides this,the test has been carried out on RHEL distros before and they have
>>> succeeded.Only the snapshot 5 of RHEL 6.3 is failing after running for a few seconds which
>>> is equivalent to nearly two loops.
>>
>> Did you investigate this problem? Why does it fail? Kernel problem or any others?
> 
> Yes,it is a kernel problem.The dmesg output showed that the cpu hotplug operation hangs at
> synchronize_sched().The scheduler is waiting for some rcu read side critical
> section to complete,and is either not notified of the completion of the task
> or there is some rcu section which is actually not completed.
> 
> The machine is responsive,in the sense that it responds to the ping
> packets,but is too slow to perform any operation on.But slowly recovers back
> to the original state.We have opened a bug on this.

Ok, thank you.


Wanlong Gao


* Re: [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test
  2012-07-18  3:42     ` Wanlong Gao
@ 2012-07-18  5:41       ` preeti
  2012-07-18  5:40         ` Wanlong Gao
  0 siblings, 1 reply; 6+ messages in thread
From: preeti @ 2012-07-18  5:41 UTC (permalink / raw)
  To: gaowanlong
  Cc: ltp-list@lists.sourceforge.net,
	Mailing list for the Energy Management India Team

On 07/18/2012 09:12 AM, Wanlong Gao wrote:
> On 07/18/2012 11:41 AM, preeti wrote:
>> On 07/18/2012 08:32 AM, Wanlong Gao wrote:
>>> Hi Preeti,
>>>
>>>> Hi
>>>>
>>>> The test case included is a simple case for cpu hotplug.It does offlines the cpus that are online and does an online of the offlined cpus in a loop
>>>>
>>>> This stress test had failed on certain distros when the loop was run infinite times.This test is presented here for review of correctness and necessity,as this is the first attempt at contributing test cases to LTP from this end.
>>>>
>>>> The test is meant to be included under the testcases/kernel/hotplug/cpu_hotplug/functional directory.
>>>
>>> Why didn't you send this as a patch format?
>>
>> This was a frst attempt at sending test cases to LTP,so thought would get it
>> reviewed as an RFC first.
> 
> Yeah, but you can also send a patch titled like [RFC PATCH] xxx.

Ok.
> 
>>
>>> Some comments below.
>>>
>>>>
>>>> Regards
>>>> Preeti
>>>> ---
>>>>
>>>> # File		:	stress_cpu_hotplug.sh
>>>> # Description	:	Switches the online state of all the cpus in  a
>>>> # 			loop to test the robustness of cpu hotplug
>>>> #		:	The loop iteration of 20 is a randomly chosen number
>>>>
>>>> #! /bin/bash
>>>>
>>>> # Includes:
>>>> LHCS_PATH=${LHCS_PATH:-$LTPROOT/testcases/bin/cpu_hotplug}
>>>> . $LHCS_PATH/include/hotplug.fns
>>>> . $LHCS_PATH/include/testsuite.fns
>>>>
>>>> setup()
>>>> {
>>>> 	export TST_TOTAL=1
>>>> 	export TCID="setup"
>>>> 	export TST_COUNT=0
>>>>
>>>> 	trap "cleanup" 0
>>>> 	RC=0
>>>>
>>>> 	return $RC
>>>>
>>>> }
>>>> cleanup()
>>>> {
>>>> 	set_all_cpu_state "$STATE"
>>>
>>> I can't find the definition of "$STATE" in your test script.
>>
>> I apologise for this typo.It needs to be $state as you have pointed out
>> below.
>>>
>>>> }
>>>>
>>>> test01()
>>>> {
>>>>
>>>> 	TCID="stress_cpu_hotplug"
>>>> 	TST_COUNT=1
>>>> 	RC=0
>>>>
>>>> 	NUMBER_OF_CPUS=`ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l`
>>>>
>>>> 	cd /sys/devices/system/cpu
>>>>
>>>> 	for ARRAY_INDEX in `seq 20`
>>>> 	do
>>>> 		for ((i=1; i < NUMBER_OF_CPUS; i++ ))
>>>> 		do
>>>> 			#skip the boot cpu;cannot offline it
>>>> 			if [ $i -eq 0 ]
>>>> 			then
>>>> 				continue
>>>> 			fi
>>>>
>>>>         		state=`cat cpu$i/online`
>>>> 		        if [ $state -eq 0 ]
>>>>         		then
>>>> 				RC=online_cpu $i
>>>>         		else
>>>>                 		RC=offline_cpu $i
>>>>         		fi
>>>
>>> Can it always success? I suppose that it need a bit sleep for the online/offline time delay.
>>
>> It does not need a sleep because we are doing an online and offline of
>> different cpus in one loop.i.e.for example:cpu1->1,cpu2->0,cpu3->1. so it
>> takes one complete loop for cpu1->0 to occur which is enough time for an
>> online or an offline operation for a cpu.
>>
>> Besides this,the test has been carried out on RHEL distros before and they have
>> succeeded.Only the snapshot 5 of RHEL 6.3 is failing after running for a few seconds which
>> is equivalent to nearly two loops.
> 
> Did you investigate this problem? Why does it fail? Kernel problem or any others?

Yes, it is a kernel problem. The dmesg output showed that the cpu hotplug operation hangs at
synchronize_sched(). The scheduler is waiting for some rcu read-side critical
section to complete, and either it is not notified of the completion
or there is some rcu section that has actually not completed.

The machine is responsive in the sense that it responds to ping
packets, but it is too slow to perform any operation on. It slowly recovers
to its original state. We have opened a bug on this.
> 
>>>
>>>>
>>>> 			if [ $RC -ne 0 ]
>>>> 			then
>>>> 				test_brkm TBROK NULL "stress_cpu_hotplug:
>>>> 							cpu$i failed to hotplug"
>>>> 				return $RC
>>>> 			fi
>>>> 		done
>>>>
>>>> 		if [ `expr $ARRAY_INDEX % 10` == 0 ]
>>>> 		then
>>>> 			echo "stress test successfully completed
>>>> 				"$ARRAY_INDEX" times">$LTPTMP/test_file.out
>>>
>>> Every 10 times means a successful test?
>>
>> Not really.This message is intended to tell us after how many runs of the cpu
>> hotplug operation on all the cpus, is the machine failing to withstand the
>> stress.It might fail after running the loop 100 times or fail within 50 times
>> itself. Also 20 is a very small number for this stress test.It should
>> typically run 100 times.
>>
>> I have captured the state of the stress test for every 10 iterations,instead
>> of logging for every iteration.So for example if the test is meant to run 100
>> times,but fails on some distro after 30 loops,the above message logging tells
>> us that the distro withstood the test for 30 loops atleast if not for the
>> entire duration.
> 
> Yeah, so the message "stress test successfully completed" need to be fixed ?
No, the code statement is:

echo "stress test successfully completed "$ARRAY_INDEX" times">$LTPTMP/test_file.out
where $ARRAY_INDEX contains the loop number.
> 
>>>
>>>> 		fi
>>>> 	done
>>>> 	test_res TPASS $LTPTMP/test_file.out "stress_cpu_hotplug:SUCCESS"
>>>> 	return $RC
>>>> }
>>>>
>>>> #main
>>>>
>>>> RC=0
>>>> LTPTMP=${TMP}
>>>>
>>>> #create output file to dump test results
>>>> touch $LTPTMP/test_file.out || RC=$?
>>>>
>>>> if [ $RC -ne 0 ]
>>>> then
>>>> 	test_resm TFAIL "Failed to create output file under temp directory"
>>>> 	exit $RC
>>>> fi
>>>>
>>>> if ! get_all_cpus >/dev/null 2>$RC;
>>>> then
>>>> 	tst_brkm TCONF "system does not have required cpu hotplug support"
>>>> 	exit $RC
>>>> fi
>>>>
>>>> setup || exit $RC
>>>>
>>>> #capture the initial state of the cpus
>>>> state=`cd /sys/devices/system/cpu/ && grep '' */online | sed -e
>>>> 's/\/online//g' -e 's/\ /\n/g'`
>>>
>>> This is the "STATE"?
>> Yes this is the one.
>>>
>>> Does the output of get_all_cpu_states() not suit the set_all_cpu_states()? 
>>> if not, please fix it.
>>
>> No, get_all_cpu_states(),simply echos the states of the cpus onto the screen in
>> a single line.while the set_all_cpu_states() requires it as a variable with
>> the cpu states printed on multiple lines.Sure will fix this up.
> 
> OK, please.
> 
>>>
>>>>
>>>> test01 || exit $RC
>>>
>>> Don't you want to cleanup and reset the cpu state after the test?
>>
>> yes that is done in the cleanup function,except that it should be
>> set_all_cpu_states "$state"
> 
> You didn't call cleanup() after the test.

Notice the statement trap "cleanup" 0 under setup. This calls the
cleanup() function on exit, where 0 denotes the exit condition. The trap is
installed when setup() is called.
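
The behaviour of `trap ... 0` can be checked in isolation. A minimal sketch, with a hypothetical `cleanup` that only prints a message instead of restoring cpu state; the subshell in `run_with_trap` (also a name made up for this example) stands in for the test script so that its exit can be observed:

```shell
#!/bin/bash
cleanup()
{
	echo "cleanup ran"
}

# Signal number 0 is the EXIT condition: the handler runs when the
# shell exits, whether it reaches the end or calls 'exit' explicitly.
run_with_trap()
{
	(
		trap cleanup 0
		echo "test body"
		exit "$1"
	)
}

run_with_trap 0
run_with_trap 1 || echo "exit status was $?"
```

The handler runs in both cases, and the exit status passed to `exit` is preserved, so `test01 || exit $RC` would still trigger the cleanup.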
> 
> Thanks,
> Wanlong Gao
> 
>>>
>>> Thanks,
>>> Wanlong Gao
>>>
>>>>
>> Thank you,
>> Preeti
>>>>
Thanks
Preeti
