From: preeti <preeti@linux.vnet.ibm.com>
To: gaowanlong@cn.fujitsu.com
Cc: "ltp-list@lists.sourceforge.net" <ltp-list@lists.sourceforge.net>,
Mailing list for the Energy Management India Team
<ltc-india-em@lists.linux.ibm.com>
Subject: Re: [LTP] [RFC] cpu_hotplug: Adding a cpu hotplug stress test
Date: Wed, 18 Jul 2012 11:11:55 +0530
Message-ID: <50064CA3.6070003@linux.vnet.ibm.com>
In-Reply-To: <5006308A.7050206@cn.fujitsu.com>
On 07/18/2012 09:12 AM, Wanlong Gao wrote:
> On 07/18/2012 11:41 AM, preeti wrote:
>> On 07/18/2012 08:32 AM, Wanlong Gao wrote:
>>> Hi Preeti,
>>>
>>>> Hi
>>>>
>>>> The included test case is a simple one for cpu hotplug. In a loop, it offlines the cpus that are online and onlines the cpus that are offline.
>>>>
>>>> This stress test failed on certain distros when the loop was run indefinitely. It is presented here for review of its correctness and necessity, as this is the first attempt at contributing test cases to LTP from this end.
>>>>
>>>> The test is meant to be included under the testcases/kernel/hotplug/cpu_hotplug/functional directory.
>>>
>>> Why didn't you send this as a patch format?
>>
>> This was a first attempt at sending test cases to LTP, so I thought I would get it
>> reviewed as an RFC first.
>
> Yeah, but you can also send a patch titled like [RFC PATCH] xxx.
Ok.
>
>>
>>> Some comments below.
>>>
>>>>
>>>> Regards
>>>> Preeti
>>>> ---
>>>>
>>>> # File : stress_cpu_hotplug.sh
>>>> # Description : Switches the online state of all the cpus in a
>>>> # loop to test the robustness of cpu hotplug
>>>> # : The loop iteration of 20 is a randomly chosen number
>>>>
>>>> #! /bin/bash
>>>>
>>>> # Includes:
>>>> LHCS_PATH=${LHCS_PATH:-$LTPROOT/testcases/bin/cpu_hotplug}
>>>> . $LHCS_PATH/include/hotplug.fns
>>>> . $LHCS_PATH/include/testsuite.fns
>>>>
>>>> setup()
>>>> {
>>>> export TST_TOTAL=1
>>>> export TCID="setup"
>>>> export TST_COUNT=0
>>>>
>>>> trap "cleanup" 0
>>>> RC=0
>>>>
>>>> return $RC
>>>>
>>>> }
>>>> cleanup()
>>>> {
>>>> set_all_cpu_state "$STATE"
>>>
>>> I can't find the definition of "$STATE" in your test script.
>>
>> I apologise for this typo. It needs to be $state, as you have pointed out
>> below.
>>>
>>>> }
>>>>
>>>> test01()
>>>> {
>>>>
>>>> TCID="stress_cpu_hotplug"
>>>> TST_COUNT=1
>>>> RC=0
>>>>
>>>> NUMBER_OF_CPUS=`ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l`
>>>>
>>>> cd /sys/devices/system/cpu
>>>>
>>>> for ARRAY_INDEX in `seq 20`
>>>> do
>>>> for ((i=1; i < NUMBER_OF_CPUS; i++ ))
>>>> do
>>>> #skip the boot cpu;cannot offline it
>>>> if [ $i -eq 0 ]
>>>> then
>>>> continue
>>>> fi
>>>>
>>>> state=`cat cpu$i/online`
>>>> if [ $state -eq 0 ]
>>>> then
>>>> online_cpu $i; RC=$?
>>>> else
>>>> offline_cpu $i; RC=$?
>>>> fi
>>>
>>> Can it always succeed? I suppose it needs a short sleep to allow for the online/offline delay.
>>
>> It does not need a sleep, because each pass of the loop onlines or offlines
>> different cpus, for example: cpu1->1, cpu2->0, cpu3->1. It therefore takes one
>> complete pass before cpu1->0 occurs, which is enough time for an
>> online or an offline operation on a cpu to finish.
>>
>> Besides this, the test has been run on RHEL distros before and has
>> succeeded. Only snapshot 5 of RHEL 6.3 fails, after running for a few seconds, which
>> is equivalent to nearly two loops.
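To make the ordering described above concrete, here is a rough sketch of one pass of the toggle loop. It is not the script under review: toggle_all_cpus and its directory argument are made-up names, so that the loop can be tried against a scratch directory, and it writes to the online files directly instead of calling online_cpu/offline_cpu.

```shell
#!/bin/bash
# Hypothetical helper: flip the online state of every cpu once.
# $1: cpu sysfs directory (defaults to the real one).
toggle_all_cpus()
{
    local root=${1:-/sys/devices/system/cpu} f state new
    for f in "$root"/cpu[0-9]*/online; do
        # cpus without an online file are skipped
        [ -e "$f" ] || continue
        state=$(cat "$f")
        if [ "$state" -eq 0 ]; then
            new=1
        else
            new=0
        fi
        # Stop at the first cpu that refuses the state change.
        echo "$new" > "$f" || return 1
    done
    return 0
}
```

Each cpu is touched once per pass, so a given cpu sees its next state change only after all the other cpus have been toggled.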
>
> Did you investigate this problem? Why does it fail? Kernel problem or any others?
Yes, it is a kernel problem. The dmesg output showed that the cpu hotplug operation hangs at
synchronize_sched(). The scheduler is waiting for some RCU read-side critical
section to complete, and either it is not notified of the completion of the task
or there is some RCU section which has actually not completed.
The machine is responsive, in the sense that it responds to ping
packets, but it is too slow to perform any operation on. It slowly recovers
to its original state. We have opened a bug on this.
>
>>>
>>>>
>>>> if [ $RC -ne 0 ]
>>>> then
>>>> tst_brkm TBROK NULL "stress_cpu_hotplug: cpu$i failed to hotplug"
>>>> return $RC
>>>> fi
>>>> done
>>>>
>>>> if [ `expr $ARRAY_INDEX % 10` == 0 ]
>>>> then
>>>> echo "stress test successfully completed "$ARRAY_INDEX" times">$LTPTMP/test_file.out
>>>
>>> Every 10 times means a successful test?
>>
>> Not really. This message is intended to tell us after how many runs of the cpu
>> hotplug operation on all the cpus the machine fails to withstand the
>> stress. It might fail after running the loop 100 times, or fail within 50 times.
>> Also, 20 is a very small number for this stress test. It should
>> typically run 100 times.
>>
>> I have captured the state of the stress test every 10 iterations, instead
>> of logging every iteration. So, for example, if the test is meant to run 100
>> times but fails on some distro after 30 loops, the above message tells
>> us that the distro withstood the test for at least 30 loops, if not for the
>> entire duration.
>
> Yeah, so the message "stress test successfully completed" needs to be fixed?
No. The full code statement is:
echo "stress test successfully completed "$ARRAY_INDEX" times">$LTPTMP/test_file.out
where $ARRAY_INDEX contains the loop number.
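As a small illustration of that checkpoint logging, here is a sketch under my own assumptions: log_progress and its two parameters are illustrative names, and the hotplug work itself is left out.

```shell
#!/bin/bash
# Hypothetical helper: record a checkpoint every 10 iterations.
# $1: total number of iterations, $2: output file.
log_progress()
{
    local total=$1 out=$2 n
    for n in $(seq "$total"); do
        # ... the hotplug work for iteration $n would go here ...
        if [ $((n % 10)) -eq 0 ]; then
            # '>' overwrites, so the file holds only the latest checkpoint
            echo "stress test successfully completed $n times" > "$out"
        fi
    done
}
```

If a run of 35 iterations stops here, the file ends up holding the line for 30, which is the "withstood at least 30 loops" information described above.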
>
>>>
>>>> fi
>>>> done
>>>> tst_res TPASS $LTPTMP/test_file.out "stress_cpu_hotplug:SUCCESS"
>>>> return $RC
>>>> }
>>>>
>>>> #main
>>>>
>>>> RC=0
>>>> LTPTMP=${TMP}
>>>>
>>>> #create output file to dump test results
>>>> touch $LTPTMP/test_file.out || RC=$?
>>>>
>>>> if [ $RC -ne 0 ]
>>>> then
>>>> tst_resm TFAIL "Failed to create output file under temp directory"
>>>> exit $RC
>>>> fi
>>>>
>>>> if ! get_all_cpus >/dev/null 2>&1;
>>>> then
>>>> tst_brkm TCONF "system does not have required cpu hotplug support"
>>>> exit $RC
>>>> fi
>>>>
>>>> setup || exit $RC
>>>>
>>>> #capture the initial state of the cpus
>>>> state=`cd /sys/devices/system/cpu/ && grep '' */online | sed -e 's/\/online//g' -e 's/\ /\n/g'`
>>>
>>> This is the "STATE"?
>> Yes this is the one.
>>>
>>> Does the output of get_all_cpu_states() not suit the set_all_cpu_states()?
>>> if not, please fix it.
>>
>> No, get_all_cpu_states() simply echoes the states of the cpus onto the screen in
>> a single line, while set_all_cpu_states() requires them as a variable with
>> the cpu states printed on multiple lines. Sure, will fix this up.
>
> OK, please.
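One possible shape for that fix, as a rough sketch only. The two helpers below are hypothetical and are not the hotplug.fns functions. The "cpuN:state" one-per-line format is my own assumption for illustration. The directory argument exists only so that the pair can be tried against a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper: print one "cpuN:state" line per hotpluggable cpu.
# $1: cpu sysfs directory (defaults to the real one).
save_cpu_states()
{
    local root=${1:-/sys/devices/system/cpu} f cpu
    for f in "$root"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue
        cpu=$(basename "$(dirname "$f")")
        echo "$cpu:$(cat "$f")"
    done
}

# Hypothetical helper: write the saved states back.
# $1: output of save_cpu_states, $2: cpu sysfs directory.
restore_cpu_states()
{
    local root=${2:-/sys/devices/system/cpu} entry
    for entry in $1; do
        # "${entry%%:*}" is the cpu name, "${entry#*:}" is its state
        echo "${entry#*:}" > "$root/${entry%%:*}/online" || return 1
    done
}
```

With this pairing, the saved value can be passed straight from the save step to the restore step in cleanup without any reformatting in between.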
>
>>>
>>>>
>>>> test01 || exit $RC
>>>
>>> Don't you want to cleanup and reset the cpu state after the test?
>>
>> Yes, that is done in the cleanup function, except that it should be
>> set_all_cpu_states "$state"
>
> You didn't call cleanup() after the test.
Notice the statement trap "cleanup" 0 under setup(). This will call the
cleanup() function on exit, where 0 is the signal number for exit. The trap is
registered when setup() is called.
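A minimal sketch of that behaviour, assuming bash. Here cleanup() only prints a message and run_demo is an illustrative name. The subshell is there only so that the exit can be observed from the calling shell.

```shell
#!/bin/bash
# Illustrative stand-in: in the test, cleanup() would restore the cpu states.
cleanup()
{
    echo "cleanup called"
}

run_demo()
{
    # Subshell, so that the EXIT trap fires when it exits.
    (
        trap "cleanup" 0    # 0 is the EXIT pseudo-signal
        echo "test body"
        exit 3              # any exit path triggers the trap
    )
    echo "exit status: $?"
}

run_demo
```

The trap runs on every exit path, including an early exit taken on failure, so no explicit cleanup call is needed after the test function.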
>
> Thanks,
> Wanlong Gao
>
>>>
>>> Thanks,
>>> Wanlong Gao
>>>
>>>>
>> Thank you,
>> Preeti
>>>>
Thanks
Preeti
>>>> ------------------------------------------------------------------------------
>>>> Live Security Virtual Conference
>>>> Exclusive live event will cover all the ways today's security and
>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>> will include endpoint security, mobile security and the latest in malware
>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> _______________________________________________
>>>> Ltp-list mailing list
>>>> Ltp-list@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/ltp-list
>>>>
>>>
>>>
>>
>>
>>
>
>