From mboxrd@z Thu Jan 1 00:00:00 1970
From: Cyril Hrubis
Date: Thu, 1 Oct 2020 13:03:36 +0200
Subject: [LTP] [RFC] ltp test add reboot function
In-Reply-To:
References: <20200824074226.GB2466@yuki.lan> <20200828130638.GD10501@yuki.lan> <20200903092448.GC6285@yuki.lan> <20200930140508.GA12097@yuki.lan>
Message-ID: <20201001110336.GA7349@yuki.lan>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ltp@lists.linux.it

Hi!
> > I guess that we would have to add a command line parameter to the test
> > library to tell the testcase to continue with the second half of the
> > test. Then after the reboot the testcase would be executed with that
> > option so that it knows that we are running it for a second time and
> > then we have to pass that to the testcases.
> >
> > And since the uClinux support is dead, we are free to reuse the -C flag
> > for this purpose. Probably easiest solution would be to set a global
> > variable (named tst_rebooted or something similar) if -C was passed to
> > the test on a command line, then we can use the value of this variable
> > in the test setup/verify/cleanup functions.
>
> Just brainstorming here...
>
> How about an environment variable that contains the location to resume
> execution. It could be a specific test case (within a test), or some value that the verify
> function uses to skip down to the place to resume.
>
> If set, the library would skip the setup operation (or modify it appropriately).
>
> Something like this:
>
> LTP_RESUME_POS=reboot_test:testcase_6
>
> (or some better name :-) )
>
> The value for the resume position would have to be sent to the test framework,
> so it could set it (in the variable) on machine reboot. The test framework has to know
> to set something (either -C or LTP_RESUME_POS) so that the test can recognize it is
> in a resume-after-reboot condition.
>
> OR
>
> Maybe it's sufficient for the test to create a temp file (in a temp directory that is known
> to be persistent across a reboot, which not all of them are). The presence of the temp
> file could indicate a resume-after-reboot condition, and it's contents could be used
> to indicate the resume position.

That would mean defining a persistent directory and would be prone to
leftovers, i.e. what happens when a test crashes. I think that having an
environment variable would be a much better solution.

> Question: does this "resume-after-reboot" condition need to be recognized by ltp-pan?

First of all I think that ltp-pan is something that does not fit into
this picture at all. Hopefully it will be replaced by runltp-ng in a few
years, which will run on a different host and will simply continue to
run during the time the SUT is rebooted.

> I'm still not sure what is envisioned for the interface between the test and the
> test framework, to detect that it should resume a particular test on DUT reboot.
> If the test is initiating the reboot, maybe it needs to communicate some data to
> the test framework (or whatever is performing the reboot), so that things can be
> set up during boot to continue where the test left off.

I do see it as:

* The test advertises to the test executing framework that it reboots
  the machine during the testrun

* When the test is executed the test executing framework will expect a
  reboot, wait for the machine to boot and finally re-execute the test

* The execution framework will also pass down the environment
  variable/command line parameters so that the test picks up where it's
  supposed to.

This, among other things, makes sure that there is no state saved on the
SUT, and when a test fails after the third reboot we will know exactly
where, since we are tracking the state in the execution framework.

Does this make sense?
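
To make the three steps above a bit more concrete, here is a minimal
sketch of both sides of the handshake. It is an illustration only, not
runltp-ng code: the function names, the "test:stage" value format and
the run_on_sut/wait_for_boot callables are all made up for the example,
and LTP_RESUME_POS is just the variable name suggested earlier in this
thread. It also assumes that run_on_sut returns 0 whenever a stage ended
as expected, including when it ended by rebooting the machine.

```python
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical variable name, taken from the suggestion in this thread.
RESUME_VAR = "LTP_RESUME_POS"


def run_with_reboots(
    test: str,
    reboots: int,
    run_on_sut: Callable[[str, Dict[str, str]], int],
    wait_for_boot: Callable[[], None],
) -> List[Tuple[int, int]]:
    """Executor side: run `test` once per stage, reboots + 1 times in total.

    The stage counter lives here, on the host, so nothing has to be
    stored on the SUT. Returns (stage, exit_code) pairs and stops at the
    first failing stage, so the failing stage is always the last entry.
    """
    results = []
    for stage in range(reboots + 1):
        # The first run gets no resume variable; later runs are told
        # which stage to continue with.
        env = {} if stage == 0 else {RESUME_VAR: f"{test}:{stage}"}
        code = run_on_sut(test, env)
        results.append((stage, code))
        if code != 0:
            break
        if stage < reboots:
            # The test advertised a reboot here, so wait for the SUT.
            wait_for_boot()
    return results


def resume_stage(environ: Mapping[str, str]) -> int:
    """Test side: return the stage to resume at, or 0 on a fresh run."""
    value = environ.get(RESUME_VAR)
    if value is None:
        return 0
    # Assumed value format: "<test name>:<stage number>".
    return int(value.rsplit(":", 1)[1])
```

In a real setup run_on_sut would start the test binary over ssh or a
serial console, and the test would call something like resume_stage()
from its setup to decide which part of the test to run.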
> We have been envisioning in Fuego supporting an API like the following, for
> a generic reboot mechanism for rebooting a board:
>
> CLI:
> lc board {board_name} reboot
>
> REST API:
> wget https://{lab-control-server}/api/devices/{board_name}/power/reboot
>
> These are both intended to support immediate reboot of the board, and don't take
> any parameters. It seems like there's a bit of "knowledge" about the board
> bringup that is outside the scope of just a simple board reboot operation, that would have
> to be conveyed to the test framework and possibly the on-board boot initialization
> code, to accomplish a resume operation for a test. It's a bit hard to figure out
> where the extra information should reside. Should the data be placed in
> the reboot API? Should there be a separate call to the test framework/board control
> software to prepare for a reboot-and-resume-test operation?

Well, the parts that are related to the testrun probably belong to the
executing framework.

The runltp-ng works in a way where it drives the execution of the tests,
i.e. it waits for a machine to boot, then starts executing binaries.
These binaries are really single testcases, each with its own timeout,
etc. This means that it can also handle a test that needs a reboot just
fine, since it will execute the test binary for the first time, wait for
the reboot, then execute the binary for a second time with the
corresponding parameters.

What exactly do you mean by the board specific info? I suppose that's
about the kernel image, rootfs and where to load these from, right? It
should probably be the job of the lab-control to remember these between
test-requested reboots.

-- 
Cyril Hrubis
chrubis@suse.cz