From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Stancek
Date: Mon, 7 Jan 2019 12:39:35 -0500 (EST)
Subject: [LTP] [PATCH v3 2/3] lib/tst_test.c: Update result counters when calling tst_brk()
In-Reply-To: <20190107150619.GC15221@rei.lan>
References: <20181211151733.GC1180@rei> <1544690160-13900-1-git-send-email-yangx.jy@cn.fujitsu.com> <1544690160-13900-2-git-send-email-yangx.jy@cn.fujitsu.com> <20190107150619.GC15221@rei.lan>
Message-ID: <1383176395.93706380.1546882774990.JavaMail.zimbra@redhat.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ltp@lists.linux.it

----- Original Message -----
> Hi!
> > 1) Catch and report the TFAIL exit status of child process.
>
> Looking at the codebase we do have a few usages of tst_brk(TFAIL, "...")
> to exit the child process, which sort of works but it's incorrect. The
> tst_brk() always meant "unrecoverable failure has happened, exit the
> current process as fast as possible". Looking over our codebase most of
> the tst_brk(TFAIL, "...") should not actually cause the main test
> process to exit, these were only meant to exit the child and report the
> result in one call. It will for instance break the test with -i option
> on the first failure, which is incorrect.

Nice example, would you care to add that to docs?

>
> So if we ever want to have a function to exit child process with a result we
> should implement tst_ret() that would be equivalent to tst_res() followed by
> exit(0).
>
> It could be even implemented as:
>
> #define tst_ret(ttype, fmt, ...) \
> 	do { \
> 		tst_res_(__FILE__, __LINE__, (ttype), (fmt), ##__VA_ARGS__); \
> 		exit(0); \
> 	} while (0)
>
> This function has one big advantage, it increments the results counters
> before the child process exits.

If all call-sites switch to tst_ret(), we could add TFAIL to tst_brk
compile time check.

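To make the two points above a bit more concrete, here is a rough,
self-contained sketch. It is not LTP code: every demo_* name is made up
for illustration, and the shared counters are a simplified stand-in for
what the library keeps in shared memory. It shows a tst_ret()-like macro
that bumps the counter before the child exits, and a tst_brk()-like macro
that refuses TFAIL at compile time (here assumed to be done with C11
_Static_assert, which may differ from how the library does its check).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result types, only mimicking the idea of TPASS/TFAIL/TBROK. */
enum demo_ttype { DEMO_TPASS = 1, DEMO_TFAIL = 2, DEMO_TBROK = 4 };

/* Counters placed in memory shared between the parent and its children. */
struct demo_results {
	atomic_int passed;
	atomic_int failed;
	atomic_int broken;
};

static struct demo_results *demo_results;

/* Print the message and atomically bump the matching counter. */
static void demo_res(const char *file, int line, int ttype, const char *msg)
{
	if (ttype & DEMO_TPASS)
		atomic_fetch_add(&demo_results->passed, 1);
	if (ttype & DEMO_TFAIL)
		atomic_fetch_add(&demo_results->failed, 1);
	if (ttype & DEMO_TBROK)
		atomic_fetch_add(&demo_results->broken, 1);

	fprintf(stderr, "%s:%d: %s\n", file, line, msg);
}

/* Report a result and leave the process; the counter is updated first. */
#define demo_ret(ttype, msg) \
	do { \
		demo_res(__FILE__, __LINE__, (ttype), (msg)); \
		exit(0); \
	} while (0)

/* Unrecoverable error; passing DEMO_TFAIL here fails to compile. */
#define demo_brk(ttype, msg) \
	do { \
		_Static_assert(!((ttype) & DEMO_TFAIL), \
			       "demo_brk() does not take TFAIL, use demo_ret()"); \
		demo_res(__FILE__, __LINE__, (ttype), (msg)); \
		exit(1); \
	} while (0)

/*
 * Fork a child that reports TFAIL through demo_ret() and return the
 * failed counter as seen by the parent afterwards.
 */
int demo_count_child_failure(void)
{
	pid_t pid;
	int failed;

	demo_results = mmap(NULL, sizeof(*demo_results), PROT_READ | PROT_WRITE,
			    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	assert(demo_results != MAP_FAILED);

	fflush(NULL);	/* avoid duplicated stdio output after fork() */
	pid = fork();
	if (pid < 0)
		demo_brk(DEMO_TBROK, "fork failed");

	if (pid == 0)
		demo_ret(DEMO_TFAIL, "child failed");

	/* The parent only reaps the child and never looks at its exit status. */
	waitpid(pid, NULL, 0);

	failed = atomic_load(&demo_results->failed);
	munmap(demo_results, sizeof(*demo_results));
	return failed;
}
```

Calling demo_count_child_failure() from the parent should see the
failure even though the exit status is thrown away, and changing the
demo_brk() call to pass DEMO_TFAIL should stop the build.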
>
> Actually one of the big points of the new test library was that the
> results counters are atomically increased, because passing the results
> in exit values is a nightmare that cannot be done correctly.
>
> > 2) Only update result counters in library process and main test
> >    process because the exit status of child can be reported by
> >    main test process.
>
> Actually after I spent some time on it I think that the best solution is
> to update the results in the piece of shared memory as fast as possible,
> anything else is prone to various races and corner cases.

I was thinking this too. If your parent process happens to wait for
the child itself, then library will never get to see retcode.

Regards,
Jan

>
> > 3) Print TCONF message and increase skipped when calling tst_brk(TCONF).
> >    Print TBROK message and increase broken when calling tst_brk(TBROK).
> >    Print TFAIL message and increase failed when calling tst_brk(TFAIL).
> > 4) Remove duplicate update_results() in run_tcases_per_fs().
>
> I've been thinking about this and the problem is more complex, and I'm
> not even sure that it's possible to write the library so that the
> counters are consistent at the time we exit the test if something
> unexpected happened and we called tst_brk().
>
> Consider for instance this example:
>
> #include "tst_test.h"
>
> static void do_test(void)
> {
> 	if (!SAFE_FORK())
> 		tst_brk(TBROK, "child");
> 	tst_brk(TBROK, "parent");
> }
>
> static struct tst_test test = {
> 	.test_all = do_test,
> 	.forks_child = 1,
> };
>
> When tst_brk() is called both in parent and child the counter would be
> incremented only once because the child is not waited for by the main
> test.
>
> We can close this special case by changing the main test pid to wait for the
> children before it calls exit() in the tst_brk() but that may cause the
> main process to get stuck indefinitely if the child processes get stuck,
> so we would have to be careful.
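
One way of being careful about the wait could be to bound it, so a stuck
child cannot hold the main process forever. The sketch below is only an
illustration under that assumption, not what the library does: the
demo_* names are hypothetical, and the timeout is approximated by
counting 10 ms sleep steps rather than reading a clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Reap children for roughly timeout_ms milliseconds.
 * Returns 0 once no children are left, 1 if we gave up first.
 */
int demo_wait_children(int timeout_ms)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int waited_ms = 0;

	for (;;) {
		pid_t pid = waitpid(-1, NULL, WNOHANG);

		if (pid > 0)
			continue;	/* reaped one child, look for more */
		if (pid < 0 && errno == ECHILD)
			return 0;	/* nothing left to wait for */
		if (pid < 0 && errno != EINTR)
			return 1;	/* unexpected error, do not loop forever */
		if (waited_ms >= timeout_ms)
			return 1;	/* children still running, give up */

		nanosleep(&step, NULL);
		waited_ms += 10;
	}
}

/* Hypothetical helper for the example: a child that sleeps, then exits. */
pid_t demo_spawn_sleeper(unsigned int seconds)
{
	pid_t pid = fork();

	if (pid == 0) {
		sleep(seconds);
		_exit(0);
	}

	return pid;
}
```

A caller could try demo_wait_children() with a short timeout before
exit() and, if it returns 1, decide whether to kill the stragglers or to
report that the counters may be incomplete.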
>
> Also from the very definition of the TBROK return status the test
> results would be incomplete at best, since TBROK really means
> "unrecoverable error happened during the test" which would mostly mean
> that something as low level as filesystem got corrupted and there is no
> point in presenting the results in that case, so I guess that the best
> we could do in the case of TBROK is to print big message that says
> "things went horribly wrong!" or something similar.
>
> All in all I would like to avoid applying patches to the test library
> before we finalize the release, since there is not much time for
> testing now.
>
> --
> Cyril Hrubis
> chrubis@suse.cz
>