Date: Thu, 2 Jun 2016 16:35:39 +1000
From: Dave Chinner
To: Daniel Wagner
Cc: linux-fsdevel@vger.kernel.org, "linux-kernel@vger.kernel.org", xfs@oss.sgi.com
Subject: Re: Internal error xfs_trans_cancel
Message-ID: <20160602063539.GM12670@dastard>
References: <20160601071047.GJ12670@dastard> <0644b434-6cea-4188-9702-469c26d191b8@monom.org> <20160602002653.GL12670@dastard>

On Thu, Jun 02, 2016 at 07:23:24AM +0200, Daniel Wagner wrote:
> > posix03 and posix04 just emit error messages:
> >
> > posix04 -n 40 -l 100
> > posix04: invalid option -- 'l'
> > posix04: Usage: posix04 [-i iterations] [-n nr_children] [-s]
> > .....
>
> I screwed this up. I have patched my version of lockperf to make
> all tests use the same option names, though I forgot to send those
> patches. Will do now.
>
> In this case you can use '-i' instead of '-l'.
>
> > So I changed them to run "-i $l" instead, and that has a somewhat
> > undesired effect:
> >
> > static void
> > kill_children()
> > {
> > 	siginfo_t infop;
> >
> > 	signal(SIGINT, SIG_IGN);
> >>>>>> 	kill(0, SIGINT);
> > 	while (waitid(P_ALL, 0, &infop, WEXITED) != -1);
> > }
> >
> > Yeah, it sends a SIGINT to everything with a process group id. It
> > kills the parent shell:
>
> Ah, that rings a bell. I tuned the parameters so that I did not run into
> this problem. I'll do a patch for this one. It's pretty annoying.
>
> > $ ./run-lockperf-tests.sh /mnt/scratch/
> > pid 9597's current affinity list: 0-15
> > pid 9597's new affinity list: 0,4,8,12
> > sh: 1: cannot create /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor: Directory nonexistent
> > posix01 -n 8 -l 100
> > posix02 -n 8 -l 100
> > posix03 -n 8 -i 100
> >
> > $
> >
> > So, I've just removed those tests from your script. I'll see if I
> > have any luck with reproducing the problem now.
>
> I was able to reproduce it again with the same steps.

Hmmm, Ok. I've been running the lockperf test and kernel builds all
day on a filesystem that is identical in shape and size to yours
(i.e. xfs_info output is the same) but I haven't reproduced it yet.

Is it possible to get a metadump image of your filesystem to see if
I can reproduce it on that?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
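
For reference, one possible way to avoid the kill(0, SIGINT) problem
quoted above is to signal only the children the test itself forked,
rather than the whole process group (which includes the shell that
launched the script). The following is a minimal sketch under that
assumption, not the actual lockperf fix: spawn_children(), the pids
array and the return values are hypothetical names introduced here for
illustration only.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork n children that idle until signalled.
 * Records each child's pid in pids[] and returns how many started.
 */
int spawn_children(pid_t *pids, int n)
{
	int started = 0;

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			/* Child: make sure SIGINT terminates us, then wait. */
			signal(SIGINT, SIG_DFL);
			for (;;)
				pause();
		}
		pids[started++] = pid;
	}
	return started;
}

/*
 * Signal only the recorded children, then reap each one by pid.
 * Unlike kill(0, SIGINT), this never touches the parent shell or
 * other members of the caller's process group.
 * Returns the number of children reaped.
 */
int kill_children(const pid_t *pids, int n)
{
	int reaped = 0;

	for (int i = 0; i < n; i++)
		kill(pids[i], SIGINT);

	for (int i = 0; i < n; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) == pids[i])
			reaped++;
	}
	return reaped;
}
```

An alternative with the same effect would be to move the test into its
own process group with setpgid() before forking, so that kill(0, ...)
stays confined to the test's own processes. Tracking pids just keeps
the sketch shorter.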