From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Wed, 24 Jul 2013 10:10:19 +0800
From: Han Pingtian
Subject: Re: segmentation fault on power system
Message-ID: <20130724021019.GA2615@localhost.localdomain>
References: <20130723101359.GC2152@localhost.localdomain> <51EE5997.6070804@enovance.com> <51EE9E1C.3040708@kernel.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51EE9E1C.3040708@kernel.dk>
To: Jens Axboe
Cc: Erwan Velu, fio@vger.kernel.org
List-ID:

On Tue, Jul 23, 2013 at 09:15:40AM -0600, Jens Axboe wrote:
> On 07/23/2013 04:23 AM, Erwan Velu wrote:
> > On 23/07/2013 12:13, Han Pingtian wrote:
> >> Hey there,
> >>
> >> When trying to run fio on one of our power system, segmentation fault
> >>
> > Can you give us the kind of cpu you are using ? (/proc/cpuinfo)
> > I'm not used with PPC but maybe your processor doesn't support ATBU call
> > on mfspr.
>
> That is definitely the problem, so CPU info would help. In the mean
> time, you can use clocksource=clock_gettime to get rid of the illegal
> instruction.

This is the contents of /proc/cpuinfo:

processor	: 0
cpu		: POWER7 (architected), altivec supported
clock		: 3550.000000MHz
revision	: 2.3 (pvr 003f 0203)

processor	: 1
cpu		: POWER7 (architected), altivec supported
clock		: 3550.000000MHz
revision	: 2.3 (pvr 003f 0203)

processor	: 2
cpu		: POWER7 (architected), altivec supported
clock		: 3550.000000MHz
revision	: 2.3 (pvr 003f 0203)

processor	: 3
cpu		: POWER7 (architected), altivec supported
clock		: 3550.000000MHz
revision	: 2.3 (pvr 003f 0203)

timebase	: 512000000
platform	: pSeries
model		: IBM,8231-E2C
machine		: CHRP IBM,8231-E2C

But 'clocksource=clock_gettime' doesn't fix the fault. I still get the
same two core dumps.

If I change the code like this:

================================================================================
diff --git a/arch/arch-ppc.h b/arch/arch-ppc.h
index 65e6b74..30c315c 100644
--- a/arch/arch-ppc.h
+++ b/arch/arch-ppc.h
@@ -67,15 +67,15 @@ static inline unsigned long long get_cpu_clock(void)
 	unsigned long long ret;
 
 	do {
-		if (arch_flags & ARCH_FLAG_1) {
-			tbu0 = mfspr(SPRN_ATBU);
-			tbl = mfspr(SPRN_ATBL);
-			tbu1 = mfspr(SPRN_ATBU);
-		} else {
+		//if (arch_flags & ARCH_FLAG_1) {
+		//	tbu0 = mfspr(SPRN_ATBU);
+		//	tbl = mfspr(SPRN_ATBL);
+		//	tbu1 = mfspr(SPRN_ATBU);
+		//} else {
 			tbu0 = mfspr(SPRN_TBRU);
 			tbl = mfspr(SPRN_TBRL);
 			tbu1 = mfspr(SPRN_TBRU);
-		}
+		//}
 	} while (tbu0 != tbu1);
 
 	ret = (((unsigned long long)tbu0) << 32) | tbl;
================================================================================

then only one core is dumped, which has this backtrace:

================================================================================
Core was generated by `./fio/fio --debug=parse fio-jobs/randomw.fio '.
Program terminated with signal 11, Segmentation fault.
#0  0x000000001006288c in init_disk_util (td=0xfff9a730000) at diskutil.c:481
481		if (!td->o.do_disk_util ||
(gdb) bt
#0  0x000000001006288c in init_disk_util (td=0xfff9a730000) at diskutil.c:481
#1  0x0000000010050d30 in run_threads () at backend.c:1691
#2  0x000000001005159c in fio_backend () at backend.c:1911
#3  0x00000000100669a4 in main (argc=, argv=0xfffd20c9ec8, envp=) at fio.c:50
(gdb) p td
$1 = (struct thread_data *) 0xfff9a730000
(gdb) p threads
$2 = (struct thread_data *) 0xfff9a730000
(gdb) p td->io_ops
$3 = (struct ioengine_ops *) 0x0
(gdb) p threads->io_ops
$4 = (struct ioengine_ops *) 0x1000e262720
(gdb)
================================================================================
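
For anyone not familiar with the loop in the diff above: the time base is
read as two 32-bit halves, and the upper half is read twice so that a carry
out of the lower half between the reads is noticed and the read is retried.
Below is a minimal portable sketch of that pattern. It is not fio code and
it does not use mfspr; the counter is simulated in software, and the names
sim_counter, reads, read_upper, read_lower and read_counter64 are all
hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Simulated 64-bit counter standing in for the hardware time base.
 * Assumption: real code would read the two halves with mfspr instead.
 * The start value is chosen so the lower half wraps on the next tick. */
static uint64_t sim_counter = 0x00000001FFFFFFFFULL;
static int reads;	/* number of half reads performed so far */

/* Return the upper 32 bits, then advance the counter by one tick so
 * that a carry can happen between two consecutive reads. */
static uint32_t read_upper(void)
{
	uint32_t v = (uint32_t)(sim_counter >> 32);

	sim_counter++;
	reads++;
	return v;
}

/* Return the lower 32 bits, then advance the counter by one tick. */
static uint32_t read_lower(void)
{
	uint32_t v = (uint32_t)sim_counter;

	sim_counter++;
	reads++;
	return v;
}

/* Read upper, lower, upper again; retry if the upper half changed,
 * because the lower half may then belong to a different upper value. */
static uint64_t read_counter64(void)
{
	uint32_t hi0, lo, hi1;

	do {
		hi0 = read_upper();
		lo = read_lower();
		hi1 = read_upper();
	} while (hi0 != hi1);

	return ((uint64_t)hi0 << 32) | lo;
}
```

With the simulated counter starting at 0x1FFFFFFFF, the first pass sees the
upper half change from 1 to 2 and retries. A single pass without the check
would have combined the stale upper half 1 with the already wrapped lower
half 0.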