From mboxrd@z Thu Jan 1 00:00:00 1970
From: richard -rw- weinberger
Subject: Re: Regression with calibrate_xor_blocks, probably UML related
Date: Fri, 11 Feb 2011 13:38:17 +0100
References: <4D52E4E1.7070705@panasas.com> <4D540969.9090507@panasas.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: Dan Williams, linux-kernel, linux-fsdevel, NeilBrown, uml-devel
To: Boaz Harrosh
In-Reply-To: <4D540969.9090507@panasas.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, Feb 10, 2011 at 4:51 PM, Boaz Harrosh wrote:
> On 02/09/2011 09:02 PM, Boaz Harrosh wrote:
>> I have a new module that uses the async_tx.h lib.
>>
>> With exactly the same module code based on 2.6.37 I see:
>>       xor: measuring software checksum speed
>>          8regs     : 11312.000 MB/sec
>>          8regs_prefetch:  9792.800 MB/sec
>>          32regs    : 11220.400 MB/sec
>>          32regs_prefetch:  9750.800 MB/sec
>>       xor: using function: 8regs (11312.000 MB/sec)
>>
>> And all is well. But on code based on 2.6.38-rc4 I get hard stuck
>> right after:
>>       xor: measuring software checksum speed
>>
>
> OK, this is not dependent on the kernel version; it is the same for both
> .38-rc4 and .37. I was just luckier with .37.
>
> And the same thing happens with the raid456 module. I do
> []$ modprobe raid456; modprobe --remove raid456
> A few times it loads, printing the above checks, then at some
> point it freezes. Sometimes at the first attempt, sometimes after 4-7
> attempts. I never got 10 in a row.
>
> When it freezes (hard) I can see on my host that the UML is
> at 100% CPU.
>
> BTW: when I manage to pass the tests I get the above numbers.
> But when I load directly on the host I get:
>
>  xor: automatically using best checksumming function: generic_sse
>    generic_sse:  7596.000 MB/sec
>  xor: using function: generic_sse (7596.000 MB/sec)
>  raid6: int64x1   1660 MB/s
>  raid6: int64x2   1832 MB/s
>  raid6: int64x4   1566 MB/s
>  raid6: int64x8   1175 MB/s
>  raid6: sse2x1    3699 MB/s
>  raid6: sse2x2    4398 MB/s
>  raid6: sse2x4    5863 MB/s
>  raid6: using algorithm sse2x4 (5863 MB/s)
>
> and on the UML:
>
>  raid6: int64x1   2019 MB/s
>  raid6: int64x2   2208 MB/s
>  raid6: int64x4   1892 MB/s
>  raid6: int64x8   1528 MB/s
>  raid6: using algorithm int64x2 (2208 MB/s)
>  xor: measuring software checksum speed
>    8regs     : 11308.000 MB/sec
>    8regs_prefetch:  9795.600 MB/sec
>    32regs    : 11236.000 MB/sec
>    32regs_prefetch:  9752.400 MB/sec
>  xor: using function: 8regs (11308.000 MB/sec)
>
> So the raid6 sse is better on the host, but comparing the int64xN results
> the UML is faster than the host. And raid5? That's 33% better results.
> Does that say that UML's clock has a bug?
>
> Anyway, I'm trying to debug that xor.ko loading problem to see what
> comes up. Any help is welcome.

Hmmm, can you bisect it?
Can you post your config? Then I can also try my best...

> Thanks
> Boaz
>
>> the UML is completely frozen. When I kill the uml from the host
>> I can sometimes get this trace.
>>
>> 750c7498:  [<6005f936>] bad_page+0xd8/0xf3
>> 750c74c8:  [<60060c93>] get_page_from_freelist+0x333/0x47b
>> 750c7508:  [<60131243>] put_dec+0x20/0x3c
>> 750c75a0:  [<6001a0ac>] change_pre_exec+0x0/0x24
>> 750c75b8:  [<60060ef1>] __alloc_pages_nodemask+0x116/0x65b
>> 750c7668:  [<60132e25>] sprintf+0xa1/0xa3
>> 750c76a0:  [<6001a0ac>] change_pre_exec+0x0/0x24
>> 750c76b8:  [<60061446>] __get_free_pages+0x10/0x43
>> 750c76c8:  [<60012875>] alloc_stack+0x1b/0x1d
>> 750c76d8:  [<6001fe27>] run_helper+0x26/0x1b5
>> 750c76e8:  [<60021553>] set_signals+0x1c/0x2e
>> 750c7708:  [<6007efac>] __kmalloc+0x9e/0xc4
>> 750c7748:  [<6001a544>] change+0x124/0x189
>> 750c77e8:  [<601b77db>] _raw_spin_unlock+0x9/0xb
>> 750c7818:  [<6001a5a9>] close_addr+0x0/0x1c
>> 750c7828:  [<6001a5c3>] close_addr+0x1a/0x1c
>> 750c7838:  [<6001926a>] iter_addresses+0x5f/0x76
>> 750c7858:  [<6007e8e8>] kfree+0x92/0x9b
>> 750c7898:  [<60022d01>] tuntap_close+0x24/0x38
>> 750c78b8:  [<600194e4>] close_devices+0x4a/0x7f
>> 750c78d8:  [<600121bf>] do_uml_exitcalls+0x12/0x23
>> 750c78f8:  [<60012cd2>] uml_cleanup+0x1a/0x87
>> 750c7928:  [<6002039b>] last_ditch_exit+0x9/0x16
>> 750c79e8:  [<78817031>] xor_8regs_2+0x31/0x58 [xor]
>> 750c7a18:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
>> 750c7aa8:  [<601b77ce>] _raw_spin_unlock_irqrestore+0x18/0x1c
>> 750c7ac8:  [<60029d8d>] try_to_wake_up+0x86/0x98
>> 750c7d78:  [<601b548d>] printk+0xa0/0xa3
>> 750c7e08:  [<78817633>] do_xor_speed+0x54/0xaf [xor]
>> 750c7e20:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
>> 750c7e58:  [<7881b057>] calibrate_xor_blocks+0x57/0xdf [xor]
>> 750c7e68:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
>> 750c7e78:  [<6001105a>] do_one_initcall+0x76/0x121
>> 750c7eb8:  [<600563fd>] sys_init_module+0x78/0x1a6
>> 750c7ee8:  [<60014d60>] handle_syscall+0x58/0x70
>> 750c7f08:  [<60024163>] userspace+0x2dd/0x38a
>> 750c7fc8:  [<600126af>] fork_handler+0x62/0x69
>>
>> (gdb) list *(xor_8regs_2+0x31)
>> 0x55 is in xor_8regs_2 (/usr0/export/dev/bharrosh/git/pub/scsi-misc/include/asm-generic/xor.h:29).
>> 24                      p1[0] ^= p2[0];
>> 25                      p1[1] ^= p2[1];
>> 26                      p1[2] ^= p2[2];
>> 27                      p1[3] ^= p2[3];
>> 28                      p1[4] ^= p2[4];
>> 29                      p1[5] ^= p2[5];
>> 30                      p1[6] ^= p2[6];
>> 31                      p1[7] ^= p2[7];
>> 32                      p1 += 8;
>> 33                      p2 += 8;
>> (gdb) list *(calibrate_xor_blocks+0x0)
>> 0xd52 is in calibrate_xor_blocks (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:101).
>> 96                     speed / 1000, speed % 1000);
>> 97      }
>> 98
>> 99      static int __init
>> 100     calibrate_xor_blocks(void)
>> 101     {
>> 102             void *b1, *b2;
>> 103             struct xor_block_template *f, *fastest;
>> 104
>> 105             /*
>> (gdb) list *(do_xor_speed+0x54)
>> 0x657 is in do_xor_speed (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:84).
>> 79                      now = jiffies;
>> 80                      count = 0;
>> 81                      while (jiffies == now) {
>> 82                              mb(); /* prevent loop optimzation */
>> 83                              tmpl->do_2(BENCH_SIZE, b1, b2);
>> 84                              mb();
>> 85                              count++;
>> 86                              mb();
>> 87                      }
>> 88                      if (count > max)
>> (gdb) list *(calibrate_xor_blocks+0x57)
>> 0xda9 is in calibrate_xor_blocks (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:137).
>> 132                             "checksumming function: %s\n",
>> 133                             fastest->name);
>> 134                     xor_speed(fastest);
>> 135             } else {
>> 136                     printk(KERN_INFO "xor: measuring software checksum speed\n");
>> 137                     XOR_TRY_TEMPLATES;
>> 138                     fastest = template_list;
>> 139                     for (f = fastest; f; f = f->next)
>> 140                             if (f->speed > fastest->speed)
>> 141                                     fastest = f;
>> (gdb) q
>>
>> So it looks like the code in UML links the include/asm-generic/xor.h and that it gets
>> stuck. Any thing changed in this area in last merge window?
>>
>> Before I start the very difficult bisect?
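
For reference, the do_xor_speed listing above spins in
while (jiffies == now), so it can only finish if the tick counter
advances while the loop runs. That would fit a guest stuck at 100% CPU,
but it is only a guess; nothing in the trace proves it. Below is a
minimal userspace C sketch of that loop shape, not the kernel code:
fake_jiffies, fake_timer(), passes_per_tick, SPIN_LIMIT, BENCH_WORDS and
kb_per_sec() are made-up names for illustration, and the rate formula is
an assumption about how a per-tick count becomes the printed figure.

```c
#include <stddef.h>

#define BENCH_WORDS 512     /* hypothetical buffer size, in unsigned longs */
#define SPIN_LIMIT  100000L /* hypothetical guard; the quoted loop has none */

/* Simulated tick counter standing in for jiffies. */
static volatile unsigned long fake_jiffies;

/* Passes between simulated ticks; 0 means the tick never arrives. */
static long passes_per_tick;

/* Same shape as the xor_8regs_2 listing: XOR p2 into p1, 8 words per step. */
static void xor_8regs_2(size_t words, unsigned long *p1, const unsigned long *p2)
{
	size_t lines = words / 8;

	while (lines--) {
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1[2] ^= p2[2];
		p1[3] ^= p2[3];
		p1[4] ^= p2[4];
		p1[5] ^= p2[5];
		p1[6] ^= p2[6];
		p1[7] ^= p2[7];
		p1 += 8;
		p2 += 8;
	}
}

/* Stand-in for the timer interrupt that would normally bump the counter. */
static void fake_timer(long count)
{
	if (passes_per_tick > 0 && count % passes_per_tick == 0)
		fake_jiffies++;
}

/* Count XOR passes within one tick; returns -1 if the tick never moved. */
static long count_one_tick(unsigned long *b1, const unsigned long *b2)
{
	unsigned long now = fake_jiffies;
	long count = 0;

	while (fake_jiffies == now) {
		xor_8regs_2(BENCH_WORDS, b1, b2);
		count++;
		fake_timer(count);
		if (fake_jiffies == now && count >= SPIN_LIMIT)
			return -1; /* without this guard the loop never exits */
	}
	return count;
}

/* Assumed conversion: passes per tick * ticks per second * bytes per pass. */
static long long kb_per_sec(long passes, long hz, long bytes_per_pass)
{
	return (long long)passes * hz * bytes_per_pass / 1024;
}
```

With passes_per_tick set to 0, count_one_tick() gives up with -1 where
the quoted loop would keep spinning. Under the assumed formula, a tick
that lasts longer than 1/HZ of real time would also inflate the reported
MB/sec, which may or may not be what happens under UML.
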
>>
>> Thanks for any tips
>> Boaz

-- 
Thanks,
//richard