Re: Regression with calibrate_xor_blocks, probably UML related

From: Boaz Harrosh <bharrosh@panasas.com>
To: Dan Williams <dan.j.williams@intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	NeilBrown <neilb@suse.de>,
	uml-devel <user-mode-linux-devel@lists.sourceforge.net>
Subject: Re: Regression with calibrate_xor_blocks, probably UML related
Date: Thu, 10 Feb 2011 17:51:05 +0200
Message-ID: <4D540969.9090507@panasas.com>
In-Reply-To: <4D52E4E1.7070705@panasas.com>

On 02/09/2011 09:02 PM, Boaz Harrosh wrote:
> I have a new module that uses the async_tx.h lib.
> 
> On the exact same module code based on 2.6.37 I see:
> 	xor: measuring software checksum speed
> 	   8regs     : 11312.000 MB/sec
> 	   8regs_prefetch:  9792.800 MB/sec
> 	   32regs    : 11220.400 MB/sec
> 	   32regs_prefetch:  9750.800 MB/sec
> 	xor: using function: 8regs (11312.000 MB/sec)
> 
> And all is well. But on code based on 2.6.38-rc4 I get hard stuck
> right after:
> 	xor: measuring software checksum speed
> 

OK, this does not depend on the kernel version; it is the same for both
.38-rc4 and .37. I was just luckier with .37.

And the same thing happens with the raid456 module. I do
[]$ modprobe raid456; modprobe --remove raid456
A few times it loads, printing the above checks, then at some
point it freezes. Sometimes on the first attempt, sometimes after 4-7
attempts. I never got through 10 times straight.

When it freezes (hard) I can see in my host that the UML is
at 100% CPU.
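
A busy-wait on a tick counter would be consistent with that 100% CPU
reading: the do_xor_speed loop quoted further down spins on
"while (jiffies == now)", so if the counter stops advancing the loop has
no other exit. Below is a minimal userspace sketch of that pattern, not
the kernel code. fake_jiffies, tick_every_1000, tick_never,
count_one_tick and the iteration limit are all hypothetical names added
for illustration; the limit is only there to make a stuck counter visible
instead of hanging.

```c
/* Hypothetical stand-in for the kernel's jiffies counter. */
volatile unsigned long fake_jiffies;

/* Number of calls made to the ticking source so far. */
unsigned long calls;

/* Hypothetical tick source: advances the counter once every 1000 calls. */
void tick_every_1000(void)
{
	if (++calls % 1000 == 0)
		fake_jiffies++;
}

/* Hypothetical tick source: never advances the counter (a lost timer). */
void tick_never(void)
{
}

/*
 * Count the iterations that fit into one tick, in the style of the quoted
 * loop, but give up after `limit` iterations instead of spinning forever.
 * Returns the count, or -1 if the counter never changed.
 */
long count_one_tick(void (*tick_source)(void), long limit)
{
	unsigned long now = fake_jiffies;
	long count = 0;

	while (fake_jiffies == now) {
		if (count >= limit)
			return -1;	/* the counter looks stuck */
		tick_source();		/* stands in for XOR work plus timer */
		count++;
	}
	return count;
}
```

Calling count_one_tick(tick_every_1000, 1000000) returns once the
simulated tick fires, while count_one_tick(tick_never, 1000000) hits the
limit and returns -1. Without such a limit the second case would never
return, which is the behaviour a frozen UML at 100% CPU would show if its
jiffies stopped moving. Whether that is what actually happens here is
still to be confirmed.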

BTW: when I manage to pass the tests I get the above numbers.
But when I load directly on the host I get:

 xor: automatically using best checksumming function: generic_sse
   generic_sse:  7596.000 MB/sec
 xor: using function: generic_sse (7596.000 MB/sec)
 raid6: int64x1   1660 MB/s
 raid6: int64x2   1832 MB/s
 raid6: int64x4   1566 MB/s
 raid6: int64x8   1175 MB/s
 raid6: sse2x1    3699 MB/s
 raid6: sse2x2    4398 MB/s
 raid6: sse2x4    5863 MB/s
 raid6: using algorithm sse2x4 (5863 MB/s)

and on the UML:

 raid6: int64x1   2019 MB/s
 raid6: int64x2   2208 MB/s
 raid6: int64x4   1892 MB/s
 raid6: int64x8   1528 MB/s
 raid6: using algorithm int64x2 (2208 MB/s)
 xor: measuring software checksum speed
   8regs     : 11308.000 MB/sec
   8regs_prefetch:  9795.600 MB/sec
   32regs    : 11236.000 MB/sec
   32regs_prefetch:  9752.400 MB/sec
 xor: using function: 8regs (11308.000 MB/sec)

So the raid6 sse is better on the host, but comparing int64xX, the UML is faster than the host.
And xor (raid5)? The host result is about 33% lower than the UML's
(7596 vs 11308 MB/sec). Does that say that UML's clock has
a bug?
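
To put the clock question in numbers: if the reported figure is derived
from how many passes complete before the tick counter changes (an
assumption here, since only the counting loop of do_xor_speed is quoted),
then a tick that lasts longer than its nominal length inflates the result
by the same factor. A rough sketch follows. ASSUMED_HZ,
ASSUMED_BENCH_BYTES and both function names are made up for illustration
and are not taken from crypto/xor.c.

```c
/* Assumed values for illustration only; the real ones depend on the
 * kernel configuration. */
#define ASSUMED_HZ		100ULL
#define ASSUMED_BENCH_BYTES	4096ULL

/*
 * Throughput in KB/sec implied by `count` passes over the buffer in one
 * tick, if the tick is believed to last exactly 1/ASSUMED_HZ seconds.
 */
unsigned long long reported_kb_per_sec(unsigned long long count)
{
	return count * (ASSUMED_HZ * ASSUMED_BENCH_BYTES / 1024);
}

/*
 * Passes that complete in a tick that really lasts `actual_us`
 * microseconds, for code that manages `passes_per_sec` passes per second.
 */
unsigned long long passes_in_tick(unsigned long long passes_per_sec,
				  unsigned long long actual_us)
{
	return passes_per_sec * actual_us / 1000000ULL;
}
```

With a hypothetical 2,000,000 passes per second, a nominal 10000 us tick
gives reported_kb_per_sec(passes_in_tick(2000000, 10000)) == 8000000,
while a tick stretched to 15000 us gives 12000000 for the same code. So
slow ticks alone could produce a gap of this kind. Note also that the two
xor numbers come from different functions (8regs on UML, generic_sse on
the host), so they are not directly comparable either way.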

Anyway, I'm trying to debug that xor.ko loading problem to see what
comes up. Any help is welcome.

Thanks
Boaz

> the UML is completely frozen. When I kill the uml from the host
> I can sometimes get this trace.
> 
> 750c7498:  [<6005f936>] bad_page+0xd8/0xf3
> 750c74c8:  [<60060c93>] get_page_from_freelist+0x333/0x47b
> 750c7508:  [<60131243>] put_dec+0x20/0x3c
> 750c75a0:  [<6001a0ac>] change_pre_exec+0x0/0x24
> 750c75b8:  [<60060ef1>] __alloc_pages_nodemask+0x116/0x65b
> 750c7668:  [<60132e25>] sprintf+0xa1/0xa3
> 750c76a0:  [<6001a0ac>] change_pre_exec+0x0/0x24
> 750c76b8:  [<60061446>] __get_free_pages+0x10/0x43
> 750c76c8:  [<60012875>] alloc_stack+0x1b/0x1d
> 750c76d8:  [<6001fe27>] run_helper+0x26/0x1b5
> 750c76e8:  [<60021553>] set_signals+0x1c/0x2e
> 750c7708:  [<6007efac>] __kmalloc+0x9e/0xc4
> 750c7748:  [<6001a544>] change+0x124/0x189
> 750c77e8:  [<601b77db>] _raw_spin_unlock+0x9/0xb
> 750c7818:  [<6001a5a9>] close_addr+0x0/0x1c
> 750c7828:  [<6001a5c3>] close_addr+0x1a/0x1c
> 750c7838:  [<6001926a>] iter_addresses+0x5f/0x76
> 750c7858:  [<6007e8e8>] kfree+0x92/0x9b
> 750c7898:  [<60022d01>] tuntap_close+0x24/0x38
> 750c78b8:  [<600194e4>] close_devices+0x4a/0x7f
> 750c78d8:  [<600121bf>] do_uml_exitcalls+0x12/0x23
> 750c78f8:  [<60012cd2>] uml_cleanup+0x1a/0x87
> 750c7928:  [<6002039b>] last_ditch_exit+0x9/0x16
> 750c79e8:  [<78817031>] xor_8regs_2+0x31/0x58 [xor]
> 750c7a18:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
> 750c7aa8:  [<601b77ce>] _raw_spin_unlock_irqrestore+0x18/0x1c
> 750c7ac8:  [<60029d8d>] try_to_wake_up+0x86/0x98
> 750c7d78:  [<601b548d>] printk+0xa0/0xa3
> 750c7e08:  [<78817633>] do_xor_speed+0x54/0xaf [xor]
> 750c7e20:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
> 750c7e58:  [<7881b057>] calibrate_xor_blocks+0x57/0xdf [xor]
> 750c7e68:  [<7881b000>] calibrate_xor_blocks+0x0/0xdf [xor]
> 750c7e78:  [<6001105a>] do_one_initcall+0x76/0x121
> 750c7eb8:  [<600563fd>] sys_init_module+0x78/0x1a6
> 750c7ee8:  [<60014d60>] handle_syscall+0x58/0x70
> 750c7f08:  [<60024163>] userspace+0x2dd/0x38a
> 750c7fc8:  [<600126af>] fork_handler+0x62/0x69
> 
> (gdb) list *(xor_8regs_2+0x31)
> 0x55 is in xor_8regs_2 (/usr0/export/dev/bharrosh/git/pub/scsi-misc/include/asm-generic/xor.h:29).
> 24                      p1[0] ^= p2[0];
> 25                      p1[1] ^= p2[1];
> 26                      p1[2] ^= p2[2];
> 27                      p1[3] ^= p2[3];
> 28                      p1[4] ^= p2[4];
> 29                      p1[5] ^= p2[5];
> 30                      p1[6] ^= p2[6];
> 31                      p1[7] ^= p2[7];
> 32                      p1 += 8;
> 33                      p2 += 8;
> (gdb) list *(calibrate_xor_blocks+0x0)
> 0xd52 is in calibrate_xor_blocks (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:101).
> 96                     speed / 1000, speed % 1000);
> 97      }
> 98
> 99      static int __init
> 100     calibrate_xor_blocks(void)
> 101     {
> 102             void *b1, *b2;
> 103             struct xor_block_template *f, *fastest;
> 104
> 105             /*
> (gdb) list *(do_xor_speed+0x54)
> 0x657 is in do_xor_speed (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:84).
> 79                      now = jiffies;
> 80                      count = 0;
> 81                      while (jiffies == now) {
> 82                              mb(); /* prevent loop optimzation */
> 83                              tmpl->do_2(BENCH_SIZE, b1, b2);
> 84                              mb();
> 85                              count++;
> 86                              mb();
> 87                      }
> 88                      if (count > max)
> (gdb) list *(calibrate_xor_blocks+0x57)
> 0xda9 is in calibrate_xor_blocks (/usr0/export/dev/bharrosh/git/pub/scsi-misc/crypto/xor.c:137).
> 132                             "checksumming function: %s\n",
> 133                             fastest->name);
> 134                     xor_speed(fastest);
> 135             } else {
> 136                     printk(KERN_INFO "xor: measuring software checksum speed\n");
> 137                     XOR_TRY_TEMPLATES;
> 138                     fastest = template_list;
> 139                     for (f = fastest; f; f = f->next)
> 140                             if (f->speed > fastest->speed)
> 141                                     fastest = f;
> (gdb) q
> 
> So it looks like the code in UML uses include/asm-generic/xor.h and that it gets
> stuck there. Has anything changed in this area in the last merge window?
> 
> I'm asking before I start the very difficult bisect.
> 
> Thanks for any tips
> Boaz
