* [PATCH] kexec: Add --lite option
  [not found] ` <20151207114547.GE16406@dhcppc13.redhat.com>
@ 2015-12-07 11:48 ` Pratyush Anand
  2015-12-07 13:16   ` James Morse
  0 siblings, 1 reply; 8+ messages in thread
From: Pratyush Anand @ 2015-12-07 11:48 UTC
To: linux-arm-kernel

Sorry, forgot to add linux-arm-kernel at lists.infradead.org. Now CCed.

On 07/12/2015:05:15:47 PM, Pratyush Anand wrote:
> +linux-arm-kernel at lists.infradead.org (Maybe someone from the arm kernel
> list can give some more input)
>
> On 04/11/2015:11:56:51 PM, Scott Wood wrote:
> > On Thu, 2015-10-22 at 12:08 -0700, Geoff Levand wrote:
> > > I notice the difference on my arm64 system, so I guess we
> > > are even on that.
> >
> > For me it was beyond "notice the difference" -- I thought it was completely
> > broken, and was preparing to debug, until it started spitting out output
> > over a minute later.
> >
> > Compiling the sha256 code with -O2 instead of -O0 cut it down to around 10
> > seconds (still unpleasant, but not quite as crazy... still unacceptable for
> > non-kdump, though).
>
> Yes, compiling the purgatory code with -O2 helps to improve the timing, and I
> notice that enabling the D-cache on top of -O2 does not improve it further.
> However, I still do not understand why there is a huge difference between
> the following two cases.
>
> 1) When we execute the kexec() system call in the first kernel, it
> calculates sha256 on all the binaries [1]. It takes an almost unnoticeable
> time (less than a second) there.
>
> 2) When purgatory is executed, it re-calculates sha256 using the same
> routines [2] on the same binary data as in case (1). But now it takes 10-20
> seconds (depending on the size of the binaries).
>
> Why does it not take the same time with -O2 + D-cache enabled? I think we
> should be able to achieve the same time in the second case as well. What is
> missing?
>
> ~Pratyush
>
> [1] http://git.kernel.org/cgit/utils/kernel/kexec/kexec-tools.git/tree/kexec/kexec.c#n650
> [2] http://git.kernel.org/cgit/utils/kernel/kexec/kexec-tools.git/tree/purgatory/purgatory.c#n20

* [PATCH] kexec: Add --lite option
  2015-12-07 11:48 ` [PATCH] kexec: Add --lite option Pratyush Anand
@ 2015-12-07 13:16   ` James Morse
  2015-12-07 14:07     ` Pratyush Anand
  0 siblings, 1 reply; 8+ messages in thread
From: James Morse @ 2015-12-07 13:16 UTC
To: linux-arm-kernel

Hi Pratyush,

On 07/12/15 11:48, Pratyush Anand wrote:
>> 1) When we execute the kexec() system call in the first kernel, it
>> calculates sha256 on all the binaries [1]. It takes an almost unnoticeable
>> time (less than a second) there.
>>
>> 2) When purgatory is executed, it re-calculates sha256 using the same
>> routines [2] on the same binary data as in case (1). But now it takes 10-20
>> seconds (depending on the size of the binaries).
>>
>> Why does it not take the same time with -O2 + D-cache enabled? I think we
>> should be able to achieve the same time in the second case as well. What is
>> missing?

I haven't benchmarked this, but:

util_lib/sha256.c contains calls out to memcpy().
In your case 1, this will use the glibc version. In case 2, it will use
the version implemented in purgatory/string.c, which is a byte-by-byte copy.

Thanks,

James
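
To make the memcpy() point above concrete, here is a minimal sketch of the two copy styles being compared. It is not the code from purgatory/string.c or glibc; the names copy_bytewise and copy_wordwise are hypothetical, and the sketch assumes a hosted C environment where a fixed-size memcpy() is normally lowered by the compiler to a single load or store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy: one load and one store per byte copied. */
void *copy_bytewise(void *dest, const void *src, size_t len)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dest;
}

/*
 * Word-at-a-time copy: moves 8 bytes per iteration, then finishes the
 * remainder byte by byte. The fixed-size memcpy() calls are the usual
 * alias-safe way to express an unaligned 64-bit load/store in C.
 */
void *copy_wordwise(void *dest, const void *src, size_t len)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (len >= sizeof(uint64_t)) {
		uint64_t word;

		memcpy(&word, s, sizeof(word));
		memcpy(d, &word, sizeof(word));
		d += sizeof(word);
		s += sizeof(word);
		len -= sizeof(word);
	}
	while (len--)
		*d++ = *s++;
	return dest;
}
```

Both functions produce the same result; the second simply needs roughly one eighth of the loop iterations for large buffers, which is the kind of difference an optimised library memcpy() exploits.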

* [PATCH] kexec: Add --lite option
  2015-12-07 13:16   ` James Morse
@ 2015-12-07 14:07     ` Pratyush Anand
  2015-12-08  1:03       ` Scott Wood
  2015-12-08 16:00       ` James Morse
  0 siblings, 2 replies; 8+ messages in thread
From: Pratyush Anand @ 2015-12-07 14:07 UTC
To: linux-arm-kernel

Hi James,

Thanks for the reply.

On 07/12/2015:01:16:06 PM, James Morse wrote:
> Hi Pratyush,
>
> On 07/12/15 11:48, Pratyush Anand wrote:
> >> 1) When we execute the kexec() system call in the first kernel, it
> >> calculates sha256 on all the binaries [1]. It takes an almost
> >> unnoticeable time (less than a second) there.
> >>
> >> 2) When purgatory is executed, it re-calculates sha256 using the same
> >> routines [2] on the same binary data as in case (1). But now it takes
> >> 10-20 seconds (depending on the size of the binaries).
> >>
> >> Why does it not take the same time with -O2 + D-cache enabled? I think we
> >> should be able to achieve the same time in the second case as well. What
> >> is missing?
>
> I haven't benchmarked this, but:
>
> util_lib/sha256.c contains calls out to memcpy().
> In your case 1, this will use the glibc version. In case 2, it will use
> the version implemented in purgatory/string.c, which is a byte-by-byte copy.
>

Yes, I agree that a byte copy is too slow. But memcpy() in sha256_update() will
copy only a few bytes (I think max 126 bytes). Most of the data will be
processed in the loop while( length >= 64 ){}, where we do not have any
memcpy. So I do not think that this would be causing such a difference.

Could it be that I am not using the correct memory attributes while setting up
the identity mapping and enabling the D-cache? My implementation is here:
https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f

~Pratyush
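
The argument above about how little data passes through memcpy() can be illustrated with a small model of a block-buffered update routine. This is only a sketch under assumptions: it is not the code from util_lib/sha256.c, the names hash_model, process_block and model_update are hypothetical, and the compression step is replaced by a stub so that the model only counts bytes.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64

struct hash_model {
	unsigned char buf[BLOCK_SIZE]; /* partial block carried between calls */
	size_t buffered;               /* number of valid bytes in buf */
	size_t copied;                 /* bytes that went through memcpy() */
	size_t direct;                 /* bytes consumed straight from the input */
};

/* Stand-in for the real compression function; this model only counts. */
static void process_block(struct hash_model *m, const unsigned char *block)
{
	(void)m;
	(void)block;
}

void model_update(struct hash_model *m, const unsigned char *in, size_t len)
{
	/* Top up a partial block left over from a previous call. */
	if (m->buffered) {
		size_t fill = BLOCK_SIZE - m->buffered;

		if (fill > len)
			fill = len;
		memcpy(m->buf + m->buffered, in, fill);
		m->copied += fill;
		m->buffered += fill;
		in += fill;
		len -= fill;
		if (m->buffered < BLOCK_SIZE)
			return;
		process_block(m, m->buf);
		m->buffered = 0;
	}

	/* Bulk path: whole blocks are hashed in place, without memcpy(). */
	while (len >= BLOCK_SIZE) {
		process_block(m, in);
		m->direct += BLOCK_SIZE;
		in += BLOCK_SIZE;
		len -= BLOCK_SIZE;
	}

	/* Stash the tail (at most 63 bytes) for the next call. */
	if (len) {
		memcpy(m->buf, in, len);
		m->copied += len;
		m->buffered = len;
	}
}
```

In this model a single call copies at most 63 bytes to complete an old block plus at most 63 bytes of tail, which matches the "max 126 bytes" estimate; everything else is counted under direct.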

* [PATCH] kexec: Add --lite option
  2015-12-07 14:07     ` Pratyush Anand
@ 2015-12-08  1:03       ` Scott Wood
  2015-12-08 16:00       ` James Morse
  1 sibling, 0 replies; 8+ messages in thread
From: Scott Wood @ 2015-12-08 1:03 UTC
To: linux-arm-kernel

On Mon, 2015-12-07 at 19:37 +0530, Pratyush Anand wrote:
> Hi James,
>
> Thanks for the reply.
>
> On 07/12/2015:01:16:06 PM, James Morse wrote:
> > Hi Pratyush,
> >
> > On 07/12/15 11:48, Pratyush Anand wrote:
> > > > 1) When we execute the kexec() system call in the first kernel, it
> > > > calculates sha256 on all the binaries [1]. It takes an almost
> > > > unnoticeable time (less than a second) there.
> > > >
> > > > 2) When purgatory is executed, it re-calculates sha256 using the same
> > > > routines [2] on the same binary data as in case (1). But now it takes
> > > > 10-20 seconds (depending on the size of the binaries).
> > > >
> > > > Why does it not take the same time with -O2 + D-cache enabled? I
> > > > think we should be able to achieve the same time in the second case
> > > > as well. What is missing?
> >
> > I haven't benchmarked this, but:
> >
> > util_lib/sha256.c contains calls out to memcpy().
> > In your case 1, this will use the glibc version. In case 2, it will use
> > the version implemented in purgatory/string.c, which is a byte-by-byte
> > copy.
> >
>
> Yes, I agree that a byte copy is too slow. But memcpy() in sha256_update()
> will copy only a few bytes (I think max 126 bytes). Most of the data will be
> processed in the loop while( length >= 64 ){}, where we do not have any
> memcpy. So I do not think that this would be causing such a difference.
>
> Could it be that I am not using the correct memory attributes while setting
> up the identity mapping and enabling the D-cache? My implementation is here:
> https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f

FWIW, purgatory is fast for me on PPC (sub-second), so between that (assuming
it's not due to some PPC-specific optimization) and the fact that you don't see
any improvement with cache, I'd guess there's something wrong with how you're
enabling caches.

-Scott

* [PATCH] kexec: Add --lite option
  2015-12-07 14:07     ` Pratyush Anand
  2015-12-08  1:03       ` Scott Wood
@ 2015-12-08 16:00       ` James Morse
  2015-12-09  9:28         ` Pratyush Anand
  1 sibling, 1 reply; 8+ messages in thread
From: James Morse @ 2015-12-08 16:00 UTC
To: linux-arm-kernel

Hi Pratyush,

On 07/12/15 14:07, Pratyush Anand wrote:
> On 07/12/2015:01:16:06 PM, James Morse wrote:
>> I haven't benchmarked this, but:
>>
>> util_lib/sha256.c contains calls out to memcpy().
>> In your case 1, this will use the glibc version. In case 2, it will use
>> the version implemented in purgatory/string.c, which is a byte-by-byte copy.
>>
>
> Yes, I agree that a byte copy is too slow. But memcpy() in sha256_update()
> will copy only a few bytes (I think max 126 bytes). Most of the data will be
> processed in the loop while( length >= 64 ){}, where we do not have any
> memcpy. So I do not think that this would be causing such a difference.

You're right, I benchmarked the two sha256.o files checksumming a 10MB
buffer - one takes 0.6s, the other 1.7s. We can probably expect a couple
of seconds to do this.

Is the sha256 really useful? Purgatory can't print out an error message
if it fails...

> Could it be that I am not using the correct memory attributes while setting
> up the identity mapping and enabling the D-cache? My implementation is here:
> https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f

I'm no expert, but that looks like you're setting it up as 'normal'
memory. You're missing some isb-s and tlbi-s: depending on how long the
changes to system state take, you may be using old memory-attributes or
page-tables. If you depend on a change to system state (like turning
the mmu on), you need explicit synchronisation, see section 12.3.2 of
the 'Architecture Programmers Guide' (arm den0024a), and D7.1.2 of the
ARM ARM.

I haven't managed to get your kexec-tools branch to work with v10 of
Geoff's series. It looks like you save registers to the stack, which
gives stale values once you turn the mmu off. You also do the opposite,
saving registers with the mmu off, then cleaning cache lines over the
top, corrupting the saved registers.

The page size of 64K is hard coded. Kexec-ing from a 4K kernel to a 4K
kernel will work, but only if the hardware also supports 64K; this will
be surprising to debug.

Thanks,

James
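
On the hard-coded 64K page size: one way to avoid depending on a granule the hardware may not implement is to check ID_AA64MMFR0_EL1 before building page tables. The sketch below only decodes a register value that the caller has already read (at EL1 or higher, with an mrs instruction); the function name granule_supported is hypothetical and this is not code from kexec-tools. The encodings are the ARMv8.0 ones; later architecture revisions define additional values that this sketch treats as unsupported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the translation granule fields in ID_AA64MMFR0_EL1. */
#define TGRAN4_SHIFT   28
#define TGRAN64_SHIFT  24
#define TGRAN16_SHIFT  20

/* Extract one 4-bit ID field from the register value. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * Report whether the given page size is implemented, based on a value
 * of ID_AA64MMFR0_EL1 supplied by the caller.
 * ARMv8.0 encodings: TGran4 and TGran64 use 0x0 for "supported" and
 * 0xf for "not supported"; TGran16 uses 0x1 for "supported" and 0x0
 * for "not supported".
 */
bool granule_supported(uint64_t mmfr0, unsigned int page_size)
{
	switch (page_size) {
	case 4096:
		return id_field(mmfr0, TGRAN4_SHIFT) == 0x0;
	case 16384:
		return id_field(mmfr0, TGRAN16_SHIFT) == 0x1;
	case 65536:
		return id_field(mmfr0, TGRAN64_SHIFT) == 0x0;
	default:
		return false;
	}
}
```

Identity-mapping code could call such a check and fall back to another granule, or leave the D-cache off, instead of assuming 64K is available.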

* [PATCH] kexec: Add --lite option
  2015-12-08 16:00       ` James Morse
@ 2015-12-09  9:28         ` Pratyush Anand
  2016-01-11 12:46           ` Pratyush Anand
  0 siblings, 1 reply; 8+ messages in thread
From: Pratyush Anand @ 2015-12-09 9:28 UTC
To: linux-arm-kernel

Hi James,

On 08/12/2015:04:00:17 PM, James Morse wrote:
> Hi Pratyush,
>
> On 07/12/15 14:07, Pratyush Anand wrote:
> > On 07/12/2015:01:16:06 PM, James Morse wrote:
> >> I haven't benchmarked this, but:
> >>
> >> util_lib/sha256.c contains calls out to memcpy().
> >> In your case 1, this will use the glibc version. In case 2, it will use
> >> the version implemented in purgatory/string.c, which is a byte-by-byte
> >> copy.
> >>
> >
> > Yes, I agree that a byte copy is too slow. But memcpy() in sha256_update()
> > will copy only a few bytes (I think max 126 bytes). Most of the data will
> > be processed in the loop while( length >= 64 ){}, where we do not have any
> > memcpy. So I do not think that this would be causing such a difference.
>
> You're right, I benchmarked the two sha256.o files checksumming a 10MB
> buffer - one takes 0.6s, the other 1.7s. We can probably expect a couple
> of seconds to do this.
>
> Is the sha256 really useful? Purgatory can't print out an error message
> if it fails...

kdump needs the sha256 integrity checks. Please see the previous replies from
Dave, Vivek and Eric in this thread.

Purgatory prints an error message in case sha256 fails. It prints the expected
and calculated sha256 values. You must pass --port with the proper value to see
the printed messages. Geoff has been able to see purgatory debug messages on
the foundation model with only --port; however, I need the patch below and must
pass --port-lsr as well in order to print all the characters properly.

https://github.com/pratyushanand/kexec-tools/commit/ab30f4015189b177dd2e78980f5b7e47c2d22fe4

So, on a system having pl011 base address 0xe1010000, I pass
--port=0xe1010000 --port-lsr=0xe1010018,0x80 to the kexec command.

>
>
> > Could it be that I am not using the correct memory attributes while
> > setting up the identity mapping and enabling the D-cache? My
> > implementation is here:
> > https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f
>

First of all, thanks a lot for taking the time to review it :-)

> I'm no expert, but that looks like you're setting it up as 'normal'

I am not an expert either. Shouldn't it be normal type memory? I think I will
need to define only the UART area as device type (currently it is not defined,
so I am not able to print messages while the mmu is enabled).

> memory. You're missing some isb-s and tlbi-s: depending on how long the
> changes to system state take, you may be using old memory-attributes or
> page-tables. If you depend on a change to system state (like turning
> the mmu on), you need explicit synchronisation, see section 12.3.2 of
> the 'Architecture Programmers Guide' (arm den0024a), and D7.1.2 of the
> ARM ARM.

I will go through these docs and the kernel's arch/arm64/kernel/head.S and will
rewrite the cache implementation.

>
> I haven't managed to get your kexec-tools branch to work with v10 of

Maybe you can try the master branch. It is almost the same as Geoff's.
Additionally, it has support for "wait for transmit completion before next
character transmission".

> Geoff's series. It looks like you save registers to the stack, which
> gives stale values once you turn the mmu off. You also do the opposite,
> saving registers with the mmu off, then cleaning cache lines over the
> top, corrupting the saved registers.
>
> The page size of 64K is hard coded. Kexec-ing from a 4K kernel to a 4K
> kernel will work, but only if the hardware also supports 64K; this will
> be surprising to debug.

OK, I will take care of this in the re-implementation.

Thanks

~Pratyush
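
The "wait for transmit completion before next character transmission" behaviour mentioned above can be sketched as a polled write. This is a sketch under assumptions, not the code from the linked patch: it reads the --port-lsr value as "status register address, ready mask", the names uart_port, uart_putc and uart_puts are hypothetical, and the registers are plain pointers so the example can also run against ordinary variables. On a PL011, offset 0x18 is the flag register (UARTFR) and bit 7 (0x80) is TXFE, transmit FIFO empty, which fits the 0xe1010018,0x80 pair.

```c
#include <stdint.h>

struct uart_port {
	volatile uint32_t *data;    /* transmit data register (--port) */
	volatile uint32_t *status;  /* status register to poll (--port-lsr address) */
	uint32_t ready_mask;        /* bit(s) that must be set before writing */
	unsigned long max_spins;    /* give up after this many polls */
};

/*
 * Write one character once the status register reports "ready".
 * Returns 0 on success, -1 if the port never became ready.
 */
int uart_putc(const struct uart_port *p, char c)
{
	unsigned long spins = 0;

	while ((*p->status & p->ready_mask) == 0) {
		if (++spins >= p->max_spins)
			return -1;
	}
	*p->data = (unsigned char)c;
	return 0;
}

/* Write a NUL-terminated string, stopping at the first failure. */
int uart_puts(const struct uart_port *p, const char *s)
{
	for (; *s; s++) {
		if (uart_putc(p, *s) != 0)
			return -1;
	}
	return 0;
}
```

Without the poll, characters written back to back can be dropped when the transmitter is still busy, which would explain output that prints only partially. The bounded spin is a choice made for this sketch; bare-metal code may instead wait indefinitely.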

* [PATCH] kexec: Add --lite option
  2015-12-09  9:28         ` Pratyush Anand
@ 2016-01-11 12:46           ` Pratyush Anand
  2016-01-12  1:06             ` Simon Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Pratyush Anand @ 2016-01-11 12:46 UTC
To: linux-arm-kernel

+Fu

Hi James,

> On 08/12/2015:04:00:17 PM, James Morse wrote:
> > Hi Pratyush,
> > I haven't managed to get your kexec-tools branch to work with v10 of

Thanks for all your feedback. It helped to stabilize the code. I took all your
suggestions and also made a few other modifications [1]. Now, purgatory does
SHA verification on mustang and seattle within a second. I think this would be
helpful in handling a few of the kdump issues with the watchdog. Maybe you can
give the code in branch purgatory-enable-dcache of my tree a try. Any feedback
would be welcome.

~Pratyush

[1] https://github.com/pratyushanand/kexec-tools/commit/5679d4baaa5e644f8302982c6f468214ed3d3f3d

* [PATCH] kexec: Add --lite option
  2016-01-11 12:46           ` Pratyush Anand
@ 2016-01-12  1:06             ` Simon Horman
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2016-01-12 1:06 UTC
To: linux-arm-kernel

On Mon, Jan 11, 2016 at 06:16:38PM +0530, Pratyush Anand wrote:
> +Fu
>
> Hi James,
>
> > On 08/12/2015:04:00:17 PM, James Morse wrote:
> > > Hi Pratyush,
> > > I haven't managed to get your kexec-tools branch to work with v10 of
>
> Thanks for all your feedback. It helped to stabilize the code. I took all
> your suggestions and also made a few other modifications [1]. Now, purgatory
> does SHA verification on mustang and seattle within a second. I think this
> would be helpful in handling a few of the kdump issues with the watchdog.
> Maybe you can give the code in branch purgatory-enable-dcache of my tree a
> try. Any feedback would be welcome.

Please post the patches here for review.

Thanks