* [PATCH] kexec: Add --lite option
[not found] ` <20151207114547.GE16406@dhcppc13.redhat.com>
@ 2015-12-07 11:48 ` Pratyush Anand
2015-12-07 13:16 ` James Morse
0 siblings, 1 reply; 8+ messages in thread
From: Pratyush Anand @ 2015-12-07 11:48 UTC (permalink / raw)
To: linux-arm-kernel
Sorry, forgot to add linux-arm-kernel at lists.infradead.org. Now CCed.
On 07/12/2015:05:15:47 PM, Pratyush Anand wrote:
> +linux-arm-kernel at lists.infradead.org (May be someone from arm kernel list can
> give some more input)
>
> On 04/11/2015:11:56:51 PM, Scott Wood wrote:
> > On Thu, 2015-10-22 at 12:08 -0700, Geoff Levand wrote:
> > > I notice the difference on the my arm64 system, so I guess we
> > > are even on that.
> >
> > For me it was beyond "notice the difference" -- I thought it was completely
> > broken, and was preparing to debug, until it started spitting out output over
> > a minute later.
> >
> > Compiling the sha256 code with -O2 instead of -O0 cut it down to around 10
> > seconds (still unpleasant, but not quite as crazy... still unacceptable for
> > non-kdump, though).
>
> Yes, compiling purgatory code with -O2 helps to improve the timing, and I notice
> that enabling D-cache on top of -O2 does not improve it further. However, I am
> still not able to understand that why there is huge difference between following
> two.
>
> 1) When we execute kexec() system call in first kernel, at that time it
> calculates sha256 on all the binaries [1]. It take almost un-noticeable time
> (less than a sec) there.
>
> 2) When purgatory is executed then it re-calculates sha256 using same routines
> [2] on same binary data as that of case (1). But, now it takes 10-20 sec
> (depending of size of binaries)?
>
> Why did not it take same time with O2 + D-cache enabled? I think, we should be
> able to achieve same time in second case as well. What is missing?
>
> ~Pratyush
>
> [1] http://git.kernel.org/cgit/utils/kernel/kexec/kexec-tools.git/tree/kexec/kexec.c#n650
> [2] http://git.kernel.org/cgit/utils/kernel/kexec/kexec-tools.git/tree/purgatory/purgatory.c#n20
* [PATCH] kexec: Add --lite option
2015-12-07 11:48 ` [PATCH] kexec: Add --lite option Pratyush Anand
@ 2015-12-07 13:16 ` James Morse
2015-12-07 14:07 ` Pratyush Anand
0 siblings, 1 reply; 8+ messages in thread
From: James Morse @ 2015-12-07 13:16 UTC (permalink / raw)
To: linux-arm-kernel
Hi Pratyush,
On 07/12/15 11:48, Pratyush Anand wrote:
>> 1) When we execute kexec() system call in first kernel, at that time it
>> calculates sha256 on all the binaries [1]. It take almost un-noticeable time
>> (less than a sec) there.
>>
>> 2) When purgatory is executed then it re-calculates sha256 using same routines
>> [2] on same binary data as that of case (1). But, now it takes 10-20 sec
>> (depending of size of binaries)?
>>
>> Why did not it take same time with O2 + D-cache enabled? I think, we should be
>> able to achieve same time in second case as well. What is missing?
I haven't benchmarked this, but:
util_lib/sha256.c contains calls out to memcpy().
In your case 1, this will use the glibc version. In case 2, it will use
the version implemented in purgatory/string.c, which is a byte-by-byte copy.
Thanks,
James
* [PATCH] kexec: Add --lite option
2015-12-07 13:16 ` James Morse
@ 2015-12-07 14:07 ` Pratyush Anand
2015-12-08 1:03 ` Scott Wood
2015-12-08 16:00 ` James Morse
0 siblings, 2 replies; 8+ messages in thread
From: Pratyush Anand @ 2015-12-07 14:07 UTC (permalink / raw)
To: linux-arm-kernel
Hi James,
Thanks for the reply.
On 07/12/2015:01:16:06 PM, James Morse wrote:
> Hi Pratyush,
>
> On 07/12/15 11:48, Pratyush Anand wrote:
> >> 1) When we execute kexec() system call in first kernel, at that time it
> >> calculates sha256 on all the binaries [1]. It take almost un-noticeable time
> >> (less than a sec) there.
> >>
> >> 2) When purgatory is executed then it re-calculates sha256 using same routines
> >> [2] on same binary data as that of case (1). But, now it takes 10-20 sec
> >> (depending of size of binaries)?
> >>
> >> Why did not it take same time with O2 + D-cache enabled? I think, we should be
> >> able to achieve same time in second case as well. What is missing?
>
> I haven't benchmarked this, but:
>
> util_lib/sha256.c contains calls out to memcpy().
> In your case 1, this will use the glibc version. In case 2, it will use
> the version implemented in purgatory/string.c, which is a byte-by-byte copy.
>
Yes, I agree that a byte-by-byte copy is slow. But memcpy() in sha256_update() will
copy only a few bytes (I think max 126 bytes). Most of the data is processed
by the loop while( length >= 64 ){}, which does not contain any memcpy(). So I do not
think that this would cause such a difference.
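To make the buffering argument concrete, here is a minimal sketch of the usual block-hash update structure. It is not the code from util_lib/sha256.c: the names sketch_ctx, sketch_compress and sketch_update are hypothetical, and the compression step is replaced by a counter. It only assumes what is stated above, namely that whole 64-byte blocks are consumed in place and memcpy() is used just for partial blocks. Under that assumption one call copies at most 63 bytes to top up the buffer plus at most 63 bytes of tail, which is where a figure of 126 would come from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 64

/* Hypothetical context: buffering state and counters only, no real hashing. */
struct sketch_ctx {
    unsigned char buf[BLOCK];
    size_t buf_len;   /* bytes currently waiting in buf */
    size_t copied;    /* bytes that went through memcpy() */
    size_t blocks;    /* 64-byte blocks handed to the compression step */
};

/* Stand-in for the SHA-256 compression function; it only counts blocks. */
static void sketch_compress(struct sketch_ctx *ctx, const unsigned char *block)
{
    (void)block;
    ctx->blocks++;
}

static void sketch_update(struct sketch_ctx *ctx, const unsigned char *data,
                          size_t length)
{
    assert(ctx->buf_len < BLOCK);

    /* Top up a partially filled buffer first (at most 63 bytes copied). */
    if (ctx->buf_len) {
        size_t want = BLOCK - ctx->buf_len;
        size_t n = length < want ? length : want;

        memcpy(ctx->buf + ctx->buf_len, data, n);
        ctx->copied += n;
        ctx->buf_len += n;
        data += n;
        length -= n;
        if (ctx->buf_len == BLOCK) {
            sketch_compress(ctx, ctx->buf);
            ctx->buf_len = 0;
        }
    }

    /* Bulk path: whole blocks are consumed in place, with no memcpy(). */
    while (length >= BLOCK) {
        sketch_compress(ctx, data);
        data += BLOCK;
        length -= BLOCK;
    }

    /* Stash the tail (at most 63 bytes) for the next call. */
    if (length) {
        memcpy(ctx->buf, data, length);
        ctx->copied += length;
        ctx->buf_len = length;
    }
}
```

With this structure the number of copied bytes per call is bounded regardless of how large the input is, so the speed of memcpy() has little influence on the total time.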
Could it be that I am not using the right memory attributes while setting
up the identity mapping and enabling the D-cache? My implementation is here:
https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f
~Pratyush
* [PATCH] kexec: Add --lite option
2015-12-07 14:07 ` Pratyush Anand
@ 2015-12-08 1:03 ` Scott Wood
2015-12-08 16:00 ` James Morse
1 sibling, 0 replies; 8+ messages in thread
From: Scott Wood @ 2015-12-08 1:03 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2015-12-07 at 19:37 +0530, Pratyush Anand wrote:
> Hi James,
>
> Thanks for the reply.
>
> On 07/12/2015:01:16:06 PM, James Morse wrote:
> > Hi Pratyush,
> >
> > On 07/12/15 11:48, Pratyush Anand wrote:
> > > > 1) When we execute kexec() system call in first kernel, at that time it
> > > > calculates sha256 on all the binaries [1]. It take almost un-noticeable time
> > > > (less than a sec) there.
> > > >
> > > > 2) When purgatory is executed then it re-calculates sha256 using same routines
> > > > [2] on same binary data as that of case (1). But, now it takes 10-20 sec
> > > > (depending of size of binaries)?
> > > >
> > > > Why did not it take same time with O2 + D-cache enabled? I think, we should be
> > > > able to achieve same time in second case as well. What is missing?
> >
> > I haven't benchmarked this, but:
> >
> > util_lib/sha256.c contains calls out to memcpy().
> > In your case 1, this will use the glibc version. In case 2, it will use
> > the version implemented in purgatory/string.c, which is a byte-by-byte copy.
> >
>
> Yes, I agree that byte copy is too slow. But, memcpy() in sha256_update() will
> copy only few bytes (I think max 126 bytes). Most of the data will be processed
> using loop while( length >= 64 ){}, where we do not have any memcpy.So, I do not
> think that this would be causing such a difference.
>
> Could it be the case that I am not using perfect memory attributes while setting
> up identity mapping and enabling D-cache. My implementation is here:
> https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f
FWIW, purgatory is fast for me on PPC (sub-second), so between that (assuming
it's not due to some PPC-specific optimization) and the fact that you don't
see any improvement with cache, I'd guess there's something wrong with how
you're enabling caches.
-Scott
* [PATCH] kexec: Add --lite option
2015-12-07 14:07 ` Pratyush Anand
2015-12-08 1:03 ` Scott Wood
@ 2015-12-08 16:00 ` James Morse
2015-12-09 9:28 ` Pratyush Anand
1 sibling, 1 reply; 8+ messages in thread
From: James Morse @ 2015-12-08 16:00 UTC (permalink / raw)
To: linux-arm-kernel
Hi Pratyush,
On 07/12/15 14:07, Pratyush Anand wrote:
> On 07/12/2015:01:16:06 PM, James Morse wrote:
>> I haven't benchmarked this, but:
>>
>> util_lib/sha256.c contains calls out to memcpy().
>> In your case 1, this will use the glibc version. In case 2, it will use
>> the version implemented in purgatory/string.c, which is a byte-by-byte copy.
>>
>
> Yes, I agree that byte copy is too slow. But, memcpy() in sha256_update() will
> copy only few bytes (I think max 126 bytes). Most of the data will be processed
> using loop while( length >= 64 ){}, where we do not have any memcpy.So, I do not
> think that this would be causing such a difference.
You're right. I benchmarked the two sha256.o files checksumming a 10MB
buffer: one takes 0.6s, the other 1.7s, so we can probably expect this to
take a couple of seconds.
Is the sha256 really useful? Purgatory can't print out an error message
if it fails...
> Could it be the case that I am not using perfect memory attributes while setting
> up identity mapping and enabling D-cache. My implementation is here:
> https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f
I'm no expert, but that looks like you're setting it up as 'normal'
memory. You're missing some isb-s and tlbi-s: depending on how long the
changes to system state take, you may be using old memory-attributes or
page-tables. If you depend on a change to system state, (like turning
the mmu on), you need explicit synchronisation, see section 12.3.2 of
the 'Architecture Programmers Guide' (arm den0024a), and D7.1.2 of the
ARM ARM.
I haven't managed to get your kexec-tools branch to work with v10 of
Geoff's series. It looks like you save registers to the stack, which
give stale values once you turn the mmu off. You also do the opposite,
saving registers with the mmu off, then cleaning cache lines over the
top, corrupting the saved registers.
The page size of 64K is hard coded. Kexec-ing from a 4K kernel to a 4K
kernel will work, but only if the hardware also supports 64K pages. This will
be surprising to debug.
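As an illustration of how a hard-coded granule could be avoided, the sketch below decodes the translation granule fields of a raw ID_AA64MMFR0_EL1 value. The field positions (TGran4 at bits [31:28], TGran64 at bits [27:24], TGran16 at bits [23:20]) and the "not implemented" encodings are taken from the ARMv8-A architecture. The enum and function names are hypothetical, and reading the register itself (which needs EL1 or higher) is deliberately left out, so this is not code from any kexec-tools branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the three ARMv8-A translation granules. */
enum granule { GRANULE_4K, GRANULE_16K, GRANULE_64K };

/* Extract the 4-bit ID field that starts at bit 'shift'. */
static unsigned id_field(uint64_t reg, unsigned shift)
{
    assert(shift <= 60);
    return (unsigned)((reg >> shift) & 0xf);
}

/*
 * 'mmfr0' is a raw ID_AA64MMFR0_EL1 value obtained elsewhere.
 * TGran4 and TGran64 use 0xf for "not implemented";
 * TGran16 uses 0x0 for "not implemented".
 */
static bool granule_supported(uint64_t mmfr0, enum granule g)
{
    switch (g) {
    case GRANULE_4K:
        return id_field(mmfr0, 28) != 0xf;
    case GRANULE_64K:
        return id_field(mmfr0, 24) != 0xf;
    case GRANULE_16K:
        return id_field(mmfr0, 20) != 0x0;
    }
    return false;
}
```

A caller could check the granule it intends to use before building the page tables, and fall back to another supported granule, or to running with the caches off, when the check fails.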
Thanks,
James
* [PATCH] kexec: Add --lite option
2015-12-08 16:00 ` James Morse
@ 2015-12-09 9:28 ` Pratyush Anand
2016-01-11 12:46 ` Pratyush Anand
0 siblings, 1 reply; 8+ messages in thread
From: Pratyush Anand @ 2015-12-09 9:28 UTC (permalink / raw)
To: linux-arm-kernel
Hi James,
On 08/12/2015:04:00:17 PM, James Morse wrote:
> Hi Pratyush,
>
> On 07/12/15 14:07, Pratyush Anand wrote:
> > On 07/12/2015:01:16:06 PM, James Morse wrote:
> >> I haven't benchmarked this, but:
> >>
> >> util_lib/sha256.c contains calls out to memcpy().
> >> In your case 1, this will use the glibc version. In case 2, it will use
> >> the version implemented in purgatory/string.c, which is a byte-by-byte copy.
> >>
> >
> > Yes, I agree that byte copy is too slow. But, memcpy() in sha256_update() will
> > copy only few bytes (I think max 126 bytes). Most of the data will be processed
> > using loop while( length >= 64 ){}, where we do not have any memcpy.So, I do not
> > think that this would be causing such a difference.
>
> You're right, I benchmarked the two sha256.o files checksumming a 10MB
> buffer - one takes 0.6s, the other 1.7s, we can probably expect a couple
> of seconds to do this.
>
> Is the sha256 really useful? Purgatory can't print out an error message,
> if it fails...
kdump needs the sha256 integrity checks. Please see the previous replies from Dave,
Vivek and Eric in this thread.
Purgatory does print an error message if the sha256 check fails: it prints the
expected and calculated sha256 values. You must pass --port with the proper value
to see the messages. Geoff has been able to see purgatory debug messages on the
Foundation Model with only --port; however, I need the patch below and must pass
--port-lsr as well in order to print all the characters properly.
https://github.com/pratyushanand/kexec-tools/commit/ab30f4015189b177dd2e78980f5b7e47c2d22fe4
So, on a system with the pl011 base address at 0xe1010000, I pass --port=0xe1010000
--port-lsr=0xe1010018,0x80 to the kexec command.
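The two --port-lsr values above line up with the PL011 register map: offset 0x18 is the flag register (UARTFR) and 0x80 is its TXFE (transmit FIFO empty) bit. A minimal sketch of what "wait before sending the next character" could look like follows. The struct and function names are hypothetical, the 32-bit register width and the polarity (wait until the masked bit is set) are assumptions, and the actual logic in the linked patch may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the console, filled in from the options. */
struct uart_cfg {
    volatile uint32_t *data;    /* register named by --port */
    volatile uint32_t *status;  /* register named by the first --port-lsr value */
    uint32_t ready_mask;        /* second --port-lsr value */
};

/*
 * Busy-wait until the status register reports "ready", then write one
 * character. Assumption: a set bit under ready_mask means the transmitter
 * can accept data (true for PL011 TXFE, not for every UART).
 */
static void uart_putc(const struct uart_cfg *u, char c)
{
    assert(u->ready_mask != 0);
    while ((*u->status & u->ready_mask) == 0)
        ;
    *u->data = (uint32_t)(unsigned char)c;
}
```

Without such a wait, characters written back to back can overrun the transmitter, which would match the symptom of some characters not being printed properly.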
>
>
> > Could it be the case that I am not using perfect memory attributes while setting
> > up identity mapping and enabling D-cache. My implementation is here:
> > https://github.com/pratyushanand/kexec-tools/commit/8efdbc56b52f99a8a074edd0ddc519d7b68be82f
>
First of all, thanks a lot for taking the time to review it :-)
> I'm no expert, but that looks like you're setting it up as 'normal'
I am not an expert either.
Shouldn't it be normal type memory? I think I will only need to define the UART
area as device type (currently it is not defined, so I am not able to
print messages while the mmu is enabled).
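For reference, the normal/device distinction is expressed through MAIR_ELx attribute bytes that page-table entries select by index. The sketch below composes such a value in plain C. The attribute encodings (0x00 for Device-nGnRnE, 0x44 for Normal Non-cacheable, 0xff for Normal Write-Back) are the architectural ones, and AttrIndx occupies bits [4:2] of a stage 1 block or page descriptor. The macro and function names and the choice of slots are hypothetical, not taken from the implementation linked above.

```c
#include <assert.h>
#include <stdint.h>

/* MAIR_ELx attribute encodings defined by the ARMv8-A architecture. */
#define MAIR_ATTR_DEVICE_nGnRnE 0x00u  /* device memory, e.g. a UART */
#define MAIR_ATTR_NORMAL_NC     0x44u  /* normal memory, non-cacheable */
#define MAIR_ATTR_NORMAL_WB     0xffu  /* normal memory, write-back cacheable */

/* Place 'attr' into slot 'idx' (0..7) of a MAIR value. */
static uint64_t mair_set(uint64_t mair, unsigned idx, uint8_t attr)
{
    assert(idx < 8);
    mair &= ~((uint64_t)0xff << (idx * 8));
    return mair | ((uint64_t)attr << (idx * 8));
}

/* AttrIndx field of a stage 1 block/page descriptor, bits [4:2]. */
static uint64_t desc_attr_index(unsigned idx)
{
    assert(idx < 8);
    return (uint64_t)idx << 2;
}
```

With, say, slot 0 holding the device attribute and slot 1 the write-back attribute, the RAM mappings would use index 1 and a mapping covering the UART would use index 0.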
> memory. You're missing some isb-s and tlbi-s: depending on how long the
> changes to system state take, you may be using old memory-attributes or
> page-tables. If you depend on a change to system state, (like turning
> the mmu on), you need explicit synchronisation, see section 12.3.2 of
> the 'Architecture Programmers Guide' (arm den0024a), and D7.1.2 of the
> ARM ARM.
I will go through these docs and the kernel's arch/arm64/kernel/head.S, and will
rewrite the cache implementation.
>
> I haven't managed to get your kexec-tools branch to work with v10 of
Maybe you can try the master branch. It is almost the same as Geoff's.
Additionally, it has support for "wait for transmit completion before next
character transmission".
> Geoff's series. It looks like you save registers to the stack, which
> give stale values once you turn the mmu off. You also do the opposite,
> saving registers with the mmu off, then cleaning cache lines over the
> top, corrupting the saved registers.
>
> The page size of 64K is hard coded. Kexec-ing from a 4K kernel, to a 4K
> kernel will work, but only if the hardware also supports 64K, this will
> be surprising to debug.
OK, I will take care of this in the re-implementation.
Thanks
~Pratyush
* [PATCH] kexec: Add --lite option
2015-12-09 9:28 ` Pratyush Anand
@ 2016-01-11 12:46 ` Pratyush Anand
2016-01-12 1:06 ` Simon Horman
0 siblings, 1 reply; 8+ messages in thread
From: Pratyush Anand @ 2016-01-11 12:46 UTC (permalink / raw)
To: linux-arm-kernel
+Fu
Hi James,
> On 08/12/2015:04:00:17 PM, James Morse wrote:
> > Hi Pratyush,
> > I haven't managed to get your kexec-tools branch to work with v10 of
Thanks for all your feedback. It helped to stabilize the code. I took
all your suggestions and also made a few other modifications [1]. Now purgatory
does the SHA verification on mustang and seattle within a second. I think this
will help in handling a few of the kdump issues with the watchdog. Maybe you
could try the code in the purgatory-enable-dcache branch of my tree. Any
feedback would be welcome.
~Pratyush
[1] https://github.com/pratyushanand/kexec-tools/commit/5679d4baaa5e644f8302982c6f468214ed3d3f3d
* [PATCH] kexec: Add --lite option
2016-01-11 12:46 ` Pratyush Anand
@ 2016-01-12 1:06 ` Simon Horman
0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2016-01-12 1:06 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jan 11, 2016 at 06:16:38PM +0530, Pratyush Anand wrote:
> +Fu
>
> Hi James,
>
> > On 08/12/2015:04:00:17 PM, James Morse wrote:
> > > Hi Pratyush,
>
> > > I haven't managed to get your kexec-tools branch to work with v10 of
>
> Thanks for all your feedback. It helped to stabilize the code better. I took
> all your suggestions and also did few other modifications [1]. Now, purgatory
> does SHA verifications on mustang and seattle within a second. I think this
> would be helpful in handling few of the kdump issues with watchdog. May be you
> can give a try to the code in branch purgatory-enable-dcache of my tree. Any
> feedback would be welcome.
Please post the patches here for review.
Thanks