* [U-Boot] fdt performance
@ 2013-10-17 6:24 Aaron Williams
2013-10-17 9:48 ` Albert ARIBAUD
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Aaron Williams @ 2013-10-17 6:24 UTC (permalink / raw)
To: u-boot
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat
device tree. In profiling our bootloader in our simulator I found that
the function eating up the most time is fdt_next_tag. Looking at it,
especially fdt_offset_ptr, it looks like there is a lot of room for
improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if
((offset + len) < offset) which will always be false, or
if (p + len < p)
len is always positive.
-Aaron
--
Aaron Williams
Software Engineer
Cavium, Inc.
(408) 943-7198 (510) 789-8988 (cell)
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-17 6:24 [U-Boot] fdt performance Aaron Williams
@ 2013-10-17 9:48 ` Albert ARIBAUD
2013-10-17 18:55 ` Wolfgang Denk
2013-10-17 22:27 ` Simon Glass
2 siblings, 0 replies; 14+ messages in thread
From: Albert ARIBAUD @ 2013-10-17 9:48 UTC (permalink / raw)
To: u-boot
Hi Aaron,
On Wed, 16 Oct 2013 23:24:04 -0700, Aaron Williams
<Aaron.Williams@caviumnetworks.com> wrote:
> Hi all,
>
> In our bootloader based off of 2013.07 we make extensive use of the flat
> device tree. In profiling our bootloader in our simulator I found that
> the function eating up the most time is fdt_next_tag. Looking at it,
> especially fdt_offset_ptr, it looks like there is a lot of room for
> improvement especially in the skip name section.
>
> Some of the checks in fdt_offset_ptr also look useless, such as if
> ((offset + len) < offset) which will always be false, or
> if (p + len < p)
>
> len is always positive.
I guess these tests are to catch corruption cases where the pointer
plus length wrap around the address space. Granted, this is far from
perfect protection, but still, it makes some sense.
In any case, if you found places where the code can be optimized, feel
free to directly post a patch!
(you can still put notes and comments below the '---' line)
> -Aaron
Amicalement,
--
Albert.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-17 6:24 [U-Boot] fdt performance Aaron Williams
2013-10-17 9:48 ` Albert ARIBAUD
@ 2013-10-17 18:55 ` Wolfgang Denk
2013-10-17 19:04 ` Andre Renaud
2013-10-17 22:27 ` Simon Glass
2 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Denk @ 2013-10-17 18:55 UTC (permalink / raw)
To: u-boot
Dear Aaron Williams,
In message <525F8284.4000304@caviumnetworks.com> you wrote:
>
> Some of the checks in fdt_offset_ptr also look useless, such as if
> ((offset + len) < offset) which will always be false, or
> if (p + len < p)
What happens if the "offset" or "p" point to addresses close to the
upper end of the address space, and adding "len" makes it wrap around?
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
The more we disagree, the more chance there is that@least one of
us is right.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-17 18:55 ` Wolfgang Denk
@ 2013-10-17 19:04 ` Andre Renaud
2013-10-18 20:11 ` Wolfgang Denk
0 siblings, 1 reply; 14+ messages in thread
From: Andre Renaud @ 2013-10-17 19:04 UTC (permalink / raw)
To: u-boot
Hi Wolfgang,
On 18 October 2013 07:55, Wolfgang Denk <wd@denx.de> wrote:
> In message <525F8284.4000304@caviumnetworks.com> you wrote:
>> Some of the checks in fdt_offset_ptr also look useless, such as if
>> ((offset + len) < offset) which will always be false, or
>> if (p + len < p)
>
> What happens if the "offset" or "p" point to addresses close to the
> upper end of the address space, and adding "len" makes it wrap around?
I'm not sure how particular U-Boot is about this, but the C standard
doesn't specify what to do in the situation of signed overflow, so
it's possible that these checks could be simply optimised away. The
portable way to write it (I believe) is:
if (INT_MAX - len < offset). I don't know what GCC does in this
situation specifically though.
Regards,
Andre
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-17 6:24 [U-Boot] fdt performance Aaron Williams
2013-10-17 9:48 ` Albert ARIBAUD
2013-10-17 18:55 ` Wolfgang Denk
@ 2013-10-17 22:27 ` Simon Glass
2013-11-14 23:36 ` Scott Wood
2014-01-14 6:13 ` Aaron Williams
2 siblings, 2 replies; 14+ messages in thread
From: Simon Glass @ 2013-10-17 22:27 UTC (permalink / raw)
To: u-boot
Hi Aaron,
On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams
<Aaron.Williams@caviumnetworks.com> wrote:
> Hi all,
>
> In our bootloader based off of 2013.07 we make extensive use of the flat
> device tree. In profiling our bootloader in our simulator I found that the
> function eating up the most time is fdt_next_tag. Looking at it, especially
> fdt_offset_ptr, it looks like there is a lot of room for improvement
> especially in the skip name section.
>
> Some of the checks in fdt_offset_ptr also look useless, such as if ((offset
> + len) < offset) which will always be false, or
> if (p + len < p)
>
> len is always positive.
Are you using CONFIG_OF_CONTROL?
If so, as a higher-level point, we could bring in an efficient DT
library, which converts the the FDT into a tree structure for faster
parsing. I can point you to a starting point if you like.
Regards,
Simon
>
> -Aaron
>
> --
> Aaron Williams
> Software Engineer
> Cavium, Inc.
> (408) 943-7198 (510) 789-8988 (cell)
>
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-17 19:04 ` Andre Renaud
@ 2013-10-18 20:11 ` Wolfgang Denk
2013-10-20 18:18 ` Michael Pratt
0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Denk @ 2013-10-18 20:11 UTC (permalink / raw)
To: u-boot
Dear Andre,
In message <CAPfzE3a2ne-xcjKUTK8WS78V0yxuSd50wSQvm1rSPgNUfwP4Ow@mail.gmail.com> you wrote:
>
> >> Some of the checks in fdt_offset_ptr also look useless, such as if
> >> ((offset + len) < offset) which will always be false, or
> >> if (p + len < p)
> >
> > What happens if the "offset" or "p" point to addresses close to the
> > upper end of the address space, and adding "len" makes it wrap around?
>
> I'm not sure how particular U-Boot is about this, but the C standard
> doesn't specify what to do in the situation of signed overflow, so
These are no signed numbers, right?
> it's possible that these checks could be simply optimised away. The
This is not hwat happens.
> portable way to write it (I believe) is:
> if (INT_MAX - len < offset). I don't know what GCC does in this
> situation specifically though.
This has nothing to do with GCC. It's a standard C question. Inmy
understanding, the expression "offset + len" (with "offset" being
"int", and "len" being "unsigned int"), will give an "unsigned int";
the comparison willt hen also be done using "unsigned int" data types.
So if you want to write a "portable" expression (though I have to
admit that I don't see how this would be more portable, or how the
current code is less portable), that would be:
(UINT_MAX - len < offset)
At least that would give the same results - but to me the meaning
would be totally unclear when I read the code - while I think I
understand the current form.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The price one pays for pursuing any profession, or calling, is an
intimate knowledge of its ugly side. - James Baldwin
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-18 20:11 ` Wolfgang Denk
@ 2013-10-20 18:18 ` Michael Pratt
2013-10-21 19:55 ` Wolfgang Denk
0 siblings, 1 reply; 14+ messages in thread
From: Michael Pratt @ 2013-10-20 18:18 UTC (permalink / raw)
To: u-boot
Hi Wolfgang,
On Fri, Oct 18, 2013 at 4:11 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Andre,
>
> In message <
> CAPfzE3a2ne-xcjKUTK8WS78V0yxuSd50wSQvm1rSPgNUfwP4Ow at mail.gmail.com> you
> wrote:
> >
> > >> Some of the checks in fdt_offset_ptr also look useless, such as if
> > >> ((offset + len) < offset) which will always be false, or
> > >> if (p + len < p)
> > >
> > > What happens if the "offset" or "p" point to addresses close to the
> > > upper end of the address space, and adding "len" makes it wrap around?
> >
> > I'm not sure how particular U-Boot is about this, but the C standard
> > doesn't specify what to do in the situation of signed overflow, so
>
> These are no signed numbers, right?
>
> > it's possible that these checks could be simply optimised away. The
>
> This is not hwat happens.
>
Actually, it is my understanding that the "if (p + len < p)" can be
optimized away. This exact case is discussed in the LWN article "GCC and
pointer overflows"[1].
Basically, the C standard states that pointer arithmetic should not cause
overflow, thus allowing the compiler to assume that "p + len" must always
be greater than "p".
The exact wording can be found in section 6.5.6 "Additive operators",
statement 8 of the C99 specification:
> If both the pointer operand and the result point to elements of the same
> array object, or one past the last element of the array object, the
> evaluation shall not produce an over?ow; otherwise, the behavior is
> unde?ned.
Thus, this use of pointer overflow is undefined.
This has nothing to do with GCC. It's a standard C question. Inmy
> understanding, the expression "offset + len" (with "offset" being
> "int", and "len" being "unsigned int"), will give an "unsigned int";
> the comparison willt hen also be done using "unsigned int" data types.
>
I agree this expression ((offset + len) < offset) is safe, since it will
used unsigned ints. Since "int" and "unsigned int" have the same rank,
"offset" will be converted to an unsigned int before the addition. This
comes from section 6.3.1.8 in the C99 specification:
> Otherwise, if the operand that has unsigned integer type has rank greater
> or equal to the rank of the type of the other operand, then the operand
> with signed integer type is converted to the type of the operand with
> unsigned integer type.
[1]: http://lwn.net/Articles/278137/
Regards,
Michael Pratt
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-20 18:18 ` Michael Pratt
@ 2013-10-21 19:55 ` Wolfgang Denk
2013-10-22 20:43 ` Michael Pratt
0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Denk @ 2013-10-21 19:55 UTC (permalink / raw)
To: u-boot
Dear Michael,
In message <CAPx6ZwHLn-VABzQOyAMF+T2VyQEc3MZDi1E_kdTVZG8OkJmBYA@mail.gmail.com> you wrote:
>
> > > it's possible that these checks could be simply optimised away. The
> >
> > This is not hwat happens.
>
> Actually, it is my understanding that the "if (p + len < p)" can be
> optimized away. This exact case is discussed in the LWN article "GCC and
> pointer overflows"[1].
No, this does not apply here. You miss a key point. We are not doing
any pointer arithmetics here. We have:
int offset;
unsigned int len;
and then do:
if (((offset + len) < offset) ...) ...
> Basically, the C standard states that pointer arithmetic should not cause
> overflow, thus allowing the compiler to assume that "p + len" must always
> be greater than "p".
This is totally irrelevant here. There are no pointers being used
here.
> --001a1133769056575204e93035f1
> Content-Type: text/html; charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
Can you please stop posting HTML? Thanks!
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
The human race is faced with a cruel choice: work or daytime tele-
vision.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-21 19:55 ` Wolfgang Denk
@ 2013-10-22 20:43 ` Michael Pratt
2013-10-23 4:13 ` Wolfgang Denk
0 siblings, 1 reply; 14+ messages in thread
From: Michael Pratt @ 2013-10-22 20:43 UTC (permalink / raw)
To: u-boot
Hi Wolfgang,
On Mon, Oct 21, 2013 at 3:55 PM, Wolfgang Denk <wd@denx.de> wrote:
>
> Dear Michael,
>
> In message <CAPx6ZwHLn-VABzQOyAMF+T2VyQEc3MZDi1E_kdTVZG8OkJmBYA@mail.gmail.com> you wrote:
> >
> > > > it's possible that these checks could be simply optimised away. The
> > >
> > > This is not hwat happens.
> >
> > Actually, it is my understanding that the "if (p + len < p)" can be
> > optimized away. This exact case is discussed in the LWN article "GCC and
> > pointer overflows"[1].
>
> No, this does not apply here. You miss a key point. We are not doing
> any pointer arithmetics here. We have:
>
> int offset;
> unsigned int len;
>
> and then do:
>
> if (((offset + len) < offset) ...)
We seem to have a misunderstanding, I did not mean to imply that the
"offset + len"
expression was undefined. I agree that it is fine. I was referring
to this statement from
Aaron's original email:
if (p + len < p)
Which is following:
const char *p;
unsigned int len;
That is the statement which is utilizing pointer overflow.
.
>
> > --001a1133769056575204e93035f1
> > Content-Type: text/html; charset=UTF-8
> > Content-Transfer-Encoding: quoted-printable
>
> Can you please stop posting HTML? Thanks!
Sorry about that! Hopefully this message is correct.
Regards,
Michael Pratt
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-22 20:43 ` Michael Pratt
@ 2013-10-23 4:13 ` Wolfgang Denk
0 siblings, 0 replies; 14+ messages in thread
From: Wolfgang Denk @ 2013-10-23 4:13 UTC (permalink / raw)
To: u-boot
Dear Michael,
In message <CAPx6ZwGMzAzJo=PuyRCJyJ_w=7kPjtdSHJTYYoB6dpH7pd+xLA@mail.gmail.com> you wrote:
>
> We seem to have a misunderstanding, I did not mean to imply that the
> "offset + len"
> expression was undefined. I agree that it is fine. I was referring
> to this statement from
> Aaron's original email:
>
> if (p + len < p)
>
> Which is following:
>
> const char *p;
> unsigned int len;
>
> That is the statement which is utilizing pointer overflow.
Agreed. That should be fixed.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
An Elephant is a mouse with an Operating System. - Knuth
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-17 22:27 ` Simon Glass
@ 2013-11-14 23:36 ` Scott Wood
2013-11-14 23:52 ` Simon Glass
2014-01-14 6:13 ` Aaron Williams
1 sibling, 1 reply; 14+ messages in thread
From: Scott Wood @ 2013-11-14 23:36 UTC (permalink / raw)
To: u-boot
On Thu, 2013-10-17 at 16:27 -0600, Simon Glass wrote:
> Hi Aaron,
>
> On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams
> <Aaron.Williams@caviumnetworks.com> wrote:
> > Hi all,
> >
> > In our bootloader based off of 2013.07 we make extensive use of the flat
> > device tree. In profiling our bootloader in our simulator I found that the
> > function eating up the most time is fdt_next_tag. Looking at it, especially
> > fdt_offset_ptr, it looks like there is a lot of room for improvement
> > especially in the skip name section.
> >
> > Some of the checks in fdt_offset_ptr also look useless, such as if ((offset
> > + len) < offset) which will always be false, or
> > if (p + len < p)
> >
> > len is always positive.
>
> Are you using CONFIG_OF_CONTROL?
>
> If so, as a higher-level point, we could bring in an efficient DT
> library, which converts the the FDT into a tree structure for faster
> parsing. I can point you to a starting point if you like.
I've also seen it be a noticeable performance problem (in slow
simulation environments) on the extensive DT fixups we do on FSL PPC
(repeated calls to functions like fdt_node_offset_by_compat_reg are
particularly bad).
Though, when it comes to parsing the tree, rather than modifying it, I'm
not sure that the flattened data structure is all that much worse than
an unflattened tree. Lookups by path would be faster, but lookups by
compatible would still have to visit every node. I think the usage
patterns are at least part of the problem -- repeatedly scanning the
entire tree, rather than going over it once and assigning node offsets
to drivers. The driver model ought to help here.
FWIW, if there is interest in unflattening the device tree, Freescale's
hypervisor has code for this:
http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree/src/livetree.c
http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree/include/devtree.h
I'm not sure how it compares to the code Simon had in mind, but it
supports merging nodes which could be useful for boards that do dynamic
device tree updates.
-Scott
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-11-14 23:36 ` Scott Wood
@ 2013-11-14 23:52 ` Simon Glass
0 siblings, 0 replies; 14+ messages in thread
From: Simon Glass @ 2013-11-14 23:52 UTC (permalink / raw)
To: u-boot
Hi Scott,
On Thu, Nov 14, 2013 at 4:36 PM, Scott Wood <scottwood@freescale.com> wrote:
> On Thu, 2013-10-17 at 16:27 -0600, Simon Glass wrote:
> > Hi Aaron,
> >
> > On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams
> > <Aaron.Williams@caviumnetworks.com> wrote:
> > > Hi all,
> > >
> > > In our bootloader based off of 2013.07 we make extensive use of the
> flat
> > > device tree. In profiling our bootloader in our simulator I found that
> the
> > > function eating up the most time is fdt_next_tag. Looking at it,
> especially
> > > fdt_offset_ptr, it looks like there is a lot of room for improvement
> > > especially in the skip name section.
> > >
> > > Some of the checks in fdt_offset_ptr also look useless, such as if
> ((offset
> > > + len) < offset) which will always be false, or
> > > if (p + len < p)
> > >
> > > len is always positive.
> >
> > Are you using CONFIG_OF_CONTROL?
> >
> > If so, as a higher-level point, we could bring in an efficient DT
> > library, which converts the the FDT into a tree structure for faster
> > parsing. I can point you to a starting point if you like.
>
> I've also seen it be a noticeable performance problem (in slow
> simulation environments) on the extensive DT fixups we do on FSL PPC
> (repeated calls to functions like fdt_node_offset_by_compat_reg are
> particularly bad).
>
> Though, when it comes to parsing the tree, rather than modifying it, I'm
> not sure that the flattened data structure is all that much worse than
> an unflattened tree. Lookups by path would be faster, but lookups by
> compatible would still have to visit every node. I think the usage
>
My thought was that compatible strings would be collated somewhere so there
is an efficient way to find them. However, that might violate the
simplicity rule.
> patterns are at least part of the problem -- repeatedly scanning the
> entire tree, rather than going over it once and assigning node offsets
> to drivers. The driver model ought to help here.
>
> FWIW, if there is interest in unflattening the device tree, Freescale's
> hypervisor has code for this:
>
> http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree/src/livetree.c
>
> http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree/include/devtree.h
>
> I'm not sure how it compares to the code Simon had in mind, but it
> supports merging nodes which could be useful for boards that do dynamic
> device tree updates.
>
Well that code looks great. I'm not sure if we have the demand as yet (as
you say driver model should largely allow the tree to be scanned
sequentially and only once) but I am sure it will come up one day.
Regards,
Simon
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2013-10-17 22:27 ` Simon Glass
2013-11-14 23:36 ` Scott Wood
@ 2014-01-14 6:13 ` Aaron Williams
2014-01-26 16:56 ` Simon Glass
1 sibling, 1 reply; 14+ messages in thread
From: Aaron Williams @ 2014-01-14 6:13 UTC (permalink / raw)
To: u-boot
Hi Simon,
Sorry for the long delay.
On 10/17/2013 03:27 PM, Simon Glass wrote:
> Hi Aaron,
>
> On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams
> <Aaron.Williams@caviumnetworks.com> wrote:
>> Hi all,
>>
>> In our bootloader based off of 2013.07 we make extensive use of the flat
>> device tree. In profiling our bootloader in our simulator I found that the
>> function eating up the most time is fdt_next_tag. Looking at it, especially
>> fdt_offset_ptr, it looks like there is a lot of room for improvement
>> especially in the skip name section.
>>
>> Some of the checks in fdt_offset_ptr also look useless, such as if ((offset
>> + len) < offset) which will always be false, or
>> if (p + len < p)
>>
>> len is always positive.
> Are you using CONFIG_OF_CONTROL?
>
> If so, as a higher-level point, we could bring in an efficient DT
> library, which converts the the FDT into a tree structure for faster
> parsing. I can point you to a starting point if you like.
>
> Regards,
> Simon
A higher-level point is not desirable since when we are experiencing the
performance issues we are running out of NOR flash or our simulator.
Since most of our customers use NOR flash this a huge issue for us. We
have very little memory available for holding data structures since
basically only the stack is available before relocation.
Taking out these checks significantly sped up our boot process.
If you're checking for a wrap-around it should not check for each byte
but should check only once if it will wrap and handle it accordingly. If
we're wrapping then the device tree is hosed and we have bigger problems.
-Aaron
--
Aaron Williams
Software Engineer
Cavium, Inc.
(408) 943-7198 (510) 789-8988 (cell)
^ permalink raw reply [flat|nested] 14+ messages in thread
* [U-Boot] fdt performance
2014-01-14 6:13 ` Aaron Williams
@ 2014-01-26 16:56 ` Simon Glass
0 siblings, 0 replies; 14+ messages in thread
From: Simon Glass @ 2014-01-26 16:56 UTC (permalink / raw)
To: u-boot
Hi Aaron,
On 13 January 2014 23:13, Aaron Williams
<Aaron.Williams@caviumnetworks.com> wrote:
> Hi Simon,
>
> Sorry for the long delay.
>
>
>
> On 10/17/2013 03:27 PM, Simon Glass wrote:
>>
>> Hi Aaron,
>>
>> On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams
>> <Aaron.Williams@caviumnetworks.com> wrote:
>>>
>>> Hi all,
>>>
>>> In our bootloader based off of 2013.07 we make extensive use of the flat
>>> device tree. In profiling our bootloader in our simulator I found that
>>> the
>>> function eating up the most time is fdt_next_tag. Looking at it,
>>> especially
>>> fdt_offset_ptr, it looks like there is a lot of room for improvement
>>> especially in the skip name section.
>>>
>>> Some of the checks in fdt_offset_ptr also look useless, such as if
>>> ((offset
>>> + len) < offset) which will always be false, or
>>> if (p + len < p)
>>>
>>> len is always positive.
>>
>> Are you using CONFIG_OF_CONTROL?
>>
>> If so, as a higher-level point, we could bring in an efficient DT
>> library, which converts the the FDT into a tree structure for faster
>> parsing. I can point you to a starting point if you like.
>>
>> Regards,
>> Simon
>
> A higher-level point is not desirable since when we are experiencing the
> performance issues we are running out of NOR flash or our simulator. Since
> most of our customers use NOR flash this a huge issue for us. We have very
> little memory available for holding data structures since basically only the
> stack is available before relocation.
>
> Taking out these checks significantly sped up our boot process.
>
> If you're checking for a wrap-around it should not check for each byte but
> should check only once if it will wrap and handle it accordingly. If we're
> wrapping then the device tree is hosed and we have bigger problems.
Are you scanning through the FDT multiple times before relocation?
Certainly libfdt is designed to be careful about things and there are
many checks. Are you suggesting adding some kind of CONFIG optoin tot
turn them off?
I'm having a hard time understanding why these simple checks (which
would expand to a few machine instructions) should take so long. Have
you confirmed that removing them does significantly speed up the
hardware, and it is not just an artifact of your profiling system?
It is certainly possible to pass a -ve number as the device tree
offset to any of the exported functions. This should result in correct
behaviour (returning an error) rather than a crash.
Of course anything that speeds up the code is welcome so long as it is
still correct.
Regards,
Simon
>
>
> -Aaron
>
> --
> Aaron Williams
> Software Engineer
> Cavium, Inc.
> (408) 943-7198 (510) 789-8988 (cell)
>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-01-26 16:56 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-17 6:24 [U-Boot] fdt performance Aaron Williams
2013-10-17 9:48 ` Albert ARIBAUD
2013-10-17 18:55 ` Wolfgang Denk
2013-10-17 19:04 ` Andre Renaud
2013-10-18 20:11 ` Wolfgang Denk
2013-10-20 18:18 ` Michael Pratt
2013-10-21 19:55 ` Wolfgang Denk
2013-10-22 20:43 ` Michael Pratt
2013-10-23 4:13 ` Wolfgang Denk
2013-10-17 22:27 ` Simon Glass
2013-11-14 23:36 ` Scott Wood
2013-11-14 23:52 ` Simon Glass
2014-01-14 6:13 ` Aaron Williams
2014-01-26 16:56 ` Simon Glass
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox