All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jon Hunter <jon-hunter-l0cyMroinI0@public.gmane.org>
To: Mitch Bradley <wmb-D5eQfiDGL7eakBO8gow8eQ@public.gmane.org>
Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Santosh Shilimkar
	<santosh.shilimkar-l0cyMroinI0@public.gmane.org>,
	Sebastien Guiriec <s-guiriec-l0cyMroinI0@public.gmane.org>,
	linux-omap-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Subject: Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts
Date: Tue, 23 Oct 2012 18:15:54 -0500	[thread overview]
Message-ID: <5087252A.50203@ti.com> (raw)
In-Reply-To: <5086CC02.6070801-D5eQfiDGL7eakBO8gow8eQ@public.gmane.org>

Hi Mitch,

On 10/23/2012 11:55 AM, Mitch Bradley wrote:
> On 10/23/2012 4:49 AM, Jon Hunter wrote:
> 
>> Therefore, I believe it will improve search time and hence, boot time if
>> we have interrupt-parent defined in each node.
> 
> I strongly suspect (based on many years of performance tuning, with
> special focus on boot time) that the time difference will be completely
> insignificant.  The total extra time for walking up the interrupt tree
> for every interrupt in a large system is comparable to the time it takes
> to send a few characters out a UART.  So you can get more improvement
> from eliminating a single printk() than from globally adding per-node
> interrupt-parent.
> 
> Furthermore, the cost of processing all of the interrupt-parent
> properties is probably similar to the cost of the avoided tree walks.
> 
> CPU cycles are very fast compared to I/O register accesses, say a factor
> of 100.  Now consider that many modern devices contain embedded
> microcontrollers (SD cards, network interface modules, USB hubs and
> devices, ...), and those devices usually require various delays measured
> in milliseconds, to ensure that the microcontroller is ready for the
> next initialization step.  Those delays are extremely long compared to
> CPU cycles.  Obviously, some of that can be overlapped by careful
> multithreading, but that isn't free either.
> 
> The bottom line is that I'm pretty sure that adding per-node
> interrupt-parent would not be worthwhile from the standpoint of speeding
> up boot time.

Absolutely, I don't expect this to miraculously improve the boot time or
suggest that this is a major contributor to boot time, but what is the
best approach in general in terms of efficiency (memory and time). In
other words, is there a best practice? And from your feedback, I
understand that adding a global interrupt-parent is a good practice.

For a bit of fun, I took an omap4430 board and benchmarked the time
taken by the of_irq_find_parent() when interrupt-parent was defined for
each node using interrupts and without.

There were a total of 47 device nodes using interrupts. Adding the
interrupt-parent to all 47 nodes increased the dtb from 13211 bytes to
13963 bytes.

On boot-up I saw 117 calls to of_irq_find_parent() for this platform
(there appears to be multiple calls for a given device). Without
interrupt-parent defined for each node total time spent in
of_irq_find_parent() was 1.028 ms where as with interrupt-parent defined
for each node the total time was 0.4032 ms. This was done using a
38.4MHz timer and the overhead of reading the timer 117 times was about
36 us.

I understand that this does not provide the full picture, but I wanted
to get a better handle on the times here. So yes the overall overhead
here is not significant for us to worry about.

Cheers
Jon

WARNING: multiple messages have this Message-ID (diff)
From: jon-hunter@ti.com (Jon Hunter)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts
Date: Tue, 23 Oct 2012 18:15:54 -0500	[thread overview]
Message-ID: <5087252A.50203@ti.com> (raw)
In-Reply-To: <5086CC02.6070801@firmworks.com>

Hi Mitch,

On 10/23/2012 11:55 AM, Mitch Bradley wrote:
> On 10/23/2012 4:49 AM, Jon Hunter wrote:
> 
>> Therefore, I believe it will improve search time and hence, boot time if
>> we have interrupt-parent defined in each node.
> 
> I strongly suspect (based on many years of performance tuning, with
> special focus on boot time) that the time difference will be completely
> insignificant.  The total extra time for walking up the interrupt tree
> for every interrupt in a large system is comparable to the time it takes
> to send a few characters out a UART.  So you can get more improvement
> from eliminating a single printk() than from globally adding per-node
> interrupt-parent.
> 
> Furthermore, the cost of processing all of the interrupt-parent
> properties is probably similar to the cost of the avoided tree walks.
> 
> CPU cycles are very fast compared to I/O register accesses, say a factor
> of 100.  Now consider that many modern devices contain embedded
> microcontrollers (SD cards, network interface modules, USB hubs and
> devices, ...), and those devices usually require various delays measured
> in milliseconds, to ensure that the microcontroller is ready for the
> next initialization step.  Those delays are extremely long compared to
> CPU cycles.  Obviously, some of that can be overlapped by careful
> multithreading, but that isn't free either.
> 
> The bottom line is that I'm pretty sure that adding per-node
> interrupt-parent would not be worthwhile from the standpoint of speeding
> up boot time.

Absolutely, I don't expect this to miraculously improve the boot time or
suggest that this is a major contributor to boot time, but what is the
best approach in general in terms of efficiency (memory and time). In
other words, is there a best practice? And from your feedback, I
understand that adding a global interrupt-parent is a good practice.

For a bit of fun, I took an omap4430 board and benchmarked the time
taken by the of_irq_find_parent() when interrupt-parent was defined for
each node using interrupts and without.

There were a total of 47 device nodes using interrupts. Adding the
interrupt-parent to all 47 nodes increased the dtb from 13211 bytes to
13963 bytes.

On boot-up I saw 117 calls to of_irq_find_parent() for this platform
(there appears to be multiple calls for a given device). Without
interrupt-parent defined for each node total time spent in
of_irq_find_parent() was 1.028 ms where as with interrupt-parent defined
for each node the total time was 0.4032 ms. This was done using a
38.4MHz timer and the overhead of reading the timer 117 times was about
36 us.

I understand that this does not provide the full picture, but I wanted
to get a better handle on the times here. So yes the overall overhead
here is not significant for us to worry about.

Cheers
Jon

WARNING: multiple messages have this Message-ID (diff)
From: Jon Hunter <jon-hunter@ti.com>
To: Mitch Bradley <wmb@firmworks.com>
Cc: Sebastien Guiriec <s-guiriec@ti.com>,
	<devicetree-discuss@lists.ozlabs.org>,
	<linux-kernel@vger.kernel.org>,
	Santosh Shilimkar <santosh.shilimkar@ti.com>,
	<linux-omap@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v2 1/4] ARM: dts: omap5: Update GPIO with address space and interrupts
Date: Tue, 23 Oct 2012 18:15:54 -0500	[thread overview]
Message-ID: <5087252A.50203@ti.com> (raw)
In-Reply-To: <5086CC02.6070801@firmworks.com>

Hi Mitch,

On 10/23/2012 11:55 AM, Mitch Bradley wrote:
> On 10/23/2012 4:49 AM, Jon Hunter wrote:
> 
>> Therefore, I believe it will improve search time and hence, boot time if
>> we have interrupt-parent defined in each node.
> 
> I strongly suspect (based on many years of performance tuning, with
> special focus on boot time) that the time difference will be completely
> insignificant.  The total extra time for walking up the interrupt tree
> for every interrupt in a large system is comparable to the time it takes
> to send a few characters out a UART.  So you can get more improvement
> from eliminating a single printk() than from globally adding per-node
> interrupt-parent.
> 
> Furthermore, the cost of processing all of the interrupt-parent
> properties is probably similar to the cost of the avoided tree walks.
> 
> CPU cycles are very fast compared to I/O register accesses, say a factor
> of 100.  Now consider that many modern devices contain embedded
> microcontrollers (SD cards, network interface modules, USB hubs and
> devices, ...), and those devices usually require various delays measured
> in milliseconds, to ensure that the microcontroller is ready for the
> next initialization step.  Those delays are extremely long compared to
> CPU cycles.  Obviously, some of that can be overlapped by careful
> multithreading, but that isn't free either.
> 
> The bottom line is that I'm pretty sure that adding per-node
> interrupt-parent would not be worthwhile from the standpoint of speeding
> up boot time.

Absolutely, I don't expect this to miraculously improve the boot time or
suggest that this is a major contributor to boot time, but what is the
best approach in general in terms of efficiency (memory and time). In
other words, is there a best practice? And from your feedback, I
understand that adding a global interrupt-parent is a good practice.

For a bit of fun, I took an omap4430 board and benchmarked the time
taken by the of_irq_find_parent() when interrupt-parent was defined for
each node using interrupts and without.

There were a total of 47 device nodes using interrupts. Adding the
interrupt-parent to all 47 nodes increased the dtb from 13211 bytes to
13963 bytes.

On boot-up I saw 117 calls to of_irq_find_parent() for this platform
(there appears to be multiple calls for a given device). Without
interrupt-parent defined for each node total time spent in
of_irq_find_parent() was 1.028 ms where as with interrupt-parent defined
for each node the total time was 0.4032 ms. This was done using a
38.4MHz timer and the overhead of reading the timer 117 times was about
36 us.

I understand that this does not provide the full picture, but I wanted
to get a better handle on the times here. So yes the overall overhead
here is not significant for us to worry about.

Cheers
Jon



  parent reply	other threads:[~2012-10-23 23:15 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-23  8:37 [PATCH v2 0/4] ARM: dts: Update OMAP5 with address space and interrupts Sebastien Guiriec
2012-10-23  8:37 ` Sebastien Guiriec
2012-10-23  8:37 ` Sebastien Guiriec
2012-10-23  8:37 ` [PATCH v2 1/4] ARM: dts: omap5: Update GPIO " Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec
2012-10-23 14:49   ` Jon Hunter
2012-10-23 14:49     ` Jon Hunter
2012-10-23 14:49     ` Jon Hunter
2012-10-23 15:09     ` Benoit Cousson
2012-10-23 15:09       ` Benoit Cousson
2012-10-23 15:09       ` Benoit Cousson
2012-10-23 15:59       ` Jon Hunter
2012-10-23 15:59         ` Jon Hunter
2012-10-23 15:59         ` Jon Hunter
2012-10-23 16:07         ` Benoit Cousson
2012-10-23 16:07           ` Benoit Cousson
2012-10-23 16:07           ` Benoit Cousson
2012-10-23 16:15           ` Sebastien Guiriec
2012-10-23 16:15             ` Sebastien Guiriec
2012-10-23 16:15             ` Sebastien Guiriec
2012-10-24  7:44             ` Benoit Cousson
2012-10-24  7:44               ` Benoit Cousson
2012-10-24  7:44               ` Benoit Cousson
2012-10-23 16:55     ` Mitch Bradley
2012-10-23 16:55       ` Mitch Bradley
     [not found]       ` <5086CC02.6070801-D5eQfiDGL7eakBO8gow8eQ@public.gmane.org>
2012-10-23 23:15         ` Jon Hunter [this message]
2012-10-23 23:15           ` Jon Hunter
2012-10-23 23:15           ` Jon Hunter
2012-10-24  0:18           ` Mitch Bradley
2012-10-24  0:18             ` Mitch Bradley
2012-10-23  8:37 ` [PATCH v2 2/4] ARM: dts: omap5: Update I2C " Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec
2012-10-23  8:37 ` [PATCH v2 3/4] ARM: dts: omap5: Update UART " Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec
2012-10-23  8:37 ` [PATCH v2 4/4] ARM: dts: omap5: Update MMC " Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec
2012-10-23  8:37   ` Sebastien Guiriec

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5087252A.50203@ti.com \
    --to=jon-hunter-l0cymroini0@public.gmane.org \
    --cc=devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-omap-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=s-guiriec-l0cyMroinI0@public.gmane.org \
    --cc=santosh.shilimkar-l0cyMroinI0@public.gmane.org \
    --cc=wmb-D5eQfiDGL7eakBO8gow8eQ@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.