From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: Marc Gonzalez <marc.w.gonzalez@free.fr>
Cc: Robin Murphy <robin.murphy@arm.com>,
Dmitry Torokhov <dmitry.torokhov@gmail.com>,
Bjorn Andersson <bjorn.andersson@linaro.org>,
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>,
Stephen Boyd <sboyd@kernel.org>,
Michael Turquette <mturquette@baylibre.com>,
LKML <linux-kernel@vger.kernel.org>,
Sudip Mukherjee <sudipm.mukherjee@gmail.com>,
Guenter Roeck <linux@roeck-us.net>,
linux-clk <linux-clk@vger.kernel.org>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
x86 <x86@kernel.org>
Subject: Re: [PATCH v1] clk: Convert managed get functions to devm_add_action API
Date: Thu, 12 Dec 2019 17:05:37 +0000
Message-ID: <20191212170537.GL25745@shell.armlinux.org.uk>
In-Reply-To: <6a647c20-c2fa-f14c-256d-6516d0ad03b0@free.fr>
On Thu, Dec 12, 2019 at 05:59:04PM +0100, Marc Gonzalez wrote:
> On 12/12/2019 15:47, Robin Murphy wrote:
>
> > On 12/12/2019 1:53 pm, Marc Gonzalez wrote:
> >
> >> On 11/12/2019 23:28, Dmitry Torokhov wrote:
> >>
> >>> On Wed, Dec 11, 2019 at 05:17:28PM +0100, Marc Gonzalez wrote:
> >>>
> >>>> What is the rationale for the devm_add_action API?
> >>>
> >>> For one-off and maybe complex unwind actions in drivers that wish to use
> >>> the devm API (as mixing devm and manual release is verboten). It is also
> >>> often used when some core subsystem does not provide enough devm APIs.
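
The unwind semantics described above can be illustrated with a minimal userspace sketch. It assumes a simplified model in which each registration pushes an entry onto a per-device list and teardown runs the entries newest first, which is the reverse order in which devres releases resources. The names model_dev, model_add_action(), model_release_all() and record_release() are hypothetical. The real kernel entry point is devm_add_action(struct device *dev, void (*action)(void *), void *data), and this is not its implementation.

```c
#include <stdlib.h>

/* Hypothetical model of one registered action (not the kernel's struct). */
struct model_action {
	struct model_action *next;	/* newer entries sit in front: LIFO */
	void (*action)(void *);		/* unwind callback */
	void *data;			/* argument passed to the callback */
};

/* Hypothetical stand-in for a struct device's devres list. */
struct model_dev {
	struct model_action *head;
};

/* Register an unwind action; returns 0 on success, -1 if allocation fails. */
static int model_add_action(struct model_dev *dev,
			    void (*action)(void *), void *data)
{
	struct model_action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->action = action;
	a->data = data;
	a->next = dev->head;
	dev->head = a;
	return 0;
}

/* Model of driver detach: run actions newest first, freeing each entry. */
static void model_release_all(struct model_dev *dev)
{
	while (dev->head) {
		struct model_action *a = dev->head;

		dev->head = a->next;
		a->action(a->data);
		free(a);
	}
}

/* Demo callback: logs the int it is handed, to show the release order. */
static int release_log[8];
static int release_count;

static void record_release(void *data)
{
	if (release_count < 8)
		release_log[release_count++] = *(int *)data;
}
```

Registering actions for 1, 2 and 3 and then calling model_release_all() runs them as 3, 2, 1, so each resource is undone before anything it was built on top of.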
> >>
> >> Thanks for the insight, Dmitry. Thanks to Robin too.
> >>
> >> This is what I understand so far:
> >>
> >> devm_add_action() is nice because it hides/factorizes the complexity
> >> of the devres API, but it incurs a small storage overhead of one
> >> pointer per call, which makes it unfit for frequently used actions,
> >> such as clk_get.
> >>
> >> Is that correct?
> >>
> >> My question is: why not design the API without the small overhead?
> >
> > Probably because on most architectures, ARCH_KMALLOC_MINALIGN is at
> > least as big as two pointers anyway, so this "overhead" should mostly be
> > free in practice. Plus the devres API is almost entirely about being
> > able to write simple robust code, rather than absolute efficiency - I
> > mean, struct devres itself is already 5 pointers large at the absolute
> > minimum ;)
>
> (3 pointers: 1 list_head, which is 2 pointers, plus 1 function pointer)
>
> I'm confused. The first patch was criticized for potentially adding
> an extra pointer for every devm_clk_get (e.g. 800 bytes on a 64-bit
> platform with 100 clocks).
>
> Let's see. On arm64, ARCH_KMALLOC_MINALIGN is 128.
>
> So basically, a struct devres looks like this on arm64:
>
> offset   0: list_head.next
> offset   8: list_head.prev
> offset  16: dr_release_t
> offset  24: 104 bytes of padding
> offset 128: data (flexible array)
>             padding up to 256 bytes
>
>
> Basically, on arm64, every struct devres occupies 256 bytes, most of it
> (typically 104 + 112 = 216 bytes) wasted as padding.
>
> Hmmm, given how much devm activity goes on in a modern platform, there
> might be large savings to be had...
>
> Assuming 10,000 calls to devres_alloc_node(), we would be wasting ~2 MB
> of RAM. Not sure it's worth trying to save that?
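
Marc's arithmetic can be reproduced with a small model. This is a sketch under stated assumptions: the struct devres header is three pointers (no CONFIG_DEBUG_DEVRES), data[] starts at the header size rounded up to ARCH_KMALLOC_MINALIGN, and kmalloc hands out the smallest power-of-two object that is at least the minimum alignment. We believe that last assumption matches arm64 with 128-byte alignment, where the 96- and 192-byte caches are not used, but the model ignores those caches elsewhere. The helper names are hypothetical. It also shows Robin's point: an 8-byte payload (the single struct clk pointer devm_clk_get() stores) and a 16-byte payload (one extra pointer) land in the same 256-byte object on arm64.

```c
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
static size_t round_up_pow2(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Assumed kmalloc sizing: smallest power of two >= size, never below
 * min_size. Ignores the 96/192-byte caches, which (we believe) are not
 * used when the minimum alignment is 128, as on arm64.
 */
static size_t model_kmalloc_size(size_t size, size_t min_size)
{
	size_t bucket = min_size;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

struct devres_cost {
	size_t data_offset;	/* where data[] begins */
	size_t allocated;	/* bytes the allocator actually hands out */
	size_t padding;		/* allocated minus header minus payload */
};

/* Hypothetical model of one devres allocation carrying `payload` bytes. */
static struct devres_cost model_devres_cost(size_t ptr_size,
					    size_t minalign, size_t payload)
{
	struct devres_cost c;
	size_t header = 3 * ptr_size;	/* list_head (2) + release pointer */

	c.data_offset = round_up_pow2(header, minalign);
	c.allocated = model_kmalloc_size(c.data_offset + payload, minalign);
	c.padding = c.allocated - header - payload;
	return c;
}
```

model_devres_cost(8, 128, 16) gives a data offset of 128, a 256-byte object and 216 bytes of padding, i.e. about 2.16 MB across 10,000 allocations. With an 8-byte minimum alignment the same model gives 32 versus 64 bytes for the 8- and 16-byte payloads, so there the extra pointer is not absorbed.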
>
> $ git grep '#define ARCH_DMA_MINALIGN'
> arch/arc/include/asm/cache.h:#define ARCH_DMA_MINALIGN SMP_CACHE_BYTES
> arch/arm/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/arm64/include/asm/cache.h:#define ARCH_DMA_MINALIGN (128)
> arch/c6x/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/csky/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/hexagon/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/m68k/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/microblaze/include/asm/page.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/mips/include/asm/mach-generic/kmalloc.h:#define ARCH_DMA_MINALIGN 128
> arch/mips/include/asm/mach-ip32/kmalloc.h:#define ARCH_DMA_MINALIGN 32
> arch/mips/include/asm/mach-ip32/kmalloc.h:#define ARCH_DMA_MINALIGN 128
> arch/mips/include/asm/mach-tx49xx/kmalloc.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/nds32/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/nios2/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/parisc/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/powerpc/include/asm/page_32.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/sh/include/asm/page.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/unicore32/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> arch/xtensa/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
>
> Hmmm, how does arch/x86 do it?

As I understand it, x86 tends to be fully coherent, so there is not
much requirement for DMA buffers to be aligned to cache lines.
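
To make the x86 answer concrete, here is a sketch of the fallback as we read include/linux/slab.h from around this time: kmalloc's minimum alignment follows ARCH_DMA_MINALIGN only when an architecture defines it above 8, and otherwise it is just the alignment of a 64-bit integer. x86 does not appear in the grep output above, so it takes the fallback. The function model_kmalloc_minalign() is a hypothetical runtime restatement of that preprocessor logic, with 0 standing for "not defined".

```c
#include <stddef.h>

/*
 * Runtime restatement of (our reading of) the slab.h logic:
 *
 *   #if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
 *   #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
 *   #else
 *   #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
 *   #endif
 *
 * arch_dma_minalign == 0 models an architecture that leaves
 * ARCH_DMA_MINALIGN undefined, as x86 does.
 */
static size_t model_kmalloc_minalign(size_t arch_dma_minalign)
{
	if (arch_dma_minalign > 8)
		return arch_dma_minalign;
	return __alignof__(unsigned long long);
}
```

On x86-64 the fallback is 8 bytes, so a three-pointer devres header needs no padding before data[].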
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up