* [PATCH 0/2] Improve VCHIQ cache line size handling @ 2018-09-12 16:42 Phil Elwell 2018-09-12 16:42 ` [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size Phil Elwell ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Phil Elwell @ 2018-09-12 16:42 UTC (permalink / raw) To: linux-arm-kernel Both sides of the VCHIQ communications mechanism need to agree on the cache line size. Using an incorrect value can lead to data corruption, but having the two sides using different values is usually worse. In the absence of an obvious convenient run-time method to determine the correct value in the ARCH=arm world, the downstream Raspberry Pi trees used a Device Tree property, written by the firmware, to configure the kernel driver. This method was vetoed during the upstreaming process, so a fixed value of 32 was used instead, and some corruptions ensued. This is take 2 at arriving at the correct value. Part one of the fix is deriving the correct value from the ARM's cpuid register. Part two is a (seemingly cosmetic) correction of the Device Tree reg declaration used by the driver, but it doubles as an indication to the Raspberry Pi firmware that the kernel driver is running a recent kernel driver that chooses the correct value. As such I would like very much for the DT patch not to be merged before the driver patch - just tell me what hoops I need to jump through. Phil Elwell (2): staging/vc04_services: Derive g_cache_line_size ARM: dts: bcm283x: Correct mailbox register sizes arch/arm/boot/dts/bcm2835-rpi.dtsi | 2 +- .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- 2 files changed, 20 insertions(+), 6 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size 2018-09-12 16:42 [PATCH 0/2] Improve VCHIQ cache line size handling Phil Elwell @ 2018-09-12 16:42 ` Phil Elwell 2018-09-14 10:11 ` Stefan Wahren 2018-09-12 16:42 ` [PATCH 2/2] ARM: dts: bcm283x: Correct mailbox register sizes Phil Elwell 2018-09-16 15:25 ` [PATCH 0/2] Improve VCHIQ cache line size handling Stefan Wahren 2 siblings, 1 reply; 10+ messages in thread From: Phil Elwell @ 2018-09-12 16:42 UTC (permalink / raw) To: linux-arm-kernel The ARM coprocessor registers include dcache line size, but there is no function to expose this value. Rather than create a new one, use the read_cpuid_id function to derive the correct value, which is 32 for BCM2835 and 64 for BCM2836/7. Signed-off-by: Phil Elwell <phil@raspberrypi.org> --- .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c index e767209..3537f60 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c @@ -42,6 +42,7 @@ #include <linux/uaccess.h> #include <linux/mm.h> #include <linux/of.h> +#include <asm/cputype.h> #include <soc/bcm2835/raspberrypi-firmware.h> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) @@ -81,13 +82,15 @@ static void __iomem *g_regs; * VPU firmware, which determines the required alignment of the * offsets/sizes in pagelists. * - * Modern VPU firmware looks for a DT "cache-line-size" property in - * the VCHIQ node and will overwrite it with the actual L2 cache size, + * Previous VPU firmware looked for a DT "cache-line-size" property in + * the VCHIQ node and would overwrite it with the actual L2 cache size, * which the kernel must then respect. That property was rejected - * upstream, so we have to use the VPU firmware's compatibility value - * of 32. + * upstream, so we now rely on both sides to "do the right thing" independently + * of the other. To improve backwards compatibility, this new behaviour is + * signalled to the firmware by the use of a corrected "reg" property on the + * relevant Device Tree node. */ -static unsigned int g_cache_line_size = 32; +static unsigned int g_cache_line_size; static unsigned int g_fragments_size; static char *g_fragments_base; static char *g_free_fragments; @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) if (err < 0) return err; + /* + * The tempting L1_CACHE_BYTES macro doesn't work in the case of + * a kernel built with bcm2835_defconfig running on a BCM2836/7 + * processor, hence the need for a runtime check. The dcache line size + * is encoded in one of the coprocessor registers, but there is no + * convenient way to access it short of embedded assembler, hence + * the use of read_cpuid_id(). The following test evaluates to true + * on a BCM2835 showing that it is ARMv6-ish, whereas + * cpu_architecture() will indicate that it is an ARMv7. + */ + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; g_fragments_size = 2 * g_cache_line_size; /* Allocate space for the channels in coherent memory */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size 2018-09-12 16:42 ` [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size Phil Elwell @ 2018-09-14 10:11 ` Stefan Wahren 2018-09-14 10:26 ` Phil Elwell 2018-09-14 10:41 ` Russell King - ARM Linux 0 siblings, 2 replies; 10+ messages in thread From: Stefan Wahren @ 2018-09-14 10:11 UTC (permalink / raw) To: linux-arm-kernel Hi Phil, Am 12.09.2018 um 18:42 schrieb Phil Elwell: > The ARM coprocessor registers include dcache line size, but there is no > function to expose this value. Rather than create a new one, use the > read_cpuid_id function to derive the correct value, which is 32 for > BCM2835 and 64 for BCM2836/7. > > Signed-off-by: Phil Elwell <phil@raspberrypi.org> > --- > .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c > index e767209..3537f60 100644 > --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c > +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c > @@ -42,6 +42,7 @@ > #include <linux/uaccess.h> > #include <linux/mm.h> > #include <linux/of.h> > +#include <asm/cputype.h> > #include <soc/bcm2835/raspberrypi-firmware.h> > > #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) > @@ -81,13 +82,15 @@ static void __iomem *g_regs; > * VPU firmware, which determines the required alignment of the > * offsets/sizes in pagelists. > * > - * Modern VPU firmware looks for a DT "cache-line-size" property in > - * the VCHIQ node and will overwrite it with the actual L2 cache size, > + * Previous VPU firmware looked for a DT "cache-line-size" property in > + * the VCHIQ node and would overwrite it with the actual L2 cache size, > * which the kernel must then respect. That property was rejected > - * upstream, so we have to use the VPU firmware's compatibility value > - * of 32. > + * upstream, so we now rely on both sides to "do the right thing" independently > + * of the other. To improve backwards compatibility, this new behaviour is > + * signalled to the firmware by the use of a corrected "reg" property on the > + * relevant Device Tree node. > */ > -static unsigned int g_cache_line_size = 32; > +static unsigned int g_cache_line_size; > static unsigned int g_fragments_size; > static char *g_fragments_base; > static char *g_free_fragments; > @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) > if (err < 0) > return err; > > + /* > + * The tempting L1_CACHE_BYTES macro doesn't work in the case of > + * a kernel built with bcm2835_defconfig running on a BCM2836/7 > + * processor, hence the need for a runtime check. The dcache line size > + * is encoded in one of the coprocessor registers, but there is no > + * convenient way to access it short of embedded assembler, hence > + * the use of read_cpuid_id(). The following test evaluates to true > + * on a BCM2835 showing that it is ARMv6-ish, whereas > + * cpu_architecture() will indicate that it is an ARMv7. > + */ > + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; is there a chance to use ARM_CPU_PART_ARM1176 or something else instead of this magic numbers? Sorry, i cannot remember the whole discussion but why we can use different compatible here instead of using ARM specific stuff. compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 compatible = "brcm,bcm2835"; // only for BCM2835 > g_fragments_size = 2 * g_cache_line_size; > > /* Allocate space for the channels in coherent memory */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size 2018-09-14 10:11 ` Stefan Wahren @ 2018-09-14 10:26 ` Phil Elwell 2018-09-14 11:03 ` Stefan Wahren 2018-09-14 10:41 ` Russell King - ARM Linux 1 sibling, 1 reply; 10+ messages in thread From: Phil Elwell @ 2018-09-14 10:26 UTC (permalink / raw) To: linux-arm-kernel Hi Stefan, On 14/09/2018 11:11, Stefan Wahren wrote: > Hi Phil, > > Am 12.09.2018 um 18:42 schrieb Phil Elwell: >> The ARM coprocessor registers include dcache line size, but there is no >> function to expose this value. Rather than create a new one, use the >> read_cpuid_id function to derive the correct value, which is 32 for >> BCM2835 and 64 for BCM2836/7. >> >> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >> --- >> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >> 1 file changed, 19 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >> index e767209..3537f60 100644 >> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >> @@ -42,6 +42,7 @@ >> #include <linux/uaccess.h> >> #include <linux/mm.h> >> #include <linux/of.h> >> +#include <asm/cputype.h> >> #include <soc/bcm2835/raspberrypi-firmware.h> >> >> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >> * VPU firmware, which determines the required alignment of the >> * offsets/sizes in pagelists. >> * >> - * Modern VPU firmware looks for a DT "cache-line-size" property in >> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >> + * Previous VPU firmware looked for a DT "cache-line-size" property in >> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >> * which the kernel must then respect. That property was rejected >> - * upstream, so we have to use the VPU firmware's compatibility value >> - * of 32. >> + * upstream, so we now rely on both sides to "do the right thing" independently >> + * of the other. To improve backwards compatibility, this new behaviour is >> + * signalled to the firmware by the use of a corrected "reg" property on the >> + * relevant Device Tree node. >> */ >> -static unsigned int g_cache_line_size = 32; >> +static unsigned int g_cache_line_size; >> static unsigned int g_fragments_size; >> static char *g_fragments_base; >> static char *g_free_fragments; >> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >> if (err < 0) >> return err; >> >> + /* >> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >> + * processor, hence the need for a runtime check. The dcache line size >> + * is encoded in one of the coprocessor registers, but there is no >> + * convenient way to access it short of embedded assembler, hence >> + * the use of read_cpuid_id(). The following test evaluates to true >> + * on a BCM2835 showing that it is ARMv6-ish, whereas >> + * cpu_architecture() will indicate that it is an ARMv7. >> + */ >> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; > > is there a chance to use ARM_CPU_PART_ARM1176 or something else instead > of this magic numbers? One could. It's a trade-off between specific and human-readable vs. generic and less-so, and possibly moot given the next question. > Sorry, i cannot remember the whole discussion but why we can use > different compatible here instead of using ARM specific stuff. > > compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 > > compatible = "brcm,bcm2835"; // only for BCM2835 It wasn't an option you raised in the offline discussion. Assuming you meant "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type of device if that is more acceptable. >> g_fragments_size = 2 * g_cache_line_size; >> >> /* Allocate space for the channels in coherent memory */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size 2018-09-14 10:26 ` Phil Elwell @ 2018-09-14 11:03 ` Stefan Wahren 2018-09-14 11:09 ` Phil Elwell 0 siblings, 1 reply; 10+ messages in thread From: Stefan Wahren @ 2018-09-14 11:03 UTC (permalink / raw) To: linux-arm-kernel Hi, Am 14.09.2018 um 12:26 schrieb Phil Elwell: > Hi Stefan, > > On 14/09/2018 11:11, Stefan Wahren wrote: >> Hi Phil, >> >> Am 12.09.2018 um 18:42 schrieb Phil Elwell: >>> The ARM coprocessor registers include dcache line size, but there is no >>> function to expose this value. Rather than create a new one, use the >>> read_cpuid_id function to derive the correct value, which is 32 for >>> BCM2835 and 64 for BCM2836/7. >>> >>> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >>> --- >>> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >>> 1 file changed, 19 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>> index e767209..3537f60 100644 >>> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>> @@ -42,6 +42,7 @@ >>> #include <linux/uaccess.h> >>> #include <linux/mm.h> >>> #include <linux/of.h> >>> +#include <asm/cputype.h> >>> #include <soc/bcm2835/raspberrypi-firmware.h> >>> >>> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >>> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >>> * VPU firmware, which determines the required alignment of the >>> * offsets/sizes in pagelists. >>> * >>> - * Modern VPU firmware looks for a DT "cache-line-size" property in >>> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >>> + * Previous VPU firmware looked for a DT "cache-line-size" property in >>> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >>> * which the kernel must then respect. That property was rejected >>> - * upstream, so we have to use the VPU firmware's compatibility value >>> - * of 32. >>> + * upstream, so we now rely on both sides to "do the right thing" independently >>> + * of the other. To improve backwards compatibility, this new behaviour is >>> + * signalled to the firmware by the use of a corrected "reg" property on the >>> + * relevant Device Tree node. >>> */ >>> -static unsigned int g_cache_line_size = 32; >>> +static unsigned int g_cache_line_size; >>> static unsigned int g_fragments_size; >>> static char *g_fragments_base; >>> static char *g_free_fragments; >>> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >>> if (err < 0) >>> return err; >>> >>> + /* >>> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >>> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >>> + * processor, hence the need for a runtime check. The dcache line size >>> + * is encoded in one of the coprocessor registers, but there is no >>> + * convenient way to access it short of embedded assembler, hence >>> + * the use of read_cpuid_id(). The following test evaluates to true >>> + * on a BCM2835 showing that it is ARMv6-ish, whereas >>> + * cpu_architecture() will indicate that it is an ARMv7. >>> + */ >>> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; >> is there a chance to use ARM_CPU_PART_ARM1176 or something else instead >> of this magic numbers? > One could. It's a trade-off between specific and human-readable vs. generic and less-so, > and possibly moot given the next question. > >> Sorry, i cannot remember the whole discussion but why we can use >> different compatible here instead of using ARM specific stuff. >> >> compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 >> >> compatible = "brcm,bcm2835"; // only for BCM2835 > It wasn't an option you raised in the offline discussion. maybe this was too obvious :-( > Assuming you meant > "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type > of device if that is more acceptable. Yes, it's a different SoC. Do you see any problems with the VC4 firmware according to this approach? > >>> g_fragments_size = 2 * g_cache_line_size; >>> >>> /* Allocate space for the channels in coherent memory */ > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size 2018-09-14 11:03 ` Stefan Wahren @ 2018-09-14 11:09 ` Phil Elwell 2018-09-14 11:25 ` Stefan Wahren 0 siblings, 1 reply; 10+ messages in thread From: Phil Elwell @ 2018-09-14 11:09 UTC (permalink / raw) To: linux-arm-kernel On 14/09/2018 12:03, Stefan Wahren wrote: > Hi, > > Am 14.09.2018 um 12:26 schrieb Phil Elwell: >> Hi Stefan, >> >> On 14/09/2018 11:11, Stefan Wahren wrote: >>> Hi Phil, >>> >>> Am 12.09.2018 um 18:42 schrieb Phil Elwell: >>>> The ARM coprocessor registers include dcache line size, but there is no >>>> function to expose this value. Rather than create a new one, use the >>>> read_cpuid_id function to derive the correct value, which is 32 for >>>> BCM2835 and 64 for BCM2836/7. >>>> >>>> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >>>> --- >>>> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >>>> 1 file changed, 19 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>> index e767209..3537f60 100644 >>>> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>> @@ -42,6 +42,7 @@ >>>> #include <linux/uaccess.h> >>>> #include <linux/mm.h> >>>> #include <linux/of.h> >>>> +#include <asm/cputype.h> >>>> #include <soc/bcm2835/raspberrypi-firmware.h> >>>> >>>> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >>>> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >>>> * VPU firmware, which determines the required alignment of the >>>> * offsets/sizes in pagelists. >>>> * >>>> - * Modern VPU firmware looks for a DT "cache-line-size" property in >>>> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >>>> + * Previous VPU firmware looked for a DT "cache-line-size" property in >>>> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >>>> * which the kernel must then respect. That property was rejected >>>> - * upstream, so we have to use the VPU firmware's compatibility value >>>> - * of 32. >>>> + * upstream, so we now rely on both sides to "do the right thing" independently >>>> + * of the other. To improve backwards compatibility, this new behaviour is >>>> + * signalled to the firmware by the use of a corrected "reg" property on the >>>> + * relevant Device Tree node. >>>> */ >>>> -static unsigned int g_cache_line_size = 32; >>>> +static unsigned int g_cache_line_size; >>>> static unsigned int g_fragments_size; >>>> static char *g_fragments_base; >>>> static char *g_free_fragments; >>>> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >>>> if (err < 0) >>>> return err; >>>> >>>> + /* >>>> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >>>> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >>>> + * processor, hence the need for a runtime check. The dcache line size >>>> + * is encoded in one of the coprocessor registers, but there is no >>>> + * convenient way to access it short of embedded assembler, hence >>>> + * the use of read_cpuid_id(). The following test evaluates to true >>>> + * on a BCM2835 showing that it is ARMv6-ish, whereas >>>> + * cpu_architecture() will indicate that it is an ARMv7. >>>> + */ >>>> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; >>> is there a chance to use ARM_CPU_PART_ARM1176 or something else instead >>> of this magic numbers? >> One could. It's a trade-off between specific and human-readable vs. generic and less-so, >> and possibly moot given the next question. >> >>> Sorry, i cannot remember the whole discussion but why we can use >>> different compatible here instead of using ARM specific stuff. >>> >>> compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 >>> >>> compatible = "brcm,bcm2835"; // only for BCM2835 >> It wasn't an option you raised in the offline discussion. > > maybe this was too obvious :-( > >> Assuming you meant >> "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type >> of device if that is more acceptable. > > Yes, it's a different SoC. Do you see any problems with the VC4 firmware > according to this approach? No, it can be made to work. Expect V2 shortly. > >> >>>> g_fragments_size = 2 * g_cache_line_size; >>>> >>>> /* Allocate space for the channels in coherent memory */ >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel at lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size 2018-09-14 11:09 ` Phil Elwell @ 2018-09-14 11:25 ` Stefan Wahren 0 siblings, 0 replies; 10+ messages in thread From: Stefan Wahren @ 2018-09-14 11:25 UTC (permalink / raw) To: linux-arm-kernel Am 14.09.2018 um 13:09 schrieb Phil Elwell: > On 14/09/2018 12:03, Stefan Wahren wrote: >> Hi, >> >> Am 14.09.2018 um 12:26 schrieb Phil Elwell: >>> Hi Stefan, >>> >>> On 14/09/2018 11:11, Stefan Wahren wrote: >>>> Hi Phil, >>>> >>>> Am 12.09.2018 um 18:42 schrieb Phil Elwell: >>>>> The ARM coprocessor registers include dcache line size, but there is no >>>>> function to expose this value. Rather than create a new one, use the >>>>> read_cpuid_id function to derive the correct value, which is 32 for >>>>> BCM2835 and 64 for BCM2836/7. >>>>> >>>>> Signed-off-by: Phil Elwell <phil@raspberrypi.org> >>>>> --- >>>>> .../interface/vchiq_arm/vchiq_2835_arm.c | 24 +++++++++++++++++----- >>>>> 1 file changed, 19 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>>> index e767209..3537f60 100644 >>>>> --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>>> +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c >>>>> @@ -42,6 +42,7 @@ >>>>> #include <linux/uaccess.h> >>>>> #include <linux/mm.h> >>>>> #include <linux/of.h> >>>>> +#include <asm/cputype.h> >>>>> #include <soc/bcm2835/raspberrypi-firmware.h> >>>>> >>>>> #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32) >>>>> @@ -81,13 +82,15 @@ static void __iomem *g_regs; >>>>> * VPU firmware, which determines the required alignment of the >>>>> * offsets/sizes in pagelists. >>>>> * >>>>> - * Modern VPU firmware looks for a DT "cache-line-size" property in >>>>> - * the VCHIQ node and will overwrite it with the actual L2 cache size, >>>>> + * Previous VPU firmware looked for a DT "cache-line-size" property in >>>>> + * the VCHIQ node and would overwrite it with the actual L2 cache size, >>>>> * which the kernel must then respect. That property was rejected >>>>> - * upstream, so we have to use the VPU firmware's compatibility value >>>>> - * of 32. >>>>> + * upstream, so we now rely on both sides to "do the right thing" independently >>>>> + * of the other. To improve backwards compatibility, this new behaviour is >>>>> + * signalled to the firmware by the use of a corrected "reg" property on the >>>>> + * relevant Device Tree node. >>>>> */ >>>>> -static unsigned int g_cache_line_size = 32; >>>>> +static unsigned int g_cache_line_size; >>>>> static unsigned int g_fragments_size; >>>>> static char *g_fragments_base; >>>>> static char *g_free_fragments; >>>>> @@ -127,6 +130,17 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state) >>>>> if (err < 0) >>>>> return err; >>>>> >>>>> + /* >>>>> + * The tempting L1_CACHE_BYTES macro doesn't work in the case of >>>>> + * a kernel built with bcm2835_defconfig running on a BCM2836/7 >>>>> + * processor, hence the need for a runtime check. The dcache line size >>>>> + * is encoded in one of the coprocessor registers, but there is no >>>>> + * convenient way to access it short of embedded assembler, hence >>>>> + * the use of read_cpuid_id(). The following test evaluates to true >>>>> + * on a BCM2835 showing that it is ARMv6-ish, whereas >>>>> + * cpu_architecture() will indicate that it is an ARMv7. >>>>> + */ >>>>> + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; >>>> is there a chance to use ARM_CPU_PART_ARM1176 or something else instead >>>> of this magic numbers? >>> One could. It's a trade-off between specific and human-readable vs. generic and less-so, >>> and possibly moot given the next question. >>> >>>> Sorry, i cannot remember the whole discussion but why we can use >>>> different compatible here instead of using ARM specific stuff. >>>> >>>> compatible = "brcm,bcm2836-vchiq", "brcm,bcm2835-vchiq"; // for BCM2836/7 >>>> >>>> compatible = "brcm,bcm2835"; // only for BCM2835 >>> It wasn't an option you raised in the offline discussion. >> maybe this was too obvious :-( >> >>> Assuming you meant >>> "brcm,bcm2835-vchiq", I suppose you could think of it as being a different type >>> of device if that is more acceptable. >> Yes, it's a different SoC. Do you see any problems with the VC4 firmware >> according to this approach? > No, it can be made to work. Expect V2 shortly. Great. I forgot a mailing list in my first comment. Please also add devel at driverdev.osuosl.org in CC. > >>>>> g_fragments_size = 2 * g_cache_line_size; >>>>> >>>>> /* Allocate space for the channels in coherent memory */ >>> _______________________________________________ >>> linux-arm-kernel mailing list >>> linux-arm-kernel at lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size 2018-09-14 10:11 ` Stefan Wahren 2018-09-14 10:26 ` Phil Elwell @ 2018-09-14 10:41 ` Russell King - ARM Linux 1 sibling, 0 replies; 10+ messages in thread From: Russell King - ARM Linux @ 2018-09-14 10:41 UTC (permalink / raw) To: linux-arm-kernel On Fri, Sep 14, 2018 at 12:11:54PM +0200, Stefan Wahren wrote: > Hi Phil, > > Am 12.09.2018 um 18:42 schrieb Phil Elwell: > > + /* > > + * The tempting L1_CACHE_BYTES macro doesn't work in the case of > > + * a kernel built with bcm2835_defconfig running on a BCM2836/7 > > + * processor, hence the need for a runtime check. The dcache line size > > + * is encoded in one of the coprocessor registers, but there is no > > + * convenient way to access it short of embedded assembler, hence > > + * the use of read_cpuid_id(). The following test evaluates to true > > + * on a BCM2835 showing that it is ARMv6-ish, whereas > > + * cpu_architecture() will indicate that it is an ARMv7. > > + */ > > + g_cache_line_size = ((read_cpuid_id() & 0x7f000) == 0x7b000) ? 32 : 64; > > is there a chance to use ARM_CPU_PART_ARM1176 or something else instead > of this magic numbers? In any case, reading the CPU ID and masking it in this way is really not on. The above checks that the ID _may_ be a "new ID" format, and that the top four bits of the CPU part number are 11. However, there are three formats of the CPU ID register which can conflict if you do this kind of trivial mask-and-test. The relevant cases for are: bits 19:16 inclusive being all ones define whether the ID is "new ID" format with the additional CPU ID registers. This is used for all modern CPUs. Otherwise, if bit 19 is clear, it is an older ID format, and bits 18:16 may contain '7'. The value of bits 15:4 depend on the implementer - "1011" in 15:12 are used by Intel for StrongARM1110 (though they won't have '7' in 18:16), ARM Ltd for ARM1136, ARM1156, ARM1176 and ARM11MPCore (which will have '7' in 18:16) - and hence will match this test. So, the above code does not match just BCM2835, but also matches other CPUs. There is a reason why the old macros that allow the part number to be fetched in asm/cputype.h are marked as deprecated, and that is to encourage correct behaviour - to use both the part number and implementer fields. The comment is also wrong. It claims that in order to read the cache line size, one needs to use assembler. asm/cputype.h has macros and inline functions (of which read_cpuid_id() is one of them) to allow access without having assembler encoded in drivers. However, there is a gotcha - the format of the cache type register also depends on the CPU architecture, so it's not a simple case of grabbing the value of a bitfield. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up According to speedtest.net: 13Mbps down 490kbps up ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] ARM: dts: bcm283x: Correct mailbox register sizes 2018-09-12 16:42 [PATCH 0/2] Improve VCHIQ cache line size handling Phil Elwell 2018-09-12 16:42 ` [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size Phil Elwell @ 2018-09-12 16:42 ` Phil Elwell 2018-09-16 15:25 ` [PATCH 0/2] Improve VCHIQ cache line size handling Stefan Wahren 2 siblings, 0 replies; 10+ messages in thread From: Phil Elwell @ 2018-09-12 16:42 UTC (permalink / raw) To: linux-arm-kernel The size field in a Device Tree "reg" property is encoded in bytes, not words. Signed-off-by: Phil Elwell <phil@raspberrypi.org> --- arch/arm/boot/dts/bcm2835-rpi.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/bcm2835-rpi.dtsi b/arch/arm/boot/dts/bcm2835-rpi.dtsi index cb2d6d7..c481eab 100644 --- a/arch/arm/boot/dts/bcm2835-rpi.dtsi +++ b/arch/arm/boot/dts/bcm2835-rpi.dtsi @@ -32,7 +32,7 @@ mailbox at 7e00b840 { compatible = "brcm,bcm2835-vchiq"; - reg = <0x7e00b840 0xf>; + reg = <0x7e00b840 0x3c>; interrupts = <0 2>; }; }; -- 2.7.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 0/2] Improve VCHIQ cache line size handling 2018-09-12 16:42 [PATCH 0/2] Improve VCHIQ cache line size handling Phil Elwell 2018-09-12 16:42 ` [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size Phil Elwell 2018-09-12 16:42 ` [PATCH 2/2] ARM: dts: bcm283x: Correct mailbox register sizes Phil Elwell @ 2018-09-16 15:25 ` Stefan Wahren 2 siblings, 0 replies; 10+ messages in thread From: Stefan Wahren @ 2018-09-16 15:25 UTC (permalink / raw) To: linux-arm-kernel Hi, > Phil Elwell <phil@raspberrypi.org> hat am 12. September 2018 um 18:42 geschrieben: > > > Both sides of the VCHIQ communications mechanism need to agree on the cache > line size. Using an incorrect value can lead to data corruption, but having the > two sides using different values is usually worse. > > In the absence of an obvious convenient run-time method to determine the > correct value in the ARCH=arm world, the downstream Raspberry Pi trees used a > Device Tree property, written by the firmware, to configure the kernel driver. > This method was vetoed during the upstreaming process, so a fixed value of 32 > was used instead, and some corruptions ensued. This is take 2 at arriving at > the correct value. > > Part one of the fix is deriving the correct value from the ARM's cpuid register. > Part two is a (seemingly cosmetic) correction of the Device Tree reg declaration > used by the driver, but it doubles as an indication to the Raspberry Pi firmware > that the kernel driver is running a recent kernel driver that chooses the > correct value. As such I would like very much for the DT patch not to be merged > before the driver patch - just tell me what hoops I need to jump through. > even if this version isn't the best, the test results for vchiq_test -f 1 (using recent firmware 2018-09-10) are very good: Raspberry 1 B, bcm2835_defconfig -> success Raspberry 2 B, multi_v7_defconfig -> success Raspberry 3 B, arm64/defconfig -> success Thanks Stefan ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-09-16 15:25 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-09-12 16:42 [PATCH 0/2] Improve VCHIQ cache line size handling Phil Elwell 2018-09-12 16:42 ` [PATCH 1/2] staging/vc04_services: Derive g_cache_line_size Phil Elwell 2018-09-14 10:11 ` Stefan Wahren 2018-09-14 10:26 ` Phil Elwell 2018-09-14 11:03 ` Stefan Wahren 2018-09-14 11:09 ` Phil Elwell 2018-09-14 11:25 ` Stefan Wahren 2018-09-14 10:41 ` Russell King - ARM Linux 2018-09-12 16:42 ` [PATCH 2/2] ARM: dts: bcm283x: Correct mailbox register sizes Phil Elwell 2018-09-16 15:25 ` [PATCH 0/2] Improve VCHIQ cache line size handling Stefan Wahren
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).