From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefan Roese <sr-ynQEQJNshbs@public.gmane.org>
Subject: Re: [PATCH v5] spi: orion.c: Add direct access mode
Date: Wed, 20 Apr 2016 09:38:29 +0200
Message-ID: <571731F5.2010809@denx.de>
References: <1460974417-32375-1-git-send-email-sr@denx.de>
 <4509446.OJU128rpNa@wuerfel> <5715FD81.8090100@denx.de>
 <3956609.3ttDzXh866@wuerfel>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Mark Brown <broonie-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Thomas Petazzoni <thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>,
	Gregory CLEMENT <gregory.clement-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>,
	Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>, linux-spi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
Return-path: <linux-spi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
In-Reply-To: <3956609.3ttDzXh866@wuerfel>
Sender: linux-spi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID: <linux-spi.vger.kernel.org>

On 19.04.2016 15:41, Arnd Bergmann wrote:
> On Tuesday 19 April 2016 11:42:25 Stefan Roese wrote:
>> On 18.04.2016 15:57, Arnd Bergmann wrote:
>>> On Monday 18 April 2016 15:00:52 Stefan Roese wrote:
>>>> On 18.04.2016 13:32, Arnd Bergmann wrote:
>>> Ok, so I looked up the datasheet again and understood now that it
>>> actually does both ways: a) the direct read/write that I had always assumed
>>> it did with automatic cmd/addr/data generation, and b) the mode where
>>> you do each write transfer separately as implemented by your patch.
>>>
>>> Those two modes seem to be wildly different in their needs for address
>>> space:
>>>
>>> - For a) we definitely need the mapping to be the same size as the
>>>     addressable storage of the SPI-NOR (or other device that fits
>>>     into the cmd/addr/data pattern supported by the hardware)
>>>
>>> - For b) the address is ignored and at most 32 bytes are transfered,
>>>     so whatever the smallest MBUS mapping window size is would certainly
>>>     be enough.
>>>
>>> As a side-note, I see that you use a regular devm_ioremap_resource(),
>>> which I assume prevents the transfers from ever being more than
>>> a single 4-byte word, while burst mode with up to 32 bytes likely
>>> requires a write-combining mapping.
>>
>> Thanks for the hint. I've tested again with devm_ioremap_wc() and the
>> resulting SPI TX time is identical to the one using the driver with
>> the uncached (non write-combined) mapping. This is most likely a
>> result of the already fully saturated SPI bus speed being the
>> bottle-neck here.
>
> Ok
>
>> A bit more testing has shown, that using devm_ioremap_wc() leads
>> to sporadic failures in the FPGA image downloading though. Now
>> I'm unsure, if its really worth to add some cache flushing after
>> the memcpy() call here (flush_cache_vmap ?) or better keep the
>> uncached ioremap in place.
>
> The cache is not involved here, as devm_ioremap_wc is still uncached.
> However it's possible that memcpy has other issues here. If you
> don't gain anything from sending more than 32 bits at a time,
> writesl() might be a better alternative, but it requires the data
> to be 32-bit aligned in RAM. If the buffer is not aligned, the
> memcpy() might already cause problems in case the kernel
> implementation does not send the data in the correct order.
>
> memcpy_toio() could also be helpful here.
>
>>>>> This means that the 1MB window is probably a reasonable size (provided
>>>>> that the (most importantly) the SPI-NOR driver can guarantee that it
>>>>> never exceeds this).
>>>>
>>>> I have no problem switching from 1MiB to using 0xffffffff in the
>>>> SPI controller 'reg' node to allow even bigger windows in v6.
>>>
>>> How do you decide how much of the window to map though? Using 0xffffffff
>>> is nice because it better represents what the hardware looks like
>>> and doesn't limit you from having arbitrarily large mbus windows
>>> based on your needs, but you'd probably have to try mapping various
>>> sizes then to find the maximum supported one, or you'd have to do
>>> some ugly parsing of the parent node ranges.
>>
>> This is for the single-window mode, right? My comment above was
>> still referring to the original implementation. And yes, this
>> gets a bit complicated and is most likely not really necessary.
>> So I would prefer to keep the mapping fixed to a max. of 1MiB for
>> now.
>
> No, I also meant the separate windows here. Based on the discussion
> above, I think the driver can ideally just ioremap 4KB for devices
> other than SPI-NOR, and then you can list the 0xffffffff length.
>
>>> In this example, you have to configure the SPI controller to have 1MB per CS
>>> in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode"
>>> registers. The mbus then contains two mappings, one 2MB window spanning
>>> CS3 and CS4 in a continuous CPU range, and one 64K window for a device at
>>> CS6, making it as small as possible to conserve CPU address space.
>>
>> Thanks. I've prepared a new version implementing this single-
>> window mode. Please let me know how you prefer the cache issue
>> above to be handled (uncached mapping, flush or ?) and I'll
>> update the patch accordingly.
>
> If a write-combining mapping doesn't help you here, just use the normal
> ioremap as before.

Okay, switching back to the normal ioremap.

> The single-window mode won't really work with SPI-NOR unless you make each
> window large enough to hold the largest possible flash (128MB with
> single-window, 256MB with double-window), but that size makes it harder
> to gain much from saving mbus windows when you trade them for gigantic
> CPU address space consumption.
>
> Therefore, we probably want the separate-window mode to allow the
> combination of large SPI flashes with other SPI devices in direct-access
> mode. We can also implement the single (and/or double) window mode
> in addition to that, but realistically we probably don't need that
> as all machines we support in the kernel today have at most one
> non-flash device.

Yes, this is also my feeling. While implementing this 0xffffffff
size in the SPI controller 'reg' DT node, I just stumbled over the
following problem: The resulting size of the area to map
(of_address_to_resource) is not the minimum of the 'soc' 'ranges'
entry and this 0xffffffff as I would have expected. But its always
0xffffffff. Do you know if this is general problem with this
approach or perhaps a problem / bug in the DT address probing /
translation?

Thanks,
Stefan

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

From mboxrd@z Thu Jan  1 00:00:00 1970
From: sr@denx.de (Stefan Roese)
Date: Wed, 20 Apr 2016 09:38:29 +0200
Subject: [PATCH v5] spi: orion.c: Add direct access mode
In-Reply-To: <3956609.3ttDzXh866@wuerfel>
References: <1460974417-32375-1-git-send-email-sr@denx.de>
 <4509446.OJU128rpNa@wuerfel> <5715FD81.8090100@denx.de>
 <3956609.3ttDzXh866@wuerfel>
Message-ID: <571731F5.2010809@denx.de>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 19.04.2016 15:41, Arnd Bergmann wrote:
> On Tuesday 19 April 2016 11:42:25 Stefan Roese wrote:
>> On 18.04.2016 15:57, Arnd Bergmann wrote:
>>> On Monday 18 April 2016 15:00:52 Stefan Roese wrote:
>>>> On 18.04.2016 13:32, Arnd Bergmann wrote:
>>> Ok, so I looked up the datasheet again and understood now that it
>>> actually does both ways: a) the direct read/write that I had always assumed
>>> it did with automatic cmd/addr/data generation, and b) the mode where
>>> you do each write transfer separately as implemented by your patch.
>>>
>>> Those two modes seem to be wildly different in their needs for address
>>> space:
>>>
>>> - For a) we definitely need the mapping to be the same size as the
>>>     addressable storage of the SPI-NOR (or other device that fits
>>>     into the cmd/addr/data pattern supported by the hardware)
>>>
>>> - For b) the address is ignored and at most 32 bytes are transfered,
>>>     so whatever the smallest MBUS mapping window size is would certainly
>>>     be enough.
>>>
>>> As a side-note, I see that you use a regular devm_ioremap_resource(),
>>> which I assume prevents the transfers from ever being more than
>>> a single 4-byte word, while burst mode with up to 32 bytes likely
>>> requires a write-combining mapping.
>>
>> Thanks for the hint. I've tested again with devm_ioremap_wc() and the
>> resulting SPI TX time is identical to the one using the driver with
>> the uncached (non write-combined) mapping. This is most likely a
>> result of the already fully saturated SPI bus speed being the
>> bottle-neck here.
>
> Ok
>
>> A bit more testing has shown, that using devm_ioremap_wc() leads
>> to sporadic failures in the FPGA image downloading though. Now
>> I'm unsure, if its really worth to add some cache flushing after
>> the memcpy() call here (flush_cache_vmap ?) or better keep the
>> uncached ioremap in place.
>
> The cache is not involved here, as devm_ioremap_wc is still uncached.
> However it's possible that memcpy has other issues here. If you
> don't gain anything from sending more than 32 bits at a time,
> writesl() might be a better alternative, but it requires the data
> to be 32-bit aligned in RAM. If the buffer is not aligned, the
> memcpy() might already cause problems in case the kernel
> implementation does not send the data in the correct order.
>
> memcpy_toio() could also be helpful here.
>
>>>>> This means that the 1MB window is probably a reasonable size (provided
>>>>> that the (most importantly) the SPI-NOR driver can guarantee that it
>>>>> never exceeds this).
>>>>
>>>> I have no problem switching from 1MiB to using 0xffffffff in the
>>>> SPI controller 'reg' node to allow even bigger windows in v6.
>>>
>>> How do you decide how much of the window to map though? Using 0xffffffff
>>> is nice because it better represents what the hardware looks like
>>> and doesn't limit you from having arbitrarily large mbus windows
>>> based on your needs, but you'd probably have to try mapping various
>>> sizes then to find the maximum supported one, or you'd have to do
>>> some ugly parsing of the parent node ranges.
>>
>> This is for the single-window mode, right? My comment above was
>> still referring to the original implementation. And yes, this
>> gets a bit complicated and is most likely not really necessary.
>> So I would prefer to keep the mapping fixed to a max. of 1MiB for
>> now.
>
> No, I also meant the separate windows here. Based on the discussion
> above, I think the driver can ideally just ioremap 4KB for devices
> other than SPI-NOR, and then you can list the 0xffffffff length.
>
>>> In this example, you have to configure the SPI controller to have 1MB per CS
>>> in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode"
>>> registers. The mbus then contains two mappings, one 2MB window spanning
>>> CS3 and CS4 in a continuous CPU range, and one 64K window for a device at
>>> CS6, making it as small as possible to conserve CPU address space.
>>
>> Thanks. I've prepared a new version implementing this single-
>> window mode. Please let me know how you prefer the cache issue
>> above to be handled (uncached mapping, flush or ?) and I'll
>> update the patch accordingly.
>
> If a write-combining mapping doesn't help you here, just use the normal
> ioremap as before.

Okay, switching back to the normal ioremap.

> The single-window mode won't really work with SPI-NOR unless you make each
> window large enough to hold the largest possible flash (128MB with
> single-window, 256MB with double-window), but that size makes it harder
> to gain much from saving mbus windows when you trade them for gigantic
> CPU address space consumption.
>
> Therefore, we probably want the separate-window mode to allow the
> combination of large SPI flashes with other SPI devices in direct-access
> mode. We can also implement the single (and/or double) window mode
> in addition to that, but realistically we probably don't need that
> as all machines we support in the kernel today have at most one
> non-flash device.

Yes, this is also my feeling. While implementing this 0xffffffff
size in the SPI controller 'reg' DT node, I just stumbled over the
following problem: The resulting size of the area to map
(of_address_to_resource) is not the minimum of the 'soc' 'ranges'
entry and this 0xffffffff as I would have expected. But its always
0xffffffff. Do you know if this is general problem with this
approach or perhaps a problem / bug in the DT address probing /
translation?

Thanks,
Stefan