From: Stephen Warren
Date: Wed, 20 Aug 2014 13:12:20 -0600
Subject: [U-Boot] [PATCH 0/9] net: rtl8169: Fix cache maintenance issues
In-Reply-To: <1408348852-30894-1-git-send-email-thierry.reding@gmail.com>
References: <1408348852-30894-1-git-send-email-thierry.reding@gmail.com>
Message-ID: <53F4F314.6070101@wwwdotorg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On 08/18/2014 02:00 AM, Thierry Reding wrote:
> From: Thierry Reding
>
> This series attempts to fix a long-standing problem in the rtl8169 driver
> (though the same problem may exist in other drivers as well). Let me first
> explain what exactly the issue is:
>
> The rtl8169 driver provides a set of RX and TX descriptors for the device to
> use. Once they're set up, the device is told about their location so that it
> can fetch the descriptors using DMA. The device will also write packet state
> back into these descriptors using DMA. For this to work properly, whenever a
> driver needs to access these descriptors it needs to invalidate the D-cache
> line(s) associated with them. Similarly, when changes to the descriptor have
> been made by the driver, the cache lines need to be flushed to make sure the
> changes are visible to the device.
>
> The descriptors are 16 bytes in size. This causes problems when used on CPUs
> that have a cache-line size that is larger than 16 bytes. One example is the
> NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
> fit into a single cache-line. So whenever the driver flushes a cache-line it
> has the potential to discard changes made to another descriptor by the DMA
> device. One typical symptom is that large transfers over TFTP will often not
> complete and hang somewhere midway because the device marked a packet as
> received but the driver then flushed the cache line, causing the packet to be
> lost.
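The race described above can be replayed in a small user-space toy model. This is a sketch, not driver code: one 64-byte cache line is modelled as two arrays (memory as the device sees it, and the CPU's cached copy), `flush_line()`/`invalidate_line()` are hypothetical stand-ins for real cache-maintenance calls, and the status values are arbitrary markers rather than real rtl8169 descriptor bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_SIZE      16   /* rtl8169 descriptor size, per the text */
#define LINE_SIZE      64   /* Tegra124 D-cache line size, per the text */
#define DESCS_PER_LINE (LINE_SIZE / DESC_SIZE)

/* Toy descriptor: only a status word matters here; padded to 16 bytes. */
struct desc {
    uint32_t status;
    uint8_t  pad[DESC_SIZE - sizeof(uint32_t)];
};

/* One cache line's worth of descriptors. */
struct line {
    struct desc d[DESCS_PER_LINE];
};

static struct line ram;    /* memory as the DMA device sees it */
static struct line cache;  /* the CPU's cached copy of that same line */

/* Which cache line holds descriptor i (ring assumed line-aligned). */
static unsigned desc_to_line(unsigned i)
{
    return (i * DESC_SIZE) / LINE_SIZE;
}

/* Invalidate followed by a CPU read: the cached copy is refreshed from RAM. */
static void invalidate_line(void) { memcpy(&cache, &ram, sizeof ram); }

/* Flush: write the whole cached line back to RAM, all four descriptors. */
static void flush_line(void) { memcpy(&ram, &cache, sizeof ram); }

/* Replay the race and return descriptor 1's status as the device sees it
 * afterwards (hypothetical marker values, not real rtl8169 bits). */
static uint32_t replay_race(void)
{
    assert(desc_to_line(0) == desc_to_line(1));  /* they share a line */

    memset(&ram, 0, sizeof ram);
    invalidate_line();        /* CPU now holds the line: all zero */

    ram.d[1].status = 0x1;    /* device DMA-writes "packet received" to desc 1 */

    cache.d[0].status = 0x2;  /* driver updates desc 0 in its cached copy... */
    flush_line();             /* ...and flushes the line to hand it to the device */

    return ram.d[1].status;   /* 0: the stale cached desc 1 overwrote the DMA write */
}
```

With 64-byte lines, descriptors 0-3 share line 0 and 4-7 share line 1. Flushing discards the device's update to a neighbour, and invalidating would instead discard the driver's own pending update, so no ordering of the two calls is safe while the device may be writing an adjacent descriptor. That is why the series moves the descriptors to uncached memory.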
>
> Since the descriptors need to be consecutive in memory, I don't see a way to
> fix this other than to use uncached memory. Therefore the solution proposed
> in this patch series is to introduce a mechanism in U-Boot to allow a driver
> to allocate from a pool of uncached memory. Currently an implementation is
> provided only for ARM v7. The idea is that a region (of user-definable size)
> immediately below (taking into account architecture-specific alignment
> restrictions) the malloc() area is mapped uncacheable in the MMU. A driver
> can use the new noncached_alloc() function to allocate a chunk of memory
> from this pool dynamically for buffers that it can't or doesn't want to do
> any explicit cache maintenance on, yet needs to be shared with DMA devices.
>
> Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
> issues in the ARM v7 cache code and patch 2 uses more future-proof types for
> the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
> for debugging purposes. It will print out the region used by malloc() when
> DEBUG is enabled. This can be useful to see where the malloc() region is in
> the memory map (compared to the noncached region introduced in a later patch
> for example).
>
> Patch 4 implements the noncached API for ARM v7. It obtains the start of the
> malloc() area and places the noncached region immediately below it so that
> noncached_alloc() can allocate from it. During boot, the noncached area will
> be set up immediately after malloc() has been initialized.
>
> Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
> which should be plenty (it's also the minimum on ARM v7 because it matches
> the MMU section size and therefore the granularity at which U-Boot can set
> the cacheable attributes).

If LPAE were enabled, the minimum would be 2 MiB (the LPAE block size), but I suppose we can deal with that if/when the time comes.
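A rough sketch of the placement and allocation scheme described in the cover letter, including why LPAE raises the minimum. All names here (`struct nc_pool`, `nc_pool_init()`, `nc_pool_alloc()`) are hypothetical and are not U-Boot's API; the real noncached_alloc() signature may differ, and the step that actually marks the region uncached in the MMU is left out. The `granule` parameter assumes 1 MiB short-descriptor sections on ARM v7 and 2 MiB blocks with LPAE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M (1UL << 20)
#define SZ_2M (2UL << 20)

/* Hypothetical pool state; not U-Boot's actual data structures. */
struct nc_pool {
    uintptr_t start;  /* lowest address of the uncached region */
    uintptr_t end;    /* one past the highest address */
    uintptr_t next;   /* bump pointer for the next allocation */
};

static uintptr_t align_down(uintptr_t x, uintptr_t a) { return x & ~(a - 1); }
static uintptr_t align_up(uintptr_t x, uintptr_t a) { return (x + a - 1) & ~(a - 1); }

/* Place the pool directly below the malloc() area. 'granule' is the
 * smallest unit whose cacheability the MMU can change (assumed: 1 MiB
 * sections without LPAE, 2 MiB blocks with LPAE). */
static void nc_pool_init(struct nc_pool *p, uintptr_t malloc_start,
                         size_t size, uintptr_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0); /* power of two */
    size = align_up(size, granule);          /* round up to whole granules */
    p->end = align_down(malloc_start, granule);
    p->start = p->end - size;
    p->next = p->start;
    /* A real implementation would now mark [start, end) uncached in the
     * MMU page tables; that step is omitted in this sketch. */
}

/* Bump allocator: returns 0 when the pool is exhausted. This sketch never
 * frees memory. */
static uintptr_t nc_pool_alloc(struct nc_pool *p, size_t size, uintptr_t align)
{
    uintptr_t addr = align_up(p->next, align);

    if (addr + size < addr || addr + size > p->end)
        return 0;
    p->next = addr + size;
    return addr;
}
```

With a hypothetical malloc() start of 0x80B12345 and a 1 MiB request, the pool spans 0x80A00000-0x80B00000 at 1 MiB granularity; with a 2 MiB granule the same request is rounded up to 2 MiB and the pool moves down to 0x80800000-0x80A00000, which is the extra cost LPAE would bring.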