From mboxrd@z Thu Jan 1 00:00:00 1970 Received: with ECARTIS (v1.0.0; list linux-mips); Tue, 20 Sep 2005 13:23:07 +0100 (BST) Received: from extgw-uk.mips.com ([IPv6:::ffff:62.254.210.129]:52509 "EHLO bacchus.net.dhis.org") by linux-mips.org with ESMTP id ; Tue, 20 Sep 2005 13:22:50 +0100 Received: from dea.linux-mips.net (localhost.localdomain [127.0.0.1]) by bacchus.net.dhis.org (8.13.4/8.13.1) with ESMTP id j8KCMi3p008283; Tue, 20 Sep 2005 13:22:44 +0100 Received: (from ralf@localhost) by dea.linux-mips.net (8.13.4/8.13.4/Submit) id j8KCMhEf008282; Tue, 20 Sep 2005 13:22:43 +0100 Date: Tue, 20 Sep 2005 13:22:43 +0100 From: Ralf Baechle To: Dominic Sweetman Cc: "Maciej W. Rozycki" , Thiemo Seufer , linux-mips@linux-mips.org Subject: Re: Performance bug in c-r4k.c cache handling code Message-ID: <20050920122243.GE3159@linux-mips.org> References: <20050919154056.GG3386@hattusa.textio> <17199.53696.27856.801284@mips.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <17199.53696.27856.801284@mips.com> User-Agent: Mutt/1.4.2.1i Return-Path: X-Envelope-To: <"|/home/ecartis/ecartis -s linux-mips"> (uid 0) X-Orcpt: rfc822;linux-mips@linux-mips.org Original-Recipient: rfc822;linux-mips@linux-mips.org X-archive-position: 8993 X-ecartis-version: Ecartis v1.0.0 Sender: linux-mips-bounce@linux-mips.org Errors-to: linux-mips-bounce@linux-mips.org X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: linux-mips On Tue, Sep 20, 2005 at 10:09:20AM +0100, Dominic Sweetman wrote: > > > I found an performance bug in c-r4k.c:r4k_dma_cache_inv, where a > > > Hit_Writeback_Inv instead of Hit_Invalidate is done. > > The MIPS64 spec (which is really all there is to set standards in this > area) regards Hit_Invalidate as optional. So it would be nice not to > use it. CPUs have no standard "configuration" register you can read > to establish which cacheops work, so to identify capable CPUs you must > use a table of CPU attributes indexed by the CPU ID, which encourages > the crime of building software which can't possibly run on a new CPU... > > So long as the buffer is in fact clean, then in most implementations a > Hit_Writeback_Invalidate will be just as efficient. This are R4700 numbers, the only I was able to find in a quick search. Hit_Invalidate_D 7 cycles for a cache miss 9 cycles for a cache hit Hit_Writeback_Invalidate_D 7 cycles for a cache miss 12 cycles for a cache hit if the cache line is clean. 14 cycles for a cache hit if the cache line is dirty (Writeback). Hit_Writeback_D 7 cycles for a cache miss 10 cycles for a cache hit if the cache line is clean 14 cycles for a cache hit if the cache line is dirty (Writeback). > Moreover, CPUs always "post" writes to some extent, so a small > percentage of dirty lines can be handled without any great overhead. > So a significant advantage can only occur when the buffer you want to > invalidate (prior to DMA-in) was fairly recently densely written by > the CPU; and this is only safe when all that data can be guaranteed to > now be of no importance to anyone. Linux has a well-defined ABI that DMA drivers are supposed to use; the functions of this ABI that perform cache flushes also take a DMA direction argument based on which the implementation can deciede on what the best flush function for a particular case will be. > Randomly and retrospectively discarding writes could generate some > very interesting bugs, or (indeed) usually hide some very interesting > bugs. It's the kind of thing one would lik to avoid! > > I suppose where DMA data subsequently gets decorated by the CPU then > handed on to some other layer, then the buffer is freed...? > > > FYI, for R4k DECstations the need to flush the cache for newly allocated > > skbs reduces throughput of FDDI reception by about a half (!), down from > > about 90Mbps (that's for the /260)... Software coherency will result in many server / client type operations approximate worst case as none of the data will reside in caches. Routers are going to be somewhat better off - as long as they don't peek to deep into the packets, that is. > How did you measure the high throughput? Have you got a > machine with DMA-coherency you can turn on and off? Afaik AMD Alchemy processors have configurable coherency. Ralf