From mboxrd@z Thu Jan 1 00:00:00 1970 From: will.deacon@arm.com (Will Deacon) Date: Wed, 21 Aug 2013 13:24:24 +0100 Subject: [PATCH v2 1/3] ARM: Introduce atomic MMIO clear/set In-Reply-To: <1377017307-23201-2-git-send-email-ezequiel.garcia@free-electrons.com> References: <1377017307-23201-1-git-send-email-ezequiel.garcia@free-electrons.com> <1377017307-23201-2-git-send-email-ezequiel.garcia@free-electrons.com> Message-ID: <20130821122424.GB1653@mudshark.cambridge.arm.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi Ezequiel, Cheers for the v2. I've been thinking about how to improve the performance of this operation and I ended up completely changing my mind about how it should be implemented :) Comments and questions below... On Tue, Aug 20, 2013 at 05:48:25PM +0100, Ezequiel Garcia wrote: > diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c > index dcd5b4d..b2a53a3 100644 > --- a/arch/arm/kernel/io.c > +++ b/arch/arm/kernel/io.c > @@ -1,6 +1,19 @@ > #include > #include > #include > +#include > + > +static DEFINE_SPINLOCK(__io_lock); > + > +void atomic_io_clear_set(u32 clear, u32 set, void __iomem *reg) First off, what exactly is this function supposed to guarantee? Do you simply require thread-safe device access, or do you also require completion of the the device writes? Since the latter is device-specific (probably requiring some read-backs in the driver), I assume you actually just need to ensure that multiple drivers don't trip over each other. In which case, this can be *much* lighter weight. > +{ > + spin_lock(&__io_lock); Is this function likely to be used in irq context? If so, better disable IRQs here. > + writel((readl(reg) & ~clear) | set, reg); Going back to my earlier comments, this will assemble to something like: dmb ldr [device] dsb ... dsb outer_sync str [device] dmb If my assumption about completion is correct, you actually just want: dmb ldr [device] ... str [device] dmb which can be done by using the _relaxed variants of readl/writel. > + /* ensure the write get done before unlocking */ > + __iowmb(); And then, despite my previous suggestion, you can drop this line too. > + spin_unlock(&__io_lock); > +} > +EXPORT_SYMBOL(atomic_io_clear_set); Now, the only downside with this approach is that you need explicit barriers in the driver if you want to enforce ordering with respect to normal, cacheable buffers to be consumed/populated by the device. What do you think? An alternative would be just to relax the readl and rely on the dsb preceeding the writel, then add atomic_io_clear_set_relaxed to implement what I described above, but it depends on how you envisage this helper being used. Will