From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave.Martin@arm.com (Dave Martin)
Date: Fri, 19 May 2017 15:02:40 +0100
Subject: [RFC PATCH] arm64: neon: Remove support for nested or hardirq
 kernel-mode NEON
In-Reply-To: <20170519133424.GC22800@leverpostej>
References: <1495193199-1291-1-git-send-email-Dave.Martin@arm.com>
 <20170519124952.GB22800@leverpostej>
 <20170519131321.GD3559@e103592.cambridge.arm.com>
 <20170519133424.GC22800@leverpostej>
Message-ID: <20170519140239.GF3559@e103592.cambridge.arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Fri, May 19, 2017 at 02:34:25PM +0100, Mark Rutland wrote:
> On Fri, May 19, 2017 at 02:13:21PM +0100, Dave Martin wrote:
> > On Fri, May 19, 2017 at 01:49:53PM +0100, Mark Rutland wrote:
> > > Hi,
> > > 
> > > On Fri, May 19, 2017 at 12:26:39PM +0100, Dave Martin wrote:
> > > > +static bool __maybe_unused kernel_neon_allowed(void)
> > > > +{
> > > > +	/*
> > > > +	 * The per_cpu_ptr() is racy if called with preemption enabled.
> > > > +	 * This is not a bug:  per_cpu(kernel_neon_busy) is only set
> > > > +	 * when preemption is disabled, so we cannot migrate to another
> > > > +	 * CPU while it is set, nor can we migrate to a CPU where it is set.
> > > > +	 * So, if we find it clear on some CPU then we're guaranteed to find it
> > > > +	 * clear on any CPU we could migrate to.
> > > > +	 *
> > > > +	 * If we are in between kernel_neon_begin()...kernel_neon_end(), the
> > > > +	 * flag will be set, but preemption is also disabled, so we can't
> > > > +	 * migrate to another CPU and spuriously see it become false.
> > > > +	 */
> > > > +	return !(in_irq() || in_nmi()) &&
> > > > +		!local_read(this_cpu_ptr(&kernel_neon_busy));
> > > > +}
> > > 
> > > I think it would be better to use the this_cpu ops for this, rather than
> > > manually messing with the pointer.
> > 
> > I had thought that the properties of local_t were important here, but
> > this_cpu ops seem equally appropriate.
> > 
> > > Here, we can use raw_cpu_read(kernel_neon_busy), given the comment
> > > above.
> > 
> > What's the difference?  The raw_ variants aren't documented.  Do these
> > not bother about atomicity between the address calculation and the read?
> 
> Yup.
> 
> Comparing raw_cpu_{x}, __this_cpu_{x}, and this_cpu_{x}:
> 
> * raw_cpu_{x} don't have any preemption checks, and don't disable
>   preemption internally. Due to this, the address gen and operation can
>   occur on different CPUs.

Ah, I see.  This is not fantastically intuitive...

> * __this_cpu_{x} have lockdep annotations to check that preemption is
>   disabled, but otherwise behave the same as raw_cpu_{x}. i.e. they
>   don't try to disable preemption internally.
> 
> * this_cpu_{x} ensure that preemption is disabled around the address gen
>   and operation.
> 
> Since preemption may be possible, but we believe this is fine, we have
> to use the raw form. We'll want to keep the commment explaining why.

Absolutely.  Those seem natural fits then.

Cheers
---Dave