From: will.deacon@arm.com (Will Deacon)
Date: Fri, 14 Nov 2014 13:46:45 +0000
Subject: [PATCH V3] arm64: percpu: Implement this_cpu operations
In-Reply-To: <1415965077-10495-1-git-send-email-steve.capper@linaro.org>
References: <20141107135205.GA7591@linaro.org>
 <1415965077-10495-1-git-send-email-steve.capper@linaro.org>
Message-ID: <20141114134644.GD27963@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hi Steve,

On Fri, Nov 14, 2014 at 11:37:57AM +0000, Steve Capper wrote:
> The generic this_cpu operations disable interrupts to ensure that the
> requested operation is protected from pre-emption. For arm64, this is
> overkill and can hurt throughput and latency.
>
> This patch provides arm64 specific implementations for the this_cpu
> operations. Rather than disable interrupts, we use the exclusive
> monitor or atomic operations as appropriate.
>
> The following operations are implemented: add, add_return, and, or,
> read, write, xchg. We also wire up a cmpxchg implementation from
> cmpxchg.h.
>
> Testing was performed using the percpu_test module and hackbench on a
> Juno board running 3.18-rc4.

What does this patch apply against? I'm struggling to apply it to our
for-next branch (perhaps it conflicts with your other cmpxchg patch?).

Anyway, one comment below.

> +static inline unsigned long __percpu_read(void *ptr, int size)
> +{
> +        unsigned long ret;
> +
> +        switch (size) {
> +        case 1:
> +                asm ("//__per_cpu_read_1\n"
> +                "ldrb %w[ret], %[ptr]\n" :
> +                [ret] "=&r"(ret) : [ptr] "Q"(*(u8 *)ptr));
> +                break;
> +        case 2:
> +                asm ("//__per_cpu_read_2\n"
> +                "ldrh %w[ret], %[ptr]\n" :
> +                [ret] "=&r"(ret) : [ptr] "Q"(*(u16 *)ptr));
> +                break;
> +        case 4:
> +                asm ("//__per_cpu_read_4\n"
> +                "ldr %w[ret], %[ptr]\n" :
> +                [ret] "=&r"(ret) : [ptr] "Q"(*(u32 *)ptr));
> +                break;
> +        case 8:
> +                asm ("//__per_cpu_read_8\n"
> +                "ldr %[ret], %[ptr]\n" :
> +                [ret] "=&r"(ret) : [ptr] "Q"(*(u64 *)ptr));
> +                break;
> +        default:
> +                BUILD_BUG();
> +        }
> +
> +        return ret;
> +}
> +
> +static inline void __percpu_write(void *ptr, unsigned long val, int size)
> +{
> +        switch (size) {
> +        case 1:
> +                asm ("//__per_cpu_write_1\n"
> +                "strb %w[val], %[ptr]\n" :
> +                [ptr] "=Q"(*(u8 *)ptr) : [val] "r"(val));
> +                break;
> +        case 2:
> +                asm ("//__per_cpu_write_2\n"
> +                "strh %w[val], %[ptr]\n" :
> +                [ptr] "=Q"(*(u16 *)ptr) : [val] "r"(val));
> +                break;
> +        case 4:
> +                asm ("//__per_cpu_write_4\n"
> +                "str %w[val], %[ptr]\n" :
> +                [ptr] "=Q"(*(u32 *)ptr) : [val] "r"(val));
> +                break;
> +        case 8:
> +                asm ("//__per_cpu_write_8\n"
> +                "str %[val], %[ptr]\n" :
> +                [ptr] "=Q"(*(u64 *)ptr) : [val] "r"(val));
> +                break;
> +        default:
> +                BUILD_BUG();
> +        }
> +}

Can you implement the read/write accessors with ACCESS_ONCE instead? I
think we're just after a single-copy atomic access without barrier
semantics, so that should work if you get your types right.

Will
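
For illustration, a minimal sketch of the ACCESS_ONCE variant suggested
above, reusing the __percpu_read/__percpu_write prototypes from the quoted
patch. This is untested and only shows the shape of the change; the explicit
casts on the write side are an assumption, not something taken from the
thread. ACCESS_ONCE() (from <linux/compiler.h> in kernels of this era) forces
a volatile access of the right width, and on arm64 a naturally aligned
load/store of that width is single-copy atomic, which is all that is needed
here.

#include <linux/bug.h>          /* BUILD_BUG() */
#include <linux/compiler.h>     /* ACCESS_ONCE() */
#include <linux/types.h>        /* u8, u16, u32, u64 */

static inline unsigned long __percpu_read(void *ptr, int size)
{
        unsigned long ret;

        switch (size) {
        case 1:
                ret = ACCESS_ONCE(*(u8 *)ptr);
                break;
        case 2:
                ret = ACCESS_ONCE(*(u16 *)ptr);
                break;
        case 4:
                ret = ACCESS_ONCE(*(u32 *)ptr);
                break;
        case 8:
                ret = ACCESS_ONCE(*(u64 *)ptr);
                break;
        default:
                BUILD_BUG();
        }

        return ret;
}

static inline void __percpu_write(void *ptr, unsigned long val, int size)
{
        switch (size) {
        case 1:
                ACCESS_ONCE(*(u8 *)ptr) = (u8)val;     /* truncating cast is an assumption */
                break;
        case 2:
                ACCESS_ONCE(*(u16 *)ptr) = (u16)val;
                break;
        case 4:
                ACCESS_ONCE(*(u32 *)ptr) = (u32)val;
                break;
        case 8:
                ACCESS_ONCE(*(u64 *)ptr) = (u64)val;
                break;
        default:
                BUILD_BUG();
        }
}

The compiler should still emit the same ldrb/ldrh/ldr and strb/strh/str
instructions as the hand-written asm, but the accessors stay readable and the
operand types are checked by the compiler rather than by constraint strings.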