From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f51.google.com (mail-wm1-f51.google.com [209.85.128.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6AD6F2E06D2 for ; Fri, 3 Jul 2026 21:16:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.51 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1783113417; cv=none; b=jVQyQaX8bfpSYNNLCqGqxCEEUjQa6RvDZQiDMyyH1kX/L5JeaTcMFfUda60AZ7xuEvjABI4WTHQtcAOzcfBkdeRyzzph2a17Jeud/kAqzsNSn6Y5BWYJUBidgVdebxzwf9lHdSBd17/HMlpF/PipWAZz+7QCzQErMip4xT2PMgo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1783113417; c=relaxed/simple; bh=cCkQjCePSHb28HQ2XyzUAFDSIIkdoakr3w3MVImbm6U=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ODEgNwe7ThxSEoA2YP7HIz3D8QvzicrjeQ4q+4UG6dVk4yTh1S1+zlm0yNbl9mILGFkyV2ZKmzy50cp+qrPis3qHahNlFoKTJz4gw7q/LYfzh9MlhoOGX2NhOcCsy7VVS/324KplFLYlRGPxcC1M1X//Sp9NBW91gD/k9IbBY9w= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=i8LjiFqh; arc=none smtp.client-ip=209.85.128.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="i8LjiFqh" Received: by mail-wm1-f51.google.com with SMTP id 5b1f17b1804b1-493be1b9564so6986785e9.2 for ; Fri, 03 Jul 2026 14:16:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1783113414; x=1783718214; darn=vger.kernel.org; h=content-transfer-encoding:content-type:mime-version:references :in-reply-to:message-id:subject:cc:to:from:date:from:to:cc:subject :date:message-id:reply-to:content-type; bh=C4eCszeuptfMVD03Bd+mf3Rmau3oAhlB5q7FtJvbpYM=; b=i8LjiFqh5EoZSxBpxrDqKOAkMrdQ0gNVu+qanDBXhMGCg01oHwb+vlCzvyptN/LuOF tmPwvalqmdDZkCnKTlpZGeueW42uxpOKho2EwKztdf8N3Fboj4Hz8kaQ6e4Urll/FuXu tKLrrJN9wLtC5Ko1KfteBrY5ftuHbrYNPBYt3XpZzZA2J79IcBYlfg4/C7aQdaPZeHTQ lTQHkydxOuHsrhMn3lDYMFaf6Xdze1MBRhY5ahjLH+qHK6Yevl0wglMobKPPFBUxBZSR dtj9HrSbGUM8Z8pfjD3LrkM13v0LjUcc9OXZpdR2SvHwM1TNStNDMA8SUYTya63kntps kEig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1783113414; x=1783718214; h=content-transfer-encoding:content-type:mime-version:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to :content-type; bh=C4eCszeuptfMVD03Bd+mf3Rmau3oAhlB5q7FtJvbpYM=; b=F/vM0e89gW7kDLCUlvpyuubNly6vI4QFr/83Wy+ExmbTjoLz/KjFLowCTbVwIuqqBg 81bY2WSsRLcKyTt6IKUWyzDfiqH4RL3fYjnWNRMQJl7wHD/5M7vs/QS1JmCoKVX45UIq +bDoEz0jZFtaZ1SuufS8zjNNigcs0SvvFH6OKz3kxOfJAbdBowpyjgULT9aVIi7oimgu 6JeivBgk85N78x6w2m8vOzp8GeoQfPeSqHWwfu2dzA4BQU3c3ww6Mtx3D+m4ZohCm6MP obkSmsmGV/Nm2RCb+IW9wrrX8f4KntZ3WeKOJ7sn0VK/SSHyaIS3wxp/bfzBdo11l6L0 uX8w== X-Forwarded-Encrypted: i=1; AFNElJ9BCcgM64q4MY6qDmur5EmfHxxYAMMRLGHhDI30qtj1ZJX5JV1cwwWtePsnNIQ1YHetSgV3OGFhBguE7xA=@vger.kernel.org X-Gm-Message-State: AOJu0YxvDpG68uDK8rWxdZ9YAMlUYypoRm1/24nkX5wyk1k+jUwwnJaG eo0Mrrl+qH74cA3JoEMDEAqsHoaenQDmWROs7gywOYoesu62d7PylDx+ X-Gm-Gg: AfdE7clj0AebQTkuoQqc6ID64gVI11aOBUnXkcMuN0tkaytpo1AXwYcs6SE9Cl2tt0I SpD3VfBFHboeMZWYDXncGZDcrmqDP5Z2WNpdUpbCKcZ2A5Jpwq8gUXgwu8kQrO/fJvJq4c/eYCh 8S1FSTh890VcylZ6XRINErxXZjeG4PwUsC9dZSzbzU8szMwkqUAxtd/UXjPALaQ7PgY6gF41uVR 5Z0HothDM87X8KC+QgMdojdlCeo7EQpa5h+AIe4SIO39fBareG3pflRHVijVffb02DLFaxJRMm/ xGlj++KCoua8QFwYRUU6HcK4WKEQ5HABxjARctOXXQGR2gOqx5ZpOiUyP0zskyW0rpyfu2LX5QW Ycj/M69rSM+YV256pjx3YTXNLBsfB6hJCjNlhqbE0qN1KwIpkH9dYj+blCx/CyxTS/yOXsTEoli w05zWy1JG9RYl6oQy58y5ZgKj5N3tXmdqOesEty0OXDAle/+1B5IJnO0KC X-Received: by 2002:a05:600c:c1c8:10b0:492:1e36:1fe9 with SMTP id 5b1f17b1804b1-493d11fafa3mr6498625e9.37.1783113413568; Fri, 03 Jul 2026 14:16:53 -0700 (PDT) Received: from pumpkin (host-92-21-50-228.as13285.net. [92.21.50.228]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-493c63ba971sm247973145e9.13.2026.07.03.14.16.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 03 Jul 2026 14:16:53 -0700 (PDT) Date: Fri, 3 Jul 2026 22:16:51 +0100 From: David Laight To: Jacopo Mondi Cc: David Carlier , dan.scally@ideasonboard.com, mchehab@kernel.org, linux-media@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH] media: mali-c55: Fix unaligned access of AEC histogram zone weights Message-ID: <20260703221651.41669d55@pumpkin> In-Reply-To: References: <20260702103453.348056-1-devnexen@gmail.com> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; arm-unknown-linux-gnueabihf) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Fri, 3 Jul 2026 11:44:31 +0200 Jacopo Mondi wrote: > Hi David, > thanks for sending a patch to address this issue > > On Thu, Jul 02, 2026 at 11:34:53AM +0100, David Carlier wrote: > > mali_c55_params_aexp_hist_weights() packs the 225 per-zone u8 weights > > into the ISP registers four at a time by casting the zone_weights array > > to u32 and dereferencing it. The array sits at offset 10 within the > > parameter block, so it is only 2-byte aligned: the u32 access is > > unaligned, which is undefined behaviour and can fault on strict-align > > configurations or once the loop is auto-vectorised. > > well, I don't there is a risk of undefined behaviour on ARMv8, it's > just less efficient > > > > > The cast also reads the four weights in host byte order before they are > > written to the little-endian register, so on big-endian hosts the four > > weights packed into each register end up in the wrong byte lanes. > > Also we don't have any endianess issue as the IP is only found on > little endian systems > > > > > Read the weights with get_unaligned_le32() instead, which is both > > alignment-safe and fixes the byte order regardless of host endianness. > > > > mmm, I read in Documentation/core-api/unaligned-memory-access.rst > that: > > ------------------------------------------------------------------------------ > u32 value = get_unaligned((u32 *) data); > > These macros work for memory accesses of any length (not just 32 bits as > in the examples above). Be aware that when compared to standard access of > aligned memory, using these macros to access unaligned memory can be costly in > terms of performance. > > If use of such macros is not convenient, another option is to use memcpy(), > where the source or destination (or both) are of type u8* or unsigned char*. > Due to the byte-wise nature of this operation, unaligned accesses are avoided. > ------------------------------------------------------------------------------ > > Which seems to suggest, if the issue here is performances, we should > aim for something different ? (honest question here, any kind of > guidance is appreciated) > > > Fixes: d5f281f3dd29 ("media: mali-c55: Add Mali-C55 ISP driver") > > Cc: stable@vger.kernel.org > > If it's only about performances, does this qualifies as a fix ? > > > Signed-off-by: David Carlier > > --- > > drivers/media/platform/arm/mali-c55/mali-c55-params.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/media/platform/arm/mali-c55/mali-c55-params.c b/drivers/media/platform/arm/mali-c55/mali-c55-params.c > > index de0e9d898..1aaf64dde 100644 > > --- a/drivers/media/platform/arm/mali-c55/mali-c55-params.c > > +++ b/drivers/media/platform/arm/mali-c55/mali-c55-params.c > > @@ -6,6 +6,7 @@ > > */ > > #include > > #include > > +#include > > > > #include > > #include > > @@ -203,7 +204,7 @@ mali_c55_params_aexp_hist_weights(struct mali_c55 *mali_c55, > > * of overwriting other registers. > > */ > > for (unsigned int i = 0; i < 56; i++) { > > - val = ((u32 *)params->zone_weights)[i] > > + val = get_unaligned_le32(¶ms->zone_weights[i * 4]) > > & MALI_C55_AEXP_HIST_ZONE_WEIGHT_MASK; On LE with HAVE_EFFICIENT_UNALIGNED_ACCESS the latter generates what you expect the former to generate. But gcc can unroll loops and use (IIRC) 'rdp' to read two registers at once. That will crash and burn. The best thing would be to have a union of the two arrays with the member marked __packed to remove the padding before it. > > > We could do: > > memcpy(&val, ¶ms->zone_weights[4 * i], 4); Some of the KASAN (etc) builds might make a mess of that. Without compiler optimisations of memcpy() it is horrid. > addr = base + MALI_C55_AEXP_HIST_ZONE_WEIGHTS_OFFSET + (4 * i); > > mali_c55_ctx_write(mali_c55, addr, > val & MALI_C55_AEXP_HIST_ZONE_WEIGHT_MASK); > > Or this could be an alternative: > > const u8 *w = ¶ms->zone_weights[4 * i]; > > val = w[0] | w[1] << 8 | w[2] << 16 | w[3] << 24; That is a possible implementation of get_unaligned_le32() no point doing it explicitly. A late enough gcc will convert that to a 32bit memory read (with any byteswap in the read or after) if unaligned accesses are supported. Otherwise you get byte loads, shifts and ors. David > addr = base + MALI_C55_AEXP_HIST_ZONE_WEIGHTS_OFFSET + (4 * i); > > mali_c55_ctx_write(mali_c55, addr, > val & MALI_C55_AEXP_HIST_ZONE_WEIGHT_MASK); > > What do you think ? > > > addr = base + MALI_C55_AEXP_HIST_ZONE_WEIGHTS_OFFSET + (4 * i); > > > > -- > > 2.53.0 > > > > >