Date: Tue, 16 Dec 2025 13:14:06 +0000
From: David Laight
To: Uros Bizjak
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin"
Subject: Re: [PATCH RESEND] x86/asm/32: Modernize _memcpy()
Message-ID: <20251216131406.43bfbbaa@pumpkin>
In-Reply-To: <20251216103750.229347-1-ubizjak@gmail.com>
References: <20251216103750.229347-1-ubizjak@gmail.com>

On Tue, 16 Dec 2025 11:37:33 +0100 Uros Bizjak wrote:

> Use inout "+" constraint modifier where appropriate, declare
> temporary variables as unsigned long and rewrite parts of assembly
> in plain C.
> The memcpy() function shrinks by 10 bytes, from:
>
> 00e778d0 :
>   e778d0:  55          push   %ebp
>   e778d1:  89 e5       mov    %esp,%ebp
>   e778d3:  83 ec 0c    sub    $0xc,%esp
>   e778d6:  89 5d f4    mov    %ebx,-0xc(%ebp)
>   e778d9:  89 c3       mov    %eax,%ebx
>   e778db:  89 c8       mov    %ecx,%eax
>   e778dd:  89 75 f8    mov    %esi,-0x8(%ebp)
>   e778e0:  c1 e9 02    shr    $0x2,%ecx
>   e778e3:  89 d6       mov    %edx,%esi
>   e778e5:  89 7d fc    mov    %edi,-0x4(%ebp)
>   e778e8:  89 df       mov    %ebx,%edi
>   e778ea:  f3 a5       rep movsl %ds:(%esi),%es:(%edi)
>   e778ec:  89 c1       mov    %eax,%ecx
>   e778ee:  83 e1 03    and    $0x3,%ecx
>   e778f1:  74 02       je     e778f5
>   e778f3:  f3 a4       rep movsb %ds:(%esi),%es:(%edi)
>   e778f5:  8b 75 f8    mov    -0x8(%ebp),%esi
>   e778f8:  89 d8       mov    %ebx,%eax
>   e778fa:  8b 5d f4    mov    -0xc(%ebp),%ebx
>   e778fd:  8b 7d fc    mov    -0x4(%ebp),%edi
>   e77900:  89 ec       mov    %ebp,%esp
>   e77902:  5d          pop    %ebp
>   e77903:  c3          ret
>
> to:
>
> 00e778b0 :
>   e778b0:  55          push   %ebp
>   e778b1:  89 e5       mov    %esp,%ebp
>   e778b3:  83 ec 08    sub    $0x8,%esp
>   e778b6:  89 75 f8    mov    %esi,-0x8(%ebp)
>   e778b9:  89 d6       mov    %edx,%esi
>   e778bb:  89 ca       mov    %ecx,%edx
>   e778bd:  89 7d fc    mov    %edi,-0x4(%ebp)
>   e778c0:  c1 e9 02    shr    $0x2,%ecx
>   e778c3:  89 c7       mov    %eax,%edi
>   e778c5:  f3 a5       rep movsl %ds:(%esi),%es:(%edi)
>   e778c7:  83 e2 03    and    $0x3,%edx
>   e778ca:  74 04       je     e778d0
>   e778cc:  89 d1       mov    %edx,%ecx
>   e778ce:  f3 a4       rep movsb %ds:(%esi),%es:(%edi)
>   e778d0:  8b 75 f8    mov    -0x8(%ebp),%esi
>   e778d3:  8b 7d fc    mov    -0x4(%ebp),%edi
>   e778d6:  89 ec       mov    %ebp,%esp
>   e778d8:  5d          pop    %ebp
>   e778d9:  c3          ret
>
> due to a better register allocation, avoiding the call-saved
> %ebx register.

That might be semi-random.

>
> No functional changes intended.
>
> Signed-off-by: Uros Bizjak
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Dave Hansen
> Cc: "H. Peter Anvin"
> ---
>  arch/x86/include/asm/string_32.h | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
> index e9cce169bb4c..16cdda93e437 100644
> --- a/arch/x86/include/asm/string_32.h
> +++ b/arch/x86/include/asm/string_32.h
> @@ -32,16 +32,18 @@ extern size_t strlen(const char *s);
>
>  static __always_inline void *__memcpy(void *to, const void *from, size_t n)
>  {
> -	int d0, d1, d2;
> -	asm volatile("rep movsl\n\t"
> -		     "movl %4,%%ecx\n\t"
> -		     "andl $3,%%ecx\n\t"
> -		     "jz 1f\n\t"
> -		     "rep movsb\n\t"
> -		     "1:"
> -		     : "=&c" (d0), "=&D" (d1), "=&S" (d2)
> -		     : "0" (n / 4), "g" (n), "1" ((long)to), "2" ((long)from)
> -		     : "memory");
> +	unsigned long esi = (unsigned long)from;
> +	unsigned long edi = (unsigned long)to;

There is no need to cast to 'unsigned long'; the same code should be
generated if 'void *' is used.

> +	unsigned long ecx = n >> 2;
> +
> +	asm volatile("rep movsl"
> +		     : "+D" (edi), "+S" (esi), "+c" (ecx)
> +		     : : "memory");
> +	ecx = n & 3;
> +	if (ecx)
> +		asm volatile("rep movsb"
> +			     : "+D" (edi), "+S" (esi), "+c" (ecx)
> +			     : : "memory");
>  	return to;
>  }
>

This version seems to generate better code still:
see https://godbolt.org/z/78cq97PPj

void *__memcpy(void *to, const void *from, unsigned long n)
{
	unsigned long ecx = n >> 2;

	asm volatile("rep movsl"
		     : "+D" (to), "+S" (from), "+c" (ecx)
		     : : "memory");
	ecx = n & 3;
	if (ecx)
		asm volatile("rep movsb"
			     : "+D" (to), "+S" (from), "+c" (ecx)
			     : : "memory");
	return (char *)to - n;
}

Note that the performance is horrid on a lot of CPUs.
(In particular the copy of the remaining 1-3 bytes.)

	David