Date: Tue, 16 Dec 2025 19:18:52 +0000
From: David Laight
To: Uros Bizjak
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin"
Subject: Re: [PATCH RESEND] x86/asm/32: Modernize _memcpy()
Message-ID: <20251216191852.7f5a7354@pumpkin>
References: <20251216103750.229347-1-ubizjak@gmail.com> <20251216131406.43bfbbaa@pumpkin>

On Tue, 16 Dec 2025 17:30:55 +0100
Uros Bizjak wrote:

> On Tue, Dec 16, 2025 at 2:14 PM David Laight wrote:
>
> > > 00e778b0 :
> > >   e778b0:  55          push   %ebp
> > >   e778b1:  89 e5       mov    %esp,%ebp
> > >   e778b3:  83 ec 08    sub    $0x8,%esp
> > >   e778b6:  89 75 f8    mov    %esi,-0x8(%ebp)
> > >   e778b9:  89 d6       mov    %edx,%esi
> > >   e778bb:  89 ca       mov    %ecx,%edx
> > >   e778bd:  89 7d fc    mov    %edi,-0x4(%ebp)
> > >   e778c0:  c1 e9 02    shr    $0x2,%ecx
> > >   e778c3:  89 c7       mov    %eax,%edi
> > >   e778c5:  f3 a5       rep movsl %ds:(%esi),%es:(%edi)
> > >   e778c7:  83 e2 03    and    $0x3,%edx
> > >   e778ca:  74 04       je     e778d0
> > >   e778cc:  89 d1       mov    %edx,%ecx
> > >   e778ce:  f3 a4       rep movsb %ds:(%esi),%es:(%edi)
> > >   e778d0:  8b 75 f8    mov    -0x8(%ebp),%esi
> > >   e778d3:  8b 7d fc    mov    -0x4(%ebp),%edi
> > >   e778d6:  89 ec       mov    %ebp,%esp
> > >   e778d8:  5d          pop    %ebp
> > >   e778d9:  c3          ret
> > >
> > > due to a better register allocation, avoiding the call-saved
> > > %ebx register.
> >
> > That'll might be semi-random.
>
> Not really, the compiler has more freedom to allocate more optimal register.
>
> > > +       unsigned long ecx = n >> 2;
> > > +
> > > +       asm volatile("rep movsl"
> > > +                    : "+D" (edi), "+S" (esi), "+c" (ecx)
> > > +                    : : "memory");
> > > +       ecx = n & 3;
> > > +       if (ecx)
> > > +               asm volatile("rep movsb"
> > > +                            : "+D" (edi), "+S" (esi), "+c" (ecx)
> > > +                            : : "memory");
> > >         return to;
> > >  }
> > >
>
> > This version seems to generate better code still:
> > see https://godbolt.org/z/78cq97PPj
> >
> > void *__memcpy(void *to, const void *from, unsigned long n)
> > {
> >         unsigned long ecx = n >> 2;
> >
> >         asm volatile("rep movsl"
> >                      : "+D" (to), "+S" (from), "+c" (ecx)
> >                      : : "memory");
> >         ecx = n & 3;
> >         if (ecx)
> >                 asm volatile("rep movsb"
> >                              : "+D" (to), "+S" (from), "+c" (ecx)
> >                              : : "memory");
> >         return (char *)to - n;
>
> I don't think that additional subtraction outweighs a move from EAX to
> a temporary.

It saves a read from stack, and/or a register spill to stack
(at least as I compiled the code...).
The extra ALU operation is much cheaper - especially since it will
sit in the 'out of order' instruction queue with nothing waiting
for the result (the write to stack).

Since pretty much no code uses the result of memcpy(), you could have
an inline wrapper that just preserves the 'to' address and calls a
'void' function to do the copy itself.

But the real performance killer is the setup time for 'rep movs'.
That will outweigh any minor faffing about by orders of magnitude.

Oh, and start trying to select between algorithms based on length
and/or alignment and you get killed by the mispredicted branch
penalty - that seems to be about 20 clocks on my Zen 5.
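The inline-wrapper idea mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions, not code from the patch or the thread: the names copy_bytes() and memcpy_wrapped() are hypothetical, and the non-x86 fallback loop exists only so the sketch builds on other architectures. In real use the 'void' helper would presumably live out of line in its own translation unit.

```c
#include <stddef.h>

/*
 * Hypothetical 'void' copy helper: it does the copy and returns nothing,
 * so it is free to clobber its own copies of 'to' and 'from'.
 */
static void copy_bytes(void *to, const void *from, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
	size_t cnt = n >> 2;	/* number of whole 32-bit words */

	/* Copy 4 bytes at a time; advances 'to' and 'from'. */
	__asm__ volatile("rep movsl"
			 : "+D" (to), "+S" (from), "+c" (cnt)
			 : : "memory");
	cnt = n & 3;		/* 0..3 trailing bytes */
	if (cnt)
		__asm__ volatile("rep movsb"
				 : "+D" (to), "+S" (from), "+c" (cnt)
				 : : "memory");
#else
	/* Assumption: plain byte loop so the sketch compiles off x86. */
	unsigned char *d = to;
	const unsigned char *s = from;

	while (n--)
		*d++ = *s++;
#endif
}

/*
 * Hypothetical inline wrapper: only this part knows about the return
 * value. A caller that ignores the result lets the compiler drop the
 * saved 'to' entirely.
 */
static inline void *memcpy_wrapped(void *to, const void *from, size_t n)
{
	copy_bytes(to, from, n);
	return to;
}
```

Callers that do use the result still get the original destination pointer, while the common case of an ignored result should not need to keep 'to' live across the copy.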
I haven't measured the setup cost for 'rep movs'; it is on my
'round tuit' list.

	David

>
> BR,
> Uros.