From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Chang S. Bae"
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, chang.seok.bae@intel.com
Subject: [RFC PATCH 2/3] x86/lib: Convert repeated asm sequences in checksum copy into macros
Date: Mon, 24 Nov 2025 21:32:25 +0000
Message-ID: <20251124213227.123779-3-chang.seok.bae@intel.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251124213227.123779-1-chang.seok.bae@intel.com>
References: <20251124213227.123779-1-chang.seok.bae@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Several instruction patterns are repeated in the checksum-copy function.
Replace them with small macros to make the code more concise and
readable.

No functional change.

Signed-off-by: Chang S. Bae
---
These repetitions are related to the loop unrolling, which will be
further extended using EGPRs in the next patch.
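
For reference while reviewing, the behavior that the removed instruction
sequences implement can be modeled roughly in C as below. This is only a
sketch, not kernel code: csum_copy_words() and add_carry() are hypothetical
names, the exception-table handling done by source/dest is left out, and it
assumes the carry left over from the loop is folded in afterwards the same
way the diff shows after .Lloop_8. In the removed lines each register is
loaded from its own 8-byte slot (0, 8, ..., 56), which the model mirrors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of adcq: sum = sum + val + carry-in, updating the carry flag. */
static uint64_t add_carry(uint64_t sum, uint64_t val, unsigned *carry)
{
	uint64_t t = sum + val;
	unsigned c = t < val;		/* carry out of sum + val */
	uint64_t r = t + *carry;

	c |= r < t;			/* carry out of adding the carry-in */
	*carry = c;
	return r;
}

/*
 * Hypothetical model of the copy-and-sum loops. 'step' is the number of
 * 64-bit words handled per iteration: 8 for .Lloop, 1 for .Lloop_8.
 */
static uint64_t csum_copy_words(const uint64_t *src, uint64_t *dst,
				size_t iterations, size_t step, uint64_t sum)
{
	unsigned carry = 0;		/* clc */
	uint64_t tmp[8];		/* stands in for TMP1..TMP8 */
	size_t i;

	assert(step >= 1 && step <= 8);

	while (iterations--) {
		for (i = 0; i < step; i++)	/* loads: word i at offset 8*i */
			tmp[i] = src[i];
		for (i = 0; i < step; i++)	/* adcq TMPn, SUM */
			sum = add_carry(sum, tmp[i], &carry);
		for (i = 0; i < step; i++)	/* stores: word i at offset 8*i */
			dst[i] = tmp[i];
		src += step;			/* leaq 8*step(INP), INP */
		dst += step;			/* leaq 8*step(OUTP), OUTP */
	}
	return add_carry(sum, 0, &carry);	/* adcq ZERO, SUM */
}
```

Calling csum_copy_words(src, dst, n, 8, 0) models n passes of the 64-byte
loop; a step of 1 models the 8-byte tail loop. Comparing its result with
the assembly on the same buffer is one way to check the "no functional
change" claim.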
---
 arch/x86/lib/csum-copy_64.S | 106 ++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 58 deletions(-)

diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 66ed849090b7..5526bdfac041 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -46,6 +46,43 @@
 	RET
 .endm
 
+.macro prefetch
+30:
+	/*
+	 * No _ASM_EXTABLE_UA; this is used for intentional prefetch on a
+	 * potentially unmapped kernel address.
+	 */
+	_ASM_EXTABLE(30b, 2f)
+	prefetcht0 5*64(%rdi)
+2:
+.endm
+
+.macro loadregs offset, src, regs:vararg
+	source
+	i = 0
+.irp r, \regs
+	movq 8*(\offset + i)(\src), \r
+.endr
+.endm
+
+.macro storeregs offset, dst, regs:vararg
+	dest
+	i = 0
+.irp r, \regs
+	movq \r, 8*(\offset + i)(\dst)
+.endr
+.endm
+
+.macro sumregs sum, regs:vararg
+.irp r, \regs
+	adcq \r, \sum
+.endr
+.endm
+
+.macro incr ptr, count
+	leaq 8*(\count)(\ptr), \ptr
+.endm
+
 .macro _csum_partial_copy
 	subq $5*8, %rsp
 	movq %rbx, 0*8(%rsp)
@@ -87,63 +124,18 @@
 
 	.p2align 4
 .Lloop\@:
-	source
-	movq (INP), TMP1
-	source
-	movq 8(INP), TMP2
-	source
-	movq 16(INP), TMP3
-	source
-	movq 24(INP), TMP4
+	loadregs 0, INP, TMP1, TMP2, TMP3, TMP4, TMP5, TMP6, TMP7, TMP8
 
-	source
-	movq 32(INP), TMP5
-	source
-	movq 40(INP), TMP6
-	source
-	movq 48(INP), TMP7
-	source
-	movq 56(INP), TMP8
+	prefetch
 
-30:
-	/*
-	 * No _ASM_EXTABLE_UA; this is used for intentional prefetch on a
-	 * potentially unmapped kernel address.
-	 */
-	_ASM_EXTABLE(30b, 2f)
-	prefetcht0 5*64(%rdi)
-2:
-	adcq TMP1, SUM
-	adcq TMP2, SUM
-	adcq TMP3, SUM
-	adcq TMP4, SUM
-	adcq TMP5, SUM
-	adcq TMP6, SUM
-	adcq TMP7, SUM
-	adcq TMP8, SUM
+	sumregs SUM, TMP1, TMP2, TMP3, TMP4, TMP5, TMP6, TMP7, TMP8
 
 	decl LEN64B
 
-	dest
-	movq TMP1, (OUTP)
-	dest
-	movq TMP2, 8(OUTP)
-	dest
-	movq TMP3, 16(OUTP)
-	dest
-	movq TMP4, 24(OUTP)
+	storeregs 0, OUTP, TMP1, TMP2, TMP3, TMP4, TMP5, TMP6, TMP7, TMP8
 
-	dest
-	movq TMP5, 32(OUTP)
-	dest
-	movq TMP6, 40(OUTP)
-	dest
-	movq TMP7, 48(OUTP)
-	dest
-	movq TMP8, 56(OUTP)
-
-	leaq 64(INP), INP
-	leaq 64(OUTP), OUTP
+	incr INP, 8
+	incr OUTP, 8
 
 	jnz .Lloop\@
 
@@ -159,14 +151,12 @@
 	clc
 	.p2align 4
 .Lloop_8\@:
-	source
-	movq (INP), TMP1
-	adcq TMP1, SUM
+	loadregs 0, INP, TMP1
+	sumregs SUM, TMP1
 	decl LEN
-	dest
-	movq TMP1, (OUTP)
-	leaq 8(INP), INP /* preserve carry */
-	leaq 8(OUTP), OUTP
+	storeregs 0, OUTP, TMP1
+	incr INP, 1 /* preserve carry */
+	incr OUTP, 1
 	jnz .Lloop_8\@
 	adcq ZERO, SUM /* add in carry */
 
-- 
2.51.0