From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: util-linux-owner@vger.kernel.org
Received: from mta02.eastlink.ca ([24.224.136.13]:60932 "EHLO mta02.eastlink.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726866AbeHFCPV (ORCPT ); Sun, 5 Aug 2018 22:15:21 -0400
MIME-version: 1.0
Content-type: text/plain; CHARSET=US-ASCII
Received: from emgw02.eastlink.ca ([71.7.199.174]) by mta02.eastlink.ca (Oracle Communications Messaging Server 8.0.2.2.20180130 64bit (built Jan 30 2018)) with ESMTP id <0PD000CF5HNTW9M0@mta02.eastlink.ca> for util-linux@vger.kernel.org; Sun, 05 Aug 2018 20:38:55 -0300 (ADT)
Date: Sun, 5 Aug 2018 20:41:17 -0300
To: Sami Kerola
Cc: "Theodore Y. Ts'o" , "Dmitry V. Levin" , util-linux@vger.kernel.org
Subject: Re: [PATCH] libuuid: use kernel crypto api
Message-id: <20180805234117.GR865@cordes.ca>
References: <20180804191706.10641-1-kerolasa@iki.fi> <20180804194655.GD4461@thunk.org>
In-reply-to:
From: Peter Cordes
Sender: util-linux-owner@vger.kernel.org
List-ID:

On Sun, Aug 05, 2018 at 11:42:09AM +0100, Sami Kerola wrote:
>
> I should have told in that commit message part of the motivation was to
> deprecate util-linux local md5 implementation. But since both of you
> raised concern about performance I decided to test kernel api and
> util-linux implementations as close the same way as they are used in
> libuuid.
>
> Executive summary: kernel api is surprisingly slow.

You're probably testing on an x86-64 system with kernel mitigation for
Spectre and Meltdown. Both of those add *significant* overhead to
every system call (or other kernel entry/exit, like interrupts).

e.g. in comments on Stack Overflow, @BeeOnRope found that a `syscall`
instruction with an invalid call number takes about 1800 cycles on a
Skylake CPU running Linux (in late February 2018).
https://stackoverflow.com/questions/48913091/fastest-linux-system-call#comment84843442_48914200

(Unfortunately IDK if there's a better / more detailed analysis of
system call costs anywhere.)

Most of that cost is in the WRMSR that flushes branch predictors,
using Intel's newly-introduced (and *not* fast) microcode assistance
for Spectre. Possibly future hardware will make this cheaper, but on
current hardware it just sucks to make system calls.

Thanks to Meltdown mitigation, you get extra TLB misses in the kernel
and after returning to user-space. (This may be less bad than in
early patches, thanks to using hardware PCIDs.) But even just the MOV
to CR3 to change the top level page table takes some time.

I'm not surprised that you found a 10x slowdown for short messages.
Amortizing the kernel entry/exit over a larger buffer is the only way
for it not to be horrible.

If you're curious, you could try booting with the workarounds disabled
(or an old kernel) to see how much perf difference that makes.

The SYSCALL / SYSRET instructions themselves only take something in
the ballpark of 50 cycles on Skylake or Ryzen, IIRC, and Linux's
system-call dispatch code is pretty efficient for the fast path. Even
that might still be measurable overhead for MD5 on a short message,
though.

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC
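
For anyone who wants to reproduce the kernel entry/exit cost discussed
in this message, here is a minimal sketch in C. It is not the
measurement from the linked Stack Overflow comment: the helper names
(now_ns, ns_per_invalid_syscall) are made up for illustration, it
assumes Linux with glibc's syscall() wrapper, and it reports
wall-clock nanoseconds rather than cycles, so the result has to be
multiplied by the core clock in GHz to compare against the cycle
counts above. The loop and wrapper overhead are included in the
figure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: average nanoseconds per kernel round trip.
 * It issues a system call with an invalid number (-1), so the kernel
 * does no useful work and the time is dominated by entry/exit plus
 * whatever mitigations are enabled.
 */
double ns_per_invalid_syscall(long iterations)
{
    assert(iterations > 0);

    uint64_t start = now_ns();
    for (long i = 0; i < iterations; i++)
        syscall(-1);            /* expected to fail; result ignored */
    uint64_t end = now_ns();

    return (double)(end - start) / (double)iterations;
}
```

Calling ns_per_invalid_syscall(1000000) from main() and printing the
result, once on a normal boot and once with the workarounds disabled,
should give a rough idea of how much of the per-call cost comes from
the mitigations.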