From: Rostislav Krasny <rostiprodev@gmail.com>
To: git@vger.kernel.org
Cc: Rostislav Krasny <rostiprodev@gmail.com>
Subject: [PATCH 0/1] compat: modernize and simplify byte swapping functions
Date: Fri,  2 Jan 2026 02:27:34 +0200
Message-ID: <20260102002735.31390-1-rostiprodev@gmail.com>

While reading sha256/block/sha256.c I noticed that it uses both the htonl macro
and the get_be32() static inline function. I was surprised by how differently
these two closely related things are implemented. With GCC or Clang, the htonl
macro expands to a __builtin_bswap32() call, which compiles to a single CPU
instruction on x86. The original implementation of get_be32(), in contrast,
uses eight bitwise operations. Even if the compiler can optimize that code, it
is still less readable and more error-prone.

The main reason for that complicated implementation is that converting a
pointer to one object type into a pointer to a different object type and
reading through it can be undefined behavior (UB). memcpy, on the other hand,
is not subject to that UB, which allows us to make the code simpler and, in
some cases, even faster.
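
To illustrate the difference, here is a minimal sketch of the two approaches.
It is not the code from the patch: the names load_be32_shifts() and
load_be32_memcpy() are hypothetical, and the second helper assumes GCC or
Clang (for __builtin_bswap32() and the __BYTE_ORDER__ macro) and a target
that is either little- or big-endian.

```c
#include <stdint.h>
#include <string.h>

/*
 * Byte-by-byte load: reads through unsigned char, so there are no
 * alignment or aliasing concerns, but it takes several shifts and ORs.
 */
static inline uint32_t load_be32_shifts(const void *ptr)
{
	const unsigned char *p = ptr;
	return (uint32_t)p[0] << 24 |
	       (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] <<  8 |
	       (uint32_t)p[3];
}

/*
 * memcpy-based load: copy the four bytes into a properly typed object,
 * then swap them only if the host is little-endian.
 * Assumption: GCC/Clang builtins and predefined macros are available.
 */
static inline uint32_t load_be32_memcpy(const void *ptr)
{
	uint32_t v;
	memcpy(&v, ptr, sizeof(v));
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	v = __builtin_bswap32(v);
#endif
	return v;
}
```

For the bytes 0x12 0x34 0x56 0x78 both helpers return 0x12345678, even when
the pointer passed in is not 4-byte aligned.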

Additionally, I made a few more small improvements to the same functionality.

I measured the performance of the original and the new code on my Intel
Xeon W-2135 based computer running Fedora 43 Linux with:

* glibc 2.42-5.fc43
* gcc   15.2.1-5.fc43
* clang 21.1.7-1.fc43

I used the following code for these measurements:

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#include "bswap.h"

#define ITERATIONS 1000000
#define BUF_SIZE 8192

int main(void) {
    uint8_t buffer[BUF_SIZE];
    uint64_t sum = 0;

    for (int i = 0; i < BUF_SIZE; i++) {
        buffer[i] = (uint8_t)i;
    }

    clock_t start = clock();

    for (int i = 0; i < ITERATIONS; i++) {
        // use a volatile pointer to force the compiler to read memory
        volatile uint8_t *p = buffer; 
        for (int j = 0; j < BUF_SIZE - 8; j++) {
            sum += get_be64((const void*)(p + j));
        }
    }
    
    clock_t end = clock();
    double time_taken = (double)(end - start) / CLOCKS_PER_SEC;

    printf("Time taken: %f seconds\n", time_taken);
    printf("Checksum: %" PRIu64 "\n", sum);

    return 0;
}

And these are the results (time in seconds, three runs per configuration):

GCC 15.2.1
version |  -Os     |  -O0     |  -O1     |  -O2     |  -O3
================================================================
        | 3.721806 |72.342204 |11.956021 | 3.119833 | 0.919873
original| 3.726111 |72.326920 |11.963618 | 3.128222 | 0.921128
        | 3.719791 |72.328175 |11.949108 | 3.130956 | 0.920296
================================================================
        | 3.719899 |17.177719 | 3.005065 | 3.120747 | 0.920609
new     | 3.714785 |17.168950 | 3.004978 | 3.119227 | 0.918851
        | 3.716782 |17.145386 | 3.009364 | 3.119573 | 0.920030
================================================================

Clang 21.1.7
version |  -Os     |  -O0     |  -O1     |  -O2     |  -O3
================================================================
        | 3.690718 |62.916338 | 3.017460 | 3.768443 | 3.778840
original| 3.686283 |62.965916 | 3.014674 | 3.777897 | 3.774776
        | 3.687775 |62.850648 | 3.003496 | 3.766108 | 3.765313
================================================================
        | 3.681818 |16.753385 | 3.008131 | 2.075271 | 2.076090
new     | 3.687184 |16.737982 | 3.004365 | 2.071597 | 2.074507
        | 3.683960 |16.765067 | 2.999775 | 2.075354 | 2.075759
================================================================

Rostislav Krasny (1):
  compat: modernize and simplify byte swapping functions

 compat/bswap.h | 74 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 30 deletions(-)

-- 
2.52.0

