public inbox for git@vger.kernel.org
* [PATCH 0/1] compat: modernize and simplify byte swapping functions
@ 2026-01-02  0:27 Rostislav Krasny
  2026-01-02  0:27 ` [PATCH 1/1] " Rostislav Krasny
  2026-01-02  7:29 ` [PATCH 0/1] " Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Rostislav Krasny @ 2026-01-02  0:27 UTC
  To: git; +Cc: Rostislav Krasny

While reading sha256/block/sha256.c I noticed that it uses both the htonl macro
and the get_be32() static inline function, and I was surprised by how different
the implementations of these two closely related things are. With GCC or Clang
the htonl macro expands to a __builtin_bswap32() call, which compiles to a
single CPU instruction on x86. The original implementation of get_be32(), by
contrast, uses eight bitwise operations. Even if the compiler can optimize that
code, it is still less readable and more error-prone.
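
For illustration, a shift-based big-endian load of the kind described above
might look like the following sketch. It is not the code from compat/bswap.h;
the name load_be32_shifts is hypothetical and used only for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the code from compat/bswap.h): assemble a
 * big-endian 32-bit value one byte at a time with shifts and ORs.
 * Each byte is widened to uint32_t before shifting so that the shift
 * by 24 cannot overflow a signed int.
 */
static inline uint32_t load_be32_shifts(const void *ptr)
{
	const unsigned char *p = ptr;

	return (uint32_t)p[0] << 24 |
	       (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] <<  8 |
	       (uint32_t)p[3];
}
```

Given the bytes 0x12 0x34 0x56 0x78, this returns 0x12345678 regardless of the
host byte order, because it never reinterprets the buffer as a wider type.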

The main reason for the more complicated implementation is that accessing an
object through a pointer converted to a pointer to a different object type is
UB. memcpy, on the other hand, is not subject to this UB, which allows us to
make the code simpler and, in some cases, even faster.
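
As a minimal sketch of the memcpy approach (not the patch itself), a
big-endian load could be written as below. The name load_be32_memcpy is
hypothetical, and the example assumes GCC or Clang, since __builtin_bswap32
and the __BYTE_ORDER__ macros are compiler-specific.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the patch itself). Assumes GCC or Clang:
 * __builtin_bswap32 and __BYTE_ORDER__ are compiler extensions.
 */
static inline uint32_t load_be32_memcpy(const void *ptr)
{
	uint32_t v;

	/*
	 * memcpy can read from any address, aligned or not, and copies
	 * bytes, so it avoids the UB of *(const uint32_t *)ptr.
	 */
	memcpy(&v, ptr, sizeof(v));
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* Little-endian host: swap bytes to obtain the big-endian value. */
	v = __builtin_bswap32(v);
#endif
	return v;
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a plain
load, so the function can reduce to a load plus a byte swap.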

Additionally I made a few more small improvements related to the same
functionality.

I measured the performance of the original and the new code on my Intel
Xeon W-2135 based computer running Fedora 43 Linux with:

* glibc 2.42-5.fc43
* gcc   15.2.1-5.fc43
* clang 21.1.7-1.fc43

I used the following code for these measurements:

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#include "bswap.h"

#define ITERATIONS 1000000
#define BUF_SIZE 8192

int main(void) {
    uint8_t buffer[BUF_SIZE];
    uint64_t sum = 0;

    for (int i = 0; i < BUF_SIZE; i++) {
        buffer[i] = (uint8_t)i;
    }

    clock_t start = clock();

    for (int i = 0; i < ITERATIONS; i++) {
        // volatile pointer intended to make the compiler re-read memory;
        // note that the cast in the get_be64() call drops the qualifier
        volatile uint8_t *p = buffer;
        for (int j = 0; j < BUF_SIZE - 8; j++) {
            sum += get_be64((const void*)(p + j));
        }
    }
    
    clock_t end = clock();
    double time_taken = (double)(end - start) / CLOCKS_PER_SEC;

    printf("Time taken: %f seconds\n", time_taken);
    printf("Checksum: %" PRIu64 "\n", sum);

    return 0;
}

And these are the results (time in seconds; three runs per version and
optimization level):

GCC 15.2.1
version  |    -Os    |    -O0    |    -O1    |    -O2    |    -O3
=====================================================================
         |  3.721806 | 72.342204 | 11.956021 |  3.119833 |  0.919873
original |  3.726111 | 72.326920 | 11.963618 |  3.128222 |  0.921128
         |  3.719791 | 72.328175 | 11.949108 |  3.130956 |  0.920296
=====================================================================
         |  3.719899 | 17.177719 |  3.005065 |  3.120747 |  0.920609
new      |  3.714785 | 17.168950 |  3.004978 |  3.119227 |  0.918851
         |  3.716782 | 17.145386 |  3.009364 |  3.119573 |  0.920030
=====================================================================

Clang 21.1.7
version  |    -Os    |    -O0    |    -O1    |    -O2    |    -O3
=====================================================================
         |  3.690718 | 62.916338 |  3.017460 |  3.768443 |  3.778840
original |  3.686283 | 62.965916 |  3.014674 |  3.777897 |  3.774776
         |  3.687775 | 62.850648 |  3.003496 |  3.766108 |  3.765313
=====================================================================
         |  3.681818 | 16.753385 |  3.008131 |  2.075271 |  2.076090
new      |  3.687184 | 16.737982 |  3.004365 |  2.071597 |  2.074507
         |  3.683960 | 16.765067 |  2.999775 |  2.075354 |  2.075759
=====================================================================

Rostislav Krasny (1):
  compat: modernize and simplify byte swapping functions

 compat/bswap.h | 74 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 30 deletions(-)

-- 
2.52.0


Thread overview: 7+ messages
2026-01-02  0:27 [PATCH 0/1] compat: modernize and simplify byte swapping functions Rostislav Krasny
2026-01-02  0:27 ` [PATCH 1/1] " Rostislav Krasny
2026-01-02  6:16   ` Jeff King
2026-01-02 17:37     ` Rostislav Krasny
2026-01-11 22:05       ` Rostislav Krasny
2026-01-14 21:14         ` Jeff King
2026-01-02  7:29 ` [PATCH 0/1] " Jeff King
