[PATCH] doc: code generation style

From: Alexey Dobriyan <adobriyan@gmail.com>
To: torvalds@linux-foundation.org, corbet@lwn.net
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel,
	linux-doc@vger.kernel.org, x86@kernel.org
Subject: [PATCH] doc: code generation style
Date: Thu, 5 Mar 2020 22:02:53 +0300
Message-ID: <20200305190253.GA28787@avx2>

I wonder if it would be useful to have something like this in tree.

It states trivial things for anyone who has looked at disassembly a few times
but still...

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 Documentation/process/code-generation.rst |  196 ++++++++++++++++++++++++++++++
 1 file changed, 196 insertions(+)

new file mode 100644
--- /dev/null
+++ b/Documentation/process/code-generation.rst
@@ -0,0 +1,196 @@
+Code generation
+===============
+
+1) Generic techniques
+---------------------
+
+### a) Inlining/uninlining function calls ###
+
+An external function call is serious business from a code generation point of
+view. ABIs require that specific arguments be placed into specific registers
+before the call, forcing spills and register shuffling to accommodate the ABI.
+Registers the ABI marks as clobbered are wasted when the callee doesn't use
+them. Declaring a function as ``static inline`` in a header gives the compiler
+more information to work with.
+
+However, excessive inlining often leads to code bloat for no measurable
+performance gain. In such cases it is probably better to save on generated code
+for the sake of icache, disk I/O and network bandwidth costs.
+
+Use the ``noinline`` attribute to prevent inlining inside a translation unit and
+see what happens:
+
+.. code-block:: c
+
+        noinline
+        int f(void)
+        {
+                ...
+        }
+
+It is hard to advise any more than that as modern compilers generate code in
+mysterious ways.
+
+
+### b) Appending arguments ###
+
+Some functions are thin wrappers appending an argument or two to another
+function which actually does the job:
+
+.. code-block:: c
+
+        int g(int, int, flag_t);
+        int f(int a, int b)
+        {
+                return g(a, b, FLAG_C);
+        }
+
+Appending an argument at the end adds the minimum amount of code:
+
+.. code-block:: none
+
+        f:
+                mov     edx, FLAG_C
+                jmp     g
+
+Inserting an argument in the middle or at the beginning will generate a
+reshuffle sequence:
+
+.. code-block:: none
+
+        f:
+                mov     edx, esi
+                mov     esi, edi
+                mov     edi, FLAG_C
+                jmp     g
+
+Do not enforce this rule religiously, as there may be other reasons for a
+specific argument order, most notably keeping related arguments together
+at the source level.
+
+
+2) Architecture-specific issues (i386/x86_64)
+---------------------------------------------
+
+### a) Member placement ###
+
+The first member of any structure is special on i386/x86_64: the compiler will
+use the ``[r32]`` or ``[r64]`` addressing mode, which has the shortest encoding.
+After laying out members of a structure into cachelines for performance,
+move the most often used member of the first cacheline to the very beginning.
+
+Having done that, pay attention to bytes 1--127. Members placed there will be
+accessed with the ``[r64+disp8]`` encoding (or ``[r32+disp8]`` on i386). This is
+only 1 byte longer than the encoding used for the first member but 3 bytes
+*shorter* than ``[r64+disp32]``, which is used for all other members. Try to
+shift the more often used members into the first 2 cachelines.
+
+"Refugee" members living in byte 128 and beyond can be placed in any order.
+
+
+### b) Implicit 32/64-bit casts ###
+
+Avoid casts which change signedness and/or bitness of a value.
+
+If some piece of data appears in the code, it generally should be kept in its
+original type unless there are specific reasons to do otherwise (packing, etc).
+Given C's seemingly arcane implicit and explicit conversion rules, this is good
+advice from a programming language point of view as well.
+
+Given the code:
+
+.. code-block:: c
+
+        void f(size_t);
+
+        int len = strlen(s);
+        f(len);
+
+if the compiler doesn't or can't track value ranges through casts, it will have
+no choice but to assume that all ``size_t`` values are possible and emit a MOVSX
+instruction:
+
+.. code-block:: none
+
+        mov     rdi, ...
+        call    strlen
+        movsx   rdi, eax
+        call    f
+
+MOVSX by itself is not a problem, but a) it may be 1 byte longer than a MOV
+instruction with the same operands and b) it won't be handled by register
+renaming, lengthening dependency chains by 1 instruction.
+
+
+### c) 64-bitness ###
+
+64-bit instructions are typically 1 byte longer than their 32-bit equivalents
+on x86_64 because they need a REX prefix.
+
+There is one big 64-bit enabler, which is dynamic memory allocation: all
+kmalloc variants accept ``size_t`` and the ``sizeof`` operator yields ``size_t``.
+
+Do not use 64-bit/``size_t`` unless strictly necessary (pointer-to-integer
+conversion, syscall ABI interfaces, integers which can be genuinely big on
+big machines, statistics).
+
+Use 32-bit ``unsigned int``. The kernel simply doesn't do individual 4+ GB
+allocations, and if it does, it probably goes via the page allocator. Such huge
+amounts of memory simply aren't needed: the network doesn't do gigabyte packets,
+the VFS caps I/O at 2 GB minus a little, and interacting with userspace via
+``copy_from_user``/``copy_to_user`` is capped at ``INT_MAX`` as well.
+
+.. code-block:: c
+
+        #define MAX_RW_COUNT (INT_MAX & PAGE_MASK)
+
+The only exceptional case is a ``size_t`` value being passed directly into
+a standard function accepting ``size_t`` (``memset``, ``memcpy``, ...).
+Truncating the value to 32 bits won't do anything useful in this case.
+
+
+### d) 16-bitness ###
+
+16-bit instructions need a 1-byte operand-size override prefix (0x66),
+which again bloats an instruction by 1 byte. Unlike REX prefixes, this is
+unavoidable.
+
+It is better to use 16-bit types at the ABI/protocol/memory level, convert
+to plain ``int``/``unsigned int`` as soon as possible and work with that.
+
+The preferred order of bitness on x86_64 is:
+
+        32/8-bit > 64-bit > 16-bit.
+
+3) Architecture-specific issues (arm/arm64)
+-------------------------------------------
+
+### Constant flags value selection ###
+
+"Tight" constants can be loaded into a register with 1 instruction on arm and
+other RISC architectures:
+
+.. code-block:: c
+
+        int f(void)
+        {
+                return 1;
+        }
+
+.. code-block:: none
+
+        00000000 <f>:
+           0:   e3a00001        mov     r0, #1
+           4:   e12fff1e        bx      lr
+
+Constants which don't fit into the 12-bit immediate field on arm will be loaded
+from memory or constructed with 2 instructions:
+
+.. code-block:: none
+
+        00000000 <f>:
+           0:   e59f0000        ldr     r0, [pc]        ; 8 <f+0x8>
+           4:   e12fff1e        bx      lr
+           8:   00000801        .word   0x00000801      ; <=== 2049
+
+After settling on flags/constants, pack often used values close together bitwise.
