public inbox for linux-man@vger.kernel.org
 help / color / mirror / Atom feed
From: Alejandro Colomar <alx@kernel.org>
To: Jonny Grant <jg@jguk.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>,
	Matthew House <mattlloydhouse@gmail.com>,
	linux-man <linux-man@vger.kernel.org>
Subject: Re: strncpy clarify result may not be null terminated
Date: Sat, 18 Nov 2023 11:12:35 +0100	[thread overview]
Message-ID: <ZViOGeSYdYVRl-Ky@devuan> (raw)
In-Reply-To: <b5182629-8028-4842-bbfc-ea106e251b92@jguk.org>

[-- Attachment #1: Type: text/plain, Size: 5478 bytes --]

Hi Jonny,

On Fri, Nov 17, 2023 at 09:57:39PM +0000, Jonny Grant wrote:
> > -  strlcpy(3)'s first order growth corresponds to strlen(src).  That's
> >    due to returning strlen(src), which proves to be a poor API.
> > 
> > -  strncpy(3)'s first order growth corresponds to sizeof(dst).  That's
> >    of course due to the zeroing.  If sizeof(dst) is kept very small, you
> >    could live with it.  When the size grows to more or less 4 KiB, this
> >    drag becomes meaningful.
> > 
> > -  strnlen(3)+*cpy() first order growth corresponds to
> >    strnlen(src, sizeof(dst)), which is the fastest order of growth
> >    you can get from a truncating string-copying function (except if you
> >    keep track of your slen manually and call directly memcpy(3)).
> 
> That's a really good point, keeping track of the length (and buffer size) and then just using memcpy.
> The copy time should be closer to the number of bytes read and written.

Actually, the performance of memcpy(3) should also be on the order of
strnlen(src, sizeof(dst)), so it should always take similar times
compared to strnlen(3)+*cpy().  It is only that it will always be
slightly faster due to avoiding a second read, but it will only be a %.
Nothing like 10x, which can easily happen with strlcpy(3) or strncpy(3).

> > Of course, first order of growth ignores second order of growth and so
> > on, which for small inputs can be important.  That is, O(x^3) is bigger
> > than O(x^2), but x3 + x2 can be smaller than 5*x2 for small x.
> > 
> >>
> >> As Paul mentioned, strlcpy is a poor choice for processing strings.\
> >> Could rely on their guidance as they already measured.
> >> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html
> > 
> > Indeed.  I've added important notices in BUGS about it, and recommended
> > against
> 
> Saw glibc have (11) functions listed as a poor choice for string processing

They list many functions as poor choices for string processing.  The
problem is that they list those functions for string processing.  I went
a bit further and de-listed some: We don't list strncpy(3) or strncat(3)
as functions that process strings, but rather as something else.  And
they are actually good functions for processing that something else.

The problem with strlcpy(3) is that it's a function that is designed to
process strings, and being bad at processing strings makes it a bad
function period.

> >> Maybe the strlcpy API is easier, safer for programmers; but the
> >> compiler can't figure out that the programmer already knew src string
> >> length.  So the strlcpy does a strlen() and wastes time reading over
> >> memory.  If the src length is known, can just memcpy.
> > 
> > I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't
> > suffer its problems.  It should be even safer and easier to use, and its
> > first order of growth is better.  I'll send a patch for review in a
> > moment.
> 
> I did take a look at strtcpy but it calls strnlen(), reading over memory.

That's just a few % slower than memcpy(3).  Don't expect memcpy(3) to be
much faster than this.  strtcpy() reads twice writes once; memcpy(3)
reads once writes once.  So you can expect memcpy(3) to be constantly
33% faster (very roughly).

If you implement you own strtcpy() in assembly, maybe you can get
something that's in the single-digit % slower than memcpy(3), similar to
strcpy(3).

> >> When I've benchmarked things, reducing the memory accesses for read,
> >> write boosted performance, also looked at the cycles taken, of course
> >> cache and alignment all play a part too.
> > 
> > If one wants to micro-optimize for their use case, its none of my
> > business.  I provide a function that should be safe and relatively fast
> > for all use cases, which libc doesn't.
> > 
> >> Maybe could suggest in your man page programmers should keep track of
> >> the src size ? - to save the cost of the strlen().
> > 
> > No.  Optimizations are not my business.  Writing good APIs should make
> > these optimizations low value so that they aren't done, except for the
> > most performance-critical programs.
> > 
> > The problem comes when libc doesn't provide anything usable, and the
> > user has no guidance on where to start.  Then, programmers start being
> > clever, usually too clever.  That's why I think the man-pages should go
> > ahead and write wrapper functions such as strtcpy() and stpecpy()
> > aound libc functions; these wrappers should provide a fast and safe
> > starting point for most programs.
> > 
> > It's true that memcpy(3) is the fastest function one can use, but it
> > requires the programmer to be rather careful with the lengths of the
> > strings.  I don't think keeping track of all those little details is
> > what the common programmer should do.
> 
> That's true, high-performance users probably create their own bespoke solutions.
> strtcpy probably takes the src size?

No.  strtcpy() takes the dst size.

ssize_t
strtcpy(char dst[restrict dsize], const char *restrict src, size_t dsize);

This function doesn't care about the src size.  It requires that it's
either a string, or a character array larger than dst.  In both cases,
it means that the internal calculation of slen = strnlen(src, dsize)
will never overrun the buffer, while costing only a small time.


-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-11-18 10:09 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
2023-11-04 19:33 ` Alejandro Colomar
2023-11-04 21:18   ` Jonny Grant
2023-11-05  1:36     ` Alejandro Colomar
2023-11-05 21:16   ` Jonny Grant
2023-11-05 23:31     ` Alejandro Colomar
2023-11-07 11:52       ` Jonny Grant
2023-11-07 13:23         ` Alejandro Colomar
2023-11-07 14:19           ` Jonny Grant
2023-11-07 16:17             ` Alejandro Colomar
2023-11-07 17:00               ` Jonny Grant
2023-11-07 17:20                 ` Alejandro Colomar
2023-11-08  6:18               ` Oskari Pirhonen
2023-11-08  9:51                 ` Alejandro Colomar
2023-11-08  9:59                   ` Thorsten Kukuk
2023-11-08 15:09                     ` Alejandro Colomar
     [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
2023-11-08 15:44                       ` Thorsten Kukuk
2023-11-08 17:26                         ` Adhemerval Zanella Netto
2023-11-08 14:06                   ` Zack Weinberg
2023-11-08 15:07                     ` Alejandro Colomar
2023-11-08 19:45                       ` G. Branden Robinson
2023-11-08 21:35                       ` Carlos O'Donell
2023-11-08 22:11                         ` Alejandro Colomar
2023-11-08 23:31                           ` Paul Eggert
2023-11-09  0:29                             ` Alejandro Colomar
2023-11-09 10:13                               ` Jonny Grant
2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-09 14:06                                   ` catenate vs concatenate Jonny Grant
2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
2023-11-27 15:08                                     ` Alejandro Colomar
2023-11-27 15:13                                       ` Alejandro Colomar
2023-11-27 16:59                                       ` G. Branden Robinson
2023-11-27 18:35                                         ` Zack Weinberg
2023-11-27 23:45                                           ` G. Branden Robinson
2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-09 14:05                                   ` Jonny Grant
2023-11-09 15:04                                     ` Alejandro Colomar
2023-11-08 19:04                   ` DJ Delorie
2023-11-08 19:40                     ` Alejandro Colomar
2023-11-08 19:58                       ` DJ Delorie
2023-11-08 20:13                         ` Alejandro Colomar
2023-11-08 21:07                           ` DJ Delorie
2023-11-08 21:50                             ` Alejandro Colomar
2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
2023-11-08 23:06                                 ` Paul Eggert
2023-11-08 23:28                                   ` DJ Delorie
2023-11-09  0:24                                   ` Alejandro Colomar
2023-11-09 14:11                                   ` Jonny Grant
2023-11-09 14:35                                     ` Alejandro Colomar
2023-11-09 14:47                                       ` Jonny Grant
2023-11-09 15:02                                         ` Alejandro Colomar
2023-11-09 17:30                                           ` DJ Delorie
2023-11-09 17:54                                             ` Andreas Schwab
2023-11-09 18:00                                             ` Alejandro Colomar
2023-11-09 19:42                                             ` Jonny Grant
2023-11-09  7:23                                 ` Oskari Pirhonen
2023-11-09 15:20                                 ` [PATCH v2 1/2] " Alejandro Colomar
2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
2023-11-10  5:47                                   ` Oskari Pirhonen
2023-11-10 10:47                                     ` Alejandro Colomar
2023-11-08  2:12           ` strncpy clarify result may not be null terminated Matthew House
2023-11-08 19:33             ` Alejandro Colomar
2023-11-08 19:40               ` Alejandro Colomar
2023-11-09  3:13               ` Matthew House
2023-11-09 10:26                 ` Jonny Grant
2023-11-09 10:31                 ` Jonny Grant
2023-11-09 11:38                   ` Alejandro Colomar
2023-11-09 12:43                     ` Alejandro Colomar
2023-11-09 12:51                     ` Xi Ruoyao
2023-11-09 14:01                       ` Alejandro Colomar
2023-11-09 18:11                     ` Paul Eggert
2023-11-09 23:48                       ` Alejandro Colomar
2023-11-10  5:36                         ` Paul Eggert
2023-11-10 11:05                           ` Alejandro Colomar
2023-11-10 11:47                             ` Alejandro Colomar
2023-11-10 17:58                             ` Paul Eggert
2023-11-10 18:36                               ` Alejandro Colomar
2023-11-10 20:19                                 ` Alejandro Colomar
2023-11-10 23:44                                   ` Jonny Grant
2023-11-10 19:52                               ` Alejandro Colomar
2023-11-10 22:14                                 ` Paul Eggert
2023-11-11 21:13                                   ` Alejandro Colomar
2023-11-11 22:20                                     ` Paul Eggert
2023-11-12  9:52                                     ` Jonny Grant
2023-11-12 10:59                                       ` Alejandro Colomar
2023-11-12 20:49                                         ` Paul Eggert
2023-11-12 21:00                                           ` Alejandro Colomar
2023-11-12 21:45                                             ` Alejandro Colomar
2023-11-13 23:46                                           ` Jonny Grant
2023-11-17 21:57                                         ` Jonny Grant
2023-11-18 10:12                                           ` Alejandro Colomar [this message]
2023-11-18 23:03                                             ` Jonny Grant
2023-11-10 11:36                           ` Jonny Grant
2023-11-10 13:15                             ` Alejandro Colomar
2023-11-18 23:40                               ` Jonny Grant
2023-11-20 11:56                                 ` Jonny Grant
2023-11-20 15:12                                   ` Alejandro Colomar
2023-11-20 23:08                                     ` Jonny Grant
2023-11-20 23:42                                       ` Alejandro Colomar
2023-11-10 11:23                     ` Jonny Grant
2023-11-09 12:23                 ` Alejandro Colomar
2023-11-09 12:35                   ` Alejandro Colomar
2023-11-10  7:06                   ` Oskari Pirhonen
2023-11-10 11:18                     ` Alejandro Colomar
2023-11-11  7:55                       ` Oskari Pirhonen
2023-11-10 16:06                   ` Matthew House
2023-11-10 17:48                     ` Alejandro Colomar
2023-11-13 15:01                       ` Matthew House
2023-11-11 20:55                     ` Jonny Grant
2023-11-11 21:15                       ` Jonny Grant
2023-11-11 22:36                         ` Alejandro Colomar
2023-11-11 23:19                           ` Alejandro Colomar
2023-11-17 21:46                           ` Jonny Grant
2023-11-18  9:37                             ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-19  0:22                               ` Deri
2023-11-19  1:19                                 ` Alejandro Colomar
2023-11-19  9:29                                   ` Alejandro Colomar
2023-11-19 16:21                                   ` Deri
2023-11-19 20:58                                     ` Alejandro Colomar
2023-11-20  0:46                                       ` G. Branden Robinson
2023-11-20  9:43                                         ` Alejandro Colomar
2023-11-18  9:44                             ` NULL safety " Alejandro Colomar
2023-11-18 23:21                               ` NULL safety Jonny Grant
2023-11-24 22:25                                 ` Alejandro Colomar
2023-11-25  0:57                                   ` Jonny Grant
2023-11-10 10:40               ` strncpy clarify result may not be null terminated Stefan Puiu
2023-11-10 11:06                 ` Jonny Grant
2023-11-10 11:20                 ` Alejandro Colomar
2023-11-12  9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-17 21:43   ` Jonny Grant
2023-11-18  0:25     ` Signing all patches and email to this list Matthew House
2023-11-18 23:24       ` Jonny Grant
2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZViOGeSYdYVRl-Ky@devuan \
    --to=alx@kernel.org \
    --cc=eggert@cs.ucla.edu \
    --cc=jg@jguk.org \
    --cc=linux-man@vger.kernel.org \
    --cc=mattlloydhouse@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox