Differences between man-pages and libc manual safety markings


* Differences between man-pages and libc manual safety markings
@ 2014-10-17 13:26 Michael Kerrisk (man-pages)
       [not found] ` <544118FA.3070003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-10-17 13:26 UTC (permalink / raw)
  To: Peng Haitao
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Alexandre Oliva,
	Carlos O'Donell,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Haitao,

I was comparing some of the MT-Safety markings in man-pages with those in the
glibc manual (https://www.gnu.org/software/libc/manual/html_mono/libc.html),
and I found four cases that seem to contradict each other. Are there errors
in either the man pages or the glibc manual?

==
ctermid.3       MT-Unsafe race:ctermid/!s
	glibc: MT-Safe

man-pages and glibc manual disagree (man-pages seems to be more
precise than glibc).

==
getcwd.3        MT-Safe env
	glibc: MT-Safe

man-pages and glibc manual disagree on "env" (man-pages seems 
to be more precise than glibc).

==
getlogin.3      MT-Unsafe race:cuserid/!string locale
	glibc: MT-Unsafe race:getlogin race:utent sig:ALRM timer locale

man-pages and glibc manual disagree on "race:cuserid/!string" versus
"race:getlogin"

==
regex.3         MT-Safe env
	glibc: MT-Safe locale

man-pages and glibc manual disagree on "env" versus "locale"

==

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <544118FA.3070003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found] ` <544118FA.3070003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-10-20 15:47   ` Carlos O'Donell
       [not found]     ` <CAE2sS1jbGRT4uvBBVAPJkX2Mi4gHG=ii_G713MHhQzyGxO4yyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-10-21  8:31   ` Peng Haitao
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Carlos O'Donell @ 2014-10-20 15:47 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Peng Haitao, Alexandre Oliva,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Torvald Riegel

On Fri, Oct 17, 2014 at 9:26 AM, Michael Kerrisk (man-pages)
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I was comparing some of the MT-Safety markings in man-pages versus the glibc
> manual (https://www.gnu.org/software/libc/manual/html_mono/libc.html)
> I found four cases that seem to contradict. Are there errors in either
> the man pages or in the glibc manual?

What's missing here is detailed analysis notes.

In glibc we added the detailed notes into the comments, and Alex did a
great job maintaining those.

Peng, if you have detailed notes, please provide them so we can
compare to glibc's notes.

> ==
> ctermid.3       MT-Unsafe race:ctermid/!s
>         glibc: MT-Safe
>
> man-pages and glibc manual disagree (man-pages seems to be more
> precise than glibc).

IMO, Alex's original marking is correct.

The code in question is a POSIX stub:
===
char *
ctermid (s)
     char *s;
{
  static char name[L_ctermid];

  if (s == NULL)
    s = name;

  return strcpy (s, "/dev/tty");
}
===

Threads could race to set `s` to point to `name` and it would be fine.

Similarly threads could race to write to characters in `s` and it
would also be fine.

They all copy the same thing into the destination buffer.

It is only unsafe if you can prove the intermediate results of a
pointer copy or strcpy change bytes in the destination string in ways
that make it invalid during the copying.

Lastly, note that `s` is not an opaque type and the user controls it;
we never mark a function unsafe because of a user-controlled buffer. We
expect the user to manage that buffer, otherwise tons of functions
become unsafe.
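
A minimal sketch of the caller-managed buffer case described above. The
helper name `terminal_path` is hypothetical and not part of glibc; it only
illustrates that when each thread passes its own buffer, the shared static
array inside ctermid() is never touched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>   /* ctermid(), L_ctermid */

/* Hypothetical helper (not a glibc function): the caller owns `buf`,
   so concurrent callers with distinct buffers share no state here. */
static char *terminal_path(char buf[L_ctermid])
{
    char *result = ctermid(buf);
    /* POSIX: a non-NULL argument is returned as the result. */
    assert(result == buf);
    return result;
}
```

A thread would typically declare `char buf[L_ctermid];` on its own stack
and call `terminal_path(buf)`; keeping that buffer private is the caller's
job, which is the point made above.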

> ==
> getcwd.3        MT-Safe env
>         glibc: MT-Safe
>
> man-pages and glibc manual disagree on "env" (man-pages seems
> to be more precise than glibc).

In this particular case I again think glibc's notation is correct. I
don't see why `env` is involved in getcwd. Please provide more
detailed rationale.

> ==
> getlogin.3      MT-Unsafe race:cuserid/!string locale
>         glibc: MT-Unsafe race:getlogin race:utent sig:ALRM timer locale
>
> man-pages and glibc manual disagree on "race:cuserid/!string" versus
> "race:getlogin"

Peng or others need to provide more detailed rationale for why they
arrived at this result.

> ==
> regex.3         MT-Safe env
>         glibc: MT-Safe locale
>
> man-pages and glibc manual disagree on "env" versus "locale"

All of the functions in regex touch locales, and therefore we mark
this function `MT-Safe locale` because the `locale` annotations are
defined as being useful to note that MT-Safety is at risk if locale is
modified. Again, functions that modify locales are marked MT-Unsafe
const:locale to indicate that using them would break these functions.

Why is this marked `env`? Is it because the initialization of the
localization information might depend on the environment settings for
the locale? If you can prove that then it might be `MT-Safe env
locale`, but I think the initialization is done via setlocale() and
therefore that function has the appropriate markings (not this one).
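
A sketch of what the `locale` marking means for a caller, assuming the
program sets its locale once before starting any threads and never
changes it afterwards. The helper `matches` is hypothetical; regcomp(),
regexec() and regfree() are the POSIX calls.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 on match, 0 on no match, -1 if the
   pattern does not compile. Each call uses its own regex_t, so the
   only shared state it reads is the process locale. */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under that assumption, several threads can call `matches()` at the same
time. A thread that calls setlocale() while others are inside it breaks
the assumption, which is what the `const:locale` marking on setlocale()
is meant to flag.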

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <CAE2sS1jbGRT4uvBBVAPJkX2Mi4gHG=ii_G713MHhQzyGxO4yyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]     ` <CAE2sS1jbGRT4uvBBVAPJkX2Mi4gHG=ii_G713MHhQzyGxO4yyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-10-21  8:53       ` Peng Haitao
       [not found]         ` <54461F16.2080705-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Peng Haitao @ 2014-10-21  8:53 UTC (permalink / raw)
  To: Carlos O'Donell, Michael Kerrisk (man-pages)
  Cc: Alexandre Oliva,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Torvald Riegel


On 10/20/2014 11:47 PM, Carlos O'Donell wrote:
> On Fri, Oct 17, 2014 at 9:26 AM, Michael Kerrisk (man-pages)
> <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I was comparing some of the MT-Safety markings in man-pages versus the glibc
>> manual (https://www.gnu.org/software/libc/manual/html_mono/libc.html)
>> I found four cases that seem to contradict. Are there errors in either
>> the man pages or in the glibc manual?
> 
> What's missing here is detailed analysis notes.
> 
> In glibc we added the detailed notes into the comments, and Alex did a
> great job maintaining those.
> 
> Peng, if you have detailed notes, please provide them so we can
> compare to glibc's notes.
> 
>> ==
>> ctermid.3       MT-Unsafe race:ctermid/!s
>>         glibc: MT-Safe
>>
>> man-pages and glibc manual disagree (man-pages seems to be more
>> precise than glibc).
> 
> IMO, Alex's original marking is correct.
> 

POSIX says:
The ctermid() function need not be thread-safe if called with a NULL parameter.
The tmpnam() function need not be thread-safe if called with a NULL parameter.


In the glibc manual,
tmpnam() is "MT-Unsafe race:tmpnam/!result"
ctermid() is "MT-Safe"


The code of tmpnam() is:
===
static char tmpnam_buffer[L_tmpnam];

char *tmpnam (char *s)
{
  char tmpbufmem[L_tmpnam];
  char *tmpbuf = s ?: tmpbufmem;

  if (__builtin_expect (__path_search (tmpbuf, L_tmpnam, NULL, NULL, 0), 0))
    return NULL;

  if (__glibc_unlikely (__gen_tempname (tmpbuf, 0, 0, __GT_NOCREATE)))
    return NULL;

  if (s == NULL)
    return (char *) memcpy (tmpnam_buffer, tmpbuf, L_tmpnam);

  return s;
}     
===

The code of ctermid() and cuserid() is similar to that of tmpnam(),
so I think
ctermid() should be "MT-Unsafe race:ctermid/!s".
cuserid() should be "MT-Unsafe race:cuserid/!string locale".

Thanks.

-- 
Best Regards,
Peng

> The code in question is a POSIX stub:
> ===
> char *
> ctermid (s)
>      char *s;
> {
>   static char name[L_ctermid];
> 
>   if (s == NULL)
>     s = name;
> 
>   return strcpy (s, "/dev/tty");
> }
> ===
> 
> Threads could race to set `s` to point to `name` and it would be fine.
> 
> Similarly threads could race to write to characters in `s` and it
> would also be fine.
> 
> They all copy the same thing into the destination buffer.
> 
> It is only unsafe if you can prove the intermediate results of a
> pointer copy or strcpy change bytes in the destination string in ways
> that make it invalid during the copying.
> 
> Lastly, note that `s` is not an opaque type and the user controls it;
> we never mark a function unsafe because of a user-controlled buffer. We
> expect the user to manage that buffer, otherwise tons of functions
> become unsafe.
> 
>> ==
>> getcwd.3        MT-Safe env
>>         glibc: MT-Safe
>>
>> man-pages and glibc manual disagree on "env" (man-pages seems
>> to be more precise than glibc).
> 
> In this particular case I again think glibc's notation is correct. I
> don't see why `env` is involved in getcwd. Please provide more
> detailed rationale.
> 
>> ==
>> getlogin.3      MT-Unsafe race:cuserid/!string locale
>>         glibc: MT-Unsafe race:getlogin race:utent sig:ALRM timer locale
>>
>> man-pages and glibc manual disagree on "race:cuserid/!string" versus
>> "race:getlogin"
> 
> Peng or others need to provide more detailed rationale for why they
> arrived at this result.
> 
>> ==
>> regex.3         MT-Safe env
>>         glibc: MT-Safe locale
>>
>> man-pages and glibc manual disagree on "env" versus "locale"
> 
> All of the functions in regex touch locales, and therefore we mark
> this function `MT-Safe locale` because the `locale` annotations are
> defined as being useful to note that MT-Safety is at risk if locale is
> modified. Again, functions that modify locales are marked MT-Unsafe
> const:locale to indicate that using them would break these functions.
> 
> Why is this marked `env`? Is it because the initialization of the
> localization information might depend on the environment settings for
> the locale? If you can prove that then it might be `MT-Safe env
> locale`, but I think the initialization is done via setlocale() and
> therefore that function has the appropriate markings (not this one).
> 
> Cheers,
> Carlos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <54461F16.2080705-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]         ` <54461F16.2080705-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2014-10-23  6:16           ` Alexandre Oliva
       [not found]             ` <oroat3wbsl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-10-23  6:16 UTC (permalink / raw)
  To: Peng Haitao
  Cc: Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org, Torvald Riegel

On Oct 21, 2014, Peng Haitao <penght-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> wrote:

> The code of ctermid() and cuserid() is similar to that of tmpnam(),

Not quite.

ctermid copies a constant string to the static buffer, so any thread
that calls it with a NULL string will cause the copy to be done, and when it
is complete, nothing different will ever be stored in that buffer, so
the thread that completed the call can safely read from the static
buffer, and expect to get the constant string.

tmpnam does not copy a constant string to the static buffer, so each
call may store a different string in it, so race protection is
necessary.
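
The distinction can be modelled with a simplified, single-threaded sketch.
Both functions below are hypothetical stand-ins, not the glibc sources:
`constant_result` mimics the ctermid(NULL) pattern and `varying_result`
mimics the tmpnam(NULL) pattern. The sketch does not demonstrate a race;
it only shows that the second buffer's content depends on who called last.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified model of the ctermid(NULL) pattern: every call stores
   the same bytes in the static buffer. */
static char *constant_result(void)
{
    static char name[16];
    return strcpy(name, "/dev/tty");
}

/* Simplified model of the tmpnam(NULL) pattern: each call may store
   a different string, so an earlier result is overwritten. */
static char *varying_result(unsigned counter)
{
    static char name[32];
    snprintf(name, sizeof name, "/tmp/file%u", counter);
    return name;
}
```

A pointer obtained from `varying_result(1)` reads "/tmp/file2" after a
later `varying_result(2)`, whereas every pointer from `constant_result()`
reads "/dev/tty" once any call has completed.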

cuserid is somewhere in between: as long as euid doesn't change, and the
same login name is found for that euid, the string will remain the same.
I'm concerned I may have missed the static buffer in the analysis,
though, because I didn't mention it in the comments, and I don't
remember having taken it into account (but then, I don't remember the
analysis of this particular function in any other way ;-).

I'm inclined to decide the glibc manual entry is missing an MT-Unsafe
race:cuserid/!string annotation for cuserid.

As for ctermid, I admit that, strictly speaking, there's a potential
race as defined by POSIX: one thread may be reading from the static
buffer while another is overwriting it with the same bytes, without any
intervening synchronization operation.  I decided that this is a
harmless race.  However, should a machine be unable to perform
char-sized writes, and instead resorted to larger-sized
read-modify-write cycles, then a strcpy implementation that modified one
byte at a time with such cycles could race with itself, even writing the
same sequence of bytes in the same order.  However, if we had such a
machine, and we didn't optimize strcpy to coalesce writes to bytes in
the same word into a single read-modify-write cycle, we'd be failing at
a far more fundamental level.  So I crossed my fingers and hoped we'd
always have such optimizations, so that the potential race would never
become a real problem in this case.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <oroat3wbsl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]             ` <oroat3wbsl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-10-23  9:29               ` Torvald Riegel
       [not found]                 ` <1414056576.8483.79.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-10-23  9:29 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Thu, 2014-10-23 at 04:16 -0200, Alexandre Oliva wrote:
> On Oct 21, 2014, Peng Haitao <penght-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> wrote:
> 
> > The code of ctermid() and cuserid() is similar to that of tmpnam(),
> 
> Not quite.
> 
> ctermid copies a constant string to the static buffer, so any thread
> that calls it with a NULL string will cause the copy to be done, and when it
> is complete, nothing different will ever be stored in that buffer, so
> the thread that completed the call can safely read from the static
> buffer, and expect to get the constant string.
> 
> tmpnam does not copy a constant string to the static buffer, so each
> call may store a different string in it, so race protection is
> necessary.
> 
> 
> cuserid is somewhere in between: as long as euid doesn't change, and the
> same login name is found for that euid, the string will remain the same.
> I'm concerned I may have missed the static buffer in the analysis,
> though, because I didn't mention it in the comments, and I don't
> remember having taken it into account (but then, I don't remember the
> analysis of this particular function in any other way ;-).
> 
> I'm inclined to decide the glibc manual entry is missing an MT-Unsafe
> race:cuserid/!string annotation for cuserid.
> 
> As for ctermid, I admit that, strictly speaking, there's a potential
> race as defined by POSIX: one thread may be reading from the static
> buffer while another is overwriting it with the same bytes, without any
> intervening synchronization operation.  I decided that this is a
> harmless race.

I don't think it's easy to classify something as a harmless race.  If
you violate the data-race-freedom assumption of an implementation, or of
the compiler, you're really making assumptions about those
implementations.

In this case, you must make assumptions about strcpy's implementation to
prove that this race condition won't trigger an error in any execution.

> However, should a machine be unable to perform
> char-sized writes, and instead resorted to larger-sized
> read-modify-write cycles, then a strcpy implementation that modified one
> byte at a time with such cycles could race with itself, even writing the
> same sequence of bytes in the same order.  However, if we had such a
> machine, and we didn't optimize strcpy to coalesce writes to bytes in
> the same word into a single read-modify-write cycle, we'd be failing at
> a far more fundamental level.

A strcpy implementation can assume that no other thread is observing the
data during the execution of the function.  Thus, it would be allowed to
write intermediate results.  For example, it could be allowed to write
the whole string, but garbage for the terminating zero, and then fix up
the zero afterwards.  It could also use funny SIMD instructions or such
that don't work like normal memory accesses in a concurrent setting.
And so on.
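
To make the terminator example concrete, here is a deliberately odd copy
routine. It is hypothetical and not taken from any glibc strcpy; the name
`odd_strcpy` is made up. It satisfies the sequential contract (on return,
the destination holds a copy of the source) while passing through an
intermediate state that a concurrent reader must not see.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical copy routine, not from glibc. As written, dst has no
   terminator between the first and the last store. A compiler may
   remove the filler stores, which is also permitted. */
static char *odd_strcpy(char *dst, const char *src)
{
    size_t n = strlen(src);

    memset(dst, '?', n + 1);   /* intermediate state: filler, no '\0' */
    memcpy(dst, src, n);       /* payload bytes */
    dst[n] = '\0';             /* terminator fixed up last */
    return dst;
}
```

Called from a single thread, `odd_strcpy(buf, "/dev/tty")` cannot be told
apart from strcpy(). Two threads running it on the same buffer could each
observe the other's filler bytes.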

So, to make ctermid safe we would have to put these constraints on the
implementation, compiler, etc.  While I would also guess that none of
the above seems likely on current systems, I don't think we can actually
track and maintain such requirements -- because they're hacks.

This would also be detected by data race detectors, so they would
complain about an MT-Safe function.

I think there's also hardware being designed on which synchronizing
loads/stores differ from nonsynchronizing ones.  glibc doesn't run on
such hardware today, but if we want it to at some point, this would be
an issue.

> So I crossed my fingers and hoped we'd
> always have such optimizations, so that the potential race would never
> become a real problem in this case.

I think we should be much more conservative here.  Having to cross
fingers is not something I want to have to rely on :)


^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <1414056576.8483.79.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                 ` <1414056576.8483.79.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-10-24 11:48                   ` Alexandre Oliva
       [not found]                     ` <or38adofh9.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-10-24 11:48 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org

On Oct 23, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> I don't think it's easy to classify something as a harmless race.

In general, I'd agree with you.

But writing the same data onto the same memory range, with code entirely
under our control that completes execution on a thread before returning
control to any potential reader that can access the data, is such a case IMHO.

> In this case, you must make assumptions about strcpy's implementation

I did, and I'm quite comfortable with them.

Do you have any evidence that they don't hold, or that they might not
hold, or are you just making wild speculations about compliant but
entirely nonsensical implementations of strcpy that we'd likely never
bring into glibc?

> It could also use funny SIMD instructions or such that don't work like
> normal memory accesses in a concurrent setting.

As long as they don't write outside the memory area of the static char[]
where we're to store the constant string forever, they should still be
safe, because callers can only get to the data after their own thread
finishes writing to the string.  And if the caller wishes to pass the
string on to another thread, then it must ensure the transfer is
properly synchronized.

So the only potentially dangerous case really is that of strcpy writing
intermediate nonsense, or the case I discussed in my previous email, of
larger-than-byte read-modify-write cycles that pick up uninitialized
fragments and then, after another thread initializes those fragments,
overwrite parts of the same word with the uninitialized fragments they
read before.

> I think there's also hardware being designed on which synchronizing
> loads/stores differ from nonsynchronizing ones.

It *still* wouldn't be a problem.  A reader only gets a chance to read
after its own writer completed (over?)writing the memory area with the
bits that shall remain there forever.

Given this more detailed explanation of the conditions that apply and
that IMHO make it perfectly safe, do you still see any concrete error
situation here?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <or38adofh9.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                     ` <or38adofh9.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-10-24 12:12                       ` Torvald Riegel
       [not found]                         ` <1414152747.18538.26.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  2014-10-24 12:14                       ` Torvald Riegel
  1 sibling, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-10-24 12:12 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, 2014-10-24 at 09:48 -0200, Alexandre Oliva wrote:
> On Oct 23, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > I don't think it's easy to classify something as a harmless race.
> 
> In general, I'd agree with you.
> 
> But writing the same data onto the same memory range, with code entirely
> under our control that completes execution on a thread before returning
> control to any potential reader that can access the data, is such a case IMHO.

The contract for a normal sequential function is that there must be a
certain state or output *after* it has completed execution.  There is no
guarantee whatsoever about what happens during its execution -- you only
get this for concurrent specifications, to some extent.

> > In this case, you must make assumptions about strcpy's implementation
> 
> I did, and I'm quite comfortable with them.

But did you at the very least document those assumptions on all the
strcpy implementations?  If not, nothing warns anyone working on those
implementations.

> Do you have any evidence that they don't hold, or that they might not
> hold, or are you just making wild speculations about compliant but
> entirely nonsensical implementations of strcpy that we'd likely never
> bring into glibc?

Why do you think that they are nonsensical?  strcpy is a sequential
function, so as long as it doesn't touch memory outside of what it is
supposed to access, and as long as the state/output matches its
contract when it returns, then the implementation is free to do what it
thinks works best.

> > It could also use funny SIMD instructions or such that don't work like
> > normal memory accesses in a concurrent setting.
> 
> As long as they don't write outside the memory area of the static char[]
> where we're to store the constant string forever, they should still be
> safe, because callers can only get to the data after their own thread
> finishes writing to the string.

I agree that they will not see state before the execution of any of the
concurrent strcpys, and I never said that.  The point is that they can
see intermediate writes of other threads, which are allowed to be
anything.

To put it abstractly: Just because the sequential composition of two
strcpy's copying the same string to the same location is as if the two
strcpy's were idempotent wrt. each other, it doesn't mean that
concurrent execution provides the same guarantees. 

> And if the caller wishes to pass the
> string on to another thread, then it must ensure the transfer is
> properly synchronized.

That's not the point.

> So the only potentially dangerous case really is that of strcpy writing
> intermediate nonsense, or the case I discussed in my previous email, of
> larger-than-byte read-modify-write cycles that pick up uninitialized
> fragments and then, after another thread initializes those fragments,
> overwrite parts of the same word with the uninitialized fragments they
> read before.
> 
> 
> > I think there's also hardware being designed on which synchronizing
> > loads/stores differ from nonsynchronizing ones.
> 
> It *still* wouldn't be a problem.  A reader only gets a chance to read
> after its own writer completed (over?)writing the memory area with the
> bits that shall remain there forever.

The hardware requires synchronizing accesses, and just the mere presence
of a data race may lead to undefined behavior of the program.  We
typically don't have this on current CPUs, where individual loads/stores
are basically atomic, or at least are a combination of the individual
bytes stored concurrently.  But if you bring in a GPU whose firmware, or
the driver, is actually a compiler that may do whole-program
optimization, things look different.

> 
> Given this more detailed explanation of the conditions that apply and
> that IMHO make it perfectly safe, do you still see any concrete error
> situation here?

Yes.  We can make the trade-off that it's safe *if* in turn, we put the
required assumptions (and check them) on all strcpy implementations.
But if we don't do the latter, then we're introducing a fault, and even
if it may not lead to errors in the present, it's still a fault we're
adding.  I don't see any point in digging our own bug grave, even if
this one here is just a part of it.

So, in my opinion, this should either be unsafe (which would be easy --
is there a real benefit to have it be safe?), or we make it safe and
document the trade-off, and document the constraints on all strcpy
implementations so that future implementers are aware of it.


^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <1414152747.18538.26.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                         ` <1414152747.18538.26.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-10-24 16:31                           ` Alexandre Oliva
       [not found]                             ` <orioj9bfaa.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-10-24 16:31 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org

On Oct 24, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Fri, 2014-10-24 at 09:48 -0200, Alexandre Oliva wrote:
>> On Oct 23, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> 
>> > I don't think it's easy to classify something as a harmless race.
>> 
>> In general, I'd agree with you.
>> 
>> But writing the same data onto the same memory range, with code entirely
>> under our control that completes execution on a thread before returning
>> control to any potential reader that can access the data, is such a case IMHO.

> The contract for a normal sequential function is that there must be a
> certain state or output *after* it has completed execution.

We're not talking about the strcpy contract in the abstract.  We're
talking about implementations we control, and that ought to be as
efficient as possible because it's such a frequently used function.
Writing intermediate states is not something such a function would want
to do.

> There is no
> guarantee whatsoever about what happens during its execution

There are plenty of guarantees in the existing implementations.
Admittedly, I didn't look at all of them, but logic tells me they won't
add writes to memory just because they can just so as to feed your FUD.

> But did you at the very least document those assumptions on all the
> strcpy implementations?

No.  I rather derived my reasoning from a perfectly reasonable
requirement we already place on any strcpy implementation we use: that
it shouldn't do more work than what is expected of it.

> Why do you think that they are nonsensical?  strcpy is a sequential
> function, so as long as it doesn't touch memory outside of what it is
> supposed to access, and as long as the state/output matches its
> contract when it returns, then the implementation is free to do what it
> thinks works best.

Sorry, no, we're not living in a purely theoretical world.  We're living
in a world of real and efficient implementations of strcpy.  They don't
waste cycles doing useless work such as writing garbage before writing
what is expected of them.

> The point is that they can see intermediate writes of other threads

Which, except for the case I mentioned, are writes of the same data that
was already there.  What's the problem with that, again?

> To put it abstractly: Just because the sequential composition of two
> strcpy's copying the same string to the same location is as if the two
> strcpy's were idempotent wrt. each other, it doesn't mean that
> concurrent execution provides the same guarantees. 

Will you please come down to Earth and have a look at actual strcpy
implementations we ship?

>> > I think there's also hardware being designed on which synchronizing
>> > loads/stores differ from nonsynchronizing ones.
>> 
>> It *still* wouldn't be a problem.  A reader only gets a chance to read
>> after its own writer completed (over?)writing the memory area with the
>> bits that shall remain there forever.

> The hardware requires synchronizing accesses, and just the mere presence
> of a data race may lead to undefined behavior of the program.  We
> typically don't have this on current CPUs, where individual loads/stores
> are basically atomic, or at least are a combination of the individual
> bytes stored concurrently.  But if you bring in a GPU whose firmware, or
> the driver, is actually a compiler that may do whole-program
> optimization, things look different.

If we get there, we can change this function to use a pre-initialized
static buffer and skip the strcpy altogether if the user doesn't supply
a buffer.
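
A minimal sketch of the alternative described above, under stated
assumptions: `sketch_ctermid` and `SKETCH_L_CTERMID` are hypothetical names
used only for illustration, and "/dev/tty" is assumed as the terminal path.
It is not the glibc implementation. The static buffer is initialized at
load time, so the NULL path performs no writes at all and concurrent
callers have nothing to race on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size: room for "/dev/tty" plus the terminating NUL. */
#define SKETCH_L_CTERMID 9

/* Hypothetical stand-in for ctermid(); not the glibc code. */
static char *sketch_ctermid(char *s)
{
    /* Initialized before any thread runs; never written by this function. */
    static char name[SKETCH_L_CTERMID] = "/dev/tty";

    if (s == NULL)
        return name;            /* no strcpy, hence no concurrent writes */

    /* The caller owns s and is responsible for not sharing it. */
    return strcpy(s, name);
}
```

With this shape, every call with a NULL argument returns the same pointer
to the same unchanging bytes, and only the caller-supplied buffer is ever
written.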

> Yes.  We can make the trade-off that it's safe *if* in turn, we put the
> required assumptions (and check them) on all strcpy implementations.

If you think that necessary, would you please submit a patch to all
implementations of such performance-critical micro functions indicating
they must not do needless work such as writing garbage before writing
what they're told to write at a certain portion of memory?

That would be totally redundant IMHO, and it should probably be in
pretty much every glibc source file, but if it makes you happy, I
wouldn't oppose that.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <orioj9bfaa.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                             ` <orioj9bfaa.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-10-24 19:15                               ` Torvald Riegel
       [not found]                                 ` <1414178101.18538.53.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  2014-10-27 20:46                               ` Mark Thompson
  1 sibling, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-10-24 19:15 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, 2014-10-24 at 14:31 -0200, Alexandre Oliva wrote:
> On Oct 24, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Fri, 2014-10-24 at 09:48 -0200, Alexandre Oliva wrote:
> >> On Oct 23, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> >> 
> >> > I don't think it's easy to classify something as a harmless race.
> >> 
> >> In general, I'd agree with you.
> >> 
> >> But writing the same data onto the same memory range with code under our
> >> entire control that completes execution on any thread before returning
> >> control to any potential reader can access it is such a case IMHO.
> 
> > The contract for a normal sequential function is that there must be a
> > certain state or output *after* it has completed execution.
> 
> We're not talking about the strcpy contract in the abstract.

When somebody works on a strcpy implementation, the strcpy contract is
the law.

> We're
> talking about implementations we control, and that ought to be as
> efficient as possible because it's such a frequently used function.
> Writing intermediate states is not something such a function would want
> to do.

Please.  We're talking corner cases here.  Just because there's
obviously bogus stuff that shares a property with a corner case doesn't
mean that no corner cases would ever arise in practice.

> > There is no
> > guarantee whatsoever about what happens during its execution
> 
> There are plenty of guarantees in the existing implementations.
> Admittedly, I didn't look at all of them,

So you looked at some.  Did you look at future ones?  Is your future
self guaranteed to tell other future implementers about the assumptions
you make?

> but logic tells me they won't
> add writes to memory just because they can just so as to feed your FUD.

Please try to understand the difference between being conservative,
defensive, or just mindful of future people working on glibc -- and FUD.
Trying to not create a potential future problem is not FUD.

> > But did you at the very least document those assumptions on all the
> > strcpy implementations?
> 
> No.

So how should people implementing strcpy differently, in the future,
have any idea about your reasoning?

> I rather derived my reasoning from a perfectly reasonable
> requirement we already place on any strcpy implementation we use: that
> it shouldn't do more work than what is expected of it.

And that's not what needs to happen.  Doing extra, intermediate writes
could be one reason.  But preventing that doesn't prevent the problem in
general.  See my remarks about GPUs etc.

> > Why do you think that they are nonsensical?  strcpy is a sequential
> > function, so as long as it doesn't touch memory outside of what it is
> > supposed to access, and as long as the state/output matches its
> > contract when it returns, then the implementation is free to do what it
> > thinks works best.
> 
> Sorry, no, we're not living in a purely theoretical world.  We're living
> in a world of real and efficient implementations of strcpy.  They don't
> waste cycles doing useless work such as writing garbage before writing
> what is expected of them.

Likewise.  Please, just try to be a bit more conservative in the
assumptions you make about future HW and potential implementations.
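
To make the disputed scenario concrete, here is a hypothetical sketch; it
is not any strcpy that glibc ships, and `two_pass_strcpy` is an invented
name. It meets the sequential contract (the destination holds the source
string when it returns) while still writing a transient state first, which
is the kind of implementation freedom being argued about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strcpy variant: correct for a single caller, but the
   destination is briefly all zero bytes before the real copy lands.
   (An optimizing compiler may drop the memset as a dead store; the
   point is only that the sequential contract does not forbid it.) */
static char *two_pass_strcpy(char *dst, const char *src)
{
    size_t n = strlen(src);

    memset(dst, 0, n + 1);      /* transient state: dst is an empty string */
    memcpy(dst, src, n + 1);    /* final state: dst equals src, with NUL */
    return dst;
}
```

A single-threaded caller cannot tell this apart from a one-pass copy. A
second thread that reads the destination while the first is between the
two passes could observe the empty string, even though both threads copy
the same data.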

> > The point is that they can see intermediate writes of other threads
> 
> Which, except for the case I mentioned, are writes of the same data that
> was already there.  What's the problem with that, again?
> 
> > To put it abstractly: Just because the sequential composition of two
> > strcpy's copying the same string to the same location is as if the two
> > strcpy's were idempotent wrt. each other, it doesn't mean that
> > concurrent execution provides the same guarantees. 
> 
> Will you please come down to Earth and have a look at actual strcpy
> implementations we ship?

This is not the point!  I already said that I would also guess that it
works in practice on current systems.

Also, if someone should have looked at them, then this would be you.

> >> > I think there's also hardware being designed on which synchronizing
> >> > loads/stores differ from nonsynchronizing ones.
> >> 
> >> It *still* wouldn't be a problem.  A reader only gets a chance to read
> >> after its own writer completed (over?)writing the memory area with the
> >> bits that shall remain there forever.
> 
> > The hardware requires synchronizing accesses, and just the mere presence
> > of a data race may lead to undefined behavior of the program.  We
> > typically don't have this on current CPUs, where individual loads/stores
> > are basically atomic, or at least are a combination of the individual
> > bytes stored concurrently.  But if you bring in a GPU whose firmware, or
> > the driver, is actually a compiler that may do whole-program
> > optimization, things look different.
> 
> If we get there, we can change this function to use a pre-initialized
> static buffer and skip the strcpy altogether if the user doesn't supply
> a buffer.

But we won't know!  Because nobody (except you maybe, unless you forgot)
will remember that there is this relationship -- you haven't documented
the constraint anywhere.  And when somebody else implements strcpy, then
she'll think about just strcpy, not the additional requirement you added
without documenting it.

> > Yes.  We can make the trade-off that it's safe *if* in turn, we put the
> > required assumptions (and check them) on all strcpy implementations.
> 
> If you think that necessary, would you please submit a patch to all
> implementations of such performance-critical micro functions indicating
> they must not do needless work such as writing garbage before writing
> what they're told to write at a certain portion of memory?

1.) That is not what we'd have to document.
2.) You should have done that when deciding to rely on this for making
the function MT-Safe.

> That would be totally redundant IMHO, and it should probably be in
> pretty much every glibc source file, but if it makes you happy, I
> wouldn't oppose that.

That's not the point.  Let me try to make another, similar example, to
perhaps make it clearer.

nptl has very little documentation of the precise contracts / semantics
of many of the synchronization mechanisms etc.  So when the people
actively working on it change, all the silent assumptions get lost.
Which leaves us in a state like now where for a lot of things, we don't
really know why they work, under which additional assumptions, etc.  And
it's so much harder to maintain as a result.

Another good example, and one that I've been looking at recently, are
the atomic operations.  If you look closely, you'll see that on the
different archs, some of them vary in terms of the included barriers.
And it's not even an obvious bug, because if you look at just all the
atomics and not all uses of them in glibc, then making those choices is
understandable.  It looks reasonable from an arch maintainer
perspective.  But it's either too few or more barriers than necessary.
And what happens with some of the powerpc or x86 atomic ops is that if
you use them in a way that seems to match what they promise, it won't
work.  Yes, it works in (most of) the current code, but it's
error-prone.  How do we expect to keep this consistent and keep on top
of all this if we're not programming defensively and document properly?

Do you see the similarities?  What did you say about me spreading FUD
again?


[parent not found: <1414178101.18538.53.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                 ` <1414178101.18538.53.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-10-30 18:24                                   ` Alexandre Oliva
       [not found]                                     ` <orbnottnzb.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-10-30 18:24 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org

On Oct 24, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

>> There are plenty of guarantees in the existing implementations.
>> Admittedly, I didn't look at all of them,

> So you looked at some.  Did you look at future ones?

I see that's a rhetorical question.  How could anyone?  How are they
relevant to *current* properties, that I've been asked to document in
glibc?  Do you remember that that very manual states they are not
promises of future behavior of future releases of glibc?

>> > But did you at the very least document those assumptions on all the
>> > strcpy implementations?

>> No.

> So how should people implementing strcpy differently, in the future,
> have any idea about your reasoning?

How could documentation next to current implementations possibly affect
people implementing strcpy from scratch in new architectures?  How am I
supposed to divine what the important features of such future
architectures are so that I can document them?  How would you?

> I already said that I would also guess that it works in practice on
> current systems.

Good.  Then we're in agreement that the current documentation is
accurate, in spite of the potential race that future implementations of
strcpy might introduce.

>> > The hardware requires synchronizing accesses, and just the mere presence
>> > of a data race may lead to undefined behavior of the program.

Sorry, but “undefined behavior” is standardese for “don't do that”.
Real hardware doesn't behave in undefined ways.  Since we're talking
about existing code running on actual hardware, it is always possible to
enumerate what kinds of behaviors may be observed as a consequence of
running certain instruction sequences, with or without interference of
other processors/threads/whatever.  In some cases, like this one, it can
be proven that none of the potential behaviors deviates from that which
is desired.

>> > But if you bring in a GPU whose firmware, or
>> > the driver, is actually a compiler that may do whole-program
>> > optimization, things look different.

Yeah, it can just optimize away the strcpy and use the original string,
since it's not modified by the caller and its address is not compared
with anything else.  Way to go!  And still safe.

> But we won't know!

Enters your hypothetical race detector for the hypothetical future
architecture to report the hypothetical race introduced by the
hypothetical strcpy.

> How do we expect to keep this consistent and keep on top of all this
> if we're not programming defensively and document properly?

I agree it might be useful to have some part of the manual, or some
other piece of internal documentation, where we document assumptions
guiding implementation decisions.  We'll have a maintenance challenge in
keeping them accurate over time, but I guess they might be better than
nothing.  Rather than documenting properties of one specific (per-arch,
per-ABI, whatever) implementation in its own file, it would be a
reference point for anyone starting a new arch from scratch.

Maybe you can use whatever pearls you find while going through my notes
in the manual to start such a section with information you might find
useful.

I say you, singular, because, well, you pose as having a much better
understanding of the issues than I do, so you couldn't possibly disagree
that you're far better suited to do this job than I am.

> Do you see the similarities?

Not much, really.  IIUC, you described cases in which low-level
implementations didn't offer standard-imposed guarantees that users
expected and relied on, whereas what we are speaking of here is
implementations going beyond standard-imposed guarantees, and other
implementation internals relying on these (observable but undocumented)
guarantees to offer safety properties that, per the standard, need not
even be offered in the first place.  I wish I could say this reliance
led to better performance too, but in this case, it doesn't.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <orbnottnzb.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                     ` <orbnottnzb.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-10-30 19:01                                       ` Torvald Riegel
       [not found]                                         ` <1414695671.10085.180.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-10-30 19:01 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Thu, 2014-10-30 at 16:24 -0200, Alexandre Oliva wrote:
> On Oct 24, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> >> There are plenty of guarantees in the existing implementations.
> >> Admittedly, I didn't look at all of them,
> 
> > So you looked at some.  Did you look at future ones?
> 
> I see that's a rhetorical question.  How could anyone?

Yep it's a trick question, to some extent.  The point I'm trying to make
is that future implementations are constrained by the contract of the
function, including the MT-Safety guarantees you give.  At the very
least it's a big heads-up to any future implementation, right?

> How are they
> relevant to *current* properties, that I've been asked to document in
> glibc?  Do you remember that that very manual states they are not
> promises of future behavior of future releases of glibc?

I would have hoped that another goal had been to also come up with
documentation that can be easily maintained and adapted when
implementations change.  I agree that the annotations are no promise of
future behavior.  But that does not mean that they should bit-rot
quickly because it's unclear whether they still hold when anything was
changed, like a strcpy implementation that still perfectly fits the
strcpy contract and MT-Safety annotations and comments.

> >> > But did you at the very least document those assumptions on all the
> >> > strcpy implementations?
> 
> >> No.
> 
> > So how should people implementing strcpy differently, in the future,
> > have any idea about your reasoning?
> 
> How could documentation next to current implementations possibly affect
> people implementing strcpy from scratch in new architectures?

Well, they look at other code, you put the constraint in the manual as
comments, or anywhere else where it makes future contributors notice it.
You can't put the comment on a nonexisting future implementation, but
there are certainly other ways to do it.

> How am I
> supposed to divine what the important features of such future
> architectures are so that I can document them?  How would you?

You stick to the contract of a function.  In the strcpy example, you say
that the string data is required to not be subject to concurrent
accesses.

> > I already said that I would also guess that it works in practice on
> > current systems.
> 
> Good.  Then we're in agreement that the current documentation is
> accurate, in spite of the potential race that future implementations of
> strcpy might introduce.
> 
> 
> >> > The hardware requires synchronizing accesses, and just the mere presence
> >> > of a data race may lead to undefined behavior of the program.
> 
> Sorry, but “undefined behavior” is standardese for “don't do that”.

It's don't do that for a reason, not just don't do that and you'll be
fine.

> Real hardware doesn't behave in undefined ways.  Since we're talking
> about existing code running on actual hardware, it is always possible to
> enumerate what kinds of behaviors may be observed as a consequence of
> running certain instruction sequences, with or without interference of
> other processors/threads/whatever.  In some cases, like this one, it can
> be proven that none of the potential behaviors deviates from that which
> is desired.
> 
> >> > But if you bring in a GPU whose firmware, or
> >> > the driver, is actually a compiler that may do whole-program
> >> > optimization, things look different.
> 
> Yeah, it can just optimize away the strcpy and use the original string,
> since it's not modified by the caller and its address is not compared
> with anything else.  Way to go!  And still safe.

Do you see that this is just what you would be hoping for, but not all
that the compiler would be allowed to do?

> > But we won't know!
> 
> Enters your hypothetical race detector for the hypothetical future
> architecture to report the hypothetical race introduced by the
> hypothetical strcpy.

Please try to understand the issue.  Maybe you can't imagine someone
building glibc with ThreadSanitizer or such, but that doesn't mean it
won't happen.

> > How do we expect to keep this consistent and keep on top of all this
> > if we're not programming defensively and document properly?
> 
> I agree it might be useful to have some part of the manual, or some
> other piece of internal documentation, where we document assumptions
> guiding implementation decisions.  We'll have a maintenance challenge in
> keeping them accurate over time, but I guess they might be better than
> nothing.  Rather than documenting properties of one specific (per-arch,
> per-ABI, whatever) implementation in its own file, it would be a
> reference point for anyone starting a new arch from scratch.
> 
> Maybe you can use whatever pearls you find while going through my notes
> in the manual to start such a section with information you might find
> useful.
> 
> I say you, singular, because, well, you pose as having a much better
> understanding of the issues than I do, so you couldn't possibly disagree
> that you're far better suited to do this job than I am.

How is that supposed to work if you haven't documented all the
assumptions you made (i.e., if ctermid is not just an outlier)?  Should
I review all the code again?  I offered help in working on a way to
define the MT-Safety annotations thoroughly and in a hopefully more
future-proof way before you started to work -- but you didn't seem
interested back then.

You said elsewhere in the thread that you have no other information than
what's in the manual.  If you have anything else, in general, that might
hint at what assumptions you made beyond the contracts of functions
(e.g., assumptions about reasonable implementations) but didn't make
explicit in the manual, please share this.


[parent not found: <1414695671.10085.180.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                         ` <1414695671.10085.180.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-11-01  8:48                                           ` Alexandre Oliva
       [not found]                                             ` <ora94b8fxl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-01  8:48 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org

On Oct 30, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Thu, 2014-10-30 at 16:24 -0200, Alexandre Oliva wrote:
>> >> > The hardware requires synchronizing accesses, and just the mere presence
>> >> > of a data race may lead to undefined behavior of the program.
>> 
>> Sorry, but “undefined behavior” is standardese for “don't do that”.

> It's don't do that for a reason, not just don't do that and you'll be
> fine.

Yup.  But remember, it's users of the standard we implement that are not
supposed to do that.  We can and often do get away with such stuff as
part of the implementation of the standard.  There's a long history of
doing so: remember when we implemented mutexes without standard atomics?
Nowadays, you might look at them and find those implementations
disgusting, and think “how the heck did nobody think of documenting the
reliance of this code on certain memory model properties that no longer
hold?”  The obvious answer is that, back then, such properties were
perfectly normal and they had no idea something else might take over in
the distant future.  The point I'm trying to make is that there's only
so much future-proofness you can put into this sort of documentation.
The most difficult bits to document are not those that are surprising as
of the writing of the docs, but those that are blatantly obvious to
pretty much anyone at that time, but that over time become surprising.
Historians run into lots of walls related to this sort of implicit
knowledge of a time.

> Please try to understand the issue.

I do.  It's very clear to me.  You wanted and hoped me to do a lot of
work that I was not supposed to do, and I didn't do it, in part because
I have little hope of seeing the future as well as you claim to be able
to.

> How is that supposed to work if you haven't documented all the
> assumptions you made (i.e., if ctermid is not just an outlier)?

It is, but if I were to document all the assumptions I made, I'd have to
write several books of assumptions, encoding all the knowledge I've
accumulated about how past and present hardware architectures work, any
one of which might change in future architectures.

> Should I review all the code again?

Sure!  Clearly you're not happy with the annotations and comments I
made, so I don't see another way to go about it.

> you didn't seem interested back then.

There's a huge difference between being interested and being pragmatic
about fulfilling one's assignment from up the management chain, rather
than doing somebody else's work while slowing down one's own.

> You said elsewhere in the thread that you have no other information than
> what's in the manual

... and what we covered in earlier discussions.

> what assumptions you made beyond the contracts of functions

Heh.  It's almost funny that you talk about the contracts of functions,
when you yourself claimed their definitions were not clear enough to
figure out what the precise requirements were.  Please make up your mind
about that point before wasting more of my time, will you?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <ora94b8fxl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                             ` <ora94b8fxl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-11-01 10:47                                               ` Torvald Riegel
       [not found]                                                 ` <1414838867.10085.431.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-11-01 10:47 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Sat, 2014-11-01 at 06:48 -0200, Alexandre Oliva wrote:
> On Oct 30, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Thu, 2014-10-30 at 16:24 -0200, Alexandre Oliva wrote:
> >> >> > The hardware requires synchronizing accesses, and just the mere presence
> >> >> > of a data race may lead to undefined behavior of the program.
> >> 
> >> Sorry, but “undefined behavior” is standardese for “don't do that”.
> 
> > It's don't do that for a reason, not just don't do that and you'll be
> > fine.
> 
> Yup.  But remember, it's users of the standard we implement that are not
> supposed to do that.  We can and often do get away with such stuff as
> part of the implementation of the standard.  There's a long history of
> doing so: remember when we implemented mutexes without standard atomics?
> Nowadays, you might look at them and find those implementations
> disgusting, and think “how the heck did nobody think of documenting the
> reliance of this code on certain memory model properties that no longer
> hold?”  The obvious answer is that, back then, such properties were
> perfectly normal and they had no idea something else might take over in
> the distant future.  The point I'm trying to make is that there's only
> so much future-proofness you can put into this sort of documentation.

I don't disagree with this in general.  But in the concrete case we're
talking about, it's really not that hard.  Does strcpy need to consider
that there are concurrent accesses and that it has to achieve
something under concurrent execution, or does it not?

It's not surprising that this matters today (ie, when you made the
choices), and it's not like we've been aware of this since just
yesterday.

> The most difficult bits to document are not those that are surprising as
> of the writing of the docs, but those that are blatantly obvious to
> pretty much anyone at that time, but that over time become surprising.
> Historians run into lots of walls related to this sort of implicit
> knowledge of a time.

That's why I'm arguing for being conservative: Be a little cautious with
what you consider obvious.  I definitely agree that one can't be perfect
with that but, for example, it's a clear difference whether you
rely on an additional implementation property or can just rely on
the sequential contract of the function.

> > Please try to understand the issue.
> 
> I do.  It's very clear to me.  You wanted and hoped me to do a lot of
> work that I was not supposed to do, and I didn't do it, in part because
> I have little hope of seeing the future as well as you claim to be able
> to.

I didn't ask you to know about everything that happens in the future.
Because that will be hard, as you say.  But the result of this is that
it helps to be cautious when making assumptions about things that may
easily change in the future and that you can't predict.

IOW, when you can't easily predict future implementations, be
conservative when making assumptions about them.  Or at least document
that.

> > How is that supposed to work if you haven't documented all the
> > assumptions you made (i.e., if ctermid is not just an outlier)?
> 
> It is, but if I were to document all the assumptions I made, I'd have to
> write several books of assumptions, encoding all the knowledge I've
> accumulated about how past and present hardware architectures work, any
> one of which might change in future architectures.

I don't think it's that hard.  Coming back to the "being conservative"
point, if you feel like you have to write a book about the assumptions
you make that your code (or your documentation, annotations, ...) rely
on, then maybe it's better to take a step back and not make those
assumptions in the first place.

In our case here, if you feel like what you require from the strcpy
implementation is very complex, perhaps just don't make the requirement
and tag ctermid as unsafe?

Or, don't go for specifying assumptions about strcpy in the ctermid
docs, but rather try to solve it at the other end by documenting that
strcpy has to work well under concurrent execution, in particular under
concurrent but "idempotent" copies to a memory range.

> > what assumptions you made beyond the contracts of functions
> 
> Heh.  It's almost funny that you talk about the contracts of functions,
> when you yourself claimed their definitions were not clear enough to
> figure out what the precise requirements were.  Please make up your mind
> about that point before wasting more of my time, will you?

I never said that the sequential contract of strcpy would be incomplete
or wrong in some way.  I said that the MT-Safety definition needs
improvement.
When you make the assumption that it has to work even under concurrent
accesses to the destination string, you go beyond the sequential
contract of the function.  You specified it as MT-Safe, but in your
specification that means that the caller-provided data is supposed to be
protected from concurrent accesses by the caller.  So, your assumption
still conflicts with the contract when taking MT-Safe docs into account.
Thus, what I said was not inconsistent.


[parent not found: <1414838867.10085.431.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                 ` <1414838867.10085.431.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-11-01 18:32                                                   ` Alexandre Oliva
       [not found]                                                     ` <orwq7e22n2.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-01 18:32 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org

On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> It's not surprising that this matters today (ie, when you made the
> choices), and it's not like we've been aware of this since just
> yesterday.

> That's why I'm arguing for being conservative

That goes both ways.  While strcpy coded for current standards might
wish to make such optimizations, old code written for earlier standards
that did not make allowances for the proposed strcpy optimization would
break.  So we have to be conservative in strcpy to avoid breaking valid
old programs (per the standards they were written for), and this implies
not making the proposed optimization, which brings us back to the
conclusion that the ctermid(NULL) implementation is MT-Safe.  And
AS-Safe, too.

> it helps to be cautious when making assumptions about things that may
> easily change in the future and that you can't predict.

Per the above, this one property of strcpy is not one that can *easily*
change.  Quite the opposite.  It takes a lot of wording contortionism to
make writing garbage fit into the strcpy contract even under current
standards.

> In our case here, if you feel like what you require from the strcpy
> implementation is very complex

I don't.  The requirements are the common requirements that apply to all
historical standards that have specified strcpy.  Nothing beyond that.
Now that's not much of a strong or surprising assumption, is it?

> Or, don't go for specifying assumptions about strcpy in the ctermid
> docs, but rather try to solve it at the other end by documenting that
> strcpy has to work well under concurrent execution, in particular under
> concurrent but "idempotent" copies to a memory range.

My take is that requirement is already coded in early C standards.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <orwq7e22n2.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                     ` <orwq7e22n2.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-11-01 18:58                                                       ` Torvald Riegel
       [not found]                                                         ` <1414868298.10085.488.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-11-01 18:58 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Sat, 2014-11-01 at 16:32 -0200, Alexandre Oliva wrote:
> On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > It's not surprising that this matters today (ie, when you made the
> > choices), and it's not like we've been aware of this since just
> > yesterday.
> 
> > That's why I'm arguing for being conservative
> 
> That goes both ways.  While strcpy coded for current standards might
> wish to make such optimizations, old code written for earlier standards
> that did not make allowances for the proposed strcpy optimization would
> break.

Which earlier standard are you referring to?  Did any old standard have
strcpy as something else than a sequentially-specified function?

> So we have to be conservative in strcpy to avoid breaking valid
> old programs (per the standards they were written for), and this implies
> not making the proposed optimization, which brings us back to the
> conclusion that the ctermid(NULL) implementation is MT-Safe.  And
> AS-Safe, too.
> 
> > it helps to be cautious when making assumptions about things that may
> > easily change in the future and that you can't predict.
> 
> Per the above, this one property of strcpy is not one that can *easily*
> change.  Quite the opposite.  It takes a lot of wording contortionism to
> make writing garbage fit into the strcpy contract even under current
> standards.

No it does not.  Sequential specifications for functions are before
invocation / after invocation rules -- how the function does it is not
restricted, especially if all it does is write to the destination
string, in some way.  It is not specified as operating on
volatile-qualified data, so the C as-if rule applies.

Same for compilers.  If, for a piece of sequential code, the compiler
can prove that a location will be written to, then sure it is allowed to
write a speculative value early, for example.  There's a reason we have
the volatile qualifier and it's not the default.
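
As a sketch of the kind of transformation meant here (the names slot,
store_direct and store_speculative are made up for illustration, and no
particular compiler is claimed to do exactly this): for sequential code
the two functions below are indistinguishable once they have returned,
even though the second one first stores a value that may not be the
final one.

```c
/* Hypothetical shared location, deliberately not volatile. */
int slot;

/* Straightforward form: exactly one store, of the final value. */
void store_direct(int cond)
{
    if (cond)
        slot = 1;
    else
        slot = 2;
}

/* Transformed form: slot is written on every path anyway, so a
   speculative value is stored first and corrected if needed. */
void store_speculative(int cond)
{
    slot = 2;                   /* may be overwritten just below */
    if (cond)
        slot = 1;
}
```

Only an observer that looks at slot while store_speculative is still
running can tell the two apart.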

> > In our case here, if you feel like what you require from the strcpy
> > implementation is very complex
> 
> I don't.  The requirements are the common requirements that apply to all
> historical standards that have specified strcpy.  Nothing beyond that.
> Now that's not much of a strong or surprising assumption, is it?
> 
> > Or, don't go for specifying assumptions about strcpy in the ctermid
> > docs, but rather try to solve it at the other end by documenting that
> > strcpy has to work well under concurrent execution, in particular under
> > concurrent but "idempotent" copies to a memory range.
> 
> My take is that requirement is already coded in early C standards.
> 

They didn't specify rules for multi-threaded execution, did they?
There's volatile, and the as-if rule, but the latter really allows stuff
like speculative writes.


[parent not found: <1414868298.10085.488.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                         ` <1414868298.10085.488.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-11-03  5:13                                                           ` Alexandre Oliva
       [not found]                                                             ` <or4mug27f7.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-03  5:13 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org

On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Sat, 2014-11-01 at 16:32 -0200, Alexandre Oliva wrote:
>> That goes both ways.  While strcpy coded for current standards might
>> wish to make such optimizations, old code written for earlier standards
>> that did not make allowances for the proposed strcpy optimization would
>> break.

> Which earlier standard are you referring to?

I think going back to C90 would do.

> Did any old standard have strcpy as something else than a
> sequentially-specified function?

Your question needs to be rephrased to take out the “sequentially-”
noise that distorts your reasoning into some kind of tunnel vision.

There's nothing specifically sequential about it, nor is it only
before/after invocation: the standard states what the function does, and
if it does something else that conflicts with the specification and is
observable, it diverges from the specification.  Right?

> especially if all it does is write to the destination
> string, in some way.

The way is not specified, but it does not state that it is to write
something else there before, and doing so is NOT allowed by the as-if
rule.  Consider a function that goes:

  for (;;) {
    extern char buffer[];
    strcpy (buffer, "foo");
    signal (SIGUSR1, testme);
    strcpy (buffer, "fool");
    signal (SIGUSR1, SIG_IGN);
  } 

Now, if the signal handler testme were to inspect buffer[1] (knowing the
only window in which it may be activated is the above, in a
single-threaded program), what values could it possibly find there?
Please justify with quotes from combinations of C and POSIX standards of
the same vintage you can find.  How about buffer[0], and buffer[3]?
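
A self-contained version of the fragment above, as a sketch only: the
buffer size, the seen* variables and the name run_window_once are
assumptions added for illustration.  It uses raise() to deliver the
signal synchronously at one point inside the window, so it shows what a
handler sees between the two copies, not what it might see in the
middle of one.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

char buffer[8];
volatile sig_atomic_t seen0, seen1, seen3;

/* Handler from the example: records what it finds in buffer. */
void testme(int sig)
{
    (void)sig;
    seen0 = buffer[0];
    seen1 = buffer[1];
    seen3 = buffer[3];
}

/* One iteration of the loop, with the signal raised by hand right
   after the handler is installed. */
void run_window_once(void)
{
    strcpy(buffer, "foo");
    signal(SIGUSR1, testme);
    raise(SIGUSR1);             /* handler runs before raise returns */
    strcpy(buffer, "fool");
    signal(SIGUSR1, SIG_IGN);
}
```

An asynchronous SIGUSR1 arriving during the second strcpy is the case
the question is really about, and this sketch does not exercise it.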

> It is not specified as operating on volatile-qualified data

volatile doesn't mean what you appear to suggest it does.

volatile means operations are not to be reordered, combined or otherwise
optimized.  It does not mean writing garbage to a global variable is
allowed just because it's not volatile, any more than it means the as-if
rule allows a function to release a lock on entry and take it again
before returning.  What if printf were to do that on the lock that
controls stdout?  Surely that would be observable, as much as the
contents of the buffer within the signal handler.

> They didn't specify rules for multi-threaded execution, did they?

Exactly!

I don't think you can push the as-if rule as far as you suggest when it
comes to global symbols, or other symbols whose pointers have escaped.
POSIX memory synchronization requirements might make speculative writes
defensible for asynchronicity arising from threads, but from async
signals?

I don't see allowances for that, do you?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <or4mug27f7.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                             ` <or4mug27f7.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-11-03 16:10                                                               ` Torvald Riegel
       [not found]                                                                 ` <1415031006.4531.44.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-11-03 16:10 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Mon, 2014-11-03 at 03:13 -0200, Alexandre Oliva wrote:
> On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Sat, 2014-11-01 at 16:32 -0200, Alexandre Oliva wrote:
> >> That goes both ways.  While strcpy coded for current standards might
> >> wish to make such optimizations, old code written for earlier standards
> >> that did not make allowances for the proposed strcpy optimization would
> >> break.
> 
> > Which earlier standard are you referring to?
> 
> I think going back to C90 would do.
> 
> > Did any old standard have strcpy as something else than a
> > sequentially-specified function?
> 
> Your question needs to be rephrased to take out the “sequentially-”
> noise that distorts your reasoning into some kind of tunnel vision.

Ignoring perhaps signal handlers for a second, is there anything in C90
that is not a sequential program?

> There's nothing specifically sequential about it, nor is it only
> before/after invocation: the standard states what the function does,

Yes.  And how do function definitions work?  You specify before and
after.  Because you couldn't reasonably specify how it's implemented,
nor would you want to.  So it has a precondition, and it has a
postcondition.

Just think about the sorting function analogy I mentioned, will you?

> and
> if it does something else that conflicts with the specification and is
> observable, it diverges from the specification.  Right?

Sure, but we still don't agree what is specified, so this sentence here
doesn't add anything.

> 
> > especially if all it does is write to the destination
> > string, in some way.
> 
> The way is not specified,

Exactly.  Period.  Can it set bit by bit?  Sure it can, why not?

But you would end up with the same kind of "garbage" as you call it.  Or
you could write by adding 1's until it reaches the final value.

> but it does not state that it is to write
> something else there before, and doing so is NOT allowed by the as-if
> rule.  Consider a function that goes:
> 
>   for (;;) {
>     extern char buffer[];
>     strcpy (buffer, "foo");
>     signal (SIGUSR1, testme);
>     strcpy (buffer, "fool");
>     signal (SIGUSR1, SIG_IGN);
>   } 
> 
> 
> Now, if the signal handler testme were to inspect buffer[1] (knowing the
> only window in which it may be activated is the above, in a
> single-threaded program), what values could it possibly find there?
> Please justify with quotes from combinations of C and POSIX standards of
> the same vintage you can find.  How about buffer[0], and buffer[3]?

I find it very telling that you completely ignored the sorting function
analogy I mentioned.  Here is the paragraph again, from my previous
email:

Would you make any assumptions about the stores performed by a sorting
function that is specified to take an array, and return with the array's
elements being sorted?  Would you require it to only write finally
sorted data out, or would you allow it to use the array as scratch space
too?  I guess the latter.  And the same applies to strcpy, you don't
want to restrict whether it copies forwards or backwards, for example.
And you don't have to for sequential code.
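
To make the analogy concrete, here is a minimal sketch, assuming a
plain insertion sort and a made-up observer hook (store_observer,
insertion_sort_observed and count_duplicates are illustrative names,
not an existing API).  The hook is called after every store, and for
input with distinct values it sees states that are neither the input
nor the sorted output.

```c
#include <stddef.h>

/* Hypothetical hook, invoked after every store the sort performs. */
typedef void (*store_observer)(const int *a, size_t n, void *ctx);

void insertion_sort_observed(int *a, size_t n, store_observer obs, void *ctx)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;

        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];    /* shift: one value is now duplicated
                                   and key is absent from the array */
            if (obs)
                obs(a, n, ctx);
            j--;
        }
        a[j] = key;             /* key goes into its slot for this pass */
        if (obs)
            obs(a, n, ctx);
    }
}

/* Example observer: counts snapshots that contain two equal adjacent
   elements, which cannot happen before or after the call when the
   input values are distinct. */
void count_duplicates(const int *a, size_t n, void *ctx)
{
    size_t *count = ctx;

    for (size_t i = 1; i < n; i++) {
        if (a[i] == a[i - 1]) {
            (*count)++;
            return;
        }
    }
}
```

Sorting {3, 1, 2} this way passes through {3, 3, 2} and {1, 3, 3}
before reaching {1, 2, 3}.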

Unless you reply to that, I won't spend further time discussing with
you.

> 
> > It is not specified as operating on volatile-qualified data
> 
> volatile doesn't mean what you appear to suggest it does.
> 
> volatile means operations are not to be reordered, combined or otherwise
> optimized.

Precisely, it means that volatile accesses have to be executed as-if by
the abstract machine.  So, if we have volatile for that, do you see that
some of this won't hold for non-volatile accesses?

> It does not mean writing garbage to a global variable is
> allowed just because it's not volatile, any more than it means the as-if
> rule allows a function to release a lock on entry and take it again
> before returning.  What if printf were to do that on the lock that
> controls stdout?  Surely that would be observable, as much as the
> contents of the buffer within the signal handler.

Those examples don't apply here, and I said before why, and you didn't
dispute that, so I'm not writing that again.

> > They didn't specify rules for multi-threaded execution, did they?
> 
> Exactly!

Good.  Because then the effects of multi-threaded execution are not
specified.  So there are no constraints.


[parent not found: <1415031006.4531.44.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                                 ` <1415031006.4531.44.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-11-04  0:18                                                                   ` Alexandre Oliva
  0 siblings, 0 replies; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-04  0:18 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man@vger.kernel.org

On Nov  3, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Unless you reply to that, I won't spend further time discussing with
> you.

You promise to stop harassing me?  You got yourself a deal!

Thank you thank you thank you!

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

* Re: Differences between man-pages and libc manual safety markings
       [not found]                             ` <orioj9bfaa.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  2014-10-24 19:15                               ` Torvald Riegel
@ 2014-10-27 20:46                               ` Mark Thompson
       [not found]                                 ` <544EAF20.8050509-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>
  1 sibling, 1 reply; 35+ messages in thread
From: Mark Thompson @ 2014-10-27 20:46 UTC (permalink / raw)
  To: Alexandre Oliva, Torvald Riegel
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 24/10/14 17:31, Alexandre Oliva wrote:
>
> We're not talking about the strcpy contract in the abstract.  We're
> talking about implementations we control, and that ought to be as
> efficient as possible because it's such a frequently used function.
> Writing intermediate states is not something such a function would want
> to do.
>

I don't think this assumption is reasonable.  While it is not (to my 
knowledge) violated anywhere in glibc yet, it is not a stretch that a 
correct and useful implementation of strcpy would be written in the near 
future on a currently-existing machine which does violate it.

To offer a concrete example, consider an instruction which allocates a 
cache line without reading it from memory, such as wh64 on alpha, 
tilepro or tilegx.  The point of the instruction is to avoid loading 
part of a cache line if you are going to overwrite all of it, so the 
explicit semantics from the point of view of the user are that the 
contents of the cache line addressed are replaced with unspecified data. 
  Removing the redundant load reduces the memory bandwidth used in large 
copy operations by one third, so this is certainly worth looking at for 
long strcpy.

Now suppose we have such an implementation.  Consider two distinct 
threads copying the same thing which is longer than a cache line into a 
cache-line-aligned buffer, on a uniprocessor machine:

Thread 1:
   Allocate cache line.
   Overwrite whole cache line.
   Return from call.
   Get preempted.

Enough other things happen to flush the whole cache back to memory, 
including the written line.

Thread 2:
   Allocate cache line (the line is not already present in the cache, so 
fill with unspecified data to avoid loading an old value from memory).
   Overwrite the first byte or word, but not the whole line (so it is 
now dirty, and must therefore overwrite the copy in memory if flushed).
   Get preempted.

Thread 1:
   User code reads the result, it's wrong.

This does have very strong alignment constraints so I don't know whether 
anyone would actually bother to write an optimised strcpy like this, but 
it certainly isn't a purely theoretical failing.

Note also that if the strcpy here were replaced with strlen+memcpy, it 
would already be wrong on alpha, tilepro and tilegx with currently 
released glibc, as all have an optimised memcpy implementation using 
this feature.

Regards,

- Mark

P.S.  Even the "no point in writing redundant data" straw man is made of 
surprisingly resilient straw.  Since strcpy will always write at least 
one byte, can you really argue that adding "*dest = 0;" to the beginning 
of a strcpy function is always a bad thing?  It seems to me that it 
actually helps in at least some cases if neither source nor destination 
is already present in cache, because we can then start loading the first 
destination line from memory in parallel with the load of the first 
source line without needing to wait for the data dependency on the source.
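
As a sketch of what such a function could look like (the name
strcpy_early_zero is made up, this is not any existing glibc
implementation, and in plain C a compiler is free to drop the early
store as dead, so a real version would more likely be hand-written
assembly):

```c
/* Hypothetical strcpy variant that touches the destination before
   reading the source. */
char *strcpy_early_zero(char *dest, const char *src)
{
    char *d = dest;

    *dest = 0;                  /* early store: dest is briefly "" */
    while ((*d++ = *src++) != '\0')
        ;                       /* ordinary byte-by-byte copy */
    return dest;
}
```

The result after the call is the same as for strcpy, but a concurrent
reader of a buffer that already held the same string could briefly see
an empty string.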


[parent not found: <544EAF20.8050509-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                 ` <544EAF20.8050509-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>
@ 2014-10-29  8:55                                   ` Alexandre Oliva
       [not found]                                     ` <ork33jqmqe.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-10-29  8:55 UTC (permalink / raw)
  To: Mark Thompson
  Cc: Torvald Riegel, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages), linux-man@vger.kernel.org

On Oct 27, 2014, Mark Thompson <mrt-W77v16wj1OVeoWH0uzbU5w@public.gmane.org> wrote:

> Now suppose we have such an implementation.  Consider two distinct
> threads copying the same thing which is longer than a cache line

"/dev/tty" (the constant string copied in the case at hand) is not
longer than a cache line (right? :-), so while your case is compelling,
it doesn't apply.

> Since strcpy will always write at least one byte, can you really argue
> that adding "*dest = 0;" to the beginning of a strcpy function is
> always a bad thing?

Now, this one is compelling *and* fitting IMHO.

Of course we could rule this out in glibc, but should we?  Maybe not.

So I guess we're better off fixing the implementation of ctermid(NULL)
to return a pointer to a constant string that (per POSIX) must not be
modified by the caller, rather than needlessly copying it to another
buffer.  Then, if/when such a strcpy implementation comes up, we'll be
ready for it ;-)
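
Roughly what I have in mind, as a sketch rather than a patch (the names
ctermid_sketch and tty_name are placeholders, and this is untested
against the actual tree):

```c
#include <string.h>

static const char tty_name[] = "/dev/tty";

/* Hypothetical replacement: the NULL case no longer writes to any
   shared buffer, so there is nothing left to race on. */
char *ctermid_sketch(char *s)
{
    if (s == NULL)
        return (char *) tty_name;   /* caller must not modify it */
    return strcpy(s, tty_name);     /* caller-provided buffer */
}
```

The cast only drops const to match the interface's return type; the
non-NULL case still copies, but into storage the caller owns.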

Thanks,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <ork33jqmqe.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                     ` <ork33jqmqe.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-10-29  9:12                                       ` Torvald Riegel
       [not found]                                         ` <1414573935.18538.74.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-10-29  9:12 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, 2014-10-29 at 06:55 -0200, Alexandre Oliva wrote:
> On Oct 27, 2014, Mark Thompson <mrt-W77v16wj1OVeoWH0uzbU5w@public.gmane.org> wrote:
> 
> > Now suppose we have such an implementation.  Consider two distinct
> > threads copying the same thing which is longer than a cache line
> 
> "/dev/tty" (the constant string copied in the case at hand) is not
> longer than a cache line (right? :-), so while your case is compelling,
> it doesn't apply.

That depends on the alignment of the strings.  It's 9 bytes including
trailing zero...

> > Since strcpy will always write at least one byte, can you really argue
> > that adding "*dest = 0;" to the beginning of a strcpy function is
> > always a bad thing?
> 
> Now, this one is compelling *and* fitting IMHO.
> 
> Of course we could rule this out in glibc, but should we?  Maybe not.
> 
> So I guess we're better off fixing the implementation of ctermid(NULL)
> to return a pointer to a constant string that (per POSIX) must not be
> modified by the caller, rather than needlessly copying it to another
> buffer.  Then, if/when such a strcpy implementation comes up, we'll be
> ready for it ;-)

Yes, we either need to change the implementation, or make it MT-Unsafe
for now.

We should also review all other cases of "benign" race conditions.  As
this example shows, they can be not "benign" without this being easy to
spot.  So, IMO, we should really avoid them unless we have a strong
reason not to.

This will also give fewer false positives when using race detectors.

Alex, when you did the MT Safety review, which other cases of "benign"
race conditions did you see?  It would be useful to revisit those, I
think.




[parent not found: <1414573935.18538.74.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                         ` <1414573935.18538.74.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-10-30 18:00                                           ` Alexandre Oliva
       [not found]                                             ` <orfve5tp3e.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-10-30 18:00 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages), linux-man@vger.kernel.org

On Oct 29, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Wed, 2014-10-29 at 06:55 -0200, Alexandre Oliva wrote:
>> On Oct 27, 2014, Mark Thompson <mrt-W77v16wj1OVeoWH0uzbU5w@public.gmane.org> wrote:
>> 
>> > Now suppose we have such an implementation.  Consider two distinct
>> > threads copying the same thing which is longer than a cache line
>> 
>> "/dev/tty" (the constant string copied in the case at hand) is not
>> longer than a cache line (right? :-), so while your case is compelling,
>> it doesn't apply.

> That depends on the alignment of the strings.

No, sorry.  The alignment of a string that is smaller than a cache line can't
possibly make the string itself bigger than a cache line, and any
padding introduced by alignment, before or after the string, won't make
strcpy copy more bytes than the size of the string.

Maybe you were thinking of straddling over more than one cache line?
That still wouldn't apply: the case Mark described was one in which an
*entire* cache line would be overwritten, so it didn't have to be
fetched from memory.  If a smaller-than-cache-line string is placed
entirely in a single cache line, it will require that line to be
fetched; if it straddles over two cache lines, it requires *both* to be
fetched; if it didn't, or if it straddled over more than two cache
lines, it wouldn't be smaller than the cache line.

>> > Since strcpy will always write at least one byte, can you really argue
>> > that adding "*dest = 0;" to the beginning of a strcpy function is
>> > always a bad thing?
>> 
>> Now, this one is compelling *and* fitting IMHO.
>> 
>> Of course we could rule this out in glibc, but should we?  Maybe not.
>> 
>> So I guess we're better off fixing the implementation of ctermid(NULL)
>> to return a pointer to a constant string that (per POSIX) must not be
>> modified by the caller, rather than needlessly copying it to another
>> buffer.  Then, if/when such a strcpy implementation comes up, we'll be
>> ready for it ;-)

> Yes, we either need to change the implementation, or make it MT-Unsafe
> for now.

We only have to make it MT-Unsafe for now if the scenario above, of
strcpy *always* writing a zero byte to the beginning of the destination
string, were present in any implementation of strcpy in glibc.

Do you see any implementations of strcpy in glibc doing that?

> As this example shows, they can be not "benign" without this being
> easy to spot.

As much as you might want it to be so, it doesn't show any such thing.
The example just shows that changing part of the implementation can make
other parts that rely on it racy, so changes that might affect safety
properties have to be made with a lot of care.

Now that's not much of a surprise, is it?

> Alex, when you did the MT Safety review, which other cases of "benign"
> race conditions did you see?

We've already discussed them in the list where we should have been
discussing this in the first place.

You have access to my notes in the comments added next to the safety
annotations in the manual.  That's all I got.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <orfve5tp3e.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                             ` <orfve5tp3e.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-10-30 18:41                                               ` Torvald Riegel
       [not found]                                                 ` <1414694486.10085.165.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-10-30 18:41 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Thu, 2014-10-30 at 16:00 -0200, Alexandre Oliva wrote:
> On Oct 29, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Wed, 2014-10-29 at 06:55 -0200, Alexandre Oliva wrote:
> >> On Oct 27, 2014, Mark Thompson <mrt-W77v16wj1OVeoWH0uzbU5w@public.gmane.org> wrote:
> >> 
> >> > Now suppose we have such an implementation.  Consider two distinct
> >> > threads copying the same thing which is longer than a cache line
> >> 
> >> "/dev/tty" (the constant string copied in the case at hand) is not
> >> longer than a cache line (right? :-), so while your case is compelling,
> >> it doesn't apply.
> 
> > That depends on the alignment of the strings.
> 
> No, sorry.  The alignment of a string that is smaller than a cache line can't
> possibly make the string itself bigger than a cache line, and any
> padding introduced by alignment, before or after the string, won't make
> strcpy copy more bytes than the size of the string.
> 
> Maybe you were thinking of straddling over more than one cache line?

Yep, and agreed that this wasn't what Mark described.

> >> > Since strcpy will always write at least one byte, can you really argue
> >> > that adding "*dest = 0;" to the beginning of a strcpy function is
> >> > always a bad thing?
> >> 
> >> Now, this one is compelling *and* fitting IMHO.
> >> 
> >> Of course we could rule this out in glibc, but should we?  Maybe not.
> >> 
> >> So I guess we're better off fixing the implementation of ctermid(NULL)
> >> to return a pointer to a constant string that (per POSIX) must not be
> >> modified by the caller, rather than needlessly copying it to another
> >> buffer.  Then, if/when such a strcpy implementation comes up, we'll be
> >> ready for it ;-)
> 
> > Yes, we either need to change the implementation, or make it MT-Unsafe
> > for now.
> 
> We only have to make it MT-Unsafe for now if the scenario above, of
> strcpy *always* writing a zero byte to the beginning of the destination
> string, were present in any implementation of strcpy in glibc.
> 
> Do you see any implementations of strcpy in glibc doing that?

Can we please fix faults even if they are not triggering an error right
now?  Please?  Why should we jump through all those hoops, make all
those assumptions, just to stick to this unusual reasoning?  We want to
build stuff that's easy to maintain, not a maze.
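
For illustration, the fix proposed in the quoted text above (returning a
constant string for ctermid(NULL) instead of copying into a shared buffer)
might look roughly like the sketch below.  The function name is
hypothetical and this is not the glibc source; it only assumes that a
caller-supplied buffer is large enough to hold the name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not glibc's implementation. */
char *ctermid_const_sketch(char *s)
{
    /* Read-only data with static storage: nothing is ever written to it,
       so concurrent ctermid(NULL)-style calls have nothing to race on. */
    static const char name[] = "/dev/tty";

    if (s == NULL)
        return (char *) name;   /* per POSIX, the caller must not modify it */

    /* Caller-supplied buffer: keeping other threads away from it is the
       caller's job.  Assumed to be large enough for the name. */
    return strcpy(s, name);
}
```

With this shape the NULL path no longer depends on how strcpy writes its
destination, because no shared writable buffer is involved.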

> 
> > As this example shows, they can be not "benign" without this being
> > easy to spot.
> 
> As much as you might want it to be so, it doesn't show any such thing.

No, it definitely does.  You made an assumption about a "perfectly
reasonable requirement we already place on any strcpy implementation we
use".  Mark showed that there are reasonable, and existing, strcpy
implementations (or similar for memcpy) that conflict with your
assumption.  So, it seems your "perfectly reasonable requirement" is (1)
not so obviously clear at all and (2) would be easy to break by
perfectly reasonable implementations.

> The example just shows that changing part of the implementation can make
> other parts that rely on it racy, so changes that might affect safety
> properties have to be made with a lot of care.

That's the wrong way around.  The situation you describe exists in
practice, yes, but it's what we need to avoid, not the goal.  We have
contracts for functions, and documentation, to actually decrease
complexity.  This means that we need to be able to change
implementations of functions if they still satisfy the contract.  It's
simply a matter of trying to keep things modular -- divide and conquer.

In this example, strcpy is sequential code, period. (Of course, as far
as the string data is concerned.)  That's the existing contract.  You
made an assumption in your MT-Safe reasoning that *extends* this
contract with an additional rule (for which we needed several emails to
define it, so no, it's not a trivial addition) -- that certain race
conditions must be benign.  Thus, either you are breaking the contract,
or you need to document that this is the new contract.

> Now that's not much of a surprise, is it?
> 
> > Alex, when you did the MT Safety review, which other cases of "benign"
> > race conditions did you see?
> 
> We've already discussed them in the list where we should have been
> discussing this in the first place.
> 
> You have access to my notes in the comments added next to the safety
> annotations in the manual.  That's all I got.

In the notes for ctermid, I can't see anything that would hint at an
assumed benign race condition.  Does that mean that you didn't make
notes for benign race conditions in general?


^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <1414694486.10085.165.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                 ` <1414694486.10085.165.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-11-01  8:24                                                   ` Alexandre Oliva
       [not found]                                                     ` <oregtn8h23.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-01  8:24 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages), linux-man@vger.kernel.org

On Oct 30, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Thu, 2014-10-30 at 16:00 -0200, Alexandre Oliva wrote:
>> Do you see any implementations of strcpy in glibc doing that?

> Can we please fix faults even if they are not triggering an error right
> now?

Sure!  But the way to fix that is *not* modifying a piece of
documentation that is supposed to document current state, and that does,
into something that doesn't, is it?

strcpy as it is implemented today makes ctermid MT-Safe.  That's what
the current documentation states, and AFAICT that is correct.  Do you
disagree?  If so, please point out in current code where it deviates.

> No, it definitely does.  You made an assumption about a "perfectly
> reasonable requirement we already place on any strcpy implementation we
> use".

... and that current implementations of strcpy in glibc abide by, so the
above is tautologically correct.  Introducing different behavior, such
as unconditionally writing something else on the string, is indeed a
change in the current contract.

> So, it seems your "perfectly reasonable requirement" is (1)
> not so obviously clear at all

I guess it will be once you observe the existing implementations.
Have you?

> and (2) would be easy to break by perfectly reasonable
> implementations.

That much is true.  Which is why ctermid safety notes have comments in
the manual indicating what's going on in there, and why it's safe in
spite of the potential race.

> This means that we need to be able to change
> implementations of functions if they still satisfy the contract.

So you agree that it would be perfectly legitimate to address the
current problem by documenting that glibc's implementations of strcpy must
not write to the destination any data other than what they were asked to
write?  Or even that concurrent runs of strcpy writing the same string
to the same destination must not introduce windows in which data other
than what is being written or what was in there before is visible?

These are all derivable from current implementations in glibc, so it
wouldn't be changing any contracts, just imposing further requirements
on future implementations so as to keep the current ctermid safe.

> In the notes for ctermid, I can't see anything that would hint at an
> assumed benign race condition.

Are you sure you're looking at the right place?

@deftypefun {char *} ctermid (char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c This function is a stub by default; the actual implementation, for
@c posix systems, returns an internal buffer if passed a NULL string,
@c but the internal buffer is always set to /dev/tty.

> Does that mean that you didn't make notes for benign race conditions
> in general?

If you saw the above and didn't see that, we can agree they're not in a
form you expect.  But no, I don't think I have made notes with blinking
red letters whenever I made an assumption you'd disagree with.  We've
covered some of them already in past discussions that led nowhere, so I
won't repeat myself to avoid wasting both of us even more time.  I will
just say that, after we had that conversation and I agreed to take
additional notes of potential races involving bit operations in IOstream
functions, I didn't find any more of those, and ctermid is certainly an
outlier; I don't recall other situations that involved that sort of
reasoning.  Other pervasive cases I'm not sure I mentioned then have to
do with unguarded access to constant locale data, that have only been
annotated when the pointer to the current locale object is accessed more
than once.  That's really all I can think of that you might find
objectionable, but I'm sure there'd be plenty of other cases you'd have
objected to if you had done the review.

Now here's something that might make your head spin: here's the object
code generated for ctermid, non-PIC (but PIC is different only in how it
initializes %rdx with the address of the buffer):

0000000000000000 <ctermid>:
   0:   48 89 f8                mov    %rdi,%rax
   3:   48 85 ff                test   %rdi,%rdi
   6:   ba 00 00 00 00          mov    $0x0,%edx
                        7: R_X86_64_32  .bss
   b:   48 0f 44 c2             cmove  %rdx,%rax
   f:   48 b9 2f 64 65 76 2f    movabs $0x7974742f7665642f,%rcx
  16:   74 74 79
  19:   48 89 08                mov    %rcx,(%rax)
  1c:   c6 40 08 00             movb   $0x0,0x8(%rax)
  20:   c3                      retq

Do you see any strcpy call in there?  Yeah, the compiler turns strcpy of
a constant into memcpy, and that in turn gets inlined into a
word-and-byte store.  This in turn invalidates Mark Thompson's
reasoning: now there's no longer any point in storing a NUL upfront to
cache the dest line in *while* we load the beginning of src, because src
is part of the instruction stream.  Writing a byte upfront would just
waste cycles that could have been spent writing the actual data, so no
sane optimizing compiler would do that.
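
As a rough illustration of the listing above: C source of roughly the
following shape could produce such object code, and the 64-bit immediate
in the movabs is just the eight bytes of "/dev/tty" in little-endian
order.  Both function names are hypothetical, and the source is a sketch
inferred from the disassembly and the manual comment, not a copy of
glibc's file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the function discussed above. */
char *ctermid_buffer_sketch(char *s)
{
    static char name[sizeof "/dev/tty"];   /* shared buffer in .bss */

    if (s == NULL)
        s = name;
    /* The source string and its length are compile-time constants, so a
       compiler is free to expand this call inline as plain stores. */
    return strcpy(s, "/dev/tty");
}

/* Model the two stores in the listing: an 8-byte little-endian store of
   the immediate, then a single NUL byte at offset 8. */
void store_imm_le(char out[9], uint64_t imm)
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (char) ((imm >> (8 * i)) & 0xff);
    out[8] = '\0';
}
```

Calling store_imm_le with 0x7974742f7665642f reproduces the string, which
is one way to check that no separate source load is needed in the
generated code.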

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <oregtn8h23.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                     ` <oregtn8h23.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-11-01 12:40                                                       ` Torvald Riegel
       [not found]                                                         ` <1414845631.10085.474.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-11-01 12:40 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Sat, 2014-11-01 at 06:24 -0200, Alexandre Oliva wrote:
> On Oct 30, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Thu, 2014-10-30 at 16:00 -0200, Alexandre Oliva wrote:
> >> Do you see any implementations of strcpy in glibc doing that?
> 
> > Can we please fix faults even if they are not triggering an error right
> > now?
> 
> Sure!  But the way to fix that is *not* modifying a piece of
> documentation that is supposed to document current state, and that does,
> into something that doesn't, is it?
> 
> strcpy as it is implemented today makes ctermid MT-Safe.  That's what
> the current documentation states, and AFAICT that is correct.  Do you
> disagree?  If so, please point out in current code where it deviates.

The fault is in the difference between what strcpy is specified to do
(ie, its sequential contract plus your MT-Safe annotation that,
however, requires callers to protect caller-supplied data from data
races) -- and what you rely on.  You didn't document that difference, or
your assumption, anywhere.  Even if our current implementation doesn't
trigger an issue, there's still a fault in your documentation.  That's
what I'm talking about.

> > No, it definitely does.  You made an assumption about a "perfectly
> > reasonable requirement we already place on any strcpy implementation we
> > use".
> 
> ... and that current implementations of strcpy in glibc abide by, so the
> above is tautologically correct.

Ahem, no.  We do not place this requirement on any strcpy
implementation; or is it documented or obvious anywhere?  The fact that
current implementations happen to abide by such a requirement does not
mean that we make this requirement.  If so, you could point me at it.
And no, referring to all other implementations and saying "do exactly
what they do" is not an implicit requirement.

> Introducing different behavior, such
> as unconditionally writing something else on the string, is indeed a
> change in the current contract.

No, it's not.  It deviates from what current implementations might do.
But the sequential contract of strcpy says that the accesses should only
go to what the abstract machine would do (ie, don't write to other
strings), and that *when strcpy* returns, it must have copied.  That's
the contract.

> > So, it seems your "perfectly reasonable requirement" is (1)
> > not so obviously clear at all
> 
> I guess it will be once you observe the existing implementations.
> Have you?

See Mark's comments.  Are you saying that the implementation possibility
he mentions is obviously wrong?  If it's not, then you don't have an
obvious requirement to not do it.

> > and (2) would be easy to break by perfectly reasonable
> > implementations.
> 
> That much is true.  Which is why ctermid safety notes have comments in
> the manual indicating what's going on in there, and why it's safe in
> spite of the potential race.

No.  All that I see is this (which you're aware of, see below):

@c This function is a stub by default; the actual implementation, for
@c posix systems, returns an internal buffer if passed a NULL string,
@c but the internal buffer is always set to /dev/tty.

Where does it talk about a potential data race?  It says that there's a
shared buffer, but it does not mention that there's something special
about how the buffer is initialized.  Nor that this affects strcpy
implementations in any way.

And how do you expect strcpy implementers to be aware of this, in
practice, with high probability?  Should they review all the callers and
look for undocumented assumptions?  Do you think that this scales well
and keeps complexity down?

> > This means that we need to be able to change
> > implementations of functions if they still satisfy the contract.
> 
> So you agree that it would be perfectly legitimate to address the
> current problem by documenting that glibc's implementations of strcpy must
> not write to the destination any data other than what they were asked to
> write?  Or even that concurrent runs of strcpy writing the same string
> to the same destination must not introduce windows in which data other
> than what is being written or what was in there before is visible?

The latter is better.

> These are all derivable from current implementations in glibc, so it
> wouldn't be changing any contracts, just imposing further requirements
> on future implementations so as to keep the current ctermid safe.

No.  Implementations do not define the contract.  Implementations
satisfy the contract.  The point of reasoning in terms of contracts is
that those provide an abstraction, and state the requirements on
implementations.  This allows decoupling, which is critical for keeping
complexity at a maintainable level.

You can ignore all that and assume that the only contract of a function
is the union of its implementations.  But that's BAD when we have a
function like strcpy that already has a contract.  It's not so bad when
we do this in tightly coupled functions, and it would be overkill to
specify a contract for, say, a static function that just has two call
sites in the function right next to it in the same file.  But why should
strcpy and ctermid be tightly coupled?

> > In the notes for ctermid, I can't see anything that would hint at an
> > assumed benign race condition.
> 
> Are you sure you're looking at the right place?
> 
> @deftypefun {char *} ctermid (char *@var{string})
> @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
> @c This function is a stub by default; the actual implementation, for
> @c posix systems, returns an internal buffer if passed a NULL string,
> @c but the internal buffer is always set to /dev/tty.

Yes that.  See above...

> > Does that mean that you didn't make notes for benign race conditions
> > in general?
> 
> If you saw the above and didn't see that, we can agree they're not in a
> form you expect.  But no, I don't think I have made notes with blinking
> red letters whenever I made an assumption you'd disagree with.  We've
> covered some of them already in past discussions that led nowhere, so I
> won't repeat myself to avoid wasting both of us even more time.  I will
> just say that, after we had that conversation and I agreed to take
> additional notes of potential races involving bit operations in IOstream
> functions, I didn't find any more of those, and ctermid is certainly an
> outlier; I don't recall other situations that involved that sort of
> reasoning.

Good.  That information helps, thanks.

> Other pervasive cases I'm not sure I mentioned then have to
> do with unguarded access to constant locale data, that have only been
> annotated when the pointer to the current locale object is accessed more
> than once.

I can't quite follow that.  Do the accesses constitute data races?  Can
you give me some more pointers or background info?

> That's really all I can think of that you might find
> objectionable, but I'm sure there'd be plenty of other cases you'd have
> objected to if you had done the review.

Or maybe not.  What makes you think that way?  I'm asking to get a feel
for what else you might have assumed to be okay that I wouldn't.

> Now here's something that might make your head spin:

Sorry, but why?

If anything, it's because this example doesn't actually support your arguments.

I'm arguing that bringing in additional dependencies on implementation
details doesn't help keeping complexity down nor makes things less
fragile -- and you give an example that adds even more dependencies (ie,
assumptions about a specific compiler)?

Oh, and BTW, I already mentioned that one has to consider the compiler
too when dealing with "benign" race conditions, didn't I?

> here's the object
> code generated for ctermid, non-PIC (but PIC is different only in how it
> initializes %rdx with the address of the buffer):
> 
> 0000000000000000 <ctermid>:
>    0:   48 89 f8                mov    %rdi,%rax
>    3:   48 85 ff                test   %rdi,%rdi
>    6:   ba 00 00 00 00          mov    $0x0,%edx
>                         7: R_X86_64_32  .bss
>    b:   48 0f 44 c2             cmove  %rdx,%rax
>    f:   48 b9 2f 64 65 76 2f    movabs $0x7974742f7665642f,%rcx
>   16:   74 74 79
>   19:   48 89 08                mov    %rcx,(%rax)
>   1c:   c6 40 08 00             movb   $0x0,0x8(%rax)
>   20:   c3                      retq
> 
> Do you see any strcpy call in there?  Yeah, the compiler turns strcpy of
> a constant into memcpy,

So now you depend on the compiler's memcpy implementation.  Great.  We
can consider the optimized memcpy Mark mentions.  (strcpy on constant
string is strlen (known a priori) + memcpy.)

> and that in turn gets inlined into a
> word-and-byte store.  This in turn invalidates Mark Thompson's
> reasoning: now there's no longer any point in storing a NUL upfront to
> cache the dest line in *while* we load the beginning of src, because src
> is part of the instruction stream.  Writing a byte upfront would just
> waste cycles that could have been spent writing the actual data, so no
> sane optimizing compiler would do that.

Please look again at the case he described.  The src doesn't matter,
it's the destination.  The only thing that saves you here is that
cachelines will be likely longer than "/dev/tty" plus trailing zero.  So
if we would change the string and make it longer, you'd hit exactly the
issue you described if the compiler's memcpy makes the optimization.


^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <1414845631.10085.474.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                         ` <1414845631.10085.474.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-11-01 18:22                                                           ` Alexandre Oliva
       [not found]                                                             ` <or1tpm3hn5.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-01 18:22 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages), linux-man@vger.kernel.org

On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Sat, 2014-11-01 at 06:24 -0200, Alexandre Oliva wrote:
>> strcpy as it is implemented today makes ctermid MT-Safe.  That's what
>> the current documentation states, and AFAICT that is correct.  Do you
>> disagree?  If so, please point out in current code where it deviates.

> The fault is in the difference between what strcpy is specified to do

Please point out the part of strcpy's contract that states it can write
garbage to the destination before writing what it ought to, and then you
(and Mark) might have a point.

Don't even bother arguing a function might do things not explicitly
written out in their specs, or I'll suggest functions might release
locks and take them back at will, just to put things in a familiar
context in which you'd see how bad that would be.

You may then resort to the as-if rule combined with the requirement for
synchronization points between reads and writes by different threads,
and that would be correct for *current* standards.  Not so for much
older standards, that had no such requirements, and that existing
programs still target, so we should ideally provide an implementation
suitable for them.

So, who changed the contract and put in additional requirements, again?

> See Mark's comments.  Are you saying that the implementation possibility
> he mentions is obviously wrong?

It depends on the targeted standard.  For recent standards, it's a
legitimate optimization.  For older standards, it would not be.  For
future ones, I seem to have misplaced my crystal ball.

> @c This function is a stub by default; the actual implementation, for
> @c posix systems, returns an internal buffer if passed a NULL string,
> @c but the internal buffer is always set to /dev/tty.

> Where does it talk about a potential data race?  It says that there's a
> shared buffer, but it does not mention that there's something special
> about how the buffer is initialized.  Nor that this affects strcpy
> implementations in any way.

See?, that's what I said about the notes not being in a form you wanted.
The information is all there: there's a static shared buffer (as
required by ctermid(NULL), and that's usually not thread-safe, but
that's tacit knowledge of our time not explicitly duplicated there), and
it's always set to /dev/tty (and the fact that it uses strcpy is just an
implementation detail that, after optimization, is not even true any
more).

> And how do you expect strcpy implementers to be aware of this, in
> practice, with high probability?

I don't.  You wish they would, and I understand why, but that's all.
You wish I had solved a documentation problem, but I was under no
obligation to do so and I had enough on my plate already so I did not
jump through additional hoops just to satisfy your wishes.

> Should they review all the callers and look for undocumented
> assumptions?

I'm afraid that's how things have usually been done.

> Do you think that this scales well and keeps complexity down?

No.  I don't disagree such documentation is desirable.

>> Other pervasive cases I'm not sure I mentioned then have to
>> do with unguarded access to constant locale data, that have only been
>> annotated when the pointer to the current locale object is accessed more
>> than once.

> I can't quite follow that.  Do the accesses constitute data races?  Can
> you give me some more pointers or background info?

Look at the implementation of the ctype family of macros/functions, for
one.  Technically, there are data races in there: the global pointer to
the current locale object can be modified by other threads and they
don't take anything like a read lock to ensure they have the current
pointer and that it's not being modified concurrently.

However, because of the way locale is implemented, the pointer is
modified atomically (while holding a write lock, and written as a single
word to a properly aligned location), so readers in other threads may
get either the old pointer or the new one.

The old locale data remains valid forever (till the end of the program),
so it is safe to keep on using it until some synchronization point or
whatever else updates the local view of the global pointer.

Numerous functions access ctype data once; given the above, those are
never a problem, in spite of the race in the global locale pointer.

Some functions access ctype data multiple times; given the above, and
because the locale structures are accessed in a way that enables the
compiler to load the global locale pointer only once, and the compiler
performs this optimization (or, in some cases, the load is factored out
explicitly in the source code), these are not a problem, in spite of the
same race.

Some functions call multiple functions that each access ctype data, each
one loading the global locale pointer independently.  These are prone to
inconsistent behavior, since parts of their execution may use one locale
while another part may use another locale.  These were consistently
marked with the “locale” keyword.
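
The difference between the last two cases can be sketched as follows.
Everything here is hypothetical and heavily simplified (a two-field
struct instead of real ctype tables, invented names, no threads); it only
shows the load pattern, not glibc's locale code.

```c
#include <stddef.h>

/* Hypothetical, simplified locale object; real locale data is far larger. */
struct locale_sketch {
    char decimal_point;
    char thousands_sep;
};

const struct locale_sketch locale_c_sketch  = { '.', ',' };
const struct locale_sketch locale_de_sketch = { ',', '.' };

/* Global "current locale" pointer.  Old objects stay valid forever, so a
   stale pointer is still safe to dereference. */
static const struct locale_sketch *current_locale_sketch = &locale_c_sketch;

void set_locale_sketch(const struct locale_sketch *loc)
{
    current_locale_sketch = loc;    /* one aligned, word-sized store */
}

/* One load of the global pointer: both fields come from the same locale,
   whichever one the load happened to observe. */
void separators_one_snapshot(char out[2])
{
    const struct locale_sketch *loc = current_locale_sketch;
    out[0] = loc->decimal_point;
    out[1] = loc->thousands_sep;
}

/* Two helpers that each load the global pointer independently ... */
static char decimal_point_now(void) { return current_locale_sketch->decimal_point; }
static char thousands_sep_now(void) { return current_locale_sketch->thousands_sep; }

/* ... so a switch between the two calls could mix two locales.  This is
   the pattern that the "locale" keyword is described as flagging. */
void separators_two_loads(char out[2])
{
    out[0] = decimal_point_now();
    out[1] = thousands_sep_now();
}
```

In a single-threaded run both variants agree; they can only differ when
another thread calls the setter between the two helper calls.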

>> That's really all I can think of that you might find
>> objectionable, but I'm sure there'd be plenty of other cases you'd have
>> objected to if you had done the review.

> Or maybe not.  What makes you think that way?

That you've surprised me again and again with objections to issues I
hadn't regarded as objectionable.

> I'm asking to get a feel for what else you might have assumed to be
> okay that I wouldn't.

How could I do that without knowing what *your* criteria could be?

>> Now here's something that might make your head spin:

> Sorry, but why?

Because we've been talking about strcpy, because my recollection
incorrectly told me ctermid used the internal, non-preemptible name for
strcpy, but strcpy happens to be actually irrelevant because it is
optimized away.

> So now you depend on the compiler's memcpy implementation.  Great.  We
> can consider the optimized memcpy Mark mentions.  (strcpy on constant
> string is strlen (known a priori) + memcpy.)

Well, no, it's more efficient than that, because it doesn't have to load
the src: it is a known constant.  It's more like an open-coded array
initialization.

> Please look again at the case he described.

Likewise.  No, seriously, I mean, look at the *other* case he described,
the one that actually matters.  Not the one about invalidating an entire
cache line, that I've already demonstrated as not applicable to this
short string, but you get back to with a hypothetical larger string that
might indeed invalidate the underlying assumptions.  I'm sorry to
disappoint you, but my code review abilities do not cover hypothetical
and nonsensical changes you might want to suggest just to break
observations derived from the code on which the documented properties
were based.

I was talking about the one that starts out by writing garbage to the
first byte to trigger the prefetching of that cache line.  The one that
would break the strcpy contract of old standards, but that is valid
under current standards' contracts that include the as-if rule and the
no-data-races mandate.
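
A minimal sketch of that kind of implementation, for reference.  The name
is hypothetical and no glibc strcpy is claimed to work this way; the
early store merely stands in for whatever an implementation might do to
pull the destination line into the cache.

```c
#include <stddef.h>

/* Hypothetical strcpy variant that touches the destination first. */
char *strcpy_nul_first_sketch(char *dest, const char *src)
{
    size_t i = 0;

    /* Early store to dest.  A second thread reading dest without
       synchronization could observe an empty string at this point, even
       if dest already held the very same text. */
    dest[0] = '\0';

    /* Ordinary forward copy, including the terminating NUL. */
    do {
        dest[i] = src[i];
    } while (src[i++] != '\0');

    return dest;
}
```

When the function returns, dest equals src, so a single-threaded caller
cannot tell this apart from any other strcpy.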

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

^ permalink raw reply	[flat|nested] 35+ messages in thread

[parent not found: <or1tpm3hn5.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                             ` <or1tpm3hn5.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-11-01 19:54                                                               ` Torvald Riegel
       [not found]                                                                 ` <1414871691.10085.529.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Torvald Riegel @ 2014-11-01 19:54 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Sat, 2014-11-01 at 16:22 -0200, Alexandre Oliva wrote:
> On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Sat, 2014-11-01 at 06:24 -0200, Alexandre Oliva wrote:
> >> strcpy as it is implemented today makes ctermid MT-Safe.  That's what
> >> the current documentation states, and AFAICT that is correct.  Do you
> >> disagree?  If so, please point out in current code where it deviates.
> 
> > The fault is in the difference between what strcpy is specified to do
> 
> Please point out the part of strcpy's contract that states it can write
> garbage to the destination before writing what it ought to, and then you
> (and Mark) might have a point.

It's a sequential specification, so before/after, the intermediate steps
are unspecified, so it is allowed to do what it wants there.  Within the
rules of C -- so as-if applies here.  And that doesn't forbid strcpy
from writing intermediate states (whether you consider them
garbage or not).

Would you make any assumptions about the stores performed by a sorting
function that is specified to take an array, and return with the array's
elements being sorted?  Would you require it to only write finally
sorted data out, or would you allow it to use the array as scratch space
too?  I guess the latter.  And the same applies to strcpy, you don't
want to restrict whether it copies forwards or backwards, for example.
And you don't have to for sequential code.
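
As a rough illustration of the point about copy direction, here is a
minimal sketch of a strcpy variant that copies backwards.  The name
strcpy_backwards is hypothetical and this is not how glibc implements
strcpy; it only shows that the state at return can match the
specification while the intermediate states differ from a forward copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strcpy variant (not glibc's): copies from the last byte
   to the first.  While it runs, dst may hold a mix of old and new
   bytes; only the state at return matches the post-condition.
   Assumes src and dst do not overlap, as strcpy itself does. */
char *strcpy_backwards(char *dst, const char *src)
{
    size_t i = strlen(src) + 1;   /* include the terminating NUL */
    while (i-- > 0)
        dst[i] = src[i];
    return dst;
}
```

A sequential caller cannot tell this apart from a forward copy; only
code that looks at dst while the copy is in progress could.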

> Don't even bother arguing a function might do things not explicitly
> written out in their specs, or I'll suggest functions might release
> locks and take them back at will, just to put things in a familiar
> context in which you'd see how bad that would be.

First, now you're talking about concurrent code, not sequential code.

If we assume we have concurrent code, then you could even release and
acquire locks at will if you can prove that this is still okay under
as-if.  But that's beside the point we're discussing here because...

... strcpy is not specified as a concurrent function, is it?  And it
doesn't operate on volatile data.  So, yes, it can write other stuff to
the strings it will overwrite eventually with the final results.  Same
as a sorting function might do.  The requirement is about the state
after the function has completed, not while it does something.  That's
why these things are called post-conditions.

> 
> You may then resort to the as-if rule combined with the requirement for
> synchronization points between reads and writes by different threads,
> and that would be correct for *current* standards.  Not so for much
> older standards, that had no such requirements,

If you're referring to pre-C11 C standards, then they had a
single-threaded abstract machine, didn't they?

> and that existing
> programs still target, so we should ideally provide an implementation
> suitable for them.
> 
> 
> So, who changed the contract and put in additional requirements, again?
> 
> > See Mark's comments.  Are you saying that the implementation possibility
> > he mentions is obviously wrong?
> 
> It depends on the targeted standard.  For recent standards, it's a
> legitimate optimization.  For older standards, it would not be.

Unless you say which "older standards" you actually refer to, I can't
interpret your statement.  Because older standards for single-threaded
abstract machines do allow this.

> For
> future ones, I seem to have misplaced my crystal ball.
> 
> > @c This function is a stub by default; the actual implementation, for
> > @c posix systems, returns an internal buffer if passed a NULL string,
> > @c but the internal buffer is always set to /dev/tty.
> 
> > Where does it talk about a potential data race?  It says that there's a
> > shared buffer, but it does not mention that there's something special
> > about how the buffer is initialized.  Nor that this affects strcpy
> > implementations in any way.
> 
> See?, that's what I said about the notes not being in a form you wanted.
> The information is all there: there's a static shared buffer (as
> required by ctermid(NULL), and that's usually not thread-safe, but
> that's tacit knowledge of our time not explicitly duplicated there),

Wow, really?  It's supposed to be "tacit knowledge of our time" to not
have thread-safe initialization of a static buffer?  Or have a
pre-initialized static buffer?  "but the internal buffer is always set
to /dev/tty" makes it obvious that there's a data race?  Really?

Maybe we should make a poll on libc-alpha to see which percentage of
people actually understands this comment as implying that there is a
data race.

> and
> it's always set to /dev/tty (and the fact that it uses strcpy is just an
> implementation detail that, after optimization, is not even true any
> more).

The validity of this comment relies on the "implementation detail" and
on other implementation details such as what the compiler does.  So that
doesn't look like "just an implementation detail" to me...

> > And how do you expect strcpy implementers to be aware of this, in
> > practice, with high probability?
> 
> I don't.  You wish they would, and I understand why, but that's all.
> You wish I had solved a documentation problem, but I was under no
> obligation to do so and I had enough on my plate already so I did not
> jump through additional hoops just to satisfy your wishes.

I'm not really sure what to say after reading that.

Also note that "your wishes" isn't what this is about -- it is about
maintainable documentation.  This affects glibc in general.

> > Should they review all the callers and look for undocumented
> > assumptions?
> 
> I'm afraid that's how things have usually been done.
> 
> > Do you think that this scales well and keeps complexity down?
> 
> No.  I don't disagree such documentation is desirable.
> 
> >> Other pervasive cases I'm not sure I mentioned then have to
> >> do with unguarded access to constant locale data, that have only been
> >> annotated when the pointer to the current locale object is accessed more
> >> than once.
> 
> > I can't quite follow that.  Do the accesses constitute data races?  Can
> > you give me some more pointers or background info?
> 
> Look at the implementation of the ctype family of macros/functions, for
> one.  Technically, there are data races in there: the global pointer to
> the current locale object can be modified by other threads and they
> don't take anything like a read lock to ensure they have the current
> pointer and that it's not being modified concurrently.
> 
> However, because of the way locale is implemented, the pointer is
> modified atomically (while holding a write lock, and written as a single
> word to a properly aligned location), so readers in other threads may
> get either the old pointer or the new one.
> 
> The old locale data remains valid forever (till the end of the program),
> so it is safe to keep on using it until some synchronization point or
> whatever else updates the local view of the global pointer.
> 
> Numerous functions access ctype data once; given the above, those are
> never a problem, in spite of the race in the global locale pointer.

I have to look at this in detail, but I'm concerned about two issues
there:
1) Do the loads use an acquire fence (ie, atomic_read_barrier)?  If not,
is any new locale data they can read from initialized before those reads
(e.g., at program startup, so happening before any spawned threads)?
2) If the compiler sees such a load (e.g., because it's a macro, or with
LTO and inlining), and it's not marked as an atomic access, it can be
free to reload it.  Which could lead to partially reading from the old
and new locale.
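
The two concerns above can be sketched with C11 atomics.  This is a
minimal sketch under assumptions, not glibc's locale code: the struct
locale_tables type, the current_tables pointer, and the functions
publish_tables and upcase are all hypothetical names made up for the
illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a locale object; not glibc's layout. */
struct locale_tables {
    const unsigned char *toupper_map;   /* 256-entry table */
};

/* Hypothetical global pointer, published by a writer thread. */
static _Atomic(const struct locale_tables *) current_tables;

void publish_tables(const struct locale_tables *t)
{
    /* Release store: the table contents are written before the
       pointer becomes visible to readers. */
    atomic_store_explicit(&current_tables, t, memory_order_release);
}

void upcase(unsigned char *s, size_t n)
{
    /* One acquire load into a local.  Every use of t below refers to
       the value obtained by this single load, so the whole loop works
       from one table even if another thread publishes a new one. */
    const struct locale_tables *t =
        atomic_load_explicit(&current_tables, memory_order_acquire);
    for (size_t i = 0; i < n; i++)
        s[i] = t->toupper_map[s[i]];
}
```

With a plain (non-atomic) global pointer, neither the ordering in 1)
nor the single load in 2) would be guaranteed by the language.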

> Some functions access ctype data multiple times; given the above, and
> because the locale structures are accessed in a way that enables the
> compiler to load the global locale pointer only once, and the compiler
> performs this optimization

So correctness relies in some cases on certain optional compiler
optimizations being performed?  That wouldn't be good.

> (or, in some cases, the load is factored out
> explicitly in the source code), these are not a problem, in spite of the
> same race.
> 
> Some functions call multiple functions that each access ctype data, each
> one loading the global locale pointer independently.  These are prone to
> inconsistent behavior, since parts of their execution may use one locale
> while another part may use another locale.  These were consistently
> marked with the “locale” keyword.

Thanks for the additional information.  It's helpful.

> 
> >> That's really all I can think of that you might find
> >> objectionable, but I'm sure there'd be plenty of other cases you'd have
> >> objected to if you had done the review.
> 
> > Or maybe not.  What makes you think that way?
> 
> That you've surprised me again and again with objections to issues I
> hadn't regarded as objectionable.
> 
> > I'm asking to get a feel for what else you might have assumed to be
> > okay that I wouldn't.
> 
> How could I do that without knowing what *your* criteria could be?

You can't a priori.  But we can keep talking, and you can explain more
of your reasoning, and we can further discuss it to see whether there
are differences.  The above is a good start, so I appreciate that.

> >> Now here's something that might make your head spin:
> 
> > Sorry, but why?
> 
> Because we've been talking about strcpy, because my recollection
> incorrectly told me ctermid used the internal, non-preemptible name for
> strcpy, but strcpy happens to be actually irrelevant because it is
> optimized away.
> 
> > So now you depend on the compiler's memcpy implementation.  Great.  We
> > can consider the optimized memcpy Mark mentions.  (strcpy on constant
> > string is strlen (known a priori) + memcpy.)
> 
> Well, no, it's more efficient than that, because it doesn't have to load
> the src: it is a known constant.  It's more like an open-coded array
> initialization.

As I said in my previous email, how to write to the destination matters.

> > Please look again at the case he described.
> 
> Likewise.  No, seriously, I mean, look at the *other* case he described,
> the one that actually matters.  Not the one about invalidating an entire
> cache line, that I've already demonstrated as not applicable to this
> short string, but you get back to with a hypothetical larger string that
> might indeed invalidate the underlying assumptions.

Oh great, 

> I'm sorry to
> disappoint you, but my code review abilities do not cover hypothetical
> and nonsensical changes you might want to suggest just to break
> observations derived from the code on which the documented properties
> were based.
> 
> I was talking about the one that starts out by writing garbage to the
> first byte to trigger the prefetching of that cache line.  The one that
> would break the strcpy contract of old standards, but that is valid
> under current standards' contracts that include the as-if rule

Then please refer me to the concrete old standard (and we should support
it, actually) and the wording in there that *requires* strcpy to issue a
certain set of stores instead of just, for example, requiring that when
the function returns, the string has been copied.

Second, when you've done that, could you please explain to me how a
compiler is supposed to optimize code without something like an as-if
rule?

> and the
> no-data-races mandate.

For there to be data races, the standard must actually acknowledge that
things are supposed to work in a multi-threaded setting.  Could you
refer me to the respective wording too, please?


[parent not found: <1414871691.10085.529.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                                 ` <1414871691.10085.529.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
@ 2014-11-03  5:43                                                                   ` Alexandre Oliva
       [not found]                                                                     ` <orzjc8zvn6.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-03  5:43 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages), linux-man@vger.kernel.org

On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

>> The information is all there: there's a static shared buffer (as
>> required by ctermid(NULL), and that's usually not thread-safe, but
>> that's tacit knowledge of our time not explicitly duplicated there),

> Wow, really?  It's supposed to be "tacit knowledge of our time" to not
> have thread-safe initialization of a static buffer?

You seem to have misread what I wrote.  It's the fact that ctermid(NULL)
is not required to be thread-safe that is tacit knowledge of our time.

> "but the internal buffer is always set
> to /dev/tty" makes it obvious that there's a data race?

The reasoning indicates there's something non-obvious going on there.
When there isn't, there's no reason for any such note.

> Maybe we should make a poll on libc-alpha to see which percentage of
> people actually understands this comment as implying that there is a
> data race.

Will you please stop putting words I didn't say in my mouth?

Where did I say the comment meant to imply that there was a data race in
there?

All it was meant to do was justify why I regarded the function as
MT-Safe and AS-Safe, in spite of the possibility of concurrent
initializations of the buffer.

> Also note that "your wishes" isn't what this is about -- it is about
> maintainable documentation.  This affects glibc in general.

That's still your wishes for additional documentation that nobody has
stepped up to do.  Others (myself included) may share such a wish for
better documentation on various fronts, but none of this would entitle
you to demand that I do so, or to complain that I didn't, when I never
agreed to do it.

> 1) Do the loads use an acquire fence (ie, atomic_read_barrier)?

No.

> If not, is any new locale data they can read from initialized before
> those reads (e.g., at program startup, so happening before any spawned
> threads)?

No.

> 2) If the compiler sees such a load (e.g., because it's a macro, or with
> LTO and inlining), and it's not marked as an atomic access, it can be
> free to reload it.  Which could lead to partially reading from the old
> and new locale.

Yes, but only if there's more than one use (thus marked locale).  If
there's only one use, the compiler could, but would not do something
that stupid.

>> Some functions access ctype data multiple times; given the above, and
>> because the locale structures are accessed in a way that enables the
>> compiler to load the global locale pointer only once, and the compiler
>> performs this optimization

> So correctness relies in some cases on certain optional compiler
> optimizations being performed?

That is glibc's locale design, yes.  Sane compilers help keep glibc sane
and safe.  We are indeed relying on compiler sanity.

> Then please refer me to the concrete old standard (and we should support
> it, actually) and the wording in there that *requires* strcpy to issue a
> certain set of stores instead of just, for example, requiring that when
> the function returns, the string has been copied.

These are equivalent requirements IMHO, because neither makes other
allowances for using the user-visible storage as scratch space, and
both, because of asynchronous signals, rule this possibility out for
storage that the signal handler could observe.  Anyway, let's leave this
point for the other sub-thread, to avoid duplication, shall we?

> Second, when you've done that, could you please explain to me how a
> compiler is supposed to optimize code without something like an as-if
> rule?

The key is external observability, and ordering requirements.

strcpy's sequential specification does not mandate chars to be copied in
any predetermined order, so strcpy is free to reorder and regroup loads
and stores as it sees fit.  None of this steps out of its explicit
specification.  Writing garbage, however, would step out, but it might
still be allowed under the as-if rule if this couldn't be legitimately
observed, e.g. if any attempt to observe it would invoke undefined
behavior.

>> and the
>> no-data-races mandate.

> For there to be data races, the standard must actually acknowledge that
> things are supposed to work in a multi-threaded setting.  Could you
> refer me to the respective wording too, please?

Why are you doing this?  You know where this requirement is in POSIX!

For a long time we have had threads as a POSIX add-on to C standards
that said nothing about threads.  POSIX imports (defers to) standard C
without conflicting with it.  It logically follows from these two
statements that all requirements from standard C on strcpy
implementations still apply under POSIX.  If you agree that a signal
handler's ability to observe deviations from standard-mandated behavior
indicates deviation from the standard, rather than something that could
be tolerated under the as-if rule, then this strcpy requirement carries
over to the multi-thread extensions to C specified in POSIX.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[parent not found: <orzjc8zvn6.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                                     ` <orzjc8zvn6.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
@ 2014-11-03 13:07                                                                       ` Mark Thompson
       [not found]                                                                         ` <54577E17.7000109-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>
  2014-11-03 15:55                                                                       ` Torvald Riegel
  1 sibling, 1 reply; 35+ messages in thread
From: Mark Thompson @ 2014-11-03 13:07 UTC (permalink / raw)
  To: Alexandre Oliva, Torvald Riegel
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 03/11/14 05:43, Alexandre Oliva wrote:
>
> The key is external observability, and ordering requirements.
>
> strcpy's sequential specification does not mandate chars to be copied in
> any predetermined order, so strcpy is free to reorder and regroup loads
> and stores as it sees fit.  None of this steps out of its explicit
> specification.  Writing garbage, however, would step out, but it might
> still be allowed under the as-if rule if this couldn't be legitimately
> observed, e.g. if any attempt to observe it would invoke undefined
> behavior.

On 03/11/14 05:13, Alexandre Oliva wrote:
 >
 > The way is not specified, but it does not state that it is to write
 > something else there before, and doing so is NOT allowed by the as-if
 > rule.  Consider a function that goes:
 >
 >    for (;;) {
 >      extern char buffer[];
 >      strcpy (buffer, "foo");
 >      signal (SIGUSR1, testme);
 >      strcpy (buffer, "fool");
 >      signal (SIGUSR1, SIG_IGN);
 >    }
 >
 >
 > Now, if the signal handler testme were to inspect buffer[1] (knowing the
 > only window in which it may be activated is the above, in a
 > single-threaded program), what values could it possibly find there?
 > Please justify with quotes from combinations of C and POSIX standards of
 > the same vintage you can find.  How about buffer[0], and buffer[3]?
 >

I disagree with this reasoning, though I am not sufficiently familiar 
with the standards involved to argue with it effectively.

However, I do see a more concerning point here: does this argument also 
apply to memcpy()?  I can't find any language in the standard which 
places additional requirements on strcpy() (and which would disallow 
implementation as strlen+memcpy, for example).
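
For concreteness, the strlen+memcpy formulation mentioned above could
look like the following.  This is only a sketch; strcpy_via_memcpy is
a hypothetical name, not a glibc function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strcpy built from strlen + memcpy (illustration only).
   Whatever memcpy does to the destination while copying, this
   function inherits. */
char *strcpy_via_memcpy(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;   /* bytes to copy, including the NUL */
    memcpy(dst, src, n);          /* store order and width are memcpy's choice */
    return dst;
}
```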

Given that, does it not follow that the current, released, 
implementation of memcpy() in glibc for architectures using a wh64 
instruction (alpha, tilepro and tilegx) is entirely wrong?

- Mark


[parent not found: <54577E17.7000109-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>]

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                                         ` <54577E17.7000109-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>
@ 2014-11-19  0:26                                                                           ` Alexandre Oliva
  0 siblings, 0 replies; 35+ messages in thread
From: Alexandre Oliva @ 2014-11-19  0:26 UTC (permalink / raw)
  To: Mark Thompson; +Cc: Torvald Riegel, linux-man@vger.kernel.org

On Nov  3, 2014, Mark Thompson <mrt-W77v16wj1OVeoWH0uzbU5w@public.gmane.org> wrote:

> I disagree with this reasoning, though I am not sufficiently familiar
> with the standards involved to argue with it effectively.

> However, I do see a more concerning point here: does this argument
> also apply to memcpy()?

If the argument is correct, then I don't see why it wouldn't.  The
discussion has been ongoing in libc-alpha, but it's not clear that any
argument any of us might bring up at this point would amount to global
consensus on what the standards actually meant, so I take it they're
more like exploring what is already defined and what isn't, to perhaps
later seek clarification from the standard bodies.

> Given that, does it not follow that the current, released,
> implementation of memcpy() in glibc for architectures using a wh64
> instruction (alpha, tilepro and tilegx) is entirely wrong?

If the argument, as stated, is correct, yes.

There's an alternative argument that wouldn't rule it out, though, that
amounts to requiring the implementation to stick to the stated
specification of *copying* data from source to destination (as opposed
to writing random bits in dest until it compares equal to source), but
with enough leeway to permit copying behavior that performs cache-line
invalidation when there is certainty that the entire cache line is going
to be overwritten.  It is hard to specify that formally, but the
intuitive notion could be that, if the implementation does what the spec
says it should (namely, reading from source and storing in target),
besides ensuring the postconditions are met if the preconditions were
met to begin with, without modifying memory that shouldn't have been
modified, it could be ok.  That's not my reading, but it's a possibility
the standard bodies might consider if the intent is to rule out garbage
writing but not cache invalidation.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

* Re: Differences between man-pages and libc manual safety markings
       [not found]                                                                     ` <orzjc8zvn6.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  2014-11-03 13:07                                                                       ` Mark Thompson
@ 2014-11-03 15:55                                                                       ` Torvald Riegel
  1 sibling, 0 replies; 35+ messages in thread
From: Torvald Riegel @ 2014-11-03 15:55 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Mark Thompson, Peng Haitao, Carlos O'Donell,
	Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Mon, 2014-11-03 at 03:43 -0200, Alexandre Oliva wrote:
> On Nov  1, 2014, Torvald Riegel <triegel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> >> The information is all there: there's a static shared buffer (as
> >> required by ctermid(NULL), and that's usually not thread-safe, but
> >> that's tacit knowledge of our time not explicitly duplicated there),
> 
> > Wow, really?  It's supposed to be "tacit knowledge of our time" to not
> > have thread-safe initialization of a static buffer?
> 
> You seem to have misread what I wrote.  It's the fact that ctermid(NULL)
> is not required to be thread-safe that is tacit knowledge of our time.

I don't find that interpretation any less surprising.  So, again, why is
that "tacit knowledge of our time"?

> > "but the internal buffer is always set
> > to /dev/tty" makes it obvious that there's a data race?
> 
> The reasoning indicates there's something non-obvious going on there.
> When there isn't, there's no reason for any such note.

So, the note says that an internal buffer is used if NULL is passed,
but this buffer is always set to /dev/tty.  Why does that indicate that
there's a race, or that ctermid(NULL) is not MT-Safe, different from the
actual annotation you added?

> > Maybe we should make a poll on libc-alpha to see which percentage of
> > people actually understands this comment as implying that there is a
> > data race.
> 
> Will you please stop putting words I didn't say in my mouth?

That's the way I understood your comment but ...

> Where did I say the comment meant to imply that there was a data race in
> there?
> 
> All it was meant to do was justify why I regarded the function as
> MT-Safe and AS-Safe, in spite of the possibility of concurrent
> initializations of the buffer.

... we can as well make a poll about this reasoning of yours above.

> > Also note that "your wishes" isn't what this is about -- it is about
> > maintainable documentation.  This affects glibc in general.
> 
> That's still your wishes for additional documentation that nobody has
> stepped up to do.  Others (myself included) may share such a wish for
> better documentation on various fronts, but none of this would entitle
> you to demand that I do so, or to complain that I didn't, when I never
> agreed to do it.

Yeah, you never promised me to create maintainable documentation.  So I
can't demand it.  I just thought that this would be common sense, and as
here, it's actually not a lot of work.  You'd have had to just write
another sentence, or say
  @ data race on initialization of static buffer
or
  @ buffer initialized by running strcpy potentially concurrently

or really something like that.
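
To show where such a note would sit, here is a minimal sketch of a
ctermid-like function based only on the description quoted in this
thread (a static buffer used for NULL, always set to /dev/tty via
strcpy).  The names my_ctermid and MY_L_ctermid are hypothetical, and
this is not glibc's actual source.

```c
#include <stddef.h>
#include <string.h>

#define MY_L_ctermid 9   /* hypothetical; room for "/dev/tty" plus NUL */

/* Sketch only, not glibc's source.
   @c data race on initialization of static buffer: concurrent callers
   @c passing NULL may run strcpy on it at the same time; every caller
   @c writes the same bytes. */
char *my_ctermid(char *s)
{
    static char name[MY_L_ctermid];
    if (s == NULL)
        s = name;                 /* shared internal buffer */
    return strcpy(s, "/dev/tty");
}
```

Whether the concurrent strcpy calls are harmless is exactly what is
being disputed here; the comment just makes the assumption visible.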

Sorry if I've asked for something outrageous here...

> > 1) Do the loads use an acquire fence (ie, atomic_read_barrier)?
> 
> No.
> 
> > If not, is any new locale data they can read from initialized before
> > those reads (e.g., at program startup, so happening before any spawned
> > threads)?
> 
> No.

Then this sounds like a bug that could trigger an error on archs with
weak HW memory models.

And to be clear, I'd never asked you to fix it, but I would have
appreciated if you gave me (or other glibc folks) a heads-up that
something might be fishy here, so *somebody* (not you, don't worry) can
revisit it later on.

I guess nobody specifically asked you to report potential bugs you've
found during the review for MT-Safety.  It would have just been better
if you did note them...

> > 2) If the compiler sees such a load (e.g., because it's a macro, or with
> > LTO and inlining), and it's not marked as an atomic access, it can be
> > free to reload it.  Which could lead to partially reading from the old
> > and new locale.
> 
> Yes, but only if there's more than one use (thus marked locale).  If
> there's only one use, the compiler could, but would not do something
> that stupid.

Register pressure?

Whenever you say "something that stupid", it should have a very simple
reason.  So if you can't give a one-sentence reply why reload would be
"that stupid", then maybe you're wrong, huh?

> >> Some functions access ctype data multiple times; given the above, and
> >> because the locale structures are accessed in a way that enables the
> >> compiler to load the global locale pointer only once, and the compiler
> >> performs this optimization
> 
> > So correctness relies in some cases on certain optional compiler
> > optimizations being performed?
> 
> That is glibc's locale design, yes.  Sane compilers help keep glibc sane
> and safe.  We are indeed relying on compiler sanity.

Sanity is not defined by adhering to the hacks you have in mind but by
adhering to the standards and following the conventions that all the
involved projects have agreed on.  Should we make a poll with the gcc
folks to see how many would agree reloading a nonatomic / non-shared mem
location would be insane?

> > Then please refer me to the concrete old standard (and we should support
> > it, actually) and the wording in there that *requires* strcpy to issue a
> > certain set of stores instead of just, for example, requiring that when
> > the function returns, the string has been copied.
> 
> These are equivalent requirements IMHO, because neither makes other
> allowances for using the user-visible storage as scratch space, and
> both, because of asynchronous signals, rule this possibility out for
> storage that the signal handler could observe.  Anyway, let's leave this
> point for the other sub-thread, to avoid duplication, shall we?

Fine, let's continue there...

> > Second, when you've done that, could you please explain to me how a
> > compiler is supposed to optimize code without something like an as-if
> > rule?
> 
> The key is external observability, and ordering requirements.
> 
> strcpy's sequential specification does not mandate chars to be copied in
> any predetermined order,

But it's not explicitly allowed either.  This contradicts what you
said above.  Please make a consistent argument.  Otherwise, I'll reply:

strcpy's sequential specification does not mandate chars to be copied
atomically, so we can copy by adding 1 to each char until it reaches the
final value.
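
Spelled out as code, that deliberately odd copy might look like the
sketch below.  The name strcpy_by_increment is hypothetical, and the
sketch assumes the destination already holds initialized bytes, since
it reads them before writing.

```c
#include <stddef.h>
#include <string.h>

/* Deliberately odd strcpy variant (illustration only): each
   destination byte is stepped by one until it equals the source byte.
   Assumes dst points to initialized memory that does not overlap src. */
char *strcpy_by_increment(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;   /* include the terminating NUL */
    for (size_t i = 0; i < n; i++) {
        unsigned char *d = (unsigned char *)&dst[i];
        unsigned char want = (unsigned char)src[i];
        while (*d != want)
            (*d)++;               /* unsigned arithmetic wraps, so this terminates */
    }
    return dst;
}
```

At return the destination compares equal to the source, which is all
the sequential post-condition asks for.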

> so strcpy is free to reorder and regroup loads
> and stores as it sees fit.  None of this steps out of its explicit
> specification.  Writing garbage, however, would step out, but it might
> still be allowed under the as-if rule if this couldn't be legitimately
> observed, e.g. if any attempt to observe it would invoke undefined
> behavior.

If that is how you interpret the language standard, then please tell me
why we need volatile.  Seriously.

> >> and the
> >> no-data-races mandate.
> 
> > For there to be data races, the standard must actually acknowledge that
> > things are supposed to work in a multi-threaded setting.  Could you
> > refer me to the respective wording too, please?
> 
> Why are you doing this?  You know where this requirement is in POSIX!
> 
> For a long time we have had threads as a POSIX add-on to C standards
> that said nothing about threads.  POSIX imports (defers to) standard C
> without conflicting with it.  It logically follows from these two
> statements that all requirements from standard C on strcpy
> implementations still apply under POSIX.  If you agree that a signal
> handler's ability to observe deviations from standard-mandated behavior

The question is still what standard-mandated behavior is.  Which you
don't constrain by bringing in POSIX.

> indicates deviation from the standard, rather than something that could
> be tolerated under the as-if rule, then this strcpy requirement carries
> over to the multi-thread extensions to C specified in POSIX.

Also, think about the compiler example with strcpy being replaced by the
compiler, that *you* brought up.  Do you think the compiler agrees with,
or is at least aware of your opinion, that it has to make strcpy
volatile because POSIX uses C and you interpret it to mean that every
handler can peek into intermediate state and find something that you
think makes sense?

A side note: We can keep discussing this, but I can't imagine any
further reasons you could bring up for why either the ctermid
MT-Safety annotation is incomplete or the glibc implementation of
ctermid is wrong.  In this discussion, you and I disagree, and Mark
seems to disagree with your reasoning as well.  Maybe you should get
other opinions, or just accept that your reasoning might not be as
fault-free as it may seem.

A second side note:  If the three of us have been discussing this for
ages, then obviously your MT-Safety note on ctermid is NOT easy to
understand.  There should be no question that something needs to
change.
If any of our documentation leads to disagreement among glibc
contributors, it's not clear enough.  Period.


* Re: Differences between man-pages and libc manual safety markings
       [not found]                     ` <or38adofh9.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
  2014-10-24 12:12                       ` Torvald Riegel
@ 2014-10-24 12:14                       ` Torvald Riegel
  1 sibling, 0 replies; 35+ messages in thread
From: Torvald Riegel @ 2014-10-24 12:14 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Peng Haitao, Carlos O'Donell, Michael Kerrisk (man-pages),
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, 2014-10-24 at 09:48 -0200, Alexandre Oliva wrote:
> Given this more detailed explanation of the conditions that apply and
> that IMHO make it perfectly safe, do you still see any concrete error
> situation here?

Oh, and I forgot to highlight the race detection issue again.
Unnecessarily introducing false positives with race detectors doesn't
sound useful to me...




* Re: Differences between man-pages and libc manual safety markings
       [not found] ` <544118FA.3070003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-10-20 15:47   ` Carlos O'Donell
@ 2014-10-21  8:31   ` Peng Haitao
  2015-01-07  6:12   ` Michael Kerrisk (man-pages)
  2015-01-07  6:16   ` Michael Kerrisk (man-pages)
  3 siblings, 0 replies; 35+ messages in thread
From: Peng Haitao @ 2014-10-21  8:31 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), Carlos O'Donell
  Cc: Alexandre Oliva,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org


On 10/17/2014 09:26 PM, Michael Kerrisk (man-pages) wrote:
> Hello Haitao,
> 
> I was comparing some of the MT-Safety markings in man-pages versus the glibc
> manual (https://www.gnu.org/software/libc/manual/html_mono/libc.html)
> I found four cases that seem to contradict. Are there errors in either
> the man pages or in the glibc manual?
> 
> ==
> ctermid.3       MT-Unsafe race:ctermid/!s
> 	glibc: MT-Safe
> 
> man-pages and glibc manual disagree (man-pages seems to be more
> precise than glibc).
> 
> ==
> getcwd.3        MT-Safe env
> 	glibc: MT-Safe
> 
> man-pages and glibc manual disagree on "env" (man-pages seems 
> to be more precise than glibc).
> 

In getcwd.3 man-page:
getcwd() and getwd() are "MT-Safe"
get_current_dir_name() is "MT-Safe env"

URL: http://thread.gmane.org/gmane.linux.man/6580

These annotations match the glibc manual :)
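
As a rough illustration of why the two markings differ (a sketch with
hypothetical helper names cwd_into and pwd_hint, not code from either
project): getcwd() only writes into the buffer the caller supplies,
while get_current_dir_name() is documented to return the value of PWD
when that variable is set and correct, so it also depends on the
environment not being modified concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical wrapper: getcwd() touches only the caller's buffer.
   Returns 0 on success, -1 on failure (for example, buffer too small). */
int cwd_into(char *out, size_t outlen)
{
    assert(out != NULL);
    return getcwd(out, outlen) != NULL ? 0 : -1;
}

/* Hypothetical helper showing the kind of access the "env" keyword
   refers to: a read of the process environment, which is only safe if
   no other thread modifies the environment (e.g. with setenv) at the
   same time.  May return NULL if PWD is unset. */
const char *pwd_hint(void)
{
    return getenv("PWD");
}
```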

> ==
> getlogin.3      MT-Unsafe race:cuserid/!string locale
> 	glibc: MT-Unsafe race:getlogin race:utent sig:ALRM timer locale
> 
> man-pages and glibc manual disagree on "race:cuserid/!string" versus
> "race:getlogin"
> 

In getlogin.3 man-page:
getlogin() is "MT-Unsafe locale"
getlogin_r() is "MT-Safe locale"
cuserid() is "MT-Unsafe race:cuserid/!string locale"

In glibc manual:
getlogin() is "MT-Unsafe race:getlogin race:utent sig:ALRM timer locale"
getlogin_r() is not documented
cuserid() is "MT-Safe locale"

The glibc manual is more precise than the man page for getlogin().
The cuserid() difference is similar to the ctermid() one.
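
For readers new to the notation, here is a small sketch of the
distinction, assuming the "/!s" and "/!string" suffixes mean "when the
buffer argument is NULL".  In that case the result is placed in a static
buffer shared by all callers; passing a caller-owned buffer avoids the
shared storage.  The wrapper name terminal_name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: copies the controlling-terminal pathname into a
   caller-owned buffer.  Returns 0 on success, -1 if no name is
   available or out is too small. */
int terminal_name(char *out, size_t outlen)
{
    char buf[L_ctermid];     /* per-call storage, never shared */

    assert(out != NULL);
    ctermid(buf);            /* non-NULL argument: result goes into buf */

    size_t len = strlen(buf);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, buf, len + 1);
    return 0;
}
```

Calling ctermid(NULL) instead would return a pointer to static storage,
which is the case the conditional race annotation is about.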

> ==
> regex.3         MT-Safe env
> 	glibc: MT-Safe locale
> 

In regex.3 man-page:
regcomp() and regexec() are "MT-Safe locale"
regerror() is "MT-Safe env"
regfree() is "MT-Safe"

URL: http://thread.gmane.org/gmane.linux.man/6609

These annotations match the glibc manual :)
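
For context, a minimal sketch of how the four annotated functions are
typically used together.  The wrapper name regex_matches and its return
convention are made up for illustration.  All state lives in the
caller's regex_t and buffers; the "locale" and "env" keywords concern
the locale and environment these calls read internally, not the
arguments.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper.  Returns 1 on match, 0 on no match, -1 on a
   compile or execution error.  On a compile error, a message from
   regerror() is written to errbuf if errbuf is non-NULL. */
int regex_matches(const char *pattern, const char *text,
                  char *errbuf, size_t errlen)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        if (errbuf != NULL && errlen > 0)
            regerror(rc, &re, errbuf, errlen);
        return -1;
    }

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);            /* release what regcomp() allocated */

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```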


-- 
Best Regards,
Peng

> man-pages and glibc manual disagree on "env" versus "locale"
> 
> ==
> 
> Cheers,
> 
> Michael
> 
> 


* Re: Differences between man-pages and libc manual safety markings
       [not found] ` <544118FA.3070003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-10-20 15:47   ` Carlos O'Donell
  2014-10-21  8:31   ` Peng Haitao
@ 2015-01-07  6:12   ` Michael Kerrisk (man-pages)
  2015-01-07  6:16   ` Michael Kerrisk (man-pages)
  3 siblings, 0 replies; 35+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-07  6:12 UTC (permalink / raw)
  To: Ma Shimiao
  Cc: Peng Haitao, mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Alexandre Oliva,
	Carlos O'Donell,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Qian Lei

[CC += Ma Shimiao]

Hello Ma Shimiao,

As the person taking over the ATTRIBUTES patch series for man-pages,
are you aware of the earlier mail thread below? I will CC you on
one or two of the other notable mails in this thread as well.

Thanks,

Michael

On 10/17/2014 03:26 PM, Michael Kerrisk (man-pages) wrote:
> Hello Haitao,
> 
> I was comparing some of the MT-Safety markings in man-pages versus the glibc
> manual (https://www.gnu.org/software/libc/manual/html_mono/libc.html)
> I found four cases that seem to contradict. Are there errors in either
> the man pages or in the glibc manual?
> 
> ==
> ctermid.3       MT-Unsafe race:ctermid/!s
> 	glibc: MT-Safe
> 
> man-pages and glibc manual disagree (man-pages seems to be more
> precise than glibc).
> 
> ==
> getcwd.3        MT-Safe env
> 	glibc: MT-Safe
> 
> man-pages and glibc manual disagree on "env" (man-pages seems 
> to be more precise than glibc).
> 
> ==
> getlogin.3      MT-Unsafe race:cuserid/!string locale
> 	glibc: MT-Unsafe race:getlogin race:utent sig:ALRM timer locale
> 
> man-pages and glibc manual disagree on "race:cuserid/!string" versus
> "race:getlogin"
> 
> ==
> regex.3         MT-Safe env
> 	glibc: MT-Safe locale
> 
> man-pages and glibc manual disagree on "env" versus "locale"
> 
> ==
> 
> Cheers,
> 
> Michael
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: Differences between man-pages and libc manual safety markings
       [not found] ` <544118FA.3070003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-01-07  6:12   ` Michael Kerrisk (man-pages)
@ 2015-01-07  6:16   ` Michael Kerrisk (man-pages)
  3 siblings, 0 replies; 35+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-07  6:16 UTC (permalink / raw)
  To: Peng Haitao, Ma Shimiao
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Alexandre Oliva,
	Carlos O'Donell,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Ma Shimiao,

You can see the whole discussion thread here:

http://thread.gmane.org/gmane.linux.man/7223

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

end of thread, other threads:[~2015-01-07  6:16 UTC | newest]

Thread overview: 35+ messages
2014-10-17 13:26 Differences between man-pages and libc manual safety markings Michael Kerrisk (man-pages)
     [not found] ` <544118FA.3070003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-10-20 15:47   ` Carlos O'Donell
     [not found]     ` <CAE2sS1jbGRT4uvBBVAPJkX2Mi4gHG=ii_G713MHhQzyGxO4yyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-21  8:53       ` Peng Haitao
     [not found]         ` <54461F16.2080705-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2014-10-23  6:16           ` Alexandre Oliva
     [not found]             ` <oroat3wbsl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-10-23  9:29               ` Torvald Riegel
     [not found]                 ` <1414056576.8483.79.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-10-24 11:48                   ` Alexandre Oliva
     [not found]                     ` <or38adofh9.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-10-24 12:12                       ` Torvald Riegel
     [not found]                         ` <1414152747.18538.26.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-10-24 16:31                           ` Alexandre Oliva
     [not found]                             ` <orioj9bfaa.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-10-24 19:15                               ` Torvald Riegel
     [not found]                                 ` <1414178101.18538.53.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-10-30 18:24                                   ` Alexandre Oliva
     [not found]                                     ` <orbnottnzb.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-10-30 19:01                                       ` Torvald Riegel
     [not found]                                         ` <1414695671.10085.180.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-11-01  8:48                                           ` Alexandre Oliva
     [not found]                                             ` <ora94b8fxl.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-11-01 10:47                                               ` Torvald Riegel
     [not found]                                                 ` <1414838867.10085.431.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-11-01 18:32                                                   ` Alexandre Oliva
     [not found]                                                     ` <orwq7e22n2.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-11-01 18:58                                                       ` Torvald Riegel
     [not found]                                                         ` <1414868298.10085.488.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-11-03  5:13                                                           ` Alexandre Oliva
     [not found]                                                             ` <or4mug27f7.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-11-03 16:10                                                               ` Torvald Riegel
     [not found]                                                                 ` <1415031006.4531.44.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-11-04  0:18                                                                   ` Alexandre Oliva
2014-10-27 20:46                               ` Mark Thompson
     [not found]                                 ` <544EAF20.8050509-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>
2014-10-29  8:55                                   ` Alexandre Oliva
     [not found]                                     ` <ork33jqmqe.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-10-29  9:12                                       ` Torvald Riegel
     [not found]                                         ` <1414573935.18538.74.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-10-30 18:00                                           ` Alexandre Oliva
     [not found]                                             ` <orfve5tp3e.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-10-30 18:41                                               ` Torvald Riegel
     [not found]                                                 ` <1414694486.10085.165.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-11-01  8:24                                                   ` Alexandre Oliva
     [not found]                                                     ` <oregtn8h23.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-11-01 12:40                                                       ` Torvald Riegel
     [not found]                                                         ` <1414845631.10085.474.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-11-01 18:22                                                           ` Alexandre Oliva
     [not found]                                                             ` <or1tpm3hn5.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-11-01 19:54                                                               ` Torvald Riegel
     [not found]                                                                 ` <1414871691.10085.529.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>
2014-11-03  5:43                                                                   ` Alexandre Oliva
     [not found]                                                                     ` <orzjc8zvn6.fsf-pcXFJVXz+5uzQB+pC5nmwQ@public.gmane.org>
2014-11-03 13:07                                                                       ` Mark Thompson
     [not found]                                                                         ` <54577E17.7000109-W77v16wj1OVeoWH0uzbU5w@public.gmane.org>
2014-11-19  0:26                                                                           ` Alexandre Oliva
2014-11-03 15:55                                                                       ` Torvald Riegel
2014-10-24 12:14                       ` Torvald Riegel
2014-10-21  8:31   ` Peng Haitao
2015-01-07  6:12   ` Michael Kerrisk (man-pages)
2015-01-07  6:16   ` Michael Kerrisk (man-pages)
