[PATCH/RFC] mingw: implement PTHREAD_MUTEX

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
@ 2011-10-25 14:55 Erik Faye-Lund
  2011-10-25 15:28 ` Johannes Sixt
  2011-10-26  3:05 ` Atsushi Nakagawa
  0 siblings, 2 replies; 13+ messages in thread
From: Erik Faye-Lund @ 2011-10-25 14:55 UTC (permalink / raw)
  To: msysgit; +Cc: git, johannes.schindelin, j.sixt

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---
Every now and then, someone suggest using PTHREAD_MUTEX_INITIALIZER,
but some times gets show down by our lack of support on Windows.

I'm working on something myself that could benefit from this, so I
gave it a stab. The result looks promising to me, but I haven't
really debugged it yet.

But is there a fundamental reason why we haven't done something like
this before? :)

 compat/win32/pthread.c |   28 +++++++++++++++++++++++++---
 compat/win32/pthread.h |   18 ++++++++++++------
 2 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
index 010e875..14f91d5 100644
--- a/compat/win32/pthread.c
+++ b/compat/win32/pthread.c
@@ -57,6 +57,28 @@ pthread_t pthread_self(void)
 	return t;
 }
 
+int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr)
+{
+	InitializeCriticalSection(&mutex->cs);
+	mutex->autoinit = 0;
+	return 0;
+}
+
+int pthread_mutex_lock(pthread_mutex_t *mutex)
+{
+	if (mutex->autoinit) {
+		if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
+			pthread_mutex_init(mutex, NULL);
+			mutex->autoinit = 0;
+		} else
+			while (mutex->autoinit != 0)
+				; /* wait for other thread */
+	}
+
+	EnterCriticalSection(&mutex->cs);
+	return 0;
+}
+
 int pthread_cond_init(pthread_cond_t *cond, const void *unused)
 {
 	cond->waiters = 0;
@@ -85,7 +107,7 @@ int pthread_cond_destroy(pthread_cond_t *cond)
 	return 0;
 }
 
-int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
+int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
 {
 	int last_waiter;
 
@@ -99,7 +121,7 @@ int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
 	 * waiters count above, so there's no problem with
 	 * leaving mutex unlocked before we wait on semaphore.
 	 */
-	LeaveCriticalSection(mutex);
+	LeaveCriticalSection(&mutex->cs);
 
 	/* let's wait - ignore return value */
 	WaitForSingleObject(cond->sema, INFINITE);
@@ -133,7 +155,7 @@ int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex)
 		 */
 	}
 	/* lock external mutex again */
-	EnterCriticalSection(mutex);
+	EnterCriticalSection(&mutex->cs);
 
 	return 0;
 }
diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 2e20548..647e6d4 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -16,14 +16,20 @@
 /*
  * Defines that adapt Windows API threads to pthreads API
  */
-#define pthread_mutex_t CRITICAL_SECTION
+typedef struct {
+	CRITICAL_SECTION cs;
+	LONG volatile autoinit;
+} pthread_mutex_t;
 
-#define pthread_mutex_init(a,b) (InitializeCriticalSection((a)), 0)
-#define pthread_mutex_destroy(a) DeleteCriticalSection((a))
-#define pthread_mutex_lock EnterCriticalSection
-#define pthread_mutex_unlock LeaveCriticalSection
+#define PTHREAD_MUTEX_INITIALIZER { { 0 }, 1 }
 
 typedef int pthread_mutexattr_t;
+
+int pthread_mutex_init(pthread_mutex_t *, const pthread_mutexattr_t *);
+#define pthread_mutex_destroy(a) DeleteCriticalSection(&(a)->cs)
+int pthread_mutex_lock(pthread_mutex_t *);
+#define pthread_mutex_unlock(a) LeaveCriticalSection(&(a)->cs)
+
 #define pthread_mutexattr_init(a) (*(a) = 0)
 #define pthread_mutexattr_destroy(a) do {} while (0)
 #define pthread_mutexattr_settype(a, t) 0
@@ -47,7 +53,7 @@ typedef struct {
 
 extern int pthread_cond_init(pthread_cond_t *cond, const void *unused);
 extern int pthread_cond_destroy(pthread_cond_t *cond);
-extern int pthread_cond_wait(pthread_cond_t *cond, CRITICAL_SECTION *mutex);
+extern int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
 extern int pthread_cond_signal(pthread_cond_t *cond);
 extern int pthread_cond_broadcast(pthread_cond_t *cond);
 
-- 
1.7.7.msysgit.1.1.g7b316

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-25 14:55 [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER Erik Faye-Lund
@ 2011-10-25 15:28 ` Johannes Sixt
  2011-10-25 15:42   ` Erik Faye-Lund
  2011-10-26  3:05 ` Atsushi Nakagawa
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Sixt @ 2011-10-25 15:28 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: msysgit, git, johannes.schindelin

Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
> +int pthread_mutex_lock(pthread_mutex_t *mutex)
> +{
> +	if (mutex->autoinit) {
> +		if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
> +			pthread_mutex_init(mutex, NULL);
> +			mutex->autoinit = 0;
> +		} else
> +			while (mutex->autoinit != 0)
> +				; /* wait for other thread */
> +	}

The double-checked locking idiom. Very suspicious. Can you explain why it
works in this case? Why are no Interlocked functions needed for the other
accesses of autoinit? ("It is volatile" is the wrong answer to this last
question, BTW.)

-- Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-25 15:28 ` Johannes Sixt
@ 2011-10-25 15:42   ` Erik Faye-Lund
  2011-10-25 20:07     ` [msysGit] " Johannes Sixt
  0 siblings, 1 reply; 13+ messages in thread
From: Erik Faye-Lund @ 2011-10-25 15:42 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: msysgit, git, johannes.schindelin

On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
>> +{
>> +     if (mutex->autoinit) {
>> +             if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>> +                     pthread_mutex_init(mutex, NULL);
>> +                     mutex->autoinit = 0;
>> +             } else
>> +                     while (mutex->autoinit != 0)
>> +                             ; /* wait for other thread */
>> +     }
>
> The double-checked locking idiom. Very suspicious. Can you explain why it
> works in this case? Why are no Interlocked functions needed for the other
> accesses of autoinit? ("It is volatile" is the wrong answer to this last
> question, BTW.)

I agree that it should look a bit suspicious; I'm generally skeptical
whenever I see 'volatile' in threading-code myself. But I think it's
the right answer in this case. "volatile" means that the compiler
cannot optimize away accesses, which is sufficient in this case.

Basically, the thread that gets the original 1 returned from
InterlockedCompareExchange is the only one who writes to
mutex->autoinit. All other threads only read the value, and the
volatile should make sure they actually do. Since all 32-bit reads and
writes are atomic on Windows (see
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx
"Simple reads and writes to properly-aligned 32-bit variables are
atomic operations.") and mutex->autoinit is a LONG, this should be
safe AFAICT. In fact, Windows specifically does not have any
explicitly atomic writes exactly for this reason.

The only ways mutex->autoinit can be updated is:
- InterlockedCompareExchange compares it to 1, finds it's identical
and inserts -1
- intialization is done
Both these updates happens from the same thread.

Yes, details like this should probably go into the commit message ;)

If there's something I'm missing, please let me know.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-25 15:42   ` Erik Faye-Lund
@ 2011-10-25 20:07     ` Johannes Sixt
  2011-10-25 20:51       ` Erik Faye-Lund
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Sixt @ 2011-10-25 20:07 UTC (permalink / raw)
  To: kusmabite; +Cc: msysgit, git, johannes.schindelin

Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
>>> +{
>>> +     if (mutex->autoinit) {
>>> +             if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>>> +                     pthread_mutex_init(mutex, NULL);
>>> +                     mutex->autoinit = 0;
>>> +             } else
>>> +                     while (mutex->autoinit != 0)
>>> +                             ; /* wait for other thread */
>>> +     }
>>
>> The double-checked locking idiom. Very suspicious. Can you explain why it
>> works in this case? Why are no Interlocked functions needed for the other
>> accesses of autoinit? ("It is volatile" is the wrong answer to this last
>> question, BTW.)
> 
> I agree that it should look a bit suspicious; I'm generally skeptical
> whenever I see 'volatile' in threading-code myself. But I think it's
> the right answer in this case. "volatile" means that the compiler
> cannot optimize away accesses, which is sufficient in this case.

No, it is not, and it took me a train ride to see what's wrong. It has
nothing to do with autoinit, but with all the other memory locations
that are written. See here, with pthread_mutex_init() inlined:

  if (mutex->autoinit) {

Assume two threads enter this block.

     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {

Only one thread, A, say on CPU A, will enter this block.

        InitializeCriticalSection(&mutex->cs);

Thread A writes some values. Note that there are no memory barriers
involved here. Not that I know of or that they would be documented.

        mutex->autoinit = 0;

And it writes another one. Thread A continues below to contend for the
mutex it just initialized.

     } else

Meanwhile, thread B, say on CPU B, spins in this loop:

        while (mutex->autoinit != 0)
           ; /* wait for other thread */

When thread B arrives here, it sees the value of autoinit that thread A
has written above.

HOWEVER, when it continues, there is NO [*] guarantee that it will also
see the values that InitializeCriticalSection() has written, because
there were no memory barriers involved. When it continues, there is a
chance that it calls EnterCriticalSection() with uninitialized values!

  }

[*] If you compile this code with MSVC >= 2005, "No guarantee" is not
true, it's exactly the opposite because Microsoft extended the meaning
of 'volatile' to imply a memory barriere. This is *NOT* true for gcc in
general. It may be true for MinGW gcc, but I do not know.

> Basically, the thread that gets the original 1 returned from
> InterlockedCompareExchange is the only one who writes to
> mutex->autoinit. All other threads only read the value, and the
> volatile should make sure they actually do. Since all 32-bit reads and
> writes are atomic on Windows (see
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx
> "Simple reads and writes to properly-aligned 32-bit variables are
> atomic operations.") and mutex->autoinit is a LONG, this should be
> safe AFAICT. In fact, Windows specifically does not have any
> explicitly atomic writes exactly for this reason.

There is a difference between atomic and coherent: Yes, 32-bit accesses
are atomic, but they are not automatically coherent: A 32-bit value
written by one CPU is not instantly visible on the other CPU. 'volatile'
as per the C lanugage does not add any guarantees that would be of
interest here. OTOH, Microsoft's definition of 'volatile' does.

> The only ways mutex->autoinit can be updated is:
> - InterlockedCompareExchange compares it to 1, finds it's identical
> and inserts -1
> - intialization is done
> Both these updates happens from the same thread.
> 
> Yes, details like this should probably go into the commit message ;)

A comment in the function is preferred!

-- Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-25 20:07     ` [msysGit] " Johannes Sixt
@ 2011-10-25 20:51       ` Erik Faye-Lund
  2011-10-25 21:13         ` Johannes Sixt
  2011-10-26  3:44         ` Kyle Moffett
  0 siblings, 2 replies; 13+ messages in thread
From: Erik Faye-Lund @ 2011-10-25 20:51 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: msysgit, git, johannes.schindelin

On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
>>>> +{
>>>> +     if (mutex->autoinit) {
>>>> +             if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>>>> +                     pthread_mutex_init(mutex, NULL);
>>>> +                     mutex->autoinit = 0;
>>>> +             } else
>>>> +                     while (mutex->autoinit != 0)
>>>> +                             ; /* wait for other thread */
>>>> +     }
>>>
>>> The double-checked locking idiom. Very suspicious. Can you explain why it
>>> works in this case? Why are no Interlocked functions needed for the other
>>> accesses of autoinit? ("It is volatile" is the wrong answer to this last
>>> question, BTW.)
>>
>> I agree that it should look a bit suspicious; I'm generally skeptical
>> whenever I see 'volatile' in threading-code myself. But I think it's
>> the right answer in this case. "volatile" means that the compiler
>> cannot optimize away accesses, which is sufficient in this case.
>
> No, it is not, and it took me a train ride to see what's wrong. It has
> nothing to do with autoinit, but with all the other memory locations
> that are written. See here, with pthread_mutex_init() inlined:
>
>  if (mutex->autoinit) {
>
> Assume two threads enter this block.
>
>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>
> Only one thread, A, say on CPU A, will enter this block.
>
>        InitializeCriticalSection(&mutex->cs);
>
> Thread A writes some values. Note that there are no memory barriers
> involved here. Not that I know of or that they would be documented.
>
>        mutex->autoinit = 0;
>
> And it writes another one. Thread A continues below to contend for the
> mutex it just initialized.
>
>     } else
>
> Meanwhile, thread B, say on CPU B, spins in this loop:
>
>        while (mutex->autoinit != 0)
>           ; /* wait for other thread */
>
> When thread B arrives here, it sees the value of autoinit that thread A
> has written above.
>
> HOWEVER, when it continues, there is NO [*] guarantee that it will also
> see the values that InitializeCriticalSection() has written, because
> there were no memory barriers involved. When it continues, there is a
> chance that it calls EnterCriticalSection() with uninitialized values!
>

Thanks for pointing this out, I completely forgot about write re-ordering.

This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
InterlockedExchange generates a full memory barrier:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx

>  }
>
>
> [*] If you compile this code with MSVC >= 2005, "No guarantee" is not
> true, it's exactly the opposite because Microsoft extended the meaning
> of 'volatile' to imply a memory barriere.

Do you have a source for this? I'm not saying it isn't true, I just
never heard of this, and would like to read up on it :)

> This is *NOT* true for gcc in
> general. It may be true for MinGW gcc, but I do not know.
>
>> Basically, the thread that gets the original 1 returned from
>> InterlockedCompareExchange is the only one who writes to
>> mutex->autoinit. All other threads only read the value, and the
>> volatile should make sure they actually do. Since all 32-bit reads and
>> writes are atomic on Windows (see
>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx
>> "Simple reads and writes to properly-aligned 32-bit variables are
>> atomic operations.") and mutex->autoinit is a LONG, this should be
>> safe AFAICT. In fact, Windows specifically does not have any
>> explicitly atomic writes exactly for this reason.
>
> There is a difference between atomic and coherent: Yes, 32-bit accesses
> are atomic, but they are not automatically coherent: A 32-bit value
> written by one CPU is not instantly visible on the other CPU. 'volatile'
> as per the C lanugage does not add any guarantees that would be of
> interest here. OTOH, Microsoft's definition of 'volatile' does.
>

I never meant to imply this either, I simply forgot about write re-ordering ;)

>> The only ways mutex->autoinit can be updated is:
>> - InterlockedCompareExchange compares it to 1, finds it's identical
>> and inserts -1
>> - intialization is done
>> Both these updates happens from the same thread.
>>
>> Yes, details like this should probably go into the commit message ;)
>
> A comment in the function is preferred!
>

Yes, good point. This is code that looks dangerous (and in this case
potentially was - hopefully eventual future iteration won't be), and
should of course be documented inline...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-25 20:51       ` Erik Faye-Lund
@ 2011-10-25 21:13         ` Johannes Sixt
  2011-10-26  3:44         ` Kyle Moffett
  1 sibling, 0 replies; 13+ messages in thread
From: Johannes Sixt @ 2011-10-25 21:13 UTC (permalink / raw)
  To: kusmabite; +Cc: msysgit, git, johannes.schindelin

Am 25.10.2011 22:51, schrieb Erik Faye-Lund:
> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> HOWEVER, when it continues, there is NO [*] guarantee that it will also
>> see the values that InitializeCriticalSection() has written, because
>> there were no memory barriers involved. When it continues, there is a
>> chance that it calls EnterCriticalSection() with uninitialized values!
>>
> 
> Thanks for pointing this out, I completely forgot about write re-ordering.
> 
> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
> InterlockedExchange generates a full memory barrier:
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx

That should do it.

>> [*] If you compile this code with MSVC >= 2005, "No guarantee" is not
>> true, it's exactly the opposite because Microsoft extended the meaning
>> of 'volatile' to imply a memory barriere.
> 
> Do you have a source for this? I'm not saying it isn't true, I just
> never heard of this, and would like to read up on it :)

http://msdn.microsoft.com/en-us/library/ms686355%28VS.85%29.aspx

-- Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-25 14:55 [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER Erik Faye-Lund
  2011-10-25 15:28 ` Johannes Sixt
@ 2011-10-26  3:05 ` Atsushi Nakagawa
  2011-10-26 13:08   ` [msysGit] " Erik Faye-Lund
  1 sibling, 1 reply; 13+ messages in thread
From: Atsushi Nakagawa @ 2011-10-26  3:05 UTC (permalink / raw)
  To: msysgit; +Cc: git, johannes.schindelin, j.sixt

[-- Attachment #1: Type: text/plain, Size: 1344 bytes --]

On Oct 25, 11:55 pm, Erik Faye-Lund <kusmab...@gmail.com> wrote:
> [...]
> +int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t 
*attr)
> +{
> +       InitializeCriticalSection(&mutex->cs);
> +       mutex->autoinit = 0;
> +       return 0;
> +}
> +
> +int pthread_mutex_lock(pthread_mutex_t *mutex)
> +{
> +       if (mutex->autoinit) {
> +               if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) 
!= -1) {

I'm making the assumption that mutex->autoinit starts off as 1 before 
things get multi-threaded..

I've only looked at what's in the patch so I could be missing vital 
context..  Anyways, is there a reason why you made this 
"InterlockedCompareExchange(..., -1, 1) != -1" and not 
"InterlockedCompareExchange(..., -1, 1) == 1"?

It looks to me the former adds a race condition after "if (mutex->autoinit) 
{".  e.g. A second thread could reinitialize mutex->cs after the first 
thread has already entered EnterCriticalSection(...).

> +                       pthread_mutex_init(mutex, NULL);
> +                       mutex->autoinit = 0;
> +               } else
> +                       while (mutex->autoinit != 0)
> +                               ; /* wait for other thread */
> +       }
> +
> +       EnterCriticalSection(&mutex->cs);
> +       return 0;
> +}
> [...]

-- 
Atsushi Nakagawa


[-- Attachment #2: Type: text/html, Size: 2272 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-25 20:51       ` Erik Faye-Lund
  2011-10-25 21:13         ` Johannes Sixt
@ 2011-10-26  3:44         ` Kyle Moffett
  2011-10-26 13:16           ` Erik Faye-Lund
  1 sibling, 1 reply; 13+ messages in thread
From: Kyle Moffett @ 2011-10-26  3:44 UTC (permalink / raw)
  To: kusmabite; +Cc: Johannes Sixt, msysgit, git, johannes.schindelin

On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
>>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
>>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
>>>>> +{
>>>>> +     if (mutex->autoinit) {
>>>>> +             if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>>>>> +                     pthread_mutex_init(mutex, NULL);
>>>>> +                     mutex->autoinit = 0;
>>>>> +             } else
>>>>> +                     while (mutex->autoinit != 0)
>>>>> +                             ; /* wait for other thread */
>>>>> +     }
>>>>
>>>> The double-checked locking idiom. Very suspicious. Can you explain why it
>>>> works in this case? Why are no Interlocked functions needed for the other
>>>> accesses of autoinit? ("It is volatile" is the wrong answer to this last
>>>> question, BTW.)
>>>
>>> I agree that it should look a bit suspicious; I'm generally skeptical
>>> whenever I see 'volatile' in threading-code myself. But I think it's
>>> the right answer in this case. "volatile" means that the compiler
>>> cannot optimize away accesses, which is sufficient in this case.
>>
>> No, it is not, and it took me a train ride to see what's wrong. It has
>> nothing to do with autoinit, but with all the other memory locations
>> that are written. See here, with pthread_mutex_init() inlined:
>>
>>  if (mutex->autoinit) {
>>
>> Assume two threads enter this block.
>>
>>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>>
>> Only one thread, A, say on CPU A, will enter this block.
>>
>>        InitializeCriticalSection(&mutex->cs);
>>
>> Thread A writes some values. Note that there are no memory barriers
>> involved here. Not that I know of or that they would be documented.
>>
>>        mutex->autoinit = 0;
>>
>> And it writes another one. Thread A continues below to contend for the
>> mutex it just initialized.
>>
>>     } else
>>
>> Meanwhile, thread B, say on CPU B, spins in this loop:
>>
>>        while (mutex->autoinit != 0)
>>           ; /* wait for other thread */
>>
>> When thread B arrives here, it sees the value of autoinit that thread A
>> has written above.
>>
>> HOWEVER, when it continues, there is NO [*] guarantee that it will also
>> see the values that InitializeCriticalSection() has written, because
>> there were no memory barriers involved. When it continues, there is a
>> chance that it calls EnterCriticalSection() with uninitialized values!
>>
>
> Thanks for pointing this out, I completely forgot about write re-ordering.
>
> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
> InterlockedExchange generates a full memory barrier:
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx

No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)

A write barrier in one thread is only effective if it is paired with a
read barrier in the other thread.

Since there's no read barrier in the "while(mutex->autoinit != 0)",
you don't have any guaranteed ordering.

I guess if MSVC assumes that volatile reads imply barriers then it might work...

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-26  3:05 ` Atsushi Nakagawa
@ 2011-10-26 13:08   ` Erik Faye-Lund
  0 siblings, 0 replies; 13+ messages in thread
From: Erik Faye-Lund @ 2011-10-26 13:08 UTC (permalink / raw)
  To: msysgit; +Cc: git, johannes.schindelin, j.sixt

On Wed, Oct 26, 2011 at 5:05 AM, Atsushi Nakagawa <atnak@chejz.com> wrote:
> On Oct 25, 11:55 pm, Erik Faye-Lund <kusmab...@gmail.com> wrote:
>> [...]
>> +int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t
>> *attr)
>> +{
>> +       InitializeCriticalSection(&mutex->cs);
>> +       mutex->autoinit = 0;
>> +       return 0;
>> +}
>> +
>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
>> +{
>> +       if (mutex->autoinit) {
>> +               if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) !=
>> -1) {
> I'm making the assumption that mutex->autoinit starts off as 1 before things
> get multi-threaded..
> I've only looked at what's in the patch so I could be missing vital
> context..  Anyways, is there a reason why you made this
> "InterlockedCompareExchange(..., -1, 1) != -1" and not
> "InterlockedCompareExchange(..., -1, 1) == 1"?

No, not really.

> It looks to me the former adds a race condition after "if (mutex->autoinit)
> {".  e.g. A second thread could reinitialize mutex->cs after the first
> thread has already entered EnterCriticalSection(...).

You are indeed correct, thanks for spotting :)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-26  3:44         ` Kyle Moffett
@ 2011-10-26 13:16           ` Erik Faye-Lund
  2011-10-27 23:00             ` Atsushi Nakagawa
  0 siblings, 1 reply; 13+ messages in thread
From: Erik Faye-Lund @ 2011-10-26 13:16 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Johannes Sixt, msysgit, git, johannes.schindelin

On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>>> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
>>>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>>>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
>>>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
>>>>>> +{
>>>>>> +     if (mutex->autoinit) {
>>>>>> +             if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>>>>>> +                     pthread_mutex_init(mutex, NULL);
>>>>>> +                     mutex->autoinit = 0;
>>>>>> +             } else
>>>>>> +                     while (mutex->autoinit != 0)
>>>>>> +                             ; /* wait for other thread */
>>>>>> +     }
>>>>>
>>>>> The double-checked locking idiom. Very suspicious. Can you explain why it
>>>>> works in this case? Why are no Interlocked functions needed for the other
>>>>> accesses of autoinit? ("It is volatile" is the wrong answer to this last
>>>>> question, BTW.)
>>>>
>>>> I agree that it should look a bit suspicious; I'm generally skeptical
>>>> whenever I see 'volatile' in threading-code myself. But I think it's
>>>> the right answer in this case. "volatile" means that the compiler
>>>> cannot optimize away accesses, which is sufficient in this case.
>>>
>>> No, it is not, and it took me a train ride to see what's wrong. It has
>>> nothing to do with autoinit, but with all the other memory locations
>>> that are written. See here, with pthread_mutex_init() inlined:
>>>
>>>  if (mutex->autoinit) {
>>>
>>> Assume two threads enter this block.
>>>
>>>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
>>>
>>> Only one thread, A, say on CPU A, will enter this block.
>>>
>>>        InitializeCriticalSection(&mutex->cs);
>>>
>>> Thread A writes some values. Note that there are no memory barriers
>>> involved here. Not that I know of or that they would be documented.
>>>
>>>        mutex->autoinit = 0;
>>>
>>> And it writes another one. Thread A continues below to contend for the
>>> mutex it just initialized.
>>>
>>>     } else
>>>
>>> Meanwhile, thread B, say on CPU B, spins in this loop:
>>>
>>>        while (mutex->autoinit != 0)
>>>           ; /* wait for other thread */
>>>
>>> When thread B arrives here, it sees the value of autoinit that thread A
>>> has written above.
>>>
>>> HOWEVER, when it continues, there is NO [*] guarantee that it will also
>>> see the values that InitializeCriticalSection() has written, because
>>> there were no memory barriers involved. When it continues, there is a
>>> chance that it calls EnterCriticalSection() with uninitialized values!
>>>
>>
>> Thanks for pointing this out, I completely forgot about write re-ordering.
>>
>> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
>> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
>> InterlockedExchange generates a full memory barrier:
>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
>
> No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
>
> A write barrier in one thread is only effective if it is paired with a
> read barrier in the other thread.
>
> Since there's no read barrier in the "while(mutex->autoinit != 0)",
> you don't have any guaranteed ordering.
>
> I guess if MSVC assumes that volatile reads imply barriers then it might work...

OK, so I should probably do something like this instead?

while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
	; /* wait for other thread */

I really appreciate getting some extra eyes on this, thanks.
Concurrent programming is not my strong-suit (as this exercise has
shown) ;)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-26 13:16           ` Erik Faye-Lund
@ 2011-10-27 23:00             ` Atsushi Nakagawa
  2011-10-27 23:20               ` Kyle Moffett
  0 siblings, 1 reply; 13+ messages in thread
From: Atsushi Nakagawa @ 2011-10-27 23:00 UTC (permalink / raw)
  To: kusmabite; +Cc: Kyle Moffett, Johannes Sixt, msysgit, git, johannes.schindelin

Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> On Tue, Oct 25, 2011 at 10:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> >>> Am 25.10.2011 17:42, schrieb Erik Faye-Lund:
> >>>> On Tue, Oct 25, 2011 at 5:28 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> >>>>> Am 10/25/2011 16:55, schrieb Erik Faye-Lund:
> >>>>>> +int pthread_mutex_lock(pthread_mutex_t *mutex)
> >>>>>> +{
> >>>>>> + [snip]
> >>>>>
> >>>>> The double-checked locking idiom. Very suspicious. Can you explain why it
> >>> [snip]
> >>>
> >>>  if (mutex->autoinit) {
> >>>
> >>> Assume two threads enter this block.
> >>>
> >>>     if (InterlockedCompareExchange(&mutex->autoinit, -1, 1) != -1) {
> >>>
> >>> Only one thread, A, say on CPU A, will enter this block.
> >>>
> >>>        InitializeCriticalSection(&mutex->cs);
> >>>
> >>> Thread A writes some values. Note that there are no memory barriers
> >>> involved here. Not that I know of or that they would be documented.
> >>>
> >>>        mutex->autoinit = 0;
> >>>
> >>> And it writes another one. Thread A continues below to contend for the
> >>> mutex it just initialized.
> >>>
> >>>     } else
> >>>
> >>> Meanwhile, thread B, say on CPU B, spins in this loop:
> >>>
> >>>        while (mutex->autoinit != 0)
> >>>           ; /* wait for other thread */
> >>>
> >>> When thread B arrives here, it sees the value of autoinit that thread A
> >>> has written above.
> >>>
> >>> [snip]
> >>>
> >>
> >> Thanks for pointing this out, I completely forgot about write re-ordering.
> >>
> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
> >> InterlockedExchange generates a full memory barrier:
> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
> >
> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
> >
> > A write barrier in one thread is only effective if it is paired with a
> > read barrier in the other thread.
> >
> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
> > you don't have any guaranteed ordering.

Out of curiosity, where could re-ordering be a problem here?  I'm
thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
of "mutex->cs" not being propagated to the waiting thread.  However,
shouldn't that be a non-problem, as far as compiler reordering goes,
because it's an external function call and only the address of mutex->cs
is passed?

The only other cause I could think of is if ordering at the CPU was
somehow different (it could be if there're no special provisions for
calling external functions) or if "InterlockedExchange(&mutex->autoinit,
0)" wasn't atomic in updating autoinit and doing the memory barrier.

Either way, I couldn't vouch for the safety of the above logic without
a memory barrier so this question is purely of an academical nature. :)

> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
> 
> OK, so I should probably do something like this instead?
> 
> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
> 	; /* wait for other thread */

Technically, assuming only the updating of "mutex->cs" is in question,
the ICE should only be required once after exiting the loop...

There's a question of the propagation of the value of "mutex->autoinit"
itself, but my take is that the memory barrier on the writing thread
will push out the updated value across all CPUs, thus preventing an
infinite loop.  The other factors, value caching and loop optimization
by the compiler, should be prevented by the "volatile" keyword even with
gcc or MSVC 2003.

> I really appreciate getting some extra eyes on this, thanks.
> Concurrent programming is not my strong-suit (as this exercise has
> shown) ;)

So would I. :)

-- 
Atsushi Nakagawa
<atnak@chejz.com>
Changes are made when there is inconvenience.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-27 23:00             ` Atsushi Nakagawa
@ 2011-10-27 23:20               ` Kyle Moffett
  2011-10-28 18:35                 ` Atsushi Nakagawa
  0 siblings, 1 reply; 13+ messages in thread
From: Kyle Moffett @ 2011-10-27 23:20 UTC (permalink / raw)
  To: Atsushi Nakagawa
  Cc: kusmabite, Johannes Sixt, msysgit, git, johannes.schindelin

On Thu, Oct 27, 2011 at 19:00, Atsushi Nakagawa <atnak@chejz.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
>> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> >> Thanks for pointing this out, I completely forgot about write re-ordering.
>> >>
>> >> This is indeed a problem. So, shouldn't replacing "mutex->autoinit =
>> >> 0;" with "InterlockedExchange(&mutex->autoinit, 0)" solve the problem?
>> >> InterlockedExchange generates a full memory barrier:
>> >> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
>> >
>> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
>> >
>> > A write barrier in one thread is only effective if it is paired with a
>> > read barrier in the other thread.
>> >
>> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
>> > you don't have any guaranteed ordering.
>
> Out of curiosity, where could re-ordering be a problem here?  I'm
> thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
> of "mutex->cs" not being propagated to the waiting thread.  However,
> shouldn't that be a non-problem, as far as compiler reordering goes,
> because it's an external function call and only the address of mutex->cs
> is passed?
>
> The only other cause I could think of is if ordering at the CPU was
> somehow different (it could be if there're no special provisions for
> calling external functions) or if "InterlockedExchange(&mutex->autoinit,
> 0)" wasn't atomic in updating autoinit and doing the memory barrier.
>
> Either way, I couldn't vouch for the safety of the above logic without
> a memory barrier so this question is purely of an academical nature. :)
>
>> > I guess if MSVC assumes that volatile reads imply barriers then it might work...
>>
>> OK, so I should probably do something like this instead?
>>
>> while (InterlockedCompareExchange(&mutex->autoinit, 0, 0) != 0)
>>       ; /* wait for other thread */
>
> Technically, assuming only the updating of "mutex->cs" is in question,
> the ICE should only be required once after exiting the loop...
>
> There's a question of the propagation of the value of "mutex->autoinit"
> itself, but my take is that the memory barrier on the writing thread
> will push out the updated value across all CPUs, thus preventing an
> infinite loop.  The other factors, value caching and loop optimization
> by the compiler, should be prevented by the "volatile" keyword even with
> gcc or MSVC 2003.
>
>> I really appreciate getting some extra eyes on this, thanks.
>> Concurrent programming is not my strong-suit (as this exercise has
>> shown) ;)
>
> So would I. :)

Ok, so here's the race condition:

Thread1				Thread2
				/* Speculative prefetch */
				prefetch(*mutex);

if (mutex->autoinit) {
if (ICE(&mutex->autoinit, -1, 1) != -1) {
/* Now mutex->autoinit == -1 */
pthread_mutex_init(mutex, NULL);
/* This forces writes out to memory */
ICE(&mutex->autoinit, 0, -1);

				if (mutex->autoinit) {} /* false */
				/* No read barrier here */
				EnterCriticalSection(&mutex->cs);
				/* Use cached mutex->cs from earlier */

Even though you forced the memory write to be ordered in Thread 1 you
did not ensure that the read of autoinit occurred before the read of
mutex->cs in Thread 2.  If the first thing that EnterCriticalSection
does is follow a pointer or read a mutex key value in mutex->cs then
won't necessarily get the right answer.

The rule of memory barriers is the ALWAYS come in pairs.  This simple
example guarantees that Thread2 will read "tmp_a" == 1 if it
previously read "tmp_b" == 1, although getting "tmp_a" == 1 and
"tmp_b" != 1 is still possible.

Thread1:
a = 1;
write_barrier();
b = 1;

Thread2:
tmp_b = b;
read_barrier();
tmp_a = a;

I think there's a Documentation/memory-barriers.txt file in the kernel
source code with more helpful info.

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
  2011-10-27 23:20               ` Kyle Moffett
@ 2011-10-28 18:35                 ` Atsushi Nakagawa
  0 siblings, 0 replies; 13+ messages in thread
From: Atsushi Nakagawa @ 2011-10-28 18:35 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: kusmabite, Johannes Sixt, msysgit, git, johannes.schindelin

Thanks for the explanation. :)

Kyle Moffett <kyle@moffetthome.net> wrote:
> On Thu, Oct 27, 2011 at 19:00, Atsushi Nakagawa <atnak@chejz.com> wrote:
> > Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> >> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> >> [...]
> >> >
> >> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
> >> >
> >> > A write barrier in one thread is only effective if it is paired with a
> >> > read barrier in the other thread.
> >> >
> >> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
> >> > you don't have any guaranteed ordering.
> >
> > Out of curiosity, where could re-ordering be a problem here?  I'm
> > thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
> > of "mutex->cs" not being propagated to the waiting thread.  However,
> > shouldn't that be a non-problem, as far as compiler reordering goes,
> > because it's an external function call and only the address of mutex->cs
> > is passed?
> >
> > [...]
> 
> Ok, so here's the race condition:
> 
> Thread1				Thread2
> 				/* Speculative prefetch */
> 				prefetch(*mutex);
> 
> if (mutex->autoinit) {
> if (ICE(&mutex->autoinit, -1, 1) != -1) {
> /* Now mutex->autoinit == -1 */
> pthread_mutex_init(mutex, NULL);
> /* This forces writes out to memory */
> ICE(&mutex->autoinit, 0, -1);
> 
> 				if (mutex->autoinit) {} /* false */
> 				/* No read barrier here */
> 				EnterCriticalSection(&mutex->cs);
> 				/* Use cached mutex->cs from earlier */

Ok, so there's no way of skimping on that one memory barrier in every
visit to pthread_mutex_lock().  Interesting.  Makes me wonder how it
trades off to lazy initialization.
> 
> Even though you forced the memory write to be ordered in Thread 1 you
> did not ensure that the read of autoinit occurred before the read of
> mutex->cs in Thread 2.  If the first thing that EnterCriticalSection
> does is follow a pointer or read a mutex key value in mutex->cs then
> won't necessarily get the right answer.
> 
> The rule of memory barriers is the ALWAYS come in pairs.  This simple
> example guarantees that Thread2 will read "tmp_a" == 1 if it
> previously read "tmp_b" == 1, although getting "tmp_a" == 1 and
> "tmp_b" != 1 is still possible.
> 
> Thread1:
> a = 1;
> write_barrier();
> b = 1;
> 
> Thread2:
> tmp_b = b;
> read_barrier();
> tmp_a = a;
> 
> I think there's a Documentation/memory-barriers.txt file in the kernel
> source code with more helpful info.

-- 
Atsushi Nakagawa
<atnak@chejz.com>
Changes are made when there is inconvenience.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-10-28 18:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-25 14:55 [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER Erik Faye-Lund
2011-10-25 15:28 ` Johannes Sixt
2011-10-25 15:42   ` Erik Faye-Lund
2011-10-25 20:07     ` [msysGit] " Johannes Sixt
2011-10-25 20:51       ` Erik Faye-Lund
2011-10-25 21:13         ` Johannes Sixt
2011-10-26  3:44         ` Kyle Moffett
2011-10-26 13:16           ` Erik Faye-Lund
2011-10-27 23:00             ` Atsushi Nakagawa
2011-10-27 23:20               ` Kyle Moffett
2011-10-28 18:35                 ` Atsushi Nakagawa
2011-10-26  3:05 ` Atsushi Nakagawa
2011-10-26 13:08   ` [msysGit] " Erik Faye-Lund

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).