public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Question about the barrier() in hlist_nulls_for_each_entry_rcu()
@ 2023-07-20 18:53 Alan Huang
  2023-07-20 19:22 ` Eric Dumazet
  2023-07-21 11:51 ` David Laight
  0 siblings, 2 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-20 18:53 UTC (permalink / raw)
  To: linux-kernel, netdev, rcu; +Cc: Paul E. McKenney, Eric Dumazet, roman.gushchin

Hi,

I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
and a related discussion [1].

After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
ACCESS_ONCE(), so my question is:

	Is that a compiler bug? If so, has this bug been fixed today, ten years later? 
	
	What about READ_ONCE(ptr->field)?


[1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/

Thanks,
Alan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-20 18:53 Question about the barrier() in hlist_nulls_for_each_entry_rcu() Alan Huang
@ 2023-07-20 19:22 ` Eric Dumazet
  2023-07-20 19:59   ` Alan Huang
  2023-07-21 11:51 ` David Laight
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2023-07-20 19:22 UTC (permalink / raw)
  To: Alan Huang; +Cc: linux-kernel, netdev, rcu, Paul E. McKenney, roman.gushchin

On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>
> Hi,
>
> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
> and a related discussion [1].
>
> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
> ACCESS_ONCE(), so my question is:
>
>         Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>
>         What about READ_ONCE(ptr->field)?

Make sure sparse is happy.

Do you have a patch for review ?


>
>
> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
>
> Thanks,
> Alan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-20 19:22 ` Eric Dumazet
@ 2023-07-20 19:59   ` Alan Huang
  2023-07-20 21:11     ` Eric Dumazet
  2023-07-21 12:54     ` Joel Fernandes
  0 siblings, 2 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-20 19:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev, rcu, Paul E. McKenney, roman.gushchin


> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
> 
> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>> and a related discussion [1].
>> 
>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>> ACCESS_ONCE(), so my question is:
>> 
>>        Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>> 
>>        What about READ_ONCE(ptr->field)?
> 
> Make sure sparse is happy.

It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:

	https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/

So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc 
decides not to reload ptr->field?

> 
> Do you have a patch for review ?

Possibly next month. :)

> 
> 
>> 
>> 
>> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
>> 
>> Thanks,
>> Alan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-20 19:59   ` Alan Huang
@ 2023-07-20 21:11     ` Eric Dumazet
  2023-07-21 14:31       ` Alan Huang
  2023-07-21 12:54     ` Joel Fernandes
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2023-07-20 21:11 UTC (permalink / raw)
  To: Alan Huang; +Cc: linux-kernel, netdev, rcu, Paul E. McKenney, roman.gushchin

On Thu, Jul 20, 2023 at 10:00 PM Alan Huang <mmpgouride@gmail.com> wrote:
>
>
> > 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
> >
> > On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
> >> and a related discussion [1].
> >>
> >> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
> >> ACCESS_ONCE(), so my question is:
> >>
> >>        Is that a compiler bug? If so, has this bug been fixed today, ten years later?
> >>
> >>        What about READ_ONCE(ptr->field)?
> >
> > Make sure sparse is happy.
>
> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>
>         https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>
> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
> decides not to reload ptr->field?

I can not really answer without seeing an actual patch...

Why are you asking ? Are you tracking compiler bug fixes ?

>
> >
> > Do you have a patch for review ?
>
> Possibly next month. :)
>
> >
> >
> >>
> >>
> >> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
> >>
> >> Thanks,
> >> Alan
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-20 18:53 Question about the barrier() in hlist_nulls_for_each_entry_rcu() Alan Huang
  2023-07-20 19:22 ` Eric Dumazet
@ 2023-07-21 11:51 ` David Laight
  2023-07-21 15:55   ` Alan Huang
  1 sibling, 1 reply; 26+ messages in thread
From: David Laight @ 2023-07-21 11:51 UTC (permalink / raw)
  To: 'Alan Huang', linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, rcu@vger.kernel.org
  Cc: Paul E. McKenney, Eric Dumazet, roman.gushchin@linux.dev

From: Alan Huang
> Sent: 20 July 2023 19:54
> 
> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
> and a related discussion [1].

Hmmm... that was all about the retry loop in ipv4/udp.c

AFAICT that retry got deleted by ca065d0c.

That also changes the list from hlist_nulls_xxx to hlist_xxx.
(I'm not sure of the difference)

This might be why we're seeing unexpected 'port unreachable' messages?

Quite why that has just started happening is another issue.
Most of the UDP sockets we create aren't 'connected' so I don't
believe they get moved between hash chains - just deleted.
The deletion should leave the hash chain intact.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-20 19:59   ` Alan Huang
  2023-07-20 21:11     ` Eric Dumazet
@ 2023-07-21 12:54     ` Joel Fernandes
  2023-07-21 14:27       ` Alan Huang
  1 sibling, 1 reply; 26+ messages in thread
From: Joel Fernandes @ 2023-07-21 12:54 UTC (permalink / raw)
  To: Alan Huang
  Cc: Eric Dumazet, linux-kernel, netdev, rcu, Paul E. McKenney,
	roman.gushchin



> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
> 
> 
>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>> 
>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>> and a related discussion [1].
>>> 
>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>> ACCESS_ONCE(), so my question is:
>>> 
>>>       Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>> 
>>>       What about READ_ONCE(ptr->field)?
>> 
>> Make sure sparse is happy.
> 
> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
> 
>    https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
> 
> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc 
> decides not to reload ptr->field?

I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.

But hey, if you want to float the idea…

Thanks,

 - Joel

> 
>> 
>> Do you have a patch for review ?
> 
> Possibly next month. :)
> 
>> 
>> 
>>> 
>>> 
>>> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
>>> 
>>> Thanks,
>>> Alan
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 12:54     ` Joel Fernandes
@ 2023-07-21 14:27       ` Alan Huang
  2023-07-21 15:21         ` Joel Fernandes
  2023-07-31 20:09         ` Paul E. McKenney
  0 siblings, 2 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-21 14:27 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Eric Dumazet, linux-kernel, netdev, rcu, Paul E. McKenney,
	roman.gushchin


> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
> 
> 
> 
>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>> 
>> 
>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>> 
>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>> and a related discussion [1].
>>>> 
>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>> ACCESS_ONCE(), so my question is:
>>>> 
>>>>      Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>> 
>>>>      What about READ_ONCE(ptr->field)?
>>> 
>>> Make sure sparse is happy.
>> 
>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>> 
>>   https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>> 
>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc 
>> decides not to reload ptr->field?
> 
> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
> 
> But hey, if you want to float the idea…

We already had the READ_ONCE() in rcu_deference_raw().

The barrier() here makes me think we need write code like below:
	
	READ_ONCE(head->first);
	barrier();
	READ_ONCE(head->first);

With READ_ONCE (or the deprecated ACCESS_ONCE),
I don’t think a compiler should cache the value of head->first.

> 
> Thanks,
> 
> - Joel
> 
>> 
>>> 
>>> Do you have a patch for review ?
>> 
>> Possibly next month. :)
>> 
>>> 
>>> 
>>>> 
>>>> 
>>>> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
>>>> 
>>>> Thanks,
>>>> Alan
>> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-20 21:11     ` Eric Dumazet
@ 2023-07-21 14:31       ` Alan Huang
  2023-07-21 14:47         ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Huang @ 2023-07-21 14:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev, rcu, Paul E. McKenney, roman.gushchin


> 2023年7月21日 05:11,Eric Dumazet <edumazet@google.com> 写道:
> 
> On Thu, Jul 20, 2023 at 10:00 PM Alan Huang <mmpgouride@gmail.com> wrote:
>> 
>> 
>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>> 
>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>> and a related discussion [1].
>>>> 
>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>> ACCESS_ONCE(), so my question is:
>>>> 
>>>>       Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>> 
>>>>       What about READ_ONCE(ptr->field)?
>>> 
>>> Make sure sparse is happy.
>> 
>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>> 
>>        https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>> 
>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>> decides not to reload ptr->field?
> 
> I can not really answer without seeing an actual patch...

The content of the potential patch:

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 89186c499dd4..bcd39670f359 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -158,15 +158,9 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
  * @pos:       the &struct hlist_nulls_node to use as a loop cursor.
  * @head:      the head of the list.
  * @member:    the name of the hlist_nulls_node within the struct.
- *
- * The barrier() is needed to make sure compiler doesn't cache first element [1],
- * as this loop can be restarted [2]
- * [1] Documentation/memory-barriers.txt around line 1533
- * [2] Documentation/RCU/rculist_nulls.rst around line 146
  */
 #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
-       for (({barrier();}),                                                    \
-            pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
+       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
                (!is_a_nulls(pos)) &&                                           \
                ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
                pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
@@ -180,8 +174,7 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
  * @member:    the name of the hlist_nulls_node within the struct.
  */
 #define hlist_nulls_for_each_entry_safe(tpos, pos, head, member)               \
-       for (({barrier();}),                                                    \
-            pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
+       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
                (!is_a_nulls(pos)) &&                                           \
                ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member);        \
                   pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)); 1; });)


> 
> Why are you asking ? Are you tracking compiler bug fixes ?

The barrier() here makes me confused.
 
If we really need that, do we need:

	READ_ONCE(head->first);
	barrier();
	READ_ONCE(head->first);

?

> 
>> 
>>> 
>>> Do you have a patch for review ?
>> 
>> Possibly next month. :)
>> 
>>> 
>>> 
>>>> 
>>>> 
>>>> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
>>>> 
>>>> Thanks,
>>>> Alan



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 14:31       ` Alan Huang
@ 2023-07-21 14:47         ` Eric Dumazet
  2023-07-21 15:21           ` Alan Huang
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2023-07-21 14:47 UTC (permalink / raw)
  To: Alan Huang; +Cc: linux-kernel, netdev, rcu, Paul E. McKenney, roman.gushchin

On Fri, Jul 21, 2023 at 4:31 PM Alan Huang <mmpgouride@gmail.com> wrote:
>
>
> > 2023年7月21日 05:11,Eric Dumazet <edumazet@google.com> 写道:
> >
> > On Thu, Jul 20, 2023 at 10:00 PM Alan Huang <mmpgouride@gmail.com> wrote:
> >>
> >>
> >>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
> >>>
> >>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
> >>>> and a related discussion [1].
> >>>>
> >>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
> >>>> ACCESS_ONCE(), so my question is:
> >>>>
> >>>>       Is that a compiler bug? If so, has this bug been fixed today, ten years later?
> >>>>
> >>>>       What about READ_ONCE(ptr->field)?
> >>>
> >>> Make sure sparse is happy.
> >>
> >> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
> >>
> >>        https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
> >>
> >> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
> >> decides not to reload ptr->field?
> >
> > I can not really answer without seeing an actual patch...
>
> The content of the potential patch:
>
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 89186c499dd4..bcd39670f359 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -158,15 +158,9 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
>   * @pos:       the &struct hlist_nulls_node to use as a loop cursor.
>   * @head:      the head of the list.
>   * @member:    the name of the hlist_nulls_node within the struct.
> - *
> - * The barrier() is needed to make sure compiler doesn't cache first element [1],
> - * as this loop can be restarted [2]
> - * [1] Documentation/memory-barriers.txt around line 1533
> - * [2] Documentation/RCU/rculist_nulls.rst around line 146
>   */
>  #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
> -       for (({barrier();}),                                                    \
> -            pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> +       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>                 (!is_a_nulls(pos)) &&                                           \
>                 ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>                 pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
> @@ -180,8 +174,7 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
>   * @member:    the name of the hlist_nulls_node within the struct.
>   */
>  #define hlist_nulls_for_each_entry_safe(tpos, pos, head, member)               \
> -       for (({barrier();}),                                                    \
> -            pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
> +       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>                 (!is_a_nulls(pos)) &&                                           \
>                 ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member);        \
>                    pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)); 1; });)
>
>
> >
> > Why are you asking ? Are you tracking compiler bug fixes ?
>
> The barrier() here makes me confused.
>
> If we really need that, do we need:
>
>         READ_ONCE(head->first);
>         barrier();
>         READ_ONCE(head->first);
>

Nope, the patch you want to revert (while it did fix (by pure luck
???) a real bug back in the days) was replacing

ACCESS_ONCE()

by

barrier();
ACCESS_ONCE();

(There is one ACCESS_ONCE(), not two of them)

BTW,
  barrier();
  followed by an arbitrary number of barrier(); back to back,
translates to one barrier()

Frankly, I would not change the code, unless someone can explain what
was the issue.
(Perhaps there was a missing barrier elsewhere)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 14:27       ` Alan Huang
@ 2023-07-21 15:21         ` Joel Fernandes
  2023-07-21 15:54           ` Alan Huang
                             ` (2 more replies)
  2023-07-31 20:09         ` Paul E. McKenney
  1 sibling, 3 replies; 26+ messages in thread
From: Joel Fernandes @ 2023-07-21 15:21 UTC (permalink / raw)
  To: Alan Huang
  Cc: Eric Dumazet, linux-kernel, netdev, rcu, Paul E. McKenney,
	roman.gushchin

On 7/21/23 10:27, Alan Huang wrote:
> 
>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>
>>
>>
>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>
>>> 
>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>
>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>> and a related discussion [1].
>>>>>
>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>> ACCESS_ONCE(), so my question is:
>>>>>
>>>>>       Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>
>>>>>       What about READ_ONCE(ptr->field)?
>>>>
>>>> Make sure sparse is happy.
>>>
>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>
>>>    https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>
>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>> decides not to reload ptr->field?
>>
>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>
>> But hey, if you want to float the idea…
> 
> We already had the READ_ONCE() in rcu_deference_raw().
> 
> The barrier() here makes me think we need write code like below:
> 	
> 	READ_ONCE(head->first);
> 	barrier();
> 	READ_ONCE(head->first);
> 
> With READ_ONCE (or the deprecated ACCESS_ONCE),
> I don’t think a compiler should cache the value of head->first.


Right, it shouldn't need to cache. To Eric's point it might be risky to remove 
the barrier() and someone needs to explain that issue first (or IMO there needs 
to be another tangible reason like performance etc). Anyway, FWIW I wrote a 
simple program and I am not seeing the head->first cached with the pattern you 
shared above:

#include <stdlib.h>

#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
#define barrier() __asm__ __volatile__("": : :"memory")

typedef struct list_head {
     int first;
     struct list_head *next;
} list_head;

int main() {
     list_head *head = (list_head *)malloc(sizeof(list_head));
     head->first = 1;
     head->next = 0;

     READ_ONCE(head->first);
     barrier();
     READ_ONCE(head->first);

     free(head);
     return 0;
}

On ARM 32-bit, 64-bit and x86_64, with -Os and then another experiment with -O2 
on new gcc versions.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 14:47         ` Eric Dumazet
@ 2023-07-21 15:21           ` Alan Huang
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-21 15:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev, rcu, Paul E. McKenney, roman.gushchin


> 2023年7月21日 22:47,Eric Dumazet <edumazet@google.com> 写道:
> 
> On Fri, Jul 21, 2023 at 4:31 PM Alan Huang <mmpgouride@gmail.com> wrote:
>> 
>> 
>>> 2023年7月21日 05:11,Eric Dumazet <edumazet@google.com> 写道:
>>> 
>>> On Thu, Jul 20, 2023 at 10:00 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>> 
>>>> 
>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>> 
>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>> and a related discussion [1].
>>>>>> 
>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>> ACCESS_ONCE(), so my question is:
>>>>>> 
>>>>>>      Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>> 
>>>>>>      What about READ_ONCE(ptr->field)?
>>>>> 
>>>>> Make sure sparse is happy.
>>>> 
>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>> 
>>>>       https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>> 
>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>> decides not to reload ptr->field?
>>> 
>>> I can not really answer without seeing an actual patch...
>> 
>> The content of the potential patch:
>> 
>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>> index 89186c499dd4..bcd39670f359 100644
>> --- a/include/linux/rculist_nulls.h
>> +++ b/include/linux/rculist_nulls.h
>> @@ -158,15 +158,9 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
>>  * @pos:       the &struct hlist_nulls_node to use as a loop cursor.
>>  * @head:      the head of the list.
>>  * @member:    the name of the hlist_nulls_node within the struct.
>> - *
>> - * The barrier() is needed to make sure compiler doesn't cache first element [1],
>> - * as this loop can be restarted [2]
>> - * [1] Documentation/memory-barriers.txt around line 1533
>> - * [2] Documentation/RCU/rculist_nulls.rst around line 146
>>  */
>> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)                        \
>> -       for (({barrier();}),                                                    \
>> -            pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>> +       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>>                (!is_a_nulls(pos)) &&                                           \
>>                ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>>                pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>> @@ -180,8 +174,7 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
>>  * @member:    the name of the hlist_nulls_node within the struct.
>>  */
>> #define hlist_nulls_for_each_entry_safe(tpos, pos, head, member)               \
>> -       for (({barrier();}),                                                    \
>> -            pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>> +       for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));            \
>>                (!is_a_nulls(pos)) &&                                           \
>>                ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member);        \
>>                   pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)); 1; });)
>> 
>> 
>>> 
>>> Why are you asking ? Are you tracking compiler bug fixes ?
>> 
>> The barrier() here makes me confused.
>> 
>> If we really need that, do we need:
>> 
>>        READ_ONCE(head->first);
>>        barrier();
>>        READ_ONCE(head->first);
>> 
> 
> Nope, the patch you want to revert (while it did fix (by pure luck
> ???) a real bug back in the days) was replacing
> 
> ACCESS_ONCE()
> 
> by
> 
> barrier();
> ACCESS_ONCE();

Yeah.

The commit message and related discussions all indicate
that the compiler cached a value accessed through volatile.

That’s why I asked here.

> 
> (There is one ACCESS_ONCE(), not two of them)
> 
> BTW,
>  barrier();
>  followed by an arbitrary number of barrier(); back to back,
> translates to one barrier()
> 
> Frankly, I would not change the code, unless someone can explain what
> was the issue.
> (Perhaps there was a missing barrier elsewhere)

Fair enough, although I feel like this is masking the real issue.

(I feel like no one will know the real issue ten years later, when no one knew it ten years ago.)

Anyway, thanks for your time.





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 15:21         ` Joel Fernandes
@ 2023-07-21 15:54           ` Alan Huang
  2023-07-21 16:00             ` Joel Fernandes
  2023-07-21 15:59           ` David Laight
  2023-07-21 20:08           ` Alan Huang
  2 siblings, 1 reply; 26+ messages in thread
From: Alan Huang @ 2023-07-21 15:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Eric Dumazet, linux-kernel, netdev, rcu, Paul E. McKenney,
	roman.gushchin


> 2023年7月21日 23:21,Joel Fernandes <joel@joelfernandes.org> 写道:
> 
> On 7/21/23 10:27, Alan Huang wrote:
>>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>> 
>>> 
>>> 
>>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>> 
>>>> 
>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>> 
>>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>> and a related discussion [1].
>>>>>> 
>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>> ACCESS_ONCE(), so my question is:
>>>>>> 
>>>>>>      Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>> 
>>>>>>      What about READ_ONCE(ptr->field)?
>>>>> 
>>>>> Make sure sparse is happy.
>>>> 
>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>> 
>>>>   https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>> 
>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>> decides not to reload ptr->field?
>>> 
>>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>> 
>>> But hey, if you want to float the idea…
>> We already had the READ_ONCE() in rcu_deference_raw().
>> The barrier() here makes me think we need write code like below:
>> 
>> READ_ONCE(head->first);
>> barrier();
>> READ_ONCE(head->first);
>> With READ_ONCE (or the deprecated ACCESS_ONCE),
>> I don’t think a compiler should cache the value of head->first.
> 
> 
> Right, it shouldn't need to cache. To Eric's point it might be risky to remove the barrier() and someone needs to explain that issue first (or IMO there needs to be another tangible reason like performance etc). Anyway, FWIW I wrote a simple program and I am not seeing the head->first cached with the pattern you shared above:
> 
> #include <stdlib.h>
> 
> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> #define barrier() __asm__ __volatile__("": : :"memory")
> 
> typedef struct list_head {
>    int first;
>    struct list_head *next;
> } list_head;
> 
> int main() {
>    list_head *head = (list_head *)malloc(sizeof(list_head));
>    head->first = 1;
>    head->next = 0;
> 
>    READ_ONCE(head->first);
>    barrier();

Thanks for your time!

However, what I'm trying to say here is that without this barrier(), GCC wouldn't cache this value either.

So, I removed the barrier() and tested, GCC didn’t cache the value of head->first.
(Only tested on x86-64 (all the possible versions of gcc that Compiler Explorer has) where the original issue occurred [1].)

Therefore, the commit message and the related discussion ten years ago is misleading.

Thanks again!

[1] https://lkml.org/lkml/2013/4/16/371


>    READ_ONCE(head->first);
> 
>    free(head);
>    return 0;
> }
> 
> On ARM 32-bit, 64-bit and x86_64, with -Os and then another experiment with -O2 on new gcc versions.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 11:51 ` David Laight
@ 2023-07-21 15:55   ` Alan Huang
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-21 15:55 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	rcu@vger.kernel.org, Paul E. McKenney, Eric Dumazet,
	roman.gushchin@linux.dev


> 2023年7月21日 19:51,David Laight <David.Laight@ACULAB.COM> 写道:
> 
> From: Alan Huang
>> Sent: 20 July 2023 19:54
>> 
>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>> and a related discussion [1].
> 
> Hmmm... that was all about the retry loop in ipv4/udp.c
> 
> AFAICT that retry got deleted by ca065d0c.
> 
> That also changes the list from hlist_nulls_xxx to hlist_xxx.
> (I'm not sure of the difference)
> 
> This might be why we're seeing unexpected 'port unreachable' messages?
> 
> Quite why that has just started happening is another issue.
> Most of the UDP sockets we create aren't 'connected' so I don't
> believe they get moved between hash chains - just deleted.
> The deletion should leave the hash chain intact.

Thanks for the information! :)

> 
> David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 15:21         ` Joel Fernandes
  2023-07-21 15:54           ` Alan Huang
@ 2023-07-21 15:59           ` David Laight
  2023-07-21 17:14             ` Joel Fernandes
  2023-07-21 20:08           ` Alan Huang
  2 siblings, 1 reply; 26+ messages in thread
From: David Laight @ 2023-07-21 15:59 UTC (permalink / raw)
  To: 'Joel Fernandes', Alan Huang
  Cc: Eric Dumazet, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, rcu@vger.kernel.org, Paul E. McKenney,
	roman.gushchin@linux.dev

....
> Right, it shouldn't need to cache. To Eric's point it might be risky to remove
> the barrier() and someone needs to explain that issue first (or IMO there needs
> to be another tangible reason like performance etc). Anyway, FWIW I wrote a
> simple program and I am not seeing the head->first cached with the pattern you
> shared above:
> 
> #include <stdlib.h>
> 
> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> #define barrier() __asm__ __volatile__("": : :"memory")
> 
> typedef struct list_head {
>      int first;
>      struct list_head *next;
> } list_head;
> 
> int main() {
>      list_head *head = (list_head *)malloc(sizeof(list_head));
>      head->first = 1;
>      head->next = 0;
> 
>      READ_ONCE(head->first);
>      barrier();
>      READ_ONCE(head->first);
> 
>      free(head);
>      return 0;
> }

You probably need to try harder to generate the error.
It probably has something to do code surrounding the
sk_nulls_for_each_rcu() in the ca065d0c^ version of udp.c.

That patch removes the retry loop - and probably breaks udp receive.
The issue is that sockets can be moved between the 'hash2' chains
(eg by connect()) without being freed.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 15:54           ` Alan Huang
@ 2023-07-21 16:00             ` Joel Fernandes
  0 siblings, 0 replies; 26+ messages in thread
From: Joel Fernandes @ 2023-07-21 16:00 UTC (permalink / raw)
  To: Alan Huang
  Cc: Eric Dumazet, linux-kernel, netdev, rcu, Paul E. McKenney,
	roman.gushchin



> On Jul 21, 2023, at 11:55 AM, Alan Huang <mmpgouride@gmail.com> wrote:
> 
> 
>> 2023年7月21日 23:21,Joel Fernandes <joel@joelfernandes.org> 写道:
>> 
>> On 7/21/23 10:27, Alan Huang wrote:
>>>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>>> 
>>>> 
>>>> 
>>>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>>> 
>>>>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>>>> and a related discussion [1].
>>>>>>>> 
>>>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>>>> ACCESS_ONCE(), so my question is:
>>>>>>>> 
>>>>>>>>     Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>>>> 
>>>>>>>>     What about READ_ONCE(ptr->field)?
>>>>>>> 
>>>>>>> Make sure sparse is happy.
>>>>>> 
>>>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>>>> 
>>>>>>  https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>>>> 
>>>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>>>> decides not to reload ptr->field?
>>>>> 
>>>>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>>>> 
>>>>> But hey, if you want to float the idea…
>>> We already had the READ_ONCE() in rcu_deference_raw().
>>> The barrier() here makes me think we need write code like below:
>>> 
>>> READ_ONCE(head->first);
>>> barrier();
>>> READ_ONCE(head->first);
>>> With READ_ONCE (or the deprecated ACCESS_ONCE),
>>> I don’t think a compiler should cache the value of head->first.
>> 
>> 
>> Right, it shouldn't need to cache. To Eric's point it might be risky to remove the barrier() and someone needs to explain that issue first (or IMO there needs to be another tangible reason like performance etc). Anyway, FWIW I wrote a simple program and I am not seeing the head->first cached with the pattern you shared above:
>> 
>> #include <stdlib.h>
>> 
>> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
>> #define barrier() __asm__ __volatile__("": : :"memory")
>> 
>> typedef struct list_head {
>>   int first;
>>   struct list_head *next;
>> } list_head;
>> 
>> int main() {
>>   list_head *head = (list_head *)malloc(sizeof(list_head));
>>   head->first = 1;
>>   head->next = 0;
>> 
>>   READ_ONCE(head->first);
>>   barrier();
> 
> Thanks for your time!
> 
> However, what I'm trying to say here is that without this barrier(), GCC wouldn't cache this value either.
> 

Yes that is what I tried too, removing the barrier.

Apologies for confusing, the code I shared did have it but I had also tried removing it.

Thanks,

 - Joel


> So, I removed the barrier() and tested, GCC didn’t cache the value of head->first.
> (Only tested on x86-64 (all the possible versions of gcc that Compiler Explorer has) where the original issue occurred [1].)
> 
> Therefore, the commit message and the related discussion ten years ago is misleading.
> 
> Thanks again!
> 
> [1] https://lkml.org/lkml/2013/4/16/371
> 
> 
>>   READ_ONCE(head->first);
>> 
>>   free(head);
>>   return 0;
>> }
>> 
>> On ARM 32-bit, 64-bit and x86_64, with -Os and then another experiment with -O2 on new gcc versions.
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 15:59           ` David Laight
@ 2023-07-21 17:14             ` Joel Fernandes
  0 siblings, 0 replies; 26+ messages in thread
From: Joel Fernandes @ 2023-07-21 17:14 UTC (permalink / raw)
  To: David Laight
  Cc: Alan Huang, Eric Dumazet, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, rcu@vger.kernel.org, Paul E. McKenney,
	roman.gushchin@linux.dev

On Fri, Jul 21, 2023 at 11:59 AM David Laight <David.Laight@aculab.com> wrote:
>
> ....
> > Right, it shouldn't need to cache. To Eric's point it might be risky to remove
> > the barrier() and someone needs to explain that issue first (or IMO there needs
> > to be another tangible reason like performance etc). Anyway, FWIW I wrote a
> > simple program and I am not seeing the head->first cached with the pattern you
> > shared above:
> >
> > #include <stdlib.h>
> >
> > #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> > #define barrier() __asm__ __volatile__("": : :"memory")
> >
> > typedef struct list_head {
> >      int first;
> >      struct list_head *next;
> > } list_head;
> >
> > int main() {
> >      list_head *head = (list_head *)malloc(sizeof(list_head));
> >      head->first = 1;
> >      head->next = 0;
> >
> >      READ_ONCE(head->first);
> >      barrier();
> >      READ_ONCE(head->first);
> >
> >      free(head);
> >      return 0;
> > }
>
> You probably need to try harder to generate the error.
> It probably has something to do code surrounding the
> sk_nulls_for_each_rcu() in the ca065d0c^ version of udp.c.
>
> That patch removes the retry loop - and probably breaks udp receive.
> The issue is that sockets can be moved between the 'hash2' chains
> (eg by connect()) without being freed.

I was just replying to Alan's question on the behavior of READ_ONCE()
since I myself recently got surprised by compiler optimizations
related to it. I haven't looked into the actual UDP code.

 - Joel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 15:21         ` Joel Fernandes
  2023-07-21 15:54           ` Alan Huang
  2023-07-21 15:59           ` David Laight
@ 2023-07-21 20:08           ` Alan Huang
  2023-07-21 20:40             ` Alan Huang
  2 siblings, 1 reply; 26+ messages in thread
From: Alan Huang @ 2023-07-21 20:08 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney, Eric Dumazet, roman.gushchin,
	David.Laight@aculab.com
  Cc: linux-kernel, netdev, rcu


> 2023年7月21日 下午11:21,Joel Fernandes <joel@joelfernandes.org> 写道:
> 
> On 7/21/23 10:27, Alan Huang wrote:
>>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>> 
>>> 
>>> 
>>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>> 
>>>> 
>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>> 
>>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>> and a related discussion [1].
>>>>>> 
>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>> ACCESS_ONCE(), so my question is:
>>>>>> 
>>>>>>      Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>> 
>>>>>>      What about READ_ONCE(ptr->field)?
>>>>> 
>>>>> Make sure sparse is happy.
>>>> 
>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>> 
>>>>   https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>> 
>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>> decides not to reload ptr->field?
>>> 
>>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>> 
>>> But hey, if you want to float the idea…
>> We already had the READ_ONCE() in rcu_deference_raw().
>> The barrier() here makes me think we need write code like below:
>> 	
>> 	READ_ONCE(head->first);
>> 	barrier();
>> 	READ_ONCE(head->first);
>> With READ_ONCE (or the deprecated ACCESS_ONCE),
>> I don’t think a compiler should cache the value of head->first.
> 
> 
> Right, it shouldn't need to cache. To Eric's point it might be risky to remove the barrier() and someone needs to explain that issue first (or IMO there needs to be another tangible reason like performance etc). Anyway, FWIW I wrote a simple program and I am not seeing the head->first cached with the pattern you shared above:
> 
> #include <stdlib.h>
> 
> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> #define barrier() __asm__ __volatile__("": : :"memory")
> 
> typedef struct list_head {
>    int first;
>    struct list_head *next;
> } list_head;
> 
> int main() {
>    list_head *head = (list_head *)malloc(sizeof(list_head));
>    head->first = 1;
>    head->next = 0;
> 
>    READ_ONCE(head->first);
>    barrier();
>    READ_ONCE(head->first);
> 
>    free(head);
>    return 0;
> }
> 
> On ARM 32-bit, 64-bit and x86_64, with -Os and then another experiment with -O2 on new gcc versions.

Well, when I change the code as below:

#include <stdlib.h>

#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
#define barrier() __asm__ __volatile__("": : :"memory")

typedef struct list_head {
   struct list_head *next;
   int first;					// difference here
} list_head;

int main() {
   list_head *head = (list_head *)malloc(sizeof(list_head));
   head->first = 1;
   head->next = 0;

   READ_ONCE(head->first);
   READ_ONCE(head->first);

   free(head);
   return 0;
}

GCC 8, GCC 10, GCC 11 generate the following code (with -O2):

main:
        subq    $8, %rsp
        movl    $16, %edi
        call    malloc
        movl    $1, 8(%rax)
        movq    %rax, %rdi
        call    free
        xorl    %eax, %eax
        addq    $8, %rsp
        ret


The READ_ONCE has been optimized away. The difference in the source code is that I put ->first to the second member.

That means, GCC 8, 10, 11 have the bug!





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 20:08           ` Alan Huang
@ 2023-07-21 20:40             ` Alan Huang
  2023-07-21 21:25               ` Alan Huang
  2023-07-22 13:32               ` Alan Huang
  0 siblings, 2 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-21 20:40 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney, Eric Dumazet, roman.gushchin,
	David.Laight@aculab.com
  Cc: linux-kernel, netdev, rcu


> 2023年7月22日 上午4:08,Alan Huang <mmpgouride@gmail.com> 写道:
> 
> 
>> 2023年7月21日 下午11:21,Joel Fernandes <joel@joelfernandes.org> 写道:
>> 
>> On 7/21/23 10:27, Alan Huang wrote:
>>>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>>> 
>>>> 
>>>> 
>>>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>>> 
>>>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>>> and a related discussion [1].
>>>>>>> 
>>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>>> ACCESS_ONCE(), so my question is:
>>>>>>> 
>>>>>>>     Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>>> 
>>>>>>>     What about READ_ONCE(ptr->field)?
>>>>>> 
>>>>>> Make sure sparse is happy.
>>>>> 
>>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>>> 
>>>>>  https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>>> 
>>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>>> decides not to reload ptr->field?
>>>> 
>>>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>>> 
>>>> But hey, if you want to float the idea…
>>> We already had the READ_ONCE() in rcu_deference_raw().
>>> The barrier() here makes me think we need write code like below:
>>> 	
>>> 	READ_ONCE(head->first);
>>> 	barrier();
>>> 	READ_ONCE(head->first);
>>> With READ_ONCE (or the deprecated ACCESS_ONCE),
>>> I don’t think a compiler should cache the value of head->first.
>> 
>> 
>> Right, it shouldn't need to cache. To Eric's point it might be risky to remove the barrier() and someone needs to explain that issue first (or IMO there needs to be another tangible reason like performance etc). Anyway, FWIW I wrote a simple program and I am not seeing the head->first cached with the pattern you shared above:
>> 
>> #include <stdlib.h>
>> 
>> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
>> #define barrier() __asm__ __volatile__("": : :"memory")
>> 
>> typedef struct list_head {
>>   int first;
>>   struct list_head *next;
>> } list_head;
>> 
>> int main() {
>>   list_head *head = (list_head *)malloc(sizeof(list_head));
>>   head->first = 1;
>>   head->next = 0;
>> 
>>   READ_ONCE(head->first);
>>   barrier();
>>   READ_ONCE(head->first);
>> 
>>   free(head);
>>   return 0;
>> }
>> 
>> On ARM 32-bit, 64-bit and x86_64, with -Os and then another experiment with -O2 on new gcc versions.
> 
> Well, when I change the code as below:
> 
> #include <stdlib.h>
> 
> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> #define barrier() __asm__ __volatile__("": : :"memory")
> 
> typedef struct list_head {
>   struct list_head *next;
>   int first;					// difference here
> } list_head;
> 
> int main() {
>   list_head *head = (list_head *)malloc(sizeof(list_head));
>   head->first = 1;
>   head->next = 0;
> 
>   READ_ONCE(head->first);
>   READ_ONCE(head->first);
> 
>   free(head);
>   return 0;
> }
> 
> GCC 8, GCC 10, GCC 11 generate the following code (with -O2):
> 
> main:
>        subq    $8, %rsp
>        movl    $16, %edi
>        call    malloc
>        movl    $1, 8(%rax)
>        movq    %rax, %rdi
>        call    free
>        xorl    %eax, %eax
>        addq    $8, %rsp
>        ret
> 
> 
> The READ_ONCE has been optimized away. The difference in the source code is that I put ->first to the second member.
> 
> That means, GCC 8, 10, 11 have the bug!
> 
> 

Found a related discussion:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102714

Looks like GCC 10, 11 have been backported, not sure whether GCC 8 has been backported.

So, I have the following questions:

Given that some people might not update their GCC, do they need to be notified?

Do we need to CC Linus?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 20:40             ` Alan Huang
@ 2023-07-21 21:25               ` Alan Huang
  2023-07-22 13:32               ` Alan Huang
  1 sibling, 0 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-21 21:25 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney, Eric Dumazet, roman.gushchin,
	David.Laight@aculab.com
  Cc: linux-kernel, netdev, rcu


> 2023年7月22日 上午4:40,Alan Huang <mmpgouride@gmail.com> 写道:
> 
> 
>> 2023年7月22日 上午4:08,Alan Huang <mmpgouride@gmail.com> 写道:
>> 
>> 
>>> 2023年7月21日 下午11:21,Joel Fernandes <joel@joelfernandes.org> 写道:
>>> 
>>> On 7/21/23 10:27, Alan Huang wrote:
>>>>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>>>> 
>>>>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>>>> and a related discussion [1].
>>>>>>>> 
>>>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>>>> ACCESS_ONCE(), so my question is:
>>>>>>>> 
>>>>>>>>    Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>>>> 
>>>>>>>>    What about READ_ONCE(ptr->field)?
>>>>>>> 
>>>>>>> Make sure sparse is happy.
>>>>>> 
>>>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>>>> 
>>>>>> https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>>>> 
>>>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>>>> decides not to reload ptr->field?
>>>>> 
>>>>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>>>> 
>>>>> But hey, if you want to float the idea…
>>>> We already had the READ_ONCE() in rcu_deference_raw().
>>>> The barrier() here makes me think we need write code like below:
>>>> 	
>>>> 	READ_ONCE(head->first);
>>>> 	barrier();
>>>> 	READ_ONCE(head->first);
>>>> With READ_ONCE (or the deprecated ACCESS_ONCE),
>>>> I don’t think a compiler should cache the value of head->first.
>>> 
>>> 
>>> Right, it shouldn't need to cache. To Eric's point it might be risky to remove the barrier() and someone needs to explain that issue first (or IMO there needs to be another tangible reason like performance etc). Anyway, FWIW I wrote a simple program and I am not seeing the head->first cached with the pattern you shared above:
>>> 
>>> #include <stdlib.h>
>>> 
>>> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
>>> #define barrier() __asm__ __volatile__("": : :"memory")
>>> 
>>> typedef struct list_head {
>>>  int first;
>>>  struct list_head *next;
>>> } list_head;
>>> 
>>> int main() {
>>>  list_head *head = (list_head *)malloc(sizeof(list_head));
>>>  head->first = 1;
>>>  head->next = 0;
>>> 
>>>  READ_ONCE(head->first);
>>>  barrier();
>>>  READ_ONCE(head->first);
>>> 
>>>  free(head);
>>>  return 0;
>>> }
>>> 
>>> On ARM 32-bit, 64-bit and x86_64, with -Os and then another experiment with -O2 on new gcc versions.
>> 
>> Well, when I change the code as below:
>> 
>> #include <stdlib.h>
>> 
>> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
>> #define barrier() __asm__ __volatile__("": : :"memory")
>> 
>> typedef struct list_head {
>>  struct list_head *next;
>>  int first;					// difference here
>> } list_head;
>> 
>> int main() {
>>  list_head *head = (list_head *)malloc(sizeof(list_head));
>>  head->first = 1;
>>  head->next = 0;
>> 
>>  READ_ONCE(head->first);
>>  READ_ONCE(head->first);
>> 
>>  free(head);
>>  return 0;
>> }
>> 
>> GCC 8, GCC 10, GCC 11 generate the following code (with -O2):
>> 
>> main:
>>       subq    $8, %rsp
>>       movl    $16, %edi
>>       call    malloc
>>       movl    $1, 8(%rax)
>>       movq    %rax, %rdi
>>       call    free
>>       xorl    %eax, %eax
>>       addq    $8, %rsp
>>       ret
>> 
>> 
>> The READ_ONCE has been optimized away. The difference in the source code is that I put ->first to the second member.
>> 
>> That means, GCC 8, 10, 11 have the bug!
>> 
>> 
> 
> Found a related discussion:
> 
> 	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102714

And this:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47409

So, the compiler had the bug ten years ago.

> 
> Looks like GCC 10, 11 have been backported, not sure whether GCC 8 has been backported.
> 
> So, I have the following questions:
> 
> Given that some people might not update their GCC, do they need to be notified?
> 
> Do we need to CC Linus?
> 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 20:40             ` Alan Huang
  2023-07-21 21:25               ` Alan Huang
@ 2023-07-22 13:32               ` Alan Huang
  2023-07-22 14:06                 ` David Laight
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Huang @ 2023-07-22 13:32 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney, Eric Dumazet, roman.gushchin,
	David.Laight@aculab.com
  Cc: linux-kernel, netdev, rcu


> 2023年7月22日 04:40,Alan Huang <mmpgouride@gmail.com> 写道:
> 
> 
>> 2023年7月22日 上午4:08,Alan Huang <mmpgouride@gmail.com> 写道:
>> 
>> 
>>> 2023年7月21日 下午11:21,Joel Fernandes <joel@joelfernandes.org> 写道:
>>> 
>>> On 7/21/23 10:27, Alan Huang wrote:
>>>>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>>>> 
>>>>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>>>> and a related discussion [1].
>>>>>>>> 
>>>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>>>> ACCESS_ONCE(), so my question is:
>>>>>>>> 
>>>>>>>>    Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>>>> 
>>>>>>>>    What about READ_ONCE(ptr->field)?
>>>>>>> 
>>>>>>> Make sure sparse is happy.
>>>>>> 
>>>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>>>> 
>>>>>> https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>>>> 
>>>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>>>> decides not to reload ptr->field?
>>>>> 
>>>>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>>>> 
>>>>> But hey, if you want to float the idea…
>>>> We already had the READ_ONCE() in rcu_deference_raw().
>>>> The barrier() here makes me think we need write code like below:
>>>> 
>>>> READ_ONCE(head->first);
>>>> barrier();
>>>> READ_ONCE(head->first);
>>>> With READ_ONCE (or the deprecated ACCESS_ONCE),
>>>> I don’t think a compiler should cache the value of head->first.
>>> 
>>> 
>>> Right, it shouldn't need to cache. To Eric's point it might be risky to remove the barrier() and someone needs to explain that issue first (or IMO there needs to be another tangible reason like performance etc). Anyway, FWIW I wrote a simple program and I am not seeing the head->first cached with the pattern you shared above:
>>> 
>>> #include <stdlib.h>
>>> 
>>> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
>>> #define barrier() __asm__ __volatile__("": : :"memory")
>>> 
>>> typedef struct list_head {
>>>  int first;
>>>  struct list_head *next;
>>> } list_head;
>>> 
>>> int main() {
>>>  list_head *head = (list_head *)malloc(sizeof(list_head));
>>>  head->first = 1;
>>>  head->next = 0;
>>> 
>>>  READ_ONCE(head->first);
>>>  barrier();
>>>  READ_ONCE(head->first);
>>> 
>>>  free(head);
>>>  return 0;
>>> }
>>> 
>>> On ARM 32-bit, 64-bit and x86_64, with -Os and then another experiment with -O2 on new gcc versions.
>> 
>> Well, when I change the code as below:
>> 
>> #include <stdlib.h>
>> 
>> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
>> #define barrier() __asm__ __volatile__("": : :"memory")
>> 
>> typedef struct list_head {
>>  struct list_head *next;
>>  int first; // difference here
>> } list_head;
>> 
>> int main() {
>>  list_head *head = (list_head *)malloc(sizeof(list_head));
>>  head->first = 1;
>>  head->next = 0;
>> 
>>  READ_ONCE(head->first);
>>  READ_ONCE(head->first);
>> 
>>  free(head);
>>  return 0;
>> }
>> 
>> GCC 8, GCC 10, GCC 11 generate the following code (with -O2):
>> 
>> main:
>>       subq    $8, %rsp
>>       movl    $16, %edi
>>       call    malloc
>>       movl    $1, 8(%rax)
>>       movq    %rax, %rdi
>>       call    free
>>       xorl    %eax, %eax
>>       addq    $8, %rsp
>>       ret
>> 
>> 
>> The READ_ONCE has been optimized away. The difference in the source code is that I put ->first to the second member.
>> 
>> That means, GCC 8, 10, 11 have the bug!
>> 
>> 
> 
> Found a related discussion:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102714
> 
> Looks like GCC 10, 11 have been backported, not sure whether GCC 8 has been backported.
> 
> So, I have the following questions:
> 
> Given that some people might not update their GCC, do they need to be notified?
> 
> Do we need to CC Linus?

No need.

I put the following code into a kernel module:

typedef struct list_head_shit {
	int next;
	struct list_head *first;
} list_head_shit;

static void noinline so_shit(void) {
	list_head_shit *head = (list_head_shit *)kmalloc(sizeof(list_head_shit), GFP_KERNEL);
	head->first = 0;
	head->next = 1;

	READ_ONCE(head->first);
	READ_ONCE(head->first);

	kfree(head);
}

x86_64-linux-gnu-gcc-11 generate the following code:

0000000000000000 <so_shit>:
   0:	48 8b 3d 00 00 00 00 	mov    0x0(%rip),%rdi        # 7 <so_shit+0x7>
   7:	ba 10 00 00 00       		mov    $0x10,%edx
   c:	be c0 0c 00 00      	 	mov    $0xcc0,%esi
  11:	e8 00 00 00 00      	 	call   16 <so_shit+0x16>
  16:	48 c7 40 08 00 00 00 	movq   $0x0,0x8(%rax)
  1d:	00
  1e:	48 89 c7             		mov    %rax,%rdi
  21:	c7 00 01 00 00 00    	movl   $0x1,(%rax)
  27:	48 8b 47 08          		mov    0x8(%rdi),%rax	  # READ_ONCE here
  2b:	48 8b 47 08          		mov    0x8(%rdi),%rax	  # READ_ONCE here
  2f:	e9 00 00 00 00      	 	jmp    34 <so_shit+0x34>
  34:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
  3b:	00 00 00 00
  3f:	90                   			nop

The conclusion is that we can rely on READ_ONCE when writing kernel code.

The kernel’s READ_ONCE is different with the one Joel wrote yesterday. (Joel’s is the same as the old ACCESS_ONCE) 

The compiler does have the bug when using simple volatile access:

	https://lwn.net/Articles/624126/

The C standard before the upcoming C23 only says accessing volatile object, 
So I think volatile access (_ONCE) is implemented differently by GCC, that’s why the simple ACCESS_ONCE didn’t work.

With the upcoming C23, they will have the same side effect (Section 5.1.2.3):

	https://open-std.org/JTC1/SC22/WG14/www/docs/n3096.pdf

I think we can remove the barrier() now. :)

Thanks,
Alan



^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-22 13:32               ` Alan Huang
@ 2023-07-22 14:06                 ` David Laight
  2023-07-22 15:00                   ` Alan Huang
  0 siblings, 1 reply; 26+ messages in thread
From: David Laight @ 2023-07-22 14:06 UTC (permalink / raw)
  To: 'Alan Huang', Joel Fernandes, Paul E. McKenney,
	Eric Dumazet, roman.gushchin@linux.dev
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	rcu@vger.kernel.org

....
> > Found a related discussion:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102714
> >
> > Looks like GCC 10, 11 have been backported, not sure whether GCC 8 has been backported.
> >
> > So, I have the following questions:
> >
> > Given that some people might not update their GCC, do they need to be notified?
> >
> > Do we need to CC Linus?
> 
> No need.
> 
> I put the following code into a kernel module:
> 
> typedef struct list_head_shit {
> 	int next;
> 	struct list_head *first;
> } list_head_shit;
> 
> static void noinline so_shit(void) {
> 	list_head_shit *head = (list_head_shit *)kmalloc(sizeof(list_head_shit), GFP_KERNEL);
> 	head->first = 0;
> 	head->next = 1;
> 
> 	READ_ONCE(head->first);
> 	READ_ONCE(head->first);
> 
> 	kfree(head);
> }
> 
> x86_64-linux-gnu-gcc-11 generate the following code:
> 
> 0000000000000000 <so_shit>:
>    0:	48 8b 3d 00 00 00 00 	mov    0x0(%rip),%rdi        # 7 <so_shit+0x7>
>    7:	ba 10 00 00 00       		mov    $0x10,%edx
>    c:	be c0 0c 00 00      	 	mov    $0xcc0,%esi
>   11:	e8 00 00 00 00      	 	call   16 <so_shit+0x16>
>   16:	48 c7 40 08 00 00 00 	movq   $0x0,0x8(%rax)
>   1d:	00
>   1e:	48 89 c7             		mov    %rax,%rdi
>   21:	c7 00 01 00 00 00    	movl   $0x1,(%rax)
>   27:	48 8b 47 08          		mov    0x8(%rdi),%rax	  # READ_ONCE here
>   2b:	48 8b 47 08          		mov    0x8(%rdi),%rax	  # READ_ONCE here
>   2f:	e9 00 00 00 00      	 	jmp    34 <so_shit+0x34>
>   34:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
>   3b:	00 00 00 00
>   3f:	90                   			nop
> 
> The conclusion is that we can rely on READ_ONCE when writing kernel code.
> 
> The kernel’s READ_ONCE is different with the one Joel wrote yesterday. (Joel’s is the same as the old
> ACCESS_ONCE)

You do need to reproduce the error with code that looks like
the loop in the (old) udp.c code.

Then see if changing the implementation of READ_ONCE() from
a simple 'volatile' access the newer variant makes a difference.

You also need to check with the oldest version of gcc that is
still supported - that is much older than gcc 11.

In the udp code the volatile access was on a pointer (which should
qualify as a scaler type) so it may well be the inlining bug you
mentioned earlier, not the 'volatile on non-scaler' feature that
READ_ONCE() fixed.
That fix hasn't been back-ported to all the versions of gcc
that the kernel build supports.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-22 14:06                 ` David Laight
@ 2023-07-22 15:00                   ` Alan Huang
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Huang @ 2023-07-22 15:00 UTC (permalink / raw)
  To: David Laight
  Cc: Joel Fernandes, Paul E. McKenney, Eric Dumazet,
	roman.gushchin@linux.dev, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, rcu@vger.kernel.org


> 2023年7月22日 22:06,David Laight <David.Laight@ACULAB.COM> 写道:
> 
> ....
>>> Found a related discussion:
>>> 
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102714
>>> 
>>> Looks like GCC 10, 11 have been backported, not sure whether GCC 8 has been backported.
>>> 
>>> So, I have the following questions:
>>> 
>>> Given that some people might not update their GCC, do they need to be notified?
>>> 
>>> Do we need to CC Linus?
>> 
>> No need.
>> 
>> I put the following code into a kernel module:
>> 
>> typedef struct list_head_shit {
>> int next;
>> struct list_head *first;
>> } list_head_shit;
>> 
>> static void noinline so_shit(void) {
>> list_head_shit *head = (list_head_shit *)kmalloc(sizeof(list_head_shit), GFP_KERNEL);
>> head->first = 0;
>> head->next = 1;
>> 
>> READ_ONCE(head->first);
>> READ_ONCE(head->first);
>> 
>> kfree(head);
>> }
>> 
>> x86_64-linux-gnu-gcc-11 generate the following code:
>> 
>> 0000000000000000 <so_shit>:
>>   0: 48 8b 3d 00 00 00 00 mov    0x0(%rip),%rdi        # 7 <so_shit+0x7>
>>   7: ba 10 00 00 00        mov    $0x10,%edx
>>   c: be c0 0c 00 00       mov    $0xcc0,%esi
>>  11: e8 00 00 00 00       call   16 <so_shit+0x16>
>>  16: 48 c7 40 08 00 00 00 movq   $0x0,0x8(%rax)
>>  1d: 00
>>  1e: 48 89 c7              mov    %rax,%rdi
>>  21: c7 00 01 00 00 00     movl   $0x1,(%rax)
>>  27: 48 8b 47 08           mov    0x8(%rdi),%rax  # READ_ONCE here
>>  2b: 48 8b 47 08           mov    0x8(%rdi),%rax  # READ_ONCE here
>>  2f: e9 00 00 00 00       jmp    34 <so_shit+0x34>
>>  34: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>>  3b: 00 00 00 00
>>  3f: 90                    nop
>> 
>> The conclusion is that we can rely on READ_ONCE when writing kernel code.
>> 
>> The kernel’s READ_ONCE is different with the one Joel wrote yesterday. (Joel’s is the same as the old
>> ACCESS_ONCE)
> 
> You do need to reproduce the error with code that looks like
> the loop in the (old) udp.c code.
> 
> Then see if changing the implementation of READ_ONCE() from
> a simple 'volatile' access the newer variant makes a difference.
> 
> You also need to check with the oldest version of gcc that is
> still supported - that is much older than gcc 11.
> 
> In the udp code the volatile access was on a pointer (which should
> qualify as a scaler type) so it may well be the inlining bug you
> mentioned earlier, not the 'volatile on non-scaler' feature that
> READ_ONCE() fixed.
> That fix hasn't been back-ported to all the versions of gcc
> that the kernel build supports.

Well, the same compiler, the kernel’s READ_ONCE:

static int noinline foo(int a, int b, int c) {
	b = a + 1;
	c = READ_ONCE(b) + 1;
	a = c + 1;
	return a;
}

0000000000000040 <foo.constprop.0>:
  40:	b8 04 00 00 00       	mov    $0x4,%eax
  45:	c3                   		ret

Example from:

	https://stackoverflow.com/questions/70380510/non-conforming-optimizations-of-volatile-in-gcc-11-1

Change the code to:

static int noinline foo(int a, volatile int b, int c) {
	b = a + 1;
	c = b + 1;
	a = c + 1;
	return a;
}

Doesn’t help.

BTW, Clang works fine, even the function is inlined.

> 
> David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-21 14:27       ` Alan Huang
  2023-07-21 15:21         ` Joel Fernandes
@ 2023-07-31 20:09         ` Paul E. McKenney
  2023-08-03 13:40           ` Alan Huang
  1 sibling, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2023-07-31 20:09 UTC (permalink / raw)
  To: Alan Huang
  Cc: Joel Fernandes, Eric Dumazet, linux-kernel, netdev, rcu,
	roman.gushchin

On Fri, Jul 21, 2023 at 10:27:04PM +0800, Alan Huang wrote:
> 
> > 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
> > 
> > 
> > 
> >> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
> >> 
> >> 
> >>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
> >>> 
> >>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
> >>>> 
> >>>> Hi,
> >>>> 
> >>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
> >>>> and a related discussion [1].
> >>>> 
> >>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
> >>>> ACCESS_ONCE(), so my question is:
> >>>> 
> >>>>      Is that a compiler bug? If so, has this bug been fixed today, ten years later?
> >>>> 
> >>>>      What about READ_ONCE(ptr->field)?
> >>> 
> >>> Make sure sparse is happy.
> >> 
> >> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
> >> 
> >>   https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
> >> 
> >> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc 
> >> decides not to reload ptr->field?
> > 
> > I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
> > 
> > But hey, if you want to float the idea…
> 
> We already had the READ_ONCE() in rcu_deference_raw().
> 
> The barrier() here makes me think we need write code like below:
> 	
> 	READ_ONCE(head->first);
> 	barrier();
> 	READ_ONCE(head->first);
> 
> With READ_ONCE (or the deprecated ACCESS_ONCE),
> I don’t think a compiler should cache the value of head->first.

Apologies for the late reply!

If both are READ_ONCE(), you should not need the barrier().  Unless there
is some other code not shown in your example that requires it, that is.

							Thanx, Paul

> > Thanks,
> > 
> > - Joel
> > 
> >> 
> >>> 
> >>> Do you have a patch for review ?
> >> 
> >> Possibly next month. :)
> >> 
> >>> 
> >>> 
> >>>> 
> >>>> 
> >>>> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
> >>>> 
> >>>> Thanks,
> >>>> Alan
> >> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-07-31 20:09         ` Paul E. McKenney
@ 2023-08-03 13:40           ` Alan Huang
  2023-08-03 13:53             ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Huang @ 2023-08-03 13:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, Eric Dumazet, linux-kernel, netdev, rcu,
	roman.gushchin


> 2023年8月1日 上午4:09,Paul E. McKenney <paulmck@kernel.org> 写道:
> 
> On Fri, Jul 21, 2023 at 10:27:04PM +0800, Alan Huang wrote:
>> 
>>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
>>> 
>>> 
>>> 
>>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
>>>> 
>>>> 
>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
>>>>> 
>>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>> and a related discussion [1].
>>>>>> 
>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>> ACCESS_ONCE(), so my question is:
>>>>>> 
>>>>>>     Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>> 
>>>>>>     What about READ_ONCE(ptr->field)?
>>>>> 
>>>>> Make sure sparse is happy.
>>>> 
>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>> 
>>>>  https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>> 
>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc 
>>>> decides not to reload ptr->field?
>>> 
>>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
>>> 
>>> But hey, if you want to float the idea…
>> 
>> We already had the READ_ONCE() in rcu_deference_raw().
>> 
>> The barrier() here makes me think we need write code like below:
>> 	
>> 	READ_ONCE(head->first);
>> 	barrier();
>> 	READ_ONCE(head->first);
>> 
>> With READ_ONCE (or the deprecated ACCESS_ONCE),
>> I don’t think a compiler should cache the value of head->first.
> 
> Apologies for the late reply!
> 
> If both are READ_ONCE(), you should not need the barrier().  Unless there
> is some other code not shown in your example that requires it, that is.

And unless the compiler has a bug. :) 

So, the barrier() in hlist_nulls_for_each_entry_rcu() is a workaround for a compiler bug.

> 
> 							Thanx, Paul
> 
>>> Thanks,
>>> 
>>> - Joel
>>> 
>>>> 
>>>>> 
>>>>> Do you have a patch for review ?
>>>> 
>>>> Possibly next month. :)
>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
>>>>>> 
>>>>>> Thanks,
>>>>>> Alan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-08-03 13:40           ` Alan Huang
@ 2023-08-03 13:53             ` Paul E. McKenney
  2023-08-03 14:39               ` David Laight
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2023-08-03 13:53 UTC (permalink / raw)
  To: Alan Huang
  Cc: Joel Fernandes, Eric Dumazet, linux-kernel, netdev, rcu,
	roman.gushchin

On Thu, Aug 03, 2023 at 09:40:11PM +0800, Alan Huang wrote:
> 
> > 2023年8月1日 上午4:09,Paul E. McKenney <paulmck@kernel.org> 写道:
> > 
> > On Fri, Jul 21, 2023 at 10:27:04PM +0800, Alan Huang wrote:
> >> 
> >>> 2023年7月21日 20:54,Joel Fernandes <joel@joelfernandes.org> 写道:
> >>> 
> >>> 
> >>> 
> >>>> On Jul 20, 2023, at 4:00 PM, Alan Huang <mmpgouride@gmail.com> wrote:
> >>>> 
> >>>> 
> >>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@google.com> 写道:
> >>>>> 
> >>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@gmail.com> wrote:
> >>>>>> 
> >>>>>> Hi,
> >>>>>> 
> >>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
> >>>>>> and a related discussion [1].
> >>>>>> 
> >>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
> >>>>>> ACCESS_ONCE(), so my question is:
> >>>>>> 
> >>>>>>     Is that a compiler bug? If so, has this bug been fixed today, ten years later?
> >>>>>> 
> >>>>>>     What about READ_ONCE(ptr->field)?
> >>>>> 
> >>>>> Make sure sparse is happy.
> >>>> 
> >>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
> >>>> 
> >>>>  https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
> >>>> 
> >>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc 
> >>>> decides not to reload ptr->field?
> >>> 
> >>> I am a bit doubtful there will be strong (any?) interest in replacing the barrier() with READ_ONCE() without any tangible reason, regardless of whether a gcc issue was fixed.
> >>> 
> >>> But hey, if you want to float the idea…
> >> 
> >> We already had the READ_ONCE() in rcu_deference_raw().
> >> 
> >> The barrier() here makes me think we need write code like below:
> >> 	
> >> 	READ_ONCE(head->first);
> >> 	barrier();
> >> 	READ_ONCE(head->first);
> >> 
> >> With READ_ONCE (or the deprecated ACCESS_ONCE),
> >> I don’t think a compiler should cache the value of head->first.
> > 
> > Apologies for the late reply!
> > 
> > If both are READ_ONCE(), you should not need the barrier().  Unless there
> > is some other code not shown in your example that requires it, that is.
> 
> And unless the compiler has a bug. :) 
> 
> So, the barrier() in hlist_nulls_for_each_entry_rcu() is a workaround for a compiler bug.

Fair enough!!!  ;-)


							Thanx, Paul

> >>> Thanks,
> >>> 
> >>> - Joel
> >>> 
> >>>> 
> >>>>> 
> >>>>> Do you have a patch for review ?
> >>>> 
> >>>> Possibly next month. :)
> >>>> 
> >>>>> 
> >>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> [1] https://lore.kernel.org/all/1369699930.3301.494.camel@edumazet-glaptop/
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> Alan
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
  2023-08-03 13:53             ` Paul E. McKenney
@ 2023-08-03 14:39               ` David Laight
  0 siblings, 0 replies; 26+ messages in thread
From: David Laight @ 2023-08-03 14:39 UTC (permalink / raw)
  To: 'paulmck@kernel.org', Alan Huang
  Cc: Joel Fernandes, Eric Dumazet, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, rcu@vger.kernel.org,
	roman.gushchin@linux.dev

From: Paul E. McKenney
> Sent: 03 August 2023 14:54
....
> > > If both are READ_ONCE(), you should not need the barrier().  Unless there
> > > is some other code not shown in your example that requires it, that is.
> >
> > And unless the compiler has a bug. :)
> >
> > So, the barrier() in hlist_nulls_for_each_entry_rcu() is a workaround for a compiler bug.
> 
> Fair enough!!!  ;-)

Except that it is likely that the compiler bug is avoided by the
implementation of READ_ONCE() rather than ACCESS_ONCE().

Also the code that looped forever (UDP receive socket lookup)
no longer has the retry - which is a different bug.
If a socket rehash hits the lookup then an erroneous ICMP
'port unreachable' is sent rather than doing a rescan.

	David
 

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-08-03 14:41 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-20 18:53 Question about the barrier() in hlist_nulls_for_each_entry_rcu() Alan Huang
2023-07-20 19:22 ` Eric Dumazet
2023-07-20 19:59   ` Alan Huang
2023-07-20 21:11     ` Eric Dumazet
2023-07-21 14:31       ` Alan Huang
2023-07-21 14:47         ` Eric Dumazet
2023-07-21 15:21           ` Alan Huang
2023-07-21 12:54     ` Joel Fernandes
2023-07-21 14:27       ` Alan Huang
2023-07-21 15:21         ` Joel Fernandes
2023-07-21 15:54           ` Alan Huang
2023-07-21 16:00             ` Joel Fernandes
2023-07-21 15:59           ` David Laight
2023-07-21 17:14             ` Joel Fernandes
2023-07-21 20:08           ` Alan Huang
2023-07-21 20:40             ` Alan Huang
2023-07-21 21:25               ` Alan Huang
2023-07-22 13:32               ` Alan Huang
2023-07-22 14:06                 ` David Laight
2023-07-22 15:00                   ` Alan Huang
2023-07-31 20:09         ` Paul E. McKenney
2023-08-03 13:40           ` Alan Huang
2023-08-03 13:53             ` Paul E. McKenney
2023-08-03 14:39               ` David Laight
2023-07-21 11:51 ` David Laight
2023-07-21 15:55   ` Alan Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox