netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Questionable RCU/BH usage in cgw_create_job().
@ 2023-10-31 11:23 Sebastian Andrzej Siewior
  2023-10-31 16:14 ` Oliver Hartkopp
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-31 11:23 UTC (permalink / raw)
  To: Oliver Hartkopp, Marc Kleine-Budde; +Cc: linux-can, netdev, Thomas Gleixner

Hi,

I stumbled over this piece in cgw_create_job():
|      /* update modifications with disabled softirq & quit */
|      local_bh_disable();
|      memcpy(&gwj->mod, &mod, sizeof(mod));
|      local_bh_enable();
|      return 0;

unfortunately the comment did not provide much enlightenment for me.
Let's look. That memcpy() overwrites struct cf_mod within struct cgw_job
which is under RCU protection. memcpy() and RCU hardly works as a combo.
But why the local_bh_disable()?

Let's look further. The user of this data structure is can_can_gw_rcv().
There is something like:
|                 /* check for checksum updates */
|                 if (gwj->mod.csumfunc.crc8)
|                         (*gwj->mod.csumfunc.crc8)(cf, &gwj->mod.csum.crc8);

With optimisation enabled (as in -O2 or so) the compiler will fetch
mod.csumfunc.crc8, do the comparison and if non-NULL use the previously
fetched value and jump there. So one could argue that it is not really
affected by the memcpy() suddenly setting it to NULL. However, adding
any kind of a function in between, say 
|                 /* check for checksum updates */
|                 if (gwj->mod.csumfunc.crc8) {
|+                        trace_event_crc8_sth(cf)
|                         (*gwj->mod.csumfunc.crc8)(cf, &gwj->mod.csum.crc8);
|                 }

will force the compiler to reload mod.csumfunc.crc8. And here is the
possible NULL pointer if overwritten by update in cgw_create_job().

One reload that already happens is the one of mod.modfunc. First at the
top we have:
|         if (gwj->mod.modfunc[0])
|                 nskb = skb_copy(skb, GFP_ATOMIC);
|         else
|                 nskb = skb_clone(skb, GFP_ATOMIC);

Here mod.modfunc[0] is NULL and skb_clone() is invoked. Later if
mod.modfunc has been set to non-NULL value this piece
|         while (modidx < MAX_MODFUNCTIONS && gwj->mod.modfunc[modidx])
|                 (*gwj->mod.modfunc[modidx++])(cf, &gwj->mod);

reloads mod.modfunc and may modify the skb assuming that skb_copy() has
been used earlier.

Looking at this makes me think that the local_bh_disable() has been
added to the memcpy() just to ensure that can_can_gw_rcv() won't run
because it is invoked in a BH disabled context. Clever little trick that
is. But this trick is limited to UP environments…

Am I missing something?

If not, my suggestion would be replacing the bh-off, memcpy part with:
|		old_mod = rcu_replace_pointer(gwj->mod, new_mod, true);
|		kfree_rcu_mightsleep(old_mod);

and doing the needed pointer replacement with for struct cgw_job::mod
and RCU annotation.

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Questionable RCU/BH usage in cgw_create_job().
  2023-10-31 11:23 [RFC] Questionable RCU/BH usage in cgw_create_job() Sebastian Andrzej Siewior
@ 2023-10-31 16:14 ` Oliver Hartkopp
  2023-10-31 16:52   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Hartkopp @ 2023-10-31 16:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Marc Kleine-Budde
  Cc: linux-can, netdev, Thomas Gleixner

Hi Sebastian,

thanks for the review!

On 31.10.23 12:23, Sebastian Andrzej Siewior wrote:
> Hi,
> 
> I stumbled over this piece in cgw_create_job():
> |      /* update modifications with disabled softirq & quit */
> |      local_bh_disable();
> |      memcpy(&gwj->mod, &mod, sizeof(mod));
> |      local_bh_enable();
> |      return 0;
> 
> unfortunately the comment did not provide much enlightenment for me.
> Let's look. That memcpy() overwrites struct cf_mod within struct cgw_job
> which is under RCU protection. memcpy() and RCU hardly works as a combo.
> But why the local_bh_disable()?

The content of gwj->mod can be overwritten with new modification rules 
at runtime. But this update (with memcpy) has to take place when there 
is no incoming network traffic.

> Let's look further. The user of this data structure is can_can_gw_rcv().
> There is something like:
> |                 /* check for checksum updates */
> |                 if (gwj->mod.csumfunc.crc8)
> |                         (*gwj->mod.csumfunc.crc8)(cf, &gwj->mod.csum.crc8);
> 
> With optimisation enabled (as in -O2 or so) the compiler will fetch
> mod.csumfunc.crc8, do the comparison and if non-NULL use the previously
> fetched value and jump there. So one could argue that it is not really
> affected by the memcpy() suddenly setting it to NULL. However, adding
> any kind of a function in between, say
> |                 /* check for checksum updates */
> |                 if (gwj->mod.csumfunc.crc8) {
> |+                        trace_event_crc8_sth(cf)
> |                         (*gwj->mod.csumfunc.crc8)(cf, &gwj->mod.csum.crc8);
> |                 }
> 
> will force the compiler to reload mod.csumfunc.crc8. And here is the
> possible NULL pointer if overwritten by update in cgw_create_job().
> 
> One reload that already happens is the one of mod.modfunc. First at the
> top we have:
> |         if (gwj->mod.modfunc[0])
> |                 nskb = skb_copy(skb, GFP_ATOMIC);
> |         else
> |                 nskb = skb_clone(skb, GFP_ATOMIC);
> 
> Here mod.modfunc[0] is NULL and skb_clone() is invoked. Later if
> mod.modfunc has been set to non-NULL value this piece
> |         while (modidx < MAX_MODFUNCTIONS && gwj->mod.modfunc[modidx])
> |                 (*gwj->mod.modfunc[modidx++])(cf, &gwj->mod);
> 
> reloads mod.modfunc and may modify the skb assuming that skb_copy() has
> been used earlier.
> 
> Looking at this makes me think that the local_bh_disable() has been
> added to the memcpy() just to ensure that can_can_gw_rcv() won't run
> because it is invoked in a BH disabled context. Clever little trick that
> is. But this trick is limited to UP environments…
> 
> Am I missing something?
> 
> If not, my suggestion would be replacing the bh-off, memcpy part with:
> |		old_mod = rcu_replace_pointer(gwj->mod, new_mod, true);
> |		kfree_rcu_mightsleep(old_mod);
> 
> and doing the needed pointer replacement with for struct cgw_job::mod
> and RCU annotation.

Replacing a pointer does not copy any data to the cf_mod structure, right?

Best regards,
Oliver

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Questionable RCU/BH usage in cgw_create_job().
  2023-10-31 16:14 ` Oliver Hartkopp
@ 2023-10-31 16:52   ` Sebastian Andrzej Siewior
  2023-11-30 16:43     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-31 16:52 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: Marc Kleine-Budde, linux-can, netdev, Thomas Gleixner

On 2023-10-31 17:14:01 [+0100], Oliver Hartkopp wrote:
> Hi Sebastian,
Hi Oliver,

> The content of gwj->mod can be overwritten with new modification rules at
> runtime. But this update (with memcpy) has to take place when there is no
> incoming network traffic.

This is my assumption. But "no incoming network traffic" is not ensured,
right?

> > If not, my suggestion would be replacing the bh-off, memcpy part with:
> > |		old_mod = rcu_replace_pointer(gwj->mod, new_mod, true);
> > |		kfree_rcu_mightsleep(old_mod);
> > 
> > and doing the needed pointer replacement with for struct cgw_job::mod
> > and RCU annotation.
> 
> Replacing a pointer does not copy any data to the cf_mod structure, right?

Yes. The cf_mod data structure is embedded into cgw_job. So it would
have to become a pointer. Then cgw_create_job() would create a new
cf_mod via cgw_parse_attr() but it would be a new allocated structure
instead on stack like it is now. And then in the existing case you would
do the swap. Otherwise (non-existing, brand new) it becomes part of the
new created cgw_job.

The point is to replace/ update cf_mod at runtime while following RCU
rules so always either new or the old object is observed. Never an
intermediate step.

> Best regards,
> Oliver

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Questionable RCU/BH usage in cgw_create_job().
  2023-10-31 16:52   ` Sebastian Andrzej Siewior
@ 2023-11-30 16:43     ` Sebastian Andrzej Siewior
  2023-11-30 19:56       ` Oliver Hartkopp
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-30 16:43 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: Marc Kleine-Budde, linux-can, netdev, Thomas Gleixner

On 2023-10-31 17:52:47 [+0100], To Oliver Hartkopp wrote:
> On 2023-10-31 17:14:01 [+0100], Oliver Hartkopp wrote:
> > Hi Sebastian,
Hi Oliver,

> The point is to replace/ update cf_mod at runtime while following RCU
> rules so always either new or the old object is observed. Never an
> intermediate step.

Do you want me to take care of it?

> > Best regards,
> > Oliver
> 
Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Questionable RCU/BH usage in cgw_create_job().
  2023-11-30 16:43     ` Sebastian Andrzej Siewior
@ 2023-11-30 19:56       ` Oliver Hartkopp
  2023-12-21 12:43         ` Oliver Hartkopp
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Hartkopp @ 2023-11-30 19:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Marc Kleine-Budde, linux-can, netdev, Thomas Gleixner

Hi Sebastian,

On 30.11.23 17:43, Sebastian Andrzej Siewior wrote:
> On 2023-10-31 17:52:47 [+0100], To Oliver Hartkopp wrote:

>> The point is to replace/ update cf_mod at runtime while following RCU
>> rules so always either new or the old object is observed. Never an
>> intermediate step.
> 
> Do you want me to take care of it?

Yes, sorry.

In fact I've searched some time what would fit best without getting a 
clear picture.

As the changes triggered by the netlink update should come into action 
with the next processed CAN frame I have thought about adding a shadow 
'mod' structure which is written instantly.
And then a flag could be set, that is switched by the next incoming CAN 
frame.

I just would have a problem with performing some memory allocation for 
the 'mod' updates which might take some unpredictable time.

If you have some cool ideas please let me know. I'm unsure what is the 
most effective and performant approach for this use case.

Many thanks,
Oliver

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Questionable RCU/BH usage in cgw_create_job().
  2023-11-30 19:56       ` Oliver Hartkopp
@ 2023-12-21 12:43         ` Oliver Hartkopp
  0 siblings, 0 replies; 6+ messages in thread
From: Oliver Hartkopp @ 2023-12-21 12:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Marc Kleine-Budde, linux-can, netdev, Thomas Gleixner

Hi Sebastian,

I've sent a RFC patch to be reviewed here:

https://lore.kernel.org/linux-can/20231221123703.8170-1-socketcan@hartkopp.net/T/#u

I hope your suggestion to use rcu_replace_pointer()
in the implemented way fits the requirements.

Best regards,
Oliver

On 2023-11-30 20:56, Oliver Hartkopp wrote:
> Hi Sebastian,
> 
> On 30.11.23 17:43, Sebastian Andrzej Siewior wrote:
>> On 2023-10-31 17:52:47 [+0100], To Oliver Hartkopp wrote:
> 
>>> The point is to replace/ update cf_mod at runtime while following RCU
>>> rules so always either new or the old object is observed. Never an
>>> intermediate step.
>>
>> Do you want me to take care of it?
> 
> Yes, sorry.
> 
> In fact I've searched some time what would fit best without getting a 
> clear picture.
> 
> As the changes triggered by the netlink update should come into action 
> with the next processed CAN frame I have thought about adding a shadow 
> 'mod' structure which is written instantly.
> And then a flag could be set, that is switched by the next incoming CAN 
> frame.
> 
> I just would have a problem with performing some memory allocation for 
> the 'mod' updates which might take some unpredictable time.
> 
> If you have some cool ideas please let me know. I'm unsure what is the 
> most effective and performant approach for this use case.
> 
> Many thanks,
> Oliver
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-12-21 12:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-31 11:23 [RFC] Questionable RCU/BH usage in cgw_create_job() Sebastian Andrzej Siewior
2023-10-31 16:14 ` Oliver Hartkopp
2023-10-31 16:52   ` Sebastian Andrzej Siewior
2023-11-30 16:43     ` Sebastian Andrzej Siewior
2023-11-30 19:56       ` Oliver Hartkopp
2023-12-21 12:43         ` Oliver Hartkopp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).