[PATCH net] net: dev: Detect dev_hold() after netdev_wait

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
@ 2022-01-28  1:43 Jann Horn
  2022-01-28  2:09 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Jann Horn @ 2022-01-28  1:43 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet
  Cc: linux-kernel, Oliver Neukum, Jann Horn

I've run into a bug where dev_hold() was being called after
netdev_wait_allrefs(). But at that point, the device is already going
away, and dev_hold() can't stop that anymore.

To make such problems easier to diagnose in the future:

 - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
   the net refcount has been elevated. If this is detected, WARN() and
   leak the object (to prevent worse consequences from a
   use-after-free).
 - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
   This signals to the generic refcount infrastructure that any attempt
   to increment the refcount later is a bug.

Signed-off-by: Jann Horn <jannh@google.com>
---
 net/core/dev.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f6..f7916c0d226d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9949,8 +9949,18 @@ void netdev_run_todo(void)
 
 		netdev_wait_allrefs(dev);
 
+		/* Drop the netdev refcount (which should be 1 at this point)
+		 * to zero. If we're using the generic refcount code, this will
+		 * tell it that any dev_hold() after this point is a bug.
+		 */
+#ifdef CONFIG_PCPU_DEV_REFCNT
+		this_cpu_dec(*dev->pcpu_refcnt);
+		BUG_ON(netdev_refcnt_read(dev) != 0);
+#else
+		BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt));
+#endif
+
 		/* paranoia */
-		BUG_ON(netdev_refcnt_read(dev) != 1);
 		BUG_ON(!list_empty(&dev->ptype_all));
 		BUG_ON(!list_empty(&dev->ptype_specific));
 		WARN_ON(rcu_access_pointer(dev->ip_ptr));
@@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev)
 	free_percpu(dev->xdp_bulkq);
 	dev->xdp_bulkq = NULL;
 
+	/* Recheck in case someone called dev_hold() between
+	 * netdev_wait_allrefs() and here.
+	 */
+	if (WARN_ON(netdev_refcnt_read(dev) != 0))
+		return; /* leak memory, otherwise we might get UAF */
+
 	/*  Compatibility with error handling in drivers */
 	if (dev->reg_state == NETREG_UNINITIALIZED) {
 		netdev_freemem(dev);

base-commit: 23a46422c56144939c091c76cf389aa863ce9c18
-- 
2.35.0.rc0.227.g00780c9af4-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  1:43 [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs() Jann Horn
@ 2022-01-28  2:09 ` Eric Dumazet
  2022-01-28  2:14   ` Jann Horn
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2022-01-28  2:09 UTC (permalink / raw)
  To: Jann Horn; +Cc: David S . Miller, Jakub Kicinski, netdev, LKML, Oliver Neukum

On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote:
>
> I've run into a bug where dev_hold() was being called after
> netdev_wait_allrefs(). But at that point, the device is already going
> away, and dev_hold() can't stop that anymore.
>
> To make such problems easier to diagnose in the future:
>
>  - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
>    the net refcount has been elevated. If this is detected, WARN() and
>    leak the object (to prevent worse consequences from a
>    use-after-free).
>  - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
>    This signals to the generic refcount infrastructure that any attempt
>    to increment the refcount later is a bug.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>  net/core/dev.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1baab07820f6..f7916c0d226d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9949,8 +9949,18 @@ void netdev_run_todo(void)
>
>                 netdev_wait_allrefs(dev);
>
> +               /* Drop the netdev refcount (which should be 1 at this point)
> +                * to zero. If we're using the generic refcount code, this will
> +                * tell it that any dev_hold() after this point is a bug.
> +                */
> +#ifdef CONFIG_PCPU_DEV_REFCNT
> +               this_cpu_dec(*dev->pcpu_refcnt);
> +               BUG_ON(netdev_refcnt_read(dev) != 0);
> +#else
> +               BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt));
> +#endif
> +
>                 /* paranoia */
> -               BUG_ON(netdev_refcnt_read(dev) != 1);
>                 BUG_ON(!list_empty(&dev->ptype_all));
>                 BUG_ON(!list_empty(&dev->ptype_specific));
>                 WARN_ON(rcu_access_pointer(dev->ip_ptr));
> @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev)
>         free_percpu(dev->xdp_bulkq);
>         dev->xdp_bulkq = NULL;
>
> +       /* Recheck in case someone called dev_hold() between
> +        * netdev_wait_allrefs() and here.
> +        */

At this point, dev->pcpu_refcnt per-cpu data has been freed already
(CONFIG_PCPU_DEV_REFCNT=y)

So this should probably crash, or at least UAF ?

> +       if (WARN_ON(netdev_refcnt_read(dev) != 0))
> +               return; /* leak memory, otherwise we might get UAF */
> +
>         /*  Compatibility with error handling in drivers */
>         if (dev->reg_state == NETREG_UNINITIALIZED) {
>                 netdev_freemem(dev);
>
> base-commit: 23a46422c56144939c091c76cf389aa863ce9c18
> --
> 2.35.0.rc0.227.g00780c9af4-goog
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:09 ` Eric Dumazet
@ 2022-01-28  2:14   ` Jann Horn
  2022-01-28  2:22     ` Jakub Kicinski
  2022-01-28  2:24     ` Eric Dumazet
  0 siblings, 2 replies; 12+ messages in thread
From: Jann Horn @ 2022-01-28  2:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, LKML, Oliver Neukum

On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote:
> >
> > I've run into a bug where dev_hold() was being called after
> > netdev_wait_allrefs(). But at that point, the device is already going
> > away, and dev_hold() can't stop that anymore.
> >
> > To make such problems easier to diagnose in the future:
> >
> >  - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
> >    the net refcount has been elevated. If this is detected, WARN() and
> >    leak the object (to prevent worse consequences from a
> >    use-after-free).
> >  - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
> >    This signals to the generic refcount infrastructure that any attempt
> >    to increment the refcount later is a bug.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
> > ---
> >  net/core/dev.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 1baab07820f6..f7916c0d226d 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -9949,8 +9949,18 @@ void netdev_run_todo(void)
> >
> >                 netdev_wait_allrefs(dev);
> >
> > +               /* Drop the netdev refcount (which should be 1 at this point)
> > +                * to zero. If we're using the generic refcount code, this will
> > +                * tell it that any dev_hold() after this point is a bug.
> > +                */
> > +#ifdef CONFIG_PCPU_DEV_REFCNT
> > +               this_cpu_dec(*dev->pcpu_refcnt);
> > +               BUG_ON(netdev_refcnt_read(dev) != 0);
> > +#else
> > +               BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt));
> > +#endif
> > +
> >                 /* paranoia */
> > -               BUG_ON(netdev_refcnt_read(dev) != 1);
> >                 BUG_ON(!list_empty(&dev->ptype_all));
> >                 BUG_ON(!list_empty(&dev->ptype_specific));
> >                 WARN_ON(rcu_access_pointer(dev->ip_ptr));
> > @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev)
> >         free_percpu(dev->xdp_bulkq);
> >         dev->xdp_bulkq = NULL;
> >
> > +       /* Recheck in case someone called dev_hold() between
> > +        * netdev_wait_allrefs() and here.
> > +        */
>
> At this point, dev->pcpu_refcnt per-cpu data has been freed already
> (CONFIG_PCPU_DEV_REFCNT=y)
>
> So this should probably crash, or at least UAF ?

Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT...

I guess a better place to put the new check would be directly after
checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this?

        if (dev->reg_state == NETREG_UNREGISTERING) {
                ASSERT_RTNL();
                dev->needs_free_netdev = true;
                return;
        }

        /* Recheck in case someone called dev_hold() between
         * netdev_wait_allrefs() and here.
         */
        if (WARN_ON(netdev_refcnt_read(dev) != 0))
                return; /* leak memory, otherwise we might get UAF */

        netif_free_tx_queues(dev);
        netif_free_rx_queues(dev);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:14   ` Jann Horn
@ 2022-01-28  2:22     ` Jakub Kicinski
  2022-01-28  2:23       ` Eric Dumazet
  2022-01-28  2:24     ` Eric Dumazet
  1 sibling, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-01-28  2:22 UTC (permalink / raw)
  To: Jann Horn; +Cc: Eric Dumazet, David S . Miller, netdev, LKML, Oliver Neukum

On Fri, 28 Jan 2022 03:14:14 +0100 Jann Horn wrote:
> Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT...
> 
> I guess a better place to put the new check would be directly after
> checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this?

Possibly a very silly suggestion but perhaps we should set 
the pointer to NULL for the pcpu case and let it crash?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:22     ` Jakub Kicinski
@ 2022-01-28  2:23       ` Eric Dumazet
  2022-01-28  2:27         ` Jann Horn
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2022-01-28  2:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jann Horn, David S . Miller, netdev, LKML, Oliver Neukum

On Thu, Jan 27, 2022 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 28 Jan 2022 03:14:14 +0100 Jann Horn wrote:
> > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT...
> >
> > I guess a better place to put the new check would be directly after
> > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this?
>
> Possibly a very silly suggestion but perhaps we should set
> the pointer to NULL for the pcpu case and let it crash?

It is already set to 0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:14   ` Jann Horn
  2022-01-28  2:22     ` Jakub Kicinski
@ 2022-01-28  2:24     ` Eric Dumazet
  2022-01-28  2:48       ` Jann Horn
  2022-01-28 17:58       ` Jann Horn
  1 sibling, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2022-01-28  2:24 UTC (permalink / raw)
  To: Jann Horn; +Cc: David S . Miller, Jakub Kicinski, netdev, LKML, Oliver Neukum

On Thu, Jan 27, 2022 at 6:14 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote:
> > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > I've run into a bug where dev_hold() was being called after
> > > netdev_wait_allrefs(). But at that point, the device is already going
> > > away, and dev_hold() can't stop that anymore.
> > >
> > > To make such problems easier to diagnose in the future:
> > >
> > >  - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
> > >    the net refcount has been elevated. If this is detected, WARN() and
> > >    leak the object (to prevent worse consequences from a
> > >    use-after-free).
> > >  - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
> > >    This signals to the generic refcount infrastructure that any attempt
> > >    to increment the refcount later is a bug.
> > >
> > > Signed-off-by: Jann Horn <jannh@google.com>
> > > ---
> > >  net/core/dev.c | 18 +++++++++++++++++-
> > >  1 file changed, 17 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 1baab07820f6..f7916c0d226d 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -9949,8 +9949,18 @@ void netdev_run_todo(void)
> > >
> > >                 netdev_wait_allrefs(dev);
> > >
> > > +               /* Drop the netdev refcount (which should be 1 at this point)
> > > +                * to zero. If we're using the generic refcount code, this will
> > > +                * tell it that any dev_hold() after this point is a bug.
> > > +                */
> > > +#ifdef CONFIG_PCPU_DEV_REFCNT
> > > +               this_cpu_dec(*dev->pcpu_refcnt);
> > > +               BUG_ON(netdev_refcnt_read(dev) != 0);
> > > +#else
> > > +               BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt));
> > > +#endif
> > > +
> > >                 /* paranoia */
> > > -               BUG_ON(netdev_refcnt_read(dev) != 1);
> > >                 BUG_ON(!list_empty(&dev->ptype_all));
> > >                 BUG_ON(!list_empty(&dev->ptype_specific));
> > >                 WARN_ON(rcu_access_pointer(dev->ip_ptr));
> > > @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev)
> > >         free_percpu(dev->xdp_bulkq);
> > >         dev->xdp_bulkq = NULL;
> > >
> > > +       /* Recheck in case someone called dev_hold() between
> > > +        * netdev_wait_allrefs() and here.
> > > +        */
> >
> > At this point, dev->pcpu_refcnt per-cpu data has been freed already
> > (CONFIG_PCPU_DEV_REFCNT=y)
> >
> > So this should probably crash, or at least UAF ?
>
> Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT...
>
> I guess a better place to put the new check would be directly after
> checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this?
>
>         if (dev->reg_state == NETREG_UNREGISTERING) {
>                 ASSERT_RTNL();
>                 dev->needs_free_netdev = true;
>                 return;
>         }
>
>         /* Recheck in case someone called dev_hold() between
>          * netdev_wait_allrefs() and here.
>          */
>         if (WARN_ON(netdev_refcnt_read(dev) != 0))
>                 return; /* leak memory, otherwise we might get UAF */
>
>         netif_free_tx_queues(dev);
>         netif_free_rx_queues(dev);

Maybe another solution would be to leverage the recent dev_hold_track().

We could add a  dead boolean to 'struct  ref_tracker_dir ' (dev->refcnt_tracker)

diff --git a/include/linux/ref_tracker.h b/include/linux/ref_tracker.h
index c11c9db5825cf933acf529c83db441a818135f29..d907759b2fa1dd6b2ef22f883d55963c410dc71b
100644
--- a/include/linux/ref_tracker.h
+++ b/include/linux/ref_tracker.h
@@ -12,6 +12,7 @@ struct ref_tracker_dir {
        spinlock_t              lock;
        unsigned int            quarantine_avail;
        refcount_t              untracked;
+       bool                    dead;
        struct list_head        list; /* List of active trackers */
        struct list_head        quarantine; /* List of dead trackers */
 #endif
@@ -24,6 +25,7 @@ static inline void ref_tracker_dir_init(struct
ref_tracker_dir *dir,
        INIT_LIST_HEAD(&dir->list);
        INIT_LIST_HEAD(&dir->quarantine);
        spin_lock_init(&dir->lock);
+       dir->dead = false;
        dir->quarantine_avail = quarantine_count;
        refcount_set(&dir->untracked, 1);
 }
diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c
index a6789c0c626b0f68ad67c264cd19177a63fb82d2..cbc798e5b97674a389ea3bbf17d9bb39fbf63328
100644
--- a/lib/ref_tracker.c
+++ b/lib/ref_tracker.c
@@ -20,6 +20,7 @@ void ref_tracker_dir_exit(struct ref_tracker_dir *dir)
        unsigned long flags;
        bool leak = false;

+       dir->dead = true;
        spin_lock_irqsave(&dir->lock, flags);
        list_for_each_entry_safe(tracker, n, &dir->quarantine, head) {
                list_del(&tracker->head);
@@ -72,6 +73,8 @@ int ref_tracker_alloc(struct ref_tracker_dir *dir,
        gfp_t gfp_mask = gfp;
        unsigned long flags;

+       WARN_ON_ONCE(dir->dead);
+
        if (gfp & __GFP_DIRECT_RECLAIM)
                gfp_mask |= __GFP_NOFAIL;
        *trackerp = tracker = kzalloc(sizeof(*tracker), gfp_mask);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:23       ` Eric Dumazet
@ 2022-01-28  2:27         ` Jann Horn
  2022-01-28  2:30           ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Jann Horn @ 2022-01-28  2:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jakub Kicinski, David S . Miller, netdev, LKML, Oliver Neukum,
	the arch/x86 maintainers

On Fri, Jan 28, 2022 at 3:23 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jan 27, 2022 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 28 Jan 2022 03:14:14 +0100 Jann Horn wrote:
> > > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT...
> > >
> > > I guess a better place to put the new check would be directly after
> > > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this?
> >
> > Possibly a very silly suggestion but perhaps we should set
> > the pointer to NULL for the pcpu case and let it crash?

I like that idea... but this_cpu_dec()/this_cpu_inc() use GS-relative
addressing, at least on X86-64, so NULL might make things worse, I
think? /proc/kallsyms on my machine starts with:

0000000000000000 A fixed_percpu_data
0000000000000000 A __per_cpu_start
0000000000001000 A cpu_debug_store
0000000000002000 A irq_stack_backing_store
0000000000006000 A cpu_tss_rw
000000000000b000 A gdt_page
000000000000c000 A exception_stacks
0000000000010000 A entry_stack_storage
0000000000011000 A espfix_waddr

So we'd probably need some different placeholder instead of NULL to
actually crash...

> It is already set to 0

I think he meant do it already in netdev_run_todo(), not just in free_netdev()?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:27         ` Jann Horn
@ 2022-01-28  2:30           ` Eric Dumazet
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2022-01-28  2:30 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jakub Kicinski, David S . Miller, netdev, LKML, Oliver Neukum,
	the arch/x86 maintainers

On Thu, Jan 27, 2022 at 6:27 PM Jann Horn <jannh@google.com> wrote:
>
> I like that idea... but this_cpu_dec()/this_cpu_inc() use GS-relative
> addressing, at least on X86-64, so NULL might make things worse, I
> think? /proc/kallsyms on my machine starts with:
>
> 0000000000000000 A fixed_percpu_data
> 0000000000000000 A __per_cpu_start
> 0000000000001000 A cpu_debug_store
> 0000000000002000 A irq_stack_backing_store
> 0000000000006000 A cpu_tss_rw
> 000000000000b000 A gdt_page
> 000000000000c000 A exception_stacks
> 0000000000010000 A entry_stack_storage
> 0000000000011000 A espfix_waddr
>
> So we'd probably need some different placeholder instead of NULL to
> actually crash...

Orthogonal problem, maybe we should make sure the first page of
per-cpu data is un-mapped.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:24     ` Eric Dumazet
@ 2022-01-28  2:48       ` Jann Horn
  2022-01-28  2:53         ` Eric Dumazet
  2022-01-28 17:58       ` Jann Horn
  1 sibling, 1 reply; 12+ messages in thread
From: Jann Horn @ 2022-01-28  2:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, LKML, Oliver Neukum

On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Jan 27, 2022 at 6:14 PM Jann Horn <jannh@google.com> wrote:
> > On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote:
> > > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > I've run into a bug where dev_hold() was being called after
> > > > netdev_wait_allrefs(). But at that point, the device is already going
> > > > away, and dev_hold() can't stop that anymore.
> > > >
> > > > To make such problems easier to diagnose in the future:
> > > >
> > > >  - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
> > > >    the net refcount has been elevated. If this is detected, WARN() and
> > > >    leak the object (to prevent worse consequences from a
> > > >    use-after-free).
> > > >  - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
> > > >    This signals to the generic refcount infrastructure that any attempt
> > > >    to increment the refcount later is a bug.
> > > >
> > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > ---
> > > >  net/core/dev.c | 18 +++++++++++++++++-
> > > >  1 file changed, 17 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 1baab07820f6..f7916c0d226d 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -9949,8 +9949,18 @@ void netdev_run_todo(void)
> > > >
> > > >                 netdev_wait_allrefs(dev);
> > > >
> > > > +               /* Drop the netdev refcount (which should be 1 at this point)
> > > > +                * to zero. If we're using the generic refcount code, this will
> > > > +                * tell it that any dev_hold() after this point is a bug.
> > > > +                */
> > > > +#ifdef CONFIG_PCPU_DEV_REFCNT
> > > > +               this_cpu_dec(*dev->pcpu_refcnt);
> > > > +               BUG_ON(netdev_refcnt_read(dev) != 0);
> > > > +#else
> > > > +               BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt));
> > > > +#endif
> > > > +
> > > >                 /* paranoia */
> > > > -               BUG_ON(netdev_refcnt_read(dev) != 1);
> > > >                 BUG_ON(!list_empty(&dev->ptype_all));
> > > >                 BUG_ON(!list_empty(&dev->ptype_specific));
> > > >                 WARN_ON(rcu_access_pointer(dev->ip_ptr));
> > > > @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev)
> > > >         free_percpu(dev->xdp_bulkq);
> > > >         dev->xdp_bulkq = NULL;
> > > >
> > > > +       /* Recheck in case someone called dev_hold() between
> > > > +        * netdev_wait_allrefs() and here.
> > > > +        */
> > >
> > > At this point, dev->pcpu_refcnt per-cpu data has been freed already
> > > (CONFIG_PCPU_DEV_REFCNT=y)
> > >
> > > So this should probably crash, or at least UAF ?
> >
> > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT...
> >
> > I guess a better place to put the new check would be directly after
> > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this?
> >
> >         if (dev->reg_state == NETREG_UNREGISTERING) {
> >                 ASSERT_RTNL();
> >                 dev->needs_free_netdev = true;
> >                 return;
> >         }
> >
> >         /* Recheck in case someone called dev_hold() between
> >          * netdev_wait_allrefs() and here.
> >          */
> >         if (WARN_ON(netdev_refcnt_read(dev) != 0))
> >                 return; /* leak memory, otherwise we might get UAF */
> >
> >         netif_free_tx_queues(dev);
> >         netif_free_rx_queues(dev);
>
> Maybe another solution would be to leverage the recent dev_hold_track().
>
> We could add a  dead boolean to 'struct  ref_tracker_dir ' (dev->refcnt_tracker)
>
[...]
> @@ -72,6 +73,8 @@ int ref_tracker_alloc(struct ref_tracker_dir *dir,
>         gfp_t gfp_mask = gfp;
>         unsigned long flags;
>
> +       WARN_ON_ONCE(dir->dead);

When someone is using NET_DEV_REFCNT_TRACKER for slow debugging, they
should also be able to take the performance hit of
CONFIG_PCPU_DEV_REFCNT and rely on the normal increment-from-zero
detection of the generic refcount code, right? (Maybe
NET_DEV_REFCNT_TRACKER should depend on !CONFIG_PCPU_DEV_REFCNT?)

My intent with the extra check in free_netdev() was to get some
limited detection for production systems that don't use
NET_DEV_REFCNT_TRACKER.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:48       ` Jann Horn
@ 2022-01-28  2:53         ` Eric Dumazet
  2022-01-28  3:00           ` Jann Horn
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2022-01-28  2:53 UTC (permalink / raw)
  To: Jann Horn; +Cc: David S . Miller, Jakub Kicinski, netdev, LKML, Oliver Neukum

On Thu, Jan 27, 2022 at 6:48 PM Jann Horn <jannh@google.com> wrote:

> When someone is using NET_DEV_REFCNT_TRACKER for slow debugging, they
> should also be able to take the performance hit of
> CONFIG_PCPU_DEV_REFCNT and rely on the normal increment-from-zero
> detection of the generic refcount code, right? (Maybe
> NET_DEV_REFCNT_TRACKER should depend on !CONFIG_PCPU_DEV_REFCNT?)

NET_DEV_REFCNT_TRACKER is not slow, I think it has neglectable cost really.
(I could not see any difference in my tests)

Also getting a trap at the exact moment of the buggy dev_hold_track()
is somewhat better than after-fact checking.

In your case, linkwatch_add_event() already uses dev_hold_track() so
my proposal would detect the issue right away.

>
> My intent with the extra check in free_netdev() was to get some
> limited detection for production systems that don't use
> NET_DEV_REFCNT_TRACKER.

Understood

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:53         ` Eric Dumazet
@ 2022-01-28  3:00           ` Jann Horn
  0 siblings, 0 replies; 12+ messages in thread
From: Jann Horn @ 2022-01-28  3:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, LKML, Oliver Neukum

On Fri, Jan 28, 2022 at 3:53 AM Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Jan 27, 2022 at 6:48 PM Jann Horn <jannh@google.com> wrote:
>
> > When someone is using NET_DEV_REFCNT_TRACKER for slow debugging, they
> > should also be able to take the performance hit of
> > CONFIG_PCPU_DEV_REFCNT and rely on the normal increment-from-zero
> > detection of the generic refcount code, right? (Maybe
> > NET_DEV_REFCNT_TRACKER should depend on !CONFIG_PCPU_DEV_REFCNT?)
>
> NET_DEV_REFCNT_TRACKER is not slow, I think it has neglectable cost really.
> (I could not see any difference in my tests)

Ah, sorry. I misread the Kconfig text.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()
  2022-01-28  2:24     ` Eric Dumazet
  2022-01-28  2:48       ` Jann Horn
@ 2022-01-28 17:58       ` Jann Horn
  1 sibling, 0 replies; 12+ messages in thread
From: Jann Horn @ 2022-01-28 17:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, LKML, Oliver Neukum

On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Jan 27, 2022 at 6:14 PM Jann Horn <jannh@google.com> wrote:
> > On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote:
> > > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote:
> > > > I've run into a bug where dev_hold() was being called after
> > > > netdev_wait_allrefs(). But at that point, the device is already going
> > > > away, and dev_hold() can't stop that anymore.
> > > >
> > > > To make such problems easier to diagnose in the future:
> > > >
> > > >  - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
> > > >    the net refcount has been elevated. If this is detected, WARN() and
> > > >    leak the object (to prevent worse consequences from a
> > > >    use-after-free).
> > > >  - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
> > > >    This signals to the generic refcount infrastructure that any attempt
> > > >    to increment the refcount later is a bug.
[...]
> >         if (dev->reg_state == NETREG_UNREGISTERING) {
> >                 ASSERT_RTNL();
> >                 dev->needs_free_netdev = true;
> >                 return;
> >         }
> >
> >         /* Recheck in case someone called dev_hold() between
> >          * netdev_wait_allrefs() and here.
> >          */
> >         if (WARN_ON(netdev_refcnt_read(dev) != 0))
> >                 return; /* leak memory, otherwise we might get UAF */
> >
> >         netif_free_tx_queues(dev);
> >         netif_free_rx_queues(dev);
>
> Maybe another solution would be to leverage the recent dev_hold_track().
>
> We could add a  dead boolean to 'struct  ref_tracker_dir ' (dev->refcnt_tracker)

Hmm... actually, what even are the semantics of dev_hold()?

Normal refcounts have the property that if you hold one reference,
you're always allowed to add another reference. But from what I can
tell, something like this:

struct net_device *dev = dev_get_by_name(net, name);
dev_hold(dev);
dev_put(dev);
dev_put(dev);

would be buggy using the current CONFIG_PCPU_DEV_REFCNT implementation.
Basically, if dev_hold() runs at the same time as
netdev_refcnt_read(), it's a bug because netdev_refcnt_read() is
non-atomic, and we could get the following race:

task B: starts netdev_refcnt_read()
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A, on CPU 0: dev_hold(dev) increments *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A: migrates from CPU 0 to CPU 1
task A, on CPU 1: dev_put(dev) decrements *per_cpu_ptr(dev->pcpu_refcnt, 1)
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 1)

which would make task B miss one outstanding reference.

(This is why the generic percpu refcounting code in
lib/percpu-refcount.c has logic for switching the refcount to atomic
mode with an RCU grace period.)


If these are the intended semantics for dev_hold(), then I guess your
approach of adding a new boolean flag somewhere is the right one - but
we should be setting that flag *before* waiting for the refcount to
drop to 1. Though maybe it shouldn't be in ref-tracker, since this is
a peculiarity of the hand-rolled netdev refcount...

Are these the intended semantics (and I should rewrite the patch to
also catch dev_hold() racing with netdev_wait_allrefs()), or is this
unintended (and the netdev refcount should be replaced)?

This should probably be documented...

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-01-28 17:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-28  1:43 [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs() Jann Horn
2022-01-28  2:09 ` Eric Dumazet
2022-01-28  2:14   ` Jann Horn
2022-01-28  2:22     ` Jakub Kicinski
2022-01-28  2:23       ` Eric Dumazet
2022-01-28  2:27         ` Jann Horn
2022-01-28  2:30           ` Eric Dumazet
2022-01-28  2:24     ` Eric Dumazet
2022-01-28  2:48       ` Jann Horn
2022-01-28  2:53         ` Eric Dumazet
2022-01-28  3:00           ` Jann Horn
2022-01-28 17:58       ` Jann Horn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).