All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kirill Tkhai <tkhai@yandex.ru>
To: Peter Zijlstra <peterz@infradead.org>,
	"umgwanakikbuti@gmail.com" <umgwanakikbuti@gmail.com>,
	"mingo@elte.hu" <mingo@elte.hu>
Cc: "ktkhai@parallels.com" <ktkhai@parallels.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"juri.lelli@gmail.com" <juri.lelli@gmail.com>,
	"pang.xunlei@linaro.org" <pang.xunlei@linaro.org>,
	"oleg@redhat.com" <oleg@redhat.com>,
	"wanpeng.li@linux.intel.com" <wanpeng.li@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 8/9] hrtimer: Allow hrtimer::function() to free the timer
Date: Wed, 03 Jun 2015 19:26:00 +0300	[thread overview]
Message-ID: <214021433348760@web25g.yandex.ru> (raw)
In-Reply-To: <20150603134023.156059118@infradead.org>

[sorry for died formatting]

03.06.2015, 16:55, "Peter Zijlstra" <peterz@infradead.org>:
> Currently an hrtimer callback function cannot free its own timer
> because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
> after it. Freeing the timer would result in a clear use-after-free.
>
> Solve this by using a scheme similar to regular timers; track the
> current running timer in hrtimer_clock_base::running.
>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/hrtimer.h |   35 ++++++++++++++---------------------
>  kernel/time/hrtimer.c   |   48 ++++++++++++++++++++++--------------------------
>  2 files changed, 36 insertions(+), 47 deletions(-)
>
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -53,34 +53,25 @@ enum hrtimer_restart {
>   *
>   * 0x00 inactive
>   * 0x01 enqueued into rbtree
> - * 0x02 callback function running
> - * 0x04 timer is migrated to another cpu
> + * 0x02 timer is migrated to another cpu
>   *
> - * Special cases:
> - * 0x03 callback function running and enqueued
> - * (was requeued on another CPU)
> - * 0x05 timer was migrated on CPU hotunplug
> + * The callback state is not part of the timer->state because clearing it would
> + * mean touching the timer after the callback, this makes it impossible to free
> + * the timer from the callback function.
>   *
> - * The "callback function running and enqueued" status is only possible on
> - * SMP. It happens for example when a posix timer expired and the callback
> + * Therefore we track the callback state in timer->base->running == timer.
> + *
> + * On SMP it is possible to have a "callback function running and enqueued"
> + * status. It happens for example when a posix timer expired and the callback
>   * queued a signal. Between dropping the lock which protects the posix timer
>   * and reacquiring the base lock of the hrtimer, another CPU can deliver the
> - * signal and rearm the timer. We have to preserve the callback running state,
> - * as otherwise the timer could be removed before the softirq code finishes the
> - * the handling of the timer.
> - *
> - * The HRTIMER_STATE_ENQUEUED bit is always or'ed to the current state
> - * to preserve the HRTIMER_STATE_CALLBACK in the above scenario. This
> - * also affects HRTIMER_STATE_MIGRATE where the preservation is not
> - * necessary. HRTIMER_STATE_MIGRATE is cleared after the timer is
> - * enqueued on the new cpu.
> + * signal and rearm the timer.
>   *
>   * All state transitions are protected by cpu_base->lock.
>   */
>  #define HRTIMER_STATE_INACTIVE 0x00
>  #define HRTIMER_STATE_ENQUEUED 0x01
> -#define HRTIMER_STATE_CALLBACK 0x02
> -#define HRTIMER_STATE_MIGRATE 0x04
> +#define HRTIMER_STATE_MIGRATE 0x02
>
>  /**
>   * struct hrtimer - the basic hrtimer structure
> @@ -153,6 +144,7 @@ struct hrtimer_clock_base {
>          struct timerqueue_head active;
>          ktime_t (*get_time)(void);
>          ktime_t offset;
> + struct hrtimer *running;
>  } __attribute__((__aligned__(HRTIMER_CLOCK_BASE_ALIGN)));
>
>  enum  hrtimer_base_type {
> @@ -402,7 +394,8 @@ extern u64 hrtimer_get_next_event(void);
>   */
>  static inline int hrtimer_active(const struct hrtimer *timer)
>  {
> - return timer->state != HRTIMER_STATE_INACTIVE;
> + return timer->state != HRTIMER_STATE_INACTIVE ||
> + timer->base->running == timer;
>  }

It seems to be not good, because hrtimer_active() check stops
to be atomic. So the things like hrtimer_try_to_cancel() race
with a callback of self-rearming timer and may return a false
positive result.

>  /*
> @@ -419,7 +412,7 @@ static inline int hrtimer_is_queued(stru
>   */
>  static inline int hrtimer_callback_running(struct hrtimer *timer)
>  {
> - return timer->state & HRTIMER_STATE_CALLBACK;
> + return timer->base->running == timer;
>  }
>
>  /* Forward a hrtimer so it expires after now: */
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -111,6 +111,13 @@ static inline int hrtimer_clockid_to_bas
>  #ifdef CONFIG_SMP
>
>  /*
> + * We require the migration_base for lock_hrtimer_base()/switch_hrtimer_base()
> + * such that hrtimer_callback_running() can unconditionally dereference
> + * timer->base.
> + */
> +static struct hrtimer_clock_base migration_base;
> +
> +/*
>   * We are using hashed locking: holding per_cpu(hrtimer_bases)[n].lock
>   * means that all timers which are tied to this base via timer->base are
>   * locked, and the base itself is locked too.
> @@ -119,8 +126,8 @@ static inline int hrtimer_clockid_to_bas
>   * be found on the lists/queues.
>   *
>   * When the timer's base is locked, and the timer removed from list, it is
> - * possible to set timer->base = NULL and drop the lock: the timer remains
> - * locked.
> + * possible to set timer->base = &migration_base and drop the lock: the timer
> + * remains locked.
>   */
>  static
>  struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
> @@ -130,7 +137,7 @@ struct hrtimer_clock_base *lock_hrtimer_
>
>          for (;;) {
>                  base = timer->base;
> - if (likely(base != NULL)) {
> + if (likely(base != &migration_base)) {
>                          raw_spin_lock_irqsave(&base->cpu_base->lock, *flags);
>                          if (likely(base == timer->base))
>                                  return base;
> @@ -194,8 +201,8 @@ switch_hrtimer_base(struct hrtimer *time
>                  if (unlikely(hrtimer_callback_running(timer)))
>                          return base;
>
> - /* See the comment in lock_timer_base() */
> - timer->base = NULL;
> + /* See the comment in lock_hrtimer_base() */
> + timer->base = &migration_base;
>                  raw_spin_unlock(&base->cpu_base->lock);
>                  raw_spin_lock(&new_base->cpu_base->lock);
>
> @@ -840,11 +847,7 @@ static int enqueue_hrtimer(struct hrtime
>
>          base->cpu_base->active_bases |= 1 << base->index;
>
> - /*
> - * HRTIMER_STATE_ENQUEUED is or'ed to the current state to preserve the
> - * state of a possibly running callback.
> - */
> - timer->state |= HRTIMER_STATE_ENQUEUED;
> + timer->state = HRTIMER_STATE_ENQUEUED;
>
>          return timerqueue_add(&base->active, &timer->node);
>  }
> @@ -894,7 +897,6 @@ static inline int
>  remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
>  {
>          if (hrtimer_is_queued(timer)) {
> - unsigned long state;
>                  int reprogram;
>
>                  /*
> @@ -908,13 +910,8 @@ remove_hrtimer(struct hrtimer *timer, st
>                  debug_deactivate(timer);
>                  timer_stats_hrtimer_clear_start_info(timer);
>                  reprogram = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
> - /*
> - * We must preserve the CALLBACK state flag here,
> - * otherwise we could move the timer base in
> - * switch_hrtimer_base.
> - */
> - state = timer->state & HRTIMER_STATE_CALLBACK;
> - __remove_hrtimer(timer, base, state, reprogram);
> +
> + __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, reprogram);
>                  return 1;
>          }
>          return 0;
> @@ -1124,7 +1121,8 @@ static void __run_hrtimer(struct hrtimer
>          WARN_ON(!irqs_disabled());
>
>          debug_deactivate(timer);
> - __remove_hrtimer(timer, base, HRTIMER_STATE_CALLBACK, 0);
> + base->running = timer;
> + __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0);
>          timer_stats_account_hrtimer(timer);
>          fn = timer->function;
>
> @@ -1140,7 +1138,7 @@ static void __run_hrtimer(struct hrtimer
>          raw_spin_lock(&cpu_base->lock);
>
>          /*
> - * Note: We clear the CALLBACK bit after enqueue_hrtimer and
> + * Note: We clear the running state after enqueue_hrtimer and
>           * we do not reprogramm the event hardware. Happens either in
>           * hrtimer_start_range_ns() or in hrtimer_interrupt()
>           *
> @@ -1152,9 +1150,8 @@ static void __run_hrtimer(struct hrtimer
>              !(timer->state & HRTIMER_STATE_ENQUEUED))
>                  enqueue_hrtimer(timer, base);
>
> - WARN_ON_ONCE(!(timer->state & HRTIMER_STATE_CALLBACK));
> -
> - timer->state &= ~HRTIMER_STATE_CALLBACK;
> + WARN_ON_ONCE(base->running != timer);
> + base->running = NULL;
>  }
>
>  static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
> @@ -1523,11 +1520,10 @@ static void migrate_hrtimer_list(struct
>                   * hrtimer_interrupt after we migrated everything to
>                   * sort out already expired timers and reprogram the
>                   * event device.
> + *
> + * Sets timer->state = HRTIMER_STATE_ENQUEUED.
>                   */
>                  enqueue_hrtimer(timer, new_base);
> -
> - /* Clear the migration state bit */
> - timer->state &= ~HRTIMER_STATE_MIGRATE;
>          }
>  }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

  reply	other threads:[~2015-06-03 16:26 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-03 13:29 [PATCH 0/9] sched: balance callbacks Peter Zijlstra
2015-06-03 13:29 ` [PATCH 1/9] sched: Replace post_schedule with a balance callback list Peter Zijlstra
2015-06-03 13:29 ` [PATCH 2/9] sched: Use replace normalize_task() with __sched_setscheduler() Peter Zijlstra
2015-06-03 13:29 ` [PATCH 3/9] sched: Allow balance callbacks for check_class_changed() Peter Zijlstra
2015-06-03 13:29 ` [PATCH 4/9] sched,rt: Remove return value from pull_rt_task() Peter Zijlstra
2015-06-03 13:29 ` [PATCH 5/9] sched,rt: Convert switched_{from,to}_rt() / prio_changed_rt() to balance callbacks Peter Zijlstra
2015-06-03 13:29 ` [PATCH 6/9] sched,dl: Remove return value from pull_dl_task() Peter Zijlstra
2015-06-03 13:29 ` [PATCH 7/9] sched,dl: Convert switched_{from,to}_dl() / prio_changed_dl() to balance callbacks Peter Zijlstra
2015-06-03 13:29 ` [PATCH 8/9] hrtimer: Allow hrtimer::function() to free the timer Peter Zijlstra
2015-06-03 16:26   ` Kirill Tkhai [this message]
2015-06-03 21:13     ` Peter Zijlstra
2015-06-04  9:07       ` Kirill Tkhai
2015-06-04 10:49         ` Peter Zijlstra
2015-06-04 10:55           ` Peter Zijlstra
2015-06-04 10:58             ` Peter Zijlstra
2015-06-05  9:02           ` Kirill Tkhai
2015-06-05  9:03             ` Kirill Tkhai
2015-06-05  9:11               ` Peter Zijlstra
2015-06-05  9:10             ` Peter Zijlstra
2015-06-05  9:27               ` Kirill Tkhai
2015-06-03 17:41   ` Thomas Gleixner
2015-06-03 21:29     ` Peter Zijlstra
2015-06-04  5:59       ` Ingo Molnar
2015-06-04 10:07         ` Peter Zijlstra
2015-06-04 12:37           ` Ingo Molnar
2015-06-03 13:29 ` [PATCH 9/9] sched,dl: Fix sched class hopping CBS hole Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=214021433348760@web25g.yandex.ru \
    --to=tkhai@yandex.ru \
    --cc=juri.lelli@gmail.com \
    --cc=ktkhai@parallels.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@redhat.com \
    --cc=pang.xunlei@linaro.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    --cc=wanpeng.li@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.