All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Nishanth Menon <nm@ti.com>
Cc: linux-pm <linux-pm@lists.linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	linux-arm <linux-arm-kernel@lists.infradead.org>,
	linux-omap <linux-omap@vger.kernel.org>
Subject: Re: [PATCH v4] power: introduce library for device-specific OPPs
Date: Fri, 24 Sep 2010 14:40:03 -0700	[thread overview]
Message-ID: <20100924214003.GL2375@linux.vnet.ibm.com> (raw)
In-Reply-To: <4C9D177D.6090708@ti.com>

On Fri, Sep 24, 2010 at 04:26:21PM -0500, Nishanth Menon wrote:
> Paul E. McKenney had written, on 09/24/2010 02:37 PM, the following:
> [...]
> >
> >Looks like a good start!!!  Some questions and suggestions about RCU
> Thanks for the review.. few comments below..

Back at you!  ;-)

> >usage interspersed below.
> >
> >                                                        Thanx, Paul
> [...]
> >>+
> >>+/**
> >>+ * find_device_opp() - find device_opp struct using device pointer
> >>+ * @dev:     device pointer used to lookup device OPPs
> >>+ *
> >>+ * Search list of device OPPs for one containing matching device. Does a RCU
> >>+ * reader operation to grab the pointer needed.
> >>+ *
> >>+ * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or
> >>+ * -EINVAL based on type of error.
> >>+ */
> >>+static struct device_opp *find_device_opp(struct device *dev)
> >>+{
> >>+     struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV);
> >>+
> >>+     if (unlikely(!dev || IS_ERR(dev))) {
> >>+             pr_err("%s: Invalid parameters being passed\n", __func__);
> >>+             return ERR_PTR(-EINVAL);
> >>+     }
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) {
> >>+             if (tmp_dev_opp->dev == dev) {
> >>+                     dev_opp = tmp_dev_opp;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >
> >What prevents the structure pointed to by dev_opp from being freed
> >at this point?  We are no longer in an RCU read-side critical section,
> >so RCU grace periods starting during the above RCU read-side critical
> >section can now end.
> 
> dev_opp is never freed in the implementation -> it represents
> domains, only adds with list_add_rcu() is done -> wont the usage be
> safe then? or I being blind?
> 
> the reason why we dont free is coz of the following: dev_opp
> represents voltage domains in opp library. SoC frameworks are
> required to register only those voltage domain opp that are
> required. by allowing a free logic, I knew it'd have complicated the
> implementation way beyond what we needed it to be.

Perhaps I was confusing two different data structures, if so, apologies.

So you are freeing the opp level, but never the dev_opp level, then?

But yes, if you are only adding and never deleting, then it is safe to
pass the pointers out of an RCU read-side critical section.  But please
add a comment saying why you are doing this.  Otherwise, Coccinelle will
cause me to continue complaining about this to you.  ;-)

And the later uses still look buggy to me, please see below.

> >Here is an example sequence of events that I am worried about:
> >
> >o       CPU 1 enters find_device_opp(), and pick up a pointer to
> >        a given device opp.
> >
> >o       CPU 2 executes opp_set_availability(), replacing that same
> >        device opp with a new one.  It then calls synchronize_rcu()
> >        which blocks waiting for CPU 1 to exit its RCU read-side
> >        critical section.
> >
> >o       CPU 1 exits its RCU read-side critical section, arriving at
> >        this point in the code.
> >
> >o       CPU 2's synchronize_rcu() is now permitted to return, executing
> >        the kfree(), which frees up the memory that CPU 1's dev_opp
> >        pointer references.
> >
> >o       This newly freed memory is allocated for some other structure
> >        by CPU 3.  CPU 1 and CPU 3 are now trying to use the same
> >        memory for two different structures, and nothing good can
> >        possibly come of this.  The kernel dies a brutal and nasty
> >        death.
> >
> >One way to fix this is to have the caller do rcu_read_lock() before
> >calling find_device_opp(), and to do rcu_read_unlock() only after the
> >caller has finished using the pointer that find_device_opp() returns.
> >This works well unless the caller needs to do some blocking operation
> >before it gets done using the pointer.
> >
> >Another approach is for find_device_opp() to use a reference count on
> >the structure, and for opp_set_availability() to avoid freeing the
> >structure unless/until the reference counter drops to zero.
> >
> >There are other approaches as well, please feel free to take a look
> >at Documentation/RCU/rcuref.txt for more info on using reference
> >counting and RCU.
> thx. I probably should read yet again if I got my understanding of
> usage right..
> 
> >
> [...]
> >>+
> >>+/**
> >>+ * opp_find_freq_exact() - search for an exact frequency
> >>+ * @dev:             device for which we do this operation
> >>+ * @freq:            frequency to search for
> >>+ * @is_available:    true/false - match for available opp
> >>+ *
> >>+ * Searches for exact match in the opp list and returns pointer to the matching
> >>+ * opp if found, else returns ERR_PTR in case of error and should be handled
> >>+ * using IS_ERR.
> >>+ *
> >>+ * Note: available is a modifier for the search. if available=true, then the
> >>+ * match is for exact matching frequency and is available in the stored OPP
> >>+ * table. if false, the match is for exact frequency which is not available.
> >>+ *
> >>+ * This provides a mechanism to enable an opp which is not available currently
> >>+ * or the opposite as well.
> >>+ *
> >>+ * Locking: RCU reader.
> >>+ */
> >>+struct opp *opp_find_freq_exact(struct device *dev,
> >>+                                  unsigned long freq, bool available)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp))
> >>+             return opp;
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
> >>+             if (temp_opp->available == available &&
> >>+                             temp_opp->rate == freq) {
> >>+                     opp = temp_opp;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >
> >But this one sadly has the same problem that find_device_opp() does.
> is the concern about opp OR about dev_opp here? I am guessing opp..
> >
> >>+     return opp;
> >>+}
> >>+
> >>+/**
> >>+ * opp_find_freq_ceil() - Search for an rounded ceil freq
> >>+ * @dev:     device for which we do this operation
> >>+ * @freq:    Start frequency
> >>+ *
> >>+ * Search for the matching ceil *available* OPP from a starting freq
> >>+ * for a device.
> >>+ *
> >>+ * Returns matching *opp and refreshes *freq accordingly, else returns
> >>+ * ERR_PTR in case of error and should be handled using IS_ERR.
> >>+ *
> >>+ * Locking: RCU reader.
> >>+ */
> >>+struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >>+
> >>+     if (!dev || !freq) {
> >>+             pr_err("%s: invalid param dev=%p freq=%p\n", __func__,
> >>+                             dev, freq);
> >>+             return ERR_PTR(-EINVAL);
> >>+     }
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp))
> >>+             return opp;
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
> >>+             if (temp_opp->available && temp_opp->rate >= *freq) {
> >>+                     opp = temp_opp;
> >>+                     *freq = opp->rate;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >
> >And this one also has the same problem that find_device_opp() does.
>
> guessing opp ptr here.. am I right? if it is about device_opp, it is
> not going to be freed as I mentioned above - at least we dont give
> an function to update(hence free) it.

It really does look to me that you are passing a pointer to the thing
being freed out of an RCU read-side critical section.  If you are really
doing this, you do need to do something to fix it.  I outlined some of
the options earlier.

> >>+     return opp;
> >>+}
> >>+
> >>+/**
> >>+ * opp_find_freq_floor() - Search for a rounded floor freq
> >>+ * @dev:     device for which we do this operation
> >>+ * @freq:    Start frequency
> >>+ *
> >>+ * Search for the matching floor *available* OPP from a starting freq
> >>+ * for a device.
> >>+ *
> >>+ * Returns matching *opp and refreshes *freq accordingly, else returns
> >>+ * ERR_PTR in case of error and should be handled using IS_ERR.
> >>+ *
> >>+ * Locking: RCU reader.
> >>+ */
> >>+struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >>+
> >>+     if (!dev || !freq) {
> >>+             pr_err("%s: invalid param dev=%p freq=%p\n", __func__,
> >>+                             dev, freq);
> >>+             return ERR_PTR(-EINVAL);
> >>+     }
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp))
> >>+             return opp;
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
> >>+             if (temp_opp->available) {
> >>+                     /* go to the next node, before choosing prev */
> >>+                     if (temp_opp->rate > *freq)
> >>+                             break;
> >>+                     else
> >>+                             opp = temp_opp;
> >>+             }
> >>+     }
> >>+     if (!IS_ERR(opp))
> >>+             *freq = opp->rate;
> >>+     rcu_read_unlock();
> >
> >As does this one.
>
> guessing opp ptr here.. am I right?

Again, here it looks to me like you are passing a pointer out of an RCU
read-side critical section that could be freed out from under you.  If
so, again, this must be fixed.

> >>+     return opp;
> >>+}
> >>+
> >>+/**
> >>+ * opp_add()  - Add an OPP table from a table definitions
> >>+ * @dev:     device for which we do this operation
> >>+ * @freq:    Frequency in Hz for this OPP
> >>+ * @u_volt:  Voltage in uVolts for this OPP
> >>+ *
> >>+ * This function adds an opp definition to the opp list and returns status.
> >>+ * The opp is made available by default and it can be controlled using
> >>+ * opp_enable/disable functions.
> >>+ *
> >>+ * Locking: RCU, mutex
> >>+ */
> >>+int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
> >>+{
> >>+     struct device_opp *tmp_dev_opp, *dev_opp = NULL;
> >>+     struct opp *opp, *new_opp;
> >>+     struct list_head *head;
> >>+
> >>+     /* Check for existing list for 'dev' */
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) {
> >>+             if (dev == tmp_dev_opp->dev) {
> >>+                     dev_opp = tmp_dev_opp;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >>+
> >>+     /* allocate new OPP node */
> >>+     new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
> >>+     if (!new_opp) {
> >>+             pr_warning("%s: unable to allocate new opp node\n",
> >>+                     __func__);
> >>+             return -ENOMEM;
> >>+     }
> >>+
> >>+     if (!dev_opp) {
> >>+             /* Allocate a new device OPP table */
> >>+             dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
> >>+             if (!dev_opp) {
> >>+                     kfree(new_opp);
> >>+                     pr_warning("%s: unable to allocate device structure\n",
> >>+                             __func__);
> >>+                     return -ENOMEM;
> >>+             }
> >>+
> >>+             dev_opp->dev = dev;
> >>+             INIT_LIST_HEAD(&dev_opp->opp_list);
> >>+             mutex_init(&dev_opp->lock);
> >>+
> >>+             /* Secure the device list modification */
> >>+             mutex_lock(&dev_opp_list_lock);
> >>+             list_add_rcu(&dev_opp->node, &dev_opp_list);
> >>+             mutex_unlock(&dev_opp_list_lock);
> >>+             synchronize_rcu();
> >
> >You do not need to wait for an RCU grace period when adding objects, only
> >between removing them and freeing them.
>
> ouch.. my bad.. thx.. will fix
> 
> >
> >>+     }
> >>+
> >>+     /* populate the opp table */
> >>+     new_opp->dev_opp = dev_opp;
> >>+     new_opp->rate = freq;
> >>+     new_opp->u_volt = u_volt;
> >>+     new_opp->available = true;
> >>+
> >>+     /* make the dev_opp modification safe */
> >>+     mutex_lock(&dev_opp->lock);
> >>+
> >>+     rcu_read_lock();
> >>+     /* Insert new OPP in order of increasing frequency */
> >>+     head = &dev_opp->opp_list;
> >>+     list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) {
> >>+             if (new_opp->rate < opp->rate)
> >>+                     break;
> >>+             else
> >>+                     head = &opp->node;
> >>+     }
> >>+     rcu_read_unlock();
> >>+
> >>+     list_add_rcu(&new_opp->node, head);
> >>+     mutex_unlock(&dev_opp->lock);
> >>+     synchronize_rcu();
> >
> >Ditto.
> thx.. will fix.
> 
> >
> >>+     return 0;
> >>+}
> >>+
> >>+/**
> >>+ * opp_set_availability() - helper to set the availability of an opp
> >>+ * @opp:             Pointer to opp
> >>+ * @availability_req:        availability status requested for this opp
> >>+ *
> >>+ * Set the availability of an OPP with an RCU operation, opp_{enable,disable}
> >>+ * share a common logic which is isolated here.
> >>+ *
> >>+ * Returns -EINVAL for bad pointers, -ENOMEM if no memory available for the
> >>+ * copy operation, returns 0 if no modifcation was done OR modification was
> >>+ * successful.
> >>+ */
> >>+static int opp_set_availability(struct opp *opp, bool availability_req)
> >>+{
> >>+     struct opp *new_opp, *tmp_opp;
> >>+     bool is_available;
> >>+
> >>+     if (unlikely(!opp || IS_ERR(opp))) {
> >>+             pr_err("%s: Invalid parameters being passed\n", __func__);
> >>+             return -EINVAL;
> >>+     }
> >>+
> >>+     new_opp = kmalloc(sizeof(struct opp), GFP_KERNEL);
> >>+     if (!new_opp) {
> >>+             pr_warning("%s: unable to allocate opp\n", __func__);
> >>+             return -ENOMEM;
> >>+     }
> >>+
> >>+     mutex_lock(&opp->dev_opp->lock);
> >>+
> >>+     rcu_read_lock();
> >>+     tmp_opp = rcu_dereference(opp);
> >>+     is_available = tmp_opp->available;
> >>+     rcu_read_unlock();
> >>+
> >>+     /* Is update really needed? */
> >>+     if (is_available == availability_req) {
> >>+             mutex_unlock(&opp->dev_opp->lock);
> >>+             kfree(tmp_opp);
> >>+             return 0;
> >>+     }
> >>+
> >>+     *new_opp = *opp;
> >>+     new_opp->available = availability_req;
> >>+     list_replace_rcu(&opp->node, &new_opp->node);
> >>+
> >>+     mutex_unlock(&opp->dev_opp->lock);
> >>+     synchronize_rcu();
> >
> >If you decide to rely on reference counts to fix the problem in
> >find_device_opp(), you will need to check the reference counts here.
> >Again, please see Documentation/RCU/rcuref.txt.
> 
> Does the original point about not needing to free dev_opp resolve this?

For the dev_opp case, yes.  However, I believe that my point is still
valid for the opp case.

> >>+     kfree(opp);
> >>+
> >>+     return 0;
> >>+}
> >>+
> >>+/**
> >>+ * opp_enable() - Enable a specific OPP
> >>+ * @opp:     Pointer to opp
> >>+ *
> >>+ * Enables a provided opp. If the operation is valid, this returns 0, else the
> >>+ * corresponding error value. It is meant to be used for users an OPP available
> >>+ * after being temporarily made unavailable with opp_disable.
> >>+ *
> >>+ * Locking: RCU, mutex
> >
> >By "Locking: RCU", you presumably don't mean that the caller must do
> >an rcu_read_lock() -- this would result in a synchronize_rcu() being
> >invoked in an RCU read-side critical section, which is illegal.
> aye..thx. I will make it more verbose. Does the following sound right?
> 
> Locking used internally: RCU copy-update and read_lock used, mutex
> 
> and for the readers:
> 
> Locking used internally: RCU read_lock used
> 
> or do we need to go all verbatim about the implementation here?
> 
> I intended the user to know the context in which they can call it,
> for example, since mutex is used, dont think of using this in
> interrupt context. since read_locks are already used, dont need to
> double lock it.. opp library takes care of it's own exclusivity.

I would stick to the constraints on the caller, and describe the internals
elsewhere, for example, near the data-structure definitions.  But tastes
do vary on this.

> >>+ */
> >>+int opp_enable(struct opp *opp)
> >>+{
> >>+     return opp_set_availability(opp, true);
> >>+}
> >>+
> >>+/**
> >>+ * opp_disable() - Disable a specific OPP
> >>+ * @opp:     Pointer to opp
> >>+ *
> >>+ * Disables a provided opp. If the operation is valid, this returns
> >>+ * 0, else the corresponding error value. It is meant to be a temporary
> >>+ * control by users to make this OPP not available until the circumstances are
> >>+ * right to make it available again (with a call to opp_enable).
> >>+ *
> >>+ * Locking: RCU, mutex
> >
> >Ditto.  (And similar feedback applies elsewhere.)
> >
> >>+ */
> >>+int opp_disable(struct opp *opp)
> >>+{
> >>+     return opp_set_availability(opp, false);
> >>+}
> >>+
> >>+#ifdef CONFIG_CPU_FREQ
> >>+/**
> >>+ * opp_init_cpufreq_table() - create a cpufreq table for a device
> >>+ * @dev:     device for which we do this operation
> >>+ * @table:   Cpufreq table returned back to caller
> >>+ *
> >>+ * Generate a cpufreq table for a provided device- this assumes that the
> >>+ * opp list is already initialized and ready for usage.
> >>+ *
> >>+ * This function allocates required memory for the cpufreq table. It is
> >>+ * expected that the caller does the required maintenance such as freeing
> >>+ * the table as required.
> >>+ *
> >>+ * WARNING: It is  important for the callers to ensure refreshing their copy of
> >>+ * the table if any of the mentioned functions have been invoked in the interim.
> >>+ *
> >>+ * Locking: RCU reader
> >>+ */
> >>+void opp_init_cpufreq_table(struct device *dev,
> >>+                         struct cpufreq_frequency_table **table)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *opp;
> >>+     struct cpufreq_frequency_table *freq_table;
> >>+     int i = 0;
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp)) {
> >>+             pr_warning("%s: unable to find device\n", __func__);
> >>+             return;
> >>+     }
> >>+
> >>+     freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
> >>+                          (opp_get_opp_count(dev) + 1), GFP_ATOMIC);
> >>+     if (!freq_table) {
> >>+             pr_warning("%s: failed to allocate frequency table\n",
> >>+                        __func__);
> >>+             return;
> >
> >How does the caller tell that the allocation failed?  Should the caller
> >set the pointer passed in through the "table" argument to NULL before
> >calling this function?  Or should this function set *table to NULL
> >before returning in this case?
> 
> Good catch. Thanks. I would rather change the return to int and pass
> proper errors to caller so that they can handle it appropriately.

Works for me!

							Thanx, Paul

WARNING: multiple messages have this Message-ID (diff)
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4] power: introduce library for device-specific OPPs
Date: Fri, 24 Sep 2010 14:40:03 -0700	[thread overview]
Message-ID: <20100924214003.GL2375@linux.vnet.ibm.com> (raw)
In-Reply-To: <4C9D177D.6090708@ti.com>

On Fri, Sep 24, 2010 at 04:26:21PM -0500, Nishanth Menon wrote:
> Paul E. McKenney had written, on 09/24/2010 02:37 PM, the following:
> [...]
> >
> >Looks like a good start!!!  Some questions and suggestions about RCU
> Thanks for the review.. few comments below..

Back at you!  ;-)

> >usage interspersed below.
> >
> >                                                        Thanx, Paul
> [...]
> >>+
> >>+/**
> >>+ * find_device_opp() - find device_opp struct using device pointer
> >>+ * @dev:     device pointer used to lookup device OPPs
> >>+ *
> >>+ * Search list of device OPPs for one containing matching device. Does a RCU
> >>+ * reader operation to grab the pointer needed.
> >>+ *
> >>+ * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or
> >>+ * -EINVAL based on type of error.
> >>+ */
> >>+static struct device_opp *find_device_opp(struct device *dev)
> >>+{
> >>+     struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV);
> >>+
> >>+     if (unlikely(!dev || IS_ERR(dev))) {
> >>+             pr_err("%s: Invalid parameters being passed\n", __func__);
> >>+             return ERR_PTR(-EINVAL);
> >>+     }
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) {
> >>+             if (tmp_dev_opp->dev == dev) {
> >>+                     dev_opp = tmp_dev_opp;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >
> >What prevents the structure pointed to by dev_opp from being freed
> >at this point?  We are no longer in an RCU read-side critical section,
> >so RCU grace periods starting during the above RCU read-side critical
> >section can now end.
> 
> dev_opp is never freed in the implementation -> it represents
> domains, only adds with list_add_rcu() is done -> wont the usage be
> safe then? or I being blind?
> 
> the reason why we dont free is coz of the following: dev_opp
> represents voltage domains in opp library. SoC frameworks are
> required to register only those voltage domain opp that are
> required. by allowing a free logic, I knew it'd have complicated the
> implementation way beyond what we needed it to be.

Perhaps I was confusing two different data structures, if so, apologies.

So you are freeing the opp level, but never the dev_opp level, then?

But yes, if you are only adding and never deleting, then it is safe to
pass the pointers out of an RCU read-side critical section.  But please
add a comment saying why you are doing this.  Otherwise, Coccinelle will
cause me to continue complaining about this to you.  ;-)

And the later uses still look buggy to me, please see below.

> >Here is an example sequence of events that I am worried about:
> >
> >o       CPU 1 enters find_device_opp(), and pick up a pointer to
> >        a given device opp.
> >
> >o       CPU 2 executes opp_set_availability(), replacing that same
> >        device opp with a new one.  It then calls synchronize_rcu()
> >        which blocks waiting for CPU 1 to exit its RCU read-side
> >        critical section.
> >
> >o       CPU 1 exits its RCU read-side critical section, arriving at
> >        this point in the code.
> >
> >o       CPU 2's synchronize_rcu() is now permitted to return, executing
> >        the kfree(), which frees up the memory that CPU 1's dev_opp
> >        pointer references.
> >
> >o       This newly freed memory is allocated for some other structure
> >        by CPU 3.  CPU 1 and CPU 3 are now trying to use the same
> >        memory for two different structures, and nothing good can
> >        possibly come of this.  The kernel dies a brutal and nasty
> >        death.
> >
> >One way to fix this is to have the caller do rcu_read_lock() before
> >calling find_device_opp(), and to do rcu_read_unlock() only after the
> >caller has finished using the pointer that find_device_opp() returns.
> >This works well unless the caller needs to do some blocking operation
> >before it gets done using the pointer.
> >
> >Another approach is for find_device_opp() to use a reference count on
> >the structure, and for opp_set_availability() to avoid freeing the
> >structure unless/until the reference counter drops to zero.
> >
> >There are other approaches as well, please feel free to take a look
> >at Documentation/RCU/rcuref.txt for more info on using reference
> >counting and RCU.
> thx. I probably should read yet again if I got my understanding of
> usage right..
> 
> >
> [...]
> >>+
> >>+/**
> >>+ * opp_find_freq_exact() - search for an exact frequency
> >>+ * @dev:             device for which we do this operation
> >>+ * @freq:            frequency to search for
> >>+ * @is_available:    true/false - match for available opp
> >>+ *
> >>+ * Searches for exact match in the opp list and returns pointer to the matching
> >>+ * opp if found, else returns ERR_PTR in case of error and should be handled
> >>+ * using IS_ERR.
> >>+ *
> >>+ * Note: available is a modifier for the search. if available=true, then the
> >>+ * match is for exact matching frequency and is available in the stored OPP
> >>+ * table. if false, the match is for exact frequency which is not available.
> >>+ *
> >>+ * This provides a mechanism to enable an opp which is not available currently
> >>+ * or the opposite as well.
> >>+ *
> >>+ * Locking: RCU reader.
> >>+ */
> >>+struct opp *opp_find_freq_exact(struct device *dev,
> >>+                                  unsigned long freq, bool available)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp))
> >>+             return opp;
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
> >>+             if (temp_opp->available == available &&
> >>+                             temp_opp->rate == freq) {
> >>+                     opp = temp_opp;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >
> >But this one sadly has the same problem that find_device_opp() does.
> is the concern about opp OR about dev_opp here? I am guessing opp..
> >
> >>+     return opp;
> >>+}
> >>+
> >>+/**
> >>+ * opp_find_freq_ceil() - Search for an rounded ceil freq
> >>+ * @dev:     device for which we do this operation
> >>+ * @freq:    Start frequency
> >>+ *
> >>+ * Search for the matching ceil *available* OPP from a starting freq
> >>+ * for a device.
> >>+ *
> >>+ * Returns matching *opp and refreshes *freq accordingly, else returns
> >>+ * ERR_PTR in case of error and should be handled using IS_ERR.
> >>+ *
> >>+ * Locking: RCU reader.
> >>+ */
> >>+struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >>+
> >>+     if (!dev || !freq) {
> >>+             pr_err("%s: invalid param dev=%p freq=%p\n", __func__,
> >>+                             dev, freq);
> >>+             return ERR_PTR(-EINVAL);
> >>+     }
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp))
> >>+             return opp;
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
> >>+             if (temp_opp->available && temp_opp->rate >= *freq) {
> >>+                     opp = temp_opp;
> >>+                     *freq = opp->rate;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >
> >And this one also has the same problem that find_device_opp() does.
>
> guessing opp ptr here.. am I right? if it is about device_opp, it is
> not going to be freed as I mentioned above - at least we dont give
> an function to update(hence free) it.

It really does look to me that you are passing a pointer to the thing
being freed out of an RCU read-side critical section.  If you are really
doing this, you do need to do something to fix it.  I outlined some of
the options earlier.

> >>+     return opp;
> >>+}
> >>+
> >>+/**
> >>+ * opp_find_freq_floor() - Search for a rounded floor freq
> >>+ * @dev:     device for which we do this operation
> >>+ * @freq:    Start frequency
> >>+ *
> >>+ * Search for the matching floor *available* OPP from a starting freq
> >>+ * for a device.
> >>+ *
> >>+ * Returns matching *opp and refreshes *freq accordingly, else returns
> >>+ * ERR_PTR in case of error and should be handled using IS_ERR.
> >>+ *
> >>+ * Locking: RCU reader.
> >>+ */
> >>+struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >>+
> >>+     if (!dev || !freq) {
> >>+             pr_err("%s: invalid param dev=%p freq=%p\n", __func__,
> >>+                             dev, freq);
> >>+             return ERR_PTR(-EINVAL);
> >>+     }
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp))
> >>+             return opp;
> >>+
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
> >>+             if (temp_opp->available) {
> >>+                     /* go to the next node, before choosing prev */
> >>+                     if (temp_opp->rate > *freq)
> >>+                             break;
> >>+                     else
> >>+                             opp = temp_opp;
> >>+             }
> >>+     }
> >>+     if (!IS_ERR(opp))
> >>+             *freq = opp->rate;
> >>+     rcu_read_unlock();
> >
> >As does this one.
>
> guessing opp ptr here.. am I right?

Again, here it looks to me like you are passing a pointer out of an RCU
read-side critical section that could be freed out from under you.  If
so, again, this must be fixed.

> >>+     return opp;
> >>+}
> >>+
> >>+/**
> >>+ * opp_add()  - Add an OPP table from a table definitions
> >>+ * @dev:     device for which we do this operation
> >>+ * @freq:    Frequency in Hz for this OPP
> >>+ * @u_volt:  Voltage in uVolts for this OPP
> >>+ *
> >>+ * This function adds an opp definition to the opp list and returns status.
> >>+ * The opp is made available by default and it can be controlled using
> >>+ * opp_enable/disable functions.
> >>+ *
> >>+ * Locking: RCU, mutex
> >>+ */
> >>+int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
> >>+{
> >>+     struct device_opp *tmp_dev_opp, *dev_opp = NULL;
> >>+     struct opp *opp, *new_opp;
> >>+     struct list_head *head;
> >>+
> >>+     /* Check for existing list for 'dev' */
> >>+     rcu_read_lock();
> >>+     list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) {
> >>+             if (dev == tmp_dev_opp->dev) {
> >>+                     dev_opp = tmp_dev_opp;
> >>+                     break;
> >>+             }
> >>+     }
> >>+     rcu_read_unlock();
> >>+
> >>+     /* allocate new OPP node */
> >>+     new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
> >>+     if (!new_opp) {
> >>+             pr_warning("%s: unable to allocate new opp node\n",
> >>+                     __func__);
> >>+             return -ENOMEM;
> >>+     }
> >>+
> >>+     if (!dev_opp) {
> >>+             /* Allocate a new device OPP table */
> >>+             dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
> >>+             if (!dev_opp) {
> >>+                     kfree(new_opp);
> >>+                     pr_warning("%s: unable to allocate device structure\n",
> >>+                             __func__);
> >>+                     return -ENOMEM;
> >>+             }
> >>+
> >>+             dev_opp->dev = dev;
> >>+             INIT_LIST_HEAD(&dev_opp->opp_list);
> >>+             mutex_init(&dev_opp->lock);
> >>+
> >>+             /* Secure the device list modification */
> >>+             mutex_lock(&dev_opp_list_lock);
> >>+             list_add_rcu(&dev_opp->node, &dev_opp_list);
> >>+             mutex_unlock(&dev_opp_list_lock);
> >>+             synchronize_rcu();
> >
> >You do not need to wait for an RCU grace period when adding objects, only
> >between removing them and freeing them.
>
> ouch.. my bad.. thx.. will fix
> 
> >
> >>+     }
> >>+
> >>+     /* populate the opp table */
> >>+     new_opp->dev_opp = dev_opp;
> >>+     new_opp->rate = freq;
> >>+     new_opp->u_volt = u_volt;
> >>+     new_opp->available = true;
> >>+
> >>+     /* make the dev_opp modification safe */
> >>+     mutex_lock(&dev_opp->lock);
> >>+
> >>+     rcu_read_lock();
> >>+     /* Insert new OPP in order of increasing frequency */
> >>+     head = &dev_opp->opp_list;
> >>+     list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) {
> >>+             if (new_opp->rate < opp->rate)
> >>+                     break;
> >>+             else
> >>+                     head = &opp->node;
> >>+     }
> >>+     rcu_read_unlock();
> >>+
> >>+     list_add_rcu(&new_opp->node, head);
> >>+     mutex_unlock(&dev_opp->lock);
> >>+     synchronize_rcu();
> >
> >Ditto.
> thx.. will fix.
> 
> >
> >>+     return 0;
> >>+}
> >>+
> >>+/**
> >>+ * opp_set_availability() - helper to set the availability of an opp
> >>+ * @opp:             Pointer to opp
> >>+ * @availability_req:        availability status requested for this opp
> >>+ *
> >>+ * Set the availability of an OPP with an RCU operation, opp_{enable,disable}
> >>+ * share a common logic which is isolated here.
> >>+ *
> >>+ * Returns -EINVAL for bad pointers, -ENOMEM if no memory available for the
> >>+ * copy operation, returns 0 if no modifcation was done OR modification was
> >>+ * successful.
> >>+ */
> >>+static int opp_set_availability(struct opp *opp, bool availability_req)
> >>+{
> >>+     struct opp *new_opp, *tmp_opp;
> >>+     bool is_available;
> >>+
> >>+     if (unlikely(!opp || IS_ERR(opp))) {
> >>+             pr_err("%s: Invalid parameters being passed\n", __func__);
> >>+             return -EINVAL;
> >>+     }
> >>+
> >>+     new_opp = kmalloc(sizeof(struct opp), GFP_KERNEL);
> >>+     if (!new_opp) {
> >>+             pr_warning("%s: unable to allocate opp\n", __func__);
> >>+             return -ENOMEM;
> >>+     }
> >>+
> >>+     mutex_lock(&opp->dev_opp->lock);
> >>+
> >>+     rcu_read_lock();
> >>+     tmp_opp = rcu_dereference(opp);
> >>+     is_available = tmp_opp->available;
> >>+     rcu_read_unlock();
> >>+
> >>+     /* Is update really needed? */
> >>+     if (is_available == availability_req) {
> >>+             mutex_unlock(&opp->dev_opp->lock);
> >>+             kfree(tmp_opp);
> >>+             return 0;
> >>+     }
> >>+
> >>+     *new_opp = *opp;
> >>+     new_opp->available = availability_req;
> >>+     list_replace_rcu(&opp->node, &new_opp->node);
> >>+
> >>+     mutex_unlock(&opp->dev_opp->lock);
> >>+     synchronize_rcu();
> >
> >If you decide to rely on reference counts to fix the problem in
> >find_device_opp(), you will need to check the reference counts here.
> >Again, please see Documentation/RCU/rcuref.txt.
> 
> Does the original point about not needing to free dev_opp resolve this?

For the dev_opp case, yes.  However, I believe that my point is still
valid for the opp case.

> >>+     kfree(opp);
> >>+
> >>+     return 0;
> >>+}
> >>+
> >>+/**
> >>+ * opp_enable() - Enable a specific OPP
> >>+ * @opp:     Pointer to opp
> >>+ *
> >>+ * Enables a provided opp. If the operation is valid, this returns 0, else the
> >>+ * corresponding error value. It is meant to be used for users an OPP available
> >>+ * after being temporarily made unavailable with opp_disable.
> >>+ *
> >>+ * Locking: RCU, mutex
> >
> >By "Locking: RCU", you presumably don't mean that the caller must do
> >an rcu_read_lock() -- this would result in a synchronize_rcu() being
> >invoked in an RCU read-side critical section, which is illegal.
> aye..thx. I will make it more verbose. Does the following sound right?
> 
> Locking used internally: RCU copy-update and read_lock used, mutex
> 
> and for the readers:
> 
> Locking used internally: RCU read_lock used
> 
> or do we need to go all verbatim about the implementation here?
> 
> I intended the user to know the context in which they can call it,
> for example, since mutex is used, dont think of using this in
> interrupt context. since read_locks are already used, dont need to
> double lock it.. opp library takes care of it's own exclusivity.

I would stick to the constraints on the caller, and describe the internals
elsewhere, for example, near the data-structure definitions.  But tastes
do vary on this.

> >>+ */
> >>+int opp_enable(struct opp *opp)
> >>+{
> >>+     return opp_set_availability(opp, true);
> >>+}
> >>+
> >>+/**
> >>+ * opp_disable() - Disable a specific OPP
> >>+ * @opp:     Pointer to opp
> >>+ *
> >>+ * Disables a provided opp. If the operation is valid, this returns
> >>+ * 0, else the corresponding error value. It is meant to be a temporary
> >>+ * control by users to make this OPP not available until the circumstances are
> >>+ * right to make it available again (with a call to opp_enable).
> >>+ *
> >>+ * Locking: RCU, mutex
> >
> >Ditto.  (And similar feedback applies elsewhere.)
> >
> >>+ */
> >>+int opp_disable(struct opp *opp)
> >>+{
> >>+     return opp_set_availability(opp, false);
> >>+}
> >>+
> >>+#ifdef CONFIG_CPU_FREQ
> >>+/**
> >>+ * opp_init_cpufreq_table() - create a cpufreq table for a device
> >>+ * @dev:     device for which we do this operation
> >>+ * @table:   Cpufreq table returned back to caller
> >>+ *
> >>+ * Generate a cpufreq table for a provided device- this assumes that the
> >>+ * opp list is already initialized and ready for usage.
> >>+ *
> >>+ * This function allocates required memory for the cpufreq table. It is
> >>+ * expected that the caller does the required maintenance such as freeing
> >>+ * the table as required.
> >>+ *
> >>+ * WARNING: It is  important for the callers to ensure refreshing their copy of
> >>+ * the table if any of the mentioned functions have been invoked in the interim.
> >>+ *
> >>+ * Locking: RCU reader
> >>+ */
> >>+void opp_init_cpufreq_table(struct device *dev,
> >>+                         struct cpufreq_frequency_table **table)
> >>+{
> >>+     struct device_opp *dev_opp;
> >>+     struct opp *opp;
> >>+     struct cpufreq_frequency_table *freq_table;
> >>+     int i = 0;
> >>+
> >>+     dev_opp = find_device_opp(dev);
> >>+     if (IS_ERR(dev_opp)) {
> >>+             pr_warning("%s: unable to find device\n", __func__);
> >>+             return;
> >>+     }
> >>+
> >>+     freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
> >>+                          (opp_get_opp_count(dev) + 1), GFP_ATOMIC);
> >>+     if (!freq_table) {
> >>+             pr_warning("%s: failed to allocate frequency table\n",
> >>+                        __func__);
> >>+             return;
> >
> >How does the caller tell that the allocation failed?  Should the caller
> >set the pointer passed in through the "table" argument to NULL before
> >calling this function?  Or should this function set *table to NULL
> >before returning in this case?
> 
> Good catch. Thanks. I would rather change the return to int and pass
> proper errors to caller so that they can handle it appropriately.

Works for me!

							Thanx, Paul

  reply	other threads:[~2010-09-24 21:40 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <[PATCH v3] power: introduce library for device-specific OPPs>
2010-09-24 12:50 ` [PATCH v4] power: introduce library for device-specific OPPs Nishanth Menon
2010-09-24 12:50   ` Nishanth Menon
2010-09-24 19:37   ` Paul E. McKenney
2010-09-24 19:37   ` Paul E. McKenney
2010-09-24 19:37   ` Paul E. McKenney
2010-09-24 19:37     ` Paul E. McKenney
2010-09-24 21:26     ` Nishanth Menon
2010-09-24 21:26     ` Nishanth Menon
2010-09-24 21:26       ` Nishanth Menon
2010-09-24 21:40       ` Paul E. McKenney [this message]
2010-09-24 21:40         ` Paul E. McKenney
2010-09-27 14:29         ` Nishanth Menon
2010-09-27 14:29           ` Nishanth Menon
2010-09-27 14:29         ` Nishanth Menon
2010-09-24 21:40       ` Paul E. McKenney
2010-09-25 20:55     ` Rafael J. Wysocki
2010-09-25 20:55     ` Rafael J. Wysocki
2010-09-25 20:55       ` Rafael J. Wysocki
2010-09-26  0:56       ` Paul E. McKenney
2010-09-26  0:56       ` Paul E. McKenney
2010-09-26  0:56         ` Paul E. McKenney
2010-09-27 14:25         ` Nishanth Menon
2010-09-27 14:25         ` Nishanth Menon
2010-09-27 14:25           ` Nishanth Menon
2010-09-27 14:25           ` Nishanth Menon
2010-09-27 19:53           ` Rafael J. Wysocki
2010-09-27 19:53           ` Rafael J. Wysocki
2010-09-27 19:53             ` Rafael J. Wysocki
2010-09-24 12:50 ` Nishanth Menon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100924214003.GL2375@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-omap@vger.kernel.org \
    --cc=linux-pm@lists.linux-foundation.org \
    --cc=nm@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.