From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael Wysocki <rjw@rjwysocki.net>,
	linaro-kernel@lists.linaro.org, linux-pm@vger.kernel.org,
	ego@linux.vnet.ibm.com, paulus@samba.org,
	shilpa.bhat@linux.vnet.ibm.com, prarit@redhat.com,
	robert.schoene@tu-dresden.de, skannan@codeaurora.org
Subject: Re: [PATCH 0/3] cpufreq: governor: Fix potential races
Date: Thu, 04 Jun 2015 12:06:49 +0530
Message-ID: <556FF201.8080100@linux.vnet.ibm.com>
In-Reply-To: <20150604061128.GF11325@linux>

On 06/04/2015 11:41 AM, Viresh Kumar wrote:
> On 04-06-15, 11:38, Preeti U Murthy wrote:
>> And a crash at the cpufreq worker thread again due to data access
>> exception when I change governors in parallel on a single core.
>>
>> cpu 0x3: Vector: 300 (Data Access) at [c000000fedb538f0]
>>     pc: c000000000856750: od_dbs_timer+0x60/0x1e0
>>     lr: c0000000000f489c: process_one_work+0x24c/0x910
>>     sp: c000000fedb53b70
>>    msr: 9000000100009033
>>    dar: 10
>>  dsisr: 40000000
>>   current = 0xc000000fe3d128e0
>>   paca    = 0xc000000007da1c80	 softe: 0	 irq_happened: 0x01
>>     pid   = 17227, comm = kworker/3:1
>>
>> With the backtrace being:
>>
>> [c000000fedb53be0] c0000000000f489c process_one_work+0x24c/0x910
>> [c000000fedb53c90] c0000000000f50dc worker_thread+0x17c/0x540
>> [c000000fedb53d20] c0000000000fed70 kthread+0x120/0x140
>> [c000000fedb53e30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
>>
>> But the kernel stays sane longer than before with the patchset. The
>> above crash happens around 15 seconds after the test begins, while
>> earlier it wouldn't survive 2 seconds even.
> 
> I haven't attempted to solve the race between the worker threads and
> governor-callbacks yet. What I have tried to solve is the race between
> different callbacks. And you shouldn't see a race there for now. For
> example a race between INIT/EXIT/START/STOP/LIMITS.

Your fix may not be complete, and here is why. We see the crash because we
have *only* attempted to serialize calls to cpufreq_governor_dbs() and have
not attempted to serialize the *entire logical sequence of operations*. Let's
take a look at what happens as a consequence.

CPU0                                          CPU1
store_scaling_governor()                   __cpufreq_remove_dev_finish()
 __cpufreq_governor(CPUFREQ_GOV_STOP)       __cpufreq_governor(CPUFREQ_GOV_START)
   policy->governor_enabled = false                    
   cpufreq_governor_dbs()                     policy->governor_enabled = true
    mutex_lock()
     gov_cancel_work()                        cpufreq_governor_dbs()
                                               wait on lock

      may call gov_queue_work()
       if (!policy->governor_enabled) : check fails
         and we end up queuing work

    mutex_unlock()                              
                                              mutex_lock()
                                              gov_queue_work()
                                              mutex_unlock()
 
__cpufreq_governor(CPUFREQ_GOV_POLICY_EXIT)
  mutex_lock()
    cpufreq_governor_dbs()
        kfree(dbs_data)
                                           
                                               timer fires and od_dbs_timer/cs_dbs_timer() runs
                                               References governor data structures which are freed                                          
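
To make the interleaving above easier to follow, here is a minimal
single-threaded replay of it in user-space C. This is only a sketch: the
struct race_model type and the race_* helpers are hypothetical names made up
for illustration, not kernel code, and the three flags only stand in for
policy->governor_enabled, the pending work item and the lifetime of dbs_data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the fields only mirror the state in the diagram. */
struct race_model {
	bool governor_enabled;	/* stands in for policy->governor_enabled */
	bool work_queued;	/* a work item / timer is pending */
	bool dbs_data_valid;	/* false once dbs_data has been "freed" */
};

/* Modelled on the check in gov_queue_work(): queue only if enabled. */
static void race_queue_work(struct race_model *m)
{
	if (!m->governor_enabled)
		return;
	m->work_queued = true;
}

/*
 * Replays the steps of the diagram in order. Returns true if pending
 * work would run against freed governor data.
 */
static bool race_replay(bool cpu1_start_interleaves)
{
	struct race_model m = {
		.governor_enabled = true,
		.work_queued = true,
		.dbs_data_valid = true,
	};

	m.governor_enabled = false;		/* CPU0: STOP clears the flag */
	if (cpu1_start_interleaves)
		m.governor_enabled = true;	/* CPU1: START sets it again */

	m.work_queued = false;			/* CPU0: gov_cancel_work() */
	race_queue_work(&m);			/* CPU0: running work tries to re-arm */
	if (cpu1_start_interleaves)
		race_queue_work(&m);		/* CPU1: START body queues work */

	m.dbs_data_valid = false;		/* CPU0: POLICY_EXIT frees dbs_data */

	/* Timer fires: is there pending work pointing at freed data? */
	return m.work_queued && !m.dbs_data_valid;
}
```

With race_replay(true) the late START re-enables the governor, so work is
left queued when dbs_data goes away. With race_replay(false) the re-arm is
rejected and nothing is pending at POLICY_EXIT.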

The issue as I see it is that one set of operations must be allowed to run *completely*
before another begins. When store_scaling_governor() says STOP, all governor operations
must stay stopped until store_scaling_governor() itself gives permission
to restart. Somebody else, in this case __cpufreq_remove_dev_finish(), cannot overrule
this if it arrives late. This is what is happening above.

So if store_scaling_governor() arrives first, its STOP|EXIT|START|LIMIT must complete before
the START|LIMIT of __cpufreq_remove_dev_finish() is allowed to run. So it is not just about
serializing, it's about *what needs to be serialized*.
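
As a rough illustration of serializing the whole sequence rather than each
callback, here is a small user-space sketch. Everything in it is an
assumption for illustration: struct seq_policy, the seq_* helpers and the
owner token are hypothetical and do not exist in cpufreq. A real fix would
more likely make the late caller sleep on a lock held across the sequence
than return -EBUSY; the token is used here only so that the ordering can be
seen in a single-threaded run.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callers; the names only mirror the two paths discussed. */
enum seq_owner { SEQ_NONE, SEQ_STORE_GOVERNOR, SEQ_REMOVE_DEV };
enum seq_event { EV_STOP, EV_EXIT, EV_INIT, EV_START };

/* Hypothetical model; not the kernel's struct cpufreq_policy. */
struct seq_policy {
	enum seq_owner owner;	/* who owns the governor sequence right now */
	bool governor_enabled;
	bool dbs_data_valid;
};

/* Claim the whole sequence; fails if another path is mid-sequence. */
static int seq_begin(struct seq_policy *p, enum seq_owner who)
{
	if (p->owner != SEQ_NONE)
		return -EBUSY;
	p->owner = who;
	return 0;
}

static void seq_end(struct seq_policy *p, enum seq_owner who)
{
	assert(p->owner == who);
	p->owner = SEQ_NONE;
}

/* One governor event; only the current sequence owner may issue it. */
static int seq_governor(struct seq_policy *p, enum seq_owner who,
			enum seq_event ev)
{
	if (p->owner != who)
		return -EBUSY;	/* a late arrival cannot overrule the owner */

	switch (ev) {
	case EV_STOP:  p->governor_enabled = false; break;
	case EV_EXIT:  p->dbs_data_valid = false;   break;
	case EV_INIT:  p->dbs_data_valid = true;    break;
	case EV_START:
		if (!p->dbs_data_valid)
			return -EINVAL;
		p->governor_enabled = true;
		break;
	}
	return 0;
}

/* Governor switch with a START from the other path arriving mid-way. */
static int seq_demo_late_start(struct seq_policy *p)
{
	int late;

	if (seq_begin(p, SEQ_STORE_GOVERNOR))
		return -EBUSY;
	seq_governor(p, SEQ_STORE_GOVERNOR, EV_STOP);
	late = seq_governor(p, SEQ_REMOVE_DEV, EV_START);	/* arrives late */
	seq_governor(p, SEQ_STORE_GOVERNOR, EV_EXIT);
	seq_governor(p, SEQ_STORE_GOVERNOR, EV_INIT);
	seq_governor(p, SEQ_STORE_GOVERNOR, EV_START);
	seq_end(p, SEQ_STORE_GOVERNOR);
	return late;	/* result seen by the late START */
}
```

In this model the late START is turned away while the switch is in progress,
so the governor stays stopped across EXIT and is only re-enabled by the path
that stopped it. Once seq_end() has run, the other path can claim the
sequence and issue its own START.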

Regards
Preeti U Murthy
                                       
