public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Teoh <htmldeveloper@gmail.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Peter Teoh <htmldeveloper@gmail.com>,
	Johannes Weiner <hannes@saeurebad.de>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	LKML <linux-kernel@vger.kernel.org>, Tejun Heo <htejun@gmail.com>,
	Dipankar Sarma <dipankar@in.ibm.com>
Subject: Re: per cpun+ spin locks coexistence?
Date: Thu, 20 Mar 2008 00:25:04 +0800	[thread overview]
Message-ID: <47E13E60.9050706@dso.org.sg> (raw)
In-Reply-To: <47E0033E.4010300@cosmosbay.com>

Eric Dumazet wrote:
> Peter Teoh a écrit :
>> On 3/18/08, Eric Dumazet <dada1@cosmosbay.com> wrote:
>>
>>  
>>> You are right Peter, that fs/file.c contains some leftover from 
>>> previous
>>>  implementation of defer queue,
>>>  that was using a timer.
>>>
>>>  So we can probably provide a patch that :
>>>
>>>  - Use spin_lock() & spin_unlock() instead of spin_lock_bh() &
>>>  spin_unlock_bh() in free_fdtable_work()
>>>  since we dont anymore use a softirq (timer) to reschedule the 
>>> workqueue.
>>>
>>>  ( this timer was deleted by the following patch :
>>>  http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html
>>>
>>>
>>>  But, you cannot avoid use of spin_lock()/spin_unlock() because
>>>  schedule_work() makes no garantee that the work will be done by 
>>> this cpu.
>>>     
>>
>> Ah.....u have hit the nail....and combine with Johannes Weiner's
>> explanation, I have pieced together the full scenario:
>>
>> First, the following is possible:
>>
>>                 fddef = &get_cpu_var(fdtable_defer_list);
>>                 spin_lock(&fddef->lock);
>>                 fdt->next = fddef->next;
>>                 fddef->next = fdt;==============>executing at CPU A
>>                 /* vmallocs are handled from the workqueue context */
>>                 schedule_work(&fddef->wq);
>>                 spin_unlock(&fddef->lock);==============>executing at 
>> CPU B
>>                 put_cpu_var(fdtable_defer_list);
>>
>> where the execution can switch CPU after the schedule_work() API, then
>> LOGICALLY u definitely need the spin_lock(), and the per_cpu data is
>> really not necessary.
>>
>> But without the per_cpu structure, then the following "dedicated
>> chunk" can only execute on one processor, with the possibility of
>> switching to another processor after schedule_work():
>>   
> Hum, you misunderstood the point.
>
> schedule_work(); wont switch your current CPU, since you are inside a 
> spin_lock
> ()/spin_unlock() pair, so preemption is not possible.
>
>
>
>> So then we introduce the per_cpu structure - so that the "dedicated
>> chunk" can be executing on multiple processor ALL AT THE SAME TIME,
>> without interferring each other, as fddef are per-cpu (rightfully
>> owned only before schedule_work() is called, but after schedule_work()
>> is called, an arbitrary CPU will be executing this fddef).
>>
>> spin_lock() is necessary because of the possibility of CPU switch
>> (schedule_work()).
>>
>> and per_cpu is so that the same chunk of code can be executing at
>> multiple CPUs all at the same time.
>>
>> Now the key issue rises up - as I have just asked before u answered 
>> my question:
>>
>> http://mail.nl.linux.org/kernelnewbies/2008-03/msg00236.html
>>
>> can schedule_work() sleep?   (just like schedule(), whcih can sleep 
>> right?)
>> schedule_work() is guaranteed to execute the work queue at least once,
>> and so this thread may or may not sleep. correct?   Or wrong?
>>
>>   
> schedule_work() cannot sleep. It only queues a work to be done later 
> by a special thread.
>
> We need this because vfree() should not be called from softirq handler 
> (rcu in this case), so we queue a (small) job.
>> Problem is when u sleep and never wake up, then the spin_lock become
>> permanently locked, and when later the same CPU (have to be the same
>> fddef CPU) is being reschedule to execute the get_cpu_var() again, it
>> will spin_lock() infinitely, resulting in 100% CPU utilization error.
>>
>> To prevent these types of error, spin_lock are always not to be used
>> with to wrap around functions that can sleep, and can only containing
>> short routines between lock and unlock.
>>
>> Is my analysis correct?
>>
>>   
> Not exactly :) , but please continue to learn :)
>

Thank you everyone here for a very informative education.   I will go 
back and analyse in more detail. :-).



      reply	other threads:[~2008-03-19 21:05 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-12 16:17 per cpun+ spin locks coexistence? Peter Teoh
2008-03-14 17:54 ` Jeremy Fitzhardinge
2008-03-16 16:30   ` Peter Teoh
2008-03-16 20:20     ` Johannes Weiner
2008-03-17 17:06       ` Peter Teoh
2008-03-17 17:51         ` Johannes Weiner
2008-03-17 19:22         ` Eric Dumazet
2008-03-18 17:00           ` Peter Teoh
2008-03-18 17:34             ` Dipankar Sarma
2008-03-18 18:00             ` Eric Dumazet
2008-03-19 16:25               ` Peter Teoh [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47E13E60.9050706@dso.org.sg \
    --to=htmldeveloper@gmail.com \
    --cc=dada1@cosmosbay.com \
    --cc=dipankar@in.ibm.com \
    --cc=hannes@saeurebad.de \
    --cc=htejun@gmail.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox