From: Peter Teoh <htmldeveloper@gmail.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Peter Teoh <htmldeveloper@gmail.com>,
Johannes Weiner <hannes@saeurebad.de>,
Jeremy Fitzhardinge <jeremy@goop.org>,
LKML <linux-kernel@vger.kernel.org>, Tejun Heo <htejun@gmail.com>,
Dipankar Sarma <dipankar@in.ibm.com>
Subject: Re: per cpun+ spin locks coexistence?
Date: Thu, 20 Mar 2008 00:25:04 +0800 [thread overview]
Message-ID: <47E13E60.9050706@dso.org.sg> (raw)
In-Reply-To: <47E0033E.4010300@cosmosbay.com>
Eric Dumazet wrote:
> Peter Teoh a écrit :
>> On 3/18/08, Eric Dumazet <dada1@cosmosbay.com> wrote:
>>
>>
>>> You are right Peter, that fs/file.c contains some leftover from
>>> previous
>>> implementation of defer queue,
>>> that was using a timer.
>>>
>>> So we can probably provide a patch that :
>>>
>>> - Use spin_lock() & spin_unlock() instead of spin_lock_bh() &
>>> spin_unlock_bh() in free_fdtable_work()
>>> since we dont anymore use a softirq (timer) to reschedule the
>>> workqueue.
>>>
>>> ( this timer was deleted by the following patch :
>>> http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html
>>>
>>>
>>> But, you cannot avoid use of spin_lock()/spin_unlock() because
>>> schedule_work() makes no garantee that the work will be done by
>>> this cpu.
>>>
>>
>> Ah.....u have hit the nail....and combine with Johannes Weiner's
>> explanation, I have pieced together the full scenario:
>>
>> First, the following is possible:
>>
>> fddef = &get_cpu_var(fdtable_defer_list);
>> spin_lock(&fddef->lock);
>> fdt->next = fddef->next;
>> fddef->next = fdt;==============>executing at CPU A
>> /* vmallocs are handled from the workqueue context */
>> schedule_work(&fddef->wq);
>> spin_unlock(&fddef->lock);==============>executing at
>> CPU B
>> put_cpu_var(fdtable_defer_list);
>>
>> where the execution can switch CPU after the schedule_work() API, then
>> LOGICALLY u definitely need the spin_lock(), and the per_cpu data is
>> really not necessary.
>>
>> But without the per_cpu structure, then the following "dedicated
>> chunk" can only execute on one processor, with the possibility of
>> switching to another processor after schedule_work():
>>
> Hum, you misunderstood the point.
>
> schedule_work(); wont switch your current CPU, since you are inside a
> spin_lock
> ()/spin_unlock() pair, so preemption is not possible.
>
>
>
>> So then we introduce the per_cpu structure - so that the "dedicated
>> chunk" can be executing on multiple processor ALL AT THE SAME TIME,
>> without interferring each other, as fddef are per-cpu (rightfully
>> owned only before schedule_work() is called, but after schedule_work()
>> is called, an arbitrary CPU will be executing this fddef).
>>
>> spin_lock() is necessary because of the possibility of CPU switch
>> (schedule_work()).
>>
>> and per_cpu is so that the same chunk of code can be executing at
>> multiple CPUs all at the same time.
>>
>> Now the key issue rises up - as I have just asked before u answered
>> my question:
>>
>> http://mail.nl.linux.org/kernelnewbies/2008-03/msg00236.html
>>
>> can schedule_work() sleep? (just like schedule(), whcih can sleep
>> right?)
>> schedule_work() is guaranteed to execute the work queue at least once,
>> and so this thread may or may not sleep. correct? Or wrong?
>>
>>
> schedule_work() cannot sleep. It only queues a work to be done later
> by a special thread.
>
> We need this because vfree() should not be called from softirq handler
> (rcu in this case), so we queue a (small) job.
>> Problem is when u sleep and never wake up, then the spin_lock become
>> permanently locked, and when later the same CPU (have to be the same
>> fddef CPU) is being reschedule to execute the get_cpu_var() again, it
>> will spin_lock() infinitely, resulting in 100% CPU utilization error.
>>
>> To prevent these types of error, spin_lock are always not to be used
>> with to wrap around functions that can sleep, and can only containing
>> short routines between lock and unlock.
>>
>> Is my analysis correct?
>>
>>
> Not exactly :) , but please continue to learn :)
>
Thank you everyone here for a very informative education. I will go
back and analyse in more detail. :-).
prev parent reply other threads:[~2008-03-19 21:05 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-12 16:17 per cpun+ spin locks coexistence? Peter Teoh
2008-03-14 17:54 ` Jeremy Fitzhardinge
2008-03-16 16:30 ` Peter Teoh
2008-03-16 20:20 ` Johannes Weiner
2008-03-17 17:06 ` Peter Teoh
2008-03-17 17:51 ` Johannes Weiner
2008-03-17 19:22 ` Eric Dumazet
2008-03-18 17:00 ` Peter Teoh
2008-03-18 17:34 ` Dipankar Sarma
2008-03-18 18:00 ` Eric Dumazet
2008-03-19 16:25 ` Peter Teoh [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47E13E60.9050706@dso.org.sg \
--to=htmldeveloper@gmail.com \
--cc=dada1@cosmosbay.com \
--cc=dipankar@in.ibm.com \
--cc=hannes@saeurebad.de \
--cc=htejun@gmail.com \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.