All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fan Du <fan.du@windriver.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <matthew@wil.cx>, <linux-fsdevel@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs: Disable preempt when acquire i_size_seqcount write lock
Date: Fri, 11 Jan 2013 11:25:08 +0800	[thread overview]
Message-ID: <50EF8614.3050408@windriver.com> (raw)
In-Reply-To: <20130110143813.1ba2b4fd.akpm@linux-foundation.org>



On 2013年01月11日 06:38, Andrew Morton wrote:
> On Wed, 9 Jan 2013 11:34:19 +0800
> Fan Du<fan.du@windriver.com>  wrote:
>
>> Two rt tasks bind to one CPU core.
>>
>> The higher priority rt task A preempts a lower priority rt task B which
>> has already taken the write seq lock, and then the higher priority
>> rt task A try to acquire read seq lock, it's doomed to lockup.
>>
>> rt task A with lower priority: call write
>> i_size_write                                        rt task B with higher priority: call sync, and preempt task A
>>    write_seqcount_begin(&inode->i_size_seqcount);    i_size_read
>>    inode->i_size = i_size;                             read_seqcount_begin<-- lockup here...
>>
>
> Ouch.
>
> And even if the preemping task is preemptible, it will spend an entire
> timeslice pointlessly spinning, which isn't very good.
>
>> So disable preempt when acquiring every i_size_seqcount *write* lock will
>> cure the problem.
>>
>> ...
>>
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -758,9 +758,11 @@ static inline loff_t i_size_read(const struct inode *inode)
>>   static inline void i_size_write(struct inode *inode, loff_t i_size)
>>   {
>>   #if BITS_PER_LONG==32&&  defined(CONFIG_SMP)
>> +	preempt_disable();
>>   	write_seqcount_begin(&inode->i_size_seqcount);
>>   	inode->i_size = i_size;
>>   	write_seqcount_end(&inode->i_size_seqcount);
>> +	preempt_enable();
>>   #elif BITS_PER_LONG==32&&  defined(CONFIG_PREEMPT)
>>   	preempt_disable();
>>   	inode->i_size = i_size;
>
> afacit all write_seqcount_begin()/read_seqretry() sites are vulnerable
> to this problem.  Would it not be better to do the preempt_disable() in
> write_seqcount_begin()?

IMHO, write_seqcount_begin/write_seqcount_end are often wrapped by mutex,
this gives higher priority task a chance to sleep, and then lower priority task
get cpu to unlock, so avoid the problematic scenario this patch describing.

But in i_size_write case, I could only find disable preempt a good choice before
someone else has better idea :)

>
> Possible problems:
>
> - mm/filemap_xip.c does disk I/O under write_seqcount_begin().
>
> - dev_change_name() does GFP_KERNEL allocations under write_seqcount_begin()
>
> - I didn't review u64_stats_update_begin() callers.
>
> But I think calling schedule() under preempt_disable() is OK anyway?
>

-- 
浮沉随浪只记今朝笑

--fan

      reply	other threads:[~2013-01-11  3:25 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-09  3:34 [PATCH] fs: Disable preempt when acquire i_size_seqcount write lock Fan Du
2013-01-10 22:38 ` Andrew Morton
2013-01-11  3:25   ` Fan Du [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50EF8614.3050408@windriver.com \
    --to=fan.du@windriver.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matthew@wil.cx \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.