linux-mtd.lists.infradead.org archive mirror
From: Ming Liu <liu.ming50@gmail.com>
To: linux-mtd@lists.infradead.org
Cc: deng.chao1@zte.com.cn, thomas.betker@freenet.de
Subject: [JFFS2] Revision "jffs2: Fix lock acquisition order bug in jffs2_write_begin" introduces another dead lock.
Date: Fri, 23 Aug 2013 17:04:01 +0800
Message-ID: <52172581.6060101@gmail.com>

Hi, all:

I've been working with the 2.6.34 stable kernel and recently encountered
an AB-BA deadlock in jffs2. The scenario is:

Run two scripts at the same time:

Script 1:

#!/bin/bash
while [ 1 ]
do
    cp /mnt/mtd-folder/region_a/xxx.tar.gz /mnt/mtd-folder/region_b
    usleep 10
done

Script 2:

#!/bin/bash
while [ 1 ]
do
    tar -zxvf /mnt/mtd-folder/region_b/.tar.gz -C /dev/shm
done

Within several hours, the processes "cp", "tar" and "jffs2_gcd_mtd" all
end up in "D" state. After some investigation, I found that the problem
was introduced by the commit "jffs2: Fix lock acquisition order bug in
jffs2_write_begin", which tried to fix the following AB-BA deadlock:

jffs2_garbage_collect_live
    mutex_lock(&f->sem)                         (A)
    jffs2_garbage_collect_dnode
        jffs2_gc_fetch_page
            read_cache_page_async
                do_read_cache_page
                    lock_page(page)             (B)

jffs2_write_begin
    grab_cache_page_write_begin
        find_lock_page
            lock_page(page)                     (B)
    mutex_lock(&f->sem)                         (A)



However, do_generic_file_read() first acquires the page lock and then
f->sem. This causes another AB-BA deadlock with jffs2_write_begin(),
which now first acquires f->sem and then the page lock:

jffs2_write_begin
    mutex_lock(&f->sem)                         (A)
    grab_cache_page_write_begin
        find_lock_page
            lock_page(page)                     (B)

do_generic_file_read
    lock_page_killable(page)                    (B)
    jffs2_readpage
        mutex_lock(&f->sem)                     (A)
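
To make the inversion explicit, here is a minimal userspace sketch in C (not kernel code). It reduces each call path above to the order in which it takes the two locks, and checks whether two paths take a pair of locks in opposite order. The enum labels, array names and helper functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two locks named in the traces above. */
enum lock_id { LOCK_F_SEM, LOCK_PAGE };

/* Index of lock `id` in an acquisition sequence, or -1 if it is not taken. */
static int lock_pos(const enum lock_id *path, size_t n, enum lock_id id)
{
    for (size_t i = 0; i < n; i++)
        if (path[i] == id)
            return (int)i;
    return -1;
}

/* Returns 1 if p1 and p2 take some pair of locks in opposite order. */
static int has_abba(const enum lock_id *p1, size_t n1,
                    const enum lock_id *p2, size_t n2)
{
    for (size_t i = 0; i < n1; i++) {
        for (size_t j = i + 1; j < n1; j++) {
            int a = lock_pos(p2, n2, p1[i]);
            int b = lock_pos(p2, n2, p1[j]);
            if (a >= 0 && b >= 0 && a > b)
                return 1;
        }
    }
    return 0;
}

/* Lock order of each path as described in this mail (after the commit). */
static const enum lock_id write_begin_path[] = { LOCK_F_SEM, LOCK_PAGE };
static const enum lock_id read_path[]        = { LOCK_PAGE, LOCK_F_SEM };
static const enum lock_id gc_path[]          = { LOCK_F_SEM, LOCK_PAGE };

/* Small self-check: write vs. read is inverted, write vs. GC is not. */
static void check_jffs2_paths(void)
{
    assert(has_abba(write_begin_path, 2, read_path, 2) == 1);
    assert(has_abba(write_begin_path, 2, gc_path, 2) == 0);
}
```

Calling check_jffs2_paths() passes silently: the commit makes jffs2_write_begin consistent with the garbage collector, but inconsistent with the read path.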


I also noticed another thread discussing a similar deadlock related to
the same commit, titled "[JFFS2]The patch "jffs2: Fix lock acquisition
order bug in jffs2_write_begin" introduces another dead lock bug.",
posted by Deng Chao. Deng proposed using a function
"read_cache_page_async_trylock" instead of "read_cache_page_async".
Has anybody implemented that?
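
As I understand the trylock idea, the path that already holds f->sem would only try to take the page lock and back off if it is busy, instead of sleeping on it. Below is a rough userspace analogy using pthreads. It is not kernel code and not Deng's actual proposal; the two mutexes merely stand in for f->sem and the page lock, and gc_fetch_page_trylock() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-ins for f->sem (A) and the page lock (B); assumptions for this sketch. */
static pthread_mutex_t f_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Take A, then only *try* to take B. If B is busy, give up instead of
 * blocking, so this path can never complete an AB-BA cycle.
 * Returns 0 on success, -EAGAIN if the page was busy.
 */
static int gc_fetch_page_trylock(void)
{
    int ret = 0;

    pthread_mutex_lock(&f_sem);
    if (pthread_mutex_trylock(&page_lock) != 0) {
        ret = -EAGAIN;                  /* page busy: caller retries later */
    } else {
        /* ... work on the page would go here ... */
        pthread_mutex_unlock(&page_lock);
    }
    pthread_mutex_unlock(&f_sem);
    return ret;
}

/* Usage: succeeds when the page is free, backs off when it is held. */
static void trylock_demo(void)
{
    assert(gc_fetch_page_trylock() == 0);

    pthread_mutex_lock(&page_lock);     /* simulate another holder of B */
    assert(gc_fetch_page_trylock() == -EAGAIN);
    pthread_mutex_unlock(&page_lock);

    assert(gc_fetch_page_trylock() == 0);
}
```

The cost of this approach is that the caller has to cope with -EAGAIN, for example by skipping the page and retrying later.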

Thank you, and best regards.
