From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Thu, 13 Sep 2007 23:04:43 -0500
Subject: [Cluster-devel] [PATCH] [GFS2] bz 276631 : GFS2: chmod hung - TRY 2
Message-ID: <1189742683.5632.13.camel@technetium.msp.redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Josef's right--my bad.  Here is the corrected patch for 276631.

The problem boiled down to a race between gdlm_init_threads() setting
ls->thread1 and the newly started thread checking that field to decide
whether to set blist = 1.  Essentially, "if (current == ls->thread1)"
was checked by the thread before the thread creator set ls->thread1.
Since thread1 is the only thread that is allowed to work on the
blocking queue, and since neither thread thought it was thread1,
no one was working on the queue.  So everything just sat.

This patch reuses the ls->async_lock spin_lock to fix the race.
I've done more than 2000 iterations of the loop that was recreating
the failure and it seems to work.

Dave Teigland brought up the question of whether we should do this
another way, for example by checking for the task name "lock_dlm1"
instead.  I'm open to opinions.
--
Signed-off-by: Bob Peterson
--
diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
--- a/fs/gfs2/locking/dlm/thread.c	2007-09-13 17:33:58.000000000 -0500
+++ b/fs/gfs2/locking/dlm/thread.c	2007-09-13 22:47:14.000000000 -0500
@@ -279,8 +279,10 @@ static int gdlm_thread(void *data)
 	/* Only thread1 is allowed to do blocking callbacks since gfs
 	   may wait for a completion callback within a blocking cb. */
 
+	spin_lock(&ls->async_lock);
 	if (current == ls->thread1)
 		blist = 1;
+	spin_unlock(&ls->async_lock);
 
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -338,10 +340,12 @@ int gdlm_init_threads(struct gdlm_ls *ls
 	struct task_struct *p;
 	int error;
 
+	spin_lock(&ls->async_lock);
 	p = kthread_run(gdlm_thread, ls, "lock_dlm1");
 	error = IS_ERR(p);
 	if (error) {
 		log_error("can't start lock_dlm1 thread %d", error);
+		spin_unlock(&ls->async_lock);
 		return error;
 	}
 	ls->thread1 = p;
@@ -351,9 +355,11 @@ int gdlm_init_threads(struct gdlm_ls *ls
 	if (error) {
 		log_error("can't start lock_dlm2 thread %d", error);
 		kthread_stop(ls->thread1);
+		spin_unlock(&ls->async_lock);
 		return error;
 	}
 	ls->thread2 = p;
+	spin_unlock(&ls->async_lock);
 
 	return 0;
 }
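
On the "another way" question: one further option, which is not part of
this patch and is only a sketch, is to give each thread its role when it
is created, so the thread never has to compare itself against a pointer
that the creator stores later.  The example below is a user-space
illustration with pthreads, not kernel code.  The names struct lockspace,
struct worker_arg, worker() and run_workers() are hypothetical stand-ins
for struct gdlm_ls, gdlm_thread() and gdlm_init_threads().  A side benefit
of this shape would be that no spinlock needs to be held around
kthread_run() or kthread_stop(), both of which can sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct gdlm_ls; only what the sketch needs. */
struct lockspace {
	pthread_mutex_t lock;
	int blocking_workers;	/* threads that claimed the blocking queue */
};

/* The role is decided by the creator before the thread exists. */
struct worker_arg {
	struct lockspace *ls;
	int blist;		/* 1 = allowed to work on the blocking queue */
};

static void *worker(void *data)
{
	struct worker_arg *arg = data;

	assert(arg != NULL);

	/* No "am I thread1?" check against a field set after creation. */
	if (arg->blist) {
		pthread_mutex_lock(&arg->ls->lock);
		arg->ls->blocking_workers++;
		pthread_mutex_unlock(&arg->ls->lock);
	}
	return NULL;
}

/*
 * Start two workers, wait for both, and return how many of them took
 * the blocking-queue role, or -1 if setup failed.
 */
static int run_workers(void)
{
	struct lockspace ls;
	struct worker_arg a1 = { &ls, 1 };
	struct worker_arg a2 = { &ls, 0 };
	pthread_t t1, t2;
	int count;

	ls.blocking_workers = 0;
	if (pthread_mutex_init(&ls.lock, NULL) != 0)
		return -1;

	if (pthread_create(&t1, NULL, worker, &a1) != 0) {
		pthread_mutex_destroy(&ls.lock);
		return -1;
	}
	if (pthread_create(&t2, NULL, worker, &a2) != 0) {
		pthread_join(t1, NULL);
		pthread_mutex_destroy(&ls.lock);
		return -1;
	}

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	count = ls.blocking_workers;
	pthread_mutex_destroy(&ls.lock);
	return count;
}
```

Calling run_workers() from a test program built with -pthread should
report exactly one blocking-queue worker however the two threads are
scheduled, because the role travels with the thread argument instead of
depending on when the creator stores the thread pointer.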