From: Steven Whitehouse
Date: Fri, 14 Sep 2007 16:12:18 +0100
Subject: [Cluster-devel] [PATCH] [GFS2] bz 276631 : GFS2: chmod hung - TRY 3
In-Reply-To: <1189780080.5632.18.camel@technetium.msp.redhat.com>
References: <1189780080.5632.18.camel@technetium.msp.redhat.com>
Message-ID: <1189782738.1068.54.camel@quoit>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

Now in the -nmw git tree. Thanks,

Steve.

On Fri, 2007-09-14 at 09:27 -0500, Bob Peterson wrote:
> This is a rewrite of the patch.  We decided it was a better approach
> to call separate wrapper functions than to work around the problem
> with a spin_lock.
> --
> The problem boiled down to a race between the gdlm_init_threads()
> function initializing thread1 and the new thread's setting of blist = 1.
> Essentially, "if (current == ls->thread1)" was checked by the thread
> before the thread creator set ls->thread1.
>
> Since thread1 is the only thread that is allowed to work on the
> blocking queue, and since neither thread thought it was thread1, no one
> was working on the queue.  So everything just sat.
>
> This patch uses separate wrapper functions to pass the role in at
> thread creation time, which fixes the race.  I've done more than 2000
> iterations of the loop that was recreating the failure and it seems
> to work.
>
> Dave Teigland brought up the question of whether we should do this
> another way, for example by checking for the task name "lock_dlm1"
> instead.  I'm open to opinions.
> --
> Signed-off-by: Bob Peterson
> --
> diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
> --- a/fs/gfs2/locking/dlm/thread.c	2007-09-13 17:33:58.000000000 -0500
> +++ b/fs/gfs2/locking/dlm/thread.c	2007-09-14 09:16:07.000000000 -0500
> @@ -268,20 +268,16 @@ static inline int check_drop(struct gdlm
>  	return 0;
>  }
>
> -static int gdlm_thread(void *data)
> +static int gdlm_thread(void *data, int blist)
>  {
>  	struct gdlm_ls *ls = (struct gdlm_ls *) data;
>  	struct gdlm_lock *lp = NULL;
> -	int blist = 0;
>  	uint8_t complete, blocking, submit, drop;
>  	DECLARE_WAITQUEUE(wait, current);
>
>  	/* Only thread1 is allowed to do blocking callbacks since gfs
>  	   may wait for a completion callback within a blocking cb. */
>
> -	if (current == ls->thread1)
> -		blist = 1;
> -
>  	while (!kthread_should_stop()) {
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		add_wait_queue(&ls->thread_wait, &wait);
> @@ -333,12 +329,22 @@ static int gdlm_thread(void *data)
>  	return 0;
>  }
>
> +static int gdlm_thread1(void *data)
> +{
> +	return gdlm_thread(data, 1);
> +}
> +
> +static int gdlm_thread2(void *data)
> +{
> +	return gdlm_thread(data, 0);
> +}
> +
>  int gdlm_init_threads(struct gdlm_ls *ls)
>  {
>  	struct task_struct *p;
>  	int error;
>
> -	p = kthread_run(gdlm_thread, ls, "lock_dlm1");
> +	p = kthread_run(gdlm_thread1, ls, "lock_dlm1");
>  	error = IS_ERR(p);
>  	if (error) {
>  		log_error("can't start lock_dlm1 thread %d", error);
> @@ -346,7 +352,7 @@ int gdlm_init_threads(struct gdlm_ls *ls
>  	}
>  	ls->thread1 = p;
>
> -	p = kthread_run(gdlm_thread, ls, "lock_dlm2");
> +	p = kthread_run(gdlm_thread2, ls, "lock_dlm2");
>  	error = IS_ERR(p);
>  	if (error) {
>  		log_error("can't start lock_dlm2 thread %d", error);
>
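
For reference, here is a minimal userspace sketch of the same
wrapper-function pattern.  It uses pthreads rather than the kernel
kthread API, and the names (struct ls, worker, worker1, worker2) are
illustrative only, not taken from the GFS2 sources.  The point is that
each thread's role is fixed by its entry function before the thread
body ever runs, so it never has to race against creator-owned state
such as ls->thread1:

/* Userspace sketch only -- pthreads, not the kernel kthread API. */
#include <pthread.h>
#include <stdio.h>

struct ls {
	pthread_t thread1;	/* creator-owned; may not be set yet when the thread runs */
};

static void *worker(struct ls *ls, int blist)
{
	(void)ls;		/* shared lockspace state would be used here */
	/* blist == 1 means this thread services the blocking queue */
	printf("worker running, blist=%d\n", blist);
	return NULL;
}

/* Entry wrappers fix each thread's role before its body runs. */
static void *worker1(void *data) { return worker(data, 1); }
static void *worker2(void *data) { return worker(data, 0); }

int main(void)
{
	struct ls ls;
	pthread_t t2;

	/* The role no longer depends on ls.thread1 being assigned first. */
	pthread_create(&ls.thread1, NULL, worker1, &ls);
	pthread_create(&t2, NULL, worker2, &ls);

	pthread_join(ls.thread1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Taking a lock around the "current == ls->thread1" check would also have
closed the window, but passing the flag at creation time removes the
ordering dependency entirely.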