From mboxrd@z Thu Jan  1 00:00:00 1970
From: bmarzins@sourceware.org
Date: 24 Jan 2008 20:25:10 -0000
Subject: [Cluster-devel] cluster/gfs-kernel/src/gfs glock.c
Message-ID: <20080124202510.23956.qmail@sourceware.org>
List-Id: 
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL5
Changes by:	bmarzins at sourceware.org	2008-01-24 20:25:10

Modified files:
	gfs-kernel/src/gfs: glock.c

Log message:
	Fix for bz #426291. gfs_glock_dq was traversing the gl_holders list
	without holding the gl_spin spinlock. This caused a problem when the
	list item it was currently looking at got removed from the list. The
	solution is to not traverse the list, because the traversal is
	unnecessary.

	Unfortunately, there is also a bug in this section of code: you can't
	guarantee that you will not cache a glock held with GL_NOCACHE. Fixing
	this issue requires significantly more work.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/glock.c.diff?cvsroot=cluster&only_with_tag=RHEL5&r1=1.29.2.5&r2=1.29.2.6

--- cluster/gfs-kernel/src/gfs/glock.c	2007/06/26 19:42:31	1.29.2.5
+++ cluster/gfs-kernel/src/gfs/glock.c	2008/01/24 20:25:10	1.29.2.6
@@ -1618,8 +1618,6 @@
 	struct gfs_sbd *sdp = gl->gl_sbd;
 	struct gfs_glock_operations *glops = gl->gl_ops;
 	struct list_head *pos;
-	struct gfs_holder *tmp_gh = NULL;
-	int count = 0;
 
 	atomic_inc(&gl->gl_sbd->sd_glock_dq_calls);
 
@@ -1630,14 +1628,13 @@
 		set_bit(GLF_SYNC, &gl->gl_flags);
 
 	/* Don't cache glock; request demote to unlock at inter-node scope */
-	if (gh->gh_flags & GL_NOCACHE) {
-		list_for_each(pos, &gl->gl_holders) {
-			tmp_gh = list_entry(pos, struct gfs_holder, gh_list);
-			++count;
-		}
-		if (tmp_gh == gh && count == 1)
-			handle_callback(gl, LM_ST_UNLOCKED);
-	}
+	if (gh->gh_flags & GL_NOCACHE && gl->gl_holders.next == &gh->gh_list &&
+	    gl->gl_holders.prev == &gh->gh_list)
+		/* There's a race here.  If there are two holders, and both
+		 * are dq'ed at almost the same time, you can't guarantee that
+		 * you will call handle_callback. Fixing this will require
+		 * some refactoring */
+		handle_callback(gl, LM_ST_UNLOCKED);
 
 	lock_on_glock(gl);
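
Illustration (not part of the commit): the new condition relies on a
property of circular doubly linked lists. If both head->next and head->prev
point at the same entry, that entry is the only one on the list, so no
walk-and-count is needed. Below is a minimal userspace sketch of that
property. The demo_* names are hypothetical stand-ins written for this
example, not the kernel's list.h API, and the sketch does nothing about the
race described in the comment in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void demo_list_init(struct demo_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
static void demo_list_add_tail(struct demo_list_head *entry,
			       struct demo_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry and leave it pointing at itself. */
static void demo_list_del(struct demo_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry;
	entry->prev = entry;
}

/*
 * True only when entry is the single element on the list at head.
 * Two pointer comparisons replace the traversal: the first and the
 * last element can both be entry only if the list has length one.
 */
static bool demo_is_sole_entry(const struct demo_list_head *head,
			       const struct demo_list_head *entry)
{
	assert(entry != head);
	return head->next == entry && head->prev == entry;
}
```

With one entry queued, demo_is_sole_entry() returns true for it; once a
second entry is added it returns false for both, and it becomes true again
for the survivor after the other is removed. Like the check in the patch,
the answer is only meaningful if nothing else modifies the list between the
comparison and whatever acts on it.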