* [Cluster-devel] [GFS2 PATCH 1/2] gfs2: Fix occasional glock use-after-free
@ 2019-04-05 17:54 Andreas Gruenbacher
2019-04-05 17:54 ` [Cluster-devel] [GFS2 PATCH 2/2] gfs2: Replace gl_revokes with a GLF flag Andreas Gruenbacher
0 siblings, 1 reply; 2+ messages in thread
From: Andreas Gruenbacher @ 2019-04-05 17:54 UTC (permalink / raw)
To: cluster-devel.redhat.com
This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list. Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.
So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke. At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node. This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.
Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list. The
evict decremented the glock reference count to zero, which allowed the
glock to be freed. After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.
This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
fs/gfs2/glock.c | 1 +
fs/gfs2/log.c | 3 ++-
fs/gfs2/lops.c | 6 ++++--
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e4f6d39500bcc..71c28ff98b564 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -140,6 +140,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
{
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+ BUG_ON(atomic_read(&gl->gl_revokes));
rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
smp_mb();
wake_up_glock(gl);
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index ebbc68dca1459..7ba94b66c91b2 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -606,7 +606,8 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
gfs2_remove_from_ail(bd); /* drops ref on bh */
bd->bd_bh = NULL;
sdp->sd_log_num_revoke++;
- atomic_inc(&gl->gl_revokes);
+ if (atomic_inc_return(&gl->gl_revokes) == 1)
+ gfs2_glock_hold(gl);
set_bit(GLF_LFLUSH, &gl->gl_flags);
list_add(&bd->bd_list, &sdp->sd_log_le_revoke);
}
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index aef21b6a608f6..b4f0a6a3ba59d 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -669,8 +669,10 @@ static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
bd = list_entry(head->next, struct gfs2_bufdata, bd_list);
list_del_init(&bd->bd_list);
gl = bd->bd_gl;
- atomic_dec(&gl->gl_revokes);
- clear_bit(GLF_LFLUSH, &gl->gl_flags);
+ if (atomic_dec_return(&gl->gl_revokes) == 0) {
+ clear_bit(GLF_LFLUSH, &gl->gl_flags);
+ gfs2_glock_queue_put(gl);
+ }
kmem_cache_free(gfs2_bufdata_cachep, bd);
}
}
--
2.20.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
* [Cluster-devel] [GFS2 PATCH 2/2] gfs2: Replace gl_revokes with a GLF flag
2019-04-05 17:54 [Cluster-devel] [GFS2 PATCH 1/2] gfs2: Fix occasional glock use-after-free Andreas Gruenbacher
@ 2019-04-05 17:54 ` Andreas Gruenbacher
0 siblings, 0 replies; 2+ messages in thread
From: Andreas Gruenbacher @ 2019-04-05 17:54 UTC (permalink / raw)
To: cluster-devel.redhat.com
From: Bob Peterson <rpeterso@redhat.com>
The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log
flushes. However, gl_revokes is only ever tested for being zero, and it's
only decremented in revoke_lo_after_commit, which removes all revokes
from the list, so we know that the gl_revoke values of all the glocks on
the list will reach zero. Therefore, we can replace gl_revokes with a
bit flag. This saves an atomic counter in struct gfs2_glock.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/gfs2/glock.c | 4 ++--
fs/gfs2/incore.h | 2 +-
fs/gfs2/log.c | 4 +++-
fs/gfs2/lops.c | 33 ++++++++++++++++++++++++---------
fs/gfs2/main.c | 1 -
fs/gfs2/super.c | 2 +-
6 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 71c28ff98b564..15c605cfcfc85 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -140,7 +140,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
{
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
- BUG_ON(atomic_read(&gl->gl_revokes));
+ BUG_ON(test_bit(GLF_REVOKES, &gl->gl_flags));
rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
smp_mb();
wake_up_glock(gl);
@@ -1801,7 +1801,7 @@ void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl)
state2str(gl->gl_target),
state2str(gl->gl_demote_state), dtime,
atomic_read(&gl->gl_ail_count),
- atomic_read(&gl->gl_revokes),
+ test_bit(GLF_REVOKES, &gl->gl_flags) ? 1 : 0,
(int)gl->gl_lockref.count, gl->gl_hold_time);
list_for_each_entry(gh, &gl->gl_holders, gh_list)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 86840a70ee1ae..6a94b094a9043 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -345,6 +345,7 @@ enum {
GLF_OBJECT = 14, /* Used only for tracing */
GLF_BLOCKING = 15,
GLF_INODE_CREATING = 16, /* Inode creation occurring */
+ GLF_REVOKES = 17, /* Glock has revokes in queue */
};
struct gfs2_glock {
@@ -374,7 +375,6 @@ struct gfs2_glock {
struct list_head gl_lru;
struct list_head gl_ail_list;
atomic_t gl_ail_count;
- atomic_t gl_revokes;
struct delayed_work gl_work;
union {
/* For inode and iopen glocks only */
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 7ba94b66c91b2..d55315a46ecee 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -606,8 +606,10 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
gfs2_remove_from_ail(bd); /* drops ref on bh */
bd->bd_bh = NULL;
sdp->sd_log_num_revoke++;
- if (atomic_inc_return(&gl->gl_revokes) == 1)
+ if (!test_bit(GLF_REVOKES, &gl->gl_flags)) {
+ set_bit(GLF_REVOKES, &gl->gl_flags);
gfs2_glock_hold(gl);
+ }
set_bit(GLF_LFLUSH, &gl->gl_flags);
list_add(&bd->bd_list, &sdp->sd_log_le_revoke);
}
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index b4f0a6a3ba59d..2fd61853ba639 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -662,19 +662,34 @@ static void revoke_lo_before_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
{
struct list_head *head = &sdp->sd_log_le_revoke;
- struct gfs2_bufdata *bd;
- struct gfs2_glock *gl;
+ struct gfs2_bufdata *bd, *tmp;
- while (!list_empty(head)) {
- bd = list_entry(head->next, struct gfs2_bufdata, bd_list);
- list_del_init(&bd->bd_list);
- gl = bd->bd_gl;
- if (atomic_dec_return(&gl->gl_revokes) == 0) {
- clear_bit(GLF_LFLUSH, &gl->gl_flags);
- gfs2_glock_queue_put(gl);
+ /*
+ * Glocks can be referenced repeatedly on the revoke list, but the list
+ * only holds one reference. All glocks on the list will have the
+ * GLF_REVOKES flag set initially.
+ */
+
+ list_for_each_entry_safe(bd, tmp, head, bd_list) {
+ struct gfs2_glock *gl = bd->bd_gl;
+
+ if (test_bit(GLF_REVOKES, &gl->gl_flags)) {
+ /* Keep each glock on the list exactly once. */
+ clear_bit(GLF_REVOKES, &gl->gl_flags);
+ continue;
}
+ list_del(&bd->bd_list);
+ kmem_cache_free(gfs2_bufdata_cachep, bd);
+ }
+ list_for_each_entry_safe(bd, tmp, head, bd_list) {
+ struct gfs2_glock *gl = bd->bd_gl;
+
+ list_del(&bd->bd_list);
kmem_cache_free(gfs2_bufdata_cachep, bd);
+ clear_bit(GLF_LFLUSH, &gl->gl_flags);
+ gfs2_glock_queue_put(gl);
}
+ /* the list is empty now */
}
static void revoke_lo_before_scan(struct gfs2_jdesc *jd,
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 136484ef35d37..c700738de1f72 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -59,7 +59,6 @@ static void gfs2_init_glock_once(void *foo)
INIT_LIST_HEAD(&gl->gl_lru);
INIT_LIST_HEAD(&gl->gl_ail_list);
atomic_set(&gl->gl_ail_count, 0);
- atomic_set(&gl->gl_revokes, 0);
}
static void gfs2_init_gl_aspace_once(void *foo)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ca71163ff7cfd..e20fa55715e21 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1474,7 +1474,7 @@ static void gfs2_final_release_pages(struct gfs2_inode *ip)
truncate_inode_pages(gfs2_glock2aspace(ip->i_gl), 0);
truncate_inode_pages(&inode->i_data, 0);
- if (atomic_read(&gl->gl_revokes) == 0) {
+ if (!test_bit(GLF_REVOKES, &gl->gl_flags)) {
clear_bit(GLF_LFLUSH, &gl->gl_flags);
clear_bit(GLF_DIRTY, &gl->gl_flags);
}
--
2.20.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2019-04-05 17:54 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-04-05 17:54 [Cluster-devel] [GFS2 PATCH 1/2] gfs2: Fix occasional glock use-after-free Andreas Gruenbacher
2019-04-05 17:54 ` [Cluster-devel] [GFS2 PATCH 2/2] gfs2: Replace gl_revokes with a GLF flag Andreas Gruenbacher
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).