* [Cluster-devel] [PATCH 1/2] iomap: Add a page_prepare callback
From: Andreas Gruenbacher @ 2019-04-24 17:18 UTC
To: cluster-devel.redhat.com

Add a page_prepare callback that's called before a page is written to.  This
will be used by gfs2 to start a transaction in page_prepare and end it in
page_done.  Other filesystems that implement data journaling will require the
same kind of mechanism.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/iomap.c            | 4 ++++
 include/linux/iomap.h | 9 ++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7d..abd9aa76dbd1 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -684,6 +684,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
 		status = __iomap_write_begin(inode, pos, len, page, iomap);
+
+	if (likely(!status) && iomap->page_prepare)
+		status = iomap->page_prepare(inode, pos, len, page, iomap);
+
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fefb5455bda..0982f3e13e56 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -65,10 +65,13 @@ struct iomap {
 	void			*private; /* filesystem private */
 
 	/*
-	 * Called when finished processing a page in the mapping returned in
-	 * this iomap.  At least for now this is only supported in the buffered
-	 * write path.
+	 * Called before / after processing a page in the mapping returned in
+	 * this iomap.  At least for now, this is only supported in the
+	 * buffered write path.  When page_prepare returns 0 for a page,
+	 * page_done is called for that page as well.
 	 */
+	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
+			struct page *page, struct iomap *iomap);
 	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
 			struct page *page, struct iomap *iomap);
 };
-- 
2.20.1

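The pairing rule stated in the header comment (page_done runs for a page only when page_prepare returned 0 for it) can be illustrated with a small userspace sketch. Everything below is a simplified stand-in, not kernel code: the struct layouts, the reduced callback signatures, and the names demo_page_prepare, failing_page_prepare, sketch_write_begin, sketch_write_end and sketch_write_one_page are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct page { int unused; };
struct iomap;

struct iomap {
	/* Simplified signatures: the patch also passes inode, pos and len. */
	int (*page_prepare)(struct page *page, struct iomap *iomap);
	void (*page_done)(struct page *page, struct iomap *iomap);
	int open_transactions;	/* bookkeeping for this sketch only */
};

/* Models a filesystem starting a small per-page transaction. */
static int demo_page_prepare(struct page *page, struct iomap *iomap)
{
	(void)page;
	iomap->open_transactions++;
	return 0;
}

/* Models a page_prepare that fails before anything was started. */
static int failing_page_prepare(struct page *page, struct iomap *iomap)
{
	(void)page;
	(void)iomap;
	return -1;
}

/* Models the filesystem ending the per-page transaction. */
static void demo_page_done(struct page *page, struct iomap *iomap)
{
	(void)page;
	iomap->open_transactions--;
}

/* Mirrors the control flow the patch adds to iomap_write_begin(). */
static int sketch_write_begin(struct page *page, struct iomap *iomap, int status)
{
	if (!status && iomap->page_prepare)
		status = iomap->page_prepare(page, iomap);
	return status;	/* nonzero: the caller must not go on to write_end */
}

static void sketch_write_end(struct page *page, struct iomap *iomap)
{
	if (iomap->page_done)
		iomap->page_done(page, iomap);
}

/* Writes one page; returns the number of transactions left open. */
static int sketch_write_one_page(int begin_status, int prepare_fails)
{
	struct page page = { 0 };
	struct iomap iomap = {
		.page_prepare = prepare_fails ? failing_page_prepare
					      : demo_page_prepare,
		.page_done = demo_page_done,
		.open_transactions = 0,
	};

	if (sketch_write_begin(&page, &iomap, begin_status) == 0)
		sketch_write_end(&page, &iomap);
	return iomap.open_transactions;
}
```

In this model, sketch_write_one_page() returns 0 in all three cases (success, an earlier write_begin failure, a failing page_prepare), which is the balance the comment in struct iomap promises.
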
* [Cluster-devel] [PATCH 2/2] gfs2: Fix iomap write page reclaim deadlock
From: Andreas Gruenbacher @ 2019-04-24 17:18 UTC
To: cluster-devel.redhat.com

Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

  (1) Any allocations necessary for the write are done in iomap_begin, so when
  the data aren't journaled, there is no need for keeping the transaction open
  until iomap_end.

  (2) Transactions keep the gfs2 log flush lock held.  When
  iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
  gfs2_write_inode, which will try to flush the log.  This requires taking the
  log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.

Reported-by: Edwin Török <edvin.torok@citrix.com>
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/aops.c | 14 +++++--
 fs/gfs2/bmap.c | 99 ++++++++++++++++++++++++++++----------------------
 2 files changed, 65 insertions(+), 48 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f4b2b3..6210d4429d84 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
  */
 void adjust_fs_space(struct inode *inode)
 {
-	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
 	struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode);
 	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
@@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode)
 	struct buffer_head *m_bh, *l_bh;
 	u64 fs_total, new_free;
 
+	if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0)
+		return;
+
 	/* Total up the file system space, according to the latest rindex. */
 	fs_total = gfs2_ri_total(sdp);
 	if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0)
-		return;
+		goto out;
 
 	spin_lock(&sdp->sd_statfs_spin);
 	gfs2_statfs_change_in(m_sc, m_bh->b_data +
@@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode)
 	gfs2_statfs_change(sdp, new_free, new_free, 0);
 
 	if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0)
-		goto out;
+		goto out2;
 	update_statfs(sdp, m_bh, l_bh);
 	brelse(l_bh);
-out:
+out2:
 	brelse(m_bh);
+out:
+	sdp->sd_rindex_uptodate = 0;
+	gfs2_trans_end(sdp);
 }
 
 /**
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 5da4ca9041c0..34543a4d4e4a 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,13 +991,25 @@ static void gfs2_write_unlock(struct inode *inode)
 	gfs2_glock_dq_uninit(&ip->i_gh);
 }
 
-static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
-					   unsigned copied, struct page *page,
-					   struct iomap *iomap)
+static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
+				   unsigned len, struct page *page,
+				   struct iomap *iomap)
+{
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
+}
+
+static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
+				 unsigned copied, struct page *page,
+				 struct iomap *iomap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 
-	gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	if (!gfs2_is_stuffed(ip))
+		gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+	gfs2_trans_end(sdp);
 }
 
 static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
@@ -1052,32 +1064,48 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 	if (alloc_required)
 		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
 
-	ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
-	if (ret)
-		goto out_trans_fail;
+	if (unstuff || iomap->type == IOMAP_HOLE) {
+		struct gfs2_trans *tr;
 
-	if (unstuff) {
-		ret = gfs2_unstuff_dinode(ip, NULL);
+		ret = gfs2_trans_begin(sdp, rblocks,
+				       iomap->length >> inode->i_blkbits);
 		if (ret)
-			goto out_trans_end;
-		release_metapath(mp);
-		ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
-				     flags, iomap, mp);
-		if (ret)
-			goto out_trans_end;
-	}
+			goto out_trans_fail;
 
-	if (iomap->type == IOMAP_HOLE) {
-		ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
-		if (ret) {
-			gfs2_trans_end(sdp);
-			gfs2_inplace_release(ip);
-			punch_hole(ip, iomap->offset, iomap->length);
-			goto out_qunlock;
+		if (unstuff) {
+			ret = gfs2_unstuff_dinode(ip, NULL);
+			if (ret)
+				goto out_trans_end;
+			release_metapath(mp);
+			ret = gfs2_iomap_get(inode, iomap->offset,
+					     iomap->length, flags, iomap, mp);
+			if (ret)
+				goto out_trans_end;
+		}
+
+		if (iomap->type == IOMAP_HOLE) {
+			ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
+			if (ret) {
+				gfs2_trans_end(sdp);
+				gfs2_inplace_release(ip);
+				punch_hole(ip, iomap->offset, iomap->length);
+				goto out_qunlock;
+			}
 		}
+
+		tr = current->journal_info;
+		if (tr->tr_num_buf_new)
+			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+		else
+			gfs2_trans_add_meta(ip->i_gl, mp->mp_bh[0]);
+
+		gfs2_trans_end(sdp);
+	}
+
+	if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip)) {
+		iomap->page_prepare = gfs2_iomap_page_prepare;
+		iomap->page_done = gfs2_iomap_page_done;
 	}
-	if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
-		iomap->page_done = gfs2_iomap_journaled_page_done;
 	return 0;
 
 out_trans_end:
@@ -1116,10 +1144,6 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 		    iomap->type != IOMAP_MAPPED)
 			ret = -ENOTBLK;
 	}
-	if (!ret) {
-		get_bh(mp.mp_bh[0]);
-		iomap->private = mp.mp_bh[0];
-	}
 	release_metapath(&mp);
 	trace_gfs2_iomap_end(ip, iomap, ret);
 	return ret;
@@ -1130,27 +1154,16 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	struct gfs2_trans *tr = current->journal_info;
-	struct buffer_head *dibh = iomap->private;
 
 	if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
 		goto out;
 
-	if (iomap->type != IOMAP_INLINE) {
+	if (!gfs2_is_stuffed(ip))
 		gfs2_ordered_add_inode(ip);
 
-		if (tr->tr_num_buf_new)
-			__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-		else
-			gfs2_trans_add_meta(ip->i_gl, dibh);
-	}
-
-	if (inode == sdp->sd_rindex) {
+	if (inode == sdp->sd_rindex)
 		adjust_fs_space(inode);
-		sdp->sd_rindex_uptodate = 0;
-	}
 
-	gfs2_trans_end(sdp);
 	gfs2_inplace_release(ip);
 
 	if (length != written && (iomap->flags & IOMAP_F_NEW)) {
@@ -1170,8 +1183,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	gfs2_write_unlock(inode);
 
 out:
-	if (dibh)
-		brelse(dibh);
 	return 0;
 }
 
-- 
2.20.1

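Problem (2) in the commit message and the fix for it can be pictured with a toy model. This is only a sketch under strong simplifications: the log flush lock is reduced to a boolean, and toy_trans_begin, toy_trans_end, toy_balance_dirty_pages, write_range_old and write_range_new are hypothetical names that do not exist in gfs2. The model reports a failure at the point where the real code would block forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the log flush lock as a non-recursive flag. */
static bool log_flush_lock_held;

static void toy_trans_begin(void) { log_flush_lock_held = true; }
static void toy_trans_end(void)   { log_flush_lock_held = false; }

/*
 * Stand-in for balance_dirty_pages() ending up in a log flush.
 * Returns false where the real code would wait on a lock that the
 * same task already holds.
 */
static bool toy_balance_dirty_pages(void)
{
	return !log_flush_lock_held;
}

/* Old scheme: one transaction spans the whole range of pages. */
static bool write_range_old(int npages)
{
	bool ok = true;

	toy_trans_begin();			/* iomap_begin */
	for (int i = 0; i < npages; i++)
		ok = toy_balance_dirty_pages() && ok;
	toy_trans_end();			/* iomap_end */
	return ok;
}

/* New scheme: a small transaction per page, closed before balancing. */
static bool write_range_new(int npages)
{
	bool ok = true;

	for (int i = 0; i < npages; i++) {
		toy_trans_begin();		/* page_prepare */
		/* the data copy into the page would happen here */
		toy_trans_end();		/* page_done */
		ok = toy_balance_dirty_pages() && ok;
	}
	return ok;
}
```

With three pages, write_range_old() fails in this model while write_range_new() succeeds, because no transaction is open when the dirty-page balancing step runs.
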
* [Cluster-devel] [PATCH 1/2] iomap: Add a page_prepare callback
From: Christoph Hellwig @ 2019-04-25  7:59 UTC
To: cluster-devel.redhat.com

On Wed, Apr 24, 2019 at 07:18:03PM +0200, Andreas Gruenbacher wrote:
> Add a page_prepare callback that's called before a page is written to.  This
> will be used by gfs2 to start a transaction in page_prepare and end it in
> page_done.  Other filesystems that implement data journaling will require the
> same kind of mechanism.

This looks basically fine to me.  But I think it would be nicer to
add an iomap_page_ops structure so that we don't have to add more pointers
directly to the iomap.  We can also make that struct pointer const to
avoid runtime overwriting attacks.

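One possible shape for the suggested iomap_page_ops structure is sketched below in userspace C. Only the structure name and the two callback names come from the thread; the reduced signatures, the page_ops field name, the in_flight counter and the helpers fs_page_prepare, fs_page_done, call_page_prepare, call_page_done and page_ops_demo are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct iomap;

/* Hypothetical layout; callback names follow the patch. */
struct iomap_page_ops {
	int (*page_prepare)(struct iomap *iomap);
	void (*page_done)(struct iomap *iomap);
};

struct iomap {
	/* One const pointer instead of one function pointer per hook. */
	const struct iomap_page_ops *page_ops;	/* NULL: no hooks needed */
	int in_flight;				/* sketch bookkeeping only */
};

static int fs_page_prepare(struct iomap *iomap)
{
	iomap->in_flight++;
	return 0;
}

static void fs_page_done(struct iomap *iomap)
{
	iomap->in_flight--;
}

/* A const table can be placed in read-only data by the toolchain. */
static const struct iomap_page_ops fs_page_ops = {
	.page_prepare = fs_page_prepare,
	.page_done = fs_page_done,
};

static int call_page_prepare(struct iomap *iomap)
{
	const struct iomap_page_ops *ops = iomap->page_ops;

	if (ops && ops->page_prepare)
		return ops->page_prepare(iomap);
	return 0;
}

static void call_page_done(struct iomap *iomap)
{
	const struct iomap_page_ops *ops = iomap->page_ops;

	if (ops && ops->page_done)
		ops->page_done(iomap);
}

/* Returns 0 when both the hooked and the plain mapping behave as expected. */
static int page_ops_demo(void)
{
	struct iomap hooked = { .page_ops = &fs_page_ops, .in_flight = 0 };
	struct iomap plain = { .page_ops = NULL, .in_flight = 0 };
	int peak;

	if (call_page_prepare(&hooked) || call_page_prepare(&plain))
		return -1;
	peak = hooked.in_flight;	/* 1 while the hooked page is in flight */
	call_page_done(&hooked);
	call_page_done(&plain);
	return (peak == 1 && hooked.in_flight == 0 && plain.in_flight == 0) ? 0 : -1;
}
```

The design point is that struct iomap grows by a single pointer however many hooks are added later, and that the function pointers themselves sit in a constant table rather than in a writable per-mapping structure.
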
* [Cluster-devel] [PATCH 1/2] iomap: Add a page_prepare callback
From: Jan Kara @ 2019-04-25  8:32 UTC
To: cluster-devel.redhat.com

On Wed 24-04-19 19:18:03, Andreas Gruenbacher wrote:
> Add a page_prepare callback that's called before a page is written to.  This
> will be used by gfs2 to start a transaction in page_prepare and end it in
> page_done.  Other filesystems that implement data journaling will require the
> same kind of mechanism.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Thanks for the patch. Some comments below.

> diff --git a/fs/iomap.c b/fs/iomap.c
> index 97cb9d486a7d..abd9aa76dbd1 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -684,6 +684,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, page, iomap);
> +
> +	if (likely(!status) && iomap->page_prepare)
> +		status = iomap->page_prepare(inode, pos, len, page, iomap);
> +
>  	if (unlikely(status)) {
>  		unlock_page(page);
>  		put_page(page);

So this gets called after a page is locked. Is it OK for GFS2 to acquire
sd_log_flush_lock under page lock? Because e.g. gfs2_write_jdata_pagevec()
seems to acquire these locks the other way around so that could cause ABBA
deadlocks?

Also just looking at the code I was wondering about the following. E.g. in
iomap_write_end() we have code like:

	if (iomap->type == IOMAP_INLINE) {
		foo
	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
		bar
	} else {
		baz
	}

	if (iomap->page_done)
		iomap->page_done(...);

And now something very similar is in iomap_write_begin(). So won't it be
more natural to just mandate ->page_prepare() and ->page_done() callbacks
and each filesystem would set it to a helper function it needs? Probably we
could get rid of IOMAP_F_BUFFER_HEAD flag that way...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

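The ABBA concern raised above can be made concrete with a tiny lock-order recorder. This is a userspace illustration only: the enum values, record_order and abba_demo are hypothetical names, and the second acquisition order is taken from the description in the message, not verified against the gfs2 source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum toy_lock { PAGE_LOCK, LOG_FLUSH_LOCK, NR_LOCKS };

/* taken_after[a][b] is true once b was acquired while a was held. */
static bool taken_after[NR_LOCKS][NR_LOCKS];

/* Records "outer, then inner"; returns false if the reverse order was seen. */
static bool record_order(enum toy_lock outer, enum toy_lock inner)
{
	taken_after[outer][inner] = true;
	return !taken_after[inner][outer];
}

/* Returns true when the second path exposes a lock-order inversion. */
static bool abba_demo(void)
{
	bool first, second;

	memset(taken_after, 0, sizeof(taken_after));

	/* Path 1: page_prepare starts a transaction under the page lock. */
	first = record_order(PAGE_LOCK, LOG_FLUSH_LOCK);

	/* Path 2: the opposite order, as described for the jdata writeback. */
	second = record_order(LOG_FLUSH_LOCK, PAGE_LOCK);

	return first && !second;
}
```

Two tasks running the two paths concurrently could each hold one lock while waiting for the other, which is the deadlock pattern the question is about.
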
* [Cluster-devel] [PATCH 1/2] iomap: Add a page_prepare callback
From: Christoph Hellwig @ 2019-04-25 15:03 UTC
To: cluster-devel.redhat.com

On Thu, Apr 25, 2019 at 10:32:52AM +0200, Jan Kara wrote:
> Also just looking at the code I was wondering about the following. E.g. in
> iomap_write_end() we have code like:
> 
> 	if (iomap->type == IOMAP_INLINE) {
> 		foo
> 	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> 		bar
> 	} else {
> 		baz
> 	}
> 
> 	if (iomap->page_done)
> 		iomap->page_done(...);
> 
> And now something very similar is in iomap_write_begin(). So won't it be
> more natural to just mandate ->page_prepare() and ->page_done() callbacks
> and each filesystem would set it to a helper function it needs? Probably we
> could get rid of IOMAP_F_BUFFER_HEAD flag that way...

I don't want pointless indirect calls for the default, non-buffer head
case.  Also inline really is a special case independent of what the
caller could pass in as flags or callbacks.  We could try to hide the
buffer_head stuff in there, but then again I'd rather kill that off
sooner than later.

* [Cluster-devel] [PATCH 1/2] iomap: Add a page_prepare callback
From: Andreas Gruenbacher @ 2019-04-25 15:26 UTC
To: cluster-devel.redhat.com

On Thu, 25 Apr 2019 at 10:32, Jan Kara <jack@suse.cz> wrote:
> On Wed 24-04-19 19:18:03, Andreas Gruenbacher wrote:
> > Add a page_prepare callback that's called before a page is written to.  This
> > will be used by gfs2 to start a transaction in page_prepare and end it in
> > page_done.  Other filesystems that implement data journaling will require the
> > same kind of mechanism.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
>
> Thanks for the patch. Some comments below.
>
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index 97cb9d486a7d..abd9aa76dbd1 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -684,6 +684,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> >  	else
> >  		status = __iomap_write_begin(inode, pos, len, page, iomap);
> > +
> > +	if (likely(!status) && iomap->page_prepare)
> > +		status = iomap->page_prepare(inode, pos, len, page, iomap);
> > +
> >  	if (unlikely(status)) {
> >  		unlock_page(page);
> >  		put_page(page);
>
> So this gets called after a page is locked. Is it OK for GFS2 to acquire
> sd_log_flush_lock under page lock? Because e.g. gfs2_write_jdata_pagevec()
> seems to acquire these locks the other way around so that could cause ABBA
> deadlocks?

Good catch, the callback indeed needs to happen earlier.

Thanks,
Andreas

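What "earlier" could look like is sketched below as a userspace event trace: page_prepare runs before the page is locked, so the transaction is opened first and the page lock is taken under it. This is an assumption about the intended direction, not the follow-up patch itself. All function names are hypothetical, and the position of page_done relative to the unlock is a choice made for the sketch only.

```c
#include <assert.h>
#include <string.h>

/* Event trace for the sketch; long enough for the four events below. */
static char trace[64];

static void note(const char *event)
{
	strncat(trace, event, sizeof(trace) - strlen(trace) - 1);
}

static int toy_page_prepare(void) { note("prepare;"); return 0; }
static void toy_lock_page(void)   { note("lock;"); }
static void toy_unlock_page(void) { note("unlock;"); }
static void toy_page_done(void)   { note("done;"); }

/* Runs one page write with the callback moved ahead of the page lock. */
static const char *reordered_write(void)
{
	trace[0] = '\0';

	if (toy_page_prepare())		/* transaction starts first */
		return trace;		/* nothing locked, nothing to undo */
	toy_lock_page();
	/* the data copy into the page would happen here */
	toy_unlock_page();
	toy_page_done();		/* transaction ends last (assumed) */
	return trace;
}
```

With this ordering the page lock nests inside the transaction, which matches the order Jan describes for gfs2_write_jdata_pagevec() instead of inverting it.
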
Thread overview: 6+ messages (newest: 2019-04-25 15:26 UTC)
2019-04-24 17:18 [Cluster-devel] [PATCH 1/2] iomap: Add a page_prepare callback Andreas Gruenbacher
2019-04-24 17:18 ` [Cluster-devel] [PATCH 2/2] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
2019-04-25  7:59 ` [Cluster-devel] [PATCH 1/2] iomap: Add a page_prepare callback Christoph Hellwig
2019-04-25  8:32 ` Jan Kara
2019-04-25 15:03   ` Christoph Hellwig
2019-04-25 15:26   ` Andreas Gruenbacher