* [Cluster-devel] [GFS2 PATCH] GFS2: Take inode off order_write list when setting jdata flag
From: Bob Peterson @ 2017-10-06 13:39 UTC
To: cluster-devel.redhat.com

Hi,

This patch fixes a deadlock that occurs when the jdata flag is set on
an inode that is already on the ordered write list. Because the inode
is on the ordered write list, log_flush calls gfs2_ordered_write, which
calls filemap_fdatawrite. But since the inode now has the jdata flag
set, that calls gfs2_jdata_writepages, which tries to start a new
transaction. The new transaction cannot be started because it needs
the log_flush rwsem, which is already held by the log flush operation.

The bottom line is: we cannot switch an inode from ordered to jdata
until we have eliminated any ordered data pages (via a log flush);
otherwise any later log_flush operation will create the circular
dependency described above. So we need to flush the log before setting
the diskflags to switch the file mode, and then remove the inode from
the ordered write list.

Before this patch, the log flush was done for the jdata->ordered
transition, but that's wrong. If we're going from jdata to ordered, we
don't need to call gfs2_log_flush because the call to
filemap_fdatawrite will do it for us:

  filemap_fdatawrite() -> __filemap_fdatawrite_range()
  __filemap_fdatawrite_range() -> do_writepages()
  do_writepages() -> gfs2_jdata_writepages()
  gfs2_jdata_writepages() -> gfs2_log_flush()

This patch modifies do_gfs2_set_flags so that, when the jdata flag is
being set on a file that is already on the ordered write list, the log
is flushed and the inode is removed from the list before the flag is
set.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c7a904a8fbb4..0e9b81acf191 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -256,7 +256,7 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask)
 			goto out;
 	}
 	if ((flags ^ new_flags) & GFS2_DIF_JDATA) {
-		if (flags & GFS2_DIF_JDATA)
+		if (new_flags & GFS2_DIF_JDATA)
 			gfs2_log_flush(sdp, ip->i_gl, NORMAL_FLUSH);
 		error = filemap_fdatawrite(inode->i_mapping);
 		if (error)
@@ -264,6 +264,8 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask)
 		error = filemap_fdatawait(inode->i_mapping);
 		if (error)
 			goto out;
+		if (new_flags & GFS2_DIF_JDATA)
+			gfs2_ordered_del_inode(ip);
 	}
 	error = gfs2_trans_begin(sdp, RES_DINODE, 0);
 	if (error)
* [Cluster-devel] [GFS2 PATCH] GFS2: Take inode off order_write list when setting jdata flag
From: Abhijith Das @ 2017-10-16 18:46 UTC
To: cluster-devel.redhat.com

Hi,

Looks good. ACK.

Cheers!
--Abhi