From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christoph Hellwig Subject: Re: [PATCH 17/18] xfs: implement pnfs export operations Date: Wed, 7 Jan 2015 11:40:10 +0100 Message-ID: <20150107104010.GD28783@lst.de> References: <1420561721-9150-1-git-send-email-hch@lst.de> <1420561721-9150-18-git-send-email-hch@lst.de> <20150107002434.GG31508@dastard> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Christoph Hellwig , "J. Bruce Fields" , Jeff Layton , linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org To: Dave Chinner Return-path: Content-Disposition: inline In-Reply-To: <20150107002434.GG31508@dastard> Sender: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: linux-fsdevel.vger.kernel.org On Wed, Jan 07, 2015 at 11:24:34AM +1100, Dave Chinner wrote: > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > index fdc6422..2b86be8 100644 > > --- a/fs/xfs/xfs_fsops.c > > +++ b/fs/xfs/xfs_fsops.c > > @@ -601,6 +601,8 @@ xfs_growfs_data( > > if (!mutex_trylock(&mp->m_growlock)) > > return -EWOULDBLOCK; > > error =3D xfs_growfs_data_private(mp, in); > > + if (!error) > > + mp->m_generation++; > > mutex_unlock(&mp->m_growlock); > > return error; > > } >=20 > I couldn't find an explanation of what this generation number is > for. What are it's semantics w.r.t. server crashes? The generation is incremented when we grow the filesystem, so that a new layout (block mapping) returned to the cl=D1=96ent referers to th= e new NFS device ID, which will make the client aware of the new size. The device IDs aren't persistent, so after a server crash / reboot we'll start at zero again. I'll add comments explaining this to the code. > Why does this function get passed an offset it is not actually used? Historic reasons.. > > +static int > > +xfs_fs_update_flags( > > + struct xfs_inode *ip) > > +{ > > + struct xfs_mount *mp =3D ip->i_mount; > > + struct xfs_trans *tp; > > + int error; > > + > > + /* > > + * Update the mode, and prealloc flag bits. > > + */ > > + tp =3D xfs_trans_alloc(mp, XFS_TRANS_WRITEID); > > + error =3D xfs_trans_reserve(tp, &M_RES(mp)->tr_writeid, 0, 0); > > + if (error) { > > + xfs_trans_cancel(tp, 0); > > + return error; > > + } > > + > > + xfs_ilock(ip, XFS_ILOCK_EXCL); > > + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); > > + ip->i_d.di_mode &=3D ~S_ISUID; > > + if (ip->i_d.di_mode & S_IXGRP) > > + ip->i_d.di_mode &=3D ~S_ISGID; > > + > > + ip->i_d.di_flags |=3D XFS_DIFLAG_PREALLOC; > > + > > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); > > + return xfs_trans_commit(tp, 0); > > +} >=20 > That needs timestamp changes as well. i.e.: >=20 > xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); The time stamps are only updated when we actually commit the data. Updating them here might be harmless, but I'll have to dig into the protocol specification and tests a bit more to check if doing the additional timestamp update would be harmless. > > + > > +/* > > + * Get a layout for the pNFS client. > > + * > > + * Note that in the allocation case we do force out the transactio= n here. > > + * There is no metadata update that is required to be stable for N= =46S > > + * semantics, and layouts are not valid over a server crash. Inst= ead > > + * we'll have to be careful in the commit routine as it might pass= us > > + * blocks for an allocation that never made it to disk in the reco= very > > + * case. >=20 > I think you are saying that because block allocation is an async > transaction, then we have to deal with the possibility that we crash > before the transaction hits the disk. >=20 > How often do we have to allocate > new blocks like this? Do we need to use async transactions for this > case, or should we simply do the brute force thing (by making the > allocation transaction synchronous) initially and then, if > performance problems arise, optimise from there? Every block allocation from a pNFS client goes through this path, so yes it is performance critical. > > + xfs_map_iomap(ip, iomap, &imap, offset); > > + *device_generation =3D mp->m_generation; >=20 > So whenever the server first starts up the generation number in a > map is going to be zero - what purpose does this actually serve? So that we can communicate if a device was grown to the client, which in this case needs to re-read the device information. > > + if (!length) > > + continue; > > + > > + error =3D xfs_iomap_write_unwritten(ip, start, length); > > + if (error) > > + goto out_drop_iolock; > > + } > > + > > + /* > > + * Make sure reads through the pagecache see the new data. > > + */ > > + invalidate_inode_pages2(inode->i_mapping); >=20 > Probably should do that first. Also, what happens if there is local > dirty data on the file at this point? Doesn't this just toss them > away? If there was local data it will be tossed. For regular writes this can= 't happen because we really outstanding layouts in the write path. For mmap we for now ignore this problem, as a pNFS server should generally not be used locally. =20 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html