* ceph-fuse remount issues
From: John Spray @ 2015-02-19 22:23 UTC
To: ceph-devel, zyan, Gregory Farnum

Background: a while ago, we found (#10277) that the existing cache
expiration mechanism wasn't working with the latest kernels. We used to
invalidate the top-level dentries, which caused fuse to invalidate
everything, but an implementation detail in fuse caused it to start
ignoring our repeated invalidate calls, so this doesn't work any more.
To persuade fuse to dirty its entire metadata cache, Zheng added a
system() call to "mount -o remount" after we expire things from our
client-side cache.

However, this was a bit of a hack and has created problems:
 * You can't call mount -o remount unless you're root, so we are less
   flexible than we used to be (#10542)
 * While the remount is happening, unmounts sporadically fail and the
   fuse process can become unresponsive to SIGKILL (#10916)

The first issue was maybe an acceptable compromise, but the second issue
is just painful, and it seems like we might not have seen the last of
the knock-on effects -- upstream maintainers certainly aren't expecting
filesystems to remount themselves quite so frequently.

We probably have an opportunity to get something upstream in fuse to
support a direct call to trigger the invalidation we want, if we can
work out what that should look like. Thoughts?

John
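For illustration, the remount workaround and its root-only limitation can be sketched roughly as below. This is a minimal sketch and not the ceph-fuse code: the function names `build_remount_command` and `remount_after_trim` are hypothetical, and the command line is simply the "mount -o remount" quoted in the message, with the mountpoint passed as an argument.

```python
import os
import subprocess
from typing import List


def build_remount_command(mountpoint: str) -> List[str]:
    """Return the argv for the remount described above (no shell involved)."""
    return ["mount", "-o", "remount", mountpoint]


def remount_after_trim(mountpoint: str) -> bool:
    """Try the remount; return False instead of raising when it cannot work."""
    if os.geteuid() != 0:
        # The limitation tracked in #10542: only root may remount.
        return False
    # Hypothetical invocation; the real client issued a system() call.
    result = subprocess.run(build_remount_command(mountpoint), check=False)
    return result.returncode == 0
```

A caller would invoke `remount_after_trim()` once per cache trim, which is exactly the "remount themselves quite so frequently" pattern the message worries about.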
* Re: ceph-fuse remount issues
From: Gregory Farnum @ 2015-02-23 5:14 UTC
To: John Spray; +Cc: ceph-devel, zyan, sage

----- Original Message -----
> From: "John Spray" <john.spray@redhat.com>
> To: ceph-devel@vger.kernel.org, zyan@redhat.com, "Gregory Farnum" <gfarnum@redhat.com>
> Sent: Thursday, February 19, 2015 2:23:21 PM
> Subject: ceph-fuse remount issues
>
> Background: a while ago, we found (#10277) that existing cache
> expiration mechanism wasn't working with latest kernels. We used to
> invalidate the top level dentries, which caused fuse to invalidate
> everything, but an implementation detail in fuse caused it to start
> ignoring our repeated invalidate calls, so this doesn't work any more.
> To persuade fuse to dirty its entire metadata cache, Zheng added in a
> system() call to "mount -o remount" after we expire things from our
> client side cache.
>
> However, this was a bit of a hack and has created problems:
>  * You can't call mount -o remount unless you're root, so we are less
>    flexible than we used to be (#10542)
>  * While the remount is happening, unmounts sporadically fail and the
>    fuse process can become unresponsive to SIGKILL (#10916)
>
> The first issue was maybe an acceptable compromise, but the second issue
> is just painful, and it seems like we might not have seen the last of
> the knock on effects -- upstream maintainers certainly aren't expecting
> filesystems to remount themselves quite so frequently.

Yeah. I looked at this briefly and switching to a conditional behavior
based on kernel version shouldn't be too difficult; the actual change in
behavior is a very short patch:
https://github.com/ceph/ceph/commit/0827bb79ea5127e6763f6e904dfa1a3266046ffb

I'm going to try and integrate that in with my branch to warn on remount
issues in the morning:
https://github.com/ceph/ceph/pull/3681
(better version of that sitting on my computer now too)

> We probably have an opportunity to get something upstream in fuse to
> support a direct call to trigger the invalidation we want, if we can
> work out what that should look like. Thoughts?

Yes please. I don't really have the kernel VFS or FUSE interface
experience here to offer up much off the top of my head, but I think
this is something that FUSE ought to allow us to do. LSF/MM is coming up
and Sage will be there, which is probably a good time to raise issues
with people in the hallway or appropriate sessions.

-Greg
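The "conditional behavior based on kernel version" could look something like the sketch below. It is only an assumption about the shape of such a check, not the linked commit: the names `parse_kernel_release` and `choose_trim_strategy` and the strategy labels are hypothetical, and the 3.18 cutoff is the kernel version named elsewhere in this thread as the one where d_invalidate() changed.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.18.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def choose_trim_strategy(release: str) -> str:
    """Pick how to make the kernel drop dentries after a client cache trim."""
    # Assumption: 3.18 is the cutoff; older kernels keep the old
    # top-level dentry invalidation, newer ones fall back to remount.
    if parse_kernel_release(release) < (3, 18):
        return "invalidate-dentries"
    return "remount"


if __name__ == "__main__":
    # platform.release() reports the running kernel, e.g. '3.13.0-45-generic'.
    print(choose_trim_strategy(platform.release()))
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where "3.2" would sort after "3.18".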
* Re: ceph-fuse remount issues
From: 严正 @ 2015-02-26 8:28 UTC
To: John Spray; +Cc: ceph-devel, Gregory Farnum

> On Feb 20, 2015, at 06:23, John Spray <john.spray@redhat.com> wrote:
>
> Background: a while ago, we found (#10277) that existing cache expiration mechanism wasn't working with latest kernels. We used to invalidate the top level dentries, which caused fuse to invalidate everything, but an implementation detail in fuse caused it to start ignoring our repeated invalidate calls, so this doesn't work any more. To persuade fuse to dirty its entire metadata cache, Zheng added in a system() call to "mount -o remount" after we expire things from our client side cache.

The change to the d_invalidate() implementation breaks our old cache
expiration mechanism. When invalidating a dentry, d_invalidate() also
walks the dentry subtree and tries to prune any unused descendant
dentries. Our old cache expiration mechanism relies on this to prune
unused dentries: we invalidate the top-level dentries, and
d_invalidate() tries to prune the unused dentries underneath them.

Prior to the 3.18 kernel, d_invalidate() could fail if the dentry was in
use by someone. The implementation changed in the 3.18 kernel:
d_invalidate() now always succeeds and unhashes the dentry even if it's
still in use. This change in behavior means we can no longer use
d_invalidate() at will. One known bad consequence is that the getcwd()
system call returns -EINVAL after a process's working directory gets
invalidated.

The cephfs kernel client has no such issue because it maintains its own
per-session cap list. When it receives a cache pressure message from the
MDS, it can iterate over the list and prune unused caps.

> However, this was a bit of a hack and has created problems:
>  * You can't call mount -o remount unless you're root, so we are less flexible than we used to be (#10542)
>  * While the remount is happening, unmounts sporadically fail and the fuse process can become unresponsive to SIGKILL (#10916)
>
> The first issue was maybe an acceptable compromise, but the second issue is just painful, and it seems like we might not have seen the last of the knock on effects -- upstream maintainers certainly aren't expecting filesystems to remount themselves quite so frequently.
>
> We probably have an opportunity to get something upstream in fuse to support a direct call to trigger the invalidation we want, if we can work out what that should look like. Thoughts?
>
> John

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
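The difference between the two d_invalidate() behaviors described in this message can be modeled with a toy dentry tree. This is a sketch under stated assumptions, not kernel code: the `Dentry` class and both function names are hypothetical, "in use" is reduced to a non-zero `refcount`, and the `-EBUSY` return is just the toy's stand-in for "can fail".

```python
from dataclasses import dataclass, field
from typing import List

EBUSY = 16  # errno value on Linux


@dataclass
class Dentry:
    name: str
    refcount: int = 0          # > 0 means some process is using it
    hashed: bool = True        # reachable through name lookup
    children: List["Dentry"] = field(default_factory=list)


def prune_unused_descendants(dentry: Dentry) -> None:
    """Drop descendants that are unused and have no surviving children."""
    kept = []
    for child in dentry.children:
        prune_unused_descendants(child)
        if child.refcount > 0 or child.children:
            kept.append(child)
    dentry.children = kept


def d_invalidate_pre_3_18(dentry: Dentry) -> int:
    """Toy model of the older behavior: may refuse an in-use dentry."""
    prune_unused_descendants(dentry)
    if dentry.refcount > 0:
        return -EBUSY          # in use: refuse, dentry stays hashed
    dentry.hashed = False
    return 0


def d_invalidate_3_18(dentry: Dentry) -> int:
    """Toy model of the newer behavior: always succeeds."""
    prune_unused_descendants(dentry)
    dentry.hashed = False      # unhashed even though it may be in use
    return 0
```

In the old model, invalidating a top-level dentry that a process holds as its working directory prunes the idle subtree and leaves the directory itself intact; in the new model the same call unhashes the directory, which is the situation in which the message reports getcwd() failing.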
* Re: ceph-fuse remount issues
From: Sage Weil @ 2015-03-16 5:28 UTC
To: 严正; +Cc: John Spray, ceph-devel, Gregory Farnum

Hi Zheng,

On Thu, 26 Feb 2015, 严正 wrote:
> > On Feb 20, 2015, at 06:23, John Spray <john.spray@redhat.com> wrote:
> >
> > Background: a while ago, we found (#10277) that existing cache expiration mechanism wasn't working with latest kernels. We used to invalidate the top level dentries, which caused fuse to invalidate everything, but an implementation detail in fuse caused it to start ignoring our repeated invalidate calls, so this doesn't work any more. To persuade fuse to dirty its entire metadata cache, Zheng added in a system() call to "mount -o remount" after we expire things from our client side cache.
>
> The change to the d_invalidate() implementation breaks our old cache
> expiration mechanism. When invalidating a dentry, d_invalidate() also
> walks the dentry subtree and tries to prune any unused descendant
> dentries. Our old cache expiration mechanism relies on this to prune
> unused dentries: we invalidate the top-level dentries, and
> d_invalidate() tries to prune the unused dentries underneath them.
> Prior to the 3.18 kernel, d_invalidate() could fail if the dentry was
> in use by someone. The implementation changed in the 3.18 kernel:
> d_invalidate() now always succeeds and unhashes the dentry even if
> it's still in use. This change in behavior means we can no longer use
> d_invalidate() at will. One known bad consequence is that the getcwd()
> system call returns -EINVAL after a process's working directory gets
> invalidated.

I took another look at this and it seems to me like we might need
something more than a new call that does the pruning. What we were doing
before was also a bit of a hack, it seems.

What is really going on is that the MDS is telling us to reduce the
number of inodes we have pinned. We should ideally turn that into
pressure on the dcache, but it's not per-superblock, so there's no
shrinker we can poke that does what we want.

We *are* doing the dentry invalidations, so the dentries are unhashed.
But they aren't getting destroyed... Zheng, this is what the previous
hack was doing, right? Forcing unhashed dentries to get trimmed from the
LRU?

It seems like the most elegant solution would be to patch fs/fuse to
make that happen in the general case when we do the invalidate upcall.
Does that sound right?

sage
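The gap between "unhashed" and "destroyed" can be shown with a small model of a dentry LRU. This is a sketch under simplifying assumptions rather than anything from the kernel or fs/fuse: `CachedDentry` and the three functions are hypothetical names, and an inode is treated as pinned for as long as any dentry object referring to it still exists.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class CachedDentry:
    ino: int
    hashed: bool = True
    in_use: bool = False


def pinned_inodes(lru: List[CachedDentry]) -> Set[int]:
    """An inode stays pinned while any dentry for it is still allocated."""
    return {d.ino for d in lru}


def invalidate(lru: List[CachedDentry], inos: Set[int]) -> None:
    """Unhash the matching dentries; they remain on the LRU."""
    for d in lru:
        if d.ino in inos:
            d.hashed = False


def trim_unhashed(lru: List[CachedDentry]) -> List[CachedDentry]:
    """Destroy dentries that are both unhashed and unused."""
    return [d for d in lru if d.hashed or d.in_use]
```

In this model, calling `invalidate()` alone leaves `pinned_inodes()` unchanged, so the MDS request to reduce pinned inodes is not satisfied; only the extra `trim_unhashed()` step releases the unused ones, while a dentry that is still in use keeps its inode pinned.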
* Re: ceph-fuse remount issues
From: Yan, Zheng @ 2015-03-16 6:28 UTC
To: Sage Weil; +Cc: 严正, John Spray, ceph-devel, Gregory Farnum

On Mon, Mar 16, 2015 at 1:28 PM, Sage Weil <sweil@redhat.com> wrote:
> Hi Zheng,
>
> On Thu, 26 Feb 2015, 严正 wrote:
>> > On Feb 20, 2015, at 06:23, John Spray <john.spray@redhat.com> wrote:
>> >
>> > Background: a while ago, we found (#10277) that existing cache expiration mechanism wasn't working with latest kernels. We used to invalidate the top level dentries, which caused fuse to invalidate everything, but an implementation detail in fuse caused it to start ignoring our repeated invalidate calls, so this doesn't work any more. To persuade fuse to dirty its entire metadata cache, Zheng added in a system() call to "mount -o remount" after we expire things from our client side cache.
>>
>> The change to the d_invalidate() implementation breaks our old cache
>> expiration mechanism. When invalidating a dentry, d_invalidate() also
>> walks the dentry subtree and tries to prune any unused descendant
>> dentries. Our old cache expiration mechanism relies on this to prune
>> unused dentries: we invalidate the top-level dentries, and
>> d_invalidate() tries to prune the unused dentries underneath them.
>> Prior to the 3.18 kernel, d_invalidate() could fail if the dentry was
>> in use by someone. The implementation changed in the 3.18 kernel:
>> d_invalidate() now always succeeds and unhashes the dentry even if
>> it's still in use. This change in behavior means we can no longer use
>> d_invalidate() at will. One known bad consequence is that the getcwd()
>> system call returns -EINVAL after a process's working directory gets
>> invalidated.
>
> I took another look at this and it seems to me like we might need
> something more than a new call that does the pruning. What we were doing
> before was also a bit of a hack, it seems.
>
> What is really going on is that the MDS is telling us to reduce the number
> of inodes we have pinned. We should ideally turn that into pressure on
> dcache.. but it's not per-superblock, so there's not a shrinker we can
> poke that does what we want.
>
> We *are* doing the dentry invalidations, so the dentries are unhashed.
> But they aren't getting destroyed... Zheng, this is what the previous hack
> was doing, right? Forcing unhashed dentries to get trimmed from the LRU?
>

Yes, that's what the previous hack does. When invalidating a dentry, the
VFS also tries to destroy it.

> It seems like the most elegant solution would be to patch fs/fuse to make
> that happen in the general case when we do the invalidate upcall. Does
> that sound right?

What do you mean by "make that happen in the general case"? In my
opinion, there isn't much the fuse kernel module can do about dentries.
Maybe we can try adding a new callback that calls d_prune_aliases().

Regards
Yan, Zheng

>
> sage
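The idea of a callback that prunes the aliases of one inode can be modeled as below. This is a toy illustration of the proposal, not the kernel's d_prune_aliases() and not an existing FUSE interface: `Alias`, `Inode`, `prune_aliases` and `handle_prune_request` are all hypothetical names, and "unused" is reduced to a zero `refcount`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alias:
    path: str
    refcount: int = 0          # > 0 means the dentry is still in use


@dataclass
class Inode:
    ino: int
    aliases: List[Alias] = field(default_factory=list)


def prune_aliases(inode: Inode) -> int:
    """Drop unused aliases of one inode; return how many were dropped."""
    before = len(inode.aliases)
    inode.aliases = [a for a in inode.aliases if a.refcount > 0]
    return before - len(inode.aliases)


def handle_prune_request(inodes: Dict[int, Inode], ino: int) -> int:
    """Hypothetical handler for a 'prune this inode' notification."""
    inode = inodes.get(ino)
    if inode is None:
        return 0               # nothing cached for this inode number
    return prune_aliases(inode)
```

Unlike invalidating a top-level dentry, a per-inode request of this kind leaves in-use aliases alone, so it would not disturb a process whose working directory is among them.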