* ceph-fuse remount issues
@ 2015-02-19 22:23 John Spray
2015-02-23 5:14 ` Gregory Farnum
2015-02-26 8:28 ` 严正
0 siblings, 2 replies; 5+ messages in thread
From: John Spray @ 2015-02-19 22:23 UTC (permalink / raw)
To: ceph-devel, zyan, Gregory Farnum
Background: a while ago, we found (#10277) that the existing cache
expiration mechanism wasn't working with the latest kernels. We used to
invalidate the top level dentries, which caused fuse to invalidate
everything, but an implementation detail in fuse caused it to start
ignoring our repeated invalidate calls, so this doesn't work any more.
To persuade fuse to dirty its entire metadata cache, Zheng added in a
system() call to "mount -o remount" after we expire things from our
client side cache.
However, this was a bit of a hack and has created problems:
* You can't call mount -o remount unless you're root, so we are less
flexible than we used to be (#10542)
* While the remount is happening, unmounts sporadically fail and the
fuse process can become unresponsive to SIGKILL (#10916)
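For readers unfamiliar with the workaround, the remount step and its root requirement can be sketched roughly as below. This is a minimal Python illustration, not the ceph-fuse code (which is C++ and calls system()); the helper names `remount_command`, `can_remount` and `force_kernel_cache_flush` are hypothetical, and only the `mount -o remount` command itself comes from the description above.

```python
import os
import subprocess


def remount_command(mountpoint):
    """Build the argv for the remount described above (hypothetical helper)."""
    return ["mount", "-o", "remount", mountpoint]


def can_remount(euid=None):
    """mount -o remount needs root, which is the limitation behind #10542."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def force_kernel_cache_flush(mountpoint, run=subprocess.call):
    """Attempt the remount; return True only if it ran and exited with 0.

    `run` is injectable so the sketch can be exercised without touching
    any real mount.
    """
    if not can_remount():
        return False
    return run(remount_command(mountpoint)) == 0
```

Passing a stub for `run` (for example `lambda argv: 0`) lets the control flow be checked safely; for a non-root caller the function returns False without running anything.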
The first issue was maybe an acceptable compromise, but the second issue
is just painful, and it seems like we might not have seen the last of
the knock-on effects -- upstream maintainers certainly aren't expecting
filesystems to remount themselves quite so frequently.
We probably have an opportunity to get something upstream in fuse to
support a direct call to trigger the invalidation we want, if we can
work out what that should look like. Thoughts?
John
* Re: ceph-fuse remount issues
2015-02-19 22:23 ceph-fuse remount issues John Spray
@ 2015-02-23 5:14 ` Gregory Farnum
2015-02-26 8:28 ` 严正
1 sibling, 0 replies; 5+ messages in thread
From: Gregory Farnum @ 2015-02-23 5:14 UTC (permalink / raw)
To: John Spray; +Cc: ceph-devel, zyan, sage
----- Original Message -----
> From: "John Spray" <john.spray@redhat.com>
> To: ceph-devel@vger.kernel.org, zyan@redhat.com, "Gregory Farnum" <gfarnum@redhat.com>
> Sent: Thursday, February 19, 2015 2:23:21 PM
> Subject: ceph-fuse remount issues
>
>
> Background: a while ago, we found (#10277) that existing cache
> expiration mechanism wasn't working with latest kernels. We used to
> invalidate the top level dentries, which caused fuse to invalidate
> everything, but an implementation detail in fuse caused it to start
> ignoring our repeated invalidate calls, so this doesn't work any more.
> To persuade fuse to dirty its entire metadata cache, Zheng added in a
> system() call to "mount -o remount" after we expire things from our
> client side cache.
>
> However, this was a bit of a hack and has created problems:
> * You can't call mount -o remount unless you're root, so we are less
> flexible than we used to be (#10542)
> * While the remount is happening, unmounts sporadically fail and the
> fuse process can become unresponsive to SIGKILL (#10916)
>
> The first issue was maybe an acceptable compromise, but the second issue
> is just painful, and it seems like we might not have seen the last of
> the knock on effects -- upstream maintainers certainly aren't expecting
> filesystems to remount themselves quite so frequently.
Yeah. I looked at this briefly and switching to a conditional behavior based on kernel version shouldn't be too difficult; the actual change in behavior is a very short patch: https://github.com/ceph/ceph/commit/0827bb79ea5127e6763f6e904dfa1a3266046ffb
I'm going to try and integrate that in with my branch to warn on remount issues in the morning: https://github.com/ceph/ceph/pull/3681 (better version of that sitting on my computer now too)
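A conditional of the kind described here might look roughly like the following. This is only a sketch and does not reproduce the linked commit: the function names and the strategy labels are hypothetical, and the 3.18 threshold is the one given elsewhere in this thread for the d_invalidate() change.

```python
import re

# First kernel release with the changed d_invalidate() behaviour, per this thread.
REMOUNT_NEEDED_FROM = (3, 18)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '3.18.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def pick_invalidation_strategy(release):
    """Choose between the old dentry invalidation and the remount workaround."""
    if parse_kernel_release(release) >= REMOUNT_NEEDED_FROM:
        return "remount"
    return "invalidate_dentries"
```

The release string would typically come from something like `platform.release()`. Comparing integer tuples rather than strings matters here, since a plain string comparison would sort "3.2" after "3.18".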
> We probably have an opportunity to get something upstream in fuse to
> support a direct call to trigger the invalidation we want, if we can
> work out what that should look like. Thoughts?
Yes please. I don't really have the kernel VFS or FUSE interface experience here to offer up much off the top of my head, but I think this is something that FUSE ought to allow us to do. LSF/MM is coming up and Sage will be there, which is probably a good time to raise issues with people in the hallway or appropriate sessions.
-Greg
* Re: ceph-fuse remount issues
2015-02-19 22:23 ceph-fuse remount issues John Spray
2015-02-23 5:14 ` Gregory Farnum
@ 2015-02-26 8:28 ` 严正
2015-03-16 5:28 ` Sage Weil
1 sibling, 1 reply; 5+ messages in thread
From: 严正 @ 2015-02-26 8:28 UTC (permalink / raw)
To: John Spray; +Cc: ceph-devel, Gregory Farnum
> On 20 Feb 2015, at 06:23, John Spray <john.spray@redhat.com> wrote:
>
>
> Background: a while ago, we found (#10277) that existing cache expiration mechanism wasn't working with latest kernels. We used to invalidate the top level dentries, which caused fuse to invalidate everything, but an implementation detail in fuse caused it to start ignoring our repeated invalidate calls, so this doesn't work any more. To persuade fuse to dirty its entire metadata cache, Zheng added in a system() call to "mount -o remount" after we expire things from our client side cache.
A change to the d_invalidate() implementation broke our old cache expiration mechanism. When invalidating a dentry, d_invalidate() also walks the dentry subtree and tries to prune any unused descendant dentries. Our old cache expiration mechanism relied on this: we invalidated the top-level dentries, and d_invalidate() tried to prune the unused dentries underneath them. Prior to the 3.18 kernel, d_invalidate() could fail if the dentry was in use by someone. The implementation changed in the 3.18 kernel: d_invalidate() now always succeeds and unhashes the dentry even if it's still in use. This change in behavior means we can no longer call d_invalidate() at will. One known bad consequence is that the getcwd() system call returns -EINVAL after a process's working directory gets invalidated.
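The difference between the two behaviors can be illustrated with a toy userspace model. This is a deliberately simplified Python sketch, not kernel code: the `Dentry` class, the reference-count rule and both function names are assumptions made for illustration, and it ignores locking, mount points and negative dentries.

```python
import errno


class Dentry:
    def __init__(self, name, parent=None, refcount=0):
        self.name = name
        self.refcount = refcount   # > 0 means something holds it, e.g. a cwd
        self.hashed = True         # True while lookups can still find it
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def in_use(self):
        # A dentry counts as busy if it, or anything below it, is referenced.
        return self.refcount > 0 or any(c.in_use() for c in self.children)


def prune_unused(dentry):
    """Drop every descendant subtree that nothing holds a reference into."""
    kept = []
    for child in dentry.children:
        prune_unused(child)
        if child.in_use():
            kept.append(child)
        else:
            child.hashed = False
    dentry.children = kept


def d_invalidate_pre_3_18(dentry):
    """Model of the old behaviour: prune, then refuse if still in use."""
    prune_unused(dentry)
    if dentry.in_use():
        return -errno.EBUSY        # the dentry stays hashed
    dentry.hashed = False
    return 0


def d_invalidate_3_18(dentry):
    """Model of the new behaviour: prune, then unhash unconditionally."""
    prune_unused(dentry)
    dentry.hashed = False          # happens even if the dentry is in use
    return 0
```

In this model, invalidating a top-level dentry that has one idle child and one child held as a working directory prunes the idle child in both versions. The old version then reports -EBUSY and leaves the top-level dentry hashed, while the new version unhashes it regardless, which is the situation in which getcwd() starts failing.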
The cephfs kernel client has no such issue because it maintains its own per-session cap list. When it receives a cache pressure message from the MDS, it can iterate over the list and prune unused caps.
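The cap-list approach can be pictured with a small sketch. This is a hypothetical Python model rather than the kernel client's actual data structure: the `Session` class, the `in_use` flag and the `max_caps` target are all assumptions chosen to show the iterate-and-prune idea.

```python
from collections import OrderedDict


class Session:
    def __init__(self):
        # ino -> in_use flag, in insertion order (oldest first)
        self.caps = OrderedDict()

    def add_cap(self, ino, in_use=False):
        self.caps[ino] = in_use

    def trim_caps(self, max_caps):
        """Drop unused caps, oldest first, until at most max_caps remain.

        Returns the inode numbers whose caps were released. Caps that are
        in use are skipped, so the target may not be reached.
        """
        released = []
        for ino in list(self.caps):
            if len(self.caps) <= max_caps:
                break
            if not self.caps[ino]:
                del self.caps[ino]
                released.append(ino)
        return released
```

Because the client owns this list, it can respond to cache pressure directly, without asking the VFS to invalidate anything.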
>
> However, this was a bit of a hack and has created problems:
> * You can't call mount -o remount unless you're root, so we are less flexible than we used to be (#10542)
> * While the remount is happening, unmounts sporadically fail and the fuse process can become unresponsive to SIGKILL (#10916)
>
> The first issue was maybe an acceptable compromise, but the second issue is just painful, and it seems like we might not have seen the last of the knock on effects -- upstream maintainers certainly aren't expecting filesystems to remount themselves quite so frequently.
>
> We probably have an opportunity to get something upstream in fuse to support a direct call to trigger the invalidation we want, if we can work out what that should look like. Thoughts?
>
> John
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: ceph-fuse remount issues
2015-02-26 8:28 ` 严正
@ 2015-03-16 5:28 ` Sage Weil
2015-03-16 6:28 ` Yan, Zheng
0 siblings, 1 reply; 5+ messages in thread
From: Sage Weil @ 2015-03-16 5:28 UTC (permalink / raw)
To: 严正; +Cc: John Spray, ceph-devel, Gregory Farnum
Hi Zheng,
On Thu, 26 Feb 2015, 严正 wrote:
> > On 20 Feb 2015, at 06:23, John Spray <john.spray@redhat.com> wrote:
> >
> >
> > Background: a while ago, we found (#10277) that existing cache expiration mechanism wasn't working with latest kernels. We used to invalidate the top level dentries, which caused fuse to invalidate everything, but an implementation detail in fuse caused it to start ignoring our repeated invalidate calls, so this doesn't work any more. To persuade fuse to dirty its entire metadata cache, Zheng added in a system() call to "mount -o remount" after we expire things from our client side cache.
>
> A change to the d_invalidate() implementation broke our old cache
> expiration mechanism. When invalidating a dentry, d_invalidate() also
> walks the dentry subtree and tries to prune any unused descendant
> dentries. Our old cache expiration mechanism relied on this: we
> invalidated the top-level dentries, and d_invalidate() tried to prune
> the unused dentries underneath them. Prior to the 3.18 kernel,
> d_invalidate() could fail if the dentry was in use by someone. The
> implementation changed in the 3.18 kernel: d_invalidate() now always
> succeeds and unhashes the dentry even if it's still in use. This change
> in behavior means we can no longer call d_invalidate() at will. One
> known bad consequence is that the getcwd() system call returns -EINVAL
> after a process's working directory gets invalidated.
I took another look at this and it seems to me like we might need
something more than a new call that does the pruning. What we were doing
before was also a bit of a hack, it seems.
What is really going on is that the MDS is telling us to reduce the number
of inodes we have pinned. We should ideally turn that into pressure on
the dcache, but it's not per-superblock, so there's no shrinker we can
poke that does what we want.
We *are* doing the dentry invalidations, so the dentries are unhashed.
But they aren't getting destroyed... Zheng, this is what the previous hack
was doing, right? Forcing unhashed dentries to get trimmed from the LRU?
It seems like the most elegant solution would be to patch fs/fuse to make
that happen in the general case when we do the invalidate upcall. Does
that sound right?
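The gap described here, where dentries are unhashed but still held, can be modeled in a few lines. This is a toy Python sketch under the assumptions stated in this message, not a description of the real dcache: the `DCache` class and the `trim_unhashed` step are hypothetical.

```python
class DCache:
    def __init__(self):
        self.hash = {}   # name -> ino: what lookups can find
        self.lru = []    # (name, ino) entries still held in memory, oldest first

    def add(self, name, ino):
        self.hash[name] = ino
        self.lru.append((name, ino))

    def invalidate(self, name):
        """Unhash only: the entry stays on the LRU and keeps pinning its inode."""
        self.hash.pop(name, None)

    def trim_unhashed(self):
        """The missing step: destroy LRU entries that are no longer hashed."""
        self.lru = [(n, i) for (n, i) in self.lru if self.hash.get(n) == i]

    def pinned_inodes(self):
        return {ino for _, ino in self.lru}
```

In this model, calling `invalidate()` alone leaves the set of pinned inodes unchanged, which is why the MDS sees no reduction; only the trim step releases them.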
sage
* Re: ceph-fuse remount issues
2015-03-16 5:28 ` Sage Weil
@ 2015-03-16 6:28 ` Yan, Zheng
0 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2015-03-16 6:28 UTC (permalink / raw)
To: Sage Weil; +Cc: 严正, John Spray, ceph-devel, Gregory Farnum
On Mon, Mar 16, 2015 at 1:28 PM, Sage Weil <sweil@redhat.com> wrote:
> Hi Zheng,
>
> On Thu, 26 Feb 2015, 严正 wrote:
>> > On 20 Feb 2015, at 06:23, John Spray <john.spray@redhat.com> wrote:
>> >
>> >
>> > Background: a while ago, we found (#10277) that existing cache expiration mechanism wasn't working with latest kernels. We used to invalidate the top level dentries, which caused fuse to invalidate everything, but an implementation detail in fuse caused it to start ignoring our repeated invalidate calls, so this doesn't work any more. To persuade fuse to dirty its entire metadata cache, Zheng added in a system() call to "mount -o remount" after we expire things from our client side cache.
>>
>> A change to the d_invalidate() implementation broke our old cache
>> expiration mechanism. When invalidating a dentry, d_invalidate() also
>> walks the dentry subtree and tries to prune any unused descendant
>> dentries. Our old cache expiration mechanism relied on this: we
>> invalidated the top-level dentries, and d_invalidate() tried to prune
>> the unused dentries underneath them. Prior to the 3.18 kernel,
>> d_invalidate() could fail if the dentry was in use by someone. The
>> implementation changed in the 3.18 kernel: d_invalidate() now always
>> succeeds and unhashes the dentry even if it's still in use. This change
>> in behavior means we can no longer call d_invalidate() at will. One
>> known bad consequence is that the getcwd() system call returns -EINVAL
>> after a process's working directory gets invalidated.
>
> I took another look at this and it seems to me like we might need
> something more than a new call that does the pruning. What we were doing
> before was also a bit of a hack, it seems.
>
> What is really going on is that the MDS is telling us to reduce the number
> of inodes we have pinned. We should ideally turn that into pressure on
> the dcache, but it's not per-superblock, so there's no shrinker we can
> poke that does what we want.
>
> We *are* doing the dentry invalidations, so the dentries are unhashed.
> But they aren't getting destroyed... Zheng, this is what the previous hack
> was doing, right? Forcing unhashed dentries to get trimmed from the LRU?
>
Yes, that's what the previous hack does. When invalidating a dentry, the VFS
also tries to destroy it.
> It seems like the most elegant solution would be to patch fs/fuse to make
> that happen in the general case when we do the invalidate upcall. Does
> that sound right?
What do you mean by "make that happen in the general case"? In my opinion,
there isn't much the fuse kernel module can do about dentries. Maybe we can
try adding a new callback which calls d_prune_aliases().
Regards
Yan, Zheng
>
> sage