[PATCH v2] ceph: do not update snapshot context when there is no new snapshot

* [PATCH v2] ceph: do not update snapshot context when there is no new snapshot
@ 2022-02-18  2:47 xiubli
  2022-02-18 14:17 ` Jeff Layton
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: xiubli @ 2022-02-18  2:47 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, vshankar, ukernel, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

We will only track the topmost parent snapshot realm from which we
need to rebuild the snapshot contexts _downward_ in the hierarchy. For
all the others that have no new snapshot we will do nothing.

This fix will avoid calling ceph_queue_cap_snap() on some inodes
inappropriately. For example, with the code in mainline, suppose there
are 2 directory hierarchies (with 6 directories total), like this:

/dir_X1/dir_X2/dir_X3/
/dir_Y1/dir_Y2/dir_Y3/

Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
root snapshot under /.snap/root_snap. Every time we make snapshots under
/dir_Y1/..., the kclient will always try to rebuild the snap context for
snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
and dir_Y3, which makes no sense.

That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
creating a new snapshot under /dir_Y1/... the new seq will be 4, and
the mds will send the kclient a snapshot backtrace in _downward_
order: seqs 4, 3.

When ceph_update_snap_trace() is called, it will always rebuild from
the last realm in the trace, that is, the root_snap. So later when
rebuilding the snap context, the current logic will always cause it to
rebuild the snap_X2 realm and then try to queue cap snaps for all the
inodes in that realm, even though it's not necessary.

This is accompanied by a lot of these sorts of dout messages:

    "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"

Fix the logic to avoid this situation.
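
The cost described above can be pictured with a small userspace model.
This is only a sketch under stated assumptions, not the kernel code:
struct realm, add_child() and rebuild_downward() are hypothetical names,
and the model assumes that the snap_X2 realm and the realm created under
/dir_Y1/... are both children of the root realm.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for struct ceph_snap_realm; not the kernel type. */
struct realm {
	const char *name;
	struct realm *children[MAX_CHILDREN];
	size_t nchildren;
	int rebuilt;		/* set once this realm's context is rebuilt */
};

/* Attach a child realm below a parent realm. */
static void add_child(struct realm *parent, struct realm *child)
{
	assert(parent->nchildren < MAX_CHILDREN);
	parent->children[parent->nchildren++] = child;
}

/*
 * Rebuild the context of r and of every realm below it, loosely
 * modelling a downward walk. Returns the number of realms touched.
 */
static int rebuild_downward(struct realm *r)
{
	size_t i;
	int touched = 1;

	r->rebuilt = 1;
	for (i = 0; i < r->nchildren; i++)
		touched += rebuild_downward(r->children[i]);
	return touched;
}
```

In this model, starting the walk at the root realm also marks the
snap_X2 realm as rebuilt, while starting at the realm under /dir_Y1/...
touches only that realm.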

The word 'invalidate' is not precise here: it actually rebuilds the
existing snapshot contexts or builds non-existent ones. Rename it to
'rebuild_snapcs'.

URL: https://tracker.ceph.com/issues/44100
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---

Changed in V2:
- Thanks to Zheng's feedback, switched to Zheng's patch.
- Rename invalidate to rebuild_snapcs.

 fs/ceph/snap.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index dbf34f212596..6d55b8ba79d8 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -735,7 +735,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	__le64 *prior_parent_snaps;        /* encoded */
 	struct ceph_snap_realm *realm = NULL;
 	struct ceph_snap_realm *first_realm = NULL;
-	int invalidate = 0;
+	struct ceph_snap_realm *realm_to_rebuild = NULL;
+	int rebuild_snapcs;
 	int err = -ENOMEM;
 	LIST_HEAD(dirty_realms);

@@ -743,6 +744,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,

 	dout("update_snap_trace deletion=%d\n", deletion);
 more:
+	rebuild_snapcs = 0;
 	ceph_decode_need(&p, e, sizeof(*ri), bad);
 	ri = p;
 	p += sizeof(*ri);
@@ -766,7 +768,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent));
 	if (err < 0)
 		goto fail;
-	invalidate += err;
+	rebuild_snapcs += err;

 	if (le64_to_cpu(ri->seq) > realm->seq) {
 		dout("update_snap_trace updating %llx %p %lld -> %lld\n",
@@ -791,22 +793,30 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 		if (realm->seq > mdsc->last_snap_seq)
 			mdsc->last_snap_seq = realm->seq;

-		invalidate = 1;
+		rebuild_snapcs = 1;
 	} else if (!realm->cached_context) {
 		dout("update_snap_trace %llx %p seq %lld new\n",
 		     realm->ino, realm, realm->seq);
-		invalidate = 1;
+		rebuild_snapcs = 1;
 	} else {
 		dout("update_snap_trace %llx %p seq %lld unchanged\n",
 		     realm->ino, realm, realm->seq);
 	}

-	dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino,
-	     realm, invalidate, p, e);
+	dout("done with %llx %p, rebuild_snapcs=%d, %p %p\n", realm->ino,
+	     realm, rebuild_snapcs, p, e);

-	/* invalidate when we reach the _end_ (root) of the trace */
-	if (invalidate && p >= e)
-		rebuild_snap_realms(realm, &dirty_realms);
+	/*
+	 * this will always track the uppest parent realm from which
+	 * we need to rebuild the snapshot contexts _downward_ in
+	 * hierarchy.
+	 */
+	if (rebuild_snapcs)
+		realm_to_rebuild = realm;
+
+	/* rebuild_snapcs when we reach the _end_ (root) of the trace */
+	if (rebuild_snapcs && p >= e)
+		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);

 	if (!first_realm)
 		first_realm = realm;
-- 
2.27.0

* Re: [PATCH v2] ceph: do not update snapshot context when there is no new snapshot
  2022-02-18  2:47 [PATCH v2] ceph: do not update snapshot context when there is no new snapshot xiubli
@ 2022-02-18 14:17 ` Jeff Layton
  2022-02-18 16:53 ` Luís Henriques
  2022-02-19  6:30 ` Xiubo Li
  2 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-02-18 14:17 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, vshankar, ukernel, ceph-devel

On Fri, 2022-02-18 at 10:47 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> We will only track the uppest parent snapshot realm from which we
> need to rebuild the snapshot contexts _downward_ in hierarchy. For
> all the others having no new snapshot we will do nothing.
> 
> This fix will avoid calling ceph_queue_cap_snap() on some inodes
> inappropriately. For example, with the code in mainline, suppose there
> are 2 directory hierarchies (with 6 directories total), like this:
> 
> /dir_X1/dir_X2/dir_X3/
> /dir_Y1/dir_Y2/dir_Y3/
> 
> Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
> root snapshot under /.snap/root_snap. Every time we make snapshots under
> /dir_Y1/..., the kclient will always try to rebuild the snap context for
> snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
> and dir_Y3, which makes no sense.
> 
> That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
> creating a new snapshot under /dir_Y1/... the new seq will be 4, and
> the mds will send the kclient a snapshot backtrace in _downward_
> order: seqs 4, 3.
> 
> When ceph_update_snap_trace() is called, it will always rebuild the from
> the last realm, that's the root_snap. So later when rebuilding the snap
> context, the current logic will always cause it to rebuild the snap_X2
> realm and then try to queue cap snaps for all the inodes related in that
> realm, even though it's not necessary.
> 
> This is accompanied by a lot of these sorts of dout messages:
> 
>     "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"
> 
> Fix the logic to avoid this situation.
> 
> The 'invalidate' word is not precise here, acutally it will rebuild
> the snapshot existing contexts or just build none-existing ones,
> rename it to 'rebuild_snapcs'.
> 
> URL: https://tracker.ceph.com/issues/44100
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
> 
> Changed in V2:
> - Thanks Zheng's feedback and switched to Zheng's patch.
> - Rename invalidate to rebuild_snapcs.
> 
> 
> 
>  fs/ceph/snap.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index dbf34f212596..6d55b8ba79d8 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -735,7 +735,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>  	__le64 *prior_parent_snaps;        /* encoded */
>  	struct ceph_snap_realm *realm = NULL;
>  	struct ceph_snap_realm *first_realm = NULL;
> -	int invalidate = 0;
> +	struct ceph_snap_realm *realm_to_rebuild = NULL;
> +	int rebuild_snapcs;
>  	int err = -ENOMEM;
>  	LIST_HEAD(dirty_realms);
>  
> @@ -743,6 +744,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>  
>  	dout("update_snap_trace deletion=%d\n", deletion);
>  more:
> +	rebuild_snapcs = 0;
>  	ceph_decode_need(&p, e, sizeof(*ri), bad);
>  	ri = p;
>  	p += sizeof(*ri);
> @@ -766,7 +768,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>  	err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent));
>  	if (err < 0)
>  		goto fail;
> -	invalidate += err;
> +	rebuild_snapcs += err;
>  
>  	if (le64_to_cpu(ri->seq) > realm->seq) {
>  		dout("update_snap_trace updating %llx %p %lld -> %lld\n",
> @@ -791,22 +793,30 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>  		if (realm->seq > mdsc->last_snap_seq)
>  			mdsc->last_snap_seq = realm->seq;
>  
> -		invalidate = 1;
> +		rebuild_snapcs = 1;
>  	} else if (!realm->cached_context) {
>  		dout("update_snap_trace %llx %p seq %lld new\n",
>  		     realm->ino, realm, realm->seq);
> -		invalidate = 1;
> +		rebuild_snapcs = 1;
>  	} else {
>  		dout("update_snap_trace %llx %p seq %lld unchanged\n",
>  		     realm->ino, realm, realm->seq);
>  	}
>  
> -	dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino,
> -	     realm, invalidate, p, e);
> +	dout("done with %llx %p, rebuild_snapcs=%d, %p %p\n", realm->ino,
> +	     realm, rebuild_snapcs, p, e);
>  
> -	/* invalidate when we reach the _end_ (root) of the trace */
> -	if (invalidate && p >= e)
> -		rebuild_snap_realms(realm, &dirty_realms);
> +	/*
> +	 * this will always track the uppest parent realm from which
> +	 * we need to rebuild the snapshot contexts _downward_ in
> +	 * hierarchy.
> +	 */
> +	if (rebuild_snapcs)
> +		realm_to_rebuild = realm;
> +
> +	/* rebuild_snapcs when we reach the _end_ (root) of the trace */
> +	if (rebuild_snapcs && p >= e)
> +		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>  
>  	if (!first_realm)
>  		first_realm = realm;

Looks good, Xiubo. Dropped the old patch and merged this one into the
testing branch (with a little changelog cleanup).

Thanks to both you and Zheng!
-- 
Jeff Layton <jlayton@kernel.org>

* Re: [PATCH v2] ceph: do not update snapshot context when there is no new snapshot
  2022-02-18  2:47 [PATCH v2] ceph: do not update snapshot context when there is no new snapshot xiubli
  2022-02-18 14:17 ` Jeff Layton
@ 2022-02-18 16:53 ` Luís Henriques
  2022-02-19  2:35   ` Xiubo Li
  2022-02-19  6:30 ` Xiubo Li
  2 siblings, 1 reply; 7+ messages in thread
From: Luís Henriques @ 2022-02-18 16:53 UTC (permalink / raw)
  To: xiubli; +Cc: jlayton, idryomov, vshankar, ukernel, ceph-devel

Hi!

I'm seeing the BUG below when running a simple fsstress on an encrypted
directory.  Reverting this commit seems to make it go away, but I'm not
yet 100% sure this is the culprit (I just wanted to report it before going
offline for the weekend.)

I stared at this code for a bit, but no light so far.

Cheers,
-- 
Luís

[   43.593441] ------------[ cut here ]------------
[   43.595707] kernel BUG at fs/ceph/addr.c:108!
[   43.598354] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   43.601563] CPU: 0 PID: 232 Comm: fsstress Not tainted 5.17.0-rc2+ #62
[   43.604225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[   43.607957] RIP: 0010:ceph_set_page_dirty+0x1eb/0x1f0 [ceph]
[   43.610909] Code: 55 51 83 e9 01 50 51 48 c7 c1 df 50 0d a0 52 ff 73 20 ba 03 00 00 00 53 41 ff 34 24 e8 2e 31 2f e1 48 83 c4 50 e9 f0 fe ff ff <0f> 0b 0f 1f 00 0f 1f 44f
[   43.619910] RSP: 0018:ffffc900002cb9c8 EFLAGS: 00010246
[   43.621662] RAX: ffff888005e65ff0 RBX: ffffea0001fac3c0 RCX: 0000000000000001
[   43.624036] RDX: ffff888005e65ff0 RSI: 000000000037b280 RDI: 0000000000000000
[   43.626441] RBP: ffff888005e66180 R08: 0000000000000f8a R09: ffffea0001fac3c0
[   43.629834] R10: ffff88800b567e10 R11: 0000000000001000 R12: ffff888005e662e0
[   43.633396] R13: 0000000000000000 R14: ffff888005e65e10 R15: 0000000000000f8a
[   43.637012] FS:  00007fdc23f7fb80(0000) GS:ffff888071200000(0000) knlGS:0000000000000000
[   43.641055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.643799] CR2: 00007ff30428d008 CR3: 00000000043b6000 CR4: 00000000000006b0
[   43.646667] Call Trace:
[   43.647694]  <TASK>
[   43.648579]  folio_mark_dirty+0x36/0x50
[   43.650166]  ceph_write_end+0x53/0x100 [ceph]
[   43.651734]  generic_perform_write+0xfe/0x1d0
[   43.653263]  ceph_write_iter+0x5b5/0x790 [ceph]
[   43.654864]  do_iter_readv_writev+0x14d/0x1d0
[   43.656295]  do_iter_write+0x85/0x1f0
[   43.657491]  iter_file_splice_write+0x253/0x370
[   43.658858]  direct_splice_actor+0x2c/0x40
[   43.660797]  splice_direct_to_actor+0xf8/0x220
[   43.662209]  ? opipe_prep.part.19+0xb0/0xb0
[   43.663493]  do_splice_direct+0x9a/0xd0
[   43.664684]  generic_copy_file_range+0x32/0x40
[   43.666055]  ceph_copy_file_range+0xb3/0xa10 [ceph]
[   43.667455]  ? _raw_spin_unlock+0x12/0x30
[   43.668475]  ? __ceph_do_getattr+0x7a/0x240 [ceph]
[   43.669724]  ? _copy_to_user+0x1c/0x30
[   43.670654]  ? cp_new_stat+0x12b/0x160
[   43.671569]  vfs_copy_file_range+0x26c/0x510
[   43.672609]  __x64_sys_copy_file_range+0x12d/0x1d0
[   43.673759]  do_syscall_64+0x42/0x90
[   43.674607]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   43.675875] RIP: 0033:0x7fdc240a695d
[   43.677114] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 018
[   43.683105] RSP: 002b:00007ffef97a53a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000146
[   43.685395] RAX: ffffffffffffffda RBX: 0000000000000051 RCX: 00007fdc240a695d
[   43.687528] RDX: 0000000000000005 RSI: 00007ffef97a53e0 RDI: 0000000000000004
[   43.689596] RBP: 0000000000000004 R08: 000000000001471b R09: 0000000000000000
[   43.691550] R10: 00007ffef97a53e8 R11: 0000000000000246 R12: 0000000000000005
[   43.693490] R13: 00000000002d3c32 R14: 000000000001471b R15: 00000000004be076
[   43.695375]  </TASK>
[   43.695960] Modules linked in: ceph libceph
[   43.697060] ---[ end trace 0000000000000000 ]---
[   43.698259] RIP: 0010:ceph_set_page_dirty+0x1eb/0x1f0 [ceph]
[   43.699676] Code: 55 51 83 e9 01 50 51 48 c7 c1 df 50 0d a0 52 ff 73 20 ba 03 00 00 00 53 41 ff 34 24 e8 2e 31 2f e1 48 83 c4 50 e9 f0 fe ff ff <0f> 0b 0f 1f 00 0f 1f 44f
[   43.704183] RSP: 0018:ffffc900002cb9c8 EFLAGS: 00010246
[   43.705424] RAX: ffff888005e65ff0 RBX: ffffea0001fac3c0 RCX: 0000000000000001
[   43.707116] RDX: ffff888005e65ff0 RSI: 000000000037b280 RDI: 0000000000000000
[   43.708718] RBP: ffff888005e66180 R08: 0000000000000f8a R09: ffffea0001fac3c0
[   43.709866] R10: ffff88800b567e10 R11: 0000000000001000 R12: ffff888005e662e0
[   43.710923] R13: 0000000000000000 R14: ffff888005e65e10 R15: 0000000000000f8a
[   43.711995] FS:  00007fdc23f7fb80(0000) GS:ffff888071200000(0000) knlGS:0000000000000000
[   43.713189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.714069] CR2: 00007ff30428d008 CR3: 00000000043b6000 CR4: 00000000000006b0
[   43.715093] note: fsstress[232] exited with preempt_count 1



* Re: [PATCH v2] ceph: do not update snapshot context when there is no new snapshot
  2022-02-18 16:53 ` Luís Henriques
@ 2022-02-19  2:35   ` Xiubo Li
  2022-02-19 13:00     ` Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: Xiubo Li @ 2022-02-19  2:35 UTC (permalink / raw)
  To: Luís Henriques; +Cc: jlayton, idryomov, vshankar, ukernel, ceph-devel


On 2/19/22 12:53 AM, Luís Henriques wrote:
> Hi!
>
> I'm seeing the BUG below when running a simple fsstress on an encrypted
> directory.  Reverting this commit seems to make it go away, but I'm not
> yet 100% sure this is the culprit (I just wanted to report it before going
> offline for the weekend.)

BTW, were you using the 'testing' branch? It seems Jeff has not yet
included the fscrypt patches in it.

- Xiubo

>
> I stared at this code for a bit, but no light so far.
>
> Cheers,


* Re: [PATCH v2] ceph: do not update snapshot context when there is no new snapshot
  2022-02-19  2:35   ` Xiubo Li
@ 2022-02-19 13:00     ` Jeff Layton
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-02-19 13:00 UTC (permalink / raw)
  To: Xiubo Li, Luís Henriques; +Cc: idryomov, vshankar, ukernel, ceph-devel

On Sat, 2022-02-19 at 10:35 +0800, Xiubo Li wrote:
> On 2/19/22 12:53 AM, Luís Henriques wrote:
> > Hi!
> > 
> > I'm seeing the BUG below when running a simple fsstress on an encrypted
> > directory.  Reverting this commit seems to make it go away, but I'm not
> > yet 100% sure this is the culprit (I just wanted to report it before going
> > offline for the weekend.)
> 
> BTW, were you using the 'testing' branch ? It seems Jeff has not 
> included the fscrypt patches yet in it.
> 
> 

I went ahead and rebased the wip-fscrypt branch onto the latest testing
branch yesterday, and again this morning. It should now be based on the
current testing branch (with your latest fixes).

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

* Re: [PATCH v2] ceph: do not update snapshot context when there is no new snapshot
  2022-02-18  2:47 [PATCH v2] ceph: do not update snapshot context when there is no new snapshot xiubli
  2022-02-18 14:17 ` Jeff Layton
  2022-02-18 16:53 ` Luís Henriques
@ 2022-02-19  6:30 ` Xiubo Li
  2022-02-21  9:54   ` Luís Henriques
  2 siblings, 1 reply; 7+ messages in thread
From: Xiubo Li @ 2022-02-19  6:30 UTC (permalink / raw)
  To: jlayton, Luis Henriques; +Cc: idryomov, vshankar, ukernel, ceph-devel


On 2/18/22 10:47 AM, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> We will only track the uppest parent snapshot realm from which we
> need to rebuild the snapshot contexts _downward_ in hierarchy. For
> all the others having no new snapshot we will do nothing.
>
> This fix will avoid calling ceph_queue_cap_snap() on some inodes
> inappropriately. For example, with the code in mainline, suppose there
> are 2 directory hierarchies (with 6 directories total), like this:
>
> /dir_X1/dir_X2/dir_X3/
> /dir_Y1/dir_Y2/dir_Y3/
>
> Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
> root snapshot under /.snap/root_snap. Every time we make snapshots under
> /dir_Y1/..., the kclient will always try to rebuild the snap context for
> snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
> and dir_Y3, which makes no sense.
>
> That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
> creating a new snapshot under /dir_Y1/... the new seq will be 4, and
> the mds will send the kclient a snapshot backtrace in _downward_
> order: seqs 4, 3.
>
> When ceph_update_snap_trace() is called, it will always rebuild the from
> the last realm, that's the root_snap. So later when rebuilding the snap
> context, the current logic will always cause it to rebuild the snap_X2
> realm and then try to queue cap snaps for all the inodes related in that
> realm, even though it's not necessary.
>
> This is accompanied by a lot of these sorts of dout messages:
>
>      "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"
>
> Fix the logic to avoid this situation.
>
> The 'invalidate' word is not precise here, acutally it will rebuild
> the snapshot existing contexts or just build none-existing ones,
> rename it to 'rebuild_snapcs'.
>
> URL: https://tracker.ceph.com/issues/44100
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>
> Changed in V2:
> - Thanks Zheng's feedback and switched to Zheng's patch.
> - Rename invalidate to rebuild_snapcs.
>
>
>
>   fs/ceph/snap.c | 28 +++++++++++++++++++---------
>   1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index dbf34f212596..6d55b8ba79d8 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -735,7 +735,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>   	__le64 *prior_parent_snaps;        /* encoded */
>   	struct ceph_snap_realm *realm = NULL;
>   	struct ceph_snap_realm *first_realm = NULL;
> -	int invalidate = 0;
> +	struct ceph_snap_realm *realm_to_rebuild = NULL;
> +	int rebuild_snapcs;
>   	int err = -ENOMEM;
>   	LIST_HEAD(dirty_realms);
>   
> @@ -743,6 +744,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>   
>   	dout("update_snap_trace deletion=%d\n", deletion);
>   more:
> +	rebuild_snapcs = 0;
>   	ceph_decode_need(&p, e, sizeof(*ri), bad);
>   	ri = p;
>   	p += sizeof(*ri);
> @@ -766,7 +768,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>   	err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent));
>   	if (err < 0)
>   		goto fail;
> -	invalidate += err;
> +	rebuild_snapcs += err;
>   
>   	if (le64_to_cpu(ri->seq) > realm->seq) {
>   		dout("update_snap_trace updating %llx %p %lld -> %lld\n",
> @@ -791,22 +793,30 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>   		if (realm->seq > mdsc->last_snap_seq)
>   			mdsc->last_snap_seq = realm->seq;
>   
> -		invalidate = 1;
> +		rebuild_snapcs = 1;
>   	} else if (!realm->cached_context) {
>   		dout("update_snap_trace %llx %p seq %lld new\n",
>   		     realm->ino, realm, realm->seq);
> -		invalidate = 1;
> +		rebuild_snapcs = 1;
>   	} else {
>   		dout("update_snap_trace %llx %p seq %lld unchanged\n",
>   		     realm->ino, realm, realm->seq);
>   	}
>   
> -	dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino,
> -	     realm, invalidate, p, e);
> +	dout("done with %llx %p, rebuild_snapcs=%d, %p %p\n", realm->ino,
> +	     realm, rebuild_snapcs, p, e);
>   
> -	/* invalidate when we reach the _end_ (root) of the trace */
> -	if (invalidate && p >= e)
> -		rebuild_snap_realms(realm, &dirty_realms);
> +	/*
> +	 * this will always track the uppest parent realm from which
> +	 * we need to rebuild the snapshot contexts _downward_ in
> +	 * hierarchy.
> +	 */
> +	if (rebuild_snapcs)
> +		realm_to_rebuild = realm;
> +
> +	/* rebuild_snapcs when we reach the _end_ (root) of the trace */
> +	if (rebuild_snapcs && p >= e)

s/rebuild_snapcs/realm_to_rebuild/

This will fix the bug Luís Henriques reported.

I have sent the V3 to fix it. Thanks.

- Xiubo


> +		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>   
>   	if (!first_realm)
>   		first_realm = realm;
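
The substitution above can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not the kernel code: struct
trace_entry, pick_v2() and pick_v3() are hypothetical names, realms are
reduced to integer ids, and needs_rebuild stands in for the per-entry
value of rebuild_snapcs, which the patch resets at the "more:" label.

```c
#include <assert.h>
#include <stddef.h>

/* One decoded trace entry (hypothetical model, not a kernel structure). */
struct trace_entry {
	int id;			/* illustrative realm identifier */
	int needs_rebuild;	/* per-entry value of rebuild_snapcs */
};

/*
 * v2 condition: tests the flag of the entry being processed when the
 * end of the trace is reached. Returns the id of the realm the rebuild
 * would start from, or -1 if no rebuild would happen.
 */
static int pick_v2(const struct trace_entry *t, size_t n)
{
	int realm_to_rebuild = -1;
	size_t i;

	assert(t != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int rebuild_snapcs = t[i].needs_rebuild;

		if (rebuild_snapcs)
			realm_to_rebuild = t[i].id;
		if (rebuild_snapcs && i + 1 == n)
			return realm_to_rebuild;
	}
	return -1;
}

/* Same walk with the condition changed to test realm_to_rebuild. */
static int pick_v3(const struct trace_entry *t, size_t n)
{
	int realm_to_rebuild = -1;
	size_t i;

	assert(t != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (t[i].needs_rebuild)
			realm_to_rebuild = t[i].id;
		if (realm_to_rebuild != -1 && i + 1 == n)
			return realm_to_rebuild;
	}
	return -1;
}
```

With a two-entry trace in which only the first realm needs a rebuild
and the last (root) realm is unchanged, pick_v2() reports that nothing
is rebuilt, while pick_v3() starts the rebuild from the first realm.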


* Re: [PATCH v2] ceph: do not update snapshot context when there is no new snapshot
  2022-02-19  6:30 ` Xiubo Li
@ 2022-02-21  9:54   ` Luís Henriques
  0 siblings, 0 replies; 7+ messages in thread
From: Luís Henriques @ 2022-02-21  9:54 UTC (permalink / raw)
  To: Xiubo Li; +Cc: jlayton, idryomov, vshankar, ukernel, ceph-devel

Xiubo Li <xiubli@redhat.com> writes:

> On 2/18/22 10:47 AM, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> We will only track the uppest parent snapshot realm from which we
>> need to rebuild the snapshot contexts _downward_ in hierarchy. For
>> all the others having no new snapshot we will do nothing.
>>
>> This fix will avoid calling ceph_queue_cap_snap() on some inodes
>> inappropriately. For example, with the code in mainline, suppose there
>> are 2 directory hierarchies (with 6 directories total), like this:
>>
>> /dir_X1/dir_X2/dir_X3/
>> /dir_Y1/dir_Y2/dir_Y3/
>>
>> Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
>> root snapshot under /.snap/root_snap. Every time we make snapshots under
>> /dir_Y1/..., the kclient will always try to rebuild the snap context for
>> snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
>> and dir_Y3, which makes no sense.
>>
>> That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
>> creating a new snapshot under /dir_Y1/... the new seq will be 4, and
>> the mds will send the kclient a snapshot backtrace in _downward_
>> order: seqs 4, 3.
>>
>> When ceph_update_snap_trace() is called, it will always rebuild the from
>> the last realm, that's the root_snap. So later when rebuilding the snap
>> context, the current logic will always cause it to rebuild the snap_X2
>> realm and then try to queue cap snaps for all the inodes related in that
>> realm, even though it's not necessary.
>>
>> This is accompanied by a lot of these sorts of dout messages:
>>
>>      "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"
>>
>> Fix the logic to avoid this situation.
>>
>> The 'invalidate' word is not precise here, acutally it will rebuild
>> the snapshot existing contexts or just build none-existing ones,
>> rename it to 'rebuild_snapcs'.
>>
>> URL: https://tracker.ceph.com/issues/44100
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>
>> Changed in V2:
>> - Thanks Zheng's feedback and switched to Zheng's patch.
>> - Rename invalidate to rebuild_snapcs.
>>
>>
>>
>>   fs/ceph/snap.c | 28 +++++++++++++++++++---------
>>   1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
>> index dbf34f212596..6d55b8ba79d8 100644
>> --- a/fs/ceph/snap.c
>> +++ b/fs/ceph/snap.c
>> @@ -735,7 +735,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>   	__le64 *prior_parent_snaps;        /* encoded */
>>   	struct ceph_snap_realm *realm = NULL;
>>   	struct ceph_snap_realm *first_realm = NULL;
>> -	int invalidate = 0;
>> +	struct ceph_snap_realm *realm_to_rebuild = NULL;
>> +	int rebuild_snapcs;
>>   	int err = -ENOMEM;
>>   	LIST_HEAD(dirty_realms);
>>
>> @@ -743,6 +744,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>
>>   	dout("update_snap_trace deletion=%d\n", deletion);
>>   more:
>> +	rebuild_snapcs = 0;
>>   	ceph_decode_need(&p, e, sizeof(*ri), bad);
>>   	ri = p;
>>   	p += sizeof(*ri);
>> @@ -766,7 +768,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>   	err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent));
>>   	if (err < 0)
>>   		goto fail;
>> -	invalidate += err;
>> +	rebuild_snapcs += err;
>>
>>   	if (le64_to_cpu(ri->seq) > realm->seq) {
>>   		dout("update_snap_trace updating %llx %p %lld -> %lld\n",
>> @@ -791,22 +793,30 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>   		if (realm->seq > mdsc->last_snap_seq)
>>   			mdsc->last_snap_seq = realm->seq;
>>
>> -		invalidate = 1;
>> +		rebuild_snapcs = 1;
>>   	} else if (!realm->cached_context) {
>>   		dout("update_snap_trace %llx %p seq %lld new\n",
>>   		     realm->ino, realm, realm->seq);
>> -		invalidate = 1;
>> +		rebuild_snapcs = 1;
>>   	} else {
>>   		dout("update_snap_trace %llx %p seq %lld unchanged\n",
>>   		     realm->ino, realm, realm->seq);
>>   	}
>>
>> -	dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino,
>> -	     realm, invalidate, p, e);
>> +	dout("done with %llx %p, rebuild_snapcs=%d, %p %p\n", realm->ino,
>> +	     realm, rebuild_snapcs, p, e);
>>
>> -	/* invalidate when we reach the _end_ (root) of the trace */
>> -	if (invalidate && p >= e)
>> -		rebuild_snap_realms(realm, &dirty_realms);
>> +	/*
>> +	 * this will always track the uppest parent realm from which
>> +	 * we need to rebuild the snapshot contexts _downward_ in
>> +	 * hierarchy.
>> +	 */
>> +	if (rebuild_snapcs)
>> +		realm_to_rebuild = realm;
>> +
>> +	/* rebuild_snapcs when we reach the _end_ (root) of the trace */
>> +	if (rebuild_snapcs && p >= e)
>
> s/rebuild_snapcs/realm_to_rebuild/
>
> This will fix the bug Luís Henriques reported.
>
> I have sent the V3 to fix it. Thanks.

Awesome, thanks Xiubo.  I'll give it a try today.

Cheers,
-- 
Luís

>
> - Xiubo
>
>
>> +		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>>
>>   	if (!first_realm)
>>   		first_realm = realm;
>
