From: Al Viro <viro@zeniv.linux.org.uk>
To: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Cc: "idryomov@gmail.com" <idryomov@gmail.com>,
Alex Markuze <amarkuze@redhat.com>,
"slava@dubeyko.com" <slava@dubeyko.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Patrick Donnelly <pdonnell@redhat.com>
Subject: Re: [PATCH] ceph: is_root_ceph_dentry() cleanup
Date: Tue, 11 Feb 2025 21:40:02 +0000 [thread overview]
Message-ID: <20250211214002.GJ1977892@ZenIV> (raw)
In-Reply-To: <36afe0ca1c80a97962858b81619501e5a5483fe1.camel@ibm.com>
On Tue, Feb 11, 2025 at 07:32:16PM +0000, Viacheslav Dubeyko wrote:
> > The really unpleasant question is whether ceph_handle_notrace_create() could
> > end up feeding an already-positive dentry to direct call of ceph_lookup()...
>
> We have ceph_handle_notrace_create() call in several methods:
> (1) ceph_mknod()
> (2) ceph_symlink()
> (3) ceph_mkdir()
> (4) ceph_atomic_open()
>
> Every time we create object at first and, then, we call
> ceph_handle_notrace_create() if creation was successful. We have such pattern:
>
> req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS);
>
> <skipped>
>
> err = ceph_mdsc_do_request(mdsc, dir, req);
> if (!err && !req->r_reply_info.head->is_dentry)
> err = ceph_handle_notrace_create(dir, dentry);
>
> And ceph_lookup() has such logic:
>
> if (d_really_is_negative(dentry)) {
> <execute logic>
> if (-ENOENT)
> return NULL;
> }
>
> req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
>
> <skipped>
>
> err = ceph_mdsc_do_request(mdsc, NULL, req);
>
> So, we have two different type of requests here:
> (1) ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS)
> (2) ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS)
>
> The first request creates an object on MDS side and second one checks that this
> object exists on MDS side by lookup. I assume that first request simply reports
> success or failure of object creation. And only second one can extract metadata
> from MDS side.
If only... The first request may return that metadata, in which case we'll get
splice_dentry() called by ceph_fill_trace() when reply arrives. Note that
calls of ceph_handle_notrace_create() are conditional.
Intent is as you've described and normally that's how it works, but that
assumes sane reply. Which is not a good assumption to make, when it comes
to memory corruption on client...
I _think_ the conditions are sufficient. There are 4 callers of
ceph_handle_notrace_create(). All of them require is_dentry in reply to be false;
the one in ceph_mkdir() requires both is_dentry and is_target to be false.
ceph_fill_trace() does not call splice_dentry() when both is_dentry and is_target
are false, so we are only interested in the mknod/symlink/atomic_open callers.
Past the handling of !is_dentry && !is_target case ceph_fill_trace() has
a large if (is_dentry); not a concern for us. Next comes if (is_target) where
we deal with ceph_fill_inode(); no calls of splice_dentry() in it.
After that we have
if (is_dentry && ....) {
...
splice_dentry() call possible here
...
} else if ((req->r_op == CEPH_MDS_OP_LOOKUPSNAP ||
req->r_op == CEPH_MDS_OP_MKSNAP) &&
test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
...
splice_dentry() call possible here
...
} else if (is_dentry && ...) {
...
}
Since we only care about !is_dentry case, that leaves the middle
part. LOOKUPSNAP can come from ceph_lookup() or ceph_d_revalidate(),
neither of which call ceph_handle_notrace_create(). MKSNAP comes
only from ceph_mkdir(), which _does_ call ceph_handle_notrace_create(),
but only in case !is_dentry && !is_target, so that call of splice_dentry()
is not going to happen there.
*IF* the analysis above is correct, we can rely upon the ceph_lookup()
always getting a negative dentry and that if (d_really_is_negative(dentry))
is always taken. But I could be missing something subtle in there ;-/
prev parent reply other threads:[~2025-02-11 21:40 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-28 1:10 [PATCH] ceph: is_root_ceph_dentry() cleanup Viacheslav Dubeyko
2025-01-28 3:07 ` Al Viro
2025-01-28 23:27 ` Viacheslav Dubeyko
2025-01-29 1:12 ` Al Viro
2025-02-11 0:08 ` Viacheslav Dubeyko
2025-02-11 0:15 ` Al Viro
2025-02-11 18:01 ` Viacheslav Dubeyko
2025-02-11 19:01 ` Al Viro
2025-02-11 19:32 ` Viacheslav Dubeyko
2025-02-11 21:40 ` Al Viro [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250211214002.GJ1977892@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=Slava.Dubeyko@ibm.com \
--cc=amarkuze@redhat.com \
--cc=ceph-devel@vger.kernel.org \
--cc=idryomov@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=pdonnell@redhat.com \
--cc=slava@dubeyko.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.