From: Al Viro <viro@zeniv.linux.org.uk>
To: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Cc: "idryomov@gmail.com" <idryomov@gmail.com>,
Alex Markuze <amarkuze@redhat.com>,
"slava@dubeyko.com" <slava@dubeyko.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Patrick Donnelly <pdonnell@redhat.com>
Subject: Re: [PATCH] ceph: is_root_ceph_dentry() cleanup
Date: Tue, 11 Feb 2025 21:40:02 +0000 [thread overview]
Message-ID: <20250211214002.GJ1977892@ZenIV> (raw)
In-Reply-To: <36afe0ca1c80a97962858b81619501e5a5483fe1.camel@ibm.com>
On Tue, Feb 11, 2025 at 07:32:16PM +0000, Viacheslav Dubeyko wrote:
> > The really unpleasant question is whether ceph_handle_notrace_create() could
> > end up feeding an already-positive dentry to direct call of ceph_lookup()...
>
> We have ceph_handle_notrace_create() call in several methods:
> (1) ceph_mknod()
> (2) ceph_symlink()
> (3) ceph_mkdir()
> (4) ceph_atomic_open()
>
> Every time we create object at first and, then, we call
> ceph_handle_notrace_create() if creation was successful. We have such pattern:
>
> req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS);
>
> <skipped>
>
> err = ceph_mdsc_do_request(mdsc, dir, req);
> if (!err && !req->r_reply_info.head->is_dentry)
> err = ceph_handle_notrace_create(dir, dentry);
>
> And ceph_lookup() has such logic:
>
> if (d_really_is_negative(dentry)) {
> <execute logic>
> if (-ENOENT)
> return NULL;
> }
>
> req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
>
> <skipped>
>
> err = ceph_mdsc_do_request(mdsc, NULL, req);
>
> So, we have two different type of requests here:
> (1) ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS)
> (2) ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS)
>
> The first request creates an object on MDS side and second one checks that this
> object exists on MDS side by lookup. I assume that first request simply reports
> success or failure of object creation. And only second one can extract metadata
> from MDS side.
If only... The first request may return that metadata, in which case we'll get
splice_dentry() called by ceph_fill_trace() when reply arrives. Note that
calls of ceph_handle_notrace_create() are conditional.
Intent is as you've described and normally that's how it works, but that
assumes sane reply. Which is not a good assumption to make, when it comes
to memory corruption on client...
I _think_ the conditions are sufficient. There are 4 callers of
ceph_handle_notrace_create(). All of them require is_dentry in reply to be false;
the one in ceph_mkdir() requires both is_dentry and is_target to be false.
ceph_fill_trace() does not call splice_dentry() when both is_dentry and is_target
are false, so we are only interested in the mknod/symlink/atomic_open callers.
Past the handling of !is_dentry && !is_target case ceph_fill_trace() has
a large if (is_dentry); not a concern for us. Next comes if (is_target) where
we deal with ceph_fill_inode(); no calls of splice_dentry() in it.
After that we have
if (is_dentry && ....) {
...
splice_dentry() call possible here
...
} else if ((req->r_op == CEPH_MDS_OP_LOOKUPSNAP ||
req->r_op == CEPH_MDS_OP_MKSNAP) &&
test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
...
splice_dentry() call possible here
...
} else if (is_dentry && ...) {
...
}
Since we only care about !is_dentry case, that leaves the middle
part. LOOKUPSNAP can come from ceph_lookup() or ceph_d_revalidate(),
neither of which call ceph_handle_notrace_create(). MKSNAP comes
only from ceph_mkdir(), which _does_ call ceph_handle_notrace_create(),
but only in case !is_dentry && !is_target, so that call of splice_dentry()
is not going to happen there.
*IF* the analysis above is correct, we can rely upon the ceph_lookup()
always getting a negative dentry and that if (d_really_is_negative(dentry))
is always taken. But I could be missing something subtle in there ;-/
prev parent reply other threads:[~2025-02-11 21:40 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-28 1:10 [PATCH] ceph: is_root_ceph_dentry() cleanup Viacheslav Dubeyko
2025-01-28 3:07 ` Al Viro
2025-01-28 23:27 ` Viacheslav Dubeyko
2025-01-29 1:12 ` Al Viro
2025-02-11 0:08 ` Viacheslav Dubeyko
2025-02-11 0:15 ` Al Viro
2025-02-11 18:01 ` Viacheslav Dubeyko
2025-02-11 19:01 ` Al Viro
2025-02-11 19:32 ` Viacheslav Dubeyko
2025-02-11 21:40 ` Al Viro [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250211214002.GJ1977892@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=Slava.Dubeyko@ibm.com \
--cc=amarkuze@redhat.com \
--cc=ceph-devel@vger.kernel.org \
--cc=idryomov@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=pdonnell@redhat.com \
--cc=slava@dubeyko.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox