* [PATCH] libceph: accept addrvecs with multiple entries of the same type
From: Kefu Chai @ 2026-04-23 10:09 UTC (permalink / raw)
To: ceph-devel
Cc: Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko, linux-kernel,
Kefu Chai
ceph_decode_entity_addrvec() rejects any addrvec containing more than
one entry that matches the requested msgr type (LEGACY or MSGR2),
logging "another match of type N in addrvec" and returning -EINVAL.
This breaks legitimate deployments where a daemon advertises multiple
addresses of the same type, most notably dual-stack (IPv4 + IPv6)
clusters and multi-subnet deployments where tooling picks one address
per listed public_network.
The monmap decoder fails and the client enters a reconnect loop:
  libceph: mon0 (1)10.10.10.15:6789 session established
  libceph: another match of type 1 in addrvec
  libceph: problem decoding monmap, -22
Match the userspace messenger, which since Nautilus picks the first
entry of the requested type and silently tolerates subsequent entries.
Link: https://tracker.ceph.com/issues/49581
Link: https://tracker.ceph.com/issues/64068
Link: https://bugzilla.proxmox.com/show_bug.cgi?id=7518
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
---
Tested by reproducing the Proxmox BZ 7518 scenario against a vstart
cluster whose mon addrvec was edited to contain two v1 + two v2 entries:
  ceph mon set-addrs a \
    "[v2:$ip1:$p2/0,v1:$ip1:$p1/0,v2:$ip2:$p2/0,v1:$ip2:$p1/0]"
A Debian VM booted with the patched kernel via 'qemu -kernel' then
ran 'mount -t ceph ...:$p1:/ /mnt -o name=admin'. Pre-patch kernels
fail at monmap decode with "another match of type 1 in addrvec"
(-EINVAL). Post-patch, decode succeeds and the mount proceeds to
the auth / MDS-discovery stages.
Also verified the decoder logic on the monmap.bin attached to BZ 7518
using a userspace port of ceph_decode_entity_addrvec(): the pre-patch
form returns -EINVAL on both msgr1 and msgr2 lookups; the post-patch
form returns 0 and picks the first matching entry.
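For illustration, a minimal userspace model of the two selection policies (not the port referred to above), assuming the addrvec has already been decoded into a plain array. The real ceph_decode_entity_addrvec() decodes from the wire and also special-cases empty addrvecs, which this sketch omits. struct model_addr, select_strict() and select_first() are hypothetical names; the type values 1 and 2 mirror CEPH_ENTITY_ADDR_TYPE_LEGACY and CEPH_ENTITY_ADDR_TYPE_MSGR2.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror CEPH_ENTITY_ADDR_TYPE_LEGACY / _MSGR2. */
enum model_addr_type {
	MODEL_TYPE_LEGACY = 1,
	MODEL_TYPE_MSGR2 = 2,
};

/* Hypothetical, already-decoded addrvec entry (no wire format). */
struct model_addr {
	enum model_addr_type type;
	const char *addr;	/* "ip:port", e.g. "10.0.0.5:6789" */
};

/*
 * Pre-patch policy: a second entry of the wanted type is an error.
 * Returns 0 on a unique match, -EINVAL on a duplicate match and
 * -ENOENT if nothing matches.
 */
int select_strict(const struct model_addr *vec, size_t n,
		  enum model_addr_type want, const struct model_addr **out)
{
	bool found = false;

	assert(vec != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (vec[i].type == want) {
			if (found)
				return -EINVAL;	/* "another match of type N" */
			*out = &vec[i];
			found = true;
		}
	}
	return found ? 0 : -ENOENT;
}

/* Post-patch policy: keep the first match, ignore later ones. */
int select_first(const struct model_addr *vec, size_t n,
		 enum model_addr_type want, const struct model_addr **out)
{
	bool found = false;

	assert(vec != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (vec[i].type == want && !found) {
			*out = &vec[i];
			found = true;
		}
	}
	return found ? 0 : -ENOENT;
}
```

Fed an addrvec shaped like the set-addrs example (v2, v1, v2, v1; the concrete addresses are up to the caller), select_strict() returns -EINVAL for both types, while select_first() returns 0 and points at the first entry of the requested type. The only behavioral difference is the `&& !found` guard, which is the same condition change the patch makes.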
 net/ceph/decode.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/net/ceph/decode.c b/net/ceph/decode.c
index bc109a1a4616..b17bc082b4fc 100644
--- a/net/ceph/decode.c
+++ b/net/ceph/decode.c
@@ -88,7 +88,8 @@ EXPORT_SYMBOL(ceph_decode_entity_addr);
 
 /*
  * Return addr of desired type (MSGR2 or LEGACY) or error.
- * Make sure there is only one match.
+ * If multiple entries of the desired type are present, use the
+ * first one.
  *
  * Assume encoding with MSG_ADDR2.
  */
@@ -120,13 +121,7 @@ int ceph_decode_entity_addrvec(void **p, void *end, bool msgr2,
 			return ret;
 
 		dout("%s i %d addr %s\n", __func__, i, ceph_pr_addr(&tmp_addr));
-		if (tmp_addr.type == my_type) {
-			if (found) {
-				pr_err("another match of type %d in addrvec\n",
-				       le32_to_cpu(my_type));
-				return -EINVAL;
-			}
-
+		if (tmp_addr.type == my_type && !found) {
 			memcpy(addr, &tmp_addr, sizeof(*addr));
 			found = true;
 		}
--
2.47.3
* Re: [PATCH] libceph: accept addrvecs with multiple entries of the same type
From: Ilya Dryomov @ 2026-04-23 10:35 UTC (permalink / raw)
To: Kefu Chai; +Cc: ceph-devel, Alex Markuze, Viacheslav Dubeyko, linux-kernel
On Thu, Apr 23, 2026 at 12:09 PM Kefu Chai <k.chai@proxmox.com> wrote:
>
> ceph_decode_entity_addrvec() rejects any addrvec containing more than
> one entry that matches the requested msgr type (LEGACY or MSGR2),
> logging "another match of type N in addrvec" and returning -EINVAL.
> This breaks legitimate deployments where a daemon advertises multiple
> addresses of the same type, most notably dual-stack (IPv4 + IPv6)
> clusters
Hi Kefu,
My understanding is that dual-stack isn't supported in general:
https://tracker.ceph.com/issues/65631. The respective references were
purged from the documentation with Radoslaw (offline?) ack.
> and multi-subnet deployments where tooling picks one address
> per listed public_network.
Can you elaborate on when such tooling kicks in, what exactly does it
do and the use case in general? It's not immediately obvious to me how
having two addresses of the same type/stack and simply ignoring the
second one is better than insisting on having just a single address.
> Match the userspace messenger, which since Nautilus picks the first
> entry of the requested type and silently tolerates subsequent entries.
Do you have a reference to a specific commit? I'm wondering if it
isn't on that "merged more or less accidentally" list.
Thanks,
Ilya
* Re: [PATCH] libceph: accept addrvecs with multiple entries of the same type
From: Kefu Chai @ 2026-04-23 11:44 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: ceph-devel, Alex Markuze, Viacheslav Dubeyko, linux-kernel
Ilya Dryomov <idryomov@gmail.com> writes:
> On Thu, Apr 23, 2026 at 12:09 PM Kefu Chai <k.chai@proxmox.com> wrote:
>>
>> ceph_decode_entity_addrvec() rejects any addrvec containing more than
>> one entry that matches the requested msgr type (LEGACY or MSGR2),
>> logging "another match of type N in addrvec" and returning -EINVAL.
>> This breaks legitimate deployments where a daemon advertises multiple
>> addresses of the same type, most notably dual-stack (IPv4 + IPv6)
>> clusters
>
> Hi Kefu,
>
Hi Ilya,
>
> My understanding is that dual-stack isn't supported in general:
> https://tracker.ceph.com/issues/65631. The respective references were
> purged from the documentation with Radoslaw (offline?) ack.
>
Yeah, you are right. I was overreaching. Dual-stack and
heterogeneous-subnet clients are not served by multi-entry addrvecs, and
the patch does not change that.
>
>> and multi-subnet deployments where tooling picks one address
>> per listed public_network.
>
> Can you elaborate on when such tooling kicks in, what exactly does it
> do and the use case in general? It's not immediately obvious to me how
> having two addresses of the same type/stack and simply ignoring the
> second one is better than insisting on having just a single address.
>
Sure. The narrow case that remains is compatibility. Admin tooling built
around public_addrv and ceph mon set-addrs produces addrvecs with more
than one entry of the same type on the back of that behavior, and the
kernel's strict guard rejects the whole monmap. The handshake contains()
check is the one concrete reason the extra entries need to be listed in
the addrvec rather than dropped at advertise time.
>
>> Match the userspace messenger, which since Nautilus picks the first
>> entry of the requested type and silently tolerates subsequent entries.
>
> Do you have a reference to a specific commit? I'm wondering if it
> isn't on that "merged more or less accidentally" list.
>
The pick-first selector in AsyncMessenger::create_connect() landed in
Sage's commit d1a783a5f733, and Xie Xingguo's commit 50d8c8a3cce3 fixed
the loop to actually honor the "pick whichever is listed first" comment.
Both shipped in Nautilus.
Would you be willing to take this as a compatibility fix, with the
commit message and the comment in ceph_decode_entity_addrvec() rewritten
to state exactly that and nothing more? If you would rather keep the
strict check and handle this on the tooling side instead, I am happy to
withdraw the patch. Either way, thanks for the review.
Thanks,
Kefu
> Thanks,
>
> Ilya
* Re: [PATCH] libceph: accept addrvecs with multiple entries of the same type
From: Ilya Dryomov @ 2026-04-24 18:54 UTC (permalink / raw)
To: Kefu Chai; +Cc: ceph-devel, Alex Markuze, Viacheslav Dubeyko, linux-kernel
On Thu, Apr 23, 2026 at 1:44 PM Kefu Chai <k.chai@proxmox.com> wrote:
>
> Ilya Dryomov <idryomov@gmail.com> writes:
>
> > On Thu, Apr 23, 2026 at 12:09 PM Kefu Chai <k.chai@proxmox.com> wrote:
> >>
> >> ceph_decode_entity_addrvec() rejects any addrvec containing more than
> >> one entry that matches the requested msgr type (LEGACY or MSGR2),
> >> logging "another match of type N in addrvec" and returning -EINVAL.
> >> This breaks legitimate deployments where a daemon advertises multiple
> >> addresses of the same type, most notably dual-stack (IPv4 + IPv6)
> >> clusters
> >
> > Hi Kefu,
> >
>
> Hi Ilya,
>
> >
> > My understanding is that dual-stack isn't supported in general:
> > https://tracker.ceph.com/issues/65631. The respective references were
> > purged from the documentation with Radoslaw (offline?) ack.
> >
>
> Yeah, you are right. I was overreaching. Dual-stack and
> heterogeneous-subnet clients are not served by multi-entry addrvecs, and
> the patch does not change that.
>
> >
> >> and multi-subnet deployments where tooling picks one address
> >> per listed public_network.
> >
> > Can you elaborate on when such tooling kicks in, what exactly does it
> > do and the use case in general? It's not immediately obvious to me how
> > having two addresses of the same type/stack and simply ignoring the
> > second one is better than insisting on having just a single address.
> >
>
> Sure. The narrow case that remains is compatibility. Admin tooling built
> around public_addrv and ceph mon set-addrs produces addrvecs with more
> than one entry of the same type on the back of that behavior, and the
Hi Kefu,
Sorry for being dense, but can you expand on _why_ the tooling in
question is doing that? With the dual-stack being explicitly not
supported and the userspace messenger unconditionally picking the
first address, what are the use cases for adding additional "dead"
addresses there?
I'm asking because Xie Xingguo's commit [1] mentions dual-stack
(already covered above) and in the ticket that I assume got you
involved [2] it seems like the user was able to implement the setup
they wanted without resorting to adding those "dead" addresses after
all.
> kernel's strict guard rejects the whole monmap. The handshake contains()
> check is the one concrete reason the extra entries need to be listed in
> the addrvec rather than dropped at advertise time.
Can you point me at this check?
>
> >
> >> Match the userspace messenger, which since Nautilus picks the first
> >> entry of the requested type and silently tolerates subsequent entries.
> >
> > Do you have a reference to a specific commit? I'm wondering if it
> > isn't on that "merged more or less accidentally" list.
> >
>
> The pick-first selector in AsyncMessenger::create_connect() landed in
> Sage's commit d1a783a5f733, and Xie Xingguo's commit 50d8c8a3cce3 fixed
> the loop to actually honor the "pick whichever is listed first" comment.
> Both shipped in Nautilus.
>
> Would you be willing to take this as a compatibility fix, with the
> commit message and the comment in ceph_decode_entity_addrvec() rewritten
> to state exactly that and nothing more? If you would rather keep the
> strict check and handle this on the tooling side instead, I am happy to
> withdraw the patch. Either way, thanks for the review.
I'm all for reducing the number of inconsistencies (whether intentional
or unintentional), but I would like to make sure I understand the
background first. I'm worried that by accepting multiple addresses we
just confuse users into thinking that dual-stack is supported and then
they go on attempting to implement "a second subnet", etc. AFAIK those
things aren't tested upstream at all.
[1] https://github.com/ceph/ceph/commit/50d8c8a3cce3bfbfc9be5acfa60bda165d59e2bc
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=7518
Thanks,
Ilya
* Re: [PATCH] libceph: accept addrvecs with multiple entries of the same type
From: Kefu Chai @ 2026-04-25 12:56 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: ceph-devel, Alex Markuze, Viacheslav Dubeyko, linux-kernel
Ilya Dryomov <idryomov@gmail.com> writes:
> On Thu, Apr 23, 2026 at 1:44 PM Kefu Chai <k.chai@proxmox.com> wrote:
>>
>> Ilya Dryomov <idryomov@gmail.com> writes:
>>
>> > On Thu, Apr 23, 2026 at 12:09 PM Kefu Chai <k.chai@proxmox.com> wrote:
>> >>
>> >> ceph_decode_entity_addrvec() rejects any addrvec containing more than
>> >> one entry that matches the requested msgr type (LEGACY or MSGR2),
>> >> logging "another match of type N in addrvec" and returning -EINVAL.
>> >> This breaks legitimate deployments where a daemon advertises multiple
>> >> addresses of the same type, most notably dual-stack (IPv4 + IPv6)
>> >> clusters
>> >
>> > Hi Kefu,
>> >
>>
>> Hi Ilya,
>>
>> >
>> > My understanding is that dual-stack isn't supported in general:
>> > https://tracker.ceph.com/issues/65631. The respective references were
>> > purged from the documentation with Radoslaw (offline?) ack.
>> >
>>
>> Yeah, you are right. I was overreaching. Dual-stack and
>> heterogeneous-subnet clients are not served by multi-entry addrvecs, and
>> the patch does not change that.
>>
>> >
>> >> and multi-subnet deployments where tooling picks one address
>> >> per listed public_network.
>> >
>> > Can you elaborate on when such tooling kicks in, what exactly does it
>> > do and the use case in general? It's not immediately obvious to me how
>> > having two addresses of the same type/stack and simply ignoring the
>> > second one is better than insisting on having just a single address.
>> >
>>
>> Sure. The narrow case that remains is compatibility. Admin tooling built
>> around public_addrv and ceph mon set-addrs produces addrvecs with more
>> than one entry of the same type on the back of that behavior, and the
>
> Hi Kefu,
>
> Sorry for being dense, but can you expand on _why_ the tooling in
> question is doing that? With the dual-stack being explicitly not
> supported and the userspace messenger unconditionally picking the
> first address, what are the use cases for adding additional "dead"
> addresses there?
>
Hi Ilya,
Not dense at all. I went back and walked the history. It turns out there
is no functional use case left, only a history of admin tooling producing
this shape.
The tooling is `pveceph mon create`, where pveceph is the Proxmox VE CLI
used to provision Ceph monitors on a PVE node. In 2021, we added support
for a `public_network` setting that lists more than one CIDR. When the
admin configures, for instance:
  public_network = 10.0.0.0/24,10.0.1.0/24
`pveceph mon create` picks one local IP per listed subnet and emits both
a v2 and a v1 entry for each, which produces addrvecs like the one in
the previously referenced bug report [2].
The admin's intent was multi-subnet reachability, but after auditing
Ceph's code I realized that nothing uses more than the first matching
entry per msgr type.
So multi-subnet reachability has always relied on plain IP routing
between the listed subnets, not on any addrvec-level fallback.
>
> I'm asking because Xie Xingguo's commit [1] mentions dual-stack
> (already covered above) and in the ticket that I assume got you
> involved [2] it seems like the user was able to implement the setup
> they wanted without resorting to adding those "dead" addresses after
> all.
>
Indeed. In our issue [2], the admin repaired their cluster by editing the
monmap by hand.
>
>> kernel's strict guard rejects the whole monmap. The handshake contains()
>> check is the one concrete reason the extra entries need to be listed in
>> the addrvec rather than dropped at advertise time.
>
> Can you point me at this check?
>
Sure. I pointed at ProtocolV2's server_ident.addrs().contains(target_addr),
but on re-reading, it turns out it only requires that the address the
client connected to (picked from the monmap) be in the server's
announced bind addrs. So there is no concrete reason for the extras to
be in the addrvec.
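A minimal sketch of the membership check as described here, under simplified assumptions: addresses are reduced to a msgr type plus an "ip:port" string, and fields such as the nonce that a real address comparison would also cover are ignored. struct model_addr and addrvec_contains() are hypothetical names for illustration, not ProtocolV2's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified address: msgr type plus "ip:port" text. */
struct model_addr {
	int type;		/* 1 = legacy (v1), 2 = msgr2 (v2) */
	const char *addr;	/* e.g. "10.0.0.5:3300" */
};

static bool model_addr_equal(const struct model_addr *a,
			     const struct model_addr *b)
{
	return a->type == b->type && strcmp(a->addr, b->addr) == 0;
}

/*
 * True if the address the client dialled appears anywhere in the
 * addrvec the server announces. Only membership is checked, so one
 * entry equal to the dialled address is enough for the check to pass.
 */
bool addrvec_contains(const struct model_addr *server_addrs, size_t n,
		      const struct model_addr *target)
{
	assert(server_addrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (model_addr_equal(&server_addrs[i], target))
			return true;
	}
	return false;
}
```

With the server announcing only a v2 and a v1 entry for one IP, a client that dialled that v2 address passes and a client that dialled an address the server does not announce fails; nothing in the check requires additional same-type entries to be listed.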
>>
>> >
>> >> Match the userspace messenger, which since Nautilus picks the first
>> >> entry of the requested type and silently tolerates subsequent entries.
>> >
>> > Do you have a reference to a specific commit? I'm wondering if it
>> > isn't on that "merged more or less accidentally" list.
>> >
>>
>> The pick-first selector in AsyncMessenger::create_connect() landed in
>> Sage's commit d1a783a5f733, and Xie Xingguo's commit 50d8c8a3cce3 fixed
>> the loop to actually honor the "pick whichever is listed first" comment.
>> Both shipped in Nautilus.
>>
>> Would you be willing to take this as a compatibility fix, with the
>> commit message and the comment in ceph_decode_entity_addrvec() rewritten
>> to state exactly that and nothing more? If you would rather keep the
>> strict check and handle this on the tooling side instead, I am happy to
>> withdraw the patch. Either way, thanks for the review.
>
> I'm all for reducing the number of inconsistencies (whether intentional
> or unintentional), but I would like to make sure I understand the
> background first. I'm worried that by accepting multiple addresses we
> just confuse users into thinking that dual-stack is supported and then
> they go on attempting to implement "a second subnet", etc. AFAIK those
> things aren't tested upstream at all.
>
Yeah, that worry is fair. The remaining argument for taking this patch is
purely backward compatibility: there are existing deployments whose
monmaps were produced by `pveceph`-like tools and hence contain multiple
addresses of the same type in their addrvecs.
If you would still rather keep the strict guard and have those clusters
repair their monmaps manually, I am happy to drop the patch.
If you would consider taking it as a pure compatibility tolerance, I
will respin with an updated commit message and comment in
ceph_decode_entity_addrvec() that give compatibility as the sole
rationale, not dual-stack or multi-subnet support.
Thanks,
Kefu
>
> [1] https://github.com/ceph/ceph/commit/50d8c8a3cce3bfbfc9be5acfa60bda165d59e2bc
> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=7518
>
> Thanks,
>
> Ilya
^ permalink raw reply [flat|nested] 5+ messages in thread