From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by smtp.lore.kernel.org (Postfix) with ESMTP id 163BAEB362F for ; Mon, 2 Mar 2026 19:41:48 +0000 (UTC) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 3DCCB40A76; Mon, 2 Mar 2026 20:41:47 +0100 (CET) Received: from mail-dl1-f53.google.com (mail-dl1-f53.google.com [74.125.82.53]) by mails.dpdk.org (Postfix) with ESMTP id 73F0740A76 for ; Mon, 2 Mar 2026 20:41:45 +0100 (CET) Received: by mail-dl1-f53.google.com with SMTP id a92af1059eb24-124a635476fso5659754c88.0 for ; Mon, 02 Mar 2026 11:41:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=networkplumber-org.20230601.gappssmtp.com; s=20230601; t=1772480504; x=1773085304; darn=dpdk.org; h=content-transfer-encoding:mime-version:message-id:subject:cc:to :from:date:from:to:cc:subject:date:message-id:reply-to; bh=2V/zgx+4tg5bXSbfhfhDyna+FpdxWDSlzZJBmh0hIVA=; b=jqE0ouJ0uoHiyKbSPMDVUIDPN00B8xVnlaRfPUB//sJKXh7yX7egYtrqmkwq/fQpMm ckDT5Yqxs0/olKQ67y0dtlj0SVFpfR5m32M8u1nbKqOxhgM4HFLdi2ix0nwiQEhvYyPH BoCL9iblMh6d7qXLmkVvX7JHacHa9Kfk8sw5uhKmBe2w4zEKMTcorG8X9QDc0EYdUFaJ N5UW084rqqrW8RgMS3KMy7QxX0b6TmDUsAr573e0J2ZCxNGOa7CnW1m3OR6j2lj2ccrc NBoFwY8TkjSn8UuWCtXUD0iF/XSgxwAtYqiVYh96XAHu90rSxqGbMj0Tm7wpKZHM+G1e lkdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772480504; x=1773085304; h=content-transfer-encoding:mime-version:message-id:subject:cc:to :from:date:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=2V/zgx+4tg5bXSbfhfhDyna+FpdxWDSlzZJBmh0hIVA=; b=FqdCGyD68xQxdAgm6jFF161ce0FfP1saF75nRgQLq6/ZpMxghCayhHm7kRlsJ0QRst 98Mtql8Ar0PedORi/XsIC2Wzq/f3v0cOS0IVVXLTBP3kmbpoHKay+j1iD2jcr2GdsUjx aJJ9iI9rj8arD39E5lgCmDhdGWRPLKhAF/1b3q7WGIA8Mh2z0SMM9iD4Jk/OrDY1CWNz 0BxrVw1OBx7B5n+dY0Weu42kevjYyKAmV67ZJLrf23seYN5ebnOrybZ2kw8wF6LOOLQI fRi7sgW5fXBpfhsmf0MetgtgiR+OhpVNMnDC6b31WCFX7tfV0b5hwIYHaS7bOUKqHVi9 W5eg== X-Gm-Message-State: AOJu0YxMs4rlnVcsFFnLOd5o8MnMk48eb0Oh5kFyiQENBSVJXGFRl64o 4Dk8s0vN3LZyQedJVVHUpFaL2abBt706e0fQnVzMt+RcyqjVqLcb1Vwh2sAkd2vqFKo= X-Gm-Gg: ATEYQzxSvieWMka+hBTQ4fmemq2Hf5ez5OaWTsuAVsXZLV3eirN98yjMnPUyMg/kwuS iyNsivibuekkLBZCZXA7nDvff006hwl9w+gTFHbhXLFGiYaa4IVvQU5n87kvUzRq4jI1aY2e05a GAeHYm2umNUyxIVGL/uTCDCT87Q7pZAD/ChDpeXBukLCoFNwspP445szN3F81vxuFflAiGJPyzf NgRU3e548JJv7CTM/sgdt8rMF69vXE0hKa5CEeTVg2swUflecz+j5EDlj3JjJ780Nm3LvD/fOyD XQ/XS+FsRY94l2Txt4ghX4vsuKGzrt6CBJTzXiEMOVrQjOYridAUX/0iU4gtJaeABUxaE/AEtmk JcuEunuQuyEjcEYvWqDsekd3mqfaW39WdRfSYg/CA1807im4E4n8KtjInP/C2OefxS+RYQlyXY0 vwMkcCZfjHUmmOGrGZzRXc5M2+DbNP++7VXoB0B3yd8iMrfLXLeSMhP5alS2v6fqnq X-Received: by 2002:a05:693c:2c13:b0:2ba:9835:1113 with SMTP id 5a478bee46e88-2bde1eb3a43mr5200200eec.36.1772480504271; Mon, 02 Mar 2026 11:41:44 -0800 (PST) Received: from phoenix.local (204-195-96-226.wavecable.com. [204.195.96.226]) by smtp.gmail.com with ESMTPSA id 5a478bee46e88-2bdd1f48c77sm12953033eec.26.2026.03.02.11.41.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Mar 2026 11:41:44 -0800 (PST) Date: Mon, 2 Mar 2026 11:41:41 -0800 From: Stephen Hemminger To: Elena Agostini Cc: dev@dpdk.org Subject: Bug analysis of gpudev library Message-ID: <20260302114141.29d25e19@phoenix.local> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Since the gpudev library is untested in current CI and there are no unit te= sts; sent AI off bug hunting into this library. It found a lot there. Is anyone outside of Nvidia proprietary even using it? Here's a thorough bug analysis of the DPDK gpudev library: --- ## 1. Memory Leak in `rte_gpu_comm_create_list` This is the most serious issue. The function has multiple early-return erro= r paths that leak previously allocated resources. **Example path:** If the `rte_gpu_mem_register` for `pkt_list` fails partwa= y through the loop, all previously allocated `pkt_list` entries, `mbufs` en= tries, the `status_h`/`status_d` allocations, and `comm_list` itself are le= aked =E2=80=94 none are freed. ```c for (idx_l =3D 0; idx_l < num_comm_items; idx_l++) { comm_list[idx_l].pkt_list =3D rte_zmalloc(NULL, ...); if (comm_list[idx_l].pkt_list =3D=3D NULL) { rte_errno =3D ENOMEM; return NULL; // leaks everything allocated so far } ret =3D rte_gpu_mem_register(dev_id, ..., comm_list[idx_l].pkt_list); if (ret < 0) { rte_errno =3D ENOMEM; return NULL; // leaks pkt_list just allocated + all prior items } // same pattern for mbufs allocation... } ``` The same problem exists for the earlier allocations of `comm_list` itself, = `status_d`, and `status_h`. Every `return NULL` after the first successful = `rte_zmalloc` is a leak. The fix would be a `goto cleanup` pattern that unw= inds in reverse order. --- ## 2. Missing Status Initialization for Index 0 in `rte_gpu_comm_create_lis= t` The loop that sets status only runs `rte_gpu_comm_set_status` when `idx_l >= 0`: ```c if (idx_l > 0) { comm_list[idx_l].status_h =3D &(comm_list[0].status_h[idx_l]); comm_list[idx_l].status_d =3D &(comm_list[0].status_d[idx_l]); ret =3D rte_gpu_comm_set_status(&comm_list[idx_l], RTE_GPU_COMM_LIST_FR= EE); ... } ``` Element 0 never has `rte_gpu_comm_set_status` called on it. It relies on `r= te_zmalloc` zeroing the memory, which happens to match `RTE_GPU_COMM_LIST_F= REE =3D=3D 0`. However, when `status_d` comes from `rte_gpu_mem_alloc` (the= GPU memory path), there's no guarantee the GPU driver zeros the memory. If= it doesn't, `comm_list[0]` starts with an undefined status. This is fragil= e at best and a latent bug if a driver doesn't zero-fill. --- ## 3. Stale `ret` Check After `rte_gpu_mem_alloc` ```c comm_list[0].status_d =3D rte_gpu_mem_alloc(dev_id, ...); if (ret < 0) { // <-- checks old 'ret' from rte_gpu_mem_register above rte_errno =3D ENOMEM; return NULL; } ``` `rte_gpu_mem_alloc` returns a pointer, not an int. The variable `ret` still= holds the return value from the previous `rte_gpu_mem_register` call. The = check should be `if (comm_list[0].status_d =3D=3D NULL)`. As written, a fai= led GPU allocation is silently ignored and the code proceeds with a NULL `s= tatus_d`, which will crash when dereferenced later. --- ## 4. Bug in `rte_gpu_close` =E2=80=94 Uninitialized `firsterr` ```c int firsterr, binerr; int *lasterr =3D &firsterr; ``` If `dev->ops.dev_close` is NULL, the code skips to `*lasterr =3D rte_gpu_re= lease(dev)`, which writes to `firsterr`. That's fine. But if `dev_close` is= non-NULL and succeeds (`*lasterr =3D 0`), then `lasterr` is **not** reassi= gned to `&binerr`, so the subsequent `rte_gpu_release` result overwrites `f= irsterr`. This means the close error is lost. And if `dev_close` fails, `la= sterr` is moved to `&binerr`, and `rte_gpu_release`'s result goes into `bin= err` =E2=80=94 but then `firsterr` is returned, discarding the release erro= r. The logic works for preserving the *first* error, but the variable `firs= terr` can be used uninitialized if there's a code path I'm missing (though = in practice the current paths do assign before reading). More practically: `rte_errno =3D -firsterr` at the end assumes `firsterr` i= s negative on error, but `GPU_DRV_RET` returns negative values while `rte_g= pu_release` also returns negative. So `rte_errno =3D -firsterr` would set e= rrno to a positive value. This is correct but the double-pointer indirectio= n is needlessly fragile and hard to audit. --- ## 5. `rte_gpu_comm_cleanup_list` Doesn't Free mbufs The function zeroes out the packet list entries and sets `mbufs[idx] =3D NU= LL`, but never calls `rte_pktmbuf_free()` on the mbufs. This means whoever = populated the list via `rte_gpu_comm_populate_list_pkts` transferred mbuf o= wnership to the comm list, but cleanup just drops the references. The doc c= omment says "mbufs are returned to the pool" but the code doesn't do that: ```c for (idx =3D 0; idx < RTE_GPU_COMM_LIST_PKTS_MAX; idx++) { if (comm_list_item->pkt_list[idx].addr =3D=3D 0) break; comm_list_item->pkt_list[idx].addr =3D 0; comm_list_item->pkt_list[idx].size =3D 0; comm_list_item->mbufs[idx] =3D NULL; // mbuf pointer dropped without f= ree } ``` If the caller is expected to free mbufs before calling cleanup, the documen= tation is misleading. --- ## 6. `rte_gpu_comm_destroy_list` Doesn't Free Status in All Paths In the "CPU map not supported" fallback path during creation, `status_d` is= set equal to `status_h`. During destroy: ```c ret =3D rte_gpu_mem_cpu_unmap(dev_id, comm_list[0].status_d); if (ret =3D=3D 0) { rte_gpu_mem_free(dev_id, comm_list[0].status_d); } else { rte_gpu_mem_unregister(dev_id, comm_list[0].status_h); rte_free(comm_list[0].status_h); } ``` This relies on `rte_gpu_mem_cpu_unmap` failing to distinguish the two alloc= ation paths. That's an implicit protocol rather than an explicit flag, whic= h is brittle =E2=80=94 if a driver's unmap succeeds on a pointer that was n= ever mapped (returning 0), the code takes the wrong path and calls `rte_gpu= _mem_free` on CPU-allocated memory. --- ## 7. Thread Safety of `gpu_count` `gpu_count` is a plain `int16_t` modified with `gpu_count++` and `gpu_count= --` in `rte_gpu_allocate`, `rte_gpu_attach`, and `rte_gpu_release` without = any locking. The header says "The API is not thread-safe" and "Device manag= ement must be done by a single thread," so this is by-design, but `rte_gpu_= count_avail()` can be called from any thread and reads `gpu_count` without = a barrier or atomic, which is a data race if management happens concurrentl= y from another thread. --- ## 8. Inconsistent `dev_id` Types `rte_gpu_comm_create_flag` and `rte_gpu_comm_create_list` take `uint16_t de= v_id`, while all other public APIs take `int16_t dev_id`. This means negati= ve error IDs (like `RTE_GPU_ID_NONE =3D -1`) can't be detected =E2=80=94 th= ey'd wrap to 65535 and silently pass validation or index out of bounds in `= gpu_get_by_id`. --- ## Summary by Severity **High:** Items 1 (memory leaks in create_list) and 3 (stale ret check =E2= =86=92 NULL deref) are real, exploitable bugs. Item 5 is a likely mbuf leak= depending on usage contract. **Medium:** Items 2 (status_d not zeroed), 6 (destroy path fragility), and = 8 (type mismatch) could cause issues under specific driver behaviors or edg= e cases. **Low:** Items 4 and 7 are code quality / maintainability concerns.