* [PATCH] zram: fix use-after-free in zram_writeback_endio
@ 2026-05-04 12:32 Richard Chang
2026-05-05 3:25 ` Sergey Senozhatsky
2026-05-05 16:37 ` Minchan Kim
0 siblings, 2 replies; 10+ messages in thread
From: Richard Chang @ 2026-05-04 12:32 UTC
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe, Andrew Morton
Cc: bgeffon, liumartin, linux-kernel, linux-block, linux-mm,
Richard Chang
A crash was observed in zram_writeback_endio due to a NULL pointer
dereference in wake_up. The root cause is a race condition between the
bio completion handler (zram_writeback_endio) and the writeback task.
In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
releasing wb_ctl->done_lock. This creates a race window where the
writeback task can see num_inflight become 0, return, and free wb_ctl
before zram_writeback_endio calls wake_up().
CPU 0 (zram_writeback_endio)                 CPU 1 (zram_complete_done_reqs)
============================                 ===============================
spin_lock(&wb_ctl->done_lock);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock(&wb_ctl->done_lock);
                                             while (atomic_read(&wb_ctl->num_inflight) > 0)
                                               spin_lock(&wb_ctl->done_lock);
                                               list_del(&req->entry);
                                               spin_unlock(&wb_ctl->done_lock);
                                               // num_inflight becomes 0
                                               atomic_dec(&wb_ctl->num_inflight);
                                             returns to writeback_store();
                                             // frees wb_ctl
                                             release_wb_ctl(wb_ctl);

// UAF crash!
wake_up(&wb_ctl->done_wait);
Fix this by moving wake_up() inside the done_lock critical section.
This ensures that zram_complete_done_reqs cannot consume the request
and decrement num_inflight until zram_writeback_endio has finished
calling wake_up() and released the lock.
Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
Signed-off-by: Richard Chang <richardycc@google.com>
---
drivers/block/zram/zram_drv.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aebc710f0d6a..a457fdf564f8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -966,9 +966,8 @@ static void zram_writeback_endio(struct bio *bio)
spin_lock_irqsave(&wb_ctl->done_lock, flags);
list_add(&req->entry, &wb_ctl->done_reqs);
- spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
-
wake_up(&wb_ctl->done_wait);
+ spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
}
static void zram_submit_wb_request(struct zram *zram,
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
2026-05-04 12:32 [PATCH] zram: fix use-after-free in zram_writeback_endio Richard Chang
@ 2026-05-05 3:25 ` Sergey Senozhatsky
2026-05-05 16:37 ` Minchan Kim
1 sibling, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2026-05-05 3:25 UTC
To: Andrew Morton, Richard Chang
Cc: Minchan Kim, Sergey Senozhatsky, Jens Axboe, bgeffon, liumartin,
linux-kernel, linux-block, linux-mm
On (26/05/04 12:32), Richard Chang wrote:
> A crash was observed in zram_writeback_endio due to a NULL pointer
> dereference in wake_up. The root cause is a race condition between the
> bio completion handler (zram_writeback_endio) and the writeback task.
>
> In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
> releasing wb_ctl->done_lock. This creates a race window where the
> writeback task can see num_inflight become 0, return, and free wb_ctl
> before zram_writeback_endio calls wake_up().
>
> CPU 0 (zram_writeback_endio)                 CPU 1 (zram_complete_done_reqs)
> ============================                 ===============================
> spin_lock(&wb_ctl->done_lock);
> list_add(&req->entry, &wb_ctl->done_reqs);
> spin_unlock(&wb_ctl->done_lock);
>                                              while (atomic_read(&wb_ctl->num_inflight) > 0)
>                                                spin_lock(&wb_ctl->done_lock);
>                                                list_del(&req->entry);
>                                                spin_unlock(&wb_ctl->done_lock);
>                                                // num_inflight becomes 0
>                                                atomic_dec(&wb_ctl->num_inflight);
>                                              returns to writeback_store();
>                                              // frees wb_ctl
>                                              release_wb_ctl(wb_ctl);
>
> // UAF crash!
> wake_up(&wb_ctl->done_wait);
>
> Fix this by moving wake_up() inside the done_lock critical section.
> This ensures that zram_complete_done_reqs cannot consume the request
> and decrement num_inflight until zram_writeback_endio has finished
> calling wake_up() and released the lock.
>
> Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
> Signed-off-by: Richard Chang <richardycc@google.com>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
2026-05-04 12:32 [PATCH] zram: fix use-after-free in zram_writeback_endio Richard Chang
2026-05-05 3:25 ` Sergey Senozhatsky
@ 2026-05-05 16:37 ` Minchan Kim
2026-05-07 9:40 ` Sergey Senozhatsky
1 sibling, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2026-05-05 16:37 UTC
To: Richard Chang
Cc: Sergey Senozhatsky, Jens Axboe, Andrew Morton, bgeffon, liumartin,
linux-kernel, linux-block, linux-mm
On Mon, May 04, 2026 at 12:32:30PM +0000, Richard Chang wrote:
> A crash was observed in zram_writeback_endio due to a NULL pointer
> dereference in wake_up. The root cause is a race condition between the
> bio completion handler (zram_writeback_endio) and the writeback task.
>
> In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
> releasing wb_ctl->done_lock. This creates a race window where the
> writeback task can see num_inflight become 0, return, and free wb_ctl
> before zram_writeback_endio calls wake_up().
>
> CPU 0 (zram_writeback_endio)                 CPU 1 (zram_complete_done_reqs)
> ============================                 ===============================
> spin_lock(&wb_ctl->done_lock);
> list_add(&req->entry, &wb_ctl->done_reqs);
> spin_unlock(&wb_ctl->done_lock);
>                                              while (atomic_read(&wb_ctl->num_inflight) > 0)
>                                                spin_lock(&wb_ctl->done_lock);
>                                                list_del(&req->entry);
>                                                spin_unlock(&wb_ctl->done_lock);
>                                                // num_inflight becomes 0
>                                                atomic_dec(&wb_ctl->num_inflight);
>                                              returns to writeback_store();
>                                              // frees wb_ctl
>                                              release_wb_ctl(wb_ctl);
>
> // UAF crash!
> wake_up(&wb_ctl->done_wait);
>
> Fix this by moving wake_up() inside the done_lock critical section.
> This ensures that zram_complete_done_reqs cannot consume the request
> and decrement num_inflight until zram_writeback_endio has finished
> calling wake_up() and released the lock.
>
> Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
> Signed-off-by: Richard Chang <richardycc@google.com>
> ---
> drivers/block/zram/zram_drv.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index aebc710f0d6a..a457fdf564f8 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -966,9 +966,8 @@ static void zram_writeback_endio(struct bio *bio)
>
> spin_lock_irqsave(&wb_ctl->done_lock, flags);
> list_add(&req->entry, &wb_ctl->done_reqs);
> - spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> -
> wake_up(&wb_ctl->done_wait);
> + spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> }
>
I agree this will fix the issue, but using a lock to extend the lifetime of
an object to avoid a UAF is not a good pattern. Object lifetime shared between
process and interrupt contexts should be managed explicitly using refcount.
Furthermore, keeping wake_up() outside the critical section minimizes
interrupt-disabled latency and avoids nesting spinlocks
(done_lock -> done_wait.lock), reducing the risk of future lockdep
issues, just in case.
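To be concrete about the nesting (a sketch of the endio flow with the
proposed fix applied, not the exact source):

	spin_lock_irqsave(&wb_ctl->done_lock, flags);	/* outer: done_lock */
	list_add(&req->entry, &wb_ctl->done_reqs);
	wake_up(&wb_ctl->done_wait);	/* inner: takes done_wait.lock internally */
	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);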
It will definitely add more overhead to the submission/completion paths
to deal with the refcount, but I think we should go that way even at
some runtime cost.
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
2026-05-05 16:37 ` Minchan Kim
@ 2026-05-07 9:40 ` Sergey Senozhatsky
2026-05-07 22:56 ` Minchan Kim
0 siblings, 1 reply; 10+ messages in thread
From: Sergey Senozhatsky @ 2026-05-07 9:40 UTC
To: Minchan Kim
Cc: Richard Chang, Sergey Senozhatsky, Jens Axboe, Andrew Morton,
bgeffon, liumartin, linux-kernel, linux-block, linux-mm
On (26/05/05 09:37), Minchan Kim wrote:
> > @@ -966,9 +966,8 @@ static void zram_writeback_endio(struct bio *bio)
> >
> > spin_lock_irqsave(&wb_ctl->done_lock, flags);
> > list_add(&req->entry, &wb_ctl->done_reqs);
> > - spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> > -
> > wake_up(&wb_ctl->done_wait);
> > + spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> > }
> >
>
> I agree this will fix the issue, but using a lock to extend the lifetime of
> an object to avoid a UAF is not a good pattern. Object lifetime shared between
> process and interrupt contexts should be managed explicitly using refcount.
->num_inflight is a ref-counter, basically. The problem is that
completion is a two-step process, and only one of those steps is
synchronized with the writeback context. I honestly don't want to have
two ref-counts: one for requests pending zram completion and one for
active endio contexts. Maybe we can repurpose num_inflight instead.
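To spell out the two steps (a simplified sketch of the current endio,
not the exact source):

	/* step 1: synchronized with the writeback task via done_lock */
	spin_lock_irqsave(&wb_ctl->done_lock, flags);
	list_add(&req->entry, &wb_ctl->done_reqs);
	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);

	/* step 2: not synchronized -- by now the writeback task may have
	 * consumed the request, dropped num_inflight to 0 and freed wb_ctl
	 */
	wake_up(&wb_ctl->done_wait);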
> Furthermore, keeping wake_up() outside the critical section minimizes
> interrupt-disabled latency
So I considered that, but isn't endio already called from IRQ context?
Just asking. We wake up only one waiter (the writeback task), so it's
not that bad CPU-cycles wise. Do you think it's really a concern?
wake_up() under the spin-lock solves the problem of an unsynchronized
two-stage endio process.
> and avoids nesting spinlocks (done_lock -> done_wait.lock), reducing
> the risk of future lockdep issues, just in case.
I considered lockdep as well but ruled it out as an impossible scenario:
the nesting here is strictly uni-directional, and we never call into
zram from the scheduler. Just saying.
> It will definitely add more overhead to the submission/completion paths
> to deal with the refcount, but I think we should go that way even at
> some runtime cost.
Dunno, something like below maybe?
---
drivers/block/zram/zram_drv.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ce2e1c79fc75..27fe50d666d7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -967,7 +967,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
static void zram_writeback_endio(struct bio *bio)
{
struct zram_wb_req *req = container_of(bio, struct zram_wb_req, bio);
- struct zram_wb_ctl *wb_ctl = bio->bi_private;
+ struct zram_wb_ctl *wb_ctl = READ_ONCE(bio->bi_private);
unsigned long flags;
spin_lock_irqsave(&wb_ctl->done_lock, flags);
@@ -975,6 +975,7 @@ static void zram_writeback_endio(struct bio *bio)
spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
wake_up(&wb_ctl->done_wait);
+ atomic_dec(&wb_ctl->num_inflight);
}
static void zram_submit_wb_request(struct zram *zram,
@@ -998,7 +999,7 @@ static int zram_complete_done_reqs(struct zram *zram,
unsigned long flags;
int ret = 0, err;
- while (atomic_read(&wb_ctl->num_inflight) > 0) {
+ for (;;) {
spin_lock_irqsave(&wb_ctl->done_lock, flags);
req = list_first_entry_or_null(&wb_ctl->done_reqs,
struct zram_wb_req, entry);
@@ -1006,7 +1007,6 @@ static int zram_complete_done_reqs(struct zram *zram,
list_del(&req->entry);
spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
- /* ->num_inflight > 0 doesn't mean we have done requests */
if (!req)
break;
@@ -1014,7 +1014,6 @@ static int zram_complete_done_reqs(struct zram *zram,
if (err)
ret = err;
- atomic_dec(&wb_ctl->num_inflight);
release_pp_slot(zram, req->pps);
req->pps = NULL;
@@ -1129,8 +1128,11 @@ static int zram_writeback_slots(struct zram *zram,
if (req)
release_wb_req(req);
- while (atomic_read(&wb_ctl->num_inflight) > 0) {
- wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
+ while (atomic_read(&wb_ctl->num_inflight) ||
+ !list_empty(&wb_ctl->done_reqs)) {
+ wait_event_timeout(wb_ctl->done_wait,
+ !list_empty(&wb_ctl->done_reqs),
+ HZ);
err = zram_complete_done_reqs(zram, wb_ctl);
if (err)
ret = err;
--
2.54.0.563.g4f69b47b94-goog
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
2026-05-07 9:40 ` Sergey Senozhatsky
@ 2026-05-07 22:56 ` Minchan Kim
2026-05-07 23:38 ` Minchan Kim
2026-05-08 2:40 ` Sergey Senozhatsky
0 siblings, 2 replies; 10+ messages in thread
From: Minchan Kim @ 2026-05-07 22:56 UTC
To: Sergey Senozhatsky
Cc: Richard Chang, Jens Axboe, Andrew Morton, bgeffon, liumartin,
linux-kernel, linux-block, linux-mm
On Thu, May 07, 2026 at 06:40:37PM +0900, Sergey Senozhatsky wrote:
> On (26/05/05 09:37), Minchan Kim wrote:
> > > @@ -966,9 +966,8 @@ static void zram_writeback_endio(struct bio *bio)
> > >
> > > spin_lock_irqsave(&wb_ctl->done_lock, flags);
> > > list_add(&req->entry, &wb_ctl->done_reqs);
> > > - spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> > > -
> > > wake_up(&wb_ctl->done_wait);
> > > + spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> > > }
> > >
> >
> > I agree this will fix the issue, but using a lock to extend the lifetime of
> > an object to avoid a UAF is not a good pattern. Object lifetime shared between
> > process and interrupt contexts should be managed explicitly using refcount.
>
> ->num_inflight is a ref-counter, basically. The problem is that
> completion is a two-step process, and only one of those steps is
> synchronized with the writeback context. I honestly don't want to have
> two ref-counts: one for requests pending zram completion and one for
> active endio contexts. Maybe we can repurpose num_inflight instead.
If it can make the code much clearer and simpler, I have no objection.
>
> > Furthermore, keeping wake_up() outside the critical section minimizes
> > interrupt-disabled latency
>
> So I considered that, but isn't endio already called from IRQ context?
> Just asking. We wake up only one waiter (the writeback task), so it's
> not that bad CPU-cycles wise. Do you think it's really a concern?
I don't think it will have any measurable impact; I was just pointing out
a theoretical one.
>
> wake_up() under the spin-lock solves the problem of an unsynchronized
> two-stage endio process.
>
> > and avoids nesting spinlocks (done_lock -> done_wait.lock), reducing
> > the risk of future lockdep issues, just in case.
>
> I considered lockdep as well but ruled it out as an impossible scenario:
> the nesting here is strictly uni-directional, and we never call into
> zram from the scheduler. Just saying.
Sure. I just prefer to avoid adding more lock dependencies without a strong
justification, to prevent potential locking issues in the future.
>
> > It will definitely add more overhead to the submission/completion paths
> > to deal with the refcount, but I think we should go that way even at
> > some runtime cost.
>
> Dunno, something like below maybe?
>
> ---
> drivers/block/zram/zram_drv.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index ce2e1c79fc75..27fe50d666d7 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -967,7 +967,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
> static void zram_writeback_endio(struct bio *bio)
> {
> struct zram_wb_req *req = container_of(bio, struct zram_wb_req, bio);
> - struct zram_wb_ctl *wb_ctl = bio->bi_private;
> + struct zram_wb_ctl *wb_ctl = READ_ONCE(bio->bi_private);
> unsigned long flags;
>
> spin_lock_irqsave(&wb_ctl->done_lock, flags);
> @@ -975,6 +975,7 @@ static void zram_writeback_endio(struct bio *bio)
> spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
>
> wake_up(&wb_ctl->done_wait);
> + atomic_dec(&wb_ctl->num_inflight);
> }
>
> static void zram_submit_wb_request(struct zram *zram,
> @@ -998,7 +999,7 @@ static int zram_complete_done_reqs(struct zram *zram,
> unsigned long flags;
> int ret = 0, err;
>
> - while (atomic_read(&wb_ctl->num_inflight) > 0) {
> + for (;;) {
> spin_lock_irqsave(&wb_ctl->done_lock, flags);
> req = list_first_entry_or_null(&wb_ctl->done_reqs,
> struct zram_wb_req, entry);
> @@ -1006,7 +1007,6 @@ static int zram_complete_done_reqs(struct zram *zram,
> list_del(&req->entry);
> spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
>
> - /* ->num_inflight > 0 doesn't mean we have done requests */
> if (!req)
> break;
>
> @@ -1014,7 +1014,6 @@ static int zram_complete_done_reqs(struct zram *zram,
> if (err)
> ret = err;
>
> - atomic_dec(&wb_ctl->num_inflight);
> release_pp_slot(zram, req->pps);
> req->pps = NULL;
>
> @@ -1129,8 +1128,11 @@ static int zram_writeback_slots(struct zram *zram,
> if (req)
> release_wb_req(req);
>
> - while (atomic_read(&wb_ctl->num_inflight) > 0) {
> - wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
> + while (atomic_read(&wb_ctl->num_inflight) ||
> + !list_empty(&wb_ctl->done_reqs)) {
> + wait_event_timeout(wb_ctl->done_wait,
> + !list_empty(&wb_ctl->done_reqs),
> + HZ);
> err = zram_complete_done_reqs(zram, wb_ctl);
> if (err)
> ret = err;
I understand why you used a timeout here, but I still don't think it's a good
idea since the user could wait for up to a second unnecessarily during the
race.
What I prefer is simple and explicit lifetime management for wb_ctl using
refcount. It directly addresses the core issue (UAF of wb_ctl) in a standard,
robust way without needing workarounds like timeouts. The runtime overhead
of kref will be negligible.
Something like this:
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a324ede6206d..28ab4a24e77f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -33,6 +33,7 @@
#include <linux/cpuhotplug.h>
#include <linux/part_stat.h>
#include <linux/kernel_read_file.h>
+#include <linux/kref.h>
#include "zram_drv.h"
@@ -504,6 +505,7 @@ struct zram_wb_ctl {
wait_queue_head_t done_wait;
spinlock_t done_lock;
atomic_t num_inflight;
+ struct kref kref;
};
struct zram_wb_req {
@@ -829,11 +831,8 @@ static void release_wb_req(struct zram_wb_req *req)
kfree(req);
}
-static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
+static void __release_wb_ctl(struct zram_wb_ctl *wb_ctl)
{
- if (!wb_ctl)
- return;
-
/* We should never have inflight requests at this point */
WARN_ON(atomic_read(&wb_ctl->num_inflight));
WARN_ON(!list_empty(&wb_ctl->done_reqs));
@@ -850,6 +849,18 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
kfree(wb_ctl);
}
+static void release_wb_ctl_kref(struct kref *kref)
+{
+ struct zram_wb_ctl *wb_ctl = container_of(kref, struct zram_wb_ctl, kref);
+
+ __release_wb_ctl(wb_ctl);
+}
+
+static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
+{
+ kref_put(&wb_ctl->kref, release_wb_ctl_kref);
+}
+
static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
{
struct zram_wb_ctl *wb_ctl;
@@ -864,6 +875,7 @@ static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
atomic_set(&wb_ctl->num_inflight, 0);
init_waitqueue_head(&wb_ctl->done_wait);
spin_lock_init(&wb_ctl->done_lock);
+ kref_init(&wb_ctl->kref);
for (i = 0; i < zram->wb_batch_size; i++) {
struct zram_wb_req *req;
@@ -985,6 +997,7 @@ static void zram_writeback_endio(struct bio *bio)
spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
wake_up(&wb_ctl->done_wait);
+ kref_put(&wb_ctl->kref, release_wb_ctl_kref);
}
static void zram_submit_wb_request(struct zram *zram,
@@ -996,6 +1009,7 @@ static void zram_submit_wb_request(struct zram *zram,
* so that we don't over-submit.
*/
zram_account_writeback_submit(zram);
+ kref_get(&wb_ctl->kref);
atomic_inc(&wb_ctl->num_inflight);
req->bio.bi_private = wb_ctl;
submit_bio(&req->bio);
@@ -1276,8 +1290,8 @@ static ssize_t writeback_store(struct device *dev,
wb_ctl = init_wb_ctl(zram);
if (!wb_ctl) {
- ret = -ENOMEM;
- goto out;
+ release_pp_ctl(zram, pp_ctl);
+ return -ENOMEM;
}
args = skip_spaces(buf);
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
2026-05-07 22:56 ` Minchan Kim
@ 2026-05-07 23:38 ` Minchan Kim
2026-05-08 2:40 ` Sergey Senozhatsky
1 sibling, 0 replies; 10+ messages in thread
From: Minchan Kim @ 2026-05-07 23:38 UTC
To: Sergey Senozhatsky
Cc: Richard Chang, Jens Axboe, Andrew Morton, bgeffon, liumartin,
linux-kernel, linux-block, linux-mm
On Thu, May 07, 2026 at 03:56:52PM -0700, Minchan Kim wrote:
> On Thu, May 07, 2026 at 06:40:37PM +0900, Sergey Senozhatsky wrote:
> > On (26/05/05 09:37), Minchan Kim wrote:
> > > > @@ -966,9 +966,8 @@ static void zram_writeback_endio(struct bio *bio)
> > > >
> > > > spin_lock_irqsave(&wb_ctl->done_lock, flags);
> > > > list_add(&req->entry, &wb_ctl->done_reqs);
> > > > - spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> > > > -
> > > > wake_up(&wb_ctl->done_wait);
> > > > + spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> > > > }
> > > >
> > >
> > > I agree this will fix the issue, but using a lock to extend the lifetime of
> > > an object to avoid a UAF is not a good pattern. Object lifetime shared between
> > > process and interrupt contexts should be managed explicitly using refcount.
> >
> > ->num_inflight is a ref-counter, basically. The problem is that
> > completion is a two-step process, and only one of those steps is
> > synchronized with the writeback context. I honestly don't want to have
> > two ref-counts: one for requests pending zram completion and one for
> > active endio contexts. Maybe we can repurpose num_inflight instead.
>
> If it can make the code much clearer and simpler, I have no objection.
>
> >
> > > Furthermore, keeping wake_up() outside the critical section minimizes
> > > interrupt-disabled latency
> >
> > So I considered that, but isn't endio already called from IRQ context?
> > Just asking. We wake up only one waiter (the writeback task), so it's
> > not that bad CPU-cycles wise. Do you think it's really a concern?
>
> I don't think it will have any measurable impact; I was just pointing out
> a theoretical one.
>
> >
> > wake_up() under the spin-lock solves the problem of an unsynchronized
> > two-stage endio process.
> >
> > > and avoids nesting spinlocks (done_lock -> done_wait.lock), reducing
> > > the risk of future lockdep issues, just in case.
> >
> > I considered lockdep as well but ruled it out as an impossible scenario:
> > the nesting here is strictly uni-directional, and we never call into
> > zram from the scheduler. Just saying.
>
> Sure. I just prefer to avoid adding more lock dependencies without a strong
> justification, to prevent potential locking issues in the future.
>
> >
> > > It will definitely add more overhead to the submission/completion paths
> > > to deal with the refcount, but I think we should go that way even at
> > > some runtime cost.
> >
> > Dunno, something like below maybe?
> >
> > ---
> > drivers/block/zram/zram_drv.c | 14 ++++++++------
> > 1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index ce2e1c79fc75..27fe50d666d7 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -967,7 +967,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
> > static void zram_writeback_endio(struct bio *bio)
> > {
> > struct zram_wb_req *req = container_of(bio, struct zram_wb_req, bio);
> > - struct zram_wb_ctl *wb_ctl = bio->bi_private;
> > + struct zram_wb_ctl *wb_ctl = READ_ONCE(bio->bi_private);
> > unsigned long flags;
> >
> > spin_lock_irqsave(&wb_ctl->done_lock, flags);
> > @@ -975,6 +975,7 @@ static void zram_writeback_endio(struct bio *bio)
> > spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> >
> > wake_up(&wb_ctl->done_wait);
> > + atomic_dec(&wb_ctl->num_inflight);
> > }
> >
> > static void zram_submit_wb_request(struct zram *zram,
> > @@ -998,7 +999,7 @@ static int zram_complete_done_reqs(struct zram *zram,
> > unsigned long flags;
> > int ret = 0, err;
> >
> > - while (atomic_read(&wb_ctl->num_inflight) > 0) {
> > + for (;;) {
> > spin_lock_irqsave(&wb_ctl->done_lock, flags);
> > req = list_first_entry_or_null(&wb_ctl->done_reqs,
> > struct zram_wb_req, entry);
> > @@ -1006,7 +1007,6 @@ static int zram_complete_done_reqs(struct zram *zram,
> > list_del(&req->entry);
> > spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> >
> > - /* ->num_inflight > 0 doesn't mean we have done requests */
> > if (!req)
> > break;
> >
> > @@ -1014,7 +1014,6 @@ static int zram_complete_done_reqs(struct zram *zram,
> > if (err)
> > ret = err;
> >
> > - atomic_dec(&wb_ctl->num_inflight);
> > release_pp_slot(zram, req->pps);
> > req->pps = NULL;
> >
> > @@ -1129,8 +1128,11 @@ static int zram_writeback_slots(struct zram *zram,
> > if (req)
> > release_wb_req(req);
> >
> > - while (atomic_read(&wb_ctl->num_inflight) > 0) {
> > - wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
> > + while (atomic_read(&wb_ctl->num_inflight) ||
> > + !list_empty(&wb_ctl->done_reqs)) {
> > + wait_event_timeout(wb_ctl->done_wait,
> > + !list_empty(&wb_ctl->done_reqs),
> > + HZ);
> > err = zram_complete_done_reqs(zram, wb_ctl);
> > if (err)
> > ret = err;
>
> I understand why you used a timeout here, but I still don't think it's a good
> idea since the user could wait for up to a second unnecessarily during the
> race.
>
> What I prefer is simple and explicit lifetime management for wb_ctl using
> refcount. It directly addresses the core issue (UAF of wb_ctl) in a standard,
> robust way without needing workarounds like timeouts. The runtime overhead
> of kref will be negligible.
>
The other standard way to deal with lifetime is RCU.
How about this?
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a324ede6206d..28ab4a24e77f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -33,6 +33,7 @@
#include <linux/cpuhotplug.h>
#include <linux/part_stat.h>
#include <linux/kernel_read_file.h>
+#include <linux/rcupdate.h>
#include "zram_drv.h"
@@ -504,6 +505,7 @@ struct zram_wb_ctl {
wait_queue_head_t done_wait;
spinlock_t done_lock;
atomic_t num_inflight;
+ struct rcu_head rcu;
};
struct zram_wb_req {
@@ -829,14 +831,8 @@ static void release_wb_req(struct zram_wb_req *req)
kfree(req);
}
static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
{
- if (!wb_ctl)
- return;
-
/* We should never have inflight requests at this point */
WARN_ON(atomic_read(&wb_ctl->num_inflight));
WARN_ON(!list_empty(&wb_ctl->done_reqs));
@@ -850,7 +849,7 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
release_wb_req(req);
}
- kfree(wb_ctl);
+ kfree_rcu(wb_ctl, rcu);
}
static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
@@ -985,6 +997,7 @@ static void zram_writeback_endio(struct bio *bio)
struct zram_wb_ctl *wb_ctl = bio->bi_private;
unsigned long flags;
+ rcu_read_lock();
spin_lock_irqsave(&wb_ctl->done_lock, flags);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
@@ -991,5 +1004,6 @@ static void zram_writeback_endio(struct bio *bio)
wake_up(&wb_ctl->done_wait);
+ rcu_read_unlock();
}
static void zram_submit_wb_request(struct zram *zram,
@@ -1276,8 +1290,8 @@ static ssize_t writeback_store(struct device *dev,
wb_ctl = init_wb_ctl(zram);
if (!wb_ctl) {
- ret = -ENOMEM;
- goto out;
+ release_pp_ctl(zram, pp_ctl);
+ return -ENOMEM;
}
args = skip_spaces(buf);
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
2026-05-07 22:56 ` Minchan Kim
2026-05-07 23:38 ` Minchan Kim
@ 2026-05-08 2:40 ` Sergey Senozhatsky
2026-05-08 8:49 ` [PATCH v2] " Richard Chang
1 sibling, 1 reply; 10+ messages in thread
From: Sergey Senozhatsky @ 2026-05-08 2:40 UTC
To: Minchan Kim
Cc: Sergey Senozhatsky, Richard Chang, Jens Axboe, Andrew Morton,
bgeffon, liumartin, linux-kernel, linux-block, linux-mm
On (26/05/07 15:56), Minchan Kim wrote:
> > - while (atomic_read(&wb_ctl->num_inflight) > 0) {
> > - wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
> > + while (atomic_read(&wb_ctl->num_inflight) ||
> > + !list_empty(&wb_ctl->done_reqs)) {
> > + wait_event_timeout(wb_ctl->done_wait,
> > + !list_empty(&wb_ctl->done_reqs),
> > + HZ);
> > err = zram_complete_done_reqs(zram, wb_ctl);
> > if (err)
> > ret = err;
>
> I understand why you used a timeout here, but I still don't think it's a good
> idea since the user could wait for up to a second unnecessarily during the
> race.
Well, sure, it doesn't have to be a full HZ; we only need to wait
for propagation of atomic_dec() from another CPU. That's very fast,
orders of magnitude faster than a full second. Just saying.
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a324ede6206d..28ab4a24e77f 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -33,6 +33,7 @@
> #include <linux/cpuhotplug.h>
> #include <linux/part_stat.h>
> #include <linux/kernel_read_file.h>
> +#include <linux/kref.h>
>
> #include "zram_drv.h"
>
> @@ -504,6 +505,7 @@ struct zram_wb_ctl {
> wait_queue_head_t done_wait;
> spinlock_t done_lock;
> atomic_t num_inflight;
> + struct kref kref;
> };
Yeah okay, it overlaps with ->num_inflight, but we can live with that.
Maybe we can get rid of ->num_inflight in future patches.
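If we ever do that, one possible shape (an untested sketch; the final
wake-up ordering would need care so the waiter cannot miss the last
kref_put()):

	/* submission: one reference per in-flight bio */
	kref_get(&wb_ctl->kref);
	submit_bio(&req->bio);

	/* endio: wake the writeback task, then drop the per-bio reference */
	wake_up(&wb_ctl->done_wait);
	kref_put(&wb_ctl->kref, release_wb_ctl_kref);

	/* the writeback task keeps kref_init()'s reference, so a
	 * kref_read() of 1 means no endio can still touch wb_ctl
	 */
	WARN_ON(kref_read(&wb_ctl->kref) != 1);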
[..]
> @@ -864,6 +875,7 @@ static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
> atomic_set(&wb_ctl->num_inflight, 0);
> init_waitqueue_head(&wb_ctl->done_wait);
> spin_lock_init(&wb_ctl->done_lock);
> + kref_init(&wb_ctl->kref);
>
> for (i = 0; i < zram->wb_batch_size; i++) {
> struct zram_wb_req *req;
> @@ -985,6 +997,7 @@ static void zram_writeback_endio(struct bio *bio)
> spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
>
> wake_up(&wb_ctl->done_wait);
> + kref_put(&wb_ctl->kref, release_wb_ctl_kref);
> }
>
>
> static void zram_submit_wb_request(struct zram *zram,
> @@ -996,6 +1009,7 @@ static void zram_submit_wb_request(struct zram *zram,
> * so that we don't over-submit.
> */
> zram_account_writeback_submit(zram);
> + kref_get(&wb_ctl->kref);
> atomic_inc(&wb_ctl->num_inflight);
> req->bio.bi_private = wb_ctl;
> submit_bio(&req->bio);
> @@ -1276,8 +1290,8 @@ static ssize_t writeback_store(struct device *dev,
>
> wb_ctl = init_wb_ctl(zram);
> if (!wb_ctl) {
> - ret = -ENOMEM;
> - goto out;
> + release_pp_ctl(zram, pp_ctl);
> + return -ENOMEM;
> }
>
> args = skip_spaces(buf);
So I think we also need to do kref_put(&wb_ctl->kref, release_wb_ctl_kref)
at the end of writeback_store(), because otherwise it just kfree()-s
wb_ctl and we have the same race condition:
@@ -1330,7 +1340,7 @@ static ssize_t writeback_store(struct device *dev,
out:
release_pp_ctl(zram, pp_ctl);
- release_wb_ctl(wb_ctl);
+ kref_put(&wb_ctl->kref, release_wb_ctl_kref);
return ret;
}
And indirect release in init_wb_ctl() as well:
@@ -895,7 +903,7 @@ static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
return wb_ctl;
release_wb_ctl:
- release_wb_ctl(wb_ctl);
+ kref_put(&wb_ctl->kref, release_wb_ctl_kref);
return NULL;
}
* [PATCH v2] zram: fix use-after-free in zram_writeback_endio
2026-05-08 2:40 ` Sergey Senozhatsky
@ 2026-05-08 8:49 ` Richard Chang
2026-05-08 21:16 ` Minchan Kim
2026-05-09 2:18 ` Sergey Senozhatsky
0 siblings, 2 replies; 10+ messages in thread
From: Richard Chang @ 2026-05-08 8:49 UTC
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe, Andrew Morton
Cc: bgeffon, liumartin, linux-kernel, linux-block, linux-mm,
Richard Chang
A crash was observed in zram_writeback_endio due to a NULL pointer
dereference in wake_up. The root cause is a race condition between the
bio completion handler (zram_writeback_endio) and the writeback task.
In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
releasing wb_ctl->done_lock. This creates a race window where the
writeback task can see num_inflight become 0, return, and free wb_ctl
before zram_writeback_endio calls wake_up().
CPU 0 (zram_writeback_endio)                 CPU 1 (writeback_store)
============================                 =======================
                                             zram_writeback_slots
                                               zram_submit_wb_request
                                               zram_submit_wb_request
                                               wait_event(wb_ctl->done_wait)
spin_lock(&wb_ctl->done_lock);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock(&wb_ctl->done_lock);
wake_up(&wb_ctl->done_wait);
                                             zram_complete_done_reqs
spin_lock(&wb_ctl->done_lock);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock(&wb_ctl->done_lock);
                                             while (atomic_read(&wb_ctl->num_inflight) > 0)
                                               spin_lock(&wb_ctl->done_lock);
                                               list_del(&req->entry);
                                               spin_unlock(&wb_ctl->done_lock);
                                               // num_inflight becomes 0
                                               atomic_dec(&wb_ctl->num_inflight);
                                             // Leave zram_writeback_slots
                                             // Free wb_ctl
                                             release_wb_ctl(wb_ctl);
// UAF crash!
wake_up(&wb_ctl->done_wait);
Fix this race using RCU: protect wb_ctl with rcu_read_lock() in
zram_writeback_endio and free it with kfree_rcu(), which ensures that
wb_ctl remains valid for the entire execution of zram_writeback_endio.
Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Richard Chang <richardycc@google.com>
---
V1 -> V2: use RCU to manage the wb_ctl lifetime
drivers/block/zram/zram_drv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aebc710f0d6a..07111455eecf 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -33,6 +33,7 @@
#include <linux/cpuhotplug.h>
#include <linux/part_stat.h>
#include <linux/kernel_read_file.h>
+#include <linux/rcupdate.h>
#include "zram_drv.h"
@@ -504,6 +505,7 @@ struct zram_wb_ctl {
wait_queue_head_t done_wait;
spinlock_t done_lock;
atomic_t num_inflight;
+ struct rcu_head rcu;
};
struct zram_wb_req {
@@ -847,7 +849,7 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
release_wb_req(req);
}
- kfree(wb_ctl);
+ kfree_rcu(wb_ctl, rcu);
}
static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
@@ -964,11 +966,13 @@ static void zram_writeback_endio(struct bio *bio)
struct zram_wb_ctl *wb_ctl = bio->bi_private;
unsigned long flags;
+ rcu_read_lock();
spin_lock_irqsave(&wb_ctl->done_lock, flags);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
wake_up(&wb_ctl->done_wait);
+ rcu_read_unlock();
}
static void zram_submit_wb_request(struct zram *zram,
--
2.54.0.563.g4f69b47b94-goog
* Re: [PATCH v2] zram: fix use-after-free in zram_writeback_endio
2026-05-08 8:49 ` [PATCH v2] " Richard Chang
@ 2026-05-08 21:16 ` Minchan Kim
2026-05-09 2:18 ` Sergey Senozhatsky
1 sibling, 0 replies; 10+ messages in thread
From: Minchan Kim @ 2026-05-08 21:16 UTC
To: Richard Chang
Cc: Sergey Senozhatsky, Jens Axboe, Andrew Morton, bgeffon, liumartin,
linux-kernel, linux-block, linux-mm
On Fri, May 08, 2026 at 08:49:33AM +0000, Richard Chang wrote:
> A crash was observed in zram_writeback_endio due to a NULL pointer
> dereference in wake_up. The root cause is a race condition between the
> bio completion handler (zram_writeback_endio) and the writeback task.
>
> In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
> releasing wb_ctl->done_lock. This creates a race window where the
> writeback task can see num_inflight become 0, return, and free wb_ctl
> before zram_writeback_endio calls wake_up().
>
> CPU 0 (zram_writeback_endio)                 CPU 1 (writeback_store)
> ============================                 =======================
>                                              zram_writeback_slots
>                                                zram_submit_wb_request
>                                                zram_submit_wb_request
>                                                wait_event(wb_ctl->done_wait)
> spin_lock(&wb_ctl->done_lock);
> list_add(&req->entry, &wb_ctl->done_reqs);
> spin_unlock(&wb_ctl->done_lock);
> wake_up(&wb_ctl->done_wait);
>                                              zram_complete_done_reqs
> spin_lock(&wb_ctl->done_lock);
> list_add(&req->entry, &wb_ctl->done_reqs);
> spin_unlock(&wb_ctl->done_lock);
>                                              while (atomic_read(&wb_ctl->num_inflight) > 0)
>                                                spin_lock(&wb_ctl->done_lock);
>                                                list_del(&req->entry);
>                                                spin_unlock(&wb_ctl->done_lock);
>                                                // num_inflight becomes 0
>                                                atomic_dec(&wb_ctl->num_inflight);
>
>                                              // Leave zram_writeback_slots
>                                              // Free wb_ctl
>                                              release_wb_ctl(wb_ctl);
> // UAF crash!
> wake_up(&wb_ctl->done_wait);
>
> Fix this race using RCU: protect wb_ctl with rcu_read_lock() in
> zram_writeback_endio and free it with kfree_rcu(), which ensures that
> wb_ctl remains valid for the entire execution of zram_writeback_endio.
>
> Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Suggested-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Richard Chang <richardycc@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
* Re: [PATCH v2] zram: fix use-after-free in zram_writeback_endio
2026-05-08 8:49 ` [PATCH v2] " Richard Chang
2026-05-08 21:16 ` Minchan Kim
@ 2026-05-09 2:18 ` Sergey Senozhatsky
1 sibling, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2026-05-09 2:18 UTC
To: Richard Chang
Cc: Minchan Kim, Sergey Senozhatsky, Jens Axboe, Andrew Morton,
bgeffon, liumartin, linux-kernel, linux-block, linux-mm
On (26/05/08 08:49), Richard Chang wrote:
> A crash was observed in zram_writeback_endio due to a NULL pointer
> dereference in wake_up. The root cause is a race condition between the
> bio completion handler (zram_writeback_endio) and the writeback task.
>
> In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
> releasing wb_ctl->done_lock. This creates a race window where the
> writeback task can see num_inflight become 0, return, and free wb_ctl
> before zram_writeback_endio calls wake_up().
>
> CPU 0 (zram_writeback_endio)                 CPU 1 (writeback_store)
> ============================                 =======================
>                                              zram_writeback_slots
>                                                zram_submit_wb_request
>                                                zram_submit_wb_request
>                                                wait_event(wb_ctl->done_wait)
> spin_lock(&wb_ctl->done_lock);
> list_add(&req->entry, &wb_ctl->done_reqs);
> spin_unlock(&wb_ctl->done_lock);
> wake_up(&wb_ctl->done_wait);
>                                              zram_complete_done_reqs
> spin_lock(&wb_ctl->done_lock);
> list_add(&req->entry, &wb_ctl->done_reqs);
> spin_unlock(&wb_ctl->done_lock);
>                                              while (atomic_read(&wb_ctl->num_inflight) > 0)
>                                                spin_lock(&wb_ctl->done_lock);
>                                                list_del(&req->entry);
>                                                spin_unlock(&wb_ctl->done_lock);
>                                                // num_inflight becomes 0
>                                                atomic_dec(&wb_ctl->num_inflight);
>
>                                              // Leave zram_writeback_slots
>                                              // Free wb_ctl
>                                              release_wb_ctl(wb_ctl);
> // UAF crash!
> wake_up(&wb_ctl->done_wait);
>
> Fix this race using RCU: protect wb_ctl with rcu_read_lock() in
> zram_writeback_endio and free it with kfree_rcu(), which ensures that
> wb_ctl remains valid for the entire execution of zram_writeback_endio.
>
> Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Suggested-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Richard Chang <richardycc@google.com>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>