* [PATCH] zram: fix use-after-free in zram_writeback_endio
From: Richard Chang @ 2026-05-04 12:32 UTC
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe, Andrew Morton
Cc: bgeffon, liumartin, linux-kernel, linux-block, linux-mm,
    Richard Chang

A crash was observed in zram_writeback_endio due to a NULL pointer
dereference in wake_up(). The root cause is a race condition between the
bio completion handler (zram_writeback_endio) and the writeback task.

In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
releasing wb_ctl->done_lock. This creates a race window where the
writeback task can see num_inflight become 0, return, and free wb_ctl
before zram_writeback_endio calls wake_up().

CPU 0 (zram_writeback_endio)                CPU 1 (zram_complete_done_reqs)
============================                ===============================
spin_lock(&wb_ctl->done_lock);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock(&wb_ctl->done_lock);
                                            while (atomic_read(&wb_ctl->num_inflight) > 0)
                                              spin_lock(&wb_ctl->done_lock);
                                              list_del(&req->entry);
                                              spin_unlock(&wb_ctl->done_lock);
                                              // num_inflight becomes 0
                                              atomic_dec(&wb_ctl->num_inflight);
                                            // returns to writeback_store()
                                            // frees wb_ctl
                                            release_wb_ctl(wb_ctl);
// UAF crash!
wake_up(&wb_ctl->done_wait);

Fix this by moving wake_up() inside the done_lock critical section.
This ensures that zram_complete_done_reqs cannot consume the request
and decrement num_inflight until zram_writeback_endio has finished
calling wake_up() and released the lock.

Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
Signed-off-by: Richard Chang <richardycc@google.com>
---
drivers/block/zram/zram_drv.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aebc710f0d6a..a457fdf564f8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -966,9 +966,8 @@ static void zram_writeback_endio(struct bio *bio)
spin_lock_irqsave(&wb_ctl->done_lock, flags);
list_add(&req->entry, &wb_ctl->done_reqs);
- spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
-
wake_up(&wb_ctl->done_wait);
+ spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
}
static void zram_submit_wb_request(struct zram *zram,
--
2.54.0.545.g6539524ca2-goog
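
The pattern at issue here is generic: if the wakeup side can still touch a
shared object after the waiter has been allowed to free it, the waker's last
access must be ordered before the free. Below is a minimal userspace sketch
of the v1 fix, with a pthread mutex/condvar standing in for the kernel's
spinlock and waitqueue; all names are illustrative and nothing in it is
taken from the zram driver:

/*
 * Sketch of the "wake under the lock" fix. The waker's final access to
 * ctl is its mutex unlock; the waiter cannot return from the wait (and
 * so cannot free ctl) until it re-acquires that mutex.
 *
 * Build: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct wb_ctl {
        pthread_mutex_t done_lock;
        pthread_cond_t  done_wait;
        atomic_int      num_inflight;
        int             done;   /* stands in for the done_reqs list */
};

/* Completion side ("endio"): runs once per in-flight request. */
static void *endio(void *arg)
{
        struct wb_ctl *ctl = arg;

        pthread_mutex_lock(&ctl->done_lock);
        ctl->done = 1;                          /* list_add() */
        /*
         * The fix: signal while still holding done_lock, so that the
         * unlock below is this thread's final access to ctl.
         */
        pthread_cond_signal(&ctl->done_wait);
        pthread_mutex_unlock(&ctl->done_lock);
        return NULL;                            /* must not touch ctl here */
}

int main(void)
{
        struct wb_ctl *ctl = calloc(1, sizeof(*ctl));
        pthread_t t;

        pthread_mutex_init(&ctl->done_lock, NULL);
        pthread_cond_init(&ctl->done_wait, NULL);
        atomic_store(&ctl->num_inflight, 1);    /* one request in flight */
        pthread_create(&t, NULL, endio, ctl);   /* "submit_bio()" */
        pthread_detach(t);  /* no join: the locked signal is the only sync */

        /* Writeback side: consume the completion, then free ctl. */
        pthread_mutex_lock(&ctl->done_lock);
        while (!ctl->done)
                pthread_cond_wait(&ctl->done_wait, &ctl->done_lock);
        ctl->done = 0;                          /* list_del() */
        pthread_mutex_unlock(&ctl->done_lock);
        atomic_fetch_sub(&ctl->num_inflight, 1);

        printf("inflight=%d\n", atomic_load(&ctl->num_inflight));
        pthread_mutex_destroy(&ctl->done_lock);
        pthread_cond_destroy(&ctl->done_wait);
        free(ctl);                              /* release_wb_ctl() */
        return 0;
}

With the pre-fix ordering (signal after unlock), the waiter can wake, see
done set, and free ctl between the waker's unlock and its signal, which is
exactly the window the diagram in the commit message shows.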
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
From: Sergey Senozhatsky @ 2026-05-05 3:25 UTC
To: Andrew Morton, Richard Chang
Cc: Minchan Kim, Sergey Senozhatsky, Jens Axboe, bgeffon, liumartin,
    linux-kernel, linux-block, linux-mm

On (26/05/04 12:32), Richard Chang wrote:
> A crash was observed in zram_writeback_endio due to a NULL pointer
> dereference in wake_up(). The root cause is a race condition between the
> bio completion handler (zram_writeback_endio) and the writeback task.
[..]
> Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
> Signed-off-by: Richard Chang <richardycc@google.com>

Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
From: Minchan Kim @ 2026-05-05 16:37 UTC
To: Richard Chang
Cc: Sergey Senozhatsky, Jens Axboe, Andrew Morton, bgeffon, liumartin,
    linux-kernel, linux-block, linux-mm

On Mon, May 04, 2026 at 12:32:30PM +0000, Richard Chang wrote:
[..]
> @@ -966,9 +966,8 @@ static void zram_writeback_endio(struct bio *bio)
>  
>  	spin_lock_irqsave(&wb_ctl->done_lock, flags);
>  	list_add(&req->entry, &wb_ctl->done_reqs);
> -	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
> -
>  	wake_up(&wb_ctl->done_wait);
> +	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
>  }

I agree this will fix the issue, but using a lock to extend the lifetime
of an object to avoid a UAF is not a good pattern. Object lifetime shared
between process and interrupt contexts should be managed explicitly using
a refcount.

Furthermore, keeping wake_up() outside the critical section minimizes
interrupt-disabled latency and avoids nesting spinlocks
(done_lock -> done_wait.lock), reducing the risk of future lockdep
issues, just in case.

It definitely will add more overhead for the submission/completion paths
to deal with the refcount, but I think we should go that way at the cost
of runtime.
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
From: Sergey Senozhatsky @ 2026-05-07 9:40 UTC
To: Minchan Kim
Cc: Richard Chang, Sergey Senozhatsky, Jens Axboe, Andrew Morton, bgeffon,
    liumartin, linux-kernel, linux-block, linux-mm

On (26/05/05 09:37), Minchan Kim wrote:
> I agree this will fix the issue, but using a lock to extend the lifetime
> of an object to avoid a UAF is not a good pattern. Object lifetime shared
> between process and interrupt contexts should be managed explicitly using
> a refcount.

->num_inflight is a ref-counter, basically. The problem is that
completion is a two-step process, only one step of which is synchronized
with the writeback context. I honestly don't want to have two ref-counts:
one for requests pending zram completion and one for active endio
contexts. Maybe we can repurpose num_inflight instead.

> Furthermore, keeping wake_up() outside the critical section minimizes
> interrupt-disabled latency

So I considered that, but isn't endio already called from IRQ context?
Just asking. We wake up only one waiter (the writeback task), so it's
not that bad CPU-cycles wise. Do you think it's really a concern?

wake_up() under spin-lock solves the problem of an unsynchronized
two-stage endio process.

> and avoids nesting spinlocks (done_lock -> done_wait.lock), reducing
> the risk of future lockdep issues, just in case.

I considered lockdep as well but ruled it out as an impossible scenario:
nesting here is strictly uni-directional, we never call into zram from
the scheduler. Just saying.

> It definitely will add more overhead for the submission/completion paths
> to deal with the refcount, but I think we should go that way at the cost
> of runtime.

Dunno, something like below maybe?

---
 drivers/block/zram/zram_drv.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ce2e1c79fc75..27fe50d666d7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -967,7 +967,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
 static void zram_writeback_endio(struct bio *bio)
 {
 	struct zram_wb_req *req = container_of(bio, struct zram_wb_req, bio);
-	struct zram_wb_ctl *wb_ctl = bio->bi_private;
+	struct zram_wb_ctl *wb_ctl = READ_ONCE(bio->bi_private);
 	unsigned long flags;
 
 	spin_lock_irqsave(&wb_ctl->done_lock, flags);
@@ -975,6 +975,7 @@ static void zram_writeback_endio(struct bio *bio)
 	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
 
 	wake_up(&wb_ctl->done_wait);
+	atomic_dec(&wb_ctl->num_inflight);
 }
 
 static void zram_submit_wb_request(struct zram *zram,
@@ -998,7 +999,7 @@ static int zram_complete_done_reqs(struct zram *zram,
 	unsigned long flags;
 	int ret = 0, err;
 
-	while (atomic_read(&wb_ctl->num_inflight) > 0) {
+	for (;;) {
 		spin_lock_irqsave(&wb_ctl->done_lock, flags);
 		req = list_first_entry_or_null(&wb_ctl->done_reqs,
 					       struct zram_wb_req, entry);
@@ -1006,7 +1007,6 @@ static int zram_complete_done_reqs(struct zram *zram,
 			list_del(&req->entry);
 		spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
 
-		/* ->num_inflight > 0 doesn't mean we have done requests */
 		if (!req)
 			break;
 
@@ -1014,7 +1014,6 @@ static int zram_complete_done_reqs(struct zram *zram,
 		if (err)
 			ret = err;
 
-		atomic_dec(&wb_ctl->num_inflight);
 		release_pp_slot(zram, req->pps);
 		req->pps = NULL;
 
@@ -1129,8 +1128,11 @@ static int zram_writeback_slots(struct zram *zram,
 	if (req)
 		release_wb_req(req);
 
-	while (atomic_read(&wb_ctl->num_inflight) > 0) {
-		wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
+	while (atomic_read(&wb_ctl->num_inflight) ||
+	       !list_empty(&wb_ctl->done_reqs)) {
+		wait_event_timeout(wb_ctl->done_wait,
+				   !list_empty(&wb_ctl->done_reqs),
+				   HZ);
 		err = zram_complete_done_reqs(zram, wb_ctl);
 		if (err)
 			ret = err;
-- 
2.54.0.563.g4f69b47b94-goog
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
From: Minchan Kim @ 2026-05-07 22:56 UTC
To: Sergey Senozhatsky
Cc: Richard Chang, Jens Axboe, Andrew Morton, bgeffon, liumartin,
    linux-kernel, linux-block, linux-mm

On Thu, May 07, 2026 at 06:40:37PM +0900, Sergey Senozhatsky wrote:
> ->num_inflight is a ref-counter, basically. The problem is that
> completion is a two-step process, only one step of which is synchronized
> with the writeback context. I honestly don't want to have two ref-counts:
> one for requests pending zram completion and one for active endio
> contexts. Maybe we can repurpose num_inflight instead.

If it can make the code much clearer and simpler, I have no objection.

> So I considered that, but isn't endio already called from IRQ context?
> Just asking. We wake up only one waiter (the writeback task), so it's
> not that bad CPU-cycles wise. Do you think it's really a concern?

I don't think it will have any measurable impact; I was just pointing
out a theoretical one.

> I considered lockdep as well but ruled it out as an impossible scenario:
> nesting here is strictly uni-directional, we never call into zram from
> the scheduler. Just saying.

Sure. I just prefer to avoid adding more lock dependencies without a
strong justification, to prevent potential locking issues in the future.

> Dunno, something like below maybe?
[..]
> @@ -1129,8 +1128,11 @@ static int zram_writeback_slots(struct zram *zram,
>  	if (req)
>  		release_wb_req(req);
>  
> -	while (atomic_read(&wb_ctl->num_inflight) > 0) {
> -		wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
> +	while (atomic_read(&wb_ctl->num_inflight) ||
> +	       !list_empty(&wb_ctl->done_reqs)) {
> +		wait_event_timeout(wb_ctl->done_wait,
> +				   !list_empty(&wb_ctl->done_reqs),
> +				   HZ);
>  		err = zram_complete_done_reqs(zram, wb_ctl);
>  		if (err)
>  			ret = err;

I understand why you used a timeout here, but I still don't think it's a
good idea since the user could wait for up to a second unnecessarily
during the race.

What I prefer is simple and explicit lifetime management for wb_ctl using
a refcount. It directly addresses the core issue (UAF of wb_ctl) in a
standard, robust way without needing workarounds like timeouts. The
runtime overhead of kref will be negligible.

Something like this:

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a324ede6206d..28ab4a24e77f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -33,6 +33,7 @@
 #include <linux/cpuhotplug.h>
 #include <linux/part_stat.h>
 #include <linux/kernel_read_file.h>
+#include <linux/kref.h>
 
 #include "zram_drv.h"
 
@@ -504,6 +505,7 @@ struct zram_wb_ctl {
 	wait_queue_head_t done_wait;
 	spinlock_t done_lock;
 	atomic_t num_inflight;
+	struct kref kref;
 };
 
 struct zram_wb_req {
@@ -829,11 +831,8 @@ static void release_wb_req(struct zram_wb_req *req)
 	kfree(req);
 }
 
-static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
+static void __release_wb_ctl(struct zram_wb_ctl *wb_ctl)
 {
-	if (!wb_ctl)
-		return;
-
 	/* We should never have inflight requests at this point */
 	WARN_ON(atomic_read(&wb_ctl->num_inflight));
 	WARN_ON(!list_empty(&wb_ctl->done_reqs));
@@ -850,6 +849,18 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
 	kfree(wb_ctl);
 }
 
+static void release_wb_ctl_kref(struct kref *kref)
+{
+	struct zram_wb_ctl *wb_ctl = container_of(kref, struct zram_wb_ctl, kref);
+
+	__release_wb_ctl(wb_ctl);
+}
+
+static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
+{
+	kref_put(&wb_ctl->kref, release_wb_ctl_kref);
+}
+
 static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
 {
 	struct zram_wb_ctl *wb_ctl;
@@ -864,6 +875,7 @@ static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
 	atomic_set(&wb_ctl->num_inflight, 0);
 	init_waitqueue_head(&wb_ctl->done_wait);
 	spin_lock_init(&wb_ctl->done_lock);
+	kref_init(&wb_ctl->kref);
 
 	for (i = 0; i < zram->wb_batch_size; i++) {
 		struct zram_wb_req *req;
@@ -985,6 +997,7 @@ static void zram_writeback_endio(struct bio *bio)
 	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
 
 	wake_up(&wb_ctl->done_wait);
+	kref_put(&wb_ctl->kref, release_wb_ctl_kref);
 }
 
 static void zram_submit_wb_request(struct zram *zram,
@@ -996,6 +1009,7 @@ static void zram_submit_wb_request(struct zram *zram,
 	 * so that we don't over-submit.
 	 */
 	zram_account_writeback_submit(zram);
+	kref_get(&wb_ctl->kref);
 	atomic_inc(&wb_ctl->num_inflight);
 	req->bio.bi_private = wb_ctl;
 	submit_bio(&req->bio);
@@ -1276,8 +1290,8 @@ static ssize_t writeback_store(struct device *dev,
 
 	wb_ctl = init_wb_ctl(zram);
 	if (!wb_ctl) {
-		ret = -ENOMEM;
-		goto out;
+		release_pp_ctl(zram, pp_ctl);
+		return -ENOMEM;
 	}
 
 	args = skip_spaces(buf);
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
From: Minchan Kim @ 2026-05-07 23:38 UTC
To: Sergey Senozhatsky
Cc: Richard Chang, Jens Axboe, Andrew Morton, bgeffon, liumartin,
    linux-kernel, linux-block, linux-mm

On Thu, May 07, 2026 at 03:56:52PM -0700, Minchan Kim wrote:
[..]
> What I prefer is simple and explicit lifetime management for wb_ctl using
> a refcount. It directly addresses the core issue (UAF of wb_ctl) in a
> standard, robust way without needing workarounds like timeouts. The
> runtime overhead of kref will be negligible.

The other standard way to deal with lifetime is RCU. How about this?

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a324ede6206d..28ab4a24e77f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -33,6 +33,7 @@
 #include <linux/cpuhotplug.h>
 #include <linux/part_stat.h>
 #include <linux/kernel_read_file.h>
+#include <linux/rcupdate.h>
 
 #include "zram_drv.h"
 
@@ -504,6 +505,7 @@ struct zram_wb_ctl {
 	wait_queue_head_t done_wait;
 	spinlock_t done_lock;
 	atomic_t num_inflight;
+	struct rcu_head rcu;
 };
 
 struct zram_wb_req {
@@ -829,14 +831,8 @@ static void release_wb_req(struct zram_wb_req *req)
 	kfree(req);
 }
 
 static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
 {
-	if (!wb_ctl)
-		return;
-
 	/* We should never have inflight requests at this point */
 	WARN_ON(atomic_read(&wb_ctl->num_inflight));
 	WARN_ON(!list_empty(&wb_ctl->done_reqs));
@@ -850,7 +849,7 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
 		release_wb_req(req);
 	}
 
-	kfree(wb_ctl);
+	kfree_rcu(wb_ctl, rcu);
 }
 
 static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
@@ -985,6 +997,8 @@ static void zram_writeback_endio(struct bio *bio)
 	struct zram_wb_ctl *wb_ctl = bio->bi_private;
 	unsigned long flags;
 
+	rcu_read_lock();
 	spin_lock_irqsave(&wb_ctl->done_lock, flags);
 	list_add(&req->entry, &wb_ctl->done_reqs);
 	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
 
 	wake_up(&wb_ctl->done_wait);
+	rcu_read_unlock();
 }
 
 static void zram_submit_wb_request(struct zram *zram,
@@ -1276,8 +1290,8 @@ static ssize_t writeback_store(struct device *dev,
 
 	wb_ctl = init_wb_ctl(zram);
 	if (!wb_ctl) {
-		ret = -ENOMEM;
-		goto out;
+		release_pp_ctl(zram, pp_ctl);
+		return -ENOMEM;
 	}
 
 	args = skip_spaces(buf);
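
For reference, the guarantee the RCU variant leans on is that kfree_rcu()
defers the actual free until every RCU read-side critical section already
in flight has exited, so an endio running between rcu_read_lock() and
rcu_read_unlock() can never observe freed memory. A rough userspace sketch
of that rule using liburcu, with synchronize_rcu() plus free() as the
synchronous stand-in for kfree_rcu() (assumed API: the classic urcu.h
flavor linked with -lurcu; all names are illustrative, not zram's):

/*
 * Grace-period sketch: free() may only run after all pre-existing
 * readers have left their read-side critical sections.
 *
 * Build: cc -pthread sketch.c -lurcu
 */
#define _LGPL_SOURCE
#include <urcu.h>               /* liburcu, classic flavor */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct wb_ctl {
        int done;
};

static struct wb_ctl *global_ctl;

/* Completion side: only touches ctl inside a read-side section. */
static void *endio(void *arg)
{
        struct wb_ctl *ctl;

        rcu_register_thread();
        rcu_read_lock();
        ctl = rcu_dereference(global_ctl);
        if (ctl)
                ctl->done = 1;  /* list_add() + wake_up() stand-in */
        rcu_read_unlock();      /* only past this point may ctl be freed */
        rcu_unregister_thread();
        return NULL;
}

int main(void)
{
        struct wb_ctl *ctl = calloc(1, sizeof(*ctl));
        pthread_t t;

        rcu_register_thread();
        rcu_assign_pointer(global_ctl, ctl);
        pthread_create(&t, NULL, endio, NULL);
        usleep(1000);           /* demo only: let the reader start */

        /* Writeback side: unpublish, wait out the grace period, free. */
        rcu_assign_pointer(global_ctl, NULL);
        synchronize_rcu();      /* all pre-existing readers are done */
        free(ctl);              /* the kfree_rcu() analogue */

        pthread_join(t, NULL);
        rcu_unregister_thread();
        printf("freed only after the grace period\n");
        return 0;
}

The zram patch has no explicit unpublish step; there, "no new readers" is
guaranteed structurally because all bios have completed before
release_wb_ctl() runs, and kfree_rcu() covers any endio still inside its
read-side section.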
* Re: [PATCH] zram: fix use-after-free in zram_writeback_endio
From: Sergey Senozhatsky @ 2026-05-08 2:40 UTC
To: Minchan Kim
Cc: Sergey Senozhatsky, Richard Chang, Jens Axboe, Andrew Morton, bgeffon,
    liumartin, linux-kernel, linux-block, linux-mm

On (26/05/07 15:56), Minchan Kim wrote:
> I understand why you used a timeout here, but I still don't think it's a
> good idea since the user could wait for up to a second unnecessarily
> during the race.

Well, sure, it doesn't have to be a full HZ, we only need to wait for
propagation of atomic_dec() from another CPU. That's very fast, orders
of magnitude faster than a full second. Just saying.

> @@ -504,6 +505,7 @@ struct zram_wb_ctl {
>  	wait_queue_head_t done_wait;
>  	spinlock_t done_lock;
>  	atomic_t num_inflight;
> +	struct kref kref;
>  };

Yeah okay, it overlaps with ->num_inflight, but we can live with that.
Maybe we can get rid of ->num_inflight in future patches.

[..]

> @@ -1276,8 +1290,8 @@ static ssize_t writeback_store(struct device *dev,
>  
>  	wb_ctl = init_wb_ctl(zram);
>  	if (!wb_ctl) {
> -		ret = -ENOMEM;
> -		goto out;
> +		release_pp_ctl(zram, pp_ctl);
> +		return -ENOMEM;
>  	}
>  
>  	args = skip_spaces(buf);

So I think we also need to do kref_put(&wb_ctl->kref, release_wb_ctl_kref)
at the end of writeback_store(), because otherwise it just kfree()s wb_ctl
and we have the same race condition:

@@ -1330,7 +1340,7 @@ static ssize_t writeback_store(struct device *dev,
 
 out:
 	release_pp_ctl(zram, pp_ctl);
-	release_wb_ctl(wb_ctl);
+	kref_put(&wb_ctl->kref, release_wb_ctl_kref);
 	return ret;
 }

And indirect release in init_wb_ctl() as well:

@@ -895,7 +903,7 @@ static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
 	return wb_ctl;
 
 release_wb_ctl:
-	release_wb_ctl(wb_ctl);
+	kref_put(&wb_ctl->kref, release_wb_ctl_kref);
 	return NULL;
 }
* [PATCH v2] zram: fix use-after-free in zram_writeback_endio
From: Richard Chang @ 2026-05-08 8:49 UTC
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe, Andrew Morton
Cc: bgeffon, liumartin, linux-kernel, linux-block, linux-mm,
    Richard Chang

A crash was observed in zram_writeback_endio due to a NULL pointer
dereference in wake_up(). The root cause is a race condition between the
bio completion handler (zram_writeback_endio) and the writeback task.

In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
releasing wb_ctl->done_lock. This creates a race window where the
writeback task can see num_inflight become 0, return, and free wb_ctl
before zram_writeback_endio calls wake_up().

CPU 0 (zram_writeback_endio)                CPU 1 (writeback_store)
============================                ============================
                                            zram_writeback_slots
                                              zram_submit_wb_request
                                              zram_submit_wb_request
                                              wait_event(wb_ctl->done_wait)
spin_lock(&wb_ctl->done_lock);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock(&wb_ctl->done_lock);
wake_up(&wb_ctl->done_wait);
                                              zram_complete_done_reqs
spin_lock(&wb_ctl->done_lock);
list_add(&req->entry, &wb_ctl->done_reqs);
spin_unlock(&wb_ctl->done_lock);
                                                while (num_inflight > 0)
                                                  spin_lock(&wb_ctl->done_lock);
                                                  list_del(&req->entry);
                                                  spin_unlock(&wb_ctl->done_lock);
                                                  // num_inflight becomes 0
                                                  atomic_dec(num_inflight);
                                            // leave zram_writeback_slots
                                            // free wb_ctl
                                            release_wb_ctl(wb_ctl);
// UAF crash!
wake_up(&wb_ctl->done_wait);

Fix the race with RCU: by protecting wb_ctl with rcu_read_lock() in
zram_writeback_endio and freeing it with kfree_rcu(), wb_ctl is
guaranteed to remain valid for the whole execution of
zram_writeback_endio.

Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Richard Chang <richardycc@google.com>
---
V1 -> V2: use RCU to manage the wb_ctl lifetime

 drivers/block/zram/zram_drv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aebc710f0d6a..07111455eecf 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -33,6 +33,7 @@
 #include <linux/cpuhotplug.h>
 #include <linux/part_stat.h>
 #include <linux/kernel_read_file.h>
+#include <linux/rcupdate.h>
 
 #include "zram_drv.h"
 
@@ -504,6 +505,7 @@ struct zram_wb_ctl {
 	wait_queue_head_t done_wait;
 	spinlock_t done_lock;
 	atomic_t num_inflight;
+	struct rcu_head rcu;
 };
 
 struct zram_wb_req {
@@ -847,7 +849,7 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
 		release_wb_req(req);
 	}
 
-	kfree(wb_ctl);
+	kfree_rcu(wb_ctl, rcu);
 }
 
 static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
@@ -964,11 +966,13 @@ static void zram_writeback_endio(struct bio *bio)
 	struct zram_wb_ctl *wb_ctl = bio->bi_private;
 	unsigned long flags;
 
+	rcu_read_lock();
 	spin_lock_irqsave(&wb_ctl->done_lock, flags);
 	list_add(&req->entry, &wb_ctl->done_reqs);
 	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
 
 	wake_up(&wb_ctl->done_wait);
+	rcu_read_unlock();
 }
 
 static void zram_submit_wb_request(struct zram *zram,
-- 
2.54.0.563.g4f69b47b94-goog
* Re: [PATCH v2] zram: fix use-after-free in zram_writeback_endio
From: Minchan Kim @ 2026-05-08 21:16 UTC
To: Richard Chang
Cc: Sergey Senozhatsky, Jens Axboe, Andrew Morton, bgeffon, liumartin,
    linux-kernel, linux-block, linux-mm

On Fri, May 08, 2026 at 08:49:33AM +0000, Richard Chang wrote:
[..]
> Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Suggested-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Richard Chang <richardycc@google.com>

Acked-by: Minchan Kim <minchan@kernel.org>
* Re: [PATCH v2] zram: fix use-after-free in zram_writeback_endio
From: Sergey Senozhatsky @ 2026-05-09 2:18 UTC
To: Richard Chang
Cc: Minchan Kim, Sergey Senozhatsky, Jens Axboe, Andrew Morton, bgeffon,
    liumartin, linux-kernel, linux-block, linux-mm

On (26/05/08 08:49), Richard Chang wrote:
[..]
> Fixes: f405066a1f0d ("zram: introduce writeback bio batching")
> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Suggested-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Richard Chang <richardycc@google.com>

Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Thread overview: 10+ messages
2026-05-04 12:32 [PATCH] zram: fix use-after-free in zram_writeback_endio Richard Chang
2026-05-05  3:25 ` Sergey Senozhatsky
2026-05-05 16:37 ` Minchan Kim
2026-05-07  9:40   ` Sergey Senozhatsky
2026-05-07 22:56     ` Minchan Kim
2026-05-07 23:38       ` Minchan Kim
2026-05-08  2:40       ` Sergey Senozhatsky
2026-05-08  8:49         ` [PATCH v2] zram: fix use-after-free in zram_writeback_endio Richard Chang
2026-05-08 21:16           ` Minchan Kim
2026-05-09  2:18           ` Sergey Senozhatsky