Linux block layer
From: Jens Axboe <axboe@kernel.dk>
To: Douglas Anderson <dianders@chromium.org>,
	Paolo Valente <paolo.valente@linaro.org>
Cc: groeck@chromium.org, drinkcat@chromium.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] block, bfq: NULL out the bic when it's no longer valid
Date: Fri, 28 Jun 2019 07:45:00 -0600
Message-ID: <9048aa45-ecf3-c80d-a973-4aeb86adde23@kernel.dk>
In-Reply-To: <20190628044409.128823-1-dianders@chromium.org>

On 6/27/19 10:44 PM, Douglas Anderson wrote:
> In reboot tests on several devices we were seeing a "use after free"
> when slub_debug or KASAN was enabled.  The kernel complained about:
> 
>    Unable to handle kernel paging request at virtual address 6b6b6c2b
> 
> ...which is a classic sign of use after free under slub_debug.  The
> stack crawl in kgdb looked like:
> 
>   0  test_bit (addr=<optimized out>, nr=<optimized out>)
>   1  bfq_bfqq_busy (bfqq=<optimized out>)
>   2  bfq_select_queue (bfqd=<optimized out>)
>   3  __bfq_dispatch_request (hctx=<optimized out>)
>   4  bfq_dispatch_request (hctx=<optimized out>)
>   5  0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
>   6  0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
>   7  0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
>   8  0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>)
>   9  0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
>   10 0xc024cff4 in worker_thread (__worker=0xec6d4640)
> 
> Digging in kgdb showed that, though bfqq looked fine, bfqq->bic had
> been freed.
> 
> Through further digging, I postulated that perhaps it is illegal to
> access a "bic" (AKA an "icq") after bfq_exit_icq() had been called
> because the "bic" can be freed at some point in time after this call
> is made.  I confirmed that there certainly were cases where the exact
> crashing code path would access the "bic" after bfq_exit_icq() had
> been called.  Specifically I set the "bfqq->bic" to (void *)0x7 and
> saw that the bic was 0x7 at the time of the crash.
> 
> To understand a bit more about why this crash was fairly uncommon (I
> saw it only once in a few hundred reboots), you can see that much of
> the time bfq_exit_icq_bfqq() fully frees the bfqq and thus it can't
> access the ->bic anymore.  The only case it doesn't is if
> bfq_put_queue() sees a reference still held.
> 
> However, even in the case when bfqq isn't freed, the crash is still
> rare.  Why?  I tracked what happened to the "bic" after the exit
> routine.  It doesn't get freed right away.  Rather,
> put_io_context_active() eventually called put_io_context() which
> queued up freeing on a workqueue.  The freeing then actually happened
> later than that through call_rcu().  Despite all these delays, some
> extra debugging showed that all the hoops could be jumped through in
> time and the memory could be freed causing the original crash.  Phew!
> 
> To make a long story short, assuming it truly is illegal to access an
> icq after the "exit_icq" callback is finished, this patch is needed.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> Most of the testing of this was done on the Chrome OS 4.19 kernel with
> BFQ backported (thanks to Paolo's help).  I did manage to reproduce a
> crash on mainline Linux (v5.2-rc6) though.
> 
> To see some of the techniques used to debug this, see
> <https://crrev.com/c/1679134> and <https://crrev.com/c/1681258/1>.
> 
> I'll also note that on linux-next (next-20190627) I saw some other
> use-after-frees that seemed related to BFQ but haven't had time to
> debug.  They seemed unrelated to this one.

Applied for 5.3, but I marked it for stable as well.

-- 
Jens Axboe


Thread overview: 4+ messages
2019-06-28  4:44 [PATCH] block, bfq: NULL out the bic when it's no longer valid Douglas Anderson
2019-06-28  4:57 ` Guenter Roeck
2019-06-28  7:51   ` Paolo Valente
2019-06-28 13:45 ` Jens Axboe [this message]
