From: Jens Axboe <axboe@kernel.dk>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: John Sperbeck <jsperbeck@google.com>,
Bean Huo <beanhuo@micron.com>, Sagi Grimberg <sagi@grimberg.me>,
khazhy@google.com, stable@vger.kernel.org
Subject: Re: Crash in NVME tracing on 5.10LTS
Date: Sat, 13 Jan 2024 09:11:49 -0700 [thread overview]
Message-ID: <29108779-92d0-40dc-a259-72a24ad5d385@kernel.dk> (raw)
In-Reply-To: <2024011314-debtless-clunky-ea56@gregkh>
On 1/13/24 2:34 AM, Greg KH wrote:
> On Thu, Jan 11, 2024 at 12:38:03PM -0700, Jens Axboe wrote:
>> On 1/11/24 10:00 AM, John Sperbeck wrote:
>>> On Thu, Jan 11, 2024 at 1:46?AM Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>
>>>> On Tue, Jan 09, 2024 at 10:17:22AM -0800, John Sperbeck wrote:
>>>>> With 5.10LTS (e.g., 5.10.206), on a machine using an NVME device, the
>>>>> following tracing commands will trigger a crash due to a NULL pointer
>>>>> dereference:
>>>>>
>>>>> KDIR=/sys/kernel/debug/tracing
>>>>> echo 1 > $KDIR/tracing_on
>>>>> echo 1 > $KDIR/events/nvme/enable
>>>>> echo "Waiting for trace events..."
>>>>> cat $KDIR/trace_pipe
>>>>>
>>>>> The backtrace looks something like this:
>>>>>
>>>>> Call Trace:
>>>>> <IRQ>
>>>>> ? __die_body+0x6b/0xb0
>>>>> ? __die+0x9e/0xb0
>>>>> ? no_context+0x3eb/0x460
>>>>> ? ttwu_do_activate+0xf0/0x120
>>>>> ? __bad_area_nosemaphore+0x157/0x200
>>>>> ? select_idle_sibling+0x2f/0x410
>>>>> ? bad_area_nosemaphore+0x13/0x20
>>>>> ? do_user_addr_fault+0x2ab/0x360
>>>>> ? exc_page_fault+0x69/0x180
>>>>> ? asm_exc_page_fault+0x1e/0x30
>>>>> ? trace_event_raw_event_nvme_complete_rq+0xba/0x170
>>>>> ? trace_event_raw_event_nvme_complete_rq+0xa3/0x170
>>>>> nvme_complete_rq+0x168/0x170
>>>>> nvme_pci_complete_rq+0x16c/0x1f0
>>>>> nvme_handle_cqe+0xde/0x190
>>>>> nvme_irq+0x78/0x100
>>>>> __handle_irq_event_percpu+0x77/0x1e0
>>>>> handle_irq_event+0x54/0xb0
>>>>> handle_edge_irq+0xdf/0x230
>>>>> asm_call_irq_on_stack+0xf/0x20
>>>>> </IRQ>
>>>>> common_interrupt+0x9e/0x150
>>>>> asm_common_interrupt+0x1e/0x40
>>>>>
>>>>> It looks to me like these two upstream commits were backported to 5.10:
>>>>>
>>>>> 679c54f2de67 ("nvme: use command_id instead of req->tag in trace_nvme_complete_rq()")
>>>>> e7006de6c238 ("nvme: code command_id with a genctr for use-after-free validation")
>>>>>
>>>>> But they depend on this upstream commit to initialize the 'cmd' field in
>>>>> some cases:
>>>>>
>>>>> f4b9e6c90c57 ("nvme: use driver pdu command for passthrough")
>>>>>
>>>>> Does it sound like I'm on the right track? The 5.15LTS and later seems to be okay.
>>>>>
>>>>
>>>> If you apply that commit, does it solve the issue for you?
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>
>>> The f4b9e6c90c57 ("nvme: use driver pdu command for passthrough")
>>> upstream commit doesn't apply cleanly to 5.10LTS. If I adjust it to
>>> fit, then the crash no longer occurs for me.
>>>
>>> A revert of 706960d328f5 ("nvme: use command_id instead of req->tag in
>>> trace_nvme_complete_rq()") from 5.10LTS also prevents the crash.
>>>
>>> My leaning would be for a revert from 5.10LTS, but I think the
>>> maintainers would have better insight then me. It's also possible
>>> that this isn't serious enough to worry about in general. I don't
>>> really know.
>>
>> Either solution is fine with me, doesn't really matter. I was wondering
>> how this ended up in stable, and it looks like it was one of those
>> auto-selections... Those seem particularly dangerous the further back
>> you go.
>
> Now reverted, thanks. But note, that commit does say it fixes an issue
> this far back, which is why it was applied.
Yeah, looking into it, I can see how this was applied. And hard to know
not to, it's one of those boundary conditions that may happen for stable
backports and that only the ones intimate with the nvme code will spot.
Just one of those things, I think.
--
Jens Axboe
prev parent reply other threads:[~2024-01-13 16:11 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-09 18:17 Crash in NVME tracing on 5.10LTS John Sperbeck
2024-01-11 9:46 ` Greg KH
2024-01-11 17:00 ` John Sperbeck
2024-01-11 19:38 ` Jens Axboe
2024-01-13 9:34 ` Greg KH
2024-01-13 16:11 ` Jens Axboe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=29108779-92d0-40dc-a259-72a24ad5d385@kernel.dk \
--to=axboe@kernel.dk \
--cc=beanhuo@micron.com \
--cc=gregkh@linuxfoundation.org \
--cc=jsperbeck@google.com \
--cc=khazhy@google.com \
--cc=sagi@grimberg.me \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox