Date: Tue, 30 Apr 2019 10:53:21 +0200
From: Takashi Iwai
To: Liwei Song
Cc: Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela, linux-kernel
Subject: Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer

On Tue, 30 Apr 2019 10:32:47
+0200, Liwei Song wrote:
>
> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> > On Tue, 30 Apr 2019 08:10:53 +0200,
> > Song liwei wrote:
> >>
> >> From: Liwei Song
> >>
> >> Fix the following BUG:
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> >> Workqueue: events azx_probe_work [snd_hda_intel]
> >> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> >> Call Trace:
> >>
> >> azx_interrupt+0x78/0x140 [snd_hda_codec]
> >> __handle_irq_event_percpu+0x49/0x300
> >> handle_irq_event_percpu+0x23/0x60
> >> handle_irq_event+0x3c/0x60
> >> handle_edge_irq+0xdb/0x180
> >> handle_irq+0x23/0x30
> >> do_IRQ+0x6a/0x140
> >> common_interrupt+0xf/0xf
> >>
> >> The call trace occurs when running kdump on a system with an NFS root
> >> filesystem. When the second (kdump) kernel boots, the following call
> >> sequence takes place:
> >>
> >> azx_first_init()
> >>   --> azx_acquire_irq()
> >>         <-- an interrupt comes in and azx_interrupt() is called
> >>   --> hda_intel_init_chip()
> >>     --> azx_init_chip()
> >>       --> snd_hdac_bus_init_chip()
> >>         --> snd_hdac_bus_init_cmd_io()
> >>           --> init rirb.buf and corb.buf
> >>
> >> The interrupt arrives after azx_acquire_irq() but before the RIRB has
> >> been initialized, so a NULL pointer is dereferenced while the
> >> interrupt is processed.
> >>
> >> Check that the RIRB buffer is not NULL, to avoid hanging the system in
> >> such corner cases.
> >>
> >> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> >> Signed-off-by: Liwei Song
> >
> > Oh, that's indeed a race there.
> >
> > But I guess the check introduced by the patch is still error-prone.
> > Basically, the interrupt handling should be moved after the chip
> > initialization. I suppose that your platform uses a shared
> > interrupt, not MSI?
>
> This is the information from /proc/interrupts:
> 134: 0 102 0 0 IR-PCI-MSI 514048-edge snd_hda_intel:card0

Hm, then it's interesting...
> > Anyway, an alternative (and likely more reliable) fix would be to move
> > the azx_acquire_irq() call as in the patch below (note: totally
> > untested). Could you check whether it works?
>
> Yes, it works.
>
> Since a previous patch similar to the one you provided introduced some
> issues, I chose to check for the invalid value to lower the risk. But,
> as you mentioned, it is not a good solution.
>
> commit 542cedec53c9e8b73f3f05bf8468823598c50489
> Author: Yu Zhao
> Date:   Tue Sep 11 15:12:46 2018 -0600
>
>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
>
>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
>
>     The patch doesn't fix accessing memory with null pointer in
>     skl_interrupt().
>
>     There are two problems: 1) skl_init_chip() is called twice, before
>     and after dma buffer is allocated. The first call sets bus->chip_init
>     which prevents the second from initializing bus->corb.buf and
>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
>     There is a small window which skl_interrupt() can be called if irq
>     has been acquired. If so, it crashes when using null dma buffer
>     pointers.

Actually, this was followed by another fix, commit b61749a89f82
("sound: enable interrupt after dma buffer initialization"), which moved
the IRQ enablement after snd_hdac_bus_init_cmd_io(). So I wonder how the
IRQ gets triggered in your case. With a shared IRQ it would be
understandable, but with MSI the interrupt source should be isolated.

In any case, for the latest tree, the change I suggested gives better
coverage, although it's more radical, as you pointed out.

thanks,

Takashi