From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id A467C143888
	for <linux-kernel@vger.kernel.org>; Wed,  4 Dec 2024 06:32:04 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.175
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1733293926; cv=none; b=bP+5+4iUhBGQEGGQtIvNH7BF1wvXEdIbwABc8MmC5HitAiqDO/0nvaNfRAcvhQ3LUUFiSg67UgWwRsFdQ25kls3yQi+iwcoM+UdJZE8tTxEj8arPaTOHaoLCT3IQwEwWzH6yYrgMt4l6tQ4wTYiomZ91nwLGblMBecGW8jZnkHU=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1733293926; c=relaxed/simple;
	bh=vXd51GbwJKXNLwv2nLkHDc5hOuj5aMRgYYIQavovp0Y=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=pqnmMmD9VjcoTLJ19e4GsqSBcDvK/D2c58fnqTc4HTtNzM69olp9gYCheAUHomPMeCRyl+MnfJWbpvrtZE5uK4+nsv0K1j35N5Hj4shwD+ieFZtoEoKKTJ9rHVySi/VhmaKmdSRh5YJSx5Aw44Cjni1GihJwHZKVKd92I9xh3UQ=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org; spf=pass smtp.mailfrom=chromium.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b=fTcOZYze; arc=none smtp.client-ip=209.85.210.175
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=chromium.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="fTcOZYze"
Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-724e14b90cfso6391102b3a.2
        for <linux-kernel@vger.kernel.org>; Tue, 03 Dec 2024 22:32:04 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=chromium.org; s=google; t=1733293924; x=1733898724; darn=vger.kernel.org;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to;
        bh=GBDbYslVLF6reDpDFL+cIdoA50QWCfJ+6RlDjehcuiI=;
        b=fTcOZYzegUv1z/sVgY+NNeZDnCzvh8yCPVKf4emzp3VNz9ZBIb0i6Vu0Dl8/rkIZ4I
         mlBbrJOHRunWn1muyq3SaH2m51racR5YSwumEqYOuai60Hma1Ui+bixuSzf84qPvhdsa
         h6cfIjrEebVQ2W5qWBNaRV0zbjnUMPp1viiAA=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1733293924; x=1733898724;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date
         :message-id:reply-to;
        bh=GBDbYslVLF6reDpDFL+cIdoA50QWCfJ+6RlDjehcuiI=;
        b=uS/AgFpz21Ico13GL08QfqxYDIiZxbjAc7X/BHerUeeyP+42NoHWnNotqfcj5jPS5k
         4y80UdnEaAKNNFNJUx71Vej8aeqRIY3RNHEn1x5pB8uGwAYrOfJDewhdldZAsH0+3TeG
         BOdiBQnbMjhQzJY3iJt4rbESZgd3MhysVmHnYe/hI9O0c7UBs0S4ncvKfm71DG6cK+c8
         Gim234FEQ6jlviN9jGR5O+yx1t6sFYAbeQdc8DnRPkKCh2aGNFcUvzDH2LM4n3t/yGeQ
         GeGSZrmeab3NMFwHpHqJhYyqPwP/EUoz23GbCL2XIU7Leeb2I52Ynf3eY5w9ASSw4yX3
         k9rg==
X-Forwarded-Encrypted: i=1; AJvYcCWpCe1NKt38tLVefRbdSERFGDz5lY9gTtRzmBroMIF+qFIITzErP5gI4mSZppN66GziPUIpslOLuJqJ2Ug=@vger.kernel.org
X-Gm-Message-State: AOJu0YwmCVsnbAFe/r/rYa3PFHYvPY1JOYUGrGFQyO2fwsVpXBz8wNNX
	jrZMI9beI6rw3Mtpt06al00HTzHJRDwuyVuvAqYI7PZQ0jSr81nO+INgF+UGcg==
X-Gm-Gg: ASbGncvB7Th1OonnORWmlHeGNMb6frRJPjQhpET/6BwKq6sC/0Bm0dD/0ZvwZmXLehq
	3cC9o85EtgaclS4d4aSgpWQwlIolbenhZH5NsJTJ0L+XorG8TRMd1pIOE2+TruOkDw+n5Ml6afQ
	khNNcpzgzg9lrYbo4WXIap2Xm6J2x9SrVvrpx+PHlu5/XJ5irKafTnf22rOR9cg7B1oq0scwdju
	1HgxHfF5fGFMdUsSvYEdpxO3Zd4+5L6DmM1Ns/lKuBhNMjgKA==
X-Google-Smtp-Source: AGHT+IFgoqrZKvhFYx8F4ky/jD8rqqc8po02xr+Nfsr12XAUBQRr/9Y1O4Z/TTuXQPv0BeKfWTRnsA==
X-Received: by 2002:a05:6a00:1952:b0:724:e160:7f19 with SMTP id d2e1a72fcca58-7257fcbc278mr6472126b3a.22.1733293923946;
        Tue, 03 Dec 2024 22:32:03 -0800 (PST)
Received: from google.com ([2401:fa00:8f:203:f520:3e:d9a1:1de])
        by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-7258b2e79d3sm917423b3a.67.2024.12.03.22.32.01
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 03 Dec 2024 22:32:03 -0800 (PST)
Date: Wed, 4 Dec 2024 15:31:58 +0900
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>,
	Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Cc: Vikash Garodia <quic_vgarodia@quicinc.com>,
	Dikshita Agarwal <quic_dikshita@quicinc.com>,
	linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: Re: [PATCHv6 2/3] media: venus: sync with threaded IRQ during inst
 destruction
Message-ID: <20241204063158.GG886051@google.com>
References: <20241025165656.778282-1-senozhatsky@chromium.org>
 <20241025165656.778282-3-senozhatsky@chromium.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20241025165656.778282-3-senozhatsky@chromium.org>

On (24/10/26 01:56), Sergey Senozhatsky wrote:
> BUG: KASAN: slab-use-after-free in vb2_queue_error+0x80/0x90
> Call trace:
> dump_backtrace+0x1c4/0x1f8
> show_stack+0x38/0x60
> dump_stack_lvl+0x168/0x1f0
> print_report+0x170/0x4c8
> kasan_report+0x94/0xd0
> __asan_report_load2_noabort+0x20/0x30
> vb2_queue_error+0x80/0x90
> venus_helper_vb2_queue_error+0x54/0x78
> venc_event_notify+0xec/0x158
> hfi_event_notify+0x878/0xd20
> hfi_process_msg_packet+0x27c/0x4e0
> venus_isr_thread+0x258/0x6e8
> hfi_isr_thread+0x70/0x90
> venus_isr_thread+0x34/0x50
> irq_thread_fn+0x88/0x130
> irq_thread+0x160/0x2c0
> kthread+0x294/0x328
> ret_from_fork+0x10/0x20
> 
> Allocated by task 20291:
> kasan_set_track+0x4c/0x80
> kasan_save_alloc_info+0x28/0x38
> __kasan_kmalloc+0x84/0xa0
> kmalloc_trace+0x7c/0x98
> v4l2_m2m_ctx_init+0x74/0x280
> venc_open+0x444/0x6d0
> v4l2_open+0x19c/0x2a0
> chrdev_open+0x374/0x3f0
> do_dentry_open+0x710/0x10a8
> vfs_open+0x88/0xa8
> path_openat+0x1e6c/0x2700
> do_filp_open+0x1a4/0x2e0
> do_sys_openat2+0xe8/0x508
> do_sys_open+0x15c/0x1a0
> __arm64_sys_openat+0xa8/0xc8
> invoke_syscall+0xdc/0x270
> el0_svc_common+0x1ec/0x250
> do_el0_svc+0x54/0x70
> el0_svc+0x50/0xe8
> el0t_64_sync_handler+0x48/0x120
> el0t_64_sync+0x1a8/0x1b0
> 
> Freed by task 20291:
>  kasan_set_track+0x4c/0x80
>  kasan_save_free_info+0x3c/0x60
>  ____kasan_slab_free+0x124/0x1a0
>  __kasan_slab_free+0x18/0x28
>  __kmem_cache_free+0x134/0x300
>  kfree+0xc8/0x1a8
>  v4l2_m2m_ctx_release+0x44/0x60
>  venc_close+0x78/0x130 [venus_enc]
>  v4l2_release+0x20c/0x2f8
>  __fput+0x328/0x7f0
>  ____fput+0x2c/0x48
>  task_work_run+0x1e0/0x280
>  get_signal+0xfb8/0x1190
>  do_notify_resume+0x34c/0x16a8
>  el0_svc+0x9c/0xe8
>  el0t_64_sync_handler+0x48/0x120
>  el0t_64_sync+0x1a8/0x1b0

[..]

> @@ -1750,10 +1750,20 @@ static int vdec_close(struct file *file)
>  	vdec_pm_get(inst);
>  
>  	cancel_work_sync(&inst->delayed_process_work);
> +	/*
> +	 * First, remove the inst from the ->instances list, so that
> +	 * to_instance() will return NULL.
> +	 */
> +	hfi_session_destroy(inst);
> +	/*
> +	 * Second, make sure we don't have IRQ/IRQ-thread currently running
> +	 * or pending execution, which would race with the inst destruction.
> +	 */
> +	synchronize_irq(inst->core->irq);
> +
>  	v4l2_m2m_ctx_release(inst->m2m_ctx);
>  	v4l2_m2m_release(inst->m2m_dev);
>  	ida_destroy(&inst->dpb_ids);
> -	hfi_session_destroy(inst);
>  	v4l2_fh_del(&inst->fh);
>  	v4l2_fh_exit(&inst->fh);
>  	vdec_ctrl_deinit(inst);

Sorry about the news, I got a regression report this morning and the reporter
points at this patch as the culprit.  It sees that under some circumstances
at close() there are still multiple pending requests, so hfi_session_destroy()
performed as the first step (in order to close race condition with
instance destruction) makes it impossible to finish some of those pending
requests ("no valid instance", which was the whole point).

v4l2_m2m_ctx_release() expects to be called before hfi_session_destroy(),
but this leaves us with half destroyed instance on the instances list.
Not sure what to do about it.  Would it be safe (?) to call synchronize_irq()
at the very start and then keep the old destruction order (which looks a
little unsafe) like something below *.  Or is there something missing in the
driver and there is a way to make sure we don't have any pending
jobs/requests at close()?


* something below
---
        /*
         * Make sure we don't have IRQ/IRQ-thread currently running
         * or pending execution, which would race with the inst destruction.
         */
        synchronize_irq(inst->core->irq);
        /*
         * Now wait for jobs to be dequeued. Note this will free m2m ctx.
         */
        v4l2_m2m_ctx_release(inst->m2m_ctx);
        v4l2_m2m_release(inst->m2m_dev);
        hfi_session_destroy(inst);
        v4l2_fh_del(&inst->fh);
        v4l2_fh_exit(&inst->fh);
        v4l2_ctrl_handler_free(&inst->ctrl_handler);

        mutex_destroy(&inst->lock);
        mutex_destroy(&inst->ctx_q_lock);