Date: Wed, 30 Mar 2022 21:48:30 +0800
From: Ming Lei
To: James Bottomley
Cc: John Garry, Andrea Righi, Martin Wilck, Bart Van Assche,
    "Martin K. Petersen", linux-scsi@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's
    budget map on queue depth change"
References: <08717833-19bb-8aaa-4f24-2989a9f56cd3@huawei.com>
    <263108383b1c01cf9237ff2fcd2e97a482eff83e.camel@linux.ibm.com>
In-Reply-To: <263108383b1c01cf9237ff2fcd2e97a482eff83e.camel@linux.ibm.com>

On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
> > On 30/03/2022 12:21, Andrea Righi wrote:
> > > On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> > > > On 30/03/2022 11:11, Andrea Righi wrote:
> > > > > Hello,
> > > > >
> > > > > after this commit I'm experiencing some filesystem corruptions
> > > > > at boot on a power9 box with an aacraid controller.
> > > > >
> > > > > At the moment I'm running a 5.15.30 kernel; when the filesystem
> > > > > is mounted at boot I see the following errors in the console:
> >
> > About "scsi: core: Reallocate device's budget map on queue depth
> > change" being added to a stable kernel, I am not sure if this was
> > really a fix or just a memory optimisation.
>
> I can see how it becomes the problem: it frees and allocates a new
> bitmap across a queue freeze, but bits in the old one might still be in
> use. This isn't a problem except when they return and we now possibly
> see a tag greater than we think we can allocate coming back.
> Presumably we don't check this and we end up doing a write to
> unallocated memory.
>
> I think if you want to reallocate on queue depth reduction, you might
> have to drain the queue as well as freeze it.

Once the queue is frozen, there can't be any in-flight request/SCSI
command, so the sbitmap is all zeroes at that point and is safe to
reallocate.

The problem is aacraid-specific, since the driver has a hard limit of
256 on queue depth; see aac_change_queue_depth().

Thanks,
Ming
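
Illustration only, not part of the original message: the invariant
discussed above (a budget map may be swapped for one of a different size
only while no token is outstanding) can be sketched as a toy user-space
model. This is not kernel code and does not use the kernel's sbitmap
API; struct budget_map and every budget_* function below are
hypothetical names invented for the sketch, and the "frozen" state is
modelled simply as "zero tokens in flight".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a per-device budget map: one flag per budget token.
 * Hypothetical type, not the kernel's struct sbitmap. */
struct budget_map {
	bool *in_use;
	unsigned int depth;
};

/* Allocate a map with 'depth' free tokens; returns 0 on success. */
static int budget_map_init(struct budget_map *m, unsigned int depth)
{
	if (depth == 0)
		return -1;
	m->in_use = calloc(depth, sizeof(*m->in_use));
	if (!m->in_use)
		return -1;
	m->depth = depth;
	return 0;
}

/* Take the lowest free token, or return -1 if none is free. */
static int budget_get(struct budget_map *m)
{
	for (unsigned int i = 0; i < m->depth; i++) {
		if (!m->in_use[i]) {
			m->in_use[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Give a token back. Returns false when the token lies outside the
 * current map, which is the "tag greater than we think we can
 * allocate" case raised in the quoted text. */
static bool budget_put(struct budget_map *m, int token)
{
	if (token < 0 || (unsigned int)token >= m->depth)
		return false;
	m->in_use[token] = false;
	return true;
}

/* Count tokens currently handed out. */
static unsigned int budget_in_flight(const struct budget_map *m)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < m->depth; i++)
		n += m->in_use[i] ? 1 : 0;
	return n;
}

/* Replace the map with one of 'new_depth' tokens. The swap is refused
 * unless nothing is in flight, which is the model's stand-in for
 * "the queue is frozen". */
static int budget_map_resize(struct budget_map *m, unsigned int new_depth)
{
	bool *fresh;

	if (new_depth == 0 || budget_in_flight(m) != 0)
		return -1;
	fresh = calloc(new_depth, sizeof(*fresh));
	if (!fresh)
		return -1;
	free(m->in_use);
	m->in_use = fresh;
	m->depth = new_depth;
	return 0;
}
```

In this model, a resize attempted while two tokens are outstanding is
refused; once both are returned the map is all zeroes and the resize
goes through, and a stale token index beyond the new depth is rejected
by budget_put() instead of being written through.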