From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A760FC433E0 for ; Tue, 12 Jan 2021 06:37:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7B63622CA2 for ; Tue, 12 Jan 2021 06:37:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730806AbhALGhF (ORCPT ); Tue, 12 Jan 2021 01:37:05 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:53772 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729223AbhALGhF (ORCPT ); Tue, 12 Jan 2021 01:37:05 -0500 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 10C6XpON107042; Tue, 12 Jan 2021 01:35:47 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : subject : from : reply-to : to : cc : date : in-reply-to : references : content-type : mime-version : content-transfer-encoding; s=pp1; bh=7ICjcjyZoP36IcIqAtmsWOjw8Be2kuyorFnzhFljoHU=; b=TxDSP2nKH9h/7nEwIsELkW4NCaoGs4Qxlas2oN2Iz5FIHX3yCaCx+aflGqSWfqL/Aidc T9C1rBMu13/3PAzO2wtI8iknErkZkLccrwR2IgRK7Mj+4ML3UMQ6HNCUAz5A6GhZkVnm nvTqpZbKA+a/j8DsQP0USKLncnvEfYaEe+AoDqqQOMLoN1CSpjqVokvpVL8CNjjnZsq0 muLP8pdbpM447NpHaaYets/bqGIG1Y6jEWkqE2lQhmXkii98ryNoBKXhytu0t9/+oHPa EEgv+dnILB6bNW14nvazcYGiFdpb5rDJaHG14Wq+Xls+UFP2EGG8nXsVB+C6Cf3w/VUj xA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3616c58af9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jan 2021 01:35:47 -0500 Received: from m0098394.ppops.net (m0098394.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 10C6YBgW108844; Tue, 12 Jan 2021 01:35:47 -0500 Received: from ppma01dal.us.ibm.com (83.d6.3fa9.ip4.static.sl-reverse.com [169.63.214.131]) by mx0a-001b2d01.pphosted.com with ESMTP id 3616c58af2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jan 2021 01:35:46 -0500 Received: from pps.filterd (ppma01dal.us.ibm.com [127.0.0.1]) by ppma01dal.us.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 10C6WAR0026671; Tue, 12 Jan 2021 06:35:45 GMT Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19]) by ppma01dal.us.ibm.com with ESMTP id 35y44914gw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jan 2021 06:35:45 +0000 Received: from b03ledav004.gho.boulder.ibm.com (b03ledav004.gho.boulder.ibm.com [9.17.130.235]) by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 10C6Zic18323786 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Jan 2021 06:35:44 GMT Received: from b03ledav004.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 120F678060; Tue, 12 Jan 2021 06:35:44 +0000 (GMT) Received: from b03ledav004.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A4BBE7805C; Tue, 12 Jan 2021 06:35:41 +0000 (GMT) Received: from jarvis.int.hansenpartnership.com (unknown [9.85.180.222]) by b03ledav004.gho.boulder.ibm.com (Postfix) with ESMTP; Tue, 12 Jan 2021 06:35:41 +0000 (GMT) Message-ID: <62b562eae9830830d87ea9f92dcc0018a1935583.camel@linux.ibm.com> Subject: Re: About scsi device queue depth From: James Bottomley Reply-To: jejb@linux.ibm.com To: John Garry , Ming Lei , Bart Van Assche , Hannes Reinecke , Kashyap Desai , "Martin K . Petersen" , Christoph Hellwig , "linux-scsi@vger.kernel.org" , Sathya Prakash , Sreekanth Reddy , Suganath Prabu Subramani , PDL-MPT-FUSIONLINUX Cc: chenxiang Date: Mon, 11 Jan 2021 22:35:40 -0800 In-Reply-To: References: <9ff894da-cf2c-9094-2690-1973cc57835a@huawei.com> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.34.4 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.343,18.0.737 definitions=2021-01-12_03:2021-01-11,2021-01-12 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 priorityscore=1501 spamscore=0 phishscore=0 impostorscore=0 adultscore=0 malwarescore=0 mlxlogscore=999 lowpriorityscore=0 mlxscore=0 clxscore=1031 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2101120035 Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org On Mon, 2021-01-11 at 17:11 +0000, John Garry wrote: > On 11/01/2021 16:40, James Bottomley wrote: > > > So initial sdev queue depth comes from cmd_per_lun by default or > > > manually setting in the driver via scsi_change_queue_depth(). It > > > seems to me that some drivers are not setting this optimally, as > > > above. > > > > > > Thoughts on guidance for setting sdev queue depth? Could blk-mq > > > changed this behavior? > > Hi James, > > > In general, for spinning rust, you want the minimum queue depth > > possible for keeping the device active because merging is a very > > important performance enhancement and once the drive is fully > > occupied simply sending more tags won't improve latency. We used > > to recommend a depth of about 4 for this reason. A co-operative > > device can help you find the optimal by returning QUEUE_FULL when > > it's fully occupied so we have a mechanism to track the queue full > > returns and change the depth interactively. > > > > For high iops devices, these considerations went out of the window > > and it's generally assumed (without varying evidence) the more tags > > the better. > > For this case, it seems the opposite - less is more. And I seem to > be hitting closer to the sweet spot there, with more merges. I think cheaper SSDs have a write latency problem due to erase block issues. I suspect all SSDs have a channel problem in that there's a certain number of parallel channels and once you go over that number they can't actually work on any more operations even if they can queue them. For cheaper (as in fewer channels, and less spare erased block capacity) SSDs there will be a benefit to reducing the depth to some multiplier of the channels (I'd guess 2-4 as the multiplier). When SSDs become write throttled, there may be less benefit to us queueing in the block layer (merging produces bigger packets with lower overhead, but the erase block consumption will remain the same). For the record, the internet thinks that cheap SSDs have 2-4 channels, so that would argue a tag depth somewhere from 4-16 > > SSDs have a peculiar lifetime problem in that when they get > > erase block starved they start behaving more like spinning rust in > > that they reach a processing limit but only for writes, so lowering > > the write queue depth (which we don't even have a knob for) might > > be a good solution. Trying to track the erase block problem has > > been a constant bugbear. > > I am only doing read performance test here, and the disks are SAS3.0 > SSDs HUSMM1640ASS204, so not exactly slow. Possibly ... the stats on most manufacturer SSDs don't give you information about the channels or spare erase blocks. > > I'm assuming you're using spinning rust in the above, so it sounds > > like the firmware in the card might be eating the queue full > > returns. Icould see this happening in RAID mode, but it shouldn't > > happen in jbod mode. > > Not sure on that, but I didn't check too much. I did try to increase > fio queue depth and sdev queue depth to be very large to clobber the > disks, but still nothing. If it's an SSD it's likely not giving the queue full you'd need to get the mid-layer to throttle automatically. James