From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E918C43219 for ; Mon, 29 Apr 2019 18:16:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E858D2075E for ; Mon, 29 Apr 2019 18:16:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728940AbfD2SQk (ORCPT ); Mon, 29 Apr 2019 14:16:40 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:46815 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728748AbfD2SQk (ORCPT ); Mon, 29 Apr 2019 14:16:40 -0400 Received: by mail-pl1-f194.google.com with SMTP id bi2so664627plb.13; Mon, 29 Apr 2019 11:16:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:mime-version:content-transfer-encoding; bh=aLkZ493/1cphoGwpLIb9H6qasE7OtGYtVCVCps4qXrE=; b=GMgnYuebgPvTcDZfKu1oiDfjT9Cr/8GS7nbUkYUo7edo0jBZ9y6tQD4ZT+7ALW8fvO hjlDHcZh5eSmUp4OL1fV5RdAI2CIL+JBEiGKe03m4OSunGl7+OmUMOpq+gFdh4YfWbof 2htMal6QOv6GayZoEhBigogFJCWGNMsUgEcasXMkCW15Ky+yEkEvV0Uqymk7FIjOWI/u 25tEb2V7t1EctnVpq2SPHE2X2LJWEaL9yGhUjn+1EeFd2LcvbQqhbLYJyFcGbuMGu9vc f41LW2XFicz/6omOPjHB4m2c5p4V0bbekO6Iv7j8iq/2ftbb2rV3srEA7MxOOMmqlYiO aV5w== X-Gm-Message-State: APjAAAUuuEu+T9rrdaa6Bc8gR5JVROXbjh3XS2RB8NdtwdGJCw0a1Kau T29xojDjM8lh1Vhk6D+BXYCeBkFF X-Google-Smtp-Source: APXvYqwaDN3hsVmsoZIUFFdHMxshS0Ml2pvXLwfgKp2v4Jta1b0Rlg3VKQSEOvS12SWIGVS4Gp4+bw== X-Received: by 2002:a17:902:5910:: with SMTP id o16mr9150656pli.289.1556561799622; Mon, 29 Apr 2019 11:16:39 -0700 (PDT) Received: from ?IPv6:2620:15c:2cd:203:5cdc:422c:7b28:ebb5? ([2620:15c:2cd:203:5cdc:422c:7b28:ebb5]) by smtp.gmail.com with ESMTPSA id d15sm11992646pfr.179.2019.04.29.11.16.38 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 29 Apr 2019 11:16:38 -0700 (PDT) Message-ID: <1556561798.161891.166.camel@acm.org> Subject: Re: [PATCH V4 3/3] scsi: core: avoid to pre-allocate big chunk for sg list From: Bart Van Assche To: Ming Lei , James Bottomley , linux-scsi@vger.kernel.org, "Martin K . Petersen" Cc: linux-block@vger.kernel.org, Christoph Hellwig , "Ewan D . Milne" , Hannes Reinecke Date: Mon, 29 Apr 2019 11:16:38 -0700 In-Reply-To: <20190428073932.9898-4-ming.lei@redhat.com> References: <20190428073932.9898-1-ming.lei@redhat.com> <20190428073932.9898-4-ming.lei@redhat.com> Content-Type: text/plain; charset="UTF-7" X-Mailer: Evolution 3.26.2-1 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On Sun, 2019-04-28 at 15:39 +-0800, Ming Lei wrote: +AD4 Now scsi+AF8-mq+AF8-setup+AF8-tags() pre-allocates a big buffer for IO sg list, +AD4 and the buffer size is scsi+AF8-mq+AF8-sgl+AF8-size() which depends on smaller +AD4 value between shost-+AD4-sg+AF8-tablesize and SG+AF8-CHUNK+AF8-SIZE. +AD4 +AD4 Modern HBA's DMA is often capable of deadling with very big segment +AD4 number, so scsi+AF8-mq+AF8-sgl+AF8-size() is often big. Suppose the max sg number +AD4 of SG+AF8-CHUNK+AF8-SIZE is taken, scsi+AF8-mq+AF8-sgl+AF8-size() will be 4KB. +AD4 +AD4 Then if one HBA has lots of queues, and each hw queue's depth is +AD4 high, pre-allocation for sg list can consume huge memory. +AD4 For example of lpfc, nr+AF8-hw+AF8-queues can be 70, each queue's depth +AD4 can be 3781, so the pre-allocation for data sg list is 70+ACo-3781+ACo-2k +AD4 +AD0-517MB for single HBA. +AD4 +AD4 There is Red Hat internal report that scsi+AF8-debug based tests can't +AD4 be run any more since legacy io path is killed because too big +AD4 pre-allocation. +AD4 +AD4 So switch to runtime allocation for sg list, meantime pre-allocate 2 +AD4 inline sg entries. This way has been applied to NVMe PCI for a while, +AD4 so it should be fine for SCSI too. Also runtime sg entries allocation +AD4 has verified and run always in the original legacy io path. +AD4 +AD4 Not see performance effect in my big BS test on scsi+AF8-debug. Reviewed-by: Bart Van Assche +ADw-bvanassche+AEA-acm.org+AD4