From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754944AbdEHOxQ (ORCPT ); Mon, 8 May 2017 10:53:16 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:36701 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752756AbdEHOxN (ORCPT ); Mon, 8 May 2017 10:53:13 -0400 Authentication-Results: lightnvm.io; dkim=none (message not signed) header.d=none;lightnvm.io; dmarc=none action=none header.from=fb.com; Subject: Re: Large latency on blk_queue_enter To: =?UTF-8?Q?Javier_Gonz=c3=a1lez?= References: <1656B440-3ECA-4F2B-B95C-418CF0F347E9@lightnvm.io> <20170508122738.GC5696@ming.t460p> <76E35BA3-FEC9-46D6-B36F-554F464FA9ED@lightnvm.io> <661d4b67-cf0c-a703-331b-ce24d75e782d@fb.com> CC: Ming Lei , Christoph Hellwig , "Dan Williams" , , "Linux Kernel Mailing List" , =?UTF-8?Q?Matias_Bj=c3=b8rling?= From: Jens Axboe Message-ID: Date: Mon, 8 May 2017 08:52:54 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [216.160.245.98] X-ClientProxiedBy: CY4PR0101CA0023.prod.exchangelabs.com (10.171.218.36) To CY4PR15MB1189.namprd15.prod.outlook.com (10.172.177.11) X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 4b390fe3-5936-4951-0c6f-08d49621ee2c X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:(22001)(201703131423075)(201703031133081);SRVR:CY4PR15MB1189; X-Microsoft-Exchange-Diagnostics: 1;CY4PR15MB1189;3:9w1ZQo6OKzML2WbiYhBB3bJeuxgtwGLolRNElTCroAFWWUlgBlQF6LMaoQdBa6WB96doPg5+aEH873ONau7A4nwxHge7doRhrNj+o25TaHrhmW6v5UsPwSEfN8QgQv5gmc+hNNQGY5Qegv3P0CQ6O+MKrOm40S0ZlN5bq7spfTJl4NKHRghYFL/c+0h95Vwc2NwZN7kO8rVT6eXaRM6Lxkd8xuy5bYLHNHJBD9cJEJgENUPLAu7y1R5pqaqL+fz6887/y9kP2lcPm19cEqmVnqnQv1XbVf5TVXtUz+1ZT6GqjuDk0JCnFju0UBG/rYYxTPckJE33UdllSNaCPB29Dw==;25:z/cG0jxRAV5mByEfkEXvi4aua0TLCcMSydZqg/0ZLJti3cGGhA2tC4rP3JK51jgsd+hw6fFDDul/6sfJMyWlgZzQIkkLOP7BvgwVfh8j9ZAIXWRL0Aoet367q4W7ynS556i64MJEXPFhRDC/+bjoIjg515osIlHeyO+qsPo7wPeUKL/5CbxuqWU0o/I+CuSF7gm4VbsNpsAJY8o++wb5ygh7sq7jkoAkmqwsUdlvZ94miKUpVGTGm+rNJaqO3gyQ9WFge1QrG4JPNmhg4vQG/eTfhXr3nRStFJAPPCScMD1EZ8PVpzX2bw/iDQUMRRCOiOu6QsLKxDvmn/FNMZQbcRIsvZrROQeDO2vN2R1+6Cl+mj4K+4Vw9dsbFVBNK6w0YEx0l20bVx85FhEzkkSpLW5QEYzlthr2nCDY5s8UtNZp9GsICsw2p4Y5du8YjLdChTsDNKwHu+yka2ok+ehpomSffBO5W0l8c0n6GdBnn7U= X-Microsoft-Exchange-Diagnostics: 1;CY4PR15MB1189;31:gLr0CGRUQ5TeR8zjhfg/QXBjEduCdKwV1aCewyd28lal5o/0U8Jq7uG7XTR/WlJ265MtQ9SFy1wbMymL/VZKmvIL0rn/F3qeIl0AXrw/dE43rHDVb152VDAi0RQzjGc7MCjpPxksaBK/OCHXaS0Hm8TQosNYZjUNqFAfaJrtgLpFVi8mXTEb+fGF9iUCD/55TSkLO49hW8jiytnXxlDxltQiXGSpVvTEt+eRca+9oUFXEYbSLZWhyEchDE4Q2wOCWES3REk6I/FAwNnenPFv45pASsX8rjAUOB/eFkprCxY=;20:/Gemqjq60x/aUs+UVOcMqpik3Rg5y1W8OPpd7scIye+wx+BoCCuO/AOh9pb5v89c/4tf1HhBzK4yDhhdY/5cl20KtmYWn0YnarSI5QDbxKeNO6rqTUgg92lUgfRvHrPs64CIrN6PZJbumGdvMg8tOqjhQSsXfoYvRDmCk3LFuKWwe8AhgwGMkius6VIjczUW40szyewgoV0Db2ISRJ0xDJXwZ20GzwPwy7q8ptfkMd0juh9fD1C+NVY/+EG5AbrbxMJWb6sXqszdZ0+VRDrhW/u1Lu6CMS79vKvfrGcnPneCkkUAAaw+5BBU/4RBROJa2wZzaJwB5yF1ASluCb2W1PcRrbG6Brxp19hAptnMMZSPrPl98GOGIa6+FkxmbTkIvPIwJ7HLuQvtL+aROhJvCKLPhZ4hnietzC6/CWiA3bqxqQzf96QcV15O5HqK/+HcY2UkA9EdZublp9W9lMQDdKLLkUzFnMgdcTDeG19wcuOX3XNVpqC8nq8cicnscQGK X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(67672495146484); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040450)(601004)(2401047)(8121501046)(5005006)(93006095)(93001095)(10201501046)(3002001)(6041248)(20161123564025)(20161123555025)(20161123562025)(20161123558100)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123560025)(6072148);SRVR:CY4PR15MB1189;BCL:0;PCL:0;RULEID:;SRVR:CY4PR15MB1189; X-Microsoft-Exchange-Diagnostics: 1;CY4PR15MB1189;4:vZiFU+PzJI2+WqiqjyIpkNSFvLsXNuQnux1Y/msdqK3vlR5okDTZwsvTWo9+iEUvYpkhUj4CQj9ARk0qoSytle8K/n/o9ge6tMTg9iU/42+oDE0shnRWfZeK2axMmSe7S/aEQGtmxLz45yol0le+ibIhXuEiWQsvk6CYJAteqpzZdVFSqLt3HQhutw0JTlMFXms8JF6Lcuqp9vAEtcmsgiaAfdp3K/7peUq/xIZKyJoR4bXgAjNjAIOB0KDw8vTcsz3sr3dldSmdL1konu/GxgPoZxWMf4U4TALIY/bNgG10ROkibj6dp6I68ygCpHjguhdi3zyBtOPceInK3n1Q6CADpAzcTG6nomTV84jzwlboFyH6qxAaw0PCxZASKZOVz2hGiCnHpwgWVU3g/7NdoiLerG/pR09pKnxSLDLx6shO5cZtiwXTJDNZwA8tetdZQAeEYyiUBx0QlWJJRh+Gts5nPNnnqE2o66DMvculZ4oyEDUds6oDPgIE1shaInLfILaKNq06WKgcepJF8VN3E4qBxuc4zUdwIBVdRQs+P74jUqojiXgLG8xPhjwfL3jx7eNxfB8dOpU6rzvBDIDY10ONo6ceG10Vy70/ZSn9lttRkQbHJ9MztI+cP/CXtPX3AQ6qqJs56cxHdAaY1+iwjCwKP0JxAx/gAdXiTFooZCm8iep6vnb55eQLXI2bXGYlwngIPamWPadsVnojkSeZRhHOT02fAegiU/XCDE9qAJmrcExPrlhS2zXPcnyR9KLPs1XkUAZhcLuwx2un9qlZIOtqFdPoOQL29BE+5MaiJRfSLvFo+mQLjVAFke2e9nQTdZHzQFpHTrMOqjYgQJwMcA== X-Forefront-PRVS: 0301360BF5 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(4630300001)(6009001)(6049001)(39450400003)(39840400002)(39850400002)(39400400002)(39410400002)(377454003)(24454002)(36756003)(5660300001)(6246003)(38730400002)(93886004)(110136004)(65806001)(65956001)(42186005)(77096006)(478600001)(90366009)(229853002)(33646002)(2870700001)(6916009)(7736002)(2950100002)(6116002)(3846002)(50986999)(23676002)(53936002)(2906002)(305945005)(54356999)(76176999)(66066001)(50466002)(117156002)(65826007)(47776003)(6486002)(4001350100001)(6666003)(189998001)(64126003)(8676002)(4326008)(86362001)(575784001)(81166006)(25786009)(31696002)(83506001)(53546009);DIR:OUT;SFP:1102;SCL:1;SRVR:CY4PR15MB1189;H:[192.168.1.154];FPR:;SPF:None;MLV:sfv;LANG:en; X-Microsoft-Exchange-Diagnostics: =?utf-8?B?MTtDWTRQUjE1TUIxMTg5OzIzOjlROGxzOUlCM1EyVzFwWUF4NW5OSGdXblZv?= =?utf-8?B?WWxZUUFhcEZrdzFoU2RBSXZYNlhaaDhWSkVyV0FxUytDQTgvaWNkVEplTDdh?= =?utf-8?B?TGZVL2JZaUxHTVJobEZiZG82bVlYU3dqNXZ3T01UMmsxT2s2Q2NuWFVxQk1E?= =?utf-8?B?NVJReVpZbnZyRXYwcDFYaFNwK0w5NmdwNGdVS1M1ZU1uejRPUWd1Zi9GZmhu?= =?utf-8?B?aGtqRG9OdFBvbnB0RmVFbmJUTmR3REd6U2doY2kwR0NOa204ZWlIMVkyR0R2?= =?utf-8?B?bFJNOW5HZXFMdjl0cW55VGNEbG1DSEJXRmNqNTA2eGJ5VWFZZHBHN1dESWdl?= =?utf-8?B?Wm5OcWUrSW5lRmREQnVIR1RrTWo4bnA4ekV2OEZtUG0ydTdyNkJLM004M25W?= =?utf-8?B?Y2N4SE0rNjlGNC80MklhbW5SQVFNVXFST01yLzRBRlRNNHNVTERUS1FUN2Vk?= =?utf-8?B?eHZ1UTQ4bGRUazB5NTgwRjRROElrZzB3ME1VeEJjZXBVR0tpd1hzWnJ6eVd5?= =?utf-8?B?YzBqcWQvQmZ4eVoraXRxMHNYNGc1Wm9hS3BqN3RFaWg4ZUZ6MnhVUVp1L3pm?= =?utf-8?B?UlJpZzhFNlE3b0UrcHZza0NjMVIydlkzckdIcTRpYmYxckRlZzM3WHFuK2lv?= =?utf-8?B?VS9kQ2hncnFRMXcrOE1HMy9VTjh3RVgrQWhoVFRpQWVwazJBVFZab2l4SW9B?= =?utf-8?B?SUVrOGFlM001SUlsblVKcnNlM1NxQkQ1V1YrSXBwOGhHbldMeDRNSGhRTXlJ?= =?utf-8?B?eFhIcGVVNkJJWTJmbCtiODIvc3VlVUFzQnhhajcrV0ZKSzZMTVhXTnZTQytS?= =?utf-8?B?ckFPQnZQV1hLYk5TZWNsd09PQ1JRSVpTTit2RDJ1T3Z3SzNyN2pnYkJVOHV4?= =?utf-8?B?S25KTU02M1hXZ2NwS0haM2pEVTdJUnlEZk1qNFNxdGlEU2JTNmpxVlh3TjVF?= =?utf-8?B?UC9QTmt5MThmVXNMMWhxQ2t0bDhvMlJzYXVXZm5oZjc4UE9WT3c3alNydm1M?= =?utf-8?B?MkFVQ2QzbGwzY0E2OWpmc2Y5Q29vV3g1KzZVb3U3TUFVUTlFUDEzZkt3ZG9S?= =?utf-8?B?bm81TW4vSGZjaExRZzN2MjJNRnhCTklMeWVUdmhhdk9yaW5tV3FVeEI2d0hI?= =?utf-8?B?elV4Nll2aVlXYTZ6OGduT3ZTMGkwd05RZ0UxMmhNaThpRVVCQzYrZXF3NGlO?= =?utf-8?B?T2lmUExVeHpxaUgvTmdIYVk3bjlGb1phWXJwSHRFU1BlUmxUbnFpZzY2Umc0?= =?utf-8?B?dUVOd2hsNFc3TlM3eTBycTNTOEFRMFRDb0VQR1BBekI4UC9IYnNXYTNFbzNB?= =?utf-8?B?Y2dpbEV4VnJmTG1zK1hhdGNwT2hJREtzYWhCWC9VN2FlN0RwakM2SkhjV0Jr?= =?utf-8?B?ZXI3TGowdnc5dDN6VjF1SURUbWk3Uk8vL2VwOFRLVFRYMUE1WGtrd0lTcVly?= =?utf-8?B?Z0FnNStxcWdSYUJlWGQveDZkUUpkL2dDbldFVW1ZdWpMSTRiNEZLRVVNSURN?= =?utf-8?B?eGZyUVUxdE96Z2hCaGw5NGhFMjhUek9Xc0pVZGxOWWFXbG1UQnhmRnAwMFdv?= =?utf-8?B?R2lLcytlMndvNDVSYzFVVXl1Z2tXWGpKTFhyWnVTbHdqUVQ5cXl0by95dHNS?= =?utf-8?B?czhFUmhYKzJxYVBnVytBeUJWdEdCZ1Q1N0c2UmhoV0dBcHVZZDNxaXJqcFFZ?= =?utf-8?B?VTRyTHFocjVpZXlXOGZkVHhONVJyckZOYmh2M2hpalNMUVB1VjdRU0x1VVU5?= =?utf-8?B?UzZmN283L2M5RHgxd1BKQlZ3SUJlQUFkN1ZBTXVNdGxpRjF1bzNZcDUwMWJC?= =?utf-8?B?N3haRG1aSnp4WnZSTFVJd2FvSkRVaDdUSllRWnNLajZOUjIvc1VITFNPRVEy?= =?utf-8?Q?VqP6T4Nv0ZM=3D?= X-Microsoft-Exchange-Diagnostics: 1;CY4PR15MB1189;6:0QgnlTg0RsdRKfCrD1pDEw7wMwc+CQxuuUSMCHb2MtfvDNsv1s8mLet+ZmukZEcN2vWkaltKpjzeCt3xh9eQaFVHmiOrsNe1UHGLBPhqG7tvc8xGpwasNmUoC7r5yi9WY3gfn2IwKvRlw+lcEENcFXYQr9yx9li3aMRCJtr3UN54uyrgSPSqhrWDtZyrULb4QDpciQI8J2xCmcThP1tj0xkNw88o1Al8Cu7tnZULzBilnes48jdlfK36A92ECSR/cB8i2zMCYNflv+gkK4rjptWv5nKSvK7odbo+LBBzzHCnbi7sQEocGq7e2fB526r0CyxCVMa1Wr0cjJeDQ+HzNvxbrlUXmjtOIJ5drZF59td4kXYQxAtaglJxlXYNnAeU79amAvcST5PrcgoeG0IXNkYzb3t6lyjV3rQzE5g8VqDedUVAtFuCCVcfNFvSDa4eHdtOFlPuSZ1bgzDzcRYjqUnwDo4RLB4QissQkBIQkhZg5Yfp6q5RO59ks/sftUhRdDeSO/M0k3tYnnaI6jVKMQ==;5:ZE8cDLZRQUuSaO0nXMnGQpYdOQiBHdQvEz2XVYSAMZJ21pe2CrJuGSohA8QrKd6mrtVfB9/2nUO2j3HVB9oEVgbpvydaxoIxvEgIMwYj1kwJPT6hdAlG4DskqLQ4jufD9mXcs+UviN1wO0y0qBWapg==;24:G80I1Utxfb8tXMpTbvjmLf5MC0GOc0jTJXKHVGDQddOyAM6amTZJB14pMVoEKqHUvV38QD65YP6qR5aEEYMeHB/n4LcB0XzMLrG/jVWyrGw= SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1;CY4PR15MB1189;7:RMlYqEr1mjrRxIhU7iv2i1TnvmXjzid9pDemDWTAN9s7eMS6VmsIeIuDrchMpHs8yTsH8SaxdNxBaB2JY5+ugiKZynYlZ0XY8VSTZRJWhX5nrin9ZcO/R1AYO/qxWfud6DJnu6/JXE82BdXOLozHgMhedQxcIAqjtXYA6q2SdzWSrl7+Y0qLBkjUPztaHIJdG+SpOqhRUvpuKC0PxE8iSV3I9r3VTZUAhT3Ncw+MQcxUBj0Sv+Dfz9XS7hllvkWmDwiV2gG5ZJl9nyw8sEK0zG3zKCb9zd5XnSekPbHIQASgdR/gcGu+Zda6v3zxH5G2RfgCtqx1BT7u+GC+fS9x/A==;20:I6GtB/Jf4m+7srZ3+ON1A8WtlRCurMFPRVQ4xmvDx1NRL9asnwPR3+ugwI0y6ljnChC8qPGLANLj3NMAzpUknmz3X1Vlrw4wTU5Ayz+JA28vk6H7tGUxhaWUiGZm9TWV+BjUmD9M2aViBgjBM31Ic3cOv/ncPPbGVg4C3YxxXvk= X-MS-Exchange-CrossTenant-OriginalArrivalTime: 08 May 2017 14:53:02.4841 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR15MB1189 X-OriginatorOrg: fb.com X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-05-08_10:,, signatures=0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/08/2017 08:46 AM, Javier González wrote: >> On 8 May 2017, at 16.23, Jens Axboe wrote: >> >> On 05/08/2017 08:20 AM, Javier González wrote: >>>> On 8 May 2017, at 16.13, Jens Axboe wrote: >>>> >>>> On 05/08/2017 07:44 AM, Javier González wrote: >>>>>> On 8 May 2017, at 14.27, Ming Lei wrote: >>>>>> >>>>>> On Mon, May 08, 2017 at 01:54:58PM +0200, Javier González wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I find an unusual added latency(~20-30ms) on blk_queue_enter when >>>>>>> allocating a request directly from the NVMe driver through >>>>>>> nvme_alloc_request. I could use some help confirming that this is a bug >>>>>>> and not an expected side effect due to something else. >>>>>>> >>>>>>> I can reproduce this latency consistently on LightNVM when mixing I/O >>>>>>> from pblk and I/O sent through an ioctl using liblightnvm, but I don't >>>>>>> see anything on the LightNVM side that could impact the request >>>>>>> allocation. >>>>>>> >>>>>>> When I have a 100% read workload sent from pblk, the max. latency is >>>>>>> constant throughout several runs at ~80us (which is normal for the media >>>>>>> we are using at bs=4k, qd=1). All pblk I/Os reach the nvme_nvm_submit_io >>>>>>> function on lightnvm.c., which uses nvme_alloc_request. When we send a >>>>>>> command from user space through an ioctl, then the max latency goes up >>>>>>> to ~20-30ms. This happens independently from the actual command >>>>>>> (IN/OUT). I tracked down the added latency down to the call >>>>>>> percpu_ref_tryget_live in blk_queue_enter. Seems that the queue >>>>>>> reference counter is not released as it should through blk_queue_exit in >>>>>>> blk_mq_alloc_request. For reference, all ioctl I/Os reach the >>>>>>> nvme_nvm_submit_user_cmd on lightnvm.c >>>>>>> >>>>>>> Do you have any idea about why this might happen? I can dig more into >>>>>>> it, but first I wanted to make sure that I am not missing any obvious >>>>>>> assumption, which would explain the reference counter to be held for a >>>>>>> longer time. >>>>>> >>>>>> You need to check if the .q_usage_counter is working at atomic mode. >>>>>> This counter is initialized as atomic mode, and finally switchs to >>>>>> percpu mode via percpu_ref_switch_to_percpu() in blk_register_queue(). >>>>> >>>>> Thanks for commenting Ming. >>>>> >>>>> The .q_usage_counter is not working on atomic mode. The queue is >>>>> initialized normally through blk_register_queue() and the counter is >>>>> switched to percpu mode, as you mentioned. As I understand it, this is >>>>> how it should be, right? >>>> >>>> That is how it should be, yes. You're not running with any heavy >>>> debugging options, like lockdep or anything like that? >>> >>> No lockdep, KASAN, kmemleak or any of the other usual suspects. >>> >>> What's interesting is that it only happens when one of the I/Os comes >>> from user space through the ioctl. If I have several pblk instances on >>> the same device (which would end up allocating a new request in >>> parallel, potentially on the same core), the latency spike does not >>> trigger. >>> >>> I also tried to bind the read thread and the liblightnvm thread issuing >>> the ioctl to different cores, but it does not help... >> >> How do I reproduce this? Off the top of my head, and looking at the code, >> I have no idea what is going on here. > > Using LightNVM and liblightnvm [1] you can reproduce it by: > > 1. Instantiate a pblk instance on the first channel (luns 0 - 7): > sudo nvme lnvm create -d nvme0n1 -n test0 -t pblk -b 0 -e 7 -f > 2. Write 5GB to the test0 block device with a normal fio script > 3. Read 5GB to verify that latencies are good (max. ~80-90us at bs=4k, qd=1) > 4. Re-run 3. and in parallel send a command through liblightnvm to a > different channel. A simple command is an erase (erase block 900 on > channel 2, lun 0): > sudo nvm_vblk line_erase /dev/nvme0n1 2 2 0 0 900 > > After 4. you should see a ~25-30ms latency on the read workload. > > I tried to reproduce the ioctl in a more generic way to reach > __nvme_submit_user_cmd(), but SPDK steals the whole device. Also, qemu > is not reliable for this kind of performance testing. > > If you have a suggestion on how I can mix an ioctl with normal block I/O > read on a standard NVMe device, I'm happy to try it and see if I can > reproduce the issue. Just to rule out this being any hardware related delays in processing IO: 1) Does it reproduce with a simpler command, anything close to a no-op that you can test? 2) What did you use to time the stall being blk_queue_enter()? -- Jens Axboe