From mboxrd@z Thu Jan  1 00:00:00 1970
From: Filipp Mikoian
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-aio@kvack.org
Cc: axboe@kernel.dk, jmoyer@redhat.com, viro@zeniv.linux.org.uk
Subject: io_uring question
Date: Thu, 4 Jul 2019 11:21:57 +0000
Message-ID: <76524892f9c048ea88c7d87295ec85ae@acronis.com>

Hi dear io_uring developers,

Recently I started playing with io_uring, and the main difference I
expected to see compared to the old AIO (io_submit(), etc.) was that the
submission syscall (io_uring_enter()) does not block when submission may
take a long time, e.g. when it has to wait for a slot in a block device's
request queue. AFAIU, the 'workers' machinery is there solely to submit
requests in an async context and thus not force the calling thread to
block for a significant time; at worst, EAGAIN is expected.

However, when I installed a fresh 5.2.0-rc7 kernel on a machine with an
HDD with a 64-requests-deep queue, I noticed a significant increase in the
time spent in io_uring_enter() once the request queue became full. Below
you can find the output of a program that submits random (within a 1 GB
range) 4K read requests in batches of 32. Though O_DIRECT is used, the
same phenomenon is observed when using the page cache. The source code can
be found here: https://github.com/Phikimon/io_uring_question
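For convenience, the core of the test program looks roughly like the
following (a simplified liburing sketch, not the exact code from the
repository above; BATCH, RANGE, and the timing helper are my shorthand, and
completions are deliberately never reaped so that the request queue fills
up):

/* Simplified sketch of the benchmark loop; not the exact repository code. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <liburing.h>

#define BATCH      32
#define BLOCK_SIZE 4096
#define RANGE      (1ULL << 30)   /* 1 GB range of random read offsets */

static long long now_us(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000000LL + tv.tv_usec;
}

int main(int argc, char **argv)
{
        struct io_uring ring;
        struct iovec iov[BATCH];
        int fd, i, batch, submitted = 0;

        if (argc < 2)
                return 1;
        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0 || io_uring_queue_init(256, &ring, 0) < 0)
                return 1;

        for (i = 0; i < BATCH; i++) {
                if (posix_memalign(&iov[i].iov_base, BLOCK_SIZE, BLOCK_SIZE))
                        return 1;
                iov[i].iov_len = BLOCK_SIZE;
        }

        for (batch = 0; batch < 7; batch++) {
                for (i = 0; i < BATCH; i++) {
                        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                        off_t off = (random() % (RANGE / BLOCK_SIZE)) * BLOCK_SIZE;

                        if (!sqe)
                                break;
                        io_uring_prep_readv(sqe, fd, &iov[i], 1, off);
                }

                /* Only the submission syscall is timed; completions are never
                 * reaped, so the device queue fills up after two batches. */
                long long start = now_us();
                int ret = io_uring_submit(&ring);
                long long elapsed = now_us() - start;

                printf("submitted_already = %3d, submitted_now = %d, submit_time = %6lld us\n",
                       submitted, ret, elapsed);
                if (ret > 0)
                        submitted += ret;
        }
        return 0;
}

Compile with -luring; the submit_time column in the output below is simply
the elapsed time around io_uring_submit().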
While analyzing a stack dump, I found out that having IOCB_NOWAIT set does
not prevent generic_file_read_iter() from calling blkdev_direct_IO(), so
the thread gets stuck for hundreds of milliseconds. However, I am not a
Linux kernel expert, so I cannot be sure this is actually related to the
issue mentioned above.

Is it actually expected that io_uring sleeps when there is no free slot in
the block device's request queue, or is this a bug in the current
implementation?

root@localhost:~/io_uring# uname -msr
Linux 5.2.0-rc7 x86_64
root@localhost:~/io_uring# hdparm -I /dev/sda | grep Model
	Model Number:       Hitachi HTS541075A9E680
root@localhost:~/io_uring# cat /sys/block/sda/queue/nr_requests
64
root@localhost:~/io_uring# ./io_uring_read_blkdev /dev/sda8
submitted_already =   0, submitted_now = 32, submit_time =    246 us
submitted_already =  32, submitted_now = 32, submit_time =    130 us
submitted_already =  64, submitted_now = 32, submit_time = 189548 us
submitted_already =  96, submitted_now = 32, submit_time = 121542 us
submitted_already = 128, submitted_now = 32, submit_time = 128314 us
submitted_already = 160, submitted_now = 32, submit_time = 136345 us
submitted_already = 192, submitted_now = 32, submit_time = 162320 us
root@localhost:~/io_uring# cat pstack_output # This is where the process slept
[<0>] io_schedule+0x16/0x40
[<0>] blk_mq_get_tag+0x166/0x280
[<0>] blk_mq_get_request+0xde/0x380
[<0>] blk_mq_make_request+0x11e/0x5b0
[<0>] generic_make_request+0x191/0x3c0
[<0>] submit_bio+0x75/0x140
[<0>] blkdev_direct_IO+0x3f8/0x4a0
[<0>] generic_file_read_iter+0xbf/0xdc0
[<0>] blkdev_read_iter+0x37/0x40
[<0>] io_read+0xf6/0x180
[<0>] __io_submit_sqe+0x1cd/0x6a0
[<0>] io_submit_sqe+0xea/0x4b0
[<0>] io_ring_submit+0x86/0x120
[<0>] __x64_sys_io_uring_enter+0x241/0x2d0
[<0>] do_syscall_64+0x60/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff
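As a side note, the same NOWAIT semantics can be probed without io_uring at
all, e.g. with preadv2() and RWF_NOWAIT (which maps to IOCB_NOWAIT inside
the kernel). A minimal sketch, independent of the repository above and
purely illustrative:

/* Probe RWF_NOWAIT behavior on a block device opened with O_DIRECT. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008   /* fallback; value from include/uapi/linux/fs.h */
#endif

int main(int argc, char **argv)
{
        struct iovec iov;
        ssize_t ret;
        int fd;

        if (argc < 2)
                return 1;
        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0)
                return 1;

        if (posix_memalign(&iov.iov_base, 4096, 4096))
                return 1;
        iov.iov_len = 4096;

        /* With RWF_NOWAIT the read is supposed to fail with EAGAIN instead of
         * sleeping whenever it cannot be served immediately. */
        ret = preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
        if (ret < 0)
                printf("preadv2: %s\n", strerror(errno));
        else
                printf("preadv2 returned %zd bytes\n", ret);
        return 0;
}

If the NOWAIT hint were honored all the way down the block layer, this call
would be expected to fail with EAGAIN rather than sleep while the request
queue is full.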
P.S. There are also several suspicious places in liburing and in io_uring's
kernel part. I am not sure whether these are really bugs, so please point
out if any of them needs a fixing patch. Among them:

1. Inaccurate handling of errors in liburing's __io_uring_submit(). Because
   liburing currently does not look at the queue head that the kernel sets,
   it cannot know how many entries have actually been consumed. If, for
   example, io_uring_enter() returns EAGAIN and consumes none of the SQEs,
   sq->sqe_head still advances in __io_uring_submit(); this can eventually
   cause both io_uring_submit() and io_uring_get_sqe() to return 0 forever.

2. There is also a related issue: when using IORING_SETUP_SQPOLL, in case
   the polling kernel thread has already gone to sleep (IORING_SQ_NEED_WAKEUP
   is set), io_uring_enter() just wakes it up and immediately reports that
   all @to_submit requests were consumed, while this is not true until the
   awakened thread actually handles them. At the very least this contradicts
   the man page, which states:

   > When the system call returns that a certain amount of SQEs have been
   > consumed and submitted, it's safe to reuse SQE entries in the ring.

   It is easy to reproduce this bug -- just change e.g. the ->off field in
   the SQE immediately after io_uring_enter() successfully returns, and you
   will see that the IO happened at the new offset (a minimal sketch is
   attached at the end of this mail).

3. Again due to the lack of synchronization between io_sq_thread() and
   io_uring_enter(), when the ring is full and IORING_SETUP_SQPOLL is used,
   there seems to be no way for the application to wait for slots in the SQ
   to become available other than busy waiting for *sq->khead to advance.
   Thus from one busy-waiting thread we get two. Is this the expected
   behavior? Should the user of IORING_SETUP_SQPOLL busy wait for slots in
   the SQ?

4. A minor one: if sq_thread_idle is set to a ridiculously large value
   (e.g. 100 seconds), the kernel watchdog starts reporting this as a bug:

   > Message from syslogd@centos-linux at Jun 21 20:00:04 ...
   > kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [io_uring-sq:10691]

Looking forward to your reply, and thank you in advance.
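Below is the reproduction sketch referred to in item 2. It is purely
illustrative (it assumes liburing, a block device path in argv[1], root
privileges for IORING_SETUP_SQPOLL, and registered files plus
IOSQE_FIXED_FILE, which SQPOLL requires on kernels of this vintage):

/* Item 2 reproduction sketch: scribble on the SQE right after submit. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

int main(int argc, char **argv)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec iov;
        int fd, ret;

        if (argc < 2)
                return 1;
        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0 || io_uring_queue_init(8, &ring, IORING_SETUP_SQPOLL) < 0)
                return 1;
        /* SQPOLL needs registered files and IOSQE_FIXED_FILE on these kernels. */
        if (io_uring_register_files(&ring, &fd, 1) < 0)
                return 1;

        if (posix_memalign(&iov.iov_base, 4096, 4096))
                return 1;
        iov.iov_len = 4096;

        sqe = io_uring_get_sqe(&ring);
        if (!sqe)
                return 1;
        io_uring_prep_readv(sqe, 0 /* registered file index */, &iov, 1, 0);
        io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);

        ret = io_uring_submit(&ring);   /* reports the SQE as consumed */
        sqe->off = 1 << 20;             /* modify the SQE right after "consumption" */

        if (io_uring_wait_cqe(&ring, &cqe) < 0)
                return 1;
        /* Compare iov.iov_base against the data stored at offset 0 and at 1 MiB
         * on the device to see which offset was actually read. */
        printf("submitted %d, cqe res %d\n", ret, cqe->res);
        io_uring_cqe_seen(&ring, cqe);
        return 0;
}

If the buffer ends up containing the data stored at the 1 MiB offset rather
than at offset 0, the SQE was modified before the SQ thread actually
consumed it, i.e. io_uring_enter() reported consumption too early.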