Message-ID: <5389393D.2030305@kernel.dk>
Date: Fri, 30 May 2014 20:06:53 -0600
From: Jens Axboe
Subject: Re: [RFC 2/2] x86_64: expand kernel stack to 16K
References: <1401260039-18189-1-git-send-email-minchan@kernel.org>
 <1401260039-18189-2-git-send-email-minchan@kernel.org>
 <20140528223142.GO8554@dastard> <20140529013007.GF6677@dastard>
To: Linus Torvalds, Dave Chinner
Cc: Minchan Kim, Linux Kernel Mailing List, Andrew Morton, linux-mm,
 "H. Peter Anvin", Ingo Molnar, Peter Zijlstra, Mel Gorman,
 Rik van Riel, Johannes Weiner, Hugh Dickins, Rusty Russell,
 "Michael S. Tsirkin", Dave Hansen, Steven Rostedt

On 2014-05-28 20:42, Linus Torvalds wrote:
>> Regardless of whether it is swap or something external queues the
>> bio on the plug, perhaps we should look at why it's done inline
>> rather than by kblockd, where it was moved because it was blowing
>> the stack from schedule():
>
> So it sounds like we need to do this for io_schedule() too.
>
> In fact, we've generally found it to be a mistake every time we
> "automatically" unblock some IO queue. And I'm not saying that because
> of stack space, but because we've _often_ had the situation that eager
> unblocking results in IO that could have been done as bigger requests.

We definitely need to auto-unplug on the schedule path, otherwise we run
into all sorts of trouble. But making it async off the IO schedule path
is fine. By definition, it's not latency sensitive if we are hitting the
unplug on schedule. I'm pretty sure it was run inline out of CPU
concerns, since running inline is certainly cheaper than punting to
kblockd.

> Looking at that callchain, I have to say that ext4 doesn't look
> horrible compared to the whole block layer and virtio... Yes,
> "ext4_writepages()" is using almost 400 bytes of stack, and most of
> that seems to be due to:
>
>  struct mpage_da_data mpd;
>  struct blk_plug plug;

Plus blk_plug is pretty tiny as it is. I have queued up a patch to kill
the magic part of it, since that has never caught any bugs. It only
saves 8 bytes, but we may as well take that, especially if we end up
with nested plugs.

> Well, we've definitely had some issues with deeper callchains with md,
> but I suspect virtio might be worse, and the new blk-mq code is likely
> worse in this respect too.

I don't think blk-mq is worse than the older stack; in fact, it should
be better. The call chains are shorter, and there is a lot less cruft on
the stack. Historically, the stack issues have been with nested devices.
And for sync IO we do run it inline, so if the driver chews up a lot of
stack, well...
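
For reference, here is roughly what the plug machinery in question looks
like, hand-simplified from blkdev.h of this era; treat it as a sketch of
the idea, not the actual code or the magic-removal patch:

struct blk_plug {
	unsigned long magic;		/* debug magic, the ~8 bytes to kill */
	struct list_head list;		/* plugged requests */
	struct list_head mq_list;	/* blk-mq requests */
	struct list_head cb_list;	/* unplug callbacks (md) */
};

/*
 * Flush the plug inline, on whatever (possibly deep) stack we are on.
 */
static inline void blk_flush_plug(struct task_struct *tsk)
{
	struct blk_plug *plug = tsk->plug;

	if (plug)
		blk_flush_plug_list(plug, false);	/* from_schedule == false */
}

/*
 * Flush the plug from the schedule path: from_schedule == true makes
 * the queues run async via kblockd instead of inline.
 */
static inline void blk_schedule_flush_plug(struct task_struct *tsk)
{
	struct blk_plug *plug = tsk->plug;

	if (plug)
		blk_flush_plug_list(plug, true);
}

The from_schedule flag is what decides whether the plugged requests are
run on the current stack or punted to kblockd; pointing io_schedule() at
the second variant is essentially the async change discussed above.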
Looks like I'm late here and the decision has already been made to go
with 16K stacks, which I think is a good one. We've been living on the
edge (and sometimes over it) for heavy dm/md setups for a while, and
have been patching around that fact in the IO stack for years.

-- 
Jens Axboe