From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: sleeps and waits during io_submit
From: Avi Kivity
To: Dave Chinner, Glauber Costa
Cc: Brian Foster, xfs@oss.sgi.com
Date: Tue, 1 Dec 2015 22:56:01 +0200
Message-ID: <565E0961.4060603@scylladb.com>
In-Reply-To: <20151201204535.GX19199@dastard>
References: <20151130141000.GC24765@bfoster.bfoster>
 <565C5D39.8080300@scylladb.com>
 <20151130161438.GD24765@bfoster.bfoster>
 <565D639F.8070403@scylladb.com>
 <20151201131114.GA26129@bfoster.bfoster>
 <565DA784.5080003@scylladb.com>
 <20151201204535.GX19199@dastard>
List-Id: XFS Filesystem from SGI

On 12/01/2015 10:45 PM, Dave Chinner wrote:
> On Tue, Dec 01, 2015 at 09:01:13AM -0500, Glauber Costa wrote:
>> On Tue, Dec 1, 2015 at 8:58 AM, Avi Kivity wrote:
>>> On 12/01/2015 03:11 PM, Brian Foster wrote:
>>>> It sounds to me that first and foremost you want to make sure you
>>>> don't have however many parallel operations you typically have
>>>> running contending on the same inodes or AGs.
>>>> Hint: creating files under separate subdirectories is a quick and
>>>> easy way to allocate inodes under separate AGs (the agno is encoded
>>>> into the upper bits of the inode number).
>>>
>>> Unfortunately our directory layout cannot be changed. And doesn't
>>> this require having agcount == O(number of active files)? That is
>>> easily in the thousands.
>>
>> Actually, wouldn't agcount == O(nr_cpus) be good enough?
>
> Not quite. What you need is agcount ~= O(nr_active_allocations).

Yes, this is what I mean by "active files".

> The difference is that an allocation can block waiting on IO, and the
> CPU can then go off and run another process, which then tries to do
> an allocation. So you might only have 4 CPUs, but a workload that
> can have a hundred active allocations at once (not uncommon in
> file server workloads).

But for us, probably not much more. We try to restrict active I/Os to
the effective disk queue depth (more than that and they just turn sour
waiting in the disk queue).

> On workloads that are roughly 1 process per CPU, it's typical that
> agcount = 2 * N cpus gives pretty good results on large filesystems.

This is probably with sync calls. With async calls you can have many
more I/Os in progress (but still limited by the effective disk queue
depth).

> If you've got 400GB filesystems or you are using spinning disks,
> then you probably don't want to go above 16 AGs, because then you
> have problems with maintaining contiguous free space and you'll
> seek the spinning disks to death....

We're concentrating on SSDs for now.

>>>> 'mount -o ikeep,'
>>>
>>> Interesting. Our files are large, so we could try this.
>
> Keep in mind that ikeep means that inode allocation permanently
> fragments free space, which can affect how large files are allocated
> once you truncate/rm the original files.

We can try to prime this by allocating a lot of inodes up front, then
removing them, so that this doesn't happen. Hurray ext2.
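Brian's hint that the agno lives in the upper bits of the inode number
can be sketched as below. The shift widths (agblklog, inopblog) are
per-filesystem superblock fields; the values used here are hypothetical
illustration, not read from a real filesystem:

```python
# Sketch: where the agno sits in an XFS inode number.
# An inode number is roughly (agno << (agblklog + inopblog)) | ag_relative_ino,
# where agblklog = log2(blocks per AG) and inopblog = log2(inodes per block).
# Geometry below is hypothetical (e.g. 4 KiB blocks, 256-byte inodes).

AGBLKLOG = 22  # hypothetical log2(blocks per AG)
INOPBLOG = 4   # hypothetical log2(inodes per block)

def agno_of(ino: int) -> int:
    """The allocation group index is the upper bits of the inode number."""
    return ino >> (AGBLKLOG + INOPBLOG)

def make_ino(agno: int, rel_ino: int) -> int:
    """Compose an inode number from an AG index and AG-relative inode number."""
    return (agno << (AGBLKLOG + INOPBLOG)) | rel_ino

# Files whose parent directories landed in different AGs get inode
# numbers that decode to different agno values:
print(agno_of(make_ino(0, 128)), agno_of(make_ino(5, 128)))  # 0 5
```

On a live filesystem the real shift widths come from the superblock
(visible via xfs_db), so this only illustrates the layout; it is not a
portable decoder.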
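For reference, Dave's rule of thumb (agcount roughly 2x the CPU count on
large filesystems) and the ikeep option would look something like the
following; /dev/sdX and /mnt are placeholders, and mkfs.xfs destroys
whatever is on the device:

```shell
# Placeholders only: /dev/sdX and /mnt are hypothetical, and mkfs.xfs
# will destroy any existing data on the device.

# Rule of thumb from this thread: ~2 AGs per CPU on a large, SSD-backed fs.
mkfs.xfs -f -d agcount=$((2 * $(nproc))) /dev/sdX

# ikeep keeps inode clusters allocated forever; as noted above, this
# permanently fragments free space once files are truncated/removed.
mount -o ikeep /dev/sdX /mnt

# Confirm the geometry that was actually used.
xfs_info /mnt | grep -o 'agcount=[0-9]*'
```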
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs