Date: Fri, 05 Sep 2014 22:14:51 +0200
From: Stefan Priebe
To: Brian Foster
Cc: xfs@oss.sgi.com
Subject: Re: Is XFS suitable for 350 million files on 20TB storage?
Message-ID: <540A19BB.8040404@profihost.ag>
In-Reply-To: <20140905191815.GB8400@laptop.bfoster>

On 05.09.2014 21:18, Brian Foster wrote:
...
> On Fri, Sep 05, 2014 at 08:07:38PM +0200, Stefan Priebe wrote:
>
> Interesting, that seems like a lot of free inodes. That's 1-2 million
> in each AG that we have to search through each time we want to
> allocate an inode. I can't say for sure that's the source of the
> slowdown, but this certainly looks like the kind of workload that
> inspired the addition of the free inode btree (finobt) to more recent
> kernels.
>
> It appears that you still have quite a bit of space available in
> general. Could you run some local tests on this filesystem to try to
> quantify how much of this degradation manifests on sustained writes
> vs. file creation? For example, how is throughput when writing a few
> GB to a local test file?

Not sure if this is what you expect:

# dd if=/dev/zero of=bigfile oflag=direct,sync bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 125.809 s, 33.3 MB/s

or without sync:

# dd if=/dev/zero of=bigfile oflag=direct bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 32.5474 s, 129 MB/s

> How about with that same amount of data broken up across a few
> thousand files?

This results in heavy kworker usage. 4 GB in 32k files:

# time (mkdir test; for i in $(seq 1 1 131072); do dd if=/dev/zero of=test/$i bs=32k count=1 oflag=direct,sync 2>/dev/null; done)
...
55 min

> Brian
>
> P.S., Alternatively, if you wanted to grab a metadump of this
> filesystem and compress/upload it somewhere, I'd be interested to
> take a look at it.

I think there might be file and directory names in it. If that's the
case, I can't do it.
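
That said, if xfs_metadump's obfuscation is sufficient, it might still
be an option. My understanding is that file and directory names are
obfuscated by default unless -o is passed, so something along these
lines might be safe to share (the device path is just a placeholder):

# xfs_metadump /dev/sdX1 /tmp/fs.metadump
# bzip2 -9 /tmp/fs.metadump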

Stefan

>> Thanks!
>>
>> Stefan
>>
>>> Brian
>>>
>>>>> ... as well as what your typical workflow/dataset is for this fs.
>>>>> It seems like you have relatively small files (15TB used across
>>>>> 350m files is around 46k per file), yes?
>>>>
>>>> Yes - most of them are even smaller. And some files are > 5GB.
>>>>
>>>>> If so, I wonder if something like the following commit introduced
>>>>> in 3.12 would help:
>>>>>
>>>>> 133eeb17 xfs: don't use speculative prealloc for small files
>>>>
>>>> Looks interesting.
>>>>
>>>> Stefan
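
P.S.: Regarding finobt - as far as I understand, it can only be enabled
at mkfs time and needs the v5 (CRC) metadata format, so using it would
mean recreating the filesystem on a newer kernel and xfsprogs, roughly
like this (device path again a placeholder):

# mkfs.xfs -m crc=1,finobt=1 /dev/sdX1

Until then, the allocsize mount option (e.g. -o allocsize=64k) might be
a way to bound speculative preallocation for the many small files, if I
read the documentation correctly.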