Date: Fri, 26 Aug 2011 11:26:02 +0800
From: Wu Fengguang
To: Stefan Priebe
Cc: Pekka Enberg, LKML, "linux-mm@kvack.org", Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List
Subject: Re: slow performance on disk/network i/o full speed after drop_caches
Message-ID: <20110826032601.GA26282@localhost>
References: <4E5494D4.1050605@profihost.ag> <4E54BDCF.9020504@profihost.ag>
 <20110824093336.GB5214@localhost> <4E560F2A.1030801@profihost.ag>
 <20110826021648.GA19529@localhost> <4E570AEB.1040703@profihost.ag>
 <20110826030313.GA24058@localhost>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 26, 2011 at 11:13:07AM +0800, Stefan Priebe wrote:
>
> >> There is at least a numastat proc file.
> >
> > Thanks. This shows that node0 is accessed 10x more than node1.
>
> What can I do to prevent this, or isn't this normal when a machine mostly idles so processes are mostly processed by cpu0?

Yes, that's normal. However, it should explain why it's slow even when
there are lots of free pages _globally_.

> >
> >> complete ps output:
> >> http://pastebin.com/raw.php?i=b948svzN
> >
> > In that log, scp happens to be in R state and also no other tasks in D
> > state. Would you retry in the hope of catching some stuck state?
> Sadly not, as the sysrq trigger has rebooted the machine and it will now run fine for 1 or 2 days.

Oops, sorry! It might be possible to reproduce the issue by manually
eating all of the memory with sparse file data:

        truncate -s 1T 1T
        cp 1T /dev/null

> >
> >>> echo t> /proc/sysrq-trigger
> >> sadly I was only able to grab the output in this crazy format:
> >> http://pastebin.com/raw.php?i=MBXvvyH1
> >
> > It's pretty readable dmesg, except that the data is incomplete and
> > there is nothing valuable in the uploaded portion..
> That was everything I could grab through netconsole. Is there a better way?

netconsole is enough. The partial output should be due to the reboot...

Thanks,
Fengguang
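
The sparse-file reproduction suggested above could be wrapped in a small helper so that the file is cleaned up afterwards. This is only a rough sketch, assuming GNU coreutils `truncate` is available; the function name `fill_page_cache`, its arguments and the default file name are hypothetical and not from the thread.

```shell
# Sketch only: read a sparse file so that its (zero-filled) pages
# pass through the page cache, then remove the file again.
# fill_page_cache and its defaults are illustrative names, not an
# existing tool.
fill_page_cache() {
    fpc_size="${1:-1T}"              # apparent file size, as in "truncate -s 1T"
    fpc_file="${2:-./sparse.tmp}"    # hypothetical scratch file name

    # Create the sparse file; it occupies almost no disk space.
    truncate -s "$fpc_size" "$fpc_file" || return 1

    # Reading it back is what consumes memory for cached pages.
    cp "$fpc_file" /dev/null
    fpc_status=$?

    rm -f "$fpc_file"
    return "$fpc_status"
}

# Example (commented out, since a 1T read takes a while):
# fill_page_cache 1T ./1T
```

Trying a small size first (for example `fill_page_cache 1G`) should show whether the helper behaves as expected before running the full 1T read.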