From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joe Landman
Subject: Re: high throughput storage server?
Date: Mon, 28 Feb 2011 10:46:06 -0500
Message-ID: <4D6BC33E.7070703@gmail.com>
References: <20110215044434.GA9186@septictank.raw-sewage.fake> <4D6AC288.20101@wildgooses.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4D6AC288.20101@wildgooses.com>
Sender: linux-raid-owner@vger.kernel.org
To: Ed W
Cc: Matt Garman , Mdadm
List-Id: linux-raid.ids

On 02/27/2011 04:30 PM, Ed W wrote:

[...]

> It would appear that you can use a much lower powered system to
> basically push jobs out to the processing machines in advance, this way
> your bandwidth basically only needs to be:
> size_of_job * num_machines / time_to_process_jobs

This would be good. Matt's original argument suggested he needed this
as his sustained bandwidth, given the way the analysis proceeded.

If we assume that the processing time is T_p and the communication
time is T_c, then, ignoring other factors, the total time for one job is

    T_j = T_p + T_c

If T_c << T_p, you can effectively ignore bandwidth-related issues (and
use a much smaller bandwidth system). For T_c << T_p, let's (for laughs)
say T_c = 0.1 x T_p (i.e. communication time is 1/10th of the processing
time). Then even if you halved your bandwidth and doubled T_c, you would
add only about 10% to the total execution time of a job (T_j goes from
1.1 x T_p to 1.2 x T_p).

With Nmachines, each with Ncores, you have Nmachines x Ncores jobs going
on all at once. If T_c << T_p (as in the above example), then most of
the time, on average, the machines will not be communicating. In fact,
as a very rough first-pass approximation (there are more accurate
statistical models), one would expect each process to be using the
network a fraction T_c/(T_c + T_p) of the time, which is approximately
T_c/T_p when T_c << T_p.
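The rough model above fits in a few lines of Python. This is a minimal, hypothetical sketch rather than part of the original analysis: the function names and the example values (T_p = 100 s, T_c = 10 s, 8 machines with 8 cores each) are made up for illustration. Halving the bandwidth is modelled as doubling T_c, since the same data still has to move, and the "mean concurrent transfers" figure assumes independent jobs whose communication phases are spread evenly in time.

```python
"""Rough first-pass model of job time and network use (hypothetical sketch).

Assumes T_j = T_p + T_c, i.e. processing and communication do not overlap.
"""


def job_time(t_p: float, t_c: float) -> float:
    """Total time for one job: processing time plus communication time."""
    return t_p + t_c


def slowdown_from_bandwidth_cut(t_p: float, t_c: float, factor: float) -> float:
    """Fractional increase in job time when bandwidth is divided by `factor`.

    The same data still has to move, so T_c is multiplied by `factor`.
    """
    before = job_time(t_p, t_c)
    after = job_time(t_p, t_c * factor)
    return (after - before) / before


def network_busy_fraction(t_p: float, t_c: float) -> float:
    """Fraction of a job's time spent communicating: T_c / (T_c + T_p)."""
    return t_c / (t_c + t_p)


def mean_concurrent_transfers(n_machines: int, n_cores: int,
                              t_p: float, t_c: float) -> float:
    """Average number of jobs on the network at once.

    Assumes independent jobs whose communication phases are spread evenly
    in time; synchronized jobs may burst well above this average.
    """
    return n_machines * n_cores * network_busy_fraction(t_p, t_c)


if __name__ == "__main__":
    # Hypothetical values matching the T_c = 0.1 x T_p example.
    t_p, t_c = 100.0, 10.0
    cut = slowdown_from_bandwidth_cut(t_p, t_c, 2.0)
    print(f"slowdown when bandwidth is halved: {cut:.3f}")  # 0.091
    busy = network_busy_fraction(t_p, t_c)
    print(f"network busy fraction per job: {busy:.3f}")  # 0.091
    mean = mean_concurrent_transfers(8, 8, t_p, t_c)
    print(f"mean concurrent transfers (8 x 8 jobs): {mean:.2f}")  # 5.82
```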
Then the total consumption of data for a run (assuming all runs are
*approximately* of equal duration) is

    D = B x T_c

with D being the amount of data in MB or GB, and B the bandwidth in
MB/s or GB/s. Your effective bandwidth per run, Beff, is then given by

    D = Beff x T_j = Beff x (T_c + T_p)

For Nmachines x Ncores jobs, the total data transferred, Dtotal, is

    Dtotal = Nmachines x Ncores x D = Nmachines x Ncores x Beff x (T_c + T_p)

You know Dtotal (the aggregate data needed for the run). You know
Nmachines and Ncores. You know T_c and T_p (approximately). From this,
solve for Beff. That's what each job has to sustain (approximately); in
aggregate, the storage has to deliver
Nmachines x Ncores x Beff = Dtotal / (T_c + T_p).

> So if the time to process jobs is significant then you have quite some
> time to push out the next job to local storage ready?
>
> Firstly is this architecture workable? If so then you have some new
> performance parameters to target for the storage architecture?
>
> Good luck
>
> Ed W

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615