From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Kampe
Subject: Re: ceph and efficient access of distributed resources
Date: Tue, 16 Apr 2013 07:18:27 -0700
Message-ID: <516D5DB3.4060800@inktank.com>
References: <51683184.9010301@inktank.com> <516C7E55.1050801@inktank.com> <516C8168.40402@inktank.com>
To: Gandalf Corvotempesta
Cc: Matthias Urlichs, "ceph-devel@vger.kernel.org"

On 04/16/13 00:20, Gandalf Corvotempesta wrote:
> 2013/4/16 Mark Kampe:
>> The entire web is richly festooned with cache servers whose
>> sole raison d'etre is to solve precisely this problem. They
>> are so good at it that back-bone providers often find it more
>> cash-efficient to buy more cache servers than to lay more
>> fiber. Cache servers don't merely save disk I/O, they catch
>> these requests before they reach the server (or even the
>> backbone).
>
> Mine was just an example, there are many other cases where a frontend
> cache is not possible.
> I think that ceph should spread reads across the whole cluster by
> default (like a big RAID-1), to achieve bandwidth improvement.

At my previous distributed storage start-up (Parascale) we had the
ability to distribute reads across copies for load distribution
purposes, and everybody we talked to said "who cares!". Why? For hot-spot
situations (as in your original example), higher-level caching is far
more effective than random traffic distribution. For lower-level (e.g.
coincidental) reuse, sending all the requests to a single server will
usually perform better. Network I/O is much faster than disk I/O, and
a single recipient will have N * the cache hit rate that N servers
would have.

> What happens in case of a big file (for example, 100MB) with multiple
> chunks? Is ceph smart enough to read multiple chunks from multiple
> servers simultaneously or the whole file will be served by just an OSD

RADOS is the underlying storage cluster, but the access methods
(block, object, and file) stripe their data across many RADOS
objects, which CRUSH very effectively distributes across all of
the servers. A 100MB read or write turns into dozens of parallel
operations to servers all over the cluster.
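
To make the striping point concrete, here is a minimal sketch (not Ceph
code) of how a large read breaks up into per-object operations. The 4 MiB
object size, the 12-server cluster, the file name "bigfile", and the
toy_placement() helper are all assumptions made up for illustration. In
particular, toy_placement() is a plain hash standing in for CRUSH, which
works quite differently.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size (4 MiB), illustrative only
NUM_OSDS = 12                  # hypothetical number of storage servers


def objects_for_range(offset, length, object_size=OBJECT_SIZE):
    """Split a byte range into (object_index, offset_in_object, nbytes) extents."""
    extents = []
    end = offset + length
    while offset < end:
        index = offset // object_size          # which object holds this byte
        within = offset % object_size          # where inside that object
        nbytes = min(object_size - within, end - offset)
        extents.append((index, within, nbytes))
        offset += nbytes
    return extents


def toy_placement(name, index, num_osds=NUM_OSDS):
    """Stand-in for CRUSH: a simple hash of the object name, NOT the real algorithm."""
    digest = hashlib.sha256(f"{name}.{index}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_osds


# A 100 MiB read starting at offset 0 touches one extent per object.
extents = objects_for_range(0, 100 * 1024 * 1024)
osds = {toy_placement("bigfile", index) for index, _, _ in extents}
print(len(extents), "object reads spread over", len(osds), "of", NUM_OSDS, "toy servers")
```

Each extent can be requested independently, so a client is free to issue
them in parallel. How many servers are actually involved depends on the
real placement algorithm and cluster layout, not on this toy hash.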