From: Mark Nelson
Subject: Re: file journal fadvise
Date: Mon, 01 Dec 2014 13:18:18 -0600
Message-ID: <547CBEFA.3000204@redhat.com>
Reply-To: mnelson@redhat.com
Sender: ceph-devel-owner@vger.kernel.org
To: Sage Weil, 马建朋
Cc: ceph-devel

On 11/30/2014 09:26 PM, Sage Weil wrote:
> On Mon, 1 Dec 2014, 马建朋 wrote:
>> Hi Sage,
>> fadvise_random only changes the file readahead, so I don't think it
>> makes sense for XFS.
>> Unlike btrfs, XFS always writes the journal to the old location (where
>> it was first allocated). All we can do is make that location contiguous.
>
> I'm thinking of the OSD journal, which can be a regular file. I guess it
> would probably be an allocator mode, set via a XFS_XFLAG_* flag passed to
> an ioctl, which makes the delayed allocation especially unconcerned with
> keeping blocks contiguous. It would need to be combined with the discard
> ioctl so that any journal write can be allocated wherever it is most
> convenient (hopefully contiguous to some other write).
>
> sage

Hi Sage,

Could you quickly write down the steps you think we'd take to implement
this? I'm concerned about the amount of overhead this could cause, but I
want to make sure I'm thinking about it correctly. In particular, when
would the trim happen, and what do you expect to happen at the FS and
device levels?

Mark

>
>
>>
>> Thanks!
>> Jianpeng
>>
>> 2014-12-01 2:46 GMT+08:00 Sage Weil:
>>> Currently, when an OSD journal is stored as a file, we preallocate it as a
>>> large contiguous extent. That means that for every journal write we're
>>> seeking back to wherever the journal is. That is possibly not ideal for
>>> writes. For reads it's great, but that's the last thing we care about
>>> optimizing (we only read the journal after a failure, which is very rare).
>>>
>>> I wonder if we would do better if we:
>>>
>>> 1- trim/discard the old journal contents,
>>> 2- posix_fadvise RANDOM
>>>
>>> I'm not sure what the XFS behavior is in this case, but ideally it seems
>>> what we want it to do is write the journal wherever on disk it is most
>>> convenient... ideally contiguous with some other write that it is already
>>> doing. If fadvise random doesn't do that, perhaps there is another
>>> allocator hint we can give it that will get us that behavior...
>>>
>>> sage
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
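
For reference, the two steps listed in the original message (discard the
old journal contents, then posix_fadvise RANDOM) could be sketched roughly
as below. This is only an illustration of the syscalls involved, not Ceph's
journal code: open_journal and discard_range are hypothetical helper names,
hole punching via fallocate(2) is assumed here as the file-level stand-in
for "trim/discard", and a 64-bit Linux system is assumed for the off_t
argument types. Whether XFS then places later writes anywhere more
convenient is exactly the open question in the thread; the sketch does not
answer it.

```python
import ctypes
import os

# Linux fallocate(2) mode flags (values from <linux/falloc.h>).
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def open_journal(path, size):
    """Hypothetical helper: preallocate a journal file, then hint RANDOM."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    # Behaviour described in the thread: reserve the whole journal up front.
    os.posix_fallocate(fd, 0, size)
    # Step 2: POSIX_FADV_RANDOM over the whole file (length 0 = to EOF).
    # As noted in the thread, this is a readahead hint only.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd


def discard_range(fd, offset, length):
    """Hypothetical helper for step 1: punch a hole over an old journal range.

    Returns True on success and False if the call fails, for example
    because the filesystem does not support hole punching.
    """
    # Assumption: 64-bit Linux, where off_t is a 64-bit integer.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fallocate.argtypes = [
        ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64,
    ]
    libc.fallocate.restype = ctypes.c_int
    # KEEP_SIZE is mandatory with PUNCH_HOLE: the file length is unchanged,
    # only the blocks backing [offset, offset + length) are released.
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return libc.fallocate(fd, mode, offset, length) == 0
```

A caller would run discard_range over each journal region once its entries
have been committed, so the next write to that region has to allocate
blocks again. That repeated allocation is the overhead being asked about
above.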