public inbox for linux-xfs@vger.kernel.org
From: Mark Nelson <mark.nelson@inktank.com>
To: Sage Weil <sweil@redhat.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>,
	xfs@oss.sgi.com, 马建朋 <majianpeng@gmail.com>
Subject: Re: file journal fadvise
Date: Mon, 01 Dec 2014 16:31:18 -0600
Message-ID: <547CEC36.6070309@redhat.com>
In-Reply-To: <alpine.DEB.2.00.1412011122020.3471@cobra.newdream.net>



On 12/01/2014 01:23 PM, Sage Weil wrote:
> On Mon, 1 Dec 2014, Mark Nelson wrote:
>> On 11/30/2014 09:26 PM, Sage Weil wrote:
>>> On Mon, 1 Dec 2014, 马建朋 wrote:
>>>> Hi Sage,
>>>>    fadvise_random only changes the file readahead, so I think it makes
>>>> no sense for XFS. Because XFS is not like btrfs, journal writes always
>>>> go to the old location (where the blocks were first allocated). We can
>>>> only make that location contiguous.
>>>
>>> I'm thinking of the OSD journal, which can be a regular file.  I guess it
>>> would probably be an allocator mode, set via a XFS_XFLAG_* flag passed to
>>> an ioctl, which makes the delayed allocation especially unconcerned with
>>> keeping blocks contiguous.  It would need to be combined with the discard
>>> ioctl so that any journal write can be allocated wherever it is most
>>> convenient (hopefully contiguous to some other write).
>>>
>>> sage
>>
>> Hi Sage,
>>
>> Could you quickly write down the steps you are thinking we'd take to
>> implement this?  I'm concerned about the amount of overhead this could cause,
>> but I want to make sure I'm thinking about it correctly.  In particular, when
>> does trim happen, and what do you expect to happen at the FS and device levels?
>
> 1- set journal_discard = true
> 2- add journal_preallocate = true config option, set it to false, and make
> the fallocate(2) call on journal create conditional on that.
> 3- test with defaults (discard = false, preallocate = true) and
> compare it to discard = true + preallocate = false (with file journal).
> 4- possibly add a call to set extsize to something small on the journal
> file.  Or give xfs some other appropriate hint, if one exists.
>
> sage
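
Steps 2 and 4 in the list above could be prototyped roughly as follows. This
is only a sketch under stated assumptions, not the Ceph implementation:
create_journal(), set_extsize_hint() and with_extsize() are hypothetical
names, os.posix_fallocate() stands in for the fallocate(2) call, and the
extent size hint is set through the generic FS_IOC_FSSETXATTR ioctl, which on
XFS is expected to succeed only while the file has no extents yet and with a
value that is a multiple of the filesystem block size.

```python
import fcntl
import os
import struct

# ioctl numbers and flag as defined in <linux/fs.h>; struct fsxattr is 28 bytes.
FS_IOC_FSGETXATTR = 0x801C581F
FS_IOC_FSSETXATTR = 0x401C5820
FS_XFLAG_EXTSIZE = 0x00000800
# fsx_xflags, fsx_extsize, fsx_nextents, fsx_projid, fsx_cowextsize, padding
FSXATTR_FMT = "=IIIII8x"


def with_extsize(raw, extsize_bytes):
    """Return a packed struct fsxattr with the extent size hint applied."""
    xflags, _old, nextents, projid, cowextsize = struct.unpack(FSXATTR_FMT, raw)
    return struct.pack(FSXATTR_FMT, xflags | FS_XFLAG_EXTSIZE,
                       extsize_bytes, nextents, projid, cowextsize)


def set_extsize_hint(fd, extsize_bytes):
    """Read the current fsxattr, set the extsize hint, and write it back."""
    buf = bytearray(struct.calcsize(FSXATTR_FMT))
    fcntl.ioctl(fd, FS_IOC_FSGETXATTR, buf)  # kernel fills buf in place
    fcntl.ioctl(fd, FS_IOC_FSSETXATTR, with_extsize(bytes(buf), extsize_bytes))


def create_journal(path, size, preallocate=True, extsize_bytes=None):
    """Create a journal file; preallocation is conditional (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if extsize_bytes is not None:
            # Must happen before any blocks are allocated to the file.
            set_extsize_hint(fd, extsize_bytes)
        if preallocate:
            os.posix_fallocate(fd, 0, size)  # reserve all blocks up front
        else:
            os.ftruncate(fd, size)  # sparse file: blocks allocated on write
    except BaseException:
        os.close(fd)
        raise
    return fd
```

With preallocate=False the file has the same length but no blocks behind it,
so the allocator is free to place each journal write when it happens, which
is the case the comparison in step 3 is meant to measure.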

CCing XFS devel so we can get some feedback from those guys too.

Question:  Looking through our discard code in common/blkdev.cc, it 
looks like the new discard implementation is using blkdiscard.  For 
co-located journals should we be using fstrim_range?
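
For reference, here is a rough sketch of how the two interfaces differ at the
ioctl level. The helper names are hypothetical and this is not the code in
common/blkdev.cc. BLKDISCARD takes a byte range on a block device and
discards whatever data lives there, while FITRIM takes a struct fstrim_range
on a mounted filesystem and only discards blocks the filesystem considers
free. Both calls normally need root, so treat this as an illustration of the
argument layouts rather than something to run against a live device.

```python
import fcntl
import os
import struct

BLKDISCARD = 0x1277    # _IO(0x12, 119) from <linux/fs.h>
FITRIM = 0xC0185879    # _IOWR('X', 121, struct fstrim_range)


def blkdiscard_arg(start, length):
    """uint64_t range[2] = {start, length}, both in bytes."""
    return struct.pack("=QQ", start, length)


def fstrim_range_arg(start=0, length=2**64 - 1, minlen=0):
    """struct fstrim_range {start, len, minlen}; the default covers the whole fs."""
    return bytearray(struct.pack("=QQQ", start, length, minlen))


def discard_device_range(dev_path, start, length):
    """Discard a byte range on a raw block device. Destroys data in the range."""
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKDISCARD, blkdiscard_arg(start, length))
    finally:
        os.close(fd)


def trim_filesystem(mount_point, minlen=0):
    """Ask a mounted filesystem to discard its free space; returns bytes trimmed."""
    arg = fstrim_range_arg(minlen=minlen)
    fd = os.open(mount_point, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITRIM, arg)  # kernel writes the trimmed size into len
    finally:
        os.close(fd)
    return struct.unpack("=QQQ", arg)[1]
```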

FWIW there were some performance tests done quite a while ago:

http://people.redhat.com/lczerner/discard/files/Performance_evaluation_of_Linux_DIscard_support_Dev_Con2011_Brno.pdf

>
>>
>> Mark
>>
>>>
>>>
>>>>
>>>> Thanks!
>>>> Jianpeng
>>>>
>>>> 2014-12-01 2:46 GMT+08:00 Sage Weil <sweil@redhat.com>:
>>>>> Currently, when an OSD journal is stored as a file, we preallocate it as
>>>>> a large contiguous extent.  That means that for every journal write we're
>>>>> seeking back to wherever the journal is.  That is possibly not ideal for
>>>>> writes.  For reads it's great, but that's the last thing we care about
>>>>> optimizing (we only read the journal after a failure, which is very
>>>>> rare).
>>>>>
>>>>> I wonder if we would do better if we:
>>>>>
>>>>>    1- trim/discard the old journal contents,
>>>>>    2- posix_fadvise RANDOM
>>>>>
>>>>> I'm not sure what the XFS behavior is in this case, but ideally it seems
>>>>> what we want it to do is write the journal wherever on disk it is most
>>>>> convenient... ideally contiguous with some other write that it is
>>>>> already doing.  If fadvise random doesn't do that, perhaps there is
>>>>> another allocator hint we can give it that will get us that behavior...
>>>>>
>>>>> sage
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


Thread overview: 6+ messages
     [not found] <alpine.DEB.2.00.1411301013490.352@cobra.newdream.net>
     [not found] ` <CALurOm2tEV=RqN21eFJvfU1zTtJkbz2gHDCk_Ntsy4oz9iwHoA@mail.gmail.com>
     [not found]   ` <alpine.DEB.2.00.1411301922220.352@cobra.newdream.net>
     [not found]     ` <547CBEFA.3000204@redhat.com>
     [not found]       ` <alpine.DEB.2.00.1412011122020.3471@cobra.newdream.net>
2014-12-01 22:31         ` Mark Nelson [this message]
2014-12-01 22:51           ` file journal fadvise Dave Chinner
2014-12-02  0:12             ` Sage Weil
2014-12-02  0:32               ` Dave Chinner
2014-12-02  1:24                 ` Sage Weil
2014-12-02  2:01                   ` Dave Chinner
