BTRFS && SSD

linux-btrfs.vger.kernel.org archive mirror

* BTRFS && SSD
@ 2010-09-29 15:30 Yuehai Xu
  2010-09-29 17:08 ` Sean Bartell
       [not found] ` <20100929173757.7cf18c0d@simplux>
  0 siblings, 2 replies; 13+ messages in thread
From: Yuehai Xu @ 2010-09-29 15:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: yhxu, chris.mason

Hi,

As I understand it, BTRFS is a kind of log-structured file system, which
does not overwrite in place. Here is my question: suppose file A is
overwritten by A'. Instead of writing A' to the original location of A, a
new location is selected to store it. However, the address of a file has
to be recorded in its inode, so the corresponding part of A's inode must
be updated from the original location of A to the new location of A'. Is
this itself a kind of overwrite? I think that no matter how a
log-structured FS is designed, a mapping table is always needed, such as
an inode map, DAT, etc. When this mapping table is updated, is that
actually an overwrite? If it is, is it a bottleneck for write performance
on SSDs?

What do you think is the major work that BTRFS can do to improve
performance on SSDs? I know FTLs have become smarter and smarter, and the
idea of a log-structured file system is always implemented inside the
SSD by the FTL. In that case, it sounds as if all the issues have been
solved no matter which FS sits above it in the stack. But benchmark
results on the internet show that performance differs quite a lot
between file systems, such as NILFS2 and BTRFS.

Any comments?

Thanks,
Yuehai


* Re: BTRFS && SSD
  2010-09-29 15:30 BTRFS && SSD Yuehai Xu
@ 2010-09-29 17:08 ` Sean Bartell
  2010-09-29 18:45   ` Yuehai Xu
  2010-09-29 19:39   ` Aryeh Gregor
       [not found] ` <20100929173757.7cf18c0d@simplux>
  1 sibling, 2 replies; 13+ messages in thread
From: Sean Bartell @ 2010-09-29 17:08 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: linux-btrfs, yhxu, chris.mason

On Wed, Sep 29, 2010 at 11:30:14AM -0400, Yuehai Xu wrote:
> I know BTRFS is a kind of Log-structured File System, which doesn't do
> overwrite. Here is my question, suppose file A is overwritten by A',
> instead of writing A' to the original place of A, a new place is
> selected to store it. However, we know that the address of a file
> should be recorded in its inode. In such case, the corresponding part
> in inode of A should update from the original place A to the new place
> A', is this a kind of overwrite actually? I think no matter what
> design it is for Log-Structured FS, a mapping table is always needed,
> such as inode map, DAT, etc. When a update operation happens for this
> mapping table, is it actually a kind of over-write? If it is, is it a
> bottleneck for the performance of write for SSD?

In btrfs, this is solved by doing the same thing for the inode--a new
place for the leaf holding the inode is chosen. Then the parent of the
leaf must point to the new position of the leaf, so the parent is moved,
and the parent's parent, etc. This goes all the way up to the
superblocks, which are actually overwritten one at a time.
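The chain of rewrites described above can be sketched as a toy model. This is only an illustration, not btrfs's real on-disk format: the `Disk` class, the `cow_update` helper, and the convention that address 0 is the superblock are all assumptions made for the example.

```python
class Disk:
    """Toy block store: address 0 plays the role of the superblock."""

    def __init__(self):
        self.blocks = {0: None}   # address -> contents; blocks[0] holds the root address
        self.next_free = 1

    def alloc(self, contents):
        """Write contents to a fresh address; live addresses are never reused."""
        addr = self.next_free
        self.next_free += 1
        self.blocks[addr] = contents
        return addr


def cow_update(disk, path, new_leaf):
    """Replace the leaf reached by following `path` (child indexes from the root).

    Every node on the path is copied to a new address. Only the superblock
    (address 0) is overwritten in place.
    """
    # Collect the addresses of the nodes along the path, root first.
    addrs = [disk.blocks[0]]
    for idx in path[:-1]:
        addrs.append(disk.blocks[addrs[-1]][idx])

    # Write the new leaf, then rewrite each ancestor from the bottom up.
    child_addr = disk.alloc(new_leaf)
    for addr, idx in zip(reversed(addrs), reversed(path)):
        node = list(disk.blocks[addr])    # copy; the old node stays intact
        node[idx] = child_addr
        child_addr = disk.alloc(node)

    disk.blocks[0] = child_addr           # the single in-place overwrite
    return child_addr


disk = Disk()
leaf_a = disk.alloc("inode A -> old data")
leaf_b = disk.alloc("inode B")
inner = disk.alloc([leaf_a, leaf_b])
root = disk.alloc([inner])
disk.blocks[0] = root

cow_update(disk, [0, 0], "inode A -> new data")
print(disk.blocks)
```

After the update the old leaf, old interior node, and old root are still present at their original addresses; only the superblock entry changed in place.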

> What do you think the major work that BTRFS can do to improve the
> performance for SSD? I know FTL has becomes smarter and smarter, the
> idea of log-structured file system is always implemented inside the
> SSD by FTL, in that case, it sounds all the issues have been solved no
> matter what the FS it is in upper stack. But at least, from the
> results of benchmarks on the internet show that the performance from
> different FS are quite different, such as NILFS2 and BTRFS.


* Re: BTRFS && SSD
       [not found] ` <20100929173757.7cf18c0d@simplux>
@ 2010-09-29 18:38   ` Yuehai Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Yuehai Xu @ 2010-09-29 18:38 UTC (permalink / raw)
  To: Dipl.-Ing. Michael Niederle; +Cc: linux-btrfs, yhxu

Hi,

On Wed, Sep 29, 2010 at 11:37 AM, Dipl.-Ing. Michael Niederle
<mniederle@gmx.at> wrote:
> Hi Yuehai!
>
> I tested nilfs2 and btrfs for the use with flash based pen drives.
>
> nilfs2 performed incredibly well as long as there were enough free blocks. But
> the garbage collector of nilfs used too much IO-bandwidth to be useable (with
> slow-write flash devices).

I also tested write performance on an Intel X25-V SSD with PostMark, and
the results are totally different from those reported for the Intel
X25-M (http://www.usenix.org/event/lsf08/tech/shin_SSD.pdf). In that
test, NILFS2 performs best overall. In my test, however, ext3 is the
best and NILFS2 is the worst, with almost 10 times lower write
throughput than ext3.

So, what is the role of the file system in handling this tricky storage?
Different file systems may achieve different throughput.

The question is why nilfs2 and btrfs perform so well compared with ext3
(leaving my own results aside). Here I am only talking about SSDs: since
the FTL should internally always do the same thing as the file system,
namely redirect a write to a new location instead of writing to the
original one, the throughput of different file systems should be more or
less the same.

>
> btrfs on the other side performed very well - a lot better than conventional
> file systems like ext2/3 or reiserfs. After switching the mount-options to
> "noatime" I was able to run a complete Linux system from a (quite slow) pen
> drive without (much) problems. Performance on a fast pen drive is great. I'm
> using btrfs as the root file system on a daily basis since last Christmas
> without running into any problems.
>

Is the performance of a file system determined by the internal structure
of the SSD, by the structure of the file system, or by the coordination
of both?

Thanks very much for replying.

> Greetings, Michael
>

Thanks,
Yuehai


* Re: BTRFS && SSD
  2010-09-29 17:08 ` Sean Bartell
@ 2010-09-29 18:45   ` Yuehai Xu
  2010-09-29 19:59     ` Sean Bartell
  2010-09-29 19:39   ` Aryeh Gregor
  1 sibling, 1 reply; 13+ messages in thread
From: Yuehai Xu @ 2010-09-29 18:45 UTC (permalink / raw)
  To: Yuehai Xu, linux-btrfs, yhxu, chris.mason

On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
> On Wed, Sep 29, 2010 at 11:30:14AM -0400, Yuehai Xu wrote:
>> I know BTRFS is a kind of Log-structured File System, which doesn't do
>> overwrite. Here is my question, suppose file A is overwritten by A',
>> instead of writing A' to the original place of A, a new place is
>> selected to store it. However, we know that the address of a file
>> should be recorded in its inode. In such case, the corresponding part
>> in inode of A should update from the original place A to the new place
>> A', is this a kind of overwrite actually? I think no matter what
>> design it is for Log-Structured FS, a mapping table is always needed,
>> such as inode map, DAT, etc. When a update operation happens for this
>> mapping table, is it actually a kind of over-write? If it is, is it a
>> bottleneck for the performance of write for SSD?
>
> In btrfs, this is solved by doing the same thing for the inode--a new
> place for the leaf holding the inode is chosen. Then the parent of the
> leaf must point to the new position of the leaf, so the parent is moved,
> and the parent's parent, etc. This goes all the way up to the
> superblocks, which are actually overwritten one at a time.

You mean that the inode is not overwritten either: once the inode needs
to be updated, it is actually written to a new location, and the only
thing left to do is to change the pointer in its parent to this new
location. However, for the last parent, or the superblock, does it need
to be overwritten?

I am afraid I don't quite understand the meaning of your last sentence.

Thanks for replying,
Yuehai


>
>> What do you think the major work that BTRFS can do to improve the
>> performance for SSD? I know FTL has becomes smarter and smarter, the
>> idea of log-structured file system is always implemented inside the
>> SSD by FTL, in that case, it sounds all the issues have been solved no
>> matter what the FS it is in upper stack. But at least, from the
>> results of benchmarks on the internet show that the performance from
>> different FS are quite different, such as NILFS2 and BTRFS.
>


* Re: BTRFS && SSD
  2010-09-29 17:08 ` Sean Bartell
  2010-09-29 18:45   ` Yuehai Xu
@ 2010-09-29 19:39   ` Aryeh Gregor
  2010-09-29 20:08     ` Sean Bartell
  1 sibling, 1 reply; 13+ messages in thread
From: Aryeh Gregor @ 2010-09-29 19:39 UTC (permalink / raw)
  To: Yuehai Xu, linux-btrfs, yhxu, chris.mason

On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
> In btrfs, this is solved by doing the same thing for the inode--a new
> place for the leaf holding the inode is chosen. Then the parent of the
> leaf must point to the new position of the leaf, so the parent is moved,
> and the parent's parent, etc. This goes all the way up to the
> superblocks, which are actually overwritten one at a time.

Sorry for the useless question, but just out of curiosity: doesn't
this mean that btrfs has to do quite a lot more writes than ext4 for
small file operations?  E.g., if you append one block to a file, like
a log file, then ext3 should have to do about three writes: data,
metadata, and journal (and the latter is always sequential, so it's
cheap).  But btrfs will need to do more, rewriting parent nodes all
the way up the line for both the data and metadata blocks.  Why
doesn't this hurt performance a lot?


* Re: BTRFS && SSD
  2010-09-29 18:45   ` Yuehai Xu
@ 2010-09-29 19:59     ` Sean Bartell
  2010-09-29 21:31       ` Yuehai Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Bartell @ 2010-09-29 19:59 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: linux-btrfs, yhxu, chris.mason

On Wed, Sep 29, 2010 at 02:45:29PM -0400, Yuehai Xu wrote:
> On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
> > On Wed, Sep 29, 2010 at 11:30:14AM -0400, Yuehai Xu wrote:
> >> I know BTRFS is a kind of Log-structured File System, which doesn't do
> >> overwrite. Here is my question, suppose file A is overwritten by A',
> >> instead of writing A' to the original place of A, a new place is
> >> selected to store it. However, we know that the address of a file
> >> should be recorded in its inode. In such case, the corresponding part
> >> in inode of A should update from the original place A to the new place
> >> A', is this a kind of overwrite actually? I think no matter what
> >> design it is for Log-Structured FS, a mapping table is always needed,
> >> such as inode map, DAT, etc. When a update operation happens for this
> >> mapping table, is it actually a kind of over-write? If it is, is it a
> >> bottleneck for the performance of write for SSD?
> >
> > In btrfs, this is solved by doing the same thing for the inode--a new
> > place for the leaf holding the inode is chosen. Then the parent of the
> > leaf must point to the new position of the leaf, so the parent is moved,
> > and the parent's parent, etc. This goes all the way up to the
> > superblocks, which are actually overwritten one at a time.
> 
> You mean that there is no over-write for inode too, once the inode
> need to be updated, this inode is actually written to a new place
> while the only thing to do is to change the point of its parent to
> this new place. However, for the last parent, or the superblock, does
> it need to be overwritten?

Yes. The idea of copy-on-write, as used by btrfs, is that whenever
*anything* is changed, it is simply written to a new location. This
applies to data, inodes, and all of the B-trees used by the filesystem.
However, it's necessary to have *something* in a fixed place on disk
pointing to everything else. So the superblocks can't move, and they are
overwritten instead.


* Re: BTRFS && SSD
  2010-09-29 19:39   ` Aryeh Gregor
@ 2010-09-29 20:08     ` Sean Bartell
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Bartell @ 2010-09-29 20:08 UTC (permalink / raw)
  To: Aryeh Gregor; +Cc: Yuehai Xu, linux-btrfs, yhxu, chris.mason

On Wed, Sep 29, 2010 at 03:39:07PM -0400, Aryeh Gregor wrote:
> On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
> > In btrfs, this is solved by doing the same thing for the inode--a new
> > place for the leaf holding the inode is chosen. Then the parent of the
> > leaf must point to the new position of the leaf, so the parent is moved,
> > and the parent's parent, etc. This goes all the way up to the
> > superblocks, which are actually overwritten one at a time.
> 
> Sorry for the useless question, but just out of curiosity: doesn't
> this mean that btrfs has to do quite a lot more writes than ext4 for
> small file operations?  E.g., if you append one block to a file, like
> a log file, then ext3 should have to do about three writes: data,
> metadata, and journal (and the latter is always sequential, so it's
> cheap).  But btrfs will need to do more, rewriting parent nodes all
> the way up the line for both the data and metadata blocks.  Why
> doesn't this hurt performance a lot?

For a single change, it does write more. However, there are usually many
changes to children being performed at once, which only require one
change to the parent. Since it's moving everything to new places, btrfs
also has much more control over where writes occur, so all the leaves
and parents can be written sequentially. ext3 is a slave to the current
locations on disk.
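A rough way to see the amortization is to count the distinct tree blocks touched when many leaves change in one commit versus one commit per leaf. The sketch below assumes a complete tree; the fanout of 100, the height of 3, and the 50 dirty leaves are made-up numbers for illustration and are not btrfs parameters.

```python
def blocks_written(dirty_leaves, fanout, height):
    """Count the distinct tree blocks rewritten when the given leaves change in one commit.

    The tree is complete, with `height` levels above the leaves. The
    ancestor of leaf i at level k is identified by (k, i // fanout**k).
    """
    touched = set()
    for leaf in dirty_leaves:
        for level in range(height + 1):           # level 0 is the leaf itself
            touched.add((level, leaf // fanout ** level))
    return len(touched)


fanout, height = 100, 3
dirty = range(50)                                 # 50 neighbouring leaves change

one_by_one = sum(blocks_written([leaf], fanout, height) for leaf in dirty)
batched = blocks_written(dirty, fanout, height)
print(one_by_one, batched)                        # prints "200 53"
```

In this model each isolated change rewrites four blocks, while the batched commit rewrites the 50 leaves plus one shared chain of three parents.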


* Re: BTRFS && SSD
  2010-09-29 19:59     ` Sean Bartell
@ 2010-09-29 21:31       ` Yuehai Xu
  2010-09-30  7:15         ` Sander
  2010-09-30  7:51         ` David Brown
  0 siblings, 2 replies; 13+ messages in thread
From: Yuehai Xu @ 2010-09-29 21:31 UTC (permalink / raw)
  To: Yuehai Xu, linux-btrfs, yhxu, chris.mason, wingedtachikoma

On Wed, Sep 29, 2010 at 3:59 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
> On Wed, Sep 29, 2010 at 02:45:29PM -0400, Yuehai Xu wrote:
>> On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
>> > On Wed, Sep 29, 2010 at 11:30:14AM -0400, Yuehai Xu wrote:
>> >> I know BTRFS is a kind of Log-structured File System, which doesn't do
>> >> overwrite. Here is my question, suppose file A is overwritten by A',
>> >> instead of writing A' to the original place of A, a new place is
>> >> selected to store it. However, we know that the address of a file
>> >> should be recorded in its inode. In such case, the corresponding part
>> >> in inode of A should update from the original place A to the new place
>> >> A', is this a kind of overwrite actually? I think no matter what
>> >> design it is for Log-Structured FS, a mapping table is always needed,
>> >> such as inode map, DAT, etc. When a update operation happens for this
>> >> mapping table, is it actually a kind of over-write? If it is, is it a
>> >> bottleneck for the performance of write for SSD?
>> >
>> > In btrfs, this is solved by doing the same thing for the inode--a new
>> > place for the leaf holding the inode is chosen. Then the parent of the
>> > leaf must point to the new position of the leaf, so the parent is moved,
>> > and the parent's parent, etc. This goes all the way up to the
>> > superblocks, which are actually overwritten one at a time.
>>
>> You mean that there is no over-write for inode too, once the inode
>> need to be updated, this inode is actually written to a new place
>> while the only thing to do is to change the point of its parent to
>> this new place. However, for the last parent, or the superblock, does
>> it need to be overwritten?
>
> Yes. The idea of copy-on-write, as used by btrfs, is that whenever
> *anything* is changed, it is simply written to a new location. This
> applies to data, inodes, and all of the B-trees used by the filesystem.
> However, it's necessary to have *something* in a fixed place on disk
> pointing to everything else. So the superblocks can't move, and they are
> overwritten instead.
>

So, is this a bottleneck on SSDs, since the cost of an overwrite is very
high? I think the superblocks have to be overwritten for every write, so
they might be written much more frequently than other, ordinary blocks
on the SSD, even though the SSD does wear leveling internally in its FTL.

What I currently know is that on the Intel X25-V SSD, the write
throughput of BTRFS is almost 80% less than that of EXT3 with PostMark.
This really confuses me.

Thanks,
Yuehai


* Re: BTRFS && SSD
  2010-09-29 21:31       ` Yuehai Xu
@ 2010-09-30  7:15         ` Sander
  2010-09-30 12:06           ` Yuehai Xu
  2010-09-30  7:51         ` David Brown
  1 sibling, 1 reply; 13+ messages in thread
From: Sander @ 2010-09-30  7:15 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: linux-btrfs, yhxu, chris.mason, wingedtachikoma

Yuehai Xu wrote (ao):
> So, is it a bottleneck in the case of SSD since the cost for over
> write is very high? For every write, I think the superblocks should be
> overwritten, it might be much more frequent than other common blocks
> in SSD, even though SSD will do wear leveling inside by its FTL.

The FTL will make sure the write cycles are evenly divided among the
physical blocks, regardless of how often you overwrite a single spot on
the fs.

> What I current know is that for Intel x25-V SSD, the write throughput
> of BTRFS is almost 80% less than the one of EXT3 in the case of
> PostMark. This really confuses me.

Can you show the script you use to test this, provide some info
regarding your setup, and show the numbers you see?

	Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


* Re: BTRFS && SSD
  2010-09-29 21:31       ` Yuehai Xu
  2010-09-30  7:15         ` Sander
@ 2010-09-30  7:51         ` David Brown
  2010-09-30 12:04           ` Yuehai Xu
  1 sibling, 1 reply; 13+ messages in thread
From: David Brown @ 2010-09-30  7:51 UTC (permalink / raw)
  To: linux-btrfs

On 29/09/2010 23:31, Yuehai Xu wrote:
> On Wed, Sep 29, 2010 at 3:59 PM, Sean Bartell<wingedtachikoma@gmail.com>  wrote:
>> On Wed, Sep 29, 2010 at 02:45:29PM -0400, Yuehai Xu wrote:
>>> On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell<wingedtachikoma@gmail.com>  wrote:
>>>> On Wed, Sep 29, 2010 at 11:30:14AM -0400, Yuehai Xu wrote:
>>>>> I know BTRFS is a kind of Log-structured File System, which doesn't do
>>>>> overwrite. Here is my question, suppose file A is overwritten by A',
>>>>> instead of writing A' to the original place of A, a new place is
>>>>> selected to store it. However, we know that the address of a file
>>>>> should be recorded in its inode. In such case, the corresponding part
>>>>> in inode of A should update from the original place A to the new place
>>>>> A', is this a kind of overwrite actually? I think no matter what
>>>>> design it is for Log-Structured FS, a mapping table is always needed,
>>>>> such as inode map, DAT, etc. When a update operation happens for this
>>>>> mapping table, is it actually a kind of over-write? If it is, is it a
>>>>> bottleneck for the performance of write for SSD?
>>>>
>>>> In btrfs, this is solved by doing the same thing for the inode--a new
>>>> place for the leaf holding the inode is chosen. Then the parent of the
>>>> leaf must point to the new position of the leaf, so the parent is moved,
>>>> and the parent's parent, etc. This goes all the way up to the
>>>> superblocks, which are actually overwritten one at a time.
>>>
>>> You mean that there is no over-write for inode too, once the inode
>>> need to be updated, this inode is actually written to a new place
>>> while the only thing to do is to change the point of its parent to
>>> this new place. However, for the last parent, or the superblock, does
>>> it need to be overwritten?
>>
>> Yes. The idea of copy-on-write, as used by btrfs, is that whenever
>> *anything* is changed, it is simply written to a new location. This
>> applies to data, inodes, and all of the B-trees used by the filesystem.
>> However, it's necessary to have *something* in a fixed place on disk
>> pointing to everything else. So the superblocks can't move, and they are
>> overwritten instead.
>>
>
> So, is it a bottleneck in the case of SSD since the cost for over
> write is very high? For every write, I think the superblocks should be
> overwritten, it might be much more frequent than other common blocks
> in SSD, even though SSD will do wear leveling inside by its FTL.
>

SSDs already do copy-on-write.  They can't change small parts of the 
data in a block, but have to re-write the block.  While that could be 
done by reading the whole erase block to a ram buffer, changing the 
data, erasing the flash block, then re-writing, this is not what happens 
in practice.  To make efficient use of write blocks that are smaller 
than erase blocks, and to provide wear levelling, the flash disk will 
implement a small change to a block by writing a new copy of the 
modified block to a different part of the flash, then updating its block 
indirection tables.

BTRFS just makes this process a bit more explicit (except for superblock 
writes).
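The indirection-table idea can be sketched with a toy page-mapped FTL. This is not any vendor's algorithm (those are not public); the `ToyFTL` class, its 4x4 geometry, and its garbage-collection policy are assumptions chosen to keep the example short. It rewrites one logical page repeatedly, as a file system would with a superblock, and tracks how the erases spread over the physical blocks.

```python
STALE = "stale"   # marker for a physical page whose contents were superseded


class ToyFTL:
    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # Each physical page is None (erased), STALE, or a (logical_page, data) tuple.
        self.pages = [[None] * pages_per_block for _ in range(num_blocks)]
        self.table = {}                        # logical page -> (block, page)
        self.erase_counts = [0] * num_blocks

    def _find_erased_page(self):
        # Prefer the least-erased block that still has an erased page.
        order = sorted(range(len(self.pages)), key=lambda b: self.erase_counts[b])
        for b in order:
            for p, slot in enumerate(self.pages[b]):
                if slot is None:
                    return b, p
        return None

    def _garbage_collect(self):
        # Victim: most stale pages, ties broken in favour of the least-erased block.
        victim = max(
            range(len(self.pages)),
            key=lambda b: (self.pages[b].count(STALE), -self.erase_counts[b]),
        )
        live = [s for s in self.pages[victim] if s is not None and s != STALE]
        self.pages[victim] = [None] * self.pages_per_block    # erase the whole block
        self.erase_counts[victim] += 1
        for p, slot in enumerate(live):                        # put live pages back
            self.pages[victim][p] = slot
            self.table[slot[0]] = (victim, p)

    def write(self, logical_page, data):
        if logical_page in self.table:         # the old copy becomes stale
            b, p = self.table.pop(logical_page)
            self.pages[b][p] = STALE
        spot = self._find_erased_page()
        if spot is None:
            self._garbage_collect()
            spot = self._find_erased_page()
        if spot is None:
            raise RuntimeError("flash full")
        b, p = spot
        self.pages[b][p] = (logical_page, data)
        self.table[logical_page] = (b, p)      # update the indirection table

    def read(self, logical_page):
        b, p = self.table[logical_page]
        return self.pages[b][p][1]


ftl = ToyFTL()
for i in range(1000):
    ftl.write(0, f"superblock generation {i}")
print(ftl.read(0), ftl.erase_counts)
```

In this model the erase counts of the four blocks never differ by more than one, even though only a single logical page is ever written.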

> What I current know is that for Intel x25-V SSD, the write throughput
> of BTRFS is almost 80% less than the one of EXT3 in the case of
> PostMark. This really confuses me.
>

Different file systems have different strengths and weaknesses.  I 
haven't actually tested BTRFS much, but my understanding is that it will 
be significantly slower than EXT in certain cases, such as small 
modifications to large files (since copy-on-write means a lot of extra 
disk activity in such cases).  But for other things it is faster.  Also 
remember that BTRFS is under development - optimising for raw speed 
comes at a lower priority than correctness and safety of data, and 
implementation of BTRFS features.  Once everyone is happy with the 
stability of the file system and its functionality and tools, you can 
expect the speed to improve somewhat over time.


* Re: BTRFS && SSD
  2010-09-30  7:51         ` David Brown
@ 2010-09-30 12:04           ` Yuehai Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Yuehai Xu @ 2010-09-30 12:04 UTC (permalink / raw)
  To: David Brown; +Cc: linux-btrfs, yhxu

On Thu, Sep 30, 2010 at 3:51 AM, David Brown <david@westcontrol.com> wrote:
> On 29/09/2010 23:31, Yuehai Xu wrote:
>>
>> On Wed, Sep 29, 2010 at 3:59 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
>>>
>>> On Wed, Sep 29, 2010 at 02:45:29PM -0400, Yuehai Xu wrote:
>>>>
>>>> On Wed, Sep 29, 2010 at 1:08 PM, Sean Bartell <wingedtachikoma@gmail.com> wrote:
>>>>>
>>>>> On Wed, Sep 29, 2010 at 11:30:14AM -0400, Yuehai Xu wrote:
>>>>>>
>>>>>> I know BTRFS is a kind of Log-structured File System, which doesn't do
>>>>>> overwrite. Here is my question, suppose file A is overwritten by A',
>>>>>> instead of writing A' to the original place of A, a new place is
>>>>>> selected to store it. However, we know that the address of a file
>>>>>> should be recorded in its inode. In such case, the corresponding part
>>>>>> in inode of A should update from the original place A to the new place
>>>>>> A', is this a kind of overwrite actually? I think no matter what
>>>>>> design it is for Log-Structured FS, a mapping table is always needed,
>>>>>> such as inode map, DAT, etc. When a update operation happens for this
>>>>>> mapping table, is it actually a kind of over-write? If it is, is it a
>>>>>> bottleneck for the performance of write for SSD?
>>>>>
>>>>> In btrfs, this is solved by doing the same thing for the inode--a new
>>>>> place for the leaf holding the inode is chosen. Then the parent of the
>>>>> leaf must point to the new position of the leaf, so the parent is moved,
>>>>> and the parent's parent, etc. This goes all the way up to the
>>>>> superblocks, which are actually overwritten one at a time.
>>>>
>>>> You mean that there is no over-write for inode too, once the inode
>>>> need to be updated, this inode is actually written to a new place
>>>> while the only thing to do is to change the point of its parent to
>>>> this new place. However, for the last parent, or the superblock, does
>>>> it need to be overwritten?
>>>
>>> Yes. The idea of copy-on-write, as used by btrfs, is that whenever
>>> *anything* is changed, it is simply written to a new location. This
>>> applies to data, inodes, and all of the B-trees used by the filesystem.
>>> However, it's necessary to have *something* in a fixed place on disk
>>> pointing to everything else. So the superblocks can't move, and they are
>>> overwritten instead.
>>>
>>
>> So, is it a bottleneck in the case of SSD since the cost for over
>> write is very high? For every write, I think the superblocks should be
>> overwritten, it might be much more frequent than other common blocks
>> in SSD, even though SSD will do wear leveling inside by its FTL.
>>
>
> SSDs already do copy-on-write.  They can't change small parts of the data in
> a block, but have to re-write the block.  While that could be done by
> reading the whole erase block to a ram buffer, changing the data, erasing
> the flash block, then re-writing, this is not what happens in practice.  To
> make efficient use of write blocks that are smaller than erase blocks, and
> to provide wear levelling, the flash disk will implement a small change to a
> block by writing a new copy of the modified block to a different part of the
> flash, then updating its block indirection tables.

Yes, the FTL inside the SSD does this kind of job, and the overhead
should be small if the block mapping is a page-level mapping. However,
a page-level mapping table is too large to be stored entirely in the
SRAM of the SSD, so many complicated algorithms have been developed to
optimize this. In other words, SSDs might not always be smart enough
to do wear leveling with small overhead. This is my subjective opinion.

>
> BTRFS just makes this process a bit more explicit (except for superblock
> writes).

As you said, the superblocks have to be overwritten. Is this frequent?
If it is, could it be a potential bottleneck for the throughput of
SSDs? After all, SSDs are not happy with overwrites. Of course, few
people really know what the algorithms inside the FTL are, and those
are what actually determine the efficiency of an SSD.


>
>> What I current know is that for Intel x25-V SSD, the write throughput
>> of BTRFS is almost 80% less than the one of EXT3 in the case of
>> PostMark. This really confuses me.
>>
>
> Different file systems have different strengths and weaknesses.  I haven't
> actually tested BTRFS much, but my understanding is that it will be
> significantly slower than EXT in certain cases, such as small modifications
> to large files (since copy-on-write means a lot of extra disk activity in
> such cases).  But for other things it is faster.  Also remember that BTRFS
> is under development - optimising for raw speed comes at a lower priority
> than correctness and safety of data, and implementation of BTRFS features.
> Once everyone is happy with the stability of the file system and its
> functionality and tools, you can expect the speed to improve somewhat over
> time.

My test case for PostMark is:
set file size 9216 15360 (file size from 9216 bytes to 15360 bytes)
set number 50000(file number is 50000)

write throughput(MB/s) for different file systems in Intel SSD X25-V:
EXT3: 28.09
NILFS2: 10
BTRFS: 17.35
EXT4: 31.04
XFS: 11.56
REISERFS: 28.09
EXT2: 15.94
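
For reference, the relative gaps can be worked out directly from the figures listed above. This is only arithmetic on those numbers; the `percent_below` helper is a name made up for the example. On these figures BTRFS comes out roughly 38% below EXT3, so the "almost 80%" mentioned earlier may refer to a different run or workload.

```python
# Write throughput in MB/s, copied from the list above.
throughput_mb_s = {
    "EXT3": 28.09, "NILFS2": 10.0, "BTRFS": 17.35, "EXT4": 31.04,
    "XFS": 11.56, "REISERFS": 28.09, "EXT2": 15.94,
}


def percent_below(fs, baseline="EXT3"):
    """How far (in percent) `fs` falls below `baseline`; negative means faster."""
    return 100.0 * (1.0 - throughput_mb_s[fs] / throughput_mb_s[baseline])


for fs in sorted(throughput_mb_s, key=throughput_mb_s.get, reverse=True):
    print(f"{fs:9s} {throughput_mb_s[fs]:6.2f} MB/s  {percent_below(fs):6.1f}% below EXT3")
```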

Thanks,
Yuehai



* Re: BTRFS && SSD
  2010-09-30  7:15         ` Sander
@ 2010-09-30 12:06           ` Yuehai Xu
  2010-09-30 13:45             ` Sander
  0 siblings, 1 reply; 13+ messages in thread
From: Yuehai Xu @ 2010-09-30 12:06 UTC (permalink / raw)
  To: sander; +Cc: linux-btrfs, yhxu, chris.mason, wingedtachikoma

On Thu, Sep 30, 2010 at 3:15 AM, Sander <sander@humilis.net> wrote:
> Yuehai Xu wrote (ao):
>> So, is it a bottleneck in the case of SSD since the cost for over
>> write is very high? For every write, I think the superblocks should be
>> overwritten, it might be much more frequent than other common blocks
>> in SSD, even though SSD will do wear leveling inside by its FTL.
>
> The FTL will make sure the write cycles are evenly divided among the
> physical blocks, regardless of how often you overwrite a single spot on
> the fs.
>
>> What I current know is that for Intel x25-V SSD, the write throughput
>> of BTRFS is almost 80% less than the one of EXT3 in the case of
>> PostMark. This really confuses me.
>
> Can you show the script you use to test this, provide some info
> regarding your setup, and show the numbers you see?

My test case for PostMark is:
set file size 9216 15360 (file size from 9216 bytes to 15360 bytes)
set number 50000(file number is 50000)

write throughput(MB/s) for different file systems in Intel SSD X25-V:
EXT3: 28.09
NILFS2: 10
BTRFS: 17.35
EXT4: 31.04
XFS: 11.56
REISERFS: 28.09
EXT2: 15.94

Thanks,
Yuehai


* Re: BTRFS && SSD
  2010-09-30 12:06           ` Yuehai Xu
@ 2010-09-30 13:45             ` Sander
  0 siblings, 0 replies; 13+ messages in thread
From: Sander @ 2010-09-30 13:45 UTC (permalink / raw)
  To: Yuehai Xu; +Cc: sander, linux-btrfs, yhxu, chris.mason, wingedtachikoma

Yuehai Xu wrote (ao):
> On Thu, Sep 30, 2010 at 3:15 AM, Sander <sander@humilis.net> wrote:
> > Yuehai Xu wrote (ao):
> >> So, is it a bottleneck in the case of SSD since the cost for over
> >> write is very high? For every write, I think the superblocks should be
> >> overwritten, it might be much more frequent than other common blocks
> >> in SSD, even though SSD will do wear leveling inside by its FTL.
> >
> > The FTL will make sure the write cycles are evenly divided among the
> > physical blocks, regardless of how often you overwrite a single spot on
> > the fs.
> >
> >> What I current know is that for Intel x25-V SSD, the write throughput
> >> of BTRFS is almost 80% less than the one of EXT3 in the case of
> >> PostMark. This really confuses me.
> >
> > Can you show the script you use to test this, provide some info
> > regarding your setup, and show the numbers you see?
> 
> My test case for PostMark is:
> set file size 9216 15360 (file size from 9216 bytes to 15360 bytes)
> set number 50000(file number is 50000)
> 
> write throughput(MB/s) for different file systems in Intel SSD X25-V:
> EXT3: 28.09
> NILFS2: 10
> BTRFS: 17.35
> EXT4: 31.04
> XFS: 11.56
> REISERFS: 28.09
> EXT2: 15.94

And your testscript? You'll have to provide information on how you
create the filesystems (partitioning, etc), how you mount (options),
what versions of kernel, tools, etc, if you reboot between runs, how
many runs (also per fs), did you burn in the ssd before, modules loaded,
type and amount of hardware (controller, cpu, memory) etc, etc, etc.
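
Part of that information can be collected mechanically. The following is a minimal sketch, assuming a Linux system where mount information is available in `/proc/mounts`; the `parse_mounts` and `report` helpers are names made up for this example, and they cover only the kernel version and mount options, not partitioning, hardware, or run counts.

```python
import platform

# File systems from the benchmark list; adjust as needed.
FSTYPES = ("btrfs", "ext2", "ext3", "ext4", "nilfs2", "reiserfs", "xfs")


def parse_mounts(text, fstypes=FSTYPES):
    """Return (device, mountpoint, fstype, options) rows from /proc/mounts-style text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in fstypes:
            rows.append((fields[0], fields[1], fields[2], fields[3]))
    return rows


def report(mounts_path="/proc/mounts"):
    """Print the kernel release, machine type, and mount options of interest."""
    print("kernel: ", platform.release())
    print("machine:", platform.machine())
    with open(mounts_path) as f:
        for dev, mnt, fstype, opts in parse_mounts(f.read()):
            print(f"{dev} on {mnt} type {fstype} ({opts})")


sample = (
    "proc /proc proc rw,nosuid 0 0\n"
    "/dev/sdb1 /mnt/test btrfs rw,noatime,ssd 0 0\n"
)
print(parse_mounts(sample))
```

Calling `report()` before each run and saving its output next to the PostMark results would make the runs easier to compare.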

	Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net
