From: Tao Ma <tm@tao.ma>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 1/3] ocfs2: Add ocfs2_trim_fs for SSD trim support.
Date: Tue, 08 Mar 2011 15:47:52 +0800
Message-ID: <4D75DF28.6010401@tao.ma>
In-Reply-To: <4D75D281.5000003@oracle.com>

On 03/08/2011 02:53 PM, Tristan Ye wrote:
> Tao Ma wrote:
>> On 03/08/2011 02:23 PM, Tristan Ye wrote:
>>> Tao Ma wrote:
>>>> On 03/08/2011 12:55 PM, Tristan Ye wrote:
>>>>> Hi Tao,
>>>>>
>>>>>    Most of the code looks pretty neat to me; a few comments inline below:
>>>> Thanks for the review.
>>>>> Tao Ma wrote:
>>>>>> From: Tao Ma <boyu.mt@taobao.com>
>>>>>>
>>>>>> Add ocfs2_trim_fs to support trimming freed clusters in the
>>>>>> volume. A range will be given and all the freed clusters greater
>>>>>> than minlen will be discarded to the block layer.
>>>>>>
>>>>>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>>>>>> ---
>>>>>>  fs/ocfs2/alloc.c |  154 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  fs/ocfs2/alloc.h |    1 +
>>>>>>  2 files changed, 155 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>>>>>> index b27a0d8..6e1b3b5 100644
>>>>>> --- a/fs/ocfs2/alloc.c
>>>>>> +++ b/fs/ocfs2/alloc.c
>>>>>> @@ -29,6 +29,7 @@
>>>>>>  #include <linux/highmem.h>
>>>>>>  #include <linux/swap.h>
>>>>>>  #include <linux/quotaops.h>
>>>>>> +#include <linux/blkdev.h>
>>>>>>  
>>>>>>  #include <cluster/masklog.h>
>>>>>>  
>>>>>> @@ -7184,3 +7185,156 @@ out_commit:
>>>>>>  out:
>>>>>>      return ret;
>>>>>>  }
>>>>>> +
>>>>>> +static int ocfs2_trim_extent(struct super_block *sb,
>>>>>> +                 struct ocfs2_group_desc *gd,
>>>>>> +                 int start, int count)
>>>>>> +{
>>>>>> +    u64 discard;
>>>>>> +
>>>>>> +    count = ocfs2_clusters_to_blocks(sb, count);
>>>>>> +    discard = le64_to_cpu(gd->bg_blkno) +
>>>>>> +            ocfs2_clusters_to_blocks(sb, start);
>>>>>> +
>>>>>> +    return sb_issue_discard(sb, discard, count, GFP_NOFS, 0);
>>>>>> +}
>>>>>> +
>>>>>> +static int ocfs2_trim_group(struct super_block *sb,
>>>>>> +                struct ocfs2_group_desc *gd,
>>>>>> +                int start, int max, int minbits)
>>>>>> +{
>>>>>> +    int ret = 0, count = 0, next;
>>>>>> +    void *bitmap = gd->bg_bitmap;
>>>>>> +
>>>>>> +    while (start < max) {
>>>>>> +        start = ocfs2_find_next_zero_bit(bitmap, max, start);
>>>>>> +        if (start >= max)
>>>>>> +            break;
>>>>>    /* What if the 'start' stands within a hole */
>>>>>
>>>>>    if (ocfs2_test_bit(...)) {
>>>>>       start = ocfs2_find_next_zero_bit(...);
>>>>>       if ((start == -1) || (start >= max))
>>>>>          break;
>>>>>    }
>>>>>
>>>>>> +        next = ocfs2_find_next_bit(bitmap, max, start);
>>>>>      next = ocfs2_find_next_bit(...);
>>>>>    if (next == -1)
>>>>>       break;
>>>> next will be set to "-1"? sorry, but where do you get it?
>>>>>    if (next > max)
>>>>>       next = max;
>>>> again, ocfs2_find_next_bit will return a value larger than 'max'? I am
>>>> afraid not. Otherwise, it will be nonsense to pass a 'max' to it.
>>>
>>> Say we're handling the last group, and 'start + len' falls within a
>>> hole. Then 'max' is 'first_bit + len', while the next non-zero bit we
>>> find may be larger than 'max'. Isn't that possible?
>> ocfs2_find_next_bit (and ext2_find_next_bit) won't parse, check or
>> return a 'bit' after 'max'. Otherwise there would be a memory overflow
>> problem (you would read and check memory which isn't owned and handled
>> by you). The same goes here. If it could return a value larger than
>> 'max', every caller would have to check for the overflow. That would be
>> too painful.
> 
>  Oh, you may have misunderstood my words. The 'max' you pass to
> ocfs2_find_next_bit() may not be the ending edge of the cluster group
> (bitmap); it may be the end of what the user specified for TRIMing.
> Therefore the 'next' bit (the ending edge of a wanted hole) you get from
> ocfs2_find_next_bit() might be larger than 'max'. Is that possible?
Please note that ocfs2_find_next_bit knows nothing about what 'max'
means. So no matter whether it is the end of the cluster group or just the
middle of a bitmap, it would not return values after 'max', I think.
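
To make the bounded-search argument concrete, here is a rough user-space
sketch. It is not the ocfs2 or kernel implementation: bit_is_set(),
next_bit_bounded() and count_trimmable() are hypothetical names, and the
bitmap layout is simplified to plain unsigned longs. It only assumes the
convention that the search returns the limit itself when nothing is found,
so the result can never exceed whatever 'max' the caller passed in.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: test a single bit in a bitmap of unsigned longs. */
static int bit_is_set(const unsigned long *map, size_t bit)
{
    return (int)((map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

/*
 * Search [start, size) for the first bit whose value equals 'want'.
 * Returns 'size' when no such bit exists, so the result is never
 * larger than the limit passed in, whatever that limit means.
 */
static size_t next_bit_bounded(const unsigned long *map, size_t size,
                               size_t start, int want)
{
    size_t bit;

    for (bit = start; bit < size; bit++)
        if (bit_is_set(map, bit) == want)
            return bit;
    return size;
}

/*
 * Walk the free (zero) runs inside [start, max) and add up the bits of
 * every run that is at least 'minbits' long. A real trim would issue a
 * discard for each such run instead of counting it.
 */
static size_t count_trimmable(const unsigned long *map, size_t start,
                              size_t max, size_t minbits)
{
    size_t total = 0;

    assert(start <= max);
    while (start < max) {
        size_t next;

        start = next_bit_bounded(map, max, start, 0);
        if (start >= max)
            break;
        /* 'next' is the first used bit after the run, or 'max'. */
        next = next_bit_bounded(map, max, start, 1);
        if (next - start >= minbits)
            total += next - start;
        start = next + 1;
    }
    return total;
}
```

With this convention the caller needs no extra "next > max" clamp: a free
run that extends past a user-specified limit is simply cut off at the limit.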
> 
>>>>>> +int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
>>>>>> +{
>>>>>> +    struct ocfs2_super *osb = OCFS2_SB(sb);
>>>>>> +    u64 start, len, minlen, trimmed, first_group, last_group, group;
>>>>>    why not use u32 start, len, minlen, trimmed?
>>>> We may use 64-bit clusters later, I guess. What's more, they will be
>>>> set by the user later, and that may overflow. Say the user passes a u64
>>>> range->len; a u32 would overflow with range->len >>
>>>> osb->s_clustersize_bits.
>>> I just found we use u32 for counting clusters all around the ocfs2
>>> code, e.g. the truncate/punching_hole code, which also takes a u64
>>> byte_offset from userspace, so my original intention was to keep
>>> things uniform ;-)
>>>
>>> Overflow can theoretically happen anyway; however, it's not very likely
>>> that userspace passes a 16TB+ byte_offset.
>> I am afraid it is very likely. Say you want to trim all the clusters
>> within the volume: how would you set 'range->len'? Would you first run
>> fdisk to get the volume size and then set it accordingly?
>> Most people will set it to ULLONG_MAX and let the file system handle it.
>> This is not just my personal view; please check this article:
>> http://lwn.net/Articles/417809/
>> Jonathan also suggests setting len to ULLONG_MAX so that you can trim
>> the whole volume.
> 
>    Nice self-defense ;-) How about the overflow risk in the
> truncate/punching-hole code, where u32 is being used for cluster
> counting?
yeah, you can try and fix it.
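
For reference, the truncation discussed above can be sketched in user space
as follows. This is only an illustration under assumptions: the function
names are hypothetical, uint64_t/uint32_t stand in for the kernel's u64/u32,
and a 4KB cluster size (shift of 12) is assumed for the numbers in the note
below.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte length to clusters, keeping the full 64-bit result. */
static uint64_t bytes_to_clusters64(uint64_t bytes, unsigned int clustersize_bits)
{
    return bytes >> clustersize_bits;
}

/*
 * Same conversion, but stored in 32 bits: any cluster count of 2^32 or
 * more silently loses its high bits.
 */
static uint32_t bytes_to_clusters32(uint64_t bytes, unsigned int clustersize_bits)
{
    return (uint32_t)(bytes >> clustersize_bits);
}

/*
 * Clamp a requested [start, start + len) cluster range to the volume, so
 * a caller may pass a huge length and let the file system sort it out.
 */
static uint64_t clamp_len(uint64_t start, uint64_t len, uint64_t total_clusters)
{
    if (start >= total_clusters)
        return 0;
    if (len > total_clusters - start)
        return total_clusters - start;
    return len;
}
```

With 4KB clusters, a 16TB length (1ULL << 44 bytes) is exactly 2^32
clusters, which becomes 0 in the 32-bit variant, while the 64-bit variant
keeps the value and clamp_len() can then reduce it to the real volume size.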

Regards,
Tao

Thread overview: 16+ messages
2011-03-07 10:02 [Ocfs2-devel] [PATCH 0/3] ocfs2: Add batched discard support Tao Ma
2011-03-07 10:05 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Add ocfs2_trim_fs for SSD trim support Tao Ma
2011-03-08  4:55   ` Tristan Ye
2011-03-08  5:53     ` Tao Ma
2011-03-08  6:23       ` Tristan Ye
2011-03-08  6:42         ` Tao Ma
2011-03-08  6:53           ` Tristan Ye
2011-03-08  7:47             ` Tao Ma [this message]
2011-03-08  7:53     ` Tao Ma
2011-03-08  7:59       ` Tristan Ye
2011-03-07 10:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Add FITRIM ioctl Tao Ma
2011-03-07 10:05 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Add trace event for trim Tao Ma
2011-03-08 15:26 ` [Ocfs2-devel] [PATCH 1/3 v2] ocfs2: Add ocfs2_trim_fs for SSD trim support Tao Ma
  -- strict thread matches above, loose matches on Subject: below --
2011-05-06  9:23 [Ocfs2-devel] [PATCH 0/3] ocfs2: Add batched discard support Tao Ma
2011-05-06  9:27 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Add ocfs2_trim_fs for SSD trim support Tao Ma
2011-05-09 23:02   ` Sunil Mushran
2011-05-10  3:14     ` Tao Ma
