From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>
Cc: ReiserFS development mailing list <reiserfs-devel@vger.kernel.org>
Subject: Re: [PATCH] reiser4: precise discard - general case
Date: Thu, 12 Feb 2015 00:40:48 +0100
Message-ID: <54DBE880.4070902@gmail.com>
In-Reply-To: <1423649354.10127.16.camel@gmail.com>


On 02/11/2015 11:09 AM, Ivan Shapovalov wrote:
> On 2015-02-11 at 09:23 +0100, Edward Shishkin wrote:
>> On 02/10/2015 09:42 PM, Ivan Shapovalov wrote:
>>> On 2014-12-20 at 21:24 +0100, Edward Shishkin wrote:
>>>> This is the promised generalization, which is supposed to work for all
>>>> discard offsets and all discard unit sizes without any restrictions.
>>>>
>>>> Complications in comparison with the previous implementation:
>>>>
>>>> In this general case we need "precise" coordinates, where every
>>>> individual byte can be addressed. All local variables, which represent
>>>> precise coordinates are denoted with "prefixes" (a_len, d_off, p_tailp,
>>>> etc). Local variables, which represent "non-precise" coordinates (they
>>>> are usually of type reiser4_block_nr) are denoted without prefixes
>>>> (start, len, end, tailp, etc).
>>>>
>>>> Blocks, which contain head and tail paddings are now calculated using
>>>> the function size_in_blocks(), which actually is an expression for the
>>>> minimal number of blocks containing the precise extent.
>>>>
>>>> The next trouble is the "peculiarity in 0", encountered when calculating
>>>> the blocks of head padding. If the discard offset is different from 0,
>>>> then the first discard unit of the partition is partial (its other part
>>>> doesn't belong to our partition, so we cannot discard it). We handle
>>>> this peculiarity with an additional check.
>>>>
>>>> In other bits everything is the same.
>>>>
>>>> Possible optimization: If discard unit sizes are always powers of 2,
>>>> then it makes sense to replace "do_div(offset, unit_size)" with
>>>> "offset & (unit_size - 1)".
>>>>
>>>> Mount options discard.offset=xxx,discard.unit=yyy are to emulate various
>>>> discard unit sizes and offsets on devices _without_ trim support (e.g.
>>>> HDDs). This is only for debugging purposes, don't use it for real SSD
>>>> devices: the kernel retrieves the discard parameters on its own.
>>>>
>>>> This patch is against the patch series of Ivan Shapovalov:
>>>> http://marc.info/?l=reiserfs-devel&m=141841865432082&w=2
>>>>
>>>> Current status: not well-tested.
>>>>
>>>> Edward.
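
For readers of the quoted description, the two computations it mentions
can be sketched in user-space C. This is only an illustration under
stated assumptions, not the reiser4 code: blocks_covering(), rem_generic()
and rem_pow2() are hypothetical names, the real size_in_blocks() may take
different arguments, and plain "%" stands in for the kernel's do_div().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: minimal number of blocks that contain the
 * precise (byte-addressed) extent [p_off, p_off + p_len).
 */
static uint64_t blocks_covering(uint64_t p_off, uint64_t p_len,
                                uint32_t blocksize)
{
    assert(blocksize > 0);
    if (p_len == 0)
        return 0;
    /* index of the last touched block minus index of the first, plus one */
    return (p_off + p_len - 1) / blocksize - p_off / blocksize + 1;
}

/* Remainder of offset modulo unit_size, valid for any unit size. */
static uint32_t rem_generic(uint64_t offset, uint32_t unit_size)
{
    assert(unit_size > 0);
    return (uint32_t)(offset % unit_size);
}

/* Same remainder via a mask; valid only when unit_size is a power of 2. */
static uint32_t rem_pow2(uint64_t offset, uint32_t unit_size)
{
    assert(unit_size > 0 && (unit_size & (unit_size - 1)) == 0);
    return (uint32_t)(offset & (unit_size - 1));
}
```

For example, a 2-byte precise extent that starts at byte 4095 straddles
two 4096-byte blocks, and the mask form agrees with the division form
whenever the unit size is a power of 2.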
>>> Hi,
>>>
>>> I've found a bug in our implementation (don't know when it appeared,
>>> maybe it was quite some time ago). I've intended to fix it and send
>>> a patch along with description, but I still can't think of a viable fix.
>>>
>>> So: the problem is that check_free_blocks() isn't idempotent, because it
>>> allocates blocks if the whole extent is clean. Therefore, it must not be
>>> called for overlapping ranges. However, in some conditions tail padding
>>> of some extent and head padding of next extent may overlap in terms of
>>> disk blocks (gluing code only catches overlapping erase units).
>>>
>>> This will yield a false negative when checking the head padding, so it
>>> does not lead to any data losses (just to inefficiency).
>>
>> You mean that sometimes we perform unneeded checks?
>> I see nothing criminal, as we don't exceed announced (2N_e)
>> number of checks, where N_e is number of extents in the
>> discard set.
> No, I didn't talk about that.
>
>> As to fixup: I think that we need to set up the local variable
>> head_is_known_dirty properly..
> Hmm, head_is_known_dirty is an optimization: either known dirty
> (in which case we skip checking and cut the head), or unknown
> (in which case we do the check).
>
> I'm talking about a different scenario:
> - tail padding of an extent is clean
> - head padding of the next extent is clean
> - these two paddings overlap in terms of disk blocks
>
> In this case, the head padding check will yield false ("dirty") because
> part of it has been already allocated for the tail padding, but in fact
> it is clean. Thus a false negative: the head will be cut while it can be
> padded.


Ah, you suspect non-preciseness (a leak of "garbage")?

If the tail padding of the current extent overlaps with the head padding
of the next extent, then the end of the current extent and the beginning
of the next extent are in the same erase unit. Otherwise we arrive at a
contradiction. Correct? Now note that we'll try to glue such extents
(the current one and its right neighbor).

If this gluing fails (the "area" between the extents is dirty), then we
set head_is_known_dirty = 1, so that the head padding of the next extent
won't be checked.

If this gluing succeeds, then we won't check the head of the next
extent either, because we jump to this next extent and try to glue its
right neighbor.

That is, I still don't see any problems.
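
The case analysis above can be sketched as a small user-space model.
This is a simplified illustration, not the actual discard code: it
assumes a discard offset of 0 and an erase unit of UNIT blocks, and
everything except the name head_is_known_dirty (struct extent,
gap_is_clean(), count_head_checks()) is hypothetical. It only counts how
many head padding checks would be issued.

```c
#include <assert.h>

#define UNIT 8UL /* assumed erase unit size in blocks, discard offset 0 */

struct extent {
    unsigned long start; /* first block */
    unsigned long len;   /* length in blocks, > 0 */
};

/* Hypothetical stand-in for the free-space check of the gap [start, end). */
static int gap_is_clean(const unsigned char *dirty,
                        unsigned long start, unsigned long end)
{
    for (unsigned long b = start; b < end; b++)
        if (dirty[b])
            return 0;
    return 1;
}

/* Walk sorted extents and count the head padding checks that are issued. */
static int count_head_checks(const struct extent *e, int n,
                             const unsigned char *dirty)
{
    int head_checks = 0;
    int head_is_known_dirty = 0;
    int i = 0;

    assert(n >= 0);
    while (i < n) {
        /* The head is checked only if nothing is known about it yet. */
        if (!head_is_known_dirty)
            head_checks++;
        head_is_known_dirty = 0;

        /* Try to glue right neighbors that share an erase unit. */
        int j = i;
        while (j + 1 < n) {
            unsigned long end = e[j].start + e[j].len; /* exclusive */
            unsigned long next = e[j + 1].start;

            if ((end - 1) / UNIT != next / UNIT)
                break; /* different erase units: paddings cannot overlap */
            if (!gap_is_clean(dirty, end, next)) {
                head_is_known_dirty = 1; /* gluing failed */
                break;
            }
            j++; /* gluing succeeded: continue from the neighbor */
        }
        i = j + 1;
    }
    return head_checks;
}
```

In this model, for extents {2,3}, {6,1}, {20,2} the first two share erase
unit 0, so the head of the second extent is never checked, whether the
gap block 5 is clean (the extents are glued) or dirty
(head_is_known_dirty is set).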

Edward.


> More generally, our check_free_blocks() is not idempotent. After a
> positive result has been returned for a given range [A;B),
> check_free_blocks() must not be called for any range [A';B') such that
> [A;B) ∩ [A';B') ≠ ∅. Otherwise a false negative can be returned.
>
> Actually, during the last night I've *apparently* come up with a
> solution for this :) I'll send a patch once I figure out how to explain
> it properly in the comments...
>
> Thanks,
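
The non-idempotency described in the quoted paragraph can be reproduced
with a toy model. This is a sketch under assumptions, not the reiser4
function: struct toy_bitmap and toy_check_free_blocks() are hypothetical,
and the only behavior taken from the thread is "allocate the blocks if
the whole range is clean".

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 64

/* Toy allocation map: used[b] != 0 means block b is allocated. */
struct toy_bitmap {
    unsigned char used[NBLOCKS];
};

/*
 * Toy check: returns 1 and allocates [start, start + len) if every
 * block in the range is free; returns 0 and changes nothing otherwise.
 */
static int toy_check_free_blocks(struct toy_bitmap *bm,
                                 unsigned start, unsigned len)
{
    assert(start + len <= NBLOCKS);
    for (unsigned b = start; b < start + len; b++)
        if (bm->used[b])
            return 0;
    memset(&bm->used[start], 1, len);
    return 1;
}
```

On an empty map, a call for [4;8) succeeds, and a following call for the
overlapping range [6;10) then reports "dirty" although all of its blocks
were clean before the first call. A disjoint range such as [8;12) is
unaffected.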

