* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
@ 2011-11-15 22:00 Petr Rockai
2011-11-15 22:23 ` Milan Broz
2011-11-16 1:20 ` Alasdair G Kergon
0 siblings, 2 replies; 17+ messages in thread
From: Petr Rockai @ 2011-11-15 22:00 UTC (permalink / raw)
To: lvm-devel
Hi,
the attached patch should fix the problem encountered in 754198. Namely,
we failed to reset the percent threshold for the next check after
extending a snapshot. To this end, I have added a special exit code,
ENO_ACTION_NEEDED -- however, this is not absolutely required, and is
only an optimisation. On the other hand, it is likely an important
optimisation, because without this, all snapshots filled above 50 % will
run a LVM command (with all the associated overhead) every 10
seconds. There are more possible uses for this new exit code, so I think
it's a good idea anyway.
Documentation *might* need updating (although it's currently only used
by --use-policies, which is not an entirely user-level option...).
Other than that, the patch is relatively straightforward.
Yours,
Petr
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmeventd-snapshot-fix.diff
Type: text/x-diff
Size: 3791 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/lvm-devel/attachments/20111115/7bbcffad/attachment.bin>
-------------- next part --------------
--
id' Ash = Ash; id' Dust = Dust; id' _ = undefined
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-15 22:00 [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions) Petr Rockai
@ 2011-11-15 22:23 ` Milan Broz
[not found] ` <20111116000020.GA3294@agk-dp.fab.redhat.com>
2011-11-16 1:20 ` Alasdair G Kergon
1 sibling, 1 reply; 17+ messages in thread
From: Milan Broz @ 2011-11-15 22:23 UTC (permalink / raw)
To: lvm-devel
On 11/15/2011 11:00 PM, Petr Rockai wrote:
> the attached patch should fix the problem encountered in 754198. Namely,
> we failed to reset the percent threshold for the next check after
> extending a snapshot. To this end, I have added a special exit code,
> ENO_ACTION_NEEDED -- however, this is not absolutely required, and is
> only an optimisation. On the other hand, it is likely an important
> optimisation, because without this, all snapshots filled above 50 % will
> run a LVM command (with all the associated overhead) every 10
> seconds. There are more possible uses for this new exit code, so I think
> it's a good idea anyway.
>
> Documentation *might* need updating (although it's currently only used
> by --use-policies, which is not an entirely user-level option...).
>
> Other than that, the patch is relatively straightforward.
For me this seems ok, ack.
(dmeventd/lvm interaction through calling lvm binary strange anyway...)
Milan
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-15 22:00 [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions) Petr Rockai
2011-11-15 22:23 ` Milan Broz
@ 2011-11-16 1:20 ` Alasdair G Kergon
2011-11-16 9:41 ` Petr Rockai
1 sibling, 1 reply; 17+ messages in thread
From: Alasdair G Kergon @ 2011-11-16 1:20 UTC (permalink / raw)
To: lvm-devel
On Tue, Nov 15, 2011 at 11:00:24PM +0100, Peter Rockai wrote:
> extending a snapshot. To this end, I have added a special exit code,
> ENO_ACTION_NEEDED -- however, this is not absolutely required, and is
Let's not add it then in response to a bug fix.
If the cmd exits OK, there's only really one code we should return for
shell scripts. Splitting 'OK' into 'OK - something changed' and
'OK - but nothing was changed' doesn't seem worth the effort and
adding a new code for one special case is the thin end of the wedge.
(If there's nothing to do and use_policies was given, it can just return the
normal OK?)
> only an optimisation. On the other hand, it is likely an important
> optimisation, because without this, all snapshots filled above 50 % will
> run a LVM command (with all the associated overhead) every 10
Why does it loop every 10 seconds? Is there another way to stop it?
1) Is there some problematic interaction between the configurable
(or even default) parameters that's causing problems? Do some
parameters sometimes need overriding or constraining?
We should list the parameters and any constraints and make sure the
code deals with them.
(This might be going beyond the specific problem that Corey found,
but suggesting related problems to fix.)
2) One auto-extend should always be enough to deal with the notified
event.
- It should pass in the minimum required extension and use_policies
should take account of that.
Alasdair
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-16 1:20 ` Alasdair G Kergon
@ 2011-11-16 9:41 ` Petr Rockai
2011-11-16 10:59 ` Alasdair G Kergon
0 siblings, 1 reply; 17+ messages in thread
From: Petr Rockai @ 2011-11-16 9:41 UTC (permalink / raw)
To: lvm-devel
Hi,
Alasdair G Kergon <agk@redhat.com> writes:
> Let's not add it then in response to a bug fix.
>
> If the cmd exits OK, there's only really one code we should return for
> shell scripts. Splitting 'OK' into 'OK - something changed' and
> 'OK - but nothing was changed' doesn't seem worth the effort and
> adding a new code for one special case is the thin end of the wedge.
>
> (If there's nothing to do and use_policies was given, it can just
> return the normal OK?)
That's what used to happen until now.
> Why does it loop every 10 seconds? Is there another way to stop it?
No, there is not. (See below.)
> 1) Is there some problematic interaction between the configurable
> (or even default) parameters that's causing problems? Do some
> parameters sometimes need overriding or constraining?
> We should list the parameters and any constraints and make sure the
> code deals with them.
> (This might be going beyond the specific problem that Corey found,
> but suggesting related problems to fix.)
No, this is not a problem and is not related to the fix at hand.
> 2) One auto-extend should always be enough to deal with the notified
> event.
> - It should pass in the minimum required extension and use_policies
> should take account of that.
See above, and my previous mail as well. This is *not* a problem (and
most definitely not _the_ problem). This is all about multiple
"events". Only that we do not get any actual events from the kernel, so
we poll every ten seconds. And to avoid actually running lvresize every
10 seconds, we cache the outcome of the last check and only re-run
lvresize if things "changed" from the last time.
Petr
PS: If the separate exit code is not acceptable, there are other ways to
fix this, but they involve more substantial changes to the snapshot
DSO. It's your call whether it's worth complicating the logic and
increasing the risks by going that way.
--
id' Ash = Ash; id' Dust = Dust; id' _ = undefined
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-16 9:41 ` Petr Rockai
@ 2011-11-16 10:59 ` Alasdair G Kergon
2011-11-16 13:07 ` Alasdair G Kergon
0 siblings, 1 reply; 17+ messages in thread
From: Alasdair G Kergon @ 2011-11-16 10:59 UTC (permalink / raw)
To: lvm-devel
On Wed, Nov 16, 2011 at 10:41:16AM +0100, Peter Rockai wrote:
> See above, and my previous mail as well. This is *not* a problem (and
> most definitely not _the_ problem). This is all about multiple
> "events". Only that we do not get any actual events from the kernel, so
> we poll every ten seconds. And to avoid actually running lvresize every
> 10 seconds, we cache the outcome of the last check and only re-run
> lvresize if things "changed" from the last time.
I don't see why the outcome of lvextend matters provided that one lvextend
is always sufficient to bring it back within bounds. All that should matter
to the algorithm is the change in space used since the last check.
I still think:
- a minimum size needs to be passed into lvextend (to avoid repeated calls if
there is rapid growth or if snapshot_autoextend_threshold was set too small)
- the trigger logic needs improving and needs to take account of the actual
space used as well as the percentage (and set the mininum extension size
appropriately)
Alasdair
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-16 10:59 ` Alasdair G Kergon
@ 2011-11-16 13:07 ` Alasdair G Kergon
2011-11-18 9:52 ` Petr Rockai
0 siblings, 1 reply; 17+ messages in thread
From: Alasdair G Kergon @ 2011-11-16 13:07 UTC (permalink / raw)
To: lvm-devel
On Wed, Nov 16, 2011 at 10:59:53AM +0000, Alasdair G Kergon wrote:
> - a minimum size needs to be passed into lvextend (to avoid repeated calls if
> there is rapid growth or if snapshot_autoextend_threshold was set too small)
> - the trigger logic needs improving and needs to take account of the actual
> space used as well as the percentage (and set the mininum extension size
> appropriately)
So this line of thought leads to lvextend --use-polices calculating:
(new_size = old_size + amount_to_extend_by) >
old_size_in_use + ((100 + autoextend_percent) - autoextend_threshold) / (100+autoextend_percent) * (old_size + amount_to_extend_by)
IOW we scale the parameters based on the amount currently in use.
For example:
snapshot_autoextend_threshold = 70
snapshot_autoextend_percent = 20
Means if it was 70% full, the percentage of free space after extension would
be:
((100 + 20) - 70) / (100 + 20) = ((100 + autoextend_percent) - autoextend_threshold) / (100+autoextend_percent)
If it's more than 70% full, we extend it so that we obtain that same percentage
of free space.
As the check interval is 5%, we impose a minimum autoextend_percent of 6% of
old size (so > 5% of new size)
The plugin keeps track of the current size (from the status line) and if
the size increased it discovers this the next time it reads the status line
and resets its check at that point.
Separately, the plugin should perhaps record the maximum amount of growth
during a check interval and use that as a minimum amount of extension, if the
actual extension would otherwise be less.
Alasdair
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-16 13:07 ` Alasdair G Kergon
@ 2011-11-18 9:52 ` Petr Rockai
2011-11-18 17:14 ` Zdenek Kabelac
0 siblings, 1 reply; 17+ messages in thread
From: Petr Rockai @ 2011-11-18 9:52 UTC (permalink / raw)
To: lvm-devel
Hi,
Alasdair G Kergon <agk@redhat.com> writes:
> (new_size = old_size + amount_to_extend_by) >
> old_size_in_use + ((100 + autoextend_percent) - autoextend_threshold) / (100+autoextend_percent) * (old_size + amount_to_extend_by)
>
> IOW we scale the parameters based on the amount currently in use.
>
> For example:
>
> snapshot_autoextend_threshold = 70
> snapshot_autoextend_percent = 20
[snip] This is all great, but completely besides the point. So let me
stress the point once more:
- snapshot DSO monitors a snapshot, getting its status line every 10
seconds or so
- the DSO has *no access* to the policy variables, because it does not
(and can not) read lvm.conf
The problem:
- to check whether anything needs to happen, lvextend needs to be
executed (this is *expensive*, even if it decides no action needs to
happen)
- if nothing needed to happen, we don't need to call lvextend until the
utilisation has grown; *but* we don't know whether anything happened
(without ENO_ACTION_NEEDED that is)
Alternative solutions:
- we can store the actual size of the snapshot returned by the status
ioctl, and compare that instead of relying on lvextend to say
ENO_ACTION_NEEDED; it's less elegant and requires more changes in
code, but avoids ENO_ACTION_NEEDED; there is a subtle race condition
involved, although most likely not important
- we could check whether percent usage decreased since the last
time... all of the disadvantages of the above are retained, and
additionally is unreliable... I would advise against this solution
All I need to know which of the possible solutions you want
implemented. The ENO_ACTION_NEEDED one (the one proposed) is most
elegant and least intrusive. All the others are basically a workaround
for lack of ENO_ACTION_NEEDED -- we go and try to figure out if anything
actually changed.
Petr
--
id' Ash = Ash; id' Dust = Dust; id' _ = undefined
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-18 9:52 ` Petr Rockai
@ 2011-11-18 17:14 ` Zdenek Kabelac
2011-11-18 19:46 ` Petr Rockai
0 siblings, 1 reply; 17+ messages in thread
From: Zdenek Kabelac @ 2011-11-18 17:14 UTC (permalink / raw)
To: lvm-devel
Dne 18.11.2011 10:52, Petr Rockai napsal(a):
> Hi,
>
> Alasdair G Kergon<agk@redhat.com> writes:
>
>> (new_size = old_size + amount_to_extend_by)>
>> old_size_in_use + ((100 + autoextend_percent) - autoextend_threshold) / (100+autoextend_percent) * (old_size + amount_to_extend_by)
>>
>> IOW we scale the parameters based on the amount currently in use.
>>
>> For example:
>>
>> snapshot_autoextend_threshold = 70
>> snapshot_autoextend_percent = 20
>
> [snip] This is all great, but completely besides the point. So let me
> stress the point once more:
>
> - snapshot DSO monitors a snapshot, getting its status line every 10
> seconds or so
> - the DSO has *no access* to the policy variables, because it does not
> (and can not) read lvm.conf
The formula above should be placed inside lvresize (_adjust_policy_params()).
It's advantage was to resize in a way - the it should avoid the need of
quick subsequent (i.e. more then autoextend_percent - so if you fill
snapshot very fast, there should be some reserve to catch up within
next 10s check.
> The problem:
>
> - to check whether anything needs to happen, lvextend needs to be
> executed (this is *expensive*, even if it decides no action needs to
> happen)
>
> - if nothing needed to happen, we don't need to call lvextend until the
> utilisation has grown; *but* we don't know whether anything happened
> (without ENO_ACTION_NEEDED that is)
But you still call dmeventd_lvm2_run() - which is the most expensive
operation here - so I do not exactly see what do you actually safe here
i.e. if the resize will not happen because according to policy it's not
yet needed - then I do not see any difference ?
Zdenek
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-18 17:14 ` Zdenek Kabelac
@ 2011-11-18 19:46 ` Petr Rockai
2011-11-18 19:56 ` Alasdair G Kergon
2011-11-18 20:48 ` Zdenek Kabelac
0 siblings, 2 replies; 17+ messages in thread
From: Petr Rockai @ 2011-11-18 19:46 UTC (permalink / raw)
To: lvm-devel
Zdenek Kabelac <zkabelac@redhat.com> writes:
> The formula above should be placed inside lvresize
> (_adjust_policy_params()).
And still completely besides the point.
>> The problem:
>>
>> - to check whether anything needs to happen, lvextend needs to be
>> executed (this is *expensive*, even if it decides no action needs to
>> happen)
>>
>> - if nothing needed to happen, we don't need to call lvextend until the
>> utilisation has grown; *but* we don't know whether anything happened
>> (without ENO_ACTION_NEEDED that is)
>
> But you still call dmeventd_lvm2_run() - which is the most expensive
> operation here - so I do not exactly see what do you actually safe here
> i.e. if the resize will not happen because according to policy it's not
> yet needed - then I do not see any difference ?
Well, what can I say. I have already tried to explain how this works
(twice?), and it's also quite obvious from the code. Well, once more: we
do *not* call dmeventd_lvm2_run *if* the last lvconvert decided that
nothing needs to be done *and* the snapshot utilisation did not change
(more than 5%).
To implement that, we need to: 1) know that snapshot utilisation grew
(we know this already in the existing code) and 2) that lvconvert did
not change anything. We don't know 2 and wrongly assume that nothing
changes, ever (that's the bug).
Were we to assume every time that lvconvert might have changed
something, we would have to call it *every time* (every 10 seconds).
Is it clear now?
--
id' Ash = Ash; id' Dust = Dust; id' _ = undefined
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-18 19:46 ` Petr Rockai
@ 2011-11-18 19:56 ` Alasdair G Kergon
2011-11-18 20:28 ` Petr Rockai
2011-11-18 20:48 ` Zdenek Kabelac
1 sibling, 1 reply; 17+ messages in thread
From: Alasdair G Kergon @ 2011-11-18 19:56 UTC (permalink / raw)
To: lvm-devel
On Fri, Nov 18, 2011 at 08:46:44PM +0100, Peter Rockai wrote:
> that lvconvert did
> not change anything. We don't know 2 and wrongly assume that nothing
> changes, ever (that's the bug).
Yes, and we can detect 2 from the status line by checking whether the
device size changed, avoiding the need for the new exit code.
Alasdair
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-18 19:56 ` Alasdair G Kergon
@ 2011-11-18 20:28 ` Petr Rockai
2011-11-18 21:19 ` Petr Rockai
0 siblings, 1 reply; 17+ messages in thread
From: Petr Rockai @ 2011-11-18 20:28 UTC (permalink / raw)
To: lvm-devel
Alasdair G Kergon <agk@redhat.com> writes:
> On Fri, Nov 18, 2011 at 08:46:44PM +0100, Peter Rockai wrote:
>> that lvconvert did
>> not change anything. We don't know 2 and wrongly assume that nothing
>> changes, ever (that's the bug).
>
> Yes, and we can detect 2 from the status line by checking whether the
> device size changed, avoiding the need for the new exit code.
Which is exactly what I proposed two mails back as a workaround for not
being able to use the exit code. So is that the preferred solution?
--
id' Ash = Ash; id' Dust = Dust; id' _ = undefined
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions)
2011-11-18 19:46 ` Petr Rockai
2011-11-18 19:56 ` Alasdair G Kergon
@ 2011-11-18 20:48 ` Zdenek Kabelac
2011-11-18 20:53 ` Alasdair G Kergon
1 sibling, 1 reply; 17+ messages in thread
From: Zdenek Kabelac @ 2011-11-18 20:48 UTC (permalink / raw)
To: lvm-devel
Dne 18.11.2011 20:46, Petr Rockai napsal(a):
> Zdenek Kabelac<zkabelac@redhat.com> writes:
>
>> The formula above should be placed inside lvresize
>> (_adjust_policy_params()).
>
> And still completely besides the point.
>
>>> The problem:
>>>
>>> - to check whether anything needs to happen, lvextend needs to be
>>> executed (this is *expensive*, even if it decides no action needs to
>>> happen)
>>>
>>> - if nothing needed to happen, we don't need to call lvextend until the
>>> utilisation has grown; *but* we don't know whether anything happened
>>> (without ENO_ACTION_NEEDED that is)
>>
>> But you still call dmeventd_lvm2_run() - which is the most expensive
>> operation here - so I do not exactly see what do you actually safe here
>> i.e. if the resize will not happen because according to policy it's not
>> yet needed - then I do not see any difference ?
>
> Well, what can I say. I have already tried to explain how this works
> (twice?), and it's also quite obvious from the code. Well, once more: we
> do *not* call dmeventd_lvm2_run *if* the last lvconvert decided that
> nothing needs to be done *and* the snapshot utilisation did not change
> (more than 5%).
>
> To implement that, we need to: 1) know that snapshot utilisation grew
> (we know this already in the existing code) and 2) that lvconvert did
> not change anything. We don't know 2 and wrongly assume that nothing
> changes, ever (that's the bug).
>
> Were we to assume every time that lvconvert might have changed
> something, we would have to call it *every time* (every 10 seconds).
>
> Is it clear now?
>
Well my assumption is:
while {
read percentage
if converted && current_percent < last_time_percent
last_time_percent = current_percent
converted = 0
else if current_percent >= (last_time_percent + 5%)
lvconvert
converted = 1
else
wait 10sec
}
Zdenek
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2011-11-18 21:19 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-15 22:00 [PATCH] Fix RHBZ 754198 (multiple dmeventd snapshot extensions) Petr Rockai
2011-11-15 22:23 ` Milan Broz
[not found] ` <20111116000020.GA3294@agk-dp.fab.redhat.com>
2011-11-16 9:30 ` Petr Rockai
2011-11-16 9:44 ` Zdenek Kabelac
2011-11-16 10:28 ` Alasdair G Kergon
2011-11-16 1:20 ` Alasdair G Kergon
2011-11-16 9:41 ` Petr Rockai
2011-11-16 10:59 ` Alasdair G Kergon
2011-11-16 13:07 ` Alasdair G Kergon
2011-11-18 9:52 ` Petr Rockai
2011-11-18 17:14 ` Zdenek Kabelac
2011-11-18 19:46 ` Petr Rockai
2011-11-18 19:56 ` Alasdair G Kergon
2011-11-18 20:28 ` Petr Rockai
2011-11-18 21:19 ` Petr Rockai
2011-11-18 20:48 ` Zdenek Kabelac
2011-11-18 20:53 ` Alasdair G Kergon
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.