reiser4: FITRIM ioctl -- how to grab the space?

reiserfs-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* reiser4: FITRIM ioctl -- how to grab the space?
@ 2014-07-31 20:47 Ivan Shapovalov
  2014-07-31 22:03 ` Edward Shishkin
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-07-31 20:47 UTC (permalink / raw)
  To: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]

...and here is volume 2 of my neverending question series :)

A stub ioctl, placed as I described, seems to work and log a warning each time
I invoke `fstrim` on anything mounted.

Now on to space allocation. This seems pretty easy in the first approximation:
reiser4_alloc_blocks_bitmap() seems to do what's needed, finding the nearest
free extent, not the biggest one. So we could just call reiser4_alloc_blocks()
plus reiser4_dealloc_blocks(BA_DEFER) repeatedly, while allocations succeed.

However, I suppose we need to grab all free space before proceeding. There is
no primitive for grabbing all free space, so we need to read block counters,
calculate amount of space to grab and then call reiser4_grab_reserved()
(as we want to allocate everything, including the reserved space. This creates
two problems (as I see it):
- there is a race between reading block counters (under sb spinlock) and
  performing the grab (which takes the spinlock on its own);
- if anything will try to grab space during discarding, it will get an ENOSPC,
  while it's better to make the process wait until discarding is completed.

I'm not sure whether the last problem really exists (there is BA_CAN_COMMIT,
but I don't know whether is it used consistently where possible).

Could you explain these things?

Thanks,
-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-07-31 20:47 reiser4: FITRIM ioctl -- how to grab the space? Ivan Shapovalov
@ 2014-07-31 22:03 ` Edward Shishkin
  2014-07-31 22:16   ` Ivan Shapovalov
  2014-08-02 16:40   ` Edward Shishkin
  0 siblings, 2 replies; 20+ messages in thread
From: Edward Shishkin @ 2014-07-31 22:03 UTC (permalink / raw)
  To: Ivan Shapovalov, reiserfs-devel

grab_space() is only for processes which require disk space (including
wandered blocks). For example ->read() modifies a superblock (atime),
which should be written via journal. So ->read() grab 1 block, etc.

As to FITRIM ioctl: I don't think it needs to reserve disk space at all: 
it will
issue only discard requests.

Edward.


On 07/31/2014 10:47 PM, Ivan Shapovalov wrote:
> ...and here is volume 2 of my neverending question series :)
>
> A stub ioctl, placed as I described, seems to work and log a warning each time
> I invoke `fstrim` on anything mounted.
>
> Now on to space allocation. This seems pretty easy in the first approximation:
> reiser4_alloc_blocks_bitmap() seems to do what's needed, finding the nearest
> free extent, not the biggest one. So we could just call reiser4_alloc_blocks()
> plus reiser4_dealloc_blocks(BA_DEFER) repeatedly, while allocations succeed.
>
> However, I suppose we need to grab all free space before proceeding. There is
> no primitive for grabbing all free space, so we need to read block counters,
> calculate amount of space to grab and then call reiser4_grab_reserved()
> (as we want to allocate everything, including the reserved space. This creates
> two problems (as I see it):
> - there is a race between reading block counters (under sb spinlock) and
>    performing the grab (which takes the spinlock on its own);
> - if anything will try to grab space during discarding, it will get an ENOSPC,
>    while it's better to make the process wait until discarding is completed.
>
> I'm not sure whether the last problem really exists (there is BA_CAN_COMMIT,
> but I don't know whether is it used consistently where possible).
>
> Could you explain these things?
>
> Thanks,


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-07-31 22:03 ` Edward Shishkin
@ 2014-07-31 22:16   ` Ivan Shapovalov
       [not found]     ` <53DACEE9.8000802@gmail.com>
  2014-08-02 16:40   ` Edward Shishkin
  1 sibling, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-07-31 22:16 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

On Friday 01 August 2014 at 00:03:58, Edward Shishkin wrote:	
> grab_space() is only for processes which require disk space (including
> wandered blocks). For example ->read() modifies a superblock (atime),
> which should be written via journal. So ->read() grab 1 block, etc.
> 
> As to FITRIM ioctl: I don't think it needs to reserve disk space at all: 
> it will
> issue only discard requests.
> 
> Edward.

OK. But it still needs to allocate space (even if just to deallocate it
shortly), which means that someone could get ENOSPC while the discard is in
progress, and this ENOSPC will come from the allocator itself.

Do you have any ideas on how to do it gracefully?

Thanks,
-- 
Ivan Shapovalov / intelfx /

> 
> 
> On 07/31/2014 10:47 PM, Ivan Shapovalov wrote:
> > ...and here is volume 2 of my neverending question series :)
> >
> > A stub ioctl, placed as I described, seems to work and log a warning each time
> > I invoke `fstrim` on anything mounted.
> >
> > Now on to space allocation. This seems pretty easy in the first approximation:
> > reiser4_alloc_blocks_bitmap() seems to do what's needed, finding the nearest
> > free extent, not the biggest one. So we could just call reiser4_alloc_blocks()
> > plus reiser4_dealloc_blocks(BA_DEFER) repeatedly, while allocations succeed.
> >
> > However, I suppose we need to grab all free space before proceeding. There is
> > no primitive for grabbing all free space, so we need to read block counters,
> > calculate amount of space to grab and then call reiser4_grab_reserved()
> > (as we want to allocate everything, including the reserved space. This creates
> > two problems (as I see it):
> > - there is a race between reading block counters (under sb spinlock) and
> >    performing the grab (which takes the spinlock on its own);
> > - if anything will try to grab space during discarding, it will get an ENOSPC,
> >    while it's better to make the process wait until discarding is completed.
> >
> > I'm not sure whether the last problem really exists (there is BA_CAN_COMMIT,
> > but I don't know whether is it used consistently where possible).
> >
> > Could you explain these things?
> >
> > Thanks,
> 

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-07-31 22:03 ` Edward Shishkin
  2014-07-31 22:16   ` Ivan Shapovalov
@ 2014-08-02 16:40   ` Edward Shishkin
  1 sibling, 0 replies; 20+ messages in thread
From: Edward Shishkin @ 2014-08-02 16:40 UTC (permalink / raw)
  To: Ivan Shapovalov, reiserfs-devel


On 08/01/2014 12:03 AM, Edward Shishkin wrote:
> grab_space() is only for processes which require disk space (including
> wandered blocks). For example ->read() modifies a superblock (atime),


^superblock^ stat_data (on-disk inode)


> which should be written via journal. So ->read() grab 1 block, etc.
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
       [not found]     ` <53DACEE9.8000802@gmail.com>
@ 2014-08-10 18:52       ` Ivan Shapovalov
  2014-08-10 19:48         ` Edward Shishkin
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-10 18:52 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]

On Friday 01 August 2014 at 01:19:05, Edward Shishkin wrote:	
> 
> On 08/01/2014 12:16 AM, Ivan Shapovalov wrote:
> > [...]
> > OK. But it still needs to allocate space (even if just to deallocate it
> > shortly), which means that someone could get ENOSPC while the discard is in
> > progress, and this ENOSPC will come from the allocator itself.
> 
> 
> Ah, yes of course, we'll illegally occupy someone's reservation with
> the following oops...
> 
> 
> > Do you have any ideas on how to do it gracefully?
> 
> 
> Define maximal number of allocated blocks in one iteration
> and reserve this amount of blocks at the beginning of each
> iteration. Once the limit is exhausted, stop the scan and
> force to commit the atom.

This sounds pretty hackish... Isn't there a way to grab all possible space
at the same time?
By all possible space I mean (sbinfo->block_count - sbinfo->blocks_used),
so that `fstrim <mountpoint>` will be efficient even if system is under load
and atoms are being created continuously.

Or am I trying to over-engineer too much?

Thanks,
-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-10 18:52       ` Ivan Shapovalov
@ 2014-08-10 19:48         ` Edward Shishkin
  2014-08-10 20:37           ` Ivan Shapovalov
  0 siblings, 1 reply; 20+ messages in thread
From: Edward Shishkin @ 2014-08-10 19:48 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 08/10/2014 08:52 PM, Ivan Shapovalov wrote:
> On Friday 01 August 2014 at 01:19:05, Edward Shishkin wrote:	
>> On 08/01/2014 12:16 AM, Ivan Shapovalov wrote:
>>> [...]
>>> OK. But it still needs to allocate space (even if just to deallocate it
>>> shortly), which means that someone could get ENOSPC while the discard is in
>>> progress, and this ENOSPC will come from the allocator itself.
>>
>> Ah, yes of course, we'll illegally occupy someone's reservation with
>> the following oops...
>>
>>
>>> Do you have any ideas on how to do it gracefully?
>>
>> Define maximal number of allocated blocks in one iteration
>> and reserve this amount of blocks at the beginning of each
>> iteration. Once the limit is exhausted, stop the scan and
>> force to commit the atom.
> This sounds pretty hackish... Isn't there a way to grab all possible space
> at the same time?
> By all possible space I mean (sbinfo->block_count - sbinfo->blocks_used),
> so that `fstrim <mountpoint>` will be efficient even if system is under load
> and atoms are being created continuously.


I am afraid that other processes will return ENOSPC, whereas there is
a lot of free disk space.
Assume fstrim grabbed all possible space. A process X , who needs to
reserve space invokes txnmgr_force_commit_all(). Everyone waits for
commits completion. After this fstrim grabs all possible space again
(there is no any queue for free space reclaimers). Process X returns
ENOSPC.

Edward.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-10 19:48         ` Edward Shishkin
@ 2014-08-10 20:37           ` Ivan Shapovalov
  2014-08-10 23:29             ` Edward Shishkin
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-10 20:37 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 1834 bytes --]

On Sunday 10 August 2014 at 21:48:05, Edward Shishkin wrote:	
> 
> On 08/10/2014 08:52 PM, Ivan Shapovalov wrote:
> > On Friday 01 August 2014 at 01:19:05, Edward Shishkin wrote:	
> >> [...]
> >> Define maximal number of allocated blocks in one iteration
> >> and reserve this amount of blocks at the beginning of each
> >> iteration. Once the limit is exhausted, stop the scan and
> >> force to commit the atom.
> > This sounds pretty hackish... Isn't there a way to grab all possible space
> > at the same time?
> > By all possible space I mean (sbinfo->block_count - sbinfo->blocks_used),
> > so that `fstrim <mountpoint>` will be efficient even if system is under load
> > and atoms are being created continuously.
> 
> 
> I am afraid that other processes will return ENOSPC, whereas there is
> a lot of free disk space.
> Assume fstrim grabbed all possible space. A process X , who needs to
> reserve space invokes txnmgr_force_commit_all(). Everyone waits for
> commits completion. After this fstrim grabs all possible space again
> (there is no any queue for free space reclaimers). Process X returns
> ENOSPC.
> 
> Edward.

I've meant "grabbing all space and then allocating all space" -- so there won't
be multiple grabs or multiple atoms.

Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
atom to commit. (Actually, there is a small race window between grabbing space
and creating an atom...)

The only problem is to wait for (sbinfo->block_count == sbinfo->blocks_used +
sbinfo->blocks_free) condition, i. e. until no blocks are reserved in any form,
and then to grab all space atomically wrt. reaching this condition.

Again, if this is not feasible, I'll go with the multiple atoms approach. I
just want to make sure.

-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-10 20:37           ` Ivan Shapovalov
@ 2014-08-10 23:29             ` Edward Shishkin
  2014-08-11  9:39               ` Ivan Shapovalov
  0 siblings, 1 reply; 20+ messages in thread
From: Edward Shishkin @ 2014-08-10 23:29 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 08/10/2014 10:37 PM, Ivan Shapovalov wrote:
> On Sunday 10 August 2014 at 21:48:05, Edward Shishkin wrote:	
>> On 08/10/2014 08:52 PM, Ivan Shapovalov wrote:
>>> On Friday 01 August 2014 at 01:19:05, Edward Shishkin wrote:	
>>>> [...]
>>>> Define maximal number of allocated blocks in one iteration
>>>> and reserve this amount of blocks at the beginning of each
>>>> iteration. Once the limit is exhausted, stop the scan and
>>>> force to commit the atom.
>>> This sounds pretty hackish... Isn't there a way to grab all possible space
>>> at the same time?
>>> By all possible space I mean (sbinfo->block_count - sbinfo->blocks_used),
>>> so that `fstrim <mountpoint>` will be efficient even if system is under load
>>> and atoms are being created continuously.
>>
>> I am afraid that other processes will return ENOSPC, whereas there is
>> a lot of free disk space.
>> Assume fstrim grabbed all possible space. A process X , who needs to
>> reserve space invokes txnmgr_force_commit_all(). Everyone waits for
>> commits completion. After this fstrim grabs all possible space again
>> (there is no any queue for free space reclaimers). Process X returns
>> ENOSPC.
>>
>> Edward.
> I've meant "grabbing all space and then allocating all space" -- so there won't
> be multiple grabs or multiple atoms.
>
> Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
> atom to commit.


It seems such waiting will screw up the system. No?


>   (Actually, there is a small race window between grabbing space
> and creating an atom...)


Which one?


>
> The only problem is to wait for (sbinfo->block_count == sbinfo->blocks_used +
> sbinfo->blocks_free) condition, i. e. until no blocks are reserved in any form,
> and then to grab all space atomically wrt. reaching this condition.
>
> Again, if this is not feasible, I'll go with the multiple atoms approach. I
> just want to make sure.
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-10 23:29             ` Edward Shishkin
@ 2014-08-11  9:39               ` Ivan Shapovalov
  2014-08-16  0:44                 ` Ivan Shapovalov
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-11  9:39 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 2502 bytes --]

On Monday 11 August 2014 at 01:29:59, Edward Shishkin wrote:	
> 
> On 08/10/2014 10:37 PM, Ivan Shapovalov wrote:
> > On Sunday 10 August 2014 at 21:48:05, Edward Shishkin wrote:	
> >> On 08/10/2014 08:52 PM, Ivan Shapovalov wrote:
> >>> On Friday 01 August 2014 at 01:19:05, Edward Shishkin wrote:	
> >>>> [...]
> >>>> Define maximal number of allocated blocks in one iteration
> >>>> and reserve this amount of blocks at the beginning of each
> >>>> iteration. Once the limit is exhausted, stop the scan and
> >>>> force to commit the atom.
> >>> This sounds pretty hackish... Isn't there a way to grab all possible space
> >>> at the same time?
> >>> By all possible space I mean (sbinfo->block_count - sbinfo->blocks_used),
> >>> so that `fstrim <mountpoint>` will be efficient even if system is under load
> >>> and atoms are being created continuously.
> >>
> >> I am afraid that other processes will return ENOSPC, whereas there is
> >> a lot of free disk space.
> >> Assume fstrim grabbed all possible space. A process X , who needs to
> >> reserve space invokes txnmgr_force_commit_all(). Everyone waits for
> >> commits completion. After this fstrim grabs all possible space again
> >> (there is no any queue for free space reclaimers). Process X returns
> >> ENOSPC.
> >>
> >> Edward.
> > I've meant "grabbing all space and then allocating all space" -- so there won't
> > be multiple grabs or multiple atoms.
> >
> > Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
> > atom to commit.
> 
> 
> It seems such waiting will screw up the system. No?

I was afraid of such situations, but how would that happen? The discard atom's
commit will always be able to proceed as it doesn't grab space at all.

> >   (Actually, there is a small race window between grabbing space
> > and creating an atom...)
> 
> 
> Which one?

BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
process X happens to grab space between us grabbing space and creating an atom,
it will get -ENOSPC even with BA_CAN_COMMIT.

-- 
Ivan Shapovalov / intelfx /

> > The only problem is to wait for (sbinfo->block_count == sbinfo->blocks_used +
> > sbinfo->blocks_free) condition, i. e. until no blocks are reserved in any form,
> > and then to grab all space atomically wrt. reaching this condition.
> >
> > Again, if this is not feasible, I'll go with the multiple atoms approach. I
> > just want to make sure.
> >
> 

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-11  9:39               ` Ivan Shapovalov
@ 2014-08-16  0:44                 ` Ivan Shapovalov
  2014-08-16  8:09                   ` Edward Shishkin
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-16  0:44 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: Edward Shishkin

[-- Attachment #1: Type: text/plain, Size: 3466 bytes --]

On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote:	
> [...]
> > > I've meant "grabbing all space and then allocating all space" -- so there won't
> > > be multiple grabs or multiple atoms.
> > >
> > > Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
> > > atom to commit.
> > 
> > 
> > It seems such waiting will screw up the system. No?
> 
> I was afraid of such situations, but how would that happen? The discard atom's
> commit will always be able to proceed as it doesn't grab space at all.
> 
> > >   (Actually, there is a small race window between grabbing space
> > > and creating an atom...)
> > 
> > 
> > Which one?
> 
> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
> process X happens to grab space between us grabbing space and creating an atom,
> it will get -ENOSPC even with BA_CAN_COMMIT.
> 
> > > The only problem is to wait for (sbinfo->block_count == sbinfo->blocks_used +
> > > sbinfo->blocks_free) condition, i. e. until no blocks are reserved in any form,
> > > and then to grab all space atomically wrt. reaching this condition.
> > >
> > > Again, if this is not feasible, I'll go with the multiple atoms approach. I
> > > just want to make sure.
> > >
> > 

...so, I've almost given up implementing this :)

In kernel there is a read-write semaphore implementation called rwsem.
I've added a per-superblock instance of rwsem with following semantics:

- when count of grabbed+special (not free or used) blocks is increased by any
  means, the semaphore is taken for reading before taking spinlock and
  modifying counters

- if the counters already were non-zero, the semaphore has been already taken
  for reading (reader count > 1) and it is released once while under spinlock
  (so that reader count always stays at 1)

- when count of grabbed+special blocks is decreased and drops to zero, the
  semaphore is released once (so reader count drops to 0 unless there is a race
  with increasing the count)

- on second try of BA_CAN_COMMIT grabbing (if there was not enough space),
  the semaphore is taken for writing instead of for reading, ensuring
  that every block is either permanently used or free. The write lock is
  converted to read lock after grabbing required space.

This "almost" works. The main problem is that Linux rwsem implementation
is write-biased: that is, if there are writers waiting, readers count can't
increase. That is, a process must not take a semaphore for reading in second
time if it is responsible for releasing the "first time" reader.

The comment in original rwsem implementation by Andrew Morton states following:
"It is NOT legal for one task to down_read() an rwsem multiple times."

  reader1     writer1
------------------------
down_read()
            down_write()
            up_write()
down_read()
up_read()
up_read()

This is a deadlock: reader1's down_read() blocks on writer1's up_write(),
while writer1's down_write() blocks on reader1's second up_read().

A force grab (or a grab preceded by grab_space_enable(), or a used2something)
deadlocks 100% in presence of waiting writers, and so does the corresponding
transaction commit.

So I need to find a way to take rwsem in a read-biased mode... Any advice is
accepted, including "give up with adding of yet another lock and go with
multiple transactions" :)

-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16  0:44                 ` Ivan Shapovalov
@ 2014-08-16  8:09                   ` Edward Shishkin
  2014-08-16  8:23                     ` Edward Shishkin
  2014-08-16 11:17                     ` Ivan Shapovalov
  0 siblings, 2 replies; 20+ messages in thread
From: Edward Shishkin @ 2014-08-16  8:09 UTC (permalink / raw)
  To: Ivan Shapovalov, reiserfs-devel


On 08/16/2014 02:44 AM, Ivan Shapovalov wrote:
> On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote:	
>> [...]
>>>> I've meant "grabbing all space and then allocating all space" -- so there won't
>>>> be multiple grabs or multiple atoms.
>>>>
>>>> Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
>>>> atom to commit.
>>>
>>> It seems such waiting will screw up the system. No?
>> I was afraid of such situations, but how would that happen? The discard atom's
>> commit will always be able to proceed as it doesn't grab space at all.
>>
>>>>    (Actually, there is a small race window between grabbing space
>>>> and creating an atom...)
>>>
>>> Which one?
>> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
>> process X happens to grab space between us grabbing space and creating an atom,
>> it will get -ENOSPC even with BA_CAN_COMMIT.


I still don't see any "races" here. How atom creation is related to grabbing
space? Are we talking about races in the existing code? f so, please show
the racing paths..


>>
>>>> The only problem is to wait for (sbinfo->block_count == sbinfo->blocks_used +
>>>> sbinfo->blocks_free) condition, i. e. until no blocks are reserved in any form,
>>>> and then to grab all space atomically wrt. reaching this condition.
>>>>
>>>> Again, if this is not feasible, I'll go with the multiple atoms approach. I
>>>> just want to make sure.
>>>>
> ...so, I've almost given up implementing this :)


great!


>
> In kernel there is a read-write semaphore implementation called rwsem.
> I've added a per-superblock instance of rwsem with following semantics:
>
> - when count of grabbed+special (not free or used) blocks is increased by any
>    means, the semaphore is taken for reading before taking spinlock and
>    modifying counters
>
> - if the counters already were non-zero, the semaphore has been already taken
>    for reading (reader count > 1) and it is released once while under spinlock
>    (so that reader count always stays at 1)
>
> - when count of grabbed+special blocks is decreased and drops to zero, the
>    semaphore is released once (so reader count drops to 0 unless there is a race
>    with increasing the count)
>
> - on second try of BA_CAN_COMMIT grabbing (if there was not enough space),
>    the semaphore is taken for writing instead of for reading, ensuring
>    that every block is either permanently used or free. The write lock is
>    converted to read lock after grabbing required space.
>
> This "almost" works. The main problem is that Linux rwsem implementation
> is write-biased: that is, if there are writers waiting, readers count can't
> increase. That is, a process must not take a semaphore for reading in second
> time if it is responsible for releasing the "first time" reader.
>
> The comment in original rwsem implementation by Andrew Morton states following:
> "It is NOT legal for one task to down_read() an rwsem multiple times."
>
>    reader1     writer1
> ------------------------
> down_read()
>              down_write()
>              up_write()
> down_read()
> up_read()
> up_read()
>
> This is a deadlock: reader1's down_read() blocks on writer1's up_write(),
> while writer1's down_write() blocks on reader1's second up_read().
>
> A force grab (or a grab preceded by grab_space_enable(), or a used2something)
> deadlocks 100% in presence of waiting writers, and so does the corresponding
> transaction commit.
>
> So I need to find a way to take rwsem in a read-biased mode... Any advice is
> accepted, including "give up with adding of yet another lock and go with
> multiple transactions" :)


IMHO this is too complicated.

Why don't you want to grab, say, 20M per iteration?
It should work without any problems, just maintain a
counter of blocks allocated in the iteration..

Thanks,
Edward.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16  8:09                   ` Edward Shishkin
@ 2014-08-16  8:23                     ` Edward Shishkin
  2014-08-16 11:27                       ` Ivan Shapovalov
  2014-08-16 11:17                     ` Ivan Shapovalov
  1 sibling, 1 reply; 20+ messages in thread
From: Edward Shishkin @ 2014-08-16  8:23 UTC (permalink / raw)
  To: Ivan Shapovalov, reiserfs-devel


On 08/16/2014 10:09 AM, Edward Shishkin wrote:
>
> On 08/16/2014 02:44 AM, Ivan Shapovalov wrote:
>> On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote:
>>> [...]
>>>>> I've meant "grabbing all space and then allocating all space" -- 
>>>>> so there won't
>>>>> be multiple grabs or multiple atoms.
>>>>>
>>>>> Then all processes grabbing space with BA_CAN_COMMIT will wait for 
>>>>> the discard
>>>>> atom to commit.
>>>>
>>>> It seems such waiting will screw up the system. No?
>>> I was afraid of such situations, but how would that happen? The 
>>> discard atom's
>>> commit will always be able to proceed as it doesn't grab space at all.
>>>
>>>>>    (Actually, there is a small race window between grabbing space
>>>>> and creating an atom...)
>>>>
>>>> Which one?
>>> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
>>> process X happens to grab space between us grabbing space and 
>>> creating an atom,
>>> it will get -ENOSPC even with BA_CAN_COMMIT.
>
>
> I still don't see any "races" here. How atom creation is related to 
> grabbing
> space? Are we talking about races in the existing code? f so, please show
> the racing paths..
>
>
>>>
>>>>> The only problem is to wait for (sbinfo->block_count == 
>>>>> sbinfo->blocks_used +
>>>>> sbinfo->blocks_free) condition, i. e. until no blocks are reserved 
>>>>> in any form,
>>>>> and then to grab all space atomically wrt. reaching this condition.
>>>>>
>>>>> Again, if this is not feasible, I'll go with the multiple atoms 
>>>>> approach. I
>>>>> just want to make sure.
>>>>>
>> ...so, I've almost given up implementing this :)
>
>
> great!
>
>
>>
>> In kernel there is a read-write semaphore implementation called rwsem.
>> I've added a per-superblock instance of rwsem with following semantics:
>>
>> - when count of grabbed+special (not free or used) blocks is 
>> increased by any
>>    means, the semaphore is taken for reading before taking spinlock and
>>    modifying counters
>>
>> - if the counters already were non-zero, the semaphore has been 
>> already taken
>>    for reading (reader count > 1) and it is released once while under 
>> spinlock
>>    (so that reader count always stays at 1)
>>
>> - when count of grabbed+special blocks is decreased and drops to 
>> zero, the
>>    semaphore is released once (so reader count drops to 0 unless 
>> there is a race
>>    with increasing the count)
>>
>> - on second try of BA_CAN_COMMIT grabbing (if there was not enough 
>> space),
>>    the semaphore is taken for writing instead of for reading, ensuring
>>    that every block is either permanently used or free. The write 
>> lock is
>>    converted to read lock after grabbing required space.
>>
>> This "almost" works. The main problem is that Linux rwsem implementation
>> is write-biased: that is, if there are writers waiting, readers count 
>> can't
>> increase. That is, a process must not take a semaphore for reading in 
>> second
>> time if it is responsible for releasing the "first time" reader.
>>
>> The comment in original rwsem implementation by Andrew Morton states 
>> following:
>> "It is NOT legal for one task to down_read() an rwsem multiple times."
>>
>>    reader1     writer1
>> ------------------------
>> down_read()
>>              down_write()
>>              up_write()
>> down_read()
>> up_read()
>> up_read()
>>
>> This is a deadlock: reader1's down_read() blocks on writer1's 
>> up_write(),
>> while writer1's down_write() blocks on reader1's second up_read().
>>
>> A force grab (or a grab preceded by grab_space_enable(), or a 
>> used2something)
>> deadlocks 100% in presence of waiting writers, and so does the 
>> corresponding
>> transaction commit.
>>
>> So I need to find a way to take rwsem in a read-biased mode... Any 
>> advice is
>> accepted, including "give up with adding of yet another lock and go with
>> multiple transactions" :)
>
>
> IMHO this is too complicated.
>
> Why don't you want to grab, say, 20M per iteration?
> It should work without any problems, just maintain a
> counter of blocks allocated in the iteration..


add the counter to the struct reiser4_context and
set it to zero at the beginning of every iteration.
use get_current_context() to access the counter.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16  8:09                   ` Edward Shishkin
  2014-08-16  8:23                     ` Edward Shishkin
@ 2014-08-16 11:17                     ` Ivan Shapovalov
  2014-08-16 12:15                       ` Edward Shishkin
  1 sibling, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-16 11:17 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 1711 bytes --]

On Saturday 16 August 2014 at 10:09:44, Edward Shishkin wrote:	
> 
> On 08/16/2014 02:44 AM, Ivan Shapovalov wrote:
> > On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote:	
> >> [...]
> >>>> I've meant "grabbing all space and then allocating all space" -- so there won't
> >>>> be multiple grabs or multiple atoms.
> >>>>
> >>>> Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
> >>>> atom to commit.
> >>>
> >>> It seems such waiting will screw up the system. No?
> >> I was afraid of such situations, but how would that happen? The discard atom's
> >> commit will always be able to proceed as it doesn't grab space at all.
> >>
> >>>>    (Actually, there is a small race window between grabbing space
> >>>> and creating an atom...)
> >>>
> >>> Which one?
> >> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
> >> process X happens to grab space between us grabbing space and creating an atom,
> >> it will get -ENOSPC even with BA_CAN_COMMIT.
> 
> 
> I still don't see any "races" here. How atom creation is related to grabbing
> space? Are we talking about races in the existing code? f so, please show
> the racing paths..

Well, this is not a race per se - it does not involve locking. But it is
a race-like behavior.

taskA                                 taskB
--------------------------------------------------------------------------
grab very much space
                                        grab some space with BA_CAN_COMMIT
create an atom using the grabbed space

In this case, the taskB's grab will fail though it could wait for taskA's
not yet created atom.

-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16  8:23                     ` Edward Shishkin
@ 2014-08-16 11:27                       ` Ivan Shapovalov
  2014-08-16 13:35                         ` Edward Shishkin
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-16 11:27 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]

On Saturday 16 August 2014 at 10:23:08, Edward Shishkin wrote:	
> [...]
> >>
> >> So I need to find a way to take rwsem in a read-biased mode... Any 
> >> advice is
> >> accepted, including "give up with adding of yet another lock and go with
> >> multiple transactions" :)
> >
> >
> > IMHO this is too complicated.
> >
> > Why don't you want to grab, say, 20M per iteration?
> > It should work without any problems, just maintain a
> > counter of blocks allocated in the iteration..

I see it mostly like a "good to have" thing. When the system is under load,
FITRIM implemented in this "simple" way will be ineffective: it will miss
some blocks that are temporarily allocated (journal, etc.) even though it is
theoretically possible to stop accepting new transactions, commit all existing
ones and perform discard while the system is artificially made idle, then
resume everything.

So I've been trying to check if this is feasible. Looks like it isn't.

> add the counter to the struct reiser4_context and
> set it to zero at the beginning of every iteration.
> use get_current_context() to access the counter.

Is it needed? The simple way is just a loop consisting of
- grab space
- allocate blocks forward from start position (initially 0) until end of
  partition or limit reached
- update start position to `last allocated block + 1`
- put allocated space to the delete set
- mark for force-committing
- reiser4_txn_restart_current()
- if we've reached end of partition earlier, break, otherwise loop

-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16 11:17                     ` Ivan Shapovalov
@ 2014-08-16 12:15                       ` Edward Shishkin
  2014-08-16 17:02                         ` Ivan Shapovalov
  0 siblings, 1 reply; 20+ messages in thread
From: Edward Shishkin @ 2014-08-16 12:15 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 08/16/2014 01:17 PM, Ivan Shapovalov wrote:
> On Saturday 16 August 2014 at 10:09:44, Edward Shishkin wrote:	
>> On 08/16/2014 02:44 AM, Ivan Shapovalov wrote:
>>> On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote:	
>>>> [...]
>>>>>> I've meant "grabbing all space and then allocating all space" -- so there won't
>>>>>> be multiple grabs or multiple atoms.
>>>>>>
>>>>>> Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
>>>>>> atom to commit.
>>>>> It seems such waiting will screw up the system. No?
>>>> I was afraid of such situations, but how would that happen? The discard atom's
>>>> commit will always be able to proceed as it doesn't grab space at all.
>>>>
>>>>>>     (Actually, there is a small race window between grabbing space
>>>>>> and creating an atom...)
>>>>> Which one?
>>>> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
>>>> process X happens to grab space between us grabbing space and creating an atom,
>>>> it will get -ENOSPC even with BA_CAN_COMMIT.
>>
>> I still don't see any "races" here. How atom creation is related to grabbing
>> space? Are we talking about races in the existing code? f so, please show
>> the racing paths..
> Well, this is not a race per se - it does not involve locking. But it is
> a race-like behavior.
>
> taskA                                 taskB
> --------------------------------------------------------------------------
> grab very much space

Ok, assume A wants X blocks.

>                                          grab some space with BA_CAN_COMMIT

Assume B wants Y blocks.

> create an atom using the grabbed space


Please, specify which code is executing at this point.

Anyway, we don't need any reservation to _create_ an atom.
Reservation is expended when allocating blocks on the low level
(bitmaps). Reservation (grabbing space) is needed to avoid hard
ENOSPC (=no free bits in bitmaps) in situation, when we can not
fail (e.g. flush, commit, etc..,)


>
> In this case, the taskB's grab will fail though it could wait for taskA's
> not yet created atom.


I still don't see why somebody should fail if X+Y < free-space-on-disk.
If X+Y > free-space, then yes, someone will fail, and it is correct.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16 11:27                       ` Ivan Shapovalov
@ 2014-08-16 13:35                         ` Edward Shishkin
  2014-08-16 17:05                           ` Ivan Shapovalov
  0 siblings, 1 reply; 20+ messages in thread
From: Edward Shishkin @ 2014-08-16 13:35 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 08/16/2014 01:27 PM, Ivan Shapovalov wrote:
> On Saturday 16 August 2014 at 10:23:08, Edward Shishkin wrote:	
>> [...]
>>>> So I need to find a way to take rwsem in a read-biased mode... Any
>>>> advice is
>>>> accepted, including "give up with adding of yet another lock and go with
>>>> multiple transactions" :)
>>>
>>> IMHO this is too complicated.
>>>
>>> Why don't you want to grab, say, 20M per iteration?
>>> It should work without any problems, just maintain a
>>> counter of blocks allocated in the iteration..
> I see it mostly like a "good to have" thing.


Nop, we discuss the reasonable implementation of the "must have" -
space reservation by the fitrim process..


>   When the system is under load,
> FITRIM implemented in this "simple" way will be ineffective: it will miss
> some blocks that are temporarily allocated (journal, etc.)


It will be perfectly discarded by the realtime discard


>   even though it is
> theoretically possible to stop accepting new transactions, commit all existing
> ones and perform discard while the system is artificially made idle, then
> resume everything.


IMHO complicated and not need.
Anyway, missing 2 blocks per device is not critical.


>
> So I've been trying to check if this is feasible. Looks like it isn't.
>
>> add the counter to the struct reiser4_context and
>> set it to zero at the beginning of every iteration.
>> use get_current_context() to access the counter.
> Is it needed? The simple way is just a loop consisting of
> - grab space
> - allocate blocks forward from start position (initially 0) until end of
>    partition or limit reached
> - update start position to `last allocated block + 1`
> - put allocated space to the delete set
> - mark for force-committing
> - reiser4_txn_restart_current()
> - if we've reached end of partition earlier, break, otherwise loop
>


Ok, it will also work (with more iterations, though),
well, let's do so..

Thanks,
Edward.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16 12:15                       ` Edward Shishkin
@ 2014-08-16 17:02                         ` Ivan Shapovalov
  2014-08-16 19:54                           ` Edward Shishkin
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-16 17:02 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 3737 bytes --]

On Saturday 16 August 2014 at 14:15:29, Edward Shishkin wrote:	
> 
> On 08/16/2014 01:17 PM, Ivan Shapovalov wrote:
> > On Saturday 16 August 2014 at 10:09:44, Edward Shishkin wrote:	
> >> On 08/16/2014 02:44 AM, Ivan Shapovalov wrote:
> >>> On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote:	
> >>>> [...]
> >>>>>> I've meant "grabbing all space and then allocating all space" -- so there won't
> >>>>>> be multiple grabs or multiple atoms.
> >>>>>>
> >>>>>> Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
> >>>>>> atom to commit.
> >>>>> It seems such waiting will screw up the system. No?
> >>>> I was afraid of such situations, but how would that happen? The discard atom's
> >>>> commit will always be able to proceed as it doesn't grab space at all.
> >>>>
> >>>>>>     (Actually, there is a small race window between grabbing space
> >>>>>> and creating an atom...)
> >>>>> Which one?
> >>>> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
> >>>> process X happens to grab space between us grabbing space and creating an atom,
> >>>> it will get -ENOSPC even with BA_CAN_COMMIT.
> >>
> >> I still don't see any "races" here. How atom creation is related to grabbing
> >> space? Are we talking about races in the existing code? f so, please show
> >> the racing paths..
> > Well, this is not a race per se - it does not involve locking. But it is
> > a race-like behavior.
> >
> > taskA                                 taskB
> > --------------------------------------------------------------------------
> > grab very much space
> 
> Ok, assume A wants X blocks.
> 
> >                                          grab some space with BA_CAN_COMMIT
> 
> Assume B wants Y blocks.
> 
> > create an atom using the grabbed space
> 
> 
> Please, specify which code is executing at this point.
> 
> Anyway, we don't need any reservation to _create_ an atom.
> Reservation is expended when allocating blocks on the low level
> (bitmaps). Reservation (grabbing space) is needed to avoid hard
> ENOSPC (=no free bits in bitmaps) in situation, when we can not
> fail (e.g. flush, commit, etc..,)

Let's take reiser4_sync_file_common().

The grabbing is
	reserve = estimate_update_common(dentry->d_inode);
	if (reiser4_grab_space(reserve, BA_CAN_COMMIT)) {

The creation of atom is (somewhere deep in the call stack) at
	write_sd_by_inode_common(dentry->d_inode);

Clearly, syncing file won't increase the real space occupied by data on disk.
However, because there is WA + journaling, such transaction still needs some
space to complete. This is "X blocks".

Suppose there is a second sync scheduled between grabbing and creation of atom
of the first sync. In the same vein it needs Y blocks, and Y is such that
Y < free-space < X+Y.

In this case, the second sync will fail despite BA_CAN_COMMIT flag given to
reiser4_grab_space(): at time of its execution, the first sync did not yet
create its atom, so there is nothing to commit to reclaim those X blocks.

However, if the second sync gets ordered after write_sd_by_inode_common() of
the first sync, BA_CAN_COMMIT machinery will eventually execute
txnmgr_force_commit_all() which will wait for the first sync to complete and
reclaim those X blocks.

So, the second transaction's result depends on scheduling. It is a race-like
behavior.

-- 
Ivan Shapovalov / intelfx /

> 
> 
> >
> > In this case, the taskB's grab will fail though it could wait for taskA's
> > not yet created atom.
> 
> 
> I still don't see why somebody should fail if X+Y < free-space-on-disk.
> If X+Y > free-space, then yes, someone will fail, and it is correct.

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16 13:35                         ` Edward Shishkin
@ 2014-08-16 17:05                           ` Ivan Shapovalov
  2014-08-16 20:13                             ` Edward Shishkin
  0 siblings, 1 reply; 20+ messages in thread
From: Ivan Shapovalov @ 2014-08-16 17:05 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 2410 bytes --]

On Saturday 16 August 2014 at 15:35:42, Edward Shishkin wrote:	
> 
> On 08/16/2014 01:27 PM, Ivan Shapovalov wrote:
> > On Saturday 16 August 2014 at 10:23:08, Edward Shishkin wrote:	
> >> [...]
> >>>> So I need to find a way to take rwsem in a read-biased mode... Any
> >>>> advice is
> >>>> accepted, including "give up with adding of yet another lock and go with
> >>>> multiple transactions" :)
> >>>
> >>> IMHO this is too complicated.
> >>>
> >>> Why don't you want to grab, say, 20M per iteration?
> >>> It should work without any problems, just maintain a
> >>> counter of blocks allocated in the iteration..
> > I see it mostly like a "good to have" thing.
> 
> 
> Nop, we discuss the reasonable implementation of the "must have" -
> space reservation by the fitrim process..
> 
> 
> >   When the system is under load,
> > FITRIM implemented in this "simple" way will be ineffective: it will miss
> > some blocks that are temporarily allocated (journal, etc.)
> 
> 
> It will be perfectly discarded by the realtime discard
> 
> 
> >   even though it is
> > theoretically possible to stop accepting new transactions, commit all existing
> > ones and perform discard while the system is artificially made idle, then
> > resume everything.
> 
> 
> IMHO complicated and not need.
> Anyway, missing 2 blocks per device is not critical.

OK. Giving up then. Let's go the multiple transaction approach.

> 
> 
> >
> > So I've been trying to check if this is feasible. Looks like it isn't.
> >
> >> add the counter to the struct reiser4_context and
> >> set it to zero at the beginning of every iteration.
> >> use get_current_context() to access the counter.
> > Is it needed? The simple way is just a loop consisting of
> > - grab space
> > - allocate blocks forward from start position (initially 0) until end of
> >    partition or limit reached
> > - update start position to `last allocated block + 1`
> > - put allocated space to the delete set
> > - mark for force-committing
> > - reiser4_txn_restart_current()
> > - if we've reached end of partition earlier, break, otherwise loop
> >
> 
> 
> Ok, it will also work (with more iterations, though),
> well, let's do so..

More iterations? It seems like I don't really understand your idea
with the counter.
Could you please elaborate a bit?

Thanks,
-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16 17:02                         ` Ivan Shapovalov
@ 2014-08-16 19:54                           ` Edward Shishkin
  0 siblings, 0 replies; 20+ messages in thread
From: Edward Shishkin @ 2014-08-16 19:54 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 08/16/2014 07:02 PM, Ivan Shapovalov wrote:
> On Saturday 16 August 2014 at 14:15:29, Edward Shishkin wrote:	
>> On 08/16/2014 01:17 PM, Ivan Shapovalov wrote:
>>> On Saturday 16 August 2014 at 10:09:44, Edward Shishkin wrote:	
>>>> On 08/16/2014 02:44 AM, Ivan Shapovalov wrote:
>>>>> On Monday 11 August 2014 at 13:39:12, Ivan Shapovalov wrote:	
>>>>>> [...]
>>>>>>>> I've meant "grabbing all space and then allocating all space" -- so there won't
>>>>>>>> be multiple grabs or multiple atoms.
>>>>>>>>
>>>>>>>> Then all processes grabbing space with BA_CAN_COMMIT will wait for the discard
>>>>>>>> atom to commit.
>>>>>>> It seems such waiting will screw up the system. No?
>>>>>> I was afraid of such situations, but how would that happen? The discard atom's
>>>>>> commit will always be able to proceed as it doesn't grab space at all.
>>>>>>
>>>>>>>>      (Actually, there is a small race window between grabbing space
>>>>>>>> and creating an atom...)
>>>>>>> Which one?
>>>>>> BA_CAN_COMMIT machinery does wait only for atoms, not for contexts. If
>>>>>> process X happens to grab space between us grabbing space and creating an atom,
>>>>>> it will get -ENOSPC even with BA_CAN_COMMIT.
>>>> I still don't see any "races" here. How atom creation is related to grabbing
>>>> space? Are we talking about races in the existing code? f so, please show
>>>> the racing paths..
>>> Well, this is not a race per se - it does not involve locking. But it is
>>> a race-like behavior.
>>>
>>> taskA                                 taskB
>>> --------------------------------------------------------------------------
>>> grab very much space
>> Ok, assume A wants X blocks.
>>
>>>                                           grab some space with BA_CAN_COMMIT
>> Assume B wants Y blocks.
>>
>>> create an atom using the grabbed space
>>
>> Please, specify which code is executing at this point.
>>
>> Anyway, we don't need any reservation to _create_ an atom.
>> Reservation is expended when allocating blocks on the low level
>> (bitmaps). Reservation (grabbing space) is needed to avoid hard
>> ENOSPC (=no free bits in bitmaps) in situation, when we can not
>> fail (e.g. flush, commit, etc..,)
> Let's take reiser4_sync_file_common().
>
> The grabbing is
> 	reserve = estimate_update_common(dentry->d_inode);
> 	if (reiser4_grab_space(reserve, BA_CAN_COMMIT)) {
>
> The creation of atom is (somewhere deep in the call stack) at
> 	write_sd_by_inode_common(dentry->d_inode);
>
> Clearly, syncing file won't increase the real space occupied by data on disk.
> However, because there is WA + journaling, such transaction still needs some
> space to complete. This is "X blocks".
>
> Suppose there is a second sync scheduled between grabbing and creation of atom
> of the first sync. In the same vein it needs Y blocks, and Y is such that
> Y < free-space < X+Y.
>
> In this case, the second sync will fail despite BA_CAN_COMMIT flag given to
> reiser4_grab_space(): at time of its execution, the first sync did not yet
> create its atom, so there is nothing to commit to reclaim those X blocks.
>
> However, if the second sync gets ordered after write_sd_by_inode_common() of
> the first sync, BA_CAN_COMMIT machinery will eventually execute
> txnmgr_force_commit_all() which will wait for the first sync to complete and
> reclaim those X blocks.
>
> So, the second transaction's result depends on scheduling. It is a race-like
> behavior.


It's OK.
The second process fails in the situation of disk space pressure
(free-space < X+Y ). We don't rely on success here.

I was suspicious because of the problem of "phantom" ENOSPC,
which appears once in a while: a small write returns ENOSPC,
whereas there is a lot of free space on disk.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: reiser4: FITRIM ioctl -- how to grab the space?
  2014-08-16 17:05                           ` Ivan Shapovalov
@ 2014-08-16 20:13                             ` Edward Shishkin
  0 siblings, 0 replies; 20+ messages in thread
From: Edward Shishkin @ 2014-08-16 20:13 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 08/16/2014 07:05 PM, Ivan Shapovalov wrote:
> On Saturday 16 August 2014 at 15:35:42, Edward Shishkin wrote:	
>> On 08/16/2014 01:27 PM, Ivan Shapovalov wrote:
>>> On Saturday 16 August 2014 at 10:23:08, Edward Shishkin wrote:	
>>>> [...]
>>>>>> So I need to find a way to take rwsem in a read-biased mode... Any
>>>>>> advice is
>>>>>> accepted, including "give up with adding of yet another lock and go with
>>>>>> multiple transactions" :)
>>>>> IMHO this is too complicated.
>>>>>
>>>>> Why don't you want to grab, say, 20M per iteration?
>>>>> It should work without any problems, just maintain a
>>>>> counter of blocks allocated in the iteration..
>>> I see it mostly like a "good to have" thing.
>>
>> Nop, we discuss the reasonable implementation of the "must have" -
>> space reservation by the fitrim process..
>>
>>
>>>    When the system is under load,
>>> FITRIM implemented in this "simple" way will be ineffective: it will miss
>>> some blocks that are temporarily allocated (journal, etc.)
>>
>> It will be perfectly discarded by the realtime discard
>>
>>
>>>    even though it is
>>> theoretically possible to stop accepting new transactions, commit all existing
>>> ones and perform discard while the system is artificially made idle, then
>>> resume everything.
>>
>> IMHO complicated and not need.
>> Anyway, missing 2 blocks per device is not critical.
> OK. Giving up then. Let's go the multiple transaction approach.
>
>>
>>> So I've been trying to check if this is feasible. Looks like it isn't.
>>>
>>>> add the counter to the struct reiser4_context and
>>>> set it to zero at the beginning of every iteration.
>>>> use get_current_context() to access the counter.
>>> Is it needed? The simple way is just a loop consisting of
>>> - grab space
>>> - allocate blocks forward from start position (initially 0) until end of
>>>     partition or limit reached
>>> - update start position to `last allocated block + 1`
>>> - put allocated space to the delete set
>>> - mark for force-committing
>>> - reiser4_txn_restart_current()
>>> - if we've reached end of partition earlier, break, otherwise loop
>>>
>>
>> Ok, it will also work (with more iterations, though),
>> well, let's do so..
> More iterations? It seems like I don't really understand your idea
> with the counter.
> Could you please elaborate a bit?


Everything is OK with the number of iterations.
I misunderstood your design in the first reading..

Thanks,
Edward.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-08-16 20:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-31 20:47 reiser4: FITRIM ioctl -- how to grab the space? Ivan Shapovalov
2014-07-31 22:03 ` Edward Shishkin
2014-07-31 22:16   ` Ivan Shapovalov
     [not found]     ` <53DACEE9.8000802@gmail.com>
2014-08-10 18:52       ` Ivan Shapovalov
2014-08-10 19:48         ` Edward Shishkin
2014-08-10 20:37           ` Ivan Shapovalov
2014-08-10 23:29             ` Edward Shishkin
2014-08-11  9:39               ` Ivan Shapovalov
2014-08-16  0:44                 ` Ivan Shapovalov
2014-08-16  8:09                   ` Edward Shishkin
2014-08-16  8:23                     ` Edward Shishkin
2014-08-16 11:27                       ` Ivan Shapovalov
2014-08-16 13:35                         ` Edward Shishkin
2014-08-16 17:05                           ` Ivan Shapovalov
2014-08-16 20:13                             ` Edward Shishkin
2014-08-16 11:17                     ` Ivan Shapovalov
2014-08-16 12:15                       ` Edward Shishkin
2014-08-16 17:02                         ` Ivan Shapovalov
2014-08-16 19:54                           ` Edward Shishkin
2014-08-02 16:40   ` Edward Shishkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).