* delalloc and reservation.
@ 2007-10-29 10:00 Aneesh Kumar K.V
2007-10-29 11:07 ` Aneesh Kumar K.V
2007-10-29 15:14 ` Alex Tomas
0 siblings, 2 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2007-10-29 10:00 UTC (permalink / raw)
To: Andreas Dilger, Eric Sandeen, Valerie Clement, Theodore Tso,
Mingming Cao
Cc: linux-ext4
Hi All,
I looked at the delalloc and reservation differences that Valerie was observing.
Below is my understanding. I am not sure whether this explains the higher
fragmentation that Eric Sandeen is observing; I guess it should not. Even if
the reservation gets discarded in clear_inode under memory pressure, the
request for a new reservation should get blocks nearby and not break extents, right?
Anyhow, below is the simple case.
Without delalloc, the blocks are requested during prepare_write/write_begin.
That means we enter ext4_new_blocks_old, which calls ext4_try_to_allocate_with_rsv.
If there is no reservation for this inode, a new one is allocated. After
the blocks are used, this reservation is destroyed at close via ext4_release_file.
With delalloc, the blocks are not requested until we hit writeback/ext4_da_writepages.
That means if we create new files and close them, the reservation is discarded
at close via ext4_release_file (actually there is nothing to clear yet).
Then, when we do a sync or writeback and try to get the blocks, the inode
requests a new reservation. This reservation is not discarded until we call clear_inode,
and that results in the behavior we are seeing:
Free blocks: 1440-8191, 8194-8199, 8202-8207, 8210-8215, 8218-8223, 8226-8231, 8234-8239, 8242-8247, 8250-8255, 8258-8263, 8266-8271, 8274-8279, 8282-8287, 8290-8295, 8298-8303, 8306-8311, 8314-8319, 8322-8327, 8330-8335, 8338-8343, 8346-12799
So now the question is: where do we discard the reservation in the delalloc case?
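To make the two lifecycles concrete, here is a minimal userspace sketch, not kernel code. It assumes a reservation window of 8 blocks, which is only inferred from the spacing in the free-block list above (2 used blocks followed by 6 free ones); `disk_model`, `model_alloc` and `model_gap` are hypothetical names made up for the illustration.

```c
#include <assert.h>

/* Assumed window size, inferred from the 2-used/6-free pattern above. */
#define RSV_WINDOW 8

/* Toy model of the allocator: only tracks the first block that is
 * neither used nor covered by a live reservation. */
struct disk_model {
	unsigned long next_free;
};

/* Allocate len blocks for a new file from a fresh reservation window.
 * discard_unused != 0 models the non-delalloc case, where close()
 * releases the unused tail of the window before the next file allocates.
 * discard_unused == 0 models the delalloc case, where the window taken
 * in the writeback path stays reserved until clear_inode. */
unsigned long model_alloc(struct disk_model *d, unsigned long len,
			  int discard_unused)
{
	unsigned long start = d->next_free;

	assert(len > 0 && len <= RSV_WINDOW);
	d->next_free = start + (discard_unused ? len : RSV_WINDOW);
	return start;
}

/* Size of the hole left between two consecutive files of len blocks. */
unsigned long model_gap(unsigned long len, int discard_unused)
{
	struct disk_model d = { 8192 };
	unsigned long first = model_alloc(&d, len, discard_unused);
	unsigned long second = model_alloc(&d, len, discard_unused);

	return second - (first + len);
}
```

Under these assumptions, two 2-block files end up adjacent when the window is discarded at close, and 6 blocks apart when it is kept, which is the shape of the list above.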
-aneesh
* Re: delalloc and reservation.
2007-10-29 10:00 Aneesh Kumar K.V
@ 2007-10-29 11:07 ` Aneesh Kumar K.V
2007-10-29 15:14 ` Alex Tomas
1 sibling, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2007-10-29 11:07 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Andreas Dilger, Eric Sandeen, Valerie Clement, Theodore Tso,
Mingming Cao, linux-ext4
Aneesh Kumar K.V wrote:
> Hi All,
>
> I looked at the delalloc and reservation differences that Valerie was
> observing.
> Below is my understanding. I am not sure whether the below will result
> in higher fragmentation that Eric Sandeen is observing. I guess it
> should not. Even
> though the reservation gets discarded during the clear inode due to
> memory pressure
> the request for new reservation should get the blocks nearby and not
> break extents right ?
>
>
> any how below is the simple case.
>
> without delalloc the blocks are requested during prepare_write/write_begin.
> That means we enter ext4_new_blocks_old which will call
> ext4_try_to_allocate_with_rsv.
> Now if there is no reservation for this inode a new one will be
> allocated. After
> using the blocks this reservation is destroyed during the close via
> ext4_release_file
>
> With delalloc the blocks are not requested until we hit
> writeback/ext4_da_writepages
> That means if we create new file and close them the reservation will be
> discarded
> during close via ext4_release_file.( Actually there will be nothing to
> clear)
> Now when we do a sync/or write back. We try to get the block, the inode
> will
> request for new reservation. This reservation is not discarded untill we
> call clear_inode
> and that results in the behavior we are seeing.
> Free blocks: 1440-8191, 8194-8199, 8202-8207, 8210-8215, 8218-8223,
> 8226-8231, 8234-8239, 8242-8247, 8250-8255, 8258-8263, 8266-8271,
> 8274-8279, 8282-8287, 8290-8295, 8298-8303, 8306-8311, 8314-8319,
> 8322-8327, 8330-8335, 8338-8343, 8346-12799
>
> So now the question is where do we discard the reservation in case of
> delalloc.
>
> -
With respect to mballoc, we are not seeing this because we are allocating
from the group prealloc list, which is per-CPU.
In most cases we have EXT4_MB_HINT_GROUP_ALLOC set in mballoc.
In ext4_mb_group_or_file I already have a FIXME!! regarding this.
Currently we have:
	/* request is so large that we don't care about
	 * streaming - it overweights any possible seek */
	if (ac->ac_o_ex.fe_len >= sbi->s_mb_large_req)
		return;
	/* FIXME!!
	 * is this >= considering the above ?
	 */
	if (ac->ac_o_ex.fe_len >= sbi->s_mb_small_req)
		return;
	.....
	......
	/* we're going to use group allocation */
	ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;
	........
	.........
So for small sizes we have EXT4_MB_HINT_GROUP_ALLOC set. Now if
I change the line below the FIXME!! to <=, that forces
small sizes to use inode prealloc, and that causes:
Free blocks: 1442-1443, 1446-1447, 1450-1451, 1454-1455, 1458-1459, 1462-1463, 1466-1467, 1470-1471, 1474-1475, 1478-1479, 1482-1483, 1486-1487, 1490-1491, 1494-1495, 1498-1499, 1502-1503, 1506-1507, 1510-1511, 1514-1515, 1518-12799
So the problem is generic.
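The effect of flipping that comparison can be sketched in isolation. This is a simplified userspace model of only the two tests quoted above, assuming that returning early (without setting EXT4_MB_HINT_GROUP_ALLOC) means inode preallocation is used. The enum, the struct and the `flip_small` switch are hypothetical, and the thresholds in the usage note are arbitrary example values, not the kernel defaults.

```c
#include <assert.h>

/* Hypothetical labels for the two outcomes of the decision. */
enum alloc_kind {
	ALLOC_INODE_PA,		/* early return: no group-alloc hint set */
	ALLOC_GROUP_PA		/* EXT4_MB_HINT_GROUP_ALLOC would be set */
};

/* Stand-in for sbi->s_mb_small_req and sbi->s_mb_large_req. */
struct mb_limits {
	unsigned long small_req;
	unsigned long large_req;
};

/* Model of the two length tests. flip_small == 0 is the code as quoted
 * (">="); flip_small != 0 is the "<=" experiment. */
enum alloc_kind group_or_file(unsigned long len,
			      const struct mb_limits *lim, int flip_small)
{
	assert(lim->small_req <= lim->large_req);

	/* Large requests never use group allocation. */
	if (len >= lim->large_req)
		return ALLOC_INODE_PA;

	/* The line under the FIXME!!, in either direction. */
	if (flip_small ? (len <= lim->small_req) : (len >= lim->small_req))
		return ALLOC_INODE_PA;

	return ALLOC_GROUP_PA;
}
```

With example limits of 16 and 32, a 2-block request goes to group preallocation with ">=" and to inode preallocation with "<=". As long as small_req <= large_req, the first test is also implied by the second in the ">=" form, which may be what the FIXME is pointing at.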
-aneesh
* Re: delalloc and reservation
@ 2007-10-29 14:24 Aneesh Kumar K.V
0 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2007-10-29 14:24 UTC (permalink / raw)
To: linux-ext4, bzzz.tomas
[-- Attachment #1: Type: text/plain, Size: 60 bytes --]
I guess the list dropped this mail. Sending again.
-aneesh
[-- Attachment #2: Re: delalloc and reservation..eml --]
[-- Type: message/rfc822, Size: 3693 bytes --]
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andreas Dilger <adilger@clusterfs.com>, Eric Sandeen <sandeen@redhat.com>, Valerie Clement <valerie.clement@bull.net>, Theodore Tso <tytso@mit.edu>, Mingming Cao <cmm@us.ibm.com>, linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: delalloc and reservation.
Date: Mon, 29 Oct 2007 16:37:08 +0530
Message-ID: <4725BEDC.5090902@linux.vnet.ibm.com>
* Re: delalloc and reservation.
2007-10-29 15:14 ` Alex Tomas
@ 2007-10-29 14:33 ` Aneesh Kumar K.V
2007-10-29 15:44 ` Alex Tomas
0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2007-10-29 14:33 UTC (permalink / raw)
To: Alex Tomas
Cc: Andreas Dilger, Eric Sandeen, Valerie Clement, Theodore Tso,
Mingming Cao, linux-ext4
Alex Tomas wrote:
> Hi,
>
> could you try the patch attached. it should fix the issue. the idea
> was to align requests in order to help raid5-like setups. but somewhere
> I lost one bit in mballoc: it should pre-allocate all crossed stripes,
> but it didn't.
>
> as for discard, lustre doesn't use open/close for data, so discard-on-close
> makes zero sense in our case. I'm not very positive whether we need to
> drop preallocation on file close in case of delayed allocation as writeback
> can be started while file is open and finish after close(2).
>
>
By default mballoc does not produce this particular layout. I hit the problem
only if I force small sizes to use inode preallocation, i.e. change the
line below in ext4_mb_group_or_file from
	if (ac->ac_o_ex.fe_len >= sbi->s_mb_small_req)
to
	if (ac->ac_o_ex.fe_len <= sbi->s_mb_small_req)
Do you want to test the patch with this change?
We are observing the problem with delalloc and nomballoc.
-aneesh
* Re: delalloc and reservation.
2007-10-29 10:00 Aneesh Kumar K.V
2007-10-29 11:07 ` Aneesh Kumar K.V
@ 2007-10-29 15:14 ` Alex Tomas
2007-10-29 14:33 ` Aneesh Kumar K.V
1 sibling, 1 reply; 8+ messages in thread
From: Alex Tomas @ 2007-10-29 15:14 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Andreas Dilger, Eric Sandeen, Valerie Clement, Theodore Tso,
Mingming Cao, linux-ext4
[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]
Hi,
Could you try the attached patch? It should fix the issue. The idea
was to align requests in order to help RAID5-like setups, but somewhere
I lost one bit in mballoc: it should preallocate all crossed stripes,
but it didn't.
As for discard, Lustre doesn't use open/close for data, so discard-on-close
makes zero sense in our case. I'm not sure whether we need to
drop preallocation on file close in the delayed allocation case, as writeback
can start while the file is open and finish after close(2).
thanks, Alex
Aneesh Kumar K.V wrote:
> Hi All,
>
> I looked at the delalloc and reservation differences that Valerie was
> observing.
> Below is my understanding. I am not sure whether the below will result
> in higher fragmentation that Eric Sandeen is observing. I guess it
> should not. Even
> though the reservation gets discarded during the clear inode due to
> memory pressure
> the request for new reservation should get the blocks nearby and not
> break extents right ?
>
>
> any how below is the simple case.
>
> without delalloc the blocks are requested during prepare_write/write_begin.
> That means we enter ext4_new_blocks_old which will call
> ext4_try_to_allocate_with_rsv.
> Now if there is no reservation for this inode a new one will be
> allocated. After
> using the blocks this reservation is destroyed during the close via
> ext4_release_file
>
> With delalloc the blocks are not requested until we hit
> writeback/ext4_da_writepages
> That means if we create new file and close them the reservation will be
> discarded
> during close via ext4_release_file.( Actually there will be nothing to
> clear)
> Now when we do a sync/or write back. We try to get the block, the inode
> will
> request for new reservation. This reservation is not discarded untill we
> call clear_inode
> and that results in the behavior we are seeing.
> Free blocks: 1440-8191, 8194-8199, 8202-8207, 8210-8215, 8218-8223,
> 8226-8231, 8234-8239, 8242-8247, 8250-8255, 8258-8263, 8266-8271,
> 8274-8279, 8282-8287, 8290-8295, 8298-8303, 8306-8311, 8314-8319,
> 8322-8327, 8330-8335, 8338-8343, 8346-12799
>
> So now the question is where do we discard the reservation in case of
> delalloc.
>
> -aneesh
>
[-- Attachment #2: mballoc-debug.patch --]
[-- Type: text/x-patch, Size: 901 bytes --]
Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
===================================================================
--- linux-2.6.24-rc1.orig/fs/ext4/mballoc.c 2007-10-27 10:29:17.000000000 +0400
+++ linux-2.6.24-rc1/fs/ext4/mballoc.c 2007-10-27 22:14:54.000000000 +0400
@@ -3088,8 +3088,10 @@ static void ext4_mb_normalize_request(st
 			break;
 		}
 	}
+	size = wind;
+
 	if (wind == 0) {
-		__u64 tstart;
+		__u64 tstart, tend;
 		/* file is quite large, we now preallocate with
 		 * the biggest configured window with regart to
 		 * logical offset */
@@ -3097,8 +3099,11 @@ static void ext4_mb_normalize_request(st
 		tstart = ac->ac_o_ex.fe_logical;
 		do_div(tstart, wind);
 		start = tstart * wind;
+		tend = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len - 1;
+		do_div(tend, wind);
+		tend = tend * wind + wind;
+		size = tend - start;
 	}
-	size = wind;
 	orig_size = size;
 	orig_start = start;
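The arithmetic added by the patch above can be checked in userspace. The sketch below is an illustration under assumptions, not the kernel function: plain 64-bit division stands in for do_div(), and `extent_req` and `normalize_to_window` are hypothetical names. It rounds the start down to a window boundary and the end up to the next one, so that a request crossing a boundary covers every window it touches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: normalized start and size, in blocks. */
struct extent_req {
	uint64_t start;
	uint64_t size;
};

/* Round [logical, logical + len) out to whole windows of wind blocks,
 * following the tstart/tend computation in the patch. */
struct extent_req normalize_to_window(uint64_t logical, uint64_t len,
				      uint64_t wind)
{
	struct extent_req r;
	uint64_t tend;

	assert(wind > 0 && len > 0);

	/* Start of the window containing the first requested block. */
	r.start = (logical / wind) * wind;

	/* End of the window containing the last requested block. */
	tend = ((logical + len - 1) / wind) * wind + wind;

	r.size = tend - r.start;
	return r;
}
```

For example, a 4-block request at logical block 6 with an 8-block window touches blocks 6 to 9 and therefore two windows, so the sketch yields start 0 and size 16, where a fixed size of one window would stop at block 7.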
* Re: delalloc and reservation.
2007-10-29 15:44 ` Alex Tomas
@ 2007-10-29 15:19 ` Aneesh Kumar K.V
2007-10-29 16:26 ` Alex Tomas
0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2007-10-29 15:19 UTC (permalink / raw)
To: Alex Tomas
Cc: Andreas Dilger, Eric Sandeen, Valerie Clement, Theodore Tso,
Mingming Cao, linux-ext4
Alex Tomas wrote:
> sorry, I don't quite understand how do you observe this with nomballoc
>
> thanks, Alex
>
> Aneesh Kumar K.V wrote:
>> mballoc by default doesn't give the particular layout only if i force
>> small
>> size to use inode preallocation i am hitting the problem. ie to change
>> the
>> below line in ext4_mb_group_or_file
>>
>> if (ac->ac_o_ex.fe_len >= sbi->s_mb_small_req)
>>
>> to
>> if (ac->ac_o_ex.fe_len <= sbi->s_mb_small_req)
>>
>> Do you want to test the patch with this change ?
>>
>> We are observing the problem with delalloc and nomballoc.
>>
As I explained in the previous mail, the problem is with
the current reservation code using ext4_block_alloc_info:
	EXT4_I(inode)->i_block_alloc_info;
What is happening is that we are not discarding the reservation
for a particular inode in the delalloc case. Without
delalloc we discard the reservation during close(). But with
delalloc we get a new reservation in the writeback
path and we don't discard it. This results
in the files being spread out rather than closely allocated
on disk.
BTW, with your patch and the change I suggested above,
the problem still exists.
The output below is from requests for 2 blocks,
printed by this in ext4_ext_get_blocks:
	printk(KERN_CRIT "allocate new block: goal %llu, found %llu/%lu\n",
	       ar.goal, newblock, ar.len);
allocate new block: goal 28672, found 12288/1
allocate new block: goal 8192, found 12292/2
allocate new block: goal 8192, found 12296/2
allocate new block: goal 8192, found 12300/2
allocate new block: goal 8192, found 12304/2
allocate new block: goal 8192, found 12308/2
allocate new block: goal 8192, found 12312/2
allocate new block: goal 8192, found 12316/2
allocate new block: goal 8192, found 1440/2
allocate new block: goal 8192, found 1444/2
allocate new block: goal 8192, found 1448/2
allocate new block: goal 8192, found 1452/2
allocate new block: goal 8192, found 1456/2
allocate new block: goal 8192, found 1460/2
allocate new block: goal 8192, found 1464/2
allocate new block: goal 8192, found 1468/2
allocate new block: goal 8192, found 12320/2
allocate new block: goal 8192, found 12324/2
allocate new block: goal 8192, found 12328/2
allocate new block: goal 8192, found 12332/2
allocate new block: goal 8192, found 12336/2
with the change mballoc was not giving the problem
described because it uses blocks from group
preallocation.
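One way to read the log above is to measure the hole left between each allocation and the next. The helper below is a small sketch for doing that by hand; `alloc_rec`, `hole_between` and `count_discontiguous` are hypothetical names, and the records in the usage note are copied from the "found start/len" fields of the log.

```c
#include <assert.h>
#include <stddef.h>

/* One "found start/len" entry from the log. */
struct alloc_rec {
	unsigned long start;
	unsigned long len;
};

/* Blocks skipped between the end of prev and the start of next.
 * Returns 0 when next is contiguous with or located before prev. */
unsigned long hole_between(struct alloc_rec prev, struct alloc_rec next)
{
	unsigned long prev_end = prev.start + prev.len;

	assert(prev.len > 0 && next.len > 0);
	return next.start > prev_end ? next.start - prev_end : 0;
}

/* Number of allocations that do not start where the previous one ended. */
size_t count_discontiguous(const struct alloc_rec *recs, size_t n)
{
	size_t count = 0;

	for (size_t i = 1; i < n; i++)
		if (recs[i].start != recs[i - 1].start + recs[i - 1].len)
			count++;
	return count;
}
```

Feeding it the entries 12292/2, 12296/2, 12300/2 and 12304/2 gives a 2-block hole after every 2-block allocation, so none of the files is contiguous with its neighbour.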
-aneesh
* Re: delalloc and reservation.
2007-10-29 14:33 ` Aneesh Kumar K.V
@ 2007-10-29 15:44 ` Alex Tomas
2007-10-29 15:19 ` Aneesh Kumar K.V
0 siblings, 1 reply; 8+ messages in thread
From: Alex Tomas @ 2007-10-29 15:44 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Andreas Dilger, Eric Sandeen, Valerie Clement, Theodore Tso,
Mingming Cao, linux-ext4
Sorry, I don't quite understand how you observe this with nomballoc.
thanks, Alex
Aneesh Kumar K.V wrote:
> mballoc by default doesn't give the particular layout only if i force small
> size to use inode preallocation i am hitting the problem. ie to change the
> below line in ext4_mb_group_or_file
>
> if (ac->ac_o_ex.fe_len >= sbi->s_mb_small_req)
>
> to
> if (ac->ac_o_ex.fe_len <= sbi->s_mb_small_req)
>
> Do you want to test the patch with this change ?
>
> We are observing the problem with delalloc and nomballoc.
>
>
> -aneesh
>
* Re: delalloc and reservation.
2007-10-29 15:19 ` Aneesh Kumar K.V
@ 2007-10-29 16:26 ` Alex Tomas
0 siblings, 0 replies; 8+ messages in thread
From: Alex Tomas @ 2007-10-29 16:26 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Andreas Dilger, Eric Sandeen, Valerie Clement, Theodore Tso,
Mingming Cao, linux-ext4
Ah, got it now. I think the solution would be to discard preallocated blocks
once blocks for all dirty data are allocated and the file is closed. In the previous
version of delalloc I did this by passing a NOPREALLOC hint. Something similar should
be done in the newer one, I guess.
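The condition proposed above can be written down as a small predicate. This is only a userspace sketch of the idea, assuming the check would run both at the last close and at the end of writeback; `inode_model` and its fields are hypothetical and do not correspond to real ext4 structures, and the NOPREALLOC hint itself is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-inode state relevant to the proposal. */
struct inode_model {
	unsigned int open_count;		/* open file descriptors */
	unsigned long dirty_unallocated;	/* delalloc blocks not yet mapped */
	unsigned long prealloc_blocks;		/* unused preallocated blocks */
};

/* Discard only when the file is closed AND all dirty data has blocks. */
bool should_discard_prealloc(const struct inode_model *ino)
{
	assert(ino);
	return ino->open_count == 0 &&
	       ino->dirty_unallocated == 0 &&
	       ino->prealloc_blocks > 0;
}

/* Intended to be called from both the close path and the end of
 * writeback in this model. Returns the number of blocks released. */
unsigned long maybe_discard(struct inode_model *ino)
{
	unsigned long freed = 0;

	if (should_discard_prealloc(ino)) {
		freed = ino->prealloc_blocks;
		ino->prealloc_blocks = 0;
	}
	return freed;
}
```

Checking from both paths is what covers the case raised earlier in the thread, where writeback starts while the file is open and finishes after close(2): whichever event happens last triggers the discard.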
thanks, Alex
Aneesh Kumar K.V wrote:
>
>
> Alex Tomas wrote:
>> sorry, I don't quite understand how do you observe this with nomballoc
>>
>> thanks, Alex
>>
>> Aneesh Kumar K.V wrote:
>>> mballoc by default doesn't give the particular layout only if i force
>>> small
>>> size to use inode preallocation i am hitting the problem. ie to
>>> change the
>>> below line in ext4_mb_group_or_file
>>>
>>> if (ac->ac_o_ex.fe_len >= sbi->s_mb_small_req)
>>>
>>> to
>>> if (ac->ac_o_ex.fe_len <= sbi->s_mb_small_req)
>>>
>>> Do you want to test the patch with this change ?
>>>
>>> We are observing the problem with delalloc and nomballoc.
>>>
>
>
> As i explained in the previous mail the problem is with the the current
> reservation code using ext4_block_alloc_info.
>
>
> EXT4_I(inode)->i_block_alloc_info;
>
> Now what is happening is we are not discarding the reservation
> with respect to particular inode in case of dealloc. Without
> delalloc we discard the reservation during close(). But with
> dealloc the we are getting new reservation in the writeback
> path and we don't discard the reservation. This results
> in the files being spread across and not closely allocated
> on disk.
> BTW with your patch and the change i suggested above the problem still
> exist.
>
> The output is while requesting for 2 blocks
> printed by this in ext4_ext_get_blocks
>
> printk(KERN_CRIT "allocate new block: goal %llu, found %llu/%lu\n",
> ar.goal, newblock, ar.len);
>
>
> allocate new block: goal 28672, found 12288/1
> allocate new block: goal 8192, found 12292/2
> allocate new block: goal 8192, found 12296/2
> allocate new block: goal 8192, found 12300/2
> allocate new block: goal 8192, found 12304/2
> allocate new block: goal 8192, found 12308/2
> allocate new block: goal 8192, found 12312/2
> allocate new block: goal 8192, found 12316/2
> allocate new block: goal 8192, found 1440/2
> allocate new block: goal 8192, found 1444/2
> allocate new block: goal 8192, found 1448/2
> allocate new block: goal 8192, found 1452/2
> allocate new block: goal 8192, found 1456/2
> allocate new block: goal 8192, found 1460/2
> allocate new block: goal 8192, found 1464/2
> allocate new block: goal 8192, found 1468/2
> allocate new block: goal 8192, found 12320/2
> allocate new block: goal 8192, found 12324/2
> allocate new block: goal 8192, found 12328/2
> allocate new block: goal 8192, found 12332/2
> allocate new block: goal 8192, found 12336/2
>
>
> with the change mballoc was not giving the problem
> described because it uses blocks from group
> preallocation.
>
> -aneesh
>