* mirrored device with thousand of mapping table entries
@ 2011-02-28 11:21 Eli Malul
2011-02-28 11:48 ` Alasdair G Kergon
0 siblings, 1 reply; 23+ messages in thread
From: Eli Malul @ 2011-02-28 11:21 UTC (permalink / raw)
To: dm-devel
Hi,
I am trying to create a mirrored device with thousands of mapping table
entries.
Unfortunately, the kernel memory allocated to satisfy this configuration
seems unbearable - I am talking about 6G for 10,000 entries, even though
the region size used for each entry is minimal (i.e. 8 blocks).
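A back-of-envelope division of those figures (a rough sketch, nothing more) puts the average cost per target at roughly 629 KiB:

```shell
# Average memory cost per mirror target, from the figures above:
# ~6 GiB of kernel memory across 10,000 table entries.
total_kib=$((6 * 1024 * 1024))      # 6 GiB expressed in KiB
targets=10000
per_target_kib=$((total_kib / targets))
echo "~${per_target_kib} KiB per target"
```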
My questions are as follows:
1. Has anyone tried before to create a mirrored device with
thousands of targets?
2. What is the estimated total allocated memory per target?
3. Is there something I am missing here?
Thanks a lot,
Eli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: mirrored device with thousand of mapping table entries
2011-02-28 11:21 mirrored device with thousand of mapping table entries Eli Malul
@ 2011-02-28 11:48 ` Alasdair G Kergon
2011-02-28 11:59 ` mirrored device with thousand of mapping tableentries Eli Malul
0 siblings, 1 reply; 23+ messages in thread
From: Alasdair G Kergon @ 2011-02-28 11:48 UTC (permalink / raw)
To: device-mapper development
On Mon, Feb 28, 2011 at 01:21:10PM +0200, Eli Malul wrote:
> 1. Has anyone tried before to create a mirrored device with
> thousands of targets?
> 2. What is the estimated total allocated memory per target?
> 3. Is there something I am missing here?
Please show us an example, maybe the first few lines of the tables you
are loading.
Alasdair
* Re: mirrored device with thousand of mapping tableentries
2011-02-28 11:48 ` Alasdair G Kergon
@ 2011-02-28 11:59 ` Eli Malul
2011-02-28 12:11 ` Alasdair G Kergon
0 siblings, 1 reply; 23+ messages in thread
From: Eli Malul @ 2011-02-28 11:59 UTC (permalink / raw)
To: Alasdair G Kergon, device-mapper development
The first few lines of the table I am loading:
[root@vpc09 ~]# vi /tmp/mirror_table.txt
0 12 km_mirror core 2 8 nosync 2 /dev/loop0 0 /dev/loop1 0
12 12 km_mirror core 2 8 nosync 2 /dev/loop0 12 /dev/loop1 12
24 12 km_mirror core 2 8 nosync 2 /dev/loop0 24 /dev/loop1 24
36 12 km_mirror core 2 8 nosync 2 /dev/loop0 36 /dev/loop1 36
48 12 km_mirror core 2 8 nosync 2 /dev/loop0 48 /dev/loop1 48
60 12 km_mirror core 2 8 nosync 2 /dev/loop0 60 /dev/loop1 60
72 12 km_mirror core 2 8 nosync 2 /dev/loop0 72 /dev/loop1 72
84 12 km_mirror core 2 8 nosync 2 /dev/loop0 84 /dev/loop1 84
96 12 km_mirror core 2 8 nosync 2 /dev/loop0 96 /dev/loop1 96
108 12 km_mirror core 2 8 nosync 2 /dev/loop0 108 /dev/loop1 108
.
.
.
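For reference, a synthetic table of this shape can be generated with a short loop (a sketch; `km_mirror` is the local target name used above, and `/dev/loop0`/`/dev/loop1` are taken from the listing):

```shell
# Generate a synthetic mirror table: one tiny (12-sector) mirror
# target per line, offsets advancing in lockstep on both legs.
entries=10000
for i in $(seq 0 $((entries - 1))); do
    start=$((i * 12))
    echo "$start 12 km_mirror core 2 8 nosync 2 /dev/loop0 $start /dev/loop1 $start"
done > /tmp/mirror_table.txt
head -2 /tmp/mirror_table.txt
```

The resulting file can then be loaded with `dmsetup create`, assuming the loop devices are large enough.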
I am also adding the result of the 'free' command so you can see the
memory usage increasing:
[root@vpc09 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:             7          0          6          0          0          0
-/+ buffers/cache:          0          7
Swap:            0          0          0
[root@vpc09 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:             7          6          0          0          0          0
-/+ buffers/cache:          6          1
Swap:            0          0          0
-----Original Message-----
From: Alasdair G Kergon [mailto:agk@redhat.com]
Sent: Monday, February 28, 2011 1:48 PM
To: device-mapper development
Subject: Re: [dm-devel] mirrored device with thousand of mapping
tableentries
On Mon, Feb 28, 2011 at 01:21:10PM +0200, Eli Malul wrote:
> 1. Has anyone tried before to create a mirrored device with
> thousands of targets?
> 2. What is the estimated total allocated memory per target?
> 3. Is there something I am missing here?
Please show us an example, maybe the first few lines of the tables you
are loading.
Alasdair
* Re: mirrored device with thousand of mapping tableentries
2011-02-28 11:59 ` mirrored device with thousand of mapping tableentries Eli Malul
@ 2011-02-28 12:11 ` Alasdair G Kergon
2011-02-28 12:17 ` mirrored device with thousand of mappingtableentries Eli Malul
0 siblings, 1 reply; 23+ messages in thread
From: Alasdair G Kergon @ 2011-02-28 12:11 UTC (permalink / raw)
To: Eli Malul; +Cc: device-mapper development
On Mon, Feb 28, 2011 at 01:59:16PM +0200, Eli Malul wrote:
> The first few lines of the table I am loading:
> [root@vpc09 ~]# vi /tmp/mirror_table.txt
> 0 12 km_mirror core 2 8 nosync 2 /dev/loop0 0 /dev/loop1 0
> 12 12 km_mirror core 2 8 nosync 2 /dev/loop0 12 /dev/loop1 12
> 24 12 km_mirror core 2 8 nosync 2 /dev/loop0 24 /dev/loop1 24
> 36 12 km_mirror core 2 8 nosync 2 /dev/loop0 36 /dev/loop1 36
> 48 12 km_mirror core 2 8 nosync 2 /dev/loop0 48 /dev/loop1 48
> 60 12 km_mirror core 2 8 nosync 2 /dev/loop0 60 /dev/loop1 60
> 72 12 km_mirror core 2 8 nosync 2 /dev/loop0 72 /dev/loop1 72
> 84 12 km_mirror core 2 8 nosync 2 /dev/loop0 84 /dev/loop1 84
> 96 12 km_mirror core 2 8 nosync 2 /dev/loop0 96 /dev/loop1 96
> 108 12 km_mirror core 2 8 nosync 2 /dev/loop0 108 /dev/loop1 108
Why do you need this, rather than one mirror target covering the
whole device? (And if, unlike that example, your extents are not contiguous,
create two new devices that join them together, and mirror those. That's
what LVM does.)
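To sketch that layering with hypothetical dmsetup tables (device names, sizes, and offsets here are invented for illustration; the upstream target name is `mirror`): each leg's scattered extents are first joined by linear targets, and a single mirror target then covers the joined devices:

```
# /dev/mapper/leg0 - concatenate the extents of the first disk
0    1024 linear /dev/loop0 0
1024 1024 linear /dev/loop0 4096

# /dev/mapper/leg1 - the matching extents on the second disk
0    1024 linear /dev/loop1 0
1024 1024 linear /dev/loop1 4096

# /dev/mapper/mirror - one mirror target over the joined legs
0 2048 mirror core 2 8 nosync 2 /dev/mapper/leg0 0 /dev/mapper/leg1 0
```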
Alasdair
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 12:11 ` Alasdair G Kergon
@ 2011-02-28 12:17 ` Eli Malul
2011-02-28 13:10 ` Alasdair G Kergon
0 siblings, 1 reply; 23+ messages in thread
From: Eli Malul @ 2011-02-28 12:17 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: device-mapper development
I expect to have thousands of non-contiguous extents (so I created a
synthetic mapping table to test the device-mapper behavior).
Any suggestions?
-----Original Message-----
From: Alasdair G Kergon [mailto:agk@redhat.com]
Sent: Monday, February 28, 2011 2:12 PM
To: Eli Malul
Cc: device-mapper development
Subject: Re: [dm-devel] mirrored device with thousand of
mappingtableentries
On Mon, Feb 28, 2011 at 01:59:16PM +0200, Eli Malul wrote:
> The first few lines of the table I am loading:
> [root@vpc09 ~]# vi /tmp/mirror_table.txt
> 0 12 km_mirror core 2 8 nosync 2 /dev/loop0 0 /dev/loop1 0
> 12 12 km_mirror core 2 8 nosync 2 /dev/loop0 12 /dev/loop1 12
> 24 12 km_mirror core 2 8 nosync 2 /dev/loop0 24 /dev/loop1 24
> 36 12 km_mirror core 2 8 nosync 2 /dev/loop0 36 /dev/loop1 36
> 48 12 km_mirror core 2 8 nosync 2 /dev/loop0 48 /dev/loop1 48
> 60 12 km_mirror core 2 8 nosync 2 /dev/loop0 60 /dev/loop1 60
> 72 12 km_mirror core 2 8 nosync 2 /dev/loop0 72 /dev/loop1 72
> 84 12 km_mirror core 2 8 nosync 2 /dev/loop0 84 /dev/loop1 84
> 96 12 km_mirror core 2 8 nosync 2 /dev/loop0 96 /dev/loop1 96
> 108 12 km_mirror core 2 8 nosync 2 /dev/loop0 108 /dev/loop1 108
Why do you need this, rather than one mirror target covering the
whole device? (And if, unlike that example, your extents are not
contiguous, create two new devices that join them together, and
mirror those. That's what LVM does.)
Alasdair
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 12:17 ` mirrored device with thousand of mappingtableentries Eli Malul
@ 2011-02-28 13:10 ` Alasdair G Kergon
2011-02-28 13:13 ` Eli Malul
0 siblings, 1 reply; 23+ messages in thread
From: Alasdair G Kergon @ 2011-02-28 13:10 UTC (permalink / raw)
To: Eli Malul; +Cc: device-mapper development
On Mon, Feb 28, 2011 at 02:17:52PM +0200, Eli Malul wrote:
> I expect to have thousands of non contiguous extents (so I created a
> synthetic mapping table to test the device-mapper behavior).
>
> Any suggestions?
> -----Original Message-----
> From: Alasdair G Kergon [mailto:agk@redhat.com]
> Sent: Monday, February 28, 2011 2:12 PM
> To: Eli Malul
> Cc: device-mapper development
> (And if, unlike that example, your extents are not
> contiguous,
> create two new devices that join them together, and mirror those.
> That's
> what LVM does.)
That.
dm0 = list 0
dm1 = list 1
dm2 = mirror of dm0 and dm1
Alasdair
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 13:10 ` Alasdair G Kergon
@ 2011-02-28 13:13 ` Eli Malul
2011-02-28 13:29 ` Alasdair G Kergon
2011-02-28 13:38 ` Zdenek Kabelac
0 siblings, 2 replies; 23+ messages in thread
From: Eli Malul @ 2011-02-28 13:13 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: device-mapper development
OK, I will try that and check its implications, thanks!
But do you know why having 10,000 extents causes this overwhelming
memory usage?
-----Original Message-----
From: Alasdair G Kergon [mailto:agk@redhat.com]
Sent: Monday, February 28, 2011 3:10 PM
To: Eli Malul
Cc: device-mapper development
Subject: Re: [dm-devel] mirrored device with thousand of
mappingtableentries
On Mon, Feb 28, 2011 at 02:17:52PM +0200, Eli Malul wrote:
> I expect to have thousands of non contiguous extents (so I created a
> synthetic mapping table to test the device-mapper behavior).
>
> Any suggestions?
> -----Original Message-----
> From: Alasdair G Kergon [mailto:agk@redhat.com]
> Sent: Monday, February 28, 2011 2:12 PM
> To: Eli Malul
> Cc: device-mapper development
> (And if, unlike that example, your extents are not
> contiguous,
> create two new devices that join them together, and mirror those.
> That's
> what LVM does.)
That.
dm0 = list 0
dm1 = list 1
dm2 = mirror of dm0 and dm1
Alasdair
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 13:13 ` Eli Malul
@ 2011-02-28 13:29 ` Alasdair G Kergon
2011-02-28 13:42 ` Eli Malul
2011-02-28 16:25 ` Eli Malul
2011-02-28 13:38 ` Zdenek Kabelac
1 sibling, 2 replies; 23+ messages in thread
From: Alasdair G Kergon @ 2011-02-28 13:29 UTC (permalink / raw)
To: Eli Malul; +Cc: device-mapper development
On Mon, Feb 28, 2011 at 03:13:42PM +0200, Eli Malul wrote:
> But, do you know why having 10,000 extents is causing this overwhelming
> memory usage?
Nobody considered having a huge number of tiny mirrors reasonable or necessary,
so no attempt has been made to optimise that situation.
Alasdair
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 13:13 ` Eli Malul
2011-02-28 13:29 ` Alasdair G Kergon
@ 2011-02-28 13:38 ` Zdenek Kabelac
2011-02-28 15:01 ` Martin K. Petersen
1 sibling, 1 reply; 23+ messages in thread
From: Zdenek Kabelac @ 2011-02-28 13:38 UTC (permalink / raw)
To: dm-devel
Dne 28.2.2011 14:13, Eli Malul napsal(a):
>
> OK, I will try that and check the implications of that, thanks!!
>
> But, do you know why having 10,000 extents is causing this overwhelming
> memory usage?
>
If you really need that many devices - also be sure
you have disabled this kernel config option:
CONFIG_BLK_DEV_INTEGRITY
For some reason it consumes a massive amount of memory.
Other than that - you should usually count on about 64KB per device.
Zdenek
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 13:29 ` Alasdair G Kergon
@ 2011-02-28 13:42 ` Eli Malul
2011-02-28 16:25 ` Eli Malul
1 sibling, 0 replies; 23+ messages in thread
From: Eli Malul @ 2011-02-28 13:42 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: device-mapper development
According to your suggestion, I need to map the non-contiguous extents
into linear devices and mirror them.
But what if I am required to preserve the original extents' offsets?
I need to mirror an existing device with user data already on
it...
-----Original Message-----
From: Alasdair G Kergon [mailto:agk@redhat.com]
Sent: Monday, February 28, 2011 3:29 PM
To: Eli Malul
Cc: device-mapper development
Subject: Re: [dm-devel] mirrored device with thousand of
mappingtableentries
On Mon, Feb 28, 2011 at 03:13:42PM +0200, Eli Malul wrote:
> But, do you know why having 10,000 extents is causing this
overwhelming
> memory usage?
Nobody considered having a huge number of tiny mirrors reasonable or
necessary
so no attempt has been made to optimise that situation.
Alasdair
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 13:38 ` Zdenek Kabelac
@ 2011-02-28 15:01 ` Martin K. Petersen
2011-03-06 20:39 ` Zdenek Kabelac
0 siblings, 1 reply; 23+ messages in thread
From: Martin K. Petersen @ 2011-02-28 15:01 UTC (permalink / raw)
To: device-mapper development
>>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
Zdenek> If you really need that many devices - be also sure you have
Zdenek> disabled this kernel config option:
Zdenek> CONFIG_BLK_DEV_INTEGRITY
Zdenek> For some reason it consumes massive amount of memory. Other
Zdenek> than that - you should count with 64KB per device usually.
Care to qualify that?
Unless your HBA indicates that it supports data integrity, the only
penalty should be that each bio grows a pointer.
--
Martin K. Petersen Oracle Linux Engineering
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 13:29 ` Alasdair G Kergon
2011-02-28 13:42 ` Eli Malul
@ 2011-02-28 16:25 ` Eli Malul
1 sibling, 0 replies; 23+ messages in thread
From: Eli Malul @ 2011-02-28 16:25 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: device-mapper development
One way to achieve that is to create two linear mapped devices and
mirror them, plus another two linear devices to simulate the original
offsets. The latter will be mapped onto the first two linear devices, so
the client can continue to read and write the same offsets it is used to.
It seems a somewhat complex configuration, but it should do the trick.
I will test its scalability and report back, of course.
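Sketching that stack with hypothetical tables (all names, sizes, and offsets invented for illustration): the bottom linear devices pack each disk's scattered extents into contiguous legs, a single mirror covers them, and a top linear table restores the original offsets for the client:

```
# leg0 (and leg1 on the other disk) - pack the scattered extents
0    1024 linear /dev/loop0 0        # extent originally at offset 0
1024 1024 linear /dev/loop0 8192     # extent originally at offset 8192

# mirror - one target over the packed legs
0 2048 mirror core 2 8 nosync 2 /dev/mapper/leg0 0 /dev/mapper/leg1 0

# client - restore the original offsets on top of the mirror, with
# error targets where no extent exists
0    1024 linear /dev/mapper/mirror 0
1024 7168 error
8192 1024 linear /dev/mapper/mirror 1024
```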
BTW -
CONFIG_BLK_DEV_INTEGRITY is disabled.
-----Original Message-----
From: Eli Malul
Sent: Monday, February 28, 2011 3:43 PM
To: 'Alasdair G Kergon'
Cc: device-mapper development
Subject: RE: [dm-devel] mirrored device with thousand of
mappingtableentries
According to your suggestion I need to map the non contiguous extents
into linear devices and mirror them.
But, what if I am required to preserve the original extent's offset?
Since I need to mirror an existing device with user data already on
it...
-----Original Message-----
From: Alasdair G Kergon [mailto:agk@redhat.com]
Sent: Monday, February 28, 2011 3:29 PM
To: Eli Malul
Cc: device-mapper development
Subject: Re: [dm-devel] mirrored device with thousand of
mappingtableentries
On Mon, Feb 28, 2011 at 03:13:42PM +0200, Eli Malul wrote:
> But, do you know why having 10,000 extents is causing this
overwhelming
> memory usage?
Nobody considered having a huge number of tiny mirrors reasonable or
necessary
so no attempt has been made to optimise that situation.
Alasdair
* Re: mirrored device with thousand of mappingtableentries
2011-02-28 15:01 ` Martin K. Petersen
@ 2011-03-06 20:39 ` Zdenek Kabelac
2011-03-07 2:59 ` Martin K. Petersen
0 siblings, 1 reply; 23+ messages in thread
From: Zdenek Kabelac @ 2011-03-06 20:39 UTC (permalink / raw)
To: dm-devel
Dne 28.2.2011 16:01, Martin K. Petersen napsal(a):
>>>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
>
> Zdenek> If you really need that many devices - be also sure you have
> Zdenek> disabled this kernel config option:
>
> Zdenek> CONFIG_BLK_DEV_INTEGRITY
>
> Zdenek> For some reason it consumes massive amount of memory. Other
> Zdenek> than that - you should count with 64KB per device usually.
>
> Care to qualify that?
>
> Unless your HBA indicates that it supports data integrity the only
> penalty should be that each bio grows a pointer.
>
Ok - I've taken some new measurements on my (possibly not optimally
configured) Linux kernel 2.6.38-rc7.
As I'm using some kernel memory debugging, it might not apply in the same
way to a non-debug kernel.
My findings show that the BIP-256 slabtop segment grows by ~73KB per
device (while dm-io is about ~26KB).
That makes the consumption ~730MB when 10,000 devices are in the game.
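As a quick sanity check of that figure (whether it comes out as ~712 MiB or ~730 MB depends only on the KB convention):

```shell
# ~73 KiB of BIP-256 slab growth per device, times 10,000 devices.
per_device_kib=73
devices=10000
total_mib=$((per_device_kib * devices / 1024))
echo "~${total_mib} MiB locked down"
```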
Of course there are some other big slowdowns - so it's not the biggest
problem. However, if I get it right, when I don't have hardware with bio
integrity support this memory is essentially wasted, as it will have no use.
Of course, people who plan to use such a massive number of devices should
probably use hardware where such a memory loss isn't a big problem - but
it's quite noticeable on my testing laptop with just 4G :)
Another minor issue can be seen in delayed device creation time
(i.e. activation of ~7000 devices with BIP is delayed by ~30%).
Zdenek
* Re: mirrored device with thousand of mappingtableentries
2011-03-06 20:39 ` Zdenek Kabelac
@ 2011-03-07 2:59 ` Martin K. Petersen
2011-03-07 14:24 ` Zdenek Kabelac
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Martin K. Petersen @ 2011-03-07 2:59 UTC (permalink / raw)
To: device-mapper development
>>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
Zdenek> My finding seems to show that BIP-256 slabtop segment grow by
Zdenek> ~73KB per each device (while dm-io is ab out ~26KB)
Ok, I see it now that I tried with a bunch of DM devices.
DM allocates a bioset per volume. And since each bioset has an integrity
mempool you'll end up with a bunch of memory locked down. It seems like
a lot but it's actually the same amount as we reserve for the data path
(bio-0 + biovec-256).
Since a bioset is not necessarily tied to a single block device we can't
automatically decide whether to allocate the integrity pool or not. In
the DM case, however, we just set up the integrity profile so the
information is available.
Can you please try the following patch? This will change things so we
only attach an integrity pool to the bioset if the logical volume is
integrity-capable.
--
Martin K. Petersen Oracle Linux Engineering
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 38e4eb1..37a1b77 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -55,6 +55,7 @@ struct dm_table {
struct dm_target *targets;
unsigned discards_supported:1;
+ unsigned integrity_supported:1;
/*
* Indicates the rw permissions for the new logical
@@ -859,7 +860,11 @@ int dm_table_alloc_md_mempools(struct dm_table *t)
return -EINVAL;
}
- t->mempools = dm_alloc_md_mempools(type);
+ if (t->integrity_supported)
+ t->mempools = dm_alloc_md_mempools(type, 0);
+ else
+ t->mempools = dm_alloc_md_mempools(type, BIOSET_NO_INTEGRITY);
+
if (!t->mempools)
return -ENOMEM;
@@ -935,8 +940,10 @@ static int dm_table_prealloc_integrity(struct dm_table *t, struct mapped_device
struct dm_dev_internal *dd;
list_for_each_entry(dd, devices, list)
- if (bdev_get_integrity(dd->dm_dev.bdev))
+ if (bdev_get_integrity(dd->dm_dev.bdev)) {
+ t->integrity_supported = 1;
return blk_integrity_register(dm_disk(md), NULL);
+ }
return 0;
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index eaa3af0..f6146b5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2643,7 +2643,7 @@ int dm_noflush_suspending(struct dm_target *ti)
}
EXPORT_SYMBOL_GPL(dm_noflush_suspending);
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
+struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned int flags)
{
struct dm_md_mempools *pools = kmalloc(sizeof(*pools), GFP_KERNEL);
@@ -2663,7 +2663,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
goto free_io_pool_and_out;
pools->bs = (type == DM_TYPE_BIO_BASED) ?
- bioset_create(16, 0) : bioset_create(MIN_IOS, 0);
+ bioset_create_flags(16, 0, flags) :
+ bioset_create_flags(MIN_IOS, 0, flags);
if (!pools->bs)
goto free_tio_pool_and_out;
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 0c2dd5f..d846ce0 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -149,7 +149,7 @@ void dm_kcopyd_exit(void);
/*
* Mempool operations
*/
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type);
+struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned int);
void dm_free_md_mempools(struct dm_md_mempools *pools);
#endif
diff --git a/fs/bio.c b/fs/bio.c
index 4bd454f..6e4a381 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1603,9 +1603,10 @@ void bioset_free(struct bio_set *bs)
EXPORT_SYMBOL(bioset_free);
/**
- * bioset_create - Create a bio_set
+ * bioset_create_flags - Create a bio_set
* @pool_size: Number of bio and bio_vecs to cache in the mempool
* @front_pad: Number of bytes to allocate in front of the returned bio
+ * @flags: Flags that affect memory allocation
*
* Description:
* Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
@@ -1615,7 +1616,8 @@ EXPORT_SYMBOL(bioset_free);
* Note that the bio must be embedded at the END of that structure always,
* or things will break badly.
*/
-struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
+struct bio_set *bioset_create_flags(unsigned int pool_size,
+ unsigned int front_pad, unsigned int flags)
{
unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec);
struct bio_set *bs;
@@ -1636,7 +1638,8 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
if (!bs->bio_pool)
goto bad;
- if (bioset_integrity_create(bs, pool_size))
+ if ((flags & BIOSET_NO_INTEGRITY) == 0 &&
+ bioset_integrity_create(bs, pool_size))
goto bad;
if (!biovec_create_pools(bs, pool_size))
@@ -1646,6 +1649,12 @@ bad:
bioset_free(bs);
return NULL;
}
+EXPORT_SYMBOL(bioset_create_flags);
+
+struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
+{
+ return bioset_create_flags(pool_size, front_pad, 0);
+}
EXPORT_SYMBOL(bioset_create);
static void __init biovec_init_slabs(void)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 35dcdb3..2f758f3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -208,7 +208,12 @@ struct bio_pair {
extern struct bio_pair *bio_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);
+enum bioset_flags {
+ BIOSET_NO_INTEGRITY = (1 << 0),
+};
+
extern struct bio_set *bioset_create(unsigned int, unsigned int);
+extern struct bio_set *bioset_create_flags(unsigned int, unsigned int, unsigned int);
extern void bioset_free(struct bio_set *);
extern struct bio *bio_alloc(gfp_t, int);
* Re: mirrored device with thousand of mappingtableentries
2011-03-07 2:59 ` Martin K. Petersen
@ 2011-03-07 14:24 ` Zdenek Kabelac
2011-03-07 16:09 ` Mike Snitzer
2011-03-07 20:10 ` mirrored device with thousand of mappingtableentries Mike Snitzer
2 siblings, 0 replies; 23+ messages in thread
From: Zdenek Kabelac @ 2011-03-07 14:24 UTC (permalink / raw)
To: dm-devel
Dne 7.3.2011 03:59, Martin K. Petersen napsal(a):
>>>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
>
> Zdenek> My finding seems to show that BIP-256 slabtop segment grow by
> Zdenek> ~73KB per each device (while dm-io is ab out ~26KB)
>
> Ok, I see it now that I tried with a bunch of DM devices.
>
> DM allocates a bioset per volume. And since each bioset has an integrity
> mempool you'll end up with a bunch of memory locked down. It seems like
> a lot but it's actually the same amount as we reserve for the data path
> (bio-0 + biovec-256).
>
> Since a bioset is not necessarily tied to a single block device we can't
> automatically decide whether to allocate the integrity pool or not. In
> the DM case, however, we just set up the integrity profile so the
> information is available.
>
> Can you please try the following patch? This will change things so we
> only attach an integrity pool to the bioset if the logical volume is
> integrity-capable.
>
Yep - patch seems to fix the problem with wasted memory.
Thanks
Tested-by: zkabelac@redhat.com
* Re: mirrored device with thousand of mappingtableentries
2011-03-07 2:59 ` Martin K. Petersen
2011-03-07 14:24 ` Zdenek Kabelac
@ 2011-03-07 16:09 ` Mike Snitzer
2011-03-08 6:51 ` Martin K. Petersen
2011-03-07 20:10 ` mirrored device with thousand of mappingtableentries Mike Snitzer
2 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2011-03-07 16:09 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: device-mapper development, Zdenek Kabelac
Hi Martin,
On Sun, Mar 06 2011 at 9:59pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
>
> Zdenek> My finding seems to show that BIP-256 slabtop segment grow by
> Zdenek> ~73KB per each device (while dm-io is ab out ~26KB)
>
> Ok, I see it now that I tried with a bunch of DM devices.
>
> DM allocates a bioset per volume. And since each bioset has an integrity
> mempool you'll end up with a bunch of memory locked down. It seems like
> a lot but it's actually the same amount as we reserve for the data path
> (bio-0 + biovec-256).
>
> Since a bioset is not necessarily tied to a single block device we can't
> automatically decide whether to allocate the integrity pool or not. In
> the DM case, however, we just set up the integrity profile so the
> information is available.
>
> Can you please try the following patch? This will change things so we
> only attach an integrity pool to the bioset if the logical volume is
> integrity-capable.
Thanks for sorting this out. Can you post a patch that has a proper
header with your Signed-off-by? Also including Zdenek's Tested-by, and
my:
Acked-by: Mike Snitzer <snitzer@redhat.com>
Thanks,
Mike
p.s. one small nit:
> diff --git a/fs/bio.c b/fs/bio.c
> index 4bd454f..6e4a381 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1603,9 +1603,10 @@ void bioset_free(struct bio_set *bs)
> EXPORT_SYMBOL(bioset_free);
>
> /**
> - * bioset_create - Create a bio_set
> + * bioset_create_flags - Create a bio_set
> * @pool_size: Number of bio and bio_vecs to cache in the mempool
> * @front_pad: Number of bytes to allocate in front of the returned bio
> + * @flags: Flags that affect memory allocation
> *
> * Description:
> * Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
> @@ -1615,7 +1616,8 @@ EXPORT_SYMBOL(bioset_free);
> * Note that the bio must be embedded at the END of that structure always,
> * or things will break badly.
> */
> -struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
> +struct bio_set *bioset_create_flags(unsigned int pool_size,
> + unsigned int front_pad, unsigned int flags)
> {
> unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec);
> struct bio_set *bs;
> @@ -1636,7 +1638,8 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
> if (!bs->bio_pool)
> goto bad;
>
> - if (bioset_integrity_create(bs, pool_size))
> + if ((flags & BIOSET_NO_INTEGRITY) == 0 &&
> + bioset_integrity_create(bs, pool_size))
> goto bad;
>
> if (!biovec_create_pools(bs, pool_size))
We'd generally see: if (!(flags & BIOSET_NO_INTEGRITY) && ...)
but then the double negative is maybe a bit more challenging to deal
with (for a human). In the end, not a big deal either way ;)
* Re: mirrored device with thousand of mappingtableentries
2011-03-07 2:59 ` Martin K. Petersen
2011-03-07 14:24 ` Zdenek Kabelac
2011-03-07 16:09 ` Mike Snitzer
@ 2011-03-07 20:10 ` Mike Snitzer
2011-03-07 20:22 ` Martin K. Petersen
2 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2011-03-07 20:10 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: device-mapper development
On Sun, Mar 06 2011 at 9:59pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
>
> Zdenek> My finding seems to show that BIP-256 slabtop segment grow by
> Zdenek> ~73KB per each device (while dm-io is ab out ~26KB)
>
> Ok, I see it now that I tried with a bunch of DM devices.
>
> DM allocates a bioset per volume. And since each bioset has an integrity
> mempool you'll end up with a bunch of memory locked down. It seems like
> a lot but it's actually the same amount as we reserve for the data path
> (bio-0 + biovec-256).
>
> Since a bioset is not necessarily tied to a single block device we can't
> automatically decide whether to allocate the integrity pool or not. In
> the DM case, however, we just set up the integrity profile so the
> information is available.
>
> Can you please try the following patch? This will change things so we
> only attach an integrity pool to the bioset if the logical volume is
> integrity-capable.
Hey Martin,
I just took the opportunity to review DM's blk_integrity code a bit more
closely -- with an eye towards stacking devices. I found an issue that
I think we need to fix that has to do with a DM device's limits being
established during do_resume() and not during table_load().
Unfortunately, a DM device's blk_integrity gets preallocated during
table_load(). dm_table_prealloc_integrity()'s call to
blk_integrity_register() establishes the blk_integrity's block_size.
But a DM device's queue_limits aren't stacked until a DM device is
resumed -- via dm_calculate_queue_limits().
For some background please see the patch header of this commit:
http://git.kernel.org/linus/754c5fc7ebb417
The final blk_integrity for the DM device isn't fully established until
do_resume()'s eventual call to dm_table_set_integrity() -- by passing a
template to blk_integrity_register(). dm_table_set_integrity() does
validate the 'block_size' of each DM devices' blk_integrity to make sure
they all match. So the code would catch the inconsistency should it
arise.
All I'm saying is: it's possible for a table_load() to not have the
awareness that a newly added device's queue_limits will cause the DM
device's final queue_limits to be increased (say a 4K device was
added to dm_device2, and dm_device2 is now being added to another
dm_device1).
So it seems we need to establish bi->sector_size during the final stage
of blk_integrity_register(), e.g. when a template is passed. Not sure
if you'd agree with that change in general, but it'll work for DM because
the queue_limits are established before dm_table_set_integrity() is called.
Maybe revalidate/change the 'block_size' during the final stage in case
it changed?
Thanks,
Mike
* Re: mirrored device with thousand of mappingtableentries
2011-03-07 20:10 ` mirrored device with thousand of mappingtableentries Mike Snitzer
@ 2011-03-07 20:22 ` Martin K. Petersen
0 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2011-03-07 20:22 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development, Martin K. Petersen
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> So it seems we need to establish bi->sector_size during the final
Mike> stage of blk_integrity_register(), e.g. when a template is passed.
Mike> Not sure if you'd agree with that change in general but it'll work
Mike> for DM because the queue_limits are established before
Mike> dm_table_set_integrity() is set.
Let me contemplate a bit since I already have some changes brewing in
this area (T10 now permits protection information to be placed at
non-logical block size intervals).
--
Martin K. Petersen Oracle Linux Engineering
* Re: mirrored device with thousand of mappingtableentries
2011-03-07 16:09 ` Mike Snitzer
@ 2011-03-08 6:51 ` Martin K. Petersen
2011-03-08 17:13 ` Mike Snitzer
0 siblings, 1 reply; 23+ messages in thread
From: Martin K. Petersen @ 2011-03-08 6:51 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, device-mapper development, Martin K. Petersen,
Zdenek Kabelac
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> Thanks for sorting this out. Can you post a patch that has a
Mike> proper header with your Signed-off-by? Also including Zdenek's
Mike> Tested-by, and my:
I contemplated a bit and decided I liked the following approach
better. What do you think?
block: Require subsystems to explicitly allocate bio_set integrity mempool
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.
Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.
Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
---
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 38e4eb1..4e8e314 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -55,6 +55,7 @@ struct dm_table {
struct dm_target *targets;
unsigned discards_supported:1;
+ unsigned integrity_supported:1;
/*
* Indicates the rw permissions for the new logical
@@ -859,7 +860,7 @@ int dm_table_alloc_md_mempools(struct dm_table *t)
return -EINVAL;
}
- t->mempools = dm_alloc_md_mempools(type);
+ t->mempools = dm_alloc_md_mempools(type, t->integrity_supported);
if (!t->mempools)
return -ENOMEM;
@@ -935,8 +936,10 @@ static int dm_table_prealloc_integrity(struct dm_table *t, struct mapped_device
struct dm_dev_internal *dd;
list_for_each_entry(dd, devices, list)
- if (bdev_get_integrity(dd->dm_dev.bdev))
+ if (bdev_get_integrity(dd->dm_dev.bdev)) {
+ t->integrity_supported = 1;
return blk_integrity_register(dm_disk(md), NULL);
+ }
return 0;
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index eaa3af0..b55ef95 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2643,9 +2643,10 @@ int dm_noflush_suspending(struct dm_target *ti)
}
EXPORT_SYMBOL_GPL(dm_noflush_suspending);
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
+struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned int integrity)
{
struct dm_md_mempools *pools = kmalloc(sizeof(*pools), GFP_KERNEL);
+ unsigned int pool_size = (type == DM_TYPE_BIO_BASED) ? 16 : MIN_IOS;
if (!pools)
return NULL;
@@ -2662,11 +2663,13 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
if (!pools->tio_pool)
goto free_io_pool_and_out;
- pools->bs = (type == DM_TYPE_BIO_BASED) ?
- bioset_create(16, 0) : bioset_create(MIN_IOS, 0);
+ pools->bs = bioset_create(pool_size, 0);
if (!pools->bs)
goto free_tio_pool_and_out;
+ if (integrity && bioset_integrity_create(pools->bs, pool_size))
+ goto free_tio_pool_and_out;
+
return pools;
free_tio_pool_and_out:
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 0c2dd5f..1aaf167 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -149,7 +149,7 @@ void dm_kcopyd_exit(void);
/*
* Mempool operations
*/
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type);
+struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity);
void dm_free_md_mempools(struct dm_md_mempools *pools);
#endif
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 818313e..f11e0bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1804,8 +1804,12 @@ int md_integrity_register(mddev_t *mddev)
mdname(mddev));
return -EINVAL;
}
- printk(KERN_NOTICE "md: data integrity on %s enabled\n",
- mdname(mddev));
+ printk(KERN_NOTICE "md: data integrity enabled on %s\n", mdname(mddev));
+ if (bioset_integrity_create(mddev->bio_set, BIO_POOL_SIZE)) {
+ printk(KERN_ERR "md: failed to create integrity pool for %s\n",
+ mdname(mddev));
+ return -EINVAL;
+ }
return 0;
}
EXPORT_SYMBOL(md_integrity_register);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index e49cce2..9c5e6b2 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -761,6 +761,9 @@ int bioset_integrity_create(struct bio_set *bs, int pool_size)
{
unsigned int max_slab = vecs_to_idx(BIO_MAX_PAGES);
+ if (bs->bio_integrity_pool)
+ return 0;
+
bs->bio_integrity_pool =
mempool_create_slab_pool(pool_size, bip_slab[max_slab].slab);
diff --git a/fs/bio.c b/fs/bio.c
index 4bd454f..06aebf9 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1636,9 +1636,6 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
if (!bs->bio_pool)
goto bad;
- if (bioset_integrity_create(bs, pool_size))
- goto bad;
-
if (!biovec_create_pools(bs, pool_size))
return bs;
@@ -1684,6 +1681,9 @@ static int __init init_bio(void)
if (!fs_bio_set)
panic("bio: can't allocate bios\n");
+ if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
+ panic("bio: can't create integrity pool\n");
+
bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
sizeof(struct bio_pair));
if (!bio_split_pool)
* Re: mirrored device with thousand of mapping table entries
2011-03-08 6:51 ` Martin K. Petersen
@ 2011-03-08 17:13 ` Mike Snitzer
2011-03-10 16:11 ` Martin K. Petersen
0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2011-03-08 17:13 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Jens Axboe, device-mapper development, Martin K. Petersen,
Zdenek Kabelac
On Tue, Mar 08 2011 at 1:51am -0500,
Martin K. Petersen <mkp@mkp.net> wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike> Thanks for sorting this out. Can you post a patch that has a
> Mike> proper header with your Signed-off-by? Also including Zdenek's
> Mike> Tested-by, and my:
>
> I contemplated a bit and decided I liked the following approach
> better. What do you think?
>
>
> block: Require subsystems to explicitly allocate bio_set integrity mempool
>
> MD and DM create a new bio_set for every metadevice. Each bio_set has an
> integrity mempool attached regardless of whether the metadevice is
> capable of passing integrity metadata. This is a waste of memory.
>
> Instead we defer the allocation decision to MD and DM since we know at
> metadevice creation time whether integrity passthrough is needed or not.
>
> Automatic integrity mempool allocation can then be removed from
> bioset_create() and we make an explicit integrity allocation for the
> fs_bio_set.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
In general this approach looks fine too. It doesn't address the concern
I had yesterday about the blk_integrity block_size relative to DM (load
vs resume), but I'd imagine you'd like to sort that out in a separate
patch. I did find a few DM issues below, though. Provided those are
sorted out:
Acked-by: Mike Snitzer <snitzer@redhat.com>
(I'll defer to Jens and Neil for the block and MD bits)
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index eaa3af0..b55ef95 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2643,9 +2643,10 @@ int dm_noflush_suspending(struct dm_target *ti)
> }
> EXPORT_SYMBOL_GPL(dm_noflush_suspending);
>
> -struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
> +struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned int integrity)
> {
Any reason you didn't just use 'unsigned integrity' like you did below
in the dm.h prototype?
> struct dm_md_mempools *pools = kmalloc(sizeof(*pools), GFP_KERNEL);
> + unsigned int pool_size = (type == DM_TYPE_BIO_BASED) ? 16 : MIN_IOS;
>
> if (!pools)
> return NULL;
> @@ -2662,11 +2663,13 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
> if (!pools->tio_pool)
> goto free_io_pool_and_out;
>
> - pools->bs = (type == DM_TYPE_BIO_BASED) ?
> - bioset_create(16, 0) : bioset_create(MIN_IOS, 0);
> + pools->bs = bioset_create(pool_size, 0);
> if (!pools->bs)
> goto free_tio_pool_and_out;
>
> + if (integrity && bioset_integrity_create(pools->bs, pool_size))
> + goto free_tio_pool_and_out;
> +
> return pools;
>
> free_tio_pool_and_out:
Don't you need to bioset_free(pools->bs) if bioset_integrity_create()
fails? So you'd need a new 'free_bioset_and_out' label. Also, it looks
like there is some extra whitespace on the goto?
> diff --git a/drivers/md/dm.h b/drivers/md/dm.h
> index 0c2dd5f..1aaf167 100644
> --- a/drivers/md/dm.h
> +++ b/drivers/md/dm.h
> @@ -149,7 +149,7 @@ void dm_kcopyd_exit(void);
> /*
> * Mempool operations
> */
> -struct dm_md_mempools *dm_alloc_md_mempools(unsigned type);
> +struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity);
> void dm_free_md_mempools(struct dm_md_mempools *pools);
>
> #endif
* Re: mirrored device with thousand of mapping table entries
2011-03-08 17:13 ` Mike Snitzer
@ 2011-03-10 16:11 ` Martin K. Petersen
2011-03-11 16:53 ` [PATCH] block: Require subsystems to explicitly allocate bio_set integrity mempool (was: Re: mirrored device with thousand of mapping table entries) Mike Snitzer
0 siblings, 1 reply; 23+ messages in thread
From: Martin K. Petersen @ 2011-03-10 16:11 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, device-mapper development, Martin K. Petersen,
Zdenek Kabelac
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> In general this approach looks fine too. Doesn't address the
Mike> concern I had yesterday about the blk_integrity block_size
Mike> relative to DM (load vs resume) but I'd imagine you'd like to sort
Mike> that out in a separate patch.
Yeah, that'll come as part of my DIX1.1 patch set.
I believe I've addressed all your issues below.
block: Require subsystems to explicitly allocate bio_set integrity mempool
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.
Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.
Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 38e4eb1..4e8e314 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -55,6 +55,7 @@ struct dm_table {
struct dm_target *targets;
unsigned discards_supported:1;
+ unsigned integrity_supported:1;
/*
* Indicates the rw permissions for the new logical
@@ -859,7 +860,7 @@ int dm_table_alloc_md_mempools(struct dm_table *t)
return -EINVAL;
}
- t->mempools = dm_alloc_md_mempools(type);
+ t->mempools = dm_alloc_md_mempools(type, t->integrity_supported);
if (!t->mempools)
return -ENOMEM;
@@ -935,8 +936,10 @@ static int dm_table_prealloc_integrity(struct dm_table *t, struct mapped_device
struct dm_dev_internal *dd;
list_for_each_entry(dd, devices, list)
- if (bdev_get_integrity(dd->dm_dev.bdev))
+ if (bdev_get_integrity(dd->dm_dev.bdev)) {
+ t->integrity_supported = 1;
return blk_integrity_register(dm_disk(md), NULL);
+ }
return 0;
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index eaa3af0..105963d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2643,9 +2643,10 @@ int dm_noflush_suspending(struct dm_target *ti)
}
EXPORT_SYMBOL_GPL(dm_noflush_suspending);
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
+struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
{
struct dm_md_mempools *pools = kmalloc(sizeof(*pools), GFP_KERNEL);
+ unsigned int pool_size = (type == DM_TYPE_BIO_BASED) ? 16 : MIN_IOS;
if (!pools)
return NULL;
@@ -2662,13 +2663,18 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
if (!pools->tio_pool)
goto free_io_pool_and_out;
- pools->bs = (type == DM_TYPE_BIO_BASED) ?
- bioset_create(16, 0) : bioset_create(MIN_IOS, 0);
+ pools->bs = bioset_create(pool_size, 0);
if (!pools->bs)
goto free_tio_pool_and_out;
+ if (integrity && bioset_integrity_create(pools->bs, pool_size))
+ goto free_bioset_and_out;
+
return pools;
+free_bioset_and_out:
+ bioset_free(pools->bs);
+
free_tio_pool_and_out:
mempool_destroy(pools->tio_pool);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 0c2dd5f..1aaf167 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -149,7 +149,7 @@ void dm_kcopyd_exit(void);
/*
* Mempool operations
*/
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type);
+struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity);
void dm_free_md_mempools(struct dm_md_mempools *pools);
#endif
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 0ed7f6b..b317544 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -227,8 +227,7 @@ static int linear_run (mddev_t *mddev)
mddev->queue->unplug_fn = linear_unplug;
mddev->queue->backing_dev_info.congested_fn = linear_congested;
mddev->queue->backing_dev_info.congested_data = mddev;
- md_integrity_register(mddev);
- return 0;
+ return md_integrity_register(mddev);
}
static void free_conf(struct rcu_head *head)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 818313e..f11e0bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1804,8 +1804,12 @@ int md_integrity_register(mddev_t *mddev)
mdname(mddev));
return -EINVAL;
}
- printk(KERN_NOTICE "md: data integrity on %s enabled\n",
- mdname(mddev));
+ printk(KERN_NOTICE "md: data integrity enabled on %s\n", mdname(mddev));
+ if (bioset_integrity_create(mddev->bio_set, BIO_POOL_SIZE)) {
+ printk(KERN_ERR "md: failed to create integrity pool for %s\n",
+ mdname(mddev));
+ return -EINVAL;
+ }
return 0;
}
EXPORT_SYMBOL(md_integrity_register);
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 3a62d44..54b8d7c 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -345,7 +345,7 @@ static int multipath_remove_disk(mddev_t *mddev, int number)
p->rdev = rdev;
goto abort;
}
- md_integrity_register(mddev);
+ err = md_integrity_register(mddev);
}
abort:
@@ -520,7 +520,10 @@ static int multipath_run (mddev_t *mddev)
mddev->queue->unplug_fn = multipath_unplug;
mddev->queue->backing_dev_info.congested_fn = multipath_congested;
mddev->queue->backing_dev_info.congested_data = mddev;
- md_integrity_register(mddev);
+
+ if (md_integrity_register(mddev))
+ goto out_free_conf;
+
return 0;
out_free_conf:
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c0ac457..17c001c 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -395,8 +395,7 @@ static int raid0_run(mddev_t *mddev)
blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
dump_zones(mddev);
- md_integrity_register(mddev);
- return 0;
+ return md_integrity_register(mddev);
}
static int raid0_stop(mddev_t *mddev)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 06cd712..1cca285 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1178,7 +1178,7 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
p->rdev = rdev;
goto abort;
}
- md_integrity_register(mddev);
+ err = md_integrity_register(mddev);
}
abort:
@@ -2069,8 +2069,7 @@ static int run(mddev_t *mddev)
mddev->queue->unplug_fn = raid1_unplug;
mddev->queue->backing_dev_info.congested_fn = raid1_congested;
mddev->queue->backing_dev_info.congested_data = mddev;
- md_integrity_register(mddev);
- return 0;
+ return md_integrity_register(mddev);
}
static int stop(mddev_t *mddev)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 747d061..17a2891 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1233,7 +1233,7 @@ static int raid10_remove_disk(mddev_t *mddev, int number)
p->rdev = rdev;
goto abort;
}
- md_integrity_register(mddev);
+ err = md_integrity_register(mddev);
}
abort:
@@ -2395,7 +2395,10 @@ static int run(mddev_t *mddev)
if (conf->near_copies < conf->raid_disks)
blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec);
- md_integrity_register(mddev);
+
+ if (md_integrity_register(mddev))
+ goto out_free_conf;
+
return 0;
out_free_conf:
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index e49cce2..9c5e6b2 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -761,6 +761,9 @@ int bioset_integrity_create(struct bio_set *bs, int pool_size)
{
unsigned int max_slab = vecs_to_idx(BIO_MAX_PAGES);
+ if (bs->bio_integrity_pool)
+ return 0;
+
bs->bio_integrity_pool =
mempool_create_slab_pool(pool_size, bip_slab[max_slab].slab);
diff --git a/fs/bio.c b/fs/bio.c
index 5694b75..85e2eab 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1636,9 +1636,6 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
if (!bs->bio_pool)
goto bad;
- if (bioset_integrity_create(bs, pool_size))
- goto bad;
-
if (!biovec_create_pools(bs, pool_size))
return bs;
@@ -1682,6 +1679,9 @@ static int __init init_bio(void)
if (!fs_bio_set)
panic("bio: can't allocate bios\n");
+ if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
+ panic("bio: can't create integrity pool\n");
+
bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
sizeof(struct bio_pair));
if (!bio_split_pool)
* Re: [PATCH] block: Require subsystems to explicitly allocate bio_set integrity mempool (was: Re: mirrored device with thousand of mapping table entries)
2011-03-10 16:11 ` Martin K. Petersen
@ 2011-03-11 16:53 ` Mike Snitzer
2011-03-11 17:11 ` [PATCH] block: Require subsystems to explicitly allocate bio_set integrity mempool Martin K. Petersen
0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2011-03-11 16:53 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Jens Axboe, device-mapper development, Martin K. Petersen,
Zdenek Kabelac
On Thu, Mar 10 2011 at 11:11am -0500,
Martin K. Petersen <mkp@mkp.net> wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike> In general this approach looks fine too. Doesn't address the
> Mike> concern I had yesterday about the blk_integrity block_size
> Mike> relative to DM (load vs resume) but I'd imagine you'd like to sort
> Mike> that out in a separate patch.
>
> Yeah, that'll come as part of my DIX1.1 patch set.
It seems my concern should be fixed independently of a larger update
patchset. Or is DIX not meaningful for "stable" kernels without your
full DIX1.1 update?
> I believe I've addressed all your issues below.
The block and DM changes look good. I haven't put time into the MD
changes yet (cc'ing Neil).
Thanks,
Mike
> block: Require subsystems to explicitly allocate bio_set integrity mempool
>
> MD and DM create a new bio_set for every metadevice. Each bio_set has an
> integrity mempool attached regardless of whether the metadevice is
> capable of passing integrity metadata. This is a waste of memory.
>
> Instead we defer the allocation decision to MD and DM since we know at
> metadevice creation time whether integrity passthrough is needed or not.
>
> Automatic integrity mempool allocation can then be removed from
> bioset_create() and we make an explicit integrity allocation for the
> fs_bio_set.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
> Acked-by: Mike Snitzer <snitzer@redhat.com>
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 38e4eb1..4e8e314 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -55,6 +55,7 @@ struct dm_table {
> struct dm_target *targets;
>
> unsigned discards_supported:1;
> + unsigned integrity_supported:1;
>
> /*
> * Indicates the rw permissions for the new logical
> @@ -859,7 +860,7 @@ int dm_table_alloc_md_mempools(struct dm_table *t)
> return -EINVAL;
> }
>
> - t->mempools = dm_alloc_md_mempools(type);
> + t->mempools = dm_alloc_md_mempools(type, t->integrity_supported);
> if (!t->mempools)
> return -ENOMEM;
>
> @@ -935,8 +936,10 @@ static int dm_table_prealloc_integrity(struct dm_table *t, struct mapped_device
> struct dm_dev_internal *dd;
>
> list_for_each_entry(dd, devices, list)
> - if (bdev_get_integrity(dd->dm_dev.bdev))
> + if (bdev_get_integrity(dd->dm_dev.bdev)) {
> + t->integrity_supported = 1;
> return blk_integrity_register(dm_disk(md), NULL);
> + }
>
> return 0;
> }
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index eaa3af0..105963d 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2643,9 +2643,10 @@ int dm_noflush_suspending(struct dm_target *ti)
> }
> EXPORT_SYMBOL_GPL(dm_noflush_suspending);
>
> -struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
> +struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
> {
> struct dm_md_mempools *pools = kmalloc(sizeof(*pools), GFP_KERNEL);
> + unsigned int pool_size = (type == DM_TYPE_BIO_BASED) ? 16 : MIN_IOS;
>
> if (!pools)
> return NULL;
> @@ -2662,13 +2663,18 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type)
> if (!pools->tio_pool)
> goto free_io_pool_and_out;
>
> - pools->bs = (type == DM_TYPE_BIO_BASED) ?
> - bioset_create(16, 0) : bioset_create(MIN_IOS, 0);
> + pools->bs = bioset_create(pool_size, 0);
> if (!pools->bs)
> goto free_tio_pool_and_out;
>
> + if (integrity && bioset_integrity_create(pools->bs, pool_size))
> + goto free_bioset_and_out;
> +
> return pools;
>
> +free_bioset_and_out:
> + bioset_free(pools->bs);
> +
> free_tio_pool_and_out:
> mempool_destroy(pools->tio_pool);
>
> diff --git a/drivers/md/dm.h b/drivers/md/dm.h
> index 0c2dd5f..1aaf167 100644
> --- a/drivers/md/dm.h
> +++ b/drivers/md/dm.h
> @@ -149,7 +149,7 @@ void dm_kcopyd_exit(void);
> /*
> * Mempool operations
> */
> -struct dm_md_mempools *dm_alloc_md_mempools(unsigned type);
> +struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity);
> void dm_free_md_mempools(struct dm_md_mempools *pools);
>
> #endif
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index 0ed7f6b..b317544 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -227,8 +227,7 @@ static int linear_run (mddev_t *mddev)
> mddev->queue->unplug_fn = linear_unplug;
> mddev->queue->backing_dev_info.congested_fn = linear_congested;
> mddev->queue->backing_dev_info.congested_data = mddev;
> - md_integrity_register(mddev);
> - return 0;
> + return md_integrity_register(mddev);
> }
>
> static void free_conf(struct rcu_head *head)
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 818313e..f11e0bc 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -1804,8 +1804,12 @@ int md_integrity_register(mddev_t *mddev)
> mdname(mddev));
> return -EINVAL;
> }
> - printk(KERN_NOTICE "md: data integrity on %s enabled\n",
> - mdname(mddev));
> + printk(KERN_NOTICE "md: data integrity enabled on %s\n", mdname(mddev));
> + if (bioset_integrity_create(mddev->bio_set, BIO_POOL_SIZE)) {
> + printk(KERN_ERR "md: failed to create integrity pool for %s\n",
> + mdname(mddev));
> + return -EINVAL;
> + }
> return 0;
> }
> EXPORT_SYMBOL(md_integrity_register);
> diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
> index 3a62d44..54b8d7c 100644
> --- a/drivers/md/multipath.c
> +++ b/drivers/md/multipath.c
> @@ -345,7 +345,7 @@ static int multipath_remove_disk(mddev_t *mddev, int number)
> p->rdev = rdev;
> goto abort;
> }
> - md_integrity_register(mddev);
> + err = md_integrity_register(mddev);
> }
> abort:
>
> @@ -520,7 +520,10 @@ static int multipath_run (mddev_t *mddev)
> mddev->queue->unplug_fn = multipath_unplug;
> mddev->queue->backing_dev_info.congested_fn = multipath_congested;
> mddev->queue->backing_dev_info.congested_data = mddev;
> - md_integrity_register(mddev);
> +
> + if (md_integrity_register(mddev))
> + goto out_free_conf;
> +
> return 0;
>
> out_free_conf:
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index c0ac457..17c001c 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -395,8 +395,7 @@ static int raid0_run(mddev_t *mddev)
>
> blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
> dump_zones(mddev);
> - md_integrity_register(mddev);
> - return 0;
> + return md_integrity_register(mddev);
> }
>
> static int raid0_stop(mddev_t *mddev)
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 06cd712..1cca285 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1178,7 +1178,7 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
> p->rdev = rdev;
> goto abort;
> }
> - md_integrity_register(mddev);
> + err = md_integrity_register(mddev);
> }
> abort:
>
> @@ -2069,8 +2069,7 @@ static int run(mddev_t *mddev)
> mddev->queue->unplug_fn = raid1_unplug;
> mddev->queue->backing_dev_info.congested_fn = raid1_congested;
> mddev->queue->backing_dev_info.congested_data = mddev;
> - md_integrity_register(mddev);
> - return 0;
> + return md_integrity_register(mddev);
> }
>
> static int stop(mddev_t *mddev)
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 747d061..17a2891 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1233,7 +1233,7 @@ static int raid10_remove_disk(mddev_t *mddev, int number)
> p->rdev = rdev;
> goto abort;
> }
> - md_integrity_register(mddev);
> + err = md_integrity_register(mddev);
> }
> abort:
>
> @@ -2395,7 +2395,10 @@ static int run(mddev_t *mddev)
>
> if (conf->near_copies < conf->raid_disks)
> blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec);
> - md_integrity_register(mddev);
> +
> + if (md_integrity_register(mddev))
> + goto out_free_conf;
> +
> return 0;
>
> out_free_conf:
> diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> index e49cce2..9c5e6b2 100644
> --- a/fs/bio-integrity.c
> +++ b/fs/bio-integrity.c
> @@ -761,6 +761,9 @@ int bioset_integrity_create(struct bio_set *bs, int pool_size)
> {
> unsigned int max_slab = vecs_to_idx(BIO_MAX_PAGES);
>
> + if (bs->bio_integrity_pool)
> + return 0;
> +
> bs->bio_integrity_pool =
> mempool_create_slab_pool(pool_size, bip_slab[max_slab].slab);
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 5694b75..85e2eab 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1636,9 +1636,6 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
> if (!bs->bio_pool)
> goto bad;
>
> - if (bioset_integrity_create(bs, pool_size))
> - goto bad;
> -
> if (!biovec_create_pools(bs, pool_size))
> return bs;
>
> @@ -1682,6 +1679,9 @@ static int __init init_bio(void)
> if (!fs_bio_set)
> panic("bio: can't allocate bios\n");
>
> + if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
> + panic("bio: can't create integrity pool\n");
> +
> bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
> sizeof(struct bio_pair));
> if (!bio_split_pool)
>
* Re: [PATCH] block: Require subsystems to explicitly allocate bio_set integrity mempool
2011-03-11 16:53 ` [PATCH] block: Require subsystems to explicitly allocate bio_set integrity mempool (was: Re: mirrored device with thousand of mapping table entries) Mike Snitzer
@ 2011-03-11 17:11 ` Martin K. Petersen
0 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2011-03-11 17:11 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, device-mapper development, Martin K. Petersen,
Zdenek Kabelac
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>> Yeah, that'll come as part of my DIX1.1 patch set.
Mike> Seems my concern should be fixed independent of a larger update
Mike> patchset. Unless DIX isn't meaningful for "stable" kernels
Mike> without your full DIX1.1 update?
Well, I'm not aware of any shipping non-512-byte-block DIF devices other
than scsi_debug.
And it's not good enough to just fix it in DM, since this is a hardware
parameter that needs to be queried and bubbled up. It can't be scaled
like the rest of the topology block sizes; it's wired into the hardware.
So I just don't see much point in trying to fix it now, when any ties to
the logical block size will be severed in my impending patch set.
--
Martin K. Petersen Oracle Linux Engineering
end of thread, other threads: [~2011-03-11 17:11 UTC | newest]
Thread overview: 23+ messages
2011-02-28 11:21 mirrored device with thousand of mapping table entries Eli Malul
2011-02-28 11:48 ` Alasdair G Kergon
2011-02-28 11:59 ` mirrored device with thousand of mapping table entries Eli Malul
2011-02-28 12:11 ` Alasdair G Kergon
2011-02-28 12:17 ` mirrored device with thousand of mapping table entries Eli Malul
2011-02-28 13:10 ` Alasdair G Kergon
2011-02-28 13:13 ` Eli Malul
2011-02-28 13:29 ` Alasdair G Kergon
2011-02-28 13:42 ` Eli Malul
2011-02-28 16:25 ` Eli Malul
2011-02-28 13:38 ` Zdenek Kabelac
2011-02-28 15:01 ` Martin K. Petersen
2011-03-06 20:39 ` Zdenek Kabelac
2011-03-07 2:59 ` Martin K. Petersen
2011-03-07 14:24 ` Zdenek Kabelac
2011-03-07 16:09 ` Mike Snitzer
2011-03-08 6:51 ` Martin K. Petersen
2011-03-08 17:13 ` Mike Snitzer
2011-03-10 16:11 ` Martin K. Petersen
2011-03-11 16:53 ` [PATCH] block: Require subsystems to explicitly allocate bio_set integrity mempool (was: Re: mirrored device with thousand of mapping table entries) Mike Snitzer
2011-03-11 17:11 ` [PATCH] block: Require subsystems to explicitly allocate bio_set integrity mempool Martin K. Petersen
2011-03-07 20:10 ` mirrored device with thousand of mapping table entries Mike Snitzer
2011-03-07 20:22 ` Martin K. Petersen