* [PATCH 1/5] scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings
2009-04-15 13:10 [GIT PATCH block#for-linus] block: various fixes Tejun Heo
@ 2009-04-15 13:10 ` Tejun Heo
2009-04-15 13:10 ` [PATCH 2/5] block: fix SG_IO vector request data length handling Tejun Heo
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2009-04-15 13:10 UTC (permalink / raw)
To: axboe, bharrosh, linux-kernel, fujita.tomonori; +Cc: Tejun Heo
Impact: fix not-so-critical but annoying bug
sg_miter_next() returns a zero sized mapping if there is a zero sized
sg entry in the list or at the end of each iteration.  As the users
always check the ->length field, this bug shouldn't be critical other
than causing unnecessary iterations.
Fix it.
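The fixed advance loop is easiest to see in isolation.  Below is a
minimal userspace sketch of the patched logic; struct sg_ent, struct
miter and miter_next() are illustrative stand-ins, not the real
scatterlist API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types -- illustrative only. */
struct sg_ent { size_t length; };

struct miter {
	struct sg_ent *sg;	/* current entry */
	int nents;		/* entries remaining, including current */
	size_t offset;		/* bytes consumed in current entry */
};

/* Mirror of the patched loop: keep skipping while the current entry
 * is exhausted, which covers both zero-length entries and the normal
 * end-of-entry case after a full iteration. */
static int miter_next(struct miter *m)
{
	while (m->offset == m->sg->length) {
		if (--m->nents) {
			m->sg++;	/* sg_next() in the kernel */
			m->offset = 0;
		} else {
			return 0;	/* false: iteration complete */
		}
	}
	return 1;	/* true: m->sg has data at m->offset */
}
```

With entries of length {0, 4, 0, 0, 8}, the loop now lands only on the
non-empty entries and returns false at the end instead of yielding
zero-sized mappings.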
Signed-off-by: Tejun Heo <tj@kernel.org>
---
lib/scatterlist.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index b7b449d..a295e40 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -347,9 +347,12 @@ bool sg_miter_next(struct sg_mapping_iter *miter)
sg_miter_stop(miter);
/* get to the next sg if necessary. __offset is adjusted by stop */
- if (miter->__offset == miter->__sg->length && --miter->__nents) {
- miter->__sg = sg_next(miter->__sg);
- miter->__offset = 0;
+ while (miter->__offset == miter->__sg->length) {
+ if (--miter->__nents) {
+ miter->__sg = sg_next(miter->__sg);
+ miter->__offset = 0;
+ } else
+ return false;
}
/* map the next page */
--
1.6.0.2
* [PATCH 2/5] block: fix SG_IO vector request data length handling
2009-04-15 13:10 [GIT PATCH block#for-linus] block: various fixes Tejun Heo
2009-04-15 13:10 ` [PATCH 1/5] scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings Tejun Heo
@ 2009-04-15 13:10 ` Tejun Heo
2009-04-15 13:10 ` [PATCH 3/5] block: fix queue bounce limit setting Tejun Heo
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2009-04-15 13:10 UTC (permalink / raw)
To: axboe, bharrosh, linux-kernel, fujita.tomonori; +Cc: Tejun Heo
Impact: fix SG_IO behavior such that it matches the documentation
SG_IO howto says that if ->dxfer_len and the sum of the iovec lengths
disagree, the shorter one wins.  However, the current implementation
returns -EINVAL for such cases.  Trim the iovec if it's longer than
->dxfer_len.
This patch uses the iov_*() helpers, which take struct iovec *, by
casting struct sg_iovec * to it.  sg_iovec is always identical to
iovec and this will be cleaned up further by later patches.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
block/scsi_ioctl.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 84b7f87..82a0ca2 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -290,6 +290,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
if (hdr->iovec_count) {
const int size = sizeof(struct sg_iovec) * hdr->iovec_count;
+ size_t iov_data_len;
struct sg_iovec *iov;
iov = kmalloc(size, GFP_KERNEL);
@@ -304,8 +305,18 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
goto out;
}
+ /* SG_IO howto says that the shorter of the two wins */
+ iov_data_len = iov_length((struct iovec *)iov,
+ hdr->iovec_count);
+ if (hdr->dxfer_len < iov_data_len) {
+ hdr->iovec_count = iov_shorten((struct iovec *)iov,
+ hdr->iovec_count,
+ hdr->dxfer_len);
+ iov_data_len = hdr->dxfer_len;
+ }
+
ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
- hdr->dxfer_len, GFP_KERNEL);
+ iov_data_len, GFP_KERNEL);
kfree(iov);
} else if (hdr->dxfer_len)
ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
--
1.6.0.2
* [PATCH 3/5] block: fix queue bounce limit setting
2009-04-15 13:10 [GIT PATCH block#for-linus] block: various fixes Tejun Heo
2009-04-15 13:10 ` [PATCH 1/5] scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings Tejun Heo
2009-04-15 13:10 ` [PATCH 2/5] block: fix SG_IO vector request data length handling Tejun Heo
@ 2009-04-15 13:10 ` Tejun Heo
2009-04-15 13:10 ` [PATCH 4/5] bio: fix bio_kmalloc() Tejun Heo
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2009-04-15 13:10 UTC (permalink / raw)
To: axboe, bharrosh, linux-kernel, fujita.tomonori; +Cc: Tejun Heo
Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily
All DMA address limits are expressed in terms of the last addressable
unit (byte or page) instead of one plus that.  However, when
determining bounce_gfp for 64-bit machines in blk_queue_bounce_limit(),
the specified limit is compared against 0x100000000UL to determine
whether it's below 4G, which ends up falsely setting GFP_DMA in
q->bounce_gfp.
As the DMA zone is very small on x86_64, this makes larger SG_IO
transfers very likely to trigger the OOM killer.  Fix it.  While at
it, rename the parameter to @dma_mask for clarity and convert the
comment to proper winged style.
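The off-by-one-page is easy to demonstrate in isolation.  A minimal
sketch (needs_dma_zone() and its buggy variant are illustrative names,
not kernel code): a DMA mask names the LAST addressable byte, so a
device that can reach the full 32-bit space has mask 0xffffffff, and
its pfn must be compared against the pfn of that last byte, not
against the pfn of 0x100000000, which is one past the end:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Buggy check: compares against the pfn one past the 32-bit space,
 * so a full 32-bit-capable device is wrongly flagged for GFP_DMA. */
static int needs_dma_zone_buggy(uint64_t dma_mask)
{
	uint64_t b_pfn = dma_mask >> PAGE_SHIFT;
	return b_pfn < (0x100000000ULL >> PAGE_SHIFT);
}

/* Fixed check: compare against the pfn of the last 32-bit byte. */
static int needs_dma_zone(uint64_t dma_mask)
{
	uint64_t b_pfn = dma_mask >> PAGE_SHIFT;
	return b_pfn < (0xffffffffULL >> PAGE_SHIFT);
}
```

For a device with mask 0xffffffff, the buggy check returns true (so
GFP_DMA is set and bouncing goes through the tiny DMA zone), while the
fixed check correctly returns false.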
Signed-off-by: Tejun Heo <tj@kernel.org>
---
block/blk-settings.c | 20 +++++++++++---------
1 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 69c42ad..57af728 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -156,26 +156,28 @@ EXPORT_SYMBOL(blk_queue_make_request);
/**
* blk_queue_bounce_limit - set bounce buffer limit for queue
- * @q: the request queue for the device
- * @dma_addr: bus address limit
+ * @q: the request queue for the device
+ * @dma_mask: the maximum address the device can handle
*
* Description:
* Different hardware can have different requirements as to what pages
* it can do I/O directly to. A low level driver can call
* blk_queue_bounce_limit to have lower memory pages allocated as bounce
- * buffers for doing I/O to pages residing above @dma_addr.
+ * buffers for doing I/O to pages residing above @dma_mask.
**/
-void blk_queue_bounce_limit(struct request_queue *q, u64 dma_addr)
+void blk_queue_bounce_limit(struct request_queue *q, u64 dma_mask)
{
- unsigned long b_pfn = dma_addr >> PAGE_SHIFT;
+ unsigned long b_pfn = dma_mask >> PAGE_SHIFT;
int dma = 0;
q->bounce_gfp = GFP_NOIO;
#if BITS_PER_LONG == 64
- /* Assume anything <= 4GB can be handled by IOMMU.
- Actually some IOMMUs can handle everything, but I don't
- know of a way to test this here. */
- if (b_pfn < (min_t(u64, 0x100000000UL, BLK_BOUNCE_HIGH) >> PAGE_SHIFT))
+ /*
+ * Assume anything <= 4GB can be handled by IOMMU. Actually
+ * some IOMMUs can handle everything, but I don't know of a
+ * way to test this here.
+ */
+ if (b_pfn < (min_t(u64, 0xffffffffUL, BLK_BOUNCE_HIGH) >> PAGE_SHIFT))
dma = 1;
q->bounce_pfn = max_low_pfn;
#else
--
1.6.0.2
* [PATCH 4/5] bio: fix bio_kmalloc()
2009-04-15 13:10 [GIT PATCH block#for-linus] block: various fixes Tejun Heo
` (2 preceding siblings ...)
2009-04-15 13:10 ` [PATCH 3/5] block: fix queue bounce limit setting Tejun Heo
@ 2009-04-15 13:10 ` Tejun Heo
2009-04-15 13:10 ` [PATCH 5/5] bio: use bio_kmalloc() in copy/map functions Tejun Heo
2009-04-15 16:15 ` [GIT PATCH block#for-linus] block: various fixes Jens Axboe
5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2009-04-15 13:10 UTC (permalink / raw)
To: axboe, bharrosh, linux-kernel, fujita.tomonori; +Cc: Tejun Heo, Jens Axboe
Impact: fix bio_kmalloc() and its destruction path
bio_kmalloc() was broken in two ways.
* bvec_alloc_bs() first allocates the bvec using kmalloc() and then
ignores it and allocates again as for non-kmalloc bvecs.
* bio_kmalloc_destructor() didn't check for and free bio integrity
data.
This patch fixes the above problems.  The kmalloc path is separated
out from bio_alloc_bioset() and allocates the requested number of
bvecs as inline bvecs.
* bio_alloc_bioset() no longer takes NULL @bs. None other than
bio_kmalloc() used it and outside users can't know how it was
allocated anyway.
* Define and use BIO_POOL_NONE so that the pool index check in
bvec_free_bs() triggers if an inline or kmalloc'ed bvec gets
there.
* Relocate the destructors above their respective allocation functions
so that how they're used is clearer.
Jens Axboe suggested allocating bvecs inline.
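As a rough model of the resulting layout, a single allocation can
carry both the bio and its vectors via a trailing flexible array, so
the destructor only needs one kfree().  mini_bio and friends below are
illustrative stand-ins, not the kernel's struct bio:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for struct bio_vec. */
struct vec { void *page; unsigned int len; };

/* One allocation holds the bio header and its inline vectors. */
struct mini_bio {
	unsigned int max_vecs;
	struct vec *io_vec;		/* points at inline_vecs below */
	struct vec inline_vecs[];	/* trailing flexible array */
};

static struct mini_bio *mini_bio_kmalloc(int nr_vecs)
{
	struct mini_bio *b;

	/* single allocation for header + nr_vecs vectors */
	b = malloc(sizeof(*b) + nr_vecs * sizeof(struct vec));
	if (!b)
		return NULL;
	memset(b, 0, sizeof(*b));
	b->max_vecs = nr_vecs;
	b->io_vec = b->inline_vecs;	/* no second allocation needed */
	return b;
}
```

Freeing is then just free(b); there is no separately kmalloc'ed vector
array to track, which is what made the old destructor easy to get
wrong.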
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
---
fs/bio.c | 108 ++++++++++++++++++++++++++-------------------------
include/linux/bio.h | 1 +
2 files changed, 56 insertions(+), 53 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index e0c9e54..6c2aef6 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -175,14 +175,6 @@ struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx,
struct bio_vec *bvl;
/*
- * If 'bs' is given, lookup the pool and do the mempool alloc.
- * If not, this is a bio_kmalloc() allocation and just do a
- * kzalloc() for the exact number of vecs right away.
- */
- if (!bs)
- bvl = kmalloc(nr * sizeof(struct bio_vec), gfp_mask);
-
- /*
* see comment near bvec_array define!
*/
switch (nr) {
@@ -260,21 +252,6 @@ void bio_free(struct bio *bio, struct bio_set *bs)
mempool_free(p, bs->bio_pool);
}
-/*
- * default destructor for a bio allocated with bio_alloc_bioset()
- */
-static void bio_fs_destructor(struct bio *bio)
-{
- bio_free(bio, fs_bio_set);
-}
-
-static void bio_kmalloc_destructor(struct bio *bio)
-{
- if (bio_has_allocated_vec(bio))
- kfree(bio->bi_io_vec);
- kfree(bio);
-}
-
void bio_init(struct bio *bio)
{
memset(bio, 0, sizeof(*bio));
@@ -301,21 +278,15 @@ void bio_init(struct bio *bio)
**/
struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
{
+ unsigned long idx = BIO_POOL_NONE;
struct bio_vec *bvl = NULL;
- struct bio *bio = NULL;
- unsigned long idx = 0;
- void *p = NULL;
-
- if (bs) {
- p = mempool_alloc(bs->bio_pool, gfp_mask);
- if (!p)
- goto err;
- bio = p + bs->front_pad;
- } else {
- bio = kmalloc(sizeof(*bio), gfp_mask);
- if (!bio)
- goto err;
- }
+ struct bio *bio;
+ void *p;
+
+ p = mempool_alloc(bs->bio_pool, gfp_mask);
+ if (unlikely(!p))
+ return NULL;
+ bio = p + bs->front_pad;
bio_init(bio);
@@ -332,22 +303,33 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
nr_iovecs = bvec_nr_vecs(idx);
}
+out_set:
bio->bi_flags |= idx << BIO_POOL_OFFSET;
bio->bi_max_vecs = nr_iovecs;
-out_set:
bio->bi_io_vec = bvl;
-
return bio;
err_free:
- if (bs)
- mempool_free(p, bs->bio_pool);
- else
- kfree(bio);
-err:
+ mempool_free(p, bs->bio_pool);
return NULL;
}
+static void bio_fs_destructor(struct bio *bio)
+{
+ bio_free(bio, fs_bio_set);
+}
+
+/**
+ * bio_alloc - allocate a new bio, memory pool backed
+ * @gfp_mask: allocation mask to use
+ * @nr_iovecs: number of iovecs
+ *
+ * Allocate a new bio with @nr_iovecs bvecs. If @gfp_mask
+ * contains __GFP_WAIT, the allocation is guaranteed to succeed.
+ *
+ * RETURNS:
+ * Pointer to new bio on success, NULL on failure.
+ */
struct bio *bio_alloc(gfp_t gfp_mask, int nr_iovecs)
{
struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
@@ -358,19 +340,39 @@ struct bio *bio_alloc(gfp_t gfp_mask, int nr_iovecs)
return bio;
}
-/*
- * Like bio_alloc(), but doesn't use a mempool backing. This means that
- * it CAN fail, but while bio_alloc() can only be used for allocations
- * that have a short (finite) life span, bio_kmalloc() should be used
- * for more permanent bio allocations (like allocating some bio's for
- * initalization or setup purposes).
+static void bio_kmalloc_destructor(struct bio *bio)
+{
+ if (bio_integrity(bio))
+ bio_integrity_free(bio);
+ kfree(bio);
+}
+
+/**
+ * bio_kmalloc - allocate a new bio
+ * @gfp_mask: allocation mask to use
+ * @nr_iovecs: number of iovecs
+ *
+ * Similar to bio_alloc() but uses regular kmalloc for allocation
+ * and can fail unless __GFP_NOFAIL is set in @gfp_mask. This is
+ * useful for more permanent or over-sized bio allocations.
+ *
+ * RETURNS:
+ * Pointer to new bio on success, NULL on failure.
*/
struct bio *bio_kmalloc(gfp_t gfp_mask, int nr_iovecs)
{
- struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, NULL);
+ struct bio *bio;
- if (bio)
- bio->bi_destructor = bio_kmalloc_destructor;
+ bio = kmalloc(sizeof(struct bio) + nr_iovecs * sizeof(struct bio_vec),
+ gfp_mask);
+ if (unlikely(!bio))
+ return NULL;
+
+ bio_init(bio);
+ bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
+ bio->bi_max_vecs = nr_iovecs;
+ bio->bi_io_vec = bio->bi_inline_vecs;
+ bio->bi_destructor = bio_kmalloc_destructor;
return bio;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b89cf2d..7b214fd 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -132,6 +132,7 @@ struct bio {
* top 4 bits of bio flags indicate the pool this bio came from
*/
#define BIO_POOL_BITS (4)
+#define BIO_POOL_NONE ((1UL << BIO_POOL_BITS) - 1)
#define BIO_POOL_OFFSET (BITS_PER_LONG - BIO_POOL_BITS)
#define BIO_POOL_MASK (1UL << BIO_POOL_OFFSET)
#define BIO_POOL_IDX(bio) ((bio)->bi_flags >> BIO_POOL_OFFSET)
--
1.6.0.2
* [PATCH 5/5] bio: use bio_kmalloc() in copy/map functions
2009-04-15 13:10 [GIT PATCH block#for-linus] block: various fixes Tejun Heo
` (3 preceding siblings ...)
2009-04-15 13:10 ` [PATCH 4/5] bio: fix bio_kmalloc() Tejun Heo
@ 2009-04-15 13:10 ` Tejun Heo
2009-04-15 16:15 ` [GIT PATCH block#for-linus] block: various fixes Jens Axboe
5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2009-04-15 13:10 UTC (permalink / raw)
To: axboe, bharrosh, linux-kernel, fujita.tomonori; +Cc: Tejun Heo
Impact: remove possible deadlock condition
There is no reason to use mempool-backed allocation for the map
functions.  Also, because kern mapping is used inside LLDs (e.g. for
EH), mempool-backed allocation can lead to deadlock under extreme
conditions (the mempool already consumed by the time a request reaches
EH, with requests blocked on EH).
Switch copy/map functions to bio_kmalloc().
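The deadlock condition can be modeled with a toy fixed-size pool
(struct pool and pool_tryalloc() are illustrative, not the kernel's
mempool API): once the reserves are consumed by in-flight requests, an
allocation from the EH path would wait forever, because the requests
that would refill the pool are themselves blocked on EH completing.

```c
#include <assert.h>

/* Toy fixed-size pool standing in for a mempool's reserve. */
struct pool { int free; };

/* Non-blocking probe: a real mempool allocation with __GFP_WAIT would
 * sleep here instead of returning 0 -- and with every element held by
 * requests waiting on EH, that sleep never ends. */
static int pool_tryalloc(struct pool *p)
{
	if (p->free == 0)
		return 0;	/* exhausted: mempool would block */
	p->free--;
	return 1;
}
```

With a reserve of two and two requests in flight, a third allocation
from EH finds the pool empty; a plain kmalloc-backed bio_kmalloc() has
no such shared reserve to exhaust.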
Signed-off-by: Tejun Heo <tj@kernel.org>
---
fs/bio.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/bio.c b/fs/bio.c
index 6c2aef6..eba1804 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -816,7 +816,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
- bio = bio_alloc(gfp_mask, nr_pages);
+ bio = bio_kmalloc(gfp_mask, nr_pages);
if (!bio)
goto out_bmd;
@@ -940,7 +940,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
if (!nr_pages)
return ERR_PTR(-EINVAL);
- bio = bio_alloc(gfp_mask, nr_pages);
+ bio = bio_kmalloc(gfp_mask, nr_pages);
if (!bio)
return ERR_PTR(-ENOMEM);
@@ -1124,7 +1124,7 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
int offset, i;
struct bio *bio;
- bio = bio_alloc(gfp_mask, nr_pages);
+ bio = bio_kmalloc(gfp_mask, nr_pages);
if (!bio)
return ERR_PTR(-ENOMEM);
--
1.6.0.2
* Re: [GIT PATCH block#for-linus] block: various fixes
2009-04-15 13:10 [GIT PATCH block#for-linus] block: various fixes Tejun Heo
` (4 preceding siblings ...)
2009-04-15 13:10 ` [PATCH 5/5] bio: use bio_kmalloc() in copy/map functions Tejun Heo
@ 2009-04-15 16:15 ` Jens Axboe
5 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2009-04-15 16:15 UTC (permalink / raw)
To: Tejun Heo; +Cc: bharrosh, linux-kernel, fujita.tomonori
On Wed, Apr 15 2009, Tejun Heo wrote:
> Hello,
>
> This patchset contains the following five block related fix patches.
>
> 0001-scatterlist-make-sure-sg_miter_next-doesn-t-retur.patch
> 0002-block-fix-SG_IO-vector-request-data-length-handling.patch
> 0003-block-fix-queue-bounce-limit-setting.patch
> 0004-bio-fix-bio_kmalloc.patch
> 0005-bio-use-bio_kmalloc-in-copy-map-functions.patch
>
> 0001-0003 are unchanged re-posting of the first three patches of
> blk-map-related-fixes[1]. The fourth patch of the previous patchset
> is dropped and 0004 reimplements bio_kmalloc() as per Jens'
> suggestion. 0005 is new and uses bio_kmalloc() in the map functions.
>
> This patchset is also available in the following git tree.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blk-fixes
>
> and contains the following changes.
Thanks a lot, Tejun! I'll add these and include them in the next 2.6.30
push.
--
Jens Axboe