* [PATCH 0/5] convert sg to use the block layer
@ 2008-08-26 2:10 FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-26 2:10 UTC
To: linux-scsi; +Cc: jens.axboe, dougg, michaelc, James.Bottomley, fujita.tomonori
This patchset converts sg to use the block layer functions. That is,
sg doesn't use scsi_execute_async() any more. This is a part of the
overdue task to remove scsi_req_map_sg.
I tested this patchset with sg v3 and the old interface (struct
sg_header) via SG_IO (v3) and the vfs API.
Doug,
1. I don't demote the sg driver's GFP_ATOMIC to GFP_KERNEL. sg always
uses GFP_ATOMIC as before.
2. I don't remove GFP_DMA allocation. As before, sg allocates reserved
pages with GFP_DMA (sfp->low_dma case).
3. I keep the reserved buffer per struct sg_fd as before.
4. I use high-order page allocation for the reserved buffer as before,
so sg still works well with HBAs that limit the number of sg
entries.
5. I think that you were concerned about the overhead of the block layer
functions. But if you look at scsi_execute_async(), which sg uses now,
you can see that scsi_execute_async() uses the block layer functions
internally. So the current sg already incurs that overhead (if such
overhead exists).
Jens,
I keep the block API changes to a minimum (I might need more changes
for st/osst but I'd like to progress step by step). I did only two
things.
1. I add a gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov.
2. I introduce struct rq_map_data, which holds pages. sg puts
pre-allocated pages into it and passes it down to bio_copy_user_iov().
The current users of bio_copy_user_iov simply pass NULL. blk_rq_map_user
and blk_rq_map_user_iov take a pointer to struct rq_map_data and, in
the end, bio_copy_user_iov gets it.
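To make the sizing rule behind struct rq_map_data concrete, here is a
small userspace sketch (not kernel code). It assumes 4 KiB pages and uses
void * in place of struct page *; every model_* name and
struct rq_map_data_model are hypothetical stand-ins. It only shows the
arithmetic a caller has to satisfy: each entry covers
1 << (PAGE_SHIFT + page_order) bytes, and a transfer needs enough
entries to cover its length.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages; the kernel's PAGE_SHIFT is per-architecture. */
#define MODEL_PAGE_SHIFT 12

/* Mirrors the three fields of struct rq_map_data in this patchset,
 * with void * standing in for struct page *. */
struct rq_map_data_model {
	void **pages;
	int page_order;
	int nr_entries;
};

/* Bytes covered by one entry: 1 << (PAGE_SHIFT + page_order). */
static size_t model_entry_bytes(const struct rq_map_data_model *md)
{
	assert(md && md->page_order >= 0);
	return (size_t)1 << (MODEL_PAGE_SHIFT + md->page_order);
}

/* Entries consumed by a transfer of len bytes (rounded up). */
static int model_entries_needed(const struct rq_map_data_model *md,
				size_t len)
{
	size_t chunk = model_entry_bytes(md);

	return (int)((len + chunk - 1) / chunk);
}

/* Nonzero if the pre-allocated entries can back len bytes. */
static int model_can_back(const struct rq_map_data_model *md, size_t len)
{
	return model_entries_needed(md, len) <= md->nr_entries;
}
```

With page_order = 2 and nr_entries = 4, the model accepts transfers up
to 64 KiB. One byte more needs a fifth entry, which is the case where
the patched bio_copy_user_iov() returns -ENOMEM.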
This patchset is against the for-linus branch in Jens' tree + the two
patches for 2.6.27:
http://marc.info/?l=linux-kernel&m=121964251911717&w=2
http://marc.info/?l=linux-kernel&m=121964241911611&w=2
After Jens rebases the for-2.6.28 branch, I'll update this patchset
too.
This patchset is also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
* [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
2008-08-26 2:10 [PATCH 0/5] convert sg to use the block layer FUJITA Tomonori
@ 2008-08-26 2:10 ` FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages FUJITA Tomonori
2008-08-26 16:35 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Christoph Hellwig
2008-08-26 7:56 ` [PATCH 0/5] convert sg to use the block layer Jens Axboe
2008-08-27 20:14 ` Douglas Gilbert
2 siblings, 2 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-26 2:10 UTC
To: linux-scsi; +Cc: jens.axboe, dougg, michaelc, James.Bottomley, fujita.tomonori
Currently, blk_rq_map_user and blk_rq_map_user_iov always allocate
with GFP_KERNEL.
This adds a gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
so sg can use them (sg always does GFP_ATOMIC allocation).
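The effect on the bounce-page allocation can be sketched in userspace as
follows. This is a model only: the MODEL_GFP_* bit values, model_gfp_t,
struct model_queue and the model_* functions are hypothetical and do not
match the kernel's gfp_t encoding. It shows just that the hard-coded
flag is replaced by the caller's mask, which is still ORed with the
queue's bounce_gfp.

```c
#include <assert.h>

/* Hypothetical flag bits for this model; the kernel's values differ. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_ATOMIC 0x1u
#define MODEL_GFP_KERNEL 0x2u
#define MODEL_GFP_DMA    0x4u

/* Stand-in for the one request_queue field that matters here. */
struct model_queue {
	model_gfp_t bounce_gfp;
};

/* Flags seen by the most recent model allocation. */
static model_gfp_t model_last_alloc_flags;

/* Stand-in for alloc_page(): records the flags, returns a dummy page. */
static void *model_alloc_page(model_gfp_t flags)
{
	static char page[4096];

	model_last_alloc_flags = flags;
	return page;
}

/* Before the patch: the allocation context is fixed. */
static void *model_bounce_page_old(const struct model_queue *q)
{
	assert(q);
	return model_alloc_page(q->bounce_gfp | MODEL_GFP_KERNEL);
}

/* After the patch: the caller chooses the allocation context. */
static void *model_bounce_page_new(const struct model_queue *q,
				   model_gfp_t gfp_mask)
{
	assert(q);
	return model_alloc_page(q->bounce_gfp | gfp_mask);
}
```

Existing callers keep their behavior by passing GFP_KERNEL explicitly,
as the diff below does for bsg, scsi_ioctl, cdrom and scsi_tgt_lib.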
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
block/blk-map.c | 20 ++++++++++++--------
block/bsg.c | 5 +++--
block/scsi_ioctl.c | 5 +++--
drivers/cdrom/cdrom.c | 2 +-
drivers/scsi/scsi_tgt_lib.c | 2 +-
fs/bio.c | 27 ++++++++++++++++-----------
include/linux/bio.h | 9 +++++----
include/linux/blkdev.h | 5 +++--
8 files changed, 44 insertions(+), 31 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index af37e4a..c363b45 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -41,7 +41,8 @@ static int __blk_rq_unmap_user(struct bio *bio)
}
static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned int len)
+ void __user *ubuf, unsigned int len,
+ gfp_t gfp_mask)
{
unsigned long uaddr;
unsigned int alignment;
@@ -57,9 +58,9 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
uaddr = (unsigned long) ubuf;
alignment = queue_dma_alignment(q) | q->dma_pad_mask;
if (!(uaddr & alignment) && !(len & alignment))
- bio = bio_map_user(q, NULL, uaddr, len, reading);
+ bio = bio_map_user(q, NULL, uaddr, len, reading, gfp_mask);
else
- bio = bio_copy_user(q, uaddr, len, reading);
+ bio = bio_copy_user(q, uaddr, len, reading, gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
@@ -90,6 +91,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* @rq: request structure to fill
* @ubuf: the user buffer
* @len: length of user data
+ * @gfp_mask: memory allocation flags
*
* Description:
* Data will be mapped directly for zero copy io, if possible. Otherwise
@@ -105,7 +107,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* unmapping.
*/
int blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned long len)
+ void __user *ubuf, unsigned long len, gfp_t gfp_mask)
{
unsigned long bytes_read = 0;
struct bio *bio = NULL;
@@ -132,7 +134,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
if (end - start > BIO_MAX_PAGES)
map_len -= PAGE_SIZE;
- ret = __blk_rq_map_user(q, rq, ubuf, map_len);
+ ret = __blk_rq_map_user(q, rq, ubuf, map_len, gfp_mask);
if (ret < 0)
goto unmap_rq;
if (!bio)
@@ -160,6 +162,7 @@ EXPORT_SYMBOL(blk_rq_map_user);
* @iov: pointer to the iovec
* @iov_count: number of elements in the iovec
* @len: I/O byte count
+ * @gfp_mask: memory allocation flags
*
* Description:
* Data will be mapped directly for zero copy io, if possible. Otherwise
@@ -175,7 +178,8 @@ EXPORT_SYMBOL(blk_rq_map_user);
* unmapping.
*/
int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct sg_iovec *iov, int iov_count, unsigned int len)
+ struct sg_iovec *iov, int iov_count, unsigned int len,
+ gfp_t gfp_mask)
{
struct bio *bio;
int i, read = rq_data_dir(rq) == READ;
@@ -194,9 +198,9 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
}
if (unaligned || (q->dma_pad_mask & len))
- bio = bio_copy_user_iov(q, iov, iov_count, read);
+ bio = bio_copy_user_iov(q, iov, iov_count, read, gfp_mask);
else
- bio = bio_map_user_iov(q, NULL, iov, iov_count, read);
+ bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
diff --git a/block/bsg.c b/block/bsg.c
index 0aae8d7..e7a142e 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -283,7 +283,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
next_rq->cmd_type = rq->cmd_type;
dxferp = (void*)(unsigned long)hdr->din_xferp;
- ret = blk_rq_map_user(q, next_rq, dxferp, hdr->din_xfer_len);
+ ret = blk_rq_map_user(q, next_rq, dxferp, hdr->din_xfer_len,
+ GFP_KERNEL);
if (ret)
goto out;
}
@@ -298,7 +299,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
dxfer_len = 0;
if (dxfer_len) {
- ret = blk_rq_map_user(q, rq, dxferp, dxfer_len);
+ ret = blk_rq_map_user(q, rq, dxferp, dxfer_len, GFP_KERNEL);
if (ret)
goto out;
}
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 3aab80a..f49d6a1 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -315,10 +315,11 @@ static int sg_io(struct file *file, struct request_queue *q,
}
ret = blk_rq_map_user_iov(q, rq, iov, hdr->iovec_count,
- hdr->dxfer_len);
+ hdr->dxfer_len, GFP_KERNEL);
kfree(iov);
} else if (hdr->dxfer_len)
- ret = blk_rq_map_user(q, rq, hdr->dxferp, hdr->dxfer_len);
+ ret = blk_rq_map_user(q, rq, hdr->dxferp, hdr->dxfer_len,
+ GFP_KERNEL);
if (ret)
goto out;
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 74031de..e861d24 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2097,7 +2097,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
len = nr * CD_FRAMESIZE_RAW;
- ret = blk_rq_map_user(q, rq, ubuf, len);
+ ret = blk_rq_map_user(q, rq, ubuf, len, GFP_KERNEL);
if (ret)
break;
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 257e097..2a4fd82 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -362,7 +362,7 @@ static int scsi_map_user_pages(struct scsi_tgt_cmd *tcmd, struct scsi_cmnd *cmd,
int err;
dprintk("%lx %u\n", uaddr, len);
- err = blk_rq_map_user(q, rq, (void *)uaddr, len);
+ err = blk_rq_map_user(q, rq, (void *)uaddr, len, GFP_KERNEL);
if (err) {
/*
* TODO: need to fixup sg_tablesize, max_segment_size,
diff --git a/fs/bio.c b/fs/bio.c
index 3cba7ae..4da5439 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -568,13 +568,14 @@ int bio_uncopy_user(struct bio *bio)
* @iov: the iovec.
* @iov_count: number of elements in the iovec
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
- int iov_count, int write_to_vm)
+ int iov_count, int write_to_vm, gfp_t gfp_mask)
{
struct bio_map_data *bmd;
struct bio_vec *bvec;
@@ -615,7 +616,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
if (bytes > len)
bytes = len;
- page = alloc_page(q->bounce_gfp | GFP_KERNEL);
+ page = alloc_page(q->bounce_gfp | gfp_mask);
if (!page) {
ret = -ENOMEM;
break;
@@ -657,26 +658,27 @@ out_bmd:
* @uaddr: start of user address
* @len: length in bytes
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
struct bio *bio_copy_user(struct request_queue *q, unsigned long uaddr,
- unsigned int len, int write_to_vm)
+ unsigned int len, int write_to_vm, gfp_t gfp_mask)
{
struct sg_iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_copy_user_iov(q, &iov, 1, write_to_vm);
+ return bio_copy_user_iov(q, &iov, 1, write_to_vm, gfp_mask);
}
static struct bio *__bio_map_user_iov(struct request_queue *q,
struct block_device *bdev,
struct sg_iovec *iov, int iov_count,
- int write_to_vm)
+ int write_to_vm, gfp_t gfp_mask)
{
int i, j;
int nr_pages = 0;
@@ -702,12 +704,12 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
if (!nr_pages)
return ERR_PTR(-EINVAL);
- bio = bio_alloc(GFP_KERNEL, nr_pages);
+ bio = bio_alloc(gfp_mask, nr_pages);
if (!bio)
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
- pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
+ pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
if (!pages)
goto out;
@@ -786,19 +788,21 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
* @uaddr: start of user address
* @len: length in bytes
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
- unsigned long uaddr, unsigned int len, int write_to_vm)
+ unsigned long uaddr, unsigned int len, int write_to_vm,
+ gfp_t gfp_mask)
{
struct sg_iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm);
+ return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm, gfp_mask);
}
/**
@@ -808,17 +812,18 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
* @iov: the iovec.
* @iov_count: number of elements in the iovec
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
struct sg_iovec *iov, int iov_count,
- int write_to_vm)
+ int write_to_vm, gfp_t gfp_mask)
{
struct bio *bio;
- bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm);
+ bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm, gfp_mask);
if (IS_ERR(bio))
return bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 0933a14..f4820ca 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -347,11 +347,11 @@ extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
unsigned int, unsigned int);
extern int bio_get_nr_vecs(struct block_device *);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
- unsigned long, unsigned int, int);
+ unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
extern struct bio *bio_map_user_iov(struct request_queue *,
struct block_device *,
- struct sg_iovec *, int, int);
+ struct sg_iovec *, int, int, gfp_t);
extern void bio_unmap_user(struct bio *);
extern struct bio *bio_map_kern(struct request_queue *, void *, unsigned int,
gfp_t);
@@ -359,9 +359,10 @@ extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int,
gfp_t, int);
extern void bio_set_pages_dirty(struct bio *bio);
extern void bio_check_pages_dirty(struct bio *bio);
-extern struct bio *bio_copy_user(struct request_queue *, unsigned long, unsigned int, int);
+extern struct bio *bio_copy_user(struct request_queue *, unsigned long,
+ unsigned int, int, gfp_t);
extern struct bio *bio_copy_user_iov(struct request_queue *, struct sg_iovec *,
- int, int);
+ int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
extern struct bio_vec *bvec_alloc_bs(gfp_t, int, unsigned long *, struct bio_set *);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab247d5..e4cb266 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -705,11 +705,12 @@ extern void __blk_stop_queue(struct request_queue *q);
extern void __blk_run_queue(struct request_queue *);
extern void blk_run_queue(struct request_queue *);
extern void blk_start_queueing(struct request_queue *);
-extern int blk_rq_map_user(struct request_queue *, struct request *, void __user *, unsigned long);
+extern int blk_rq_map_user(struct request_queue *, struct request *,
+ void __user *, unsigned long, gfp_t);
extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
- struct sg_iovec *, int, unsigned int);
+ struct sg_iovec *, int, unsigned int, gfp_t);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
--
1.5.5.GIT
* [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages
2008-08-26 2:10 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
@ 2008-08-26 2:10 ` FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 3/5] sg: convert the non-data path to use the block layer FUJITA Tomonori
2008-08-26 16:35 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Christoph Hellwig
1 sibling, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-26 2:10 UTC
To: linux-scsi; +Cc: jens.axboe, dougg, michaelc, James.Bottomley, fujita.tomonori
This patch introduces struct rq_map_data to enable bio_copy_user_iov()
to use reserved pages.
Currently, bio_copy_user_iov allocates bounce pages but
drivers/scsi/sg.c wants to allocate pages by itself and use
them. struct rq_map_data can be used to pass allocated pages to
bio_copy_user_iov.
The current users of bio_copy_user_iov simply pass NULL (they don't
want to use pre-allocated pages).
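The ownership rule this introduces (pages supplied through rq_map_data
belong to the caller and must not be freed by the bio code) can be
sketched in userspace like this. It is a simplified model, not the
kernel implementation: malloc()/free() stand in for
alloc_page()/__free_page(), the 4 KiB page size and the 16-segment cap
are assumptions, and all model_* names and both structs are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */
#define MODEL_MAX_SEGS  16	/* arbitrary model cap, not a kernel limit */

struct map_data_model {
	void **pages;
	int page_order;
	int nr_entries;
};

struct bio_model {
	void *segs[MODEL_MAX_SEGS];
	int nr_segs;
	int is_our_pages;	/* 1: allocated here, 0: caller's pages */
};

/* Drop all segments; returns how many pages were freed here. */
static int model_teardown(struct bio_model *bio)
{
	int i, freed = 0;

	for (i = 0; i < bio->nr_segs; i++) {
		if (bio->is_our_pages) {
			free(bio->segs[i]);
			freed++;
		}
	}
	bio->nr_segs = 0;
	return freed;
}

/* Collect pages for len bytes, from md if given, else by allocating. */
static int model_setup(struct bio_model *bio, struct map_data_model *md,
		       unsigned int len)
{
	int i = 0;

	assert(bio);
	bio->nr_segs = 0;
	bio->is_our_pages = md ? 0 : 1;

	while (len) {
		unsigned int bytes = md ? MODEL_PAGE_SIZE << md->page_order
					: MODEL_PAGE_SIZE;
		void *page;

		if (bytes > len)
			bytes = len;
		if (bio->nr_segs == MODEL_MAX_SEGS)
			goto fail;
		if (md) {
			if (i == md->nr_entries)
				goto fail;	/* reserved pages exhausted */
			page = md->pages[i++];
		} else {
			page = malloc(MODEL_PAGE_SIZE);
		}
		if (!page)
			goto fail;

		bio->segs[bio->nr_segs++] = page;
		len -= bytes;
	}
	return 0;

fail:
	model_teardown(bio);	/* frees only pages allocated here */
	return -ENOMEM;
}
```

In the patch itself the same distinction is carried by the is_our_pages
field of struct bio_map_data and the do_free_page argument of
__bio_copy_iov().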
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
block/blk-map.c | 26 ++++++++++++-------
block/bsg.c | 7 +++--
block/scsi_ioctl.c | 4 +-
drivers/cdrom/cdrom.c | 2 +-
drivers/scsi/scsi_tgt_lib.c | 2 +-
fs/bio.c | 58 ++++++++++++++++++++++++++++++------------
include/linux/bio.h | 8 +++--
include/linux/blkdev.h | 12 +++++++-
8 files changed, 80 insertions(+), 39 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index c363b45..3f6ae02 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -41,8 +41,8 @@ static int __blk_rq_unmap_user(struct bio *bio)
}
static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned int len,
- gfp_t gfp_mask)
+ struct rq_map_data *map_data, void __user *ubuf,
+ unsigned int len, gfp_t gfp_mask)
{
unsigned long uaddr;
unsigned int alignment;
@@ -57,10 +57,10 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
*/
uaddr = (unsigned long) ubuf;
alignment = queue_dma_alignment(q) | q->dma_pad_mask;
- if (!(uaddr & alignment) && !(len & alignment))
+ if (!(uaddr & alignment) && !(len & alignment) && !map_data)
bio = bio_map_user(q, NULL, uaddr, len, reading, gfp_mask);
else
- bio = bio_copy_user(q, uaddr, len, reading, gfp_mask);
+ bio = bio_copy_user(q, map_data, uaddr, len, reading, gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
@@ -89,6 +89,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* blk_rq_map_user - map user data to a request, for REQ_BLOCK_PC usage
* @q: request queue where request should be inserted
* @rq: request structure to fill
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @ubuf: the user buffer
* @len: length of user data
* @gfp_mask: memory allocation flags
@@ -107,7 +108,8 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* unmapping.
*/
int blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned long len, gfp_t gfp_mask)
+ struct rq_map_data *map_data, void __user *ubuf,
+ unsigned long len, gfp_t gfp_mask)
{
unsigned long bytes_read = 0;
struct bio *bio = NULL;
@@ -134,7 +136,8 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
if (end - start > BIO_MAX_PAGES)
map_len -= PAGE_SIZE;
- ret = __blk_rq_map_user(q, rq, ubuf, map_len, gfp_mask);
+ ret = __blk_rq_map_user(q, rq, map_data, ubuf, map_len,
+ gfp_mask);
if (ret < 0)
goto unmap_rq;
if (!bio)
@@ -159,6 +162,7 @@ EXPORT_SYMBOL(blk_rq_map_user);
* blk_rq_map_user_iov - map user data to a request, for REQ_BLOCK_PC usage
* @q: request queue where request should be inserted
* @rq: request to map data to
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @iov: pointer to the iovec
* @iov_count: number of elements in the iovec
* @len: I/O byte count
@@ -178,8 +182,8 @@ EXPORT_SYMBOL(blk_rq_map_user);
* unmapping.
*/
int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct sg_iovec *iov, int iov_count, unsigned int len,
- gfp_t gfp_mask)
+ struct rq_map_data *map_data, struct sg_iovec *iov,
+ int iov_count, unsigned int len, gfp_t gfp_mask)
{
struct bio *bio;
int i, read = rq_data_dir(rq) == READ;
@@ -197,8 +201,9 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
}
}
- if (unaligned || (q->dma_pad_mask & len))
- bio = bio_copy_user_iov(q, iov, iov_count, read, gfp_mask);
+ if (unaligned || (q->dma_pad_mask & len) || map_data)
+ bio = bio_copy_user_iov(q, map_data, iov, iov_count, read,
+ gfp_mask);
else
bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
@@ -220,6 +225,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
rq->buffer = rq->data = NULL;
return 0;
}
+EXPORT_SYMBOL(blk_rq_map_user_iov);
/**
* blk_rq_unmap_user - unmap a request with user data
diff --git a/block/bsg.c b/block/bsg.c
index e7a142e..56cb343 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -283,8 +283,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
next_rq->cmd_type = rq->cmd_type;
dxferp = (void*)(unsigned long)hdr->din_xferp;
- ret = blk_rq_map_user(q, next_rq, dxferp, hdr->din_xfer_len,
- GFP_KERNEL);
+ ret = blk_rq_map_user(q, next_rq, NULL, dxferp,
+ hdr->din_xfer_len, GFP_KERNEL);
if (ret)
goto out;
}
@@ -299,7 +299,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
dxfer_len = 0;
if (dxfer_len) {
- ret = blk_rq_map_user(q, rq, dxferp, dxfer_len, GFP_KERNEL);
+ ret = blk_rq_map_user(q, rq, NULL, dxferp, dxfer_len,
+ GFP_KERNEL);
if (ret)
goto out;
}
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index f49d6a1..c34272a 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -314,11 +314,11 @@ static int sg_io(struct file *file, struct request_queue *q,
goto out;
}
- ret = blk_rq_map_user_iov(q, rq, iov, hdr->iovec_count,
+ ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
hdr->dxfer_len, GFP_KERNEL);
kfree(iov);
} else if (hdr->dxfer_len)
- ret = blk_rq_map_user(q, rq, hdr->dxferp, hdr->dxfer_len,
+ ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
GFP_KERNEL);
if (ret)
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index e861d24..d47f2f8 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2097,7 +2097,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
len = nr * CD_FRAMESIZE_RAW;
- ret = blk_rq_map_user(q, rq, ubuf, len, GFP_KERNEL);
+ ret = blk_rq_map_user(q, rq, NULL, ubuf, len, GFP_KERNEL);
if (ret)
break;
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 2a4fd82..3117bb1 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -362,7 +362,7 @@ static int scsi_map_user_pages(struct scsi_tgt_cmd *tcmd, struct scsi_cmnd *cmd,
int err;
dprintk("%lx %u\n", uaddr, len);
- err = blk_rq_map_user(q, rq, (void *)uaddr, len, GFP_KERNEL);
+ err = blk_rq_map_user(q, rq, NULL, (void *)uaddr, len, GFP_KERNEL);
if (err) {
/*
* TODO: need to fixup sg_tablesize, max_segment_size,
diff --git a/fs/bio.c b/fs/bio.c
index 4da5439..6dc2045 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -449,16 +449,19 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
struct bio_map_data {
struct bio_vec *iovecs;
- int nr_sgvecs;
struct sg_iovec *sgvecs;
+ int nr_sgvecs;
+ int is_our_pages;
};
static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
- struct sg_iovec *iov, int iov_count)
+ struct sg_iovec *iov, int iov_count,
+ int is_our_pages)
{
memcpy(bmd->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
memcpy(bmd->sgvecs, iov, sizeof(struct sg_iovec) * iov_count);
bmd->nr_sgvecs = iov_count;
+ bmd->is_our_pages = is_our_pages;
bio->bi_private = bmd;
}
@@ -493,7 +496,8 @@ static struct bio_map_data *bio_alloc_map_data(int nr_segs, int iov_count,
}
static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
- struct sg_iovec *iov, int iov_count, int uncopy)
+ struct sg_iovec *iov, int iov_count, int uncopy,
+ int do_free_page)
{
int ret = 0, i;
struct bio_vec *bvec;
@@ -536,7 +540,7 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
}
}
- if (uncopy)
+ if (do_free_page)
__free_page(bvec->bv_page);
}
@@ -555,7 +559,8 @@ int bio_uncopy_user(struct bio *bio)
struct bio_map_data *bmd = bio->bi_private;
int ret;
- ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, bmd->nr_sgvecs, 1);
+ ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, bmd->nr_sgvecs, 1,
+ bmd->is_our_pages);
bio_free_map_data(bmd);
bio_put(bio);
@@ -565,6 +570,7 @@ int bio_uncopy_user(struct bio *bio)
/**
* bio_copy_user_iov - copy user data to bio
* @q: destination block queue
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @iov: the iovec.
* @iov_count: number of elements in the iovec
* @write_to_vm: bool indicating writing to pages or not
@@ -574,8 +580,10 @@ int bio_uncopy_user(struct bio *bio)
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
-struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
- int iov_count, int write_to_vm, gfp_t gfp_mask)
+struct bio *bio_copy_user_iov(struct request_queue *q,
+ struct rq_map_data *map_data,
+ struct sg_iovec *iov, int iov_count,
+ int write_to_vm, gfp_t gfp_mask)
{
struct bio_map_data *bmd;
struct bio_vec *bvec;
@@ -610,13 +618,26 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
bio->bi_rw |= (!write_to_vm << BIO_RW);
ret = 0;
+ i = 0;
while (len) {
- unsigned int bytes = PAGE_SIZE;
+ unsigned int bytes;
+
+ if (map_data)
+ bytes = 1U << (PAGE_SHIFT + map_data->page_order);
+ else
+ bytes = PAGE_SIZE;
if (bytes > len)
bytes = len;
- page = alloc_page(q->bounce_gfp | gfp_mask);
+ if (map_data) {
+ if (i == map_data->nr_entries) {
+ ret = -ENOMEM;
+ break;
+ }
+ page = map_data->pages[i++];
+ } else
+ page = alloc_page(q->bounce_gfp | gfp_mask);
if (!page) {
ret = -ENOMEM;
break;
@@ -635,16 +656,17 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
* success
*/
if (!write_to_vm) {
- ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0);
+ ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0, 0);
if (ret)
goto cleanup;
}
- bio_set_map_data(bmd, bio, iov, iov_count);
+ bio_set_map_data(bmd, bio, iov, iov_count, map_data ? 0 : 1);
return bio;
cleanup:
- bio_for_each_segment(bvec, bio, i)
- __free_page(bvec->bv_page);
+ if (!map_data)
+ bio_for_each_segment(bvec, bio, i)
+ __free_page(bvec->bv_page);
bio_put(bio);
out_bmd:
@@ -655,6 +677,7 @@ out_bmd:
/**
* bio_copy_user - copy user data to bio
* @q: destination block queue
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @uaddr: start of user address
* @len: length in bytes
* @write_to_vm: bool indicating writing to pages or not
@@ -664,15 +687,16 @@ out_bmd:
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
-struct bio *bio_copy_user(struct request_queue *q, unsigned long uaddr,
- unsigned int len, int write_to_vm, gfp_t gfp_mask)
+struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
+ unsigned long uaddr, unsigned int len,
+ int write_to_vm, gfp_t gfp_mask)
{
struct sg_iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_copy_user_iov(q, &iov, 1, write_to_vm, gfp_mask);
+ return bio_copy_user_iov(q, map_data, &iov, 1, write_to_vm, gfp_mask);
}
static struct bio *__bio_map_user_iov(struct request_queue *q,
@@ -1038,7 +1062,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
bio->bi_private = bmd;
bio->bi_end_io = bio_copy_kern_endio;
- bio_set_map_data(bmd, bio, &iov, 1);
+ bio_set_map_data(bmd, bio, &iov, 1, 1);
return bio;
cleanup:
bio_for_each_segment(bvec, bio, i)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index f4820ca..a68c617 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -349,6 +349,7 @@ extern int bio_get_nr_vecs(struct block_device *);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
+struct rq_map_data;
extern struct bio *bio_map_user_iov(struct request_queue *,
struct block_device *,
struct sg_iovec *, int, int, gfp_t);
@@ -359,9 +360,10 @@ extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int,
gfp_t, int);
extern void bio_set_pages_dirty(struct bio *bio);
extern void bio_check_pages_dirty(struct bio *bio);
-extern struct bio *bio_copy_user(struct request_queue *, unsigned long,
- unsigned int, int, gfp_t);
-extern struct bio *bio_copy_user_iov(struct request_queue *, struct sg_iovec *,
+extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
+ unsigned long, unsigned int, int, gfp_t);
+extern struct bio *bio_copy_user_iov(struct request_queue *,
+ struct rq_map_data *, struct sg_iovec *,
int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e4cb266..d5e76d1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -637,6 +637,12 @@ static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
}
#endif /* CONFIG_MMU */
+struct rq_map_data {
+ struct page **pages;
+ int page_order;
+ int nr_entries;
+};
+
struct req_iterator {
int i;
struct bio *bio;
@@ -706,11 +712,13 @@ extern void __blk_run_queue(struct request_queue *);
extern void blk_run_queue(struct request_queue *);
extern void blk_start_queueing(struct request_queue *);
extern int blk_rq_map_user(struct request_queue *, struct request *,
- void __user *, unsigned long, gfp_t);
+ struct rq_map_data *, void __user *, unsigned long,
+ gfp_t);
extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
- struct sg_iovec *, int, unsigned int, gfp_t);
+ struct rq_map_data *, struct sg_iovec *, int,
+ unsigned int, gfp_t);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
--
1.5.5.GIT
* [PATCH 3/5] sg: convert the non-data path to use the block layer
2008-08-26 2:10 ` [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages FUJITA Tomonori
@ 2008-08-26 2:10 ` FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 4/5] sg: convert the direct IO " FUJITA Tomonori
0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-26 2:10 UTC
To: linux-scsi; +Cc: jens.axboe, dougg, michaelc, James.Bottomley, fujita.tomonori
This patch converts the non-data path to use the block layer functions
(blk_get_request, blk_execute_rq_nowait, etc.) instead of
scsi_execute_async().
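The resulting control flow (prepare a request, hand it off with a
completion callback, have the callback forward to the existing
completion routine) can be modeled in userspace as below. This is only a
sketch: all model_* names are hypothetical, the "execute" step completes
synchronously instead of going through a device, and the length check in
model_start_req() is the model's own addition.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_CDB 16	/* assumed command buffer size for the model */

struct model_request;
typedef void (*model_end_io_fn)(struct model_request *rq, int uptodate);

struct model_request {
	unsigned char cmd[MODEL_MAX_CDB];
	unsigned int cmd_len;
	void *end_io_data;	/* points back at the sg-side request */
	int errors;
	model_end_io_fn end_io;
};

struct model_sg_request {
	int done;
	int result;
	struct model_request *rq;
};

/* Plays the role of sg_cmd_done(): records completion on the sg side. */
static void model_cmd_done(void *data, int result)
{
	struct model_sg_request *srp = data;

	srp->result = result;
	srp->done = 1;
}

/* Plays the role of sg_rq_end_io(): adapts the block-layer callback. */
static void model_rq_end_io(struct model_request *rq, int uptodate)
{
	(void)uptodate;
	model_cmd_done(rq->end_io_data, rq->errors);
}

/* Plays the role of __sg_start_req(): fills in the request. */
static int model_start_req(struct model_sg_request *srp,
			   struct model_request *rq,
			   const unsigned char *cmd, unsigned int cmd_len)
{
	assert(srp && rq && cmd);
	if (cmd_len > MODEL_MAX_CDB)
		return -1;	/* model-only bound check */

	memset(rq, 0, sizeof(*rq));
	memcpy(rq->cmd, cmd, cmd_len);
	rq->cmd_len = cmd_len;
	rq->end_io_data = srp;
	srp->rq = rq;
	srp->done = 0;
	return 0;
}

/* Stand-in for blk_execute_rq_nowait(); completes immediately. */
static void model_execute_nowait(struct model_request *rq,
				 model_end_io_fn done)
{
	rq->end_io = done;
	rq->errors = 0;
	rq->end_io(rq, 1);
}
```

In the real driver the completion arrives asynchronously, and
sg_cmd_done() also receives the sense buffer and the residual count.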
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
drivers/scsi/sg.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 661f9f2..d1b3de9 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -137,6 +137,7 @@ typedef struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */
char orphan; /* 1 -> drop on sight, 0 -> normal */
char sg_io_owned; /* 1 -> packet belongs to SG_IO */
volatile char done; /* 0->before bh, 1->before read, 2->read */
+ struct request *rq;
} Sg_request;
typedef struct sg_fd { /* holds the state of a file descriptor */
@@ -176,7 +177,7 @@ typedef struct sg_device { /* holds the state of each scsi generic device */
static int sg_fasync(int fd, struct file *filp, int mode);
/* tasklet or soft irq callback */
static void sg_cmd_done(void *data, char *sense, int result, int resid);
-static int sg_start_req(Sg_request * srp);
+static int sg_start_req(Sg_request * srp, unsigned char *cmd);
static void sg_finish_rem_req(Sg_request * srp);
static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size);
static int sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp,
@@ -229,6 +230,11 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
cmd, filp->f_mode & FMODE_WRITE);
}
+static void sg_rq_end_io(struct request *rq, int uptodate)
+{
+ sg_cmd_done(rq->end_io_data, rq->sense, rq->errors, rq->data_len);
+}
+
static int
sg_open(struct inode *inode, struct file *filp)
{
@@ -732,7 +738,7 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
SCSI_LOG_TIMEOUT(4, printk("sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n",
(int) cmnd[0], (int) hp->cmd_len));
- if ((k = sg_start_req(srp))) {
+ if ((k = sg_start_req(srp, cmnd))) {
SCSI_LOG_TIMEOUT(1, printk("sg_common_write: start_req err=%d\n", k));
sg_finish_rem_req(srp);
return k; /* probably out of space --> ENOMEM */
@@ -765,6 +771,12 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
hp->duration = jiffies_to_msecs(jiffies);
/* Now send everything of to mid-level. The next time we hear about this
packet is when sg_cmd_done() is called (i.e. a callback). */
+ if (srp->rq) {
+ srp->rq->timeout = timeout;
+ blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
+ srp->rq, 1, sg_rq_end_io);
+ return 0;
+ }
if (scsi_execute_async(sdp->device, cmnd, hp->cmd_len, data_dir, srp->data.buffer,
hp->dxfer_len, srp->data.k_use_sg, timeout,
SG_DEFAULT_RETRIES, srp, sg_cmd_done,
@@ -1634,8 +1646,33 @@ exit_sg(void)
idr_destroy(&sg_index_idr);
}
+static int __sg_start_req(struct sg_request *srp, struct sg_io_hdr *hp,
+ unsigned char *cmd)
+{
+ struct sg_fd *sfp = srp->parentfp;
+ struct request_queue *q = sfp->parentdp->device->request_queue;
+ struct request *rq;
+ int rw = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ;
+
+ rq = blk_get_request(q, rw, GFP_ATOMIC);
+ if (!rq)
+ return -ENOMEM;
+
+ memcpy(rq->cmd, cmd, hp->cmd_len);
+
+ rq->cmd_len = hp->cmd_len;
+ rq->cmd_type = REQ_TYPE_BLOCK_PC;
+
+ srp->rq = rq;
+ rq->end_io_data = srp;
+ rq->sense = srp->sense_b;
+ rq->retries = SG_DEFAULT_RETRIES;
+
+ return 0;
+}
+
static int
-sg_start_req(Sg_request * srp)
+sg_start_req(Sg_request * srp, unsigned char *cmd)
{
int res;
Sg_fd *sfp = srp->parentfp;
@@ -1646,8 +1683,10 @@ sg_start_req(Sg_request * srp)
Sg_scatter_hold *rsv_schp = &sfp->reserve;
SCSI_LOG_TIMEOUT(4, printk("sg_start_req: dxfer_len=%d\n", dxfer_len));
+
if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE))
- return 0;
+ return __sg_start_req(srp, hp, cmd);
+
if (sg_allow_dio && (hp->flags & SG_FLAG_DIRECT_IO) &&
(dxfer_dir != SG_DXFER_UNKNOWN) && (0 == hp->iovec_count) &&
(!sfp->parentdp->device->host->unchecked_isa_dma)) {
@@ -1678,6 +1717,10 @@ sg_finish_rem_req(Sg_request * srp)
sg_unlink_reserve(sfp, srp);
else
sg_remove_scat(req_schp);
+
+ if (srp->rq)
+ blk_put_request(srp->rq);
+
sg_remove_request(sfp, srp);
}
--
1.5.5.GIT
^ permalink raw reply related [flat|nested] 23+ messages in thread
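Editorial sketch (not part of the patch): sg_rq_end_io() in the patch above is a small adapter. The block layer calls it with a struct request, and it forwards the fields to the existing sg_cmd_done() handler, which keeps its (data, sense, result, resid) signature. The userspace model below shows only that pattern. All mock_* types and the demo_completion() helper are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request: only the fields the patch reads. */
struct mock_request {
	void *end_io_data;     /* per-request cookie (the Sg_request in the patch) */
	char *sense;           /* sense buffer supplied by the submitter */
	int errors;            /* completion result */
	unsigned int data_len; /* value the patch passes on as resid */
};

/* Hypothetical stand-in for Sg_request. */
struct mock_sg_request {
	char sense_b[32];
	int result;
	int resid;
	int done;
};

/* Legacy-style completion handler with the old callback signature. */
static void mock_cmd_done(void *data, char *sense, int result, int resid)
{
	struct mock_sg_request *srp = data;

	(void)sense; /* the sense data already lives in srp->sense_b */
	srp->result = result;
	srp->resid = resid;
	srp->done = 1;
}

/* Adapter: unpack the request and forward to the legacy handler. */
static void mock_rq_end_io(struct mock_request *rq, int uptodate)
{
	(void)uptodate;
	mock_cmd_done(rq->end_io_data, rq->sense, rq->errors, (int)rq->data_len);
}

/* Drive one completion through the adapter; returns 1 if the values arrive intact. */
static int demo_completion(int errors, unsigned int resid)
{
	struct mock_sg_request srp = { .done = 0 };
	struct mock_request rq = {
		.end_io_data = &srp,
		.sense = srp.sense_b,
		.errors = errors,
		.data_len = resid,
	};

	mock_rq_end_io(&rq, errors == 0);
	return srp.done && srp.result == errors && srp.resid == (int)resid;
}
```

Because rq->sense points straight at the request's own sense buffer, the handler does not need to copy the sense data again, which matches the memcpy removal in patch 5/5.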
* [PATCH 4/5] sg: convert the direct IO path to use the block layer
2008-08-26 2:10 ` [PATCH 3/5] sg: convert the non-data path to use the block layer FUJITA Tomonori
@ 2008-08-26 2:10 ` FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 5/5] sg: convert the indirect " FUJITA Tomonori
0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-26 2:10 UTC (permalink / raw)
To: linux-scsi; +Cc: jens.axboe, dougg, michaelc, James.Bottomley, fujita.tomonori
This patch converts the direct IO path (SG_FLAG_DIRECT_IO) to use the
block layer functions (blk_get_request, blk_execute_rq_nowait,
blk_rq_map_user, etc) instead of scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
drivers/scsi/sg.c | 173 ++++++++--------------------------------------------
1 files changed, 27 insertions(+), 146 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d1b3de9..10a285e 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -138,6 +138,7 @@ typedef struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */
char sg_io_owned; /* 1 -> packet belongs to SG_IO */
volatile char done; /* 0->before bh, 1->before read, 2->read */
struct request *rq;
+ struct bio *bio;
} Sg_request;
typedef struct sg_fd { /* holds the state of a file descriptor */
@@ -1679,21 +1680,29 @@ sg_start_req(Sg_request * srp, unsigned char *cmd)
sg_io_hdr_t *hp = &srp->header;
int dxfer_len = (int) hp->dxfer_len;
int dxfer_dir = hp->dxfer_direction;
+ unsigned long uaddr = (unsigned long)hp->dxferp;
Sg_scatter_hold *req_schp = &srp->data;
Sg_scatter_hold *rsv_schp = &sfp->reserve;
+ struct request_queue *q = sfp->parentdp->device->request_queue;
+ unsigned long alignment = queue_dma_alignment(q) | q->dma_pad_mask;
SCSI_LOG_TIMEOUT(4, printk("sg_start_req: dxfer_len=%d\n", dxfer_len));
if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE))
return __sg_start_req(srp, hp, cmd);
+#ifdef SG_ALLOW_DIO_CODE
if (sg_allow_dio && (hp->flags & SG_FLAG_DIRECT_IO) &&
(dxfer_dir != SG_DXFER_UNKNOWN) && (0 == hp->iovec_count) &&
- (!sfp->parentdp->device->host->unchecked_isa_dma)) {
- res = sg_build_direct(srp, sfp, dxfer_len);
- if (res <= 0) /* -ve -> error, 0 -> done, 1 -> try indirect */
- return res;
+ (!sfp->parentdp->device->host->unchecked_isa_dma) &&
+ !(uaddr & alignment) && !(dxfer_len & alignment)) {
+ res = __sg_start_req(srp, hp, cmd);
+ if (!res)
+ res = sg_build_direct(srp, sfp, dxfer_len);
+
+ return res;
}
+#endif
if ((!sg_res_in_use(sfp)) && (dxfer_len <= rsv_schp->bufflen))
sg_link_reserve(sfp, srp, dxfer_len);
else {
@@ -1718,8 +1727,11 @@ sg_finish_rem_req(Sg_request * srp)
else
sg_remove_scat(req_schp);
- if (srp->rq)
+ if (srp->rq) {
+ if (srp->bio)
+ blk_rq_unmap_user(srp->bio);
blk_put_request(srp->rq);
+ }
sg_remove_request(sfp, srp);
}
@@ -1746,151 +1758,23 @@ sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize)
return tablesize; /* number of scat_gath elements allocated */
}
-#ifdef SG_ALLOW_DIO_CODE
-/* vvvvvvvv following code borrowed from st driver's direct IO vvvvvvvvv */
- /* TODO: hopefully we can use the generic block layer code */
-
-/* Pin down user pages and put them into a scatter gather list. Returns <= 0 if
- - mapping of all pages not successful
- (i.e., either completely successful or fails)
-*/
-static int
-st_map_user_pages(struct scatterlist *sgl, const unsigned int max_pages,
- unsigned long uaddr, size_t count, int rw)
-{
- unsigned long end = (uaddr + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned long start = uaddr >> PAGE_SHIFT;
- const int nr_pages = end - start;
- int res, i, j;
- struct page **pages;
-
- /* User attempted Overflow! */
- if ((uaddr + count) < uaddr)
- return -EINVAL;
-
- /* Too big */
- if (nr_pages > max_pages)
- return -ENOMEM;
-
- /* Hmm? */
- if (count == 0)
- return 0;
-
- if ((pages = kmalloc(max_pages * sizeof(*pages), GFP_ATOMIC)) == NULL)
- return -ENOMEM;
-
- /* Try to fault in all of the necessary pages */
- down_read(&current->mm->mmap_sem);
- /* rw==READ means read from drive, write into memory area */
- res = get_user_pages(
- current,
- current->mm,
- uaddr,
- nr_pages,
- rw == READ,
- 0, /* don't force */
- pages,
- NULL);
- up_read(&current->mm->mmap_sem);
-
- /* Errors and no page mapped should return here */
- if (res < nr_pages)
- goto out_unmap;
-
- for (i=0; i < nr_pages; i++) {
- /* FIXME: flush superflous for rw==READ,
- * probably wrong function for rw==WRITE
- */
- flush_dcache_page(pages[i]);
- /* ?? Is locking needed? I don't think so */
- /* if (!trylock_page(pages[i]))
- goto out_unlock; */
- }
-
- sg_set_page(sgl, pages[0], 0, uaddr & ~PAGE_MASK);
- if (nr_pages > 1) {
- sgl[0].length = PAGE_SIZE - sgl[0].offset;
- count -= sgl[0].length;
- for (i=1; i < nr_pages ; i++)
- sg_set_page(&sgl[i], pages[i], count < PAGE_SIZE ? count : PAGE_SIZE, 0);
- }
- else {
- sgl[0].length = count;
- }
-
- kfree(pages);
- return nr_pages;
-
- out_unmap:
- if (res > 0) {
- for (j=0; j < res; j++)
- page_cache_release(pages[j]);
- res = 0;
- }
- kfree(pages);
- return res;
-}
-
-
-/* And unmap them... */
-static int
-st_unmap_user_pages(struct scatterlist *sgl, const unsigned int nr_pages,
- int dirtied)
-{
- int i;
-
- for (i=0; i < nr_pages; i++) {
- struct page *page = sg_page(&sgl[i]);
-
- if (dirtied)
- SetPageDirty(page);
- /* unlock_page(page); */
- /* FIXME: cache flush missing for rw==READ
- * FIXME: call the correct reference counting function
- */
- page_cache_release(page);
- }
-
- return 0;
-}
-
-/* ^^^^^^^^ above code borrowed from st driver's direct IO ^^^^^^^^^ */
-#endif
-
-
/* Returns: -ve -> error, 0 -> done, 1 -> try indirect */
static int
sg_build_direct(Sg_request * srp, Sg_fd * sfp, int dxfer_len)
{
-#ifdef SG_ALLOW_DIO_CODE
sg_io_hdr_t *hp = &srp->header;
Sg_scatter_hold *schp = &srp->data;
- int sg_tablesize = sfp->parentdp->sg_tablesize;
- int mx_sc_elems, res;
- struct scsi_device *sdev = sfp->parentdp->device;
-
- if (((unsigned long)hp->dxferp &
- queue_dma_alignment(sdev->request_queue)) != 0)
- return 1;
+ int res;
+ struct request *rq = srp->rq;
+ struct request_queue *q = sfp->parentdp->device->request_queue;
- mx_sc_elems = sg_build_sgat(schp, sfp, sg_tablesize);
- if (mx_sc_elems <= 0) {
- return 1;
- }
- res = st_map_user_pages(schp->buffer, mx_sc_elems,
- (unsigned long)hp->dxferp, dxfer_len,
- (SG_DXFER_TO_DEV == hp->dxfer_direction) ? 1 : 0);
- if (res <= 0) {
- sg_remove_scat(schp);
- return 1;
- }
- schp->k_use_sg = res;
+ res = blk_rq_map_user(q, rq, NULL, hp->dxferp, dxfer_len, GFP_ATOMIC);
+ if (res)
+ return res;
+ srp->bio = rq->bio;
schp->dio_in_use = 1;
hp->info |= SG_INFO_DIRECT_IO;
return 0;
-#else
- return 1;
-#endif
}
static int
@@ -2069,11 +1953,7 @@ sg_remove_scat(Sg_scatter_hold * schp)
if (schp->buffer && (schp->sglist_len > 0)) {
struct scatterlist *sg = schp->buffer;
- if (schp->dio_in_use) {
-#ifdef SG_ALLOW_DIO_CODE
- st_unmap_user_pages(sg, schp->k_use_sg, TRUE);
-#endif
- } else {
+ if (!schp->dio_in_use) {
int k;
for (k = 0; (k < schp->k_use_sg) && sg_page(sg);
@@ -2083,8 +1963,9 @@ sg_remove_scat(Sg_scatter_hold * schp)
k, sg_page(sg), sg->length));
sg_page_free(sg_page(sg), sg->length);
}
+
+ kfree(schp->buffer);
}
- kfree(schp->buffer);
}
memset(schp, 0, sizeof (*schp));
}
--
1.5.5.GIT
^ permalink raw reply related [flat|nested] 23+ messages in thread
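Editorial sketch (not part of the patch): the direct IO path in the patch above is only taken when both the user address and the transfer length pass `!(x & alignment)`, where alignment is queue_dma_alignment(q) OR'ed with q->dma_pad_mask. Otherwise sg_start_req() falls through to the indirect path. The helper below models that test in userspace, assuming both masks have the usual "power of two minus one" form. The name dio_aligned() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 when both the user address and the length have no bits set
 * under either mask, i.e. both are multiples of the stricter alignment.
 * OR-ing two (2^n - 1) masks yields the larger of the two.
 */
static int dio_aligned(uintptr_t uaddr, size_t len,
		       unsigned long dma_alignment, unsigned long dma_pad_mask)
{
	unsigned long alignment = dma_alignment | dma_pad_mask;

	return !(uaddr & alignment) && !(len & alignment);
}
```

With a 511 mask, a buffer at 0x1000 of 4096 bytes qualifies, while the same buffer at 0x1001, or a 100-byte transfer, would go through the indirect (bounce) path instead.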
* [PATCH 5/5] sg: convert the indirect IO path to use the block layer
2008-08-26 2:10 ` [PATCH 4/5] sg: convert the direct IO " FUJITA Tomonori
@ 2008-08-26 2:10 ` FUJITA Tomonori
0 siblings, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-26 2:10 UTC (permalink / raw)
To: linux-scsi; +Cc: jens.axboe, dougg, michaelc, James.Bottomley, fujita.tomonori
This patch converts the indirect IO path (including mmap IO and old
struct sg_header) to use the block layer functions (blk_get_request,
blk_execute_rq_nowait, blk_rq_map_user, etc) instead of
scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
drivers/scsi/sg.c | 393 ++++++++++++++---------------------------------------
1 files changed, 103 insertions(+), 290 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 10a285e..66a2c31 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -47,7 +47,6 @@ static int sg_version_num = 30534; /* 2 digits for each component */
#include <linux/seq_file.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
-#include <linux/scatterlist.h>
#include <linux/blktrace_api.h>
#include <linux/smp_lock.h>
@@ -119,7 +118,8 @@ typedef struct sg_scatter_hold { /* holding area for scsi scatter gather info */
unsigned sglist_len; /* size of malloc'd scatter-gather list ++ */
unsigned bufflen; /* Size of (aggregate) data buffer */
unsigned b_malloc_len; /* actual len malloc'ed in buffer */
- struct scatterlist *buffer;/* scatter list */
+ struct page **pages;
+ int page_order;
char dio_in_use; /* 0->indirect IO (or mmap), 1->dio */
unsigned char cmd_opcode; /* first byte of command */
} Sg_scatter_hold;
@@ -190,8 +190,6 @@ static ssize_t sg_new_write(Sg_fd *sfp, struct file *file,
int read_only, Sg_request **o_srp);
static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
unsigned char *cmnd, int timeout, int blocking);
-static int sg_u_iovec(sg_io_hdr_t * hp, int sg_num, int ind,
- int wr_xf, int *countp, unsigned char __user **up);
static int sg_write_xfer(Sg_request * srp);
static int sg_read_xfer(Sg_request * srp);
static int sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer);
@@ -199,8 +197,6 @@ static void sg_remove_scat(Sg_scatter_hold * schp);
static void sg_build_reserve(Sg_fd * sfp, int req_size);
static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
-static struct page *sg_page_malloc(int rqSz, int lowDma, int *retSzp);
-static void sg_page_free(struct page *page, int size);
static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
@@ -770,26 +766,11 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
break;
}
hp->duration = jiffies_to_msecs(jiffies);
-/* Now send everything of to mid-level. The next time we hear about this
- packet is when sg_cmd_done() is called (i.e. a callback). */
- if (srp->rq) {
- srp->rq->timeout = timeout;
- blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
- srp->rq, 1, sg_rq_end_io);
- return 0;
- }
- if (scsi_execute_async(sdp->device, cmnd, hp->cmd_len, data_dir, srp->data.buffer,
- hp->dxfer_len, srp->data.k_use_sg, timeout,
- SG_DEFAULT_RETRIES, srp, sg_cmd_done,
- GFP_ATOMIC)) {
- SCSI_LOG_TIMEOUT(1, printk("sg_common_write: scsi_execute_async failed\n"));
- /*
- * most likely out of mem, but could also be a bad map
- */
- sg_finish_rem_req(srp);
- return -ENOMEM;
- } else
- return 0;
+
+ srp->rq->timeout = timeout;
+ blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
+ srp->rq, 1, sg_rq_end_io);
+ return 0;
}
static int
@@ -1205,8 +1186,7 @@ sg_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
Sg_fd *sfp;
unsigned long offset, len, sa;
Sg_scatter_hold *rsv_schp;
- struct scatterlist *sg;
- int k;
+ int k, length;
if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data)))
return VM_FAULT_SIGBUS;
@@ -1216,15 +1196,14 @@ sg_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
SCSI_LOG_TIMEOUT(3, printk("sg_vma_fault: offset=%lu, scatg=%d\n",
offset, rsv_schp->k_use_sg));
- sg = rsv_schp->buffer;
sa = vma->vm_start;
- for (k = 0; (k < rsv_schp->k_use_sg) && (sa < vma->vm_end);
- ++k, sg = sg_next(sg)) {
+ length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
+ for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) {
len = vma->vm_end - sa;
- len = (len < sg->length) ? len : sg->length;
+ len = (len < length) ? len : length;
if (offset < len) {
- struct page *page;
- page = virt_to_page(page_address(sg_page(sg)) + offset);
+ struct page *page = nth_page(rsv_schp->pages[k],
+ offset >> PAGE_SHIFT);
get_page(page); /* increment page count */
vmf->page = page;
return 0; /* success */
@@ -1246,8 +1225,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
Sg_fd *sfp;
unsigned long req_sz, len, sa;
Sg_scatter_hold *rsv_schp;
- int k;
- struct scatterlist *sg;
+ int k, length;
if ((!filp) || (!vma) || (!(sfp = (Sg_fd *) filp->private_data)))
return -ENXIO;
@@ -1261,11 +1239,10 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
return -ENOMEM; /* cannot map more than reserved buffer */
sa = vma->vm_start;
- sg = rsv_schp->buffer;
- for (k = 0; (k < rsv_schp->k_use_sg) && (sa < vma->vm_end);
- ++k, sg = sg_next(sg)) {
+ length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
+ for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) {
len = vma->vm_end - sa;
- len = (len < sg->length) ? len : sg->length;
+ len = (len < length) ? len : length;
sa += len;
}
@@ -1309,7 +1286,6 @@ sg_cmd_done(void *data, char *sense, int result, int resid)
if (0 != result) {
struct scsi_sense_hdr sshdr;
- memcpy(srp->sense_b, sense, sizeof (srp->sense_b));
srp->header.status = 0xff & result;
srp->header.masked_status = status_byte(result);
srp->header.msg_status = msg_byte(result);
@@ -1685,34 +1661,51 @@ sg_start_req(Sg_request * srp, unsigned char *cmd)
Sg_scatter_hold *rsv_schp = &sfp->reserve;
struct request_queue *q = sfp->parentdp->device->request_queue;
unsigned long alignment = queue_dma_alignment(q) | q->dma_pad_mask;
+ struct rq_map_data map_data;
SCSI_LOG_TIMEOUT(4, printk("sg_start_req: dxfer_len=%d\n", dxfer_len));
+ res = __sg_start_req(srp, hp, cmd);
+ if (res)
+ return res;
+
if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE))
- return __sg_start_req(srp, hp, cmd);
+ return 0;
#ifdef SG_ALLOW_DIO_CODE
if (sg_allow_dio && (hp->flags & SG_FLAG_DIRECT_IO) &&
(dxfer_dir != SG_DXFER_UNKNOWN) && (0 == hp->iovec_count) &&
(!sfp->parentdp->device->host->unchecked_isa_dma) &&
- !(uaddr & alignment) && !(dxfer_len & alignment)) {
- res = __sg_start_req(srp, hp, cmd);
- if (!res)
- res = sg_build_direct(srp, sfp, dxfer_len);
-
- return res;
- }
+ !(uaddr & alignment) && !(dxfer_len & alignment))
+ return sg_build_direct(srp, sfp, dxfer_len);
#endif
if ((!sg_res_in_use(sfp)) && (dxfer_len <= rsv_schp->bufflen))
sg_link_reserve(sfp, srp, dxfer_len);
- else {
+ else
res = sg_build_indirect(req_schp, sfp, dxfer_len);
- if (res) {
- sg_remove_scat(req_schp);
- return res;
- }
+
+ if (!res) {
+ struct request *rq = srp->rq;
+ Sg_scatter_hold *schp = &srp->data;
+ int iovec_count = (int) hp->iovec_count;
+
+ map_data.pages = schp->pages;
+ map_data.page_order = schp->page_order;
+ map_data.nr_entries = schp->k_use_sg;
+
+ if (iovec_count)
+ res = blk_rq_map_user_iov(q, rq, &map_data, hp->dxferp,
+ iovec_count,
+ hp->dxfer_len, GFP_ATOMIC);
+ else
+ res = blk_rq_map_user(q, rq, &map_data, hp->dxferp, hp->dxfer_len,
+ GFP_ATOMIC);
+
+ if (!res)
+ srp->bio = rq->bio;
}
- return 0;
+
+ return res;
}
static void
@@ -1730,6 +1723,7 @@ sg_finish_rem_req(Sg_request * srp)
if (srp->rq) {
if (srp->bio)
blk_rq_unmap_user(srp->bio);
+
blk_put_request(srp->rq);
}
@@ -1739,21 +1733,12 @@ sg_finish_rem_req(Sg_request * srp)
static int
sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize)
{
- int sg_bufflen = tablesize * sizeof(struct scatterlist);
+ int sg_bufflen = tablesize * sizeof(struct page *);
gfp_t gfp_flags = GFP_ATOMIC | __GFP_NOWARN;
- /*
- * TODO: test without low_dma, we should not need it since
- * the block layer will bounce the buffer for us
- *
- * XXX(hch): we shouldn't need GFP_DMA for the actual S/G list.
- */
- if (sfp->low_dma)
- gfp_flags |= GFP_DMA;
- schp->buffer = kzalloc(sg_bufflen, gfp_flags);
- if (!schp->buffer)
+ schp->pages = kzalloc(sg_bufflen, gfp_flags);
+ if (!schp->pages)
return -ENOMEM;
- sg_init_table(schp->buffer, tablesize);
schp->sglist_len = sg_bufflen;
return tablesize; /* number of scat_gath elements allocated */
}
@@ -1780,11 +1765,10 @@ sg_build_direct(Sg_request * srp, Sg_fd * sfp, int dxfer_len)
static int
sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
{
- struct scatterlist *sg;
- int ret_sz = 0, k, rem_sz, num, mx_sc_elems;
+ int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems;
int sg_tablesize = sfp->parentdp->sg_tablesize;
- int blk_size = buff_size;
- struct page *p = NULL;
+ int blk_size = buff_size, order;
+ gfp_t gfp_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN;
if (blk_size < 0)
return -EFAULT;
@@ -1808,15 +1792,26 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
} else
scatter_elem_sz_prev = num;
}
- for (k = 0, sg = schp->buffer, rem_sz = blk_size;
- (rem_sz > 0) && (k < mx_sc_elems);
- ++k, rem_sz -= ret_sz, sg = sg_next(sg)) {
-
+
+ if (sfp->low_dma)
+ gfp_mask |= GFP_DMA;
+
+ if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
+ gfp_mask |= __GFP_ZERO;
+
+ order = get_order(num);
+retry:
+ ret_sz = 1 << (PAGE_SHIFT + order);
+
+ for (k = 0, rem_sz = blk_size; rem_sz > 0 && k < mx_sc_elems;
+ k++, rem_sz -= ret_sz) {
+
num = (rem_sz > scatter_elem_sz_prev) ?
- scatter_elem_sz_prev : rem_sz;
- p = sg_page_malloc(num, sfp->low_dma, &ret_sz);
- if (!p)
- return -ENOMEM;
+ scatter_elem_sz_prev : rem_sz;
+
+ schp->pages[k] = alloc_pages(gfp_mask, order);
+ if (!schp->pages[k])
+ goto out;
if (num == scatter_elem_sz_prev) {
if (unlikely(ret_sz > scatter_elem_sz_prev)) {
@@ -1824,12 +1819,12 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
scatter_elem_sz_prev = ret_sz;
}
}
- sg_set_page(sg, p, (ret_sz > num) ? num : ret_sz, 0);
SCSI_LOG_TIMEOUT(5, printk("sg_build_indirect: k=%d, num=%d, "
"ret_sz=%d\n", k, num, ret_sz));
} /* end of for loop */
+ schp->page_order = order;
schp->k_use_sg = k;
SCSI_LOG_TIMEOUT(5, printk("sg_build_indirect: k_use_sg=%d, "
"rem_sz=%d\n", k, rem_sz));
@@ -1837,8 +1832,15 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
schp->bufflen = blk_size;
if (rem_sz > 0) /* must have failed */
return -ENOMEM;
-
return 0;
+out:
+ for (i = 0; i < k; i++)
+ __free_pages(schp->pages[i], order);
+
+ if (--order >= 0)
+ goto retry;
+
+ return -ENOMEM;
}
static int
@@ -1846,13 +1848,8 @@ sg_write_xfer(Sg_request * srp)
{
sg_io_hdr_t *hp = &srp->header;
Sg_scatter_hold *schp = &srp->data;
- struct scatterlist *sg = schp->buffer;
int num_xfer = 0;
- int j, k, onum, usglen, ksglen, res;
- int iovec_count = (int) hp->iovec_count;
int dxfer_dir = hp->dxfer_direction;
- unsigned char *p;
- unsigned char __user *up;
int new_interface = ('\0' == hp->interface_id) ? 0 : 1;
if ((SG_DXFER_UNKNOWN == dxfer_dir) || (SG_DXFER_TO_DEV == dxfer_dir) ||
@@ -1868,103 +1865,26 @@ sg_write_xfer(Sg_request * srp)
SCSI_LOG_TIMEOUT(4, printk("sg_write_xfer: num_xfer=%d, iovec_count=%d, k_use_sg=%d\n",
num_xfer, iovec_count, schp->k_use_sg));
- if (iovec_count) {
- onum = iovec_count;
- if (!access_ok(VERIFY_READ, hp->dxferp, SZ_SG_IOVEC * onum))
- return -EFAULT;
- } else
- onum = 1;
-
- ksglen = sg->length;
- p = page_address(sg_page(sg));
- for (j = 0, k = 0; j < onum; ++j) {
- res = sg_u_iovec(hp, iovec_count, j, 1, &usglen, &up);
- if (res)
- return res;
-
- for (; p; sg = sg_next(sg), ksglen = sg->length,
- p = page_address(sg_page(sg))) {
- if (usglen <= 0)
- break;
- if (ksglen > usglen) {
- if (usglen >= num_xfer) {
- if (__copy_from_user(p, up, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_from_user(p, up, usglen))
- return -EFAULT;
- p += usglen;
- ksglen -= usglen;
- break;
- } else {
- if (ksglen >= num_xfer) {
- if (__copy_from_user(p, up, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_from_user(p, up, ksglen))
- return -EFAULT;
- up += ksglen;
- usglen -= ksglen;
- }
- ++k;
- if (k >= schp->k_use_sg)
- return 0;
- }
- }
return 0;
}
-static int
-sg_u_iovec(sg_io_hdr_t * hp, int sg_num, int ind,
- int wr_xf, int *countp, unsigned char __user **up)
-{
- int num_xfer = (int) hp->dxfer_len;
- unsigned char __user *p = hp->dxferp;
- int count;
-
- if (0 == sg_num) {
- if (wr_xf && ('\0' == hp->interface_id))
- count = (int) hp->flags; /* holds "old" input_size */
- else
- count = num_xfer;
- } else {
- sg_iovec_t iovec;
- if (__copy_from_user(&iovec, p + ind*SZ_SG_IOVEC, SZ_SG_IOVEC))
- return -EFAULT;
- p = iovec.iov_base;
- count = (int) iovec.iov_len;
- }
- if (!access_ok(wr_xf ? VERIFY_READ : VERIFY_WRITE, p, count))
- return -EFAULT;
- if (up)
- *up = p;
- if (countp)
- *countp = count;
- return 0;
-}
-
static void
sg_remove_scat(Sg_scatter_hold * schp)
{
SCSI_LOG_TIMEOUT(4, printk("sg_remove_scat: k_use_sg=%d\n", schp->k_use_sg));
- if (schp->buffer && (schp->sglist_len > 0)) {
- struct scatterlist *sg = schp->buffer;
-
+ if (schp->pages && schp->sglist_len > 0) {
if (!schp->dio_in_use) {
int k;
- for (k = 0; (k < schp->k_use_sg) && sg_page(sg);
- ++k, sg = sg_next(sg)) {
+ for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
SCSI_LOG_TIMEOUT(5, printk(
- "sg_remove_scat: k=%d, pg=0x%p, len=%d\n",
- k, sg_page(sg), sg->length));
- sg_page_free(sg_page(sg), sg->length);
+ "sg_remove_scat: k=%d, pg=0x%p\n",
+ k, schp->pages[k]));
+ __free_pages(schp->pages[k], schp->page_order);
}
- kfree(schp->buffer);
+ kfree(schp->pages);
}
}
memset(schp, 0, sizeof (*schp));
@@ -1975,13 +1895,8 @@ sg_read_xfer(Sg_request * srp)
{
sg_io_hdr_t *hp = &srp->header;
Sg_scatter_hold *schp = &srp->data;
- struct scatterlist *sg = schp->buffer;
int num_xfer = 0;
- int j, k, onum, usglen, ksglen, res;
- int iovec_count = (int) hp->iovec_count;
int dxfer_dir = hp->dxfer_direction;
- unsigned char *p;
- unsigned char __user *up;
int new_interface = ('\0' == hp->interface_id) ? 0 : 1;
if ((SG_DXFER_UNKNOWN == dxfer_dir) || (SG_DXFER_FROM_DEV == dxfer_dir)
@@ -1996,53 +1911,7 @@ sg_read_xfer(Sg_request * srp)
return 0;
SCSI_LOG_TIMEOUT(4, printk("sg_read_xfer: num_xfer=%d, iovec_count=%d, k_use_sg=%d\n",
- num_xfer, iovec_count, schp->k_use_sg));
- if (iovec_count) {
- onum = iovec_count;
- if (!access_ok(VERIFY_READ, hp->dxferp, SZ_SG_IOVEC * onum))
- return -EFAULT;
- } else
- onum = 1;
-
- p = page_address(sg_page(sg));
- ksglen = sg->length;
- for (j = 0, k = 0; j < onum; ++j) {
- res = sg_u_iovec(hp, iovec_count, j, 0, &usglen, &up);
- if (res)
- return res;
-
- for (; p; sg = sg_next(sg), ksglen = sg->length,
- p = page_address(sg_page(sg))) {
- if (usglen <= 0)
- break;
- if (ksglen > usglen) {
- if (usglen >= num_xfer) {
- if (__copy_to_user(up, p, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_to_user(up, p, usglen))
- return -EFAULT;
- p += usglen;
- ksglen -= usglen;
- break;
- } else {
- if (ksglen >= num_xfer) {
- if (__copy_to_user(up, p, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_to_user(up, p, ksglen))
- return -EFAULT;
- up += ksglen;
- usglen -= ksglen;
- }
- ++k;
- if (k >= schp->k_use_sg)
- return 0;
- }
- }
-
+ num_xfer, (int)hp->iovec_count, schp->k_use_sg));
return 0;
}
@@ -2050,7 +1919,6 @@ static int
sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer)
{
Sg_scatter_hold *schp = &srp->data;
- struct scatterlist *sg = schp->buffer;
int k, num;
SCSI_LOG_TIMEOUT(4, printk("sg_read_oxfer: num_read_xfer=%d\n",
@@ -2058,15 +1926,18 @@ sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer)
if ((!outp) || (num_read_xfer <= 0))
return 0;
- for (k = 0; (k < schp->k_use_sg) && sg_page(sg); ++k, sg = sg_next(sg)) {
- num = sg->length;
+ blk_rq_unmap_user(srp->bio);
+ srp->bio = NULL;
+
+ num = 1 << (PAGE_SHIFT + schp->page_order);
+ for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
if (num > num_read_xfer) {
- if (__copy_to_user(outp, page_address(sg_page(sg)),
+ if (__copy_to_user(outp, page_address(schp->pages[k]),
num_read_xfer))
return -EFAULT;
break;
} else {
- if (__copy_to_user(outp, page_address(sg_page(sg)),
+ if (__copy_to_user(outp, page_address(schp->pages[k]),
num))
return -EFAULT;
num_read_xfer -= num;
@@ -2101,24 +1972,22 @@ sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size)
{
Sg_scatter_hold *req_schp = &srp->data;
Sg_scatter_hold *rsv_schp = &sfp->reserve;
- struct scatterlist *sg = rsv_schp->buffer;
int k, num, rem;
srp->res_used = 1;
SCSI_LOG_TIMEOUT(4, printk("sg_link_reserve: size=%d\n", size));
rem = size;
- for (k = 0; k < rsv_schp->k_use_sg; ++k, sg = sg_next(sg)) {
- num = sg->length;
+ num = 1 << (PAGE_SHIFT + rsv_schp->page_order);
+ for (k = 0; k < rsv_schp->k_use_sg; k++) {
if (rem <= num) {
- sfp->save_scat_len = num;
- sg->length = rem;
req_schp->k_use_sg = k + 1;
req_schp->sglist_len = rsv_schp->sglist_len;
- req_schp->buffer = rsv_schp->buffer;
+ req_schp->pages = rsv_schp->pages;
req_schp->bufflen = size;
req_schp->b_malloc_len = rsv_schp->b_malloc_len;
+ req_schp->page_order = rsv_schp->page_order;
break;
} else
rem -= num;
@@ -2132,22 +2001,13 @@ static void
sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp)
{
Sg_scatter_hold *req_schp = &srp->data;
- Sg_scatter_hold *rsv_schp = &sfp->reserve;
SCSI_LOG_TIMEOUT(4, printk("sg_unlink_reserve: req->k_use_sg=%d\n",
(int) req_schp->k_use_sg));
- if ((rsv_schp->k_use_sg > 0) && (req_schp->k_use_sg > 0)) {
- struct scatterlist *sg = rsv_schp->buffer;
-
- if (sfp->save_scat_len > 0)
- (sg + (req_schp->k_use_sg - 1))->length =
- (unsigned) sfp->save_scat_len;
- else
- SCSI_LOG_TIMEOUT(1, printk ("sg_unlink_reserve: BAD save_scat_len\n"));
- }
req_schp->k_use_sg = 0;
req_schp->bufflen = 0;
- req_schp->buffer = NULL;
+ req_schp->pages = NULL;
+ req_schp->page_order = 0;
req_schp->sglist_len = 0;
sfp->save_scat_len = 0;
srp->res_used = 0;
@@ -2405,53 +2265,6 @@ sg_res_in_use(Sg_fd * sfp)
return srp ? 1 : 0;
}
-/* The size fetched (value output via retSzp) set when non-NULL return */
-static struct page *
-sg_page_malloc(int rqSz, int lowDma, int *retSzp)
-{
- struct page *resp = NULL;
- gfp_t page_mask;
- int order, a_size;
- int resSz;
-
- if ((rqSz <= 0) || (NULL == retSzp))
- return resp;
-
- if (lowDma)
- page_mask = GFP_ATOMIC | GFP_DMA | __GFP_COMP | __GFP_NOWARN;
- else
- page_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN;
-
- for (order = 0, a_size = PAGE_SIZE; a_size < rqSz;
- order++, a_size <<= 1) ;
- resSz = a_size; /* rounded up if necessary */
- resp = alloc_pages(page_mask, order);
- while ((!resp) && order) {
- --order;
- a_size >>= 1; /* divide by 2, until PAGE_SIZE */
- resp = alloc_pages(page_mask, order); /* try half */
- resSz = a_size;
- }
- if (resp) {
- if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
- memset(page_address(resp), 0, resSz);
- *retSzp = resSz;
- }
- return resp;
-}
-
-static void
-sg_page_free(struct page *page, int size)
-{
- int order, a_size;
-
- if (!page)
- return;
- for (order = 0, a_size = PAGE_SIZE; a_size < size;
- order++, a_size <<= 1) ;
- __free_pages(page, order);
-}
-
#ifdef CONFIG_SCSI_PROC_FS
static int
sg_idr_max_id(int id, void *p, void *data)
--
1.5.5.GIT
^ permalink raw reply related [flat|nested] 23+ messages in thread
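Editorial sketch (not part of the patch): sg_build_indirect() in the patch above allocates equal-sized chunks of `1 << (PAGE_SHIFT + order)` bytes and, when an allocation fails, frees what it already has and retries with a smaller order. The userspace model below illustrates that loop, assuming a 4 KiB page and using malloc() in place of alloc_pages(). Every mock_* name, the high_budget knob and demo_build() are hypothetical. Unlike the driver, this sketch also frees the chunks itself when the table is too small.

```c
#include <assert.h>
#include <stdlib.h>

#define MOCK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Number of "too large" allocations allowed to succeed before failures start. */
static int high_budget;

/* Smallest order whose chunk size covers size (size > 0). */
static int mock_get_order(unsigned long size)
{
	int order = 0;

	while ((1UL << (MOCK_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Hypothetical allocator: orders above max_ok_order fail once the budget is spent. */
static void *mock_alloc_pages(int order, int max_ok_order)
{
	if (order > max_ok_order) {
		if (high_budget <= 0)
			return NULL;
		high_budget--;
	}
	return malloc(1UL << (MOCK_PAGE_SHIFT + order));
}

/* Returns the number of chunks used (order in *order_out), or -1 on failure. */
static int build_chunks(void **pages, int max_elems, long buff_size,
			unsigned long elem_size, int max_ok_order, int *order_out)
{
	int order = mock_get_order(elem_size);
	long rem, chunk;
	int i, k;

retry:
	chunk = 1L << (MOCK_PAGE_SHIFT + order);
	for (k = 0, rem = buff_size; rem > 0 && k < max_elems; k++, rem -= chunk) {
		pages[k] = mock_alloc_pages(order, max_ok_order);
		if (!pages[k])
			goto out;
	}
	if (rem > 0) { /* table too small for this chunk size */
		for (i = 0; i < k; i++)
			free(pages[i]);
		return -1;
	}
	*order_out = order;
	return k;
out:
	/* Release every chunk obtained so far, indexing by i. */
	for (i = 0; i < k; i++)
		free(pages[i]);
	if (--order >= 0)
		goto retry;
	return -1;
}

/* Build, record the result, and release the chunks again. */
static int demo_build(long buff_size, unsigned long elem_size, int max_elems,
		      int max_ok_order, int budget, int *order_out)
{
	void *pages[64] = { 0 };
	int i, k;

	if (max_elems > 64)
		return -1;
	high_budget = budget;
	k = build_chunks(pages, max_elems, buff_size, elem_size,
			 max_ok_order, order_out);
	for (i = 0; i < k; i++)
		free(pages[i]);
	return k;
}
```

Since all chunks share one order, users of the buffer can compute the chunk length from page_order alone, which is what sg_vma_fault(), sg_mmap() and sg_link_reserve() do after this patch instead of reading sg->length.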
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-26 2:10 [PATCH 0/5] convert sg to use the block layer FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
@ 2008-08-26 7:56 ` Jens Axboe
2008-08-27 2:14 ` FUJITA Tomonori
2008-08-27 20:14 ` Douglas Gilbert
2 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2008-08-26 7:56 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: linux-scsi, dougg, michaelc, James.Bottomley
On Tue, Aug 26 2008, FUJITA Tomonori wrote:
> This patchset converts sg to use the block layer functions. That is,
> sg doesn't use scsi_execute_async() any more. This is a part of the
> overdue task to remove scsi_req_map_sg.
>
> I tested this patchset with sg v3 and the old interface (struct
> sg_header) via SG_IO (v3) and the vfs API.
>
>
> Doug,
>
> 1. I don't demote the sg driver's GFP_ATOMIC to GFP_KERNEL. sg always
> uses GFP_ATOMIC as before.
>
> 2. I don't remove GFP_DMA allocation. As before, sg allocates reserved
> pages with GFP_DMA (sfp->low_dma case).
>
> 3. I keep the reserved buffer per struct sg_fd as before.
>
> 4. I use high-order page allocation for reserved buffer as before. sg
> works well with HBAs that have the limitation of the number of sg
> entries.
>
> 5. I think that you were concerned about the overhead of the block layer
> functions. But if you look at scsi_execute_async() that sg uses now,
> you can find that scsi_execute_async() uses the block layer functions
> internally. So the current sg incurs the overhead (if such overhead
> exists).
>
>
> Jens,
>
> I keep the block API changes to a minimum (I might need more changes
> for st/osst but I'd like to progress step by step). I did only two
> things.
>
> 1. I add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov.
>
> 2. I introduce struct rq_map_data holding pages. sg puts
> pre-allocated pages into it and passes it to bio_copy_user_iov(). The
> current users of bio_copy_user_iov simply pass NULL. blk_rq_map_user
> and blk_rq_map_user_iov take a pointer to struct rq_map_data and in
> the end bio_copy_user_iov gets it.
>
>
> This patchset is against the for-linus branch in Jens' tree + the two
> patches for 2.6.27:
>
> http://marc.info/?l=linux-kernel&m=121964251911717&w=2
> http://marc.info/?l=linux-kernel&m=121964241911611&w=2
>
> After Jens rebases the for-2.6.28 branch, I'll update this patchset
> too.
Thanks a lot for doing this work, it's been pending for a long time.
I've rebased for-2.6.28 on top of for-linus to ease this integration, so
if you could resend the patchset then I can add it.
--
Jens Axboe
* Re: [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
2008-08-26 2:10 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages FUJITA Tomonori
@ 2008-08-26 16:35 ` Christoph Hellwig
2008-08-27 1:30 ` FUJITA Tomonori
1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2008-08-26 16:35 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: linux-scsi, jens.axboe, dougg, michaelc, James.Bottomley
On Tue, Aug 26, 2008 at 11:10:50AM +0900, FUJITA Tomonori wrote:
> Currently, blk_rq_map_user and blk_rq_map_user_iov always do
> GFP_KERNEL allocation.
>
> This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
> so sg can use it (sg always does GFP_ATOMIC allocation).
Most GFP_ATOMIC uses look rather spurious to me, and are probably there
for some historic reason. Do you have a caller that actually needs
GFP_ATOMIC because it's under a spinlock or from irq context, or is this
just to stay as close as possible to the existing sg code?
* Re: [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
2008-08-26 16:35 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Christoph Hellwig
@ 2008-08-27 1:30 ` FUJITA Tomonori
0 siblings, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-27 1:30 UTC (permalink / raw)
To: hch
Cc: fujita.tomonori, linux-scsi, jens.axboe, dougg, michaelc,
James.Bottomley
On Tue, 26 Aug 2008 12:35:45 -0400
Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Aug 26, 2008 at 11:10:50AM +0900, FUJITA Tomonori wrote:
> > Currently, blk_rq_map_user and blk_rq_map_user_iov always do
> > GFP_KERNEL allocation.
> >
> > This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
> > so sg can use it (sg always does GFP_ATOMIC allocation).
>
> Most GFP_ATOMIC uses look rather spurious to me, and are probably there
> for some historic reason. Do you have a caller that actually needs
> GFP_ATOMIC because it's under a spinlock or from irq context,
No, we don't have one.
> or is this just to stay as close as possible to the existing sg
> code?
Yes, I don't change sg behavior.
GFP_NOWAIT would be more appropriate than GFP_ATOMIC for sg, I guess.
But let me finish the conversion without changing sg behavior. I know
changing the behavior makes it more difficult to get Doug's ACK on
these patches.
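
The plumbing discussed above can be modelled in plain userspace C. This is only a rough sketch, not kernel code: mock_gfp_t, the MOCK_GFP_* values and every mock_* function are hypothetical stand-ins invented for illustration. It shows the shape of the change, where the top-level mapping call accepts an allocation mask and each layer forwards it unchanged, so only the caller decides whether the allocation may sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's gfp_t flags; not the real values. */
typedef unsigned int mock_gfp_t;
#define MOCK_GFP_MAY_SLEEP 0x1u  /* models a GFP_KERNEL-like mask */
#define MOCK_GFP_NO_SLEEP  0x0u  /* models a GFP_ATOMIC/GFP_NOWAIT-like mask */

static mock_gfp_t last_mask_seen; /* mask that reached the lowest layer */
static int allocations_made;      /* number of allocations performed */

/* Lowest layer: the only place that allocates, so the only place that
 * really consumes the mask. */
static void *mock_alloc(size_t size, mock_gfp_t gfp_mask)
{
	assert(size > 0);
	last_mask_seen = gfp_mask;
	allocations_made++;
	return malloc(size);
}

/* Middle layer, modelling the role of bio_copy_user(): it used to
 * hard-code the mask and now forwards whatever the caller chose. */
static void *mock_bio_copy_user(size_t len, mock_gfp_t gfp_mask)
{
	return mock_alloc(len, gfp_mask);
}

/* Top layer, modelling the role of blk_rq_map_user(): returns 0 on
 * success and -1 if the allocation failed. */
static int mock_rq_map_user(size_t len, mock_gfp_t gfp_mask)
{
	void *buf = mock_bio_copy_user(len, gfp_mask);

	if (!buf)
		return -1;
	free(buf);
	return 0;
}
```

In this model, existing callers would pass MOCK_GFP_MAY_SLEEP and see no change in behaviour, while an sg-like caller passes MOCK_GFP_NO_SLEEP to keep what it did before the conversion.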
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-26 7:56 ` [PATCH 0/5] convert sg to use the block layer Jens Axboe
@ 2008-08-27 2:14 ` FUJITA Tomonori
2008-08-27 7:10 ` Jens Axboe
0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-27 2:14 UTC (permalink / raw)
To: jens.axboe; +Cc: fujita.tomonori, linux-scsi, dougg, michaelc, James.Bottomley
On Tue, 26 Aug 2008 09:56:28 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:
> On Tue, Aug 26 2008, FUJITA Tomonori wrote:
> > This patchset converts sg to use the block layer functions. That is,
> > sg doesn't use scsi_execute_async() any more. This is a part of the
> > overdue task to remove scsi_req_map_sg.
> >
> > I tested this patchset with sg v3 and the old interface (struct
> > sg_header) via SG_IO (v3) and the vfs API.
> >
> >
> > Doug,
> >
> > 1. I don't demote the sg driver's GFP_ATOMIC to GFP_KERNEL. sg always
> > uses GFP_ATOMIC as before.
> >
> > 2. I don't remove GFP_DMA allocation. As before, sg allocates reserved
> > pages with GFP_DMA (sfp->low_dma case).
> >
> > 3. I keep the reserved buffer per struct sg_fd as before.
> >
> > 4. I use high-order page allocation for reserved buffer as before. sg
> > works well with HBAs that have the limitation of the number of sg
> > entries.
> >
> > 5. I think that you were concerned about the overhead of the block layer
> > functions. But if you look at scsi_execute_async() that sg uses now,
> > you can find that scsi_execute_async() uses the block layer functions
> > internally. So the current sg incurs the overhead (if such overhead
> > exists).
> >
> >
> > Jens,
> >
> > I keep the block API changes to a minimum (I might need more changes
> > for st/osst but I'd like to progress step by step). I did only two
> > things.
> >
> > 1. I add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov.
> >
> > 2. I introduce struct rq_map_data holding pages. sg puts
> > pre-allocated pages into it and passes it to bio_copy_user_iov(). The
> > current users of bio_copy_user_iov simply pass NULL. blk_rq_map_user
> > and blk_rq_map_user_iov take a pointer to struct rq_map_data and in
> > the end bio_copy_user_iov gets it.
> >
> >
> > This patchset is against the for-linus branch in Jens' tree + the two
> > patches for 2.6.27:
> >
> > http://marc.info/?l=linux-kernel&m=121964251911717&w=2
> > http://marc.info/?l=linux-kernel&m=121964241911611&w=2
> >
> > After Jens rebases the for-2.6.28 branch, I'll update this patchset
> > too.
>
> Thanks a lot for doing this work, it's been pending for a long time.
> I've rebased for-2.6.28 on top of for-linus to ease this integration, so
> if you could resend the patchset then I can add it.
Thanks, I put an updated version against the for-2.6.28 branch in your
tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
Can you wait for a little while before applying this to your
for-2.6.28 branch? We need Doug's ACK on this.
But I think that it's nice if you can send this to linux-next since
testing this patchset on linux-next is good.
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-27 2:14 ` FUJITA Tomonori
@ 2008-08-27 7:10 ` Jens Axboe
2008-08-27 23:26 ` FUJITA Tomonori
0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2008-08-27 7:10 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: linux-scsi, dougg, michaelc, James.Bottomley
On Wed, Aug 27 2008, FUJITA Tomonori wrote:
> On Tue, 26 Aug 2008 09:56:28 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > On Tue, Aug 26 2008, FUJITA Tomonori wrote:
> > > This patchset converts sg to use the block layer functions. That is,
> > > sg doesn't use scsi_execute_async() any more. This is a part of the
> > > overdue task to remove scsi_req_map_sg.
> > >
> > > I tested this patchset with sg v3 and the old interface (struct
> > > sg_header) via SG_IO (v3) and the vfs API.
> > >
> > >
> > > Doug,
> > >
> > > 1. I don't demote the sg driver's GFP_ATOMIC to GFP_KERNEL. sg always
> > > uses GFP_ATOMIC as before.
> > >
> > > 2. I don't remove GFP_DMA allocation. As before, sg allocates reserved
> > > pages with GFP_DMA (sfp->low_dma case).
> > >
> > > 3. I keep the reserved buffer per struct sg_fd as before.
> > >
> > > 4. I use high-order page allocation for reserved buffer as before. sg
> > > works well with HBAs that have the limitation of the number of sg
> > > entries.
> > >
> > > 5. I think that you were concerned about the overhead of the block layer
> > > functions. But if you look at scsi_execute_async() that sg uses now,
> > > you can find that scsi_execute_async() uses the block layer functions
> > > internally. So the current sg incurs the overhead (if such overhead
> > > exists).
> > >
> > >
> > > Jens,
> > >
> > > I keep the block API changes to a minimum (I might need more changes
> > > for st/osst but I'd like to progress step by step). I did only two
> > > things.
> > >
> > > 1. I add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov.
> > >
> > > 2. I introduce struct rq_map_data holding pages. sg puts
> > > pre-allocated pages into it and passes it to bio_copy_user_iov(). The
> > > current users of bio_copy_user_iov simply pass NULL. blk_rq_map_user
> > > and blk_rq_map_user_iov take a pointer to struct rq_map_data and in
> > > the end bio_copy_user_iov gets it.
> > >
> > >
> > > This patchset is against the for-linus branch in Jens' tree + the two
> > > patches for 2.6.27:
> > >
> > > http://marc.info/?l=linux-kernel&m=121964251911717&w=2
> > > http://marc.info/?l=linux-kernel&m=121964241911611&w=2
> > >
> > > After Jens rebases the for-2.6.28 branch, I'll update this patchset
> > > too.
> >
> > Thanks a lot for doing this work, it's been pending for a long time.
> > I've rebased for-2.6.28 on top of for-linus to ease this integration, so
> > if you could resend the patchset then I can add it.
>
> Thanks, I put an updated version against the for-2.6.28 branch in your
> tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
>
>
> Can you wait for a little while before applying this to your
> for-2.6.28 branch? We need Doug's ACK on this.
Sure, just let me know...
> But I think that it's nice if you can send this to linux-next since
> testing this patchset on linux-next is good.
for-2.6.28 is already in linux-next, so once this gets applied it'll get
tested there as well. And in -mm.
--
Jens Axboe
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-26 2:10 [PATCH 0/5] convert sg to use the block layer FUJITA Tomonori
2008-08-26 2:10 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
2008-08-26 7:56 ` [PATCH 0/5] convert sg to use the block layer Jens Axboe
@ 2008-08-27 20:14 ` Douglas Gilbert
2008-08-27 23:26 ` FUJITA Tomonori
2 siblings, 1 reply; 23+ messages in thread
From: Douglas Gilbert @ 2008-08-27 20:14 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: linux-scsi, jens.axboe, michaelc, James.Bottomley
FUJITA Tomonori wrote:
> This patchset converts sg to use the block layer functions. That is,
> sg doesn't use scsi_execute_async() any more. This is a part of the
> overdue task to remove scsi_req_map_sg.
>
> I tested this patchset with sg v3 and the old interface (struct
> sg_header) via SG_IO (v3) and the vfs API.
>
>
> Doug,
>
> 1. I don't demote the sg driver's GFP_ATOMIC to GFP_KERNEL. sg always
> uses GFP_ATOMIC as before.
>
> 2. I don't remove GFP_DMA allocation. As before, sg allocates reserved
> pages with GFP_DMA (sfp->low_dma case).
>
> 3. I keep the reserved buffer per struct sg_fd as before.
>
> 4. I use high-order page allocation for reserved buffer as before. sg
> works well with HBAs that have the limitation of the number of sg
> entries.
>
> 5. I think that you were concerned about the overhead of the block layer
> functions. But if you look at scsi_execute_async() that sg uses now,
> you can find that scsi_execute_async() uses the block layer functions
> internally. So the current sg incurs the overhead (if such overhead
> exists).
Tomo,
Thanks for doing the conversion.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
> Jens,
>
> I keep the block API changes to a minimum (I might need more changes
> for st/osst but I'd like to progress step by step). I did only two
> things.
>
> 1. I add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov.
>
> 2. I introduce struct rq_map_data holding pages. sg puts
> pre-allocated pages into it and passes it to bio_copy_user_iov(). The
> current users of bio_copy_user_iov simply pass NULL. blk_rq_map_user
> and blk_rq_map_user_iov take a pointer to struct rq_map_data and in
> the end bio_copy_user_iov gets it.
>
>
> This patchset is against the for-linus branch in Jens' tree + the two
> patches for 2.6.27:
>
> http://marc.info/?l=linux-kernel&m=121964251911717&w=2
> http://marc.info/?l=linux-kernel&m=121964241911611&w=2
>
> After Jens rebases the for-2.6.28 branch, I'll update this patchset
> too.
>
> This patchset also is available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
>
>
>
>
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-27 20:14 ` Douglas Gilbert
@ 2008-08-27 23:26 ` FUJITA Tomonori
0 siblings, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-27 23:26 UTC (permalink / raw)
To: dougg; +Cc: fujita.tomonori, linux-scsi, jens.axboe, michaelc,
James.Bottomley
On Wed, 27 Aug 2008 16:14:24 -0400
Douglas Gilbert <dougg@torque.net> wrote:
> FUJITA Tomonori wrote:
> > This patchset converts sg to use the block layer functions. That is,
> > sg doesn't use scsi_execute_async() any more. This is a part of the
> > overdue task to remove scsi_req_map_sg.
> >
> > I tested this patchset with sg v3 and the old interface (struct
> > sg_header) via SG_IO (v3) and the vfs API.
> >
> >
> > Doug,
> >
> > 1. I don't demote the sg driver's GFP_ATOMIC to GFP_KERNEL. sg always
> > uses GFP_ATOMIC as before.
> >
> > 2. I don't remove GFP_DMA allocation. As before, sg allocates reserved
> > pages with GFP_DMA (sfp->low_dma case).
> >
> > 3. I keep the reserved buffer per struct sg_fd as before.
> >
> > 4. I use high-order page allocation for reserved buffer as before. sg
> > works well with HBAs that have the limitation of the number of sg
> > entries.
> >
> > 5. I think that you were concerned about the overhead of the block layer
> > functions. But if you look at scsi_execute_async() that sg uses now,
> > you can find that scsi_execute_async() uses the block layer functions
> > internally. So the current sg incurs the overhead (if such overhead
> > exists).
>
> Tomo,
> Thanks for doing the conversion.
>
> Signed-off-by: Douglas Gilbert <dougg@torque.net>
Thanks a lot!
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-27 7:10 ` Jens Axboe
@ 2008-08-27 23:26 ` FUJITA Tomonori
2008-08-28 6:51 ` Jens Axboe
0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-27 23:26 UTC (permalink / raw)
To: jens.axboe; +Cc: fujita.tomonori, linux-scsi, dougg, michaelc, James.Bottomley
On Wed, 27 Aug 2008 09:10:18 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:
> > Thanks, I put an updated version against the for-2.6.28 branch in your
> > tree:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
> >
> >
> > Can you wait for a little while before applying this to your
> > for-2.6.28 branch? We need Doug's ACK on this.
>
> Sure, just let me know...
Now we got Doug's ACK on this (Doug, thanks!). Can you apply them to
your for-2.6.28 branch? I'll resend them if necessary.
> > But I think that it's nice if you can send this to linux-next since
> > testing this patchset on linux-next is good.
>
> for-2.6.28 is already in linux-next, so once this gets applied it'll get
> tested there as well. And in -mm.
Thanks, I see.
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-27 23:26 ` FUJITA Tomonori
@ 2008-08-28 6:51 ` Jens Axboe
2008-08-28 7:07 ` Jens Axboe
0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2008-08-28 6:51 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: linux-scsi, dougg, michaelc, James.Bottomley
On Thu, Aug 28 2008, FUJITA Tomonori wrote:
> On Wed, 27 Aug 2008 09:10:18 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > > Thanks, I put an updated version against the for-2.6.28 branch in your
> > > tree:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
> > >
> > >
> > > Can you wait for a little while before applying this to your
> > > for-2.6.28 branch? We need Doug's ACK on this.
> >
> > Sure, just let me know...
>
> Now we got Doug's ACK on this (Doug, thanks!). Can you apply them to
> your for-2.6.28 branch? I'll resend them if necessary.
Yes, certainly. No need to resend, I'll add Doug's ack and merge it.
--
Jens Axboe
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-28 6:51 ` Jens Axboe
@ 2008-08-28 7:07 ` Jens Axboe
2008-08-28 7:17 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
2008-08-28 7:32 ` [PATCH 0/5] convert sg to use the block layer FUJITA Tomonori
0 siblings, 2 replies; 23+ messages in thread
From: Jens Axboe @ 2008-08-28 7:07 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: linux-scsi, dougg, michaelc, James.Bottomley
On Thu, Aug 28 2008, Jens Axboe wrote:
> On Thu, Aug 28 2008, FUJITA Tomonori wrote:
> > On Wed, 27 Aug 2008 09:10:18 +0200
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> >
> > > > Thanks, I put an updated version against the for-2.6.28 branch in your
> > > > tree:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
> > > >
> > > >
> > > > Can you wait for a little while before applying this to your
> > > > for-2.6.28 branch? We need Doug's ACK on this.
> > >
> > > Sure, just let me know...
> >
> > Now we got Doug's ACK on this (Doug, thanks!). Can you apply them to
> > your for-2.6.28 branch? I'll resend them if necessary.
>
> Yes, certainly. No need to resend, I'll add Doug's ack and merge it.
Actually, the previous was against for-linus (not for-2.6.28), so if you
could resend against 2.6.28 that would help a lot. Thanks!
--
Jens Axboe
* [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
2008-08-28 7:07 ` Jens Axboe
@ 2008-08-28 7:17 ` FUJITA Tomonori
2008-08-28 7:17 ` [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages FUJITA Tomonori
2008-08-28 7:34 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Jens Axboe
2008-08-28 7:32 ` [PATCH 0/5] convert sg to use the block layer FUJITA Tomonori
1 sibling, 2 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-28 7:17 UTC (permalink / raw)
To: jens.axboe; +Cc: linux-scsi, dougg, michaelc, James.Bottomley, FUJITA Tomonori
Currently, blk_rq_map_user and blk_rq_map_user_iov always do
GFP_KERNEL allocation.
This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
so sg can use it (sg always does GFP_ATOMIC allocation).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
block/blk-map.c | 20 ++++++++++++--------
block/bsg.c | 5 +++--
block/scsi_ioctl.c | 5 +++--
drivers/cdrom/cdrom.c | 2 +-
drivers/scsi/scsi_tgt_lib.c | 2 +-
fs/bio.c | 33 +++++++++++++++++++--------------
include/linux/bio.h | 9 +++++----
include/linux/blkdev.h | 5 +++--
8 files changed, 47 insertions(+), 34 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index ea1bf53..ac21b73 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -41,7 +41,8 @@ static int __blk_rq_unmap_user(struct bio *bio)
}
static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned int len)
+ void __user *ubuf, unsigned int len,
+ gfp_t gfp_mask)
{
unsigned long uaddr;
unsigned int alignment;
@@ -57,9 +58,9 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
uaddr = (unsigned long) ubuf;
alignment = queue_dma_alignment(q) | q->dma_pad_mask;
if (!(uaddr & alignment) && !(len & alignment))
- bio = bio_map_user(q, NULL, uaddr, len, reading);
+ bio = bio_map_user(q, NULL, uaddr, len, reading, gfp_mask);
else
- bio = bio_copy_user(q, uaddr, len, reading);
+ bio = bio_copy_user(q, uaddr, len, reading, gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
@@ -90,6 +91,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* @rq: request structure to fill
* @ubuf: the user buffer
* @len: length of user data
+ * @gfp_mask: memory allocation flags
*
* Description:
* Data will be mapped directly for zero copy I/O, if possible. Otherwise
@@ -105,7 +107,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* unmapping.
*/
int blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned long len)
+ void __user *ubuf, unsigned long len, gfp_t gfp_mask)
{
unsigned long bytes_read = 0;
struct bio *bio = NULL;
@@ -132,7 +134,7 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
if (end - start > BIO_MAX_PAGES)
map_len -= PAGE_SIZE;
- ret = __blk_rq_map_user(q, rq, ubuf, map_len);
+ ret = __blk_rq_map_user(q, rq, ubuf, map_len, gfp_mask);
if (ret < 0)
goto unmap_rq;
if (!bio)
@@ -160,6 +162,7 @@ EXPORT_SYMBOL(blk_rq_map_user);
* @iov: pointer to the iovec
* @iov_count: number of elements in the iovec
* @len: I/O byte count
+ * @gfp_mask: memory allocation flags
*
* Description:
* Data will be mapped directly for zero copy I/O, if possible. Otherwise
@@ -175,7 +178,8 @@ EXPORT_SYMBOL(blk_rq_map_user);
* unmapping.
*/
int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct sg_iovec *iov, int iov_count, unsigned int len)
+ struct sg_iovec *iov, int iov_count, unsigned int len,
+ gfp_t gfp_mask)
{
struct bio *bio;
int i, read = rq_data_dir(rq) == READ;
@@ -194,9 +198,9 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
}
if (unaligned || (q->dma_pad_mask & len))
- bio = bio_copy_user_iov(q, iov, iov_count, read);
+ bio = bio_copy_user_iov(q, iov, iov_count, read, gfp_mask);
else
- bio = bio_map_user_iov(q, NULL, iov, iov_count, read);
+ bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
diff --git a/block/bsg.c b/block/bsg.c
index 0aae8d7..e7a142e 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -283,7 +283,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
next_rq->cmd_type = rq->cmd_type;
dxferp = (void*)(unsigned long)hdr->din_xferp;
- ret = blk_rq_map_user(q, next_rq, dxferp, hdr->din_xfer_len);
+ ret = blk_rq_map_user(q, next_rq, dxferp, hdr->din_xfer_len,
+ GFP_KERNEL);
if (ret)
goto out;
}
@@ -298,7 +299,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
dxfer_len = 0;
if (dxfer_len) {
- ret = blk_rq_map_user(q, rq, dxferp, dxfer_len);
+ ret = blk_rq_map_user(q, rq, dxferp, dxfer_len, GFP_KERNEL);
if (ret)
goto out;
}
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 3aab80a..f49d6a1 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -315,10 +315,11 @@ static int sg_io(struct file *file, struct request_queue *q,
}
ret = blk_rq_map_user_iov(q, rq, iov, hdr->iovec_count,
- hdr->dxfer_len);
+ hdr->dxfer_len, GFP_KERNEL);
kfree(iov);
} else if (hdr->dxfer_len)
- ret = blk_rq_map_user(q, rq, hdr->dxferp, hdr->dxfer_len);
+ ret = blk_rq_map_user(q, rq, hdr->dxferp, hdr->dxfer_len,
+ GFP_KERNEL);
if (ret)
goto out;
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 74031de..e861d24 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2097,7 +2097,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
len = nr * CD_FRAMESIZE_RAW;
- ret = blk_rq_map_user(q, rq, ubuf, len);
+ ret = blk_rq_map_user(q, rq, ubuf, len, GFP_KERNEL);
if (ret)
break;
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 257e097..2a4fd82 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -362,7 +362,7 @@ static int scsi_map_user_pages(struct scsi_tgt_cmd *tcmd, struct scsi_cmnd *cmd,
int err;
dprintk("%lx %u\n", uaddr, len);
- err = blk_rq_map_user(q, rq, (void *)uaddr, len);
+ err = blk_rq_map_user(q, rq, (void *)uaddr, len, GFP_KERNEL);
if (err) {
/*
* TODO: need to fixup sg_tablesize, max_segment_size,
diff --git a/fs/bio.c b/fs/bio.c
index 6a637b5..3d2e9ad 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -558,13 +558,14 @@ int bio_uncopy_user(struct bio *bio)
* @iov: the iovec.
* @iov_count: number of elements in the iovec
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
- int iov_count, int write_to_vm)
+ int iov_count, int write_to_vm, gfp_t gfp_mask)
{
struct bio_map_data *bmd;
struct bio_vec *bvec;
@@ -587,12 +588,12 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
len += iov[i].iov_len;
}
- bmd = bio_alloc_map_data(nr_pages, iov_count, GFP_KERNEL);
+ bmd = bio_alloc_map_data(nr_pages, iov_count, gfp_mask);
if (!bmd)
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
- bio = bio_alloc(GFP_KERNEL, nr_pages);
+ bio = bio_alloc(gfp_mask, nr_pages);
if (!bio)
goto out_bmd;
@@ -605,7 +606,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
if (bytes > len)
bytes = len;
- page = alloc_page(q->bounce_gfp | GFP_KERNEL);
+ page = alloc_page(q->bounce_gfp | gfp_mask);
if (!page) {
ret = -ENOMEM;
break;
@@ -647,26 +648,27 @@ out_bmd:
* @uaddr: start of user address
* @len: length in bytes
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
struct bio *bio_copy_user(struct request_queue *q, unsigned long uaddr,
- unsigned int len, int write_to_vm)
+ unsigned int len, int write_to_vm, gfp_t gfp_mask)
{
struct sg_iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_copy_user_iov(q, &iov, 1, write_to_vm);
+ return bio_copy_user_iov(q, &iov, 1, write_to_vm, gfp_mask);
}
static struct bio *__bio_map_user_iov(struct request_queue *q,
struct block_device *bdev,
struct sg_iovec *iov, int iov_count,
- int write_to_vm)
+ int write_to_vm, gfp_t gfp_mask)
{
int i, j;
int nr_pages = 0;
@@ -692,12 +694,12 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
if (!nr_pages)
return ERR_PTR(-EINVAL);
- bio = bio_alloc(GFP_KERNEL, nr_pages);
+ bio = bio_alloc(gfp_mask, nr_pages);
if (!bio)
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
- pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
+ pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
if (!pages)
goto out;
@@ -776,19 +778,21 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
* @uaddr: start of user address
* @len: length in bytes
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
- unsigned long uaddr, unsigned int len, int write_to_vm)
+ unsigned long uaddr, unsigned int len, int write_to_vm,
+ gfp_t gfp_mask)
{
struct sg_iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm);
+ return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm, gfp_mask);
}
/**
@@ -798,18 +802,19 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
* @iov: the iovec.
* @iov_count: number of elements in the iovec
* @write_to_vm: bool indicating writing to pages or not
+ * @gfp_mask: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
struct sg_iovec *iov, int iov_count,
- int write_to_vm)
+ int write_to_vm, gfp_t gfp_mask)
{
struct bio *bio;
- bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm);
-
+ bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm,
+ gfp_mask);
if (IS_ERR(bio))
return bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 13aba20..200b185 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -325,11 +325,11 @@ extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
unsigned int, unsigned int);
extern int bio_get_nr_vecs(struct block_device *);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
- unsigned long, unsigned int, int);
+ unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
extern struct bio *bio_map_user_iov(struct request_queue *,
struct block_device *,
- struct sg_iovec *, int, int);
+ struct sg_iovec *, int, int, gfp_t);
extern void bio_unmap_user(struct bio *);
extern struct bio *bio_map_kern(struct request_queue *, void *, unsigned int,
gfp_t);
@@ -337,9 +337,10 @@ extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int,
gfp_t, int);
extern void bio_set_pages_dirty(struct bio *bio);
extern void bio_check_pages_dirty(struct bio *bio);
-extern struct bio *bio_copy_user(struct request_queue *, unsigned long, unsigned int, int);
+extern struct bio *bio_copy_user(struct request_queue *, unsigned long,
+ unsigned int, int, gfp_t);
extern struct bio *bio_copy_user_iov(struct request_queue *, struct sg_iovec *,
- int, int);
+ int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
extern struct bio_vec *bvec_alloc_bs(gfp_t, int, unsigned long *, struct bio_set *);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ed9324f..9512c5b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -710,11 +710,12 @@ extern void __blk_stop_queue(struct request_queue *q);
extern void __blk_run_queue(struct request_queue *);
extern void blk_run_queue(struct request_queue *);
extern void blk_start_queueing(struct request_queue *);
-extern int blk_rq_map_user(struct request_queue *, struct request *, void __user *, unsigned long);
+extern int blk_rq_map_user(struct request_queue *, struct request *,
+ void __user *, unsigned long, gfp_t);
extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
- struct sg_iovec *, int, unsigned int);
+ struct sg_iovec *, int, unsigned int, gfp_t);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
--
1.5.5.GIT
* [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages
2008-08-28 7:17 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
@ 2008-08-28 7:17 ` FUJITA Tomonori
2008-08-28 7:17 ` [PATCH 3/5] sg: convert the non-data path to use the block layer FUJITA Tomonori
2008-08-28 7:34 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Jens Axboe
1 sibling, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-28 7:17 UTC (permalink / raw)
To: jens.axboe; +Cc: linux-scsi, dougg, michaelc, James.Bottomley, FUJITA Tomonori
This patch introduces struct rq_map_data to enable bio_copy_user_iov()
to use reserved pages.
Currently, bio_copy_user_iov allocates bounce pages but
drivers/scsi/sg.c wants to allocate pages by itself and use
them. struct rq_map_data can be used to pass allocated pages to
bio_copy_user_iov.
The current users of bio_copy_user_iov simply pass NULL (they don't
want to use pre-allocated pages).
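As an illustration only, the way the patched copy loop consumes the
caller's pages can be modelled in userspace. This is a sketch, not kernel
code: struct model_map_data, MODEL_PAGE_SHIFT (4 KiB pages assumed) and
model_count_segments() are hypothetical names, and the model leaves out
bio_add_page() and the actual data copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Userspace stand-in for the struct rq_map_data added by this patch. */
struct model_map_data {
	void **pages;		/* pre-allocated buffers, one per entry */
	int page_order;		/* each entry spans 2^page_order pages */
	int nr_entries;		/* number of entries in pages[] */
};

/*
 * Walk a transfer of len bytes the way the map_data branch of the loop
 * does: every segment is at most 1 << (PAGE_SHIFT + page_order) bytes
 * and uses up one entry.  Returns the number of entries consumed, or
 * -ENOMEM when the reserved entries run out before len is covered.
 */
static int model_count_segments(const struct model_map_data *md,
				unsigned int len)
{
	unsigned int seg;
	int i = 0;

	assert(md != NULL && md->page_order >= 0);
	seg = 1U << (MODEL_PAGE_SHIFT + md->page_order);

	while (len) {
		unsigned int bytes = seg;

		if (bytes > len)
			bytes = len;
		if (i == md->nr_entries)
			return -ENOMEM;	/* caller supplied too few pages */
		i++;			/* stands in for pages[i++] */
		len -= bytes;
	}
	return i;
}
```

With page_order = 1 and two entries, each entry covers 8192 bytes in
this model, so a transfer of up to 16384 bytes fits and anything larger
fails with -ENOMEM instead of falling back to alloc_page().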
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
block/blk-map.c | 26 ++++++++++++-------
block/bsg.c | 7 +++--
block/scsi_ioctl.c | 4 +-
drivers/cdrom/cdrom.c | 2 +-
drivers/scsi/scsi_tgt_lib.c | 2 +-
fs/bio.c | 58 ++++++++++++++++++++++++++++++------------
include/linux/bio.h | 8 +++--
include/linux/blkdev.h | 12 +++++++-
8 files changed, 80 insertions(+), 39 deletions(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index ac21b73..dad6a29 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -41,8 +41,8 @@ static int __blk_rq_unmap_user(struct bio *bio)
}
static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned int len,
- gfp_t gfp_mask)
+ struct rq_map_data *map_data, void __user *ubuf,
+ unsigned int len, gfp_t gfp_mask)
{
unsigned long uaddr;
unsigned int alignment;
@@ -57,10 +57,10 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
*/
uaddr = (unsigned long) ubuf;
alignment = queue_dma_alignment(q) | q->dma_pad_mask;
- if (!(uaddr & alignment) && !(len & alignment))
+ if (!(uaddr & alignment) && !(len & alignment) && !map_data)
bio = bio_map_user(q, NULL, uaddr, len, reading, gfp_mask);
else
- bio = bio_copy_user(q, uaddr, len, reading, gfp_mask);
+ bio = bio_copy_user(q, map_data, uaddr, len, reading, gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
@@ -89,6 +89,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* blk_rq_map_user - map user data to a request, for REQ_TYPE_BLOCK_PC usage
* @q: request queue where request should be inserted
* @rq: request structure to fill
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @ubuf: the user buffer
* @len: length of user data
* @gfp_mask: memory allocation flags
@@ -107,7 +108,8 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
* unmapping.
*/
int blk_rq_map_user(struct request_queue *q, struct request *rq,
- void __user *ubuf, unsigned long len, gfp_t gfp_mask)
+ struct rq_map_data *map_data, void __user *ubuf,
+ unsigned long len, gfp_t gfp_mask)
{
unsigned long bytes_read = 0;
struct bio *bio = NULL;
@@ -134,7 +136,8 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
if (end - start > BIO_MAX_PAGES)
map_len -= PAGE_SIZE;
- ret = __blk_rq_map_user(q, rq, ubuf, map_len, gfp_mask);
+ ret = __blk_rq_map_user(q, rq, map_data, ubuf, map_len,
+ gfp_mask);
if (ret < 0)
goto unmap_rq;
if (!bio)
@@ -159,6 +162,7 @@ EXPORT_SYMBOL(blk_rq_map_user);
* blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
* @q: request queue where request should be inserted
* @rq: request to map data to
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @iov: pointer to the iovec
* @iov_count: number of elements in the iovec
* @len: I/O byte count
@@ -178,8 +182,8 @@ EXPORT_SYMBOL(blk_rq_map_user);
* unmapping.
*/
int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
- struct sg_iovec *iov, int iov_count, unsigned int len,
- gfp_t gfp_mask)
+ struct rq_map_data *map_data, struct sg_iovec *iov,
+ int iov_count, unsigned int len, gfp_t gfp_mask)
{
struct bio *bio;
int i, read = rq_data_dir(rq) == READ;
@@ -197,8 +201,9 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
}
}
- if (unaligned || (q->dma_pad_mask & len))
- bio = bio_copy_user_iov(q, iov, iov_count, read, gfp_mask);
+ if (unaligned || (q->dma_pad_mask & len) || map_data)
+ bio = bio_copy_user_iov(q, map_data, iov, iov_count, read,
+ gfp_mask);
else
bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
@@ -220,6 +225,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
rq->buffer = rq->data = NULL;
return 0;
}
+EXPORT_SYMBOL(blk_rq_map_user_iov);
/**
* blk_rq_unmap_user - unmap a request with user data
diff --git a/block/bsg.c b/block/bsg.c
index e7a142e..56cb343 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -283,8 +283,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
next_rq->cmd_type = rq->cmd_type;
dxferp = (void*)(unsigned long)hdr->din_xferp;
- ret = blk_rq_map_user(q, next_rq, dxferp, hdr->din_xfer_len,
- GFP_KERNEL);
+ ret = blk_rq_map_user(q, next_rq, NULL, dxferp,
+ hdr->din_xfer_len, GFP_KERNEL);
if (ret)
goto out;
}
@@ -299,7 +299,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, int has_write_perm)
dxfer_len = 0;
if (dxfer_len) {
- ret = blk_rq_map_user(q, rq, dxferp, dxfer_len, GFP_KERNEL);
+ ret = blk_rq_map_user(q, rq, NULL, dxferp, dxfer_len,
+ GFP_KERNEL);
if (ret)
goto out;
}
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index f49d6a1..c34272a 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -314,11 +314,11 @@ static int sg_io(struct file *file, struct request_queue *q,
goto out;
}
- ret = blk_rq_map_user_iov(q, rq, iov, hdr->iovec_count,
+ ret = blk_rq_map_user_iov(q, rq, NULL, iov, hdr->iovec_count,
hdr->dxfer_len, GFP_KERNEL);
kfree(iov);
} else if (hdr->dxfer_len)
- ret = blk_rq_map_user(q, rq, hdr->dxferp, hdr->dxfer_len,
+ ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
GFP_KERNEL);
if (ret)
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index e861d24..d47f2f8 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2097,7 +2097,7 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
len = nr * CD_FRAMESIZE_RAW;
- ret = blk_rq_map_user(q, rq, ubuf, len, GFP_KERNEL);
+ ret = blk_rq_map_user(q, rq, NULL, ubuf, len, GFP_KERNEL);
if (ret)
break;
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 2a4fd82..3117bb1 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -362,7 +362,7 @@ static int scsi_map_user_pages(struct scsi_tgt_cmd *tcmd, struct scsi_cmnd *cmd,
int err;
dprintk("%lx %u\n", uaddr, len);
- err = blk_rq_map_user(q, rq, (void *)uaddr, len, GFP_KERNEL);
+ err = blk_rq_map_user(q, rq, NULL, (void *)uaddr, len, GFP_KERNEL);
if (err) {
/*
* TODO: need to fixup sg_tablesize, max_segment_size,
diff --git a/fs/bio.c b/fs/bio.c
index 3d2e9ad..a2f0726 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -439,16 +439,19 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
struct bio_map_data {
struct bio_vec *iovecs;
- int nr_sgvecs;
struct sg_iovec *sgvecs;
+ int nr_sgvecs;
+ int is_our_pages;
};
static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
- struct sg_iovec *iov, int iov_count)
+ struct sg_iovec *iov, int iov_count,
+ int is_our_pages)
{
memcpy(bmd->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
memcpy(bmd->sgvecs, iov, sizeof(struct sg_iovec) * iov_count);
bmd->nr_sgvecs = iov_count;
+ bmd->is_our_pages = is_our_pages;
bio->bi_private = bmd;
}
@@ -483,7 +486,8 @@ static struct bio_map_data *bio_alloc_map_data(int nr_segs, int iov_count,
}
static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
- struct sg_iovec *iov, int iov_count, int uncopy)
+ struct sg_iovec *iov, int iov_count, int uncopy,
+ int do_free_page)
{
int ret = 0, i;
struct bio_vec *bvec;
@@ -526,7 +530,7 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
}
}
- if (uncopy)
+ if (do_free_page)
__free_page(bvec->bv_page);
}
@@ -545,7 +549,8 @@ int bio_uncopy_user(struct bio *bio)
struct bio_map_data *bmd = bio->bi_private;
int ret;
- ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, bmd->nr_sgvecs, 1);
+ ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, bmd->nr_sgvecs, 1,
+ bmd->is_our_pages);
bio_free_map_data(bmd);
bio_put(bio);
@@ -555,6 +560,7 @@ int bio_uncopy_user(struct bio *bio)
/**
* bio_copy_user_iov - copy user data to bio
* @q: destination block queue
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @iov: the iovec.
* @iov_count: number of elements in the iovec
* @write_to_vm: bool indicating writing to pages or not
@@ -564,8 +570,10 @@ int bio_uncopy_user(struct bio *bio)
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
-struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
- int iov_count, int write_to_vm, gfp_t gfp_mask)
+struct bio *bio_copy_user_iov(struct request_queue *q,
+ struct rq_map_data *map_data,
+ struct sg_iovec *iov, int iov_count,
+ int write_to_vm, gfp_t gfp_mask)
{
struct bio_map_data *bmd;
struct bio_vec *bvec;
@@ -600,13 +608,26 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
bio->bi_rw |= (!write_to_vm << BIO_RW);
ret = 0;
+ i = 0;
while (len) {
- unsigned int bytes = PAGE_SIZE;
+ unsigned int bytes;
+
+ if (map_data)
+ bytes = 1U << (PAGE_SHIFT + map_data->page_order);
+ else
+ bytes = PAGE_SIZE;
if (bytes > len)
bytes = len;
- page = alloc_page(q->bounce_gfp | gfp_mask);
+ if (map_data) {
+ if (i == map_data->nr_entries) {
+ ret = -ENOMEM;
+ break;
+ }
+ page = map_data->pages[i++];
+ } else
+ page = alloc_page(q->bounce_gfp | gfp_mask);
if (!page) {
ret = -ENOMEM;
break;
@@ -625,16 +646,17 @@ struct bio *bio_copy_user_iov(struct request_queue *q, struct sg_iovec *iov,
* success
*/
if (!write_to_vm) {
- ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0);
+ ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0, 0);
if (ret)
goto cleanup;
}
- bio_set_map_data(bmd, bio, iov, iov_count);
+ bio_set_map_data(bmd, bio, iov, iov_count, map_data ? 0 : 1);
return bio;
cleanup:
- bio_for_each_segment(bvec, bio, i)
- __free_page(bvec->bv_page);
+ if (!map_data)
+ bio_for_each_segment(bvec, bio, i)
+ __free_page(bvec->bv_page);
bio_put(bio);
out_bmd:
@@ -645,6 +667,7 @@ out_bmd:
/**
* bio_copy_user - copy user data to bio
* @q: destination block queue
+ * @map_data: pointer to the rq_map_data holding pages (if necessary)
* @uaddr: start of user address
* @len: length in bytes
* @write_to_vm: bool indicating writing to pages or not
@@ -654,15 +677,16 @@ out_bmd:
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
-struct bio *bio_copy_user(struct request_queue *q, unsigned long uaddr,
- unsigned int len, int write_to_vm, gfp_t gfp_mask)
+struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
+ unsigned long uaddr, unsigned int len,
+ int write_to_vm, gfp_t gfp_mask)
{
struct sg_iovec iov;
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
- return bio_copy_user_iov(q, &iov, 1, write_to_vm, gfp_mask);
+ return bio_copy_user_iov(q, map_data, &iov, 1, write_to_vm, gfp_mask);
}
static struct bio *__bio_map_user_iov(struct request_queue *q,
@@ -1028,7 +1052,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
bio->bi_private = bmd;
bio->bi_end_io = bio_copy_kern_endio;
- bio_set_map_data(bmd, bio, &iov, 1);
+ bio_set_map_data(bmd, bio, &iov, 1, 1);
return bio;
cleanup:
bio_for_each_segment(bvec, bio, i)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 200b185..bc386cd 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -327,6 +327,7 @@ extern int bio_get_nr_vecs(struct block_device *);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
+struct rq_map_data;
extern struct bio *bio_map_user_iov(struct request_queue *,
struct block_device *,
struct sg_iovec *, int, int, gfp_t);
@@ -337,9 +338,10 @@ extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int,
gfp_t, int);
extern void bio_set_pages_dirty(struct bio *bio);
extern void bio_check_pages_dirty(struct bio *bio);
-extern struct bio *bio_copy_user(struct request_queue *, unsigned long,
- unsigned int, int, gfp_t);
-extern struct bio *bio_copy_user_iov(struct request_queue *, struct sg_iovec *,
+extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
+ unsigned long, unsigned int, int, gfp_t);
+extern struct bio *bio_copy_user_iov(struct request_queue *,
+ struct rq_map_data *, struct sg_iovec *,
int, int, gfp_t);
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9512c5b..d2faa72 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -642,6 +642,12 @@ static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
}
#endif /* CONFIG_MMU */
+struct rq_map_data {
+ struct page **pages;
+ int page_order;
+ int nr_entries;
+};
+
struct req_iterator {
int i;
struct bio *bio;
@@ -711,11 +717,13 @@ extern void __blk_run_queue(struct request_queue *);
extern void blk_run_queue(struct request_queue *);
extern void blk_start_queueing(struct request_queue *);
extern int blk_rq_map_user(struct request_queue *, struct request *,
- void __user *, unsigned long, gfp_t);
+ struct rq_map_data *, void __user *, unsigned long,
+ gfp_t);
extern int blk_rq_unmap_user(struct bio *);
extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
- struct sg_iovec *, int, unsigned int, gfp_t);
+ struct rq_map_data *, struct sg_iovec *, int,
+ unsigned int, gfp_t);
extern int blk_execute_rq(struct request_queue *, struct gendisk *,
struct request *, int);
extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
--
1.5.5.GIT
* [PATCH 3/5] sg: convert the non-data path to use the block layer
2008-08-28 7:17 ` [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages FUJITA Tomonori
@ 2008-08-28 7:17 ` FUJITA Tomonori
2008-08-28 7:17 ` [PATCH 4/5] sg: convert the direct IO " FUJITA Tomonori
0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-28 7:17 UTC (permalink / raw)
To: jens.axboe; +Cc: linux-scsi, dougg, michaelc, James.Bottomley, FUJITA Tomonori
This patch converts the non-data path to use the block layer functions
(blk_get_request, blk_execute_rq_nowait, etc) instead of
scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
drivers/scsi/sg.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 48 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 661f9f2..487c777 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -137,6 +137,7 @@ typedef struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */
char orphan; /* 1 -> drop on sight, 0 -> normal */
char sg_io_owned; /* 1 -> packet belongs to SG_IO */
volatile char done; /* 0->before bh, 1->before read, 2->read */
+ struct request *rq;
} Sg_request;
typedef struct sg_fd { /* holds the state of a file descriptor */
@@ -176,7 +177,7 @@ typedef struct sg_device { /* holds the state of each scsi generic device */
static int sg_fasync(int fd, struct file *filp, int mode);
/* tasklet or soft irq callback */
static void sg_cmd_done(void *data, char *sense, int result, int resid);
-static int sg_start_req(Sg_request * srp);
+static int sg_start_req(Sg_request *srp, unsigned char *cmd);
static void sg_finish_rem_req(Sg_request * srp);
static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size);
static int sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp,
@@ -229,6 +230,11 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
cmd, filp->f_mode & FMODE_WRITE);
}
+static void sg_rq_end_io(struct request *rq, int uptodate)
+{
+ sg_cmd_done(rq->end_io_data, rq->sense, rq->errors, rq->data_len);
+}
+
static int
sg_open(struct inode *inode, struct file *filp)
{
@@ -732,7 +738,8 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
SCSI_LOG_TIMEOUT(4, printk("sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n",
(int) cmnd[0], (int) hp->cmd_len));
- if ((k = sg_start_req(srp))) {
+ k = sg_start_req(srp, cmnd);
+ if (k) {
SCSI_LOG_TIMEOUT(1, printk("sg_common_write: start_req err=%d\n", k));
sg_finish_rem_req(srp);
return k; /* probably out of space --> ENOMEM */
@@ -765,6 +772,12 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
hp->duration = jiffies_to_msecs(jiffies);
/* Now send everything of to mid-level. The next time we hear about this
packet is when sg_cmd_done() is called (i.e. a callback). */
+ if (srp->rq) {
+ srp->rq->timeout = timeout;
+ blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
+ srp->rq, 1, sg_rq_end_io);
+ return 0;
+ }
if (scsi_execute_async(sdp->device, cmnd, hp->cmd_len, data_dir, srp->data.buffer,
hp->dxfer_len, srp->data.k_use_sg, timeout,
SG_DEFAULT_RETRIES, srp, sg_cmd_done,
@@ -1634,8 +1647,32 @@ exit_sg(void)
idr_destroy(&sg_index_idr);
}
-static int
-sg_start_req(Sg_request * srp)
+static int __sg_start_req(struct sg_request *srp, struct sg_io_hdr *hp,
+ unsigned char *cmd)
+{
+ struct sg_fd *sfp = srp->parentfp;
+ struct request_queue *q = sfp->parentdp->device->request_queue;
+ struct request *rq;
+ int rw = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ;
+
+ rq = blk_get_request(q, rw, GFP_ATOMIC);
+ if (!rq)
+ return -ENOMEM;
+
+ memcpy(rq->cmd, cmd, hp->cmd_len);
+
+ rq->cmd_len = hp->cmd_len;
+ rq->cmd_type = REQ_TYPE_BLOCK_PC;
+
+ srp->rq = rq;
+ rq->end_io_data = srp;
+ rq->sense = srp->sense_b;
+ rq->retries = SG_DEFAULT_RETRIES;
+
+ return 0;
+}
+
+static int sg_start_req(Sg_request *srp, unsigned char *cmd)
{
int res;
Sg_fd *sfp = srp->parentfp;
@@ -1646,8 +1683,10 @@ sg_start_req(Sg_request * srp)
Sg_scatter_hold *rsv_schp = &sfp->reserve;
SCSI_LOG_TIMEOUT(4, printk("sg_start_req: dxfer_len=%d\n", dxfer_len));
+
if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE))
- return 0;
+ return __sg_start_req(srp, hp, cmd);
+
if (sg_allow_dio && (hp->flags & SG_FLAG_DIRECT_IO) &&
(dxfer_dir != SG_DXFER_UNKNOWN) && (0 == hp->iovec_count) &&
(!sfp->parentdp->device->host->unchecked_isa_dma)) {
@@ -1678,6 +1717,10 @@ sg_finish_rem_req(Sg_request * srp)
sg_unlink_reserve(sfp, srp);
else
sg_remove_scat(req_schp);
+
+ if (srp->rq)
+ blk_put_request(srp->rq);
+
sg_remove_request(sfp, srp);
}
--
1.5.5.GIT
* [PATCH 4/5] sg: convert the direct IO path to use the block layer
2008-08-28 7:17 ` [PATCH 3/5] sg: convert the non-data path to use the block layer FUJITA Tomonori
@ 2008-08-28 7:17 ` FUJITA Tomonori
2008-08-28 7:17 ` [PATCH 5/5] sg: convert the indirect " FUJITA Tomonori
0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-28 7:17 UTC (permalink / raw)
To: jens.axboe; +Cc: linux-scsi, dougg, michaelc, James.Bottomley, FUJITA Tomonori
This patch converts the direct IO path (SG_FLAG_DIRECT_IO) to use the
block layer functions (blk_get_request, blk_execute_rq_nowait,
blk_rq_map_user, etc) instead of scsi_execute_async().
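For illustration, the alignment test that decides whether a request may
take the direct IO path can be sketched in userspace as follows. The
helper name model_dio_aligned() is hypothetical, and the sketch assumes
both masks have the usual 2^n - 1 form; the other conditions
(sg_allow_dio, SG_FLAG_DIRECT_IO, iovec_count, unchecked_isa_dma) are
not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the check added to sg_start_req(): direct IO is attempted
 * only when both the user address and the transfer length are aligned
 * to the combined mask (queue_dma_alignment(q) | q->dma_pad_mask).
 */
static bool model_dio_aligned(unsigned long uaddr, unsigned int len,
			      unsigned long dma_alignment,
			      unsigned long dma_pad_mask)
{
	unsigned long alignment = dma_alignment | dma_pad_mask;

	/* assumption: both masks are of the form 2^n - 1 */
	assert((dma_alignment & (dma_alignment + 1)) == 0);
	assert((dma_pad_mask & (dma_pad_mask + 1)) == 0);

	return !(uaddr & alignment) && !(len & alignment);
}
```

A request that fails this test is not rejected; in the patch it simply
continues to the indirect (copying) path.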
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
drivers/scsi/sg.c | 173 ++++++++--------------------------------------------
1 files changed, 27 insertions(+), 146 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 487c777..cb6de07 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -138,6 +138,7 @@ typedef struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */
char sg_io_owned; /* 1 -> packet belongs to SG_IO */
volatile char done; /* 0->before bh, 1->before read, 2->read */
struct request *rq;
+ struct bio *bio;
} Sg_request;
typedef struct sg_fd { /* holds the state of a file descriptor */
@@ -1679,21 +1680,29 @@ static int sg_start_req(Sg_request *srp, unsigned char *cmd)
sg_io_hdr_t *hp = &srp->header;
int dxfer_len = (int) hp->dxfer_len;
int dxfer_dir = hp->dxfer_direction;
+ unsigned long uaddr = (unsigned long)hp->dxferp;
Sg_scatter_hold *req_schp = &srp->data;
Sg_scatter_hold *rsv_schp = &sfp->reserve;
+ struct request_queue *q = sfp->parentdp->device->request_queue;
+ unsigned long alignment = queue_dma_alignment(q) | q->dma_pad_mask;
SCSI_LOG_TIMEOUT(4, printk("sg_start_req: dxfer_len=%d\n", dxfer_len));
if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE))
return __sg_start_req(srp, hp, cmd);
+#ifdef SG_ALLOW_DIO_CODE
if (sg_allow_dio && (hp->flags & SG_FLAG_DIRECT_IO) &&
(dxfer_dir != SG_DXFER_UNKNOWN) && (0 == hp->iovec_count) &&
- (!sfp->parentdp->device->host->unchecked_isa_dma)) {
- res = sg_build_direct(srp, sfp, dxfer_len);
- if (res <= 0) /* -ve -> error, 0 -> done, 1 -> try indirect */
- return res;
+ (!sfp->parentdp->device->host->unchecked_isa_dma) &&
+ !(uaddr & alignment) && !(dxfer_len & alignment)) {
+ res = __sg_start_req(srp, hp, cmd);
+ if (!res)
+ res = sg_build_direct(srp, sfp, dxfer_len);
+
+ return res;
}
+#endif
if ((!sg_res_in_use(sfp)) && (dxfer_len <= rsv_schp->bufflen))
sg_link_reserve(sfp, srp, dxfer_len);
else {
@@ -1718,8 +1727,11 @@ sg_finish_rem_req(Sg_request * srp)
else
sg_remove_scat(req_schp);
- if (srp->rq)
+ if (srp->rq) {
+ if (srp->bio)
+ blk_rq_unmap_user(srp->bio);
blk_put_request(srp->rq);
+ }
sg_remove_request(sfp, srp);
}
@@ -1746,151 +1758,23 @@ sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize)
return tablesize; /* number of scat_gath elements allocated */
}
-#ifdef SG_ALLOW_DIO_CODE
-/* vvvvvvvv following code borrowed from st driver's direct IO vvvvvvvvv */
- /* TODO: hopefully we can use the generic block layer code */
-
-/* Pin down user pages and put them into a scatter gather list. Returns <= 0 if
- - mapping of all pages not successful
- (i.e., either completely successful or fails)
-*/
-static int
-st_map_user_pages(struct scatterlist *sgl, const unsigned int max_pages,
- unsigned long uaddr, size_t count, int rw)
-{
- unsigned long end = (uaddr + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned long start = uaddr >> PAGE_SHIFT;
- const int nr_pages = end - start;
- int res, i, j;
- struct page **pages;
-
- /* User attempted Overflow! */
- if ((uaddr + count) < uaddr)
- return -EINVAL;
-
- /* Too big */
- if (nr_pages > max_pages)
- return -ENOMEM;
-
- /* Hmm? */
- if (count == 0)
- return 0;
-
- if ((pages = kmalloc(max_pages * sizeof(*pages), GFP_ATOMIC)) == NULL)
- return -ENOMEM;
-
- /* Try to fault in all of the necessary pages */
- down_read(&current->mm->mmap_sem);
- /* rw==READ means read from drive, write into memory area */
- res = get_user_pages(
- current,
- current->mm,
- uaddr,
- nr_pages,
- rw == READ,
- 0, /* don't force */
- pages,
- NULL);
- up_read(&current->mm->mmap_sem);
-
- /* Errors and no page mapped should return here */
- if (res < nr_pages)
- goto out_unmap;
-
- for (i=0; i < nr_pages; i++) {
- /* FIXME: flush superflous for rw==READ,
- * probably wrong function for rw==WRITE
- */
- flush_dcache_page(pages[i]);
- /* ?? Is locking needed? I don't think so */
- /* if (!trylock_page(pages[i]))
- goto out_unlock; */
- }
-
- sg_set_page(sgl, pages[0], 0, uaddr & ~PAGE_MASK);
- if (nr_pages > 1) {
- sgl[0].length = PAGE_SIZE - sgl[0].offset;
- count -= sgl[0].length;
- for (i=1; i < nr_pages ; i++)
- sg_set_page(&sgl[i], pages[i], count < PAGE_SIZE ? count : PAGE_SIZE, 0);
- }
- else {
- sgl[0].length = count;
- }
-
- kfree(pages);
- return nr_pages;
-
- out_unmap:
- if (res > 0) {
- for (j=0; j < res; j++)
- page_cache_release(pages[j]);
- res = 0;
- }
- kfree(pages);
- return res;
-}
-
-
-/* And unmap them... */
-static int
-st_unmap_user_pages(struct scatterlist *sgl, const unsigned int nr_pages,
- int dirtied)
-{
- int i;
-
- for (i=0; i < nr_pages; i++) {
- struct page *page = sg_page(&sgl[i]);
-
- if (dirtied)
- SetPageDirty(page);
- /* unlock_page(page); */
- /* FIXME: cache flush missing for rw==READ
- * FIXME: call the correct reference counting function
- */
- page_cache_release(page);
- }
-
- return 0;
-}
-
-/* ^^^^^^^^ above code borrowed from st driver's direct IO ^^^^^^^^^ */
-#endif
-
-
/* Returns: -ve -> error, 0 -> done, 1 -> try indirect */
static int
sg_build_direct(Sg_request * srp, Sg_fd * sfp, int dxfer_len)
{
-#ifdef SG_ALLOW_DIO_CODE
sg_io_hdr_t *hp = &srp->header;
Sg_scatter_hold *schp = &srp->data;
- int sg_tablesize = sfp->parentdp->sg_tablesize;
- int mx_sc_elems, res;
- struct scsi_device *sdev = sfp->parentdp->device;
-
- if (((unsigned long)hp->dxferp &
- queue_dma_alignment(sdev->request_queue)) != 0)
- return 1;
+ int res;
+ struct request *rq = srp->rq;
+ struct request_queue *q = sfp->parentdp->device->request_queue;
- mx_sc_elems = sg_build_sgat(schp, sfp, sg_tablesize);
- if (mx_sc_elems <= 0) {
- return 1;
- }
- res = st_map_user_pages(schp->buffer, mx_sc_elems,
- (unsigned long)hp->dxferp, dxfer_len,
- (SG_DXFER_TO_DEV == hp->dxfer_direction) ? 1 : 0);
- if (res <= 0) {
- sg_remove_scat(schp);
- return 1;
- }
- schp->k_use_sg = res;
+ res = blk_rq_map_user(q, rq, NULL, hp->dxferp, dxfer_len, GFP_ATOMIC);
+ if (res)
+ return res;
+ srp->bio = rq->bio;
schp->dio_in_use = 1;
hp->info |= SG_INFO_DIRECT_IO;
return 0;
-#else
- return 1;
-#endif
}
static int
@@ -2069,11 +1953,7 @@ sg_remove_scat(Sg_scatter_hold * schp)
if (schp->buffer && (schp->sglist_len > 0)) {
struct scatterlist *sg = schp->buffer;
- if (schp->dio_in_use) {
-#ifdef SG_ALLOW_DIO_CODE
- st_unmap_user_pages(sg, schp->k_use_sg, TRUE);
-#endif
- } else {
+ if (!schp->dio_in_use) {
int k;
for (k = 0; (k < schp->k_use_sg) && sg_page(sg);
@@ -2083,8 +1963,9 @@ sg_remove_scat(Sg_scatter_hold * schp)
k, sg_page(sg), sg->length));
sg_page_free(sg_page(sg), sg->length);
}
+
+ kfree(schp->buffer);
}
- kfree(schp->buffer);
}
memset(schp, 0, sizeof (*schp));
}
--
1.5.5.GIT
* [PATCH 5/5] sg: convert the indirect IO path to use the block layer
2008-08-28 7:17 ` [PATCH 4/5] sg: convert the direct IO " FUJITA Tomonori
@ 2008-08-28 7:17 ` FUJITA Tomonori
0 siblings, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-28 7:17 UTC (permalink / raw)
To: jens.axboe; +Cc: linux-scsi, dougg, michaelc, James.Bottomley, FUJITA Tomonori
This patch converts the indirect IO path (including mmap IO and old
struct sg_header) to use the block layer functions (blk_get_request,
blk_execute_rq_nowait, blk_rq_map_user, etc) instead of
scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
drivers/scsi/sg.c | 393 ++++++++++++++---------------------------------------
1 files changed, 103 insertions(+), 290 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index cb6de07..56a5d96 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -47,7 +47,6 @@ static int sg_version_num = 30534; /* 2 digits for each component */
#include <linux/seq_file.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
-#include <linux/scatterlist.h>
#include <linux/blktrace_api.h>
#include <linux/smp_lock.h>
@@ -119,7 +118,8 @@ typedef struct sg_scatter_hold { /* holding area for scsi scatter gather info */
unsigned sglist_len; /* size of malloc'd scatter-gather list ++ */
unsigned bufflen; /* Size of (aggregate) data buffer */
unsigned b_malloc_len; /* actual len malloc'ed in buffer */
- struct scatterlist *buffer;/* scatter list */
+ struct page **pages;
+ int page_order;
char dio_in_use; /* 0->indirect IO (or mmap), 1->dio */
unsigned char cmd_opcode; /* first byte of command */
} Sg_scatter_hold;
@@ -190,8 +190,6 @@ static ssize_t sg_new_write(Sg_fd *sfp, struct file *file,
int read_only, Sg_request **o_srp);
static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
unsigned char *cmnd, int timeout, int blocking);
-static int sg_u_iovec(sg_io_hdr_t * hp, int sg_num, int ind,
- int wr_xf, int *countp, unsigned char __user **up);
static int sg_write_xfer(Sg_request * srp);
static int sg_read_xfer(Sg_request * srp);
static int sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer);
@@ -199,8 +197,6 @@ static void sg_remove_scat(Sg_scatter_hold * schp);
static void sg_build_reserve(Sg_fd * sfp, int req_size);
static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
-static struct page *sg_page_malloc(int rqSz, int lowDma, int *retSzp);
-static void sg_page_free(struct page *page, int size);
static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
@@ -771,26 +767,11 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
break;
}
hp->duration = jiffies_to_msecs(jiffies);
-/* Now send everything of to mid-level. The next time we hear about this
- packet is when sg_cmd_done() is called (i.e. a callback). */
- if (srp->rq) {
- srp->rq->timeout = timeout;
- blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
- srp->rq, 1, sg_rq_end_io);
- return 0;
- }
- if (scsi_execute_async(sdp->device, cmnd, hp->cmd_len, data_dir, srp->data.buffer,
- hp->dxfer_len, srp->data.k_use_sg, timeout,
- SG_DEFAULT_RETRIES, srp, sg_cmd_done,
- GFP_ATOMIC)) {
- SCSI_LOG_TIMEOUT(1, printk("sg_common_write: scsi_execute_async failed\n"));
- /*
- * most likely out of mem, but could also be a bad map
- */
- sg_finish_rem_req(srp);
- return -ENOMEM;
- } else
- return 0;
+
+ srp->rq->timeout = timeout;
+ blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
+ srp->rq, 1, sg_rq_end_io);
+ return 0;
}
static int
@@ -1206,8 +1187,7 @@ sg_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
Sg_fd *sfp;
unsigned long offset, len, sa;
Sg_scatter_hold *rsv_schp;
- struct scatterlist *sg;
- int k;
+ int k, length;
if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data)))
return VM_FAULT_SIGBUS;
@@ -1217,15 +1197,14 @@ sg_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
SCSI_LOG_TIMEOUT(3, printk("sg_vma_fault: offset=%lu, scatg=%d\n",
offset, rsv_schp->k_use_sg));
- sg = rsv_schp->buffer;
sa = vma->vm_start;
- for (k = 0; (k < rsv_schp->k_use_sg) && (sa < vma->vm_end);
- ++k, sg = sg_next(sg)) {
+ length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
+ for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) {
len = vma->vm_end - sa;
- len = (len < sg->length) ? len : sg->length;
+ len = (len < length) ? len : length;
if (offset < len) {
- struct page *page;
- page = virt_to_page(page_address(sg_page(sg)) + offset);
+ struct page *page = nth_page(rsv_schp->pages[k],
+ offset >> PAGE_SHIFT);
get_page(page); /* increment page count */
vmf->page = page;
return 0; /* success */
@@ -1247,8 +1226,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
Sg_fd *sfp;
unsigned long req_sz, len, sa;
Sg_scatter_hold *rsv_schp;
- int k;
- struct scatterlist *sg;
+ int k, length;
if ((!filp) || (!vma) || (!(sfp = (Sg_fd *) filp->private_data)))
return -ENXIO;
@@ -1262,11 +1240,10 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
return -ENOMEM; /* cannot map more than reserved buffer */
sa = vma->vm_start;
- sg = rsv_schp->buffer;
- for (k = 0; (k < rsv_schp->k_use_sg) && (sa < vma->vm_end);
- ++k, sg = sg_next(sg)) {
+ length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
+ for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) {
len = vma->vm_end - sa;
- len = (len < sg->length) ? len : sg->length;
+ len = (len < length) ? len : length;
sa += len;
}
@@ -1310,7 +1287,6 @@ sg_cmd_done(void *data, char *sense, int result, int resid)
if (0 != result) {
struct scsi_sense_hdr sshdr;
- memcpy(srp->sense_b, sense, sizeof (srp->sense_b));
srp->header.status = 0xff & result;
srp->header.masked_status = status_byte(result);
srp->header.msg_status = msg_byte(result);
@@ -1685,34 +1661,51 @@ static int sg_start_req(Sg_request *srp, unsigned char *cmd)
Sg_scatter_hold *rsv_schp = &sfp->reserve;
struct request_queue *q = sfp->parentdp->device->request_queue;
unsigned long alignment = queue_dma_alignment(q) | q->dma_pad_mask;
+ struct rq_map_data map_data;
SCSI_LOG_TIMEOUT(4, printk("sg_start_req: dxfer_len=%d\n", dxfer_len));
+ res = __sg_start_req(srp, hp, cmd);
+ if (res)
+ return res;
+
if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE))
- return __sg_start_req(srp, hp, cmd);
+ return 0;
#ifdef SG_ALLOW_DIO_CODE
if (sg_allow_dio && (hp->flags & SG_FLAG_DIRECT_IO) &&
(dxfer_dir != SG_DXFER_UNKNOWN) && (0 == hp->iovec_count) &&
(!sfp->parentdp->device->host->unchecked_isa_dma) &&
- !(uaddr & alignment) && !(dxfer_len & alignment)) {
- res = __sg_start_req(srp, hp, cmd);
- if (!res)
- res = sg_build_direct(srp, sfp, dxfer_len);
-
- return res;
- }
+ !(uaddr & alignment) && !(dxfer_len & alignment))
+ return sg_build_direct(srp, sfp, dxfer_len);
#endif
if ((!sg_res_in_use(sfp)) && (dxfer_len <= rsv_schp->bufflen))
sg_link_reserve(sfp, srp, dxfer_len);
- else {
+ else
res = sg_build_indirect(req_schp, sfp, dxfer_len);
- if (res) {
- sg_remove_scat(req_schp);
- return res;
- }
+
+ if (!res) {
+ struct request *rq = srp->rq;
+ Sg_scatter_hold *schp = &srp->data;
+ int iovec_count = (int) hp->iovec_count;
+
+ map_data.pages = schp->pages;
+ map_data.page_order = schp->page_order;
+ map_data.nr_entries = schp->k_use_sg;
+
+ if (iovec_count)
+ res = blk_rq_map_user_iov(q, rq, &map_data, hp->dxferp,
+ iovec_count,
+ hp->dxfer_len, GFP_ATOMIC);
+ else
+ res = blk_rq_map_user(q, rq, &map_data, hp->dxferp,
+ hp->dxfer_len, GFP_ATOMIC);
+
+ if (!res)
+ srp->bio = rq->bio;
}
- return 0;
+
+ return res;
}
static void
@@ -1730,6 +1723,7 @@ sg_finish_rem_req(Sg_request * srp)
if (srp->rq) {
if (srp->bio)
blk_rq_unmap_user(srp->bio);
+
blk_put_request(srp->rq);
}
@@ -1739,21 +1733,12 @@ sg_finish_rem_req(Sg_request * srp)
static int
sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize)
{
- int sg_bufflen = tablesize * sizeof(struct scatterlist);
+ int sg_bufflen = tablesize * sizeof(struct page *);
gfp_t gfp_flags = GFP_ATOMIC | __GFP_NOWARN;
- /*
- * TODO: test without low_dma, we should not need it since
- * the block layer will bounce the buffer for us
- *
- * XXX(hch): we shouldn't need GFP_DMA for the actual S/G list.
- */
- if (sfp->low_dma)
- gfp_flags |= GFP_DMA;
- schp->buffer = kzalloc(sg_bufflen, gfp_flags);
- if (!schp->buffer)
+ schp->pages = kzalloc(sg_bufflen, gfp_flags);
+ if (!schp->pages)
return -ENOMEM;
- sg_init_table(schp->buffer, tablesize);
schp->sglist_len = sg_bufflen;
return tablesize; /* number of scat_gath elements allocated */
}
@@ -1780,11 +1765,10 @@ sg_build_direct(Sg_request * srp, Sg_fd * sfp, int dxfer_len)
static int
sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
{
- struct scatterlist *sg;
- int ret_sz = 0, k, rem_sz, num, mx_sc_elems;
+ int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems;
int sg_tablesize = sfp->parentdp->sg_tablesize;
- int blk_size = buff_size;
- struct page *p = NULL;
+ int blk_size = buff_size, order;
+ gfp_t gfp_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN;
if (blk_size < 0)
return -EFAULT;
@@ -1808,15 +1792,26 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
} else
scatter_elem_sz_prev = num;
}
- for (k = 0, sg = schp->buffer, rem_sz = blk_size;
- (rem_sz > 0) && (k < mx_sc_elems);
- ++k, rem_sz -= ret_sz, sg = sg_next(sg)) {
-
+
+ if (sfp->low_dma)
+ gfp_mask |= GFP_DMA;
+
+ if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
+ gfp_mask |= __GFP_ZERO;
+
+ order = get_order(num);
+retry:
+ ret_sz = 1 << (PAGE_SHIFT + order);
+
+ for (k = 0, rem_sz = blk_size; rem_sz > 0 && k < mx_sc_elems;
+ k++, rem_sz -= ret_sz) {
+
num = (rem_sz > scatter_elem_sz_prev) ?
- scatter_elem_sz_prev : rem_sz;
- p = sg_page_malloc(num, sfp->low_dma, &ret_sz);
- if (!p)
- return -ENOMEM;
+ scatter_elem_sz_prev : rem_sz;
+
+ schp->pages[k] = alloc_pages(gfp_mask, order);
+ if (!schp->pages[k])
+ goto out;
if (num == scatter_elem_sz_prev) {
if (unlikely(ret_sz > scatter_elem_sz_prev)) {
@@ -1824,12 +1819,12 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
scatter_elem_sz_prev = ret_sz;
}
}
- sg_set_page(sg, p, (ret_sz > num) ? num : ret_sz, 0);
SCSI_LOG_TIMEOUT(5, printk("sg_build_indirect: k=%d, num=%d, "
"ret_sz=%d\n", k, num, ret_sz));
} /* end of for loop */
+ schp->page_order = order;
schp->k_use_sg = k;
SCSI_LOG_TIMEOUT(5, printk("sg_build_indirect: k_use_sg=%d, "
"rem_sz=%d\n", k, rem_sz));
@@ -1837,8 +1832,15 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
schp->bufflen = blk_size;
if (rem_sz > 0) /* must have failed */
return -ENOMEM;
-
return 0;
+out:
+ for (i = 0; i < k; i++)
+ __free_pages(schp->pages[i], order);
+
+ if (--order >= 0)
+ goto retry;
+
+ return -ENOMEM;
}
static int
@@ -1846,13 +1848,8 @@ sg_write_xfer(Sg_request * srp)
{
sg_io_hdr_t *hp = &srp->header;
Sg_scatter_hold *schp = &srp->data;
- struct scatterlist *sg = schp->buffer;
int num_xfer = 0;
- int j, k, onum, usglen, ksglen, res;
- int iovec_count = (int) hp->iovec_count;
int dxfer_dir = hp->dxfer_direction;
- unsigned char *p;
- unsigned char __user *up;
int new_interface = ('\0' == hp->interface_id) ? 0 : 1;
if ((SG_DXFER_UNKNOWN == dxfer_dir) || (SG_DXFER_TO_DEV == dxfer_dir) ||
@@ -1868,103 +1865,26 @@ sg_write_xfer(Sg_request * srp)
SCSI_LOG_TIMEOUT(4, printk("sg_write_xfer: num_xfer=%d, iovec_count=%d, k_use_sg=%d\n",
num_xfer, iovec_count, schp->k_use_sg));
- if (iovec_count) {
- onum = iovec_count;
- if (!access_ok(VERIFY_READ, hp->dxferp, SZ_SG_IOVEC * onum))
- return -EFAULT;
- } else
- onum = 1;
-
- ksglen = sg->length;
- p = page_address(sg_page(sg));
- for (j = 0, k = 0; j < onum; ++j) {
- res = sg_u_iovec(hp, iovec_count, j, 1, &usglen, &up);
- if (res)
- return res;
-
- for (; p; sg = sg_next(sg), ksglen = sg->length,
- p = page_address(sg_page(sg))) {
- if (usglen <= 0)
- break;
- if (ksglen > usglen) {
- if (usglen >= num_xfer) {
- if (__copy_from_user(p, up, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_from_user(p, up, usglen))
- return -EFAULT;
- p += usglen;
- ksglen -= usglen;
- break;
- } else {
- if (ksglen >= num_xfer) {
- if (__copy_from_user(p, up, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_from_user(p, up, ksglen))
- return -EFAULT;
- up += ksglen;
- usglen -= ksglen;
- }
- ++k;
- if (k >= schp->k_use_sg)
- return 0;
- }
- }
return 0;
}
-static int
-sg_u_iovec(sg_io_hdr_t * hp, int sg_num, int ind,
- int wr_xf, int *countp, unsigned char __user **up)
-{
- int num_xfer = (int) hp->dxfer_len;
- unsigned char __user *p = hp->dxferp;
- int count;
-
- if (0 == sg_num) {
- if (wr_xf && ('\0' == hp->interface_id))
- count = (int) hp->flags; /* holds "old" input_size */
- else
- count = num_xfer;
- } else {
- sg_iovec_t iovec;
- if (__copy_from_user(&iovec, p + ind*SZ_SG_IOVEC, SZ_SG_IOVEC))
- return -EFAULT;
- p = iovec.iov_base;
- count = (int) iovec.iov_len;
- }
- if (!access_ok(wr_xf ? VERIFY_READ : VERIFY_WRITE, p, count))
- return -EFAULT;
- if (up)
- *up = p;
- if (countp)
- *countp = count;
- return 0;
-}
-
static void
sg_remove_scat(Sg_scatter_hold * schp)
{
SCSI_LOG_TIMEOUT(4, printk("sg_remove_scat: k_use_sg=%d\n", schp->k_use_sg));
- if (schp->buffer && (schp->sglist_len > 0)) {
- struct scatterlist *sg = schp->buffer;
-
+ if (schp->pages && schp->sglist_len > 0) {
if (!schp->dio_in_use) {
int k;
- for (k = 0; (k < schp->k_use_sg) && sg_page(sg);
- ++k, sg = sg_next(sg)) {
+ for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
SCSI_LOG_TIMEOUT(5, printk(
- "sg_remove_scat: k=%d, pg=0x%p, len=%d\n",
- k, sg_page(sg), sg->length));
- sg_page_free(sg_page(sg), sg->length);
+ "sg_remove_scat: k=%d, pg=0x%p\n",
+ k, schp->pages[k]));
+ __free_pages(schp->pages[k], schp->page_order);
}
- kfree(schp->buffer);
+ kfree(schp->pages);
}
}
memset(schp, 0, sizeof (*schp));
@@ -1975,13 +1895,8 @@ sg_read_xfer(Sg_request * srp)
{
sg_io_hdr_t *hp = &srp->header;
Sg_scatter_hold *schp = &srp->data;
- struct scatterlist *sg = schp->buffer;
int num_xfer = 0;
- int j, k, onum, usglen, ksglen, res;
- int iovec_count = (int) hp->iovec_count;
int dxfer_dir = hp->dxfer_direction;
- unsigned char *p;
- unsigned char __user *up;
int new_interface = ('\0' == hp->interface_id) ? 0 : 1;
if ((SG_DXFER_UNKNOWN == dxfer_dir) || (SG_DXFER_FROM_DEV == dxfer_dir)
@@ -1996,53 +1911,7 @@ sg_read_xfer(Sg_request * srp)
return 0;
SCSI_LOG_TIMEOUT(4, printk("sg_read_xfer: num_xfer=%d, iovec_count=%d, k_use_sg=%d\n",
- num_xfer, iovec_count, schp->k_use_sg));
- if (iovec_count) {
- onum = iovec_count;
- if (!access_ok(VERIFY_READ, hp->dxferp, SZ_SG_IOVEC * onum))
- return -EFAULT;
- } else
- onum = 1;
-
- p = page_address(sg_page(sg));
- ksglen = sg->length;
- for (j = 0, k = 0; j < onum; ++j) {
- res = sg_u_iovec(hp, iovec_count, j, 0, &usglen, &up);
- if (res)
- return res;
-
- for (; p; sg = sg_next(sg), ksglen = sg->length,
- p = page_address(sg_page(sg))) {
- if (usglen <= 0)
- break;
- if (ksglen > usglen) {
- if (usglen >= num_xfer) {
- if (__copy_to_user(up, p, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_to_user(up, p, usglen))
- return -EFAULT;
- p += usglen;
- ksglen -= usglen;
- break;
- } else {
- if (ksglen >= num_xfer) {
- if (__copy_to_user(up, p, num_xfer))
- return -EFAULT;
- return 0;
- }
- if (__copy_to_user(up, p, ksglen))
- return -EFAULT;
- up += ksglen;
- usglen -= ksglen;
- }
- ++k;
- if (k >= schp->k_use_sg)
- return 0;
- }
- }
-
+ num_xfer, (int)hp->iovec_count, schp->k_use_sg));
return 0;
}
@@ -2050,7 +1919,6 @@ static int
sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer)
{
Sg_scatter_hold *schp = &srp->data;
- struct scatterlist *sg = schp->buffer;
int k, num;
SCSI_LOG_TIMEOUT(4, printk("sg_read_oxfer: num_read_xfer=%d\n",
@@ -2058,15 +1926,18 @@ sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer)
if ((!outp) || (num_read_xfer <= 0))
return 0;
- for (k = 0; (k < schp->k_use_sg) && sg_page(sg); ++k, sg = sg_next(sg)) {
- num = sg->length;
+ blk_rq_unmap_user(srp->bio);
+ srp->bio = NULL;
+
+ num = 1 << (PAGE_SHIFT + schp->page_order);
+ for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
if (num > num_read_xfer) {
- if (__copy_to_user(outp, page_address(sg_page(sg)),
+ if (__copy_to_user(outp, page_address(schp->pages[k]),
num_read_xfer))
return -EFAULT;
break;
} else {
- if (__copy_to_user(outp, page_address(sg_page(sg)),
+ if (__copy_to_user(outp, page_address(schp->pages[k]),
num))
return -EFAULT;
num_read_xfer -= num;
@@ -2101,24 +1972,22 @@ sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size)
{
Sg_scatter_hold *req_schp = &srp->data;
Sg_scatter_hold *rsv_schp = &sfp->reserve;
- struct scatterlist *sg = rsv_schp->buffer;
int k, num, rem;
srp->res_used = 1;
SCSI_LOG_TIMEOUT(4, printk("sg_link_reserve: size=%d\n", size));
rem = size;
- for (k = 0; k < rsv_schp->k_use_sg; ++k, sg = sg_next(sg)) {
- num = sg->length;
+ num = 1 << (PAGE_SHIFT + rsv_schp->page_order);
+ for (k = 0; k < rsv_schp->k_use_sg; k++) {
if (rem <= num) {
- sfp->save_scat_len = num;
- sg->length = rem;
req_schp->k_use_sg = k + 1;
req_schp->sglist_len = rsv_schp->sglist_len;
- req_schp->buffer = rsv_schp->buffer;
+ req_schp->pages = rsv_schp->pages;
req_schp->bufflen = size;
req_schp->b_malloc_len = rsv_schp->b_malloc_len;
+ req_schp->page_order = rsv_schp->page_order;
break;
} else
rem -= num;
@@ -2132,22 +2001,13 @@ static void
sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp)
{
Sg_scatter_hold *req_schp = &srp->data;
- Sg_scatter_hold *rsv_schp = &sfp->reserve;
SCSI_LOG_TIMEOUT(4, printk("sg_unlink_reserve: req->k_use_sg=%d\n",
(int) req_schp->k_use_sg));
- if ((rsv_schp->k_use_sg > 0) && (req_schp->k_use_sg > 0)) {
- struct scatterlist *sg = rsv_schp->buffer;
-
- if (sfp->save_scat_len > 0)
- (sg + (req_schp->k_use_sg - 1))->length =
- (unsigned) sfp->save_scat_len;
- else
- SCSI_LOG_TIMEOUT(1, printk ("sg_unlink_reserve: BAD save_scat_len\n"));
- }
req_schp->k_use_sg = 0;
req_schp->bufflen = 0;
- req_schp->buffer = NULL;
+ req_schp->pages = NULL;
+ req_schp->page_order = 0;
req_schp->sglist_len = 0;
sfp->save_scat_len = 0;
srp->res_used = 0;
@@ -2405,53 +2265,6 @@ sg_res_in_use(Sg_fd * sfp)
return srp ? 1 : 0;
}
-/* The size fetched (value output via retSzp) set when non-NULL return */
-static struct page *
-sg_page_malloc(int rqSz, int lowDma, int *retSzp)
-{
- struct page *resp = NULL;
- gfp_t page_mask;
- int order, a_size;
- int resSz;
-
- if ((rqSz <= 0) || (NULL == retSzp))
- return resp;
-
- if (lowDma)
- page_mask = GFP_ATOMIC | GFP_DMA | __GFP_COMP | __GFP_NOWARN;
- else
- page_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN;
-
- for (order = 0, a_size = PAGE_SIZE; a_size < rqSz;
- order++, a_size <<= 1) ;
- resSz = a_size; /* rounded up if necessary */
- resp = alloc_pages(page_mask, order);
- while ((!resp) && order) {
- --order;
- a_size >>= 1; /* divide by 2, until PAGE_SIZE */
- resp = alloc_pages(page_mask, order); /* try half */
- resSz = a_size;
- }
- if (resp) {
- if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
- memset(page_address(resp), 0, resSz);
- *retSzp = resSz;
- }
- return resp;
-}
-
-static void
-sg_page_free(struct page *page, int size)
-{
- int order, a_size;
-
- if (!page)
- return;
- for (order = 0, a_size = PAGE_SIZE; a_size < size;
- order++, a_size <<= 1) ;
- __free_pages(page, order);
-}
-
#ifdef CONFIG_SCSI_PROC_FS
static int
sg_idr_max_id(int id, void *p, void *data)
--
1.5.5.GIT
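The indirect path in the patch above keeps an array of equal-sized page chunks and retries with a smaller order when a high-order allocation fails. The following is a minimal userspace sketch of that pattern, not the driver code: it assumes 4 KiB pages, uses calloc/free in place of alloc_pages/__free_pages, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for Sg_scatter_hold after the conversion. */
struct demo_hold {
	void **pages;		/* one pointer per chunk */
	int page_order;		/* chunk size is 1 << (DEMO_PAGE_SHIFT + page_order) */
	int k_use_sg;		/* number of chunks in use */
};

/* Hypothetical knob: orders above this fail, to simulate memory pressure. */
static int demo_max_order = 10;

static void *demo_alloc_chunk(int order)
{
	if (order > demo_max_order)
		return NULL;
	return calloc(1, (size_t)1 << (DEMO_PAGE_SHIFT + order));
}

static void demo_free(struct demo_hold *h)
{
	int k;

	for (k = 0; h->pages && k < h->k_use_sg; k++)
		free(h->pages[k]);
	free(h->pages);
	h->pages = NULL;
	h->k_use_sg = 0;
	h->page_order = 0;
}

/* Cover 'size' bytes with at most max_elems chunks, lowering the order on failure. */
static int demo_build(struct demo_hold *h, size_t size, int max_elems, int order)
{
	int i, k;

	assert(max_elems > 0 && order >= 0);
	h->k_use_sg = 0;
	h->page_order = 0;
	h->pages = calloc((size_t)max_elems, sizeof(*h->pages));
	if (!h->pages)
		return -1;

	for (; order >= 0; order--) {
		size_t chunk = (size_t)1 << (DEMO_PAGE_SHIFT + order);
		size_t rem = size;

		for (k = 0; rem > 0 && k < max_elems; k++) {
			h->pages[k] = demo_alloc_chunk(order);
			if (!h->pages[k])
				break;
			rem = rem > chunk ? rem - chunk : 0;
		}
		if (rem == 0) {
			h->page_order = order;
			h->k_use_sg = k;
			return 0;
		}
		/* Undo this attempt: index with i, the chunks allocated so far. */
		for (i = 0; i < k; i++) {
			free(h->pages[i]);
			h->pages[i] = NULL;
		}
		if (k == max_elems)
			break;	/* table full: smaller chunks cannot help */
	}
	free(h->pages);
	h->pages = NULL;
	return -1;
}

/* Map a byte offset in the buffer to an address, as a fault handler might. */
static void *demo_lookup(const struct demo_hold *h, size_t offset)
{
	size_t chunk = (size_t)1 << (DEMO_PAGE_SHIFT + h->page_order);
	size_t k = offset / chunk;

	if (k >= (size_t)h->k_use_sg)
		return NULL;
	return (char *)h->pages[k] + offset % chunk;
}
```

With demo_max_order set to 1, calling demo_build(&h, 5 * 4096, 8, 3) fails at orders 3 and 2, then succeeds at order 1 with three 8 KiB chunks. Because all chunks share one order, a single page_order field replaces the per-entry length that struct scatterlist carried.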
* Re: [PATCH 0/5] convert sg to use the block layer
2008-08-28 7:07 ` Jens Axboe
2008-08-28 7:17 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
@ 2008-08-28 7:32 ` FUJITA Tomonori
1 sibling, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2008-08-28 7:32 UTC (permalink / raw)
To: jens.axboe; +Cc: fujita.tomonori, linux-scsi, dougg, michaelc, James.Bottomley
On Thu, 28 Aug 2008 09:07:25 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:
> On Thu, Aug 28 2008, Jens Axboe wrote:
> > On Thu, Aug 28 2008, FUJITA Tomonori wrote:
> > > On Wed, 27 Aug 2008 09:10:18 +0200
> > > Jens Axboe <jens.axboe@oracle.com> wrote:
> > >
> > > > > Thanks, I put an updated version against the for-2.6.28 branch in your
> > > > > tree:
> > > > >
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git sg-block
> > > > >
> > > > >
> > > > > Can you wait for a little while before applying this to your
> > > > > for-2.6.28 branch? We need Doug's ACK on this.
> > > >
> > > > Sure, just let me know...
> > >
> > > Now we got Doug's ACK on this (Doug, thanks!). Can you apply them to
> > > your for-2.6.28 branch? I'll resend them if necessary.
> >
> > Yes certainly. No need to resend, I'll add Doug's ack and merge it.
>
> Actually, the previous was against for-linus (not for-2.6.28), so if you
> could resend against 2.6.28 that would help a lot. Thanks!
Done, though actually the patchset in my git tree was updated against
for-linus.
Thanks,
* Re: [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
2008-08-28 7:17 ` [PATCH 1/5] block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov FUJITA Tomonori
2008-08-28 7:17 ` [PATCH 2/5] block: introduce struct rq_map_data to use reserved pages FUJITA Tomonori
@ 2008-08-28 7:34 ` Jens Axboe
1 sibling, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2008-08-28 7:34 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: linux-scsi, dougg, michaelc, James.Bottomley
On Thu, Aug 28 2008, FUJITA Tomonori wrote:
> Currently, blk_rq_map_user and blk_rq_map_user_iov always do
> GFP_KERNEL allocation.
>
> This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
> so sg can use it (sg always does GFP_ATOMIC allocation).
Thanks for the (quick!) resend, earns you another shining star to place
on the refrigerator!
I've applied them, thanks again Tomo.
--
Jens Axboe