* [PATCH 0/10] Chaining sg lists for big IO commands v2
@ 2007-05-09 7:59 Jens Axboe
2007-05-09 7:59 ` [PATCH 1/10] crypto: don't pollute the global namespace with sg_next() Jens Axboe
` (9 more replies)
0 siblings, 10 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel
Hi,
Ok, got this cleaned up and split a bit. Should be more reviewable
now. A rough overview of what this does:
Some people complain that Linux doesn't support really large IO
commands. The main reason we do not support arbitrarily large IO
is that we need to allocate a scatterlist to hold the IO's segments
for DMA mapping. The Linux scatterlist is an array of scatterlist
elements, so we need to allocate a contiguous piece of memory to hold
them all. On i386, we can fit at most 256 scatterlist elements into a
page, and on x86-64 we are stuck with 128. With one 4KB page per
element, that puts us somewhere between 512KB and 1024KB for a single IO.
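The arithmetic behind those numbers can be sketched as below. This is
only an illustration: it assumes 4KB pages and element sizes of 16 bytes
(i386) and 32 bytes (x86-64), which is what the 256 and 128 figures
imply, and it assumes each element maps exactly one page. The ex_*
names are made up for the example and are not kernel symbols.

```c
#include <stddef.h>

/* Assumed for illustration: 4KB pages, one page mapped per element. */
#define EX_PAGE_SIZE 4096u

/* How many scatterlist elements of a given size fit into a single page. */
static unsigned int ex_sg_per_page(size_t elem_size)
{
	return (unsigned int)(EX_PAGE_SIZE / elem_size);
}

/* Largest IO (in KB) that one page-sized table can describe. */
static unsigned int ex_max_io_kb(size_t elem_size)
{
	return ex_sg_per_page(elem_size) * (EX_PAGE_SIZE / 1024u);
}
```

Calling ex_max_io_kb() with 16 and 32 gives the upper and lower ends
of the range quoted above.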
To get around that limitation, this patchset introduces an sg
chaining concept. The way it works is that the last element of an
sg table can point to a new sg table, so the total IO scatterlist
is no longer limited to what fits in one contiguous allocation.
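A minimal userspace model of the chaining idea, as a sketch only. The
struct below is a simplified stand-in for struct scatterlist, and the
ex_* names are hypothetical; the real helpers are introduced in patches
2 and 7 of this series.

```c
#include <stddef.h>

/* Simplified stand-in for struct scatterlist; only what the sketch needs. */
struct ex_sg {
	unsigned int length;	/* bytes covered by this element */
	struct ex_sg *next;	/* non-NULL only on the last element of a chained table */
};

/* Advance one element, following the chain pointer if there is one. */
static struct ex_sg *ex_sg_next(struct ex_sg *sg)
{
	return sg->next ? sg->next : sg + 1;
}

/* Link the last element of table 'prv' (nents entries) to table 'sgl'. */
static void ex_sg_chain(struct ex_sg *prv, unsigned int nents,
			struct ex_sg *sgl)
{
	prv[nents - 1].next = sgl;
}

/* Sum the lengths of 'nents' elements, crossing table boundaries as needed. */
static unsigned long ex_total_len(struct ex_sg *sgl, unsigned int nents)
{
	unsigned long total = 0;
	struct ex_sg *sg = sgl;
	unsigned int i;

	for (i = 0; i < nents; i++, sg = ex_sg_next(sg))
		total += sg->length;

	return total;
}
```

Note that in this model the last element of a table still carries data;
it simply also records where the walk continues.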
The first parts of the patch are preparatory stuff, abstracting out
sg browsing/lookup and converting libata/SCSI/block to using those.
The latter part is enabling sg chaining on i386 and SCSI (and thus
libata as well).
The patch set defaults to being safe and doesn't enable large commands;
you must actively do so yourself. If you want to test e.g. sda with
large commands, you would do:
# cd /sys/block/sda/queue
# echo 1024 > max_segments
# cat max_hw_sectors_kb > max_sectors_kb
which would limit you to 1024 segments (effectively 8 scatterlists
chained) and should give you IOs of at least 4MB. You can go larger
than 1024; there's no real limit.
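For reference, the "8 scatterlists" and "4MB" figures work out as in
this sketch. It assumes 128-entry tables (the per-table size used later
in the series) and 4KB pages; the ex_* names are illustrative only.
The result is a lower bound because a segment can cover more than one
page.

```c
/* Assumed for illustration: 128-entry tables, 4KB per segment at minimum. */
#define EX_TABLE_ENTRIES 128u
#define EX_PAGE_KB 4u

/* Number of chained tables needed to hold 'segments' entries, rounded up. */
static unsigned int ex_tables_needed(unsigned int segments)
{
	return (segments + EX_TABLE_ENTRIES - 1) / EX_TABLE_ENTRIES;
}

/* Minimum IO size in KB when every segment maps exactly one page. */
static unsigned int ex_min_io_kb(unsigned int segments)
{
	return segments * EX_PAGE_KB;
}
```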
Changes since last time:
- Hopefully get the libata atapi/pio bits fixed.
- Clear __GFP_WAIT on the second and subsequent rounds of scatterlist
  allocation.
- Cleanups/fixes/etc.
It works for me, but you can't enable large commands on anything but
i386 right now. I still need to go over the x86-64 iommu bits to enable
it there as well.
block/ll_rw_blk.c | 41 +++++-
crypto/digest.c | 2
crypto/scatterwalk.c | 2
crypto/scatterwalk.h | 2
drivers/ata/libata-core.c | 30 ++--
drivers/scsi/scsi_lib.c | 212 ++++++++++++++++++++++++---------
drivers/scsi/scsi_tgt_lib.c | 4
include/asm-i386/dma-mapping.h | 13 +-
include/asm-i386/scatterlist.h | 4
include/linux/libata.h | 16 +-
include/linux/scatterlist.h | 40 ++++++
include/scsi/scsi.h | 7 -
include/scsi/scsi_cmnd.h | 3
13 files changed, 275 insertions(+), 101 deletions(-)
* [PATCH 1/10] crypto: don't pollute the global namespace with sg_next()
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 2/10] Add sg helpers for iterating over a scatterlist table Jens Axboe
` (8 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
It's a subsystem function, prefix it as such.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
crypto/digest.c | 2 +-
crypto/scatterwalk.c | 2 +-
crypto/scatterwalk.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/crypto/digest.c b/crypto/digest.c
index 1bf7414..e56de67 100644
--- a/crypto/digest.c
+++ b/crypto/digest.c
@@ -77,7 +77,7 @@ static int update2(struct hash_desc *desc,
if (!nbytes)
break;
- sg = sg_next(sg);
+ sg = scatterwalk_sg_next(sg);
}
return 0;
diff --git a/crypto/scatterwalk.c b/crypto/scatterwalk.c
index 81afd17..2e51f82 100644
--- a/crypto/scatterwalk.c
+++ b/crypto/scatterwalk.c
@@ -70,7 +70,7 @@ static void scatterwalk_pagedone(struct scatter_walk *walk, int out,
walk->offset += PAGE_SIZE - 1;
walk->offset &= PAGE_MASK;
if (walk->offset >= walk->sg->offset + walk->sg->length)
- scatterwalk_start(walk, sg_next(walk->sg));
+ scatterwalk_start(walk, scatterwalk_sg_next(walk->sg));
}
}
diff --git a/crypto/scatterwalk.h b/crypto/scatterwalk.h
index f1592cc..e049c62 100644
--- a/crypto/scatterwalk.h
+++ b/crypto/scatterwalk.h
@@ -20,7 +20,7 @@
#include "internal.h"
-static inline struct scatterlist *sg_next(struct scatterlist *sg)
+static inline struct scatterlist *scatterwalk_sg_next(struct scatterlist *sg)
{
return (++sg)->length ? sg : (void *)sg->page;
}
--
1.5.2.rc1
* [PATCH 2/10] Add sg helpers for iterating over a scatterlist table
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
2007-05-09 7:59 ` [PATCH 1/10] crypto: don't pollute the global namespace with sg_next() Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 3/10] libata: convert to using sg helpers Jens Axboe
` (7 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
First step to being able to change the scatterlist setup without
having to modify drivers (a lot :-)
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
include/linux/scatterlist.h | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 4efbd9c..c5bffde 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -20,4 +20,13 @@ static inline void sg_init_one(struct scatterlist *sg, const void *buf,
sg_set_buf(sg, buf, buflen);
}
+#define sg_next(sg) ((sg) + 1)
+#define sg_last(sg, nents) (&(sg[nents - 1]))
+
+/*
+ * Loop over each sg element, following the pointer to a new list if necessary
+ */
+#define for_each_sg(sglist, sg, nr, __i) \
+ for (__i = 0, sg = (sglist); __i < nr; __i++, sg = sg_next(sg))
+
#endif /* _LINUX_SCATTERLIST_H */
--
1.5.2.rc1
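A sketch of how the helpers from this patch are meant to be used,
modelled in userspace. The struct is a cut-down stand-in for struct
scatterlist and the ex_* macros copy the shape of sg_next(), sg_last()
and for_each_sg() above; none of these names exist in the kernel.

```c
/* Minimal userspace stand-in for struct scatterlist (illustration only). */
struct ex_flat_sg {
	unsigned int offset;
	unsigned int length;
};

/* Same shape as the helpers added by this patch, on the stand-in type. */
#define ex_flat_next(sg)	((sg) + 1)
#define ex_flat_last(sg, nents)	(&((sg)[(nents) - 1]))
#define ex_for_each_sg(sglist, sg, nr, idx) \
	for (idx = 0, sg = (sglist); idx < (nr); idx++, sg = ex_flat_next(sg))

/* Total byte count of a table, written without indexing into the array. */
static unsigned int ex_flat_buflen(struct ex_flat_sg *sgl, unsigned int nents)
{
	struct ex_flat_sg *sg;
	unsigned int i, buflen = 0;

	ex_for_each_sg(sgl, sg, nents, i)
		buflen += sg->length;

	return buflen;
}
```

Because the caller never writes sgl[i], the definition of "next" can
later change without touching the loop body.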
* [PATCH 3/10] libata: convert to using sg helpers
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
2007-05-09 7:59 ` [PATCH 1/10] crypto: don't pollute the global namespace with sg_next() Jens Axboe
2007-05-09 7:59 ` [PATCH 2/10] Add sg helpers for iterating over a scatterlist table Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 4/10] block: " Jens Axboe
` (6 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
This converts libata to using the sg helpers for looking up sg
elements, instead of doing it manually.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
drivers/ata/libata-core.c | 30 ++++++++++++++++--------------
include/linux/libata.h | 16 ++++++++++------
2 files changed, 26 insertions(+), 20 deletions(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a795088..2928299 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1368,7 +1368,7 @@ static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
*/
unsigned ata_exec_internal_sg(struct ata_device *dev,
struct ata_taskfile *tf, const u8 *cdb,
- int dma_dir, struct scatterlist *sg,
+ int dma_dir, struct scatterlist *sgl,
unsigned int n_elem)
{
struct ata_port *ap = dev->ap;
@@ -1426,11 +1426,12 @@ unsigned ata_exec_internal_sg(struct ata_device *dev,
qc->dma_dir = dma_dir;
if (dma_dir != DMA_NONE) {
unsigned int i, buflen = 0;
+ struct scatterlist *sg;
- for (i = 0; i < n_elem; i++)
- buflen += sg[i].length;
+ for_each_sg(sgl, sg, n_elem, i)
+ buflen += sg->length;
- ata_sg_init(qc, sg, n_elem);
+ ata_sg_init(qc, sgl, n_elem);
qc->nbytes = buflen;
}
@@ -3980,7 +3981,7 @@ void ata_sg_clean(struct ata_queued_cmd *qc)
if (qc->n_elem)
dma_unmap_sg(ap->dev, sg, qc->n_elem, dir);
/* restore last sg */
- sg[qc->orig_n_elem - 1].length += qc->pad_len;
+ sg_last(sg, qc->orig_n_elem)->length += qc->pad_len;
if (pad_buf) {
struct scatterlist *psg = &qc->pad_sgent;
void *addr = kmap_atomic(psg->page, KM_IRQ0);
@@ -4139,6 +4140,7 @@ void ata_sg_init_one(struct ata_queued_cmd *qc, void *buf, unsigned int buflen)
qc->orig_n_elem = 1;
qc->buf_virt = buf;
qc->nbytes = buflen;
+ qc->cursg = qc->__sg;
sg_init_one(&qc->sgent, buf, buflen);
}
@@ -4164,6 +4166,7 @@ void ata_sg_init(struct ata_queued_cmd *qc, struct scatterlist *sg,
qc->__sg = sg;
qc->n_elem = n_elem;
qc->orig_n_elem = n_elem;
+ qc->cursg = qc->__sg;
}
/**
@@ -4253,7 +4256,7 @@ static int ata_sg_setup(struct ata_queued_cmd *qc)
{
struct ata_port *ap = qc->ap;
struct scatterlist *sg = qc->__sg;
- struct scatterlist *lsg = &sg[qc->n_elem - 1];
+ struct scatterlist *lsg = sg_last(qc->__sg, qc->n_elem);
int n_elem, pre_n_elem, dir, trim_sg = 0;
VPRINTK("ENTER, ata%u\n", ap->print_id);
@@ -4417,7 +4420,6 @@ void ata_data_xfer_noirq(struct ata_device *adev, unsigned char *buf,
static void ata_pio_sector(struct ata_queued_cmd *qc)
{
int do_write = (qc->tf.flags & ATA_TFLAG_WRITE);
- struct scatterlist *sg = qc->__sg;
struct ata_port *ap = qc->ap;
struct page *page;
unsigned int offset;
@@ -4426,8 +4428,8 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
if (qc->curbytes == qc->nbytes - qc->sect_size)
ap->hsm_task_state = HSM_ST_LAST;
- page = sg[qc->cursg].page;
- offset = sg[qc->cursg].offset + qc->cursg_ofs;
+ page = qc->cursg->page;
+ offset = qc->cursg->offset + qc->cursg_ofs;
/* get the current page and offset */
page = nth_page(page, (offset >> PAGE_SHIFT));
@@ -4455,8 +4457,8 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
qc->curbytes += qc->sect_size;
qc->cursg_ofs += qc->sect_size;
- if (qc->cursg_ofs == (&sg[qc->cursg])->length) {
- qc->cursg++;
+ if (qc->cursg_ofs == qc->cursg->length) {
+ qc->cursg = sg_next(qc->cursg);
qc->cursg_ofs = 0;
}
}
@@ -4549,7 +4551,7 @@ static void __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
ap->hsm_task_state = HSM_ST_LAST;
next_sg:
- if (unlikely(qc->cursg >= qc->n_elem)) {
+ if (unlikely(qc->cursg == sg_last(qc->__sg, qc->n_elem))) {
/*
* The end of qc->sg is reached and the device expects
* more data to transfer. In order not to overrun qc->sg
@@ -4572,7 +4574,7 @@ next_sg:
return;
}
- sg = &qc->__sg[qc->cursg];
+ sg = qc->cursg;
page = sg->page;
offset = sg->offset + qc->cursg_ofs;
@@ -4611,7 +4613,7 @@ next_sg:
qc->cursg_ofs += count;
if (qc->cursg_ofs == sg->length) {
- qc->cursg++;
+ qc->cursg = sg_next(qc->cursg);
qc->cursg_ofs = 0;
}
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 7906d75..8fad10e 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -30,7 +30,7 @@
#include <linux/interrupt.h>
#include <linux/pci.h>
#include <linux/dma-mapping.h>
-#include <asm/scatterlist.h>
+#include <linux/scatterlist.h>
#include <linux/io.h>
#include <linux/ata.h>
#include <linux/workqueue.h>
@@ -388,6 +388,7 @@ struct ata_queued_cmd {
unsigned long flags; /* ATA_QCFLAG_xxx */
unsigned int tag;
unsigned int n_elem;
+ unsigned int n_iter;
unsigned int orig_n_elem;
int dma_dir;
@@ -398,7 +399,7 @@ struct ata_queued_cmd {
unsigned int nbytes;
unsigned int curbytes;
- unsigned int cursg;
+ struct scatterlist *cursg;
unsigned int cursg_ofs;
struct scatterlist sgent;
@@ -935,7 +936,7 @@ ata_sg_is_last(struct scatterlist *sg, struct ata_queued_cmd *qc)
return 1;
if (qc->pad_len)
return 0;
- if (((sg - qc->__sg) + 1) == qc->n_elem)
+ if (qc->n_iter == qc->n_elem)
return 1;
return 0;
}
@@ -943,6 +944,7 @@ ata_sg_is_last(struct scatterlist *sg, struct ata_queued_cmd *qc)
static inline struct scatterlist *
ata_qc_first_sg(struct ata_queued_cmd *qc)
{
+ qc->n_iter = 0;
if (qc->n_elem)
return qc->__sg;
if (qc->pad_len)
@@ -955,8 +957,8 @@ ata_qc_next_sg(struct scatterlist *sg, struct ata_queued_cmd *qc)
{
if (sg == &qc->pad_sgent)
return NULL;
- if (++sg - qc->__sg < qc->n_elem)
- return sg;
+ if (++qc->n_iter < qc->n_elem)
+ return sg_next(sg);
if (qc->pad_len)
return &qc->pad_sgent;
return NULL;
@@ -1157,9 +1159,11 @@ static inline void ata_qc_reinit(struct ata_queued_cmd *qc)
qc->dma_dir = DMA_NONE;
qc->__sg = NULL;
qc->flags = 0;
- qc->cursg = qc->cursg_ofs = 0;
+ qc->cursg = NULL;
+ qc->cursg_ofs = 0;
qc->nbytes = qc->curbytes = 0;
qc->n_elem = 0;
+ qc->n_iter = 0;
qc->err_mask = 0;
qc->pad_len = 0;
qc->sect_size = ATA_SECT_SIZE;
--
1.5.2.rc1
* [PATCH 4/10] block: convert to using sg helpers
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
` (2 preceding siblings ...)
2007-05-09 7:59 ` [PATCH 3/10] libata: convert to using sg helpers Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 5/10] scsi: " Jens Axboe
` (5 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
Convert the main rq mapper (blk_rq_map_sg()) to the sg helper setup.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
block/ll_rw_blk.c | 19 ++++++++++++-------
1 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index d99d402..0ad4c34 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -30,6 +30,7 @@
#include <linux/cpu.h>
#include <linux/blktrace_api.h>
#include <linux/fault-inject.h>
+#include <linux/scatterlist.h>
/*
* for max sense size
@@ -1307,9 +1308,11 @@ static int blk_hw_contig_segment(request_queue_t *q, struct bio *bio,
* map a request to scatterlist, return number of sg entries setup. Caller
* must make sure sg can hold rq->nr_phys_segments entries
*/
-int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg)
+int blk_rq_map_sg(request_queue_t *q, struct request *rq,
+ struct scatterlist *sglist)
{
struct bio_vec *bvec, *bvprv;
+ struct scatterlist *next_sg, *sg;
struct bio *bio;
int nsegs, i, cluster;
@@ -1320,6 +1323,7 @@ int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg
* for each bio in rq
*/
bvprv = NULL;
+ sg = next_sg = &sglist[0];
rq_for_each_bio(bio, rq) {
/*
* for each segment in bio
@@ -1328,7 +1332,7 @@ int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg
int nbytes = bvec->bv_len;
if (bvprv && cluster) {
- if (sg[nsegs - 1].length + nbytes > q->max_segment_size)
+ if (sg->length + nbytes > q->max_segment_size)
goto new_segment;
if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
@@ -1336,14 +1340,15 @@ int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg
if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
goto new_segment;
- sg[nsegs - 1].length += nbytes;
+ sg->length += nbytes;
} else {
new_segment:
- memset(&sg[nsegs],0,sizeof(struct scatterlist));
- sg[nsegs].page = bvec->bv_page;
- sg[nsegs].length = nbytes;
- sg[nsegs].offset = bvec->bv_offset;
+ sg = next_sg;
+ next_sg = sg_next(sg);
+ sg->page = bvec->bv_page;
+ sg->length = nbytes;
+ sg->offset = bvec->bv_offset;
nsegs++;
}
bvprv = bvec;
--
1.5.2.rc1
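The cursor pattern used in the conversion above (a current segment
pointer plus a "next free slot" pointer, instead of sg[nsegs - 1] and
sg[nsegs]) can be sketched in userspace like this. The types, the
contiguity test and the ex_* names are simplifications invented for
the example; the real function also checks clustering and segment
boundaries.

```c
/* Simplified segment: a start address and a length (illustration only). */
struct ex_seg {
	unsigned long addr;
	unsigned int length;
};

#define ex_seg_next(sg) ((sg) + 1)

/*
 * Map 'n' input chunks into 'out', merging a chunk into the current
 * segment when it is contiguous with it and the merged length stays
 * within 'max_len'. Returns the number of segments written. 'out' must
 * have room for 'n' entries.
 */
static int ex_map_segments(const struct ex_seg *in, int n,
			   struct ex_seg *out, unsigned int max_len)
{
	struct ex_seg *sg = out, *next_sg = out;
	int i, nsegs = 0;

	for (i = 0; i < n; i++) {
		if (nsegs &&
		    sg->addr + sg->length == in[i].addr &&
		    sg->length + in[i].length <= max_len) {
			/* extend the current segment */
			sg->length += in[i].length;
		} else {
			/* start a new segment in the next free slot */
			sg = next_sg;
			next_sg = ex_seg_next(sg);
			sg->addr = in[i].addr;
			sg->length = in[i].length;
			nsegs++;
		}
	}
	return nsegs;
}
```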
* [PATCH 5/10] scsi: convert to using sg helpers
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
` (3 preceding siblings ...)
2007-05-09 7:59 ` [PATCH 4/10] block: " Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 6/10] i386 dma_map_sg: " Jens Axboe
` (4 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
This converts the SCSI mid layer to using the sg helpers for looking up
sg elements, instead of doing it manually.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
drivers/scsi/scsi_lib.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fbcdc..9355c5b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -302,14 +302,15 @@ static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
struct request_queue *q = rq->q;
int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
unsigned int data_len = 0, len, bytes, off;
+ struct scatterlist *sg;
struct page *page;
struct bio *bio = NULL;
int i, err, nr_vecs = 0;
- for (i = 0; i < nsegs; i++) {
- page = sgl[i].page;
- off = sgl[i].offset;
- len = sgl[i].length;
+ for_each_sg(sgl, sg, nsegs, i) {
+ page = sg->page;
+ off = sg->offset;
+ len = sg->length;
data_len += len;
while (len > 0) {
--
1.5.2.rc1
* [PATCH 6/10] i386 dma_map_sg: convert to using sg helpers
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
` (4 preceding siblings ...)
2007-05-09 7:59 ` [PATCH 5/10] scsi: " Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 7/10] i386 sg: add support for chaining scatterlists Jens Axboe
` (3 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
The dma mapping helpers need to be converted to using
sg helpers as well, so they will work with a chained
sglist setup.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
include/asm-i386/dma-mapping.h | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/asm-i386/dma-mapping.h b/include/asm-i386/dma-mapping.h
index 183eebe..a956ec1 100644
--- a/include/asm-i386/dma-mapping.h
+++ b/include/asm-i386/dma-mapping.h
@@ -2,10 +2,10 @@
#define _ASM_I386_DMA_MAPPING_H
#include <linux/mm.h>
+#include <linux/scatterlist.h>
#include <asm/cache.h>
#include <asm/io.h>
-#include <asm/scatterlist.h>
#include <asm/bug.h>
#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
@@ -35,18 +35,19 @@ dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
}
static inline int
-dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
enum dma_data_direction direction)
{
+ struct scatterlist *sg;
int i;
BUG_ON(!valid_dma_direction(direction));
- WARN_ON(nents == 0 || sg[0].length == 0);
+ WARN_ON(nents == 0 || sglist[0].length == 0);
- for (i = 0; i < nents; i++ ) {
- BUG_ON(!sg[i].page);
+ for_each_sg(sglist, sg, nents, i) {
+ BUG_ON(!sg->page);
- sg[i].dma_address = page_to_phys(sg[i].page) + sg[i].offset;
+ sg->dma_address = page_to_phys(sg->page) + sg->offset;
}
flush_write_buffers();
--
1.5.2.rc1
* [PATCH 7/10] i386 sg: add support for chaining scatterlists
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
` (5 preceding siblings ...)
2007-05-09 7:59 ` [PATCH 6/10] i386 dma_map_sg: " Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 10:03 ` Herbert Xu
2007-05-09 7:59 ` [PATCH 8/10] scsi: simplify scsi_free_sgtable() Jens Axboe
` (2 subsequent siblings)
9 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
The core of the patch: allow the last sg element in a scatterlist
table to point to the start of a new table. This adds a pointer
to struct scatterlist, and defines sg_chain_ptr(), which the
generic code can use to look it up.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
include/asm-i386/scatterlist.h | 4 ++++
include/linux/scatterlist.h | 37 ++++++++++++++++++++++++++++++++++---
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/include/asm-i386/scatterlist.h b/include/asm-i386/scatterlist.h
index d7e45a8..794b68c 100644
--- a/include/asm-i386/scatterlist.h
+++ b/include/asm-i386/scatterlist.h
@@ -8,8 +8,11 @@ struct scatterlist {
unsigned int offset;
dma_addr_t dma_address;
unsigned int length;
+ struct scatterlist *next;
};
+#define ARCH_HAS_SG_CHAIN
+
/* These macros should be used after a pci_map_sg call has been done
* to get bus addresses of each of the SG entries and their lengths.
* You should only work with the number of sg entries pci_map_sg
@@ -17,6 +20,7 @@ struct scatterlist {
*/
#define sg_dma_address(sg) ((sg)->dma_address)
#define sg_dma_len(sg) ((sg)->length)
+#define sg_chain_ptr(sg) ((sg)->next)
#define ISA_DMA_THRESHOLD (0x00ffffff)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index c5bffde..e3fc307 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -20,13 +20,44 @@ static inline void sg_init_one(struct scatterlist *sg, const void *buf,
sg_set_buf(sg, buf, buflen);
}
-#define sg_next(sg) ((sg) + 1)
-#define sg_last(sg, nents) (&(sg[nents - 1]))
-
/*
* Loop over each sg element, following the pointer to a new list if necessary
*/
#define for_each_sg(sglist, sg, nr, __i) \
for (__i = 0, sg = (sglist); __i < nr; __i++, sg = sg_next(sg))
+#ifdef ARCH_HAS_SG_CHAIN
+#define sg_next(sg) (sg_chain_ptr((sg)) ? : (sg) + 1)
+/*
+ * Chain previous sglist to this one
+ */
+static inline void sg_chain(struct scatterlist *prv, unsigned int nents,
+ struct scatterlist *sgl)
+{
+ sg_chain_ptr(&prv[nents - 1]) = sgl;
+}
+
+/*
+ * We could improve this by passing in the maximum size of an sglist, so
+ * we could jump directly to the last table. That would eliminate this
+ * (potentially) lengthy scan.
+ */
+static inline struct scatterlist *sg_last(struct scatterlist *sgl,
+ unsigned int nents)
+{
+ struct scatterlist *sg, *ret = NULL;
+ int i;
+
+ for_each_sg(sgl, sg, nents, i)
+ ret = sg;
+
+ return ret;
+}
+#else
+#define sg_next(sg) ((sg) + 1)
+#define sg_chain(prv, nents, sgl) BUG()
+#define sg_chain_ptr(sg) NULL
+#define sg_last(sg, nents) (&(sg[nents - 1]))
+#endif
+
#endif /* _LINUX_SCATTERLIST_H */
--
1.5.2.rc1
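To illustrate why sg_last() turns into a scan once chaining is enabled,
here is a userspace sketch on a stand-in type. The struct and the
ex_* names are hypothetical; only the logic follows the patch above.
With two chained tables, indexing sgl[nents - 1] would run off the end
of the first table, so the last element has to be found by walking.

```c
#include <stddef.h>

/* Stand-in for a scatterlist element with a chain pointer. */
struct ex_csg {
	unsigned int length;
	struct ex_csg *next;	/* NULL except on the last element of a chained table */
};

/* Follow the chain pointer if set, otherwise step to the adjacent element. */
static struct ex_csg *ex_csg_next(struct ex_csg *sg)
{
	return sg->next ? sg->next : sg + 1;
}

/* Walk 'nents' elements and return the final one, or NULL if nents is 0. */
static struct ex_csg *ex_csg_last(struct ex_csg *sgl, unsigned int nents)
{
	struct ex_csg *sg = sgl, *ret = NULL;
	unsigned int i;

	for (i = 0; i < nents; i++, sg = ex_csg_next(sg))
		ret = sg;

	return ret;
}
```

The scan is linear in the number of elements, which is the cost the
comment in the patch refers to.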
* [PATCH 8/10] scsi: simplify scsi_free_sgtable()
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
` (6 preceding siblings ...)
2007-05-09 7:59 ` [PATCH 7/10] i386 sg: add support for chaining scatterlists Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 9/10] SCSI: support for allocating large scatterlists Jens Axboe
2007-05-09 7:59 ` [PATCH 10/10] ll_rw_blk: temporarily enable max_segments tweaking Jens Axboe
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
Just pass in the command; there is no point in passing in the
scatterlist and scatterlist pool index separately.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
drivers/scsi/scsi_lib.c | 9 +++++----
drivers/scsi/scsi_tgt_lib.c | 4 ++--
include/scsi/scsi_cmnd.h | 2 +-
3 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9355c5b..332fb74 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -745,13 +745,14 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
EXPORT_SYMBOL(scsi_alloc_sgtable);
-void scsi_free_sgtable(struct scatterlist *sgl, int index)
+void scsi_free_sgtable(struct scsi_cmnd *cmd)
{
+ struct scatterlist *sgl = cmd->request_buffer;
struct scsi_host_sg_pool *sgp;
- BUG_ON(index >= SG_MEMPOOL_NR);
+ BUG_ON(cmd->sglist_len >= SG_MEMPOOL_NR);
- sgp = scsi_sg_pools + index;
+ sgp = scsi_sg_pools + cmd->sglist_len;
mempool_free(sgl, sgp->pool);
}
@@ -777,7 +778,7 @@ EXPORT_SYMBOL(scsi_free_sgtable);
static void scsi_release_buffers(struct scsi_cmnd *cmd)
{
if (cmd->use_sg)
- scsi_free_sgtable(cmd->request_buffer, cmd->sglist_len);
+ scsi_free_sgtable(cmd);
/*
* Zero these out. They now point to freed memory, and it is
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 2570f48..d6e58e5 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -329,7 +329,7 @@ static void scsi_tgt_cmd_done(struct scsi_cmnd *cmd)
scsi_tgt_uspace_send_status(cmd, tcmd->tag);
if (cmd->request_buffer)
- scsi_free_sgtable(cmd->request_buffer, cmd->sglist_len);
+ scsi_free_sgtable(cmd);
queue_work(scsi_tgtd, &tcmd->work);
}
@@ -370,7 +370,7 @@ static int scsi_tgt_init_cmd(struct scsi_cmnd *cmd, gfp_t gfp_mask)
}
eprintk("cmd %p cnt %d\n", cmd, cmd->use_sg);
- scsi_free_sgtable(cmd->request_buffer, cmd->sglist_len);
+ scsi_free_sgtable(cmd);
return -EINVAL;
}
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index a2e0c10..d7db992 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -133,6 +133,6 @@ extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
extern void scsi_kunmap_atomic_sg(void *virt);
extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scatterlist *, int);
+extern void scsi_free_sgtable(struct scsi_cmnd *);
#endif /* _SCSI_SCSI_CMND_H */
--
1.5.2.rc1
* [PATCH 9/10] SCSI: support for allocating large scatterlists
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
` (7 preceding siblings ...)
2007-05-09 7:59 ` [PATCH 8/10] scsi: simplify scsi_free_sgtable() Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 10/10] ll_rw_blk: temporarily enable max_segments tweaking Jens Axboe
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
This is what enables large commands. If we need to allocate an
sgtable that doesn't fit in a single page, allocate several
SCSI_MAX_SG_SEGMENTS sized tables and chain them together.
We default to the safe setup of NOT chaining, for now.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
drivers/scsi/scsi_lib.c | 194 +++++++++++++++++++++++++++++++++++-----------
include/scsi/scsi.h | 7 --
include/scsi/scsi_cmnd.h | 1 +
3 files changed, 148 insertions(+), 54 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 332fb74..17f3e3c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -29,39 +29,26 @@
#include "scsi_priv.h"
#include "scsi_logging.h"
+#include <linux/scatterlist.h>
#define SG_MEMPOOL_NR ARRAY_SIZE(scsi_sg_pools)
#define SG_MEMPOOL_SIZE 2
struct scsi_host_sg_pool {
size_t size;
- char *name;
+ char *name;
struct kmem_cache *slab;
mempool_t *pool;
};
-#if (SCSI_MAX_PHYS_SEGMENTS < 32)
-#error SCSI_MAX_PHYS_SEGMENTS is too small
-#endif
-
-#define SP(x) { x, "sgpool-" #x }
+#define SP(x) { x, "sgpool-" #x }
static struct scsi_host_sg_pool scsi_sg_pools[] = {
SP(8),
SP(16),
SP(32),
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
SP(64),
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
SP(128),
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
- SP(256),
-#if (SCSI_MAX_PHYS_SEGMENTS > 256)
-#error SCSI_MAX_PHYS_SEGMENTS is too large
-#endif
-#endif
-#endif
-#endif
-};
+};
#undef SP
static void scsi_run_queue(struct request_queue *q);
@@ -702,45 +689,116 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
return NULL;
}
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
-{
- struct scsi_host_sg_pool *sgp;
- struct scatterlist *sgl;
+/*
+ * Should fit within a single page, and must be a power-of-2.
+ */
+#define SCSI_MAX_SG_SEGMENTS 128
- BUG_ON(!cmd->use_sg);
+static inline unsigned int scsi_sgtable_index(unsigned short nents)
+{
+ unsigned int index;
- switch (cmd->use_sg) {
+ switch (nents) {
case 1 ... 8:
- cmd->sglist_len = 0;
+ index = 0;
break;
case 9 ... 16:
- cmd->sglist_len = 1;
+ index = 1;
break;
case 17 ... 32:
- cmd->sglist_len = 2;
+ index = 2;
break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
case 33 ... 64:
- cmd->sglist_len = 3;
+ index = 3;
break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
- case 65 ... 128:
- cmd->sglist_len = 4;
+ case 65 ... SCSI_MAX_SG_SEGMENTS:
+ index = 4;
break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
- case 129 ... 256:
- cmd->sglist_len = 5;
- break;
-#endif
-#endif
-#endif
default:
- return NULL;
+ printk(KERN_ERR "scsi: bad segment count=%d\n", nents);
+ BUG();
}
- sgp = scsi_sg_pools + cmd->sglist_len;
- sgl = mempool_alloc(sgp->pool, gfp_mask);
- return sgl;
+ return index;
+}
+
+struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+{
+ struct scsi_host_sg_pool *sgp;
+ struct scatterlist *sgl, *prev, *ret;
+ unsigned int index;
+ int this, left;
+
+ BUG_ON(!cmd->use_sg);
+
+ left = cmd->use_sg;
+ ret = prev = NULL;
+ do {
+ this = left;
+ if (this > SCSI_MAX_SG_SEGMENTS) {
+ this = SCSI_MAX_SG_SEGMENTS;
+ index = SG_MEMPOOL_NR - 1;
+ } else
+ index = scsi_sgtable_index(this);
+
+ left -= this;
+
+ sgp = scsi_sg_pools + index;
+
+ sgl = mempool_alloc(sgp->pool, gfp_mask);
+ if (unlikely(!sgl))
+ goto enomem;
+
+ memset(sgl, 0, sizeof(*sgl) * sgp->size);
+
+ /*
+ * first loop through, set initial index and return value
+ */
+ if (!ret) {
+ cmd->sglist_len = index;
+ ret = sgl;
+ }
+
+ /*
+ * chain previous sglist, if any. we know the previous
+ * sglist must be the biggest one, or we would not have
+ * ended up doing another loop.
+ */
+ if (prev)
+ sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
+
+ /*
+ * don't allow subsequent mempool allocs to sleep, it would
+ * violate the mempool principle.
+ */
+ gfp_mask &= ~__GFP_WAIT;
+ prev = sgl;
+ } while (left);
+
+ /*
+ * ->use_sg may get modified after dma mapping has potentially
+ * shrunk the number of segments, so keep a copy of it for free.
+ */
+ cmd->__use_sg = cmd->use_sg;
+ return ret;
+enomem:
+ if (ret) {
+ /*
+ * Free entries chained off ret. Since we were trying to
+ * allocate another sglist, we know that all entries are of
+ * the max size.
+ */
+ sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
+ prev = &ret[SCSI_MAX_SG_SEGMENTS - 1];
+
+ while ((sgl = sg_chain_ptr(ret)) != NULL) {
+ ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
+ mempool_free(sgl, sgp->pool);
+ }
+
+ mempool_free(prev, sgp->pool);
+ }
+ return NULL;
}
EXPORT_SYMBOL(scsi_alloc_sgtable);
@@ -752,6 +810,42 @@ void scsi_free_sgtable(struct scsi_cmnd *cmd)
BUG_ON(cmd->sglist_len >= SG_MEMPOOL_NR);
+ /*
+ * if this is the biggest size sglist, check if we have
+ * chained parts we need to free
+ */
+ if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
+ unsigned short this, left;
+ struct scatterlist *next;
+ unsigned int index;
+
+ left = cmd->__use_sg - SCSI_MAX_SG_SEGMENTS;
+ next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
+ do {
+ sgl = next;
+ this = left;
+ if (this > SCSI_MAX_SG_SEGMENTS) {
+ this = SCSI_MAX_SG_SEGMENTS;
+ index = SG_MEMPOOL_NR - 1;
+ } else
+ index = scsi_sgtable_index(this);
+
+ left -= this;
+
+ sgp = scsi_sg_pools + index;
+
+ if (left)
+ next = sg_chain_ptr(&sgl[sgp->size - 1]);
+
+ mempool_free(sgl, sgp->pool);
+ } while (left);
+
+ /*
+ * Restore original, will be freed below
+ */
+ sgl = cmd->request_buffer;
+ }
+
sgp = scsi_sg_pools + cmd->sglist_len;
mempool_free(sgl, sgp->pool);
}
@@ -993,7 +1087,6 @@ EXPORT_SYMBOL(scsi_io_completion);
static int scsi_init_io(struct scsi_cmnd *cmd)
{
struct request *req = cmd->request;
- struct scatterlist *sgpnt;
int count;
/*
@@ -1006,14 +1099,13 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
/*
* If sg table allocation fails, requeue request later.
*/
- sgpnt = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
- if (unlikely(!sgpnt)) {
+ cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
+ if (unlikely(!cmd->request_buffer)) {
scsi_unprep_request(req);
return BLKPREP_DEFER;
}
req->buffer = NULL;
- cmd->request_buffer = (char *) sgpnt;
if (blk_pc_request(req))
cmd->request_bufflen = req->data_len;
else
@@ -1577,8 +1669,16 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
if (!q)
return NULL;
+ /*
+ * this limit is imposed by hardware restrictions
+ */
blk_queue_max_hw_segments(q, shost->sg_tablesize);
- blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
+
+ /*
+ * we can chain scatterlists, so this limit is fairly arbitrary
+ */
+ blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
+
blk_queue_max_sectors(q, shost->max_sectors);
blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
blk_queue_segment_boundary(q, shost->dma_boundary);
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 9f8f80a..702fcfe 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -11,13 +11,6 @@
#include <linux/types.h>
/*
- * The maximum sg list length SCSI can cope with
- * (currently must be a power of 2 between 32 and 256)
- */
-#define SCSI_MAX_PHYS_SEGMENTS MAX_PHYS_SEGMENTS
-
-
-/*
* SCSI command lengths
*/
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index d7db992..fc649af 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -72,6 +72,7 @@ struct scsi_cmnd {
/* These elements define the operation we ultimately want to perform */
unsigned short use_sg; /* Number of pieces of scatter-gather */
unsigned short sglist_len; /* size of malloc'd scatter-gather list */
+ unsigned short __use_sg;
unsigned underflow; /* Return error if less than
this amount is transferred */
--
1.5.2.rc1
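
The free path in the scsi_free_sgtable() hunk above walks the chain in chunks of at most SCSI_MAX_SG_SEGMENTS entries. A minimal userspace sketch of that chunking arithmetic follows. MODEL_MAX_SG_SEGMENTS = 128 and the helper name sg_tables_needed() are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>

/* Assumed per-table limit, standing in for SCSI_MAX_SG_SEGMENTS. */
#define MODEL_MAX_SG_SEGMENTS 128u

/*
 * Hypothetical helper: count how many sg tables (the first one plus
 * any chained ones) are needed to hold use_sg entries, using the same
 * "left"/"chunk" walk as the free path.
 */
static unsigned int sg_tables_needed(unsigned int use_sg)
{
	unsigned int tables = 1, left;

	assert(use_sg > 0);

	if (use_sg <= MODEL_MAX_SG_SEGMENTS)
		return tables;

	left = use_sg - MODEL_MAX_SG_SEGMENTS;
	do {
		unsigned int chunk = left;

		/* Each chained table holds at most the per-table limit. */
		if (chunk > MODEL_MAX_SG_SEGMENTS)
			chunk = MODEL_MAX_SG_SEGMENTS;

		left -= chunk;
		tables++;
	} while (left);

	return tables;
}
```

Under the assumed limit of 128, a request with 1024 segments needs 8 tables, which agrees with the "effectively 8 scatterlists chained" figure in the cover letter.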
* [PATCH 10/10] ll_rw_blk: temporarily enable max_segments tweaking
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
` (8 preceding siblings ...)
2007-05-09 7:59 ` [PATCH 9/10] SCSI: support for allocating large scatterlists Jens Axboe
@ 2007-05-09 7:59 ` Jens Axboe
9 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 7:59 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
Expose this setting for now, so that users can play with enabling
large commands without defaulting it to on globally.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
block/ll_rw_blk.c | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 0ad4c34..0adcbed 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -3931,7 +3931,22 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
return queue_var_show(max_hw_sectors_kb, (page));
}
+static ssize_t queue_max_segments_show(struct request_queue *q, char *page)
+{
+ return queue_var_show(q->max_phys_segments, page);
+}
+
+static ssize_t queue_max_segments_store(struct request_queue *q, const char *page, size_t count)
+{
+ unsigned long segments;
+ ssize_t ret = queue_var_store(&segments, page, count);
+ spin_lock_irq(q->queue_lock);
+ q->max_phys_segments = segments;
+ spin_unlock_irq(q->queue_lock);
+
+ return ret;
+}
static struct queue_sysfs_entry queue_requests_entry = {
.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
.show = queue_requests_show,
@@ -3955,6 +3970,12 @@ static struct queue_sysfs_entry queue_max_hw_sectors_entry = {
.show = queue_max_hw_sectors_show,
};
+static struct queue_sysfs_entry queue_max_segments_entry = {
+ .attr = {.name = "max_segments", .mode = S_IRUGO | S_IWUSR },
+ .show = queue_max_segments_show,
+ .store = queue_max_segments_store,
+};
+
static struct queue_sysfs_entry queue_iosched_entry = {
.attr = {.name = "scheduler", .mode = S_IRUGO | S_IWUSR },
.show = elv_iosched_show,
@@ -3966,6 +3987,7 @@ static struct attribute *default_attrs[] = {
&queue_ra_entry.attr,
&queue_max_hw_sectors_entry.attr,
&queue_max_sectors_entry.attr,
+ &queue_max_segments_entry.attr,
&queue_iosched_entry.attr,
NULL,
};
--
1.5.2.rc1
* Re: [PATCH 7/10] i386 sg: add support for chaining scatterlists
2007-05-09 7:59 ` [PATCH 7/10] i386 sg: add support for chaining scatterlists Jens Axboe
@ 2007-05-09 10:03 ` Herbert Xu
2007-05-09 10:19 ` Andrew Morton
2007-05-09 10:28 ` Jens Axboe
0 siblings, 2 replies; 17+ messages in thread
From: Herbert Xu @ 2007-05-09 10:03 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jens.axboe
Jens Axboe <jens.axboe@oracle.com> wrote:
>
> diff --git a/include/asm-i386/scatterlist.h b/include/asm-i386/scatterlist.h
> index d7e45a8..794b68c 100644
> --- a/include/asm-i386/scatterlist.h
> +++ b/include/asm-i386/scatterlist.h
> @@ -8,8 +8,11 @@ struct scatterlist {
> unsigned int offset;
> dma_addr_t dma_address;
> unsigned int length;
> + struct scatterlist *next;
> };
BTW, the crypto layer's scatterlist already has a chaining mechanism
using the existing structure. The only difference is that the chained
pointer is stored inside the 'struct page *' rather than a new pointer.
Its existence is flagged by a zero value in the length field.
Now I'm not super-religious about this but we should at least consider
whether forking out 4 bytes in every scatterlist member is a worthwhile
investment when we can use 12 bytes at the end instead.
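
To make the scheme described above concrete, here is a small userspace sketch: a zero-length entry terminates a table, and its page field is reused as the pointer to the next table. The struct and function names (sg_entry, sg_entry_next, sg_entry_total) are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct scatterlist. */
struct sg_entry {
	void *page;		/* data page, or next table when length == 0 */
	unsigned int offset;
	unsigned int length;	/* 0 marks a chain link, not data */
};

/* Step to the next data entry, jumping through a zero-length link. */
static struct sg_entry *sg_entry_next(struct sg_entry *sg)
{
	++sg;
	return sg->length ? sg : (struct sg_entry *)sg->page;
}

/* Sum the lengths of nr_data data entries, following links as needed. */
static unsigned int sg_entry_total(struct sg_entry *sg, unsigned int nr_data)
{
	unsigned int total = 0;

	assert(sg != NULL && sg->length != 0);

	while (nr_data--) {
		total += sg->length;
		if (nr_data)
			sg = sg_entry_next(sg);
	}
	return total;
}
```

The cost of this layout is one sacrificed entry at the end of each chained table, rather than an extra pointer in every entry.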
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH 7/10] i386 sg: add support for chaining scatterlists
2007-05-09 10:03 ` Herbert Xu
@ 2007-05-09 10:19 ` Andrew Morton
2007-05-09 10:21 ` Herbert Xu
2007-05-09 10:30 ` Jens Axboe
2007-05-09 10:28 ` Jens Axboe
1 sibling, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2007-05-09 10:19 UTC (permalink / raw)
To: Herbert Xu; +Cc: Jens Axboe, linux-kernel
On Wed, 09 May 2007 20:03:29 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Jens Axboe <jens.axboe@oracle.com> wrote:
> >
> > diff --git a/include/asm-i386/scatterlist.h b/include/asm-i386/scatterlist.h
> > index d7e45a8..794b68c 100644
> > --- a/include/asm-i386/scatterlist.h
> > +++ b/include/asm-i386/scatterlist.h
> > @@ -8,8 +8,11 @@ struct scatterlist {
> > unsigned int offset;
> > dma_addr_t dma_address;
> > unsigned int length;
> > + struct scatterlist *next;
> > };
>
> BTW, the crypto layer's scatterlist already has a chaining mechanism
> using the existing structure. The only difference is that the chained
> pointer is stored inside the 'struct page *' rather than a new pointer.
<greps-and-fails>
Which field in the page is it using?
* Re: [PATCH 7/10] i386 sg: add support for chaining scatterlists
2007-05-09 10:19 ` Andrew Morton
@ 2007-05-09 10:21 ` Herbert Xu
2007-05-09 10:30 ` Jens Axboe
1 sibling, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2007-05-09 10:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jens Axboe, linux-kernel
On Wed, May 09, 2007 at 03:19:15AM -0700, Andrew Morton wrote:
>
> > BTW, the crypto layer's scatterlist already has a chaining mechanism
> > using the existing structure. The only difference is that the chained
> > pointer is stored inside the 'struct page *' rather than a new pointer.
>
> <greps-and-fails>
>
> Which field in the page is it using?
Sorry, I was a bit unclear. It's simply stored in the page field since
on a length == 0 entry we don't have a page at all.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH 7/10] i386 sg: add support for chaining scatterlists
2007-05-09 10:03 ` Herbert Xu
2007-05-09 10:19 ` Andrew Morton
@ 2007-05-09 10:28 ` Jens Axboe
1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 10:28 UTC (permalink / raw)
To: Herbert Xu; +Cc: linux-kernel
On Wed, May 09 2007, Herbert Xu wrote:
> Jens Axboe <jens.axboe@oracle.com> wrote:
> >
> > diff --git a/include/asm-i386/scatterlist.h b/include/asm-i386/scatterlist.h
> > index d7e45a8..794b68c 100644
> > --- a/include/asm-i386/scatterlist.h
> > +++ b/include/asm-i386/scatterlist.h
> > @@ -8,8 +8,11 @@ struct scatterlist {
> > unsigned int offset;
> > dma_addr_t dma_address;
> > unsigned int length;
> > + struct scatterlist *next;
> > };
>
> BTW, the crypto layer's scatterlist already has a chaining mechanism
> using the existing structure. The only difference is that the chained
> pointer is stored inside the 'struct page *' rather than a new pointer.
> Its existence is flagged by a zero value in the length field.
>
> Now I'm not super-religious about this but we should at least consider
> whether forking out 4 bytes in every scatterlist member is a worthwhile
> investment when we can use 12 bytes at the end instead.
From a memory consumption POV, it's definitely a win to reuse the ->page
member as a pointer to the next sgtable. I didn't do that originally to
avoid complications in allocating and setting up the sg table, but I may
very well reconsider that decision soonish.
We can easily change this without disturbing the upper layers, so it's
not a big deal to change now or later.
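
The memory argument can be checked with a rough userspace model of the i386 layout. The sketch assumes 32-bit pointers, a 32-bit dma_addr_t and 4096-byte pages; the struct names and entries_per_page() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* i386 scatterlist modelled with fixed-width fields (no next pointer). */
struct sg_i386 {
	uint32_t page;
	uint32_t offset;
	uint32_t dma_address;
	uint32_t length;
};

/* Same layout with the extra chain pointer added by this patch. */
struct sg_i386_next {
	uint32_t page;
	uint32_t offset;
	uint32_t dma_address;
	uint32_t length;
	uint32_t next;
};

/* How many entries of the given size fit into one page. */
static size_t entries_per_page(size_t entry_size)
{
	assert(entry_size > 0);
	return MODEL_PAGE_SIZE / entry_size;
}
```

Under these assumptions the entry grows from 16 to 20 bytes, so a page holds 204 entries instead of 256. Reusing ->page keeps the 16-byte entry and gives up only the last slot of each chained table.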
--
Jens Axboe
* Re: [PATCH 7/10] i386 sg: add support for chaining scatterlists
2007-05-09 10:19 ` Andrew Morton
2007-05-09 10:21 ` Herbert Xu
@ 2007-05-09 10:30 ` Jens Axboe
1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-09 10:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Herbert Xu, linux-kernel
On Wed, May 09 2007, Andrew Morton wrote:
> On Wed, 09 May 2007 20:03:29 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> > >
> > > diff --git a/include/asm-i386/scatterlist.h b/include/asm-i386/scatterlist.h
> > > index d7e45a8..794b68c 100644
> > > --- a/include/asm-i386/scatterlist.h
> > > +++ b/include/asm-i386/scatterlist.h
> > > @@ -8,8 +8,11 @@ struct scatterlist {
> > > unsigned int offset;
> > > dma_addr_t dma_address;
> > > unsigned int length;
> > > + struct scatterlist *next;
> > > };
> >
> > BTW, the crypto layer's scatterlist already has a chaining mechanism
> > using the existing structure. The only difference is that the chained
> > pointer is stored inside the 'struct page *' rather than a new pointer.
>
> <greps-and-fails>
>
> Which field in the page is it using?
crypto/scatterwalk.h:
static inline struct scatterlist *scatterwalk_sg_next(struct scatterlist
*sg)
{
return (++sg)->length ? sg : (void *)sg->page;
}
it's just using the page pointer, not a pointer in the page structure.
--
Jens Axboe
* [PATCH 7/10] i386 sg: add support for chaining scatterlists
2007-05-16 8:31 [PATCH 0/19] Chaining sg lists for big IO commands v6 Jens Axboe
@ 2007-05-16 8:31 ` Jens Axboe
0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2007-05-16 8:31 UTC (permalink / raw)
To: linux-kernel; +Cc: Jens Axboe
The core of the patch - allow the last sg element in a scatterlist
table to point to the start of a new table. This adds a pointer
to the sglist structure, and defines sg_chain_ptr() which the
generic code can use to look it up.
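
As a rough illustration of the idea, the following userspace model mirrors the shape of sg_next() and sg_chain() from this patch. The type sg_model and the sg_model_* helpers are hypothetical and only mimic the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct scatterlist with the new field. */
struct sg_model {
	unsigned int length;
	struct sg_model *next;	/* set only on the last entry of a chained table */
};

/* Follow the chain pointer if present, otherwise step within the array. */
static struct sg_model *sg_model_next(struct sg_model *sg)
{
	return sg->next ? sg->next : sg + 1;
}

/* Link the last entry of table prv (nents entries) to table sgl. */
static void sg_model_chain(struct sg_model *prv, unsigned int nents,
			   struct sg_model *sgl)
{
	assert(nents > 0);
	prv[nents - 1].next = sgl;
}

/* Sum the lengths of nr entries, crossing table boundaries as needed. */
static unsigned int sg_model_total(struct sg_model *sgl, unsigned int nr)
{
	struct sg_model *sg = sgl;
	unsigned int i, total = 0;

	for (i = 0; i < nr; i++) {
		total += sg->length;
		if (i + 1 < nr)
			sg = sg_model_next(sg);
	}
	return total;
}
```

Note that in this variant the last entry of a table still carries data. The chain pointer sits alongside it, so no entry is lost to linking.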
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
include/asm-i386/scatterlist.h | 4 ++++
include/linux/scatterlist.h | 37 ++++++++++++++++++++++++++++++++++---
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/include/asm-i386/scatterlist.h b/include/asm-i386/scatterlist.h
index d7e45a8..794b68c 100644
--- a/include/asm-i386/scatterlist.h
+++ b/include/asm-i386/scatterlist.h
@@ -8,8 +8,11 @@ struct scatterlist {
unsigned int offset;
dma_addr_t dma_address;
unsigned int length;
+ struct scatterlist *next;
};
+#define ARCH_HAS_SG_CHAIN
+
/* These macros should be used after a pci_map_sg call has been done
* to get bus addresses of each of the SG entries and their lengths.
* You should only work with the number of sg entries pci_map_sg
@@ -17,6 +20,7 @@ struct scatterlist {
*/
#define sg_dma_address(sg) ((sg)->dma_address)
#define sg_dma_len(sg) ((sg)->length)
+#define sg_chain_ptr(sg) ((sg)->next)
#define ISA_DMA_THRESHOLD (0x00ffffff)
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index c5bffde..e3fc307 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -20,13 +20,44 @@ static inline void sg_init_one(struct scatterlist *sg, const void *buf,
sg_set_buf(sg, buf, buflen);
}
-#define sg_next(sg) ((sg) + 1)
-#define sg_last(sg, nents) (&(sg[nents - 1]))
-
/*
* Loop over each sg element, following the pointer to a new list if necessary
*/
#define for_each_sg(sglist, sg, nr, __i) \
for (__i = 0, sg = (sglist); __i < nr; __i++, sg = sg_next(sg))
+#ifdef ARCH_HAS_SG_CHAIN
+#define sg_next(sg) (sg_chain_ptr((sg)) ? : (sg) + 1)
+/*
+ * Chain previous sglist to this one
+ */
+static inline void sg_chain(struct scatterlist *prv, unsigned int nents,
+ struct scatterlist *sgl)
+{
+ sg_chain_ptr(&prv[nents - 1]) = sgl;
+}
+
+/*
+ * We could improve this by passing in the maximum size of an sglist, so
+ * we could jump directly to the last table. That would eliminate this
+ * (potentially) lengthy scan.
+ */
+static inline struct scatterlist *sg_last(struct scatterlist *sgl,
+ unsigned int nents)
+{
+ struct scatterlist *sg, *ret = NULL;
+ int i;
+
+ for_each_sg(sgl, sg, nents, i)
+ ret = sg;
+
+ return ret;
+}
+#else
+#define sg_next(sg) ((sg) + 1)
+#define sg_chain(prv, nents, sgl) BUG()
+#define sg_chain_ptr(sg) NULL
+#define sg_last(sg, nents) (&(sg[nents - 1]))
+#endif
+
#endif /* _LINUX_SCATTERLIST_H */
--
1.5.2.rc1
Thread overview: 17+ messages
2007-05-09 7:59 [PATCH 0/10] Chaining sg lists for big IO commands v2 Jens Axboe
2007-05-09 7:59 ` [PATCH 1/10] crypto: don't pollute the global namespace with sg_next() Jens Axboe
2007-05-09 7:59 ` [PATCH 2/10] Add sg helpers for iterating over a scatterlist table Jens Axboe
2007-05-09 7:59 ` [PATCH 3/10] libata: convert to using sg helpers Jens Axboe
2007-05-09 7:59 ` [PATCH 4/10] block: " Jens Axboe
2007-05-09 7:59 ` [PATCH 5/10] scsi: " Jens Axboe
2007-05-09 7:59 ` [PATCH 6/10] i386 dma_map_sg: " Jens Axboe
2007-05-09 7:59 ` [PATCH 7/10] i386 sg: add support for chaining scatterlists Jens Axboe
2007-05-09 10:03 ` Herbert Xu
2007-05-09 10:19 ` Andrew Morton
2007-05-09 10:21 ` Herbert Xu
2007-05-09 10:30 ` Jens Axboe
2007-05-09 10:28 ` Jens Axboe
2007-05-09 7:59 ` [PATCH 8/10] scsi: simplify scsi_free_sgtable() Jens Axboe
2007-05-09 7:59 ` [PATCH 9/10] SCSI: support for allocating large scatterlists Jens Axboe
2007-05-09 7:59 ` [PATCH 10/10] ll_rw_blk: temporarily enable max_segments tweaking Jens Axboe
-- strict thread matches above, loose matches on Subject: below --
2007-05-16 8:31 [PATCH 0/19] Chaining sg lists for big IO commands v6 Jens Axboe
2007-05-16 8:31 ` [PATCH 7/10] i386 sg: add support for chaining scatterlists Jens Axboe