From: Chaitanya Tumuluri <chait@getafix.engr.sgi.com>
To: "Stephen C. Tweedie" <sct@redhat.com>
Cc: chait@sgi.com, Eric Youngdale <eric@andante.org>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
Douglas Gilbert <dgilbert@interlog.com>,
Brian Pomerantz <bapper@piratehaven.org>,
linux-scsi@vger.rutgers.edu, linux-mm@kvack.org
Subject: Re: PATCH: Enhance queueing/scsi-midlayer to handle kiobufs. [Re: Request splits]
Date: Tue, 23 May 2000 14:58:34 -0700
Message-ID: <200005232158.OAA77313@getafix.engr.sgi.com>
In-Reply-To: Your message of "Fri, 19 May 2000 16:09:58 BST." <20000519160958.C9961@redhat.com>
On Fri, 19 May 2000 16:09:58 BST, "Stephen C. Tweedie" <sct@redhat.com> wrote:
>Hi,
>
>On Thu, May 18, 2000 at 12:55:04PM -0700, Chaitanya Tumuluri wrote:
>
> < stuff deleted >
>
>> So, I enhanced Stephen Tweedie's
>> raw I/O and the queueing/scsi layers to handle kiobufs-based requests. This is
>> in addition to the current buffer_head based request processing.
>
>The "current" kiobuf code is in ftp.uk.linux.org:/pub/linux/sct/fs/raw-io/.
>It includes a number of bug fixes (mainly rationalising the error returns),
>plus a few new significant bits of functionality. If you can get me a
>patch against those diffs, I'll include your new code in the main kiobuf
>patchset. (I'm still maintaining the different kiobuf patches as
>separate patches within that patchset tarball.)
>
Stephen and others,
Here's my patch against the 2.3.99.pre9-2 patchset from your site. The main
differences from my earlier post are:
- removed the #ifdefs around my code as Stephen Tweedie suggested,
- corrected indentation problems pointed out earlier (Eric/Alan).
Finally, I'd like to repeat that, given the consensus about moving away from
buffer_head-based I/O in the future, it makes sense for me to retain the
small amount of code duplication: keeping the kiobuf and buffer_head paths
separate makes for easy surgery when we do remove the buffer_head I/O paths.
For I/O to a single disk I see a decent improvement (up to 10%) in bandwidth
and turnaround time, but the biggest impact of the new codepath is an almost
40% reduction in CPU utilization. These numbers come from simple `lmdd' tests
timed with /usr/bin/time.
Depending on further feedback from this list, I would like to propose this
change to Linus at some point as a general SCSI mechanism for handling
kiobuf-based requests.
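Since the mechanism is meant to be general, it may help to spell out the layout arithmetic it relies on. __make_kio_request() and scsi_kio_sgl() both walk kiobuf->maplist[] from kiobuf->offset and use one scatter-gather segment per page touched, with no clustering. For an offset that falls inside the mapped pages, those loops reduce to the closed form below. This is a restatement as a userspace sketch, not code from the patch; it assumes 4 KB pages, and struct kio_layout and compute_kio_layout() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KB pages */

/* Hypothetical summary of what the patch derives from a kiobuf. */
struct kio_layout {
    size_t   first_page;     /* index of the first page in maplist[] */
    size_t   page_offset;    /* byte offset of the data in that page */
    unsigned first_sectors;  /* becomes req->current_nr_sectors      */
    unsigned nr_segments;    /* becomes req->nr_segments             */
};

static struct kio_layout compute_kio_layout(size_t offset, size_t length)
{
    struct kio_layout l;
    size_t first_bytes;

    /* The patch only accepts whole 512-byte sectors. */
    assert(length > 0 && length % 512 == 0);

    l.first_page  = offset / SKETCH_PAGE_SIZE;
    l.page_offset = offset % SKETCH_PAGE_SIZE;

    /* The first segment runs to the end of the first page, or less. */
    first_bytes = SKETCH_PAGE_SIZE - l.page_offset;
    if (first_bytes > length)
        first_bytes = length;
    l.first_sectors = (unsigned)(first_bytes >> 9);

    /* No clustering: each page spanned by the data is one segment. */
    l.nr_segments = (unsigned)((l.page_offset + length + SKETCH_PAGE_SIZE - 1)
                               / SKETCH_PAGE_SIZE);
    return l;
}
```

For example, a 4 KB transfer starting 512 bytes into its first page spans two pages and so needs two segments (7 sectors, then 1), whereas the same transfer on a page boundary needs one.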
Thanks much,
-Chait.
----------------------------CUT HERE---------------------------------------
--- pre9.2-sct/drivers/block/ll_rw_blk.c Tue May 23 14:24:22 2000
+++ pre9.2-sct+mine/drivers/block/ll_rw_blk.c Tue May 23 14:38:20 2000
@@ -4,6 +4,7 @@
* Copyright (C) 1991, 1992 Linus Torvalds
* Copyright (C) 1994, Karl Keyte: Added support for disk statistics
* Elevator latency, (C) 2000 Andrea Arcangeli <andrea@suse.de> SuSE
+ * Support for kiobuf-based I/O requests: Chaitanya Tumuluri [chait@sgi.com]
*/
/*
@@ -639,7 +640,8 @@
starving = 1;
if (latency < 0)
continue;
-
+ if (req->kiobuf)
+ continue;
if (req->sem)
continue;
if (req->cmd != rw)
@@ -744,6 +746,7 @@
req->nr_hw_segments = 1; /* Always 1 for a new request. */
req->buffer = bh->b_data;
req->sem = NULL;
+ req->kiobuf = NULL;
req->bh = bh;
req->bhtail = bh;
req->q = q;
@@ -886,6 +889,311 @@
__ll_rw_block(rw, nr, bh, 1);
}
+/*
+ * Function: __make_kio_request()
+ *
+ * Purpose: Construct a kiobuf-based request and insert into request queue.
+ *
+ * Arguments: q - request queue of device
+ * rw - read/write
+ * kiobuf - collection of pages
+ * dev - device against which I/O requested
+ * blocknr - dev block number at which to start I/O
+ * blksize - units (512B or other) of blocknr
+ *
+ * Lock status: No lock held upon entry.
+ *
+ * Returns: Nothing
+ *
+ * Notes: Requests generated by this function should _NOT_ be merged by
+ * the __make_request() (new check for `req->kiobuf')
+ *
+ * All (relevant) req->Y parameters are expressed in sector size
+ * of 512B for kiobuf based I/O. This is assumed in the scsi
+ * mid-layer as well.
+ */
+static inline void __make_kio_request(request_queue_t * q,
+ int rw,
+ struct kiobuf * kiobuf,
+ kdev_t dev,
+ unsigned long blocknr,
+ size_t blksize)
+{
+ int major = MAJOR(dev);
+ unsigned int sector, count, nr_bytes, total_bytes, nr_seg;
+ struct request * req;
+ int rw_ahead, max_req;
+ unsigned long flags;
+ struct list_head * head = &q->queue_head;
+ size_t curr_offset;
+ int orig_latency;
+ elevator_t * elevator;
+ int correct_size, i, kioind;
+
+ /*
+ * Sanity Tests:
+ *
+ * The input arg. `blocknr' is in units of the
+ * input arg. `blksize' (inode->i_sb->s_blocksize).
+ * Convert to 512B unit used in blk_size[] array.
+ */
+ count = kiobuf->length >> 9;
+ sector = blocknr * (blksize >> 9);
+
+ if (blk_size[major]) {
+ unsigned long maxsector = (blk_size[major][MINOR(dev)] << 1) + 1;
+
+ if (maxsector < count || maxsector - count < sector) {
+ if (!blk_size[major][MINOR(dev)]) {
+ kiobuf->errno = -EINVAL;
+ goto end_io;
+ }
+ /* This may well happen - the kernel calls bread()
+ without checking the size of the device, e.g.,
+ when mounting a device. */
+ printk(KERN_INFO
+ "attempt to access beyond end of device\n");
+ printk(KERN_INFO "%s: rw=%d, want=%d, limit=%d\n",
+ kdevname(dev), rw,
+ (sector + count)>>1,
+ blk_size[major][MINOR(dev)]);
+ kiobuf->errno = -ESPIPE;
+ goto end_io;
+ }
+ }
+ /*
+ * Allow only basic block size multiples in the
+ * kiobuf->length.
+ */
+ correct_size = BLOCK_SIZE;
+ if (blksize_size[major]) {
+ i = blksize_size[major][MINOR(dev)];
+ if (i)
+ correct_size = i;
+ }
+ if ((kiobuf->length % correct_size) != 0) {
+ printk(KERN_NOTICE "ll_rw_kio: "
+ "request size [%d] not a multiple of device [%s] block-size [%d]\n",
+ kiobuf->length,
+ kdevname(dev),
+ correct_size);
+ kiobuf->errno = -EINVAL;
+ goto end_io;
+ }
+ rw_ahead = 0; /* normal case; gets changed below for READA */
+ switch (rw) {
+ case READA:
+ rw_ahead = 1;
+ rw = READ; /* drop into READ */
+ case READ:
+ kstat.pgpgin++;
+ max_req = NR_REQUEST; /* reads take precedence */
+ break;
+ case WRITERAW:
+ rw = WRITE;
+ goto do_write; /* Skip the buffer refile */
+ case WRITE:
+ do_write:
+ /*
+ * We don't allow the write-requests to fill up the
+ * queue completely: we want some room for reads,
+ * as they take precedence. The last third of the
+ * requests are only for reads.
+ */
+ kstat.pgpgout++;
+ max_req = (NR_REQUEST * 2) / 3;
+ break;
+ default:
+ BUG();
+ kiobuf->errno = -EINVAL;
+ goto end_io;
+ }
+
+ /*
+ * Creation of bounce buffers for data in high memory
+ * should be (and is) handled lower in the food chain;
+ * currently in scsi_merge.c for SCSI disks.
+ *
+ * Look for a free request with spinlock held.
+ * Apart from atomic queue access, it prevents
+ * another thread that has already queued a kiobuf-request
+ * into this queue from starting it, till we are done.
+ */
+ elevator = &q->elevator;
+ orig_latency = elevator_request_latency(elevator, rw);
+ spin_lock_irqsave(&io_request_lock,flags);
+
+ if (list_empty(head))
+ q->plug_device_fn(q, dev);
+ /*
+ * The scsi disk and cdrom drivers completely remove the request
+ * from the queue when they start processing an entry. For this
+ * reason it is safe to continue to add links to the top entry
+ * for those devices.
+ *
+ * All other drivers need to jump over the first entry, as that
+ * entry may be busy being processed and we thus can't change
+ * it.
+ */
+ if (q->head_active && !q->plugged)
+ head = head->next;
+
+ /* find an unused request. */
+ req = get_request(max_req, dev);
+
+ /*
+ * if no request available: if rw_ahead, forget it,
+ * otherwise try again blocking..
+ */
+ if (!req) {
+ spin_unlock_irqrestore(&io_request_lock,flags);
+ if (rw_ahead){
+ kiobuf->errno = -EBUSY;
+ goto end_io;
+ }
+ req = __get_request_wait(max_req, dev);
+ spin_lock_irqsave(&io_request_lock,flags);
+
+ /* revalidate elevator */
+ head = &q->queue_head;
+ if (q->head_active && !q->plugged)
+ head = head->next;
+ }
+
+ /* fill up the request-info, and add it to the queue */
+ req->cmd = rw;
+ req->errors = 0;
+ req->sector = sector;
+ req->nr_hw_segments = 1; /* Always 1 for a new request. */
+ req->nr_sectors = count; /* Length of kiobuf */
+ req->sem = NULL;
+ req->kiobuf = kiobuf;
+ req->bh = NULL;
+ req->bhtail = NULL;
+ req->q = q;
+ /* Calculate req->buffer */
+ curr_offset = kiobuf->offset;
+ for (kioind=0; kioind<kiobuf->nr_pages; kioind++)
+ if (curr_offset >= PAGE_SIZE)
+ curr_offset -= PAGE_SIZE;
+ else
+ break;
+ req->buffer = (char *) page_address(kiobuf->maplist[kioind]) +
+ curr_offset;
+
+ /* Calculate current_nr_sectors and # of scatter gather segments needed */
+ total_bytes = kiobuf->length;
+ nr_bytes = (PAGE_SIZE - curr_offset) > total_bytes ?
+ total_bytes : (PAGE_SIZE - curr_offset);
+ req->current_nr_sectors = nr_bytes >> 9;
+
+ for (nr_seg = 1;
+ kioind<kiobuf->nr_pages && nr_bytes != total_bytes;
+ kioind++) {
+ ++nr_seg;
+ if((nr_bytes + PAGE_SIZE) > total_bytes){
+ break;
+ } else {
+ nr_bytes += PAGE_SIZE;
+ }
+ }
+ req->nr_segments = nr_seg;
+
+ add_request(q, req, head, orig_latency);
+ elevator_account_request(elevator, req);
+
+ spin_unlock_irqrestore(&io_request_lock, flags);
+
+end_io:
+ return;
+}
+
+
+
+/*
+ * Function: ll_rw_kio()
+ *
+ * Purpose: Insert kiobuf-based request into request queue.
+ *
+ * Arguments: rw - read/write
+ * kiobuf - collection of pages
+ * dev - device against which I/O requested
+ * blocknr - dev block number at which to start I/O
+ * sector - units (512B or other) of blocknr
+ * error - return status
+ *
+ * Lock status: Assumed no lock held upon entry.
+ * Assumed that the pages in the kiobuf ___ARE LOCKED DOWN___.
+ *
+ * Returns: Nothing
+ *
+ * Notes: This function is called from any subsystem using kiovec[]
+ * collection of kiobufs for I/O (e.g. `pagebufs', raw-io).
+ * Relies on "kiobuf" field in the request structure.
+ */
+void ll_rw_kio(int rw,
+ struct kiobuf *kiobuf,
+ kdev_t dev,
+ unsigned long blocknr,
+ size_t sector,
+ int *error)
+{
+ request_queue_t *q;
+ /*
+ * Only support SCSI disk for now.
+ *
+ * ENOSYS to indicate caller
+ * should try ll_rw_block()
+ * for non-SCSI (e.g. IDE) disks
+ * and for MD requests.
+ */
+ if (!SCSI_DISK_MAJOR(MAJOR(dev)) ||
+ (MAJOR(dev) == MD_MAJOR)) {
+ *error = -ENOSYS;
+ goto end_io;
+ }
+ /*
+ * Sanity checks
+ */
+ q = blk_get_queue(dev);
+ if (!q) {
+ printk(KERN_ERR
+ "ll_rw_kio: nonexistent block-device %s\n",
+ kdevname(dev));
+ *error = -ENODEV;
+ goto end_io;
+ }
+ if ((rw & WRITE) && is_read_only(dev)) {
+ printk(KERN_NOTICE "Can't write to read-only device %s\n",
+ kdevname(dev));
+ *error = -EPERM;
+ goto end_io;
+ }
+ if (q->make_request_fn) {
+ printk(KERN_ERR
+ "ll_rw_kio: Unexpected device [%s] queueing function encountered\n",
+ kdevname(dev));
+ *error = -ENOSYS;
+ goto end_io;
+ }
+
+ __make_kio_request(q, rw, kiobuf, dev, blocknr, sector);
+ if (kiobuf->errno != 0) {
+ *error = kiobuf->errno;
+ goto end_io;
+ }
+
+ return;
+end_io:
+ /*
+ * We come here only on an error, so just make
+ * sure kiobuf->errno is set for the caller.
+ */
+ if(kiobuf->errno == 0)
+ kiobuf->errno = *error;
+}
+
+
#ifdef CONFIG_STRAM_SWAP
extern int stram_device_init (void);
#endif
@@ -1079,3 +1387,5 @@
EXPORT_SYMBOL(blk_queue_pluggable);
EXPORT_SYMBOL(blk_queue_make_request);
EXPORT_SYMBOL(generic_make_request);
+EXPORT_SYMBOL(__make_kio_request);
+EXPORT_SYMBOL(ll_rw_kio);
--- pre9.2-sct/drivers/char/raw.c Tue May 23 14:25:36 2000
+++ pre9.2-sct+mine/drivers/char/raw.c Mon May 22 19:00:09 2000
@@ -238,6 +238,63 @@
#define SECTOR_SIZE (1U << SECTOR_BITS)
#define SECTOR_MASK (SECTOR_SIZE - 1)
+/*
+ * IO completion routine for a kiobuf-based request.
+ */
+static void end_kiobuf_io_kiobuf(struct kiobuf *kiobuf)
+{
+ kiobuf->locked = 0;
+ if (atomic_dec_and_test(&kiobuf->io_count))
+ wake_up(&kiobuf->wait_queue);
+}
+
+/*
+ * Send I/O down the ll_rw_kio() path first.
+ * It is assumed that any requisite locking
+ * and unlocking of pages in the kiobuf has
+ * been taken care of by the caller.
+ *
+ * Return 0 if I/O should be retried on buffer_head path.
+ * Return number of transferred bytes if successful.
+ * Return -1 if there was an I/O error.
+ */
+static inline int try_kiobuf_io(struct kiobuf *iobuf,
+ int rw,
+ unsigned long blocknr,
+ kdev_t dev,
+ char *buf,
+ size_t sector_size)
+{
+ int err, retval;
+
+ iobuf->end_io = end_kiobuf_io_kiobuf;
+ iobuf->errno = 0;
+ iobuf->locked = 1;
+ atomic_inc(&iobuf->io_count);
+ err = 0;
+ ll_rw_kio(rw, iobuf, dev, blocknr, sector_size, &err);
+
+ if ( err == 0 ) {
+ kiobuf_wait_for_io(iobuf);
+ if (iobuf->errno == 0) {
+ retval = iobuf->length; /* Success */
+ } else {
+ retval = -1; /* I/O error */
+ }
+ } else {
+ atomic_dec(&iobuf->io_count);
+ if ( err == -ENOSYS ) {
+ retval = 0; /* Retry the buffer_head path */
+ } else {
+ retval = -1; /* I/O error */
+ }
+ }
+
+ iobuf->locked = 0;
+ return retval;
+}
+
+
ssize_t rw_raw_dev(int rw, struct file *filp, char *buf,
size_t size, loff_t *offp)
{
@@ -254,7 +311,7 @@
int sector_size, sector_bits, sector_mask;
int max_sectors;
-
+ int kiobuf_io = 1;
/*
* First, a few checks on device size limits
*/
@@ -290,17 +347,17 @@
if (err)
return err;
+ blocknr = *offp >> sector_bits;
/*
- * Split the IO into KIO_MAX_SECTORS chunks, mapping and
- * unmapping the single kiobuf as we go to perform each chunk of
- * IO.
+ * Try sending down the entire kiobuf first via ll_rw_kio().
+ * If not successful then, split the IO into KIO_MAX_SECTORS
+ * chunks, mapping and unmapping the single kiobuf as we go
+ * to perform each chunk of IO.
*/
-
- transferred = 0;
- blocknr = *offp >> sector_bits;
+ err = transferred = 0;
while (size > 0) {
blocks = size >> sector_bits;
- if (blocks > max_sectors)
+ if ((blocks > max_sectors) && (kiobuf_io == 0))
blocks = max_sectors;
if (blocks > limit - blocknr)
blocks = limit - blocknr;
@@ -318,11 +375,19 @@
if (err)
break;
#endif
-
- for (i=0; i < blocks; i++)
- b[i] = blocknr++;
-
- err = brw_kiovec(rw, 1, &iobuf, dev, b, sector_size);
+ if (kiobuf_io == 0) {
+ for (i=0; i < blocks; i++)
+ b[i] = blocknr++;
+ err = brw_kiovec(rw, 1, &iobuf, dev, b, sector_size);
+ } else {
+ err = try_kiobuf_io(iobuf, rw, blocknr, dev, buf, sector_size);
+ if ( err > 0 ) {
+ blocknr += (err >> sector_bits);
+ } else if ( err == 0 ) {
+ kiobuf_io = 0;
+ continue;
+ } /* else (err<0) => (err!=iosize); exit loop below */
+ }
if (err >= 0) {
transferred += err;
--- pre9.2-sct/drivers/scsi/scsi_lib.c Tue May 23 14:24:21 2000
+++ pre9.2-sct+mine/drivers/scsi/scsi_lib.c Tue May 23 14:42:31 2000
@@ -15,6 +15,8 @@
* a low-level driver if they wished. Note however that this file also
* contains the "default" versions of these functions, as we don't want to
* go through and retrofit queueing functions into all 30 some-odd drivers.
+ *
+ * Support for kiobuf-based I/O requests. [Chaitanya Tumuluri, chait@sgi.com]
*/
#define __NO_VERSION__
@@ -370,6 +372,161 @@
spin_unlock_irqrestore(&io_request_lock, flags);
}
+
+/*
+ * Function: __scsi_collect_bh_sectors()
+ *
+ * Purpose: Helper routine for __scsi_end_request() to mark some number
+ * (or all, if that is the case) of sectors complete.
+ *
+ * Arguments: req - request struct. from scsi command block.
+ * uptodate - 1 if I/O indicates success, 0 for I/O error.
+ * sectors - number of sectors we want to mark.
+ * leftovers- indicates if any sectors were not done.
+ *
+ * Lock status: Assumed that lock is not held upon entry.
+ *
+ * Returns: Nothing
+ *
+ * Notes: Separate buffer-head processing from kiobuf processing
+ */
+__inline static void __scsi_collect_bh_sectors(struct request *req,
+ int uptodate,
+ int sectors,
+ char **leftovers)
+{
+ struct buffer_head *bh;
+
+ do {
+ if ((bh = req->bh) != NULL) {
+ req->bh = bh->b_reqnext;
+ req->nr_sectors -= bh->b_size >> 9;
+ req->sector += bh->b_size >> 9;
+ bh->b_reqnext = NULL;
+ sectors -= bh->b_size >> 9;
+ bh->b_end_io(bh, uptodate);
+ if ((bh = req->bh) != NULL) {
+ req->current_nr_sectors = bh->b_size >> 9;
+ if (req->nr_sectors < req->current_nr_sectors) {
+ req->nr_sectors = req->current_nr_sectors;
+ printk("collect_bh: buffer-list destroyed\n");
+ }
+ }
+ }
+ } while (sectors && bh);
+
+ /* Check for leftovers */
+ if (req->bh) {
+ *leftovers = req->bh->b_data;
+ }
+ return;
+
+}
+
+
+/*
+ * Function: __scsi_collect_kio_sectors()
+ *
+ * Purpose: Helper routine for __scsi_end_request() to mark some number
+ * (or all) of the I/O sectors and attendant pages complete.
+ * Updates the request nr_segments, nr_sectors accordingly.
+ *
+ * Arguments: req - request struct. from scsi command block.
+ * uptodate - 1 if I/O indicates success, 0 for I/O error.
+ * sectors - number of sectors we want to mark.
+ * leftovers- indicates if any sectors were not done.
+ *
+ * Lock status: Assumed that lock is not held upon entry.
+ *
+ * Returns: Nothing
+ *
+ * Notes: Separate buffer-head processing from kiobuf processing.
+ * We don't know if this was a single or multi-segment sgl
+ * request. Treat it as though it were a multi-segment one.
+ */
+__inline static void __scsi_collect_kio_sectors(struct request *req,
+ int uptodate,
+ int sectors,
+ char **leftovers)
+{
+ int pgcnt, nr_pages;
+ size_t curr_offset;
+ unsigned long va = 0;
+ unsigned int nr_bytes, total_bytes, page_sectors;
+
+ nr_pages = req->kiobuf->nr_pages;
+ total_bytes = (req->nr_sectors << 9);
+ curr_offset = req->kiobuf->offset;
+
+ /*
+ * In the case of leftover requests, the kiobuf->length
+ * remains the same, but req->nr_sectors would be smaller.
+ * Adjust curr_offset in this case. If not a leftover,
+ * the following makes no difference.
+ */
+ curr_offset += (((req->kiobuf->length >> 9) - req->nr_sectors) << 9);
+
+ /* How far into the kiobuf is the offset? */
+ for (pgcnt=0; pgcnt<nr_pages; pgcnt++) {
+ if(curr_offset >= PAGE_SIZE) {
+ curr_offset -= PAGE_SIZE;
+ continue;
+ } else {
+ break;
+ }
+ }
+ /*
+ * Reusing the pgcnt and va value from above:
+ * Harvest pages to account for number of sectors
+ * passed into function.
+ */
+ for (nr_bytes = 0;
+ pgcnt<nr_pages && nr_bytes != total_bytes;
+ pgcnt++) {
+ va = page_address(req->kiobuf->maplist[pgcnt])
+ + curr_offset;
+ /* First page or final page? Partial page? */
+ if (curr_offset != 0) {
+ page_sectors = (PAGE_SIZE - curr_offset) > total_bytes ?
+ total_bytes >> 9 : (PAGE_SIZE - curr_offset) >> 9;
+ curr_offset = 0;
+ } else if((nr_bytes + PAGE_SIZE) > total_bytes) {
+ page_sectors = (total_bytes - nr_bytes) >> 9;
+ } else {
+ page_sectors = PAGE_SIZE >> 9;
+ }
+ nr_bytes += (page_sectors << 9);
+ /* Leftover sectors in this page (onward)? */
+ if (sectors < page_sectors) {
+ req->nr_sectors -= sectors;
+ req->sector += sectors;
+ req->current_nr_sectors = page_sectors - sectors;
+ va += (sectors << 9); /* Update for req->buffer */
+ sectors = 0;
+ break;
+ } else {
+ /* Mark this page as done */
+ req->nr_segments--; /* No clustering for kiobuf */
+ req->nr_sectors -= page_sectors;
+ req->sector += page_sectors;
+ if (!uptodate && (req->kiobuf->errno == 0)){
+ req->kiobuf->errno = -EIO;
+ }
+ sectors -= page_sectors;
+ }
+ }
+
+ /* Check for leftovers */
+ if (req->nr_sectors) {
+ *leftovers = (char *)va;
+ } else if (req->kiobuf->end_io) {
+ req->kiobuf->end_io(req->kiobuf);
+ }
+
+ return;
+}
+
+
/*
* Function: scsi_end_request()
*
@@ -397,7 +554,7 @@
int requeue)
{
struct request *req;
- struct buffer_head *bh;
+ char * leftovers = NULL;
ASSERT_LOCK(&io_request_lock, 0);
@@ -407,39 +564,29 @@
printk(" I/O error: dev %s, sector %lu\n",
kdevname(req->rq_dev), req->sector);
}
- do {
- if ((bh = req->bh) != NULL) {
- req->bh = bh->b_reqnext;
- req->nr_sectors -= bh->b_size >> 9;
- req->sector += bh->b_size >> 9;
- bh->b_reqnext = NULL;
- sectors -= bh->b_size >> 9;
- bh->b_end_io(bh, uptodate);
- if ((bh = req->bh) != NULL) {
- req->current_nr_sectors = bh->b_size >> 9;
- if (req->nr_sectors < req->current_nr_sectors) {
- req->nr_sectors = req->current_nr_sectors;
- printk("scsi_end_request: buffer-list destroyed\n");
- }
- }
- }
- } while (sectors && bh);
+ leftovers = NULL;
+ if (req->bh != NULL) { /* Buffer head based request */
+ __scsi_collect_bh_sectors(req, uptodate, sectors, &leftovers);
+ } else if (req->kiobuf != NULL) { /* Kiobuf based request */
+ __scsi_collect_kio_sectors(req, uptodate, sectors, &leftovers);
+ } else {
+ panic("Both bh and kiobuf pointers are unset in request!\n");
+ }
/*
* If there are blocks left over at the end, set up the command
* to queue the remainder of them.
*/
- if (req->bh) {
+ if (leftovers != NULL) {
request_queue_t *q;
- if( !requeue )
- {
+ if( !requeue ) {
return SCpnt;
}
q = &SCpnt->device->request_queue;
- req->buffer = bh->b_data;
+ req->buffer = leftovers;
/*
* Bleah. Leftovers again. Stick the leftovers in
* the front of the queue, and goose the queue again.
--- pre9.2-sct/drivers/scsi/scsi_merge.c Tue May 23 14:24:22 2000
+++ pre9.2-sct+mine/drivers/scsi/scsi_merge.c Tue May 23 14:23:29 2000
@@ -6,6 +6,7 @@
* Based upon conversations with large numbers
* of people at Linux Expo.
* Support for dynamic DMA mapping: Jakub Jelinek (jakub@redhat.com).
+ * Support for kiobuf-based I/O requests. [Chaitanya Tumuluri, chait@sgi.com]
*/
/*
@@ -90,12 +91,13 @@
printk("nr_segments is %x\n", req->nr_segments);
printk("counted segments is %x\n", segments);
printk("Flags %d %d\n", use_clustering, dma_host);
- for (bh = req->bh; bh->b_reqnext != NULL; bh = bh->b_reqnext)
- {
- printk("Segment 0x%p, blocks %d, addr 0x%lx\n",
- bh,
- bh->b_size >> 9,
- virt_to_phys(bh->b_data - 1));
+ if (req->bh != NULL) {
+ for (bh = req->bh; bh->b_reqnext != NULL; bh = bh->b_reqnext) {
+ printk("Segment 0x%p, blocks %d, addr 0x%lx\n",
+ bh,
+ bh->b_size >> 9,
+ virt_to_phys(bh->b_data - 1));
+ }
}
panic("Ththththaats all folks. Too dangerous to continue.\n");
}
@@ -298,9 +300,22 @@
SHpnt = SCpnt->host;
SDpnt = SCpnt->device;
- req->nr_segments = __count_segments(req,
- CLUSTERABLE_DEVICE(SHpnt, SDpnt),
- SHpnt->unchecked_isa_dma, NULL);
+ if (req->kiobuf) {
+ /* Since there is no clustering/merging in kiobuf
+ * requests, the nr_segments is simply a count of
+ * the number of pages needing I/O. nr_segments is
+ * updated in __scsi_collect_kio_sectors() called
+ * from scsi_end_request(), for the leftover case.
+ * [chait@sgi.com]
+ */
+ return;
+ } else if (req->bh) {
+ req->nr_segments = __count_segments(req,
+ CLUSTERABLE_DEVICE(SHpnt, SDpnt),
+ SHpnt->unchecked_isa_dma, NULL);
+ } else {
+ panic("Both kiobuf and bh pointers are NULL!");
+ }
}
#define MERGEABLE_BUFFERS(X,Y) \
@@ -745,6 +760,191 @@
MERGEREQFCT(scsi_merge_requests_fn_, 0, 0)
MERGEREQFCT(scsi_merge_requests_fn_c, 1, 0)
MERGEREQFCT(scsi_merge_requests_fn_dc, 1, 1)
+
+
+
+/*
+ * Function: scsi_bh_sgl()
+ *
+ * Purpose: Helper routine to construct S(catter) G(ather) L(ist)
+ * assuming buffer_head-based request in the Scsi_Cmnd.
+ *
+ * Arguments: SCpnt - Command descriptor
+ * use_clustering - 1 if host uses clustering
+ * dma_host - 1 if this host has ISA DMA issues (bus doesn't
+ * expose all of the address lines, so that DMA cannot
+ * be done from an arbitrary address).
+ * sgpnt - pointer to sgl
+ *
+ * Returns: Number of sg segments in the sgl.
+ *
+ * Notes: Only the SCpnt argument should be a non-constant variable.
+ * This functionality was abstracted out of the original code
+ * in __init_io().
+ */
+__inline static int scsi_bh_sgl(Scsi_Cmnd * SCpnt,
+ int use_clustering,
+ int dma_host,
+ struct scatterlist * sgpnt)
+{
+ int count;
+ struct buffer_head * bh;
+ struct buffer_head * bhprev;
+
+ bhprev = NULL;
+
+ for (count = 0, bh = SCpnt->request.bh;
+ bh; bh = bh->b_reqnext) {
+ if (use_clustering && bhprev != NULL) {
+ if (dma_host &&
+ virt_to_phys(bhprev->b_data) - 1 == ISA_DMA_THRESHOLD) {
+ /* Nothing - fall through */
+ } else if (CONTIGUOUS_BUFFERS(bhprev, bh)) {
+ /*
+ * This one is OK. Let it go. Note that we
+ * do not have the ability to allocate
+ * bounce buffer segments > PAGE_SIZE, so
+ * for now we limit the thing.
+ */
+ if( dma_host ) {
+#ifdef DMA_SEGMENT_SIZE_LIMITED
+ if( virt_to_phys(bh->b_data) - 1 < ISA_DMA_THRESHOLD
+ || sgpnt[count - 1].length + bh->b_size <= PAGE_SIZE ) {
+ sgpnt[count - 1].length += bh->b_size;
+ bhprev = bh;
+ continue;
+ }
+#else
+ sgpnt[count - 1].length += bh->b_size;
+ bhprev = bh;
+ continue;
+#endif
+ } else {
+ sgpnt[count - 1].length += bh->b_size;
+ SCpnt->request_bufflen += bh->b_size;
+ bhprev = bh;
+ continue;
+ }
+ }
+ }
+ count++;
+ sgpnt[count - 1].address = bh->b_data;
+ sgpnt[count - 1].length += bh->b_size;
+ if (!dma_host) {
+ SCpnt->request_bufflen += bh->b_size;
+ }
+ bhprev = bh;
+ }
+
+ return count;
+}
+
+
+/*
+ * Function: scsi_kio_sgl()
+ *
+ * Purpose: Helper routine to construct S(catter) G(ather) L(ist)
+ * assuming kiobuf-based request in the Scsi_Cmnd.
+ *
+ * Arguments: SCpnt - Command descriptor
+ * dma_host - 1 if this host has ISA DMA issues (bus doesn't
+ * expose all of the address lines, so that DMA cannot
+ * be done from an arbitrary address).
+ * sgpnt - pointer to sgl
+ *
+ * Returns: Number of sg segments in the sgl.
+ *
+ * Notes: Only the SCpnt argument should be a non-constant variable.
+ * This functionality was created out of __init_io() in the
+ * original implementation for constructing the sgl for
+ * kiobuf-based I/Os as well.
+ *
+ * Constructs SCpnt->use_sg sgl segments for the kiobuf.
+ *
+ * Unlike the buffer_head case, no clustering of pages is attempted,
+ * primarily because the pages in a kiobuf are unlikely to be
+ * physically contiguous. This bears checking.
+ */
+__inline static int scsi_kio_sgl(Scsi_Cmnd * SCpnt,
+ int dma_host,
+ struct scatterlist * sgpnt)
+{
+ int pgcnt, nr_seg, curr_seg, nr_sectors;
+ size_t curr_offset;
+ unsigned long va;
+ unsigned int nr_bytes, total_bytes, sgl_seg_bytes;
+
+ curr_seg = SCpnt->use_sg; /* This many sgl segments */
+ nr_sectors = SCpnt->request.nr_sectors;
+ total_bytes = (nr_sectors << 9);
+ curr_offset = SCpnt->request.kiobuf->offset;
+
+ /*
+ * In the case of leftover requests, the kiobuf->length
+ * remains the same, but req->nr_sectors would be smaller.
+ * Use this difference to adjust curr_offset in this case.
+ * If not a leftover, the following makes no difference.
+ */
+ curr_offset += (((SCpnt->request.kiobuf->length >> 9) - nr_sectors) << 9);
+ /* How far into the kiobuf is the offset? */
+ for (pgcnt=0; pgcnt<SCpnt->request.kiobuf->nr_pages; pgcnt++) {
+ if(curr_offset >= PAGE_SIZE) {
+ curr_offset -= PAGE_SIZE;
+ continue;
+ } else {
+ break;
+ }
+ }
+ /*
+ * Reusing the pgcnt value from above:
+ * Starting at the right page and offset, build curr_seg
+ * sgl segments (one per page). Account for both a
+ * potentially partial last page and unrequired pages
+ * at the end of the kiobuf.
+ */
+ nr_bytes = 0;
+ for (nr_seg = 0; nr_seg < curr_seg; nr_seg++) {
+ va = page_address(SCpnt->request.kiobuf->maplist[pgcnt])
+ + curr_offset;
+ ++pgcnt;
+
+ /*
+ * If this is the first page, account for offset.
+ * If this the final (maybe partial) page, get remainder.
+ */
+ if (curr_offset != 0) {
+ sgl_seg_bytes = PAGE_SIZE - curr_offset;
+ curr_offset = 0;
+ } else if((nr_bytes + PAGE_SIZE) > total_bytes) {
+ sgl_seg_bytes = total_bytes - nr_bytes;
+ } else {
+ sgl_seg_bytes = PAGE_SIZE;
+ }
+
+ nr_bytes += sgl_seg_bytes;
+ sgpnt[nr_seg].address = (char *)va;
+ sgpnt[nr_seg].alt_address = 0;
+ sgpnt[nr_seg].length = sgl_seg_bytes;
+
+ if (!dma_host) {
+ SCpnt->request_bufflen += sgl_seg_bytes;
+ }
+ }
+ /* Sanity Check */
+ if ((nr_bytes > total_bytes) ||
+ (pgcnt > SCpnt->request.kiobuf->nr_pages)) {
+ printk(KERN_ERR
+ "scsi_kio_sgl: sgl bytes[%d], request bytes[%d]\n"
+ "scsi_kio_sgl: pgcnt[%d], kiobuf->pgcnt[%d]!\n",
+ nr_bytes, total_bytes, pgcnt, SCpnt->request.kiobuf->nr_pages);
+ BUG();
+ }
+ return nr_seg;
+
+}
+
+
+
/*
* Function: __init_io()
*
@@ -777,6 +977,9 @@
* gather list, the sg count in the request won't be valid
* (mainly because we don't need queue management functions
* which keep the tally uptodate.
+ *
+ * Modified to handle kiobuf argument in the SCpnt->request
+ * structure.
*/
__inline static int __init_io(Scsi_Cmnd * SCpnt,
int sg_count_valid,
@@ -784,7 +987,6 @@
int dma_host)
{
struct buffer_head * bh;
- struct buffer_head * bhprev;
char * buff;
int count;
int i;
@@ -799,11 +1001,11 @@
* needed any more. Need to play with it and see if we hit the
* panic. If not, then don't bother.
*/
- if (!SCpnt->request.bh) {
+ if ((!SCpnt->request.bh && !SCpnt->request.kiobuf) ||
+ (SCpnt->request.bh && SCpnt->request.kiobuf)) {
/*
- * Case of page request (i.e. raw device), or unlinked buffer
- * Typically used for swapping, but this isn't how we do
- * swapping any more.
+ * Case of unlinked buffer. Typically used for swapping,
+ * but this isn't how we do swapping any more.
*/
panic("I believe this is dead code. If we hit this, I was wrong");
#if 0
@@ -819,6 +1021,12 @@
req = &SCpnt->request;
/*
* First we need to know how many scatter gather segments are needed.
+ *
+ * This test is redundant: per the comment below, sg_count_valid is
+ * always set to 1 (ll_rw_blk.c's estimate of req->nr_segments is trusted).
+ *
+ * For kiobuf requests the count is set in __make_kio_request(); since
+ * these requests are never merged, the count stays valid.
*/
if (!sg_count_valid) {
count = __count_segments(req, use_clustering, dma_host, NULL);
@@ -842,12 +1050,24 @@
this_count = SCpnt->request.nr_sectors;
goto single_segment;
}
+ /* Check whether the sgl would be larger than the host's sg
+ * table; if so, limit the sgl size. When the request sectors
+ * are harvested after I/O completion in __scsi_collect_kio_sectors(),
+ * the remaining sectors are reinjected into the request queue as
+ * a special command. This repeats until all the request sectors
+ * are done.
+ * [chait@sgi.com]
+ */
+ if((SCpnt->request.kiobuf != NULL) &&
+ (count > SCpnt->host->sg_tablesize)) {
+ count = SCpnt->host->sg_tablesize - 1;
+ }
SCpnt->use_sg = count;
-
/*
* Allocate the actual scatter-gather table itself.
* scsi_malloc can only allocate in chunks of 512 bytes
*/
+
SCpnt->sglist_len = (SCpnt->use_sg
* sizeof(struct scatterlist) + 511) & ~511;
@@ -872,51 +1092,14 @@
memset(sgpnt, 0, SCpnt->use_sg * sizeof(struct scatterlist));
SCpnt->request_buffer = (char *) sgpnt;
SCpnt->request_bufflen = 0;
- bhprev = NULL;
- for (count = 0, bh = SCpnt->request.bh;
- bh; bh = bh->b_reqnext) {
- if (use_clustering && bhprev != NULL) {
- if (dma_host &&
- virt_to_phys(bhprev->b_data) - 1 == ISA_DMA_THRESHOLD) {
- /* Nothing - fall through */
- } else if (CONTIGUOUS_BUFFERS(bhprev, bh)) {
- /*
- * This one is OK. Let it go. Note that we
- * do not have the ability to allocate
- * bounce buffer segments > PAGE_SIZE, so
- * for now we limit the thing.
- */
- if( dma_host ) {
-#ifdef DMA_SEGMENT_SIZE_LIMITED
- if( virt_to_phys(bh->b_data) - 1 < ISA_DMA_THRESHOLD
- || sgpnt[count - 1].length + bh->b_size <= PAGE_SIZE ) {
- sgpnt[count - 1].length += bh->b_size;
- bhprev = bh;
- continue;
- }
-#else
- sgpnt[count - 1].length += bh->b_size;
- bhprev = bh;
- continue;
-#endif
- } else {
- sgpnt[count - 1].length += bh->b_size;
- SCpnt->request_bufflen += bh->b_size;
- bhprev = bh;
- continue;
- }
- }
- }
- count++;
- sgpnt[count - 1].address = bh->b_data;
- sgpnt[count - 1].length += bh->b_size;
- if (!dma_host) {
- SCpnt->request_bufflen += bh->b_size;
- }
- bhprev = bh;
+ if (SCpnt->request.bh){
+ count = scsi_bh_sgl(SCpnt, use_clustering, dma_host, sgpnt);
+ } else if (SCpnt->request.kiobuf) {
+ count = scsi_kio_sgl(SCpnt, dma_host, sgpnt);
+ } else {
+ panic("Yowza! Both kiobuf and buffer_head pointers are null!");
}
-
/*
* Verify that the count is correct.
*/
@@ -1009,6 +1192,17 @@
scsi_free(SCpnt->request_buffer, SCpnt->sglist_len);
/*
+ * We should never get here for a kiobuf request.
+ *
+ * Each segment is a page, so if we could not allocate
+ * a bounce buffer for even the first page, the DMA
+ * buffer pool is exhausted!
+ */
+ if (SCpnt->request.kiobuf){
+ dma_exhausted(SCpnt, 0);
+ }
+
+ /*
* Make an attempt to pick up as much as we reasonably can.
* Just keep adding sectors until the pool starts running kind of
* low. The limit of 30 is somewhat arbitrary - the point is that
@@ -1043,7 +1237,6 @@
* segment. Possibly the entire request, or possibly a small
* chunk of the entire request.
*/
- bh = SCpnt->request.bh;
buff = SCpnt->request.buffer;
if (dma_host) {
@@ -1052,7 +1245,7 @@
* back and allocate a really small one - enough to satisfy
* the first buffer.
*/
- if (virt_to_phys(SCpnt->request.bh->b_data)
+ if (virt_to_phys(SCpnt->request.buffer)
+ (this_count << 9) - 1 > ISA_DMA_THRESHOLD) {
buff = (char *) scsi_malloc(this_count << 9);
if (!buff) {
@@ -1152,3 +1345,21 @@
SDpnt->scsi_init_io_fn = scsi_init_io_vdc;
}
}
+/*
+ * Overrides for Emacs so that we almost follow Linus's tabbing style.
+ * Emacs will notice this stuff at the end of the file and automatically
+ * adjust the settings for this buffer only. This must remain at the end
+ * of the file.
+ * ---------------------------------------------------------------------------
+ * Local variables:
+ * c-indent-level: 4
+ * c-brace-imaginary-offset: 0
+ * c-brace-offset: -4
+ * c-argdecl-indent: 4
+ * c-label-offset: -4
+ * c-continued-statement-offset: 4
+ * c-continued-brace-offset: 0
+ * indent-tabs-mode: nil
+ * tab-width: 8
+ * End:
+ */
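The scsi_merge.c hunks above hand scatter-gather construction to scsi_bh_sgl() or scsi_kio_sgl(), whose bodies are not part of this excerpt. As a rough user-space sketch of the arithmetic involved, the block below shows the 512-byte rounding used for sglist_len, an ISA bounce check of the same shape as the virt_to_phys(...) + (this_count << 9) - 1 > ISA_DMA_THRESHOLD test, and plain merging of physically contiguous segments (the "clustering" case). All names here (struct seg, struct sg_entry, build_sgl, SKETCH_ISA_DMA_THRESHOLD) are hypothetical, the 16 MB threshold value is an assumption, and the DMA-host page-size limits of the removed code are deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a buffer segment and a scatterlist entry;
 * the real kernel structures carry more fields. */
struct seg {
    uintptr_t phys;   /* physical start address of the segment */
    size_t    len;    /* length in bytes */
};

struct sg_entry {
    uintptr_t address;
    size_t    length;
};

/* Assumed value: an ISA DMA engine can only reach the first 16 MB. */
#define SKETCH_ISA_DMA_THRESHOLD 0x00ffffffUL

/* scsi_malloc hands out 512-byte chunks, so the table size is rounded
 * up to the next multiple of 512, as in the sglist_len computation. */
static size_t sglist_bytes(size_t use_sg, size_t entry_size)
{
    return (use_sg * entry_size + 511) & ~(size_t)511;
}

/* Does [phys, phys + len) reach past what ISA DMA can address? */
static int needs_bounce(uintptr_t phys, size_t len)
{
    assert(len > 0);
    return phys + len - 1 > SKETCH_ISA_DMA_THRESHOLD;
}

/* Build an sg table, merging a segment into the previous entry when it
 * starts exactly where that entry ends (the clustering case). */
static size_t build_sgl(const struct seg *segs, size_t n,
                        int use_clustering, struct sg_entry *sg)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (use_clustering && count > 0 &&
            sg[count - 1].address + sg[count - 1].length == segs[i].phys) {
            sg[count - 1].length += segs[i].len;  /* extend previous entry */
            continue;
        }
        sg[count].address = segs[i].phys;          /* start a new entry */
        sg[count].length  = segs[i].len;
        count++;
    }
    return count;
}
```

With clustering enabled, two back-to-back 1 KB segments collapse into one 2 KB entry; with it disabled, every segment gets its own entry. For a kiobuf request, where each segment is a whole page, contiguity between pages is presumably the exception rather than the rule.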
--- pre9.2-sct/drivers/scsi/sd.c Tue May 23 14:24:21 2000
+++ pre9.2-sct+mine/drivers/scsi/sd.c Mon May 22 17:53:29 2000
@@ -546,6 +546,7 @@
static void rw_intr(Scsi_Cmnd * SCpnt)
{
int result = SCpnt->result;
+
#if CONFIG_SCSI_LOGGING
char nbuff[6];
#endif
@@ -575,8 +576,14 @@
(SCpnt->sense_buffer[4] << 16) |
(SCpnt->sense_buffer[5] << 8) |
SCpnt->sense_buffer[6];
- if (SCpnt->request.bh != NULL)
- block_sectors = SCpnt->request.bh->b_size >> 9;
+
+ /* Tweak to support kiobuf-based I/O requests, [chait@sgi.com] */
+ if (SCpnt->request.kiobuf != NULL)
+ block_sectors = SCpnt->request.kiobuf->length >> 9;
+ else if (SCpnt->request.bh != NULL)
+ block_sectors = SCpnt->request.bh->b_size >> 9;
+ else
+ panic("Both kiobuf and bh pointers are null!\n");
switch (SCpnt->device->sector_size) {
case 1024:
error_sector <<= 1;
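The sd.c change picks the source of block_sectors in rw_intr(): the kiobuf when one is attached, otherwise the first buffer_head, and panic() if neither is set. The sketch below restates that selection in user space; the struct and function names are hypothetical, and returning -1 stands in for the panic. Note that the kiobuf branch derives the count from the kiobuf's length field rather than from a per-block size, and both branches convert bytes to 512-byte sectors with >> 9.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the two request payloads. */
struct kio_len_sketch { size_t length; };  /* bytes described by the kiobuf */
struct bh_size_sketch { size_t b_size; };  /* bytes in one buffer_head */

/* Prefer the kiobuf, fall back to the buffer_head; -1 marks the case
 * where the patch would panic(). */
static long block_sectors_for(const struct kio_len_sketch *kio,
                              const struct bh_size_sketch *bh)
{
    if (kio != NULL)
        return (long)(kio->length >> 9);
    if (bh != NULL)
        return (long)(bh->b_size >> 9);
    return -1;
}
```

A 64 KB kiobuf therefore yields 128 sectors, while a 1 KB buffer_head yields 2.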
--- pre9.2-sct/include/linux/blkdev.h Tue May 23 14:24:35 2000
+++ pre9.2-sct+mine/include/linux/blkdev.h Tue May 23 13:48:35 2000
@@ -6,6 +6,7 @@
#include <linux/genhd.h>
#include <linux/tqueue.h>
#include <linux/list.h>
+#include <linux/iobuf.h>
struct request_queue;
typedef struct request_queue request_queue_t;
@@ -39,6 +40,7 @@
void * special;
char * buffer;
struct semaphore * sem;
+ struct kiobuf * kiobuf;
struct buffer_head * bh;
struct buffer_head * bhtail;
request_queue_t * q;
--- pre9.2-sct/include/linux/elevator.h Tue May 23 14:24:36 2000
+++ pre9.2-sct+mine/include/linux/elevator.h Mon May 22 19:05:15 2000
@@ -107,7 +107,12 @@
elevator->sequence++;
if (req->cmd == READ)
elevator->read_pendings++;
- elevator->nr_segments++;
+
+ if (req->kiobuf != NULL) {
+ elevator->nr_segments += req->nr_segments;
+ } else {
+ elevator->nr_segments++;
+ }
}
static inline int elevator_request_latency(elevator_t * elevator, int rw)
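With a kiobuf pointer now carried in struct request, the elevator.h hunk charges a kiobuf request with all of its segments at once, while a buffer_head request still adds one, presumably because a freshly queued buffer_head request starts out as a single segment. The following is a minimal model of that accounting; the structs, the enum and account_request() are hypothetical simplifications, not the kernel's elevator_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down request and elevator; only the fields the
 * patched accounting touches are kept. */
struct request_sketch {
    int   cmd;           /* SKETCH_READ or SKETCH_WRITE */
    void *kiobuf;        /* non-NULL for kiobuf-based requests */
    int   nr_segments;   /* segments already built into the request */
};

struct elevator_sketch {
    int sequence;
    int read_pendings;
    int nr_segments;
};

enum { SKETCH_READ = 0, SKETCH_WRITE = 1 };

/* Kiobuf requests arrive fully built and are charged all their
 * segments; buffer_head requests are charged one. */
static void account_request(struct elevator_sketch *e,
                            const struct request_sketch *req)
{
    e->sequence++;
    if (req->cmd == SKETCH_READ)
        e->read_pendings++;
    if (req->kiobuf != NULL)
        e->nr_segments += req->nr_segments;
    else
        e->nr_segments++;
}
```

Queuing one buffer_head read and one 16-segment kiobuf write leaves the elevator with 17 segments accounted.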
--- pre9.2-sct/include/linux/fs.h Tue May 23 14:24:34 2000
+++ pre9.2-sct+mine/include/linux/fs.h Mon May 22 17:56:47 2000
@@ -1063,6 +1063,7 @@
extern struct buffer_head * get_hash_table(kdev_t, int, int);
extern struct buffer_head * getblk(kdev_t, int, int);
extern void ll_rw_block(int, int, struct buffer_head * bh[]);
+extern void ll_rw_kio(int, struct kiobuf *, kdev_t, unsigned long, size_t, int *);
extern int is_read_only(kdev_t);
extern void __brelse(struct buffer_head *);
static inline void brelse(struct buffer_head *buf)
--- pre9.2-sct/include/linux/iobuf.h Tue May 23 14:25:30 2000
+++ pre9.2-sct+mine/include/linux/iobuf.h Mon May 22 18:01:30 2000
@@ -56,6 +56,7 @@
atomic_t io_count; /* IOs still in progress */
int errno; /* Status of completed IO */
void (*end_io) (struct kiobuf *); /* Completion callback */
+ void *k_dev_id; /* Store kiovec (or pagebuf) here */
wait_queue_head_t wait_queue;
};
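The iobuf.h hunk adds an opaque k_dev_id pointer ("Store kiovec (or pagebuf) here"), but the patch does not show how it is consumed. One plausible use, sketched below under that assumption, is letting an end_io completion callback find the container that owns the kiobuf without a global lookup. The structs and owner_end_io() are hypothetical, and the status field is named errno_val to avoid clashing with the errno macro in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down kiobuf: only the completion hook and the
 * k_dev_id back-pointer added by the patch are modelled. */
struct kiobuf_cb_sketch {
    int   errno_val;                             /* status of completed IO */
    void (*end_io)(struct kiobuf_cb_sketch *);   /* completion callback */
    void *k_dev_id;                              /* owner, e.g. a kiovec */
};

/* Hypothetical owner grouping several kiobufs into one logical I/O. */
struct kiovec_owner {
    int outstanding;   /* kiobufs not yet completed */
    int first_error;   /* first non-zero status seen, 0 if none */
};

/* Completion handler: recover the owner through k_dev_id, record the
 * first error and count the kiobuf as done. */
static void owner_end_io(struct kiobuf_cb_sketch *kio)
{
    struct kiovec_owner *owner = kio->k_dev_id;

    if (kio->errno_val != 0 && owner->first_error == 0)
        owner->first_error = kio->errno_val;
    owner->outstanding--;
}
```

Once every kiobuf has completed, the owner's outstanding count reaches zero and it holds the first error reported, if any.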
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/