* 2.5.20 RAID5 compile error (20+ messages in thread)
From: Mike Black @ 2002-06-03 18:36 UTC
To: linux-kernel

RAID5 still doesn't compile....sigh....

gcc -D__KERNEL__ -I/usr/src/linux-2.5.14/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer -fno-strict-aliasing -fno-common -pipe -mpreferred-stack-boundary=2 -march=i686 -DMODULE -DKBUILD_BASENAME=raid5 -c -o raid5.o raid5.c
raid5.c: In function `raid5_plug_device':
raid5.c:1247: `tq_disk' undeclared (first use in this function)
raid5.c:1247: (Each undeclared identifier is reported only once
raid5.c:1247: for each function it appears in.)
raid5.c: In function `run':
raid5.c:1589: sizeof applied to an incomplete type
make[2]: *** [raid5.o] Error 1
make[2]: Leaving directory `/usr/src/linux-2.5.14/drivers/md'
make[1]: *** [_modsubdir_md] Error 2
make[1]: Leaving directory `/usr/src/linux-2.5.14/drivers'
make: *** [_mod_drivers] Error 2

Michael D. Black
mblack@csihq.com
http://www.csihq.com/  http://www.csihq.com/~mike
321-676-2923, x203
Melbourne FL
* Re: 2.5.20 RAID5 compile error
From: Jens Axboe @ 2002-06-04 11:51 UTC
To: Mike Black; +Cc: linux-kernel

On Mon, Jun 03 2002, Mike Black wrote:
> RAID5 still doesn't compile....sigh....

[snip]

Some people do nothing but complain instead of trying to fix things.
Sigh...

--
Jens Axboe
* Re: 2.5.20 RAID5 compile error
From: Neil Brown @ 2002-06-04 11:56 UTC
To: Jens Axboe; +Cc: Mike Black, linux-kernel

On Tuesday June 4, axboe@suse.de wrote:
> On Mon, Jun 03 2002, Mike Black wrote:
> > RAID5 still doesn't compile....sigh....
>
> [snip]
>
> Some people do nothing but complain instead of trying to fix things.
> Sigh...

I've got fixes.... but I want to suggest some changes to the plugging
mechanism, and as it seems to have changed a bit since 2.5.20, I'll
have to sync up my patch before I show it to you...

NeilBrown
* Re: 2.5.20 RAID5 compile error
From: Jens Axboe @ 2002-06-04 11:58 UTC
To: Neil Brown; +Cc: Mike Black, linux-kernel

On Tue, Jun 04 2002, Neil Brown wrote:
> On Tuesday June 4, axboe@suse.de wrote:
> > On Mon, Jun 03 2002, Mike Black wrote:
> > > RAID5 still doesn't compile....sigh....
> >
> > [snip]
> >
> > Some people do nothing but complain instead of trying to fix things.
> > Sigh...
>
> I've got fixes.... but I want to suggest some changes to the plugging
> mechanism, and as it seems to have changed a bit since 2.5.20, I'll
> have to sync up my patch before I show it to you...

Excellent. I've sent the last plugging patch to Linus, which appears to
be ok/stable. If you could send changes relative to that, it would be
great. What changes did you have in mind?

--
Jens Axboe
* Re: 2.5.20 RAID5 compile error
From: Neil Brown @ 2002-06-04 12:15 UTC
To: Jens Axboe; +Cc: Mike Black, linux-kernel

On Tuesday June 4, axboe@suse.de wrote:
>
> What changes did you have in mind?

http://www.cse.unsw.edu.au/~neilb/patches/linux-devel/2.5.20/patch-A-NewPlug

is what I had against 2.5.20. A quick look at the mail that you sent
with improvements suggests that I can be even less intrusive.. But it
will have to wait until tomorrow (my time).

NeilBrown
* Re: 2.5.20 RAID5 compile error
From: Jens Axboe @ 2002-06-04 12:21 UTC
To: Neil Brown; +Cc: Mike Black, linux-kernel

On Tue, Jun 04 2002, Neil Brown wrote:
> On Tuesday June 4, axboe@suse.de wrote:
> >
> > What changes did you have in mind?
>
> http://www.cse.unsw.edu.au/~neilb/patches/linux-devel/2.5.20/patch-A-NewPlug
>
> Is what I had against 2.5.20. A quick look at the mail that you sent
> with improvements suggest that I can be even less intrusive.. But it
> will have to wait until tomorrow (my time).

Ah ok, I see what you have in mind. Right now you are completely
mimicking the tq_struct setup -- any reason a simple q->plug_fn is not
enough? Do you ever need anything else than the queue passed in with the
plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same
for raid5 and conf.

--
Jens Axboe
* Re: 2.5.20 RAID5 compile error
From: Jens Axboe @ 2002-06-04 12:32 UTC
To: Neil Brown; +Cc: Mike Black, linux-kernel

On Tue, Jun 04 2002, Jens Axboe wrote:
> plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same
> for raid5 and conf.

Ok, by actually looking at it, both card and conf are the queues
themselves. So I'd say your approach is indeed a bit overkill. I'll send
a patch in half an hour for you to review.

--
Jens Axboe
* Re: 2.5.20 RAID5 compile error 2002-06-04 12:32 ` Jens Axboe @ 2002-06-04 12:38 ` Jens Axboe 2002-06-04 14:23 ` Jens Axboe 2002-06-05 1:16 ` Neil Brown 0 siblings, 2 replies; 20+ messages in thread From: Jens Axboe @ 2002-06-04 12:38 UTC (permalink / raw) To: Neil Brown; +Cc: Mike Black, linux-kernel On Tue, Jun 04 2002, Jens Axboe wrote: > On Tue, Jun 04 2002, Jens Axboe wrote: > > plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same > > for raid5 and conf. > > Ok by actually looking at it, both card and conf are the queues > themselves. So I'd say your approach is indeed a bit overkill. I'll send > a patch in half an hour for you to review. Alright here's the block part of it, you'll need to merge your umem and raid5 changes in with that. I'm just interested in knowing if this is all you want/need. It's actually just a two line changes from the last version posted -- set q->unplug_fn in blk_init_queue to our default (you'll override that for umem and raid5, or rather you'll set it yourself after blk_queue_make_request()), and then call that instead of __generic_unplug_device directly from blk_run_queues(). --- /opt/kernel/linux-2.5.20/drivers/block/ll_rw_blk.c 2002-06-03 10:35:35.000000000 +0200 +++ linux/drivers/block/ll_rw_blk.c 2002-06-04 14:35:16.000000000 +0200 @@ -795,8 +795,8 @@ * force the transfer to start only after we have put all the requests * on the list. * - * This is called with interrupts off and no requests on the queue. - * (and with the request spinlock acquired) + * This is called with interrupts off and no requests on the queue and + * with the queue lock held. 
*/ void blk_plug_device(request_queue_t *q) { @@ -806,7 +806,7 @@ if (!elv_queue_empty(q)) return; - if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) { + if (!blk_queue_plugged(q)) { spin_lock(&blk_plug_lock); list_add_tail(&q->plug_list, &blk_plug_list); spin_unlock(&blk_plug_lock); @@ -814,14 +814,27 @@ } /* + * remove the queue from the plugged list, if present. called with + * queue lock held and interrupts disabled. + */ +static inline int blk_remove_plug(request_queue_t *q) +{ + if (blk_queue_plugged(q)) { + spin_lock(&blk_plug_lock); + list_del_init(&q->plug_list); + spin_unlock(&blk_plug_lock); + return 1; + } + + return 0; +} + +/* * remove the plug and let it rip.. */ static inline void __generic_unplug_device(request_queue_t *q) { - /* - * not plugged - */ - if (!__test_and_clear_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) + if (!blk_remove_plug(q)) return; if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)) @@ -849,11 +862,10 @@ void generic_unplug_device(void *data) { request_queue_t *q = data; - unsigned long flags; - spin_lock_irqsave(q->queue_lock, flags); + spin_lock_irq(q->queue_lock); __generic_unplug_device(q); - spin_unlock_irqrestore(q->queue_lock, flags); + spin_unlock_irq(q->queue_lock); } /** @@ -893,6 +905,12 @@ **/ void blk_stop_queue(request_queue_t *q) { + unsigned long flags; + + spin_lock_irqsave(q->queue_lock, flags); + blk_remove_plug(q); + spin_unlock_irqrestore(q->queue_lock, flags); + set_bit(QUEUE_FLAG_STOPPED, &q->queue_flags); } @@ -904,45 +922,33 @@ * are currently stopped are ignored. This is equivalent to the older * tq_disk task queue run. 
**/ +#define blk_plug_entry(entry) list_entry((entry), request_queue_t, plug_list) void blk_run_queues(void) { - struct list_head *n, *tmp, local_plug_list; - unsigned long flags; + struct list_head local_plug_list; INIT_LIST_HEAD(&local_plug_list); + spin_lock_irq(&blk_plug_lock); + /* * this will happen fairly often */ - spin_lock_irqsave(&blk_plug_lock, flags); if (list_empty(&blk_plug_list)) { - spin_unlock_irqrestore(&blk_plug_lock, flags); + spin_unlock_irq(&blk_plug_lock); return; } list_splice(&blk_plug_list, &local_plug_list); INIT_LIST_HEAD(&blk_plug_list); - spin_unlock_irqrestore(&blk_plug_lock, flags); + spin_unlock_irq(&blk_plug_lock); + + while (!list_empty(&local_plug_list)) { + request_queue_t *q = blk_plug_entry(local_plug_list.next); - /* - * local_plug_list is now a private copy we can traverse lockless - */ - list_for_each_safe(n, tmp, &local_plug_list) { - request_queue_t *q = list_entry(n, request_queue_t, plug_list); + BUG_ON(test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)); - if (!test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)) { - list_del(&q->plug_list); - generic_unplug_device(q); - } - } - - /* - * add any remaining queue back to plug list - */ - if (!list_empty(&local_plug_list)) { - spin_lock_irqsave(&blk_plug_lock, flags); - list_splice(&local_plug_list, &blk_plug_list); - spin_unlock_irqrestore(&blk_plug_lock, flags); + q->unplug_fn(q); } } @@ -1085,6 +1091,7 @@ q->front_merge_fn = ll_front_merge_fn; q->merge_requests_fn = ll_merge_requests_fn; q->prep_rq_fn = NULL; + q->unplug_fn = generic_unplug_device; q->queue_flags = (1 << QUEUE_FLAG_CLUSTER); q->queue_lock = lock; --- /opt/kernel/linux-2.5.20/include/linux/blkdev.h 2002-06-03 10:35:40.000000000 +0200 +++ linux/include/linux/blkdev.h 2002-06-04 14:33:04.000000000 +0200 @@ -116,7 +116,7 @@ typedef request_queue_t * (queue_proc) (kdev_t dev); typedef int (make_request_fn) (request_queue_t *q, struct bio *bio); typedef int (prep_rq_fn) (request_queue_t *, struct request *); -typedef 
void (unplug_device_fn) (void *q); +typedef void (unplug_fn) (void *q); enum blk_queue_state { Queue_down, @@ -160,6 +160,7 @@ merge_requests_fn *merge_requests_fn; make_request_fn *make_request_fn; prep_rq_fn *prep_rq_fn; + unplug_fn *unplug_fn; struct backing_dev_info backing_dev_info; @@ -209,13 +210,11 @@ #define RQ_SCSI_DONE 0xfffe #define RQ_SCSI_DISCONNECTING 0xffe0 -#define QUEUE_FLAG_PLUGGED 0 /* queue is plugged */ -#define QUEUE_FLAG_CLUSTER 1 /* cluster several segments into 1 */ -#define QUEUE_FLAG_QUEUED 2 /* uses generic tag queueing */ -#define QUEUE_FLAG_STOPPED 3 /* queue is stopped */ +#define QUEUE_FLAG_CLUSTER 0 /* cluster several segments into 1 */ +#define QUEUE_FLAG_QUEUED 1 /* uses generic tag queueing */ +#define QUEUE_FLAG_STOPPED 2 /* queue is stopped */ -#define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags) -#define blk_mark_plugged(q) set_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags) +#define blk_queue_plugged(q) !list_empty(&(q)->plug_list) #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) #define blk_queue_empty(q) elv_queue_empty(q) #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist) -- Jens Axboe ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.5.20 RAID5 compile error 2002-06-04 12:38 ` Jens Axboe @ 2002-06-04 14:23 ` Jens Axboe 2002-06-04 13:45 ` Martin Dalecki 2002-06-05 1:16 ` Neil Brown 1 sibling, 1 reply; 20+ messages in thread From: Jens Axboe @ 2002-06-04 14:23 UTC (permalink / raw) To: Neil Brown; +Cc: linux-kernel Neil, I tried converting umem to see how it fit together, this is what I came up with. This does use a queue per umem unit, but I think that's the right split anyways. Maybe at some point we can use the per-major statically allocated queues... diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c --- /opt/kernel/linux-2.5.20/drivers/block/ll_rw_blk.c 2002-06-03 10:35:35.000000000 +0200 +++ linux/drivers/block/ll_rw_blk.c 2002-06-04 16:14:40.000000000 +0200 @@ -49,6 +49,9 @@ */ static kmem_cache_t *request_cachep; +/* + * plug management + */ static struct list_head blk_plug_list; static spinlock_t blk_plug_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED; @@ -795,8 +798,8 @@ * force the transfer to start only after we have put all the requests * on the list. * - * This is called with interrupts off and no requests on the queue. - * (and with the request spinlock acquired) + * This is called with interrupts off and no requests on the queue and + * with the queue lock held. */ void blk_plug_device(request_queue_t *q) { @@ -806,7 +809,7 @@ if (!elv_queue_empty(q)) return; - if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) { + if (!blk_queue_plugged(q)) { spin_lock(&blk_plug_lock); list_add_tail(&q->plug_list, &blk_plug_list); spin_unlock(&blk_plug_lock); @@ -814,14 +817,27 @@ } /* + * remove the queue from the plugged list, if present. called with + * queue lock held and interrupts disabled. 
+ */ +inline int blk_remove_plug(request_queue_t *q) +{ + if (blk_queue_plugged(q)) { + spin_lock(&blk_plug_lock); + list_del_init(&q->plug_list); + spin_unlock(&blk_plug_lock); + return 1; + } + + return 0; +} + +/* * remove the plug and let it rip.. */ static inline void __generic_unplug_device(request_queue_t *q) { - /* - * not plugged - */ - if (!__test_and_clear_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) + if (!blk_remove_plug(q)) return; if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)) @@ -849,11 +865,10 @@ void generic_unplug_device(void *data) { request_queue_t *q = data; - unsigned long flags; - spin_lock_irqsave(q->queue_lock, flags); + spin_lock_irq(q->queue_lock); __generic_unplug_device(q); - spin_unlock_irqrestore(q->queue_lock, flags); + spin_unlock_irq(q->queue_lock); } /** @@ -893,6 +908,12 @@ **/ void blk_stop_queue(request_queue_t *q) { + unsigned long flags; + + spin_lock_irqsave(q->queue_lock, flags); + blk_remove_plug(q); + spin_unlock_irqrestore(q->queue_lock, flags); + set_bit(QUEUE_FLAG_STOPPED, &q->queue_flags); } @@ -904,45 +925,31 @@ * are currently stopped are ignored. This is equivalent to the older * tq_disk task queue run. 
**/ +#define blk_plug_entry(entry) list_entry((entry), request_queue_t, plug_list) void blk_run_queues(void) { - struct list_head *n, *tmp, local_plug_list; - unsigned long flags; + struct list_head local_plug_list; INIT_LIST_HEAD(&local_plug_list); + spin_lock_irq(&blk_plug_lock); + /* * this will happen fairly often */ - spin_lock_irqsave(&blk_plug_lock, flags); if (list_empty(&blk_plug_list)) { - spin_unlock_irqrestore(&blk_plug_lock, flags); + spin_unlock_irq(&blk_plug_lock); return; } list_splice(&blk_plug_list, &local_plug_list); INIT_LIST_HEAD(&blk_plug_list); - spin_unlock_irqrestore(&blk_plug_lock, flags); + spin_unlock_irq(&blk_plug_lock); + + while (!list_empty(&local_plug_list)) { + request_queue_t *q = blk_plug_entry(local_plug_list.next); - /* - * local_plug_list is now a private copy we can traverse lockless - */ - list_for_each_safe(n, tmp, &local_plug_list) { - request_queue_t *q = list_entry(n, request_queue_t, plug_list); - - if (!test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)) { - list_del(&q->plug_list); - generic_unplug_device(q); - } - } - - /* - * add any remaining queue back to plug list - */ - if (!list_empty(&local_plug_list)) { - spin_lock_irqsave(&blk_plug_lock, flags); - list_splice(&local_plug_list, &blk_plug_list); - spin_unlock_irqrestore(&blk_plug_lock, flags); + q->unplug_fn(q); } } @@ -1085,6 +1092,7 @@ q->front_merge_fn = ll_front_merge_fn; q->merge_requests_fn = ll_merge_requests_fn; q->prep_rq_fn = NULL; + q->unplug_fn = generic_unplug_device; q->queue_flags = (1 << QUEUE_FLAG_CLUSTER); q->queue_lock = lock; @@ -1352,7 +1360,7 @@ static int __make_request(request_queue_t *q, struct bio *bio) { struct request *req, *freereq = NULL; - int el_ret, rw, nr_sectors, cur_nr_sectors, barrier; + int el_ret, rw, nr_sectors, cur_nr_sectors; struct list_head *insert_here; sector_t sector; @@ -1368,16 +1376,12 @@ */ blk_queue_bounce(q, &bio); - spin_lock_prefetch(q->queue_lock); - - barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw); - 
spin_lock_irq(q->queue_lock); again: req = NULL; insert_here = q->queue_head.prev; - if (blk_queue_empty(q) || barrier) { + if (blk_queue_empty(q) || bio_barrier(bio)) { blk_plug_device(q); goto get_rq; } @@ -1477,7 +1481,7 @@ /* * REQ_BARRIER implies no merging, but lets make it explicit */ - if (barrier) + if (bio_barrier(bio)) req->flags |= (REQ_BARRIER | REQ_NOMERGE); req->errors = 0; @@ -1489,9 +1493,7 @@ req->buffer = bio_data(bio); /* see ->buffer comment above */ req->waiting = NULL; req->bio = req->biotail = bio; - if (bio->bi_bdev) - req->rq_dev = to_kdev_t(bio->bi_bdev->bd_dev); - else req->rq_dev = NODEV; + req->rq_dev = to_kdev_t(bio->bi_bdev->bd_dev); add_request(q, req, insert_here); out: if (freereq) @@ -2002,6 +2004,8 @@ EXPORT_SYMBOL(generic_make_request); EXPORT_SYMBOL(blkdev_release_request); EXPORT_SYMBOL(generic_unplug_device); +EXPORT_SYMBOL(blk_plug_device); +EXPORT_SYMBOL(blk_remove_plug); EXPORT_SYMBOL(blk_attempt_remerge); EXPORT_SYMBOL(blk_max_low_pfn); EXPORT_SYMBOL(blk_max_pfn); diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/drivers/block/umem.c linux/drivers/block/umem.c --- /opt/kernel/linux-2.5.20/drivers/block/umem.c 2002-05-29 20:42:54.000000000 +0200 +++ linux/drivers/block/umem.c 2002-06-04 16:13:48.000000000 +0200 @@ -128,6 +128,8 @@ */ struct bio *bio, *currentbio, **biotail; + request_queue_t queue; + struct mm_page { dma_addr_t page_dma; struct mm_dma_desc *desc; @@ -141,8 +143,6 @@ struct tasklet_struct tasklet; unsigned int dma_status; - struct tq_struct plug_tq; - struct { int good; int warned; @@ -383,7 +383,10 @@ static void mm_unplug_device(void *data) { - struct cardinfo *card = data; + request_queue_t *q = data; + struct cardinfo *card = q->queuedata; + + blk_remove_plug(q); spin_lock_bh(&card->lock); activate(card); @@ -577,7 +580,7 @@ card->biotail = &bio->bi_next; spin_unlock_bh(&card->lock); - queue_task(&card->plug_tq, &tq_disk); + blk_plug_device(q); return 0; } @@ -1064,11 +1067,12 @@ card->bio 
= NULL; card->biotail = &card->bio; + blk_queue_make_request(&card->queue, mm_make_request); + card->queue.queuedata = card; + card->queue.unplug_fn = mm_unplug_device; + tasklet_init(&card->tasklet, process_page, (unsigned long)card); - card->plug_tq.sync = 0; - card->plug_tq.routine = &mm_unplug_device; - card->plug_tq.data = card; card->check_batteries = 0; mem_present = readb(card->csr_remap + MEMCTRLSTATUS_MEMORY); @@ -1277,9 +1281,6 @@ add_gendisk(&mm_gendisk); - blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR), - mm_make_request); - blk_size[MAJOR_NR] = mm_gendisk.sizes; for (i = 0; i < num_cards; i++) { register_disk(&mm_gendisk, mk_kdev(MAJOR_NR, i<<MM_SHIFT), MM_SHIFT, diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/include/linux/bio.h linux/include/linux/bio.h --- /opt/kernel/linux-2.5.20/include/linux/bio.h 2002-05-29 20:42:57.000000000 +0200 +++ linux/include/linux/bio.h 2002-06-04 16:14:59.000000000 +0200 @@ -123,7 +123,7 @@ #define bio_offset(bio) bio_iovec((bio))->bv_offset #define bio_sectors(bio) ((bio)->bi_size >> 9) #define bio_data(bio) (page_address(bio_page((bio))) + bio_offset((bio))) -#define bio_barrier(bio) ((bio)->bi_rw & (1 << BIO_BARRIER)) +#define bio_barrier(bio) ((bio)->bi_rw & (1 << BIO_RW_BARRIER)) /* * will die diff -ur -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.20/include/linux/blkdev.h linux/include/linux/blkdev.h --- /opt/kernel/linux-2.5.20/include/linux/blkdev.h 2002-06-03 10:35:40.000000000 +0200 +++ linux/include/linux/blkdev.h 2002-06-04 16:15:22.000000000 +0200 @@ -116,7 +116,7 @@ typedef request_queue_t * (queue_proc) (kdev_t dev); typedef int (make_request_fn) (request_queue_t *q, struct bio *bio); typedef int (prep_rq_fn) (request_queue_t *, struct request *); -typedef void (unplug_device_fn) (void *q); +typedef void (unplug_fn) (void *q); enum blk_queue_state { Queue_down, @@ -160,6 +160,7 @@ merge_requests_fn *merge_requests_fn; make_request_fn *make_request_fn; prep_rq_fn *prep_rq_fn; + 
unplug_fn *unplug_fn; struct backing_dev_info backing_dev_info; @@ -209,13 +210,11 @@ #define RQ_SCSI_DONE 0xfffe #define RQ_SCSI_DISCONNECTING 0xffe0 -#define QUEUE_FLAG_PLUGGED 0 /* queue is plugged */ -#define QUEUE_FLAG_CLUSTER 1 /* cluster several segments into 1 */ -#define QUEUE_FLAG_QUEUED 2 /* uses generic tag queueing */ -#define QUEUE_FLAG_STOPPED 3 /* queue is stopped */ +#define QUEUE_FLAG_CLUSTER 0 /* cluster several segments into 1 */ +#define QUEUE_FLAG_QUEUED 1 /* uses generic tag queueing */ +#define QUEUE_FLAG_STOPPED 2 /* queue is stopped */ -#define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags) -#define blk_mark_plugged(q) set_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags) +#define blk_queue_plugged(q) !list_empty(&(q)->plug_list) #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) #define blk_queue_empty(q) elv_queue_empty(q) #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist) @@ -295,6 +294,7 @@ extern struct request *blk_get_request(request_queue_t *, int, int); extern void blk_put_request(struct request *); extern void blk_plug_device(request_queue_t *); +extern int blk_remove_plug(request_queue_t *); extern void blk_recount_segments(request_queue_t *, struct bio *); extern inline int blk_phys_contig_segment(request_queue_t *q, struct bio *, struct bio *); extern inline int blk_hw_contig_segment(request_queue_t *q, struct bio *, struct bio *); -- Jens Axboe ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.5.20 RAID5 compile error
From: Martin Dalecki @ 2002-06-04 13:45 UTC
To: Jens Axboe; +Cc: Neil Brown, linux-kernel

Jens Axboe wrote:
> Neil,
>
> I tried converting umem to see how it fit together, this is what I came
> up with. This does use a queue per umem unit, but I think that's the
> right split anyways. Maybe at some point we can use the per-major
> statically allocated queues...
>
> /*
> + * remove the queue from the plugged list, if present. called with
> + * queue lock held and interrupts disabled.
> + */
> +inline int blk_remove_plug(request_queue_t *q)

Jens - I have noticed some unlikely() tag "optimizations" in tcq code
too. Please tell me: why do you attribute this exported function as
inline? I "hardly doubt" that it will ever show up on any profile.
Contrary to popular schooling, on modern CPUs it generally only pays off
to unroll vector code, not decision code like the above.
* Re: 2.5.20 RAID5 compile error
From: Jens Axboe @ 2002-06-04 14:55 UTC
To: Martin Dalecki; +Cc: Neil Brown, linux-kernel

On Tue, Jun 04 2002, Martin Dalecki wrote:
> Jens Axboe wrote:
> >Neil,
> >
> >I tried converting umem to see how it fit together, this is what I came
> >up with. This does use a queue per umem unit, but I think that's the
> >right split anyways. Maybe at some point we can use the per-major
> >statically allocated queues...
>
> > /*
> >+ * remove the queue from the plugged list, if present. called with
> >+ * queue lock held and interrupts disabled.
> >+ */
> >+inline int blk_remove_plug(request_queue_t *q)
>
> Jens - I have noticed some unlikely() tag "optimizations" in
> tcq code too.
> Please tell my, why do you attribute this exported function as inline?
> I "hardly doubt" that it will ever show up on any profile.
> Contrary to popular school generally it only pays out to unroll vector code
> on modern CPUs not decision code like the above.

I doubt it matters much in this case. But it certainly isn't called
often enough to justify the inline, I'll uninline later.

WRT the unlikely(), if you have the hints available, why not pass them
on?

--
Jens Axboe
* Re: 2.5.20 RAID5 compile error
From: Martin Dalecki @ 2002-06-04 14:24 UTC
To: Jens Axboe; +Cc: Neil Brown, linux-kernel

Jens Axboe wrote:
> On Tue, Jun 04 2002, Martin Dalecki wrote:
>
>>Jens - I have noticed some unlikely() tag "optimizations" in
>>tcq code too.
>>Please tell my, why do you attribute this exported function as inline?
>>I "hardly doubt" that it will ever show up on any profile.
>>Contrary to popular school generally it only pays out to unroll vector code
>>on modern CPUs not decision code like the above.
>
> I doubt it matters much in this case. But it certainly isn't called
> often enough to justify the inline, I'll uninline later.
>
> WRT the unlikely(), if you have the hints available, why not pass them
> on?

Well, it's kind of like the answer to the question: why not do it all in
hand-optimized assembler? Or in other words - let's give the GCC guys
good reasons for more hard work. But more seriously:

Things like unlikely() tricks and other friends seldom really pay off if
applied randomly. They can:

1. Have quite contrary effects to what one would expect, due to the fact
   that one is targeting a single instruction set but in reality
   multiple very different CPU archs, or even multiple archs.

2. Be invalidated by changes/improvements to the compiler.

My personal rule of thumb is - don't do something like the above unless
you did:

1. Some real global profiling.
2. Some real CPU cycle counting on the micro level.
3. You really have to. (This should be number 1!)

Unless one of the above cases applies, there is only one rule for true
code optimization which appears to be generally valid: write tight code.
It will help small and old systems, which usually have quite narrow
memory constraints, and on bigger systems it will help keep the
instruction caches and internal instruction decode happy. Think for
example about hints for:

1. AMD's and P4's internal instruction predecode/CISC-RISC translation.
   Small functions will immediately give the information - "hey buddy,
   you saw this code already once...". But inlined stuff will still
   thrash the predecode engines along the single execution path.

2. On-the-fly instruction set translation (Transmeta). It's rather
   obvious that a small function is more palatable to software like
   that...

3. Even the Cyrix 486 already contained very efficient mechanisms to
   make call instructions nearly zero cost in terms of instruction
   cycles. (Think return address stack and hints for branch prediction.)

Of course, all of the above is valid for decision code, which is the
case here, but not necessarily for tight loops of vectorized
operations...
* Re: 2.5.20 RAID5 compile error
From: Horst von Brand @ 2002-06-04 18:22 UTC
To: Martin Dalecki; +Cc: Jens Axboe, Neil Brown, linux-kernel

Martin Dalecki <dalecki@evision-ventures.com> said:
> Well it's kind like the answer to the question: why don't do it all in hand
> optimized assembler? Or in other words - let's give the GCC guys good
> reasons for more hard work. But more seriously:
>
> Things like unlikely() tricks and other friends seldomly really
> pay off if applied randomly. But they can:
>
> 1. Have quite contrary effects to what one would expect due to
> the fact that one is targetting a single instruction set but in
> reality multiple very different CPU archs or even multiple archs.
>
> 2. Changes/improvements to the compiler.
>
> My personal rule of thumb is - don't do something like the
> above unless you did:
>
> 1. Some real global profiling.
> 2. Some real CPU cycle counting on the micro level.
> 3. You really have too. (This should be number 1!)

Anybody trying to tune code should read (and learn by heart):

@Book{bentley82:_writing_eff_progr,
  author    = {Jon Louis Bentley},
  title     = {Writing Efficient Programs},
  publisher = {Prentice Hall},
  year      = 1982
}

(sadly out of print AFAIK)

--
Dr. Horst H. von Brand                       User #22616 counter.li.org
Departamento de Informatica                  Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria           +56 32 654239
Casilla 110-V, Valparaiso, Chile             Fax:  +56 32 797513
* Re: 2.5.20 RAID5 compile error
From: Jens Axboe @ 2002-06-05 10:19 UTC
To: Martin Dalecki; +Cc: Neil Brown, linux-kernel

On Tue, Jun 04 2002, Martin Dalecki wrote:
> >WRT the unlikely(), if you have the hints available, why not pass them
> >on?
>
> Well it's kind like the answer to the question: why don't do it all in hand
> optimized assembler? Or in other words - let's give the GCC guys good

[snip]

If I didn't know better, Martin, I would say that you are trolling.
Surely you have better ways to spend your time than to fly fuck [1]
every single patch posted on lkml? :-)

I would agree with your entire email if you were talking about inlining.
Yes, we should not inline excessively without a good reason. However,
the likely/unlikely hints provide clues both to the code reader (as
someone else already established) about which code path is the likely
one, and of course to the compiler.

I would agree that profiling would in some cases be needed before
slapping on an unlikely() sticker. These are the cases where you do not
know what the call path is typically like. When I put unlikely() in
there, it's because I know there's a 1:50 ratio or more. Or maybe a
1:inf ratio, meaning it should only trigger because of a bug.

[1] To "fuck a fly" is a Danish expression, meaning excessive attention
to detail.

--
Jens Axboe
* Re: 2.5.20 RAID5 compile error 2002-06-04 13:45 ` Martin Dalecki 2002-06-04 14:55 ` Jens Axboe @ 2002-06-04 17:00 ` Ingo Oeser 1 sibling, 0 replies; 20+ messages in thread From: Ingo Oeser @ 2002-06-04 17:00 UTC (permalink / raw) To: Martin Dalecki; +Cc: Jens Axboe, linux-kernel Hi Martin, Hi Jens, Hi LKML, On Tue, Jun 04, 2002 at 03:45:11PM +0200, Martin Dalecki wrote: > Jens - I have noticed some unlikely() tag "optimizations" in > tcq code too. unlikely() shows the reader immediately that this is (considered) a "rare" condition. This might stop janitors from optimizing these cases at all, or let people with deeper knowledge check whether this is REALLY rare. likely() should lead to the analogous actions. So this is not only a hint for the compiler, and I'm very happy to see this being used and can stand the code clutter caused by this ;-) OTOH your point about inline is very valid. I use it only for code splitting or trivial functions with decisions on compile-time constants. Everything else just bloats the *.o files. Regards Ingo Oeser -- Science is what we can tell a computer. Art is everything else. --- D.E.Knuth ^ permalink raw reply [flat|nested] 20+ messages in thread
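For readers following the likely()/unlikely() debate above: the annotations are thin wrappers around GCC's __builtin_expect. The macro definitions below mirror the kernel's (include/linux/compiler.h); the submit() function is a made-up illustration, not kernel code:

```c
#include <assert.h>

/* likely()/unlikely() as thin wrappers over GCC's __builtin_expect.
 * These definitions mirror the kernel's; the code around them is a
 * hypothetical example. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* The hint lets the compiler keep the expected path laid out as the
 * fall-through, and, as Ingo notes, documents the rare path for the
 * reader. It never changes the result of the test itself. */
static int submit(int err)
{
	if (unlikely(err))	/* error path: expected to be rare */
		return -1;
	return 0;		/* hot path */
}
```

Semantically `unlikely(err)` is just `err`; the payoff is branch layout, which is why Jens argues the hint is only worth adding when the ratio is genuinely lopsided (1:50 or worse).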
* Re: 2.5.20 RAID5 compile error @ 2002-06-05 1:16 ` Neil Brown 2002-06-05 10:13 ` Jens Axboe 0 siblings, 1 reply; 20+ messages in thread From: Neil Brown @ 2002-06-05 1:16 UTC (permalink / raw) To: Jens Axboe; +Cc: Mike Black, linux-kernel On Tuesday June 4, axboe@suse.de wrote: > On Tue, Jun 04 2002, Jens Axboe wrote: > > On Tue, Jun 04 2002, Jens Axboe wrote: > > > plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same > > > for raid5 and conf. > > > > Ok by actually looking at it, both card and conf are the queues > > themselves. So I'd say your approach is indeed a bit overkill. I'll send > > a patch in half an hour for you to review. > > Alright here's the block part of it, you'll need to merge your umem and > raid5 changes in with that. I'm just interested in knowing if this is > all you want/need. It's actually just a two line changes from the last > version posted -- set q->unplug_fn in blk_init_queue to our default > (you'll override that for umem and raid5, or rather you'll set it > yourself after blk_queue_make_request()), and then call that instead of > __generic_unplug_device directly from blk_run_queues(). > I can work with that... Below is a patch that fixes a couple of problems with umem and adds support for md/raid5. I haven't tested it yet, but it looks right. It is on top of your patch. It irks me, though, that I need to embed a whole request_queue_t when I don't use the vast majority of it. What I would like is to have a

	struct blk_dev {
		make_request_fn		*make_request_fn;
		unplug_fn		*unplug_fn;
		struct list_head	plug_list;
		unsigned long		queue_flags;
		spinlock_t		*queue_lock;
	}

which I can embed in mddev_s and umem card, and you can embed in struct request_queue. Then struct block_device could have struct blk_dev bd_bdev; instead of struct request_queue bd_queue; and we could each find the containing structure from the "struct blk_dev" using a list_entry type macro.
There would be a bit of an issue with this in that it would then not make sense to have a full request_queue_t in struct block_device, and users of BLK_DEFAULT_QUEUE would need to allocate their own request_queue_t.... is that such a bad idea? Anyway, I can live with the current situation if you don't like the above.

NeilBrown

-------------------------------------------------------------
Extend plugging work for md/raid5 and fix umem

We embed a request_queue_t in the mddev structure and so have a separate one for each mddev. This is used for plugging (in raid5). Given this embedded request_queue_t, md_make_request no longer needs to map from device number to mddev, but can map from the queue to the mddev instead.

In umem, we move blk_plug_device and blk_remove_plug inside the lock, and define a ->queue function to return the right queue for each device.

In ll_rw_blk.c we change blk_plug_device to avoid the check for the queue being empty, as this may not be well defined for umem/raid5. Instead, blk_plug_device is not called when the queue is not empty (which it mostly wasn't anyway).

----------- Diffstat output ------------
 ./drivers/block/ll_rw_blk.c  | 10 +++-------
 ./drivers/block/umem.c       | 20 ++++++++++++++++----
 ./drivers/md/md.c            | 25 ++++++++++++++++++++++---
 ./drivers/md/raid5.c         | 21 +++++++--------------
 ./include/linux/raid/md_k.h  |  2 ++
 ./include/linux/raid/raid5.h |  5 +----
 6 files changed, 51 insertions(+), 32 deletions(-)

--- ./include/linux/raid/md_k.h	2002/06/05 00:15:12	1.1
+++ ./include/linux/raid/md_k.h	2002/06/05 00:53:06	1.2
@@ -214,6 +214,8 @@
 	atomic_t			recovery_active; /* blocks scheduled, but not written */
 	wait_queue_head_t		recovery_wait;
 
+	request_queue_t			queue;	/* for plugging ... */
+
 	struct list_head		all_mddevs;
 };
--- ./include/linux/raid/raid5.h	2002/06/05 00:15:55	1.1
+++ ./include/linux/raid/raid5.h	2002/06/05 00:53:06	1.2
@@ -176,7 +176,7 @@
  * is put on a "delayed" queue until there are no stripes currently
  * in a pre-read phase.  Further, if the "delayed" queue is empty when
  * a stripe is put on it then we "plug" the queue and do not process it
- * until an unplg call is made. (the tq_disk list is run).
+ * until an unplug call is made. (blk_run_queues is run).
  *
  * When preread is initiated on a stripe, we set PREREAD_ACTIVE and add
  * it to the count of prereading stripes.
@@ -228,9 +228,6 @@
 					 * waiting for 25% to be free
 					 */
 	spinlock_t		device_lock;
-
-	int			plugged;
-	struct tq_struct	plug_tq;
 };
 
 typedef struct raid5_private_data raid5_conf_t;
--- ./drivers/block/ll_rw_blk.c	2002/06/05 00:13:22	1.2
+++ ./drivers/block/ll_rw_blk.c	2002/06/05 00:53:06	1.3
@@ -803,12 +803,6 @@
  */
 void blk_plug_device(request_queue_t *q)
 {
-	/*
-	 * common case
-	 */
-	if (!elv_queue_empty(q))
-		return;
-
 	if (!blk_queue_plugged(q)) {
 		spin_lock(&blk_plug_lock);
 		list_add_tail(&q->plug_list, &blk_plug_list);
@@ -1381,10 +1375,12 @@
 	req = NULL;
 	insert_here = q->queue_head.prev;
 
-	if (blk_queue_empty(q) || bio_barrier(bio)) {
+	if (blk_queue_empty(q)) {
 		blk_plug_device(q);
 		goto get_rq;
 	}
+	if (bio_barrier(bio))
+		goto get_rq;
 
 	el_ret = elv_merge(q, &req, bio);
 	switch (el_ret) {
--- ./drivers/block/umem.c	2002/06/05 00:13:22	1.2
+++ ./drivers/block/umem.c	2002/06/05 00:53:06	1.3
@@ -292,7 +292,7 @@
  * Whenever IO on the active page completes, the Ready page is activated
  * and the ex-Active page is clean out and made Ready.
  * Otherwise the Ready page is only activated when it becomes full, or
- * when mm_unplug_device is called via run_task_queue(&tq_disk).
+ * when mm_unplug_device is called via blk_run_queues().
  *
  * If a request arrives while both pages a full, it is queued, and b_rdev is
 * overloaded to record whether it was a read or a write.
@@ -386,10 +386,10 @@
 	request_queue_t *q = data;
 	struct cardinfo *card = q->queuedata;
 
-	blk_remove_plug(q);
 	spin_lock_bh(&card->lock);
-	activate(card);
+	if (blk_remove_plug(q))
+		activate(card);
 	spin_unlock_bh(&card->lock);
 }
@@ -578,9 +578,9 @@
 	*card->biotail = bio;
 	bio->bi_next = NULL;
 	card->biotail = &bio->bi_next;
+	blk_plug_device(q);
 	spin_unlock_bh(&card->lock);
 
-	blk_plug_device(q);
 	return 0;
 }
@@ -1240,6 +1240,17 @@
 -- mm_init
 -----------------------------------------------------------------------------------
 */
+
+static request_queue_t * mm_queue_proc(kdev_t dev)
+{
+	int c = DEVICE_NR(dev);
+
+	if (c < MM_MAXCARDS)
+		return &cards[c].queue;
+	else
+		return BLK_DEFAULT_QUEUE(MAJOR_NR);
+}
+
 int __init mm_init(void)
 {
 	int retval, i;
@@ -1279,6 +1290,7 @@
 	mm_gendisk.part = mm_partitions;
 	mm_gendisk.nr_real = num_cards;
 
+	blk_dev[MAJOR_NR].queue = mm_queue_proc;
 	add_gendisk(&mm_gendisk);
 
 	blk_size[MAJOR_NR] = mm_gendisk.sizes;
--- ./drivers/md/raid5.c	2002/06/05 00:24:34	1.1
+++ ./drivers/md/raid5.c	2002/06/05 00:53:06	1.2
@@ -1225,14 +1225,14 @@
 }
 
 static void raid5_unplug_device(void *data)
 {
-	raid5_conf_t *conf = (raid5_conf_t *)data;
+	request_queue_t *q = data;
+	raid5_conf_t *conf = q->queuedata;
 	unsigned long flags;
 
 	spin_lock_irqsave(&conf->device_lock, flags);
 
-	raid5_activate_delayed(conf);
-
-	conf->plugged = 0;
+	if (blk_remove_plug(q))
+		raid5_activate_delayed(conf);
 
 	md_wakeup_thread(conf->thread);
 	spin_unlock_irqrestore(&conf->device_lock, flags);
@@ -1241,11 +1241,7 @@
 static inline void raid5_plug_device(raid5_conf_t *conf)
 {
 	spin_lock_irq(&conf->device_lock);
-	if (list_empty(&conf->delayed_list))
-		if (!conf->plugged) {
-			conf->plugged = 1;
-			queue_task(&conf->plug_tq, &tq_disk);
-		}
+	blk_plug_device(&conf->mddev->queue);
 	spin_unlock_irq(&conf->device_lock);
 }
@@ -1352,7 +1348,7 @@
 	if (list_empty(&conf->handle_list) &&
 	    atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD &&
-	    !conf->plugged &&
+	    !blk_queue_plugged(&mddev->queue) &&
 	    !list_empty(&conf->delayed_list))
 		raid5_activate_delayed(conf);
@@ -1443,10 +1439,7 @@
 	atomic_set(&conf->active_stripes, 0);
 	atomic_set(&conf->preread_active_stripes, 0);
 
-	conf->plugged = 0;
-	conf->plug_tq.sync = 0;
-	conf->plug_tq.routine = &raid5_unplug_device;
-	conf->plug_tq.data = conf;
+	mddev->queue.unplug_fn = raid5_unplug_device;
 
 	PRINTK("raid5: run(md%d) called.\n", mdidx(mddev));
--- ./drivers/md/md.c	2002/06/05 00:30:58	1.1
+++ ./drivers/md/md.c	2002/06/05 00:53:06	1.2
@@ -171,7 +171,7 @@
 static int md_make_request (request_queue_t *q, struct bio *bio)
 {
-	mddev_t *mddev = kdev_to_mddev(to_kdev_t(bio->bi_bdev->bd_dev));
+	mddev_t *mddev = q->queuedata;
 
 	if (mddev && mddev->pers)
 		return mddev->pers->make_request(mddev, bio_rw(bio), bio);
@@ -181,6 +181,12 @@
 	}
 }
 
+static int md_fail_request (request_queue_t *q, struct bio *bio)
+{
+	bio_io_error(bio);
+	return 0;
+}
+
 static mddev_t * alloc_mddev(kdev_t dev)
 {
 	mddev_t *mddev;
@@ -1710,6 +1716,9 @@
 	}
 	mddev->pers = pers[pnum];
 
+	blk_queue_make_request(&mddev->queue, md_make_request);
+	mddev->queue.queuedata = mddev;
+
 	err = mddev->pers->run(mddev);
 	if (err) {
 		printk(KERN_ERR "md: pers->run() failed ...\n");
@@ -3615,6 +3624,15 @@
 #endif
 }
 
+request_queue_t * md_queue_proc(kdev_t dev)
+{
+	mddev_t *mddev = kdev_to_mddev(dev);
+	if (mddev == NULL)
+		return BLK_DEFAULT_QUEUE(MAJOR_NR);
+	else
+		return &mddev->queue;
+}
+
 int __init md_init(void)
 {
 	static char * name = "mdrecoveryd";
@@ -3639,8 +3657,9 @@
 		S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL);
 	}
 
-	/* forward all md request to md_make_request */
-	blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR), md_make_request);
+	/* all requests on an uninitialised device get failed... */
+	blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR), md_fail_request);
+	blk_dev[MAJOR_NR].queue = md_queue_proc;
 
 	add_gendisk(&md_gendisk);
^ permalink raw reply [flat|nested] 20+ messages in thread
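The central move in Neil's patch, stashing the driver-private structure in the queue's ->queuedata so that make_request can recover it without a device-number lookup, can be sketched outside the kernel like this (the struct names are simplified stand-ins for request_queue_t and mddev_t, not the real 2.5 definitions):

```c
#include <assert.h>

/* Simplified stand-ins: only the fields the pattern needs. */
struct queue_sketch {
	void *queuedata;		/* driver-private back-pointer */
	int (*make_request_fn)(struct queue_sketch *q);
};

struct mddev_sketch {
	struct queue_sketch queue;	/* embedded: one queue per mddev */
	int requests_seen;
};

/* Analogue of the patched md_make_request(): the mddev comes straight
 * from q->queuedata instead of a kdev_to_mddev() device-number lookup. */
static int md_make_request_sketch(struct queue_sketch *q)
{
	struct mddev_sketch *mddev = q->queuedata;

	mddev->requests_seen++;
	return 0;
}

/* Analogue of the setup the patch adds when a personality is started:
 * wire up the request function and the back-pointer together. */
static void md_setup_sketch(struct mddev_sketch *mddev)
{
	mddev->queue.make_request_fn = md_make_request_sketch;
	mddev->queue.queuedata = mddev;
	mddev->requests_seen = 0;
}
```

The point of the pattern is that whoever hands out the queue pointer has already bound it to the right private structure, so the per-request path does no lookup at all.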
* Re: 2.5.20 RAID5 compile error 2002-06-05 1:16 ` Neil Brown @ 2002-06-05 10:13 ` Jens Axboe 2002-06-06 2:58 ` Neil Brown 0 siblings, 1 reply; 20+ messages in thread From: Jens Axboe @ 2002-06-05 10:13 UTC (permalink / raw) To: Neil Brown; +Cc: Mike Black, linux-kernel On Wed, Jun 05 2002, Neil Brown wrote: > On Tuesday June 4, axboe@suse.de wrote: > > On Tue, Jun 04 2002, Jens Axboe wrote: > > > On Tue, Jun 04 2002, Jens Axboe wrote: > > > > plug? Wrt umem, it seems you could keep 'card' in the queuedata. Same > > > > for raid5 and conf. > > > > > > Ok by actually looking at it, both card and conf are the queues > > > themselves. So I'd say your approach is indeed a bit overkill. I'll send > > > a patch in half an hour for you to review. > > > > Alright here's the block part of it, you'll need to merge your umem and > > raid5 changes in with that. I'm just interested in knowing if this is > > all you want/need. It's actually just a two line changes from the last > > version posted -- set q->unplug_fn in blk_init_queue to our default > > (you'll override that for umem and raid5, or rather you'll set it > > yourself after blk_queue_make_request()), and then call that instead of > > __generic_unplug_device directly from blk_run_queues(). > > > I can work with that... > > Below is a patch that fixes a couple of problems with umem and add > support for md/raid5. I haven't tested it yet, but it looks right. It > is on top of your patch. > > It irks me, though, that I need to embed a whole request_queue_t when > I don't use the vast majority of it. Yeah this annoys me too... > What I would like is to have a > > struct blk_dev { > make_request_fn *make_request_fn; > unplug_fn *unplug_fn; > struct list_head plug_list; > unsigned long queue_flags; > spinlock_t *queue_lock; > } > > which I can embed in mddev_s and umem card, and you can embed in > struct request_queue. To some extent, I agree with you. However, I'm not sure it's worth it abstracting this bit out. 
The size of the request queue right now in 2.5.20 (with plug change) is 196 bytes on i386 SMP. If you really feel it's worth saving these bytes, then I'd much rather go in a different direction: only keep the "main" elements of request_queue_t, and have blk_init_queue() alloc the "remainder". > In ll_rw_blk.c we change blk_plug_device to avoid the check for > the queue being empty as this may not be well defined for > umem/raid5. Instead, blk_plug_device is not called when > the queue is not empty (which it mostly wasn't anyway). I left it that way on purpose, and yes I did see you changed that in the previous patch as well. I think it's a bit of a mess to have both blk_plug_device and blk_plug_queue. Without looking at it, I have no idea which does what?! blk_queue_empty() will always be true for make_request_fn type drivers, so no change is necessary there. -- Jens Axboe ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.5.20 RAID5 compile error @ 2002-06-06 2:58 ` Neil Brown 2002-06-06 5:31 ` Jens Axboe 0 siblings, 1 reply; 20+ messages in thread From: Neil Brown @ 2002-06-06 2:58 UTC (permalink / raw) To: Jens Axboe; +Cc: Mike Black, linux-kernel On Wednesday June 5, axboe@suse.de wrote: > > To some extent, I agree with you. However, I'm not sure it's worth it > abstracting this bit out. The size of the request queue right now in > 2.5.20 (with plug change) is 196 bytes on i386 SMP. If you really feel > it's worth saving these bytes, then I'd much rather go in a different > direction: only keep the "main" elements of request_queue_t, and have > blk_init_queue() alloc the "remainder" It's not really the bytes that worry me. More the cleanliness of the abstraction. I actually really like the approach of embedding a more general data structure inside a more specific data structure and using a list_entry type macro to get from the one to the other... it has a very OO feel to it (which is, in this case, good I think) but I suspect it's a very personal-taste sort of thing. > > > In ll_rw_blk.c we change blk_plug_device to avoid the check for > > the queue being empty as this may not be well defined for > > umem/raid5. Instead, blk_plug_device is not called when > > the queue is not empty (which it mostly wasn't anyway). > > I left it that way on purpose, and yes I did see you changed that in the > previous patch as well. I think it's a bit of a mess to have both > blk_plug_device and blk_plug_queue. Without looking at it, I have no > idea which does what?! blk_queue_empty() will always be true for > make_request_fn type drivers, so no change is necessary there. I'm not pushing the blk_plug_device vs blk_plug_queue distinction. It is just that I think the current blk_plug_device is wrong... let me try to show you why..
First, look where it is called, in __make_request (which I always thought should be spelt "elevator_make_request") - note that this is the ONLY place that it is called other than the calls that have just been introduced in md and umem - :

	if (blk_queue_empty(q) || bio_barrier(bio)) {
		blk_plug_device(q);
		goto get_rq;
	}

Which says "if the queue is empty or this is a barrier bio, then plug the queue and go and make a request, skipping the merging".

One then wonders why you would want to plug the queue for a barrier bio. The queue-empty bit makes sense, but not the barrier bit...

Ok, let's see exactly what blk_plug_device(q) does by (mentally) inlining it:

	if (blk_queue_empty(q) || bio_barrier(bio)) {

		if (!elv_queue_empty(q))
			goto return;

		if (!blk_queue_plugged(q)) {
			spin_lock(&blk_plug_lock);
			list_add_tail(&q->plug_list, &blk_plug_list);
			spin_unlock(&blk_plug_lock);
		}
	return:
		goto get_rq;
	}

So we are actually plugging it if it isn't already plugged (makes sense) and elv_queue_empty, and blk_queue_empty ... or bio_barrier. I wonder what those two different *_queue_empty functions are ..... looks in blkdev.h.. Oh, they are the same.

So we can simplify this a bit:

	if (the_queue_is_empty(q)) {
		plug_if_not_plugged(q);
		goto get_rq;
	}
	if (bio_barrier(bio)) {
		/* we know the queue is not empty, so avoid that check and
		   simply don't plug */
		goto get_rq;
	}

Now that makes sense. The answer to the question "why would you want to plug the queue for a barrier bio?" is that you don't.

This is how I came to the change that I suggested. The current code is confusing, and testing elv_queue_empty in blk_plug_device is really a layering violation.

You are correct from an operational sense when you say that "no change is necessary there" (providing the queue_head is initialised, which it is by blk_init_queue via elevator_init, but isn't by blk_queue_make_request) but I don't think you are correct from an abstractional purity perspective.
NeilBrown ^ permalink raw reply [flat|nested] 20+ messages in thread
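Neil's idea of embedding a generic struct and recovering its container with a "list_entry type macro" is the container_of pattern that the kernel's list.h is built on. A stand-alone sketch of the mechanism, with hypothetical struct names (blk_dev_core and mddev_sketch are illustrations, not kernel types):

```c
#include <assert.h>
#include <stddef.h>

/* The small generic core Neil proposes embedding... */
struct blk_dev_core {
	unsigned long queue_flags;
};

/* ...and a driver-specific container for it. */
struct mddev_sketch {
	int raid_level;
	struct blk_dev_core core;	/* embedded generic part */
};

/* list_entry()/container_of(): subtract the member's offset to get from
 * a pointer to the embedded struct back to its containing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct mddev_sketch *mddev_from_core(struct blk_dev_core *core)
{
	return container_of(core, struct mddev_sketch, core);
}
```

This is what gives the approach its "OO feel": the generic layer only ever sees struct blk_dev_core pointers, while each driver recovers its own container type at zero runtime cost.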
* Re: 2.5.20 RAID5 compile error 2002-06-06 2:58 ` Neil Brown @ 2002-06-06 5:31 ` Jens Axboe 0 siblings, 0 replies; 20+ messages in thread From: Jens Axboe @ 2002-06-06 5:31 UTC (permalink / raw) To: Neil Brown; +Cc: Mike Black, linux-kernel On Thu, Jun 06 2002, Neil Brown wrote: > > > In ll_rw_blk.c we change blk_plug_device to avoid the check for > > > the queue being empty as this may not be well defined for > > > umem/raid5. Instead, blk_plug_device is not called when > > > the queue is not empty (which it mostly wasn't anyway). > > > > I left it that way on purpose, and yes I did see you changed that in the > > previous patch as well. I think it's a bit of a mess to have both > > blk_plug_device and blk_plug_queue. Without looking at it, I have no > > idea which does what?! blk_queue_empty() will always be true for > > make_request_fn type drivers, so no change is necessary there. > > I'm not pushing the blk_plug_device vs blk_plug_queue distinction. > > It is just that I think the current blk_plug_device is wrong... let > me try to show you why.. > > First, look where it is called, in __make_request (which I always > thought should be spelt "elevator_make_request") - note that this is > the ONLY place that it is called other than the calls that have just > been introduced in md and umem - :
>
> 	if (blk_queue_empty(q) || bio_barrier(bio)) {
> 		blk_plug_device(q);
> 		goto get_rq;
> 	}
>
> Which says "if the queue is empty or this is a barrier bio, then
> plug the queue and go and make a request, skipping the merging".
>
> One then wonders why you would want to plug the queue for a barrier
> bio. The queue-empty bit makes sense, but not the barrier bit...

No, for the barrier bit we need not do the plugging, we just need to skip the merge step and go directly to grabbing a new request. However, if the queue is indeed empty, then we do need to plug it. See?
> Ok, let's see exactly what blk_plug_device(q) does by (mentally)
> inlining it:
>
> 	if (blk_queue_empty(q) || bio_barrier(bio)) {
>
> 		if (!elv_queue_empty(q))
> 			goto return;
>
> 		if (!blk_queue_plugged(q)) {
> 			spin_lock(&blk_plug_lock);
> 			list_add_tail(&q->plug_list, &blk_plug_list);
> 			spin_unlock(&blk_plug_lock);
> 		}
> 	return:
> 		goto get_rq;
> 	}
>
> So we are actually plugging it if it isn't already plugged (makes
> sense) and elv_queue_empty, and blk_queue_empty ... or bio_barrier.
> I wonder what those two different *_queue_empty functions are .....
> looks in blkdev.h.. Oh, they are the same.

Oh agreed, I'll get rid of one of them.

> So we can simplify this a bit:
>
> 	if (the_queue_is_empty(q)) {
> 		plug_if_not_plugged(q);
> 		goto get_rq;
> 	}
> 	if (bio_barrier(bio)) {
> 		/* we know the queue is not empty, so avoid that check and
> 		   simply don't plug */
> 		goto get_rq;
> 	}
>
> Now that makes sense. The answer to the question "why would you want
> to plug the queue for a barrier bio?" is that you don't.

The answer is that you don't want to plug it any more than you do for a regular request, but see what I wrote above.

> This is how I came to the change that I suggested. The current code is
> confusing, and testing elv_queue_empty in blk_plug_device is really a
> layering violation.
>
> You are correct from an operational sense when you say that "no change
> is necessary there" (providing the queue_head is initialised, which it
> is by blk_init_queue via elevator_init, but isn't by
> blk_queue_make_request) but I don't think you are correct from an
> abstractional purity perspective.

Alright, you've convinced me about that part. Will you send me the updated patch? -- Jens Axboe ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.5.20 RAID5 compile error 2002-06-04 11:51 ` Jens Axboe 2002-06-04 11:56 ` Neil Brown @ 2002-06-04 21:48 ` Miles Lane 1 sibling, 0 replies; 20+ messages in thread From: Miles Lane @ 2002-06-04 21:48 UTC (permalink / raw) To: Jens Axboe; +Cc: Mike Black, linux-kernel On Tue, 2002-06-04 at 04:51, Jens Axboe wrote: > On Mon, Jun 03 2002, Mike Black wrote: > > RAID5 still doesn't compile....sigh.... > > [snip] > > Some people do nothing but complain instead of trying to fix things. > Sigh... Hi Jens. Maybe Mike Black is a kernel hacking guru and, therefore, deserves your impatience. I send in a lot of bug reports. I don't sigh about finding bugs, because I enjoy testing and want to contribute to the kernel evolution as much as possible. Testing is my core competency, so that's what I focus on. Anyhow, it's helpful when developers are patient with the limitations of us testers. Miles ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2002-06-06 5:31 UTC | newest]

Thread overview: 20+ messages:
2002-06-03 18:36 2.5.20 RAID5 compile error Mike Black
2002-06-04 11:51 ` Jens Axboe
2002-06-04 11:56 ` Neil Brown
2002-06-04 11:58 ` Jens Axboe
2002-06-04 12:15 ` Neil Brown
2002-06-04 12:21 ` Jens Axboe
2002-06-04 12:32 ` Jens Axboe
2002-06-04 12:38 ` Jens Axboe
2002-06-04 14:23 ` Jens Axboe
2002-06-04 13:45 ` Martin Dalecki
2002-06-04 14:55 ` Jens Axboe
2002-06-04 14:24 ` Martin Dalecki
2002-06-04 18:22 ` Horst von Brand
2002-06-05 10:19 ` Jens Axboe
2002-06-04 17:00 ` Ingo Oeser
2002-06-05 1:16 ` Neil Brown
2002-06-05 10:13 ` Jens Axboe
2002-06-06 2:58 ` Neil Brown
2002-06-06 5:31 ` Jens Axboe
2002-06-04 21:48 ` Miles Lane