* 2.4.7-pre9..
@ 2001-07-20 5:17 Linus Torvalds
2001-07-20 7:22 ` 2.4.7-pre9 Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Linus Torvalds @ 2001-07-20 5:17 UTC (permalink / raw)
To: Kernel Mailing List
Cc: Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox,
David Woodhouse, linux-scsi, Andrew Morton
I'm getting ready to do a 2.4.7, but one of the fixes in 2.4.7 is a nasty
SMP race that was found and made it clear that using an old trick of
having a semaphore on the stack and doing "down()" on it to wait for some
event (that would do the "up()") was a really bad idea.
This kind of trick was used in the kernel vfork() implementation, and also
in block device "wait for request completion". I've fixed both with a new
and fairly simple "wait for completion" infrastructure, but I'd like
especially SCSI device driver writers to check their own drivers as a
result before I make the final 2.4.7.
I've changed all generic code, so drivers are all expected to compile and
work. However, some SCSI drivers use the semaphore trick in their own
code, and I've not mucked with that. It's not worth worrying about too
much, as the race is basically impossible to hit (famous last words), but
I wanted a heads-up and people to give it a quick look. I also wanted to
have people who actually have the hardware in question to verify that my
untested (but on the face of it obvious) changes are indeed working.
So please give it a quick spin,
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: 2.4.7-pre9.. 2001-07-20 5:17 2.4.7-pre9 Linus Torvalds @ 2001-07-20 7:22 ` Jens Axboe 2001-07-20 8:15 ` 2.4.7-pre9 Linus Torvalds 2001-07-20 10:23 ` 2.4.7-pre9 David Woodhouse 2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek 2 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2001-07-20 7:22 UTC (permalink / raw) To: Linus Torvalds Cc: Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 492 bytes --] On Thu, Jul 19 2001, Linus Torvalds wrote: > > I'm getting ready to do a 2.4.7, but one of the fixes in 2.4.7 is a nasty > SMP race that was found and made it clear that using an old trick of > having a semaphore on the stack and doing "down()" on it to wait for some > event (that would do the "up()") was a really bad idea. Attached are two patches -- one that should fix DAC960 for this new completion scheme, and one that corrects the corrected comment in blkdev.h :-) -- Jens Axboe [-- Attachment #2: DAC960-completion-1 --] [-- Type: text/plain, Size: 6169 bytes --] --- linux/drivers/block/DAC960.c~ Fri Jul 20 09:18:43 2001 +++ linux/drivers/block/DAC960.c Fri Jul 20 09:18:44 2001 @@ -40,6 +40,7 @@ #include <linux/spinlock.h> #include <linux/timer.h> #include <linux/pci.h> +#include <linux/completion.h> #include <asm/io.h> #include <asm/segment.h> #include <asm/uaccess.h> @@ -484,14 +485,14 @@ static void DAC960_ExecuteCommand(DAC960_Command_T *Command) { DAC960_Controller_T *Controller = Command->Controller; - DECLARE_MUTEX_LOCKED(Semaphore); + DECLARE_COMPLETION(Wait); unsigned long ProcessorFlags; - Command->Semaphore = &Semaphore; + Command->Waiting = &Wait; DAC960_AcquireControllerLock(Controller, &ProcessorFlags); DAC960_QueueCommand(Command); DAC960_ReleaseControllerLock(Controller, &ProcessorFlags); if (in_interrupt()) return; - down(&Semaphore); + wait_for_completion(&Wait); } @@ -1316,7 +1317,7 @@ *Controller) { DAC960_V1_DCDB_T DCDBs[DAC960_V1_MaxChannels], *DCDB; - Semaphore_T Semaphores[DAC960_V1_MaxChannels], *Semaphore; + Completion_T Wait[DAC960_V1_MaxChannels], *wait; unsigned long ProcessorFlags; int Channel, TargetID; for (TargetID = 0; TargetID < Controller->Targets; TargetID++) @@ -1327,12 +1328,12 @@ DAC960_SCSI_Inquiry_T *InquiryStandardData = &Controller->V1.InquiryStandardData[Channel][TargetID]; InquiryStandardData->PeripheralDeviceType = 0x1F; - Semaphore = &Semaphores[Channel]; - init_MUTEX_LOCKED(Semaphore); + wait = &Wait[Channel]; + init_completion(wait); DCDB = &DCDBs[Channel]; DAC960_V1_ClearCommand(Command); Command->CommandType = DAC960_ImmediateCommand; - Command->Semaphore = Semaphore; + Command->Waiting = wait; Command->V1.CommandMailbox.Type3.CommandOpcode = DAC960_V1_DCDB; Command->V1.CommandMailbox.Type3.BusAddress = Virtual_to_Bus32(DCDB); DCDB->Channel = Channel; @@ -1363,11 +1364,11 @@ DAC960_SCSI_Inquiry_UnitSerialNumber_T *InquiryUnitSerialNumber = &Controller->V1.InquiryUnitSerialNumber[Channel][TargetID]; InquiryUnitSerialNumber->PeripheralDeviceType = 0x1F; - Semaphore = &Semaphores[Channel]; - down(Semaphore); + wait = &Wait[Channel]; + wait_for_completion(wait); if (Command->V1.CommandStatus != DAC960_V1_NormalCompletion) continue; - Command->Semaphore = Semaphore; + Command->Waiting = wait; DCDB = &DCDBs[Channel]; DCDB->TransferLength = sizeof(DAC960_SCSI_Inquiry_UnitSerialNumber_T); DCDB->BusAddress = Virtual_to_Bus32(InquiryUnitSerialNumber); @@ -1381,7 +1382,7 @@ DAC960_AcquireControllerLock(Controller, &ProcessorFlags); DAC960_QueueCommand(Command); DAC960_ReleaseControllerLock(Controller, &ProcessorFlags); - down(Semaphore); + wait_for_completion(wait); } } return true; @@ -2768,7 +2769,7 @@ if (Request->cmd == READ) Command->CommandType = DAC960_ReadCommand; else Command->CommandType = DAC960_WriteCommand; - Command->Semaphore = Request->sem; + Command->Waiting = Request->waiting; Command->LogicalDriveNumber = DAC960_LogicalDriveNumber(Request->rq_dev); Command->BlockNumber = Request->sector @@ -2924,10 +2925,10 @@ /* Wake up requestor for swap file paging requests. */ - if (Command->Semaphore != NULL) + if (Command->Waiting) { - up(Command->Semaphore); - Command->Semaphore = NULL; + complete(Command->Waiting); + Command->Waiting = NULL; } add_blkdev_randomness(DAC960_MAJOR + Controller->ControllerNumber); } @@ -2969,13 +2970,10 @@ DAC960_ProcessCompletedBuffer(BufferHeader, false); BufferHeader = NextBufferHeader; } - /* - Wake up requestor for swap file paging requests. - */ - if (Command->Semaphore != NULL) + if (Command->Waiting) { - up(Command->Semaphore); - Command->Semaphore = NULL; + complete(Command->Waiting); + Command->Waiting = NULL; } } } @@ -3589,8 +3587,8 @@ } if (CommandType == DAC960_ImmediateCommand) { - up(Command->Semaphore); - Command->Semaphore = NULL; + complete(Command->Waiting); + Command->Waiting = NULL; return; } if (CommandType == DAC960_QueuedCommand) @@ -3931,13 +3929,10 @@ DAC960_ProcessCompletedBuffer(BufferHeader, true); BufferHeader = NextBufferHeader; } - /* - Wake up requestor for swap file paging requests. - */ - if (Command->Semaphore != NULL) + if (Command->Waiting) { - up(Command->Semaphore); - Command->Semaphore = NULL; + complete(Command->Waiting); + Command->Waiting = NULL; } add_blkdev_randomness(DAC960_MAJOR + Controller->ControllerNumber); } @@ -3979,13 +3974,10 @@ DAC960_ProcessCompletedBuffer(BufferHeader, false); BufferHeader = NextBufferHeader; } - /* - Wake up requestor for swap file paging requests. - */ - if (Command->Semaphore != NULL) + if (Command->Waiting) { - up(Command->Semaphore); - Command->Semaphore = NULL; + complete(Command->Waiting); + Command->Waiting = NULL; } } } @@ -4539,8 +4531,8 @@ } if (CommandType == DAC960_ImmediateCommand) { - up(Command->Semaphore); - Command->Semaphore = NULL; + complete(Command->Waiting); + Command->Waiting = NULL; return; } if (CommandType == DAC960_QueuedCommand) --- linux/drivers/block/DAC960.h~ Fri Jul 20 09:14:39 2001 +++ linux/drivers/block/DAC960.h Fri Jul 20 09:14:48 2001 @@ -2153,7 +2153,7 @@ typedef struct pt_regs Registers_T; typedef struct request IO_Request_T; typedef request_queue_t RequestQueue_T; -typedef struct semaphore Semaphore_T; +typedef struct completion Completion_T; typedef struct super_block SuperBlock_T; typedef struct timer_list Timer_T; typedef wait_queue_head_t WaitQueue_T; @@ -2220,7 +2220,7 @@ DAC960_CommandType_T CommandType; struct DAC960_Controller *Controller; struct DAC960_Command *Next; - Semaphore_T *Semaphore; + Completion_T *Waiting; unsigned int LogicalDriveNumber; unsigned int BlockNumber; unsigned int BlockCount; [-- Attachment #3: blkdev-1 --] [-- Type: text/plain, Size: 434 bytes --] --- linux/include/linux/blkdev.h~ Fri Jul 20 09:20:02 2001 +++ linux/include/linux/blkdev.h Fri Jul 20 09:20:14 2001 @@ -14,9 +14,7 @@ /* * Ok, this is an expanded form so that we can use the same - * request for paging requests when that is implemented. In - * paging, 'bh' is NULL, and the completion is used to wait - * for the IO to be ready. + * request for paging requests. */ struct request { struct list_head queue; ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-20 7:22 ` 2.4.7-pre9 Jens Axboe @ 2001-07-20 8:15 ` Linus Torvalds 2001-07-20 8:26 ` 2.4.7-pre9 Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2001-07-20 8:15 UTC (permalink / raw) To: Jens Axboe Cc: Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton On Fri, 20 Jul 2001, Jens Axboe wrote: > > Attached are two patches -- one that should fix DAC960 for this new > completion scheme, and one that corrects the corrected comment in > blkdev.h :-) No, the correction of the correction is worse. The paging stuff doesn't use any of this. The paging stuff use the page cache lock bit, and always has. The only thing that uses request completion checking are special commands, like the initial SCSI spin-up etc (scsi_init_one()). Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-20 8:15 ` 2.4.7-pre9 Linus Torvalds @ 2001-07-20 8:26 ` Jens Axboe 2001-07-20 16:42 ` 2.4.7-pre9 Linus Torvalds 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2001-07-20 8:26 UTC (permalink / raw) To: Linus Torvalds Cc: Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton On Fri, Jul 20 2001, Linus Torvalds wrote: > > > On Fri, 20 Jul 2001, Jens Axboe wrote: > > > > Attached are two patches -- one that should fix DAC960 for this new > > completion scheme, and one that corrects the corrected comment in > > blkdev.h :-) > > No, the correction of the correction is worse. ? > The paging stuff doesn't use any of this. The paging stuff use the page > cache lock bit, and always has. Paging still hits a request, I assumed that's what the (really really) old comment meant to say. > The only thing that uses request completion checking are special commands, > like the initial SCSI spin-up etc (scsi_init_one()). Sure, and IDE ide_wait etc. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-20 8:26 ` 2.4.7-pre9 Jens Axboe @ 2001-07-20 16:42 ` Linus Torvalds 2001-07-20 18:57 ` 2.4.7-pre9 Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2001-07-20 16:42 UTC (permalink / raw) To: Jens Axboe Cc: Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton On Fri, 20 Jul 2001, Jens Axboe wrote: > > > The paging stuff doesn't use any of this. The paging stuff use the page > > cache lock bit, and always has. > > Paging still hits a request, I assumed that's what the (really really) > old comment meant to say. No. Tha paging stuff (and _all_ regular IO) uses a regular request, with a NULL waiter. That's the normal path. That normal path depends on the buffer heads associated with the requests and their "bh->b_end_io()" function marking other state up-to-date, and then waits on the page being locked. The req->sem (and now req->completion) thing is a very different thing: it is a "we are waiting for this particular request", and is used when it's not really IO and doesn't have a bh, but something that has side effects. Like an ioctl that causes a special command to the disk to change some parameters, or query the size of the disk or something. So the comment has just always been wrong, I think. It may be that the original swapping code was doing raw requests instead of real IO, so maybe the comment was actually correct back in 1992 or something. My memory isn't good enough.. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-20 16:42 ` 2.4.7-pre9 Linus Torvalds @ 2001-07-20 18:57 ` Jens Axboe 0 siblings, 0 replies; 14+ messages in thread From: Jens Axboe @ 2001-07-20 18:57 UTC (permalink / raw) To: Linus Torvalds Cc: Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton On Fri, Jul 20 2001, Linus Torvalds wrote: > > > On Fri, 20 Jul 2001, Jens Axboe wrote: > > > > > The paging stuff doesn't use any of this. The paging stuff use the page > > > cache lock bit, and always has. > > > > Paging still hits a request, I assumed that's what the (really really) > > old comment meant to say. > > No. Tha paging stuff (and _all_ regular IO) uses a regular request, with a > NULL waiter. That's the normal path. That normal path depends on the > buffer heads associated with the requests and their "bh->b_end_io()" > function marking other state up-to-date, and then waits on the page being > locked. This is perfectly clear. I'm saying 'paging uses a request just like any other I/O', and you seem to disagree and restate the same thing?! In fact the lower layers have no way of knowing what is what, paging or not. > The req->sem (and now req->completion) thing is a very different thing: it > is a "we are waiting for this particular request", and is used when it's > not really IO and doesn't have a bh, but something that has side effects. > Like an ioctl that causes a special command to the disk to change some > parameters, or query the size of the disk or something. Ditto! Are you reading what I write? > So the comment has just always been wrong, I think. It may be that the > original swapping code was doing raw requests instead of real IO, so maybe > the comment was actually correct back in 1992 or something. My memory > isn't good enough.. Good, so now you agree that the corrected comment (as per pre9) is crap, and the patch I sent that changes the wording to: "Ok, this is an expanded form so that we can use the same request for paging requests." is so much better than _you_ mixing ->waiting and ->sem into this paging or non-paging pool? But in fact the whole comment block should just be removed. It gives no useful or additional information. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-20 5:17 2.4.7-pre9 Linus Torvalds 2001-07-20 7:22 ` 2.4.7-pre9 Jens Axboe @ 2001-07-20 10:23 ` David Woodhouse 2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek 2 siblings, 0 replies; 14+ messages in thread From: David Woodhouse @ 2001-07-20 10:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List, jgarzik, alan, zaitcev, jerdfelt torvalds@transmeta.com said: > I'm getting ready to do a 2.4.7, but one of the fixes in 2.4.7 is a > nasty SMP race that was found and made it clear that using an old > trick of having a semaphore on the stack and doing "down()" on it to > wait for some event (that would do the "up()") was a really bad idea. We need s/up_and_exit/complete_and_exit/ then. Index: drivers/i2o/i2o_block.c =================================================================== RCS file: /inst/cvs/linux/drivers/i2o/i2o_block.c,v retrieving revision 1.3.2.25 diff -u -r1.3.2.25 i2o_block.c --- drivers/i2o/i2o_block.c 2001/07/20 08:52:40 1.3.2.25 +++ drivers/i2o/i2o_block.c 2001/07/20 09:18:16 @@ -61,6 +61,7 @@ #include <asm/uaccess.h> #include <asm/semaphore.h> +#include <linux/completion.h> #include <asm/io.h> #include <asm/atomic.h> #include <linux/smp_lock.h> @@ -187,7 +188,7 @@ * evt_msg contains the last event. */ static DECLARE_MUTEX_LOCKED(i2ob_evt_sem); -static DECLARE_MUTEX_LOCKED(i2ob_thread_dead); +static DECLARE_COMPLETION(i2ob_thread_dead); static spinlock_t i2ob_evt_lock = SPIN_LOCK_UNLOCKED; static u32 evt_msg[MSG_FRAME_SIZE>>2]; @@ -859,7 +860,7 @@ } }; - up_and_exit(&i2ob_thread_dead,0); + complete_and_exit(&i2ob_thread_dead,0); return 0; } @@ -2002,7 +2003,7 @@ printk("waiting..."); } /* Be sure it died */ - down(&i2ob_thread_dead); + wait_for_completion(&i2ob_thread_dead); printk("done.\n"); } Index: drivers/i2o/i2o_core.c =================================================================== RCS file: /inst/cvs/linux/drivers/i2o/i2o_core.c,v retrieving revision 1.3.2.25 diff -u -r1.3.2.25 i2o_core.c --- drivers/i2o/i2o_core.c 2001/05/14 10:36:06 1.3.2.25 +++ drivers/i2o/i2o_core.c 2001/07/20 09:17:54 @@ -43,6 +43,7 @@ #include <linux/interrupt.h> #include <linux/sched.h> #include <asm/semaphore.h> +#include <linux/completion.h> #include <asm/io.h> #include <linux/reboot.h> @@ -206,7 +207,7 @@ */ static DECLARE_MUTEX(evt_sem); -static DECLARE_MUTEX_LOCKED(evt_dead); +static DECLARE_COMPLETION(evt_dead); DECLARE_WAIT_QUEUE_HEAD(evt_wait); static struct notifier_block i2o_reboot_notifier = @@ -905,7 +906,7 @@ dprintk(KERN_INFO "I2O event thread dead\n"); printk("exiting..."); evt_running = 0; - up_and_exit(&evt_dead, 0); + complete_and_exit(&evt_dead, 0); } /* @@ -3427,7 +3428,7 @@ stat = kill_proc(evt_pid, SIGTERM, 1); if(!stat) { printk("waiting..."); - down(&evt_dead); + wait_for_completion(&evt_dead); } printk("done.\n"); } Index: drivers/net/8139too.c =================================================================== RCS file: /inst/cvs/linux/drivers/net/8139too.c,v retrieving revision 1.1.2.28 diff -u -r1.1.2.28 8139too.c --- drivers/net/8139too.c 2001/07/19 07:54:26 1.1.2.28 +++ drivers/net/8139too.c 2001/07/20 09:20:09 @@ -152,10 +152,10 @@ #include <linux/delay.h> #include <linux/ethtool.h> #include <linux/mii.h> +#include <linux/completion.h> #include <asm/io.h> #include <asm/uaccess.h> - #define RTL8139_DRIVER_NAME DRV_NAME " Fast Ethernet driver " DRV_VERSION #define PFX DRV_NAME ": " @@ -585,7 +585,7 @@ chip_t chipset; pid_t thr_pid; wait_queue_head_t thr_wait; - struct semaphore thr_exited; + struct completion thr_exited; u32 rx_config; struct rtl_extra_stats xstats; }; @@ -970,7 +970,7 @@ tp->mmio_addr = ioaddr; spin_lock_init (&tp->lock); init_waitqueue_head (&tp->thr_wait); - init_MUTEX_LOCKED (&tp->thr_exited); + init_completion (&tp->thr_exited); /* dev is fully set up and ready to use now */ DPRINTK("about to register device named %s (%p)...\n", dev->name, dev); @@ -1632,7 +1632,7 @@ rtnl_unlock (); } - up_and_exit (&tp->thr_exited, 0); + complete_and_exit (&tp->thr_exited, 0); } @@ -2138,7 +2138,7 @@ printk (KERN_ERR "%s: unable to signal thread\n", dev->name); return ret; } - down (&tp->thr_exited); + wait_for_completion (&tp->thr_exited); } DPRINTK ("%s: Shutting down ethercard, status was 0x%4.4x.\n", Index: drivers/usb/hub.c =================================================================== RCS file: /inst/cvs/linux/drivers/usb/hub.c,v retrieving revision 1.2.2.41 diff -u -r1.2.2.41 hub.c --- drivers/usb/hub.c 2001/05/14 09:53:16 1.2.2.41 +++ drivers/usb/hub.c 2001/07/20 09:20:29 @@ -36,7 +36,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khubd_wait); static int khubd_pid = 0; /* PID of khubd */ -static DECLARE_MUTEX_LOCKED(khubd_exited); +static DECLARE_COMPLETION(khubd_exited); static int usb_get_hub_descriptor(struct usb_device *dev, void *data, int size) { @@ -781,7 +781,7 @@ dbg("usb_hub_thread exiting"); - up_and_exit(&khubd_exited, 0); + complete_and_exit(&khubd_exited, 0); } static struct usb_device_id hub_id_table [] = { @@ -834,7 +834,7 @@ /* Kill the thread */ ret = kill_proc(khubd_pid, SIGTERM, 1); - down(&khubd_exited); + wait_for_completion(&khubd_exited); /* * Hub resources are freed for us by usb_deregister. It calls Index: fs/jffs/inode-v23.c =================================================================== RCS file: /inst/cvs/linux/fs/jffs/Attic/inode-v23.c,v retrieving revision 1.1.2.12 diff -u -r1.1.2.12 inode-v23.c --- fs/jffs/inode-v23.c 2001/06/28 11:16:55 1.1.2.12 +++ fs/jffs/inode-v23.c 2001/07/20 09:13:04 @@ -157,7 +157,7 @@ D1(printk (KERN_NOTICE "jffs_put_super(): Telling gc thread to die.\n")); send_sig(SIGKILL, c->gc_task, 1); } - down (&c->gc_thread_sem); + wait_for_completion(&c->gc_thread_comp); D1(printk (KERN_NOTICE "jffs_put_super(): Successfully waited on thread.\n")); Index: fs/jffs/intrep.c =================================================================== RCS file: /inst/cvs/linux/fs/jffs/Attic/intrep.c,v retrieving revision 1.1.2.5 diff -u -r1.1.2.5 intrep.c --- fs/jffs/intrep.c 2001/02/24 19:01:45 1.1.2.5 +++ fs/jffs/intrep.c 2001/07/20 09:11:15 @@ -3008,7 +3008,7 @@ current->session = 1; current->pgrp = 1; - init_MUTEX_LOCKED(&c->gc_thread_sem); /* barrier */ + init_completion(&c->gc_thread_comp); /* barrier */ spin_lock_irq(¤t->sigmask_lock); siginitsetinv (¤t->blocked, sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP) | sigmask(SIGCONT)); recalc_sigpending(current); @@ -3055,7 +3055,7 @@ D1(printk("jffs_garbage_collect_thread(): SIGKILL received.\n")); c->gc_task = NULL; unlock_kernel(); - up_and_exit(&c->gc_thread_sem, 0); + complete_and_exit(&c->gc_thread_comp, 0); } } Index: include/linux/jffs.h =================================================================== RCS file: /inst/cvs/linux/include/linux/jffs.h,v retrieving revision 1.1.1.1.2.2 diff -u -r1.1.1.1.2.2 jffs.h --- include/linux/jffs.h 2000/12/05 10:51:21 1.1.1.1.2.2 +++ include/linux/jffs.h 2001/07/20 09:58:49 @@ -22,6 +22,8 @@ #define JFFS_VERSION_STRING "1.0" +#include <linux/completion.h> + /* This is a magic number that is used as an identification number for this file system. It is written to the super_block structure. */ #define JFFS_MAGIC_SB_BITMASK 0x07c0 /* 1984 */ @@ -185,7 +187,7 @@ struct jffs_delete_list *delete_list; /* Track deleted files. */ pid_t thread_pid; /* GC thread's PID */ struct task_struct *gc_task; /* GC task struct */ - struct semaphore gc_thread_sem; /* GC thread exit mutex */ + struct completion gc_thread_comp; /* GC thread exit mutex */ __u32 gc_minfree_threshold; /* GC trigger thresholds */ __u32 gc_maxdirty_threshold; }; Index: include/linux/kernel.h =================================================================== RCS file: /inst/cvs/linux/include/linux/kernel.h,v retrieving revision 1.1.1.1.2.18 diff -u -r1.1.1.1.2.18 kernel.h --- include/linux/kernel.h 2001/06/13 06:41:40 1.1.1.1.2.18 +++ include/linux/kernel.h 2001/07/20 09:17:05 @@ -45,14 +45,14 @@ #define FASTCALL(x) x #endif -struct semaphore; +struct completion; extern struct notifier_block *panic_notifier_list; NORET_TYPE void panic(const char * fmt, ...) __attribute__ ((NORET_AND format (printf, 1, 2))); NORET_TYPE void do_exit(long error_code) ATTRIB_NORET; -NORET_TYPE void up_and_exit(struct semaphore *, long) +NORET_TYPE void complete_and_exit(struct completion *, long) ATTRIB_NORET; extern int abs(int); extern unsigned long simple_strtoul(const char *,char **,unsigned int); Index: kernel/exit.c =================================================================== RCS file: /inst/cvs/linux/kernel/exit.c,v retrieving revision 1.3.2.32 diff -u -r1.3.2.32 exit.c --- kernel/exit.c 2001/06/02 14:40:18 1.3.2.32 +++ kernel/exit.c 2001/07/20 09:09:16 @@ -473,10 +473,10 @@ goto fake_volatile; } -NORET_TYPE void up_and_exit(struct semaphore *sem, long code) +NORET_TYPE void complete_and_exit(struct completion *comp, long code) { - if (sem) - up(sem); + if (comp) + complete(comp); do_exit(code); } Index: kernel/ksyms.c =================================================================== RCS file: /inst/cvs/linux/kernel/ksyms.c,v retrieving revision 1.3.2.78 diff -u -r1.3.2.78 ksyms.c --- kernel/ksyms.c 2001/07/19 07:54:30 1.3.2.78 +++ kernel/ksyms.c 2001/07/20 09:20:39 @@ -422,7 +422,7 @@ EXPORT_SYMBOL(iomem_resource); /* process management */ -EXPORT_SYMBOL(up_and_exit); +EXPORT_SYMBOL(complete_and_exit); EXPORT_SYMBOL(__wake_up); EXPORT_SYMBOL(wake_up_process); EXPORT_SYMBOL(sleep_on); -- dwmw2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-20 5:17 2.4.7-pre9 Linus Torvalds 2001-07-20 7:22 ` 2.4.7-pre9 Jens Axboe 2001-07-20 10:23 ` 2.4.7-pre9 David Woodhouse @ 2001-07-23 12:56 ` Pavel Machek 2001-07-27 16:18 ` 2.4.7-pre9 Matthew Dharm 2 siblings, 1 reply; 14+ messages in thread From: Pavel Machek @ 2001-07-23 12:56 UTC (permalink / raw) To: Linus Torvalds Cc: Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton Hi! > I've changed all generic code, so drivers are all expected to compile and > work. However, some SCSI drivers use the semaphore trick in their own > code, and I've not mucked with that. It's not worth worrying about too > much, as the race is basically impossible to hit (famous last words), but I smell problems in usb/storage/*... <evil> What about adding strategic udelay() somewhere so race is easier to hit? </evil> Pavel -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek @ 2001-07-27 16:18 ` Matthew Dharm 2001-07-27 17:46 ` 2.4.7-pre9 Linus Torvalds 0 siblings, 1 reply; 14+ messages in thread From: Matthew Dharm @ 2001-07-27 16:18 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1299 bytes --] Oh damn.... It looks like I missed an important discussion in the torrent of e-mail that I receive... could someone give me the 30-second executive summary so I can look at what may need to change in usb-storage? Matt On Mon, Jul 23, 2001 at 12:56:35PM +0000, Pavel Machek wrote: > Hi! > > > I've changed all generic code, so drivers are all expected to compile and > > work. However, some SCSI drivers use the semaphore trick in their own > > code, and I've not mucked with that. It's not worth worrying about too > > much, as the race is basically impossible to hit (famous last words), but > > I smell problems in usb/storage/*... > > <evil> > What about adding strategic udelay() somewhere so race is easier to hit? > </evil> > Pavel > -- > Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, > details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org -- Matthew Dharm Home: mdharm-usb@one-eyed-alien.net Maintainer, Linux USB Mass Storage Driver You should try to see the techs say "three piece suit". -- The Chief User Friendly, 11/23/1997 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-27 16:18 ` 2.4.7-pre9 Matthew Dharm @ 2001-07-27 17:46 ` Linus Torvalds 2001-07-27 19:47 ` 2.4.7-pre9 Matthew Dharm 0 siblings, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2001-07-27 17:46 UTC (permalink / raw) To: Matthew Dharm Cc: Pavel Machek, Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton On Fri, 27 Jul 2001, Matthew Dharm wrote: > > It looks like I missed an important discussion in the torrent of e-mail > that I receive... could someone give me the 30-second executive summary so > I can look at what may need to change in usb-storage? The basic summary is that we had this (fairly common) way of waiting for certain events by having a locked semaphore on the stack of the waiter, and then having the waiter do a "down()" which caused it to block until the thing it was waiting for did an "up()". This works fairly well, _but_ it has a really small (and quite unlikely) race on SMP, that is not so much a race of the idea itself, as of the implementation of the semaphores. We could have fixed the semaphores, but there were a few reasons not to: - the semaphores are optimized (on purpose) for the non-contention case. The "wait for completion" usage has the opposite default case - the semaphores are quite involved and architecture-specific, exactly due to this optimization. Trying to change them is painful as hell. So instead, I introduced the notion of "wait for completion": struct completion event; init_completion(&event); .. pass of event pointer to waker .. wait_for_completion(&event); where the thing we're waiting for just does "complete(event)" and we're all done. This has the advantage of being a bit more obvious just from a syntactic angle about what is going on. It also ends up being slightly more efficient than semaphores because we can handle the right expected case, and it also avoids the implementation issue that made for the race in the first place. Switching over to the new format is really trivial: struct semaphore -> struct completion init_MUTEX_LOCKED -> init_completion DECLARE_MUTEX_LOCKED -> DECLARE_COMPLETION down() -> wait_for_completion() up() -> complete() and you can in fact maintain 2.2.x compatibility by just having a 2.2.x compatibility file that does the reverse mappings. In case anybody cares, the race was that Linux semaphores only protect the accesses _inside_ the semaphore, while the accesses by the semaphores themselves can "race" in the internal implementation. That helps make an efficient implementation, but it means that the race was: cpu #1 cpu #2 DECLARE_MUTEX_LOCKED(sem); .. down(&sem); up(&sem); return; wake_up(&sem.wait) /*BOOM*/ where the waker still touches the semaphore data structure after the sleeper has become happy with it no longer being locked - and free'd the data structure by virtue of freeing the stack. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-27 17:46 ` 2.4.7-pre9 Linus Torvalds @ 2001-07-27 19:47 ` Matthew Dharm 2001-07-27 21:00 ` 2.4.7-pre9 Linus Torvalds 2001-07-28 3:53 ` 2.4.7-pre9 David Woodhouse 0 siblings, 2 replies; 14+ messages in thread From: Matthew Dharm @ 2001-07-27 19:47 UTC (permalink / raw) To: Linus Torvalds Cc: Pavel Machek, Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 3425 bytes --] Hrm... just to be clear, then... this only is a problem with semaphores that are declared on the local stack? IIRC, usb-storage only uses semaphores that are allocated via kfree, so I think we're okay. Tho, I think the new semantics are probably better, and will probably switch to them. Later. Matt On Fri, Jul 27, 2001 at 10:46:44AM -0700, Linus Torvalds wrote: > > On Fri, 27 Jul 2001, Matthew Dharm wrote: > > > > It looks like I missed an important discussion in the torrent of e-mail > > that I receive... could someone give me the 30-second executive summary so > > I can look at what may need to change in usb-storage? > > The basic summary is that we had this (fairly common) way of waiting for > certain events by having a locked semaphore on the stack of the waiter, > and then having the waiter do a "down()" which caused it to block until > the thing it was waiting for did an "up()". > > This works fairly well, _but_ it has a really small (and quite unlikely) > race on SMP, that is not so much a race of the idea itself, as of the > implementation of the semaphores. We could have fixed the semaphores, but > there were a few reasons not to: > > - the semaphores are optimized (on purpose) for the non-contention case. > The "wait for completion" usage has the opposite default case > - the semaphores are quite involved and architecture-specific, exactly > due to this optimization. Trying to change them is painful as hell. > > So instead, I introduced the notion of "wait for completion": > > struct completion event; > > init_completion(&event); > .. pass of event pointer to waker .. > wait_for_completion(&event); > > where the thing we're waiting for just does "complete(event)" and we're > all done. > > This has the advantage of being a bit more obvious just from a syntactic > angle about what is going on. It also ends up being slightly more > efficient than semaphores because we can handle the right expected case, > and it also avoids the implementation issue that made for the race in the > first place. > > Switching over to the new format is really trivial: > > struct semaphore -> struct completion > init_MUTEX_LOCKED -> init_completion > DECLARE_MUTEX_LOCKED -> DECLARE_COMPLETION > down() -> wait_for_completion() > up() -> complete() > > and you can in fact maintain 2.2.x compatibility by just having a 2.2.x > compatibility file that does the reverse mappings. > > In case anybody cares, the race was that Linux semaphores only protect the > accesses _inside_ the semaphore, while the accesses by the semaphores > themselves can "race" in the internal implementation. That helps make an > efficient implementation, but it means that the race was: > > cpu #1 cpu #2 > > DECLARE_MUTEX_LOCKED(sem); > .. > down(&sem); up(&sem); > return; > wake_up(&sem.wait) /*BOOM*/ > > where the waker still touches the semaphore data structure after the > sleeper has become happy with it no longer being locked - and free'd the > data structure by virtue of freeing the stack. > > Linus -- Matthew Dharm Home: mdharm-usb@one-eyed-alien.net Maintainer, Linux USB Mass Storage Driver Hey Chief. We've figured out how to save the technical department. We need to be committed. -- The Techs User Friendly, 1/22/1998 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-27 19:47 ` 2.4.7-pre9 Matthew Dharm @ 2001-07-27 21:00 ` Linus Torvalds 2001-07-28 3:53 ` 2.4.7-pre9 David Woodhouse 1 sibling, 0 replies; 14+ messages in thread From: Linus Torvalds @ 2001-07-27 21:00 UTC (permalink / raw) To: Matthew Dharm Cc: Pavel Machek, Kernel Mailing List, Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi, Andrew Morton On Fri, 27 Jul 2001, Matthew Dharm wrote: > > Hrm... just to be clear, then... this only is a problem with semaphores > that are declared on the local stack? Yes. In theory you can have the same problem on any allocation that is free'd after the semaphore has been used, but most (hopefully all) other forms of allocations should be using proper reference counting etc, so that it is impossible for a semaphore user to have the semaphore disappear from under it. > IIRC, usb-storage only uses semaphores that are allocated via kfree, so I > think we're okay. Tho, I think the new semantics are probably better, and > will probably switch to them. Later. If you're using kmalloc/kfree, just make sure that the kfree never happens early (ie the "down()" does not protect the kfree, but some other reference count or lock does). If you have CPU #1 CPU #2 down(mem->semaphore); up(mem->semaphore); kfree(mem); you can still have the same bug. But if you have CPU #1 CPU #2 down(mem->semaphore) up(mem->semaphore); put(mem); put(mem); where "put(mem)" does something like if (atomic_dec_and_test(&mem->refcount)) kfree(mem); then you're obvoiusly ok, because now the free'ing of the data structure cannot happen until all users have handed off on it. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-27 19:47 ` 2.4.7-pre9 Matthew Dharm 2001-07-27 21:00 ` 2.4.7-pre9 Linus Torvalds @ 2001-07-28 3:53 ` David Woodhouse 2001-07-27 19:55 ` 2.4.7-pre9 Matthew Dharm 1 sibling, 1 reply; 14+ messages in thread From: David Woodhouse @ 2001-07-28 3:53 UTC (permalink / raw) To: Matthew Dharm; +Cc: Kernel Mailing List, linux-scsi (Cc list trimmed) On Fri, 27 Jul 2001, Matthew Dharm wrote: > IIRC, usb-storage only uses semaphores that are allocated via kfree, so I > think we're okay. Tho, I think the new semantics are probably better, and > will probably switch to them. Later. If the exit (or indeed any) path does down(); kfree(); you suffer the same problem. -- dwmw2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.4.7-pre9.. 2001-07-28 3:53 ` 2.4.7-pre9 David Woodhouse @ 2001-07-27 19:55 ` Matthew Dharm 0 siblings, 0 replies; 14+ messages in thread From: Matthew Dharm @ 2001-07-27 19:55 UTC (permalink / raw) To: David Woodhouse; +Cc: Kernel Mailing List, linux-scsi [-- Attachment #1: Type: text/plain, Size: 809 bytes --] Ah... okay.. now I see the issue. Now let's see if I can fix and test this over the weekend.... Matt On Sat, Jul 28, 2001 at 04:53:30AM +0100, David Woodhouse wrote: > > (Cc list trimmed) > > On Fri, 27 Jul 2001, Matthew Dharm wrote: > > > IIRC, usb-storage only uses semaphores that are allocated via kfree, so I > > think we're okay. Tho, I think the new semantics are probably better, and > > will probably switch to them. Later. > > If the exit (or indeed any) path does down(); kfree(); you suffer the same > problem. > > -- > dwmw2 -- Matthew Dharm Home: mdharm-usb@one-eyed-alien.net Maintainer, Linux USB Mass Storage Driver How would you like this tie wrapped around your hairy round head? -- Greg User Friendly, 9/2/1998 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2001-07-28 0:27 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-07-20 5:17 2.4.7-pre9 Linus Torvalds 2001-07-20 7:22 ` 2.4.7-pre9 Jens Axboe 2001-07-20 8:15 ` 2.4.7-pre9 Linus Torvalds 2001-07-20 8:26 ` 2.4.7-pre9 Jens Axboe 2001-07-20 16:42 ` 2.4.7-pre9 Linus Torvalds 2001-07-20 18:57 ` 2.4.7-pre9 Jens Axboe 2001-07-20 10:23 ` 2.4.7-pre9 David Woodhouse 2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek 2001-07-27 16:18 ` 2.4.7-pre9 Matthew Dharm 2001-07-27 17:46 ` 2.4.7-pre9 Linus Torvalds 2001-07-27 19:47 ` 2.4.7-pre9 Matthew Dharm 2001-07-27 21:00 ` 2.4.7-pre9 Linus Torvalds 2001-07-28 3:53 ` 2.4.7-pre9 David Woodhouse 2001-07-27 19:55 ` 2.4.7-pre9 Matthew Dharm
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox