* [RFC] utterly bogus userland API in infinibad @ 2005-09-16 18:11 Al Viro 2005-09-16 19:31 ` Roland Dreier 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2005-09-16 18:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, rolandd Exhibit A: opening uverbs... is done by ib_uverbs_open() (in drivers/infinib*d/core/uverbs_main.c). Aside of a number of obvious leaks, it does a number of calls of ib_uverbs_event_init(). Each of those does something amazingly bogus: * allocates a descriptor * allocates struct file * associates that struct file with root of their pseudo-fs * inserts it into caller's descriptor table ... and leaves an unknown number of those if open() fails, while we are at it. With zero indications for caller and no way to find out. What's more, you _can_ get those descriptors afterwards, if open() had succeeded. All you need to do is... Exibit B: ... write() to said descriptor. Buffer should contain a struct that will be interpreted. Results will be written to user memory, at the addresses contained in that struct. Said results might include the descriptors shat upon by open(). Nice way to hide an ioctl(), folks... Note that this "interface" assumes that only original opener will write to that file - for anybody else descriptors obviously will not make any sense. BTW, due to the way we do opens, if another thread sharing descriptor table will guess the number of first additional descriptor to be opened and just loops doing close() on it, we'll actually get our ib_uverbs_file kfreed right under us. May I ask who had come up with that insanity? Aside of inherent ugliness and abuse of fs syscalls, it simply doesn't work. E.g. leaks on failed open() are going to be fun to fix... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] utterly bogus userland API in infinibad 2005-09-16 18:11 [RFC] utterly bogus userland API in infinibad Al Viro @ 2005-09-16 19:31 ` Roland Dreier 2005-09-16 20:37 ` Al Viro 0 siblings, 1 reply; 7+ messages in thread From: Roland Dreier @ 2005-09-16 19:31 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, linux-kernel, rolandd > Exhibit A: > > opening uverbs... is done by ib_uverbs_open() (in > drivers/infinib*d/core/uverbs_main.c). Aside of a number of obvious > leaks, it does a number of calls of ib_uverbs_event_init(). Each of > those does something amazingly bogus: > * allocates a descriptor > * allocates struct file > * associates that struct file with root of their pseudo-fs > * inserts it into caller's descriptor table > ... and leaves an unknown number of those if open() fails, while we > are at it. With zero indications for caller and no way to find out. Sorry, but the obvious leaks aren't obvious to me. Could you give more details? It is a good point that we might leak file descriptors if open() fails halfway through. I guess we should wait to do the fd_install()s until we're sure that everything has succeeded. > What's more, you _can_ get those descriptors afterwards, if open() > had succeeded. All you need to do is... Not sure I follow this. The intention is that those file descriptors be available to userspace for poll(), read(), etc. > Exibit B: > ... write() to said descriptor. Buffer should contain a struct > that will be interpreted. Results will be written to user memory, at the > addresses contained in that struct. Said results might include the > descriptors shat upon by open(). Nice way to hide an ioctl(), folks... > Note that this "interface" assumes that only original opener will write > to that file - for anybody else descriptors obviously will not make any > sense. > BTW, due to the way we do opens, if another thread sharing descriptor > table will guess the number of first additional descriptor to be opened > and just loops doing close() on it, we'll actually get our ib_uverbs_file > kfreed right under us. Good point. What do other interfaces that create file descriptors do? For example, in fs/eventpoll.c, I don't see anything obvious in sys_epoll_create() that protects the file descriptor from being closed between ep_getfd() and ep_file_init(). It seems that waiting to do the fd_install()s until we're just about to return to userspace anyway would be good enough. > May I ask who had come up with that insanity? Aside of inherent ugliness > and abuse of fs syscalls, it simply doesn't work. E.g. leaks on failed > open() are going to be fun to fix... I'm the insane one. I'm happy to fix it up, but do you have a preference for what the interface should look like? I don't think we want 30+ new system calls for InfiniBand and I don't think we want a single horrible multiplexed system call. I don't think ioctl() is any better or worse than write(), so I could go either way. Anyway, what do you suggest? Not to be peevish, but I actually described exactlyl this scheme in email to lkml, cc'ed to Al, in Message-ID: <52k6qn229h.fsf@topspin.com> back in January. For some reason, this email doesn't seem to have been archived on the web (I'm happy to resend if anyone wants), but Al certainly replied to part of it (saying that yes, get_sb_pseudo() should be exported). Thanks, Roland ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] utterly bogus userland API in infinibad 2005-09-16 19:31 ` Roland Dreier @ 2005-09-16 20:37 ` Al Viro 2005-09-16 23:54 ` Roland Dreier 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2005-09-16 20:37 UTC (permalink / raw) To: Roland Dreier; +Cc: Linus Torvalds, linux-kernel On Fri, Sep 16, 2005 at 12:31:30PM -0700, Roland Dreier wrote: > > Exhibit A: > > > > opening uverbs... is done by ib_uverbs_open() (in > > drivers/infinib*d/core/uverbs_main.c). Aside of a number of obvious > > leaks, it does a number of calls of ib_uverbs_event_init(). Each of > > those does something amazingly bogus: > > * allocates a descriptor > > * allocates struct file > > * associates that struct file with root of their pseudo-fs > > * inserts it into caller's descriptor table > > ... and leaves an unknown number of those if open() fails, while we > > are at it. With zero indications for caller and no way to find out. > > Sorry, but the obvious leaks aren't obvious to me. Could you give > more details? if (!try_module_get(dev->ib_dev->owner)) return -ENODEV; file = kmalloc(sizeof *file + (dev->num_comp - 1) * sizeof (struct ib_uverbs_event_file) , GFP_KERNEL); if (!file) return -ENOMEM; Looks obvious enough. The very beginning of the function... > It is a good point that we might leak file descriptors if open() fails > halfway through. I guess we should wait to do the fd_install()s until > we're sure that everything has succeeded. > > > What's more, you _can_ get those descriptors afterwards, if open() > > had succeeded. All you need to do is... > > Not sure I follow this. The intention is that those file descriptors > be available to userspace for poll(), read(), etc. ... only in the opener process? And with insertion and reporting done in separate syscalls? FWIW, we do have an interface for passing a bunch of files into descriptor table - SCM_RIGHTS datagrams. > > Exibit B: > > ... write() to said descriptor. Buffer should contain a struct > > that will be interpreted. Results will be written to user memory, at the > > addresses contained in that struct. Said results might include the > > descriptors shat upon by open(). Nice way to hide an ioctl(), folks... > > > Note that this "interface" assumes that only original opener will write > > to that file - for anybody else descriptors obviously will not make any > > sense. > > > BTW, due to the way we do opens, if another thread sharing descriptor > > table will guess the number of first additional descriptor to be opened > > and just loops doing close() on it, we'll actually get our ib_uverbs_file > > kfreed right under us. > > Good point. What do other interfaces that create file descriptors do? That one is trivial - call kref_get() before installing descriptor, not after that... > For example, in fs/eventpoll.c, I don't see anything obvious in > sys_epoll_create() that protects the file descriptor from being closed > between ep_getfd() and ep_file_init(). Oh, lovely - it's also b0rken. General rule: you should never, ever do removal from descriptor table as part of cleanup. By the time you get there you might have _anything_ in the entry you are about to close. eventpoll, AFAICS, should simply move ep_file_init() prior to insertion into descriptor table. Even more general rule: initialize the object before sticking a pointer to it into shared data structure... BTW, here's another example of the same principle: file->ucontext = ibdev->alloc_ucontext(ibdev, &udata); if (IS_ERR(file->ucontext)) { ret = PTR_ERR(file->ucontext); file->ucontext = NULL; return ret; } is broken - anything that checks for ->ucontext and assumes that non-NULL means a valid value is going to be screwed by that. Use of local variable would at least deal with that; however, you need exclusion and check to deal with multiple calls of that sucker, parallel or not. Of course, initializing the contents of ->ucontext should also go before the assignment... Another one, in the same function: ibdev->dealloc_ucontext(file->ucontext); file->ucontext = NULL; on failure exit is a) obviously wrong since you leave the pointer to freed object in shared data structure for a while b) potentially nasty since somebody might have started to use that object while we were messing with copy_to_user() (i.e. saner solution would be to do copy_to_user() before anything else and mess with allocations only if it had succeeded). > It seems that waiting to do the fd_install()s until we're just about > to return to userspace anyway would be good enough. > > May I ask who had come up with that insanity? Aside of inherent ugliness > > and abuse of fs syscalls, it simply doesn't work. E.g. leaks on failed > > open() are going to be fun to fix... > > I'm the insane one. I'm happy to fix it up, but do you have a > preference for what the interface should look like? I don't think we > want 30+ new system calls for InfiniBand and I don't think we want a > single horrible multiplexed system call. I don't think ioctl() is any > better or worse than write(), so I could go either way. Anyway, what > do you suggest? First of all, this form of write() is no better than ioctl() - you get exact same problems combined with lack of warning (ioctl() is a warning enough - we all know that its arguments can be interpreted in arbitrary ugly way; write() is not generally assumed to suffer from that). Note that quite a few of these guys are simply read() in disguise, which is particulary strange since you _do_ have extra file descriptors. Using write() to tell which stuff you would like to read and where in userland should we put the read value is... odd. Moreover, don't you have e.g. ib_query_port() accessible from sysfs as well? ib_query_gid() too... BTW, uobj = pd->uobject; ret = ib_dealloc_pd(pd); if (ret) goto out; idr_remove(&ib_uverbs_pd_idr, cmd.pd_handle); spin_lock_irq(&file->ucontext->lock); list_del(&uobj->list); spin_unlock_irq(&file->ucontext->lock); kfree(uobj); looks funny - can anything else get to uobj via that list here? If so, we are asking for trouble with that kfree()... OK, that can go on (and it definitely needs a review - we have enough obvious races in there and that's just from cursory look, more along the lines of "what do we do in that multiplexor"), but the points wrt interface are simple: * you _did_ end up with a multiplexor; you've just got it piggybacked on sys_write(). * part of that stuff appears to be duplicating sysfs interfaces * a lot of that stuff looks much more like a read(2), not write(2). * documentation (however informal) of the list of things accessible via said multiplexor is needed for further comments on the interface. > Not to be peevish, but I actually described exactlyl this scheme in > email to lkml, cc'ed to Al, in Message-ID: <52k6qn229h.fsf@topspin.com> > back in January. For some reason, this email doesn't seem to have > been archived on the web (I'm happy to resend if anyone wants), but Al > certainly replied to part of it (saying that yes, get_sb_pseudo() > should be exported). My apologies - looks like I've missed the context back then. I certainly had missed the descriptions of the operations you wanted to plug into write(2). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] utterly bogus userland API in infinibad 2005-09-16 20:37 ` Al Viro @ 2005-09-16 23:54 ` Roland Dreier 2005-09-17 0:03 ` David S. Miller 2005-09-19 22:25 ` Roland Dreier 0 siblings, 2 replies; 7+ messages in thread From: Roland Dreier @ 2005-09-16 23:54 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, linux-kernel Al> Looks obvious enough. The very beginning of the function... Fair enough. Yes, there's an obvious reference leak there. Roland> Not sure I follow this. The intention is that those file Roland> descriptors be available to userspace for poll(), read(), Al> ... only in the opener process? And with insertion and Al> reporting done in separate syscalls? FWIW, we do have an Al> interface for passing a bunch of files into descriptor table - Al> SCM_RIGHTS datagrams. I have to confess I'm not familiar with how the kernel implements SCM_RIGHTS. Is it something we could use here? Al> is broken - anything that checks for ->ucontext and assumes Al> that non-NULL means a valid value is going to be screwed by Al> that. Use of local variable would at least deal with that; Al> however, you need exclusion and check to deal with multiple Al> calls of that sucker, parallel or not. Yes, good points. I'll fix that up. Al> First of all, this form of write() is no better than ioctl() - Al> you get exact same problems combined with lack of warning Al> (ioctl() is a warning enough - we all know that its arguments Al> can be interpreted in arbitrary ugly way; write() is not Al> generally assumed to suffer from that). Agreed -- as I said before, I don't think this is any better or worse than ioctl(). The reasons for choosing write() are mostly historial (write() at least didn't take the BKL). Now that we have unlocked_ioctl I have no objection to switching to ioctl(). Al> Note that quite a few of these guys are simply read() in Al> disguise, which is particulary strange since you _do_ have Al> extra file descriptors. Using write() to tell which stuff you Al> would like to read and where in userland should we put the Al> read value is... odd. Yes, it is a perversion of write() semantics. However I don't see a way to use read(), since there's no way to tell read() what data we want to get in a particular system call. For the "event files" we do use read() because the semantics are quite natural -- "give me the next event you have queued up." However, most of the operations on the main "command" file descriptor are more like transactions that take some inputs and return some outputs. For example, the "create queue pair" operation takes a bunch of inputs like "maximum number of queue entries" and returns a status, a queue pair number and a handle to the object that was created. Al> BTW Al> uobj = pd->uobject; Al> ret = ib_dealloc_pd(pd); Al> if (ret) Al> goto out; Al> idr_remove(&ib_uverbs_pd_idr, cmd.pd_handle); Al> spin_lock_irq(&file->ucontext->lock); Al> list_del(&uobj->list); Al> spin_unlock_irq(&file->ucontext->lock); Al> kfree(uobj); Al> looks funny - can anything else get to uobj via that list Al> here? If so, we are asking for trouble with that kfree()... I think that code is actually OK. The list is only used to keep track of objects that we need to clean up when the file is closed, and we'll only walk the list in the file's release method, when no one else can be using it. Al> but the points wrt interface are simple: Al> * you _did_ end up with a multiplexor; you've just got Al> it piggybacked on sys_write(). * part of that stuff appears Al> to be duplicating sysfs interfaces * a lot of that stuff looks Al> much more like a read(2), not write(2). * documentation Al> (however informal) of the list of things accessible via said Al> multiplexor is needed for further comments on the interface. Not sure exactly what documentation you're looking for. Let me know if the following is a good start, or if there's something else you want to know. The basic objects that we want userspace to be able to work with are: Context - resource container that everything else is inside. Corresponds to a file descriptor, and no one without access to the FD can mess with the context. Asynchronous event queue - queue of asynchronous events like "adapter port has changed state." We want this to be something userspace can poll(), sleep on, etc. Work completion event queue - similar to async event queue, but there are possibly multiple queues for a single context. Queues events like "completion notification has been added to completion queue." Similarly want to be able to poll(), sleep on individual queues, etc. Protection Domain (PD) - Another type of resource container. Every queue pair (QP) and memory region (MR) are attached to a PD. Userspace needs to be able to create and destroy PDs, and pass a handle to a PD when creating QPs and MRs. Memory Region (MR) - Area of memory that IB hardware is allowed direct access to. When created, L_Key and R_Key opaque cookies are returned. Userspace needs to be able to create and destroy MRs and have access to the L_Key and R_Key. Queue Pair (QP) - Pair of work queues (send queue and receive queue) that userspace can put work requests into for processing by IB hardware. Kernel is not involved in the fast path operations of adding work requests, but it must handle resource allocation when userspace creates a QP. Userspace also needs to be able to ask the kernel to modify and destroy QPs. The modify operation takes a bunch of parameters such as remote address to connect to, etc. Completion Queue (CQ) - Queue that holds information about completed work requests. Pretty similar to QPs: IB hardware writes into the queue and it's read directly from userspace with no kernel involvement, but the kernel has to handle resource allocation/cleanup. Also, userspace can request an interrupt when something is added to a CQ, and the kernel has to turn that interrupt into something userspace can read on the appropriate completion event queue. Every work queue is attached to a CQ, so we need to have handles we can pass into the create QP operation. There are a few other types of object like shared receive queues (SRQ) and memory windows (MW) but they're pretty similar to the main ones I described above. Thanks, Roland ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] utterly bogus userland API in infinibad 2005-09-16 23:54 ` Roland Dreier @ 2005-09-17 0:03 ` David S. Miller 2005-09-17 0:08 ` Roland Dreier 2005-09-19 22:25 ` Roland Dreier 1 sibling, 1 reply; 7+ messages in thread From: David S. Miller @ 2005-09-17 0:03 UTC (permalink / raw) To: rolandd; +Cc: viro, torvalds, linux-kernel From: Roland Dreier <rolandd@cisco.com> Date: Fri, 16 Sep 2005 16:54:31 -0700 > I have to confess I'm not familiar with how the kernel implements > SCM_RIGHTS. Is it something we could use here? Yes, you could open up an AF_UNIX socket with userspace and pass the FDs over via SCM_RIGHTS. Read the unix(7) man page, section ANCILLARY MESSAGES, sub-section SCM_RIGHTS, to see how userspace can use this stuff between processes. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] utterly bogus userland API in infinibad 2005-09-17 0:03 ` David S. Miller @ 2005-09-17 0:08 ` Roland Dreier 0 siblings, 0 replies; 7+ messages in thread From: Roland Dreier @ 2005-09-17 0:08 UTC (permalink / raw) To: David S. Miller; +Cc: rolandd, viro, torvalds, linux-kernel David> Read the unix(7) man page, section ANCILLARY MESSAGES, David> sub-section SCM_RIGHTS, to see how userspace can use this David> stuff between processes. Yeah, I know about using SCM_RIGHTS between processes in userspace... David> Yes, you could open up an AF_UNIX socket with userspace and David> pass the FDs over via SCM_RIGHTS. ...but how does the kernel open an AF_UNIX socket with userspace? - R. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] utterly bogus userland API in infinibad 2005-09-16 23:54 ` Roland Dreier 2005-09-17 0:03 ` David S. Miller @ 2005-09-19 22:25 ` Roland Dreier 1 sibling, 0 replies; 7+ messages in thread From: Roland Dreier @ 2005-09-19 22:25 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, linux-kernel Here's a patch that (I think) fixes all the exploitable holes that Al pointed out. I know it does still leak files if ib_uverbs_open() fails halfway through, but I think the best way to fix that will require breaking the ABI. I would propose including this patch in 2.6.14 to close the worst holes and then making ABI-breaking changes for 2.6.15. I've thought it over, and I can see a better design that always FDs directly from the system call that creates and installs them, and always returns FDs one by one. This makes cleaning up on failure of a system much simpler, and handles the case of a context being used in a different process than the original opener. However, this would leave intact the current write()-based interface. I'd love to implement a better interface, but I don't see anything that's a clear improvement. Al, any further thoughts? Thanks, Roland diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -69,6 +69,7 @@ struct ib_uverbs_event_file { struct ib_uverbs_file { struct kref ref; + struct semaphore mutex; struct ib_uverbs_device *device; struct ib_ucontext *ucontext; struct ib_event_handler event_handler; diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -76,8 +76,9 @@ ssize_t ib_uverbs_get_context(struct ib_ struct ib_uverbs_get_context_resp resp; struct ib_udata udata; struct ib_device *ibdev = file->device->ib_dev; + struct ib_ucontext *ucontext; int i; - int ret = in_len; + int ret; if (out_len < sizeof resp) return -ENOSPC; @@ -85,45 +86,56 @@ ssize_t ib_uverbs_get_context(struct ib_ if (copy_from_user(&cmd, buf, sizeof cmd)) return -EFAULT; + down(&file->mutex); + + if (file->ucontext) { + ret = -EINVAL; + goto err; + } + INIT_UDATA(&udata, buf + sizeof cmd, (unsigned long) cmd.response + sizeof resp, in_len - sizeof cmd, out_len - sizeof resp); - file->ucontext = ibdev->alloc_ucontext(ibdev, &udata); - if (IS_ERR(file->ucontext)) { - ret = PTR_ERR(file->ucontext); - file->ucontext = NULL; - return ret; - } - - file->ucontext->device = ibdev; - INIT_LIST_HEAD(&file->ucontext->pd_list); - INIT_LIST_HEAD(&file->ucontext->mr_list); - INIT_LIST_HEAD(&file->ucontext->mw_list); - INIT_LIST_HEAD(&file->ucontext->cq_list); - INIT_LIST_HEAD(&file->ucontext->qp_list); - INIT_LIST_HEAD(&file->ucontext->srq_list); - INIT_LIST_HEAD(&file->ucontext->ah_list); - spin_lock_init(&file->ucontext->lock); + ucontext = ibdev->alloc_ucontext(ibdev, &udata); + if (IS_ERR(ucontext)) + return PTR_ERR(file->ucontext); + + ucontext->device = ibdev; + INIT_LIST_HEAD(&ucontext->pd_list); + INIT_LIST_HEAD(&ucontext->mr_list); + INIT_LIST_HEAD(&ucontext->mw_list); + INIT_LIST_HEAD(&ucontext->cq_list); + INIT_LIST_HEAD(&ucontext->qp_list); + INIT_LIST_HEAD(&ucontext->srq_list); + INIT_LIST_HEAD(&ucontext->ah_list); resp.async_fd = file->async_file.fd; for (i = 0; i < file->device->num_comp; ++i) if (copy_to_user((void __user *) (unsigned long) cmd.cq_fd_tab + i * sizeof (__u32), - &file->comp_file[i].fd, sizeof (__u32))) - goto err; + &file->comp_file[i].fd, sizeof (__u32))) { + ret = -EFAULT; + goto err_free; + } if (copy_to_user((void __user *) (unsigned long) cmd.response, - &resp, sizeof resp)) - goto err; + &resp, sizeof resp)) { + ret = -EFAULT; + goto err_free; + } + + file->ucontext = ucontext; + up(&file->mutex); return in_len; -err: - ibdev->dealloc_ucontext(file->ucontext); - file->ucontext = NULL; +err_free: + ibdev->dealloc_ucontext(ucontext); - return -EFAULT; +err: + up(&file->mutex); + return ret; } ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file, @@ -352,9 +364,9 @@ retry: if (ret) goto err_pd; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->list, &file->ucontext->pd_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); memset(&resp, 0, sizeof resp); resp.pd_handle = uobj->id; @@ -368,9 +380,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); down(&ib_uverbs_idr_mutex); idr_remove(&ib_uverbs_pd_idr, uobj->id); @@ -410,9 +422,9 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_u idr_remove(&ib_uverbs_pd_idr, cmd.pd_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); kfree(uobj); @@ -512,9 +524,9 @@ retry: resp.mr_handle = obj->uobject.id; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&obj->uobject.list, &file->ucontext->mr_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { @@ -527,9 +539,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&obj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); err_unreg: ib_dereg_mr(mr); @@ -570,9 +582,9 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uve idr_remove(&ib_uverbs_mr_idr, cmd.mr_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&memobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); ib_umem_release(file->device->ib_dev, &memobj->umem); kfree(memobj); @@ -647,9 +659,9 @@ retry: if (ret) goto err_cq; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->uobject.list, &file->ucontext->cq_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); memset(&resp, 0, sizeof resp); resp.cq_handle = uobj->uobject.id; @@ -664,9 +676,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); down(&ib_uverbs_idr_mutex); idr_remove(&ib_uverbs_cq_idr, uobj->uobject.id); @@ -712,9 +724,9 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u idr_remove(&ib_uverbs_cq_idr, cmd.cq_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); spin_lock_irq(&file->comp_file[0].lock); list_for_each_entry_safe(evt, tmp, &uobj->comp_list, obj_list) { @@ -847,9 +859,9 @@ retry: resp.qp_handle = uobj->uobject.id; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->uobject.list, &file->ucontext->qp_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { @@ -862,9 +874,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); err_destroy: ib_destroy_qp(qp); @@ -989,9 +1001,9 @@ ssize_t ib_uverbs_destroy_qp(struct ib_u idr_remove(&ib_uverbs_qp_idr, cmd.qp_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); spin_lock_irq(&file->async_file.lock); list_for_each_entry_safe(evt, tmp, &uobj->event_list, obj_list) { @@ -1136,9 +1148,9 @@ retry: resp.srq_handle = uobj->uobject.id; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->uobject.list, &file->ucontext->srq_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { @@ -1151,9 +1163,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); err_destroy: ib_destroy_srq(srq); @@ -1227,9 +1239,9 @@ ssize_t ib_uverbs_destroy_srq(struct ib_ idr_remove(&ib_uverbs_srq_idr, cmd.srq_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); spin_lock_irq(&file->async_file.lock); list_for_each_entry_safe(evt, tmp, &uobj->event_list, obj_list) { diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -484,27 +484,29 @@ static int ib_uverbs_open(struct inode * file = kmalloc(sizeof *file + (dev->num_comp - 1) * sizeof (struct ib_uverbs_event_file), GFP_KERNEL); - if (!file) - return -ENOMEM; + if (!file) { + ret = -ENOMEM; + goto err; + } file->device = dev; kref_init(&file->ref); + init_MUTEX(&file->mutex); file->ucontext = NULL; + kref_get(&file->ref); ret = ib_uverbs_event_init(&file->async_file, file); if (ret) - goto err; + goto err_kref; file->async_file.is_async = 1; - kref_get(&file->ref); - for (i = 0; i < dev->num_comp; ++i) { + kref_get(&file->ref); ret = ib_uverbs_event_init(&file->comp_file[i], file); if (ret) goto err_async; - kref_get(&file->ref); file->comp_file[i].is_async = 0; } @@ -524,9 +526,16 @@ err_async: ib_uverbs_event_release(&file->async_file); -err: +err_kref: + /* + * One extra kref_put() because we took a reference before the + * event file creation that failed and got us here. + */ + kref_put(&file->ref, ib_uverbs_release_file); kref_put(&file->ref, ib_uverbs_release_file); +err: + module_put(dev->ib_dev->owner); return ret; } diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -665,7 +665,6 @@ struct ib_ucontext { struct list_head qp_list; struct list_head srq_list; struct list_head ah_list; - spinlock_t lock; }; struct ib_uobject { ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-09-19 22:25 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-09-16 18:11 [RFC] utterly bogus userland API in infinibad Al Viro 2005-09-16 19:31 ` Roland Dreier 2005-09-16 20:37 ` Al Viro 2005-09-16 23:54 ` Roland Dreier 2005-09-17 0:03 ` David S. Miller 2005-09-17 0:08 ` Roland Dreier 2005-09-19 22:25 ` Roland Dreier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox