* 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) @ 2003-06-03 18:33 Lou Langholtz 2003-06-03 19:07 ` Andrew Morton 0 siblings, 1 reply; 8+ messages in thread From: Lou Langholtz @ 2003-06-03 18:33 UTC (permalink / raw) To: linux-kernel What am I missing (or is something in the block handling code actually seriously wrong)? In linux-2.5.70/drivers/block/genhd.c the add_disk(disk) logic seems wrong because if disk->queue is shared by a few disk's, then that shared queue's elevator is re-registered via the call to elv_register_queue(disk) which calls down to kobject_register(disk->queue->elevator.kobj). Or perhaps the block handling logic was changed such that disks don't share the same request_queue anymore. If so, then a few drivers (like nbd) need to be updated to use a seperate request_queue per disk. If the block handling code wasn't however changed this way though and struct gendisk objects can still share the same request_queue then it seems a lot of other logic todo with elv_register_queue() is also scrogged. Like shouldn't disk->kobj.parent be set to kobject_get(&disk->queue->elevator.kobj) instead of the reverse that the 2.5.70 code does? If so, then all the deallocation code (called from del_gendisk(disk)) needs to be reversed and changed around too. Seems like so much wrong logic that it's my reading of the code that's at fault instead. Help!? CC me on any email as I don't read the lkml regularly. ldl@aros.net Thanks!! Louis D. Langholtz ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) 2003-06-03 18:33 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) Lou Langholtz @ 2003-06-03 19:07 ` Andrew Morton 2003-06-04 0:29 ` Lou Langholtz 0 siblings, 1 reply; 8+ messages in thread From: Andrew Morton @ 2003-06-03 19:07 UTC (permalink / raw) To: Lou Langholtz; +Cc: linux-kernel Lou Langholtz <ldl@aros.net> wrote: > > Or perhaps the block > handling logic was changed such that disks don't share the same > request_queue anymore. If so, then a few drivers (like nbd) need to be > updated to use a seperate request_queue per disk. The ramdisk driver was recently changed to do exactly this. From what you say it appears that nbd needs the same treatment. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) 2003-06-03 19:07 ` Andrew Morton @ 2003-06-04 0:29 ` Lou Langholtz 2003-06-04 0:56 ` viro 2003-06-04 1:00 ` Andrew Morton 0 siblings, 2 replies; 8+ messages in thread From: Lou Langholtz @ 2003-06-04 0:29 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Andrew Morton wrote: >Lou Langholtz <ldl@aros.net> wrote: > > >>Or perhaps the block >>handling logic was changed such that disks don't share the same >>request_queue anymore. If so, then a few drivers (like nbd) need to be >>updated to use a seperate request_queue per disk. >> >> > >The ramdisk driver was recently changed to do exactly this. From what >you say it appears that nbd needs the same treatment. > > I noticed that too but thought surely that couldn't be why the rd driver was changes. Cause... then it would seem via 'grep blk_init_queue drivers/block/*.c' that most of the block drivers need to be changed. And having a request_queue structure for every disk that's often (in these drivers) every minor device, seems like a lot of unneeded memory usage too. I'm afraid to ask this, but are you sure that each disk really is supposed to have its own request queue now? That seems less sensible than inverting the kobject parenting logic so that the request_queue.elevator kobject is the parent of the disk kobject. After all, makes more sense for multiple gen_disk objects to belong to the same elevator than for multiple elevators to belong to the same gen_disk no??? Anyways.... thanks for setting me straight ;-) Lou ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) 2003-06-04 0:29 ` Lou Langholtz @ 2003-06-04 0:56 ` viro 2003-06-04 16:08 ` Patrick Mochel 2003-06-04 1:00 ` Andrew Morton 1 sibling, 1 reply; 8+ messages in thread From: viro @ 2003-06-04 0:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel On Tue, Jun 03, 2003 at 06:29:19PM -0600, Lou Langholtz wrote: > Andrew Morton wrote: > >The ramdisk driver was recently changed to do exactly this. From what > >you say it appears that nbd needs the same treatment. This is utterly ridiculous. I realize that sysfs is fashionable, but it should reflect the existing logics, not the other way round. Guys, there are very valid reasons to have a queue shared by several disks. E.g. it can very well act as a serialization mechanism - and as the matter of fact does in quite a few drivers. What the fuck is going on? It's what, the fifth case when we have somebody export something in sysfs, ignore the lifetime rules for objects and go "reality doesn't match my theory, too bad for reality"? Linus, could we *please* put a moratorium on use of sysfs unless the persons using it had proven that they understand how the objects they export are used in the tree? Enough is enough - we already have netdev drivers to deal with and have to do that *now* thanks to blind sysfs export. Now we get random bdev stuff on top of that? Absofuckinglutely marvelous. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) 2003-06-04 0:56 ` viro @ 2003-06-04 16:08 ` Patrick Mochel 0 siblings, 0 replies; 8+ messages in thread From: Patrick Mochel @ 2003-06-04 16:08 UTC (permalink / raw) To: viro; +Cc: Linus Torvalds, Andrew Morton, linux-kernel > This is utterly ridiculous. I realize that sysfs is fashionable, but > it should reflect the existing logics, not the other way round. I completely agree. When converting a subsystem to use the kobject model, there are essentially three paths to choose: - Convert the subsystem to use the kobject semantics - Convert the kobject model to tolerate the semantics of the subsystem - Something else The first is very attractive because it requires little effort to get an appealing layout in the filesystem and export some attributes for a class of objects. It's worked on some cases so far, but it's also proven to not work for all classes of objects, _especially_ when they start interacting with each other. Converting the kobject model to tolerate other semantics is fine, and I'm happy to see it happen. However, it must not make special cases for random subsystems. Which is where the 'something else' comes in. The object lifetime and interaction rules must be understood for each object that is converted to use kobjects. Once they are, as well as their violations of the current kobject semantics, we can work to improve both. The fact that kobjects make it easy to shoot ourselves in the head is bad. But, the fact that they make long-existing bugs easier to trigger and force people to deal with them and fix them is a good thing, in the long run. There's a lot of code out there, much of which not a lot of people understand. If we're forced to audit it, understand it, and fix it, then we all win in the end. HOWEVER, the current development model does not jive very well with the fact that we're quickly approaching 2.6. The temporary instability introduced by a kobject transition does not help the immediate cause. If conversions are going to happen in the near future, they need to be completely understood, well-thought out, and stable. I realize that it helps to do it gradually, and not everyone is as talented as Al in doing atomic large-scope gradual sets of patches. Also, that many people have the time/energy to continue working on conversions and kobject infrastructure. So, as a harbor for this continued work, and a pre-cursor to 2.7, I've set up a tree at bk://ldm.bkbits.net/linux-2.5-sysfs People are welcome to send me patches, which I'll keep there and push forward once they become stable and/or necessary. -pat ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) 2003-06-04 0:29 ` Lou Langholtz 2003-06-04 0:56 ` viro @ 2003-06-04 1:00 ` Andrew Morton 2003-06-04 1:06 ` viro 1 sibling, 1 reply; 8+ messages in thread From: Andrew Morton @ 2003-06-04 1:00 UTC (permalink / raw) To: Lou Langholtz; +Cc: linux-kernel, Jens Axboe, Greg KH, Patrick Mochel Lou Langholtz <ldl@aros.net> wrote: > > >The ramdisk driver was recently changed to do exactly this. From what > >you say it appears that nbd needs the same treatment. > > > > > I noticed that too but thought surely that couldn't be why the rd driver > was changes. Cause... then it would seem via 'grep blk_init_queue > drivers/block/*.c' that most of the block drivers need to be changed. > And having a request_queue structure for every disk that's often (in > these drivers) every minor device, seems like a lot of unneeded memory > usage too. I'm afraid to ask this, but are you sure that each disk > really is supposed to have its own request queue now? That seems less > sensible than inverting the kobject parenting logic so that the > request_queue.elevator kobject is the parent of the disk kobject. After > all, makes more sense for multiple gen_disk objects to belong to the > same elevator than for multiple elevators to belong to the same gen_disk > no??? According to Al, we have a significant number of drivers in the tree in which multiple gendisks shared the same queue. Sometimes because that's a logical mapping onto how the hardware behaves. As far as I know, the new queue-per-gendisk requirement is purely because of this sysfs hierarchy problem. So yes, perhaps you are right and we need to rethink things from the sysfs angle rather than reworking the drivers. Along the lines which you mention. This isn't my area. Perhaps Jens, Greg or Pat could comment? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) 2003-06-04 1:00 ` Andrew Morton @ 2003-06-04 1:06 ` viro 2003-06-04 16:07 ` Lou Langholtz 0 siblings, 1 reply; 8+ messages in thread From: viro @ 2003-06-04 1:06 UTC (permalink / raw) To: Andrew Morton Cc: Lou Langholtz, linux-kernel, Jens Axboe, Greg KH, Patrick Mochel On Tue, Jun 03, 2003 at 06:00:02PM -0700, Andrew Morton wrote: > According to Al, we have a significant number of drivers in the tree in > which multiple gendisks shared the same queue. Sometimes because that's a > logical mapping onto how the hardware behaves. ... on top of that, we have queues with no gendisk ever registered. SCSI tapes, for one things. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) 2003-06-04 1:06 ` viro @ 2003-06-04 16:07 ` Lou Langholtz 0 siblings, 0 replies; 8+ messages in thread From: Lou Langholtz @ 2003-06-04 16:07 UTC (permalink / raw) To: viro; +Cc: Andrew Morton, linux-kernel, Jens Axboe, Greg KH, Patrick Mochel allviro@parcelfarce.linux.theplanet.co.uk wrote: >. . . >... on top of that, we have queues with no gendisk ever registered. >SCSI tapes, for one things. > > This may be unrelated but I think not... kobject refcounting for the elevators seems wrong in the current implementation if we're saying that in fact the request_queue should be sharable by zero or more devices. Here's extra debugging output from the nbd module I've been hacking on that's the result of doing an rmmod operation on the module. Note the BUG notice and the values of "q.e.ref#" that precede the BUG. "q.e.ref" is just the value of atomic_read(&disk->queue->elevator.kobj.refcount)). Running through zero as we go through calling del_gendisk(disk) for each disk belonging to each nbd_devs[] just seems wrong but this is what happens when the request_queue is shared by all the nbd_devs: Jun 2 23:09:36 cyprus kernel: nbd: nbd_exit called. Jun 2 23:09:36 cyprus kernel: nb0: calling del_gendisk(d56f1dc0): 43 0-0 ref#=4 q=e3268560 q.e.ref#=2 Jun 2 23:09:36 cyprus kernel: nb1: calling del_gendisk(d56f1cc0): 43 1-1 ref#=4 q=e3268560 q.e.ref#=0 Jun 2 23:09:36 cyprus kernel: nb2: calling del_gendisk(d56f3c80): 43 2-2 ref#=4 q=e3268560 q.e.ref#=-2 Jun 2 23:09:36 cyprus kernel: nb3: calling del_gendisk(d56f1ac0): 43 3-3 ref#=4 q=e3268560 q.e.ref#=-4 Jun 2 23:09:36 cyprus kernel: nb4: calling del_gendisk(d56f15c0): 43 4-4 ref#=4 q=e3268560 q.e.ref#=-6 Jun 2 23:09:36 cyprus kernel: nb5: calling del_gendisk(d56f16c0): 43 5-5 ref#=4 q=e3268560 q.e.ref#=-8 Jun 2 23:09:36 cyprus kernel: nb6: calling del_gendisk(d56f3980): 43 6-6 ref#=4 q=e3268560 q.e.ref#=-10 Jun 2 23:09:36 cyprus kernel: ------------[ cut here ]------------ Jun 2 23:09:36 cyprus kernel: kernel BUG at include/linux/dcache.h:271! Jun 2 23:09:36 cyprus kernel: invalid operand: 0000 [#1] Jun 2 23:09:36 cyprus kernel: CPU: 0 Jun 2 23:09:36 cyprus kernel: EIP: 0060:[<c0176b1d>] Tainted: GF Jun 2 23:09:36 cyprus kernel: EFLAGS: 00210246 Jun 2 23:09:36 cyprus kernel: EIP is at sysfs_remove_dir+0x1d/0x13f Jun 2 23:09:36 cyprus kernel: eax: 00000000 ebx: e32685a4 ecx: c027a0a0 edx: 00000000 Jun 2 23:09:36 cyprus kernel: esi: 00000006 edi: d5cad5c0 ebp: d8eadec0 esp: d8eadea8 Jun 2 23:09:36 cyprus kernel: ds: 007b es: 007b ss: 0068 Jun 2 23:09:36 cyprus kernel: Process rmmod (pid: 1732, threadinfo=d8eac000 task=d6224d80) Jun 2 23:09:36 cyprus kernel: Stack: d5cb9340 dfd6fbc0 dfd6fbe8 e32685a4 00000006 d56f3a48 d8eaded8 c0204e53 Jun 2 23:09:36 cyprus kernel: e32685a4 c047e740 e32685a4 e32685a4 d8eadee8 c0204e94 e32685a4 d56f3980 Jun 2 23:09:36 cyprus kernel: d8eadef8 c0275eae e32685a4 d56f3980 d8eadf0c c0279bd4 d56f3980 d56f3980 Jun 2 23:09:36 cyprus kernel: Call Trace: Jun 2 23:09:36 cyprus kernel: [<e32685a4>] nbd_queue+0x44/0x14c [nbd] Jun 2 23:09:36 cyprus kernel: [<c0204e53>] kobject_del+0x43/0x70 Jun 2 23:09:36 cyprus kernel: [<e32685a4>] nbd_queue+0x44/0x14c [nbd] Jun 2 23:09:36 cyprus last message repeated 2 times Jun 2 23:09:36 cyprus kernel: [<c0204e94>] kobject_unregister+0x14/0x20 Jun 2 23:09:36 cyprus kernel: [<e32685a4>] nbd_queue+0x44/0x14c [nbd] Jun 2 23:09:36 cyprus kernel: [<c0275eae>] elv_unregister_queue+0x1e/0x40 Jun 2 23:09:36 cyprus kernel: [<e32685a4>] nbd_queue+0x44/0x14c [nbd] Jun 2 23:09:36 cyprus kernel: [<c0279bd4>] unlink_gendisk+0x14/0x40 Jun 2 23:09:36 cyprus kernel: [<c0175951>] del_gendisk+0x61/0x110 Jun 2 23:09:36 cyprus kernel: [<e32617a4>] +0x644/0x7400 [nbd] Jun 2 23:09:36 cyprus kernel: [<e325efec>] nbd_exit+0x7c/0x157 [nbd] Jun 2 23:09:36 cyprus kernel: [<e3268560>] nbd_queue+0x0/0x14c [nbd] Jun 2 23:09:36 cyprus kernel: [<e3268800>] +0x0/0x140 [nbd] Jun 2 23:09:36 cyprus kernel: [<c012c30e>] sys_delete_module+0x12e/0x170 Jun 2 23:09:36 cyprus kernel: [<e3268800>] +0x0/0x140 [nbd] Jun 2 23:09:36 cyprus kernel: [<c010c73c>] do_IRQ+0xcc/0xe0 Jun 2 23:09:36 cyprus kernel: [<c010ac4d>] sysenter_past_esp+0x52/0x71 Jun 2 23:09:36 cyprus kernel: Jun 2 23:09:36 cyprus kernel: Code: 0f 0b 0f 01 db 9a 3c c0 ff 07 85 ff 0f 84 08 01 00 00 8b 47 Now if the whole block system has gotten redesigned to the point where each request_queue could be run through in multi-threaded fassion such that waiting within the queue handling q->request_fn won't block other running q->request_fn's then it makes sense in some block device driver cases to use a request_queue & lock per device but I'm still concerned by the direction of parenting of associated kobjects between gen_disk's and request_queue.elevator's. Should gen_disk's be the parent's of elevator's or the other way around? Is it ever the case that a driver needs multiple elevators for the same gen_disk? Seems like allviro is establishing that some drivers need multiple gen_disk's to use the same elevator. Looking forward to more direction on this. Thanks!!! ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2003-06-04 15:54 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-06-03 18:33 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) Lou Langholtz 2003-06-03 19:07 ` Andrew Morton 2003-06-04 0:29 ` Lou Langholtz 2003-06-04 0:56 ` viro 2003-06-04 16:08 ` Patrick Mochel 2003-06-04 1:00 ` Andrew Morton 2003-06-04 1:06 ` viro 2003-06-04 16:07 ` Lou Langholtz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox