* Re: [RFC] Separating out libata out of SCSI (finally)
From: Brian King @ 2008-06-20 19:41 UTC
To: Tejun Heo
Cc: Jeff Garzik, IDE/ATA development list, James Bottomley, linux-scsi, Mark Lord, Alan Cox, Jens Axboe

Tejun Heo wrote:
> PS. Brian, can you please point me to the latest version of libata EH
> integration patch?

I uploaded the current patch series here:

http://internap.dl.sourceforge.net/sourceforge/iprdd/libata_sas_new_eh.tgz

-Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

* Re: [RFC] Separating out libata out of SCSI (finally)
From: James Bottomley @ 2008-06-20 20:28 UTC
To: Tejun Heo
Cc: Jeff Garzik, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
> Hello, all.
>
> This item was on TODO list for years now.  People all agree that it's
> necessary but it always had relatively low priority probably because
> it's a bit difficult and isn't really necessary to make disks and
> optical drives work.
>
> Anyways, I think it's about time to take some action as SAS-ATA
> integration (Brian, sorry about staying so silent about this for long
> time, I was following the threads but couldn't really think of a quick
> solution) and other ATA specific things including link power
> management and bunch of other deferred ones due to lack of proper
> sysfs interface or high level driver (parallel probing, parallel
> resume).
>
> Currently, my plan is...
>
> * Move high level driver handling to request_queue.

Actually, I already did quite a lot of that here:

commit 7f9a6bc4e9d59e7fcf03ed23f60cd81ca5d80b65
Author: James Bottomley <James.Bottomley@steeleye.com>
Date:   Sat Aug 4 10:06:25 2007 -0500

    [SCSI] move ULD attachment into the prep function

But there's still more to be done.  The way I was thinking of it was
some type of protocol label (as in a ULD spits out protocols, like SCSI
or ATA) and then passes them to a LLD which uses libraries (what libata
and the scsi mid layer become) to process them.

> * Implement queue quiescing and other state management on request_queue.
> * Implement block_queue_group which...
>   - Handles command scheduling.
>   - Handles grouped queue quiescing and EH handling

The beginnings of this are in Jens' unmerged block timers work.

> * Move SCSI high level drivers to new infrastructure
> * Convert libata to use new command scheduling and EH infrastructure
> * Apply driver model to libata.
> * Implement ata_disk ATA high level driver.
>
> In the process, I'm planning to remove ata_host requirement and break
> down libata EH into actions and sequencers so that SAS can use them
> easily.
>
> The biggest problem is how to keep userland happy.  hdX -> sdX
> transition was painful enough and I have a strong feeling that
> everyone will come after and hunt down us if we try something like sdX
> -> bdX now.  :-)

In theory mounting by label or ID should have fixed a lot of this.
However, if we need to head off a revolt, the sdX allocation algorithm
can be placed into its own module so both sd and a ULD ata driver could
use it ...

> It would be ideal if those ATA specific sysfs stuff just shows up
> without disturbing the original SCSI things which are now widely used
> for enumeration and manipulation.  I think we can get pretty close by
> modifying SCSI high level drivers a bit such that they don't register
> block devices for SCSI devices created to keep backward compatibility.
> This is an extra burden on SCSI but it's gonna be the last one.
>
> *** Currently, sysfs nodes for a libata disk is like the following.
>
> /devices/DEV/hostH/targetH:C:I/H:C:I:L/
>   driver -> /bus/scsi/drivers/sd
>   generic -> /class/scsi_generic/sgN
>   block/sdX/
>     device -> ../../
>     partitions...
>
> /bus/scsi/devices/H:C:I:L -> /devices/DEV/hostH/targetH:C:I/H:C:I:L
> /bus/scsi/drivers/sd/H:C:I:L -> /devices/DEV/hostH/targetH:C:I/H:C:I:L
> /class/scsi_host/hostH/device -> /devices/DEV/hostH
> /class/scsi_device/H:C:I:L/device -> /devices/DEV/hostH/targetH:C:I/H:C:I:L
> /class/scsi_disk/H:C:I:L/device -> /devices/DEV/hostH/targetH:C:I/H:C:I:L
> /class/scsi_generic/sgN/device -> /devices/DEV/hostH/targetH:C:I/H:C:I:L
> /block/sda -> /devices/DEV/hostH/targetH:C:I/H:C:I:L/block/sdX
>
> *** After conversion, it will look something like the following.
>
> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
>   driver -> /bus/ata/drivers/ata_disk
>   block/sdX/
>     device -> ../../
>     partitions...
> C scsi_device -> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
>
> /bus/ata/devices/P:D -> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
> /bus/ata/drivers/ata_disk/P:D -> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
> /block/sda -> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D/block/sdX

Could you, perhaps, make the port multiplier visible in this as a new
device, a bit like we do today for SAS expanders?

> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L/
>   driver -> /bus/scsi/drivers/sd
>   generic -> /class/scsi_generic/sgN
> C block -> ata_port/P/ata_link/P:L/ata_device/P:D/block
> C ata_device -> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
>
> /bus/scsi/devices/H:C:I:L -> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
> /bus/scsi/drivers/sd/H:C:I:L -> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
> /class/scsi_host/hostH/device -> /devices/DEV/scsi_compat/hostH
> /class/scsi_device/H:C:I:L/device -> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
> /class/scsi_disk/H:C:I:L/device -> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
> /class/scsi_generic/sgN/device -> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
>
> The SCSI side of interface will remain as functional as now as it will
> go through the same libata SAT layer.

Actually, surely we can mostly dump the SAT layer?  libsas should be
made capable of taking ATA protocol packets straight from your ULD ATA
driver and sending them out.  I could see us still needing it as an
optional component so we can send SCSI SG_IO to ATA devices.

> The only surprise userspace
> will see aside from the extra ata nodes is that /sys/block/sdX/device
> will lead to an ATA device node instead of SCSI device node.  Well,
> that's the whole point of the conversion but I think the surprise can
> be minimized by reusing names used in SCSI device node and possibly
> making symlinks for nodes which only makes sense for SCSI device.
>
> So, what do you guys think?

I think the devil will be in the details, and those certainly won't be
obvious until the conversion is actually tried.

James

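The SAT layer discussed above is the part of libata that translates SCSI commands into ATA ones. As a rough illustration only, here is a minimal sketch of how a single SCSI READ(10) command might be mapped onto an ATA DMA read. The struct `sketch_taskfile` and the function `sat_read10_to_ata()` are hypothetical and are not libata's API; real translation also has to deal with NCQ, FUA, PIO and error handling. Only the opcode values (READ(10) = 0x28, READ DMA = 0xC8, READ DMA EXT = 0x25) are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCSI_READ_10         0x28 /* SCSI READ(10) opcode */
#define ATA_CMD_READ_DMA     0xC8 /* 28-bit LBA DMA read */
#define ATA_CMD_READ_DMA_EXT 0x25 /* 48-bit LBA DMA read */

/* Hypothetical, heavily simplified taskfile; libata's real one differs. */
struct sketch_taskfile {
	uint8_t  command; /* ATA opcode to issue */
	uint64_t lba;     /* starting sector */
	uint32_t nsect;   /* number of sectors */
};

/*
 * Translate a READ(10) CDB into an ATA read.
 * Returns 0 if a command should be issued, 1 if the request moves no data
 * (SCSI allows a zero transfer length), or -1 if it cannot be handled.
 */
static int sat_read10_to_ata(const uint8_t cdb[10], int dev_has_lba48,
			     struct sketch_taskfile *tf)
{
	assert(cdb != NULL && tf != NULL);

	if (cdb[0] != SCSI_READ_10)
		return -1;

	/* LBA is big-endian in bytes 2..5, transfer length in bytes 7..8. */
	tf->lba = ((uint64_t)cdb[2] << 24) | ((uint64_t)cdb[3] << 16) |
		  ((uint64_t)cdb[4] << 8) | (uint64_t)cdb[5];
	tf->nsect = ((uint32_t)cdb[7] << 8) | (uint32_t)cdb[8];

	if (tf->nsect == 0)
		return 1;

	if (dev_has_lba48) {
		tf->command = ATA_CMD_READ_DMA_EXT;
	} else {
		/* 28-bit commands only reach LBAs below 2^28, 256 sectors max. */
		if (tf->nsect > 256 || tf->lba + tf->nsect > (1ULL << 28))
			return -1;
		tf->command = ATA_CMD_READ_DMA;
	}
	return 0;
}
```

For example, the CDB `28 00 00 00 10 00 00 00 08 00` (eight blocks starting at LBA 0x1000) becomes command 0x25 with `lba` 4096 and `nsect` 8 on a device with 48-bit addressing. A translator of this kind, kept as an optional component, is what would let SCSI SG_IO continue to reach ATA disks.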
* Re: [RFC] Separating out libata out of SCSI (finally)
From: Jeff Garzik @ 2008-06-20 22:41 UTC
To: James Bottomley, Tejun Heo
Cc: IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

James Bottomley wrote:
> On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
>> The biggest problem is how to keep userland happy.  hdX -> sdX
>> transition was painful enough and I have a strong feeling that
>> everyone will come after and hunt down us if we try something like sdX
>> -> bdX now.  :-)

> In theory mounting by label or ID should have fixed a lot of this.
> However, if we need to head off a revolt, the sdX allocation algorithm
> can be placed into its own module so both sd and a ULD ata driver could
> use it ...

> Actually, surely we can mostly dump the SAT layer?


I don't see that we can do that for a long time...  And it's not just the
sdX allocation algorithm in question -- SCSI block devices come with their
own partition limits and set of supported ioctls.

Therefore, my recommended path has always been

* create ata_disk block device driver (ULD, in your terminology)

* make SAT an optional piece, which maintains compatibility with existing
  SCSI blkdevs, ioctls, command sets


I just don't see a valid path moving forward that breaks userland /again/...
we (ATA hackers) would be drummed out of a job I think :)

Another option that's been discussed is

1) Make SCSI block devices themselves an allocate-able resource (I think
that's what you meant by "placed into its own module so both sd and a
ULD ata driver could use it"?)

2) Ensure that any ata_disk ULD would support the same partition limits
and ioctl set, enough to ensure binary compatibility.

Because that's the real need -- maintaining binary compatibility with
SCSI block devices, so major/minor, ioctl supported set, partition
limits, and other relevant details need to remain unchanged.

The underlying software we're of course free to change...

	Jeff

* Re: [RFC] Separating out libata out of SCSI (finally)
From: Tejun Heo @ 2008-06-20 23:50 UTC
To: Jeff Garzik
Cc: James Bottomley, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

Jeff Garzik wrote:
> 1) Make SCSI block devices themselves an allocate-able resource (I think
> that's what you meant by "placed into its own module so both sd and a
> ULD ata driver could use it"?)
>
> 2) Ensure that any ata_disk ULD would support the same partition limits
> and ioctl set, enough to ensure binary compatibility.
>
> Because that's the real need -- maintaining binary compatibility with
> SCSI block devices, so major/minor, ioctl supported set, partition
> limits, and other relevant details need to remain unchanged.
>
> The underlying software we're of course free to change...

I'm taking this approach.  I think it's better than introducing a new
block device while keeping the old one, as that causes numerous userland
problems, including duplicate devices (if they're gonna exist
side-by-side) and the eventual need for conversion and breakage of old
userland on newer kernels.

Thanks.

-- 
tejun

* Re: [RFC] Separating out libata out of SCSI (finally)
From: Greg Freemyer @ 2008-06-23 21:04 UTC
To: Jeff Garzik
Cc: James Bottomley, Tejun Heo, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

On Fri, Jun 20, 2008 at 6:41 PM, Jeff Garzik <jeff@garzik.org> wrote:
> [...]
> 2) Ensure that any ata_disk ULD would support the same partition limits and
> ioctl set, enough to ensure binary compatibility.
>
> Because that's the real need -- maintaining binary compatibility with SCSI
> block devices, so major/minor, ioctl supported set, partition limits, and
> other relevant details need to remain unchanged.
>

I've seen a lot of end user complaints about libata only supporting
15(14?) partitions.  Will that limit be moved back to the traditional
drivers/ide limit as part of this?

FYI: I don't personally use that many partitions, but there is a vocal
minority of users on the opensuse mailing list that complain pretty
loudly about the current restriction.

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

* Re: [RFC] Separating out libata out of SCSI (finally)
From: James Bottomley @ 2008-06-23 21:11 UTC
To: Greg Freemyer
Cc: Jeff Garzik, Tejun Heo, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
> [...]
> I've seen a lot of end user complaints about libata only supporting
> 15(14?) partitions.  Will that limit be moved back to the traditional
> drivers/ide limit as part of this?

Number of partitions is directly related to number of minors, so it
can't be changed without a change in the allocation of major/minor space
in sd ... that could only be done compatibly by permuting the space.
The only other way to do it is incompatibly by changing major (again).

James

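James's point that the partition count is tied to the minor-number layout can be made concrete. sd gives each disk 16 consecutive minors (the whole disk plus 15 partitions), so 16 disks fill one major's 256 minors, and the first 256 disks are spread over majors 8, 65-71 and 128-135. The legacy drivers/ide layout instead reserved 64 minors per disk, which is where its 63-partition limit comes from. The sketch below is modelled on sd's historical index-to-number mapping for the first 256 disks only; `sd_major_for()` and `sd_devnum()` are hypothetical helpers, not the kernel's functions.

```c
#include <assert.h>

#define SD_MINORS 16 /* minors per sd disk: whole disk + 15 partitions */

/* Major for a group of 16 disks, following the historical sd layout. */
static unsigned int sd_major_for(unsigned int major_idx)
{
	assert(major_idx < 16);
	if (major_idx == 0)
		return 8;                  /* SCSI_DISK0_MAJOR */
	if (major_idx <= 7)
		return 65 + major_idx - 1; /* SCSI_DISK1..7_MAJOR */
	return 128 + major_idx - 8;        /* SCSI_DISK8..15_MAJOR */
}

/*
 * Hypothetical helper: MAJ:MIN of partition 'part' (0 = whole disk) on
 * disk number 'index' (0 = sda). Only the first 256 disks are modelled.
 * Returns -1 when no number exists, e.g. for a 16th partition.
 */
static int sd_devnum(unsigned int index, unsigned int part,
		     unsigned int *maj, unsigned int *min)
{
	if (index >= 256 || part >= SD_MINORS)
		return -1;
	*maj = sd_major_for(index >> 4);    /* 16 disks per major */
	*min = ((index & 0xf) << 4) | part; /* 16 minors per disk */
	return 0;
}
```

With this layout sdb3 is 8:19 and sdq1 is 65:1, and a 16th partition on any disk simply has no minor to live in. Raising the limit therefore means either permuting the existing number space or moving to a new major, as described above.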
* Re: [RFC] Separating out libata out of SCSI (finally)
From: Felix Miata @ 2008-06-23 21:56 UTC
To: James Bottomley, linux-ide
Cc: Greg Freemyer, Jeff Garzik, Alan Cox, Jens Axboe

On 2008/06/23 16:11 (GMT-0500) James Bottomley apparently typed:

> On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:

>> I've seen a lot of end user complaints about libata only supporting
>> 15(14?) partitions.  Will that limit be moved back to the traditional
>> drivers/ide limit as part of this?

> Number of partitions is directly related to number of minors, so it
> can't be changed without a change in the allocation of major/minor space
> in sd ... that could only be done compatibly by permuting the space.
> The only other way to do it is incompatibly by changing major (again).

I don't know what permuting the space means, but I do know libata affects
me in a very big way: I depend on cross-platform tools for partitioning,
backup and restoration, and I multiboot virtually every system I touch.
LVM is simply not an option for these systems, and I have yet to
determine whether and how kpartx and device mapper might be used to
mitigate the problems caused by trying to work within libata's SCSI
limit. That limit is a major reason why I stopped trying to use SCSI over
5 years ago, when disks got too big for so few partitions.

I've stopped trying to use Fedora & *buntu on more than a cursory basis,
and mostly use SUSE and Mandriva because they still provide the option to
use the legacy drivers in their default kernels.

Isn't libata at some point supposed to completely displace the legacy IDE
driver set? If so, couldn't the IDE major eventually be recycled for
libata use for devices >15, or even all of them, perhaps in the interim
via a compile-time choice between legacy IDE and libata on major 3, but
never both simultaneously?
-- 
"Where were you when I laid the earth's foundation?" Matthew 7:12 NIV

 Team OS/2 ** Reg. Linux User #211409

Felix Miata  ***  http://fm.no-ip.com/

* Re: [RFC] Separating out libata out of SCSI (finally)
From: Boaz Harrosh @ 2008-06-24 8:30 UTC
To: James Bottomley
Cc: Greg Freemyer, Jeff Garzik, Tejun Heo, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

James Bottomley wrote:
> On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
>> [...]
>> I've seen a lot of end user complaints about libata only supporting
>> 15(14?) partitions.  Will that limit be moved back to the traditional
>> drivers/ide limit as part of this?
>
> Number of partitions is directly related to number of minors, so it
> can't be changed without a change in the allocation of major/minor space
> in sd ... that could only be done compatibly by permuting the space.
> The only other way to do it is incompatibly by changing major (again).
>
> James
>

Could we do both? I mean use the legacy, up to 15, with the old major,
then use the new major for bigger than 15. Since user mode that knows
about more than 15 partitions is new, it'll know it needs to jump a major.

Boaz

* Re: [RFC] Separating out libata out of SCSI (finally)
From: James Bottomley @ 2008-06-24 14:42 UTC
To: Boaz Harrosh
Cc: Greg Freemyer, Jeff Garzik, Tejun Heo, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

On Tue, 2008-06-24 at 11:30 +0300, Boaz Harrosh wrote:
> James Bottomley wrote:
> > On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
> >> I've seen a lot of end user complaints about libata only supporting
> >> 15(14?) partitions.  Will that limit be moved back to the traditional
> >> drivers/ide limit as part of this?
> >
> > Number of partitions is directly related to number of minors, so it
> > can't be changed without a change in the allocation of major/minor space
> > in sd ... that could only be done compatibly by permuting the space.
> > The only other way to do it is incompatibly by changing major (again).
> >
> Could we do both? I mean use the legacy, up to 15, with the old major,
> then use the new major for bigger than 15. Since user mode that knows
> about more than 15 partitions is new, it'll know it needs to jump a major.

Not simultaneously, which is the problem; you can't have two separate
block devices for the same physical device unless you want aliasing
issues in the page cache.

It might be possible to add an extra device to give access to the
missing partitions, but that would require a bit of re-engineering in
gendisk (which is the in-kernel code to manage the partitions).

What might be far more feasible is to set up udev to use kpartx to
provide the missing partitions if it detects a partition table that has
them ... of course, that requires a udev setup and most of the
complaints about the lost partitions seem to come from non-udev systems.

But .... if everyone (particularly the people with these problems) had
udev, we could simply migrate to a new major with more partitions, get
udev to fix it all up for us and everyone would be happy because no-one
would even notice that we'd moved majors ...

James

* Re: [RFC] Separating out libata out of SCSI (finally)
From: Greg Freemyer @ 2008-06-24 14:58 UTC
To: James Bottomley
Cc: Boaz Harrosh, Jeff Garzik, Tejun Heo, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

On Tue, Jun 24, 2008 at 10:42 AM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote:
> [...]
> What might be far more feasible is to set up udev to use kpartx to
> provide the missing partitions if it detects a partition table that has
> them ... of course, that requires a udev setup and most of the
> complaints about the lost partitions seem to come from non-udev systems.
>
> But .... if everyone (particularly the people with these problems) had
> udev, we could simply migrate to a new major with more partitions, get
> udev to fix it all up for us and everyone would be happy because no-one
> would even notice that we'd moved majors ...
>
> James

From my limited perspective, every complaint I've seen was related to
OpenSUSE, which I believe is a udev-based distro, so addressing this
for the udev-based distros would address that contingent of users.

Greg

* Re: [RFC] Separating out libata out of SCSI (finally)
From: Felix Miata @ 2008-06-24 15:13 UTC
To: linux-ide

On 2008/06/24 10:58 (GMT-0400) Greg Freemyer apparently typed:

> From my limited perspective, every complaint I've seen was related to
> OpenSUSE, which I believe is a udev-based distro, so addressing this
> for the udev-based distros would address that contingent of users.

I think OpenSUSE users are just more vocal, but the contingent of those
with a problem from having no access beyond partition 15 is heavily
weighted toward multibooters, whether they use OpenSUSE, Mandriva, *buntu
or whatever. Fedora seems to have pushed most of its users into LVM, but
I don't think LVM has caught on, or will catch on, with the rest of the
multibooters. None of the distros I've first used in the past 2+ years
(probably longer; I just don't remember the last time I saw one without
udev) has failed to include udev.

Until now, at least, I've been able to avoid any hardware upgrade that
would force me into using only SATA. Only one of my roughly 25 working
systems is SATA-only. I use it virtually exclusively for OS/2 (you
remember, the ancient and "dead" OS), and it currently has 50 partitions
on disk 1 and 19 on disk 2.

* Re: [RFC] Separating out libata out of SCSI (finally)
From: Tejun Heo @ 2008-06-24 14:59 UTC
To: James Bottomley
Cc: Boaz Harrosh, Greg Freemyer, Jeff Garzik, IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

James Bottomley wrote:
> On Tue, 2008-06-24 at 11:30 +0300, Boaz Harrosh wrote:
>> [...]
>> Could we do both? I mean use the legacy, up to 15, with the old major,
>> then use the new major for bigger than 15. Since user mode that knows
>> about more than 15 partitions is new, it'll know it needs to jump a major.
>
> Not simultaneously, which is the problem; you can't have two separate
> block devices for the same physical device unless you want aliasing
> issues in the page cache.
>
> It might be possible to add an extra device to give access to the
> missing partitions, but that would require a bit of re-engineering in
> gendisk (which is the in-kernel code to manage the partitions).
>
> What might be far more feasible is to set up udev to use kpartx to
> provide the missing partitions if it detects a partition table that has
> them ... of course, that requires a udev setup and most of the
> complaints about the lost partitions seem to come from non-udev systems.
>
> But .... if everyone (particularly the people with these problems) had
> udev, we could simply migrate to a new major with more partitions, get
> udev to fix it all up for us and everyone would be happy because no-one
> would even notice that we'd moved majors ...

I'm currently working on a scheme where partitions above gd->minors get
allocated dynamic MAJ:MIN.  It looks like it can be done mostly in block
layer proper.  The only problem I can foresee is not being able to
specify MAJ:MIN as root device but that shouldn't be a major problem.
I'll report back when I make more progress.

Thanks.

-- 
tejun

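The scheme Tejun describes can be pictured roughly as follows: partitions that fit inside the disk's static minor range keep their traditional numbers, and anything beyond that gets a number handed out at runtime from a separate pool. This is only an illustrative sketch of the idea, not Tejun's code: `struct sketch_disk`, `assign_devnum()`, `release_devnum()`, the pool size and the `HYPO_EXT_MAJOR` value are all hypothetical placeholders.

```c
#include <assert.h>
#include <stdbool.h>

#define HYPO_EXT_MAJOR  259  /* hypothetical major for overflow partitions */
#define HYPO_EXT_MINORS 1024 /* hypothetical size of the dynamic pool */

struct devnum {
	unsigned int maj, min;
};

/* Hypothetical per-disk descriptor, loosely modelled on gendisk fields. */
struct sketch_disk {
	unsigned int major;
	unsigned int first_minor;
	unsigned int minors; /* static range: whole disk + (minors - 1) partitions */
};

static bool ext_used[HYPO_EXT_MINORS]; /* which pool minors are taken */

/* Number partition 'partno': statically if it fits, else from the pool. */
static int assign_devnum(const struct sketch_disk *d, unsigned int partno,
			 struct devnum *out)
{
	if (partno < d->minors) {
		out->maj = d->major;
		out->min = d->first_minor + partno;
		return 0;
	}
	for (unsigned int i = 0; i < HYPO_EXT_MINORS; i++) {
		if (!ext_used[i]) {
			ext_used[i] = true;
			out->maj = HYPO_EXT_MAJOR;
			out->min = i;
			return 0;
		}
	}
	return -1; /* pool exhausted */
}

/* Return a dynamically allocated number to the pool; static ones are kept. */
static void release_devnum(struct devnum dn)
{
	if (dn.maj == HYPO_EXT_MAJOR) {
		assert(dn.min < HYPO_EXT_MINORS && ext_used[dn.min]);
		ext_used[dn.min] = false;
	}
}
```

Because the overflow numbers depend on the order in which partitions are found, partition 16 of a disk may end up with a different MAJ:MIN from one boot to the next. That is why, as Tejun notes, such a partition could not be reliably named as the root device by MAJ:MIN.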
* Re: [RFC] Separating out libata out of SCSI (finally)
From: Felix Miata @ 2008-06-24 15:42 UTC
To: linux-ide

On 2008/06/24 23:59 (GMT+0900) Tejun Heo apparently typed:

> I'm currently working on a scheme where partitions above gd->minors get
> allocated dynamic MAJ:MIN.  It looks like it can be done mostly in block
> layer proper.  The only problem I can foresee is not being able to
> specify MAJ:MIN as root device but that shouldn't be a major problem.
> I'll report back when I make more progress.

Please correct me if I'm wrong, but with the following layout, wouldn't
booting become a major problem within only a few more upgrade cycles, as
I replace what's installed now with current and near-future distros,
unless I delete or move existing partitions?

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x31caf8da

Device     Boot  Start    End     Blocks   Id System
/dev/hda1            1     13     104391   17 Hidden HPFS/NTFS     # OS/2 maintenance OS
/dev/hda2  *        14     14       8032+  a  OS/2 Boot Manager    # Primary Boot Loader
/dev/hda3           15     46     257040   6  FAT16                # DOS
/dev/hda4           48  14594  116848777+  5  Extended
/dev/hda5           48     48       8001   1  FAT12                # DOS
/dev/hda6           49     80     257008+  6  FAT16                # DOS
/dev/hda7           82    107     208813+  83 Linux                # primary Grub
/dev/hda8          108    222     923706   82 Linux swap / Solaris
/dev/hda9          223    834    4915858+  83 Linux                # Fedora, current
/dev/hda10        1142   1243     819283+  7  HPFS/NTFS            # OS/2 OS
/dev/hda11        1269   1523    2048256   7  HPFS/NTFS            # OS/2 data
/dev/hda12        1575   1693     955836   7  HPFS/NTFS            # OS/2 apps
/dev/hda13        1694   2012    2562336   7  HPFS/NTFS            # OS/2 archival
/dev/hda14        2014   2217    1638598+  83 Linux                # /home
/dev/hda15        2218   2568    2819376   83 Linux                # /Knoppix
/dev/hda16        2569   3486    7373803+  83 Linux                # SUSE 10.2 /
/dev/hda17        3487   4379    7172991   83 Linux                # /pub
/dev/hda18        4380   4409     240943+  83 Linux                # /srv
/dev/hda19        4410   4549    1124518+  83 Linux                # /usr/local
/dev/hda20        4550   5187    5124703+  83 Linux                # /usr/src
/dev/hda21        5188   5799    4915858+  83 Linux                # Mandriva 2007 /
/dev/hda22        5800   6513    5735173+  83 Linux                # Factory /
/dev/hda23        6514   7176    5325516   83 Linux                # Cooker /
/dev/hda24        7178   7279     819283+  7  HPFS/NTFS            # eComStation beta OS
/dev/hda25        7281   7892    4915858+  83 Linux                # Fedora 6 /
/dev/hda26        7893   8504    4915858+  83 Linux                # Fedora 5 /
/dev/hda27        8505   9116    4915858+  83 Linux                # SUSE 10.0 /
/dev/hda28        9117   9728    4915858+  83 Linux                # Mandriva 2006 /
/dev/hda29        9729  10340    4915858+  83 Linux                # Mandriva 2008 /
/dev/hda30       10341  10850    4096543+  83 Linux                # SUSE 11.0 /
/dev/hda31       12656  12757     819283+  7  HPFS/NTFS            # OS/2 skeleton
/dev/hda32       12758  14593   14747638+  83 Linux                # /isos
/dev/hda33       14594  14594       8001   12 Compaq diagnostics   # demarcation line
                 14595  19456                                      # freespace

* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 15:42 ` Felix Miata
@ 2008-06-24 15:49 ` Tejun Heo
  2008-06-24 16:27 ` Felix Miata
  0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2008-06-24 15:49 UTC
  To: Felix Miata; +Cc: linux-ide

Felix Miata wrote:
> On 2008/06/24 23:59 (GMT+0900) Tejun Heo apparently typed:
>
>> I'm currently working on a scheme where partitions above gd->minors get
>> allocated dynamic MAJ:MIN. It looks like it can be done mostly in block
>> layer proper. The only problem I can foresee is not being able to
>> specify MAJ:MIN as root device but that shouldn't be a major problem.
>> I'll report back when I make more progress.
>
> Please correct me if I'm wrong, but on the following, would it not
> be a major problem booting, without deleting any existing or moving
> any existing, with only a few more upgrade cycles to replace what's
> there installed now with current and near future distros?

I'm not sure what you mean. If you switch to libata now, you'll only be
able to see 15 partitions, so, yes, you'll have problems updating to a
distro which uses libata now and in the near future. If the dynamic
partition thing goes well, hopefully the next major distro releases work
fine.

-- 
tejun
* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 15:49 ` Tejun Heo
@ 2008-06-24 16:27 ` Felix Miata
  2008-06-24 16:35 ` Tejun Heo
  0 siblings, 1 reply; 18+ messages in thread
From: Felix Miata @ 2008-06-24 16:27 UTC
  To: linux-ide

On 2008/06/25 00:49 (GMT+0900) Tejun Heo apparently typed:

> Felix Miata wrote:
>> On 2008/06/24 23:59 (GMT+0900) Tejun Heo apparently typed:
>>> I'm currently working on a scheme where partitions above gd->minors get
>>> allocated dynamic MAJ:MIN. It looks like it can be done mostly in block
>>> layer proper. The only problem I can foresee is not being able to
>>> specify MAJ:MIN as root device but that shouldn't be a major problem.
>>> I'll report back when I make more progress.
>> Please correct me if I'm wrong, but on the following, would it not
>> be a major problem booting, without deleting any existing or moving
>> any existing, with only a few more upgrade cycles to replace what's
>> there installed now with current and near future distros?
> I'm not sure what you mean. If you switch to libata now, you'll only be
> able to see 15 partitions, so, yes, you'll have problems updating to a

Well, on the Fedora 9 on hda9 that is already the case, as it provided no
option to not use libata. ;-)

> distro which uses libata now and in the near future. If the dynamic
> partition thing goes well, hopefully the next major distro releases work
> fine.

I responded as I did due to your statement "only problem I can foresee is not
being able to specify MAJ:MIN as root device", hoping to see some elaboration
on the statement from you. I read your statement as "don't expect to be able
to use any partition >15 as a root device". So, in the past root=3:17 would
have been a valid replacement for root=/dev/hda17? If so, sorry for the
noise, as I'd never seen that type of usage, and certainly would not miss it.

-- 
"Where were you when I laid the earth's foundation?" Matthew 7:12 NIV

Team OS/2 ** Reg. Linux User #211409

Felix Miata *** http://fm.no-ip.com/
* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 16:27 ` Felix Miata
@ 2008-06-24 16:35 ` Tejun Heo
  0 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2008-06-24 16:35 UTC
  To: Felix Miata; +Cc: linux-ide

Felix Miata wrote:
>> distro which uses libata now and in the near future. If the dynamic
>> partition thing goes well, hopefully the next major distro releases work
>> fine.
>
> I responded as I did due to your statement "only problem I can foresee is not
> being able to specify MAJ:MIN as root device", hoping to see some elaboration
> on the statement from you. I read your statement as "don't expect to be able
> to use any partition >15 as a root device". So, in the past root=3:17 would
> have been a valid replacement for root=/dev/hda17? If so, sorry for the
> noise, as I'd never seen that type of usage, and certainly would not miss it.

Oh.. the name, say, root=/dev/sda999 should work (well, that's the plan)
but as there will be no fixed mapping between /dev/sdaN and MAJ:MIN when
N > 15, root=MAJ:MIN just isn't possible for dynamically allocated ones.

I hope it clarified things a bit.

Thanks.

-- 
tejun
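The numbers behind the 15-partition limit and the root=3:17 question follow from the static layout in the kernel's Documentation/devices.txt: the first IDE interface (major 3) gives each unit 64 minors (hda at 0, hdb at 64), while SCSI disks (major 8) get 16 minors each (sda at 0, sdb at 16, and so on), with the first minor of each range being the whole disk. Below is a minimal sketch of that arithmetic; the helper names are made up for this example.

```c
#include <assert.h>

#define IDE0_MAJOR          3u
#define IDE_MINORS_PER_UNIT 64u	/* whole disk + 63 partitions */
#define SCSI_DISK0_MAJOR    8u
#define SD_MINORS           16u	/* whole disk + 15 partitions */

/* Partitions addressable when each disk owns a fixed block of minors. */
static unsigned int max_static_partitions(unsigned int minors_per_disk)
{
	assert(minors_per_disk > 0);
	return minors_per_disk - 1;	/* offset 0 is the whole disk */
}

/*
 * Static minor for partition partno on the unit'th disk of a major
 * (hda = 0, hdb = 1; sda = 0 ... sdp = 15), or -1 if the partition
 * falls outside the disk's fixed range and so has no static number.
 */
static int static_minor(unsigned int unit, unsigned int partno,
			unsigned int minors_per_disk)
{
	assert(minors_per_disk > 0);
	if (partno >= minors_per_disk)
		return -1;
	return (int)(unit * minors_per_disk + partno);
}
```

Under the old IDE driver, /dev/hda17 is therefore the pair 3:17, whereas the same partition seen through libata as /dev/sda17 has no slot in major 8's 16-minor range at all.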
* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 14:59 ` Tejun Heo
  2008-06-24 15:42 ` Felix Miata
@ 2008-06-24 16:54 ` Alan Cox
  1 sibling, 0 replies; 18+ messages in thread
From: Alan Cox @ 2008-06-24 16:54 UTC
  To: Tejun Heo
  Cc: James Bottomley, Boaz Harrosh, Greg Freemyer, Jeff Garzik,
	IDE/ATA development list, linux-scsi, brking, Mark Lord, Jens Axboe

> I'm currently working on a scheme where partitions above gd->minors get
> allocated dynamic MAJ:MIN. It looks like it can be done mostly in block
> layer proper. The only problem I can foresee is not being able to
> specify MAJ:MIN as root device but that shouldn't be a major problem.
> I'll report back when I make more progress.

Run it past Al Viro; he vetoed the previous proposal to have discontiguous
minor number ranges a couple of years ago, which is why we are stuck with
this limit.

Alan
* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-20 20:28 ` James Bottomley
  2008-06-20 22:41 ` Jeff Garzik
@ 2008-06-20 23:47 ` Tejun Heo
  1 sibling, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2008-06-20 23:47 UTC
  To: James Bottomley
  Cc: Jeff Garzik, IDE/ATA development list, linux-scsi, brking, Mark Lord,
	Alan Cox, Jens Axboe

Hello, James.

James Bottomley wrote:
> On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
>> Hello, all.
>>
>> This item was on TODO list for years now. People all agree that it's
>> necessary but it always had relatively low priority probably because
>> it's a bit difficult and isn't really necessary to make disks and
>> optical drives work.
>>
>> Anyways, I think it's about time to take some action as SAS-ATA
>> integration (Brian, sorry about staying so silent about this for long
>> time, I was following the threads but couldn't really think of a quick
>> solution) and other ATA specific things including link power
>> management and bunch of other deferred ones due to lack of proper
>> sysfs interface or high level driver (parallel probing, parallel
>> resume).
>>
>> Currently, my plan is...
>>
>> * Move high level driver handling to request_queue.
>
> Actually, I already did quite a lot of that here:
>
> commit 7f9a6bc4e9d59e7fcf03ed23f60cd81ca5d80b65
> Author: James Bottomley <James.Bottomley@steeleye.com>
> Date:   Sat Aug 4 10:06:25 2007 -0500
>
>     [SCSI] move ULD attachment into the prep function
>
> But there's still more to be done. The way I was thinking of it was
> some type of protocol label (as in a ULD spits out protocols, like SCSI
> or ATA) and then passes them to a LLD which uses libraries (what libata
> and the scsi mid layer become) to process them.

That was what I was thinking too, in a way similar to how PC commands
are carried. There are things to think about though, like splitting a
single request into multiple translated commands.
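James's protocol label and the request-splitting caveat in the reply above can be made concrete with a small sketch. Everything in it is hypothetical: enum cmd_protocol, struct xlat_cmd and split_request() are invented names, not kernel interfaces. The one hard fact it relies on is that 28-bit ATA READ/WRITE DMA commands carry an 8-bit sector count, so a single command moves at most 256 sectors (a count of 0 means 256); a larger block request therefore has to become several translated commands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol label a ULD would attach to each command. */
enum cmd_protocol { PROTO_SCSI, PROTO_ATA };

/* Hypothetical translated command handed to an LLD-side library. */
struct xlat_cmd {
	enum cmd_protocol proto;
	uint64_t lba;
	uint32_t nsect;
};

/* 28-bit ATA READ/WRITE DMA: 8-bit sector count, 0 meaning 256. */
#define ATA_LBA28_MAX_SECT 256u

/*
 * Split one block request (lba, nsect) into ATA-labelled commands of at
 * most max_sect sectors each. Returns the number of commands written
 * to out[], or -1 if out[] is too small to hold them all.
 */
static int split_request(uint64_t lba, uint32_t nsect, uint32_t max_sect,
			 struct xlat_cmd *out, size_t out_len)
{
	size_t n = 0;

	assert(max_sect > 0);	/* otherwise the loop would never end */

	while (nsect > 0) {
		uint32_t chunk = nsect < max_sect ? nsect : max_sect;

		if (n == out_len)
			return -1;
		out[n].proto = PROTO_ATA;
		out[n].lba = lba;
		out[n].nsect = chunk;
		n++;
		lba += chunk;
		nsect -= chunk;
	}
	return (int)n;
}
```

With this sketch, a 600-sector request starting at LBA 1000 comes out as three ATA-labelled commands of 256, 256 and 88 sectors. Error handling across the pieces is deliberately left out.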
>> * Implement queue quiescing and other state management on request_queue.
>> * Implement block_queue_group which...
>>   - Handles command scheduling.
>>   - Handles grouped queue quiescing and EH handling
>
> There's the beginnings of this in Jens' unmerged block timers work

Great, thanks for the pointer.

>> In the process, I'm planning to remove ata_host requirement and break
>> down libata EH into actions and sequencers so that SAS can use them
>> easily.
>>
>> The biggest problem is how to keep userland happy. hdX -> sdX
>> transition was painful enough and I have a strong feeling that
>> everyone will come after and hunt down us if we try something like sdX
>> -> bdX now. :-)
>
> In theory mounting by label or ID should have fixed a lot of this.

Now that all the distros and users went through it once, maybe it's
easier the second time around, but I think it's best to minimize the
chance of breakage. One transition was painful enough.

> However, if we need to head off a revolt, the sdX allocation algorithm
> can be placed into its own module so both sd and a ULD ata driver could
> use it ...

Yeap, that was what I was thinking. Separating out sdX allocation
algorithm and making it the disk device node allocation logic such that
/dev/sdX are the universal disk nodes, which is 90% true these days
anyway.

> Could you, perhaps, make the port multiplier visible in this as a new
> device, a bit like we do today for SAS expanders?

I was thinking about doing...

 ata_link/P:0/P:0   : 1st fan-out
         /P:1/P:1   : 2nd fan-out
         /P:2/P:2   : 3rd fan-out
         ...
         /P:15/P:15 : port multiplier

which is pretty much the internal representation. Do you think there's
need for a separate PMP level in between?

>> The SCSI side of interface will remain as functional as now as it will
>> go through the same libata SAT layer.
>
> Actually, surely we can mostly dump the SAT layer? libsas should be
> made capable of taking ATA protocol packets straight from your ULD ATA
> driver and sending them out.
Maybe in a long long time, but the SAT layer will need to stay there for
compatibility for now, i.e., programs which use lsscsi to locate ATA
devices and the matching /dev/sgX to issue SAT commands should keep
working.

> I could see us still needing it as an optional component so we can send
> SCSI SG_IO to ATA devices.

And for compatibility. We can definitely make it optional.

>> So, what do you guys think?
>
> I think the devil will be in the details, but that it certainly won't be
> obvious until the conversion is actually tried.

Alright, thanks for your comments.

-- 
tejun
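The sdX allocation logic that James and Tejun discuss factoring out is, at its core, a mapping from a disk index to a name. Below is a minimal sketch of that mapping as a standalone helper that takes the prefix as a parameter. It assumes the familiar sd sequence sda ... sdz, sdaa ... sdzz, sdaaa, which is bijective base 26 (there is no "zero" letter). The function name and signature are chosen for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Write prefix + letters for a zero-based disk index into buf, e.g.
 * 0 -> "sda", 25 -> "sdz", 26 -> "sdaa". Letters are produced least
 * significant first at the tail of buf, then moved up behind the prefix.
 * Returns 0 on success or -EINVAL if buf is too small.
 */
static int format_disk_name(const char *prefix, int index, char *buf, int buflen)
{
	const int base = 'z' - 'a' + 1;
	int plen = (int)strlen(prefix);
	char *begin = buf + plen;
	char *end = buf + buflen;
	char *p;

	assert(index >= 0);
	if (buflen < plen + 2)		/* prefix + one letter + NUL */
		return -EINVAL;

	p = end - 1;
	*p = '\0';
	do {
		if (p == begin)
			return -EINVAL;
		*--p = (char)('a' + index % base);
		index = index / base - 1;	/* bijective: no zero digit */
	} while (index >= 0);

	memmove(begin, p, (size_t)(end - p));
	memcpy(buf, prefix, (size_t)plen);
	return 0;
}
```

Index 0 gives "sda", 25 gives "sdz", 26 gives "sdaa", 701 gives "sdzz" and 702 gives "sdaaa". A separate ATA disk driver calling the same helper, fed from a shared index allocator, would keep producing /dev/sdX names.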
Thread overview: 18+ messages
[not found] <485B2CC6.6070201@kernel.org>
2008-06-20 19:41 ` [RFC] Separating out libata out of SCSI (finally) Brian King
2008-06-20 20:28 ` James Bottomley
2008-06-20 22:41 ` Jeff Garzik
2008-06-20 23:50 ` Tejun Heo
2008-06-23 21:04 ` Greg Freemyer
2008-06-23 21:11 ` James Bottomley
2008-06-23 21:56 ` Felix Miata
2008-06-24 8:30 ` Boaz Harrosh
2008-06-24 14:42 ` James Bottomley
2008-06-24 14:58 ` Greg Freemyer
2008-06-24 15:13 ` Felix Miata
2008-06-24 14:59 ` Tejun Heo
2008-06-24 15:42 ` Felix Miata
2008-06-24 15:49 ` Tejun Heo
2008-06-24 16:27 ` Felix Miata
2008-06-24 16:35 ` Tejun Heo
2008-06-24 16:54 ` Alan Cox
2008-06-20 23:47 ` Tejun Heo