linux-ide.vger.kernel.org archive mirror
* Re: [RFC] Separating out libata out of SCSI (finally)
       [not found] <485B2CC6.6070201@kernel.org>
@ 2008-06-20 19:41 ` Brian King
  2008-06-20 20:28 ` James Bottomley
  1 sibling, 0 replies; 18+ messages in thread
From: Brian King @ 2008-06-20 19:41 UTC
  To: Tejun Heo
  Cc: Jeff Garzik, IDE/ATA development list, James Bottomley,
	linux-scsi, Mark Lord, Alan Cox, Jens Axboe

Tejun Heo wrote:
> PS. Brian, can you please point me to the latest version of libata EH
> integration patch?

I uploaded the current patch series here:

http://internap.dl.sourceforge.net/sourceforge/iprdd/libata_sas_new_eh.tgz


-Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center




* Re: [RFC] Separating out libata out of SCSI (finally)
       [not found] <485B2CC6.6070201@kernel.org>
  2008-06-20 19:41 ` [RFC] Separating out libata out of SCSI (finally) Brian King
@ 2008-06-20 20:28 ` James Bottomley
  2008-06-20 22:41   ` Jeff Garzik
  2008-06-20 23:47   ` Tejun Heo
  1 sibling, 2 replies; 18+ messages in thread
From: James Bottomley @ 2008-06-20 20:28 UTC
  To: Tejun Heo
  Cc: Jeff Garzik, IDE/ATA development list, linux-scsi, brking,
	Mark Lord, Alan Cox, Jens Axboe

On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
> Hello, all.
> 
> This item was on TODO list for years now.  People all agree that it's
> necessary but it always had relatively low priority probably because
> it's a bit difficult and isn't really necessary to make disks and
> optical drives work.
> 
> Anyways, I think it's about time to take some action as SAS-ATA
> integration (Brian, sorry about staying so silent about this for long
> time, I was following the threads but couldn't really think of a quick
> solution) and other ATA specific things including link power
> management and bunch of other deferred ones due to lack of proper
> sysfs interface or high level driver (parallel probing, parallel
> resume).
> 
> Currently, my plan is...
> 
> * Move high level driver handling to request_queue.

Actually, I already did quite a lot of that here:

commit 7f9a6bc4e9d59e7fcf03ed23f60cd81ca5d80b65
Author: James Bottomley <James.Bottomley@steeleye.com>
Date:   Sat Aug 4 10:06:25 2007 -0500

    [SCSI] move ULD attachment into the prep function

But there's still more to be done.  The way I was thinking of it was
some type of protocol label (as in a ULD spits out protocols, like SCSI
or ATA) and then passes them to a LLD which uses libraries (what libata
and the scsi mid layer become) to process them.

> * Implement queue quiescing and other state management on request_queue.
> * Implement block_queue_group which...
> 	- Handles command scheduling.
> 	- Handles grouped queue quiescing and EH handling

There's the beginnings of this in Jens' unmerged block timers work

> * Move SCSI high level drivers to new infrastructure
> * Convert libata to use new command scheduling and EH infrastructure
> * Apply driver model to libata.
> * Implement ata_disk ATA high level driver.
> 
> In the process, I'm planning to remove ata_host requirement and break
> down libata EH into actions and sequencers so that SAS can use them
> easily.
> 
> The biggest problem is how to keep userland happy.  hdX -> sdX
> transition was painful enough and I have a strong feeling that
> everyone will come after and hunt down us if we try something like sdX
> -> bdX now.  :-)

In theory mounting by label or ID should have fixed a lot of this.
However, if we need to head off a revolt, the sdX allocation algorithm
can be placed into its own module so both sd and a ULD ata driver could
use it ...

> It would be ideal if those ATA specific sysfs stuff just shows up
> without disturbing the original SCSI things which are now widely used
> for enumeration and manipulation.  I think we can get pretty close by
> modifying SCSI high level drivers a bit such that they don't register
> block devices for SCSI devices created to keep backward compatibility.
> This is an extra burden on SCSI but it's gonna be the last one.
> 
> *** Currently, sysfs nodes for a libata disk is like the following.
> 
>  /devices/DEV/hostH/targetH:C:I/H:C:I:L/
> 	 driver			-> /bus/scsi/drivers/sd
> 	 generic			-> /class/scsi_generic/sgN
> 	 block/sdX/
> 		 device		-> ../../
> 		 partitions...
>  /bus/scsi/devices/H:C:I:L	-> /devices/DEV/hostH/targetH:C:I/H:C:I:L
>  /bus/scsi/drivers/sd/H:C:I:L	-> /devices/DEV/hostH/targetH:C:I/H:C:I:L
>  /class/scsi_host/hostH/device	-> /devices/DEV/hostH
>  /class/scsi_device/H:C:I:L/device -> /devices/DEV/hostH/targetH:C:I/H:C:I:L
>  /class/scsi_disk/H:C:I:L/device -> /devices/DEV/hostH/targetH:C:I/H:C:I:L
>  /class/scsi_generic/sgN/device	-> /devices/DEV/hostH/targetH:C:I/H:C:I:L
>  /block/sda			-> /devices/DEV/hostH/targetH:C:I/H:C:I:L/block/sdX
> 
> *** After conversion, it will look something like the following.
> 
>  /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
> 	 driver			-> /bus/ata/drivers/ata_disk
> 	 block/sdX/
> 		 device		-> ../../
> 		 partitions...
>  C	scsi_device		-> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
>  /bus/ata/devices/P:D		-> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
>  /bus/ata/drivers/ata_disk/P:D	-> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
>  /block/sda			-> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D/block/sdX

Could you, perhaps, make the port multiplier visible in this as a new
device, a bit like we do today for SAS expanders?

>  /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L/
> 	 driver			-> /bus/scsi/drivers/sd
> 	 generic			-> /class/scsi_generic/sgN
>  C	block			-> ata_port/P/ata_link/P:L/ata_device/P:D/block
>  C	ata_device		-> /devices/DEV/ata_port/P/ata_link/P:L/ata_device/P:D
> 
>  /bus/scsi/devices/H:C:I:L	-> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
>  /bus/scsi/drivers/sd/H:C:I:L	-> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
>  /class/scsi_host/hostH/device	-> /devices/DEV/scsi_compat/hostH
>  /class/scsi_device/H:C:I:L/device -> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
>  /class/scsi_disk/H:C:I:L/device	-> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
>  /class/scsi_generic/sgN/device	-> /devices/DEV/scsi_compat/hostH/targetH:C:I/H:C:I:L
> 
> The SCSI side of interface will remain as functional as now as it will
> go through the same libata SAT layer.

Actually, surely we can mostly dump the SAT layer?  libsas should be
made capable of taking ATA protocol packets straight from your ULD ATA
driver and sending them out.

I could see us still needing it as an optional component so we can send
SCSI SG_IO to ATA devices.

>   The only surprise userspace
> will see aside from the extra ata nodes is that /sys/block/sdX/device
> will lead to an ATA device node instead of SCSI device node.  Well,
> that's the whole point of the conversion but I think the surprise can
> be minimized by reusing names used in SCSI device node and possibly
> making symlinks for nodes which only makes sense for SCSI device.
> 
> So, what do you guys think?

I think the devil will be in the details, but that it certainly won't be
obvious until the conversion is actually tried.

James




* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-20 20:28 ` James Bottomley
@ 2008-06-20 22:41   ` Jeff Garzik
  2008-06-20 23:50     ` Tejun Heo
  2008-06-23 21:04     ` Greg Freemyer
  2008-06-20 23:47   ` Tejun Heo
  1 sibling, 2 replies; 18+ messages in thread
From: Jeff Garzik @ 2008-06-20 22:41 UTC
  To: James Bottomley, Tejun Heo
  Cc: IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox,
	Jens Axboe

James Bottomley wrote:
> On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
>> The biggest problem is how to keep userland happy.  hdX -> sdX
>> transition was painful enough and I have a strong feeling that
>> everyone will come after and hunt down us if we try something like sdX
>> -> bdX now.  :-)

> In theory mounting by label or ID should have fixed a lot of this.
> However, if we need to head off a revolt, the sdX allocation algorithm
> can be placed into its own module so both sd and a ULD ata driver could
> use it ...

> Actually, surely we can mostly dump the SAT layer?


I don't see that we can do that for a long time...  And it's not just 
the sdX allocation algorithm in question -- SCSI block devices come with 
their own partition limits and set of supported ioctls.

Therefore, my recommended path has always been

* create ata_disk block device driver (ULD, in your terminology)

* make SAT an optional piece, which maintains compatibility with 
existing SCSI blkdevs, ioctls, command sets


I just don't see a valid path moving forward that breaks userland 
/again/...  we (ATA hackers) would be drummed out of a job I think :)

Another option that's been discussed is

1) Make SCSI block devices themselves an allocate-able resource (I think 
that's what you meant by "placed into its own module so both sd and a
ULD ata driver could use it"?)

2) Ensure that any ata_disk ULD would support the same partition limits 
and ioctl set, enough to ensure binary compatibility.

Because that's the real need -- maintaining binary compatibility with 
SCSI block devices, so major/minor, ioctl supported set, partition 
limits, and other relevant details need to remain unchanged.

The underlying software we're of course free to change...

	Jeff




* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-20 20:28 ` James Bottomley
  2008-06-20 22:41   ` Jeff Garzik
@ 2008-06-20 23:47   ` Tejun Heo
  1 sibling, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2008-06-20 23:47 UTC
  To: James Bottomley
  Cc: Jeff Garzik, IDE/ATA development list, linux-scsi, brking,
	Mark Lord, Alan Cox, Jens Axboe

Hello, James.

James Bottomley wrote:
> On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
>> Hello, all.
>>
>> This item was on TODO list for years now.  People all agree that it's
>> necessary but it always had relatively low priority probably because
>> it's a bit difficult and isn't really necessary to make disks and
>> optical drives work.
>>
>> Anyways, I think it's about time to take some action as SAS-ATA
>> integration (Brian, sorry about staying so silent about this for long
>> time, I was following the threads but couldn't really think of a quick
>> solution) and other ATA specific things including link power
>> management and bunch of other deferred ones due to lack of proper
>> sysfs interface or high level driver (parallel probing, parallel
>> resume).
>>
>> Currently, my plan is...
>>
>> * Move high level driver handling to request_queue.
> 
> Actually, I already did quite a lot of that here:
> 
> commit 7f9a6bc4e9d59e7fcf03ed23f60cd81ca5d80b65
> Author: James Bottomley <James.Bottomley@steeleye.com>
> Date:   Sat Aug 4 10:06:25 2007 -0500
> 
>     [SCSI] move ULD attachment into the prep function
> 
> But there's still more to be done.  The way I was thinking of it was
> some type of protocol label (as in a ULD spits out protocols, like SCSI
> or ATA) and then passes them to a LLD which uses libraries (what libata
> and the scsi mid layer become) to process them.

That was what I was thinking too in a similar way PC commands are
carried.  There are things to think about tho like splitting single
request to multiple translated commands.

>> * Implement queue quiescing and other state management on request_queue.
>> * Implement block_queue_group which...
>> 	- Handles command scheduling.
>> 	- Handles grouped queue quiescing and EH handling
> 
> There's the beginnings of this in Jens' unmerged block timers work

Great, thanks for the pointer.

>> In the process, I'm planning to remove ata_host requirement and break
>> down libata EH into actions and sequencers so that SAS can use them
>> easily.
>>
>> The biggest problem is how to keep userland happy.  hdX -> sdX
>> transition was painful enough and I have a strong feeling that
>> everyone will come after and hunt down us if we try something like sdX
>> -> bdX now.  :-)
> 
> In theory mounting by label or ID should have fixed a lot of this.

Now that all the distros and users went through it once, maybe it's
easier second time around but I think it's best to minimize the chance
of breakage.  One transition was painful enough.

> However, if we need to head off a revolt, the sdX allocation algorithm
> can be placed into its own module so both sd and a ULD ata driver could
> use it ...

Yeap, that was what I was thinking.  Separating out sdX allocation
algorithm and making it the disk device node allocation logic such that
/dev/sdX are the universal disk nodes, which is 90% true these days anyway.
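
As a rough illustration of what a shared "sdX allocation" helper would
have to reproduce, here is a minimal sketch of the letter-suffix naming
scheme (sda ... sdz, then sdaa, sdab, ...).  The function name
sd_disk_name() is hypothetical and this is not the kernel's code; it
only shows the bijective base-26 mapping from a zero-based disk index
to a name.

```python
import string


def sd_disk_name(index: int, prefix: str = "sd") -> str:
    """Map a zero-based disk index to a name: 0 -> sda, 25 -> sdz, 26 -> sdaa.

    Hypothetical helper for illustration; not taken from the sd driver.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    n = index
    while True:
        n, rem = divmod(n, 26)
        letters.append(string.ascii_lowercase[rem])
        if n == 0:
            break
        n -= 1  # bijective base-26: there is no "zero" digit
    return prefix + "".join(reversed(letters))


if __name__ == "__main__":
    for i in (0, 25, 26, 27, 701, 702):
        print(i, sd_disk_name(i))
```

Any driver that took names from such a helper would see the same
/dev/sdX sequence, whichever high level driver ends up owning the disk.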

> Could you, perhaps, make the port multiplier visible in this as a new
> device, a bit like we do today for SAS expanders?

I was thinking about doing...

ata_link/P:0/P:0	: 1st fan-out
	/P:1/P:1	: 2nd fan-out
	/P:2/P:2	: 3rd fan-out
	...
	/P:15/P:15	: port multiplier

which is pretty much the internal representation.  Do you think there's
need for a separate PMP level in between?

>> The SCSI side of interface will remain as functional as now as it will
>> go through the same libata SAT layer.
> 
> Actually, surely we can mostly dump the SAT layer?  libsas should be
> made capable of taking ATA protocol packets straight from your ULD ATA
> driver and sending them out.

Maybe in a long long time but the SAT layer will need to stay there for
compatibility for now.  ie. programs which use lsscsi to locate ATA
devices and using matching /dev/sgX to issue SAT commands should keep
working.
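
To make the compatibility concern concrete, the following is a minimal
sketch of how a userland tool might decide which bus a disk sits on by
following /sys/block/<name>/device.  It assumes the device directory
carries a "subsystem" symlink pointing into /sys/bus/; the function
name block_device_bus() and the sysfs_root parameter are hypothetical
and exist only for this illustration.

```python
import os


def block_device_bus(name, sysfs_root="/sys"):
    """Return the bus name behind <sysfs_root>/block/<name>/device, or None.

    Assumes the device directory has a 'subsystem' symlink that points
    at <sysfs_root>/bus/<busname>; returns None if it is missing.
    """
    subsystem = os.path.join(sysfs_root, "block", name, "device", "subsystem")
    if not os.path.exists(subsystem):
        return None
    # The last path component of the resolved link is the bus name.
    return os.path.basename(os.path.realpath(subsystem))


if __name__ == "__main__":
    print(block_device_bus("sda"))
```

A tool written this way would start seeing an ATA bus name instead of
"scsi" once the block device hangs off an ATA device node, which is the
kind of visible change the compat nodes are meant to soften.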

> I could see us still needing it as an optional component so we can send
> SCSI SG_IO to ATA devices.

And for compatibility.  We can definitely make it optional.

>> So, what do you guys think?
> 
> I think the devil will be in the details, but that it certainly won't be
> obvious until the conversion is actually tried.

Alright, thanks for your comments.

-- 
tejun


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-20 22:41   ` Jeff Garzik
@ 2008-06-20 23:50     ` Tejun Heo
  2008-06-23 21:04     ` Greg Freemyer
  1 sibling, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2008-06-20 23:50 UTC
  To: Jeff Garzik
  Cc: James Bottomley, IDE/ATA development list, linux-scsi, brking,
	Mark Lord, Alan Cox, Jens Axboe

Jeff Garzik wrote:
> 1) Make SCSI block devices themselves an allocate-able resource (I think
> that's what you meant by "placed into its own module so both sd and a
> ULD ata driver could use it"?)
> 
> 2) Ensure that any ata_disk ULD would support the same partition limits
> and ioctl set, enough to ensure binary compatibility.
> 
> Because that's the real need -- maintaining binary compatibility with
> SCSI block devices, so major/minor, ioctl supported set, partition
> limits, and other relevant details need to remain unchanged.
> 
> The underlying software we're of course free to change...

I'm taking this approach.  I think it's better than introducing a new
block device while keeping the old one as that causes numerous userland
problems including dup devices (if they're gonna exist side-by-side) and
eventual need for conversion && breakage of old userland on newer kernels.

Thanks.

-- 
tejun


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-20 22:41   ` Jeff Garzik
  2008-06-20 23:50     ` Tejun Heo
@ 2008-06-23 21:04     ` Greg Freemyer
  2008-06-23 21:11       ` James Bottomley
  1 sibling, 1 reply; 18+ messages in thread
From: Greg Freemyer @ 2008-06-23 21:04 UTC
  To: Jeff Garzik
  Cc: James Bottomley, Tejun Heo, IDE/ATA development list, linux-scsi,
	brking, Mark Lord, Alan Cox, Jens Axboe

On Fri, Jun 20, 2008 at 6:41 PM, Jeff Garzik <jeff@garzik.org> wrote:
> James Bottomley wrote:
>>
>> On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
>>>
>>> The biggest problem is how to keep userland happy.  hdX -> sdX
>>> transition was painful enough and I have a strong feeling that
>>> everyone will come after and hunt down us if we try something like sdX
>>> -> bdX now.  :-)
>
>> In theory mounting by label or ID should have fixed a lot of this.
>> However, if we need to head off a revolt, the sdX allocation algorithm
>> can be placed into its own module so both sd and a ULD ata driver could
>> use it ...
>
>> Actually, surely we can mostly dump the SAT layer?
>
>
> I don't see that we can do that for a long time...  And it's not just the
> sdX allocation algorithm in question -- SCSI block devices come with their
> own partition limits and set of supported ioctls.
>
> Therefore, my recommended path has always been
>
> * create ata_disk block device driver (ULD, in your terminology)
>
> * make SAT an optional piece, which maintains compatibility with existing
> SCSI blkdevs, ioctls, command sets
>
>
> I just don't see a valid path moving forward that breaks userland /again/...
>  we (ATA hackers) would be drummed out of a job I think :)
>
> Another option that's been discussed is
>
> 1) Make SCSI block devices themselves an allocate-able resource (I think
> that's what you meant by "placed into its own module so both sd and a ULD
> ata driver could use it"?)
>
> 2) Ensure that any ata_disk ULD would support the same partition limits and
> ioctl set, enough to ensure binary compatibility.
>
> Because that's the real need -- maintaining binary compatibility with SCSI
> block devices, so major/minor, ioctl supported set, partition limits, and
> other relevant details need to remain unchanged.
>

I've seen a lot of end user complaints about libata only supporting
15(14?) partitions.  Will that limit be moved back to the traditional
drivers/ide limit as part of this?

FYI: I don't personally use that many partitions, but there is a vocal
minority of users on the opensuse mailinglist that complain pretty
loudly about current restriction.

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-23 21:04     ` Greg Freemyer
@ 2008-06-23 21:11       ` James Bottomley
  2008-06-23 21:56         ` Felix Miata
  2008-06-24  8:30         ` Boaz Harrosh
  0 siblings, 2 replies; 18+ messages in thread
From: James Bottomley @ 2008-06-23 21:11 UTC
  To: Greg Freemyer
  Cc: Jeff Garzik, Tejun Heo, IDE/ATA development list, linux-scsi,
	brking, Mark Lord, Alan Cox, Jens Axboe

On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
> On Fri, Jun 20, 2008 at 6:41 PM, Jeff Garzik <jeff@garzik.org> wrote:
> > James Bottomley wrote:
> >>
> >> On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
> >>>
> >>> The biggest problem is how to keep userland happy.  hdX -> sdX
> >>> transition was painful enough and I have a strong feeling that
> >>> everyone will come after and hunt down us if we try something like sdX
> >>> -> bdX now.  :-)
> >
> >> In theory mounting by label or ID should have fixed a lot of this.
> >> However, if we need to head off a revolt, the sdX allocation algorithm
> >> can be placed into its own module so both sd and a ULD ata driver could
> >> use it ...
> >
> >> Actually, surely we can mostly dump the SAT layer?
> >
> >
> > I don't see that we can do that for a long time...  And it's not just the
> > sdX allocation algorithm in question -- SCSI block devices come with their
> > own partition limits and set of supported ioctls.
> >
> > Therefore, my recommended path has always been
> >
> > * create ata_disk block device driver (ULD, in your terminology)
> >
> > * make SAT an optional piece, which maintains compatibility with existing
> > SCSI blkdevs, ioctls, command sets
> >
> >
> > I just don't see a valid path moving forward that breaks userland /again/...
> >  we (ATA hackers) would be drummed out of a job I think :)
> >
> > Another option that's been discussed is
> >
> > 1) Make SCSI block devices themselves an allocate-able resource (I think
> > that's what you meant by "placed into its own module so both sd and a ULD
> > ata driver could use it"?)
> >
> > 2) Ensure that any ata_disk ULD would support the same partition limits and
> > ioctl set, enough to ensure binary compatibility.
> >
> > Because that's the real need -- maintaining binary compatibility with SCSI
> > block devices, so major/minor, ioctl supported set, partition limits, and
> > other relevant details need to remain unchanged.
> >
> 
> I've seen a lot of end user complaints about libata only supporting
> 15(14?) partitions.  Will that limit be moved back to the traditional
> drivers/ide limit as part of this?

Number of partitions is directly related to number of minors, so it
can't be changed without a change in the allocation of major/minor space
in sd ... that could only be done compatibly by permuting the space.
The only other way to do it is incompatibly by changing major (again).
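
The arithmetic behind this can be sketched as follows.  It assumes the
traditional layout where each sd disk owns 16 consecutive minors and
each drivers/ide disk owns 64, with minor 0 of each block being the
whole disk; the constant and function names are hypothetical and only
serve the illustration.

```python
SD_MINORS_PER_DISK = 16   # assumed sd layout: minor 0 = whole disk, 1..15 = partitions
IDE_MINORS_PER_DISK = 64  # assumed drivers/ide layout: 1..63 = partitions


def max_partitions(minors_per_disk):
    """One minor is used for the whole-disk node; the rest are partitions."""
    return minors_per_disk - 1


def minor_for(disk_index, partition, minors_per_disk):
    """Minor number of a partition (0 = whole disk) on the given disk."""
    if not 0 <= partition < minors_per_disk:
        raise ValueError("partition does not fit in this disk's minor range")
    return disk_index * minors_per_disk + partition


def split_minor(minor, minors_per_disk):
    """Inverse of minor_for(): return (disk_index, partition)."""
    return divmod(minor, minors_per_disk)


if __name__ == "__main__":
    print(max_partitions(SD_MINORS_PER_DISK))   # partitions per sd disk
    print(max_partitions(IDE_MINORS_PER_DISK))  # partitions per IDE disk
    print(minor_for(1, 1, SD_MINORS_PER_DISK))  # second disk, first partition
```

Raising the per-disk partition count means widening each disk's slice
of the minor space, which moves every later disk's minors; that is why
it cannot be done without changing the existing major/minor allocation.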

James




* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-23 21:11       ` James Bottomley
@ 2008-06-23 21:56         ` Felix Miata
  2008-06-24  8:30         ` Boaz Harrosh
  1 sibling, 0 replies; 18+ messages in thread
From: Felix Miata @ 2008-06-23 21:56 UTC
  To: James Bottomley, linux-ide
  Cc: Greg Freemyer, Jeff Garzik, Alan Cox, Jens Axboe

On 2008/06/23 16:11 (GMT-0500) James Bottomley apparently typed:

> On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:

>> I've seen a lot of end user complaints about libata only supporting
>> 15(14?) partitions.  Will that limit be moved back to the traditional
>> drivers/ide limit as part of this?

> Number of partitions is directly related to number of minors, so it
> can't be changed without a change in the allocation of major/minor space
> in sd ... that could only be done compatibly by permuting the space.
> The only other way to do it is incompatibly by changing major (again).

I don't know what permuting the space means, but I do know libata affects me
in a very big way, being dependent on cross platform tools for partitioning,
backup, & restoration; and multibooting virtually every system I touch. LVM
is simply no option for these systems, and I have yet to determine whether
and how kpartx and device mapper might be used to mitigate the problems
caused by trying to work within libata's SCSI limit, a limit which is a major
reason why I ceased trying to use SCSI over 5 years ago when disks got too
big for few partitions. I've ceased trying to use Fedora & *buntu on more
than a cursory basis, using mostly SUSE and Mandriva because they still
provide the option to use legacy drivers in their default kernels.

Isn't libata at some point supposed to completely displace the legacy IDE
driver set? If so, at some point couldn't the IDE major be recycled for
libata use for devices >15, or even all of them, maybe in the interim via a
compile time choice between either legacy or major 3 but never both
simultaneously?
-- 
"Where were you when I laid the earth's
foundation?"		       Matthew 7:12 NIV

 Team OS/2 ** Reg. Linux User #211409

Felix Miata  ***  http://fm.no-ip.com/


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-23 21:11       ` James Bottomley
  2008-06-23 21:56         ` Felix Miata
@ 2008-06-24  8:30         ` Boaz Harrosh
  2008-06-24 14:42           ` James Bottomley
  1 sibling, 1 reply; 18+ messages in thread
From: Boaz Harrosh @ 2008-06-24  8:30 UTC
  To: James Bottomley
  Cc: Greg Freemyer, Jeff Garzik, Tejun Heo, IDE/ATA development list,
	linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

James Bottomley wrote:
> On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
>> On Fri, Jun 20, 2008 at 6:41 PM, Jeff Garzik <jeff@garzik.org> wrote:
>>> James Bottomley wrote:
>>>> On Fri, 2008-06-20 at 13:06 +0900, Tejun Heo wrote:
>>>>> The biggest problem is how to keep userland happy.  hdX -> sdX
>>>>> transition was painful enough and I have a strong feeling that
>>>>> everyone will come after and hunt down us if we try something like sdX
>>>>> -> bdX now.  :-)
>>>> In theory mounting by label or ID should have fixed a lot of this.
>>>> However, if we need to head off a revolt, the sdX allocation algorithm
>>>> can be placed into its own module so both sd and a ULD ata driver could
>>>> use it ...
>>>> Actually, surely we can mostly dump the SAT layer?
>>>
>>> I don't see that we can do that for a long time...  And it's not just the
>>> sdX allocation algorithm in question -- SCSI block devices come with their
>>> own partition limits and set of supported ioctls.
>>>
>>> Therefore, my recommended path has always been
>>>
>>> * create ata_disk block device driver (ULD, in your terminology)
>>>
>>> * make SAT an optional piece, which maintains compatibility with existing
>>> SCSI blkdevs, ioctls, command sets
>>>
>>>
>>> I just don't see a valid path moving forward that breaks userland /again/...
>>>  we (ATA hackers) would be drummed out of a job I think :)
>>>
>>> Another option that's been discussed is
>>>
>>> 1) Make SCSI block devices themselves an allocate-able resource (I think
>>> that's what you meant by "placed into its own module so both sd and a ULD
>>> ata driver could use it"?)
>>>
>>> 2) Ensure that any ata_disk ULD would support the same partition limits and
>>> ioctl set, enough to ensure binary compatibility.
>>>
>>> Because that's the real need -- maintaining binary compatibility with SCSI
>>> block devices, so major/minor, ioctl supported set, partition limits, and
>>> other relevant details need to remain unchanged.
>>>
>> I've seen a lot of end user complaints about libata only supporting
>> 15(14?) partitions.  Will that limit be moved back to the traditional
>> drivers/ide limit as part of this?
> 
> Number of partitions is directly related to number of minors, so it
> can't be changed without a change in the allocation of major/minor space
> in sd ... that could only be done compatibly by permuting the space.
> The only other way to do it is incompatibly by changing major (again).
> 
> James
> 
Could we do both? I mean use the legacy, up to 15, with the old major,
then use the new major for bigger than 15. Since user mode that knows
about more than 15 partitions is new, it'll know it needs to jump a major.

Boaz


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24  8:30         ` Boaz Harrosh
@ 2008-06-24 14:42           ` James Bottomley
  2008-06-24 14:58             ` Greg Freemyer
  2008-06-24 14:59             ` Tejun Heo
  0 siblings, 2 replies; 18+ messages in thread
From: James Bottomley @ 2008-06-24 14:42 UTC
  To: Boaz Harrosh
  Cc: Greg Freemyer, Jeff Garzik, Tejun Heo, IDE/ATA development list,
	linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

On Tue, 2008-06-24 at 11:30 +0300, Boaz Harrosh wrote:
> James Bottomley wrote:
> > On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
> >> I've seen a lot of end user complaints about libata only supporting
> >> 15(14?) partitions.  Will that limit be moved back to the traditional
> >> drivers/ide limit as part of this?
> > 
> > Number of partitions is directly related to number of minors, so it
> > can't be changed without a change in the allocation of major/minor space
> > in sd ... that could only be done compatibly by permuting the space.
> > The only other way to do it is incompatibly by changing major (again).
> > 
> Could we do both? I mean use the legacy, up to 15, with the old major,
> then use the new major for bigger than 15. Since user mode that knows
> about more than 15 partitions is new, it'll know it needs to jump a major.

Not simultaneously, which is the problem; you can't have two separate
block devices for the same physical device unless you want aliasing
issues in the page cache.

It might be possible to add an extra device to give access to the
missing partitions, but that would require a bit of re-engineering in
gendisk (which is the in-kernel code to manage the partitions).

What might be far more feasible is to set up udev to use kpartx to
provide the missing partitions if it detects a partition table that has
them ... of course, that requires a udev setup and most of the
complaints about the lost partitions seem to come from non-udev systems.

But .... if everyone (particularly the people with these problems) had
udev, we could simply migrate to a new major with more partitions, get
udev to fix it all up for us and everyone would be happy because no-one
would even notice that we'd moved majors ...

James




* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 14:42           ` James Bottomley
@ 2008-06-24 14:58             ` Greg Freemyer
  2008-06-24 15:13               ` Felix Miata
  2008-06-24 14:59             ` Tejun Heo
  1 sibling, 1 reply; 18+ messages in thread
From: Greg Freemyer @ 2008-06-24 14:58 UTC
  To: James Bottomley
  Cc: Boaz Harrosh, Jeff Garzik, Tejun Heo, IDE/ATA development list,
	linux-scsi, brking, Mark Lord, Alan Cox, Jens Axboe

On Tue, Jun 24, 2008 at 10:42 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2008-06-24 at 11:30 +0300, Boaz Harrosh wrote:
>> James Bottomley wrote:
>> > On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
>> >> I've seen a lot of end user complaints about libata only supporting
>> >> 15(14?) partitions.  Will that limit be moved back to the traditional
>> >> drivers/ide limit as part of this?
>> >
>> > Number of partitions is directly related to number of minors, so it
>> > can't be changed without a change in the allocation of major/minor space
>> > in sd ... that could only be done compatibly by permuting the space.
>> > The only other way to do it is incompatibly by changing major (again).
>> >
>> Could we do both? I mean use the legacy, up to 15, with the old major,
>> then use the new major for bigger than 15. Since user mode that knows
>> about more than 15 partitions is new, it'll know it needs to jump a major.
>
> Not simultaneously, which is the problem; you can't have two separate
> block devices for the same physical device unless you want aliasing
> issues in the page cache.
>
> It might be possible to add an extra device to give access to the
> missing partitions, but that would require a bit of re-engineering in
> gendisk (which is the in-kernel code to manage the partitions).
>
> What might be far more feasible is to set up udev to use kpartx to
> provide the missing partitions if it detects a partition table that has
> them ... of course, that requires a udev setup and most of the
> complaints about the lost partitions seem to come from non-udev systems.
>
> But .... if everyone (particularly the people with these problems) had
> udev, we could simply migrate to a new major with more partitions, get
> udev to fix it all up for us and everyone would be happy because no-one
> would even notice that we'd moved majors ...
>
> James

From my limited perspective, every complaint I've seen was related to
OpenSUSE which I believe is a udev based distro, so addressing this
for the udev based distros would address that contingent of users.

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 14:42           ` James Bottomley
  2008-06-24 14:58             ` Greg Freemyer
@ 2008-06-24 14:59             ` Tejun Heo
  2008-06-24 15:42               ` Felix Miata
  2008-06-24 16:54               ` Alan Cox
  1 sibling, 2 replies; 18+ messages in thread
From: Tejun Heo @ 2008-06-24 14:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: Boaz Harrosh, Greg Freemyer, Jeff Garzik,
	IDE/ATA development list, linux-scsi, brking, Mark Lord, Alan Cox,
	Jens Axboe

James Bottomley wrote:
> On Tue, 2008-06-24 at 11:30 +0300, Boaz Harrosh wrote:
>> James Bottomley wrote:
>>> On Mon, 2008-06-23 at 17:04 -0400, Greg Freemyer wrote:
>>>> I've seen a lot of end user complaints about libata only supporting
>>>> 15(14?) partitions.  Will that limit be moved back to the traditional
>>>> drivers/ide limit as part of this?
>>> Number of partitions is directly related to number of minors, so it
>>> can't be changed without a change in the allocation of major/minor space
>>> in sd ... that could only be done compatibly by permuting the space.
>>> The only other way to do it is incompatibly by changing major (again).
>>>
>> Could we do both? I mean use the legacy, up to 15, with the old major,
>> then use the new major for bigger than 15. Since user mode that knows
>> about more than 15 partitions is new, it'll know it needs to jump a major.
> 
> Not simultaneously, which is the problem; you can't have two separate
> block devices for the same physical device unless you want aliasing
> issues in the page cache.
> 
> It might be possible to add an extra device to give access to the
> missing partitions, but that would require a bit of re-engineering in
> gendisk (which is the in-kernel code to manage the partitions).
> 
> What might be far more feasible is to set up udev to use kpartx to
> provide the missing partitions if it detects a partition table that has
> them ... of course, that requires a udev setup and most of the
> complaints about the lost partitions seem to come from non-udev systems.
> 
> But .... if everyone (particularly the people with these problems) had
> udev, we could simply migrate to a new major with more partitions, get
> udev to fix it all up for us and everyone would be happy because no-one
> would even notice that we'd moved majors ...

I'm currently working on a scheme where partitions above gd->minors get
allocated dynamic MAJ:MIN.  It looks like it can be done mostly in block
layer proper.  The only problem I can foresee is not being able to
specify MAJ:MIN as root device but that shouldn't be a major problem.
I'll report back when I make more progress.

Thanks.

-- 
tejun
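
The scheme Tejun describes above can be pictured with a small sketch. This
is only an illustration of the idea, not the actual patch: partition numbers
below gd->minors keep the fixed sd mapping (major 8, 16 minors per disk),
and anything above gets the next free number on a separate major. The
DevtAllocator class and the value of EXT_MAJOR are assumptions made up for
this example.

```python
from typing import Dict, Tuple

SD_MAJOR = 8             # static major used by sd for the first 16 disks
SD_MINORS_PER_DISK = 16  # gd->minors for sd: minor 0 is the whole disk, 1..15 are partitions
EXT_MAJOR = 240          # hypothetical major for dynamically allocated numbers


class DevtAllocator:
    """Toy model: static MAJ:MIN below gd->minors, dynamic MAJ:MIN above it."""

    def __init__(self) -> None:
        # (disk_index, partno) -> (major, minor) for dynamically numbered partitions
        self._dynamic: Dict[Tuple[int, int], Tuple[int, int]] = {}

    def devt(self, disk_index: int, partno: int) -> Tuple[int, int]:
        if not 0 <= disk_index < 16:
            raise ValueError("this sketch only models the first 16 disks on major 8")
        if partno < 0:
            raise ValueError("partition number must be non-negative")
        if partno < SD_MINORS_PER_DISK:
            # Fixed, predictable mapping: sda1 = 8:1, sdb1 = 8:17, ...
            return (SD_MAJOR, disk_index * SD_MINORS_PER_DISK + partno)
        key = (disk_index, partno)
        if key not in self._dynamic:
            # First come, first served: the number depends on allocation order.
            self._dynamic[key] = (EXT_MAJOR, len(self._dynamic))
        return self._dynamic[key]


alloc = DevtAllocator()
print(alloc.devt(0, 15))  # last statically numbered partition of the first disk
print(alloc.devt(0, 17))  # first dynamically numbered partition
```

Because the dynamic numbers are handed out in allocation order, they are
stable within one boot but carry no meaning across boots, which is the
root-device caveat mentioned in the message.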


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 14:58             ` Greg Freemyer
@ 2008-06-24 15:13               ` Felix Miata
  0 siblings, 0 replies; 18+ messages in thread
From: Felix Miata @ 2008-06-24 15:13 UTC (permalink / raw)
  To: linux-ide

On 2008/06/24 10:58 (GMT-0400) Greg Freemyer apparently typed:

> From my limited perspective, every complaint I've seen was related to
> OpenSUSE which I believe is a udev based distro, so addressing this
> for the udev based distros would address that contingent of users.

I think OpenSUSE users are just more vocal, but basically the contingent of
those with a problem from lacking access to partitions above 15 is heavily
weighted with multibooters, whether they use OpenSUSE, Mandriva, *buntu or
whatever. Fedora seems to have pushed most of its users into LVM, but I don't
think LVM has caught or will catch on for the rest of the multibooters.

None of the distros I've first used in the past 2+ years (probably longer, I
just don't remember when I didn't see udev last) has failed to include udev.

Until now at least I've been able to avoid doing any hardware upgrading that
would force me into using only SATA. I have only one out of about 25 working
systems with SATA exclusively. I use it almost exclusively for OS/2 (you
remember, the ancient and "dead" OS), and it currently has 50 partitions on
disk 1, and 19 on disk 2.
-- 
"Where were you when I laid the earth's
foundation?"		       Matthew 7:12 NIV

 Team OS/2 ** Reg. Linux User #211409

Felix Miata  ***  http://fm.no-ip.com/


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 14:59             ` Tejun Heo
@ 2008-06-24 15:42               ` Felix Miata
  2008-06-24 15:49                 ` Tejun Heo
  2008-06-24 16:54               ` Alan Cox
  1 sibling, 1 reply; 18+ messages in thread
From: Felix Miata @ 2008-06-24 15:42 UTC (permalink / raw)
  To: linux-ide

On 2008/06/24 23:59 (GMT+0900) Tejun Heo apparently typed:

> I'm currently working on a scheme where partitions above gd->minors get
> allocated dynamic MAJ:MIN.  It looks like it can be done mostly in block
> layer proper.  The only problem I can foresee is not being able to
> specify MAJ:MIN as root device but that shouldn't be a major problem.
> I'll report back when I make more progress.

Please correct me if I'm wrong, but on the following, would it not
be a major problem booting, without deleting any existing or moving
any existing, with only a few more upgrade cycles to replace what's
there installed now with current and near future distros?

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x31caf8da

   Device Boot Start    End    Blocks   Id  System
/dev/hda1          1     13    104391   17  Hidden HPFS/NTFS		# OS/2 maintenance OS
/dev/hda2   *     14     14      8032+   a  OS/2 Boot Manager		# Primary Boot Loader
/dev/hda3         15     46    257040    6  FAT16			# DOS
/dev/hda4         48  14594 116848777+   5  Extended		
/dev/hda5         48     48      8001    1  FAT12			# DOS
/dev/hda6         49     80    257008+   6  FAT16			# DOS
/dev/hda7         82    107    208813+  83  Linux			# primary Grub
/dev/hda8        108    222    923706   82  Linux swap / Solaris
/dev/hda9        223    834   4915858+  83  Linux			# Fedora, current
/dev/hda10      1142   1243    819283+   7  HPFS/NTFS			# OS/2 OS
/dev/hda11      1269   1523   2048256    7  HPFS/NTFS			# OS/2 data
/dev/hda12      1575   1693    955836    7  HPFS/NTFS			# OS/2 apps
/dev/hda13      1694   2012   2562336    7  HPFS/NTFS			# OS/2 archival
/dev/hda14      2014   2217   1638598+  83  Linux			# /home
/dev/hda15      2218   2568   2819376   83  Linux			# /Knoppix
/dev/hda16      2569   3486   7373803+  83  Linux			# SUSE 10.2 /
/dev/hda17      3487   4379   7172991   83  Linux			# /pub
/dev/hda18      4380   4409    240943+  83  Linux			# /srv
/dev/hda19      4410   4549   1124518+  83  Linux			# /usr/local
/dev/hda20      4550   5187   5124703+  83  Linux			# /usr/src
/dev/hda21      5188   5799   4915858+  83  Linux			# Mandriva 2007 /
/dev/hda22      5800   6513   5735173+  83  Linux			# Factory /
/dev/hda23      6514   7176   5325516   83  Linux			# Cooker /
/dev/hda24      7178   7279    819283+   7  HPFS/NTFS			# eComStation beta OS
/dev/hda25      7281   7892   4915858+  83  Linux			# Fedora 6 /
/dev/hda26      7893   8504   4915858+  83  Linux			# Fedora 5 /
/dev/hda27      8505   9116   4915858+  83  Linux			# SUSE 10.0 /
/dev/hda28      9117   9728   4915858+  83  Linux			# Mandriva 2006 /
/dev/hda29      9729  10340   4915858+  83  Linux			# Mandriva 2008 /
/dev/hda30     10341  10850   4096543+  83  Linux			# SUSE 11.0 /
/dev/hda31     12656  12757    819283+   7  HPFS/NTFS			# OS/2 skeleton
/dev/hda32     12758  14593  14747638+  83  Linux			# /isos
/dev/hda33     14594  14594      8001   12  Compaq diagnostics		# demarcation line
	       14595  19456						# freespace
-- 
"Where were you when I laid the earth's
foundation?"		       Matthew 7:12 NIV

 Team OS/2 ** Reg. Linux User #211409

Felix Miata  ***  http://fm.no-ip.com/


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 15:42               ` Felix Miata
@ 2008-06-24 15:49                 ` Tejun Heo
  2008-06-24 16:27                   ` Felix Miata
  0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2008-06-24 15:49 UTC (permalink / raw)
  To: Felix Miata; +Cc: linux-ide

Felix Miata wrote:
> On 2008/06/24 23:59 (GMT+0900) Tejun Heo apparently typed:
> 
>> I'm currently working on a scheme where partitions above gd->minors get
>> allocated dynamic MAJ:MIN.  It looks like it can be done mostly in block
>> layer proper.  The only problem I can foresee is not being able to
>> specify MAJ:MIN as root device but that shouldn't be a major problem.
>> I'll report back when I make more progress.
> 
> Please correct me if I'm wrong, but on the following, would it not
> be a major problem booting, without deleting any existing or moving
> any existing, with only a few more upgrade cycles to replace what's
> there installed now with current and near future distros?

I'm not sure what you mean.  If you switch to libata now, you'll only be
able to see 15 partitions, so, yes, you'll have a problem updating to a
distro which uses libata now and in near future.  If the dynamic
partition thing goes well, hopefully the next major distro releases work
fine.

-- 
tejun
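
The arithmetic behind the 15-partition limit can be sketched as follows.
drivers/ide gives each drive 64 minors and sd gives each disk 16; minor 0
is the whole disk in both cases, so the usable partition numbers are 1..63
and 1..15 respectively. The visible_partitions helper is a made-up name
for this example, and 33 is the partition count from Felix's fdisk listing.

```python
from typing import List, Tuple

IDE_MINORS_PER_DISK = 64  # drivers/ide: minor 0 = whole disk, 1..63 = partitions
SD_MINORS_PER_DISK = 16   # sd (used by libata): minor 0 = whole disk, 1..15 = partitions


def visible_partitions(total: int, minors_per_disk: int) -> Tuple[List[int], List[int]]:
    """Split partition numbers 1..total into those that get a minor and those that do not."""
    limit = minors_per_disk - 1  # one minor is taken by the whole-disk node
    visible = list(range(1, min(total, limit) + 1))
    hidden = list(range(limit + 1, total + 1))
    return visible, hidden


visible, hidden = visible_partitions(33, SD_MINORS_PER_DISK)
print(len(visible), "visible,", len(hidden), "hidden, starting at partition", hidden[0])
```

With the old IDE numbering the same call returns an empty hidden list for
this disk, since 33 is well below 63.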


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 15:49                 ` Tejun Heo
@ 2008-06-24 16:27                   ` Felix Miata
  2008-06-24 16:35                     ` Tejun Heo
  0 siblings, 1 reply; 18+ messages in thread
From: Felix Miata @ 2008-06-24 16:27 UTC (permalink / raw)
  To: linux-ide

On 2008/06/25 00:49 (GMT+0900) Tejun Heo apparently typed:

> Felix Miata wrote:

>> On 2008/06/24 23:59 (GMT+0900) Tejun Heo apparently typed:

>>> I'm currently working on a scheme where partitions above gd->minors get
>>> allocated dynamic MAJ:MIN.  It looks like it can be done mostly in block
>>> layer proper.  The only problem I can foresee is not being able to
>>> specify MAJ:MIN as root device but that shouldn't be a major problem.
>>> I'll report back when I make more progress.

>> Please correct me if I'm wrong, but on the following, would it not
>> be a major problem booting, without deleting any existing or moving
>> any existing, with only a few more upgrade cycles to replace what's
>> there installed now with current and near future distros?

> I'm not sure what you mean.  If you switch to libata now, you'll only be
> able to see 15 partitions, so, yes, you'll have a problem updating to a

Well, on the Fedora 9 on hda9 that is already the case, as it provided no
option to not use libata. ;-)

> distro which uses libata now and in near future.  If the dynamic
> partition thing goes well, hopefully the next major distro releases work
> fine.

I responded as I did due to your statement "only problem I can foresee is not
being able to specify MAJ:MIN as root device", hoping to see some elaboration
on the statement from you. I read your statement as "don't expect to be able
to use any partition >15 as a root device". So, in the past root=3:17 would
have been a valid replacement for root=/dev/hda17? If so, sorry for the
noise, as I'd never seen that type of usage, and certainly would not miss it.
-- 
"Where were you when I laid the earth's
foundation?"		       Matthew 7:12 NIV

 Team OS/2 ** Reg. Linux User #211409

Felix Miata  ***  http://fm.no-ip.com/


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 16:27                   ` Felix Miata
@ 2008-06-24 16:35                     ` Tejun Heo
  0 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2008-06-24 16:35 UTC (permalink / raw)
  To: Felix Miata; +Cc: linux-ide

Felix Miata wrote:
>> distro which uses libata now and in near future.  If the dynamic
>> partition thing goes well, hopefully the next major distro releases work
>> fine.
> 
> I responded as I did due to your statement "only problem I can foresee is not
> being able to specify MAJ:MIN as root device", hoping to see some elaboration
> on the statement from you. I read your statement as "don't expect to be able
> to use any partition >15 as a root device". So, in the past root=3:17 would
> have been a valid replacement for root=/dev/hda17? If so, sorry for the
> noise, as I'd never seen that type of usage, and certainly would not miss it.

Oh.. the name, say, root=/dev/sda999 should work (well, that's the plan)
but as there will be no fixed mapping between /dev/sdaN and MAJ:MIN when
N > 15, root=MAJ:MIN just isn't possible for dynamically allocated ones.
I hope it clarified things a bit.

Thanks.

-- 
tejun
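
The difference between the two forms of root= can be shown with a toy
model. This is not the kernel's actual root-device parsing; resolve_root,
the device tables, and major 240 are assumptions for illustration only.
The two tables stand for two boots in which the dynamically numbered
partitions happened to be registered in a different order.

```python
from typing import Dict, Optional, Tuple


def resolve_root(arg: str, devices: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the name of the device that a root= argument selects, or None."""
    if arg.startswith("/dev/"):
        # Lookup by name: independent of which MAJ:MIN the partition received.
        name = arg[len("/dev/"):]
        return name if name in devices else None
    # Lookup by number: selects whatever currently owns that MAJ:MIN.
    major, minor = (int(part) for part in arg.split(":"))
    for name, devt in devices.items():
        if devt == (major, minor):
            return name
    return None


# Same disks, but the partitions above 15 were numbered in a different order.
boot1 = {"sda1": (8, 1), "sda17": (240, 0), "sdb16": (240, 1)}
boot2 = {"sda1": (8, 1), "sdb16": (240, 0), "sda17": (240, 1)}

for boot in (boot1, boot2):
    print(resolve_root("/dev/sda17", boot), resolve_root("240:0", boot))
```

The name resolves to the same partition in both tables, while the numeric
form only stays reliable for the statically numbered partitions such as 8:1.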


* Re: [RFC] Separating out libata out of SCSI (finally)
  2008-06-24 14:59             ` Tejun Heo
  2008-06-24 15:42               ` Felix Miata
@ 2008-06-24 16:54               ` Alan Cox
  1 sibling, 0 replies; 18+ messages in thread
From: Alan Cox @ 2008-06-24 16:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: James Bottomley, Boaz Harrosh, Greg Freemyer, Jeff Garzik,
	IDE/ATA development list, linux-scsi, brking, Mark Lord,
	Jens Axboe

> I'm currently working on a scheme where partitions above gd->minors get
> allocated dynamic MAJ:MIN.  It looks like it can be done mostly in block
> layer proper.  The only problem I can foresee is not being able to
> specify MAJ:MIN as root device but that shouldn't be a major problem.
> I'll report back when I make more progress.

Run it past Al Viro, he vetoed the previous proposal to have discontiguous
minor number ranges a couple of years ago, which is why we are stuck with
this limit.

Alan

