* [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released
@ 2008-07-08 19:14 Vladislav Bolkhovitin
2008-07-08 21:09 ` Nicholas A. Bellinger
0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Bolkhovitin @ 2008-07-08 19:14 UTC (permalink / raw)
To: linux-kernel, linux-scsi; +Cc: scst-devel
I'm glad to announce that version 1.0.0 of the Generic SCSI Target
Middle Level for Linux (SCST) has been released and is available for
download from http://scst.sourceforge.net/downloads.html
SCST is a subsystem of the Linux kernel that provides a standard
framework for SCSI target driver development. It is designed to provide
a unified, consistent interface between SCSI target drivers and the
Linux kernel and to simplify target driver development as much as
possible. It has the following main features:
* A simple, easy-to-use interface with target drivers. In particular,
the SCST core performs the required pre- and post-processing of
incoming requests as well as the necessary error recovery.
* Takes care of most problems related to execution contexts, thus
practically eliminating one of the most complicated problems in kernel
driver development. For example, a target driver for QLogic 22xx/23xx
cards with all the necessary features is only about 2000 lines of code.
* Very low overhead, fine-grained locking and the simplest possible
command processing path, which allow maximum performance and
scalability. In particular, incoming requests can be processed in the
caller's context or in one of the SCST core's internal tasklets, so no
extra context switches are required.
* A device handler (i.e. plugin) architecture provides extra
flexibility by allowing various I/O modes in backstorage handling. For
example, pass-through device handlers allow the use of real SCSI
hardware, and the vdisk device handler allows files to be used as
virtual disks.
* Advanced per-initiator device visibility management (LUN masking),
which allows different initiators to see different sets of devices with
different access permissions. For instance, initiator A could see
devices X and Y exported from target T as read-writable, while
initiator B on the same target T could see device Y read-only and
device Z read-writable.
* Emulation of the necessary functionality of a SCSI host adapter,
because from the remote initiators' point of view SCST acts as a SCSI
host with its own devices.
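The per-initiator visibility management described above amounts to a
per-target access-control table. A minimal conceptual sketch follows;
all the names in it are illustrative, not the actual SCST API:

```python
# Toy model of per-initiator LUN masking (NOT the real SCST interface):
# each initiator is mapped to the set of devices it may see, together
# with the access mode it is granted for each device.

class TargetGroup:
    def __init__(self):
        # initiator name -> {device name: access mode}
        self.acl = {}

    def allow(self, initiator, device, mode):
        self.acl.setdefault(initiator, {})[device] = mode

    def visible_devices(self, initiator):
        # An initiator only sees devices explicitly exported to it.
        return self.acl.get(initiator, {})

# The example from the announcement: initiator A sees X and Y
# read-write; initiator B sees Y read-only and Z read-write, on the
# same target T.
t = TargetGroup()
t.allow("initiator_A", "X", "rw")
t.allow("initiator_A", "Y", "rw")
t.allow("initiator_B", "Y", "ro")
t.allow("initiator_B", "Z", "rw")

print(t.visible_devices("initiator_A"))  # {'X': 'rw', 'Y': 'rw'}
print(t.visible_devices("initiator_B"))  # {'Y': 'ro', 'Z': 'rw'}
```

An initiator absent from the table sees no devices at all, which is the
masking behavior the announcement describes.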
The following I/O modes are supported by SCST:
* Pass-through mode with a one-to-many relationship, i.e. multiple
initiators can connect to the exported pass-through devices, for
virtually all SCSI device types: disks (type 0), tapes (type 1),
processors (type 3), CDROMs (type 5), MO disks (type 7), medium
changers (type 8) and RAID controllers (type 0xC)
* FILEIO mode, which allows files on file systems or on block devices
to be used as virtual, remotely available SCSI disks or CDROMs with the
benefits of the Linux page cache
* BLOCKIO mode, which performs direct block I/O with a block device,
bypassing the page cache for all operations. This mode works ideally
with high-end storage HBAs and for applications that either do not need
caching between the application and disk or need large-block
throughput.
* User space mode using the scst_user device handler, which allows
virtual SCSI devices to be implemented in user space within the SCST
environment.
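The FILEIO idea above can be pictured with a short sketch: an ordinary
file serves as the backing store of a virtual disk addressed in
fixed-size blocks. This illustrates the concept only and is not SCST's
vdisk handler code; all names here are hypothetical:

```python
# Conceptual FILEIO sketch: a sparse file acts as a block-addressed
# virtual disk. Reads and writes go through the OS (and hence the page
# cache), which is exactly the benefit the FILEIO mode description
# mentions.
import os
import tempfile

BLOCK_SIZE = 512

class FileBackedDisk:
    def __init__(self, path, num_blocks):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, num_blocks * BLOCK_SIZE)  # sparse "disk"

    def write_block(self, lba, data):
        assert len(data) == BLOCK_SIZE
        os.pwrite(self.fd, data, lba * BLOCK_SIZE)

    def read_block(self, lba):
        return os.pread(self.fd, BLOCK_SIZE, lba * BLOCK_SIZE)

path = os.path.join(tempfile.mkdtemp(), "vdisk.img")
disk = FileBackedDisk(path, num_blocks=2048)  # 1 MiB virtual disk
disk.write_block(7, b"\xab" * BLOCK_SIZE)
assert disk.read_block(7) == b"\xab" * BLOCK_SIZE
# Untouched blocks of a sparse file read back as zeros:
assert disk.read_block(8) == b"\x00" * BLOCK_SIZE
```

BLOCKIO would differ mainly in opening the backing device with direct
I/O (O_DIRECT), bypassing the page cache, which carries alignment
requirements not shown here.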
A detailed description of SCST, its drivers and utilities can be found
on the SCST home page: http://scst.sourceforge.net.
A comparison with the mainstream target mid-level STGT can be found on
the SCST vs STGT page: http://scst.sourceforge.net/scstvsstgt.html. In
short, SCST has the following main advantages over STGT:
- Better performance (in many cases by tens of percent or more), with
potential for further improvement, for example by implementing
zero-copy cache I/O.
- A monolithic in-kernel architecture, which follows the standard Linux
kernel paradigm of avoiding distributed processing. It is simpler,
hence more reliable and maintainable.
SCST is being prepared in the form of a patch for review and inclusion
into the kernel.
Vlad
^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released
@ 2008-07-08 21:09 ` Nicholas A. Bellinger
  2008-07-09 11:19   ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas A. Bellinger @ 2008-07-08 21:09 UTC (permalink / raw)
To: Vladislav Bolkhovitin; +Cc: linux-kernel, linux-scsi, scst-devel, nab

Hi Vlad,

On Tue, 2008-07-08 at 23:14 +0400, Vladislav Bolkhovitin wrote:
> I'm glad to announce that version 1.0.0 of Generic SCSI Target Middle
> Level for Linux (SCST) was released and available for download from
> http://scst.sourceforge.net/downloads.html

Congratulations on reaching your v1.0.0 release!

> [full announcement text snipped]

> Comparison with the mainstream target middle level STGT you can find
> on the SCST vs STGT page http://scst.sourceforge.net/scstvsstgt.html.
> In short, SCST has the following main advantages over STGT:

I noticed that you included a reference to my presentation at LSF '08
on your SCST vs. STGT page linked above, and took my description of
your work (you are more than welcome to come and present your own case
at LSF '09) very much out of context.

If you wish to reference my presentation, please at least make the
comparison between LIO-Core+LIO-Target vs. SCST vs. STGT, and NOT JUST
SCST vs. STGT, so that the community at large can understand the
differences and technical challenges. The more infighting between the
leaders in our community, the less the community benefits.

Many thanks for your most valuable of time,

--nab

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released
@ 2008-07-09 11:19 ` Vladislav Bolkhovitin
  2008-07-09 19:34   ` Nicholas A. Bellinger
  0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Bolkhovitin @ 2008-07-09 11:19 UTC (permalink / raw)
To: Nicholas A. Bellinger; +Cc: linux-kernel, linux-scsi, scst-devel, nab

Hi Nicholas,

Nicholas A. Bellinger wrote:
> Congratulations on reaching your v1.0.0 release!

Thanks!

> I noticed that you included a reference to my presentation at LSF '08
> on your SCST vs. STGT page linked above, and took my description of
> your work (you are more than welcome to come and present your own
> case at LSF '09) very much out of context.

I wasn't at the presentation, so there it might have looked out of
context. I have only the documents which I referenced. In them,
especially in the "2008 Linux Storage & Filesystem Workshop" summary,
it doesn't look as if I took it out of context. You put the emphasis on
"older" vs. "current"/"new", didn't you ;)? Plus, Mike Christie is also
listed as an author.

BTW, there are other inaccuracies on your slides:

 - STGT doesn't support "hardware accelerated traditional iSCSI
(QLogic)"; at least I have not found any signs of it.

 - For SCST you wrote "Only project to support PSCSI, FC and SAS Target
mode (with out of tree hardware drivers)". This is an ambiguous
statement, but it looks like you meant that SCST is intended and
limited to support only the listed transports. This is incorrect. SCST
is intended to support ALL possible transports and types of
backstorage.

> If you wish to reference my presentation, please at least make the
> comparison between LIO-Core+LIO-Target vs. SCST vs. STGT, and NOT
> JUST SCST vs. STGT, so that the community at large can understand the
> differences and technical challenges.

The SCST vs STGT page was written long ago, before I ever looked at
LIO. I wasn't actually going to refer to your presentation, just added
a small note about your funny, from my POV, "older" vs. "new"
architecture comparison ;)

But when I have time for a careful look, I'm going to write some LIO
critiques. So far, at first glance:

 - It is too iSCSI-centric. iSCSI is a very special transport, so it
looks like when you decide to add to LIO drivers for other transports,
especially for parallel SCSI and SAS, you are going to have big trouble
and a major redesign. And this is a real showstopper for making
LIO-Core the default and the only SCSI target framework. SCST is
SCSI-centric just because there's no way to make a *SCSI* target
framework that is not SCSI-centric. Nobody blames the Linux SCSI
(initiator) mid-layer for being SCSI-centric, correct?

 - It seems a bit overcomplicated, because it has too many abstract
interfaces where there's not much need for them. Having too many
abstract interfaces makes code analysis a lot more complicated. For
comparison, SCST has only 2 such interfaces: for target drivers and for
backstorage dev handlers. Plus, there is a half-abstract interface for
the memory allocator (sgv_pool_set_allocator()) to allow scst_user to
allocate user space supplied pages. And they cover all needs.

 - Pass-through mode (PSCSI) also provides a non-enforced 1-to-1
relationship, as it used to be in STGT (support for pass-through mode
now seems to have been removed from STGT), which isn't mentioned
anywhere.

 - There is some confusion in the code in the function and variable
names between persistent and SAM-2 reservations.

 - There is at least one SCSI standard violation: target and LUN resets
don't clear the reservation.

Again, this is a first impression, without deep analysis, so I might be
wrong somewhere.

> The more infighting between the
> leaders in our community, the less the community benefits.

Sure. If my note hurts you, I can remove it. But you should also remove
from your presentation and the summary paper those psychological
arguments, to not confuse people.

> Many thanks for your most valuable of time,
>
> --nab

^ permalink raw reply	[flat|nested] 11+ messages in thread
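[Editorial note: the "only 2 abstract interfaces" architecture Vlad
describes above can be pictured with a toy model: a core that accepts
commands from any registered target (transport) driver and routes them
to a registered backstorage device handler. Purely illustrative; none
of these names are the actual SCST API.]

```python
# Toy model of a target mid-level with exactly two plugin interfaces:
# one for target (transport) drivers that deliver commands, and one
# for backstorage device handlers that execute them. Hypothetical
# names throughout.

class Core:
    def __init__(self):
        self.handlers = {}  # device type -> handler callable

    def register_dev_handler(self, dev_type, handler):
        self.handlers[dev_type] = handler

    def exec_cmd(self, dev_type, cdb):
        # Any target driver (iSCSI, FC, ...) hands every command to
        # the core; the core routes it to the right dev handler.
        return self.handlers[dev_type](cdb)

core = Core()
core.register_dev_handler("vdisk", lambda cdb: ("vdisk handled", cdb))
core.register_dev_handler("pass-through",
                          lambda cdb: ("sent to real HW", cdb))

# Every transport driver uses the same single entry point:
print(core.exec_cmd("vdisk", b"\x28"))         # a READ(10) CDB
print(core.exec_cmd("pass-through", b"\x28"))  # same CDB, real device
```

The point of the model is that adding a new transport requires no new
abstract interface, only another caller of the same entry point.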
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released
@ 2008-07-09 19:34 ` Nicholas A. Bellinger
  2008-07-10 18:25   ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas A. Bellinger @ 2008-07-09 19:34 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: linux-kernel, linux-scsi, scst-devel, Linux-iSCSI.org Target Dev, nab

On Wed, 2008-07-09 at 15:19 +0400, Vladislav Bolkhovitin wrote:
> Hi Nicholas,
>
> I wasn't at the presentation, so there it might have looked out of
> context.

Neither was I, until I dropped the program chair (Chris Mason was our
gracious host this year) a note about some of the topics that I thought
would be useful to discuss in the storage transport track. As LSF is a
Usenix sponsored event, this basically consisted of a bunch of notes
wrt Linux/iSCSI, LIO-Core+LIO-Target and what has now become (a lot of
things change in 5 months in the Linux/iSCSI world) the VHACS cloud.
Please have a look at Linux-iSCSI.org to see what VHACS and VHACS-VM
are. :-)

> I have only the documents which I referenced. In them, especially in
> the "2008 Linux Storage & Filesystem Workshop" summary, it doesn't
> look as if I took it out of context. You put the emphasis on "older"
> vs. "current"/"new", didn't you ;)?

Well, my job was to catch everyone up to speed on the status of the 4
(four) different (insert your favorite SAM capable transport name here)
Linux v2.6 based target projects, with all of the acronyms for the
standards+implementations+linux-kernel being extremely confusing to
anyone who doesn't know all of them by heart. Even for those people in
the room who were familiar with storage, but not necessarily with
target mode engine design, it's hard to follow. You will notice that in
the actual slide discussing the status of SCST, nothing about old vs.
new is mentioned.

> Plus, Mike Christie is also listed as an author.

Yes, the Honorable Mr. Christie was discussing SYSFS infrastructure
that we are going to need for a generic target mode engine.

> BTW, there are other inaccuracies on your slides:
>
> - STGT doesn't support "hardware accelerated traditional iSCSI
> (QLogic)"; at least I have not found any signs of it.

<nod>, that is correct. It does its hardware acceleration generically
using OFA VERBS for hardware that does a wire protocol implementing
fabric dependent direct data placement. iSER does this with 504[0-4],
and I don't recall exactly how IB does it. Anyways, the point is that
they use a single interface so that hardware vendors do not have to
implement their own APIs, which are very complex, and usually very
buggy when coming from a company that is trying to get a design into
an ASIC.

> - For SCST you wrote "Only project to support PSCSI, FC and SAS
> Target mode (with out of tree hardware drivers)". [...] SCST is
> intended to support ALL possible transports and types of backstorage.

Sure, my intention was to make the point about kernel level drivers for
speciality engines being pushed out of tree because of the lack of a
k.o storage engine to which they can generically push packets.

> The SCST vs STGT page was written long ago, before I ever looked at
> LIO. I wasn't actually going to refer to your presentation, just
> added a small note about your funny, from my POV, "older" vs. "new"
> architecture comparison ;)

Heh. :-)

> But when I have time for a careful look, I'm going to write some LIO
> critiques. So far, at first glance:
>
> - It is too iSCSI-centric. iSCSI is a very special transport, so it
> looks like when you decide to add to LIO drivers for other
> transports, especially for parallel SCSI and SAS, you are going to
> have big trouble and a major redesign.

Not true. The LIO-Core subsystem API is battle hardened (you could say
it is the 2nd oldest, behind UNH's :), allocating LIO-Core SE tasks
(that then get issued to LIO-Core subsystem plugins) from a SCSI CDB
with sectors+offset for ICF_SCSI_DATA_SG_IO_CDB, from a generically
emulated SCSI control CDB or logic in LIO-Core, or using LIO-Core/PSCSI
to let the underlying hardware do its thing, while still filling in the
holes so that *ANY* SCSI subsystem, including ones from different OSes,
can talk with storage objects behind LIO-Core when running in initiator
mode amongst the possible fabrics. Some of the classic examples here
are:

*) The Solaris 10 SCSI subsystem requires all iSCSI devices to have
EVPD information, otherwise LUN registration will fail. This means that
suddenly struct block_device and struct file need to have WWN
information, which may be DIFFERENT based upon whether said object was
a Linux/MD or LVM block device, for example.

*) Every cluster design that requires block level shared storage needs
to have at least SAM-2 Reservations.

*) Exporting Hardware RAID adapters via LIO-Core on OSes where
max_sectors cannot be easily changed. This is because some Hardware
RAID requires a smaller struct scsi_device->max_sectors to handle
smaller stripe sizes for their arrays.

*) Some adapters in drivers/scsi which are not REAL SCSI devices
emulate none or only some of the WWN or control logic mentioned above.
I have had to do a couple of hacks over the years in LIO-Core/PSCSI to
make everything play nice going to the client side of the cloud; check
out iscsi_target_pscsi.c:pscsi_transport_complete() to see what I mean.

> And this is a real showstopper for making LIO-Core
> the default and the only SCSI target framework. SCST is SCSI-centric,

Well, one needs to understand that the LIO-Core subsystem API is more
than a SCSI target framework. It is a generic method of accessing any
possible storage object of the storage stack, and having said engine
handle the hardware restrictions (be they physical or virtual) for the
underlying storage object. It can run as a SCSI engine to real (or
emulated) SCSI hardware from linux/drivers/scsi, but the real strength
is that it sits above the SCSI/BLOCK/FILE layers and uses a single
codepath for all underlying storage objects. For example, in the
lio-core-2.6.git tree I chose the location linux/drivers/lio-core,
because LIO-Core uses 'struct file' from fs/, 'struct block_device'
from block/ and 'struct scsi_device' from drivers/scsi.

It's worth noting that I am still doing the re-org of LIO-Core and
LIO-Target v3.0.0, but this will be coming soon, along with the first
non-traditional iSCSI packets to run across LIO-Core.

> just because there's no way to make a *SCSI* target framework that is
> not SCSI-centric. Nobody blames the Linux SCSI (initiator) mid-layer
> for being SCSI-centric, correct?

Well, as we have discussed before, the emulation of the SCSI control
path is really a whole different monster, and I am certainly not
interested in having to emulate all of the t10.org standards
myself. :-)

> - It seems a bit overcomplicated, because it has too many abstract
> interfaces where there's not much need for them. [...]

Well, I have discussed why I think the LIO-Core design (which was more
necessity at the start) has been able to work for all kernel
subsystems/storage objects on all architectures for v2.2, v2.4 and v2.6
kernels. I also mention these at the 10,000 ft level in my LSF '08
presentation.

> - Pass-through mode (PSCSI) also provides a non-enforced 1-to-1
> relationship, as it used to be in STGT (support for pass-through mode
> now seems to have been removed from STGT), which isn't mentioned
> anywhere.

Please be more specific about what you mean here. Also, note that
because PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the
limitations of the storage object through the LIO-Core subsystem API.
This means that things like (received initiator CDB sectors > LIO-Core
storage object max_sectors) are handled generically by LIO-Core, using
a single set of algorithms for all I/O interaction with Linux storage
systems. These algorithms are also the same for DIFFERENT types of
transport fabrics, both those that expect LIO-Core to allocate memory
and those where the hardware has preallocated memory and possible
restrictions from the CPU/BUS architecture (take non-cache-coherent
MIPS for example) on how the memory gets DMA'ed or PIO'ed down to the
packet's intended storage object.

> - There is some confusion in the code in the function and variable
> names between persistent and SAM-2 reservations.

Well, that would be because persistent reservations are not emulated
generally for all of the subsystem plugins just yet. Obviously with
LIO-Core/PSCSI, if the underlying hardware supports it, it will work.

> - There is at least one SCSI standard violation: target and LUN
> resets don't clear the reservation.

Noted. This needs to be fixed in v3.0.0 and then backported to
v2.9-STABLE.

> Sure. If my note hurts you, I can remove it. But you should also
> remove from your presentation and the summary paper those
> psychological arguments, to not confuse people.

It's not about removing, it is about updating the page to better
reflect the bigger picture, so folks coming to the site can get the
latest information from the last update.

Many thanks for your most valuable of time,

--nab

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released 2008-07-09 19:34 ` Nicholas A. Bellinger @ 2008-07-10 18:25 ` Vladislav Bolkhovitin 2008-07-10 21:26 ` Nicholas A. Bellinger 0 siblings, 1 reply; 11+ messages in thread From: Vladislav Bolkhovitin @ 2008-07-10 18:25 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: linux-kernel, linux-scsi, scst-devel, Linux-iSCSI.org Target Dev, nab Nicholas A. Bellinger wrote: >> I have only documents, which I referenced. In them, especially >> in "2008 Linux Storage & Filesystem Workshop" summary, it doesn't look >> as I took it out of context. You put emphasis on "older" vs >> "current"/"new", didn't you ;)? > > Well, my job was to catch everyone up to speed on the status of the 4 > (four) different (insert your favorite SAM capable transport name here) > Linux v2.6 based target projects. With all of the acroynms for the > standards+implementations+linux-kernel being extremly confusing to > anyone who does know all of them by heart. Even those people in the > room, who where fimilar with storage, but not necessarly with target > mode engine design, its hard to follow. Yes, this is a problem. Even storage experts are not too familiar with SCSI internals and not willing much to get better familiarity. Hence, almost nobody really understands for what is all those SCSI processing in SCST.. >> BTW, there are another inaccuracies on your slides: >> >> - STGT doesn't support "hardware accelerated traditional iSCSI >> (Qlogic)", at least I have not found any signs of it. >> > > <nod>, that is correct. It does it's hardware acceleration generically > using OFA VERBS for hardware that do wire protocol that implements > fabric dependent direct data placement. iSER does this with 504[0-4], > and I don't recall exactly how IB does it. 
Anyways, the point is that > they use a single interface so that hardware vendors do not have to > implement their own APIs, which are very complex, and usually very buggy > when coming from a company who is trying to get a design into ASIC. ISER is "iSCSI Extensions for RDMA", while usually under "hardware accelerated traditional iSCSI" people mean regular hardware iSCSI cards, like QLogic 4xxx. Hence, your sentence for most people, including myself, was incorrect and confusing. >> But, when I have time for careful look, I'm going to write some LIO >> critics. So far, at the first glance: >> >> - It is too iSCSI-centric. ISCSI is a very special transport, so looks >> like when you decide to add in LIO drivers for other transports, >> especially for parallel SCSI and SAS, you are going to have big troubles >> and major redesign. > > Not true. Because LIO-Core subsystem API is battle hardened (you could > say it is the 2nd oldest, behind UNH's :), allocating LIO-Core SE tasks > (that then get issued to LIO-Core subsystem plugins) from a SCSI CDB > with sectors+offset for ICF_SCSI_DATA_SG_IO_CDB, or a generically > emulated SCSI control CDB or logic in LIO-Core, or using LIO-Core/PSCSI > to let the underlying hardware do its thing, but still fill in the holes > so that *ANY* SCSI subsystem, including from different OSes, can talk > with storage objects behind LIO-Core when running in initiator mode > amoungst the possible fabrics. Some of the classic examples here are: > > *) Because the Solaris 10 SCSI subsystem requiring all iSCSI devices to > have EVPD information, otherwise LUN registration would fail. This > means that suddently struct block_device and struct file need to have > WWN information, which may be DIFFERENT based upon if said object was a > Linux/MD or LVM block device, for example. > > *) Every cluster design that required block level shared storage needs > to have at least SAM-2 Reservations. 
> > *) Exporting via LIO-Core Hardware RAID adapters on OSes where > max_sectors cannot be easily changed. This is because some Hardware > RAID requires a smaller struct scsi_device->max_sector to handle smaller > stripe sizes for their arrays. > > *) Some adapters in drivers/scsi which are not REAL SCSI devices emulate > none/some WWN or control logic mentioned above. I have had to do a > couple of hacks over the years in LIO-Core/PSCSI to make everything > place nice going to the client side of the cloud, check out > iscsi_target_pscsi.c:pscsi_transport_complete() to see what I mean. I meant something different: interface between target drivers and SCSI target core. Here (seems) you are going to have big troubles when you try to add not-iSCSI transport, like FC, for instance. >> And this is a real showstopper for making LIO-Core >> the default and the only SCSI target framework. SCST is SCSI-centric, > > Well, one needs to understand that LIO-Core subsystem API is more than a > SCSI target framework. Its a generic method of accessing any possible > storage object of the storage stack, and having said engine handle the > hardware restrictions (be they physical or virtual) for the underlying > storage object. It can run as a SCSI engine to real (or emualted) SCSI > hardware from linux/drivers/scsi, but the real strength is that it sits > above the SCSI/BLOCK/FILE layers and uses a single codepath for all > underlying storage objects. For example in the lio-core-2.6.git tree, I > chose the location linux/drivers/lio-core, because LIO-Core uses 'struct > file' from fs/, 'struct block_device' from block/ and struct scsi_device > from drivers/scsi. SCST and iSCSI-SCST, basically, do the same things, except iSCSI MC/S and related, + something more, like 1-to-many pass-through and scst_user, which need a big chunks of code, correct? 
And they are together about 2 times smaller: $ find core-iscsi/svn/trunk/target/target -type f -name "*.[ch]"|xargs wc 59764 163202 1625877 total + $ find core-iscsi/svn/trunk/target/include -type f -name "*.[ch]"|xargs 2981 9316 91930 total = 62745 1717807 vs $ find svn/trunk/scst -type f -name "*.[ch]"|xargs wc 28327 77878 734625 total + $ find svn/trunk/iscsi-scst/kernel -type f -name "*.[ch]"|xargs wc 7857 20394 194693 total = 36184 929318 Or did I count incorrectly? > Its worth to note that I am still doing the re-org of LIO-Core and > LIO-Target v3.0.0, but this will be coming soon along with the first non > traditional iSCSI packets to run across LIO-Core. > >> just because there's no way to make *SCSI* target framework not being >> SCSI-centric. Nobody blames Linux SCSI (initiator) mid-layer for being >> SCSI-centric, correct? > > Well, as we have discussed before, the emulation of the SCSI control > path is really a whole different monster, and I am certainly not > interested in having to emulate all of the t10.org standards > myself. :-) Sure, there optional things. But there are also requirements, which must be followed. So, this isn't about interested or not, this is about must do or don't do at all. >> - Seems, it's a bit overcomplicated, because it has too many abstract >> interfaces where there's not much need it them. Having too many abstract >> interfaces makes code analyze a lot more complicated. For comparison, >> SCST has only 2 such interfaces: for target drivers and for backstorage >> dev handlers. Plus, there is half-abstract interface for memory >> allocator (sgv_pool_set_allocator()) to allow scst_user to allocate user >> space supplied pages. And they cover all needs. > > Well, I have discussed why I think the LIO-Core design (which was more > neccessity at the start) has been able to work with for all kernel > subsystems/storage objects on all architectures for v2.2, v2.4 and v2.6 > kernels. 
I also mention these at the 10,000 ft level in my LSF 08' pres. Nobody in the Linux kernel community is interested in having code in the kernel that is obsolete or unneeded for the current kernel version, so if you want the LIO core to be in the kernel, you will have to do a major cleanup. Also, see the above LIO vs SCST size comparison. Is the additional code all about the obsolete/currently unneeded features? >> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1 >> relationship, as it used to be in STGT (now in STGT support for >> pass-through mode seems to be removed), which isn't mentioned anywhere. >> > > Please be more specific by what you mean here. Also, note that because > PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations > of the storage object through the LIO-Core subsystem API. This means > that things like (received initiator CDB sectors > LIO-Core storage > object max_sectors) are handled generically by LIO-Core, using a single > set of algoritims for all I/O interaction with Linux storage systems. > These algoritims are also the same for DIFFERENT types of transport > fabrics, both those that expect LIO-Core to allocate memory, OR that > hardware will have preallocated memory and possible restrictions from > the CPU/BUS architecture (take non-cache coherent MIPS for example) of > how the memory gets DMA'ed or PIO'ed down to the packet's intended > storage object. See here: http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg06911.html >> - There is some confusion in the code in the function and variable >> names between persistent and SAM-2 reservations. > > Well, that would be because persistent reservations are not emulated > generally for all of the subsystem plugins just yet. Obviously with > LIO-Core/PSCSI if the underlying hardware supports it, it will work.
What you did (passing reservation commands directly to devices and nothing more) will work only with a single initiator per device, where in the majority of cases reservations are not needed at all. With multiple initiators, as in clusters, where reservations are really needed, it will sooner or later lead to data corruption. See the message referenced above, as well as the whole thread. >>> The more in fighting between the >>> leaders in our community, the less the community benefits. >> Sure. If my note hurts you, I can remove it. But you should also remove >> from your presentation and the summary paper those psychological >> arguments to not confuse people. >> > > Its not about removing, it is about updating the page to better reflect > the bigger picture so folks coming to the sight can get the latest > information from last update. Your suggestions? > Many thanks for your most valuable of time, > > --nab > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
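The failure mode Vladislav describes above can be sketched with a toy model (all class and method names below are hypothetical, for illustration only): when a target passes SPC-2 RESERVE straight through, the backend device sees every forwarded command as coming from the target's single initiator port, so reservations taken by different real initiators never conflict with each other; a target that emulates the reservation per initiator catches the second RESERVE.

```python
# Toy model of pass-through vs. emulated SPC-2 reservations.
# All names are hypothetical; this is not SCST or LIO code.

class BackendDevice:
    """Real SCSI device: tracks which port holds the reservation."""
    def __init__(self):
        self.reserved_by = None

    def reserve(self, port):
        if self.reserved_by not in (None, port):
            return "RESERVATION CONFLICT"
        self.reserved_by = port
        return "GOOD"

class PassThroughTarget:
    """Forwards RESERVE as-is: the device only ever sees one port,
    so the identity of the real initiator is lost."""
    def __init__(self, dev):
        self.dev = dev

    def reserve(self, initiator):
        return self.dev.reserve("target-port-0")

class EnforcingTarget:
    """Emulates the reservation itself, per real initiator."""
    def __init__(self, dev):
        self.dev = dev
        self.holder = None

    def reserve(self, initiator):
        if self.holder not in (None, initiator):
            return "RESERVATION CONFLICT"
        self.holder = initiator
        return self.dev.reserve("target-port-0")

# Pass-through: both initiators "hold" the reservation -> corruption risk.
pt = PassThroughTarget(BackendDevice())
print(pt.reserve("init-A"), pt.reserve("init-B"))   # GOOD GOOD

# Enforcing: the second initiator is correctly rejected.
en = EnforcingTarget(BackendDevice())
print(en.reserve("init-A"), en.reserve("init-B"))   # GOOD RESERVATION CONFLICT
```

With a single initiator per device both variants behave identically, which is why the problem only surfaces in cluster setups.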
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released 2008-07-10 18:25 ` Vladislav Bolkhovitin @ 2008-07-10 21:26 ` Nicholas A. Bellinger 2008-07-11 18:41 ` Vladislav Bolkhovitin 0 siblings, 1 reply; 11+ messages in thread From: Nicholas A. Bellinger @ 2008-07-10 21:26 UTC (permalink / raw) To: Vladislav Bolkhovitin Cc: linux-kernel, linux-scsi, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin On Thu, 2008-07-10 at 22:25 +0400, Vladislav Bolkhovitin wrote: > Nicholas A. Bellinger wrote: > >> I have only documents, which I referenced. In them, especially > >> in "2008 Linux Storage & Filesystem Workshop" summary, it doesn't look > >> as I took it out of context. You put emphasis on "older" vs > >> "current"/"new", didn't you ;)? > > > > Well, my job was to catch everyone up to speed on the status of the 4 > > (four) different (insert your favorite SAM capable transport name here) > > Linux v2.6 based target projects. With all of the acroynms for the > > standards+implementations+linux-kernel being extremly confusing to > > anyone who does know all of them by heart. Even those people in the > > room, who where fimilar with storage, but not necessarly with target > > mode engine design, its hard to follow. > > Yes, this is a problem. Even storage experts are not too familiar with > SCSI internals and not willing much to get better familiarity. Hence, > almost nobody really understands for what is all those SCSI processing > in SCST.. > Which is why being specific when we talk about these many varied subjects (see below) that all fit into the bigger picture we all want to get to (see VHACS) is of utmost importance. 
> >> BTW, there are another inaccuracies on your slides: > >> > >> - STGT doesn't support "hardware accelerated traditional iSCSI > >> (Qlogic)", at least I have not found any signs of it. > >> > > > > <nod>, that is correct. It does it's hardware acceleration generically > > using OFA VERBS for hardware that do wire protocol that implements > > fabric dependent direct data placement. iSER does this with 504[0-4], > > and I don't recall exactly how IB does it. Anyways, the point is that > > they use a single interface so that hardware vendors do not have to > > implement their own APIs, which are very complex, and usually very buggy > > when coming from a company who is trying to get a design into ASIC. > > ISER is "iSCSI Extensions for RDMA", while usually under "hardware > accelerated traditional iSCSI" people mean regular hardware iSCSI cards, > like QLogic 4xxx. Hence, your sentence for most people, including > myself, was incorrect and confusing. Yes, I know the difference between traditional TCP and Direct Data Placement (DDP) on multiple fabric interconnects. (I own multiple Qlogic, Intel, Alacratec traditional iSCSI cards myself, and have gotten them all to work with LIO at some point). The point that I was making is that OFA VERBS does it INDEPENDENT of the vendor actually PRODUCING the card/chip/whatever. That means: I) It makes the vendor's job easier producing silicon, because they don't need to spend lots of extra engineering resources on producing the types of APIs (VERBS, DAPL, MPI) that (some) cluster guys need for their apps. II) It allows other vendors who are also making hardware to implement the same fabric to benefit from others using/building/changing the code. III) It allows storage engine architects (like ourselves) to use a single API (and codebase with OFA) to push packets DDP packet for iSER (RFC-5045) to the engine. Anyways, the point is that with traditional iSCSI hardware acceleration. 
there was never anything like that, because those implementations, most notably TOE (yes, I also worked on TOE hardware at one point too :-), were always considered a 'point in time' solution. > >> But, when I have time for careful look, I'm going to write some LIO > >> critics. So far, at the first glance: > >> > >> - It is too iSCSI-centric. ISCSI is a very special transport, so looks > >> like when you decide to add in LIO drivers for other transports, > >> especially for parallel SCSI and SAS, you are going to have big troubles > >> and major redesign. > > > > Not true. Because LIO-Core subsystem API is battle hardened (you could > > say it is the 2nd oldest, behind UNH's :), allocating LIO-Core SE tasks > > (that then get issued to LIO-Core subsystem plugins) from a SCSI CDB > > with sectors+offset for ICF_SCSI_DATA_SG_IO_CDB, or a generically > > emulated SCSI control CDB or logic in LIO-Core, or using LIO-Core/PSCSI > > to let the underlying hardware do its thing, but still fill in the holes > > so that *ANY* SCSI subsystem, including from different OSes, can talk > > with storage objects behind LIO-Core when running in initiator mode > > amoungst the possible fabrics. Some of the classic examples here are: > > > > *) Because the Solaris 10 SCSI subsystem requiring all iSCSI devices to > > have EVPD information, otherwise LUN registration would fail. This > > means that suddently struct block_device and struct file need to have > > WWN information, which may be DIFFERENT based upon if said object was a > > Linux/MD or LVM block device, for example. > > > > *) Every cluster design that required block level shared storage needs > > to have at least SAM-2 Reservations. > > > > *) Exporting via LIO-Core Hardware RAID adapters on OSes where > > max_sectors cannot be easily changed. This is because some Hardware > > RAID requires a smaller struct scsi_device->max_sector to handle smaller > > stripe sizes for their arrays.
> > > > *) Some adapters in drivers/scsi which are not REAL SCSI devices emulate > > none/some WWN or control logic mentioned above. I have had to do a > > couple of hacks over the years in LIO-Core/PSCSI to make everything > > place nice going to the client side of the cloud, check out > > iscsi_target_pscsi.c:pscsi_transport_complete() to see what I mean. > > I meant something different: interface between target drivers and SCSI > target core. Here (seems) you are going to have big troubles when you > try to add not-iSCSI transport, like FC, for instance. > I know what you mean. The point that I am making is that LIO-Core <-> Subsystem and LIO-Target <-> LIO-Core are separated for all intents and purposes in the lio-core-2.6.git tree. Once the SCST interface between Fabric <-> Engine can be hooked up to v3.0.0 LIO-Core (Engine) <-> Subsystem (Linux Storage Stack), we will be good to go to port ALL Fabric plugins, from SCST, iSER from STGT, and eventually _NON_ SCSI fabrics as well (think AoE and Target Mode SATA). > >> And this is a real showstopper for making LIO-Core > >> the default and the only SCSI target framework. SCST is SCSI-centric, > > > > Well, one needs to understand that LIO-Core subsystem API is more than a > > SCSI target framework. Its a generic method of accessing any possible > > storage object of the storage stack, and having said engine handle the > > hardware restrictions (be they physical or virtual) for the underlying > > storage object. It can run as a SCSI engine to real (or emualted) SCSI > > hardware from linux/drivers/scsi, but the real strength is that it sits > > above the SCSI/BLOCK/FILE layers and uses a single codepath for all > > underlying storage objects. For example in the lio-core-2.6.git tree, I > > chose the location linux/drivers/lio-core, because LIO-Core uses 'struct > > file' from fs/, 'struct block_device' from block/ and struct scsi_device > > from drivers/scsi.
> > SCST and iSCSI-SCST, basically, do the same things, except iSCSI MC/S > and related, + something more, like 1-to-many pass-through and > scst_user, which need a big chunks of code, correct? And they are > together about 2 times smaller: Yes, something much more. A complete implementation of traditional iSCSI/TCP (known as RFC-3720), iSCSI/SCTP (which will be important in the future), and IPv6 (also important) is a significant amount of logic. When I say a 'complete implementation' I mean: I) Active-Active connection layer recovery (known as ErrorRecoveryLevel=2). We are going to use the same code for iSER for inter-nexus, OS-independent (eg: below the SCSI Initiator level) recovery. Again, the important part here is that recovery and outstanding task migration happen transparently to the host OS SCSI subsystem. This means (at least with iSCSI and iSER): not having to register multiple LUNs and depend (at least completely) on SCSI WWN information, and OS-dependent SCSI level multipath. II) MC/S for multiplexing (same as I), as well as being able to multiplex across multiple cards and subnets (using TCP; SCTP has multi-homing). Also being able to bring iSCSI connections up/down on the fly, until we all have iSCSI/SCTP, is very important too. III) Every possible combination of RFC-3720 defined parameter keys (and providing the apparatus to prove it). And yes, anyone can do this today against their own Target. I created core-iscsi-dv specifically for testing LIO-Target <-> LIO-Core back in 2005. Core-iSCSI-DV is the _ONLY_ _PUBLIC_ RFC-3720 domain validation tool that will actually demonstrate, using ANY data integrity tool, complete domain validation of user-defined keys.
Please have a look at:

http://linux-iscsi.org/index.php/Core-iscsi-dv
http://www.linux-iscsi.org/files/core-iscsi-dv/README

Any traditional iSCSI target mode implementation + Storage Engine + Subsystem Plugin that thinks it's ready to go into the kernel will have to pass at LEAST the 8k test loop iterations, the simplest being: HeaderDigest, DataDigest, MaxRecvDataSegmentLength (512 -> 262144, in 512 byte increments). Core-iSCSI-DV is also a great indication of stability and data integrity of hardware/software of an iSCSI Target + Engine, especially when you have multiple core-iscsi-dv nodes hitting multiple VHACS clouds on physical machines within the cluster. I have never run IET against core-iscsi-dv personally, and I don't think Ming or Ross has either. So until SOMEONE actually does this first, I think that iSCSI-SCST is more of an experiment for your own devel than a strong contender for Linux/iSCSI Target Mode. > $ find core-iscsi/svn/trunk/target/target -type f -name "*.[ch]"|xargs wc > 59764 163202 1625877 total > + > $ find core-iscsi/svn/trunk/target/include -type f -name "*.[ch]"|xargs wc > 2981 9316 91930 total > = > 62745 1717807 > > vs > > $ find svn/trunk/scst -type f -name "*.[ch]"|xargs wc > 28327 77878 734625 total > + > $ find svn/trunk/iscsi-scst/kernel -type f -name "*.[ch]"|xargs wc > 7857 20394 194693 total > = > 36184 929318 > > Or did I count incorrectly? > > > Its worth to note that I am still doing the re-org of LIO-Core and > > LIO-Target v3.0.0, but this will be coming soon along with the first non > > traditional iSCSI packets to run across LIO-Core. > > > >> just because there's no way to make *SCSI* target framework not being > >> SCSI-centric. Nobody blames Linux SCSI (initiator) mid-layer for being > >> SCSI-centric, correct?
> > > > Well, as we have discussed before, the emulation of the SCSI control > > path is really a whole different monster, and I am certainly not > > interested in having to emulate all of the t10.org standards > > myself. :-) > > Sure, there optional things. But there are also requirements, which must > be followed. So, this isn't about interested or not, this is about must > do or don't do at all. > <nod> > >> - Seems, it's a bit overcomplicated, because it has too many abstract > >> interfaces where there's not much need it them. Having too many abstract > >> interfaces makes code analyze a lot more complicated. For comparison, > >> SCST has only 2 such interfaces: for target drivers and for backstorage > >> dev handlers. Plus, there is half-abstract interface for memory > >> allocator (sgv_pool_set_allocator()) to allow scst_user to allocate user > >> space supplied pages. And they cover all needs. > > > > Well, I have discussed why I think the LIO-Core design (which was more > > neccessity at the start) has been able to work with for all kernel > > subsystems/storage objects on all architectures for v2.2, v2.4 and v2.6 > > kernels. I also mention these at the 10,000 ft level in my LSF 08' > > pres. > > Nobody in the Linux kernel community is interested to have obsolete or > unneeded for the current kernel version code in the kernel, so if you > want LIO core be in the kernel, you will have to make a major cleanup. > Obviously not. Also, what I was talking about there was the strength and flexibility of the LIO-Core design (it even ran on the Playstation 2 at one point, http://linux-iscsi.org/index.php/Playstation2/iSCSI, when MIPS r5900 boots modern v2.6, then we will do it again with LIO :-) Anyways, just so everyone is clear: v2.9-STABLE LIO-Target from Linux-iSCSI.org SVN: Works on all Modern v2.6 kernels up until >= v2.6.26. v3.0.0 LIO-Core in lio-core-2.6.git tree on kernel.org: All legacy code removed, currently at v2.6.26-rc9, tested on powerpc and x86. 
Please look at my code before making such blanket statements please. > Also, see the above LIO vs SCST size comparison. Is the additional code > all about the obsolete/currently unneeded features? > > >> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1 > >> relationship, as it used to be in STGT (now in STGT support for > >> pass-through mode seems to be removed), which isn't mentioned anywhere. > >> > > > > Please be more specific by what you mean here. Also, note that because > > PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations > > of the storage object through the LIO-Core subsystem API. This means > > that things like (received initiator CDB sectors > LIO-Core storage > > object max_sectors) are handled generically by LIO-Core, using a single > > set of algoritims for all I/O interaction with Linux storage systems. > > These algoritims are also the same for DIFFERENT types of transport > > fabrics, both those that expect LIO-Core to allocate memory, OR that > > hardware will have preallocated memory and possible restrictions from > > the CPU/BUS architecture (take non-cache coherent MIPS for example) of > > how the memory gets DMA'ed or PIO'ed down to the packet's intended > > storage object. > > See here: > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg06911.html > <nod> > >> - There is some confusion in the code in the function and variable > >> names between persistent and SAM-2 reservations. > > > > Well, that would be because persistent reservations are not emulated > > generally for all of the subsystem plugins just yet. Obviously with > > LIO-Core/PSCSI if the underlying hardware supports it, it will work. > > What you did (passing reservation commands directly to devices and > nothing more) will work only with a single initiator per device, where > reservations in the majority of cases are not needed at all. 
I know, like I said, implementing Persistent Reservations for stuff besides real SCSI hardware with LIO-Core/PSCSI is a TODO item. Note that the VHACS cloud (see below) will need this for DRBD objects at some point. > With > multiple initiators, as it is in clusters and where reservations are > really needed, it will sooner or later lead to data corruption. See the > referenced above message as well as the whole thread. > Obviously with any target, if a non-shared resource is accessed by multiple initiator/client nodes and there is no data coherency layer, reservations, ACLs, or whatever, there is going to be a problem. That is a no-brainer. Now, shared resources, such as a Quorum disk for a traditional cluster design, or a cluster filesystem (such as OCFS2, GFS, Lustre, etc.), handle data coherency just fine with SPC-2 Reserve today with all LIO-Core v2.9 and v3.0.0 storage objects from all subsystems. > >>> The more in fighting between the > >>> leaders in our community, the less the community benefits. > >> Sure. If my note hurts you, I can remove it. But you should also remove > >> from your presentation and the summary paper those psychological > >> arguments to not confuse people. > >> > > > > Its not about removing, it is about updating the page to better reflect > > the bigger picture so folks coming to the sight can get the latest > > information from last update. > > Your suggestions? > I would consider helping with this at some point, but as you can see, I am extremely busy ATM. I have looked at SCST quite a bit over the years, but I am not the one making a public comparison page, at least not yet. :-) So until then, at least explain how there are 3 projects on your page, with the updated 10,000 ft overviews, and maybe even add some links to LIO-Target and a bit about VHACS cloud. I would be willing to include info about SCST into the Linux-iSCSI.org wiki.
Also, please feel free to open an account and start adding stuff about SCST yourself to the site. For Linux-iSCSI.org and VHACS (which is really where everything is going now), please have a look at: http://linux-iscsi.org/index.php/VHACS-VM http://linux-iscsi.org/index.php/VHACS Btw, the VHACS and LIO-Core design will allow for other fabrics to be used inside our cloud, and between other virtualized client setups who speak the wire protocol presented by the server side of VHACS cloud. Many thanks for your most valuable of time, --nab ^ permalink raw reply [flat|nested] 11+ messages in thread
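The `wc` totals traded back and forth in this thread come from pipelines of the form `find <tree> -type f -name "*.[ch]" | xargs wc`. A self-contained Python sketch of the same counting method (the tree contents here are purely illustrative, not the SCST or LIO sources):

```python
# Python equivalent of: find <tree> -type f -name "*.[ch]" | xargs wc -l
import os
import tempfile
from pathlib import Path

def count_ch_lines(tree):
    """Total line count over all *.c / *.h files under tree."""
    total = 0
    for root, _dirs, files in os.walk(tree):
        for name in files:
            if name.endswith((".c", ".h")):
                total += Path(root, name).read_text().count("\n")
    return total

# Illustrative tree: a 2-line .c file, a 3-line .h file, one ignored .txt.
tree = Path(tempfile.mkdtemp())
(tree / "core.c").write_text("int a;\nint b;\n")
(tree / "core.h").write_text("#ifndef H\n#define H\n#endif\n")
(tree / "notes.txt").write_text("ignored\n")
print(count_ch_lines(tree))   # 5
```

Note that in the shell form, dropping `wc` after `xargs` (as happened in one of the quoted commands) merely echoes the file names instead of counting anything.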
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released 2008-07-10 21:26 ` Nicholas A. Bellinger @ 2008-07-11 18:41 ` Vladislav Bolkhovitin 2008-07-12 3:28 ` [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup) Nicholas A. Bellinger 0 siblings, 1 reply; 11+ messages in thread From: Vladislav Bolkhovitin @ 2008-07-11 18:41 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: linux-kernel, linux-scsi, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin Nicholas A. Bellinger wrote: >>>> And this is a real showstopper for making LIO-Core >>>> the default and the only SCSI target framework. SCST is SCSI-centric, >>> Well, one needs to understand that LIO-Core subsystem API is more than a >>> SCSI target framework. Its a generic method of accessing any possible >>> storage object of the storage stack, and having said engine handle the >>> hardware restrictions (be they physical or virtual) for the underlying >>> storage object. It can run as a SCSI engine to real (or emualted) SCSI >>> hardware from linux/drivers/scsi, but the real strength is that it sits >>> above the SCSI/BLOCK/FILE layers and uses a single codepath for all >>> underlying storage objects. For example in the lio-core-2.6.git tree, I >>> chose the location linux/drivers/lio-core, because LIO-Core uses 'struct >>> file' from fs/, 'struct block_device' from block/ and struct scsi_device >>> from drivers/scsi. >> SCST and iSCSI-SCST, basically, do the same things, except iSCSI MC/S >> and related, + something more, like 1-to-many pass-through and >> scst_user, which need a big chunks of code, correct? And they are >> together about 2 times smaller: > > Yes, something much more. 
A complete implementation of traditional > iSCSI/TCP (known as RFC-3720), iSCSI/SCTP (which will be important in > the future), and IPv6 (also important) is a significant amount of logic. > When I say a 'complete implementation' I mean: > > I) Active-Active connection layer recovery (known as > ErrorRecoveryLevel=2). (We are going to use the same code for iSER for > inter-nexus OS independent (eg: below the SCSI Initiator level) > recovery. Again, the important part here is that recovery and > outstanding task migration happens transparently to the host OS SCSI > subsystem. This means (at least with iSCSI and iSER): not having to > register multiple LUNs and depend (at least completely) on SCSI WWN > information, and OS dependent SCSI level multipath. > > II) MC/S for multiplexing (same as I), as well as being able to > multiplex across multiple cards and subnets (using TCP, SCTP has > multi-homing). Also being able to bring iSCSI connections up/down on > the fly, until we all have iSCSI/SCTP, is very important too. > > III) Every possible combination of RFC-3720 defined parameter keys (and > provide the apparatis to prove it). And yes, anyone can do this today > against their own Target. I created core-iscsi-dv specifically for > testing LIO-Target <-> LIO-Core back in 2005. Core-iSCSI-DV is the > _ONLY_ _PUBLIC_ RFC-3720 domain validation tool that will actually > demonstrate, using ANY data integrity tool complete domain validation of > user defined keys. 
Please have a look at: > > http://linux-iscsi.org/index.php/Core-iscsi-dv > > http://www.linux-iscsi.org/files/core-iscsi-dv/README > > Any traditional iSCSI target mode implementation + Storage Engine + > Subsystem Plugin that thinks its ready to go into the kernel will have > to pass at LEAST the 8k test loop interations, the simplest being: > > HeaderDigest, DataDigest, MaxRecvDataSegmentLength (512 -> 262144, in > 512 byte increments) > > Core-iSCSI-DV is also a great indication of stability and data integrity > of hardware/software of an iSCSI Target + Engine, espically when you > have multiple core-iscsi-dv nodes hitting multiple VHACS clouds on > physical machines within the cluster. I have never run IET against > core-iscsi-dv personally, and I don't think Ming or Ross has either. So > until SOMEONE actually does this first, I think that iSCSI-SCST is more > of an experiment for your our devel that a strong contender for > Linux/iSCSI Target Mode. There are big doubts among storage experts whether features I and II are needed at all; see, e.g., http://lkml.org/lkml/2008/2/5/331. I also tend to agree that in practice MC/S is not needed for block storage or, at least, definitely isn't worth the effort, because:

1. It is useless for sync untagged operation (regular reads, in most cases, over a single stream), when there is always only one command being executed at any time, because of command connection allegiance, which forbids transferring data for a command over multiple connections.

2. The only advantage it has over traditional OS multi-pathing is keeping command execution order, but in practice there is at the moment no demand for this feature, because all OSes I know don't rely on command order to protect data integrity. They use other techniques, like queue draining.
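Point 1 above can be made concrete with a toy throughput model (all numbers and function names are illustrative, not a benchmark): under connection allegiance, each command's data is confined to the one connection it was issued on, so with queue depth 1 the extra MC/S connections sit idle, while bonding can stripe even a single command's frames across all links.

```python
# Toy model of MC/S connection allegiance vs. link bonding.

def mcs_seconds(cmd_bytes, n_cmds, n_conns, link_bps, queue_depth):
    # At most queue_depth commands are in flight, each confined to a
    # single link (connection allegiance), so the usable parallelism is
    # limited by the queue depth, not by the number of connections.
    active = min(queue_depth, n_conns, n_cmds)
    per_cmd = cmd_bytes / link_bps
    return per_cmd * n_cmds / active

def bonded_seconds(cmd_bytes, n_cmds, n_conns, link_bps):
    # Frames of a single command may be spread over all bonded links.
    return cmd_bytes * n_cmds / (link_bps * n_conns)

# 100 sync untagged 1 MiB reads (queue depth 1) over 4 x 1 Gbit links:
print(mcs_seconds(1 << 20, 100, 4, 1e9, queue_depth=1))  # no speedup over 1 link
print(bonded_seconds(1 << 20, 100, 4, 1e9))              # ~4x faster
```

With a deep queue (queue_depth >= n_conns) the two models converge, which matches the claim that MC/S only pays off for heavily tagged workloads.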
A good target should itself be able to schedule incoming commands for execution in the order that is correct from a performance POV, and not rely for that on the order in which the commands came from initiators. On the other hand, device bonding also preserves command execution order, but doesn't suffer from the connection allegiance limitation of MC/S, so it can boost performance even for sync untagged operations. Plus, it's pretty simple, easy to use and doesn't need any additional code. I don't have exact numbers for an MC/S vs bonding performance comparison (mostly because open-iscsi doesn't support MC/S, but I am very curious to see them), but I have a very strong suspicion that on modern OSes, which reorder TCP frames in a zero-copy manner, there shouldn't be much difference between MC/S and bonding in maximum possible throughput, but bonding should outperform MC/S a lot in the case of sync untagged operations. Anyway, I think features I and II, if added, would increase the iSCSI-SCST kernel-side code by not more than 5K lines, because most of the code is already there; the most important missing part is fixes for locking problems, which almost never add a lot of code. Regarding Core-iSCSI-DV, I'm sure iSCSI-SCST will pass it without problems for the required set of iSCSI features, although there are still some limitations derived from IET; for instance, support for multi-PDU commands in discovery sessions isn't implemented. But for adding optional iSCSI features to iSCSI-SCST there should be good *practical* reasons, which at the moment don't exist. And unused features are bad features, because they overcomplicate the code and make its maintenance harder for no gain. So, current SCST+iSCSI-SCST 36K lines + 5K new lines = 41K lines, which is still a lot less than LIO's 63K lines. I downloaded the cleaned-up lio-core-2.6.git tree and:

$ find lio-core-2.6/drivers/lio-core -type f -name "*.[ch]" | xargs wc
57064 156617 1548344 total

Still much bigger.

> Obviously not.
Also, what I was talking about there was the strength > and flexibility of the LIO-Core design (it even ran on the Playstation 2 > at one point, http://linux-iscsi.org/index.php/Playstation2/iSCSI, when > MIPS r5900 boots modern v2.6, then we will do it again with LIO :-) SCST and the target drivers have been successfully run on PPC and Sparc64, so I don't see any reason why they can't be run on Playstation 2 as well. >>>> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1 >>>> relationship, as it used to be in STGT (now in STGT support for >>>> pass-through mode seems to be removed), which isn't mentioned anywhere. >>>> >>> Please be more specific by what you mean here. Also, note that because >>> PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations >>> of the storage object through the LIO-Core subsystem API. This means >>> that things like (received initiator CDB sectors > LIO-Core storage >>> object max_sectors) are handled generically by LIO-Core, using a single >>> set of algoritims for all I/O interaction with Linux storage systems. >>> These algoritims are also the same for DIFFERENT types of transport >>> fabrics, both those that expect LIO-Core to allocate memory, OR that >>> hardware will have preallocated memory and possible restrictions from >>> the CPU/BUS architecture (take non-cache coherent MIPS for example) of >>> how the memory gets DMA'ed or PIO'ed down to the packet's intended >>> storage object. >> See here: >> http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg06911.html >> > > <nod> > >>>> - There is some confusion in the code in the function and variable >>>> names between persistent and SAM-2 reservations. >>> Well, that would be because persistent reservations are not emulated >>> generally for all of the subsystem plugins just yet. Obviously with >>> LIO-Core/PSCSI if the underlying hardware supports it, it will work.
>> What you did (passing reservation commands directly to devices and >> nothing more) will work only with a single initiator per device, where >> reservations in the majority of cases are not needed at all. > > I know, like I said, implementing Persistent Reservations for stuff > besides real SCSI hardware with LIO-Core/PSCSI is a TODO item. Note > that the VHACS cloud (see below) will need this for DRBD objects at some > point. The problem is that persistent reservations don't work for multiple initiators even for real SCSI hardware with LIO-Core/PSCSI and I clearly described why in the referenced e-mail. Nicholas, why don't you want to see it? >>>>> The more in fighting between the >>>>> leaders in our community, the less the community benefits. >>>> Sure. If my note hurts you, I can remove it. But you should also remove >>>> from your presentation and the summary paper those psychological >>>> arguments to not confuse people. >>>> >>> Its not about removing, it is about updating the page to better reflect >>> the bigger picture so folks coming to the sight can get the latest >>> information from last update. >> Your suggestions? >> > > I would consider helping with this at some point, but as you can see, I > am extremly busy ATM. I have looked at SCST quite a bit over the years, > but I am not the one making a public comparision page, at least not > yet. :-) So until then, at least explain how there are 3 projects on > your page, with the updated 10,000 ft overviews, and mabye even add some > links to LIO-Target and a bit about VHACS cloud. I would be willing to > include info about SCST into the Linux-iSCSI.org wiki. Also, please > feel free to open an account and start adding stuff about SCST yourself > to the site. 
> > For Linux-iSCSI.org and VHACS (which is really where everything is going > now), please have a look at: > > http://linux-iscsi.org/index.php/VHACS-VM > http://linux-iscsi.org/index.php/VHACS > > Btw, the VHACS and LIO-Core design will allow for other fabrics to be > used inside our cloud, and between other virtualized client setups who > speak the wire protocol presented by the server side of VHACS cloud. > > Many thanks for your most valuable of time, > > --nab > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
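The Core-iSCSI-DV sweep quoted in this exchange (HeaderDigest, DataDigest, MaxRecvDataSegmentLength from 512 to 262144 in 512-byte steps) can be sized with a short sketch. The parameter set here is deliberately simplified to just those three keys; the "8k test loop iterations" figure presumably covers additional key combinations beyond these:

```python
# Sizing sketch of a negotiation-key sweep (simplified parameter set).
from itertools import product

mrdsl_values = range(512, 262144 + 1, 512)   # 512 segment sizes
digests = ["None", "CRC32C"]                 # HeaderDigest / DataDigest

combos = list(product(mrdsl_values, digests, digests))
print(len(combos))   # 2048 login/negotiation iterations for these keys alone
```

Each combination corresponds to one login/negotiation/IO cycle against the target, which is why a full run is a meaningful stability and data-integrity test.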
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup) 2008-07-11 18:41 ` Vladislav Bolkhovitin @ 2008-07-12 3:28 ` Nicholas A. Bellinger 2008-07-12 7:52 ` [Scst-devel] " Bart Van Assche ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Nicholas A. Bellinger @ 2008-07-12 3:28 UTC (permalink / raw) To: Vladislav Bolkhovitin Cc: linux-kernel, linux-scsi, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin On Fri, 2008-07-11 at 22:41 +0400, Vladislav Bolkhovitin wrote: > Nicholas A. Bellinger wrote: > >>>> And this is a real showstopper for making LIO-Core > >>>> the default and the only SCSI target framework. SCST is SCSI-centric, > >>> Well, one needs to understand that LIO-Core subsystem API is more than a > >>> SCSI target framework. Its a generic method of accessing any possible > >>> storage object of the storage stack, and having said engine handle the > >>> hardware restrictions (be they physical or virtual) for the underlying > >>> storage object. It can run as a SCSI engine to real (or emualted) SCSI > >>> hardware from linux/drivers/scsi, but the real strength is that it sits > >>> above the SCSI/BLOCK/FILE layers and uses a single codepath for all > >>> underlying storage objects. For example in the lio-core-2.6.git tree, I > >>> chose the location linux/drivers/lio-core, because LIO-Core uses 'struct > >>> file' from fs/, 'struct block_device' from block/ and struct scsi_device > >>> from drivers/scsi. > >> SCST and iSCSI-SCST, basically, do the same things, except iSCSI MC/S > >> and related, + something more, like 1-to-many pass-through and > >> scst_user, which need a big chunks of code, correct? And they are > >> together about 2 times smaller: > > > > Yes, something much more. 
> > A complete implementation of traditional
> > iSCSI/TCP (known as RFC-3720), iSCSI/SCTP (which will be important in
> > the future), and IPv6 (also important) is a significant amount of logic.
> > When I say a 'complete implementation' I mean:
> >
> > I) Active-Active connection layer recovery (known as
> > ErrorRecoveryLevel=2). We are going to use the same code for iSER for
> > inter-nexus OS independent (eg: below the SCSI Initiator level)
> > recovery. Again, the important part here is that recovery and
> > outstanding task migration happen transparently to the host OS SCSI
> > subsystem. This means (at least with iSCSI and iSER): not having to
> > register multiple LUNs and depend (at least completely) on SCSI WWN
> > information and OS dependent SCSI level multipath.
> >
> > II) MC/S for multiplexing (same as I), as well as being able to
> > multiplex across multiple cards and subnets (using TCP; SCTP has
> > multi-homing). Also being able to bring iSCSI connections up/down on
> > the fly, until we all have iSCSI/SCTP, is very important too.
> >
> > III) Every possible combination of RFC-3720 defined parameter keys (and
> > providing the apparatus to prove it). And yes, anyone can do this today
> > against their own Target. I created core-iscsi-dv specifically for
> > testing LIO-Target <-> LIO-Core back in 2005. Core-iSCSI-DV is the
> > _ONLY_ _PUBLIC_ RFC-3720 domain validation tool that will actually
> > demonstrate, using ANY data integrity tool, complete domain validation of
> > user defined keys.
> > Please have a look at:
> >
> > http://linux-iscsi.org/index.php/Core-iscsi-dv
> >
> > http://www.linux-iscsi.org/files/core-iscsi-dv/README
> >
> > Any traditional iSCSI target mode implementation + Storage Engine +
> > Subsystem Plugin that thinks it's ready to go into the kernel will have
> > to pass at LEAST the 8k test loop iterations, the simplest being:
> >
> > HeaderDigest, DataDigest, MaxRecvDataSegmentLength (512 -> 262144, in
> > 512 byte increments)
> >
> > Core-iSCSI-DV is also a great indication of stability and data integrity
> > of hardware/software of an iSCSI Target + Engine, especially when you
> > have multiple core-iscsi-dv nodes hitting multiple VHACS clouds on
> > physical machines within the cluster. I have never run IET against
> > core-iscsi-dv personally, and I don't think Ming or Ross has either.

Ming or Ross, would you like to make a comment on this, considering,
after all, it is your work..?

> > So until SOMEONE actually does this first, I think that iSCSI-SCST is
> > more of an experiment for your own devel than a strong contender for
> > Linux/iSCSI Target Mode.
>
> There are big doubts among storage experts if features I and II are
> needed at all, see, e.g. http://lkml.org/lkml/2008/2/5/331.

Well, jgarzik is both a NETWORKING and STORAGE (he was a networking guy
first, mind you) expert!

> I also tend to agree that, for block storage in practice, MC/S is not
> needed or, at least, definitely isn't worth the effort, because:

Trying to argue against MC/S (or against any other major part of
RFC-3720, including ERL=2) is saying that Linux/iSCSI should be BEHIND
what the greatest minds in the IETF have produced (and learned) from
iSCSI. Considering so many people are interested in seeing Linux/iSCSI
be the best and most complete implementation possible, surely one would
not be foolish enough to try to debate that Linux should be BEHIND what
others have figured out, be it with RFCs or running code.
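The parameter sweep described above (HeaderDigest, DataDigest, and
MaxRecvDataSegmentLength stepped from 512 to 262144 in 512-byte increments)
is easy to picture as an enumeration of the key space. The sketch below is a
hypothetical illustration of the combinatorics only, not core-iscsi-dv's
actual code; it assumes each digest key takes two values (None, CRC32C),
whereas the real tool logs in to a live target and runs a data-integrity
test for every combination.

```python
from itertools import product

def dv_parameter_sweep():
    """Enumerate login-parameter combinations in the spirit of a
    core-iscsi-dv style sweep (hypothetical sketch, not the real tool)."""
    digests = ["None", "CRC32C"]           # assumed value set per digest key
    mrdsl = range(512, 262144 + 1, 512)    # 512 -> 262144 in 512-byte steps
    for hd, dd, length in product(digests, digests, mrdsl):
        yield {"HeaderDigest": hd, "DataDigest": dd,
               "MaxRecvDataSegmentLength": length}

combos = list(dv_parameter_sweep())
print(len(combos))  # 2 * 2 * 512 = 2048 combinations
```

With a couple more two-valued keys in the mix, the count reaches 8192, which
is the order of the "8k test loop iterations" mentioned above.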
Also, you should understand that MC/S is about more than just moving data
I/O across multiple TCP connections; it's about being able to bring those
paths up/down on the fly without having to actually STOP/PAUSE anything.
Then you add the ERL=2 pixie dust, which, you should understand, is the
result of over a decade of work creating RFC-3720 within the IETF IPS TWG.
What you have is a fabric that does not STOP/PAUSE from an OS INDEPENDENT
LEVEL (below the OS dependent SCSI subsystem layer) perspective, on every
possible T/I node, big and small, open or closed platform. Even as we move
towards more logic in the network layer (a la Stream Control Transmission
Protocol), we will still benefit from RFC-3720 as the years roll on. Quite
a powerful thing..

> 1. It is useless for sync untagged operation (regular reads in most
> cases over a single stream), when there is always only one command being
> executed at any time, because of the command connection allegiance,
> which forbids transferring data for a command over multiple connections.

This is a very Parallel SCSI centric way of looking at the design of SAM,
since SAM allows the transport fabric to enforce its own ordering rules
(it does offer some of its own SCSI level ones, of course). Obviously each
fabric (PSCSI, FC, SAS, iSCSI) is very different from the bus phase
perspective. But, if you look back into the history of iSCSI, you will see
that an asymmetric design with separate CONTROL/DATA TCP connections was
considered originally BEFORE the Command Sequence Number (CmdSN) ordering
algorithm was adopted that allows both SINGLE and MULTIPLE TCP connections
to move both CONTROL/DATA packets across an iSCSI Nexus.

Using MC/S with a modern iSCSI implementation to take advantage of lots of
cores and hardware threads is something that allows one to multiplex
across multiple vendors' NIC ports, with the least possible overhead, in
an OS INDEPENDENT manner.
Keep in mind that you can do the allocation and RX of WRITE data OOO, but
the actual *EXECUTION* down via the subsystem API (which is what
LIO-Target <-> LIO-Core does, in a generic way) MUST BE in the same order
as the CDBs came from the iSCSI Initiator port. This is the only
requirement of the iSCSI CmdSN ordering rules wrt the SCSI Architecture
Model.

> 2. The only advantage it has over traditional OS multi-pathing is
> keeping command execution order, but in practice at the moment there is
> no demand for this feature, because all OSes I know don't rely on
> command order to protect data integrity. They use other techniques,
> like queue draining. A good target should itself be able to schedule
> incoming commands for execution in the order that is correct from a
> performance POV, and not rely for that on the order in which the
> commands came from the initiators.

Ok, you are completely missing the point of MC/S and ERL=2. Notice how it
works in both iSCSI *AND* iSER (even across DDP fabrics!). I discussed the
significant benefit of ERL=2 in numerous previous threads. But they can
all be neatly summarized in:

http://linux-iscsi.org/builds/user/nab/Inter.vs.OuterNexus.Multiplexing.pdf

Internexus Multiplexing is DESIGNED to work with OS dependent multipath
transparently, and as a matter of fact it complements it quite well, in an
OSI (independent) method. It's completely up to the admin to determine the
benefit and configure the knobs.

So, the bit: "We should not implement this important part of the RFC just
because I want some code in the kernel" is not going to get your design
very far.

> On the other side, device bonding also preserves command execution
> order, but doesn't suffer from the connection allegiance limitation of
> MC/S, so it can boost performance even for sync untagged operations.
> Plus, it's pretty simple, easy to use and doesn't need any additional
> code.
> I don't have the exact numbers for an MC/S vs bonding performance
> comparison (mostly because open-iscsi doesn't support MC/S, but I'm very
> curious to see them), but I have a very strong suspicion that on modern
> OSes, which do TCP frame reordering in a zero-copy manner, there
> shouldn't be much performance difference between MC/S and bonding in the
> maximum possible throughput, but bonding should outperform MC/S a lot in
> the case of sync untagged operations.

Simple case here for you to get your feet wet with MC/S. Try doing
bonding across 4x GB/sec ports on a 2x socket, 2x core x86_64 and compare
MC/S vs. OS dependent network bonding and see what you find. There are
about two iSCSI initiators for two OSes that implement MC/S against
LIO-Target. Anyone interested in the CPU overhead of this setup between
MC/S and Link Layer bonding across 2x 2x 1 Gb/sec port chips on a 4 core
x86_64..?

> Anyway, I think features I and II, if added, would increase the
> iSCSI-SCST kernel side code by not more than 5K lines, because most of
> the code is already there; the most important part which is missing is
> fixes for locking problems, which almost never add a lot of code.

You can think whatever you want. Why don't you have a look at
lio-core-2.6.git and see how big they are for yourself.

> Relating Core-iSCSI-DV, I'm sure iSCSI-SCST will pass without problems
> the required set of iSCSI features, although there are still some
> limitations, derived from IET, for instance, support for multi-PDU
> commands in discovery sessions, which isn't implemented. But for adding
> optional iSCSI features to iSCSI-SCST there should be good *practical*
> reasons, which at the moment don't exist. And unused features are bad
> features, because they overcomplicate the code and make its maintenance
> harder for no gain.

Again, you can think whatever you want.
But since you did not implement the majority of the iSCSI-SCST code
yourself (or implement your own iSCSI Initiator in parallel with your own
iSCSI Target), I do not believe you are in a position to say. Any IET devs
want to comment on this..?

> So, current SCST+iSCSI-SCST 36K lines + 5K new lines = 41K lines, which
> is still a lot less than LIO's 63K lines. I downloaded the cleaned-up
> lio-core-2.6.git tree and:

Blindly comparing lines of code with no context is usually dumb. But,
since that is what you seem to be stuck on, how about this:

LIO 63k +
SCST (minus iSCSI) ??k +
iSER from STGT ??k ==

For the complete LIO-Core engine on fabrics, which includes what Rafiu
from Openfiler has been so kind to call LIO-Target, "arguably the most
feature complete and mature implementation out there (on any platform)"

> $ find lio-core-2.6/drivers/lio-core -type f -name "*.[ch]"|xargs wc
>  57064  156617 1548344 total
>
> Still much bigger.
>
> > Obviously not. Also, what I was talking about there was the strength
> > and flexibility of the LIO-Core design (it even ran on the Playstation 2
> > at one point, http://linux-iscsi.org/index.php/Playstation2/iSCSI; when
> > the MIPS r5900 boots modern v2.6, then we will do it again with LIO :-)
>
> SCST and the target drivers have been successfully run on PPC and
> Sparc64, so I don't see any reasons why they can't be run on Playstation
> 2 as well.

Oh it can, can it..? Does your engine's memory allocation algorithm
provide a SINGLE method for allocating linked list scatterlists
containing page links of ANY (not just PAGE_SIZE) size, handled
generically across both internal and preregistered memory allocation
cases, or coming from, say, a software RNIC moving DDP packets for iSCSI,
in a single code path..?

And then it needs to be able to go down to the PS2-Linux PATA driver,
which does not show up under the SCSI subsystem, mind you.
Surely you understand that because the MIPS r5900 is a non-cache-coherent
architecture, you simply cannot allocate multiple-page contiguous
scatterlists for your I/Os and simply expect it to work when we are
sending blocks down to the 32-bit MIPS r3000 IOP..?

> >>>> - Pass-through mode (PSCSI) also provides a non-enforced 1-to-1
> >>>> relationship, as it used to be in STGT (now in STGT support for
> >>>> pass-through mode seems to have been removed), which isn't mentioned
> >>>> anywhere.
> >>>>
> >>> Please be more specific about what you mean here. Also, note that because
> >>> PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations
> >>> of the storage object through the LIO-Core subsystem API. This means
> >>> that things like (received initiator CDB sectors > LIO-Core storage
> >>> object max_sectors) are handled generically by LIO-Core, using a single
> >>> set of algorithms for all I/O interaction with Linux storage systems.
> >>> These algorithms are also the same for DIFFERENT types of transport
> >>> fabrics, both those that expect LIO-Core to allocate memory, OR where
> >>> hardware will have preallocated memory and possible restrictions from
> >>> the CPU/BUS architecture (take non-cache-coherent MIPS for example) on
> >>> how the memory gets DMA'ed or PIO'ed down to the packet's intended
> >>> storage object.
> >> See here:
> >> http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg06911.html
> >
> > <nod>
>
> >>>> - There is some confusion in the code in the function and variable
> >>>> names between persistent and SAM-2 reservations.
> >>> Well, that would be because persistent reservations are not emulated
> >>> generally for all of the subsystem plugins just yet. Obviously with
> >>> LIO-Core/PSCSI, if the underlying hardware supports it, it will work.
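The (received initiator CDB sectors > storage object max_sectors) handling
mentioned above comes down to splitting one oversized initiator I/O into
backend-sized tasks. A minimal sketch of that splitting, using hypothetical
names and plain arithmetic rather than any real LIO-Core API:

```python
def split_io(lba, sectors, max_sectors):
    """Split one initiator I/O into (lba, length) chunks that each respect
    the backend storage object's max_sectors limit.
    Hypothetical sketch of the generic splitting an engine core performs."""
    tasks = []
    while sectors > 0:
        length = min(sectors, max_sectors)
        tasks.append((lba, length))
        lba += length
        sectors -= length
    return tasks

# A 1000-sector WRITE against a backend limited to 256 sectors per command:
print(split_io(lba=4096, sectors=1000, max_sectors=256))
# [(4096, 256), (4352, 256), (4608, 256), (4864, 232)]
```

The point of doing this once, generically, is that every transport fabric
and every backend type sees the same splitting logic.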
> >> What you did (passing reservation commands directly to devices and
> >> nothing more) will work only with a single initiator per device, where
> >> reservations in the majority of cases are not needed at all.
> >
> > I know; like I said, implementing Persistent Reservations for stuff
> > besides real SCSI hardware with LIO-Core/PSCSI is a TODO item. Note
> > that the VHACS cloud (see below) will need this for DRBD objects at
> > some point.
>
> The problem is that persistent reservations don't work for multiple
> initiators even for real SCSI hardware with LIO-Core/PSCSI, and I
> clearly described why in the referenced e-mail. Nicholas, why don't you
> want to see it?

Why don't you provide a reference in the code to where you think the
problem is, and/or a problem case using Linux iSCSI Initiator VMs to
demonstrate the bug..?

> >>>>> The more infighting between the
> >>>>> leaders in our community, the less the community benefits.
> >>>> Sure. If my note hurts you, I can remove it. But you should also remove
> >>>> from your presentation and the summary paper those psychological
> >>>> arguments, to not confuse people.
> >>>>
> >>> It's not about removing, it is about updating the page to better reflect
> >>> the bigger picture, so folks coming to the site can get the latest
> >>> information from the last update.
> >> Your suggestions?
> >
> > I would consider helping with this at some point, but as you can see, I
> > am extremely busy ATM. I have looked at SCST quite a bit over the
> > years, but I am not the one making a public comparison page, at least
> > not yet. :-) So until then, at least explain how there are 3 projects
> > on your page, with the updated 10,000 ft overviews, and maybe even add
> > some links to LIO-Target and a bit about the VHACS cloud. I would be
> > willing to include info about SCST in the Linux-iSCSI.org wiki. Also,
> > please feel free to open an account and start adding stuff about SCST
> > yourself to the site.
> > For Linux-iSCSI.org and VHACS (which is really where everything is going
> > now), please have a look at:
> >
> > http://linux-iscsi.org/index.php/VHACS-VM
> > http://linux-iscsi.org/index.php/VHACS
> >
> > Btw, the VHACS and LIO-Core design will allow for other fabrics to be
> > used inside our cloud, and between other virtualized client setups who
> > speak the wire protocol presented by the server side of the VHACS cloud.
> >
> > Many thanks for your most valuable of time,

New v0.8.15 VHACS-VM images are online, btw. Keep checking the site for
more details.

Many thanks for your most valuable of time,

--nab

^ permalink raw reply	[flat|nested] 11+ messages in thread
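As a footnote to the CmdSN discussion in the message above: the rule that
WRITE data may be allocated and received out of order (e.g. on different
MC/S connections) while CDB *execution* must follow CmdSN order can be
modeled as a small reorder buffer. This is an illustrative toy with
invented names, not LIO-Target code:

```python
class CmdSNOrderer:
    """Toy model of iSCSI CmdSN ordering: commands may arrive out of
    order across MC/S connections, but are released for execution
    strictly in CmdSN order. Hypothetical sketch, not LIO code."""

    def __init__(self, exp_cmdsn=1):
        self.exp_cmdsn = exp_cmdsn   # next CmdSN allowed to execute
        self.pending = {}            # CmdSN -> command, buffered OOO
        self.executed = []           # commands handed to the backend, in order

    def receive(self, cmdsn, cdb):
        # Allocation/RX may happen in any order...
        self.pending[cmdsn] = cdb
        # ...but execution only advances while the next expected CmdSN
        # is present, so CDBs reach the backend in initiator order.
        while self.exp_cmdsn in self.pending:
            self.executed.append(self.pending.pop(self.exp_cmdsn))
            self.exp_cmdsn += 1

orderer = CmdSNOrderer()
for sn, cdb in [(2, "WRITE_10 b"), (3, "READ_10 c"), (1, "WRITE_10 a")]:
    orderer.receive(sn, cdb)
print(orderer.executed)  # ['WRITE_10 a', 'WRITE_10 b', 'READ_10 c']
```

CmdSN 1 arrives last, yet nothing is executed until it shows up; then all
three commands are released in order.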
* Re: [Scst-devel] [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup)
  2008-07-12  3:28 ` [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup) Nicholas A. Bellinger
@ 2008-07-12  7:52 ` Bart Van Assche
  2008-07-13 18:47 ` Ming Zhang
  2008-07-14 18:17 ` Vladislav Bolkhovitin
  2 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2008-07-12 7:52 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Vladislav Bolkhovitin, Andrew Morton, Rafiu Fakunle, linux-scsi,
	Jeff Garzik, David Miller, Mike Mazarick, Leonid Grossman,
	linux-kernel, Pete Wyckoff, Ross S. W. Walker, scst-devel,
	Ted Ts'o, H. Peter Anvin, Jerome Martin, Christoph Hellwig,
	Linux-iSCSI.org Target Dev

On Sat, Jul 12, 2008 at 5:28 AM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> Using MC/S with a modern iSCSI implementation to take advantage of lots
> of cores and hardware threads is something that allows one to multiplex
> across multiple vendors' NIC ports, with the least possible overhead, in
> an OS INDEPENDENT manner.

Why do you emphasize MC/S so much? Any datacenter operator that has the
choice between MC/S and a faster network technology will choose the
latter, IMHO. It's much more convenient to use e.g. 10 GbE instead of
multiple 1 GbE connections.

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup)
  2008-07-12  3:28 ` [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup) Nicholas A. Bellinger
  2008-07-12  7:52 ` [Scst-devel] " Bart Van Assche
@ 2008-07-13 18:47 ` Ming Zhang
  2008-07-14 18:17 ` Vladislav Bolkhovitin
  2 siblings, 0 replies; 11+ messages in thread
From: Ming Zhang @ 2008-07-13 18:47 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Vladislav Bolkhovitin, linux-kernel, linux-scsi, scst-devel,
	Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman,
	H. Peter Anvin, Pete Wyckoff, Ross S. W. Walker, Rafiu Fakunle,
	Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig,
	Ted Ts'o, Jerome Martin

On Fri, 2008-07-11 at 20:28 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2008-07-11 at 22:41 +0400, Vladislav Bolkhovitin wrote:
> > > Core-iSCSI-DV is also a great indication of stability and data integrity
> > > of hardware/software of an iSCSI Target + Engine, especially when you
> > > have multiple core-iscsi-dv nodes hitting multiple VHACS clouds on
> > > physical machines within the cluster. I have never run IET against
> > > core-iscsi-dv personally, and I don't think Ming or Ross has either.
>
> Ming or Ross, would you like to make a comment on this, considering,
> after all, it is your work..?

hot water here ;) i never ran that test on iet, and probably nobody has.
if someone actually ran the test and found a failing case, i believe
there are people who would want to fix it. why not have both of you
write/reuse some test scripts to test the most advanced/fastest targets
and let the numbers talk?
> > > I also tend > > to agree, that for block storage on practice MC/S is not needed or, at > > least, definitely doesn't worth the effort, because: > > > > Trying to agrue against MC/S (or against any other major part of > RFC-3720, including ERL=2) is saying that Linux/iSCSI should be BEHIND > what the greatest minds in the IETF have produced (and learned) from > iSCSI. Considering so many people are interested in seeing Linux/iSCSI > be best and most complete implementation possible, surely one would not > be foolish enough to try to debate that Linux should be BEHIND what > others have figured out, be it with RFCs or running code. > > Also, you should understand that MC/S is more than about just moving > data I/O across multiple TCP connections, its about being able to bring > those paths up/down on the fly without having to actually STOP/PAUSE > anything. Then you then add the ERL=2 pixie dust, which you should > understand, is the result of over a decade of work creating RFC-3720 > within the IETF IPS TWG. What you have is a fabric that does not > STOP/PAUSE from an OS INDEPENDENT LEVEL (below the OS dependent SCSI > subsystem layer) perspective, on every possible T/I node, big and small, > open or closed platform. Even as we move towards more logic in the > network layer (a la Stream Control Transmission Protocol), we will still > benefit from RFC-3720 as the years roll on. Quite a powerful thing.. > > > 1. It is useless for sync. untagged operation (regular reads in most > > cases over a single stream), when always there is only one command being > > executed at any time, because of the commands connection allegiance, > > which forbids transferring data for a command over multiple connections. > > > > This is a very Parallel SCSI centric way of looking at design of SAM. > Since SAM allows the transport fabric to enforce its own ordering rules > (it does offer some of its own SCSI level ones of course). 
Obviously > each fabric (PSCSI, FC, SAS, iSCSI) are very different from the bus > phase perspective. But, if you look back into the history of iSCSI, you > will see that an asymmetric design with seperate CONTROL/DATA TCP > connections was considered originally BEFORE the Command Sequence Number > (CmdSN) ordering algoritim was adopted that allows both SINGLE and > MULTIPLE TCP connections to move both CONTROL/DATA packets across a > iSCSI Nexus. > > Using MC/S with a modern iSCSI implementation to take advantage of lots > of cores and hardware threads is something that allows one to multiplex > across multiple vendor's NIC ports, with the least possible overhead, in > the OS INDEPENDENT manner. Keep in mind that you can do the allocation > and RX of WRITE data OOO, but the actual *EXECUTION* down via the > subsystem API (which is what LIO-Target <-> LIO-Core does, in a generic > way) MUST BE in the same over as the CDBs came from the iSCSI Initiator > port. This is the only requirement for iSCSI CmdSN order rules wrt the > SCSI Architecture Model. > > > 2. The only advantage it has over traditional OS multi-pathing is > > keeping commands execution order, but on practice at the moment there is > > no demand for this feature, because all OS'es I know don't rely on > > commands order to protect data integrity. They use other techniques, > > like queue draining. A good target should be able itself to scheduler > > coming commands for execution in the correct from performance POV order > > and not rely for that on the commands order as they came from initiators. > > > > Ok, you are completely missing the point of MC/S and ERL=2. Notice how > it works in both iSCSI *AND* iSER (even across DDP fabrics!). I > discussed the significant benefit of ERL=2 in numerious previous > threads. 
But they can all be neatly summerized in: > > http://linux-iscsi.org/builds/user/nab/Inter.vs.OuterNexus.Multiplexing.pdf > > Internexus Multiplexing is DESIGNED to work with OS dependent multipath > transparently, and as a matter of fact, it complements it quite well, in > a OSI (independent) method. Its completely up to the admin to determine > the benefit and configure the knobs. > > So, the bit: "We should not implement this important part of the RFC > just because I want some code in the kernel" is not going to get your > design very far. > > > From other side, devices bonding also preserves commands execution > > order, but doesn't suffer from the connection allegiance limitation of > > MC/S, so can boost performance ever for sync untagged operations. Plus, > > it's pretty simple, easy to use and doesn't need any additional code. I > > don't have the exact numbers of MC/S vs bonding performance comparison > > (mostly, because open-iscsi doesn't support MC/S, but very curious to > > see them), but have very strong suspicious that on modern OS'es, which > > do TCP frames reorder in zero-copy manner, there shouldn't be much > > performance difference between MC/S vs bonding in the maximum possible > > throughput, but bonding should outperform MC/S a lot in case of sync > > untagged operations. > > > > Simple case here for you to get your feet wet with MC/S. Try doing > bonding across 4x GB/sec ports on 2x socket 2x core x86_64 and compare > MC/S vs. OS dependent networking bonding and see what you find. There > about two iSCSI initiators for two OSes that implementing MC/S and > LIO-Target <-> LIO-Target. Anyone interested in the CPU overhead on > this setup between MC/S and Link Layer bonding across 2x 2x 1 Gb/sec > port chips on 4 core x86_64..? 
> > > Anyway, I think features I and II, if added, would increase iSCSI-SCST > > kernel side code not more than on 5K lines, because most of the code is > > already there, the most important part which missed is fixes of locking > > problems, which almost never add a lot of code. > > You can think whatever you want. Why don't you have a look at > lio-core-2.6.git and see how big they are for yourself. > > > Relating Core-iSCSI-DV, > > I'm sure iSCSI-SCST will pass it without problems among the required set > > of iSCSI features, although still there are some limitations, derived > > from IET, for instance, support for multu-PDU commands in discovery > > sessions, which isn't implemented. But for adding to iSCSI-SCST optional > > iSCSI features there should be good *practical* reasons, which at the > > moment don't exist. And unused features are bad features, because they > > overcomplicate the code and make its maintainance harder for no gain. > > > > Again, you can think whatever you want. But since you did not implement > the majority of the iSCSI-SCST code yourself, (or implement your own > iSCSI Initiator in parallel with your own iSCSI Target), I do not > believe you are in a position to say. Any IET devs want to comment on > this..? > > > So, current SCST+iSCSI-SCST 36K lines + 5K new lines = 41K lines, which > > still a lot less than LIO's 63K lines. I downloaded the cleanuped > > lio-core-2.6.git tree and: > > > > Blindly comparing lines of code with no context is usually dumb. 
But, > since that is what you seem to be stuck on, how about this: > > LIO 63k + > SCST (minus iSCSI) ??k + > iSER from STGT ??k == > > For the complete LIO-Core engine on fabrics, and which includes what > Rafiu from Openfiler has been so kind to call LIO-Target, "arguably the > most feature complete and mature implementation out there (on any > platform) " > > > $ find lio-core-2.6/drivers/lio-core -type f -name "*.[ch]"|xargs wc > > 57064 156617 1548344 total > > > > Still much bigger. > > > > > Obviously not. Also, what I was talking about there was the strength > > > and flexibility of the LIO-Core design (it even ran on the Playstation 2 > > > at one point, http://linux-iscsi.org/index.php/Playstation2/iSCSI, when > > > MIPS r5900 boots modern v2.6, then we will do it again with LIO :-) > > > > SCST and the target drivers have been successfully ran on PPC and > > Sparc64, so I don't see any reasons, why it can't be ran on Playstation > > 2 as well. > > > > Oh it can, can it..? Does your engine memory allocation algoritim > provide for a SINGLE method for allocating linked list scatterlists > containing page links of ANY (not just PAGE_SIZE) size handled > generically across both internal or preregistered memory allocation > acases, or coming from say, a software RNIC moving DDP packets for iSCSI > in a single code path..? > > And then it needs to be able to go down to the PS2-Linux PATA driver, > that does not show up under the SCSI subsystem mind you. Surely you > understand that because the MIPS r5900 is a non cache coherent > architecture that you simply cannot allocate out multiple page > contigious scatterlists for your I/Os, and simply expect it to work when > we are sending blocks down to the 32-bit MIPS r3000 IOP..? > > > >>>> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1 > > >>>> relationship, as it used to be in STGT (now in STGT support for > > >>>> pass-through mode seems to be removed), which isn't mentioned anywhere. 
> > >>>> > > >>> Please be more specific by what you mean here. Also, note that because > > >>> PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations > > >>> of the storage object through the LIO-Core subsystem API. This means > > >>> that things like (received initiator CDB sectors > LIO-Core storage > > >>> object max_sectors) are handled generically by LIO-Core, using a single > > >>> set of algoritims for all I/O interaction with Linux storage systems. > > >>> These algoritims are also the same for DIFFERENT types of transport > > >>> fabrics, both those that expect LIO-Core to allocate memory, OR that > > >>> hardware will have preallocated memory and possible restrictions from > > >>> the CPU/BUS architecture (take non-cache coherent MIPS for example) of > > >>> how the memory gets DMA'ed or PIO'ed down to the packet's intended > > >>> storage object. > > >> See here: > > >> http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg06911.html > > >> > > > > > > <nod> > > > > > >>>> - There is some confusion in the code in the function and variable > > >>>> names between persistent and SAM-2 reservations. > > >>> Well, that would be because persistent reservations are not emulated > > >>> generally for all of the subsystem plugins just yet. Obviously with > > >>> LIO-Core/PSCSI if the underlying hardware supports it, it will work. > > >> What you did (passing reservation commands directly to devices and > > >> nothing more) will work only with a single initiator per device, where > > >> reservations in the majority of cases are not needed at all. > > > > > > I know, like I said, implementing Persistent Reservations for stuff > > > besides real SCSI hardware with LIO-Core/PSCSI is a TODO item. Note > > > that the VHACS cloud (see below) will need this for DRBD objects at some > > > point. 
> >
> > The problem is that persistent reservations don't work for multiple initiators even for real SCSI hardware with LIO-Core/PSCSI, and I clearly described why in the referenced e-mail. Nicholas, why don't you want to see it?
>
> Why don't you provide a reference in the code to where you think the problem is, and/or a problem case using Linux iSCSI Initiator VMs to demonstrate the bug..?
>
> > >>>>> The more infighting between the leaders in our community, the less the community benefits.
> > >>>>
> > >>>> Sure. If my note hurts you, I can remove it. But you should also remove from your presentation and the summary paper those psychological arguments, to not confuse people.
> > >>>
> > >>> It's not about removing, it is about updating the page to better reflect the bigger picture, so folks coming to the site can get the latest information from the last update.
> > >>
> > >> Your suggestions?
> > >
> > > I would consider helping with this at some point, but as you can see, I am extremely busy ATM. I have looked at SCST quite a bit over the years, but I am not the one making a public comparison page, at least not yet. :-) So until then, at least explain how there are 3 projects on your page, with the updated 10,000 ft overviews, and maybe even add some links to LIO-Target and a bit about the VHACS cloud. I would be willing to include info about SCST in the Linux-iSCSI.org wiki. Also, please feel free to open an account and start adding stuff about SCST yourself to the site.
> > >
> > > For Linux-iSCSI.org and VHACS (which is really where everything is going now), please have a look at:
> > >
> > > http://linux-iscsi.org/index.php/VHACS-VM
> > > http://linux-iscsi.org/index.php/VHACS
> > >
> > > Btw, the VHACS and LIO-Core design will allow for other fabrics to be used inside our cloud, and between other virtualized client setups who speak the wire protocol presented by the server side of the VHACS cloud.
> > >
> > > Many thanks for your most valuable of time,
>
> New v0.8.15 VHACS-VM images are online, btw. Keep checking the site for more details.
>
> Many thanks for your most valuable of time,
>
> --nab

-- 
Ming Zhang
@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
--------------------------------------------

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup)
  2008-07-12  3:28 ` [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup) Nicholas A. Bellinger
  2008-07-12  7:52   ` [Scst-devel] " Bart Van Assche
  2008-07-13 18:47   ` Ming Zhang
@ 2008-07-14 18:17 ` Vladislav Bolkhovitin
  2 siblings, 0 replies; 11+ messages in thread
From: Vladislav Bolkhovitin @ 2008-07-14 18:17 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: linux-kernel, linux-scsi, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin

Nicholas A. Bellinger wrote:
>>> So until SOMEONE actually does this first, I think that iSCSI-SCST is more of an experiment for your own devel than a strong contender for Linux/iSCSI Target Mode.
>>
>> There are big doubts among storage experts whether features I and II are needed at all; see, e.g., http://lkml.org/lkml/2008/2/5/331.
>
> Well, jgarzik is both a NETWORKING and STORAGE (he was a networking guy first, mind you) expert!

Well, you can question Jeff Garzik's knowledge, but just look around. How many OSes support MC/S at the initiator level? I know only one: Windows. Neither Linux's mainline open-iscsi, nor xBSD, nor Solaris supports MC/S as an initiator. Only your core-iscsi supports it, but you abandoned its development in favor of open-iscsi, and I've heard there are big problems running it on recent kernels.

Then, how many open source iSCSI targets support MC/S? Neither xBSD nor Solaris has it. People simply prefer developing MPIO, because there are other SCSI transports and they all need multipath as well.

Then, finally, if that multipath works well for, e.g., FC, why wouldn't it work just as well for iSCSI?
>> I also tend to agree that for block storage in practice MC/S is not needed or, at least, definitely isn't worth the effort, because:
>
> Trying to argue against MC/S (or against any other major part of RFC-3720, including ERL=2) is saying that Linux/iSCSI should be BEHIND what the greatest minds in the IETF have produced (and learned) from iSCSI. Considering so many people are interested in seeing Linux/iSCSI be the best and most complete implementation possible, surely one would not be foolish enough to try to debate that Linux should be BEHIND what others have figured out, be it with RFCs or running code.

A rather psychological argument again. One more "older" vs "newer"? ;)

> Also, you should understand that MC/S is about more than just moving data I/O across multiple TCP connections; it's about being able to bring those paths up/down on the fly without having to actually STOP/PAUSE anything. Then you add the ERL=2 pixie dust, which, you should understand, is the result of over a decade of work creating RFC-3720 within the IETF IPS TWG. What you have is a fabric that does not STOP/PAUSE from an OS INDEPENDENT LEVEL (below the OS dependent SCSI subsystem layer) perspective, on every possible T/I node, big and small, open or closed platform. Even as we move towards more logic in the network layer (a la Stream Control Transmission Protocol), we will still benefit from RFC-3720 as the years roll on. Quite a powerful thing..

Still not convincing that these are worth the effort, considering that there is an MPIO implementation in the OS anyway. To make your statements clearer, can you write what *real life* tasks the above is going to solve which can't be solved by MPIO?

>> 1. It is useless for sync.
>> untagged operation (regular reads in most cases over a single stream), when there is always only one command being executed at any time, because of command connection allegiance, which forbids transferring data for a command over multiple connections.
>
> This is a very Parallel SCSI centric way of looking at the design of SAM, since SAM allows the transport fabric to enforce its own ordering rules (it does offer some of its own SCSI level ones, of course). Obviously each fabric (PSCSI, FC, SAS, iSCSI) is very different from the bus phase perspective. But if you look back into the history of iSCSI, you will see that an asymmetric design with separate CONTROL/DATA TCP connections was considered originally, BEFORE the Command Sequence Number (CmdSN) ordering algorithm was adopted that allows both SINGLE and MULTIPLE TCP connections to move both CONTROL/DATA packets across an iSCSI Nexus.

No, the above isn't a Parallel SCSI centric way of looking at it; it's a practical way of looking. All attempts to distribute commands between several cores to get better performance are helpless if there is always only one command being executed at a time. In this case MC/S is useless and brings nothing (if not makes things worse because of possible overhead). Only bonding can improve throughput in this case, because it can distribute the data transfers of those single commands over several links, which MC/S can't do by design. And this scenario isn't rare. In fact, it's the most common. Just count the commands coming to your target during single stream reads. This is why WRITEs almost always very much outperform READs.

> Using MC/S with a modern iSCSI implementation to take advantage of lots of cores and hardware threads is something that allows one to multiplex across multiple vendors' NIC ports, with the least possible overhead, in an OS INDEPENDENT manner.
> Keep in mind that you can do the allocation and RX of WRITE data OOO, but the actual *EXECUTION* down via the subsystem API (which is what LIO-Target <-> LIO-Core does, in a generic way) MUST BE in the same order as the CDBs came from the iSCSI Initiator port. This is the only requirement of the iSCSI CmdSN ordering rules wrt the SCSI Architecture Model.

Yes, I've already written that keeping command order between several links is the only real advantage of MC/S. But can you name *practical* uses of it in block storage?

>> 2. The only advantage it has over traditional OS multi-pathing is keeping commands execution order, but in practice at the moment there is no demand for this feature, because all OSes I know of don't rely on command order to protect data integrity. They use other techniques, like queue draining. A good target should be able to schedule incoming commands for execution itself, in the order that is correct from a performance POV, and not rely on the order in which commands came from initiators.
>
> Ok, you are completely missing the point of MC/S and ERL=2. Notice how it works in both iSCSI *AND* iSER (even across DDP fabrics!). I discussed the significant benefit of ERL=2 in numerous previous threads. But they can all be neatly summarized in:
>
> http://linux-iscsi.org/builds/user/nab/Inter.vs.OuterNexus.Multiplexing.pdf
>
> Internexus Multiplexing is DESIGNED to work with OS dependent multipath transparently, and as a matter of fact, it complements it quite well, in an OSI (independent) method. It's completely up to the admin to determine the benefit and configure the knobs.

Nicholas, it seems you miss the important point: Linux has multipath *anyway*, and MC/S can't change that.

>> From the other side, device bonding also preserves commands execution order, but doesn't suffer from the connection allegiance limitation of MC/S, so it can boost performance even for sync untagged operations.
>> Plus, it's pretty simple, easy to use, and doesn't need any additional code. I don't have exact numbers for an MC/S vs bonding performance comparison (mostly because open-iscsi doesn't support MC/S, but I'm very curious to see them), but I have a very strong suspicion that on modern OSes, which do TCP frame reordering in a zero-copy manner, there shouldn't be much performance difference between MC/S and bonding in maximum possible throughput, but bonding should outperform MC/S a lot in the case of sync untagged operations.
>
> A simple case here for you to get your feet wet with MC/S: try doing bonding across 4x GB/sec ports on a 2x socket, 2x core x86_64, compare MC/S vs. OS dependent network bonding, and see what you find. There are about two iSCSI initiators for two OSes that implement MC/S, and LIO-Target <-> LIO-Target. Anyone interested in the CPU overhead on this setup between MC/S and Link Layer bonding across 2x 2x 1 Gb/sec port chips on 4-core x86_64..?

I think everybody is interested to see those numbers. Do you have any?

>> Anyway, I think features I and II, if added, would increase the iSCSI-SCST kernel side code by not more than 5K lines, because most of the code is already there; the most important part which is missing is fixes for locking problems, which almost never add a lot of code.
>
> You can think whatever you want. Why don't you have a look at lio-core-2.6.git and see how big they are for yourself.

I almost doubled the iSCSI-SCST in-kernel size with that estimation (currently it's 7.8K lines long).

>> Relating to Core-iSCSI-DV, I'm sure iSCSI-SCST will pass it without problems among the required set of iSCSI features, although there are still some limitations, derived from IET, for instance, support for multi-PDU commands in discovery sessions, which isn't implemented. But for adding optional iSCSI features to iSCSI-SCST there should be good *practical* reasons, which at the moment don't exist.
>> And unused features are bad features, because they overcomplicate the code and make its maintenance harder for no gain.
>
> Again, you can think whatever you want. But since you did not implement the majority of the iSCSI-SCST code yourself (or implement your own iSCSI Initiator in parallel with your own iSCSI Target), I do not believe you are in a position to say. Any IET devs want to comment on this..?

You already asked me not to make blanket statements. Can you not make them yourself, please?

I very much appreciate the work the IET developers have done, but, in fact, I had to rewrite at least 70% of the in-kernel part of IET, because of many problems, starting from:

- Simple code quality issues, which made code auditing practically impossible. For instance, struct iscsi_cmnd has a field pdu_list, which is used in different parts of the code both as a list head and as a list entry. Now, how many times would you need to find out, in a random place in the code, how it should be used: as a list head or as a list entry? And how big is the probability of guessing wrongly? I suspect such issues are the main reason why development of IET was frozen at some point. It's simply impossible to tell, looking at a patch touching the corresponding code, whether it's correct or not.

to more sophisticated problems like:

- A Russian roulette with VMware, mentioned here: http://communities.vmware.com/thread/53797?tstart=0&start=15. BTW, the LIO target isn't affected by that simply by accident, because of the reset SCSI violation which I already mentioned.

I also had to considerably change the user space part, particularly the iSCSI negotiation, because the interpretation of the iSCSI RFC which IET has forces it to use very suboptimal values by default.

Now guess, was I able to do that without a sufficient understanding of iSCSI or not?

Actually, if I had known about the open source LIO iSCSI target implementation, I would have chosen it, not IET, as the base.
And now we wouldn't have a point to discuss ;)

>>>>>> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1 relationship, as it used to be in STGT (now in STGT support for pass-through mode seems to be removed), which isn't mentioned anywhere.
>>>>>
>>>>> Please be more specific about what you mean here. Also, note that because PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations of the storage object through the LIO-Core subsystem API. This means that things like (received initiator CDB sectors > LIO-Core storage object max_sectors) are handled generically by LIO-Core, using a single set of algorithms for all I/O interaction with Linux storage systems. These algorithms are also the same for DIFFERENT types of transport fabrics, both those that expect LIO-Core to allocate memory, OR those where hardware will have preallocated memory and possible restrictions from the CPU/BUS architecture (take non-cache-coherent MIPS, for example) on how the memory gets DMA'ed or PIO'ed down to the packet's intended storage object.
>>>>
>>>> See here: http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg06911.html
>>>
>>> <nod>
>>>
>>>>>> - There is some confusion in the code in the function and variable names between persistent and SAM-2 reservations.
>>>>>
>>>>> Well, that would be because persistent reservations are not emulated generally for all of the subsystem plugins just yet. Obviously with LIO-Core/PSCSI, if the underlying hardware supports it, it will work.
>>>>
>>>> What you did (passing reservation commands directly to devices and nothing more) will work only with a single initiator per device, where reservations in the majority of cases are not needed at all.
>>>
>>> I know; like I said, implementing Persistent Reservations for stuff besides real SCSI hardware with LIO-Core/PSCSI is a TODO item.
>>> Note that the VHACS cloud (see below) will need this for DRBD objects at some point.
>>
>> The problem is that persistent reservations don't work for multiple initiators even for real SCSI hardware with LIO-Core/PSCSI, and I clearly described why in the referenced e-mail. Nicholas, why don't you want to see it?
>
> Why don't you provide a reference in the code to where you think the problem is, and/or a problem case using Linux iSCSI Initiator VMs to demonstrate the bug..?

I described the problem in the referenced e-mail pretty well. Do you have problems with reading and understanding it?

>>>>>>> The more infighting between the leaders in our community, the less the community benefits.
>>>>>>
>>>>>> Sure. If my note hurts you, I can remove it. But you should also remove from your presentation and the summary paper those psychological arguments, to not confuse people.
>>>>>
>>>>> It's not about removing, it is about updating the page to better reflect the bigger picture, so folks coming to the site can get the latest information from the last update.
>>>>
>>>> Your suggestions?
>>>
>>> I would consider helping with this at some point, but as you can see, I am extremely busy ATM. I have looked at SCST quite a bit over the years, but I am not the one making a public comparison page, at least not yet. :-) So until then, at least explain how there are 3 projects on your page, with the updated 10,000 ft overviews, and maybe even add some links to LIO-Target and a bit about the VHACS cloud. I would be willing to include info about SCST in the Linux-iSCSI.org wiki. Also, please feel free to open an account and start adding stuff about SCST yourself to the site.
>>>
>>> For Linux-iSCSI.org and VHACS (which is really where everything is going now), please have a look at:
>>>
>>> http://linux-iscsi.org/index.php/VHACS-VM
>>> http://linux-iscsi.org/index.php/VHACS
>>>
>>> Btw, the VHACS and LIO-Core design will allow for other fabrics to be used inside our cloud, and between other virtualized client setups who speak the wire protocol presented by the server side of the VHACS cloud.
>>>
>>> Many thanks for your most valuable of time,
>
> New v0.8.15 VHACS-VM images are online, btw. Keep checking the site for more details.
>
> Many thanks for your most valuable of time,
>
> --nab
end of thread, other threads: [~2008-07-14 18:16 UTC | newest]

Thread overview: 11+ messages -- links below jump to the message on this page:
2008-07-08 19:14 [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released Vladislav Bolkhovitin
2008-07-08 21:09 ` Nicholas A. Bellinger
2008-07-09 11:19   ` Vladislav Bolkhovitin
2008-07-09 19:34     ` Nicholas A. Bellinger
2008-07-10 18:25       ` Vladislav Bolkhovitin
2008-07-10 21:26         ` Nicholas A. Bellinger
2008-07-11 18:41           ` Vladislav Bolkhovitin
2008-07-12  3:28             ` [ANNOUNCE]: Generic SCSI Target Mid-level For Linux (followup) Nicholas A. Bellinger
2008-07-12  7:52               ` [Scst-devel] " Bart Van Assche
2008-07-13 18:47               ` Ming Zhang
2008-07-14 18:17               ` Vladislav Bolkhovitin