* Re: Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target?
[not found] ` <43972C2D.9060500@cs.wisc.edu>
@ 2005-12-08 18:46 ` Vladislav Bolkhovitin
2005-12-08 18:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Mike Christie
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-08 18:46 UTC (permalink / raw)
To: Mike Christie
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
> johan@capvert.se wrote:
>> I, and I suppose a lot of other people, would like know how you look
>> at the
>> HW support on the Fc side? I don´t know what you mean now, when you talk
>> about HW target support,
>
> For qlogic FC, it is the same as scst at this point, except we are based
> off the mainline qla2xxx driver.
Actually, this is not completely true and could mislead people. Stgt is
*not* the same as scst at this point and, I think, will not be for at
least a considerable amount of time.
They are the same only in a basic subset of functionality, which I would
call the "fast path", and only for block devices that can be considered
stateless, i.e. on which the result of a SCSI command's execution doesn't
depend on the current state of the device. For, e.g., tapes this isn't
true, because on such devices the current state, like the block size,
changes the result of a command a lot, so it must be honored, and Unit
Attention conditions need special handling.
For example, all UAs from a device must be delivered to all connected
initiators, not only to the one that happens to execute the command that
returns the UA. Another example is when one initiator changes the state
of the device: after that, all other initiators must receive the
appropriate UA, i.e. the mid-layer has to generate it, because to the
device all initiators act as one initiator (nexus) and the device is not
able to distinguish between them to perform all the necessary SCSI
handling. Thus, the mid-layer has to do it. Not doing so is dangerous and
could lead to data corruption and loss. The same is true for "advanced"
commands like RESERVE/RELEASE, which also have to be "emulated" by the
mid-layer.
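To make that concrete, here is a minimal sketch in C of what the mid-layer
has to do when one initiator changes the device state (structure and
function names are hypothetical, not actual scst or stgt code; locking of
the device's session list is omitted): a Unit Attention is queued for every
other session (I_T nexus) on the same device, so each initiator sees it on
its next command.

#include <linux/types.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct ua_entry {
	struct list_head list;
	u8 asc, ascq;			/* additional sense code / qualifier */
};

struct tgt_session {			/* one I_T nexus */
	struct list_head dev_entry;	/* link in the device's session list */
	struct list_head ua_list;	/* UAs pending for this initiator */
	spinlock_t ua_lock;
};

struct tgt_device {
	struct list_head session_list;	/* all sessions attached to this device */
};

static void queue_ua_for_other_initiators(struct tgt_device *dev,
					  struct tgt_session *originator,
					  u8 asc, u8 ascq)
{
	struct tgt_session *sess;

	list_for_each_entry(sess, &dev->session_list, dev_entry) {
		struct ua_entry *ua;

		if (sess == originator)	/* it caused the change, no UA for it */
			continue;
		ua = kmalloc(sizeof(*ua), GFP_ATOMIC);
		if (!ua)
			continue;	/* real code must not silently drop a UA */
		ua->asc = asc;
		ua->ascq = ascq;
		spin_lock(&sess->ua_lock);
		list_add_tail(&ua->list, &sess->ua_list);
		spin_unlock(&sess->ua_lock);
	}
}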
Also, currently stgt doesn't care much about task management.
Scst from the very beginning was targeted at SCSI tapes exported via
hardware targets and was designed with all that very complicated stuff
in mind. Most of scst's complexity comes from handling it, and from the
attempt to handle it in the most performance-effective way. For example,
on the fast path no task-management or UA-related locks are taken,
although this is done in a way that could be considered a bit unusual or
extravagant, but effective.
Thus, to sum up, the following important features are missing in stgt
compared to scst:
- Task management
- SCSI handling/emulation required for stateful SCSI devices (tapes,
etc.)
- Scst has some performance advantages over stgt, at least on hardware
targets, because it allows internal handling in SIRQ context and doesn't
need a user space program, so it eliminates additional context switches
(at least 3 per command for WRITEs and 2 per command for READs, plus
switches to the user space daemon, probably 2 per command). 5 context
switches per command looks like too much to me, especially considering
how little work is done in each context. It means ~15000 CS/sec on a
regular 200 MB/sec link with a 64K block size (see the back-of-envelope
check after this list). Additionally, kernel-only command execution
allows direct access to the page cache, which is a GREAT performance
improvement, because in an architecture with a user space daemon the
data has to be copied several times between kernel and user space. Or,
do I miss anything?
- Access control and device (LUN) visibility management. It allows an
initiator or a group of initiators to see a different set of LUs, each
with the appropriate access permissions. This feature is a HUGE
usability win; people who have tried it will confirm that.
- Support for most SCSI device types, namely tapes, processors (SCSI type
3), CDROMs (SCSI type 5), MO disks (SCSI type 7), medium changers (SCSI
type 8) and RAID controllers (SCSI type 0xC)
- Stability. The current SCST (0.9.3-pre2) is quite stable; as far as I
know, only task management has some unfixed flaws.
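As a back-of-envelope check of the context switch figure above (assuming
~200 MB/sec of throughput, 64K per command and 5 switches per command):

    200 MB/s / 64 KB per command   ~ 3200 commands/sec
    3200 commands/sec * 5 switches ~ 16000 context switches/sec

which is roughly the ~15000 CS/sec quoted.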
That is what I've noticed on a brief review; it's possible that I missed
something. For example, stgt doesn't seem to have internal command
serialization, so I suspect that if an initiator mixes READs and WRITEs
on a target, the order of command execution could be broken, because a
WRITE command has an additional phase in which the command's data is
transferred from the initiator to the target. During that phase
subsequent READs could be executed out of order. The result could be
data corruption.
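For illustration only, here is a rough sketch of the kind of per-LUN
serialization that prevents a later READ from overtaking a WRITE that is
still collecting its data (hypothetical names, not stgt or scst code;
execute_cmd() stands in for submission to the backing store):

#include <linux/list.h>
#include <linux/spinlock.h>

struct tgt_cmd {
	struct list_head lun_entry;
	int data_ready;			/* set once all data-out has arrived */
};

struct tgt_lun {
	spinlock_t lock;
	struct list_head cmd_queue;	/* commands in arrival order */
};

static void execute_cmd(struct tgt_cmd *cmd);	/* hypothetical */

/* Dispatch only from the head of the queue: a WRITE that is still waiting
 * for its data-out phase stalls everything queued behind it, so the order
 * seen by the backing store matches the order the initiator sent. */
static void dispatch_in_order(struct tgt_lun *lun)
{
	struct tgt_cmd *cmd;

	spin_lock(&lun->lock);
	while (!list_empty(&lun->cmd_queue)) {
		cmd = list_entry(lun->cmd_queue.next, struct tgt_cmd, lun_entry);
		if (!cmd->data_ready)
			break;
		list_del(&cmd->lun_entry);
		spin_unlock(&lun->lock);
		execute_cmd(cmd);
		spin_lock(&lun->lock);
	}
	spin_unlock(&lun->lock);
}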
On the other hand, stgt doesn't have too many advantages over scst.
Technically, I personally see only a few. One of them is support for
putting block commands directly on the device's request queue; I'm going
to add this in the next version (I hope nobody will blame me if I borrow
this code from stgt :) ). Another is support for different "protocols",
although I have not understood which ones, except SCSI, are going to be
there.
Actually, we would greatly appreciate it if Mike or Christoph would tell
us what, in their opinion, is so wrong with scst that they started their
own new project. (I'm not taking motivation like "I want to make my own
in any case" seriously.) Is scst too complicated? Do you think stgt will
be simpler with all those features implemented?
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 18:46 ` Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target? Vladislav Bolkhovitin
@ 2005-12-08 18:54 ` Mike Christie
2005-12-09 15:30 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
2005-12-08 19:10 ` Mike Christie
2005-12-08 19:47 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " James Bottomley
2 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-08 18:54 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> Mike Christie wrote:
>
>> johan@capvert.se wrote:
>>
>>> I, and I suppose a lot of other people, would like know how you look
>>> at the
>>> HW support on the Fc side? I don´t know what you mean now, when you talk
>>> about HW target support,
>>
>>
>> For qlogic FC, it is the same as scst at this point, except we are
>> based off the mainline qla2xxx driver.
>
>
> Actually, this is not completely true and could mislead people. Stgt is
> *not* the same as scst at this point and,
Yes, I agree, I was not talking about functionality. I was only talking
about how we interface with the initiator driver. My mistake. So we add
a qla_tgt function callback struct (but we do it for each ha rather than
globally) so the initiator can call back into the target driver, so I am
basically reusing that code.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 18:46 ` Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target? Vladislav Bolkhovitin
2005-12-08 18:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Mike Christie
@ 2005-12-08 19:10 ` Mike Christie
2005-12-08 19:48 ` James Bottomley
2005-12-09 15:28 ` Vladislav Bolkhovitin
2005-12-08 19:47 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " James Bottomley
2 siblings, 2 replies; 43+ messages in thread
From: Mike Christie @ 2005-12-08 19:10 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> - Scst has some performance advantages over stgt, at least, on hardware
> targets, because it allows internal handling in SIRQ context as well as
> doesn't need user space program, so it eliminates additional context
> switches (at least 3 per command for WRITEs and 2 per command for reads
> plus switches to user space daemon, probably, 2 per command). 5 context
> switches per command looks too much for me, especially considering how
> little work is done on each context. It means ~15000 CS/sec on regular
> 200Mb/sec link with 64K block size. Additionally, kernel only commands
> execution allows direct access to page cache, which is GREAT performance
> improvement, because in architecture with user space daemon the data
> should be copied several times between kernel and user space. Or, do I
> miss anything?
Userspace is only used for non-READ/WRITE commands, so for REPORT_LUNS,
TUR, etc., where performance is not a factor.
Also, is the page cache comment in reference to us using the page cache
for our reads and writes? I am not sure why you wrote that, if you do
not do it right now.
>
> From other side, stgt has not too much advantages over scst.
Hey, we just started and have not had too much time.
> Actually, we would greatly appreciate if Mike or Christoph will tell us
> what is so wrong in scst on their opinion, so they started they own new
> project. (I'm not considering motivation like "I want to make my own in
> any case" seriously). Is scst too complicated? Do you think stgt will be
> simpler with all those features implemented?
>
Didn't we go over this? To get SCST ready for mainline we would have had
a large cleanup effort. I started this and sent you a patch to begin the
cleanup. In the end some of the scsi people liked the idea of throwing
the non-read/write commands to userspace, and to do this we just decided
to start over, but I have been cutting and pasting your code and cleaning
it up as I add more stuff.
Simmer down :) If you had just gotten your code ready when Christoph
asked a year ago we would never have had this problem and I would be
sending you patches to remove the scsi_request usage as we speak.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 18:46 ` Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target? Vladislav Bolkhovitin
2005-12-08 18:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Mike Christie
2005-12-08 19:10 ` Mike Christie
@ 2005-12-08 19:47 ` James Bottomley
2005-12-09 3:57 ` Mike Christie
2005-12-09 15:29 ` [Scst-devel] Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Vladislav Bolkhovitin
2 siblings, 2 replies; 43+ messages in thread
From: James Bottomley @ 2005-12-08 19:47 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Mike Christie, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
On Thu, 2005-12-08 at 21:46 +0300, Vladislav Bolkhovitin wrote:
> - Scst has some performance advantages over stgt, at least, on
> hardware targets, because it allows internal handling in SIRQ context as
> well as doesn't need user space program, so it eliminates additional
> context switches (at least 3 per command for WRITEs and 2 per command
> for reads plus switches to user space daemon, probably, 2 per command).
> 5 context switches per command looks too much for me, especially
> considering how little work is done on each context. It means ~15000
> CS/sec on regular 200Mb/sec link with 64K block size. Additionally,
> kernel only commands execution allows direct access to page cache, which
> is GREAT performance improvement, because in architecture with user
> space daemon the data should be copied several times between kernel and
> user space. Or, do I miss anything?
I do have to say that I consider operation in interrupt context (or even
kernel context) to be a disadvantage. Compared with the response times
that most arrays have to SCSI commands, the kernel context switch time
isn't that significant.
Additionally, it's perfectly possible for all of this to be done zero
copy on the data. A user space target mmaps the data on its storage
device and then does a SG_IO type scatter gather user virtual region
pass to the underlying target infrastructure. We already have this
demonstrated in the SG_IO path, someone just needs to come up with the
correct implementation for a target path.
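For reference, the sg mmap + SG_IO path being referred to looks roughly
like this from user space (device name and transfer size are just
examples, error handling is trimmed): the data of a READ(10) lands in the
sg driver's reserved buffer, which the process has mmap'ed, so it is never
copied into a separately allocated user buffer.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <scsi/sg.h>

int main(void)
{
	int fd = open("/dev/sg0", O_RDWR);	/* example device */
	int len = 64 * 1024;
	void *buf;
	unsigned char cdb[10] = { 0x28, 0, 0, 0, 0, 0, 0, 0, len / 512, 0 };
	unsigned char sense[32];
	struct sg_io_hdr hdr;

	ioctl(fd, SG_SET_RESERVED_SIZE, &len);	/* reserved buffer >= transfer */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';
	hdr.dxfer_direction = SG_DXFER_FROM_DEV;
	hdr.cmd_len = sizeof(cdb);
	hdr.cmdp = cdb;
	hdr.dxfer_len = len;
	hdr.mx_sb_len = sizeof(sense);
	hdr.sbp = sense;
	hdr.flags = SG_FLAG_MMAP_IO;		/* data goes to the mmap'ed buffer */

	if (ioctl(fd, SG_IO, &hdr) < 0)
		perror("SG_IO");
	else
		printf("read %d bytes into %p\n", len, buf);
	return 0;
}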
The great advantage of doing SCSI state machines in user space is that
you can prototype anything you want, and user space has much better
state machine implementation (and debugging) tools available than the
kernel does.
James
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 19:10 ` Mike Christie
@ 2005-12-08 19:48 ` James Bottomley
2005-12-08 20:09 ` Mike Christie
` (2 more replies)
2005-12-09 15:28 ` Vladislav Bolkhovitin
1 sibling, 3 replies; 43+ messages in thread
From: James Bottomley @ 2005-12-08 19:48 UTC (permalink / raw)
To: Mike Christie
Cc: Vladislav Bolkhovitin, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
> cleanup. In the end some of the scsi people liked the idea of throwing
> the non-read/write command to userspace and to do this we just decided
> to start over but I have been cutting and pasting your code and cleaning
> it up as I add more stuff.
To be honest, I'd like to see all command processing at user level
(including read/write ... for block devices, it shouldn't be that
inefficient, since you're merely going to say remap an area from one
device to another; as long as no data transformation ever occurs, the
user never touches the data and it all remains in the kernel page
cache).
My ideal for the kernel based infrastructure is a simple tap for
transporting commands addressed to devices upwards (and the responses
downwards). Then everyone can have their own user space processing
implementation that I don't have to care about.
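A concrete (purely hypothetical) picture of such a tap could be as simple
as a pair of fixed-size messages exchanged over a character device or a
netlink socket; a user space daemon would read() commands, do whatever
processing it likes, and write() responses back. None of the names below
are real stgt interfaces:

#include <stdint.h>

/* kernel -> user: one incoming SCSI command */
struct tgt_tap_cmd {
	uint64_t tag;		/* identifies the command on the way back down */
	uint64_t lun;
	uint8_t  cdb[16];
	uint32_t data_len;	/* expected data-in/data-out length */
};

/* user -> kernel: the response for that command */
struct tgt_tap_rsp {
	uint64_t tag;
	uint8_t  status;	/* SCSI status byte */
	uint32_t sense_len;
	uint8_t  sense[96];
};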
James
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 19:48 ` James Bottomley
@ 2005-12-08 20:09 ` Mike Christie
2005-12-08 21:35 ` Dave C Boutcher
2005-12-09 15:29 ` Vladislav Bolkhovitin
2005-12-21 23:53 ` FUJITA Tomonori
2005-12-26 23:53 ` Ang: " FUJITA Tomonori
2 siblings, 2 replies; 43+ messages in thread
From: Mike Christie @ 2005-12-08 20:09 UTC (permalink / raw)
To: James Bottomley
Cc: Vladislav Bolkhovitin, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
James Bottomley wrote:
> On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
>
>>cleanup. In the end some of the scsi people liked the idea of throwing
>>the non-read/write command to userspace and to do this we just decided
>>to start over but I have been cutting and pasting your code and cleaning
>>it up as I add more stuff.
>
>
> To be honest, I'd like to see all command processing at user level
> (including read/write ... for block devices, it shouldn't be that
> inefficient, since you're merely going to say remap an area from one
> device to another; as long as no data transformation ever occurs, the
> user never touches the data and it all remains in the kernel page
> cache).
Ok, Tomo and I briefly talked about this when we saw Jeff's post about
doing block layer drivers in userspace on lkml. I think we were somewhat
prepared for this given some of your other replies.
So Vlad and other target guys, what do you think? Vlad, are you going to
continue to maintain scst as kernel only, or is there some place we can
work together on this - if your feelings are not hurt too much, that
is :) ?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 20:09 ` Mike Christie
@ 2005-12-08 21:35 ` Dave C Boutcher
2005-12-08 21:56 ` Mike Christie
2005-12-09 15:30 ` Vladislav Bolkhovitin
2005-12-09 15:29 ` Vladislav Bolkhovitin
1 sibling, 2 replies; 43+ messages in thread
From: Dave C Boutcher @ 2005-12-08 21:35 UTC (permalink / raw)
To: Mike Christie
Cc: James Bottomley, Vladislav Bolkhovitin, johan, iscsitarget-devel,
mingz, stgt, Robert Whitehead, scst-devel, linux-scsi,
Christoph Hellwig
On Thu, Dec 08, 2005 at 02:09:32PM -0600, Mike Christie wrote:
> James Bottomley wrote:
> >On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
> >
> >>cleanup. In the end some of the scsi people liked the idea of throwing
> >>the non-read/write command to userspace and to do this we just decided
> >>to start over but I have been cutting and pasting your code and cleaning
> >>it up as I add more stuff.
> >
> >
> >To be honest, I'd like to see all command processing at user level
> >(including read/write ... for block devices, it shouldn't be that
> >inefficient, since you're merely going to say remap an area from one
> >device to another; as long as no data transformation ever occurs, the
> >user never touches the data and it all remains in the kernel page
> >cache).
>
> Ok, Tomo and I briefly talked about this when we saw Jeff's post about
> doing block layer drivers in userspace on lkml. I think we were somewhat
> prepared for this given some of your other replies.
>
> So Vlad and other target guys what do you think? Vlad are you going to
> continue to maintain scst as kernel only, or is there some place we can
> work together on this on - if your feelings are not hurt too much that
> is :) ?
Oofff....Architecturally I agree with James...do all command processing
in one place. On the other hand, the processing involved with a read or
write in the normal case (no aborts/resets/ordering/timeouts/etc) is
almost zero. Figure out the LBA and length and pass on the I/O. The
overhead of passing it up and down across the kernel boundary is likely
to be orders of magnitude larger than the actual processing. I would
personally rather not fix this decision in concrete until we could do
some actual measurements of a SCSI target under heavy load.
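Just to illustrate how small that per-command work is, decoding the LBA
and transfer length of a READ(10)/WRITE(10) CDB is only a few shifts (the
6/12/16-byte CDB variants are left out of this sketch):

#include <stdint.h>

static inline void parse_rw10(const uint8_t *cdb,
			      uint64_t *lba, uint32_t *blocks)
{
	/* READ(10)/WRITE(10): LBA in bytes 2-5, transfer length in bytes 7-8 */
	*lba = ((uint64_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	       ((uint32_t)cdb[4] << 8)  |  cdb[5];
	*blocks = ((uint32_t)cdb[7] << 8) | cdb[8];
}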
--
Dave Boutcher
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 21:35 ` Dave C Boutcher
@ 2005-12-08 21:56 ` Mike Christie
2005-12-09 15:29 ` Vladislav Bolkhovitin
2005-12-09 15:30 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-08 21:56 UTC (permalink / raw)
To: boutcher
Cc: James Bottomley, Vladislav Bolkhovitin, johan, iscsitarget-devel,
mingz, stgt, Robert Whitehead, scst-devel, linux-scsi,
Christoph Hellwig
Dave C Boutcher wrote:
> On Thu, Dec 08, 2005 at 02:09:32PM -0600, Mike Christie wrote:
>
>>James Bottomley wrote:
>>
>>>On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
>>>
>>>
>>>>cleanup. In the end some of the scsi people liked the idea of throwing
>>>>the non-read/write command to userspace and to do this we just decided
>>>>to start over but I have been cutting and pasting your code and cleaning
>>>>it up as I add more stuff.
>>>
>>>
>>>To be honest, I'd like to see all command processing at user level
>>>(including read/write ... for block devices, it shouldn't be that
>>>inefficient, since you're merely going to say remap an area from one
>>>device to another; as long as no data transformation ever occurs, the
>>>user never touches the data and it all remains in the kernel page
>>>cache).
>>
>>Ok, Tomo and I briefly talked about this when we saw Jeff's post about
>>doing block layer drivers in userspace on lkml. I think we were somewhat
>>prepared for this given some of your other replies.
>>
>>So Vlad and other target guys what do you think? Vlad are you going to
>>continue to maintain scst as kernel only, or is there some place we can
>>work together on this on - if your feelings are not hurt too much that
>>is :) ?
>
>
> Oofff....Architecturally I agree with James...do all command processing
> in one place. On the other hand, the processing involved with a read or
> write in the normal case (no aborts/resets/ordering/timeouts/etc) is
> almost zero. Figure out the LBA and length and pass on the I/O. The
There are still memory and scatterlist allocations. If we are not going
to allocate all the memory for a command buffer and request with
GFP_ATOMIC (and can then run from the HW interrupt or soft irq), we
have to pass that on to a thread. I guess there is disagreement over
whether that part is a feature or a bad use of GFP_ATOMIC, though, so...
I just mean to say there could be a little more to do.
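In other words, the trade-off looks roughly like this (hypothetical names,
just to illustrate the point): either the buffer is built atomically right
where the command was received, or the command is deferred to a thread
that is allowed to sleep in the allocator.

#include <linux/gfp.h>

struct tgt_cmd;
static int alloc_cmd_sg(struct tgt_cmd *cmd, gfp_t gfp);	/* hypothetical */
static void start_data_transfer(struct tgt_cmd *cmd);		/* hypothetical */
static void queue_cmd_for_thread(struct tgt_cmd *cmd);		/* hypothetical */

static void cmd_received_in_irq(struct tgt_cmd *cmd)
{
	/* Fast path: try to build the buffer without sleeping. */
	if (alloc_cmd_sg(cmd, GFP_ATOMIC) == 0) {
		start_data_transfer(cmd);
		return;
	}
	/* Slow path: let a kernel thread allocate with GFP_KERNEL. */
	queue_cmd_for_thread(cmd);
}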
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 19:47 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " James Bottomley
@ 2005-12-09 3:57 ` Mike Christie
2005-12-09 15:00 ` Ang: Re: [Stgt-devel] " Ming Zhang
2005-12-09 15:29 ` [Scst-devel] Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Vladislav Bolkhovitin
1 sibling, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-09 3:57 UTC (permalink / raw)
To: James Bottomley
Cc: Vladislav Bolkhovitin, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
James Bottomley wrote:
>
> Additionally, it's perfectly possible for all of this to be done zero
> copy on the data. A user space target mmaps the data on its storage
> device and then does a SG_IO type scatter gather user virtual region
> pass to the underlying target infrastructure. We already have this
> demonstrated in the SG_IO path, someone just needs to come up with the
> correct implementation for a target path.
>
I guess I am going to try to do some work in userspace so we can
benchmark things and see how much performance drops.
For your suggestion, are you referring to the sg.c mmap path? I was
thinking that maybe we could modify dm so it can do mmap like sg.c does.
The dm device would be the target device and would sit above the real
device or another MD/DM/Loop/ramdisk device or whatever, so when
userspace decides to execute the command it would inform the dm device,
and that kernel driver would then just send down the bios.
For something like qlogic or mpt would this basically work like the
following:
1. dm mmap is called and our dm_target (dm drivers like dm-multipath or
dm-raid are called dm_targets) does like sg.c sg_mmap* and sg_vma* ops.
2. HW interrupt comes in and we allocate a scatterlist with pages from #1.
3. netlink (or whatever the favorite interface is) a message to
userspace to tell it we have a command ready.
4. userspace decides if it is a read or write, and if so tells the dm
device to read/write some pages.
5. dm's bi_endio is called when the io is finished, so we netlink to
userspace, and then userspace netlinks back to the kernel and tells the
LLD like qlogic that some data and/or a response or sense is ready for
it to transfer.
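A sketch of step 2 (hypothetical names; struct scatterlist is filled in
directly, the way 2.6 kernels of that era did it): the command's
scatterlist is built from the pages set up by the mmap in step 1, so no
data pages have to be allocated in interrupt context.

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <asm/scatterlist.h>

struct tgt_cmd {
	struct scatterlist *sg;
	unsigned int sg_cnt;
};

static int build_sg_from_mmap(struct tgt_cmd *cmd, struct page **premapped,
			      unsigned int first, unsigned int nr_pages)
{
	unsigned int i;

	cmd->sg = kmalloc(nr_pages * sizeof(*cmd->sg), GFP_ATOMIC);
	if (!cmd->sg)
		return -ENOMEM;
	for (i = 0; i < nr_pages; i++) {
		cmd->sg[i].page   = premapped[first + i];  /* reuse, don't allocate */
		cmd->sg[i].offset = 0;
		cmd->sg[i].length = PAGE_SIZE;
	}
	cmd->sg_cnt = nr_pages;
	return 0;
}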
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target?
2005-12-09 3:57 ` Mike Christie
@ 2005-12-09 15:00 ` Ming Zhang
0 siblings, 0 replies; 43+ messages in thread
From: Ming Zhang @ 2005-12-09 15:00 UTC (permalink / raw)
To: Mike Christie
Cc: James Bottomley, Vladislav Bolkhovitin, johan, iscsitarget-devel,
stgt, Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
On Thu, 2005-12-08 at 21:57 -0600, Mike Christie wrote:
> James Bottomley wrote:
> >
> > Additionally, it's perfectly possible for all of this to be done zero
> > copy on the data. A user space target mmaps the data on its storage
> > device and then does a SG_IO type scatter gather user virtual region
> > pass to the underlying target infrastructure. We already have this
> > demonstrated in the SG_IO path, someone just needs to come up with the
> > correct implementation for a target path.
> >
>
> I guess I am going to try to do some work in userspace so we can
> benchmark things and see how much performance drops.
>
> For your suggestion, are you referring to the sg.c mmap path? I was
> thinking that maybe we could modify dm so it can do mmap like sg.c does.
> The dm device would be the target device and would sit above the real
> device or another MD/DM/Loop/ramdsik device or whatever so when
> userspace decides to execute the command it would inform the dm device
> and that kernel driver would just then send down the bios.
>
> For something like qlogic or mpt would this basically work like the
> following:
>
> 1. dm mmap is called and our dm_target (dm drivers like dm-multipath or
> dm-raid are called dm_targets) does like sg.c sg_mmap* and sg_vma* ops.
> 2. HW interrupt comes in and we allocate a scatterlist with pages from #1.
> 3. netlink (or whatever is the favorite interface) a message to
> userpsace to tell it we have a command ready.
> 4. userspace decides if it is a read or write and if it is a read or
> write then userspace tells the dm device to read/write some pages.
> 5. dm's bi_endio is called when io is finished so we netlink to
> userspace and then userspaces netlinks back to the kernel and tells the
> LLD like qlogic that some data and/or a responce or sense is ready for
> it to transfer.
We can count how many system calls and context switches will occur for
each request here. Will this be a source of high response times?
I guess we really need to have both a kernel and a user space
implementation here to compare.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 19:10 ` Mike Christie
2005-12-08 19:48 ` James Bottomley
@ 2005-12-09 15:28 ` Vladislav Bolkhovitin
2005-12-09 22:23 ` Mike Christie
2005-12-10 8:46 ` FUJITA Tomonori
1 sibling, 2 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-09 15:28 UTC (permalink / raw)
To: Mike Christie
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
> Vladislav Bolkhovitin wrote:
>
>> - Scst has some performance advantages over stgt, at least, on
>> hardware targets, because it allows internal handling in SIRQ context
>> as well as doesn't need user space program, so it eliminates additional context
>> switches (at least 3 per command for WRITEs and 2 per command for
>> reads plus switches to user space daemon, probably, 2 per command). 5
>> context switches per command looks too much for me, especially
>> considering how little work is done on each context. It means ~15000
>> CS/sec on regular 200Mb/sec link with 64K block size. Additionally,
>> kernel only commands execution allows direct access to page cache,
>> which is GREAT performance improvement, because in architecture with
>> user space daemon the data should be copied several times between
>> kernel and user space. Or, do I miss anything?
>
>
> Userspace is only used for non-READ/WRITE commands, so for REPORT_LUNS, TUR,
> etc., where performance is not a factor.
So, do you mean that READ/WRITE operations are (or will be) performed in
kernel space only? I don't see that from the code.
> Also is the page cache comment in reference to us using the page cache
> for our reads and writes or I am not sure why you wrote that if you do
> not do it right now.
Hm, first, the page cache is already used somehow in the fileio dev
handler (though with an additional memory copy). Second, fully utilizing
the page cache is one of the two major improvements pending in scst,
because it requires changing the kernel, which I have been trying to
avoid up to now. Although I have already prepared what is necessary for
that.
The idea is basically the following. When a READ operation arrives, the
pages for all requested blocks are first looked up in the page cache
(probably in SIRQ context, because it isn't an expensive operation). If
all the pages are found, they are referenced, the result is sent to the
initiator, and then the pages are dereferenced (so no page allocation is
done at all). Otherwise, the missing pages are allocated and the command
is rescheduled to a thread, which reads them. After the response is sent,
the pages remain in the page cache for future accesses. For WRITEs the
processing is similar: the pages with the data are put into the page
cache.
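Something like the following sketch, using the existing page cache lookup
primitives (the policy and the surrounding target structures are
hypothetical; this is not the pending scst code):

#include <linux/mm.h>
#include <linux/pagemap.h>

/* Returns 0 if every page of the request is already cached and up to date;
 * the caller can then send the data straight from those pages and release
 * them with page_cache_release() afterwards.  Returns -EAGAIN on a miss,
 * in which case the command is rescheduled to a thread that reads the
 * missing pages. */
static int try_read_from_cache(struct address_space *mapping, pgoff_t first,
			       unsigned int nr_pages, struct page **pages)
{
	unsigned int i;

	for (i = 0; i < nr_pages; i++) {
		pages[i] = find_get_page(mapping, first + i);	/* takes a reference */
		if (!pages[i])
			goto miss;
		if (!PageUptodate(pages[i])) {
			page_cache_release(pages[i]);
			goto miss;
		}
	}
	return 0;
miss:
	while (i--)
		page_cache_release(pages[i]);
	return -EAGAIN;
}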
I doubt it is possible to implement such a simple and effective algorithm
from user space. Page mapping and unmapping is quite expensive in terms
of both coding effort and performance.
>>
>> From other side, stgt has not too much advantages over scst.
>
>
> Hey, we just started and have not had too much time.
No offense meant, please: you wrote that stgt and scst are the same, and
I wrote that they aren't :-)
>> Actually, we would greatly appreciate if Mike or Christoph will tell
>> us what is so wrong in scst on their opinion, so they started they own
>> new project. (I'm not considering motivation like "I want to make my
>> own in any case" seriously). Is scst too complicated? Do you think
>> stgt will be simpler with all those features implemented?
>>
>
> Didn't we go over this? To get SCST ready for mainline we would have had
> a large cleanup effort. I started this and sent you a patch to begin the
> cleanup. In the end some of the scsi people liked the idea of throwing
> the non-read/write command to userspace and to do this we just decided
> to start over but I have been cutting and pasting your code and cleaning
> it up as I add more stuff.
The patches that I've seen were just fairly mechanical cleanups and
renames, which could be done in half an hour and which I'm going to do
in any case before preparing the patch. So that reason doesn't look
convincing to me for throwing away a big chunk of working code. By doing
so you have delayed SCSI target development by at least a year or two,
because there are too many features for you to implement in stgt which
are already working and useful in scst.
On the other hand, if you look at scst closely you will see that:
- The user space gate could be done easily and cleanly using the existing
dev handler hooks
- There is nothing in current SCST that could be moved to user space
without sacrificing performance: neither task management, nor UA
processing, nor RESERVE/RELEASE emulation. Otherwise, you would have to
pass *all* commands to user space just to check whether a command is
allowed to be processed or must be returned with an error. The SCST core
is just about 7500 lines of code. Is that too much?
So, in my opinion, there is/was no technical reason to drop scst in favor
of stgt.
> Simmer down :) If you had just gotton your code ready when christoph
> asked a year ago we would never have had this problem and I would be
> sending you patches to remove the scsi_request usage as we speak.
Christoph didn't ask me to do anything. Actually, nobody contacted me and
asked me anything about it when you and Christoph conceived stgt.
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Scst-devel] Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 19:47 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " James Bottomley
2005-12-09 3:57 ` Mike Christie
@ 2005-12-09 15:29 ` Vladislav Bolkhovitin
2005-12-09 15:48 ` James Bottomley
1 sibling, 1 reply; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-09 15:29 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Christie, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
James Bottomley wrote:
> On Thu, 2005-12-08 at 21:46 +0300, Vladislav Bolkhovitin wrote:
>
>> - Scst has some performance advantages over stgt, at least, on
>>hardware targets, because it allows internal handling in SIRQ context as
>>well as doesn't need user space program, so it eliminates additional
>>context switches (at least 3 per command for WRITEs and 2 per command
>>for reads plus switches to user space daemon, probably, 2 per command).
>>5 context switches per command looks too much for me, especially
>>considering how little work is done on each context. It means ~15000
>>CS/sec on regular 200Mb/sec link with 64K block size. Additionally,
>>kernel only commands execution allows direct access to page cache, which
>>is GREAT performance improvement, because in architecture with user
>>space daemon the data should be copied several times between kernel and
>>user space. Or, do I miss anything?
>
>
> I do have to say that I consider operation in interrupt context (or even
> kernel context) to be a disadvantage. Compared with the response times
> that most arrays have to SCSI commands, the kernel context switch time
> isn't that significant.
>
> Additionally, it's perfectly possible for all of this to be done zero
> copy on the data. A user space target mmaps the data on its storage
> device and then does a SG_IO type scatter gather user virtual region
> pass to the underlying target infrastructure. We already have this
> demonstrated in the SG_IO path, someone just needs to come up with the
> correct implementation for a target path.
I don't completely understand how this will work. Consider READ/WRITE
commands with random data sizes arriving from an initiator. Are you going
to map/unmap for each command individually, or allocate data buffers for
commands from a premapped area and live with its possible fragmentation?
If you map/unmap individually, then I should say that those are very
expensive operations.
> The great advantage of doing SCSI state machines in user space is that
> you can prototype anything you want, and user space has much better
> state machine implementation (and debugging) tools available than the
> kernel does.
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 20:09 ` Mike Christie
2005-12-08 21:35 ` Dave C Boutcher
@ 2005-12-09 15:29 ` Vladislav Bolkhovitin
1 sibling, 0 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-09 15:29 UTC (permalink / raw)
To: Mike Christie
Cc: James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
> James Bottomley wrote:
>
>> On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
>>
>>> cleanup. In the end some of the scsi people liked the idea of
>>> throwing the non-read/write command to userspace and to do this we
>>> just decided to start over but I have been cutting and pasting your
>>> code and cleaning it up as I add more stuff.
>>
>>
>>
>> To be honest, I'd like to see all command processing at user level
>> (including read/write ... for block devices, it shouldn't be that
>> inefficient, since you're merely going to say remap an area from one
>> device to another; as long as no data transformation ever occurs, the
>> user never touches the data and it all remains in the kernel page
>> cache).
>
>
> Ok, Tomo and I briefly talked about this when we saw Jeff's post about
> doing block layer drivers in userspace on lkml. I think we were somewhat
> prepared for this given some of your other replies.
Could you give me a reference to that message, please?
> So Vlad and other target guys what do you think? Vlad are you going to
> continue to maintain scst as kernel only, or is there some place we can
> work together on this on - if your feelings are not hurt too much that
> is :) ?
On the one hand, I hate dropping anything at a point where it isn't
somehow complete. On the other hand, I don't have much time to spend on
this hobby of mine. Nobody pays me for it; it's just for fun.
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 21:56 ` Mike Christie
@ 2005-12-09 15:29 ` Vladislav Bolkhovitin
2005-12-09 22:31 ` Mike Christie
2005-12-10 8:46 ` FUJITA Tomonori
0 siblings, 2 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-09 15:29 UTC (permalink / raw)
To: Mike Christie
Cc: boutcher, James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
> Dave C Boutcher wrote:
>
>> On Thu, Dec 08, 2005 at 02:09:32PM -0600, Mike Christie wrote:
>>
>>> James Bottomley wrote:
>>>
>>>> On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
>>>>
>>>>
>>>>> cleanup. In the end some of the scsi people liked the idea of
>>>>> throwing the non-read/write command to userspace and to do this we
>>>>> just decided to start over but I have been cutting and pasting your
>>>>> code and cleaning it up as I add more stuff.
>>>>
>>>>
>>>>
>>>> To be honest, I'd like to see all command processing at user level
>>>> (including read/write ... for block devices, it shouldn't be that
>>>> inefficient, since you're merely going to say remap an area from one
>>>> device to another; as long as no data transformation ever occurs, the
>>>> user never touches the data and it all remains in the kernel page
>>>> cache).
>>>
>>>
>>> Ok, Tomo and I briefly talked about this when we saw Jeff's post
>>> about doing block layer drivers in userspace on lkml. I think we were
>>> somewhat prepared for this given some of your other replies.
>>>
>>> So Vlad and other target guys what do you think? Vlad are you going
>>> to continue to maintain scst as kernel only, or is there some place
>>> we can work together on this on - if your feelings are not hurt too
>>> much that is :) ?
>>
>>
>>
>> Oofff....Architecturally I agree with James...do all command processing
>> in one place. On the other hand, the processing involved with a read or
>> write in the normal case (no aborts/resets/ordering/timeouts/etc) is
>> almost zero. Figure out the LBA and length and pass on the I/O. The
>
>
> There is still memory and scatterlist allocations. If we are not going
> to allocate all the memory for a command buffer and request with
> GFP_ATOMIC (and can then run from the the HW interrupt or soft irq) we
> have to pass that on to a thread. I guess there is disagreement whether
> that part is a feature or bad use of GFP_ATOMIC though so... But I just
> mean to say there could be a little more to do.
Actually, there is a way to allocate sg vectors with buffers in SIRQ
context without GFP_ATOMIC. This is the second major improvement pending
in scst; I called it sgv_pool. It is a new allocator in the kernel,
similar to mempool, but it contains *complete* sg vectors of some size
with data buffers (pages). An initiator usually sends data requests with
some fixed size, like 128K. After a data command completes, its sg vector
is not immediately freed but is kept in the sgv_pool until the next
request (command) or until there is memory pressure on the system. So all
subsequent commands allocate already-built vectors; only the first
allocations are done in thread context. This makes it possible to
"allocate" huge chunks of memory in SIRQ context, and it saves a lot of
the CPU power otherwise needed to build big sg vectors for each command
individually.
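For illustration, a much simplified sketch of the idea (this is not the
actual sgv_pool code; the memory-pressure shrink hook and the initial
construction of the vectors in thread context are omitted):

#include <linux/list.h>
#include <linux/spinlock.h>

struct sgv_entry {
	struct list_head list;
	struct scatterlist *sg;		/* prebuilt vector, pages included */
	unsigned int sg_cnt;		/* one pool holds one vector size */
};

struct sgv_pool {
	spinlock_t lock;
	struct list_head free_list;
};

/* Fast path: may be called from SIRQ context, never sleeps and never
 * allocates - it only takes a ready-made vector off the free list. */
static struct sgv_entry *sgv_pool_get(struct sgv_pool *pool)
{
	struct sgv_entry *e = NULL;
	unsigned long flags;

	spin_lock_irqsave(&pool->lock, flags);
	if (!list_empty(&pool->free_list)) {
		e = list_entry(pool->free_list.next, struct sgv_entry, list);
		list_del(&e->list);
	}
	spin_unlock_irqrestore(&pool->lock, flags);
	return e;	/* NULL: defer to a thread that builds a new vector */
}

static void sgv_pool_put(struct sgv_pool *pool, struct sgv_entry *e)
{
	unsigned long flags;

	spin_lock_irqsave(&pool->lock, flags);
	list_add(&e->list, &pool->free_list);	/* keep the pages for the next command */
	spin_unlock_irqrestore(&pool->lock, flags);
}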
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target?
2005-12-08 18:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Mike Christie
@ 2005-12-09 15:30 ` Vladislav Bolkhovitin
2005-12-09 22:31 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Mike Christie
0 siblings, 1 reply; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-09 15:30 UTC (permalink / raw)
To: Mike Christie
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
> Vladislav Bolkhovitin wrote:
>
>> Mike Christie wrote:
>>
>>> johan@capvert.se wrote:
>>>
>>>> I, and I suppose a lot of other people, would like know how you look
>>>> at the
>>>> HW support on the Fc side? I don´t know what you mean now, when you
>>>> talk
>>>> about HW target support,
>>>
>>>
>>>
>>> For qlogic FC, it is the same as scst at this point, except we are
>>> based off the mainline qla2xxx driver.
>>
>>
>>
>> Actually, this is not completely true and could mislead people. Stgt is
>> *not* the same as scst at this point and,
>
>
> Yes, I agree, I was not talking about functionality. I was only talking
> about how we interface with the initiator driver. My mistake. So we add
> a qla_tgt function callback struct (but we do it for each ha rather than
> global) so the initiator can call back into the target driver, so I am
> basically resusing that code.
OK, if you reuse scst code, I hope you won't forget to insert a reference
to its original authors somewhere :-)
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 21:35 ` Dave C Boutcher
2005-12-08 21:56 ` Mike Christie
@ 2005-12-09 15:30 ` Vladislav Bolkhovitin
1 sibling, 0 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-09 15:30 UTC (permalink / raw)
To: boutcher
Cc: Mike Christie, James Bottomley, johan, iscsitarget-devel, mingz,
stgt, Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Dave C Boutcher wrote:
> On Thu, Dec 08, 2005 at 02:09:32PM -0600, Mike Christie wrote:
>
>>James Bottomley wrote:
>>
>>>On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
>>>
>>>
>>>>cleanup. In the end some of the scsi people liked the idea of throwing
>>>>the non-read/write command to userspace and to do this we just decided
>>>>to start over but I have been cutting and pasting your code and cleaning
>>>>it up as I add more stuff.
>>>
>>>
>>>To be honest, I'd like to see all command processing at user level
>>>(including read/write ... for block devices, it shouldn't be that
>>>inefficient, since you're merely going to say remap an area from one
>>>device to another; as long as no data transformation ever occurs, the
>>>user never touches the data and it all remains in the kernel page
>>>cache).
>>
>>Ok, Tomo and I briefly talked about this when we saw Jeff's post about
>>doing block layer drivers in userspace on lkml. I think we were somewhat
>>prepared for this given some of your other replies.
>>
>>So Vlad and other target guys what do you think? Vlad are you going to
>>continue to maintain scst as kernel only, or is there some place we can
>>work together on this on - if your feelings are not hurt too much that
>>is :) ?
>
>
> Oofff....Architecturally I agree with James...do all command processing
> in one place. On the other hand, the processing involved with a read or
> write in the normal case (no aborts/resets/ordering/timeouts/etc) is
> almost zero. Figure out the LBA and length and pass on the I/O. The
> overhead of passing it up and down across the kernel boundary is likely
> to be orders of magnitude larger than the actual processing. I would
> personally rather not fix this decision in concrete until we could do
> some actual measurements of a SCSI target under heavy load.
Totally agree.
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Scst-devel] Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-09 15:29 ` [Scst-devel] Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Vladislav Bolkhovitin
@ 2005-12-09 15:48 ` James Bottomley
2005-12-10 15:32 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 43+ messages in thread
From: James Bottomley @ 2005-12-09 15:48 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Mike Christie, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
On Fri, 2005-12-09 at 18:29 +0300, Vladislav Bolkhovitin wrote:
> > Additionally, it's perfectly possible for all of this to be done zero
> > copy on the data. A user space target mmaps the data on its storage
> > device and then does a SG_IO type scatter gather user virtual region
> > pass to the underlying target infrastructure. We already have this
> > demonstrated in the SG_IO path, someone just needs to come up with the
> > correct implementation for a target path.
>
> I'm not completely understand how it will work. Consider, there are
> READ/WRITE commands with random data sizes arrive from an initiator. Are
> you going to do map/unmap for each command individually or alloc data
> buffers for commands from a premapped area and live with possible its
> fragmentation? If map/unmap individually, then I should say that those
> are very expensive operations.
You do it the same way an array does: the model for SPI is read command
phase, disconnect, process the command (i.e. set up areas), reconnect for
the data transfer.
map/unmap are really only necessary if you're emulating the data store,
but it's a fairly cheap operation on Linux: it just causes the creation
of a vm_area. If it's a pass-through, you can use SG_IO to pull the data
in and SG_IO-like output to shoot it out again, effectively using a piece
of user memory as a zero copy buffer.
Fragmentation isn't an issue because the I/O goes via sg lists; all
that's needed is a bunch of pages.
James
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-09 15:28 ` Vladislav Bolkhovitin
@ 2005-12-09 22:23 ` Mike Christie
2005-12-10 1:15 ` Ang: Re: [Stgt-devel] " Mike Christie
2005-12-10 8:46 ` FUJITA Tomonori
1 sibling, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-09 22:23 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> Mike Christie wrote:
>
>> Vladislav Bolkhovitin wrote:
>>
>>> - Scst has some performance advantages over stgt, at least, on
>>> hardware targets, because it allows internal handling in SIRQ context
>>> as well as doesn't need user space program, so it eliminates additional context
>>> switches (at least 3 per command for WRITEs and 2 per command for
>>> reads plus switches to user space daemon, probably, 2 per command). 5
>>> context switches per command looks too much for me, especially
>>> considering how little work is done on each context. It means ~15000
>>> CS/sec on regular 200Mb/sec link with 64K block size. Additionally,
>>> kernel only commands execution allows direct access to page cache,
>>> which is GREAT performance improvement, because in architecture with
>>> user space daemon the data should be copied several times between
>>> kernel and user space. Or, do I miss anything?
>>
>>
>>
>> Userspace is only used for non-READ/WRITE commands, so for REPORT_LUNS,
>> TUR, etc., where performance is not a factor.
>
>
> So, do you mean that READ/WRITE operations are (or will be) performed in
> kernel space only? I don't see that from the code.
They are today. I think they have always been that way. Look again :)
>>>
>>> From other side, stgt has not too much advantages over scst.
>>
>>
>>
>> Hey, we just started and have not had too much time.
>
>
> No offense, please, you wrote stgt and scst are the same, I wrote, they
> aren't :-)
>
Yeah, I know. It was my mistake. If you read the other mails, you would
see I mentioned that we do not have any type of error handling yet, and
that I made Tomo mess up the serialization stuff because I asked him to
try to reuse some block layer code.
If you read the other mails, you would also see I recommended scst to
people requesting something stable :)
>>> Actually, we would greatly appreciate if Mike or Christoph will tell
>>> us what is so wrong in scst on their opinion, so they started they
>>> own new project. (I'm not considering motivation like "I want to make
>>> my own in any case" seriously). Is scst too complicated? Do you think
>>> stgt will be simpler with all those features implemented?
>>>
>>
>> Didn't we go over this? To get SCST ready for mainline we would have
>> had a large cleanup effort. I started this and sent you a patch to
>> begin the cleanup. In the end some of the scsi people liked the idea
>> of throwing the non-read/write command to userspace and to do this we
>> just decided to start over but I have been cutting and pasting your
>> code and cleaning it up as I add more stuff.
>
>
> The patches that I've seen were just pretty mechanic cleanups and
> renames, which could be done in a half of hour time and which I'm going
Yeah, it was the beginning of the easy work. I did not mean that as an
example of everything. I thought you would remember when we discussed
this on linux-scsi before.
> to do in any case before preparing the patch. So, that reason doesn't
> look convincing for me to throw away a big chunk of working code. Doing
> so you delayed SCSI targets development for at least to a year-two,
Hey, look how long it took iscsi to get in because we wasted so much
time cleaning up iscsi-sfnet :)
> because there are too much features for you to implement in stgt, which
> are already working and useful in scst.
Well, there was more when you asked on linux-scsi. You have other things
like refcounting (we are only adding that in today, but we do at least
get references to the scsi_host when we access the qla2xxx ha). If
someone rips out a qla2xxx card you will oops.
We also did not want to hook in as a SCSI ULD interface because we did
not want to worry about what happens when people start using
usb-mass-storage for targets and LUNs. Look how many times we see Alan
Stern pop up for just the initiator side :) And to be honest, DM would do
a lot of what tgt and scst want as far as giving us a reference to the
device we want to use as a LUN and handling all the setup work, so we
probably both messed up there :(
>
> From other side, if you look on scst closely you will see that:
>
> - The user space gate could be easily and clearly done using existing
> dev handler's hooks
Yeah, and the problem is that we just do not believe those are
implemented in the correct place. We do not like the class interface SCSI
hook-in when we can do the same thing from userspace.
>
> - There is nothing in current SCST that could be moved in user space
> without sacrificing performance. Neither task management, nor UA
> processing, nor RESERVE/RELEASE emulation. Otherwise, you will have to
> pass *all* the commands to the user space to check allowed this command
> for processing or it shall be returned with an error.
For non-READ/WRITE commands, we are ok with the performance drop. Even
for doing READ/WRITEs in the kernel from interrupt context, we were going
to run from a softirq, but we thought allocating the whole command with
GFP_ATOMIC would not be liked, so we pass it to the thread. And when we
do pass-through (using elv_add_request/blk_execute_rq_nowait), we can do
it with just the context switch needed for the memory allocation. But
doing GFP_ATOMIC from softirq or hw irq would not be a problem, although
I do not think we want to submit IO from the hw interrupt in case the
queue gets unplugged at the same time :)
For non-READ/WRITEs, look how far open-iscsi went. And from James's
reply, you see that he thinks READs and WRITEs can go to userspace too,
so you know this is an uphill battle to get everything in the kernel.
> SCST core is just
> about 7500 lines of code. Is it too much?
Ask the people that have to review the code? :) After sfnet, I learned
that it is sometimes best to get the basics into the kernel first so we
do not burn out the Christoph robot :) I think part of this stems from
the fact that I touched pretty much every line in that driver to clean it
up and it took me about a year. And while I was beginning to clean up
scst I began to remember sfnet.
But there are other cleanups, like moving some of the state to
per-target, cleaning up the scatterlist allocation code and moving it to
scsi-ml so the SCSI ULDs can use it and be converted to it. There are
also things like converting to the right APIs for 2.6 (rm kernel_thread,
rm scsi_request, rm proc, fix up the class interface refcounting
problems, fix up the lack of scsi_device refcounting usage, etc).
>
>> Simmer down :) If you had just gotton your code ready when christoph
>> asked a year ago we would never have had this problem and I would be
>> sending you patches to remove the scsi_request usage as we speak.
>
>
> Christoph didn't ask me to do anything. Actually, nobody contacted me and
What? Look at the review comments you got on the list when you first
posted scst. They asked you to take the 2.4-ish code out and clean it
up, didn't they?
> asked me anything about it when you and Christoph conceived stgt.
>
I do not think Christoph and I conceived stgt. Tomo and I did, or
actually Tomo did the original code to push things to userspace; I just
bugged him about getting something into mainline. We only asked Christoph
what we should do to get things into mainline. After seeing what
open-iscsi did and the comments from Arjan about IET, we thought we would
have to follow a similar route to open-iscsi.
At this point I do not care what we do. I just want to get something into
mainline for everyone to use, and so I can use it at home because I am
lazy and do not want to drive to work in the snow to test FC card
updates :) If we can come to some conclusions on some of these issues I
am sure Tomo and I would be happy to work together.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-09 15:29 ` Vladislav Bolkhovitin
@ 2005-12-09 22:31 ` Mike Christie
2005-12-10 15:31 ` Vladislav Bolkhovitin
2005-12-10 8:46 ` FUJITA Tomonori
1 sibling, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-09 22:31 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: boutcher, James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> Mike Christie wrote:
>
>> Dave C Boutcher wrote:
>>
>>> On Thu, Dec 08, 2005 at 02:09:32PM -0600, Mike Christie wrote:
>>>
>>>> James Bottomley wrote:
>>>>
>>>>> On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
>>>>>
>>>>>
>>>>>> cleanup. In the end some of the scsi people liked the idea of
>>>>>> throwing the non-read/write command to userspace and to do this we
>>>>>> just decided to start over but I have been cutting and pasting
>>>>>> your code and cleaning it up as I add more stuff.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> To be honest, I'd like to see all command processing at user level
>>>>> (including read/write ... for block devices, it shouldn't be that
>>>>> inefficient, since you're merely going to say remap an area from one
>>>>> device to another; as long as no data transformation ever occurs, the
>>>>> user never touches the data and it all remains in the kernel page
>>>>> cache).
>>>>
>>>>
>>>>
>>>> Ok, Tomo and I briefly talked about this when we saw Jeff's post
>>>> about doing block layer drivers in userspace on lkml. I think we
>>>> were somewhat prepared for this given some of your other replies.
>>>>
>>>> So Vlad and other target guys what do you think? Vlad are you going
>>>> to continue to maintain scst as kernel only, or is there some place
>>>> we can work together on this on - if your feelings are not hurt too
>>>> much that is :) ?
>>>
>>>
>>>
>>>
>>> Oofff....Architecturally I agree with James...do all command processing
>>> in one place. On the other hand, the processing involved with a read or
>>> write in the normal case (no aborts/resets/ordering/timeouts/etc) is
>>> almost zero. Figure out the LBA and length and pass on the I/O. The
>>
>>
>>
>> There is still memory and scatterlist allocations. If we are not going
>> to allocate all the memory for a command buffer and request with
>> GFP_ATOMIC (and can then run from the the HW interrupt or soft irq) we
>> have to pass that on to a thread. I guess there is disagreement
>> whether that part is a feature or bad use of GFP_ATOMIC though so...
>> But I just mean to say there could be a little more to do.
>
>
> Actually, there is the way to allocate sg vectors with buffers in SIRQ
> and not with GFP_ATOMIC. This is the second major improvement, which is
> pending in scst. I called it sgv_pool. This is a new allocator in the
> kernel similar to mem_pool, but it contains *complete* sg-vectors of
> some size with data buffers (pages). Initiator sends data requests
> usually with some fixed size, like 128K. After a data command completed,
> its sg vector will not be immediately freed, but will be kept in
We considered this, but what did you decide is the upper size limit
for the pool? Is it dynamic? We also wanted something that the SCSI
ULDs could use for their allocations, which could go up to 6 MB.
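For illustration, here is a minimal sketch of the sgv_pool idea being
discussed, assuming a simple per-order free list of pre-built vectors;
the structure, function names and SGV_POOL_ORDERS constant are
illustrative assumptions, not the actual scst code:

#include <linux/spinlock.h>
#include <linux/mm.h>

#define SGV_POOL_ORDERS 11          /* assumed: vectors up to 2^10 pages */

struct sgv_entry {                  /* one cached, fully built sg vector */
        struct sgv_entry *next;
        int nr_pages;               /* data pages kept attached to it */
        struct page *pages[0];
};

struct sgv_pool {
        spinlock_t lock;
        struct sgv_entry *free[SGV_POOL_ORDERS]; /* one list per size order */
};

/* Reuse a cached vector; safe from SIRQ context because a cache hit
 * involves no page allocation at all. */
static struct sgv_entry *sgv_pool_alloc(struct sgv_pool *p, int order)
{
        struct sgv_entry *e;
        unsigned long flags;

        spin_lock_irqsave(&p->lock, flags);
        e = p->free[order];
        if (e)
                p->free[order] = e->next;
        spin_unlock_irqrestore(&p->lock, flags);
        return e;   /* NULL: fall back to building the vector in a thread */
}

/* On command completion the vector, pages included, is kept for reuse
 * instead of being freed (until memory pressure asks for it back). */
static void sgv_pool_free(struct sgv_pool *p, struct sgv_entry *e, int order)
{
        unsigned long flags;

        spin_lock_irqsave(&p->lock, flags);
        e->next = p->free[order];
        p->free[order] = e;
        spin_unlock_irqrestore(&p->lock, flags);
}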
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-09 15:30 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
@ 2005-12-09 22:31 ` Mike Christie
0 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2005-12-09 22:31 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
>>
>> Yes, I agree, I was not talking about functionality. I was only
>> talking about how we interface with the initiator driver. My mistake.
>> So we add a qla_tgt function callback struct (but we do it for each ha
>> rather than global) so the initiator can call back into the target
>> driver, so I am basically resusing that code.
>
>
> OK, if you reuse scst code, I hope you won't forget to insert somewhere
> reference to its original authors :-)
>
Yes, of course; where appropriate I even kept your copyrights.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target?
2005-12-09 22:23 ` Mike Christie
@ 2005-12-10 1:15 ` Mike Christie
2005-12-10 15:30 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Vladislav Bolkhovitin
0 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-10 1:15 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
>>>> Actually, we would greatly appreciate if Mike or Christoph will tell
>>>> us what is so wrong in scst on their opinion, so they started they
>>>> own new project. (I'm not considering motivation like "I want to
>>>> make my own in any case" seriously). Is scst too complicated? Do you
>>>> think stgt will be simpler with all those features implemented?
>>>>
>>>
>>> Didn't we go over this? To get SCST ready for mainline we would have
>>> had a large cleanup effort. I started this and sent you a patch to
>>> begin the cleanup. In the end some of the scsi people liked the idea
>>> of throwing the non-read/write command to userspace and to do this we
>>> just decided to start over but I have been cutting and pasting your
>>> code and cleaning it up as I add more stuff.
>>
>>
>>
>> The patches that I've seen were just pretty mechanic cleanups and
>> renames, which could be done in a half of hour time and which I'm going
>
>
> Yeah it was the beginning of the easy work. I did not mean that as an
> example of evertthing. I thought you would remember when we discussed
> this on linux-scsi before.
>
>
>> to do in any case before preparing the patch. So, that reason doesn't
>> look convincing for me to throw away a big chunk of working code.
>> Doing so you delayed SCSI targets development for at least to a year-two,
>
>
> Hey, looked how long it took iscsi to get in becuase we wasted so much
> time cleaning up iscsi-sfnet :)
>
>> because there are too much features for you to implement in stgt,
>> which are already working and useful in scst.
>
>
> Well, there was more when you asked on linux-scsi. You have other things
> like refcouting (we only are adding that in today, but we do get
> references to the scsi_host when we access the qla2xxx ha at least). If
> someone ripps out a qla2xxx card you will oops.
>
> We also did not want to hook in as a SCSI ULD interface becuase we did
> not want to worry about what happens when poeple start using
> usb-mass-storage for targets and LUNs. Look how many times we see Alan
> Stern pop up for just the initiator side :) And to be honest DM would do
> a lot of what tgt and scst want as far as giving us a reference to the
> devive we want to use as a LUN and handling all the setup work so we
> probably both messed up there :(
>
>>
>> From other side, if you look on scst closely you will see that:
>>
>> - The user space gate could be easily and clearly done using existing
>> dev handler's hooks
>
>
> Yeah and the problem is that we just do not believe those are
> implemented in the correct place. We do not like class interface SCSI
> hook in, when we can do the same thing from userspace.
>
>>
>> - There is nothing in current SCST that could be moved in user space
>> without sacrificing performance. Neither task management, nor UA
>> processing, nor RESERVE/RELEASE emulation. Otherwise, you will have to
>> pass *all* the commands to the user space to check allowed this
>> command for processing or it shall be returned with an error.
>
>
> For non READ/WRITE, we are ok with that performance drops. Even for
> doing READ/WRITEs in the kernel from interrupt context, we were going to
> run from a softirq, but we thought allocating the whole command with
> GFP_ATOMIC would not be liked so we pass it to the thread. And for when
> we do pass through (using elv_add_request/blk_rq_execute_nowait), we can
> do it in just the context swith needed for the memory allocation. But to
> do GFP_ATOMOIC softirq or hw irq would not be a problem, although I do
> not think we want to submit IO from the hw interrupt incase the queue
> gets unplugged at the same time :)
>
> For non READ/WRITEs look how far open-iscsi went. And from James's
> reply, you see that he thinks READs and WRITEs can go to userspace too,
> so you know this is an uphill battle to get everything in the kernel.
>
>
> SCST core is just
>
>> about 7500 lines of code. Is it too much?
>
>
> Ask the people that have to review the code? :) After sfnet, I learned
> that it is sometimes best to get the basics in the kernel first so we do
> not burn out the christoph robot :) I think part of this stems from the
> fact that I touched pretty much every line in that driver to clean it up
> and it took me about a year. And while I was beginning to clean up scst
> I began to remember sfnet.
>
> But there are other cleanups like moving some of the state to per
> target, cleaningup the scattlist allocation code and moving it to
> scsi-ml so the SCSI ULDs can use them and convert them. There is also
> thing like converting to the right APIs for 2.6 (rm kernel_thread, rm
> scsi_request, rm proc, fixup class interface refcouting problems, fixup
> scsi_device lack of refcounting usage, etc).
>
Oh yeah, I think the other major issue I at least had with scst was
that it was SCSI specific, and we wanted to try and separate things
so that if drivers like IET and vscsi are allowed, then we could also
do other drivers, like an ATA over Ethernet target driver, or allow
any other target driver that wanted to hook in. I think you noted
that we were separating some protocol-specific things as a
disadvantage, or mentioned it for some reason, but I am not
completely sure why and we may not agree on that issue either.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-09 15:29 ` Vladislav Bolkhovitin
2005-12-09 22:31 ` Mike Christie
@ 2005-12-10 8:46 ` FUJITA Tomonori
1 sibling, 0 replies; 43+ messages in thread
From: FUJITA Tomonori @ 2005-12-10 8:46 UTC (permalink / raw)
To: vst
Cc: michaelc, boutcher, James.Bottomley, johan, iscsitarget-devel,
mingz, stgt-devel, WRWHITEHEAD, scst-devel, linux-scsi, hch
From: Vladislav Bolkhovitin <vst@vlnb.net>
Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
Date: Fri, 09 Dec 2005 18:29:57 +0300
> > There is still memory and scatterlist allocations. If we are not going
> > to allocate all the memory for a command buffer and request with
> > GFP_ATOMIC (and can then run from the the HW interrupt or soft irq) we
> > have to pass that on to a thread. I guess there is disagreement whether
> > that part is a feature or bad use of GFP_ATOMIC though so... But I just
> > mean to say there could be a little more to do.
>
> Actually, there is the way to allocate sg vectors with buffers in SIRQ
> and not with GFP_ATOMIC. This is the second major improvement, which is
> pending in scst. I called it sgv_pool. This is a new allocator in the
> kernel similar to mem_pool, but it contains *complete* sg-vectors of
> some size with data buffers (pages). Initiator sends data requests
> usually with some fixed size, like 128K. After a data command completed,
> its sg vector will not be immediately freed, but will be kept in
> sgv_pool until the next request (command) or memory pressure on the
> system. So, all subsequent commands will allocate already built vectors.
> The first allocations will be done in some thread context. This allows
> to allocate huge chunks of memory in SIRQ context as well as save a lot
> of CPU power necessary to always build big sg vectors for each command
> individually.
The ibmvscsis code does the same thing, though it is mainly for
caching dma-mapped pages, I guess.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-09 15:28 ` Vladislav Bolkhovitin
2005-12-09 22:23 ` Mike Christie
@ 2005-12-10 8:46 ` FUJITA Tomonori
2005-12-10 15:32 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
1 sibling, 1 reply; 43+ messages in thread
From: FUJITA Tomonori @ 2005-12-10 8:46 UTC (permalink / raw)
To: vst
Cc: michaelc, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
From: Vladislav Bolkhovitin <vst@vlnb.net>
Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
Date: Fri, 09 Dec 2005 18:28:58 +0300
> > Also is the page cache comment in reference to us using the page cache
> > for our reads and writes or I am not sure why you wrote that if you do
> > not do it right now.
>
> Hm, at first, the page cache already used somehow in fileio dev handler
> (though, with additional mem copy). At the second, fully utilize the
> page cache is one of two major improvements that are pending in scst,
> because it is required changing the kernel, which until some moment try
> to avoid. Although I prepared what is necessary for that.
>
> The idea basically is the following. When READ operation arrives, pages
> for all requested blocks are at first searched in the page cache
> (probably, in SIRQ context, because it isn't expensive operation) and if
> all pages are found, they are referenced and the result will be sent to
> the initiator. Then the pages will be dereferenced (so, no pages
> allocation will be done at all). Otherwise, the missed pages will be
> allocated and the command will be rescheduled to the thread, which will
> read them. Then, after the response is sent, the pages will remain in
> the page cache for future accesses. For WRITEs the processing is the
> similar, the pages with the data will be put in the page cache.
The Ardis iSCSI target code does the same thing.
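For illustration, a hedged sketch of that lookup-first READ path:
probe the page cache (cheap enough for SIRQ context) and only
reschedule the command to a thread when a page is missing.
find_get_page()/page_cache_release() are the real kernel primitives;
the helper itself and how its result is used are assumptions:

#include <linux/pagemap.h>

static int try_read_from_cache(struct address_space *mapping,
                               pgoff_t first, int nr_pages,
                               struct page **pages)
{
        int i;

        for (i = 0; i < nr_pages; i++) {
                pages[i] = find_get_page(mapping, first + i);
                if (!pages[i])
                        goto miss;
        }
        return nr_pages;  /* all pages referenced: send them, then drop the refs */

miss:
        /* Drop the references taken so far; the command gets rescheduled
         * to a thread, which allocates and reads the missing pages. */
        while (--i >= 0)
                page_cache_release(pages[i]);
        return -1;
}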
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-10 1:15 ` Ang: Re: [Stgt-devel] " Mike Christie
@ 2005-12-10 15:30 ` Vladislav Bolkhovitin
2005-12-10 18:22 ` Mike Christie
0 siblings, 1 reply; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-10 15:30 UTC (permalink / raw)
To: Mike Christie
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
>> But there are other cleanups like moving some of the state to per
>> target, cleaningup the scattlist allocation code and moving it to
>> scsi-ml so the SCSI ULDs can use them and convert them. There is also
>> thing like converting to the right APIs for 2.6 (rm kernel_thread, rm
>> scsi_request, rm proc, fixup class interface refcouting problems,
>> fixup scsi_device lack of refcounting usage, etc).
>>
>
> Oh yeah I think the other major issue at least I had with scst was that
> it was scsi specific and we wanted try and seperate things so if drivers
> like IET and vscsi are allowed then we could also do other drivers like
> a ATA over ethernet target driver or allow any other target driver that
> wanted to to hook in. I think you noted that we were spererating some
> protocol specific things as a distadvantage or mentioned it for some
> reason but I am not completely sure why and we may not agree on that
> issue too.
SCSI has a lot of very specific stuff, like UA handling and (at
least) some parts of task management, especially if we consider
honoring the NACA, QErr, TST and UA_INTLCK_CTRL bits, so I'm not sure
that having common target parts for other protocols is worth
complicating the mid-layer with code and interfaces to separate the
SCSI specifics from non-SCSI protocols. So, good luck with it :-)
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-09 22:31 ` Mike Christie
@ 2005-12-10 15:31 ` Vladislav Bolkhovitin
2005-12-10 18:19 ` Mike Christie
0 siblings, 1 reply; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-10 15:31 UTC (permalink / raw)
To: Mike Christie
Cc: boutcher, James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
>>> There is still memory and scatterlist allocations. If we are not
>>> going to allocate all the memory for a command buffer and request
>>> with GFP_ATOMIC (and can then run from the the HW interrupt or soft
>>> irq) we have to pass that on to a thread. I guess there is
>>> disagreement whether that part is a feature or bad use of GFP_ATOMIC
>>> though so... But I just mean to say there could be a little more to do.
>>
>>
>>
>> Actually, there is the way to allocate sg vectors with buffers in SIRQ
>> and not with GFP_ATOMIC. This is the second major improvement, which
>> is pending in scst. I called it sgv_pool. This is a new allocator in
>> the kernel similar to mem_pool, but it contains *complete* sg-vectors
>> of some size with data buffers (pages). Initiator sends data requests
>> usually with some fixed size, like 128K. After a data command
>> completed, its sg vector will not be immediately freed, but will be
>> kept in
>
>
> We considered this, but what did you decide is the upper limit size for
> the pool? Is it dynmaic? We also wanted something that the SCSI ULDs
> could use for their allocations which could go up to 6 MB.
Why do you think it needs any upper size limit? Would you also like
upper limits on the sizes of the page or slab caches? I don't see any
difference between them and sgv_pool. Let it allocate and keep as
much unused memory as possible until it is notified about memory
pressure and asked to free some, just as the slab allocator does.
BTW, from our experience, initiators tend to send commands without
limit, and if the actual SCSI device can't serve them at the rate at
which they arrive, the incoming queued commands very quickly eat all
system memory, with obvious consequences. So, there must be some kind
of IO throttling: after some watermark the initiators start receiving
TASK QUEUE FULL, which calms them down. I implemented a very simple
one: don't allow more than 32 commands per session. This lets scst
survive under extreme loads. You can think of something smarter :-)
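As a rough sketch of that kind of throttling (the session structure,
counter handling and constant are assumptions for illustration, not
the actual scst code):

#include <asm/atomic.h>

#define SESS_MAX_QUEUED_CMDS 32          /* assumed per-session watermark */

struct target_session {
        atomic_t queued_cmds;
};

/* Returns nonzero if the new command should be answered with
 * TASK SET FULL / QUEUE FULL status instead of being queued. */
static int session_throttle(struct target_session *sess)
{
        if (atomic_inc_return(&sess->queued_cmds) > SESS_MAX_QUEUED_CMDS) {
                atomic_dec(&sess->queued_cmds);
                return 1;
        }
        return 0;
}

/* Called when a queued command completes. */
static void session_cmd_done(struct target_session *sess)
{
        atomic_dec(&sess->queued_cmds);
}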
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target?
2005-12-10 8:46 ` FUJITA Tomonori
@ 2005-12-10 15:32 ` Vladislav Bolkhovitin
2005-12-10 15:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " FUJITA Tomonori
2005-12-10 18:09 ` Mike Christie
0 siblings, 2 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-10 15:32 UTC (permalink / raw)
To: FUJITA Tomonori
Cc: michaelc, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
FUJITA Tomonori wrote:
> From: Vladislav Bolkhovitin <vst@vlnb.net>
> Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
> Date: Fri, 09 Dec 2005 18:28:58 +0300
>
>
>>>Also is the page cache comment in reference to us using the page cache
>>>for our reads and writes or I am not sure why you wrote that if you do
>>>not do it right now.
>>
>>Hm, at first, the page cache already used somehow in fileio dev handler
>>(though, with additional mem copy). At the second, fully utilize the
>>page cache is one of two major improvements that are pending in scst,
>>because it is required changing the kernel, which until some moment try
>>to avoid. Although I prepared what is necessary for that.
>>
>>The idea basically is the following. When READ operation arrives, pages
>>for all requested blocks are at first searched in the page cache
>>(probably, in SIRQ context, because it isn't expensive operation) and if
>>all pages are found, they are referenced and the result will be sent to
>>the initiator. Then the pages will be dereferenced (so, no pages
>>allocation will be done at all). Otherwise, the missed pages will be
>>allocated and the command will be rescheduled to the thread, which will
>>read them. Then, after the response is sent, the pages will remain in
>>the page cache for future accesses. For WRITEs the processing is the
>>similar, the pages with the data will be put in the page cache.
>
> The Ardis iSCSI target code does the same thing.
Perfect. So why not do it at the mid-layer level, where all targets
can benefit from it?
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: stgt a new version of iscsi target?
2005-12-09 15:48 ` James Bottomley
@ 2005-12-10 15:32 ` Vladislav Bolkhovitin
2005-12-10 18:07 ` Mike Christie
0 siblings, 1 reply; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-10 15:32 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Christie, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
James Bottomley wrote:
> On Fri, 2005-12-09 at 18:29 +0300, Vladislav Bolkhovitin wrote:
>
>>>Additionally, it's perfectly possible for all of this to be done zero
>>>copy on the data. A user space target mmaps the data on its storage
>>>device and then does a SG_IO type scatter gather user virtual region
>>>pass to the underlying target infrastructure. We already have this
>>>demonstrated in the SG_IO path, someone just needs to come up with the
>>>correct implementation for a target path.
>>
>>I'm not completely understand how it will work. Consider, there are
>>READ/WRITE commands with random data sizes arrive from an initiator. Are
>>you going to do map/unmap for each command individually or alloc data
>>buffers for commands from a premapped area and live with possible its
>>fragmentation? If map/unmap individually, then I should say that those
>>are very expensive operations.
>
> You do it the same way an array does: the model for SPI is read command
> phase, disconnect, process command (i.e. set up areas) reconnect for
> data transfer.
>
> map/unmap are really only necessary if you're emulating the data store,
> but it's a fairly cheap operation on linux: it just causes the creation
> of a vm_area. If it's a pass through, you can use SG_IO to pull it in
> and the SG_IO like output to shoot it out again, effectively using a
> piece of user memory as a zero copy buffer.
>
> Fragmentation isn't an issue because the I/O goes via sg lists , all
> that's needed is a bunch of pages.
OK, I see what you meant, thanks.
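For reference, a hedged user-space sketch of that SG_IO pass-through:
a READ(10) is issued against a local sg device with the caller's
buffer as the data area, so the daemon never copies the data itself.
The helper name is made up, 512-byte blocks are assumed, and error
and sense handling are trimmed:

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

static int sgio_read10(int fd, unsigned int lba, void *buf, unsigned int len)
{
        unsigned char cdb[10];
        unsigned char sense[32];
        struct sg_io_hdr hdr;
        unsigned int nblocks = len / 512;   /* assumes 512-byte blocks */

        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x28;                      /* READ(10) */
        cdb[2] = lba >> 24; cdb[3] = lba >> 16;
        cdb[4] = lba >> 8;  cdb[5] = lba;
        cdb[7] = nblocks >> 8;
        cdb[8] = nblocks & 0xff;

        memset(&hdr, 0, sizeof(hdr));
        hdr.interface_id = 'S';
        hdr.cmd_len = sizeof(cdb);
        hdr.cmdp = cdb;
        hdr.dxfer_direction = SG_DXFER_FROM_DEV;
        hdr.dxferp = buf;                   /* initiator-supplied buffer */
        hdr.dxfer_len = len;
        hdr.sbp = sense;
        hdr.mx_sb_len = sizeof(sense);
        hdr.timeout = 30000;                /* milliseconds */

        return ioctl(fd, SG_IO, &hdr);
}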
> I do have to say that I consider operation in interrupt context (or even
> kernel context) to be a disadvantage. Compared with the response times
> that most arrays have to SCSI commands, the kernel context switch time
> isn't that significant.
Are you sure that there are no SCSI arrays now, and won't be in the
near future, (e.g. iSCSI ones) with a response time/latency so small
that having 5 (five) or more context switches per command, some of
which include map/unmap operations, won't increase the latency too
much? I mean, take e.g. the NFS server, which originally was a user
space daemon that many people didn't want in the kernel. Eventually,
it got in. I don't see any fundamental difference between an NFS
server and a SCSI target server, especially considering that SCSI is
a synchronous protocol, where latency matters a lot, and usually
there is some FS on top of the exported SCSI device, which doesn't
always do bulk data operations.
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-10 15:32 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
@ 2005-12-10 15:54 ` FUJITA Tomonori
2005-12-14 15:17 ` [Scst-devel] " Vladislav Bolkhovitin
2005-12-10 18:09 ` Mike Christie
1 sibling, 1 reply; 43+ messages in thread
From: FUJITA Tomonori @ 2005-12-10 15:54 UTC (permalink / raw)
To: vst
Cc: michaelc, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
From: Vladislav Bolkhovitin <vst@vlnb.net>
Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
Date: Sat, 10 Dec 2005 18:32:46 +0300
> >>>Also is the page cache comment in reference to us using the page cache
> >>>for our reads and writes or I am not sure why you wrote that if you do
> >>>not do it right now.
> >>
> >>Hm, at first, the page cache already used somehow in fileio dev handler
> >>(though, with additional mem copy). At the second, fully utilize the
> >>page cache is one of two major improvements that are pending in scst,
> >>because it is required changing the kernel, which until some moment try
> >>to avoid. Although I prepared what is necessary for that.
> >>
> >>The idea basically is the following. When READ operation arrives, pages
> >>for all requested blocks are at first searched in the page cache
> >>(probably, in SIRQ context, because it isn't expensive operation) and if
> >>all pages are found, they are referenced and the result will be sent to
> >>the initiator. Then the pages will be dereferenced (so, no pages
> >>allocation will be done at all). Otherwise, the missed pages will be
> >>allocated and the command will be rescheduled to the thread, which will
> >>read them. Then, after the response is sent, the pages will remain in
> >>the page cache for future accesses. For WRITEs the processing is the
> >>similar, the pages with the data will be put in the page cache.
> >
> > The Ardis iSCSI target code does the same thing.
>
> Perfectly. So, why don't do it on the mid-layer level where all targets
> can benefit from it?
Because I think that it is not of much performance benefit.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: stgt a new version of iscsi target?
2005-12-10 15:32 ` Vladislav Bolkhovitin
@ 2005-12-10 18:07 ` Mike Christie
2005-12-14 15:06 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-10 18:07 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
>
>> On Fri, 2005-12-09 at 18:29 +0300, Vladislav Bolkhovitin wrote:
>>
>>>> Additionally, it's perfectly possible for all of this to be done zero
>>>> copy on the data. A user space target mmaps the data on its storage
>>>> device and then does a SG_IO type scatter gather user virtual region
>>>> pass to the underlying target infrastructure. We already have this
>>>> demonstrated in the SG_IO path, someone just needs to come up with the
>>>> correct implementation for a target path.
>>>
>>>
>>> I'm not completely understand how it will work. Consider, there are
>>> READ/WRITE commands with random data sizes arrive from an initiator.
>>> Are you going to do map/unmap for each command individually or alloc
>>> data buffers for commands from a premapped area and live with
>>> possible its fragmentation? If map/unmap individually, then I should
>>> say that those are very expensive operations.
>>
>>
>> You do it the same way an array does: the model for SPI is read command
>> phase, disconnect, process command (i.e. set up areas) reconnect for
>> data transfer.
>>
>> map/unmap are really only necessary if you're emulating the data store,
>> but it's a fairly cheap operation on linux: it just causes the creation
>> of a vm_area. If it's a pass through, you can use SG_IO to pull it in
>> and the SG_IO like output to shoot it out again, effectively using a
>> piece of user memory as a zero copy buffer.
>>
>> Fragmentation isn't an issue because the I/O goes via sg lists , all
>> that's needed is a bunch of pages.
>
>
> OK, I see what you meant, thanks.
>
>> I do have to say that I consider operation in interrupt context (or even
>> kernel context) to be a disadvantage. Compared with the response times
>> that most arrays have to SCSI commands, the kernel context switch time
>> isn't that significant.
>
>
> Are you sure that there are no now or will be available in the nearest
> feature such (eg iSCSI) SCSI arrays with response time/latency so small
> that having 5 (five) context switches or more per command, some of which
> include map/unmap operations, will not increase the latency too much? I
> mean, eg NFS server, which originally was user space daemon and many
> people didn't want it in the kernel. Eventually, it's in. I don't see
> any fundamental difference between NFS server and SCSI target server,
Isn't the reason an NFS server is still in the kernel because of some
locking difficulties?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-10 15:32 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
2005-12-10 15:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " FUJITA Tomonori
@ 2005-12-10 18:09 ` Mike Christie
2005-12-14 15:09 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
1 sibling, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-10 18:09 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: FUJITA Tomonori, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
Vladislav Bolkhovitin wrote:
> FUJITA Tomonori wrote:
>
>> From: Vladislav Bolkhovitin <vst@vlnb.net>
>> Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new
>> version of iscsi target?
>> Date: Fri, 09 Dec 2005 18:28:58 +0300
>>
>>
>>>> Also is the page cache comment in reference to us using the page
>>>> cache for our reads and writes or I am not sure why you wrote that
>>>> if you do not do it right now.
>>>
>>>
>>> Hm, at first, the page cache already used somehow in fileio dev
>>> handler (though, with additional mem copy). At the second, fully
>>> utilize the page cache is one of two major improvements that are
>>> pending in scst, because it is required changing the kernel, which
>>> until some moment try to avoid. Although I prepared what is necessary
>>> for that.
>>>
>>> The idea basically is the following. When READ operation arrives,
>>> pages for all requested blocks are at first searched in the page
>>> cache (probably, in SIRQ context, because it isn't expensive
>>> operation) and if all pages are found, they are referenced and the
>>> result will be sent to the initiator. Then the pages will be
>>> dereferenced (so, no pages allocation will be done at all).
>>> Otherwise, the missed pages will be allocated and the command will be
>>> rescheduled to the thread, which will read them. Then, after the
>>> response is sent, the pages will remain in the page cache for future
>>> accesses. For WRITEs the processing is the similar, the pages with
>>> the data will be put in the page cache.
>>
>>
>> The Ardis iSCSI target code does the same thing.
>
>
> Perfectly. So, why don't do it on the mid-layer level where all targets
> can benefit from it?
>
Any target can hook into stgt too? What is your point, since neither
of us is in mainline or even close, given the scsi guys' viewpoint on
where to do reads and writes?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-10 15:31 ` Vladislav Bolkhovitin
@ 2005-12-10 18:19 ` Mike Christie
0 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2005-12-10 18:19 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: boutcher, James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> Mike Christie wrote:
>
>>>> There is still memory and scatterlist allocations. If we are not
>>>> going to allocate all the memory for a command buffer and request
>>>> with GFP_ATOMIC (and can then run from the the HW interrupt or soft
>>>> irq) we have to pass that on to a thread. I guess there is
>>>> disagreement whether that part is a feature or bad use of GFP_ATOMIC
>>>> though so... But I just mean to say there could be a little more to do.
>>>
>>>
>>>
>>>
>>> Actually, there is the way to allocate sg vectors with buffers in
>>> SIRQ and not with GFP_ATOMIC. This is the second major improvement,
>>> which is pending in scst. I called it sgv_pool. This is a new
>>> allocator in the kernel similar to mem_pool, but it contains
>>> *complete* sg-vectors of some size with data buffers (pages).
>>> Initiator sends data requests usually with some fixed size, like
>>> 128K. After a data command completed, its sg vector will not be
>>> immediately freed, but will be kept in
>>
>>
>>
>> We considered this, but what did you decide is the upper limit size
>> for the pool? Is it dynmaic? We also wanted something that the SCSI
>> ULDs could use for their allocations which could go up to 6 MB.
>
>
> Why do you think it needs any upper limit size? Would you like also any
> upper limits on sizes of the page or slab caches? I don't see any
Considering I have not seen the code and am probably misunderstanding
what you wrote above, I will shut up and wait until it is out to see it.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-10 15:30 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Vladislav Bolkhovitin
@ 2005-12-10 18:22 ` Mike Christie
0 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2005-12-10 18:22 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: johan, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> Mike Christie wrote:
>
>>> But there are other cleanups like moving some of the state to per
>>> target, cleaningup the scattlist allocation code and moving it to
>>> scsi-ml so the SCSI ULDs can use them and convert them. There is also
>>> thing like converting to the right APIs for 2.6 (rm kernel_thread, rm
>>> scsi_request, rm proc, fixup class interface refcouting problems,
>>> fixup scsi_device lack of refcounting usage, etc).
>>>
>>
>> Oh yeah I think the other major issue at least I had with scst was
>> that it was scsi specific and we wanted try and seperate things so if
>> drivers like IET and vscsi are allowed then we could also do other
>> drivers like a ATA over ethernet target driver or allow any other
>> target driver that wanted to to hook in. I think you noted that we
>> were spererating some protocol specific things as a distadvantage or
>> mentioned it for some reason but I am not completely sure why and we
>> may not agree on that issue too.
>
>
> SCSI has a lot of very specific stuff like UA handling and (at least)
> some parts of task management, especially if we consider honoring NACA,
> QErr, TST, UA_INTLCK_CTRL bits, therefore I'm not sure that to have
> common target parts for other protocols worths complicating the
> mid-layer with code and interfaces that will separate SCSI-specifics
> from non-SCSI protocols. So, good luck with it :-)
>
I am talking about the memory allocation bits for starters. For
example, the way you allocate the scatterlists and memory is useful
to other parts of the kernel, not just scst. The userspace interface,
if we added the netlink parts to scst, could be useful too.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: stgt a new version of iscsi target?
2005-12-10 18:07 ` Mike Christie
@ 2005-12-14 15:06 ` Vladislav Bolkhovitin
2005-12-14 19:55 ` Mike Christie
0 siblings, 1 reply; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-14 15:06 UTC (permalink / raw)
To: Mike Christie
Cc: James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
>>
>> Are you sure that there are no now or will be available in the nearest
>> feature such (eg iSCSI) SCSI arrays with response time/latency so
>> small that having 5 (five) context switches or more per command, some
>> of which include map/unmap operations, will not increase the latency
>> too much? I mean, eg NFS server, which originally was user space
>> daemon and many people didn't want it in the kernel. Eventually, it's
>> in. I don't see any fundamental difference between NFS server and SCSI
>> target server,
>
>
> Isn't the reason a NFS server is still in the kernel is becuase some of
> the locking difficulties?
Might be. But from what I remember, the major reason was performance.
After googling a bit I found many acknowledgments of that.
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target?
2005-12-10 18:09 ` Mike Christie
@ 2005-12-14 15:09 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-14 15:09 UTC (permalink / raw)
To: Mike Christie
Cc: FUJITA Tomonori, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
Mike Christie wrote:
> Vladislav Bolkhovitin wrote:
>
>> FUJITA Tomonori wrote:
>>
>>> From: Vladislav Bolkhovitin <vst@vlnb.net>
>>> Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new
>>> version of iscsi target?
>>> Date: Fri, 09 Dec 2005 18:28:58 +0300
>>>
>>>
>>>>> Also is the page cache comment in reference to us using the page
>>>>> cache for our reads and writes or I am not sure why you wrote that
>>>>> if you do not do it right now.
>>>>
>>>>
>>>>
>>>> Hm, at first, the page cache already used somehow in fileio dev
>>>> handler (though, with additional mem copy). At the second, fully
>>>> utilize the page cache is one of two major improvements that are
>>>> pending in scst, because it is required changing the kernel, which
>>>> until some moment try to avoid. Although I prepared what is
>>>> necessary for that.
>>>>
>>>> The idea basically is the following. When READ operation arrives,
>>>> pages for all requested blocks are at first searched in the page
>>>> cache (probably, in SIRQ context, because it isn't expensive
>>>> operation) and if all pages are found, they are referenced and the
>>>> result will be sent to the initiator. Then the pages will be
>>>> dereferenced (so, no pages allocation will be done at all).
>>>> Otherwise, the missed pages will be allocated and the command will
>>>> be rescheduled to the thread, which will read them. Then, after the
>>>> response is sent, the pages will remain in the page cache for future
>>>> accesses. For WRITEs the processing is the similar, the pages with
>>>> the data will be put in the page cache.
>>>
>>>
>>>
>>> The Ardis iSCSI target code does the same thing.
>>
>>
>>
>> Perfectly. So, why don't do it on the mid-layer level where all
>> targets can benefit from it?
>>
>
> Any target can hook into stgt too? What is your point since neither of
> us are in mainline or even close given the scsi guy's veiwpoint on where
> to do reads and writes?
My point is to show one of the benefits of a kernel-side
implementation. Any target driver (for scst or stgt, doesn't matter)
should benefit from it. Obviously, there is no point in
overcomplicating any target driver with the functions of the mid-level.
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Scst-devel] Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-10 15:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " FUJITA Tomonori
@ 2005-12-14 15:17 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-14 15:17 UTC (permalink / raw)
To: FUJITA Tomonori
Cc: michaelc, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
FUJITA Tomonori wrote:
> From: Vladislav Bolkhovitin <vst@vlnb.net>
> Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
> Date: Sat, 10 Dec 2005 18:32:46 +0300
>
>
>>>>>Also is the page cache comment in reference to us using the page cache
>>>>>for our reads and writes or I am not sure why you wrote that if you do
>>>>>not do it right now.
>>>>
>>>>Hm, at first, the page cache already used somehow in fileio dev handler
>>>>(though, with additional mem copy). At the second, fully utilize the
>>>>page cache is one of two major improvements that are pending in scst,
>>>>because it is required changing the kernel, which until some moment try
>>>>to avoid. Although I prepared what is necessary for that.
>>>>
>>>>The idea basically is the following. When READ operation arrives, pages
>>>>for all requested blocks are at first searched in the page cache
>>>>(probably, in SIRQ context, because it isn't expensive operation) and if
>>>>all pages are found, they are referenced and the result will be sent to
>>>>the initiator. Then the pages will be dereferenced (so, no pages
>>>>allocation will be done at all). Otherwise, the missed pages will be
>>>>allocated and the command will be rescheduled to the thread, which will
>>>>read them. Then, after the response is sent, the pages will remain in
>>>>the page cache for future accesses. For WRITEs the processing is the
>>>>similar, the pages with the data will be put in the page cache.
>>>
>>>The Ardis iSCSI target code does the same thing.
>>
>>Perfectly. So, why don't do it on the mid-layer level where all targets
>>can benefit from it?
>
>
> Because I think that it not of much performance benefit.
Are you sure? Why do you think using the cache together with its nice
features like read-ahead doesn't bring major performance benefits?
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: stgt a new version of iscsi target?
2005-12-14 15:06 ` Vladislav Bolkhovitin
@ 2005-12-14 19:55 ` Mike Christie
2005-12-15 18:53 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-14 19:55 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: James Bottomley, johan, iscsitarget-devel, mingz, stgt,
Robert Whitehead, scst-devel, linux-scsi, Christoph Hellwig
Vladislav Bolkhovitin wrote:
> Mike Christie wrote:
>
>>>
>>> Are you sure that there are no now or will be available in the
>>> nearest feature such (eg iSCSI) SCSI arrays with response
>>> time/latency so small that having 5 (five) context switches or more
>>> per command, some of which include map/unmap operations, will not
>>> increase the latency too much? I mean, eg NFS server, which
>>> originally was user space daemon and many people didn't want it in
>>> the kernel. Eventually, it's in. I don't see any fundamental
>>> difference between NFS server and SCSI target server,
>>
>>
>>
>> Isn't the reason a NFS server is still in the kernel is becuase some
>> of the locking difficulties?
>
>
> Might be. But from what I remember, the major reason was the
> performance. After googling a bit I found many acknowledgments of that.
>
I do not think we are going to get anywhere with this type of thread :(
We should try to compare at least one of the userspace *nbd
implementations with the unh target in scst. I see some that just do
some basic socket ops (no sendfile-type hook-in even) for the network
part and then just async or normal read/writes. I do not want to
compare FC to nbd, but maybe comparing software iscsi to userspace
nbd is a little more fair. I think ATA over Ethernet has a userspace
target too. Are the unh target defaults OK for performance testing,
or could you send some off-list, so we can at least test those?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: stgt a new version of iscsi target?
2005-12-14 19:55 ` Mike Christie
@ 2005-12-15 18:53 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-15 18:53 UTC (permalink / raw)
To: Mike Christie
Cc: James Bottomley, iscsitarget-devel, mingz, stgt, Robert Whitehead,
scst-devel, linux-scsi, Christoph Hellwig
Mike Christie wrote:
> Vladislav Bolkhovitin wrote:
>
>> Mike Christie wrote:
>>
>>>>
>>>> Are you sure that there are no now or will be available in the
>>>> nearest feature such (eg iSCSI) SCSI arrays with response
>>>> time/latency so small that having 5 (five) context switches or more
>>>> per command, some of which include map/unmap operations, will not
>>>> increase the latency too much? I mean, eg NFS server, which
>>>> originally was user space daemon and many people didn't want it in
>>>> the kernel. Eventually, it's in. I don't see any fundamental
>>>> difference between NFS server and SCSI target server,
>>>
>>>
>>>
>>>
>>> Isn't the reason a NFS server is still in the kernel is becuase some
>>> of the locking difficulties?
>>
>>
>>
>> Might be. But from what I remember, the major reason was the
>> performance. After googling a bit I found many acknowledgments of that.
>>
>
> I do not think we are going to get anywhere with this type of thread :(
>
> We should try to compare at least one of the userspace *nbd
> implementations with the unh target in scst. I see some that just do
> some basic socket ops (no sendfile type hook in even) for the network
> part then just async or normal read/writes. I do not want to comapre FC
> to nbd, but maybe comparing software iscsi to userspace nbd is a little
> more fair. I think ata over ethernet has a userspace target too. Is the
> unh target defaults set ok for performance testing, or could you send
> some off list, so we can at least test those.
Agreed that we need to have some numbers. But currently it is
impossible to measure them correctly without very considerable
effort. For instance, the comparison of nbd with iscsi includes in
the measurements not only the user space/kernel space differences,
but also many additional parts, like the different implementation
architectures. For a correct comparison we need some target driver
(for scst or stgt) whose commands would be processed in both user and
kernel space. Additionally, because we discuss not only user vs
kernel implementations, but also SIRQ vs thread implementations, the
target needs to be a hardware one.
Right now, without big effort, we can only compare SIRQ vs thread
implementations over FC, because the QLA target driver and scst
support both modes of SCSI command execution. See the
DEBUG_WORK_IN_THREAD symbol. We did some comparisons some time ago
and, if I recall correctly, on small blocks (especially 16K and
smaller) the performance drop was quite visible, because ~40000+
cs/sec are not very good for the system's health :). You can easily
repeat those experiments using scst, the qlogic driver and the
disk_perf or tape_perf dev handler.
But, since FC has quite big latencies, this comparison will not fully
suit our needs. We need some low-latency link. Probably one of the
hardware iSCSI cards, like the Qlogic 4100. But this is not in the
nearest future.
Vlad
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 19:48 ` James Bottomley
2005-12-08 20:09 ` Mike Christie
@ 2005-12-21 23:53 ` FUJITA Tomonori
2005-12-22 10:38 ` Vladislav Bolkhovitin
2005-12-26 23:53 ` Ang: " FUJITA Tomonori
2 siblings, 1 reply; 43+ messages in thread
From: FUJITA Tomonori @ 2005-12-21 23:53 UTC (permalink / raw)
To: James.Bottomley
Cc: michaelc, vst, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
From: James Bottomley <James.Bottomley@SteelEye.com>
Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
Date: Thu, 08 Dec 2005 14:48:10 -0500
> On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
> > cleanup. In the end some of the scsi people liked the idea of throwing
> > the non-read/write command to userspace and to do this we just decided
> > to start over but I have been cutting and pasting your code and cleaning
> > it up as I add more stuff.
>
> To be honest, I'd like to see all command processing at user level
> (including read/write ... for block devices, it shouldn't be that
> inefficient, since you're merely going to say remap an area from one
> device to another; as long as no data transformation ever occurs, the
> user never touches the data and it all remains in the kernel page
> cache).
The current version of tgt performs READ/WRITE commands in kernel
space using the vfs interface and other commands in user space. I
have implemented a prototype version of tgt with the mmap scheme.
With the mmap tgt, the kernel module asks a single user-space daemon
(tgtd) to map a file (logical unit) through netlink, then it calls
get_user_pages().
I did some initial performance tests with both tgt versions and IET
(as you know, another iSCSI software implementation that runs in
kernel space). All implementations run with a write-back policy, so
there should probably be little real disk I/O effect. I started the
disktest benchmark with a cold cache. The machine has 4 GB memory, a
1.5K SCSI disk, and 4 CPUs (x86_64).
(disktest -PT -T10 -h1 -K8 -B8192 -ID /dev/sdc -w)
o IET
| 2005/12/21-18:05:15 | STAT | 7259 | v1.2.8 | /dev/sdc | Total write throughput: 48195993.6B/s (45.96MB/s), IOPS 5883.3/s.
o tgt (I/O in kernel space)
| 2005/12/21-18:03:23 | STAT | 7013 | v1.2.8 | /dev/sdc | Total write throughput: 45829324.8B/s (43.71MB/s), IOPS 5594.4/s.
o mmap tgt
| 2005/12/21-18:22:28 | STAT | 7990 | v1.2.8 | /dev/sdc | Total write throughput: 25373900.8B/s (24.20MB/s), IOPS 3097.4/s.
I guess that one of the reasons for the mmap tgt's poor performance
is that it uses a single user-space daemon, so all commands are
serialized. I will implement a multi-threaded user-space daemon
version if you would like to see its performance.
One potential disadvantage of the mmap scheme is that it reads
unnecessarily from disk for WRITE commands. That is, when a WRITE
updates a whole page frame, the vfs interface can cleverly avoid the
read, but the mmap scheme cannot (if I understand correctly).
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-21 23:53 ` FUJITA Tomonori
@ 2005-12-22 10:38 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 43+ messages in thread
From: Vladislav Bolkhovitin @ 2005-12-22 10:38 UTC (permalink / raw)
To: FUJITA Tomonori
Cc: James.Bottomley, michaelc, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
FUJITA,
It would be helpful if you also gathered CPU utilization statistics
during the tests (user/kernel/idle/iowait time + context switches/sec
rate).
Vlad
FUJITA Tomonori wrote:
> I did some initial performance tests with both tgt versions and IET
> (as you know, another iSCSI software implementation runs in kernel
> space). All implementations run with write back policy so probably,
> there should be little real disk I/O effect. I started disktest
> benchmark software with cold cache state. The machine has 4 GB memory,
> 1.5K SCSI disk, 4 CPUs (x86_64).
>
> (disktest -PT -T10 -h1 -K8 -B8192 -ID /dev/sdc -w)
>
> o IET
> | 2005/12/21-18:05:15 | STAT | 7259 | v1.2.8 | /dev/sdc | Total write throughput: 48195993.6B/s (45.96MB/s), IOPS 5883.3/s.
>
> o tgt (I/O in kernel space)
> | 2005/12/21-18:03:23 | STAT | 7013 | v1.2.8 | /dev/sdc | Total write throughput: 45829324.8B/s (43.71MB/s), IOPS 5594.4/s.
>
> o mmap tgt
> | 2005/12/21-18:22:28 | STAT | 7990 | v1.2.8 | /dev/sdc | Total write throughput: 25373900.8B/s (24.20MB/s), IOPS 3097.4/s.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-08 19:48 ` James Bottomley
2005-12-08 20:09 ` Mike Christie
2005-12-21 23:53 ` FUJITA Tomonori
@ 2005-12-26 23:53 ` FUJITA Tomonori
2005-12-28 16:32 ` James Bottomley
2 siblings, 1 reply; 43+ messages in thread
From: FUJITA Tomonori @ 2005-12-26 23:53 UTC (permalink / raw)
To: James.Bottomley
Cc: michaelc, vst, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
From: James Bottomley <James.Bottomley@SteelEye.com>
Subject: Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
Date: Thu, 08 Dec 2005 14:48:10 -0500
> On Thu, 2005-12-08 at 13:10 -0600, Mike Christie wrote:
> > cleanup. In the end some of the scsi people liked the idea of throwing
> > the non-read/write command to userspace and to do this we just decided
> > to start over but I have been cutting and pasting your code and cleaning
> > it up as I add more stuff.
>
> To be honest, I'd like to see all command processing at user level
> (including read/write ... for block devices, it shouldn't be that
> inefficient, since you're merely going to say remap an area from one
> device to another; as long as no data transformation ever occurs, the
> user never touches the data and it all remains in the kernel page
> cache).
>
> My ideal for the kernel based infrastructure is a simple tap for
> transporting commands addressed to devices upwards (and the responses
> downwards). Then everyone can have their own user space processing
> implementation that I don't have to care about.
Mike and I have worked on the tgt mmap version.
o It does read/write commands like sg by using mmap in user space and
get_user_pages in kernel space.
o It does non-read/write commands like direct I/O by allocating
aligned buffers in user space and using get_user_pages in kernel space.
It works like the simple tap that you suggested. It does not allocate
buffers in kernel space at all and does zero copy on all sorts of
commands.
Here are some performance results with open-iscsi (which are better
than the previous results that I got with sfnet).
o IET
| 2005/12/27-07:50:59 | STAT | 6827 | v1.2.8 | /dev/sdc | Total write throughput: 53790310.4B/s (51.30MB/s), IOPS 6566.2/s.
o current tgt (I/O in kernel space)
| 2005/12/27-08:07:50 | STAT | 7294 | v1.2.8 | /dev/sdc | Total write throughput: 49666457.6B/s (47.37MB/s), IOPS 6062.8/s.
o tgt mmap
| 2005/12/27-08:42:51 | STAT | 5286 | v1.2.8 | /dev/sdc | Total write
throughput: 44701286.4B/s (42.63MB/s), IOPS 5456.7/s.
We can get something like this if we avoid calling mmap/munmap per
command (by using some sort of caching).
o tgt mmap (mmap caching)
| 2005/12/27-07:53:19 | STAT | 6996 | v1.2.8 | /dev/sdc | Total write throughput: 48253337.6B/s (46.02MB/s), IOPS 5890.3/s.
James, can we get your approval of this mmap design?
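And a hedged sketch of the kernel half of that tap, pinning the
daemon's buffer so the data can be transferred without a bounce copy.
The surrounding helper is an assumption; the get_user_pages()
signature shown is the 2.6-era one:

#include <linux/mm.h>
#include <linux/sched.h>

static int tgt_pin_user_pages(struct task_struct *tsk, unsigned long uaddr,
                              int nr_pages, int write, struct page **pages)
{
        int pinned;

        down_read(&tsk->mm->mmap_sem);
        pinned = get_user_pages(tsk, tsk->mm, uaddr & PAGE_MASK, nr_pages,
                                write, 0 /* force */, pages, NULL);
        up_read(&tsk->mm->mmap_sem);

        return pinned;   /* the transfer's sg list is built from 'pages' */
}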
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-26 23:53 ` Ang: " FUJITA Tomonori
@ 2005-12-28 16:32 ` James Bottomley
2005-12-31 3:27 ` Mike Christie
0 siblings, 1 reply; 43+ messages in thread
From: James Bottomley @ 2005-12-28 16:32 UTC (permalink / raw)
To: FUJITA Tomonori
Cc: michaelc, vst, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
On Tue, 2005-12-27 at 08:53 +0900, FUJITA Tomonori wrote:
> Mike and I have worked on the tgt mmap version.
>
> o It does read/write commands like sg by using mmap in user space and
> get_user_pages in kernel space.
>
> o It does non-read/write commands like direct I/O by allocating
> aligned buffers in user space and using get_user_pages in kernel space.
>
> It works like the simple tap that you suggested. It does not allocate
> buffers in kernel space at all and does zero copy on all sorts of
> commands.
>
> Here are some performance results with open-iscsi (which are better
> than the previous results that I got with sfnet).
>
> o IET
>
> | 2005/12/27-07:50:59 | STAT | 6827 | v1.2.8 | /dev/sdc | Total write throughput: 53790310.4B/s (51.30MB/s), IOPS 6566.2/s.
>
> o current tgt (I/O in kernel space)
>
> | 2005/12/27-08:07:50 | STAT | 7294 | v1.2.8 | /dev/sdc | Total write throughput: 49666457.6B/s (47.37MB/s), IOPS 6062.8/s.
>
> o tgt mmap
>
> | 2005/12/27-08:42:51 | STAT | 5286 | v1.2.8 | /dev/sdc | Total write
> throughput: 44701286.4B/s (42.63MB/s), IOPS 5456.7/s.
>
> We can get something like this if we avoid calling mmap/munmap per
> command (by using some sort of caching).
>
> o tgt mmap (mmap caching)
>
> | 2005/12/27-07:53:19 | STAT | 6996 | v1.2.8 | /dev/sdc | Total write throughput: 48253337.6B/s (46.02MB/s), IOPS 5890.3/s.
>
>
> James, can we get your approval of this mmap design?
Yes, that looks fine ... it runs in user space, which was really all I
was looking for.
There is another half to this, which is that I'd like the tap to come
via a SCSI API. This isn't strictly necessary for iSCSI but it would
allow us to integrate a generic target approach that could work for all
SCSI HBAs, not just iSCSI.
Thanks,
James
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-28 16:32 ` James Bottomley
@ 2005-12-31 3:27 ` Mike Christie
2005-12-31 15:33 ` James Bottomley
0 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2005-12-31 3:27 UTC (permalink / raw)
To: James Bottomley
Cc: FUJITA Tomonori, vst, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
James Bottomley wrote:
> On Tue, 2005-12-27 at 08:53 +0900, FUJITA Tomonori wrote:
>
>>Mike and I have worked on the tgt mmap version.
>>
>>o It does read/write commands like sg by using mmap in user space and
>>get_user_pages in kernel space.
>>
>>o It does non-read/write commands like direct I/O by allocating
>>aligned buffers in user space and using get_user_pages in kernel space.
>>
>>It works like the simple tap that you suggested. It does not allocate
>>buffers in kernel space at all and does zero copy on all sorts of
>>commands.
>>
>>Here are some performance results with open-iscsi (which are better
>>than the previous results that I got with sfnet).
>>
>>o IET
>>
>>| 2005/12/27-07:50:59 | STAT | 6827 | v1.2.8 | /dev/sdc | Total write throughput: 53790310.4B/s (51.30MB/s), IOPS 6566.2/s.
>>
>>o current tgt (I/O in kernel space)
>>
>>| 2005/12/27-08:07:50 | STAT | 7294 | v1.2.8 | /dev/sdc | Total write throughput: 49666457.6B/s (47.37MB/s), IOPS 6062.8/s.
>>
>>o tgt mmap
>>
>>| 2005/12/27-08:42:51 | STAT | 5286 | v1.2.8 | /dev/sdc | Total write
>>throughput: 44701286.4B/s (42.63MB/s), IOPS 5456.7/s.
>>
>>We can get something like this if we avoid calling mmap/munmap per
>>command (by using some sort of caching).
>>
>>o tgt mmap (mmap caching)
>>
>>| 2005/12/27-07:53:19 | STAT | 6996 | v1.2.8 | /dev/sdc | Total write throughput: 48253337.6B/s (46.02MB/s), IOPS 5890.3/s.
>>
>>
>>James, can we get your approval of this mmap design?
>
>
> Yes, that looks fine ... it runs in user space, which was really all I
> was looking for.
>
> There is another half to this, which is that I'd like the tap to come
> via a SCSI API. This isn't strictly necessary for iSCSI but it would
> allow us to integrate a generic target approach that could work for all
> SCSI HBAs, not just iSCSI.
>
The code we currently have is designed to work with software iscsi
targets or software AoE and HW cards like qlogic or emulex's FC cards.
There are a lot of places we could use scsi-ml or block layer structs
like the request or scsi_cmnd.
To support HW like qlogic or emulex's FC target mode, are you thinking
you might want us to add on to the scsi-ml's scsi_host_template or add a
scsi_target_template? If we add on to the scsi_host_template, and that
one PCI device is in initiator and target mode at the same time, would
we have one scsi_host for that resource and just add our target-related
fields to the scsi_host? Is this what you mean?
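To make the question concrete, something along these lines is what we had in mind. This is purely hypothetical: no scsi_target_template exists in scsi-ml, and the callbacks below are invented for this sketch.

struct Scsi_Host;    /* forward declarations, the real types live in scsi-ml */
struct scsi_cmnd;

struct scsi_target_template {
	const char *name;

	/* LLD hands an incoming command (with its nexus) up to the core,
	 * which queues it towards the user-space processing. */
	int (*queue_incoming_cmd)(struct Scsi_Host *shost,
				  struct scsi_cmnd *cmd);

	/* Core asks the LLD to move data and/or status back onto the wire
	 * once user space has finished with the command. */
	int (*transfer_response)(struct scsi_cmnd *cmd,
				 void (*done)(struct scsi_cmnd *));

	/* Task-management requests (ABORT TASK, LUN RESET, ...). */
	int (*tsk_mgmt)(struct Scsi_Host *shost, int func,
			unsigned long long tag);
};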
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] stgt a new version of iscsi target?
2005-12-31 3:27 ` Mike Christie
@ 2005-12-31 15:33 ` James Bottomley
0 siblings, 0 replies; 43+ messages in thread
From: James Bottomley @ 2005-12-31 15:33 UTC (permalink / raw)
To: Mike Christie
Cc: FUJITA Tomonori, vst, johan, iscsitarget-devel, mingz, stgt-devel,
WRWHITEHEAD, scst-devel, linux-scsi, hch
On Fri, 2005-12-30 at 21:27 -0600, Mike Christie wrote:
> > Yes, that looks fine ... it runs in user space, which was really all I
> > was looking for.
> >
> > There is another half to this, which is that I'd like the tap to come
> > via a SCSI API. This isn't strictly necessary for iSCSI but it would
> > allow us to integrate a generic target approach that could work for all
> > SCSI HBAs, not just iSCSI.
> >
>
> The code we currently have is designed to work with software iscsi
> targets or software AoE and HW cards like qlogic or emulex's FC cards.
> There are a lot of places we could use scsi-ml or block layer structs
> like the request or scsi_cmnd.
>
> To support HW like qlogic or emulex's FC target mode, are you thinking
> you might want us to add on to the scsi-ml's scsi_host_template or add a
> scsi_target_template? If we add on to the scsi_host_template, and that
> one PCI device is in initiator and target mode at the same time, would
> we have one scsi_host for that resource and just add our target-related
> fields to the scsi_host? Is this what you mean?
I'm thinking one device would do both initiator and target (although not
at the same time, probably via some sort of internal role-change
mechanism; that would be up to the driver writer, and it could certainly
be set up to be initiator or target only). We probably need one or two
additional callbacks for sending incoming commands upwards and a control
channel for specifying what we do next (since for write commands, we need
the command first, then userspace processing and setup, then the body into
the allocated buffer). The idea is that at the end of the project we have
a well-defined target infrastructure for any SCSI device (with an iSCSI
reference implementation).
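For illustration only, the sort of thing that control channel could carry for the write ordering above; every name here is invented for the sketch, not an existing interface.

#include <stdint.h>

enum tgt_ctrl_op {
	TGT_CMD_ARRIVED,  /* kernel -> user: new command (tag + CDB)          */
	TGT_BUF_READY,    /* user -> kernel: buffer allocated, start data-out */
	TGT_DATA_LANDED,  /* kernel -> user: write payload is in the buffer   */
	TGT_CMD_DONE,     /* user -> kernel: status ready, send the response  */
};

struct tgt_ctrl_msg {
	uint32_t op;        /* one of enum tgt_ctrl_op                */
	uint64_t tag;       /* ties the four steps to one command     */
	uint64_t uaddr;     /* user buffer address (TGT_BUF_READY)    */
	uint32_t len;       /* buffer/transfer length in bytes        */
	uint8_t  status;    /* SCSI status (TGT_CMD_DONE)             */
	uint8_t  cdb[16];   /* CDB (TGT_CMD_ARRIVED)                  */
};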
James
^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread
Thread overview: 43+ messages
[not found] <OF6932015B.01CF53D9-ONC12570D0.00462028@capvert.ins>
[not found] ` <43972C2D.9060500@cs.wisc.edu>
2005-12-08 18:46 ` Ang: Re: [Stgt-devel] Re: stgt a new version of iscsi target? Vladislav Bolkhovitin
2005-12-08 18:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Mike Christie
2005-12-09 15:30 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
2005-12-09 22:31 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Mike Christie
2005-12-08 19:10 ` Mike Christie
2005-12-08 19:48 ` James Bottomley
2005-12-08 20:09 ` Mike Christie
2005-12-08 21:35 ` Dave C Boutcher
2005-12-08 21:56 ` Mike Christie
2005-12-09 15:29 ` Vladislav Bolkhovitin
2005-12-09 22:31 ` Mike Christie
2005-12-10 15:31 ` Vladislav Bolkhovitin
2005-12-10 18:19 ` Mike Christie
2005-12-10 8:46 ` FUJITA Tomonori
2005-12-09 15:30 ` Vladislav Bolkhovitin
2005-12-09 15:29 ` Vladislav Bolkhovitin
2005-12-21 23:53 ` FUJITA Tomonori
2005-12-22 10:38 ` Vladislav Bolkhovitin
2005-12-26 23:53 ` Ang: " FUJITA Tomonori
2005-12-28 16:32 ` James Bottomley
2005-12-31 3:27 ` Mike Christie
2005-12-31 15:33 ` James Bottomley
2005-12-09 15:28 ` Vladislav Bolkhovitin
2005-12-09 22:23 ` Mike Christie
2005-12-10 1:15 ` Ang: Re: [Stgt-devel] " Mike Christie
2005-12-10 15:30 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Vladislav Bolkhovitin
2005-12-10 18:22 ` Mike Christie
2005-12-10 8:46 ` FUJITA Tomonori
2005-12-10 15:32 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
2005-12-10 15:54 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " FUJITA Tomonori
2005-12-14 15:17 ` [Scst-devel] " Vladislav Bolkhovitin
2005-12-10 18:09 ` Mike Christie
2005-12-14 15:09 ` Ang: Re: [Stgt-devel] " Vladislav Bolkhovitin
2005-12-08 19:47 ` Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " James Bottomley
2005-12-09 3:57 ` Mike Christie
2005-12-09 15:00 ` Ang: Re: [Stgt-devel] " Ming Zhang
2005-12-09 15:29 ` [Scst-devel] Re: Ang: Re: [Stgt-devel] Re: [Iscsitarget-devel] " Vladislav Bolkhovitin
2005-12-09 15:48 ` James Bottomley
2005-12-10 15:32 ` Vladislav Bolkhovitin
2005-12-10 18:07 ` Mike Christie
2005-12-14 15:06 ` Vladislav Bolkhovitin
2005-12-14 19:55 ` Mike Christie
2005-12-15 18:53 ` Vladislav Bolkhovitin