* Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic
  [not found] ` <13830B75AD5A2F42848F92269B11996F0107633CE4@orsmsx509.amr.corp.intel.com>
@ 2010-11-12  0:58 ` Nicholas A. Bellinger
  [not found]   ` <1289523519.2867.181.camel-Y1+j5t8j3WgjMeEPmliV8E/sVC8ogwMJ@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Nicholas A. Bellinger @ 2010-11-12 0:58 UTC (permalink / raw)
To: Patil, Kiran
Cc: Joe Eykholt, Jansen, Frank, devel@open-fcoe.org, linux-scsi, Christoph Hellwig

On Thu, 2010-11-11 at 14:57 -0800, Patil, Kiran wrote:
> Yes, transport_generic_handle_data, which is called from ft_recv_write_data, can do msleep_interruptible only if the transport is active.
>
> FYI, this msleep was not introduced by my patch; it has been there.
>
> Agree with both of Joe's suggestions (fcoe_rcv - always let it go to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.
>

Hey guys,

So the split for interrupt context setup of individual se_cmd descriptors for
TCM_Loop (and other WIP HW FC target mode drivers) is to use the optional
target_core_fabric_ops->new_cmd_map() for the pieces of se_cmd setup logic
that currently cannot be done in interrupt context.  For TCM_Loop this is
currently:

*) transport_generic_allocate_tasks() (access of LUN, PR and ALUA specific
   locks, currently using spin_lock() + spin_unlock())
*) transport_generic_map_mem_to_cmd(), using GFP_KERNEL allocations

However, for this specific transport_generic_handle_data() case:

	/*
	 * Make sure that the transport has been disabled by
	 * transport_write_pending() before readding this struct se_cmd to the
	 * processing queue.  If it has not yet been reset to zero by the
	 * processing thread in transport_add_cmd_to_queue(), let other
	 * processes run.  If a signal was received, then we assume the
	 * connection is being failed/shutdown, so we return a failure.
	 */
	while (atomic_read(&T_TASK(cmd)->t_transport_active)) {
		msleep_interruptible(10);
		if (signal_pending(current))
			return -1;
	}

is specific to the existing drivers/target/lio-target iSCSI code, which needs
this for the traditional kernel sockets recv side iSCSI WRITE case.

Since we already have FCP write data ready for submission to backend devices
at this point, I think we want something in the transport_generic_new_cmd() ->
transport_generic_write_pending() code that does the immediate SCSI write
submission and skips the TFO->write_pending() callback / extra fabric API
exchange/response..

Here is how TCM_Loop is currently doing that, with SCSI WRITE data mapped from
incoming ->queuecommand() cmd->table.sgl memory:

	int tcm_loop_write_pending(struct se_cmd *se_cmd)
	{
		/*
		 * Since Linux/SCSI has already sent down a struct scsi_cmnd with
		 * sc->sc_data_direction of DMA_TO_DEVICE and struct scatterlist
		 * array memory, and that memory has already been mapped to struct
		 * se_cmd->t_mem_list format with transport_generic_map_mem_to_cmd(),
		 * we now tell TCM to add this WRITE CDB directly into the TCM
		 * storage object execution queue.
		 */
		transport_generic_process_write(se_cmd);
		return 0;
	}

This will skip the transport_check_aborted_status() in
transport_generic_handle_data(), and immediately add the
T_TASK(cmd)->t_task_list for se_task execution down to
se_subsystem_api->do_task() and out to backend subsystem code.

So, just to reiterate the point with current v4.0 code: we currently cannot
safely call transport_generic_allocate_tasks() or
transport_generic_map_mem_to_cmd() from interrupt context, so you want to do
these calls using the TFO->new_cmd_map() callback in backend kernel thread
process context..

So I think this means you want to call transport_generic_process_write() to
immediately queue the WRITE from TFO->write_pending(), but I'm not very
certain after looking at ft_write_pending().

Joe, any thoughts here..?

Best,

--nab

> Thanks,
> -- Kiran P.
> > -----Original Message----- > From: devel-bounces@open-fcoe.org [mailto:devel-bounces@open-fcoe.org] On Behalf Of Joe Eykholt > Sent: Thursday, November 11, 2010 11:52 AM > To: Jansen, Frank > Cc: devel@open-fcoe.org > Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic > > > > On 11/11/10 11:41 AM, Jansen, Frank wrote: > > Greetings! > > > > I'm running 2.6.36 with Kiran Patil's patches from 10/28/10. > > > > I have 4 logical volumes configured over fcoe: > > > > [root@dut ~]# tcm_node --listhbas > > \------> iblock_0 > > HBA Index: 1 plugin: iblock version: v4.0.0-rc5 > > \-------> r0_lun3 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-4 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3 > > Major: 253 Minor: 4 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l3 > > \-------> r0_lun2 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-3 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2 > > Major: 253 Minor: 3 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l2 > > \-------> r0_lun1 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-2 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1 > > Major: 253 Minor: 2 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l1 > > \-------> r0_lun0 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-1 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0 > > Major: 253 Minor: 1 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l0 > > > > When any significant I/O load is put on any of the devices, I receive > > a flood of the following messages: > > > >> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic: > >> LIO_iblock/4439/0x00000101 > >> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe > >> target_core_stgt target_core_pscsi target_core_file 
target_core_iblock > >> ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc > >> scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6 > >> dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma > >> iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg > >> igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif > >> pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class > >> dm_mod [last unloaded: speedstep_lib] > >> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted > >> 2.6.36+ #1 > >> Nov 11 13:46:09 dut kernel: Call Trace: > >> Nov 11 13:46:09 dut kernel: <IRQ> [<ffffffff8104fb96>] > >> __schedule_bug+0x66/0x70 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>] > >> schedule_timeout+0x173/0x2e0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ? > >> process_timeout+0x0/0x10 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>] > >> schedule_timeout_interruptible+0x1e/0x20 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>] > >> msleep_interruptible+0x39/0x50 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>] > >> transport_generic_handle_data+0x2a/0x80 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>] > >> ft_recv_write_data+0x1fe/0x2b0 [tcm_fc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0 > >> [tcm_fc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>] > >> fc_exch_recv+0x61f/0xe20 [libfc] > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ? > >> skb_copy_bits+0x63/0x2c0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ? > >> __pskb_pull_tail+0x26a/0x360 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>] > >> fcoe_recv_frame+0x18d/0x340 [fcoe] > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ? > >> __pskb_pull_tail+0x5f/0x360 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? 
> >> __netdev_alloc_skb+0x24/0x50 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c > >> [fcoe] > >> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ? > >> __kmalloc_node_track_caller+0x67/0xe0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? > >> __netdev_alloc_skb+0x24/0x50 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>] > >> __netif_receive_skb+0x41a/0x5d0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>] > >> netif_receive_skb+0x58/0x80 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>] > >> napi_skb_finish+0x50/0x70 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>] > >> napi_gro_receive+0xc5/0xd0 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>] > >> ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>] > >> ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe] > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>] > >> net_rx_action+0x102/0x250 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>] > >> __do_softirq+0xb2/0x240 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30 > >> Nov 11 13:46:09 dut kernel: <EOI> [<ffffffff8100db25>] ? > >> do_softirq+0x65/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>] > >> local_bh_enable+0x94/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>] > >> dev_queue_xmit+0x143/0x3b0 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520 > >> [fcoe] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ? 
> >> _fc_frame_alloc+0x33/0x90 [libfc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140 > >> [libfc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>] > >> ft_write_pending+0x112/0x160 [tcm_fc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>] > >> transport_generic_new_cmd+0x280/0x2b0 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>] > >> transport_processing_thread+0x1a4/0x7c0 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ? > >> autoremove_wake_function+0x0/0x40 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ? > >> transport_processing_thread+0x0/0x7c0 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>] > >> kernel_thread_helper+0x4/0x10 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ? > >> kernel_thread_helper+0x0/0x10 > > > > I started noticing these issues first when I ran I/O with larger > > filesizes (appr. 25GB), but I'm thinking that might be a red herring. > > I'll rebuild the kernel and tools to make sure nothing is out of sorts > > and will report on any additional findings. > > > > Thanks, > > > > Frank > > FCP data frames are coming in at the interrupt level, and TCM expects > to be called in a thread or non-interrupt context, since > transport_generic_handle_data() may sleep. > > A quick workaround would be to change the fast path in fcoe_rcv() so that > data always goes through the per-cpu receive threads. That avoids part of the > problem, but isn't anything like the right fix. It doesn't seem good to > let TCM block FCoE's per-cpu receive thread either. > > Here's a quick change if you want to just work around the problem. 
> I haven't tested it:
>
> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> index feddb53..8f854cd 100644
> --- a/drivers/scsi/fcoe/fcoe.c
> +++ b/drivers/scsi/fcoe/fcoe.c
> @@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
>  	 * BLOCK softirq context.
>  	 */
>  	if (fh->fh_type == FC_TYPE_FCP &&
> +	    0 &&
>  	    cpu == smp_processor_id() &&
>  	    skb_queue_empty(&fps->fcoe_rx_list)) {
>  		spin_unlock_bh(&fps->fcoe_rx_list.lock);
>
> ---
>
> Cheers,
> Joe
>
> _______________________________________________
> devel mailing list
> devel@open-fcoe.org
> http://www.open-fcoe.org/mailman/listinfo/devel
* Re: transport_generic_handle_data - BUG: scheduling while atomic
  [not found] ` <1289523519.2867.181.camel-Y1+j5t8j3WgjMeEPmliV8E/sVC8ogwMJ@public.gmane.org>
@ 2010-11-12  1:18   ` Joe Eykholt
  2010-11-12  1:29     ` [Open-FCoE] " Nicholas A. Bellinger
  0 siblings, 1 reply; 3+ messages in thread
From: Joe Eykholt @ 2010-11-12 1:18 UTC (permalink / raw)
To: linux-iscsi-target-dev-/JYPxA39Uh5TLH3MbocFFw
Cc: linux-scsi, Jansen, Frank, Christoph Hellwig, devel-s9riP+hp16TNLxjTenLetw@public.gmane.org

On 11/11/10 4:58 PM, Nicholas A. Bellinger wrote:
> On Thu, 2010-11-11 at 14:57 -0800, Patil, Kiran wrote:
>> Yes, transport_generic_handle_data, which is called from ft_recv_write_data, can do msleep_interruptible only if the transport is active.
>>
>> FYI, this msleep was not introduced by my patch; it has been there.
>>
>> Agree with both of Joe's suggestions (fcoe_rcv - always let it go to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.
>>
>
> Hey guys,
>
> So the split for interrupt context setup of individual se_cmd
> descriptors for TCM_Loop (and other WIP HW FC target mode drivers) is to
> use the optional target_core_fabric_ops->new_cmd_map() for the pieces of
> se_cmd setup logic that currently cannot be done in interrupt context.
> For TCM_Loop this is currently:
>
> *) transport_generic_allocate_tasks() (access of LUN, PR and ALUA
>    specific locks, currently using spin_lock() + spin_unlock())
> *) transport_generic_map_mem_to_cmd(), using GFP_KERNEL allocations
>
> However, for this specific transport_generic_handle_data() case:
>
> 	/*
> 	 * Make sure that the transport has been disabled by
> 	 * transport_write_pending() before readding this struct se_cmd to the
> 	 * processing queue.  If it has not yet been reset to zero by the
> 	 * processing thread in transport_add_cmd_to_queue(), let other
> 	 * processes run.  If a signal was received, then we assume the
> 	 * connection is being failed/shutdown, so we return a failure.
> 	 */
> 	while (atomic_read(&T_TASK(cmd)->t_transport_active)) {
> 		msleep_interruptible(10);
> 		if (signal_pending(current))
> 			return -1;
> 	}
>
> is specific to the existing drivers/target/lio-target iSCSI code, which needs
> this for the traditional kernel sockets recv side iSCSI WRITE case.
>
> Since we already have FCP write data ready for submission to

(We have some, but usually not all, of the data.)

> backend devices at this point, I think we want something in the
> transport_generic_new_cmd() -> transport_generic_write_pending() code
> that does the immediate SCSI write submission and skips the
> TFO->write_pending() callback / extra fabric API exchange/response..

If I understand, the write_pending() callback is when we send the
transfer-ready to the initiator, and we don't have the data yet.

> Here is how TCM_Loop is currently doing that, with SCSI WRITE data mapped
> from incoming ->queuecommand() cmd->table.sgl memory:
>
> 	int tcm_loop_write_pending(struct se_cmd *se_cmd)
> 	{
> 		/*
> 		 * Since Linux/SCSI has already sent down a struct scsi_cmnd with
> 		 * sc->sc_data_direction of DMA_TO_DEVICE and struct scatterlist
> 		 * array memory, and that memory has already been mapped to struct
> 		 * se_cmd->t_mem_list format with transport_generic_map_mem_to_cmd(),
> 		 * we now tell TCM to add this WRITE CDB directly into the TCM
> 		 * storage object execution queue.
> 		 */
> 		transport_generic_process_write(se_cmd);
> 		return 0;
> 	}
>
> This will skip the transport_check_aborted_status() in
> transport_generic_handle_data(), and immediately add the
> T_TASK(cmd)->t_task_list for se_task execution down to
> se_subsystem_api->do_task() and out to backend subsystem code.
>
> So, just to reiterate the point with current v4.0 code: we currently
> cannot safely call transport_generic_allocate_tasks() or
> transport_generic_map_mem_to_cmd() from interrupt context, so you want
> to do these calls using the TFO->new_cmd_map() callback in backend
> kernel thread process context..
The workaround I gave calls them from thread context, but we don't want that
thread to block (at least not for very long) either.  It is holding up more
incoming requests and data for unrelated I/O.

> So I think this means you want to call transport_generic_process_write() to
> immediately queue the WRITE from TFO->write_pending(), but I'm not very
> certain after looking at ft_write_pending().
>
> Joe, any thoughts here..?

I find this all confusing, mainly because I'm not taking the time to figure it
all out, and there seem to be so many related issues.  So I'm not sure I've
researched it enough to make any of these comments.

Eventually, we want to accumulate all the write data frames and then give you
an S/G list for them, which you pass to the back-end driver.

For FCP, however, the sequence is:

	receive command - verify LUN, etc.
	TCM calls tcm_fc to send transfer-ready.
	When all the data frames have been received, tcm_fc makes the S/G list
	and gives them to TCM.  When the back end is done, tcm_fc sends status
	and frees the frames.

In the meantime, the current interface is probably fine, but it means we need
to do a copy, unless the LLD uses direct data placement.

	Joe

> Best,
>
> --nab
* Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic
  2010-11-12  1:18 ` Joe Eykholt
@ 2010-11-12  1:29   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 3+ messages in thread
From: Nicholas A. Bellinger @ 2010-11-12 1:29 UTC (permalink / raw)
To: Joe Eykholt
Cc: linux-iscsi-target-dev, Patil, Kiran, Jansen, Frank, devel@open-fcoe.org, linux-scsi, Christoph Hellwig

On Thu, 2010-11-11 at 17:18 -0800, Joe Eykholt wrote:
> On 11/11/10 4:58 PM, Nicholas A. Bellinger wrote:
> > On Thu, 2010-11-11 at 14:57 -0800, Patil, Kiran wrote:
> >> Yes, transport_generic_handle_data, which is called from ft_recv_write_data, can do msleep_interruptible only if the transport is active.
> >>
> >> FYI, this msleep was not introduced by my patch; it has been there.
> >>
> >> Agree with both of Joe's suggestions (fcoe_rcv - always let it go to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.
> >>
> >
> > Hey guys,
> >
> > So the split for interrupt context setup of individual se_cmd
> > descriptors for TCM_Loop (and other WIP HW FC target mode drivers) is to
> > use the optional target_core_fabric_ops->new_cmd_map() for the pieces of
> > se_cmd setup logic that currently cannot be done in interrupt context.
> > For TCM_Loop this is currently:
> >
> > *) transport_generic_allocate_tasks() (access of LUN, PR and ALUA
> >    specific locks, currently using spin_lock() + spin_unlock())
> > *) transport_generic_map_mem_to_cmd(), using GFP_KERNEL allocations
> >
> > However, for this specific transport_generic_handle_data() case:
> >
> > 	/*
> > 	 * Make sure that the transport has been disabled by
> > 	 * transport_write_pending() before readding this struct se_cmd to the
> > 	 * processing queue.  If it has not yet been reset to zero by the
> > 	 * processing thread in transport_add_cmd_to_queue(), let other
> > 	 * processes run.  If a signal was received, then we assume the
> > 	 * connection is being failed/shutdown, so we return a failure.
> > 	 */
> > 	while (atomic_read(&T_TASK(cmd)->t_transport_active)) {
> > 		msleep_interruptible(10);
> > 		if (signal_pending(current))
> > 			return -1;
> > 	}
> >
> > is specific to the existing drivers/target/lio-target iSCSI code, which
> > needs this for the traditional kernel sockets recv side iSCSI WRITE case.
> >
> > Since we already have FCP write data ready for submission to
>
> (We have some, but usually not all, of the data.)

Correct, because the above msleep_interruptible() case is waiting for TCM to
perform the internal physical memory allocation for T_TASK(cmd)->t_mem_list
and signal back to the fabric module code.

In the case of an se_cmd coming in with pre-mapped SGLs passed into
transport_generic_map_mem_to_cmd() -> T_TASK(cmd)->t_mem_list, I am pretty
certain we don't ever need to hit this msleep_interruptible() in per-se_cmd
WRITE descriptor dispatch for backend ->do_task() execution.

> > backend devices at this point, I think we want something in the
> > transport_generic_new_cmd() -> transport_generic_write_pending() code
> > that does the immediate SCSI write submission and skips the
> > TFO->write_pending() callback / extra fabric API exchange/response..
>
> If I understand, the write_pending() callback is when we send the
> transfer-ready to the initiator, and we don't have the data yet.
>
> > Here is how TCM_Loop is currently doing that, with SCSI WRITE data mapped
> > from incoming ->queuecommand() cmd->table.sgl memory:
> >
> > 	int tcm_loop_write_pending(struct se_cmd *se_cmd)
> > 	{
> > 		/*
> > 		 * Since Linux/SCSI has already sent down a struct scsi_cmnd with
> > 		 * sc->sc_data_direction of DMA_TO_DEVICE and struct scatterlist
> > 		 * array memory, and that memory has already been mapped to struct
> > 		 * se_cmd->t_mem_list format with transport_generic_map_mem_to_cmd(),
> > 		 * we now tell TCM to add this WRITE CDB directly into the TCM
> > 		 * storage object execution queue.
> > 		 */
> > 		transport_generic_process_write(se_cmd);
> > 		return 0;
> > 	}
> >
> > This will skip the transport_check_aborted_status() in
> > transport_generic_handle_data(), and immediately add the
> > T_TASK(cmd)->t_task_list for se_task execution down to
> > se_subsystem_api->do_task() and out to backend subsystem code.
> >
> > So, just to reiterate the point with current v4.0 code: we currently
> > cannot safely call transport_generic_allocate_tasks() or
> > transport_generic_map_mem_to_cmd() from interrupt context, so you want
> > to do these calls using the TFO->new_cmd_map() callback in backend
> > kernel thread process context..
>
> The workaround I gave calls them from thread context, but we don't want
> that thread to block (at least not for very long) either.  It is holding
> up more incoming requests and data for unrelated I/O.

<nod>

> > So I think this means you want to call transport_generic_process_write()
> > to immediately queue the WRITE from TFO->write_pending(), but I'm not
> > very certain after looking at ft_write_pending().
> >
> > Joe, any thoughts here..?
>
> I find this all confusing, mainly because I'm not taking the time to figure
> it all out, and there seem to be so many related issues.  So I'm not sure
> I've researched it enough to make any of these comments.

Yes, eventually I would like to be able to call
transport_generic_allocate_tasks() (which really needs to be renamed, because
it's not actually allocating anything yet) and a special case of
transport_generic_map_mem_to_cmd() using GFP_ATOMIC (and eventually some
pre-allocated threshold, perhaps..?) for handling the full interrupt context
setup case.  But I think this is going to be a v4.1 thing at this point,
unless it can happen in the next few weeks while I am coding on the existing
HW FC target mode fabric module ports..

> Eventually, we want to accumulate all the write data frames and then give
> you an S/G list for them, which you pass to the back-end driver.
> For FCP, however, the sequence is:
>
> 	receive command - verify LUN, etc.

<nod>  transport_get_lun_for_cmd() can be safely called from interrupt context.

> 	TCM calls tcm_fc to send transfer-ready.
> 	When all the data frames have been received, tcm_fc makes the S/G list
> 	and gives them to TCM.  When the back end is done, tcm_fc sends status
> 	and frees the frames.
>
> In the meantime, the current interface is probably fine, but it means we
> need to do a copy, unless the LLD uses direct data placement.

Yes, so in that sense the copy still requires an internal TCM
T_TASK(cmd)->t_mem_list allocation, which means the msleep_interruptible()
check in transport_generic_handle_data() is required and your short-term
workaround is necessary..

Shall I merge this now, or do you want something else..?

Thanks Joe,

--nab