* Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic
  [not found] ` <13830B75AD5A2F42848F92269B11996F0107633CE4@orsmsx509.amr.corp.intel.com>
@ 2010-11-12  0:58 ` Nicholas A. Bellinger
  [not found]   ` <1289523519.2867.181.camel-Y1+j5t8j3WgjMeEPmliV8E/sVC8ogwMJ@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Nicholas A. Bellinger @ 2010-11-12 0:58 UTC (permalink / raw)
To: Patil, Kiran
Cc: Joe Eykholt, Jansen, Frank, devel@open-fcoe.org, linux-scsi, Christoph Hellwig

On Thu, 2010-11-11 at 14:57 -0800, Patil, Kiran wrote:
> Yes, transport_generic_handle_data, which is called from ft_recv_write_data, can do msleep_interruptible only if the transport is active.
>
> FYI, this msleep was not introduced by my patch; it has been there.
>
> Agree with both of Joe's suggestions (fcoe_rcv - always let it go to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.
>

Hey guys,

So the split for interrupt context setup of individual se_cmd descriptors for
TCM_Loop (and other WIP HW FC target mode drivers) is to use the optional
target_core_fabric_ops->new_cmd_map() for the pieces of se_cmd setup logic
that currently cannot be done in interrupt context.  For TCM_Loop this is
currently:

*) transport_generic_allocate_tasks() (access of LUN, PR and ALUA specific
   locks, currently using spin_lock() + spin_unlock())
*) transport_generic_map_mem_to_cmd(), using GFP_KERNEL allocations

However, for this specific transport_generic_handle_data() case:

	/*
	 * Make sure that the transport has been disabled by
	 * transport_write_pending() before readding this struct se_cmd to the
	 * processing queue.  If it has not yet been reset to zero by the
	 * processing thread in transport_add_cmd_to_queue(), let other
	 * processes run.  If a signal was received, then we assume the
	 * connection is being failed/shutdown, so we return a failure.
	 */
	while (atomic_read(&T_TASK(cmd)->t_transport_active)) {
		msleep_interruptible(10);
		if (signal_pending(current))
			return -1;
	}

is specific to the existing drivers/target/lio-target iSCSI code, which needs
this for the traditional kernel sockets recv side iSCSI WRITE case.

Since we already have FCP write data ready for submission to backend devices
at this point, I think we want something in the transport_generic_new_cmd() ->
transport_generic_write_pending() code that does the immediate SCSI write
submission and skips the TFO->write_pending() callback / extra fabric API
exchange/response..

Here is how TCM_Loop is currently doing that, with SCSI WRITE data mapped from
incoming ->queuecommand() cmd->table.sgl memory:

	int tcm_loop_write_pending(struct se_cmd *se_cmd)
	{
		/*
		 * Since Linux/SCSI has already sent down a struct scsi_cmnd with
		 * sc->sc_data_direction of DMA_TO_DEVICE and struct scatterlist
		 * array memory, and that memory has already been mapped to struct
		 * se_cmd->t_mem_list format with transport_generic_map_mem_to_cmd(),
		 * we now tell TCM to add this WRITE CDB directly into the TCM
		 * storage object execution queue.
		 */
		transport_generic_process_write(se_cmd);
		return 0;
	}

This will skip the transport_check_aborted_status() in
transport_generic_handle_data(), and immediately add the
T_TASK(cmd)->t_task_list for se_task execution down to
se_subsystem_api->do_task() and out to backend subsystem code.

So, just to reiterate the point with current v4.0 code: we currently cannot
safely call transport_generic_allocate_tasks() or
transport_generic_map_mem_to_cmd() from interrupt context, so you want to do
these calls using the TFO->new_cmd_map() callback in backend kernel thread
process context..

So I think this means you want to call transport_generic_process_write() to
immediately queue the WRITE from TFO->write_pending(), but I'm not very
certain after looking at ft_write_pending().

Joe, any thoughts here..?

Best,

--nab

> Thanks,
> -- Kiran P.
> > -----Original Message----- > From: devel-bounces@open-fcoe.org [mailto:devel-bounces@open-fcoe.org] On Behalf Of Joe Eykholt > Sent: Thursday, November 11, 2010 11:52 AM > To: Jansen, Frank > Cc: devel@open-fcoe.org > Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic > > > > On 11/11/10 11:41 AM, Jansen, Frank wrote: > > Greetings! > > > > I'm running 2.6.36 with Kiran Patil's patches from 10/28/10. > > > > I have 4 logical volumes configured over fcoe: > > > > [root@dut ~]# tcm_node --listhbas > > \------> iblock_0 > > HBA Index: 1 plugin: iblock version: v4.0.0-rc5 > > \-------> r0_lun3 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-4 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3 > > Major: 253 Minor: 4 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l3 > > \-------> r0_lun2 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-3 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2 > > Major: 253 Minor: 3 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l2 > > \-------> r0_lun1 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-2 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1 > > Major: 253 Minor: 2 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l1 > > \-------> r0_lun0 > > Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32 > > SectorSize: 512 MaxSectors: 1024 > > iBlock device: dm-1 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0 > > Major: 253 Minor: 1 CLAIMED: IBLOCK > > udev_path: /dev/vg_R0_p1/lv_R0_p1_l0 > > > > When any significant I/O load is put on any of the devices, I receive > > a flood of the following messages: > > > >> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic: > >> LIO_iblock/4439/0x00000101 > >> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe > >> target_core_stgt target_core_pscsi target_core_file 
target_core_iblock > >> ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc > >> scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6 > >> dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma > >> iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg > >> igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif > >> pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class > >> dm_mod [last unloaded: speedstep_lib] > >> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted > >> 2.6.36+ #1 > >> Nov 11 13:46:09 dut kernel: Call Trace: > >> Nov 11 13:46:09 dut kernel: <IRQ> [<ffffffff8104fb96>] > >> __schedule_bug+0x66/0x70 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>] > >> schedule_timeout+0x173/0x2e0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ? > >> process_timeout+0x0/0x10 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>] > >> schedule_timeout_interruptible+0x1e/0x20 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>] > >> msleep_interruptible+0x39/0x50 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>] > >> transport_generic_handle_data+0x2a/0x80 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>] > >> ft_recv_write_data+0x1fe/0x2b0 [tcm_fc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0 > >> [tcm_fc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>] > >> fc_exch_recv+0x61f/0xe20 [libfc] > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ? > >> skb_copy_bits+0x63/0x2c0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ? > >> __pskb_pull_tail+0x26a/0x360 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>] > >> fcoe_recv_frame+0x18d/0x340 [fcoe] > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ? > >> __pskb_pull_tail+0x5f/0x360 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? 
> >> __netdev_alloc_skb+0x24/0x50 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c > >> [fcoe] > >> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ? > >> __kmalloc_node_track_caller+0x67/0xe0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? > >> __netdev_alloc_skb+0x24/0x50 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>] > >> __netif_receive_skb+0x41a/0x5d0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>] > >> netif_receive_skb+0x58/0x80 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>] > >> napi_skb_finish+0x50/0x70 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>] > >> napi_gro_receive+0xc5/0xd0 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>] > >> ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>] > >> ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe] > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>] > >> net_rx_action+0x102/0x250 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>] > >> __do_softirq+0xb2/0x240 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30 > >> Nov 11 13:46:09 dut kernel: <EOI> [<ffffffff8100db25>] ? > >> do_softirq+0x65/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>] > >> local_bh_enable+0x94/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>] > >> dev_queue_xmit+0x143/0x3b0 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520 > >> [fcoe] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ? 
> >> _fc_frame_alloc+0x33/0x90 [libfc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140 > >> [libfc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>] > >> ft_write_pending+0x112/0x160 [tcm_fc] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>] > >> transport_generic_new_cmd+0x280/0x2b0 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>] > >> transport_processing_thread+0x1a4/0x7c0 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ? > >> autoremove_wake_function+0x0/0x40 > >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ? > >> transport_processing_thread+0x0/0x7c0 [target_core_mod] > >> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>] > >> kernel_thread_helper+0x4/0x10 > >> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0 > >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ? > >> kernel_thread_helper+0x0/0x10 > > > > I started noticing these issues first when I ran I/O with larger > > filesizes (appr. 25GB), but I'm thinking that might be a red herring. > > I'll rebuild the kernel and tools to make sure nothing is out of sorts > > and will report on any additional findings. > > > > Thanks, > > > > Frank > > FCP data frames are coming in at the interrupt level, and TCM expects > to be called in a thread or non-interrupt context, since > transport_generic_handle_data() may sleep. > > A quick workaround would be to change the fast path in fcoe_rcv() so that > data always goes through the per-cpu receive threads. That avoids part of the > problem, but isn't anything like the right fix. It doesn't seem good to > let TCM block FCoE's per-cpu receive thread either. > > Here's a quick change if you want to just work around the problem. 
> I haven't tested it:
>
> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> index feddb53..8f854cd 100644
> --- a/drivers/scsi/fcoe/fcoe.c
> +++ b/drivers/scsi/fcoe/fcoe.c
> @@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
>  	 * BLOCK softirq context.
>  	 */
>  	if (fh->fh_type == FC_TYPE_FCP &&
> +	    0 &&
>  	    cpu == smp_processor_id() &&
>  	    skb_queue_empty(&fps->fcoe_rx_list)) {
>  		spin_unlock_bh(&fps->fcoe_rx_list.lock);
>
> ---
>
> Cheers,
> Joe
>
> _______________________________________________
> devel mailing list
> devel@open-fcoe.org
> http://www.open-fcoe.org/mailman/listinfo/devel
* Re: transport_generic_handle_data - BUG: scheduling while atomic
  [not found] ` <1289523519.2867.181.camel-Y1+j5t8j3WgjMeEPmliV8E/sVC8ogwMJ@public.gmane.org>
@ 2010-11-12  1:18   ` Joe Eykholt
  2010-11-12  1:29     ` [Open-FCoE] " Nicholas A. Bellinger
  0 siblings, 1 reply; 3+ messages in thread
From: Joe Eykholt @ 2010-11-12 1:18 UTC (permalink / raw)
To: linux-iscsi-target-dev-/JYPxA39Uh5TLH3MbocFFw
Cc: linux-scsi, Jansen, Frank, Christoph Hellwig, devel-s9riP+hp16TNLxjTenLetw@public.gmane.org

On 11/11/10 4:58 PM, Nicholas A. Bellinger wrote:
> On Thu, 2010-11-11 at 14:57 -0800, Patil, Kiran wrote:
>> Yes, transport_generic_handle_data, which is called from ft_recv_write_data, can do msleep_interruptible only if the transport is active.
>>
>> FYI, this msleep was not introduced by my patch; it has been there.
>>
>> Agree with both of Joe's suggestions (fcoe_rcv - always let it go to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.
>>
>
> Hey guys,
>
> So the split for interrupt context setup of individual se_cmd
> descriptors for TCM_Loop (and other WIP HW FC target mode drivers) is to
> use the optional target_core_fabric_ops->new_cmd_map() for the pieces of
> se_cmd setup logic that currently cannot be done in interrupt context.
> For TCM_Loop this is currently:
>
> *) transport_generic_allocate_tasks() (access of LUN, PR and ALUA
>    specific locks, currently using spin_lock() + spin_unlock())
> *) transport_generic_map_mem_to_cmd(), using GFP_KERNEL allocations
>
> However, for this specific transport_generic_handle_data() case:
>
> 	/*
> 	 * Make sure that the transport has been disabled by
> 	 * transport_write_pending() before readding this struct se_cmd to the
> 	 * processing queue.  If it has not yet been reset to zero by the
> 	 * processing thread in transport_add_cmd_to_queue(), let other
> 	 * processes run.  If a signal was received, then we assume the
> 	 * connection is being failed/shutdown, so we return a failure.
> 	 */
> 	while (atomic_read(&T_TASK(cmd)->t_transport_active)) {
> 		msleep_interruptible(10);
> 		if (signal_pending(current))
> 			return -1;
> 	}
>
> is specific to the existing drivers/target/lio-target iSCSI code, which needs
> this for the traditional kernel sockets recv side iSCSI WRITE case.
>
> Since we already have FCP write data ready for submission to

(We have some, but usually not all, of the data.)

> backend devices at this point, I think we want something in the
> transport_generic_new_cmd() -> transport_generic_write_pending() code
> that does the immediate SCSI write submission and skips the
> TFO->write_pending() callback / extra fabric API exchange/response..

If I understand, the write_pending() callback is when we send the
transfer-ready to the initiator, and we don't have the data yet.

> Here is how TCM_Loop is currently doing that, with SCSI WRITE data mapped
> from incoming ->queuecommand() cmd->table.sgl memory:
>
> 	int tcm_loop_write_pending(struct se_cmd *se_cmd)
> 	{
> 		/*
> 		 * Since Linux/SCSI has already sent down a struct scsi_cmnd with
> 		 * sc->sc_data_direction of DMA_TO_DEVICE and struct scatterlist
> 		 * array memory, and that memory has already been mapped to struct
> 		 * se_cmd->t_mem_list format with transport_generic_map_mem_to_cmd(),
> 		 * we now tell TCM to add this WRITE CDB directly into the TCM
> 		 * storage object execution queue.
> 		 */
> 		transport_generic_process_write(se_cmd);
> 		return 0;
> 	}
>
> This will skip the transport_check_aborted_status() in
> transport_generic_handle_data(), and immediately add the
> T_TASK(cmd)->t_task_list for se_task execution down to
> se_subsystem_api->do_task() and out to backend subsystem code.
>
> So, just to reiterate the point with current v4.0 code: we currently
> cannot safely call transport_generic_allocate_tasks() or
> transport_generic_map_mem_to_cmd() from interrupt context, so you want
> to do these calls using the TFO->new_cmd_map() callback in backend
> kernel thread process context..
The workaround I gave calls them from thread context, but we don't want that
thread to block (at least not for very long) either.  It is holding up more
incoming requests and data for unrelated I/O.

> So I think this means you want to call transport_generic_process_write() to
> immediately queue the WRITE from TFO->write_pending(), but I'm not very
> certain after looking at ft_write_pending().
>
> Joe, any thoughts here..?

I find this all confusing, mainly because I'm not taking the time to figure it
all out, and there seem to be so many related issues.  So I'm not sure I've
researched it enough to make any of these comments.

Eventually, we want to accumulate all the write data frames and then give you
an S/G list for them, which you pass to the back-end driver.

For FCP, however, the sequence is:

	receive command - verify LUN, etc.
	TCM calls tcm_fc to send transfer-ready.
	When all the data frames have been received, tcm_fc makes the S/G list
	and gives them to TCM.  When the back end is done, tcm_fc sends status
	and frees the frames.

In the meantime, the current interface is probably fine, but it means we need
to do a copy, unless the LLD uses direct data placement.

	Joe

> Best,
>
> --nab
* Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic
  2010-11-12  1:18 ` Joe Eykholt
@ 2010-11-12  1:29   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 3+ messages in thread
From: Nicholas A. Bellinger @ 2010-11-12 1:29 UTC (permalink / raw)
To: Joe Eykholt
Cc: linux-iscsi-target-dev, Patil, Kiran, Jansen, Frank, devel@open-fcoe.org, linux-scsi, Christoph Hellwig

On Thu, 2010-11-11 at 17:18 -0800, Joe Eykholt wrote:
> On 11/11/10 4:58 PM, Nicholas A. Bellinger wrote:
> > On Thu, 2010-11-11 at 14:57 -0800, Patil, Kiran wrote:
> >> Yes, transport_generic_handle_data, which is called from ft_recv_write_data, can do msleep_interruptible only if the transport is active.
> >>
> >> FYI, this msleep was not introduced by my patch; it has been there.
> >>
> >> Agree with both of Joe's suggestions (fcoe_rcv - always let it go to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.
> >>
> >
> > Hey guys,
> >
> > So the split for interrupt context setup of individual se_cmd
> > descriptors for TCM_Loop (and other WIP HW FC target mode drivers) is to
> > use the optional target_core_fabric_ops->new_cmd_map() for the pieces of
> > se_cmd setup logic that currently cannot be done in interrupt context.
> > For TCM_Loop this is currently:
> >
> > *) transport_generic_allocate_tasks() (access of LUN, PR and ALUA
> >    specific locks, currently using spin_lock() + spin_unlock())
> > *) transport_generic_map_mem_to_cmd(), using GFP_KERNEL allocations
> >
> > However, for this specific transport_generic_handle_data() case:
> >
> > 	/*
> > 	 * Make sure that the transport has been disabled by
> > 	 * transport_write_pending() before readding this struct se_cmd to the
> > 	 * processing queue.  If it has not yet been reset to zero by the
> > 	 * processing thread in transport_add_cmd_to_queue(), let other
> > 	 * processes run.  If a signal was received, then we assume the
> > 	 * connection is being failed/shutdown, so we return a failure.
> > 	 */
> > 	while (atomic_read(&T_TASK(cmd)->t_transport_active)) {
> > 		msleep_interruptible(10);
> > 		if (signal_pending(current))
> > 			return -1;
> > 	}
> >
> > is specific to the existing drivers/target/lio-target iSCSI code, which
> > needs this for the traditional kernel sockets recv side iSCSI WRITE case.
> >
> > Since we already have FCP write data ready for submission to
>
> (We have some, but usually not all, of the data.)

Correct, because the above msleep_interruptible() case is waiting for TCM to
perform the internal physical memory allocation for T_TASK(cmd)->t_mem_list
and signal back to the fabric module code.

In the case of an se_cmd coming in with pre-mapped SGLs passed into
transport_generic_map_mem_to_cmd() -> T_TASK(cmd)->t_mem_list, I am pretty
certain we don't ever need to hit this msleep_interruptible() in per-se_cmd
WRITE descriptor dispatch for backend ->do_task() execution.

> > backend devices at this point, I think we want something in the
> > transport_generic_new_cmd() -> transport_generic_write_pending() code
> > that does the immediate SCSI write submission and skips the
> > TFO->write_pending() callback / extra fabric API exchange/response..
>
> If I understand, the write_pending() callback is when we send the
> transfer-ready to the initiator, and we don't have the data yet.
>
> > Here is how TCM_Loop is currently doing that, with SCSI WRITE data mapped
> > from incoming ->queuecommand() cmd->table.sgl memory:
> >
> > 	int tcm_loop_write_pending(struct se_cmd *se_cmd)
> > 	{
> > 		/*
> > 		 * Since Linux/SCSI has already sent down a struct scsi_cmnd with
> > 		 * sc->sc_data_direction of DMA_TO_DEVICE and struct scatterlist
> > 		 * array memory, and that memory has already been mapped to struct
> > 		 * se_cmd->t_mem_list format with transport_generic_map_mem_to_cmd(),
> > 		 * we now tell TCM to add this WRITE CDB directly into the TCM
> > 		 * storage object execution queue.
> > 		 */
> > 		transport_generic_process_write(se_cmd);
> > 		return 0;
> > 	}
> >
> > This will skip the transport_check_aborted_status() in
> > transport_generic_handle_data(), and immediately add the
> > T_TASK(cmd)->t_task_list for se_task execution down to
> > se_subsystem_api->do_task() and out to backend subsystem code.
> >
> > So, just to reiterate the point with current v4.0 code: we currently
> > cannot safely call transport_generic_allocate_tasks() or
> > transport_generic_map_mem_to_cmd() from interrupt context, so you want
> > to do these calls using the TFO->new_cmd_map() callback in backend
> > kernel thread process context..
>
> The workaround I gave calls them from thread context, but we don't want
> that thread to block (at least not for very long) either.  It is holding
> up more incoming requests and data for unrelated I/O.

<nod>

> > So I think this means you want to call transport_generic_process_write()
> > to immediately queue the WRITE from TFO->write_pending(), but I'm not
> > very certain after looking at ft_write_pending().
> >
> > Joe, any thoughts here..?
>
> I find this all confusing, mainly because I'm not taking the time to figure
> it all out, and there seem to be so many related issues.  So I'm not sure
> I've researched it enough to make any of these comments.

Yes, eventually I would like to be able to call
transport_generic_allocate_tasks() (which really needs to be renamed, because
it's not actually allocating anything yet) and a special case of
transport_generic_map_mem_to_cmd() using GFP_ATOMIC (and eventually some
pre-allocated threshold, perhaps..?) for handling the full interrupt context
setup case.  But I think this is going to be a v4.1 thing at this point,
unless it can happen in the next few weeks while I am coding on the existing
HW FC target mode fabric module ports..

> Eventually, we want to accumulate all the write data frames and then give
> you an S/G list for them, which you pass to the back-end driver.
> For FCP, however, the sequence is:
>
> 	receive command - verify LUN, etc.

<nod>  transport_get_lun_for_cmd() can be safely called from interrupt context.

> 	TCM calls tcm_fc to send transfer-ready.
> 	When all the data frames have been received, tcm_fc makes the S/G list
> 	and gives them to TCM.  When the back end is done, tcm_fc sends status
> 	and frees the frames.
>
> In the meantime, the current interface is probably fine, but it means we
> need to do a copy, unless the LLD uses direct data placement.

Yes, so in that sense the copy still requires an internal TCM
T_TASK(cmd)->t_mem_list allocation, which means the msleep_interruptible()
check in transport_generic_handle_data() is required and your short-term
workaround is necessary..

Shall I merge this now, or do you want something else..?

Thanks Joe,

--nab