All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Bhanu Prakash Gollapudi" <bprakash@broadcom.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Robert Love <robert.w.love@intel.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"devel@open-fcoe.org" <devel@open-fcoe.org>
Subject: Re: [PATCH 01/10] fcoe: avoid getting into dev_queue_xmit with bottom halves disabled
Date: Sun, 11 Mar 2012 20:25:30 -0700	[thread overview]
Message-ID: <4F5D6CAA.7050304@broadcom.com> (raw)
In-Reply-To: <20120311174332.GA3518@neilslaptop.think-freely.org>

On 3/11/2012 10:43 AM, Neil Horman wrote:
> On Fri, Mar 09, 2012 at 03:50:44PM -0800, Bhanu Prakash Gollapudi wrote:
>> On 3/9/2012 2:49 PM, Robert Love wrote:
>>> From: Neil Horman<nhorman@tuxdriver.com>
>>>
>>> Recieved this warning recently:
>>>
>>> WARNING: at kernel/softirq.c:143 local_bh_enable+0x7d/0xb0() (Not tainted)
>>> Hardware name: C6100
>>> Modules linked in: autofs4 bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc
>>> scsi_tgt 8021q garp stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table
>>> mperf ipv6 ixgbe mdio igb dcdbas microcode i2c_i801 i2c_core sg iTCO_wdt
>>> iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2
>>> sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
>>> scsi_wait_scan]
>>> Pid: 4926, comm: fc_rport_eq Not tainted 2.6.32 #2
>>> Call Trace:
>>>   [<ffffffff81069c67>] ? warn_slowpath_common+0x87/0xc0
>>>   [<ffffffff81069cba>] ? warn_slowpath_null+0x1a/0x20
>>>   [<ffffffff8107216d>] ? local_bh_enable+0x7d/0xb0
>>>   [<ffffffff81433567>] ? dev_queue_xmit+0x177/0x6c0
>>>   [<ffffffffa0293a34>] ? vlan_dev_hwaccel_hard_start_xmit+0x84/0xb0 [8021q]
>>>   [<ffffffff8142f46c>] ? dev_hard_start_xmit+0x2bc/0x3f0
>>>   [<ffffffff8143392e>] ? dev_queue_xmit+0x53e/0x6c0
>>>   [<ffffffff8142348e>] ? __skb_clone+0x2e/0x120
>>>   [<ffffffffa02ea83d>] ? fcoe_clean_pending_queue+0xcd/0x100 [libfcoe]
>>>   [<ffffffff810559ca>] ? find_busiest_group+0x9aa/0xb20
>>>   [<ffffffffa02f7624>] ? fcoe_port_send+0x24/0x50 [fcoe]
>>>   [<ffffffffa02f9568>] ? fcoe_xmit+0x228/0x3e0 [fcoe]
>>>   [<ffffffffa02c1896>] ? fc_seq_send+0xb6/0x160 [libfc]
>>>   [<ffffffffa02c1a76>] ? fc_exch_abort_locked+0x136/0x1c0 [libfc]
>>>   [<ffffffffa02c1ca7>] ? fc_exch_mgr_reset+0x147/0x2a0 [libfc]
>>>   [<ffffffff814f2c7e>] ? _spin_unlock_irq+0xe/0x20
>>>   [<ffffffffa02c91d3>] ? fc_rport_work+0x123/0x5f0 [libfc]
>>>   [<ffffffffa02c90b0>] ? fc_rport_work+0x0/0x5f0 [libfc]
>>>   [<ffffffff8108b559>] ? worker_thread+0x179/0x2a0
>>>   [<ffffffff81090ea0>] ? autoremove_wake_function+0x0/0x40
>>>   [<ffffffff8108b3e0>] ? worker_thread+0x0/0x2a0
>>>   [<ffffffff81090b56>] ? kthread+0x96/0xa0
>>>   [<ffffffff8100c10a>] ? child_rip+0xa/0x20
>>>   [<ffffffff81090ac0>] ? kthread+0x0/0xa0
>>>   [<ffffffff8100c100>] ? child_rip+0x0/0x20
>>>
>>> fc_exch_abort_locked attempts to send an abort frame, but it does so with a
>>> spinlock held and irqs/bh's disabled.  Because of this dev_queue_xmit issues a
>>> warning. This patch allows the fc_exch_abort_locked to drop and re-acquire the
>>> lock so that the warning is suppressed.
>>>
>>> Note this isn't a great fix, given that we need to free and re-acquire the
>>> ex_lock during the fc_exch_abort_locked operation, but any other solution is
>>> going to be a fairly major undertaking I think.  This should work until a proper
>>> fix can be found.
>>>
>>> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
>>> Signed-off-by: Robert Love<robert.w.love@intel.com>
>>
>> Robert,  This patch is not required based on what we discussed in
>> this thread..
>> http://lists.open-fcoe.org/pipermail/devel/2012-February/011946.html
>>
>> Neil provided "fcoe: Ensure fcoe_recv_frame is always called in
>> process context" instead of the two patches he submitted initially.
>> http://lists.open-fcoe.org/pipermail/devel/2012-February/011962.html
>>
>> Neil, correct me if I'm wrong.
>>
> Bhanu, This is not quite right (although that was a very confusing thread).  The
> ema_list fix that I was proposing is no longer necessecary, because after that
> conversation we realized the whole issue cold be avoided by making sure that we
> never allowed fcoe_recv_frame to be called in process context.
>
> This patch is still necessecary.


Neil, The reason you saw this (WARNING) stack trace (in softirq.c) was 
because of using spin_lock_irqsave() instead of spin_lock_bh() in 
fc_exch_reset() as part of  "[PATCH 1/2] fcoe: provide exclusion to 
concurrent ema_list access", which was not eventually submitted.

Because of using spin_lock_irqsave(), the following WARN_ON is hit in 
_local_bh_enable_ip().

static inline void _local_bh_enable_ip(unsigned long ip)
{
          WARN_ON_ONCE(in_irq() || irqs_disabled());

I don't think you will see this WARNING with the existing code.


  Its needed beacuse fc_exch_abort_locked is (by
> definition) called with the ex_lock held.  fc_exch_abort_locked however calls
> fc_seq_send, which can lock the same ex_lock, causing deadlock.


This is NOT an issue. Although fc_seq_send takes ex_lock, 
fc_exch_abort_locked() code path does not come here, as we return from 
this function if fh_type is BLS.

         if (fh_type == FC_TYPE_BLS)
                 return error;

         /*
          * Update the exchange and sequence flags,
          * assuming all frames for the sequence have been sent.
          * We can only be called to send once for each sequence.
          */
         spin_lock_bh(&ep->ex_lock);
	...


Thanks,
Bhanu
>
> So yes, we need a fix for this.  That said, I'd just as soon you drop this
> patch, as I was looking at this code and realized that I can make a much more
> efficient fix for this problem.  So let me know how you want to handle this: If
> you want to drop this patch, I'll have an improved fix for you early this week.
> If you want to keep this patch in place, thats fine with me too.  My new fix
> will just revert these bits.  Let me know how you want to play it.
>
> Thank you for bringing this up though, I'd been meaning to review this and see
> if I could fix it a bit better.
>
> Best
> Neil
>
>
>



  reply	other threads:[~2012-03-12  3:25 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-09 22:49 [PATCH 00/10] libfc, libfcoe and fcoe updates for scsi-misc Robert Love
2012-03-09 22:49 ` [PATCH 01/10] fcoe: avoid getting into dev_queue_xmit with bottom halves disabled Robert Love
2012-03-09 23:50   ` Bhanu Prakash Gollapudi
2012-03-10  0:12     ` Love, Robert W
     [not found]       ` <4F5A9C53.5040200-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2012-03-12 22:51         ` Love, Robert W
     [not found]     ` <4F5A9754.6080009-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2012-03-11 17:43       ` Neil Horman
2012-03-12  3:25         ` Bhanu Prakash Gollapudi [this message]
2012-03-12 10:42           ` Neil Horman
2012-03-09 22:49 ` [PATCH 02/10] fcoe: Ensure fcoe_recv_frame is always called in process context Robert Love
2012-03-09 22:49 ` [PATCH 03/10] libfcoe: Do not sends FDISCs before FLOGI during CVL Robert Love
2012-03-09 22:49 ` [PATCH 04/10] libfc: update fc_host mfs along with updating lport->mfs Robert Love
2012-03-09 22:50 ` [PATCH 05/10] libfcoe: Support extra MAC descriptor to be used as FCoE MAC Robert Love
2012-03-09 22:50 ` [PATCH 06/10] foce: remove bh disable from fcoe sw transport rcv function Robert Love
2012-03-09 22:50 ` [PATCH 07/10] bnx2fc: Remove bh disable in softirq context Robert Love
2012-03-09 22:50 ` [PATCH 08/10] fcoe: remove frame dropping code from fcoe_percpu_clean Robert Love
2012-03-09 22:50 ` [PATCH 09/10] fcoe: reduce contention for fcoe_rx_list lock [v2] Robert Love
2012-03-09 22:50 ` [PATCH 10/10] libfc: fcoe_transport_create fails in single-CPU environment Robert Love

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F5D6CAA.7050304@broadcom.com \
    --to=bprakash@broadcom.com \
    --cc=devel@open-fcoe.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=robert.w.love@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.