From: "Bhanu Prakash Gollapudi" <bprakash@broadcom.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Robert Love <robert.w.love@intel.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"devel@open-fcoe.org" <devel@open-fcoe.org>
Subject: Re: [PATCH 01/10] fcoe: avoid getting into dev_queue_xmit with bottom halves disabled
Date: Sun, 11 Mar 2012 20:25:30 -0700 [thread overview]
Message-ID: <4F5D6CAA.7050304@broadcom.com> (raw)
In-Reply-To: <20120311174332.GA3518@neilslaptop.think-freely.org>
On 3/11/2012 10:43 AM, Neil Horman wrote:
> On Fri, Mar 09, 2012 at 03:50:44PM -0800, Bhanu Prakash Gollapudi wrote:
>> On 3/9/2012 2:49 PM, Robert Love wrote:
>>> From: Neil Horman<nhorman@tuxdriver.com>
>>>
>>> Recieved this warning recently:
>>>
>>> WARNING: at kernel/softirq.c:143 local_bh_enable+0x7d/0xb0() (Not tainted)
>>> Hardware name: C6100
>>> Modules linked in: autofs4 bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc
>>> scsi_tgt 8021q garp stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table
>>> mperf ipv6 ixgbe mdio igb dcdbas microcode i2c_i801 i2c_core sg iTCO_wdt
>>> iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2
>>> sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
>>> scsi_wait_scan]
>>> Pid: 4926, comm: fc_rport_eq Not tainted 2.6.32 #2
>>> Call Trace:
>>> [<ffffffff81069c67>] ? warn_slowpath_common+0x87/0xc0
>>> [<ffffffff81069cba>] ? warn_slowpath_null+0x1a/0x20
>>> [<ffffffff8107216d>] ? local_bh_enable+0x7d/0xb0
>>> [<ffffffff81433567>] ? dev_queue_xmit+0x177/0x6c0
>>> [<ffffffffa0293a34>] ? vlan_dev_hwaccel_hard_start_xmit+0x84/0xb0 [8021q]
>>> [<ffffffff8142f46c>] ? dev_hard_start_xmit+0x2bc/0x3f0
>>> [<ffffffff8143392e>] ? dev_queue_xmit+0x53e/0x6c0
>>> [<ffffffff8142348e>] ? __skb_clone+0x2e/0x120
>>> [<ffffffffa02ea83d>] ? fcoe_clean_pending_queue+0xcd/0x100 [libfcoe]
>>> [<ffffffff810559ca>] ? find_busiest_group+0x9aa/0xb20
>>> [<ffffffffa02f7624>] ? fcoe_port_send+0x24/0x50 [fcoe]
>>> [<ffffffffa02f9568>] ? fcoe_xmit+0x228/0x3e0 [fcoe]
>>> [<ffffffffa02c1896>] ? fc_seq_send+0xb6/0x160 [libfc]
>>> [<ffffffffa02c1a76>] ? fc_exch_abort_locked+0x136/0x1c0 [libfc]
>>> [<ffffffffa02c1ca7>] ? fc_exch_mgr_reset+0x147/0x2a0 [libfc]
>>> [<ffffffff814f2c7e>] ? _spin_unlock_irq+0xe/0x20
>>> [<ffffffffa02c91d3>] ? fc_rport_work+0x123/0x5f0 [libfc]
>>> [<ffffffffa02c90b0>] ? fc_rport_work+0x0/0x5f0 [libfc]
>>> [<ffffffff8108b559>] ? worker_thread+0x179/0x2a0
>>> [<ffffffff81090ea0>] ? autoremove_wake_function+0x0/0x40
>>> [<ffffffff8108b3e0>] ? worker_thread+0x0/0x2a0
>>> [<ffffffff81090b56>] ? kthread+0x96/0xa0
>>> [<ffffffff8100c10a>] ? child_rip+0xa/0x20
>>> [<ffffffff81090ac0>] ? kthread+0x0/0xa0
>>> [<ffffffff8100c100>] ? child_rip+0x0/0x20
>>>
>>> fc_exch_abort_locked attempts to send an abort frame, but it does so with a
>>> spinlock held and irqs/bh's disabled. Because of this dev_queue_xmit issues a
>>> warning. This patch allows the fc_exch_abort_locked to drop and re-acquire the
>>> lock so that the warning is suppressed.
>>>
>>> Note this isn't a great fix, given that we need to free and re-acquire the
>>> ex_lock during the fc_exch_abort_locked operation, but any other solution is
>>> going to be a fairly major undertaking I think. This should work until a proper
>>> fix can be found.
>>>
>>> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
>>> Signed-off-by: Robert Love<robert.w.love@intel.com>
>>
>> Robert, This patch is not required based on what we discussed in
>> this thread..
>> http://lists.open-fcoe.org/pipermail/devel/2012-February/011946.html
>>
>> Neil provided "fcoe: Ensure fcoe_recv_frame is always called in
>> process context" instead of the two patches he submitted initially.
>> http://lists.open-fcoe.org/pipermail/devel/2012-February/011962.html
>>
>> Neil, correct me if I'm wrong.
>>
> Bhanu, This is not quite right (although that was a very confusing thread). The
> ema_list fix that I was proposing is no longer necessecary, because after that
> conversation we realized the whole issue cold be avoided by making sure that we
> never allowed fcoe_recv_frame to be called in process context.
>
> This patch is still necessecary.
Neil, The reason you saw this (WARNING) stack trace (in softirq.c) was
because of using spin_lock_irqsave() instead of spin_lock_bh() in
fc_exch_reset() as part of "[PATCH 1/2] fcoe: provide exclusion to
concurrent ema_list access", which was not eventually submitted.
Because of using spin_lock_irqsave(), the following WARN_ON is hit in
_local_bh_enable_ip().
static inline void _local_bh_enable_ip(unsigned long ip)
{
WARN_ON_ONCE(in_irq() || irqs_disabled());
I don't think you will see this WARNING with the existing code.
Its needed beacuse fc_exch_abort_locked is (by
> definition) called with the ex_lock held. fc_exch_abort_locked however calls
> fc_seq_send, which can lock the same ex_lock, causing deadlock.
This is NOT an issue. Although fc_seq_send takes ex_lock,
fc_exch_abort_locked() code path does not come here, as we return from
this function if fh_type is BLS.
if (fh_type == FC_TYPE_BLS)
return error;
/*
* Update the exchange and sequence flags,
* assuming all frames for the sequence have been sent.
* We can only be called to send once for each sequence.
*/
spin_lock_bh(&ep->ex_lock);
...
Thanks,
Bhanu
>
> So yes, we need a fix for this. That said, I'd just as soon you drop this
> patch, as I was looking at this code and realized that I can make a much more
> efficient fix for this problem. So let me know how you want to handle this: If
> you want to drop this patch, I'll have an improved fix for you early this week.
> If you want to keep this patch in place, thats fine with me too. My new fix
> will just revert these bits. Let me know how you want to play it.
>
> Thank you for bringing this up though, I'd been meaning to review this and see
> if I could fix it a bit better.
>
> Best
> Neil
>
>
>
next prev parent reply other threads:[~2012-03-12 3:25 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-09 22:49 [PATCH 00/10] libfc, libfcoe and fcoe updates for scsi-misc Robert Love
2012-03-09 22:49 ` [PATCH 01/10] fcoe: avoid getting into dev_queue_xmit with bottom halves disabled Robert Love
2012-03-09 23:50 ` Bhanu Prakash Gollapudi
2012-03-10 0:12 ` Love, Robert W
[not found] ` <4F5A9C53.5040200-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2012-03-12 22:51 ` Love, Robert W
[not found] ` <4F5A9754.6080009-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2012-03-11 17:43 ` Neil Horman
2012-03-12 3:25 ` Bhanu Prakash Gollapudi [this message]
2012-03-12 10:42 ` Neil Horman
2012-03-09 22:49 ` [PATCH 02/10] fcoe: Ensure fcoe_recv_frame is always called in process context Robert Love
2012-03-09 22:49 ` [PATCH 03/10] libfcoe: Do not sends FDISCs before FLOGI during CVL Robert Love
2012-03-09 22:49 ` [PATCH 04/10] libfc: update fc_host mfs along with updating lport->mfs Robert Love
2012-03-09 22:50 ` [PATCH 05/10] libfcoe: Support extra MAC descriptor to be used as FCoE MAC Robert Love
2012-03-09 22:50 ` [PATCH 06/10] foce: remove bh disable from fcoe sw transport rcv function Robert Love
2012-03-09 22:50 ` [PATCH 07/10] bnx2fc: Remove bh disable in softirq context Robert Love
2012-03-09 22:50 ` [PATCH 08/10] fcoe: remove frame dropping code from fcoe_percpu_clean Robert Love
2012-03-09 22:50 ` [PATCH 09/10] fcoe: reduce contention for fcoe_rx_list lock [v2] Robert Love
2012-03-09 22:50 ` [PATCH 10/10] libfc: fcoe_transport_create fails in single-CPU environment Robert Love
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F5D6CAA.7050304@broadcom.com \
--to=bprakash@broadcom.com \
--cc=devel@open-fcoe.org \
--cc=linux-scsi@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
--cc=robert.w.love@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox