linux-wireless.vger.kernel.org archive mirror
* More wireless problems..
@ 2014-08-18  4:01 Linus Torvalds
  2014-08-18  4:37 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2014-08-18  4:01 UTC (permalink / raw)
  To: Johannes Berg, Emmanuel Grumbach
  Cc: Intel Linux Wireless, John W. Linville, Linux Wireless List,
	Network Development

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

So there's still something seriously wrong with the wireless changes
in the current merge window. Maybe it's not new, and is triggered by
something with the hotel wireless here at the kernel summit, but I
just got a NULL pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]

this may not be iwl-specific at all, but the second oops (which is
likely just a result of the first one) does end up having iwl-related
functions on the stack, so I'm cc'ing both generic wireless people
and intel wireless people. It's the same machine that showed the intel
wireless scanning microcode problem, but the oops looks very
different.

The end result is a dead machine (when kworkers die, things tend to go
downhill fast), so it would be good to have people give this a good
look.

Attached are the more complete oops details. I don't know what
triggered it, the machine was largely idle.

Ideas?

               Linus

[-- Attachment #2: oops --]
[-- Type: application/octet-stream, Size: 6218 bytes --]


  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: [<ffffffffc03fe97c>] ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
  PGD 0 
  Oops: 0000 [#1] SMP 
  Modules linked in: ftdi_sio rfcomm fuse ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep mmc_block arc4 vfat fat pn544_mei mei_phy pn544 hci snd_hda_codec_realtek nfc rtsx_pci_sdmmc snd_hda_codec_generic snd_hda_codec_hdmi mmc_core iTCO_wdt iTCO_vendor_support snd_hda_intel iwlmvm snd_hda_controller uvcvideo snd_hda_codec mac80211 x86_pkg_temp_thermal snd_hwdep coretemp videobuf2_vmalloc snd_seq videobuf2_memops videobuf2_core snd_seq_device v4l2_common videodev snd_pcm microcode
   iwlwifi joydev media hid_multitouch rtsx_pci cfg80211 btusb lpc_ich serio_raw bluetooth snd_timer snd shpchp mei_me soundcore i2c_i801 mfd_core sony_laptop mei rfkill dm_crypt crct10dif_pclmul crc32_pclmul i915 crc32c_intel ghash_clmulni_intel i2c_algo_bit drm_kms_helper drm video
  CPU: 2 PID: 12936 Comm: kworker/u8:0 Tainted: G        W      3.16.0-11383-gc9d26423e56c #5
  Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
  Workqueue: phy0 ieee80211_chswitch_work [mac80211]
  task: ffff8800ba51d070 ti: ffff88000bc58000 task.ti: ffff88000bc58000
  RIP: 0010:[<ffffffffc03fe97c>]  [<ffffffffc03fe97c>] ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
  RSP: 0018:ffff88000bc5bd80  EFLAGS: 00010246
  RAX: ffff8800d2176e08 RBX: ffff880098e0c9c0 RCX: 0000000000000020
  RDX: ffff88001ee287a0 RSI: 0000000000000000 RDI: ffff8800d2176880
  RBP: ffff880098e0c9f0 R08: ffff88000bc58000 R09: 0000000000000000
  R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000000
  R13: ffff880098e0c458 R14: ffff8800d3681758 R15: ffff8800d3680660
  FS:  0000000000000000(0000) GS:ffff88011fb00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000020 CR3: 0000000001a11000 CR4: 00000000001407e0
  Call Trace:
    ieee80211_vif_use_reserved_context+0x88/0x110 [mac80211]
    ieee80211_chswitch_work+0xb8/0x180 [mac80211]
    process_one_work+0x160/0x370
    worker_thread+0x114/0x470
    rescuer_thread+0x2a0/0x2a0
    kthread+0xb8/0xd0
    kthread_create_on_node+0x170/0x170
    ret_from_fork+0x7c/0xb0
    kthread_create_on_node+0x170/0x170
  Code: 48 b8 00 02 20 00 00 00 ad de 48 8b 97 90 05 00 00 48 89 87 a0 05 00 00 48 8d 87 88 05 00 00 48 89 51 08 48 89 0a 48 8b 4c 24 08 <48> 8b 56 20 48 89 42 08 48 89 97 88 05 00 00 48 89 8f 90 05 00 
  RIP  [<ffffffffc03fe97c>] ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
   RSP <ffff88000bc5bd80>
  CR2: 0000000000000020
  ---[ end trace 11849fac14a9c129 ]---

  BUG: unable to handle kernel paging request at ffffffffffffffd8
  IP: [<ffffffff8107ebf7>] kthread_data+0x7/0x10
  PGD 1a12067 PUD 1a14067 PMD 0 
  Oops: 0000 [#2] SMP 
  Modules linked in: ftdi_sio rfcomm fuse ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep mmc_block arc4 vfat fat pn544_mei mei_phy pn544 hci snd_hda_codec_realtek nfc rtsx_pci_sdmmc snd_hda_codec_generic snd_hda_codec_hdmi mmc_core iTCO_wdt iTCO_vendor_support snd_hda_intel iwlmvm snd_hda_controller uvcvideo snd_hda_codec mac80211 x86_pkg_temp_thermal snd_hwdep coretemp videobuf2_vmalloc snd_seq videobuf2_memops videobuf2_core snd_seq_device v4l2_common videodev snd_pcm microcode
   iwlwifi joydev media hid_multitouch rtsx_pci cfg80211 btusb lpc_ich serio_raw bluetooth snd_timer snd shpchp mei_me soundcore i2c_i801 mfd_core sony_laptop mei rfkill dm_crypt crct10dif_pclmul crc32_pclmul i915 crc32c_intel ghash_clmulni_intel i2c_algo_bit drm_kms_helper drm video
  CPU: 2 PID: 12936 Comm: kworker/u8:0 Tainted: G      D W      3.16.0-11383-gc9d26423e56c #5
  Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
  task: ffff8800ba51d070 ti: ffff88000bc58000 task.ti: ffff88000bc58000
  RIP: 0010:[<ffffffff8107ebf7>]  [<ffffffff8107ebf7>] kthread_data+0x7/0x10
  RSP: 0018:ffff88000bc5ba98  EFLAGS: 00010002
  RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006
  RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8800ba51d070
  RBP: ffff8800ba51d070 R08: ffff88011fb00000 R09: 00000001802a0013
  R10: ffffffff810685e5 R11: ffffea0002619300 R12: ffff8800ba51d4a0
  R13: 0000000000000002 R14: 0000000000000000 R15: ffff8800ba51d070
  FS:  0000000000000000(0000) GS:ffff88011fb00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000028 CR3: 0000000001a11000 CR4: 00000000001407e0
  Call Trace:
    wq_worker_sleeping+0x8/0x90
    __schedule+0x3d3/0x6b0
    do_exit+0x6c2/0xa20
    oops_end+0x63/0x90
    no_context+0x2d1/0x32e
    __do_page_fault+0x72/0x440
    prepare_to_wait_event+0xf0/0xf0
    iwl_mvm_send_cmd+0x38/0xb0 [iwlmvm]
    iwl_mvm_send_cmd_pdu+0x3c/0x50 [iwlmvm]
    page_fault+0x22/0x30
    ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
    ieee80211_vif_use_reserved_context+0x88/0x110 [mac80211]
    ieee80211_chswitch_work+0xb8/0x180 [mac80211]
    process_one_work+0x160/0x370
    worker_thread+0x114/0x470
    rescuer_thread+0x2a0/0x2a0
    kthread+0xb8/0xd0
    kthread_create_on_node+0x170/0x170
    ret_from_fork+0x7c/0xb0
    kthread_create_on_node+0x170/0x170
  Code: 00 00 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 80 d8 03 00 00 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 0f 1f 40 00 48 8b 87 d8 03 00 00 <48> 8b 40 d8 c3 0f 1f 40 00 48 83 ec 08 ba 08 00 00 00 48 8b b7 
  RIP  [<ffffffff8107ebf7>] kthread_data+0x7/0x10
   RSP <ffff88000bc5ba98>
  CR2: ffffffffffffffd8
  ---[ end trace 11849fac14a9c12a ]---


* Re: More wireless problems..
  2014-08-18  4:01 More wireless problems Linus Torvalds
@ 2014-08-18  4:37 ` Linus Torvalds
  2014-08-18  8:50   ` Luca Coelho
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2014-08-18  4:37 UTC (permalink / raw)
  To: Johannes Berg, Emmanuel Grumbach, Michal Kazior
  Cc: Intel Linux Wireless, John W. Linville, Linux Wireless List,
	Network Development

On Sun, Aug 17, 2014 at 11:01 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>   BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
>   IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]

Looking at the Code: line and the code generation for that function,
this *looks* to be this code:

                        list_del(&sdata->reserved_chanctx_list);
                        list_move(&sdata->assigned_chanctx_list,
                                  &new_ctx->assigned_vifs);
                        sdata->reserved_chanctx = NULL;

in ieee80211_vif_use_reserved_switch(), where "new_ctx" is NULL, so
the "list_move()" ends up oopsing. But maybe I screwed up the
analysis, I don't know the code.

Looks like that is all-new code introduced by commit 5bcae31d9cb1
("mac80211: implement multi-vif in-place reservations")

And doesn't look at all IWL-specific. Adding Michal Kazior to the list
of people.

                Linus
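
The fault address is consistent with this reading. As a rough userspace sketch (not the kernel's definitions): if assigned_vifs sits 0x20 bytes into the context structure, then list_move() onto &new_ctx->assigned_vifs with a NULL new_ctx loads head->next from address 0x20, which matches CR2, RSI == 0 in the register dump, and the faulting bytes "48 8b 56 20" (mov 0x20(%rsi),%rdx). The struct names and layout below are assumptions for illustration, and the offsets hold only on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel types; sizes assume a 64-bit build. */
struct list_head { struct list_head *next, *prev; };
struct rcu_head_stub { void *next; void (*func)(void *); };

/* Hypothetical layout: only the leading members matter for this sketch. */
struct chanctx_stub {
	struct list_head list;          /* offset 0x00 */
	struct rcu_head_stub rcu_head;  /* offset 0x10 */
	struct list_head assigned_vifs; /* offset 0x20 */
};

/*
 * Address that a list_move() onto &ctx->assigned_vifs reads when it
 * loads head->next. Computed arithmetically, so nothing is dereferenced.
 */
static uintptr_t first_read_address(uintptr_t ctx_base)
{
	/* The sketch only makes sense if a list_head is two pointers wide. */
	assert(sizeof(struct list_head) == 2 * sizeof(void *));

	return ctx_base + offsetof(struct chanctx_stub, assigned_vifs)
			+ offsetof(struct list_head, next);
}
```

Calling first_read_address(0) models the NULL new_ctx case; with the layout assumed above it yields 0x20 on a 64-bit build.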


* Re: More wireless problems..
  2014-08-18  4:37 ` Linus Torvalds
@ 2014-08-18  8:50   ` Luca Coelho
  2014-08-18 11:19     ` [PATCH] mac80211: fix channel switch for chanctx-based drivers Michal Kazior
  0 siblings, 1 reply; 8+ messages in thread
From: Luca Coelho @ 2014-08-18  8:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Emmanuel Grumbach, Michal Kazior,
	Intel Linux Wireless, John W. Linville, Linux Wireless List,
	Network Development

Hi Linus,

On Sun, 2014-08-17 at 23:37 -0500, Linus Torvalds wrote:
> On Sun, Aug 17, 2014 at 11:01 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >   BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> >   IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
> 
> Looking at the Code: line and the code generation for that function,
> this *looks* to be this code:
> 
>                         list_del(&sdata->reserved_chanctx_list);
>                         list_move(&sdata->assigned_chanctx_list,
>                                   &new_ctx->assigned_vifs);
>                         sdata->reserved_chanctx = NULL;
> 
> in ieee80211_vif_use_reserved_switch(), where "new_ctx" is NULL, so
> the "list_move()" ends up oopsing. But maybe I screwed up the
> analysis, I don't know the code.
> 
> Looks like that is all-new code introduced by commit 5bcae31d9cb1
> ("mac80211: implement multi-vif in-place reservations")
> 
> And doesn't look at all IWL-specific. Adding Michal Kazior to the list
> of people.

Unfortunately, I don't think channel switch on the client side has been
tested very thoroughly yet.  From the iwlwifi point of view, we're still
fine-tuning things in the firmware configuration, so I'd rather disable
channel switch entirely for iwlwifi.

It is probably better to disable it in mac80211, since this oops will
probably happen with other drivers too; it doesn't seem to be
iwlwifi-related.

I'll send a patch soon.  Meanwhile, you can apply this directly to make
the problem go away:

diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e0ab432..4b8eee8 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -758,6 +758,12 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
        netdev_features_t feature_whitelist;
        struct cfg80211_chan_def dflt_chandef = {};
 
+       /* The CSA client code is not very stable yet and seems to be
+        * causing trouble.  Disable it for all drivers until
+        * everything has been fixed and tested properly.
+        */
+       hw->flags &= ~IEEE80211_HW_CHANCTX_STA_CSA;
+
        if (hw->flags & IEEE80211_HW_QUEUE_CONTROL &&
            (local->hw.offchannel_tx_hw_queue == IEEE80211_INVAL_HW_QUEUE ||
             local->hw.offchannel_tx_hw_queue >= local->hw.queues))

--
Cheers,
Luca.



* [PATCH] mac80211: fix channel switch for chanctx-based drivers
  2014-08-18  8:50   ` Luca Coelho
@ 2014-08-18 11:19     ` Michal Kazior
  2014-08-18 13:40       ` Luca Coelho
                         ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Michal Kazior @ 2014-08-18 11:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Emmanuel Grumbach, Intel Linux Wireless,
	John W. Linville, Linux Wireless List, Network Development,
	Luca Coelho, Michal Kazior

The new_ctx pointer is set only for non-chanctx
drivers. This yielded a crash for chanctx-based
drivers during channel switch finalization:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]

Use an adequate chanctx pointer to fix this.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
Note: This is based on mac80211-next/master albeit
it should apply cleanly on wireless-next/master
and v3.17-rc1.

I've verified this fix with iwlmvm & 7260.


 net/mac80211/chan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/chan.c b/net/mac80211/chan.c
index f3317fa..7367e66 100644
--- a/net/mac80211/chan.c
+++ b/net/mac80211/chan.c
@@ -1353,7 +1353,7 @@ static int ieee80211_vif_use_reserved_switch(struct ieee80211_local *local)
 
 			list_del(&sdata->reserved_chanctx_list);
 			list_move(&sdata->assigned_chanctx_list,
-				  &new_ctx->assigned_vifs);
+				  &ctx->assigned_vifs);
 			sdata->reserved_chanctx = NULL;
 
 			ieee80211_vif_chanctx_reservation_complete(sdata);
-- 
1.8.5.3
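
A minimal userspace sketch of the pattern the patch restores: the vif is moved onto the list of the context that is actually in hand, not onto one reached through a pointer that may be NULL. The list helpers and the ctx_stub / vif_stub types are hypothetical stand-ins, not the mac80211 definitions.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Unlink an entry from whatever list it is currently on. */
static void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Insert an entry right after the list head. */
static void list_add_head(struct list_head *e, struct list_head *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Userspace equivalent of the kernel's list_move(). */
static void list_move_to(struct list_head *e, struct list_head *head)
{
	list_del_entry(e);
	list_add_head(e, head);
}

/* Hypothetical stand-ins for the channel context and the vif. */
struct ctx_stub { struct list_head assigned_vifs; };
struct vif_stub { struct list_head assigned_chanctx_list; };

/* Move a vif onto the context being finalized; ctx must be valid. */
static void assign_vif(struct vif_stub *vif, struct ctx_stub *ctx)
{
	assert(ctx != NULL);
	list_move_to(&vif->assigned_chanctx_list, &ctx->assigned_vifs);
}

/* Count the entries on a list, excluding the head itself. */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

In this sketch the assert() plays the role the oops played in the kernel: it makes a NULL target context fail loudly at the call site.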



* Re: [PATCH] mac80211: fix channel switch for chanctx-based drivers
  2014-08-18 11:19     ` [PATCH] mac80211: fix channel switch for chanctx-based drivers Michal Kazior
@ 2014-08-18 13:40       ` Luca Coelho
  2014-08-18 13:53       ` Linus Torvalds
  2014-08-18 13:59       ` Linus Torvalds
  2 siblings, 0 replies; 8+ messages in thread
From: Luca Coelho @ 2014-08-18 13:40 UTC (permalink / raw)
  To: Michal Kazior
  Cc: Linus Torvalds, Johannes Berg, Emmanuel Grumbach,
	Intel Linux Wireless, John W. Linville, Linux Wireless List,
	Network Development

On Mon, 2014-08-18 at 13:19 +0200, Michal Kazior wrote:
> The new_ctx pointer is set only for non-chanctx
> drivers. This yielded a crash for chanctx-based
> drivers during channel switch finalization:
> 
>   BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
>   IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
> 
> Use an adequate chanctx pointer to fix this.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
> ---
> Note: This is based on mac80211-next/master albeit
> it should apply cleanly on wireless-next/master
> and v3.17-rc1.
> 
> I've verified this fix with iwlmvm & 7260.

Cool!  I've also tested this (with P2P client) and it works fine.  You
can add my:

Tested-by: Luciano Coelho <luciano.coelho@intel.com>

The reason I haven't seen this before is because I've been using 2
channels support with iwlmvm, so we never get an in-place channel
switch. :( The normal case is to have single channel support...

--
Cheers,
Luca.



* Re: [PATCH] mac80211: fix channel switch for chanctx-based drivers
  2014-08-18 11:19     ` [PATCH] mac80211: fix channel switch for chanctx-based drivers Michal Kazior
  2014-08-18 13:40       ` Luca Coelho
@ 2014-08-18 13:53       ` Linus Torvalds
  2014-08-18 13:59         ` Luca Coelho
  2014-08-18 13:59       ` Linus Torvalds
  2 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2014-08-18 13:53 UTC (permalink / raw)
  To: Michal Kazior
  Cc: Johannes Berg, Emmanuel Grumbach, Intel Linux Wireless,
	John W. Linville, Linux Wireless List, Network Development,
	Luca Coelho

On Mon, Aug 18, 2014 at 6:19 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
>
> I've verified this fix with iwlmvm & 7260.

So I'm running a kernel with this manually applied, and so far so
good. But I don't know what actually triggered the problem, and it
definitely didn't happen all the time, so my testing of this is
dubious. But the patch certainly seems to match the symptoms. Thanks,

          Linus


* Re: [PATCH] mac80211: fix channel switch for chanctx-based drivers
  2014-08-18 13:53       ` Linus Torvalds
@ 2014-08-18 13:59         ` Luca Coelho
  0 siblings, 0 replies; 8+ messages in thread
From: Luca Coelho @ 2014-08-18 13:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Kazior, Johannes Berg, Emmanuel Grumbach,
	Intel Linux Wireless, John W. Linville, Linux Wireless List,
	Network Development

Hi Linus,

On Mon, 2014-08-18 at 08:53 -0500, Linus Torvalds wrote:
> On Mon, Aug 18, 2014 at 6:19 AM, Michal Kazior <michal.kazior@tieto.com> wrote:
> >
> > I've verified this fix with iwlmvm & 7260.
> 
> So I'm running a kernel with this manually applied, and so far so
> good. But I don't know what actually triggered the problem, and it
> definitely didn't happen all the time, so my testing of this is
> dubious. But the patch certainly seems to match the symptoms. Thanks,

What triggers this is a "Channel Switch Announcement", in which the
access point tells the clients to move to another channel at a specified
time.  This is not very common, but some enterprise APs use it to
improve the operating radio conditions, for instance.

Previously, as a client, we would simply disconnect from the current
channel and reconnect on the new channel after the time specified by the
AP.  Now we have implemented a more advanced switch where we don't lose
connectivity, but "simply" switch channels.

Hope this clarifies a bit.

--
Cheers,
Luca.
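
For reference, the announcement described above is carried as the 802.11 Channel Switch Announcement element: element ID 37 with three payload bytes (switch mode, new channel number, switch count). A minimal sketch of finding it in a buffer of information elements follows; find_csa() and struct csa_info are hypothetical names for illustration, not mac80211 APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EID_CHANNEL_SWITCH 37 /* 802.11 Channel Switch Announcement */

struct csa_info {
	uint8_t mode;        /* 1: stop transmitting until the switch */
	uint8_t new_channel; /* channel number to move to */
	uint8_t count;       /* beacon intervals left before the switch */
};

/*
 * Walk a buffer of id/length/value elements and extract the first
 * well-formed Channel Switch Announcement. Returns false if none is
 * found or if the buffer is truncated.
 */
static bool find_csa(const uint8_t *ies, size_t len, struct csa_info *out)
{
	size_t pos = 0;

	assert(out != NULL);
	assert(ies != NULL || len == 0);

	while (len - pos >= 2) {
		uint8_t id = ies[pos];
		uint8_t elen = ies[pos + 1];

		if (elen > len - pos - 2)
			return false; /* element runs past the buffer */

		if (id == EID_CHANNEL_SWITCH && elen == 3) {
			out->mode = ies[pos + 2];
			out->new_channel = ies[pos + 3];
			out->count = ies[pos + 4];
			return true;
		}
		pos += 2 + (size_t)elen;
	}
	return false;
}
```

A client that sees such an element can either drop the connection and reconnect on the new channel, or follow the AP in place once the count runs out, which is the newer behaviour this thread is about.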



